Controller for Optimizing Motion Trajectory to Control Motion of One or More Devices
20230074148 · 2023-03-09
Assignee
- Mitsubishi Electric Research Laboratories, Inc. (Cambridge, MA)
- Mitsubishi Electric Corporation (Chiyoda ku, JP)
Inventors
- Stefano Di Cairano (Newton, MA)
- Ankush Chakrabarty (Bedford, MA, US)
- Rien Quirynen (Somerville, MA, US)
- Mohit Srinivasan (Atlanta, GA, US)
- Nobuyuki Yoshikawa (Tokyo, JP)
- Toshisada MARIYAMA (Tokyo, JP)
Cpc classification
G06F18/214
PHYSICS
G05D1/106
PHYSICS
G05D1/0088
PHYSICS
G06N5/01
PHYSICS
International classification
G05D1/00
PHYSICS
Abstract
A controller for controlling a motion of at least one device subject to constraints on the motion, is disclosed. The controller comprises a processor and a memory, where the controller inputs parameters of the task including the state of the at least one device to a neural network trained to output an estimated motion trajectory for performing the task. Further, the controller extracts at least some of the integer values of a solution to a mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory. Further, the controller solves the mixed-integer optimization problem for the parameters of the task with corresponding integer values fixed to the extracted integer values to produce an optimized motion trajectory subject to the constraint and changes the state of the at least one device to track the optimized motion trajectory.
Claims
1. A controller, for controlling a motion of at least one device, performing a task of changing a state of the at least one device, wherein the state of the at least one device includes at least a position of the at least one device, subject to a constraint on the motion of the at least one device, the controller comprising: a processor; and a memory having instructions stored thereon that, when executed by the processor, cause the controller to: input parameters of the task including the state of the at least one device to a neural network trained to output an estimated motion trajectory for performing the task; extract at least some of integer values of a solution to a mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory; solve the mixed-integer optimization problem for the parameters of the task with corresponding integer values fixed to the extracted integer values to produce an optimized motion trajectory subject to the constraint; and change the state of the at least one device to track to the optimized motion trajectory.
2. The controller of claim 1, wherein the parameters of the task include one or a combination of an initial position of the at least one device, a target position of the at least one device, a geometrical configuration of at least one stationary obstacle defining at least a part of the constraint, geometrical configuration and motion of moving obstacles defining at least a part of the constraint.
3. The controller of claim 1, wherein the at least one device is one of at least autonomous vehicles, mobile robots, aerial drones, ground vehicles, aerial vehicles, water surface vehicles or underwater vehicles.
4. The controller of claim 1, wherein to extract the at least some of the integer values, the processor is configured to: formulate a mixed-integer optimization problem to search a global optimal solution within a search space determined by the constraint; and evaluate an admissibility of a value of at least some integer variables partitioning the search space of the mixed-integer optimization based on the estimated motion trajectory for performing the task which is output of the neural network.
5. The controller of claim 4, wherein the processor is further configured to: formulate a mixed-integer optimization problem, wherein at least some of the integer variables are binary, wherein the at least some of the integer variables are associated with groups of disjunctive constraints of real valued variables for avoidance of collisions between the at least one device and obstacles, and among multiple devices including the at least one device, and wherein when one of the constraints of real-valued variables in the groups of disjunctive constraints is satisfied at a specific time instant, the collision with the corresponding obstacle or another device of the multiple devices is avoided at the specific instant.
6. The controller of claim 5, wherein the processor is further configured to test the admissibility of values of at least some integer variables of the mixed-integer optimization problem, by determining a membership of at least some parts of the estimated motion trajectory to regions determined by the constraints of real-valued variables defining the obstacles or the other device of the multiple devices.
7. The controller of claim 6, wherein the region of the obstacles and the at least one device are defined by one or more constraints of real-valued variables that are linear in state variables, and the parameters of the task; wherein each constraint, of the one or more constraints of real-valued variables, defining the region of the obstacles is associated with at least one binary variable indicating whether a constraint is satisfied at a specific time instant; and wherein the processor is further configured to test the admissibility of a value of the at least one binary variable associated with the constraint by determining whether the estimated motion trajectory satisfies the constraint to which the integer variable is associated at the specific time instant.
8. The controller of claim 1, wherein the processor is further configured to: transform the mixed-integer optimization problem into a real-valued optimization problem by fixing the integer variables of the mixed-integer optimization problem to the extracted integer values; and determine the solution of the mixed-integer optimization problem from a solution of the real-valued optimization problem by adding fixed values of the integer variables.
9. The controller of claim 1, wherein the processor is further configured to update the optimized motion trajectory by solving the mixed-integer optimization problem warm-started based on the optimized motion trajectory.
10. The controller of claim 1, wherein the processor is further configured to update the optimized motion trajectory by solving the mixed-integer optimization problem warm-started based on the optimized motion trajectory in response to detection of time availability before an instant at which the optimized motion trajectory computation is to be completed.
11. The controller of claim 1, wherein the processor is further configured to: update the optimized motion trajectory by refining the mixed-integer optimization problem from a warm-start based on the optimized motion trajectory, wherein the optimized motion trajectory is updated within an available time period before an instant at which the optimized motion trajectory computation needs to be completed; and produce the optimized motion trajectory from the solution with a lowest cost computed within the available time period.
12. The controller of claim 1, wherein the processor is further configured to: detect infeasibility of the estimated motion trajectory as an infeasible estimated motion trajectory; update the parameters of the task by changing the state of the at least one device according to a portion of the infeasible estimated motion trajectory; and estimate the optimized motion trajectory for the updated parameters of the task.
13. The controller of claim 12, wherein the infeasibility of the estimated motion trajectory is detected by evaluating whether the estimated motion trajectory causes at least one collision between multiple devices including the at least one device or between a device of the at least one device and an obstacle.
14. The controller of claim 1, wherein the neural network is trained with a reconstruction loss function modified with a barrier function to increase a value of the reconstruction loss function based on a distance of the state of the at least one device to a region where the constraints on the motion of the at least one device is violated.
15. The controller of claim 14, wherein the barrier function has: a small value at a large distance from the region where the constraint is violated, a large value in the region where the constraint is violated, wherein the barrier function grows rapidly from the small value to the large value in a region at a small distance from the region where the constraint is violated, and wherein the barrier function is non-decreasing with respect to the distance of the area where the constraint is violated.
16. The controller of claim 12, wherein the neural network is trained with training data including samples on a search space of the parameters of the task labeled with an optimal solution of the mixed-integer optimization problem, wherein the samples include a set of regular samples and a set of biased samples, and wherein the set of biased samples are biased towards the constraint.
17. The controller of claim 1, wherein the neural network is a probabilistic neural network.
18. The controller of claim 1, wherein the processor is an embedded processing unit (EPU).
19. A method for controlling a motion of at least one device performing a task of changing a state of the at least one device subject to a constraint on the motion of the at least one device, wherein the method uses a processor coupled with stored instructions implementing the method, wherein the instructions, when executed by the processor carry out steps of the method, comprising: inputting parameters of the task including the state of the at least one device to a neural network trained to output an estimated motion trajectory for performing the task; extracting at least some of integer values of a solution to a mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory; solving the mixed-integer optimization problem for the parameters of the task with corresponding integer values fixed to the extracted integer values to produce an optimized motion trajectory subject to the constraint; and changing the state of the at least one device to track to the optimized motion trajectory.
20. A non-transitory computer-readable storage medium embodied thereon a program executable by a processor for performing a method for controlling a motion of at least one device performing a task changing a state of the at least one device subject to a constraint on the motion of the at least one device, the method comprising: inputting parameters of the task including the state of the at least one device to a neural network trained to output an estimated motion trajectory for performing the task; extracting at least some of integer values of a solution to a mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory; solving the mixed-integer optimization problem for the parameters of the task with corresponding integer values fixed to the extracted integer values to produce an optimized motion trajectory subject to the constraint; and changing the state of the at least one device to track to the optimized motion trajectory.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
DETAILED DESCRIPTION
[0038] The embodiments of the present disclosure disclose a system for generating a motion plan of one or multiple autonomous devices so that each device achieves its corresponding goal, represented by a target position. Furthermore, the motion plan must avoid collision of each device with obstacles in the environment, and between any two different devices at all times in a future planning time horizon. Example of devices include autonomous ground vehicles, such as cars or robots in factory automation, aircraft on airport surfaces, and unmanned aerial vehicles such as drones for infrastructure monitoring or inspections.
[0039]
[0040] Further, based on the information transmitted by the drone 100, a controller 111 is configured to control motion of the drone 100 by computing a motion plan for the drone 100. The motion plan for the drone 100 may comprise one or more trajectories to be traveled by the drone. In some embodiments, there are one or multiple devices (drones such as the drone 100 as shown in
[0041] The controller 111 controls the motion of the drone 100 performing a task by changing a state of the one drone 100, where the state of the drone 100 may include a position of the drone 100 subject to constraints on the motion of the drone 100. The controller 111 controls the motion of the drone 100 by using a neural network 112 limiting the search space for searching for the motion trajectory according to some embodiments. The neural network 112 may be a probabilistic or deterministic neural network.
[0042] In different embodiments, the controller 111 obtains parameters of the task from the drone 100 and/or remote server(not shown). The parameters of the task include the state of the drone 100, but may include more information. In some embodiments, the parameters may include one or a combination of an initial position of the drone 100, a target position of the drone 100, a geometrical configuration of one or multiple stationary obstacles defining at least a part of the constraint, geometrical configuration, and motion of moving obstacles defining at least a part of the constraint. The parameters are submitted to the neural network 112 to obtain an estimated motion trajectory for performing the task, where the neural network 112 is trained to output the estimated motion trajectory for performing the task.
[0043] Further, based on the input parameters, the neural network 112 may extract at least some of the integer values of a solution to the mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory. The mixed-integer optimization problem is solved for the parameters of the task with corresponding integer values fixed to the extracted integer values to produce an optimized motion trajectory subject to the constraints. Accordingly, the task of changing the state of the drone 100 is performed to track the optimized motion trajectory. In such a manner, the neural network 112 limits the search space for finding motion trajectories, but the optimal motion trajectory is determined by the controller 111. As a result, such a combination allows finding a feasible and/or optimal motion trajectory for the drone 100 with reduced computation and memory resources.
[0044]
[0045] Method 199, upon receiving parameters of the task including the state of the device, inputs 110 the parameters to a neural network 112 trained to output an estimated motion trajectory 115 for performing the task. Method 199 extracts 120 at least some of integer values 125 of a solution to a mixed-integer optimization problem for planning an execution of the task that results in the estimated motion trajectory and solves 130 the mixed-integer optimization problem for the parameters of the task with corresponding integer values fixed to the extracted integer values 125 to produce an optimized motion trajectory 135 subject to the constraint.
[0046] Next, controller 111 commands the device 100 to change 140 its state according to the optimized motion trajectory 135 thereby performing the task. For example, the controller can determine a control input to an actuator of the device 100 to track the motion trajectory. Examples of the controller generating the control input can be the proportional-integral-derivative controller, model predictive controller, reinforcement learning controller, and the like.
[0047] Notably, some embodiments train the neural network to estimate the trajectory 115, not the integer values 125. Some embodiments are based on the realization that doing in such a manner would simplify the training that produces a feasible result.
Multi-Device Motion Planning
[0048]
[0049]
[0050]
[0051] In some implementations, the memory 403 may include a plurality of sections, where a first section stores data about each of the devices 302, a second section stores one or more computer readable programs for computing motion plan of each device 302, a third section stores parameters of a machine learning (ML) module for predicting a solution of the motion planning problem, and a fourth section stores data about environment of surroundings of one or more devices 302. In some embodiments, the memory 403 may store algorithms associated with the neural network 112. The hardware processor 401 and the memory 403 form a controller (e.g., the controller 111) that controls motion of one or more devices performing a task of changing a state of the one or more devices. The state of the at least one device includes a position of the one or more devices, subject to the constraints on the motion of the one or more devices.
Exemplar Formulation
[0052] A motion planning problem, such as the multi-device planning system 301 for the optimal solution for coordinating the motion of the one or more devices 302 may exploit mixed-integer programming In some embodiments, the motion model of each device i=1, . . . , n.sub.a, may be formulated as the discrete-time dynamic system
x.sub.k+1.sup.i=A.sup.ix.sub.k.sup.i+B.sup.iu.sub.k.sup.i,
z.sub.k.sup.i=C.sup.ix.sub.k.sup.i+D.sup.iu.sub.k.sup.i, (1)
where k is the discrete-time step counter, x.sup.i is the i-th device state vector, u.sup.i is the i-th device command vector, z.sup.i is the i-th device configuration vector which includes at least the position of the device, and A.sup.i, B.sup.i, C.sup.i, D.sup.i are matrices describing the behavior of the i-th device in response to a value of the command vector, applied at a particular system state.
[0053] The initial state, physical, and functional limitations of the devices are formulated as real-valued constraints (illustrated in
x.sub.0.sup.i=x.sup.i(t) (2a)
u.sub.k.sup.i∈ (2b)
x.sub.k.sup.i∈ (2c)
where x.sup.i(t) is the state of the i-th device at the current time t, is the set of admissible inputs of the i-th device, and
is the set of admissible states of the i-th device. The sets
maybe polyhedra, which are determined by affine constraints, each of the constraints representing a halfspace.
[0054] The collision avoidance of each device i=1, . . . , n.sub.a with each obstacle j=1, . . . , n.sub.o can be formulated as mixed-integer constraints
where M is the big-M constant which is larger than any admissible value for the inequalities, b.sub.k.sup.i,j is a vector of binary variables, [b.sub.k.sup.i,j].sub.h denotes the h-th component of such vector, and H.sub.o.sup.j, K.sub.o.sup.j are a matrix and a vector that describe the region
H.sub.o.sup.j(z−ϑ.sup.j)≤K.sub.o.sup.j (4)
occupied by the j-th obstacle with respect to the obstacle center ϑ.sup.j, n.sub.j.sup.o is the number of elements in K.sub.o.sup.j, i.e., the number of inequality constraints in (4).
[0055]
[0056] Similarly, collision avoidance constraints of device i=1, . . . , n.sub.a, with each other device l=1, . . . , n.sub.a different from i≠l, can be formulated as mixed-integer constraints similar to constraints (3) as follows
where d.sub.k.sup.i,j is a vector of binary variables, and H.sub.r.sup.l, K.sub.r.sup.l are a matrix and a vector that describe the region
H.sub.r.sup.l(z.sub.k.sup.i−z.sub.k.sup.l)≤K.sub.r.sup.l (5a)
where collision between i-th and l-th device occurs, and n.sub.l.sup.r is the number of elements in K.sub.r.sup.l, i.e., the number of inequality constraints in (5a).
[0057] The collision avoidance constraints between the device 502 and the obstacle 501 and between the device 502 and another device are constructed from a group of real-valued constraints determining the region of the obstacle 501 or other device as the intersection of the corresponding regions 504 where the constraints are satisfied. Further, each constraint of the group of real-valued constraints is associated with at least one binary variable indicating whether the constraint is satisfied at a specific time instant.
[0058] To construct the collision avoidance constraints, the complements 503 of the region 504 are formulated as constraints and to each constraint, a binary variable is added to obtain mixed-integer disjunctive constraints. A disjunctive constraint composed of a group of real-valued constraints is satisfied if at least one of the constraints in the group of the constraints is satisfied. Thus, if one real-valued constraint in the group of constraints defining avoidance to the obstacle 501 is satisfied, then the other constraints may be satisfied or violated. Disjunctive constraints can be formulated by mixed-integer constraints by collecting the real-valued constraints, adding an integer variable to each in a way so that it takes value 1 if the constraint is satisfied, and imposing that at least one variable in the group takes value 1, as done in constraints (3), (5).
The goal-reaching condition
∥x.sub.k.sup.i−x.sub.goal.sup.i∥.sub.1≤d can also be implemented by mixed integer constraints:
where g.sub.k.sup.i is a binary variable and x.sub.goal.sup.i is the goal state for the i-th device, which ensures that the g.sub.k.sup.i binary variables can be 1 only if the goal is reached at time step k and, for each device i=1, . . . , n.sub.a, only one binary variable g.sub.k.sup.i can be equal to 1 for a particular time step k, where 0≤k≤T, ensuring the goal will be reached at some time step k smaller than the future planning time horizon length T.
[0059] The objective function or cost function determines the performance metrics that should be optimized by the selected trajectories for all the devices. In some embodiments, the cost function can be set as
where the first term favors moving quickly to the goal, and the second term favors minimizing the energy for going towards the goal for each device i=1, . . . , n.sub.a at each time step k=1, . . . , T.
[0060] By collecting the cost function (7), the motion model (1), and the constraints (2), (3), (5), (6) a mixed-integer linear program (MILP) is obtained:
where c is the cost function vector, H, K are the constraint matrix and vector, x is the vector of variables whose components with indices h∈V.sub.r are real valued, while those with indices h∈V.sub.b are binary (i.e., 0-1) valued. The components with indices h∈V.sub.r include the state, input, and configuration vectors, x.sub.k.sup.i, u.sub.k.sup.i, z.sub.k.sup.i for all devices i=1, . . . , n.sub.a at each time step k=1, . . . , T, and possibly a vector of additional auxiliary variables y, for example, including auxiliary variables for representing the l-norm in the cost function (7). The components with indices h∈V.sub.b include the binary variables for collision avoidance with obstacles, b.sub.k.sup.i,j, and for collision avoidance between devices, d.sub.k.sup.i,l, and the goal-reaching binary variables g.sub.k.sup.i, for all devices i=1, . . . , n.sub.a at each time step k=1, . . . , T.
[0061] Other embodiments of the present disclosure change cost function (7) or alternatively contain quadratic terms, resulting in a mixed-integer quadratic program (MIQP), with the same constraints as in (8), but with quadratic cost function ξ.sup.TQξ+c.sup.Tξ. For example, in some embodiments, the quadratic cost function may include a quadratic penalty on control inputs or distance from the target at each time instant.
[0062] The MIP problem (8) may be solved by a branch and bound method described in
[0063]
[0064] In case an integer feasible solution is found at some node (e.g., a node 605), it is compared with the stored integer solution. In case it is found to be a better solution, i.e., it has a smaller cost, the new solution can replace the stored solution, otherwise, it is discarded. In both cases, further exploration from that node stops 606. In case the solution of a relaxed problem is found to be worse at any node (e.g., a node 607), i.e., it has a higher cost, than the stored integer solution, further exploration from that solution stops 608.
[0065] Accordingly, the solution of the MIP problem (8) obtained for instance by the branch and bound algorithm shown in
[0066] In order to remove such issues, some embodiments are based on ML techniques as described further with reference to
[0067] Some embodiments are based on the realization that there are several cases in which the approach described with respect to
[0068]
[0069]
[0070]
[0071] In operation, an expected behavior of a continuous prediction model may be obtained by an ML module (e.g., the ML module 704) using a neural network. For initial conditions, strongly to a left side 901 and strongly to a right side 902 of a center of the obstacle 906, the ML module may produce a correct result of circling to the left side 903 and to the right side 904 towards the goal position 907. But for a region of intermediate values 905, a resulting trajectory may not bend enough. Hence, the resulting trajectory will go through the obstacle, causing a collision. Therefore, the solution of the approach as described with reference to
[0072] Some embodiments are based on further realization that existing alternative approaches only use the ML modules to initialize an MIP solver achieving a warm starting. A schematic of such an approach is illustrated in
[0073]
[0074] Fast Computation of Solution of Multi-Device Motion Planning Problems by Partial Variable Fixing
[0075] In the present disclosure, some embodiments are based on the realization that trajectories for the multi-device motion planning problem with limited computational and memory burden are obtained by leveraging an ML module to fix only part of the variables of the mixed-integer optimization problem. Specifically, only some or all of the integer variables are fixed, because the number of relaxed problems that need to be solved in the branch and bound tree of
[0076] In some embodiments, the value of some or all of the integer variables are extracted from an ML module that predicts the trajectories of the one or more devices, yet the ML module does not explicitly predict the integer variables. In some embodiments, after the integer variables are fixed, desirable trajectories for the multi-device motion planning problem are obtained by solving a single real-valued optimization problem. The solution of the real-valued optimization problem can be obtained using standard methods for real-valued mathematical programming, such as an interior point method, a gradient-based method, alternating direction method of multipliers (ADMM), active set method, or simplex method. In some embodiments, a fixed number of real-valued optimization problems is solved for different future planning time horizon lengths, in order to compute a time-optimal trajectory for each of the devices. In some embodiments, only some of the integer variables are fixed, such that a reduced MIP problem needs to be solved to compute the trajectories for each of the devices, resulting in a considerable reduction of the number of relaxed problem solutions and therefore a considerable reduction in the computational and memory requirements of the multi-device motion planning Finally, in the present disclosure, some embodiments are based on the realization that a success rate of the proposed method is increased by training the ML module favoring the generation of trajectories that are close-to-feasible, as opposed to close-to-optimal, which can be accomplished by focused sampling and by modifying the loss function used for training to include barrier terms for collision avoidance.
[0077] Additionally, some embodiments are based on the realization that a repeated evaluation of the ML module initialized at different predicted steps is more likely to provide a feasible solution to the multi-device motion planning problem. The latter is due to the prediction model producing an approximated trajectory so that if such approximated trajectory is predicted to experience a collision at some future step, re-evaluating the ML module from a future predicted step before the collision will return a differently approximated trajectory which is likely to avoid one or multiple of the previously predicted collisions. Further, operation of the system based on the realizations of the present disclosure is illustrated in
[0078]
[0079] Some embodiments are based on the realization that the configuration, i.e., the data generation 1101 and the training module 1102, can be performed offline and do not need to be implemented in an embedded processing unit (EPU) and therefore these components do not affect the computational and memory resources that are required for multi-device motion planning During operation, a scenario state is received 1103, and the ML module 1104 computes predicted trajectories 1105 from which a problem reduction module 1106 extracts variables such that some or all the integer variables are fixed. Based on the predicted trajectories, motions of one or multiple devices are planned such that the one or multiple devices do not collide with either obstacles or each other.
[0080] To extract at least some of the integer values, the problem reduction module 1106 may formulate the mixed-integer optimization problem that searches for a global optimal solution within a search space determined by the constraints defined by equations (2) to (6). The mixed-integer optimization problem is further transformed into a real-valued optimization problem by fixing integer variables of the mixed-integer optimization problem to the integer values extracted by the problem reduction module 1106. Further, a solution of the mixed-integer optimization problem is determined from the solution of the real-valued optimization problem by adding the fixed values of the integer variables.
[0081] In the formulation of the mixed-integer optimization problem, at least some of the integer variables are binary. Further, at least some of the integer variables are associated with groups of disjunctive constraints of real-valued variables for the avoidance of collisions between the one or multiple devices and obstacles, and among the one or multiple devices. When one of the constraints of real-valued variables in the groups of disjunctive constraints is satisfied at a specific time instant, the collision with the corresponding obstacle or another device of the one or multiple devices is avoided at the specific instant.
[0082] Further, an admissibility of a value of at least some integer variables partitioning the search space of the mixed-integer optimization is evaluated based on the predicted (or estimated) motion trajectory 1105 for performing the task of changing the state of the one or multiple devices. The admissibility of values of the at least some integer variables of the mixed-integer optimization problem is tested by determining a membership of at least some parts of the estimated motion trajectory 1105 to regions determined by the constraints of real-valued variables. The constraints of the real-valued variables define the obstacles or the other devices of the one or multiple devices.
[0083] In some embodiments, the region of the obstacles and the one or multiple devices are defined by one or more constraints of real-valued variables and parameters of the task. The one or more constraints of the real-valued variables are linear in state variables. Further, each constraint of the one or more constraints of real-valued variables, defining the region of the obstacles is associated with at least one binary variable indicating whether the constraint is satisfied at a specific time instant, where the admissibility of a valued of the at least one binary variable associated with the constraint is tested. To that end, it is determined whether the estimated motion trajectory 1105 satisfies the constraint to which the integer variable is associated at the specific time instant.
[0084] Still referring to
[0085] Further, depending on whether the refinement is done or not, trajectories 1111 are assigned to be the optimized trajectories 1108 obtained from the reduced solver module 1107 or trajectories 1110 from the full MIP solver module 1109.
[0086] Some embodiments of the disclosure are based on the realization that the operation, i.e., the ML prediction 1104, the problem reduction 1106, and the reduced solver 1107, needs to be performed online and implemented in an EPU for multi-device motion planning, but these components require a considerably reduced amount of computational and memory resources for multi-device motion planning In some embodiments, the ML module 1104, the problem reduction module 1106, the reduced solver module 1107, and the full solver module 1109 are comprised by the controller 111 (
[0087] In some embodiments, the full solver module 1109 may be a part of the online operation of the multi-device motion planning system, and the computational and memory resources for the full solver module 1109 may be bounded by limiting the number of iterations in the optimization algorithm, for example, by limiting the number of nodes in the branch and bound tree.
[0088] The data generation module 1101 generates samples of at least one among devices initial conditions, goals, and obstacle center positions and/or obstacle dimensions, obtaining data tuples
{(x.sub.0.sup.i,r).sub.i=1.sup.n.sup.
[0089] Based on such a tuple, the full mixed integer program (8) is built and solved. The optimal solution ξ.sub.r* contains the state, input and position trajectories, and all the integer or binary variables. First, the data generation module removes all solutions and corresponding data tuples that result in infeasible solutions, ξ.sub.r*=∞. When the infeasible solution ξ.sub.r*=∞ or infeasibility of the estimated motion trajectory 1105 is detected, the parameters of the task of changing the state of the one or multiple devices are updated by changing the state of the one or multiple devices according to a portion of the infeasible estimated motion trajectory 1105. Further, steps for estimating the optimized motion trajectories 1108 are repeated for the updated parameters of the task. To detect the infeasibility of the estimated motion trajectory 1105, the estimated motion trajectory 1105 is evaluated to determine whether the estimated motion trajectory 1105 causes at least one collision among devices of the one or multiple devices or between a device of the one or multiple devices and an obstacle.
[0090] After removing the infeasible solution ξ.sub.r*=∞, the remaining data tuples with feasible solutions, with their corresponding optimal state trajectories for a future horizon of N-steps for all devices
{((x.sub.k.sup.i,r*).sub.k=0.sup.T).sub.i=1.sup.n.sup.
is collected to build a training dataset for a prediction model in the ML module 1104
{(x.sub.0.sup.i,r).sub.i=1.sup.n.sup.
[0091] , i=
, then select neighborhoods of the obstacles and randomly sample x.sub.0.sup.i, i∈
, in such neighborhoods. Some embodiments of focused samplers 1251 are describe further with reference to
[0092]
[0093] Another embodiment of the focused sampler involves density-based importance sampling 1263 from a distribution that is designed taking the obstacle positions into account. According to this embodiment, a distribution ξ.sub.imp(x.sub.0.sup.i) on the state-space of interest is constructed such that the probability of selecting a data point increases as the focused sampler 1251 gets closer to the obstacle. For instance, such a distribution can be generated efficiently via kernelized density estimation methods and extracting samples from such a skewed distribution automatically focuses samples on close proximity to the obstacle.
[0094] Another embodiment of focused sampling involves active learning 1265, wherein the initial set of data is used for constructing a learner, which subsequently chooses the next batch of samples that contain the most useful information (for example, according to an entropy measure). By repeated learning and batch selection, the data collected will comprise focused samples around the most informative regions of the state-space.
[0095] The training dataset is used in the training module 1102 to train the prediction model in the ML module 1104, which contains a prediction model, such as a neural network, for the trajectories of the multiple devices based on the scenario data {(x.sup.i(t)).sub.i=1.sup.n.sup.
[0096]
[0097]
ƒ.sub.0W.sub.0.sup.TX+w.sub.0 (12a)
where W.sub.0 and w.sub.0 form the weights and biases of the input layer and X is the input to the deep neural network 1341. Consequently, the features are passed through multiple hidden layers 1355 (typically larger than 3 for a deep neural network), and they are transformed via activation functions α(⋅) such as rectified linear units (ReLUs), sigmoid functions, or modifications and combinations thereof. One can represent the operations occurring in the hidden layers using compositions of functions, where the output from the final hidden layer is given by
ƒ.sub.n=α.sub.n ○ . . . ○α.sub.2(W.sub.2.sup.Tα.sub.1(W.sub.1.sup.Tα.sub.0(ƒ.sub.0)+w.sub.1)+w.sub.2. (12b)
[0098] The output layer 1357 is used here for regression and therefore, can be linear, so the prediction output 1359 becomes
Y=W.sub.∞.sup.Tƒ.sub.n+w.sub.∞ (12c)
and training the deep neural network 1341 involves obtaining good values of the weights and biases in the input layer 1353, in the output layer 1357 and in each of the hidden layers 1355.
[0099] While common training methods for prediction models aim at minimizing the training error of the prediction model equally in every direction, in this disclosure it is realized that for multi-device motion planning, the training error should be minimized primarily in the directions that cause violations of the constraints that determine collision avoidance, since violating those constraints causes critical damage to the devices, while other errors cause only a performance degradation.
[0100]
[0101] Some embodiments are based on the realization that to achieve asymmetric training errors, a standard loss function can be modified for training, for instance, based on mean squared error (MSE) between trajectory in the dataset and obtained trajectory from the prediction model in the ML module
by adding a device-obstacle collision term that penalizes training errors that lead to a device colliding with an obstacle
and an inter-device collision term that penalizes training errors that lead to a device colliding with another device
where in (14), (15) the functions ϕ.sub.o.sup.j, ϕ.sub.a.sup.j are barrier functions, that is, functions that approximate the behavior of being 0 outside of the region of the obstacle, and a positive constant inside the region of the obstacle, while being continuous and continuously differentiable. An example of such behavior is obtained by
where P.sub.j, C.sub.j and a are the shape matrix, center, and scaling factor of the ellipsoid ({circumflex over (z)}.sub.k.sup.i,r−C.sub.j).sup.T P.sub.m({circumflex over (z)}.sub.k.sup.i,r−C.sub.j)≤1/α that covers the region of the j-th obstacle.
[0102]
[0103] Thus, the training module 1102 trains the prediction model in the ML module 1104 by adjusting the value of the parameters, for example, the weights and biases in a neural network, such that the loss function
=
.sub.MSE
.sub.MSE+
.sub.aoc
.sub.aoc+
.sub.iac
.sub.iac (17)
is reduced, where .sub.MSE,
.sub.aoc,
.sub.iac are positive scalar weights. The operation of the training module 1102 using the barrier terms in the loss function (also referred to as “reconstruction loss function”) according to some embodiments of the present disclosure is shown in
[0104]
[0105] At step 1601, the current prediction of the prediction model is evaluated. Further, at step 1602, the MSE cost is computed. At step 1603, the barrier costs are computed, and at step 1604, the total loss function is computed. At step 1605, termination conditions are checked. If the termination conditions, such as total number of updates or progress obtained by the updates, are achieved, at step 1606 the training module terminates and returns the current best prediction model parameters. Otherwise, it updates the parameters at step 1607. In some embodiments, each update of the model parameters 1607 is performed by a stochastic gradient descent algorithm or any variant thereof such as the Adam optimization algorithm.
[0106] During online operation of the multi-device planning system, the ML module 1104 containing a prediction model trained by the training module 1102 receives the data of the current scenario 1103 consisting of the state of the devices, goal of the devices, and centers and/or dimensions of the obstacles at current time t
{(x.sup.i(t)).sub.i=1.sup.n.sup.
and evaluates the prediction model 1104 obtaining a state predicted trajectory 1105 for a future horizon of T steps for the n.sub.a devices
({circumflex over (x)}.sup.i(t+k)).sub.k=0.sup.T i=1, . . . , n.sub.a (19)
[0107] Some embodiments are based on the realization that even if the aim is to predict the values of some or all integer variables in the problem reduction module 1106, it is more convenient to predict device state trajectories. This is due to the realization that integer variables can be reconstructed from the device state trajectories, and the state trajectories have lower dimensions and fewer constraints between their vector components. Instead, the set of integer variables is much larger: for each device, for each time step, for each obstacle, and for each side there is one integer variable, as opposed to for each device, and for each time step there is a state vector. Also, the integer variables are discrete-valued, so that they are not easily predicted by a continuous real-valued prediction model and are subject to relative constraints that cannot be enforced easily within the prediction model. Accordingly, predicting real-valued state trajectories requires less data and a simpler prediction model, i.e., with fewer equations, and creates fewer and less critical prediction errors.
[0108] Despite using a training loss function with additional barrier terms, it is possible that the predicted trajectories may be experiencing collisions. However, in the present disclosure, some embodiments are based on the realization that collisions experienced by the predicted trajectories are still due to the approximation of the continuous prediction model in predicting a discontinuous solution. As the region where the discontinuity happens is limited, small perturbations to the state from which the predicted trajectory is computed may cause its re-computation to avoid collisions. Thus, in some embodiments, the ML module operates as a receding horizon implementation as shown in
[0109] =x.sup.i(t), i=1, . . . , n.sub.a.
[0110] At step 1701, the ML module checks whether the counter S has reached horizon T. If the counter S has reached the horizon T, at step 1720, the ML module returns the determined trajectory for each of the devices. The prediction model 1104 is evaluated using the last state for each device in , the goals, and the obstacle centers and/or dimensions.
[0111] At step 1702, the predicted trajectories are obtained ({circumflex over (x)}.sup.i(t+S+k)).sub.k=0.sup.T i=1, . . . , n.sub.a. At step 1703, a set of candidate trajectories) =({circumflex over (x)}.sup.i(t+S+k)).sub.k=−S.sup.T−S i=1, . . . , n.sub.a are obtained by concatenating the determined trajectories
=({circumflex over (x)}.sup.i(t+k)).sub.k=0.sup.S i=1, . . . , n.sub.a, which have length S steps with indices k=1, . . . , T−S of the corresponding predicted trajectories
=({circumflex over (x)}.sup.i(t+S+k)).sub.k=1.sup.T−S i=1, . . . , n.sub.a.
[0112] Further, at step 1704, the ML module checks whether the candidate trajectories experience collisions between a device and an obstacle or between two devices, by evaluating whether the inequalities (4), (5a) are satisfied at any time step, respectively. If no collisions are detected, the ML module proceeds to step 1705.
[0113] At step 1705, the counter is set to S=T, the candidate trajectories are accepted and produced as the output of the ML module. Otherwise, if the collisions are detected, the ML module proceeds to step 1706.
[0114] At step 1706, the algorithm determines whether the predicted trajectories are collision-free. If the collision occurs at the step with index k=1 of the predicted trajectories, i.e., the initial step of the predicted trajectories is not collision-free, and at step 1707, a failure is returned. Otherwise, at step 1708, the first elements of the predicted trajectories ({circumflex over (x)}.sup.i(t+S+1)), i=1, . . . , n.sub.a are added to the determined trajectories, the counter S is increased by 1, and the operation repeats.
[0115] Some embodiments in the present disclosure are based on the realization that it is more convenient to predict real-valued state trajectories and use these real-valued state trajectories to determine the values of some or all the integer and/or binary variables because state trajectories can be predicted by a prediction model that requires less data, fewer equations, is real-valued and produces fewer and less critical errors. The problem reduction module 1106 uses trajectories ({circumflex over (x)}.sup.i(t+k)).sub.k=0.sup.T i=1, . . . , n.sub.a from the ML module to determine the value of some or all the integer or binary variables as shown in
[0116]
[0117] At step 1801, the ML module determines the smallest time step N.sub.g.sup.i of the trajectory at which the goal-reaching constraint is achieved for each device ∥x(t+k).sup.i−x.sub.goal.sup.i∥.sub.1≤d.
[0118] At step 1802, the ML module fixes the binary variables for goal reaching g.sub.k.sup.i all to 0 except for g.sub.k.sup.i=1 for k=N.sub.g.sup.i for each device i=1, . . . , n.sub.a. Then, at step 1803, a collision avoidance constraint among those in (4) and (5a) is selected.
[0119] At step 1804, it is checked whether an inequality constraint is satisfied by the predicted state trajectories at a specific time instant. If the inequality constraint is satisfied by the predicted state trajectories ({circumflex over (x)}.sup.i(t+k)).sub.k=0.sup.T i=1, . . . , n.sub.a, at step 1805, the value of the corresponding binary variable according to (3) and (5) is set to 0. Otherwise, if the inequality constraint is not satisfied by the predicted state trajectories, the ML module proceeds to step 1806.
[0120] At step 1806, the binary variable is set to 1. At step 1807, if all constraints in (4) and (5a) have been selected, the algorithm stops at step 1808 since the values of all binary variables have been fixed to either 0 or 1. Otherwise, it repeats the operation. The latter approach determines fixed values for all the binary variables b.sub.k.sup.i,j in (3), d.sub.k.sup.i,l in (5) and g.sub.k.sup.i in (6) for each device i=1, . . . , n.sub.a and at each time step k=0, . . . , T.
[0121] Some embodiments are based on the realization that one or multiple integers or binary variables in the MIP problem may not be fixed by the problem reduction module 1106. Even if not all integer or binary variables are fixed, the resulting reduced MIP problem can be solved with a significantly reduced amount of computational and memory resources for multi-device motion planning For example, in some embodiments, the binary variables b.sub.k.sup.i,j in (3), d.sub.k.sup.i,l in (5) are fixed by the problem reduction module 1106, but the binary variables g.sub.k.sup.i in (6) can be determined by the reduced solver 1107 for each device i=1, . . . , n.sub.a and at each time step k=0, . . . , T.
[0122] In some embodiments, the collision avoidance constraints or the goal-reaching constraints may be defined differently from the mixed-integer constraints in (3), (5), and in (6), respectively, but the problem reduction module 1106 can be implemented as a procedure as described in
[0123] The reduced solver module 1107 uses the scenario state 1103 to construct the MIP problem (8). But instead of solving this directly, it uses the values of integer or binary variables determined by the problem reduction module 1106 to fix the values of some or all the integer or binary optimization variables in the problem (8). Consequently, rather than starting from the root 601 of the branch and bound tree shown in
{((x.sub.k.sup.i*).sub.k=0.sup.T).sub.i=1.sup.n.sup.
where the conditioning is due to the fixing of the integer or binary variables according to the output of the problem reduction module, and the module also provides a corresponding value of all the integer or binary variables
{((b.sub.k.sup.i,j*).sub.k=0.sup.T).sub.i=1,j=1.sup.n.sup.
which is consistent with the values set by the problem reduction module 1106. By solving the reduced MIP problem, the reduced solver can correct for prediction errors in the trajectories predicted by the ML module, by changing the real-valued variables of the problem (8), and optionally by changing one or multiple of the integer or binary variables that have not yet been fixed. Thus, the reduced solver module solves a simpler problem for multi-device motion planning, but it still has control to avoid collision due to the approximated prediction of the ML module. In case the reduced problem is infeasible, an error is returned. In some embodiments, if an error is returned due to infeasibility of the reduced MIP problem, a correction to one or multiple of the values of the integer or binary variables can be computed by a heuristic or approximate optimization algorithm for mixed-integer programming. For example, a continuous optimization algorithm could be used for a smooth nonlinear relaxation of the MIP problem based on sequential convexifications to compute a feasible solution for the multi-device motion planning problem, starting from the predicted trajectories by the ML module.
[0124] In some embodiments, the reduced solver module 1107 requires the solution of one or multiple real-valued optimization problems using standard methods for real-valued mathematical programming, such as an interior point method, a gradient-based method, ADMM, active set method, or simplex method. In some embodiments, the reduced solver module 1107 requires the solution of a MIP problem with a significantly smaller amount of integer or binary variables, and a heuristic or approximate optimization algorithm can be used to compute a feasible and close to the optimal solution to the multi-device motion planning problem, for example, using rounding schemes or a feasibility pump. In other embodiments of the disclosure, the reduced solver module 1107 computes a solution to the reduced MIP problem based on a global optimization algorithm, for example, including branch and bound, branch and cut, branch and price, or similar methods based on branching and/or cutting plane strategies.
[0125] According to
[0126] Some embodiments are based on the realization that the full solver module 1109 uses an iterative procedure for mixed-integer programming that can be stopped at any time, and it will always provide a feasible solution that is at least as optimal as the warm-start solution from the reduced solver module 1107. In particular, the full solver module may explore only a fixed and relatively small number of nodes in the branch and bound tree in
[0127] In case a failure is returned by the problem reduction module 1106 or by the reduced solver module 1107, each device is stopped and moved slowly away from one or multiple of the closest obstacles in a random and/or safe direction and the approach is repeated. Since the infeasibility is likely due to approximation of the discontinuities which are restricted to relatively small regions and mostly close to obstacles, a failure recovery mechanism based on a new prediction from the ML module using the updated scenario state has high probability to provide feasible trajectories for each of the devices at future time steps.
Embodiments
[0128] The description provides exemplary embodiments only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims. Specific details are given in the following description to provide a thorough understanding of the embodiments. However, understood by one of ordinary skill in the art can be that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicated like elements.
[0129] Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
[0130] Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, using machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine-readable medium. A processor(s) may perform the necessary tasks.
[0131] Further, embodiments of the present disclosure and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Further, some embodiments of the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus. Further still, program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
[0132] A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[0133] Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
[0134] To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
[0135] Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
[0136] The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[0137] Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the aspect of the append claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.