METHOD AND DEVICE FOR DETERMINING AN OPTIMIZED CONTROL STRATEGY OF A MOBILE AGENT IN A DYNAMIC OBJECTS ENVIRONMENT
20220050429 · 2022-02-17
CPC classification
B60W2400/00
PERFORMING OPERATIONS; TRANSPORTING
B60W30/0953
PERFORMING OPERATIONS; TRANSPORTING
G05B13/041
PHYSICS
International classification
B60W30/095
PERFORMING OPERATIONS; TRANSPORTING
Abstract
A computer-implemented method for determining an appropriate control strategy for a mobile agent for an environment with one or more dynamic objects. The method includes: providing a number of different scenarios, wherein a number of dynamic objects is associated with each of the scenarios and, for each of the scenarios, each of the dynamic objects is associated with a start, a goal, and a behavior specification; providing a number of control strategy candidates for the mobile agent; benchmarking each of the control strategy candidates in any of the scenarios; and selecting the control strategy for the mobile agent depending on the result of the benchmarking of the control strategy candidates.
Claims
1. A computer-implemented method for determining an appropriate control strategy for a mobile agent for an environment with one or more dynamic objects, comprising the following steps: providing a number of different scenarios, wherein, to each of the scenarios, a number of dynamic objects is associated, wherein, for each of the scenarios, each of the dynamic objects is associated with a start, a goal, and a behavior specification; providing a number of control strategy candidates for the mobile agent; benchmarking each of the control strategy candidates in any of the scenarios; and selecting a control strategy for the mobile agent depending on a result of the benchmarking of the control strategy candidates.
2. The method according to claim 1, wherein each of the control strategy candidates applies a cost function or is rule-based to optimize a motion trajectory for the mobile agent in the environment with the dynamic objects.
3. The method according to claim 1, wherein the benchmarking of each respective control strategy candidate of the control strategy candidates in every scenario is performed depending on results of optimization problems to reflect behaviors of each dynamic object the mobile agent may face during application of the respective control strategy candidate in the benchmarking.
4. The method according to claim 3, wherein the optimization is performed repeatedly and/or simultaneously with the benchmarking.
5. The method according to claim 1, wherein the control strategy candidates include at least one of the following: a Dynamic Window Approach, an Elastic Band method, and a Timed Elastic Band method.
6. The method according to claim 1, wherein the result of the benchmarking is a benchmark indicator for each of the control strategy candidates.
7. The method according to claim 3, wherein the result of the benchmarking is obtained by means of a function of one or more key performance indicators, including optimized cost values, for each of the scenarios, including a mean of all cost values of the scenarios for each individual control strategy candidate.
8. A device for determining an appropriate control strategy for a mobile agent for an environment with one or more dynamic objects, wherein the device is configured to: provide a number of different scenarios, wherein, to each of the scenarios, a number of dynamic objects is associated, wherein, for each of the scenarios, each of the dynamic objects is associated with a start, a goal, and a behavior specification; provide a number of control strategy candidates for the mobile agent; benchmark each of the control strategy candidates in any of the scenarios; and select the control strategy for the mobile agent depending on a result of the benchmarking of the control strategy candidates.
9. A computer program product, comprising: a non-transitory computer readable medium on which is stored computer program code for determining an appropriate control strategy for a mobile agent for an environment with one or more dynamic objects, the computer program code, when executed by a computer, causing the computer to perform the following steps: providing a number of different scenarios, wherein, to each of the scenarios, a number of dynamic objects is associated, wherein, for each of the scenarios, each of the dynamic objects is associated with a start, a goal, and a behavior specification; providing a number of control strategy candidates for the mobile agent; benchmarking each of the control strategy candidates in any of the scenarios; and selecting a control strategy for the mobile agent depending on a result of the benchmarking of the control strategy candidates.
10. A non-transitory machine readable medium on which is recorded a program for determining an appropriate control strategy for a mobile agent for an environment with one or more dynamic objects, the program, when executed by a computer, causing the computer to perform the following steps: providing a number of different scenarios, wherein, to each of the scenarios, a number of dynamic objects is associated, wherein, for each of the scenarios, each of the dynamic objects is associated with a start, a goal, and a behavior specification; providing a number of control strategy candidates for the mobile agent; benchmarking each of the control strategy candidates in any of the scenarios; and selecting a control strategy for the mobile agent depending on a result of the benchmarking of the control strategy candidates.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] Example embodiments of the present invention are described in more detail in conjunction with the figures.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0027] The mobile agent 1 may have a configuration as schematically shown in the figures. In particular, the mobile agent 1 includes a control unit 11 that controls its operation.
[0028] The control unit 11 may have a microprocessor or a microcontroller, as well as a memory for storing data and algorithm code. Furthermore, the mobile agent 1 has an actuation unit 12 for interaction with the environment E; for instance, the actuation unit 12 may include a traction motor for driving wheels of the mobile agent 1 to move it in the environment E. The actuation unit 12 is controlled by the control unit 11.
[0029] Furthermore, the mobile agent 1 may include a sensor system 13 for sensing the environment E, particularly to detect other objects and structures, which may allow the localization of the mobile agent 1 in the environment E and the identification of dynamic objects and their poses in the environment E. The sensor system 13 may include radar, lidar, and/or cameras to scan the environment E surrounding the mobile agent 1.
[0030] When configuring such a mobile agent 1, a control strategy has to be selected and installed which allows the mobile agent 1 to autonomously perform its task in the environment E. The control strategy should implement collision avoidance/handling to prevent motion trajectories for which a collision with another dynamic object is likely, and/or to cope with collisions.
[0031] In simulation-based benchmarking of navigation algorithms for mobile agents, it is generally difficult to effectively model the behavior of the dynamic objects 2, 3, 4. The behavior of each of the dynamic objects 2, 3, 4 may contribute to, or in fact disturb, the performance of the task of the mobile agent 1.
[0032] For determining the appropriate control strategy for the mobile agent 1, a method is performed, e.g., on a data processing system, as described in more detail by the flowchart in the figures.
[0033] In step S1, a number of different scenarios is provided, wherein each of the scenarios is associated with a number of dynamic objects 2, 3, 4 distributed in the environment E. Each of the dynamic objects 2, 3, 4 is associated with a start S and a goal G for a motion task, as well as with a behavior characterizing the policy of its respective actions, indicated by a behavior indicator.
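A scenario as provided in step S1 could, purely for illustration, be specified as in the following Python sketch; the class and field names (Scenario, DynamicObjectSpec) and the string-valued behavior indicator are hypothetical choices, not prescribed by the method:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DynamicObjectSpec:
    """Specification of one dynamic object within a scenario (step S1)."""
    start: Tuple[float, float]   # start S of the motion task (x, y)
    goal: Tuple[float, float]    # goal G of the motion task (x, y)
    behavior: str                # behavior indicator, e.g. "cooperative",
                                 # "greedy", or "sabotage"

@dataclass
class Scenario:
    """One benchmark scenario with its associated dynamic objects."""
    name: str
    dynamic_objects: List[DynamicObjectSpec]

# Example: a scenario with two dynamic objects of differing behavior.
scenario = Scenario(
    name="corridor_crossing",
    dynamic_objects=[
        DynamicObjectSpec(start=(0.0, 0.0), goal=(5.0, 0.0),
                          behavior="cooperative"),
        DynamicObjectSpec(start=(5.0, 2.0), goal=(0.0, 2.0),
                          behavior="greedy"),
    ],
)
```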
[0034] In step S2, a benchmarking process is set up in which a number of control strategy candidates for the mobile agent 1 is provided, wherein each control strategy candidate may apply a cost function to optimize a motion trajectory for the mobile agent 1 in the environment with the dynamic objects 2, 3, 4, or may follow a rule-based strategy. The control strategy candidates may include Elastic Bands (EB), the Dynamic Window Approach (DWA), and Timed Elastic Bands (TEB).
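Since the benchmarking process treats all candidates uniformly, a common interface for the control strategy candidates is convenient; the following sketch assumes a hypothetical plan method mapping the current states to the next control action (the concrete EB, DWA, and TEB implementations are not reproduced here):

```python
from typing import Protocol, Sequence

class ControlStrategy(Protocol):
    """Common interface for control strategy candidates (EB, DWA, TEB, ...)."""

    name: str

    def plan(self,
             agent_state: Sequence[float],
             object_states: Sequence[Sequence[float]]) -> Sequence[float]:
        """Compute the next control action for the mobile agent from its own
        state and the currently observed states of the dynamic objects."""
        ...
```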
[0035] The optimization can be performed repeatedly and simultaneously with the ongoing benchmarking. The reason is that the exact behavior (in terms of trajectories) of the dynamic objects 2, 3, 4 depends on the trajectory of the mobile agent 1.
[0036] Basically, EB, DWA, and TEB aim to plan the motion of a mobile agent 1 along a given horizon while minimizing a given cost function and while adhering to the kinodynamic constraints of the mobile agent 1. After commanding the first control action to the mobile agent 1, the optimization is continuously repeated, which is known in control theory as model predictive control. As computing the optimal solution is demanding, the approaches indicated above approximate the optimal solution with different optimization strategies.
[0037] The DWA performs a sample-based optimization. It samples control actions and rolls out the trajectory for each particular sampled action by simulating it over a specified horizon length based on the agent's motion model. After rolling out predictions for all samples, the best motion trajectory is selected based on a specified cost function and constraints.
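A rough sketch of this sample-based optimization could look as follows, assuming a simple unicycle motion model and user-supplied cost and feasibility functions; all names are illustrative:

```python
import math
from typing import Callable, List, Optional, Tuple

State = Tuple[float, float, float]   # (x, y, heading)
Action = Tuple[float, float]         # (linear velocity v, angular velocity w)

def rollout(state: State, action: Action, horizon: int, dt: float) -> List[State]:
    """Simulate one sampled action over the horizon with a unicycle motion model."""
    x, y, th = state
    v, w = action
    trajectory = []
    for _ in range(horizon):
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        trajectory.append((x, y, th))
    return trajectory

def dwa_step(state: State,
             sampled_actions: List[Action],
             horizon: int,
             dt: float,
             cost: Callable[[List[State]], float],
             feasible: Callable[[List[State]], bool]) -> Optional[Action]:
    """Roll out all sampled actions and select the one whose predicted
    trajectory is feasible and has the lowest cost (None if none is feasible)."""
    best_action: Optional[Action] = None
    best_cost = float("inf")
    for action in sampled_actions:
        traj = rollout(state, action, horizon, dt)
        if not feasible(traj):       # e.g. collision or kinodynamic violation
            continue
        c = cost(traj)               # specified cost function
        if c < best_cost:
            best_action, best_cost = action, c
    return best_action
```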
[0038] The TEB primarily seeks the time-optimal solution. The approach discretizes the trajectory along the prediction horizon in terms of time and applies a continuous numerical optimization scheme. TEB is able to optimize multiple trajectories in different topologies at once in order to find the solution. Since the approach relies on continuous optimization, the cost function must be smooth.
[0039] In step S3, each of the control strategy candidates is benchmarked in scenarios with different kinds/behaviors of dynamic objects 2, 3, 4. The behavior of the dynamic objects 2, 3, 4 is defined and tuned by an optimal control problem, as introduced below. It can describe different degrees of cooperativity of the dynamic objects 2, 3, 4.
[0040] Each of the control strategy candidates may be benchmarked in each of the scenarios depending on a predefined key performance indicator, representing, e.g., a minimum distance between different agents, an overall time to complete a task or to reach a goal, a distance measure to a desired path to follow, a control effort measure, an energy consumption, or the like; for example, the benchmarking may depend on the time to complete the given task.
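Purely as an illustration of how such key performance indicators might be computed from a recorded simulation run, consider the following sketch; the trajectory representation (lists of 2D points sampled at a fixed time step) and the helper names are assumptions, not part of the method:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def min_agent_distance(agent_traj: List[Point],
                       object_trajs: List[List[Point]]) -> float:
    """KPI: minimum distance between the mobile agent and any dynamic object."""
    return min(math.dist(a, o)
               for obj in object_trajs
               for a, o in zip(agent_traj, obj))

def time_to_goal(agent_traj: List[Point], goal: Point, dt: float,
                 tolerance: float = 0.1) -> float:
    """KPI: overall time until the goal is first reached (inf if never)."""
    for k, p in enumerate(agent_traj):
        if math.dist(p, goal) <= tolerance:
            return k * dt
    return float("inf")

def path_deviation(agent_traj: List[Point], desired_path: List[Point]) -> float:
    """KPI: mean distance of the driven trajectory to a desired path."""
    return sum(min(math.dist(p, q) for q in desired_path)
               for p in agent_traj) / len(agent_traj)
```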
[0041] The key performance indicator is determined based on an optimization problem to reflect the “intelligent-like” behavior of the dynamic objects 2, 3, 4 that the mobile agent 1 may face during the application of the control strategy candidate in the benchmarking process.
[0042] Let $\mathcal{A}$ denote the set of actors, including the mobile agent 1 and the dynamic objects 2, 3, 4, and let $\mathcal{E} \subset \mathcal{A}$ denote the external actors as seen from a given actor. Then $x_k^i \in \mathbb{R}^{n_x}$ denotes the state of actor $i \in \mathcal{A}$ at time step $k$, and $v_k^i \in \mathbb{R}^{n_v}$ its control degree of freedom. For example, for dynamic object 2, the external actors are denoted by 1, 3, and 4. The dynamic behavior of the dynamic object is governed by the difference equation $f: \mathbb{R}^{n_x} \times \mathbb{R}^{n_v} \rightarrow \mathbb{R}^{n_x}$, i.e., $x_{k+1}^i = f(x_k^i, v_k^i)$. Note that the true dynamics of the mobile agent 1 and the dynamic objects 2, 3, 4 in the simulation environment may not be known exactly, e.g., they may be governed by a higher-fidelity model.
[0043] The optimal control strategy to be determined is defined by a cost function $J$ that consists of the sum of the stage costs $J_0, \ldots, J_{N-1}$ and the terminal cost $J_N$:

$$J = \sum_{k=0}^{N-1} J_k\bigl(x_k, u_k, \hat{z}_k^1, \ldots\bigr) + J_N\bigl(x_N, \hat{z}_N^1, \ldots\bigr),$$

subject to the difference equations for the respective dynamic objects 2, 3, 4, the vector fields of inequality constraints $h_0, \ldots, h_N$, and the model of the behavior of the other actors, described by a minimization/optimization problem for every actor. The inequality constraints can encode, for instance, the mutual collision avoidance between all actors and with the environment (encoded by E), but they also specify physical limits of the mobile agent 1 and the other dynamic objects 2, 3, 4 (acceleration, velocity, steering angles, etc.). Consider the structure of the (bi-level) optimal control problem below:

$$\begin{aligned}
\min_{u_0, \ldots, u_{N-1}} \quad & \sum_{k=0}^{N-1} J_k\bigl(x_k, u_k, \hat{z}_k^1, \ldots\bigr) + J_N\bigl(x_N, \hat{z}_N^1, \ldots\bigr) \\
\text{s.t.} \quad & x_0 = \hat{x}_0, \qquad x_{k+1} = f(x_k, u_k), \\
& h_k\bigl(x_k, u_k, \hat{z}_k^1, \ldots\bigr) \leq 0, \qquad k = 0, \ldots, N, \\
& \{\hat{z}_{0 \ldots N}^i\}_{i \in \mathcal{E}} \ \text{solving analogous lower-level problems with initial states } \hat{z}_0^i.
\end{aligned}$$
[0044] Here, $\hat{x}_0$ and $\hat{z}_0^i$ represent, from the perspective of dynamic object 2, the currently measured or estimated states of the ego vehicle (dynamic object 2) and of the external actors (mobile agent 1 and dynamic objects 3, 4), respectively. Every dynamic object 2, 3, 4 solves this optimization problem. The respective dynamic object's ego state is denoted by $x_k$, while the $z_k^i$ variables denote the states of the “external actors” with respect to the respective dynamic object 2, 3, 4; this includes the mobile agent 1. In particular, the trajectories $\{\hat{z}_{0 \ldots N}^i\}_{i \in \mathcal{E}}$ denote the expected behavior of the other agents, driven by an approximation of their objective, dynamics, and constraints.
[0045] In the shown example, if the optimization is made from the perspective of dynamic object 2, the external actors are the mobile agent 1, the dynamic object 3, and the dynamic object 4. Likewise, from the perspective of dynamic object 3, the external actors are the mobile agent 1, the dynamic object 2, and the dynamic object 4.
[0046] The aim is to let the solution of this optimal control problem exhibit the desired, tunable behavior of the dynamic objects. Assume, for instance, that the dynamic object is aware of the cost function that is being optimized by the mobile agent 1 and the other dynamic objects, or that it at least has a reasonable approximation thereof. The stage cost for stage $k$ can, for instance, be defined as a weighted sum of the isolated stage costs of all agents using the weighting coefficients $\{\alpha_k^i\}_{i \in \mathcal{E}}$:

$$J_k = \phi_k(x_k, u_k) + \sum_{i \in \mathcal{E}} \alpha_k^i \, \psi_k^i\bigl(\hat{z}_k^i\bigr),$$

[0047] where $\phi_k$ denotes the stage cost for the dynamic object, which will effectively achieve the task as this cost function is minimized, whereas $\psi_k^i$ denotes the (estimated) stage cost of the external actors. This cost function can effectively encode different kinds of cooperative behavior. Considering that actor $i = 1 \in \mathcal{E}$ refers to the benchmarked mobile agent 1, examples include (see also the sketch after this list):
[0048] Cooperative behavior: $\alpha_k^i = 1 \ \forall k, i \in \mathcal{E}$.
[0049] Non-cooperative (greedy) behavior: $\alpha_k^i = 0 \ \forall k, i \in \mathcal{E}$.
[0050] Sabotaging the benchmarked robot: $\alpha_k^1 = -1 \ \forall k$, $\alpha_k^i = 0 \ \forall k, i \in \mathcal{E} \setminus \{1\}$, and $\phi_k := 0 \ \forall k$.
[0051] The $\{\alpha_k^i\}$ may also be selected by some priority-assignment scheme.
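The weighted stage cost and the three behavior modes above can be summarized in the following sketch; the dictionaries mapping actor indices to cost values and weights are an assumed representation:

```python
from typing import Dict, Sequence

def stage_cost(phi_k: float, psi_k: Dict[int, float],
               alpha_k: Dict[int, float]) -> float:
    """J_k = phi_k + sum_i alpha_k^i * psi_k^i over the external actors i.
    For the 'sabotage' mode, phi_k is additionally set to zero by the caller."""
    return phi_k + sum(alpha_k[i] * psi_k[i] for i in psi_k)

def alphas_for_behavior(behavior: str,
                        external_actors: Sequence[int],
                        benchmarked_agent: int = 1) -> Dict[int, float]:
    """Weighting coefficients alpha_k^i encoding the degree of cooperativity."""
    if behavior == "cooperative":    # alpha_k^i = 1 for all external actors
        return {i: 1.0 for i in external_actors}
    if behavior == "greedy":         # alpha_k^i = 0: only the own cost counts
        return {i: 0.0 for i in external_actors}
    if behavior == "sabotage":       # alpha_k^1 = -1, all other weights 0
        return {i: -1.0 if i == benchmarked_agent else 0.0
                for i in external_actors}
    raise ValueError(f"unknown behavior indicator: {behavior}")
```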
[0052] Each dynamic object can be assumed to have perfect knowledge of its own current state by querying the global state of the simulator. Similarly, for the state of the environment E, the global state of the simulator can be queried without limitations. The model predictive control algorithm solves the optimal control problem in the above equations at a fixed control rate of, e.g., 10 Hz, based on the new states of the dynamic objects, the external actors, and the environment. Each time it is applied, it executes only the first part of the optimized trajectory $u_k$, as is common in model predictive control.
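The fixed-rate loop described in paragraph [0052] could be organized as in the following sketch; the simulator and solver interfaces (query_global_state, solve, apply_control) are hypothetical:

```python
import time

CONTROL_RATE_HZ = 10.0   # fixed control rate, e.g. 10 Hz as in paragraph [0052]

def mpc_loop(simulator, solver, horizon_steps: int) -> None:
    """Repeatedly solve the optimal control problem on fresh states and
    apply only the first part of the optimized control trajectory."""
    period = 1.0 / CONTROL_RATE_HZ
    while not simulator.done():
        # Query the global simulator state (assumed perfectly known here).
        ego_state, external_states, env_state = simulator.query_global_state()
        # Solve the (bi-level) optimal control problem over the horizon.
        u_trajectory = solver.solve(ego_state, external_states, env_state,
                                    horizon_steps)
        # Apply only the first control action, as is common in MPC.
        simulator.apply_control(u_trajectory[0])
        time.sleep(period)
```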
[0053] In step S4, the control strategy for the mobile agent 1 is selected depending on the result of the benchmarking of the control strategy candidates. The result of the benchmarking can be a benchmark indicator for each of the control strategy candidates, wherein the benchmark indicator is obtained by means of a given function of the optimized cost values for each of the scenarios, such as a mean of all cost values of the scenarios for each individual control strategy candidate. The result of the benchmarking may differ for every scenario and behavior of the dynamic objects.
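Steps S3 and S4 can then be combined into a benchmarking loop such as the following sketch, where the benchmark indicator is the mean of all per-scenario cost values and the run_scenario helper (executing one candidate in one simulated scenario and returning its optimized cost value) is assumed:

```python
from statistics import mean
from typing import Callable, Dict, List

def benchmark(candidates: List["ControlStrategy"],
              scenarios: List["Scenario"],
              run_scenario: Callable[["ControlStrategy", "Scenario"], float],
              ) -> Dict[str, float]:
    """Step S3: benchmark every candidate in every scenario. The benchmark
    indicator here is the mean of all per-scenario cost values."""
    return {candidate.name: mean(run_scenario(candidate, scenario)
                                 for scenario in scenarios)
            for candidate in candidates}

def select_strategy(indicators: Dict[str, float]) -> str:
    """Step S4: select the candidate with the best (lowest) benchmark indicator."""
    return min(indicators, key=indicators.get)
```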
[0054] In step S5, the mobile agent 1 is provided with the selected control strategy and operated therewith.
[0055] Finally, the control strategy candidate having the best benchmark result for the intended use-case can be used for configuration of the mobile agent 1.