Method and Device for Optimum Parameterization of a Driving Dynamics Control System for Vehicles
20230001940 · 2023-01-05
Inventors
- Andreas Doerr (Stuttgart, DE)
- Felix Berkenkamp (Muenchen, DE)
- Maksym Lefarov (Stuttgart, DE)
- Valentin Loeffelmann (Dielheim, DE)
Cpc classification
B60W50/085
PERFORMING OPERATIONS; TRANSPORTING
B60W50/0098
PERFORMING OPERATIONS; TRANSPORTING
B60W2510/182
PERFORMING OPERATIONS; TRANSPORTING
B60W2050/0082
PERFORMING OPERATIONS; TRANSPORTING
B60W2050/0006
PERFORMING OPERATIONS; TRANSPORTING
B60W50/12
PERFORMING OPERATIONS; TRANSPORTING
G06N7/01
PHYSICS
B60W2050/0031
PERFORMING OPERATIONS; TRANSPORTING
International classification
B60W50/08
PERFORMING OPERATIONS; TRANSPORTING
Abstract
A method and device parameterize a driving dynamics controller of a vehicle, which intervenes in a controlling manner in a driving dynamics of the vehicle. The driving dynamics controller ascertains an action depending on a vehicle state. The method includes providing a model for predicting a vehicle state. The model is configured to predict a subsequent vehicle state depending on the vehicle state and the action. At least one data tuple is ascertained including a sequence of vehicle states and respectively associated actions. The vehicle states are ascertained by the driving dynamics controller using the model depending on an ascertained action. The parameters of the driving dynamics controller are adjusted such that a cost function is minimized, the cost function ascertaining costs of the trajectory depending on the vehicle states and on the ascertained actions of the respectively associated vehicle states and being dependent on the parameters of the driving dynamics controller.
Claims
1. A method for parameterizing a driving dynamics controller of a vehicle for intervening in a controlling manner in a driving dynamics of the vehicle, comprising: ascertaining, using the driving dynamics controller, an action depending on a vehicle state of the vehicle; predicting a subsequent vehicle state of the vehicle depending on the ascertained vehicle state and the ascertained action using a model; ascertaining at least one data tuple comprising a sequence of the vehicle states and respectively associated actions, wherein the vehicle states are ascertained by the driving dynamics controller using the model and depend on corresponding ascertained actions; and adjusting parameters of the driving dynamics controller such that a cost function is minimized, wherein the cost function ascertains costs of the data tuple depending on the vehicle states of the data tuple and on the ascertained actions of the respectively associated vehicle states, and wherein the cost function is dependent on the parameters of the driving dynamics controller.
2. The method according to claim 1, wherein the model is a machine learning system, a parameterization of which has been learned depending on detected driving maneuvers of the vehicle or another vehicle.
3. The method according to claim 1, further comprising: detecting a trajectory of a real driving maneuver of the vehicle; and creating a correction model depending on the detected trajectory and the model, such that the correction model corrects outputs of the model in such a way that the corrected outputs substantially correspond to the detected trajectory.
4. The method according to claim 3, wherein the model is deterministic and the correction model is dependent on time.
5. The method according to claim 1, wherein: a plurality of different models are provided, and the data tuple is detected randomly for one model of the plurality of different models.
6. The method according to claim 5, wherein the different models differ from one another in that they each describe different dynamics of external variables or different dynamics of variables of the vehicle.
7. The method according to claim 5, wherein: a respective data tuple is detected for each of the models, and the parameters are changed depending on all of the data tuples.
8. The method according to claim 1, further comprising: filtering the vehicle states using a Kalman filter.
9. The method according to claim 1, wherein: the driving dynamics controller has a modular controller structure, and the parameters are adjusted in such a way that the adjusted parameters are within predefined value ranges.
10. The method according to claim 1, wherein the driving dynamics controller is a radial basis function network.
11. The method according to claim 1, wherein: after the parameters have been adjusted, a vehicle state is detected during operation of the vehicle, and an actuator of the vehicle is actuated depending on the action using the driving dynamics controller depending on the detected vehicle state.
12. The method according to claim 2, wherein: the driving dynamics controller includes an antilock braking system (ABS) controller and outputs an action which characterizes a braking force, the physical model comprises a plurality of submodels, and the submodels are each a physical model of a component of the vehicle.
13. The method according to claim 1, wherein: the cost function is a weighted superposition of a plurality of functions, and the functions characterize a difference between a current slip of tires of the vehicle and a target slip, a distance covered since intervention of the driving dynamics controller, and temporal deviations in the distance covered.
14. The method according to claim 1, wherein a device is configured to execute the method.
15. The method according to claim 1, wherein a computer program comprises instructions that, when the computer program is executed by a computer, cause the computer to execute the method.
16. The method according to claim 15, wherein the computer program is stored on a non-transitory machine-readable storage medium.
17. The method according to claim 1, wherein the model is a physical model configured to describe driving dynamics of the vehicle along a longitudinal, lateral, and horizontal axis of vehicle.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0052] Embodiments of the disclosure are explained in more detail below with reference to the appended drawings. In the drawings:
[0053]
[0054]
[0055]
DETAILED DESCRIPTION
[0056]
[0057] The vehicle 100 may generally be a motor vehicle which is controlled by a driver or a partly autonomous or even fully autonomous vehicle. In other embodiments, the motor vehicle may be a wheeled vehicle, a track vehicle or a rail vehicle. It is also conceivable that the motor vehicle is a two-wheeled vehicle, such as a bicycle, motor bike etc., for example.
[0058] A state of the vehicle is detected at preferably regular time intervals using at least one sensor 30, which may also be provided by a plurality of sensors. The state may also be ascertained independently of detected sensor values. The sensor 30 is preferably an acceleration sensor (in the vehicle longitudinal direction, but could also be a 3D sensor in all axes), a wheel speed sensor (on all wheels) and/or a rotation rate sensor (about a vertical axis, but could also be about all other axes).
[0059] The control system 40 receives a sequence of sensor signals S from the sensor 30 in an optional reception unit which converts the sequence of sensor signals S into a sequence of preprocessed sensor signals.
[0060] The sequence of sensor signals S or preprocessed sensor signals is supplied to a driving dynamics controller 60 of the control system 40. The driving dynamics controller 60 is preferably parameterized by parameters θ which are stored in and provided by a parameter memory P.
[0061] The driving dynamics controller 60 ascertains an action, also referred to as actuation signal A in the following text, depending on the sensor signals S and its parameters θ, said actuation signal being transmitted to an actuator 10 of the vehicle. The actuator 10 receives the actuation signal A, is actuated accordingly and subsequently executes the corresponding action. It is also conceivable that the actuator 10 is configured to convert the actuation signal A into a direct actuation signal. If, for example, the actuator 10 receives a brake force as actuation signal A, the actuator can convert said brake force into a corresponding brake pressure which is used to directly actuate brakes. In this case, the actuator 10 may be a brake system comprising the brakes of the vehicle 100. In addition or as an alternative, the actuator 10 may be a drive or a steering system of the vehicle 100.
[0062] In further preferred embodiments, the control system 40 comprises one or a plurality of processors 45 and at least one machine-readable storage medium 46 on which instructions are stored, which, when they are executed on the processors 45, cause the control system 40 to execute the method according to the disclosure.
[0063] In further embodiments, a display unit 10a is provided in addition to the actuator 10. The display unit 10a is provided, for example, to display an intervention of the driving dynamics controller 60 and/or to output a warning that the driving dynamics controller 60 is about to intervene.
[0064] The driving dynamics controller 60 is provided by a parameterized function a=ƒ(s, θ) which outputs the actuation signal A depending on the state s and/or on the sensor signals S of the sensor 30. In the event that the driving dynamics controller 60 outputs an actuation signal A for the actuator 10, where the actuator 10 has a plurality of actuators, the actuation signal A can have a respective control signal for each of the actuators. The individual actuators may be the individual brakes of the vehicle 100.
[0065] In a preferred exemplary embodiment, the driving dynamics controller 60 is an ABS controller, where this controller outputs a brake force or a brake pressure as actuation signal. In this case, the driving dynamics controller 60 preferably outputs a brake pressure or brake force for each of the brakes of the wheels or for each of the axles of the vehicle 100 in order to be able to control the wheels individually.
[0066] The driving dynamics controller 60 preferably has an interpretable controller structure. This can be provided, for example, by defining valid parameter limits within the controller. This has the advantage that the behavior of the driving dynamics controller 60 can be understood in each situation.
[0067] Examples of the parameterized function ƒ of the driving dynamics controller 60 are as follows:
[0068] A driving dynamics controller 60 with an interpretable controller structure may be provided, for example, by a driving dynamics controller 60 that is structured like a decision tree.
[0069] In order to ascertain an action based on the decision tree, the root node of the tree is taken as the starting point. For each node, an attribute is retrieved (for example a vehicle state) and the following node is selected depending on said attribute. This procedure is continued until a leaf of the decision tree is reached. The leaf characterizes one action of a plurality of possible actions. The leaf may characterize a brake pressure build-up/reduction, for example.
[0070] In this example, the parameters θ are decision threshold values or the like.
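By way of illustration only, a decision-tree controller of this kind could be sketched in Python as follows. The attributes used (wheel slip, wheel acceleration), the threshold names and values, and the three leaf actions are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class TreeParams:
    # Hypothetical decision thresholds; together they play the role of theta.
    slip_high: float = 0.20        # above this slip the wheel is treated as close to locking
    slip_low: float = 0.08         # below this slip more brake pressure may be built up
    wheel_decel_max: float = 30.0  # m/s^2; stronger wheel deceleration hints at imminent locking


def tree_controller(state: dict, theta: TreeParams) -> str:
    """Walk from the root node to a leaf and return the action stored in the leaf."""
    # Root node: compare the wheel slip with the upper threshold.
    if state["slip"] > theta.slip_high:
        return "reduce_pressure"
    # Second node: compare the wheel deceleration with its threshold.
    if -state["a_wheel"] > theta.wheel_decel_max:
        return "hold_pressure"
    # Third node: compare the wheel slip with the lower threshold.
    if state["slip"] < theta.slip_low:
        return "build_pressure"
    return "hold_pressure"


theta = TreeParams()
print(tree_controller({"slip": 0.25, "a_wheel": -5.0}, theta))  # reduce_pressure
print(tree_controller({"slip": 0.05, "a_wheel": -5.0}, theta))  # build_pressure
```

Because every parameter is a threshold on a physical quantity, valid ranges can be stated for each of them, which is one way in which such a structure remains interpretable.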
[0071] The driving dynamics controller 60 may alternatively be provided by an RBF (radial basis function) network or by a deep RNN policy.
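A radial basis function controller could, for example, look like the following Python sketch. Gaussian basis functions, a scalar action and the numerical values are assumptions chosen for illustration; the centers, widths and weights together would form the parameters θ.

```python
import math


def rbf_controller(state, centers, widths, weights, bias=0.0):
    """Return a scalar action as a weighted sum of Gaussian radial basis functions."""
    action = bias
    for center, width, weight in zip(centers, widths, weights):
        # Squared Euclidean distance between the state and the basis center.
        dist_sq = sum((s - c) ** 2 for s, c in zip(state, center))
        action += weight * math.exp(-dist_sq / (2.0 * width ** 2))
    return action


# Hypothetical two-dimensional state: (slip, normalized wheel deceleration).
centers = [(0.05, 0.0), (0.15, 0.5), (0.30, 1.0)]
widths = [0.1, 0.1, 0.1]
weights = [1.0, 0.5, -1.0]
print(rbf_controller((0.05, 0.0), centers, widths, weights))
```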
[0072] It should be noted that the parameterized function ƒ may also be any other mathematical function that maps the state of the vehicle onto an actuation signal depending on the parameters.
[0073]
[0074] The method begins with step S21. In this step, driving data of the vehicle 100 are collected. The driving data are, for example, a series of data s.sub.0, s.sub.1, . . . , s.sub.t, . . . , s.sub.T that describe a state s of the vehicle 100 along a driving maneuver. Said driving data are preferably a data tuple, comprising the state data s.sub.t and action data a.sub.t at each time t of the maneuver.
[0075] In the event that the driving dynamics controller 60 is an ABS controller, a brake process can be recorded, for example, using a known ABS controller or by a driver driving the vehicle 100. The state data (s.sub.t) comprise, for example, the following sensor data: vehicle speed v.sub.veh and acceleration a.sub.veh, and preferably the following sensor data per wheel of the vehicle 100: wheel speed v.sub.wheel, acceleration a.sub.wheel and jerk j.sub.wheel. The action data (a.sub.t) are the brake forces selected in the respective state. A variable that characterizes a road surface is preferably also recorded.
[0076] As an alternative, the driving data can be generated by simulation, in which a fictitious vehicle executes one or a plurality of (brake) maneuvers in a simulated environment.
[0077] Step S22 can subsequently follow. In said step, the recorded state data s are partly reconstructed. This is because not all of the required information about the state of a system (for example a car) is typically measured by internal sensors (for example inclination of the vehicle, suspension behavior, wheel acceleration). This latent information has to be recovered for the learning and the modeling in order to enable predictions and optimizations. This task is typically referred to as latent state inference (for example in hidden Markov models) and is solved by filtering/smoothing algorithms (for example Kalman filters). The vehicle states are preferably reconstructed in step S22 by means of a Kalman filter.
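As a minimal sketch of such a reconstruction, the following Python code estimates an unmeasured wheel acceleration from measured wheel speeds using a two-state Kalman filter. The constant-acceleration process model, the noise values q_v, q_a and r, and the initial covariance are assumptions made for this example only.

```python
def kalman_speed_accel(speeds, dt, q_v=1e-3, q_a=1.0, r=0.05):
    """Estimate (speed, acceleration) per time step from speed measurements only."""
    v, a = speeds[0], 0.0           # initial state estimate
    p00, p01, p11 = 1.0, 0.0, 10.0  # symmetric 2x2 covariance, stored as three entries
    estimates = []
    for z in speeds:
        # Prediction with a constant-acceleration model: v <- v + dt * a.
        v = v + dt * a
        p00, p01, p11 = (p00 + 2.0 * dt * p01 + dt * dt * p11 + q_v,
                         p01 + dt * p11,
                         p11 + q_a)
        # Measurement update; only the speed is observed.
        innovation = z - v
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        v, a = v + k0 * innovation, a + k1 * innovation
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
        estimates.append((v, a))
    return estimates


# Hypothetical noise-free braking ramp with a deceleration of 8 m/s^2.
dt = 0.01
speeds = [20.0 - 8.0 * dt * k for k in range(300)]
v_est, a_est = kalman_speed_accel(speeds, dt)[-1]
print(round(v_est, 2), round(a_est, 2))
```

The acceleration is never measured directly in this sketch; it is inferred from the way successive speed measurements deviate from the prediction.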
[0078] After step S21 or step S22 has ended, step S23 follows. In this step, a model P is provided. The model P can either be created based on the recordings according to step S21 or be provided as a physical model.
[0079] The model P(s.sub.t+1|s.sub.t, a.sub.t) is a model that predicts a subsequent vehicle state s.sub.t+1 at an immediately following time t+1 depending on a vehicle state s.sub.t at a time t and an actuation signal selected depending thereon.
[0080] The model P(s.sub.t+1|s.sub.t, a.sub.t) is preferably a physical model of the first order. That is to say, the physical model comprises equations which describe physical relationships and predict the subsequent vehicle state s.sub.t+1, in particular in a deterministic manner, depending on the current vehicle state s.sub.t and the action a.sub.t. By way of example, for the driving dynamics controller 60 for ABS, the physical model may be made up of one or a plurality of submodels from the following list of submodels: a first submodel which is a physical model of a wheel of the vehicle 100, a second submodel which describes the center of mass of the vehicle, a third submodel which is a physical model of the damper, a fourth submodel which is a physical model of the tire and a fifth submodel which is a multidimensional model of the hydraulics. It should be noted that the list is not exhaustive and other physical features such as tire/brake temperature etc. can also be taken into account.
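A strongly simplified deterministic model of this kind could be sketched in Python as follows. The single-wheel setup, the explicit Euler step, the friction curve and all numerical values (mass, inertia, radius, curve coefficients) are assumptions for illustration and do not reproduce the submodels listed above.

```python
import math
from typing import NamedTuple


class State(NamedTuple):
    v_veh: float    # vehicle speed in m/s
    v_wheel: float  # wheel circumferential speed in m/s


def slip(s: State) -> float:
    """Longitudinal brake slip; the denominator is bounded to avoid division by zero."""
    return (s.v_veh - s.v_wheel) / max(s.v_veh, 0.1)


def friction(slip_value: float, b=10.0, c=1.9, d=1.0) -> float:
    """Hypothetical friction curve of the form d * sin(c * atan(b * slip))."""
    return d * math.sin(c * math.atan(b * slip_value))


def model_step(s: State, brake_torque: float, dt=0.001,
               mass=400.0, inertia=1.0, radius=0.3, g=9.81) -> State:
    """Predict the subsequent state from the current state and the action."""
    f_x = friction(slip(s)) * mass * g                   # braking force at the contact patch
    v_veh = max(s.v_veh - f_x / mass * dt, 0.0)          # vehicle body (wheel-load share)
    omega_dot = (f_x * radius - brake_torque) / inertia  # wheel rotation
    v_wheel = s.v_wheel + omega_dot * radius * dt
    v_wheel = max(0.0, min(v_wheel, v_veh))              # no drive torque in this sketch
    return State(v_veh, v_wheel)


s = State(20.0, 20.0)
for _ in range(50):
    s = model_step(s, brake_torque=1500.0)
print(s)
```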
[0081] It should be noted that, in addition to the model P, other approaches are also conceivable for optimizing the parameterization. As an alternative to the model, what is known as a model-free reinforcement learning approach or a value-based reinforcement learning approach can also be selected. Accordingly, in step S23, for example, the Q function for value-based reinforcement learning is then created based on the recordings from step S21.
[0082] Step S24 may follow step S23. Step S24 may be referred to as “on-policy correction”. In this case, a correction model g is produced which corrects predictions of the model P(s.sub.t+1|s.sub.t, a.sub.t) by means of detected vehicle states in such a way that the corrected predictions substantially coincide with the vehicle states detected in step S21.
[0083] The corrected vehicle state is preferably ascertained as follows:
s′.sub.t+1=P(s.sub.t+1|s.sub.t,a.sub.t)+g(s.sub.t,a.sub.t)
The correction model g is optimized such that, given s.sub.t and a.sub.t, it outputs a value that corresponds to the error of the model P(s.sub.t+1|s.sub.t, a.sub.t) in relation to the vehicle states detected in step S21.
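One simple way to obtain such a correction is sketched below in Python, under the assumption that the correction depends only on the time step of a recorded maneuver rather than on the state and action. The scalar state, the toy nominal model and the recorded values are hypothetical.

```python
def fit_time_correction(model, states, actions):
    """Return g[t] = recorded s_{t+1} minus the model prediction from (s_t, a_t)."""
    return [s_next - model(s, a)
            for s, a, s_next in zip(states[:-1], actions, states[1:])]


def corrected_step(model, correction, t, s, a):
    """Model prediction plus the correction stored for time step t."""
    return model(s, a) + correction[t]


def nominal_model(s, a):
    # Toy model: the speed drops by 0.10 per unit of action and time step.
    return s - 0.10 * a


# Hypothetical recording in which the real vehicle decelerates slightly faster.
recorded_actions = [1.0, 1.0, 1.0]
recorded_states = [20.0, 19.88, 19.76, 19.64]

g = fit_time_correction(nominal_model, recorded_states, recorded_actions)

# Replaying the recorded actions through the corrected model
# reproduces the recorded trajectory.
s = recorded_states[0]
replayed = [s]
for t, a in enumerate(recorded_actions):
    s = corrected_step(nominal_model, g, t, s, a)
    replayed.append(s)
print(replayed)
```

A correction that depends on s.sub.t and a.sub.t, as in the formula above, would instead be fitted as a regression on the same residuals.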
[0084] Furthermore, the correction model g has the advantage that it corrects a mismatch between the model P and the actual behavior of the vehicle.
[0085] In order to be able to correct the mismatch between the model P and the actual behavior of the vehicle, the following measures can be taken as an alternative or in addition. It is conceivable that what is known as transfer learning is used for this, which draws on previously ascertained vehicle states and thus permits more rapid learning of the model for the specific vehicle instance. It is also conceivable that a plurality of different models are used, as a result of which a more robust controller behavior can be learned through this ensemble of models.
[0086] Step S25 follows step S23 or step S24. In said step, a plurality of rollouts are executed. That is to say, the driving dynamics controller 60 is applied for a maneuver using its current parameterization θ.sub.k, and the resulting trajectory, in particular the ascertained sequence of vehicle states, is recorded using the model P and, where applicable, additionally the correction model g.
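A rollout of this kind can be sketched in Python as follows. The function names, the scalar state and the toy controller and model in the usage lines are assumptions for illustration.

```python
def rollout(controller, model, theta, s0, horizon, correction=None):
    """Simulate the controller on the model and return the list of (state, action) pairs."""
    trajectory = []
    s = s0
    for _ in range(horizon):
        a = controller(s, theta)         # action from the current parameterization
        trajectory.append((s, a))
        s_next = model(s, a)             # prediction of the subsequent state
        if correction is not None:
            s_next = s_next + correction(s, a)  # optional correction model g
        s = s_next
    return trajectory


# Toy usage: proportional controller on a scalar speed state.
traj = rollout(controller=lambda s, theta: theta * s,
               model=lambda s, a: s - 0.1 * a,
               theta=1.0, s0=10.0, horizon=3)
print(traj)
```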
[0087] It should be noted that, in addition to the model P, other approaches are also conceivable for optimizing the parameterization (model-free reinforcement learning approach or value-based reinforcement learning approach). Accordingly, in this rollout step, the detection of the trajectory has to be adjusted.
[0088] Step S26 follows after step S25 has been executed or after step S25 has been executed repeatedly several times. In this step, costs for the detected trajectory/trajectories from step S25 are evaluated.
[0089] The costs for the trajectory can be ascertained as follows. Costs are preferably ascertained for each proposed action of the driving dynamics controller 60. For this purpose, a cost function c(s, a) can ascertain the costs depending on the previous trajectory or the current vehicle state s.sub.t and the currently selected action a.sub.t. The total cost for a trajectory can then be accumulated over the entire maneuver, that is to say over all times t:
J(θ)=Σ.sub.t=0.sup.T c(s.sub.t,a.sub.t)
The cost function c(s, a) can be made up as follows:
c(s,a)=α.sub.1*mean deceleration+α.sub.2*steerability+α.sub.3* . . .
where α.sub.n are coefficients that are predetermined, for example, by an application engineer or are set to initial values. Each of these coefficients may assume a value between 0 and 1.
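The weighted superposition and its accumulation over a trajectory can be sketched in Python as follows. The two component functions, their sign conventions and the state keys `a_veh` and `lat_reserve` are hypothetical and serve only to make the example concrete.

```python
def step_cost(s, a, components, alphas):
    """Weighted sum of the cost components for one state-action pair."""
    return sum(alpha * component(s, a) for alpha, component in zip(alphas, components))


def total_cost(trajectory, components, alphas):
    """Accumulate the step costs over all time steps of the trajectory."""
    return sum(step_cost(s, a, components, alphas) for s, a in trajectory)


# Hypothetical components, both written so that smaller values are better:
# the (negative) acceleration rewards strong deceleration, and the negated
# lateral force reserve rewards remaining steerability.
components = [
    lambda s, a: s["a_veh"],
    lambda s, a: -s["lat_reserve"],
]
alphas = [0.5, 1.0]

trajectory = [
    ({"a_veh": -8.0, "lat_reserve": 0.5}, 0.0),
    ({"a_veh": -6.0, "lat_reserve": 0.25}, 0.0),
]
print(total_cost(trajectory, components, alphas))  # -7.75
```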
[0090] Steerability can be understood to mean a controllability of the vehicle. Said controllability can be ascertained depending on a force (F_lat) that acts laterally on the vehicle (for example F_lat,max−F_lat,current), possibly also depending on a normalized lateral force:
(F_lat,max−F_lat,current)/F_lat,max,non_braking
[0091] The controllability can also be defined negatively if the cost function is intended to be minimized. In addition or as an alternative, the controllability can also be ascertained depending on longitudinal forces, such that the longitudinal forces are not fully utilized in order to allow “leeway” for lateral forces. For this purpose, a target slip range can be defined (for example slip∈[slip_min, slip_max]) in order to map onto corresponding costs using a sigmoid function, for example.
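A target slip range could be mapped onto costs with sigmoid functions as in the following Python sketch. The range limits, the steepness value and the use of two sigmoids (one per range boundary) are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def slip_range_cost(slip, slip_min=0.08, slip_max=0.15, steepness=100.0):
    """Cost close to 0 inside [slip_min, slip_max] and close to 1 outside of it."""
    below = sigmoid(steepness * (slip_min - slip))  # penalizes too little slip
    above = sigmoid(steepness * (slip - slip_max))  # penalizes too much slip
    return below + above


for value in (0.0, 0.115, 0.4):
    print(value, round(slip_range_cost(value), 3))
```

The smooth transitions at the range boundaries keep the cost differentiable, which is convenient if the parameters are adjusted by a gradient-based method.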
[0092] Mean deceleration can be understood to mean an averaging over all accelerations of the trajectory, for example over the time steps in which the ABS is active (ABS.sub.active).
[0093] Further components of the cost function can be given by any behavior of the vehicle that is intended to be penalized or rewarded. By way of example, this may be: comfort/jerk, that is to say how juddery the braking is; hardware requirements, that is to say how encumbering the braking is for the brake system, vehicle, hydraulics and tires; performance (for example braking distance, acceleration); or directional stability, that is to say a behavior about the vertical axis.
[0094] All of these components can be evaluated based on different signals (sensor signals or estimations) and based on different cost functions, for example mean absolute error, mean squared error, root mean squared error between actual and target state (for example in slip, friction value, jerk), or standard deviation of a signal.
[0095] Another component of the cost function may be an overall braking distance. This may be a single value that is obtainable only in the temporal step in which the braking is ended (for example v.sub.veh<v.sub.threshold->c, wherein c is the overall braking distance) or a sum over v*dt for each temporal step in which the braking is active.
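The second variant, a sum over v*dt for each time step in which the braking is active, can be sketched in Python as follows; the function name and the sample values are hypothetical.

```python
def braking_distance(speeds, dt, braking_active):
    """Sum v * dt over all time steps in which the braking is active."""
    return sum(v * dt for v, active in zip(speeds, braking_active) if active)


# Hypothetical samples: braking starts at the second time step.
print(braking_distance([10.0, 8.0, 6.0, 4.0], 0.5, [False, True, True, True]))  # 9.0
```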
[0096] Another component of the cost function may be a deviation of a slip with respect to a target slip: ∥slip−slip_target∥^2 and/or an average acceleration: mean std(acceleration).
[0097] Step S27 follows after the total costs for the trajectory or the plurality of trajectories have been ascertained in step S26. In this step, the parameters θ of the driving dynamics controller 60 are adjusted iteratively in such a way that they reduce the overall costs. In this case, optimization may be defined as follows:
θ*=argmin.sub.θJ(θ)
This optimization over the parameters θ can be carried out by means of a gradient descent method on the overall costs J.
[0098] The current parameters θ.sub.k are then adapted as follows per iteration k of the optimization:
θ.sub.k+1=θ.sub.k−λ∇.sub.θJ(θ.sub.k)
wherein λ is a coefficient that assumes a value less than 1.
[0099] The iteration can be executed until a stop criterion is satisfied. Stop criteria could be, for example: a maximum number of iterations, a minimum change in J (change in J<J.sub.threshold), or a minimum change in the parameters θ (change in θ<θ.sub.threshold).
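The iterative adjustment with the three stop criteria can be sketched in Python as follows. The gradient is approximated here by central finite differences as a stand-in for whatever gradient computation is actually used, and the quadratic toy cost replaces the trajectory costs J; both are assumptions for illustration.

```python
def numerical_gradient(cost, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of cost at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2.0 * eps))
    return grad


def optimize(cost, theta0, lam=0.1, max_iter=1000,
             j_threshold=1e-12, theta_threshold=1e-9):
    """Gradient descent with step size lam and three stop criteria."""
    theta = list(theta0)
    j_prev = cost(theta)
    for _ in range(max_iter):                      # criterion 1: maximum iterations
        grad = numerical_gradient(cost, theta)
        theta_new = [t - lam * g for t, g in zip(theta, grad)]
        j_new = cost(theta_new)
        small_j = abs(j_prev - j_new) < j_threshold            # criterion 2
        small_theta = max(abs(n - o) for n, o in zip(theta_new, theta)) < theta_threshold  # criterion 3
        theta, j_prev = theta_new, j_new
        if small_j or small_theta:
            break
    return theta, j_prev


def toy_cost(theta):
    # Hypothetical stand-in for the overall costs J(theta).
    return (theta[0] - 0.3) ** 2 + 2.0 * (theta[1] + 1.0) ** 2


theta_star, j_star = optimize(toy_cost, [0.0, 0.0])
print(theta_star, j_star)
```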
[0100] In the event that a plurality of overall costs have been ascertained, in particular for different maneuvers, the parameters can be adjusted in batches over the plurality of overall costs. The batch-wise procedure can be carried out as a batch over model parameters, as a batch over scenarios/maneuvers or as a batch over subtrajectories.
[0101] After step S27 has been terminated, steps S25 to S27 can be executed again; as an alternative, steps S21 to S27 can also be executed again.
[0102] In the optional step S28, the control system 40 of the vehicle 100 is initialized using the adjusted driving dynamics controller 60 from step S27.
[0103] In the subsequent optional step S29, the vehicle 100 is operated using the adjusted driving dynamics controller 60. In this case, the vehicle 100 can be controlled by said driving dynamics controller 60 when it is activated in a corresponding situation, for example when emergency braking is carried out.
[0104] In another embodiment of the method according to
[0105] In another embodiment of the method according to
[0106]
[0107] The steps executed by the training device 300 can be stored as a computer program implemented on a machine-readable storage medium 34 and can be executed by a processor 35.
[0108] The term “computer” includes any devices for processing predefinable calculation specifications. These calculation specifications may be present in the form of software or in the form of hardware or else in a mixed form of software and hardware.