REINFORCEMENT LEARNING ALGORITHM-BASED PREDICTIVE CONTROL METHOD FOR LATERAL AND LONGITUDINAL COUPLED VEHICLE FORMATION

20240083428 · 2024-03-14

Abstract

A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation includes S1, combining a 3-DOF vehicle dynamics model that takes into account a nonlinear magic formula tire model with a lane keeping model and establishing a vehicle formation model; S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework; S3, using a reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller, and applying the optimal control strategy to the target following vehicle. The present application completes the lateral and longitudinal coupled modeling of vehicle formation and considers the nonlinear characteristics of tires. In addition, the present application also transforms the global optimization problem of vehicle formation into a local optimization problem of each following vehicle.

Claims

1. A reinforcement learning algorithm-based predictive control method for a lateral and longitudinal coupled vehicle formation, comprising: S1, combining a 3-degree of freedom (DOF) vehicle dynamics model with a lane keeping model to establish a vehicle formation model, wherein the 3-DOF vehicle dynamics model takes into account a nonlinear magic formula tire model;
$x_i(k+1) = f(x_i(k), u_i(k))$; wherein $x_i(k)$ is a state quantity and $u_i(k)$ is an input quantity; S2, constructing a distributed control framework and designing a local predictive controller for each target following vehicle based on the vehicle formation model under the distributed control framework;
$$J_i(x_i(k), U_i(k)) = \sum_{l=0}^{T_p-1} \left( \|x_i(k+l) - r_i(k+l)\|_{Q_i}^2 + \|x_i(k+l) - \hat{x}_i(k+l)\|_{F_i}^2 + \|x_i(k+l) - \hat{x}_{i-1}(k+l)\|_{G_i}^2 + \|u_i(k+l)\|_{R_i}^2 \right)$$
wherein $k$ is a current moment, $k+1$ is a first moment in a prediction time domain, $x_i(\cdot)$ is a prediction state, $r_i(\cdot)$ is an ideal state, $\hat{x}_{i-1}(\cdot)$ and $\hat{x}_i(\cdot)$ represent assumed trajectory states of the vehicle, $\hat{x}_{i-1}(\cdot)$ is obtained through an inter-vehicle communication, $T_p$ is a prediction time domain, $Q_i$, $F_i$, $G_i$, $R_i$ are weight matrices; S3, using a reinforcement learning algorithm to solve an optimal control strategy of the local predictive controller and applying the optimal control strategy to the each target following vehicle.

2. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein a lateral force of a tire in the nonlinear magic formula tire model is calculated by the following magic formula:
$$F_i^y = D \sin\big(C \arctan\big(B\alpha - E\,(B\alpha - \arctan(B\alpha))\big)\big)$$
wherein $\alpha$ is a cornering angle of the tire, and $B$, $C$, $D$, $E$ are simulation parameters.

3. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein the 3-DOF vehicle dynamics model is expressed by the following formula:
$$\begin{cases} m_i \dot{v}_i^x - m_i v_i^y \dot{\varphi}_i = F_i^x \\ m_i \dot{v}_i^y + m_i v_i^x \dot{\varphi}_i = F_i^{yf} + F_i^{yr} \\ I_i^z \ddot{\varphi}_i = a_i F_i^{yf} \cos\delta_i - b_i F_i^{yr} \end{cases}$$
wherein each parameter in the formula is a parameter of an $i$-th vehicle, and $v_i^x$, $v_i^y$, $\dot{\varphi}_i$ are longitudinal speed, lateral speed, and yaw rate, $F_i^x$ is a longitudinal force, $F_i^{yf}$ and $F_i^{yr}$ are the front and rear wheel lateral forces, respectively, $m_i$ is a vehicle mass, $I_i^z$ is a moment of inertia of the 3-DOF vehicle dynamics around the z axis, $\delta_i$ is the front wheel angle, $a_i$ and $b_i$ are a distance from the center of mass to a front axle and a distance from the center of mass to a rear axle, respectively.

4. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 3, wherein the lane keeping model is expressed as:
$$\begin{cases} \dot{e}_i^p = v_i^x - v_0^x \\ \dot{e}_i^y = v_i^x e_i^{\varphi} - v_i^y - L \dot{\varphi}_i \\ \dot{e}_i^{\varphi} = \dot{\varphi}_{i,des} - \dot{\varphi}_i \end{cases}$$
wherein $\dot{\varphi}_{i,des}$ is an expected heading angular speed, $L$ is a preview distance, $e_i^p$ is a longitudinal spacing error, $e_i^y$ is a lateral position error between the 3-DOF vehicle dynamics model and a lane line of the lane keeping model, and $e_i^{\varphi}$ is a heading angle error between a vehicle heading angle and a road tangent.

5. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 1, wherein when utilizing the reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller: constructing and training an actor strategy function neural network to optimize strategy parameters; constructing and training a critic value function neural network to evaluate the pros and cons of a current control strategy optimized by the actor strategy function neural network; obtaining the optimal control strategy according to an alternating convergence of the actor strategy function neural network and the critic value function neural network.

6. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 5, wherein when optimizing the strategy parameters, the actor strategy function neural network uses a network composed of $T_p$ radial basis functions to approximate a $T_p$-step optimal strategy and takes a state $s$ as a first input and an action $a$ as a first output; when assessing the pros and the cons of the current control strategy, the critic value function neural network is evaluated by the network composed of the $T_p$ radial basis functions and takes the state $s$ and the action $a$ as a second input and a predicted value $q(s,a)$ as a second output.

7. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 6, wherein basis vectors $\phi_a(x)$ and $\phi_c(x)$ in the actor strategy function neural network and the critic value function neural network are the radial basis functions, and
$$\phi_a(x) = \phi_c(x) = \left( e^{-\|x - x_1\|^2/\sigma^2},\ e^{-\|x - x_2\|^2/\sigma^2},\ \ldots,\ e^{-\|x - x_M\|^2/\sigma^2} \right)^T;$$
wherein $\sigma$ is set to 1, $\{x_i,\ i = 1, 2, \ldots, M\}$ are centers of the radial basis functions, and $M$ is a number of hidden-layer nodes.

8. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 7, wherein the center of the radial basis functions is obtained by a normalization:
$$x_{normalize} = \frac{x_{collect} - x_{min}}{x_{max} - x_{min}};$$
wherein $x_{normalize}$ is a normalized data, $x_{collect}$ is a collected data, $x_{max}$ and $x_{min}$ are a maximum value and a minimum value in the collected data respectively; the collected data is generated by applying a randomly given control input within a control input range, and an input data and an output data of the vehicle formation model are collected by a simulation.

9. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 8, wherein the optimal control strategy is obtained through the alternating convergence by: initializing an actor strategy function neural network weight $\vartheta$ and a critic value function neural network weight $\omega$; obtaining the action $a$ by the actor strategy function neural network according to a current state $s$ of the each target following vehicle, and applying the action $a$ to the each target following vehicle to obtain a state $s'$ and an instant reward $r$; obtaining an action $a'$ by the actor strategy function neural network according to the state $s'$; evaluating and scoring the action $a$ and the action $a'$ by the critic value function neural network to obtain the predicted value $q(s,a)$ and a predicted value $q(s',a')$, and then calculating an error TDerror between the predicted value $q(s,a)$ and an expected value $y_i$ of the critic value function neural network according to a Bellman equation: $\mathrm{TDerror} = y_i - q(s,a)$, $y_i = r + q(s',a')$; using a gradient descent method to iteratively update the actor strategy function neural network weight $\vartheta$ and the critic value function neural network weight $\omega$ so as to minimize the loss functions $L(\vartheta) = q(s,a)$ and $L(\omega) = \mathrm{TDerror}^2$, thereby obtaining the optimal control strategy $U_i^*$.

10. The reinforcement learning algorithm-based predictive control method for the lateral and longitudinal coupled vehicle formation according to claim 8, wherein in the prediction time domain, the optimal control strategy $U_i^*$ is applied to the each target following vehicle through the local predictive controller.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0045] FIG. 1 is a schematic diagram of the 3-DOF vehicle dynamics model of the present application;

[0046] FIG. 2 is a map table reflecting the relationship curve between the tire cornering angle and the tire lateral force;

[0047] FIG. 3 is a control flow chart of the local predictive controller of the present application;

[0048] FIG. 4 is a structure diagram of the reinforcement learning algorithm of the present application;

[0049] FIG. 5 is a network structure diagram composed of the radial basis function of the present application.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0050] The technical solution of the embodiments of the present application is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

[0051] A reinforcement learning algorithm-based predictive control method for lateral and longitudinal coupled vehicle formation, including:

[0052] S1, combining the 3-DOF vehicle dynamics model, which takes into account the nonlinear magic formula tire model, with the lane keeping model to establish the vehicle formation model;

[0053] wherein:

[0054] as shown in FIG. 1, the 3-DOF vehicle dynamics model is expressed as

$$\begin{cases} m_i \dot{v}_i^x - m_i v_i^y \dot{\varphi}_i = F_i^x \\ m_i \dot{v}_i^y + m_i v_i^x \dot{\varphi}_i = F_i^{yf} + F_i^{yr} \\ I_i^z \ddot{\varphi}_i = a_i F_i^{yf} \cos\delta_i - b_i F_i^{yr} \end{cases}$$

each parameter in the formula is a parameter of the $i$-th vehicle, and

[0055] $v_i^x$, $v_i^y$, $\dot{\varphi}_i$ are the longitudinal speed, lateral speed, and yaw rate, $F_i^x$ is the longitudinal force, $F_i^{yf}$ and $F_i^{yr}$ are the front and rear wheel lateral forces, $m_i$ is the vehicle mass, $I_i^z$ is the moment of inertia of the vehicle around the z axis, $\delta_i$ is the front wheel angle, and $a_i$ and $b_i$ are the distances from the center of mass to the front and rear axles, respectively.

[0056] The lateral force of the tire in the nonlinear magic formula tire model is calculated by the following magic formula:

$$F_i^y = D \sin\big(C \arctan\big(B\alpha - E\,(B\alpha - \arctan(B\alpha))\big)\big)$$

[0057] in the formula, $\alpha$ is the cornering angle of the tire, and $B$, $C$, $D$, $E$ are simulation parameters.

[0058] Specifically, when the nonlinear magic formula tire model is considered, the longitudinal load of the front and rear tires is calculated first; the relationship curve between the tire cornering angle and the tire lateral force is then obtained by interpolation with reference to FIG. 2; and the parameters $B$, $C$, $D$, $E$ are finally obtained by fitting the above magic formula to that curve.
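As an illustration only, the following Python sketch evaluates the magic formula for a few cornering angles. It assumes the standard form $F = D \sin(C \arctan(B\alpha - E(B\alpha - \arctan(B\alpha))))$; the function name and the numerical values of B, C, D, E are hypothetical and are not taken from the present application, and the interpolation and fitting steps are not shown.

```python
import math


def magic_formula_lateral_force(alpha, B, C, D, E):
    """Lateral tire force for a cornering angle alpha (in radians)."""
    b_alpha = B * alpha
    inner = b_alpha - E * (b_alpha - math.atan(b_alpha))
    return D * math.sin(C * math.atan(inner))


# Hypothetical parameter values, chosen only to produce a plausible curve.
B, C, D, E = 10.0, 1.3, 4000.0, 0.97

for alpha_deg in (0.0, 2.0, 5.0, 10.0):
    alpha = math.radians(alpha_deg)
    force = magic_formula_lateral_force(alpha, B, C, D, E)
    print(f"alpha = {alpha_deg:4.1f} deg -> lateral force = {force:8.1f}")
```

With these assumed parameters, D bounds the magnitude of the force and the curve is odd in the cornering angle, so a zero cornering angle gives zero lateral force.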

[0059] The lane keeping model is expressed as:

$$\begin{cases} \dot{e}_i^p = v_i^x - v_0^x \\ \dot{e}_i^y = v_i^x e_i^{\varphi} - v_i^y - L \dot{\varphi}_i \\ \dot{e}_i^{\varphi} = \dot{\varphi}_{i,des} - \dot{\varphi}_i \end{cases}$$

[0060] in the formula, $\dot{\varphi}_{i,des}$ is the expected heading angular speed, $L$ is a preview distance, $e_i^p$ is a longitudinal spacing error, $e_i^y$ is a lateral position error between the vehicle and the lane line, and $e_i^{\varphi}$ is a heading angle error between the vehicle heading angle and the road tangent.
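The lane keeping model above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the present application: the function and argument names are hypothetical, `v0_x` is assumed to be the longitudinal speed of vehicle 0 (which the text does not define further), and the forward-Euler step with sampling time `T_s` is one possible discretization chosen here for simplicity.

```python
def lane_keeping_error_rates(v_x, v_y, yaw_rate, e_heading,
                             v0_x, yaw_rate_des, preview_L):
    """Time derivatives of the three lane keeping errors."""
    e_p_dot = v_x - v0_x                                 # longitudinal spacing error rate
    e_y_dot = v_x * e_heading - v_y - preview_L * yaw_rate  # lateral position error rate
    e_heading_dot = yaw_rate_des - yaw_rate              # heading angle error rate
    return e_p_dot, e_y_dot, e_heading_dot


def euler_step_errors(errors, rates, T_s):
    """Advance (e_p, e_y, e_heading) by one sampling period T_s (forward Euler, assumed)."""
    return tuple(e + T_s * r for e, r in zip(errors, rates))


# Hypothetical operating point, for illustration only.
rates = lane_keeping_error_rates(v_x=21.0, v_y=0.5, yaw_rate=0.1, e_heading=0.02,
                                 v0_x=20.0, yaw_rate_des=0.12, preview_L=5.0)
errors_next = euler_step_errors((0.0, 0.1, 0.02), rates, T_s=0.05)
print(rates)
print(errors_next)
```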

[0061] In summary, combining the above 3-DOF vehicle dynamics model with the lane keeping model, the vehicle formation model is obtained as follows:

$$\begin{cases} \dot{v}_i^x = v_i^y \dot{\varphi}_i + \dfrac{1}{m_i} F_i^x \\ \dot{v}_i^y = -v_i^x \dot{\varphi}_i + \dfrac{F_i^{yf} + F_i^{yr}}{m_i} \\ \ddot{\varphi}_i = \dfrac{1}{I_i^z}\left(a_i F_i^{yf} - b_i F_i^{yr}\right) \\ \dot{e}_i^p = v_i^x - v_0^x \\ \dot{e}_i^y = v_i^x e_i^{\varphi} - v_i^y - L \dot{\varphi}_i \\ \dot{e}_i^{\varphi} = \dot{\varphi}_{i,des} - \dot{\varphi}_i \end{cases}$$

[0062] taking the state quantity $x_i = [v_i^x\ \ v_i^y\ \ \dot{\varphi}_i\ \ e_i^p\ \ e_i^y\ \ e_i^{\varphi}]^T$, the control quantity $u_i = [F_i^x\ \ \delta_i]^T$ and the sampling time $T_s$, the discrete form of the vehicle formation model is obtained by discretizing the above vehicle formation model:

$$x_i(k+1) = f(x_i(k), u_i(k));$$

in the formula, $x_i(k)$ is the state variable and $u_i(k)$ is the input variable.

[0063] S2, constructing a distributed control framework and designing a local predictive controller for each following vehicle based on the vehicle formation model under the control framework:


$$J_i(x_i(k), U_i(k)) = \sum_{l=0}^{T_p-1} \left( \|x_i(k+l) - r_i(k+l)\|_{Q_i}^2 + \|x_i(k+l) - \hat{x}_i(k+l)\|_{F_i}^2 + \|x_i(k+l) - \hat{x}_{i-1}(k+l)\|_{G_i}^2 + \|u_i(k+l)\|_{R_i}^2 \right)$$

[0064] in the formula, $k$ is the current moment, $k+1$ is the first moment in the prediction time domain, $x_i(\cdot)$ is a prediction state, $r_i(\cdot)$ is an ideal state, $\hat{x}_{i-1}(\cdot)$ and $\hat{x}_i(\cdot)$ represent assumed trajectory states of the vehicle, $\hat{x}_{i-1}(\cdot)$ is obtained through inter-vehicle communication, $T_p$ is the prediction time domain, and $Q_i$, $F_i$, $G_i$, $R_i$ are the weight matrices; the assumed input of each vehicle is defined as follows:

$$\hat{u}_i(k+j \mid k+1) = \begin{cases} u_i^*(k+j \mid k), & j = 0, \ldots, T_p - 2 \\ 0, & j = T_p - 1 \end{cases}$$
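The local cost function and the assumed input above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the present application: all function names are hypothetical, the sum is assumed to cover all four weighted terms at each of the $T_p$ steps, vectors are plain lists, weight matrices are lists of rows, and the assumed input follows the definition literally (the first $T_p - 1$ optimal inputs are kept and the last entry is set to zero).

```python
def weighted_sq_norm(v, W):
    """Weighted squared norm v^T W v (v: list, W: list of rows)."""
    n = len(v)
    return sum(v[r] * W[r][c] * v[c] for r in range(n) for c in range(n))


def diff(a, b):
    """Element-wise difference a - b of two equal-length lists."""
    return [ai - bi for ai, bi in zip(a, b)]


def local_cost(x_pred, u_seq, r_ref, x_hat_own, x_hat_prev, Q, F, G, R):
    """Sum of the four weighted terms over the prediction steps l = 0, ..., T_p - 1."""
    total = 0.0
    for l in range(len(u_seq)):
        total += weighted_sq_norm(diff(x_pred[l], r_ref[l]), Q)       # tracking the ideal state
        total += weighted_sq_norm(diff(x_pred[l], x_hat_own[l]), F)   # own assumed trajectory
        total += weighted_sq_norm(diff(x_pred[l], x_hat_prev[l]), G)  # preceding vehicle's assumed trajectory
        total += weighted_sq_norm(u_seq[l], R)                        # control effort
    return total


def assumed_input(u_opt):
    """Keep the first T_p - 1 optimal inputs and set the last entry to zero."""
    m = len(u_opt[0])
    return [list(u) for u in u_opt[:-1]] + [[0.0] * m]


# Hypothetical two-step example with scalar state and scalar input.
cost = local_cost(x_pred=[[1.0], [2.0]], u_seq=[[1.0], [0.0]],
                  r_ref=[[0.0], [0.0]], x_hat_own=[[1.0], [1.0]],
                  x_hat_prev=[[0.0], [2.0]],
                  Q=[[1.0]], F=[[2.0]], G=[[3.0]], R=[[4.0]])
print(cost)
print(assumed_input([[1.0, 0.1], [2.0, 0.2], [3.0, 0.3]]))
```

In a distributed setting, `x_hat_prev` would be the assumed trajectory received from the preceding vehicle through inter-vehicle communication, while `x_hat_own` is computed locally.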

[0065] The assumed trajectory can be calculated from the assumed input:

$$\begin{cases} \hat{x}_i(k+j+1 \mid k+1) = f\big(\hat{x}_i(k+j \mid k+1),\ \hat{u}_i(k+j \mid k+1)\big) \\ \hat{x}_i(k+1 \mid k+1) = x_i^*(k+1 \mid k) \end{cases}$$

[0066] S3, using a reinforcement learning algorithm to solve the optimal control strategy of the local predictive controller and applying the optimal control strategy to the target following vehicle through the local predictive controller.

[0067] (31) Constructing the actor strategy function neural network and the critic value function neural network.

[0068] Specifically, combined with the structure shown in FIG. 4, the actor strategy function neural network and the critic value function neural network are set as follows:

[0069] The actor strategy function neural network uses a network consisting of $T_p$ radial basis functions to approximate the $T_p$-step optimal strategy;

[0070] the critic value function neural network likewise performs its evaluation with a network consisting of $T_p$ radial basis functions;

[0071] the network structure composed of $T_p$ radial basis functions is shown in FIG. 5. When approximating the optimal strategy, the actor strategy function neural network takes the state $s$ as input and the action $a$ as output; when evaluating, the critic value function neural network takes the state $s$ and the action $a$ as input and the state-action value $q(s,a)$ as output.

[0072] Preferably, the basis vectors $\phi_a(x)$ and $\phi_c(x)$ in the actor strategy function neural network and the critic value function neural network are radial basis functions, and

$$\phi_a(x) = \phi_c(x) = \left( e^{-\|x - x_1\|^2/\sigma^2},\ e^{-\|x - x_2\|^2/\sigma^2},\ \ldots,\ e^{-\|x - x_M\|^2/\sigma^2} \right)^T$$

[0073] In the formula, $\sigma$ is set to 1, $\{x_i,\ i = 1, 2, \ldots, M\}$ are the centers of the radial basis functions, and $M$ is the number of hidden-layer nodes.
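A radial basis feature vector of this kind can be sketched in Python as follows. The sketch assumes a Gaussian basis of the form $\exp(-\|x - x_m\|^2/\sigma^2)$ with a width parameter `sigma` set to 1; the function name and the example centers are hypothetical and serve only as an illustration.

```python
import math


def rbf_features(x, centers, sigma=1.0):
    """Return the vector of M Gaussian radial basis values for input x."""
    features = []
    for center in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        features.append(math.exp(-sq_dist / sigma ** 2))
    return features


# Hypothetical centers in a two-dimensional, normalized input space.
centers = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(rbf_features([0.5, 0.5], centers))
```

Each entry is largest when the input coincides with the corresponding center and decays with distance, which is why the choice of centers matters for the approximation quality.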

[0074] The centers of the radial basis functions are obtained by normalization:

$$x_{normalize} = \frac{x_{collect} - x_{min}}{x_{max} - x_{min}};$$

[0075] in the formula, $x_{normalize}$ is the normalized data, $x_{collect}$ is the collected data, and $x_{max}$ and $x_{min}$ are the maximum and minimum values in the collected data, respectively; specifically, control inputs are given randomly within the control input range, and the input data and output data of the vehicle formation model are collected by simulation.
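The normalization above is ordinary min-max scaling and can be sketched in Python as follows. The function name and the sample values are hypothetical; the guard against identical minimum and maximum values is an added assumption, since the formula is undefined in that case.

```python
def min_max_normalize(collected):
    """Scale a list of collected samples to the range [0, 1]."""
    x_min, x_max = min(collected), max(collected)
    if x_max == x_min:
        raise ValueError("all samples are equal; normalization is undefined")
    return [(x - x_min) / (x_max - x_min) for x in collected]


# Hypothetical collected samples of one signal, for illustration only.
samples = [2.0, 4.0, 6.0]
print(min_max_normalize(samples))
```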

[0076] (32) Training the actor strategy function neural network and the critic value function neural network.

[0077] initializing the actor strategy function neural network weight $\vartheta$ and the critic value function neural network weight $\omega$;

[0078] obtaining the action $a$ by the actor strategy function neural network according to the current state $s$ of the target following vehicle, and applying the action $a$ to the target following vehicle to obtain a new state $s'$ and an instant reward $r$;

[0079] obtaining a new action $a'$ by the actor strategy function neural network according to the new state $s'$;

[0080] evaluating and scoring the action $a$ and the action $a'$ by the critic value function neural network to obtain $q(s,a)$ and $q(s',a')$, and then calculating the error TDerror between the predicted value $q(s,a)$ and the expected value $y_i$ of the critic value function neural network according to the Bellman equation: $\mathrm{TDerror} = y_i - q(s,a)$, $y_i = r + q(s',a')$;

[0081] in order to minimize the value function obtained from the action output by the actor strategy function neural network, the value function $q(s,a)$ is taken as the loss function $L(\vartheta)$ of the actor strategy function neural network, that is, $L(\vartheta) = q(s,a)$, and the weight $\vartheta$ is updated iteratively by the gradient descent method; in order to make the scoring of the critic value function neural network more accurate, the loss function of the critic value function neural network is taken as $L(\omega) = \mathrm{TDerror}^2$, and the weight $\omega$ is updated iteratively by the gradient descent method; specifically, when the number of iterations or the accuracy meets the preset conditions, the optimal control strategy $U_i^*$ is obtained.
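One training iteration of this kind can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the present application: the state and action are scalars, the actor and the critic are both linear in Gaussian radial basis features, the target $y_i$ is held fixed when differentiating the critic loss, `gamma` is an optional discount factor set to 1 so that $y_i = r + q(s', a')$, and `toy_vehicle`, the learning rates, and all function names are hypothetical.

```python
import math


def rbf(x, centers, sigma=1.0):
    """Gaussian radial basis features of the input vector x."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / sigma ** 2)
            for c in centers]


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def actor(theta, s, s_centers):
    """Action a as a weighted sum of features of the state s."""
    return dot(theta, rbf([s], s_centers))


def critic(omega, s, a, sa_centers):
    """Predicted value q(s, a) as a weighted sum of features of (s, a)."""
    return dot(omega, rbf([s, a], sa_centers))


def dq_da(omega, s, a, sa_centers, sigma=1.0):
    """Partial derivative of q(s, a) with respect to the action a."""
    feats = rbf([s, a], sa_centers, sigma)
    return sum(w * f * (-2.0 * (a - c[1]) / sigma ** 2)
               for w, f, c in zip(omega, feats, sa_centers))


def actor_critic_step(theta, omega, s, env_step, s_centers, sa_centers,
                      lr_actor=0.01, lr_critic=0.05, gamma=1.0):
    a = actor(theta, s, s_centers)
    s_next, r = env_step(s, a)                       # new state and instant reward
    a_next = actor(theta, s_next, s_centers)
    y = r + gamma * critic(omega, s_next, a_next, sa_centers)
    td_error = y - critic(omega, s, a, sa_centers)
    # Critic: gradient descent on TDerror^2 with the target y held fixed.
    psi = rbf([s, a], sa_centers)
    new_omega = [w + lr_critic * 2.0 * td_error * p for w, p in zip(omega, psi)]
    # Actor: gradient descent on q(s, a(theta)) via the chain rule.
    grad_a = dq_da(omega, s, a, sa_centers)
    phi = rbf([s], s_centers)
    new_theta = [t - lr_actor * grad_a * p for t, p in zip(theta, phi)]
    return new_theta, new_omega, s_next, td_error


def toy_vehicle(s, a):
    """Hypothetical scalar plant; the instant reward is a quadratic cost."""
    s_next = 0.9 * s + 0.1 * a
    return s_next, s_next ** 2 + 0.1 * a ** 2


s_centers = [[c] for c in (-1.0, 0.0, 1.0)]
sa_centers = [[cs, ca] for cs in (-1.0, 0.0, 1.0) for ca in (-1.0, 0.0, 1.0)]
theta, omega, s = [0.0] * len(s_centers), [0.0] * len(sa_centers), 1.0
for _ in range(5):
    theta, omega, s, td = actor_critic_step(theta, omega, s, toy_vehicle,
                                            s_centers, sa_centers)
    print(round(td, 4))
```

In practice the loop would run until the number of iterations or the accuracy meets a preset condition, as described above, rather than for a fixed five steps.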

[0082] (33) The actor strategy function neural network and the critic value function neural network above are used to solve the optimal control strategy of the local predictive controller, and the optimal control strategy $U_i^*$ solved in the prediction time domain is applied to the target following vehicle through the local predictive controller.

[0083] Although embodiments of the present application have been presented and described, a person of ordinary skill in the art will understand that these embodiments can be varied, modified, replaced, and amended without departing from the principles and spirit of the present application. Therefore, the scope of the present application is defined by the accompanying claims and their equivalents.