Complex network cognition-based federated reinforcement learning end-to-end autonomous driving control system, method, and vehicular device

Abstract

The provided are a federated reinforcement learning (FRL) end-to-end autonomous driving control system and method, as well as vehicular equipment, based on complex network cognition. An FRL algorithm framework is provided, designated as FLDPPO, for dense urban traffic. This framework combines rule-based complex network cognition with end-to-end FRL through the design of a loss function. FLDPPO employs a dynamic driving guidance system to assist agents in learning rules, thereby enabling them to navigate complex urban driving environments and dense traffic scenarios. Moreover, the provided framework utilizes a multi-agent FRL architecture, whereby models are trained through parameter aggregation to safeguard vehicle-side privacy, accelerate network convergence, reduce communication consumption, and achieve a balance between sampling efficiency and high robustness of the model.

Claims

1. A complex network cognition-based federated reinforcement learning (FRL) end-to-end autonomous driving control system comprising: a measurement encoder, an image encoder, a complex network cognition module, a reinforcement learning module, and a federated learning module, wherein: the measurement encoder is implemented by at least one processor and is configured to obtain state quantities required by the complex network cognition module, which is implemented by the at least one processor, and the reinforcement learning module, which is implemented by the at least one processor, the state quantities required by the complex network cognition module comprise a x-coordinate, a y-coordinate, a heading angle change and a speed of a given driving agent of a plurality of driving agents, the state quantities required by the complex network cognition module are handed over to the complex network cognition module as inputs, the state quantities required by the reinforcement learning module comprise a steering wheel angle, a throttle, a brake, a gear, a lateral speed and a longitudinal speed, the state quantities required by the reinforcement learning module are given to the reinforcement learning module after extracting features from a two-layer fully connected network; the image encoder is implemented by the at least one processor and is configured to obtain an amount of image implicit state required by the reinforcement learning module, an image used is a 15-channel semantic bird's eye view (BEV), i.sub.RL[0, 1].sup.192*192*15, 192 is in pixels and the BEV used is 5 px/m, 15 channels contain a drivable domain, a desired path, a road edge, 4 frames of other vehicles, 4 frames of pedestrians, and 4 frames of traffic signs, wherein the desired path is calculated using an A* algorithm, the semantic BEV is extracted by multilayer convolutional layers to extract implicit features and then passed to the reinforcement learning module as another part of the inputs; the complex network cognition module is configured to model a driving situation of a driving subject, and to obtain a maximum risk value of the driving subject in a current driving situation according to the state quantity provided by the measurement encoder, and finally to output dynamic driving suggestions based on a risk value through an activation function; the reinforcement learning module is configured to integrate the state quantities output from the measurement encoder and the image encoder, output corresponding strategies according to integrated network inputs, and interact with an environment to generate experience samples stored in a local replay buffer in the federated learning module, which is implemented by the at least one processor, when a number of the experience samples reaches a certain threshold, a batch of sample is taken from the local replay buffer for training, and finally trained neural network parameters are uploaded to the federated learning module; and the federated learning module is configured to receive the finally trained neural network parameters uploaded by the reinforcement learning module of the plurality of driving agents, and to aggregate a set of global parameters based on the finally trained neural network parameters, and finally to send the global parameters to the plurality of driving agents until a neural network converges, a global parameter aggregation is performed by a following equation: $_{m}^{*} = \frac{1}{N} \underset{n}{.Math.}_{m}^{n}$ wherein .sub.m* denotes the global parameters at time m, N denotes a number of the plurality of driving agents, and .sub.m.sup.n denotes the neural network parameters at time m of an nth driving agent of the plurality of driving agents; wherein the activation function is configured to map the risk value, Activate(Risk) represents different activation functions according to different driving suggestions, and the mapped risk value will be used as a basis for guiding an output strategy of the reinforcement learning module: ${Activate}_{go} (Risk) = \frac{4}{(1 + \exp (- 300 / Risk)) - 1} {Activate}_{stop} (Risk) = \frac{4}{(1 + \exp (- 0.2 * Risk)) - 1}$ wherein Activate.sub.go(Risk) denotes an activation function when a driving suggestion is forward, Activate.sub.stop(Risk) denotes an activation function when the driving suggestion is stop, and Risk denotes a current risk value of a self-vehicle, a dynamic risk suggestion B.sub.risk: $B_{risk} = B ({Activate}_{go} (Risk),_{go}), go B_{risk} = B (_{stop}, {Activate}_{stop} (Risk)), stop$ wherein B denotes a beta distribution with .sub.stop=.sub.go=1.

2. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 1, wherein a modeling process of the complex network cognition module constructs a dynamic complex network model with a traffic participant and a road infrastructure as nodes: $G_{t} = {(P, E, W,)}_{t}$ wherein G.sub.t denotes a dynamic complex network at a moment t, P={p.sub.1, p.sub.2, . . . , p.sub.N} is a set of nodes, and a number of nodes is N; E={e.sub.1,2, e.sub.1,3, . . . , e.sub.i,j} is a set of edges, and a number of edges is $\frac{N (N - 1)}{2},$ and e.sub.i,j stands for a connectivity between nodes p.sub.i and p.sub.j; W={w.sub.1,2, w.sub.1,3, . . . , w.sub.i,j} is a set of weights of the edges, w.sub.i,j represents a coupling strength between nodes p.sub.i and p.sub.j; is an active region of the nodes, representing dynamic constraints on the set of nodes in the dynamic complex network, model as a smooth bounded surface: $F_{} (x, y, z) = 0, s . t . (x, y)$ wherein is a boundary of a representation of a slip surface, consider a continuous time dynamic complex network with N nodes on with a node state equation of a form: $\dot{X_{i}} = A_{i} X_{i} + B_{i} U_{i}$ wherein X.sub.iR.sup.m denotes a state vector of the node p.sub.i, R.sup.m denotes a vector space consisting of m-dimensional real numbers R, U.sub.iR.sup.q is an input vector, R.sup.q denotes a vector space consisting of q-dimensional real numbers R, A.sub.i denotes a dynamic matrix, B.sub.i denotes an input matrix, based on the node state equation, an output vector of the node p.sub.i can be obtained: $Y_{i} = f_{i} (X_{i})$ wherein f.sub.i denotes an output function of the node p.sub.i, then a weight function between the nodes p.sub.i and p.sub.j is: $w_{ij} = F (Y_{i}, Y_{j})$ wherein F denotes the weight function between the nodes p.sub.i and p.sub.j.

3. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 2, wherein a Gaussian function is used in the dynamic complex network to reveal a static property between nodes: $S_{sta} = C_{a} .Math. \exp (- \frac{{(x - x_{0})}^{2}}{a_{x}^{2}} - \frac{{(y - y_{0})}^{2}}{b_{y}^{2}})$ wherein S.sub.sta denotes a static field strength, C.sub.a denotes a field strength coefficient, x.sub.0 and y.sub.0 denote a coordinate of a risk center O(x.sub.0, y.sub.0), and a.sub.x and b.sub.y denote vehicle appearance coefficients, respectively, a safety field is characterized by shape anisotropy: $= \frac{a_{x}^{2} - b_{y}^{2}}{a_{x}^{2} + b_{y}^{2}} = \frac{^{2} - 1}{^{2} + 1} = a_{x} / b_{y} = l_{v} / w_{v}$ wherein is an aspect ratio, l.sub.v denotes a vehicle length, and w.sub.v denotes a vehicle width.

4. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 3, wherein a series of isofield lines are used to delineate the safety field, a top view projection of the series of isofield lines is a region covered by a series of ellipses, with a region covered by a smallest center ellipse being a core domain, a region between the smallest center ellipse and a second ellipse being a limited domain, and a region between the second ellipse and a largest ellipse being an extended domain, size and shape of the region are determined by the series of isofield lines, are related to a vehicle shape and a motion state, and are described based on the Gaussian function, a direction of the safety field is aligned with a direction of vehicle motion, when a vehicle is in motion, the risk center O(x.sub.0, y.sub.0) of the safety field will be transferred to a new risk center O(x.sub.0, y.sub.0): ${\begin{matrix} x_{0}^{} = x_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \cos \\ y_{0}^{} = y_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \sin \end{matrix}$ wherein k.sub.v denotes a moderating factor and k.sub.v custom character {(1, 0)(0,1)}, k.sub.v's positive or negative is related to the direction of vehicle motion, and denotes an angle of a transfer vector k.sub.v|{right arrow over (v)}| with axes in a Cartesian coordinate system, a virtual vehicle, with a vehicle length l.sub.v and a vehicle width w.sub.v, is formed under risk center transfer, a dynamic safety field: $S_{dyn} = C_{a} .Math. \exp (- \frac{{(x - x_{0}^{})}^{2}}{{(a_{x}^{})}^{2}} - \frac{{(y - y_{0}^{})}^{2}}{{(b_{y}^{})}^{2}})$ wherein S.sub.syn denotes a dynamic field strength and a new aspect ratio is denoted as =a.sub.x/b.sub.y=l.sub.v/w.sub.v, as the motion state of the virtual vehicle changes, a shape of a Gaussian safety field changes, thus changing three fields covered by the safety field: a core region, a limited region and an extended region.

5. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 4, wherein a risk perception is categorized into three types on a planar scale based on different levels of human driving reaction time: a first cognitive domain, a second cognitive domain and an extra-domain space, wherein: the first cognitive domain: $\begin{matrix} a_{x}^{} s_{th 1} \\ s_{th 1} = t_{c 1} .Math. v_{e} \end{matrix}$ the second cognitive domain: $\begin{matrix} s_{th 1} < a_{x}^{} s_{th 2} \\ s_{th 2} = t_{c 2} .Math. v_{e} \end{matrix}$ the extra-domain space: $s_{th 2} < a_{x}^{}$ wherein s.sub.th1 denotes a first cognitive domain threshold, obtained from a human driving a first reaction time t.sub.c1 and a maximum approach speed v.sub.e of other nodes relative to the self-vehicle, s.sub.th2 denotes a second cognitive domain threshold, obtained from a human driving a second reaction time t.sub.c2 and the maximum approach speed v.sub.e of the other nodes relative to the self-vehicle, establish a risk perception function between nodes under a dynamic safety field model: $Risk (p_{i}, p_{j}) = .Math. \overset{.fwdarw.}{S_{i, j}} .Math. \exp (- k_{c} .Math. \overset{.fwdarw.}{v} .Math. \cos_{i, j})$ wherein $.Math. \overset{.fwdarw.}{S_{i, j}} .Math.$ denotes a field strength of the node p.sub.i at the node p.sub.j, k.sub.c denotes a risk-adjustment cognitive coefficient, |{right arrow over (v)}.sub.j| denotes a scalar velocity of the node p.sub.j, and .sub.i,j denotes an angle (positive in a clockwise direction) between a velocity vector {right arrow over (v)}.sub.j of the node p.sub.j and a field strength vector $\overset{.fwdarw.}{S_{i, j}},$ a risk value Risk(p.sub.i, p.sub.j), obtained through the risk perception function, indicates a coupling strength between nodes, wherein the risk value and the coupling strength are positively correlated.

6. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 1, wherein a CARLA simulator is used as an interaction environment, the CARLA simulator realizes vehicle control by inputting control quantities of a steering, a throttle and a brake, wherein steering [1, 1], throttle [0, 1] and brake [0, 1], based on a CARLA simulator's control method, a reinforcement learning action space [1, 1].sup.2, is categorized into the steering and a throttle-brake, when outputting the throttle-brake, [1, 0] denotes the brake and [0, 1] denotes the throttle, the driving control system outputs two parameters of the beta distribution by the reinforcement learning module, and then obtains a policy action by sampling: $Beta = B (,),, > 0$ an interaction process produces an experience, described by a tuple, containing a previous moment state quantity, the policy action, a reward function, a next moment state quantity, and the dynamic driving suggestion, calculate a weighted reward function with a mapped risk value as a weight for a termination state-related reward: $\begin{matrix} r = r_{speed} + r_{position} + r_{action} + Activate (Risk) * r_{terminal} \\ r_{speed} = \frac{1 - .Math. v - v_{desire} .Math.}{v_{\max}} \\ r_{position} = - 0.5 * d - \end{matrix}$ wherein r.sub.speed denotes a speed-related reward function, r.sub.position denotes a position-related reward function, r.sub.action denotes an action-related reward function, r.sub.terminal denotes a termination state-related reward function, v denotes a vehicle speed, v.sub.desire denotes a desired speed, and v.sub.max=6 m/s denotes a maximum speed, d denotes a vehicle lateral distance from the desired path, and denotes an angle between a vehicle traveling direction and a tangent line of the desired path, table 1 describes values of the r.sub.action and the r.sub.terminal in detail, wherein steering denotes an amount of steering wheel angle change in two frames TABLE-US-00002 TABLE 1 Reward Condition Value r.sub.action steering 0.01 1 v r.sub.terminal Run red light 1 v Run stop sign 1 v Collision 1 v Route deviation 1 Blocked 1.

7. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 6, wherein in a training process, a parameter updating is performed through following loss functions for the reinforcement learning module: $\begin{matrix} _{k + 1} = \arg \max_{} \underset{_{_{k}}}{E} [_{ppo} +_{\exp} +_{risk}] \\ _{\exp} = -_{\exp} * H (_{} (.Math. .Math. i_{RL}, m_{RL})) \\ H (_{}) = - KL (_{} .Math. (- 1, 1)) \\ _{risk} =_{risk} * {T - N_{z} + 1, .Math., T} (k) \\ * KL (_{} (.Math. .Math. i_{RL, k}, m_{RL, k}) .Math. B_{risk}) \end{matrix}$ wherein custom character .sub.ppo denotes a clipped policy gradient loss with advantages estimated using a generalized advantage estimation, .sub.exp denotes a maximum entropy loss, H(.sub.(.Math.|i.sub.RL, m.sub.RL)) denotes an entropy of a policy .sub. under an image input i.sub.RL and a measurement input m.sub.RL, and custom character (1, 1) denotes a uniform distribution, L.sub.exp encourage the given driving agent to explore by converging an action distribution to the uniform distribution, .sub.exp denotes a weight of the maximum entropy loss, .sub.risk denotes dynamic risk suggestions based loss, and .sub.{TN.sub.z.sub.+1, . . . ,T}(k) denotes a calculation of a KL-divergence of the strategy output by the driving subject N.sub.z=100 steps before a termination state and the dynamic driving suggestions to realize a guidance of the given driving agent, and .sub.risk denotes a weight of a dynamic suggestions loss.

8. The complex network cognition-based FRL end-to-end autonomous driving control system according to claim 1, wherein in a training process, a parameter updating is performed through following loss functions for the reinforcement learning module: $\begin{matrix} _{k + 1} = \arg \max_{} \underset{_{_{k}}}{E} [_{ppo} +_{\exp} +_{risk}] \\ _{\exp} = -_{\exp} * H (_{} (.Math. .Math. i_{RL}, m_{RL})) \\ H (_{}) = - KL (_{} .Math. (- 1, 1)) \\ _{risk} =_{risk} * {T - N_{z} + 1, .Math., T} (k) \\ * KL (_{} (.Math. .Math. i_{RL, k}, m_{RL, k}) .Math. B_{risk}) \end{matrix}$ wherein custom character .sub.ppo denotes a clipped policy gradient loss with advantages estimated using a generalized advantage estimation, .sub.exp denotes a maximum entropy loss, H(.sub.(.Math.|i.sub.RL, m.sub.RL)) denotes an entropy of a policy .sub. under an image input i.sub.RL and a measurement input m.sub.RL, and custom character (1, 1) denotes a uniform distribution, .sub.exp encourage the given driving agent to explore by converging an action distribution to the uniform distribution, .sub.exp denotes a weight of the maximum entropy loss, .sub.risk denotes dynamic risk suggestions based loss, and .sub.{TN.sub.z.sub.+1, . . . ,T}(k) denotes a calculation of a KL-divergence of the strategy output by the driving subject N.sub.z=100 steps before a termination state and the dynamic driving suggestions to realize a guidance of the given driving agent, and .sub.risk denotes a weight of a dynamic suggestions loss.

9. A complex network cognition-based FRL end-to-end autonomous driving control method, comprising the following steps: step 1: building an urban dense traffic simulation environment in a CARLA simulator, wherein the simulation environment contains a driving subject, traffic participants, and a road infrastructure, the driving subject is a plurality of agents, modeled as Markov decision processes, respectively, and using a reinforcement learning module implemented by at least one processor for a steering wheel, a throttle and a brake control, the Markov decision process is described by a tuple (S, A, P, R, ), wherein S denotes a state set, corresponding to state quantities acquired by a measurement encoder implemented by the at least one processor and an image encoder implemented by the at least one processor, and contains a steering wheel angle, a throttle, a brake, a gear, lateral and longitudinal speeds, and a 15-channel semantic BEV; A denotes an action set, corresponding to the steering wheel, the throttle, and the brake control quantities of the driving subject; P denotes a state transfer equation p: SA.fwdarw.P(S), each state-action pair (s, a)SA has a probability distribution p(.Math.|s, a) of entering a new state after adopting an action a, in a state s; R denotes a reward function R: SXA.fwdarw.R, R(s.sub.t+1, s.sub.t, a.sub.t) denotes a reward obtained after entering the new state s.sub.t+1 from an original state s.sub.t, a goodness of performing the action is defined by the reward function; denotes a discount factor, [0, 1], is configured to compute a cumulative reward (.sub.)=.sub.i=0.sup.T.sup.ir.sub.i, wherein T denotes a current moment, .sup.i denotes a discount factor of moment i, and r.sub.i denotes an immediate reward of moment i, a solution to the Markov decision process is to find a strategy : S.fwdarw.A maximize the cumulative reward *:=argmax.sub.(.sub.), the reinforcement learning module integrates implicit state quantities output by the measurement encoder and the image encoder and outputs a corresponding optimal control policy; step 2: building the complex network cognition module implemented by the at least one processor to model a driving situation of the driving subject, establish a complex network model, and output dynamic driving suggestions based on state quantities provided by the measurement encoder through an activation function, the complex network model represents a dynamic relationship between nodes within a field range through a variable Gaussian safety field based on risk center transfer, the nodes contain the driving subject, the traffic participants, and the road infrastructure; step 3: constructing an end-to-end neural network, comprising 2 fully connected layers used by the measurement encoder, 6 convolutional layers used by the image encoder and 6 fully connected layers used by the reinforcement learning module, the neural network has two output heads, an action head and a value head, the action head outputs two parameters of a beta distribution and the value head outputs a value of the action; step 4: interacting the driving subject with a CARLA simulation environment and storing experiences in respective local replay buffers, wherein when the a number of samples reaches a certain threshold, samples of the number of samples are sampled from the respective local replay buffer according to a mini-batch, and then neural network parameters are updated according to a designed loss function; step 5: uploading the neural network parameters corresponding to each driving subject to a federated learning module implemented by the at least one processor, aggregating global parameters based on the neural network parameters according to an aggregation interval, and sending the global parameters to each agent until the neural network converges; wherein in step 1, the traffic participants comprise other vehicles and pedestrians, the road infrastructure contains traffic lights and traffic signs, represented in the image encoder input using a 4-frame semantic BEV; in step 2, the complex network cognition module takes different activation functions according to different driving suggestions: $\begin{matrix} {Activate}_{go} (Risk) = \frac{4}{(1 + \exp (- 300 / Risk)) - 1} \\ {Activate}_{stop} (Risk) = \frac{4}{(1 + \exp (- 0.2 * Risk)) - 1} \end{matrix}$ in step 2, the dynamic driving suggestions are represented using the beta distribution: $\begin{matrix} B_{risk} = B ({Activate}_{go} (Risk),, go \\ B_{risk} = B (, {Activate}_{stop} (Risk)), stop \end{matrix}$ in step 3, the neural network of the measurement encoder uses a Relu activation function for the 2 fully connected layers, the neural network of the image encoder uses the Relu activation function for the 5 convolutional layers except for the last convolutional layer that spreads a state volume without using any activation function, the neural network of the reinforcement learning module uses a Softplus activation function for a last layer of the action head, a last layer of the value head does not use any activation function, the other fully connected layers use the Relu activation function; in step 4, a set of parameters used in the training process: a learning rate 0.00001; a total step size 12288; a mini-batch sampling size 256; a set of loss function weights .sub.ppo 0.5, .sub.exp 0.01, and .sub.risk 0.05, respectively; a range of a PPO clip 0.2; and a pair of parameters, 0.99 and 0.9, for a generalized advantage estimation; in step 4, the loss function used for the training process uses dynamic risk suggestions based loss custom character .sub.risk, a guidance of the agent is realized by calculating a KL-divergence of the strategy and the dynamic driving suggestions output by the driving subject N.sub.z=100 steps before a termination state; in step 5, the federated learning module is a multi-agent framework that uses a local replay buffer architecture between agents; and in step 5, a global parameter aggregation uses a parameter-averaged aggregation method with the aggregation interval of 256.

10. A vehicular device, wherein the vehicular device includes A complex network cognition-based FRL end-to-end autonomous driving control system comprising: a measurement encoder, an image encoder, a complex network cognition module, a reinforcement learning module, and a federated learning module, wherein: the measurement encoder is implemented by at least one processor and is configured to obtain state quantities required by the complex network cognition module, which is implemented by the at least one processor, and the reinforcement learning module, which is implemented by the at least one processor, the state quantities required by the complex network cognition module comprise a x-coordinate, a y-coordinate, a heading angle change and a speed of a given driving agent of a plurality of driving agents, the state quantities required by the complex network cognition module are handed over to the complex network cognition module as inputs, the state quantities required by the reinforcement learning module comprise a steering wheel angle, a throttle, a brake, a gear, a lateral speed and a longitudinal speed, the state quantities required by the reinforcement learning module are given to the reinforcement learning module after extracting features from a two-layer fully connected network; the image encoder is implemented by the at least one processor and is configured to obtain an amount of image implicit state required by the reinforcement learning module, an image used is a 15-channel semantic bird's eye view (BEV), i.sub.RL[0, 1].sup.192*192*15, 192 is in pixels and the BEV used is 5 px/m, 15 channels contain a drivable domain, a desired path, a road edge, 4 frames of other vehicles, 4 frames of pedestrians, and 4 frames of traffic signs, wherein the desired path is calculated using an A* algorithm, the semantic BEV is extracted by multilayer convolutional layers to extract implicit features and then passed to the reinforcement learning module as another part of the inputs; the complex network cognition module is configured to model a driving situation of a driving subject, and to obtain a maximum risk value of the driving subject in a current driving situation according to the state quantity provided by the measurement encoder, and finally to output dynamic driving suggestions based on a risk value through an activation function; the reinforcement learning module is configured to integrate the state quantities output from the measurement encoder and the image encoder, output corresponding strategies according to integrated network inputs, and interact with an environment to generate experience samples stored in a local replay buffer in the federated learning module, which is implemented by the at least one processor, when a number of the experience samples reaches a certain threshold, a batch of sample is taken from the local replay buffer for training, and finally trained neural network parameters are uploaded to the federated learning module; and the federated learning module is configured to receive the finally trained neural network parameters uploaded by the reinforcement learning module of the plurality of driving agents, and to aggregate a set of global parameters based on the finally trained neural network parameters, and finally to send the global parameters to the plurality of driving agents until a neural network converges, a global parameter aggregation is performed by a following equation: $_{m}^{*} = \frac{1}{N} \underset{n}{.Math.}_{m}^{n}$ wherein .sub.m* denotes the global parameters at time m, N denotes a number of the plurality of driving agents, and .sub.m.sup.n denotes the neural network parameters at time m of an nth driving agent of the plurality of driving agents; wherein the activation function is configured to map the risk value, Activate(Risk) represents different activation functions according to different driving suggestions, and the mapped risk value will be used as a basis for guiding an output strategy of the reinforcement learning module: ${Activate}_{go} (Risk) = \frac{4}{(1 + \exp (- 300 / (Risk))} - 1$ ${Activate}_{stop} (Risk) = \frac{4}{(1 + \exp (- 0.2 * (Risk))} - 1$ wherein Activate.sub.go(Risk) denotes an activation function when a driving suggestion is forward, Activate.sub.stop(Risk) denotes an activation function when the driving suggestion is stop, and Risk denotes a current risk value of a self-vehicle, a dynamic risk suggestion B.sub.risk: $B_{r i s k} = B ({Activate}_{g o} (Risk),), go$ $B_{r i s k} = B (, {Activate}_{stop} (Risk)), stop$ wherein B denotes a beta distribution with .sub.stop=.sub.go=1.

11. The vehicular device according to claim 10, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, a modeling process of the complex network cognition module constructs a dynamic complex network model with the traffic participant and the road infrastructure as nodes; $G_{t} = {(P, E, W,)}_{t}$ wherein G.sub.t denotes a dynamic complex network at a moment t, P={p.sub.1, p.sub.2 . . . , p.sub.N} is a set of nodes, and a number of nodes is N; E={e.sub.1,2, e.sub.1,3, . . . , e.sub.i,j} is a set of edges, and a number of edges is $\frac{N (N - 1)}{2},$ and e.sub.i,j stands for a connectivity between nodes p.sub.i and p.sub.j; W={w.sub.1,2, w.sub.1,3, . . . , w.sub.i,j} is a set of weights of the edges, w.sub.i,j represents a coupling strength between nodes p.sub.i and p.sub.j; is an active region of the nodes, representing dynamic constraints on the set of nodes in the dynamic complex network, model as a smooth bounded surface: $\begin{matrix} F_{} (x, y, z) = 0, & s . t . & (x, y) \end{matrix}$ wherein is a boundary of a representation of a slip surface, consider a continuous time dynamic complex network with N nodes on with a node state equation of a form: ${\overset{}{X}}_{i} = A_{i} X_{i} + B_{i} U_{i}$ wherein X.sub.iR.sup.m denotes a state vector of the node p.sub.i, R.sup.m denotes a vector space consisting of m-dimensional real numbers R, U.sub.iR.sup.q is an input vector, R.sup.q denotes a vector space consisting of q-dimensional real numbers R, A.sub.i denotes a dynamic matrix, B.sub.i denotes an input matrix, based on the node state equation, an output vector of the node p.sub.i can be obtained: $Y_{i} = f_{i} (X_{i})$ wherein f.sub.i denotes an output function of the p.sub.i node, then a weight function between the nodes p.sub.i and p.sub.j is: $w_{i j} = F (Y_{i}, Y_{j})$ wherein F denotes the weight function between the nodes p.sub.i and p.sub.j.

12. The vehicular device according to claim 11, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, a Gaussian function is used in the dynamic complex network to reveal a static property between nodes: $S_{s t a} = C_{a} .Math. \exp (- \frac{{(x - x_{0})}^{2}}{a_{x}^{2}} - \frac{{(y - y_{0})}^{2}}{b_{y}^{2}})$ wherein S.sub.sta denotes a static field strength, C.sub.a denotes a field strength coefficient, x.sub.0 and y.sub.0 denote a coordinate of a risk center O(x.sub.0, y.sub.0), and a.sub.x and b.sub.y denote vehicle appearance coefficients, respectively, a safety field is characterized by shape anisotropy: $\begin{matrix} = \frac{a_{x}^{2} - b_{y}^{2}}{a_{x}^{2} + b_{y}^{2}} = \frac{^{2} - 1}{^{2} + 1} \\ = a_{x} / b_{y} = l_{v} / w_{v} \end{matrix}$ wherein is an aspect ratio, l.sub.v denotes a vehicle length, and w.sub.v denotes a vehicle width.

13. The vehicular device according to claim 12, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, a series of isofield lines are used to delineate the safety field, a top view projection of the series of isofield lines is a region covered by a series of ellipses, with a region covered by a smallest center ellipse being a core domain, a region between the smallest center ellipse and a second ellipse being a limited domain, and a region between the second ellipse and a largest ellipse being an extended domain, size and shape of the region are determined by the series of isofield lines, are related to a vehicle shape and a motion state, and are described based on the Gaussian function, a direction of the safety field is aligned with a direction of vehicle motion, when a vehicle is in motion, the risk center O(x.sub.0, y.sub.0) of the safety field will be transferred to a new risk center O(x.sub.0, y.sub.0): ${\begin{matrix} x_{0}^{'} = x_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \cos \\ y_{0}^{'} = y_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \sin \end{matrix}$ wherein k.sub.v denotes a moderating factor and k.sub.v custom character {(1, 0)(0, 1)}, k.sub.v's positive or negative is related to the direction of vehicle motion, and denotes an angle of a transfer vector k.sub.v|{right arrow over (v)}| with axes in a Cartesian coordinate system, a virtual vehicle, with a vehicle length l.sub.v and a vehicle width w.sub.v, is formed under the risk center transfer, a dynamic safety field: $S_{dy n} = C_{a} .Math. \exp (- \frac{{(x - x_{0}^{'})}^{2}}{{(a_{x}^{'})}^{2}} - \frac{{(y - y_{0}^{'})}^{2}}{{(b_{y}^{'})}^{2}})$ wherein S.sub.dyn denotes a dynamic field strength and a new aspect ratio is denoted as =a.sub.x/b.sub.y=l.sub.v/w.sub.v, as the motion state of the virtual vehicle changes, a shape of a Gaussian safety field changes, thus changing three fields covered by the safety field: a core region, a limited region and an extended region.

14. The vehicular device according to claim 13, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, a risk perception is categorized into three types on a planar scale based on different levels of human driving reaction time: a first cognitive domain, a second cognitive domain and an extra-domain space, wherein: the first cognitive domain: $\begin{matrix} a_{x}^{'} s_{t h 1} \\ s_{t h 1} = t_{c 1} .Math. v_{e} \end{matrix}$ the second cognitive domain: $\begin{matrix} s_{t h 1} < a_{x}^{'} s_{t h 2} \\ s_{t h 2} = t_{c 2} .Math. v_{e} \end{matrix}$ the extra-domain space: $s_{t h 2} < a_{x}^{'}$ wherein s.sub.th1 denotes a first cognitive domain threshold, obtained from a human driving a first reaction time t.sub.c1 and a maximum approach speed v.sub.e of other nodes relative to the self-vehicle, s.sub.th2 denotes a second cognitive domain threshold, obtained from a human driving a second reaction time t.sub.c2 and the maximum approach speed v.sub.e of the other nodes relative to the self-vehicle, establish a risk perception function between nodes under a dynamic safety field model: $Risk (p_{i}, p_{j}) = .Math. \overset{.fwdarw.}{S_{i, j}} .Math. \exp (- k_{c} .Math. \overset{.fwdarw.}{v_{j}} .Math. \cos_{i, j})$ wherein |{right arrow over (S.sub.i,j)}| denotes a field strength of the node p.sub.i at the node p.sub.j, k.sub.c denotes a risk-adjustment cognitive coefficient, |{right arrow over (v.sub.j)}| denotes a scalar velocity of the node p.sub.j, and .sub.i,j denotes an angle (positive in a clockwise direction) between a velocity vector {right arrow over (v.sub.j)} of the node p.sub.j and a field strength vector {right arrow over (S.sub.i,j)}, a risk value Risk(p.sub.i, p.sub.j), obtained through the risk perception function, indicates a coupling strength between nodes, wherein the risk value and the coupling strength are positively correlated.

15. The vehicular device according to claim 10, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, the CARLA simulator is used as an interaction environment, the CARLA simulator realizes vehicle control by inputting control quantities of a steering, the throttle and the brake, wherein steering [1, 1], throttle [0, 1] and brake [0, 1], based on a CARLA simulator's control method, a reinforcement learning action space [1, 1].sup.2, is categorized into the steering and a throttle-brake, when outputting the throttle-brake, [1, 0] denotes the brake and [0, 1] denotes the throttle, the driving control system outputs two parameters of the beta distribution by the reinforcement learning module, and then obtains a policy action by sampling: $Beta = B (,),, > 0$ an interaction process produces an experience, described by a tuple, containing a previous moment state quantity, the policy action, the reward function, a next moment state quantity, and the dynamic driving suggestion, calculate a weighted reward function with a mapped risk value as a weight for a termination state-related reward: $\begin{matrix} r = r_{s p e e d} + r_{p o s i t i o n} + r_{a c t i o n} + Activate (Risk) * r_{terminal} \\ r_{s p e e d} = \frac{1 - .Math. v - v_{d e s i r e} .Math.}{v_{\max}} \\ r_{position} = - 0.5 * d - \end{matrix}$ wherein r.sub.speed denotes a speed-related reward function, r.sub.position denotes a position-related reward function, r.sub.action denotes an action-related reward function, r.sub.terminal denotes a termination state-related reward function, v denotes a vehicle speed, v.sub.desire denotes a desired speed, and v.sub.max=6 m/s denotes a maximum speed, d denotes a vehicle lateral distance from the desired path, and denotes an angle between a vehicle traveling direction and a tangent line of the desired path, table 1 describes values of the r.sub.action and the r.sub.terminal in detail, wherein steering denotes an amount of steering wheel angle change in two frames TABLE-US-00003 TABLE 1 Reward Condition Value r.sub.action steering 0.01 1 v r.sub.terminal Run red light 1 v Run stop sign 1 v Collision 1 v Route deviation 1 Blocked 1.

16. The vehicular device according to claim 10, wherein in the complex network cognition-based FRL end-to-end autonomous driving control system, in a training process, a parameter updating is performed through following loss functions for the reinforcement learning module: $\begin{matrix} _{k + 1} = \arg \max_{} \underset{~_{_{k}}}{E} [_{p p o} +_{\exp} +_{r i s k}] \\ _{ex p} = -_{\exp} * H (_{} (.Math. .Math. i_{RL}, m_{R L})) \\ H (_{}) = - KL (_{} .Math. (- 1, 1)) \\ _{risk} =_{r i s k} * {T - N_{z} + 1, .Math., T} (k) * KL (_{} (.Math. .Math. i_{R L, k}, m_{R L, k}) .Math. B_{r i s k}) \end{matrix}$ wherein custom character .sub.ppo denotes a clipped policy gradient loss with advantages estimated using a generalized advantage estimation, .sub.exp denotes a maximum entropy loss, H(.sub.(.Math.|i.sub.RL, m.sub.RL)) denotes an entropy of a policy .sub. under an image input i.sub.RL and a measurement input m.sub.RL, and custom character (1, 1) denotes a uniform distribution, .sub.exp encourage the given driving agent to explore by converging an action distribution to the uniform distribution, .sub.exp denotes a weight of the maximum entropy loss, .sub.risk denotes dynamic risk suggestions based loss, and .sub.{TN.sub.z.sub.+1, . . . ,T}(k) denotes a calculation of the KL-divergence of the strategy output by the driving subject N.sub.z=100 steps before a termination state and the dynamic driving suggestions to realize the guidance of the given driving agent, and .sub.risk denotes a weight of a dynamic suggestions loss.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is the complex network cognition based FRL framework.

(2) FIGS. 2A-2D show the schematic of risk perception based on dynamic safety field.

(3) FIG. 3 is the schematic of the reward function.

(4) FIG. 4 is the schematic of the neural network.

(5) FIG. 5 is the diagram of the federated learning architecture.

(6) FIG. 6 is the diagram of the FLDPPO framework.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(7) The technical solution of the present disclosure is described in detail below in conjunction with the drawings, but is not limited to the contents of the present disclosure.

(8) The present disclosure provides a complex network cognition-based FRL end-to-end algorithmic framework that enables autonomous driving under dense urban traffic, specifically including the following steps:

(9) (1) The framework of FRL algorithm based on complex network cognition is built in CARLA simulator, as shown in FIG. 1 and FIG. 6. The framework includes a measurement encoder, an image encoder, a complex network cognition module, a reinforcement learning module, and a federated learning module. The measurement encoder is configured to obtain x-coordinate, y-coordinate, heading angle change and speed of the driving agent. The cognition state quantities are handed over to the complex network cognition module as input. The measurement encoder is also used to obtain steering wheel angle, throttle, brake, gear, lateral speed and longitudinal speed. The RL state quantities are given to the reinforcement learning module as part of the inputs after extracting features from the two-layer fully connected network. The image encoder used a 15-channel semantic BEV, i.sub.RL[0, 1].sup.192*192*15. Where, 192 is in pixels and the BEV used is 5 px/m. The 15 channels contain a drivable domain, a desired path, a road edge, 4 frames of other vehicles, 4 frames of pedestrians, and 4 frames of traffic signs. The desired path is calculated using the A* algorithm. The semantic BEV is extracted by multilayer convolutional layers to extract implicit features and then passed to the reinforcement learning module as another part of the inputs.

(10) (2) Model the driving situation of the driving subject, as shown in FIGS. 2A-2D, a dynamic complex network model is constructed with traffic participants and road infrastructure as nodes:

(11) $G_{t} = {(P, E, W,)}_{t}$

(12) Where, G.sub.t denotes the dynamic complex network at the moment t, P={p.sub.1, p.sub.2, . . . , p.sub.N} is the set of nodes, and the number of nodes is N; E={e.sub.1,2, e.sub.1,3, . . . , e.sub.i,j} is the set of edges, and the number of edges is

(13) $\frac{N (N - 1)}{2},$
and e.sub.i,j stands for the connectivity between nodes p.sub.i and p.sub.j; W={w.sub.1,2, w.sub.1,3, . . . , w.sub.i,j} is the set of weights of the edges, w.sub.i,j represents the coupling strength between nodes p.sub.i and p.sub.j; is the active region of the nodes, representing the dynamic constraints on the nodes in the network. Model as a smooth bounded surface:

(14) $F_{} (x, y, z) = 0,$ $s . t . (x, y)$

(15) Where, is the boundary of the representation of the slip surface. Consider a continuous time dynamic network with N nodes on with a node state equation of the form:

(16) ${\dot{X}}_{i} = A_{i} X_{i} + B_{i} U_{i}$

(17) Where, X.sub.iR.sup.m denotes the state vector of node p.sub.i, R.sup.m denotes the vector space consisting of m-dimensional real numbers R, U.sub.iR.sup.q is the input vector, R.sup.q denotes the vector space consisting of q-dimensional real numbers R, A.sub.i denotes the dynamics matrix, B.sub.i denotes the input matrix. Based on the node state equation, the output vector of node p.sub.i can be obtained:

(18) $Y_{i} = f_{i} (X_{i})$

(19) Where, f.sub.i denotes the output function of the node. Then the weight function between nodes p.sub.i and p.sub.j is:

(20) $w_{i j} = F (Y_{i}, Y_{j})$

(21) Where, F denotes the weight function between the nodes. The present disclosure uses a Gaussian function to reveal the static properties between nodes:

(22) $S_{s t a} = C_{a} .Math. \exp (- \frac{{(x - x_{0})}^{2}}{a_{x}^{2}} - \frac{{(y - y_{0})}^{2}}{b_{y}^{2}})$

(23) Where, S.sub.sta denotes the static field strength, C.sub.a denotes the field strength coefficients, x.sub.0 and y.sub.0 denote the coordinates of the risk center O(x.sub.0, y.sub.0), and a.sub.x and b.sub.y denote the vehicle appearance coefficients, respectively. The safety field is characterized by shape anisotropy:

(24) 0 $= \frac{a_{x}^{2} - b_{y}^{2}}{a_{x}^{2} + b_{y}^{2}} = \frac{^{2} - 1}{^{2} + 1}$ $= a_{x} / b_{y} = l_{v} / w_{v}$

(25) Where, is the aspect ratio, l.sub.v denotes the vehicle length, and w.sub.v denotes the vehicle width. The safety field is delineated by a series of isofields, and the top view projection is the region covered by a series of ellipses as shown in FIGS. 2A-2D, with the region covered by the smallest center ellipse being the core domain, the region between the smallest center ellipse and the second ellipse being the limited domain, and the region between the second ellipse and the largest ellipse being the extended domain. The size and shape of the region are determined by the isofield lines, are related to the vehicle shape and motion state and are described based on a Gaussian function. The direction of the safety field is aligned with the direction of vehicle motion.

(26) When the vehicle is in motion, the risk center O(x.sub.0, y.sub.0) of the safety field will be transferred to a new risk center O(x.sub.0, y.sub.0):

(27) ${\begin{matrix} x_{0}^{} = x_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \cos \\ y_{0}^{} = y_{0} + k_{v} .Math. \overset{.fwdarw.}{v} .Math. \sin \end{matrix}$

(28) Where, k.sub.v denotes the moderating factor and k.sub.v custom character {(1, 0)(0,1)}, k.sub.v's positive or negative is related to the direction of motion, and denotes the angle of the transfer vector k.sub.v|{right arrow over (v)}| with the axes in the Cartesian coordinate system. A virtual vehicle, with length l.sub.v and width w.sub.v, is formed under the risk center transfer. The dynamic safety field:

(29) $S_{dyn} = C_{a} .Math. \exp (- \frac{{(x - x_{0}^{})}^{2}}{{(a_{x}^{})}^{2}} - \frac{{(y - y_{0}^{})}^{2}}{{(b_{y}^{})}^{2}})$

(30) Where, S.sub.dyn denotes the dynamic field strength and the new aspect ratio is denoted as =a.sub.x/b.sub.y=l.sub.v/w.sub.v. As the motion state of the vehicle changes, the shape of the Gaussian safety field changes, thus changing the three fields covered by the safety field: the core region, the limited region and the extended region.

(31) The present disclosure categorizes risk perception into three layers on a planar scale based on different levels of human driving reaction time: the first cognitive domain, the second cognitive domain and the extra-domain space.

(32) The first cognitive domain:

(33) $a_{x}^{} s_{t h 1}$ $s_{t h 1} = t_{c 1} .Math. v_{e}$

(34) The second cognitive domain:

(35) $s_{t h 1} < a_{x}^{} s_{t h 2}$ $s_{t h 2} = t_{c 2} .Math. v_{e}$

(36) The extra-domain space:

(37) $s_{t h 2} < a_{x}^{}$

(38) Where, s.sub.th1 denotes the first cognitive domain threshold, obtained from the human driving the first reaction time t.sub.c1 and the maximum approach speed v.sub.e of the other nodes relative to the self-vehicle. s.sub.th2 denotes the second cognitive domain threshold, obtained from the human driving the second reaction time t.sub.c2 and the maximum approach speed v.sub.e of the other nodes relative to the self-vehicle.

(39) Establish a risk perception function between nodes under a variable safety field model:

(40) $Risk (p_{i}, p_{j}) = .Math. \overset{.fwdarw.}{S_{i j}} .Math. \exp (- k_{c} .Math. \overset{.fwdarw.}{v_{j}} .Math. \cos_{i, j})$

(41) Where, |{right arrow over (S.sub.i,j)}| denotes the field strength of node p.sub.i at node p.sub.j, k.sub.c denotes the risk-adjustment cognitive coefficient, |{right arrow over (v.sub.j)}| denotes the scalar velocity of node p.sub.j, and .sub.i,j denotes the angle (positive in the clockwise direction) between the velocity vector {right arrow over (v.sub.j)} of node p.sub.j and the field strength vector {right arrow over (S.sub.i,j)}. The risk value Risk(p.sub.i, p.sub.j), obtained through the risk perception function, indicates the coupling strength between the nodes, and the higher the risk value, the higher the coupling strength, implying the higher correlation between the nodes.

(42) The activation function is configured to map the risk value, Activate(Risk) represents different activation functions according to different driving suggestions, and the mapped risk value will be used as the basis for guiding the output strategy of the reinforcement learning module:

(43) ${Activate}_{go} (Risk) = \frac{4}{(1 + \exp (- 300 / Risk)) - 1}$ ${Activate}_{stop} (Risk) = \frac{4}{(1 + \exp (- 0.2 * R i s k)) - 1}$

(44) Where, Activate.sub.go(Risk) denotes the activation function when the suggestion is forward, Activate.sub.stop(Risk) denotes the activation function when the suggestion is stop, and Risk denotes the current risk value of the self-vehicle. The dynamic risk suggestion B.sub.risk:

(45) $B_{r i s k} = B ({Activate}_{g o} (Risk),_{g o}), go$ $B_{r i s k} = B (_{stop}, {Activate}_{stop} (Risk)), stop$

(46) Where, B denotes the beta distribution with .sub.stop=.sub.go=1.

(47) (3) Construct the reinforcement learning model of the driving subject. Based on CARLA's control method, the reinforcement learning action space [1, 1].sup.2, are categorized into steering wheel corner and throttle brake. When outputting the throttle brake, [1,0] denotes the brake and [0,1] denotes the throttle. The present disclosure outputs the two parameters of the beta distribution by reinforcement learning, and then obtains the policy actions by sampling:

(48) $Beta = B (,),, > 0$

(49) In contrast to Gaussian distributions, are commonly used for model-free reinforcement learning, beta distributions are bounded and do not require mandatory constraints.

(50) The reward function setting, as shown in FIG. 3, considers from four aspects: velocity, coordinates, action and termination state. Calculate the weighted reward function with the mapped risk value as a weight for termination state-related reward:

(51) 0 $r = r_{speed} + r_{position} + r_{action} + Activate (Risk) * r_{terminal} r_{speed} = \frac{1 - .Math. v - v_{desire} .Math.}{v_{\max}} r_{position} = - 0.5 * d -$

(52) Where, r.sub.speed denotes the speed-related reward function, r.sub.position denotes the position-related reward function, r.sub.action denotes the action-related reward function, r.sub.terminal denotes the termination state-related reward function, v denotes the vehicle speed, v.sub.desire denotes the desired speed, and v.sub.max=6 m/s denotes the maximum speed. d denotes the vehicle lateral distance from the desired path, and denotes the angle between the vehicle traveling direction and the tangent line of the desired path. Table 1 describes the values of r.sub.action and r.sub.terminal in detail, where steering denotes the amount of steering wheel angle change in two frames.

(53) Construct an end-to-end neural network, as shown in FIG. 4, comprising 2 fully connected layers used by the measurement encoder, 6 convolutional layers used by the image encoder and 6 fully connected layers used by the reinforcement learning module. The neural network has two output heads, action head and value head. The action head outputs two parameters of the beta distribution and the value head outputs the value of the action.

(54) (4) The driving subjects interact with the CARLA simulation environment and the experiences are stored in their respective local replay buffers. As shown in FIG. 5, when the number of samples reaches a certain threshold, the samples are sampled from the respective local replay buffer according to the mini-batch, and then the neural network parameters are updated according to the designed loss function:

(55) $_{k + 1} = \arg \max_{} \underset{~_{_{k}}}{E} [_{ppo} +_{\exp} +_{risk}]_{\exp} = -_{\exp} * H (_{} (.Math. .Math. i_{RL}, m_{RL})) H (_{}) = - KL (_{} .Math. (- 1, 1))_{risk} =_{risk} * {T - N_{z} + 1, .Math., T} (k) * KL (_{} (.Math. .Math. i_{RL, k}, m_{RL, k}) .Math. B_{risk})$

(56) Where, custom character .sub.ppo denotes the clipped policy gradient loss with advantages estimated using generalized advantage estimation. .sub.exp denotes the maximum entropy loss, H(.sub.(.Math.|i.sub.RL, m.sub.RL)) denotes the entropy of the policy .sub. under the image input i.sub.RL and the measurement input m.sub.RL, and custom character (1, 1) denotes the uniform distribution. .sub.exp encourage the agent to explore by converging the action distribution to a uniform distribution, .sub.exp denotes the weight of the maximum entropy loss. .sub.risk denotes the loss based on the dynamic risk suggestions, and .sub.{TN.sub.z.sub.+1, . . . ,T}(k) denotes the calculation of the KL-divergence of the strategy output by the driving subject N.sub.z=100 steps before the termination state and the dynamic driving suggestions to realize the guidance of the agent, and .sub.risk denotes the weight of the dynamic suggestions loss.

(57) (5) The federated learning module is configured to receive the neural network parameters uploaded by the reinforcement learning module of each agent, and to aggregate the global parameters based on the plurality of neural network parameters, and finally to send the global parameters to each agent until the network converges. The global parameter aggregation is performed by the following equation:

(58) $_{m}^{*} = \frac{1}{N} \underset{n}{.Math.}_{m}^{n}$

(59) Where, .sub.m* denotes the global parameters at time m, N denotes the number of agents, and .sub.m.sup.n denotes the neural network parameters at time m of the nth agent.

(60) Overall, the present disclosure proposes FLDPPO, a complex network cognition based FRL algorithmic framework for urban autonomous driving in dense traffic scenarios. The FLDPPO algorithm realizes the combination of rule-based complex network cognition and end-to-end FRL by designing the loss function. The dynamic driving suggestions guide the agents to learn the rules, enabling them to cope with complex urban driving environments and dense traffic scenarios. The present disclosure introduces federated learning to train models by the method of parameter aggregation. The federated learning architecture accelerates network convergence, reduces communication consumption. The multi-agent architecture in the algorithmic framework not only improves the sample efficiency, but the trained models also exhibit high robustness and generalization.

(61) The present disclosure also proposes a vehicular device, the vehicular device being capable of executing the contents of the complex network cognition-based FRL end-to-end autonomous driving control system, or complex network cognition-based FRL end-to-end autonomous driving control method.

(62) A serious of detailed descriptions above are only specific descriptions of the practicable mode of implementation of the present disclosure and are not intended to limit the scope of protection of the present disclosure. Any equivalent mode or modification that does not depart from the technology of the present disclosure is included in the scope of protection of the present disclosure.

Complex network cognition-based federated reinforcement learning end-to-end autonomous driving control system, method, and vehicular device

Assignee

Inventors

Cpc classification

Classification Explorer

B60W50/08

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

B60W2520/12

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06V10/774

PHYSICS

Classification Explorer

B60W60/001

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06N3/098

PHYSICS

Classification Explorer

G06N3/092

PHYSICS

Classification Explorer

Y02T10/40

GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

Classification Explorer

B60W2510/188

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B60W2520/10

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

G06V20/56

PHYSICS

Classification Explorer

G05B13/027

PHYSICS

International classification

Classification Explorer

B60W50/08

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B60W60/00

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G05B13/02

PHYSICS

Classification Explorer

G06N3/092

PHYSICS

Classification Explorer

G06N3/098

PHYSICS

Abstract

Claims

Description