Training an Artificial Intelligence Unit for an Automated Vehicle

20230147000 · 2023-05-11

    Abstract

    Systems and methods for training an artificial intelligence unit for an automated vehicle are provided. The artificial intelligence unit includes a knowledge configuration. The artificial intelligence unit determines an evaluation value for at least two motion actions for the automated vehicle, considering an input state and the knowledge configuration. The input state characterizes the automated vehicle and at least one other road user. The system selects one motion action from the at least two motion actions, considering the evaluation value of the respective motion actions, and trains the artificial intelligence unit by adapting the knowledge configuration of the artificial intelligence unit based on the selected motion action. The knowledge configuration characterizes at least the empowerment of the at least one other road user.

    Claims

    1-10. (canceled)

    11. A system for training an artificial intelligence unit for an automated vehicle, comprising: a processor; a memory in communication with the processor, the memory storing a plurality of instructions executable by the processor to cause the system to implement: an artificial intelligence unit comprising: a knowledge configuration, wherein the artificial intelligence unit is configured to: determine an evaluation value for at least two motion actions for the automated vehicle based on an input state and based on the knowledge configuration (KC), wherein the input state characterizes the automated vehicle and at least one other road user, wherein the memory further comprises instructions to cause the system to:  select one motion action from the at least two motion actions based on the evaluation value of the respective motion actions; and  train the artificial intelligence unit by adapting the knowledge configuration of the artificial intelligence unit based on the selected motion action, wherein the knowledge configuration characterizes at least an empowerment of the at least one other road user.

    12. The system according to claim 11, wherein the empowerment of the at least one other road user is at least characterized by a number of possible future motion actions of the at least one other road user.

    13. The system according to claim 11, wherein the knowledge configuration further characterizes a reward with respect to the automated vehicle reaching a goal.

    14. The system according to claim 11, wherein the knowledge configuration further characterizes a distance between the automated vehicle and the other road user.

    15. The system according to claim 11, wherein a first motion action is determined to have a higher evaluation value than a second motion action when the first motion action provides the at least one other road user a higher number of possible future motion actions than the second motion action.

    16. The system according to claim 11, wherein a first motion action is determined to have a higher evaluation value than a second motion action when a future state of an environment of the automated vehicle is more predictable for the first motion action than for the second motion action.

    17. The system according to claim 11, wherein a first motion action is determined to have a higher evaluation value than a second motion action when a probability of occurrence of a future state of an environment of the automated vehicle is higher when the automated vehicle would perform the first motion action than a probability of occurrence of a future state of an environment of the automated vehicle when the automated vehicle would perform the second motion action.

    18. The system according to claim 11, wherein the artificial intelligence unit is further configured to: predict a future state of an environment of the automated vehicle for each of the motion actions for the automated vehicle, with the artificial intelligence unit determining two probabilities of occurrence for each of the future states of the environment of the automated vehicle, wherein a first probability of occurrence is a conditional probability given the occurrence of the respective motion action, a second probability of occurrence is independent of the occurrence of the respective motion action, and the artificial intelligence unit determines an evaluation value for at least two motion actions for the automated vehicle such that a first motion action is determined to have a higher evaluation value than a second motion action when a difference of the two probabilities for the first motion action is higher than a difference of the two probabilities for the second motion action.

    19. The system according to claim 11, wherein the artificial intelligence unit is a reinforcement learning unit.

    20. A method for training an artificial intelligence unit for an automated vehicle, wherein the artificial intelligence unit comprises a knowledge configuration and determines or reads out an evaluation value for at least two motion actions for the automated vehicle, the method comprising: selecting one motion action from the at least two motion actions based on the evaluation value of the respective motion actions, wherein the evaluation value considers an input state that characterizes the automated vehicle and at least one other road user and the evaluation value considers the knowledge configuration, and training the artificial intelligence unit by adapting the knowledge configuration of the artificial intelligence unit considering the selected motion action, wherein the knowledge configuration characterizes at least an empowerment of the at least one other road user.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0049] FIG. 1 shows an example traffic situation,

    [0050] FIG. 2 shows the basic principle of reinforcement learning,

    [0051] FIG. 3 shows an example structure of the system for training the artificial intelligence unit, and

    [0052] FIG. 4 shows an example for the knowledge configuration of the artificial intelligence unit.

    DETAILED DESCRIPTION OF THE DRAWINGS

    [0053] FIG. 1 shows an example traffic situation on a road that comprises three lanes L0, L1, L2. The automated vehicle EGO is driving on the middle lane L1, one road user RU1 is also driving on the middle lane L1 but in front of the automated vehicle EGO, and another road user RU2 is driving on the right lane L0.

    [0054] The automated vehicle EGO has three motion actions ma1, ma2, ma3 available, wherein one motion action ma1 is a lane change to the left lane L2, one motion action ma2 is staying in the current lane L1 and one motion action ma3 is a lane change to the right lane L0.

    [0055] Depending on the chosen motion action ma1, ma2, ma3, and the speed of the automated vehicle EGO, the at least one other road user RU1, RU2 will experience different levels of empowerment.

    [0056] For example, at the current time step, the automated vehicle EGO can perform three different motion actions ma1, ma2, ma3, because it is driving on the middle lane L1. The at least one other road user RU2 can only stay on its current lane L0 at the current time step.

    [0057] However, after a lane change of the automated vehicle EGO to the left lane L2, the at least one other road user RU2 will have two motion actions available, because it can then either stay on its current lane L0 or switch to the middle lane L1.
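
    The lane example above can be sketched in a few lines of code. This is only an illustration under stated assumptions: lanes are reduced to indices, longitudinal positions are ignored, and a neighbouring lane counts as blocked whenever another vehicle occupies it. The helper names `available_actions` and `empowerment` are hypothetical and do not come from the application.

```python
# Hypothetical lane model for the FIG. 1 scene (an assumption, not the
# application's own model): 0 = right lane L0, 1 = middle lane L1,
# 2 = left lane L2. Longitudinal positions are ignored.
LANES = (0, 1, 2)


def available_actions(own_lane, occupied_lanes):
    """Return the lanes a road user can occupy next: its own lane, plus any
    adjacent lane that exists and is not occupied by another vehicle."""
    options = [own_lane]  # staying in the current lane is always possible
    for neighbour in (own_lane - 1, own_lane + 1):
        if neighbour in LANES and neighbour not in occupied_lanes:
            options.append(neighbour)
    return options


def empowerment(own_lane, occupied_lanes):
    """Empowerment proxy: the number of possible future motion actions."""
    return len(available_actions(own_lane, occupied_lanes))


# RU2 drives on L0 while EGO occupies L1, so RU2 can only stay.
before = empowerment(own_lane=0, occupied_lanes={1})
# After EGO performs ma1 (lane change to L2), L1 is free for RU2.
after = empowerment(own_lane=0, occupied_lanes={2})
print(before, after)
```

    In this toy model the count for RU2 rises from one to two once EGO leaves the middle lane, which mirrors the reasoning in paragraphs [0056] and [0057].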

    [0058] FIG. 2 shows the basic principle of reinforcement learning. The automated vehicle EGO is selecting and executing a motion action ma, which influences the environment ENV of the automated vehicle. The automated vehicle EGO receives an input state IS characterizing the automated vehicle EGO and/or its environment ENV and a reward r for the transition from one state to another state.
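
    The interaction loop of FIG. 2 might look as follows in code. The class `ToyEnvironment`, the lane-index state, the action encoding and the reward of 1.0 for reaching a goal lane are all assumptions made for illustration; the application does not define them.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in for ENV: the state is only EGO's lane index."""

    def __init__(self, goal_lane=2):
        self.lane = 1  # EGO starts on the middle lane L1
        self.goal_lane = goal_lane

    def step(self, motion_action):
        """Apply a motion action (-1: right, 0: stay, +1: left) and return
        the new input state and the reward for this transition."""
        self.lane = min(2, max(0, self.lane + motion_action))
        reward = 1.0 if self.lane == self.goal_lane else 0.0
        return self.lane, reward


def run_episode(env, policy, steps=5):
    """Run the select-act-observe loop, collecting (state, action, reward)."""
    history = []
    state = env.lane
    for _ in range(steps):
        action = policy(state)                  # EGO selects a motion action
        next_state, reward = env.step(action)   # ENV answers with IS and r
        history.append((state, action, reward))
        state = next_state
    return history


random.seed(0)
random_policy = lambda state: random.choice((-1, 0, 1))
trajectory = run_episode(ToyEnvironment(), random_policy)
```

    The policy here is random and nothing is learned; the sketch only shows the flow of motion action, input state and reward between EGO and ENV.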

    [0059] FIG. 3 shows an example structure of the system for training the artificial intelligence unit AIU for an automated vehicle EGO.

    [0060] The artificial intelligence unit AIU comprises a knowledge configuration KC, and the artificial intelligence unit AIU determines an evaluation value for at least two motion actions ma1, ma2, ma3 for the automated vehicle EGO considering an input state IS, s1-s5 and considering the knowledge configuration KC, wherein the input state IS, s1-s5 characterizes the automated vehicle EGO and the at least one other road user RU1, RU2.

    [0061] Moreover, the system is configured to select one motion action ma from the at least two motion actions ma1, ma2, ma3, considering the evaluation value of the respective motion actions ma1, ma2, ma3.

    [0062] The system comprises for example a selection unit S for selecting one motion action ma from the at least two motion actions ma1, ma2, ma3, considering the evaluation value of the respective motion actions ma1, ma2, ma3.

    [0063] Additionally, the system is configured to train the artificial intelligence unit AIU by adapting the knowledge configuration KC of the artificial intelligence unit AIU considering the selected motion action ma.

    [0064] In particular, the artificial intelligence unit AIU is a neural network. The neural network AIU comprises a plurality of neurons A1-A4; B1-B5; C1-C5; D1-D3 interconnected via a plurality of synapses. A first set of neurons A1-A4 is receiving information about the input state IS, s1-s5, and a second set of neurons B1-B5; C1-C5 is approximating at least two evaluation functions considering the input state IS, s1-s5. A third set of neurons D1-D3 is assigning the at least two evaluation functions to the at least two motion actions ma1, ma2, ma3 of the automated vehicle.

    [0065] The knowledge configuration KC of the artificial intelligence unit AIU is the synaptic weight of the at least one synapse of the second set of synapses.
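
    A minimal forward pass through a network with the layer sizes named above (4, 5, 5 and 3 neurons) could be sketched as below. The random initial weights, the tanh activation, the absence of bias terms and the four-feature input vector are assumptions; the application only specifies the neuron groups and that the synaptic weights form the knowledge configuration KC.

```python
import math
import random

# Neuron groups from the description: A1-A4, B1-B5, C1-C5, D1-D3.
LAYER_SIZES = [4, 5, 5, 3]


def init_weights(sizes, seed=0):
    """Create one weight matrix per pair of consecutive layers.
    weights[k][j][i] connects neuron i of layer k to neuron j of layer k+1.
    Together these synaptic weights play the role of KC in this sketch."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(weights, input_state):
    """Propagate the input state through the network and return one
    evaluation value per motion action (ma1, ma2, ma3)."""
    activations = list(input_state)
    for index, layer in enumerate(weights):
        sums = [sum(w * a for w, a in zip(neuron, activations)) for neuron in layer]
        is_output = index == len(weights) - 1
        # Hidden layers use tanh (an assumption); the output stays linear.
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations


weights = init_weights(LAYER_SIZES)
values = forward(weights, [0.1, 0.2, 0.3, 0.4])  # hypothetical input features
```

    Training would then adjust `weights`; how the gradients or targets are computed is not described in this section and is left out here.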

    [0066] FIG. 4 shows an example for the knowledge configuration KC of the artificial intelligence unit AIU.

    [0067] In this example, for each of the input states IS, s1-s5, a reward r for every motion action ma1, ma2, ma3 for the automated vehicle EGO is defined in the knowledge configuration KC, which is represented as a table.

    [0068] The artificial intelligence unit AIU reads out the reward r as an evaluation value for at least two motion actions ma1, ma2, ma3 for the automated vehicle EGO considering the input state IS, s1-s5 from the knowledge configuration KC. The input state IS, s1-s5 characterizes the automated vehicle EGO and/or its environment ENV.

    [0069] Moreover, the system is configured to select one motion action ma from the at least two motion actions ma1, ma2, ma3, considering the evaluation value of the respective motion actions ma1, ma2, ma3. For example, the motion action ma with the highest reward r may be selected. In this example, the selected motion action ma is motion action ma3, because it has the highest reward r of the at least two motion actions ma1, ma2, ma3 considering the current input state s2.
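
    The table readout and the selection of the highest-reward motion action can be illustrated with a dictionary. The numeric rewards are placeholders chosen only so that ma3 is best in s2 and ma1 is best in s4, as in the example of FIG. 4; the actual values of the figure are not reproduced here, and `select_motion_action` is a hypothetical name.

```python
# Placeholder knowledge configuration KC as a table: one row per input
# state s1-s5, one reward per motion action. All numbers are assumptions.
knowledge_configuration = {
    "s1": {"ma1": 0.2, "ma2": 0.5, "ma3": 0.1},
    "s2": {"ma1": 0.1, "ma2": 0.3, "ma3": 0.7},
    "s3": {"ma1": 0.4, "ma2": 0.4, "ma3": 0.2},
    "s4": {"ma1": 0.9, "ma2": 0.4, "ma3": 0.2},
    "s5": {"ma1": 0.3, "ma2": 0.6, "ma3": 0.5},
}


def select_motion_action(kc, input_state):
    """Read out the rewards for the input state and return the motion
    action with the highest reward (ties go to the first entry)."""
    rewards = kc[input_state]
    return max(rewards, key=rewards.get)


selected = select_motion_action(knowledge_configuration, "s2")
print(selected)
```

    A purely greedy choice like this never tries lower-rated actions; whether and how exploration is mixed in is not stated in this section.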

    [0070] Additionally, the system is configured to train the artificial intelligence unit AIU by adapting the knowledge configuration KC of the artificial intelligence unit AIU considering the selected motion action ma.

    [0071] In particular, the adaption of the knowledge configuration KC of the artificial intelligence unit AIU considering the selected motion action ma may be performed by determination of the following input state IS, s1-s5, in particular the following input state s4. The reward r for the selected motion action ma, ma3 and the current input state s2 may be adapted, for example, by considering the reward r of the one motion action of the at least two motion actions ma1, ma2, ma3, which has the highest reward regarding the following input state s4. In this example, the motion action ma1 has the highest reward r of the at least two motion actions ma1, ma2, ma3 considering the following input state s4.

    [0072] For example, the reward r for the selected motion action ma, ma3 and the current input state s2 may be set to a weighted sum of the old value of the reward r for the selected motion action ma, ma3 and of the reward r for the motion action ma1 with the highest reward r of the at least two motion actions ma1, ma2, ma3 considering the following input state s4. The weights of the weighted sum specify a learning rate or step size, which determines to what extent newly acquired information overrides old information. A factor of 0 makes the artificial intelligence unit AIU learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the artificial intelligence unit AIU consider only the most recent information (ignoring prior knowledge to explore possibilities).
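
    The weighted-sum update described in this paragraph can be written out as a short function. It follows only what the paragraph states: the new table entry blends the old entry with the best reward available in the following input state. Standard Q-learning additionally folds in an immediate reward and a discount factor; those are deliberately omitted here because the paragraph does not mention them. The function name `adapt_reward` and the table values are assumptions.

```python
def adapt_reward(kc, state, action, next_state, learning_rate):
    """Set kc[state][action] to a weighted sum of its old value and the
    highest reward in the following input state; return the new value."""
    best_next = max(kc[next_state].values())
    old_value = kc[state][action]
    kc[state][action] = (1.0 - learning_rate) * old_value + learning_rate * best_next
    return kc[state][action]


# Placeholder table restricted to the two states of the example.
kc = {
    "s2": {"ma1": 0.1, "ma2": 0.3, "ma3": 0.7},
    "s4": {"ma1": 0.9, "ma2": 0.4, "ma3": 0.2},
}
updated = adapt_reward(kc, state="s2", action="ma3", next_state="s4", learning_rate=0.5)
```

    With a learning rate of 0 the entry stays at its old value, and with a learning rate of 1 it is replaced by the best reward of the following state, matching the two limiting cases named above.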

    [0073] The knowledge configuration KC characterizes at least the empowerment of the at least one other road user RU1, RU2. Moreover, the knowledge configuration KC characterizes in particular also a reward with respect to the automated vehicle EGO reaching a goal. The reward r may for example be the sum of a first value that characterizes the empowerment of the at least one other road user RU1, RU2 and of a second value that characterizes the automated vehicle EGO reaching a goal.

    [0074] The artificial intelligence unit AIU in particular determines this first value such that a first motion action is assigned a higher first value than a second motion action, if the first motion action provides the at least one other road user RU1, RU2 a higher number of possible future motion actions than the second motion action.
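
    One possible reading of the reward as a sum of an empowerment term and a goal term is sketched below. The concrete choices are assumptions: the first value is taken as the total number of future motion actions left to the other road users, and the second value is a fixed bonus `goal_bonus` granted when EGO reaches its goal. The application does not specify either scale.

```python
def first_value(future_action_counts):
    """Empowerment term: total number of possible future motion actions
    of the other road users (one count per road user)."""
    return float(sum(future_action_counts))


def second_value(reaches_goal, goal_bonus=10.0):
    """Goal term: a fixed, hypothetical bonus if EGO reaches its goal."""
    return goal_bonus if reaches_goal else 0.0


def reward(future_action_counts, reaches_goal):
    """Reward r as the sum of the empowerment term and the goal term."""
    return first_value(future_action_counts) + second_value(reaches_goal)


# Hypothetical outcomes: ma1 leaves RU1 and RU2 with 1 and 2 options,
# ma2 leaves them with 1 option each; neither reaches the goal yet.
reward_ma1 = reward([1, 2], reaches_goal=False)
reward_ma2 = reward([1, 1], reaches_goal=False)
print(reward_ma1, reward_ma2)
```

    Because ma1 leaves the other road users more options, it receives the higher first value and, with equal goal terms, the higher reward.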

    [0075] Alternatively, the artificial intelligence unit AIU in particular determines this reward r for at least two motion actions ma1, ma2, ma3 for the automated vehicle EGO such that a reward r for a first motion action is higher than a reward r for a second motion action, if a future state of the environment of the automated vehicle EGO is more predictable for the first motion action than for the second motion action.
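
    Claim 18 describes one way to compare motion actions by predictability: for each predicted future state, take the difference between its probability given the motion action and its probability irrespective of the motion action. The sketch below ranks motion actions by that difference. The probability values and the function names are assumptions for illustration; how the probabilities are estimated is not covered here.

```python
def predictability_score(p_given_action, p_independent):
    """Difference between the conditional probability of the predicted
    future state and its probability independent of the motion action."""
    return p_given_action - p_independent


def rank_by_predictability(candidates):
    """Order motion actions from highest to lowest score.
    candidates maps each action to (p_given_action, p_independent)."""
    scores = {
        action: predictability_score(p_cond, p_indep)
        for action, (p_cond, p_indep) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical probabilities for the predicted future state of ENV.
candidates = {
    "ma1": (0.8, 0.5),
    "ma2": (0.6, 0.5),
    "ma3": (0.5, 0.5),
}
ranking = rank_by_predictability(candidates)
print(ranking)
```

    Under these placeholder numbers ma1 ranks first, because performing it makes the predicted future state considerably more likely than it would be otherwise.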