Method for Generating Training Data for a Recognition Model for Recognizing Objects in Sensor Data from a Surroundings Sensor System of a Vehicle, Method for Generating a Recognition Model of This Kind, and Method for Controlling an Actuator System of a Vehicle
20220156517 · 2022-05-19
Inventors
CPC classification
G06F18/214
PHYSICS
B60W2050/0085
PERFORMING OPERATIONS; TRANSPORTING
G06V20/56
PHYSICS
B60W50/00
PERFORMING OPERATIONS; TRANSPORTING
G06V10/774
PHYSICS
B60W2050/0028
PERFORMING OPERATIONS; TRANSPORTING
International classification
B60W50/00
PERFORMING OPERATIONS; TRANSPORTING
Abstract
The present disclosure relates to a method for generating training data for a recognition model for recognizing objects in sensor data of a vehicle. First sensor data and second sensor data are input into a learning algorithm. The first sensor data comprise measurements of a first surroundings sensor. The second sensor data comprise measurements of a second surroundings sensor. A training data generation model is generated, using the learning algorithm, that generates measurements of the second surroundings sensor assigned to measurements of the first surroundings sensor. First simulation data are input into the training data generation model. The first simulation data comprise simulated measurements of the first surroundings sensor. Second simulation data are generated as the training data based on the first simulation data using the training data generation model. The second simulation data comprise simulated measurements of the second surroundings sensor.
Claims
1. A method for generating training data for a recognition model configured to recognize objects in sensor data from a surroundings sensor system of a vehicle, the method comprising: inputting first sensor data and second sensor data into a learning algorithm, the first sensor data including a plurality of chronologically successive real measurements of a first surroundings sensor of the surroundings sensor system, the second sensor data including a plurality of chronologically successive real measurements of a second surroundings sensor of the surroundings sensor system, each real measurement in the plurality of chronologically successive real measurements of the second surroundings sensor being assigned to a temporally corresponding real measurement in the plurality of chronologically successive real measurements of the first surroundings sensor; generating a training data generation model configured to generate measurements of the second surroundings sensor assigned to measurements of the first surroundings sensor based on the first sensor data and the second sensor data using the learning algorithm; inputting first simulation data into the training data generation model, the first simulation data including a plurality of chronologically successive simulated measurements of the first surroundings sensor; and generating second simulation data as the training data based on the first simulation data using the training data generation model, the second simulation data including a plurality of chronologically successive simulated measurements of the second surroundings sensor.
2. The method according to claim 1, wherein the learning algorithm includes an artificial neural network.
3. The method according to claim 1, wherein the learning algorithm includes a generator configured to generate the second simulation data and a discriminator configured to evaluate the second simulation data based on at least one of (i) the first sensor data and (ii) the second sensor data.
4. The method according to claim 1 further comprising: generating the first simulation data using a computation model that describes physical properties of the first surroundings sensor and of surroundings of the vehicle.
5. The method according to claim 4, wherein the computation model is configured to assign a target value to be output by the recognition model to each of the simulated measurements in the plurality of chronologically successive simulated measurements of the first surroundings sensor.
6. The method according to claim 1 further comprising: generating the recognition model by: inputting the second simulation data as training data into a further learning algorithm; and generating the recognition model based on the training data using the further learning algorithm.
7. The method according to claim 6, the generating the recognition model further comprising: inputting the first simulation data as training data into the further learning algorithm, the first simulation data having been generated using a computation model that describes physical properties of the first surroundings sensor and of surroundings of the vehicle; and at least one of: generating, based on the first simulation data using the further learning algorithm, as the recognition model a first classifier configured to assign object classes to measurements of the first surroundings sensor; and generating, based on the second simulation data using the further learning algorithm, as the recognition model a second classifier configured to assign object classes to measurements of the second surroundings sensor.
8. The method according to claim 7, the generating the recognition model further comprising: inputting, into the further learning algorithm, target values to be output by the recognition model, the target values having been assigned by the computation model to each of the simulated measurements in the plurality of chronologically successive simulated measurements of the first surroundings sensor; and generating the recognition model further based on the target values using the further learning algorithm.
9. The method according to claim 6 further comprising: controlling an actuator system of the vehicle by: receiving further sensor data generated by the surroundings sensor system; inputting the further sensor data into the recognition model; and generating a control signal configured to control the actuator system based on outputs from the recognition model.
10. A data processing apparatus for generating training data for a recognition model configured to recognize objects in sensor data from a surroundings sensor system of a vehicle, the data processing apparatus comprising: a processor configured to: input first sensor data and second sensor data into a learning algorithm, the first sensor data including a plurality of chronologically successive real measurements of a first surroundings sensor of the surroundings sensor system, the second sensor data including a plurality of chronologically successive real measurements of a second surroundings sensor of the surroundings sensor system, each real measurement in the plurality of chronologically successive real measurements of the second surroundings sensor being assigned to a temporally corresponding real measurement in the plurality of chronologically successive real measurements of the first surroundings sensor; generate a training data generation model configured to generate measurements of the second surroundings sensor assigned to measurements of the first surroundings sensor based on the first sensor data and the second sensor data using the learning algorithm; input first simulation data into the training data generation model, the first simulation data including a plurality of chronologically successive simulated measurements of the first surroundings sensor; and generate second simulation data as the training data based on the first simulation data using the training data generation model, the second simulation data including a plurality of chronologically successive simulated measurements of the second surroundings sensor.
11. The method according to claim 1, wherein the method is performed by a processor that executes instructions of a computer program.
12. A non-transitory computer-readable medium that stores a computer program for generating training data for a recognition model configured to recognize objects in sensor data from a surroundings sensor system of a vehicle, the computer program including instructions that, when executed by a processor, cause the processor to: input first sensor data and second sensor data into a learning algorithm, the first sensor data including a plurality of chronologically successive real measurements of a first surroundings sensor of the surroundings sensor system, the second sensor data including a plurality of chronologically successive real measurements of a second surroundings sensor of the surroundings sensor system, each real measurement in the plurality of chronologically successive real measurements of the second surroundings sensor being assigned to a temporally corresponding real measurement in the plurality of chronologically successive real measurements of the first surroundings sensor; generate a training data generation model configured to generate measurements of the second surroundings sensor assigned to measurements of the first surroundings sensor based on the first sensor data and the second sensor data using the learning algorithm; input first simulation data into the training data generation model, the first simulation data including a plurality of chronologically successive simulated measurements of the first surroundings sensor; and generate second simulation data as the training data based on the first simulation data using the training data generation model, the second simulation data including a plurality of chronologically successive simulated measurements of the second surroundings sensor.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0057] Embodiments of the disclosure will be described below with reference to the appended drawings, wherein neither the drawings nor the description should be interpreted as limiting the disclosure.
[0062] The figures are merely schematic rather than to scale. Identical reference signs in the figures designate identical features or features having the same effect.
DETAILED DESCRIPTION
[0064] The method steps described hereinbelow are illustrated in the flowcharts of the appended figures.
[0065] In order to generate the training data 102, the apparatus 100 comprises a training data generation module 116, which executes a suitable learning algorithm.
[0066] In step 210, the first sensor data 120a of a first surroundings sensor 122a and the second sensor data 120b of a second surroundings sensor 122b are input into the learning algorithm executed by the training data generation module 116.
[0067] In step 220, the learning algorithm executed by the training data generation module 116 generates a training data generation model 124 from the first sensor data 120a and the second sensor data 120b, said training data generation model assigning measurements of the second surroundings sensor 122b to measurements of the first surroundings sensor 122a. More specifically, the training data generation model 124 generates the measurements of the second surroundings sensor 122b, which are assigned to measurements of the first surroundings sensor 122a. For this purpose, the learning algorithm may, for example, train an artificial neural network, as described in more detail below.
[0068] The sensor data 120 that are used to generate the training data generation model 124 may originate from one and the same vehicle 110 or else from a plurality of vehicles 110.
[0069] Subsequently, in step 230, first simulation data 126a are input into the training data generation model 124. Similarly to the first sensor data 120a, the first simulation data 126a comprise a plurality of chronologically successive simulated measurements of the first surroundings sensor 122a, with the difference that in this case the measurements are measurements of the virtual, and not of the physical, first surroundings sensor 122a.
[0070] In step 240, the training data generation model 124 then generates corresponding second simulation data 126b as the training data 102 and outputs said data to a training module 128 for the generation of the recognition model 104. Similarly to the first simulation data 126a, the second simulation data 126b or the training data 102 comprise a plurality of chronologically successive simulated measurements of the second surroundings sensor 122b, which are temporally correlated with the simulated measurements of the first surroundings sensor 122a.
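The sequence of steps 210 to 240 can be sketched as a single data flow. The helper names and the toy scale-factor "learning algorithm" below are illustrative assumptions for this sketch, not the implementation of the disclosure:

```python
def generate_training_data(first_sensor_data, second_sensor_data,
                           first_sim_data, train_model):
    # Steps 210/220: learn the A -> B training data generation model 124
    # from paired real measurements of both surroundings sensors.
    model = train_model(first_sensor_data, second_sensor_data)
    # Steps 230/240: transform simulated modality-A frames into
    # simulated modality-B frames, which serve as the training data 102.
    return [model(frame) for frame in first_sim_data]

# Toy stand-in for the learning algorithm: it merely estimates a single
# scale factor between the two modalities (an assumption for illustration).
toy_train = lambda xs, ys: (lambda frame, s=sum(ys) / sum(xs): [s * v for v in frame])

training_data = generate_training_data(
    [1.0, 2.0, 3.0],   # real modality-A measurements (flattened, illustrative)
    [2.0, 4.0, 6.0],   # temporally assigned real modality-B measurements
    [[1.0], [1.5]],    # simulated modality-A frames
    toy_train)
```

In this toy setting the learned relationship is a factor of 2, so the simulated modality-A frames are mapped to doubled modality-B values.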
[0071] For example, the first simulation data 126a may be generated by a simulation module 129, on which a suitable physical computation model 130 runs, in a step 230′ that precedes the step 230. Depending on the sensor modality to be simulated, the computation model 130 may, for example, comprise a sensor model 132 for simulating the first surroundings sensor 122a, an object model 134 for simulating the objects 106, 108 and/or a sensor wave propagation model 136, as has been described above.
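The decomposition of the computation model 130 into an object model, a sensor wave propagation model and a sensor model might look as follows; every formula, threshold and noise figure here is an illustrative assumption (a radar-style 1/r^4 return), not a property of the disclosure:

```python
import random

def object_model(t):
    """Simulated scene state at time t: one static and one moving object."""
    return [{"cls": "tree", "range_m": 20.0},
            {"cls": "pedestrian", "range_m": 10.0 - 0.5 * t}]

def wave_propagation_model(range_m):
    """Two-way power attenuation over distance (assumed ~ 1/r^4)."""
    return 1.0 / range_m ** 4

def sensor_model(received_power, range_m, rng):
    """Turn received power into a noisy range measurement, or None if too weak."""
    if received_power < 1e-7:               # illustrative detection threshold
        return None
    return range_m + rng.gauss(0.0, 0.05)   # assumed 5 cm range noise

def simulate_frame(t, rng):
    """One simulated modality-A frame: detections together with their labels."""
    frame = []
    for obj in object_model(t):
        meas = sensor_model(wave_propagation_model(obj["range_m"]),
                            obj["range_m"], rng)
        if meas is not None:
            frame.append({"range_m": meas, "label": obj["cls"]})
    return frame

frame0 = simulate_frame(0, random.Random(0))
```

Because the scene attributes are chosen by the simulation itself, the object class of each detection is available for free, which is what later allows the labels 142 to be derived without manual annotation.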
[0072] The learning algorithm executed by the training data generation module 116 may, for example, be configured to generate an artificial neural network in the form of a generative adversarial network, GAN for short, as the training data generation model 124. A GAN of this kind may comprise a generator 138 for generating the second simulation data 126b and a discriminator 140 for evaluating the second simulation data 126b. For example, in step 220, the discriminator 140 may be trained, using the sensor data 120, to distinguish between measured sensor data, i.e. real measurements of the surroundings sensor system 122, and computer-calculated simulation data, i.e. simulated measurements of the surroundings sensor system 122. The generator 138, in turn, may be trained, using outputs from the discriminator 140, such as "1" for "simulated" and "0" for "real", to generate the second simulation data 126b in such a way that the discriminator 140 is no longer able to distinguish them from the real sensor data, i.e. recognizes them as real. The training data generation model 124 may thus be generated by unsupervised learning, i.e. without the use of labeled input data.
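The adversarial training described above can be sketched with a toy one-dimensional generator and a logistic discriminator. The linear models, the learning rate and the assumed relationship b = 2a between the two modalities are illustrative assumptions; only the label convention (1 for "simulated", 0 for "real") is taken from the description:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scalar modality-A measurements and the corresponding "real"
# modality-B measurements; b = 2*a is an assumed toy sensor relationship.
a_vals = [0.5, 1.0, 1.5, 2.0]
real_b = [2.0 * a for a in a_vals]

w_g = 0.5        # generator 138: b_hat = w_g * a (starts far from 2.0)
u, c = 0.0, 0.0  # discriminator 140: p("simulated") = sigmoid(u*b + c)
lr = 0.05

for _ in range(3000):
    fake_b = [w_g * a for a in a_vals]
    # Discriminator step, label convention: 1 = "simulated", 0 = "real".
    grad_u = grad_c = 0.0
    for b, y in [(b, 1.0) for b in fake_b] + [(b, 0.0) for b in real_b]:
        p = sigmoid(u * b + c)
        grad_u += (p - y) * b
        grad_c += p - y
    n = len(fake_b) + len(real_b)
    u -= lr * grad_u / n
    c -= lr * grad_c / n
    # Generator step: push fakes toward being classified as "real" (output 0).
    grad_w = 0.0
    for a in a_vals:
        p = sigmoid(u * (w_g * a) + c)
        grad_w += p * u * a
    w_g -= lr * grad_w / len(a_vals)
```

After training, the generator's scale factor has moved toward the real relationship, so its outputs become hard for the discriminator to reject. Note that no labels were used anywhere, matching the unsupervised character of the training.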
[0073] In addition, in step 230′, the simulation module 129 may generate a target value 142 for each of the simulated measurements of the first surroundings sensor 122a, said target value indicating a desired output of the recognition model 104 to be generated. The target value 142, which is also known as a label, may, for example, indicate an object class, in this case, for example, “tree” and “pedestrian”, or another suitable class. The target value 142 may, for example, be a numerical value assigned to the (object) class.
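A minimal sketch of the label encoding just described; the concrete class set and numerical values are assumptions for illustration:

```python
# Each object class is mapped to a numerical target value 142 (a "label").
CLASS_TO_TARGET = {"tree": 0, "pedestrian": 1}

def label_measurement(simulated_measurement, object_class):
    """Attach the numeric target value to a simulated modality-A measurement."""
    return {"measurement": simulated_measurement,
            "target": CLASS_TO_TARGET[object_class]}
```

Since the simulation module knows which object produced each simulated measurement, this assignment requires no manual annotation.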
[0074] In step 310, the second simulation data 126b generated as the training data 102 are input into the training module 128, which executes a further learning algorithm.
[0075] In step 320, the further learning algorithm, which may, for example, be a further artificial neural network, uses machine learning to generate the recognition model 104 for recognizing the objects 106, 108 in the surroundings of the vehicle 110 as a “tree” or “pedestrian” from the training data 102. In this case, at least one classifier 144, 146 may be trained to assign the training data 102 to corresponding object classes, here, for example, to the object classes “tree” and “pedestrian”.
[0076] The training data 102 may comprise the first simulation data 126a and/or the second simulation data 126b. For example, the further learning algorithm may use the first simulation data 126a to train a first classifier 144 for classifying the first sensor data 120a, said first classifier being assigned to the first surroundings sensor 122a, and/or use the second simulation data 126b to train a second classifier 146 for classifying the second sensor data 120b, said second classifier being assigned to the second surroundings sensor 122b. However, it is also possible for the further learning algorithm to train more than two classifiers or just a single classifier. Additionally or alternatively to the classifier, the further learning algorithm may, for example, train at least one regression model.
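The idea of training one classifier per modality can be sketched as follows; a trivial nearest-centroid rule on one-dimensional features stands in for the further learning algorithm, and all data values are made up for illustration:

```python
def train_classifier(samples):
    """samples: list of (feature, class_id); returns a predict(feature) function."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    def predict(x):
        # Assign the class whose centroid is closest to the feature.
        return min(centroids, key=lambda y: abs(x - centroids[y]))
    return predict

# Hypothetical labeled 1-D features per modality (class 0 = "tree", 1 = "pedestrian")
first_sim = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]   # modality A
second_sim = [(3.0, 0), (3.2, 0), (7.9, 1), (8.1, 1)]  # modality B
classifier_a = train_classifier(first_sim)   # first classifier 144
classifier_b = train_classifier(second_sim)  # second classifier 146
```

Each classifier is tied to the value range of its own sensor modality, which is why the two are trained on the first and second simulation data separately.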
[0077] The generation of the recognition model 104 in step 320 may be carried out using the target values 142 or labels 142 generated by the simulation module 129.
[0078] The recognition model 104 generated in this way may then, for example, be implemented as a software and/or hardware module in a control unit 148 of the vehicle 110 and be used to automatically control an actuator system 150 of the vehicle 110, for example a steering or braking actuator or a drive motor of the vehicle 110. For example, the vehicle 110 may be equipped with a suitable driver assistance function for this purpose. However, the vehicle 110 may also be an autonomous robot with a suitable control program.
[0079] The sensor data 120 provided by the surroundings sensor system 122 are received in the control unit 148 in step 410.
[0080] In step 420, the sensor data 120 are input into the recognition model 104, which is executed by a processor of the control unit 148 in the form of a corresponding computer program.
[0081] In step 430, depending on the output from the recognition model 104, for example depending on the recognized object 106 or 108 and/or depending on the recognized speed, position and/or location of the recognized object 106 or 108, the control unit 148 finally generates a corresponding control signal 152 for controlling the actuator system 150 and outputs said control signal to the actuator system 150. The control signal 152 may, for example, cause the actuator system 150 to control the vehicle 110 in such a way that a collision with the recognized object 106 or 108 is avoided.
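Step 430 might be sketched as below; the field names, the time-to-collision criterion and the braking thresholds are illustrative assumptions, not the control strategy of the disclosure:

```python
def generate_control_signal(recognized):
    """recognized: dict with 'cls', 'distance_m', 'closing_speed_mps', or None."""
    if recognized is None:
        return {"brake": 0.0}
    # Simple time-to-collision estimate from the recognized object's state.
    if recognized["closing_speed_mps"] > 0:
        ttc_s = recognized["distance_m"] / recognized["closing_speed_mps"]
    else:
        ttc_s = float("inf")
    if ttc_s < 2.0:            # illustrative emergency threshold
        return {"brake": 1.0}  # full braking to avoid the collision
    if ttc_s < 5.0:
        return {"brake": 0.3}  # precautionary partial braking
    return {"brake": 0.0}
```

The returned dictionary plays the role of the control signal 152 that is output to the actuator system 150.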
[0082] Various exemplary embodiments of the disclosure will be described once again hereinbelow in other words.
[0083] For example, the generation of the training data 102 may comprise the following phases.
[0084] In a first phase, a multimodal, unlabeled sample of real sensor data 120 with associated measurements is obtained and recorded, i.e. the sample consists of pairs of sets of measurements of both sensor modalities A and B, i.e. of both surroundings sensors 122a and 122b, for each point in time.
[0085] In a second phase, an artificial neural network, e.g. a GAN, is trained using the unlabeled sample obtained in the first phase.
[0086] In a third phase, a labeled sample is generated by simulation and transformation using the artificial neural network trained in the second phase.
[0087] The generation of the multimodal, unlabeled sample of real sensor data 120 in the first phase may, for example, take place as follows.
[0088] A single vehicle 110 or a fleet of vehicles 110 may be used for this purpose. The vehicle 110 may be equipped with two or more surroundings sensors 122a, 122b of two different sensor modalities A and B. For example, the sensor modality A may be a lidar sensor system and the sensor modality B may be a radar sensor system. The sensor modality A should be a surroundings sensor for which sensor data can be generated by simulation with the aid of the computation model 130, wherein these simulation data should have a high quality insofar as they match the real sensor data of the sensor modality A to a good approximation. The two surroundings sensors 122a, 122b should be provided and attached to the vehicle 110 and oriented such that there is a significant region of overlap of the respective fields of view thereof. A multimodal, unlabeled sample is created using the vehicle 110 equipped in this way or the vehicles 110 equipped in this way.
[0089] In this case, it should be possible to assign the totality of all the measurements of the sensor modality A at a particular point in time to the totality of all the measurements of the sensor modality B at the same point in time, or at least at a point in time that is approximately the same. For example, the surroundings sensors 122a, 122b may be synchronized with one another such that the measurements of both surroundings sensors 122a, 122b are taken in each case at the same point in time. In this context, “assignment” or “association” should therefore not be understood to mean that measurements of the sensor modality A with respect to a particular static or dynamic object are associated with measurements of the sensor modality B with respect to the same object. This would require a corresponding (manual) annotation of the sample, which is precisely what the method described here is intended to avoid.
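The frame-level association at approximately the same point in time can be sketched as a nearest-timestamp pairing; the maximum permitted clock skew is an assumed parameter:

```python
def pair_frames(frames_a, frames_b, max_skew_s=0.05):
    """frames_*: lists of (timestamp_s, frame); returns list of (frame_a, frame_b).

    Pairs whole frames by closest timestamp only; no object-level
    association (and hence no manual annotation) is performed.
    """
    pairs = []
    for t_a, frame_a in frames_a:
        t_b, frame_b = min(frames_b, key=lambda fb: abs(fb[0] - t_a))
        if abs(t_b - t_a) <= max_skew_s:  # approximately the same point in time
            pairs.append((frame_a, frame_b))
    return pairs

pairs = pair_frames([(0.00, "a0"), (0.10, "a1")],
                    [(0.01, "b0"), (0.30, "b1")])
```

In the example, the second modality-A frame finds no modality-B frame within the permitted skew and is therefore dropped rather than mis-paired.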
[0090] For example, the multimodal, unlabeled sample may be recorded on a persistent memory in the vehicle 110 and then transferred to an apparatus 100 suitable for the second phase. Alternatively, the sample may be transferred while the vehicle is actually traveling, e.g. via a mobile radio network or the like.
[0091] The generation of the training data generation model 124 by training the GAN in the second phase may, for example, take place as follows.
[0092] As has already been mentioned, the multimodal sample obtained and recorded in the first phase may be used in the second phase to train an artificial neural network in the form of a GAN. The GAN may be trained so that it is able, after completion of the training, to transform measurements of the sensor modality A that can readily be simulated into measurements of the sensor modality B which is less easy to simulate.
[0093] The training may take place using pairs of associated sets of measurements of the two sensor modalities A and B. In this context, a set of measurements should be understood to mean all the measurements of the respective sensor modality A or B at a particular point in time or within a short period of time. A set of measurements of this kind may typically contain sensor data for a plurality of static and dynamic objects and may, for example, also be referred to as a frame. A frame may, for example, be an individual image of a camera or a point cloud of a single lidar sweep.
[0094] The set of measurements of the sensor modality A at a particular point in time t(n) may be used as input for the GAN, while the set of measurements of the sensor modality B at the same point in time t(n) may constitute a desired output for the associated input. The time t is not absolutely necessary for the training. The weights of the training data generation model 124 may then be determined by iterative training of the GAN, which may be a deep neural network (DNN). After completion of the training, the GAN is able to generate, for a frame of the sensor modality A that is not included in the training set, a corresponding frame of the sensor modality B.
[0095] The generation of a simulated, labeled sample in the third phase may, for example, take place as follows.
[0096] A labeled sample of the sensor modality B may now be generated by simulation in the third phase by using the GAN trained in the second phase, even if there is no suitable physical computation model available for the sensor modality B.
[0097] Initially, the first simulation data 126a of the sensor modality A are generated. This takes place with the aid of the simulation module 129, which may, for example, simulate both the movement of the vehicle 110 and the movement of other objects 106, 108 in the surroundings of the vehicle 110. In addition, the static surroundings of the vehicle 110 may be simulated, with the result that static and dynamic surroundings of the vehicle 110 are generated at each point in time, wherein the object attributes can be selected in a suitable manner, and relevant labels 142 for the objects 106, 108 can thus be derived. The synthetic sensor data for these objects 106, 108 in the form of the first simulation data 126a are generated by the computation model 130 in this case.
[0098] The respectively assigned labels 142, which are referred to hereinabove as target values 142, i.e. the attributes of the simulated dynamic and static objects, are thus also available as ground truth for the first simulation data 126a of the sensor modality A. Said labels may also be output by the simulation module 129. The first simulation data 126a without the labels 142 are then transformed by the training data generation model 124 in the form of the trained GAN model into sensor data of the sensor modality B, i.e. into the second simulation data 126b, which represent the same, simulated surroundings of the vehicle 110 at each point in time. For this reason, the labels 142 generated by the simulation module 129 also apply to the second simulation data 126b. For example, the assignment of sensor data of the sensor modality A to sensor data of the sensor modality B can take place such that the labels 142, which describe the surroundings of the vehicle 110 at a particular point in time, can be transferred directly without any change, e.g. without prior interpolation.
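The label transfer in this third phase can be sketched as follows; `trained_gan` is a placeholder standing in for the trained training data generation model 124, and the stand-in transformation used in the example is purely illustrative:

```python
def generate_labeled_sample(sim_frames_a, labels, trained_gan):
    """Transform modality-A frames to modality-B frames, keeping labels aligned."""
    sample = []
    for frame_a, label in zip(sim_frames_a, labels):
        frame_b = trained_gan(frame_a)   # A -> B transformation per frame
        sample.append((frame_b, label))  # label 142 transferred without change
    return sample

# Placeholder transformation standing in for the trained GAN model:
fake_gan = lambda frame: [2.0 * x for x in frame]

sample = generate_labeled_sample([[1.0], [2.0]],
                                 ["tree", "pedestrian"],
                                 fake_gan)
```

Because both frames describe the same simulated surroundings at the same point in time, the label can be carried over directly without interpolation, exactly as stated above.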
[0099] Depending on the application, a resulting labeled sample consisting of the second simulation data 126b and the labels 142 or target values 142, or else a resulting labeled multimodal sample consisting of the first simulation data 126a, the second simulation data 126b and the labels 142 or target values 142 can be used as the training data 102 for generating the recognition model 104, e.g. for training a deep neural network.
[0100] Alternatively or additionally, the training data 102 can be used to optimize and/or validate surroundings perception algorithms, e.g. in that a replay of the unlabeled sample is carried out, and a comparison of the symbolic surroundings representation generated by the algorithms, i.e. of the attributes of the objects of the surroundings that are generated by pattern recognition algorithms, with the ground truth attributes of the labeled sample is carried out.
[0101] Finally, it should be noted that terms such as “having”, “comprising”, etc. do not exclude other elements or steps, and terms such as “a” or “an” do not exclude a plurality.