Method and Device for Training a Machine Learning Algorithm
20220383146 · 2022-12-01
Inventors
- Markus Schoeler (Wuppertal, DE)
- Jan Siegemund (Köln, DE)
- Christian Nunn (Hückeswagen, DE)
- Yu Su (Wuppertal, DE)
- Mirko Meuter (Erkrath, DE)
- Adrian Becker (Leverkusen, DE)
- Peet Cremer (Düsseldorf, DE)
CPC classification
- G06F18/214
- G06V10/25
- G06V20/56
Abstract
A method is provided for training a machine-learning algorithm which relies on primary data captured by at least one primary sensor. Labels are identified based on auxiliary data provided by at least one auxiliary sensor. A care attribute or a no-care attribute is assigned to each label by determining a perception capability of the primary sensor for the label based on the primary data and based on the auxiliary data. Model predictions for the labels are generated via the machine-learning algorithm. A loss function is defined for the model predictions. Negative contributions to the loss function are permitted for all labels. Positive contributions to the loss function are permitted for labels having a care attribute, while positive contributions to the loss function for labels having a no-care attribute are permitted only if a confidence of the model prediction for the respective label is greater than a threshold.
Claims
1. A method for training a machine-learning algorithm configured to process primary data captured by at least one primary sensor in order to determine at least one property of entities in an environment of the at least one primary sensor, the method comprising:
receiving auxiliary data from at least one auxiliary sensor;
identifying labels based on the auxiliary data, the identifying labels comprising determining a respective spatial area to which each label is related;
assigning at least one of a care attribute or a no-care attribute to each identified label by determining a perception capability of the at least one primary sensor for the respective label based on the primary data captured by the at least one primary sensor and based on the auxiliary data captured by the at least one auxiliary sensor, the primary data usable to determine a reference value for a respective spatial area and, for each label, the care attribute is assigned to the respective label if the reference value is greater than a reference threshold and the no-care attribute is assigned to the respective label if the reference value is smaller than or equal to the reference threshold;
generating model predictions for the labels via a machine-learning algorithm;
defining a loss function for the model predictions, wherein the loss function receives a positive loss contribution for which weights of a model on which the machine-learning algorithm relies are increased if the weights contribute constructively to a prediction corresponding to the respective label and a negative loss contribution for which weights of the model are decreased if the weights contribute constructively to a prediction not corresponding to the respective label;
permitting negative contributions to the loss function for all labels;
permitting positive contributions to the loss function for labels having a care attribute; and
permitting positive contributions to the loss function for labels having a no-care attribute only if a confidence value of the model prediction for the respective label is greater than a predetermined threshold.
2. The method according to claim 1, wherein the predetermined threshold for the confidence value is zero.
3. The method according to claim 2, wherein: the at least one primary sensor includes at least one radar sensor; and the reference value is determined based on radar energy detected by the radar sensor within the spatial area to which the respective label is related.
4. The method according to claim 3, wherein: ranges and angles at which radar energy is perceived are determined based on the primary data captured by the radar sensor; and the ranges and angles are assigned to the spatial areas to which the respective labels are related in order to determine the at least one of the care attribute or the no-care attribute for each label.
5. The method according to claim 4, wherein: an expected range, an expected range rate and an expected angle are estimated for each label based on the auxiliary data; and the expected range, the expected range rate and the expected angle of the respective label are assigned to a range, a range rate and an angle derived from the primary data of the radar sensor in order to determine the radar energy associated with the respective label.
6. The method according to claim 5, wherein the expected range rate is estimated for each label based on a speed vector which is estimated for a respective label by using differences of label positions determined based on the auxiliary data at different points in time.
7. The method according to claim 2, wherein: a subset of auxiliary data points is selected which are located within the spatial area related to the respective label; for each auxiliary data point of the subset, it is determined whether a direct line of sight exists between the at least one primary sensor and the auxiliary data point; and for each label, a care attribute is assigned to the respective label if a ratio of a number of auxiliary data points for which the direct line of sight exists to a total number of auxiliary data points of the subset is greater than a further predetermined threshold.
8. The method according to claim 7, wherein: the at least one primary sensor includes a plurality of radar sensors; and the auxiliary data point is regarded as having a direct line of sight to the at least one primary sensor if the auxiliary data point is located within an instrumental field of view of at least one of the radar sensors and has a direct line of sight to at least one of the radar sensors.
9. The method according to claim 8, wherein: for each of the radar sensors, a specific subset of the auxiliary data points is selected for which the auxiliary data points are related to a respective spatial area within an instrumental field of view of the respective radar sensor; the auxiliary data points of the specific subset are projected to a cylinder or sphere surrounding the respective radar sensor; a surface of the cylinder or sphere is divided into pixel areas; for each pixel area, the auxiliary data point having a projection within the respective pixel area and having the closest distance to the respective radar sensor is marked as visible; for each label, a number of visible auxiliary data points is determined which are located within the spatial area related to the respective label and which are marked as visible for at least one of the radar sensors; and the care attribute is assigned to the respective label if the number of visible auxiliary data points is greater than a visibility threshold.
10. The method according to claim 1, wherein: identifying labels based on the auxiliary data includes determining a respective spatial area to which each label is related; a reference value for the respective spatial area is determined based on the primary data; a subset of auxiliary data points is selected which are located within the spatial area related to the respective label; for each auxiliary data point of the subset, it is determined whether a direct line of sight exists between the at least one primary sensor and the auxiliary data point; and for each label, a care attribute is assigned to the respective label if the reference value is greater than a reference threshold and if a ratio of a number of auxiliary data points for which the direct line of sight exists to a total number of auxiliary data points of the subset is greater than a further predetermined threshold.
11. A system for training a machine-learning algorithm, the system comprising:
at least one primary sensor configured to capture primary data;
at least one auxiliary sensor configured to capture auxiliary data; and
a processing unit configured to be used by the machine-learning algorithm to process the primary data in order to determine at least one property of entities in an environment of the at least one primary sensor, the processing unit further configured to:
receive labels identified based on the auxiliary data and a respective spatial area to which each label is related;
assign at least one of a care attribute or a no-care attribute to each identified label by determining a perception capability of the at least one primary sensor for the respective label based on the primary data captured by the at least one primary sensor and based on the auxiliary data captured by the at least one auxiliary sensor, the primary data usable to determine a reference value for a respective spatial area and, for each label, the care attribute is assigned to the respective label if the reference value is greater than a reference threshold, and the no-care attribute is assigned to the respective label if the reference value is smaller than or equal to the reference threshold;
generate model predictions for the labels via the machine-learning algorithm;
define a loss function for the model predictions, the loss function receiving a positive loss contribution for which weights of a model on which the machine-learning algorithm relies are increased if the weights contribute constructively to a prediction corresponding to the respective label, and a negative loss contribution for which weights of the model are decreased if the weights contribute constructively to a prediction not corresponding to the respective label;
permit negative contributions to the loss function for all labels;
permit positive contributions to the loss function for labels having a care attribute; and
permit positive contributions to the loss function for labels having a no-care attribute only if a confidence value of the model prediction for the respective label is greater than a predetermined threshold.
12. The system according to claim 11, wherein: the at least one primary sensor includes at least one radar sensor, and the at least one auxiliary sensor includes at least one of a light detection and ranging (LIDAR) sensor or at least one camera.
13. The system according to claim 11, wherein the predetermined threshold for the confidence value is zero.
14. The system according to claim 13, wherein: the at least one primary sensor includes at least one radar sensor; and the reference value is determined based on radar energy detected by the radar sensor within the spatial area to which the respective label is related.
15. The system according to claim 14, wherein: ranges and angles at which radar energy is perceived are determined based on the primary data captured by the radar sensor; and the ranges and angles are assigned to the spatial areas to which the respective labels are related in order to determine the at least one of the care attribute or the no-care attribute for each label.
16. The system according to claim 15, wherein: an expected range, an expected range rate and an expected angle are estimated for each label based on the auxiliary data; and the expected range, the expected range rate and the expected angle of the respective label are assigned to a range, a range rate and an angle derived from the primary data of the radar sensor in order to determine the radar energy associated with the respective label.
17. The system according to claim 16, wherein the expected range rate is estimated for each label based on a speed vector which is estimated for a respective label by using differences of label positions determined based on the auxiliary data at different points in time.
18. The system according to claim 17, wherein: the at least one primary sensor includes a plurality of radar sensors; and the auxiliary data point is regarded as having a direct line of sight to the at least one primary sensor if the auxiliary data point is located within an instrumental field of view of at least one of the radar sensors and has a direct line of sight to at least one of the radar sensors.
19. The system according to claim 18, wherein: for each of the radar sensors, a specific subset of the auxiliary data points is selected for which the auxiliary data points are related to a respective spatial area within an instrumental field of view of the respective radar sensor; the auxiliary data points of the specific subset are projected to a cylinder or sphere surrounding the respective radar sensor; a surface of the cylinder or sphere is divided into pixel areas; for each pixel area, the auxiliary data point having a projection within the respective pixel area and having the closest distance to the respective radar sensor is marked as visible; for each label, a number of visible auxiliary data points is determined which are located within the spatial area related to the respective label and which are marked as visible for at least one of the radar sensors; and the care attribute is assigned to the respective label if the number of visible auxiliary data points is greater than a visibility threshold.
20. A non-transitory computer-readable storage medium storing one or more programs comprising instructions, which when executed by a processor, cause the processor to perform operations including:
receiving auxiliary data from at least one auxiliary sensor;
identifying labels based on the auxiliary data, the identifying labels comprising determining a respective spatial area to which each label is related;
assigning at least one of a care attribute or a no-care attribute to each identified label by determining a perception capability of the at least one primary sensor for the respective label based on the primary data captured by at least one primary sensor and based on the auxiliary data captured by the at least one auxiliary sensor, the primary data usable to determine a reference value for a respective spatial area and, for each label, the care attribute is assigned to the respective label if the reference value is greater than a reference threshold and the no-care attribute is assigned to the respective label if the reference value is smaller than or equal to the reference threshold;
generating model predictions for the labels via a machine-learning algorithm;
defining a loss function for the model predictions, wherein the loss function receives a positive loss contribution for which weights of a model on which the machine-learning algorithm relies are increased if the weights contribute constructively to a prediction corresponding to the respective label, and a negative loss contribution for which weights of the model are decreased if the weights contribute constructively to a prediction not corresponding to the respective label;
permitting negative contributions to the loss function for all labels;
permitting positive contributions to the loss function for labels having a care attribute; and
permitting positive contributions to the loss function for labels having a no-care attribute only if a confidence value of the model prediction for the respective label is greater than a predetermined threshold.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0042] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0043] Example implementations and functions of the present disclosure are described herein in conjunction with the accompanying drawings.
DETAILED DESCRIPTION
[0049] The primary data or input for the training are received from the radar sensors 13 and are represented as normalized radar energy 21, which is depicted in the form of shadows (as indicated by the arrows) in the drawings.
[0051] The LIDAR system 15 (see the drawings) captures the auxiliary data from which the labels 19, i.e., the bounding boxes, are derived.
[0052] The labels 19 which are derived from the data provided by the LIDAR system 15 are used as ground truth for a cross-domain training of the machine-learning algorithm since reliable labels cannot be derived from the radar data directly, i.e., neither by humans nor by another automated algorithm, as can be recognized by the representation of the normalized radar energy 21 in the drawings. Some of the labels 19, however, relate to objects which are not recognizable in the primary data of the radar sensors 13, and training on such labels would force the radar neural network to predict objects where no radar signal exists.
[0053] In order to avoid the above problem, i.e., forcing the radar neural network to predict objects which are not recognizable for the radar sensors 13, the labels 19 are additionally provided with an attribute 22 which indicates how the respective label 19 is to be considered for the training of the machine-learning algorithm. In detail, each label 19 is provided with a care attribute or a no-care attribute, wherein the care attribute indicates that the respective label is to be fully considered for the training of the machine-learning algorithm or radar neural network, whereas labels 19 provided with the no-care attribute are only partly considered for the training of the radar neural network. This will be explained in detail below. Since the labels 19 are adapted by the attribute 22 in order to provide a ground truth for cross-domain training of a radar neural network, the entire procedure is referred to as ground truth adaptation for a cross-domain training of radar neural networks (GARNN).
[0054] For assigning the attribute 22, i.e., a care attribute or a no-care attribute, to the ground truth labels 19 derived from the auxiliary data which are captured by the LIDAR system 15, two procedures are performed concurrently which are referred to as activation tagging and geometric tagging. For the activation tagging, it is decided for each label 19 whether the respective label 19 can be perceived in the input or primary data captured by the radar sensors 13. A label 19 which cannot be perceived in the primary data would force the machine-learning algorithm to predict a label or object where no signal exists, which would increase the false detection rate.
[0055] The raw data received by the radar sensors 13 are processed in order to generate a so-called compressed data cube (CDC) as a reference for assigning the suitable attribute to the labels 19. For each radar scan or time step, the compressed data cube includes a range dimension, a range rate or Doppler dimension, and an antenna response dimension.
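Purely as an illustration of this data structure, the compressed data cube may be held as a three-dimensional complex array; the shape constants below are assumptions of the sketch and not values of the present disclosure:

```python
import numpy as np

# Assumed bin counts for one radar scan; the real dimensions depend on
# the radar configuration and are not specified by the disclosure.
N_RANGE, N_DOPPLER, N_ANTENNAS = 256, 128, 12

# Compressed data cube (CDC): complex antenna responses indexed by
# range bin and range-rate (Doppler) bin.
cdc = np.zeros((N_RANGE, N_DOPPLER, N_ANTENNAS), dtype=np.complex64)
```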
[0056] As a first step of activation tagging, angles are estimated at which the radar sensors 13 are able to perceive energy. The angles are estimated by using a classical angle finding procedure, e.g., a fast Fourier transform (FFT) or an iterative adaptive approach (IAA). As a result, a three-dimensional compressed data cube is generated including range, range rate and angle dimensions. Thereafter, the perceived radar energy is normalized, e.g., using a corner reflector response or a noise floor estimation.
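A minimal Python sketch of this step is given below, assuming FFT-based angle finding over the antenna dimension and a median-based noise-floor estimate; the function names and the number of angle bins are illustrative only:

```python
import numpy as np

def angle_spectrum(cdc: np.ndarray, n_angle_bins: int = 64) -> np.ndarray:
    """Classical FFT angle finding: transform the antenna dimension of a
    (range, range rate, antenna) cube into an angle dimension and return
    the perceived energy per (range, range rate, angle) cell."""
    spectrum = np.fft.fft(cdc, n=n_angle_bins, axis=-1)
    return np.abs(spectrum) ** 2

def normalize_energy(energy: np.ndarray) -> np.ndarray:
    """Normalize the perceived energy by a noise-floor estimate; the
    median over all cells is used here as a simple proxy, whereas a
    corner-reflector response could equally serve as the reference."""
    noise_floor = float(np.median(energy))
    return energy / max(noise_floor, 1e-12)
```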
[0057] As a next step, a speed vector is assigned to each label 19 (see the drawings). The speed vector is estimated for the respective label 19 by using differences of the label positions which are determined based on the auxiliary data at different points in time. Based on the speed vector, an expected range rate with respect to the radar sensors 13 is estimated for each label 19.
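The estimation may be sketched as follows, assuming for simplicity a stationary radar at the origin of the coordinate system (ego-motion compensation is omitted); the names are illustrative:

```python
import numpy as np

def label_speed_vector(pos_prev: np.ndarray, pos_curr: np.ndarray,
                       dt: float) -> np.ndarray:
    """Finite-difference speed vector from two label positions (x, y)
    determined from the auxiliary (LIDAR) data at different times."""
    return (pos_curr - pos_prev) / dt

def expected_range_rate(position: np.ndarray, velocity: np.ndarray) -> float:
    """Radial component of the speed vector as seen from the radar
    origin, i.e. the range rate the radar is expected to measure."""
    direction = position / np.linalg.norm(position)
    return float(np.dot(velocity, direction))
```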
[0058] In addition to the range rate which is estimated based on the respective speed vector of the label 19, an expected distance and an expected angle with respect to the position of the radar sensors 13 are determined for each label 19. Based on the expected distance, range rate and angle for the respective label 19, which are derived from the LIDAR data points 23 (see the drawings), the normalized radar energy perceived at the corresponding range, range rate and angle bins of the compressed data cube is determined and associated with the respective label 19 as a reference value.
[0060] If the normalized radar energy is greater than or equal to a predefined threshold for the respective label 19, this label can be perceived by the radar sensors 13. Therefore, the care attribute is assigned to this label 19. Conversely, if the normalized radar energy is smaller than the predefined threshold for a certain label 19, this label is regarded as not perceivable for the radar sensors 13. Hence, the no-care attribute is assigned to this label 19.
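A sketch of the resulting decision, assuming the expected range, range rate and angle have already been converted to bin indices of the normalized energy cube (the threshold value is an assumption of the example):

```python
import numpy as np

def activation_tag(energy_cube: np.ndarray,
                   rng_bin: int, rate_bin: int, ang_bin: int,
                   energy_threshold: float = 3.0) -> str:
    """Activation tagging for one label: look up the normalized radar
    energy at the expected (range, range rate, angle) bins and compare
    it against the predefined threshold."""
    reference_value = energy_cube[rng_bin, rate_bin, ang_bin]
    # A small neighborhood maximum could be used instead to absorb
    # discretization errors of the expected bins.
    return "care" if reference_value >= energy_threshold else "no-care"
```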
[0061] The reliability of the activation tagging described above, i.e., associating the normalized radar energy with the respective labels 19, can be limited by a high angular uncertainty of the radar detection. The high angular uncertainty can be recognized in the representation of the normalized radar energy 21 in the drawings.
[0062] Therefore, a second procedure which is called geometric tagging is additionally considered which determines whether a direct line of sight 25 (see the drawings) exists between the radar sensors 13 and the LIDAR data points 23 which are located within the spatial area related to the respective label 19.
[0063] For the geometric tagging, the LIDAR data points 23 are selected first which belong to the respective bounding box or label 19. The selected LIDAR data points 23 are transformed into a coordinate system of the radar sensors 13, i.e., into the “perspective” of the radar sensors 13. While for the activation tagging a “map” of the normalized radar energy has been considered (see the drawings), the geometric tagging relies on the positions of the LIDAR data points 23 relative to the radar sensors 13.
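These two operations may be sketched as follows; an axis-aligned box is assumed for brevity, whereas actual labels 19 may be oriented bounding boxes:

```python
import numpy as np

def points_in_box(points: np.ndarray, box_min: np.ndarray,
                  box_max: np.ndarray) -> np.ndarray:
    """Select the LIDAR data points (N x 3) lying inside an
    axis-aligned bounding box given by its corner coordinates."""
    mask = np.all((points >= box_min) & (points <= box_max), axis=1)
    return points[mask]

def to_radar_frame(points: np.ndarray, rotation: np.ndarray,
                   translation: np.ndarray) -> np.ndarray:
    """Rigid transform of LIDAR points into the radar coordinate
    system, i.e. into the 'perspective' of the radar sensor."""
    return points @ rotation.T + translation
```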
[0064] Each antenna of the radar sensors 13 has a certain aperture angle or instrumental field of view. For the geometric tagging, all LIDAR data points 23 which are located outside the aperture angle or instrumental field of view of the respective antenna are therefore marked as “occluded” for the respective antenna. For the remaining LIDAR data points 23, a cylinder 27 (see the drawings) surrounding the respective radar sensor 13 is considered.
[0065] The surface of the cylinder 27 is divided into pixel areas, and the LIDAR data points 23 which fall into the aperture angle or instrumental field of view of the respective antenna of the radar sensors 13 are projected to the surface of the cylinder 27. For each pixel area of the cylinder 27, the projections of LIDAR data points 23 are considered which fall into this area, and these LIDAR data points 23 are sorted with respect to their distance to the origin of the radar coordinate system. The LIDAR data point 23 having the closest distance to the respective radar sensor 13 is regarded as visible for the respective pixel area, while all further LIDAR data points 23 are marked as “occluded” for this pixel area and for the respective antenna.
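The visibility test for a single antenna may be sketched as follows; the cylinder resolution, the height range and the field-of-view handling are assumptions of the example:

```python
import numpy as np

def visible_mask(points: np.ndarray, fov_rad: float,
                 n_az: int = 512, n_z: int = 64,
                 z_min: float = -2.0, z_max: float = 6.0) -> np.ndarray:
    """Z-buffer style visibility test for one radar antenna.

    Points (N x 3, already in the radar frame) are projected onto a
    cylinder around the sensor; per cylinder pixel only the closest
    point remains visible. Points outside the instrumental field of
    view are marked occluded outright.
    """
    azimuth = np.arctan2(points[:, 1], points[:, 0])
    dist = np.linalg.norm(points[:, :2], axis=1)
    in_fov = np.abs(azimuth) <= fov_rad / 2.0

    # Cylinder pixel indices: azimuth column, height row.
    col = ((azimuth + np.pi) / (2.0 * np.pi) * n_az).astype(int) % n_az
    row = np.clip(((points[:, 2] - z_min) / (z_max - z_min)
                   * n_z).astype(int), 0, n_z - 1)

    depth = np.full((n_z, n_az), np.inf)
    visible = np.zeros(len(points), dtype=bool)
    for i in np.argsort(dist):          # nearest points first
        if not in_fov[i]:
            continue                    # outside the aperture angle
        if dist[i] < depth[row[i], col[i]]:
            depth[row[i], col[i]] = dist[i]   # closest point wins the pixel
            visible[i] = True
    return visible
```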
[0067] In order to determine whether the entire bounding box or label 19 is regarded as occluded for the radar sensors 13, the number of LIDAR data points 23 belonging to the respective label 19 and being visible (i.e., not marked as “occluded”) for at least one single radar antenna is counted. If this number of visible LIDAR data points 23 is lower than a visibility threshold, the no-care attribute is assigned to the respective label 19. The visibility threshold may be set to two LIDAR data points, for example. In this case, a label 19 for which fewer than two LIDAR data points 23 are visible, such as the occluded label 37 (see the drawings), obtains the no-care attribute.
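The per-label decision may then be sketched as below, combining the per-antenna visibility masks (the threshold of two points follows the example above):

```python
import numpy as np

def geometric_tag(visible_per_antenna: list,
                  visibility_threshold: int = 2) -> str:
    """Geometric tagging for one label: a LIDAR data point counts as
    visible if it is visible for at least one radar antenna; the label
    obtains the care attribute if enough points remain visible."""
    visible_any = np.logical_or.reduce(visible_per_antenna)
    n_visible = int(visible_any.sum())
    return "care" if n_visible >= visibility_threshold else "no-care"
```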
[0070] It is noted that the above procedure of geometric tagging is also referred to as z-buffering in computer graphics. As an alternative to the cylinder 27 (see the drawings), a sphere surrounding the respective radar sensor 13 may be used for the projection of the LIDAR data points 23.
[0071] For providing a reliable ground truth for the training of the machine-learning algorithm, the attributes 22 determined by activation tagging and by geometric tagging are combined. That is, a label 19 obtains the care attribute only if both the activation tagging and the geometric tagging have provided the care attribute to the respective label, i.e., if the label can be perceived by the radar sensors 13 due to sufficient radar energy and is geometrically visible (not occluded) for at least one of the radar sensors 13.
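Expressed as a sketch, the combination is a logical AND of the two attributes:

```python
def combined_tag(activation: str, geometric: str) -> str:
    """A label keeps the care attribute only if both activation tagging
    and geometric tagging assigned it; otherwise it is treated as a
    no-care label."""
    if activation == "care" and geometric == "care":
        return "care"
    return "no-care"
```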
[0072] For the training of the machine-learning algorithm, i.e., of the radar neural network, labelled data are provided which include inputs in the form of primary data from the radar sensors 13 and the labels 19 which are also referred to as ground truth and which are provided in the form of bounding boxes 19 (see the drawings), each label 19 carrying the attribute 22 determined as described above.
[0073] During the training, two types of contributions may be received by the loss function. For positive loss contributions, weights of a model on which the machine-learning algorithm relies are increased if these weights contribute constructively to a prediction corresponding to the ground truth or label 19. Conversely, for negative loss contributions the weights of the model are decreased if these weights contribute constructively to a prediction which does not correspond to the ground truth, i.e., one of the labels 19.
[0074] For the training procedure according to the present disclosure, the labels 19 having the care attribute are generally permitted to provide positive and negative loss contributions to the loss function. For the labels 19 having the no-care attribute, neither positive nor negative contributions could be permitted, i.e., labels having the no-care attribute could simply be ignored. In this case, the machine-learning algorithm would not be forced to predict any label or object which is not perceivable by the radar sensors 13. However, any wrong prediction would also be ignored and not penalized by a negative loss contribution. Therefore, the negative loss contribution is at least to be permitted for labels having the no-care attribute.
[0075] To improve the training procedure, a dynamic positive loss contribution is also permitted for the labels 19 having the no-care attribute. In detail, a positive loss contribution is generated for a label 19 having a no-care attribute only if a confidence value P for predicting a ground truth label 19 is greater than a predefined threshold τ, i.e., P>τ, wherein τ is greater than or equal to 0 and smaller than 1.
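A minimal sketch of this masking rule is given below using PyTorch; the per-label loss terms are assumed to be computed elsewhere by the model, and the tensor names are illustrative only:

```python
import torch

def garnn_loss(pos_loss: torch.Tensor, neg_loss: torch.Tensor,
               is_care: torch.Tensor, confidence: torch.Tensor,
               tau: float = 0.0) -> torch.Tensor:
    """Combine per-label loss contributions according to the care and
    no-care attributes.

    pos_loss, neg_loss: per-label positive/negative loss contributions
    is_care:            boolean mask, True for labels with the care attribute
    confidence:         confidence value P of the model prediction per label
    tau:                predefined threshold, 0 <= tau < 1 (tau = 0 permits
                        positive contributions for any positive confidence)
    """
    # Negative contributions are permitted for all labels.
    total = neg_loss.sum()
    # Positive contributions: always for care labels; for no-care labels
    # only if the confidence exceeds the threshold (P > tau).
    allow_pos = is_care | (confidence > tau)
    return total + (pos_loss * allow_pos.float()).sum()
```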
[0076] Allowing dynamic positive loss contributions for labels 19 having a no-care attribute allows the machine-learning algorithm or model to use complex cues, e.g., multi-reflections or temporal cues, to predict, for example, the presence of objects. Hence, permitting positive loss contributions for labels having the no-care attribute in a dynamic manner (i.e., by controlling the predefined threshold τ for the confidence value P) strengthens complex decisions and improves the performance of the model predictions via the machine-learning algorithm.
[0079] The solid lines 41, 43 and 45 represent the results for a machine-learning algorithm which does not use the attributes 22 for the labels 19, i.e., the care and no-care attributes have not been used for the training of the machine-learning algorithm. In contrast, the dashed lines 42, 44 and 46 depict the results for a machine-learning algorithm which has been trained via labels 19 having the care or no-care attribute according to the present disclosure.
[0080] Pairs of lines relate to the same object class, i.e., lines 41 and 42 to the class “pedestrian”, lines 43 and 44 to the class “moving vehicle”, and lines 45 and 46 to the class “stationary vehicle”.
REFERENCE NUMERAL LIST
[0081] 11 host vehicle
[0082] 13 radar sensor
[0083] 15 LIDAR system
[0084] 16 vehicle coordinate system
[0085] 17 processing unit
[0086] 18 x-axis
[0087] 19 bounding box, label
[0088] 20 y-axis
[0089] 21 normalized radar energy
[0090] 23 LIDAR data point
[0091] 25 line of sight
[0092] 27 cylinder
[0093] 29 occluded LIDAR data point
[0094] 31 region without LIDAR data points
[0095] 33 LIDAR data point visible for the radar sensor
[0096] 35 LIDAR data point occluded for the radar sensor
[0097] 37 occluded label
[0098] 41 line for class “pedestrian”, labels without attributes
[0099] 42 line for class “pedestrian”, labels with attributes
[0100] 43 line for class “moving vehicle”, labels without attributes
[0101] 44 line for class “moving vehicle”, labels with attributes
[0102] 45 line for class “stationary vehicle”, labels without attributes
[0103] 46 line for class “stationary vehicle”, labels with attributes