Methods and Systems for Generating Ground Truth Data

20220402504 · 2022-12-22

    Abstract

    A computer-implemented method for generating ground truth data may include the following steps carried out by computer hardware components: for a plurality of points in time, acquiring sensor data for a respective point in time; and for at least a subset of the plurality of points in time, determining ground truth data of the respective point in time based on the sensor data of at least one present and/or past point of time and at least one future point of time.

    Claims

    1. A computer-implemented method for generating ground truth data, the method comprising: for a plurality of points in time, acquiring sensor data for a respective point in time; and for at least a subset of the plurality of points in time, determining ground truth data of the respective point in time based on the sensor data of a future point of time and at least one of a present point of time or a past point of time.

    2. The computer-implemented method of claim 1, wherein: at least one of the present point of time, the past point of time, or the future point of time are relative to the respective point in time.

    3. The computer-implemented method of claim 1, wherein: the sensor data includes at least one of radar data or lidar data.

    4. The computer-implemented method of claim 1, further comprising: training a machine-learning model based on the ground truth data.

    5. The computer-implemented method of claim 4, wherein the machine-learning model is configured to at least one of: determine an occupancy grid; or classify an object with respect to underdrivability.

    6. The computer-implemented method of claim 5, wherein the determining comprises: determining the ground truth data based on at least two maps.

    7. The computer-implemented method of claim 6, wherein: the at least two maps include a full-range map based on scans that are irrespective of a range of the scans.

    8. The computer-implemented method of claim 7, wherein: the at least two maps include a limited-range map based on scans that are below a pre-determined range threshold.

    9. The computer-implemented method of claim 8, further comprising: labeling a cell as non-underdrivable or underdrivable based on a probability of the cell in the full-range map and a probability of the cell in the limited-range map.

    10. The computer-implemented method of claim 9, wherein the labeling comprises: labeling the cell as non-underdrivable responsive to the probability of the cell in the limited-range map being above a first pre-determined threshold.

    11. The computer-implemented method of claim 10, wherein the labeling further comprises: labeling the cell as underdrivable responsive to the probability of the cell in the full-range map being above a second pre-determined threshold and the probability of the cell in the limited-range map being equal to a value representing no occupation in the cell.

    12. A non-transitory computer-readable medium storing one or more programs comprising instructions, which when executed by at least one processor, cause the at least one processor to perform operations including: for a plurality of points in time, acquiring sensor data for a respective point in time; and for at least a subset of the plurality of points in time, determining ground truth data of the respective point in time based on the sensor data of a future point of time and at least one of a present point of time or a past point of time.

    13. The non-transitory computer-readable medium of claim 12, wherein the operations further include: training a machine-learning model based on the ground truth data, the machine-learning model configured to determine an occupancy grid.

    14. The non-transitory computer-readable medium of claim 12, wherein the operations further include: training a machine-learning model based on the ground truth data, the machine-learning model configured to classify at least one of an object or a cell with respect to underdrivability or non-underdrivability.

    15. The non-transitory computer-readable medium of claim 12, wherein the determining comprises: determining the ground truth data based on at least two maps, the at least two maps including a full-range map based on scans that are irrespective of a range of the scans and a limited-range map based on scans that are below a pre-determined range threshold.

    16. A system comprising: one or more processors; and a memory coupled to the one or more processors, the memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions that, when executed by the one or more processors, cause the one or more processors to: for a plurality of points in time, acquire sensor data for a respective point in time; and for at least a subset of the plurality of points in time, determine ground truth data of the respective point in time based on the sensor data of a future point of time and at least one of a present point of time or a past point of time.

    17. The system of claim 16, wherein the one or more programs include further instructions that, when executed by the one or more processors, cause the one or more processors to: train a machine-learning model based on the ground truth data.

    18. The system of claim 17, wherein the machine-learning model comprises an artificial neural network.

    19. The system of claim 16, wherein the one or more programs include further instructions that, when executed by the one or more processors, cause the one or more processors to: determine the ground truth data based on at least two maps, the at least two maps including a full-range map and a limited-range map.

    20. The system of claim 19, wherein the one or more programs include further instructions that, when executed by the one or more processors, cause the one or more processors to: label a cell as non-underdrivable or underdrivable based on a probability of the cell from the full-range map and a probability of the cell from the limited-range map.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0028] Example implementations and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically:

    [0029] FIG. 1 is an illustration of a traditional pipeline of occupancy grid creation;

    [0030] FIG. 2 is an example occupancy grid creation in the training procedure according to various implementations;

    [0031] FIG. 3 is an illustration of an example mask which may be defined as a certain region around the path taken by the ego-vehicle;

    [0032] FIG. 4 is a flow diagram illustrating an example method for generating ground truth data according to various implementations; and

    [0033] FIG. 5 is an example computer system with a plurality of computer hardware components configured to carry out steps of a computer-implemented method for generating ground truth data according to various implementations.

    DETAILED DESCRIPTION

    [0034] Employing machine learning methods, for example artificial neural networks, on low-level radar data for object detection and environment classification may provide superior results compared to traditional methods working on conventionally filtered radar detections, as shown by RaDOR.Net (in European Patent Application No. 20187674.5, now European Published Patent Application EP 3 943 968, published Jan. 26, 2022, which is incorporated herein in its entirety for all purposes). The low-level radar data may, for example, include radar data arranged in a cube, which can be sparse as all beamvectors below a CFAR (constant false alarm rate) level may be suppressed. In some cases, missing antenna elements in the beamvector may be interpolated, and calibration may be applied—e.g., with the bin-values being scaled according to the radar equation.

    [0035] The superior results may be explained by the fact that the radar data contains plenty of information that is removed due to detection filtering and by the ability of the machine learning method to filter this large amount of data in a sophisticated way.

    [0036] In addition to rich and genuine input sensor data, the preparation of ground truth (GT) data may be relatively important. The GT data can represent the desired output of the machine learning method while not forcing the machine learning method to create an output that fails to actually be represented by the input sensor data.

    [0037] For example, creating the GT data (manually or automatically) based on a stronger reference (e.g., Lidar) may yield a detailed and precise GT but may overstrain the machine learning method by requesting an output that it cannot actually see from the input sensor position, or that it cannot see because the reference and the input sensor (e.g., Lidar and Radar) acquire data differently. This can negatively affect the system output.

    [0038] According to various implementations, the GT data may be determined without using an additional reference sensor. Example applications are determining an Occupancy Grid (OCG) via a machine learning method or underdrivability classification using a machine learning method. The training pipeline may employ a traditional OCG method on conventionally filtered radar detections to automatically create the GT for the network to train on. The relatively naïve procedure of presenting the respective OCG frame output to the network at training would apparently limit the network to output OCG data resembling the quality of the utilized OCG method.

    [0039] Due to the radar filtering, this method may react only to relatively “strong” signals and may thus delay the time until distant oncoming structures are identified. The machine learning method, in contrast, may have the capability to identify relatively “weak” signals in the radar data (for example, the low-level radar data) and thus detect these oncoming structures earlier, provided it was trained using appropriate GT that includes these more-distant structures.

    [0040] According to various implementations, this appropriate GT may be created by feeding the method additional sensor data from “future timestamps” when creating the GT for a current timestamp. This results in more complete ground truth data that is still based on data of the input sensor only and that incorporates distant and high structures as well, as they lead to “strong” signals in these additional future frames.

    [0041] FIG. 1 shows an illustration 100 of a pipeline of traditional OCG creation. The OCG 102 created at the current time 104 is only based on sensor data 106 of (or up to) this point in time 104 and on the OCG 108 of the previous point in time.

    [0042] FIG. 2 shows an illustration 200 of an example training pipeline according to various implementations. A general OCG technique may be utilized to create GT data 102. However, GT data 202 can further be created for network training based on future input sensor data 204 in addition to the current and/or past input sensor data 106 in order to create a more complete output of the GT data 202 for the current time step 104. The machine learning method (for example network) may be trained on this enriched GT. At execution time, the machine learning method (for example network) may be fed the current radar data 106 (for example low-level radar data) only. It will be understood that at execution time, the future sensor data 204 is not available; however, for training, a recorded sequence of historic sensor data may be used, and this sequence includes time steps that lie in the future relative to earlier time steps of the same sequence. The network output 108 of the previous timestamp (or time step) may either be fed explicitly or stored within the network nodes (for example in a recurrent neural network).
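    The difference between the traditional pipeline of FIG. 1 and the training pipeline of FIG. 2 can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the frame format (a list of occupied grid cells per time step), the additive log-odds update, the value of `hit_log_odds`, and the `horizon` parameter are all assumptions introduced for the example.

```python
import math


def accumulate_log_odds(frames, hit_log_odds=0.85):
    """Sum per-cell log-odds evidence over the given frames.

    Each frame is a list of (row, col) cells in which a detection occurred.
    The additive update and its weight are illustrative assumptions.
    """
    grid = {}
    for frame in frames:
        for cell in frame:
            grid[cell] = grid.get(cell, 0.0) + hit_log_odds
    return grid


def to_probability(log_odds):
    """Convert log-odds to an occupancy probability (0.0 maps to 0.5)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def causal_grid(frames, t):
    """Grid for time step t from past and present frames only (as in FIG. 1)."""
    return accumulate_log_odds(frames[: t + 1])


def ground_truth_grid(frames, t, horizon):
    """Grid for time step t that also uses `horizon` future frames (as in FIG. 2)."""
    return accumulate_log_odds(frames[: t + 1 + horizon])


# A recorded sequence: cell (5, 5) only yields detections from step 2 onward.
recorded = [[(0, 0)], [(0, 0)], [(5, 5)], [(5, 5), (0, 0)]]
online = causal_grid(recorded, t=1)
enriched = ground_truth_grid(recorded, t=1, horizon=2)
```

    In this sketch, `enriched` contains the distant cell (5, 5) for time step 1 although `online` does not, which is the kind of more complete target the network would be trained on.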

    [0043] According to various implementations, lower-level radar data may be used with an OCG method or traditional underdrivability classification for GT creation.

    [0044] According to various implementations, a combination with an additional sensor (e.g., Lidar) may be provided.

    [0045] According to various implementations, the methods as described herein may be used for alternative network output (e.g., multiclass SemSeg instead of OCG). SemSeg stands for semantic segmentation, where each data point is assigned a higher-level meaning, such as a sidewalk or a road. An OCG, in contrast, may show whether the particular data point represents an occupied region or a free space, but assigns no higher-level meaning.

    [0046] According to various implementations, the methods as described herein may be used for a radar-based automatic ground truth annotation system for underdrivability classification. For example, the method may be for automatically generating ground truth data for the classification problem of under- and non-underdrivability with a radar sensor.

    [0047] With the automatic ground truth generation as described herein, GT may be established with the used radar itself, an offline system to generate GT data for an online system may be provided, no manual labeling may be needed, no additional sensor may be needed, no additional installing of sensor hardware may be required, no extrinsic calibration/temporal sync may be required for any additional sensors (while calibration and/or synchronization may still be described for the radar itself), no additional software may be needed, and/or fast testing of new radars may be possible (for example, the radar may just need to be installed and driving may start).

    [0048] According to various implementations, the limited elevation field of view (FoV) may be leveraged to label regions as under- or non-underdrivable.

    [0049] Due to the limited elevation FoV, underdrivable objects may not be observable at close ranges, whereas non-underdrivable objects are also observed at close ranges.

    [0050] In order to be able to generate labels for high ranges as well, not only data from the past to the present may be used, but data from the future path of the ego vehicle may be considered.

    [0051] Furthermore, the information where the ego vehicle (equipped with the radar sensor) drives may be considered during the labeling process.

    [0052] According to various implementations, in order to automatically generate ground truth data, two different occupancy grid maps may be created:

    [0053] a full-range omniscient map (FRom): An occupancy grid map may be created not only from the available information of past scans up to the present, but from additional scans including future ones (and hence, the approach may be referred to as “omniscient” due to being based on one or more future scans).

    [0054] a limited-range omniscient map (LRom): Like the FRom, but only detections below a fixed range threshold may be considered during the mapping process, in order to filter out underdrivable objects.
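    The two maps can be sketched as one mapping routine run with and without a range filter. The detection format `(x, y, range_m)` in world coordinates, the cell size, the additive log-odds update, and the 15 m threshold used in the example are assumptions made for illustration and are not taken from the disclosure beyond the example thresholds it mentions.

```python
import math

CELL_SIZE_M = 1.0    # assumed grid resolution
HIT_LOG_ODDS = 0.85  # assumed evidence added per detection


def cell_of(x, y):
    """Map a world position to a grid cell index."""
    return (math.floor(x / CELL_SIZE_M), math.floor(y / CELL_SIZE_M))


def build_map(scans, max_range_m=None):
    """Build an occupancy map from all scans (past, present, and future).

    With max_range_m=None every detection is used (full-range map).
    Otherwise only detections measured below max_range_m are used
    (limited-range map).
    """
    log_odds = {}
    for scan in scans:
        for x, y, range_m in scan:
            if max_range_m is not None and range_m >= max_range_m:
                continue
            cell = cell_of(x, y)
            log_odds[cell] = log_odds.get(cell, 0.0) + HIT_LOG_ODDS
    return {c: 1.0 / (1.0 + math.exp(-v)) for c, v in log_odds.items()}


def probability(grid, cell, default=0.5):
    """Occupancy probability of a cell; unobserved cells keep the default."""
    return grid.get(cell, default)


# A bridge-like object is detected only from far away; a pole is also detected up close.
scans = [
    [(50.5, 0.5, 40.0), (10.5, 3.5, 30.0)],
    [(50.5, 0.5, 30.0)],
    [(10.5, 3.5, 12.0)],
]
full_range_map = build_map(scans)
limited_range_map = build_map(scans, max_range_m=15.0)
```

    Here the bridge cell (50, 0) is occupied in `full_range_map` but stays at the default probability in `limited_range_map`, while the pole cell (10, 3) is occupied in both.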

    [0055] Labeling may be possible in regions which are considered by the mapping process.

    [0056] FIG. 3 shows an illustration 300 of an example mask 316, which may be defined as a certain region around the path taken by the ego-vehicle 302. The size of that region may depend on the azimuth FoV of the radar sensor and the range threshold (e.g., 15 m or 20 m, as illustrated by arrow 306). The FoVs for various example positions along the path taken by the ego-vehicle 302 are illustrated by triangles 304, 308, 310, 312, and 314 in FIG. 3. Illustratively speaking, the mask 316 can be the hull of these triangles that represent the FoV.

    [0057] In some cases, cells within the mask region 316 may be automatically labeled, but other remaining cells may be set as “unknown” and may be ignored during training of the machine-learning model.
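    One way to compute such a mask is sketched below. It is an assumption-laden simplification: each FoV is approximated as a circular sector rather than a triangle, the mask is taken as the union of the sectors rather than their hull, poses are `(x, y, heading_rad)` tuples, and the 90 degree azimuth FoV is an arbitrary example value.

```python
import math


def in_fov(pose, point, range_m, azimuth_fov_deg):
    """Return True if `point` lies inside the sector-shaped FoV at `pose`."""
    px, py, heading = pose
    dx, dy = point[0] - px, point[1] - py
    dist = math.hypot(dx, dy)
    if dist > range_m:
        return False
    if dist == 0.0:
        return True
    bearing = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi).
    diff = (bearing - heading + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff) <= math.radians(azimuth_fov_deg) / 2.0


def build_mask(path, cells, range_m=15.0, azimuth_fov_deg=90.0):
    """Mark a cell with 1 if any pose along the path has it in its FoV, else 0."""
    return {
        cell: int(any(in_fov(pose, cell, range_m, azimuth_fov_deg) for pose in path))
        for cell in cells
    }


# Ego vehicle drives along the x axis, heading in the +x direction.
path = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
cells = [(5.0, 0.0), (20.0, 0.0), (0.0, 10.0), (-5.0, 0.0), (40.0, 0.0)]
mask = build_mask(path, cells)
```

    Cells with a mask value of 0 would be treated as “unknown” and ignored during training, in line with the paragraph above.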

    [0058] An example label logic based on FRom, LRom and the mask may be:

    [0059] A cell may be labelled as “non-underdrivable” if Probability(LRom)>0.5 and mask==1 (wherein mask==1 means that the cell is inside the mask region);

    [0060] A cell may be labelled as “underdrivable” if Probability(FRom)>0.5 and Probability(LRom)==0.5 and mask==1.

    [0061] The default probability for the occupancy grid maps may be, for example, 0.5.

    [0062] By the above logic, a cell which is occupied according to the limited-range map is labelled as “non-underdrivable” (since objects which can be detected from a short distance “usually” are non-underdrivable). If an object is not present according to the limited-range map, but it is present according to the full-range map, the cell may be labelled as “underdrivable” (since objects which can be detected from a large distance, but not from a shorter distance, “usually” are underdrivable).
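    The example label logic can be written out as a small function. The function name, the label strings, and the choice to return “unknown” for cells inside the mask that match neither rule are assumptions of this sketch; the two rules and the 0.5 values follow the example above.

```python
DEFAULT_PROBABILITY = 0.5  # default occupancy probability of an unobserved cell


def label_cell(p_full_range, p_limited_range, mask, threshold=0.5):
    """Label one grid cell from its FRom and LRom probabilities and its mask value.

    Cells outside the mask, and cells matching neither rule (an assumption
    of this sketch), are returned as "unknown".
    """
    if mask != 1:
        return "unknown"
    if p_limited_range > threshold:
        return "non-underdrivable"
    if p_full_range > threshold and p_limited_range == DEFAULT_PROBABILITY:
        return "underdrivable"
    return "unknown"


labels = [
    label_cell(0.9, 0.8, 1),  # seen at short range
    label_cell(0.9, 0.5, 1),  # seen only at long range
    label_cell(0.9, 0.8, 0),  # outside the mask
    label_cell(0.5, 0.5, 1),  # never observed
]
```

    The exact comparison with the default value relies on unobserved cells being stored with exactly 0.5; a mapping process that also lowers probabilities for free space would need a different test for “no occupation”.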

    [0063] The labeling approach according to various implementations may allow ground truth to be generated for cells in a world-centric grid with respect to the classification as under- or non-underdrivable. As for the grid maps, a world-centric coordinate system and the detections from up to all scans may be used (including the future, and hence, the approach may be referred to as “omniscient”). The labels may then also be available for high ranges where the underdrivable objects are clearly observable within the FoV. This fact may allow machine learning methods to be trained that classify under- and non-underdrivable regions based on radar sensor information like elevation information or RCS (radar cross section) measurements for very high ranges.
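    Accumulating detections from all scans in one grid presupposes that each detection is expressed in the world-centric coordinate system. A minimal two-dimensional sketch of that conversion is given below; the pose format `(x, y, heading_rad)`, the polar detection format, and the assumption that the sensor sits at the vehicle origin with no mounting offset are illustrative simplifications.

```python
import math


def detection_to_world(ego_pose, range_m, azimuth_rad):
    """Convert a polar detection in the sensor frame to world coordinates.

    Assumes a 2D world and a sensor mounted at the ego-vehicle origin,
    aligned with the vehicle heading (no mounting offset).
    """
    x, y, heading = ego_pose
    angle = heading + azimuth_rad
    return (x + range_m * math.cos(angle), y + range_m * math.sin(angle))


# The same static object seen from two ego poses maps to the same world point.
first = detection_to_world((0.0, 0.0, 0.0), 30.0, 0.0)
second = detection_to_world((10.0, 0.0, 0.0), 20.0, 0.0)
```

    Because both detections land on the same world position, evidence from the far observation and the later, closer observation accumulates in the same grid cell.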

    [0064] FIG. 4 shows a flow diagram 400 illustrating an example method for generating ground truth data according to various implementations. At 402, for a plurality of points in time, sensor data for the respective point in time may be acquired. At 404, for at least a subset of the plurality of points in time, ground truth data of the respective point in time may be determined based on the sensor data of at least one present and/or past point of time and at least one future point of time.

    [0065] FIG. 5 shows an example computer system 500 with a plurality of computer hardware components configured to carry out steps of a computer-implemented method for generating ground truth data according to various implementations. The computer system 500 may include a processor 502, a memory 504, and a non-transitory data storage 506. A sensor 508 may be provided as part of the computer system 500 (as illustrated in FIG. 5), or the sensor 508 may be provided external to the computer system 500.

    [0066] The processor 502 may carry out instructions provided in the memory 504. The non-transitory data storage 506 may store a computer program, including the instructions that may be transferred to the memory 504 and then executed by the processor 502. The sensor 508 may be used for determining the sensor data for the respective points in time.

    [0067] The processor 502, the memory 504, and the non-transitory data storage 506 may be coupled with each other, e.g., via an electrical connection 510, such as, e.g., a cable or a computer bus or via any other suitable electrical connection to exchange electrical signals. The sensor 508 may be coupled to the computer system 500, for example via an external interface, or may be provided as part(s) of the computer system 500 (e.g., internal to the computer system, for example coupled via the electrical connection 510).

    [0068] The terms “coupling” or “connection” are intended to include a direct “coupling” (for example via a physical link) or direct “connection” as well as an indirect “coupling” or indirect “connection” (for example via a logical link), respectively.

    [0069] It will be understood that what has been described for one of the methods above may analogously hold true for the computer system 500.

    REFERENCE NUMERAL LIST

    [0070] 100 illustration of a traditional pipeline of occupancy grid creation
    [0071] 102 occupancy grid
    [0072] 104 current time
    [0073] 106 sensor data of (or up to) the current time
    [0074] 108 occupancy grid of the previous point in time
    [0075] 200 occupancy grid creation in the training procedure according to various implementations
    [0076] 202 ground truth data
    [0077] 204 future input sensor data
    [0078] 206 preprocessing
    [0079] 208 real-time
    [0080] 300 illustration of a mask which may be defined as a certain region around the path taken by the ego-vehicle
    [0081] 302 ego-vehicle
    [0082] 304 triangle
    [0083] 306 arrow illustrating range threshold
    [0084] 308 triangle
    [0085] 310 triangle
    [0086] 312 triangle
    [0087] 314 triangle
    [0088] 316 mask
    [0089] 400 flow diagram illustrating an example method for generating ground truth data according to various implementations
    [0090] 402 step of, for a plurality of points in time, acquiring sensor data for the respective point in time
    [0091] 404 step of, for at least a subset of the plurality of points in time, determining ground truth data of the respective point in time based on the sensor data of at least one present and/or past point of time and at least one future point of time
    [0092] 500 example computer system according to various implementations
    [0093] 502 processor
    [0094] 504 memory
    [0095] 506 non-transitory data storage
    [0096] 508 sensor
    [0097] 510 connection