Methods and Systems for Object Detection

20230093301 · 2023-03-23

    Abstract

    This disclosure describes systems and techniques for object detection. In aspects, techniques include obtaining 3D data including range data, angle data, and doppler data. The techniques further include processing a deep-learning algorithm on the 3D data to obtain processed 3D data and obtaining processed 2D data from the processed 3D data. The processed 2D data includes range data and angle data.

    Claims

    1. A computer-implemented method comprising: obtaining three-dimensional (3D) data, the 3D data comprising range data, angle data, and doppler data; processing a deep-learning algorithm on the 3D data to obtain processed 3D data; and obtaining processed two-dimensional (2D) data from the processed 3D data, the processed 2D data comprising range data and angle data.

    2. The computer-implemented method as described in claim 1, further comprising: decomposing, prior to processing the deep-learning algorithm, the 3D data into three sets of 2D data, a first set of 2D data comprising range data and angle data, a second set of 2D data comprising range data and doppler data, and a third set of 2D data comprising angle data and doppler data.

    3. The computer-implemented method as described in claim 2, wherein processing the deep-learning algorithm on the 3D data comprises processing the first set of 2D data, the second set of 2D data, and the third set of 2D data individually.

    4. The computer-implemented method as described in claim 3, wherein processing of the first set of 2D data comprises processing a compression algorithm.

    5. The computer-implemented method as described in claim 2, wherein processing at least one of the first set of 2D data, of the second set of 2D data, or the third set of 2D data comprises processing a convolution algorithm.

    6. The computer-implemented method as described in claim 2, wherein processing of the first set of 2D data comprises processing a dropout algorithm.

    7. The computer-implemented method as described in claim 2, wherein processing the deep-learning algorithm on the 3D data comprises processing a position encoding algorithm on the second set of 2D data and the third set of 2D data.

    8. The computer-implemented method as described in claim 2, wherein processing the deep-learning algorithm on the 3D data further comprises aligning the first set of 2D data, the second set of 2D data, and the third set of 2D data in the first set of 2D data.

    9. The computer-implemented method as described in claim 2, wherein processing the deep-learning algorithm on the 3D data further comprises aligning the first set of 2D data, the second set of 2D data, and the third set of 2D data in the first set of 2D data by applying a cross-attention algorithm attending from the first set of 2D data on the second set of 2D data and on the third set of 2D data.

    10. The computer-implemented method as described in claim 1, wherein processing the deep-learning algorithm on the 3D data further comprises processing a convolution algorithm.

    11. The computer-implemented method as described in claim 10, wherein processing the convolution algorithm processes the angle data and the doppler data.

    12. The computer-implemented method as described in claim 10, wherein processing the convolution algorithm processes the range data and the angle data.

    13. The computer-implemented method as described in claim 1, wherein processing the deep-learning algorithm on the 3D data further comprises processing an upsample algorithm.

    14. The computer-implemented method as described in claim 1, wherein obtaining the 3D data comprises obtaining range data, antenna data, and doppler data, the computer-implemented method further comprising: processing at least one of a Fourier transformation algorithm or a dense layer algorithm; and processing an absolute number algorithm.

    15. A computer system comprising: a radar device; a processing device; and a non-transitory computer-readable medium storing one or more programs, the one or more programs comprising instructions, which when executed by the processing device, cause the computer system to: obtain 3D data, the 3D data comprising range data, angle data, and doppler data; process a deep-learning algorithm on the 3D data to obtain processed 3D data; and obtain processed 2D data from the processed 3D data, the processed 2D data comprising range data and angle data.

    16. The computer system as described in claim 15, wherein the instructions further cause the computer system to: decompose, prior to processing the deep-learning algorithm, the 3D data into three sets of 2D data, a first set of 2D data comprising range data and angle data, a second set of 2D data comprising range data and doppler data, and a third set of 2D data comprising angle data and doppler data.

    17. The computer system as described in claim 16, wherein processing the deep-learning algorithm on the 3D data comprises processing the first set of 2D data, the second set of 2D data, and the third set of 2D data individually.

    18. The computer system as described in claim 17, wherein processing of the first set of 2D data comprises processing a compression algorithm.

    19. The computer system as described in claim 16, wherein processing at least one of the first set of 2D data, of the second set of 2D data, or the third set of 2D data comprises processing a convolution algorithm.

    20. The computer system as described in claim 16, wherein processing the deep-learning algorithm on the 3D data further comprises aligning the first set of 2D data, the second set of 2D data, and the third set of 2D data in the first set of 2D data by applying a cross-attention algorithm attending from the first set of 2D data on the second set of 2D data and on the third set of 2D data.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0031] Example embodiments and functions of the present disclosure are described herein in conjunction with the following drawings, shown schematically in:

    [0032] FIG. 1 illustrates a view of an embodiment of a computer system according to the present disclosure;

    [0033] FIG. 2 illustrates a flow chart of an embodiment of a method according to the present disclosure as being carried out by the computer system of FIG. 1;

    [0034] FIG. 3 illustrates a more detailed flow chart of an embodiment of a method according to FIG. 2;

    [0035] FIG. 4 illustrates a part of a method according to FIG. 3;

    [0036] FIG. 5 illustrates another part of a method according to FIG. 3;

    [0037] FIG. 6 illustrates a more detailed flow chart of another embodiment of a method according to FIG. 2; and

    [0038] FIG. 7 illustrates another more detailed flow chart of another embodiment of a method according to FIG. 6.

    DETAILED DESCRIPTION

    [0039] FIG. 1 depicts a view of an embodiment of a computer system 10 according to the present disclosure. Therein, the computer system 10 comprises a radar device 12, which may also be named a radar sensor, and a processing device 14, which may also be named a processor. The computer system 10 may in particular be embedded into a vehicle (not shown).

    [0040] The computer system 10 is in particular adapted to carry out a computer-implemented method for object detection. Therein, the radar device 12 is adapted to obtain 3D data, the 3D data comprising range data, angle data, and doppler data. The processing device 14 is adapted to process a deep-learning algorithm on the 3D data to obtain processed 3D data and to obtain processed 2D data from the processed 3D data, the processed 2D data comprising range data and angle data.

    [0041] The processing device 14 may be further adapted to decompose the 3D data into three sets of 2D data, a first set of 2D data comprising range data and angle data, a second set of 2D data comprising range data and doppler data, and a third set of 2D data comprising angle data and doppler data. Therein, processing a deep-learning algorithm on the 3D data comprises processing the first set of 2D data, the second set of 2D data, and the third set of 2D data individually.

    [0042] The processing of the first set of 2D data may comprise processing a compression algorithm.

    [0043] Processing the first set of 2D data, the second set of 2D data, and/or the third set of 2D data may also comprise processing a convolution algorithm.

    [0044] The processing of the first set of 2D data may also comprise processing a dropout algorithm.

    [0045] Processing the deep-learning algorithm on the 3D data may also comprise processing a position encoding algorithm on the second set of 2D data and the third set of 2D data.

    [0046] Processing the deep-learning algorithm on the 3D data may further comprise aligning the first set of 2D data, the second set of 2D data, and the third set of 2D data in the first set of 2D data.

    [0047] Processing the deep-learning algorithm on the 3D data may further comprise aligning the first set of 2D data, the second set of 2D data, and the third set of 2D data in the first set of 2D data by applying a cross-attention algorithm attending from the first set of 2D data on the second set of 2D data and on the third set of 2D data.

    [0048] Processing the deep-learning algorithm on the 3D data may further comprise processing a convolution algorithm.

    [0049] Obtaining the 3D data may also comprise obtaining range data, antenna data, and doppler data. Therein, the processing device 14 may be further adapted to process one or more of a Fourier transformation algorithm, a dense layer algorithm, and an Abs algorithm.

    [0050] Processing the deep-learning algorithm on the 3D data may further comprise processing a convolution algorithm on the range data, the angle data, and the doppler data.

    [0051] Processing the deep-learning algorithm on the 3D data may further comprise processing a convolution algorithm on the angle data and the doppler data.

    [0052] Processing the deep-learning algorithm on the 3D data may also further comprise processing a convolution algorithm on the range data and the angle data.

    [0053] Processing the deep-learning algorithm on the 3D data may further comprise processing an upsample algorithm.

    [0054] FIG. 2 depicts a flow chart of an embodiment of a method 100 according to the present disclosure. The method 100 includes a first step 110 including obtaining 3D data. The 3D data may include range data, angle data, and doppler data. The method 100 includes a second step 120 including processing a deep-learning algorithm on the 3D data to obtain processed 3D data. The method 100 includes a further step 130 including obtaining processed 2D data from the processed 3D data. The processed 2D data may include range data and angle data. Aspects of the method 100 will be described in more detail in line with FIGS. 3 to 5.

    [0055] FIG. 3 depicts a more detailed flow chart of an embodiment of a method 1100 according to the present disclosure. The method 1100 includes three different method or processing paths 1110, 1120, and 1130. In a first step, not shown in the flow chart of FIG. 3, 3D data is obtained. The 3D data includes range data R, angle data A, and doppler data D, thus representing an RAD cube.

    [0056] In a second step, also not shown in the flow chart of FIG. 3, the 3D data is decomposed into three sets of 2D data: a first set of 2D data including range data and angle data; a second set of 2D data including range data and doppler data; and a third set of 2D data including angle data and doppler data. Those three sets of 2D data are represented by the three processing paths 1110, 1120, and 1130. In particular, the input to the three paths is denoted by the number of radar sensors S, the number of range data bins R, the number of angle data bins A, and the number of doppler data bins D, i.e., ℝ^(S×R×A×D).

    [0057] The three sets of 2D data are then processed individually along the three paths 1110, 1120, and 1130 through a deep-learning algorithm to obtain processed 2D data from the processed 3D data. The processed 2D data may include range data and angle data.

    [0058] Therein, the first path 1110 is directed to the processing of the first set of 2D data, i.e., the RA data; the second path 1120 is directed to the processing of the second set of 2D data, i.e., the RD data; and the third path 1130 is directed to the processing of the third set of 2D data, i.e., the AD data.

    [0059] In another step not shown in the method 1100 in FIG. 3, this input ℝ^(S×R×A×D) is transposed so that the doppler dimension is defined as the feature dimension in the RA path 1110, the angle dimension is defined as the feature dimension in the RD path 1120, and the range dimension is defined as the feature dimension in the AD path 1130. The first layers of each path are thus utilized to separately map the energy from the initial RAD cube to a latent feature representation in RA, RD, and AD.
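The decomposition and transposition described above can be sketched in a few lines of numpy. The bin counts are hypothetical; each transpose simply moves the respective third dimension of the RAD cube into the feature (channel) axis of one path:

```python
import numpy as np

# Hypothetical bin counts for sensors (S), range (R), angle (A), doppler (D).
S, R, A, D = 2, 16, 8, 32
cube = np.random.default_rng(0).standard_normal((S, R, A, D))

# RA path: the doppler dimension becomes the feature dimension -> (S*D, R, A)
ra = cube.transpose(0, 3, 1, 2).reshape(S * D, R, A)
# RD path: the angle dimension becomes the feature dimension -> (S*A, R, D)
rd = cube.transpose(0, 2, 1, 3).reshape(S * A, R, D)
# AD path: the range dimension becomes the feature dimension -> (S*R, A, D)
ad = cube.reshape(S * R, A, D)
```

Each of the three tensors then feeds its own stack of 2D convolutions, as described for the paths 1110, 1120, and 1130 below.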

    [0060] Going along the first path 1110, the RA data are in a first step 1111 initially processed with a compression algorithm, in particular, a RA doppler compression algorithm.

    [0061] In the further steps 1112, 1113, and 1114 along the first path 1110, the RA data are processed with three convolutions sequentially.

    [0062] Similarly, in the steps 1121, 1122, and 1123 along the second path 1120, the RD data are processed with three convolutions sequentially. However, in contrast to the first path, the convolution algorithms in the steps 1122 and 1123 are performed with an additional stride of, for example, 4, to compress D while maintaining the spatial resolution in R.

    [0063] Similarly, in the steps 1131, 1132, and 1133 along the third path 1130, the AD data are processed with three convolutions sequentially. In alignment with the second path 1120, the convolution algorithms in the steps 1132 and 1133 are performed with an additional stride of, for example, 4, to compress D while maintaining the spatial resolution in A.

    [0064] The convolutions along the paths 1110, 1120, and 1130 are applied to extract spatially local correlations in each of the three paths 1110, 1120, and 1130 in parallel. All of the convolution algorithms may be processed with or without mirrored padding, which reflects features of each plane at its edges.
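A minimal numpy sketch of such a 2D convolution with mirrored (reflect) padding, with an optional stride along one axis as used in steps 1122/1123 and 1132/1133 to compress the doppler dimension. The kernel and the toy plane sizes are hypothetical:

```python
import numpy as np

def conv2d_reflect(x, kernel, stride=(1, 1)):
    """'Same'-style 2D correlation with mirrored (reflect) padding, which
    reflects features of the plane at its edges, plus an optional stride."""
    kh, kw = kernel.shape
    sh, sw = stride
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.empty(((x.shape[0] + sh - 1) // sh, (x.shape[1] + sw - 1) // sw))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(xp[i * sh:i * sh + kh, j * sw:j * sw + kw] * kernel)
    return out

rd_plane = np.arange(32.0).reshape(4, 8)   # toy range x doppler plane
# Stride 4 on the doppler axis compresses D while keeping the resolution in R.
out = conv2d_reflect(rd_plane, np.ones((3, 3)) / 9.0, stride=(1, 4))
```

A real implementation would use a learned kernel per feature channel; the loop form is only meant to make the padding and stride semantics explicit.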

    [0065] In the first path 1110, as a further step 1115, a dropout algorithm is processed. In particular, by randomly setting all values in RA to 0 with a probability of p during training, the network is forced to rely on returns from RD and AD paths, which increases the overall robustness of the algorithm.
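This whole-plane dropout can be sketched as follows (a numpy sketch; the probability p, the shapes, and the function name are hypothetical):

```python
import numpy as np

def plane_dropout(ra, p, rng, training=True):
    """With probability p, zero the entire RA tensor during training so the
    network must rely on the RD and AD paths; a no-op at inference time."""
    if training and rng.random() < p:
        return np.zeros_like(ra)
    return ra

rng = np.random.default_rng(0)
ra = np.ones((8, 16, 8))                 # (features, range, angle), hypothetical
dropped = plane_dropout(ra, p=1.0, rng=rng)                   # always dropped
kept = plane_dropout(ra, p=1.0, rng=rng, training=False)      # inference: no-op
```

Unlike element-wise dropout, the whole plane is zeroed at once, which is what forces the other two paths to carry the signal.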

    [0066] In a further step 1141 in parallel to the second path 1120 and the third path 1130, a position encoding algorithm is processed and appended to both paths individually. This is performed by linear interpolation between 0 and 1 along all remaining doppler bins and concatenation of this value in the feature dimension for RD and AD.
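A sketch of this position encoding, assuming tensors laid out as (features, spatial, doppler); the layout and sizes are hypothetical:

```python
import numpy as np

def append_doppler_encoding(x):
    """Concatenate a 0..1 linear ramp along the doppler axis to the feature
    dimension; x has shape (features, spatial, doppler)."""
    f, s, d = x.shape
    ramp = np.linspace(0.0, 1.0, d)              # one value per doppler bin
    enc = np.broadcast_to(ramp, (1, s, d))
    return np.concatenate([x, enc], axis=0)

rd = np.zeros((4, 16, 8))                        # (features, range, doppler)
out = append_doppler_encoding(rd)
```

After this step each doppler bin carries an explicit marker of its position, which is what later preserves radial-velocity information through the compression.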

    [0067] In a further step 1151, the three paths 1110, 1120, and 1130 are aligned in the range-angle plane, i.e., in RA. This is performed by calculating the mean of the features along the doppler entries of the same spatial dimension, range and angle, in RD and AD, respectively, resulting in a tensor R^RD of shape range×(number of features) and a tensor A^AD of shape angle×(number of features). In particular, to align or concatenate all three paths 1110, 1120, and 1130 in the range-angle plane, i.e., in RA, R^RD and A^AD are repeated along the missing dimensions angle and range, respectively, resulting in RA^RD and RA^AD.
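The mean-and-repeat alignment can be sketched as follows; the layout (spatial, doppler, features) and all bin counts are hypothetical:

```python
import numpy as np

R_bins, A_bins, D_bins, F = 16, 8, 4, 6          # hypothetical bin counts

rd = np.random.default_rng(1).standard_normal((R_bins, D_bins, F))
ad = np.random.default_rng(2).standard_normal((A_bins, D_bins, F))

# Mean over the doppler entries -> range x features and angle x features
r_rd = rd.mean(axis=1)                           # R^RD, shape (R_bins, F)
a_ad = ad.mean(axis=1)                           # A^AD, shape (A_bins, F)

# Repeat along the missing spatial dimension to align in the RA plane
ra_rd = np.broadcast_to(r_rd[:, None, :], (R_bins, A_bins, F))   # RA^RD
ra_ad = np.broadcast_to(a_ad[None, :, :], (R_bins, A_bins, F))   # RA^AD
```

The mean gives every doppler bin equal weight; the cross-attention variant described further below replaces it with a learned, query-dependent weighting.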

    [0068] In a last step 1152, a further convolution algorithm is processed to extract patterns within the aligned tensors RA, RA^RD, and RA^AD.

    [0069] Optionally, and in particular alternatively in step 1151, it is possible to map from RD to RA and AD to RA by attending each doppler bin in AD and RD from RA of the same spatial dimension angle and range so that a dynamic weighting of the entries along doppler dimensions can be performed.

    [0070] For this purpose, the number of output features of step 1114 will be increased by len_query, and the number of output features of steps 1123 and 1133 will both be increased by len_key. The initial number of output features is here defined as len_values. The input to this alternative is therefore composed as follows:

    TABLE-US-00001
         first path       second path      third path
    V    len_values^RA    len_values^RD    len_values^AD
    Q    len_query^RA     —                —
    K    —                len_key^RD       len_key^AD

    [0071] This is shown in further detail in FIGS. 4 and 5, which show the process of attending to and compressing the values in each doppler bin along the same spatial dimensions for both the RD and AD planes: First, a positional encoding is added along the doppler dimension in both V^RD and V^AD. For this purpose, an array along the doppler dimension is linearly interpolated between 0 and 1 and concatenated to the feature dimensions in V^RD and V^AD.

    [0072] Then, each V, K pair of RD is repeated along the angular dimension of length I, resulting in K^RD* and V^RD*, and each V, K pair of AD is repeated along the range dimension of length J, resulting in K^AD* and V^AD*. In the next step, for every position i, j, the dot product between the query in RA, Q^RA_(i,j), and K^AD*_(i,j), as well as between Q^RA_(i,j) and K^RD*_(i,j), is calculated (see FIGS. 4 and 5). The resulting entries are normalized, potentially by some exponential function

    [00001] X_Norm = A^(x_m) / Σ_(m=0)^M A^(x_m)

    [0073] for A indicating some generic value, e.g., e or 2, to avoid negative values, resulting in ATT^RD and ATT^AD so that Σ_(m=0)^M ATT^RD_(i,j,m) = 1 and Σ_(m=0)^M ATT^AD_(i,j,m) = 1 for M defining the length of the doppler dimension in the second and third path. Finally, the features in V^RD* and V^AD* are element-wise multiplied by ATT^RD and ATT^AD (c), respectively, and summed along the doppler dimension (d) for each position i, j:


    RA^RD_(i,j) = Σ_(m=0)^M V^RD*_(i,j,m) · ATT^RD_(i,j,m) and RA^AD_(i,j) = Σ_(m=0)^M V^AD*_(i,j,m) · ATT^AD_(i,j,m)

    [0074] The resulting tensors RA, RA^AD, and RA^RD are then concatenated in the feature dimension and processed by an alignment convolution.
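The attention of paragraphs [0070] to [0074] can be sketched per position with two einsums; the shapes and the normalization base are hypothetical (base = e recovers the usual softmax):

```python
import numpy as np

def attend_ra_to_plane(q_ra, k_pl, v_pl, base=np.e):
    """Attend from RA queries onto a repeated RD or AD plane.
    q_ra: (I, J, len_query); k_pl: (I, J, M, len_query); v_pl: (I, J, M, F),
    with M doppler bins. Weights are base**score / sum(base**score) so they
    are non-negative and sum to 1 along the doppler dimension."""
    scores = np.einsum("ijq,ijmq->ijm", q_ra, k_pl)        # dot product per i, j
    att = base ** scores
    att = att / att.sum(axis=-1, keepdims=True)            # normalize over M
    out = np.einsum("ijm,ijmf->ijf", att, v_pl)            # weighted doppler sum
    return out, att

I, J, M, Fq, Fv = 4, 5, 6, 3, 7
rng = np.random.default_rng(0)
out, att = attend_ra_to_plane(rng.standard_normal((I, J, Fq)),
                              rng.standard_normal((I, J, M, Fq)),
                              rng.standard_normal((I, J, M, Fv)))
```

Compared with the mean of step 1151, each doppler bin's contribution is now weighted by how well its key matches the RA query at the same spatial position.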

    [0075] In particular, through the embodiment as shown in FIG. 3, a learned 2D Convolution is applied to map energy in the RAD cube to latent representations in each of RA, RD, AD planes.

    [0076] In contrast to previously known approaches, which utilize 3D convolutions that are costly in terms of processing time for processing the three planes, the present embodiment effectively processes the RA, RD, and AD planes by solely utilizing 2D convolutions and is, therefore, more suitable for application on embedded systems.

    [0077] Further, by randomly setting all values in RA to 0 during training with probability p, the network is forced to rely on returns from RD and AD paths. This dropout is not applied during inference.

    [0078] Additionally, by attending on RD and AD by queries calculated from RA, as described in line with FIGS. 4 and 5, the doppler dimension in RD and AD can be dynamically compressed. While entries along the doppler dimension for a given spatial position range and angle in RD and AD are initially compressed by an equal weighting of each cell in the Doppler dimension (by calculating the mean), this weighting can be conducted by calculating queries in RA and keys in RD and AD. Features in RD and AD are then mapped to RA depending on how well the keys in RD and AD match the query in RA (defined by calculating the dot product between queries and keys followed by a normalization along the doppler dimension for a given entry in range-angle).

    [0079] Furthermore, by appending a positional encoding along the doppler dimension before attention followed by compression, the radial velocity information in the resulting maps RA^AD and RA^RD is maintained. As a result, the algorithm is able to dynamically attend from RA to RD and AD.

    [0080] FIG. 6 depicts a more detailed flow chart of another embodiment of a method 1200 according to the present disclosure.

    [0081] Therein, in a first step 1201, 3D data is obtained. Therein, obtaining the 3D data comprises obtaining range data, antenna data, and doppler data. The 3D data thus comprises range data R, antenna data a, and doppler data D, thus representing an RaD cube.

    [0082] In a next step 1202, the RaD cube is processed with either a Fourier transformation algorithm, in particular a discrete Fourier transformation, further in particular a small discrete Fourier transformation, or a dense layer algorithm. In the case of a Fourier transformation, the RaD cube is transformed into an RAD cube with complex frequencies instead of antennas. In the case of a dense layer algorithm, the result is an abstract version of the RaD cube. The processing of these algorithms either does not increase the number of output bins or only slightly increases it.

    [0083] The data is then processed with an Abs, or absolute value, algorithm. By processing the Abs algorithm, the complex values are reduced to their magnitudes, and thus the capacity is reduced by a factor of 2. This results in obtaining the RAD cube in step 1203, wherein the 3D data comprises range data R, angle data A, and doppler data D.
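Steps 1202 and 1203 can be sketched on a hypothetical complex RaD cube, using a DFT over the antenna axis followed by the Abs step:

```python
import numpy as np

R_bins, n_ant, D_bins = 16, 4, 32                # hypothetical RaD cube sizes
rng_re = np.random.default_rng(0)
rng_im = np.random.default_rng(1)
rad = (rng_re.standard_normal((R_bins, n_ant, D_bins))
       + 1j * rng_im.standard_normal((R_bins, n_ant, D_bins)))

# Small DFT over the antenna axis turns antennas into (complex) angle bins...
raD = np.fft.fft(rad, axis=1)
# ...and the Abs step keeps only the magnitudes, halving the capacity
# (one real value instead of a real/imaginary pair per bin).
rad_cube = np.abs(raD)
```

Note how the number of bins is unchanged by the DFT, consistent with the requirement above that the output bin count not grow (or only slightly).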

    [0084] In a further step 1204, the RAD cube is processed with a convolution algorithm. In particular, multiple convolutions are applied to the small RAD cube, which starts to reduce the doppler dimension in the same magnitude as the feature dimension increases. This reduction may for example be achieved by strided convolutions or max pooling. As the angle dimension is small, the size of the present RAD cube is comparable to the size of an image with a small feature dimension. This results in manageable complexities of the 3D convolutions.

    [0085] In a further step 1205, a further convolution algorithm is processed. In this particular case an AD convolution is performed such that the information is transformed from the doppler domain to the angle domain. For this purpose, 2D convolutions are applied on angle and doppler domain to identify correlations. After each convolution, the angle resolution is increased by processing an upsampling algorithm. This convolution reduces the doppler dimension in the same way as the angles are increased. Instead of processing convolutions together with upsampling, it is also possible to use transposed convolutions with strides on the doppler dimension.
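One such refinement step can be sketched as follows; nearest-neighbour upsampling along the angle axis and mean pooling along the doppler axis stand in for the learned upsampling and strided/transposed convolutions of step 1205 (all sizes hypothetical):

```python
import numpy as np

def refine_angle_step(ad):
    """One AD refinement step on an (angle, doppler) plane: double the angle
    resolution by nearest-neighbour upsampling and halve the doppler
    dimension by pairwise mean pooling."""
    a, d = ad.shape
    up = np.repeat(ad, 2, axis=0)                     # angle: a -> 2a
    return up.reshape(2 * a, d // 2, 2).mean(axis=2)  # doppler: d -> d/2

ad = np.arange(32.0).reshape(4, 8)                    # toy angle x doppler plane
out = refine_angle_step(ad)
```

Applying the step repeatedly trades doppler resolution for angle resolution, which mirrors the description of continuously refining the angles while compressing the doppler.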

    [0086] In any case, this step 1205 continuously refines the angles using the doppler information and further compresses the doppler at the same time. After processing this step 1205, the doppler dimension and the feature dimension are reshaped to a single dimension.

    [0087] In a further step 1206, a further convolution algorithm is processed. In this particular case, an RA convolution is performed, which results in a refinement in the range-angle domain. As these two are the spatial domains, these convolutions are processed to fulfill a spatial refinement on both spatial domains together. As the doppler domain has been previously merged into the feature dimension, these convolutions are convolutions in 2D polar space.

    [0088] Optionally, and depending on the original shape of the RAD cube and the desired final angular resolution, further upsampling algorithms and/or transposed convolutions can be processed on the angle dimension. The result is processed 2D data, in particular an RA grid with several features, as output in step 1207.

    [0089] Through this particular embodiment, there is neither created a bottleneck in processing, i.e., a layer with fewer real entries than the input or the output, nor a layer with a higher number of entries than the input and output. Therefore, this embodiment provides a solution that does not lose information and at the same time does not require an increase in capacity.

    [0090] FIG. 7 depicts another more detailed flow chart of another embodiment of a method according to the present disclosure. In particular, the method as shown in FIG. 7 is based on the method as shown in FIG. 6 wherein the method steps 1301 to 1307 in FIG. 7 are the same as method steps 1201 to 1207 as described in line with FIG. 6 unless otherwise stated.

    [0091] In particular, according to the embodiment as depicted in FIG. 7, ego-motion of a vehicle is considered in addition to the embodiment according to FIG. 6.

    [0092] Therein, the ego-motion of the vehicle is obtained in step 1308. In a further step 1309, the relative speed of stationary objects per angle bin is calculated based on the ego-motion. This information can then be fed to step 1306, in which the RA convolution is processed.
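The per-angle-bin relative speed of stationary objects follows from simple geometry: a sketch under the assumption of a forward-facing sensor, where a stationary target at azimuth theta appears to approach at -v_ego · cos(theta) (the function name and bin layout are hypothetical):

```python
import numpy as np

def stationary_doppler_per_angle(v_ego, angles_rad):
    """Expected radial (doppler) speed of stationary objects per angle bin
    for a sensor looking along the direction of travel."""
    return -v_ego * np.cos(angles_rad)

angles = np.linspace(-np.pi / 2, np.pi / 2, 9)    # hypothetical angle bins
v_rel = stationary_doppler_per_angle(v_ego=20.0, angles_rad=angles)
```

Straight ahead the full ego-speed appears as closing speed, while at ±90° the radial component vanishes; this per-bin expectation is what can be fed into the RA convolution of step 1306.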

    [0093] In addition, and optionally, the RAD cube from step 1303 can be used to extract ego-motion dependent features, like, for example, extracting the bin representing zero absolute speed for stationary targets. This can further help to identify stationary targets as well as providing a more accurate relative speed propagation.

    [0094] The embodiments as depicted in FIGS. 6 and 7 achieve no information loss, i.e., as much of the information provided by the sensor as possible is maintained; no resource wasting, i.e., higher resource consumption is avoided; an application of the doppler to refine the angle; and the use of operations which can be accelerated on chips, thus improving embeddability.

    [0095] In particular, and in contrast to the present embodiment as depicted in FIGS. 6 and 7, other solutions tend to start with an angle finding, which increases the angular resolution as a first step and results in an RAD cube that is much bigger than the presently used RaD cube, as most radar sensors usually have only a few (virtual) antennas. These big RAD cubes require a shallow (i.e., only very few layers, mostly just one), aggressive, and therefore potentially lossy compression afterwards. But the angle finding cannot create additional information; all the information was already there in the RaD cube.

    [0096] Further, the present solution as depicted in FIGS. 6 and 7 does not increase the cube in that manner. Instead, the idea is to keep the capacity (the number of neurons) of each layer within some range and smoothly transform it to the desired output capacity, i.e., the capacity of the final range-angle grid. Operating on this smaller cube removes the need for aggressive compression and offers the freedom to operate on two or three dimensions of the cube at the same time. It is therefore possible to define a smooth transformation from a cube with a high doppler and low angle resolution to a 2D range-angle grid with high angle resolution and just a few features.

    [0097] In particular, as the RAD cube is small, it is possible to perform convolutions along multiple dimensions, even on all three dimensions together, with an additional feature dimension, and the network therefore has the possibility to transform information from one dimension to another. It is important to note that this approach does not rely on a CFAR (constant false alarm rate) thresholded cube. It can operate on a fully filled cube where no data was removed by any kind of thresholding. Because it has a smooth change in capacity, no aggressive compression, and no bottlenecks, it has the architecture to keep all the information provided by the sensor.

    [0098] Further, the network has modules to operate on several dimensions at the same time, especially, it can operate on all three dimensions at the same time (RAD convolutions) and has a processing dedicated for the information transfer between the angle and the Doppler dimension (AD convolutions). This enables the network to use the doppler to refine the angle.

    [0099] Lastly, the present embodiment as depicted in FIGS. 6 and 7 consists of operations which can be accelerated by chips optimized for matrix multiplications and convolutions (dense layers, 2D convolutions, upsamples). Even the 3D convolutions used for the RAD convolutions and AD convolutions can be rewritten as equivalent 2D group convolutions if the stride in doppler is identical to the kernel size in doppler. This makes the architecture runnable on hardware that only supports 2D convolutions and not 3D convolutions.
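That equivalence can be checked numerically with naive numpy convolutions; the sizes are hypothetical, and the reshape folds each non-overlapping doppler block into the channel dimension of a shared 2D kernel:

```python
import numpy as np

def conv3d_strided_depth(x, w, sd):
    """Naive 3D correlation over (channels, doppler, H, W), valid padding,
    stride sd along the doppler axis and stride 1 spatially."""
    c_in, d, h, width = x.shape
    c_out, _, kd, kh, kw = w.shape
    do, ho, wo = (d - kd) // sd + 1, h - kh + 1, width - kw + 1
    y = np.zeros((c_out, do, ho, wo))
    for o in range(c_out):
        for dd in range(do):
            for i in range(ho):
                for j in range(wo):
                    patch = x[:, dd * sd:dd * sd + kd, i:i + kh, j:j + kw]
                    y[o, dd, i, j] = np.sum(patch * w[o])
    return y

def conv2d(x, w):
    """Naive 2D correlation over (channels, H, W), valid padding."""
    c_in, h, width = x.shape
    c_out, _, kh, kw = w.shape
    ho, wo = h - kh + 1, width - kw + 1
    y = np.zeros((c_out, ho, wo))
    for o in range(c_out):
        for i in range(ho):
            for j in range(wo):
                y[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[o])
    return y

rng = np.random.default_rng(0)
c_in, d, h, width = 2, 8, 5, 5
c_out, kd, kh, kw = 3, 2, 3, 3       # doppler stride == doppler kernel size

x = rng.standard_normal((c_in, d, h, width))
w = rng.standard_normal((c_out, c_in, kd, kh, kw))

y3 = conv3d_strided_depth(x, w, sd=kd)

# Fold each non-overlapping doppler block into the channel dimension and run
# one shared 2D convolution per block -- the claimed 2D rewriting.
x2 = x.reshape(c_in, d // kd, kd, h, width).transpose(1, 0, 2, 3, 4)
x2 = x2.reshape(d // kd, c_in * kd, h, width)
w2 = w.reshape(c_out, c_in * kd, kh, kw)
y2 = np.stack([conv2d(xb, w2) for xb in x2], axis=1)
```

Because the doppler stride equals the doppler kernel size, the depth windows do not overlap, so each output doppler slice sees exactly one block of input slices; this is why the 2D rewriting is exact.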

    [0100] List of Reference Characters for the Elements in the Drawings

    [0101] The following is a list of the certain items in the drawings, in numerical order. Items not listed in the list may nonetheless be part of a given embodiment. For better legibility of the text, a given reference character may be recited near some, but not all, recitations of the referenced item in the text. The same reference number may be used with reference to different examples or different instances of a given item. [0102] 10 computer system [0103] 12 radar device [0104] 14 processing device [0105] 100 method [0106] 110 method step [0107] 120 method step [0108] 130 method step [0109] 1100 method [0110] 1110 first method path [0111] 1111 method step [0112] 1112 method step [0113] 1113 method step [0114] 1114 method step [0115] 1115 method step [0116] 1120 second method path [0117] 1121 method step [0118] 1122 method step [0119] 1123 method step [0120] 1130 third method path [0121] 1131 method step [0122] 1132 method step [0123] 1133 method step [0124] 1141 method step [0125] 1151 method step [0126] 1152 method step [0127] 1200 method [0128] 1201 method step [0129] 1202 method step [0130] 1203 method step [0131] 1204 method step [0132] 1205 method step [0133] 1206 method step [0134] 1207 method step [0135] 1300 method [0136] 1301 method step [0137] 1302 method step [0138] 1303 method step [0139] 1304 method step [0140] 1305 method step [0141] 1306 method step [0142] 1307 method step [0143] 1308 method step [0144] 1309 method step