Method For Determining Spatial-Temporal Patterns Related To The Environment Of A Vehicle
20250005890 · 2025-01-02
Assignee
Inventors
Cpc classification
G06V10/469
PHYSICS
International classification
G06V10/46
PHYSICS
Abstract
A method is provided for determining patterns related to an environment of a host vehicle. Characteristics are detected by a perception system of the host vehicle within the environment of the host vehicle. At least two processing levels having different scales are defined. For each processing level, respective current input data associated with the characteristics for a current point in time is combined with respective memory data related to the characteristics for previous points in time in order to generate joint spatial-temporal data for the respective processing level. An attention algorithm is applied to the joint spatial-temporal data of all processing levels for generating an aggregated data set, and at least one pattern related to the environment is determined from the aggregated data set.
Claims
1. A computer implemented method for determining patterns related to an environment of a host vehicle from sequentially recorded data, the method comprising: determining sets of characteristics detected within the environment of the host vehicle by a perception system of the host vehicle, and via a processing unit of the host vehicle: defining at least two processing levels having different scales for data associated with the respective level, for each processing level, combining a respective set of current input data associated with the set of characteristics for a current point in time and a respective set of memory data related to sets of characteristics for previous points in time in order to generate a set of joint spatial-temporal data for the respective processing level, applying an attention algorithm to the sets of joint spatial-temporal data of all processing levels in order to generate an aggregated data set, and determining at least one pattern related to the environment of the host vehicle from the aggregated data set.
2. The computer implemented method according to claim 1, wherein: the respective set of current input data and the respective set of memory data are associated with respective grid maps having different respective spatial resolutions on each processing level.
3. The computer implemented method according to claim 2, wherein: a first processing level is provided with a highest grid resolution, and subsequent processing levels are provided with a grid resolution being lower than the highest grid resolution.
4. The computer implemented method according to claim 1, wherein: the attention algorithm includes a query vector being independent from the processing levels.
5. The computer implemented method according to claim 4, wherein: the attention algorithm includes respective key vectors and value vectors defined on each respective processing level after combining the respective set of current input data with the respective set of memory data.
6. The computer implemented method according to claim 5, wherein: the key vectors of each processing level are combined with the query vector in order to provide weights for elements of the value vectors.
7. The computer implemented method according to claim 6, wherein: the respective key vector and the respective value vector defined on the respective processing level are up-sampled to a resolution of the query vector if the resolution of the respective processing level is lower than the resolution of the query vector.
8. The computer implemented method according to claim 7, wherein: the up-sampling of the key vector is performed by applying an interpolation to elements of the key vector.
9. The computer implemented method according to claim 1, wherein: the combination of the respective set of current input data and the respective set of memory data is provided by applying a recurrent algorithm on each processing level.
10. The computer implemented method according to claim 1, wherein: the at least one pattern related to the environment of the host vehicle is associated with at least one object being detected in the environment of the host vehicle.
11. The computer implemented method according to claim 10, wherein: the at least one pattern associated with the at least one object is employed for tracking the object.
12. A computer system configured to: receive respective sets of characteristics within the environment of a host vehicle, the characteristics being detected by a perception system of the host vehicle for a current point in time and for a predefined number of points in time before the current point in time, and carry out the computer implemented method of claim 1.
13. A vehicle including the perception system and the computer system of claim 12.
14. The vehicle according to claim 13, further including a control system configured to receive information derived from the at least one pattern provided by the computer system and to apply the information for controlling the vehicle.
15. A non-transitory computer readable medium comprising instructions for carrying out the computer implemented method of claim 1.
Description
DRAWINGS
[0037] The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
[0038] Exemplary embodiments and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically:
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
[0047] Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.
DETAILED DESCRIPTION
[0048] Example embodiments will now be described more fully with reference to the accompanying drawings.
[0049]
[0050] The perception system 110 may include a radar system, a LiDAR system and/or one or more cameras in order to monitor the external environment or surroundings of the vehicle 100. Therefore, the perception system 110 is configured to monitor a dynamic context 125 of the vehicle 100 which includes a plurality of objects 130 which are able to move in the external environment of the vehicle 100. The objects 130 may include other vehicles 140 and/or pedestrians 150, for example.
[0051] The perception system 110 is also configured to monitor a static context 160 of the vehicle 100. The static context 160 may include static objects 130 like traffic signs 170 and lane markings 180, for example.
[0052] The perception system 110 is configured to determine characteristics of the objects 130. The characteristics include a current position, a current velocity and an object class of each road user 130 for a plurality of points in time. The current position and the current velocity are determined by the perception system 110 with respect to the vehicle 100, i.e. with respect to a coordinate system having its origin e.g. at the center of mass of the vehicle 100, its x-axis along a longitudinal direction of the vehicle 100 and its y-axis along a lateral direction of the vehicle 100. Moreover, the perception system 110 determines the characteristics of the road users 130 for a predetermined number of previous points in time and for a current point in time, e.g. every 0.5 s.
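As an illustration only, the characteristics described above could be held in a small rolling buffer that keeps the current frame plus a fixed number of previous frames. The names `ObjectState` and `CharacteristicsHistory` and the field layout are hypothetical and do not appear in the disclosure; the 0.5 s period is taken from the example above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectState:
    """Characteristics of one object in the vehicle coordinate system (hypothetical layout)."""
    x: float            # longitudinal position in m (x-axis along the vehicle)
    y: float            # lateral position in m (y-axis across the vehicle)
    vx: float           # longitudinal velocity in m/s
    vy: float           # lateral velocity in m/s
    object_class: str   # e.g. "vehicle" or "pedestrian"


class CharacteristicsHistory:
    """Keeps the current frame and a predefined number of previous frames."""

    def __init__(self, num_previous: int, period_s: float = 0.5):
        self.period_s = period_s
        # The oldest frame is dropped automatically once the buffer is full.
        self._frames = deque(maxlen=num_previous + 1)

    def push(self, states):
        """Store the set of characteristics for a new point in time."""
        self._frames.append(tuple(states))

    @property
    def current(self):
        return self._frames[-1]

    @property
    def previous(self):
        return tuple(self._frames)[:-1]
```

With `num_previous=2`, pushing a fourth frame silently discards the first one, so `previous` always holds at most two frames.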
[0053] The computer system 120 transfers information derived from the result or output 250 (see
[0054]
[0055] The characteristics for the current and the previous points in time are transferred to the processing unit 121 which generates a set of current input data 210 associated with the sets of characteristics for the current point in time, and a primary set of memory data $H_{t-1}$ by aggregating the sets of characteristics over the predefined number of previous points in time.
[0056] The set of current input data 210 and the primary set of memory data $H_{t-1}$ are associated with respective grid maps defined for the receptive field within the environment of the host vehicle 100. That is, the respective dynamic and static contexts 125, 160 (see
[0057] Accordingly, the set of current input data 210 is associated with a grid map including L×T pixels, wherein L denotes the number of pixels in a first or longitudinal direction and T denotes the number of pixels in a second or transversal direction. For example, the region of interest may cover an area of 280 m × 160 m in front of the vehicle 100 and may be rasterized into a 280×160 pixel image, wherein each pixel represents a square area of 1 m × 1 m.
[0058] For each pixel or cell of the respective grid map or image, a respective channel is associated with one of the characteristics or features of the object 130. Hence, the empty multi-channel image mentioned above and representing the rasterized region of interest close to the vehicle 100 is filled by the characteristics of the objects 130 which are associated with the respective channel of the pixel or grid cell.
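The rasterization described in the two preceding paragraphs may be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `rasterize`, the three-channel layout (occupancy, longitudinal velocity, lateral velocity) and the assumption that the lateral axis is centered on the vehicle are all hypothetical.

```python
def rasterize(objects, length_m=280.0, width_m=160.0, cell_m=1.0):
    """Fill an empty multi-channel grid map with object characteristics.

    objects: iterable of (x, y, vx, vy) tuples in the vehicle coordinate system.
    Returns a grid indexed as grid[l][t][channel] with L x T cells.
    """
    num_l = int(length_m / cell_m)   # L: cells in the longitudinal direction
    num_t = int(width_m / cell_m)    # T: cells in the transversal direction
    # Channels (assumed): [occupied, vx, vy]; empty cells stay at zero.
    grid = [[[0.0, 0.0, 0.0] for _ in range(num_t)] for _ in range(num_l)]
    for x, y, vx, vy in objects:
        l = int(x // cell_m)
        # Assumption: the region is laterally centered on the vehicle.
        t = int((y + width_m / 2.0) // cell_m)
        if 0 <= l < num_l and 0 <= t < num_t:
            grid[l][t] = [1.0, vx, vy]
    return grid
```

Objects outside the region of interest are skipped, so the grid always keeps its L×T shape.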
[0059] For processing the current input data 210, processing levels PL0, PL1, PL2 are defined, each of which has a different scale or resolution for the data associated with respective levels. In the example as shown in
[0060] The first processing level PL0 has a scaling or resolution of L×T, i.e. a resolution which is identical to the resolution of the grid map with which the original current input data 210 are associated. That is, the first processing level PL0 receives a set of current input data 212 which is associated with the set of characteristics for the current point in time and associated with a grid map having a scaling or spatial resolution of L×T like the original current input data 210.
[0061] The second processing level PL1 has a scaling or spatial resolution of L/2 × T/2, i.e. half of the resolution of the first processing level PL0. On the second processing level PL1, a set of current input data 214 is generated which is associated with a grid map of this processing level PL1 having the scaling or resolution of L/2 × T/2. Similarly, the third processing level PL2 has a further reduced scale or resolution of L/4 × T/4, i.e. a quarter of the resolution L×T of the grid map associated with the original current input data 210.
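One possible way to derive the inputs for the coarser processing levels is repeated 2×2 max pooling. The reference numeral list mentions maxpool layers 320, but the kernel size, the stride and the function names below are assumptions made for this sketch, which operates on a single-channel grid for brevity.

```python
def max_pool_2x2(grid):
    """Halve the resolution of a single-channel grid by 2x2 max pooling (stride 2)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(grid, num_levels=3):
    """Return the grids for PL0, PL1, PL2 with resolutions L x T, L/2 x T/2, L/4 x T/4."""
    levels = [grid]
    for _ in range(num_levels - 1):
        levels.append(max_pool_2x2(levels[-1]))
    return levels
```

Max pooling keeps isolated non-zero cells visible on the coarser levels, which matters for sparse inputs; average pooling would dilute them.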
[0062] Each processing level PL0, PL1, PL2 includes a respective recurrent unit 232, 234, 236 which combines the respective set of current input data 212, 214, 216 for each processing level PL0, PL1, PL2 with a respective set of memory data 222, 224, 226 which are respectively related to the sets of characteristics for the previous points in time. The respective set of memory data 222, 224, 226 is also associated with the respective grid map having the same resolution as provided for the respective grid map associated with the respective set of current input data 212, 214, 216 on each processing level PL0, PL1, PL2.
[0063] The respective recurrent units 232, 234, 236 generate, as a respective output on each processing level PL0, PL1, PL2, a set of joint spatial-temporal data based on the combination of the respective set of current input data 212, 214, 216 with the respective set of memory data 222, 224, 226. The respective joint spatial-temporal data of each processing level PL0, PL1, PL2 are used to provide a respective input for an attention algorithm 240.
[0064] In detail, a respective key vector $K_0$, $K_1$, $K_2$ and a respective value vector $V_0$, $V_1$, $V_2$ are calculated on each processing level PL0, PL1, PL2 based on the output of the respective recurrent unit 232, 234, 236 and provided as an input for the attention algorithm 240. In addition, a query vector Q is generated based on the original set of current input data 210 and also provided as an input for the attention algorithm 240. The query vector Q is independent from the processing levels PL0, PL1, PL2.
[0065] The attention algorithm 240 performs a matching of the query vector Q with the key vectors $K_0$, $K_1$, $K_2$ of the respective processing level PL0, PL1, PL2 in order to provide weights for the respective value vectors $V_0$, $V_1$, $V_2$ when aggregating the data provided on the different processing levels PL0, PL1, PL2, as will be described in detail below. Based on this aggregation, the attention algorithm 240 provides an output 250 of the method according to the disclosure, i.e. at least one pattern related to the environment of the host vehicle 100 (see
[0066] The output 250, i.e. the at least one pattern related to the environment of the host vehicle 100, is provided as an abstract feature map which is stored in a grid map. This grid map is generated in a similar manner as described above for the grid map generated for associating the set of current input data 210.
[0067] The respective grid maps for the output 250 and for the set of current input data 210 are represented by a two-dimensional grid in bird's eye view with respect to the host vehicle 100. However, other representations of the grid maps may be realized alternatively. The grid maps include a predefined number of cells. For the output 250, a predefined number of features is assigned to each cell in order to generate the feature map.
[0068] The output 250, i.e. the feature map described above, is transferred to a task module 260 which applies further tasks to the feature map including the at least one pattern related to the environment of the host vehicle 100. These tasks include tasks related to an object detection and/or to a segmentation of the environment of the host vehicle 100, i.e. to a segmentation of the grid map associated with the output 250.
[0069] The object detection provides different kinds of information regarding the dynamics of a respective object 130, e.g. regarding its position, its velocity and/or regarding a bounding box surrounding the respective object 130. That is, the objects 130 themselves, i.e. their positions, and their dynamic properties are detected and/or tracked by applying a respective task of the task module 260 to the feature map including the pattern. Moreover, the grid segmentation is applied e.g. in order to detect a free space in the environment of the host vehicle 100.
[0070] The results of the task module 260 as described above are provided to the control system 124 in order to use these results, e.g. the properties of the objects 130 and/or the free space, as information for controlling the host vehicle 100.
[0071] The recurrent units 232, 234, 236 and the attention algorithm 240 are implemented as respective machine learning algorithms, e.g. as a respective neural network for which suitable training procedures are defined. The task module 260 including the further tasks is also implemented as one or more machine learning algorithms, e.g. comprising a respective decoding procedure, being associated with the respective task. The task module 260 includes a respective head for each required task.
[0072] When training the machine learning algorithms or neural networks, the output 250 and a ground truth are provided to a loss function for optimizing the neural network. The ground truth is generated for a known environment of the host vehicle 100 for which sensor data provided by the perception system is preprocessed in order to generate data associated with a grid map e.g. in bird's eye view. This data is processed by the method, and the respective result of the different heads of the task module 260, i.e. regarding object detection and/or segmentation for e.g. determining a free space, is related to the known environment of the host vehicle 100. The loss function acquires the error of a model, i.e. the model on which the machine learning algorithm or neural network relies, with respect to the ground truth. Specific weights of the machine learning algorithms or neural networks are updated accordingly for minimizing the loss function, i.e. the error of the model.
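The disclosure does not name a specific loss function. Purely as an example of how the error of a segmentation head might be measured against the ground truth, the following sketch computes a mean binary cross-entropy over the cells of a free-space grid; the choice of this loss and the function name `bce_loss` are assumptions.

```python
import math


def bce_loss(predicted, ground_truth, eps=1e-7):
    """Mean binary cross-entropy between predicted free-space probabilities and labels.

    predicted:    grid of probabilities in [0, 1], one per cell
    ground_truth: grid of labels (1.0 = free, 0.0 = occupied)
    """
    total, count = 0.0, 0
    for pred_row, truth_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, truth_row):
            p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count
```

A prediction close to the ground truth yields a small loss, and the weights of the networks would be updated in the direction that reduces this value.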
[0073]
[0074] Before the query vector 310 is generated based on the original set of current input data 210, a layer norm 312 is applied to these original input data 210 to obtain a uniform distribution of values across training samples, i.e. when a training procedure of the entire neural network is performed. Via the layer norm 312, the current input data or feature map 210 is scaled by a variance across entries calculated within a layer.
[0075] If the layer norm 312 is applied to a sparse feature map or set of current input data 210 as provided e.g. by radar sensors, large peaks, i.e. values in a range much greater than e.g. 1, may be obtained due to a significant imbalance in the sparse feature map or input data between entries containing relevant values and empty background. For example, a few entries in the sparse feature map may have values >0 while a broad majority of entries may have values equal to 0. For the sparse feature maps provided by a radar sensor, for example, the imbalance between the entries of the feature map or input data may be reduced by scaling the feature map or input data $I_t$ subjected to the layer norm 312 by a factor of 1/20.
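The effect described above can be reproduced with a few lines. The sketch below uses the common form of layer normalization (subtracting the mean and dividing by the standard deviation, without learned gain or bias), which is an assumption about the layer norm 312; the factor 1/20 is taken from the text.

```python
import math


def layer_norm(values, eps=1e-5):
    """Normalize a flat list of entries to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Sparse feature map: one relevant entry, 99 entries of empty background.
sparse = [0.0] * 99 + [1.0]
normed = layer_norm(sparse)
scaled = [v / 20.0 for v in normed]   # factor 1/20 as mentioned for radar data
```

For this input the single non-zero entry is normalized to a value of roughly 10, far outside the range around 1, and the additional scaling brings it back to roughly 0.5.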
[0076] In the example as shown in
[0077] The respective recurrent units 232, 234, 236 for each processing level PL0, PL1, PL2 are each realized as a convolutional gated recurrent unit (ConvGRU) which combines the respective set of current input data 212, 214, 216 with the respective set of memory data 222, 224, 226.
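For orientation, a single ConvGRU update step may be sketched as below. The disclosure does not give the gate equations, so the sketch follows the commonly used GRU formulation with convolutions in place of matrix products; the single-channel grids, the 3×3 kernels and the parameter names `Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh` are assumptions, and bias terms are omitted.

```python
import math


def conv3x3(grid, kernel):
    """Zero-padded 3x3 convolution (cross-correlation) of a single-channel grid."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[r][c] += kernel[dr + 1][dc + 1] * grid[rr][cc]
    return out


def combine(a, b, fn):
    """Apply fn element-wise to two grids of equal shape."""
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv_gru_step(x, h_prev, p):
    """Combine current input x with memory h_prev into joint spatial-temporal data."""
    # Update gate z and reset gate r, both computed from input and memory.
    z = combine(conv3x3(x, p["Wz"]), conv3x3(h_prev, p["Uz"]), lambda a, b: sigmoid(a + b))
    r = combine(conv3x3(x, p["Wr"]), conv3x3(h_prev, p["Ur"]), lambda a, b: sigmoid(a + b))
    # Candidate state uses the memory after it has been scaled by the reset gate.
    gated = combine(r, h_prev, lambda a, b: a * b)
    cand = combine(conv3x3(x, p["Wh"]), conv3x3(gated, p["Uh"]), lambda a, b: math.tanh(a + b))
    # New state: per-cell blend of old memory and candidate, controlled by z.
    keep = combine(z, h_prev, lambda zz, h: (1.0 - zz) * h)
    new = combine(z, cand, lambda zz, hc: zz * hc)
    return combine(keep, new, lambda a, b: a + b)
```

Because every operation is a convolution or an element-wise function, the output grid has the same resolution as the input grid of the respective processing level.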
[0078] The spatial-temporal data or features extracted by the respective ConvGRU 232, 234, 236 are employed for generating respective key vectors 332, 334, 336 and respective value vectors 342, 344, 346 for each of the processing levels PL0, PL1, PL2. The respective key vectors 332, 334, 336 are matched with the query vector 310 within an aggregation module 360 of the attention algorithm 240 in order to provide weights for a data aggregation over all processing levels PL0, PL1, PL2, as will be described in detail below in context of
[0079]
wherein $\theta_Q$ defines a set of trainable parameters. Each of the L×T grid cells associated with the set of current input data $I_t$ is assigned a distinct query vector of size $d_k$, and therefore the final query vector or query matrix 310 is defined regarding its dimension as
$$Q \in \mathbb{R}^{L \times T \times d_k}$$
[0080] The query vector or matrix 310 is matched with the respective key vectors 332, 334, 336 generated for each processing level PL0, PL1, PL2 and, in detail, calculated by linear dense layers as follows:
wherein the key vectors $K_i$ are obtained from the feature maps or joint spatial-temporal data $H_{t,i}$ resulting from the respective ConvGRU 232, 234, 236 (see
wherein $L_i$ and $T_i$ denote the number of grid cells in the longitudinal and transversal dimension for the respective processing level PL0, PL1, PL2.
[0081] At 410, the respective key vectors $K_i$ 332, 334, 336 are matched with the query vector or matrix 310 for each processing level PL0, PL1, PL2, respectively, wherein a dot product of the respective key vector 332, 334, 336 and the query vector or matrix 310 is calculated. Before calculating the dot products, the key vectors 334, 336 of the second and third processing levels PL1, PL2 are up-sampled as mentioned above and described in detail below in context of
[0082] At 412, the dot products calculated at 410 are divided by $\sqrt{d}$ in order to provide a value range allowing larger gradients to pass the subsequent softmax function 420, which generates weights W for the value vectors 342, 344, 346 on each processing level PL0, PL1, PL2. Since the weights W are generated by an attention algorithm using a dot product of key and query vectors, the weights may be denoted as attention weights and defined regarding their dimension by:
wherein X and Y denote the respective set of cells along the longitudinal and transversal dimension for the grid cells provided for respective processing level PL0, PL1, PL2.
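Steps 410, 412 and 420 can be illustrated for a single grid cell. The sketch assumes that the softmax is taken across the processing levels for each cell, that d equals the length of the key and query vectors, and that the keys have already been brought to the resolution of the query; the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Weights for one grid cell: one weight per processing level.

    query: query vector of the cell (length d)
    keys:  one key vector per processing level for the same cell (each of length d)
    """
    d = len(query)
    # 410: dot product of query and key; 412: scaling by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # 420: softmax turns the scaled scores into weights that sum to one.
    return softmax(scores)
```

The level whose key is best aligned with the query of a cell receives the largest weight for that cell, and the weights of all levels sum to one.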
[0083] At 430, the weights $W^{Att}$ are applied to the respective value vectors 342, 344, 346 in order to define the extent to which each processing level PL0, PL1, PL2 contributes to the output 250 (see
[0084] The value vectors 342, 344, 346 are obtained in a similar manner as the key vectors 332, 334, 336 from the feature maps $H_{t,i}$ provided by the respective ConvGRU 232, 234, 236. In detail, the respective value vectors are defined as follows:
wherein $f^V$ defines a dense layer with trainable parameters $\theta_V$ which is followed by an ELU activation function. The value vectors include feature encodings per cell of the associated grid map on each processing level PL0, PL1, PL2. The value vectors 342, 344, 346 which are combined with the weights $W^{Att}$ on each processing level PL0, PL1, PL2 at 430 are then summed or aggregated at 440 in order to provide the result 250 of the method according to the disclosure. The value vectors are defined regarding their dimension by
$$V_i \in \mathbb{R}^{L_i \times T_i \times d_v}$$
wherein $d_v$ denotes the length of the value vectors, which may differ from the dimension $d_k$ of the key vectors described above.
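The computation of the value vectors and the weighting and aggregation at 430 and 440 may be sketched for one grid cell as follows. The helper names and the per-cell view are assumptions for illustration; the dense layer followed by an ELU activation and the weighted sum over the processing levels follow the description above.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(features, weights, bias):
    """Dense layer: one output per row of the weight matrix."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def value_vector(h_cell, weights, bias):
    """Value vector of length d_v for one cell from its joint spatial-temporal features."""
    return [elu(v) for v in dense(h_cell, weights, bias)]


def aggregate(level_weights, level_values):
    """430/440: weight the value vector of each level and sum over all levels."""
    d_v = len(level_values[0])
    return [sum(w * v[j] for w, v in zip(level_weights, level_values))
            for j in range(d_v)]
```

Applied to every cell of the L×T grid, the aggregated vectors form the feature map that is provided as the output 250.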
[0085]
[0086]
[0087] At 602, sets of characteristics may be determined within the environment of a host vehicle, wherein the characteristics may be detected by a perception system of the host vehicle. At 604, at least two processing levels having different scales may be defined for data associated with the respective level via a processing unit of the host vehicle. At 606, for each processing level a respective set of current input data associated with the set of characteristics for a current point in time may be combined with a respective set of memory data related to sets of characteristics for previous points in time in order to generate a set of joint spatial-temporal data for the respective processing level. At 608, an attention algorithm may be applied to the sets of joint spatial-temporal data of all processing levels in order to generate an aggregated data set. At 610, at least one pattern related to the environment of the host vehicle may be determined from the aggregated data set.
[0088] According to various embodiments, the respective set of current input data and the respective sets of memory data may be associated with respective grid maps having different respective spatial resolutions on each processing level.
[0089] According to various embodiments, a first processing level may be provided with a highest grid resolution, and subsequent processing levels may be provided with a grid resolution being lower than the highest grid resolution.
[0090] According to various embodiments, the attention algorithm may include a query vector being independent from the processing levels.
[0091] According to various embodiments, the attention algorithm may include respective key and value vectors defined on each respective processing level after combining the respective set of current input data with the respective set of memory data.
[0092] According to various embodiments, the key vectors of each processing level may be combined with the query vector in order to provide weights for elements of the value vectors.
[0093] According to various embodiments, the respective key vector and the respective value vector defined on the respective processing level may be up-sampled to a resolution of the query vector if the resolution of the respective processing level is lower than the resolution of the query vector.
[0094] According to various embodiments, the up-sampling of the key vector may be performed by applying an interpolation to elements of the key vector.
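The up-sampling by interpolation may, for example, be realized as a bilinear interpolation. The disclosure does not specify the interpolation scheme, so the bilinear form, the mapping that keeps the corner cells fixed, and the function name are assumptions. The sketch handles one channel; each element of the key or value vectors would be up-sampled in the same way.

```python
def upsample_bilinear(grid, out_rows, out_cols):
    """Up-sample a single-channel grid to out_rows x out_cols by bilinear interpolation."""
    in_rows, in_cols = len(grid), len(grid[0])

    def source(i, n_out, n_in):
        # Map an output index to a fractional input index (corner cells stay fixed).
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for r in range(out_rows):
        y = source(r, out_rows, in_rows)
        y0 = int(y)
        y1 = min(y0 + 1, in_rows - 1)
        fy = y - y0
        row = []
        for c in range(out_cols):
            x = source(c, out_cols, in_cols)
            x0 = int(x)
            x1 = min(x0 + 1, in_cols - 1)
            fx = x - x0
            # Interpolate along the columns first, then along the rows.
            top = grid[y0][x0] * (1.0 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1.0 - fx) + grid[y1][x1] * fx
            row.append(top * (1.0 - fy) + bottom * fy)
        out.append(row)
    return out
```

For instance, a key map of the third processing level with L/4 × T/4 cells would be passed with `out_rows=L` and `out_cols=T` so that it can be matched cell by cell with the query.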
[0095] According to various embodiments, the combination of the respective set of current input data and the respective set of memory data may be provided by applying a recurrent algorithm on each processing level.
[0096] According to various embodiments, the at least one pattern related to the environment of the host vehicle may be associated with at least one object being detected in the environment of the host vehicle.
[0097] According to various embodiments, the at least one pattern associated with the at least one object may be employed for tracking the object.
[0098] Each of the steps 602, 604, 606, 608, 610 and the further steps described above may be performed by computer hardware components.
[0099]
[0100] The characteristics determination circuit 702 may be configured to determine sets of characteristics detected within an environment of a host vehicle by a perception system of the host vehicle.
[0101] The processing level definition circuit 704 may be configured to define at least two processing levels having different scales for data associated with the respective level.
[0102] The data combination circuit 706 may be configured to combine, for each processing level, a respective set of current input data associated with the set of characteristics for a current point in time and a respective set of memory data related to sets of characteristics for previous points in time in order to generate a set of joint spatial-temporal data for the respective processing level.
[0103] The attention algorithm circuit 708 may be configured to apply an attention algorithm to the sets of joint spatial-temporal data of all processing levels in order to generate an aggregated data set.
[0104] The pattern determination circuit 710 may be configured to determine at least one pattern related to the environment of the host vehicle from the aggregated data set.
[0105] The characteristics determination circuit 702, the processing level definition circuit 704, the data combination circuit 706, the attention algorithm circuit 708 and the pattern determination circuit 710 may be coupled to each other, e.g. via an electrical connection 711, such as a cable or a computer bus, or via any other suitable electrical connection to exchange electrical signals.
[0106] A circuit may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing a program stored in a memory, firmware, or any combination thereof.
[0107]
[0108] The processor 802 may carry out instructions provided in the memory 804. The non-transitory data storage 806 may store a computer program, including the instructions that may be transferred to the memory 804 and then executed by the processor 802.
[0109] The processor 802, the memory 804, and the non-transitory data storage 806 may be coupled with each other, e.g. via an electrical connection 808, such as a cable or a computer bus, or via any other suitable electrical connection to exchange electrical signals.
[0110] As such, the processor 802, the memory 804 and the non-transitory data storage 806 may represent the characteristics determination circuit 702, the processing level definition circuit 704, the data combination circuit 706, the attention algorithm circuit 708 and the pattern determination circuit 710, as described above.
[0111] The terms coupling or connection are intended to include a direct coupling (for example via a physical link) or direct connection as well as an indirect coupling or indirect connection (for example via a logical link), respectively.
[0112] It will be understood that what has been described for one of the methods above may analogously hold true for the pattern determination system 700 and/or for the computer system 800.
REFERENCE NUMERAL LIST
[0113] 100 vehicle
[0114] 110 perception system
[0115] 115 field of view
[0116] 120 computer system
[0117] 121 processing unit
[0118] 122 memory, database
[0119] 124 control system
[0120] 125 dynamic context
[0121] 130 object
[0122] 140 vehicle
[0123] 150 pedestrian
[0124] 160 static context
[0125] 170 traffic sign
[0126] 180 lane markings
[0127] 210 original set of current input data
[0128] 212, 214, 216 input data on the first, second and third processing levels
[0129] 222, 224, 226 memory data on the first, second and third processing levels
[0130] 232, 234, 236 recurrent unit on the first, second and third processing levels
[0131] 240 attention algorithm
[0132] 250 output
[0133] 260 task module
[0134] 310 query vector or matrix
[0135] 312 layer norm
[0136] 320 maxpool layers
[0137] 332, 334, 336 key vectors on the first, second and third processing levels
[0138] 342, 344, 346 value vectors on the first, second and third processing levels
[0139] 350 up-sampling
[0140] 360 aggregation module
[0141] 410 application of dot product
[0142] 412 scaling
[0143] 420 softmax function
[0144] 430 weighting of value vectors
[0145] 440 aggregation of the processing levels
[0146] 600 flow diagram illustrating a method for determining patterns related to an environment of a host vehicle from sequentially recorded data
[0147] 602 step of determining sets of characteristics detected within the environment of the host vehicle by a perception system of the host vehicle
[0148] 604 step of defining, via a processing unit of the host vehicle, at least two processing levels having different scales for data associated with the respective level
[0149] 606 step of combining, for each processing level, a respective set of current input data associated with the set of characteristics for a current point in time and a respective set of memory data related to sets of characteristics for previous points in time in order to generate a set of joint spatial-temporal data for the respective processing level
[0150] 608 step of applying an attention algorithm to the sets of joint spatial-temporal data of all processing levels in order to generate an aggregated data set
[0151] 610 step of determining at least one pattern related to the environment of the host vehicle from the aggregated data set
[0152] 700 pattern determination system
[0153] 702 characteristics determination circuit
[0154] 704 processing level definition circuit
[0155] 706 data combination circuit
[0156] 708 attention algorithm circuit
[0157] 710 pattern determination circuit
[0158] 711 connection
[0159] 800 computer system according to various embodiments
[0160] 802 processor
[0161] 804 memory
[0162] 806 non-transitory data storage
[0163] 808 connection
[0164] K0, K1, K2 key vectors
[0165] L number of pixels or cells in longitudinal direction
[0166] Q query vector
[0167] PL0, PL1, PL2 processing levels
[0168] T number of pixels or cells in transversal direction
[0169] V0, V1, V2 value vectors
[0170] W weight