METHOD, PROCESSOR CIRCUIT AND COMPUTER-READABLE STORAGE MEDIUM FOR PEDESTRIAN DETECTION BY A PROCESSOR CIRCUIT OF A MOTOR VEHICLE

20230410533 · 2023-12-21

    Abstract

    The disclosure relates to a method for pedestrian detection in a processor circuit of a motor vehicle, wherein an image data set describing an image of an environment of the motor vehicle is received from an environment sensor, and a machine learning model (ML model) is used to determine bounding boxes with potential images of pedestrians using the image data set, and from image data of the image data set the at least one ML model extracts feature data of image features and a detection of a completely or partially depicted pedestrian is carried out within a bounding box using the image features contained therein by way of a classifier of the at least one ML model, and the bounding box depicting the pedestrian is identified by a detection signal as the result of the detection.

    Claims

    1. A method for pedestrian detection performed by a processor circuit of a motor vehicle, the method comprising: receiving at least one image data set describing an image of an environment of the motor vehicle from at least one environment sensor; determining one or more bounding boxes with one or more potential images of one or more pedestrians using at least one machine learning model (ML model) and the at least one image data set; extracting feature data of image features from image data of the at least one image data set using the at least one ML model; detecting a completely or partially depicted pedestrian within at least one of the one or more bounding boxes using the at least one ML model and the image features; determining a feature vector and a distance value of the feature vector by combining image features contained in the one or more bounding boxes based on multiple statistical distribution models, wherein the image features of one partial portion of a pedestrian or a body of a person is modeled by each of the statistical distribution models; comparing the distance value to a threshold value; and sending a signal indicating that a pedestrian has gone undetected if the distance value is less than the threshold value.

    2. The method according to claim 1, wherein the one or more bounding boxes include a plurality of bounding boxes, wherein the method includes forming one or more additional bounding boxes based on the plurality of bounding boxes, and wherein the determining the feature vector and the distance value includes combining image features contained in the one or more additional bounding boxes and the plurality of bounding boxes.

    3. The method according to claim 2, further comprising: determining that one of the one or more bounding boxes intersects the at least one of the one or more bounding boxes within which the completely or partially depicted pedestrian has been detected, wherein at least one of the one or more additional bounding boxes is a non-overlapping portion of the one of the one or more bounding boxes and the at least one of the one or more bounding boxes within which the completely or partially depicted pedestrian has been detected.

    4. The method according to claim 1, further comprising: determining that one of the one or more bounding boxes intersects the at least one of the one or more bounding boxes within which the completely or partially depicted pedestrian has been detected; eliminating the one of the one or more bounding boxes prior to determining the distance.

    5. The method according to claim 1, wherein the determining the feature vector includes combining the image features to form a temporary vector, and reducing the temporary vector.

    6. The method according to claim 5, wherein the reducing the temporary vector includes transforming the temporary vector to form a transformed vector by way of a principal component analysis, and using a predetermined subset of vector components from the transformed vector for the feature vector.

    7. The method according to claim 1, wherein each of the statistical distribution models simulates a partial portion situated below a detection threshold of a classifier of the at least one ML model.

    8. The method according to claim 1, wherein the extracting feature data of image features is based on a convolution network, and the detecting the completely or partially depicted pedestrian within at least one of the one or more bounding boxes is based on a neural network.

    9. The method according to claim 1, wherein activation values of artificial neurons of at least one network layer of the at least one ML model are determined as the feature data.

    10. The method according to claim 1, further comprising: triggering a predetermined safety measure in the motor vehicle if the distance value from one of the statistical distribution models is less than the threshold value.

    11. The method according to claim 1, further comprising: generating the statistical distribution models from training data sets; decomposing at least one of the one or more bounding boxes that completely depicts one or more pedestrians into partial portions; combining image features contained in respective ones of the partial portions to form training feature vectors; and dividing the training feature vectors into clusters based on a cluster algorithm, wherein each of the clusters represents one of the statistical distribution models.

    12. A processor circuit for a motor vehicle, comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processing circuit to: receive at least one image data set describing an image of an environment of a motor vehicle from at least one environment sensor; and determine one or more bounding boxes with one or more potential images of one or more pedestrians using at least one machine learning model (ML model) and the at least one image data set; extract feature data of image features from image data of the at least one image data set using the at least one ML model; detect a completely or partially depicted pedestrian within at least one of the one or more bounding boxes using the at least one ML model and the image features; determine a feature vector and a distance value of the feature vector by combining image features contained in the one or more bounding boxes based on multiple statistical distribution models, wherein the image features of one partial portion of a pedestrian or a body of a person is modeled by each of the statistical distribution models; compare the distance value to a threshold value; and send a signal indicating that a pedestrian has gone undetected if the distance value is less than the threshold value.

    13. A computer-readable storage medium containing program instructions which, when executed by a processor circuit, cause the processor circuit to: receive at least one image data set describing an image of an environment of a motor vehicle from at least one environment sensor; and determine one or more bounding boxes with one or more potential images of one or more pedestrians using at least one machine learning model (ML model) and the at least one image data set; extract feature data of image features from image data of the at least one image data set using the at least one ML model; detect a completely or partially depicted pedestrian within at least one of the one or more bounding boxes using the at least one ML model and the image features; determine a feature vector and a distance value of the feature vector by combining image features contained in the one or more bounding boxes based on multiple statistical distribution models, wherein the image features of one partial portion of a pedestrian or a body of a person is modeled by each of the statistical distribution models; compare the distance value to a threshold value; and send a signal indicating that a pedestrian has gone undetected if the distance value is less than the threshold value.

    Description

    BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

    [0041] In the following, exemplary embodiments of the disclosure shall be described.

    [0042] FIG. 1 shows a schematic representation of an embodiment of the motor vehicle according to the disclosure; and

    [0043] FIG. 2 shows a sketch to illustrate one embodiment of the method according to the disclosure.

    DETAILED DESCRIPTION

    [0044] The exemplary embodiments explained below are preferred embodiments of the disclosure. In the exemplary embodiments, the described components of the embodiments each constitute individual features of the disclosure which are to be viewed independently of each other and which also modify the disclosure independently of each other. Therefore, the disclosure also covers combinations of the features of the embodiments other than those represented. Moreover, the described embodiments can also be supplemented by others of the already described features of the disclosure.

    [0045] In the figures, identical reference numbers designate functionally identical elements.

    [0046] FIG. 1 shows a motor vehicle 10, which can be an automobile, especially a passenger car or truck. The motor vehicle 10 can have an automated driving function 11, by which actuators 12 of the motor vehicle 10 can be operated by way of a control signal 13 automatically or without involvement of the driver. The actuators 12 can be provided for the transverse control (steering) and/or longitudinal control (acceleration and braking) of the motor vehicle 10. Thus, by way of the control signal 13, the driving function 11 can guide the motor vehicle 10 by triggering the actuators 12 along a driving trajectory 14 which the driving function 11 can compute in order to guide the motor vehicle 10 collision-free through an environment 15, such as road traffic or a road network. In order to plan the driving trajectory 14, the driving function 11 can be connected to a pedestrian detection 16, which can be implemented by a machine learning (ML) model.

    [0047] The pedestrian detection 16 can receive image data sets 19 from at least one environment sensor 17 of the motor vehicle 10, such as a camera, the detection region 18 of which can be pointed toward the environment 15, each time depicting the environment 15 with the persons or pedestrians 20, 21 potentially visible therein.

    [0048] The ML model may comprise a feature extraction unit 22, which can be based on a convolution network (CNN), for example. By way of the feature extraction unit 22, image features 24 can be extracted from the image data sets 19, as is known per se for convolution networks or in general for computer vision processing of image data sets 19. In addition or alternatively, bounding boxes 23 can be determined with the aid of the image data sets 19. These bound or encircle those regions or picture areas of the images according to the image data sets 19 in which, according to the extracted image features 24, a person might be located as a pedestrian 20, 21. These are so-called hypotheses or suggestions. The image data of the individual bounding boxes 23 can be provided to a classifier unit 25, which can be based for example on an FCNN (Fully Connected Neural Network). The classifier unit 25 can produce a recognition result 26 in a known manner on the basis of the image features 24 from the individual bounding boxes 23, indicating in which of the bounding boxes 23 a person is in fact located as a pedestrian 20, 21.

    [0049] In the following, it shall be assumed for the further explanation that the pedestrian 20, who is completely visible or depicted in the respective image data set 19, can be detected by the classifier unit 25. By contrast, only a partial portion of the pedestrian 21 is depicted, and in the example the pedestrian 21 has been overlooked or not detected by the classifier unit 25. A pedestrian in this exemplary embodiment is a person.

    [0050] The detection result or the recognition result 26 of the detection process can be reported to the driving function 11. The driving function 11 can use the position of the detected pedestrian 20 in the image according to the image data sets 19 to ascertain the relative position of the pedestrian 20 in the environment 15 in relation to the motor vehicle 10 and to plan the driving trajectory 14.

    [0051] In order to check whether the classifier unit 25 has overlooked a pedestrian, for example the pedestrian 21, a respective feature vector 31 can additionally be determined, by way of a dimension-reducing imaging 30 (that is, a mapping), for those bounding boxes 23 which do not contain or cover any of the detected pedestrians 20, 21. For this, the image features 24 of the respective bounding box 23 can be combined to form a preliminary vector 33, for example, which can be reduced in its dimension or length by way of the imaging 30 in order to generate the feature vector 31. As the imaging 30, a principal component analysis (PCA) can be used, for example.
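
    The combination of image features into a preliminary vector and its reduction can be sketched as follows. This is a minimal illustration only, assuming that the PCA mean and principal components were fitted beforehand; the function names, the `keep` parameter and the example numbers are hypothetical and do not appear in the disclosure.

```python
from typing import List, Sequence


def combine_features(features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the image features of one bounding box into a preliminary vector."""
    return [value for feature in features for value in feature]


def reduce_vector(preliminary: Sequence[float],
                  mean: Sequence[float],
                  components: Sequence[Sequence[float]],
                  keep: int) -> List[float]:
    """Project a preliminary vector onto the first `keep` principal components.

    `mean` and `components` are assumed to come from a PCA fitted in advance;
    only a predetermined subset of the transformed components is kept.
    """
    if len(preliminary) != len(mean):
        raise ValueError("preliminary vector and PCA mean differ in length")
    centered = [x - m for x, m in zip(preliminary, mean)]
    return [sum(c * x for c, x in zip(component, centered))
            for component in components[:keep]]


# Hypothetical example: two 2-value features, reduced from 4 to 2 dimensions.
preliminary_vector = combine_features([[1.0, 2.0], [3.0, 4.0]])
feature_vector = reduce_vector(
    preliminary_vector,
    mean=[1.0, 1.0, 1.0, 1.0],
    components=[[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 0.0]],
    keep=2,
)
```

    Because the projection is linear, applying it in the vehicle costs one subtraction and one dot product per retained component.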

    [0052] The feature vector 31 can be compared in a distance calculation 34 with statistical distribution models 35, or an affiliation value or a value for the probability of occurrence of the feature vector 31 according to the respective statistical distribution model 35 can be determined. From this, a respective distance value 36 of the feature vector 31 in regard to each statistical distribution model 35 can be determined. Only the smallest distance value 36 needs to be used for the further steps of the method. For example, the distance value 36 which is used can be the reciprocal of a probability value which indicates the probability of occurrence of the feature vector 31 according to the respective statistical distribution model 35. The statistical distribution model 35 can be, for example, a Gaussian kernel distribution function, which can indicate a probability of occurrence for a feature vector 31. If a support vector machine (SVM) is used as the statistical distribution model 35, the detection result will be a binary distance value (belonging or not belonging).
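
    One possible reading of the distance calculation is sketched below. It assumes, purely for illustration, that each distribution model is a Gaussian with a diagonal covariance and that the distance value is the reciprocal of the probability density; the type alias and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed model form: (mean per dimension, variance per dimension).
Model = Tuple[Sequence[float], Sequence[float]]


def gaussian_density(x: Sequence[float], model: Model) -> float:
    """Density of x under a Gaussian with diagonal covariance."""
    mean, variance = model
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, variance):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - (xi - mi) ** 2 / (2.0 * vi)
    return math.exp(log_p)


def distance_value(x: Sequence[float], model: Model) -> float:
    """Distance as the reciprocal of the probability of occurrence."""
    p = gaussian_density(x, model)
    return math.inf if p == 0.0 else 1.0 / p


def smallest_distance(x: Sequence[float], models: List[Model]) -> Tuple[int, float]:
    """Return the index of the closest model and its distance value."""
    distances = [distance_value(x, m) for m in models]
    best = min(range(len(models)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical example with two models in a two-dimensional feature space.
models = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
closest_index, closest_distance = smallest_distance([0.0, 0.0], models)
```

    A feature vector at the center of a model yields the highest density and therefore the smallest distance value for that model.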

    [0053] In a threshold value comparison 37, the distance value 36 can be compared to a threshold value 38, indicated here by a Greek letter. If the distance value 36 is larger than the threshold value 38 (symbolized by a plus sign), there is no match between the bounding box checked by way of the feature vector 31 and any of the statistical distribution models 35, and the bounding box therefore does not represent a partial portion of a pedestrian (symbolized by an OK check mark).

    [0054] Conversely, if the distance value 36 in the threshold value comparison 37 is smaller than the threshold value 38 (symbolized by a minus sign), a signal 39 can be provided indicating that an undetected or overlooked pedestrian is present in the image data set 19 on which the bounding box is based. For example, the signaling 39 can trigger the described safety measure 40, that is, the driving function 11 may reduce the driving speed of the motor vehicle 10 along the previously planned driving trajectory 14 or make sure that the driving speed is kept below a given maximum speed.
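
    The threshold value comparison and the resulting safety measure can be illustrated by the following sketch. The names, the default maximum speed of 30 km/h and the handling of the case in which the distance value equals the threshold are assumptions made for the example, not values taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    overlooked_pedestrian: bool  # True if the signal is to be sent
    speed_kmh: float             # driving speed after the safety measure


def threshold_comparison(distance_value: float,
                         threshold: float,
                         planned_speed_kmh: float,
                         max_speed_kmh: float = 30.0) -> CheckResult:
    """Signal an overlooked pedestrian if the distance is below the threshold.

    Assumption: a distance exactly equal to the threshold is treated as
    "no match", since the text only defines the larger and smaller cases.
    """
    overlooked = distance_value < threshold
    if overlooked:
        # Safety measure: keep the driving speed below a given maximum speed.
        speed = min(planned_speed_kmh, max_speed_kmh)
    else:
        speed = planned_speed_kmh
    return CheckResult(overlooked, speed)


# Hypothetical example values.
signalled = threshold_comparison(2.0, threshold=5.0, planned_speed_kmh=50.0)
not_signalled = threshold_comparison(8.0, threshold=5.0, planned_speed_kmh=50.0)
```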

    [0055] FIG. 2 illustrates, for example, how the distribution models 35 can be formed and how the undetected pedestrian 21 can be determined with the aid of the feature vector 31 by way of the distribution model 35.

    [0056] FIG. 2 shows (with reference to FIG. 1) how training data sets 60 (FIG. 1) can be analyzed in the same way as has already been described by way of the feature extraction unit 22. The training data sets 60 can be provided in already known manner with labeling data, that is, it can be known which bounding box 23 represents or depicts a fully visible person 61 as the pedestrian. Such a bounding box 23 can be decomposed or divided into partial portions 62, that is, the image features inside the bounding box 23 can be associated with the respective partial portion 62. It is known from the prior art that an image feature 24 is also associated with a location or a position within an image in an image data set 19.
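
    The decomposition of a labeled bounding box into partial portions can be sketched as follows. The sketch assumes a split into equal horizontal stripes and a simple (x, y, values) representation of a positioned image feature; both are illustrative choices, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]        # (x_min, y_min, x_max, y_max)
Feature = Tuple[float, float, Sequence[float]]  # (x, y, feature values)


def split_box(box: Box, rows: int) -> List[Box]:
    """Divide a bounding box into `rows` stacked partial portions."""
    x0, y0, x1, y1 = box
    if rows < 1 or y1 <= y0:
        raise ValueError("need rows >= 1 and a box with positive height")
    step = (y1 - y0) / rows
    return [(x0, y0 + i * step, x1, y0 + (i + 1) * step) for i in range(rows)]


def features_per_portion(box: Box,
                         features: Sequence[Feature],
                         rows: int = 3) -> List[List[Feature]]:
    """Associate each positioned image feature with its partial portion."""
    x0, y0, x1, y1 = box
    if rows < 1 or y1 <= y0:
        raise ValueError("need rows >= 1 and a box with positive height")
    step = (y1 - y0) / rows
    portions: List[List[Feature]] = [[] for _ in range(rows)]
    for feature in features:
        x, y, _ = feature
        if x0 <= x <= x1 and y0 <= y <= y1:
            # Features on the lower edge are counted in the last portion.
            index = min(int((y - y0) / step), rows - 1)
            portions[index].append(feature)
    return portions


# Hypothetical example: one feature in the top stripe, one in the bottom
# stripe and one outside the bounding box.
box = (0.0, 0.0, 10.0, 30.0)
features = [(5.0, 5.0, [1.0]), (5.0, 25.0, [2.0]), (50.0, 5.0, [3.0])]
portions = features_per_portion(box, features, rows=3)
```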

    [0057] The image features 24 of each partial portion 62 can now be transformed or mapped into training feature vectors 63 in the described manner by way of the dimension-reducing imaging. The training feature vectors 63 can be grouped by way of a clustering algorithm 64 or cluster algorithm. In this way, multiple clusters 66 are produced in a feature space 65, each of which represents a distribution model 35. The feature space 65 is represented here, in simplified manner for the description, by a two-dimensional space (plane). For individual regions or for individual points in the feature space 65, the respective distribution model 35 can indicate whether and/or with what probability such a point in the feature space 65 belongs to the distribution model 35 or is produced or generated by it.
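
    How clusters of training feature vectors can yield distribution models is sketched below. The disclosure does not name a specific cluster algorithm; k-means with a deterministic initialization is used here purely as a stand-in, and each cluster is summarized by an assumed per-dimension mean and variance. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (cluster center, variance per dimension)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: Sequence[Vector]) -> List[float]:
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points: Sequence[Vector], k: int,
            iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means; the first k points serve as initial centers (assumption)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = mean_vector(members)
    return centers, labels


def fit_distribution_models(points: Sequence[Vector], k: int) -> List[Model]:
    """Summarize each non-empty cluster as one distribution model."""
    centers, labels = k_means(points, k)
    models: List[Model] = []
    for j, center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            continue
        variance = [
            # A small floor avoids zero variance in degenerate dimensions.
            max(sum((p[d] - center[d]) ** 2 for p in members) / len(members), 1e-6)
            for d in range(len(center))
        ]
        models.append((center, variance))
    return models


# Hypothetical training feature vectors forming two groups in a plane.
training_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
distribution_models = fit_distribution_models(training_vectors, k=2)
```

    Since this step runs in the backend, its cost does not affect the vehicle; only the resulting centers and variances need to be stored there.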

    [0058] For example, if the bounding box 23 for the pedestrian 21 has been transformed or transferred to the feature vector 31, this feature vector 31 represents a point 67 in the feature space 65 which has a distance value 36 relative to the distribution model 35 that is smaller than the threshold value 38. Accordingly, the point 67 can be determined as belonging to the distribution model 35, that is, its cluster 66. Hence, in the threshold value comparison 37, it is found that an undetected pedestrian 21 is present and therefore the signaling 39 must be triggered or initiated.

    [0059] In this way, an overall reliability metric is obtained for finding concealed data points based on their distance from the center of mass or geometrical center of the clusters from the training data.

    [0060] Based on this, one obtains a system installed in the vehicle which uses a reference cluster model that was created in the backend on the basis of the training data set used for the optimization of the DNN (in general, the classifier unit) in order to verify whether the currently processed image contains concealed pedestrians.

    [0061] In each phase of the detection and decision making process, predictions are made in the DNNs, and the reliability of such predictions is of critical importance for the automated driving system as a whole, since the upcoming decisions of the system may be influenced by these reliability values. In other words: if a prediction of a subsystem, e.g., the detection process, proves to be unreliable, the system must make alternative decisions in place of the unreliable decisions, or else the safety of the passengers or other traffic participants might be jeopardized. In the present disclosure, the algorithm checks the possibility of a concealed pedestrian being overlooked by the DNN.

    [0062] The prior art thus far is based on computation-intense methods, which are not always easy to apply given the limited resources in the operational equipment, such as vehicles. On the other hand, their predictions are misleading when unfavorable interference is present, while they still provide a high degree of trust in the wrongly predicted data points. Our method is based on lightweight statistical models requiring only a fraction of the computing power which is needed by the main DNN, and thus they do not detract from the efficiency of the overall recognition system. Since the method is based on statistical analysis methods, the system engineers can furthermore establish decision making boundaries so as to rely on only a certain range of valid detections and to regard the rest as unreliable. This is especially helpful in establishing safe and reliable decision making areas where the DNN can provide good performance.

    [0063] Furthermore, our method can also be used for the safety argument of detection DNNs, in which the DNNs are evaluated at a number of input data points with distances from their true class cluster centers and can generate evidence based on this.

    [0064] An overview of a particularly preferred embodiment of the method is now given. The figures show that the method decides whether or not a prediction of the automated driving system in the vehicle is reliable. The decision as to whether or not a prediction of the automated driving system in the vehicle is reliable is made in the following six steps. Steps S1 to S3 are carried out during the design phase of the system in the backend, while steps S4 to S6 are performed during the running time in the vehicle, requiring a minimal computing power. The steps in particular are as follows:

    [0065] (1) [In the backend] The activations of one or more CNN layers (or a similar encoder) are extracted for the entire training data set for the fully visible pedestrians. These are then divided into various random splits so as to cover a partial portion of the pedestrian each time. Each split is then simplified with linear dimensionality reduction methods such as principal component analysis (PCA), resulting in a simplified feature space, which can be divided into various classes. This model is very small and light, thanks to its linearity, so that it requires only minimal computing power. Consequently, at the end of this step many statistical distributions are extracted, based on different partitioning of the pedestrians, and these are used as a reference for the next steps.

    [0066] (2) [In the backend] On the basis of the results from (1), a model is developed, forming a cluster for each split. This is then used to estimate a probability score, which defines the probability that a data point belongs to a cluster. This probability value is calculated such that the vehicle requires the least possible computing power.

    [0067] (3) [In the backend] If required for the cluster method, a probability threshold value is defined for each cluster, representing the minimum probability that a data point belongs to this cluster or not.

    [0068] (4) [In the vehicle] In the customary 2D object recognition methods, thousands of 2D suggestions are generated, being discarded in the later phases of the detection and only a few of them leading to the final result. Our algorithm uses these numerous 2D suggestions to extract the corresponding filter activations from the aforementioned CNN layer. These suggestions are then included in the narrower selection, in order to eliminate those which intersect with a pedestrian already detected by the main DNN. (The goal is to find the overlooked concealed pedestrians). Finally, they are transferred to the new space by the same PCA model as in (1) and are compared to the clusters formed in (2).

    [0069] (5) [In the vehicle] The probability value of the results from (4) in regard to the partitioned clusters is estimated and the closest cluster is fed back.

    [0070] (6) [In the vehicle] If the probability value estimated in step S5 is less than the threshold value calculated in step S3, the final prediction is then considered to be a probably concealed pedestrian overlooked by the main pedestrian detector.
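
    The elimination of 2D suggestions that intersect an already detected pedestrian, as described for the in-vehicle step (4), can be sketched as follows. The box representation and function names are hypothetical, and treating boxes that merely touch at an edge as non-intersecting is an assumption of this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes share an area; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def remaining_suggestions(suggestions: Sequence[Box],
                          detected: Sequence[Box]) -> List[Box]:
    """Keep only suggestions that do not intersect any detected pedestrian box."""
    return [s for s in suggestions
            if not any(intersects(s, d) for d in detected)]


# Hypothetical example: one detected pedestrian and three suggestions.
detected_boxes = [(0.0, 0.0, 10.0, 20.0)]
suggestion_boxes = [
    (5.0, 5.0, 15.0, 25.0),   # overlaps the detected pedestrian
    (20.0, 0.0, 30.0, 20.0),  # well separated
    (10.0, 0.0, 12.0, 20.0),  # only touches the right edge
]
candidates = remaining_suggestions(suggestion_boxes, detected_boxes)
```

    Only the remaining candidates would then be mapped into the reduced feature space and compared with the clusters.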

    [0071] On the whole, the examples show how an additional checking for overlooked or undetected pedestrians can be provided for an automated driving function in a pedestrian detection process, which can be based on clusters (distribution models) in a reduced feature space and can therefore be carried out with less computing expense.

    [0072] German patent application no. 102022115189.1, filed Jun. 17, 2022, to which this application claims priority, is hereby incorporated herein by reference, in its entirety.

    [0073] Aspects of the various embodiments described above can be combined to provide further embodiments. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled.