Method of Classifying a Road Surface Object, Method of Training an Artificial Neural Network, and Method of Operating a Driver Warning Function or an Automated Driving Function
20240104940 · 2024-03-28
Inventors
CPC classification
G06F18/2414
PHYSICS
G06V10/774
PHYSICS
B60W2420/403
PERFORMING OPERATIONS; TRANSPORTING
B60W60/001
PERFORMING OPERATIONS; TRANSPORTING
B60W2552/35
PERFORMING OPERATIONS; TRANSPORTING
G06V20/588
PHYSICS
International classification
G06V20/56
PHYSICS
B60W60/00
PERFORMING OPERATIONS; TRANSPORTING
G01C21/00
PHYSICS
Abstract
A road surface is classified by providing a set of data points that is attributable to a same road surface object. Each data point specifies a first variable and a second variable. For each data point, the first variable characterizes a horizontal motion exhibited by a vehicle when driving over the road surface object and the second variable characterizes a vertical motion exhibited by said vehicle when driving over the road surface object. The set of data points is classified using an artificial neural network with regard to a relevance of the road surface object for a driver warning function or an automated driving function.
Claims
1. A computer-implemented method of classifying a road surface object, the method comprising: providing a set of data points that is attributable to a same road surface object, each data point specifying a first variable and a second variable, wherein for each data point the first variable characterizes a horizontal motion exhibited by a vehicle when driving over the road surface object; and the second variable characterizes a vertical motion exhibited by said vehicle when driving over the road surface object; and classifying the set of data points using an artificial neural network with regard to a relevance of the road surface object for a driver warning function or an automated driving function.
2. The method of claim 1, wherein the set of data points is provided to the artificial neural network in a form of an image data file.
3. The method of claim 2, wherein the image data file depicts the data points in the form of a scatter plot.
4. The method of claim 1, wherein the first variable is a speed of the vehicle.
5. The method of claim 1, wherein the second variable is an amplitude characterizing a vertical displacement of the vehicle.
6. The method of claim 3, wherein the set of data points comprises a first subset with data points specifying, as the second variable, an amplitude measured at a right side of the vehicle; and a second subset with data points specifying, as the second variable, an amplitude measured at a left side of the vehicle.
7. The method of claim 6, wherein the scatter plot depicts the first subset of data points and the second subset of data points with different marker symbols and/or with different marker shapes.
8. The method of claim 1, further comprising: generating, supplementing or updating a digital map by relating a result of the classification to a position of the road surface object.
9. The method of claim 8, further comprising: providing the digital map or a part of the digital map or information derived from the digital map to a vehicle for use by a driver warning function or an automated driving function.
10. A computer-implemented method of operating a driver warning function or an automated driving function of a vehicle, the method comprising the following steps being performed by one or more computing devices of the vehicle: receiving information with regard to a relevance of a road surface object for a driver warning function or an automated driving function, wherein the information results from a classification of the road surface object according to claim 1; and generating a control command for controlling the driver warning function or the automated driving function in dependence on the received information.
11. A computer-implemented method of training an artificial neural network to classify road surface objects, the method comprising: providing a plurality of data points, each data point specifying a first variable and a second variable, wherein for each data point the first variable characterizes a horizontal motion exhibited by a vehicle when driving over the road surface object; and the second variable characterizes a vertical motion exhibited by said vehicle when driving over the road surface object; providing ground truth data indicating a respective relevance of a plurality of road surface objects for a driver warning function or an automated driving function; clustering the data points into a plurality of sets of data points, such that each data point of a same set of data points is attributable to a same road surface object; matching the ground truth data with the sets of data points so as to obtain a plurality of sets of data points, each set of data points being attributable to a respective road surface object and to a relevance of the road surface object for a driver warning function or an automated driving function; for each set of data points, generating an image data file depicting the data points in a form of a scatter plot; and training an artificial neural network to classify road surface objects by using the image data files and the corresponding ground truth information regarding the relevance of the respective road surface objects as training data.
12. The method of claim 11, wherein training the artificial neural network comprises providing a pretrained convolutional neural network that has been trained for image classification of other image data files; adding a layer to the pretrained convolutional neural network; and training the added layer with the image data files depicting the scatter plots as training data.
13. A computing device configured to execute a method according to claim 1.
14. A computer program comprising instructions which, when the program is executed by a computing device, cause the computing device to carry out the method according to claim 1.
15. A computer-readable storage medium comprising instructions which, when executed by a computing device, cause the computing device to carry out a method according to claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0085]
[0086]
[0087]
[0088]
[0089]
[0090]
[0091]
[0092]
[0093]
DETAILED DESCRIPTION
[0094]
[0095] The method comprises providing a set of data points that is attributable to a same road surface object, each data point specifying a first variable and a second variable. For each data point, the first variable characterizes a horizontal motion exhibited by a vehicle when driving over the road surface object and the second variable characterizes a vertical motion exhibited by said vehicle when driving over the road surface object.
[0096] The method further comprises classifying the set of data points using an artificial neural network (ANN) with regard to a relevance of the road surface object for a driver warning function or an automated driving function.
[0097]
[0098] The method comprises the following steps, which are carried out by one or more computing devices, such as one or more electronic control units, of the vehicle: receiving information with regard to a relevance of a road surface object for a driver warning function or an automated driving function, wherein the information results from a classification of the road surface object that has been carried out according to the method illustrated in
[0099]
[0100] The method comprises providing a plurality of data points, wherein each data point specifies a first variable and a second variable. For each data point, the first variable characterizes a horizontal motion exhibited by a vehicle when driving over the road surface object and the second variable characterizes a vertical motion exhibited by said vehicle when driving over the road surface object.
[0101] The method further comprises providing ground truth data indicating a respective relevance of a plurality of road surface objects for a driver warning function or an automated driving function.
[0102] The method further comprises clustering the data points into a plurality of sets of data points, such that each data point of a same set of data points is attributable to a same road surface object.
[0103] The method further comprises matching the ground truth data with the sets of data points to obtain a plurality of sets of data points, each set of data points being attributable to a respective road surface object and to a relevance of the road surface object for a driver warning function or an automated driving function.
[0104] The method further comprises generating, from each set of data points, an image data file depicting the data points in the form of a scatter plot.
[0105] The method further comprises training an ANN to classify road surface objects by using the image data files and the corresponding ground truth information regarding the relevance of the respective road surface objects as training data.
[0106] In the following, aspects of examples of the methods shown in
[0107]
[0108] A computing device in the backend 2 is programmed to classify road surface objects according to the method shown in
[0109] The backend 2 provides a result of the classification via a wireless downstream connection (indicated by a solid arrow) to a first vehicle 11 of the fleet 1.
[0110] In
[0111] For example, such information provided via the downstream link may indicate a position of a road surface object (Event location), as well as a respective relevance of the road surface objects for the driver warning function or the automated driving function (Event type, e.g., a pothole that the driver should be warned of). For example, the information may be provided to the first vehicle 11 in the form of a digital map or as information taken from a digital map.
[0112] A second vehicle 12 shown in
[0113] For example, the upstream information may include, for every recorded event, a location at which the vehicle has traversed the road surface object, a speed (or velocity) at which the vehicle has traversed the road surface object, and a vertical displacement amplitude that the vehicle has exhibited when traversing the road surface object. In the backend processing, the measured speed and the measured vertical displacement amplitude may be used as the first and second variables, respectively, when carrying out a classification of road surface objects according to the method of
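The event record described above can be sketched as a simple data structure; the field names, units, and values below are illustrative assumptions, not part of the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoadSurfaceEvent:
    """One recorded traversal event, as described above (illustrative)."""
    latitude: float
    longitude: float
    speed_kmh: float   # horizontal motion -> first variable
    amplitude: float   # vertical displacement amplitude -> second variable
    side: str          # "left" or "right" rear suspension

def to_data_point(event: RoadSurfaceEvent) -> tuple[float, float]:
    """Map an event to the (first variable, second variable) pair used
    for the backend classification."""
    return (event.speed_kmh, event.amplitude)

event = RoadSurfaceEvent(48.137, 11.575, 42.0, 35.5, "left")
print(to_data_point(event))  # (42.0, 35.5)
```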
[0114] In an embodiment, the processing of the fleet sensor data in the backend may comprise several consecutive steps:
[0115] A preprocessing step may comprise filtering the data to reduce noise. The preprocessing may further comprise a so-called map matching process, wherein raw geospatial data that are provided together with the sensor data are matched to a lanelet network.
[0116] In a clustering step, the data are clustered into sets of data points, such that each set of data points contains data points that have likely been caused by a same road surface object.
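The clustering step above can be sketched as follows; the source does not name a specific algorithm, so this greedy proximity grouping (with an assumed 15 m radius) is purely illustrative.

```python
import math

def cluster_by_location(points, max_dist_m=15.0):
    """Group data points so that points recorded close together are
    attributed to the same road surface object.  points: list of
    (x_m, y_m) local coordinates; max_dist_m is an assumed radius."""
    clusters = []
    for p in points:
        for c in clusters:
            # Attach to the first cluster whose seed point is close enough.
            if math.dist(p, c[0]) <= max_dist_m:
                c.append(p)
                break
        else:
            clusters.append([p])  # start a new cluster (new object)
    return clusters

# Two potholes roughly 100 m apart, each traversed several times:
pts = [(0, 0), (2, 1), (1, 3), (100, 0), (101, 2)]
print([len(c) for c in cluster_by_location(pts)])  # [3, 2]
```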
[0117] The classification of the road surface objects is then carried out based on the sets of data points. As a result of the classification, each set of data points may be attributed a type of object (e.g., pothole or speedbump). Further, the sets of data points are classified regarding a necessity to warn the driver of the respective road surface object.
[0118] Finally, the result of the classification can be stored in a digital map, so that when a vehicle 11 from the fleet 1 enters a certain area, it can download the relevant information regarding the road conditions as well as the locations of the found events. Consequently, if the vehicle 11 is about to cross such an event, the driver may be warned of the upcoming event or an automated driving function of the vehicle 11 may automatically control a longitudinal and/or lateral movement of the vehicle, e.g., to avoid the road surface object or to mitigate its impact.
[0119] In accordance with the embodiment described above, the method of
[0120]
[0121] In the following, reference is made to
[0122] For example, an ANN that is used in the backend processing for classifying road surface objects in the example described above with reference to
[0123] In the schemes of
[0124] The fleet data comprise recorded sensor data characterizing a horizontal motion (first variable) and a vertical motion (second variable) that vehicles 12 have exhibited while driving over different road surface objects, as explained above with reference to
[0125] The GT data may have been acquired during a number of test drives with one or more vehicles, wherein test drivers may have manually labeled different events of driving over road surface objects according to a relevance of the respective road surface object for a driver warning function or an automated driving function. For example, a test driver may have indicated that a certain pothole is so deep that a driver should be warned before driving over it.
[0126] The data points of the fleet data are clustered into a plurality of sets of data points, such that each data point of a same set of data points is attributable to a same road surface object.
[0127] Further, the GT data are matched with the sets of data points to obtain a plurality of sets of data points, each set of data points being attributable to a respective road surface object and to a relevance of the road surface object for a driver warning function or an automated driving function.
[0128] Then, for each set of data points, an image data file depicting the data points in the form of a scatter plot is generated, wherein, for the purpose of training an ANN to classify road surface objects based on the scatter plots as training data, each image data file is labeled according to the corresponding GT label indicating the relevance of the respective road surface object.
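The step of turning a set of data points into an image can be sketched as follows; instead of a real scatter-plot image file, this minimal stand-in rasterizes (first variable, second variable) pairs into a small grayscale grid. The value ranges and image size are assumptions for illustration.

```python
def rasterize_scatter(points, width=64, height=64,
                      x_range=(0.0, 250.0), y_range=(0.0, 60.0)):
    """Rasterize (speed, amplitude) data points into a height x width
    grid with 1.0 where a point falls (a stand-in for generating the
    scatter-plot image data file described above)."""
    img = [[0.0] * width for _ in range(height)]
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y in points:
        col = min(width - 1, max(0, int((x - x0) / (x1 - x0) * (width - 1))))
        row = min(height - 1, max(0, int((y - y0) / (y1 - y0) * (height - 1))))
        img[height - 1 - row][col] = 1.0  # flip so amplitude grows upward
    return img

img = rasterize_scatter([(50.0, 30.0), (120.0, 45.0)])
print(sum(sum(r) for r in img))  # 2.0
```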
[0129] Finally, in the example of
[0130] For example, the training may be carried out such that it affects only one or a few layers added to a very large CNN, which has been pretrained with a huge number (e.g., 14 million samples) of other images, i.e., images depicting objects that may be entirely different from such scatter plots. This aspect of so-called transfer learning will be explained in more detail below with reference to
[0131]
[0132] Each of the scatter plots comprises a plurality of data points stemming from a measurement of a first variable and a second variable when a vehicle has traversed the road surface object that is represented by the respective scatter plot. Specifically, in the examples shown in
[0133] For example, the depicted scale from 0 to 60 on the y-axis may approximately correspond to a vertical displacement amplitude in a range from 2 cm (corresponding to the value 0 on the y-axis) to 8 cm (corresponding to the value 60 on the y-axis), wherein the amplitudes have been normalized to make the amplitudes of different vehicle types or derivatives comparable.
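Assuming the mapping is linear (the exact, vehicle-specific normalization is not given in the source), the correspondence described above (2 cm to 0, 8 cm to 60) can be sketched as:

```python
def normalized_amplitude(amp_cm: float) -> float:
    """Map a measured vertical displacement amplitude (in cm) to the
    0..60 y-axis scale, assuming a linear mapping with 2 cm -> 0 and
    8 cm -> 60 (an illustrative assumption)."""
    return (amp_cm - 2.0) / (8.0 - 2.0) * 60.0

print(normalized_amplitude(2.0))  # 0.0
print(normalized_amplitude(8.0))  # 60.0
print(normalized_amplitude(5.0))  # 30.0
```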
[0134] It should be noted that in each of the scatter plots, a first subset of data points referring to amplitude measurements taken at the left rear suspension is indicated by red circles (depicted in grey in the Figures) and a second subset of data points referring to amplitude measurements taken at the right rear suspension is depicted as black crosses. In other words, data points stemming from measurements taken on the left side and on the right side of the vehicle, respectively, are coded by a different marker shape as well as by a different marker color.
[0135] It can be observed that the two scatter plots in
[0136] Further, in the scatter plot shown in
[0137] It is intended that the scatter plots in
[0138] By relying on scatter plots such as the ones shown in
[0139] It is also worth mentioning that it would be practically impossible to translate the two samples shown in
[0140] For example, in an embodiment, the method of training the ANN comprises providing a pretrained CNN that has been trained for image classification of other image data files. For example, the other image data files may be of a different kind than the ones mentioned above (showing the scatter plots), such as, e.g., images of vegetables, images of animals, or the like. For example, the pretrained CNN may be a very large network that has been trained with an extensive amount of training images. One example of such a large pretrained network is the CNN Xception, which was introduced by Chollet (2017).
[0141] The method may further comprise adding one or more layers to the pretrained CNN. For example, one or more final layers may be added to the pretrained CNN.
[0142] The one or more added layers may comprise fully connected (FC) layers. In an embodiment, at least one added layer is a fully connected layer.
[0143] Adding the one or more layers to the pretrained CNN may comprise replacing one or more layers of the pretrained CNN with the one or more added layers.
[0144] The method may comprise training in particular the added layer(s) with the image data files depicting the scatter plots as training data. For example, in an embodiment, only the one or more added layer(s) are trained with the image data files depicting the scatter plots as training data, whereas during the training, the parameters of the pretrained CNN may be frozen (i.e., left unchanged by the training).
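The idea of training only the added layers while keeping the pretrained parameters frozen can be sketched in miniature; the 1-D toy setup below (fixed "backbone" weights, a small trainable head, a toy regression target) is purely illustrative and stands in for the pretrained CNN plus added FC layers.

```python
import random

random.seed(0)
BASE_W = [0.5, -0.3, 0.8]  # frozen "pretrained" weights, never updated

def features(x):
    """Frozen feature extractor (stands in for the pretrained CNN)."""
    return [w * x for w in BASE_W]

head_w = [random.uniform(-0.1, 0.1) for _ in range(3)]  # trainable head

def predict(x):
    return sum(w, )  if False else sum(w * f for w, f in zip(head_w, features(x)))

# Train only the head on a toy regression target y = 2*x.
data = [(x, 2.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
for _ in range(200):
    for x, y in data:
        err = predict(x) - y
        feats = features(x)
        for i in range(3):
            # Gradient step on the head only; BASE_W stays frozen.
            head_w[i] -= 0.05 * err * feats[i]

loss = sum((predict(x) - y) ** 2 for x, y in data)
print(BASE_W)  # unchanged: [0.5, -0.3, 0.8]
print(loss)    # close to zero after training the head
```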
[0145] As mentioned before, CNNs are especially beneficial in terms of computational complexity because of the reduced number of weights (and biases) required to describe the neural network. However, even if this reduces the size of the problem significantly, a large CNN can still have multiple millions of weights. This is a challenge not only due to complexity or runtime constraints; it also requires the training dataset to be reasonably large. To overcome this challenge, transfer learning may be used. This technique generally describes the process of using pretrained networks for a new problem. More precisely, a neural network (in this case a CNN) is trained on different data beforehand. Afterwards, only a few layers are trained for the actual task on the actual training data.
[0146] Transfer learning is usually applied in situations where the two problems are sufficiently similar and thus similar decision criteria can be learned and subsequently used (Weiss et al., 2016). This can be the case, for instance, when an ANN that was trained and used for extracting the opinion of writers from restaurant reviews is now used for different tasks that include extracting the semantics of language, e.g., as a module of a chatbot. However, as the present problem of pothole detection, or more precisely, the problem of extracting information from scatter plots, is a highly specialized task, the limits of what is possible in terms of transfer learning must be pushed even further. Consequently, in the present context, the pretrained CNN can be understood as a feature extractor rather than a classifier.
[0147] In an embodiment that is illustrated in
[0148] It may be worth mentioning that when just evaluating the model on a benchmark, such as the top-5 accuracy on ImageNet (a public dataset consisting of more than 14 million samples of 1000 classes), there are newer architectures outperforming Xception, one example being the CoCa architecture. Regardless, since the actual task differs substantially from classifying ImageNet, it is not that important how well the model performs on classifying ImageNet, but rather how well the model can be used for the task at hand.
[0149] To overcome the difference in the task, additional fully connected (FC) layers, FC10 and FC1, were added in combination with a flattening layer that is provided to adapt the dimensions. For the activation function in the FC layer FC10, a leaky ReLU was used, which combines the general benefits of a ReLU with the advantage of having a small slope for values smaller than zero, thus dealing with the problem of dying ReLUs, which is a well-known phenomenon that can occur during training. The term dying ReLU describes the problem that adjusting the weights between layers does not change the final loss anymore because the output is negative, which results in no activation (=0) at all when using a ReLU. The size of the negativity, however, is not considered because the slope is 0 as well. Therefore, e.g., ReLU(−100)=0, as is ReLU(−0.3).
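The difference between a plain ReLU and a leaky ReLU can be illustrated directly; the slope of 0.01 for negative inputs is a common default, assumed here.

```python
def relu(x: float) -> float:
    """Plain ReLU: all negative inputs collapse to 0."""
    return max(0.0, x)

def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky ReLU: a small slope for negative inputs preserves the size
    of the negativity and keeps a nonzero gradient, avoiding the
    dying-ReLU problem described above (slope 0.01 is an assumed default)."""
    return x if x > 0.0 else slope * x

# A plain ReLU maps both values to 0 ...
print(relu(-100.0), relu(-0.3))  # 0.0 0.0
# ... while the leaky variant still distinguishes them:
print(leaky_relu(-100.0), round(leaky_relu(-0.3), 6))  # -1.0 -0.003
```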
[0150]
[0151] The ANN that was evaluated had the following numbers of parameters:
[0152] Total parameters: 22,909,501
[0153] Trainable parameters: 2,048,021 (FC layers + bias)
[0154] Non-trainable parameters: 20,861,480 (Xception without the final layer)
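One consistent reading of these counts is a flatten layer over a 10x10x2048 feature map (the output size of Xception's convolutional part for 299x299 inputs) feeding FC10 and FC1; the feature-map size is inferred, not stated in the source.

```python
# Reconstruct the trainable parameter count of the added head
# (flatten -> FC10 -> FC1); the 10x10x2048 shape is an inference.
flatten_size = 10 * 10 * 2048        # 204,800 features after flattening
fc10 = flatten_size * 10 + 10        # FC10: weights + biases
fc1 = 10 * 1 + 1                     # FC1: weights + bias
trainable = fc10 + fc1
non_trainable = 20_861_480           # Xception without the final layer

print(trainable)                 # 2048021
print(trainable + non_trainable) # 22909501
```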
[0155] Due to the small number of trainable weights, the training can be performed locally in sufficient time. The prediction of unseen data is, as mentioned, independent of the amount of training data and can be computed locally.
[0156] The results of a training for 10 epochs on the GT data can be observed in
[0157] The GT data consist of 1264 examples. The model was trained on 70% of these GT samples (training set) and the subsequent prediction was carried out on the remaining 30%, which the model had not yet seen (testing set).
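The 70%/30% split can be sketched as follows; the shuffling and the seed are assumptions, as the source only gives the split ratio and the dataset size of 1264 samples.

```python
import random

def train_test_split(samples, train_frac=0.7, seed=42):
    """Shuffle and split a sample list into training and testing sets,
    mirroring the 70 % / 30 % split described above (shuffling and seed
    are illustrative assumptions)."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(1264)))
print(len(train), len(test))  # 884 380
```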
[0158] For example, in one training session, a final performance with an accuracy of 0.7875 and an F1-score of 0.8468 was achieved. These values of the accuracy and the F1-score are higher by 20% and 25%, respectively, compared to current software that classifies road surface objects according to their relevance for warning based on classical machine learning approaches.
[0159] It can thus be seen that the model performs well, especially when considering the comparatively small dataset and the task, which differs substantially from the image classification task that Xception was trained for.
[0160] Moreover, it should be noted that the final Sigmoid layer allows the developer to adjust the confidence of the model in terms of positive predictions. By default, if the predicted value is >0.5, an object is classified as true, and otherwise as false. However, this threshold can be increased manually, so that the model does not predict true if its confidence is low (e.g., close to 0.5). In the present example, the threshold has been set to 0.7. This behavior is especially advantageous for a binary classification that decides whether a human shall be warned of a road surface object or not, as false positives and false negatives are perceived differently. Generally, a driver is more likely to be negatively affected by the system if it outputs false positives frequently than when the system produces false negatives, which merely results in the subjective impression that he or she is better than the system.
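The adjustable decision threshold described above can be sketched as:

```python
def warn_decision(sigmoid_output: float, threshold: float = 0.7) -> bool:
    """Binary warning decision on the sigmoid output: predict 'warn'
    only when the model's confidence exceeds the raised threshold
    (0.7 as in the example above), reducing false positives."""
    return sigmoid_output > threshold

print(warn_decision(0.55))  # False (would be True at the default 0.5)
print(warn_decision(0.85))  # True
```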
[0161] The foregoing disclosure has been set forth merely to illustrate the present subject matter and is not intended to be limiting. Since modifications of the disclosed embodiments incorporating the spirit and substance of the present subject matter may occur to persons skilled in the art, the present subject matter should be construed to include everything within the scope of the appended claims and equivalents thereof.