REAL-TIME VEHICLE OVERLOAD DETECTION METHOD BASED ON CONVOLUTIONAL NEURAL NETWORK

Abstract

The present disclosure provides a real-time vehicle overload detection method based on a convolutional neural network (CNN). The method detects vehicles traveling on the road in real time with a CNN and the you only look once (YOLO)-V3 detection algorithm, counts the wheels to obtain the number of axles, detects a relative wheelbase, compares the number of axles and the relative wheelbase with a national vehicle load standard to obtain a maximum load of the vehicle, and compares the maximum load with an actual load measured by a piezoelectric sensor under the vehicle, thereby implementing real-time vehicle overload detection. The method operates in real time, can detect overloaded vehicles on the road without stopping them, and avoids potential traffic congestion and road traffic accidents.

Claims

1. A real-time vehicle overload detection method based on a convolutional neural network (CNN), wherein the real-time vehicle overload detection method constructs, based on you only look once (YOLO)-V3, an object detection network for detecting a tire of a vehicle, performs sparsification on a YOLO network based on L1 regularization by using an artificial neural network pruning algorithm, and performs channel pruning on a CNN, thereby compressing the network greatly; and the real-time vehicle overload detection method comprises the following steps: step 1: preparing a visual object classes (VOC) dataset; acquiring an image of a multi-axle vehicle on site, labeling an acquired image (comprising the number of axles of the vehicle such as 6-axle and a wheel on a single side of the vehicle) of the multi-axle vehicle with a labeling tool, and preparing the VOC dataset, wherein the VOC dataset comprises four parts, specifically, a folder Annotations stores a label file suffixed by an xml for all images, a folder JPEGImages stores all dataset images, a folder ImageSets stores a file suffixed by a txt and generated after the dataset is partitioned, and a folder labels stores a file converted from the label file and suffixed by the txt; step 2: configuring a training environment for a YOLO-V3 object detection network model; constructing the YOLO-V3 object detection network model with a darknet deep learning framework in a ubuntu system, and training the YOLO-V3 object detection network model on the darknet deep learning framework, wherein the YOLO-V3 object detection network model is trained and tested on a computer; step 3: training the YOLO-V3 object detection network model; training the model with a YOLO-V3 object detection algorithm, and simplifying the network model with a pruning algorithm, thereby reducing a performance requirement on the computer in an actual application scenario; and step 4: uploading a trained model to a server, wherein the acquired vehicle image is uploaded by a camera to the server for detection, and the number of axles and a relative wheelbase of the vehicle are detected and compared with a national vehicle load standard GB1589-2016 to obtain a theoretical maximum load of the vehicle; and obtaining a true load of the vehicle through a piezoelectric sensor under a road, and determining whether the vehicle is overloaded by comparing the theoretical maximum load and the true load.

2. The real-time vehicle overload detection method based on a CNN according to claim 1, wherein step 3 specifically comprises: step 3.1: pre-training the YOLO-V3 object detection network model with Darknet53, and training the model with the VOC-format dataset prepared in step 1, wherein the Darknet53 is mainly composed of a series of 1×1 and 3×3 convolutional layers, with a total of 53 layers, and each convolutional layer is followed by a batch normalization (BN) layer and a LeakyReLU layer; step 3.2: sparsely training the network model, performing channel pruning on the network according to a proportion or a set threshold, and performing iterative pruning according to a precision of a pruned network until a detection precision meets a requirement; and step 3.3: selecting a pruning channel, wherein a key for selecting the pruning channel is to find a channel that contributes little to an output; a convolutional channel is selected based on characteristics of intrinsic parameters of the convolutional channel, for example, all channels are sorted based on characteristics of numerical values such as an average of parameters, an L1 norm and an L2 norm and pruned according to a sorted result and the proportion or the threshold, and a channel that has little effect on the detection precision of the number of axles of the vehicle and of the wheel is removed, thereby simplifying a structure of the network model; and with γ parameters of the BN layers as sparse factors, L1 regularization is performed on the γ parameters, such that a part of the γ parameters approach 0, and a convolution kernel having a γ parameter less than the threshold is pruned, thereby completing the model training.

3. The real-time vehicle overload detection method based on a CNN according to claim 1, wherein with the utilization of coordinate information of a wheel bounding box and a vehicle body bounding box, only the number of wheels in the vehicle body bounding box is calculated during detection on the number of axles of the vehicle; and an automatic online real-time vehicle overload detection is implemented as follows: step 1: acquiring the number of axles and the relative wheelbase of the vehicle; photographing the vehicle with the camera, and uploading a photographed image to the server for real-time detection; and acquiring the number of tires on the single side of the vehicle to obtain the number of axles of the vehicle, calculating the relative wheelbase with a center coordinate of a bounding box, and comparing the number of axles and the relative wheelbase with the national vehicle load standard GB1589-2016 to obtain the theoretical maximum load of the vehicle; and step 2: evaluating a detection effect; evaluating the detection effect to verify the effectiveness of a wheel detection model, wherein object detection evaluation indexes comprise a precision and a recall, defined as follows: $\mathrm{Precision} = \frac{TP}{TP + FP}$, $\mathrm{Recall} = \frac{TP}{TP + FN}$, wherein TP represents a true positive, i.e., a detection reported as a wheel that is actually a wheel; FP represents a false positive, i.e., a detection reported as a wheel that is not actually a wheel; and FN represents a false negative, i.e., an actual wheel that is not detected; introducing an average precision (AP) to evaluate a network performance since individual use of the precision or the recall cannot reflect the network performance accurately, wherein the AP is calculated as follows: $AP = \int_0^1 P(r)\,dr$, wherein P represents the precision, r represents the recall, and P is a function with r as a parameter; a result obtained is an area under a Precision-Recall curve; and a higher AP value indicates a better performance of the trained model for detecting the number of axles and the wheel of the truck.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0038] FIG. 1 illustrates a flow chart of a detection algorithm.

[0039] FIG. 2 illustrates a network structure of YOLO-V3.

[0040] FIG. 3 illustrates a network structure of Darknet-53.

[0041] FIG. 4 illustrates a schematic view and a flow chart of channel pruning, where a is the schematic view of the channel pruning, and b is the flow chart of the channel pruning.

[0042] FIG. 5 illustrates a flow chart of a K-means clustering algorithm.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0043] The specific implementation of the present disclosure will be introduced below according to the above descriptions.

[0044] The offline part includes two steps:

[0045] Step 1: data acquisition

[0046] Acquire data with a camera on site: photograph vehicles in multiple scenarios and from multiple angles, and ensure that the roughly 5,000 vehicle images collected cover every axle count and wheelbase.

[0047] Step 1.1: dataset preparation

[0048] Prepare a VOC-format dataset by labeling a wheel and a vehicle body in each photographed image.
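The conversion from the xml label files to the txt label files can be sketched as follows. This is a minimal illustration, assuming the usual VOC annotation layout (`size`, `object`, `bndbox`) and the darknet label convention of one line per object holding a class index and a box normalized to the image size. The class names `wheel` and `vehicle`, the helper names, and the sample annotation are hypothetical and are not taken from the disclosure.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list: the disclosure labels wheels and vehicle bodies,
# but does not give the exact label strings.
CLASSES = ["wheel", "vehicle"]


def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a VOC pixel box to a normalized (cx, cy, w, h) box."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def voc_xml_to_yolo_lines(xml_text):
    """Return one 'class cx cy w h' text line per labeled object."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue  # skip labels outside the assumed class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        cx, cy, w, h = voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax)
        lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical annotation with a single wheel box in an 800 x 400 image.
SAMPLE_XML = """<annotation>
  <size><width>800</width><height>400</height></size>
  <object><name>wheel</name>
    <bndbox><xmin>100</xmin><ymin>300</ymin><xmax>180</xmax><ymax>380</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    for line in voc_xml_to_yolo_lines(SAMPLE_XML):
        print(line)
```

Each returned line would be written to a txt file with the same base name as the image, which is the role the labels folder plays in the dataset layout.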

[0049] Step 2: construction of a YOLO-V3 network framework and model training

[0050] The YOLO algorithm feeds the image to be detected into a convolutional network that performs classification and bounding box regression directly. The YOLO-V3 network structure (as shown in FIG. 2) includes two parts, one being a backbone network Darknet-53 for feature extraction and the other being a prediction network for classification and detection box regression.

[0051] The computer has 8 GB of memory and an NVIDIA GeForce GTX 1060 graphics card. NVIDIA's parallel computing framework and acceleration library are employed, with CUDA 10 and cuDNN 7.4 installed.

[0052] Darknet-53 has 53 convolutional layers. Because of its residual structure, it can be built deeper than the Darknet-19 network. To some extent, the deeper the network, the better its feature extraction capability. Hence, the Darknet-53 model achieves a higher classification precision than Darknet-19. YOLO-V3 discards the last layer of Darknet-53 and takes the first 52 convolutional layers of Darknet-53 as the backbone network for feature extraction (as shown in FIG. 3).

[0053] To achieve real-time detection while preserving the original precision as far as possible, channel pruning is performed on YOLO-V3 to reduce the convolutional channels of the YOLO network globally. The feature extraction network of the YOLO network is then adjusted by removing convolutional layers that contribute little to the network, thereby obtaining a narrower object detection network.

[0054] The convolution kernel can be regarded as the basic unit of the convolutional layer. When one convolution kernel is pruned, the corresponding output channel is pruned as well. When designing an artificial neural network, researchers do not know in advance how many channels are appropriate, and tend to design more channels than necessary for fear of losing effective features. As a result, the network contains many redundant channels. Once redundant convolution kernels are pruned, they take no part in the computation during forward inference. At the same time, the input channels of the next convolutional layer that correspond to the outputs of the pruned kernels are also pruned, thereby compressing the network greatly. Because only channels that contribute little to the network are pruned, the pruning has little impact on the network as a whole. FIG. 4 illustrates the schematic view and flow chart of the channel pruning.
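The pruning described above can be illustrated with a minimal sketch in plain Python. It assumes that channel importance is measured by the magnitude of the BN scale factor γ, as the claims describe, and that a fixed proportion of channels is removed. Weights are reduced to 2-D lists of shape [output channel][input channel] with the spatial kernel dimensions omitted, and all function names are hypothetical. The `l1_sparsity_step` function shows the L1 subgradient term that sparse training would add to the γ update; a real implementation would also handle the BN bias of removed channels, which this sketch ignores.

```python
def l1_sparsity_step(gammas, grads, lr, lam):
    """One gradient step on gamma with an added L1 subgradient lam * sign(gamma)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [g - lr * (dg + lam * sign(g)) for g, dg in zip(gammas, grads)]


def prune_mask(gammas, prune_ratio):
    """Keep-flags per channel: drop the prune_ratio fraction with smallest |gamma|."""
    n_prune = int(len(gammas) * prune_ratio)
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(gammas))]


def prune_layer_pair(w_cur, gammas, w_next, keep):
    """Remove pruned output channels of the current layer and the matching
    input channels of the next layer.

    w_cur  : [out_channels][in_channels] weights of the current layer
    w_next : [next_out_channels][out_channels] weights of the next layer
    """
    w_cur_pruned = [row for row, k in zip(w_cur, keep) if k]
    gammas_pruned = [g for g, k in zip(gammas, keep) if k]
    w_next_pruned = [[v for v, k in zip(row, keep) if k] for row in w_next]
    return w_cur_pruned, gammas_pruned, w_next_pruned


if __name__ == "__main__":
    gammas = [0.9, 0.01, -0.5, 0.002]          # made-up BN scale factors
    w_cur = [[1, 2], [3, 4], [5, 6], [7, 8]]    # 4 output channels, 2 inputs
    w_next = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    keep = prune_mask(gammas, prune_ratio=0.5)
    print(keep)
    print(prune_layer_pair(w_cur, gammas, w_next, keep))
```

After pruning, the network would normally be fine-tuned and the prune-and-evaluate cycle repeated until the detection precision meets the requirement.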

[0055] The YOLO algorithm uses prior boxes: it provides anchor boxes from which the convolutional network predicts the object bounding boxes. It shrinks the feature map by increasing the stride of the convolution kernel rather than by using a pooling layer. In other object detection algorithms, the prior boxes are set manually based on experience and are not accurate. The YOLO algorithm instead performs clustering analysis on the manually labeled boxes of the training samples with a K-means clustering method, and initializes the anchor boxes with the widths and heights obtained from the clustering.

[0056] FIG. 5 illustrates a flow chart of the K-means clustering algorithm. The K-means clustering algorithm mainly includes: Step 1: Randomly select K points as initial centroids. Step 2: Assign each object to be classified to the cluster of the nearest centroid. Step 3: Calculate the centroid of each cluster after classification and take the calculated centroids as the new centroids of the clustering algorithm. Step 2 and Step 3 are repeated until the centroids no longer change or the number of iterations reaches a preset maximum.

[0057] In the K-means algorithm, the distance between the object to be classified and the centroid is indicated by a Euclidean distance, and specifically calculated as follows:


$$\mathrm{dis}(X, C) = \sqrt{\sum_{i=1}^{n} (X_i - C_i)^2}$$

[0058] where X represents the object to be classified, C represents the centroid, $X_i$ represents the i-th property of the object to be classified, $C_i$ represents the i-th property of the centroid, and n represents the number of properties. The distances from each object to be classified to each centroid are compared one by one to obtain K clusters, K being set manually as required. The evaluation index for the classification result of K-means is the sum of the distances from all classified objects to their centroids. A smaller sum indicates a better classification effect.
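The three steps and the distance formula above can be sketched as follows for labeled boxes represented as (width, height) pairs. This is a minimal illustration rather than the disclosed implementation: the function names, the fixed random seed, the iteration cap and the sample boxes are all assumptions. It uses the Euclidean distance stated above; the published YOLO anchor clustering uses an IoU-based distance instead.

```python
import math
import random


def euclidean(x, c):
    """dis(X, C) = sqrt(sum_i (X_i - C_i)^2)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples) into k groups; return (centroids, total distance)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # Step 1: random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                         # Step 2: assign to nearest centroid
            j = min(range(k), key=lambda idx: euclidean(p, centroids[idx]))
            clusters[j].append(p)
        new_centroids = []
        for j, members in enumerate(clusters):   # Step 3: recompute centroids
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:           # stop when centroids no longer change
            break
        centroids = new_centroids
    # Evaluation index: sum of distances from each point to its nearest centroid.
    cost = sum(min(euclidean(p, c) for c in centroids) for p in points)
    return centroids, cost


if __name__ == "__main__":
    # Made-up (width, height) boxes forming a small and a large group.
    boxes = [(10, 12), (12, 10), (11, 11), (100, 90), (98, 92), (102, 88)]
    anchors, cost = kmeans(boxes, k=2)
    print(sorted(anchors), cost)
```

For anchor initialization, k would be set to the number of prior boxes required and the resulting centroids used as anchor widths and heights.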

[0059] YOLO-V3 outputs predictions at three different scales, and each scale requires three prior boxes. Nine prior boxes of different sizes are therefore obtained by clustering to detect objects of different sizes. The three detections correspond to different receptive ranges. Table 1 illustrates the corresponding relationship between the size of the feature map and the receptive range, where the 32-fold down-sampling has the largest receptive range and is suitable for large objects, the 16-fold down-sampling is suitable for middle-sized objects, and the 8-fold down-sampling has the smallest receptive range and is suitable for small objects.

[0060] YOLO-V3 detects objects of different sizes with multi-scale prediction. By virtue of the multi-scale prediction, feature information extracted by different layers of the network can be combined to improve the detection effect. Shallow layers of the neural network focus more on detail information of the images, while deep layers extract more semantic feature information. The output of the deep layers is fused with the output of the shallow layers, such that the resolution of the feature map is increased and the network can make predictions with more information. Therefore, the object detection effect is effectively improved, and the detection effect for small objects in particular is markedly improved.

TABLE 1 Corresponding relationship between the size of the feature map and the receptive range

Feature map        13 × 13        26 × 26        52 × 52
Receptive range    Large          Middle         Small
Prior box          (116 × 90)     (30 × 61)      (10 × 13)
(VOC dataset)      (156 × 198)    (62 × 45)      (16 × 30)
                   (373 × 326)    (59 × 119)     (33 × 23)
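The relationship in Table 1 can be reproduced with a short sketch. It assumes a 416 × 416 network input, which is not stated in the text but is consistent with the 13 × 13, 26 × 26 and 52 × 52 feature maps under 32-fold, 16-fold and 8-fold down-sampling. The function names are hypothetical, and ordering the prior boxes by area is one simple way to assign three boxes to each scale.

```python
# Prior boxes (width, height) as listed in Table 1.
PRIOR_BOXES = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]
STRIDES = [32, 16, 8]  # down-sampling factors, coarsest scale first


def grid_sizes(input_size, strides=STRIDES):
    """Feature map side length for each down-sampling factor."""
    return [input_size // s for s in strides]


def prior_boxes_per_scale(boxes, strides=STRIDES, per_scale=3):
    """Give the largest boxes to the coarsest scale (largest receptive range)."""
    ordered = sorted(boxes, key=lambda wh: wh[0] * wh[1], reverse=True)
    return {
        s: ordered[i * per_scale:(i + 1) * per_scale]
        for i, s in enumerate(strides)
    }


if __name__ == "__main__":
    print(grid_sizes(416))  # assumed input size of 416
    for stride, boxes in prior_boxes_per_scale(PRIOR_BOXES).items():
        print(stride, boxes)
```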

[0061] The online part includes two steps:

[0062] Step 1: acquisition for the number of axles and a relative wheelbase of the vehicle

[0063] Run the trained model in real time on the images photographed by the camera to obtain the number of tires on a single side of the vehicle and hence the number of axles of the vehicle, calculate the relative wheelbase from the center coordinates of the detection boxes, and compare the number of axles and the relative wheelbase with a national vehicle load standard to obtain a theoretical maximum load of the vehicle.
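A minimal sketch of this step is given below. It assumes boxes in (x1, y1, x2, y2) pixel form and counts only wheels whose centers fall inside the vehicle body box. The disclosure does not say how the wheelbase is made relative, so normalizing the spacing between adjacent wheel centers by the body box width is an assumption of this sketch, as are all function names. The theoretical maximum load is passed in by the caller; no limit values from the standard are reproduced here.

```python
def center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def inside(point, box):
    """True if the point lies within the box (borders included)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def axles_and_relative_wheelbase(body_box, wheel_boxes):
    """Count wheels inside the body box and return spacing between neighbours.

    Spacing is the horizontal distance between adjacent wheel centers divided
    by the body box width (an assumed normalization).
    """
    centers = sorted(
        center(w) for w in wheel_boxes if inside(center(w), body_box)
    )
    body_width = body_box[2] - body_box[0]
    spacing = [(b[0] - a[0]) / body_width for a, b in zip(centers, centers[1:])]
    return len(centers), spacing


def is_overloaded(measured_load, theoretical_max_load):
    """Compare the sensor reading with the limit looked up from the standard."""
    return measured_load > theoretical_max_load


if __name__ == "__main__":
    # Made-up detections: three wheels on the vehicle, one from another vehicle.
    body = (0, 0, 1000, 400)
    wheels = [(100, 300, 200, 400), (400, 300, 500, 400),
              (800, 300, 900, 400), (1200, 300, 1300, 400)]
    print(axles_and_relative_wheelbase(body, wheels))
```

The axle count and spacing would then be matched against the standard to look up the theoretical maximum load, which is compared with the load measured by the piezoelectric sensor.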

[0064] Step 2: evaluation of a detection effect

[0065] Evaluate the detection effect to verify the effectiveness of the wheel detection model. The object detection evaluation indexes include a precision and a recall, defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

[0066] where TP represents a true positive, i.e., a detection reported as a wheel that is actually a wheel; FP represents a false positive, i.e., a detection reported as a wheel that is not actually a wheel; and FN represents a false negative, i.e., an actual wheel that is not detected. The recall and the precision are two conflicting measures: a higher recall is often accompanied by a lower precision.

[0067] An average precision (AP) is introduced to evaluate the network performance, since the precision or the recall used alone cannot reflect the network performance accurately. The AP is calculated as follows:


$$AP = \int_0^1 P(r)\,dr$$

[0068] where P represents the precision, r represents the recall, and P is a function with r as a parameter; the result is the area under the Precision-Recall curve. A higher AP value indicates a better performance of the trained model for detecting the number of axles and the wheels of the truck.
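The evaluation indexes above can be computed as in the following sketch. It assumes each detection has already been matched against the labeled wheels and marked as a true or false positive, and it approximates the integral by summing precision over the recall increments of the confidence-ranked detections, without interpolation. The function names and the sample detections are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(scored_hits, n_ground_truth):
    """Approximate AP = integral of P(r) dr over r in [0, 1].

    scored_hits    : list of (confidence, is_true_positive) per detection
    n_ground_truth : number of labeled wheels, i.e. TP + FN
    """
    ranked = sorted(scored_hits, key=lambda sh: sh[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / n_ground_truth
        ap += precision * (recall - prev_recall)  # rectangle under the P-R curve
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Made-up detections: (confidence, matched a labeled wheel?)
    detections = [(0.9, True), (0.8, False), (0.7, True)]
    print(precision_recall(tp=8, fp=2, fn=2))
    print(average_precision(detections, n_ground_truth=2))
```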