METHOD FOR IDENTIFICATION AND RECOGNITION OF AIRCRAFT TAKE-OFF AND LANDING RUNWAY BASED ON PSPNET NETWORK
20220315243 · 2022-10-06
CPC classification: G06V10/778; G06F18/214; G06V10/774; G06V10/26; G06F18/2113
Abstract
The present disclosure relates to a method for identification and recognition of an aircraft take-off and landing runway based on a PSPNet network. The method adopts a residual network ResNet and a lightweight deep neural network MobileNetV2 as the two backbone feature-extraction networks to enhance feature extraction; adjusts the original four-layer pyramid pooling module into five layers, sized 9×9, 6×6, 3×3, 2×2, and 1×1 respectively; uses a finite set of self-made images of aircraft take-off and landing terrain for training; and labels and extracts the aircraft take-off and landing runway in the aircraft take-off and landing terrain image. The method effectively combines ResNet and MobileNetV2, and improves the detection accuracy of the aircraft take-off and landing runway in comparison with the prior art.
Claims
1. A method for identification and recognition of an aircraft take-off and landing runway based on a PSPNet network, comprising: building a PSPNet network, wherein according to an image processing flow, the PSPNet network includes the following parts in sequence: two backbone feature-extraction networks that are respectively used for extracting feature maps; two enhanced feature-extraction modules that are respectively used for further feature extraction of the feature maps extracted by the backbone feature-extraction networks; an up-sampling module that is used for restoring the resolution of an original image; a size unification module that is used for unifying the sizes of the enhanced features extracted by the two enhanced feature-extraction modules; a data serial connection module that is used for serially connecting the two enhanced features processed by the size unification module; and a convolution output module that is used for convolution and output of the data processed by the data serial connection module; training the PSPNet network, which has the following training processes: building a training data set, wherein N optical remote sensing images are collected, some of the images which show terrain specific to aircraft take-off and landing are selected for amplification, interception, and data set labeling, namely labeling the position and the area size of the aircraft take-off and landing runway, wherein all labeled images are used as training samples which then constitute a training data set; initializing parameters in the PSPNet network; inputting all the training samples in the training set into the PSPNet network to train the PSPNet network; and calculating a loss function, namely calculating a cross entropy between the prediction result obtained after the training samples are input into the PSPNet network and the training sample labels, wherein the cross entropy is calculated between all pixel points in the prediction image that enclose the area of the aircraft take-off and landing runway and all pixel points in the training samples that label the aircraft take-off and landing runway; through repeated iterative training and automatic adjustment of the learning rate, obtaining an optimal network model when the loss function value stops dropping; and detecting an image to be detected, inputting the image to be detected into the trained PSPNet network for prediction, filling the predicted pixel points in red, and outputting the prediction result, wherein the area surrounded by all pixel points filled in red is the runway area where the aircraft takes off and lands.
2. The method for identification and recognition of the aircraft take-off and landing runway based on the PSPNet network according to claim 1, wherein a residual network ResNet and a lightweight deep neural network MobileNetV2 are adopted as the two backbone feature-extraction networks, and feature extraction is performed on the input image by the residual network ResNet and the lightweight deep neural network MobileNetV2 respectively to obtain two feature maps.
3. The method for identification and recognition of the aircraft take-off and landing runway based on the PSPNet network according to claim 2, wherein the two enhanced feature-extraction modules perform further feature extraction on the two feature maps, specifically including that the feature map obtained by the residual network ResNet is divided into regions sized 2×2 and 1×1 for processing, and the feature map obtained by the lightweight deep neural network MobileNetV2 is divided into regions sized 9×9, 6×6, and 3×3 for processing.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0035] The present disclosure will be further described below with reference to the accompanying figures.
[0036] A method for identification and recognition of the aircraft take-off and landing runway based on the PSPNet network includes the following steps:
[0037] Step 100: building a PSPNet network, as shown in the accompanying drawings.
[0038] Two backbone feature-extraction networks that are respectively used for extracting feature maps, wherein a residual network ResNet and a lightweight deep neural network MobileNetV2 are adopted as the two backbone feature-extraction networks, and feature extraction is performed on the input image by each of them respectively to obtain two feature maps.
[0039] Two enhanced feature-extraction modules that are respectively used for further feature extraction of the feature maps extracted by the backbone feature-extraction networks, wherein the two enhanced feature-extraction modules perform further feature extraction on the two feature maps, specifically including that the feature map obtained by the residual network ResNet is divided into regions sized 2×2 and 1×1 for processing, and the feature map obtained by the lightweight deep neural network MobileNetV2 is divided into regions sized 9×9, 6×6, and 3×3 for processing.
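By way of illustration, the dual-backbone feature extraction could be sketched as follows, assuming PyTorch and torchvision; ResNet-50 stands in for "ResNet" (the text does not specify the depth), both backbones are truncated before their classifier heads, and the 473×473 input size is an assumption:

import torch
import torchvision.models as models

# Minimal sketch: two backbones, each producing its own feature map.
# ResNet-50 and the truncation points below are illustrative assumptions.
resnet = torch.nn.Sequential(*list(models.resnet50(weights=None).children())[:-2])
mobilenet = models.mobilenet_v2(weights=None).features

x = torch.randn(1, 3, 473, 473)   # dummy input image (473x473 is assumed)
feat_resnet = resnet(x)           # 1 x 2048 x 15 x 15 feature map
feat_mobilenet = mobilenet(x)     # 1 x 1280 x 15 x 15 feature map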
[0040] Specifically, assuming that the feature layer obtained by a backbone feature-extraction network is 90×90×480, the average pooling step size (stride) and the convolution kernel size (kernel_size) for each pooling region are both set to 90 divided by the region size: 10 for the 9×9 region, 15 for the 6×6 region, 30 for the 3×3 region, 45 for the 2×2 region, and 90 for the 1×1 region. At the final convolution layer, the feature maps extracted by the two backbone networks replace the combination used in the original PSPNet network, namely the feature map extracted by a single backbone network together with the up-sampled output of its pyramid pooling module, and serve as the input of the convolution layer of the PSPNet network.
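For illustration, the pooling arithmetic above can be realized as in the following minimal PyTorch sketch; the 1×1 channel-reduction convolution and the channel counts follow common PSPNet practice and are assumptions, not taken from the text:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingBranch(nn.Module):
    def __init__(self, in_channels=480, levels=(9, 6, 3), feat_size=90):
        super().__init__()
        self.convs = nn.ModuleList()
        for level in levels:
            k = feat_size // level  # stride = kernel_size = 90 / level, as above
            self.convs.append(nn.Sequential(
                nn.AvgPool2d(kernel_size=k, stride=k),
                nn.Conv2d(in_channels, in_channels // len(levels), 1),
            ))

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at each level, then up-sample back to the input resolution.
        outs = [F.interpolate(conv(x), size=(h, w), mode='bilinear',
                              align_corners=False) for conv in self.convs]
        return torch.cat([x] + outs, dim=1)

# The 9x9/6x6/3x3 branch processes the MobileNetV2 feature map and the
# 2x2/1x1 branch processes the ResNet feature map, as described above.
mobilenet_branch = PyramidPoolingBranch(levels=(9, 6, 3))
resnet_branch = PyramidPoolingBranch(levels=(2, 1))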
[0041] An up-sampling module that is used for restoring the resolution of an original image.
[0042] A size unification module that is used for unifying the sizes of the enhanced features extracted by the two enhanced feature-extraction modules.
[0043] A data serial connection module that is used for serially connecting two enhanced features processed by the size unification module.
[0044] A convolution output module that is used for convolution and output of the data processed by the data serial connection module.
[0045] Step 200: training the PSPNet network, which has the following training processes:
[0046] Step 210: building a training data set,
[0047] wherein N optical remote sensing images are collected, some of the images which show terrain specific to aircraft take-off and landing are selected for amplification, interception, and data set labeling with the labelme tool, namely labeling the position and the area size of the aircraft take-off and landing runway, as shown in the accompanying drawings.
[0048] The N optical remote sensing images adopt DIOR, NUSWIDE, DOTA, RSOD, NWPU VHR-10, SIRI-WHU, and other optical remote sensing data sets as basic data sets, covering various terrain areas such as airport runways, buildings, grasslands, fields, mountains, sandy areas, muddy areas, cement areas, jungles, sea, highways, and roads, as shown in the accompanying drawings.
[0049] In order to prevent image distortion during image zooming, which would affect the accuracy and precision of the network, preprocessing of the images is necessary, including image edge padding to achieve the 1:1 aspect ratio required by the network input. At the same time, geometric adjustment is performed on the image sizes to meet the optimal network input size. The image preprocessing flow is shown in the accompanying drawings.
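For illustration, a minimal sketch of this preprocessing, assuming Python with Pillow; the gray fill color and the 473×473 target size are illustrative assumptions (473×473 is a size commonly used with PSPNet, not stated in the text):

from PIL import Image

def preprocess(image: Image.Image, target_size: int = 473) -> Image.Image:
    w, h = image.size
    side = max(w, h)
    # Edge padding: paste the original image onto a square canvas,
    # achieving the 1:1 aspect ratio without distorting the content.
    canvas = Image.new('RGB', (side, side), (128, 128, 128))
    canvas.paste(image, ((side - w) // 2, (side - h) // 2))
    # Geometric adjustment to the network's expected input size.
    return canvas.resize((target_size, target_size), Image.BILINEAR)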
[0050] Step 220: initializing parameters in the PSPNet network;
[0051] Step 230: inputting all the training samples in the training set into the PSPNet network to train the PSPNet network;
[0052] Step 240: calculating a loss function, namely calculating a cross entropy between the prediction result obtained after the training samples are input into the PSPNet network and the training sample labels, i.e., the cross entropy between all pixel points in the prediction image that enclose the area of the aircraft take-off and landing runway and all pixel points in the training samples that label the aircraft take-off and landing runway; through repeated iterative training and automatic adjustment of the learning rate, obtaining an optimal network model when the loss function value stops dropping;
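For illustration, a minimal PyTorch sketch of this training step; the two-class setup (0 = background, 1 = runway), the Adam optimizer, and the ReduceLROnPlateau scheduler are illustrative assumptions consistent with the "automatic adjustment of the learning rate" described above:

import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=1e-3):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min')
    for epoch in range(epochs):
        epoch_loss = 0.0
        for images, labels in loader:  # images: N x 3 x H x W, labels: N x H x W
            logits = model(images)     # N x 2 x H x W prediction map
            loss = criterion(logits, labels)  # cross entropy over all pixels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # Reduce the learning rate automatically when the loss stops dropping.
        scheduler.step(epoch_loss)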
[0053] Step 300: detecting the image to be detected, inputting the image to be detected into the trained PSPNet network for prediction, filling the predicted pixel points in red, and outputting the prediction result, wherein the area surrounded by all pixel points filled in red is the runway area where the aircraft takes off and lands.
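For illustration, a minimal PyTorch/NumPy sketch of this detection step, assuming the two-class setup above (class index 1 = runway is an assumption):

import numpy as np
import torch

def render_prediction(model, image_tensor, image_rgb: np.ndarray) -> np.ndarray:
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))      # 1 x 2 x H x W
    mask = logits.argmax(dim=1)[0].cpu().numpy() == 1  # H x W boolean runway mask
    out = image_rgb.copy()
    out[mask] = (255, 0, 0)  # fill predicted runway pixels in red
    return out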
[0054] In order to effectively utilize the computing resources of mobile and embedded devices, and to improve the speed of real-time processing of high-resolution images, MobileNet is introduced in the present disclosure. MobileNetV2 is selected as one backbone feature-extraction network in PSPNet because it has relatively few parameters and a fast computing speed, reducing the consumption of computing resources by 8-9 times compared with an ordinary FCN. However, the lightweight MobileNetV2 will inevitably reduce the segmentation accuracy of PSPNet slightly. Therefore, ResNet, which performs well in network classification and has high accuracy, is retained as the other backbone feature-extraction network in PSPNet, thus improving the segmentation accuracy in the PSP module. ResNet and MobileNetV2 work together to improve the operation speed of PSPNet on one hand, and to preserve the segmentation accuracy as much as possible on the other, meeting the low-consumption, real-time, and high-precision requirements of segmentation tasks.
EXPERIMENTAL VERIFICATION
[0055] The present disclosure adopts Mean Intersection over Union (MIoU), Pixel Accuracy (PA) and Recall as evaluation indicators to measure the performance of the semantic segmentation network. First of all, we calculate MIoU, PA and Recall through the confusion matrix as shown in Table 1.
TABLE 1
Confusion Matrix

                                Predicted Value
                        Positive                Negative
True    Positive        True Positive (TP)      False Negative (FN)
Value   Negative        False Positive (FP)     True Negative (TN)
(1) Mean Intersection Over Union (MIoU)
[0056] MIoU is a standard measure of the semantic segmentation network. In order to calculate MIoU, it is necessary to calculate the intersection over union (IoU) of each object class in the semantic segmentation, that is, the ratio of the intersection to the union of the ground truth and the prediction for each class. In terms of the confusion matrix, the IoU formula is as follows: IoU = TP / (TP + FP + FN).
[0057] MIoU refers to the average of the IoUs of all classes in the semantic segmentation network. Assuming that there are k+1 object classes (0, 1, ..., k) in the data set, where class 0 usually represents the background, the MIoU formula is as follows: MIoU = (1/(k+1)) · Σ_{i=0}^{k} IoU_i.
(2) Pixel Accuracy (PA)
[0058] PA is an evaluation metric of the semantic segmentation network that refers to the percentage of correctly labeled pixels among all pixels. The PA formula is as follows: PA = (TP + TN) / (TP + TN + FP + FN).
(3) Recall
[0059] Recall is an evaluation metric of the semantic segmentation network that refers to the proportion of samples whose predicted value and ground truth value are both 1 among all samples whose ground truth value is 1. The Recall formula is as follows: Recall = TP / (TP + FN).
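For illustration, the three metrics can be computed from a confusion matrix as in the following minimal NumPy sketch; the (k+1)×(k+1) matrix layout, with rows as ground truth classes and columns as predicted classes, follows Table 1:

import numpy as np

def miou(C: np.ndarray) -> float:
    tp = np.diag(C).astype(float)
    fp = C.sum(axis=0) - tp              # predicted as class i but not class i
    fn = C.sum(axis=1) - tp              # class i but predicted otherwise
    iou = tp / (tp + fp + fn)            # per-class IoU = TP / (TP + FP + FN)
    return float(iou.mean())             # average over all k+1 classes

def pixel_accuracy(C: np.ndarray) -> float:
    # Correctly labeled pixels (diagonal) over all pixels.
    return float(np.diag(C).sum() / C.sum())

def recall(C: np.ndarray, cls: int = 1) -> float:
    # TP / (TP + FN) for the given class (class 1 = runway is an assumption).
    return float(C[cls, cls] / C[cls].sum())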
[0060] According to the present disclosure, a self-made test set is adopted to test the trained PSPNet semantic segmentation network, and the prediction results are shown in the accompanying drawings.
[0061] Finally, it is noted that the above embodiments are only for the purpose of illustrating the technical scheme of the present disclosure without limiting it. Although a detailed specification has been given for the present disclosure with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical schemes of the present disclosure can be modified or equivalently replaced without departing from the purpose and scope thereof, and such modifications and replacements should be included in the scope of the claims of the present disclosure.