METHOD FOR GLASS DETECTION IN REAL SCENES

Abstract

The invention discloses a method for glass detection in real scenes, which belongs to the field of object detection. The present invention designs a combination method based on large-scale contextual feature integration (LCFI) blocks to effectively integrate context features of different scales. Multiple LCFI combination blocks are then embedded into the glass detection network GDNet to obtain large-scale context features at different levels, thereby realizing reliable and accurate glass detection in various scenarios. By fusing context features of different scales, the glass detection network GDNet can effectively predict the true glass area in different scenes, detect glass of different sizes, and handle glass in different scenes. GDNet adapts well to the varied glass area sizes of the images in the glass detection dataset and achieves the highest accuracy among the compared methods for the same type of object detection.

Claims

1. A method for glass detection in a real scene, wherein the method specifically includes the following steps:

step 1, constructing a glass detection dataset GDD: using cameras and smartphones to capture glass images for constructing the glass detection dataset GDD; the glass detection dataset GDD contains images of different scenes and with glass of different sizes to ensure the diversity of network learning and the applicability of the network; the images are taken in real scenes; the captured images in the glass detection dataset GDD are divided into a training set and a testing set;

step 2, extracting features with a multi-level feature extractor: inputting the images in the training set of the GDD dataset constructed in step 1 into the multi-level feature extractor MFE to harvest features of different levels; the multi-level feature extractor MFE is implemented by using a feature extraction network;

step 3, constructing a large-scale contextual feature integration LCFI block: using cross convolution to construct the LCFI block: achieving large-scale feature extraction through vertical convolution and horizontal convolution with dilation rate r and kernel size k; and using another parallel cross convolution with reverse order to extract complementary large-scale context features to eliminate ambiguity;

step 4, designing an LCFIM module: combining n LCFI blocks with different sizes to form an LCFIM module to obtain contextual features from different scales; specifically: inputting the feature layer extracted by the multi-level feature extractor MFE into n parallel LCFI blocks, and the output of each LCFI block is fused through the attention module; at the same time, an information flow is added between two adjacent LCFI blocks to explore more contextual features, that is, the output of the current LCFI block is used as the input for the next LCFI block, thereby fusing the local features from the previous LCFI block with the context features of the current block, which are further processed by the current LCFI block, expanding the scale of exploration;

step 5, combining to form a glass detection network GDNet and outputting the detection results: embedding the multi-level feature extractor MFE selected in step 2 and the LCFIM module constructed in step 4 into the glass detection network GDNet to obtain different levels of large-scale context features; the glass detection network GDNet includes, in sequence, the multi-level feature extractor MFE, the LCFIM module and the subsequent deconvolution operation; combining the three parts in sequence, and using the fused features as the final glass detection result to realize the glass detection and output the glass detection result;

step 6, verifying the validity of GDNet: comparing GDNet with methods in related fields to verify the effectiveness of GDNet; the methods in the related fields use public codes or settings with original recommended parameters, and they are all trained on the GDD training set and tested on the testing set.

2. The method for glass detection in a real scene according to claim 1, wherein the feature extraction network mentioned in step 2 comprises VGG16 or ResNet50.

3. The method for glass detection in a real scene according to claim 1, wherein the related field methods in step 6 include DSS, PiCANet, ICNet, PSPNet, DenseASPP, BiSeNet, PSANet, DANet, CCNet, RAS, R3Net, CPD, PoolNet, BASNet, EGNet, DSC, BDRAR, and MirrorNet.

Description

DESCRIPTION OF THE DRAWINGS

[0034] FIG. 1 shows some of the images in the dataset according to an embodiment of the present invention.

[0035] FIG. 2 is the network structure of the GDNet of the present invention.

[0036] FIG. 3 shows the results of the comparative experiment.

DETAILED DESCRIPTION

[0037] The specific embodiments of the present invention will be further described below in conjunction with the drawings and technical solutions.

[0038] The glass detection dataset GDD contains 2,827 images taken from indoor scenes and 1,089 images taken from outdoor scenes, 3,916 images in total. All the images were taken in real scenes. For the dataset split, 2,980 images are randomly selected for training, and the remaining 936 images are used for testing.
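The random split described above can be sketched as follows. This is a minimal illustration and not the procedure actually used to build GDD: the image counts come from the description, while the function name `split_dataset`, the integer image indices, and the fixed seed are assumptions made for the example.

```python
import random

N_INDOOR, N_OUTDOOR = 2827, 1089   # counts stated in the description
N_TRAIN = 2980                     # training-set size stated in the description


def split_dataset(n_images, n_train, seed=0):
    """Return (train_ids, test_ids) as sorted lists of image indices.

    The seed is an assumption; it only makes the sketch reproducible.
    """
    ids = list(range(n_images))
    random.Random(seed).shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


train_ids, test_ids = split_dataset(N_INDOOR + N_OUTDOOR, N_TRAIN)
print(len(train_ids), len(test_ids))
```

Every image lands in exactly one of the two sets, so the test set size follows from the totals rather than being chosen separately.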

[0039] The glass detection network GDNet is implemented in the PyTorch framework. For training, each input image is resized to a resolution of 416×416 and augmented by random horizontal flipping. The parameters of the multi-level feature extractor MFE are initialized from the pre-trained ResNeXt101 network, and the other parameters are initialized randomly.
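As an illustration of the flipping augmentation, the sketch below mirrors an image and its glass mask together, since a pixel-wise label only stays valid if it is flipped the same way as the image. The nested-list image representation, the helper names `hflip` and `random_hflip`, and the flip probability `p` are assumptions for the example; the description does not state them.

```python
import random


def hflip(rows):
    """Mirror a 2-D grid (a list of rows) left to right."""
    return [row[::-1] for row in rows]


def random_hflip(image, mask, rng, p=0.5):
    """Flip image and mask together with probability p (p is assumed)."""
    if rng.random() < p:
        return hflip(image), hflip(mask)
    return image, mask


image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 1, 1], [0, 0, 1]]

# p=1.0 forces the flip so that the effect is visible in this toy example.
flipped_image, flipped_mask = random_hflip(image, mask, random.Random(0), p=1.0)
# p=0.0 never flips, so the inputs are returned unchanged.
same_image, same_mask = random_hflip(image, mask, random.Random(0), p=0.0)
```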

[0040] Each LCFIM module is composed of 4 LCFI blocks; the kernel sizes of the LCFI blocks are 3, 5, 7, and 9, respectively, and the dilation rates are 1, 2, 3, and 4, respectively. The 4 LCFI blocks are connected to form an LCFIM module, and the feature layer extracted by the MFE is fed into the LCFIM module to extract rich context features and perform down-sampling. At the same time, an information flow is added between two adjacent LCFI blocks to explore more contextual features, that is, the output of the current LCFI block is used as the input of the next LCFI block. The selected multi-level feature extractor MFE and the LCFIM modules are embedded in the glass detection network GDNet; the glass detection network GDNet includes the multi-level feature extractor MFE, the LCFIM modules, and the subsequent deconvolution operations, and the output of the glass detection network GDNet is used as the result of glass detection.
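To make the scale of each LCFI block concrete, the sketch below computes the span covered by a one-dimensional dilated kernel, using the standard relation r·(k − 1) + 1 for kernel size k and dilation rate r. The kernel sizes and dilation rates are taken from the description; the padding helper assumes a stride of 1 and is included only for illustration, since the description does not state the padding or stride used.

```python
KERNEL_SIZES = (3, 5, 7, 9)     # from the description
DILATION_RATES = (1, 2, 3, 4)   # from the description


def effective_extent(k, r):
    """Span in pixels covered by a 1-D kernel of size k with dilation r."""
    return r * (k - 1) + 1


def same_padding(k, r):
    """Padding per side that preserves the input length (stride 1, odd k assumed)."""
    return r * (k - 1) // 2


extents = [effective_extent(k, r) for k, r in zip(KERNEL_SIZES, DILATION_RATES)]
paddings = [same_padding(k, r) for k, r in zip(KERNEL_SIZES, DILATION_RATES)]
print(extents)   # [3, 9, 19, 33]
print(paddings)  # [1, 4, 9, 16]
```

Under these assumptions the four blocks cover spans of 3, 9, 19, and 33 pixels along each axis, which is how one module gathers context at several scales from the same feature layer.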

[0041] The entire network is optimized by stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 5×10⁻⁴. The learning rate is adjusted through the poly strategy, and the base learning rate is 0.001. The batch size is set to 6, and the balancing parameters wh, wl, and wf are set to 1 based on experience. It takes about 22 hours for the network to converge on an NVIDIA GTX 1080Ti graphics card. For testing, the image is resized to a resolution of 416×416 for network inference. No post-processing such as a Conditional Random Field (CRF) is applied to the final glass detection results.
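The poly strategy is commonly written as lr = base_lr · (1 − iter / max_iter)^power. The sketch below follows that common form. Only the base learning rate of 0.001 comes from the description; the exponent of 0.9 and the iteration count are assumptions, because the description does not state them.

```python
BASE_LR = 0.001   # from the description
POWER = 0.9       # assumed; the description does not state the exponent


def poly_lr(iteration, max_iterations, base_lr=BASE_LR, power=POWER):
    """Learning rate at a given iteration under the poly decay rule."""
    progress = min(iteration, max_iterations) / max_iterations
    return base_lr * (1.0 - progress) ** power


MAX_ITERATIONS = 1000   # hypothetical value, for illustration only
schedule = [poly_lr(i, MAX_ITERATIONS) for i in (0, 500, 1000)]
print(schedule)
```

The rate starts at the base value and decays smoothly to zero at the final iteration.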

[0042] FIG. 1 shows some of the images in the glass dataset proposed by the present invention. The GDD dataset contains about 4,000 images. It includes images of glass in daily life scenes, images of glass with complex object backgrounds, and images containing multiple glass regions. The training set and the testing set follow the same distribution, which ensures the completeness and structural accuracy of the dataset.

[0043] FIG. 2 shows the network structure of GDNet. First, the image is fed into a multi-level feature extractor (MFE) to harvest features of different levels. Then, the four feature layers extracted by the MFE are fed into four LCFIM modules to learn a wide range of context features. The outputs of the last three LCFIM modules are fused together to generate high-level large-scale context features, which are used to continuously guide the low-level large-scale context features extracted by the first LCFIM module to focus more on the glass area. Finally, the high-level large-scale context features and the low-level large-scale context features are fused, and the fused features are used to generate the final glass detection results.

[0044] FIG. 3 shows the results of the comparative experiment. Deep networks for semantic/instance segmentation and publicly available saliency detection codes are trained on the GDD dataset. The compared networks include DSS, PiCANet, ICNet, PSPNet, BiSeNet, DANet, CCNet, RAS, R3Net, CPD, PoolNet, BASNet, EGNet, DSC, BDRAR, and MirrorNet. The training parameters of these networks are adjusted during training to obtain their best glass detection results. Finally, the testing set of the GDD dataset is used to verify and compare the final glass detection results. In the experimental comparison, the glass detection network GDNet of the present invention achieves the best result on every reported metric, which indicates that the LCFIM module in GDNet integrates multi-scale context features well and verifies the effectiveness of the LCFIM module. GDNet also shows good versatility and accuracy. The verification results are shown in the table below.

Method      IoU↑    PA↑     Fβ↑     MAE↓    BER↓
ICNet       69.59   0.836   0.821   0.164   16.10
PSPNet      84.06   0.916   0.906   0.084    8.79
BiSeNet     80.00   0.894   0.883   0.106   11.04
PSANet      83.52   0.918   0.909   0.082    9.09
DANet       84.15   0.911   0.901   0.089    8.96
CCNet       84.29   0.915   0.904   0.085    8.63
DSS         80.24   0.898   0.890   0.123    9.73
PiCANet     83.73   0.916   0.909   0.093    8.26
RAS         80.96   0.902   0.895   0.106    9.48
CPD         82.52   0.907   0.903   0.095    8.87
PoolNet     81.92   0.907   0.900   0.100    8.96
BASNet      82.88   0.907   0.896   0.094    8.70
EGNet       85.04   0.920   0.916   0.083    7.43
DSC         83.56   0.914   0.911   0.090    7.97
BDRAR       80.01   0.902   0.908   0.098    9.87
MirrorNet   85.07   0.918   0.903   0.083    7.67
GDNet       87.63   0.939   0.937   0.063    5.62
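For readers unfamiliar with the column headings, the sketch below computes four of the reported metrics from a pair of binary masks, using their commonly used definitions: intersection over union (IoU, in percent), pixel accuracy (PA), mean absolute error (MAE), and balanced error rate (BER, in percent). These definitions are assumptions, since the description does not spell them out, and the function names are hypothetical. Fβ is omitted because its β value is not given. The sketch also assumes that both glass and non-glass pixels are present in the ground truth.

```python
def confusion_counts(pred, gt):
    """Count TP, TN, FP, FN over two equally sized binary masks (0 or 1)."""
    tp = tn = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif not p and not g:
                tn += 1
            elif p:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def glass_metrics(pred, gt):
    """Return IoU (%), PA, MAE and BER (%) for binary prediction and ground truth."""
    tp, tn, fp, fn = confusion_counts(pred, gt)
    total = tp + tn + fp + fn
    return {
        "IoU": 100.0 * tp / (tp + fp + fn),
        "PA": (tp + tn) / total,
        "MAE": (fp + fn) / total,  # for 0/1 masks, |pred - gt| is 1 exactly on errors
        "BER": 100.0 * (1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp))),
    }


pred = [[1, 1, 0, 0], [1, 0, 0, 0]]   # toy prediction
gt = [[1, 1, 1, 0], [0, 0, 0, 0]]     # toy ground truth
metrics = glass_metrics(pred, gt)
print(metrics)
```

For hard binary masks, MAE reduces to 1 − PA; the two can differ when the prediction is a continuous map rather than a thresholded mask.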

CPC classification

G06V10/267; G06V10/768; G06F18/214; G06V10/44; G06V20/00; G06V10/776; G06V10/7715; G06V10/806; G06V10/7747

International classification

G06V10/774; G06V10/44; G06V10/70; G06V10/77; G06V10/776; G06V10/80
