METHOD FOR FEATURE DETECTION OF COMPLEX DEFECTS BASED ON MULTIMODAL DATA
20230316736 · 2023-10-05
Inventors
CPC classification
G06V10/774
PHYSICS
International classification
G06V10/80
PHYSICS
G06V10/774
PHYSICS
Abstract
The present disclosure discloses a method for feature detection of complex defects based on multimodal data, including feature extraction of multimodal data, multimodal feature cross-guided learning, multimodal feature fusion, and defect classification and regression. Feature extraction networks for multimodal two-dimensional data are constructed first, and a defect data set is sent to the networks for training; during training, cross-guided learning is implemented by using a multimodal feature cross-guidance network; then feature fusion is performed by using a weight-adaptive method; and finally a defect detection task is implemented by using a classification subnetwork and a regression subnetwork. In the present disclosure, fusion of the multimodal data in a process of feature detection of complex defects can be implemented efficiently, the capability of detecting complex defects in an industrial environment is effectively improved, and production efficiency in an industrial manufacturing process is ensured.
Claims
1. A method for feature detection of complex defects based on multimodal data, specifically comprising the following steps: step S1: constructing feature extraction networks; step S2: inputting multimodal training data into the feature extraction networks for parallel learning of multimodal features; step S3: constructing a multimodal feature cross-guidance network, and establishing a local connection between parallel multimodal data extraction networks to form a multimodal feature cross-guidance mechanism; step S4: performing multimodal adaptive fusion by using weights; and step S5: implementing defect detection by using a classification subnetwork and a regression subnetwork.
2. The method for feature detection of complex defects based on multimodal data according to claim 1, wherein step S1 specifically comprises: constructing a plurality of parallel feature extraction networks by using a convolutional neural network, which correspond to extraction of data of multiple modalities respectively, wherein each of the parallel feature extraction networks comprises six layers, which comprise different convolutional layers, pooling layers, dense block structures, and dilated bottleneck layer structures.
3. The method for feature detection of complex defects based on multimodal data according to claim 1, wherein step S2 specifically comprises: dividing an industrial defect multimodal data set into a training set and a test set, and inputting the training set into the parallel feature extraction networks first for feature extraction.
4. The method for feature detection of complex defects based on multimodal data according to claim 2, wherein step S3 specifically comprises: establishing a local connection between the feature extraction networks in a first stage, a third stage, and a fifth stage by using a 1×1 convolutional layer, merging features of a same stage first, and finally superimposing the merged features on each parallel feature extraction network as a whole through the 1×1 convolutional layer, to implement cross guidance of multimodal features, and establish a feature flow mechanism of different modal data in feature extraction.
5. The method for feature detection of complex defects based on multimodal data according to claim 1, wherein step S4 specifically comprises: establishing interdependence between feature channels of each parallel feature extraction network, automatically acquiring an importance degree of each feature channel by using a learning method, and then promoting useful features and suppressing features of little use in a current task based on the importance degree.
6. The method for feature detection of complex defects based on multimodal data according to claim 1, wherein step S5 specifically comprises: constructing a classification subnetwork and a regression subnetwork by using two fully convolutional networks attached to a feature pyramid network structure, and sending fused feature information into the two subnetworks for defect classification and location.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] In order to describe the technical solutions in the embodiments of the present disclosure more clearly, the accompanying drawings required for describing the embodiments are briefly described below. Obviously, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art can further derive other accompanying drawings from these accompanying drawings without creative efforts.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0023] The technical solutions of the embodiments of the present disclosure are clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. Apparently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
[0024] Referring to
[0025] First, a plurality of feature extraction networks based on a convolutional neural network are established to implement feature extraction of multimodal two-dimensional data. Specifically, the structure of the feature extraction network based on the convolutional neural network is shown in
[0026] Specifically, the network structure is divided into six layers, including different dense connection structures, convolutional layers, and bottleneck layer structures.
[0027] The first layer includes a convolutional layer with a 7×7 convolution kernel.
[0028] The second layer includes a 3×3 maximum pooling layer and a dense connection structure, and the dense connection structure includes alternating 1×1 convolutional layers and 3×3 convolutional layers.
[0029] The third layer and the fourth layer comprise two dense connection structures that differ in configuration, and each of the dense connection structures includes alternating 1×1 convolutional layers and 3×3 convolutional layers.
[0030] The fifth layer and the sixth layer have the same structure, including two dilated bottleneck layer structures and a dilated bottleneck layer structure with a 1×1 convolutional layer in parallel. The specific structure is shown in
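As a purely illustrative sketch (not part of the claimed disclosure), the downsampling behavior of such a six-layer backbone can be traced with the standard convolution output-size formula. The kernel sizes below come from the description above; the strides, padding, and input resolution are assumptions, since the text does not specify them.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Standard output-size formula for a convolution or pooling layer."""
    effective = dilation * (kernel - 1) + 1  # kernel size after dilation
    return (size + 2 * padding - effective) // stride + 1

# Trace an assumed 224x224 input through an assumed stride schedule:
size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)  # layer 1: 7x7 conv
size = conv_out(size, kernel=3, stride=2, padding=1)  # layer 2: 3x3 max pool
# Dense blocks (alternating 1x1 and padded 3x3 convs) preserve spatial size;
# assume a stride-2 transition after each of layers 3 and 4:
size = conv_out(size, kernel=3, stride=2, padding=1)  # after layer 3
size = conv_out(size, kernel=3, stride=2, padding=1)  # after layer 4
# Dilated bottleneck layers (assumed dilation=2, padding=2) preserve size:
size = conv_out(size, kernel=3, stride=1, padding=2, dilation=2)  # layers 5-6
print(size)  # spatial side length at the deepest stage
```

With these assumed strides, the backbone reduces a 224×224 input to a 14×14 feature map, while the dilated bottleneck layers enlarge the receptive field without further downsampling.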
[0031] A corresponding multimodal industrial defect data set is constructed and divided into a training set and a test set.
[0032] Then, training is performed based on the foregoing feature extraction networks and the data set.
[0033] In addition, cross guidance of multimodal features is performed based on the foregoing established feature extraction networks based on the convolutional neural network. Specifically, this multimodal feature cross-guidance structure is shown in
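The cross-guidance step described above and in claim 4 can be sketched in miniature as follows. A 1×1 convolution is simply a per-pixel linear map over channels, so merging same-stage features by channel concatenation, projecting them through a 1×1 convolution, and superimposing the result on each parallel branch can be written in a few lines. The nested-list feature layout and the projection weights are illustrative assumptions.

```python
def conv1x1(features, weights):
    """A 1x1 convolution: a per-pixel linear map over channels.
    features: nested lists indexed [channel][row][col]; weights: [C_out][C_in]."""
    c_in = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [[[sum(wo[c] * features[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for wo in weights]

def cross_guide(branch_feats, proj):
    """Merge same-stage features of the parallel branches (channel concat),
    project through a 1x1 conv, and superimpose the shared guidance signal
    on each branch, letting information flow across modalities."""
    merged = [ch for f in branch_feats for ch in f]   # concatenate channels
    guidance = conv1x1(merged, proj)                  # 1x1 projection
    return [[[[f[c][i][j] + guidance[c][i][j]         # element-wise add-back
               for j in range(len(f[c][i]))]
              for i in range(len(f[c]))]
             for c in range(len(f))]
            for f in branch_feats]

# Two single-channel 1x1 branches; proj maps 2 input channels to 1:
branches = [[[[1.0]]], [[[2.0]]]]
out = cross_guide(branches, proj=[[0.5, 0.5]])
print(out)  # each branch shifted by the shared guidance value 1.5
```

Here the add-back assumes the projection restores the per-branch channel count; in a full implementation the 1×1 convolution weights would be learned jointly with the backbones.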
[0034] Then, multimodal feature fusion is performed. To solve the imbalance problem of multimodal feature fusion, learnable weights are introduced for multimodal features first. Specifically, as shown in
[0035] A ReLU function is used for learning the weights, to ensure that each ω_i is greater than or equal to 0; ε = 0.0001 is a small value for avoiding numerical instability; I_i represents the multimodal feature information to be fused; and O represents the fused global feature information, that is, O = Σ_i (ω_i / (ε + Σ_j ω_j)) · I_i. Accordingly, the value of each normalized weight also falls between 0 and 1.
[0036] Furthermore, the feature information obtained after feature fusion is sent to a classification subnetwork and a regression subnetwork to predict defect target bounding boxes. The classification subnetwork predicts a probability of an object occurring at each spatial location for each bounding box and object category. This subnetwork is implemented by connecting a small fully convolutional network to each feature pyramid network level, and its parameters are shared at all levels. The regression subnetwork is parallel to the classification subnetwork, and another fully convolutional network is attached to each pyramid network level, so that the predicted offsets regress each bounding box toward the ground truth, where the ground truth represents manually annotated defect detection data.
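The offset regression described above can be sketched with one common box parameterization (center, width, height offsets, as used in RetinaNet-style detectors); the text does not fix a specific encoding, so the transform below is an assumption for illustration.

```python
import math

def decode_box(anchor, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor box
    (cx, cy, w, h): the center moves proportionally to the anchor size,
    and width/height are scaled exponentially."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def encode_box(anchor, gt):
    """Inverse transform: the regression targets that map the anchor
    exactly onto the manually annotated ground-truth box."""
    cx, cy, w, h = anchor
    gx, gy, gw, gh = gt
    return ((gx - cx) / w, (gy - cy) / h,
            math.log(gw / w), math.log(gh / h))

# Decoding the encoded targets recovers the ground-truth box:
anchor = (10.0, 10.0, 4.0, 4.0)
gt = (12.0, 11.0, 8.0, 2.0)
print(decode_box(anchor, encode_box(anchor, gt)))
```

During training, the regression subnetwork is supervised so its predicted offsets approach `encode_box(anchor, gt)`; at inference, `decode_box` turns its outputs into defect locations.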
[0037] Various aspects of the present disclosure are described with reference to the accompanying drawings in the present disclosure, and the accompanying drawings show many illustrated embodiments. However, the embodiments of the present disclosure are not necessarily defined to include all aspects of the present disclosure. It should be understood that the various concepts and embodiments described above and the concepts and implementations described in more detail below may be implemented in any of many ways, because the disclosed concepts and embodiments of the present disclosure are not limited to any implementation. In addition, some disclosed aspects of the present disclosure may be used alone or in any appropriate combination with other disclosed aspects of the present disclosure.
[0038] The preferred embodiments of the present disclosure disclosed above are only used to help illustrate the present disclosure. The preferred embodiments neither describe all the details in detail, nor limit specific implementations of the present disclosure. Obviously, many modifications and changes may be made based on the content of the present specification. In the present specification, these embodiments are selected and specifically described to better explain the principle and practical application of the present disclosure, so that a person skilled in the art can well understand and use the present disclosure. The present disclosure is only limited by the claims and a full scope and equivalents thereof.