HIGH-PRECISION POINT CLOUD COMPLETION METHOD BASED ON DEEP LEARNING AND DEVICE THEREOF
20230206603 · 2023-06-29
Assignee
Inventors
- Dengyin Zhang (Nanjing, CN)
- Yingying FENG (Nanjing, CN)
- Li HUANG (Nanjing, CN)
- Weidan Yan (Nanjing, CN)
CPC classification
G06V10/44
PHYSICS
G06V10/454
PHYSICS
G06V10/7715
PHYSICS
G06V10/80
PHYSICS
International classification
G06V10/77
PHYSICS
G06V10/80
PHYSICS
Abstract
The present disclosure discloses a high-precision point cloud completion method based on deep learning and a device thereof, which comprises the following steps: introducing the dynamic kernel convolution PAConv into a feature extraction module, learning weight coefficients according to the positional relationship between each point and its neighboring points, and adaptively constructing the convolution kernel in combination with the weight matrices. A spatial attention mechanism is added to a feature fusion module, which helps a decoder better learn the relationships among the various features and thus better represent the feature information. A discriminator module comprises global and local attention discriminator modules, which use multi-layer full connection to classify and determine whether the generated results conform to the real point cloud distribution globally and locally, respectively, so as to optimize the generated results.
Claims
1. A high-precision point cloud completion method based on deep learning, comprising: acquiring point cloud data to be processed; preprocessing the point cloud data to obtain preprocessed point cloud data; inputting the preprocessed point cloud data into a trained point cloud completion model, wherein the point cloud completion model comprises a multi-resolution encoder module, a pyramid decoder module and an attention discriminator module; the multi-resolution encoder module is configured to perform feature extraction and fusion on the input point cloud data to obtain feature vectors; the pyramid decoder module is configured to process the feature vectors to obtain point cloud completion results of three scales; the attention discriminator module is configured to use the idea of a generative adversarial network to produce results with consistent global and local features through adversarial game learning between a generation model and a discrimination model; and determining high-precision point cloud completion results according to the output of the point cloud completion model.
2. The high-precision point cloud completion method based on deep learning according to claim 1, wherein the multi-resolution encoder module comprises a feature extraction module and a feature fusion module; a dynamic convolution layer PAConv is embedded in a multi-layer perceptron with shared weights in the feature extraction module, a weight coefficient is learned according to the positional relationship between each point and its neighboring points, and the convolution kernel is adaptively constructed in combination with the weight matrices, so as to improve the capability of extracting local detail features; a spatial attention mechanism is added to the feature fusion module to realize feature focusing in the spatial dimension; three missing point clouds of different scales generated by farthest point sampling are input into the multi-resolution encoder module; the feature extraction module, a multi-layer perceptron embedded with the dynamic kernel convolution PAConv, is used to extract the features of the three missing point clouds of different scales to generate multidimensional feature vectors V_1, V_2, V_3; the output multidimensional feature vectors V_1, V_2, V_3 are input into the feature fusion module consisting of the spatial attention mechanism; the spatial attention mechanism learns 1024-dimensional abstract features that synthesize local features and global information, and outputs weighted features of each position; thereafter, the three 1024-dimensional abstract features are spliced by a splicing array, and finally, the potential feature mapping is integrated into the final 1024-dimensional feature vector V using an MLP.
3. The high-precision point cloud completion method based on deep learning according to claim 2, wherein the method of constructing the dynamic kernel convolution PAConv comprises: initializing a weight library W={W_k | k=1, 2, . . . , K} consisting of K weight matrices with the size of C_in×C_out, wherein C_in represents the input dimension of the network in the current layer and C_out represents the output dimension of the network in the current layer; calculating the relative positional relationship between each point p_i in the input point cloud and its neighboring points p_j, and learning the weight coefficients E_ij={E_ij^k | k=1, 2, . . . , K}:
E_ij=Softmax(θ(p_i, p_j))
where θ is a nonlinear function implemented by a convolution with a kernel size of 1×1, and the Softmax function performs a normalization operation to ensure that each output score is in the range (0,1), in which a higher score means that the corresponding position has more important local information; and forming the kernel of PAConv by combining the weight matrices W_k and the weight coefficients E_ij^k:
𝒦(p_i, p_j)=Σ_{k=1}^{K} E_ij^k W_k
4. The high-precision point cloud completion method based on deep learning according to claim 3, wherein the value of K is 16.
5. The high-precision point cloud completion method based on deep learning according to claim 1, wherein processing the feature vectors to obtain point cloud completion results of three scales comprises: obtaining three sub-feature vectors U_1, U_2, U_3 with different resolutions from the feature vector V through the full connection layer, wherein each sub-feature vector is responsible for completing the point cloud at a different resolution; using U_3 to predict a primary point cloud P_3; using U_2 to predict the relative coordinates of a secondary point cloud P_2 from the center points of P_3, and using the recombination and full connection operations to generate the secondary point cloud P_2 according to P_3; and using U_1 and P_2 to predict the relative coordinates of the final point cloud P_1 from the center points of P_2 to complete the final point cloud P_1.
6. The high-precision point cloud completion method based on deep learning according to claim 1, wherein the attention discriminator module comprises a global attention discriminator and a local attention discriminator; the global discriminator is configured to view the whole point cloud completion result to evaluate its overall consistency, and the local discriminator is configured to view a small area centered on the completed area to ensure the local consistency of the generated point cloud.
7. The high-precision point cloud completion method based on deep learning according to claim 6, wherein the processing of the attention discriminator module comprises: sending the whole or local generated point cloud and the real point cloud to the attention discriminator, obtaining a 512-dimensional feature vector through an auto-encoder therein, then reducing the dimension through successive full connection layers [512-256-128-16-1], and outputting the final real-or-fake binary result.
8. The high-precision point cloud completion method based on deep learning according to claim 1, wherein the training method of the point cloud completion model comprises: a loss function comprising two parts, a generated loss and an adversarial loss; using the chamfer distance CD to calculate the average shortest point distance between the generated point clouds and the ground-truth point clouds, wherein the generated loss is calculated as follows:
L_com=d_CD1(P_1, P_1gt)+d_CD2(P_2, P_2gt)+d_CD3(P_3, P_3gt)
and the total loss is calculated as follows:
L=βL_com+λL_adv
wherein β and λ are the weights of the generated loss L_com and the adversarial loss L_adv, respectively, satisfying the condition β+λ=1; the chamfer distance CD is also used as an evaluation index to test the completion performance.
9. A high-precision point cloud completion device based on deep learning, comprising a processor and a storage medium; wherein the storage medium is configured to store instructions; the processor is configured to operate according to the instructions to perform the steps of the method according to claim 1.
10. A storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to claim 1.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0053] In order to make the technical means, creative features, objectives and effects of the present disclosure understandable, the present disclosure will be further illustrated with reference to specific embodiments.
[0054] In the description of the present disclosure, "several" means more than one, "a plurality of" means more than two, "greater than", "less than", "more than", etc. are understood as excluding the number itself, and "above", "below", "within", etc. are understood as including the number itself. If a first and a second are described, they are only used for the purpose of distinguishing technical features, and cannot be understood as indicating or implying relative importance, implicitly indicating the number of indicated technical features, or implicitly indicating the sequence of indicated technical features.
[0055] In the description of the present disclosure, the description referring to the terms "one embodiment", "some embodiments", "illustrative embodiments", "examples", "specific examples" or "some examples" means that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Embodiment 1
[0056] A high-precision point cloud completion method based on deep learning comprises:
[0057] acquiring point cloud data to be processed;
[0058] preprocessing the point cloud data to obtain preprocessed point cloud data;
[0059] inputting the preprocessed point cloud data into a trained point cloud completion model, wherein the point cloud completion model comprises a multi-resolution encoder module, a pyramid decoder module and an attention discriminator module;
[0060] the multi-resolution encoder module is configured to perform feature extraction and fusion on the input point cloud data to obtain feature vectors;
[0061] the pyramid decoder module is configured to process the feature vectors to obtain point cloud completion results of three scales;
[0062] the attention discriminator module is configured to use the idea of a generative adversarial network to produce results with consistent global and local features through adversarial game learning between a generation model and a discrimination model;
[0063] determining high-precision point cloud completion results according to the output of the point cloud completion model.
[0064] In some embodiments, a high-precision point cloud completion method based on deep learning is provided, as shown in the accompanying drawings.
[0065] First, the point farthest from the existing set of sampled points is selected iteratively by farthest point sampling, so as to acquire a set of skeleton points. This represents the distribution of the point set more evenly, without destroying the structure of the point cloud model. Three missing point clouds of different scales generated by farthest point sampling are input into the multi-resolution encoder module to extract features. The multi-layer perceptron embedded with the dynamic kernel convolution PAConv is used to generate multidimensional feature vectors V_1, V_2, V_3. The output multidimensional feature vectors V_1, V_2, V_3 are input into the feature fusion module consisting of the spatial attention mechanism, the structure of which is shown in the accompanying drawings. The spatial attention mechanism learns 1024-dimensional abstract features that synthesize local features and global information and outputs weighted features of each position; the three 1024-dimensional abstract features are then spliced, and the potential feature mapping is integrated into the final 1024-dimensional feature vector V using an MLP.
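The farthest point sampling step described above can be sketched as follows. This is an illustrative NumPy sketch, not the claimed implementation; the point counts (2048 input points, 1024/512/256 sampled scales) and the choice of starting point are assumptions for illustration only.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Iteratively select the point farthest from the already-selected set.

    points: (N, 3) array of xyz coordinates.
    Returns the indices of the n_samples selected skeleton points.
    """
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # Distance from every point to its nearest already-selected point.
    dist = np.full(n, np.inf)
    selected[0] = 0  # start from an arbitrary point
    for i in range(1, n_samples):
        # Squared distance to the most recently selected point.
        d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        # Next skeleton point: the one farthest from the selected set.
        selected[i] = int(np.argmax(dist))
    return selected

# Downsample one cloud to three scales, e.g. 2048 -> 1024/512/256 points.
cloud = np.random.rand(2048, 3)
scales = {m: cloud[farthest_point_sampling(cloud, m)] for m in (1024, 512, 256)}
```

Because each newly selected point zeroes its own distance entry, no index is ever picked twice, so the sampled subsets preserve an even coverage of the original cloud.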
[0066] PAConv first initializes a weight library W={W_k | k=1, 2, . . . , K} consisting of K weight matrices with the size of C_in×C_out, wherein C_in represents the input dimension of the network in the current layer and C_out represents the output dimension of the network in the current layer. A larger K ensures the diversity of the convolution kernel but also increases the burden of the model; therefore, in our network model, K is taken as 16. Next, the relative positional relationship between each point p_i in the input point cloud and its neighboring points p_j is calculated, and the weight coefficients E_ij={E_ij^k | k=1, 2, . . . , K} are learned according to formula (1):
E_ij=Softmax(θ(p_i, p_j)) (1)
[0067] where θ is a nonlinear function implemented by a convolution with a kernel size of 1×1. Softmax is used for the normalization operation to ensure that the output score is in the range (0,1), in which a higher score means that the corresponding position has more important local information. The kernel of PAConv is formed by combining the weight matrices W_k and the weight coefficients E_ij^k, as shown in formula (2):
𝒦(p_i, p_j)=Σ_{k=1}^{K} E_ij^k W_k (2)
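The adaptive kernel assembly of formulas (1) and (2) can be sketched as follows. This is an illustrative NumPy sketch: the score function θ is stood in for by a fixed random projection of the relative position (in the method it is a learned 1×1 convolution), and the channel sizes are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax; outputs sum to 1 along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class PAConvSketch:
    """Assemble a point-wise kernel from a library of K weight matrices."""

    def __init__(self, c_in, c_out, k=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((k, c_in, c_out)) * 0.1  # weight library
        self.theta = rng.standard_normal((3, k)) * 0.1        # stand-in for theta

    def kernel(self, p_i, p_j):
        # Formula (1): one normalized coefficient per weight matrix.
        scores = (p_j - p_i) @ self.theta                     # shape (k,)
        e = softmax(scores)
        # Formula (2): kernel = sum_k E_ij^k * W_k.
        return np.tensordot(e, self.W, axes=([0], [0]))       # (c_in, c_out)

conv = PAConvSketch(c_in=8, c_out=16)
ker = conv.kernel(np.zeros(3), np.array([0.1, -0.2, 0.3]))
```

Each neighbor pair thus gets its own (C_in, C_out) kernel as a convex combination of the shared weight library, which is what lets the convolution adapt to local geometry.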
[0068] PAConv completes the work of constructing the convolution kernel adaptively so far, so as to capture the local area information of the input features and output the features with local correlation.
[0069] The pyramid decoder module consists of a full connection layer and a recombination layer. Using the idea of the feature pyramid network, the missing point cloud is gradually completed from coarse to fine. The input is the output feature vector V of the multi-resolution encoder. Three sub-feature vectors U_1, U_2, U_3 with different resolutions are obtained through the full connection layer, with dimensions 1024, 512 and 256. Each sub-feature vector is responsible for completing the point cloud at a different resolution. First, U_3 is used to predict a primary point cloud P_3; U_2 is used to predict the relative coordinates of a secondary point cloud P_2 from the center points of P_3, and the recombination and full connection operations are used to generate the secondary point cloud P_2 according to P_3. Similarly, U_1 and P_2 are used to predict the relative coordinates of the final point cloud P_1 from the center points of P_2 to complete the final point cloud P_1.
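The coarse-to-fine decoding above can be sketched as follows. This is an illustrative NumPy sketch: the random matrices stand in for trained full connection layers, and the point counts (64 coarse centers, 4 and 8 offsets per center) and offset magnitudes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained full connection layer: a fixed random projection.
fc = lambda x, d_out: x @ rng.standard_normal((x.shape[-1], d_out)) * 0.1

# Sub-feature vectors from the final feature V (dimensions per the description).
V = rng.standard_normal(1024)
U1, U2, U3 = fc(V, 1024), fc(V, 512), fc(V, 256)

# Primary (coarsest) point cloud P3: 64 center points predicted from U3.
P3 = fc(U3, 64 * 3).reshape(64, 3)

# Secondary cloud P2: 4 relative offsets per P3 center (recombination + FC).
offsets2 = fc(U2, 64 * 4 * 3).reshape(64, 4, 3) * 0.05
P2 = (P3[:, None, :] + offsets2).reshape(-1, 3)            # (256, 3)

# Final cloud P1: finer offsets around each P2 center, conditioned on U1 and P2.
offsets1 = fc(np.concatenate([U1, P2.ravel()]), 256 * 8 * 3).reshape(256, 8, 3) * 0.02
P1 = (P2[:, None, :] + offsets1).reshape(-1, 3)            # (2048, 3)
```

Predicting relative offsets around coarser centers, rather than absolute coordinates, is what makes each stage refine the previous one instead of regenerating the cloud from scratch.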
[0070] The attention discriminator module uses the idea of a generative adversarial network to generate good output through adversarial game learning between the generation model and a discrimination model in the framework. The module consists of a global attention discriminator and a local attention discriminator. The global discriminator views the whole point cloud completion result to evaluate its overall consistency, and the local discriminator views a small area centered on the completed area to ensure the local consistency of the generated point cloud. Specifically, the whole or local generated point cloud and the real point cloud are sent to the attention discriminator, a 512-dimensional feature vector is obtained through an auto-encoder therein, the dimension is then reduced through successive full connection layers [512-256-128-16-1], and the final real-or-fake binary result is output.
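The [512-256-128-16-1] classification head can be sketched as follows. This is an illustrative NumPy sketch, not the claimed implementation; the random weights, ReLU activations between hidden layers, and final sigmoid are assumptions standing in for the trained network.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp_head(feat):
    """Reduce a 512-d feature to one real/fake score via 512-256-128-16-1."""
    dims = [512, 256, 128, 16, 1]
    x = feat
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.standard_normal((d_in, d_out)) * (1.0 / np.sqrt(d_in))
        x = x @ W
        if d_out > 1:
            x = np.maximum(x, 0.0)       # ReLU between hidden layers
    return 1.0 / (1.0 + np.exp(-x))      # sigmoid -> probability "real"

# Feed a 512-d feature (as produced by the auto-encoder) through the head.
score = mlp_head(rng.standard_normal(512))
```

The same head shape serves both the global discriminator (whole completion result) and the local discriminator (a patch around the completed area); only the input feature differs.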
[0071] The loss function of the algorithm of the present disclosure comprises two parts: a generated loss and an adversarial loss.
[0072] The chamfer distance CD is used to calculate the average shortest point distance between the generated point cloud and the ground-truth point cloud, in which the calculation formula is as follows:
d_CD(S_1, S_2)=(1/|S_1|) Σ_{x∈S_1} min_{y∈S_2} ‖x−y‖² + (1/|S_2|) Σ_{y∈S_2} min_{x∈S_1} ‖y−x‖² (3)
[0073] In formula (3), CD calculates the average nearest square distance between the generated point cloud S_1 and the real point cloud S_2. The final generated results are three generated point clouds P_1, P_2, P_3 of different scales, and the total generated loss accordingly consists of three parts, d_CD1, d_CD2 and d_CD3, as shown in formula (4):
L_com=d_CD1(P_1, P_1gt)+d_CD2(P_2, P_2gt)+d_CD3(P_3, P_3gt) (4)
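The chamfer distance of formula (3) and the multi-scale generated loss of formula (4) can be sketched as follows (an illustrative NumPy sketch; the brute-force pairwise distance matrix is an assumption suitable only for small clouds).

```python
import numpy as np

def chamfer_distance(s1: np.ndarray, s2: np.ndarray) -> float:
    """Symmetric average nearest-neighbour squared distance, formula (3)."""
    # Pairwise squared distances between the two clouds, shape (|S1|, |S2|).
    d2 = np.sum((s1[:, None, :] - s2[None, :, :]) ** 2, axis=-1)
    # Average S1->S2 nearest distance plus average S2->S1 nearest distance.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def generated_loss(preds, gts):
    """L_com as in formula (4): sum of CD terms over the three scales."""
    return sum(chamfer_distance(p, g) for p, g in zip(preds, gts))

# Identical clouds give zero distance; a rigid shift gives a positive one.
pts = np.random.rand(128, 3)
```

Because CD averages over nearest neighbours in both directions, it penalizes both missing structure (real points with no nearby generated point) and spurious structure (generated points far from the real surface).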
[0074] In formula (4), P_1gt, P_2gt, P_3gt are the real point clouds corresponding to the three generated point clouds of different scales, respectively. The adversarial loss herein follows the generative adversarial network GAN, and the calculation formula is as follows:
L_adv=Σ_{1≤i≤S} log_10(G(y_i))+Σ_{1≤i≤S} log_10(1−G(E(D(x_i)))) (5)
[0075] In formula (5), y_i and x_i belong to the original incomplete point cloud and the real point cloud, respectively. E, D, G represent the multi-resolution encoder, the pyramid decoder and the attention discriminator, respectively. The total loss function consists of the generated loss and the adversarial loss; the calculation formula is shown in formula (6):
L=βL_com+λL_adv (6)
[0076] β and λ are the weights of the generated loss L_com and the adversarial loss L_adv, respectively, satisfying the condition β+λ=1. The chamfer distance CD is also used as an evaluation index to test the completion performance.
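The weighted combination of formula (6) under the constraint β+λ=1 can be sketched as follows (an illustrative snippet; the default β value is an assumption, not a value given in the disclosure).

```python
def total_loss(l_com: float, l_adv: float, beta: float = 0.95) -> float:
    """Weighted sum per formula (6); lam = 1 - beta enforces beta + lam = 1."""
    lam = 1.0 - beta
    return beta * l_com + lam * l_adv
```

Expressing λ as 1−β keeps the constraint satisfied by construction, so only one weight needs to be tuned.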
[0077] The system provided by the present disclosure has the following advantages.
[0078] (1) In order to make up for the defect of deep-learning-based point cloud completion methods in local feature extraction, a feasible scheme is proposed.
[0079] (2) A point cloud model with high completion precision can be obtained, which guarantees the smooth progress of many downstream tasks such as point cloud segmentation, classification, object recognition and point cloud reconstruction.
[0080] The point cloud completion method based on deep learning according to the present disclosure can extract the global and local features of the point cloud and synthesize the local correlation and global information of key points, which makes up for the defect of the point cloud completion method based on deep learning in local feature extraction, improves the precision of point cloud completion, and guarantees the smooth progress of many downstream tasks such as point cloud segmentation, classification, object recognition and point cloud reconstruction.
Embodiment 2
[0081] In a second aspect, this embodiment provides a high-precision point cloud completion device based on deep learning, comprising a processor and a storage medium;
[0082] wherein the storage medium is configured to store instructions;
[0083] the processor is configured to operate according to the instructions to perform the steps of the method according to Embodiment 1.
Embodiment 3
[0084] In a third aspect, this embodiment provides a storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to Embodiment 1.
[0085] It should be understood by those skilled in the art that the embodiments of the present disclosure can be provided as methods, systems, or computer program products. Therefore, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-available storage media (including but not limited to a disk storage, CD-ROM, an optical storage, etc.) in which computer-available program codes are contained.
[0086] The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in flowcharts and/or block diagrams and combinations of flows and/or blocks in flowcharts and/or block diagrams can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing devices to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing devices produce a device for implementing the functions specified in one or more flows in flowcharts and/or one or more blocks in block diagrams.
[0087] These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing devices to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implement the functions specified in one or more flows in flowcharts and/or one or more blocks in block diagrams.
[0088] These computer program instructions can also be loaded on a computer or other programmable data processing devices, so that a series of operation steps are executed on the computer or other programmable devices to produce a computer-implemented process, so that the instructions executed on the computer or other programmable devices provide steps for implementing the functions specified in one or more flows in flowcharts and/or one or more blocks in block diagrams.
[0089] According to the technical knowledge, the present disclosure can be implemented by other embodiments without departing from the spirit or essential characteristics thereof. Therefore, the embodiments disclosed above are just examples in all respects, rather than the only embodiments. All changes within the scope of the present disclosure or within the scope equivalent to the present disclosure are included within the present disclosure.