A DEEP LEARNING-BASED TEMPORAL PHASE UNWRAPPING METHOD FOR FRINGE PROJECTION PROFILOMETRY

20210356258 · 2021-11-18

Abstract

The invention discloses a deep learning-based temporal phase unwrapping method for fringe projection profilometry. First, four sets of three-step phase-shifting fringe patterns with different frequencies (including 1, 8, 32, and 64) are projected to the tested objects. The three-step phase-shifting fringe images acquired by the camera are processed to obtain the wrapped phase map using a three-step phase-shifting algorithm. Then, a multi-frequency temporal phase unwrapping (MF-TPU) algorithm is used to unwrap the wrapped phase map to obtain a fringe order map of the high-frequency phase with 64 periods. A residual convolutional neural network is built, and its input data are set to be the wrapped phase maps with frequencies of 1 and 64, and the output data are set to be the fringe order map of the high-frequency phase with 64 periods. Finally, the training dataset and the validation dataset are built to train and validate the network. The network makes predictions on the test dataset to output the fringe order map of the high-frequency phase with 64 periods. The invention exploits a deep learning method to unwrap a wrapped phase map with a frequency of 64 using a wrapped phase map with a frequency of 1 and obtain an absolute phase map with fewer phase errors and higher accuracy.

Claims

1. A deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized in that the specific steps are as follows: step one, four sets of three-step phase-shifting fringe patterns with different frequencies (including 1, 8, 32, and 64) are projected to the tested objects; the projected fringe patterns are captured by the camera simultaneously to acquire four sets of three-step phase-shifting fringe images; step two, the three-step phase-shifting fringe images acquired by the camera are processed to obtain the wrapped phase map using a three-step phase-shifting algorithm; step three, a multi-frequency temporal phase unwrapping (MF-TPU) algorithm is used to unwrap four wrapped phase maps successively to obtain a fringe order map and an absolute phase map of the high-frequency phase with 64 periods; step four, a residual convolutional neural network is built to implement phase unwrapping; steps one to three are repeatedly performed to obtain multiple sets of data, which are divided into a training dataset, a validation dataset, and a test dataset; the training dataset is used to train the residual convolutional neural network; the validation dataset is used to verify the performance of the trained network; step five, the residual convolutional neural network after training and validation makes predictions on the test dataset to realize the precision evaluation of the network and output the fringe order map of the high-frequency phase with 64 periods.

2. According to claim 1, a deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized by step one wherein four sets of three-step phase-shifting fringe patterns with different frequencies are projected to the tested objects; each set of patterns contains three fringe patterns with the same frequency and different initial phase; any set of three-step phase-shifting fringe patterns projected by the projector can be represented as:
I.sub.1.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W]
I.sub.2.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W+2π/3]
I.sub.3.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W+4π/3] where I.sub.1.sup.p(x.sup.p, y.sup.p), I.sub.2.sup.p(x.sup.p, y.sup.p), I.sub.3.sup.p(x.sup.p, y.sup.p) are three-step phase-shifting fringe patterns projected by the projector; (x.sup.p, y.sup.p) is the pixel coordinate of the projector; W is the horizontal resolution of the projector; f is the frequency of phase-shifting fringe patterns; a DLP projector is used to project four sets of three-step phase-shifting fringe patterns onto the tested objects; the frequencies of four sets of three-step phase-shifting fringe patterns are 1, 8, 32, and 64, respectively; each set of three fringe patterns has the same frequency; the projected fringe patterns are captured by the camera simultaneously; the acquired three-step phase-shifting fringe images are represented as:
I.sub.1(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)]
I.sub.2(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)+2π/3]
I.sub.3(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)+4π/3] where I.sub.1(x, y), I.sub.2 (x, y), I.sub.3 (x, y) are three-step phase-shifting fringe images; (x, y) is the pixel coordinate of the camera; A(x, y) is the average intensity; B(x, y) is the intensity modulation; Φ(x, y) is the phase distribution of the measured object.

3. According to claim 2, a deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized by step two wherein the wrapped phase φ(x, y) can be obtained as: φ(x, y)=arctan{√3[I.sub.1(x, y)−I.sub.3(x, y)]/[2I.sub.2(x, y)−I.sub.1(x, y)−I.sub.3(x, y)]}; due to the truncation effect of the arctangent function, the obtained phase φ(x, y) is wrapped within the range of [0, 2π], and its relationship with Φ(x, y) is:
Φ(x, y)=φ(x, y)+2πk(x, y) where k(x, y) represents the fringe order of Φ(x, y), and its value range is from 0 to N−1; N is the period number of the fringe patterns (i.e., N=f).

4. According to claim 1, a deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized by step three wherein the distribution range of the absolute phase map with unit frequency is [0, 2π], so the wrapped phase map with unit frequency is an absolute phase map; by using a multi-frequency temporal phase unwrapping (MF-TPU) algorithm, an absolute phase map with a frequency of 8 can be unwrapped with the aid of the absolute phase map with unit frequency; an absolute phase map with a frequency of 32 can be unwrapped with the aid of the absolute phase map with a frequency of 8; an absolute phase map with a frequency of 64 can be unwrapped with the aid of the absolute phase map with a frequency of 32; the absolute phase map can be calculated by the following formulas: k.sub.h(x, y)=Round{[(f.sub.h/f.sub.l)Φ.sub.l(x, y)−φ.sub.h(x, y)]/(2π)} and Φ.sub.h(x, y)=φ.sub.h(x, y)+2πk.sub.h(x, y), where f.sub.h is the frequency of high-frequency fringe images; f.sub.l is the frequency of low-frequency fringe images; φ.sub.h(x, y) is the wrapped phase map of high-frequency fringe images; k.sub.h(x, y) is the fringe order map of high-frequency fringe images; Φ.sub.h(x, y) is the absolute phase map of high-frequency fringe images; Φ.sub.l(x, y) is the absolute phase map of low-frequency fringe images; Round( ) is the rounding operation.

5. According to claim 2, a deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized by step four wherein a residual convolutional neural network is built, consisting of six modules, including convolutional layers, pooling layers, concatenate layers, residual blocks, and upsampling blocks; next, after the network is built, steps one to three are repeatedly performed to obtain multiple sets of data, which are divided into a training dataset, a validation dataset, and a test dataset; for the residual convolutional neural network, the input data are set to be the wrapped phase maps with frequencies of 1 and 64, and the output data are set to be the fringe order map of the high-frequency phase with a frequency of 64; to monitor the accuracy of the trained neural networks on data that they have never seen before, a validation dataset is created that is separate from the training scenarios; before training the residual convolutional neural network, the acquired data are preprocessed; because the fringe image obtained by the camera contains both the background and the tested object, the background is removed by the following equation: M(x, y)=(2/3)√{3[I.sub.1(x, y)−I.sub.3(x, y)].sup.2+[2I.sub.2(x, y)−I.sub.1(x, y)−I.sub.3(x, y)].sup.2} where M(x, y) is the intensity modulation in actual measurement; the modulation corresponding to the points belonging to the background in the image is much smaller than the modulation corresponding to the points of the measured objects, and the background in the image can be removed by setting a threshold value; the data after the background removal operation are used as the dataset of the residual convolutional neural network for training; in the network configuration, the loss function is set as mean square error (MSE), the optimizer is Adam, and the training epoch is set as 500; the training dataset is used to train the residual convolutional neural network; the validation dataset is used to verify the performance of the trained network.

6. According to claim 1, a deep learning-based temporal phase unwrapping method for fringe projection profilometry is characterized by step five wherein the residual convolutional neural network predicts the output data based on the input data in the test dataset; by comparing the real output data in the test dataset with the output data predicted by the network, the comparison results are used to evaluate the accuracy of the network.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] FIG. 1 shows the schematic diagram of a deep learning-based temporal phase unwrapping method for fringe projection profilometry.

[0009] FIG. 2 shows the schematic diagram of the 3D measurement system proposed in the invention.

[0010] FIG. 3 shows the structure of the deep learning-based residual convolutional neural network proposed in the invention.

[0011] FIG. 4 shows the training and validation loss curve of the residual convolutional neural network after 500 rounds.

[0012] FIG. 5 shows a fringe order map of the phase for a sample data in the test dataset. (a) a fringe order map of the phase with a frequency of 64 obtained using two sets of wrapped phase maps based on the MF-TPU algorithm. (b) a fringe order map of the phase with a frequency of 64 obtained using two sets of wrapped phase maps based on the deep learning-based method. (c) a fringe order map of the phase with a frequency of 64 obtained using four sets of wrapped phase maps based on the MF-TPU algorithm. (d) the difference between (a) and (c). (e) the difference between (b) and (c).

[0013] FIG. 6 shows 3D results of a sample data in the test dataset. (a) 3D results obtained using two sets of wrapped phase maps based on the MF-TPU algorithm. (b) 3D results obtained using two sets of wrapped phase maps based on the deep learning-based method. (c) 3D results obtained using four sets of wrapped phase maps based on the MF-TPU algorithm.

DESCRIPTION OF THE PREFERRED EMBODIMENT

[0014] The invention is based on a deep learning-based temporal phase unwrapping method for fringe projection profilometry. The steps of the invention are as follows: step one, four sets of three-step phase-shifting fringe patterns with different frequencies are projected to the tested objects. Each set of patterns contains three fringe patterns with the same frequency and different initial phase. Any set of three-step phase-shifting fringe patterns projected by the projector can be represented as:


I.sub.1.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W]


I.sub.2.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W+2π/3]


I.sub.3.sup.p(x.sup.p, y.sup.p)=128+127 cos[2πf x.sup.p/W+4π/3]

where I.sub.1.sup.p(x.sup.p, y.sup.p), I.sub.2.sup.p(x.sup.p, y.sup.p), I.sub.3.sup.p(x.sup.p, y.sup.p) are three-step phase-shifting fringe patterns projected by the projector. (x.sup.p, y.sup.p) is the pixel coordinate of the projector. W is the horizontal resolution of the projector. f is the frequency of phase-shifting fringe patterns. A DLP projector is used to project four sets of three-step phase-shifting fringe patterns onto the tested objects. The frequencies of four sets of three-step phase-shifting fringe patterns are 1, 8, 32, and 64, respectively. Each set of three fringe patterns has the same frequency. The projected fringe patterns are captured by the camera simultaneously. The acquired three-step phase-shifting fringe images are represented as:


I.sub.1(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)]


I.sub.2(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)+2π/3]


I.sub.3(x, y)=A(x, y)+B(x, y)cos[Φ(x, y)+4π/3]

where I.sub.1(x, y), I.sub.2(x, y), I.sub.3(x, y) are three-step phase-shifting fringe images. (x, y) is the pixel coordinate of the camera. A(x, y) is the average intensity. B(x, y) is the intensity modulation. Φ(x, y) is the phase distribution of the measured object.
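The pattern generation of step one can be sketched in Python/NumPy as follows. W and H follow the 912×1140 LightCrafter 4500Pro resolution quoted in the implementation example; the 8-bit quantization is an assumption about how the patterns are loaded into the projector.

```python
import numpy as np

W, H = 912, 1140                      # projector resolution (from the example)
FREQUENCIES = [1, 8, 32, 64]
SHIFTS = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]

def make_patterns(f):
    """Return the three phase-shifted patterns I1^p, I2^p, I3^p for frequency f."""
    xp = np.arange(W)                 # projector column x^p; patterns vary only along x
    rows = [128 + 127 * np.cos(2 * np.pi * f * xp / W + d) for d in SHIFTS]
    return [np.tile(np.round(r), (H, 1)).astype(np.uint8) for r in rows]

# four sets of three-step phase-shifting fringe patterns, one set per frequency
patterns = {f: make_patterns(f) for f in FREQUENCIES}
```

Each of the four sets contains three patterns of the same frequency differing only in initial phase, matching the equations above.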

[0015] step two, the wrapped phase φ(x, y) can be obtained as:

[00001] φ(x, y)=arctan{√3[I.sub.1(x, y)−I.sub.3(x, y)]/[2I.sub.2(x, y)−I.sub.1(x, y)−I.sub.3(x, y)]}

Due to the truncation effect of the arctangent function, the obtained phase φ(x, y) is wrapped within the range of [0,2π], and its relationship with Φ(x, y) is:


Φ(x, y)=φ(x, y)+2πk(x, y)

where k(x, y) represents the fringe order of Φ(x, y), and its value range is from 0 to N−1. N is the period number of the fringe patterns (i.e., N=f).
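The three-step phase-shifting computation of step two can be sketched as below. `np.arctan2` stands in for the patent's arctangent so that the quadrant is resolved automatically; its (−π, π] output is shifted into [0, 2π) to match the wrapping range stated above.

```python
import numpy as np

def wrapped_phase(I1, I2, I3):
    """Three-step phase-shifting algorithm: recover the wrapped phase from
    the three fringe images, wrapped into [0, 2*pi)."""
    phi = np.arctan2(np.sqrt(3.0) * (I1 - I3), 2.0 * I2 - I1 - I3)
    return np.mod(phi, 2.0 * np.pi)
```

The function is elementwise, so it applies unchanged to full camera-resolution image arrays.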

[0016] step three, the distribution range of the absolute phase map with unit frequency is [0, 2π], so the wrapped phase map with unit frequency is an absolute phase map. By using a multi-frequency temporal phase unwrapping (MF-TPU) algorithm, an absolute phase map with a frequency of 8 can be unwrapped with the aid of the absolute phase map with unit frequency. An absolute phase map with a frequency of 32 can be unwrapped with the aid of the absolute phase map with a frequency of 8. An absolute phase map with a frequency of 64 can be unwrapped with the aid of the absolute phase map with a frequency of 32. The absolute phase map can be calculated by the following formula:

[00002] k.sub.h(x, y)=Round{[(f.sub.h/f.sub.l)Φ.sub.l(x, y)−φ.sub.h(x, y)]/(2π)}
Φ.sub.h(x, y)=φ.sub.h(x, y)+2πk.sub.h(x, y)

where f.sub.h is the frequency of high-frequency fringe images. f.sub.l is the frequency of low-frequency fringe images. φ.sub.h(x, y) is the wrapped phase map of high-frequency fringe images. k.sub.h(x, y) is the fringe order map of high-frequency fringe images. Φ.sub.h(x, y) is the absolute phase map of high-frequency fringe images. Φ.sub.l(x, y) is the absolute phase map of low-frequency fringe images. Round( ) is the rounding operation. Based on the principle of the multi-frequency temporal phase unwrapping (MF-TPU) algorithm, the absolute phase could in theory be obtained by directly using the unit-frequency absolute phase to assist in unwrapping the wrapped phase with a frequency of 64. However, due to the non-negligible noise and other error sources in actual measurement, the MF-TPU algorithm cannot reliably unwrap the high-frequency wrapped phase map with a frequency of 64 using only the low-frequency wrapped phase map with a frequency of 1; the result contains a large number of error points. Therefore, the MF-TPU algorithm generally uses multiple sets of wrapped phase maps with different frequencies to sequentially unwrap the high-frequency wrapped phase map, which finally yields the absolute phase with a frequency of 64. It is obvious that the MF-TPU algorithm consumes a lot of time and cannot achieve fast and high-precision 3D measurements based on fringe projection profilometry.
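The 1 → 8 → 32 → 64 cascade of step three can be sketched as a direct transcription of the two formulas above:

```python
import numpy as np

def unwrap_with_reference(phi_h, Phi_l, f_h, f_l):
    """One MF-TPU step: unwrap the high-frequency wrapped phase phi_h with
    the aid of a lower-frequency absolute phase Phi_l."""
    k_h = np.round(((f_h / f_l) * Phi_l - phi_h) / (2.0 * np.pi))
    Phi_h = phi_h + 2.0 * np.pi * k_h
    return Phi_h, k_h

def mf_tpu(wrapped, freqs=(1, 8, 32, 64)):
    """Sequential MF-TPU cascade. `wrapped` maps each frequency to its
    wrapped phase map; the unit-frequency map is already absolute because
    it spans [0, 2*pi)."""
    Phi = wrapped[freqs[0]]
    k = np.zeros_like(Phi)
    for f_l, f_h in zip(freqs[:-1], freqs[1:]):
        Phi, k = unwrap_with_reference(wrapped[f_h], Phi, f_h, f_l)
    return Phi, k   # absolute phase and fringe order map at the top frequency
```

On noise-free data the cascade recovers the fringe order exactly; in real measurements, noise in each rounding step is what motivates the intermediate frequencies (and, in the invention, the learned replacement of this cascade).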

[0017] step four, a residual convolutional neural network is built to implement phase unwrapping. Steps one to three are repeatedly performed to obtain multiple sets of data, which are divided into a training dataset, a validation dataset, and a test dataset. The training dataset is used to train the residual convolutional neural network. The validation dataset is used to verify the performance of the trained network. Firstly, a residual convolutional neural network is built to implement phase unwrapping, and FIG. 3 shows the structure of the neural network. As shown in FIG. 3, the residual convolutional neural network consists of six modules, including convolutional layers, pooling layers, concatenate layers, residual blocks, and upsampling blocks. Among these modules, the convolutional layers, pooling layers, and concatenate layers are common modules in traditional convolutional neural networks. The convolutional layer consists of multiple convolutional kernels. The number of convolutional kernels is the number of channels in the convolutional layer, and each convolutional kernel independently performs a convolution operation on the input data to generate the output tensor. The pooling layer compresses the input tensor and extracts its main features, which simplifies the network's computational complexity and prevents overfitting. The common pooling layers are the average-pooling layer and the max-pooling layer. In our network, ½ downsampling, ¼ downsampling, and ⅛ downsampling are performed on the input tensor using max-pooling layers. The concatenate layer fuses the input tensors of each path. In addition, four residual blocks are used in each path of the network to solve the problem of gradient disappearance in the deep network, prevent overfitting, and accelerate the loss convergence of the network (He, K., Zhang, X., Ren, S., Sun, J. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770-778 (2016).). Each residual block contains two convolutional layers and two activation functions (ReLU). Because the max-pooling layers make the tensor sizes inconsistent across paths, different numbers of upsampling blocks are used in different paths to make the tensor size in each path consistent. The upsampling block consists of a convolutional layer, an activation function (ReLU), and a subpixel layer. The subpixel layer uses the rich channel data of the input tensor to upsample the tensor in the spatial dimension, so that the number of channels of the tensor becomes ¼ of the original while the horizontal and vertical dimensions of the tensor are doubled (Shi, W., Caballero, J., Huszár, F., et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1874-1883 (2016).).
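Two of the building blocks above have very compact definitions, sketched here in NumPy. The `transform` argument in the residual block is a hypothetical stand-in for the two conv+ReLU stages (assumed shape-preserving); the pixel-shuffle rearrangement matches the channel-to-space behavior of the subpixel layer described above.

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Subpixel (pixel-shuffle) upsampling: (C*r*r, H, W) -> (C, H*r, W*r).
    Channel depth is traded for spatial resolution, as in the upsampling blocks."""
    c, h, w = x.shape
    assert c % (r * r) == 0
    x = x.reshape(c // (r * r), r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)                # -> (C, H, r, W, r)
    return x.reshape(c // (r * r), h * r, w * r)

def residual_block(x, transform):
    """Skeleton of a residual block: the identity skip is a plain addition,
    which is what mitigates gradient disappearance in the deep network."""
    return x + transform(x)

x = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)   # (C*r*r, H, W)
y = pixel_shuffle(x)                                      # shape (1, 4, 4)
z = residual_block(y, lambda t: np.maximum(t, 0.0))       # toy stand-in stage
```

With r=2, the channel count drops by a factor of 4 while each spatial dimension doubles, exactly the ratio stated in the description.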

[0018] Although the modules used in the network are existing ones, the innovation of the invention lies in how these existing modules are combined into a network model that enables phase unwrapping, as shown in FIG. 3. After the model of the network is built, the input data of the network are set to the wrapped phase maps with frequencies of 1 and 64 obtained in step two, and the output data of the network are set to the fringe order map of the phase with a frequency of 64 obtained in step three, instead of the absolute phase with a frequency of 64. Because the absolute phase is the sum of the wrapped phase map and 2π times the fringe order, only the fringe order of the phase is needed to obtain the absolute phase. In addition, the data type of the fringe order of the phase is integer, while the data type of the absolute phase map is floating-point. Using the fringe order of the phase as the output data of the network reduces the complexity of the network and makes the loss of the network converge faster, thus effectively improving the output accuracy of the network. Next, steps one to three are repeatedly performed to obtain multiple sets of data, which are divided into a training dataset, a validation dataset, and a test dataset. For the residual convolutional neural network, the input data are set to be the wrapped phase maps with frequencies of 1 and 64, and the output data are set to be the fringe order map of the high-frequency phase with a frequency of 64. To monitor the accuracy of the trained neural networks on data that they have never seen before, a validation dataset is created that is separate from the training scenarios. Before training the residual convolutional neural network, the acquired data are preprocessed. Because the fringe image obtained by the camera contains both the background and the tested object, the background is removed by the following equation:

[00003] M(x, y)=(2/3)√{3[I.sub.1(x, y)−I.sub.3(x, y)].sup.2+[2I.sub.2(x, y)−I.sub.1(x, y)−I.sub.3(x, y)].sup.2}

where M(x, y) is the intensity modulation in actual measurement. The modulation corresponding to the points belonging to the background in the image is much smaller than the modulation corresponding to the points of the measured objects, and the background in the image can be removed by setting a threshold value. The data after the background removal operation is used as the dataset of the residual convolutional neural network for training. In the network configuration, the loss function is set as mean square error (MSE), the optimizer is Adam, the size of mini-batch is 2, and the training epoch is set as 500. To avoid over-fitting as the common problem of the deep neural network, L2 regularization is adopted in each convolution layer of residual blocks and upsampling blocks instead of all convolution layers of the proposed network, which can enhance the generalization ability of the network. The training dataset is used to train the residual convolutional neural network. The validation dataset is used to verify the performance of the trained network.
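The modulation-based background removal can be sketched as follows. The default threshold value is an assumption; in practice it is tuned so that low-modulation background pixels fall below it while object pixels stay above it.

```python
import numpy as np

def background_mask(I1, I2, I3, threshold=10.0):
    """Compute the intensity modulation M(x, y) from the three fringe images
    and mask out low-modulation background pixels."""
    M = (2.0 / 3.0) * np.sqrt(3.0 * (I1 - I3) ** 2
                              + (2.0 * I2 - I1 - I3) ** 2)
    return M > threshold, M
```

A pixel showing no fringe contrast (I.sub.1 = I.sub.2 = I.sub.3) has M = 0 and is excluded, while fringe-covered object pixels yield large M and are kept in the training dataset.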

[0019] step five, the residual convolutional neural network predicts the output data based on the input data in the test dataset. By comparing the real output data in the test dataset with the output data predicted by the network, the comparison results are used to evaluate the accuracy of the network. Due to the non-negligible noise and other error sources in actual measurement, the multi-frequency temporal phase unwrapping (MF-TPU) algorithm cannot unwrap the high-frequency wrapped phase map with a frequency of 64 using only the low-frequency wrapped phase map with a frequency of 1; the result contains a large number of error points. The invention uses a deep learning approach to achieve temporal phase unwrapping. Compared with the MF-TPU algorithm, a residual convolutional neural network is used to implement phase unwrapping, which exploits the low-frequency wrapped phase map with a frequency of 1 to unwrap the high-frequency wrapped phase map with a frequency of 64. An absolute phase map with fewer phase errors and higher accuracy can be obtained by using this method.
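The precision evaluation of step five, counting error points between the predicted and reference fringe order maps, can be sketched as:

```python
import numpy as np

def count_error_points(k_pred, k_ref):
    """Count pixels where the network-predicted fringe order map disagrees
    with the reference map. Predictions are rounded to the nearest integer
    because fringe orders are integral."""
    return int(np.count_nonzero(np.rint(k_pred) != np.rint(k_ref)))
```

This is the metric behind the error-point counts reported for the comparisons in FIG. 5.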

Example of Implementation

[0020] To verify the actual performance of the proposed method described in the invention, a monochrome camera (Basler acA640-750um with the resolution of 640×480), a DLP projector (LightCrafter 4500Pro with the resolution of 912×1140), and a computer are used to construct a 3D measurement system based on a deep learning-based temporal phase unwrapping method for fringe projection profilometry, as shown in FIG. 2. The system captures the images at the speed of 25 Hz when measuring 3D profiles of objects. According to step one, four sets of three-step phase-shifting fringe patterns with different frequencies (including 1, 8, 32, and 64) are projected to the tested objects. The projected fringe patterns are captured by the camera simultaneously to acquire four sets of three-step phase-shifting fringe images. According to step two, the three-step phase-shifting fringe images acquired by the camera are processed to obtain the wrapped phase map using a three-step phase-shifting algorithm. According to step three, a multi-frequency temporal phase unwrapping (MF-TPU) algorithm is used to unwrap four wrapped phase maps successively to obtain a fringe order map and an absolute phase map for the high-frequency phase with 64 periods. According to step four, a residual convolutional neural network is built to implement phase unwrapping. Steps one to three are repeatedly performed to obtain 1100 sets of data, of which 800 sets of data are used as the training dataset, 200 sets of data as the validation dataset, and 100 sets of data as the test dataset. To monitor the accuracy of the trained neural networks on data that they have never seen before, a validation dataset is created that is separate from the training scenarios. The data after the background removal operation is used as the dataset of the residual convolutional neural network for training. 
In the network configuration, the loss function is set as mean square error (MSE), the optimizer is Adam, the size of mini-batch is 2, and the training epoch is set as 500. FIG. 4 shows the training and validation loss curves of the residual convolutional neural network after 500 rounds. FIG. 4 shows that the network stops converging after 250 rounds. The loss value of the final training dataset is about 0.0058 and the loss value of the final validation dataset is about 0.0087. According to step five, the trained residual convolutional neural network is used to predict the output data based on the input data in the test dataset for evaluating the accuracy of the network. A comparative experiment is implemented for a sample data in the test dataset, and the results are shown in FIG. 5. FIG. 5 shows a fringe order map of the phase for a sample data in the test dataset. FIG. 5(a) is a fringe order map of the phase with a frequency of 64 obtained using two sets of wrapped phase maps based on the MF-TPU algorithm. FIG. 5(b) is a fringe order map of the phase with a frequency of 64 obtained using two sets of wrapped phase maps based on the deep learning-based method. FIG. 5(c) is a fringe order map of the phase with a frequency of 64 obtained using four sets of wrapped phase maps based on the MF-TPU algorithm. FIG. 5(d) is the difference between FIG. 5(a) and FIG. 5(c); the number of error points is 8909. FIG. 5(e) is the difference between FIG. 5(b) and FIG. 5(c); the number of error points is 381. Compared with the multi-frequency temporal phase unwrapping (MF-TPU) algorithm, FIG. 5 demonstrates that an absolute phase map with fewer phase errors and higher accuracy can be obtained by using the deep learning-based method proposed in the invention. FIG. 6 shows 3D results of a sample data in the test dataset. FIG. 6(a) shows 3D results obtained using two sets of wrapped phase maps based on the MF-TPU algorithm. FIG. 6(b) shows 3D results obtained using two sets of wrapped phase maps based on the deep learning-based method. FIG. 6(c) shows 3D results obtained using four sets of wrapped phase maps based on the MF-TPU algorithm. The results in FIG. 6 further demonstrate that high-precision 3D measurements can be achieved without increasing the number of patterns projected by the projector, which improves the measurement efficiency.
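The 1100-set data split described in the example (800 training, 200 validation, 100 test) can be reproduced as below. The shuffle and the seed are assumptions; the patent does not state how the sets were assigned.

```python
import numpy as np

rng = np.random.default_rng(0)          # assumed seed, for reproducibility only
indices = rng.permutation(1100)          # 1100 sets of acquired data
train_idx = indices[:800]
val_idx = indices[800:1000]
test_idx = indices[1000:]
```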