DEPTH ESTIMATION AND COLOR CORRECTION METHOD FOR MONOCULAR UNDERWATER IMAGES BASED ON DEEP NEURAL NETWORK

Abstract

The invention discloses a depth estimation and color correction method for monocular underwater images based on a deep neural network, which belongs to the field of image processing and computer vision. The framework consists of two parts: a style transfer subnetwork and a task subnetwork. The style transfer subnetwork is constructed based on a generative adversarial network and transfers the appearance of underwater images to land images, yielding abundant and effective synthetic labeled data. The task subnetwork combines the underwater depth estimation and color correction tasks in a stacked network structure and learns them collaboratively to improve their respective accuracies. A domain adaptation strategy reduces the gap between synthetic and real underwater images, improving the network's ability to process real underwater images.

Claims

1. A method for depth estimation and color correction of monocular underwater images based on deep neural network, wherein the method comprises the following steps: (1) preparing initial data: the initial data is the land labeled dataset, including the land color map and the corresponding depth map for training; in addition, a small number of real underwater color images are collected to assist the training and testing; (2) construction of the style transfer subnetwork; (2-1) the style transfer subnetwork is constructed based on generative adversarial network model, in which the generator uses the U-Net structure, which is composed of an encoder and a decoder; (2-2) the discriminator consists of three parts; the first part is a module composed of convolution (Conv) and Leaky Rectified Linear Unit (Leaky ReLU); the second part is three modules composed of Conv, batch normalization (BN) and Leaky ReLU; the third part is a sigmoid function layer that is used to output the test results; (2-3) the style loss function and the content loss function are used to preserve the content and transform the style, and the total loss function of the whole style transfer subnetwork is constructed; (3) the construction of the task subnetwork; (3-1) depth estimation and color correction are separately realized by using two generative adversarial networks, in which the structure of generator and discriminator is the same as that of generator and discriminator in style transfer subnetwork; on this basis, the depth estimation generator and color correction generator are connected in series to form a stacked network structure; (3-2) two discriminators are used to realize the domain adaptation between the synthetic underwater image and the real underwater image; (3-3) constructing the total loss function of the entire task subnetwork; (4) training the whole network composed by (2) and (3); (4-1) the land labeled data and underwater real data are used to train the style transfer subnetwork, and then a convergent training model is obtained, so as to obtain effective synthetic underwater labeled data; (4-2) the synthetic underwater labeled dataset obtained by style transfer subnetwork is used to train the task subnetwork; real underwater images are simultaneously added to train together, so as to reduce the difference between real underwater domain and synthetic underwater domain and improve the network's ability to process real underwater images; (4-3) the two networks are connected in series according to the order of style transfer subnetwork and task subnetwork, and the total loss function is used for unified training and fine-tuning the whole network framework; when the training is completed, the trained model can be used for testing on the test set to obtain the output result of the corresponding input image.

2. The method for depth estimation and color correction of monocular underwater images based on deep neural network according to claim 1, wherein the construction of style transfer subnetwork includes the following steps: (2-1) the style transfer subnetwork is constructed based on the generative adversarial network model, in which the generator uses U-Net structure and the encoder is composed of four similar modules, each module containing a dense connection layer and a transition layer; the dense connection layer is composed of three dense blocks, and the transition layer is composed of batch normalization (BN), Rectified Linear Unit (ReLU), convolution (Conv) and average pooling; the decoder is composed of four symmetric modules, each of which is a combination of deconvolution, BN and ReLU; (2-2) the discriminator consists of three parts; the first part is a module composed of Conv and Leaky Rectified Linear Unit (Leaky ReLU); the second part is three modules composed of Conv, BN and Leaky ReLU; the third part is a sigmoid function layer that is used to output the test results; (2-3) the style loss function and the content loss function are used to preserve the content and transform the style; the formula of the style loss function $L_{sty}$ is shown as follows:
$L_{sty} = \sum_{l \in L_{s}} \left\| \mathcal{G}^{l}(x_{t}) - \mathcal{G}^{l}(G_{s}(y_{s}, d_{s})) \right\|_{2}^{2}$ in which, $G_{s}$ represents the generator, $L_{s}$ represents all the layers that need to be paid attention to in the style loss function, $\mathcal{G}^{l}$ represents the style representation of layer $l$, $x_{t}$ represents the real image, $y_{s}$ represents the land color image, $d_{s}$ represents the corresponding depth map, and $\|\cdot\|_{2}^{2}$ represents the square of the L2 norm; the content loss function $L_{con}$ is shown as follows:
$L_{con} = \sum_{l \in L_{c}} \left\| \phi^{l}(y_{s}) - \phi^{l}(G_{s}(y_{s}, d_{s})) \right\|_{2}^{2}$ in which, $L_{c}$ represents all the layers that need to be paid attention to in the content loss function, $\phi^{l}$ represents the feature map of layer $l$; thus, the total loss function $L_{SAN}$ of the entire style transfer subnetwork is:
$L_{SAN} = L_{adv_{s}} + \lambda_{a} L_{sty} + \lambda_{b} L_{con}$ in which, $L_{adv_{s}}$ represents the generative adversarial loss function of the style transfer subnetwork part, which is a common loss function in the generative adversarial network; $\lambda_{a}$ and $\lambda_{b}$ represent the weight parameters, both of which have the value 1.

3. The method for depth estimation and color correction of monocular underwater images based on deep neural network according to claim 2, wherein, in step (2-1), in order to obtain multi-scale information, a multi-scale module is added to the structure of the whole generator at the end.

4. The method for depth estimation and color correction of monocular underwater images based on deep neural network according to claim 1, wherein the construction of task subnetwork includes the following steps: (3-1) depth estimation and color correction are separately realized by using two generative adversarial networks, in which the structure of generator and discriminator is the same as that of generator and discriminator in style transfer subnetwork; on this basis, the depth estimation generator and color correction generator are connected in series to form a stacked network structure; (3-2) two discriminators are used to realize the domain adaptation between the synthetic underwater image and the real underwater image, which can enhance the network's ability to process the real underwater image, so as to solve the domain adaptation problem at the feature level; the structure of the domain adaptive discriminator is the same as that of the discriminator in (3-1); each discriminator has a special loss function to solve the domain adaptation at the feature level; the formula is shown as follows:
$L_{fd} = \mathbb{E}_{f_{x_{t}} \sim X_{t}}[\log D_{fd}(f_{x_{t}})] + \mathbb{E}_{f_{x_{s}} \sim X_{s}}[\log(1 - D_{fd}(f_{x_{s}}))]$ in which, $L_{fd}$ represents the domain discriminant loss function of depth estimation task, $D_{fd}$ represents the discriminator of depth estimation task, $\mathbb{E}$ represents expectation, $f$ represents the feature map obtained from the last translation layer of generator, $x_{t}$ represents real underwater images, $x_{s}$ represents synthetic images, $X_{t}$ represents real underwater images dataset, $X_{s}$ represents synthetic image dataset, $f_{x_{t}}$ represents the feature map of $x_{t}$, $f_{x_{s}}$ represents the feature map of $x_{s}$, $\mathbb{E}_{f_{x_{t}} \sim X_{t}}$ represents the expectation over the domain $X_{t}$, $\mathbb{E}_{f_{x_{s}} \sim X_{s}}$ represents the expectation over the domain $X_{s}$; the formula of the domain discriminant loss function of color correction task is as follows:
$L_{fc} = \mathbb{E}_{f_{x_{t}} \sim X_{t}}[\log D_{fc}(f_{x_{t}})] + \mathbb{E}_{f_{x_{s}} \sim X_{s}}[\log(1 - D_{fc}(f_{x_{s}}))]$ in which, $L_{fc}$ represents the domain discriminant loss function of color correction task, $D_{fc}$ represents the discriminator of color correction task; (3-3) constructing the total loss function of the entire task subnetwork; the task loss function is designed to make the predicted image approximate to the actual image and promote correct regression; the formula is as follows:
$L_{t} = \left\| d_{s} - G_{d}(x_{s}) \right\|_{1} + \left\| y_{s} - G_{c}(G_{d}(x_{s})) \right\|_{1}$ in which, $L_{t}$ represents the task loss function, $G_{d}$ and $G_{c}$ represent the generators for depth estimation and color correction respectively, $x_{s}$ represents the synthesized underwater data, $d_{s}$ represents the actual depth map corresponding to the synthesized underwater data, $y_{s}$ represents the actual land image corresponding to the synthesized underwater data, $\|\cdot\|_{1}$ represents the L1 norm; the total loss of the entire task network is $L_{TN}$:
$L_{TN} = L_{adv_{d}} + L_{adv_{c}} + \lambda_{t} L_{t} + \lambda_{d} L_{fd} + \lambda_{c} L_{fc}$ in which, $L_{adv_{d}}$ and $L_{adv_{c}}$ represent the generative adversarial losses of depth estimation and color correction parts, respectively, which are common losses in the generative adversarial network; $\lambda_{t}$, $\lambda_{d}$ and $\lambda_{c}$ represent balance coefficients, with values of 10, 0.1 and 0.1, respectively.

Description

DESCRIPTION OF DRAWINGS

[0025] FIG. 1 is the actual flow chart.

[0026] FIG. 2 is a schematic diagram of the network structure, in which $L_{con}$ is the content loss function; $G_{s}$ is the generator in the style transfer subnetwork; $L_{adv_{s}}$ is the generative adversarial loss function of the style transfer subnetwork; $D_{s}$ is the discriminator in the style transfer subnetwork; $L_{sty}$ is the style loss function; $D_{fd}$ is the domain adaptive discriminator for depth estimation in the task subnetwork; $L_{fd}$ is the domain discriminant loss function of the depth estimation task in the task subnetwork; $G_{d}$ is the depth estimation generator in the task subnetwork; $D_{fc}$ is the domain adaptive discriminator for color correction in the task subnetwork; $L_{fc}$ is the domain discriminant loss function of the color correction task in the task subnetwork; $G_{c}$ is the color correction generator in the task subnetwork.

[0027] FIG. 3 shows the results of color correction compared with other methods. (a) Different underwater images; (b) FIP method; (c) CBF method; (d) R-cycle method; (e) Pix2Pix method; (f) Results of the present invention.

[0028] FIG. 4 shows the results of depth estimation compared with other methods. (a) Different underwater images; (b) Laina method; (c) Results of the present invention.

DETAILED DESCRIPTION

[0029] Specific embodiments of the present invention are further described below in combination with the accompanying drawings and the technical solution:

[0030] A method for depth estimation and color correction from monocular underwater images based on a deep neural network, as shown in FIG. 1, includes the following steps:

[0031] (1) Preparing initial data;

[0032] (1-1) Three representative real underwater datasets are used, including two video datasets (R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-world underwater enhancement: Challenges, benchmarks, and solutions,” arXiv preprint arXiv: 1901.05320, 2019) and one image dataset (C. Li, C. Guo, W. Ren, R. Cong, J. Hou, S. Kwong, and D. Tao, “An underwater image enhancement benchmark dataset and beyond”, arXiv preprint arXiv: 1901.05495, 2019). The videos in the two video datasets are split to obtain about 500 frames of real underwater images. The latter image dataset contains about 100 images.

[0033] (1-2) The NYU RGB-D v2 dataset (N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images”, in ECCV, 2012, pp. 746-760) is used as the land dataset of this invention. It contains 1449 land color images and their corresponding depth maps. This invention uses 795 image pairs for training and 654 for testing.

[0034] (2) The construction of the style transfer subnetwork;

[0035] (2-1) The style transfer subnetwork is constructed based on the generative adversarial network model, in which the generator uses the U-Net structure (O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation”, in MICCAI, 2015, pp. 234-241.). The encoder is composed of four similar modules, each module containing a dense connection layer (G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks”, in IEEE CVPR, 2017, pp. 2261-2269.) and a transition layer. The dense connection layer is composed of three dense blocks, and the transition layer is composed of batch normalization (BN), Rectified Linear Unit (ReLU), convolution (Conv) and average pooling. The decoder is composed of four symmetric modules, each of which is a combination of deconvolution (DConv), BN and ReLU. In order to obtain multi-scale information, the invention adds a multi-scale module at the end of the whole generator structure (L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE TPAMI, vol. PP, no. 99, pp. 1-1, 2017.).
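
As a rough illustration of the generator layout described in (2-1), the following Python sketch lists the modules as plain data. It is not the implementation of the invention: the function name `generator_layout` and the string labels are hypothetical, channel counts, kernel sizes and strides are omitted because the text does not specify them, and the U-Net skip connections are not modeled.

```python
def generator_layout():
    """Return the generator modules of (2-1) as an ordered list of (stage, ops) pairs."""
    # Transition layer as listed in the text: BN, ReLU, Conv, average pooling.
    transition = ["BN", "ReLU", "Conv", "AvgPool"]
    # Dense connection layer: three dense blocks.
    dense_layer = ["DenseBlock"] * 3

    layout = []
    # Encoder: four similar modules, each a dense connection layer plus a transition layer.
    for i in range(1, 5):
        layout.append((f"encoder_{i}", dense_layer + transition))
    # Decoder: four symmetric modules, each a combination of DConv, BN and ReLU.
    for i in range(4, 0, -1):
        layout.append((f"decoder_{i}", ["DConv", "BN", "ReLU"]))
    # Multi-scale module added at the end of the whole generator.
    layout.append(("multi_scale", ["MultiScaleModule"]))
    return layout


if __name__ == "__main__":
    for stage, ops in generator_layout():
        print(stage, "->", " + ".join(ops))
```

Listing the stages this way makes the symmetry between the four encoder modules and the four decoder modules easy to check before committing to concrete layer hyperparameters.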

[0036] (2-2) The discriminator consists of three parts. The first part is a module composed of Conv and Leaky Rectified Linear Unit (Leaky ReLU). The second part is three modules composed of Conv, BN and Leaky ReLU. The third part is a sigmoid function layer that is used to output the test results.
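
The three-part discriminator of (2-2) can be sketched in the same spirit. The snippet below is only an illustration under assumptions: the names `discriminator_layout`, `leaky_relu` and `sigmoid` are hypothetical, and the negative slope of 0.2 is an assumed value that the text does not give.

```python
import math


def leaky_relu(x, slope=0.2):
    """Leaky ReLU on a scalar; the slope 0.2 is an assumption, not from the text."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Sigmoid used by the final layer to map a score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_layout():
    """Return the discriminator parts of (2-2) as an ordered list of (part, ops) pairs."""
    # Part 1: one module of Conv + Leaky ReLU.
    layout = [("part_1", ["Conv", "LeakyReLU"])]
    # Part 2: three modules of Conv + BN + Leaky ReLU.
    layout += [(f"part_2_{i}", ["Conv", "BN", "LeakyReLU"]) for i in range(1, 4)]
    # Part 3: sigmoid layer that outputs the decision.
    layout.append(("part_3", ["Sigmoid"]))
    return layout


if __name__ == "__main__":
    for part, ops in discriminator_layout():
        print(part, "->", " + ".join(ops))
    print(sigmoid(0.0))
```

Because the last layer is a sigmoid, the discriminator output can be read as a probability, which is what the logarithmic adversarial losses later in the description expect.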

[0037] (2-3) The style loss function and the content loss function are used to preserve the content and transform the style. The formula of the style loss function $L_{sty}$ is shown as follows:

[00001] $L_{sty} = \sum_{l \in L_{s}} \left\| \mathcal{G}^{l}(x_{t}) - \mathcal{G}^{l}(G_{s}(y_{s}, d_{s})) \right\|_{2}^{2}$

in which, $G_{s}$ represents the generator, $L_{s}$ represents all the layers that need to be paid attention to in the style loss function, $\mathcal{G}^{l}$ represents the style representation of layer $l$, $x_{t}$ represents the real image, $y_{s}$ represents the land color image, $d_{s}$ represents the corresponding depth map, and $\|\cdot\|_{2}^{2}$ represents the square of the L2 norm.
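
A minimal sketch of how such a style loss could be evaluated is given below. It assumes that the style representation of a layer is the Gram matrix of its feature maps, which is a common choice in style transfer but is not stated in the text. The function names and the nested-list data layout (one list of channels per layer, each channel a flat list of activations) are hypothetical, and the Gram matrix is left unnormalized.

```python
def gram_matrix(features):
    """Gram matrix of one layer: features is a list of C channels, each a flat list of N values."""
    c = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j])) for j in range(c)]
        for i in range(c)
    ]


def squared_l2(a, b):
    """Squared L2 distance between two matrices of the same shape (lists of lists)."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def style_loss(real_feats, generated_feats):
    """Sum over the selected layers of the squared L2 distance between style representations."""
    return sum(
        squared_l2(gram_matrix(r), gram_matrix(g))
        for r, g in zip(real_feats, generated_feats)
    )


if __name__ == "__main__":
    # One layer, two channels, two spatial positions (toy values).
    real = [[[1.0, 0.0], [0.0, 1.0]]]
    generated = [[[1.0, 0.0], [1.0, 0.0]]]
    print(style_loss(real, generated))
```

In this toy case the two Gram matrices differ only in their off-diagonal entries, so the loss measures how differently the channels co-activate rather than where in the image they fire.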

[0038] The content loss function $L_{con}$ is shown as follows:

[00002] $L_{con} = \sum_{l \in L_{c}} \left\| \phi^{l}(y_{s}) - \phi^{l}(G_{s}(y_{s}, d_{s})) \right\|_{2}^{2}$

in which, $L_{c}$ represents all the layers that need to be paid attention to in the content loss function, and $\phi^{l}$ represents the feature map of layer $l$.
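
The content loss compares feature maps directly, layer by layer. The sketch below illustrates this with flattened feature maps stored as Python lists; the function names and the data layout are hypothetical, and the feature extractor that would produce these maps is left out.

```python
def squared_l2(a, b):
    """Squared L2 distance between two flattened feature maps of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def content_loss(land_feats, generated_feats):
    """Sum over the selected layers of the squared L2 distance between feature maps.

    land_feats[l] is the feature map of the land color image at layer l,
    generated_feats[l] is the feature map of the generator output at the same layer.
    """
    return sum(squared_l2(f_land, f_gen) for f_land, f_gen in zip(land_feats, generated_feats))


if __name__ == "__main__":
    land = [[1.0, 2.0], [0.0]]        # two layers with toy activations
    generated = [[1.0, 0.0], [3.0]]
    print(content_loss(land, generated))
```

Unlike the style loss, this term keeps the spatial arrangement of the features, which is why it preserves the scene content of the land image.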

[0039] Thus, the total loss function $L_{SAN}$ of the entire style transfer subnetwork is:

$L_{SAN} = L_{adv_{s}} + \lambda_{a} L_{sty} + \lambda_{b} L_{con}$

in which, $L_{adv_{s}}$ represents the generative adversarial loss function of the style transfer subnetwork, which is a common loss function in generative adversarial networks. $\lambda_{a}$ and $\lambda_{b}$ represent the weight parameters, both of which are set to 1.

[0040] (3) The construction of the task subnetwork;

[0041] (3-1) Depth estimation and color correction are separately realized by using two generative adversarial networks, in which the structure of generator and discriminator is the same as that of generator and discriminator in style transfer subnetwork. On this basis, the depth estimation generator and color correction generator are connected in series to form a stacked network structure.

[0042] (3-2) Two discriminators are used to realize the domain adaptation between the synthetic underwater image and the real underwater image, which can enhance the network's ability to process the real underwater image, so as to solve the domain adaptation problem at the feature level. The structure of the domain adaptive discriminator is the same as that of the discriminator in (3-1). Each discriminator has a special loss function to solve the domain adaptation at the feature level. The formula is shown as follows:

$L_{fd} = \mathbb{E}_{f_{x_{t}} \sim X_{t}}[\log D_{fd}(f_{x_{t}})] + \mathbb{E}_{f_{x_{s}} \sim X_{s}}[\log(1 - D_{fd}(f_{x_{s}}))]$

in which, $L_{fd}$ represents the domain discriminant loss function of the depth estimation task, $D_{fd}$ represents the discriminator of the depth estimation task, $\mathbb{E}$ represents expectation, $f$ represents the feature map obtained from the last translation layer of the generator, $x_{t}$ represents a real underwater image, $x_{s}$ represents a synthetic image, $X_{t}$ represents the real underwater image dataset, $X_{s}$ represents the synthetic image dataset, $f_{x_{t}}$ represents the feature map of $x_{t}$, $f_{x_{s}}$ represents the feature map of $x_{s}$, $\mathbb{E}_{f_{x_{t}} \sim X_{t}}$ represents the expectation over the domain $X_{t}$, and $\mathbb{E}_{f_{x_{s}} \sim X_{s}}$ represents the expectation over the domain $X_{s}$.

[0043] The formula of the domain discriminant loss function of the color correction task is as follows:

$L_{fc} = \mathbb{E}_{f_{x_{t}} \sim X_{t}}[\log D_{fc}(f_{x_{t}})] + \mathbb{E}_{f_{x_{s}} \sim X_{s}}[\log(1 - D_{fc}(f_{x_{s}}))]$

in which, $L_{fc}$ represents the domain discriminant loss function of the color correction task, and $D_{fc}$ represents the discriminator of the color correction task.
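
Both domain discriminant losses share the same form, so a single helper can illustrate them. The sketch below estimates each expectation by a batch mean; the function name is hypothetical, and its inputs are assumed to be discriminator outputs in the open interval (0, 1), as produced by a final sigmoid layer.

```python
import math


def domain_discriminant_loss(d_real, d_synth):
    """Batch estimate of E[log D(f_xt)] + E[log(1 - D(f_xs))].

    d_real:  discriminator outputs for feature maps of real underwater images.
    d_synth: discriminator outputs for feature maps of synthetic underwater images.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    synth_term = sum(math.log(1.0 - p) for p in d_synth) / len(d_synth)
    return real_term + synth_term


if __name__ == "__main__":
    # A discriminator that cannot tell the domains apart outputs 0.5 everywhere.
    print(domain_discriminant_loss([0.5, 0.5], [0.5, 0.5]))
    # A confident, correct discriminator pushes the value towards 0.
    print(domain_discriminant_loss([0.99], [0.01]))
```

In the usual adversarial setup the discriminator is trained to increase this quantity while the generator is trained to decrease it, so that real and synthetic feature maps become hard to distinguish. The same helper applies to the depth estimation and the color correction discriminators.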

[0044] (3-3) Constructing the total loss function of the entire task subnetwork;

[0045] First, the task loss function is designed to make the predicted image approximate the actual image and promote correct regression. The formula is as follows:

$L_{t} = \left\| d_{s} - G_{d}(x_{s}) \right\|_{1} + \left\| y_{s} - G_{c}(G_{d}(x_{s})) \right\|_{1}$

in which, $L_{t}$ represents the task loss function, $G_{d}$ and $G_{c}$ represent the generators for depth estimation and color correction respectively, $x_{s}$ represents the synthesized underwater data, $d_{s}$ represents the actual depth map corresponding to the synthesized underwater data, $y_{s}$ represents the actual land image corresponding to the synthesized underwater data, and $\|\cdot\|_{1}$ represents the L1 norm.
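
The stacked structure is visible in the second term of the task loss, where the color correction generator receives the output of the depth estimation generator. The following sketch follows the formula literally; the function names are hypothetical, images are flattened to lists, and the two toy generators in the usage example are stand-ins, not trained networks.

```python
def l1_norm(a, b):
    """L1 distance between two flattened images of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def task_loss(d_s, y_s, x_s, g_d, g_c):
    """L1 task loss for the stacked generators.

    d_s: ground-truth depth map, y_s: ground-truth land color image,
    x_s: synthetic underwater image, g_d / g_c: callables standing in
    for the depth estimation and color correction generators.
    """
    depth_pred = g_d(x_s)         # first stage: depth estimation
    color_pred = g_c(depth_pred)  # second stage: color correction on the first stage output
    return l1_norm(d_s, depth_pred) + l1_norm(y_s, color_pred)


if __name__ == "__main__":
    toy_g_d = lambda x: [0.5 * v for v in x]   # hypothetical stand-in generator
    toy_g_c = lambda d: [v + 1.0 for v in d]   # hypothetical stand-in generator
    print(task_loss([1.0, 3.0], [2.5, 3.0], [2.0, 4.0], toy_g_d, toy_g_c))
```

Because the color term depends on the depth prediction, an error in the first stage also shows up in the second term, which is one way the two tasks are coupled during training.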

[0046] Thus, the total loss of the entire task network is $L_{TN}$:

$L_{TN} = L_{adv_{d}} + L_{adv_{c}} + \lambda_{t} L_{t} + \lambda_{d} L_{fd} + \lambda_{c} L_{fc}$

in which, $L_{adv_{d}}$ and $L_{adv_{c}}$ represent the generative adversarial losses of the depth estimation and color correction parts, respectively, which are common losses in generative adversarial networks; $\lambda_{t}$, $\lambda_{d}$ and $\lambda_{c}$ represent balance coefficients, with values of 10, 0.1 and 0.1, respectively. The entire network structure is shown in FIG. 2.
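
To make the effect of the balance coefficients concrete, the sketch below combines already computed loss terms with the stated weights of 10, 0.1 and 0.1. The function name and the numeric inputs in the usage example are hypothetical and chosen only for illustration.

```python
def task_network_total_loss(adv_d, adv_c, l_t, l_fd, l_fc,
                            lambda_t=10.0, lambda_d=0.1, lambda_c=0.1):
    """Weighted sum of the task subnetwork losses with the coefficients given in the text."""
    return adv_d + adv_c + lambda_t * l_t + lambda_d * l_fd + lambda_c * l_fc


if __name__ == "__main__":
    # Toy values: the L1 task loss is weighted 100 times more than each domain term.
    print(task_network_total_loss(adv_d=0.7, adv_c=0.6, l_t=1.5, l_fd=-1.0, l_fc=-2.0))
```

With these weights the regression term dominates the sum, while the domain discriminant terms act as a comparatively light regularizer.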

[0047] (4) Training the whole network composed by (2) and (3).

[0048] (4-1) First, the land paired data (NYU RGB-D v2) and real underwater data are used to train the style transfer subnetwork until it converges, and the converged model is used to generate an effective synthetic underwater labeled dataset.

[0049] (4-2) Then, the synthetic underwater labeled dataset obtained by style transfer subnetwork is used to train the task subnetwork, and real underwater images are simultaneously added to train together, so as to reduce the difference between real underwater domain and synthetic underwater domain and improve the network's ability to process real underwater images.

[0050] (4-3) The two networks are connected in series in the order of style transfer subnetwork and task subnetwork, and the total loss function $L$ is used to train and fine-tune the whole network framework. The equation is shown as follows:

$L = L_{SAN} + L_{TN}$

During training, the momentum parameter is set to 0.9. The learning rate is initialized to 2e-4 and is multiplied by 0.9 after each epoch. When the training is completed, the trained model can be used for testing on the test set to obtain the output result of the corresponding input image.
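
The joint objective and the learning rate schedule can be sketched as follows, assuming that the decay of 0.9 per epoch is multiplicative, so that the rate at epoch $e$ is $2 \times 10^{-4} \cdot 0.9^{e}$. The function names are hypothetical, and the optimizer itself is not shown because the text only gives its momentum parameter.

```python
def learning_rate(epoch, initial_lr=2e-4, decay=0.9):
    """Learning rate at a given epoch under an assumed multiplicative decay per epoch."""
    return initial_lr * decay ** epoch


def joint_loss(l_san, l_tn):
    """Total loss for joint fine-tuning: L = L_SAN + L_TN."""
    return l_san + l_tn


if __name__ == "__main__":
    for epoch in range(3):
        print(epoch, learning_rate(epoch))
```

Under this reading the learning rate falls to roughly one third of its initial value after ten epochs, since 0.9 raised to the tenth power is about 0.35.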

[0051] The comparison results of color correction with other methods are shown in FIG. 3: (a) Different real underwater images; (b) FIP method (Q. Chen, J. Xu, and V. Koltun, “Fast image processing with fully convolutional networks”, in IEEE ICCV, October 2017, pp. 2516-2525); (c) CBF method (C. O. Ancuti, C. Ancuti, V. C. De, and P. Bekaert, “Color balance and fusion for underwater image enhancement”, IEEE TIP, vol. 27, no. 1, pp. 379-393, 2018); (d) R-cycle method (C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer”, IEEE Signal Processing Letters, vol. 25, no. 3, pp. 323-327, 2018); (e) Pix2Pix method (P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks”, in IEEE CVPR, July 2017, pp. 5967-5976); (f) Results of the present invention.

[0052] The comparison results of depth estimation with other methods are shown in FIG. 4: (a) Different real underwater images; (b) Laina method (I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks”, in Fourth International Conference on 3d Vision, 2016, pp. 239-248); (c) Results of the present invention.

[0053] The comparisons in FIG. 3 and FIG. 4 show that the present invention achieves the best results among the compared methods in both the depth estimation and the color correction tasks.

CPC classification

G06T7/55 (PHYSICS)
G06V10/82 (PHYSICS)
G06F18/214 (PHYSICS)
H04N1/60 (ELECTRICITY)
G06T2207/10024 (PHYSICS)
G06T2207/20084 (PHYSICS)
G06F18/213 (PHYSICS)
G06V20/05 (PHYSICS)
G06T7/50 (PHYSICS)
G06T2207/20081 (PHYSICS)

International classification

G06K9/62 (PHYSICS)
G06T7/55 (PHYSICS)
H04N1/60 (ELECTRICITY)
