ACCELERATION METHOD OF DEPTH ESTIMATION FOR MULTIBAND STEREO CAMERAS

Abstract

The present invention belongs to the field of image processing and computer vision, and discloses an acceleration method of depth estimation for multiband stereo cameras. In the process of depth estimation, during binocular stereo matching in each band, through compression of matched images, on one hand, disparity equipotential errors caused by binocular image correction can be offset to make the matching more accurate, and on the other hand, calculation overhead is reduced. In addition, before cost aggregation, cost diagrams are transversely compressed and sparsely matched, thereby reducing the calculation overhead again. Disparity diagrams obtained under different modes are fused to obtain all-weather, more complete and more accurate depth information.

Claims

1. An acceleration method of depth estimation for multiband stereo cameras, comprising the following steps: step 1, calibrating respective internal and external parameters of multiband binocular cameras, wherein the internal parameters comprise a focal length and an optical center, and the external parameters comprise rotation and translation; correcting binocular images outputted by the binocular cameras in different bands into a parallel equipotential relationship; and jointly calibrating the multiband binocular cameras to obtain position relationship parameters among devices of different bands, comprising rotation and translation; step 2, registering the multiband binocular cameras to obtain a coordinate conversion matrix of corresponding pixels among images collected by the devices of different bands, i.e., a homography matrix; denoising a binocular image pair in each band; and conducting longitudinal compression on the image in each band to improve matching efficiency; step 3, matching the binocular images according to a semi-global matching (SGM) method to obtain respective initial cost diagrams of the multiband binocular images; compressing the initial cost diagrams to increase propagation efficiency; conducting energy propagation on the compressed sparse cost diagrams to correct and optimize erroneous matches; finding a disparity corresponding to minimum energy for each pixel position according to the energy propagation diagram to obtain a disparity diagram; step 4, optimizing the disparity diagram, recovering the optimized disparity to the original scale through an upsampling method, and obtaining a final depth map according to a disparity fusion method.

2. The acceleration method of depth estimation for multiband stereo cameras according to claim 1, wherein in step 2, the longitudinal compression of the image in each band is conducted as follows: every two adjacent rows are averaged and compressed into one row, i.e.,
I(x,y quotient 2)=(I(x,y)+I(x,y+1))/2, wherein quotient represents the integer quotient operation; I is the image to be compressed, (x,y quotient 2) is the image coordinate after compression, and (x,y) is the image coordinate before compression.

3. The acceleration method of depth estimation for multiband stereo cameras according to claim 1, wherein in step 3, the specific mode of compressing the initial cost diagram is to conduct transverse down-sampling for the initial cost diagram and keep data in alternate columns.

4. The acceleration method of depth estimation for multiband stereo cameras according to claim 1, wherein in step 3, energy propagation is conducted on the sparse cost diagram to obtain an energy diagram, and the disparity diagram is then obtained by the “winner takes all” principle; the energy is described as follows: $\begin{matrix} E (D) = \sum_{p} C (p, D_{p}) + \sum_{q \in N_{p}} P_{1} T [ | D_{p} - D_{q} | = 1 ] + \sum_{q \in N_{p}} P_{2} T [ | D_{p} - D_{q} | > 1 ] & (3) \end{matrix}$ wherein C(p,D_p) is the cost at position p when the disparity is D_p, T[⋅] is an indicator function whose output is 1 when the input satisfies the condition within [ ] and 0 otherwise; P_1 and P_2 are penalty terms; D_q is the disparity value at position q; according to formula (4), in accordance with the global structure of the image, the cost distribution information of surrounding pixels is transmitted to the center pixel from multiple directions: $\begin{matrix} L_{r} (p, d) = C (p, d) + \min (\begin{matrix} L_{r} (p - r, d), L_{r} (p - r, d - 1) + P_{1}, \\ L_{r} (p - r, d + 1) + P_{1}, \\ \min_{i} L_{r} (p - r, i) + P_{2} \end{matrix}) - \min_{k} L_{r} (p - r, k) & (4) \end{matrix}$ wherein L_r(p,d) is the aggregation energy when the disparity at position p is d, and r is a transfer direction; after energy propagation, a tensor of size height(H)×width(W)×maximum disparity(D), i.e., the energy propagation diagram, is obtained; a disparity corresponding to minimum energy is found for each pixel position according to the energy propagation diagram, which is the integer disparity d(x,y) of the pixel: $\begin{matrix} d (x, y) = \arg \min_{i \in D_{\max}} energy (i) & (5) \end{matrix}$ wherein energy(⋅) is the energy after aggregation; a subpixel level disparity diagram is calculated by using the energy diagram and the integral pixel disparity diagram.

Description

DESCRIPTION OF DRAWINGS

[0018] FIG. 1 is an overall flow chart of a solution;

[0019] FIG. 2 shows a detailed flow of a disparity fusion module; and

[0020] FIG. 3 is an effect diagram after disparity fusion.

DETAILED DESCRIPTION

[0021] The present invention uses a multiband sensor device and a binocular disparity estimation method to fuse the disparity diagrams obtained in different bands, and calculates distance information from the fused disparity according to the triangulation measurement principle, so as to exploit the imaging advantages of devices of different bands under different environments. By taking the depth estimation of a pair of visible light binocular cameras and a pair of infrared binocular cameras as an example, a specific implementation solution is as follows:

[0022] FIG. 1 shows an overall flow of the solution.

[0023] Step 1, respectively calibrating each lens of a visible light binocular camera and an infrared binocular camera and jointly calibrating respective systems;

[0024] 1.1 Respectively calibrating the infrared camera and the visible light camera by the Zhang Zhengyou calibration method to obtain internal parameters such as focal length and principal point position and external parameters such as rotation and translation of each camera.

[0025] 1.2 Jointly calibrating the visible light binocular camera to obtain external parameters such as rotation and translation of two cameras of a binocular camera system; jointly calibrating the infrared binocular camera to obtain external parameters such as rotation and translation of two cameras of an infrared binocular system; correcting output image pairs according to the respective external parameters of the binocular camera system so that the binocular images outputted by the same binocular camera system satisfy the parallel equipotential relationship.

[0026] Step 2, jointly calibrating and registering the visible light binocular camera system and the infrared camera system;

[0027] 2.1 Jointly calibrating the left lens of the visible light binocular system and the left lens of the infrared binocular system by the Zhang Zhengyou calibration method to obtain external parameters such as rotation and translation between the visible light camera and the infrared camera.

[0028] 2.2 Simultaneously shooting images of checkerboards in different planes with the two pairs of binocular cameras, calculating the positional relationship of the same plane in the visible light image and the infrared image by using the rotation and translation (RT) obtained by joint calibration and the detected checkerboard corners, and representing the positional relationship with a homography matrix H.
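For illustration only, the way a 3×3 homography matrix H relates a pixel in one image to the corresponding pixel in the other can be sketched as follows. The function name apply_homography and the convention that H maps visible light coordinates to infrared coordinates are assumptions; the text does not fix the mapping direction.

```python
def apply_homography(h, x, y):
    """Map pixel (x, y) through a 3x3 homography given as a nested list.

    Hypothetical helper: the pixel is lifted to homogeneous coordinates
    (x, y, 1), multiplied by h, and divided by the third component.
    """
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w
```

Because H is defined only up to scale, multiplying all nine entries by the same non-zero factor leaves the mapped pixel unchanged.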

[0029] 2.3 Denoising and filtering input images by a Gaussian filtering algorithm. The weight of a Gaussian filtering window is determined by a Gaussian function, formula 1.

[00001] $\begin{matrix} h (x, y) = e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}} & (1) \end{matrix}$

[0030] wherein (x,y) is a point coordinate and σ is the standard deviation. The Gaussian function is discretized to obtain a weight matrix, i.e., a Gaussian filter.

[0031] Through Gaussian filtering, noise is effectively suppressed and the image is smoothed, which reduces subsequent matching errors caused by the noise.
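A minimal sketch of the discretization described in [0030], given for illustration only. The window radius, the normalization of the weights so that they sum to 1, and the replication of border pixels are assumptions that formula (1) does not specify; the function names are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Sample h(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) on a square grid.

    The weights are normalized to sum to 1 (an assumption; formula (1)
    carries no normalization constant).
    """
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-radius, radius + 1)]
               for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def gaussian_filter(image, radius=1, sigma=1.0):
    """Filter a 2-D list image[y][x]; border pixels are replicated (assumed)."""
    kernel = gaussian_kernel(radius, sigma)
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    yy = min(max(y + ky, 0), height - 1)
                    xx = min(max(x + kx, 0), width - 1)
                    acc += kernel[ky + radius][kx + radius] * image[yy][xx]
            out[y][x] = acc
    return out
```

With normalized weights a constant image passes through unchanged, so the filter smooths noise without shifting the overall brightness.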

[0032] 2.4 Longitudinally compressing the left input image and the right input image, so that on one hand, parallel equipotential errors caused by binocular correction are offset, and on the other hand, matching speed is increased.

[0033] The longitudinal compression method is as follows: every two adjacent rows are averaged and compressed into one row, i.e.,

I(x,y quotient 2)=(I(x,y)+I(x,y+1))/2

[0034] wherein quotient represents the integer quotient operation, I is the image to be compressed, (x,y quotient 2) is the image coordinate after compression, and (x,y) is the image coordinate before compression.
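The row-averaging rule can be sketched as follows, for illustration only. The image is assumed to be stored as a nested list image[y][x]; the function name compress_rows and the choice to drop a trailing unpaired row are assumptions not stated in the text.

```python
def compress_rows(image):
    """Average each pair of adjacent rows into one row.

    Output row y // 2 holds (image[y][x] + image[y + 1][x]) / 2 for even y.
    A trailing row without a partner is dropped (an assumption).
    """
    paired_height = len(image) - len(image) % 2
    return [[(a + b) / 2 for a, b in zip(image[y], image[y + 1])]
            for y in range(0, paired_height, 2)]
```

For example, a four-row image becomes a two-row image of the same width, which halves the number of scan lines that the subsequent matching has to process.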

[0035] Step 3, 3.1 conducting initial cost calculation. Sliding window matching based on Census features is taken as an example herein to describe the flow of matching cost calculation.

[0036] A Census feature descriptor of each pixel is obtained. A sliding window is used for search on a scanning line to calculate the cost corresponding to the possible disparity of each pixel, formula 2:

$\begin{matrix} Cost_{d} (x, y) = HD (CensusL (x, y), CensusR (x - d, y)), \quad d \in D_{\max} & (2) \end{matrix}$

[0037] In the formula, HD( ) represents a Hamming distance, and CensusL and CensusR are respectively Census feature descriptors of a left diagram pixel and a right diagram pixel. The output of cost calculation is a tensor of size height(H)×width(W)×maximum disparity(D), i.e., the initial cost diagram.
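The cost calculation of formula (2) can be sketched as follows, for illustration only. The 3×3 Census window, the comparison rule (a bit is set where the neighbor is darker than the center), the border replication, and the placeholder cost used where x − d falls outside the right image are assumptions; the function names are hypothetical.

```python
def census_transform(image, radius=1):
    """Return an integer bit-string descriptor for every pixel of image[y][x]."""
    height, width = len(image), len(image[0])
    census = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    # Append one bit: 1 where the neighbor is darker.
                    bits = (bits << 1) | (image[yy][xx] < image[y][x])
            census[y][x] = bits
    return census


def cost_volume(left, right, max_disparity, radius=1):
    """Build cost[y][x][d] = Hamming distance of CensusL(x, y), CensusR(x - d, y)."""
    census_l = census_transform(left, radius)
    census_r = census_transform(right, radius)
    height, width = len(left), len(left[0])
    # Placeholder larger than any possible Hamming distance (assumed),
    # used where x - d lies outside the right image.
    invalid = (2 * radius + 1) ** 2
    cost = [[[invalid] * max_disparity for _ in range(width)]
            for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for d in range(min(max_disparity, x + 1)):
                diff = census_l[y][x] ^ census_r[y][x - d]
                cost[y][x][d] = bin(diff).count("1")
    return cost
```

The result has the H×W×D layout described above, with the disparity as the innermost index.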

[0038] 3.2 Transversely compressing the initial cost diagram. In energy propagation, the cost at the current position is updated with reference to information from positions along semi-global paths through the image. In order to improve the calculation efficiency, semi-global propagation is changed to sparse propagation; experiments show that this operation reduces the calculation overhead without a loss of accuracy. The specific mode is to conduct transverse down-sampling on the initial cost diagram and keep the data in alternate columns.
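For illustration only, the transverse down-sampling could be sketched as follows, assuming the cost diagram is stored as a nested list cost[y][x][d]. The function name and the choice to keep the even-indexed columns (rather than the odd-indexed ones) are assumptions.

```python
def keep_alternate_columns(cost):
    """Keep columns x = 0, 2, 4, ... of a cost volume indexed cost[y][x][d]."""
    return [row[::2] for row in cost]
```

The height and the disparity range are untouched; only the number of columns that take part in energy propagation is roughly halved.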

[0039] 3.3 Conducting energy propagation on the sparse cost diagram to obtain an energy diagram and then to obtain the disparity diagram.

[0040] The energy is described as follows:

[00002] $\begin{matrix} E (D) = \sum_{p} C (p, D_{p}) + \sum_{q \in N_{p}} P_{1} T [ | D_{p} - D_{q} | = 1 ] + \sum_{q \in N_{p}} P_{2} T [ | D_{p} - D_{q} | > 1 ] & (3) \end{matrix}$

[0041] wherein C(p,D_p) is the cost at position p when the disparity is D_p, and T[⋅] is an indicator function whose output is 1 when the input satisfies the condition within [ ] and 0 otherwise. P_1 and P_2 are penalty terms. D_q is the disparity value at position q.

[0042] According to formula (4), in accordance with the global structure of the image, the cost distribution information of surrounding pixels is transmitted to the center pixel from multiple directions.

[00003] $\begin{matrix} L_{r} (p, d) = C (p, d) + \min (\begin{matrix} L_{r} (p - r, d), L_{r} (p - r, d - 1) + P_{1}, \\ L_{r} (p - r, d + 1) + P_{1}, \\ \min_{i} L_{r} (p - r, i) + P_{2} \end{matrix}) - \min_{k} L_{r} (p - r, k) & (4) \end{matrix}$

[0043] L_r(p,d) is the aggregation energy when the disparity at position p is d, and r is a transfer direction.

[0044] After energy propagation, a tensor of size height(H)×width(W)×maximum disparity(D), i.e., the energy diagram, is obtained. As an example, energy propagation is conducted successively in four directions: from top to bottom (TB), from top left to bottom right (LTB), from left to right (LR) and from right to left (RL).
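A minimal sketch of formula (4) for a single transfer direction (left to right), given for illustration only. The cost diagram is assumed to be a nested list cost[y][x][d]; the function name and the initialization of the first pixel of each path with its raw cost are assumptions. The other directions would follow the same recurrence with a different traversal order.

```python
def aggregate_left_to_right(cost, p1, p2):
    """Propagate energy along each row from left to right per formula (4)."""
    height, width, ndisp = len(cost), len(cost[0]), len(cost[0][0])
    energy = [[[0] * ndisp for _ in range(width)] for _ in range(height)]
    inf = float("inf")
    for y in range(height):
        # No predecessor at the start of the path: use the raw cost (assumed).
        energy[y][0] = list(cost[y][0])
        for x in range(1, width):
            prev = energy[y][x - 1]
            prev_min = min(prev)
            for d in range(ndisp):
                best = min(
                    prev[d],                                   # same disparity
                    prev[d - 1] + p1 if d > 0 else inf,        # change of -1
                    prev[d + 1] + p1 if d + 1 < ndisp else inf,  # change of +1
                    prev_min + p2,                             # larger jump
                )
                # Subtracting prev_min keeps the values from growing
                # without bound along the path.
                energy[y][x][d] = cost[y][x][d] + best - prev_min
    return energy
```

For a one-row cost diagram [[[0, 5, 5], [5, 0, 5]]] with P_1 = 1 and P_2 = 3, the second pixel receives the energies [5, 1, 8]: disparity 1 stays cheapest, but it now carries the small penalty P_1 for differing from its left neighbor.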

[0045] A disparity corresponding to minimum energy is found for each pixel position according to an energy propagation diagram, which is the integer disparity d(x,y) of the pixel.

[00004] $\begin{matrix} d (x, y) = \arg \min_{i \in D_{\max}} energy (i) & (5) \end{matrix}$

[0046] The energy (⋅) is the energy after aggregation.

[0047] A subpixel level disparity diagram is calculated by using the energy diagram and an integral pixel disparity diagram.
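For illustration only, formula (5) and one possible subpixel refinement can be sketched as follows. The text does not state how the subpixel disparity is computed; the parabola fit through the energies at d−1, d and d+1 used here is a commonly used choice and is purely an assumption, as are the function names.

```python
def winner_takes_all(energy):
    """Formula (5): pick the disparity with minimum energy at each pixel.

    energy is indexed energy[y][x][d]; ties resolve to the smallest d.
    """
    return [[min(range(len(e)), key=e.__getitem__) for e in row]
            for row in energy]


def refine_subpixel(energy, disparity):
    """Refine integer disparities with a parabola fit (assumed method)."""
    refined = []
    for e_row, d_row in zip(energy, disparity):
        out_row = []
        for e, d in zip(e_row, d_row):
            value = float(d)
            # The fit needs both neighbors and a strictly convex triple.
            if 0 < d < len(e) - 1:
                denom = e[d - 1] - 2 * e[d] + e[d + 1]
                if denom > 0:
                    value += (e[d - 1] - e[d + 1]) / (2 * denom)
            out_row.append(value)
        refined.append(out_row)
    return refined
```

With the energies [4, 1, 2] at one pixel, the integer disparity is 1 and the fitted parabola has its vertex at 1.25, shifted toward the lower of the two neighboring energies.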

[0048] Step 4, 4.1 Filtering the obtained disparity diagram to remove invalid disparity.

[0049] Firstly, speckle filtering is conducted on the disparity diagram to remove outliers.

[00005] $\begin{matrix} p (x, y) = \left\{ \begin{matrix} 0, & \sum_{(i, j) \in \Omega (x, y)} T [ p (i, j) > (1 + t) \cdot p (x, y) \lor p (i, j) < (1 - t) \cdot p (x, y) ] > t1 \\ p (x, y), & \text{otherwise} \end{matrix} \right. & (6) \end{matrix}$

p(x,y) is the disparity value at position (x,y); t and t1 are thresholds, determined statistically from experiments and stored in hardware in advance; T[⋅] is an indicator function whose output is 1 when the input satisfies the condition within [ ] and 0 otherwise; Ω(x,y) is a local region centered on (x,y).
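A minimal sketch of this outlier removal, for illustration only. It reads formula (6) as counting the neighbors in Ω(x,y) whose disparity lies outside the band [(1 − t)·p(x,y), (1 + t)·p(x,y)] and zeroing the pixel when the count exceeds t1. The square window, its radius, the clipping of the window at the image border, and the function name are assumptions.

```python
def remove_outliers(disp, t, t1, radius=1):
    """Zero disparities that disagree with more than t1 of their neighbors."""
    height, width = len(disp), len(disp[0])
    out = [row[:] for row in disp]  # read from disp, write to a copy
    for y in range(height):
        for x in range(width):
            center = disp[y][x]
            count = 0
            for j in range(max(y - radius, 0), min(y + radius + 1, height)):
                for i in range(max(x - radius, 0), min(x + radius + 1, width)):
                    neighbor = disp[j][i]
                    if neighbor > (1 + t) * center or neighbor < (1 - t) * center:
                        count += 1
            if count > t1:
                out[y][x] = 0
    return out
```

In a 3×3 patch of disparity 10 with a single value of 50 in the middle, t = 0.2 and t1 = 4 zero only the middle pixel: all eight of its neighbors fall below 0.8 × 50, whereas each border pixel sees just one disagreeing neighbor.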

[0050] Median filtering is conducted on the disparity diagram.

p(x,y)=median_{(i,j)∈Ω(x,y)}(p(i,j)) (7)
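Formula (7) can be sketched as follows, for illustration only. The square window, its radius, the clipping of the window at the image border, and the use of the lower median for windows with an even number of samples are assumptions; the function name is hypothetical.

```python
import statistics


def median_filter(disp, radius=1):
    """Replace each disparity by the median of its local window."""
    height, width = len(disp), len(disp[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [disp[j][i]
                      for j in range(max(y - radius, 0), min(y + radius + 1, height))
                      for i in range(max(x - radius, 0), min(x + radius + 1, width))]
            # median_low always returns a value that occurs in the window,
            # so no new disparity values are invented at the borders.
            out[y][x] = statistics.median_low(window)
    return out
```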

[0051] 4.2 Restoring the scale of the disparity image. The width and the height of the disparity image obtained in the above process are both ½ of the original scale; nearest neighbor interpolation is used to upsample the disparity image in the transverse and longitudinal directions to restore the original scale.
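For illustration only, nearest neighbor upsampling by a factor of 2 in both directions could be sketched as follows. The function name is hypothetical, and the sketch copies the disparity values unchanged; whether the values themselves need rescaling is not stated in the text and is left out here as an assumption.

```python
def upsample_nearest(disp, factor=2):
    """Enlarge disp[y][x] by repeating each value factor times per axis."""
    height, width = len(disp), len(disp[0])
    return [[disp[y // factor][x // factor] for x in range(width * factor)]
            for y in range(height * factor)]
```

Each source pixel becomes a 2×2 block, so a half-scale disparity image regains the width and height of the original input.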

[0052] The disparity fusion method is specifically operated as follows:

[0053] 1. The disparity diagram disparity_vis obtained by the visible light binocular camera and the disparity diagram disparity_ir obtained by the infrared binocular camera are fused according to the homography matrix H, the translation and rotation positional relationships between the visible light system and the infrared system, and two confidence marker bits.

[0054] 2. Finally, the depth diagram is calculated according to the fused disparity, and the relationship formula between the disparity and the depth is as follows:

[00006] $\begin{matrix} Z = \frac{B \times f}{d} & (8) \end{matrix}$

[0055] wherein B is baseline length, f is the focal length, Z is the depth and d is the disparity.
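Formula (8) can be sketched as follows, for illustration only. The function name, the convention that a non-positive disparity marks an invalid pixel and yields a depth of 0, and the units in the example (baseline in meters, focal length and disparity in pixels) are assumptions.

```python
def disparity_to_depth(disp, baseline, focal_length):
    """Apply Z = B * f / d to every pixel of disp[y][x].

    Pixels with d <= 0 are treated as invalid and mapped to 0.0 (assumed).
    """
    return [[baseline * focal_length / d if d > 0 else 0.0 for d in row]
            for row in disp]
```

For example, with B = 0.5 m and f = 800 pixels, a disparity of 16 pixels corresponds to a depth of 25 m, and halving the disparity doubles the depth.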

CPC classification

G06T2207/10048, G06T2207/20032, G06T2207/10012, G06T2207/10024, G06T7/596, G06T2207/20228, G06T7/85, G06T7/593 (PHYSICS)

International classification

G06T7/593, G06T7/80 (PHYSICS)
