DISPARITY ESTIMATION OPTIMIZATION METHOD BASED ON UPSAMPLING AND EXACT REMATCHING

20220198694 · 2022-06-23

    Abstract

    The present invention discloses a disparity estimation optimization method based on upsampling and exact rematching. The method conducts exact rematching within a small disparity range in an optimized network, improves on previous upsampling methods for disparity maps or cost maps such as nearest-neighbor interpolation and bilinear interpolation, and provides a network-based propagation upsampling method so that accurate disparity values can be better restored from disparity maps in the upsampling process.

    Claims

    1. A disparity estimation optimization method based on upsampling and exact rematching, comprising the following steps: step 1: extracting discriminable features; step 2: conducting initial cost matching and cost map optimization to obtain an initial disparity map with low resolution; step 3: obtaining a disparity map at the next higher resolution from the initial disparity map with low resolution by a propagation upsampling method and an exact rematching method, and repeating the process until the original resolution is restored; 3.1 the propagation upsampling method: the initial disparity map D.sub.n+1 with minimum resolution is first subjected to interpolation and upsampling to obtain a coarsely matched disparity map D′.sub.n, the disparity map obtained at this moment being obtained only by numerical interpolation without reference to any structural information of an original image; a left view is reestablished with an original right view I.sub.r according to the coarsely matched disparity map D′.sub.n and denoted as Ĩ.sub.l, and then the error between the reestablished left view Ĩ.sub.l and a real left view I.sub.l is calculated to obtain a confidence map M.sub.c:
    M.sub.c=1−normalization(I.sub.l−Ĩ.sub.l)  (2)
    wherein normalization(.) is a normalization operation, the difference is normalized to (0,1), and the probability value at each point on the confidence map M.sub.c represents the confidence of the disparity value of the pixel; and the confidence map is reproduced and translated to become a confidence map group which is denoted as M.sub.cg,
    M.sub.cg=f.sub.c(M.sub.c,k,s)  (3)
    wherein f.sub.c(.) represents the operation of reproduction and translation to resize, k represents the size of a neighboring window, and s represents the void content of a sampling window; the receptive field is (2s+1).sup.2, and a confidence vector of k*k is obtained at each position, which represents the confidence of the pixels in a k*k neighboring window around the pixel; a relative relation network module is proposed, a left feature map with the corresponding resolution is input into the module, and a weight vector is worked out at each position, which indicates the relative relation of the neighboring pixel and the center pixel, i.e., the larger the weight is, the greater the effect of a neighboring pixel on the pixel is; and the weight is denoted as W.sub.relative;
    W.sub.relative=N.sub.relative(F.sub.n.sup.l,k)  (4)
    wherein k represents the size of a neighboring window, and N.sub.relative represents the relative relation network module; the coarsely matched disparity map D′.sub.n, the confidence map group M.sub.cg and the relative relation weight W.sub.relative are used for propagation to obtain a propagated disparity map, and the propagation calculation process is as follows:
    D.sub.n.sup.p=<f.sub.c(D′.sub.n,k,s),softmax(W.sub.relative*M.sub.cg)>  (5)
    wherein D.sub.n.sup.p represents the propagated disparity map, <, > represents the dot product operation, f.sub.c(.) represents the operation of reproduction and translation to resize, and softmax(W.sub.relative*M.sub.cg) represents the support strength of the surrounding pixels to the center pixel during propagation and is obtained by multiplying the confidence of the surrounding pixels and the relative relation weight; then the void content of the window is used for repeating the propagation process so that the optimized disparity map can be propagated in different receptive fields; and at this point, the propagation upsampling process from D.sub.n+1 to D.sub.n.sup.p is completed; 3.2 the exact rematching method: first, a left feature map is reestablished with a right feature map F.sub.n.sup.r with the corresponding resolution in a feature list L according to D.sub.n.sup.p and denoted as F̃.sub.n.sup.l, wherein F̃.sub.n.sup.l=f.sub.w(F.sub.n.sup.r, D.sub.n.sup.p); rematching is conducted once with the reestablished left feature map F̃.sub.n.sup.l and the original left feature map F.sub.n.sup.l within a small range of the disparity d=[−d.sub.0, d.sub.0] to obtain a cost map, then the cost map is optimized through an hourglass network, the disparity is regressed to obtain a bias map Δ which represents an offset from D.sub.n.sup.p, and the two maps are added to obtain a final disparity map D.sub.n of an optimized network:
    D.sub.n=D.sub.n.sup.p+Δ  (6)
    the processes of 3.1 and 3.2 are iterated until the original resolution is restored to obtain a final high-precision disparity map.

    2. The disparity estimation optimization method based on upsampling and exact rematching according to claim 1, wherein in step 1, the features of the left and right views input into the network are extracted, feature maps with different resolutions are stored in the feature list L, and then matching is conducted on the feature map with minimum resolution.

    3. The disparity estimation optimization method based on upsampling and exact rematching according to claim 1, wherein in step 2, the left and right feature maps with minimum resolution are used, f.sup.l(x, y) and f.sup.r(x, y) represent the feature vectors at one point on the left and right feature maps, C represents a cost map, and the formula for forming the cost map is as follows:
    C(x,y,d)=<f.sup.l(x,y)−f.sup.r(x−d,y)>  (1)
    wherein < > represents the element-wise subtraction of the feature vectors, d takes values in {0, 1, 2, …, D.sub.max}, and D.sub.max is the maximum disparity during matching; a cost map with minimum resolution is obtained and then optimized through an hourglass network; and the hourglass network is composed of convolution layers with different strides, and a cost map output from the hourglass network is regressed by a soft argmin layer to obtain an initial disparity map with minimum resolution, which is denoted as D.sub.n+1.

    Description

    DESCRIPTION OF DRAWINGS

    [0027] FIG. 1 is an overall flow chart of a solution;

    [0028] FIG. 2 is a flow chart of a propagation upsampling module;

    [0029] FIG. 3 is a flow chart of exact rematching.

    DETAILED DESCRIPTION

    [0030] The present invention performs end-to-end disparity map prediction on the input left and right views, using a disparity optimization strategy within a coarse-to-fine disparity estimation framework. The propagation upsampling method and the exact rematching method proposed by the present application predict an accurate disparity map without introducing additional tasks. The specific solution is as follows:

    [0031] The specific flow of the network of the solution is shown in FIG. 1, and the specific operation is as follows:

    [0032] Step 1: extracting discriminable features;

    [0033] The features of the left and right views input into the network are extracted. Compared with matching on the gray values of the original image, matching using feature vectors copes better with changes in illumination and appearance, and the extracted feature vectors provide a more detailed and complete description of the image information, which is conducive to better matching. Feature extraction uses a simple CNN comprising four cascaded parts (each part comprises three different convolution layers to extract features). The four parts respectively generate left and right feature maps F.sub.0 to F.sub.3 with different resolutions (the subscript indicates the downsampling level; for example, F.sub.3 represents a feature map with ⅛ resolution), and the dimension of each feature vector f is 32. The four feature maps with different resolutions are stored in the feature list L={F.sub.0, F.sub.1, F.sub.2, F.sub.3} as the input of the subsequent optimized network, and matching is then conducted on the feature map F.sub.3 with minimum resolution, i.e., ⅛ resolution.

    [0034] Step 2: conducting initial cost matching and cost map optimization to obtain an initial disparity map with low resolution;

    [0035] F.sub.3.sup.l and F.sub.3.sup.r represent the left and right feature maps with ⅛ resolution, f.sup.l(x, y) and f.sup.r(x, y) represent the feature vectors at one point on the left and right feature maps, and C represents a cost map. The cost map is formed according to formula (1):


    C(x,y,d)=<f.sup.l(x,y)−f.sup.r(x−d,y)>  (1)

    wherein < > represents the element-wise subtraction of the feature vectors, d takes values in {0, 1, 2, …, D.sub.max}, and D.sub.max is the maximum disparity during matching, so the size of the cost map finally formed is [H/8, W/8, D.sub.max/8, f].
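    The construction of formula (1) can be illustrated with a minimal NumPy sketch. The function name `build_cost_volume`, the zero fill for columns without a valid match, and the requirement that `max_disp` be smaller than the image width are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp):
    """Difference-based cost volume in the spirit of formula (1).

    feat_left, feat_right: feature maps of shape [H, W, F].
    Returns an array of shape [H, W, max_disp + 1, F].
    Assumes max_disp < W.
    """
    h, w, f = feat_left.shape
    cost = np.zeros((h, w, max_disp + 1, f), dtype=feat_left.dtype)
    for d in range(max_disp + 1):
        # f^l(x, y) - f^r(x - d, y); columns x < d have no match and stay zero
        # (the zero fill is an assumption of this sketch).
        cost[:, d:, d, :] = feat_left[:, d:, :] - feat_right[:, : w - d, :]
    return cost

rng = np.random.default_rng(0)
left = rng.standard_normal((4, 6, 32))
right = rng.standard_normal((4, 6, 32))
cost = build_cost_volume(left, right, max_disp=3)
print(cost.shape)  # (4, 6, 4, 32)
```

    Each slice `cost[:, :, d, :]` holds the feature difference for one candidate disparity, which is the four-dimensional layout described above.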

    [0036] A cost map with ⅛ resolution is obtained and then optimized through an hourglass network, wherein the hourglass network is composed of convolution layers with different strides. The cost map output from the hourglass network is regressed by a soft argmin layer to obtain a coarse disparity map with ⅛ resolution, which is denoted as D.sub.3.
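    The soft argmin regression can be sketched as follows. The sketch assumes that the hourglass network has already reduced the cost map to one scalar cost per candidate disparity, i.e. an array of shape [H, W, D]; this layout and the function name `soft_argmin` are assumptions for illustration.

```python
import numpy as np

def soft_argmin(cost, disparities=None):
    """Expected disparity under softmax(-cost).

    cost: [H, W, D] scalar matching costs, lower is better.
    disparities: the D candidate values (defaults to 0 .. D-1).
    """
    if disparities is None:
        disparities = np.arange(cost.shape[-1], dtype=np.float64)
    logits = -np.asarray(cost, dtype=np.float64)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=-1, keepdims=True)
    return (prob * disparities).sum(axis=-1)

cost = np.full((1, 1, 5), 50.0)
cost[0, 0, 2] = 0.0          # clear minimum at disparity 2
disp = soft_argmin(cost)
```

    Because the result is a probability-weighted average rather than a hard index, it is differentiable and can take sub-pixel values.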

    [0037] Step 3: inputting the initial disparity map with low resolution into the optimized network to obtain a fine disparity map with high resolution;

    [0038] A disparity map at the next higher resolution is obtained from the disparity map with minimum resolution by a propagation upsampling module and an exact rematching module, and the process is repeated until the original resolution is restored.

    [0039] The specific flow is shown in FIG. 2 and FIG. 3.

    [0040] The specific steps are as follows, taking the one-step iteration from D.sub.3 to D.sub.2 as an example.

    [0041] 3.1 Propagation Upsampling Method

    [0042] D.sub.3 is first subjected to interpolation and upsampling to obtain a coarsely matched disparity map D′.sub.2. The disparity map obtained at this moment is obtained only by numerical interpolation without reference to any structural information of the original image, and the information loss caused by downsampling cannot be restored, so the obtained D′.sub.2 has a high error rate. Therefore, a propagation-based strategy is required to optimize the disparity map D′.sub.2. A left view is reestablished with the original right view I.sub.r according to the upsampled disparity map D′.sub.2 and denoted as Ĩ.sub.l, i.e., Ĩ.sub.l=f.sub.w(I.sub.r, D′.sub.n), wherein f.sub.w(.) is a warping function. Then the error between the reestablished left view Ĩ.sub.l and the real left view I.sub.l is calculated to obtain a confidence map M.sub.c:


    M.sub.c=1−normalization(I.sub.l−Ĩ.sub.l)  (2)

    wherein normalization(.) is a normalization operation that maps the difference to (0,1), and the probability value at each point on the confidence map M.sub.c represents the confidence of the disparity value of the pixel. The confidence map is reproduced and translated to become a confidence map group with the size of [H/8, W/8, k*k], which is denoted as M.sub.cg,


    M.sub.cg=f.sub.c(M.sub.c,k,s)  (3)

    [0043] wherein f.sub.c(.) represents the operation of reproduction and translation to resize, k represents the size of a neighboring window, and s represents the dilation rate (void content) of the sampling window, so that the receptive field is (2s+1).sup.2. A confidence vector of length k*k is obtained at each position, which represents the confidence of each pixel in the k*k neighboring window around that position.
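    Formulas (2) and (3) can be illustrated with a short NumPy sketch. Several details are assumptions not stated in the source: the warping function uses linear interpolation along x with border clamping, the error is taken as an absolute difference averaged over channels, the normalization is min-max, and the neighbor group uses edge padding. The function names are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """f_w: sample `right` ([H, W] or [H, W, C]) at (x - disp, y)."""
    h, w = disp.shape
    xs = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)  # assumed border clamp
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    if right.ndim == 3:
        frac = frac[..., None]
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]

def confidence_map(left, right, disp):
    """Formula (2): M_c = 1 - normalization(I_l - warped I_r)."""
    err = np.abs(left - warp_right_to_left(right, disp))
    if err.ndim == 3:
        err = err.mean(axis=-1)
    span = err.max() - err.min()
    # Min-max normalization is an assumption of this sketch.
    norm = (err - err.min()) / span if span > 0 else np.zeros_like(err)
    return 1.0 - norm

def neighbor_group(x, k=3, s=1):
    """f_c: stack the k*k shifted copies of x ([H, W]) into [H, W, k*k]."""
    r, (h, w) = k // 2, x.shape
    pad = r * s
    xp = np.pad(x, pad, mode="edge")  # assumed edge padding
    shifts = [xp[pad + dy * s: pad + dy * s + h, pad + dx * s: pad + dx * s + w]
              for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return np.stack(shifts, axis=-1)

img = np.tile(np.arange(8, dtype=float), (4, 1))  # horizontal ramp image
m_c = confidence_map(img, img, np.zeros((4, 8)))
m_cg = neighbor_group(m_c, k=3, s=2)
print(m_cg.shape)  # (4, 8, 9)
```

    With k=3, channel 4 of the group is the unshifted map, and the other eight channels are copies shifted by s pixels in each direction.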

    [0044] A relative relation network module is proposed. A left feature map with the corresponding resolution is input into the module, and a weight vector is worked out at each position, which indicates the relative relation between each neighboring pixel and the center pixel, i.e., the larger the weight is, the greater the effect of the neighboring pixel on the center pixel is. For example, if the pixel and a neighboring pixel belong to the same object, their relative relation is strong and the weight is large; on the contrary, if the neighboring pixel is at an edge, the weight is small. Through this module, different weights can be worked out for each image, so that the disparity value of a pixel is updated according to the different weights of the surrounding pixels during propagation, rather than optimizing the disparity map using a convolution kernel with the same weights for different inputs as in conventional neural networks. The module is composed of three convolution layers with dilation rates of {1,2,3}, the left feature map is input, and the weight with the size of [H/8, W/8, k*k] is output and denoted as W.sub.relative;


    W.sub.relative=N.sub.relative(F.sub.n.sup.l,k)  (4)

    wherein k represents the size of a neighboring window, and N.sub.relative represents the relative relation network module.
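    The shape behavior of such a module can be sketched in NumPy as three 3×3 convolutions with dilation 1, 2 and 3. This is an illustration only: the weights are random rather than learned, and the hidden width of 16, the ReLU between layers and the zero padding are assumptions not given in the source.

```python
import numpy as np

def dilated_conv3x3(x, weight, dilation):
    """x: [H, W, Cin]; weight: [3, 3, Cin, Cout]; zero padding keeps H and W."""
    h, w, _ = x.shape
    p = dilation
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input at offset ((i - 1) * p, (j - 1) * p).
            patch = xp[i * p: i * p + h, j * p: j * p + w, :]
            out += patch @ weight[i, j]
    return out

def relative_relation_weights(feat_left, weights, dilations=(1, 2, 3)):
    """Map a left feature map [H, W, F] to per-pixel weights [H, W, k*k]."""
    x = feat_left
    for idx, (wgt, d) in enumerate(zip(weights, dilations)):
        x = dilated_conv3x3(x, wgt, d)
        if idx < len(dilations) - 1:
            x = np.maximum(x, 0.0)  # assumed ReLU between layers
    return x

rng = np.random.default_rng(0)
feat = rng.standard_normal((6, 8, 32))
k = 3
weights = [0.1 * rng.standard_normal((3, 3, 32, 16)),
           0.1 * rng.standard_normal((3, 3, 16, 16)),
           0.1 * rng.standard_normal((3, 3, 16, k * k))]
w_relative = relative_relation_weights(feat, weights)
print(w_relative.shape)  # (6, 8, 9)
```

    Because the output depends on the input feature map, the k*k weights differ from pixel to pixel and from image to image, unlike a fixed convolution kernel.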

    [0045] The coarse disparity map D′.sub.2 obtained in the above upsampling step, the confidence map group M.sub.cg and the relative relation weight W.sub.relative are used for propagation to obtain an optimized D.sub.2.sup.p (p: propagate), and the propagation calculation process is as follows:


    D.sub.n.sup.p=<f.sub.c(D′.sub.n,k,s),softmax(W.sub.relative*M.sub.cg)>  (5)

    [0046] wherein D.sub.n.sup.p represents the propagated disparity map, <, > represents the dot product operation, f.sub.c(.) represents the operation of reproduction and translation to resize, and softmax(W.sub.relative*M.sub.cg) represents the support strength of the surrounding pixels to the center pixel during propagation and is obtained by multiplying the confidence of the surrounding pixels and the relative relation weight. Then the propagation process is repeated three times with the dilation rate s=1, 2, 3 of the window so that the optimized disparity map can be propagated in different receptive fields. At this point, the propagation upsampling process from D.sub.n+1 to D.sub.n.sup.p is completed.
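    The propagation of formula (5) can be sketched as a per-pixel weighted average over the k*k neighbors. The sketch assumes that the confidence map and W.sub.relative are held fixed across the three passes and that edge padding is used at the borders; neither detail is specified in the source, and the function names are hypothetical.

```python
import numpy as np

def neighbor_group(x, k=3, s=1):
    """f_c: stack the k*k shifted copies of x ([H, W]) into [H, W, k*k]."""
    r, (h, w) = k // 2, x.shape
    pad = r * s
    xp = np.pad(x, pad, mode="edge")  # assumed edge padding
    shifts = [xp[pad + dy * s: pad + dy * s + h, pad + dx * s: pad + dx * s + w]
              for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    return np.stack(shifts, axis=-1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def propagate(disp, conf, w_relative, k=3, dilations=(1, 2, 3)):
    """Formula (5), applied once per dilation rate.

    disp, conf: [H, W]; w_relative: [H, W, k*k].
    """
    for s in dilations:
        d_group = neighbor_group(disp, k, s)       # f_c(D', k, s)
        m_cg = neighbor_group(conf, k, s)          # f_c(M_c, k, s)
        support = softmax(w_relative * m_cg)       # sums to 1 over the window
        disp = (d_group * support).sum(axis=-1)    # dot product per pixel
    return disp

rng = np.random.default_rng(0)
coarse = rng.uniform(0.0, 10.0, size=(6, 8))
conf = rng.uniform(0.0, 1.0, size=(6, 8))
w_rel = rng.standard_normal((6, 8, 9))
d_p = propagate(coarse, conf, w_rel)
```

    Since the softmax weights are non-negative and sum to one, each output value is a convex combination of neighboring disparities, so the result stays within the range of the input map.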

    [0047] 3.2 Exact Rematching Method

    [0048] The propagation upsampling module outputs a propagation-based disparity map D.sub.n.sup.p with high resolution from D.sub.n+1 with low resolution, and the exact rematching module conducts rematching within a small range on D.sub.n.sup.p. First, a left feature map is reestablished with the right feature map F.sub.n.sup.r with the corresponding resolution in the feature list L according to D.sub.n.sup.p and denoted as F̃.sub.n.sup.l, i.e., F̃.sub.n.sup.l=f.sub.w(F.sub.n.sup.r, D.sub.n.sup.p). Rematching is conducted once with the reestablished left feature map F̃.sub.n.sup.l and the original left feature map F.sub.n.sup.l within a small range of the disparity d=[−2, 2] to obtain a cost map with the size of [H/4, W/4, 5, f] (with D.sub.2.sup.p as an example). The cost map is then optimized through an hourglass network, the disparity is regressed to obtain a bias map Δ which represents an offset from D.sub.n.sup.p, and the two maps are added to obtain a final disparity map D.sub.n of the optimized network.


    D.sub.n=D.sub.n.sup.p+Δ  (6)

    The processes of 3.1 and 3.2 are iterated repeatedly until the original resolution is restored to obtain a final high-precision disparity map.
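    The rematching step and formula (6) can be sketched as follows. The hourglass network is replaced here by a plain mean absolute difference over the feature dimension, which is a stand-in chosen for illustration and not the method of the source; the border clamping and the function names are likewise assumptions.

```python
import numpy as np

def warp_features(feat_right, disp):
    """f_w: sample feat_right ([H, W, F]) at (x - disp, y), linear in x."""
    h, w, _ = feat_right.shape
    xs = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)  # assumed border clamp
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = (xs - x0)[..., None]
    rows = np.arange(h)[:, None]
    return (1 - frac) * feat_right[rows, x0] + frac * feat_right[rows, x1]

def local_cost_volume(feat_left, warped, d0=2):
    """Cost map of shape [H, W, 2*d0 + 1, F] over offsets -d0 .. d0."""
    h, w, f = feat_left.shape
    offsets = np.arange(-d0, d0 + 1)
    cost = np.zeros((h, w, offsets.size, f))
    for i, d in enumerate(offsets):
        idx = np.clip(np.arange(w) - d, 0, w - 1)
        cost[:, :, i, :] = feat_left - warped[:, idx, :]
    return cost, offsets

def exact_rematch(feat_left, feat_right, disp_p, d0=2):
    """Return D_n = D_n^p + delta, formula (6)."""
    warped = warp_features(feat_right, disp_p)   # reestablished left features
    cost, offsets = local_cost_volume(feat_left, warped, d0)
    score = np.abs(cost).mean(axis=-1)           # stand-in for the hourglass
    logits = -score
    logits -= logits.max(axis=-1, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    delta = (prob * offsets).sum(axis=-1)        # soft argmin over offsets
    return disp_p + delta

rng = np.random.default_rng(0)
feat_r = 50.0 * rng.standard_normal((4, 16, 8))
feat_l = feat_r[:, np.clip(np.arange(16) - 3, 0, 15), :]  # true disparity is 3
d_n = exact_rematch(feat_l, feat_r, np.full((4, 16), 2.0))
```

    In this toy setup the propagated estimate of 2 is one pixel short of the true disparity of 3, and the search over only five offsets is enough to recover the missing pixel away from the image border.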

    [0049] 4. Loss Function

    [0050] The solution adopts two kinds of loss functions for network training. A smoothness loss function is used for the disparity map D.sub.n.sup.p output by the propagation upsampling module and is denoted as L.sub.smooth, and the output of the exact rematching module is supervised using the disparity label downsampled to the corresponding resolution, with the loss denoted as L.sub.gt,

    \mathcal{L}_{smooth} = \frac{1}{N} \sum_{i,j} \left( |\partial_x d_{i,j}| \, e^{-|\partial_x \varepsilon_{i,j}|} + |\partial_y d_{i,j}| \, e^{-|\partial_y \varepsilon_{i,j}|} \right)  (7)

    \mathcal{L}_{gt} = \frac{1}{N} \| D_n - \hat{D}_n \|_2  (8)

    \mathcal{L} = \mathcal{L}_{smooth} + \mathcal{L}_{gt}  (9)

    [0051] In formula (7), N represents the number of image pixels, ∂d represents the gradient of the disparity map, and ∂ε represents the gradient of an edge map of the original image. In formula (8), D̂.sub.n represents the disparity label with the corresponding resolution, and ∥.∥.sub.2 represents the L2 distance. The final loss function in formula (9) is formed by adding the two loss functions.
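    The two loss terms can be sketched in NumPy as below. The gradients are approximated by forward differences cropped to a common shape, and the smoothness term is averaged over the cropped grid instead of all N pixels; both choices are assumptions of this sketch, as are the function names.

```python
import numpy as np

def smoothness_loss(disp, edge):
    """Edge-aware smoothness in the spirit of formula (7).

    disp: disparity map [H, W]; edge: edge map of the original image [H, W].
    """
    dx_d = np.abs(np.diff(disp, axis=1))[:-1, :]  # |d/dx d|, shape [H-1, W-1]
    dy_d = np.abs(np.diff(disp, axis=0))[:, :-1]  # |d/dy d|
    dx_e = np.abs(np.diff(edge, axis=1))[:-1, :]  # |d/dx eps|
    dy_e = np.abs(np.diff(edge, axis=0))[:, :-1]  # |d/dy eps|
    return np.mean(dx_d * np.exp(-dx_e) + dy_d * np.exp(-dy_e))

def gt_loss(disp, disp_gt):
    """Formula (8): L2 distance to the label, divided by the pixel count."""
    return np.linalg.norm(disp - disp_gt) / disp.size

def total_loss(disp_p, edge, disp_n, disp_gt):
    """Formula (9): sum of the two terms."""
    return smoothness_loss(disp_p, edge) + gt_loss(disp_n, disp_gt)

ramp = np.tile(np.arange(4, dtype=float), (4, 1))  # slope 1 along x
flat = np.zeros((4, 4))
l_smooth = smoothness_loss(ramp, flat)
l_gt = gt_loss(flat + 2.0, flat)
```

    The exponential factor lowers the smoothness penalty where the edge map changes sharply, so disparity discontinuities are tolerated at object boundaries.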