Method for motion estimation between two images of an environmental region of a motor vehicle, computing device, driver assistance system as well as motor vehicle

10567704 · 2020-02-18

Abstract

The invention relates to a method for motion estimation between two images of an environmental region (9) of a motor vehicle (1) captured by a camera (4) of the motor vehicle (1), wherein the following steps are performed: a) determining at least two image areas of a first image as at least two first blocks (B) in the first image; b) for each first block (B), defining a respective search region in a second image for searching the respective search region in the second image for a second block (B) corresponding to the respective first block (B); c) determining a cost surface (18) for each first block (B) and its respective search region; d) determining an averaged cost surface (19) for one of the at least two first blocks (B) based on the cost surfaces (18); e) identifying a motion vector (v) for the one of the first blocks (B) describing a motion between a location of the first block (B) in the first image and a location of the corresponding second block (B) in the second image. The invention also relates to a computing device (3), a driver assistance system (2) as well as a motor vehicle (1).

Claims

1. A method for motion estimation between two images of an environmental region of a motor vehicle captured by a camera of the motor vehicle, the method comprising: determining at least two image areas of a first image as at least two first blocks in a block grid in the first image; for each first block, defining a respective search region in a second image for searching the respective search region in the second image for a second block corresponding to the respective first block using block matching; determining a cost surface for each first block and its respective search region; determining an averaged cost surface for one of the at least two first blocks based on the cost surfaces; and identifying a motion vector for the one of the first blocks describing a motion between a location of the first block in the first image and a location of the corresponding second block in the second image.

2. The method according to claim 1, wherein a global minimum of the averaged cost surface is determined and the motion vector is determined in dependency on the global minimum.

3. The method according to claim 1, wherein for determining the averaged cost surface, a mean value of each cost surface is determined, and respective weighting factors for determining the averaged cost surface are determined based on the mean values.

4. The method according to claim 3, wherein each weighting factor is determined as the reciprocal of the respective mean value.

5. The method according to claim 1, wherein a sliding window is determined comprising a predetermined number of first blocks, wherein the motion vector is determined for one of the first blocks within the sliding window based on the cost surfaces of all first blocks within the sliding window.

6. The method according to claim 5, wherein the number of first blocks within the sliding window is preset such that one first block is completely surrounded by further first blocks within the sliding window, wherein the motion vector is determined for the first block in the middle surrounded by the further first blocks.

7. The method according to claim 1, wherein an extrinsic calibration of the camera is performed based on the motion vector derived from the averaged cost surface.

8. The method according to claim 7, wherein for performing the extrinsic calibration, a rotation calibration of the camera is performed, wherein a loss function describing a deviation between the motion vector and a predetermined vector is determined and a rotation-compensated motion vector is determined by minimizing the loss function.

9. The method according to claim 8, wherein for performing the extrinsic calibration, a height calibration of the camera is performed, wherein the height of the camera is determined in dependency on a length of the rotation-compensated motion vector and an expected value of the length of the rotation-compensated motion vector.

10. The method according to claim 9, wherein the expected value for the length is preset in dependency on a velocity of the motor vehicle.

11. The method according to claim 10, wherein the velocity of the motor vehicle is determined by means of odometry and/or based on at least one further motion vector determined for at least one further camera.

12. A computing device for a driver assistance system of a motor vehicle, which is adapted to perform a method according to claim 1.

13. A driver assistance system for a motor vehicle comprising at least one camera and a computing device according to claim 12.

14. A motor vehicle with a driver assistance system according to claim 13.

Description

(1) The figures show:

(2) FIG. 1 a schematic representation of an embodiment of a motor vehicle according to the invention;

(3) FIG. 2 a schematic representation of a block matching operation;

(4) FIG. 3a, 3b a schematic view of a motion field divided into blocks as well as a schematic view of motion vectors;

(5) FIG. 4 a schematic representation of a flow chart of an embodiment of a method according to the invention; and

(6) FIG. 5 a schematic view of cost surfaces as well as an averaged cost surface.

(7) In the figures, identical as well as functionally identical elements are provided with the same reference characters.

(8) FIG. 1 shows a motor vehicle 1 according to the present invention. In the present case, the motor vehicle 1 is configured as a passenger car. The motor vehicle 1 has a driver assistance system 2 for supporting a driver of the motor vehicle 1. The driver assistance system 2 comprises a computing device 3, which can for example be formed by a vehicle-side control unit. Additionally, the driver assistance system 2 comprises at least one camera 4. In the present case, the driver assistance system 2 includes four cameras 4, wherein a first camera 4 is disposed in a front area 5 of the motor vehicle 1, a second camera 4 is disposed in a rear area 6 of the motor vehicle 1, a third camera 4 is disposed on a driver's side 7 of the motor vehicle 1, in particular on a wing mirror of the driver's side 7, and a fourth camera 4 is disposed on a passenger side 8 of the motor vehicle 1, in particular on a wing mirror of the passenger's side 8. The cameras 4 disposed on the driver's side 7 and the passenger's side 8 can also replace the wing mirrors, whereby the motor vehicle 1 can be designed as a mirrorless vehicle 1 enabling mirrorless driving. By means of the cameras 4, an environmental region 9 of the motor vehicle 1 can be captured in images. The cameras 4 can comprise fisheye lenses in order to enlarge an angle of view and thus a detection range of the cameras 4.

(9) The computing device 3 is adapted to perform a block matching operation based on the images, or rather video frames, captured by the at least one camera 4, by tracking a texture, such as tarmac, of a road surface 11 of a road 10 for the motor vehicle 1. In particular, the computing device 3 is adapted to improve the accuracy of the road surface texture tracking in the block matching algorithm when operating in difficult environmental conditions, such as low light or adverse weather. In such conditions, the images or video frames can be corrupted by high levels of noise, motion blur and other artefacts that generally degrade the block matching quality.

(10) The block matching algorithm is a method of locating matching blocks B (see FIG. 2) of image data in a sequence of images or digital video frames for the purposes of motion estimation or optical flow. FIG. 2 visualizes a block matching operation known from the prior art. There, the current video frame is divided into blocks B, also referred to as macroblocks, wherein each macroblock is in particular compared for its similarity with all possible blocks of the same size within a specified search window 15 in another frame. In FIG. 2, the block 14 is a macroblock B with a size of N×N in the current video frame and the block 13 is a macroblock B under search with a size of N×N in the previous frame within a search window 15. The location that gives the highest similarity between the two blocks 13, 14, the so-called minimum cost location, can be selected and registered as a motion vector, i.e. a two-dimensional vector whose two components x, y correspond to the horizontal and vertical relative displacement of the block between the two frames.
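The exhaustive block matching described above can be sketched as follows; this is a minimal illustration, not the patented implementation, and the function name, SAD cost choice and array shapes are assumptions for the example:

```python
import numpy as np

def block_match_sad(ref_block, search_window):
    """Full-search block matching: slide the N x N reference block over
    every candidate location in the search window and keep the location
    with the lowest Sum of Absolute Differences (SAD) cost."""
    n = ref_block.shape[0]
    h, w = search_window.shape
    best_cost, best_loc = np.inf, (0, 0)
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            cand = search_window[y:y + n, x:x + n].astype(np.int64)
            cost = np.abs(cand - ref_block).sum()
            if cost < best_cost:
                best_cost, best_loc = cost, (x, y)
    # best_loc, taken relative to the zero-displacement location in the
    # window, gives the (x, y) components of the motion vector
    return best_loc, best_cost
```

A perfect match yields zero cost, matching the "ideal match" case discussed above.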

(11) The similarity measure, also referred to as a block-distortion measure or matching cost, can be obtained by various methods such as Cross Correlation, Normalised Cross Correlation, Phase Correlation, Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Sum of Absolute Transformed Differences (SATD), Census/Hamming distance and many more. SAD and SSD are popular because of their computational efficiency and sufficiently good quality for many applications. The lower the matching cost is at a particular location, the higher the similarity between the compared blocks 13, 14 is at that location. For example, a perfect or ideal match would result in zero cost when the sum of absolute or squared differences of the overlapping pixels of the two blocks 13, 14 is taken. Radiometric invariance can be achieved by pre-filtering the images with a Laplacian of Gaussian, Difference of Gaussians or other filters.

(12) The matching cost between the reference block 14 and a candidate block 13 at each test location can be computed by means of an exhaustive search over all possible locations within the nominated search window 15. This method, which is also referred to as full search (FS), yields the most accurate results and the lowest number of outliers or erroneous motion vectors, but is also the most computationally expensive. Various methods are known that seek to reduce the amount of computation by avoiding computing the matching cost at all possible locations, at the expense of degrading the quality of the results. These include Diamond Search (DS), Hexagon-based Search (HEXBS), Adaptive Rood Pattern Search (ARPS) and many more. A notable exception is a simple computational optimisation of the full-search method known as Fast Computation of Full Search (FCFS) that does not impact quality.

(13) With those techniques each reference block 14 can be matched independently within its nominated search window 15. The proposed technique can combine information from neighbouring blocks in a unique way, in particular under the assumption that all blocks B represent a uniform motion locally on the same plane, leading to strong matches.

(14) For example, a motion field 16a, 16b on the ground surface 11 obtained by block matching two successive frames while the vehicle 1 is travelling straight is shown in FIG. 3a and FIG. 3b. In the case of cameras 4 having fisheye lenses, the frames have been previously rectified into virtual plan views, in particular for the purpose of an extrinsic calibration algorithm for the camera 4. FIG. 3a shows the motion field 16a of an uncalibrated camera 4, which exhibits perspective distortion as the virtual plan view has been generated using incorrect extrinsic calibration parameters. A calibrated system should produce motion vectors on the virtual plan view free from perspective distortion, i.e. parallel to the horizontal x-axis and of equal length. The motion vectors v shown in FIG. 3a are neither parallel to the horizontal x-axis nor of equal length, and thus indicate an uncalibrated system.

(15) FIG. 3b has been generated with the correct extrinsic calibration parameters and has no perspective distortion, i.e. the ground plane is mapped correctly to a parallel virtual image plane. Thus, FIG. 3b shows the motion field 16b of a calibrated camera 4. The calibrated motion vectors v.sub.c are all parallel to the horizontal x-axis and of equal length. The square 17 indicates an arbitrary group of motion vectors v.

(16) The calibrated motion vectors v.sub.c as shown in FIG. 3b can, for instance, be generated by extrinsic calibration, a so-called motion tracking calibration, MTC, wherein the extrinsic rotations and height of the camera 4 are calibrated by analysing the motion field 16a. The MTC operation uses at least two images consecutively captured by the camera 4 and tracks the relative movement of road surface texture such as tarmac between the images or video frames, in particular without the need of strong features, like a kerb 12 alongside the road 10.

(17) In particular, using the calibration algorithm, a spatial orientation of the camera 4 can be found relative to the ground plane 11 by analysing the motion vectors v. The orientation of the cameras 4 can be expressed in the roll-pitch-yaw rotation scheme rotating in sequence about the fixed X, Y and Z axes of the world coordinate system, where X is the longitudinal vehicle axis, Y is the transverse vehicle axis and Z is the vertical vehicle axis as shown in FIG. 1.

(18) A loss function to be minimised is formulated that exploits the geometric properties of the motion vectors v on the ground plane 11, in particular considering the constraint of approximately straight driving of the motor vehicle. By minimising the loss function, the motion vectors v can be mapped and re-projected to rotation-compensated and calibrated motion vectors v.sub.c that are free from perspective distortion and are all parallel to the horizontal x-axis.
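As a strongly simplified illustration of such a minimisation, the sketch below optimises a single in-plane rotation angle by brute-force search, with a loss penalising the deviation of the vectors from the horizontal x-axis under straight driving; the real calibration optimises the full roll-pitch-yaw rotation, and the function name, angle grid and loss form are assumptions for the example:

```python
import numpy as np

def compensate_rotation(vectors, angles=np.linspace(-0.5, 0.5, 1001)):
    """Find the in-plane rotation angle (rad) minimising a loss that
    penalises non-zero y-components of the motion vectors (straight
    driving), then return the rotation-compensated vectors."""
    v = np.asarray(vectors, dtype=float)      # shape (k, 2)

    def loss(theta):
        # y-component of each vector after rotating it by theta
        rotated_y = np.sin(theta) * v[:, 0] + np.cos(theta) * v[:, 1]
        return np.sum(rotated_y ** 2)

    best = min(angles, key=loss)              # brute-force grid search
    c, s = np.cos(best), np.sin(best)
    rot = np.array([[c, -s], [s, c]])
    return v @ rot.T, best
```

After compensation the vectors are approximately parallel to the x-axis, mirroring the re-projection to the calibrated vectors v.sub.c described above.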

(19) In addition, the height of the cameras 4 can be calibrated by finding the absolute height of the camera 4 from the ground surface 11 or the relative height between the cameras 4 by analysing the calibrated and re-projected motion vectors v.sub.c. The height of the cameras 4 can deviate from the nominal default due to airmatic suspension or due to loading changes in the motor vehicle 1 such as the number of passengers or weight in a vehicle's boot. The absolute height of a single camera 4 can be estimated based on vehicle odometry or image features of known scale. The length of the corrected, rotation calibrated, and re-projected motion vectors v.sub.c of a single camera 4 is proportional to the velocity or speed of the vehicle 1 and inversely proportional to the height of that camera 4 from the ground plane 11. Given the odometry of the vehicle from the vehicle-side network, e.g. CAN or FlexRay, an expected length of the motion vectors v.sub.c on the ground 11 can be calculated and the height of the camera 4 can be adjusted to match it with the re-projected motion vectors v.sub.c.
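Since the re-projected vector length is inversely proportional to the camera height, the height adjustment reduces to a simple rescaling; the following one-liner is an illustrative reading of this paragraph, with hypothetical parameter names:

```python
def calibrate_height(assumed_height_m, expected_length, measured_length):
    """Height calibration sketch: the motion-vector length on the
    virtual plan view is inversely proportional to the camera height,
    so the assumed height is rescaled by the ratio of the expected
    length (from vehicle odometry) to the measured length."""
    return assumed_height_m * expected_length / measured_length
```

For example, if vectors rendered at an assumed height of 1.0 m measure 40 px where 50 px were expected, the camera is actually higher, at 1.25 m.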

(20) It is possible to smooth the motion field 16a before using it for calibration by averaging motion vectors v within a sliding window. For example, the central motion vector v within a sliding window 17 can be computed as the average of the nine motion vectors v contained by the window 17. This can be repeated for each possible location of the sliding window 17 in the block grid while storing the output motion vectors v in a separate buffer to avoid recursion. Any small biases introduced by the averaging of motion vectors v under perspective distortion can be neutralised progressively as the virtual-plan view is updated from the calibration result in a feedback loop.
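The sliding-window smoothing with a separate output buffer can be sketched as follows; the 3×3 window matches the nine-vector example above, while the function name and array layout are assumptions:

```python
import numpy as np

def smooth_motion_field(field):
    """Replace each interior motion vector by the mean of the nine
    vectors in its 3 x 3 neighbourhood. Results are written to a
    separate output buffer so that already-smoothed vectors never feed
    back into later window positions (no recursion)."""
    field = np.asarray(field, dtype=float)    # shape (rows, cols, 2)
    out = field.copy()
    rows, cols = field.shape[:2]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = field[r - 1:r + 2, c - 1:c + 2]
            out[r, c] = window.reshape(-1, 2).mean(axis=0)
    return out
```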

(21) In difficult environmental conditions such as low light or adverse weather, block matching produces mostly outlier motion vectors v, i.e. motion vectors v of random lengths and directions. Averaging these motion vectors v would not improve the quality of the information used for calibration. Thus, for improving the performance of an autonomous road-based calibration algorithm such as MTC, which tracks the relative movement of road surface texture between two video frames, a method for motion estimation between two images is performed. The method is visualized in FIG. 4 by means of a flow chart.

(22) In particular, the main idea of the method is to take a weighted average of cost surfaces 18 (see FIG. 5) of all the blocks B within the sliding window 17 before extracting the motion vector v referenced to a block B, particularly the central block B, of the sliding window 17. Within the method, the following steps S1, S2, S3, S4 are performed.

(23) In a first step S1, an individual cost surface 18 for each block B in the block grid is computed, where the blocks B are obtained from their fixed locations on the block grid in one frame and matched within their respective search regions in the other frame. The size and position of the search regions can be fixed or dynamically assigned by predicting the camera ego-motion between the two frames, e.g. from the vehicle odometry. (NB: The search region 15 or search window 15 used for block matching should not be confused with the sliding window 17 used for selecting the blocks B whose cost surfaces 18 are to be averaged.) The individual cost surfaces 18 of blocks B within the sliding window 17 are shown in FIG. 5. For instance, a cost surface 18 can be derived by positioning a 32×32-pixel reference block 14 from one frame at all possible locations within a 64×64-pixel search window 15 of another frame and taking the Sum of Squared Differences (SSD) between the pixels of the reference block 14 and the respective pixels in the search window 15. As can be seen in FIG. 5, the shapes of the cost surfaces 18 are inherently irregular as there is a random level of similarity between the reference block 14 and the pixels of the search region 15 at every location. The location of highest similarity has the lowest cost, i.e. is at the global minimum of the cost surface 18, the ideal case being zero cost, which occurs when all subtracted pixels are identical. At higher levels of image noise these fluctuations become stronger and may even exceed the depth at the location where the best match would be expected, producing a hardly visible global minimum or a global minimum at the wrong location. A motion vector v extracted from that global minimum location would consequently be erroneous.
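Step S1 can be sketched as follows, computing an SSD cost surface for one block over its search region (shown here with small illustrative sizes rather than the 32×32/64×64 example; the function name is an assumption):

```python
import numpy as np

def cost_surface(ref_block, search_window):
    """Step S1 sketch: SSD matching cost of the reference block at every
    candidate location inside its search region, returned as a 2-D cost
    surface (one cost value per candidate displacement)."""
    n = ref_block.shape[0]
    h, w = search_window.shape
    ref = ref_block.astype(np.int64)
    surface = np.empty((h - n + 1, w - n + 1))
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            diff = search_window[y:y + n, x:x + n].astype(np.int64) - ref
            surface[y, x] = (diff * diff).sum()
    return surface
```

The global minimum of this surface is the best-match location; under noise, as the paragraph notes, that minimum may be weak or misplaced.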

(24) In a second step S2, the mean value of each cost surface 18 is determined and its reciprocal is stored as a weighting factor w.sub.ij (see FIG. 5) to be used in the next step S3. In this next, third step S3, by using a sliding window 17, e.g. with a size of 3×3 blocks or another size, the cost surfaces 18 are selected from a group of neighbouring blocks B in the block grid enclosed by the sliding window 17. A weighted average of their cost values is determined so as to obtain a new cost surface 19, where each value is the weighted average of the respective values of the individual cost surfaces 18. The resulting cost surface 19 will have fewer fluctuations and a stronger, clearly visible minimum 20 formed by the contributions of weaker minima corrupted by noise in the individual cost surfaces 18. The expectation is that this stronger minimum 20 will now be a global minimum so that the correct motion vector v can be extracted.
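Steps S2 and S3 can be sketched together: each surface is weighted by the reciprocal of its own mean, and the surfaces are then averaged element-wise (the function name and the normalisation by the weight sum are assumptions for this sketch):

```python
import numpy as np

def averaged_cost_surface(surfaces):
    """Steps S2/S3 sketch: weight each cost surface by the reciprocal of
    its own mean value (approximately equalising their energy) and take
    the weighted average over the surfaces in the sliding window."""
    surfaces = np.asarray(surfaces, dtype=float)   # shape (k, h, w)
    weights = 1.0 / surfaces.mean(axis=(1, 2))     # one w_ij per surface
    weighted_sum = (surfaces * weights[:, None, None]).sum(axis=0)
    return weighted_sum / weights.sum()
```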

(25) In a fourth step S4, the global minimum location within the averaged cost surface 19 is determined. From that, the coordinates of the motion vector v at the centre of the sliding window 17 can be derived, i.e. the motion vector v that would normally correspond to the central block B of the sliding window 17. Then the sliding window 17 can be moved to the next location and the process steps S1 to S4 repeated until all locations within the block grid have been exhausted.
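Step S4, reading the motion vector off the global minimum of the averaged surface, reduces to an argmin; the zero-displacement offset parameter below is a hypothetical convention for this sketch:

```python
import numpy as np

def extract_motion_vector(avg_surface, zero_offset=(0, 0)):
    """Step S4 sketch: read the motion vector off the global-minimum
    location of the averaged cost surface.  zero_offset is the
    (hypothetical) zero-displacement location inside the search region,
    so that a best match there yields the zero motion vector."""
    y, x = np.unravel_index(np.argmin(avg_surface), avg_surface.shape)
    return (x - zero_offset[0], y - zero_offset[1])
```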

(26) In summary, the motion vector v of a particular block B is extracted from the minimum cost location of the weighted-average cost surface 19. The weights w.sub.ij are chosen so as to approximately equalise the energy of the individual cost surfaces 18. For this purpose, the individual cost surfaces 18 are weighted by the reciprocal of their own average value. To avoid breaking symmetry, motion vectors v are in particular not computed for blocks B without symmetrical neighbours. This method can be extended to any window size and can also be applied in a hierarchical way such that multiple layers of cost averaging constrain the search area in subordinate layers to further reduce the possibility of outliers. For example, a global motion vector can be computed from the average cost surface 19 of all blocks B on the grid, which exhibits a single very strong peak representing the average motion vector. This global motion vector can be used to constrain the search in a previous layer or otherwise for obtaining approximate information about the global motion or perspective error in the scene.