METHOD FOR ESTIMATING A RELATIVE POSITION OF AN OBJECT IN THE SURROUNDINGS OF A VEHICLE AND ELECTRONIC CONTROL UNIT FOR A VEHICLE AND VEHICLE
20210287022 · 2021-09-16
Assignee
Inventors
Cpc classification
G06F18/214
PHYSICS
G06V20/58
PHYSICS
International classification
Abstract
A relative position of an object in the surroundings of a vehicle is estimated based on a two-dimensional camera image. A control unit determines an object contour of the object from the camera image and determines at least one digital object template that represents the object based on the object contour. The control unit forward projects the at least one object template from respective different positions onto an image plane of the camera image. Each forward-projected object template yields a respective two-dimensional contour proposal, and the control unit compares the contour proposals with the object contour of the object.
Claims
1-15. (canceled)
16. A method for determining a relative position of an object in a surroundings of a vehicle, the method comprising: determining, by an electronic control unit of the vehicle, an object contour of the object from a two-dimensional (2D) camera image taken by a camera of the vehicle; back-projecting the object contour into a three-dimensional virtual space that represents the surroundings, the back-projected object contour describing a virtual three-dimensional (3D) frustum reaching from the camera into the surroundings; determining at least one digital object template that represents the object; positioning the at least one digital object template at a plurality of predefined positions inside the frustum, the plurality of predefined positions being determined according to a predefined positioning rule; forward projecting the at least one digital object template from the plurality of predefined positions onto an image plane of the camera image, each forward-projected digital object template yielding a respective 2D contour proposal; comparing the 2D contour proposals with the object contour of the object; selecting, based on the comparing, at least one 2D contour proposal among the 2D contour proposals that fulfills a predefined matching criterion with respect to the object contour, as a respective best-fit contour proposal; and determining the relative position of the object based on the respective predefined position of each corresponding digital object template that leads to the at least one 2D contour proposal selected as the respective best-fit contour proposal.
17. The method according to claim 16, wherein the object contour corresponds to a 2D bounding box having a rectangular shape.
18. The method according to claim 16, wherein each digital object template represents a specific object type, an object size, and a spatial orientation.
19. The method according to claim 16, wherein determining the at least one digital object template comprises using a plurality of digital object templates, and at least two digital object templates among the plurality of digital object templates represent different object types and/or at least two digital object templates among the plurality of digital object templates represent different object sizes and/or at least two digital object templates among the plurality of digital object templates represent a same object type, but different potential spatial orientations of the object.
20. The method according to claim 16, wherein determining the at least one digital object template includes, determining, by an object classification module, an object type and/or an object size and/or a spatial orientation, of the object based on the camera image.
21. The method according to claim 16, wherein each digital object template is a 3D bounding box.
22. The method according to claim 16, wherein the predefined positioning rule comprises using a plurality of predefined positions for each digital object template, the plurality of predefined positions being arranged in a predefined pattern, and in the 3D virtual space a ground plane, on which the vehicle and/or the object are arranged, is represented and the plurality of predefined positions are arranged inside the frustum on the ground plane and/or on a plane parallel to the ground plane.
23. The method according to claim 16, wherein the predefined matching criterion comprises that the respective 2D contour proposal and the object contour overlap at least to a predefined minimum overlap value.
24. The method according to claim 16, wherein selecting the at least one 2D contour proposal among the 2D contour proposals comprises selecting only some of the 2D contour proposals that fulfill the predefined matching criterion, and selecting only some of the 2D contour proposals that fulfill the predefined matching criterion comprises applying a similarity criterion to the corresponding digital object templates and determining at least one group of similar corresponding digital object templates and only selecting one digital object template out of each group and selecting the associated best-fit contour proposal of each selected digital object template.
25. The method according to claim 16, wherein determining the relative position of the object comprises applying an artificial neural network (ANN) to each best-fit contour proposal, the ANN being trained to provide correction data for each predefined position of each corresponding digital object template that leads to the at least one 2D contour proposal, to increase a degree of matching between the best-fit contour proposal and the object contour.
26. The method according to claim 25, further comprising: calculating corrected positions for each predefined position of each corresponding digital object template that leads to the at least one 2D contour proposal based on the correction data; and calculating a mean value of the corrected positions as an estimate of the relative position of the object.
27. The method according to claim 16, wherein when none of the 2D contour proposals fulfills the predefined matching criterion based on the comparing, the method further comprises: providing, by an object estimation module, an estimate of an object type, an object size, and a spatial orientation of the object based on the camera image; back-projecting a digital object template of a corresponding object type, object size, and spatial orientation inside the frustum; varying a projection distance and generating, for each value of the projection distance, a 2D contour proposal by forward-projecting the back-projected digital object template onto an image plane of the camera image, until a value for the projection distance is found for which the 2D contour proposal fulfills the predefined matching criterion; and determining the relative position of the object based on the value for the projection distance found for which the 2D contour proposal fulfills the predefined matching criterion.
28. The method according to claim 27, wherein the object estimation module is configured as an artificial neural network.
29. An electronic control unit for a vehicle, comprising: a memory configured to store instructions; and a processor configured to execute the instructions to: determine an object contour of the object from a two-dimensional (2D) camera image taken by a camera of the vehicle, back-project the object contour into a three-dimensional virtual space that represents the surroundings, the back-projected object contour describing a virtual three-dimensional (3D) frustum reaching from the camera into the surroundings, determine at least one digital object template that represents the object, position the at least one digital object template at a plurality of predefined positions inside the frustum, the plurality of predefined positions being determined according to a predefined positioning rule, forward project the at least one digital object template from the plurality of predefined positions onto an image plane of the camera image, each forward-projected digital object template yielding a respective 2D contour proposal, compare the 2D contour proposals with the object contour of the object; select, based on the comparison, at least one 2D contour proposal among the 2D contour proposals that fulfills a predefined matching criterion with respect to the object contour, as a respective best-fit contour proposal, and determine a relative position of the object based on the respective predefined position of each corresponding digital object template that leads to the at least one 2D contour proposal selected as the respective best-fit contour proposal.
30. The electronic control unit according to claim 29, wherein the object contour corresponds to a 2D bounding box having a rectangular shape.
31. The electronic control unit according to claim 29, wherein each digital object template represents a specific object type, an object size, and a spatial orientation.
32. The electronic control unit according to claim 29, wherein the predefined matching criterion comprises that the respective 2D contour proposal and the object contour overlap at least to a predefined minimum overlap value.
33. A motor vehicle, comprising: a two-dimensional camera; and the electronic control unit according to claim 29.
34. The motor vehicle according to claim 33, wherein the object contour corresponds to a 2D bounding box having a rectangular shape, and each digital object template represents a specific object type, an object size, and a spatial orientation.
35. The motor vehicle according to claim 33, further comprising: a driver assistance system to receive the relative position of the object from the electronic control unit, and to control the vehicle based on the relative position of the object.
Description
[0034] In the following, an exemplary implementation of the invention is described. The figures show:
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045] The embodiment explained in the following is a preferred embodiment of the invention. In this embodiment, the described components each represent individual features of the invention that are to be considered independently of each other, that each develop the invention independently of each other, and that are therefore also to be regarded as part of the invention individually or in a combination other than the one shown. Furthermore, the described embodiment can be supplemented by further features of the invention already described.
[0046] In the figures identical reference signs indicate elements that provide the same function.
[0047]
[0048] For estimating the relative position 15 of object 12, control unit 14 may receive from a 2D camera 17 one or more 2D camera images 18. In the following, it is assumed that the estimation of the relative position 15 is performed on a single 2D camera image. From camera 17, an image sensor 19 is shown in order to illustrate an image plane 20 on which an optical lens 21 forward-projects the light from the surroundings 13 onto the image sensor 19. On image sensor 19, the 2D camera image is generated. Based on 2D camera image 18, the electronic control unit may perform a method 22 in order to provide an estimate 23 of the relative position 15. The estimate 23 can be forwarded to, e.g., a driver assistance system which may, e.g., autonomously drive vehicle 10.
[0049] In order to estimate the relative position 15, electronic control unit 14 may perform method 22. For performing method 22, electronic control unit 14 may comprise a processing unit CPU, which may be based on one or more microprocessors and/or graphical processing units and/or microcontrollers. By means of the processing unit CPU, electronic control unit 14 may operate one or more artificial neural networks ANN.
[0050]
[0051] In a first step S10, the control unit may determine an object contour of the object 12 from the camera image 18.
[0052]
[0053]
[0054]
[0055]
[0056] A potential position 31 for the determined object template 30 may be defined in a step S13 for positioning object template 30. Optionally, for object template 30 and/or for at least one other object template, further potential or possible positions 32 inside frustum 28 may be set. In
[0057]
[0058]
[0059]
[0060]
[0061] So far, method 22 has been described on the basis of one single object template 30 and its resulting contour proposal 42, if object template 30 is positioned at position 31. However, each single object template 30 is positioned at more than one position 31, as illustrated in
[0062]
[0063]
[0064]
[0065]
[0066] Method 22 may comprise an additional step S17, which is provided for the case that (see
[0067] The general background of the method can therefore be monocular (2D camera), non-temporal 3D object detection. This means estimating the pose, dimensions, location, and class of objects in three-dimensional space based solely on a single camera frame and known camera intrinsic and extrinsic parameters.
[0068] As part of the environment model used by an autonomous vehicle, one needs to capture dynamic objects and differentiate them from static objects. Potentially mobile objects include cars, trucks, buses, cyclists, pedestrians, animals, etc. These should be detected not only in image coordinates (pixel region of interest) but also in real-world coordinates. The method described herein seeks to solve this problem in the context of mono-cameras (i.e. without additional sensors such as lidar, radar, or additional stereo-vision cameras), using a single frame.
[0069] In the method described here, a 2D image detector can be used to give areas of interest in the camera image. A rectangular area of interest (bounding box) creates a frustum in 3D, into which points from many potential (but known) locations of the 3D world can fall, provided we know the camera intrinsic and extrinsic parameters (i.e. the properties of the lens and the mounting pose of the camera). Given that we already know the class of the object (e.g. car, truck, pedestrian), we can select previously learnt templates for such classes, which maximise the likelihood of an accurate size estimate, and place them in 3D space. We then re-project into 2D and check the re-projection error of each template against the originally detected 2D bounding box. For example, within the frustum, we can have a point coming from 400 meters away; however, if we place a “car” template 3D box at this location, re-project it into 2D, and find that the resulting 2D box is much smaller than the original 2D detection, we know that this location is too far away. The best matching combinations of template and location are then refined by a neural network, which learns to correct the orientation, dimensions, and location of the template so as to minimize the localization loss and re-projection loss with respect to ground truths, purely based on the camera input. For efficiency, we also assume that objects are not in mid air but relatively close to the ground plane, thereby further reducing the number of potential templates.
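The placement and re-projection check described above can be illustrated with a minimal sketch. It is not the implementation of the embodiment; it assumes an ideal pinhole camera without distortion, camera coordinates with x to the right, y downwards and z forwards, a flat ground plane at a known camera height, a template given only by its size and a yaw angle, and a regular 1 m grid as the positioning rule. The function names, the grid ranges, and the overlap threshold of 0.7 are hypothetical choices made for illustration. The refinement by the neural network is not covered.

```python
import math

def box_corners(center, size, yaw):
    """Eight corners of a 3D box template in camera coordinates."""
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                lx, ly, lz = dx * length, dy * height, dz * width
                # Rotate about the vertical (y) axis, then translate.
                corners.append((center[0] + c * lx + s * lz,
                                center[1] + ly,
                                center[2] - s * lx + c * lz))
    return corners

def project_bbox(corners, fx, fy, cx, cy):
    """Forward-project corners and return the enclosing 2D box."""
    us = [fx * x / z + cx for x, _, z in corners]
    vs = [fy * y / z + cy for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))

def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def propose_positions(detection, size, yaw, intrinsics, cam_height,
                      min_iou=0.7, x_range=(-10.0, 10.0),
                      z_range=(5.0, 60.0), step=1.0):
    """Place the template on a ground-plane grid and keep good matches."""
    fx, fy, cx, cy = intrinsics
    y_center = cam_height - size[1] / 2  # box rests on the ground plane
    results = []
    nx = int((x_range[1] - x_range[0]) / step) + 1
    nz = int((z_range[1] - z_range[0]) / step) + 1
    for i in range(nx):
        for j in range(nz):
            pos = (x_range[0] + i * step, y_center, z_range[0] + j * step)
            # Keep only positions whose centre lies inside the frustum,
            # i.e. whose projection falls inside the detected 2D box.
            u = fx * pos[0] / pos[2] + cx
            v = fy * pos[1] / pos[2] + cy
            if not (detection[0] <= u <= detection[2]
                    and detection[1] <= v <= detection[3]):
                continue
            proposal = project_bbox(box_corners(pos, size, yaw),
                                    fx, fy, cx, cy)
            score = iou(proposal, detection)
            if score >= min_iou:
                results.append((score, pos))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

A template placed too far away projects to a box much smaller than the detection, so its overlap falls below the threshold and the position is discarded, which mirrors the 400 meter example in the text.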
[0070] There are some caveats with the approach above. For example, one might not be able to find any templates that meet the matching threshold. This might be because, especially at large distances, small deviations in pose and orientation might shift the 3D box so much that the re-projection error is very high, and thus no templates are proposed (all are filtered out). In these cases, or as an additional check for all cases, rather than looking for templates in the frustum, we can ask another network to estimate pose R and size S of the object. Given these, we can solve for T (translation) by assuming that the 3D box fits snugly in the 2D image detection. Each of the 8 corner points of the 3D box could potentially define xmin, ymin, xmax or ymax of the 2D bounding box in image space, giving 8**4 (4096) combinations. We can solve these combinations exhaustively and pick the translation that gives the lowest re-projection error.
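The exhaustive search over the 8**4 corner assignments can be sketched as follows. This is an illustration under assumptions, not the method of the embodiment: the pose R is reduced to a single yaw angle, the camera is an ideal pinhole camera with coordinates x to the right, y downwards and z forwards, and each assignment is solved as a small linear least-squares problem. If corner offset (ox, oy, oz) is assumed to touch a vertical box side at normalized image coordinate a, then (ox + Tx) / (oz + Tz) = a, which is linear in T: Tx - a*Tz = a*oz - ox. The same holds for horizontal sides with oy and Ty. All function names are hypothetical.

```python
import itertools
import math

def corner_offsets(size, yaw):
    """Corner offsets of the 3D box relative to its centre."""
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    offsets = []
    for sx, sy, sz in itertools.product((-0.5, 0.5), repeat=3):
        lx, ly, lz = sx * length, sy * height, sz * width
        offsets.append((c * lx + s * lz, ly, -s * lx + c * lz))
    return offsets

def bbox_of(offsets, t, fx, fy, cx, cy):
    """2D box enclosing the projected corners for translation t."""
    us = [fx * (ox + t[0]) / (oz + t[2]) + cx for ox, _, oz in offsets]
    vs = [fy * (oy + t[1]) / (oz + t[2]) + cy for _, oy, oz in offsets]
    return (min(us), min(vs), max(us), max(vs))

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def least_squares3(rows, rhs):
    """Solve the 4x3 system via normal equations and Cramer's rule."""
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        return None
    sol = []
    for k in range(3):
        mk = [[atb[i] if j == k else ata[i][j] for j in range(3)]
              for i in range(3)]
        sol.append(det3(mk) / d)
    return tuple(sol)

def solve_translation(detection, size, yaw, fx, fy, cx, cy):
    """Try all 8**4 corner-to-side assignments; keep the best T."""
    offsets = corner_offsets(size, yaw)
    xmin, ymin, xmax, ymax = detection
    # Box sides in normalized image coordinates.
    sides = [("x", (xmin - cx) / fx), ("y", (ymin - cy) / fy),
             ("x", (xmax - cx) / fx), ("y", (ymax - cy) / fy)]
    best_err, best_t = float("inf"), None
    for combo in itertools.product(range(8), repeat=4):
        rows, rhs = [], []
        for (axis, a), idx in zip(sides, combo):
            ox, oy, oz = offsets[idx]
            if axis == "x":
                rows.append((1.0, 0.0, -a))
                rhs.append(a * oz - ox)
            else:
                rows.append((0.0, 1.0, -a))
                rhs.append(a * oz - oy)
        t = least_squares3(rows, rhs)
        # Discard solutions that put any corner behind the camera.
        if t is None or any(oz + t[2] <= 0.1 for _, _, oz in offsets):
            continue
        proposal = bbox_of(offsets, t, fx, fy, cx, cy)
        err = sum(abs(p - d) for p, d in zip(proposal, detection))
        if err < best_err:
            best_err, best_t = err, t
    return best_t, best_err
```

The re-projection error is taken here as the sum of absolute differences between the four sides of the re-projected box and the detected box; other error measures, such as an overlap value, would work in the same loop.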
[0071] Overall, the example shows how a position of an external object can be estimated on the basis of a single 2D camera image by the invention.