TARGET DETECTION METHOD BASED ON FUSION OF VISION, LIDAR, AND MILLIMETER WAVE RADAR

20220277557 · 2022-09-01

    Abstract

    A target detection method based on fusion of vision, lidar and millimeter wave radar comprises: obtaining original data detected by a camera, a millimeter wave radar, and a lidar, and synchronizing the millimeter wave radar, the lidar, and the camera in time and space; performing a calculation on the original data detected by the millimeter wave radar according to a radar protocol; generating a region of interest by using a position, a speed, and a radar reflection area obtained from the calculation; extracting feature maps of a point cloud bird's-eye view and the original data detected by the camera; projecting the region of interest onto the feature maps of the point cloud bird's-eye view and the original data detected by the camera; fusing the feature maps of the point cloud bird's-eye view and the original data detected by the camera, and processing a fused image through a fully connected layer.

    Claims

    1. A target detection method based on fusion of vision, lidar, and millimeter wave radar, comprising: (1) obtaining original data detected by each of a camera, a millimeter wave radar, and a lidar, and synchronizing the millimeter wave radar, the lidar, and the camera in time and space; (2) performing a calculation on the original data detected by the millimeter wave radar according to a radar protocol; (3) using a position based on the original data, which is detected by the millimeter wave radar and has been calculated, as a first anchor point, and generating a first region of interest, which is three-dimensional, according to a speed and a radar reflection area with the first anchor point as a center of the first region of interest; (4) generating a second anchor point arranged according to a specified distance in a blind area in which radar points of the millimeter wave radar are not generated, and generating a second region of interest by traversing the second anchor point with the second anchor point as a center of the second region of interest; (5) pre-processing the original data detected by the lidar to generate a point cloud bird's-eye view, and extracting a feature map of the point cloud bird's-eye view and a feature map of the original data detected by the camera; (6) projecting the first region of interest and the second region of interest generated in step 3 and step 4, which are three-dimensional, onto the feature map of the point cloud bird's-eye view and the feature map of the original data detected by the camera; and (7) fusing the first region of interest, the second region of interest, the feature map of the point cloud bird's-eye view, and the feature map of the original data detected by the camera, which have a same size, to form a fused image, processing the fused image through a fully connected layer, and outputting an image of a test result.

    2. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: the step 1 comprises: triggering an exposure of the camera when the lidar sweeps across a center of a Field of View (FOV) of the camera; distributing the exposure of the camera in scans of the lidar; obtaining synchronized frames of the camera, the millimeter wave radar, and the lidar as key frames of the camera, the millimeter wave radar, and the lidar; superimposing millimeter wave radar data and lidar data, which are obtained by multi-frame scanning, onto the key frames; and processing the key frames.

    3. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: in step 2, the original data detected by the millimeter wave radar after being calculated contains information of position, speed, radar reflection area, and radar point status.

    4. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: the step 3 comprises: generating the first region of interest, which is three-dimensional, based on the original data detected by the millimeter wave radar that has been calculated; using the position as the first anchor point to generate the first region of interest that is cuboid; using a direction of a vector velocity as a direction of the first region of interest; determining a size of the first region of interest according to a size of a Radar Cross Section (RCS); determining a size of a three-dimensional (3D) frame according to the following table and formula:

    Target               RCS [m^2]    RCS [dB]
    Pedestrian           0.01         −20
    Car                  100          20
    Truck                200          23
    Corner reflection    20379        43

    $$ RCS_{dB} = 10 \log_{10}\left(RCS_{m^2}\right); $$

    and determining a direction and an angle of the 3D frame according to a speed (V_x, V_y) of a millimeter wave radar point and compensation speed information (V_x_comp, V_y_comp) and according to the formula:

    $$ \theta = \tan^{-1} \frac{V_x - V_{x\_comp}}{V_y - V_{y\_comp}}. $$

    5. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: the step 4 comprises: arranging the second anchor point at a certain interval in a blind area in which radar points of the millimeter wave radar are not generated; and generating the second region of interest by traversing the second anchor point with the second anchor point as the center of the second region of interest, the short-wave detection range of the millimeter wave radar from 0 to 30 meters is 90 degrees, the short-wave detection range of the millimeter wave radar from 30 to 70 meters is 18 degrees, a long-wave detection angle at 70 meters is 18 degrees, a range of detecting obstacles is [−30, 30, 0, 70], and a method for determining the blind area comprises: (1) projecting the first region of interest obtained in step 3 onto a bird's-eye view plane; (2) obtaining a background area according to a projected two-dimensional candidate frame; and (3) generating candidate frames by traversing the background area.

    6. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: the step 5 comprises: processing the original data detected by the lidar to reserve lidar points of the original data detected by the lidar selected within a range of [−0.3, 2.5] meters in a direction perpendicular to a ground plane; equally slicing the lidar points of the original data detected by the lidar selected within the range of [−0.3, 2.5] meters into four slices; compressing each of the four slices to form a horizontal two-dimensional image; combining the horizontal two-dimensional image with intensity information of the lidar points to obtain the point cloud bird's-eye view in a [600 700 5] dimensionality; and using a neural network model to extract the feature map of the point cloud bird's-eye view and the feature map of the original data detected by the camera, and a size of the feature map is unchanged from a size of an input image.

    7. The target detection method based on fusion of vision, lidar, and millimeter wave radar according to claim 1, wherein: the step 6 comprises: setting an index number for each of the first anchor point and the second anchor point; projecting the first region of interest and the second region of interest onto the feature map of the point cloud bird's-eye view and the feature map of the original data detected by the camera; determining a three-dimensional region of interest through a spatial synchronization of the original data detected by the millimeter wave radar and the lidar in step 1 since the original data detected by the millimeter wave radar and the lidar are both three-dimensional data; and obtaining a vertex coordinate of the three-dimensional region of interest R=[x; y; z], a conversion relationship of the three-dimensional region of interest is:

    $$ \begin{bmatrix} X_p \\ Y_p \\ Z \end{bmatrix} = P \begin{bmatrix} R_c & T_c \\ O^{T} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, $$

    (X_p, Y_p) are coordinates of projection points in an image coordinate system, P is a matrix of camera parameters, R_c is a rotation matrix of the camera relative to an Inertial Measurement Unit (IMU), T_c is a translation matrix of the camera relative to the IMU, coordinate points of a 3D area in an image are obtained through the steps 1-7, each of the vertex coordinates, which has been obtained, is adjusted according to a ground vector to obtain an adjusted coordinate point (X, Y, Z*), the feature maps of the same size are fused to form the fused image, the fused image is processed through the fully connected layer, anchor points are filtered out, and a size and direction of anchor point boxes are regressed.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0025] FIG. 1 illustrates a schematic diagram of an installation location of a vehicle-mounted millimeter wave radar and a lidar.

    [0026] FIG. 2 illustrates a region of interest (ROI) coding method.

    [0027] FIG. 3 illustrates a schematic diagram of a speed direction of the millimeter wave radar.

    [0028] FIG. 4 illustrates a schematic diagram of a ROI area and a background area generated by the millimeter wave radar.

    [0029] FIG. 5 illustrates a flowchart of target detection using multi-sensor fusion.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0030] The present disclosure will be further described below in combination with the accompanying drawings and embodiments.

    Embodiment 1

    [0031] A target detection method based on fusion of vision, lidar, and millimeter wave radar is provided. The specific flow chart is shown in FIG. 5. The target detection method comprises the following steps.

    [0032] Camera data, millimeter wave radar data, and lidar data are obtained in step (1). A camera and a lidar are installed on top of a vehicle, as shown in FIG. 1. When the lidar sweeps across the center of the camera's Field of View (FOV), an exposure of the camera is triggered. The camera runs at 12 Hz, while the lidar runs at 20 Hz. The 12 Hz camera exposures are distributed as evenly as possible among the 20 Hz lidar scans, so not every lidar scan has a corresponding camera frame and a corresponding millimeter wave radar frame. The synchronized frames of the three sensors are taken as key frames. In order to obtain more data, the millimeter wave radar data and the lidar data obtained by multi-frame scanning are superimposed on the key frames, and only the key frames are processed later.
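    By way of a non-limiting illustration, the distribution of 12 camera exposures over 20 lidar scans per second may be sketched as follows. The function name `exposure_scan_indices` and the floor-based spacing rule are assumptions made for this example only; the disclosure states only that the exposures are distributed as evenly as possible.

```python
def exposure_scan_indices(n_scans, n_exposures):
    """Return the indices of the lidar scans that trigger a camera exposure.

    Assumption: exposure k is assigned to scan floor(k * n_scans / n_exposures),
    which spaces the exposures as evenly as an integer grid allows.
    """
    return [(k * n_scans) // n_exposures for k in range(n_exposures)]


# One second of operation: 20 lidar scans, 12 camera exposures.
key_scan_indices = exposure_scan_indices(20, 12)

# Scans that are not listed have no corresponding camera frame.
unmatched_scans = [i for i in range(20) if i not in key_scan_indices]
```

    In this sketch, 8 of the 20 scans in each second have no camera frame, which is consistent with the statement that not all lidar scans have corresponding frames.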

    [0033] The sensors use an Inertial Measurement Unit (IMU) of the vehicle as a reference point. The IMU is installed at a center of the vehicle to obtain information such as a speed and attitude of the vehicle. The following symbols are used in the equations provided below: translation matrix T_c and rotation matrix R_c of the camera relative to the IMU, translation matrix T_1 and rotation matrix R_1 of the lidar, and translation matrix T_r and rotation matrix R_r of the millimeter wave radar.

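    [00006] $$ \begin{bmatrix} X_l \\ Y_l \\ Z_l \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_r \\ Y_r \\ Z_r \\ 1 \end{bmatrix}, \qquad R = R_r R_1, \qquad T = T_r - T_1 $$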
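    By way of a non-limiting illustration, the spatial synchronization of a millimeter wave radar point into the lidar coordinate system, using the combined rotation R = R_r R_1 and translation T = T_r - T_1, may be sketched as follows. The helper names `mat_mul` and `radar_to_lidar` and the representation of matrices as nested lists are assumptions made for this example.

```python
def mat_mul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def radar_to_lidar(point_r, R_r, R_1, T_r, T_1):
    """Map a radar point (X_r, Y_r, Z_r) to lidar coordinates (X_l, Y_l, Z_l).

    Applies the homogeneous transform [R T; 0 1] with R = R_r * R_1
    and T = T_r - T_1, written out as R * p + T.
    """
    R = mat_mul(R_r, R_1)
    T = [t_r - t_1 for t_r, t_1 in zip(T_r, T_1)]
    return [sum(R[i][j] * point_r[j] for j in range(3)) + T[i]
            for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
YAW_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # example rotation

shifted = radar_to_lidar([1.0, 2.0, 3.0], IDENTITY, IDENTITY,
                         [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])
rotated = radar_to_lidar([1.0, 0.0, 0.0], YAW_90, IDENTITY,
                         [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```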

    [0034] Because the altitude of the ground is not constant, the installation position of the IMU is used as the reference point of the world coordinate system. A normal vector n of the ground plane in an image and a height h of the camera are calculated through the IMU. The rotation matrix of the camera is R_c, and the translation matrix of the installation position of the camera relative to the sensor is T_c. A unit normal vector is n_r. The ground vector [n, Tra[2]] is then obtained.

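    [00007] $$ R_c = \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix}, \qquad T_c = \begin{bmatrix} T_{11} & T_{12} & T_{13} \end{bmatrix}, \qquad Tra = R_c T_c^{T}, \qquad n = R_c n_r $$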

    [0035] Step 2: a calculation is performed on the millimeter wave radar data according to a corresponding radar protocol. The millimeter wave radar data after being calculated contains information such as position, speed, radar reflection area, radar point status, etc., as shown in FIG. 2.

    [0036] Step 3: a first region of interest, which is three-dimensional, is generated based on the position, velocity, and radar reflection area of the millimeter wave radar data which has been calculated. The position calculated from the millimeter wave radar data is used as the center point of a cuboid region of interest, and the direction of the velocity vector is taken as the direction of the cuboid region of interest. The size of the first region of interest is determined according to the size of the radar reflection area, i.e., the Radar Cross Section (RCS). The size of a three-dimensional (3D) frame is determined according to the range in which the RCS falls: when RCS_dB < 0, the size of the 3D frame is 0.5 m*0.5 m*1.8 m, and when 0 < RCS_dB < 20, the size of the 3D frame is 2 m*4 m*1.8 m. The 3D frame uses the coding method shown in FIG. 2, which reduces the number of parameters. The RCS_dB of typical targets is shown in Table 1. The RCS_dB values of pedestrians and vehicles are readily distinguishable.


    $$ RCS_{dB} = 10 \log_{10}\left(RCS_{m^2}\right) $$

    [0037] The speed (V_x, V_y) of the millimeter wave radar point is an absolute speed. The compensation speed information (V_x_comp, V_y_comp) is the moving speed of the vehicle obtained according to the IMU. The angle θ is calculated according to the formula:

    [00008] $$ \theta = \tan^{-1} \frac{V_x - V_{x\_comp}}{V_y - V_{y\_comp}} $$

    [0038] A range of θ is (0, π), and a direction angle of the 3D frame is determined by θ, as shown in FIG. 3.
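    By way of a non-limiting illustration, the direction angle of the 3D frame may be computed from a radar point speed and the compensation speed as sketched below. The function name `frame_heading`, the use of `math.atan2` followed by a modulo-π fold to keep the angle within one half-turn, and the return of `None` when both compensated components are zero are assumptions made for this example.

```python
import math


def frame_heading(v_x, v_y, v_x_comp, v_y_comp):
    """Return theta = atan((V_x - V_x_comp) / (V_y - V_y_comp)) folded into [0, pi).

    Assumption: atan2 is used so that a zero denominator is handled,
    and the result is taken modulo pi because a frame direction and its
    opposite describe the same cuboid orientation.
    """
    num = v_x - v_x_comp
    den = v_y - v_y_comp
    if num == 0.0 and den == 0.0:
        return None  # no relative motion, so the direction is undefined
    return math.atan2(num, den) % math.pi


theta_diagonal = frame_heading(1.0, 1.0, 0.0, 0.0)    # equal components
theta_lateral = frame_heading(1.0, 0.0, 0.0, 0.0)     # zero denominator
theta_static = frame_heading(2.0, 3.0, 2.0, 3.0)      # fully compensated
```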

    TABLE 1

    Target               RCS [m^2]    RCS [dB]
    Pedestrian           0.01         −20
    Car                  100          20
    Truck                200          23
    Corner reflection    20379        43
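    By way of a non-limiting illustration, the conversion of an RCS value to decibels and the selection of a 3D frame size may be sketched as follows. The function names `rcs_to_db` and `frame_size` are assumptions made for this example. The description specifies frame sizes only for RCS_dB < 0 and for 0 < RCS_dB < 20, so the sketch returns `None` for any other value rather than guessing a size.

```python
import math


def rcs_to_db(rcs_m2):
    """Convert a radar cross section in square meters to decibels."""
    return 10.0 * math.log10(rcs_m2)


def frame_size(rcs_db):
    """Return the 3D frame size in meters for a given RCS in dB.

    Only the two ranges stated in the description are handled; values
    outside them are left unspecified and yield None.
    """
    if rcs_db < 0:
        return (0.5, 0.5, 1.8)
    if 0 < rcs_db < 20:
        return (2.0, 4.0, 1.8)
    return None


pedestrian_db = rcs_to_db(0.01)
pedestrian_frame = frame_size(pedestrian_db)
```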

    [0039] Step 4: anchor points are set at a certain interval in a blind area in which no millimeter wave radar point is generated, and each of the anchor points is traversed and used as a center (i.e., the center of a second region of interest) to generate the second region of interest. Each anchor point generates two 3D frames of two sizes, (0.5 m*0.5 m*1.8 m) and (2 m*4 m*1.8 m), as shown in FIG. 4. The short-wave detection range of the millimeter wave radar is limited. The short-wave detection range from 0 to 30 meters is 90 degrees, and the short-wave detection range from 30 to 70 meters is reduced to 18 degrees. The long-wave detection angle at 70 meters away is 18 degrees, which causes the blind area. The range of detecting obstacles is [−30, 30, 0, 70]. The following method is adopted for positions where there is no region of interest within the range of detecting obstacles.

    [0040] 1) The first region of interest obtained in step 3 is projected onto a bird's-eye view plane.

    [0041] 2) A background area is obtained according to a projected two-dimensional candidate frame.

    [0042] 3) To ensure that a target is not missed, candidate frames are generated by traversing the background area.
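    By way of a non-limiting illustration, the generation of candidate frames in the background area may be sketched as follows. The function names, the 10 m anchor interval, the reading of [−30, 30, 0, 70] as a lateral range of −30 to 30 and a forward range of 0 to 70, and the treatment of each projected first region of interest as an axis-aligned rectangle (x_min, y_min, x_max, y_max) are all assumptions made for this example.

```python
FRAME_SIZES = ((0.5, 0.5, 1.8), (2.0, 4.0, 1.8))  # the two sizes of step 4


def background_anchors(projected_rois, x_range=(-30.0, 30.0),
                       y_range=(0.0, 70.0), interval=10.0):
    """Return grid anchor points that are not covered by any projected ROI."""
    n_x = int((x_range[1] - x_range[0]) / interval) + 1
    n_y = int((y_range[1] - y_range[0]) / interval) + 1
    anchors = []
    for i in range(n_x):
        for j in range(n_y):
            x = x_range[0] + i * interval
            y = y_range[0] + j * interval
            covered = any(x0 <= x <= x1 and y0 <= y <= y1
                          for (x0, y0, x1, y1) in projected_rois)
            if not covered:
                anchors.append((x, y))
    return anchors


def second_regions_of_interest(anchors):
    """Generate both frame sizes centered on every background anchor."""
    return [(x, y, size) for (x, y) in anchors for size in FRAME_SIZES]


# One radar-generated ROI projected onto the bird's-eye view plane.
anchors = background_anchors([(-1.0, 9.0, 1.0, 11.0)])
candidates = second_regions_of_interest(anchors)
```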

    [0043] In step 5: original laser point cloud data (i.e., the lidar data) is processed, and lidar points selected within a range of [−0.3, 2.5] meters in a direction perpendicular to the ground plane are reserved. The lidar data is equally sliced within the range of [−0.3, 2.5] meters and divided into 4 slices. Each of the 4 slices is compressed into a horizontal two-dimensional image. The horizontal two-dimensional image is combined with intensity information of the lidar points to obtain a point cloud bird's-eye view in a [600 700 5] dimensionality, and a neural network model is used to extract a feature map of the point cloud bird's-eye view and a feature map of the camera image (i.e., the camera data). A size of the feature map is unchanged from a size of an input image (i.e., the camera image and the point cloud bird's-eye view).
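    By way of a non-limiting illustration, the pre-processing of the lidar points into a five-channel bird's-eye view may be sketched as follows. The function name `lidar_to_bev`, the 0.1 m cell size over a −30 to 30 by 0 to 70 area (chosen because it yields the stated 600 by 700 grid), the encoding of each height slice as the maximum height above the lower bound, the use of the maximum intensity per cell, and the sparse dictionary storage are assumptions made for this example.

```python
def lidar_to_bev(points, x_range=(-30.0, 30.0), y_range=(0.0, 70.0),
                 z_range=(-0.3, 2.5), resolution=0.1, n_slices=4):
    """Build a sparse bird's-eye view from (x, y, z, intensity) points.

    Returns the dense shape (width, length, n_slices + 1) and a dict that
    maps occupied cells (ix, iy) to their channel values: one channel per
    height slice followed by one intensity channel.
    """
    width = round((x_range[1] - x_range[0]) / resolution)
    length = round((y_range[1] - y_range[0]) / resolution)
    slice_height = (z_range[1] - z_range[0]) / n_slices
    cells = {}
    for x, y, z, intensity in points:
        inside = (x_range[0] <= x < x_range[1]
                  and y_range[0] <= y < y_range[1]
                  and z_range[0] <= z <= z_range[1])
        if not inside:
            continue  # only points within the height range are reserved
        ix = min(int((x - x_range[0]) / resolution), width - 1)
        iy = min(int((y - y_range[0]) / resolution), length - 1)
        k = min(int((z - z_range[0]) / slice_height), n_slices - 1)
        cell = cells.setdefault((ix, iy), [0.0] * (n_slices + 1))
        cell[k] = max(cell[k], z - z_range[0])
        cell[n_slices] = max(cell[n_slices], intensity)
    return (width, length, n_slices + 1), cells


sample_points = [
    (0.05, 10.05, 0.0, 0.5),   # lowest slice
    (0.05, 10.05, 2.0, 0.9),   # highest slice, same cell
    (0.05, 10.05, 3.0, 1.0),   # above 2.5 m, discarded
]
bev_shape, bev_cells = lidar_to_bev(sample_points)
```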

    [0044] In step 6: there is an index number for the anchor points generated by the millimeter wave radar and the anchor points generated by traversing. The region of interest (i.e., the first region of interest and the second region of interest) is projected into two feature maps. The millimeter wave radar data and the lidar data are both three-dimensional data, and a three-dimensional region of interest can be determined through a spatial synchronization of the two (i.e., the millimeter wave radar data and the lidar data) in step 1. A vertex coordinate of the three-dimensional region of interest R=[x; y; z] is obtained. A conversion relationship of the three-dimensional region of interest is:

    [00009] $$ \begin{bmatrix} X_p \\ Y_p \\ Z \end{bmatrix} = P \begin{bmatrix} R_c & T_c \\ O^{T} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} $$

    [0045] (X_p, Y_p) are coordinates of projection points in an image coordinate system, and P is a matrix of camera parameters. Through the above steps, coordinate points of a 3D area in an image are obtained. Each of the vertex coordinates which have been obtained is adjusted according to a ground vector [n Tra[2]]^T composed of a normal vector n perpendicular to the ground plane and a camera height Tra[2]. An adjusted coordinate point (X, Y, Z*) is obtained.

    [00010] $$ \begin{bmatrix} n & Tra[2] \end{bmatrix}^{T} = \begin{bmatrix} a & b & c & d \end{bmatrix}^{T}, \qquad Z^{*} = -\frac{aX + cY + d}{b}. $$
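    By way of a non-limiting illustration, the projection of a vertex of the three-dimensional region of interest and the adjustment of a coordinate point according to the ground vector may be sketched as follows. The function names `project_vertex` and `adjust_to_ground`, the treatment of P as a 3 by 4 matrix, and the numerical values are assumptions made for this example.

```python
def project_vertex(P, R_c, T_c, vertex):
    """Apply P [R_c T_c; 0 1] [x y z 1]^T and return (X_p, Y_p, Z)."""
    # Rigid transform of the vertex, kept in homogeneous form.
    transformed = [sum(R_c[i][j] * vertex[j] for j in range(3)) + T_c[i]
                   for i in range(3)] + [1.0]
    return [sum(P[i][j] * transformed[j] for j in range(4)) for i in range(3)]


def adjust_to_ground(X, Y, ground_vector):
    """Return (X, Y, Z*) with Z* = -(a*X + c*Y + d) / b."""
    a, b, c, d = ground_vector
    return (X, Y, -(a * X + c * Y + d) / b)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P_EXAMPLE = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]  # hypothetical camera parameter matrix

projected = project_vertex(P_EXAMPLE, IDENTITY, [0.0, 0.0, 1.0], [1.0, 2.0, 3.0])
adjusted = adjust_to_ground(1.0, 2.0, (0.1, 2.0, 0.2, -1.0))
```

    In this sketch the adjusted point satisfies a*X + b*Z* + c*Y + d = 0, i.e., it lies on the plane described by the ground vector.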

    [0046] The feature maps of the same size are fused to form a fused image, the fused image is processed through a fully connected layer, anchor points are filtered out, the size and direction of anchor point boxes are regressed, a score is given for each of the anchor point boxes, and the higher-scored anchor point boxes are retained through non-maximum suppression (NMS). A test result is obtained and output.
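    By way of a non-limiting illustration, the retention of higher-scored anchor point boxes through non-maximum suppression may be sketched as follows. The sketch uses the common greedy form of NMS on axis-aligned two-dimensional boxes (x_min, y_min, x_max, y_max); the function names, the 0.5 overlap threshold, and the simplification from oriented 3D boxes to axis-aligned 2D boxes are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    overlap_x = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    overlap_y = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = overlap_x * overlap_y
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of retained boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap a better box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 6.0, 6.0)]
scores = [0.9, 0.8, 0.7]
kept_indices = non_maximum_suppression(boxes, scores)
```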

    [0047] The aforementioned embodiments are merely some embodiments of the present disclosure, and the scope of the disclosure is not limited thereto. Thus, it is intended that the present disclosure cover any modifications and variations of the presently presented embodiments provided they are made without departing from the appended claims and the specification of the present disclosure.