REAL-TIME GROUND FUSION METHOD AND SYSTEM BASED ON BINOCULAR STEREO VISION, AND INTELLIGENT TERMINAL

20230147557 · 2023-05-11

    Inventors

    Cpc classification

    International classification

    Abstract

    A real-time ground fusion method and system based on binocular stereo vision, and an intelligent terminal, are provided. The method for accomplishing real-time ground fusion includes: S1 of obtaining a disparity map about a same road scenario, and converting a disparity map in a target region into a 3D point cloud; S2 of performing pose conversion on a current frame and a next frame adjacent to the current frame, and performing inverse conversion on a 3D point cloud of the current frame; and S3 of repeating S2 with each frame in the target region as the current frame, so as to achieve ground fusion. Through the conversion and fusion of adjacent frames, holes caused by the projection of the disparity map can be filled to assist driving and output accurate height data, thereby improving ride comfort.

    Claims

    1. A real-time ground fusion method based on binocular stereo vision, comprising: S1 of obtaining a disparity map about a same road scenario, and converting a disparity map in a target region into a three-dimensional (3D) point cloud; S2 of performing pose conversion on a current frame and a next frame adjacent to the current frame, and performing inverse conversion on a 3D point cloud of the current frame; and S3 of repeating S2 with each frame in the target region as the current frame, so as to achieve ground fusion.

    2. The real-time ground fusion method according to claim 1, wherein the disparity map in the target region is converted into the 3D point cloud through [X Y Z W]^T = [1 0 0 −cx; 0 1 0 −cy; 0 0 0 f; 0 0 −1/baseline (cx−c′x)/baseline]·[u v disparity 1]^T, where u and v represent coordinates of a pixel in an image, disparity represents a disparity value of a corresponding pixel, f represents a focal length of a camera, cx and cy represent coordinates of an optical center of a left camera, c′x represents a coordinate of an optical center of a right camera, baseline represents a distance between the optical center of the left camera and the optical center of the right camera, and X, Y, Z and W represent homogeneous coordinates in a 3D coordinate system.

    3. The real-time ground fusion method according to claim 1, wherein the pose conversion is performed on the current frame and the next frame adjacent to the current frame through [P′_world 1]^T = [R t; 0 1]·[P_camera 1]^T, where P_camera represents 3D coordinates in a camera coordinate system, P′_world represents coordinates in a world coordinate system after the pose conversion, R represents a rotation matrix for two frames, and t represents a translation matrix for two frames.

    4. The real-time ground fusion method according to claim 3, wherein the rotation matrix and the translation matrix are obtained through: extracting feature points of the two frames, and matching the two frames in accordance with the extracted feature points; obtaining a matching relation, and calculating essential matrices for data of the two frames; performing calculation on randomly-selected N point pairs through a RANSAC algorithm, so as to obtain an optimal essential matrix; and performing Singular Value Decomposition (SVD) on the optimal essential matrix, so as to obtain the rotation matrix and the translation matrix for the pose conversion.

    5. The real-time ground fusion method according to claim 4, wherein the feature points of the two frames are matched through a Fast Library for Approximate Nearest Neighbors (FLANN) nearest neighbor matching algorithm.

    6. The real-time ground fusion method according to claim 4, wherein the calculating the essential matrices for the data of the two frames comprises, when a pair of matching points p1 and p2 meet p2^T·K^(−T)·t^·R·K^(−1)·p1 = 0 in accordance with a geometrical relationship, determining E = t^R as the essential matrix, where t^ represents a skew-symmetric matrix of t, K represents a camera intrinsic parameter matrix, t represents the translation matrix for the two frames, and R represents the rotation matrix for the two frames.

    7. The real-time ground fusion method according to claim 4, wherein N is 8.

    8. A real-time ground fusion system based on binocular stereo vision, comprising: a data obtaining unit configured to obtain a disparity map about a same road scenario, and convert a disparity map in a target region into a 3D point cloud; a pose conversion unit configured to perform pose conversion on a current frame and a next frame adjacent to the current frame, and perform inverse conversion on a 3D point cloud of the current frame; and a ground fusion unit configured to perform the pose conversion and the inverse conversion repeatedly with each frame in the target region as the current frame, so as to achieve ground fusion.

    9. An intelligent terminal, comprising a data collection device, a processor and a memory, wherein the data collection device is configured to collect data, the memory is configured to store therein one or more program instructions, and the processor is configured to execute the one or more program instructions, so as to implement the real-time ground fusion method according to claim 1.

    10. A computer-readable storage medium storing therein one or more program instructions, wherein the one or more program instructions are executed to implement the real-time ground fusion method according to claim 1.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0017] In order to illustrate the technical solutions of the present disclosure or the related art more clearly, the drawings required for describing the present disclosure or the related art will be described hereinafter briefly. Obviously, the following drawings merely relate to some embodiments of the present disclosure, and based on these drawings, a person skilled in the art may obtain other drawings without any creative effort.

    [0018] The structure, scale and size shown in the drawings are merely provided to facilitate the understanding of the contents disclosed in the description, and shall not be construed as limiting the scope of the present disclosure, so they have no substantial technical meaning. Any modification of the structure, change of the scale or adjustment of the size shall also fall within the scope of the present disclosure, as long as the effects and purposes of the present disclosure are not affected.

    [0019] FIG. 1 is a flow chart of a real-time ground fusion method based on binocular stereo vision according to one embodiment of the present disclosure;

    [0020] FIG. 2 is a view showing an original image collected by a binocular camera;

    [0021] FIG. 3 is a view showing a 3D point cloud into which the original image in FIG. 2 is converted;

    [0022] FIG. 4 is a view showing a situation where misplacement occurs for the point cloud;

    [0023] FIG. 5 is a view showing a point cloud for a single frame;

    [0024] FIG. 6 is a view showing a point cloud for fused frames; and

    [0025] FIG. 7 is a block diagram of a real-time ground fusion system based on binocular stereo vision according to one embodiment of the present disclosure.

    DETAILED DESCRIPTION


    [0027] A camera is an indispensable component for detecting an obstacle. A binocular stereo camera provides accurate point cloud data within a short range, so it is well suited to detecting a road surface height. A suspension may be adaptively adjusted in accordance with information about the road surface height, so as to improve ride comfort. An object of the present disclosure is to provide a real-time ground fusion method based on binocular stereo vision, so as to fuse the point cloud information of multiple frames and fill holes in the 3D point cloud, thereby improving the detection effect and facilitating the adjustment of the suspension.

    [0028] As shown in FIG. 1, the present disclosure provides in some embodiments a real-time ground fusion method based on binocular stereo vision, which includes the following steps.

    [0029] S1: obtaining a disparity map about a same road scenario, and converting a disparity map in a target region into a 3D point cloud.

    [0030] To be specific, the disparity map in the target region is converted into the 3D point cloud through

    [00003]

        [ X ]   [ 1   0       0                        -cx ]   [ u         ]
        [ Y ] = [ 0   1       0                        -cy ] · [ v         ]
        [ Z ]   [ 0   0       0                          f ]   [ disparity ]
        [ W ]   [ 0   0   -1/baseline   (cx-c′x)/baseline ]   [ 1         ] ,

    where u and v represent coordinates of a pixel in an image, disparity represents a disparity value of a corresponding pixel, f represents a focal length of a camera, cx and cy represent coordinates of an optical center of a left camera, c′x represents a coordinate of an optical center of a right camera, baseline represents a distance between the optical center of the left camera and the optical center of the right camera, and X, Y, Z and W represent homogeneous coordinates in a 3D coordinate system.

    [0031] In an actual scenario, a collected original image is shown in FIG. 2. When the disparity map is converted into the 3D point cloud through the above formula, as shown in FIG. 3, holes may occur in the outputted ground information, and the larger the distance, the more obvious the holes. These holes therefore need to be filled subsequently.
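
The reprojection formula above can be sketched in a few lines of NumPy. The function and argument names are illustrative rather than part of the disclosure, and the sign conventions follow the patent's matrix exactly (so the −1/baseline term in the last row determines the sign of Z), not any particular library's:

```python
import numpy as np

def disparity_to_point_cloud(u, v, disparity, f, cx, cy, cx_r, baseline):
    """Reproject pixels (u, v) with disparity values into 3D points.

    Implements [X Y Z W]^T = Q @ [u v disparity 1]^T, then divides by W.
    f, cx, cy, cx_r come from stereo calibration of the left/right cameras.
    """
    Q = np.array([
        [1.0, 0.0, 0.0,             -cx],
        [0.0, 1.0, 0.0,             -cy],
        [0.0, 0.0, 0.0,               f],
        [0.0, 0.0, -1.0 / baseline, (cx - cx_r) / baseline],
    ])
    # Stack pixel coordinates into homogeneous columns, shape (4, N).
    pix = np.stack([u, v, disparity, np.ones_like(u, dtype=float)])
    XYZW = Q @ pix
    # Divide by W to obtain Euclidean coordinates, shape (N, 3).
    return (XYZW[:3] / XYZW[3]).T
```

For rectified cameras cx equals cx_r, so W reduces to −disparity/baseline and depth scales as f·baseline/disparity, which is the familiar stereo depth relation.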

    [0032] S2: performing pose conversion on a current frame and a next frame adjacent to the current frame, and performing inverse conversion on a 3D point cloud of the current frame.

    [0033] A vehicle fluctuates slightly during driving. When the vehicle motion model is treated as planar motion and the data of two adjacent frames is fused directly, the point clouds may be misaligned, as shown in FIG. 4.

    [0034] The pose conversion is performed on the current frame and the next frame adjacent to the current frame through

    [00004]

        [ P′_world ]   [ R   t ]   [ P_camera ]
        [    1     ] = [ 0   1 ] · [    1     ] ,

    where P_camera represents 3D coordinates in a camera coordinate system, P′_world represents coordinates in a world coordinate system after the pose conversion, R represents a rotation matrix for the two frames, and t represents a translation matrix for the two frames.
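
The homogeneous transform above can be sketched as follows; this is a minimal NumPy illustration, and the helper name and argument layout are assumptions rather than the disclosed code:

```python
import numpy as np

def apply_pose(R, t, points_camera):
    """Map camera-frame points to the world frame via [P'; 1] = [R t; 0 1][P; 1].

    R: (3, 3) rotation matrix, t: (3,) translation vector,
    points_camera: (N, 3) array of 3D points.
    """
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    # Append a 1 to each point to form homogeneous coordinates, shape (N, 4).
    homog = np.hstack([points_camera, np.ones((len(points_camera), 1))])
    # Transform and drop the homogeneous coordinate again.
    return (T @ homog.T).T[:, :3]
```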

    [0035] The transform matrices R and t for the two frames may be obtained through visual odometry. In other words, the rotation matrix and the translation matrix are obtained through: extracting feature points of the two frames, and matching the two frames in accordance with the extracted feature points; obtaining a matching relation, and calculating essential matrices for the data of the two frames; performing calculation on randomly-selected 8 point pairs through a RANSAC algorithm, so as to obtain an optimal essential matrix; and performing SVD on the optimal essential matrix, so as to obtain the rotation matrix and the translation matrix for the pose conversion.
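
As a generic illustration of the final SVD step, the textbook decomposition of an essential matrix into its four (R, t) candidates can be sketched as below. This is not the disclosed implementation; the function name is illustrative, and in a full pipeline the correct candidate is selected by a cheirality check (triangulated points must lie in front of both cameras):

```python
import numpy as np

def decompose_essential(E):
    """Recover the four (R, t) candidates from an essential matrix by SVD.

    The translation t is recovered only up to scale (here, unit norm).
    """
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations (determinant +1) by flipping signs if needed.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    # W is the 90-degree rotation about z used in the standard decomposition.
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]  # last column of U: the null direction of E^T
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```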

    [0036] In order to improve the real-time performance of the algorithm, the feature points of the two frames are matched through a FLANN (Fast Library for Approximate Nearest Neighbors) nearest neighbor matching algorithm.

    [0037] To be specific, when a pair of matching points p1 and p2 meet p2^T·K^(−T)·t^·R·K^(−1)·p1 = 0 in accordance with the epipolar geometrical relationship, E = t^R may be determined as the essential matrix, where t^ denotes the skew-symmetric matrix of t, K represents the camera intrinsic parameter matrix, t represents the translation matrix for the two frames, and R represents the rotation matrix for the two frames.
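
This constraint can be checked numerically as below. The helper name is hypothetical; the intermediate F = K^(−T)·E·K^(−1) is the fundamental matrix implied by the constraint, which lets the residual be evaluated directly on pixel coordinates:

```python
import numpy as np

def epipolar_residual(p1, p2, K, R, t):
    """Evaluate p2^T K^-T [t]x R K^-1 p1, which is zero for a correct match.

    p1, p2: homogeneous pixel coordinates (u, v, 1); K: intrinsic matrix;
    [t]x: skew-symmetric matrix of t, so E = [t]x @ R.
    """
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    E = tx @ R
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)
    return float(p2 @ F @ p1)
```

A synthetic match (a 3D point seen from two poses) should yield a residual of essentially zero, while a mismatched pair yields a clearly non-zero value; this is the test RANSAC applies to each putative correspondence.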

    [0038] In this way, the inverse conversion may be performed on the point cloud of the current frame, i.e., the inverse of the matrix T obtained by combining R and t in S2 may be applied, so as to eliminate the misalignment caused by the rotation and translation between the two frames.
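
The inverse of T = [R t; 0 1] has a closed form, T^(−1) = [R^T −R^T·t; 0 1], which avoids a general matrix inversion. A minimal illustrative helper (not the disclosed code):

```python
import numpy as np

def invert_pose(R, t):
    """Invert T = [R t; 0 1] in closed form: T^-1 = [R^T  -R^T t; 0 1].

    Applying this inverse to the current frame's point cloud undoes the
    rotation and translation between the two frames.
    """
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T       # the inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t
    return T_inv
```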

    [0039] S3: repeating S2 with each frame in the target region as the current frame, so as to achieve ground fusion.

    [0040] Through comparing FIG. 5 with FIG. 6, it can be seen that, after the fusion of the multiple frames, the number of holes in the road surface is reduced, thereby improving the quality of the spatial information in the scenario.
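
Steps S1 to S3 can be tied together in a schematic fusion loop. Everything here is a sketch under stated assumptions: each frame carries its point cloud plus the (R, t) estimated against the previous frame by visual odometry, in the convention p_prev = R·p_cur + t (the first frame carries identity); if the odometry reports the opposite direction, the inverse transform from S2 is substituted:

```python
import numpy as np

def fuse_frames(frames):
    """Fuse per-frame point clouds into one ground map (illustrative sketch).

    frames: list of (points, R, t) tuples, points of shape (N, 3).
    Every cloud is carried into the first frame's coordinates by chaining
    the frame-to-frame transforms, then all clouds are concatenated.
    """
    fused = []
    T_acc = np.eye(4)  # accumulated transform: current frame -> first frame
    for points, R, t in frames:
        T = np.eye(4)          # this frame -> previous frame
        T[:3, :3] = R
        T[:3, 3] = t
        T_acc = T_acc @ T      # this frame -> first frame
        homog = np.hstack([points, np.ones((len(points), 1))])
        fused.append((T_acc @ homog.T).T[:, :3])
    return np.vstack(fused)
```

In a real-time setting one would additionally drop frames that have left the target region, so that the fused map stays bounded in size.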

    [0041] According to the real-time ground fusion method based on binocular stereo vision in the embodiments of the present disclosure, the disparity map about the same road scenario is obtained, and the disparity map in the target region is converted into the 3D point cloud. Next, the pose conversion is performed on the current frame and the next frame adjacent to the current frame, and the inverse conversion is performed on the 3D point cloud of the current frame. Then, the pose conversion and the inverse conversion are performed repeatedly with each frame in the target region as the current frame, so as to achieve ground fusion. As a result, through the conversion and fusion of the adjacent frames, the holes caused by the projection of the disparity map are filled for assisted driving, and accurate height data is outputted, thereby improving ride comfort.

    [0042] The present disclosure further provides in some embodiments a real-time ground fusion system based on binocular stereo vision which, as shown in FIG. 7, includes: a data obtaining unit 100 configured to obtain a disparity map about a same road scenario, and convert a disparity map in a target region into a 3D point cloud; a pose conversion unit 200 configured to perform pose conversion on a current frame and a next frame adjacent to the current frame, and perform inverse conversion on a 3D point cloud of the current frame; and a ground fusion unit 300 configured to perform the pose conversion and the inverse conversion repeatedly with each frame in the target region as the current frame, so as to achieve ground fusion.

    [0043] According to the real-time ground fusion system based on binocular stereo vision in the embodiments of the present disclosure, the disparity map about the same road scenario is obtained, and the disparity map in the target region is converted into the 3D point cloud. Next, the pose conversion is performed on the current frame and the next frame adjacent to the current frame, and the inverse conversion is performed on the 3D point cloud of the current frame. Then, the pose conversion and the inverse conversion are performed repeatedly with each frame in the target region as the current frame, so as to achieve ground fusion. As a result, through the conversion and fusion of the adjacent frames, the holes caused by the projection of the disparity map are filled for assisted driving, and accurate height data is outputted, thereby improving ride comfort.

    [0044] The present disclosure further provides in some embodiments an intelligent terminal, which includes a data collection device, a processor and a memory. The data collection device is configured to collect data, the memory is configured to store therein one or more program instructions, and the processor is configured to execute the one or more program instructions so as to implement the above-mentioned real-time ground fusion method.

    [0045] The present disclosure further provides in some embodiments a computer-readable storage medium storing therein one or more program instructions. The one or more program instructions are executed to implement the above-mentioned real-time ground fusion method.

    [0046] In the embodiments of the present disclosure, the processor may be an integrated circuit (IC) having a signal processing capability. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or any other programmable logic element, discrete gate or transistor logic element, or a discrete hardware assembly, which may be used to implement or execute the methods, steps or logic diagrams in the embodiments of the present disclosure. The general purpose processor may be a microprocessor or any other conventional processor. The steps of the method in the embodiments of the present disclosure may be directly implemented by the processor in the form of hardware, or a combination of hardware and software modules in the processor. The software module may be located in a known storage medium such as a Random Access Memory (RAM), a flash memory, a Read-Only Memory (ROM), a Programmable ROM (PROM), an Electrically Erasable PROM (EEPROM), or a register. The processor may read information stored in the storage medium so as to implement the steps of the method in conjunction with the hardware.

    [0047] The storage medium may be a memory, e.g., a volatile memory, a nonvolatile memory, or both.

    [0048] The nonvolatile memory may be an ROM, a PROM, an EPROM, an EEPROM or a flash disk.

    [0049] The volatile memory may be an RAM which serves as an external high-speed cache. Illustratively but nonrestrictively, the RAM may include Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM) or Direct Rambus RAM (DRRAM).

    [0050] The storage medium in the embodiments of the present disclosure intends to include, but not limited to, the above-mentioned and any other appropriate memories.

    [0051] It should be appreciated that, in one or more examples, the functions mentioned in the embodiments of the present disclosure may be achieved through hardware in conjunction with software. For the implementation, the corresponding functions may be stored in a computer-readable medium, or may be transmitted as one or more instructions on the computer-readable medium. The computer-readable medium may include a computer-readable storage medium and a communication medium. The communication medium may include any medium capable of transmitting a computer program from one place to another place. The storage medium may be any available medium capable of being accessed by a general-purpose or special-purpose computer.

    [0052] The above embodiments are for illustrative purposes only, but the present disclosure is not limited thereto. Obviously, a person skilled in the art may make further modifications and improvements without departing from the spirit of the present disclosure, and these modifications and improvements shall also fall within the scope of the present disclosure.