DATASET GENERATION METHOD FOR SELF-SUPERVISED LEARNING SCENE POINT CLOUD COMPLETION BASED ON PANORAMAS
20230094308 · 2023-03-30
Inventors
- Xin YANG (Dalian, CN)
- Tong LI (Dalian, CN)
- Baocai YIN (Dalian, CN)
- Zhaoxuan ZHANG (Dalian, CN)
- Boyan WEI (Dalian, CN)
- Zhenjun DU (Dalian, CN)
CPC classification
- G06V10/7792 (Physics)
- G06V20/647 (Physics)
Abstract
The present invention belongs to the technical field of 3D reconstruction in computer vision, and provides a dataset generation method for self-supervised learning scene point cloud completion based on panoramas. Pairs of an incomplete point cloud and a target point cloud, each carrying RGB information and normal information, are generated by taking an RGB panorama, a depth panorama and a normal panorama from the same view as input, and are used to construct a self-supervised learning dataset for training a scene point cloud completion network. The key points of the present invention are occlusion prediction and equirectangular projection based on view conversion, and the handling of the stripe problem and the point-to-point occlusion problem during conversion. The method of the present invention simplifies the collection of point cloud data in real scenes, introduces an occlusion prediction idea based on view conversion, and designs a view selection strategy.
Claims
1. A dataset generation method for self-supervised learning scene point cloud completion based on panoramas, comprising the following steps:

Step 1: Generating an Initial Point Cloud from a Panorama Under a Specific View
1.1) introducing a sphere to represent the three-dimensional world, and representing the coordinates in the x, y and z directions by longitude and latitude, wherein the radius r of the sphere represents a depth value; assuming that the length of a depth panorama D.sub.1 corresponds to the range of −180° to 180° in the horizontal direction of a scene, and the width of the depth panorama D.sub.1 corresponds to the range of −90° to 90° in the vertical direction; representing the coordinate of each pixel of the depth panorama D.sub.1 by longitude and latitude, wherein the radius of the point on the sphere corresponding to each pixel is the depth value of that pixel in the depth panorama D.sub.1; and, in a spherical coordinate system, converting the latitude, longitude and depth value of each pixel into x, y and z coordinates in the camera coordinate system to generate a point cloud P.sub.0;
1.2) converting the point cloud P.sub.0 from the camera coordinate system to the world coordinate system based on the camera extrinsic parameter corresponding to the view v.sub.1, and assigning the color information of the RGB panorama C.sub.1 and the normal panorama N.sub.1 to each point in the point cloud P.sub.0 in the row-column order of the pixels to generate an initial point cloud P.sub.1 with RGB information and an initial point cloud P.sub.2 with normal information;

Step 2: Selecting a New Occlusion Prediction View Based on the Initial Point Cloud
2.1) encoding the initial point cloud P.sub.1 with a truncated signed distance function; dividing the selected 3D space to be modeled into a plurality of small blocks, each small block being called a voxel; storing, in each voxel, the distance value between the small block and the nearest object surface, the sign of the distance value indicating whether the voxel lies in free space or in enclosed space; and conducting truncation processing if the absolute value of the distance value exceeds a set truncation distance D;
2.2) assuming that the voxel block corresponding to the view v.sub.1 is t.sub.0; updating the distance value of t.sub.0 to 0; and updating the distance values of the voxel blocks near t.sub.0 according to their distance from t.sub.0, wherein the smaller the distance from t.sub.0, the larger the reduction of the distance value;
2.3) traversing each voxel block to find the voxel block with the largest distance value; selecting the voxel block closest to the scene center if a plurality of voxel blocks share the largest distance value; randomly selecting from the voxel blocks which satisfy the conditions if the distances from the scene center are also the same; and taking the center of the selected voxel block as the position of a view v.sub.2 to obtain the translation matrix of the view v.sub.2, with the rotation matrix of the view v.sub.2 being the same as the rotation matrix of the view v.sub.1;

Step 3: Generating a Panorama Under the Selected View from the Initial Point Cloud
3.1) converting the initial point cloud P.sub.1 with RGB information and the initial point cloud P.sub.2 with normal information from the world coordinate system to the camera coordinate system based on the camera extrinsic parameter corresponding to the view v.sub.2;
3.2) in the spherical coordinate system, converting the x, y and z coordinates of each point in the point cloud P.sub.1 and the point cloud P.sub.2 respectively into latitude, longitude and radius, and mapping them to pixel positions of a 2D panorama; making the color of each point correspond to its pixel position; and, considering that deciding occlusion point-to-point is inconsistent with the real world, increasing the influence range of each point, specifically, extending each calculated pixel (x, y) outward to the pixels (x, y), (x+1, y), (x, y+1) and (x+1, y+1), and copying the information carried by each pixel to the new pixels;
3.3) resolving the problem that multiple points correspond to the same pixel when the pixels are merged into a panorama; firstly, initializing the depth value of each pixel of a depth panorama D.sub.2 to 65535, the maximum value representable by an unsigned 16-bit binary number, and initializing the color value of each pixel of an RGB panorama C.sub.2 and a normal panorama N.sub.2 to a background color; then conducting the following operation on all the pixels generated in step 3.2): acquiring the position (x, y) of the pixel and the corresponding depth value, and comparing it with the depth value at (x, y) in the depth panorama D.sub.2; if the former depth value is smaller, updating the depth value at (x, y) in the depth panorama D.sub.2 and the color values at (x, y) in the RGB panorama C.sub.2 and the normal panorama N.sub.2; if the latter depth value is smaller, keeping them unchanged; and after all the updates are completed, obtaining the RGB panorama C.sub.2, the depth panorama D.sub.2 and the normal panorama N.sub.2 rendered under the new view v.sub.2;

Step 4: Generating an Incomplete Point Cloud from the Panorama Under the Selected View
4.1) generating a point cloud {tilde over (P)}.sub.0 from the depth panorama D.sub.2, as in step 1.1);
4.2) obtaining the normal direction in the world coordinate system from the normal panorama N.sub.2, and converting the normal direction from the world coordinate system to the camera coordinate system according to the camera extrinsic parameter corresponding to the view v.sub.2, wherein the normal panorama N.sub.2 is rendered in the camera coordinate system corresponding to the view v.sub.2, but its colors record normal directions in the world coordinate system;
4.3) since the incompleteness of the scene is mainly caused by occlusion but partly caused by the change of view, calculating angle masks during the 2D-3D equirectangular projection to locate the stripe areas, so that the scene point cloud completion network can focus on completing the truly occluded areas; a specific implementation is: for each point in the point cloud {tilde over (P)}.sub.0 in the camera coordinate system, denoting the vector from the origin to the point in {tilde over (P)}.sub.0 as {right arrow over (n)}.sub.1; denoting the vector of the point, read from the normal panorama N.sub.2 in row-column order, as {right arrow over (n)}.sub.2; calculating the angle α between the vector {right arrow over (n)}.sub.1 and the vector {right arrow over (n)}.sub.2; then calculating the absolute value of the difference between the angle α and 90°; and filtering the points whose absolute value is less than 15° as the angle masks;
4.4) converting the point cloud {tilde over (P)}.sub.0 from the camera coordinate system to the world coordinate system based on the camera extrinsic parameter corresponding to the view v.sub.2, and assigning the color information of the RGB panorama C.sub.2 and the normal panorama N.sub.2 to each point in the point cloud {tilde over (P)}.sub.0 in the row-column order of the pixels to generate an incomplete point cloud P.sub.3 with RGB information and an incomplete point cloud P.sub.4 with normal information;

Step 5: Constructing a Self-Supervised Learning Dataset
taking the incomplete point cloud P.sub.3 with RGB information, the incomplete point cloud P.sub.4 with normal information and the angle masks as input for the training of the scene point cloud completion network, wherein the targets of the scene point cloud completion network are the initial point cloud P.sub.1 with RGB information and the initial point cloud P.sub.2 with normal information; thus generating self-supervised learning data pairs for scene point cloud completion and constructing the self-supervised learning dataset.
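The sketches below are not part of the claims; they are minimal, non-authoritative Python (NumPy) renderings of individual steps under stated assumptions. This first sketch corresponds to the 2D-3D equirectangular projection of step 1.1): the pixel-center convention and the spherical-to-Cartesian axis convention are assumptions, since claim 1 only specifies that latitude, longitude and depth are converted to x, y and z coordinates in the camera coordinate system.

```python
# Minimal sketch of step 1.1): lifting an equirectangular depth panorama to a
# point cloud in the camera coordinate system. The axis convention and depth
# units are illustrative assumptions.
import numpy as np

def depth_panorama_to_points(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) array of radial distances; returns (H*W, 3) xyz points."""
    h, w = depth.shape
    # Pixel centers: longitude spans [-180, 180) deg, latitude spans [-90, 90) deg.
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi       # horizontal angle
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2.0        # vertical angle
    lon, lat = np.meshgrid(lon, lat)                            # both (H, W)
    r = depth.astype(np.float64)
    # Spherical -> Cartesian under one common convention.
    x = r * np.cos(lat) * np.cos(lon)
    y = r * np.cos(lat) * np.sin(lon)
    z = r * np.sin(lat)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```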
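A sketch of the view selection in steps 2.1)-2.3), under loose assumptions: the scene is voxelized on a regular grid, a KD-tree nearest-neighbor query stands in for the truncated signed distance encoding, and the reduction of distance values around t.sub.0 is a simple linear scaling (the claim only requires that voxels closer to t.sub.0 are reduced more). The grid resolution and truncation distance are illustrative parameters.

```python
# Minimal sketch of steps 2.1)-2.3): pick the position of view v2 as the voxel
# whose (truncated, v1-attenuated) distance value is largest.
import numpy as np
from scipy.spatial import cKDTree

def select_next_view(points: np.ndarray, v1_pos: np.ndarray,
                     grid_res: int = 32, trunc: float = 1.0) -> np.ndarray:
    """points: (N, 3) world-frame cloud; v1_pos: (3,) position of view v1.
    Returns the world-frame position proposed for view v2."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    size = (hi - lo) / grid_res                                  # voxel edge lengths
    centers = lo + (np.indices((grid_res,) * 3).reshape(3, -1).T + 0.5) * size

    # Unsigned, truncated distance from each voxel center to the nearest surface
    # point (a KD-tree stand-in for the TSDF of step 2.1).
    dist, _ = cKDTree(points).query(centers)
    dist = np.minimum(dist, trunc)

    # Step 2.2): the voxel containing v1 drops to 0; voxels closer to t0 are
    # reduced more (linear attenuation, illustrative choice).
    d_to_v1 = np.linalg.norm(centers - v1_pos, axis=1)
    dist = dist * np.clip(d_to_v1 / d_to_v1.max(), 0.0, 1.0)

    # Step 2.3): largest remaining distance value, ties broken by proximity to
    # the scene center.
    scene_center = (lo + hi) / 2.0
    best = np.flatnonzero(dist == dist.max())
    best = best[np.argmin(np.linalg.norm(centers[best] - scene_center, axis=1))]
    return centers[best]
```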
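A sketch of steps 3.2)-3.3): each camera-frame point is projected to equirectangular pixel coordinates, splatted over a 2x2 pixel footprint, and merged with a per-pixel depth test whose buffer is initialized to 65535. The panorama resolution and the millimetre depth scale are assumptions; the per-point Python loop is written for clarity rather than speed.

```python
# Minimal sketch of steps 3.2)-3.3): 3D-2D equirectangular projection with a
# 2x2 splat and a depth test against a uint16 buffer initialized to 65535.
import numpy as np

def render_panorama(points: np.ndarray, colors: np.ndarray,
                    height: int = 512, width: int = 1024):
    """points: (N, 3) camera-frame xyz; colors: (N, 3) uint8 RGB.
    Returns (depth_pano uint16, rgb_pano uint8)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    lon = np.arctan2(y, x)                                        # [-pi, pi]
    lat = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # [-pi/2, pi/2]
    px = ((lon + np.pi) / (2.0 * np.pi) * width).astype(int) % width
    py = ((lat + np.pi / 2.0) / np.pi * height).astype(int).clip(0, height - 1)

    depth_pano = np.full((height, width), 65535, dtype=np.uint16)  # step 3.3) init
    rgb_pano = np.zeros((height, width, 3), dtype=np.uint8)        # background color
    d16 = np.clip(r * 1000.0, 0, 65534).astype(np.uint16)          # assumed mm scale

    for i in range(len(points)):
        # Step 3.2): widen each point's footprint to a 2x2 block of pixels.
        for du, dv in ((0, 0), (1, 0), (0, 1), (1, 1)):
            u = (px[i] + du) % width
            v = min(py[i] + dv, height - 1)
            if d16[i] < depth_pano[v, u]:                          # keep nearer point
                depth_pano[v, u] = d16[i]
                rgb_pano[v, u] = colors[i]
    return depth_pano, rgb_pano
```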
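A sketch of the angle-mask filter in step 4.3): points whose viewing ray is nearly perpendicular to the surface normal (|α − 90°| < 15°) are flagged as stripe artifacts introduced by the change of view rather than by true occlusion. The normals are assumed to have already been rotated into the camera frame as in step 4.2).

```python
# Minimal sketch of step 4.3): flag points where the viewing ray is nearly
# perpendicular to the surface normal.
import numpy as np

def angle_mask(points_cam: np.ndarray, normals_cam: np.ndarray,
               threshold_deg: float = 15.0) -> np.ndarray:
    """points_cam: (N, 3) camera-frame points of the re-projected cloud;
    normals_cam: (N, 3) normals in the camera frame.
    Returns a boolean mask that is True for points in the angle (stripe) mask."""
    n1 = points_cam / np.maximum(np.linalg.norm(points_cam, axis=1, keepdims=True), 1e-9)
    n2 = normals_cam / np.maximum(np.linalg.norm(normals_cam, axis=1, keepdims=True), 1e-9)
    cos_a = np.clip(np.sum(n1 * n2, axis=1), -1.0, 1.0)
    alpha = np.degrees(np.arccos(cos_a))        # angle between viewing ray and normal
    return np.abs(alpha - 90.0) < threshold_deg
```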
Description
DETAILED DESCRIPTION
[0042] Specific embodiments of the present invention are further described below in combination with the accompanying drawings and the technical solution.
[0043] The present invention is based on the 2D-3D-Semantics dataset published by Stanford University. The dataset covers six large indoor areas from three different buildings of educational and office use. It contains 1413 equirectangular RGB panoramas, together with the corresponding depth maps, surface normal maps, semantic annotation maps and camera metadata, which are sufficient to support the dataset generation method for self-supervised learning scene point cloud completion based on panoramas proposed by the present invention. In addition, other captured or collected equirectangular panoramas are also applicable to the present invention.
[0044] The present invention comprises four main modules: a 2D-3D equirectangular projection module, a view selection module, a 3D-2D equirectangular projection and point-to-point occlusion processing module, and a 2D-3D equirectangular projection and angle mask filtering module, as shown in the accompanying drawings.
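As an informal illustration of how the four modules of paragraph [0044] chain together (not the authoritative implementation), the following driver assumes that the functions sketched after claim 1 are collected in a hypothetical module pano_completion, that the camera extrinsic is a 4x4 camera-to-world matrix, that the RGB panorama has the same height and width as the depth panorama, and that the view v.sub.2 keeps the identity rotation of the world frame; it produces one (incomplete, target) point cloud pair.

```python
# Schematic glue for the four modules; all module and helper names here are
# illustrative assumptions, not taken from the patent.
import numpy as np
from pano_completion import (depth_panorama_to_points, select_next_view,
                             render_panorama)

def generate_pair(depth1, rgb1, extrinsic_v1):
    """One pipeline pass: returns (incomplete cloud P3, target cloud P1 points)."""
    # Module 1: 2D-3D equirectangular projection under view v1, then to world
    # coordinates (extrinsic assumed camera-to-world).
    pts_cam = depth_panorama_to_points(depth1)
    pts_world = pts_cam @ extrinsic_v1[:3, :3].T + extrinsic_v1[:3, 3]
    colors = rgb1.reshape(-1, 3)                     # per-point RGB, row-major order

    # Module 2: choose the occlusion-prediction view v2.
    v2_pos = select_next_view(pts_world, v1_pos=extrinsic_v1[:3, 3])

    # Module 3: re-render the cloud as panoramas under v2 (identity rotation,
    # translation to v2_pos), with 2x2 splatting and the depth test of step 3.3).
    pts_v2 = pts_world - v2_pos
    depth2, rgb2 = render_panorama(pts_v2, colors)

    # Module 4: lift the v2 panorama back to 3D; this re-projection introduces
    # the occlusion-induced incompleteness of point cloud P3.
    pts3_cam = depth_panorama_to_points(depth2.astype(np.float64) / 1000.0)
    valid = depth2.reshape(-1) < 65535               # drop pixels nothing projected to
    pts3_world = pts3_cam[valid] + v2_pos
    return pts3_world, pts_world
```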