Method for determining dimensions in an indoor scene from a single depth image
09761015 · 2017-09-12
Assignee
Inventors
- Yong Xiao (Ann Arbor, MI, US)
- Chen Feng (Ann Arbor, MI, US)
- Yuichi Taguchi (Arlington, MA, US)
- Vineet R. Kamat (Ann Arbor, MI)
CPC classification
G06F18/231
PHYSICS
G06V10/7625
PHYSICS
International classification
Abstract
A method determines dimensions in a scene by first acquiring a depth image of the scene with a sensor, and extracting planes from the depth image. Topological relationships of the planes are determined. The dimensions are determined based on the planes and the topological relationships. A quality of the dimensions is evaluated using a scene type; if the quality is sufficient, the dimensions are output, and otherwise a guidance to reposition the sensor is output.
Claims
1. A method for determining dimensions in a scene, comprising steps of: acquiring a single depth image of the scene acquired by a sensor; extracting planes from the single depth image; determining topological relationships of the planes; determining the dimensions based on the planes and the topological relationships; evaluating a quality of the dimensions of the planes acquired from the single depth image using a scene type, and if the quality is sufficient, then outputting the dimensions, and otherwise outputting a guidance to reposition the sensor, wherein the steps are performed in a processor.
2. The method of claim 1, wherein the depth image is combined with a red, green, and blue (RGB) image of the scene to form an RGB-depth image.
3. The method of claim 1, wherein the guidance is output to a user.
4. The method of claim 1, wherein the guidance is output to a robot on which the sensor is arranged.
5. The method of claim 1, wherein the extracting further comprises: segmenting pixels in the depth image into groups; representing the groups as nodes in a graph, wherein edges represent neighboring groups; and applying agglomerative hierarchical clustering to the graph to merge nodes on the same plane.
6. The method of claim 1, wherein the topological relationships include: parallel planes if normal vectors of two planes are parallel to each other; coplanar planes if two planes have identical parameters; intersecting planes if two planes are not parallel; and perpendicular planes if the normal vectors of two planes are perpendicular to each other.
7. The method of claim 1, further comprising: using a least squares procedure for extracting the planes.
8. The method of claim 1, wherein the scene type defines a predetermined shape.
9. The method of claim 8, wherein the predetermined shape includes a box shape and an opening shape.
10. The method of claim 9, wherein the box shape contains two sets of two parallel planes and the two sets are perpendicular to each other.
11. The method of claim 9, wherein the opening shape contains two coplanar planes and a plane that is perpendicular to the two coplanar planes.
12. A non-transitory computer-readable recording medium having stored therein steps of a method that causes a computer to execute a process for determining dimensions in a scene, the method comprising the steps of: acquiring a single depth image of the scene acquired by a sensor; extracting planes from the single depth image; determining topological relationships of the planes; determining the dimensions based on the planes and the topological relationships; evaluating a quality of the dimensions of the planes acquired from the single depth image using a scene type, and if the quality is sufficient, then outputting the dimensions, and otherwise outputting a guidance to reposition the sensor.
13. A system for determining dimensions in a scene including a processor in communication with a memory, the system comprising: a depth sensor configured to acquire a single depth image in the scene and transmit the single depth image with a color image, wherein the memory is configured to store steps of a method that causes the processor to execute a process for determining dimensions in a scene, wherein the process comprises steps of: acquiring the single depth image of the scene acquired by a sensor; extracting planes from the single depth image; determining topological relationships of the planes; determining the dimensions based on the planes and the topological relationships; evaluating a quality of the dimensions of the planes acquired from the single depth image using a scene type, and if the quality is sufficient, then outputting the dimensions, and otherwise outputting a guidance to reposition the sensor.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(5) As shown in
(6) In some embodiments, a Kinect™ for Xbox sensor is used as the depth sensor to obtain 3D point clouds of indoor scenes. Equipped with an infrared (IR) camera and a color (RGB) camera, Kinect is able to acquire a depth image and a color image of the scene. Therefore, in some, but not all embodiments, the depth image can be registered 104 with the color image by using sensor calibration to obtain an RGB-D image 101.
(7) Pre-processing 110 is applied to the depth image or the RGB-D image 101. The pre-processing includes extracting planar surfaces, and determining topological relationships of these planes 111. Based on the planes and their relationships, geometric analysis 120 is performed to determine the initial dimensions 121 of the scene.
(8) Using a scene type and the initial dimensional measurements, a quality 131 of the image and initial dimensions is evaluated 130. If the quality is sufficient 140, then the final dimensions 105 are output. Otherwise, guidance 141 is output to improve the quality of the data for obtaining better dimensions. For example, the guidance can indicate a better pose 142 for the sensor. The output can be to a user to manually reposition the sensor, or a robot to do so automatically.
(9) The steps of the method can be performed in a processor 100 connected to memory for storing the image and other data structures used by the method, and input/output interface by busses as known in the art. In essence, the method transforms a depth image of real-world objects, e.g., structures in an indoor scene, to dimensions of the objects.
(10) Indoor Scenes and Planar Surfaces
(11) Most indoor scenes are enclosed within planar surfaces. Based on this assumption, the geometric analysis is performed to obtain the dimensional information of specific infrastructures. To extract planar surfaces efficiently, a plane extraction procedure is applied to the depth image, see, e.g., Feng et al., "Fast plane extraction in organized point clouds using agglomerative hierarchical clustering," IEEE International Conference on Robotics and Automation (ICRA), pp. 6218-6225, 2014.
(12) The pixels in the depth image are segmented into groups that are used to construct a graph, where the groups are represented by nodes, and edges represent neighboring groups. Then, an agglomerative hierarchical clustering is performed on this graph to merge nodes on the same plane. The planes are refined by pixel-wise region growing.
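The graph-based merging of paragraph (12) can be illustrated with a minimal sketch. This is a simplified stand-in for the cited AHC procedure, not the patent's implementation: the block size, the normal and offset thresholds, and the greedy union of neighboring blocks are all illustrative assumptions, and the pixel-wise region-growing refinement is omitted.

```python
import numpy as np

def fit_plane(pts):
    """Least-squares plane through pts (N x 3); returns unit normal n and offset d
    with n.p + d = 0, via SVD of the mean-centered coordinates."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]                        # direction of least variance
    return n, -n @ c

# Synthetic organized cloud: left half lies on the plane z = 0, right half on x = 4.
H, W, B = 16, 16, 4                   # image size and block size (illustrative)
pts = np.zeros((H, W, 3))
u, v = np.meshgrid(np.arange(W), np.arange(H))
left = u < W // 2
pts[left] = np.stack([u[left] * 0.5, v[left] * 0.5, np.zeros(left.sum())], axis=1)
pts[~left] = np.stack([np.full((~left).sum(), 4.0), v[~left] * 0.5, u[~left] * 0.5], axis=1)

# Nodes: one plane fit per BxB pixel block; edges connect adjacent blocks.
blocks = {}
for i in range(0, H, B):
    for j in range(0, W, B):
        blocks[(i // B, j // B)] = fit_plane(pts[i:i+B, j:j+B].reshape(-1, 3))

# Greedy agglomerative merge: adjacent blocks join one cluster when their normals
# are nearly parallel and their offsets nearly equal (a stand-in for the
# mean-squared-error test of the actual AHC procedure).
labels = {k: k for k in blocks}
def find(k):
    while labels[k] != k:
        k = labels[k]
    return k
for (bi, bj) in blocks:
    for nb in [(bi + 1, bj), (bi, bj + 1)]:
        if nb in blocks:
            (n1, d1), (n2, d2) = blocks[(bi, bj)], blocks[nb]
            if abs(n1 @ n2) > 0.99 and abs(abs(d1) - abs(d2)) < 0.05:
                labels[find(nb)] = find((bi, bj))

num_planes = len({find(k) for k in blocks})   # two planes recovered here
```

On this toy scene the merge recovers exactly the two planar surfaces; real depth data would additionally need the noise-aware fit-error test and refinement steps of the cited paper.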
(13) If the color image is available along with the depth image, i.e., if the RGB-D image is used, then the color information can be used to further segment the planes. For example, the colors appearing in each plane can be clustered, and the plane is segmented according to the clusters.
(14) After all the planes are extracted from the depth image, based on the plane parameters, the topological relationships among these planes are estimated. Four types of topological plane relationship are defined as follows:
(15) parallel: if the normal vectors of two planes are parallel to each other, then the two planes are parallel planes;
(16) coplanar: if two planes have the same parameters, then the two planes are coplanar (and hence also parallel);
(17) intersecting: if two planes are not parallel to each other, then the two planes are intersecting planes; and
(18) perpendicular: if the normal vectors of two planes are perpendicular (orthogonal) to each other, then the two planes are perpendicular to each other.
(19) It should be noted that due to the uncertainty in sensor measurements, these relationships are determined approximately. For example, if the angle between the normal vectors of two planes is less than 5 degrees, then the planes are considered as parallel planes.
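The four relationships of paragraphs (15)-(18), with the angular tolerance of paragraph (19), can be sketched as a small classifier. The plane representation (unit normal n and offset d), the offset tolerance, and the ordering of the tests are illustrative assumptions, not values from the patent.

```python
import numpy as np

def classify(p1, p2, angle_tol_deg=5.0, dist_tol=0.02):
    """Approximate topological relationship between two planes, each given as
    (unit normal n, offset d) with n.x + d = 0. Tolerances absorb sensor noise."""
    (n1, d1), (n2, d2) = p1, p2
    cos = abs(np.dot(n1, n2))                       # |cos| handles normal sign flips
    if cos >= np.cos(np.radians(angle_tol_deg)):    # normals within 5 degrees
        if abs(abs(d1) - abs(d2)) < dist_tol:
            return "coplanar"                       # same parameters (up to sign)
        return "parallel"
    if cos <= np.sin(np.radians(angle_tol_deg)):    # normals within 5 deg of 90 deg
        return "perpendicular"
    return "intersecting"                           # not parallel, not right-angled

rel1 = classify((np.array([0, 0, 1.0]), 0.0), (np.array([0, 0, 1.0]), -2.5))
rel2 = classify((np.array([0, 0, 1.0]), 0.0), (np.array([1.0, 0, 0]), -4.0))
```

Note that perpendicular planes are also intersecting in the patent's taxonomy; the classifier above simply reports the more specific label first.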
(20) Geometric Analysis
(21) If all the measurements from the sensor were accurate, then the geometric dimension information could be determined directly from the geometric representations of the scene. However, the sensor is not perfect and the measurements have uncertainty. To obtain accurate dimensional information, a least squares procedure is used. For example, the distance between two parallel planes and the distance between the boundaries of coplanar planes are of interest, and a method for each of these two distance determinations is used to obtain accurate estimates.
(22) Distance Between Parallel Planes
(23) After extracting the planes, the plane parameters are estimated by the least squares procedure. A 3D plane equation is ax+by+cz+d=0, wherein a, b, c, and d are the plane parameters. If the measurements are given as A=[x, y, z, 1], where x, y, z are column vectors containing the X, Y, Z coordinates of all the 3D points assigned to this plane and 1 is a column vector of ones, and the plane parameters are P=[a, b, c, d]^T, then a linear system can be constructed as
AP=0. (1)
(24) To obtain the least squares estimation, one solution is to perform singular value decomposition (SVD) on the matrix A and then the plane parameters P are extracted from the results of SVD.
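The SVD solution of paragraph (24) can be sketched as follows. The sample plane, point count, and noise level are illustrative assumptions; the procedure itself (take the right singular vector of the smallest singular value of A) is as described.

```python
import numpy as np

# Sample noisy points near the plane 2x - y + 2z - 6 = 0 (unit normal (2,-1,2)/3).
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, (200, 2))
z = (6 - 2 * xy[:, 0] + xy[:, 1]) / 2 + rng.normal(0, 1e-3, 200)
A = np.column_stack([xy[:, 0], xy[:, 1], z, np.ones(200)])   # A = [x y z 1]

# Least-squares solution of A P = 0 subject to ||P|| = 1: the right singular
# vector associated with the smallest singular value of A.
_, _, vt = np.linalg.svd(A)
P = vt[-1]
P /= np.linalg.norm(P[:3])          # rescale so (a, b, c) is a unit normal

expected = np.array([2, -1, 2, -6]) / 3.0
if P @ expected < 0:
    P = -P                          # the sign of a null vector is arbitrary
```

After the unit-normal rescaling, P recovers (a, b, c, d) of the generating plane up to the noise level.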
(25) Because there are parallel plane sets, the plane parameter estimation can be made more accurate by using this prior information. Suppose Plane i and Plane j are parallel to each other, and the points assigned to these planes are represented by A_i and A_j, respectively. To enforce the parallel constraint, Plane i and Plane j share the same normal vector and are defined as
a_i x_i + b_i y_i + c_i z_i + d_i = 0
a_i x_j + b_i y_j + c_i z_j + d_j = 0. (2)
(26) Then, a linear system similar to Equation (1) can be constructed with
(27) A = [x_i y_i z_i 1 0; x_j y_j z_j 0 1], P = [a_i, b_i, c_i, d_i, d_j]^T, (3)
where 1 and 0 denote column vectors of ones and zeros of the appropriate lengths.
(28) Therefore, by utilizing the SVD, the plane parameters of parallel planes are determined using all the points on both of the planes.
(29) After the parallel plane parameters are obtained, the distance between parallel planes is determined directly based on the plane parameters. For example, the distance between Plane i and Plane j is
dist_ij = |d_i − d_j|. (4)
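Paragraphs (25)-(29) can be sketched together: stack both point sets into one system with a shared normal and separate offsets, solve by SVD, and read off Equation (4). The two sample walls and the noise level are illustrative assumptions; note that Equation (4) requires the shared normal to be normalized to unit length, which the rescaling step below enforces.

```python
import numpy as np

rng = np.random.default_rng(1)
# Points on two parallel walls z = 1 and z = 3.5 (shared normal (0, 0, 1)).
Ai = np.column_stack([rng.uniform(0, 4, (150, 2)),
                      np.full(150, 1.0) + rng.normal(0, 1e-3, 150)])
Aj = np.column_stack([rng.uniform(0, 4, (150, 2)),
                      np.full(150, 3.5) + rng.normal(0, 1e-3, 150)])

# Stacked system enforcing a shared normal (a, b, c) with separate offsets:
# rows [x_i y_i z_i 1 0] and [x_j y_j z_j 0 1], unknowns P = (a, b, c, d_i, d_j).
A = np.block([
    [Ai, np.ones((150, 1)), np.zeros((150, 1))],
    [Aj, np.zeros((150, 1)), np.ones((150, 1))],
])
_, _, vt = np.linalg.svd(A)
P = vt[-1] / np.linalg.norm(vt[-1][:3])     # unit shared normal
dist = abs(P[3] - P[4])                     # Equation (4): |d_i - d_j|
```

Here dist recovers the 2.5 m wall separation; using all points from both planes in one fit is what makes the joint estimate more accurate than two independent fits.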
(30) Distance Between Boundaries of Coplanar Planes
(31) The distance between boundaries of coplanar planes is required to estimate, e.g., a width of a door frame. In this context, the width is the distance between the boundaries of the left and right walls (two coplanar planes) of the door. To determine this width, boundary points of the door frame are extracted, and then two lines are fitted based on the boundary points. The distance between these two parallel lines is the width of the door frame.
(32) In order to automatically locate door frames, the topological relationships between the extracted planar surfaces are estimated based on the plane fitting results. After the coplanar planes are detected, all the coplanar planes are rotated into a common 2D space.
(33)
(34) Then, for each point in CP1 on the first plane, the nearest point among the other plane's boundary points CP2 is searched. After iterating over all the points on the first plane, the points in CP2 that were selected as nearest points, denoted BP2, are the door frame boundary points on the second plane. By repeating the process for the second plane, the door frame boundary points BP1 on the first plane are found. After the door frame boundary points BP1 and BP2 are detected, two lines are fitted to the two sets of boundary points, respectively, and the distance is estimated from the two lines.
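The mutual nearest-point search of paragraph (34) can be sketched on toy 2D data. Everything here is an illustrative assumption: the boundary points are already reduced to the frame-adjacent edges, the frame edges are near-vertical so each line is fitted as x = const, and a brute-force distance matrix stands in for a proper nearest-neighbor structure.

```python
import numpy as np

# 2D boundary points (after rotating the coplanar walls into a 2D space):
# left wall ends at x = 1.0, right wall starts at x = 1.9 -> a 0.9 m wide door.
ys = np.linspace(0, 2, 21)
CP1 = np.column_stack([np.full_like(ys, 1.0), ys])          # left wall's edge
CP2 = np.column_stack([np.full_like(ys, 1.9), ys + 0.03])   # right wall's edge

def nearest_hits(src, dst):
    """Indices in dst that are the nearest neighbor of some point in src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    return np.unique(d2.argmin(axis=1))

BP2 = CP2[nearest_hits(CP1, CP2)]   # door frame boundary points on the right wall
BP1 = CP1[nearest_hits(CP2, CP1)]   # ... and on the left wall

# Near-vertical frame edges: fit each line as x = const; the door width is the
# distance between the two fitted parallel lines.
width = abs(BP2[:, 0].mean() - BP1[:, 0].mean())
```

A general implementation would fit arbitrary 2D lines (e.g., by total least squares) rather than assuming vertical edges.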
(35) User Guidance
(36) Our user guidance system is based on prior knowledge of the scene of interest. The goal of the user guidance system is to indicate the quality of the current frame data in terms of obtaining the dimensional information from the scene. We define high-quality data as an image that includes sufficient points from the supporting planar surfaces of the infrastructure features of interest.
(37) The user guidance system evaluates the quality of the obtained data based on the characteristics of the sensor and the scene. To fully utilize the prior information, the user guidance system visualizes the topological relationships of the planar surfaces. We describe two general cases: a box shape and an opening.
(38) Box Shape
(39)
(40) To obtain the dimensions of this structure, i.e., the width and height of the hallway, all four planes should be acquired by the sensor. The user guidance is designed to make sure that the sensor acquires sufficient points from all four planar surfaces with high accuracy.
(41) The user guidance assumes that at least three planes are detected from the scene. This assumption is reasonable because if the sensor observes only two planar surfaces, then the sensor may not be able to obtain all four planes; this happens when the hallway is too high and it is impossible for the sensor to capture all four planes at once. If one planar surface is not obtained in the data, then the geometric analysis is performed based on the partial data. Based on the prior information of the scene and the captured data, the potential shape is reconstructed so as to guide the user.
(42) For example, if Plane D, i.e., the floor, is not detected from the data, then the height of the hallway is unknown, but the width of the hallway can still be determined based on the two walls. Since the ceiling and the two walls are detected, the intersection lines between the ceiling and the walls can be derived. Based on the prior information and the determined intersection lines, a potential height is estimated and the box shape (white lines) can be constructed as shown in
(43) Because the method detects that there are no points from Plane D, the system suggests that the user reposition the sensor to obtain points from Plane D, the floor. By following the guidance, the sensor is lowered, or tilted downward in orientation, and then an image,
(44) Apart from repositioning the sensor to acquire missing planes, the user guidance can also provide feedback on the quality of the measurements. For example, the uncertainty of the depth sensor usually increases as the distance between the scene and the sensor increases. Thus, if scene elements are far from the sensor, the points of those elements have high uncertainty, which affects the accuracy of the dimensional measurements.
(45) Therefore, when all four planes are detected from the data, for each plane, the distance between its centroid and the sensor is determined. If the distance to the sensor is larger than a threshold, e.g., 3.0 m, the user guidance system suggests that the user move the sensor closer to that plane so as to minimize the measurement uncertainty.
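The per-plane range check of paragraph (45) can be sketched directly; the 3.0 m threshold comes from the text, while the function name, the input format, and the sample planes are illustrative assumptions.

```python
import numpy as np

MAX_RANGE = 3.0   # beyond this range, depth uncertainty degrades the measurement

def guidance_for_planes(plane_points, sensor_pos=np.zeros(3)):
    """Per-plane advice: suggest moving closer to any plane whose centroid is
    farther than MAX_RANGE from the sensor position."""
    advice = []
    for name, pts in plane_points.items():
        dist = np.linalg.norm(pts.mean(axis=0) - sensor_pos)
        advice.append((name, "ok" if dist <= MAX_RANGE else "move closer"))
    return advice

advice = guidance_for_planes({
    "ceiling": np.array([[0, 1.5, 2.0], [1, 1.5, 2.5]]),   # centroid ~2.75 m away
    "floor":   np.array([[0, -1.5, 4.0], [1, -1.5, 4.5]]), # centroid ~4.5 m away
})
```

A real system would use the full set of points assigned to each extracted plane rather than the two-point stand-ins above.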
(46) Opening
(47) An opening structure is defined as an opening in a planar surface that is supported by another planar surface, e.g., the floor. We use a door frame, i.e., an opening in a wall, as an example. As shown in
(48) If Plane C, the floor, is not measured in the data, the system can still reconstruct the two solid lines in
(49) In addition, because a door is usually recessed into the wall, the wall might block the view of the sensor if the sensor's view direction is not perpendicular to the door. Therefore, the user guidance system also takes this into consideration, using the normal vector of the door surface for the evaluation. If the sensor's view direction is not perpendicular to the door surface, then the view direction is not parallel to the normal vector of the door surface. Therefore, the user guidance system is capable of offering feedback about adjusting the view direction of the sensor.
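The view-direction test of paragraph (49) can be sketched as an angle check between the sensor's view direction and the door normal. The 10-degree tolerance and the function name are illustrative assumptions; the patent only states that the view direction should be parallel to the door's normal vector.

```python
import numpy as np

def view_ok(view_dir, door_normal, tol_deg=10.0):
    """True when the sensor's view direction is within tol_deg of the door
    surface normal, i.e., the sensor looks at the door roughly head-on."""
    v = view_dir / np.linalg.norm(view_dir)
    n = door_normal / np.linalg.norm(door_normal)
    angle = np.degrees(np.arccos(np.clip(abs(v @ n), 0.0, 1.0)))
    return angle <= tol_deg

head_on = view_ok(np.array([0, 0, 1.0]), np.array([0, 0, -1.0]))   # 0 deg off
oblique = view_ok(np.array([1.0, 0, 1.0]), np.array([0, 0, -1.0])) # 45 deg off
```

When the check fails, the guidance would ask the user to rotate the sensor toward the door before remeasuring, so that the recessed frame is not occluded by the wall.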
(50) Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.