APPARATUS AND METHOD FOR ESTIMATING CAMERA ORIENTATION RELATIVE TO GROUND SURFACE

Abstract

A method for estimating camera orientation relative to a ground surface. Line segments are detected from an image captured by a camera. A first virtual cube having three orthogonal vanishing points with a random 3D orientation is superimposed on to the image. The line segments of the image are classified grouped into 3D-directional groups. A second virtual cube is superimposed on to the image with an initial 3D orientation. An optimal 3D orientation of the second virtual cube is computed by iteratively changing the 3D orientation of the second virtual cube and measuring perpendicular distances of the three orthogonal vanishing points to the three line segment groups in each iteration starting with the initial 3D orientation, wherein the optimal 3D orientation of the second virtual cube being one that provides shortest perpendicular distances. Co-variances of the orthogonal vanishing points of the second virtual cube at the optimal orientation are computed. Ground orientation is computed from the second virtual cube at the optimal orientation.

Claims

1. A method for estimating camera orientation of a front-facing camera, comprising: recording a first image of a scene before the front-facing camera; determining a ground plane on the first image, comprising: detecting a plurality of line segments in the scene in the first image; superimposing a first virtual cube having three orthogonal vanishing points on to the first image, wherein the first virtual cube being in a first three-dimensional (3D) orientation; classifying and grouping the line segments of the first image into a first, a second, and a third 3D-directional groups by comparing and identifying a shortest among perpendicular distances between each of the three orthogonal vanishing points of the first virtual cube and each of the detected line segments, respectively; superimposing a second virtual cube on to the first image, wherein the second virtual cube has three orthogonal vanishing points on the first image, wherein the second virtual cube being in an initial 3D orientation represented by an initial rotation matrix R.sub.0; computing an optimal 3D orientation of the second virtual cube with respect to the line segment groups by iteratively changing the 3D orientation of the second virtual cube and measuring perpendicular distances of the three orthogonal vanishing points to the three line segment groups in each iteration starting with the initial 3D orientation, wherein the optimal 3D orientation of the second virtual cube being one that provides shortest perpendicular distances; computing co-variances of the three orthogonal vanishing points of the second virtual cube at the optimal orientation; computing ground orientation from one of the three orthogonal vanishing points of the second virtual cube at the optimal orientation; and determining the ground plane on the first image according to the ground orientation and determining an estimation error in response to the co-variances; and estimating the camera orientation of the front-facing camera from the determined ground plane on the first image.

2. The method of claim 1, wherein the first 3D orientation of the first virtual cube is randomly selected.

3. The method of claim 1, wherein the first 3D orientation of the first virtual cube is a best-guess selection based on last known position and orientation of the front-facing camera.

4. The method of claim 1, further comprising: recording a second image of a scene before the front-facing camera following the first image; taking the optimal orientation of the second virtual cube as input to determine a ground plane on the second image; and estimating the camera orientation of the front-facing camera from the determined ground plane on the second image.

5. The method of claim 1, further comprising: repeatedly recording subsequent N number of images of a scene before the front-facing camera; for each of subsequent images, determining a ground plane on the first image, wherein the first image is the subsequent image; and selecting a most accurate ground plane among the determined ground planes of the subsequent images, wherein the most accurate ground plane corresponds to the computed ground orientation having the least estimation error in response to the co-variances; and estimating the camera orientation of the front-facing camera from the most accurate ground plane.

6. The method of claim 1, wherein the three orthogonal vanishing points of the first virtual cube are computed by using the pre-determined camera calibrated matrix; and wherein the classifying and grouping the detected line segments of the first image comprises: obtaining three reference values between the three orthogonal vanishing points of the first virtual cube and an object line segment, wherein the three reference values are related to vertical distances between the three orthogonal vanishing points of the first virtual cube and every line segment, respectively; and grouping the object line segment into one of the first, second, and third groups when only one of the reference values is below an acceptance threshold and the rest are above a rejection threshold.

7. The method of claim 1, wherein the three orthogonal vanishing points of the second virtual cube are computed by using the pre-determined camera calibrated matrix; and wherein computing the optimal orientation of the second virtual cube with respect to the grouped line segments comprises: applying an infinitesimal rotational perturbation to the initial rotation matrix R.sub.0 by using an Euler-angle ϕ, so as to derive a function of the Euler-angle ϕ; taking pixel noise level at both end of an object line segment as input to the function of the Euler-angle ϕ, so as to obtain a Sampson error distance formula expressing relationships between the object line segment and the three orthogonal vanishing points of the second virtual cube; computing a target Euler-angle ϕ.sup.* such that rate of change of the function of the Euler-angle ϕ with substituting the target Euler-angle ϕ.sup.* is 0; and using the target Euler-angle ϕ.sup.* to compute the optimal orientation of the second virtual cube.

8. The method of claim 7, wherein computing the optimal orientation of the second virtual cube with respect to the grouped line segments is an iterative algorithm such that an absolute of the target Euler-angle ϕ.sup.* converges to close to zero.

9. The method of claim 7, wherein computing the target Euler-angle ϕ.sup.* such that the rate of the change of the function of the Euler-angle ϕ with substituting the target Euler-angle ϕ.sup.* is 0 comprises: computing ϕ.sup.*=A.sup.+b, where $H_{i}^{τ} = {[P_{i}]}_{x} R_{0} K^{τ} l_{i};$ $\frac{\partial l_{t}}{\partial g} = [\begin{matrix} 0 & 1 & 0 & 1 \\ - 1 & 0 & 1 & 0 \\ q & - p & - v & u \end{matrix}];$ $J_{i} = P_{i}^{τ} R^{τ} K^{τ} \frac{\partial l_{i}}{\partial g};$ $w_{i} = J_{i} {.Math.}_{g} J_{i}^{τ};$ $A = {.Math.}_{i} H_{i}^{τ} w_{i} H_{i};$ $b = {.Math.}_{i} H_{i}^{τ} w_{i} ϵ_{i};$ l.sub.i=(p.sub.i, q.sub.i,1)×(u.sub.i, v.sub.i, 1) which is the object line segment between two end points (p.sub.i, q.sub.i) and (u.sub.i, v.sub.i); K is the camera calibrated matrix; R is set by using the initial rotation matrix R.sub.0 as input; $P_{i} = {\begin{matrix} {[0, 0, 1]}^{τ} & for l_{i} intersecting at v_{z} \\ {[0, 1, 0]}^{τ} & for l_{i} intersecting at v_{y}; \\ {[1, 0, 0]}^{τ} & for l_{i} intersecting at v_{x} \end{matrix}$ v.sub.z, v.sub.y, v.sub.x are the three orthogonal vanishing points of the second virtual cube; ∈.sub.i=l.sub.i.sup.τKRP.sub.i; and Σ.sub.g is the pixel noise level at the both end of the object line segment.

10. The method of claim 7, further comprising converting the target Euler-angle ϕ.sup.* to a rotation matrix to act on the second tube.

11. The method of claim 1, wherein computing the ground orientation comprises: taking a resulted rotation matrix R.sup.* representing the optimal orientation of the second virtual cube as input to compute a ground normal vector n of the ground orientation as n=R.sup.* [0,0,1].

12. The method of claim 1, wherein detecting the plurality of line segments from the first image comprises: converting the first image in RBG into a 2D array containing only zeros and ones, using Canny edge detection; and detecting the line segments from the 2D array using statistical Hough transform.

13. A method for guiding a vehicle having a front-facing camera, comprising: executing a method for estimating camera orientation of the front-facing camera of claim 1; and controlling motions of the vehicle by a remote processing server in response to the estimated camera orientation.

14. A remote processing server for estimating camera orientation of front-facing camera of an autonomous guided vehicle (AGV), comprising: a processor in data communication with an AGV; wherein the processor is configured to receive a video file or data stream from the AGV and to execute the method for estimating camera orientation of claim 1 with respect to the front-facing camera of the AGV.

15. An autonomous guided vehicle (AGV), comprising: a front-facing camera installed at a front side of the AGV body and configured to capture a scene before the AGV; a processor configured to receive a video file or data stream from the front-facing camera and to execute the method for estimating camera orientation of claim 1 with respect to the front-facing camera of the AGV.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] Embodiments of the invention are described in more details hereinafter with reference to the drawings, in which:

[0010] FIG. 1 depicts a schematic diagram of an exemplary embodiment of an autonomous guided vehicle (AGV) in communication with a remote processing server;

[0011] FIG. 2 shows a real-world scene image and what a machine-vision application sees/senses;

[0012] FIG. 3 shows another real-world scene image and what a machine-vision application see/senses;

[0013] FIG. 4 depicts a flowchart of a method for estimating camera orientation according to various embodiments of the present invention;

[0014] FIG. 5 shows a video frame/image with line segment indicators before and after line segment classification and grouping according to various embodiments of the present invention;

[0015] FIGS. 6A-6C illustrate a virtual cube having three vanishing points with rotating;

[0016] FIG. 7 illustrates superimposing a virtual cube on to an image/video frame;

[0017] FIGS. 8A and 8B show two exemplary results of the line segment classification and grouping;

[0018] FIG. 9 depicts a flowchart of a method for estimating orthogonal vanishing points according to various embodiments of the present invention;

[0019] FIG. 10 illustrates the computation of a perpendicular distance between an orthogonal vanishing point and a line segment according to various embodiments of the present invention;

[0020] FIG. 11 illustrates the ground plane in a range of estimation error in an image/video frame;

[0021] FIG. 12 illustrates an orthogonal vanishing point on a line segment;

[0022] FIG. 13 illustrates the relationships between three vanishing points and the X, Y, and Z 3D plane line segment groups; and

[0023] FIG. 14 depicts a schematic diagram of another exemplary embodiment of an AGV.

DETAILED DESCRIPTION

[0024] In the following description, methods and apparatuses for estimating camera orientation relative to a ground plane by leveraging properties of orthogonal vanishing points, and the likes are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.

[0025] In the present disclosure, 2D and 3D spatial geometry, such as points and lines as perceived by machine vision are represented in projective space coordinates. Definitions for mathematical notations in the present disclosure are listed as follows:

[0026] A point p in a two-dimensional projective space custom-character .sup.2 is represented as three-vector {right arrow over (p)}=(u, v, k), and its coordinate in a two-dimensional Euclidean space .sup.2 is

[00001] $(\frac{u}{k}, \frac{v}{k});$

[0027] A line l in custom-character .sup.2 is represented as three-vector {right arrow over (l)}=(a, b, c), and its slope and y-intercept in .sup.2 is respectively

[00002] $- \frac{a}{b} and - \frac{c}{b};$

[0028] A point p is on a line l in custom-character .sup.2 if and only if p.sup.τl=0 because au+bv+ck=0 which is a line equation;

[0029] a.sup.τ represents transpose of a, and a.sup.τb represents dot product between two vectors a and b.

[0030] Projective transformation H in custom-character .sup.2 is a 3×3 matrix. It transforms a point in .sup.2 from p to p′=Hp.

[0031] If H in custom-character .sup.2 transforms point from p to p′=Hp, it transforms line from l to l′=H.sup.−τl.

[0032] A.sup.−τ represents transpose of matrix A.sup.−1, and A.sup.−1 represents inverse of matrix A;

[0033] A point in three-dimensional custom-character .sup.3 is P=(X, Y, Z). Under a pinhole camera model, an image captured by a pinhole camera is modeled as a point p=KP in two-dimensional .sup.2, where K is a projective transformation in .sup.2.

[0034] K is also known as camera calibrated (or intrinsic) matrix, and it encodes camera's focal length f and principal point (p.sub.x, p.sub.y)

[00003] $K = [\begin{matrix} f & 0 & p_{x} \\ 0 & f & p_{y} \\ 0 & 0 & 1 \end{matrix}]$

such that point =(X, Y, Z) in custom-character .sup.3 is imaged as point

[00004] $p = (f \frac{X}{Z} + p_{x}, f \frac{Y}{Z} + p_{y}, 1) in ℙ^{2};$

and

[0035] A camera calibrated matrix K can be found by some manual calibration procedure.

[0036] Referring to FIG. 1. The AGV 100 includes a vehicle body 110, a front-facing camera 120, and a control and communication circuitry 130 disposed in the body 110. The body 110 has a moving mechanism such as wheels 112 to go forward along arrow. The front-facing camera 120 is installed at the front side of the body 110, so as to capture a scene in front of the AGV 100 and record it into a video file/data stream. The control and communication circuitry 130 is electrically coupled with the front-facing camera 120 and configured to receive the video file/data stream from the front-facing camera 120. The control and communication circuitry 130 is in communication with the remote processing server 150 via a wireless link for uploading the video file/data stream to and downloading instructions from the remote processing server 150. Herein, the instructions include control commands for AGV movements or actions, such as going straight, turning right/left, return to charging station, shutting down, etc. The remote processing server 150 is configured to execute the AGV's machine vision application in processing the video file/data stream. In one embodiment, a group of multiple AGVs 100 is arranged in a factory, warehouse, or distribution center and the remote processing server 150 is responsible to coordinate the group to perform automated large-scope operations such as transporting product parts in a production assembly and routing goods and packages for delivery.

[0037] In practical cases, during operation of AGVs 100, certain conditions encountered may result in computational problems that cause the AGVs 100 unable to function. For example, as shown in FIG. 2, the left image is a real-world scene image and the right image is what the machine vision application implemented in one of the AGVs 100 sees/senses, and the difference therebetween is that the right one shows line segments 200 detected from the real-world scene image. To estimate camera orientation, either a Z-plane vanishing point (VP) or a ground plane is typically necessary. However, in the right image, a Z-plane VP is very close to infinity because the Z-direction line segments are almost parallel with each other. Under such condition, it is difficult to apply Z-plane VP estimation to the front-facing camera.

[0038] Further, as shown in FIG. 3, the left image is a real-world scene image and the right image is what the machine vision application implemented in one of the AGVs 100 sees/senses, and a difference therebetween is the right one shows line segments 200 detected from the real-world scene image. It is noticed that no Z-direction line segment is detected from the real world scene image. In this regard, even though a ground plane in the real-world scene image is clearly visible for human sense, the ground plane cannot be estimated by the machine vision application due to no line segment going through a Z-plane VP. That is, it is almost impossible to estimate an object ground plane when there is no vertical structure in the real world scene image.

[0039] Referring to the flowchart depicted in FIG. 4, a method for estimating camera orientation is provided to address the issue of missing Z-direction line segment in accordance to one embodiment of the present invention. The method includes steps S10-S60, which may be adopted by the machine vision application implemented by specific configurations of the AVG 100 and/or the remote processing server 150 as shown in FIG. 1.

[0040] In the step S10, a video file/data stream is produced by the AVG 100's front-facing camera 120 in capturing a real world scene before it and transmitted to the remote processing server 150 via the wireless communication. The video file/data stream contains a plurality of video frames of continuous images.

[0041] In the step S20, one video frame/image is extracted from the video file/data stream by the remote processing server 150. The video frame/image is static and reflects the real-world scene (i.e. the left image in FIG. 2 or FIG. 3).

[0042] In the step S30, detection of line segments in the video frame/image is performed by the remote processing server 150, such that line segments are generated on the video frame/image (i.e. the right image in FIG. 2 or FIG. 3). In one embodiment, the generation of the line segments applies Canny edge detection and Statistical Hough transform. More specifically, the video frame/image in RGB setting is converted into a 2D array containing only zeros and ones and then Canny edge detection is applied to the 2D array. Thereafter, line segments are detected from the 2D array by using statistical Hough transform, so as to obtain the locations where the line segments are on the video frame/image.

[0043] In step S40, the line segments detected in the step S30 are classified and grouped into three orthogonal directions, for example, the X, Y and Z directions. FIG. 5 shows a video frame/image with line segment indicators before and after line segment classification and grouping. The left image shows detected many line segments 210 extending to many different directions before classification and grouping. The right shows only those line segments that are classified and grouped.

[0044] In the present disclosure, definition for X, Y, and Z directions is that the X, Y, Z directions are orthogonal in 3D space and satisfy: X.Math.Y=0; Y.Math.Z=0; and Z.Math.X=0. Further, the points at infinity in 3D space along X, Y, and Z directions are captured by a camera with K onto a 2D object image at image locations known as VPs and denoted by v.sub.x, v.sub.y, v.sub.z, hereafter. They are also “orthogonal with respect to ω” on an object image such that v.sub.z.sup.τωv.sub.y=0; v.sub.y.sup.τωv.sub.x=0; and v.sub.x.sup.τωv.sub.z=0, where ω=K.sup.−τK.sup.−1, which is known as “image of absolute conic”, and K refers to the aforementioned definition of the camera calibrated matrix.

[0045] At the beginning of the line segment classification, it is assumed a virtual cube with an initial orientation in 3D space as shown in FIG. 6A, in which the virtual cube defines three group of edges in X, Y, and Z directions, respectively. The initial orientation can be randomly selected or a best-guess orientation based on the last known position and orientation of the camera. Each pair of the edges of the virtual cube in each group is pointing to one of the orthogonal directions X, Y, and Z in 3D space, and as the edges extend along the X, Y, and Z directions, they converge at infinitely far and vanish at points v.sub.x, v.sub.y, v.sub.z in an object image, respectively. The virtual cube may rotate about any 3D axis . For example, FIG. 6B shows that the virtual cube rotates leftward, and FIG. 6C shows that it rotates downward.

[0046] Continuing with step S40, the virtual cube is superimposed onto the video frame/image with line segments detected as shown in FIG. 7. The virtual cube is in an initial orientation represented by an initial 3×3 rotation matrix R. In one embodiment, the initial orientation is randomly selected. In another embodiment, the initial orientation is selected based on the last known position and orientation of front-facing camera 120, meaning the initial orientation is selected in a pre-defined range by best guess. The initial orientation with the rotation matrix R yields three orthogonal VPs: v.sub.x, v.sub.y and v.sub.z. Perpendicular distances of each detected line segment to all three orthogonal VPs are measured on the object image. In theory, it can be hypothesized that by extending the detected line segment in 3D space, it eventually converges to one of VP v.sub.x, v.sub.y and v.sub.z in 2D image. As such the classification of the detected line segment depends on which of the VPs that it is expected to converge to.

[0047] As illustrated in FIGS. 8A and 8B, a detected line segment is labelled l. Perpendicular distances from line l to the VPs v.sub.x, v.sub.y and v.sub.z are labeled as δ.sub.X, δ.sub.Y, and δ.sub.Z, respectively. As shown in FIG. 8A, if line l is very close to v.sub.x and very far from both v.sub.y and v.sub.z, δ.sub.X is measured relatively very short and both δ.sub.Y and δ.sub.Z very long, meaning line l in the video frame/image is classified to be pointing toward the X direction in 3D space with high confidence, and consequently grouped into the X group. The same logic is applied in classifying and grouping other detected line segments into Y or Z groups. On the other hand, as shown in FIG. 8B, if line l approximate-equally close to more than one VPs (i.e. v.sub.x and v.sub.z), line l cannot be classified and becomes an out-liner. Lastly, if line l is far away from all three VPs, line l cannot be classified and also becomes an out-liner.

[0048] In accordance to one embodiment, the line segment classification is computed using the following algorithm. Hypothesizing a line l to converge to one of VPs v.sub.x, v.sub.y and v.sub.z, the computation of the Sampson error

[00005] $\frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{t}^{τ}}$

of the three hypotheses is equivalent to determining δ.sub.X, δ.sub.Y, and δ.sub.Z . Here, the computation of the Sampson error is based on defining distance δ.sub.i, where i is x, y, or z, in terms of the rotation matrix R as:

[00006] $distance δ_{i} = \frac{l_{i}^{τ} K R P_{i}}{J_{i} {.Math.}_{g} J_{i}^{τ}},$

where

[0049] l.sub.i=(p.sub.i, q.sub.i, 1)×(u.sub.i, v.sub.i, 1) which is a line segment having two end points (p.sub.i, q.sub.i) and (u.sub.i, v.sub.i);

[0050] K=3×3 camera calibrated matrix;

[0051] R=3×3 rotation matrix;

[00007] $P_{i} = {\begin{matrix} {[0, 0, 1]}^{τ} & for l_{i} intersecting at v_{z} \\ {[0, 1, 0]}^{τ} & for l_{i} intersecting at v_{y}; and \\ {[1, 0, 0]}^{τ} & for l_{i} intersecting at v_{x} \end{matrix}$

[0052] where the denominator J.sub.iΣ.sub.gJ.sub.i.sup.τ can be understood as pixel error of l.sub.i, meaning the pixel noise level at both end of the object line segment.

[0053] Then, with the conditions of ∈.sub.i=l.sub.i.sup.τKRP.sub.i, serving as scalar residual error, Σ.sub.g=4×4 co-variances matrix of (p, q, u, v), and J.sub.i=Jacobian of ∈.sub.i w.r.t. g=(p, q, u, v), Sampson error is computed and δ.sub.X, δ.sub.Y, and δ.sub.Z for each line segment are obtained. Using the Sampson error computation in the aforementioned first illustration as shown in FIG. 8A, since the Sampson error of v.sub.z-hypothesis is relatively very small, but that of both v.sub.y/v.sub.x-hypotheses are very large, line l is classified as a Z group.

[0054] Step S40 can also be described as a qualitative computation with input parameters including an initial rotation matrix R, line segments l.sub.i, camera calibrated matrix K, pixel noise co-variance Σ.sub.g, an acceptance level α, and a rejection level β, where α=0.05 and β=0.95≥α for example. The objective is to classify line segments l.sub.i into X, Y, or Z group.

[0055] The step S40 qualitative computation comprises the following steps:

[0056] Step I: for each l.sub.i, intermediate expressions as follows are computed:

[00008] $δ_{i}^{Z} = \frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{i}^{τ}} |_{P_{i} = {[0, 0, 1]}^{τ}} δ_{i}^{Y} = \frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{i}^{τ}} |_{P_{i} = {[0, 1, 0]}^{τ}} δ_{i}^{X} = \frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{i}^{τ}} |_{P_{i} = {[1, 0, 0]}^{τ}};$

[0057] Step II: costs are sorted such that δ.sub.i.sup.D.sup.1≤δ.sub.i.sup.D.sup.2≤δ.sub.i.sup.D.sup.3 where D.sub.k is either Z, Y or X; and

[0058] Step III: l.sub.i is classified into D.sub.1 directional group if:

δ.sub.i.sup.D.sup.1≤F.sub.1.sup.−1(α) and δ.sub.i.sup.D.sup.2>F.sub.1.sup.−1(β),

where F.sub.n, is cumulative chi squared χ.sup.2 distribution with n degree of freedom. The determination condition in step III serves as a Hysteresis window to avoid line segments that are likely determined to be pointing to multiple directions.

[0059] Furthermore, the initial random orientation of the virtual cube may not be within a range of proximately correct orientations, and if the initial orientation is entirely incorrect, the classification may fail entirely. As such, multiple trial-and-error runs with multiple random initial orientations and rotation matrix R are needed. In one embodiment, the initial orientation and rotation matrix R of the trial-and-error run that yield the lowest number of out-liners are selected for the correct line segment classification and grouping generated there within.

[0060] After the classification and grouping, the video frame/image having the properly classified and grouped line segments in X, Y, Z is obtained, as shown in the right image of FIG. 5 as an example. The next step is the step S50, which is to estimate orthogonal VPs.

[0061] Referring to FIG. 9. Step S50 comprises sub-steps P10-P70. In sub-step P10, a virtual cube with an initial orientation in 3D space that is represented by a 3×3 rotation matrix R (note that the rotation matrices in steps S40 and S50 can be same or different). Similar to the previous step S40, the initial orientation with rotation matrix R yields three orthogonal VPs v.sub.x, v.sub.y and v.sub.z. However, at least one difference between step S40 and step S50 is that since all line segments are already classified and grouped into X, Y, and Z groups, the virtual cube in the step S50 is first rotated to the best initial orientation (as shown in the right image of FIG. 5) such that v.sub.x is closest to all line segments in the X group; v.sub.y is closest to all line segments in the Y group; and v.sub.z is closest to all line segments in the Z group.

[0062] In sub-step P20, distances δ.sub.i between VPs v.sub.x, v.sub.y ,v.sub.z and every line segment in their respective groups are measured. For example, referring to FIG. 10, a distance δ.sub.i between v.sub.x and a line l.sub.i of the X group is measured, where the distance δ.sub.i is defined in terms of the rotation matrix R as aforementioned:

[00009] $distance δ_{i} = \frac{l_{i}^{τ} K R P_{i}}{J_{i} {.Math.}_{g} J_{i}^{τ}} .$

In the present step, linearized rotation matrix technique is further applied to approximating an infinitesimal rotational perturbation at the rotation matrix R as: R′=(1−[ϕ].sub.x)R, ϕ is a three-vector Euler-angle. Such technique achieves low-complexity linear matrix computation. Note that for any three-vector a=(a.sub.1, a.sub.2, a.sub.3), it has:

[00010] ${[a]}_{x} = [\begin{matrix} 0 & - a_{3} & a_{2} \\ a_{3} & 0 & - a_{1} \\ - a_{2} & a_{1} & 0 \end{matrix}] .$

Then, since scalar residual error ∈.sub.i=l.sub.i.sup.τKRP.sub.i is related to the rotation matrix R, R′ can be substituted into ∈.sub.i such that the total Sampson error is expressed in terms of ϕ, which yields the following expression.

[00011] $J (ϕ) = \underset{i}{.Math.} \frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{i}^{τ}} .$

[0063] In sub-step P20, the total Sampson error is computed as equivalent to computing the distance δ.sub.i between v.sub.x, v.sub.y, v.sub.z and every line segment in their respective group. A least square estimation (LSE) for the three orthogonal VPs v.sub.x, v.sub.y, v.sub.z is expressed as

[00012] $R^{*} = \underset{R}{\arg \min} {.Math.}_{i} \frac{ϵ_{i}^{2}}{J_{i} {.Math.}_{g} J_{i}^{τ}},$

and the output for the expression may include the optimal orthogonal VPs v.sub.z.sup.*, v.sub.y.sup.*, v.sub.z.sup.* and jointly as:

v.sub.z.sup.*=KR.sup.*[0,0,1].sup.τ;

v.sub.y.sup.*=KR.sup.*[0,1,0].sup.τ; and

v.sub.x.sup.*=KR.sup.*[1,0,0].sup.τ.

[0064] In sub-step P30, an optimal three-vector Euler-angle ϕ*, which is also referred to as the best rotation, is computed from the total Sampson error. Since the total Sampson error J(ϕ) is a function of ϕ, the minimum of J(ϕ) occurs at ∂J(ϕ)/∂ϕ=0 (i.e. in an orientation that rate of change of the total Sampson error is zero). As such, by solving the equation ∂J(ϕ)/∂ϕ=0, the optimal three-vector Euler-angle ϕ* (i.e. the best or optimal orientation) is determined.

[0065] In sub-step P40, the optimal three-vector Euler-angle ϕ* is converted to a 3×3 rotation matrix by e.sup.[ϕ]x, which is also referred to as a rotation matrix R″, where e.sup.A is an exponential function of matrix A. The rotation matrix R″ represents a new orientation. It should be noted that the yielded rotation matrix R″ is different from the aforementioned rotation matrices R and R′, and thus it is labelled by the different symbol.

[0066] In sub-step P50, an absolute value of the three-vector Euler-angle ϕ* (i.e. ||ϕ*∥) is checked whether it is very close to 0. If it is very close to 0, the computation proceeds to sub-step P60. Otherwise, the computation reiterates from sub-step P20, and the yielded rotation matrix R″ serves as input instead of the initial rotation matrix R generated in sub-step P10; and then a further 3×3 rotation matrix is obtained by executing sub-steps P20-P40 again.

[0067] In sub-step P60, co-variances of the yielded rotation matrix R″ are also computed, which is equivalent to uncertainty in the yielded rotation matrix R″ due to error of l.sub.i.

[0068] In sub-step P70, the VP v.sub.z.sup.* is computed by v.sub.z.sup.*=KR.sup.*[0,0,1].sup.τ. That is, the yielded rotation matrix R″ serves as input for the computation of the VP v.sub.z.sup.*. In one embodiment, if iterative executions are performed (i.e. when the execution of sub-step P50 results in reiterating from sub-step P20 repeatedly), a 3×3 rotation matrix R eventually obtained to compute the ground orientation is expected to have the least total error trace (Σ.sub.ϕ).

[0069] Step S50 can also be described as a qualitative computation with input parameters including an initial rotation matrix R.sub.0, three groups of line segments l.sub.i going through respectively v.sub.z, v.sub.y, v.sub.x, a camera calibrated matrix K, and pixel noise co-variance Σ.sub.g. The objective is to find R.sup.* such that Σ.sub.i∈.sub.i.sup.2/J.sub.iΣ.sub.gJ.sub.i.sup.τ is minimized and to find co-variance Σ.sub.ϕ of R.sup.* in terms of Euler angles linearized at R.sub.0.

[0070] The step S50 qualitative computation comprises the following steps:.

[0071] Step I: a parameter R is initialized by using an initial rotation matrix R.sub.0 as input (R←R.sub.0);

[0072] Step II: intermediate expressions as follows are computed:

[00013] $H_{i}^{τ} = {[P_{i}]}_{x} R_{0} K^{τ} l_{i};$ $\frac{\partial l_{i}}{\partial g} = [\begin{matrix} 0 & 1 & 0 & 1 \\ - 1 & 0 & 1 & 0 \\ q & - p & - v & u \end{matrix}];$ $J_{i} = P_{i}^{τ} R^{τ} K^{τ} \frac{\partial l_{i}}{\partial g};$ $w_{i} = J_{i} {.Math.}_{g} J_{i}^{τ};$ $A = {.Math.}_{i} H_{i}^{τ} w_{i} H_{i};$ $b = {.Math.}_{i} H_{i}^{τ} w_{i} ϵ_{i};$ $ϕ = A^{+} b, R_{ϕ} = R_{0} e^{{[ϕ]}_{x}} R_{0}^{τ};$

where A.sup.+ is pseudo inverse of A;

[0073] Step III: the parameter R is updated by using R.sub.ϕR as input (R←R.sub.ϕR);

[0074] Step IV: determine whether to proceed to the next step of the computation: if ∥ϕ∥ is very close to 0, it is determined to proceed to the next step; otherwise, it returns to the step II; and

[0075] Step V: a final-determination parameter R* is set by using the parameter R (R.sup.*←R), and co-variance Σ.sub.ϕ of R.sup.* is computed, in which Σ.sub.ϕ=A.sup.+.

[0076] In one embodiment, if ∥ϕ∥ is not very close to 0 or exceeds a preset threshold value, step I to step V are repeated by taking R* as input to set R in step I (R←R.sup.*) and reiterate the execution of the computation until ∥ϕ∥ is very close to 0.

[0077] Referring to FIG. 4 again. In step S60, with the obtained VP v.sub.z.sup.*, the ground normal n, which is a normal vector 90 degree to a ground plane, is computed by n=K.sup.−1v.sub.z.sup.*. In one embodiment, multiple consecutive images or video frames are used in finding the most accurate ground normal n′. In this case, Steps S10 to S60 are executed in loops each on a image or video frame following the last captured in a sequence of, e.g. 30 images or video frames, or images or video frames captured during a pre-determined time period. In one embodiment, the initial 3D orientation of the virtual cube used in step S30 is randomly selected and different for each image or video frame in each of the looped executions. After the looped executions are completed, in one embodiment of step S70, the most accurate ground normal n′ is determined by selecting the ground orientation having the least estimation error in response to the co-variances computed in step S50. In another embodiment, the most accurate ground normal n′ is the convergence of all the ground normals obtained in the looped executions. The most accurate ground normal n′ is then output to determine a ground plane in the video frame/image.

[0078] By the above processes, as the ground orientation is estimated from orthogonal VPs, camera orientation is correspondingly obtained. In the illustrative example above, none of the X, Y, and Z groups is empty. Nonetheless, embodiments of the present invention allow any one of the three groups to be empty (i.e. both X and Y groups contain at least one detected and classified line segment but the Z group is empty) because the computation in step S50 needs at least two of the three VPs being matched with the orientation of the virtual cube. Nevertheless, step S50 takes the advantage of having all three non-empty groups. It matches all three VPs to three groups and further reduces estimation error.

[0079] Referring to FIG. 12. A point v is on a line segment l if and only if l.sup.τv=0. Referring to FIG. 13. Assuming a group of line segments l.sub.i for all i (l.sub.x, l.sub.y, l.sub.z), the single common intersection point v.sup.* (either one of the v.sup.*.sub.x, v.sup.*.sub.y, v.sup.*.sub.z) closest to all line segments in the group in least square sense is:

[00014] $v^{*} = \underset{v}{\arg \min} {.Math.}_{i} .Math. l_{i}^{τ} v .Math.,$

where ∥a∥ is the norm of vector a.

[0080] Further, assuming three group of line segments l.sub.x.sub.i, l.sub.y.sub.j, l.sub.z.sub.k, their three respective least square intersection points v.sub.x.sup.*, v.sub.y.sup.*, v.sub.z.sup.* are:

[00015] $\begin{matrix} v_{x}^{*} = \underset{v_{x}}{\arg \min} \underset{i}{.Math.} .Math. l_{x_{i}}^{τ} v_{x} .Math., v_{y}^{*} = \underset{v_{y}}{\arg \min} \underset{j}{.Math.} .Math. l_{y_{i}}^{τ} v_{y} .Math., v_{z}^{*} = \underset{v_{z}}{\arg \min} \underset{k}{.Math.} .Math. l_{z_{k}}^{τ} v_{z} .Math. & (1) \end{matrix}$

To enforce the orthogonality among v.sub.x.sup.*, v.sub.y.sup.*, v.sub.z.sup.*, and avoid 0 trivial solutions, the minimization above are constrained by: v.sub.x.sup.*τωv.sub.y.sup.*=0, v.sub.y.sup.*τωv.sub.z.sup.*=0, v.sub.z.sup.*τωv.sub.x.sup.*=0, ∥v.sub.x.sup.*∥=1, ∥v.sub.y.sup.*∥=1, ∥v.sub.z.sup.*∥=1. In a case where i.e. group l.sub.z.sub.k is empty, it is still possible to compute v.sub.z.sup.*from v.sub.x.sup.*, v.sub.y.sup.* by the orthogonality constraint. Therefore, a ground plane is estimated where there is no e.g. vertical structure in a video frame/image. Moreover, v.sub.z.sup.* is estimated from all groups of lines l.sub.x.sub.i, l.sub.y.sub.j, l.sub.z.sub.k, instead of only from group l.sub.z.sub.k, such that error uncertainty in v.sub.z.sup.* is reduced further, which in turn solves the problem of difficult estimation of Z VP in a front-facing camera.

[0081] On the other hand, it is difficult to solve the quadratic least square minimization problem with six quadratic equality constraints such as those presented in above equation (1). In this regard, the present invention provides a solution, presented below, to circumvent the quadratic equality constraints by rotation.

[0082] Basis directions in custom-character .sup.3 are defined as X=[1,0,0], Y=[0,1,0] and Z=[0,0,1], and their VPs are imaged by a camera calibrated matrix K as:

q.sub.x=KX, q.sub.y=KY, q.sub.z=KZ (2).

Since v.sub.x.sup.*, v.sub.y.sup.*, v.sub.z.sup.* and q.sub.x, q.sub.y, q.sub.z are on the plane at infinity, there exists a projective transformation H.sup.* such that

v.sub.d.sup.*=H.sup.*q.sub.d for d being x, y and z (3).

A simple check reveals v.sub.x.sup.*, v.sub.y.sup.*, v.sub.z.sup.* are orthogonal on ω=K.sup.−τK.sup.−1. Because H.sup.* transforms points on the plane at infinity, the “infinite homography” property is applied herein, such that:

H.sup.*=KR.sup.*K.sup.−1, where R.sup.* is a rotation matrix (4).

Accordingly, by substituting equations (2), (3), and (4) into the equation (1), the least square intersection points problem for three line-segment groups l.sub.x.sub.i, l.sub.y.sub.j, l.sub.z.sub.k for all i, j, k becomes:

[00016] $R^{*} = (\underset{i}{.Math.} .Math. l_{x_{i}}^{τ} KRX .Math. + \underset{j}{.Math.} .Math. l_{y_{i}}^{τ} KRY .Math. + \underset{k}{.Math.} .Math. l_{z_{k}}^{τ} KRZ .Math.) .$

[0083] As such, the six quadratic equality constraints are eliminated and transformed to depend upon only one rotation matrix constraint. Furthermore, it explains the definition of distance δ.sub.i as being related to “l.sub.i.sup.τKRP.sub.i” and is effective to estimate camera orientation.

[0084] Therefore, by leveraging the property of orthogonal VPs, camera orientation is automatically estimated by analyzing sense structure. The core algorithm is properly established on a least square optimization approach, enabling the error uncertainty on camera orientation to be computed.

[0085] FIG. 14 depicts a schematic diagram of another exemplary embodiment of an AGV 300 according to applications of various embodiments of the present invention. The configuration of the present embodiment is similar or identical to that as afore-described, but with at least one difference in that the AGV 300 of the present embodiment further include local CPU 340 and GPU 350 for fully localized machine vision computation in lieu of relying on that by a remote processing server.

[0086] The control circuitry 330 is electrically coupled with the front-facing camera 320, the CPU 340, and the GPU 350. The control circuitry 330 is configured to transmit a video file recorded by the front-facing camera 320 to the CPU 340 and the GPU 350 for estimating camera orientation. The configuration of the present embodiment enables the AGV 300 for being a street vehicle or commercial/domestic robots.

[0087] Although the above description of the present invention involved only ground-based AGVs, an ordinarily skilled person in the art can readily adapt and apply the various embodiments of the present invention in other machine vision applications in e.g. aerial and marine-based drones without undue experimentation or deviation from the spirit of the present invention.

[0088] The electronic embodiments disclosed herein may be implemented using computing devices, computer processors, or electronic circuitries including but not limited to application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present disclosure.

[0089] Computer instructions or software codes running in the computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.

[0090] All or portions of the electronic embodiments may be executed in one or more computing devices including server computers, personal computers, laptop computers, mobile computing devices such as smartphones and tablet computers.

[0091] The electronic embodiments include computer storage media having computer instructions or software codes stored therein which can be used to program computers or microprocessors to perform any of the processes of the present invention. The storage media can include, but are not limited to, floppy disks, optical discs, Blu-ray Disc, DVD, CD-ROMs, and magneto-optical disks, ROMs, RAIVIs, flash memory devices, or any type of media or devices suitable for storing instructions, codes, and/or data.

[0092] Various embodiments of the present invention also may be implemented in distributed computing environments and/or Cloud computing environments, wherein the whole or portions of machine instructions are executed in distributed fashion by one or more processing devices interconnected by a communication network, such as an intranet, Wide Area Network (WAN), Local Area Network (LAN), the Internet, and other forms of data transmission medium.

[0093] The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.

[0094] The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.

APPARATUS AND METHOD FOR ESTIMATING CAMERA ORIENTATION RELATIVE TO GROUND SURFACE

Inventors

Cpc classification

Classification Explorer

G06T1/0014

PHYSICS

Classification Explorer

B60R1/24

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T2207/30252

PHYSICS

Classification Explorer

G06T2207/30244

PHYSICS

Classification Explorer

B60R1/00

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

B60R2300/402

PERFORMING OPERATIONS; TRANSPORTING

International classification

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

B60R1/00

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T1/00

PHYSICS

Abstract

Claims

Description