APPARATUS AND METHOD FOR ESTIMATING CAMERA ORIENTATION RELATIVE TO GROUND SURFACE
20220051430 · 2022-02-17
Inventors
- Wang Kit Wilson THONG (Hong Kong, HK)
- Jihui ZHANG (Hong Kong, HK)
- Man Wai KWAN (Hong Kong, HK)
- Yiu Man LAM (Hong Kong, HK)
- Cheng Hsiung LIU (Hong Kong, HK)
- Jiaxin YAN (Hong Kong, HK)
- Zhuobin LIN (Hong Kong, HK)
Cpc classification
G06N7/01
PHYSICS
G06T7/80
PHYSICS
G05D1/0088
PHYSICS
International classification
G05D1/00
PHYSICS
G06F17/11
PHYSICS
Abstract
An iterative multi-image camera orientation estimation comprising: capturing an image of a scene before the camera; detecting line segments in the scene; computing a maximum likelihood (ML) camera orientation by maximizing a likelihood objective by rotating the camera's X-Y-Z coordinate system such that it is optimally aligned with the line segments in at least two of the frontal, the lateral, and the vertical orthogonal directions; estimating a maximum a-posteriori (MAP) camera orientation that maximizes an a-posteriori objective such that the MAP camera orientation is an optimal value between the priori camera orientation and the ML camera orientation, and is closer to the one with the smaller uncertainty; and iterating the multi-image camera orientation estimation, with the priori camera orientation and its corresponding priori camera orientation uncertainty set to the computed MAP camera orientation and its corresponding uncertainty, respectively, until the uncertainty is lower than a threshold.
Claims
1. A method for estimating camera orientation of a camera relative to a ground, comprising: initializing a priori camera orientation and its corresponding priori camera orientation uncertainty with a best guess or random camera orientation of the camera; executing an iterative multi-image camera orientation estimation, comprising: capturing a new image or extracting a new video frame from a video of a scene before the camera; detecting one or more line segments in the scene in the image or video frame; classifying and grouping the line segments of the image or video frame into frontal, lateral, and vertical line segment groups; computing a maximum likelihood (ML) camera orientation by taking the camera's calibrated matrix and maximizing a likelihood objective by rotating an X-Y-Z coordinate system under the camera orientation such that it is optimally aligned with the line segments in at least two of the frontal, the lateral, and the vertical orthogonal directions; estimating a maximum a-posteriori (MAP) camera orientation that maximizes an a-posteriori objective such that the MAP camera orientation is an optimal value between the priori camera orientation and the ML camera orientation, and is closer to the one with the smaller uncertainty; comparing the MAP camera orientation uncertainty with a pre-defined MAP camera orientation uncertainty threshold; and if the MAP camera orientation uncertainty is higher than the MAP camera orientation uncertainty threshold, iterating the multi-image camera orientation estimation with the priori camera orientation and its corresponding priori camera orientation uncertainty set to the computed MAP camera orientation and its corresponding MAP camera orientation uncertainty, respectively; and if the MAP camera orientation uncertainty is equal to or lower than the MAP camera orientation uncertainty threshold, taking the MAP camera orientation corresponding to that MAP camera orientation uncertainty as the camera orientation estimation method result.
2. The method of claim 1, wherein the classification and grouping of the line segments of the image or video frame into the frontal, the lateral, and the vertical line segment groups comprises: projecting a three-dimensional (3D) x-axis infinity point, a 3D y-axis infinity point, and a 3D z-axis infinity point corresponding to an initial orientation of the camera onto the image or video frame to obtain a two-dimensional (2D) X-directional orthogonal vanishing point, a 2D Y-directional orthogonal vanishing point, and a 2D Z-directional orthogonal vanishing point, respectively, of the scene in the image or video frame; and classifying and grouping the line segments into a frontal line segment group, which contains line segments having the shortest perpendicular distances to the X-directional orthogonal vanishing point in comparison to the other vanishing points; a lateral line segment group, which contains line segments having the shortest perpendicular distances to the Y-directional orthogonal vanishing point in comparison to the other vanishing points; and a vertical line segment group, which contains line segments having the shortest perpendicular distances to the Z-directional orthogonal vanishing point in comparison to the other vanishing points; wherein the initial orientation of the camera is obtained from the camera's calibrated matrix, a best guess orientation, a randomly set orientation, or measurements using an orientation sensor.
3. The method of claim 1, wherein the MAP camera orientation estimation comprises: initializing a currently estimated camera orientation rotation matrix to a prior camera orientation rotation matrix of the prior camera orientation; initializing a currently estimated camera orientation uncertainty to a prior camera orientation uncertainty of the prior camera orientation; executing an iterative a-posteriori objective maximization comprising: computing an X-directional orthogonal vanishing point, a Y-directional orthogonal vanishing point, and a Z-directional orthogonal vanishing point of an X-Y-Z coordinate system under the currently estimated camera orientation; projecting the X-directional orthogonal vanishing point, the Y-directional orthogonal vanishing point, and the Z-directional orthogonal vanishing point onto the image or video frame; measuring a perpendicular distance between each line segment in the frontal line segment group and the X-directional orthogonal vanishing point, a perpendicular distance between each line segment in the lateral line segment group and the Y-directional orthogonal vanishing point, and a perpendicular distance between each line segment in the vertical line segment group and the Z-directional orthogonal vanishing point; computing, from the currently estimated camera orientation rotation matrix, the currently estimated camera orientation uncertainty, and the perpendicular distances, a camera rotation Euler-angle that maximizes the a-posteriori objective; updating the currently estimated camera orientation rotation matrix by perturbing it by the camera rotation Euler-angle; updating the currently estimated camera orientation uncertainty by setting it to a co-variance of the camera rotation Euler-angle; and if the camera rotation Euler-angle is higher than a pre-defined camera rotation threshold, iterating the a-posteriori objective maximization; if the camera rotation Euler-angle is equal to or lower than the camera rotation threshold, outputting the currently estimated camera orientation as the MAP camera orientation, and the currently estimated camera orientation uncertainty as the MAP camera orientation uncertainty.
4. The method of claim 3, wherein the computation of the camera rotation Euler-angle comprises: computing $\Phi_0$ from $R_0$ by solving $[\Phi_0]_\times = \ln R_0$; computing a precision of the priori camera orientation, $\Lambda_{\Phi_0}$ […];
$C = A + \Lambda_{\Phi_0}$ […];
$d = b + \Lambda_{\ldots}$ […].
5. The method of claim 1, further comprising computing a ground plane normal vector, $n$, of the scene before the camera by solving:
$n = R^* [0, 0, 1]^\tau$, where $R^*$ is a rotation matrix of the camera orientation estimation method result.
6. The method of claim 1, wherein the detection of one or more line segments in the scene in the image or video frame comprises: converting the image or video frame into a 2D array containing only zeros and ones using Canny edge detection; and detecting the line segments from the 2D array using statistical Hough transform.
7. A method for guiding a vehicle or a mobile robot having a front-facing camera, comprising: executing a method for estimating camera orientation of the front-facing camera of claim 1; and controlling motions of the vehicle by a remote processing server in response to the estimated camera orientation.
8. A remote processing server for estimating camera orientation of a camera of an autonomous guided vehicle (AGV) or a mobile robot, comprising: a processor in data communication with an AGV or a mobile robot; wherein the processor is configured to receive a video file or data stream from the AGV or the mobile robot and to execute the method for estimating camera orientation of claim 1 with respect to the front-facing camera of the AGV or the mobile robot.
9. An autonomous guided vehicle (AGV), comprising: a camera installed at a front side of the AGV body and configured to capture a scene before the AGV; a processor configured to receive a video file or data stream from the front-facing camera and to execute the method for estimating camera orientation of claim 1 with respect to the front-facing camera of the AGV.
10. A mobile robot, comprising: a camera installed at a front side of the mobile robot body and configured to capture a scene before the mobile robot; a processor configured to receive a video file or data stream from the front-facing camera and to execute the method for estimating camera orientation of claim 1 with respect to the front-facing camera of the mobile robot.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] Embodiments of the invention are described in more detail hereinafter with reference to the drawings, in which:
[0026] Gaussian probability density functions and an “a-posteriori” product function illustrating Bayes' theorem as used in a maximum a-posteriori method for estimating camera orientation according to an embodiment of the present invention;
DETAILED DESCRIPTION
[0029] In the following description, methods and apparatuses for estimating camera orientation relative to a ground plane by leveraging properties of orthogonal vanishing points, and the like, are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
[0030] In the present disclosure, 2D and 3D spatial geometry, such as points and lines as perceived by machine vision, is represented in projective space coordinates. Definitions of the mathematical notations used in the present disclosure are as follows:
[0031] A point p in the two-dimensional projective space $\mathbb{P}^2$ is represented as a three-vector $\vec{p} = (u, v, k)$, and its coordinates in the two-dimensional Euclidean space $\mathbb{R}^2$ are $(u/k, v/k)$;
[0032] A line l in $\mathbb{P}^2$ is represented as a three-vector $\vec{l} = (a, b, c)$, and its slope and y-intercept in $\mathbb{R}^2$ are $-a/b$ and $-c/b$, respectively;
[0033] A point p is on a line l in $\mathbb{P}^2$ if and only if $p^\tau l = 0$, because $p^\tau l = au + bv + ck = 0$ is the line equation;
[0034] $a^\tau$ represents the transpose of $a$, and $a^\tau b$ represents the dot product between two vectors $a$ and $b$;
[0035] A projective transformation $H$ in $\mathbb{P}^2$ is a 3×3 matrix. It transforms a point in $\mathbb{P}^2$ from $p$ to $p' = Hp$;
[0036] If $H$ in $\mathbb{P}^2$ transforms a point from $p$ to $p' = Hp$, it transforms a line from $l$ to $l' = H^{-\tau} l$;
[0037] $A^{-\tau}$ represents the transpose of matrix $A^{-1}$, and $A^{-1}$ represents the inverse of matrix $A$;
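The homogeneous-coordinate conventions of [0031] to [0037] can be checked numerically. The following minimal Python sketch is an illustration only, not part of the disclosed method; the helper names (`cross`, `to_euclidean`, `slope_intercept`, `on_line`) are hypothetical. It relies on the standard fact that the cross product of two points in $\mathbb{P}^2$ is the line through them, which is also how [0067] forms $l_i$ from its two end points.

```python
def cross(a, b):
    """Cross product of two 3-vectors; for two points in P^2 it is the line through them."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def to_euclidean(p):
    """(u, v, k) -> (u/k, v/k); undefined for points at infinity (k == 0)."""
    u, v, k = p
    return (u / k, v / k)


def slope_intercept(l):
    """Line (a, b, c), i.e. a*x + b*y + c = 0, as (slope, y-intercept); requires b != 0."""
    a, b, c = l
    return (-a / b, -c / b)


def on_line(p, l, tol=1e-9):
    """p lies on l iff p^T l = a*u + b*v + c*k = 0 (up to a numerical tolerance)."""
    return abs(sum(x * y for x, y in zip(p, l))) < tol


# Line through the Euclidean points (0, 1) and (2, 5), written with k = 1.
line = cross((0.0, 1.0, 1.0), (2.0, 5.0, 1.0))
print(line)                             # (-4.0, 2.0, -2.0)
print(slope_intercept(line))            # (2.0, 1.0), i.e. y = 2x + 1
print(on_line((2.0, 6.0, 2.0), line))   # True: (2, 6, 2) is the Euclidean point (1, 3)
print(to_euclidean((2.0, 6.0, 2.0)))    # (1.0, 3.0)
```

The homogeneous scale is arbitrary: (2, 6, 2) and (1, 3, 1) denote the same point, which is why incidence is tested with $p^\tau l = 0$ rather than by comparing coordinates.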
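[0038] A point in three-dimensional space $\mathbb{R}^3$ is $P = (X, Y, Z)$. Under a pinhole camera model, an image captured by a pinhole camera is modeled as a point $p = KP$ in the two-dimensional projective space $\mathbb{P}^2$, where $K$ is a projective transformation in $\mathbb{P}^2$.
[0039] $K$ is also known as the camera calibrated (or intrinsic) matrix, and it encodes the camera's focal length $f$ and principal point $(p_x, p_y)$ by
$K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}$,
such that a point $P = (X, Y, Z)$ in $\mathbb{R}^3$ is imaged as the point $p = KP = (fX + Zp_x,\ fY + Zp_y,\ Z)$, i.e. $(fX/Z + p_x,\ fY/Z + p_y)$ in $\mathbb{R}^2$; and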
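A small numeric check of the pinhole model of [0038] and [0039]. This Python sketch assumes zero skew and a single focal length $f$ (as in [0039]); the function names and the numbers are illustrative only.

```python
def intrinsic_matrix(f, px, py):
    """K with focal length f, principal point (px, py) and zero skew (assumed)."""
    return [[f, 0.0, px],
            [0.0, f, py],
            [0.0, 0.0, 1.0]]


def matvec(M, x):
    """3x3 matrix times 3-vector."""
    return tuple(sum(M[r][c] * x[c] for c in range(3)) for r in range(3))


def project(K, P):
    """Image a 3D point P = (X, Y, Z) as p = K P in P^2, then convert to R^2."""
    u, v, k = matvec(K, P)
    return (u / k, v / k)


K = intrinsic_matrix(f=800.0, px=320.0, py=240.0)
print(project(K, (0.5, -0.25, 2.0)))  # (520.0, 140.0) = (f X/Z + px, f Y/Z + py)
print(project(K, (0.0, 0.0, 5.0)))    # (320.0, 240.0): the optical axis hits the principal point
```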
[0040] A camera calibrated matrix K can be obtained by a manual calibration procedure.
[0041] Referring to
[0042] In practice, certain conditions encountered during the operation of AGVs 100 may cause computational problems that render the AGVs 100 unable to function. For example, as shown in
[0043] Further, as shown in
[0044] Referring to the flowchart depicted in
[0045] In step S10, the front-facing camera 120 of the AGV 100 captures the real-world scene in front of it, producing a video file/data stream that is transmitted to the remote processing server 150 via the wireless communication. The video file/data stream contains a plurality of video frames of consecutive images.
[0046] In step S20, the remote processing server 150 extracts a current video frame/image from the video file/data stream. The video frame/image is static and reflects the real-world scene (i.e. the left image in
[0047] In step S30, the remote processing server 150 detects line segments in the current video frame/image, such that line segments are generated on the video frame/image (i.e. the right image in
[0048] In step S40, the line segments detected in step S30 are classified and grouped into three orthogonal directions, for example, the X, Y, and Z directions. In one embodiment, the classification and grouping of the line segments comprises superimposing onto the current video frame/image a virtual cube having three orthogonal vanishing points in a random or best-guess 3D orientation. An orthogonal direction classifier classifies the line segments of the video frame/image and groups them by computing the perpendicular distance between each of the three orthogonal vanishing points of the virtual cube and each of the detected line segments, and assigning each line segment to the group corresponding to the shortest of its three perpendicular distances. The details of this embodiment of classification and grouping of the line segments are provided in U.S. patent application Ser. No. 16/992,088.
[0049] In another embodiment, the classification and grouping of the line segments comprises projecting the 3D x-axis, y-axis, and z-axis infinity points corresponding to an initial orientation of the camera onto the first image to obtain the respective three 2D orthogonal vanishing points in the X, Y, and Z directions. The initial orientation of the camera may be obtained from the camera's calibrated (or intrinsic) matrix, a best guess orientation, or a random orientation.
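Projecting the 3D infinity points amounts to multiplying the axis directions by $KR$, consistent with the residual $\epsilon_i = l_i^\tau K R P_i$ in [0065]. The Python sketch below is an illustration under two assumptions not stated in the text: $R$ maps world axes to camera axes, and the camera frame has Z along the optical axis with image y pointing down. The example orientation (a level camera looking along the frontal X direction) is hypothetical.

```python
def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(M, x):
    """3x3 matrix times 3-vector."""
    return [sum(M[r][c] * x[c] for c in range(3)) for r in range(3)]


def vanishing_points(K, R):
    """Image the 3D X, Y, Z infinity points (pure directions) as v = K R e.

    Returns homogeneous 2D points [v_x, v_y, v_z]; a third coordinate of 0 means
    the vanishing point lies at infinity in the image plane.
    """
    KR = matmul(K, R)
    axes = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
    return [matvec(KR, e) for e in axes]


K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
# Hypothetical level camera: world X (frontal) -> camera +z,
# world Y (lateral) -> camera -x, world Z (vertical, up) -> camera -y.
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
v_x, v_y, v_z = vanishing_points(K, R)
print(v_x)  # [320.0, 240.0, 1.0]: frontal lines meet at the principal point
```

With this level orientation, the lateral and vertical vanishing points have a zero third coordinate, i.e. they lie at infinity, so a perpendicular-distance classifier built on Euclidean image distances would need special handling for them.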
[0050] Then, an orthogonal direction classifier classifies the line segments of the first image and groups them into a frontal line segment group, which contains line segments having the shortest perpendicular distances to the X vanishing point in comparison to the other vanishing points; a lateral line segment group, which contains line segments having the shortest perpendicular distances to the Y vanishing point in comparison to the other vanishing points; and a vertical line segment group, which contains line segments having the shortest perpendicular distances to the Z vanishing point in comparison to the other vanishing points.
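The nearest-vanishing-point rule of [0050] can be sketched as follows. This is a minimal Python illustration, assuming all three vanishing points are finite (non-zero third coordinate); the function names and the example lines and vanishing points are hypothetical.

```python
import math


def point_line_distance(v, l):
    """Perpendicular image distance from homogeneous point v = (x, y, k) to line l = (a, b, c).

    Assumes v is a finite point (k != 0).
    """
    a, b, c = l
    x, y, k = v
    return abs(a * x + b * y + c * k) / (abs(k) * math.hypot(a, b))


def classify_segments(lines, v_x, v_y, v_z):
    """Assign each line to the group whose vanishing point is nearest."""
    names = ("frontal", "lateral", "vertical")
    groups = {name: [] for name in names}
    for l in lines:
        d = [point_line_distance(v, l) for v in (v_x, v_y, v_z)]
        groups[names[d.index(min(d))]].append(l)   # shortest of the three distances wins
    return groups


# Hypothetical finite vanishing points and three lines, one through each of them.
v_x, v_y, v_z = (320.0, 240.0, 1.0), (5000.0, 240.0, 1.0), (320.0, 8000.0, 1.0)
lines = [(240.0, -320.0, 0.0),       # through (0, 0) and v_x
         (40.0, -5000.0, 1.0e6),     # through (0, 200) and v_y
         (8000.0, -20.0, -2.4e6)]    # through (300, 0) and v_z
groups = classify_segments(lines, v_x, v_y, v_z)
print({name: len(g) for name, g in groups.items()})  # {'frontal': 1, 'lateral': 1, 'vertical': 1}
```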
[0051] In step S50, a maximum a-posteriori (MAP) camera orientation estimation is performed to obtain a MAP camera orientation by considering a priori camera orientation with its corresponding priori camera orientation uncertainty and a maximum likelihood (ML) camera orientation with its corresponding ML camera orientation uncertainty, wherein the ML camera orientation is computed by taking the camera's calibrated matrix and maximizing a likelihood objective by rotating the camera's 3D X-Y-Z coordinate system such that it is optimally aligned with the 2D line segments in at least two of the three orthogonal directions.
[0052] The MAP camera orientation estimation then maximizes an a-posteriori objective such that the MAP camera orientation equals an optimal value that lies between the ML camera orientation and the priori camera orientation and is closer to the one with the smaller uncertainty.
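The "between the two, and closer to the more certain one" behaviour follows from the product of two Gaussian densities, whose peak is the precision-weighted mean. The Python sketch below shows the one-dimensional case for a single angle; it is the textbook Gaussian product, offered as an illustration rather than as the exact expressions of sub-step III in [0066], and the numbers are made up. For three angles the same form holds with matrices: $\Sigma^* = (\Sigma_{\Phi_{ML}}^{-1} + \Sigma_{\Phi_0}^{-1})^{-1}$ and $\Phi^* = \Sigma^*(\Sigma_{\Phi_{ML}}^{-1}\Phi_{ML} + \Sigma_{\Phi_0}^{-1}\Phi_0)$.

```python
import math


def fuse_gaussians(mu_ml, var_ml, mu_prior, var_prior):
    """Peak and variance of the (renormalized) product of two 1D Gaussian densities."""
    precision = 1.0 / var_ml + 1.0 / var_prior     # precisions add
    mu = (mu_ml / var_ml + mu_prior / var_prior) / precision
    return mu, 1.0 / precision


def gaussian(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# ML estimate 10.0 deg (variance 4.0) versus prior 6.0 deg (variance 1.0).
mu, var = fuse_gaussians(10.0, 4.0, 6.0, 1.0)
print(mu, var)  # 6.8 0.8: between 6 and 10, nearer the more certain prior

# Brute-force check: the product of the two densities peaks at the same place.
xs = [i / 1000.0 for i in range(0, 15001)]
peak = max(xs, key=lambda x: gaussian(x, 10.0, 4.0) * gaussian(x, 6.0, 1.0))
print(peak)  # 6.8
```

Note that the fused variance (0.8) is smaller than either input variance, which is what lets the per-frame uncertainty shrink toward the threshold of step S60.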
[0053] In step S60, the MAP camera orientation uncertainty is compared with a MAP camera orientation uncertainty threshold value. If the MAP camera orientation uncertainty is higher than the threshold value, process steps S20 to S50 are repeated with a subsequent video frame/image of the video file/data stream, with the priori camera orientation and its corresponding priori camera orientation uncertainty set to the computed MAP camera orientation and its corresponding MAP camera orientation uncertainty, respectively. For the MAP camera orientation estimation on the first image, a best guess or random camera orientation and its corresponding camera orientation uncertainty are used as the priori camera orientation and the priori camera orientation uncertainty.
[0054] The iterations of process steps S20 to S50 continue with each subsequent video frame/image of the video file/data stream, computing an estimated MAP camera orientation and its corresponding MAP camera orientation uncertainty in each iteration, until the MAP camera orientation uncertainty is found to be equal to or below the pre-defined MAP camera orientation uncertainty threshold.
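The frame-by-frame control flow of steps S20 to S70 can be sketched as below. The per-frame estimator `estimate_map` and the scalar `uncertainty` function are hypothetical stand-ins (the text does not say how a co-variance is compared with a scalar threshold; its trace would be one option), and the toy usage replaces the camera orientation by a single angle fused with a noisy per-frame reading.

```python
def estimate_orientation(frames, estimate_map, prior, prior_cov, uncertainty, threshold):
    """Refine an orientation frame by frame until its uncertainty is small enough.

    estimate_map(frame, prior, prior_cov) -> (map_estimate, map_cov) stands in for
    steps S30 to S50; the MAP result of one frame becomes the prior of the next (S60).
    Returns (estimate, cov), or None if the frames run out first.
    """
    estimate, cov = prior, prior_cov        # best guess or random initial prior
    for frame in frames:                    # S20: next frame/image
        estimate, cov = estimate_map(frame, estimate, cov)
        if uncertainty(cov) <= threshold:   # S60: equal to or below the threshold
            return estimate, cov            # S70: accept the MAP estimate
    return None


# Toy stand-in: each "frame" is a noisy reading (variance 1.0) of a single angle.
def toy_map(reading, prior, prior_var, reading_var=1.0):
    var = 1.0 / (1.0 / prior_var + 1.0 / reading_var)
    return var * (prior / prior_var + reading / reading_var), var


result = estimate_orientation([2.0] * 10, toy_map, prior=0.0, prior_cov=100.0,
                              uncertainty=lambda v: v, threshold=0.3)
print(result)  # accepted after the 4th frame, with an estimate close to 2.0
```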
[0055] Finally, in step S70, the MAP camera orientation corresponding to the MAP camera orientation uncertainty found to be equal to or below the pre-defined MAP camera orientation uncertainty threshold is taken as the camera orientation estimation result. Also, a ground plane normal vector, $n$, of the scene before the camera is computed by solving:
$n = R^* [0, 0, 1]^\tau$, where $R^*$ is the resulting estimated camera orientation rotation matrix.
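Because $[0, 0, 1]^\tau$ selects a column, $n$ is simply the third column of $R^*$. The Python sketch below assumes that $R^*$ maps world axes to camera axes, so $n$ is the world vertical expressed in camera coordinates; the example matrix (a level camera looking along the frontal X axis, image y pointing down) is hypothetical.

```python
def ground_normal(R_star):
    """n = R* [0, 0, 1]^T, which is the third column of R*."""
    e_z = (0.0, 0.0, 1.0)
    return [sum(R_star[r][c] * e_z[c] for c in range(3)) for r in range(3)]


# Hypothetical level camera looking along the frontal X axis.
R_star = [[0.0, -1.0, 0.0],
          [0.0, 0.0, -1.0],
          [1.0, 0.0, 0.0]]
print(ground_normal(R_star))  # [0.0, -1.0, 0.0]: the normal points "up" in the image
```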
[0056] In accordance with one embodiment, the MAP camera orientation estimation is based on Bayes' theorem, which combines the priori camera orientation $R_0$ (the camera orientation estimation result of the last estimation iteration) and the ML camera orientation $R_{ML}$ of the current video frame or image to find an optimal camera orientation $R^*$ by maximizing an a-posteriori probability, which can be expressed by:
$\Pr(R) = \Pr(\text{current frame or image} \mid R) \times \Pr(R \mid \text{previous frame or image})$;
where $R$ is the camera orientation being estimated;
$\Pr(\text{current frame or image} \mid R)$ is the “likelihood” term, which is a Gaussian probability density function that attains its maximum value at $R = R_{ML}$, with its co-variance being the ML camera orientation uncertainty of the current video frame or image;
$\Pr(R \mid \text{previous frame or image})$ is the “prior” term, which is a Gaussian probability density function that attains its maximum value at $R = R_0$, with its co-variance being the priori camera orientation uncertainty; and
$\Pr(R)$ is the “a-posteriori” term, which is a product of the two Gaussian probability density functions and is thus proportional to a Gaussian probability density function whose maximum ($R = R^*$) lies between $R_{ML}$ and $R_0$, depending on the co-variances of the two Gaussian probability density functions (the ML camera orientation uncertainty and the priori camera orientation uncertainty). To further illustrate,
[0057] Referring to the flowchart depicted in
[0058] In step P10, the rotation matrix of the camera orientation being estimated, $R$, is first initialized to that of the priori camera orientation, $R_0$ (the camera orientation estimation result of the last camera orientation estimation on the last video frame/image); that is, $R = R_0$, where each of $R$ and $R_0$ is a 3×3 rotation matrix. Note that the camera orientation can also be expressed in Euler-angle representation, which is a vector of three elements, denoted as $\Phi$.
[0059] In step P20, the camera orientation uncertainty, which can be expressed by the co-variance matrix $\Sigma_\Phi$, is initialized to the priori camera orientation uncertainty, which can be expressed by the co-variance matrix $\Sigma_{\Phi_0}$, where $\Phi_0$ is an infinitesimal Euler-angle defined with respect to $R_0$ perturbing to $R$.
[0060] In step P30, the orthogonal vanishing points $v_x$, $v_y$, and $v_z$ in the X, Y, and Z directions, respectively, of an X-Y-Z coordinate system under the camera orientation given by the camera orientation rotation matrix $R$ are computed.
[0061] In step P40, the orthogonal vanishing points $v_x$, $v_y$, and $v_z$ are projected onto the current video frame/image; the perpendicular distance $\delta_{xi}$ from every line $l_{xi}$ in the frontal line segment group to $v_x$ is measured; the perpendicular distance $\delta_{yi}$ from every line $l_{yi}$ in the lateral line segment group to $v_y$ is measured; and the perpendicular distance $\delta_{zi}$ from every line $l_{zi}$ in the vertical line segment group to $v_z$ is measured;
and it is further defined that $\delta_i \in \{\delta_{xi}, \delta_{yi}, \delta_{zi}\}$, $l_i \in \{l_{xi}, l_{yi}, l_{zi}\}$, and $K$ is the camera's calibrated matrix.
[0062] To find the optimal camera orientation $R^*$ that maximizes the “a-posteriori” term $\Pr(\Phi \mid \sum \delta_i)$, the $\Phi_{ML}$ that maximizes the “likelihood” term $\Pr(\sum \delta_i \mid \Phi)$ is first computed by linearizing the total error term $\sum \delta_i$ at the current camera orientation over $\Phi$. Equivalently, the maximum of the “likelihood” term is found by solving $\partial J(\Phi)/\partial \Phi = 0$ for $\Phi$, where $\partial J(\Phi)/\partial \Phi$ is the linear rate of change of the total error $E(\Phi) = \sum \delta_i$ with respect to the camera orientation in the vicinity of the current $\Phi$. The uncertainty in $\Phi_{ML}$ is represented by the co-variance matrix $\Sigma_{\Phi_{ML}}$ […]
[0063] In step P50, the amount of rotation $\Delta\Phi_{MAP}$ is computed from $R$, $\Phi_0$, $\Sigma_{\Phi_0}$ […]
[0064] Sub-step I: compute $\Phi_0$ from $R_0$ by solving $[\Phi_0]_\times = \ln R_0$; and compute the precision of the priori camera orientation by pseudo-inverting the priori camera orientation uncertainty $\Sigma_{\Phi_0}$:
$\Lambda_{\Phi_0} = \Sigma_{\Phi_0}^{+}$
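Sub-step I's $[\Phi_0]_\times = \ln R_0$ can be evaluated in closed form. The Python sketch below uses the standard matrix-logarithm formula for rotations. It yields a rotation vector (axis times angle), which agrees with small Euler-angle perturbations only to first order, so treating it as the “Euler-angle” of the text is an assumption. The function name is hypothetical, and the formula becomes ill-conditioned for rotation angles near π.

```python
import math


def so3_log(R):
    """Return phi such that [phi]_x = ln R, for a rotation matrix R with angle in [0, pi)."""
    # The trace gives the rotation angle: tr(R) = 1 + 2 cos(theta).
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    # The skew-symmetric part gives the axis: R - R^T = 2 sin(theta) [axis]_x.
    w = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    if theta < 1e-9:
        return tuple(0.5 * x for x in w)    # small-angle limit: ln R ~ (R - R^T) / 2
    scale = theta / (2.0 * math.sin(theta))
    return tuple(scale * x for x in w)


# R_0: a rotation of 0.3 rad about the Z axis.
c, s = math.cos(0.3), math.sin(0.3)
R0 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
print(so3_log(R0))  # approximately (0.0, 0.0, 0.3)
```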
[0065] Sub-step II: compute $\Phi_{ML}$ such that the rate of change of $E(\Phi) = \sum_i \epsilon_i^2 / (J_i \Sigma_g J_i^\tau)$ is 0, i.e., $\partial J(\Phi)/\partial \Phi = 0$, where $\epsilon_i = l_i^\tau K R P_i$, by solving the following intermediate expressions, during which $\Sigma_{\Phi_{ML}}$ […]
$\Sigma_{\Phi_{ML}}$ […]
[0066] Sub-step III: compute the camera rotation $\Delta\Phi_{MAP}$ so as to maximize the “a-posteriori” term between $\Phi_{ML}$ and $\Phi_0$ by solving the following intermediate expressions, during which $\Sigma_{\Phi_{\ldots}}$ […]
$C = A + \Lambda_{\Phi_0}$ […]
$\Sigma_{\Delta\Phi_{MAP}}$ […]
where:
[0067] $l_i$ represents the line segment i, that is, $l_i = (p_i, q_i, 1) \times (u_i, v_i, 1)$, between two end points $(p_i, q_i)$ and $(u_i, v_i)$;
[0068] K is the camera's calibrated matrix; and
[0069] $\Sigma_g$ is a user-defined pixel noise co-variance at both ends of the line segment $l_i$.
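The per-segment term of $E(\Phi)$ in [0065] can be read as a first-order, noise-normalized algebraic error. The Python sketch below is one plausible reading, not the disclosed implementation. It assumes that $P_i$ is the 3D infinity point of the segment's group (so $KRP_i$ is that group's vanishing point, passed in as `w`), that $J_i$ is the gradient of $\epsilon_i$ with respect to the end-point pixel coordinates $(p_i, q_i, u_i, v_i)$, and that $\Sigma_g = \sigma^2 I_4$. All names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalized_residual(p, q, u, v, w, sigma=1.0):
    """eps_i^2 / (J_i Sigma_g J_i^T) for the segment (p, q)-(u, v) and vanishing point w.

    Assumptions: J_i = d eps_i / d(p, q, u, v) and Sigma_g = sigma^2 * I_4.
    """
    x1, x2 = (p, q, 1.0), (u, v, 1.0)
    eps = dot(cross(x1, x2), w)         # eps_i = l_i^T (K R P_i), with l_i = x1 x x2
    g1 = cross(x2, w)                   # d eps / d x1, since eps = x1 . (x2 x w)
    g2 = cross(w, x1)                   # d eps / d x2, since eps = x2 . (w x x1)
    J = (g1[0], g1[1], g2[0], g2[1])    # only the pixel coordinates carry noise
    return eps ** 2 / (sigma ** 2 * dot(J, J))


# Segment from (0, 0) to (320, 240) against two candidate vanishing points.
print(normalized_residual(0.0, 0.0, 320.0, 240.0, (640.0, 480.0, 1.0)))        # 0.0: on the line
print(normalized_residual(0.0, 0.0, 320.0, 240.0, (640.0, 400.0, 1.0)) > 0.0)  # True
```

Dividing by $J_i \Sigma_g J_i^\tau$ makes the term invariant to the arbitrary homogeneous scale of the vanishing point, which a raw $\epsilon_i^2$ is not.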
[0070] In step P60, the camera orientation X-Y-Z coordinate system is rotated by $\Delta\Phi_{MAP}$; that is, the current camera orientation rotation matrix $R$ is perturbed by $\Delta\Phi_{MAP}$, or updated by $R_{\Delta\Phi_{MAP}}$ […]:
$R \leftarrow R_{\Delta\Phi_{MAP}}$ […]
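Step P60's perturbation can be sketched with Rodrigues' formula, $R_{\Delta\Phi} = \exp([\Delta\Phi]_\times)$. In this Python sketch the update is applied as a left multiplication, $R \leftarrow R_{\Delta\Phi_{MAP}} R$; the multiplication order is not explicit in the text, so it is an assumption, as is treating $\Delta\Phi_{MAP}$ as a rotation vector. The convergence test of [0072] is included with a hypothetical threshold value.

```python
import math


def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(phi):
    """Rodrigues' formula: R = I + a [phi]_x + b [phi]_x^2, with theta = |phi|."""
    x, y, z = phi
    theta = math.sqrt(x * x + y * y + z * z)
    S = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]     # [phi]_x
    S2 = matmul(S, S)
    if theta < 1e-9:
        a, b = 1.0, 0.5                                 # limits of the coefficients as theta -> 0
    else:
        a, b = math.sin(theta) / theta, (1.0 - math.cos(theta)) / theta ** 2
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [[I[i][j] + a * S[i][j] + b * S2[i][j] for j in range(3)] for i in range(3)]


def apply_update(R, delta_phi, rotation_threshold=1e-6):
    """Step P60 (assumed left-multiplicative) plus the convergence flag of [0072]."""
    R_new = matmul(so3_exp(delta_phi), R)
    converged = math.sqrt(sum(d * d for d in delta_phi)) <= rotation_threshold
    return R_new, converged


# Two updates of 0.1 rad about Z, starting from the identity orientation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for _ in range(2):
    R, done = apply_update(R, (0.0, 0.0, 0.1))
print(round(R[0][0], 6), round(R[1][0], 6), done)  # 0.980067 0.198669 False
```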
[0071] In step P70, the camera orientation uncertainty is updated by the co-variance of $\Delta\Phi_{MAP}$, that is, $\Sigma_\Phi = \Sigma_{\Delta\Phi_{MAP}}$.
[0072] If $\|\Delta\Phi_{MAP}\|$ is very close to 0 or lower than a pre-defined camera rotation threshold, proceed to step P80; otherwise, repeat steps P30 to P70.
[0073] In step P80, the optimal camera orientation $R^*$ is taken to be the current camera orientation, that is, $R^* \leftarrow R$, and the estimated MAP camera orientation is the optimal camera orientation.
[0074] Although the above description of the present invention involves only ground-based AGVs, a person of ordinary skill in the art can readily adapt and apply the various embodiments of the present invention to other machine vision applications, such as aerial and marine-based drones, without undue experimentation or deviation from the spirit of the present invention.
[0075] The electronic embodiments disclosed herein may be implemented using computing devices, computer processors, or electronic circuitries including but not limited to application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present disclosure. Computer instructions or software codes running in the computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.
[0076] All or portions of the electronic embodiments may be executed in one or more computing devices including server computers, personal computers, laptop computers, mobile computing devices such as smartphones and tablet computers.
[0077] The electronic embodiments include computer storage media having computer instructions or software codes stored therein which can be used to program computers or microprocessors to perform any of the processes of the present invention. The storage media can include, but are not limited to, floppy disks, optical discs, Blu-ray Disc, DVD, CD-ROMs, and magneto-optical disks, ROMs, RAMs, flash memory devices, or any type of media or devices suitable for storing instructions, codes, and/or data.
[0078] Various embodiments of the present invention also may be implemented in distributed computing environments and/or Cloud computing environments, wherein the whole or portions of machine instructions are executed in distributed fashion by one or more processing devices interconnected by a communication network, such as an intranet, Wide Area Network (WAN), Local Area Network (LAN), the Internet, and other forms of data transmission medium.
[0079] The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.
[0080] The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.