METHOD AND DEVICE FOR GENERATING A PHOTOGRAMMETRIC CORRIDOR MAP FROM A SET OF IMAGES

Abstract

A method for generating a photogrammetric corridor map from a set of input images by recovering a respective pose of each image, wherein a pose includes a position and an orientation information of the underlying camera, including steps of: a) receiving a set of input images, b) defining a working set, c) initializing an image cluster, d) further growing the image cluster: d1) selecting one image from the working set that features overlap with at least one image already in the cluster, e) continuing with step b) if there remain images in the working set; if not, f) generating and providing as output the corridor map using the recovered camera poses.

Claims

1. A method for generating a photogrammetric corridor map from a set of input images by recovering a respective pose of each image, wherein a pose comprises a position and an orientation information of an underlying camera, the method comprising: a) receiving a set of input images acquired with a camera along a corridor flight path and a corresponding set of input camera positions, b) defining as a working set the subset of input images for which corresponding pose has not yet been recovered, c) initializing an image cluster by: c1) incrementally recovering pose for images from the working set until pose for at least three images has been recovered and such that not all recovered camera positions are collinear using a method for classical incremental Structure from Motion pipeline, c2) computing a similarity transformation transforming the recovered camera positions to the corresponding input camera positions, and c3) applying the similarity transformation to the recovered camera poses in the image cluster, d) further growing the image cluster by: d1) selecting one image from the working set that features overlap with at least one image already in the cluster, d2) adding the image to the cluster by recovering, via camera resectioning, the pose of its underlying camera relative to the camera poses corresponding to the images already in the cluster, d3) when at least a predefined number of images have been added since the last invocation of the GNSS bundle adjustment algorithm, performing a GNSS bundle adjustment algorithm to refine the poses of the cluster, and d4) when there remain images in the working set that feature overlap with at least one image already in the cluster, continuing with step d1); otherwise, continuing with step e), e) when there remain images in the working set, continuing with step b); otherwise, continuing with step f), f) generating and providing as output the corridor map using the recovered camera poses.

2. The method according to claim 1, wherein the predefined number of images is at least 3.

3. The method according to claim 1, wherein the corridor map is an orthophoto, a 2.5D elevation map, or a contour map.

4. The method according to claim 1, wherein the set of input images is acquired with a camera sensible in an optical visible or an IR spectrum.

5. A device for generating a photogrammetric corridor map from a set of input images by recovering a respective pose of each image, wherein a pose comprises a position and an orientation information of an underlying camera, comprising a computing unit, and a memory, wherein the device is configured to receive and store the set of input images in the memory, which is captured along a corridor flight path and which includes respective position information about the place of capturing the respective image from the set of input images, wherein the device is further configured to perform the method according to claim 1, and wherein the device is configured to provide the corridor map from the memory.

6. The method according to claim 2, wherein the predefined number of images is at least 5.

7. The method according to claim 2, wherein the predefined number of images is at least 10.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0033] The invention will be explained in more detail with reference to an exemplary embodiment shown in the accompanying drawings. The drawings show:

[0034] FIG. 1 an illustration of ground surface footprints in principle,

[0035] FIG. 2 an example of a naive approach according to prior art,

[0036] FIGS. 3-5 an embodiment of the method according to the invention,

[0037] FIG. 6 an embodiment with a flowchart of the method according to the invention,

[0038] FIG. 7 an embodiment of the device according to the invention.

DETAILED DESCRIPTION OF INVENTION

[0039] The invention is not restricted to the specific embodiments described in detail herein, but encompasses all variants, combinations and modifications thereof that fall within the framework of the appended claims.

[0040] According to the invention the incremental SfM pipeline outlined above is a “multi-cluster” variant of incremental SfM, addressing both aforementioned challenges inherent to classical incremental SfM by making use of GNSS positions while relying neither on the presence of a costly high-accuracy INS nor on the availability of GCPs.

[0041] In the context of the present embodiment of the invention, “GNSS positions” refer to GNSS positions obtained using the real-time kinematic (RTK) technique or via post-processing, capable of yielding point measurement accuracy in the centimeter range.

[0042] In the present embodiment of the invention, such GNSS positions can additionally be used to ameliorate the quadratic complexity of naively carrying out sparse matching between all N(N−1)/2 possible pairs of the N input images.

[0043] In the context of the present embodiment of the invention, the displacement (i.e., “lever arm”) from a camera center to a phase center of the GNSS antenna has been considered.

[0044] In the context of the present embodiment of the invention, overlapping images are understood to be images that share overlap with respect to their respective ground footprints, i.e., the area on the ground seen from the viewpoint of the camera.

[0045] 1. Sparse Matching

[0046] Sparse matches are computed between pairs of input images using SIFT, aided by the widely employed “ratio test” to discard matches deemed spurious.

[0047] In order to alleviate the quadratic complexity of naively computing sparse matches between all N(N−1)/2 possible pairs of the N input images, a “pre-matching” step is first carried out in order to determine the subset of image pairs upon which to restrict attention. In contrast to methods intended for unordered collections of input images that borrow from image retrieval techniques by quantizing keypoint descriptors to a vocabulary of “visual words”, the present pre-matching approach makes no assumptions concerning image content, which in the context of corridor mapping risks being repetitive.

[0048] 1.1. Pre-matching

[0049] Pre-matching is carried out assuming a predominantly nadir acquisition scenario by estimating, for each image, the ground surface footprint of the corresponding camera and determining into which other cameras' image planes that footprint projects. Let $\mathcal{P} \subset \mathbb{R}^3$ denote the set of input GNSS positions, expressed in a local Earth-centered Earth-fixed (ECEF) coordinate frame. For each image $j$ for which a corresponding GNSS position $P_j \in \mathcal{P}$ is available, the camera center $C_j \in \mathbb{R}^3$ of the corresponding camera is taken to coincide with $P_j$. Two of the three degrees of freedom (DoF) of the rotational component of the approximated pose in SE(3) of that camera are resolved by assuming a vertical gravity vector; the remaining DoF is obtained by rotating the camera in-plane to point to the GNSS position associated with the next image with respect to time stamp. Flight direction is parameterizable with respect to either the x- or y-direction of the camera coordinate frame. Finally, elevation above ground is estimated using the Shuttle Radar Topography Mission (SRTM) elevation model. An illustration of such footprints and corresponding approximated camera poses is provided in FIG. 1.
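The in-plane rotation described above can be illustrated with a minimal sketch. It assumes that the GNSS positions have already been converted to a local east-north-up metric frame and are ordered by time stamp; the function name `approximate_headings` and the handling of the last image (reusing the previous heading) are assumptions made for illustration and are not taken from the description.

```python
import math

def approximate_headings(positions):
    """Return one in-plane heading (radians) per time-ordered position.

    positions: list of (east, north, up) tuples in a local metric frame
    (an assumption; the description works in a local ECEF frame).
    The heading of image i points toward the position of image i + 1.
    The last image reuses the heading of its predecessor (assumption).
    """
    if len(positions) < 2:
        raise ValueError("need at least two positions")
    headings = []
    for (e0, n0, _), (e1, n1, _) in zip(positions, positions[1:]):
        # Angle of the displacement to the next position, measured from east.
        headings.append(math.atan2(n1 - n0, e1 - e0))
    headings.append(headings[-1])
    return headings
```

Together with the vertical gravity assumption, such a heading fixes the remaining rotational degree of freedom of the approximated pose.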

[0050] FIG. 1 illustrates ground surface footprints in principle, spanned with respect to the approximated camera poses 223, 225 corresponding to a pair of GNSS positions 213, 215 out of the GNSS positions 211-215, with elevation above ground estimated using the SRTM elevation model. For each image, its footprint is projected to the image plane of all its pre-match candidates obtained using metric queries on a kd-tree in order to determine the subset of image pairs over which to subsequently carry out sparse matching.

[0051] Initial pre-match candidates for each image are obtained by means of metric queries on a kd-tree, considering only index pairs i<j in order to ensure that sparse matching is carried out only once per pair.

[0052] This list is subsequently culled using ground surface footprints obtained in the manner outlined above, by projecting a given footprint to each corresponding pre-match candidate camera's image plane and determining whether overlap with the image plane is present.
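The two-stage pre-matching described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: the ground is taken to be a flat plane, cameras are strictly nadir, a brute-force distance test stands in for the kd-tree radius query, and the camera dictionary keys (`x`, `y`, `height`, `heading`, `fov_x`, `fov_y`) as well as all function names are hypothetical.

```python
import math
from itertools import combinations

def footprint(cam):
    """Ground corners (x, y) of a nadir camera's footprint on a flat plane."""
    a = cam["height"] * math.tan(cam["fov_x"] / 2)
    b = cam["height"] * math.tan(cam["fov_y"] / 2)
    c, s = math.cos(cam["heading"]), math.sin(cam["heading"])
    return [(cam["x"] + c * u - s * v, cam["y"] + s * u + c * v)
            for u, v in ((-a, -b), (a, -b), (a, b), (-a, b))]

def project(cam, pts):
    """Project ground points into the normalized image plane of a nadir camera."""
    c, s = math.cos(cam["heading"]), math.sin(cam["heading"])
    out = []
    for x, y in pts:
        dx, dy = x - cam["x"], y - cam["y"]
        out.append(((c * dx + s * dy) / cam["height"],
                    (-s * dx + c * dy) / cam["height"]))
    return out

def convex_overlap(p, q):
    """Separating-axis test for two convex polygons given as vertex lists."""
    for poly in (p, q):
        for k in range(len(poly)):
            (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % len(poly)]
            nx, ny = y1 - y0, x0 - x1  # normal of the current edge
            pp = [nx * x + ny * y for x, y in p]
            qq = [nx * x + ny * y for x, y in q]
            if max(pp) < min(qq) or max(qq) < min(pp):
                return False  # a separating axis exists
    return True

def prematch(cams, radius):
    """Return pairs (i, j), i < j, whose footprints plausibly overlap."""
    pairs = []
    for i, j in combinations(range(len(cams)), 2):
        ci, cj = cams[i], cams[j]
        # Brute-force stand-in for a metric radius query on a kd-tree.
        if math.hypot(ci["x"] - cj["x"], ci["y"] - cj["y"]) > radius:
            continue
        tx, ty = math.tan(cj["fov_x"] / 2), math.tan(cj["fov_y"] / 2)
        image_rect = [(-tx, -ty), (tx, -ty), (tx, ty), (-tx, ty)]
        # Cull candidates whose footprint misses camera j's image plane.
        if convex_overlap(project(cj, footprint(ci)), image_rect):
            pairs.append((i, j))
    return pairs
```

Because the footprint size scales with the estimated height above ground, a candidate that is rejected at low altitude may be accepted when the second camera flies higher, which is the behaviour a purely metric query cannot capture.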

[0053] Note that in contrast to reasoning solely in terms of time stamp or metric queries using kd-trees, the proposed method has the advantage of being able to handle stark variation in elevation above ground.

[0054] 1.2. Sparse Matching

[0055] The classical two-image sparse matching pipeline comprises [0056] (i) detecting, per image, a set of “keypoints”, [0057] (ii) computing, per keypoint, a “descriptor” (or “feature”) characterizing the keypoint, and [0058] (iii) matching, per pair of images determined to exhibit overlap in the above pre-matching step, keypoints with respect to their associated descriptors.

[0059] Turning to the Scale Invariant Feature Transform (SIFT) of Lowe to carry out sparse matching, SIFT combines a keypoint detector based on Difference of Gaussians (DoG) offering partial invariance to rotation, translation, and scale (i.e., to similarity transformations), with a keypoint descriptor that offers partial invariance not only to similarity transformations, but to illumination changes and noise as well. In order to reduce the number of false matches, Lowe's ratio test is employed for the matches from one image with respect to another, in that both the nearest and second-nearest match are extracted for each keypoint; if the distance to the nearest match is not sufficiently smaller than the distance to the second-nearest match (i.e., the ratio of the two distances exceeds a threshold), the match is deemed spurious and is discarded from further consideration.
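The ratio test can be illustrated with a short sketch operating on plain descriptor tuples. The threshold of 0.8 is an assumption (a commonly cited value, not one stated in this description), the brute-force nearest-neighbour search is a simplification, and the function name is hypothetical.

```python
import math

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Match descriptors of image A to image B, keeping unambiguous matches.

    desc_a, desc_b: lists of equal-length numeric tuples (descriptors).
    max_ratio: assumed threshold on nearest / second-nearest distance.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for ia, da in enumerate(desc_a):
        # Distances to all descriptors of B, nearest first.
        dists = sorted((math.dist(da, db), ib) for ib, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the test needs a second-nearest neighbour
        (d1, ib), (d2, _) = dists[0], dists[1]
        # Keep the match only if the nearest is clearly closer than the runner-up.
        if d1 < max_ratio * d2:
            matches.append((ia, ib))
    return matches
```

In repetitive corridor imagery, many keypoints have several near-identical candidates, so a sizeable share of tentative matches can be expected to fail this test.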

[0060] 2. Pose Recovery

[0061] The pose recovery stage aims to recover, for each input image, the absolute pose in SE(3) of the corresponding camera at the moment of acquisition, relative to a georeferenced coordinate frame. Unless provided and fixed, the focal length and principal point as well as two tangential and three radial distortion coefficients according to the distortion model of Brown can be estimated jointly for each set of images acquired using a common physical camera. These intrinsic parameters yield, per physical camera, a 3×3 camera calibration matrix and a 5-tuple of distortion coefficients. The pose recovery stage in turn gives the absolute pose of the respective camera of each image for which pose could successfully be recovered. In a first step, a match graph G based on sparse matches extracted between pre-matched pairs of images is constructed, where each node i represents an image and where the presence of an edge between nodes indicates the associated image pair is purported to exhibit overlap. Using the match graph G and the available GNSS positions as input, the “multi-cluster” variant of incremental SfM according to the invention proceeds to recover the respective camera poses for the input image collection with respect to a georeferenced coordinate frame. For at least every n images for which camera pose has been newly recovered, the present variant of SfM refines the reconstruction using a variant of bundle adjustment (“GNSS-BA”). In addition to image residuals minimized by traditional BA, GNSS-BA minimizes position residuals computed, respectively, as a function of reconstructed camera center and corresponding GNSS position.

[0062] 2.1. Match Graph

[0063] The match graph G is constructed such that each image i is associated with a node i, and each pair of matching images with an edge (i, j). In order to construct the match graph, the five-point algorithm is used within a RANSAC loop with the aim of estimating, for each pair of pre-matched images, the corresponding essential matrix relating the two images. The pose in SE(3) of one camera relative to the camera coordinate frame of the other is estimated up to a scaling factor by decomposing the essential matrix in a manner taking into account the cheirality constraint. This relative pose is used in turn to carry out geometric verification on the sparse matches relating the image pair by filtering away outlier matches with respect to the epipolar constraint. Pre-matched image pairs thus associated with at least some fixed minimal number of geometrically verified matches are deemed “matching”. Associated with each edge of G are thus the corresponding relative pose and the set of geometrically verified sparse matches.
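The thresholding step that turns verified match counts into graph edges can be sketched as below, together with a connected-component search that shows which images can be chained to one another through relative poses. Essential matrix estimation and geometric verification are not reproduced here; the input dictionary of per-pair verified match counts, the threshold value of 30, and both function names are assumptions for illustration.

```python
from collections import defaultdict

def build_match_graph(verified_counts, min_matches=30):
    """Build an undirected match graph from geometrically verified matches.

    verified_counts: dict mapping (i, j) with i < j to the number of matches
    that survived geometric verification (assumed to be computed elsewhere).
    min_matches: assumed minimal number of verified matches for an edge.
    """
    graph = defaultdict(set)
    for (i, j), count in verified_counts.items():
        if count >= min_matches:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def connected_components(graph, nodes):
    """Group nodes into connected components using depth-first search."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components
```

A weak pair in the middle of a corridor (few verified matches) splits the graph into separate components, which is the situation that motivates handling more than one cluster.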

[0064] 2.2 Multi-cluster SfM

[0065] A common approach to obtaining a georeferenced reconstruction from a collection of images without recourse to INS measurements or GCPs is in a first step to [0066] (i) apply a classical incremental SfM pipeline as outlined before to the input image collection. To georeference the resulting reconstruction, [0067] (ii) a 7 DoF similarity transformation (s, R, t), where (R, t)∈SE(3) and s denotes a nonzero scaling factor, transforming estimated camera centers to their corresponding GNSS positions is then computed and applied. A naive approach to overcoming accumulated drift would be to subsequently attempt refining the transformed reconstruction by [0068] (iii) carrying out a “GNSS-BA” variant of BA taking into account residuals computed as a function of transformed camera centers and corresponding GNSS positions, in addition to classical image residuals (FIG. 2).

[0069] FIG. 2 shows an example of a naive approach.

[0070] A set 100 of recovered relative camera poses is depicted, comprising recovered relative camera poses 101-107, each shown with a camera center (i.e., recovered camera position) and an image plane (thus illustrating the recovered orientation).

[0071] The set 100 is related to a set 110 of GNSS positions, comprising GNSS positions 111-117, by a transformation function 200, representing a similarity transformation (s, R, t).

[0072] The camera poses for image collection are recovered by means of a classical incremental SfM pipeline, georeferenced in a final step using the similarity transformation 200 (s, R, t) relating reconstructed camera centers 121-127 as a set 120 and GNSS positions 111-117 of set 110.

[0073] In other words, a set of recovered camera poses 120 with recovered camera poses 121-127 is georeferenced by estimating and applying a similarity transformation relating estimated camera positions to underlying GNSS positions.

[0074] Refinement of this transformed reconstruction can be attempted using GNSS-BA, taking into account position residuals in addition to classical image residuals. However, in the presence of enough accumulated drift, GNSS-BA will remain trapped in a local optimum and thus fail to correct for the drift. Moreover, it is only for images belonging to a single connected “cluster” (corresponding to GNSS positions colored black) that respective camera pose can be recovered.

[0075] What renders such an approach inherently naive is that in the presence of enough accumulated drift in its initialization, GNSS-BA, like any optimization technique based on iterative non-linear least squares, will fail to converge to the desired optimum. An additional disadvantage of the approach is that absolute pose can be recovered only with respect to a set of images corresponding to a connected subgraph of G, since relative pose between images belonging to different subgraphs cannot be determined. In this sense, the naive approach can recover pose only for what amounts to a single connected “cluster” of images and can thus fail to recover from the potential occurrence of weak sparse matches between pairs of overlapping frames.

[0076] The approach according to the invention proceeds in a variation on the manner outlined above, computing and applying a similarity transformation not on the output of a classical incremental SfM pipeline run over the entire input image set, but rather only on a minimal connected subset with respect to G. This minimal subset is selected in accordance with a “similarity check” intended to ensure that computation of a similarity transformation is possible; accordingly, in addition to requiring at least three images with recovered pose and associated GNSS positions, those GNSS positions are required to be non-collinear. Such an initializing “image cluster” is then [0077] (i) transformed using a similarity transformation relating camera centers to corresponding GNSS positions, and then [0078] (ii) grown as far as possible, undergoing refinement using GNSS-BA for at least every n newly added images. If a cluster can no longer be grown but there remain images for which pose has yet to be recovered, the attempt is made to [0079] (iii) initialize and grow a new cluster.
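The “similarity check” can be sketched as a test for at least three non-collinear GNSS positions. The tolerance `tol` (a minimum distance in meters from the common line), the use of `None` to mark images without a GNSS position, and the function name are assumptions for illustration; the description does not specify how collinearity is tested numerically.

```python
import math

def similarity_check(gnss_positions, tol=1e-3):
    """Return True if a similarity transformation can be estimated.

    gnss_positions: one (x, y, z) tuple per image with recovered pose;
    None marks an image without a GNSS position (excluded from the check).
    tol: assumed minimum distance in meters from the common line.
    """
    pts = [p for p in gnss_positions if p is not None]
    if len(pts) < 3:
        return False
    origin = pts[0]
    # Use the point farthest from the first one to define a stable axis.
    far = max(pts[1:], key=lambda p: math.dist(p, origin))
    axis = [a - b for a, b in zip(far, origin)]
    axis_len = math.hypot(*axis)
    if axis_len <= tol:
        return False  # all positions (nearly) coincide
    for p in pts[1:]:
        d = [a - b for a, b in zip(p, origin)]
        cross = (d[1] * axis[2] - d[2] * axis[1],
                 d[2] * axis[0] - d[0] * axis[2],
                 d[0] * axis[1] - d[1] * axis[0])
        # |d x axis| / |axis| is the distance of p from the line.
        if math.hypot(*cross) / axis_len > tol:
            return True
    return False
```

For a straight corridor segment flown at constant height, the check may only pass once the flight path bends or the platform deviates laterally, so the initializing cluster can contain more than three images.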

[0080] Proceeding accordingly thus serves not only to ensure that drift of the sort possible using the above naive approach is not permitted to accumulate prior to carrying out refinement using GNSS-BA, but also to enable recovery from failure to compute a single connected reconstruction.

[0081] FIG. 3 to FIG. 5 show an example for a multi-cluster SfM.

[0082] FIG. 3 shows the cluster initialization over a minimal connected subset of images using classical incremental SfM, yielding recovered camera poses 133-136, subsequently transformed using a similarity transformation relating reconstructed camera centers and GNSS positions.

[0083] In other words, the minimal, initializing set 130 of recovered camera poses 133-136 is georeferenced by estimating and applying a similarity transformation relating estimated camera positions to underlying GNSS positions.

[0084] Next, the cluster is grown according to FIG. 4 by recovering pose for additional images via spatial resection, as depicted by the directions 210, 211 of appending images. After at least n images (here with recovered camera poses 131, 132 and 137) have been newly added, the cluster is in turn refined using GNSS-BA (FIG. 5), yielding refined camera poses 141-147 of the refined camera pose set 140. This process is repeated until no more images can be added to the cluster. If images remain for which pose is yet to be determined, the attempt is made to initialize a new cluster, likewise grown as far as possible.

[0085] As already said, the growing 210, 211 of the cluster outwards by adding images that overlap with images already present in the cluster is depicted in the figure, wherein the recovered camera poses 131, 132 and 137 are added.

[0086] In other words, the set of recovered camera poses 140 with recovered camera poses 141-147 is georeferenced by estimating and applying a similarity transformation relating estimated camera positions to underlying GNSS positions and refined using GNSS-BA (note the consideration of the “lever arm”).

[0087] Note that pose can also be recovered for images acquired during GNSS outages; such images, however, are excluded from consideration in the similarity check, and do not contribute respective position residuals to GNSS-BA.

[0088] 2.3. GNSS-BA

[0089] Bundle adjustment (BA) serves to refine existing camera poses by minimizing an objective function of the form

[00001] $\sum_i \rho_i\left(\lVert \epsilon_i^{\mathrm{im}} \rVert^2\right)$

[0090] where i iterates over the set of all image residuals $\epsilon_i^{\mathrm{im}}$, each giving the distance in pixels between the ith tie point and the projection of its triangulated counterpart, and where the functions $\rho_i$ serve to dampen the influence of outlier residuals. For instance, each $\rho_i$ is set to the same Huber loss. The objective function that “GNSS-BA” is proposed to minimize is

[00002] $\sum_i \rho_i\left(\lVert \epsilon_i^{\mathrm{im}} \rVert^2\right) + \lambda \sum_j N_j \, \rho_j\left(\lVert \epsilon_j^{\mathrm{pos}} \rVert^2\right)$

[0091] where j iterates over the subset of cameras for which a GNSS position $P_j \in \mathcal{P}$ was provided, with $\mathcal{P}$ the set of input GNSS positions expressed in a local ECEF Euclidean coordinate frame. While the image residuals $\epsilon_i^{\mathrm{im}}$ are computed as in the first objective function above, the position residuals $\epsilon_j^{\mathrm{pos}}$ are given in meters by

[00003] $\epsilon_j^{\mathrm{pos}} = (C_j + R_j v) - P_j$

[0092] where $C_j$ denotes the estimated camera center corresponding to camera j, and $R_j$ the rotational component of its estimated absolute pose. The vector $v$ denotes the offset (i.e., “lever arm”) from the camera center to the phase center of the GNSS antenna, expressed in the coordinate frame of the camera. Multiplication by the factor $N_j$ in the GNSS-BA objective function is intended to balance the relative influence of image and position residuals. In the present embodiment, $N_j$ is set to the number of image residuals associated with camera j. In an alternative embodiment, $N_j$ could be chosen in another manner. The non-negative parameter $\lambda \in \mathbb{R}^{+} \cup \{0\}$ provided by the user serves to weight the influence of position residuals relative to image residuals, with respect to their balanced representation. Minimization of the two objective functions above is carried out via the Ceres solver using an implementation of the Levenberg-Marquardt algorithm.
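To make the structure of the GNSS-BA objective concrete, the following sketch evaluates it for fixed poses; it does not perform the minimization and does not use the Ceres solver. Several points are assumptions: the Huber loss is written in the form applied to a squared residual norm, the scale parameters `delta_im` and `delta_pos` are hypothetical values, the rotation is taken to map camera-frame vectors into the georeferenced frame (as the position residual equation implies), and cameras without a GNSS position are marked with `None`.

```python
import math

def huber(s, delta):
    """Huber loss applied to a squared residual norm s, with scale delta."""
    if s <= delta * delta:
        return s
    return 2.0 * delta * math.sqrt(s) - delta * delta

def position_residual(C, R, v, P):
    """Position residual (C + R v) - P for one camera, in meters.

    C: camera center, R: 3x3 rotation as nested lists (row-major),
    v: lever arm in the camera frame, P: GNSS position.
    """
    Rv = [sum(R[r][k] * v[k] for k in range(3)) for r in range(3)]
    return [C[r] + Rv[r] - P[r] for r in range(3)]

def gnss_ba_objective(image_residuals, cameras, lever_arm, lam,
                      delta_im=1.0, delta_pos=0.05):
    """Evaluate the image term plus the weighted position term.

    image_residuals: dict camera id -> list of (dx, dy) residuals in pixels.
    cameras: dict camera id -> {"C": ..., "R": ..., "P": ... or None}.
    lam: user-provided weight of position residuals.
    delta_im, delta_pos: assumed Huber scales (pixels, meters).
    """
    total = 0.0
    for residuals in image_residuals.values():
        for dx, dy in residuals:
            total += huber(dx * dx + dy * dy, delta_im)
    for j, cam in cameras.items():
        if cam["P"] is None:
            continue  # no GNSS position: no position residual
        e = position_residual(cam["C"], cam["R"], lever_arm, cam["P"])
        n_j = len(image_residuals.get(j, []))  # balancing factor N_j
        total += lam * n_j * huber(sum(x * x for x in e), delta_pos)
    return total
```

Scaling each position term by the number of image residuals of the same camera keeps a camera with many tie points from drowning out its single position residual.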

[0093] 3. Output Corridor Map Generation

[0094] With the pose recovery stage completed, dense scene geometry can be recovered using a (multi-view) stereo algorithm. Next, using conventional techniques, the recovered scene geometry can be used to generate a georeferenced 2.5D digital surface model (DSM), which in turn can be textured with input images given respective recovered camera poses to obtain a corresponding georeferenced (true) orthophoto, in effect a map obtained by stitching the input images.

[0095] FIG. 6 shows a flowchart of an embodiment of the method according to the invention.

[0096] The method for generating a photogrammetric corridor map from a set of images by recovering a respective pose of each image, wherein a pose comprises a position and an orientation information of the underlying camera, comprises the following steps:

[0097] a) Receiving a set of input images acquired with a camera along a corridor flight path and a corresponding set of input camera positions,

[0098] b) Defining as a working set the subset of input images for which corresponding pose has not yet been recovered,

[0099] c) Initializing an image cluster: [0100] c1) Incrementally recovering pose for images from the working set, using a classical incremental Structure from Motion pipeline, until pose for at least three images has been recovered and such that not all recovered camera positions are collinear, [0101] c2) Computing a similarity transformation transforming the recovered camera positions to the corresponding input camera positions, [0102] c3) Applying the similarity transformation to the recovered camera poses in the image cluster,

[0103] d) Further growing the image cluster: [0104] d1) Selecting one image from the working set that features overlap with at least one image already in the cluster, [0105] d2) Adding the image to the cluster by recovering, via camera resectioning, the pose of its underlying camera relative to the camera poses corresponding to the images already in the cluster, [0106] d3) Performing a GNSS bundle adjustment algorithm to refine the poses of the cluster, if at least a predefined number of images have been added since the last invocation of the GNSS bundle adjustment algorithm, [0107] d4) Continuing with step d1), if there remain images in the working set that feature overlap with at least one image already in the cluster; if not, continuing with step e),

[0108] e) Continuing with step b) if there remain images in the working set; if not, continuing with step f),

[0109] f) Generating and providing as output the corridor map using the recovered camera poses.
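The control flow of steps b) to e) can be sketched on the match graph alone, with all geometric computations stubbed out. This is a hedged illustration, not the claimed method itself: pose recovery, the similarity transformation and GNSS-BA are reduced to bookkeeping, the callable `can_initialize` stands in for the stopping rule of step c1), and images whose cluster cannot be initialized are set aside in a `failed` set so that the loop terminates (a termination rule the description does not spell out). All names are hypothetical.

```python
def next_overlapping(graph, cluster, candidates):
    """Smallest-id candidate image that overlaps an image in the cluster."""
    neighbours = {nb for img in cluster for nb in graph[img]} & candidates
    return min(neighbours) if neighbours else None

def multi_cluster(graph, can_initialize, n=5):
    """Simulate cluster initialization and growth over a match graph.

    graph: dict image id -> set of overlapping image ids (every id is a key).
    can_initialize: callable(list of ids) -> bool, stand-in for step c1).
    n: number of newly added images that triggers GNSS-BA.
    Returns (clusters, number of GNSS-BA invocations, failed image ids).
    """
    working = set(graph)             # b) images without recovered pose
    failed = set()                   # images set aside (assumption)
    clusters, ba_calls = [], 0
    while working - failed:          # e) images remain
        # c1) grow a seed until the initialization check passes
        cluster = [min(working - failed)]
        pending = working - failed - set(cluster)
        while not can_initialize(cluster):
            img = next_overlapping(graph, cluster, pending)
            if img is None:
                break
            cluster.append(img)
            pending.discard(img)
        if not can_initialize(cluster):
            failed.update(cluster)
            continue
        working -= set(cluster)
        # c2), c3) the similarity transformation would be applied here
        added = 0
        while True:
            img = next_overlapping(graph, cluster, working - failed)  # d1)
            if img is None:
                break                # d4) nothing left to add
            cluster.append(img)      # d2) camera resectioning (stubbed)
            working.discard(img)
            added += 1
            if added >= n:           # d3) periodic GNSS-BA (stubbed)
                ba_calls += 1
                added = 0
        clusters.append(cluster)
    return clusters, ba_calls, sorted(failed)
```

With a match graph consisting of two chains and one isolated image, the sketch yields two clusters and reports the isolated image as not recoverable.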

[0110] Step a) is represented in FIG. 6 by “receiving the set of input images” 10.

[0111] Steps b) and c) with c1) . . . c3) are represented by “initializing cluster” 20.

[0112] Step d) is depicted by “build the cluster” 30, a procedure which, in the terms used above, comprises the following substeps: [0113] “try to add image to cluster” 31, [0114] “approve, whether cluster is grown” 32 with result “yes” 321 or “no” 322, [0115] “approve, whether at least n images have been added” 33 with result “yes” 331 or “no” 332, [0116] “refine cluster using GNSS-BA” 34

[0117] Step e) is represented in FIG. 6 by “determination whether more images are available” 40.

[0118] Step f) is represented in FIG. 6 by “provide corridor map” 50.

[0119] The predefined number of images is in this example 5.

[0120] A specific implementation of the method according to the invention may carry out the method steps in an order that differs from the order in which they are recited in the claims.

[0121] FIG. 7 shows an embodiment of the device according to the invention.

[0122] A device 3 generates a photogrammetric corridor map 1 from a set of input images 2 by recovering a respective pose of each image, wherein a pose comprises a position and an orientation information of the underlying camera.

[0123] The device 3 comprises a computing unit 4 and a memory 5. The device 3 is configured to receive and store the set of input images 2 in the memory 5, which is captured along a trajectory and which includes respective, not collinear position information about the place of capturing the respective image from the set of input images 2, and the device 3 is further configured to perform the method according to the invention, and to provide the corridor map 1 from the memory 5.

LIST OF REFERENCE NUMERALS

[0124] 1 output photogrammetric corridor map

[0125] 2 input set of images and associated GNSS positions

[0126] 3 device

[0127] 4 computing unit

[0128] 5 memory

[0129] 10 receive set of images

[0130] 20 initialize cluster

[0131] 30 build the cluster

[0132] 31 try to add image to cluster

[0133] 32 approve, whether cluster is grown

[0134] 33 approve, whether at least n images have been added

[0135] 34 refine cluster using GNSS-BA

[0136] 40 determination whether more images are available

[0137] 50 provide corridor map

[0138] 100 set of recovered relative camera poses (depicted as camera center and image plane)

[0139] 101-107 recovered relative camera poses

[0140] 110 set of GNSS positions

[0141] 111-117 GNSS positions

[0142] 120, 130 set of recovered camera poses

[0143] 121-127, 131-137 recovered camera poses

[0144] 140 set of recovered camera poses, refined using GNSS-BA

[0145] 141-147 recovered camera poses, refined using GNSS-BA

[0146] 200 application of estimated similarity transformation

[0147] 210, 211 depiction of growing the cluster outwards by adding images that overlap with images already present in the cluster

[0148] 321, 331, 401 yes

[0149] 322, 332, 402 no