Mosaic oblique images and methods of making and using same
09805489 · 2017-10-31
Assignee
Inventors
- Stephen Schultz (West Henrietta, NY, US)
- Frank Giuffrida (Honeoye Falls, NY, US)
- Robert Gray (Canandaigua, NY, US)
Cpc classification
G06T3/4038
PHYSICS
International classification
G06T3/40
PHYSICS
Abstract
A computer system running image processing software receives identification of a geographical area for which an oblique-mosaic image is desired; assigns surface locations to pixels included in an oblique-mosaic pixel map of the geographical area encompassing multiple source images, the oblique-mosaic pixel map being part of a mathematical model of a virtual camera looking down at an oblique angle onto the geographical area; creates a ground elevation model of the ground and vertical structures within the oblique-mosaic pixel map using overlapping source images of the geographical area, wherein the source images were captured at an oblique angle and compass direction similar to the oblique angle and compass direction of the virtual camera; and reprojects, with the mathematical model, source oblique image pixels of the overlapping source images for pixels included in the oblique-mosaic pixel map using the ground elevation model to thereby create an oblique-mosaic image of the geographical area.
Claims
1. A system comprising: a computer system running image processing software that when executed by the computer system causes the computer system to: receive an identification of a geographical area for which an oblique-mosaic image is desired; assign surface locations to pixels included in an oblique-mosaic pixel map of the geographical area encompassing multiple source images, the oblique-mosaic pixel map being part of a mathematical model of a virtual camera looking down at an oblique angle onto the geographical area; create a ground elevation model of the ground and vertical structures within the oblique-mosaic pixel map using overlapping source images of the geographical area, wherein the source images were captured at an oblique angle and compass direction similar to the oblique angle and compass direction of the virtual camera; and reproject, with the mathematical model, source oblique image pixels of the overlapping source images for pixels included in the oblique-mosaic pixel map using the ground elevation model to thereby create the oblique-mosaic image of the geographical area.
2. The system of claim 1, wherein the virtual camera has a perspective, and wherein the image processing software further causes the computer system to project each pixel through the perspective of the virtual camera to determine a corresponding surface location for each pixel in the oblique-mosaic pixel map.
3. The system of claim 1, wherein multiple source oblique images represent a same surface location.
4. The system of claim 3, wherein the image processing software further causes the computer system to compare the pixels of each source oblique image that represent the same surface location to determine which source pixel is most representative of the surface location, the more representative pixel to be included in the oblique-mosaic image.
5. The system of claim 1, wherein each pixel included in the oblique-mosaic pixel map is reprojected to match a size and shape of a represented surface location as taken from the elevation, compass direction, and oblique angle of the virtual camera.
6. The system of claim 1, wherein the image processing software further causes the computer system to remove effects of elevation from the source oblique images prior to reprojection and then to add the effects of elevation to the oblique-mosaic image after reprojection.
7. The system of claim 1, wherein metadata is stored with the oblique-mosaic image.
8. A system comprising: a computer system running image processing software that when executed by the computer system causes the computer system to: receive an identification of a geographical area for which an oblique-mosaic image is desired; determine geographic coordinates for pixels included in an oblique-mosaic pixel map of the geographical area encompassing multiple source images, the oblique-mosaic pixel map being part of a mathematical model of a virtual camera looking down at an oblique angle onto the geographical area; create a ground elevation model of the ground and vertical structures within the oblique-mosaic pixel map using overlapping source images of the geographical area, wherein the source images were captured at an oblique angle and compass direction similar to the oblique angle and compass direction of the virtual camera; and reproject, with the mathematical model, using the ground elevation model to define a ground surface and surfaces of vertical structures within the geographical area, at least one source oblique image pixel of the overlapping source images from a vantage point of the virtual camera for pixels included in the oblique-mosaic pixel map to thereby create a geo-referenced oblique-mosaic image of the geographical area.
9. The system of claim 8, wherein the virtual camera has a perspective, and wherein the image processing software further causes the computer system to project each pixel through the perspective of the virtual camera to determine a corresponding surface location for each pixel in the oblique-mosaic pixel map.
10. The system of claim 8, wherein multiple source oblique images represent a same surface location.
11. The system of claim 8, wherein the image processing software further causes the computer system to compare the pixels of each source oblique image that represent a same surface location to determine which source pixel is most representative of the surface location, the more representative pixel to be included in the geo-referenced oblique-mosaic image.
12. The system of claim 8, wherein the at least one source oblique image pixel is reprojected to match a size and shape of a represented surface location as taken from the elevation, compass direction, and oblique angle of the virtual camera.
13. The system of claim 8, wherein the image processing software further causes the computer system to remove effects of elevation from the source oblique images prior to reprojection and then to add the effects of elevation to the geo-referenced oblique-mosaic image after reprojection.
14. The system of claim 8, wherein metadata is stored with the geo-referenced oblique-mosaic image.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PRESENTLY DISCLOSED AND CLAIMED INVENTION
(7) Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction, experiments, exemplary data, and/or the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
(8) The presently claimed and disclosed invention(s) relate to oblique-mosaic images and methods for making and using the same. More particularly, the presently claimed and disclosed invention(s) use a methodology whereby separate obliquely captured aerial images are combined into at least one single oblique-mosaic image. The at least one single oblique-mosaic image is visually pleasing and geographically accurate.
(9) Referring now to the Figures and in particular to
(10) In general, the method identifies a desired area 15 to be imaged and collected into the oblique-mosaic image 12. Source images are obtained utilizing a real camera 14 capturing a scene 16 as indicated by a block 18 in
(11) As described hereinabove, the use of a grid reprojection methodology to create oblique-mosaic images is fraught with problems and is unlikely to provide a usable image. Therefore, in order to produce a high-quality oblique-mosaic image 12, an improved process must be performed. Such an improved and unique process is described and claimed herein and preferably uses, generally, the following considerations. First, rather than projecting onto a rectilinear grid, the input image pixels are projected onto the “virtual camera” 20. The virtual camera 20 is a mathematical model that describes a very large camera, high up in the sky. Since it is a mathematical creation, this camera 20 is not bound by the current limitations of camera manufacturing. Second, the pixels are not then reprojected to be square or even the same size in the output image, as is the case in the standard ortho-rectification process used when creating an ortho-mosaic. Instead, they are reprojected to match the size they would project onto the ground from this virtual camera 20, i.e. the natural perspective that a camera captures. Third, in order to best align the combined imagery, the effects of changes in elevation are first backed out of the input imagery during the projection from the original input image's camera location to the ground, and then the effects of elevation are reintroduced during the projection from the ground up to the virtual camera's location. The result is a natural-looking image that properly shows the contours of the land as seen from a natural oblique perspective. Finally, the effects of building lean are desirably minimized. There are a number of different ways in which building lean can be minimized: (a) by “steering” the cut-lines between input images down streets such that the cut-lines do not run through a building, since a cut-line through a building can cause conflicting building lean that greatly distorts the building's appearance; (b) by calculating the angle of building lean and warping the images to compensate for it; and (c) by matching the buildings in adjacent images and aligning them to each other.
(12) In practice, the methodology disclosed and claimed herein consists of multiple steps and data transformations that can be accomplished by one of ordinary skill in the art given the present specification. A number of algorithms already known in the art steer cut-lines for ortho-mosaics and could be readily adapted for use with oblique images. In addition, follow-on work could create new algorithms specifically designed to deal with the complexities of oblique images.
(13) The first step to creating the oblique-mosaic image 12 according to the presently disclosed and claimed invention requires the selection of an area to be imaged. Generally, the area to be imaged would be a specific geographical location. However, other areas can also be selected to be imaged into the oblique-mosaic image 12 such as building sides, walls, landscapes, mountain sides and the like.
(14) Once a desired area to be imaged and collected into the oblique-mosaic image 12 has been determined, the user or operator creates the “virtual camera” 20, i.e. a mathematical construct that is capable of covering or capturing a portion of, or the entirety of, the desired area. The virtual camera 20 is a mathematical model of a camera with the necessary camera geometry parameters (the mathematical values that define the camera model, for instance the number of rows and columns of the sensor plane, size of the sensor plane in millimeters, focal length in millimeters, height above ground, and yaw, pitch, and roll of the optical axis) that enable it to “capture” the desired scene. For instance, a virtual camera can be devised having a very large sensor (e.g. 20,000 pixel columns and 10,000 pixel rows), a standard field of view (36 mm by 24 mm sensor plane and a 60 mm focal length), and be “placed” at a relatively high altitude (e.g. 30,000 feet) looking down at an oblique angle to the north (yaw and roll of 0 and pitch of −40 degrees). In a preferred embodiment, a sensor model from a real camera is used and the user simply modifies its parameters so that it can “capture” the desired area.
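The virtual camera of paragraph (14) can be sketched as a simple pinhole model. The following Python is a minimal sketch under stated assumptions, not the patent's implementation: a local metric frame (x east, y north, z up), yaw measured clockwise from north, pitch measured from the horizontal with negative values looking below the horizon, roll applied about the optical axis, the principal point at the sensor centre, and no lens distortion. The class, method, and field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class VirtualCamera:
    """Hypothetical pinhole model of the virtual camera (no lens distortion)."""
    cols: int            # sensor columns
    rows: int            # sensor rows
    sensor_w_mm: float   # sensor-plane width
    sensor_h_mm: float   # sensor-plane height
    focal_mm: float      # focal length
    x: float             # focal point easting, metres (local frame)
    y: float             # focal point northing, metres
    z: float             # focal point height, metres
    yaw_deg: float       # heading of the optical axis, clockwise from north (assumed)
    pitch_deg: float     # optical-axis elevation; negative looks below the horizon (assumed)
    roll_deg: float = 0.0

    def basis(self) -> Tuple[Vec3, Vec3, Vec3]:
        """Unit (right, down, forward) vectors in world x=east, y=north, z=up."""
        yaw, pitch, roll = (math.radians(a) for a in
                            (self.yaw_deg, self.pitch_deg, self.roll_deg))
        f = (math.sin(yaw) * math.cos(pitch),
             math.cos(yaw) * math.cos(pitch),
             math.sin(pitch))
        r = (math.cos(yaw), -math.sin(yaw), 0.0)
        # down = forward x right, so that right x down = forward (right-handed)
        d = (f[1] * r[2] - f[2] * r[1],
             f[2] * r[0] - f[0] * r[2],
             f[0] * r[1] - f[1] * r[0])
        # roll rotates the right/down pair about the optical axis
        cr, sr = math.cos(roll), math.sin(roll)
        r_rolled = tuple(cr * a + sr * b for a, b in zip(r, d))
        d_rolled = tuple(-sr * a + cr * b for a, b in zip(r, d))
        return r_rolled, d_rolled, f

    def pixel_pitch_mm(self) -> Tuple[float, float]:
        return self.sensor_w_mm / self.cols, self.sensor_h_mm / self.rows

    def pixel_to_ray(self, col: float, row: float) -> Vec3:
        """Unit direction of the ray from the focal point through pixel (col, row)."""
        r, d, f = self.basis()
        px, py = self.pixel_pitch_mm()
        xm = (col - (self.cols - 1) / 2) * px   # mm right of the principal point
        ym = (row - (self.rows - 1) / 2) * py   # mm below the principal point
        v = tuple(self.focal_mm * fi + xm * ri + ym * di
                  for fi, ri, di in zip(f, r, d))
        n = math.sqrt(_dot(v, v))
        return (v[0] / n, v[1] / n, v[2] / n)

    def world_to_pixel(self, X: float, Y: float, Z: float) -> Optional[Tuple[float, float]]:
        """Project a world point to fractional (col, row); None if behind the camera."""
        r, d, f = self.basis()
        v = (X - self.x, Y - self.y, Z - self.z)
        depth = _dot(v, f)
        if depth <= 0:
            return None
        px, py = self.pixel_pitch_mm()
        xm = self.focal_mm * _dot(v, r) / depth
        ym = self.focal_mm * _dot(v, d) / depth
        return ((self.cols - 1) / 2 + xm / px, (self.rows - 1) / 2 + ym / py)
```

With the example values of paragraph (14), the camera could be built as `VirtualCamera(20000, 10000, 36.0, 24.0, 60.0, 0.0, 0.0, 9144.0, 0.0, -40.0)` (30,000 feet is 9,144 m); under the pitch convention assumed here, its central ray descends 40 degrees below the horizontal toward the north.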
(15) The second step creates the resulting oblique pixel map for the virtual camera 20. The pixel map corresponds to the virtual camera's sensor and thus typically, but not necessarily, has the same number of rows and columns as the virtual camera's sensor. Then, for each pixel in the pixel map image, the projective equations for the virtual camera 20 are used to project each pixel downward and away from the virtual camera 20 and onto the ground, taking elevation into account when doing so (generally through the use of a mathematical elevation model of the ground surface). This results in a corresponding ground location for that virtual camera's pixel.
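Paragraph (15) projects each virtual-camera pixel down onto the ground while taking elevation into account. One common way to do this, assumed here rather than taken from the source, is to march along the pixel's ray in fixed steps until it first drops below the elevation model, then refine the crossing by bisection. `elevation_at` is a hypothetical callable standing in for whatever elevation model is available (a bare-earth grid, or a surface model that includes buildings as in paragraph (34)); the ray origin and direction would come from the virtual camera's projective equations.

```python
import math
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def ray_ground_intersection(origin: Vec3, direction: Vec3,
                            elevation_at: Callable[[float, float], float],
                            step: float = 5.0, max_range: float = 100_000.0,
                            tol: float = 1e-3) -> Optional[Vec3]:
    """Return the first point where origin + t*direction meets the elevation model.

    The ray is marched in steps of `step` metres until it is at or below the
    surface; the crossing is then refined by bisection to within `tol` metres
    along the ray. Returns None if the surface is not reached within `max_range`.
    """
    ox, oy, oz = origin
    n = math.sqrt(sum(c * c for c in direction))
    dx, dy, dz = (c / n for c in direction)

    def height_above_surface(t: float) -> float:
        return (oz + t * dz) - elevation_at(ox + t * dx, oy + t * dy)

    if height_above_surface(0.0) <= 0.0:
        return None  # the ray starts at or below the surface
    t_prev, t = 0.0, step
    while t <= max_range:
        if height_above_surface(t) <= 0.0:
            lo, hi = t_prev, t  # the crossing lies in [lo, hi]
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if height_above_surface(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            tc = 0.5 * (lo + hi)
            return (ox + tc * dx, oy + tc * dy, oz + tc * dz)
        t_prev, t = t, t + step
    return None
```

The step size trades speed against the risk of stepping over a thin feature such as a narrow wall when the elevation model includes structures.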
(16) Once the corresponding ground location has been found, it can be used to select which previously captured images contain image data for that ground location. This is generally done by checking to see if the ground location lies within the image boundaries of a previously captured image.
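Paragraph (16) checks whether a ground location lies within a previously captured image's boundaries. A minimal sketch, assuming each source image's ground footprint has already been computed as a polygon in the same ground coordinates (oblique footprints are typically trapezoids), uses the even-odd ray-crossing point-in-polygon test. The function names and the dictionary of footprints are hypothetical. An equivalent alternative is to project the ground point into each source camera and test whether the resulting row and column fall on its sensor.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def footprint_contains(footprint: List[Point], x: float, y: float) -> bool:
    """Even-odd (ray-crossing) point-in-polygon test against an image footprint."""
    inside = False
    n = len(footprint)
    for i in range(n):
        x1, y1 = footprint[i]
        x2, y2 = footprint[(i + 1) % n]
        # only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def candidate_images(footprints: Dict[str, List[Point]], x: float, y: float) -> List[str]:
    """Ids of the source images whose ground footprint contains (x, y)."""
    return [img_id for img_id, fp in footprints.items() if footprint_contains(fp, x, y)]
```

When more than one id is returned, the overlap case of paragraphs (19) and (20) applies and a preferred image must be selected.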
(17) When selecting source oblique images, e.g., input captured images, in order to achieve a desirable output image, it is important to use source oblique images that were captured in the same, or nearly the same, relative orientation as the virtual camera 20, in terms of oblique downward angle and compass direction of the optical axis. While it is generally not an issue to use input imagery from a camera whose model is different from the virtual camera's model, if that model is radically different (for instance, a line scanner versus a full frame capture device), the resulting image may be undesirable.
(18) While this invention discusses using captured images as input to the oblique mosaic 12, doing so is not actually required. It is possible to use a projected image, or even another oblique mosaic, as input to this process. However, since this process reprojects the input images, it is desirable to use non-projected input images, i.e. captured images, because reprojecting already projected data can often lead to artifacts, similar to the accumulation of rounding errors in mathematical calculations. These artifacts can create an undesirable resulting oblique-mosaic.
(19) It is generally desirable to create the continuous oblique-mosaic 12. In order to do so, there must be captured image data for the entire area being “captured” by the virtual camera 20. This means that if multiple captured images are being combined to create the oblique-mosaic 12, those input images must be adjacent or, more commonly, overlapping. As a result of this overlap, it is common for there to be multiple captured images covering the same area on the ground. If multiple captured images are available for selection, a preferred captured image is chosen according to the selection criteria described below.
(20) When multiple images from real cameras 14 cover the same point on the ground, a selection process can be used to determine which real camera image should be used as input for the creation of the virtual camera's pixel map image. This selection process can be done by assigning weights (assigned numerical values) to the following input criteria, multiplying those weights by the normalized criterion (a value that has been scaled between 0 and 1), and then selecting the image with the greatest sum of these weight/criterion products. While any number of criteria can be used, the following three criteria have been used in the development of this invention:
(21) Selection Criterion: Distance to Optical Axis
(22) The distance between the point on the ground being selected and the point where the input camera's optical axis intersects the ground. This value can be normalized by dividing the distance by the maximum distance able to be measured in the scene.
(23) Selection Criterion: Angular Difference to Optical Axis
(24) The difference between the following two angles: the angle of the input camera's optical axis (generally measured relative to the perpendicular) and the angle of the ray being cast from the virtual camera to the point on the ground being selected (again, generally measured relative to the perpendicular). This value can be normalized by dividing by 180 degrees.
(25) Selection Criterion: Distance to Nearest Street Centerline
(26) The distance between the point on the ground being selected and the nearest street centerline. The street centerlines can be obtained from vector data files such as TIGER files or other Geographic Information System files. This value can be normalized by dividing by the maximum distance able to be measured in the scene.
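The weighted selection of paragraph (20), using the first two criteria, can be sketched as follows. Assumptions: each criterion is normalized to 0..1 as described, and both are "smaller is better", so they are given negative weights so that the greatest weighted sum still identifies the preferred image; the weight values are illustrative only. As literally described, the street-centerline distance of paragraph (26) depends only on the ground point and not on the candidate image, so this sketch leaves out how that term is combined. The `SourceCamera` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class SourceCamera:
    image_id: str
    axis_hit: Tuple[float, float]  # ground (x, y) where the optical axis meets the ground
    axis_angle_deg: float          # optical-axis angle measured from the perpendicular


def normalized_criteria(cam: SourceCamera, ground_xy: Tuple[float, float],
                        ray_angle_deg: float, max_scene_dist: float) -> Dict[str, float]:
    """Criteria of paragraphs (22) and (24), each scaled to 0..1."""
    dist = math.dist(cam.axis_hit, ground_xy)
    return {
        "dist_to_axis": min(dist / max_scene_dist, 1.0),
        "angle_diff": abs(cam.axis_angle_deg - ray_angle_deg) / 180.0,
    }


def select_source(cams: Sequence[SourceCamera], ground_xy: Tuple[float, float],
                  ray_angle_deg: float, max_scene_dist: float,
                  weights: Dict[str, float]) -> str:
    """Return the id of the candidate with the greatest sum of weight * criterion.

    Both criteria are 'smaller is better', so the weights passed in are
    expected to be negative (an assumption of this sketch)."""
    def score(cam: SourceCamera) -> float:
        crit = normalized_criteria(cam, ground_xy, ray_angle_deg, max_scene_dist)
        return sum(weights[name] * value for name, value in crit.items())
    return max(cams, key=score).image_id
```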
(27) Once the preferred captured image has been selected, the projective equations for the captured image's camera are used to project from the ground up to the camera, taking the ground elevation into account when doing so. This projection through the focal point and onto the camera's sensor plane will find a pixel row and column corresponding to the point on the ground. As this typically does not fall on an integer row or column, bilinear interpolation (an industry standard mathematical formula for finding a single value from the proportionate proximity to the four surrounding pixels) is used to find the pixel value for the corresponding point on the camera's sensor plane.
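Paragraph (27) uses bilinear interpolation to read a value at a fractional row and column on the source camera's sensor plane. A minimal single-band sketch follows; it assumes pixel (r, c) is centred at integer coordinates and returns `None` outside the image (a multi-band image would be sampled band by band). The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def bilinear_sample(image: Sequence[Sequence[float]], row: float, col: float) -> Optional[float]:
    """Interpolate `image` at fractional (row, col) from the four surrounding pixels."""
    rows, cols = len(image), len(image[0])
    if not (0.0 <= row <= rows - 1 and 0.0 <= col <= cols - 1):
        return None  # the projected point falls off the source sensor
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)  # clamp on the last row/column
    fr, fc = row - r0, col - c0  # proportional proximity to the lower neighbours
    top = image[r0][c0] * (1 - fc) + image[r0][c1] * fc
    bottom = image[r1][c0] * (1 - fc) + image[r1][c1] * fc
    return top * (1 - fr) + bottom * fr
```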
(28) This pixel value is then used to fill the pixel in the image that corresponds to the virtual camera's sensor plane from which the original ray was projected outward onto the ground. This process is repeated for some or all of the remaining pixels in the virtual camera's sensor plane, resulting in an image that covers some area or the entire area on the ground that the virtual camera 20 can “see.” Preferably, this process is repeated for all of the remaining pixels in the virtual camera's sensor plane, resulting in a complete image that covers the entire area on the ground that the virtual camera 20 can “see.”
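The per-pixel procedure of paragraphs (15) through (28) can be summarized as a driver that takes the individual steps as callables. This is a structural sketch only: `cast_to_ground`, `choose_source`, and `sample_source` are hypothetical placeholders for the ray casting, source selection, and project-and-sample steps, and pixels without coverage keep a `nodata` value.

```python
from typing import Any, Callable, List, Optional, Tuple

Ground = Tuple[float, float, float]


def render_oblique_mosaic(cols: int, rows: int,
                          cast_to_ground: Callable[[int, int], Optional[Ground]],
                          choose_source: Callable[[Ground], Optional[Any]],
                          sample_source: Callable[[Any, Ground], Optional[float]],
                          nodata: Optional[float] = None) -> List[List[Optional[float]]]:
    """Fill the virtual camera's pixel map one pixel at a time.

    For each pixel: cast its ray to the ground (elevation included), choose the
    preferred source image covering that ground point, then project the ground
    point up into that image and sample it."""
    out: List[List[Optional[float]]] = [[nodata] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            ground = cast_to_ground(col, row)
            if ground is None:
                continue  # the ray never reached the elevation model
            source = choose_source(ground)
            if source is None:
                continue  # no captured image covers this ground point
            value = sample_source(source, ground)
            if value is not None:
                out[row][col] = value
    return out
```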
(29) The resulting image and its corresponding projective equations are then stored. The resulting image can be stored in any format, including one of many industry standard image formats such as TIFF, JFIF, TARGA, Windows Bitmap File, PNG or any other industry standard format. For the corresponding projective equations, the following data should, in a preferred embodiment, be stored as metadata with the resulting image, either appended to the same file or in another file readily accessible (an industry standard practice is to use the same filename but with a different file extension):
1. The location of the camera: generally, the location of the camera is specified by a geographic location and altitude (or height over ground).
2. The orientation of the camera: generally specified by three angles, yaw, pitch and roll (or omega, phi, and kappa).
3. The size of the sensor: generally specified as an overall sensor size in millimeters (or by the size of an individual sensor element).
4. The focal length of the lens: generally specified in millimeters.
5. Optionally, lens distortion information: generally specified as a principal point offset (offset between the optical axis and the middle of the sensor) and as radial distortion terms (polynomial coefficients that describe the amount of distortion that occurs radially out from the optical axis).
6. Optionally, a ground elevation model: generally specified as a grid of elevation values.
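Paragraph (29) recommends storing the projective parameters as metadata next to the image, using the same filename with a different extension. A minimal sketch that writes a JSON sidecar follows; the JSON layout and key names are assumptions (the source prescribes no format), and the camera location is given here in a projected x/y frame rather than as latitude and longitude. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional, Sequence


def write_sidecar(image_path: str, *, x: float, y: float, altitude_m: float,
                  yaw_deg: float, pitch_deg: float, roll_deg: float,
                  sensor_w_mm: float, sensor_h_mm: float, focal_mm: float,
                  principal_point_offset_mm: Optional[Sequence[float]] = None,
                  radial_distortion: Optional[Sequence[float]] = None,
                  elevation_grid: Optional[Sequence[Sequence[float]]] = None) -> Path:
    """Write items 1 to 6 of paragraph (29) beside the image as <stem>.json."""
    meta = {
        "camera_location": {"x": x, "y": y, "altitude_m": altitude_m},
        "orientation_deg": {"yaw": yaw_deg, "pitch": pitch_deg, "roll": roll_deg},
        "sensor_size_mm": [sensor_w_mm, sensor_h_mm],
        "focal_length_mm": focal_mm,
    }
    # optional items are written only when supplied
    if principal_point_offset_mm is not None or radial_distortion is not None:
        meta["lens_distortion"] = {
            "principal_point_offset_mm": list(principal_point_offset_mm or [0.0, 0.0]),
            "radial_coefficients": list(radial_distortion or []),
        }
    if elevation_grid is not None:
        meta["ground_elevation_grid"] = [list(r) for r in elevation_grid]
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps(meta, indent=2))
    return sidecar
```

For an image saved as `mosaic.tif`, this writes `mosaic.json` alongside it.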
(30) As discussed above, the relative perspective of the camera causes an effect known as “building lean.” While building lean is most commonly applied to, and thus discussed as, buildings, it also applies to any vertical structure in the scene, such as electrical towers, trees, cars, phone booths, mailboxes, street signs, and so on. Building lean is an effect that makes it appear as if buildings that are not along the optical axis of the camera “lean” away from the optical axis: the farther away from the optical axis, the greater the lean. This lean is the result of perspective, which causes objects that are raised off the ground to appear farther “back” into the image, away from the camera. Thus, the top of a building appears farther back than the bottom of a building. When this lean corresponds to the camera's perspective, it looks normal. However, as part of the oblique-mosaic process, captured images, each with its own perspective, are combined into a single virtual camera's perspective.
(31) This combination of different perspectives becomes problematic, especially when two different captured images from different vantage points contribute pixel values to adjacent areas in the virtual camera's pixel map image. For example, if a camera to the left of a building provides the left side of the building, then that portion of the building will appear to be leaning to the right (away from the camera). If a camera located to the right of the building provides the right side of the building, then that portion of the building will appear to be leaning to the left. Since the two halves of the building “lean into” each other, the resulting combined image shows the building with a triangular, rather than a rectangular, appearance.
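The size of the lean described in paragraphs (30) and (31) can be estimated with similar triangles, under the simplifying assumptions of flat ground and a pinhole camera. If the camera is at height H above its nadir point and a roof point at height h sits above a base point at horizontal distance D from that nadir point, the ray through the roof point meets the ground at distance D·H/(H − h), so the roof appears displaced by D·h/(H − h), directed away from the camera. This is the standard photogrammetric relief-displacement approximation rather than a formula from the source; the function name is hypothetical.

```python
from typing import Tuple


def lean_offset(camera_xy: Tuple[float, float], camera_height: float,
                base_xy: Tuple[float, float], point_height: float) -> Tuple[float, float]:
    """Ground-plane (dx, dy) by which a point `point_height` above `base_xy`
    appears shifted when its ray from the camera is projected to flat ground.

    Similar triangles give a shift of D * h / (H - h), pointing away from the
    camera's nadir point, where D is the horizontal nadir-to-base distance."""
    if not 0.0 <= point_height < camera_height:
        raise ValueError("point must lie between the ground and the camera")
    dx = base_xy[0] - camera_xy[0]
    dy = base_xy[1] - camera_xy[1]
    scale = point_height / (camera_height - point_height)
    return dx * scale, dy * scale
```

Two cameras on opposite sides of the same building give offsets of opposite sign, which is the conflicting lean behind the triangular appearance described in paragraph (31).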
(32) Because building lean only affects surfaces above the ground, it is generally difficult to account for or correct, since doing so requires knowledge of the presence of the building or structure. Features that are on the ground do not experience this building lean because the change in relative perspective is backed out when ground elevation is taken into account in the projective equations. The mathematical model used to define the ground surface during the projective process ensures the correct ground location is selected. However, for objects that rise above the ground surface, and are not represented in the mathematical model used to define the ground surface, this relative perspective change causes the virtual camera to “see” the building top in the wrong place, i.e. too far back from the camera.
(33) A method for minimizing the effects of building lean, as contemplated herein, is to transition between one input camera image and the next input camera image over an area where there are no structures above or below the ground elevation model. In one embodiment, this is accomplished by placing the transition down the middle of a street. Thus, by having a properly weighted selection criterion for distance to street centerline, if there is a street in the area where two captured images overlap, then the transition from contributing pixels from the one captured image to contributing pixels from the second captured image will occur along this street, thus minimizing the effects of building lean.
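The distance-to-street-centerline criterion of paragraphs (26) and (33) reduces to a point-to-polyline distance. A minimal sketch follows, assuming the centerlines have already been read from TIGER or other GIS files and converted into the same projected ground coordinates as the mosaic (reading those files is not shown); the function names are hypothetical. Dividing the result by the maximum measurable distance in the scene gives the normalized criterion.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    seg_len2 = vx * vx + vy * vy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # parameter of the orthogonal projection, clamped onto the segment
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / seg_len2))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))


def distance_to_nearest_centerline(p: Point, centerlines: Iterable[List[Point]]) -> float:
    """Minimum distance from p to any centerline polyline (inf if there are none)."""
    best = math.inf
    for line in centerlines:
        for a, b in zip(line, line[1:]):
            best = min(best, point_segment_distance(p, a, b))
    return best
```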
(34) A method for removing building lean entirely, as contemplated herein, is to provide an accurate ground elevation model taking into account buildings and other vertical structures. Thus, every pixel that comprises the image of the building is represented in the mathematical model of the ground elevation model and therefore the change in relative perspective is accounted for in the projective process. However, for this to work well, the elevation model must be highly correlated to the input imagery. If there is any shift in location between the imagery and the elevation model, then the buildings will not be projected properly when creating the oblique-mosaic image 12.
(35) To overcome this limitation, the preferred methodology is to create the elevation model from the imagery itself. This can be done by using an industry standard process known as aero-triangulation, which finds the elevation of a point on the ground by comparing its location in two overlapping captured images and using the projective equations of their corresponding cameras to triangulate its location and elevation. Repeating this process over the entire overlap area can produce an accurate mathematical model of the surface of not only the ground, but also of the surface of structures and objects within the images. More importantly, because this model is derived from the imagery itself, it is, by definition, highly correlated with the input imagery.
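Paragraph (35) triangulates a point from its location in two overlapping images. Given the two viewing rays (each from a camera's focal point through the matched pixel, obtained from that camera's projective equations), one simple way to triangulate is to take the midpoint of the rays' closest approach. This sketch shows only that one technique and assumes the pixel matching has already been done; production photogrammetry commonly uses a least-squares solution over all available rays instead. The function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of closest approach between rays c1 + s*d1 and c2 + t*d2."""
    w0 = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return (0.5 * (p1[0] + p2[0]), 0.5 * (p1[1] + p2[1]), 0.5 * (p1[2] + p2[2]))
```

The z component of the result is the elevation that would be written into the surface model at that ground location.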
(36) Another method for removing building lean, contemplated for use herein, is to attempt to identify vertical structures by using an edge matching process between the oblique and the corresponding nadir imagery. Vertical structures do not appear in a truly nadir image, and they barely appear in a slightly off-nadir image. Thus, when comparing an oblique image with its corresponding nadir image, the primary difference between the structures that appear in the two images will be the vertical structures. By using one or more edge detection algorithms (such as an industry standard Laplacian Filter), it is possible to identify the various structures within the two images and then isolate the vertical edges in the oblique image by subtracting out the non-vertical structures that also appear in the nadir image. Once these vertical structures have been found, the pixels for those vertical structures can be shifted to remove the effects of the change in relative perspective. By shifting the pixel's apparent location in the captured oblique image by the relative height above the ground model found through the measuring of the vertical edges, its proper ground location can be determined, thus negating the effects of building lean.
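The edge comparison of paragraph (36) can be sketched with a 3 × 3 Laplacian: compute edge masks for both images and keep the edges that appear in the oblique image but not in the nadir image. As a loud simplification, the sketch assumes the nadir image has already been resampled into the oblique image's pixel grid so the two can be compared pixel by pixel; that registration step, the threshold value, and the function names are assumptions not given in the source.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]

# 4-neighbour Laplacian kernel
LAPLACIAN_3X3 = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian(img: Image) -> Image:
    """Convolve with the 3x3 Laplacian; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(LAPLACIAN_3X3[i][j] * img[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def edge_mask(img: Image, threshold: float) -> Mask:
    """True where the Laplacian response magnitude exceeds `threshold`."""
    return [[abs(v) > threshold for v in row] for row in laplacian(img)]


def oblique_only_edges(oblique: Image, nadir: Image, threshold: float) -> Mask:
    """Edges present in the oblique image but absent from the co-registered nadir image."""
    eo, en = edge_mask(oblique, threshold), edge_mask(nadir, threshold)
    return [[o and not n for o, n in zip(ro, rn)] for ro, rn in zip(eo, en)]
```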
(37) It should be understood that the processes described above can be performed with the aid of a computer system running image processing software adapted to perform the functions described above, and the resulting images and data are stored on one or more computer readable mediums. Examples of a computer readable medium include an optical storage device, a magnetic storage device, an electronic storage device or the like. The term “Computer System” as used herein means a system or systems that are able to embody and/or execute the logic of the processes described herein. The logic embodied in the form of software instructions or firmware may be executed on any appropriate hardware which may be a dedicated system or systems, or a general purpose computer system, or distributed processing computer system, all of which are well understood in the art, and a detailed description of how to make or use such computers is not deemed necessary herein. When the computer system is used to execute the logic of the processes described herein, such computer(s) and/or execution can be conducted at a same geographic location or multiple different geographic locations. Furthermore, the execution of the logic can be conducted continuously or at multiple discrete times. Further, such logic can be performed about simultaneously with the capture of the images, or thereafter or combinations thereof.
(38) Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious to those skilled in the art that certain changes and modifications may be practiced without departing from the spirit and scope thereof, as described in this specification and as defined in the appended claims below.