DEPTH MAP BASED PERSPECTIVE CORRECTION IN DIGITAL PHOTOS

20170289516 · 2017-10-05


    Abstract

    The invention relates to post-processing of a digital photo to correct perspective distortion in the photo. The correction uses a digital photo of a scene and a depth map associated with the photo and comprising, for each pixel in the photo, a depth being the distance between the part of the scene in that pixel and the position of the camera at the time of acquisition. The correction is performed locally, so that the correction of any pixel in the photo depends on the depth of that pixel. The correction can be implemented as a transformation of each pixel in the original photo into a new position in a corrected photo. Afterwards, pixel values have to be calculated for the pixels in the corrected photo using the original pixel values and their new positions. The invention is particularly relevant for photos where objects or scenes involve a large magnification variation, such as selfies, close-up photos, and photos where the extension of a large object is not orthogonal to the optical axis of the camera (low/high angle shots).

    Claims

    1. A method for automatically performing perspective correction in a digital photo of a scene recorded with a camera, the method comprising: providing, using only a single camera with a single acquisition position, a digital photo of a scene represented by pixel values, P.sub.(x,y), for an array of pixels, (x,y), the photo having perspective distortion effects; a depth map associated with the photo and comprising, for each pixel in the pixel array, a depth, d.sub.(x,y), being a distance between a part of the scene represented by that pixel and an acquisition position of the camera at the time of acquisition of the photo; and performing perspective correction of the photo using as photographic input only the photo of the scene from the single camera and acquisition position and its associated depth map by, for each pixel, (x,y), in the photo, determining a new position, D.sub.proc, in an image plane for a virtual camera position from at least the pixel's depth, d.sub.acq(x,y), its position, D.sub.acq(x,y), in the image plane at the acquisition position, and a displacement, C, between the virtual camera position and the acquisition position as: D.sub.proc=D.sub.acq(x,y)*d.sub.acq(x,y)/(d.sub.acq(x,y)+C)

    2-14. (canceled)

    15. The method according to claim 1, wherein the step of performing perspective correction further comprises adjusting a magnification of the new position using also a depth, d.sub.acq.sub._.sub.ref, of a reference plane chosen to preserve a magnification in a selected plane of the depth map.

    16. The method according to claim 15, wherein the step of performing perspective correction comprises determining the new position as: D.sub.proc=D.sub.acq(x,y)*d.sub.acq(x,y)*(d.sub.acq.sub._.sub.ref+C)/((d.sub.acq(x,y)+C)*d.sub.acq.sub._.sub.ref)

    17. The method according to claim 1, wherein the step of performing a perspective correction further comprises using the pixel values P.sub.(x,y) and the new positions (x′,y′) to determine new pixel values, P.sub.(i,j), for an array of pixels, (i,j), representing a corrected photo by, for each new position (x′,y′), adding the corresponding pixel value P.sub.(x,y) to the new pixel values P.sub.(i,j) of the pixels (i,j) surrounding the new position, wherein the pixel values P.sub.(x,y) are weighted by a weighting factor that is a function of the relative positions of the new position (x′,y′) and each pixel (i,j).

    18. The method according to claim 17, further comprising subsequently dividing each new pixel value P.sub.(i,j) by a normalization factor.

    19. The method according to claim 17, further comprising subsequently, for a new pixel with undefined values of P.sub.(i,j), calculating an interpolated pixel value from surrounding pixels in the corrected photo having defined values of P.sub.(i,j).

    20. The method according to claim 1, wherein the displacement between the virtual camera position and the acquisition position is a linear displacement along an optical axis of the camera.

    21. The method according to claim 1, wherein the steps of providing a photo and providing a depth map comprise providing a series of photos and associated depth maps, and wherein the method further comprises detecting and evaluating perspective distortion in the photos based either on the distance of the closest object in the scene or on the analysis of vanishing lines, selecting photos with perspective distortion that would benefit from perspective correction, and automatically performing perspective correction on the selected photos.

    22. The method according to claim 1, further comprising performing perspective correction of the depth map by, for each pixel, (x,y), in the depth map, determining a new position, D.sub.proc, in an image plane for a virtual camera position from at least the depth held by that pixel, d.sub.acq(x,y), its position, D.sub.acq(x,y), in the image plane at the acquisition position, and a displacement, C, between the virtual camera position and the acquisition position.

    23. A digital storage holding software configured to perform the method of claim 1 when executed by one or more digital processing units.

    24. An integrated circuit configured to perform the method of claim 1.

    25. A handheld or portable device with a camera comprising the digital storage holding software according to claim 23.

    Description

    BRIEF DESCRIPTION OF THE FIGURES

    [0045] The invention will now be described in more detail with regard to the accompanying figures. The figures show one way of implementing the present invention and are not to be construed as limiting other possible embodiments falling within the scope of the attached claim set.

    [0046] FIGS. 1A-D illustrate image corrections made by DxO ViewPoint 2 Application with the same correction applied on a photo and a checkerboard pattern.

    [0047] FIGS. 2 and 3 illustrate a setup for explaining the derivation of the applicable algebra according to a specific implementation of the invention.

    [0048] FIG. 4 illustrates the calculation of new pixel values.

    [0049] FIGS. 5A-C illustrate the use of adaptive kernels for interpolating pixel values for pixels with undefined pixel values.

    [0050] FIG. 6 is a chart illustrating an embodiment of the method according to the invention, as well as a schematic system-chart representing an outline of the operations of the computer program product according to the invention.

    DETAILED DESCRIPTION OF THE INVENTION

    [0051] The main emphasis of the examples presented in the following description is on perspective deformation in cases where the ratio between the camera's distance to the closest part of the scene and its distance to the furthest part of the scene is significant and introduces a strong distortion. This mainly happens with close-distance or low-angle shots. However, the perspective correction according to the invention can be applied to any scene topology.

    [0052] The methodology of the perspective correction of the invention is to transform the photo from the Point of View (POV) of the camera at the time of acquisition to a virtual POV in the processed photo where the perspective distortion effects are reduced, insignificant, or not present. It is also an objective to change to a new POV (far away or at infinity) while keeping the same object size.

    [0053] So, for each pixel, (x,y), in the acquired photo, one needs to compute a new position (x′,y′) of the associated pixel value P.sub.(x,y) in the processed photo. The new position is calculated as a function of the pixel's position in the acquired photo, (x,y), and the distance, d.sub.(x,y), between the camera and the part of the scene in that pixel.

    [0054] The following describes an embodiment where the displacement between the camera position at the time of acquisition (also referred to as the original camera position) and the camera position at the virtual POV is a linear displacement along the optical axis of the camera, here the Z axis. More complex displacements (displacements in other directions, x/y, as well as rotations) can be used, but the algebra for these is, although straightforward to derive, quite extensive.

    [0055] FIG. 2 illustrates the setup with an object, an acquisition position of the camera and a virtual camera position, the camera having a lens and with a focal length f.

    [0056] The following notation will be used in the description and is illustrated in FIGS. 2 and 3:
    [0057] d: The depth for a pixel
    [0058] D: A pixel's distance from the center of the sensor or the optical axis
    [0059] C: Displacement between the acquisition position and the virtual position along the Z-axis, +: moving away from the scene; −: moving closer to the scene
    [0060] Index "acq": referring to the acquired photo
    [0061] Index "proc": referring to the processed photo from the virtual camera position
    [0062] Coordinates/indices (x,y): the integer position of a pixel in the acquired photo
    [0063] P: Pixel value, e.g. RGB or another color space
    [0064] Coordinates/indices (x′,y′): the new (decimal) position of the pixel value of pixel (x,y) after transformation
    [0065] Coordinates/indices (i,j): the integer position of a pixel in the processed photo

    [0066] The following geometric relations can be derived from FIG. 2:


    d.sub.acq/D=f/D.sub.acq


    d.sub.proc/D=f/D.sub.proc


    d.sub.proc=d.sub.acq+C


    =>D.sub.proc/D.sub.acq=d.sub.acq/d.sub.proc


    =>D.sub.proc=D.sub.acq*d.sub.acq/d.sub.proc


    =>D.sub.proc=D.sub.acq*d.sub.acq/(d.sub.acq+C)   (1)

    [0067] As previously mentioned, the magnification is the ratio between the real size of the object and the object's size in the photo, and can, in relation to FIG. 2, be expressed as D/D.sub.acq=d.sub.acq/f in the acquired photo and D/D.sub.proc=d.sub.proc/f=(d.sub.acq+C)/f in the processed photo.

    [0068] The transformation (1) introduces a magnification of the entire image. If we want to choose a reference plane in which the magnification is preserved (a magnification factor of one), we need to compute the magnification factor at this distance. This is illustrated in FIG. 3. The reference plane is preferably chosen to be near the center of the object in the direction towards the camera. For a face, for example, the reference plane can be chosen to be the plane of the contour of the face (hair/ears), so that the head size is kept while the nose distortion is fixed.

    [0069] The magnification at the reference plane is:


    D.sub.proc.sub._.sub.ref/D.sub.acq.sub._.sub.ref=(d.sub.acq.sub._.sub.ref)/(d.sub.acq.sub._.sub.ref+C)   (2)

    [0070] Including the reference magnification (2) in the transformation (1) one obtains:


    D.sub.proc=D.sub.acq*d.sub.acq*(d.sub.acq.sub._.sub.ref+C)/((d.sub.acq.sub._.sub.ref)*(d.sub.acq+C))   (3)
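
    As a purely illustrative example with assumed numbers (not taken from the description), consider a selfie with the reference plane at the contour of the face, d.sub.acq.sub._.sub.ref=50 cm, the tip of the nose at d.sub.acq=40 cm, and a displacement C=100 cm away from the scene. Transformation (3) then gives D.sub.proc=D.sub.acq*40*(50+100)/((40+100)*50)≈0.86*D.sub.acq, so the nose is pulled roughly 14% closer to the optical axis, pixels in the reference plane keep their positions, and pixels behind the reference plane are moved slightly outwards.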

    [0071] If C is infinity (same magnification for all objects) we have:


    D.sub.proc=D.sub.acq*d.sub.acq/(d.sub.acq.sub._.sub.ref)   (4)

    [0072] Since D has rotational symmetry around the optical axis (z-axis), the transformation as expressed in (3) is in polar coordinates with D as the radial coordinate and angular coordinate φ which is unaffected by the transformation. It should be noted that other expressions for the transformation (3) with the same or similar results may be developed, using e.g. other coordinate systems, other camera optics or other conditions. The important feature in the transformation (3) is that the transformation, and thus the correction, of any pixel in the photo depends on the depth of that pixel, d.sub.acq. So, given a photo with a perspective distortion and an associated depth map, by selecting a virtual camera position from which the perspective distortion is significantly reduced or absent, the perspective correction is in principle complete.

    [0073] The transformation (3) can be used to calculate the positions (x′,y′) of the pixels with values P.sub.(x,y) as they would have been if the photo had been taken with the camera in the virtual position. As can be inferred from (3), the new position is a function of the pixel's depth d.sub.(x,y), its position, D, and the displacement, C, between the virtual camera position and the acquisition position. By including the reference magnification from (2) in (3), the new positions preserve the magnification in a selected plane of the depth map.
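
    The following is a minimal sketch, in Python with NumPy, of how the new positions of transformation (3) could be computed for every pixel of a depth map. The function name compute_new_positions, the principal-point arguments cx and cy, and the assumption that the depths and C share the same units are illustrative choices and not part of the description.

```python
# Illustrative sketch of transformation (3); names and layout are assumptions.
import numpy as np

def compute_new_positions(depth, C, d_ref, cx, cy):
    """Map each pixel (x, y) to its new position (x', y') for a virtual camera
    displaced by C along the optical axis, preserving magnification at the
    reference depth d_ref.
    depth  : (H, W) array of per-pixel depths d_acq(x, y)
    cx, cy : principal point (intersection of the optical axis with the sensor)
    """
    h, w = depth.shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    # Scale factor D_proc / D_acq from equation (3)
    scale = depth * (d_ref + C) / ((depth + C) * d_ref)
    # D is radial, so scale the offset from the principal point; the angular
    # coordinate is unaffected, as noted in paragraph [0072]
    x_new = cx + (x - cx) * scale
    y_new = cy + (y - cy) * scale
    return x_new, y_new
```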

    [0074] In a preferred implementation, the transformation (3) is applied as a forward transformation: from the original pixel position (x,y), compute the new position (x′,y′) of the pixel value P.sub.(x,y). A forward transform involves the complications that multiple pixels can contribute to a single pixel in the processed photo and that some pixels in the processed photo may get no contributions at all. An inverse transformation can also be used, but for this perspective correction the computation is more demanding, and forward mapping is the preferred implementation.

    [0075] In a forward transformation, the acquired photo is scanned and new coordinates are computed for each pixel. However, in order for the transformed photo to be expressed in standard digital format, with a regular array of pixels of the same size, each having a single value in some color space, the transformed photo needs some more processing. Just relocating each pixel (x,y) to its new position (x′,y′) would create a picture with points where multiple pixels overlap (multiple source pixels) and blank points with no pixels (holes).

    [0076] For each source pixel P.sub.(x,y), x and y are integer values, whereas the coordinates x′ and y′ in the processed image are decimal values. These coordinates can be expressed as the sum of an integer part (i,j) and a fractional part (δ.sub.x,δ.sub.y):


    x′=i+δ.sub.x


    y′=j+δ.sub.y

    [0077] First, pixel values P.sub.(x,y) are assigned to the pixels in the pixel array of the processed photo; this is described in relation to FIG. 4. The P.sub.(x,y)'s are the pixel values from the acquired photo placed at their new positions in the destination photo, and the X's are the centers of the pixels (i,j) in the processed photo with values P.sub.(i,j). For every new position calculated, the corresponding pixel value will contribute to the pixel values of the nearby pixels in the processed photo. In a preferred implementation, this is carried out as follows.

    [0078] The new pixel values are determined by, for each pixel in the acquired photo, adding the corresponding pixel value P.sub.(x,y) to the new pixel values P.sub.(i,j) of the four pixels in the corrected photo that are closest to the new position. When added, the pixel values P.sub.(x,y) are weighted by a weighting factor that is a function of the relative positions of the new position (x′,y′) and the pixel (i,j), so that, as illustrated in FIG. 4 for a bilinear interpolation:


    P.sub.(i,j).fwdarw.P.sub.(i,j)+P.sub.(x,y)*(1−δ.sub.x)*(1−δ.sub.y)


    P.sub.(i+1,j).fwdarw.P.sub.(i+1,j)+P.sub.(x,y)*δ.sub.x*(1−δ.sub.y)


    P.sub.(i,j+1).fwdarw.P.sub.(i,j+1)+P.sub.(x,y)*(1−δ.sub.x)*δ.sub.y


    P.sub.(i+1,j+1).fwdarw.P.sub.(i+1,j+1)+P.sub.(x,y)*δ.sub.x*δ.sub.y

    [0079] Thus, weighted values of the original pixel values are accumulated in each pixel in the processed photo. In order to normalize the new pixel values in the processed photo, each pixel value is subsequently divided by a normalization factor.

    [0080] In actual practice, a “Photo accumulated buffer” is created. First it is filled with 0's, and each time a pixel value P.sub.(x,y) contributes to a pixel P.sub.(i,j) in the processed photo, a weighted value of P.sub.(x,y) is summed into this buffer. At the same time, the weighting factor (e.g. (1−δ.sub.x)*(1−δ.sub.y)) is summed into a “Weighting buffer”. Once the forward mapping is done, each pixel of the Photo accumulated buffer is divided by the corresponding value of the Weighting buffer to generate a “Weighted perspective corrected photo”. This solves the problem that, with forward mapping, some pixels in the processed photo get information from multiple pixels in the acquired photo.
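
    A minimal sketch of this accumulation is given below, assuming NumPy arrays and the new positions computed in the earlier sketch; the function name forward_splat and the use of NaN to mark undefined pixels are illustrative choices. The accum and weight arrays play the roles of the “Photo accumulated buffer” and the “Weighting buffer”.

```python
# Illustrative sketch of the bilinear forward mapping of paragraphs [0077]-[0080].
import numpy as np

def forward_splat(image, x_new, y_new):
    """Accumulate each source pixel value into the four pixels of the processed
    photo that surround its new position, then normalize by the summed weights."""
    h, w = image.shape[:2]
    src = image.reshape(h, w, -1).astype(np.float64)
    accum = np.zeros_like(src)                       # "Photo accumulated buffer"
    weight = np.zeros((h, w))                        # "Weighting buffer"

    i = np.floor(x_new).astype(int)                  # integer part of x'
    j = np.floor(y_new).astype(int)                  # integer part of y'
    dx = x_new - i                                   # fractional part delta_x
    dy = y_new - j                                   # fractional part delta_y

    # The four bilinear contributions listed in paragraph [0078]
    for di, dj, wgt in ((0, 0, (1 - dx) * (1 - dy)),
                        (1, 0, dx * (1 - dy)),
                        (0, 1, (1 - dx) * dy),
                        (1, 1, dx * dy)):
        ii, jj = i + di, j + dj
        valid = (ii >= 0) & (ii < w) & (jj >= 0) & (jj < h)
        np.add.at(accum, (jj[valid], ii[valid]), src[valid] * wgt[valid][:, None])
        np.add.at(weight, (jj[valid], ii[valid]), wgt[valid])

    filled = weight > 0
    accum[filled] /= weight[filled][:, None]         # normalize by accumulated weight
    accum[~filled] = np.nan                          # holes: no contribution received
    return accum.reshape(image.shape)
```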

    [0081] In the above, the accumulation of the acquired pixel values in the surrounding pixels uses a bilinear interpolation with weighting factors based on δ.sub.x and δ.sub.y. However, other ways of distributing the pixel values of the acquired photo, and thereby other weighting factors, are possible, and other interpolation methods such as bicubic or spline interpolation could also be used.

    [0082] The weighted perspective corrected photo may contain “holes”: pixels to which no pixel in the acquired photo has contributed and whose value is therefore undefined. To fill the holes, an interpolated pixel value is calculated for each pixel with an undefined value of P.sub.(i,j) from surrounding pixels in the corrected photo having defined values of P.sub.(i,j). In a preferred implementation, the weighted perspective corrected photo is scanned, and each time a hole is found, an interpolated value for this pixel is computed based on Inverse Distance Weighting (IDW) from the valid pixels.

    [0083] More information related to IDW can be found on e.g. http://en.wikipedia.org/wiki/Inverse_distance_weighting.

    [0084] As the size of the holes is unknown, the distance to the closest defined pixel value is unknown. In order to ensure fast processing and avoid unfilled holes, an adaptive kernel size can be used, as illustrated in FIGS. 5A-C: In FIGS. 5A and 5B, a 3×3 kernel is used, and in FIG. 5A, a value for the active pixel, the hole, can be computed with IDW from the 6 surrounding pixels with defined pixel values. In FIG. 5B, however, the surrounding pixels do not offer data for IDW; hence, no defined value for the active pixel/hole is obtained, and the kernel size must be increased, for example to the 5×5 kernel of FIG. 5C. Here, a value for the active pixel/hole can be computed with IDW from the 10 surrounding pixels with defined values.
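
    A minimal sketch of the hole filling with IDW and an adaptive kernel is given below, assuming that holes are marked with NaN as in the previous sketch; the function name fill_holes_idw, the maximum kernel radius, and the IDW power are illustrative assumptions.

```python
# Illustrative sketch of adaptive-kernel IDW hole filling (paragraphs [0082]-[0084]).
import numpy as np

def fill_holes_idw(image, max_radius=7, power=2.0):
    """Replace NaN pixels by an inverse-distance-weighted average of the defined
    pixels inside a kernel that grows until it contains valid data."""
    h, w = image.shape[:2]
    img = image.reshape(h, w, -1).astype(np.float64)
    out = img.copy()
    defined = ~np.isnan(img[..., 0])
    for r, c in np.argwhere(~defined):               # scan for holes
        for radius in range(1, max_radius + 1):      # 3x3, then 5x5, 7x7, ...
            r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
            rows, cols = np.nonzero(defined[r0:r1, c0:c1])
            if rows.size:                            # kernel contains valid pixels
                dist = np.hypot(rows + r0 - r, cols + c0 - c)
                wts = 1.0 / dist ** power            # IDW weights
                out[r, c] = (wts[:, None] * img[rows + r0, cols + c0]).sum(0) / wts.sum()
                break
    return out.reshape(image.shape)
```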

    [0085] Filling holes can also be done by texture mapping if the holes are too large for pixel interpolation. For example, for selfies, the side of the nose could be an area with missing polygons when the distance is increased; in such a case, skin texture mapping can be used.

    [0086] FIG. 6 illustrates the full process of the method for perspective correction.
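
    As a usage illustration only, the hypothetical helpers from the sketches above could be chained as follows to mirror the overall flow of FIG. 6; the loaders load_photo and load_depth_map and the numeric values of C and d_ref are assumptions chosen for illustration.

```python
# Illustrative end-to-end sketch, reusing the hypothetical helpers sketched above.
import numpy as np

photo = load_photo()                         # hypothetical loader: (H, W, 3) array
depth = load_depth_map()                     # hypothetical loader: (H, W) array, same units as C

C = 100.0                                    # move the virtual camera 100 cm away from the scene
d_ref = 50.0                                 # preserve magnification at the 50 cm reference plane
cy, cx = (np.array(depth.shape) - 1) / 2.0   # principal point assumed at the image center

x_new, y_new = compute_new_positions(depth, C, d_ref, cx, cy)
corrected = forward_splat(photo, x_new, y_new)   # accumulate and normalize
corrected = fill_holes_idw(corrected)            # fill remaining holes with IDW
```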

    [0087] As mentioned previously, the depth map itself, a single channel image with all pixel values being depths, can also be corrected using the same transformations and procedures as for the photo of the scene. The displacement C between the actual and virtual camera positions may be added to each depth/pixel value after the transformation. The corrected depth map is associated with the corrected photo, giving the distances of different parts of the scene to the image plane of the virtual camera position.
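
    Under the same assumptions, the depth map could be corrected with the same illustrative helpers, adding the displacement C to the warped depth values afterwards:

```python
# Illustrative sketch of the depth-map correction of paragraph [0087],
# reusing the hypothetical helpers and positions from the pipeline sketch above.
corrected_depth = fill_holes_idw(forward_splat(depth, x_new, y_new))
corrected_depth = corrected_depth + C    # depths now refer to the virtual camera position
```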

    [0088] Selfie Application

    [0089] A preferred application of the present invention relates to perspective correction in selfies. Selfies are by nature photos taken at close distance, for example with a mobile phone, where the maximum distance is typically an arm's length, or with a camera in a laptop or tablet computer or a webcam. These close-distance photos most often exhibit perspective distortions: the face is close to the camera, and the ratio between the distances to the closest part (nose) and the furthest part (ear) is significant.

    [0090] Selfies are therefore typical candidates for automatic perspective correction, and the automatic detection and evaluation of perspective distortion mentioned earlier, used to select photos that would benefit from perspective correction, can be combined with pattern recognition applied to the depth map to detect the depth variation occurring in human faces.

    [0091] Also, in case of selfies, the background (part of scene behind the person shooting the selfie) can be determined using the depth map information. Hence, in a preferred embodiment, the method according to the invention involves detection of one or more foreground objects and a background in the photo by analyzing the depth map to identify regions with very fast change in depth and areas being at least partly outlined by such regions, the areas with smaller average depth being identified as foreground objects and areas with larger average depth being identified as background. In an alternative implementation, the method can involve detection of one or more foreground objects and a background in the photo by analyzing the depth map to identify areas with depths smaller than 300 cm, such as smaller than 200 cm, 150 cm, or 100 cm as foreground objects and areas with larger depth being defined as background.
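
    The alternative, threshold-based detection could, for illustration, look like the following sketch; the threshold value is one of the example values mentioned above, and the function name split_foreground is an assumption.

```python
# Illustrative sketch of threshold-based foreground/background detection
# (paragraph [0091]), assuming depths in centimetres.
import numpy as np

def split_foreground(photo, depth, threshold_cm=100.0):
    """Return a foreground mask and the photo with background pixels zeroed,
    ready to be composited over replacement background content."""
    foreground = depth < threshold_cm                # closer than the threshold
    composited = np.where(foreground[..., None], photo, 0)
    return foreground, composited
```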

    [0092] Having detected the background and foreground parts of the photo, the background can be replaced with other image content (photo, painting, graphics or any combination of such) while keeping the foreground objects.

    [0093] Technical Implementation

    [0094] The invention can be implemented by means of hardware, software, firmware or any combination of these. The invention or some of the features thereof can also be implemented as software running on one or more data processors and/or digital signal processors. FIG. 6 can also be seen as a schematic system-chart representing an outline of the operations of an embodiment of the computer program product according to the second aspect of the invention. The individual elements of hardware implementation of the invention may be physically, functionally and logically implemented in any suitable way such as in a single unit, in a plurality of units or as part of separate functional units. The invention may be implemented in a single unit, or be both physically and functionally distributed between different units and processors.

    [0095] The integrated circuit according to the third aspect of the invention can be a general image signal processor (ISP), a microprocessor, or an ASIC, or part of such. This can be advantageous in particular for the handheld or portable device with a camera according to the fourth aspect of the invention, where low cost, power consumption, weight, volume, heat generation, etc. are of high importance. Handheld devices with a camera comprise digital cameras, cellular phones, tablet computers, MP3 players and others. Portable devices with a camera comprise, for example, laptop computers.

    [0096] Although the present invention has been described in connection with the specified embodiments, it should not be construed as being in any way limited to the presented examples. The scope of the present invention is to be interpreted in the light of the accompanying claim set. In the context of the claims, the terms “comprising” or “comprises” do not exclude other possible elements or steps. Also, the mentioning of references such as “a” or “an” etc. should not be construed as excluding a plurality. The use of reference signs in the claims with respect to elements indicated in the figures shall also not be construed as limiting the scope of the invention. Furthermore, individual features mentioned in different claims may possibly be advantageously combined, and the mentioning of these features in different claims does not exclude that a combination of features is possible and advantageous.