CLASSIFICATION-BASED IMAGE MERGING, TUNING, CORRECTION, AND REPLACEMENT
20220392032 · 2022-12-08
Inventors
- Satya Mallick (San Jose, CA, US)
- Gurismar Singh (Bengaluru, IN)
- Pranav Mishra (Bengaluru, IN)
- Sowjana Konduri (Bengaluru, IN)
- Sunita Nayak (San Diego, CA, US)
- Steve Elich (San Jose, CA, US)
- William Botero (San Jose, CA, US)
- Ryan Ozubko (San Jose, CA, US)
CPC Classification
G06V10/26
PHYSICS
G06T2207/20016
PHYSICS
International Classification
Abstract
Methods for improving and modifying a High Dynamic Range (HDR) scene, captured as a series of images of the scene at different exposure levels, through classification-based image merging, tuning, correction, and replacement. The approach mixes images to improve the selection and display of both shadowed and highlighted details. The increased efficiency resulting from improvements in computational resource utilization of image processing hardware can, through the improved computational methods herein, significantly reduce the time required to generate and display a tone-mapped HDR image, a gamma-corrected HDR image, and/or a segmented and replaced HDR image.
Claims
1. A method of determining gamma correction in an image, comprising: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the second exposure image of the scene has a shorter exposure time than the first exposure image of the scene; computing, by the one or more processors, a pixel value for a pixel location of a high dynamic range (HDR) image to be a sum of a pixel value of the first exposure image weighted by a first exposure weight and a pixel value of the second exposure image weighted by a second exposure weight, to produce a merged HDR image comprising Y bit data; adaptively mapping, by the one or more processors, the HDR image, to produce an output HDR image having Z bit data and a total number of pixels; applying, by the one or more processors, a range of gamma value correction levels to the output HDR image; detecting a number of pixels having a value of black level less than a predefined black level threshold; and selecting a tuned gamma value correction level.
2. The method of claim 1, further comprising: when selecting a tuned gamma value correction level, said number of pixels having a value of black level less than a predefined black level threshold is less than 0.025*(said total number of pixels).
3. A method of correcting detail obscured by brightness glare in an image, comprising: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the second exposure image of the scene has a shorter exposure time than the first exposure image of the scene; computing, by the one or more processors, a refined mask, by performing a conjugation of the at least first exposure image of the scene and the at least second exposure image of the scene and selecting a number of pixels having a value of black level greater than a predefined black level threshold to form an unrefined mask of the scene, quantifying an amount of detail present in at least one portion of the second exposure image of the scene having a brightness level higher than the average brightness level of all pixels in the second exposure image of the scene, by applying a Laplacian to said second exposure image of the scene and applying a median blur denoising operation to form an intermediary laplacian mask, and selecting at least one pixel in at least one region of the intermediary laplacian mask that does not have a zero value; and computing a blended image by applying a gaussian pyramid operation and a laplacian pyramid merging of the at least second exposure image and an exposure fusion image using the refined mask.
4. A method of segmenting an image having sky by computing a pixel mask, comprising: receiving, by one or more processors, at least a first exposure image of a scene; receiving, by the one or more processors, at least a second exposure image of the scene, wherein the at least second exposure image of the scene has a shorter exposure time than the at least first exposure image of the scene; detecting, a number of pixels having a blue hue value greater than a predefined blue hue level threshold, greater than a red hue value for the number of pixels, and greater than a green hue value for the number of pixels; computing, by one or more processors, a detection mask by conjugating at least one mean blue hue mask and at least one threshold blue hue mask; and computing, by one or more processors, at least one group of pixels from the detection mask as sky, by detecting a largest group of pixels having a blue hue value greater than a predefined blue hue level threshold, greater than a red hue value for the number of pixels in the detection mask, and greater than a green hue value for the number of pixels in the detection mask, and designating pixels away from said largest group of pixels as not sky.
5. The method of claim 4, further comprising: computing, by one or more processors, a value mask by averaging the brightness value of the at least one group of pixels in a value channel; computing, by one or more processors, a threshold blue-red mask by detecting at least one pixel having a blue hue value greater than a predefined red hue level threshold; computing, by one or more processors, a threshold blue-green mask by detecting at least one pixel having a blue hue value greater than a predefined green hue level threshold; computing, by one or more processors, a combined detection mask by conjugating the threshold blue-red mask, threshold blue-green mask, and the value mask; computing, by one or more processors, a hue, saturation, and value mask by conjugating a hue threshold, a saturation threshold, and a value threshold of the at least one group of pixels of the at least second exposure image of the scene; computing, by one or more processors, a combined mask by applying a disjunction to the detection mask, combined detection mask, and hue, saturation, and value mask; computing, by one or more processors, a bright mask by conjugating at least a brightest intensity channel of the at least first exposure image of the scene and a brightest intensity channel of the at least second exposure image of the scene and retaining portions exceeding a defined brightness threshold; computing, by one or more processors, a sure sky mask by conjugating the combined mask and the bright mask and subsequently applying a morphological erosion; computing, by one or more processors, a finalized sky segmentation mask by calculating a probable foreground segment mask, calculating a probable background segment mask, iteratively segmenting a conjunction of the sure sky mask, the probable foreground segment mask, and the probable background segment mask to form a grab cut mask, and applying a disjunction to the grab cut mask and sure sky mask.
6. A method of classifying, segmenting, and replacing the sky in an image scene, comprising: classifying at least one group of pixels in an image scene as a sky portion by inputting a digitized image into a convolutional network of artificial neurons pretrained through the repeated convolution and pooling of at least one set of clear sky images and at least one set of sky images at least partially containing cloud cover; computing a pixel mask bounding the sky portion, wherein the pixel mask is calculated from the collection and convolution of at least a first exposure image of a scene and at least a second exposure image of the scene, wherein the at least second exposure image of the scene has a shorter exposure time than the at least first exposure image of the scene; and replacing the sky portion bounded by the pixel mask.
7. The method of claim 6, wherein replacing the sky portion bounded by the pixel mask occurs by applying a segmented interpolation.
8. The method of claim 6, wherein replacing the sky portion bounded by the pixel mask occurs by applying alpha blending to a whole-image replacement.
9. A method of classifying, segmenting, and replacing at least one portion of an image scene, comprising: classifying at least one group of pixels in an image scene as a relevant portion by inputting a digitized image into a convolutional network of artificial neurons pretrained through the repeated convolution and pooling of at least one image set containing at least one training replacement portion; computing a pixel mask bounding the relevant portion and calculated from the collection and convolution of at least a first exposure image of a scene and at least a second exposure image of the scene, wherein the at least second exposure image of the scene has a shorter exposure time than the at least first exposure image of the scene; and replacing the relevant portion bounded by the pixel mask.
10. The method of claim 9, wherein replacing the relevant portion bounded by the pixel mask occurs by applying a segmented interpolation.
11. The method of claim 9, wherein replacing the relevant portion bounded by the pixel mask occurs by applying alpha blending to a whole-image replacement.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The above and other aspects, features and advantages of the invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings wherein:
DETAILED DESCRIPTION
[0038] The present invention, comprising a classification-based image merging, tuning, correction, and replacement method, will now be described. In the following exemplary description numerous specific details are set forth in order to provide a more thorough understanding of embodiments of the invention. It will be apparent, however, to an artisan of ordinary skill that the present invention may be practiced without incorporating all aspects of the specific details described herein. Furthermore, although steps or processes are set forth in an exemplary order to provide an understanding of one or more systems and methods, the exemplary order is not meant to be limiting. One of ordinary skill in the art would recognize that the steps or processes may be performed in a different order, and that one or more steps or processes may be performed simultaneously or in multiple process flows without departing from the spirit or the scope of the invention. In other instances, specific features, quantities, or measurements well known to those of ordinary skill in the art have not been described in detail so as not to obscure the invention. It should be noted that although examples of the invention are set forth herein, the claims, and the full scope of any equivalents, are what define the metes and bounds of the invention.
[0039] For a better understanding of the disclosed embodiment, its operating advantages, and the specified object attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated exemplary disclosed embodiments. The disclosed embodiments are not intended to be limited to the specific forms set forth herein. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but these are intended to cover the application or implementation.
[0040] The terms “first,” “second,” and the like herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another, and the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item.
[0041] As used herein, the term “substantially,” “about,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art. Further, the use of “may” when describing embodiments of the present invention refers to “one or more embodiments of the present invention.” As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively. Also, the term “exemplary” is intended to refer to an example or illustration.
[0042] As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to”, “at least”, “greater than”, “less than”, and the like include the number recited and refer to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth. The phrases “and ranges in between” can include ranges that fall in between the numerical value listed. For example, “1, 2, 3, 10, and ranges in between” can include 1-1, 1-3, 2-10, etc. Similarly, “1, 5, 10, 25, 50, 70, 95, or ranges including and or spanning the aforementioned values” can include 1, 5, 10, 1-5, 1-10, 10-25, 10-95, 1-70, etc.
[0043] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and/or the present specification, and should not be interpreted in an idealized or overly formal sense, unless expressly so defined herein.
[0044] One or more embodiments of the present invention will now be described with reference to
[0047] In the first step of the ghosting detection process 18, the images are thresholded using two thresholds t.sub.d and t.sub.b, for each image 212, 214, and 216 where:
(I>t.sub.b or I<t.sub.d)=>I=0
Where I represents that respective image's intensity or brightness.
[0048] In the second step, for each pixel's intensity in each respective image 212, 214, and 216 check for the following condition:
I.sub.H.sup.i>I.sub.M.sup.i>I.sub.L.sup.i
where i is the pixel location, and I.sub.H, I.sub.M, I.sub.L are the high 216, medium 214, and low 212 exposure images, respectively.
[0049] In the third step, create a binary mask using the equation in step two, setting a mask pixel value to ‘1’ if the condition is ‘false’, and setting a mask pixel value to ‘0’ if the condition is ‘true.’ The white pixels in the resulting binary black/white ghosting mask indicate the pixels in the merged composite image 218 that contain a ghosting region 220.
[0050] In the final step, filter for noise and count the number of non-zero pixels in the merged composite image 218. Finally, threshold this number of non-zero pixels by dividing it by the total number of pixels in the image and testing for ghosting:
if (N.sub.non zero/N.sub.total > threshold) { deghost = true }
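The four-step ghosting test above can be sketched in NumPy as follows. The function name, the threshold defaults, and the 1% area criterion are illustrative assumptions, not values fixed by the specification:

```python
import numpy as np

def detect_ghosting(imgs, t_d=10, t_b=245, area_threshold=0.01):
    """Sketch of the ghosting test: threshold each exposure, check the
    I_H > I_M > I_L ordering per pixel, build a binary mask where that
    ordering fails, and flag ghosting when too many pixels fail."""
    # Step 1: zero out clipped pixels (too bright or too dark)
    low, med, high = [np.where((im > t_b) | (im < t_d), 0, im).astype(np.int32)
                      for im in imgs]
    # Steps 2-3: mask pixel is 1 where the exposure ordering is violated
    ghost_mask = (~((high > med) & (med > low))).astype(np.uint8)
    # Step 4: threshold the non-zero count against the total pixel count
    n_nonzero = int(ghost_mask.sum())
    return ghost_mask, n_nonzero / ghost_mask.size > area_threshold
```

In practice the noise filtering mentioned in the text (e.g. a small median filter on the mask) would precede the final count.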
[0051] Referring again to
[0053] Each of the four alignment fields 316, 318, 320, and 322 is divided into segments, and each segment is matched with a corresponding candidate segment in an alternate image. The overall displacement in alignment is calculated from the sum of L1 distances between the pixels of the reference segment and the search area, expanded by (4,4), in the alternate segment. This generates an aligned image set where, for each segment, there are offset values determining which segment to choose while merging the images 312 and 314.
[0054] After aligning the images 312 and 314, they can be merged first temporally and then spatially. In the initial temporal merging step, each segment is first merged between burst images based on an offset determined by the alignment process described above. Weighting for selection of a segment is determined based on the average distance between pixel values of aligned segments.
[0055] The temporal phase of post-alignment merging follows these equations:
O.sub.t(x,y)=Σ.sub.i=1.sup.n(W.sub.t.sup.i*I.sub.t.sup.i(x,y)/W.sub.s)+I.sub.r(x,y)/W.sub.s
Where,
W.sub.t.sup.i=1/ND.sub.t.sup.i if ND.sub.t.sup.i>290 else 0
ND.sub.t.sup.i=max(1,Σ.sub.x,y=1.sup.16,16(I.sub.t.sup.i(x,y)−I.sub.r(x,y))/256)
W.sub.s=Σ.sub.i=1.sup.3(W.sub.t.sup.i)
Where I.sub.t.sup.i is the underlying intensity value of segment ‘t’ in the i.sup.th exposure image and I.sub.r is the initially-segregated reference image.
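A per-tile sketch of this temporal merge, assuming 16x16 tiles, reading ND as an absolute tile difference, and treating the reference tile as carrying unit weight in the normalizer W_s (an assumption; the text defines W_s only over the alternate-frame weights):

```python
import numpy as np

def merge_tile(ref_tile, alt_tiles, nd_floor=290.0):
    """Weighted temporal merge of one tile across aligned burst frames."""
    ref = ref_tile.astype(np.float64)
    acc, w_s = ref.copy(), 1.0   # reference carries unit weight (assumption)
    for alt in alt_tiles:
        alt = alt.astype(np.float64)
        # ND: normalized absolute tile difference from the reference
        nd = max(1.0, np.abs(alt - ref).sum() / 256.0)
        w = 1.0 / nd if nd > nd_floor else 0.0
        acc += w * alt
        w_s += w
    return acc / w_s
```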
[0056] Following the temporal merging of images, the temporally-merged segments are now merged spatially to create an aligned and deghosted composite HDR image 26 (
[0057] Referring again to
{0: ‘BATHROOM’, 1: ‘BEDROOM’, 2: ‘CLOSET’, 3: ‘COMMON_AREA_ROOM’, 4: ‘DINING’, 5: ‘ENTRY_SHOT_FRONT_DOOR’, 6: ‘FRONT_EXTERIORS’, 7: ‘GARAGE’, 8: ‘HOA_COMMON_AREA_AMENITIES_EXTERIOR’, 9: ‘HOA_COMMON_AREA_AMENITIES_INTERIOR’, 10: ‘KITCHEN’, 11: ‘LAUNDRY’, 12: ‘REAR_EXTERIORS’, 13: ‘SPECIALTY’, 14: ‘STAIRS’, 15: ‘WHITE_BALANCE’}
The applied label is then used to select particular parameters and/or thresholds in later HDR image processing. Again, depending on the applicable context for image review and use, the classification labeling may differ from the above.
[0058] If ghosting was not detected following the application of the process 18 above, the images 12, 14, and 16 can be merged using a standard Mertens algorithm (via Exposure Fusion) 24, resulting in a manipulable HDR image 26. As part of the merge, the Exposure Fusion weights for contrast, saturation, and exposure are determined by the classification 22 of the scene as being either interior or exterior.
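A single-scale sketch of Mertens-style exposure fusion is shown below; the real algorithm blends the weighted images with Laplacian pyramids, and the exponent parameters stand in for the class-dependent contrast, saturation, and exposure weights. All names and constants here are illustrative assumptions:

```python
import numpy as np

def simple_exposure_fusion(imgs, w_con=1.0, w_sat=1.0, w_exp=1.0):
    """Per-pixel weights from contrast, saturation, and well-exposedness,
    then a normalized weighted average (single scale, for brevity)."""
    imgs = [im.astype(np.float64) / 255.0 for im in imgs]
    weights = []
    for im in imgs:
        gray = im.mean(axis=2)
        # contrast: magnitude of a discrete Laplacian
        lap = np.abs(np.gradient(np.gradient(gray, axis=0), axis=0)
                     + np.gradient(np.gradient(gray, axis=1), axis=1))
        sat = im.std(axis=2)                                  # saturation
        expo = np.exp(-((im - 0.5) ** 2 / 0.08).sum(axis=2))  # well-exposedness
        weights.append((lap + 1e-6) ** w_con * (sat + 1e-6) ** w_sat
                       * (expo + 1e-6) ** w_exp)
    w = np.stack(weights)
    w /= w.sum(axis=0, keepdims=True)
    fused = sum(wi[..., None] * im for wi, im in zip(w, imgs))
    return (fused * 255).astype(np.uint8)
```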
[0059] To improve image quality, the merged output HDR image 26 from either the classification 22 and Exposure Fusion merge 24, or the deghosted alignment and merging process 20, is tuned using gamma corrections 28 followed by correcting dark and bright regions 30 in the scene.
[0060] In this embodiment related to residential photography, the tuning algorithm is different for ‘interior’ and ‘exterior’ scenes. In interior scenes, one goal is to not allow black levels below a certain threshold. With an HDR image 26 as a starting point, the HDR image 26 can be adaptively mapped to produce an output HDR image having ‘Z’ bit data and a total number of pixels N.sub.total. A range of gamma values from 0.5 to 2.0 are then applied to alter the image 26, the number of pixels (N.sub.b) with values less than a black level threshold (t.sub.b) is determined, and then the lowest tuned gamma value correction level is selected, for which:
N.sub.b<0.025*N.sub.total
where N.sub.total is the total number of pixels.
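The interior gamma sweep can be sketched as follows, with an assumed 8-bit black-level threshold of 16 (the specification leaves t.sub.b unspecified); the sweep returns the lowest gamma whose black-pixel count satisfies the 2.5% criterion:

```python
import numpy as np

def tune_gamma(img, t_b=16, gammas=np.arange(0.5, 2.01, 0.1)):
    """Apply gammas from 0.5 to 2.0 and keep the lowest one for which
    fewer than 2.5% of pixels fall below the black-level threshold."""
    norm = img.astype(np.float64) / 255.0
    for g in sorted(gammas):
        out = norm ** g * 255.0
        n_b = int((out < t_b).sum())          # pixels below black level
        if n_b < 0.025 * out.size:
            return g, out.astype(np.uint8)
    return None, img   # no gamma satisfied the black-level criterion
```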
[0061] Further, interior scenes often include significant glare in brighter pixel regions, thus leading to an overall degradation of detail within brighter regions of the scene. To correct this, the omitted details are recovered from the lowest exposure image.
M.sub.b=AND(I.sub.b,I.sub.d)>t.sub.b, where I.sub.b is the brightest image and I.sub.d is the darkest image.
[0064] Referring again to
M.sub.i=Denoise(Laplacian(I.sub.d))
[0065] for each region R in M.sub.i: if N.sub.r<threshold, then M.sub.b[R]=0,
[0068] where N.sub.r is the number of non-zero pixels in region R.
[0069] Next is selecting 422 at least one pixel N.sub.r in at least one region R of the intermediary Laplacian mask M.sub.i 616 that does not have a zero value in comparison to the unrefined mask M.sub.b 518, and then keeping only those common regions in the unrefined mask M.sub.b 518 not having a zero value to form a refined mask M.sub.b (Refined) 618.
[0070] The final step in correcting detail obscured by brightness glare in an ‘interior’ image 410 is computing a blended image I.sub.blend 716 (
I.sub.blend=Blend(I.sub.d,I.sub.EF,M.sub.b(Refined))
The end result is a blended image I.sub.blend 716 where the brightest regions of the darker (e.g. under-exposed) second exposure image I.sub.d 514 bearing a greater amount of detail replace the overexposed regions in the simple Exposure Fusion image I.sub.EF 712 that lack detail.
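A compressed sketch of this interior glare correction follows: the bright mask, a Laplacian-based detail test on the dark exposure, and the final substitution. A hard paste stands in for the Gaussian/Laplacian pyramid blend of the text, the median-blur denoising step is omitted, and the thresholds are illustrative:

```python
import numpy as np

def glare_blend(i_bright, i_dark, i_ef, t_b=220, detail_floor=4):
    """Recover glare-obscured detail from the dark exposure I_d."""
    # unrefined mask M_b: bright in both exposures (AND read as both > t_b)
    m_b = np.minimum(i_bright, i_dark) > t_b
    # discrete Laplacian of the dark exposure as a cheap detail measure
    d = i_dark.astype(np.float64)
    lap = np.abs(4 * d
                 - np.roll(d, 1, 0) - np.roll(d, -1, 0)
                 - np.roll(d, 1, 1) - np.roll(d, -1, 1))
    # refined mask: bright regions where the dark exposure still has detail
    m_refined = m_b & (lap > detail_floor)
    out = i_ef.copy()
    out[m_refined] = i_dark[m_refined]   # paste stands in for pyramid blend
    return out, m_refined
```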
[0071] In contrast to interior images, exterior image gamma tuning can be accurately determined based on the image's colorfulness. Colorfulness (C) is generally calculated in the following way:
C=stdRoot+(0.3*meanRoot)
Where,
stdRoot=√(std.sub.B.sup.2+std.sub.YB.sup.2)
meanRoot=√(mean.sub.B.sup.2+mean.sub.YB.sup.2)
YB=absolute(0.5*(R+G)−B)
where R,G,B are the RGB channels of the exterior image, and mean.sub.X and std.sub.X are mean and standard deviations of channel x, respectively.
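The colorfulness measure can be computed exactly as the equations above define it, over the blue channel and the YB opponent channel:

```python
import numpy as np

def colorfulness(img):
    """C = stdRoot + 0.3*meanRoot, with YB = |0.5*(R+G) - B|.
    img is an H x W x 3 array in R, G, B order."""
    r, g, b = [img[..., i].astype(np.float64) for i in range(3)]
    yb = np.abs(0.5 * (r + g) - b)
    std_root = np.sqrt(b.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(b.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root
```

For a uniform gray image YB is zero and both standard deviations vanish, so C reduces to 0.3 times the blue mean.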
[0072] In the context of exterior exposure fusion images I.sub.EF and white-balanced images I.sub.WB, Colorfulness is calculated on the exposure fusion result and the output of auto white balance on exposure fusion.
I.sub.WB=WhiteBalance(I.sub.EF)
C1=Colorfulness(I.sub.EF)
C2=Colorfulness(I.sub.WB)
If (C1<1.03*C2):
I.sub.EF=I.sub.WB
The output is gamma corrected as follows:
[0073] for gamma in [0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2]:
I.sub.gamma=GammaAdjustment(I.sub.EF,gamma)
if (maxMean.sub.gamma>128) and (minMean.sub.gamma>110):
I.sub.EF=I.sub.gamma
Where
maxMean.sub.gamma=max(Mean(R.sub.gamma),Mean(G.sub.gamma),Mean(B.sub.gamma))
minMean.sub.gamma=min(Mean(R.sub.gamma),Mean(G.sub.gamma),Mean(B.sub.gamma))
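A sketch of this exterior gamma sweep follows; each candidate gamma is applied to the original fusion image (the listing above could also be read as compounding the adjustments), and the last gamma satisfying both channel-mean conditions wins:

```python
import numpy as np

def select_gamma(i_ef, gammas=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2)):
    """Keep the result while the brightest channel mean exceeds 128
    and the darkest channel mean exceeds 110."""
    out, chosen = i_ef.astype(np.float64), None
    norm = i_ef.astype(np.float64) / 255.0
    for gamma in gammas:
        cand = 255.0 * norm ** gamma
        means = [cand[..., c].mean() for c in range(3)]
        if max(means) > 128 and min(means) > 110:
            out, chosen = cand, gamma
    return chosen, out.astype(np.uint8)
```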
[0074] With a gamma-corrected (based on Colorfulness) exterior HDR composite image I.sub.gamma, we are then able to correct over-brightened regions of the image that lost detail due to over exposure (as done above for interior images 410). The difference in the context of an exterior image is that gradient values are not used to filter out the over-exposed regions:
M.sub.d=AND(INV(I.sub.d),INV(I.sub.b))
Where M.sub.d is the mask of the darkest regions in the exterior scene. This darkness mask is then utilized in a blend to create a brightness-corrected and gamma-tuned HDR exterior image composite:
I.sub.ext blend=Blend(I.sub.d,I.sub.gamma,M.sub.d)
[0075] After improving the visibility of details apparent in glare-obscured regions of a scene captured with HDR composite images, regions of that scene may need to be selected in their entirety for manipulation, correction, or wholesale replacement. In one embodiment of the present invention, a portion of an exterior image contains such a region, such that segmentation and replacement of that portion can be performed to replace the overcast appearance within the final image with something else, or to manipulate the appearance of the existing region by increasing blue levels or even enhancing contrast. In the present embodiment, the region for manipulation and/or replacement is an overcast sky.
[0078] Ideally in the sky region 816 (
M.sub.BR=(B−R)>th
M.sub.BG=(B−G)>th
M.sub.B=B>th
M.sub.1=AND(OR(M.sub.BR,M.sub.BG),M.sub.B)
A threshold blue hue mask M.sub.2 1010 (
meanB=mean(M.sub.1*B)
M.sub.2=B−meanB<th
M.sub.3=AND(M.sub.2,M.sub.1)
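The three blue-hue masks can be sketched as follows. Here `mean(M_1*B)` is read as the mean blue value over the M_1 pixels, and the numeric thresholds are illustrative stand-ins for the unspecified `th` values:

```python
import numpy as np

def blue_sky_masks(img, th_diff=20, th_b=120, th_mean=30):
    """Heuristic blue-sky detection masks M_1, M_2, M_3.
    img is an H x W x 3 array in R, G, B order."""
    r, g, b = [img[..., i].astype(np.float64) for i in range(3)]
    m_br = (b - r) > th_diff          # blue dominates red
    m_bg = (b - g) > th_diff          # blue dominates green
    m_b = b > th_b                    # absolutely blue enough
    m_1 = (m_br | m_bg) & m_b
    # mean blue over the M_1 pixels (one reading of mean(M_1*B) above)
    mean_b = b[m_1].mean() if m_1.any() else 0.0
    m_2 = (b - mean_b) < th_mean      # near or below the mean blue level
    m_3 = m_2 & m_1                   # detection mask
    return m_1, m_2, m_3
```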
[0079] The detection mask M.sub.3 1012 (
dist.sub.1=L2-norm([μ.sub.B,μ.sub.G,μ.sub.R],[Bmean.sup.r,Gmean.sup.r,Rmean.sup.r])
if dist.sub.1>th:
M.sub.3{r}=0
[0083] In the lowest (darkest) exposure image I.sub.dark 812 (
sky_brightness=mean(V[0:0.2*image_h])
min_brightness=mean(V[0.6*image_h:image_h])
M.sub.V[0:0.4*image_h]=V[0:0.4*image_h]>min(0.7*sky_brightness,t1)
M.sub.V[0.4*image_h:]=V[0.4*image_h:]>min(2*min_brightness,t2)
if sky_brightness<=t3:
M.sub.V=AND(M.sub.V,V<245)
[0084] We then compute thresholded masks for the blue-red 926 and blue-green 928 channels to get masks for blue-red M.sub.BR1 1014 (
M.sub.BR1=B>R+10
M.sub.BG1=B>G+10
[0085] Finally, the combined detection mask M.sub.4 1020 (
M.sub.4=AND(AND(M.sub.BR1,M.sub.BG1),M.sub.V)
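The value mask M.sub.V and the combined detection mask M.sub.4 above can be sketched as follows; t1, t2, and t3 are illustrative stand-ins for the unspecified thresholds:

```python
import numpy as np

def value_and_combined_masks(v, b, g, r, t1=160, t2=60, t3=100):
    """v is the HSV value channel of the darkest exposure; b, g, r are
    its color channels.  Rows index image height top-to-bottom."""
    v, b, g, r = [a.astype(np.float64) for a in (v, b, g, r)]
    h = v.shape[0]
    sky_brightness = v[: int(0.2 * h)].mean()   # mean of top 20% rows
    min_brightness = v[int(0.6 * h):].mean()    # mean of bottom 40% rows
    m_v = np.zeros(v.shape, dtype=bool)
    top = int(0.4 * h)
    m_v[:top] = v[:top] > min(0.7 * sky_brightness, t1)
    m_v[top:] = v[top:] > min(2 * min_brightness, t2)
    if sky_brightness <= t3:
        m_v &= v < 245          # suppress specular highlights in dim skies
    m_br1 = b > r + 10          # thresholded blue-red mask
    m_bg1 = b > g + 10          # thresholded blue-green mask
    m_4 = m_br1 & m_bg1 & m_v   # combined detection mask
    return m_v, m_4
```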
[0086] After completing the two heuristic approaches for detecting and differentiating (i.e. segmenting) the ‘sky’ region 816 (
M.sub.HSV=AND(S<t.sub.1,AND(V>t.sub.2,H<t.sub.3))
[0088] Next, the newly-formed hue, saturation, and value mask M.sub.HSV 1022 (
M.sub.combined=OR(M.sub.3,M.sub.4,M.sub.HSV)
[0089] Next, a bright mask M.sub.bright 1026 (
M.sub.bright=AND(INT.sub.bright,INT.sub.dark)>t.sub.b
[0090] Next, a sure sky mask M.sub.sure 1028 (
M.sub.sure=(M.sub.combined⊖M.sub.bright)
[0091] Next, an intermediary grab-cut mask M.sub.grab_cut 1030 (
M.sub.sure=sure foreground seed
M.sub.combined−M.sub.sure=probable foreground (M.sub.pf)
M.sub.sure_bg=Bottom 40% region=sure background
INV(M.sub.sure_bg+M.sub.combined)=probable background
M.sub.grab_cut=GRAB_CUT(M.sub.sure,M.sub.pf,M.sub.sure_bg,INV(M.sub.sure_bg+M.sub.combined))
[0092] Finally, the newly-created intermediary grab-cut mask M.sub.grab_cut 1030 (
M.sub.final=OR(M.sub.grab_cut,M.sub.sure)
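The final mask assembly above can be sketched as follows. The sure sky mask is read here as the conjunction of the combined and bright masks followed by a morphological erosion (matching the claim language), implemented as a 3x3 binary erosion; GRAB_CUT itself is left as a pluggable callable (e.g. OpenCV's grabCut) and replaced by a no-op in this sketch:

```python
import numpy as np

def final_sky_mask(m_combined, m_bright, grab_cut=None):
    """Compose M_sure, the probable fg/bg seeds, and M_final."""
    sure = m_combined & m_bright
    # 3x3 binary erosion of the sure mask (border treated as outside)
    p = np.pad(sure, 1, constant_values=False)
    eroded = np.ones_like(sure, dtype=bool)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            eroded &= p[dy:dy + sure.shape[0], dx:dx + sure.shape[1]]
    m_pf = m_combined & ~eroded                   # probable foreground
    m_sure_bg = np.zeros_like(sure, dtype=bool)
    m_sure_bg[int(0.6 * sure.shape[0]):] = True   # bottom 40% = sure background
    m_pb = ~(m_sure_bg | m_combined)              # probable background
    gc = grab_cut(eroded, m_pf, m_sure_bg, m_pb) if grab_cut else eroded
    return gc | eroded                            # M_final = OR(grab_cut, sure)
```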
[0093] In this embodiment, the creation of the finalized sky segmentation mask M.sub.final 1032 (
[0094] The segmentation improvement method described in the present embodiment also affords improvement in pretraining convolutional networks of artificial neurons through the repeated convolution and pooling of at least one set of clear sky images and at least one set of sky images at least partially containing cloud cover. In an alternative embodiment, clear-sky and cloud-cover images may be synthetically generated using computer graphics. In a further alternative embodiment, the improvements in pretraining can be applied to features and/or portions of indoor and outdoor scenes other than the sky (e.g. rectangular real estate signboards, ceilings, lawns, pools, carpets, etc.).
[0095] While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.