SPATIALLY ADAPTIVE TONE MAPPING OPERATOR
20180012339 · 2018-01-11
Abstract
A method for spatially-adaptive tone mapping in an image having high dynamic range includes using a computing device to receive an input image from an image sensor comprising a plurality of pixels having pixel locations and determine within the input image a plurality of local size scales, each comprising a neighborhood having substantially constant illumination. The variation in reflectance within each neighborhood is estimated and local contrast within each neighborhood is enhanced. Using the illumination and variation within the contrast-enhanced neighborhoods, the image is remapped to a reduced dynamic range to generate an output image.
Claims
1. A method for spatially-adaptive tone mapping in an image having high dynamic range, the method comprising: receiving in a computer processor input image data corresponding to a plurality of pixels having pixel locations; determining within the input image data a plurality of local size scales, each comprising a neighborhood of pixel locations having substantially constant local illumination; estimating a variation in reflectance within each neighborhood; enhancing local contrast within one or more of the neighborhoods based upon a level of local illumination; combining the illumination and variation within each of the one or more neighborhoods that are contrast-enhanced and any other neighborhoods that are not contrast-enhanced to remap the image with a reduced dynamic range; and generating an output image having reduced dynamic range relative to the input image.
2. The method of claim 1, wherein determining comprises convolving pixels within the input image data with each of a set of nested kernels.
3. The method of claim 2, wherein the set of nested kernels comprises inverted truncated parabolas.
4. The method of claim 1, wherein the image sensor comprises an adaptively-binning image sensor.
5. The method of claim 1, wherein the plurality of local size scales has different shapes and sizes.
6. The method of claim 1, wherein enhancing local contrast comprises boosting contrast in neighborhoods having low- or mid-level illumination values.
7. The method of claim 1, wherein estimating the variation in reflectance includes estimating a local standard deviation.
8. The method of claim 7, further comprising generating a standard deviation at each pixel using the local standard deviation.
9. The method of claim 1, further comprising: extracting color information from the input image data; and prior to generating the output image, restoring color to the reduced dynamic range image using the extracted color information.
10. The method of claim 1, further comprising detecting a noise level within the image data, wherein enhancing local contrast is adaptively reduced in the one or more neighborhoods when the noise level would be amplified above a predetermined level.
11. The method of claim 10, wherein the image data comprises a plurality of video frames, and wherein the noise level is estimated based on frame to frame differences.
12. A method for spatially-adaptive tone mapping in an image having high dynamic range, the method comprising: inputting into a computer processor input image data, the image data comprising a plurality of pixels having pixel locations; convolving the image data with a plurality of kernels to identify a plurality of pixel regions of substantially constant local illumination; estimating a variation in reflectance within each identified pixel region; boosting local contrast within one or more of the pixel regions based upon a level of local illumination; combining the illumination and reflectance variation within each of the one or more pixel regions that are contrast-boosted and any other pixel regions that are not contrast-boosted to remap the image with a reduced dynamic range; and generating an output image having reduced dynamic range relative to the input image.
13. The method of claim 12, wherein the plurality of kernels are nested kernels of different sizes.
14. The method of claim 13, wherein the nested kernels comprise inverted truncated parabolas.
15. The method of claim 12, wherein the image sensor comprises an adaptively-binning image sensor.
16. The method of claim 12, wherein the plurality of kernels has different shapes and sizes.
17. The method of claim 12, wherein estimating the variation in reflectance includes estimating a local standard deviation.
18. The method of claim 17, further comprising generating a standard deviation at each pixel using the local standard deviation.
19. The method of claim 12, further comprising: extracting color information from the input image data; and prior to generating the output image, restoring color to the reduced dynamic range image using the extracted color information.
20. The method of claim 12, further comprising detecting a noise level within the image data, wherein boosting local contrast is adaptively reduced in the one or more pixel regions when the noise level would be amplified above a predetermined level.
21. The method of claim 20, wherein the image data comprises a plurality of video frames and further comprising estimating a noise level based on frame to frame differences.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0027] In an exemplary embodiment, the inventive method includes a first step to establish the local size scale over which illumination is approximately constant and outside of which the illumination changes. This is the local illumination scale, r.sub.m(x,y). The possible values of r.sub.m(x,y) are limited to one of a sequence of logarithmically spaced radii: r.sub.i=f.sup.(i−1)r.sub.0 for i=1 to i=i.sub.max, where f is the scale factor between successive sizes (f is a constant, e.g., 1.4) and r.sub.0 is the smallest radius (r.sub.0 is a constant, e.g., 1.6 pixels). The r.sub.i are the size scales for the nested convolving kernels, similar to the size scales of the Reinhard et al. nested Gaussian kernels. In general, a wide variety of kernel shapes can be used. However, our preference is for kernels with finite support, as our measurements are intended to be local and should not be affected by large changes in the wings of the kernel function. To give a concrete example, we will use here a nest of logarithmically sized, inverted, truncated parabolas as the set of kernels for convolution. The kernels obey:
K.sub.i(x,y)=C.sub.i(r.sub.i.sup.2−x.sup.2−y.sup.2) for x.sup.2+y.sup.2<r.sub.i.sup.2, and K.sub.i(x,y)=0 otherwise, (5)
where x and y are measured from the center of the kernel in pixels and C.sub.i is used to normalize the kernel to unit integral. Once each kernel is convolved with the image, the illumination size scale, r.sub.m(x,y), is determined using a difference of the kernel-weighted, average luminance as in Reinhard et al. and Equations 3 and 4 above. The local illumination at each pixel is the kernel-weighted, average luminance at the scale found (we use the symbol V.sub.m(x,y) for the kernel-weighted average luminance instead of V.sub.1(x,y,s.sub.m(x,y)) as used in Reinhard et al.):
V.sub.m(x,y)=L(x,y)⊗K.sub.m(x,y), (6)
where ⊗ represents the convolution operation, L(x,y) is the re-normalized HDR input image luminance channel, and K.sub.m(x,y) is the kernel associated with r.sub.m(x,y), the illumination size scale found at pixel (x,y). In practice, the entire image is convolved with each possible kernel, producing a set of convolved images that provide input to the local scale selection process. A binary masking operation applied to the set of convolved images is then used to create the local illumination image, I(x,y), such that I(x,y) at each pixel is the value of the function V.sub.m(x,y) from Equation 6.
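The following minimal sketch (assuming NumPy and SciPy; the kernel form, threshold value, and center/surround test are illustrative stand-ins rather than the exact expressions of Equations 3-5) shows how the nested convolutions and the binary-mask combination might be organized:

import numpy as np
from scipy.ndimage import convolve

def parabola_kernel(radius):
    # Inverted, truncated parabola, normalized to unit integral (assumed form).
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.maximum(0.0, radius ** 2 - x ** 2 - y ** 2)
    return k / k.sum()

def local_illumination(L, r0=1.6, f=1.4, n_scales=8, threshold=0.05, eps=1e-6):
    # Returns the local illumination image I(x,y) = V_m(x,y) of Equation 6 and the
    # per-pixel scale index m(x,y) used for the binary masks.
    radii = [r0 * f ** i for i in range(n_scales)]
    V = np.stack([convolve(L, parabola_kernel(r), mode='nearest') for r in radii])
    m = np.full(L.shape, n_scales - 1, dtype=int)   # default: largest scale
    chosen = np.zeros(L.shape, dtype=bool)
    for i in range(n_scales - 1):
        # Normalized center/surround difference; a hypothetical criterion standing
        # in for the test of Equations 3 and 4.
        diff = np.abs(V[i] - V[i + 1]) / (np.abs(V[i]) + eps)
        pick = (~chosen) & (diff > threshold)
        m[pick] = i
        chosen |= pick
    I = np.take_along_axis(V, m[None, ...], axis=0)[0]   # binary-mask combination
    return I, m, radii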
[0028] Note that the local illumination image, I(x,y), described above, does not accurately describe the local illumination because it is not possible to uniquely divide the luminance, L(x,y)=R.sub.f(x,y)I(x,y) into the reflectance, R.sub.f(x,y), and the illumination, I(x,y). But in practice, the I(x,y) image, which is an illumination approximation, is useful for reducing the dynamic range of the image in the TMO part of the method described here.
[0029] The next step is to estimate the variation in reflectances in each of the neighborhoods found in the previous TMO step. This can be done in a variety of manners, but to give a concrete example we will calculate a local standard deviation. The standard deviation image is formed from the residual image:
R(x,y)=L(x,y)−I(x,y), (7)
where R(x,y) is the residual luminance. A local standard deviation is estimated using convolution with the same set of kernel functions as used for the illumination:
φ.sub.i(x,y)=√{square root over (R.sup.2(x,y)⊗K.sub.i(x,y))}, (8)
where φ.sub.i(x,y) is the local standard deviation found by weighting with a kernel of scale size r.sub.i. A binary masking operation, using the same masks and thus the same size scales selected in the local illumination image calculation, is used to generate a standard deviation, σ(x, y), at each pixel. Thus, I(x,y) and σ(x, y) are calculated using exactly the same pixels surrounding the pixel at location (x,y) and the same weights at each surrounding pixel.
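Continuing the same sketch, the residual of Equation 7 and the masked local standard deviation of Equation 8 can be computed by reusing the kernels and scale indices returned by the hypothetical local_illumination helper above:

def local_std(L, I, m, radii):
    # sigma(x,y): kernel-weighted standard deviation of the residual R = L - I,
    # selected at each pixel with the same scale index m as the illumination.
    R = L - I                                                  # Equation 7
    phi = np.stack([np.sqrt(convolve(R ** 2, parabola_kernel(r), mode='nearest'))
                    for r in radii])                           # phi_i(x,y), Equation 8
    sigma = np.take_along_axis(phi, m[None, ...], axis=0)[0]   # same binary masks
    return R, sigma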
[0030] The tone mapping global dynamic range reduction and local contrast enhancement are done using a single formula once the size scales and the quantities I(x,y), R(x,y), and σ(x, y) have been calculated. These calculations can be done without access to the entire image because the truncated, inverted parabolas used as kernels have finite support (e.g., in the examples run using this code, the largest matrix used for the largest size kernel is 17×17, so a full calculation can begin when the first 17 lines of an image are available to the processor). While many different TMO/contrast enhancement mappings can be performed with the quantities collected in the image analysis described above, one potential formula for the TMO/contrast enhancement operator (TMCEO) that includes some constants to vary the amount of contrast enhancement at low and mid-illumination levels is given by:
or, after some algebra, equivalently:
where L.sub.d(x,y) is the reduced dynamic range linear image, and α, β, and γ are the parameters that control the contrast boost. Negative values produced by Equation 9 or 10 are assigned a value of zero.
[0031] The reduced dynamic range image, L.sub.d(x,y), can have some “outlier” pixels with very high relative luminance values using this TMO. The highest values in the histogram of luminance values are remapped to lower values after the entire image has been processed and a histogram of luminance values has been calculated. The re-mapping involves selecting a fraction of the pixel values (e.g., 0.01—the brightest 1% of pixels) and compressing those brightest luminance values to a reduced range (e.g., to 0.9 to 1.0 where 1.0 is the maximum luminance in the reduced dynamic range image).
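A sketch of this highlight compression, using the example numbers from the text (the brightest 1% of pixels remapped into 0.9 to 1.0); the linear remapping and the rescaling of the remaining pixels so that the two ranges meet are assumptions made for illustration:

def compress_outliers(Ld, top_fraction=0.01, band_lo=0.9, band_hi=1.0):
    Ld = Ld / Ld.max()                               # normalize so the maximum is 1.0
    knee = np.quantile(Ld, 1.0 - top_fraction)       # luminance where the brightest 1% begins
    out = np.empty_like(Ld)
    bright = Ld >= knee
    # Map [knee, 1.0] linearly onto [band_lo, band_hi] for the outlier pixels ...
    out[bright] = band_lo + (Ld[bright] - knee) * (band_hi - band_lo) / max(1.0 - knee, 1e-12)
    # ... and rescale the remaining pixels so the histogram meets the band at band_lo.
    out[~bright] = Ld[~bright] * band_lo / max(knee, 1e-12)
    return out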
[0032] Color information can be restored to the image using linear color ratios from the original HDR image, although other methods could also be used. The resulting linear tri-color representation, R.sub.d(x,y), G.sub.d(x,y), B.sub.d(x,y), can then be converted to a non-linear color space, sRGB or Y′CbCr, for compression, transmission and display. A bit-depth of eight bits in the sRGB or Y′CbCr space is sufficient because the TMO has mapped the HDR image into a much lower dynamic range image. Eight bits is a common bit depth for compression and transmission although other bit depths could be considered. In addition, most color monitors, both CRT and LCD flat panel, have a dynamic range that can be completely represented in eight-bit sRGB or eight-bit Y′CbCr.
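A sketch of the color-restoration step using linear color ratios from the original HDR image; the optional saturation exponent s is a common variant added here as an assumption, not a requirement of the text:

def restore_color(hdr_rgb, L, Ld, s=1.0, eps=1e-12):
    # hdr_rgb: (H, W, 3) linear HDR image; L: its luminance; Ld: tone-mapped luminance.
    ratio = (hdr_rgb / np.maximum(L, eps)[..., None]) ** s    # per-channel linear color ratio
    return np.clip(ratio * Ld[..., None], 0.0, 1.0)           # linear R_d, G_d, B_d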
EXAMPLE 1
TMCEO Processing of Architectural Image
[0033] The performance of the inventive TMCEO approach can be judged by applying the sample algorithm of Equations (9) and (10) to several HDR images. A common set of test images used to test TMOs is the Debevec Library [6] of HDR images. The test images were all processed with two sets of values for the parameters that control the sample TMCEO of Equations (9) and (10): set A with α=0.36, α=0.03, β=1.0, γ=0.03 (
[0034] Comparison of
EXAMPLE 2
TMCEO Processing of Landscape Image
[0035]
[0036] The Vine Sunset HDR image from the Debevec Library is a high key image (dominated by bright regions). The Reinhard et al. TMO produces little visibility of details in the dark regions of high key images in general. By using the TMCEO described here, with the contrast enhancement turned on, the details in the low light regions of the image are more apparent, as can be seen in
EXAMPLE 3
TMCEO Processing of Architectural Image
[0037]
EXAMPLE 4
TMCEO Processing of Architectural Image
[0038]
EXAMPLE 5
TMCEO Processing of Architectural Image
[0039]
EXAMPLE 6
TMCEO Processing of Architectural Image
[0040]
EXAMPLE 7
TMCEO Processing of Landscape Image
[0041]
EXAMPLE 8
TMCEO Processing of Landscape Image
[0042]
[0043] Using the inventive method, the incorporation of a contrast boost operator into a TMO designed to reduce the dynamic range of an HDR image is efficient for a number of reasons. The TMO works as a contrast reduction agent over global size scales while preserving contrast over regions that show approximately uniform illumination. A simple, separate pre- or post-processing stage of contrast boost would also boost the contrast between regions that are at different illumination levels; however, this would negate some of the benefit from the TMO dynamic range reduction. So, any contrast boost operation should take into account the dynamic range reduction process of the TMO. An easy way to do that is to use the same regions for illumination and contrast boost.
[0044] The kernels used in our example are inverted, truncated parabolas. These work well in practice. Other kernels, even non-symmetric kernels, could be used to convolve the input image in order to find the regions of constant illumination. The use of pixel-centered regions of various sizes and shapes makes the code spatially adaptive, allowing large illumination differences to be mapped down to a lower dynamic range while avoiding most artifacts. If a contrast boost operation is needed, the large contrast differences that are mapped down by the dynamic range reduction in the TMO are not re-boosted by the contrast boost operator, provided the TMO is combined with the contrast enhancement function as described above in our sample TMCEO. The amount of contrast boost can be controlled by simple adjustment of a few parameters. For example, different amounts of contrast boost can be set for video intended for surveillance versus images intended for a movie.
[0045] The contrast boost parameters can be functions of location: α(x,y), β(x, y), and γ(x,y), or functions of the local illumination estimate: α(I(x,y)), β(I(x,y)), and γ(I(x,y)). Applications that require fine-tuning of the contrast boost could set these parameters manually or through further automation written into the image-processing pipeline. One possible application of a local, illumination dependent contrast enhancement is a contrast enhancement that reduces automatically when the noise level in part of an image is high enough that noise would be amplified above some set level. The noise could be estimated from frame to frame differences in video after the frames are correlated to prevent triggering from motion in the scene. Or the noise could be estimated from the known WDR camera properties, the internal gain setting of the camera, and the local illumination level. Such adaptive local contrast enhancement could easily fit into the scheme proposed here that integrates TMO global contrast reduction with local contrast enhancement.
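One way such an adaptive roll-off might be sketched (the cap rule and the aligned-frame noise estimate below are illustrative assumptions, not the patented formula; parabola_kernel is the hypothetical helper from the earlier sketch):

def adaptive_gamma(gamma0, noise_sigma, max_amplified_noise):
    # Cap the contrast-boost parameter wherever gamma0 * noise_sigma would push
    # amplified noise above the allowed level.
    return np.minimum(gamma0, max_amplified_noise / np.maximum(noise_sigma, 1e-12))

def frame_difference_noise(prev_frame, frame, kernel_radius=8.0):
    # Rough per-pixel noise estimate from the difference of two aligned (correlated)
    # video frames; static scene content cancels, leaving mostly noise.
    d = (frame - prev_frame) / np.sqrt(2.0)
    return np.sqrt(convolve(d ** 2, parabola_kernel(kernel_radius), mode='nearest'))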
[0046] Embodiments of the methods described herein may be implemented in hardware, software, firmware, or any combination thereof. If completely or partially implemented in software, the software may be executed in one or more processors, such as a microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or digital signal processor (DSP). The software instructions may be initially stored in a computer-readable medium and loaded and executed in the processor. In some cases, the software instructions may also be sold in a computer program product, which includes the computer-readable medium and packaging materials for the computer-readable medium. In some cases, the software instructions may be distributed via removable computer readable media, via a transmission path from computer readable media on another digital system, etc. Examples of computer-readable media include non-writable storage media such as read-only memory devices, writable storage media such as disks and flash memory, or a combination thereof.
[0047]
[0048] The computer system may operate in a networked environment using logical connections to one or more remote computers, such as a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node. Network connections (wired or wireless) may be used to transmit the saved images (raw and processed) for remote storage and/or analysis. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, the Internet, and other distributed network environments.
REFERENCES (INCORPORATED HEREIN BY REFERENCE)
[0049] 1. Land, E. H., "Retinex," American Scientist, 52 (2), pp. 247-253, 255-264 (1964).
[0050] 2. Land, E. H., and McCann, J. J., "Lightness and the Retinex Theory," J. Opt. Soc. Amer., 61 (1), pp. 1-11 (1971).
[0051] 3. Reinhard, E., Stark, M., Shirley, P., and Ferwerda, J., "Photographic Tone Reproduction for Digital Images," in SIGGRAPH 02 Conference Proceedings, pp. 267-276, 2002.
[0052] 4. Tumblin, J., and Turk, G., "LCIS: A Boundary Hierarchy for Detail-Preserving Contrast Reduction," in SIGGRAPH 99 Conference Proceedings, pp. 83-90, 1999.
[0053] 5. Fattal, R., Lischinski, D., and Werman, M., "Gradient Domain High Dynamic Range Compression," Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pp. 249-256, 2002.
[0054] 6. Debevec, P. E., and Malik, J., "Recovering High Dynamic Range Radiance Maps from Photographs," in Proc. ACM SIGGRAPH 97, T. Whitted, Ed., pp. 369-378, 1997.
[0055] 7. Hasan, F., "Real-Time Embedded Algorithms for Local Tone Mapping of High Dynamic Range Images," University of Akron PhD Thesis, December 2007.
[0056] 8. Biswas, K. K., and Pattanaik, S., "A Simple Spatial Tone Mapping Operator for High Dynamic Range Images," in Thirteenth Color Imaging Conference: Color Science and Engineering Systems, Technologies, and Applications, Scottsdale, Ariz., November 2005, pp. 291-296.