IMAGE PROCESSING NOISE REDUCTION
20220309622 · 2022-09-29
Inventors
- Willie C. Kiser (Albuquerque, NM, US)
- Michael D. Tocci (Albuquerque, NM, US)
- Nora Tocci (Albuquerque, NM, US)
CPC classification
- G06T2207/20182 (PHYSICS)
International classification
- H04N25/60 (ELECTRICITY)
Abstract
Noise reduction in images is provided by performing a noise reduction step on blocks of pixels within a video-processing pipeline. The noise reduction step consists of applying a discrete cosine transform (DCT) to the block of pixels, quantizing the resulting DCT coefficients, and applying the inverse DCT to the quantized coefficients. The output of the noise reduction step is a block of image pixels similar to the input pixels, but with significantly less image noise. Because the noise reduction step can be performed quickly on small blocks of pixels, the noise reduction can be performed in real time in a video-processing pipeline.
Claims
1. A method of image noise reduction, the method comprising: obtaining image data from at least one image sensor; performing a discrete cosine transform (DCT) on the image data to obtain DCT values; quantizing the DCT values; and conducting an inverse quantization and an inverse DCT, thereby to produce noise-reduced image data.
2. The method of claim 1, further comprising processing said noise-reduced image data through a video processing pipeline.
3. The method of claim 2, wherein the video processing pipeline is a high dynamic range pipeline.
4. The method of claim 1, further comprising exposing said image data to a low-pass filter.
5. The method of claim 2, wherein the video processing pipeline comprises streaming pixel values from each of a plurality of image sensors in a frame independent manner through a kernel operation that identifies saturated pixel values and a merge module to merge the pixel values to produce a high-dynamic range (HDR) image.
6. The method of claim 5, wherein each of the sensors includes a Bayer filter.
7. The method of claim 5, wherein the pipeline performs the following steps in the recited order: a synchronization operation, the kernel operation, a tone-mapping operation, and the noise reduction step.
8. The method of claim 3, further comprising transforming the data from the at least one image sensor from an RGB color space into a YCbCr color space and performing the noise reduction step in the YCbCr color space.
9. The method of claim 5, wherein the plurality of image sensors are each positioned with respect to at least one beamsplitter and a lens of a video camera such that incoming light is split onto the plurality of image sensors so that each image sensor senses an image that is identical but for light level.
10. A method for removing noise from a real-time stream of digital image data, the method comprising: receiving data from at least one image sensor; applying a noise reduction pipeline to remove noise from the data, wherein the pipeline includes: downsampling at least a portion of the data in order to remove a first portion of the noise, frequency domain processing the data to remove a second portion of the noise, the processing including: converting the data into the frequency domain, wherein the data are expressed as a combination of distinct frequencies, identifying a portion of said distinct frequencies as being associated with a second portion of the noise, removing the identified portion of said distinct frequencies, converting remaining image data to yield noise-reduced image data; and providing the data for displaying, broadcasting, or storing as a digital image.
11. The method of claim 10 wherein the data are transformed from a first color space into a second color space.
12. The method of claim 11, wherein the first color space is an RGB color space and the second color space is a YCbCr color space.
13. The method of claim 10, further comprising exposing at least a portion of said data to a low-pass filter.
14. The method of claim 13, wherein the low-pass filter is inserted into the noise reduction pipeline at any point to filter image data represented in the frequency domain, the low-pass filter configured to remove a third portion of the noise.
15. The method of claim 14, wherein the low-pass filter is selected from a plurality of low-pass filters and the selected low-pass filter includes filtering parameters configured to filter noise that is specific to a particular digital image acquisition device and environmental conditions at the time of image acquisition.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0031] In preferred embodiments, the noise reduction step is performed on an N×N block of pixels for 2<N<16, such as 8×8. Quantizing the DCT values may be done by dividing elements of the DCT values by corresponding elements in a quantization matrix and rounding the result.
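The block-based noise reduction step described above can be sketched as follows. This is a minimal Python illustration, not the claimed implementation: it assumes 8-bit pixel values, N=8, and the standard JPEG luminance quantization table (one permissible choice noted later in the description); the function name `denoise_block` is illustrative.

```python
import numpy as np

N = 8

# Orthonormal DCT-II basis matrix: row 0 is the DC (constant) basis,
# and higher rows are increasing cosine frequencies.
C = np.zeros((N, N))
C[0, :] = 1.0 / np.sqrt(N)
for i in range(1, N):
    for j in range(N):
        C[i, j] = np.sqrt(2.0 / N) * np.cos((2 * j + 1) * i * np.pi / (2 * N))

# Standard JPEG luminance quantization table (an assumed choice).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def denoise_block(block):
    """DCT -> quantize -> dequantize -> inverse DCT on one 8x8 block."""
    M = block.astype(float) - 128.0   # center 0..255 pixel values on 0
    D = C @ M @ C.T                   # forward 2-D DCT
    Dq = np.round(D / Q)              # quantize: small high-frequency
                                      # coefficients collapse to zero
    I = C.T @ (Dq * Q) @ C            # dequantize + inverse 2-D DCT
    return np.clip(np.round(I) + 128.0, 0, 255).astype(np.uint8)
```

A flat block passes through unchanged, while low-amplitude high-frequency noise is removed because its DCT coefficients round to zero during quantization.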
[0032] In certain embodiments, the noise reduction step 113 is implemented by a high-dynamic range (HDR) video camera. In some embodiments of an HDR camera, pixel values 501 are streamed through a pipeline on a processing device 219 in real time. Real-time means that HDR video from the camera may be displayed essentially simultaneously as the camera captures the scene (e.g., with a latency no greater than about one frame). There is no requirement for post-processing the image data and no requirement to capture, store, compare, or process entire “frames” of images. The described method 101 and its noise reduction step 113 are applicable to pipeline processing for real-time HDR video.
[0034] The kernel operation 413 operates on pixel values 501 as they stream from each of the plurality of image sensors 265 by examining, for a given pixel on the HE sensor 213, values from a neighborhood 601 of pixels surrounding the given pixel, finding saturated values in the neighborhood 601 of pixels, and using information from a corresponding neighborhood 601 on the ME sensor 211 to estimate a value for the given pixel.
[0035] Various components of the apparatus 201 may be connected via a printed circuit board 205. The apparatus 201 may also include memory 221 and optionally a processor 227 (such as a general-purpose processor like an ARM microcontroller). Apparatus 201 may further include or be connected to one or more of an input-output device 239 or a display 267. Memory can include RAM or ROM and preferably includes at least one tangible, non-transitory medium. A processor may be any suitable processor known in the art, such as the processor sold under the trademark XEON E7 by Intel (Santa Clara, Calif.) or the processor sold under the trademark OPTERON 6200 by AMD (Sunnyvale, Calif.). Input/output devices according to the invention may include a video display unit (e.g., a liquid crystal display or LED display), keys, buttons, a signal generation device (e.g., a speaker, chime, or light), a touchscreen, an accelerometer, a microphone, a cellular radio frequency antenna, port for a memory card, and a network interface device, which can be, for example, a network interface card (NIC), Wi-Fi card, or cellular modem. The apparatus 201 may include or be connected to a storage device 241. The plurality of sensors are preferably provided in an arrangement that allows multiple sensors 265 to simultaneously receive images that are identical except for light level.
[0036] The method 101 and its noise reduction step 113 are applicable to a variety of image-processing applications and cameras. For example, the noise reduction 113 may be provided within a consumer device such as a smartphone or a digital camera. The noise reduction step 113 may be performed by a processor in a mobile personal device in which the at least one sensor is part of a digital camera on the mobile personal device. Optionally, providing the image data for storing as the digital image can include converting the image data into a JPEG and storing the JPEG on a computer-readable storage medium.
[0038] As shown in
[0039] Of the 8% of the total light that is reflected upwards, 94% (or 7.52% of the total light) is transmitted through the second beamsplitter 319 and focused onto the medium-exposure (ME) sensor 211. The other 6% of this upward-reflected light (or 0.48% of the total light) is reflected back down by the second beamsplitter 319 toward the first beamsplitter 301 (which is again at 45°), through which 92% (or 0.44% of the total light) is transmitted and focused onto the low-exposure (LE) sensor 261.
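The light-budget arithmetic in this paragraph can be checked directly (a simple verification of the stated fractions, not part of the described apparatus):

```python
# Fractions of total incoming light at each leg of the beamsplitter path.
up = 0.08          # 8% of total light reflected upward by the first beamsplitter
me = 0.94 * up     # 94% of that transmits the second beamsplitter -> ME sensor
down = 0.06 * up   # 6% reflected back down toward the first beamsplitter
le = 0.92 * down   # 92% of that transmits the first beamsplitter -> LE sensor

print(f"ME: {me:.2%}, LE: {le:.2%}")  # ME: 7.52%, LE: 0.44%
```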
[0040] In preferred embodiments, pixel values stream from the HE sensor 213, the ME sensor 211, and the LE sensor 261 in sequences directly to the processing device 219. Those sequences may not be synchronized as they arrive at the processing device 219.
[0044] The bottom portion of
[0045] Streaming the pixel values 501 through the kernel operation 413 includes examining values from a neighborhood 601 of pixels surrounding a first pixel 615 on the HE sensor 213, finding saturated values in the neighborhood 601 of pixels, and using information from a corresponding neighborhood 613 from the ME sensor 211 to estimate a value for the first pixel 615. The processing device makes comparisons between corresponding pixel values from different sensors. It may be useful to stream the pixel values through the kernel operation in a fashion that places the pixel under consideration 615 adjacent to each pixel from the neighborhood 601 as well as adjacent to each pixel from the corresponding neighborhood on another sensor. For merging 139, two registered LDR images (one high-exposure image IHE and a second medium-exposure image IME) are to be merged 139 into an HDR image IHDR. The merging 139 starts with the information in the high-exposure image IHE and then combines in data from the next darker-exposure image IME, as needed. To reduce the transition artifacts described earlier, the apparatus 201 works on each pixel location (x, y) by looking at the information from the surrounding (2k+1)×(2k+1) pixel neighborhood 601, denoted as N(x,y).
[0046] In some embodiments as illustrated in
[0047] Case 1: The pixel 615 is not saturated and the neighborhood 601 has no saturated pixels, so the pixel value is used as-is.
[0048] Case 2: The pixel 615 is not saturated, but the neighborhood 601 has 1 or more saturated pixels, so blend between the pixel value at IHE(x, y) and the one at the next darker-exposure IME(x, y) depending on the amount of saturation present in the neighborhood.
[0049] Case 3: The pixel 615 is saturated but the neighborhood 601 has 1 or more non-saturated pixels, which can be used to better estimate a value for IHE(x,y): calculate the ratios of pixel values in the ME image between the unsaturated pixels in the neighborhood and the center pixel, and use this map of ME ratios to estimate the actual value of the saturated pixel under consideration.
[0050] Case 4: The pixel 615 is saturated and all pixels in the neighborhood 601 are saturated, so there is no valid information from the high-exposure image, use the ME image and set IHDR(x, y)=IME(x, y).
[0051] When there are three LDR images, the process above is simply repeated in a second iteration, substituting IHDR for IHE and ILE for IME. In this manner, data is merged 139 from the higher exposures while working toward the lowest exposure, and data is only used from lower exposures when the higher-exposure data is at or near saturation.
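The four-case merge logic above can be sketched for a single pixel location as follows. This is illustrative only: the saturation threshold, the blend weight, and the ratio estimate are assumptions, not the patented implementation, and the helper name `merge_pixel` is hypothetical.

```python
import numpy as np

def merge_pixel(i_he, i_me, x, y, k=1, sat=250):
    """Merge one pixel from registered HE/ME images using the
    (2k+1)x(2k+1) neighborhood N(x, y) around it."""
    nb = np.asarray(i_he[x - k:x + k + 1, y - k:y + k + 1], dtype=float)
    center = float(i_he[x, y])
    n_sat = np.count_nonzero(nb >= sat)

    if center < sat and n_sat == 0:
        return center                               # case 1: use HE as-is
    if center < sat:
        alpha = n_sat / nb.size                     # case 2: blend HE and ME
        return (1 - alpha) * center + alpha * float(i_me[x, y])

    me_nb = np.asarray(i_me[x - k:x + k + 1, y - k:y + k + 1], dtype=float)
    unsat = nb < sat
    if np.any(unsat):                               # case 3: estimate the
        # saturated HE value from unsaturated neighbors scaled by an
        # ME-derived ratio (a simplified stand-in for the ratio map)
        ratio = float(i_me[x, y]) / me_nb[unsat].mean()
        return nb[unsat].mean() * ratio
    return float(i_me[x, y])                        # case 4: fall back to ME
```

For a three-sensor merge, the same routine would be applied a second time with the intermediate HDR result and the LE image.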
[0052] This produces an HDR image that can be demosaiced and converted from pixel values to irradiance. This refers to transforming the (e.g., RGB) pixels into a YCbCr color space. YCbCr may be found in the literature variously written as YCbCr, Y′CbCr, Y Pb/Cb Pr/Cr, or Y′CBCR. In a YCbCr color space, Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance: the prime indicates that light intensity is nonlinearly encoded based on gamma-corrected RGB primaries. Y′CbCr color spaces are defined by a mathematical coordinate transformation from an associated RGB color space. If the underlying RGB color space is absolute, the Y′CbCr color space is an absolute color space as well; conversely, if the RGB space is ill-defined, so is Y′CbCr.
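For reference, one common RGB-to-YCbCr coordinate transformation is the full-range BT.601 variant sketched below; the description does not mandate any particular coefficient set, so these values are an assumption.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr (an assumed, common coefficient set)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luma
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference chroma
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference chroma
    return y, cb, cr
```

Neutral grays map to Cb = Cr = 128, which is why chroma processing can be performed independently of luma.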
[0053] The final HDR full-color image may then be tone mapped (e.g., with commercial software packages such as FDRTools, HDR Expose, Photomatix, etc.). The noise reduction method 101 may be performed at any suitable step. In some embodiments, the noise reduction is performed on the converted irradiance values.
[0054] The apparatus 201 may be implemented using three Silicon Imaging SI-1920HD high-end cinema CMOS sensors mounted in a camera body. Those sensors have 1920×1080 pixels (5 microns square) with a standard Bayer color filter array, and can measure a dynamic range of around 10 stops (excluding noise). The sensors are aligned by aiming the camera at small pinhole light sources, locking down the HE sensor and then adjusting setscrews to align the ME and LE sensors.
[0055] The camera body may include a Hasselblad lens mount to allow the use of high-performance, interchangeable commercial lenses. For beamsplitters, the apparatus may include uncoated pellicle beamsplitters, such as the ones sold by Edmund Optics [part number NT39-482]. The apparatus 201 may perform the steps of the method 101. Preferably, the multiple image sensors include at least a high exposure (HE) sensor 213 and a middle exposure (ME) sensor 211, and the merging includes using HE pixel values 501 that are not saturated and ME pixel values 501 corresponding to the saturated pixel values. The multiple sensors may further include a low exposure (LE) sensor 261, and the method 101 may include identifying saturated pixel values 501 originating from both the HE sensor 213 and the ME sensor 211. Because the pixel values stream through a pipeline, it is possible that at least some of the saturated pixel values 501 are identified before receiving values from all pixels of the multiple image sensors at the processing device 219 and the method 101 may include beginning to merge 139 portions of the sequences while still streaming 129 later-arriving pixel values 501 through the kernel operation 413.
[0057] The pipeline includes color processing, tone-mapping, and noise reduction 101, which includes a noise reduction step 113. The noise reduction step 113 may be performed in any suitable color space. For example, in some embodiments, the method includes transforming the data from the at least one image sensor from an RGB color space into a YCbCr color space and performing the noise reduction step in the YCbCr color space.
[0058] Digital image capturing devices such as those embodied by
[0059] The noise reduction step 113 may be implemented within a pixel-processing pipeline such as the pipeline 231 in the processing device 219. A pipeline can be implemented on a discrete image processing circuit or as an integrated circuit. In a discrete circuit design, the pipeline can be connected to existing circuitry of a digital image device. Pixels are streamed into the pipeline in N×N blocks, preferably 8×8. Thus, the incoming image data (obtained from an image sensor) is initially an 8×8 block of pixel values. Given pixel values that range from 0 to 255, those incoming N×N pixel values can be centered on 0 by subtracting 128 from each to give a matrix M. The matrix M is the starting image data that will be cleaned by the noise reduction step 113, and the output of that step will be an N×N matrix I of similar, but de-noised pixel values. A discrete cosine transform is applied to the matrix M to transform it into a matrix D of DCT coefficients.
[0063] In equation 809, the rounding operation truncates values after the decimal point and, in a simple version, simply rounds to integers. E.g., if Dij/Qij is, say, 10.567, then round(Dij/Qij) is 10. The quantized coefficients are then subject to an inverse of the DCT as per, for example, equation 813 to give de-noised image data in the matrix I corresponding to a cleaned-up version of the original N×N block. Some of the input equations are explained in Ken Cabeen and Peter Gent, Image Compression and the Discrete Cosine Transform, Math 45, College of the Redwoods (11 pages), incorporated by reference. It is important to note that the noise reduction step 113 consists of operating a processing device 219 such as an FPGA or ASIC on an N×N matrix of pixel values M to output an N×N matrix of pixel values I. The noise reduction step 113 can be performed for successive N×N blocks from an input image, and the output is a de-noised version of the input image. Equations 807, 809, and 813 illustrate how to obtain I, which in preferred embodiments is an 8×8 block of de-noised image pixels. No compression or encoding need occur within the noise reduction step 113.
[0064] In embodiments of the disclosure, a pipeline receives each 8×8 block of pixel values in an RGB color space. The pipeline performs a color transformation on each block to produce an 8×8 image block that is in the YCbCr color space. The YCbCr color space allows the pipeline to separate the luma (or luminance) domain from the chroma (or color) domain of the digital image. The pipeline then includes distinct channels to remove noise contained in each of the luma and chroma domains of the digital image. Advantageously, this bifurcated process enables noise reduction such that noise unique to each domain of the image can be specifically targeted and removed.
[0065] For example, the human visual system is more sensitive to luminance than it is to color. In other words, the human eye is better able to perceive discrete changes in the luminance domain than it is able to in the color domain. Accordingly, the pipeline can enable an aggressive noise filtering strategy in the color domain (e.g., one that may not only remove noise but also actual color information (i.e., non-noise data)) such that the human visual system is not able to perceive the loss in actual color information. In the current example, color information in the digital image is downsampled. In particular, the color information is sampled in either one or both of the rows and columns of the 8×8 block such that the color information in one pixel is used to represent the color information in a predetermined number of the pixel's neighbors. This predetermined number is set by the ‘sample rate’. Thus, the sampling can remove unwanted noise while keeping actual color information. The 8×8 block is essentially color smoothed by enabling the color information in the one pixel to represent the color information in a predetermined number of its neighbors. The sampling rate can be dynamically adjusted per 8×8 block of pixel values as a function of the location of the block in the overall image. For instance, chroma noise becomes more apparent in the very dark or very light areas of digital images and, as such, the sampling rate can be increased in the image blocks corresponding to those areas. Although not illustrated, some embodiments may also downsample the luminance domain of an image.
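The chroma "sample rate" idea above can be sketched as a simple block-replication downsample. The helper below is hypothetical: the actual sampling pattern and its dynamic adjustment per block are implementation choices.

```python
import numpy as np

def downsample_chroma(block, rate=2):
    """Color-smooth a chroma block: one pixel's chroma stands in for a
    `rate` x `rate` group of its neighbors (the 'sample rate')."""
    out = np.array(block)  # copy; the input block is left unchanged
    for i in range(0, out.shape[0], rate):
        for j in range(0, out.shape[1], rate):
            out[i:i + rate, j:j + rate] = block[i, j]
    return out
```

Raising `rate` smooths color more aggressively, which per the description may be appropriate in very dark or very light regions where chroma noise is most visible.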
[0066] As is known, any image can be represented as a combination of a particular set of frequencies. Because noise is generally associated with quick changes in luminance or color over a short space, high-frequency portions of the image are generally associated with noise. As such, those high-frequency components that are generally associated with noise can be removed from the image.
[0067] To that end, the pipeline first transforms each 8×8 block of pixel values as represented in the spatial domain (i.e., each pixel value corresponds to a particular location in an image) into the frequency domain. In particular, a discrete cosine transform (DCT) (e.g., via Equation 805) is applied to the matrix M to transform it into an 8×8 matrix D of DCT coefficients. Each unit of the matrix D represents a particular cosine frequency and each DCT coefficient contained in a unit is a weight identifying how much the particular cosine frequency contributes to the overall 8×8 image. Thus, there are 64 possible cosine frequencies that when combined in a particular way can generate any 8×8 image. The matrix D is arranged such that the cosine frequencies increase from the top-left to the bottom-right of the matrix.
[0068] In order to remove those high-frequency components, the pipeline quantizes the 8×8 matrix of DCT coefficients by using a matrix of quantization coefficients Q. Any suitable quantization matrix Q can be used and, in fact, the default quantization coefficients from a JPEG algorithm may be used. In particular, each coefficient value in the matrix D is divided by a particular quantization coefficient in the matrix Q (e.g., Dij/Qij, where ij represents a particular location in the matrices D and Q). The quantization further truncates values after the decimal point and, in a simple version, simply rounds to integers. E.g., if Dij/Qij is, say, 10.567, then round(Dij/Qij) is 10. Thus, the quantization matrix Q is selected such that weightings of high-frequency components are reduced to a fractional number and, thus, when the fractional number is truncated, the coefficient is set to zero. In other words, the frequency corresponding to that coefficient is given no weight in the overall 8×8 image, so the noise associated with that frequency is removed from the image. A matrix C of quantized values is then produced by the pipeline that includes the quantized values produced from the quantization process discussed above.
[0069] The pipeline begins reformatting the image data to reproduce a displayable image. Accordingly, the pipeline performs inverse quantization on the matrix C. In particular, each value in the matrix C is multiplied by the corresponding value in the matrix Q (e.g., Cij*Qij). This step yields an 8×8 matrix D′ of inverse-quantized DCT coefficients. As stated above, this will yield DCT coefficients having a value of ‘0’ for those coefficients corresponding to frequencies associated with noise (e.g., high-frequency components of the digital image).
[0070] The inverse-quantized DCT coefficients are then subject to an inverse of the DCT to give de-noised image data in the matrix I corresponding to a cleaned-up version of the original N×N block. It is important to note that the noise reduction step 113 consists of operating a processing device 219 such as an FPGA or ASIC on an N×N matrix of pixel values M to output an N×N matrix of pixel values I. The noise reduction step 113 can be performed for successive N×N blocks from an input image, and the output is a de-noised version of the input image.
[0071] In addition, an optional low-pass filter may be introduced at any point in the pipeline to filter noise from any frequency-related 8×8 matrix of data produced in the pipeline (e.g., any of the matrices D, C, and I). The low-pass filter can be configured to remove certain ranges of frequencies from image data. In particular, the pipeline may include multiple low-pass filters, each including different filtering parameters that are specific to particular camera devices, environmental settings in which an image is being captured, or any other hardware or environmental factor. Accordingly, noise that is specific to a particular set of circumstances may be dynamically accounted for and removed by selecting the appropriate filter from among the low-pass filters.
[0072] In one example, the low-pass filter operates to remove high-frequency components and may be implemented in any suitable way. For example, in certain embodiments, the low-pass filter sets to zero those DCT coefficients that correspond to frequencies above a predetermined threshold (e.g., those frequencies associated with noise). In some embodiments, the low-pass filter compares the absolute value of each of the coefficients in the DCT coefficient matrix D to a predetermined value. Based on the comparison, the low-pass filter identifies which of the DCT coefficients are below the threshold. The low-pass filter sets the values of the identified DCT coefficients to zero. By setting the value of a DCT coefficient to zero, the frequency component associated with that coefficient does not contribute to the image. As such, the low-pass filter removes noise components from the image data. The output of the low-pass filter is an 8×8 matrix Df that includes filtered DCT coefficients. The remaining steps of the pipeline operate on this matrix Df in the same manner as described above.
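As one hypothetical realization of the frequency-cutoff variant, DCT coefficients beyond a diagonal frequency index can be zeroed; the cutoff value below is illustrative, not specified by the disclosure.

```python
import numpy as np

def lowpass_dct(D, cutoff=4):
    """Zero DCT coefficients whose row+column frequency index reaches
    the cutoff; frequencies increase toward the bottom-right of D."""
    rows, cols = np.indices(D.shape)
    return np.where(rows + cols < cutoff, D, 0.0)
```

The filtered matrix Df then flows through the remaining quantization and inverse-transform steps unchanged.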
[0073] In another example, the low-pass filter operates on the matrix C of quantized DCT coefficients. For instance, the low-pass filter sets to zero all of the quantized coefficients corresponding to frequencies above a certain cutoff frequency (e.g., those frequencies associated with noise). In certain embodiments, the low-pass filter compares the absolute value of each non-zero quantized DCT coefficient in the quantized matrix C to a predetermined threshold. Based on the comparison, the low-pass filter identifies which of the non-zero coefficients are below the threshold. The low-pass filter sets the values of the identified non-zero quantized coefficients to zero. By setting the value of the identified quantized coefficients to zero, the frequency component associated with that coefficient does not contribute to the image. As such, the low-pass filter removes noise components from the image data. The output of the low-pass filter is an 8×8 matrix Cf of filtered quantized coefficients. The remaining steps of the pipeline operate on this matrix Cf in the same manner as described above.
[0074] The described methods and devices can produce a de-noised HDR video signal. With reference back to