Method and system for real-time motion artifact handling and noise removal for ToF sensor images
11215700 · 2022-01-04
Assignee
Inventors
CPC classification
G01S17/894
PHYSICS
G06T2207/20182
PHYSICS
G01S17/36
PHYSICS
International classification
G01S7/4865
PHYSICS
G01S17/894
PHYSICS
G01S17/36
PHYSICS
Abstract
A method and system for real-time motion artifact handling and noise removal for time-of-flight (ToF) sensor images. The method includes: calculating values of a cross correlation function c(τ) at a plurality of temporally spaced positions or phases from sent (s(t)) and received (r(t)) signals, thereby deriving a plurality of respective cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)]; deriving, from the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], a depth map D having values representing, for each pixel, distance to a portion of an object upon which the sent signals (s(t)) are incident; deriving, from the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], a guidance image (I; I′); and generating an output image D′ based on the depth map D and the guidance image (I; I′), the output image D′ comprising an edge-preserving and smoothed version of depth map D, the edge-preserving being from guidance image (I; I′).
Claims
1. A method for real-time motion artifact handling and noise removal for time-of-flight (ToF) sensor images, the ToF sensor images corresponding to received signals (r(t)) received by a ToF sensor following sending of modulated sent signals (s(t)), the method comprising: calculating values of a cross correlation function c(τ) at a plurality of temporally spaced positions or phases from the sent (s(t)) and received (r(t)) signals, thereby deriving a plurality of respective cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], wherein each of the plurality of respective cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] corresponds to a respective one of a plurality of phase-shifted images; deriving, from the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], a depth map D, the depth map D having values representing, for each pixel, distance to a portion of an object upon which the sent signals (s(t)) are incident; selecting, from the plurality of phase-shifted images, a guidance image (I; I′), the guidance image (I; I′) being an image having well defined edges; and generating an output image D′ based on the depth map D and the guidance image (I; I′), the output image D′ comprising an edge-preserving and smoothed version of depth map D, the edge-preserving being from guidance image (I; I′).
2. The method of claim 1, comprising acquiring the plurality of phase-shifted images in succession, each phase-shifted image corresponding to a respective temporally spaced position or phase.
3. The method of claim 1, wherein selecting the guidance image comprises selecting as the guidance image a phase-shifted image from a plurality of previously-acquired phase-shifted images, based on one or more predetermined criteria.
4. The method of claim 3, wherein the predetermined criteria comprise selecting as the guidance image the phase-shifted image (i) with maximum amplitude of the object degraded by motion artifact, (ii) with maximum object edge sharpness value, (iii) with the best edge contrast and/or image SNR, (iv) with the maximum average spatial amplitude, or (v) that is non-corrupted.
5. The method of claim 1, including using a guided filter (GF) to apply valid depth measurements to depth pixels previously identified as corrupted due to motion artifacts.
6. The method of claim 1, wherein generating an output image D′ comprises determining the output image D′ as:
D′ᵢ = āᵢIᵢ + b̄ᵢ, where āᵢ and b̄ᵢ are the means, over the windows wₖ containing pixel i, of linear coefficients aₖ and bₖ computed from the guidance image I and the depth map D.
7. The method of claim 1, wherein generating an output image D′ comprises: filtering the guidance image I to generate a de-noised guidance image I′; and generating an output image D′ based on the depth map D and the de-noised guidance image I′.
8. The method of claim 7, wherein filtering the guidance image I to generate a de-noised guidance image I′ comprises using a guided filter to perform said filtering.
9. The method of claim 1, wherein generating an output image D′ further comprises: generating a plausibility map P based on the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], the plausibility map P comprising, for each pixel of the depth map D, a value indicative of whether the pixel is corrupted by motion or saturation; and generating the output image D′ based on the depth map D and the plausibility map P, and on either the guidance image I or the de-noised guidance image I′.
10. The method of claim 9, wherein generating the plausibility map P comprises determining, for each pixel, a metric pᵢ representing the deviation of the phase-shifted amplitudes from a sinusoidal model of the cross-correlation function.
11. The method of claim 10, wherein the metric pᵢ comprises:
pᵢ = |c(τ₁) − c(τ₀) − c(τ₂) + c(τ₃)|/(a + α), where α is a regularization parameter preventing high values of pᵢ when the amplitude a is low.
12. The method of claim 10, wherein the plausibility map P has values Pᵢ, for each pixel i, such that Pᵢ = 0 if pᵢ > δ and Pᵢ = 1 otherwise, where δ is a motion threshold value.
13. The method of claim 12, wherein δ is determined by capturing, with the ToF sensor, an empty or motionless scene.
14. The method of claim 7, wherein filtering the guidance image I to derive the de-noised guidance image I′ comprises: applying an edge-preserving de-noising filter to guidance image I, whereby edge information and noise reduction from the guidance image I are transferred to the output image D′.
15. The method of claim 7, wherein filtering the guidance image I comprises deriving the de-noised guidance image I′ using:
I′ᵢ = āᵢIᵢ + b̄ᵢ, where āᵢ and b̄ᵢ are the means, over the windows wₖ containing pixel i, of linear coefficients aₖ and bₖ computed from the guidance image I.
16. The method of claim 7, wherein generating an output image D′ further comprises: generating a plausibility map P based on the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], the plausibility map P comprising, for each pixel of the depth map D, a value indicative of whether the pixel is corrupted by motion or saturation; and generating the output image D′ based on the depth map D and the plausibility map P, and on either the guidance image I or the de-noised guidance image I′, wherein generating an output image D′ comprises generating an output image D′ according to:
D′ᵢ = āᵢI′ᵢ + b̄ᵢ.
17. The method of claim 1, wherein the output image D′ comprises a version of depth map D from which, alternatively or additionally, motion artifacts and/or noise have been suppressed or removed.
18. The method of claim 1, wherein the cross correlation function c(τ) is calculated as:
c(τ) = lim_{T→∞} (1/T)·∫_{−T/2}^{+T/2} r(t)·s(t+τ) dt.
19. The method of claim 18, wherein the cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] are determined from c(τ) as four samples τₖ, k = 0, …, 3, taken at four subsequent time intervals τₖ = k·T/4 within a modulated period T.
20. The method of claim 10, wherein the distance measurements d for each pixel of the depth map D are obtained from:
d = L·(φ/(2π)), where φ = arctan((c(τ₃)−c(τ₁))/(c(τ₀)−c(τ₂))) is the measured phase and L is the non-ambiguity distance range of the ToF camera.
21. The method of claim 1, wherein an amplitude image A is defined as A = [aᵢⱼ]ₘₓₙ, where the aᵢⱼ are determined from:
a = ½·√((c(τ₃)−c(τ₁))² + (c(τ₀)−c(τ₂))²), where c(τ₀), c(τ₁), c(τ₂), c(τ₃) are the cross correlation values.
22. The method of claim 1, wherein four cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] are calculated from the cross correlation function c(τ).
23. A programmable image processing system when suitably programmed for carrying out the method of claim 1, the system comprising circuitry for receiving or storing the received signals (r(t)) and sent signals (s(t)), and processing circuitry for performing the method.
24. A system for real-time motion artifact handling and noise removal for time-of-flight (ToF) sensor images, the ToF sensor images corresponding to received signals (r(t)) received by a ToF sensor following sending of modulated sent signals (s(t)), the system comprising: circuitry for receiving or storing the received signals (r(t)) and sent signals (s(t)); processing circuitry, coupled to the circuitry for receiving or storing the received signals (r(t)) and sent signals (s(t)), the processing circuitry being operable for calculating values of a cross correlation function c(τ) at a plurality of temporally spaced positions or phases from the sent (s(t)) and received (r(t)) signals, thereby deriving a plurality of respective cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], wherein each of the plurality of respective cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] corresponds to a respective one of a plurality of phase-shifted images; deriving, from the plurality of cross correlation values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)], a depth map D, the depth map D having values representing, for each pixel, distance to a portion of an object upon which the sent signals (s(t)) are incident; selecting, from the plurality of phase-shifted images, a guidance image (I; I′), the guidance image (I; I′) being an image having well defined edges; and generating an output image D′ based on the depth map D and the guidance image (I; I′), the output image D′ comprising an edge-preserving and smoothed version of depth map D, the edge-preserving being from guidance image (I; I′).
25. A non-transitory recordable, rewritable or storable medium having recorded or stored thereon data defining or transformable into instructions for execution by processing circuitry and corresponding to at least the steps of claim 1.
26. A server computer incorporating a communications device and a memory device and being adapted for transmission on demand or otherwise of data defining or transformable into instructions for execution by processing circuitry and corresponding to at least the steps of claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Preferred embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings.
DETAILED DESCRIPTION
(6) As used herein, the “images” or “image signals” may be analog or digital, and may be subject to conventional analog or digital filtering.
(7) Where references are made herein to steps, operations or manipulations involving “images”, etc., these are implemented, where appropriate, by means of operations upon electronically processable representations (e.g. captured stills of video frame signals, bitstream video data, MPEG files or video streams, PC-video, or any other capturable or viewable image data format) of such “images”.
(8) Where references are made herein to steps, operations or manipulations involving “images”, “image signals” or “image data”, these may be implemented, where appropriate, by means of software controlled processor operations, hardware circuitry or any suitable combination of these.
(9) While the present invention is suitably embodied in a computer system, it may be incorporated in an adaptor, an image processor, or any other equipment located between or incorporating an image source or image capture device and a display device (e.g. LCD, Plasma, projector, etc.), or in the display device itself. The computer system suitably comprises a processor coupled (where appropriate via DACs and ADCs, or other interfaces) to RAM, ROM, storage devices, image capture and/or image storage devices, display driver and display devices, data communication and other peripherals, as is well known to persons skilled in the art; therefore, these will not be illustrated or discussed further.
(10) In the following, the ToF working principle is briefly discussed, to facilitate the understanding of the disclosed embodiments of the present invention.
(11) Time of Flight Principle
(13) A ToF camera 102 includes a modulation element 104 generating a transmitted or sent signal s(t) that is emitted by optical emitter 106 as a modulated NIR illumination signal 108. The NIR illumination signal 108 is incident upon object 110 within a scene being sensed, and the optical signal reflected by object 110 is received at sensor (e.g. a 2D CCD array) 112 as received signal r(t).
(14) Also within ToF camera 102, a phase delay element 114 receives the sent signal s(t) and applies a phase delay to it, thus outputting a phase-delayed signal s(t+τ), where τ is a phase delay. Processing circuitry (not shown) within or coupled to sensor 112 then calculates, based on the phase-delayed signal s(t+τ) and received signal r(t), the cross correlation function c(τ), as discussed in further detail below.
(15) As illustrated in the accompanying drawings, the sent and received signals may be modelled as:
(16)
s(t)=1+cos(ωt)
r(t)=h+a·cos(ωt−φ)
with ω=2πfₘ the angular modulation frequency, fₘ the modulation frequency, and h the background light plus the non-modulated part of the incident signal; the waveforms and their relationships are illustrated in the accompanying drawings.
(17) The cross correlation function c(τ) is calculated as follows:
(18) c(τ) = lim_{T→∞} (1/T)·∫_{−T/2}^{+T/2} r(t)·s(t+τ) dt = h + (a/2)·cos(φ + ωτ)
(19) Three or more samples of c(τ) per modulated period T are usually needed in order to unambiguously determine the phase φ and the amplitude a of the incident signal, as well as its offset h. In embodiments disclosed herein, the so-called four-taps technique is used, in which four samples τₖ, k = 0, …, 3, are taken at intervals
(20) τₖ = k·T/4
within a modulated period T.
(22) In embodiments disclosed herein, four samples instead of three are chosen, to (i) improve robustness against noise, (ii) enable a highly symmetric design of the sensor, (iii) ensure that the phase is insensitive to quadratic non-linearities in detection, and (iv) simplify the formulae for the phase φ, the amplitude a, and the offset h.
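By way of illustration only, the following sketch shows how the phase φ, the amplitude a and the offset h may be recovered from the four samples [c(τ₀), c(τ₁), c(τ₂), c(τ₃)]; it assumes the sinusoidal model of c(τ) given above with samples at phase steps of π/2, and the function name four_tap_demodulate and the use of NumPy are illustrative choices rather than part of the described embodiments.

```python
import numpy as np

def four_tap_demodulate(c0, c1, c2, c3):
    """Recover phase, amplitude and offset from four correlation samples.

    Assumes c(tau) = h + (a/2)*cos(phi + omega*tau) sampled at
    omega*tau_k = k*pi/2, so that:
      c0 = h + (a/2)cos(phi),  c1 = h - (a/2)sin(phi),
      c2 = h - (a/2)cos(phi),  c3 = h + (a/2)sin(phi).
    """
    phi = np.arctan2(c3 - c1, c0 - c2)                   # phase of incident signal
    a = 0.5 * np.sqrt((c3 - c1) ** 2 + (c0 - c2) ** 2)   # amplitude a
    h = 0.25 * (c0 + c1 + c2 + c3)                       # offset h
    return phi, a, h
```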
(23) ToF cameras 102 based on modulated NIR light resolve distance from four phase-shifted images. Ideally, the four phase-shifted images would be acquired simultaneously, but in practice the acquisition is sequential. This in turn can cause corrupted distance calculations in regions of non-matching raw phase values due to motion, that is, along object boundaries and within inhomogeneously reflecting surfaces; these artifacts are more prominent the faster the object moves, the closer the object is to the ToF camera 102, and the higher the exposure of the scene (higher integration time). Therefore, a larger integration time may be set for static scenes or scenes with slow-moving objects, which increases the depth accuracy, whereas, despite the increase in noise, short integration times may be set for highly dynamic scenes with fast-moving objects in order to avoid motion artifacts.
(24) The distance measurements d to the object 110 are obtained from:
(25) d = L·(φ/(2π)), where φ = arctan((c(τ₃)−c(τ₁))/(c(τ₀)−c(τ₂))) is the measured phase,
with c ≅ 3·10⁸ m/s the speed of light and L the working range or non-ambiguity distance range of the ToF camera 102:
(26) L = c/(2·fₘ)
(27) The factor ½ is due to the fact that light travels twice the distance between the camera 102 and the sensed object 110.
(28) The ToF camera 102 incorporates, as will be appreciated by persons skilled in the art, an image sensor 112 whose size corresponds to the camera resolution (m×n). Hence, each single pixel constituting the image sensor 112 is identified by the pixel position (i, j), where i indicates the row and j indicates the column. Each pixel measures a distance dᵢⱼ to the object 110 (or a respective discrete portion thereof). As a result, the ToF camera 102 provides a distance image or depth map D defined as D = [dᵢⱼ]ₘₓₙ, the matrix of all the elements dᵢⱼ.
(29) In the same way, an amplitude image A is defined as A = [aᵢⱼ]ₘₓₙ.
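Continuing the illustration, a per-pixel depth map D and amplitude image A may be obtained from the four correlation samples (given as m×n arrays); the constant C_LIGHT, the parameter f_mod and the function name depth_and_amplitude are assumptions of the sketch only.

```python
import numpy as np

C_LIGHT = 3.0e8  # speed of light, m/s

def depth_and_amplitude(c0, c1, c2, c3, f_mod):
    """Depth map D and amplitude image A from four m x n sample arrays."""
    phi = np.mod(np.arctan2(c3 - c1, c0 - c2), 2.0 * np.pi)  # phase in [0, 2*pi)
    L = C_LIGHT / (2.0 * f_mod)     # working (non-ambiguity) range L = c/(2*f_m)
    D = L * phi / (2.0 * np.pi)     # d = L * phi / (2*pi)
    A = 0.5 * np.sqrt((c3 - c1) ** 2 + (c0 - c2) ** 2)
    return D, A
```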
(31) Briefly stated, from values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] of the correlation function c(τ), the depth map D is derived by depth map module 204, the depth map D comprising values representing, for each pixel thereof, a distance to an object upon which the sent signals are incident. Also based on the values [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] of the correlation function c(τ), a guidance image I is generated by guidance image module 206, and, in a preferred embodiment, a de-noised guidance image I′ is generated from guidance image I at guidance image de-noising module 208. (In an alternative embodiment, however, the guidance image I may be used directly.) Finally, an output image (processed depth map) D′ is generated and output by motion artifact handling module 210, based on depth map D and guidance image I or, more preferably, de-noised guidance image I′. In a further preferred embodiment, plausibility map generation module 212 generates a plausibility map P, and the output image (processed depth map) D′ is generated and output by motion artifact handling module 210 based on (i) depth map D, (ii) plausibility map P and (iii) guidance image I or, more preferably, de-noised guidance image I′.
(32) The processing by the various modules is now discussed in further detail.
(33) Guided Filter
(34) In this section, the guided filter (GF), employed in at least some embodiments of the invention, is briefly discussed: it is used (1) to de-noise the guidance image in de-noising module 208, and (2) to set valid depth measurements at depth pixels previously identified as corrupted due to motion artifacts.
(35) The GF, in a preferred embodiment, is an edge-preserving smoothing filter that, compared to the widely used bilateral filter, behaves better near edges and has the major advantage of being a fast, non-approximate, linear-time algorithm (O(N) time), regardless of the kernel size and the intensity range.
(36) Given a depth map D and a guidance image I, the resulting smoothed version of D, with edge-preserving from I, i.e. D′, is expressed as:
D′ᵢ = āᵢIᵢ + b̄ᵢ
where
(37) aₖ = ((1/|w|)·Σ_{i∈wₖ} IᵢDᵢ − ĪₖD̄ₖ)/(σₖ² + ϵ) and bₖ = D̄ₖ − aₖĪₖ
are linear coefficients assumed to be constant in wₖ, with āᵢ = (1/|w|)·Σ_{k∈wᵢ} aₖ and b̄ᵢ = (1/|w|)·Σ_{k∈wᵢ} bₖ their means over all windows wᵢ containing pixel i. Īₖ and σₖ² are respectively the mean and the variance of I in wₖ, |w| is the number of pixels in wₖ, and ϵ is a regularization parameter penalizing large aₖ.
(38) D̄ₖ = (1/|w|)·Σ_{i∈wₖ} Dᵢ
is the mean of D in wₖ.
(39) The selection of the window size wₖ may be done according to the application: it may be small for image detail enhancement, in order to enhance thin details, and larger for structure-transferring filtering. The smoothing level is set by the ϵ parameter.
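A minimal sketch of such a guided filter, assuming the box-filter formulation with square windows of size 2·radius+1 (corresponding to wₖ) and SciPy's uniform_filter for the window means, might read as follows; it is an illustrative sketch rather than a definitive implementation of the embodiments.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, D, radius, eps):
    """Edge-preserving smoothing of D guided by I, in O(N) time."""
    size = 2 * radius + 1                                  # window w_k
    mean_I = uniform_filter(I, size)                       # mean of I in w_k
    mean_D = uniform_filter(D, size)                       # mean of D in w_k
    var_I = uniform_filter(I * I, size) - mean_I ** 2      # sigma_k^2
    cov_ID = uniform_filter(I * D, size) - mean_I * mean_D
    a = cov_ID / (var_I + eps)                             # a_k
    b = mean_D - a * mean_I                                # b_k
    mean_a = uniform_filter(a, size)                       # a-bar_i
    mean_b = uniform_filter(b, size)                       # b-bar_i
    return mean_a * I + mean_b                             # D'_i = a-bar_i*I_i + b-bar_i
```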
(40) Plausibility Map
(41) The pixels of the four phase-shifted images acquired for distance calculation are samples [c(τ₀), c(τ₁), c(τ₂), c(τ₃)] of the cross correlation function c(τ) between the emitted s(t) and received r(t) sinusoidally modulated signals, as illustrated above.
(42) According to an embodiment, a pixel i affected by motion is identified by the following metric:
pᵢ = |c(τ₁) − c(τ₀) − c(τ₂) + c(τ₃)|/(a + α)
where α is a regularization parameter preventing high values of pᵢ when the amplitude a is low.
(43) In this embodiment, motion is detected at pixel i if its plausibility metric pᵢ is larger than a threshold δ:
(44) Pᵢ = 0 if pᵢ > δ, and Pᵢ = 1 otherwise,
with δ a motion threshold value. The motion threshold value δ may be easily derived or adjusted by recording an empty or motionless scene with the ToF camera 102.
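An illustrative sketch of this plausibility test follows; the function name, the α and δ arguments and the encoding Pᵢ = 1 for plausible pixels and Pᵢ = 0 for motion-corrupted pixels are assumptions consistent with the weighting used further below.

```python
import numpy as np

def plausibility_map(c0, c1, c2, c3, alpha, delta):
    """P_i = 0 where the samples deviate from the sinusoidal model (motion),
    P_i = 1 otherwise; alpha regularizes low-amplitude pixels."""
    a = 0.5 * np.sqrt((c3 - c1) ** 2 + (c0 - c2) ** 2)   # amplitude a
    p = np.abs(c1 - c0 - c2 + c3) / (a + alpha)          # deviation metric p_i
    return (p <= delta).astype(np.float64)               # plausibility map P
```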
Guidance Image Selection and Processing
(45) A guidance image I with well-defined and sharp edges is needed to adjust the object boundaries in D affected by motion artifacts. Selection of the guidance image is performed in guidance image module 206.
(46) If it is assumed that the motion during each phase-shifted image c(τᵢ) acquisition is negligible, any of the four phase-shifted images could be considered as a guidance image. However, as each phase-shifted image corresponds to a sampling of the cross correlation function c(τ) between the received (r(t)) and emitted (s(t)) modulated signals, the phase-shifted image having the maximum intensity will have the best SNR and thus the best contrast at object boundaries. Therefore, in an embodiment, the phase-shifted image having the maximum average spatial amplitude is selected as the guidance image I. A further step is preferably performed in order to avoid transferring noise from I to the filtered D: the guidance image I is filtered using a GF with the guidance image and the image to be filtered being one and the same, i.e.
I′ᵢ = āᵢIᵢ + b̄ᵢ
where
(47) aₖ = σₖ²/(σₖ² + ϵ) and bₖ = (1 − aₖ)·Īₖ,
σₖ² is the variance of I in wₖ, |w| is the number of pixels in wₖ, ϵ is a regularization parameter penalizing large aₖ, and
(48) Īₖ = (1/|w|)·Σ_{i∈wₖ} Iᵢ
is the mean of I in wₖ.
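An illustrative sketch of this selection and self-guided de-noising, reusing the guided_filter sketch given above (the list phase_images and the radius and eps parameters are assumed inputs):

```python
import numpy as np

def select_and_denoise_guidance(phase_images, radius, eps):
    """Select the phase-shifted image with maximum average spatial amplitude
    as guidance image I, then de-noise it by filtering I with itself as
    guidance (see the guided_filter sketch above)."""
    idx = int(np.argmax([img.mean() for img in phase_images]))
    I = phase_images[idx].astype(np.float64)
    return guided_filter(I, I, radius, eps)   # de-noised guidance image I'
```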
(50) Depth Motion Artifact Suppression Algorithm
(51) Returning to motion artifact handling module 210, the output image D′ is generated, from depth map D, plausibility map P and de-noised guidance image I′, as:
D′ᵢ = āᵢI′ᵢ + b̄ᵢ
where
(52) aₖ = ((1/|wₖ|)·Σ_{i∈wₖ} PᵢI′ᵢDᵢ − Ī′ₖD̄ₖ)/(σₖ² + ϵ) and bₖ = D̄ₖ − aₖĪ′ₖ,
and
(53) D̄ₖ = (1/|wₖ|)·Σ_{i∈wₖ} PᵢDᵢ
is the mean of D in wₖ weighted by the map P, |w| is the constant number of pixels in the window wᵢ centered at pixel i, and |wₖ| = Σ_{i∈wₖ} Pᵢ is the number of plausible pixels in wₖ. As before,
(54) āᵢ = (1/|w|)·Σ_{k∈wᵢ} aₖ
and
(55) b̄ᵢ = (1/|w|)·Σ_{k∈wᵢ} bₖ,
(56) with Ī′ₖ the mean of I′ in wₖ weighted by the map P, and σₖ² the variance of I′ in wₖ.
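The following sketch illustrates one plausible reading of this plausibility-weighted guided filtering, again with box filters; normalizing by the window mean of P realizes the P-weighted means, while the variance of I′ is left unweighted here as a simplifying assumption.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def weighted_guided_filter(D, I_denoised, P, radius, eps):
    """Propagate valid depth values into motion-corrupted pixels by
    computing the statistics of D only over plausible pixels (P_i = 1)."""
    size = 2 * radius + 1
    w = uniform_filter(P, size) + 1e-12                   # ~ |w_k| / |w|
    mean_I = uniform_filter(I_denoised, size)             # mean of I' (unweighted)
    var_I = uniform_filter(I_denoised ** 2, size) - mean_I ** 2
    mean_D = uniform_filter(P * D, size) / w              # P-weighted mean of D
    corr_ID = uniform_filter(P * I_denoised * D, size) / w
    a = (corr_ID - mean_I * mean_D) / (var_I + eps)       # a_k
    b = mean_D - a * mean_I                               # b_k
    mean_a = uniform_filter(a, size)                      # a-bar_i
    mean_b = uniform_filter(b, size)                      # b-bar_i
    return mean_a * I_denoised + mean_b                   # output image D'
```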
(57) While embodiments have been described with reference to implementations having various components, it will be appreciated that other embodiments make use of other combinations and permutations of these and other components.
(58) Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
(59) In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
(60) Thus, while there has been described what are believed to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the scope of the invention, and it is intended to claim all such changes and modifications as fall within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added to or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added to or deleted from the methods described, within the scope of the present invention.