Methods and apparatus for coded time-of-flight camera

Abstract

In illustrative implementations, a time-of-flight camera robustly measures scene depths, despite multipath interference. The camera emits amplitude-modulated light. An FPGA sends at least two electrical signals: the first controls modulation of the radiant power of a light source; the second is a reference signal that controls modulation of pixel gain in a light sensor. These signals are identical, except for time delays. These signals comprise binary codes that are m-sequences or other broadband codes. The correlation waveform is not sinusoidal. During measurements, only one fundamental modulation frequency is used. One or more computer processors solve a linear system by deconvolution, in order to recover an environmental function. Sparse deconvolution is used if the scene has only a few objects at a finite depth. Another algorithm, such as Wiener deconvolution, is used if the scene has global illumination or scattering media.

Claims

1. A method comprising, in combination: (a) an electrical signal controlling modulation of a light source, such that the light source illuminates a scene, the electrical signal is periodic, and each period of the electrical signal comprises an m-sequence; (b) a time-of-flight (ToF) sensor taking measurements of light that is incident on the ToF sensor and reflected from the scene; and (c) one or more computers performing an algorithm that takes the measurements as input and that calculates multiple scene depths in the scene for each pixel in a set of mixed pixels;  wherein (i) the one or more computers calculate a light sweep image, and (ii) the one or more computers calculate, based on the light sweep image, values of control signals which control a digital image displayed on a screen, such that (A) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene, and (B) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser.

2. The method of claim 1, wherein the algorithm performs sparse deconvolution.

3. The method of claim 1, wherein the algorithm includes Wiener deconvolution or Hodrick-Prescott filtering.

4. The method of claim 1, wherein the scene is static during the modulation and during the measurements.

5. The method of claim 1, wherein: (a) the ToF sensor has a correlation waveform that is the cross-correlation of (i) a reference signal that controls modulation of gain of pixels of the ToF sensor and (ii) received light at the ToF sensor; and (b) the correlation waveform is not substantially sinusoidal.

6. Apparatus comprising, in combination: (a) a light source, (b) a time-of-flight (ToF) sensor; (c) a set of one or more computer processors; and (d) machine-accessible media;  wherein (i) the machine-accessible media do not comprise a transitory signal, and (ii) the machine-accessible media have instructions encoded thereon for the set of processors (A) to generate an electrical signal to control modulation of a light source, such that the electrical signal comprises an m-sequence, and (B) to perform an algorithm that takes, as an input, measurements by the ToF sensor, and that calculates, for each pixel in a set of mixed pixels, multiple depths in a scene, (C) to calculate a light sweep image, and (D) to calculate, based on the light sweep image, values of control signals which control a digital image displayed on a screen, such that (I) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene, and (II) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser.

7. The apparatus of claim 6, wherein the algorithm involves sparse deconvolution.

8. The apparatus of claim 6, wherein the algorithm involves Wiener deconvolution or Hodrick-Prescott filtering.

9. The apparatus of claim 6, wherein at least one pixel in the ToF sensor comprises a lock-in pixel.

10. The apparatus of claim 6, wherein at least one pixel in the ToF sensor comprises a single-photon avalanche diode.

11. The apparatus of claim 6, wherein the algorithm involves sparse deconvolution.

12. The apparatus of claim 6, wherein the algorithm involves Wiener deconvolution or Hodrick-Prescott filtering.

13. A method comprising, in combination: (a) an electrical signal controlling modulation of a light source, such that the light source illuminates a scene and that at least a portion of the electrical signal comprises a flat spectrum signal; (b) a time-of-flight (ToF) sensor taking measurements of light that is incident on the ToF sensor and reflected from the scene; and (c) one or more computers performing an algorithm that takes the measurements as input and that calculates multiple scene depths for each pixel in a set of mixed pixels; wherein (i) the one or more computers calculate a light sweep image, and (ii) the one or more computers calculate, based on the light sweep image, values of control signals which control a digital image displayed on a screen, such that (A) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene, and (B) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser.

14. The method of claim 13, wherein the algorithm performs sparse deconvolution.

15. The method of claim 13, wherein the electrical signal is not substantially square.

16. The method of claim 13, wherein the scene is static during the modulation and during the measurements.

17. The method of claim 13, wherein: (a) the ToF sensor has a correlation waveform that is the cross-correlation of (i) a reference signal that controls modulation of gain in pixels of the ToF sensor and (ii) received light at the ToF sensor; and (b) the correlation waveform is not substantially sinusoidal.

18. Apparatus comprising, in combination: (a) a light source configured to emit light that illuminates a static scene, which light undergoes modulation; (b) a time-of-flight (ToF) sensor; and (c) one or more computers that are programmed (i) to output an electrical signal that controls the modulation, such that the electrical signal is periodic and each period of the electrical signal comprises a flat spectrum signal, and (ii) to perform an algorithm that calculates a depth map of the scene, which depth map specifies, for each pixel in a set of mixed pixels of the ToF sensor, multiple depths in the scene, (iii) to calculate a light sweep image, and (iv) to calculate, based on the light sweep image, values of control signals which control a digital image displayed on a screen, such that (I) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene, and (II) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1A shows a first scene with a glass vase.

(2) FIG. 1B comprises frames of a movie of light sweeping over the first scene.

(3) FIG. 2 shows a screen displaying a digital image of a second scene (with a unicorn in the scene).

(4) FIG. 3A is a conceptual diagram of a conventional ToF camera, in which a sinusoidal correlation waveform is used.

(5) FIG. 3B is a conceptual diagram of an example of this invention, in which a non-sinusoidal correlation waveform is used.

(6) FIGS. 4A, 4B, and 4C show examples of an environmental profile function.

(7) FIG. 5A shows three different codes: (1) a conventional square code; (2) a broadband code; and (3) an m-sequence.

(8) FIGS. 5B, 5C, and 5D show (a) frequency domain spectrum, (b) autocorrelation and (c) low-pass filter (LPF) autocorrelation for the three different codes shown in FIG. 5A.

(9) FIG. 6 is a diagram of hardware components, in an illustrative implementation of this invention.

(10) FIG. 7 is a flow chart, showing steps in an illustrative implementation of this invention.

(11) FIGS. 8A and 8B show examples of an analog light sensor and digital light sensor, respectively.

(12) FIG. 9 shows frames of a movie of light sweeping over the second scene (with the unicorn).

(13) FIGS. 10A and 10B are images of a scene captured through a diffuser. FIG. 10A is a conventional photograph, in which the image is blurred by the diffuser. FIG. 10B is a much clearer image captured in an illustrative implementation of this invention.

(14) The above Figures show some illustrative implementations of this invention, or provide information that relates to those implementations. However, this invention may be implemented in many other ways.

DETAILED DESCRIPTION

(15) Conventional time-of-flight ("ToF") systems achieve depth ranging by amplitude modulation of a continuous wave. These conventional ToF systems assume a single optical path length (range or depth) value per pixel in the scene, although the scene may actually consist of multiple depths, e.g., a transparency in front of a wall. If there are multiple depths in a scene, then there are multiple path lengths (from the light source, to the scene, and back to the ToF camera), and a single pixel may receive light reflected from multiple depths in the scene. This scenario (which occurs when multiple light-paths hit the ToF sensor at the same pixel) is commonly known as a "mixed pixel" or "multipath interference". A conventional ToF camera cannot accurately determine the depth of a mixed pixel; instead, a conventional ToF camera measures a range that is a non-linear mixture of the incoming light paths.

(16) In illustrative implementations, this invention solves or mitigates the mixed pixel problem in many scenarios, as follows: The sequence of optical path lengths involved in light reaching each pixel is recovered and expressed as a time profile. A light source emits a binary code of strobed illumination, in order to illuminate the scene. Light reflects back from the scene to the ToF camera. A light sensor in the ToF camera records a sequence of demodulated values using successive electronic delays. In some cases (e.g., if the environmental response is sparse), a computer uses a sparse deconvolution algorithm to recover a sequence of Diracs in the time profile, corresponding to the path lengths of the multipath combinations. Alternately, if the environmental function is not sparse, then another algorithm (such as Wiener deconvolution) is used to solve for the time profile.

(17) FIG. 1A shows a scene with a glass vase in the foreground and stuffed animals and a wall in the background. FIG. 1B comprises frames of a movie of light sweeping over the scene. The scene is spatially static. Each frame of the movie is a digital image calculated from a light sweep image. In this movie, multipath effects can be seen in the glass vase. In the early time-slots (e.g., 0 ns and 1 ns), bright spots are formed from the specularities on the glass. Light then sweeps over the other objects in the scene and finally hits the back wall, where it can also be seen through the glass vase (8 ns). Light leaves, first from the specularities (8-10 ns), then from the stuffed animals. The time slots correspond to the true geometry of the scene (light travels approximately 1 foot per nanosecond; times are for the round trip). In the example shown in FIG. 1B, this invention solves for multipath effects, e.g., the interactions of the translucent vase and back wall.

(18) FIG. 2 shows a screen 201 of a monitor 203 displaying a digital image of a scene. The scene has an almost transparent, acrylic unicorn in the foreground, and a wall in the background. A conventional ToF camera cannot recover depth of a transparent object, such as the acrylic unicorn. However, in illustrative implementations of this invention, a multipath algorithm accurately calculates the depth of the foreground (acrylic unicorn) and the background (wall). In the example shown in FIG. 2, a computer 205 outputs control signals to control the display of the digital image on the screen 201.

(19) Throughout this discussion, f(•) is used for functions with continuous argument and f[•] is used for functions with discrete argument.

(20) As used herein, cross-correlation is defined as follows: Given two functions a(t) and b(t), the cross-correlation is

(21) c_{a,b}(τ) = lim_{Δ→∞} (1/(2Δ)) ∫_{−Δ}^{+Δ} a*(t) b(t+τ) dt ≜ (a ⊗ b)(τ)
where a* denotes the complex conjugate of a, ⊗ denotes the cross-correlation operator, and τ is the time shift between the two functions.

(22) Cross-correlation is related to the convolution operator by:
(a ⊗ b)(t) = (ā* ∗ b)(t)
where ā(t) = a(−t), and where ∗ denotes the linear convolution operator.

(23) The definition of cross-correlation leads to a natural extension for discrete sequences:

(24) c_{a,b}[τ] = (1/K) Σ_{k=0}^{K−1} a*[k] b[τ+k] ≜ (a ⊗ b)[τ]
where a[k] = a(kT) and b[k] = b(kT), ∀k ∈ ℤ, for some sampling step T > 0.
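As a concrete illustration (not code from the patent), the discrete cross-correlation can be computed directly from this definition, with indices wrapping over one period:

```python
def cross_correlation(a, b):
    """c_{a,b}[tau] = (1/K) * sum_k a*[k] b[tau + k], with indices modulo K.

    The modulo arithmetic treats a and b as one period of periodic sequences."""
    K = len(a)
    return [sum(a[k].conjugate() * b[(tau + k) % K] for k in range(K)) / K
            for tau in range(K)]

# Cross-correlating a code with a delayed copy of itself peaks at the delay.
a = [1, 0, 1, 1, 0, 0, 0, 1]
b = [a[(t - 3) % len(a)] for t in range(len(a))]   # b is a, delayed by 3 samples
c = cross_correlation(a, b)
peak = max(range(len(c)), key=lambda t: c[t])       # peak at tau = 3
```

This shift-of-the-peak behavior is exactly what a ToF camera exploits: the delay of the returned code encodes the optical path length.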

(25) A ToF camera uses an illumination control waveform i_ω(t) with a modulation frequency ω to strobe the light source. In practice, in ToF cameras, the illumination waveform is often a periodic function such that:
i_ω(t + T_0) = i_ω(t)
where T_0 is the time-period of repetition.

(26) Since a homodyne setting is employed, for the sake of notational simplicity the subscript ω is not used; instead, i = i_ω. The ToF camera measurements are obtained by computing c_{m,r}(τ), where m(t) is the optical signal from the light source and r(t) is the reference signal.

(27) In many embodiments (both of this invention and of conventional ToF cameras), the illumination control signal and reference signal are the same, that is, i(t) = r(t). The phase, which is encoded in the shift τ* = arg max_τ c_{m,r}[τ], can be obtained in a number of ways.

(28) In some conventional ToF cameras, only 2 to 4 samples of the correlation function c_{m,r}[τ] are used for the computation of the phase. For many modulation functions, a sample on the rising edge and another on the falling edge are sufficient to find the peak. Another technique for computing the phase involves oversampling of the correlation function. The oversampling case is germane to the problem of multipath, as the correlation function (for a binary code) can become distorted. In a conventional ToF camera, the final calculation from phase to distance is a straightforward linear calculation. For a Photonic Mixer Device ("PMD") that uses square wave modulation and a sinusoidal form for c_{m,r}[τ], the conversion is simply:

(29) d = cφ / (4π f_ω), c = 3×10⁸ m/s

(30) Time-of-flight cameras can theoretically operate at different modulation frequencies. The distance is the same regardless of modulation frequency, so the ratio φ/f_ω is constant; that is, doubling the modulation frequency will double the phase for a single-path scene.
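The conversion in paragraph (29) is a one-line calculation. The sketch below is an illustration, not the patent's code; the 50 MHz value is the modulation frequency reported later in the Prototype section:

```python
import math

C = 3.0e8  # speed of light (m/s)

def phase_to_distance(phi, f_mod):
    """d = c * phi / (4 * pi * f), per the sinusoidal-correlation model.

    The factor 4*pi (rather than 2*pi) accounts for the round trip
    of the light from the source to the scene and back to the sensor."""
    return C * phi / (4.0 * math.pi * f_mod)

# At 50 MHz, a full 2*pi phase wrap corresponds to c/(2f) = 3 m of depth,
# so depths beyond 3 m alias back into the [0 m, 3 m) interval.
d_wrap = phase_to_distance(2.0 * math.pi, 50e6)   # 3.0 m
```

This also illustrates the constant-ratio remark above: doubling f_mod while doubling φ leaves d unchanged.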

(31) Conventional ToF cameras use a sinusoidal correlation function. This approach works for conventional range imaging, but cannot deal with multipath objects.

(32) FIG. 3A is a conceptual diagram of a conventional ToF camera, in which a sinusoidal correlation waveform is used. FIG. 3B is a conceptual diagram of an example of this invention, in which a non-sinusoidal correlation waveform is used. The former (the conventional ToF camera) cannot accurately measure scene depth in the presence of multipath interference, but the latter (the example of this invention shown in FIG. 3B) can.

(33) In FIGS. 3A and 3B, a scene has two objects at different depths, so that the environmental response 312 comprises light reflecting from the scene in two reflected pulses 301, 302, at times t.sub.1 and t.sub.2, respectively. Also, in FIGS. 3A and 3B, correlation waveforms 311, 321 are each a plot of cross-correlation c.sub.m,r[τ] (that is the cross-correlation of the reference signal and the optical signal from the light source).

(34) FIG. 3A illustrates operation of a conventional ToF camera. In a conventional ToF camera, a sinusoidal correlation waveform 311 and an environmental response 312 are convolved, resulting in an output measurement 313 that is also sinusoidal. In FIG. 3A, the two reflected sine waves 314, 315 from the two objects at different scene depths add to produce a sinusoidal measurement 313. The phase of this measured sinusoid 313 is in-between the phases of the component sinusoids 314, 315, which creates a problem of unicity. It is unclear whether two component sine waves 314, 315 are really in the environment, or if only one component 313 exists (with the mixed phase).

(35) FIG. 3B shows a non-sinusoidal correlation waveform 321 that avoids problems with unicity. This can be done by selecting appropriate binary sequences for r(t) and i(t) (such as the broadband code and m-sequence codes illustrated in FIG. 5A). The code selection also ties in with the conditioning of the inverse problem, as discussed below.

(36) In the example of this invention shown in FIG. 3B, the environmental response 312 is sparse and has two Dirac delta peaks 301, 302, corresponding to two different depths in a scene. A non-sinusoidal correlation waveform 321 and the environmental response 312 are convolved, resulting in output measurement 323. The non-sinusoidal correlation waveform 321 shows a distinct peak that is non-bandlimited. Therefore, when the non-sinusoidal correlation waveform 321 is convolved with the environmental response 312, the output measurement 323 has two distinct peaks. The Diracs in the environmental response are recovered by sparse deconvolution.

(37) FIGS. 4A, 4B, and 4C show examples of an environmental profile function. In each of these Figures, the environment profile function ξ[t] is a discretized time profile. In FIG. 4A, the scene comprises one opaque wall 401, and the environment profile appears as a single spike 405. In FIG. 4B, the scene comprises a transparency 412 in front of an opaque wall 411, and the environment profile appears as a pair of spikes 415, 416. In FIG. 4C, the scene comprises scattering media 422 in front of an opaque wall 421, and the environmental response appears as a time profile 425 that is not sparse. For many applications of time-of-flight, a sparse formulation of ξ[t] is desired.

(38) In some existing ToF cameras, an alternate approach is used: to acquire range maps at different modulation frequencies and then solve a fitting problem to resolve multipath. Unfortunately, the problem of exponential fitting is known to be ill-conditioned and the implementation is often challenging: it is time consuming, requires additional hardware for multi-frequency, and the frequency response calibration varies from shot to shot.

(39) In contrast, in illustrative implementations of this invention, multipath recovery is performed using only a single fundamental frequency.

(40) Forward Model of Environmental Convolution

(41) In illustrative implementations of this invention, a forward model of environment convolution is employed. For this forward model: Start by relating the optically measured signal m[t] to the discrete illumination control signal i[t]:
m[t]=(i*φ*ξ)[t]  (1)

(42) Here, the illumination signal i[t] is first convolved with a low-pass filter φ[t]. This represents a smoothing due to the rise/fall time of the electronics. Subsequent convolution with an environment response ξ[t] returns the optical measurement m[t].
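The forward model of Equation (1) can be simulated directly. In the sketch below (an illustration under assumed signal shapes, not the patent's code), a toy illumination code is smoothed by a short low-pass kernel and then convolved with a two-Dirac environment, with all convolutions circular over one period:

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution of two equal-length sequences via the DFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

d = 64
i_sig = np.zeros(d); i_sig[:8] = 1.0              # toy illumination code (one strobe pulse)
phi = np.zeros(d); phi[:3] = [0.25, 0.5, 0.25]    # assumed rise/fall-time smoothing kernel
xi = np.zeros(d); xi[10] = 1.0; xi[25] = 0.5      # two returns, at sample shifts 10 and 25

m = circ_conv(circ_conv(i_sig, phi), xi)          # m[t] = (i * phi * xi)[t], Equation (1)
```

The measurement m is a superposition of two smoothed, shifted copies of the code, one per return, which is precisely the mixed-pixel situation described earlier.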

(43) The function ξ[t] is a scene-dependent time profile function (FIGS. 4A-4C). For a single opaque object, ξ[t] appears as a Kronecker delta function δ[t − t_0], where the sample shift t_0 encodes path length and object depth. In the multipath case, without scattering, the environment function represents a summation of discrete Dirac functions:

(44) ξ[t] = Σ_{k=0}^{K−1} α_k δ[t − t_k],
where {α_k, t_k}_{k=0}^{K−1} denote the amplitude scalings and time shifts, respectively.

(45) The measured cross-correlation function in the presence of the environment function is:

(46) c_{r, i∗φ∗ξ}[τ] = (r ⊗ (i ∗ φ ∗ ξ))[τ] = (r ⊗ i) ∗ φ ∗ Σ_{k=0}^{K−1} α_k δ[τ − t_k] = ζ ∗ φ ∗ Σ_{k=0}^{K−1} α_k δ[τ − t_k]  (2)
where ζ = (r ⊗ i) is the deterministic kernel, and the summation Σ_{k=0}^{K−1} α_k δ[τ − t_k] is the sparse environment response.

(47) In this forward model (including Equation 2), the measurements c_{r, i∗φ∗ξ}[τ] are the cross-correlations in the presence of an unknown, parametric environment response ξ[t]. The measurement is modeled as a convolution between the environment, the deterministic kernel ζ[t] = (r ⊗ i)[t], and the low-pass filter φ.

(48) To condition the problem, note that Equation 2 can be developed as:

(49) ((ζ ∗ φ) ∗ ξ)[t] = (h ∗ ξ)[t], with h ≜ ζ ∗ φ  (3)
where h[t] is the convolution kernel resulting from low-pass filtering of ζ.

(50) In vector-matrix form the convolution is a circulant Toeplitz matrix acting on a vector:

(51) y = (h ∗ ξ) = Hx, with H ∈ ℝ^{d×d} (circulant Toeplitz), x ∈ ℝ^{d×1}, and y ∈ ℝ^{d×1}  (4)
where y ∈ ℝ^d is the measurement vector, which amounts to the sampled version of the correlation function, and d represents the number of samples.

(52) The convolution matrix H ∈ ℝ^{d×d} is a circulant Toeplitz matrix, where each column is a sample-shift of h[t]. Note that h implicitly contains the low-pass filter φ. The vector x ∈ ℝ^d is the vector corresponding to the environment ξ[t]:
x = [ξ[0], ξ[1], . . . , ξ[d−1]]^T.

(53) To estimate parameters of ξ, given y, the inverse of the convolution matrix H should be well defined. Equation 4 is a linear system. Provided that H is well conditioned, H can be inverted in the context of linear inverse problems. Since H has a Toeplitz structure, it is diagonalized by the Fourier matrix and the eigenvalues correspond to the spectral components of h.

(54) Controlling the condition number of H amounts to minimizing the ratio of the highest to lowest Fourier coefficients of h[t], i.e., of the eigenvalues of H. This is the premise for using binary sequences with a broadband frequency response. FIGS. 5A and 5B show several codes as well as their spectra.
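Since H is circulant, its condition number can be read directly off the spectrum of h, without forming H at all. A small numerical check of this (illustrative; the decaying kernel h is an arbitrary stand-in for a low-pass-filtered code):

```python
import numpy as np

# A circulant matrix is diagonalized by the DFT: its singular values are the
# magnitudes of the DFT of its first column h, so the 2-norm condition number
# is max|h_hat| / min|h_hat|.
h = 0.5 ** np.arange(8)                          # assumed smooth, decaying kernel
H = np.column_stack([np.roll(h, k) for k in range(8)])

h_hat = np.fft.fft(h)
cond_fft = np.abs(h_hat).max() / np.abs(h_hat).min()
cond_direct = np.linalg.cond(H)                  # same value, computed from H itself
```

A broadband code keeps min|h_hat| large, which is exactly what keeps this ratio, and hence the inverse problem of Equation (4), well conditioned.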

(55) Sparse Formulation

(56) Since ξ[t] is completely characterized by {α_k, t_k}_{k=0}^{K−1} in the multipath case, it is desirable to estimate these parameters. Given a set of measurements y,
y[τ] = Σ_{k=0}^{K−1} α_k h[τ − t_k]  ⇔  y = Hx,
and knowledge of h, the problem of estimating ξ boils down to:

(57) arg min_{{α_k, t_k}} ‖ y[t] − Σ_{k=0}^{K−1} α_k h[t − t_k] ‖².

(58) There are many classic techniques to solve this problem in the time or frequency domain, including a pseudoinverse or even Tikhonov regularization. However, since ξ is a K-sparse signal, an advantageous approach is to begin with a sparsity-promoting optimization scheme. The problem falls into the deconvolution framework mainly because of the low-pass nature of h, or equivalently the smearing effect of H. In this context, sparse deconvolution leads to the following problem:

(59) x̂ = arg min_x ‖Hx − y‖₂² such that ‖x‖₀ ≤ K,  (5)
where ‖x‖₀ is the number of non-zero entries in x.

(60) Due to the non-convexity of ‖x‖₀ and mathematical technicalities, this problem is intractable in practice. However, a version of the same problem which incorporates convex relaxation can be cast as:

(61) arg min_x ‖Hx − y‖₂² such that ‖x‖₁ ≤ K,
where ‖x‖₁ = Σ_k |x_k| is the ℓ₁-norm.

(62) This is commonly known as the LASSO problem, for which several efficient solvers exist. In some implementations of this invention, SPGL1 and CVX are used to solve the LASSO problem. Alternately, in some implementations, a computer executes a modified version of the greedy orthogonal matching pursuit (OMP) algorithm that approximates the ℓ₀ problem. This modified version uses (1) non-negativity constraints and (2) proximity constraints.

(63) For non-negativity, two modifications are made to OMP: (a) when searching for the next atom, consider only positive projections or inner products, and (b) when updating the residual error, use a solver to impose a positivity constraint on the coefficients.

(64) The proximity constraints are imposed as follows: For the first projection, allow OMP to proceed without modifications. After the first atom has been computed, the subsequent atom must be in proximity to the leading atom in the sense that the columns are near one another in the matrix H. In practice, this involves enforcing a Gaussian penalty on the residual error. This can be formulated as a maximum a posteriori (MAP) estimation problem:

(65) arg max_x p(x | y) ∝ arg max_x p(y | x) p(x)
where p(y | x) is the likelihood, which is a functional form of the combinatorial projection onto the dictionary, and p(x) is a prior, modeled as
p(x) ∝ N(x; μ, σ²), where μ = x_{K=1},
where N is the usual normal distribution with mean μ and variance σ². Here, x_{K=1} represents the column index of the first atom.
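A bare-bones version of the greedy recovery can be sketched as follows. This is a simplified, illustrative OMP variant, not the patent's full scheme: it keeps the positive-projection atom selection described above, but omits the proximity prior and uses an unconstrained least-squares coefficient update.

```python
import numpy as np

def omp_positive(H, y, K):
    """Simplified greedy sparse deconvolution (OMP variant).

    Atom selection keeps only positive projections onto the residual,
    per the non-negativity modification; the Gaussian proximity prior
    is omitted for brevity."""
    residual = y.astype(float).copy()
    support, coeffs = [], np.zeros(0)
    for _ in range(K):
        proj = H.T @ residual              # inner products with all atoms
        proj[support] = -np.inf            # never re-pick a chosen atom
        j = int(np.argmax(proj))
        if proj[j] <= 0:
            break                          # no positive projection remains
        support.append(j)
        coeffs, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
        residual = y - H[:, support] @ coeffs
    x = np.zeros(H.shape[1])
    x[support] = coeffs
    return x

# Toy problem: circulant smearing matrix from a narrow kernel, two-spike environment.
d = 64
n = np.arange(d)
h = np.exp(-0.5 * (np.minimum(n, d - n) / 1.5) ** 2)   # assumed smoothing kernel
H = np.column_stack([np.roll(h, k) for k in range(d)])
x_true = np.zeros(d); x_true[10] = 1.0; x_true[40] = 0.6
y = H @ x_true
x_hat = omp_positive(H, y, K=2)
```

With well-separated returns and a well-conditioned kernel, the two Diracs (positions and amplitudes) are recovered exactly.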

(66) In illustrative implementations of this invention, the prior is a physics-inspired heuristic that is carefully chosen in the context of binary codes and by extension the knowledge of the convolution kernel.

(67) Deconvolution for Time Profile Movies

(68) In a transient movie, each pixel can be represented as a time profile vector which encodes intensity as a function of time. For a time-of-flight camera, the analogue is a phase profile vector, or in other words the environment function ξ[t]. In the discussion above (regarding a sparse formulation), ξ[t] is assumed to be a sparse function resulting from a few objects at finite depths. However, in the case of global illumination and scattering, the environment response is a non-sparse function (see FIG. 4C). In some implementations of this invention, in order to create a transient movie in the context of global illumination, Tikhonov regularization is used for the deconvolution problem. For example, in some implementations, the deconvolution problem is solved in the framework of Hodrick-Prescott filtering, which can be thought of as Tikhonov regularization with a smoothness prior:

(69) arg min_x ‖y − Hx‖₂² + λ‖Dx‖₂²,
where D is a second-order difference matrix with circulant Toeplitz structure and λ is the smoothing parameter. Generalized cross-validation is used to select the optimal value of λ.
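This smoothness-regularized least-squares problem has the closed-form solution x = (HᵀH + λDᵀD)⁻¹Hᵀy. A small sketch (illustrative kernel and profile; λ is fixed by hand here rather than selected by generalized cross-validation):

```python
import numpy as np

def second_difference(d):
    """Circulant second-order difference matrix D: (Dx)[t] = x[t-1] - 2x[t] + x[t+1]."""
    D = -2.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1)
    D[0, -1] = 1.0
    D[-1, 0] = 1.0                             # wrap-around entries (circulant)
    return D

def smooth_deconvolve(H, y, lam):
    """argmin_x ||y - Hx||^2 + lam * ||Dx||^2, solved via the normal equations."""
    D = second_difference(H.shape[1])
    return np.linalg.solve(H.T @ H + lam * (D.T @ D), H.T @ y)

# Toy non-sparse environment (as with scattering media): a smooth time profile.
d = 64
n = np.arange(d)
h = 0.5 ** np.minimum(n, d - n)                # assumed blur kernel (two-sided decay)
H = np.column_stack([np.roll(h, k) for k in range(d)])
x_true = np.exp(-0.5 * ((n - 30) / 5.0) ** 2)  # smooth profile, e.g. FIG. 4C-style
y = H @ x_true
x_hat = smooth_deconvolve(H, y, lam=1e-3)      # close to x_true for smooth profiles
```

Because the prior penalizes curvature rather than support size, this recovers smooth, non-sparse profiles that sparse deconvolution would distort.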

(70) In some implementations of this invention (e.g., in which the environmental function is not sparse), a computer performs Wiener deconvolution with smoothness constraints (rather than sparse deconvolution with sparseness constraints) to solve for the time profile.

(71) In some implementations, a Hodrick-Prescott filter is used to approximate a Wiener deconvolution. However, Wiener deconvolution in this invention is not limited to a Hodrick-Prescott filter. In some implementations, other algorithms or filters are used for Wiener deconvolution.

(72) Code Strategies

(73) In illustrative implementations of this invention, a symmetric coding strategy is employed: that is, the same code is used for r(t) and i(t). Because the smearing matrix in equation 4 is Toeplitz, its eigenvalues correspond to spectral amplitudes. Since the condition number relates the maximal and minimal eigenvalues, a low condition number in this context corresponds to a broadband spectrum.

(74) In illustrative implementations of this invention, the codes used for r(t) and i(t) are maximum-length sequences (m-sequences). The m-sequences are pseudorandom binary sequences (PN-sequences). PN-sequences are deterministic, yet have a flat spectrum typical of random codes. The m-sequence is generated recursively from primitive polynomials. An advantage of m-sequences is that they have desirable autocorrelation properties. Consider an m-sequence stored in vector z with a period P:

(75) a_{[z,z]}(k) ≜ Σ_i z_i z̄_{i−k} = { 1, for k = 0;  −1/P, for 0 < k < P }  (6)
where a_{[z,z]} defines the autocorrelation operator.

(76) As the period length P increases, the autocorrelation approaches an impulse function, which has an ideal broadband spectral response. Also, advantageously, m-sequences are easy to generate, deterministic, and spectrally flat.

(77) In illustrative implementations, the m-sequence has these advantages, among others: (i) the code length is easy to adjust and (ii) the autocorrelation function is nearly zero outside of the peak. The length of the m-sequence is carefully chosen: too short a sequence and the autocorrelation will be high outside the peak; too long a code leads to a longer acquisition time. In a prototype of this invention, the following m-sequence was used: 0101110110001111100110100100001. This m-sequence had length 31 (m=5).
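An m-sequence of period 31 (m=5) can be generated with a five-bit linear-feedback shift register. The sketch below is illustrative: the seed and the primitive polynomial x⁵ + x² + 1 are assumptions, so the exact bit pattern depends on those choices and need not match the code listed above. The autocorrelation property of Equation (6) holds regardless.

```python
def generate_mseq(degree=5, delays=(5, 3), seed=None):
    """One period (2^degree - 1 bits) of an m-sequence from a Fibonacci LFSR.

    delays=(5, 3) encodes the recurrence s[n] = s[n-5] XOR s[n-3], i.e. the
    primitive polynomial x^5 + x^2 + 1 over GF(2) (an assumed choice; any
    primitive polynomial of the same degree also yields an m-sequence)."""
    if seed is None:
        seed = [1] + [0] * (degree - 1)    # any nonzero initial fill works
    period = 2 ** degree - 1
    s = list(seed)
    for n in range(degree, period):
        s.append(s[n - delays[0]] ^ s[n - delays[1]])
    return s

bits = generate_mseq()                      # 31 bits: 16 ones, 15 zeros
z = [1 if b else -1 for b in bits]          # map {0,1} -> {-1,+1}
P = len(z)
# Periodic autocorrelation: P at zero shift and exactly -1 at every other
# shift, i.e. 1 and -1/P after normalizing by P, matching Equation (6).
autocorr = [sum(z[i] * z[(i - k) % P] for i in range(P)) for k in range(P)]
```

The nearly flat off-peak autocorrelation is what makes the circulant system of Equation (4) well conditioned.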

(78) An m-sequence with a long period is not the only type of broadband code that can be used in this invention. In some implementations of this invention, other broadband codes are used. For example, in some embodiments, the DFT of the broadband code is perfectly flat, except for a DC component (if any). Also, for example, in some embodiments, a broadband code is chosen so as to (1) maximize the minimum of the magnitudes of the DFT values of the code, and (2) minimize the variance of the DFT values of the code. For example, in some embodiments, a Raskar broadband code is employed. As used herein, "Raskar broadband code" means the following sequence: 1010000111000001010000110011110111010111001001100111. The Raskar broadband code has a low condition number; indeed, it has a slightly lower condition number than the specific m-sequence listed above.

(79) FIG. 5A shows portions of three different codes: (1) a conventional square code 501; (2) a Raskar broadband code 502; and (3) an m-sequence 503 (specifically, the m-sequence listed above). FIGS. 5B, 5C and 5D show (a) frequency domain spectrum, (b) autocorrelation and (c) low-pass filter (LPF) autocorrelation, respectively, for the three different correlation codes shown in FIG. 5A. Specifically: FIG. 5B shows a spectrum of a square code 511, a spectrum of a Raskar broadband code 512, and a spectrum of an m-sequence code 513. FIG. 5C shows autocorrelation of a square code 521, autocorrelation of a Raskar broadband code 522, and autocorrelation of an m-sequence code 523. FIG. 5D shows LPF autocorrelation of a square code 531, LPF autocorrelation of a Raskar broadband code 532, and LPF autocorrelation of an m-sequence code 533.

(80) In illustrative implementations, the broadband or m-sequence codes lead to a well-conditioned inverse problem for Equation 4 (due to their frequency domain spectra; see FIG. 5B).

(81) The frequency domain spectrum (FIG. 5B) of the correlation codes affects the condition number in the linear inverse problem, discussed above. The autocorrelation function (FIG. 5C) is the autocorrelation of the corresponding bit sequence shown in FIG. 5A. In the context of physical constraints, a low-pass filter smooths out the response of the correlation function to provide the measured autocorrelation.

(82) Square codes (e.g., 501) are used in conventional commercial implementations and lead to a substantially sinusoidal correlation function. This is a double-edged sword. While sinusoidal correlation functions allow a neat parametric method to estimate the phase (one only needs 3 samples to parameterize a sinusoid), they lead to problems of unicity and are thus not suitable for multipath scenarios (FIG. 3A). The spectrum of a square code 511 has many nulls, leading to an ill-conditioned problem. Moreover, the LPF autocorrelation of a square code 531 is sinusoidal. This sinusoid creates a unicity problem.

(83) At first glance, Delta codes (not shown in FIGS. 5A, 5B, 5C, 5D) seem promising for deconvolution, as their spectrum is broadband. However, aside from the obvious issue of SNR (signal to noise ratio), it is not possible to generate a true Delta code in hardware. A narrow box function approximates a Delta code. However, in the Fourier domain a narrow box is a sinc function with characteristic nulls, which make the problem poorly conditioned.
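The conditioning argument in the preceding paragraphs can be checked numerically. The sketch below is an illustration, not the specification's code: the 5-stage LFSR, its tap positions, and the 31-chip length are example choices (the specification's m-sequence is listed elsewhere). It compares the condition numbers of the circulant systems induced by an m-sequence and by a square code of equal length, using the fact that the eigenvalues of a circulant matrix are the DFT of its first row:

```python
import numpy as np

def lfsr_m_sequence(n_stages=5, taps=(3, 5)):
    """31-chip m-sequence from a Fibonacci LFSR (taps are 1-indexed stage
    positions; (3, 5) corresponds to a primitive degree-5 polynomial)."""
    state = [1] * n_stages
    bits = []
    for _ in range(2 ** n_stages - 1):
        bits.append(state[-1])          # output of the last stage
        feedback = state[taps[0] - 1] ^ state[taps[1] - 1]
        state = [feedback] + state[:-1]
    return np.array(bits, dtype=float)

def circulant_condition(code):
    """Condition number of the circulant matrix whose first row is `code`
    (its eigenvalues are the DFT of that row)."""
    mags = np.abs(np.fft.fft(code))
    return mags.max() / mags.min()

m_seq = lfsr_m_sequence()
square = np.zeros(31)
square[:16] = 1.0                       # one period of a ~50% duty square code

print(circulant_condition(m_seq))       # small: flat spectrum away from DC
print(circulant_condition(square))      # large: near-nulls at even harmonics
```

The ±1 form of an m-sequence has a delta-like circular autocorrelation (peak at lag 0, a constant −1 elsewhere), which is exactly why its spectrum is flat away from DC and the resulting linear system is well conditioned.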

(84) Prototype

(85) The following three paragraphs are a description of a prototype of this invention:

(86) In this prototype, only one fundamental modulation frequency is used. A time-of-flight camera sends binary codes at arbitrary shifts. The sensor is a Photonic Mixer Device PMD19k-2 which has a pixel array size of 160×120. This sensor is controlled by a Stratix® III FPGA (field programmable gate array) operated at a clock frequency of 1800 MHz. Sony® SLD1239JL-54 laser diodes are used for illumination. These laser diodes are stable at the modulation frequency (50 MHz) used in this prototype. The analog pixel values are converted to 16 bit unsigned values by an ADC (analog-to-digital converter) during the pixel array readout process. The cross-correlation is measured by the PMD.

(87) Using data acquired at a single modulation frequency, this prototype captures time profile movies of a scene.

(88) In this prototype, the Stratix® III FPGA rapidly sweeps the delay between the reference and illumination signals. The modulation signals are generated by the phase lock loop (PLL) inside the Stratix® III FPGA, with a configurable phase and frequency derived from a voltage controlled oscillator. The oscillator operates at 1800 MHz. The theoretical, best-case time resolution calculated from the hardware specs is 69.4 ps. From a sampling perspective, this limit describes the spacing between two samples on the correlation waveform.

(89) This invention is not limited to the prototype described in the previous three paragraphs. This invention may be implemented in many other ways.

Examples of Hardware and Methods

(90) In some examples of this invention, the FPGA supports oscillation frequencies higher than 1800 MHz. For example, a Kintex® 7 FPGA supports frequencies up to 2133 MHz, which would theoretically allow for a time resolution of 58.6 ps. In illustrative implementations of this invention, the higher the oscillation frequency, the better the time resolution.
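The time-resolution figures quoted above are consistent with dividing one oscillator period into eight equally spaced PLL phase taps. The eight-tap count is an inference from the quoted numbers, not a figure stated in this section:

```python
def delay_step_ps(vco_hz, phase_taps=8):
    """Smallest delay step between correlation samples, assuming the PLL
    divides each VCO period into `phase_taps` equally spaced phases
    (8 taps is an assumption that reproduces the quoted figures)."""
    return 1e12 / (vco_hz * phase_taps)

print(round(delay_step_ps(1800e6), 1))  # 69.4 ps (Stratix III prototype)
print(round(delay_step_ps(2133e6), 1))  # 58.6 ps (Kintex 7 example)
```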

(91) In some embodiments, this invention calculates scene depths and visualizes light sweep images in real time. In some embodiments with this real-time performance, one or more of the following features are present: (a) a sampling period of less than 60 ps; (b) a compressed sensing approach to sampling time profiles; and (c) electronic hardware (such as an FPGA, integrated circuit or signal processor) optimized for rapid execution of readout, correlation calculation, or sparse deconvolution.

(92) In some implementations, hardware for this invention includes: (a) a pulsed light source with 50 MHz bandwidth; (b) a lock-in CMOS ToF sensor; and (c) a microcontroller or FPGA. The software on the microcontroller or FPGA handles the read-out from the sensor and strobes the illumination in a coded pattern. To sample the correlation waveform, the FPGA software quickly shifts either the reference or illumination codes. The light source illuminates the scene with amplitude modulated light. The image sensor converts the returning light into electrons. The number of electrons collected at each pixel is converted into a digital value through an ADC. Data processing of the digital frames converts the frames into a range and amplitude image.
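The read-out and code-shifting loop described above can be simulated in a few lines. The sketch below is an illustrative model, not the FPGA firmware: it treats the returned light as the illumination code circularly convolved with the scene's environmental function, and samples the correlation waveform by shifting the reference code one chip at a time:

```python
import numpy as np

def correlation_samples(code, environment):
    """Simulate per-pixel correlation sampling: the returned light is the
    code circularly convolved with the environmental function, and frame m
    integrates its product with the reference code delayed by m chips."""
    returned = np.real(np.fft.ifft(np.fft.fft(code) * np.fft.fft(environment)))
    return np.array([np.dot(np.roll(code, m), returned)
                     for m in range(len(code))])

rng = np.random.default_rng(0)
code = rng.integers(0, 2, 64).astype(float)   # stand-in binary code

# For a single direct bounce (a delta environmental function), the sampled
# waveform is exactly the circular autocorrelation of the code.
delta = np.zeros(64)
delta[0] = 1.0
samples = correlation_samples(code, delta)
```

This forward model is what the deconvolution step later inverts: a multipath scene simply replaces the delta with a sum of shifted, scaled returns.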

(93) FIG. 6 is a diagram of hardware components, in an illustrative implementation of this invention. A phase lock loop (PLL) 601 on an FPGA 602 generates a modulation signal to control amplitude modulation of an illumination source 603. In some cases, the illumination source 603 comprises one or more laser diodes 604 and a driver 605. In other cases, the light source 603 comprises LEDs. The PLL also generates a reference signal for a light sensor (e.g., a Photonic Mixer Device) 610. Analog pixel values from the sensor 610 are processed by a signal processing integrated circuit 611, which includes an ADC. The FPGA 602 sends correlation data to a computer 612 for further processing and visualization.

(94) In illustrative implementations of this invention, there is a relative delay between reference and illumination codes. Either code can be delayed. In some embodiments, the reference code is delayed and the illumination code is not delayed. In other embodiments, the illumination code is delayed, and the reference code is not delayed. In other embodiments, both the reference code and the illumination code are delayed, but by different amounts. In the example shown in FIG. 6, the delay electronics are implemented with a PLL on the FPGA. Alternately, in some embodiments, separate delay electronics are used.

(95) In illustrative implementations, light emitted by a light source is amplitude modulated. In these implementations, the radiant power of the light is modulated. For example, the modulation frequency (frequency of the periodic signal formed by modulating the radiant power of the light source) may be 1800 MHz. The modulation frequency of the light is not the same as the frequency band (aka color) of the light which is being modulated. For example, if the light source emits blue visible light in a frequency band from 610-670 terahertz, and the radiant power of the light source is modulated (e.g., strobed) at 1800 MHz, then the modulation frequency is 1800 MHz. Put differently, the color of the light is a carrier frequency and the amplitude modulation frequency is an envelope frequency.

(96) FIG. 7 is a flow chart, showing steps in an illustrative implementation of this invention. In the example shown in FIG. 7, a light source (such as laser diodes or LEDs) emits amplitude-modulated light 703 to illuminate a scene. Modulation of the light source (and the emitted light) is controlled by an electronic modulation signal 701 generated by FPGA TTL (transistor-transistor logic). Light reflects back from the scene to a light sensor (i.e., mixer 707). The mixer 707 is a PMD implemented using CMOS photogates. The mixer 707 receives a reference signal 705 from the FPGA. The mixer 707 computes per pixel values of the cross-correlation of the reference signal 705 and optical measurements of light reflected from the scene. In FIG. 7, M is the total number of frames. In FIG. 7, m is the delay, and ranges from 0 to M−1. A computer uses deconvolution 711 to compute a time profile image 713. The type of deconvolution 711 that is used may vary. For example, in some implementations: (a) if the environmental function is sparse, then the deconvolution 711 comprises sparse deconvolution; and (b) if the environmental function is not sparse, then the deconvolution 711 comprises Wiener deconvolution.
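The Wiener branch of deconvolution 711 can be sketched as follows. This is an illustrative sketch, not the specification's implementation: the code length, return delays, amplitudes, and SNR constant are hypothetical choices. The correlation kernel is the circular autocorrelation of an m-sequence, a two-return environmental function is synthesized, and the environment is recovered by a frequency-domain regularized inverse filter:

```python
import numpy as np

def lfsr_m_sequence(n_stages=5, taps=(3, 5)):
    """31-chip m-sequence from a Fibonacci LFSR with primitive taps."""
    state = [1] * n_stages
    bits = []
    for _ in range(2 ** n_stages - 1):
        bits.append(state[-1])
        state = [state[taps[0] - 1] ^ state[taps[1] - 1]] + state[:-1]
    return np.array(bits, dtype=float)

def wiener_deconvolve(measured, kernel, snr=1e4):
    """Wiener deconvolution: regularized inverse filter in the DFT domain."""
    K = np.fft.fft(kernel)
    W = np.conj(K) / (np.abs(K) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft(W * np.fft.fft(measured)))

code = lfsr_m_sequence()
# Correlation kernel: circular autocorrelation of the code.
kernel = np.real(np.fft.ifft(np.abs(np.fft.fft(code)) ** 2))

# Two-return environmental function: a direct bounce plus a weaker later
# bounce (delays and amplitudes are illustrative).
env = np.zeros(31)
env[5], env[12] = 1.0, 0.6
measured = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(env)))

recovered = wiener_deconvolve(measured, kernel)
```

For a sparse environment like this one, the specification's preferred route is sparse deconvolution, which is more robust to noise; Wiener deconvolution is the branch used when the environmental function is not sparse (e.g., global illumination or scattering media).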

(97) In illustrative implementations, the reference signal and modulation signal are symmetric, and are broadband codes.

(98) In illustrative implementations of this invention, either an analog or a digital light sensor may be used to measure light reflected from the scene. FIGS. 8A and 8B show examples of an analog light sensor 801 and digital light sensor 803, respectively. For example, analog light sensor 801 may have analog lock-in pixels, or may comprise an analog photonic mixer device. For example, digital light sensor 803 may utilize single-photon avalanche diodes for single-photon synchronous detection.

(99) Examples of Applications

(100) FIG. 9 shows frames of a movie of light sweeping over the scene shown in FIG. 2 (with the cup and unicorn). The scene is spatially static. Each frame of the movie is a digital image calculated from a light sweep image. In the scene, a near-transparent acrylic unicorn is positioned 2 meters in front of a wall. A 10 ns gap in time occurs between light sweeping over the unicorn and the back wall. The number "13" printed on the back wall is only visible at later time slots, while the body of the unicorn is opaque at early time slots. In the first frame (0.1 ns), specularities are visible in the unicorn. In the second frame (0.2 ns), the specularities have disappeared and only global illumination of the unicorn persists. By using sparse deconvolution and solving the multipath problem, a computer calculates the depth of the unicorn or of the wall behind it.

(101) Now consider global illumination (due to internal scattering). In FIG. 2, a near-transparent acrylic unicorn (thickness ~5 mm) is placed 10 centimeters away from the co-located camera and light source. Approximately 2 meters behind the unicorn is an opaque white wall. Two returns reach the camera: a direct reflection off the acrylic unicorn, and a reflection from the back wall that passes through the unicorn. The first frame of FIG. 9 is acquired at 0.1 ns, when light is starting to wash over the unicorn. The second frame of FIG. 9 is acquired at 0.2 ns. The second frame shows internal reflections of light from inside the unicorn, but not reflections from the surface of the unicorn. Observe the unicorn's leg in the second frame (0.2 nanoseconds). The leg, which was specular at 0.1 nanoseconds (first frame, FIG. 9), has now (in frame 2) decreased in intensity.

(102) In the example shown in FIGS. 2 and 9, this invention distinguishes direct and global illumination by solving a multipath deconvolution problem.

(103) FIGS. 10A and 10B are images of a scene captured through a diffuser. FIG. 10A is a conventional photograph, in which the image is blurred by the diffuser. FIG. 10B is a much clearer image recovered in an illustrative implementation of this invention. In the clearer image in FIG. 10B, text (“TIME-OF-FLIGHT”) that was hidden by the diffuser in FIG. 10A is visible. FIG. 10B is an example of a visual display, in which light intensities at voxels at only a certain depth (and not at other depths) in a light sweep image are displayed. By deconvolving and visualizing the amplitude of the Dirac from the back wall, the computer makes the hidden text visible.

(104) Computers

(105) In exemplary implementations of this invention, one or more electronic computers (e.g. 205, 602, 612, 709) are specially adapted: (1) to control the operation of, or interface with, hardware components of a ToF camera (including any light source and any light sensor) or of a display screen; (2) to select or calculate any m-sequence or other broadband code; (3) to perform any calculation described above, including any convolution, correlation or sparse deconvolution algorithm; (4) to receive signals indicative of human input; (5) to output signals for controlling transducers for outputting information in human perceivable format; and (6) to process data, to perform computations, to execute any algorithm or software, and to control the read or write of data to and from memory devices. The one or more computers may be in any position or positions within or outside of the ToF camera. For example, in some cases (a) at least one computer is housed in or together with other components of the ToF camera, such as the imaging sensor, and (b) at least one computer is remote from other components of the ToF camera. The one or more computers may be connected to each other or to other components in the ToF camera either: (a) wirelessly, (b) by wired connection, or (c) by a combination of wired and wireless connections.

(106) In exemplary implementations, one or more computers are programmed to perform any and all algorithms described herein, and any and all functions described in the immediately preceding paragraph. For example, in some cases, programming for a computer is implemented as follows: (a) a machine-accessible medium has instructions encoded thereon that specify steps in an algorithm; and (b) the computer accesses the instructions encoded on the machine-accessible medium, in order to determine steps to execute in the algorithm. In exemplary implementations, the machine-accessible medium comprises a tangible non-transitory medium. In some cases, the machine-accessible medium comprises (a) a memory unit or (b) an auxiliary memory storage device. For example, while a program is executing, a control unit in a computer may fetch the next coded instruction from memory.

(107) Clarification

(108) To say that a signal is “broadband” means that the signal is broadband in the Fourier frequency domain.

Definitions

(109) The terms “a” and “an”, when modifying a noun, do not imply that only one of the noun exists.

(110) Here are some non-limiting examples of a “camera”: (a) an optical instrument that records images; (b) a digital camera; (c) a video camera; (d) a camera that uses photographic film or a photographic plate; (e) a light field camera; (f) an imaging system; (g) a light sensor; (h) a time-of-flight camera; (i) apparatus that includes a light sensor or an array of light sensors; and (j) apparatus for gathering data about light incident on the apparatus. The term “camera” includes any computers that process data captured by the camera.

(111) The term “comprise” (and grammatical variations thereof) shall be construed as if followed by “without limitation”. If A comprises B, then A includes B and may include other things.

(112) The term “computer” includes any computational device that performs logical and arithmetic operations. For example, in some cases, a “computer” comprises an electronic computational device, such as an integrated circuit, a microprocessor, a mobile computing device, a laptop computer, a tablet computer, a personal computer, or a mainframe computer. For example, in some cases, a “computer” comprises: (a) a central processing unit, (b) an ALU (arithmetic/logic unit), (c) a memory unit, and (d) a control unit that controls actions of other components of the computer so that encoded steps of a program are executed in a sequence. For example, in some cases, the term “computer” also includes peripheral units, including an auxiliary memory storage device (e.g., a disk drive or flash memory). However, a human is not a “computer”, as that term is used herein.

(113) The term “contain” (and grammatical variations thereof) shall be construed as if followed by “without limitation”. If A contains B, then A contains B and may contain other things.

(114) “Defined Term” means a term that is set forth in quotation marks in this Definitions section.

(115) “DFT” means Discrete Fourier Transform.

(116) For an event to occur “during” a time period, it is not necessary that the event occur throughout the entire time period. For example, an event that occurs during only a portion of a given time period occurs “during” the given time period.

(117) The term “e.g.” means for example.

(118) The fact that an “example” or multiple examples of something are given does not imply that they are the only instances of that thing. An example (or a group of examples) is merely a non-exhaustive and non-limiting illustration.

(119) Unless the context clearly indicates otherwise: (1) a phrase that includes “a first” thing and “a second” thing does not imply an order of the two things (or that there are only two of the things); and (2) such a phrase is simply a way of identifying the two things, respectively, so that they each can be referred to later with specificity (e.g., by referring to “the first” thing and “the second” thing later). For example, unless the context clearly indicates otherwise, if an equation has a first term and a second term, then the equation may (or may not) have more than two terms, and the first term may occur before or after the second term in the equation. A phrase that includes a “third” thing, a “fourth” thing and so on shall be construed in like manner.

(120) A “flat spectrum signal” means a signal, such that (a) the signal is an m-sequence, (b) the DFT of the signal does not include any component that is zero-valued and between two peaks of the DFT, or (c) the maximum magnitude of the DFT does not exceed the minimum magnitude of the DFT by more than 1000 decibels.

(121) “For instance” means for example.

(122) “Frame” shall be construed broadly. For example, the term “frame” includes measured data about a scene that is captured by a camera during a single time period or single exposure, even if (i) the data is not humanly perceptible, (ii) the data has not been computationally processed, and (iii) there is not a one-to-one mapping between the data and the scene being imaged.

(123) “Herein” means in this document, including text, specification, claims, abstract, and drawings.

(124) The terms “horizontal” and “vertical” shall be construed broadly. For example, “horizontal” and “vertical” may refer to two arbitrarily chosen coordinate axes in a Euclidian two dimensional space, regardless of whether the “vertical” axis is aligned with the orientation of the local gravitational field. For example, a “vertical” axis may be oriented along a local surface normal of a physical object, regardless of the orientation of the local gravitational field.

(125) Unless the context clearly indicates otherwise: (1) the term “implementation” means an implementation of this invention; (2) the term “embodiment” means an embodiment of this invention; and (3) the term “cases” means implementations of this invention.

(126) The term “include” (and grammatical variations thereof) shall be construed as if followed by “without limitation”.

(127) “Intensity” means any measure of or related to intensity, energy or power. For example, the “intensity” of light includes any of the following measures: irradiance, spectral irradiance, radiant energy, radiant flux, spectral power, radiant intensity, spectral intensity, radiance, spectral radiance, radiant exitance, radiant emittance, spectral radiant exitance, spectral radiant emittance, radiosity, radiant exposure or radiant energy density.

(128) “Light” means electromagnetic radiation of any frequency. For example, “light” includes, among other things, visible light and infrared light. Likewise, any term that directly or indirectly relates to light (e.g., “imaging”) shall be construed broadly as applying to electromagnetic radiation of any frequency.

(129) A “light sweep image” means a set of data that: (a) is calculated from measurements, taken by a ToF sensor, of light reflected from a scene; and (b) specifies, for one or more pixels in the ToF sensor, light intensity at multiple depths in the scene.

(130) As used herein, (i) a single scalar is not a “matrix”, and (ii) a rectangular array of entries, all of which are zero (i.e., a so-called null matrix), is not a “matrix”.

(131) “Mixed pixel” means a pixel in a light sensor, which pixel receives light from points at different depths in a scene.

(132) To “multiply” includes to multiply by an inverse. Thus, to “multiply” includes to divide.

(133) The term “or” is inclusive, not exclusive. For example, A or B is true if A is true, or B is true, or both A and B are true. Also, for example, a calculation of A or B means a calculation of A, or a calculation of B, or a calculation of A and B.

(134) A parenthesis is simply used to make text easier to read, by indicating a grouping of words. A parenthesis does not mean that the parenthetical material is optional or can be ignored.

(135) As used herein, the term “set” does not include a so-called empty set (i.e., a set with no elements). Mentioning a first set and a second set does not, in and of itself, create any implication regarding whether or not the first and second sets overlap (that is, intersect).

(136) As used herein, a “subset” of a set consists of less than all of the elements of the set.

(137) “Some” means one or more.

(138) To say that a signal is “substantially sinusoidal” means that, if the signal were sampled at least 1000 times per period at equally spaced time intervals, then for each sample during an entire period of the signal, the value of the signal would be equal to that of an ideal sinusoidal wave, plus or minus 10% of the amplitude of the ideal sinusoidal wave.

(139) To say that a signal is “substantially square” means that, if the signal were sampled at least 1000 times per period of the signal at equally spaced time intervals, then for at least 90% of the samples in a period of the signal, the value of the signal would be equal to that of an ideal square wave, plus or minus 10% of the amplitude of the ideal square wave.

(140) “Such as” means for example.

(141) “Time-of-flight camera” or “ToF camera” means a camera that takes measurements that are dependent on a phase difference, the phase difference being between a phase of light emitted by the camera and a phase of light reflected back to the camera.

(142) A “time-of-flight sensor” or “ToF sensor” means a sensor of a ToF camera, which sensor takes measurements that are dependent on a phase difference, the phase difference being between a phase of light emitted by the ToF camera and a phase of light reflected back to the ToF camera.

(143) Spatially relative terms such as “under”, “below”, “above”, “over”, “upper”, “lower”, and the like, are used for ease of description to explain the positioning of one element relative to another. The terms are intended to encompass orientations of an object in addition to those depicted in the figures.

(144) A matrix may be indicated by a bold capital letter (e.g., D). A vector may be indicated by a bold lower case letter (e.g., α). However, the absence of these indicators does not indicate that something is not a matrix or not a vector.

(145) Except to the extent that the context clearly requires otherwise, if steps in a method are described herein, then: (1) steps in the method may occur in any order or sequence, even if the order or sequence is different than that described; (2) any step or steps in the method may occur more than once; (3) different steps, out of the steps in the method, may occur a different number of times during the method; (4) any step or steps in the method may be done in parallel or serially; (5) any step or steps in the method may be performed iteratively; (6) a given step in the method may be applied to the same thing each time that the particular step occurs or may be applied to different things each time that the given step occurs; and (7) the steps described are not an exhaustive listing of all of the steps in the method, and the method may include other steps.

(146) This Definitions section shall, in all cases, control over and override any other definition of the Defined Terms. For example, the definitions of Defined Terms set forth in this Definitions section override common usage or any external dictionary. If a given term is explicitly or implicitly defined in this document, then that definition shall be controlling, and shall override any definition of the given term arising from any source (e.g., a dictionary or common usage) that is external to this document. If this document provides clarification regarding the meaning of a particular term, then that clarification shall, to the extent applicable, override any definition of the given term arising from any source (e.g., a dictionary or common usage) that is external to this document. To the extent that any term or phrase is defined or clarified herein, such definition or clarification applies to any grammatical variation of such term or phrase, taking into account the difference in grammatical form. For example, the grammatical variations include noun, verb, participle, adjective, or possessive forms, or different declensions, or different tenses. In each case described in this paragraph, Applicant is acting as Applicant's own lexicographer.

(147) Variations

(148) This invention may be implemented in many different ways. Here are some non-limiting examples:

(149) In one aspect, this invention is a method comprising, in combination: (a) using an electrical signal to control modulation of a light source, such that the light source illuminates a scene, the electrical signal is periodic, and each period of the electrical signal comprises an m-sequence; (b) using a ToF sensor to take measurements of light that is incident on the ToF sensor and reflected from the scene; and (c) using one or more computers to perform an algorithm that takes the measurements as input and that calculates per-pixel depth data, which depth data specifies multiple scene depths for each pixel in a set of mixed pixels. In some cases, the algorithm performs sparse deconvolution. In some cases, the algorithm includes Wiener deconvolution or Hodrick-Prescott filtering. In some cases, the measurements occur entirely during a time period in which the light source emits light at a single fundamental modulation frequency and not at any other fundamental modulation frequency. In some cases: (a) the ToF sensor has a correlation waveform that is the cross-correlation of (i) a reference signal that controls modulation of gain of pixels of the ToF sensor and (ii) received light at the ToF sensor; and (b) the correlation waveform is not substantially sinusoidal. In some cases: (a) the one or more computers calculate a light sweep image; (b) the one or more computers use the light sweep image to calculate values of control signals for controlling a digital image displayed on a screen; (c) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene; and (d) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser. Each of the cases described above in this paragraph is an example of the method described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.

(150) In another aspect, this invention is an apparatus comprising, in combination: (a) a light source, (b) a ToF sensor; (c) a set of one or more computer processors; and (d) machine-accessible media; wherein (i) the machine-accessible media do not comprise a transitory signal, and (ii) the machine-accessible media have instructions encoded thereon for the set of processors (A) to generate an electrical signal to control modulation of a light source, such that the electrical signal comprises an m-sequence, and (B) to perform an algorithm that takes, as an input, measurements by the ToF sensor, and that calculates, for each pixel in a set of mixed pixels, multiple depths in the scene. In some cases, the algorithm involves sparse deconvolution. In some cases, the algorithm involves Wiener deconvolution or Hodrick-Prescott filtering. In some cases, at least one pixel in the ToF sensor comprises a lock-in pixel. In some cases, at least one pixel in the ToF sensor comprises a single-photon avalanche diode. Each of the cases described above in this paragraph is an example of the apparatus described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.

(151) In another aspect, this invention is a method comprising, in combination: (a) using an electrical signal to control modulation of a light source, such that the light source illuminates a scene and that at least a portion of the electrical signal comprises a flat spectrum signal; (b) using a ToF sensor to take measurements of light that is incident on the ToF sensor and reflected from the scene; and (c) using one or more computers to perform an algorithm that takes the measurements as input and that calculates per-pixel depth data, which depth data specifies multiple scene depths for each pixel in a set of mixed pixels. In some cases, the algorithm performs sparse deconvolution. In some cases, the electrical signal is not substantially square. In some cases, the measurements occur entirely during a time interval in which the light source emits light at a single fundamental modulation frequency and not at any other fundamental modulation frequency. In some cases: (a) the ToF sensor has a correlation waveform that is the cross-correlation of (i) a reference signal that controls modulation of gain in pixels of the ToF sensor and (ii) received light at the ToF sensor; and (b) the correlation waveform is not substantially sinusoidal. In some cases: (a) the one or more computers calculate a light sweep image; (b) the one or more computers use the light sweep image to calculate values of control signals for controlling a digital image displayed on a screen; (c) the digital image includes light intensities for voxels at a specific depth of the scene, but not at other depths in the scene; and (d) one or more of the voxels correspond to a scene position that is viewable from the ToF sensor only through a diffuser. Each of the cases described above in this paragraph is an example of the method described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.

(152) In another aspect, this invention is an apparatus comprising, in combination: (a) a light source for emitting light to illuminate a scene, which light undergoes modulation; (b) a ToF sensor; and (c) one or more computers for (i) generating an electrical signal to control the modulation, such that the electrical signal is periodic and each period of the electrical signal comprises a flat spectrum signal, and (ii) performing an algorithm that calculates a depth map of the scene, which depth map specifies, for each pixel in a set of mixed pixels of the ToF sensor, multiple depths in the scene. In some cases, the algorithm involves sparse deconvolution. In some cases, the algorithm involves Wiener deconvolution or Hodrick-Prescott filtering. Each of the cases described above in this paragraph is an example of the apparatus described in the first sentence of this paragraph, and is also an example of an embodiment of this invention that may be combined with other embodiments of this invention.

(153) While exemplary implementations are disclosed, many other implementations will occur to one of ordinary skill in the art and are all within the scope of the invention. Each of the various embodiments described above may be combined with other described embodiments in order to provide multiple features. This invention includes not only the combination of all identified features but also each combination and permutation of one or more of those features. Furthermore, while the foregoing describes a number of separate embodiments of the apparatus and method of the present invention, what has been described herein is merely illustrative of the application of the principles of the present invention. Other arrangements, methods, modifications, and substitutions by one of ordinary skill in the art are therefore also within the scope of the present invention. Numerous modifications may be made by one of ordinary skill in the art without departing from the scope of the invention.