METHOD FOR DEPTH MEASUREMENT WITH A TIME-OF-FLIGHT CAMERA USING AMPLITUDE-MODULATED CONTINUOUS LIGHT

20220128692 · 2022-04-28

    Abstract

    A method for depth measurement with a time-of-flight camera using amplitude-modulated continuous light includes acquiring, for each of a plurality of pixels of a sensor array of the camera, at least one sample sequence having at least four amplitude samples (A.sub.0, A.sub.1, A.sub.2, A.sub.3) at a sampling frequency higher than a modulation frequency of the amplitude-modulated continuous light. The method further includes: determining for each sample sequence of each pixel a confidence value (C) indicating a degree of correspondence of the amplitude samples (A.sub.0, A.sub.1, A.sub.2, A.sub.3) with a sinusoidal time evolution of the amplitude; and determining for each of a plurality of binning areas, each of which comprises a plurality of pixels, a binned depth value (D.sub.b) based on the amplitude samples (A.sub.0, A.sub.1, A.sub.2, A.sub.3) of sample sequences of pixels from the binning area, wherein the contribution of a sample sequence to the binned depth value (D.sub.b) depends on its confidence value (C).

    Claims

    1. A method for depth measurement with a time-of-flight camera using amplitude-modulated continuous light, the method comprising: acquiring for each of a plurality of pixels of a sensor array of the camera at least one sample sequence comprising at least four amplitude samples at a sampling frequency higher than a modulation frequency of the amplitude-modulated continuous light; characterised in that the method further comprises: determining for each sample sequence of each pixel a confidence value indicating a degree of correspondence of the amplitude samples with a sinusoidal time evolution of the amplitude; and determining for each of a plurality of binning areas, each of which comprises a plurality of pixels, a binned depth value based on the amplitude samples of sample sequences of pixels from the binning area, wherein the contribution of a sample sequence to the binned depth value depends on its confidence value.

    2. A method according to claim 1, further comprising acquiring four amplitude samples at a sampling frequency four times higher than a modulation frequency of the amplitude-modulated continuous light.

    3. A method according to claim 1, further comprising acquiring, for at least one pixel, a plurality of sample sequences.

    4. A method according to claim 1, wherein the confidence value is determined by a relation of the amplitude samples of an individual sample sequence to each other.

    5. A method according to claim 1, further comprising: classifying each sample sequence as valid if the confidence value fulfills a predefined criterion and as invalid otherwise; and using the amplitude samples of a sample sequence to determine the binned depth value only if the sample sequence is valid.

    6. A method according to claim 5, wherein the sample sequence is classified based on a relation of the confidence value to a first threshold value.

    7. A method according to claim 1, wherein the binned depth value (D.sub.b) is determined based on a linear combination of amplitude samples of sample sequences of pixels from the binning area, wherein the contribution of each sample sequence to the linear combination depends on the confidence value of the respective sample sequence.

    8. A method according to claim 1, wherein the binned depth value is determined by averaging pixel depth values of sample sequences of pixels from the binning area, wherein a weight of each pixel depth value depends on the confidence value of the respective sample sequence of the respective pixel, and wherein the pixel depth value is determined based on the amplitude samples of the sample sequence of the pixel.

    9. A method according to claim 1, further comprising determining a first difference between a first amplitude sample and a third amplitude sample of a sample sequence of a pixel, and assigning sample sequences having a positive first difference to a first group and sample sequences having a negative first difference to a second group.

    10. A method according to claim 9, further comprising: defining a vector having a second difference between a second amplitude sample and a fourth amplitude sample as a first component and the first difference as a second component; defining a first group vector which is a linear combination, based on the confidence values of the respective sample sequence, of the vectors of the first group and a second group vector which is a linear combination, based on the confidence value of the respective sample sequence, of the vectors of the second group; and determining the binned depth value based on a phase difference between the second group vector and the first group vector.

    11. A method according to claim 10, further comprising: determining the binned depth value based on both the first group vector and the second group vector if the phase difference is below a second threshold; and determining the binned depth value based on only one of the first group vector and the second group vector if the phase difference is above the second threshold.

    12. A method according to claim 11, wherein the second threshold is 180°.

    13. A method according to claim 11, wherein if the phase difference (Δϕ) is above the second threshold, the binned depth value is determined based on the group vector of the group having more valid sample sequences.

    14. A method according to claim 1, further comprising: determining the binned depth value based on both the first group vector and the second group vector if the first components of both group vectors are negative; and determining the binned depth value (D.sub.b) based on only one of the first group vector and the second group vector if at least one first component is positive.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0031] Further details and advantages of the present invention will be apparent from the following detailed description of non-limiting embodiments with reference to the attached drawings, wherein:

    [0032] FIG. 1 is a schematic view of a TOF camera that can be used for the inventive method and an object;

    [0033] FIG. 2 is a diagram showing the time evolution of a function and four amplitude samples;

    [0034] FIG. 3 is a vector diagram;

    [0035] FIG. 4 is a diagram illustrating amplitude values on a sensor array;

    [0036] FIG. 5 is another diagram showing the time evolution of a function and a plurality of amplitude samples;

    [0037] FIG. 6 is a first diagram illustrating the results of a depth measurement according to prior art;

    [0038] FIG. 7 is a second diagram illustrating the results of a depth measurement according to prior art;

    [0039] FIG. 8 is a vector diagram illustrating a vector addition;

    [0040] FIG. 9 is a third diagram showing the results of a depth measurement according to prior art;

    [0041] FIG. 10 is a flowchart illustrating a first embodiment of the inventive method;

    [0042] FIG. 11 is a diagram illustrating a binary confidence mask;

    [0043] FIG. 12 is a diagram illustrating the construction of a binary confidence mask and the application of this confidence mask;

    [0044] FIG. 13 is another vector diagram illustrating a vector addition;

    [0045] FIG. 14 is a vector diagram illustrating the positions of two group vectors;

    [0046] FIG. 15 is a first diagram showing the results of a depth measurement;

    [0047] FIG. 16 is a second diagram showing the results of a depth measurement;

    [0048] FIG. 17 is a third diagram showing the results of a depth measurement;

    [0049] FIG. 18 is a flowchart illustrating a second embodiment of the inventive method; and

    [0050] FIG. 19 is a fourth diagram showing the results of a depth measurement.

    DETAILED DESCRIPTION

    [0051] FIG. 1 schematically shows a TOF camera 1 that is adapted for depth measurement using amplitude-modulated continuous light. It comprises a rectangular sensor array 2 with a plurality (e.g. several thousand or several tens of thousands) of pixels 3. Furthermore, it may comprise a memory and a processing unit, which are not shown for the sake of simplicity. The camera 1 is configured to emit amplitude-modulated continuous light 10 using one or several light emitters 5. The light 10 is reflected by a 3D object 20 or scenery in a field of view of the camera 1 and the reflected light 11 is received by the pixels 3 of the sensor array 2. The original modulation function s(t), with a phase delay τ, is correlated with the received function q(t), which yields a correlation function c(τ). The amplitude of the received function q(t) is sampled at a frequency four times higher than a modulation frequency f.sub.mod of the light 10. In other words, four amplitude samples A.sub.0 . . . A.sub.3, also referred to as taps, are used to retrieve the phase of the modulated light, as illustrated by FIG. 2. Each set of four amplitude samples A.sub.0 . . . A.sub.3 forms part of a sample sequence for the respective pixel 3.

    [0052] A sinusoidal signal with amplitude A and phase ϕ can be represented by a 2D vector that can be determined from the four amplitude samples obtained in the tap measurements, i.e.


    $$r(A,\phi) = (A\cos\phi,\; A\sin\phi) = (d_{13},\; d_{02}) \qquad \text{(eq. 1)}$$

    where d.sub.02=A.sub.0−A.sub.2, which is hereinafter referred to as a first difference, and d.sub.13=A.sub.1−A.sub.3, which is hereinafter referred to as a second difference, are the pairwise differences of two amplitude samples A.sub.k, k=0 . . . 3. The amplitude and phase of the received signal can therefore be computed as


    $$\phi = \operatorname{atan2}(d_{02},\, d_{13}) \qquad \text{(eq. 2)}$$


    $$A = \sqrt{(d_{02})^2 + (d_{13})^2} \qquad \text{(eq. 3)}$$

    While the amplitude A of the signal is proportional to the number of the received photons, the phase ϕ is proportional to the depth D of the object seen by the corresponding pixels.

    $$\phi = \frac{4\pi f_{mod}}{c}\, D \qquad \text{(eq. 4)}$$

    where D is the pixel depth value, i.e. the distance from the pixel of the camera, c is the speed of light and f.sub.mod is the modulation frequency of the signal. Accordingly, the depth can be calculated by

    $$D = \frac{c}{4\pi f_{mod}}\, \phi \qquad \text{(eq. 5)}$$

    [0053] FIG. 3 is a vector representation of the signal received from an object in a depth of D=2 m when the modulation frequency is f.sub.mod=20 MHz.
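    As a minimal sketch of eq. 1 to eq. 5, the following Python fragment recovers phase, amplitude and depth from four taps for the example of FIG. 3 (D=2 m, f.sub.mod=20 MHz). The tap model in `ideal_taps` (a constant offset plus a sinusoid sampled every quarter period) and all function names are assumptions made for illustration; they are chosen only so that the differences d.sub.02 and d.sub.13 match eq. 1.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def ideal_taps(phi, amplitude=1.0, offset=1.0):
    """Four taps of an ideal sinusoid (assumed tap model, not taken from the text)."""
    return tuple(offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)
                 for k in range(4))


def phase_and_amplitude(a0, a1, a2, a3):
    d02 = a0 - a2                                 # first difference  = A*sin(phi)
    d13 = a1 - a3                                 # second difference = A*cos(phi)
    phi = math.atan2(d02, d13) % (2 * math.pi)    # eq. 2, mapped to [0, 2*pi)
    amplitude = math.hypot(d02, d13)              # eq. 3
    return phi, amplitude


def depth_from_phase(phi, f_mod):
    return C_LIGHT * phi / (4 * math.pi * f_mod)  # eq. 5


F_MOD = 20e6                                      # modulation frequency in Hz
phi_true = 4 * math.pi * F_MOD * 2.0 / C_LIGHT    # eq. 4 for D = 2 m
taps = ideal_taps(phi_true)
phi_est, amp_est = phase_and_amplitude(*taps)
depth_est = depth_from_phase(phi_est, F_MOD)
print(f"phase = {math.degrees(phi_est):.1f} deg, depth = {depth_est:.3f} m")
```

    Under these assumptions the phase is roughly 96° and the recovered depth is 2 m, independent of the chosen offset and amplitude.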

    [0054] However, motion artifacts can occur along the edges of an object 20 which is moving in the scene of the camera 1. As the tap measurements are performed sequentially, a pixel 3 close to an object edge may “see” the object surface during the acquisition of one amplitude sample, while in a subsequent acquisition it may see the background. FIG. 4 shows by way of example the occurrence of a motion artifact. An object 20 with the shape of an ‘O’ is shifted 1 pixel up and 1 pixel left during the acquisition of each amplitude sample. The greyscale values represent the number of tap measurements with the object present in the respective pixel. A black pixel represents zero tap measurements with the object present, while a white pixel represents four (out of four) tap measurements with the object present. The right-hand side shows the full image, the left-hand side a magnified view around the upper left outer edge of the ‘O’.

    [0055] FIG. 5 illustrates the error introduced by the motion of the object 20. Depending on what the pixel 3 sees during the acquisition, the measured amplitude sample A.sub.k lies on the sinusoidal cross-correlation curve of either the foreground signal or the background signal. If one sums up the four successive amplitude samples A.sub.k of one pixel 3, one recognizes a blurring effect along the edge of the object 20. In a neighbourhood of 4×4 pixels 3 one can find a pixel 3 that “sees” the foreground object 20 in all taps, as well as pixels 3 that partly see the foreground object 20 or the background. If further acquisitions with different integration times are performed subsequently, corresponding to additional sample sequences, they may also correspond to a different depth.

    [0056] Due to the different depth and remission of the foreground and background object, the amplitude samples A.sub.k may vary drastically. As a consequence, the phase and depth computed according to eq.2 and eq.5 may be wrong. The result is illustrated by the diagram in FIG. 6, which is a high-resolution depth image of an “O” shaped target at 2 m depth in front of a background at 7 m, moving with a shift of 1 pixel per tap acquisition in both horizontal and vertical direction. Along the object edges, the calculated depths vary between 1.65 m and 5.41 m, due to the motion artifact. It should be noted that the measured depths lie not only between the foreground and the background depth, but can also lie outside of this depth range. The corresponding pixels may be referred to as flying pixels.
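    The effect can be reproduced with a small numerical sketch. It assumes a hypothetical pixel that sees the foreground (2 m) during taps 0 and 1 and the background (7 m) during taps 2 and 3, with equal amplitude and offset for both surfaces; the tap model and all names are illustrative assumptions, not part of the described method.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6             # modulation frequency in Hz


def tap(depth_m, k, amplitude=1.0, offset=1.0):
    """k-th tap of an ideal sinusoid for a surface at depth_m (assumed model)."""
    phi = 4 * math.pi * F_MOD * depth_m / C_LIGHT            # eq. 4
    return offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)


def depth_from_taps(a0, a1, a2, a3):
    phi = math.atan2(a0 - a2, a1 - a3) % (2 * math.pi)       # eq. 2
    return C_LIGHT * phi / (4 * math.pi * F_MOD)             # eq. 5


# Taps 0 and 1 are taken while the pixel sees the foreground,
# taps 2 and 3 after the object edge has moved on (background).
mixed_taps = (tap(2.0, 0), tap(2.0, 1), tap(7.0, 2), tap(7.0, 3))
depth_mixed = depth_from_taps(*mixed_taps)
print(f"depth of the mixed pixel: {depth_mixed:.2f} m")
```

    With these assumed values the computed depth is well below 2 m, i.e. outside the range spanned by foreground and background, which is the flying-pixel behaviour described above.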

    [0057] According to prior art, there are two main approaches to alleviate this problem, both of which make use of a binning method. A plurality of pixels, e.g. 4×4 pixels, are taken as a binning area, for which a single depth value is determined. In a first approach, the amplitude samples A.sub.k of all pixels in the binning area are summed and a single depth value is calculated using eq.2 (with the sums instead of the individual amplitude samples A.sub.k) and eq.5. This approach may be referred to as “tap binning”. The result of this is shown in FIG. 7. One recognizes that there are outliers in the measured depth lying in a range between 0.31 m and 6.93 m. In other words, there are still depth values outside the depth range of the object 20 and the background, and the flying-pixel effect is even increased compared to the high-resolution image of FIG. 6. One reason for this increase can be understood from the vector diagram of FIG. 8. Adding the amplitude samples A.sub.k corresponds to a vector addition as shown in FIG. 8 for a first vector representing the depth of the object 20 and a second vector representing the background. Since the phases of the two vectors differ by over 180°, the phase of the resulting vector is smaller than that of either vector. This leads to a depth value outside the depth range.
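    The vector-addition argument can be checked numerically. The sketch below applies tap binning to a hypothetical binning area of 16 pixels, 10 of which see the foreground at 2 m and 6 the background at 7 m; the pixel counts, the equal amplitudes and the tap model are assumptions chosen for illustration.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6             # modulation frequency in Hz


def ideal_taps(depth_m, amplitude=1.0, offset=1.0):
    """Four taps of an ideal sinusoid for a surface at depth_m (assumed model)."""
    phi = 4 * math.pi * F_MOD * depth_m / C_LIGHT
    return tuple(offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)
                 for k in range(4))


def tap_binning_depth(sequences):
    """Sum each tap over all pixels, then apply eq. 2 and eq. 5 to the sums."""
    s0, s1, s2, s3 = (sum(seq[k] for seq in sequences) for k in range(4))
    phi = math.atan2(s0 - s2, s1 - s3) % (2 * math.pi)
    return C_LIGHT * phi / (4 * math.pi * F_MOD)


area = [ideal_taps(2.0)] * 10 + [ideal_taps(7.0)] * 6
depth_tap_binned = tap_binning_depth(area)
print(f"tap-binned depth: {depth_tap_binned:.2f} m")
```

    Although every pixel in this area is individually correct, the binned result lies below 2 m, because the two phases (about 96° and 336°) differ by more than 180°.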

    [0058] According to another approach, pixel depth values are determined for each individual pixel in the binning area and these pixel depth values are averaged to determine a depth value for the binning area. This approach may be referred to as “pixel binning”. The results are shown in FIG. 9. The averaging leads to a blurring of the depth values lying in the range from 1.81 m to 6.9 m.
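    For comparison, a minimal sketch of pixel binning on a hypothetical area with 10 foreground pixels at 2 m and 6 background pixels at 7 m. The counts, the tap model and the function names are illustrative assumptions.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6             # modulation frequency in Hz


def ideal_taps(depth_m, amplitude=1.0, offset=1.0):
    """Four taps of an ideal sinusoid for a surface at depth_m (assumed model)."""
    phi = 4 * math.pi * F_MOD * depth_m / C_LIGHT
    return tuple(offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)
                 for k in range(4))


def pixel_depth(a0, a1, a2, a3):
    phi = math.atan2(a0 - a2, a1 - a3) % (2 * math.pi)   # eq. 2
    return C_LIGHT * phi / (4 * math.pi * F_MOD)         # eq. 5


def pixel_binning_depth(sequences):
    """Average the individually computed pixel depth values."""
    depths = [pixel_depth(*seq) for seq in sequences]
    return sum(depths) / len(depths)


area = [ideal_taps(2.0)] * 10 + [ideal_taps(7.0)] * 6
depth_pixel_binned = pixel_binning_depth(area)
print(f"pixel-binned depth: {depth_pixel_binned:.3f} m")
```

    The average of 3.875 m stays inside the depth range but corresponds to neither surface, which is the blurring referred to above.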

    [0059] The abovementioned problems are reduced or eliminated by the inventive method. FIG. 10 is a flow chart illustrating a first embodiment of the inventive method.

    [0060] After the start of the method, a binning area 4 is selected at 100. This may be e.g. an area comprising 4×4 pixels 3 (see also FIG. 12). Next, a pixel 3 within the binning area 4 is selected at 110. At 120, the amplitude samples A.sub.k are determined for a sample sequence of this pixel 3. At 130, a confidence value C is calculated based on the amplitude samples A.sub.k. An individual confidence value C is calculated for every sample sequence of the respective pixel 3, i.e. if there is only one sample sequence, one confidence value C is calculated for each pixel. One possible definition of the confidence value C is as follows:

    $$C = 1 - \frac{\left| A_1 - A_0 + A_3 - A_2 \right|}{A} \qquad \text{(eq. 6)}$$

    [0061] where the amplitude A is computed according to eq.3, but can be approximated by


    $$A = \sqrt{(d_{02})^2 + (d_{13})^2} \approx \max\left(|d_{02}|,\, |d_{13}|\right) \qquad \text{(eq. 7)}$$

    [0062] By this definition, the confidence value C is always in the range between 0 and 1, with the highest possible value 1 representing a perfect sinusoidal function. At 140, this confidence value C is compared with a first threshold C.sub.min, which could be calculated, estimated or determined by calibration using a stationary scenery. In the following examples, the first threshold C.sub.min could be 0.25. The first threshold C.sub.min may also be referred to as a “motion parameter”, since it may be suitable to distinguish sample sequences that are affected by object motion from those that are unaffected by object motion. If the confidence value C is smaller than the first threshold C.sub.min, the respective sample sequence is classified as invalid at 190 and is not considered further. If, on the other hand, the confidence value C is greater than the first threshold C.sub.min, the respective sample sequence is classified as valid at 150. The amplitude values, or the first and second differences d.sub.02, d.sub.13, which can be regarded as the second and first component of a vector, respectively, are kept for further processing.
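    A minimal sketch of eq. 6 and eq. 7 together with the comparison at 140 follows. The function names, the handling of a zero amplitude and the treatment of a confidence value exactly equal to C.sub.min are assumptions, since the text does not specify them; the tap values are made-up illustration data.

```python
import math


def confidence(a0, a1, a2, a3, approximate=False):
    """Confidence value C of one sample sequence (eq. 6)."""
    d02 = a0 - a2
    d13 = a1 - a3
    if approximate:
        amplitude = max(abs(d02), abs(d13))   # approximation of eq. 7
    else:
        amplitude = math.hypot(d02, d13)      # eq. 3
    if amplitude == 0.0:
        return 0.0                            # assumption: no signal, lowest confidence
    return 1.0 - abs(a1 - a0 + a3 - a2) / amplitude


def is_valid(taps, c_min=0.25):
    """Assumption: only C strictly greater than C_min counts as valid."""
    return confidence(*taps) > c_min


sinusoidal = (1.5, 1.0, 0.5, 1.0)   # A0 + A2 equals A1 + A3, as for a sinusoid
disturbed = (1.5, 1.0, 1.2, 0.5)    # last two taps taken from a different signal

print(confidence(*sinusoidal), is_valid(sinusoidal))
print(is_valid(disturbed))
```

    The first sequence reaches the maximum confidence of 1 and is kept, while the disturbed sequence falls below the threshold of 0.25 with either amplitude formula and is discarded.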

    [0063] This procedure can be regarded as a creation of a binary confidence mask, which is graphically illustrated in FIG. 11. The upper part of FIG. 11 corresponds to FIG. 4, while the lower part shows the corresponding confidence mask, with the left part being a magnification of a portion near the edge of the object 20. The black colour indicates pixels that are regarded as invalid, while the white colour indicates pixels with sample sequences that are regarded valid. One recognizes that the areas where the taps are blurred are masked by the confidence mask.

    [0064] FIG. 12 further illustrates the construction of the confidence mask and the binning process for a binning area 4 of 4×4 pixels 3, where for the sake of simplicity, a single sample sequence for each pixel 3 is assumed. First, as illustrated at a), individual amplitude samples are determined for each pixel (with the different shades representing sample numbers or points in time, respectively). Then, as shown at b), confidence values are determined for each pixel (with dark tones representing high confidence values). At c) a confidence mask is shown with black representing pixels (or sample sequences, respectively) that are invalid and white representing pixels that are valid. Using this confidence mask together with the individual amplitude samples effectively yields binned amplitude samples for the entire binning area 4, as illustrated at d) (where different shades again represent sample numbers or points in time, respectively). If several sample sequences, corresponding to several integration times, are considered, a confidence mask can be constructed for each integration time.
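    Steps a) to d) can be sketched for one 4×4 binning area as follows. The tap values `FG`, `BG` and `MIXED`, the layout of the area and the function names are made-up illustration data and assumptions, not values from the figures.

```python
import math


def confidence(taps):
    a0, a1, a2, a3 = taps
    amplitude = math.hypot(a0 - a2, a1 - a3)                  # eq. 3
    if amplitude == 0.0:
        return 0.0                                            # assumption: no signal
    return 1.0 - abs(a1 - a0 + a3 - a2) / amplitude           # eq. 6


def confidence_mask(area, c_min=0.25):
    """Step c): True marks a valid pixel, False an invalid one."""
    return [[confidence(taps) > c_min for taps in row] for row in area]


def binned_taps(area, mask):
    """Step d): sum each tap over the valid pixels of the binning area."""
    sums = [0.0, 0.0, 0.0, 0.0]
    for row, mask_row in zip(area, mask):
        for taps, valid in zip(row, mask_row):
            if valid:
                for k in range(4):
                    sums[k] += taps[k]
    return sums


FG = (1.5, 1.0, 0.5, 1.0)        # sinusoidal foreground sequence
BG = (1.0, 1.25, 1.0, 0.75)      # sinusoidal background sequence
MIXED = (1.5, 1.0, 1.0, 0.75)    # taps 0-1 from FG, taps 2-3 from BG

area = [[FG, FG, FG, FG],
        [FG, FG, MIXED, MIXED],
        [MIXED, MIXED, BG, BG],
        [BG, BG, BG, BG]]        # step a): taps per pixel

mask = confidence_mask(area)
taps_binned = binned_taps(area, mask)
print(mask)
print(taps_binned)
```

    In this example the four mixed pixels along the edge are masked out, and the binned taps are built from the remaining twelve pixels only.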

    [0065] At 160, the sign of the first difference d.sub.02 is determined. If the sign is positive, the sample sequence and its vector are assigned to a first group at 170, and if the sign is negative, the sample sequence and its vector are assigned to a second group at 180. As indicated by the dashed arrows, the steps 160, 170 and 180 can also be skipped in a simplified version of the method.

    [0066] The steps mentioned so far are repeated for all pixels 3 in the binning area 4 and, where applicable, for all sample sequences of each pixel 3. When it is determined at 200 that the last pixel 3 has been processed, the method continues at 210 by adding the vectors in the first and second group, respectively, to calculate a first group vector r.sub.P=[x.sub.P, y.sub.P] and a second group vector r.sub.M=[x.sub.M, y.sub.M]. In other words, all vectors with a positive first difference d.sub.02 are summed and all vectors with a negative first difference d.sub.02 are summed. Therefore, the components of the first and second group vector r.sub.P, r.sub.M are calculated as follows, where the sum over multiple integration times is optional:

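    $$y_P = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{with } d_{02}^{It} > 0}} d_{02}^{It} \qquad \text{(eq. 9a)}$$

    $$x_P = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{with } d_{02}^{It} > 0}} d_{13}^{It} \qquad \text{(eq. 9b)}$$

    $$y_M = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{with } d_{02}^{It} < 0}} d_{02}^{It} \qquad \text{(eq. 9c)}$$

    $$x_M = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{with } d_{02}^{It} < 0}} d_{13}^{It} \qquad \text{(eq. 9d)}$$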

    [0067] It should be noted that in either of the first and second group, only vectors of valid sample sequences are summed, while invalid sample sequences are disregarded for the binning process. The phases of any two vectors in the first group differ by less than 180°, wherefore the addition of these vectors cannot lead to flying pixels. The same applies to the vectors in the second group. The fact that the phases of the summed vectors differ by less than 180° guarantees that the resulting group vectors are not affected by the binning effect.

    [0068] At 220, the phase difference ΔΦ between the second and the first group vector is calculated (assuming both phases to be between 0° and 360°) and compared to a second threshold Φ.sub.max. In particular, the second threshold Φ.sub.max may be equal to 180°. If the phase difference ΔΦ is smaller than the second threshold Φ.sub.max, as in the example of FIG. 13, the first and second group vector r.sub.P, r.sub.M are simply added at 230 to calculate a binned vector r.sub.b=[x.sub.b, y.sub.b], i.e.:


    r.sub.b=r.sub.P+r.sub.M  (eq. 10)

    [0069] If the phase difference ΔΦ is greater, as shown in the example of FIG. 14, this could indicate that the groups correspond to pixels 3 of background and foreground objects, respectively. Either way, adding the two group vectors r.sub.P, r.sub.M would result in a flying pixel. For these reasons, one group vector r.sub.P, r.sub.M is selected as the binned vector r.sub.b at 240, namely the group vector of the bigger group, i.e. the group with the higher number of valid sample sequences:

    $$r_b = \begin{cases} r_P ; & N_P > N_M \\ r_M ; & N_M > N_P \end{cases} \qquad \text{(eq. 11)}$$

    where N.sub.P, N.sub.M are the numbers of valid sample sequences in the first and second group, respectively. Finally, the binned depth value D.sub.b is determined based on the binned vector r.sub.b using eq.2 and eq.5.
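    Steps 160 to 240 for a single integration time can be sketched as follows. The input is a list of vectors (d.sub.13, d.sub.02) of sample sequences already classified as valid. Three choices in the sketch are assumptions because the text leaves them open: a first difference of exactly zero is assigned to the first group, a tie N.sub.P=N.sub.M is resolved in favour of the first group, and an empty input yields no result.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def binned_vector(vectors, phi_max=math.pi):
    """Combine (d13, d02) vectors of valid sample sequences into r_b."""
    x_p = y_p = x_m = y_m = 0.0
    n_p = n_m = 0
    for d13, d02 in vectors:
        if d02 >= 0:          # assumption: d02 == 0 goes to the first group
            x_p, y_p, n_p = x_p + d13, y_p + d02, n_p + 1     # eq. 9a, 9b
        else:
            x_m, y_m, n_m = x_m + d13, y_m + d02, n_m + 1     # eq. 9c, 9d
    if n_p + n_m == 0:
        return None           # assumption: no valid sequence, no result
    phi_p = math.atan2(y_p, x_p) % (2 * math.pi)
    phi_m = math.atan2(y_m, x_m) % (2 * math.pi)
    if phi_m - phi_p < phi_max:
        return (x_p + x_m, y_p + y_m)                         # eq. 10
    # eq. 11; assumption: a tie is resolved in favour of the first group
    return (x_p, y_p) if n_p >= n_m else (x_m, y_m)


def depth(vector, f_mod):
    x, y = vector
    phi = math.atan2(y, x) % (2 * math.pi)                    # eq. 2
    return C_LIGHT * phi / (4 * math.pi * f_mod)              # eq. 5


def vector_at(depth_m, f_mod, amplitude=1.0):
    """Ideal vector (d13, d02) of a surface at depth_m, following eq. 1 and eq. 4."""
    phi = 4 * math.pi * f_mod * depth_m / C_LIGHT
    return (amplitude * math.cos(phi), amplitude * math.sin(phi))


F_MOD = 20e6
edge_area = [vector_at(2.0, F_MOD)] * 8 + [vector_at(7.0, F_MOD)] * 4
depth_edge = depth(binned_vector(edge_area), F_MOD)

mid_range_area = [vector_at(3.6, F_MOD)] * 4 + [vector_at(3.9, F_MOD)] * 4
depth_mid = depth(binned_vector(mid_range_area), F_MOD)
print(f"{depth_edge:.2f} m, {depth_mid:.2f} m")
```

    In the first example the group vectors are about 240° apart, so only the larger (foreground) group is used and the result is 2 m. In the second example the phases lie on either side of 180° but close together, so both groups are added and the result lies between 3.6 m and 3.9 m.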

    [0070] If several sample sequences with several integration times It=1,2, . . . n are recorded, the binned values, e.g. the components x.sub.b, y.sub.b can be normalized as:

    $$[x_b,\, y_b]_{norm} = \frac{1}{\sum_{It=1}^{n} N_{It}\, T_{It}} \cdot [x_b,\, y_b]$$

    with N.sub.It being the number of pixels 3 with valid sample sequences for a specific integration time and T.sub.It being the length of the integration time. This yields a normalized amplitude which shows no artificial jumps, as it is independent of the number of pixels 3 considered in the binning, and thus allows standard image processing methods, such as stray light compensation, to be applied to the binned taps or amplitude.
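    The normalization can be illustrated with a short sketch. It assumes, for illustration only, that each valid pixel contributes a vector that scales linearly with the integration time; the names `normalize`, `binned_sum` and `RATE` and the numerical values are hypothetical.

```python
def normalize(binned, valid_counts, integration_times):
    """Divide [x_b, y_b] by the sum over all integration times of N_It * T_It."""
    weight = sum(n * t for n, t in zip(valid_counts, integration_times))
    if weight == 0:
        return None            # assumption: nothing to normalize without valid pixels
    x_b, y_b = binned
    return (x_b / weight, y_b / weight)


def binned_sum(rate, valid_counts, integration_times):
    """Binned vector if every valid pixel contributes T_It * rate (assumed model)."""
    weight = sum(n * t for n, t in zip(valid_counts, integration_times))
    return (weight * rate[0], weight * rate[1])


RATE = (0.6, 0.8)              # signal vector per unit integration time
TIMES = [4.0, 1.0]             # two integration times, the first 4 times longer

few = normalize(binned_sum(RATE, [3, 5], TIMES), [3, 5], TIMES)
many = normalize(binned_sum(RATE, [10, 12], TIMES), [10, 12], TIMES)
print(few, many)
```

    Both binning areas yield the same normalized vector although they contain different numbers of valid pixels, which is the independence referred to above.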

    [0071] In the simplified version of the method indicated by the dashed line, all vectors of valid pixels 3 are added to determine the binned vector r.sub.b at 250:

    $$x_b = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{in binning area}}} d_{13,It} \qquad \text{(eq. 12a)}$$

    $$y_b = \sum_{It=0}^{n}\ \sum_{\substack{\text{valid pixels} \\ \text{in binning area}}} d_{02,It} \qquad \text{(eq. 12b)}$$

    Afterwards, the binned depth value D.sub.b is determined based on the binned vector r.sub.b. FIG. 15 shows the results for this simplified version. Comparing it to FIG. 7, significant improvement can be seen, but there are still outlier “flying pixels” having a depth outside of the depth range of the binned pixels 3. This problem is reduced if the vectors are assigned to the first and second group and are treated separately, as can be seen in FIG. 16. While FIG. 16 shows the results for a single integration time, FIG. 17 shows the results for two integration times, the first integration time being 4 times longer than the second integration time. In this case, the number of flying pixels is reduced to an all but negligible amount.
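    The simplified version (eq. 12a and eq. 12b) can be sketched for a hypothetical binning area containing 8 foreground pixels at 2 m, 4 background pixels at 7 m and 4 motion-affected pixels whose last two taps come from the background. The tap model, the counts and the function names are assumptions for illustration.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6             # modulation frequency in Hz


def tap(depth_m, k, amplitude=1.0, offset=1.0):
    """k-th tap of an ideal sinusoid for a surface at depth_m (assumed model)."""
    phi = 4 * math.pi * F_MOD * depth_m / C_LIGHT
    return offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)


def confidence(a0, a1, a2, a3):
    amplitude = math.hypot(a0 - a2, a1 - a3)
    if amplitude == 0.0:
        return 0.0                                         # assumption: no signal
    return 1.0 - abs(a1 - a0 + a3 - a2) / amplitude        # eq. 6


def simplified_binned_depth(sequences, c_min=0.25):
    x_b = y_b = 0.0
    n_valid = 0
    for a0, a1, a2, a3 in sequences:
        if confidence(a0, a1, a2, a3) > c_min:             # keep valid sequences only
            x_b += a1 - a3                                 # eq. 12a
            y_b += a0 - a2                                 # eq. 12b
            n_valid += 1
    if n_valid == 0:
        return None                                        # assumption: no result
    phi = math.atan2(y_b, x_b) % (2 * math.pi)             # eq. 2
    return C_LIGHT * phi / (4 * math.pi * F_MOD)           # eq. 5


fg = tuple(tap(2.0, k) for k in range(4))
bg = tuple(tap(7.0, k) for k in range(4))
mixed = fg[:2] + bg[2:]
depth_simplified = simplified_binned_depth([fg] * 8 + [mixed] * 4 + [bg] * 4)
print(f"simplified binned depth: {depth_simplified:.2f} m")
```

    The mixed pixels are rejected by the confidence check, but the remaining foreground and background vectors are still added across a phase gap of more than 180°, so the result stays below 2 m. This is consistent with the residual flying pixels noted for FIG. 15 and motivates the separate treatment of the two groups.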

    [0072] There are two possible alternatives to checking the inequality for the phase difference ΔΦ at 220. First, one could check the following relation:


    $$x_M\, y_P < x_P\, y_M \qquad \text{(eq. 13)}$$

    If so, the method continues at 230; if not, it continues at 240. This condition relates to the slopes y/x of the two group vectors. The crucial cases for distinguishing whether the angle between two vectors is smaller or larger than 180° are those in which one vector lies in quadrant 1 and the other in quadrant 3, or one vector lies in quadrant 2 and the other in quadrant 4. For any other case, the distinction is trivial. If one vector is in quadrant 1 and the other vector is in quadrant 4, the angle between the two vectors is obviously larger than 180°. If one vector is in quadrant 2 and the other vector is in quadrant 3, the angle between the two vectors is obviously smaller than 180°.

    [0073] Second, one could decide whether both x.sub.P and x.sub.M are negative, which means that the object depth is close to half of the ambiguity depth. If so, the method continues at 230, if not, it continues at 240.
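    The two alternatives can be compared with the explicit phase check in a short sketch. It uses unit vectors with the first group vector in the upper half-plane and the second group vector in the lower half-plane; the function names and the sampled angles are assumptions, and angle pairs exactly 180° apart are avoided to keep the floating-point comparison unambiguous.

```python
import math


def add_by_phase(r_p, r_m, phi_max=math.pi):
    """Reference check at 220: phase difference below the second threshold."""
    phi_p = math.atan2(r_p[1], r_p[0]) % (2 * math.pi)
    phi_m = math.atan2(r_m[1], r_m[0]) % (2 * math.pi)
    return phi_m - phi_p < phi_max


def add_by_cross_product(r_p, r_m):
    """First alternative: eq. 13, no trigonometric function needed."""
    (x_p, y_p), (x_m, y_m) = r_p, r_m
    return x_m * y_p < x_p * y_m


def add_by_sign(r_p, r_m):
    """Second alternative: both first components negative."""
    return r_p[0] < 0 and r_m[0] < 0


def unit(deg):
    rad = math.radians(deg)
    return (math.cos(rad), math.sin(rad))


# First group: phases 5..175 deg; second group: phases 187..357 deg.
pairs = [(unit(p), unit(m))
         for p in range(5, 180, 10) for m in range(187, 360, 10)]

cross_matches_phase = all(add_by_phase(p, m) == add_by_cross_product(p, m)
                          for p, m in pairs)
sign_implies_phase = all(add_by_phase(p, m)
                         for p, m in pairs if add_by_sign(p, m))
print(cross_matches_phase, sign_implies_phase)
```

    For the sampled pairs, eq. 13 gives the same decision as the phase check. The sign check only ever selects pairs that the phase check would also add, but it does not select all of them (for instance a vector in quadrant 1 paired with a vector in quadrant 3), so it acts as a stricter criterion.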

    [0074] FIG. 18 is a flowchart illustrating a second embodiment of the inventive method. Steps 100, 110, 120, 130 and 140 are identical to the first embodiment and will not be explained again for the sake of brevity. If the sample sequence is classified as valid at 150, a pixel depth value D is determined at 155 according to eq.2 and eq.5. After all pixels 3 in the binning area 4 have been processed, the binned depth value D.sub.b is determined by averaging the pixel depth values D:

    $$D_b = \frac{1}{N_{\text{valid pixels}}} \sum_{\text{valid pixels}} D \qquad \text{(eq. 8)}$$

    [0075] If multiple sample sequences are acquired for each pixel 3, a pixel depth value is determined for each sample sequence individually, and eq.8 has to be modified to average over all valid sample sequences of all pixels 3 or over all valid pixels 3 of all integration times. The result of calculating the binned depth value D.sub.b is a low-resolution depth image, where the binned depth value D.sub.b represents the arithmetic mean of the valid pixels in the respective binning area 4. FIG. 19 shows an example of a depth image computed in this embodiment. In comparison to FIG. 9, which shows the result of an averaging process without the distinction between valid and invalid pixels, the effect of flying pixels is reduced. It should be noted, though, that this second embodiment of the inventive method requires calculation of pixel depth values D for each valid pixel (and each sample sequence, where applicable) on the full high resolution image, which can lead to increased computational effort and/or increased memory requirements.
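    A minimal sketch of the second embodiment for a single sample sequence per pixel follows. The example area (8 foreground pixels at 2 m, 4 background pixels at 7 m, 4 motion-affected pixels), the tap model and the function names are assumptions for illustration.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s
F_MOD = 20e6             # modulation frequency in Hz


def tap(depth_m, k, amplitude=1.0, offset=1.0):
    """k-th tap of an ideal sinusoid for a surface at depth_m (assumed model)."""
    phi = 4 * math.pi * F_MOD * depth_m / C_LIGHT
    return offset + 0.5 * amplitude * math.sin(phi + k * math.pi / 2)


def confidence(a0, a1, a2, a3):
    amplitude = math.hypot(a0 - a2, a1 - a3)
    if amplitude == 0.0:
        return 0.0                                        # assumption: no signal
    return 1.0 - abs(a1 - a0 + a3 - a2) / amplitude       # eq. 6


def pixel_depth(a0, a1, a2, a3):
    phi = math.atan2(a0 - a2, a1 - a3) % (2 * math.pi)    # eq. 2
    return C_LIGHT * phi / (4 * math.pi * F_MOD)          # eq. 5


def averaged_binned_depth(sequences, c_min=0.25):
    """Mean of the pixel depth values of all valid sample sequences (eq. 8)."""
    depths = [pixel_depth(*seq) for seq in sequences
              if confidence(*seq) > c_min]
    if not depths:
        return None                                       # assumption: no result
    return sum(depths) / len(depths)


fg = tuple(tap(2.0, k) for k in range(4))
bg = tuple(tap(7.0, k) for k in range(4))
mixed = fg[:2] + bg[2:]
depth_averaged = averaged_binned_depth([fg] * 8 + [mixed] * 4 + [bg] * 4)
print(f"averaged binned depth: {depth_averaged:.3f} m")
```

    The motion-affected pixels are excluded, so the result is the mean of eight values of 2 m and four values of 7 m. It always stays within the depth range of the valid pixels, at the cost of computing a depth value for every valid pixel.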