IMAGE SENSORS AND SENSING METHODS TO OBTAIN TIME-OF-FLIGHT AND PHASE DETECTION INFORMATION
20230232130 · 2023-07-20
Inventors
- Nadav Geva (Tel Aviv, IL)
- Michael Scherer (Tel Aviv, IL)
- Ephraim Goldenberg (Tel Aviv, IL)
- Gal Shabtay (Tel Aviv, IL)
CPC classification
H04N13/232
ELECTRICITY
G01S17/894
PHYSICS
H04N25/75
ELECTRICITY
H04N13/122
ELECTRICITY
H04N2013/0081
ELECTRICITY
H04N13/254
ELECTRICITY
H04N13/271
ELECTRICITY
International classification
H04N13/271
ELECTRICITY
Abstract
Indirect time-of-flight (i-ToF) image sensor pixels, i-ToF image sensors including such pixels, stereo cameras including such image sensors, and sensing methods to obtain i-ToF detection and phase detection information using such image sensors and stereo cameras. An i-ToF image sensor pixel may comprise a plurality of sub-pixels, each sub-pixel including a photodiode, a single microlens covering the plurality of sub-pixels and a read-out circuit for extracting i-ToF phase signals of each sub-pixel individually.
Claims
1. A system, comprising: a light source; an image sensor including a plurality of image sensor pixels, each image sensor pixel comprising a plurality of sub-pixels, each sub-pixel including a photodiode; a microlens covering the plurality of sub-pixels, wherein the plurality of image sensor pixels are indirect time-of-flight (i-ToF) image sensor pixels that are configured to receive light which is emitted from the light source and reflected from a scene to generate i-ToF phase signals; and a read-out circuit (ROC) for extracting the i-ToF phase signals of each sub-pixel individually, wherein each i-ToF image sensor pixel includes a switch, wherein in one state the switch is closed so that the sub-pixels together form one pixel and the ROC reads out the one pixel for generating an i-ToF depth map, and wherein in another state the switch is opened so that the ROC reads out the sub-pixels individually for generating a stereo depth map.
2. The system of claim 1, wherein the light source is in the near infrared (NIR) region.
3. The system of claim 1, wherein the i-ToF phase signals represent stereo image data as captured by a stereo camera having a vertical or a horizontal baseline.
4. The system of claim 1, wherein the system includes an application processor, and wherein the application processor is configured to generate a fused depth map by using stereo depth map data and ToF depth map data.
5. (canceled)
6. The system of claim 1, wherein the extracting of the i-ToF phase signals of each sub-pixel individually includes extracting of fewer than all i-ToF signals generated by the plurality of the sub-pixels.
7. (canceled)
8. The system of claim 6, wherein the extracted i-ToF phase signals are used to calculate a relative ToF depth map.
9. The system of claim 6, wherein the extracted i-ToF phase signals are used to calculate a 1-shot depth map.
10. The system of claim 8, wherein the relative ToF depth map is used to generate a high fps depth map stream having a fps ≥ 35.
11. The system of claim 9, wherein the 1-shot depth map is used to generate a high fps depth map stream having a fps ≥35.
12. The system of claim 1, wherein the system is integrated into a smartphone.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0055] Non-limiting examples of embodiments disclosed herein are described below with reference to figures attached hereto that are listed following this paragraph. Identical structures, elements or parts that appear in more than one figure are generally labeled with a same numeral in all the figures in which they appear. The drawings and descriptions are meant to illuminate and clarify embodiments disclosed herein, and should not be considered limiting in any way.
DETAILED DESCRIPTION
[0072] In a “binning mode”, SPs of ToF pixels may be summed into a single “effective” pixel. In some examples, a binning mode may be implemented in the analog domain by adding the signals V.sub.out of equal phases.
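The binning just described can be illustrated with a short sketch (Python; the signal values and the 4-sub-pixel, 4-tap arrangement are hypothetical, and the summation is done digitally here, whereas the text contemplates the analog domain):

```python
import numpy as np

# Hypothetical per-sub-pixel phase signals for one 4-sub-pixel ToF pixel:
# rows = sub-pixels SP1..SP4, columns = phase taps (0, 90, 180, 270 deg).
sp_signals = np.array([
    [10.0, 6.0, 2.0, 5.0],   # SP1
    [11.0, 6.5, 2.5, 5.5],   # SP2
    [ 9.5, 5.5, 1.5, 4.5],   # SP3
    [10.5, 6.0, 2.0, 5.0],   # SP4
])

# "Binning mode": signals of equal phase are summed over all sub-pixels,
# yielding one effective pixel with four phase values.
binned_pixel = sp_signals.sum(axis=0)
print(binned_pixel.tolist())
```

With the values above, the effective pixel carries the four phase sums [41.0, 24.0, 8.0, 20.0].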
[0074] Other 4-PD embodiments may include 4 SPs realized in a 4-tap ToF pixel structure, i.e. each SP i (i=1,...,4) may have 4 storage nodes PGA.sub.i-PGD.sub.i. Charges collected by each PD of the 4 PDs may be stored in the 4 storage nodes PGA.sub.i-PGD.sub.i (i=1,...,4). For example, charges collected in PD1 may be stored in each of C.sub.A1, C.sub.B1, C.sub.C1 and C.sub.D1, etc.
[0076] For generating a ToF depth map, switch 432 is closed (not shown), so that PD1 and PD2 together form one PD. The combined PD is driven as in a 2-tap ToF pixel, and a ToF depth map is calculated as known in the art.
[0077] For generating a stereo depth map, switch 432 is opened (as shown in the figure), so that PD1 and PD2 are read out individually.
[0078] In some embodiments, the switches of all pixels included in a ToF image sensor may be controlled together, i.e. the switches of all pixels may be opened, or the switches of all pixels may be closed. In other embodiments, each pixel or each group of pixels may be controlled individually. For example, based on information from past images or frames, one may open or close the switch of a particular pixel for calculating a stereo depth or a ToF depth of this particular pixel.
[0080] If a pixel like 430 is used for calculating a ToF depth, the depth signal will suffer from the “flying pixel” artifact. For generating a ToF depth, in a pixel like 430, PD1 and PD2 together form one PD. In the given scenario this means that the depth signals of object 1 (at z.sub.1) and object 2 (at z.sub.2) are intermixed, leading to a flying pixel depth signal (“z.sub.FP”) that lies between the two true depths, i.e. z.sub.1 < z.sub.FP < z.sub.2.
[0081] If a pixel like 400 or like 410 is used for calculating a ToF depth, the depth signal will not suffer from “flying pixel” artifact, as for generating a ToF depth, PD1 and PD2 can be evaluated independently.
[0083] In step 506 all phase values are output for further processing. Further processing may be performed by an application processor (AP) or any other processing device, as known in the art. The further processing includes the calculation and analysis of a stereo depth map (steps 508a-512a) as well as the calculation and analysis of a ToF depth map (steps 508b-510b). Steps 508a-512a and steps 508b-510b may be performed sequentially or in parallel.
[0084] With reference to first and second images described above, consider a first example (“Example 1”) referring to a 2-tap pixel and a “1-shot depth map” approach. In Example 1, both step 502 and step 504 are performed once for capturing two images that in sum include 4 phases (0 deg, 90 deg, 180 deg and 270 deg). The 4 phases are output (step 506) and a ToF depth map is calculated in 508b. In other examples referring to a 2-tap pixel design, methods known in the art such as tap-shuffle and dual-frequency may be applied. For this, step 502, step 504 (and step 506) may be performed repeatedly, e.g. four times when using dual-frequency and tap-shuffle for each of the two frequencies.
[0085] In a second example (“Example 2”), referring to a 4-tap pixel design and a “1-shot depth map” approach, there may be only one image capture required, i.e. only step 502 may be performed before outputting the 4 phases in step 506.
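The 4-phase depth recovery referred to in these examples can be sketched as follows (Python; this is the textbook continuous-wave i-ToF formula, used here as an assumed illustration rather than the exact computation of the disclosure):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def itof_depth(q0, q90, q180, q270, f_mod):
    """Standard 4-phase i-ToF depth estimate (continuous-wave modulation).

    q0..q270 are the correlation samples at 0/90/180/270 deg;
    f_mod is the modulation frequency in Hz.
    """
    phase = math.atan2(q90 - q270, q0 - q180)  # in (-pi, pi]
    phase %= 2 * math.pi                       # wrap into [0, 2*pi)
    return C * phase / (4 * math.pi * f_mod)

# Round trip: a target at 1.5 m with 20 MHz modulation causes a phase
# delay of 4*pi*f*z/c; feeding ideal samples back recovers the depth.
f = 20e6
z_true = 1.5
delay = 4 * math.pi * f * z_true / C
q = [math.cos(delay - p) for p in (0, math.pi / 2, math.pi, 3 * math.pi / 2)]
print(round(itof_depth(*q, f), 3))
```

The differences q0−q180 and q90−q270 also cancel a constant ambient-light offset, which is one reason all four phases are needed for an absolute depth.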
Stereo Depth Map
[0086] In step 508a, 2D images of SP1 and of SP2 are generated. 2D images of SP1 correspond to left-side images (i.e. images that contain only image data passing the left side of the camera lens), while 2D images of SP2 correspond to right-side images (i.e. images that contain only image data passing the right side of the camera lens). Generation of 2D images may be performed according to different options. In the following, we refer to Example 1.
[0087] In some examples that may be referred to as “single-phase” images, a 2D image may be generated by outputting the values of one of the four storage node signals. Exemplarily referring only to the left-side 2D image (SP1), the four existing storage node signals are: PGA1 (0 deg), PGA1 (90 deg), PGB1 (180 deg) and PGB1 (270 deg). In some examples of single-phase images, only the storage node signal containing the highest amount of image information may be output for forming the 2D image. As an example for determining the highest amount of image information, one may sum the particular phase signals of all pixels for each storage node, and define the storage node having the largest sum as the storage node that contains the highest amount of image information.
[0088] In other examples that may be referred to as “all-phase” images, a 2D image may be generated by outputting the sum over all signals of all the storage nodes. Exemplarily for SP1, the pixel’s value may be obtained by summing PGA1 (0 deg), PGA1 (90 deg), PGB1 (180 deg) and PGB1 (270 deg).
[0089] In yet other examples, a 2D image may be generated by using some combination of single-phase and all-phase images. As an example, one may use only two out of the four existing storage node signals for generating the 2D image.
[0090] In yet other examples, a 2D image may be generated by using only storage node signals from identical frames, i.e. only from an image captured in step 502, or only from an image captured in step 504. This method for 2D image generation may be beneficial when capturing a dynamic scene in which there are significant changes between the two captures of steps 502 and 504, as a depth map can be calculated from each frame. In comparison to e.g. a depth map generated by ToF using tap-shuffle and dual-frequency, for 2-tap and 4-tap ToF this corresponds to a ×8 and ×4 increase in depth map fps, respectively.
[0091] In yet other examples where more than two frames are captured (i.e. where steps 502-506 are performed repeatedly), a 2D image may be generated by averaging over storage node signals from different frames. For example, one may average over identical phases of all captured frames or one may average over particular phases (e.g. PGA1 and PGA2) of all captured frames or some of the captured frames.
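The “single-phase” and “all-phase” 2D-image options described above can be sketched as follows (Python with NumPy; random data stands in for real storage-node signals, and total signal is used as the assumed information measure):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical left-side (SP1) storage-node data: 4 phase planes of 8x8 pixels,
# standing in for PGA1 (0 deg), PGA1 (90 deg), PGB1 (180 deg), PGB1 (270 deg).
phase_planes = rng.uniform(0.0, 100.0, size=(4, 8, 8))

# "Single-phase" image: keep only the storage-node signal carrying the most
# image information, here estimated as the largest total signal over all pixels.
best = int(np.argmax(phase_planes.sum(axis=(1, 2))))
single_phase_img = phase_planes[best]

# "All-phase" image: per pixel, sum the signals of all storage nodes.
all_phase_img = phase_planes.sum(axis=0)
```

Both variants yield an ordinary 2D intensity image per sub-pixel side, which is what the stereo matching in step 510a consumes.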
[0092] In step 510a, left-side and right-side 2D images are used to calculate a stereo depth map. As known, for a regular stereo vision system having two apertures spatially separated by a baseline B, an object’s distance can be calculated and/or estimated using equation 1:

Z′ = (f·B)/(D·ps)    (Equation 1)

where Z′ is the depth estimate for a particular pixel, which may be calculated by a processing unit, f is the camera’s focal length, D is the disparity in pixels, and ps is the pixel size of the image sensor. The disparity in pixels refers to the property of stereo vision systems (e.g. of a dual-camera) that, after image alignment, an object point is imaged to two different image points in the two output images; the magnitude of this difference is the disparity D. Via the measurement of the disparity D between two aligned stereo images, the depth of an object can be calculated according to equation 1.
[0093] For the regular stereo vision system described above, the disparity D is given by

D = (f·B)/(Z·ps)

with Z being the object-lens distance of an object point. For an object at infinity, D approaches zero.
[0094] For a 2PD camera as described above, the disparity is zero for an object point in focus, i.e. in focus the stereo image pair entirely overlaps. So for the 2PD camera with baseline B = aperture radius, the disparity D is given by

D = (f·B/ps)·|1/Z − 1/z.sub.0|

with z.sub.0 being the distance from the lens to the focus plane.
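A small numeric sketch of these stereo relations (Python; it assumes the standard pinhole forms Z′ = f·B/(D·ps) and D = f·B/(Z·ps), and the example values for f, B and ps are hypothetical):

```python
def stereo_depth(f, B, D_pixels, ps):
    """Equation 1: depth from disparity for a regular stereo pair.
    f, B, ps in meters; D_pixels in pixels."""
    return f * B / (D_pixels * ps)

def disparity_regular(f, B, Z, ps):
    """Disparity (pixels) of an object at distance Z for a regular stereo pair."""
    return f * B / (Z * ps)

def disparity_2pd(f, B, Z, z0, ps):
    """Disparity (pixels) for a 2PD pixel focused at z0: zero for an
    object in the focus plane (Z == z0)."""
    return f * B / ps * abs(1.0 / Z - 1.0 / z0)

# Assumed example optics: f = 4 mm, B = 1 mm, ps = 2 um.
f, B, ps = 4e-3, 1e-3, 2e-6
D = disparity_regular(f, B, 2.0, ps)   # object at 2 m -> 1 pixel of disparity
print(D, stereo_depth(f, B, D, ps))    # inverting Equation 1 recovers 2 m
```

Note the round trip: the disparity produced by an object at 2 m, fed back into Equation 1, returns the 2 m depth, while the 2PD disparity vanishes at the focus plane.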
[0095] In step 512a, the stereo depth map is analyzed. The analysis may assign a confidence score to particular pixels or segments of pixels of the depth map. A high confidence score may indicate high-quality depth information, and a low confidence score may indicate low-quality depth information. Low-quality depth information may e.g. be obtained for captured scene segments that do not include clearly visible textures, contours or other contrast gradients required for aligning the stereo images and determining disparity D, and/or that have medium (3-5 m) or large (>5 m) lens-object distances.
[0096] Additionally, the analysis may assign a resolution score to particular pixels or segments of pixels of the depth map. The resolution score may serve as a measure of the depth resolution and/or the spatial resolution (i.e. pixel resolution) of the depth map.
[0097] The resolution score and the confidence score of a stereo depth map are together called the “stereo score”.
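One possible, simplified realization of such a confidence score (Python; the squared-gradient texture measure is an assumption, chosen only to illustrate why textureless segments score low):

```python
def stereo_confidence(img):
    """Toy per-pixel confidence map: squared horizontal contrast as a
    texture measure. Textureless segments (hard to match across the
    stereo pair) get low scores; edges and gradients get high scores."""
    h, w = len(img), len(img[0])
    conf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(1, w - 1):
            # central-difference horizontal gradient
            g = (img[i][j + 1] - img[i][j - 1]) / 2.0
            conf[i][j] = g * g
    return conf

flat = [[50.0] * 8 for _ in range(4)]                     # textureless patch
ramp = [[10.0 * j for j in range(8)] for _ in range(4)]   # smooth contrast gradient
print(max(map(max, stereo_confidence(flat))),
      max(map(max, stereo_confidence(ramp))))
```

The flat patch scores zero everywhere, mirroring the statement that scene segments without contrast gradients yield low-quality stereo depth.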
ToF Depth Map
[0098] In step 508b, the object-lens distance (i.e. depth) of all object points in a scene is calculated by using the 4 phases (0 deg, 90 deg, 180 deg and 270 deg) as known in the art for ToF. In some examples, before calculating the ToF depth image, all or some of the phase signals of the SPs that have identical phase relation may be summed (e.g. by “binning” as described above). An identical phase relation may be given for PGA1 and PGA2 as well as for PGB1 and PGB2 etc. In other examples, the ToF depth image may be calculated by using the phase signals of each of the SPs individually, i.e. a plurality of ToF depth images may be calculated. In some examples, one may fuse the plurality of ToF depth images to obtain a single ToF depth image. In other examples, one may average the plurality of ToF depth images to obtain a single ToF depth image.
[0099] In step 510b, the ToF depth map is analyzed. The analysis may assign a confidence score to particular pixels or segments of pixels of the depth map. A high confidence score may indicate high-quality depth information, and a low confidence score may indicate low-quality depth information. Low-quality depth information may be obtained for ToF depth map segments that include:
[0100] specular objects which do not reflect much light in the direction of the ToF sensor;
[0101] a high amount of ambient or background light;
[0102] fast moving objects that lead to motion blur artifacts;
[0103] “flying pixel” and “multi-path” artifacts as known in the art;
[0104] multi-user interference as known in the art; or
[0105] large (>4 m) lens-object distances.
[0106] Additionally, the analysis may assign a resolution score to particular pixels or segments of pixels of the ToF depth map. The resolution score and the confidence score of a ToF depth map are together called the “ToF score”.
Fusion of Stereo and ToF Information
[0107] In step 514, a high-quality depth map is generated by fusing stereo and ToF depth map segments as known in the art. In some examples, one may consult measures such as a confidence score or a resolution score in order to decide whether the stereo depth map or the ToF depth map is to be used for the particular segment of the fused depth map.
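A minimal per-pixel fusion sketch (Python; the winner-take-all rule by confidence score is just one simple choice among the fusion methods “known in the art”, and all values are hypothetical):

```python
def fuse_depth_maps(stereo_z, stereo_conf, tof_z, tof_conf):
    """Per-pixel fusion: keep the depth value from whichever map
    (stereo or ToF) reports the higher confidence score."""
    fused = []
    for sz, sc, tz, tc in zip(stereo_z, stereo_conf, tof_z, tof_conf):
        fused.append([s if c_s >= c_t else t
                      for s, c_s, t, c_t in zip(sz, sc, tz, tc)])
    return fused

# 2x2 toy maps: stereo wins in the left column, ToF wins in the right one.
stereo_z = [[1.0, 2.0], [3.0, 4.0]]
stereo_c = [[0.9, 0.2], [0.8, 0.1]]
tof_z    = [[1.1, 2.1], [3.1, 4.1]]
tof_c    = [[0.3, 0.7], [0.4, 0.9]]
print(fuse_depth_maps(stereo_z, stereo_c, tof_z, tof_c))
```

A segment-wise variant would apply the same comparison to averaged scores over pixel segments instead of single pixels.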
[0108] In step 516, the fused depth map generated in step 514 is output to a program or user. In some examples, the fused depth map generated in step 514 may include stereo depth information or ToF depth information only. A depth image including stereo depth information only may e.g. be beneficial for obtaining a stream of depth maps having high fps, i.e. a fast depth map mode, as from the 2PD stereo image pair a depth map can be calculated for each frame.
[0109] In examples for fast depth map modes, a ToF pixel such as 2-tap ToF pixel 400 may be operated in a high fps mode that does not support ToF depth calculation.
[0110] Consider an example (“Example 3”) for achieving a high fps depth map stream by including stereo depth information only: one may capture a first phase image in step 502 and output the phase of this first phase image in step 506 without capturing a second phase image in step 504. From this first image, a stereo depth map may be calculated in step 510a which is output in step 516.
[0111] Another example (“Example 4”) for achieving a high fps depth map stream may be based on a reduced read-out scheme and include stereo depth information only. Here and in the following, a depth map fps may be called “high” for fps = 30 or more, e.g. fps = 60 or fps = 240. In Example 4, one may expose a pixel such as pixel 400 and collect charges in the storage nodes as known in the art. However, for the sake of higher fps one may e.g. read out only PGA1 and PGA2, but not PGB1 and PGB2. This is in contrast with the commonly performed read-out of PGA1, PGA2, PGB1 and PGB2, which is required for ToF depth map generation. The overall cycle time T.sub.cycle required for capturing a phase image comprises an integration phase lasting the integration time T.sub.int, which may e.g. be about 0.1 ms-5 ms, and a read-out phase lasting the read-out time T.sub.read. In general, T.sub.read takes a significantly larger share of T.sub.cycle than T.sub.int. As an example relevant for a modern 4-tap ToF image sensor, T.sub.read may make up about 50%-90% of T.sub.cycle, i.e. T.sub.read may be about 5×T.sub.int to 25×T.sub.int. Here, T.sub.read is the time required for reading out all taps, and it can be reduced by not reading out all taps. So, referring to a 2-tap pixel where only one tap per SP is read out, T.sub.cycle can be reduced such that fps increases by 10%-100%. Referring to a 4-tap ToF pixel such as pixel 410 where only one tap per SP is read out, T.sub.cycle can be reduced such that fps increases by 10%-300%. For example, one may read out only PGA1 and PGA2 but not PGB1-PGD1 and PGB2-PGD2. The phase images of PGA1 and PGA2 only may then be used for extracting a stereo depth map. Whereas we refer here to reading out PGA1 and PGA2 only, one may, in an analogous manner, read out only PGB1 and PGB2.
Other possibilities may include reading out only PGA1 and PGB2, and not reading out all other storage nodes, etc. One may select which storage node pair to read out according to a pre-defined read-out scheme, e.g. always reading out PGA1 and PGA2 only. In other examples, one may select the read-out scheme dynamically, e.g. according to the amount of scene information stored in the respective storage nodes. For example, one may determine in preview, i.e. before the actual depth map is captured according to steps 502-516, which storage node pair (such as PGA1 and PGA2, or PGB1 and PGB2, etc.) includes the highest amount of image information.
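The cycle-time arithmetic behind this reduced read-out scheme can be checked with a short sketch (Python; T.sub.int = 1 ms and T.sub.read = 10×T.sub.int are assumed example values within the ranges given above):

```python
def fps(t_int, t_read):
    """Frame rate of one phase-image capture cycle: T_cycle = T_int + T_read."""
    return 1.0 / (t_int + t_read)

# Assumed numbers: integration 1 ms, full read-out 10x the integration time
# (read-out dominates the cycle, as stated above).
t_int = 1e-3
t_read_all = 10 * t_int          # reading out all taps
t_read_half = t_read_all / 2     # reduced scheme: read out only half the taps

full = fps(t_int, t_read_all)
reduced = fps(t_int, t_read_half)
print(round(full, 1), round(reduced, 1), f"{reduced / full - 1:.0%}")
```

With these assumed numbers, halving the read-out time lifts the frame rate from about 91 fps to about 167 fps, an increase of roughly 83%, i.e. within the 10%-100% range stated for a 2-tap pixel.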
[0112] In other examples of fast depth map modes, a ToF pixel may be operated in a high fps mode that supports calculation of a relative ToF depth map. A relative depth map provides the depth value of a particular pixel not as an absolute depth value (such as e.g. a depth of 1 m or 1.5 m), but only as a ratio with respect to the depths of other pixels in the sensor. As an example, the depth value of a particular pixel located at a position (i, j) in the sensor array may be d.sub.ij. Value d.sub.ij may have no absolute depth assigned, but may be expressed in terms of other pixels in the sensor; e.g. depth value d.sub.ij may be 75% of the depth value of a neighboring pixel at a position (i+1, j), i.e. d.sub.ij = 0.75·d.sub.i+1,j. Whereas the calculation of an absolute depth map requires four phase signals, calculating a relative depth requires only two (or more) phase signals.
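The notion of a relative depth map can be illustrated as follows (Python; the input values and the choice of the top-left pixel as the reference are hypothetical):

```python
def relative_depth_map(depth_like):
    """Sketch of a relative depth map: each value is expressed as a ratio
    to a reference pixel (here, the top-left one), not in absolute meters.
    Any per-pixel quantity monotonic in depth can serve as input."""
    ref = depth_like[0][0]
    return [[v / ref for v in row] for row in depth_like]

# Hypothetical 2x2 depth-like values; the output encodes only ratios,
# e.g. the pixel below-left is twice as far as the reference pixel.
raw = [[2.0, 1.5], [4.0, 3.0]]
print(relative_depth_map(raw))
```

Scaling the input by any constant leaves the output unchanged, which is exactly why fewer phase signals suffice: the absolute scale is never resolved.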
[0113] Consider an example (“Example 5”) relevant for a 4-tap pixel such as pixel 410: for achieving a high fps depth map stream including a relative ToF depth map, a reduced read-out scheme as described in Example 4 may be used. The 4-tap pixel may be integrated in a “gated ToF” system as known in the art, i.e. the light source of the ToF system may emit a rectangular pulse. In gated ToF, the storage nodes correspond to particular depth slices in a scene. One may therefore select which storage node pairs to read out according to which depth slices are considered to carry the most relevant or important information of a scene. E.g. one may read out only the pairs PGA1 and PGA2 as well as PGB1 and PGB2, but not the pairs PGC1 and PGC2 as well as PGD1 and PGD2. This may allow for a fps increase of the depth map stream of 10%-100%.
[0114] Another example (“Example 6”) is relevant for a 2-tap pixel such as pixel 400 and for achieving a high fps depth map stream including a relative ToF depth map. A reduced read-out scheme may e.g. be:
[0115] in step 502, read out only PGA1 and PGA2 (which may sample the 0 deg phase) but do not read out PGB1 and PGB2 (which may sample the 180 deg phase);
[0116] in step 504, read out only PGA1 and PGA2 (which may sample the 90 deg phase) but do not read out PGB1 and PGB2 (which may sample the 270 deg phase).
[0117] This may allow for a fps increase of the depth map stream of 10%-100%.
[0118] In some examples, the combination or fusion of stereo depth and ToF depth may be used to overcome the ToF depth ambiguity, e.g. instead of using dual-frequency modulation. That is, instead of using a second, additional modulation/demodulation frequency, depth ambiguity may be mitigated by using the stereo depth map calculated in step 510a. This too can be used to increase the fps of a depth map stream.
[0119] Yet another example (“Example 7”) is especially relevant for a pixel like 2-tap pixel 430. In a first variant of Example 7 (switch 432 open), for generating a stereo depth map, only steps 508a, 510a and 512a may be performed, and steps 508b and 510b may not be performed. In a second variant of Example 7 (switch 432 closed), for generating a ToF depth map, only steps 508b and 510b may be performed, and steps 508a, 510a and 512a may not be performed.
[0120] In yet another example (“Example 8”), also for a pixel like 2-tap pixel 430, in a further step that precedes step 502 it may be decided for each pixel (or group of pixels) whether it is to be used as a ToF pixel or as a stereo pixel. For pixel 430 used as a ToF pixel, switch 432 is closed; for pixel 430 used as a stereo pixel, switch 432 is opened. The decision whether to use a particular pixel as a ToF pixel or as a stereo pixel may e.g. be based on the ToF score and/or the stereo score obtained from prior depth images. In some examples for generating a depth map using stereo image data only, one may operate a 2PD ToF pixel as described herein in a “passive” manner, i.e. one may not use the light source of the ToF system but rely on ambient or background illumination only.
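The per-pixel mode decision of Example 8 can be sketched as follows (Python; the score values and the tie-breaking rule favoring ToF are hypothetical):

```python
def choose_pixel_modes(tof_scores, stereo_scores):
    """Decide per pixel whether to close the switch ("tof" mode) or open it
    ("stereo" mode), based on scores obtained from prior depth images."""
    modes = []
    for t_row, s_row in zip(tof_scores, stereo_scores):
        modes.append(["tof" if t >= s else "stereo"
                      for t, s in zip(t_row, s_row)])
    return modes

# Hypothetical 2x2 score maps from a previous frame.
tof_scores    = [[0.9, 0.3], [0.6, 0.2]]
stereo_scores = [[0.4, 0.8], [0.5, 0.7]]
print(choose_pixel_modes(tof_scores, stereo_scores))
```

The resulting mode map would then drive the per-pixel (or per-group) switch states before the next capture in step 502.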
[0123] In some examples, pixels with pixel layout 602 or 602′ may be “sparsely” integrated into an image sensor, i.e. these 2PD ToF pixels may be surrounded by regular (i.e. non-2PD) ToF pixels. A “next” 2PD ToF pixel may e.g. be located 5, 10, 25 or 50 pixels away from a 2PD pixel with a pixel layout such as 602 or 602′.
[0124]
[0125]
[0131] As a rule of thumb known in the art, a disparity of ~0.5 pixel or more is required for meaningful depth estimation.
[0132] In some examples, techniques for stereo baseline magnification such as e.g. described by Zhou et al. in “Stereo Magnification: Learning view synthesis using multiplane images” published in [ACM Trans. Graph., Vol. 37, No. 4, Article 65. Publication date: August 2018] may be used.
[0133] While this disclosure describes a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of such embodiments may be made. In general, the disclosure is to be understood as not limited by the specific embodiments described herein, but only by the scope of the appended claims.
[0134] All references mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual reference was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present application.