Active stereo with adaptive support weights from a separate image
10929658 · 2021-02-23
Assignee
Inventors
- Adam G. Kirk (Renton, WA, US)
- Christoph Rhemann (Cambridge, GB)
- Oliver A. Whyte (Seattle, WA, US)
- Shahram Izadi (Cambridge, GB)
- Sing Bing Kang (Redmond, WA, US)
CPC classification
G06F3/0659
PHYSICS
G06F12/00
PHYSICS
B29C64/386
PERFORMING OPERATIONS; TRANSPORTING
B29C64/00
PERFORMING OPERATIONS; TRANSPORTING
A63F13/213
HUMAN NECESSITIES
G02B27/4233
PHYSICS
G06F11/3024
PHYSICS
G01B11/2545
PHYSICS
H04N13/239
ELECTRICITY
G01B11/2513
PHYSICS
G01B11/25
PHYSICS
H04N13/25
ELECTRICITY
H04N23/11
ELECTRICITY
G02B27/4205
PHYSICS
H04N13/254
ELECTRICITY
H04N2013/0081
ELECTRICITY
H04N13/271
ELECTRICITY
International classification
H04N13/25
ELECTRICITY
H04N13/254
ELECTRICITY
H04N13/271
ELECTRICITY
H04N17/00
ELECTRICITY
G01B11/25
PHYSICS
B29C64/00
PERFORMING OPERATIONS; TRANSPORTING
H04N13/239
ELECTRICITY
G02B27/42
PHYSICS
B29C64/386
PERFORMING OPERATIONS; TRANSPORTING
G06F12/00
PHYSICS
G06F9/30
PHYSICS
Abstract
Systems and methods are provided for stereo matching based upon active illumination, in which a patch in a non-actively illuminated image is used to obtain weights for patch similarity determinations in actively illuminated stereo images. To correlate pixels in actively illuminated stereo images, adaptive support weights computations are used to determine the similarity of patches corresponding to the pixels. The adaptive support weights for these computations are obtained by processing a non-actively illuminated (clean) image.
Claims
1. A method comprising: processing a plurality of images including a first image, a second image, and a third image, wherein the first image and the second image are actively illuminated stereo images, and the third image is a non-actively illuminated image; determining a first patch in the first image based on a location of a first pixel of interest, the first patch comprising a first plurality of pixels surrounding the first pixel of interest; determining a second patch in the second image based on a location of a second pixel corresponding to the first pixel, the second patch comprising a second plurality of pixels surrounding the second pixel; determining a third patch in the third image based on a location of a third pixel corresponding to at least one of the first pixel and the second pixel, the third patch comprising a third plurality of pixels surrounding the third pixel; and optimizing a similarity score between the first patch and the second patch in the actively illuminated stereo images by: determining weights for the third plurality of pixels in the third patch in the third image based upon a similarity between the third pixel and respective pixels in the third plurality of pixels; and using the weights of the third plurality of pixels in the third patch to determine the similarity score between the first patch and the second patch in the actively illuminated stereo images.
2. The method of claim 1, wherein the plurality of images are of a scene actively illuminated with infrared light in a part of an infrared spectrum, and wherein capturing the non-actively illuminated image includes using a notch filter to capture the scene with the part of the infrared spectrum that contains the active illumination removed.
3. The method of claim 1, wherein capturing the plurality of images comprises capturing one actively illuminated stereo image and the non-actively illuminated image via a same optical path.
4. The method of claim 1, wherein capturing the plurality of images comprises using one camera to capture one actively illuminated stereo image in one frame and using a second camera to capture the non-actively illuminated image in another frame.
5. The method of claim 1 further comprising using the weights to determine similarity via an adaptive support weights algorithm.
6. A system comprising: an image processing component comprising a matching algorithm; and an image capturing component that captures a plurality of images including actively illuminated stereo images and a non-actively illuminated image, wherein the image processing component is configured to: determine a first patch for the non-actively illuminated image based on a location of a first pixel in a first image; determine a second patch for a first actively illuminated stereo image based on a location of a second pixel in the first actively illuminated stereo image, the second pixel corresponding to the first pixel; determine a third patch for a second actively illuminated stereo image based on a location of a third pixel in the second actively illuminated stereo image, the third pixel corresponding to one of the first pixel and the second pixel; and optimize a similarity score between the second patch and the third patch in the actively illuminated stereo images by: processing the first, second, and third patch using the matching algorithm, the matching algorithm configured to process the first patch in the non-actively illuminated image to determine weights corresponding to pixels in the first patch, and to use the weights determined in the non-actively illuminated image to determine similarity between the second and third patches in the first and second actively illuminated stereo images.
7. The system of claim 6, wherein the matching algorithm linearly scans pixels in at least one of the actively illuminated images to look for matching pixels based upon patch similarity.
8. The system of claim 6, wherein the matching algorithm is further configured to determine matching pixel data, and wherein the matching algorithm is coupled to or incorporates a depth processing algorithm that processes the matching pixel data to generate a depth map.
9. The system of claim 6, wherein the actively illuminated stereo images comprise infrared (IR) images, and wherein the non-actively illuminated image comprises a red, green and blue (RGB) image.
10. The system of claim 6, wherein the actively illuminated stereo images comprise RGB images, and wherein the non-actively illuminated image comprises an IR image.
11. The system of claim 6, wherein the actively illuminated stereo images comprise IR images, and wherein the non-actively illuminated image comprises an IR image filtered with a notch filter that removes the active illumination.
12. The system of claim 6, wherein the image capturing component comprises a device including a plurality of cameras and an active illumination projector.
13. The system of claim 6, wherein the matching algorithm uses the weights to determine similarity via an adaptive support weights algorithm.
14. The system of claim 6, wherein the image capturing component includes two cameras that share an optical path via reflection, wherein one of the two cameras captures one of the actively illuminated stereo images, and the other of the two cameras captures the non-actively illuminated image.
15. The system of claim 6, wherein the image capturing component includes one camera that captures one of the actively illuminated stereo images and the non-actively illuminated image.
16. The system of claim 15, wherein the camera includes a splitter mechanism configured to split incoming light into one of the actively illuminated stereo images and the non-actively illuminated image.
17. The system of claim 15, wherein the camera includes a Bayer pattern on sensed pixels to receive light that includes the active illumination on one subset of pixels and receive light that does not include the active illumination on another subset of pixels.
18. One or more computer-readable memory devices having executable instructions, that when executed by a processor, cause the processor to perform operations, comprising: receiving actively illuminated stereo infrared (IR) images; receiving a non-actively illuminated image; determining a first patch in the non-actively illuminated image based on a location of a first pixel in the non-actively illuminated image; determining a second patch in a first actively illuminated stereo IR image based on a location of a second pixel in the first actively illuminated stereo IR image, the second pixel corresponding to the first pixel; determining a third patch in a second actively illuminated stereo IR image based on a location of a third pixel in the second actively illuminated stereo IR image, the third pixel corresponding to one of the first pixel and the second pixel; and optimizing a similarity score between the second patch and the third patch in the actively illuminated stereo images by: obtaining adaptive support weights for the first patch in the non-actively illuminated image; and using the adaptive support weights in an adaptive support weights computation to determine similarity of the second patch and the third patch.
19. The method of claim 1, wherein the similarity score is based on texture data corresponding to the first patch and the second patch, and wherein the weights of the third plurality of pixels are based on color similarities.
20. The method of claim 1, wherein using the weights of the third plurality of pixels in the third patch to determine the similarity score comprises using the weights of the third plurality of pixels in the third patch to determine a similarity score between a weighted match score associated with the first patch and a weighted match score associated with the second patch in the actively illuminated stereo images.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The present invention is illustrated by way of example and not limitation in the accompanying figures, in which like reference numerals indicate similar elements.
DETAILED DESCRIPTION
(9) Various aspects of the technology described herein are generally directed towards using a non-actively illuminated image to provide adaptive support weights for two actively illuminated stereo images that are being processed to find matching pixels therein. For example, a third camera may be used to capture a third (non-actively illuminated) image via light from a part of the spectrum (e.g., visible light) that is different from the active illumination spectrum (e.g., infrared) that is sensed in the captured stereo images. In general, in the non-actively illuminated image, the active illumination pattern is not visible, whereby the general assumption that pixels with similar depths have similar colors holds true. Thus, for any pixel being evaluated in the actively illuminated stereo images, adaptive support weights can be determined based upon similarities (e.g., color similarities) between the counterpart pixel and its patch's pixels in the non-actively illuminated image. As a result, adaptive support weights are able to be used in active stereo image matching.
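For concreteness, the weighting idea can be sketched as follows. This is a minimal illustrative sketch in Python, not the patented implementation; the exponential falloff, the sigma parameter, and the assumption of an (H, W, C) color patch from the non-actively illuminated image are choices made only for this example.

```python
import numpy as np

def support_weights(clean_patch, sigma=10.0):
    """Adaptive support weights from a patch of the non-actively illuminated (clean) image.

    clean_patch: (H, W, C) color patch centered on the pixel of interest.
    Each pixel's weight decays with its color distance to the center pixel
    (illustrative exponential falloff; sigma is an arbitrary example value).
    """
    h, w = clean_patch.shape[:2]
    center = clean_patch[h // 2, w // 2].astype(np.float64)
    # Euclidean color distance of every patch pixel to the center pixel.
    dist = np.linalg.norm(clean_patch.astype(np.float64) - center, axis=-1)
    return np.exp(-dist / sigma)
```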
(10) It should be understood that any of the examples herein are non-limiting. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in active depth sensing and image processing in general.
(12) In the example of
(13) In
(14) The images captured by the cameras 101-104 are provided to an image processing system or subsystem 118. In some implementations, the image processing system 118 and image capturing system or subsystem 104, or parts thereof, may be combined into a single device. For example, a home entertainment device may include all of the components shown in
(15) The image processing system or subsystem 118 includes a processor 120 and a memory 122 containing one or more image processing algorithms, including a stereo matching algorithm 124 as described herein. This may be in hardware logic, firmware and/or in software. In general, in one implementation the stereo matching algorithm 124 determines which dots in a left IR image correlate with which dots in a right IR image, (block 130) whereby depth data may be determined by further processing disparities between matching dots; a depth map thus may be computed.
(16) Also shown in
(17) Note that a calibrated projector may be treated as a camera. That is, if the projected pattern is known, and the projector is calibrated (e.g., its position/orientation/focal length and so forth are known), then patch-based stereo (as described herein) between the known projector image (which as used herein may be considered a captured image) and the actively-illuminated camera image, using adaptive support weights computed from the non-actively-illuminated image, may be performed. Thus, an alternative system may comprise one calibrated projector, one camera to capture an actively-illuminated image, and one camera to capture a non-actively-illuminated image.
(19) More particularly, when computing a match score between a pixel p in the left image 201 (also referred to as I_L) and a pixel q in the right image 202 (also referred to as I_R), and the relative positions and orientations of the three cameras are known, it is possible to determine the position of the pixel (denoted s) in the third image 203 (also referred to as I_W) at which the corresponding point would be visible if p and q did indeed match. The matching algorithm 124 computes the weighted match score between the patches 223 and 224 around p and q, with weights 228 taken from the patch 225 around s in the third image, denoted I_W:
(20) score(p, q) = Σ_i w(s, s_i) · m(I_L(p_i), I_R(q_i)),  with  w(s, s_i) = exp(−‖I_W(s) − I_W(s_i)‖ / σ),

where σ is a scalar parameter, p_i, q_i, and s_i denote corresponding pixels of the patches 223, 224, and 225 around p, q, and s, respectively, and m(·, ·) is the per-pixel match measure (e.g., the term accumulated by an NCC or SSD score).
(21) The contributions of the different parts of the patch 225 are thus weighted based upon other pixels' similarities (e.g., color) in the patch 225 to the pixel s. These weights 228 may be used as if extracted from the actively illuminated images, that is, they are used when computing the NCC or SSD, e.g., based upon conventional Adaptive Support Weights technology, except with externally determined weights. Note that NCC may benefit from having weights decoupled from the patches being processed with those weights.
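Continuing the illustrative sketch above (hypothetical Python; the particular weighted-SSD and weighted-NCC normalizations below are choices made for this example rather than formulas taken from the patent), externally determined weights might enter the two similarity measures over single-channel IR patches as follows:

```python
import numpy as np

def weighted_ssd(patch_left, patch_right, weights):
    """Weighted sum of squared differences between two actively illuminated IR patches,
    with weights taken from the non-actively illuminated image."""
    diff = patch_left.astype(np.float64) - patch_right.astype(np.float64)
    return np.sum(weights * diff ** 2) / np.sum(weights)

def weighted_ncc(patch_left, patch_right, weights):
    """Weighted normalized cross-correlation; the weights are decoupled from the
    IR patches being compared."""
    w = weights / np.sum(weights)
    left = patch_left.astype(np.float64)
    right = patch_right.astype(np.float64)
    left_zm = left - np.sum(w * left)     # weighted zero-mean
    right_zm = right - np.sum(w * right)
    num = np.sum(w * left_zm * right_zm)
    den = np.sqrt(np.sum(w * left_zm ** 2) * np.sum(w * right_zm ** 2)) + 1e-12
    return num / den
```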
(22) With the pixel match data 222, further stereo depth processing 230 may determine a depth map 232. For example, disparities in one or more features between matched pixels (e.g., along with triangulation) may be used to determine depth.
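As general background (a standard relation for a rectified stereo pair, not a formula quoted from the patent), the triangulation step reduces to depth being inversely proportional to disparity, with f the focal length in pixels, B the baseline between the two cameras, and d the disparity between matched pixels:

```latex
% Standard rectified-stereo triangulation (background relation, not quoted from the patent).
Z = \frac{f \, B}{d}
```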
(23) In one implementation, the left image's pixel p is chosen as the reference pixel, with the right image 202 scanned along a line for candidate q pixels to find a best match, and with the s pixel in the image 203 re-determined as the scanning progresses. Notwithstanding, this may be reversed in other implementations, e.g., the left image may be scanned with the right image used as the reference.
(24) In another alternative, the pixels (e.g., the pixel s) in the non-actively illuminated image 203 may be chosen as the reference points. In this situation, both left and right images 201 and 202, respectively, may be simultaneously processed to look for matching pixels based upon Adaptive Support Weights techniques.
(25) As can be readily appreciated, various possible other camera combinations may benefit from the technology described herein. For example, instead of the configuration in
(26) Another alternative is to use filtering, as generally represented in
(27) Time slicing also may be used. For example, the same camera may capture one actively illuminated frame followed by one non-actively illuminated frame. If the frame rate is fast enough relative to any motion in the scene being captured, the pixel matching may be based on using weights extracted from the non-actively illuminated frame.
(28) Turning to another aspect, the equations exemplified herein are presented in a simplified form with respect to a three camera setup, using square patches having identical patch sizes in the three images. In reality, a square patch from one image will appear distorted in both the other two images, and may also have a different size. However, the distortions and size differences may be compensated for in known ways, and in general the underlying concepts are identical.
(29) Notwithstanding, to reduce such effects, in another aspect, two cameras may share the same optical path, one for capturing the actively illuminated image and another for capturing the non-actively illuminated image. Having the same optical path simplifies the computations, e.g., the p and s pixels (or the q and s pixels) shown in
(30) As another alternative, an optical path may be the same for an actively illuminated image and a non-actively illuminated image by having one camera configured with optics/filtering to provide separate images. Thus, instead of the third camera being a separate physical device that captures images from a different viewpoint relative to one or both cameras of the stereo pair, a third camera may be integrated into one of the stereo cameras such that differently illuminated images are captured from the same viewpoint. For example, as in
(31) Alternatively, the mechanism 552 represents that one of the stereo cameras has a Bayer pattern on the pixels whereby some pixels receive light that includes the active illumination, and others do not. From such a single sensor it is possible to produce the two images (one image 554 with and one image 556 without the active illumination) for use in matching with the other (e.g., right camera 558) image 559.
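Purely as an illustrative sketch (hypothetical Python; the patent does not specify the pattern layout or the interpolation used), a single mosaicked sensor frame might be demultiplexed into an actively illuminated image and a clean image roughly as follows, with a deliberately crude fill standing in for proper demosaicing:

```python
import numpy as np

def split_mosaic(raw, active_mask):
    """Split one raw sensor frame into an active-illumination image and a clean image.

    raw:         (H, W) raw sensor values.
    active_mask: (H, W) boolean array, True where the pixel's filter passes the active illumination.
    """
    active = np.where(active_mask, raw, 0).astype(np.float64)
    clean = np.where(~active_mask, raw, 0).astype(np.float64)
    # Crude gap fill from the horizontally adjacent pixel (illustration only; a real
    # pipeline would demosaic/interpolate properly).
    active[~active_mask] = np.roll(active, 1, axis=1)[~active_mask]
    clean[active_mask] = np.roll(clean, 1, axis=1)[active_mask]
    return active, clean
```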
(33) Using the patch in the non-actively illuminated image, in step 608, the weights are determined, e.g., based upon color similarities of other pixels in the patch with the central pixel. These weights are used in step 610 to compute a patch similarity score between the actively illuminated images.
(34) Step 612 repeats the process (e.g., linearly scanning pixels) until the patch-based similarity scores are obtained for pixels that may match. The highest score may be used to determine the pixel that matches the reference pixel, which is output as part of the matched pair at step 614.
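Tying these steps together, a much-simplified per-pixel search might look like the sketch below (hypothetical Python reusing the support_weights and weighted_ncc helpers sketched earlier; rectified images, a left-image reference pixel, and a clean image co-registered with the left IR image so that s shares p's coordinates are all assumptions made only for this example):

```python
import numpy as np

def extract_patch(img, y, x, r):
    """Return the (2r+1) x (2r+1) patch centered at (y, x); caller keeps it in bounds."""
    return img[y - r:y + r + 1, x - r:x + r + 1]

def best_match(left_ir, right_ir, clean, y, x, max_disp=64, r=3, sigma=10.0):
    """Scan the right IR image along the same row for the pixel best matching (y, x)
    in the left IR image, scoring candidates with weights from the clean image."""
    weights = support_weights(extract_patch(clean, y, x, r), sigma)
    ref = extract_patch(left_ir, y, x, r)
    best_d, best_score = None, -np.inf
    for d in range(max_disp + 1):
        if x - d - r < 0:
            break
        cand = extract_patch(right_ir, y, x - d, r)
        score = weighted_ncc(ref, cand, weights)  # higher is a better match
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```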
(35) Note that while color similarity is used as one measure for determining relative weights, other types of similarity may be used. For example, other captured data may include texture data. As one example, texture may be used as a measure to determine possible similarity, using large patches. If not sufficiently similar, a new pixel/patch is chosen as a candidate for matching, and so on. However, if sufficiently similar, a zoomed-in patch may be used, such as for color similarity to determine weights as described herein. This may increase accuracy in pixel matching, at the cost of larger patch processing and multiple-stage patch matching.
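One possible reading of this two-stage idea, again as a hypothetical Python sketch (the texture proxy, threshold, and patch sizes are arbitrary illustrative choices, and weighted_ncc is the helper sketched earlier):

```python
import numpy as np

def texture_signature(patch):
    """A cheap texture proxy: standard deviation of intensities over a large patch."""
    return float(np.std(patch.astype(np.float64)))

def two_stage_score(ref_large, cand_large, ref_small, cand_small, weights_small, texture_tol=5.0):
    """Reject candidates whose large-patch texture differs too much from the reference;
    otherwise score the zoomed-in patches using clean-image weights."""
    if abs(texture_signature(ref_large) - texture_signature(cand_large)) > texture_tol:
        return None  # not sufficiently similar; try the next candidate pixel/patch
    return weighted_ncc(ref_small, cand_small, weights_small)
```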
(36) Example Operating Environment
(37) It can be readily appreciated that the above-described implementation and its alternatives may be implemented on any suitable computing device, including a gaming system, personal computer, tablet, DVR, set-top box, smartphone and/or the like. Combinations of such devices are also feasible when multiple such devices are linked together. For purposes of description, a gaming (including media) system is described as one exemplary operating environment hereinafter.
(39) The CPU 702, the memory controller 703, and various memory devices are interconnected via one or more buses (not shown). The details of the bus that is used in this implementation are not particularly relevant to understanding the subject matter of interest being discussed herein. However, it will be understood that such a bus may include one or more of serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus, using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus also known as a Mezzanine bus.
(40) In one implementation, the CPU 702, the memory controller 703, the ROM 704, and the RAM 706 are integrated onto a common module 714. In this implementation, the ROM 704 is configured as a flash ROM that is connected to the memory controller 703 via a Peripheral Component Interconnect (PCI) bus or the like and a ROM bus or the like (neither of which are shown). The RAM 706 may be configured as multiple Double Data Rate Synchronous Dynamic RAM (DDR SDRAM) modules that are independently controlled by the memory controller 703 via separate buses (not shown). The hard disk drive 708 and the portable media drive 709 are shown connected to the memory controller 703 via the PCI bus and an AT Attachment (ATA) bus 716. However, in other implementations, dedicated data bus structures of different types can also be applied in the alternative.
(41) A three-dimensional graphics processing unit 720 and a video encoder 722 form a video processing pipeline for high speed and high resolution (e.g., High Definition) graphics processing. Data are carried from the graphics processing unit 720 to the video encoder 722 via a digital video bus (not shown). An audio processing unit 724 and an audio codec (coder/decoder) 726 form a corresponding audio processing pipeline for multi-channel audio processing of various digital audio formats. Audio data are carried between the audio processing unit 724 and the audio codec 726 via a communication link (not shown). The video and audio processing pipelines output data to an A/V (audio/video) port 728 for transmission to a television or other display/speakers. In the illustrated implementation, the video and audio processing components 720, 722, 724, 726 and 728 are mounted on the module 714.
(43) In the example implementation depicted in
(44) Memory units (MUs) 750(1) and 750(2) are illustrated as being connectable to MU ports A 752(1) and B 752(2), respectively. Each MU 750(1), 750(2), 750(3), 750(4), 750(5), and 750(6) offers additional storage on which games, game parameters, and other data may be stored. In some implementations, the other data can include one or more of a digital game component, an executable gaming application, an instruction set for expanding a gaming application, and a media file. When inserted into the console 701, each MU 750 can be accessed by the memory controller 703.
(45) A system power supply module 754 provides power to the components of the gaming system 700. A fan 756 cools the circuitry within the console 701.
(46) An application 760 comprising machine instructions is typically stored on the hard disk drive 708. When the console 701 is powered on, various portions of the application 760 are loaded into the RAM 706, and/or the caches 710 and 712, for execution on the CPU 702. In general, the application 760 can include one or more program modules for performing various display functions, such as controlling dialog screens for presentation on a display (e.g., high definition monitor), controlling transactions based on user inputs and controlling data transmission and reception between the console 701 and externally connected devices.
(47) The gaming system 700 may be operated as a standalone system by connecting the system to a high definition monitor, a television, a video projector, or other display device. In this standalone mode, the gaming system 700 enables one or more players to play games, or enjoy digital media, e.g., by watching movies, or listening to music. However, with the integration of broadband connectivity made available through the network interface 732, the gaming system 700 may further be operated as a participating component in a larger network gaming community or system.
CONCLUSION
(48) While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.