Apparatus, method and computer program for performing object recognition
10922590 · 2021-02-16
CPC classification
H04N13/25 (Electricity)
G06V10/751 (Physics)
H04N13/254 (Electricity)
H04N13/271 (Electricity)
G06V40/10 (Physics)
Abstract
An apparatus for performing object recognition includes an image camera to capture a first resolution image and a depth map camera to capture a second resolution depth map. The first resolution is greater than the second resolution. The apparatus is configured to perform object recognition based on the image and the depth map.
Claims
1. An apparatus for performing object recognition, the apparatus comprising: an image camera to capture a first resolution image; a depth map camera to capture a second resolution depth map, wherein the first resolution is greater than the second resolution; and processing circuitry configured to: determine whether a detected face matches a known face based on the image and the depth map; and responsive to a determination that the detected face matches the known face, determine whether the detected face is two-dimensional or three-dimensional, wherein the processing circuitry is configured to: align the depth map with the image based on a distance between the image camera and the depth map camera and a distance between the apparatus and the detected face; and determine a tone of the detected face based on the image and compare the tone of the detected face with a tone of the known face.
2. The apparatus according to claim 1, wherein the image camera comprises an array of image pixels, the depth map camera comprises an array of depth map pixels, and a resolution of the array of image pixels is greater than a resolution of the array of depth map pixels.
3. The apparatus according to claim 2, wherein the resolution of the array of image pixels is between 1 Mpixels and 12 Mpixels and the resolution of the array of depth map pixels is between 0.4 kpixels and 308 kpixels.
4. The apparatus according to claim 1, wherein the image pixels are smaller than the depth map pixels.
5. The apparatus according to claim 1, wherein the image pixels comprise red pixels, green pixels and blue pixels.
6. The apparatus according to claim 1, wherein the depth map pixels comprise infrared pixels.
7. The apparatus according to claim 1, wherein the image camera and the depth map camera are arranged having a field of view of the image camera and a field of view of the depth map camera that are overlapping in an overlapping region.
8. The apparatus according to claim 1, wherein to determine whether the detected face matches the known face includes to determine whether the detected face is a three-dimensional face.
9. The apparatus according to claim 1, wherein the processing circuitry is configured to determine whether the detected face is two-dimensional or three-dimensional based on the depth map.
10. The apparatus according to claim 1 wherein the processing circuitry is configured to, responsive to a determination that the detected face is three-dimensional, determine whether the detected face matches the known face based on the depth map.
11. The apparatus according to claim 1, wherein the processing circuitry is configured to determine a size of the detected face based on the depth map.
12. The apparatus according to claim 1, wherein the processing circuitry is configured to upscale the resolution of the depth map to match the resolution of the image.
13. The apparatus according to claim 1, wherein the processing circuitry is configured to: determine a position of the detected face in the depth map; and determine a position of the detected face in the image based on the position of the detected face in the depth map.
14. The apparatus according to claim 1, wherein the depth map camera comprises at least one component configured to perform at least one of: determine whether the detected face is two-dimensional or three-dimensional based on the depth map; determine a shape of the detected face based on the depth map and compare the shape of the detected face with a shape of the known face; determine features of the detected face based on the depth map and compare the features of the detected face with one or more features of the known face; determine positions of features of the detected face based on the depth map and compare the positions of the features of the detected face with positions of one or more features of the known face; determine the distance between the apparatus and the detected face based on the depth map; determine a shortest distance between the apparatus and the detected face based on the depth map; determine a distance between the apparatus and a center of the detected face based on the depth map; determine a size of the detected face based on the depth map; determine a length of the detected face along a major axis and a width of the detected face along a minor axis; upscale the second resolution of the depth map to match the first resolution of the image; or determine a position of the detected face in the depth map.
15. The apparatus according to claim 1, wherein the image camera comprises at least one component configured to: determine a position of the detected face in the image based on a position of the detected face in the depth map.
16. A system, comprising: an image camera configured to capture an image having a first resolution; a depth map camera configured to capture a depth map having a second resolution, the first resolution being greater than the second resolution; and processing circuitry coupled to the image camera and depth map camera and configured to perform facial recognition based on the captured image and depth map, wherein the processing circuitry is configured to, responsive to a determination that a detected face matches a known face, determine whether the detected face is two-dimensional or three-dimensional based on the depth map, wherein the processing circuitry is configured to: align the depth map with the image based on a distance between the image camera and the depth map camera and a distance between the system and the detected face; and determine a tone of the detected face based on the image and compare the tone of the detected face with a tone of the known face.
17. The system according to claim 16, wherein the system comprises a mobile phone, a tablet computer, a desktop computer, a laptop computer, a video game console, a video door or a smart watch.
18. A method for performing facial recognition, the method comprising: capturing a first resolution image with an image camera; capturing a second resolution depth map with a depth map camera, wherein the second resolution is less than the first resolution; and performing facial recognition based on the image and the depth map, the performing facial recognition including: determining whether a detected face matches a known face; responsive to determining that the detected face matches the known face, determining whether the detected face is two-dimensional or three-dimensional; aligning the depth map with the image based on a distance between the image camera and the depth map camera and a distance between the apparatus and the detected face; and determining a tone of the detected face based on the image and comparing the tone of the detected face with a tone of the known face.
19. A non-transitory computer-readable medium having instructions stored thereon that, when executed by one or more computer processors, cause the one or more computer processors to: capture a first resolution image with an image camera; capture a second resolution depth map with a depth map camera, wherein the first resolution is greater than the second resolution; and perform facial recognition based on the captured first resolution image and second resolution depth map, the performing facial recognition including: determining whether a detected face matches a known face; responsive to determining that the detected face matches the known face, determining whether the detected face is two-dimensional or three-dimensional; aligning the depth map with the image based on a distance between the image camera and the depth map camera and a distance between the apparatus and the detected face; and determining a tone of the detected face based on the image and comparing the tone of the detected face with a tone of the known face.
20. The non-transitory computer-readable medium of claim 19, wherein to perform facial recognition includes to determine whether a face detected in the first resolution image matches with a known face based at least in part on the first resolution image.
21. The method of claim 18, wherein performing facial recognition based on the image and the depth map comprises: determining whether the detected face matches with a known face based on the first resolution image; and responsive to determining that the detected face matches the known face, determining based on the depth map whether the detected face is two-dimensional or three-dimensional.
22. The apparatus of claim 1 wherein the circuitry is configured to initiate facial recognition using the depth map and, in response to the depth map indicating a match, initiate facial recognition using the image.
23. The apparatus of claim 1 wherein the circuitry is configured to selectively authorize an action based on the determination of whether the detected face is two-dimensional or three-dimensional.
24. The system of claim 16, comprising a housing.
25. The apparatus of claim 1 wherein the processing circuitry includes a circuit embedded in the image camera and a circuit embedded in the depth camera.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1) Reference will now be made, by way of example only, to the accompanying drawings.
DETAILED DESCRIPTION
(7) The apparatus 2 comprises a housing. The housing comprises a front part 4 and a rear part (not represented). The front part 4 is made of glass and is transparent (e.g., has a transmittance greater than or equal to 90%) to both visible light (e.g., 400 nm to 700 nm wavelengths) and infrared light (e.g., 800 nm to 990 nm wavelength). For example, the front part 4 may be made of alkali-aluminosilicate sheet glass to improve the impact resistance.
(8) The front part 4 comprises an external face orientated toward the outside of the apparatus 2 and an internal face orientated toward the inside of the apparatus 2. The internal face includes a coated area 6 coated with an ink 8 to prevent the internal face from being damaged by scratches. The ink 8 may be partially transparent (e.g., transmittance between 1% and 50%) to infrared light. The internal face also includes non-coated areas 7, 9 and 11 respectively in the optical paths of an image camera 10, a depth-map camera 12 and a screen 14.
(9) The image camera 10 and the depth-map camera 12 are typically located at the top of the apparatus 2 so that they are not obstructed when the user holds the apparatus 2.
(10) The image camera 10 comprises a first optical axis 16. The depth map camera 12 comprises a second optical axis 18. The image camera 10 and the depth map camera 12 are arranged so that the first and second optical axes 16, 18 are separated by a distance d. The image camera 10 and the depth map camera 12 are also arranged so that the first and second optical axes 16, 18 are parallel. In this way, the image planes of the image camera 10 and the depth map camera 12 are coplanar.
(11) The image camera 10 comprises a first field of view 20. The depth map camera 12 comprises a second field of view 22. The image camera 10 and the depth map camera 12 are arranged so that the first and second fields of view 20, 22 overlap within an overlapping region 24.
(12) As can be seen, in the overlapping region 24 an object, such as a face 26 of a user, can be captured by both the image camera 10 and the depth map camera 12. In a non-overlapping region 28 outside the first and second fields of view 20, 22, a face can be captured neither by the image camera 10 nor by the depth map camera 12. In a non-overlapping region 30 inside the first field of view 20 but outside the second field of view 22, a face can only be captured by the image camera 10. In a non-overlapping region 32 outside the first field of view 20 but inside the second field of view 22, a face can only be captured by the depth map camera 12.
(13) It will be understood that the overlapping region 24 is bounded and there is a minimum distance Zmin between the face 26 and the apparatus 2. The distance d between the first and second optical axes 16, 18 of the image camera 10 and the depth map camera 12 and the first and second fields of view of the image camera 10 and the depth map camera 12 are selected so that the distance Zmin is acceptable. For example, a distance Zmin greater than or equal to 10 cm is acceptable as it is unlikely that a user will hold the apparatus 2 closer than this. Indeed, the human eye is uncomfortable focusing on objects this close. The distance Zmin is typically equal to 20 cm.
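By way of illustration only, the following sketch computes the minimum overlap distance for ideal pinhole cameras with parallel, coplanar optical axes: the inner edges of the two viewing cones cross at Zmin = d / (tan(θ1/2) + tan(θ2/2)), where θ1 and θ2 are the full fields of view. The baseline and field-of-view values used below are assumptions, not values taken from this disclosure.

```python
import math

def z_min(d_m: float, fov1_deg: float, fov2_deg: float) -> float:
    """Distance at which two parallel fields of view begin to overlap."""
    t1 = math.tan(math.radians(fov1_deg) / 2.0)
    t2 = math.tan(math.radians(fov2_deg) / 2.0)
    return d_m / (t1 + t2)

# Assumed values: a 1 cm baseline, a 55 degree image-camera FOV and a
# 25 degree depth-map-camera FOV give a Zmin of roughly 1.3 cm, well
# below the 10-20 cm working distances discussed above.
print(z_min(0.01, 55.0, 25.0))
```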
(14) The first field of view 20 of the image camera 10 is typically greater than the second field of view 22 of the depth map camera 12. For example, the first field of view 20 of the image camera 10 may be between 50° and 60°, which allows capturing a large scene without the distortion associated with wider fields of view. The second field of view 22 of the depth map camera 12 may be between 10° and 40°. The infrared photon flux density is reduced with wider fields of view, and it becomes harder to differentiate between infrared photons from ambient light and infrared photons from an infrared emitter.
(15) The image camera 10 comprises an array 34 of red pixels, green pixels and blue pixels. The resolution of the array 34 of red pixels, green pixels and blue pixels is typically between 1 Mpixels and 2 Mpixels.
(16) Each red pixel, green pixel and blue pixel typically comprises a photodiode and transistors (e.g., 4T, 1.75T or 1.5T architectures). Each red pixel, green pixel and blue pixel typically has a size between 1 µm and 2 µm.
(17) The image camera 10 comprises optics 36 (e.g., one or more lenses or mirrors) to direct light from the overlapping region 24 to the array 34 of red pixels, green pixels and blue pixels.
(18) The image camera 10 comprises a filter 38 to filter (i.e., block) infrared light. The filter 38 may be part of the optics 36 or separate from the optics 36. For example, the filter 38 may be a coating applied on the optics 36.
(19) Alternatively, the image camera 10 does not comprise the filter 38. Instead, the filter 38 is part of the front part 4 of the housing. For example, the filter 38 may be a coating applied on the internal face of the front part 4 of the housing.
(20) The image camera 10 comprises a readout unit 40 configured to selectively read out the red pixels, green pixels and blue pixels. The readout unit 40 typically comprises circuitry to generate a periodic reset pulse, read pulse and transfer gate (TG) pulse (e.g., for a 4T architecture; 3T pixels have no transfer gate). The time between the reset pulse and the TG/read pulses controls the exposure (integration) time of the pixel. In a 4T pixel, the sense node is preferably reset and sampled shortly before the TG pulse; this reset sample is used as part of a correlated double sampling operation, which removes the reset (kTC) noise of the sense node by subtracting the reset sample from the sample obtained after the TG pulse.
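A minimal sketch of the subtraction at the heart of correlated double sampling, assuming the reset-level and post-transfer samples are available as arrays; the function and variable names are illustrative.

```python
import numpy as np

def correlated_double_sample(reset_sample: np.ndarray,
                             signal_sample: np.ndarray) -> np.ndarray:
    """Subtract the sense-node reset level (sampled just before the TG
    pulse) from the level sampled after charge transfer; the kTC reset
    noise is common to both samples and therefore cancels."""
    return signal_sample.astype(np.int32) - reset_sample.astype(np.int32)
```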
(21) The image camera 10 comprises a control unit 42 configured to selectively control the red pixels, green pixels and blue pixels. The control unit 42 typically comprises circuitry to generate the reset, TG and read pulses for each row of the pixel array, as well as pulses for controlling the operation of any sample-and-hold and/or analog-to-digital converter (ADC) circuitry. Preferably, the control circuitry monitors the signals generated by the array and adaptively adjusts their timing to ensure optimal exposure (i.e., that pixels are not saturated). Preferably, the range of the ADC is adjusted to suit the range of voltages obtained by reading out the pixel array. This adjustment may include a programmable gain amplifier (PGA) between the pixel output and the ADC or, preferably, changing the voltage swing of a reference signal (e.g., a ramp connected to one input of a comparator, the other input of the comparator being connected to the array output). Optionally, the control unit 42 can enable or disable suitable clamping circuitry which limits the voltage excursion of the signals output from the pixel array.
(22) In this way, the image camera 10 is able to capture an image comprising a red channel, a green channel and a blue channel.
(23) The image camera 10 comprises a memory unit 44 and a processing unit 46. The memory unit 44 stores instructions which, when executed by the processing unit 46, allow the processing unit 46 to process an image and perform one or more of the steps of the methods described below.
(24) More specifically, the processing unit 46 may be configured to detect a face based on an image.
(25) The processing unit 46 may be configured to determine whether a detected face matches with a known face based on an image.
(26) The processing unit 46 may be configured to determine the shape of a detected face based on an image and compare the shape of the detected face with a shape of a known face stored in the memory unit 44. The shape of a face may typically be a prolate spheroid.
(27) The processing unit 46 may be configured to determine features of a detected face based on an image and compare the features of the detected face with the features of a known face stored in the memory unit 44. The features of a face may typically comprise a mouth, a nose, ears or eye sockets, etc.
(28) The processing unit 46 may be configured to determine positions of features of a detected face based on an image and compare the positions of the features of the detected face with the positions of features of a known face stored in the memory unit 44. The positions of the features may comprise relative positions (e.g., positions of the features with regard to one another or with regard to the center of the face) or absolute positions (e.g., positions in the field of view 20 of the image camera 10).
(29) The processing unit 46 may be configured to align a depth map with an image knowing the distance d between the optical axes 16, 18 of the image camera 10 and the depth map camera 12 and the distance Z between the apparatus 2 and a detected face.
(30) The processing unit 46 may be configured to determine a position of a detected face in an image based on a position of a detected face in a depth map.
(31) The processing unit 46 may be configured to determine a skin tone of a detected face based on an image and compare the skin tone of the detected face with a skin tone of a known face stored in the memory unit 44.
(32) The processing unit 46 may be configured to communicate with a processing unit 48 of the depth map camera 12 or with a central processing unit 50 of the apparatus 2 via a bus. For example, the bus may be an Inter-Integrated Circuit (I2C) bus or a Serial Peripheral Interface (SPI) bus.
(33) The components of the image camera 10 (e.g., the array 34 of red pixels, green pixels and blue pixels, the optics 36, the filter 38, the readout unit 40, the control unit 42, the memory unit 44 and the processing unit 46) are preferably integrated on a single chip. In this way, the processing of the image is less likely to be spoofed by a malicious user.
(34) It will however be understood that the components of the image camera 10 could also be integrated on separate chips.
(35) The depth map camera 12 comprises an array 52 of infrared time-of-flight pixels. The resolution of the array 52 of infrared time-of-flight pixels may be lower than the resolution of the array 34 of red pixels, green pixels and blue pixels of the image camera 10. The resolution of the array 52 of infrared time-of-flight pixels is typically between 0.4 kpixels and 308 kpixels.
(36) Each infrared time-of-flight pixel typically comprises a Single Photon Avalanche Diode (SPAD). SPADs are well known in the art and their functioning is therefore not discussed in detail. Each infrared time-of-flight pixel of the depth map camera 12 may be larger than the red pixels, green pixels and blue pixels of the image camera 10. Each infrared time-of-flight pixel typically has a size between 2 µm and 30 µm.
(37) The depth map camera 12 comprises optics 54 (e.g., one or more lenses or mirrors) to direct light from the overlapping region 24 to the array 52 of infrared time-of-flight pixels.
(38) The depth map camera 12 comprises a filter 56 to filter (i.e., block) visible light. The filter 56 may be part of the optics 54 or separate from the optics 54. For example, the filter 56 may be a coating applied on the optics 54.
(39) Alternatively, the depth map camera 12 does not comprise the filter 56. Instead, the filter 56 is part of the front part 4 of the housing. For example, the filter 56 may be a coating applied on the internal face of the front part 4 of the housing.
(40) In an implementation, the non-coated area 9 on the internal face of the front part 4 of the housing is replaced by a coated area 9 coated with the ink 8 and the ink 8 is transparent (e.g., transmittance greater than or equal to 10%) to infrared light while blocking (e.g., transmittance lower than 10%) visible light.
(41) The depth map camera 12 comprises an infrared emitter 58 to emit infrared light and optics 60 (e.g., one or more lenses or mirrors) to direct infrared light toward the overlapping region 24.
(42) The depth map camera 12 comprises a readout unit 62 configured to selectively read out the infrared time-of-flight pixels. Reading out an infrared time-of-flight pixel typically comprises detecting a pulse generated by the absorption of an infrared photon, determining a time-of-flight of the photon (assuming that the photon was emitted by the infrared emitter 58 and reflected by the face 26) and generating a value indicative of the distance Z between the apparatus 2 and the face 26.
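The time-of-flight to distance conversion itself is a one-line computation: the photon travels to the face and back, so the distance is half the round-trip path. The sketch below assumes the round-trip time has already been measured.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def tof_distance_m(round_trip_s: float) -> float:
    """Convert a round-trip time-of-flight to a distance Z; the factor
    of two accounts for the out-and-back path."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# A round trip of about 1.33 ns corresponds to roughly 0.2 m, the
# typical Zmin discussed above.
print(tof_distance_m(1.33e-9))
```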
(43) The depth map camera 12 comprises a control unit 64 configured to selectively control the infrared time-of-flight pixels. Controlling an infrared time-of-flight pixel typically comprises quenching and resetting the pixel after generating a pulse.
(44) In this way, the depth map camera 12 is able to capture a depth map comprising a depth or distance or Z channel.
(45) The depth map camera 12 comprises a memory unit 66 and the processing unit 48. The memory unit 66 stores instructions which, when executed by the processing unit 48, allow the processing unit 48 to process a depth map and perform one or more of the steps of the methods described below.
(46) More specifically, the processing unit 48 may be configured to detect a face based on a depth map.
(47) The processing unit 48 may be configured to determine whether a detected face is two-dimensional or three-dimensional based on a depth map.
(48) The processing unit 48 may be configured to determine whether a detected face matches with a known face based on a depth map.
(49) The processing unit 48 may be configured to determine the shape of a detected face based on a depth map and compare the shape of the detected face with a shape of a known face stored in the memory unit 66. The shape of a face may typically be a prolate spheroid.
(50) The processing unit 48 may be configured to determine features of a detected face based on a depth map and compare the features of the detected face with the features of a known face stored in the memory unit 66. The features of a face may typically comprise a mouth, a nose, ears or eye sockets, etc.
(51) The processing unit 48 may be configured to determine positions of features of a detected face based on a depth map and compare the positions of the features of a detected face with the positions of features of a known face stored in the memory unit 66. The positions of the features may comprise relative positions (e.g., positions of the features with regard to one another or with regard to the center of the face) or absolute positions (e.g., positions in the field of view 22 of the depth map camera 12).
(52) The processing unit 48 may be configured to determine a distance between the apparatus 2 and a detected face based on the depth map.
(53) The processing unit 48 may be configured to determine a shortest distance between the apparatus 2 and a detected face based on the depth map.
(54) The processing unit 48 may be configured to determine a distance between the apparatus 2 and a center of a detected face based on a depth map.
(55) The processing unit 48 may be configured to determine a size of a detected face based on a depth map.
(56) The processing unit 48 may be configured to determine a length of a detected face along a major axis (i.e., top to bottom or chin to forehead) and a width of a detected face along a minor axis (i.e., left to right or ear to ear).
(57) The processing unit 48 may be configured to upscale the resolution of a depth map to match the resolution of an image.
(58) The processing unit 48 may be configured to determine a position of a detected face in a depth map.
(59) The processing unit 48 may be configured to communicate with the processing unit 46 of the image camera 10 or with the central processing unit 50 of the apparatus 2 via a bus. For example, the bus may be an Inter-Integrated Circuit (I2C) bus or a Serial Peripheral Interface (SPI) bus.
(62) Initially, a user holds the apparatus 2 so that the face 26 is located within the overlapping region 24.
(63) In step 300, the image camera 10 captures an image.
(64) In step 302, the depth map camera 12 captures a depth map.
(65) In step 304, the processing unit 46 of the image camera 10 detects a face and determines whether the detected face matches with a known face based on the captured image.
(66) If the processing unit of the image camera determines that the detected face matches with a known face, the method proceeds to step 306.
(67) If the processing unit of the image camera determines that the detected face does not match with a known face, the method proceeds to step 312.
(68) In step 306 (i.e., the detected face matches with a known face), the processing unit 48 of the depth map camera 12 determines whether the detected face is two-dimensional or three-dimensional. It will be understood that if each feature of the detected face is at substantially the same distance from the apparatus 2, the detected face is two-dimensional. Otherwise, the detected face is three-dimensional. In this way, the apparatus 2 may determine whether a fraudulent user is presenting a picture of a face to the apparatus 2.
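One plausible reading of this test, as a sketch: compare the depth relief across the detected face region with a threshold. A flat picture has essentially zero relief, whereas a real face has centimetres of relief between, for example, the nose tip and the eye sockets. The 1 cm threshold below is an assumption, not a value taken from this disclosure.

```python
import numpy as np

def is_three_dimensional(face_depths_m: np.ndarray,
                         relief_threshold_m: float = 0.01) -> bool:
    """Treat the face as three-dimensional if the depth spread across
    its pixels exceeds an assumed relief threshold (here 1 cm)."""
    relief = float(np.max(face_depths_m) - np.min(face_depths_m))
    return relief > relief_threshold_m
```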
(69) If the detected face is three-dimensional the method proceeds to step 310. If the detected face is two-dimensional the method proceeds to step 312.
(70) In step 310 (i.e., the detected face matches with a known face AND the face is three-dimensional), the processing unit 48 of the depth map camera 12 communicates an indication to the processing unit 46 of the image camera 10 or to the central processing unit 50 of the apparatus 2 that the detected face is three-dimensional. In response, the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 authorizes an action to be taken. For example, the action may be to unlock the apparatus 2 or to authorize a payment.
(71) In step 312 (i.e., the detected face does not match with a known face OR the detected face is two-dimensional), the processing unit 48 of the depth map camera 12 communicates an indication to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 that the detected face is two-dimensional. In response, the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 does not authorize the action to be taken.
(72) The indication may be a bit communicated via the I2C bus or SPI bus. Alternatively, the indication may be a bit communicated via a dedicated input/output line or via a register accessible by the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
(75) In step 408 (i.e., the detected face does not match with a known face AND the detected face is three-dimensional), the processing unit 48 of the depth map camera 12 determines whether the detected face matches with a known face based on the depth map.
(76) If the detected face matches with a known face the method proceeds to step 310. If the detected face does not match with a known face the method proceeds to step 312.
(79) In step 504, the processing unit 48 of the depth map camera 12 detects a face and determines a distance Z between the detected face and the apparatus 2 based on the depth map. The processing unit 48 of the depth map camera 12 communicates the distance to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
(80) In step 506, the processing unit 48 of the depth map camera 12 upscales the resolution of the depth map to match the resolution of the image. The processing unit 48 of the depth map camera 12 communicates the upscaled depth map to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
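The disclosure does not mandate a particular interpolation; as a sketch, a nearest-neighbour upscale maps each high-resolution coordinate back to the nearest low-resolution depth pixel.

```python
import numpy as np

def upscale_depth(depth: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour upscale of a low-resolution depth map to the
    image resolution (out_h x out_w)."""
    in_h, in_w = depth.shape
    rows = np.arange(out_h) * in_h // out_h  # source row per output row
    cols = np.arange(out_w) * in_w // out_w  # source column per output column
    return depth[rows[:, None], cols[None, :]]
```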
(81) In step 508, the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 aligns the upscaled depth map and the image to generate a combined image and depth map comprising a red channel, a green channel, a blue channel and a depth or distance or Z channel.
(82) It will be understood that such aligning can be performed because the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 knows the distance d between the optical axes 16, 18 of the image camera 10 and the depth map camera 12, the fields of view 20, 22 of the image camera 10 and the depth map camera 12, and the distance Z between the detected face and the apparatus 2.
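Under the parallel, coplanar geometry described above, the alignment reduces to a stereo-disparity shift of roughly f·d/Z pixels, where f is the image-camera focal length expressed in pixels. The sketch below applies a single global shift, a simplification that holds for a roughly fronto-parallel face at one depth; the focal length parameter is an assumption.

```python
import numpy as np

def align_depth_to_image(depth_up: np.ndarray, d_m: float,
                         focal_px: float, z_m: float) -> np.ndarray:
    """Shift the upscaled depth map by the stereo disparity so its
    pixels land on the corresponding image pixels."""
    disparity = int(round(focal_px * d_m / z_m))  # horizontal offset in pixels
    aligned = np.full(depth_up.shape, np.nan, dtype=float)  # NaN = no data
    if disparity == 0:
        return depth_up.astype(float)
    aligned[:, disparity:] = depth_up[:, :-disparity]
    return aligned
```

The aligned map can then be stacked with the red, green and blue channels (e.g., with numpy.dstack) to form the combined four-channel image and depth map of step 508.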
(83) In step 509, the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 determines whether the detected face matches with a known face based on the combined image and depth map.
(85) In step 604, the processing unit 46 of the image camera 10 detects a face and determines whether the detected face matches with a known face based on the image and the depth map (e.g., as discussed above).
(86) If the detected face matches with a known face, the method proceeds to step 606. If the detected face does not match with a known face, the method proceeds to step 312.
(87) In step 606 (i.e., the detected face matches with a known face), the processing unit 48 of the depth map camera 12 checks whether it can also detect the face and, if so, determines a size of the detected face and a distance Z to the detected face. The processing unit 48 of the depth map camera 12 communicates the result of the detection, the size and the distance to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
(88) In the event that the processing unit 48 of the depth map camera 12 does not detect the face (e.g., the face is located within the non-overlapping region 30 and therefore can only be detected by the processing unit 46 of the image camera 10), then the size and the distance communicated may be a default size (e.g., 0 or a sentinel value representing some default size) and a default distance (e.g., 0, Zmin, or a sentinel value representing some default distance).
(89) In step 608, the processing unit 46 of the image camera 10 determines whether the size and the distance meet some thresholds (e.g., the size is within an acceptable size range and the distance is within an acceptable distance range).
(90) If the processing unit 46 of the image camera 10 determines that the size and the distance meet the thresholds, the method proceeds to step 310. If the processing unit 46 of the image camera 10 determines that the size and the distance do not meet the thresholds, the method proceeds to step 312.
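As a sketch, the plausibility gate of step 608 can be a pair of range checks; the ranges below are illustrative assumptions, not values taken from this disclosure.

```python
def size_and_distance_plausible(size_m: float, z_m: float,
                                size_range=(0.10, 0.35),
                                z_range=(0.20, 1.00)) -> bool:
    """Accept the detection only if the measured face size and distance
    both fall within assumed acceptable ranges (in metres)."""
    return (size_range[0] <= size_m <= size_range[1]
            and z_range[0] <= z_m <= z_range[1])
```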
(92) In step 704, the processing unit 46 of the image camera 10 and/or the processing unit 48 of the depth map camera 12 detect a face and determine whether the detected face matches with a known face based on the image and/or the depth map.
(93) If the detected face matches with a known face, the method proceeds to step 706. If the detected face does not match with a known face, the method proceeds to step 312.
(94) In step 706 (i.e., the detected face matches with a known face), the processing unit 46 of the image camera 10 determines a skin tone of the detected face based on the image.
(95) In step 708, the processing unit 46 of the image camera 10 determines whether the skin tone of the detected face matches with the skin tone of the known face.
(96) If the skin tone of the detected face matches with the skin tone of the known face, the method proceeds to step 310. If the skin tone of the detected face does not match with the skin tone of the known face, the method proceeds to step 312.
(97) In this way, the processing unit 46 of the image camera 10 ensures that a malicious user is not merely presenting a two-dimensional or three-dimensional representation (e.g., a picture, a cast, a printed model) of a face to the apparatus 2.
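One way such a tone comparison might look, as a sketch: average the colour over the detected face region and compare it with the stored tone of the known face. The Euclidean RGB distance and the tolerance are illustrative choices, not requirements of this disclosure.

```python
import numpy as np

def tone_matches(face_pixels_rgb: np.ndarray,
                 known_tone_rgb: np.ndarray,
                 tolerance: float = 30.0) -> bool:
    """Compare the mean colour of the face region (an N x 3 or H x W x 3
    array of RGB values) with a stored reference tone."""
    mean_tone = face_pixels_rgb.reshape(-1, 3).mean(axis=0)
    distance = float(np.linalg.norm(
        mean_tone - np.asarray(known_tone_rgb, dtype=float)))
    return distance < tolerance
```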
(99) In step 804, the processing unit 48 of the depth map camera 12 detects a face and determines a position of the detected face in the depth map. The processing unit 48 of the depth map camera communicates the position of the detected face in the depth map to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
(100) The processing unit 48 of the depth map camera 12 also determines a distance Z between the detected face and the apparatus 2. The processing unit 48 of the depth map camera 12 communicates the distance between the detected face and the apparatus 2 to the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2.
(101) In step 806, the processing unit 46 of the image camera 10 or the central processing unit 50 of the apparatus 2 determines a position of a detected face in the image based on the position of the detected face in the depth map.
(102) It will be understood that the position of the detected face in the image may be derived knowing the distance d between the optical axes 16, 18 of the image camera 10 and the depth map camera 12, the fields of view 20, 22 of the image camera 10 and the depth map camera 12 and the distance Z between the detected face and the apparatus 2.
(103) In step 808, the processing unit 46 of the image camera determines whether the detected face matches a known face based on the image.
(104) In this way, the apparatus 2 may initiate face recognition using the depth map camera (e.g., low consumption mode) and then complete face recognition using the image camera (e.g., high consumption mode).
(105) It will also be understood that in the above methods the steps performed by the processing unit 46 of the image camera 10 and/or the steps performed by processing unit 48 of the depth map camera 12 could equally be performed by the central processing unit 50 of the apparatus 2.
(107) The apparatus comprises a component 902 for capturing an image. The apparatus comprises a component 904 for capturing a depth map. The apparatus comprises a component 906 for determining whether a detected face matches a known face based on an image. The apparatus comprises a component 908 for determining whether a detected face matches a known face based on a depth map. The apparatus comprises a component 910 for upscaling the resolution of a depth map to match the resolution of an image. The apparatus comprises a component 912 for aligning a depth map with an image to generate a combined image and depth map. The apparatus comprises a component 914 for determining a size of a detected face and a distance to a detected face based on a depth map. The apparatus comprises a component 916 for determining whether a detected face matches a known face based on a combined image and depth map. The apparatus comprises a component 918 for determining a skin tone of a detected face based on an image. The apparatus comprises a component 920 for determining a position of a detected face in a depth map. The apparatus comprises a component 922 for determining a position of a detected face in an image based on a position of the detected face in a depth map.
(108) It will be understood that the components 906 to 922 may be implemented in hardware and/or in software.
(109) Various embodiments with different variations have been described here above. It should be noted that those skilled in the art may combine various elements of these various embodiments and variations.
(110) Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the scope of the claims. Accordingly, the foregoing description is by way of example only and is not intended to be limiting.
(111) The various embodiments described above can be combined to provide further embodiments. These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.