VIDEO USER INTERFACE AND METHOD FOR USE IN DETERMINING DEPTH INFORMATION RELATING TO A SCENE
20240007759 · 2024-01-04
Inventors
CPC classification
H04N23/55
ELECTRICITY
H04N23/95
ELECTRICITY
G06T7/80
PHYSICS
G06T7/521
PHYSICS
H04N23/53
ELECTRICITY
G06F21/32
PHYSICS
International classification
H04N23/95
ELECTRICITY
G06T7/521
PHYSICS
G06T7/80
PHYSICS
H04N23/53
ELECTRICITY
H04N23/55
ELECTRICITY
Abstract
A video user interface for an electronic device, for use in determining depth information relating to a scene, comprises a display, a spatial filter defining a coded aperture, an image sensor and a lens. The scene is disposed in front of the display. The image sensor and the lens are both disposed behind the display. The spatial filter is defined by, or disposed behind, the display. The spatial filter, the image sensor, and the lens are arranged to allow the image sensor to capture an image of the scene through the coded aperture and the lens. The video user interface may be used to determine depth information relating to the scene. The video user interface may use the determined depth information to recognize one or more features in the scene, such as one or more features of a user of the electronic device in the scene, for example one or more facial features of a user of the electronic device in the scene. The video user interface may unlock the electronic device in response to recognizing one or more features in the scene.
Claims
1. A video user interface for an electronic device for use in determining depth information relating to a scene, the video user interface comprising: a display; a spatial filter defining a coded aperture, the spatial filter being disposed behind the display; an image sensor; and a lens, wherein the image sensor and the lens are both disposed behind the display, and wherein the spatial filter, the image sensor, and the lens are arranged to allow the image sensor to capture an image of a scene through the coded aperture and the lens, the scene being disposed in front of the display.
2. The video user interface as claimed in claim 1, wherein at least one of: the spatial filter comprises a binary spatial filter; the spatial filter comprises a plurality of spatial filter pixels, wherein the plurality of spatial filter pixels defines the coded aperture; the spatial filter comprises a plurality of opaque spatial filter pixels; the plurality of opaque spatial filter pixels define one or more gaps therebetween, wherein the one or more gaps define the coded aperture; the spatial filter comprises a plurality of transparent spatial filter pixels, wherein the plurality of transparent spatial filter pixels define the coded aperture; at least some of the opaque spatial filter pixels are interconnected or contiguous; all of the opaque spatial filter pixels are interconnected or contiguous; at least some of the opaque spatial filter pixels are non-contiguous; at least some of the transparent spatial filter pixels are interconnected or contiguous; at least some of the transparent spatial filter pixels are non-contiguous; the spatial filter comprises a 2D array of spatial filter pixels, wherein the 2D array of spatial filter pixels defines the coded aperture; the spatial filter comprises a uniform 2D array of spatial filter pixels, wherein the uniform 2D array of spatial filter pixels defines the coded aperture.
3. The video user interface as claimed in claim 1, wherein the spatial filter comprises an n×n array of spatial filter pixels, wherein the spatial filter pixels define the coded aperture and wherein n is an integer, or wherein the spatial filter comprises an n×m array of spatial filter pixels, wherein the spatial filter pixels define the coded aperture and wherein n and m are integers.
4. The video user interface as claimed in claim 1, wherein at least one of: the display is at least partially transparent; an area of the display is at least partially transparent; wherein the display comprises an LED display.
5. The video user interface as claimed in claim 1, wherein the display and the image sensor are synchronized so that the display emits light and the image sensor captures the image of the scene at different times.
6. The video user interface as claimed in claim 1, wherein: the spatial filter is disposed between the display and the lens; the spatial filter is disposed between the lens and the image sensor; the spatial filter is integrated with the lens; or the spatial filter is disposed on a rear surface of the display on an opposite side of the display to the scene.
7. The video user interface as claimed in claim 1, wherein the display defines the spatial filter; wherein the display comprises one or more at least partially transparent areas and one or more at least partially opaque areas; wherein the spatial filter is defined by the one or more at least partially transparent areas and the one or more at least partially opaque areas; and wherein the one or more at least partially transparent areas of the display and/or the one or more at least partially opaque areas of the display are temporary or transitory.
8. (canceled)
9. (canceled)
10. The video user interface as claimed in claim 7, wherein at least one of: the display comprises a plurality of light emitting pixels; the light emitting pixels define the spatial filter; the light emitting pixels define the one or more at least partially transparent areas of the display and/or the one or more at least partially opaque areas of the display; the display comprises one or more gaps between the light emitting pixels; the one or more gaps between the light emitting pixels define the spatial filter; the one or more gaps between the light emitting pixels define the one or more at least partially transparent areas of the display and/or the one or more at least partially opaque areas of the display.
11. The video user interface as claimed in claim 1, wherein the image sensor comprises a visible image sensor sensitive to visible light, wherein the image sensor comprises an RGB image sensor or wherein the image sensor comprises an infra-red image sensor sensitive to infra-red light such as near infra-red (NIR) light.
12. (canceled)
13. (canceled)
14. The video user interface as claimed in claim 1, wherein a geometry of the coded aperture is selected so as to maximize a divergence parameter value, wherein the divergence parameter is defined so that the greater the divergence parameter value calculated for a given coded aperture geometry, the better the discrimination that is achieved between regions of different depths in the image of the scene captured by the image sensor when using the given coded aperture geometry.
15. The video user interface as claimed in claim 14, wherein calculating the divergence parameter value for each candidate coded aperture geometry comprises: applying a plurality of different scale factor values to the geometry of the candidate coded aperture to obtain a plurality of scaled versions of the candidate coded aperture; calculating a divergence parameter value for each different pair of scaled versions of the candidate coded aperture selected from the plurality of scaled versions of the candidate coded aperture; and identifying the divergence parameter value for each candidate coded aperture geometry as the minimum divergence parameter value calculated for any different pair of scaled versions of the candidate coded aperture selected from the plurality of scaled versions of the candidate coded aperture.
16-17. (canceled)
18. The video user interface as claimed in claim 1, further comprising a processing resource configured to determine depth information relating to each of one or more regions of the scene based at least in part on the captured image and calibration data.
19. The video user interface as claimed in claim 18, wherein the calibration data comprises a plurality of calibration images of a plurality of calibration scenes and a corresponding plurality of measured depth values, wherein each calibration scene includes a point light source located at a different one of the measured depths and each calibration scene is captured by the image sensor through the coded aperture and the lens.
20. An electronic device comprising the video user interface as claimed in claim 1.
21. A method for use in determining depth information relating to a scene using a video user interface, wherein the video user interface comprises a display, a spatial filter defining a coded aperture, an image sensor and a lens, and the method comprises: capturing an image of a scene through the coded aperture and the lens using the image sensor, the image sensor and the lens both being disposed behind the display, the scene being disposed in front of the display, and the spatial filter being disposed behind the display.
22. The method as claimed in claim 21, further comprising: determining depth information relating to each of one or more regions of the scene based at least in part on the captured image and calibration data.
23. The method as claimed in claim 22, wherein the calibration data comprises a plurality of calibration images of a plurality of calibration scenes and a corresponding plurality of measured depth values, wherein each calibration scene includes a point light source located at a different one of the measured depths and each calibration scene is captured by the image sensor through the coded aperture and the lens.
24. The method as claimed in claim 22, further comprising generating an all-focus image of the scene and/or a re-focused image of the scene based on the determined depth information relating to each of one or more regions of the scene.
25. (canceled)
26. The method as claimed in claim 22, further comprising recognizing one or more features in the scene based on the determined depth information relating to each of the one or more regions of the scene.
27. The method as claimed in claim 26, further comprising unlocking the electronic device in response to recognizing one or more features in the scene.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0110] A video user interface for an electronic device and associated methods will now be described by way of non-limiting example only with reference to the accompanying drawings of which:
[0111]
[0112]
[0113]
[0114]
[0115]
[0116]
[0117]
[0118]
[0119]
[0120]
[0121]
[0122]
[0123]
[0124]
[0125]
[0126]
[0127]
[0128]
DETAILED DESCRIPTION
[0129] Referring initially to
[0130] Referring to
[0131] In use, the image sensor 116 captures an image of the scene 130 disposed in front of the mobile electronic device 102 through the display 108, the coded aperture of the spatial filter 118, and the lens 114, and the processing resource 120 processes the image captured by the image sensor 116 to determine depth information relating to each of one or more regions of the scene 130.
[0132] The processing resource 120 synchronizes the display 108 and the image sensor 116 so that the display 108 emits light and the image sensor 116 captures the image of the scene 130 at different times. Synchronization of the display 108 and the image sensor 116 in this way may avoid any light from the display 108 being captured by the image sensor 116 to thereby prevent light from the display 108 altering, corrupting or obfuscating the captured image of the scene 130.
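By way of illustration only, the time-multiplexing described in this paragraph can be sketched in Python as follows. The function names, the 16 ms frame period and the 4 ms capture window are hypothetical values chosen for the example; the disclosure does not specify any timing figures or a particular scheduling scheme.

```python
def interleaved_schedule(n_frames, frame_period_ms, capture_ms):
    """Return (display_windows, capture_windows) as lists of (start, end) in ms.

    Assumed scheme: within each frame period the display emits first and is
    then blanked while the image sensor integrates, so the windows never overlap.
    """
    if not 0 < capture_ms < frame_period_ms:
        raise ValueError("capture window must fit inside the frame period")
    display, capture = [], []
    for i in range(n_frames):
        start = i * frame_period_ms
        split = start + frame_period_ms - capture_ms
        display.append((start, split))                    # display emits light
        capture.append((split, start + frame_period_ms))  # sensor captures
    return display, capture


def overlaps(a, b):
    """True if the half-open intervals a and b share any time."""
    return a[0] < b[1] and b[0] < a[1]


display_windows, capture_windows = interleaved_schedule(3, 16.0, 4.0)
```

In this sketch no display window overlaps any capture window, which is the property relied on to keep light from the display out of the captured image.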
[0133] The image of the scene 130 captured by the image sensor 116 and the depth information relating to each region of the scene 130 may together constitute a depth image or a depth map of the scene 130. The depth information relating to each of the one or more regions of the scene 130 may comprise a distance from any part of the video user interface 104 to each of the one or more regions of the scene 130. For example, the depth information relating to each of one or more regions of the scene 130 may comprise a distance from the lens 114 of the video user interface 104 to each of one or more regions of the scene 130. The depth information relating to each of one or more regions of the scene 130 may comprise a distance from a focal plane of the lens 114 of the video user interface 104 to each of one or more regions of the scene 130, wherein the focal plane of the lens 114 is defined such that different light rays which emanate from a point in the focal plane of the lens 114 are focused to the same point on the image sensor 116.
[0134] As will be described in more detail below, the processing resource 120 is configured to determine depth information relating to each of one or more regions of the scene 130 based at least in part on the image of the scene 130 captured by the image sensor 116 and calibration data. As will be understood by one skilled in the art, the spatial filter 118 allows light to reach the image sensor 116 in a specifically calibrated pattern, which can be decoded to retrieve depth information. Specifically, as may be appreciated from "Image and Depth from a Conventional Camera with a Coded Aperture", Levin et al., ACM Transactions on Graphics, Vol. 26, No. 3, Article 70, pp. 70-1 to 70-9, which is incorporated herein by reference in its entirety, when compared with a conventional uncoded aperture, the coded aperture defined by the spatial filter 118 may be used to provide improved depth discrimination between different regions of an image of a scene having different depths. Accordingly, it should be understood that protection may be sought for any of the features of Levin et al.
[0135] The calibration data comprises a plurality of calibration images of a plurality of calibration scenes and a corresponding plurality of measured depth values, wherein each calibration scene includes a point light source located at a different one of the measured depths and each calibration scene is captured by the image sensor 116 through the coded aperture and the lens 114. The measured depth of the point light source in a corresponding calibration scene comprises a measured distance from any part of the video user interface 104 to the point light source in the corresponding calibration scene. For example, the measured depth of the point light source in the corresponding calibration scene may comprise a measured distance from the lens 114 to the point light source in the corresponding calibration scene. The measured depth of the point light source in the corresponding calibration scene may comprise a measured distance from a focal plane of the lens 114 to the point light source in the corresponding calibration scene, wherein the focal plane of the lens 114 is defined such that different light rays which emanate from a point in the focal plane of the lens 114 are focused to the same point on the image sensor 116.
[0136] It should be understood that the relative positions of the spatial filter 118, the image sensor 116 and the lens 114 when the image sensor 116 captures the images of the point light source in the corresponding calibration scene through the coded aperture and the lens 114 for the generation of the calibration data, should be the same as the relative positions of the spatial filter 118, the image sensor 116 and the lens 114 when the image sensor 116 captures the image of the scene 130 through the coded aperture and the lens 114.
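As a non-limiting illustration, the calibration data described above may be thought of as a set of records, each pairing one measured depth with the calibration image captured at that depth. The Python sketch below shows one possible in-memory layout. The names CalibrationEntry and build_calibration_data, and the use of millimetres, are assumptions made for the example and do not appear in the disclosure.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class CalibrationEntry:
    measured_depth_mm: float  # measured depth of the point light source
    image: np.ndarray         # image captured through the coded aperture and lens


def build_calibration_data(depths_mm, images):
    """Pair each calibration image with its measured depth, sorted by depth."""
    if len(depths_mm) != len(images):
        raise ValueError("one calibration image is needed per measured depth")
    if len(set(depths_mm)) != len(depths_mm):
        # Each calibration scene places the point source at a different depth.
        raise ValueError("measured depths must all be different")
    entries = [CalibrationEntry(float(d), np.asarray(img))
               for d, img in zip(depths_mm, images)]
    return sorted(entries, key=lambda e: e.measured_depth_mm)
```

All entries are assumed to have been captured with the spatial filter, image sensor and lens in the same relative positions that are later used to image the scene.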
[0137] Referring to
[0143] In effect, the calibration distance determined at step 168 for each region j of the scene 130 provides depth information relating to each region j of the scene 130. For example, the captured image of the scene 130 and the calibration distance determined at step 168 for each region j of the scene 130 may together be considered to constitute a depth image or a depth map of the scene 130.
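For illustration only, the following Python sketch shows how per-region calibration distances might be assembled into a depth map once they have been determined. How the distance for each region j is determined at step 168 is not reproduced in this section, so the sketch takes those distances as given. The function name, the label-array representation of the regions and the millimetre values are hypothetical.

```python
import numpy as np


def assemble_depth_map(region_labels, region_distances_mm):
    """Map each pixel's region index j to the calibration distance of region j."""
    region_labels = np.asarray(region_labels)
    distances = np.asarray(region_distances_mm, dtype=float)
    if region_labels.min() < 0 or region_labels.max() >= distances.size:
        raise ValueError("every region index needs a calibration distance")
    return distances[region_labels]  # integer-array indexing, one depth per pixel


# Three regions (j = 0, 1, 2) covering a 2 x 3 pixel image.
labels = np.array([[0, 0, 1],
                   [2, 2, 1]])
depth_map = assemble_depth_map(labels, [250.0, 400.0, 900.0])
```

The resulting array has the same shape as the captured image and can be stored alongside it as the depth image or depth map.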
[0144] It should be understood that the method generally designated 160 for use in generating depth information relating to the scene 130 is described in more detail in Sections 3, 4 and 5 of Levin et al. and that protection may be sought for any of the features described in Sections 3, 4 and 5 of Levin et al.
[0145] Furthermore, as will be understood by one of ordinary skill in the art, the depth information relating to the scene 130 may be used to generate an all-focus image of the scene 130 as described at Section 5.2 of Levin et al. and/or to generate a re-focused image of the scene 130 as described at Section 5.4 of Levin et al. Accordingly, it should be understood that protection may be sought for any of the features described in Sections 5.2 and/or 5.4 of Levin et al.
[0146] The calibration data is generated by performing a calibration procedure 170 which is illustrated in
[0150] It also should be understood that the calibration procedure 170 is described in more detail in Section 5.1 of Levin et al. and that protection may be sought for any of the features described in Section 5.1 of Levin et al.
[0151] The geometry of the coded aperture defined by the spatial filter 118 may be optimized by selecting the geometry of the coded aperture so as to maximize a divergence parameter value. The divergence parameter is defined so that the greater the divergence parameter value calculated for a given coded aperture geometry, the better the depth discrimination that is achieved between regions of different depths in the image of the scene 130 captured by the image sensor 116 when using the given coded aperture geometry. Specifically, the coded aperture geometry is selected by generating, for example randomly generating, a plurality of different candidate coded aperture geometries, calculating a divergence parameter value for each candidate coded aperture geometry, and selecting the candidate coded aperture geometry which has the maximum calculated divergence parameter value.
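The selection loop described in this paragraph can be sketched in Python as follows, by way of example only. The function names are hypothetical, the candidates are assumed to be binary n×n patterns with 1 denoting a transparent spatial filter pixel, and the scoring function is passed in as a parameter because this sketch does not itself compute the divergence parameter.

```python
import numpy as np


def random_candidates(n_side, n_candidates, rng):
    """Randomly generate binary n_side x n_side patterns (1 = transparent)."""
    candidates = []
    while len(candidates) < n_candidates:
        pattern = rng.integers(0, 2, size=(n_side, n_side))
        if pattern.any():  # skip fully opaque patterns, which pass no light
            candidates.append(pattern)
    return candidates


def select_aperture(candidates, divergence_score):
    """Return the candidate with the maximum divergence parameter value."""
    scores = [divergence_score(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


rng = np.random.default_rng(0)
candidates = random_candidates(n_side=7, n_candidates=20, rng=rng)
# Placeholder score (count of transparent pixels), used only to run the loop;
# it is NOT the divergence parameter of the disclosure.
best_pattern, best_score = select_aperture(candidates, lambda c: float(c.sum()))
```

In practice the placeholder would be replaced by a function that returns the divergence parameter value for a candidate geometry.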
[0152] Specifically, the divergence parameter value for each candidate coded aperture geometry is calculated by applying a plurality of different scale factor values to the geometry of the candidate coded aperture to obtain a plurality of scaled versions of the candidate coded aperture, calculating a divergence parameter value for each different pair of scaled versions of the candidate coded aperture selected from the plurality of scaled versions of the candidate coded aperture, and identifying the divergence parameter value for each candidate coded aperture geometry as the minimum divergence parameter value calculated for any different pair of scaled versions of the candidate coded aperture selected from the plurality of scaled versions of the candidate coded aperture. For example,
[0153] The plurality of different scale factor values applied to each candidate coded aperture geometry is selected from a predetermined range of scale factor values, wherein each scale factor value corresponds to a different depth of the point light source in a scene selected from a predetermined range of depths of the point light source. For the example of the specific candidate coded aperture geometry of
[0154] The divergence parameter value for each different pair of scaled versions of the candidate coded aperture is calculated by calculating the divergence parameter value based on a statistical blurry image intensity distribution for each of the two scaled versions of the candidate coded aperture of each different pair of scaled versions of the candidate coded aperture. Specifically, the divergence parameter value for each different pair of scaled versions of the candidate coded aperture is calculated by calculating a Kullback-Leibler divergence parameter $D_{KL}$ defined by:
$$D_{KL}[P_{k_1}(y), P_{k_2}(y)] = \int_y P_{k_1}(y)\,[\log P_{k_1}(y) - \log P_{k_2}(y)]\,dy$$
[0155] where $y$ is a simulated blurry image of a point light source captured by the image sensor 116 through the candidate coded aperture, $P_{k_1}(y)$ and $P_{k_2}(y)$ are the statistical blurry image intensity distributions of the blurry image $y$ at different scale factor values $k_1$ and $k_2$ corresponding to different depths of the point light source in a scene, and each of the statistical blurry image intensity distributions $P_{k_1}(y)$ and $P_{k_2}(y)$ follows a Gaussian distribution.
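Because both distributions are Gaussian, the integral can be evaluated in closed form. The worked expression below assumes, beyond what is stated above, that each distribution is zero-mean with independent components indexed by $i$, having variances $\sigma_{k_1}(i)$ and $\sigma_{k_2}(i)$; these symbols are introduced here for illustration only.

```latex
D_{KL}\left[P_{k_1}(y), P_{k_2}(y)\right]
  = \frac{1}{2} \sum_i \left[
      \frac{\sigma_{k_1}(i)}{\sigma_{k_2}(i)}
      - 1
      - \log \frac{\sigma_{k_1}(i)}{\sigma_{k_2}(i)}
    \right]
```

Each term is zero when the two variances are equal and positive otherwise, so under this assumption the divergence grows as the two scaled versions of the aperture produce more distinguishable blur statistics.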
[0156] Thus, for the example of the specific candidate coded aperture geometry of
[0157] The divergence parameter value calculated for the candidate coded aperture geometry is then compared to divergence parameter values calculated for one or more other candidate coded aperture geometries and the candidate coded aperture geometry having the maximum divergence parameter value is selected for the spatial filter 118. For example,
[0158] It should be understood that the method described above for selecting the geometry of the coded aperture is described in more detail in Section 2 of Levin et al. and that protection may be sought for any of the features described in Section 2 of Levin et al.
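A minimal Python sketch of the scoring described in paragraphs [0152] to [0154] is given below by way of example only. It rests on several assumptions that are not taken from this document: integer scale factors applied by pixel replication, a zero-mean Gaussian model with independent frequency components whose variance is prior_var·|K|² + noise_var (a simplified, flat-prior stand-in for the model of Levin et al.), and evaluation of ordered pairs because the Kullback-Leibler divergence is not symmetric. All function names and default values are hypothetical.

```python
from itertools import permutations

import numpy as np


def scaled_versions(aperture, scale_factors):
    """Scale a binary aperture by integer factors via pixel replication."""
    aperture = np.asarray(aperture, dtype=float)
    return [np.kron(aperture, np.ones((k, k))) for k in scale_factors]


def blur_variances(kernel, shape, prior_var=1.0, noise_var=1e-3):
    """Per-frequency variance of the blurry image under the assumed model."""
    kernel = kernel / kernel.sum()           # conserve total light
    spectrum = np.fft.fft2(kernel, s=shape)  # zero-pads the kernel to `shape`
    return prior_var * np.abs(spectrum) ** 2 + noise_var


def gaussian_kl(var_1, var_2):
    """KL divergence between zero-mean Gaussians with diagonal covariances."""
    ratio = var_1 / var_2
    return 0.5 * float(np.sum(ratio - 1.0 - np.log(ratio)))


def divergence_score(aperture, scale_factors, shape=(64, 64)):
    """Minimum pairwise divergence over all scaled versions of one candidate."""
    variances = [blur_variances(k, shape)
                 for k in scaled_versions(aperture, scale_factors)]
    return min(gaussian_kl(a, b) for a, b in permutations(variances, 2))


aperture = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 1, 0]])
score = divergence_score(aperture, scale_factors=[1, 2, 3])
```

Taking the minimum over pairs means a candidate is scored by its two hardest-to-distinguish depths. At least two scale factors are needed, the aperture must contain at least one transparent pixel, and `shape` should be at least as large as the largest scaled aperture.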
[0159] Referring now to
[0160] Referring now to
[0161] Referring now to
[0162] Referring now to
[0163] One of ordinary skill in the art will understand that various modifications may be made to the video user interfaces and methods described above without departing from the scope of the present disclosure. For example, any of the image sensors 116, 216, 316, 416, 516 may be sensitive to visible light, for example any of the image sensors 116, 216, 316, 416, 516 may be a visible image sensor or an RGB image sensor. Any of the image sensors 116, 216, 316, 416, 516 may be sensitive to infra-red light such as near infra-red (NIR) light, for example any of the image sensors 116, 216, 316, 416, 516 may be an infra-red image sensor. The video user interface may comprise a plurality of image sensors. For example, the video user interface may comprise an infra-red image sensor defined by, or disposed behind, the display for use in generating a depth image of a scene disposed in front of the display as described above and a separate visible image sensor defined by, or disposed behind, the display for capturing conventional images of the scene disposed in front of the display. The video user interface may comprise a source, emitter or projector of infra-red light for illuminating the scene with infra-red light. The source, emitter or projector of infra-red light may be disposed behind the display. Use of a source, emitter or projector of infra-red light in combination with an infra-red image sensor for use in generating a depth image of a scene disposed in front of the display may provide improved depth information relating to the scene.
[0164] Any of the video user interfaces 104, 204, 304, 404, 504 described above may be used in an electronic device of any kind, for example a mobile and/or portable electronic device of any kind, including in a phone such as a mobile phone, a cell phone, or a smart phone, or in a tablet or a laptop.
[0165] Embodiments of the present disclosure can be employed in many different applications including in the recognition of one or more features in the scene. For example, any of the video user interfaces 104, 204, 304, 404, 504 may be suitable for use in the recognition of one or more features of a user of the electronic device in the scene, such as one or more facial features of the user, for facial unlocking of the electronic device. Such a video user interface may allow emojis, or one or more other virtual elements, to be superimposed on top of an image of the scene captured by the image sensor through the coded aperture and the lens. Such a video user interface may allow the generation of an improved selfie image captured by the image sensor through the coded aperture and the lens. Such a video user interface may allow emojis, or one or more other virtual elements, to be superimposed on top of the selfie image captured by the image sensor through the coded aperture and the lens.
[0166] Although preferred embodiments of the disclosure have been described in terms as set forth above, it should be understood that these embodiments are illustrative only and that the claims are not limited to those embodiments. Those skilled in the art will understand that various modifications may be made to the described embodiments without departing from the scope of the appended claims. Each feature disclosed or illustrated in the present specification may be incorporated in any embodiment, either alone, or in any appropriate combination with any other feature disclosed or illustrated herein. In particular, one of ordinary skill in the art will understand that one or more of the features of the embodiments of the present disclosure described above with reference to the drawings may produce effects or provide advantages when used in isolation from one or more of the other features of the embodiments of the present disclosure and that different combinations of the features are possible other than the specific combinations of the features of the embodiments of the present disclosure described above.
[0167] The skilled person will understand that in the preceding description and appended claims, positional terms such as "above", "along", "side", etc. are made with reference to conceptual illustrations, such as those shown in the appended drawings. These terms are used for ease of reference but are not intended to be of limiting nature. These terms are therefore to be understood as referring to an object when in an orientation as shown in the accompanying drawings.
[0168] Use of the term "comprising" when used in relation to a feature of an embodiment of the present disclosure does not exclude other features or steps. Use of the term "a" or "an" when used in relation to a feature of an embodiment of the present disclosure does not exclude the possibility that the embodiment may include a plurality of such features.
[0169] The use of reference signs in the claims should not be construed as limiting the scope of the claims.
LIST OF REFERENCE NUMERALS
[0170] 2 mobile electronic device;
[0171] 4 video user interface;
[0172] 6 front face of mobile electronic device;
[0173] 8 display;
[0174] 9 notch;
[0175] 10 notch cover;
[0176] 12 camera;
[0177] 14 lens;
[0178] 16 image sensor;
[0179] 20 processing resource;
[0180] 30 scene;
[0181] 102 mobile electronic device;
[0182] 104 video user interface;
[0183] 106 front face of mobile electronic device;
[0184] 108 display;
[0185] 112 camera;
[0186] 114 lens;
[0187] 116 image sensor;
[0188] 118 spatial filter;
[0189] 118a opaque spatial filter pixel;
[0190] 118b transparent spatial filter pixel;
[0191] 120 processing resource;
[0192] 130 scene;
[0193] 204 video user interface;
[0194] 206 front face of mobile electronic device;
[0195] 208 display;
[0196] 212 camera;
[0197] 214 lens;
[0198] 216 image sensor;
[0199] 218 spatial filter;
[0200] 220 processing resource;
[0201] 230 scene;
[0202] 304 video user interface;
[0203] 306 front face of mobile electronic device;
[0204] 308 display;
[0205] 312 camera;
[0206] 314 lens;
[0207] 316 image sensor;
[0208] 318 spatial filter;
[0209] 320 processing resource;
[0210] 330 scene;
[0211] 404 video user interface;
[0212] 406 front face of mobile electronic device;
[0213] 408 display;
[0214] 412 camera;
[0215] 414 lens;
[0216] 416 image sensor;
[0217] 418 spatial filter;
[0218] 420 processing resource;
[0219] 430 scene;
[0220] 504 video user interface;
[0221] 506 front face of mobile electronic device;
[0222] 508 display;
[0223] 512 camera;
[0224] 514 lens;
[0225] 516 image sensor;
[0226] 518 spatial filter;
[0227] 520 processing resource; and
[0228] 530 scene.