METHOD AND DEVICE FOR AUDIO STEERING USING GESTURE RECOGNITION
20240098434 · 2024-03-21
Inventors
- Hassane Guermoud (Thorigne Fouillard, FR)
- Michel Kerdranvat (Chantepie, FR)
- Alexey OZEROV (Rennes, FR)
CPC classification
- H04S1/002 (ELECTRICITY)
- H04R2499/15 (ELECTRICITY)
- G06F3/017 (PHYSICS)
- H04N21/44218 (ELECTRICITY)
- H04N21/4104 (ELECTRICITY)
- G06V40/28 (PHYSICS)
International classification
Abstract
A method and device for audio steering from a loudspeaker line array of a display device toward a user direction are disclosed. Data corresponding to a viewer gesture is obtained from at least one sensor of the display device. A distance and an angle between the viewer and a plurality of loudspeakers coupled to the display device are determined based on the obtained data. Phase shifting is applied to an audio signal powering the plurality of loudspeakers, based on the determined distance and angle, to steer the audio toward the user direction.
Claims
1. A system, comprising: a display device including an image sensor; and at least one processor, configured to: obtain, from the image sensor, data corresponding to a gesture of a viewer; determine respective distances between the viewer and a plurality of loudspeakers coupled to the display device based on the obtained data; and shift respectively audio signals powering the plurality of loudspeakers, based on the determined respective distances.
2. The system of claim 1, wherein the image sensor is a camera.
3. The system of claim 1, wherein the viewer gesture is one of a hand gesture, a facial expression, a head movement from side-to-side, head nodding, and arm movements from side-to-side.
4. The system of claim 3, wherein the hand gesture is one of holding up one hand palm flat, holding up one or more fingers, holding up a thumb and making a circle by contacting any finger with the thumb.
5. The system of claim 1, wherein the plurality of loudspeakers is configured as a line array.
6. The system of claim 1, wherein the plurality of loudspeakers is positioned adjacent to a bottom portion of the display device.
7. The system of claim 1, wherein an input for each loudspeaker in the plurality of loudspeakers is coupled to a phase-shifting gain controller which is fed with an audio source.
8. The system of claim 1, wherein the viewer gesture is used to direct phase shifting of the audio signal powering the plurality of loudspeakers away from a location for the viewer.
9. The system of claim 1, wherein an image sensor focal length for the image sensor is obtained based on images of viewer gestures for a first position and a second position.
10. The system of claim 3, wherein a hand size of the hand gesture is obtained using gender and age estimation based on face capture.
11. A method, comprising: obtaining, from at least one image sensor of a display device, data corresponding to a gesture of a viewer; determining respective distances between the viewer and a plurality of loudspeakers coupled to the display device based on the obtained data; and shifting respectively signals powering the plurality of loudspeakers based on the determined respective distances.
12. The method of claim 11, wherein the image sensor is a camera.
13. The method of claim 11, wherein the viewer gesture is one of a hand gesture, a facial expression, a head movement from side-to-side, head nodding, and arm movements from side-to-side.
14. The method of claim 13, wherein the hand gesture is one of holding up one hand palm flat, holding up one or more fingers, holding up a thumb and making a circle by contacting any finger with the thumb.
15. The method of claim 11, wherein the plurality of loudspeakers is configured as a line array.
16. The method of claim 11, wherein the plurality of loudspeakers is positioned adjacent to a bottom portion of the display device.
17. The method of claim 11, wherein an input for each loudspeaker in the plurality of loudspeakers is coupled to a phase-shifting gain controller which is fed with an audio source.
18. The method of claim 11, wherein the viewer gesture is used to direct phase shifting of the audio signal powering the plurality of loudspeakers away from a location for the viewer.
19. (canceled)
20. The method of claim 13, wherein a hand size of the hand gesture is obtained using gender and age estimation based on face capture.
21. A computer program product comprising instructions which when executed cause a processor to implement the method of claim 11.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Other features and advantages of embodiments shall appear from the following description, given by way of indicative and non-exhaustive examples and from the appended drawings, of which:
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
DETAILED DESCRIPTION
[0023]
[0024] The display device 305 may be any consumer electronics device incorporating a display screen (not shown), such as, for example, a digital television. The display device 305 includes at least one processor 320 and a sensor 310. Processor 320 may include software that is configured to estimate the distance and angle with respect to a user location. Processor 320 may also be configured to determine the phase shift applied to the audio signals powering the audio array 330. The sensor 310 identifies gestures performed by a user (not shown) of the display device 305.
[0025] The processor 320 may include embedded memory (not shown), an input-output interface (not shown), and various other circuitries as known in the art. Program code may be loaded into processor 320 to perform the various processes described hereinbelow.
[0026] Alternatively, the display device 305 may also include at least one memory (e.g., a volatile memory device, a non-volatile memory device) which stores program code to be loaded into the processor 320 for subsequent execution. The display device 305 may additionally include a storage device (not shown), which may include non-volatile and/or volatile memory, including but not limited to EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, a magnetic disk drive, and/or an optical disk drive. The storage device may comprise an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
[0027] The sensor 310 may be any device that can identify gestures performed by a user of the display device 305. In one example embodiment, the sensor may be, for example, a camera, and more specifically an RGB camera. The sensor 310 may be internal to the display device 305 as shown in
[0028] The audio array 330 is an array of loudspeakers arranged in a line (see
[0029] The general principle of the proposed solution relates to using viewer gestures to initiate audio steering from a loudspeaker line array of a display device toward a user direction. The audio steering is performed on-the-fly, based on a touchless interaction with the display device without relying on a calibration step or use of a remote-control device.
[0030]
[0031] In the example implementation, the method is carried out by apparatus 300 (
[0032]
[0033] Referring again to
[0034] In one example embodiment, a set of known user gestures may be available to the processor 320. For such an embodiment, when one user gesture of the set of known user gestures is detected by the sensor 310, audio steering from the display device towards a user direction is initiated.
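As a minimal sketch of this gating step, the following Python fragment checks a detected gesture against a set of known gestures before audio steering is initiated. The gesture labels and the function name should_start_steering are hypothetical and chosen for illustration only; the present description does not specify how the set of known user gestures is represented.

```python
from typing import Optional

# Hypothetical labels for the set of known user gestures (illustrative only).
KNOWN_GESTURES = frozenset({
    "palm_flat",
    "fingers_up",
    "thumb_up",
    "finger_thumb_circle",
})


def should_start_steering(detected_gesture: Optional[str]) -> bool:
    """Return True when the detected gesture belongs to the known set.

    A value of None stands for "no gesture detected" and never
    initiates audio steering.
    """
    return detected_gesture in KNOWN_GESTURES
```

In such a sketch, a gesture that the sensor reports but that is not part of the set is simply ignored, so that only deliberate, predefined gestures trigger steering.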
[0035]
[0036]
[0037] Referring to step 410 of
[0038] Referring to
where d is the distance of the hand (
[0039] The hand height (H) can vary depending on gender and age. In an example embodiment, a gender and age estimation based on face capture may be used to approximate this variable. For example, this estimation may be performed using MANIMALA ET AL., Anticipating Hand and Facial Features of Human Body using Golden Ratio, International Journal of Graphics & Image Processing, Vol. 4, No. 1, February 2014, pp. 15-20.
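By way of illustration only, the relation between the hand distance d and the hand height H may be sketched in Python under the assumption of a standard pinhole camera model, in which d = f·H/h, where f is the focal length expressed in pixels and h is the height of the hand in the captured image in pixels. The pinhole model, the function name, and the parameter names are assumptions made for this sketch and are not taken from the present description.

```python
def estimate_hand_distance(focal_length_px: float,
                           hand_height_m: float,
                           hand_height_px: float) -> float:
    """Estimate the camera-to-hand distance d in meters.

    Assumes a pinhole camera: an object of real height H at distance d
    appears with image height h = f * H / d, hence d = f * H / h.
    """
    if focal_length_px <= 0 or hand_height_m <= 0:
        raise ValueError("focal length and hand height must be positive")
    if hand_height_px <= 0:
        raise ValueError("hand_height_px must be positive")
    return focal_length_px * hand_height_m / hand_height_px


# Example with illustrative values: f = 1000 px, H = 0.2 m, h = 100 px
# gives a distance of roughly 2 m.
distance_m = estimate_hand_distance(1000.0, 0.2, 100.0)
```

Because H is only approximated from gender and age, any error in H carries over proportionally to the estimated distance.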
[0040] Referring to
[0041]
[0042]
[0043] Based on the images of the hand gestures for the first position (d.sub.1) and the second position (d.sub.2) depicted in
where d.sub.1−d.sub.2 is the length of the user forearm and has a relation with the hand height through gender and age estimation (MANIMALA ET AL., Anticipating Hand and Facial Features of Human Body using Golden Ratio, International Journal of Graphics & Image Processing, Vol. 4, No. 1, February 2014, pp. 15-20) (
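One possible way of recovering the focal length from the two hand positions is sketched below in Python. The sketch assumes a pinhole camera, for which a hand of height H at distance d has an image height h = f·H/d, and it assumes that the first position is the farther one, so that the difference between the two distances equals the forearm length L. Under these assumptions, L = f·H·(1/h.sub.1 − 1/h.sub.2), which gives f = L·h.sub.1·h.sub.2/(H·(h.sub.2 − h.sub.1)). The function and parameter names are hypothetical.

```python
def estimate_focal_length(hand_px_far: float,
                          hand_px_near: float,
                          hand_height_m: float,
                          forearm_length_m: float) -> float:
    """Estimate the focal length f in pixels from two hand images.

    Assumptions (not from the source): pinhole camera, and the hand moves
    toward the camera by exactly the forearm length between the two images.
    hand_px_far  -- hand height in pixels at the first (farther) position
    hand_px_near -- hand height in pixels at the second (nearer) position
    """
    if not 0 < hand_px_far < hand_px_near:
        raise ValueError("expected 0 < hand_px_far < hand_px_near")
    if hand_height_m <= 0 or forearm_length_m <= 0:
        raise ValueError("hand height and forearm length must be positive")
    return (forearm_length_m * hand_px_far * hand_px_near
            / (hand_height_m * (hand_px_near - hand_px_far)))
```

The estimate becomes unreliable when the two image heights are nearly equal, since the denominator then approaches zero; this is why the sketch rejects inputs for which the nearer image is not strictly larger.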
[0044] Referring to step 420 of
[0045]
[0046] In
[0047] As in
where t.sub.i is the phase shift to be applied to the audio signal, x.sub.i is the distance between the loudspeaker at position i and the hand of the user located in the scene, and x.sub.max=max(x.sub.i) is the longest distance between the loudspeakers and the hand of the user located in the scene.
where Depth is the distance from the camera to the plane of the hand in the scene, ?.sub.i is the angle between x.sub.i and Depth, and l.sub.i is the horizontal distance between the camera and the loudspeaker at position i.
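The quantities defined above may be combined as in the following Python sketch, which is given under stated assumptions and is not the definitive implementation. It assumes that each distance x.sub.i is obtained from Depth and the horizontal offset l.sub.i by right-triangle geometry, that the phase shift is applied as a time delay t.sub.i = (x.sub.max − x.sub.i)/c, and that c is the speed of sound in air, taken here as 343 m/s. The parameter hand_offset_m, which allows the hand to be laterally offset from the camera axis, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def speaker_distances(depth_m: float,
                      speaker_offsets_m: Sequence[float],
                      hand_offset_m: float = 0.0) -> List[float]:
    """Distance x_i from each loudspeaker to the hand.

    speaker_offsets_m holds the signed horizontal offsets l_i of the
    loudspeakers relative to the camera; hand_offset_m is the signed
    horizontal offset of the hand (0 when it faces the camera).
    """
    return [math.hypot(depth_m, hand_offset_m - l_i)
            for l_i in speaker_offsets_m]


def steering_delays(distances_m: Sequence[float],
                    speed_m_s: float = SPEED_OF_SOUND_M_S) -> List[float]:
    """Delay t_i = (x_max - x_i) / c in seconds for each loudspeaker.

    The farthest loudspeaker gets zero delay and nearer ones are delayed,
    so that all wavefronts reach the hand position at the same time.
    """
    x_max = max(distances_m)
    return [(x_max - x_i) / speed_m_s for x_i in distances_m]


def delays_in_samples(delays_s: Sequence[float],
                      sample_rate_hz: int) -> List[int]:
    """Round each delay to a whole number of audio samples."""
    return [round(t * sample_rate_hz) for t in delays_s]


# Example: three loudspeakers at -0.3 m, 0 m and +0.3 m, hand 2 m away.
x = speaker_distances(2.0, [-0.3, 0.0, 0.3])
t = steering_delays(x)
n = delays_in_samples(t, 48000)
```

In this example the center loudspeaker is the closest to the hand and therefore receives the largest delay, while the two outer loudspeakers receive none. Rounding to whole samples is a simplification; a finer steering resolution would call for fractional-delay filtering.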
[0048] In an example embodiment, the viewer gesture is used to direct phase shifting of the audio signal powering the plurality of loudspeakers away from a location for the viewer. For this embodiment, the viewer may not be interested in the displayed video content and may instead want to browse a mobile phone or tablet. The viewer initiates the phase shifting to guide the audio signal in the direction of the person(s) watching the displayed video content. The viewer gesture to initiate such audio phase shifting may be, for example, an arm movement swiping toward the left to direct audio toward people on the left of the viewer, or an arm movement swiping toward the right to direct audio toward people on the right of the viewer.
[0049] Although the present embodiments have been described hereinabove with reference to specific embodiments, the present disclosure is not limited to the specific embodiments, and modifications which lie within the scope of the claims will be apparent to a person skilled in the art.
[0050] Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the disclosure, that being determined solely by the appended claims. In particular, the different features from different embodiments may be interchanged, where appropriate.