Gesture operation method based on depth values and system thereof
10386934 · 2019-08-20
Assignee
Inventors
- JINN-FENG JIANG (KAOHSIUNG, TW)
- SHIH-CHUN HSU (KAOHSIUNG, TW)
- TSU-KUN CHANG (KAOHSIUNG, TW)
- TSUNG-HAN LEE (KAOHSIUNG, TW)
- HUNG-YUAN WEI (KAOHSIUNG, TW)
CPC classification
G06F3/017
PHYSICS
G06V20/597
PHYSICS
H04N13/254
ELECTRICITY
G06V40/28
PHYSICS
G06V20/59
PHYSICS
H04N13/239
ELECTRICITY
G06T7/521
PHYSICS
G06F18/295
PHYSICS
H04N2013/0081
ELECTRICITY
G06V10/763
PHYSICS
International classification
G06T7/521
PHYSICS
H04N13/254
ELECTRICITY
H04N13/239
ELECTRICITY
G06F3/03
PHYSICS
Abstract
A gesture operation method based on depth values and the system thereof are revealed. A stereoscopic-image camera module acquires a first stereoscopic image. Then an algorithm is performed to judge if the first stereoscopic image includes a triggering gesture. Then the stereoscopic-image camera module acquires a second stereoscopic image. Another algorithm is performed to judge if the second stereoscopic image includes a command gesture for performing the corresponding operation of the command gesture.
Claims
1. A gesture operation method based on depth values, applied in a mobile vehicle, wherein said mobile vehicle comprises a stereoscopic-image camera module which comprises a first camera unit, a second camera unit, a structured-light projecting module, and a structured-light camera unit; and wherein an included angle is formed between said first camera unit and said second camera unit, comprising steps of: acquiring a first image by said first camera unit; acquiring a second image by said second camera unit; acquiring a first stereoscopic image by performing a processing unit from said first image, said second image, and said included angle; recognizing if a plurality of first depth values of said first stereoscopic image includes a triggering gesture by performing a first algorithm; judging if a number of said plurality of first depth values is smaller than a first threshold value; projecting a plurality of light planes continuously from said structured-light projecting module as the number of said first depth values is smaller than said first threshold value; receiving a plurality of first light pattern messages from said structured-light camera unit; obtaining a third stereoscopic image by calculating from said plurality of first light pattern messages; fusing said first stereoscopic image and said third stereoscopic image to form a fused first stereoscopic image; acquiring a third image by said first camera unit; acquiring a fourth image by said second camera unit; acquiring a second stereoscopic image by performing said processing unit from said third image, said fourth image, and said included angle; recognizing if a plurality of second depth values of said second stereoscopic image includes a command gesture by performing a second algorithm; judging if a number of the plurality of second depth values is smaller than a second threshold value; projecting a plurality of light planes continuously from the structured-light projecting module as the number of the 
second depth values is smaller than the second threshold value; receiving a plurality of second light pattern messages from the structured-light camera unit; obtaining a fourth stereoscopic image by calculating from the plurality of second light pattern messages; fusing the second stereoscopic image and the fourth stereoscopic image to form a fused second stereoscopic image; and performing, by the processing unit, an operation corresponding to the command gesture.
2. The gesture operation method based on depth values of claim 1, further comprising steps of: identifying a location of said triggering gesture in said mobile vehicle using a clustering algorithm; said processing unit judging if performing said corresponding operation of said command gesture is permitted according to said location of said triggering gesture in said mobile vehicle; and said processing unit performing said corresponding operation of said command gesture if permitted.
3. The gesture operation method based on depth values of claim 1, wherein said mobile vehicle includes a stereoscopic-image camera module; said stereoscopic-image camera module includes a first camera unit and a second camera unit; and an included angle is formed between said first camera unit and said second camera unit, and the method further comprising steps of: said first camera unit acquiring a first image; said second camera unit acquiring a second image; and said processing unit giving said first stereoscopic image according to said first image, said second image, and said included angle.
4. The gesture operation method based on depth values of claim 1, wherein said mobile vehicle includes a stereoscopic-image camera module; said stereoscopic-image camera module includes a first camera unit and a second camera unit; and an included angle is formed between said first camera unit and said second camera unit, and the method further comprising steps of: said first camera unit acquiring a third image; said second camera unit acquiring a fourth image; and said processing unit giving said second stereoscopic image according to said third image, said fourth image, and said included angle.
5. The gesture operation method based on depth values of claim 1, wherein said mobile vehicle includes a stereoscopic-image camera module; said stereoscopic-image camera module includes a structured-light projecting module and a structured-light camera unit, and the method further comprising steps of: said structured-light projecting module projecting a plurality of light planes continuously; said structured-light camera unit receiving a plurality of light pattern messages; and calculating according to said plurality of light pattern messages to obtain said first stereoscopic image or said second stereoscopic image.
6. A gesture operation system based on depth values for a mobile vehicle, comprising: a stereoscopic-image camera module comprising a first camera unit, a second camera unit, a structured-light projecting module, and a structured-light camera unit, wherein an included angle is formed between said first camera unit and said second camera unit; and a processing unit, obtaining a first stereoscopic image according to a first image acquired by said first camera unit, a second image acquired by said second camera unit, and said included angle, obtaining a second stereoscopic image according to a third image acquired by said first camera unit, a fourth image acquired by said second camera unit, and said included angle, and recognizing if a plurality of depth values of said first stereoscopic image includes a triggering gesture by performing a first algorithm and recognizing if a plurality of depth values of said second stereoscopic image includes a command gesture by performing a second algorithm; wherein said processing unit judges if said number of said plurality of depth values of said first stereoscopic image is smaller than a threshold value; said structured-light projecting module projects a plurality of light planes continuously if said number of said plurality of depth values of said first stereoscopic image is smaller than said threshold value; said structured-light camera unit receives a plurality of light pattern messages; said processing unit calculates according to said plurality of light pattern messages to obtain a third stereoscopic image; and said processing unit fuses said first stereoscopic image and said third stereoscopic image to form a fused first stereoscopic image; further, said processing unit judges if said number of said plurality of depth values of said second stereoscopic image is smaller than a threshold value; said structured-light projecting module projects a plurality of light planes continuously if said number of said plurality of depth 
values of said second stereoscopic image is smaller than said threshold value; the structured-light camera unit receives a plurality of light pattern messages; said processing unit calculates according to said plurality of light pattern messages to obtain a fourth stereoscopic image; and the processing unit fuses the second stereoscopic image and the fourth stereoscopic image to form a fused second stereoscopic image; and wherein the processing unit performs an operation corresponding to said command gesture.
7. The gesture operation system based on depth values of claim 6, wherein said processing unit identifies a location of said triggering gesture in said mobile vehicle by performing a clustering algorithm and judges if performing said corresponding operation of said command gesture is permitted according to said location of said triggering gesture in said mobile vehicle; and if permitted, said processing unit performs said corresponding operation of said command gesture.
8. The gesture operation system based on depth values of claim 6, wherein said stereoscopic-image camera module includes a first camera unit and a second camera unit; an included angle is formed between said first camera unit and said second camera unit; and said processing unit obtains said first stereoscopic image according to a first image acquired by said first camera unit, a second image acquired by said second camera unit, and said included angle.
9. The gesture operation system based on depth values of claim 6, wherein said stereoscopic-image camera module includes a first camera unit and a second camera unit; an included angle is formed between said first camera unit and said second camera unit; and said processing unit obtains said second stereoscopic image according to a third image acquired by said first camera unit, a fourth image acquired by said second camera unit, and said included angle.
10. The gesture operation system based on depth values of claim 6, wherein said stereoscopic-image camera module includes a structured-light projecting module and a structured-light camera unit; said structured-light projecting module projects a plurality of light planes continuously; said structured-light camera unit receives a plurality of light pattern messages; and said processing unit calculates according to said plurality of light pattern messages to obtain said first stereoscopic image or said second stereoscopic image.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(12) In order that the structure, characteristics, and effectiveness of the present invention may be further understood and recognized, a detailed description of the present invention is provided below along with embodiments and accompanying figures.
(13) According to the prior art, when a driver operates equipment installed in a mobile vehicle, driving safety is affected because the driver must frequently and temporarily divert his or her line of sight, or because there are too many devices to operate. Accordingly, the present invention provides a gesture operation method.
(14) In the following, the flow of the gesture operation method based on depth values according to the first embodiment of the present invention will be described. Please refer to
(15) In the following, the system required to implement the gesture operation method based on depth values according to the present invention will be described. Please refer to
(16) The above gesture operation system 1 is disposed in a mobile vehicle so that the stereoscopic-image camera module 10 photographs the interior of the mobile vehicle. In other words, the stereoscopic-image camera module 10 is disposed at a location capable of photographing the driver space and/or the passenger space. The mobile vehicle can include a car, a truck, a bus, an electric car, or others.
(17) As shown in
(18) As shown in
(19) The above processing unit 30 is an electronic component capable of performing arithmetic, logic, and image operations.
(20) In the following, the flow of the gesture operation method based on depth values according to the first embodiment of the present invention will be described. Please refer to
(21) In the step S1 of acquiring a first stereoscopic image, the first camera unit 101 acquires a first image and the second camera unit 103 acquires a second image. The processing unit 30 performs an operation to obtain a first stereoscopic image according to the first image, the second image, and the included angle 105.
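The patent does not spell out how the processing unit derives depth from the two images and the included angle 105. As a hedged illustration (in Python, since the document contains no code), the sketch below uses the equivalent rectified-stereo triangulation in its baseline/focal-length form; all numbers are hypothetical:

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float):
    """Classic rectified-stereo relation Z = f * B / d.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- separation between the two camera units in metres
    disparity_px -- horizontal shift of the same scene point between
                    the first image and the second image
    Returns None when no depth is recoverable (zero or negative disparity).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

# Hypothetical numbers: 700 px focal length, 0.1 m baseline, 14 px disparity
z = depth_from_disparity(700.0, 0.1, 14.0)  # about 5.0 metres
```

A per-pixel sweep of this relation over matched points yields the depth values (Z values) that the first algorithm later examines.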
(22) In the step S3 of recognizing if a triggering gesture is included, the processing unit 30 performs a first algorithm to recognize if a triggering gesture is included in a plurality of depth values of the first stereoscopic image. If so, the step S5 is performed; otherwise, return to the step S1. The first algorithm is a gesture recognition algorithm. Namely, the processing unit 30 uses the gesture recognition algorithm to recognize if a triggering gesture is included in the first stereoscopic image according to the depth values (the Z values) of a plurality of pixels in the first stereoscopic image.
(23) In the step S5 of acquiring a second stereoscopic image, the first camera unit 101 acquires a third image and the second camera unit 103 acquires a fourth image. The processing unit 30 performs an operation to obtain a second stereoscopic image according to the third image, the fourth image, and the included angle 105.
(24) In the step S7 of recognizing if a command gesture is included, the processing unit 30 performs a second algorithm to recognize if a command gesture is included in a plurality of depth values of the second stereoscopic image. If so, the step S9 is performed; otherwise, return to the step S5. The second algorithm is a gesture recognition algorithm. Namely, the processing unit 30 uses the gesture recognition algorithm to recognize if a command gesture is included in the second stereoscopic image according to the depth values (the Z values) of a plurality of pixels in the second stereoscopic image.
(25) In the step S9 of performing the corresponding operation of the command gesture, the processing unit 30 performs the corresponding operation of the command gesture after recognizing the command gesture. For example, if the command gesture is to turn on the GPS, the processing unit 30 turns on the GPS; if the command gesture is to turn on the audio equipment, the processing unit 30 turns on the audio equipment.
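The mapping from a recognized command gesture to its corresponding operation can be sketched as a simple dispatch table (Python; the gesture labels below are invented for illustration, while the two operations are the patent's own examples):

```python
# Hypothetical gesture-to-operation table; the gesture labels are
# assumptions, the operations follow the examples in the text.
OPERATIONS = {
    "open_palm": "turn_on_gps",
    "two_fingers": "turn_on_audio",
}

def perform_command(gesture_label: str) -> str:
    """Return the operation for a recognized command gesture,
    or 'ignored' when the gesture matches no registered command."""
    return OPERATIONS.get(gesture_label, "ignored")
```

A real processing unit would invoke the device driver at this point; the string return value merely stands in for that side effect.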
(26) Then, the gesture operation method based on depth values according to the first embodiment of the present invention is completed. Compared to gesture recognition methods using plane images, the present invention, by performing gesture recognition using stereoscopic images, can acquire the outlines of hands more precisely and thus achieve more accurate recognition performance.
(27) In addition, while recognizing gestures using plane images, the user's hands are limited to planar movements, namely, up, down, left, and right movements. By contrast, while performing gesture recognition using the depth values of stereoscopic images, the three-dimensional movements of the user's hands, namely, forward, backward, up, down, left, and right movements, can be further recognized. Thereby, the user can record more gestures in the gesture operation system 1 for operating one or more devices disposed in the mobile vehicle.
(28) Furthermore, according to the present invention, the triggering gesture is recognized before the command gesture. The user must perform the triggering gesture first so that the gesture operation system 1 can recognize it. Thereby, if the user waves his or her hands unintentionally, false operations by the gesture operation system 1 are prevented.
(29) In the following, the gesture operation method based on depth values according to the second embodiment of the present invention will be described. Please refer to
(30) The structured-light projecting module 107 includes a laser unit 1071 and a lens set 1073. As shown in
(31) According to the present embodiment, in the steps S1 and S5, the structured-light projecting module 107 projects a plurality of light planes 1075 continuously to the surface of the one or more objects 2 in the mobile vehicle. Then the structured-light camera unit 109 acquires a plurality of light pattern messages reflected from the surface of the objects 2 after projecting the light planes 1075. The processing unit 30 calculates according to the received plurality of light pattern messages to obtain the first stereoscopic image and the second stereoscopic image.
(32) According to another embodiment of the present invention, the projection distance of the laser can be further increased or reduced by adjusting the power of the laser unit 1071. For example, the power of the laser unit 1071 can be increased in an open space and reduced in a crowded space.
(33) According to the gesture operation method based on depth values according to the second embodiment of the present invention, because the structured-light projecting module 107 can actively project structured light onto the surrounding objects 2, the method can be applied to a dark or dim environment.
(34) In the following, the gesture operation method based on depth values according to the third embodiment of the present invention will be described. Please refer to
(35) In the step S4 of identifying the location of the triggering gesture in the mobile vehicle, the processing unit 30 performs a clustering algorithm to identify the location of the triggering gesture in the mobile vehicle. In other words, the processing unit 30 performs the clustering algorithm to identify the location of the user performing the triggering gesture in the mobile vehicle. For example, the user performing the triggering gesture may be located at the driver's seat, the shotgun seat, or the back seat.
(36) According to an embodiment of the present invention, after identifying the location of the user performing the triggering gesture in the mobile vehicle, the user's identification can be further acquired. For example, the user located at the driver's seat is the driver; the user located at the shotgun seat is the co-driver or the passenger; or the one located at the back seat is the passenger.
(37) According to an embodiment of the present invention, the clustering algorithm is the K-means clustering algorithm or other algorithms capable of clustering objects.
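As a sketch of the clustering step (Python, pure stdlib; the seat coordinates and hand-pixel positions are invented for illustration), a tiny K-means over the horizontal positions of detected hand pixels can separate gestures by seat region:

```python
def kmeans_1d(xs, centroids, iters=10):
    """Minimal 1-D K-means: cluster horizontal pixel positions of
    hand points, as a stand-in for the clustering algorithm that
    locates the triggering gesture in the vehicle."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in xs:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Hypothetical image x-coordinates of the seat centres
SEATS = {"driver": 100.0, "shotgun": 500.0}

def seat_of(x):
    """Assign a cluster centre to the nearest predefined seat region."""
    return min(SEATS, key=lambda s: abs(x - SEATS[s]))

# Hand pixels near x=110 cluster to the driver's side of the image.
centers = sorted(kmeans_1d([105, 110, 115, 480, 495, 510], [0.0, 600.0]))
```

A full implementation would cluster in three dimensions using the depth values as well, but the seat-assignment idea is the same.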
(38) In the step S8 of judging if performing the command gesture is permitted, the processing unit 30 judges if performing the corresponding operation of the command gesture is permitted according to the location of the triggering gesture in the mobile vehicle. In other words, the processing unit 30 will judge whether the corresponding operation of the command gesture should be performed according to the location of the user performing the triggering gesture.
(39) Then, the gesture operation method based on depth values according to the third embodiment of the present invention is completed. By using the present embodiment, different levels of permission can be granted according to different locations. For example, the driver can own the permission of operating the windshield wiper, the car lamp, and the GPS; the co-driver can have the permission to operate the GPS and the multimedia system; and the passenger is permitted to operate the multimedia system. Thereby, the risk of users' unintentional activation of devices, such as the windshield wiper, that might interfere with the driver can be lowered. Hence, the safety of the present invention is enhanced.
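The location-based permission check of step S8 can be encoded as a lookup table. This Python sketch uses the example grants from the text; the location and device labels are shortened names chosen for illustration:

```python
# Permission table following the example in the text: the driver may
# operate the wiper, lamp, and GPS; the co-driver the GPS and the
# multimedia system; back-seat passengers only the multimedia system.
PERMISSIONS = {
    "driver":  {"wiper", "lamp", "gps"},
    "shotgun": {"gps", "multimedia"},
    "back":    {"multimedia"},
}

def is_permitted(location: str, operation: str) -> bool:
    """Step S8: allow the command only if the triggering gesture's
    location grants permission for the requested operation."""
    return operation in PERMISSIONS.get(location, set())
```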
(40) Next, the gesture operation method based on depth values according to the fourth embodiment of the present invention will be described. Please refer to
(41) In the step S1 of acquiring a first stereoscopic image, the first camera unit 101 acquires a first image and the second camera unit 103 acquires a second image. The processing unit 30 performs an operation to obtain a first stereoscopic image according to the first image, the second image, and the included angle 105.
(42) In the step S101 of judging if the number of depth values of the first stereoscopic image is smaller than a threshold value, the processing unit 30 performs the judgment. If so, the step S103 is executed; otherwise, go to the step S3. In other words, the processing unit 30 scans the pixels of the first stereoscopic image and counts the number of pixels that include depth values, giving the number of depth values. Then the processing unit 30 judges if the number of depth values is smaller than a threshold value.
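The depth-count check of step S101 can be sketched directly (Python; representing a missing depth as `None` is an assumption about the pixel format, not stated in the patent):

```python
def too_few_depth_values(depth_map, ratio=0.5):
    """Scan every pixel, count those carrying a depth value, and
    report whether that count falls below ratio * total pixels
    (the text later fixes the threshold at 50% of the pixel count)."""
    pixels = [z for row in depth_map for z in row]
    valid = sum(1 for z in pixels if z is not None)
    return valid < ratio * len(pixels)

# 2x2 image with one valid depth value: 1 < 0.5 * 4, so the
# structured-light path is taken to supplement the stereo depth map.
needs_structured_light = too_few_depth_values([[1.2, None], [None, None]])
```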
(43) In the step S103 of acquiring a third stereoscopic image, the structured-light projecting module 107 projects a plurality of light planes 1075 continuously to the surface of the one or more objects 2 in the mobile vehicle. Then the structured-light camera unit 109 acquires a plurality of light pattern messages reflected from the surface of the objects 2 after projecting the light planes 1075. The processing unit 30 calculates according to the received plurality of light pattern messages to obtain the third stereoscopic image.
(44) In the step S105 of fusing the first stereoscopic image and the third stereoscopic image, the processing unit 30 performs the fusion according to an image fusion algorithm to obtain a fused first stereoscopic image.
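The patent names only "an image fusion algorithm" without specifying it. One plausible pixel-wise instance (an assumption, not the patent's disclosed method) keeps the stereo depth where it exists and falls back to the structured-light depth for the missing pixels:

```python
def fuse_depth_maps(stereo, structured):
    """Pixel-wise fill-in fusion: prefer the stereo depth value and
    take the structured-light value only where stereo produced none.
    Both inputs are equal-sized 2-D lists; None marks a missing depth."""
    return [[p if p is not None else s
             for p, s in zip(prow, srow)]
            for prow, srow in zip(stereo, structured)]

fused = fuse_depth_maps([[1.2, None]], [[1.1, 0.9]])  # [[1.2, 0.9]]
```

Whatever rule is used, the point of step S105 is the same: the fused map recovers depth values in pixels where one sensor failed, so the first algorithm has enough data to work with.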
(45) Afterwards, in the step S3, the processing unit 30 judges if the fused first stereoscopic image includes the triggering gesture according to a first algorithm.
(46) According to the present embodiment, while executing the step S1, the first stereoscopic image can be also acquired by using the structured-light projecting module 107 and the structured-light camera unit 109. In this case, in the step S103, the processing unit 30 performs an operation to obtain the third stereoscopic image according to the first image acquired by the first camera unit 101, the second image acquired by the second camera unit 103, and the included angle 105.
(47) Because the first and second images taken by the first and second camera units 101, 103 in a dim place may appear partially or fully dark, the calculated first stereoscopic image may include no depth value in some pixels. In addition, the light pattern messages in the first stereoscopic image acquired by the structured-light projecting module 107 and the structured-light camera unit 109 in a bright environment will be incomplete owing to interference of light. In this condition, the obtained first stereoscopic image may include no depth value in some pixels. If the number of pixels containing no depth value in the first stereoscopic image is excessive, it is difficult for the first algorithm to recognize if the first stereoscopic image includes the triggering gesture. According to the present embodiment, once the number of depth values in the first stereoscopic image is judged to be smaller than a threshold value, the third stereoscopic image is further acquired and fused with the first stereoscopic image. The produced fused first stereoscopic image will contain a sufficient number of depth values for facilitating the first algorithm to recognize if the fused first stereoscopic image includes the triggering gesture.
(48) According to an embodiment of the present invention, the threshold value is 50% of the total number of pixels of the first stereoscopic image. In other words, when the first stereoscopic image contains 1000 pixels, the threshold value is 500.
(49) Next, the gesture operation method based on depth values according to the fifth embodiment of the present invention will be described. Please refer to
(50) In the step S5 of acquiring a second stereoscopic image, the first camera unit 101 acquires a third image and the second camera unit 103 acquires a fourth image. The processing unit 30 performs an operation to obtain a second stereoscopic image according to the third image, the fourth image, and the included angle 105.
(51) In the step S501 of judging if the number of depth values of the second stereoscopic image is smaller than a threshold value, the processing unit 30 performs the judgment. If so, the step S503 is executed; otherwise, go to the step S7. In other words, the processing unit 30 scans the pixels of the second stereoscopic image and counts the number of pixels that include depth values, giving the number of depth values. Then the processing unit 30 judges if the number of depth values is smaller than a threshold value.
(52) In the step S503 of acquiring a fourth stereoscopic image, the structured-light projecting module 107 projects a plurality of light planes 1075 continuously to the surface of the one or more objects 2 in the mobile vehicle. Then the structured-light camera unit 109 acquires a plurality of light pattern messages reflected from the surface of the objects 2 after projecting the light planes 1075. The processing unit 30 calculates according to the received plurality of light pattern messages to obtain the fourth stereoscopic image.
(53) In the step S505 of fusing the second stereoscopic image and the fourth stereoscopic image, the processing unit 30 performs the fusion according to an image fusion algorithm to obtain a fused second stereoscopic image.
(54) Afterwards, in the step S7, the processing unit 30 judges if the fused second stereoscopic image includes the command gesture according to a second algorithm.
(55) According to the present embodiment, while executing the step S5, the second stereoscopic image can be also acquired by using the structured-light projecting module 107 and the structured-light camera unit 109. In this case, in the step S503, the processing unit 30 performs an operation to obtain the fourth stereoscopic image according to the third image acquired by the first camera unit 101, the fourth image acquired by the second camera unit 103, and the included angle 105.
(56) According to an embodiment of the present invention, the threshold value is 50% of the total number of pixels of the second stereoscopic image. In other words, when the second stereoscopic image contains 1000 pixels, the threshold value is 500.
(57) According to an embodiment of the present invention, the first stereoscopic image can be continuous images, representing the continuous gestures of the triggering gesture. In the step S3, the processing unit 30 performs the first algorithm and the hidden Markov model (HMM) to recognize if the first stereoscopic image includes the triggering gesture.
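As a sketch of how an HMM scores continuous gesture images (Python, pure stdlib; the two-state model and every probability below are invented for illustration), the forward algorithm computes the likelihood of an observation sequence under one gesture model, and a recognizer picks the gesture whose model scores highest:

```python
def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm for a discrete HMM.

    obs   -- observation symbol indices over time (e.g. quantized hand
             positions extracted from the continuous stereoscopic images)
    start -- initial state probabilities
    trans -- trans[i][j]: probability of moving from state i to state j
    emit  -- emit[s][o]: probability that state s emits symbol o
    Returns P(obs | model); recognition compares this value across the
    HMMs trained for each registered gesture.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

# Hypothetical 2-state model scoring the observation sequence [0, 1]
lik = sequence_likelihood([0, 1], [0.6, 0.4],
                          [[0.7, 0.3], [0.4, 0.6]],
                          [[0.9, 0.1], [0.2, 0.8]])
```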
(58) Likewise, the second stereoscopic image can be continuous images, representing the continuous gestures of the command gesture. In the step S7, the processing unit 30 performs the second algorithm and the hidden Markov model (HMM) to recognize if the second stereoscopic image includes the command gesture.
(59) According to an embodiment of the present invention, the plurality of light pattern messages are formed by reflecting the plurality of light planes from one or more objects.
(60) Accordingly, the present invention conforms to the legal requirements owing to its novelty, nonobviousness, and utility. However, the foregoing description presents only embodiments of the present invention and is not intended to limit its scope and range. Those equivalent changes or modifications made according to the shape, structure, feature, or spirit described in the claims of the present invention are included in the appended claims of the present invention.