Eye-tracking communication methods and systems

11612342 · 2023-03-28

Assignee

Inventors

CPC classification

International classification

Abstract

Provided is a control system that interfaces with an individual by tracking the eyes and/or other physiological signals generated by that individual. The system is configured to classify the captured eye images into gestures that emulate joystick-like control of a computer. These gestures permit the user to operate, for instance, a computer or a system with menu items.

Claims

1. An eye tracking-based system, comprising: a camera operable for continuously capturing images of one or both of the user's eye and eyelid and generating image data representative thereof; a first output module, a computerized or process-driven module; and a control unit in data communication with the camera and with the first output module; wherein the control unit is configured for receiving and processing said image data to identify at least one of pupil position and eyelid movement, and to classify the eye image into gestures based on pupil position and pupil presence duration within an area of a threshold map, said processing further comprising determining whenever the pupil area touches a border or is tangent to a border of the threshold map, to thereby define the gesture type, selected from one or more of pupil position, sequence of pupil positions, and sequences of eyelid blinks, and generating gesture data, operating a hierarchical menu of user-selectable items to permit the user to navigate through and select menu items by said gesture data, and driving the first output module to present the menu items to the user; and wherein the first output module is configured for providing the user with an audio presentation of a time-based prompt menu for selection of items with a predefined gesture, wherein a first menu item is announced at time t₁, a second menu item is announced at time t₂, and a third menu item is announced at time t₃, the first menu item being selected if said predefined gesture is made at a time between t₁ and t₂ and the second menu item being selected if said predefined gesture is made at a time between t₂ and t₃.

2. The system of claim 1, wherein said camera is carried on a holder attachable to the user's head.

3. The system of claim 1, comprising a driver for a second output module for outputting data representative of selected items.

4. The system of claim 3, wherein the second output module is configured for outputting an alert.

5. The system of claim 4, wherein at least one gesture triggers said alert.

6. The system of claim 3, wherein the second output module is configured for outputting an alert to a care-giver.

7. An eye tracking-based system, comprising: a camera operable for continuously capturing images of one or both of the user's eye and eyelid and generating image data representative thereof; a first output module, a computerized or process-driven module; and a control unit in data communication with the camera and with the first output module; wherein the control unit is configured for receiving and processing said image data to identify at least one of pupil position and eyelid movement, and to classify these into gestures comprising one or more of pupil position, sequence of pupil positions, and sequences of eyelid blinks, and generating gesture data, operating a hierarchical menu of user-selectable items to permit the user to navigate through and select menu items by said gesture data, and driving the first output module to present the menu items to the user; and wherein the first output module is configured for providing the user with an audio presentation of a time-based prompt menu for selection of items with a predefined gesture, wherein a first menu item is announced at time t₁, a second menu item is announced at time t₂, and a third menu item is announced at time t₃, the first menu item being selected if said predefined gesture is made at a time between t₁ and t₂ and the second menu item being selected if said predefined gesture is made at a time between t₂ and t₃.

8. An eye tracking-based system, comprising: a camera operable for continuously capturing images of one or both of the user's eye and eyelid and generating image data representative thereof; a first output module, a computerized or process-driven module; and a control unit in data communication with the camera and with the first output module; wherein the control unit is configured for receiving and processing said image data to identify at least one of pupil position and eyelid movement, and to classify the eye image into gestures based on pupil position and pupil presence duration within an area of a threshold map, said processing further comprising determining whenever the pupil area touches a border or is tangent to a border of the threshold map, to thereby define the gesture type, selected from one or more of pupil position, sequence of pupil positions, and sequences of eyelid blinks, and generating gesture data, operating a hierarchical menu of user-selectable items to permit the user to navigate through and select menu items by said gesture data, and driving the first output module to present the menu items to the user; and wherein the first output module is configured for providing the user with an audio presentation of a time-based prompt menu for selection of items with a predefined gesture, wherein a first menu item is announced at time t₁, a second menu item is announced at time t₂, and a third menu item is announced at time t₃, wherein if said predefined gesture is made at a time t, the first menu item is selected if t is between t₁ and t₂ and the second menu item is selected if t is between t₂ and t₃.

9. An eye tracking-based system, comprising: a camera operable for continuously capturing images of one or both of the user's eye and eyelid and generating image data representative thereof; a first output module, a computerized or process-driven module; and a control unit in data communication with the camera and with the first output module; wherein the control unit is configured for receiving and processing said image data to identify at least one of pupil position and eyelid movement, and to classify the eye image into gestures based on pupil position and pupil presence duration within an area of a threshold map, said processing further comprising determining whenever the pupil area touches a border or is tangent to a border of the threshold map, to thereby define the gesture type, selected from one or more of pupil position, sequence of pupil positions, and sequences of eyelid blinks, and generating gesture data, operating a hierarchical menu of user-selectable items to permit the user to navigate through and select menu items by said gesture data, and driving the first output module to present the menu items to the user; and wherein the first output module is configured for providing the user with an audio presentation of a time-based prompt menu for selection of items with a predefined gesture, wherein a first menu item is announced at time t₁, a second menu item is announced at time t₂, and a third menu item is announced at time t₃, wherein t₂ is a time after t₁ and t₃ is a time after t₂, the first menu item being selected if said predefined gesture is made in a time frame between t₁ and t₂ and the second menu item being selected if said predefined gesture is made in a time frame between t₂ and t₃.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) In order to better understand the subject matter that is disclosed herein and to exemplify how it may be carried out in practice, embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

(2) FIGS. 1A-1B are schematic block diagrams of a system in accordance with embodiments of this disclosure.

(3) FIG. 2 is a schematic block diagram of a system in accordance with another embodiment of this disclosure.

(4) FIGS. 3A-3B are schematic block diagrams of a control unit in accordance with aspects of this disclosure.

(5) FIG. 4 is a schematic visual depiction of a menu layer in accordance with an embodiment of this disclosure.

(6) FIGS. 5-6 are schematic visual depictions of a menu layer in accordance with another embodiment of this disclosure.

(7) FIG. 7 is a schematic illustration of a time-based prompt menu for selection of items according to an embodiment of this disclosure.

(8) FIG. 8 is a schematic illustration of one embodiment of an eye tracking-based system, comprising a camera, a bone conduction speaker and a control unit.

(9) FIG. 9 is a schematic illustration of one embodiment of the joystick-like gesture classification, in which the position of the pupil area is determined based on a threshold map (the innermost square), a position map (the middle square) and the ROI map (the outermost square).

(10) FIG. 10 is an illustration of one embodiment of mapping between an eye gesture and commands in a single gesture operation mode.

(11) FIG. 11 is an illustration of one embodiment of mapping between eye gestures and commands in a two gestures operation mode.

(12) FIG. 12 is an illustration of one embodiment of mapping between eye gestures and commands in a three gestures operation mode.

(13) FIG. 13 is an illustration of one embodiment of mapping between eye gestures and commands in a four gestures operation mode.

(14) FIG. 14 is an illustration of one embodiment of mapping between eye gestures and commands in a five gestures operation mode.

DETAILED DESCRIPTION OF EMBODIMENTS

(15) Reference is first made to FIGS. 1A-1B, illustrating a schematic block diagram of a system in accordance with an embodiment of this disclosure. The eye tracking-based system 100 comprises a camera 102 mounted on a frame or carried on a holder attached to the user's head. The camera 102 is operable for continuously capturing images of one or both of the user's eye and eyelid and generating image data representative thereof. The system 100 includes a control unit 104 that is in data communication with the camera 102 and with a first output module 106, typically through an actuator module 108 that drives the first output module 106. Output module 106 may be a visual display, e.g. a digital screen, or an audible device, e.g. a speaker, headphones, etc.

(16) Control unit 104 also includes a processor 110 that is configured for receiving and processing image data from the camera 102, for identifying at least one of pupil position and eyelid movement, for classifying these into gestures comprising one or more of pupil position, sequence of pupil positions, and sequences of eyelid blinks, and for generating gesture data. The processor 110 is also configured for driving the menu generator 112 which, through the action of the actuator module 108, drives the presentation of the menu to the user. This permits the user to navigate through and select menu items by said gesture data.
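
By way of non-limiting illustration only, the following Python sketch shows one possible control-unit loop of the kind described above: camera frames are classified into gestures, and the gestures drive navigation and selection in an audio-presented menu. The helper names (grab_frame, classify_gesture, speak) and the menu contents are assumptions made for the example and are not prescribed by this disclosure.

    MENU = ["Play music", "Caregiver", "Watch TV", "Hear a book", "Free text"]

    def control_loop(grab_frame, classify_gesture, speak):
        """Navigate the menu with 'up'/'down' gestures; confirm with a blink."""
        index = 0
        speak(MENU[index])                             # present the first item to the user
        while True:
            gesture = classify_gesture(grab_frame())   # e.g. 'up', 'down', 'blink' or None
            if gesture == "down":
                index = (index + 1) % len(MENU)        # move to the next menu item
                speak(MENU[index])
            elif gesture == "up":
                index = (index - 1) % len(MENU)        # move to the previous menu item
                speak(MENU[index])
            elif gesture == "blink":
                return MENU[index]                     # blink selects the current item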

(17) FIG. 1B shows a block diagram of the system of the present disclosure, wherein the system communicates with a sensor 115 for measuring a physiological parameter (for example an EEG sensor, an electromyography (EMG) sensor, or a head-motion measurement device). Specifically, the device 115 is in data communication with the control unit 104, which is configured to communicate commands based on the user's detected physiological signal. The physiological signals can be analyzed and translated into commands in the system 100, such as starting the system, initiating a navigation process, selecting a menu item, etc.
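
By way of non-limiting illustration, a physiological signal such as an EMG burst could be translated into a system command with a simple amplitude threshold, as in the sketch below; the threshold value and the choice of command are assumptions made for the example, and the actual signal processing is not specified here.

    def signal_to_command(samples, threshold=0.5):
        """Return 'start' when the mean rectified amplitude of a window of
        samples exceeds a threshold, otherwise None (illustrative only)."""
        mean_amplitude = sum(abs(s) for s in samples) / len(samples)
        return "start" if mean_amplitude > threshold else None

    signal_to_command([0.1, 0.9, 0.8, 0.7])   # -> 'start' (a burst of activity)
    signal_to_command([0.05, 0.02, 0.03])     # -> None (resting signal)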

(18) In FIGS. 2-3, elements like those of FIGS. 1A-1B are given like reference numerals shifted by 100. For example, element 204 in FIG. 2 serves the same function as element 104 in FIGS. 1A-1B. The reader is thus referred to the description of FIGS. 1A-1B to understand their meaning and function.

(19) The system of FIG. 2 differs from that of FIGS. 1A-1B in that the former also includes a second actuator module 214 that is operable to drive a second output unit 216, which may be part of the system or an external element such as an alerting device, a display screen, or a utility for operating devices in the user's vicinity (drapes, a music player, lights, etc.). In other words, the second output unit 216 establishes the connection of the system to its environment, by wired or wireless (for example infrared) connections to other devices, by connection to a cloud server, or by connection to means of communication with the user's surroundings. For example, the system can be wirelessly connected, e.g. by Wi-Fi or Bluetooth, to smart-home devices that are operable by gestures of a user using the system 200. The system may be configured to drive the second output unit by means of specific defined gestures or through a selectable menu item. Such a specific gesture may be predefined or user-selectable.
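
As a hedged sketch of driving the second output unit from specific gestures, the mapping below associates assumed gesture sequences with environment actions; the device handles are placeholders (print statements) standing in for actual Wi-Fi/Bluetooth or infrared device commands, which this disclosure does not detail.

    ENVIRONMENT_ACTIONS = {
        ("up", "up"):     lambda: print("lights: toggle"),           # placeholder for a smart-home call
        ("down", "down"): lambda: print("music player: play/pause"),
    }

    def dispatch_environment(gesture_sequence):
        """Run the environment action mapped to the given gesture sequence, if any."""
        action = ENVIRONMENT_ACTIONS.get(tuple(gesture_sequence))
        if action is not None:
            action()

    dispatch_environment(["up", "up"])   # -> lights: toggle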

(20) Reference is now made to FIGS. 3A-3B, illustrating schematic block diagrams of a control unit in accordance with two aspects of this disclosure. The control unit 304 includes a data input utility 303 that is in data communication with a camera 302 that captures continuous images of the eye (FIG. 3A) or with a sensor 301 for measuring a physiological parameter (FIG. 3B). The data received by the data input utility 303 is processed by the processor 310, and the processed data is classified by the classifier 305 into gestures. The classified gestures are then sent, through a data communication module 309, to a computer 307 that is in data communication with the control unit, to thereby control operation of the computer 307.

(21) Reference is now made to FIG. 4, which is a schematic visual depiction of a menu layer in accordance with an embodiment of this disclosure. As can be appreciated, the menu layer has several menu items, each of which is selected by a different gesture. For example, an up gesture UG, namely an up position of the pupil, will drive a selection of playing music. Accordingly, a left gesture LG will drive the communicate-with-caregiver menu, a center gesture CG watching TV, a right gesture RG hearing a book, and a down gesture DG will open the free-texting menu. Some of the menu items are enabled by wireless connectivity, such as between the TV and the system by Bluetooth or a Wi-Fi network, while other menu items are enabled by connectivity to a cloud server, such as in the case of hearing a book or playing music. Playing music or hearing a book can be done directly from the cloud server without having the data in a local memory. It should be noted that the data exchange with the cloud may work both ways, namely data can be downloaded from the cloud to the system and data can be uploaded from the system to the cloud.
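
The gesture-to-item mapping of FIG. 4 can be represented as a simple lookup table, as in the following illustrative sketch; the handler is a placeholder print and the item names merely mirror the description above.

    TOP_MENU = {
        "up":     "Play music",                  # UG
        "left":   "Communicate with caregiver",  # LG
        "center": "Watch TV",                    # CG
        "right":  "Hear a book",                 # RG
        "down":   "Free texting",                # DG
    }

    def open_menu_item(gesture):
        item = TOP_MENU.get(gesture)
        print(f"Opening: {item}" if item else "Unrecognized gesture")

    open_menu_item("right")   # -> Opening: Hear a book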

(22) At any time and in any layer of the menu, when the user makes a predefined gesture sequence PGS, it will trigger a predefined action such as outputting an emergency alert to a caregiver, e.g. by a voice alert through a speaker, a textual alert to a mobile device, alerting a medical center, or any combination thereof. The predefined gesture sequence PGS may be configured according to the user's will; for example, it can be a sequence of 3 or 4 blinks, a sequence of up gesture UG, down gesture DG, up gesture UG and down gesture DG, or any other desired sequence.
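
One way to detect such a predefined gesture sequence is to keep the most recent gestures in a fixed-length window and fire the alert when they match the configured sequence, as in the minimal sketch below (the up-down-up-down sequence and the alert callback follow the example above; everything else is an assumption).

    from collections import deque

    PGS = ["up", "down", "up", "down"]            # user-configurable alert sequence

    def make_pgs_detector(sequence, on_alert):
        recent = deque(maxlen=len(sequence))      # sliding window of recent gestures
        def feed(gesture):
            recent.append(gesture)
            if list(recent) == sequence:
                on_alert()                        # e.g. voice alert, text to caregiver
                recent.clear()
        return feed

    feed = make_pgs_detector(PGS, lambda: print("ALERT: caregiver notified"))
    for g in ["center", "up", "down", "up", "down"]:
        feed(g)                                   # the alert fires on the last gesture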

(23) FIGS. 5-7 are schematic visual depictions of a menu layer in accordance with another embodiment of this disclosure, exemplifying a method of menu-item selection by way of free texting. The letters are clustered in groups, e.g. 4, 5 or 6 letters in each group. The user can navigate between the groups and select a specific letter in a group by making a specific gesture at the right timing. In FIG. 5 the system is presenting the group of letters A, B, C, D, E. The user can make an up gesture UG to navigate to the group V, W, X, Y, Z or a down gesture DG to navigate to the group F, G, H, I, J. Other gestures can be made to trigger other commands, such as deleting a letter, using a backspace key or going back to the previous menu, and these are brought up only by way of example. It should be noted that these commands can be replaced by any other suitable commands or can be removed. FIG. 6 exemplifies a user selection of a down gesture DG in the menu layer of FIG. 5, which triggered a menu item including the group of letters F, G, H, I, J. The system may trigger an automatic output session of the letters in the group, such as announcing the name of each letter, through speakers or headphones, at a time interval from the other letters, as exemplified in FIG. 7. As can be appreciated, the letter F is announced at time t₁, the letter G is announced at time t₂, etc. When a specific predefined gesture PG is made, e.g. blinking once or twice, a letter is selected. For example, if the predefined gesture PG is made at a time t₁<t<t₂ the letter F will be selected, and if the predefined gesture PG is made at a time t₃<t<t₄ the letter H will be selected. In another embodiment of the system, the output session of the letters in the group is triggered upon the user's request by a predefined gesture PG. That may be of relevance where the subject using the system lacks the capability of performing some of the gestures, such as the left, right, up or down gestures. In this scenario, navigation in the system may be initiated by a first predefined gesture PG₁ and menu items can be selected by a second predefined gesture PG₂; the first and second predefined gestures may be different or identical. For example, when the system is in the state of FIG. 6, the user may close his eyes to trigger the output session of the letters in the group, and when the desired letter is heard, the user opens his eyes to select the letter. It should be understood that by making the up or down gestures UG, DG the system will navigate to other groups of letters, as can be seen in FIG. 6.
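
The timed selection of FIGS. 5-7 can be sketched as follows: each letter of the active group is announced at a successive time t₁, t₂, ..., and the letter whose time window contains the timestamp of the selection gesture is chosen. The announcement interval below is an assumed value; the disclosure does not fix it.

    GROUP = ["F", "G", "H", "I", "J"]
    INTERVAL = 1.5    # assumed seconds between announcements

    def announcement_times(start=0.0, interval=INTERVAL, group=GROUP):
        """Times t_1..t_n at which each letter of the group is announced."""
        return [start + i * interval for i in range(len(group))]

    def select_letter(gesture_time, group=GROUP):
        """Return the letter whose window [t_i, t_{i+1}) contains gesture_time."""
        times = announcement_times(group=group)
        for i, t in enumerate(times):
            t_next = times[i + 1] if i + 1 < len(times) else float("inf")
            if t <= gesture_time < t_next:
                return group[i]
        return None

    select_letter(3.2)   # with INTERVAL=1.5: t_3=3.0 and t_4=4.5, so a blink at 3.2 s selects 'H'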

(24) In order to improve classification of the gestures, the system may be trained by a machine/deep learning algorithm. First, the system receives labeled gesture images (Blink, Center, Up, Down, Right, Left) to gather an initial dataset. Then, the system goes through a training session with a set of training images. During this training session the system, namely the neural network of the system, learns how to recognize each of the categories in the labeled images. When the present model makes a mistake, it corrects itself and improves. When the training session of the network is over, a testing set of images is received and processed by the system to check the new classification model. The classification made by the system is compared with the ground-truth labels of the testing set; the number of correct classifications can be computed, and values of precision, recall and f-measure, which are used to quantify the performance of such a network, can be obtained.
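
A minimal sketch of the train/test procedure described above, using generic scikit-learn tooling purely for illustration: the disclosure does not specify the network architecture, so a small multilayer perceptron stands in for it, and the standard precision, recall and F-measure are computed on the testing set.

    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import precision_recall_fscore_support

    LABELS = ["Blink", "Center", "Up", "Down", "Right", "Left"]

    def train_and_evaluate(X_train, y_train, X_test, y_test):
        """X_* are feature vectors extracted from eye images; y_* are gesture labels."""
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        model.fit(X_train, y_train)                    # training session on labeled images
        y_pred = model.predict(X_test)                 # classify the held-out testing set
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, labels=LABELS, average="macro", zero_division=0)
        return model, {"precision": precision, "recall": recall, "f_measure": f1}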

(25) A schematic illustration of an assistive-communication eye tracking-based system is provided by FIG. 8. The system comprises a camera (802) mounted on a lightweight head mount (800, fitted onto the user's head by a family member, a caregiver or the user themselves), a bone-conduction speaker/headphone (804) and a control unit (not shown).

(26) In clinical trials carried out by the inventors of the present application, it was demonstrated that patients were able to comfortably control the system following a brief trial of several minutes. As a non-limiting example, provided below as Table 1, in a clinical trial held at Rambam Hospital, Israel, learning the "call for help" function required an average training time of 1.12 minutes, learning to communicate a predetermined set of sentences required an average training time of 6.44 minutes, and free-text, letter-by-letter communication using a mobile screen required an average training time of 11.08 minutes.

(27) TABLE 1

    Communication type    Average training time (minutes)
    "Call for help"       1.12
    A sentence            6.44
    Free text             11.08

(28) A non-limiting embodiment of the joystick-like gesture classification is illustrated in FIG. 9. The classification is based on finding the position of the pupil area, which is obtained based on a threshold map (the innermost square). Specifically, a particular position is determined whenever the pupil area touches, or is tangent to, a border of the threshold map. For instance, when the pupil area touches the upper border of the threshold map, the image data would be classified as an "up" gesture. The threshold map may be derived from a position map (the middle square), for instance by being at least 80% away from the center of the position map, and optionally, the position map is within a larger region of interest (ROI) defined based on anatomical features of the eye or its surroundings.
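
By way of illustration only, the joystick-like classification of FIG. 9 can be sketched with axis-aligned rectangles: the threshold map is derived by shrinking the position map (the 80% figure mirrors the example above), and the gesture is defined by which border of the threshold map the pupil's bounding box touches or crosses. The coordinate conventions and data layout below are assumptions.

    def derive_threshold_map(position_map, fraction=0.8):
        """position_map = (cx, cy, half_w, half_h); shrink it to `fraction` of its size."""
        cx, cy, hw, hh = position_map
        return (cx, cy, hw * fraction, hh * fraction)

    def classify_pupil(pupil_box, threshold_map):
        """pupil_box = (x0, y0, x1, y1) in image coordinates (y grows downwards).
        Return the gesture implied by the border of the threshold map that the
        pupil area touches, or 'center' if it stays inside the map."""
        cx, cy, hw, hh = threshold_map
        left, right, top, bottom = cx - hw, cx + hw, cy - hh, cy + hh
        x0, y0, x1, y1 = pupil_box
        if y0 <= top:    return "up"
        if y1 >= bottom: return "down"
        if x0 <= left:   return "left"
        if x1 >= right:  return "right"
        return "center"

    tmap = derive_threshold_map((100, 100, 40, 30))
    classify_pupil((60, 90, 80, 110), tmap)   # pupil reaches the left border -> 'left'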

(29) FIGS. 10-14 provide illustrations of several embodiments of mapping between eye gestures and commands in single-, two-, three-, four- and five-gesture operation modes. According to the mapping illustrated in FIG. 10, the user may initiate a scan session and select items by performing a blink gesture. According to the mapping illustrated in FIG. 11, the user may initiate a scan session and select items by performing a blink gesture, and select a back command by performing a "right" gesture. According to the mapping illustrated in FIG. 12, the user may traverse the menu items with two gestures ("right" and "left") and select items by performing a third, blink gesture. According to the mapping illustrated in FIG. 13, the user may traverse the menu items with three gestures ("right", "left", "up") and select items by performing a blink gesture. According to the mapping illustrated in FIG. 14, the user may traverse the menu items with four gestures ("right", "left", "up", "down") and select items by performing a blink gesture.
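
The one- to five-gesture operation modes of FIGS. 10-14 can be summarized as data, as in the sketch below; the dictionary mirrors the mappings listed in the preceding paragraph, and the helper merely reports the role a gesture plays in a given mode.

    OPERATION_MODES = {
        1: {"select": "blink"},                                        # scan session + select
        2: {"select": "blink", "back": "right"},
        3: {"traverse": ["right", "left"], "select": "blink"},
        4: {"traverse": ["right", "left", "up"], "select": "blink"},
        5: {"traverse": ["right", "left", "up", "down"], "select": "blink"},
    }

    def role_of(gesture, mode):
        """Return 'select', 'back', 'traverse' or None for a gesture in a mode."""
        config = OPERATION_MODES[mode]
        if gesture == config.get("select"):
            return "select"
        if gesture == config.get("back"):
            return "back"
        return "traverse" if gesture in config.get("traverse", []) else None

    role_of("left", 3)   # -> 'traverse'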