DYNAMIC IDENTITY AUTHENTICATION
20230306094 · 2023-09-28
Inventors
- David Mendlovic (Tel Aviv, IL)
- Menahem Koren (Tel Aviv, IL)
- Lior Gelberg (Tel Aviv, IL)
- Khen Cohen (Tel Aviv, IL)
- Mor-Avi Azulay (Tel Aviv, IL)
- Ohad Volvovitch (Tel Aviv, IL)
CPC classification
G06F21/32
PHYSICS
International classification
G06F21/32
PHYSICS
G06V40/10
PHYSICS
Abstract
A method of identifying a person, the method comprising: acquiring spatiotemporal data for each of a plurality of anatomical landmarks associated with an activity engaged in by a person that defines a spatiotemporal trajectory of the anatomical landmark during the activity; modeling the acquired spatiotemporal data as a spatiotemporal graph (ST-Graph); and processing the ST-Graph using at least one non-local graph convolution neural network (NLGCN) to provide an identity for the person.
Claims
1. A method of identifying a person, the method comprising: acquiring spatiotemporal data for each of a plurality of anatomical landmarks associated with an activity engaged in by a person that provide data defining at least one spatiotemporal trajectory of the anatomical landmarks during the activity; modeling the acquired spatiotemporal data as a spatiotemporal graph (ST-Graph); and processing the ST-Graph using at least one non-local graph convolution neural network (NLGCN) to provide an identity for the person, wherein the processing of the ST-Graph comprises segmenting the plurality of anatomical landmarks into a plurality of sets of anatomical landmarks, each set characterized by a different configuration of degrees of freedom of motion.
2. The method according to claim 1 wherein the at least one NLGCN comprises at least one adaptive NLGCN (ANLGCN) including an adaptive adjacency matrix learned responsive to data relating anatomical landmarks of the plurality of anatomical landmarks that is not dictated solely by the person's physical body structure.
3. (canceled)
4. The method according to claim 1 and comprising modeling the acquired spatiotemporal data associated with the anatomical landmarks in each set as a ST-Graph.
5. The method according to claim 4 wherein processing comprises processing the ST-Graph modeled for each set of the plurality of sets of anatomical landmarks with an NLGCN of the at least one NLGCN independent of processing the other sets of the plurality of sets to determine data indicating an identity for the person.
6. The method according to claim 5 and comprising fusing the determined data from all the sets to provide the identity for the person.
7. The method according to claim 1, wherein acquiring the spatiotemporal data comprises acquiring a sequence of video frames imaging the person engaging in the activity, each video frame including an image of at least one body region of interest (BROI) imaging an anatomical landmark of the plurality of anatomical landmarks.
8. The method according to claim 7 and comprising processing the video frames to detect in each video frame the at least one BROI.
9. The method according to claim 7 and comprising identifying in each of the at least one detected BROI an image of an anatomical landmark of the plurality of anatomical landmarks.
10. The method according to claim 9 and comprising processing the images of the identified anatomical landmarks to determine the data defining the spatiotemporal trajectories.
11. The method according to claim 7 wherein the plurality of anatomical landmarks comprises joints.
12. The method according to claim 11 wherein the plurality of anatomical landmarks comprises bones connecting the joints.
13. The method according to claim 11 wherein the joints comprise finger knuckles.
14. The method according to claim 13 wherein the activity comprises a sequence of finger manipulations.
15. The method according to claim 14 wherein the finger manipulations comprise manipulations engaged in to operate a keyboard.
16. The method according to claim 11 wherein the joints comprise joints of the large appendages.
17. The method according to claim 16 wherein the activity is a sport.
18. The method according to claim 17 wherein the sport is soccer.
19. The method according to claim 17 wherein the sport is golf.
20. A method of identifying a person, the method comprising: acquiring spatiotemporal data for each of a plurality of anatomical landmarks associated with an activity engaged in by a person that provide data defining at least one spatiotemporal trajectory of the anatomical landmarks during the activity, wherein the plurality of anatomical landmarks comprises facial landmarks; modeling the acquired spatiotemporal data as a spatiotemporal graph (ST-Graph); and processing the ST-Graph using at least one non-local graph convolution neural network (NLGCN) to provide an identity for the person.
21. The method according to claim 20 wherein the facial landmarks comprise facial landmarks whose motions are used to define action units (AUs) of the facial action coding system (FACS) used to taxonomize facial expressions and micro-expressions.
22. The method according to claim 20 wherein the plurality of anatomical landmarks comprises minutia pair features of fingerprints of a plurality of fingers of a hand.
23. (canceled)
Description
BRIEF DESCRIPTION OF FIGURES
[0014] Non-limiting examples of embodiments of the invention are described below with reference to figures attached hereto. Identical features that appear in more than one figure are generally labeled with a same label in all the figures in which they appear. A label labeling an icon representing a given feature of an embodiment of the invention in a figure may be used to reference the given feature. Dimensions of features shown in the figures are chosen for convenience and clarity of presentation and are not necessarily shown to scale.
DETAILED DESCRIPTION
[0026] In the discussion, unless otherwise stated, adjectives such as “substantially” and “about” modifying a condition or relationship characteristic of a feature or features of an embodiment of the disclosure, are understood to mean that the condition or characteristic is defined to within tolerances that are acceptable for operation of the embodiment for an application for which it is intended. Wherever a general term in the disclosure is illustrated by reference to an example instance or a list of example instances, the instance or instances referred to are by way of non-limiting example instances of the general term, and the general term is not intended to be limited to the specific example instance or instances referred to. The phrase “in an embodiment”, whether or not associated with a permissive, such as “may”, “optionally”, or “by way of example”, is used to introduce for consideration an example, but not necessarily a required configuration of possible embodiments of the disclosure. Unless otherwise indicated, the word “or” in the description and claims is considered to be the inclusive “or” rather than the exclusive “or”, and indicates at least one of, or any combination of more than one of, the items it conjoins.
[0028] In a block 22 DYNAMIDE optionally acquires a sequence of video frames of a person engaged in an activity that DYNAMIDE is configured to process to determine an identity for the person engaged in the activity, in accordance with an embodiment of the disclosure. In a block 24 DYNAMIDE processes the video frames to identify images of body regions of interest (BROIs) in the video frames that image at least one AFID related to the activity. Identifying a BROI in a video frame optionally comprises determining at least one bounding box in the frame that includes an image of the BROI. In a block 26 DYNAMIDE processes each of the bounding boxes determined for the video frames to identify in each of the bounding boxes an image of the at least one AFID. Identifying an image of an AFID in a bounding box of a video frame optionally comprises associating with the image a spatiotemporal ID (ST-ID), an “AFID ST-ID”, comprising an identifying label of the AFID that is used to label all identified images of the same AFID in the video frames, and determining spatiotemporal coordinates for the image. The spatiotemporal coordinates comprise a time stamp and at least two spatial coordinates. The time stamp identifies a time, a temporal location, at which the video frame comprising the bounding box in which the AFID is located was acquired relative to times at which other video frames in the sequence of video frames were acquired. The at least two spatial coordinates correspond to a spatial location of the AFID at the time indicated by the time stamp. Optionally, the AFID ST-ID for a given identified AFID comprises a standard deviation (sd) for each spatial coordinate and a probability that the AFID label associated with the AFID ST-ID is correct. The earliest and latest time stamps and the extreme spatial coordinates determined for the AFID ST-IDs define a space-time volume, which may be referred to as a spatiotemporal AFID hull (ST-Hull), that contains the spatiotemporal coordinates of all instances of the AFIDs imaged and identified in the sequence of video frames.
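By way of a non-limiting illustration, the AFID ST-ID bookkeeping and the ST-Hull computation described above might be sketched in Python as follows. The record fields, the use of two spatial coordinates, and all names are illustrative assumptions rather than structures prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AfidStId:
    afid_label: str          # identifying label of the anatomical feature (AFID)
    frame_time: float        # time stamp of the acquiring video frame
    x: float                 # spatial coordinates of the AFID at frame_time
    y: float
    sd_x: float = 0.0        # optional standard deviation per spatial coordinate
    sd_y: float = 0.0
    label_prob: float = 1.0  # probability that the AFID label is correct

def st_hull(st_ids: List[AfidStId]) -> Tuple[Tuple[float, float], ...]:
    """Return the space-time volume (earliest/latest time stamp and extreme spatial
    coordinates) containing all identified AFID instances -- the ST-Hull."""
    times = [s.frame_time for s in st_ids]
    xs = [s.x for s in st_ids]
    ys = [s.y for s in st_ids]
    return ((min(times), max(times)), (min(xs), max(xs)), (min(ys), max(ys)))
```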
[0029] In a block 28 DYNAMIDE uses the ST-IDs of the AFIDs to configure the identified instances of the AFIDs as nodes of an AFID spatiotemporal graph (ST-Graph) that are connected by spatial and temporal edges. Spatial edges connect ST-Graph nodes that represent imaged instances of AFIDs identified by a same time stamp, that is, instances of AFIDs that are imaged in a same video frame, and represent spatial constraints imposed on the AFIDs by the structure of a person's body. The configuration of nodes connected by spatial edges that represents spatial relations of instances of AFIDs imaged in a same given frame at a given time may be referred to as a spatial graph (S-Graph) of the AFIDs at the given time. Temporal edges connect temporally adjacent nodes in the ST-Graph representing images of the same AFID in two consecutively acquired video frames in the sequence of video frames. Temporal edges represent an elapsed time between two consecutive time stamps. The ST-Graph may be considered to comprise S-Graphs for the AFIDs connected by temporal edges.
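The following sketch, building on the ST-ID records of the previous example, shows one way the identified AFID instances might be assembled into ST-Graph nodes with spatial and temporal edges. The skeleton_edges argument, listing pairs of AFID labels connected by the person's body structure, is an assumed input.

```python
from collections import defaultdict

def build_st_graph(st_ids, skeleton_edges):
    """Assemble ST-Graph nodes plus spatial and temporal edge lists.

    Nodes are (frame_index, afid_label) pairs. Spatial edges connect AFIDs imaged
    in the same frame according to the body structure given by skeleton_edges;
    temporal edges connect the same AFID in two consecutively acquired frames."""
    frames = defaultdict(dict)                       # frame_time -> {afid_label: record}
    for rec in st_ids:
        frames[rec.frame_time][rec.afid_label] = rec
    times = sorted(frames)

    nodes, spatial_edges, temporal_edges = [], [], []
    for t_idx, t in enumerate(times):
        labels = frames[t]
        nodes += [(t_idx, lab) for lab in labels]
        # spatial edges: constraints imposed by the structure of the person's body
        spatial_edges += [((t_idx, a), (t_idx, b))
                          for a, b in skeleton_edges if a in labels and b in labels]
    for t_idx in range(len(times) - 1):
        cur, nxt = frames[times[t_idx]], frames[times[t_idx + 1]]
        # temporal edges: same AFID imaged in two consecutive frames
        temporal_edges += [((t_idx, lab), (t_idx + 1, lab)) for lab in cur if lab in nxt]
    return nodes, spatial_edges, temporal_edges
```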
[0030] In an embodiment, in a block 30 DYNAMIDE processes the AFID ST-Graph using an, optionally adaptive, non-local graph convolutional neural network (ANLGCN) to determine, optionally in real time, which person, from amongst a plurality of persons that the ANLGCN was trained to recognize, is engaged in the activity. In an embodiment the ANLGCN is configured to span the AFID ST-Hull and enable data associated with an imaged instance of an AFID at any spatiotemporal location in the hull to be weighted by a learned weight and contribute to a convolution performed by the ANLGCN for a spatiotemporal location anywhere else in the hull. Optionally, the NLGCN is configured as a multi-stream GCN comprising a plurality of component NLGCNs that operate to process sets of AFID data characterized by independent degrees of freedom.
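By way of a non-limiting illustration, a non-local layer of the kind described, in which data at any spatiotemporal location in the ST-Hull can be weighted by a learned weight and contribute to the output at any other location, might be sketched in PyTorch as below. The sketch follows the standard embedded-Gaussian non-local block and is an assumption; the disclosure does not prescribe this particular formulation.

```python
import torch
import torch.nn as nn

class SpatioTemporalNonLocal(nn.Module):
    """Non-local block over all frame-node positions of an ST-Graph feature map."""

    def __init__(self, in_channels, embed_channels=16):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, embed_channels, 1)   # query embedding
        self.phi = nn.Conv2d(in_channels, embed_channels, 1)     # key embedding
        self.g = nn.Conv2d(in_channels, in_channels, 1)          # value transform
        self.out = nn.Conv2d(in_channels, in_channels, 1)

    def forward(self, x):                                        # x: (N, C, T, V) node features
        n, c, t, v = x.shape
        q = self.theta(x).reshape(n, -1, t * v)                  # (N, C', T*V)
        k = self.phi(x).reshape(n, -1, t * v)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)      # (N, T*V, T*V) learned weights
        g = self.g(x).reshape(n, c, t * v)
        y = g @ attn.transpose(1, 2)          # every spatiotemporal position contributes to every other
        return x + self.out(y.reshape(n, c, t, v))               # residual connection
```

In practice such a block would be interleaved with graph convolutions over the S-Graph structure and with temporal convolutions along the frame axis.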
[0032] Imaging system 110 is operable to provide a video sequence 114 of a plurality of “N” 2D and/or 3D video frames 114.sub.n, 1≤n≤N, of a hand or hands 52 of person 50 typing on keypad 62. Imaging system 110 is connected by at least one wireline and/or wireless communication channel 113 to hub 120, via which imaging system 110 transmits video frames it acquires to the hub. Hub 120 is configured to process the received video frames 114.sub.n to identify person 50 whose hand 52 is imaged in the video frames. The hub comprises and/or has access to data and/or executable instructions, hereinafter also referred to as software, and to any of various electronic and/or optical, physical and/or virtual, processors, memories, and/or wireline or wireless communication interfaces, hereinafter also referred to as hardware, that may be required to support functionalities that the hub provides.
[0033] By way of example, hub 120 comprises software and hardware that support an object detection module 130 operable to detect BROIs in video frames 114.sub.n, an AFID identifier module 140 for identifying AFIDs in detected BROIs and providing each identified AFID with a ST-ID, and a classifier module 150 comprising a non-local classifier operable to process the set of ST-IDs as a spatiotemporal graph to identify person 50.
[0034] In an embodiment object detection module 130 comprises a fast object detector, such as a YOLO (You Only Look Once) detector, that is capable of detecting relevant BROIs in real time. AFID identifier module 140 may comprise a convolutional pose machine (CPM) for identifying AFIDs in the detected BROIs. Classifier module 150 comprises the, optionally adaptive, non-local graph convolutional network noted above and discussed below.
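By way of a non-limiting illustration, the hub's processing chain of blocks 22-30 might be orchestrated as below. The detector and pose_machine callables stand in for whatever detection and pose models are used (for example a YOLO-style detector and a CPM); their interfaces, the bounding-box attributes, and the reuse of AfidStId and build_st_graph from the earlier sketches are all assumptions.

```python
def identify_person(video_frames, detector, pose_machine, classifier, skeleton_edges):
    """Run detection, AFID identification, and graph classification on a video sequence."""
    st_ids = []
    for t, frame in enumerate(video_frames):
        for box in detector(frame):                  # bounding boxes of BROIs, e.g. a hand
            crop = frame[box.top:box.bottom, box.left:box.right]
            for label, (x, y), prob in pose_machine(crop):   # AFIDs found in the BROI
                st_ids.append(AfidStId(label, float(t),
                                       box.left + x, box.top + y, label_prob=prob))
    graph = build_st_graph(st_ids, skeleton_edges)
    return classifier(graph)                         # identity determined from the ST-Graph
```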
[0035] In an embodiment the AFIDs that DYNAMIDE 100 uses to identify a person typing are joints (finger and/or wrist joints) and finger bones (phalanges) of the typing hand.
[0038] As discussed above, in processing sequence 114 of video frames 114.sub.n, object detection module 130 may determine bounding boxes that locate images of hand 52 in the frames as objects comprising joint AFIDs that AFID detector 140 identifies and DYNAMIDE 100 uses to identify person 50. A bounding box determined by object detector module 130 for hand 52 in video frame 114.sub.n is indicated by a dashed rectangle 116. Knuckle AFIDs that AFID detector 140 detects in bounding box 116 and identifies are indicated by the generic AFID labels JH
[0040] The node data associated with ST-Graph-52 provides a set of spatiotemporal input features that classifier module 150 of DYNAMIDE hub 120 processes to determine the identity of person 50 typing on keypad 62 of ATM 60. The set of input features may be modeled as a feature tensor, as schematically shown in the figures.
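One plausible way to arrange the node data as an input feature tensor, assuming the common (channels, frames, nodes) layout used by skeleton-based graph networks, is sketched below; the layout and function name are assumptions, and tensor 300 may of course be organized differently.

```python
import numpy as np

def pack_features(st_ids, afid_labels, frame_times):
    """Pack AFID coordinates into an array of shape (C, T, V): C=2 spatial
    coordinates, T=number of video frames, V=number of AFIDs (graph nodes)."""
    label_idx = {lab: v for v, lab in enumerate(afid_labels)}
    time_idx = {t: i for i, t in enumerate(sorted(frame_times))}
    x = np.zeros((2, len(time_idx), len(label_idx)), dtype=np.float32)
    for rec in st_ids:
        t, v = time_idx[rec.frame_time], label_idx[rec.afid_label]
        x[0, t, v], x[1, t, v] = rec.x, rec.y
    return x
```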
[0041] In an embodiment, classifier module 150 may have a classifier comprising at least one non-local graph convolutional network (NLGCN) to process the data in tensor 300 and provide an identity for person 50 in accordance with an embodiment of the disclosure. Optionally, the at least one NLGCN comprises at least one adaptive NLGCN (ANLGCN), which includes, in addition to a non-local GCN layer, an adaptive adjacency matrix. The adaptive adjacency matrix operates to improve classifier recognition of spatiotemporal motions of the joints of a hand relative to each other that are not dictated by spatial structure and are idiosyncratic to the manner in which a person types.
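A minimal PyTorch sketch of a graph-convolution layer with such an adaptive adjacency matrix, and of a small classifier combining it with the non-local block sketched earlier, might look as follows. The fixed adjacency A encodes the hand's physical structure, while the learned matrix B is free to capture relations between joints that are idiosyncratic to how the person types; layer widths, pooling, and names are illustrative assumptions, not the disclosure's architecture.

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        self.register_buffer("A", A)                       # (V, V) body-structure adjacency
        self.B = nn.Parameter(1e-6 * torch.randn_like(A))  # learned adaptive adjacency
        self.out = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):                                  # x: (N, C, T, V)
        adj = self.A + self.B                              # physical plus learned joint relations
        return self.out(torch.einsum("nctv,vw->nctw", x, adj))

class NlgcnIdentifier(nn.Module):
    def __init__(self, in_channels, num_persons, A, hidden=64):
        super().__init__()
        self.nonlocal_block = SpatioTemporalNonLocal(in_channels)  # from the earlier sketch
        self.gc1 = AdaptiveGraphConv(in_channels, hidden, A)
        self.gc2 = AdaptiveGraphConv(hidden, hidden, A)
        self.fc = nn.Linear(hidden, num_persons)           # one score per enrolled identity

    def forward(self, x):                                  # x: (N, C, T, V) feature tensor
        x = self.nonlocal_block(x)
        x = torch.relu(self.gc1(x))
        x = torch.relu(self.gc2(x))
        return self.fc(x.mean(dim=(2, 3)))                 # global average pool over T and V
```

The input would be the (C, T, V) array produced by the packing sketch above, with a leading batch dimension added.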
[0043] There is therefore provided in accordance with an embodiment of the disclosure, a method of identifying a person, the method comprising: acquiring spatiotemporal data for each of a plurality of anatomical landmarks associated with an activity engaged in by a person that provide data defining at least one spatiotemporal trajectory of the anatomical landmarks during the activity; modeling the acquired spatiotemporal data as a spatiotemporal graph (ST-Graph); and processing the ST-Graph using at least one non-local graph convolution neural network (NLGCN) to provide an identity for the person. Optionally, the at least one NLGCN comprises at least one adaptive NLGCN (ANLGCN) including an adaptive adjacency matrix learned responsive to data relating anatomical landmarks of the plurality of anatomical landmarks that is not dictated solely by the person's physical body structure. Additionally or alternatively, processing the ST-Graph comprises segmenting the plurality of anatomical landmarks into a plurality of sets of anatomical landmarks, each set characterized by a different configuration of degrees of freedom of motion. Optionally the method comprises modeling the acquired spatiotemporal data associated with the anatomical landmarks in each set as a ST-Graph. Processing may comprise processing the ST-Graph modeled for each set of the plurality of sets of anatomical landmarks with an NLGCN of the at least one NLGCN independent of processing the other sets of the plurality of sets to determine data indicating an identity for the person. The method optionally comprises fusing the determined data from all the sets to provide the identity for the person.
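Where the anatomical landmarks are segmented into sets characterized by different configurations of degrees of freedom, each set processed by its own NLGCN stream and the per-stream results fused, the arrangement might be sketched as below; the simple score-averaging fusion is an assumption, since the disclosure leaves the fusion method open.

```python
import torch
import torch.nn as nn

class MultiStreamIdentifier(nn.Module):
    """One NLGCN stream per set of anatomical landmarks, fused into one identity score."""

    def __init__(self, streams):
        super().__init__()
        self.streams = nn.ModuleList(streams)       # e.g. one NlgcnIdentifier per landmark set

    def forward(self, inputs):                      # one (N, C, T, V_i) tensor per landmark set
        scores = [net(x) for net, x in zip(self.streams, inputs)]
        return torch.stack(scores).mean(dim=0)      # fused per-person identity scores
```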
[0044] In an embodiment acquiring the spatiotemporal data comprises acquiring a sequence of video frames imaging the person engaging in the activity, each video frame including an image of at least one body region of interest (BROI) imaging an anatomical landmark of the plurality of anatomical landmarks. Optionally the method comprises processing the video frames to detect in each video frame the at least one BROI. Additionally or alternatively the method optionally comprises identifying in each of the at least one detected BROI an image of an anatomical landmark of the plurality of anatomical landmarks. Optionally, the method comprises processing the images of the identified anatomical landmarks to determine the data defining the spatiotemporal trajectories.
[0045] In an embodiment the plurality of anatomical landmarks comprises joints. Optionally, the plurality of anatomical landmarks comprises bones connecting the joints. Additionally or alternatively the joints comprise finger knuckles. Optionally, the activity comprises a sequence of finger manipulations. The finger manipulations may comprise manipulations engaged in to operate a keyboard.
[0046] In an embodiment the joints comprise joints of the large appendages. Optionally the activity is a sport. Optionally the sport is soccer. Optionally, the sport is golf.
[0047] In an embodiment the plurality of anatomical landmarks comprises facial landmarks.
[0048] Optionally, the facial landmarks comprise facial landmarks whose motions are used to define action units (AUs) of the facial action coding system (FACS) used to taxonomize facial expressions and micro-expressions. In an embodiment the plurality of anatomical landmarks comprises minutia pair features of fingerprints of a plurality of fingers of a hand.
[0049] There is further provided in accordance with an embodiment a system for identifying a person, the system comprising: an imaging system operable to acquire a video having video frames imaging a person engaging in an activity; and software useable to process the video frames in accordance with any of the preceding claims to provide an identity for the person.
[0050] Descriptions of embodiments of the invention in the present application are provided by way of example and are not intended to limit the scope of the invention. The described embodiments comprise different features, not all of which are required in all embodiments. Some embodiments utilize only some of the features or possible combinations of the features. Variations of embodiments of the invention that are described, and embodiments comprising different combinations of features noted in the described embodiments, will occur to persons of the art. The scope of the invention is limited only by the claims.