Continuously evolving and interactive Disguised Face Identification (DFI) with facial key points using ScatterNet Hybrid Deep Learning (SHDL) network
11594074 · 2023-02-28
Inventors
CPC classification
G06F18/214
PHYSICS
G06V10/60
PHYSICS
International classification
G06V40/00
PHYSICS
G06V10/60
PHYSICS
Abstract
A Disguised Face Identification (DFI) system and method for identifying multiple individuals with disguised faces in uncontrolled environments/scenarios is provided. The Disguised Face Identification (DFI) system and method include detecting facial landmarks/facial key-points and performing face identification using the ScatterNet Hybrid Deep Learning (SHDL) Network. The system can also be evolved by the user after deployment, as it provides the ability to add new faces to a known face database, which are identified by the system thereafter. The invention further includes two facial disguise (FG) datasets, a simple facial disguise (FG) dataset and a complex facial disguise (FG) dataset, for training the deep convolutional networks.
Claims
1. A Disguised Face Identification (DFI) system for identifying individuals with disguised faces, the system comprising: a Disguised Face Identification (DFI) Framework that receives one or more input images; a ScatterNet Hybrid Deep Learning (SHDL) Network that performs estimation of facial keypoints from the input image using the deep convolutional networks; a Disguised Face Classification framework; one or more facial disguise (FG) datasets; and a known non-disguised faces database, wherein the ScatterNet Hybrid Deep Learning (SHDL) Network detects facial keypoints from the input image, these facial key points are then arranged into a star structure to form a unique face-specific signature, and the unique face-specific signature is compared by the Disguised Face Classification framework to match the input image against the known non-disguised faces database, thereby identifying the individuals with the disguised faces.
2. The Disguised Face Identification (DFI) system of claim 1, wherein the system is for identifying the individuals with the disguised faces in uncontrolled environments/scenarios.
3. The Disguised Face Identification (DFI) system of claim 1, wherein the system further identifies multiple individuals with different disguises in uncontrolled scenarios.
4. The Disguised Face Identification (DFI) system of claim 1, wherein the system further recognizes the disguised faces at different orientations and distances.
5. The Disguised Face Identification (DFI) system of claim 1, wherein the system identifies the individuals with the disguised faces, including a wide variety of altered physical attributes on the face or numerous disguises such as, but not limited to, wearing a wig, changing hairstyle or hair color, wearing eyeglasses, removing or growing beards, wearing scarves, wearing caps, wearing masks, etc.
6. The Disguised Face Identification (DFI) system of claim 1, wherein the Disguised Face Identification (DFI) Framework includes the facial disguise (FG) datasets.
7. The Disguised Face Identification (DFI) system of claim 1, wherein the facial disguise (FG) datasets include simple facial disguise (FG) datasets and complex facial disguise (FG) datasets for training the deep convolutional networks.
8. The Disguised Face Identification (DFI) system of claim 1, wherein the disguised face identification (DFI) framework further performs evaluation of the facial keypoints on the facial disguise (FG) datasets.
9. The Disguised Face Identification (DFI) system of claim 1, wherein the system further includes an interactive monitor screen.
10. The Disguised Face Identification (DFI) system of claim 9, wherein the system is further evolved by a user, allowing the user to add faces to the database by simply clicking images of a face on the monitor screen when needed and the individual is identified immediately.
11. The Disguised Face Identification (DFI) system of claim 1, wherein the system further comprises a training module that trains the ScatterNet Hybrid Deep Learning (SHDL) Network.
12. The Disguised Face Identification (DFI) system of claim 1, wherein the system is trained on a large dataset that contains faces with varied disguises, covering different backgrounds and under varied illuminations.
13. A method of identifying individuals with disguised faces, the method comprising: receiving one or more images of at least one disguised face into a Disguised Face Identification (DFI) Framework; and estimating facial keypoints from the image of the disguised face by a ScatterNet Hybrid Deep Learning (SHDL) Network; wherein the ScatterNet Hybrid Deep Learning (SHDL) Network detects the facial keypoints from the disguised face, these facial key points are then arranged into a star structure to form a unique face-specific signature, and the unique face-specific signature is compared by a Disguised Face Classification framework to match the input image against a known non-disguised faces database, thereby identifying the individuals with the disguised faces.
14. The method of claim 13, wherein the individuals with the disguised faces are identified in uncontrolled environments/scenarios.
15. The method of claim 13, wherein the method identifies multiple individuals with different disguises in uncontrolled scenarios.
16. The method of claim 13, wherein the individuals with the disguised faces are identified, including a wide variety of altered physical attributes on the face or numerous disguises such as, but not limited to, wearing a wig, changing hairstyle or hair color, wearing eyeglasses, removing or growing beards, wearing scarves, wearing caps, wearing masks, etc.
17. The method of claim 13, wherein the Disguised Face Identification (DFI) Framework includes the facial disguise (FG) datasets, the facial disguise (FG) datasets further include simple facial disguise (FG) datasets and complex facial disguise (FG) datasets for training the deep convolutional networks.
18. The method of claim 13, wherein the method further performs evaluation of the facial keypoints on the facial disguise (FG) datasets.
19. The method of claim 13, wherein the method is further evolved by a user, allowing the user to add faces to the database by simply clicking images of a face on a monitor screen when needed and the individual is identified immediately.
20. The method of claim 13, wherein the method further includes training on a large dataset that contains faces with varied disguises, covering different backgrounds and under varied illuminations.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The object of the invention may be understood in more detail and more particularly the description of the invention briefly summarized above by reference to certain embodiments thereof which are illustrated in the appended drawings, which drawings form a part of this specification. It is to be noted, however, that the appended drawings illustrate preferred embodiments of the invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective equivalent embodiments.
DETAILED DESCRIPTION OF THE INVENTION
(9) The present invention will now be described more fully hereinafter with reference to the accompanying drawings in which a preferred embodiment of the invention is shown. This invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiment set forth herein. Rather, the embodiment is provided so that this disclosure will be thorough, and will fully convey the scope of the invention to those skilled in the art.
(10) In various embodiments, the present invention provides a Disguised Face Identification (DFI) system and method for detecting the facial keypoints and performing face identification using the detected facial key-points.
(12) Disguised Face Identification (DFI) system 100 includes a Disguised Face Identification (DFI) Framework 120 configured with ScatterNet Hybrid Deep Learning (SHDL) Network 130, a training module 140, a processor 150 and a memory 160. The Disguised Face Identification (DFI) system 100 performs estimation of facial landmarks or facial keypoints using the ScatterNet Hybrid Deep Learning (SHDL) Network 130. The training module 140 trains the ScatterNet Hybrid Deep Learning (SHDL) Network 130. The processor 150 executes instructions to perform estimation of facial landmarks or facial keypoints on the ScatterNet Hybrid Deep Learning (SHDL) Network 130. The processor 150 receives instructions from memory 160, or external circuitry. Each of these components may be embodied as hardware, software, firmware, or a combination thereof. Together, these components perform face detection for an individual of the input image 110. The Disguised Face Identification (DFI) system 100 includes the disguised face identification (DFI) framework 120 for evaluation of the facial landmarks or facial keypoints on facial disguise (FG) datasets.
(13) The training module 140 trains the ScatterNet Hybrid Deep Learning (SHDL) Network 130 in the Disguised Face Identification (DFI) framework 120 for performing facial landmarks or key points identification.
(14) In one embodiment, the training module 140 trains the ScatterNet Hybrid Deep Learning (SHDL) Network 130 using the facial disguise (FG) datasets. The Disguised Face Identification (DFI) system 100 may have a large database for storing the facial disguise (FG) datasets.
(17) In one embodiment of the present invention, the facial disguise (FG) datasets 220 include simple facial disguise (FG) datasets 220a and complex facial disguise (FG) datasets 220b for training the deep convolutional networks.
(18) In one preferred embodiment, the present invention discloses the Disguised Face Identification (DFI) framework 120 for detecting facial key-points for disguised face identification. The Disguised Face Identification (DFI) framework 120 first uses the ScatterNet Hybrid Deep Learning Network 130 to detect several facial landmarks or facial keypoints, as shown in the appended drawings.
(19) The invention provides two facial disguise (FG) datasets 220 to improve the training of deep convolutional networks due to their reliance on large training datasets.
(20) In one embodiment of the present invention, the Disguised Face Identification (DFI) Framework 120 uses the ScatterNet Hybrid Deep Learning Network 130 to extract several key-points from the face that are considered essential to describe the facial structure.
(21) In one embodiment, the several facial key points belong to the eyes region, nose region and lips region. The facial key points for the eyes region consist of the points P1, P2, P3, P4, P5, P6, P7, P8, P9, and P10, the nose region facial key points consist of keypoint P11, and the lips region facial keypoints consist of the P12, P13, and P14 keypoints, as shown in the appended drawings.
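As a minimal illustration of the grouping described above (the point names P1-P14 follow the description; the dictionary and function names below are hypothetical, not part of the patent):

```python
# Illustrative grouping of the 14 facial keypoints: P1-P10 belong to the
# eyes region, P11 to the nose region, and P12-P14 to the lips region.
FACIAL_KEYPOINT_REGIONS = {
    "eyes": [f"P{i}" for i in range(1, 11)],  # P1..P10
    "nose": ["P11"],
    "lips": ["P12", "P13", "P14"],
}

def region_of(keypoint: str) -> str:
    """Return the facial region a given keypoint belongs to."""
    for region, points in FACIAL_KEYPOINT_REGIONS.items():
        if keypoint in points:
            return region
    raise ValueError(f"Unknown keypoint: {keypoint}")

print(region_of("P11"))  # nose
```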
(22) The training of the deep convolutional network used for facial key-point detection requires a large amount of data. However, such datasets are not available, and researchers have therefore relied upon pre-trained deep networks obtained using standard non-disguised datasets to detect facial keypoints. Deep networks trained on non-disguised datasets may not be suitable for this application, as they may not transfer well to disguised faces.
(23) In order to avoid the above-mentioned issues, another embodiment of the present invention proposes two facial disguise (FG) datasets 220, i.e., a simple facial disguise (FG) dataset 220a and a complex facial disguise (FG) dataset 220b, that can be used in the future to train deep convolutional networks for facial keypoint detection. The Disguised Face Identification (DFI) framework 120 is trained and tested for facial disguise identification on both the simple facial disguise (FG) dataset 220a and the complex facial disguise (FG) dataset 220b.
(24) As mentioned above, deep convolutional networks require a large number of images with various combinations of disguises, such as people with eyeglasses, beards, different hairstyles, different hair colors, wigs, and scarves or caps, to perform accurately. Therefore, in the present invention, the Disguised Face Identification (DFI) system 100 includes Face Disguise (FG) Datasets 220 of nearly 2000 images or more, with (i) a simple facial disguise (FG) dataset 220a and (ii) a complex facial disguise (FG) dataset 220b, that contain people with varied disguises, covering different backgrounds and under varied illuminations. In one embodiment, each proposed dataset (simple and complex) is formed of at least 2000 images recorded with male and female subjects aged from 18 years to 30 years. In an alternate embodiment, each proposed dataset (simple and complex) can be formed of at least 2000 images recorded with male or female subjects, or both, of any age group, without limiting the scope of the invention.
(26) In another embodiment, Disguised Face Identification (DFI) system 100 first detects several facial keypoints using the ScatterNet Hybrid Deep Learning (SHDL) Network 130. The ScatterNet Hybrid Deep Learning (SHDL) Network 130 for facial landmark estimation is composed of a hand-crafted ScatterNet front-end and a supervised learning-based back-end formed of the modified coarse-to-fine deep regression network (RN). The ScatterNet Hybrid Deep Learning (SHDL) Network 130 is constructed by replacing the first convolutional, relu, and pooling layers of the coarse-to-fine deep regression network with the hand-crafted parametric log ScatterNet. This accelerates the learning of the regression network (RN) as the Scatter-Net front-end extracts invariant (translation, rotation, and scale) edge features which can be directly used to learn more complex patterns from the start of learning. The invariant edge features can be beneficial for this application as the humans can appear with these variations in the facial images. Since the first layer (Scatter-Net) of the network is fixed or has no learnable parameters, fewer network parameters are required to be learned further requiring the need for fewer labelled examples. This makes the ScatterNet Hybrid Deep Learning (SHDL) Network 130 superior (in terms of speed of learning and annotated dataset requirement) to other deep convolutional networks.
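The hybrid design above, a fixed, hand-crafted front end feeding a learnable regression back end, can be sketched roughly as follows. This is not the patent's actual implementation: the edge features stand in for the parametric log ScatterNet, the linear regressor stands in for the coarse-to-fine regression network (RN), and all names and shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def scatternet_front_end(image: np.ndarray) -> np.ndarray:
    """Hand-crafted, parameter-free front end: extracts edge responses,
    so no weights need to be learned at this stage."""
    gx = np.diff(image, axis=1, prepend=0)  # horizontal gradients
    gy = np.diff(image, axis=0, prepend=0)  # vertical gradients
    # Pooled edge magnitudes as simple invariant-style features.
    return np.array([np.abs(gx).mean(), np.abs(gy).mean(),
                     np.hypot(gx, gy).mean()])

class RegressionBackEnd:
    """Supervised back end: a minimal linear regressor standing in for
    the coarse-to-fine deep regression network (RN)."""
    def __init__(self, n_features: int, n_outputs: int):
        self.W = rng.normal(scale=0.1, size=(n_features, n_outputs))

    def predict(self, features: np.ndarray) -> np.ndarray:
        return features @ self.W

# Hybrid pipeline: the fixed front end feeds the learnable back end,
# which here outputs 28 values, i.e. (x, y) for each of 14 keypoints.
image = rng.random((32, 32))
features = scatternet_front_end(image)
backend = RegressionBackEnd(n_features=3, n_outputs=28)
coords = backend.predict(features)
print(coords.shape)  # (28,)
```

Because the front end has no learnable parameters, only the back end's weights must be trained, which mirrors the text's point about needing fewer labelled examples.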
(27) Further, the facial landmarks of new faces are detected by the ScatterNet Hybrid Deep Learning (SHDL) Network 130 from only a single image. The landmarks are connected to form the unique signatures, and the system can recognize the new faces immediately thereafter.
(28) The ScatterNet Hybrid Deep Learning (SHDL) Network 130 is used for the facial landmarks or facial keypoints detection in the Disguised Face Identification (DFI) framework 120. The facial landmarks or facial keypoints detection problem is formulated as a regression problem that can be modelled by the ScatterNet Hybrid Deep Learning (SHDL) Network 130. The ScatterNet Hybrid Deep Learning (SHDL) Network 130 takes an image of the face from either the simple or complex dataset or both and outputs the pixel coordinates of each facial landmark or facial keypoint for the face. An L2 norm is computed between the predicted points and the annotated landmark points of the same face image. The training objective is to estimate the network weights w with the available training set D=(x, y) such that the difference between the predicted and annotated landmarks is minimised. The loss function is shown below:
(29) L(w) = Σ.sub.(x,y)∈D Σ.sub.k ∥ŷ.sub.k(x; w) − G.sub.k(y.sub.k)∥.sub.2.sup.2
where ŷ.sub.k(x; w) is the network prediction for the k.sup.th keypoint and G.sub.k(y.sub.k) is a Gaussian centered at joint y.sub.k.
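A minimal sketch of this training objective, assuming heatmap-style targets where each annotated joint y_k is converted into a Gaussian G_k (function names and the sigma value are illustrative, not from the patent):

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """G_k: a 2-D Gaussian centered at the annotated joint y_k."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def keypoint_loss(predicted_heatmaps, annotated_points, sigma=2.0):
    """L2 loss between the network's predicted heatmaps and Gaussians
    centered at the annotated keypoints, summed over all keypoints."""
    loss = 0.0
    for pred, point in zip(predicted_heatmaps, annotated_points):
        target = gaussian_heatmap(pred.shape, point, sigma)
        loss += np.sum((pred - target) ** 2)
    return loss

# A perfect prediction yields zero loss.
target = [gaussian_heatmap((16, 16), (8, 8))]
print(keypoint_loss(target, [(8, 8)]))  # 0.0
```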
(30) The keypoints detected by the ScatterNet Hybrid Deep Learning (SHDL) Network 130 are connected to form a unique face-specific signature (star structure) which is further used for face identification. This unique face-specific signature (star structure) is shown in
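The star-structure signature described above could be sketched as follows, using the nose keypoint (P11) as the reference point; the coordinates and names are hypothetical:

```python
import math

def star_signature(keypoints: dict) -> dict:
    """Form the face-specific 'star' signature: the orientation of each
    keypoint measured from the nose keypoint (P11), which serves as the
    reference point of the star structure."""
    nx, ny = keypoints["P11"]
    return {
        name: math.atan2(y - ny, x - nx)
        for name, (x, y) in keypoints.items()
        if name != "P11"
    }

# Hypothetical pixel coordinates (y grows downward, as in images).
kps = {"P11": (50, 60), "P1": (30, 40), "P12": (50, 80)}
sig = star_signature(kps)
print(round(math.degrees(sig["P12"])))  # 90: P12 lies straight below the nose
```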
(31) The detected facial landmarks or facial keypoints are next used by the Disguised Face Classification framework 230 to perform classification.
(32) In another embodiment, the present invention uses the Disguised Face Classification framework 230 for comparing a disguised input face to the known non-disguised face database. The disguised input face image is considered a match to a specific image in the database if the L1 norm between the orientations of the corresponding key points in the two star structures is below a specific threshold. In the star structure, the point at the nose is the reference point from which the various angles are measured, as shown in the appended drawings.
(33) The similarity is calculated according to the equation below:
(34) τ = Σ.sub.i|θ.sub.i − ϕ.sub.i|
where τ is the similarity, θ.sub.i represents the orientation of the i.sup.th key point of the disguised image, and ϕ.sub.i stands for the corresponding angle for each non-disguised image in the known non-disguised face database.
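This matching rule can be sketched as follows, assuming each signature is a list of star-structure orientations; the database contents and threshold value are illustrative, not from the patent:

```python
def similarity(theta, phi):
    """τ: L1 distance between the star-structure orientations of the
    disguised face (θ_i) and a non-disguised database face (ϕ_i)."""
    return sum(abs(t - p) for t, p in zip(theta, phi))

def best_match(disguised_sig, database, threshold):
    """Return the database identity closest to the disguised face,
    provided its similarity falls below the match threshold."""
    scores = {name: similarity(disguised_sig, sig)
              for name, sig in database.items()}
    name = min(scores, key=scores.get)
    return name if scores[name] < threshold else None

# Hypothetical orientation signatures (radians) for two known faces.
db = {"alice": [0.10, 0.52, 1.30], "bob": [0.90, 0.10, 0.40]}
print(best_match([0.12, 0.50, 1.28], db, threshold=0.5))  # alice
```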
(35) In one exemplary embodiment, the performance of facial key-points detection using the ScatterNet Hybrid Deep Learning (SHDL) Network 130 of the disguised face identification (DFI) framework 120 is provided on both the datasets 220. The performance of the ScatterNet Hybrid Deep Learning (SHDL) Network 130 is evaluated by comparing the coordinates of the detected key-points for an image in the simple or complex datasets with their ground truth annotations marked by the user. The performance of the key-point detection ScatterNet Hybrid Deep Learning (SHDL) Network 130 is shown in the form of graphs that plot accuracy vs. distance from the ground truth pixels. A keypoint is deemed correctly located if it is within a set distance of d pixels from the annotated ground truth. The key-point detection performance for both the simple (red) and complex (green) background face disguise datasets is plotted for each key-point as shown in the appended drawings.
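The evaluation criterion above can be sketched as a short function; the coordinates below are hypothetical:

```python
import math

def detection_accuracy(predicted, ground_truth, d):
    """Fraction of keypoints whose predicted location falls within
    d pixels of the annotated ground-truth location."""
    correct = sum(
        math.dist(p, g) <= d for p, g in zip(predicted, ground_truth)
    )
    return correct / len(ground_truth)

pred = [(10, 10), (20, 25), (40, 40)]
gt = [(11, 10), (20, 20), (80, 40)]
print(detection_accuracy(pred, gt, d=5))  # 2 of 3 keypoints within 5 px
```

Sweeping d produces exactly the accuracy-vs-distance curves the text describes.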
(36) Result 1:
(37) In another exemplary embodiment, Table 1 provides the quantitative comparison of the predicted key-points for both the datasets at 3 (d=5, 10, 15) pixel distances from the ground-truth. As observed for d=5, an average keypoint detection accuracy of 85% was recorded for the simple background dataset as opposed to an accuracy of 74% for the complex background dataset.
(38) TABLE 1 shows the keypoint detection accuracy (in %) on the simple background and complex background face disguise (FG) datasets. The accuracy is tabulated with respect to the distance d (5, 10 and 15) in pixels from the ground truth (GT). There are 14 rows corresponding to the 14 facial keypoints (there can be more keypoints as well) and the last row corresponds to the average over all the facial keypoints.

           Simple (FG) Dataset        Complex (FG) Dataset
Points    d = 5   d = 10  d = 15     d = 5   d = 10  d = 15
P1          54      86      97         32      68      90
P2          85      95      98         84      94      97
P3          85     100     100         74      97      97
P4          83      99     100         64      93      94
P5          82      96      96         64      90      94
P6          87      98      99         85      98      99
P7          40      78      97         36      75      96
P8          82      99      99         74      99      99
P9          39      75      95         32      70      95
P10         93      97      97         64      96      96
P11         97      99      99         96      99      99
P12         54      84      94         41      74      90
P13         91      96      96         85      93      93
P14         73      95      95         46      76      89
All         85      94      94         56      89      92
(39) The accuracy increases for both datasets with an increase in pixel distance from the annotated ground-truth for both datasets.
(40) Result 2:
(41) In another exemplary embodiment, the present invention provides a comparison of keypoint detection performance by the ScatterNet Hybrid Deep Learning (SHDL) Network 130 with various available prior art systems and methods, namely CoordinateNet (CN), CoordinateNet extended (CNE) and SpatialNet. The keypoint detection accuracies are presented for the simple background face disguise dataset and the complex face disguise dataset at d=5 pixel distance. The keypoint detection accuracy results for the simple background are 77.6%, 78.2%, 81%, and 85% for CN, CNE, SpatialNet and the ScatterNet Hybrid Deep Learning Network used by the proposed disguised face identification (DFI) framework 120, respectively. The ScatterNet Hybrid Deep Learning Network outperforms the other networks by a significant margin. The results for the complex background face disguise dataset are 44%, 44.7%, 52.67% and 56% for CN, CNE, SpatialNet and the ScatterNet Hybrid Deep Learning (SHDL) Network, respectively.
(42) TABLE 2 is a comparison of classification accuracies (%) of various architectures, namely CoordinateNet (CN), CoordinateNet extended (CNE), SpatialNet and the proposed DFI framework, on the simple and complex face disguise datasets.

Dataset     DFI    CN     CNE    SpatialNet
Simple       85    77.6   78        81
Complex      56    44     44.7      52.67
(43) Result 3:
(44) In another exemplary embodiment, the present invention provides the disguised face classification performance for each disguise on both the simple and complex datasets in Table 3. It is observed from Table 3 that the facial disguise classification performance decreases with an increase in the complexity of the disguise.
(45) TABLE 3 presents the face disguise classification accuracy (%) for selected disguises on both datasets.

Dataset     Cap    Scarf    Cap + Scarf    Cap + Glasses + Scarf
Simple       90     77          69                  55
Complex      83     67          56                  43
(46) Result 4:
(47) Finally, Table 4 shows that the disguised face classification framework 230 is able to outperform the state-of-the-art on the simple face disguise dataset and the complex face disguise dataset by 13% and 9%, respectively.
(48) TABLE 4 shows the face disguise classification accuracy (%) compared against the state-of-the-art.

Dataset                DFI     State-of-the-art
Simple FG Dataset      74.4         65.2
Complex FG Dataset     62.6         53.4
(49) The present invention can be used to identify wanted individuals intentionally attempting to hide their identity using different disguises in uncontrolled scenarios such as airports, shopping malls, government facilities etc.
(50) One advantage of this invention is the use of the proposed ScatterNet Hybrid Deep Learning (SHDL) Network 130, which allows the Disguised Face Identification (DFI) framework 120 to learn key-point estimation rapidly as well as with relatively fewer annotated examples of faces. This is extremely advantageous as compared to other deep networks due to their reliance on large annotated datasets.
(51) Another advantage of the present invention is that it provides a large number of images in the Face Disguise (FG) Datasets 220, which can be effectively used to train the ScatterNet Hybrid Deep Learning (SHDL) Network 130 for facial key-point detection, as standard datasets would not be suitable for this task.
(52) Another advantage is that the deployed system can also be evolved by the user, as he/she can add new faces to the database by simply clicking the face on the interactive monitor screen 210; these faces are detected thereafter by the system, as it learns to recognize each new face using only a single added image via one-shot learning.
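This one-shot enrollment workflow could be sketched as follows, with a hypothetical signature-based database (class name, signature values, and threshold are illustrative, not from the patent):

```python
class KnownFaceDatabase:
    """Minimal sketch of the evolvable known-face database: a user can
    register a new face from a single image's signature, and the stored
    signature is used for identification thereafter."""

    def __init__(self):
        self.signatures = {}

    def add_face(self, name, signature):
        # One-shot enrollment: a single signature per new identity.
        self.signatures[name] = signature

    def identify(self, signature, threshold=0.5):
        """Return the closest stored identity, or None if no stored
        signature is within the match threshold."""
        best, score = None, float("inf")
        for name, stored in self.signatures.items():
            s = sum(abs(a - b) for a, b in zip(signature, stored))
            if s < score:
                best, score = name, s
        return best if score < threshold else None

db = KnownFaceDatabase()
db.add_face("new_person", [0.2, 0.8, 1.1])  # enrolled from one image
print(db.identify([0.21, 0.79, 1.12]))  # new_person
```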
(53) The embodiments according to the present invention may be implemented in the form of program instructions that can be executed by computers and may be recorded on computer readable media. The computer readable media may include program instructions, a data file, a data structure, or a combination thereof.
(54) In implementations of the described technology, the system is connected with a network server and a computer system capable of executing a computer program to perform the described functions. Further, data and program files may be input to the system, which reads the files and executes the programs therein. Some of the elements of a general purpose computer system are a processor having an input/output (I/O) section, a Central Processing Unit (CPU), and a memory.
(55) The described technology is optionally implemented in software devices loaded in memory, stored in a database, and/or communicated via a wired or wireless network link, thereby transforming the computer system into a special purpose machine for implementing the described operations.
(56) The embodiments of the invention described herein are implemented as logical steps in one or more computer systems. The implementation is a matter of choice, dependent on the performance requirements of the computer system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to variously as operations, steps, objects, or modules. Furthermore, it should be understood that logical operations may be performed in any order, unless explicitly claimed otherwise or a specific order is inherently necessitated by the claim language.
(57) The foregoing description of embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.