SYSTEM FOR DETECTING FACE LIVELINESS IN AN IMAGE

Abstract

The present invention discloses a liveliness detection technique for identifying facial attributes. The technique identifies whether the face presented in an image is real or deceptive. The system and method include identifying the facial attributes and utilizing a multi-task learning network. The neural network includes segmentation and classification functionalities. The final output is used to obtain pixel-level semantic information and high-level semantic information.

Claims

1. An image detection system for detecting liveliness in an image, wherein the image detection system comprises: a face detection module, wherein the face detection module comprises: a processing unit, wherein the processing unit processes the image to identify a region of interest; a scaling unit, wherein the scaling unit scales the region of interest to identify one or more facial features to generate a first image; an annotation unit, wherein the annotation unit annotates a first color from a plurality of defined colors to a face, a second color from the plurality of defined colors to a foreground region and a third color from the plurality of defined colors to a background region of the first image to generate a second image; and a segmentation module, wherein the segmentation module comprises: an extractor unit, wherein the extractor unit extracts a number of pixels within the second image based on the plurality of defined colors to generate a third image; and a counter unit, wherein the counter unit counts live pixels and spoofed pixels in the third image to detect liveliness in the image.

2. A liveliness detection system for detecting liveliness in an image, wherein the liveliness detection system comprises: a face detection module, wherein the face detection module comprises: a processing unit, wherein the processing unit processes the image to identify a region of interest; a scaling unit, wherein the scaling unit scales the region of interest to identify one or more facial features to generate a first image; an annotation unit, wherein the annotation unit annotates a first color from a plurality of defined colors to a foreground region and a second color from the plurality of defined colors to a background region of the first image to generate a second image; and a segmentation module, wherein the segmentation module comprises: an extractor unit, wherein the extractor unit extracts a number of pixels within the second image based on the plurality of defined colors to generate a third image; and a counter unit, wherein the counter unit counts live pixels and spoofed pixels in the third image to detect liveliness in the image; a classification module, wherein the classification module extracts high level feature information on the second image and generates a third image, further wherein the classification module is configured to estimate a probability whether the features belong to the live class; and a fusing module, wherein the fusing module fuses the second image with the third image to form a final image for detecting liveliness of the image.

3. The image detection system in accordance with claim 1, wherein the annotations are on the face, foreground and background.

4. The image detection system in accordance with claim 3, wherein the annotations are on the basis of real face, fake face, real foreground, fake foreground, real background and fake background.

5. The image detection system in accordance with claim 3, wherein the annotations are colored.

6. The image detection system in accordance with claim 5, wherein the annotation unit allocates the color on the basis of liveness of the foreground region and the background region.

7. The image detection system in accordance with claim 1, wherein the annotation unit produces a feature map of the images.

8. The image detection system in accordance with claim 2, wherein the classification module is part of a backbone network that is in a block architecture.

9. The image detection system in accordance with claim 2, wherein the backbone network is a cascade of a convolution layer with a pooling layer and an activation layer.

10. The image detection system in accordance with claim 2, wherein the backbone network is in a block architecture.

11. The image detection system in accordance with claim 1, wherein the convolution layer and the activation layer identify features of each block.

12. The image detection system in accordance with claim 1, wherein the segmentation module includes a decode head to get a feature representation which carries local and global context information.

13. A method for detecting face liveliness in an input image utilizing multi-task learning by performing the steps of: obtaining, by a face detection module, an input image, identifying a region of interest (ROI), and performing scaling on the image for appropriate feature extraction; annotating the image detected by the face detection module to generate a feature map according to the annotation; extracting pixel-level semantic information and counting the number of spoofed and live pixels to generate a first result; extracting high-level semantic information by calculating the probability that the features belong to spoofed or live to generate a second result; and combining the first result with the second result to determine the liveliness of the image.

Description

BRIEF DESCRIPTION OF FIGURES

[0032] The objects and features of the present invention will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only typical embodiments of the invention and are, therefore, not to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

[0033] FIG. 1A illustrates an image detection system in a first case scenario in accordance with the present invention;

[0034] FIG. 1B illustrates the image detection system in a second case scenario in accordance with the present invention;

[0035] FIG. 2 illustrates a face detection module of the image detection system in accordance with the present invention;

[0036] FIG. 3 illustrates a segmentation module of the image detection system in accordance with the present invention;

[0037] FIG. 4 illustrates an overview of a segmentation guided classification face anti-spoofing network in accordance with the present invention;

[0038] FIG. 5 illustrates annotating the live and spoof images with four labels at pixel level in accordance with the present invention;

[0039] FIG. 6 illustrates a data pre-processing for multi-scale generation in accordance with the present invention;

[0040] FIG. 7 illustrates a method of detecting liveliness in the image in accordance with the present invention; and

[0041] FIG. 8 illustrates a cascaded result fusion flow chart in accordance with the present invention.

DETAILED DESCRIPTION OF FIGURES

[0042] Face liveness detection is an important task in computer vision, which aims to help a facial interaction system determine whether the presented face is real or deceptive. With the successful deployment of phone unlocking, access control and electronic wallet payment, facial interaction systems have become an indispensable part of the real world. However, these facial interaction systems are exposed to a major threat. Imagine a situation where an attacker who owns your photos or videos can unlock your phone and can even use your electronic wallet to pay. For this reason, face liveness detection has become an important technology to protect our privacy and property from illegal use by others.

[0043] Generally, illegal attacks mainly rely on printed photographs, screen images or videos, ultra-realistic face masks or a 3-D model of an authorized client. Among these types of attack, the most flexible uses printed photos or screen images captured from the Internet.

[0044] FIG. 1A illustrates an image detection system for detecting liveliness in an image in a first case scenario in accordance with the present invention. The image detection system 100 includes a face detection module 200 and a segmentation module 300. The face detection module 200 comprises a processing unit, a scaling unit and an annotation unit.

[0045] The processing unit is configured to process the image for identifying a region of interest. The scaling unit performs scaling of the region of interest in the image to identify one or more facial features and generate a first image.

[0046] The annotation unit is configured to annotate an individual color from a number of defined colors. A first color is annotated to a face in the image, a second color to a foreground region and a third color to a background region of the first image to generate a second image. The annotation unit allocates the color on the basis of liveness of the foreground region and the background region.

[0047] The segmentation module 300 extracts pixel-level semantic information from the first image and generates a second image, where the segmentation module is configured to count the number of live and spoofed pixels in the first image. If the segmentation module fails to detect the liveliness of the image, a classification module is used.

[0048] Moreover, the face detection module (200) performs face detection and generates a bounding box. The segmentation module (300) extracts feature representations from the image. The feature representation carries local and global context information. Also, the per-pixel prediction is obtained from the last convolution layer in the segmentation head.

[0049] In the segmentation guided classification face anti-spoofing network, the backbone, which is a cascaded convolution layer with a pooling layer and an activation layer, is used to get the features of each block. Then the decode head (also called the segmentation head) is applied to get the final feature representation, which carries local and global context information. The final per-pixel prediction is obtained from the last convolution layer in the decode head. The feature map is also fed into the classification head, which consists of one convolution layer and a fully connected layer, to get the final feature representation and the final probability for spoof/live with a soft-max function.

[0050] FIG. 1B illustrates an image detection system for detecting liveliness in an image in a second case scenario in accordance with the present invention. The segmentation guided classification image detection system (100b) includes a face detection module (200), a segmentation module (300), a classification module (400) and a fusing module (500). The face detection module (200) performs face detection and generates a bounding box.

[0051] The segmentation module (300) extracts feature representations from the image which carry local and global context information. Also, the per-pixel prediction is obtained from the last convolution layer in the segmentation head. The classification module (400) is a layered architecture having one convolution layer and a fully connected layer. The feature map is fed into the classification module (400) to get the final feature representation and the final probability of spoof/live with a soft-max function. The fusing module (500) first verifies the live/spoof probability from the segmentation module, and the live/spoof probability from the classification module is used for a second verification.

[0052] The classification unit extracts high-level feature information from the second image and generates a third image, where the classification unit is configured to estimate the probability that the features belong to live or spoofed. The fusion unit is configured to fuse the second image with the third image for detecting liveliness of the image.

[0053] In recent decades, research on attribute-based representations of objects, faces, and scenes has attracted widespread attention as a supplement to classification representations. However, few works try to use semantic information in face anti-spoofing. In fact, for face anti-spoofing, additional semantic information can be used to characterize the target image through attributes instead of assigning it to a single category, that is, live or spoof. The present invention provides a multi-task learning network, named the segmentation guided classification face anti-spoofing network, as shown in FIG. 4.

[0054] The backbone is an amalgamation of a cascaded convolution layer with a pooling layer and an activation layer. A segmentation head, also called a decode head, decodes the facial feature representations block by block, and the final feature representation containing the per-pixel prediction is obtained from the last convolution layer in the segmentation or decode head.

[0055] Depending upon the requirements, the obtained feature map can be further analyzed by a classification head, which is a layered architecture containing, as discussed, one convolution layer and a fully connected layer. The classification head gives a more detailed probabilistic estimate for the image, and a soft-max function is used to get the final probabilities for spoofed and live.
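By way of illustration only, the soft-max step at the end of the classification head may be sketched in Python as follows. The function name `softmax`, the ordering of the two scores as [spoof, live], and the example values are assumptions made for this sketch and are not taken from the disclosure.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of raw scores."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the fully connected layer, ordered [spoof, live].
spoof_prob, live_prob = softmax([0.3, 1.7])
```

The two returned values sum to one, so either can be read directly as the spoof or live probability.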

[0056] In order to get accurate semantic information for each pixel in the segmentation network, the live and spoof images are annotated with four labels at pixel level. The yellow color denotes fake face, the green color denotes fake foreground and the blue color denotes fake background. The red color denotes real face, the purple color denotes real foreground and the black color denotes real background.

[0057] In order to pay more attention to the face area, the input image is first processed with a general face detection model to get the face bounding box, which is then rescaled to a larger one to get more features. The edge information is contained in the rescaled bounding box.

[0058] The network outputs two results during inference. One is the output of the segmentation task, which segments each pixel of an input image into 1) real face/foreground pixel, 2) fake face/foreground pixel, 3) real background pixel and 4) fake background pixel. The numbers of real-face pixels and fake-face pixels are counted and used to calculate the probability for a real face. The other result, the live/spoof probability, is the output of the classification task. Formula (1) is applied to get the final prediction result.
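By way of illustration only, the color annotation scheme may be represented as a lookup table. The sketch below is in Python; the names `COLOR_MEANING` and `segmentation_class` are hypothetical. Grouping face and foreground into one class, so that six colors yield four labels, follows the four pixel classes listed for the segmentation task and is an interpretive assumption rather than a statement of the disclosed implementation.

```python
# Meaning of each annotation color as stated in the description:
# (liveness, region).
COLOR_MEANING = {
    "yellow": ("fake", "face"),
    "green": ("fake", "foreground"),
    "blue": ("fake", "background"),
    "red": ("real", "face"),
    "purple": ("real", "foreground"),
    "black": ("real", "background"),
}


def segmentation_class(color):
    """Map an annotation color to one of four pixel labels (assumed grouping)."""
    liveness, region = COLOR_MEANING[color]
    group = "background" if region == "background" else "face/foreground"
    return f"{liveness} {group}"


# The six colors collapse into four distinct labels under this assumption.
labels = sorted({segmentation_class(color) for color in COLOR_MEANING})
```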

$$P_{1} = \alpha \frac{\sum P_{s1}}{\sum P_{s1} + \sum P_{ss}} + \beta P_{c1} \qquad (1)$$

[0059] In formula (1), $P_{1}$ denotes the probability of the final result belonging to the “live” class; $\sum P_{s1}$ and $\sum P_{ss}$ denote the total numbers of real-face and fake-face pixels, respectively, in the output segmentation map; and $P_{c1}$ denotes the probability of the “live” class from the classification head. $\alpha$ and $\beta$ denote the weights for the segmentation results and the classification results, respectively (default $\alpha = \beta = 0.5$).
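By way of illustration only, formula (1) may be evaluated as in the following Python sketch. The function and parameter names are hypothetical, and raising an error when the segmentation map contains no face pixels is an assumption, since the description does not address that case.

```python
def fuse_live_probability(live_pixels, spoof_pixels, p_cls_live,
                          alpha=0.5, beta=0.5):
    """Weighted sum of the segmentation pixel ratio and the classification
    probability, following formula (1)."""
    face_pixels = live_pixels + spoof_pixels
    if face_pixels == 0:
        # Assumption: an empty face region is treated as an error.
        raise ValueError("segmentation map contains no face pixels")
    seg_ratio = live_pixels / face_pixels
    return alpha * seg_ratio + beta * p_cls_live


# 750 real-face pixels, 250 fake-face pixels, classification head says 0.5:
# 0.5 * 0.75 + 0.5 * 0.5 = 0.625
p_live = fuse_live_probability(750, 250, 0.5)
```

With the default weights the two tasks contribute equally; raising alpha relative to beta would make the pixel count dominate the final result.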

[0060] The probability of live or spoof from the segmentation network is verified first. If it meets the condition, the final result is output; otherwise, the probability of live or spoof from the classification network is used for a second verification. The fusion module fuses the output from the segmentation module and the classification module to detect the liveliness of the image.

[0061] FIG. 2 illustrates a face detection module of the image detection system in accordance with the present invention. The rescaled bounding box is obtained from a general face detection module 200. Moreover, information from both the face part and the area outside the face is obtained.

[0062] The face detection module includes a processing unit 202, a scaling unit 204 and an annotating unit 206. The processing unit 202 processes the input images in order to generate a bounding box. Once the bounding box is generated, the scaling unit 204 performs attention scaling to rescale the image to a larger one to get more features.

[0063] In order to get accurate semantic information for each pixel an annotation unit (206) is utilized in the system. The annotation unit annotates the face, the foreground region and the background region individually with different colors.

[0064] A first color is annotated to a face in the image, a second color to a foreground region and a third color to a background region of the first image to generate a second image. The annotation unit allocates the color on the basis of liveness of the foreground region and the background region.

[0065] In order to get accurate semantic information for each pixel, the annotation unit annotates the live and spoof images with four labels at pixel level. The yellow color denotes fake face, the green color denotes fake foreground and the blue color denotes fake background. In addition, the red color denotes real face, the purple color denotes real foreground and the black color denotes real background.

[0066] FIG. 3 illustrates internal components of segmentation module 300. The segmentation module segments each pixel of an input image into 1) real face/foreground pixel, 2) fake face or foreground pixel, 3) real background pixel and 4) fake background pixel.

[0067] The segmentation module includes an extractor unit 302 and a counter unit 304. The extractor unit 302 extracts features from each block of the feature map. The pixel level semantic information is extracted by the segmentation module 300. The counter unit 304 counts the number of spoof and live pixels respectively to predict the recognition result to detect the liveliness of the image.
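By way of illustration only, the counting performed by the counter unit may be sketched in Python as follows. The integer class ids, the function names and the small example map are hypothetical. Returning 0.0 when no face pixels are present is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical class ids for the four pixel labels.
REAL_FACE, FAKE_FACE, REAL_BACKGROUND, FAKE_BACKGROUND = 0, 1, 2, 3


def count_pixels(seg_map):
    """Return (live, spoof) face-pixel counts from a 2-D list of class ids."""
    counts = Counter(label for row in seg_map for label in row)
    return counts[REAL_FACE], counts[FAKE_FACE]


def live_ratio(seg_map):
    """Fraction of face pixels predicted as live."""
    live, spoof = count_pixels(seg_map)
    total = live + spoof
    # Assumption: with no face pixels the ratio is reported as 0.0.
    return live / total if total else 0.0


example_map = [
    [2, 2, 0, 0],
    [2, 0, 0, 1],
    [2, 0, 0, 2],
]
live, spoof = count_pixels(example_map)  # 6 live, 1 spoof
```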

[0068] FIG. 4 illustrates an overview of the segmentation guided classification face anti-spoofing network in accordance with the present invention. An input image (402) is presented to the network. The network includes a backbone (404), which is a cascaded convolution layer with a pooling layer (412) and an activation layer and is used to get the features of each block. Then the decode head (also called the segmentation head) is applied to get the final feature representation, which carries local and global context information.

[0069] The features from each block are applied to get the final feature representation. The per-pixel prediction is obtained from the last convolution layer in the segmentation or decode head. The feature map thus obtained is fed into the classification head (406), which consists of one convolution layer and a fully connected layer, to get the final feature representation and the probability for spoof/live with a soft-max function.

[0070] FIG. 5 illustrates annotating the live and spoof images with four labels at pixel level in accordance with the invention. In order to get accurate semantic information for each pixel in the segmentation module, the live and spoof images are annotated with four labels at pixel level.

[0071] In the left ground truth image (502) and left feature map (504) of the figure, the yellow color denotes fake face, the green color denotes fake foreground and the blue color denotes fake background. In the right ground truth image (506) and right feature map (508), the red color denotes real face, the purple color denotes real foreground and the black color denotes real background.

[0072] FIG. 6 illustrates data pre-processing for multi-scale generation in accordance with the invention. In order to pay more attention to the face area, the input image is processed with a general face detection model to get the face bounding box. The edge information is contained in the rescaled bounding box.

[0073] The input image (602) is presented to the face detection module (600) to get the face bounding box (604). Attention scaling (606) is then performed to rescale the bounding box to a larger or smaller one (608) to get more features.
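By way of illustration only, rescaling a face bounding box to larger or smaller sizes may be sketched in Python as follows. Scaling about the box center, clamping to the image boundary, the (left, top, right, bottom) box layout and the scale factors 0.8, 1.0 and 1.5 are all assumptions made for this sketch; the description does not specify them.

```python
def rescale_box(box, scale, image_width, image_height):
    """Scale a (left, top, right, bottom) box about its center and clamp it
    to the image boundary."""
    left, top, right, bottom = box
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    return (
        max(0, round(center_x - half_w)),
        max(0, round(center_y - half_h)),
        min(image_width, round(center_x + half_w)),
        min(image_height, round(center_y + half_h)),
    )


def multi_scale_boxes(box, scales, image_width, image_height):
    """Produce one rescaled box per scale factor."""
    return [rescale_box(box, s, image_width, image_height) for s in scales]


# Hypothetical 100x100 face box in a 640x480 image.
boxes = multi_scale_boxes((100, 100, 200, 200), [0.8, 1.0, 1.5], 640, 480)
```

A scale above 1.0 pulls in context around the face, such as the edge of a printed photo or screen, while a scale below 1.0 concentrates on the face itself.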

[0074] FIG. 7 illustrates a method of detecting liveliness in the image in accordance with the invention. At step (702), the input image is presented to the face detection module. The face detection module identifies the region of interest by creating a bounding box around the face in the image, and the scaling unit rescales the image to a larger one to get more features.

[0075] At step (704), the image detected by the face detection module is annotated to generate a feature map. The feature map is generated according to the annotation, where the yellow color denotes fake face, the green color denotes fake foreground, the blue color denotes fake background, the red color denotes real face, the purple color denotes real foreground and the black color denotes real background.

[0076] Next, at step (706), the segmentation module extracts pixel-level semantic information from the feature map. The extraction of pixels from the feature map is performed by the extractor unit, and the counter unit counts the number of spoofed and live pixels to generate the first result. In this case the classification module is not utilized.

[0077] If high-level semantic information is required, the classification module is used. At step (708), the probability that the features belong to spoofed or live is calculated to generate a second result.

[0078] Finally, at step (710), the fusing module combines the first result with the second result to detect the liveliness of the image.

[0079] FIG. 8 illustrates a cascaded result fusion flow chart in accordance with the invention. An input image (802) is encoded by an encoder (804) and then presented to a segmentation decoder (806) and a classification decoder (816). Live or spoof probability 1 (808) is calculated and first-stage verification (810) is performed. If the result is satisfactory, this output becomes the final result; otherwise, the classification decoder (816) calculates live or spoof probability 2 (812). The results achieved from the segmentation task are combined with those of the classification task to get the final result (814).

[0080] In some embodiments only segmentation-level information is required, and in some embodiments the results of the segmentation module are used as an auxiliary to obtain the final results.
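By way of illustration only, the cascaded flow of FIG. 8 may be sketched in Python as follows. The description does not state what makes the first-stage result satisfactory; this sketch assumes a confidence band, with hypothetical thresholds `low` and `high`, and assumes the second stage combines both probabilities with the weights of formula (1). All names are hypothetical.

```python
def cascaded_decision(p_seg_live, classify, low=0.2, high=0.8,
                      alpha=0.5, beta=0.5, threshold=0.5):
    """Two-stage decision.

    p_seg_live: live probability 1, from the segmentation decoder.
    classify:   callable returning live probability 2; it is only invoked
                when the first stage is inconclusive.
    """
    # First-stage verification (assumed rule: confident either way).
    if p_seg_live >= high:
        return "live"
    if p_seg_live <= low:
        return "spoof"
    # Second stage: obtain probability 2 and combine both results.
    p_cls_live = classify()
    fused = alpha * p_seg_live + beta * p_cls_live
    return "live" if fused >= threshold else "spoof"


# Hypothetical classifier stub that records how often it is called.
calls = []


def stub_classifier():
    calls.append(1)
    return 0.9


first = cascaded_decision(0.95, stub_classifier)   # decided at first stage
second = cascaded_decision(0.5, stub_classifier)   # falls through to stage two
```

Passing the classifier as a callable reflects the cascade: the classification decoder only runs when the segmentation result alone does not settle the decision.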

[0081] While the various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not of limitation. Likewise, the figure may depict an example architectural or other configuration for the invention, which is done to aid in understanding the features and functionality that can be included in the invention. The invention is not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architecture and configurations.

[0082] Although the invention is described above in terms of various exemplary embodiments and implementations, it should be understood that the various features, aspects, and functionality described in one or more of the individual embodiments are not limited in their applicability to the particular embodiment with which they are described, but instead can be applied, alone or in various combinations, to one or more of the other embodiments of the invention, whether or not such embodiments are described and whether or not such features are presented as being a part of a described embodiment. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments.

[0083] The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

CPC classification: G06V40/45; G06V10/82; G06V20/20; G06V10/26; G06V10/25; G06F18/25; G06V40/161 (section G, PHYSICS)

International classification: G06V40/40 (section G, PHYSICS)
