Supervised facial recognition system and method
10192142 · 2019-01-29
Assignee
Inventors
CPC classification
G06V10/76 (PHYSICS)
G06F18/2134 (PHYSICS)
International classification
Abstract
A computer executed method for supervised facial recognition comprising the operations of preprocessing, feature extraction and recognition. Preprocessing may comprise dividing received face images into several subimages, converting the different face image (or subimage) dimensions into a common dimension and/or converting the datatypes of all of the face images (or subimages) into an appropriate datatype. In feature extraction, 2D DMWT is used to extract information from the face images. Application of the 2D DMWT may be followed by FastICA. FastICA, or, in cases where FastICA is not used, 2D DMWT, may be followed by application of the l.sub.2-norm and/or eigendecomposition to obtain discriminating and independent features. The resulting independent features are fed into the recognition phase, which may use a neural network, to identify an unknown face image.
Claims
1. A computer executed method for facial recognition comprising: receiving a face image; performing preprocessing on the face image; applying a 2D DMWT to the preprocessed face image to obtain a resultant image matrix for the face image, the resultant image matrix having a plurality of subimages; converting each of the subimages into a vector; combining the vectors for each of the subimages to create a feature matrix; applying 2D FastICA to the feature matrix to obtain a plurality of independent subimages; converting the plurality of independent subimages into two-dimensional form; determining a resultant feature vector using the plurality of two-dimensional independent subimages; and performing recognition of the resultant feature vector.
2. The method of claim 1, wherein the operation of performing preprocessing on the face image comprises: converting an image dimension of the face image to a common dimension; and converting the face image from a first datatype to a second datatype.
3. The method of claim 2, wherein the common dimension is of size N×N and wherein N is a power of two.
4. The method of claim 2, wherein the second datatype is a double datatype.
5. The method of claim 1, wherein the plurality of subimages is four subimages and the plurality of independent subimages is four independent subimages.
6. The method of claim 1, wherein the operation of determining the resultant feature vector using the plurality of two-dimensional independent subimages comprises: determining an eigenvalue for each of the two-dimensional independent subimages; combining the eigenvalues to obtain a resultant feature matrix; and converting the resultant feature matrix into a resultant feature vector.
7. The method of claim 1, wherein the operation of determining the resultant feature vector using the plurality of two-dimensional independent subimages comprises: determining an eigenvector for each of the plurality of two-dimensional independent subimages; converting each eigenvector into a feature vector; combining the feature vectors to create a resultant feature matrix; determining an l.sub.2-norm for each row in the resultant feature matrix; and converting the resultant feature matrix to the resultant feature vector using the l.sub.2-norms.
8. The method of claim 1, wherein the operation of performing recognition of the resultant feature vector comprises: identifying the face image by using the resultant feature vector as input to a neural network.
9. A computer executed method for facial recognition comprising: receiving a face image; performing preprocessing on the face image, wherein said step of preprocessing comprises dividing the face image into subimages, converting each of said subimages to a common dimension, and converting the face image to a datatype suitable for transform using 2D DMWT; applying a 2D DMWT to the preprocessed face image to obtain a resultant image matrix for the face image, the resultant image matrix having a plurality of subimages; determining a resultant feature vector using the plurality of subimages; and performing recognition of the resultant feature vector.
10. The method of claim 9, wherein the operation of performing preprocessing on the face image comprises: converting an image dimension of the face image to a common dimension; and converting the face image from a first datatype to a second datatype.
11. The method of claim 10, wherein the common dimension is of size N×N and wherein N is a power of two.
12. The method of claim 10, wherein the second datatype is a double datatype.
13. The method of claim 9, wherein the plurality of subimages is four subimages.
14. The method of claim 9, wherein the operation of determining the resultant feature vector using the plurality of subimages comprises: determining an eigenvalue for each of the subimages; combining the eigenvalues to obtain a resultant feature matrix; and converting the resultant feature matrix into a resultant feature vector.
15. The method of claim 9, wherein the operation of determining the resultant feature vector using the plurality of subimages comprises: determining an eigenvector for each of the subimages; converting each eigenvector into a feature vector; combining the feature vectors to create a resultant feature matrix; determining an l.sub.2-norm for each row in the resultant feature matrix; and converting the resultant feature matrix to the resultant feature vector using the l.sub.2-norms.
16. The method of claim 9, wherein the operation of performing recognition of the resultant feature vector comprises: identifying the face image by using the resultant feature vector as input to a neural network.
17. A computer executed method for facial recognition comprising: receiving a face image; performing preprocessing on the face image to obtain a plurality of subimages resulting in a face image of dimension N×N; applying a 2D DMWT to the plurality of preprocessed subimages to obtain a resultant subimage matrix for each subimage, each resultant image matrix having a plurality of sub-subimages; converting each of the sub-subimages into a vector; combining the vectors to create a pose matrix; determining a resultant feature vector using the pose matrix; and performing recognition of the resultant feature vector.
18. The method of claim 17, wherein the operation of performing preprocessing on the face image to obtain a plurality of subimages comprises: dividing the face image into the plurality of subimages, wherein the plurality of subimages have an original dimension; converting the original dimension of the plurality of subimages to a common dimension; and converting the plurality of subimages from a first datatype to a second datatype.
19. The method of claim 18, wherein the common dimension is of size N×N and wherein N is a power of two.
20. The method of claim 18, wherein the second datatype is a double datatype.
21. The method of claim 17, wherein the plurality of subimages is four subimages and the plurality of sub-subimages is four sub-subimages.
22. The method of claim 17, wherein the operation of determining the resultant feature vector using the pose matrix comprises: applying 2D FastICA to the pose matrix to obtain a FastICA matrix; determining an l.sub.2-norm for each row in the FastICA matrix; and converting the FastICA matrix to the resultant feature vector using the l.sub.2-norms.
23. The method of claim 22, wherein the FastICA matrix is a FastICA signal matrix.
24. The method of claim 22, wherein the FastICA matrix is a mixing matrix of a FastICA signal matrix.
25. The method of claim 22, wherein the FastICA matrix is a feature matrix of a FastICA signal matrix.
26. The method of claim 17, wherein the operation of determining the resultant feature vector using the pose matrix comprises: determining an l.sub.2-norm for each row in the pose matrix; and converting the pose matrix to the resultant feature vector using the l.sub.2-norms.
27. The method of claim 17, wherein the operation of performing recognition of the resultant feature vector comprises: identifying the face image by using the resultant feature vector as input to a neural network.
28. A computer executed method for facial recognition comprising: receiving a face image; performing preprocessing on the face image to obtain a plurality of subimages, resulting in a face image of N×N dimension; applying a 2D DMWT to the plurality of preprocessed subimages to obtain a resultant subimage matrix for each subimage, the resultant image matrix having a plurality of sub-subimages that correspond to each of the plurality of subimages; converting each of the plurality of sub-subimages into a plurality of vectors, wherein each vector corresponds to one of the plurality of sub-subimages; combining the vectors that correspond to each of the subimages to create a plurality of pose matrices, wherein each pose matrix corresponds to one of the subimages; determining a plurality of resultant feature vectors using the plurality of pose matrices; and performing recognition of the plurality of resultant feature vectors.
29. The method of claim 28, wherein the operation of performing preprocessing on the face image to obtain a plurality of subimages comprises: dividing the face image into the plurality of subimages, wherein the plurality of subimages have an original dimension; converting the original dimension of the plurality of subimages to a common dimension; and converting the plurality of subimages from a first datatype to a second datatype.
30. The method of claim 29, wherein the common dimension is of size N×N and wherein N is a power of two.
31. The method of claim 29, wherein the second datatype is a double datatype.
32. The method of claim 28, wherein the plurality of subimages is four subimages and the plurality of sub-subimages is four sub-subimages.
33. The method of claim 28, wherein the operation of determining the plurality of resultant feature vectors using the plurality of pose matrices comprises: applying 2D FastICA to the pose matrices to obtain a plurality of FastICA matrices, wherein each FastICA matrix corresponds to one of the pose matrices; determining an l.sub.2-norm for each row in each of the plurality of FastICA matrices; and converting the FastICA matrices to the plurality of resultant feature vectors using the l.sub.2-norms.
34. The method of claim 33, wherein the plurality of FastICA matrices are FastICA signal matrices.
35. The method of claim 33, wherein the plurality of FastICA matrices are mixing matrices from FastICA signal matrices.
36. The method of claim 33, wherein the plurality of FastICA matrices are feature matrices from FastICA signal matrices.
37. The method of claim 28, wherein the operation of determining the plurality of resultant feature vectors using the plurality of pose matrices comprises: determining an l.sub.2-norm for each row in each of the plurality of pose matrices; and converting the pose matrices to the plurality of resultant feature vectors using the l.sub.2-norms.
38. The method of claim 28, wherein the operation of performing recognition of the plurality of resultant feature vectors comprises: identifying the face image by using the plurality of resultant feature vectors as input to a neural network.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(23) A detailed description of the embodiments for methods and systems of supervised-learning facial recognition using multiresolution and independent features will now be presented with reference to the accompanying drawings.
(25) Electronic device 107 may include or be in communication with database 110. During training and testing of embodiments of the facial recognition method, face images 109 may come from database 110.
(26) Electronic device 107 may be a mobile computing device. Electronic device 107 may comprise a user interface for providing output and/or receiving input. Electronic device 107 may comprise an output device, such as a display, which is coupled to a processor. The user input interface, which allows electronic device 107 to receive data, may comprise one or more devices such as a keypad, a touch display (for example, if the display comprises touch capability), and/or the like.
(27) Electronic device 107 may comprise a memory device including, in one embodiment, volatile memory, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. Electronic device 107 may also comprise other memory, for example, non-volatile memory, which may be embedded and/or may be removable. The non-volatile memory may comprise an EEPROM, flash memory or the like. The memories may store any of a number of pieces of information, and data. The information and data may be used by electronic device 107 to implement one or more functions of the electronic device.
(28) Electronic device 107 can be connected by conventional access hardware to the sensor via a wired or wireless connection. Electronic device 107 can be connected by conventional access hardware to the Internet. Electronic device 107 and sensor 108 may be in bi-directional communication with each other via the Internet.
(29) Electronic device 107 of an exemplary embodiment need not be the entire electronic device, but may be a component or group of components of the electronic device in other exemplary embodiments. Electronic device 107 may comprise a processor or other processing circuitry. As used in this application, the term circuitry refers to at least all of the following: (a) hardware-only implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of circuits and software and/or firmware, such as a combination of processors or portions of processors/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a computer, to perform various functions; and (c) circuits, such as a microprocessor(s) or portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry would also cover an implementation of merely a processor, multiple processors, or portion of a processor and its (or their) accompanying software and/or firmware.
(30) Further, the processor(s) may comprise functionality to operate one or more software programs, which may be stored in memory and which may, among other things, cause the processor to implement at least one embodiment including, for example, one or more of the functions or operations described herein.
(31) Embodiments of the facial recognition method comprise three phases: preprocessing, feature extraction and recognition. In embodiments, these phases are performed in sequence, as shown in the accompanying flowcharts.
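The preprocessing phase described above (converting each face image to a common N×N dimension, with N a power of two, and converting the datatype to double) can be sketched as follows. This is a minimal illustration only: the nearest-neighbor resampling and the 92×112 stand-in image are assumptions, not details from the patent.

```python
import numpy as np

def preprocess(img, n=128):
    # resize to a common n x n dimension (n a power of two) by
    # nearest-neighbor sampling, then convert to a double datatype
    r = np.linspace(0, img.shape[0] - 1, n).round().astype(int)
    c = np.linspace(0, img.shape[1] - 1, n).round().astype(int)
    return img[np.ix_(r, c)].astype(np.float64)

# stand-in 92 x 112 8-bit face image (placeholder data)
face = (np.arange(92 * 112).reshape(92, 112) % 255).astype(np.uint8)
out = preprocess(face)
```

After this step every image (or subimage) has the same dimension and a datatype suitable for the 2D DMWT.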
(32) Embodiments of the facial recognition methods were trained and tested using the ORL, YALE and FERET databases, which have a large number of poses and persons and different light conditions, facial expressions and angle rotations. The variation in the number of poses, facial expressions, light conditions and angle rotations of these databases can be seen in the accompanying figures.
(33) A face image has highly redundant information and large dimensions. For feature extraction, techniques can be applied to get an efficient representation of the images by extracting discriminating and independent features. Exemplary methods for feature extraction, including use of the 2D DMWT, FastICA, the l.sub.2-norm and/or eigendecomposition, to obtain discriminating and independent features, are described in further detail below.
(34) The 2D DMWT, based on MRA, can be used for dimensionality reduction, for localizing all of the useful information in one single band, and/or for noise reduction. 2D FastICA can be used for decorrelating the high-order statistics (since most of the important information is contained in the high-order statistics of the image), for reducing the computational complexity, and for improving the convergence rate. ICA features may be less sensitive to the facial variations arising from different facial expressions and different poses. ICA features are independent, which may lead to a better representation and, hence, better identification and recognition rates.
(35) In an embodiment, illustrated in the accompanying flowchart, the 2D DMWT is applied to each preprocessed N×N (e.g. 128×128) face image. The resultant matrix is divided into four subimages of dimension N/2×N/2 (e.g. 64×64), and each one is further divided into four N/4×N/4 (e.g. 32×32) subimages. As can be seen, all of the useful information is localized in the upper left band, which corresponds to the low-low (LL) frequency band of the multiwavelet transform. The LL sub-band is retained, while the remaining sub-bands are eliminated. Therefore, the resultant image matrix in this example is N/2×N/2 (e.g. 64×64). Returning to the flowchart, each of the N/4×N/4 (e.g. 32×32) subimages is converted to a (N/4)²×1 (e.g. 1024×1) vector in operation 16, thereby creating a vector corresponding to each subimage (and four vectors corresponding to each image).
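The flow just described (transform, keep the LL band, split into four subimages, vectorize, stack) can be sketched in NumPy. A single-level 2D Haar average is used here purely as a stand-in for the 2D DMWT; the patent's actual multiwavelet filter bank is not reproduced.

```python
import numpy as np

def haar_ll(img):
    # single-level 2D Haar averaging: keep only the low-low (LL) band,
    # a simple stand-in for the multiwavelet LL band retained by the method
    rows = (img[0::2, :] + img[1::2, :]) / 2.0
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))          # preprocessed N x N image, N = 128

ll = haar_ll(image)                              # resultant image matrix (64 x 64)

# divide the LL band into four 32 x 32 subimages and vectorize each (cf. operation 16)
subimages = [ll[i:i + 32, j:j + 32] for i in (0, 32) for j in (0, 32)]
vectors = [s.reshape(-1, 1) for s in subimages]  # four 1024 x 1 vectors

# combine the four vectors into a 1024 x 4 feature matrix (cf. operation 17)
feature_matrix = np.hstack(vectors)
```

The 1024×4 `feature_matrix` is the input to the FastICA step described next.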
(41) Next, in operation 17, the 1024×1 vectors for each set of four subimages are combined to create a feature matrix, each having dimension (N/4)²×4 (e.g. 1024×4). Each feature matrix corresponds to a set of four N/4×N/4 subimages. 2D FastICA is applied, in operation 18, to each of the feature matrices to obtain a set of four (e.g. 32×1) independent subimages for each feature matrix. An example of the application of 2D FastICA is shown in the accompanying figure.
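A minimal FastICA in the spirit of operation 18 can be sketched in pure NumPy, using symmetric fixed-point iteration with a tanh nonlinearity. This is a generic ICA sketch under stated assumptions, not the patent's specific 2D FastICA formulation; the two-signal demo at the end is illustrative only.

```python
import itertools
import numpy as np

def fastica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    # X: (n_samples, n_features) observed mixtures; returns estimated sources
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    # whiten via eigendecomposition of the covariance matrix
    d, E = np.linalg.eigh(np.cov(X, rowvar=False))
    idx = np.argsort(d)[::-1][:n_components]
    K = E[:, idx] / np.sqrt(d[idx])          # whitening matrix
    Z = X @ K                                # whitened data, unit covariance
    W = rng.normal(size=(n_components, n_components))
    for _ in range(n_iter):
        W_old = W
        G = np.tanh(Z @ W.T)                 # nonlinearity g(w^T z)
        Gp = 1.0 - G ** 2                    # its derivative
        W = (G.T @ Z) / len(Z) - np.diag(Gp.mean(axis=0)) @ W
        u, _, vt = np.linalg.svd(W)          # symmetric decorrelation
        W = u @ vt
        if np.max(np.abs(np.abs(np.diag(W @ W_old.T)) - 1.0)) < tol:
            break
    return Z @ W.T                           # estimated independent components

# demo: unmix two artificially mixed independent signals
rng = np.random.default_rng(1)
S = np.column_stack([np.sign(np.sin(np.linspace(0, 40, 2000))),
                     rng.uniform(-1, 1, 2000)])
X = S @ np.array([[1.0, 0.5], [0.4, 1.0]]).T   # observed mixtures
Y = fastica(X, n_components=2)

def match_quality(S, Y):
    # best correlation match over permutations (ICA recovers sources
    # only up to order, sign and scale)
    c = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
    return max(min(c(S[:, i], Y[:, p[i]]) for i in range(S.shape[1]))
               for p in itertools.permutations(range(Y.shape[1])))
```

In practice a library implementation such as scikit-learn's `FastICA` would typically be used instead of hand-rolled code.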
(47) Next, new features are extracted from the original features. In operation 20, the eigenvalues are determined for each of the independent subimages. The eigenvalues for each set of four independent subimages are then combined in operation 21 to obtain a resultant feature matrix having dimension (N/4)×4 (e.g. 32×4). Then, in operation 22, each of the resultant feature matrices is converted into one-dimensional form as an N×1 (e.g. 128×1) resultant feature vector. Recognition is then performed, in operation 23, using the resultant feature vectors.
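Operations 20 through 22 (eigenvalues of each independent subimage, a 32×4 resultant matrix, then a 128×1 vector) reduce to a few lines of NumPy. The random subimages and the choice to keep only the real parts of the eigenvalues are illustrative assumptions, since the patent does not specify how complex eigenvalues are handled.

```python
import numpy as np

rng = np.random.default_rng(0)
# four illustrative 32 x 32 independent subimages (placeholder data)
subimages = [rng.standard_normal((32, 32)) for _ in range(4)]

# cf. operations 20-21: eigenvalues of each subimage, stacked column-wise -> 32 x 4
# (.real is an assumption: eigenvalues of a general real matrix may be complex)
eigvals = np.column_stack([np.linalg.eigvals(s).real for s in subimages])

# cf. operation 22: flatten the 32 x 4 matrix, one subimage's eigenvalues
# after another, into a 128 x 1 resultant feature vector
feature_vector = eigvals.reshape(-1, 1, order="F")
```

The 128×1 vector is what gets fed to the recognition phase.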
(49) In the recognition phase, one of several methods can be used to identify the unknown image including, for example, by measuring the Euclidean distance, which may be done using a neural network toolbox (NNT) based classifier for training and testing. Training and testing can also be done using the Back Propagation Training Algorithm (BPTA), Radial Basis Function or Kohonen Self-Organizing Network based classifiers.
(50) Alternatively, as shown in the accompanying flowchart, each of the N/4×N/4 (e.g. 32×32) subimages is converted to a (N/4)²×1 (e.g. 1024×1) vector in operation 116, thereby creating a vector corresponding to each subimage (and four vectors corresponding to each image).
(53) Next, in operation 117, the 1024×1 vectors for each set of four subimages are combined to create a feature matrix, each having dimension (N/4)²×4 (e.g. 1024×4). Each feature matrix corresponds to a set of four N/4×N/4 subimages. 2D FastICA is applied, in operation 118, to each of the feature matrices to obtain a set of four (e.g. 32×1) independent subimages for each feature matrix. As described above, an example of the application of 2D FastICA is shown in the accompanying figure.
(59) Next, new features are extracted from the original features. In operation 124, eigenvectors for each of the independent subimages are determined. The eigenvector matrices have dimension N/4×N/4 (e.g. 32×32). Then, in operation 125, each eigenvector matrix is converted into a feature vector having dimension (N/4)²×1 (e.g. 1024×1). The feature vectors for each set of four independent subimages are then combined, in operation 126, to create a resultant feature matrix having dimension (N/4)²×4 (e.g. 1024×4). In operation 127, the l.sub.2-norm is determined for each row in each of the resultant feature matrices and used, in operation 128, to convert each resultant feature matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1), thereby reducing the dimensionality and constraining the energy of each image in a column. Recognition is then performed using the resultant feature vectors.
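Operations 124 through 128 can be sketched the same way: eigenvector matrices flattened, stacked, then collapsed by row-wise l.sub.2-norms. The random input subimages are placeholders, and taking real parts of the eigenvectors is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# four illustrative 32 x 32 independent subimages (placeholder data)
subimages = [rng.standard_normal((32, 32)) for _ in range(4)]

# cf. operations 124-125: the 32 x 32 eigenvector matrix of each subimage,
# flattened to a 1024 x 1 feature vector (.real is an assumption)
feat_vecs = [np.linalg.eig(s)[1].real.reshape(-1, 1) for s in subimages]

# cf. operation 126: combine into a 1024 x 4 resultant feature matrix
F = np.hstack(feat_vecs)

# cf. operations 127-128: row-wise l2-norms collapse the matrix
# to a single 1024 x 1 resultant feature vector
feature_vector = np.linalg.norm(F, axis=1, keepdims=True)
```

The row-wise norm concentrates the energy of the four feature vectors into one column, as the text describes.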
(64) In the recognition phase, one of several methods can be used to identify the unknown image including, for example, by measuring the Euclidean distance, which may be done using a neural network toolbox (NNT) based classifier for training and testing. Training and testing can also be done using the Back Propagation Training Algorithm (BPTA), Radial Basis Function or Kohonen Self-Organizing Network based classifiers.
(65) In another embodiment, illustrated in the accompanying flowchart, the resultant feature vectors are determined directly from the 2D DMWT subimages, without applying FastICA.
(66) In operation 45, the eigenvalues are determined for each of the N/4×N/4 (e.g. 32×32) subimages. The eigenvalues for each set of four subimages are then combined in operation 46 to obtain a resultant feature matrix. Then, in operation 47, each of the resultant feature matrices is converted into one-dimensional form as an N×1 (e.g. 128×1) resultant feature vector. Recognition is then performed, in operation 48, using the resultant feature vectors.
(68) Alternatively, as shown in the accompanying flowchart, the eigenvectors of the subimages may be used instead of the eigenvalues.
(69) In operation 49, eigenvectors for each of the N/4×N/4 (e.g. 32×32) subimages are determined. Then, in operation 50, each eigenvector matrix is converted into a feature vector. The feature vectors for each set of four subimages are then combined, in operation 51, to create a resultant feature matrix. In operation 52, the l.sub.2-norm is determined for each row in each of the resultant feature matrices and used, in operation 53, to convert each resultant feature matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1), thereby reducing the dimensionality and constraining the energy of each image in a column. Recognition is then performed, in operation 48, using the resultant feature vectors.
(72) In another exemplary embodiment, as shown in the accompanying figures, the received face image is divided into a plurality of subimages during preprocessing, and features are then extracted from each subimage.
(73) In an embodiment, illustrated in the accompanying flowchart, the 2D DMWT is applied to each of the preprocessed N×N subimages. The resultant matrix for each subimage is divided into four of dimension N/2×N/2 (e.g. 64×64), and each one is further divided into four N/4×N/4 (e.g. 32×32) sub-subimages. As can be seen in the corresponding figure, the useful information is again localized in the LL band, so the resultant subimage matrix is N/2×N/2 (e.g. 64×64). Returning to the flowchart, each of the N/4×N/4 sub-subimages is converted into a vector of (N/4)²×1 (e.g. 1024×1) dimension in operation 75. Next, in operation 76, the 1024×1 vectors that correspond to the subimages in each set of four subimages (for a total of 16 sub-subimages, which represent all of the sub-subimages that correspond to a single original face image) are combined to create a (N/4)²×16 (e.g. 1024×16) pose matrix. Each pose matrix corresponds to a set of four subimages.
(81) 2D FastICA is applied, in operation 77, to each of the 1024×16 pose matrices to obtain a FastICA matrix, which may be, for example, a FastICA signal matrix, a mixing matrix of a FastICA signal matrix or a feature matrix of a FastICA signal matrix. A FastICA matrix is created for each pose matrix. In operation 78, the l.sub.2-norm is determined for each row in each of the FastICA matrices and used, in operation 79, to convert each FastICA matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1), thereby reducing the dimensionality and constraining the energy of each image in a column. Recognition is then performed, in operation 80, using the resultant feature vectors.
(84) As an alternative, facial recognition method 85 extracts features without using FastICA, as shown in the accompanying flowchart. Each of the N/4×N/4 sub-subimages is converted into a vector of (N/4)²×1 (e.g. 1024×1) dimension in operation 175. The 1024×1 vectors that correspond to the subimages in each set of four subimages (for a total of 16 sub-subimages, which represent all of the sub-subimages that correspond to a single original face image) are combined to create a (N/4)²×16 (e.g. 1024×16) pose matrix. Each pose matrix corresponds to a set of four subimages. After the pose matrix is created in operation 176, the l.sub.2-norm is determined for each row of each pose matrix in operation 181. The l.sub.2-norm is then used, in operation 182, to convert each pose matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1). Recognition is then performed, in operation 180, using the resultant feature vectors.
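The pose-matrix path of method 85 (sixteen 1024×1 vectors combined into one matrix, then row-wise l.sub.2-norms) can be sketched as follows, with random stand-ins for the sixteen 32×32 sub-subimages of a single face image:

```python
import numpy as np

rng = np.random.default_rng(0)
# sixteen 32 x 32 sub-subimages for one face image (placeholder data)
sub_subimages = rng.standard_normal((16, 32, 32))

# each sub-subimage becomes a 1024 x 1 vector; together they form
# the 1024 x 16 pose matrix (cf. operations 175-176)
pose = np.stack([s.reshape(-1) for s in sub_subimages], axis=1)

# cf. operations 181-182: row-wise l2-norms reduce the pose matrix
# to a single 1024 x 1 resultant feature vector
feature_vector = np.linalg.norm(pose, axis=1, keepdims=True)
```

The same pattern applies to the per-subimage 1024×4 pose matrices of methods 90 and 105, with 4 columns instead of 16.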
(91) In another embodiment, illustrated in the accompanying flowchart, a pose matrix is created for each subimage (with dimension (N/4)²×4, e.g. 1024×4), rather than for the set of four subimages (with dimension (N/4)²×16, e.g. 1024×16) that correspond to a single face image. As shown in the accompanying figure, the resultant matrix for each subimage is divided into four of dimension N/2×N/2 (e.g. 64×64), and each one is further divided into four N/4×N/4 (e.g. 32×32) sub-subimages. The resultant subimage matrix is N/2×N/2 (e.g. 64×64). Returning to the flowchart, each of the N/4×N/4 sub-subimages is converted into a vector of (N/4)²×1 (e.g. 1024×1) dimension in operation 95. Next, the 1024×1 vectors for each set of four sub-subimages are combined to create a (N/4)²×4 (e.g. 1024×4) pose matrix. Each pose matrix corresponds to a set of four sub-subimages.
(101) 2D FastICA is applied, in operation 97, to each of the 1024×4 pose matrices to obtain a FastICA matrix, which may be, for example, a FastICA signal matrix, a mixing matrix of a FastICA signal matrix or a feature matrix of a FastICA signal matrix. A FastICA matrix is created for each pose matrix. In operation 98, the l.sub.2-norm is determined for each row in each of the FastICA matrices and used, in operation 99, to convert each FastICA matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1), thereby reducing the dimensionality and constraining the energy of each image in a column. Recognition is then performed, in operation 100, using the resultant feature vectors.
(104) As an alternative, facial recognition method 105 extracts features without using FastICA. Each of the N/4×N/4 sub-subimages is converted into a vector of (N/4)²×1 (e.g. 1024×1) dimension in operation 195. The 1024×1 vectors for each set of four sub-subimages are combined to create a (N/4)²×4 (e.g. 1024×4) pose matrix. Each pose matrix corresponds to a set of four sub-subimages. The l.sub.2-norm is determined for each row of each pose matrix in operation 197. The l.sub.2-norm is then used, in operation 198, to convert each pose matrix to a resultant feature vector having dimension (N/4)²×1 (e.g. 1024×1). Recognition is then performed, in operation 199, using the resultant feature vectors.
(110) In the recognition phase, a neural network based on the BPTA was used for training and testing. BPTA is a supervised learning algorithm. Therefore, it is necessary to choose a desired output for each database. ORL, YALE and FERET databases were used as an example for training and testing the recognition phase. There are 40, 15, and 200 different desired outputs for the ORL, YALE, and FERET databases, respectively, corresponding to the different number of persons in each database. Three layers are used in the NNT, namely, an input, a hidden, and an output layer.
(111) Exemplary experimental results of various embodiments described herein are provided in the accompanying tables.
(112) There are 40 persons in the ORL database, each with 10 different poses. Therefore, the total number of poses used to test the system is 400 poses. P denotes the number of poses per person used for training. Hence, 10−P poses per person are used for testing. P=1, P=3 and P=5 poses were used here. The results are summarized in the accompanying table.
(113) The YALE database consists of 15 persons, each with 11 different poses. Therefore, the total number of poses used to test the system is 165 poses. In the training phase, P=1, P=3 and P=5 poses were used. The results are summarized in the accompanying table.
(114) There are 200 persons in the FERET database, each with 11 different poses. Therefore, the total number of poses used to test the system is 2200 poses. In the training phase, P=1, P=3 and P=5 poses were used. The results are summarized in the accompanying table.
(115) Note that FastICA decorrelates the images and produces statistically independent sets of images. Then eigenanalysis of the resulting features generates an efficient image representation. The configuration of the neural network during the training phase can affect the overall performance of the methods. Choosing the number of hidden layers, the number of neurons in the hidden layers, the types of the activation functions, the training function, the training method and the target performance can impact the overall performance of the system. In the exemplary results for methods 10, 30, 40 and 55, one hidden layer was used with 512 neurons for the ORL and YALE databases, and two hidden layers were used with 512 and 256 neurons for the first and second hidden layer, respectively, for the FERET database. In the above exemplary results for methods 70, 85, 90 and 105, one hidden layer was used with 1024 neurons for the ORL and YALE databases, and two hidden layers were used with 1024 and 512 neurons for the first and second hidden layer, respectively, for the FERET database. The activation function that was used was the hyperbolic tangent sigmoid, and back propagation was used for training and testing. The target mean square error (MSE) was 10.sup.-7.
(116) Having now described the invention, the construction, the operation and use of preferred embodiments thereof, and the advantageous new and useful results obtained thereby, the new and useful constructions, and reasonable mechanical equivalents thereof obvious to those skilled in the art, are set forth in the appended claims.