Non-negative matrix factorization face recognition method and system based on kernel machine learning
10679040 ยท 2020-06-09
Assignee
Inventors
- Wensheng Chen (Guangdong, CN)
- Yang Zhao (Guangdong, CN)
- Bo Chen (Guangdong, CN)
- Binbin Pan (Guangdong, CN)
Cpc classification
G06F18/214
PHYSICS
G06F16/90
PHYSICS
G06F18/2133
PHYSICS
G06V10/7715
PHYSICS
G06F18/241
PHYSICS
International classification
Abstract
The invention provides a non-negative matrix factorization face recognition method and system based on kernel machine learning, which comprises five steps. The invention has the following beneficial effects: the invention avoids the learning of the inaccurate pre-image matrix by directly learning two kernel matrices, K.sub.wx and K.sub.ww, and avoids the derivation of the kernel function in the iterative formula by changing the learning object, so that there is no limit to the selection of kernel function and a general algorithm for any kernel function is obtained.
Claims
1. A non-negative matrix factorization face recognition method based on kernel machine learning, comprising the following steps: (A) each preset training sample image is represented as a column vector; (B) through a known kernel function and a known training sample vector, a symmetric positive semi-definite kernel matrix K.sub.xx is constructed; (C) three objective functions are respectively established and the objective functions are minimized by a method of cross-iteration, and new features of the training samples in a kernel space and two kernel matrices related to non-linearly mapped samples are obtained; (D) through the two kernel matrices obtained in the learning phase, a test sample is projected into the kernel space to obtain new feature of the test sample in the kernel space; (E) according to a nearest neighbor method, the new features of the test sample are compared with preset centers of the new features of each type of training sample to realize a classification and identification of the test sample, wherein in Step E, a distance between a new feature of each test sample and a new feature center of each type of training sample is calculated, and then the test image is classified according to the nearest neighbor method as a nearest distance class.
2. The non-negative matrix factorization face recognition method as claimed in claim 1, wherein in Step C, the formulae of cross-iteration are:
H.sup.(t+1)=H.sup.(t).Math.K.sub.wx(K.sub.wwH),
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).Math.(K.sub.xxH.sup.T)(K.sub.xw.sup.(t)HH.sup.T),
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).Math.(K.sub.wxH.sup.T+HK.sub.xw)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)), wherein a final feature matrix H and the two kernel matrices, K.sub.wx and K.sub.ww, can be obtained by the method of cross-iteration.
3. The non-negative matrix factorization face recognition method as claimed in claim 1, wherein in Step C, the cross-iterative process comprises the following steps of: (1) inputting the kernel matrix K.sub.xx; (2) initializing matrices H, K.sub.xw, K.sub.ww; (3) fixing K.sub.xw, K.sub.ww and updating H; (4) fixing H, K.sub.ww and updating K.sub.xw; (5) fixing H, K.sub.xw and updating K.sub.ww; (6) determining and verifying whether a stop condition is satisfied, if the stop condition s satisfied, perform Step (7); otherwise, perform Step (3); (7) the approximate solutions of H, K.sub.xw, K.sub.ww are obtained.
4. The non-negative matrix factorization face recognition method as claimed in claim 1, wherein in Step D, the new feature H.sub.Y of the test sample is obtained by the following three methods: Method I: method of directly utilizing generalized inverse matrix of iterated K.sub.xw, comprising:
H.sub.Y=K.sub.xw.sup.+K.sub.xy, then each column of the matrix H.sub.Y represents a new feature of the image to be classified; Method II: through non-negative matrix factorization, K.sub.xy and K.sub.xw in K.sub.xy=K.sub.xwH.sub.Y remain unchanged and the non-negative H.sub.Y can be iterated by a non-negative matrix factorization formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)); Method III: through a non-negative sparse representation, K.sub.xy will be sparsely represented by K.sub.xw, wherein H.sub.Y is a sparse coefficient matrix, and H.sub.Y can be obtained by the iterative formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)+1.sub.rd), wherein, is a control parameter, 1.sub.rd is an all-one matrix with a size of rd.
5. A non-negative matrix factorization face recognition system based on kernel machine learning, comprising: a register, storing instructions; a processor, coupled to the register and configured to execute the instructions to: represent each preset training sample image as a column vector; construct a symmetric positive semi-definite kernel matrix K.sub.xx by a known kernel function and the known training sample vectors; establish three objective functions respectively and realizing a minimization of the objective function through a method of cross-iteration, and new features of the training samples in a kernel space and two kernel matrices related to non-linearly mapped samples are obtained; project a test sample into the kernel space through the two kernel matrices obtained in the learning phase to obtain new features of the test sample in the kernel space; compare the new feature of the test sample with a preset center of the new feature of each type of training sample according to the nearest neighbor method to realize a classification and identification of the test sample, wherein the processor is configured to execute the instructions to calculate a distance between a new feature of each test sample and a new feature center of each type of training sample, and then classify the test image according to the nearest neighbor method as the nearest distance class.
6. The non-negative matrix factorization face recognition system as claimed in claim 5, wherein formula of the cross-iteration is:
H.sup.(t+1)=H.sup.(t).Math.K.sub.wx(K.sub.wwH),
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).Math.(K.sub.xxH.sup.T)(K.sub.xw.sup.(t)HH.sup.T),
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).Math.(K.sub.wxH.sup.T+HK.sub.xw)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)), wherein a final feature matrix H and the two kernel matrices, K.sub.wx and K.sub.ww, can be obtained by the method of cross-iteration.
7. The non-negative matrix factorization face recognition system as claimed in claim 5, wherein the processor is further configured to execute the instructions to: input the kernel matrix K.sub.xx; initialize matrices H, K.sub.xw, K.sub.ww; fix K.sub.xw, K.sub.ww and updating H; fix H, K.sub.ww and updating K.sub.xw; fix H, K.sub.xw and updating K.sub.ww; judge and verify whether a stop condition is satisfied, and if the stop condition is satisfied, the approximate solutions of H, K.sub.xw, K.sub.ww are obtained; otherwise, the matrices K.sub.xw, K.sub.ww are fixed and H is updated.
8. The non-negative matrix factorization face recognition system as claimed in claim 5, wherein the processor is further configured to execute the instructions to obtain the new feature H.sub.Y of the test sample by the following three methods: Method I: method of directly utilizing generalized inverse matrix of iterated K.sub.xw, comprising:
H.sub.Y=K.sub.xw.sup.+K.sub.xy, then each column of the matrix H.sub.Y represents a new feature of the image to be classified; Method II: through the non-negative matrix factorization, K.sub.xy and K.sub.xw in K.sub.xy=K.sub.xwH.sub.Y remain unchanged and the non-negative H.sub.Y can be iterated by a non-negative matrix factorization formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)); Method III: through a non-negative sparse representation, K.sub.xy will be sparsely represented by K.sub.xw, wherein H.sub.Y is a sparse coefficient matrix, and H.sub.Y can be obtained by the iterative formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)+1.sub.rd), wherein, is a control parameter, 1.sub.rd is an all-one matrix with a size of rd.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1)
(2)
(3)
DETAILED DESCRIPTION OF THE INVENTION
(4) As shown in
(5) in Step S1, each preset training sample image is represented as a column vector;
(6) in Step S2, through the known kernel function and the training sample vector, a symmetric positive semi-definite kernel matrix is K.sub.xx constructed;
(7) in Step S3, three objective functions are respectively established and the objective functions are minimized by the method of cross-iteration, and the new features of the training samples in the kernel space and two kernel matrices related to the non-linearly mapped samples are obtained;
(8) in Step S4, through the two kernel matrices obtained in the learning phase, the test sample is projected into the kernel space to obtain the new features of the test sample in the kernel space;
(9) in Step S5, according to the nearest neighbor method, the new features of the test sample are compared with the preset centers of the new features of each type of training sample to realize the classification and identification of the test sample.
(10) As shown in
(11) Step Q1: inputting the kernel matrix K.sub.xx;
(12) Step Q2: initializing the matrices H, K.sub.xw, K.sub.ww;
(13) Step Q3: fixing K.sub.xw, K.sub.ww and updating H;
(14) Step Q4: fixing H, K.sub.ww and updating K.sub.xw;
(15) Step Q5: fixing H, K.sub.xw and updating K.sub.ww;
(16) Step Q6: determining whether the verification satisfies the stop condition, if the condition is satisfied, perform Step Q7; otherwise, perform Step Q3;
(17) Step Q7: the approximate solution of H, K.sub.xw, K.sub.ww is obtained.
(18) X is set as a training sample matrix and the sample matrix is mapped to a high-dimensional space through a non-linear mapping , intended to represent the mapped sample as a linear combination of the mapped pre-images:
(X)=(W)H.
(19) To avoid the learning of the inaccurate pre-image, three error functions are established:
(20)
(21) wherein [K.sub.xx].sub.ij=(x.sub.i),(x.sub.j)
=k(x.sub.i,x.sub.j), the RBF kernel function,
(22)
t>0, is used in the invention. [K.sub.ww].sub.ij=(x.sub.i),(x.sub.j)
=k(x.sub.i,x.sub.j), [K.sub.xw].sub.ij=
(x.sub.i),(x.sub.j)
=k(x.sub.i,x.sub.j), and K.sub.wx=K.sub.xw.sup.T.
The three objective functions essentially reflect the degree of approximation after the decomposition of (X).
(23) The problem of solving the feature matrix then evolves into three sub-problems. Two matrices in the feature matrix H and kernel matrix K.sub.xw, K.sub.ww, are fixed respectively, and the remaining one matrix is studied, that is:
(24)
(25) By solving these three sub-problems, a new feature H and two kernel matrices, K.sub.xw, K.sub.ww, of the mapped training sample are obtained.
(26) Learning of the feature matrix H:
(27) For Sub-problem (1), two kernel matrices, K.sub.xw, K.sub.ww, are fixed and the feature matrix H are studied. The iterative formula of H is obtained by the gradient descent method. According to the gradient descent method:
H.sup.(t+1)=H.sup.(t).sub.1(H.sup.(t)).Math.F.sub.1(H.sup.(t)),(4)
(28) wherein .sub.1(H.sup.(t)) is a step matrix, F.sub.1(H) is the gradient of F.sub.1 related to H.
F.sub.1(H)=K.sub.wwHK.sub.wx.
(29) To ensure the non-negativeness of H in every iteration, the step matrix is selected as:
.sub.1=H(K.sub.wwH),
(30) The selected step matrix .sub.1 is taken into the iterative formula H that can be obtained from (4), with the theorem as follows.
(31) Theorem 1: the kernel matrices, K.sub.wx and K.sub.ww, are fixed, the objective function F.sub.1 is non-increasing, and the feature matrix H in Sub-problem (1) is updated by the following iterative method:
H.sup.(t+1)=H.sup.(t).Math.K.sub.wx(K.sub.wwH).
(32) Learning of the kernel matrix K.sub.xw:
(33) For Sub-problem (2), according to the gradient descent method:
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).sub.2.sup.(t).Math.F.sub.2(K.sub.xw.sup.(t)),(5)
(34) wherein .sub.2 is a step matrix, F.sub.2(K.sub.xw) is the gradient of F.sub.2 related to the objective function K.sub.xw.
F.sub.2(K.sub.xw)=K.sub.xwHH.sup.TK.sub.xxH.sup.T.
(35) To ensure the non-negativeness of K.sub.xw in every iteration, the step matrix .sub.2 is selected as:
.sub.2.sup.(t)=K.sub.wx.sup.(t).Math.(K.sub.xw.sup.(t)HH.sup.T),
(36) .sub.2 is taken into (5) to obtain the iterative formula of K.sub.xw, with the theorem as follows:
(37) Theorem 2: the kernel matrices, H and K.sub.ww, are fixed, the objective function F.sub.2 is non-increasing, and the feature matrix K.sub.xw in Sub-problem (2) is updated by the following iterative method:
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).Math.(K.sub.xxH.sup.T)(K.sub.xw.sup.(t)HH.sup.T).
(38) Learning of the kernel matrix K.sub.ww:
(39) For Sub-problem (3), according to the gradient descent method:
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).sub.3.sup.(t).Math.F.sub.3(K.sub.ww.sup.(t)),(6)
(40) wherein .sub.3 is a step matrix, F.sub.3(K.sub.ww) is the gradient of F.sub.3 related to the objective function K.sub.ww.
F.sub.3(K.sub.ww)=K.sub.wwHH.sup.T+HH.sup.TK.sub.wwK.sub.wxH.sup.THK.sub.xw,
(41) To ensure the non-negativeness of K.sub.ww in every process of iteration, the step matrix .sub.3 is selected as:
.sub.3.sup.(t)=K.sub.ww.sup.(t)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)),
(42) .sub.3 is taken into (6) to obtain the iterative formula of K.sub.ww, with the theorem as follows:
(43) Theorem 3: the feature matrix H and kernel matrix K.sub.xv are fixed, the objective function F.sub.3 is non-increasing, and the feature matrix K.sub.ww in Sub-problem (3) is updated by the following iterative method:
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).Math.(K.sub.wxH.sup.T+HK.sub.xw)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)).
(44) To sum up, through Theorem 1, Theorem 2 and Theorem 3, the iterative formula of the non-negative matrix factorization algorithm based on kernel machine learning proposed in the present patent can be obtained:
H.sup.(t+1)=H.sup.(t).Math.K.sub.wx(K.sub.wwH),
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).Math.(K.sub.xxH.sup.T)(K.sub.xw.sup.(t)HH.sup.T),
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).Math.(K.sub.wxH.sup.T+HK.sub.xw)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)),
(45) The final feature matrix H and two kernel matrices, K.sub.wx and K.sub.ww, can be obtained by the method of cross-iteration.
(46) Extraction of Features:
(47) For the test sample Y, the matrix of test sample mapped to the kernel space by non-linear mapping is (Y), which is represented by the mapped pre-image (W) in the kernel space:
(Y)=(W)H.sub.Y.Math.
(X).sup.T(Y)=(X).sup.T(W)H.sub.Y.Math.
K.sub.xy=K.sub.xwH.sub.Y
(48) wherein YR.sup.md is the matrix of all the test samples, [K.sub.xy].sub.ij=k(x.sub.i,y.sub.j). The new feature H.sub.Y of the test sample is obtained by the following three methods:
(49) Method I:
(50) the method of generalized inverse matrix directly by means of K.sub.xw iterated, comprising:
H.sub.Y=K.sub.xw.sup.+K.sub.xy.
(51) then each column of the matrix H.sub.Y represents a new feature of the image to be classified.
(52) Method II:
(53) through the non-negative matrix factorization, K.sub.xy and K.sub.x, in K.sub.xy=K.sub.xwH.sub.Y remain unchanged and the non-negative H.sub.Y can be iterated by a non-negative matrix factorization formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t));
(54) Method III:
(55) through the non-negative sparse representation, K.sub.xy will be sparsely represented by K.sub.xw, wherein H.sub.Y is a sparse coefficient matrix, and H.sub.Y can be obtained by the iterative formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)+1.sub.rd);
(56) wherein is a control parameter, 1.sub.rd is an all-one matrix with the size of rd.
(57) Through the iterative H, the center of the new feature of each type of training sample is obtained. A distance between a new feature of each test sample and a new feature center of each type of training sample is calculated, and then the test image is classified according to the nearest neighbor method as the nearest distance class.
(58) The invention further provides a non-negative matrix factorization face recognition system based on kernel machine learning, comprising:
(59) a representation module, used for representing each preset training sample image as a column vector;
(60) a construction module, used for constructing a symmetric positive semi-definite kernel matrix K.sub.xx by the known kernel functions and training sample vectors;
(61) a processing module, used for establishing three objective functions respectively and realizing the minimization of the objective function through the method of cross-iteration, and the new features of the training samples in the kernel space and two kernel matrices related to the non-linearly mapped samples are obtained;
(62) a projection module, used for projecting the test sample into the kernel space through the two kernel matrices obtained in the learning phase to obtain the new features of the test sample in the kernel space;
(63) a recognition module, used for comparing the new feature of the test sample with the preset center of the new feature of each type of training sample according to the nearest neighbor method to realize the classification and identification of the test sample.
(64) In the processing module, the formula of cross-iteration is:
H.sup.(t+1)=H.sup.(t).Math.K.sub.wx(K.sub.wwH),
K.sub.xw.sup.(t+1)=K.sub.xw.sup.(t).Math.(K.sub.xxH.sup.T)(K.sub.xw.sup.(t)HH.sup.T),
K.sub.ww.sup.(t+1)=K.sub.ww.sup.(t).Math.(K.sub.wxH.sup.T+HK.sub.xw)(K.sub.ww.sup.(t)HH.sup.T+HH.sup.TK.sub.ww.sup.(t)),
(65) The final feature matrix H and two kernel matrices, K.sub.wx and K.sub.ww, can be obtained by the method of cross-iteration.
(66) The processing module, comprising:
(67) an input module, used for inputting the kernel matrix K.sub.xx;
(68) an initialization module, used for initializing the matrices H, K.sub.xw, K.sub.ww;
(69) a first processing module, used for fixing K.sub.xw, K.sub.ww and updating H;
(70) a second processing module, used for fixing H, K.sub.ww and updating K.sub.xw;
(71) a third processing module, used for fixing H, K.sub.xw, and updating K.sub.ww;
(72) a judging module, used for judging and verifying whether the stop condition is satisfied, and if the condition is satisfied, the approximate solution of H, K.sub.xw, K.sub.ww is obtained; otherwise, the first processing module is performed.
(73) In the projection module, the new feature H.sub.Y of the test sample is obtained by the following three methods:
(74) Method I:
(75) the method of generalized inverse matrix directly by means of K.sub.xw iterated, comprising:
H.sub.Y=K.sub.xw.sup.+K.sub.xy,
(76) then each column of the matrix H.sub.Y represents a new feature of the image to be classified;
(77) Method II:
(78) through the non-negative matrix factorization, K.sub.xy and K.sub.xw in K.sub.xy=K.sub.xwH.sub.Y remain unchanged and the non-negative H.sub.Y can be iterated by a non-negative matrix factorization formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t));
(79) Method III:
(80) through the non-negative sparse representation, K.sub.xy will be sparsely represented by K.sub.xw, wherein H.sub.Y is a sparse coefficient matrix, and H.sub.Y can be obtained by the iterative formula:
H.sub.Y.sup.(t+1)=H.sub.Y.sup.(t).Math.(K.sub.xw.sup.TK.sub.xy)(K.sub.xw.sup.TK.sub.xwH.sub.Y.sup.(t)+1.sub.rd);
(81) wherein is a control parameter, 1.sub.rd is an all-one matrix with the size of rd.
(82) In the recognition module, a distance between a new feature of each test sample and a new feature center of each type of training sample is calculated, and then the test image is classified according to the nearest neighbor method as the nearest distance class.
(83) The beneficial effects of the invention are as below:
(84) 1. the invention avoids the learning of the inaccurate pre-image matrix by directly learning two kernel matrices, K.sub.wx, and K.sub.ww;
(85) 2. the invention avoids the derivation of the kernel function in the iterative formula by changing the learning object, so that there is no limit to the selection of kernel function and a general algorithm for any kernel function is obtained;
(86) 3. through the new iterative formula obtained, such an algorithm generalized for any kernel function becomes more convenient;
(87) 4. through the comparison of the experimental data in a public common face database, the algorithm achieves a higher recognition accuracy by verifying the effectiveness of the new algorithm.
(88) TABLE-US-00001 TABLE 1 Recognition rate of the algorithm proposed in the invention and the traditional PNMF algorithm in the ORL face database (%) TN 2 3 4 5 6 7 8 9 PNMF 80.97 85.76 89.91 92.50 94.04 93.74 95.14 93.57 Proposed 85.89 86.98 92.24 93.50 95.26 95.96 96.83 98.82 Method
(89)
(90) The invention only needs to learn the kernel matrix, and the method is simple and practical by establishing three different objective functions to learn the feature matrix and two kernel matrices respectively.
Explanation of Key Words
(91) Non-negative matrix factorization (NMF): the idea of NMF is to approximately factorize the non-negative sample matrix X to the product of two non-negative matrices:
X.sub.mnW.sub.mrH.sub.rn,
(92) wherein W.sub.mr and H.sub.rn are non-negative matrices. Each column of W is called the pre-image matrix and H is a coefficient matrix.
(93) Kernel non-negative matrix factorization (KNMF): the idea of KNMF is to map a non-negative sample matrix into a high-dimensional space through a non-linear mapping . The non-negative sample matrix to be mapped is approximately factorized into the product of the pre-image matrix and the coefficient matrix that are mapped:
(X.sub.mn)(W.sub.mr)H.sub.rn,
(94) wherein W.sub.mr and H.sub.rn are non-negative matrices.
(95) The foregoing are further detailed for the invention in combination with detailed preferable embodiments, but are not intended to limit detailed embodiments of the invention. Those skilled in the art can make a variety of simple deductions or variations without deviating from the principle of the invention, and all these should be covered in the protection scope of the invention.