Processing method for high order tensor data

11106939 · 2021-08-31

Abstract

A processing method for high-order tensor data, which avoids damaging the internal structure of the data when the image observation sample set is vectorized, reduces the large amount of redundant information in the high-order tensor data of the image observation sample set, and improves the image processing speed. In this processing method, the high-order tensor data are divided into three parts: a shared subspace component, an individual (personality) subspace component and a noise part. The shared subspace component and the individual subspace component each represent the high-order tensor data as a linear combination of a tensor base and a vector coefficient. A variational EM method is used to solve for the base tensor and the vector coefficients, and a classifier is designed to classify test samples by comparing the marginal distributions of the samples.

Claims

1. A processing method for high-order tensor data, comprising: dividing the high-order tensor data into three parts comprising a shared subspace component, an individual subspace component and a noise part; wherein the shared subspace component and the individual subspace component each represent the high-order tensor data as a linear combination of a tensor base and a vector coefficient; using a variational EM method to solve for the tensor base and the vector coefficient; and designing a classifier to classify a plurality of test samples by comparing marginal distributions of the samples.

2. The processing method for high-order tensor data according to claim 1, further comprising forming a two-dimensional model assuming an observed data set y = \{Y_{ij}\}_{i=1,j=1}^{I,N_i}; wherein the observed data set consists of N images of I individuals and each individual has N_i images; Y_{ij} represents a j-th image of an i-th individual and Y_{ij} is modeled as formula (1):

Y_{ij} = \mathcal{F} \times_3 h_i^T + \mathcal{W} \times_3 g_{ij}^T + M + E_{ij}   (1)

in the formula (1), \mathcal{F} \times_3 h_i^T and \mathcal{W} \times_3 g_{ij}^T are generalizations of the vector representation for a given sample; \mathcal{F} and \mathcal{W} are factor tensors consisting of a plurality of base matrices in a shared subspace and an individual subspace, respectively; \times_3 represents the mode-3 product of a tensor with a matrix (vector); h_i \in \mathbb{R}^{K_1} is a shared latent variable common to all of the images Y_{i1}, \ldots, Y_{iN_i} of object/class i, and g_{ij} \in \mathbb{R}^{K_2} represents the vector coefficient of the j-th sample of the i-th object in the individual subspace; M is a mean matrix of the high-order data; E_{ij} is a stochastic noise term, each element of which follows the normal distribution \mathcal{N}(0, \sigma^2).

3. The processing method for high-order tensor data according to claim 2, wherein the vectorized form of formula (1) is formula (2):

vec(Y_{ij}) = F h_i + W g_{ij} + vec(M) + vec(E_{ij})   (2)

wherein the columns f_{k_1} and w_{k_2} of the matrices F and W are respectively the k_1-th and k_2-th faces of the tensors \mathcal{F} and \mathcal{W}, and the base tensors are restricted by the CP decomposition of the tensor; wherein

\mathcal{F} = [\lambda; F^{(1)}, F^{(2)}, F^{(h)}] = \sum_{r=1}^{R_1} \lambda_r \, f^{(1)}_r \circ f^{(2)}_r \circ f^{(h)}_r

and

\mathcal{W} = [\eta; W^{(1)}, W^{(2)}, W^{(g)}] = \sum_{r=1}^{R_2} \eta_r \, w^{(1)}_r \circ w^{(2)}_r \circ w^{(g)}_r.

4. The processing method for high-order tensor data according to claim 3, wherein Q(h, g, ρ) is any joint distribution over a plurality of hidden variables, and a lower bound on a likelihood of the observed data is formula (3):

\mathcal{L}(\Theta) = \log p(y \mid \Theta) \ge \int_h \int_g \int_\rho Q(h, g, \rho) \log \frac{p(y, h, g, \rho)}{Q(h, g, \rho)} \, dh \, dg \, d\rho = E_{h,g,\rho}[\log p(y \mid h, g, \rho)] + E_h\left[\log \frac{p(h)}{Q(h)}\right] + E_g\left[\log \frac{p(g)}{Q(g)}\right] + E_\rho\left[\log \frac{p(\rho)}{Q(\rho)}\right] \triangleq \hat{\mathcal{L}}(Q(h), Q(g), Q(\rho), \Theta)   (3)

wherein Q(h, g, ρ) = Q(h) Q(g) Q(ρ).

5. The processing method for high-order tensor data according to claim 1, wherein a high-order observed data set is \{y_{ij} \mid y_{ij} \in \mathbb{R}^{D_1 \times \cdots \times D_M}; i = 1, \ldots, I, \; j = 1, \ldots, N_i, \; \sum_i N_i = N\}; each sample is a tensor of order M, and a linear discriminant analysis model of the higher-order tensor is formula (4):

y_{ij} = \mathcal{F} \times_{(M+1)} h_i^T + \mathcal{W} \times_{(M+1)} g_{ij}^T + \mathcal{E}_{ij}   (4)

wherein h_i \in \mathbb{R}^{K_1}, g_{ij} \in \mathbb{R}^{K_2}; \mathcal{F} and \mathcal{W} are tensors of order (M+1), wherein the sizes of \mathcal{F} and \mathcal{W} are respectively D_1 \times \cdots \times D_M \times K_1 and D_1 \times \cdots \times D_M \times K_2; wherein the base tensors have the structure of the CP decomposition,

\mathcal{F} = [F^{(1)}, F^{(2)}, \ldots, F^{(M)}, F^{(h)}] = \sum_{r=1}^{R_1} f^{(1)}_r \circ f^{(2)}_r \circ \cdots \circ f^{(M)}_r \circ f^{(h)}_r

and

\mathcal{W} = [W^{(1)}, W^{(2)}, \ldots, W^{(M)}, W^{(g)}] = \sum_{r=1}^{R_2} w^{(1)}_r \circ w^{(2)}_r \circ \cdots \circ w^{(M)}_r \circ w^{(g)}_r.

6. The processing method for high-order tensor data according to claim 5, wherein, for the higher-order model, an EM algorithm is used to solve the problem; a posterior of h_i is a multivariate normal distribution; a mean and a covariance matrix are, respectively,

\bar{h}_i = \left(\Sigma_F + \frac{1}{\bar{\rho} N_i} I_{K_1}\right)^{-1} a_i \qquad \Sigma_h = (I_{K_1} + \bar{\rho} N_i \Sigma_F)^{-1}

where

\Sigma_F = F^{(h)} \big[(F^{(m)T} F^{(m)}) \odot (\bar{F}^{(m)T} \bar{F}^{(m)})\big] F^{(h)T}

a_i = F^{(h)} \operatorname{diag}(F^{(m)T} \bar{Y}^{i}_{(m)} \bar{F}^{(m)}), where \odot represents the Hadamard elementwise product;

\bar{y}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} (y_{ij} - \mathcal{W} \times_{(M+1)} g_{ij}), wherein \bar{Y}^{i}_{(m)} represents the mode-m matricized version of \bar{y}_i, and

\bar{F}^{(m)} = F^{(M)} \odot_{KR} \cdots \odot_{KR} F^{(m+1)} \odot_{KR} F^{(m-1)} \odot_{KR} \cdots \odot_{KR} F^{(1)}

where \odot_{KR} represents the Khatri-Rao product; by computation, the posterior of g_{ij} is a multivariate Gaussian distribution, and the mean and covariance of the latent variable g_{ij} are

\bar{g}_{ij} = \left(\Sigma_W + \frac{1}{\rho} I_{K_2}\right)^{-1} b_{ij}

\Sigma_g = (I_{K_2} + \rho \Sigma_W)^{-1}

where

b_{ij} = W^{(g)} \operatorname{diag}(W^{(m)T} \hat{Y}^{ij}_{(m)} \bar{W}^{(m)})

\Sigma_W = W^{(g)} \big[(W^{(m)T} W^{(m)}) \odot (\bar{W}^{(m)T} \bar{W}^{(m)})\big] W^{(g)T}

wherein \hat{y}_{ij} = y_{ij} - \mathcal{F} \times_{(M+1)} h_i^T, and \hat{Y}^{ij}_{(m)} represents the mode-m matricized version of \hat{y}_{ij}; and

\bar{W}^{(m)} = W^{(M)} \odot_{KR} \cdots \odot_{KR} W^{(m+1)} \odot_{KR} W^{(m-1)} \odot_{KR} \cdots \odot_{KR} W^{(1)};

an optimal Q*(ρ) is a Gamma distribution; in the M-step, setting the derivatives of \hat{\mathcal{L}} with respect to F^{(m)} (m = 1, \ldots, M) and F^{(h)} to zero gives

F^{(m)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \tilde{Y}^{ij}_{(m)} \bar{F}^{(m)} \operatorname{Diag}(\bar{h}_i^T F^{(h)})\Big] \Big[(\bar{F}^{(m)T} \bar{F}^{(m)}) \odot (N F^{(h)T} \Sigma_h F^{(h)} + F^{(h)T} \tilde{H} \tilde{H}^T F^{(h)})\Big]^{-1}

F^{(h)} = \big[\tilde{H} \tilde{H}^T + N \Sigma_h\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{h}_i \big[\operatorname{diag}(\bar{F}^{(m)T} \tilde{Y}^{ij\,T}_{(m)} F^{(m)})\big]^T\Big] \Big[(F^{(m)T} F^{(m)}) \odot (\bar{F}^{(m)T} \bar{F}^{(m)})\Big]^{-1}

wherein \tilde{Y}^{ij}_{(m)} represents the mode-m matricized version of \tilde{y}_{ij}, and \tilde{y}_{ij} = y_{ij} - \mathcal{W} \times_{(M+1)} g_{ij}; setting the derivatives of \hat{\mathcal{L}} with respect to W^{(m)} (m = 1, \ldots, M) and W^{(g)} to zero gives

W^{(m)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \hat{Y}^{ij}_{(m)} \bar{W}^{(m)} \operatorname{Diag}(\bar{g}_{ij}^T W^{(g)})\Big] \Big[(\bar{W}^{(m)T} \bar{W}^{(m)}) \odot (N W^{(g)T} \Sigma_g W^{(g)} + W^{(g)T} G G^T W^{(g)})\Big]^{-1}

and

W^{(g)} = \big[G G^T + N \Sigma_g\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{g}_{ij} \big[\operatorname{diag}(\bar{W}^{(m)T} \hat{Y}^{ij\,T}_{(m)} W^{(m)})\big]^T\Big] \Big[(W^{(m)T} W^{(m)}) \odot (\bar{W}^{(m)T} \bar{W}^{(m)})\Big]^{-1}.

7. The processing method for high-order tensor data according to claim 6, wherein a sample Y_p to be classified is assigned to a category by calculating its likelihood under the model; a key point of recognition is that samples in a same class share a same identity variable; assuming Y_{1 \ldots N} are observed samples, if Y_p and one particular Y_i belong to a same category, Y_p and that Y_i share a same h.

8. The processing method for high-order tensor data according to claim 7, wherein, if Y_1 and Y_p belong to a same category, both identity variables are h_1, and a likelihood of the observed samples and Y_p is denoted p_1(Y_1, Y_2, Y_p); if Y_p and Y_2 come from a same category with the identity variable h_2, the likelihood is denoted p_2(Y_1, Y_2, Y_p); then comparing p_1 and p_2, and finding a largest likelihood value, wherein Y_p is assigned the label of the sample corresponding to the largest likelihood; the likelihood of the samples is calculated as follows: Y_{ij} follows a matrix-variate normal distribution conditioned on latent variables h and g, giving the model

y_{ij} = F h_i + W g_{ij} + e_{ij}

wherein y_{ij} := vec(Y_{ij}), F = (F^{(2)} \odot_{KR} F^{(1)}) F^{(h)T} and W = (W^{(2)} \odot_{KR} W^{(1)}) W^{(g)T}, where \odot_{KR} denotes the Khatri-Rao product; a plurality of factor matrices have been learned in a training phase, and the plurality of factor matrices are known to the recognition processing; the likelihood that Y_p shares an identity with the n-th sample equals

p_n(y_{1 \ldots N, p}) = \prod_{i=1, i \ne n}^{N} p(y_i) \, p(y_p, y_n);   (20)

for the marginal distribution of n images y_{1 \ldots n} that share the same identity variable h, the generative equations are

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} F & W & 0 & \cdots & 0 \\ F & 0 & W & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ F & 0 & 0 & \cdots & W \end{bmatrix} \begin{bmatrix} h \\ g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix},   (5)

written compactly as

y' = A z' + m' + e';

taking the probabilistic assumptions

p(z') = \mathcal{N}(0, I)

and

p(e') = \mathcal{N}(0, \sigma^2 I)

converts the model (5) into the form of probabilistic PCA, so the marginal distribution over the observed y' is obtained by integrating out the latent variables as

p(y') = \mathcal{N}(m', A A^T + \sigma^2 I);   (21)

using (21), calculating both p(y_i) and p(y_p, y_n) in (20) and then obtaining p_n(y_{1 \ldots N, p}); comparing all p_n(y_{1 \ldots N, p}) and finding the largest value, then assigning the sample Y_p the label corresponding to the largest p_n(y_{1 \ldots N, p}).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows a schematic diagram of formula (1) according to the invention.

(2) FIG. 2 shows a graphic illustration of probabilistic recognition method according to the present invention.

(3) FIG. 3 shows a visual effect of the reconstruction of the invention on the extended Yale database.

(4) FIG. 4 shows reconstructed images of the invention on AR database.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(5) As shown in FIG. 1, in this processing method for high-order tensor data, the high-order tensor data are divided into three parts: the shared subspace component, the individual subspace component and the noise part. The shared subspace component and the individual subspace component each represent the high-order tensor data as a linear combination of a tensor base and a vector coefficient. The variational EM method is used to solve for the base tensor and the vector coefficients, and a classifier is designed to classify the test samples by comparing the marginal distributions of the samples.

(6) Compared with traditional linear discriminant methods, the present invention extracts the common features and the individual features directly from the tensor data in two separate spaces. In each space, using the structural features of the tensor, it constructs a mapping from the tensor data to vector data so as to extract the discriminant features of a tensor sample. This avoids the damage to the internal structure of the data caused by vectorizing the image observation sample set, is flexible and simple, reduces the large amount of redundant information in the high-order tensor data of the image observation sample set, and improves the image processing speed.

(7) Preferably, the two-dimensional model assumes an observed data set y = \{Y_{ij}\}_{i=1,j=1}^{I,N_i}, which consists of N images of I individuals, where each individual has N_i images. Y_{ij} represents the j-th image of the i-th individual and is modeled as formula (1):

Y_{ij} = \mathcal{F} \times_3 h_i^T + \mathcal{W} \times_3 g_{ij}^T + M + E_{ij}   (1)

In this model, \mathcal{F} \times_3 h_i^T and \mathcal{W} \times_3 g_{ij}^T can be regarded as generalizations of the vector representation for a given sample. \mathcal{F} and \mathcal{W} are the factor tensors consisting of base matrices in the shared subspace and the individual subspace, respectively. \times_3 represents the mode-3 product of a tensor with a matrix (vector). h_i \in \mathbb{R}^{K_1} is a shared latent variable common to all the images Y_{i1}, \ldots, Y_{iN_i} of object/class i, and g_{ij} \in \mathbb{R}^{K_2} represents the coefficient of the j-th sample of the i-th object in the individual space. M is the mean matrix of the data; without loss of generality, suppose M = 0. E_{ij} is a stochastic noise term, each element of which follows the normal distribution \mathcal{N}(0, \sigma^2).
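Model (1) can be illustrated numerically. The following is a minimal Python/NumPy sketch, not the invention's implementation: the dimensions, the random factor tensors and the helper name `mode3_product` are illustrative assumptions.

```python
import numpy as np

def mode3_product(T, v):
    # Mode-3 product of a D1 x D2 x K tensor with a length-K vector:
    # sum_k T[:, :, k] * v[k], yielding a D1 x D2 matrix.
    return np.tensordot(T, v, axes=([2], [0]))

rng = np.random.default_rng(0)
D1, D2, K1, K2 = 4, 5, 3, 2
F = rng.standard_normal((D1, D2, K1))       # shared-subspace factor tensor
W = rng.standard_normal((D1, D2, K2))       # individual-subspace factor tensor
h_i = rng.standard_normal(K1)               # identity coefficient, shared by class i
g_ij = rng.standard_normal(K2)              # per-image individual coefficient
E_ij = 0.1 * rng.standard_normal((D1, D2))  # elementwise N(0, sigma^2) noise

# Model (1) with M = 0: shared part + individual part + noise
Y_ij = mode3_product(F, h_i) + mode3_product(W, g_ij) + E_ij
```

Each image is thus a matrix built from two small coefficient vectors, one tied to the identity and one to the particular image.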

(8) Preferably, the vectorized form of model (1) is formula (2):

vec(Y_{ij}) = F h_i + W g_{ij} + vec(M) + vec(E_{ij})   (2)

in which the columns f_{k_1} and w_{k_2} of the matrices F and W are respectively the k_1-th and k_2-th faces of the tensors \mathcal{F} and \mathcal{W}, and the base tensors are restricted by the CP decomposition of the tensor. Assuming

(9) \mathcal{F} = [\lambda; F^{(1)}, F^{(2)}, F^{(h)}] = \sum_{r=1}^{R_1} \lambda_r \, f^{(1)}_r \circ f^{(2)}_r \circ f^{(h)}_r and \mathcal{W} = [\eta; W^{(1)}, W^{(2)}, W^{(g)}] = \sum_{r=1}^{R_2} \eta_r \, w^{(1)}_r \circ w^{(2)}_r \circ w^{(g)}_r.

(10) Preferably, assuming Q(h, g, ρ) is any joint distribution over the hidden variables, a lower bound on the likelihood of the observed data is formula (3):

(11) \mathcal{L}(\Theta) = \log p(y \mid \Theta) \ge \int_h \int_g \int_\rho Q(h, g, \rho) \log \frac{p(y, h, g, \rho)}{Q(h, g, \rho)} \, dh \, dg \, d\rho = E_{h,g,\rho}[\log p(y \mid h, g, \rho)] + E_h\left[\log \frac{p(h)}{Q(h)}\right] + E_g\left[\log \frac{p(g)}{Q(g)}\right] + E_\rho\left[\log \frac{p(\rho)}{Q(\rho)}\right] \triangleq \hat{\mathcal{L}}(Q(h), Q(g), Q(\rho), \Theta)   (3)

The second equality in formula (3) follows from the separability of Q(h, g, ρ): Q(h, g, ρ) = Q(h) Q(g) Q(ρ).

(12) Preferably, assume a high-order observed data set \{y_{ij} \mid y_{ij} \in \mathbb{R}^{D_1 \times \cdots \times D_M}; i = 1, \ldots, I, \; j = 1, \ldots, N_i, \; \sum_i N_i = N\}.

(13) Each sample is a tensor of order M, and the linear discriminant analysis model of the higher-order tensor is formula (4):

y_{ij} = \mathcal{F} \times_{(M+1)} h_i^T + \mathcal{W} \times_{(M+1)} g_{ij}^T + \mathcal{E}_{ij}   (4)

wherein h_i \in \mathbb{R}^{K_1}, g_{ij} \in \mathbb{R}^{K_2}; \mathcal{F} and \mathcal{W} are tensors of order (M+1), whose sizes are respectively D_1 \times \cdots \times D_M \times K_1 and D_1 \times \cdots \times D_M \times K_2. Assuming the base tensors have the structure of the CP decomposition,

(14) \mathcal{F} = [F^{(1)}, F^{(2)}, \ldots, F^{(M)}, F^{(h)}] = \sum_{r=1}^{R_1} f^{(1)}_r \circ f^{(2)}_r \circ \cdots \circ f^{(M)}_r \circ f^{(h)}_r and \mathcal{W} = [W^{(1)}, W^{(2)}, \ldots, W^{(M)}, W^{(g)}] = \sum_{r=1}^{R_2} w^{(1)}_r \circ w^{(2)}_r \circ \cdots \circ w^{(M)}_r \circ w^{(g)}_r.

(15) Preferably, for the higher-order model, the EM algorithm is used to solve the problem, and the posterior of h_i is still a multivariate normal distribution. The mean and covariance matrix are, respectively,

(16) \bar{h}_i = \left(\Sigma_F + \frac{1}{\bar{\rho} N_i} I_{K_1}\right)^{-1} a_i \qquad \Sigma_h = (I_{K_1} + \bar{\rho} N_i \Sigma_F)^{-1}

where

\Sigma_F = F^{(h)} \big[(F^{(m)T} F^{(m)}) \odot (\bar{F}^{(m)T} \bar{F}^{(m)})\big] F^{(h)T}

a_i = F^{(h)} \operatorname{diag}(F^{(m)T} \bar{Y}^{i}_{(m)} \bar{F}^{(m)})

and \odot represents the Hadamard elementwise product.

Define

(17) \bar{y}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} (y_{ij} - \mathcal{W} \times_{(M+1)} g_{ij});

then \bar{Y}^{i}_{(m)} represents the mode-m matricized version of \bar{y}_i, and

\bar{F}^{(m)} = F^{(M)} \odot_{KR} \cdots \odot_{KR} F^{(m+1)} \odot_{KR} F^{(m-1)} \odot_{KR} \cdots \odot_{KR} F^{(1)}

where \odot_{KR} represents the Khatri-Rao product.
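The Khatri-Rao product used here is the column-wise Kronecker product of two factor matrices. A minimal NumPy sketch (the function name `khatri_rao` is an illustrative assumption):

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product of I x R and J x R matrices,
    # giving an (I*J) x R matrix whose r-th column is kron(A[:, r], B[:, r]).
    I, R = A.shape
    J, R2 = B.shape
    assert R == R2, "factor matrices must have the same number of columns"
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)
```

The broadcasting form avoids an explicit loop over the R columns; each column of the result matches `np.kron` applied to the corresponding pair of columns.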
By computation, the posterior of g_{ij} is still a multivariate Gaussian distribution, and the mean and covariance of the latent variable g_{ij} become

\bar{g}_{ij} = \left(\Sigma_W + \frac{1}{\rho} I_{K_2}\right)^{-1} b_{ij}

\Sigma_g = (I_{K_2} + \rho \Sigma_W)^{-1}

where

b_{ij} = W^{(g)} \operatorname{diag}(W^{(m)T} \hat{Y}^{ij}_{(m)} \bar{W}^{(m)})

\Sigma_W = W^{(g)} \big[(W^{(m)T} W^{(m)}) \odot (\bar{W}^{(m)T} \bar{W}^{(m)})\big] W^{(g)T}

Let \hat{y}_{ij} = y_{ij} - \mathcal{F} \times_{(M+1)} h_i^T, and let \hat{Y}^{ij}_{(m)} represent the mode-m matricized version of \hat{y}_{ij}. And

\bar{W}^{(m)} = W^{(M)} \odot_{KR} \cdots \odot_{KR} W^{(m+1)} \odot_{KR} W^{(m-1)} \odot_{KR} \cdots \odot_{KR} W^{(1)}

The optimal Q*(ρ) is still a Gamma distribution.
In the M-step, taking the derivatives of \hat{\mathcal{L}} with respect to F^{(m)} (m = 1, \ldots, M) and F^{(h)}, and setting them to zero, obtain

(18) F^{(m)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \tilde{Y}^{ij}_{(m)} \bar{F}^{(m)} \operatorname{Diag}(\bar{h}_i^T F^{(h)})\Big] \Big[(\bar{F}^{(m)T} \bar{F}^{(m)}) \odot (N F^{(h)T} \Sigma_h F^{(h)} + F^{(h)T} \tilde{H} \tilde{H}^T F^{(h)})\Big]^{-1}

F^{(h)} = \big[\tilde{H} \tilde{H}^T + N \Sigma_h\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{h}_i \big[\operatorname{diag}(\bar{F}^{(m)T} \tilde{Y}^{ij\,T}_{(m)} F^{(m)})\big]^T\Big] \Big[(F^{(m)T} F^{(m)}) \odot (\bar{F}^{(m)T} \bar{F}^{(m)})\Big]^{-1}

where \tilde{Y}^{ij}_{(m)} represents the mode-m matricized version of \tilde{y}_{ij}, and \tilde{y}_{ij} = y_{ij} - \mathcal{W} \times_{(M+1)} g_{ij}.

Taking the derivatives of \hat{\mathcal{L}} with respect to W^{(m)} (m = 1, \ldots, M) and W^{(g)}, and setting them to zero, obtain

(19) W^{(m)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \hat{Y}^{ij}_{(m)} \bar{W}^{(m)} \operatorname{Diag}(\bar{g}_{ij}^T W^{(g)})\Big] \Big[(\bar{W}^{(m)T} \bar{W}^{(m)}) \odot (N W^{(g)T} \Sigma_g W^{(g)} + W^{(g)T} G G^T W^{(g)})\Big]^{-1} and W^{(g)} = \big[G G^T + N \Sigma_g\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{g}_{ij} \big[\operatorname{diag}(\bar{W}^{(m)T} \hat{Y}^{ij\,T}_{(m)} W^{(m)})\big]^T\Big] \Big[(W^{(m)T} W^{(m)}) \odot (\bar{W}^{(m)T} \bar{W}^{(m)})\Big]^{-1}.

(20) Preferably, a sample Y_p to be classified is assigned to a category by calculating its likelihood under the model. The key point of recognition is that samples in the same class share the same identity variable. Assuming Y_{1 \ldots N} are the observed samples, if Y_p and one particular Y_i belong to the same category, they shall share the same h.

(21) Preferably, suppose Y_1 and Y_p belong to the same category, so that both identity variables are h_1; in this case, denote the likelihood of the observed samples and Y_p as p_1(Y_1, Y_2, Y_p). If instead Y_p and Y_2 come from the same category with the identity variable h_2, the likelihood is denoted p_2(Y_1, Y_2, Y_p). We then compare p_1 and p_2 and find the largest likelihood value; Y_p is assigned the label of the sample corresponding to the largest likelihood. The likelihood of the samples can be calculated as follows.

(22) Y_{ij} follows a matrix-variate normal distribution conditioned on the latent variables h and g. For convenience, we work with p(y_{ij}) instead of p(Y_{ij}), where y_{ij} := vec(Y_{ij}). Thus, the proposed model (1) becomes

y_{ij} = F h_i + W g_{ij} + e_{ij}

where F = (F^{(2)} \odot_{KR} F^{(1)}) F^{(h)T} and W = (W^{(2)} \odot_{KR} W^{(1)}) W^{(g)T}, with \odot_{KR} the Khatri-Rao product.

These factor matrices have been learned in the training phase and are known to the recognition processing. Thus, the likelihood that Y_p shares its identity with the n-th sample equals

(23) p_n(y_{1 \ldots N, p}) = \prod_{i=1, i \ne n}^{N} p(y_i) \, p(y_p, y_n)   (20)
Next, we illustrate the marginal distribution of n images y_{1 \ldots n} that share the same identity variable h. The generative equations become

(24) \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} F & W & 0 & \cdots & 0 \\ F & 0 & W & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ F & 0 & 0 & \cdots & W \end{bmatrix} \begin{bmatrix} h \\ g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}   (5)

noting

y' = A z' + m' + e'

Taking the probabilistic assumptions

p(z') = \mathcal{N}(0, I)

and

p(e') = \mathcal{N}(0, \sigma^2 I)

converts the model (5) into the form of probabilistic PCA. Thus, the marginal distribution over the observed y' can be easily obtained by integrating out the latent variables as

p(y') = \mathcal{N}(m', A A^T + \sigma^2 I)   (21)

By (21), both p(y_i) and p(y_p, y_n) in (20) can be calculated, and p_n(y_{1 \ldots N, p}) is then obtained. Comparing all p_n(y_{1 \ldots N, p}) and finding the largest value, the sample Y_p is assigned the label corresponding to the largest p_n(y_{1 \ldots N, p}).
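The pairwise term p(y_p, y_n) of (20) can be sketched concretely: stack the two vectorized images, build the coupling matrix A of equation (5) for n = 2, and evaluate the Gaussian marginal (21). The following Python/NumPy sketch assumes M = 0 (so m' = 0); the helper names `gaussian_loglik` and `pair_loglik` are illustrative assumptions.

```python
import numpy as np

def gaussian_loglik(y, mean, cov):
    # Log-density of a multivariate normal, computed via a Cholesky
    # factorization of the covariance for numerical stability.
    d = y - mean
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, d)
    return (-0.5 * alpha @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))

def pair_loglik(y_p, y_n, F, W, sigma2):
    # log p(y_p, y_n) when the two images share one identity variable h:
    # [y_p; y_n] = A [h; g_p; g_n] + e with A = [[F, W, 0], [F, 0, W]],
    # so by (21) the marginal is N(0, A A^T + sigma^2 I).
    D, K2 = W.shape
    Z = np.zeros((D, K2))
    A = np.block([[F, W, Z],
                  [F, Z, W]])
    cov = A @ A.T + sigma2 * np.eye(2 * D)
    return gaussian_loglik(np.concatenate([y_p, y_n]), np.zeros(2 * D), cov)
```

By symmetry of the coupling, `pair_loglik(a, b, ...)` equals `pair_loglik(b, a, ...)`; the classifier compares such pairwise likelihoods across candidate identities.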

(25) The invention is described in more detail below.

(26) The technical solution adopted by the invention is a probabilistic linear discriminant analysis method based on tensor data vector expression, and the specific implementation process of the method is as follows.

(27) Two Dimensional Model Construction

(28) Assume an observed data set y = \{Y_{ij}\}_{i=1,j=1}^{I,N_i}, which consists of N images of I individuals, where each individual has N_i images (\sum_{i=1}^{I} N_i = N). Y_{ij} represents the j-th image of the i-th individual and is modeled as formula (1):

Y_{ij} = \mathcal{F} \times_3 h_i^T + \mathcal{W} \times_3 g_{ij}^T + M + E_{ij}   (1)

(29) In this model, \mathcal{F} \times_3 h_i^T and \mathcal{W} \times_3 g_{ij}^T can be regarded as generalizations of the vector representation for a given sample, in which the coefficient is a vector, as shown in FIG. 1. \mathcal{F} and \mathcal{W} are the factor tensors consisting of base matrices in the shared subspace and the individual subspace, respectively. \times_3 represents the mode-3 product of a tensor with a matrix (vector). h_i \in \mathbb{R}^{K_1} is a shared latent variable common to all the images Y_{i1}, \ldots, Y_{iN_i} of object/class i, and g_{ij} \in \mathbb{R}^{K_2} represents the coefficient of the j-th sample of the i-th object in the individual space. M is the mean matrix of the data; without loss of generality, we suppose M = 0 (a zero matrix). E_{ij} is a stochastic noise term, each element of which follows the normal distribution \mathcal{N}(0, \sigma^2).

(30) Furthermore, we impose multivariate normal priors on the latent variables to generate a Bayesian model:

(31) p(h_i) = \mathcal{N}(h_i \mid 0, I_{K_1}) = \left(\frac{1}{2\pi}\right)^{K_1/2} \exp\left\{-\frac{1}{2} h_i^T h_i\right\} and p(g_{ij}) = \mathcal{N}(g_{ij} \mid 0, I_{K_2}) = \left(\frac{1}{2\pi}\right)^{K_2/2} \exp\left\{-\frac{1}{2} g_{ij}^T g_{ij}\right\}

(32) Letting

(33) \rho = \frac{1}{\sigma^2},

we assume ρ satisfies the following Gamma prior:

(34) p_\sigma(\rho) = \Gamma(\rho \mid a, b) = \frac{b^a}{\Gamma(a)} \rho^{a-1} \exp\{-b\rho\}

(35) After vectorizing model (1), we obtain the following model:

vec(Y_{ij}) = F h_i + W g_{ij} + vec(M) + vec(E_{ij})   (2)

(36) in which the columns f_{k_1} and w_{k_2} of the matrices F and W are respectively the k_1-th and k_2-th faces of the tensors \mathcal{F} and \mathcal{W}, and they restrict the base tensors by the CP decomposition of the tensor. The model is thereby transformed into traditional probabilistic linear discriminant analysis. However, a major problem is that the number of parameters in the model grows with the size of the original data. For example, for M-order tensor data, the projections \mathcal{F} and \mathcal{W} are (M+1)-order tensors with K_1 \prod_n D_n and K_2 \prod_n D_n components, respectively, which quickly reach billions even when the mode dimensionalities K_1, K_2, D_{1:M} and M are moderate. This makes it extremely difficult to learn the parameters of the base tensors. Hence, we propose to introduce tensor CP decompositions in constructing the multiplicative interactions in the tensors \mathcal{F} and \mathcal{W}, respectively. Assuming

(37) \mathcal{F} = [\lambda; F^{(1)}, F^{(2)}, F^{(h)}] = \sum_{r=1}^{R_1} \lambda_r \, f^{(1)}_r \circ f^{(2)}_r \circ f^{(h)}_r and \mathcal{W} = [\eta; W^{(1)}, W^{(2)}, W^{(g)}] = \sum_{r=1}^{R_2} \eta_r \, w^{(1)}_r \circ w^{(2)}_r \circ w^{(g)}_r.
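The parameter saving from the CP structure can be made concrete: a full base tensor stores K_1 \prod_n D_n entries, while its CP factors store only (\sum_n D_n + K_1) R_1 entries plus R_1 weights. A minimal sketch (the function name `cp_to_tensor` and the sizes are illustrative assumptions):

```python
import numpy as np

def cp_to_tensor(weights, factors):
    # Rebuild a tensor from CP factors:
    # sum_r weights[r] * outer(f1[:, r], f2[:, r], ...).
    shape = tuple(f.shape[0] for f in factors)
    T = np.zeros(shape)
    for r in range(len(weights)):
        comp = np.array(weights[r])
        for f in factors:
            comp = np.multiply.outer(comp, f[:, r])
        T += comp
    return T

# Size comparison for a 2-D model: a full F is D1 x D2 x K1, while its
# CP form stores (D1 + D2 + K1) * R1 factor entries plus R1 weights.
D1, D2, K1, R1 = 32, 32, 10, 5
full_params = D1 * D2 * K1
cp_params = (D1 + D2 + K1) * R1 + R1
```

Even at these modest sizes the CP form is over an order of magnitude smaller, and the gap widens quickly with tensor order.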

(38) Solution of the Model

(39) For the above-mentioned model, h, g and ρ are the latent variables, and θ = {F^{(1)}, F^{(2)}, F^{(h)}, W^{(1)}, W^{(2)}, W^{(g)}} represents all the model parameters. Therefore, the joint distribution of a set of given observations y and all the latent variables can be expressed as

(40) p(y, h, g, \rho \mid \theta) = \prod_{i,j} \mathcal{N}(Y_{ij} - \mathcal{F} \times_3 h_i^T - \mathcal{W} \times_3 g_{ij}^T \mid 0, \sigma I, \sigma I) \, \mathcal{N}(h_i \mid 0, I_{K_1}) \, \mathcal{N}(g_{ij} \mid 0, I_{K_2}) \, p_\sigma(\rho).

(41) So the likelihood of the sample is

\mathcal{L}(\theta) = \log p(y \mid \theta) = \log \iiint p(y, h, g, \rho \mid \theta) \, dh \, dg \, d\rho

(42) Furthermore, we investigate a variational expectation-maximization (EM) algorithm for the proposed TPLDA model.

(43) As the proposed model depends on unobserved latent variables, we need to find a lower bound on the likelihood of the observed data for any variational distribution Q(h, g, ρ) over the hidden variables:

(44) \mathcal{L}(\theta) = \log p(y \mid \theta) \ge \int_h \int_g \int_\rho Q(h, g, \rho) \log \frac{p(y, h, g, \rho)}{Q(h, g, \rho)} \, dh \, dg \, d\rho = E_{h,g,\rho}[\log p(y \mid h, g, \rho)] + E_h\left[\log \frac{p(h)}{Q(h)}\right] + E_g\left[\log \frac{p(g)}{Q(g)}\right] + E_\rho\left[\log \frac{p(\rho)}{Q(\rho)}\right] \triangleq \hat{\mathcal{L}}(Q(h), Q(g), Q(\rho), \theta)   (3)

(45) By taking a separable Q(h, g, ρ), that is Q(h, g, ρ) = Q(h) Q(g) Q(ρ), the second equality can be deduced. The variational EM algorithm can be implemented as follows. 1) Variational E-Step: with the current parameter value θ fixed, the Q-distributions of the latent variables are calculated in the E-step. (i) Updating the Q-distribution of h_i: from model (1), we can see that \mathcal{W} and g_{ij} are constant with respect to h_i; hence, we can rewrite the model (1) as

(46) \sum_{j=1}^{N_i} (Y_{ij} - \mathcal{W} \times_3 g_{ij}^T) = N_i \, \mathcal{F} \times_3 h_i^T + \sum_{j=1}^{N_i} E_{ij}

(47) Define

(48) \bar{Y}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} (Y_{ij} - \mathcal{W} \times_3 g_{ij}^T) and \bar{E}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} E_{ij}.

Then the above-mentioned equation becomes

\bar{Y}_i = \mathcal{F} \times_3 h_i^T + \bar{E}_i

(49) and each element in \bar{E}_i follows the zero-mean normal distribution

(50) \mathcal{N}\left(0, \frac{\sigma^2}{N_i}\right).

(51) In (3), the last two expectation terms are constant with respect to h_i; hence we only need to compute the first two expectations. The likelihood of the observed data p(y \mid h, g, \rho) in (3) can be obtained using the following equation:

(52) p(y \mid h, g, \rho) = \prod_{i=1}^{I} \mathcal{N}\left(\bar{Y}_i \,\Big|\, \mathcal{F} \times_3 h_i^T, \frac{\sigma^2}{N_i}\right)

(53) Therefore, the posterior of h_i is still a multivariate normal distribution. The mean and covariance matrix are, respectively,

(54) \bar{h}_i = \left(\Sigma_F + \frac{1}{\bar{\rho} N_i} I_{K_1}\right)^{-1} a_i and \Sigma_h = (I_{K_1} + \bar{\rho} N_i \Sigma_F)^{-1}

(55) where

a_i = F^{(h)} \operatorname{diag}(F^{(1)T} \bar{Y}_i F^{(2)})

and

\Sigma_F = F^{(h)} \big[(F^{(1)T} F^{(1)}) \odot (F^{(2)T} F^{(2)})\big] F^{(h)T}

(56) where \odot represents the Hadamard elementwise product.
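The update of the h_i posterior above can be sketched directly. Shapes follow the CP convention F^{(1)} ∈ R^{D1×R1}, F^{(2)} ∈ R^{D2×R1}, F^{(h)} ∈ R^{K1×R1}; the function name `posterior_h` and the random inputs are illustrative assumptions.

```python
import numpy as np

def posterior_h(Ybar_i, F1, F2, Fh, rho_bar, N_i):
    # Posterior mean and covariance of the identity variable h_i:
    #   Sigma_F = Fh [(F1^T F1) * (F2^T F2)] Fh^T   (* = Hadamard product)
    #   a_i     = Fh diag(F1^T Ybar_i F2)
    #   Sigma_h = (I + rho_bar * N_i * Sigma_F)^{-1}
    #   h_bar   = rho_bar * N_i * Sigma_h a_i
    #           = (Sigma_F + I / (rho_bar * N_i))^{-1} a_i
    K1 = Fh.shape[0]
    Sigma_F = Fh @ ((F1.T @ F1) * (F2.T @ F2)) @ Fh.T
    a_i = Fh @ np.diag(F1.T @ Ybar_i @ F2)
    Sigma_h = np.linalg.inv(np.eye(K1) + rho_bar * N_i * Sigma_F)
    h_bar = rho_bar * N_i * (Sigma_h @ a_i)
    return h_bar, Sigma_h
```

The two equivalent forms of the mean (via Sigma_h, or via the inverse of Sigma_F + I/(rho_bar N_i)) agree because (Sigma_F + I/c)^{-1} = c (I + c Sigma_F)^{-1} with c = rho_bar N_i.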

(57) (ii) Updating the Q-distribution of g_{ij}:

(58) we rewrite model (1) as

Y_{ij} - \mathcal{F} \times_3 h_i^T = \mathcal{W} \times_3 g_{ij}^T + E_{ij}

(59) Define \hat{Y}_{ij} = Y_{ij} - \mathcal{F} \times_3 h_i^T. Then the above-mentioned model becomes

\hat{Y}_{ij} = \mathcal{W} \times_3 g_{ij}^T + E_{ij}

(60) Thus, the posterior of g_{ij} is still a multivariate normal distribution. The mean and covariance matrix are, respectively,

\bar{g}_{ij} = \left(\Sigma_W + \frac{1}{\rho} I_{K_2}\right)^{-1} b_{ij}

and

\Sigma_g = (I_{K_2} + \rho \Sigma_W)^{-1}

where

b_{ij} = W^{(g)} \operatorname{diag}(W^{(1)T} \hat{Y}_{ij} W^{(2)})

and

\Sigma_W = W^{(g)} \big[(W^{(1)T} W^{(1)}) \odot (W^{(2)T} W^{(2)})\big] W^{(g)T}

(61) (iii) Updating the posterior of ρ:

(62) utilizing Bayes' theorem, the log of the optimal Q*(ρ) distribution can be calculated as the expectation of the log joint distribution over all the hidden and visible variables with respect to all the other latent variables. Thus, we can get the following equation:

(63) \log Q^*(\rho) = E_{h,g}[\log p(y, h, g, \rho \mid \theta)] = \frac{D_1 D_2 N}{2} \log \rho - \frac{\rho}{2} \psi + (a - 1) \log \rho - b\rho

(64) where

(65) \psi = \sum_{i=1}^{I} \sum_{j=1}^{N_i} \| Y_{ij} - \mathcal{F} \times_3 \bar{h}_i^T - \mathcal{W} \times_3 \bar{g}_{ij}^T \|_F^2 + N \operatorname{tr}(\Sigma_F \Sigma_h) + N \operatorname{tr}(\Sigma_W \Sigma_g),

with \Sigma_F and \Sigma_W the matrices appearing in the posterior updates of h_i and g_{ij}.

(66) Thus, the Q-distribution of ρ is proportional to

(67) Q^*(\rho) \propto \rho^{a - 1 + \frac{D_1 D_2 N}{2}} \exp\left\{-b\rho - \frac{\psi}{2}\rho\right\}

(68) Therefore, the best Q*(ρ) is actually a Gamma distribution, wherein the parameters become

(69) \bar{a} = a + \frac{D_1 D_2 N}{2} and \bar{b} = b + \frac{1}{2}\psi.
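The closed-form update of Q*(ρ) is a one-liner; the posterior mean E[ρ] = ā/b̄ of the Gamma distribution is what the E-step formulas consume as ρ̄. A minimal sketch (the function name `update_q_rho` is an illustrative assumption):

```python
def update_q_rho(a, b, D1, D2, N, psi):
    # Q*(rho) = Gamma(a_bar, b_bar) with
    #   a_bar = a + D1 * D2 * N / 2   and   b_bar = b + psi / 2;
    # the posterior mean E[rho] = a_bar / b_bar plays the role of rho_bar.
    a_bar = a + 0.5 * D1 * D2 * N
    b_bar = b + 0.5 * psi
    return a_bar, b_bar, a_bar / b_bar
```

Here psi is the expected squared residual defined in (65), so a larger residual pulls E[ρ] = 1/σ² down, i.e. the noise variance estimate grows.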

(70) 2) Variational M-Step: with all the Q-distributions of the latent variables fixed, maximize \hat{\mathcal{L}}(Q(h), Q(g), Q(\rho), \theta) over the parameters. We select all terms containing the parameters θ in (3) and get

(71) \hat{\mathcal{L}} \propto E_{h,g,\rho}[\log p(y, h, g, \rho \mid \theta)] = \sum_{i=1}^{I} \sum_{j=1}^{N_i} \| Y_{ij} - \mathcal{F} \times_3 \bar{h}_i^T - \mathcal{W} \times_3 \bar{g}_{ij}^T \|_F^2 + N \operatorname{tr}(\Sigma_F \Sigma_h) + N \operatorname{tr}(\Sigma_W \Sigma_g) + C \triangleq \tilde{\mathcal{L}},

with \Sigma_F and \Sigma_W the matrices from the posterior updates of h_i and g_{ij}.

(72) where C represents a constant independent of the parameters θ. In addition, we can deduce

\mathcal{F} \times_3 h_i^T = F^{(1)} \operatorname{Diag}(h_i^T F^{(h)}) F^{(2)T}

and

\mathcal{W} \times_3 g_{ij}^T = W^{(1)} \operatorname{Diag}(g_{ij}^T W^{(g)}) W^{(2)T}.

(73) Thus, \tilde{\mathcal{L}} can be rewritten as

(74) \tilde{\mathcal{L}} \propto \sum_{i=1}^{I} \sum_{j=1}^{N_i} \| \tilde{Y}_{ij} - F^{(1)} \operatorname{Diag}(\bar{h}_i^T F^{(h)}) F^{(2)T} \|_F^2 + N \operatorname{tr}\big(\big[(F^{(1)T} F^{(1)}) \odot (F^{(2)T} F^{(2)})\big] F^{(h)T} \Sigma_h F^{(h)}\big)

(75) where

\tilde{Y}_{ij} = Y_{ij} - \mathcal{W} \times_3 g_{ij}^T

(76) Taking the derivatives of the objective function \tilde{\mathcal{L}} with respect to F^{(1)}, F^{(2)} and F^{(h)} and setting them to zero, we have

(77) F^{(1)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \tilde{Y}_{ij} F^{(2)} \operatorname{Diag}(\bar{h}_i^T F^{(h)})\Big] \Big[(F^{(2)T} F^{(2)}) \odot (N F^{(h)T} \Sigma_h F^{(h)} + F^{(h)T} \tilde{H} \tilde{H}^T F^{(h)})\Big]^{-1},

F^{(2)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \tilde{Y}_{ij}^T F^{(1)} \operatorname{Diag}(\bar{h}_i^T F^{(h)})\Big] \Big[(F^{(1)T} F^{(1)}) \odot (N F^{(h)T} \Sigma_h F^{(h)} + F^{(h)T} \tilde{H} \tilde{H}^T F^{(h)})\Big]^{-1}

and

F^{(h)} = \big[\tilde{H} \tilde{H}^T + N \Sigma_h\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{h}_i \big[\operatorname{diag}(F^{(2)T} \tilde{Y}_{ij}^T F^{(1)})\big]^T\Big] \Big[(F^{(1)T} F^{(1)}) \odot (F^{(2)T} F^{(2)})\Big]^{-1}

(78) Note that \tilde{H} \in \mathbb{R}^{K_1 \times N} is made up of all samples' \bar{h}_i, where the columns corresponding to the same identity are equal. Similarly, \tilde{\mathcal{L}} can be rewritten as

(79) \tilde{\mathcal{L}} \propto \sum_{i=1}^{I} \sum_{j=1}^{N_i} \| \hat{Y}_{ij} - W^{(1)} \operatorname{Diag}(\bar{g}_{ij}^T W^{(g)}) W^{(2)T} \|_F^2 + N \operatorname{tr}\big(\big[(W^{(1)T} W^{(1)}) \odot (W^{(2)T} W^{(2)})\big] W^{(g)T} \Sigma_g W^{(g)}\big)

(80) Taking the derivatives of the objective function \tilde{\mathcal{L}} with respect to W^{(1)}, W^{(2)} and W^{(g)} and setting them to zero, we have

(81) W^{(1)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \hat{Y}_{ij} W^{(2)} \operatorname{Diag}(\bar{g}_{ij}^T W^{(g)})\Big] \Big[(W^{(2)T} W^{(2)}) \odot (N W^{(g)T} \Sigma_g W^{(g)} + W^{(g)T} G G^T W^{(g)})\Big]^{-1},

W^{(2)} = \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \hat{Y}_{ij}^T W^{(1)} \operatorname{Diag}(\bar{g}_{ij}^T W^{(g)})\Big] \Big[(W^{(1)T} W^{(1)}) \odot (N W^{(g)T} \Sigma_g W^{(g)} + W^{(g)T} G G^T W^{(g)})\Big]^{-1}

and

W^{(g)} = \big[G G^T + N \Sigma_g\big]^{-1} \Big[\sum_{i=1}^{I} \sum_{j=1}^{N_i} \bar{g}_{ij} \big[\operatorname{diag}(W^{(2)T} \hat{Y}_{ij}^T W^{(1)})\big]^T\Big] \Big[(W^{(1)T} W^{(1)}) \odot (W^{(2)T} W^{(2)})\Big]^{-1}

(82) 3) Overall Algorithm: The entire variational EM algorithm is composed of two steps. In E-Step, fixing all parameters in θ, hidden variables are updated as their Q-distributions. In M-Step, fixing all the hidden variable posteriors we update parameters. The two steps are conducted alternatively until a termination condition is satisfied.

(83) Define H̃_t and G_t as the coefficient matrices at step t. Then the reconstruction error between the original and the reconstructed image sets can be represented as

(84)

$$e(t) = 1 - \frac{\big\|\mathcal{Y} - \mathcal{F}_t \times_3 \tilde{H}_t^T - \mathcal{W}_t \times_3 G_t^T\big\|_F}{\|\mathcal{Y}\|_F}$$

(85) When the reconstruction error is small enough, or the number of iterations exceeds the predefined value, the algorithm stops.
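For concreteness, the alternating E/M structure and the stopping rule above can be sketched as follows. The `e_step`, `m_step` and `reconstruct` callables are placeholders for the closed-form updates derived in this description, and all names here are illustrative rather than part of the patent.

```python
import numpy as np

def reconstruction_measure(Y, recon):
    """Stopping measure from (84): e(t) = 1 - ||Y - recon||_F / ||Y||_F.
    Values closer to 1 indicate a better reconstruction."""
    return 1.0 - np.linalg.norm(Y - recon) / np.linalg.norm(Y)

def variational_em(Y, init_params, e_step, m_step, reconstruct,
                   max_iter=100, tol=1e-4):
    """Alternate the E-step (update the hidden variables' Q-distributions
    with the parameters fixed) and the M-step (update the parameters with
    the posteriors fixed) until e(t) stops changing or the iteration
    budget is exhausted."""
    params = init_params
    prev = -np.inf
    for t in range(max_iter):
        posteriors = e_step(Y, params)    # E-step: update Q-distributions
        params = m_step(Y, posteriors)    # M-step: update parameters
        e_t = reconstruction_measure(Y, reconstruct(params, posteriors))
        if abs(e_t - prev) < tol:         # termination condition
            break
        prev = e_t
    return params, posteriors
```

A real implementation would initialize the factor matrices (for example randomly) and plug in the update equations (77), (81) and the posterior formulas as `m_step` and `e_step`.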

(86) High Dimensional Model Construction

(87) Assume a high-order observed data set $\{y_{ij} \mid y_{ij} \in \mathbb{R}^{D_1 \times \cdots \times D_M};\ i=1,\dots,I,\ j=1,\dots,N_i,\ \sum_i N_i = N\}$;

(88) each sample is a tensor of order M, and the linear discriminant analysis model for higher-order tensors is

$$y_{ij} = \mathcal{F} \times_{(M+1)} h_i^T + \mathcal{W} \times_{(M+1)} g_{ij}^T + \varepsilon_{ij} \qquad (4)$$

(89) wherein h_i ∈ ℝ^{K_1}, g_ij ∈ ℝ^{K_2}, and 𝓕 and 𝓦 are tensors of order (M+1) whose sizes are D_1×···×D_M×K_1 and D_1×···×D_M×K_2, respectively. As in the second-order model, the base tensors are assumed to have the structure of a CP decomposition,

(90)

$$\mathcal{F} = [\![F^{(1)}, F^{(2)}, \dots, F^{(M)}, F^{(h)}]\!] = \sum_{r=1}^{R_1} f_r^{(1)} \circ f_r^{(2)} \circ \cdots \circ f_r^{(M)} \circ f_r^{(h)}$$

and

$$\mathcal{W} = [\![W^{(1)}, W^{(2)}, \dots, W^{(M)}, W^{(g)}]\!] = \sum_{r=1}^{R_2} w_r^{(1)} \circ w_r^{(2)} \circ \cdots \circ w_r^{(M)} \circ w_r^{(g)},$$

where $f_r^{(m)}$ denotes the r-th column of $F^{(m)}$ and ∘ denotes the outer product.
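The CP structure above, a sum of R rank-one terms built from the columns of the factor matrices, can be sketched numerically. The helper name `cp_to_tensor` is our illustrative choice, not the patent's.

```python
import numpy as np

def cp_to_tensor(factors):
    """Assemble a tensor from its CP factors: the sum over r of the outer
    product of the r-th columns of every factor matrix.  `factors` is a
    list of matrices that all share the same number of columns R."""
    R = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    out = np.zeros(shape)
    for r in range(R):
        comp = factors[0][:, r]
        for f in factors[1:]:
            # successive outer products build one rank-one component
            comp = np.multiply.outer(comp, f[:, r])
        out += comp
    return out
```

For two factor matrices A (D1×R) and B (D2×R) this reduces to A Bᵀ, which is a quick sanity check of the construction.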

(91) Solution of the High-Dimensional Model

(92) For the higher-order model, the EM algorithm is again used to solve the problem; the posterior of h_i is still a multivariate normal distribution. Its mean and covariance matrix are, respectively,

(93)

$$\bar{h}_i = \Big(\Phi + \frac{1}{\bar{\rho}N_i}\, I_{K_1}\Big)^{-1} a_i, \qquad \Sigma_h = \big(I_{K_1} + \bar{\rho} N_i\,\Phi\big)^{-1}$$

(94) where

$$\Phi = F^{(h)}\big[(F^{(m)T}F^{(m)}) \circledast (\bar{F}^{(m)T}\bar{F}^{(m)})\big]F^{(h)T}, \qquad a_i = F^{(h)}\,\mathrm{diag}\big(F^{(m)T}\,\bar{Y}_{(m)}^{i}\,\bar{F}^{(m)}\big).$$

(95) Here ⊛ represents the Hadamard (elementwise) product.

(96) Define

$$\bar{y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i}\big(y_{ij} - \mathcal{W}\times_{(M+1)} g_{ij}^T\big),$$

(97) let $\bar{Y}_{(m)}^{i}$ represent the mode-m matricized version of $\bar{y}_i$, and

$$\bar{F}^{(m)} = F^{(M)} \odot \cdots \odot F^{(m+1)} \odot F^{(m-1)} \odot \cdots \odot F^{(1)}.$$

(98) Here ⊙ represents the Khatri-Rao product.
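Under the definitions above, the E-step quantities for h_i in (93)-(94) might be computed as below. This is a sketch: the function name and argument layout are ours, and it assumes the matricization column ordering matches the Khatri-Rao column ordering used here.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker (Khatri-Rao) product of a (I x R) and b (J x R)."""
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])

def h_posterior(F_factors, F_h, Ybar_i, mode, rho, N_i):
    """Posterior mean and covariance of the identity variable h_i as in
    (93)-(94).  F_factors = [F(1), ..., F(M)], F_h is K1 x R1, Ybar_i is
    the mode-`mode` matricization of the averaged, g-corrected sample,
    rho the (expected) noise precision."""
    Fm = F_factors[mode]
    # Fbar = F(M) kr ... kr F(m+1) kr F(m-1) kr ... kr F(1)
    others = [f for k, f in enumerate(F_factors) if k != mode][::-1]
    Fbar = others[0]
    for f in others[1:]:
        Fbar = khatri_rao(Fbar, f)
    K1 = F_h.shape[0]
    # Phi and a_i from (94); '*' is the Hadamard product
    Phi = F_h @ ((Fm.T @ Fm) * (Fbar.T @ Fbar)) @ F_h.T
    a_i = F_h @ np.diag(Fm.T @ Ybar_i @ Fbar)
    Sigma_h = np.linalg.inv(np.eye(K1) + rho * N_i * Phi)
    h_mean = np.linalg.solve(Phi + np.eye(K1) / (rho * N_i), a_i)
    return h_mean, Sigma_h
```

The Gram-matrix identity F̄ᵀF̄ = (Hadamard product of the other modes' Grams) is what keeps Φ cheap to form even when the full Khatri-Rao product is large.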

(99) By a similar computation, the posterior of g_ij is also a multivariate Gaussian distribution; the mean and covariance of the latent variable g_ij become

$$\bar{g}_{ij} = \Big(\Psi + \frac{1}{\bar{\rho}}\, I_{K_2}\Big)^{-1} b_{ij}, \qquad \Sigma_g = \big(I_{K_2} + \bar{\rho}\,\Psi\big)^{-1},$$

where

$$b_{ij} = W^{(g)}\,\mathrm{diag}\big(W^{(m)T}\,\hat{Y}_{(m)}^{ij}\,\bar{W}^{(m)}\big), \qquad \Psi = W^{(g)}\big[(W^{(m)T}W^{(m)}) \circledast (\bar{W}^{(m)T}\bar{W}^{(m)})\big]W^{(g)T}.$$

(100) Let $\hat{y}_{ij} = y_{ij} - \mathcal{F}\times_{(M+1)} h_i^T$, let $\hat{Y}_{(m)}^{ij}$ represent the mode-m matricized version of $\hat{y}_{ij}$, and

$$\bar{W}^{(m)} = W^{(M)} \odot \cdots \odot W^{(m+1)} \odot W^{(m-1)} \odot \cdots \odot W^{(1)}.$$

(101) The optimal Q*(ρ) is still a Gamma distribution.

(102) In the M-step, taking derivatives of $\mathcal{L}$ with respect to F^(m) (m = 1, . . . , M) and F^(h), and letting them be zero, we obtain

(103)

$$F^{(m)} = \Big[\sum_{i=1}^{I}\sum_{j=1}^{N_i}\tilde{Y}_{(m)}^{ij}\,\bar{F}^{(m)}\,\mathrm{Diag}(h_i^T F^{(h)})\Big]\Big[(\bar{F}^{(m)T}\bar{F}^{(m)}) \circledast \big(N F^{(h)T}\Sigma_h F^{(h)} + F^{(h)T}\tilde{H}\tilde{H}^T F^{(h)}\big)\Big]^{-1}$$

and

$$F^{(h)} = \big[\tilde{H}\tilde{H}^T + N\Sigma_h\big]^{-1}\Big[\sum_{i=1}^{I}\sum_{j=1}^{N_i} h_i\big[\mathrm{diag}(\bar{F}^{(m)T}\,\tilde{Y}_{(m)}^{ij\,T}\, F^{(m)})\big]^T\Big]\Big[(F^{(m)T}F^{(m)}) \circledast (\bar{F}^{(m)T}\bar{F}^{(m)})\Big]^{-1},$$

wherein ⊛ denotes the Hadamard product.

(104) in which $\tilde{Y}_{(m)}^{ij}$ denotes the mode-m matricized version of $\tilde{y}_{ij}$, and $\tilde{y}_{ij} = y_{ij} - \mathcal{W}\times_{(M+1)} g_{ij}^T$.

(105) Taking derivatives of $\hat{\mathcal{L}}$ with respect to W^(m) (m = 1, . . . , M) and W^(g), and letting them be zero, we obtain

(106)

$$W^{(m)} = \Big[\sum_{i=1}^{I}\sum_{j=1}^{N_i}\hat{Y}_{(m)}^{ij}\,\bar{W}^{(m)}\,\mathrm{Diag}(g_{ij}^T W^{(g)})\Big]\Big[(\bar{W}^{(m)T}\bar{W}^{(m)}) \circledast \big(N W^{(g)T}\Sigma_g W^{(g)} + W^{(g)T}GG^T W^{(g)}\big)\Big]^{-1}$$

and

$$W^{(g)} = \big[GG^T + N\Sigma_g\big]^{-1}\Big[\sum_{i=1}^{I}\sum_{j=1}^{N_i} g_{ij}\big[\mathrm{diag}(\bar{W}^{(m)T}\,\hat{Y}_{(m)}^{ij\,T}\, W^{(m)})\big]^T\Big]\Big[(W^{(m)T}W^{(m)}) \circledast (\bar{W}^{(m)T}\bar{W}^{(m)})\Big]^{-1}.$$

(107) Recognition Classifier

(108) For a sample Y_p to be classified, we judge its category by calculating its likelihood under the model. The key point of recognition is that samples in the same class share the same identity variable. Assume Y_1, . . . , Y_N are the observed samples; if Y_p and one particular Y_i belong to the same category, they share the same h. For example, in FIG. 2, the left subfigure shows that Y_1 and Y_p belong to the same category and both of their identity variables are h_1; in this case, we denote the likelihood of the observed samples and Y_p as p_1(Y_1, Y_2, Y_p). The right subfigure shows that Y_p and Y_2 come from the same category with the identity variable h_2, and the likelihood can be represented as p_2(Y_1, Y_2, Y_p). We then compare p_1 and p_2 and find the largest likelihood value; Y_p is assigned the label of the sample corresponding to the largest likelihood. The likelihood of the samples can be calculated as follows.

(109) Y_ij follows a matrix-variate normal distribution conditioned on the latent variables h and g. For convenience, we work with p(y_ij) instead of p(Y_ij), where y_ij := vec(Y_ij). Thus, the proposed model (1) becomes
y_ij = F h_i + W g_ij + e_ij,

(110) where F = (F^(2) ⊙ F^(1)) F^(h)T and W = (W^(2) ⊙ W^(1)) W^(g)T, with ⊙ the Khatri-Rao product. These factor matrices have been learned in the training phase and are therefore known at recognition time. Thus, the likelihood under the hypothesis that y_p shares its identity variable with the n-th sample equals
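The relation between the tensor model and its vectorized form, F = (F^(2) ⊙ F^(1)) F^(h)T, can be checked numerically; the dimensions below are arbitrary illustrative choices, and vec is taken column-major.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker (Khatri-Rao) product."""
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])

rng = np.random.default_rng(0)
D1, D2, K1, R1 = 4, 3, 5, 2            # illustrative sizes
F1 = rng.normal(size=(D1, R1))
F2 = rng.normal(size=(D2, R1))
Fh = rng.normal(size=(K1, R1))
h = rng.normal(size=K1)

# Tensor route: build the CP base tensor and contract mode 3 with h
F_tensor = np.einsum('ar,br,kr->abk', F1, F2, Fh)
Y = np.einsum('abk,k->ab', F_tensor, h)

# Vectorized route: F = (F2 kr F1) Fh^T acting directly on h
F_mat = khatri_rao(F2, F1) @ Fh.T
assert np.allclose(Y.flatten(order='F'), F_mat @ h)
```

This identity is why the recognition step can work entirely with the flat factor matrices F and W instead of the base tensors.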

(111)

$$p_n(y_{1\ldots N}, y_p) = \prod_{i=1,\, i\neq n}^{N} p(y_i)\; p(y_p, y_n) \qquad (20)$$

(112) Next, we illustrate the marginal distribution of n images y_1, . . . , y_n that share the same identity variable h. The generative equations become

(113)

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} F & W & 0 & \cdots & 0 \\ F & 0 & W & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ F & 0 & 0 & \cdots & W \end{bmatrix} \begin{bmatrix} h \\ g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (5)$$

(114) which we write compactly as
y′ = A z′ + m′ + e′.

(115) Taking the probabilistic assumptions
p(z′) = 𝒩(0, I) and p(e′) = 𝒩(0, σ²I)

(116) converts model (5) into the form of probabilistic PCA. Thus, the marginal distribution over the observed y′ is easily obtained by integrating out the latent variables:
p(y′) = 𝒩(m′, AAᵀ + σ²I).  (21)

(117) By (21), both p(y_i) and p(y_p, y_n) in (20) can be calculated, yielding p_n(y_{1...N}, y_p). Comparing all p_n(y_{1...N}, y_p) and finding the largest value, the sample Y_p is assigned the label of the sample corresponding to the largest p_n(y_{1...N}, y_p).
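A sketch of this likelihood-comparison classifier follows, for the case of one gallery sample per candidate identity. The function name, argument layout, and the zero-mean assumption m′ = 0 are our illustrative choices, not the patent's.

```python
import numpy as np

def gauss_logpdf(y, cov):
    """Log density of a zero-mean multivariate normal (m' = 0 assumed)."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * cov)
    return -0.5 * (logdet + y @ np.linalg.solve(cov, y))

def classify(F, W, sigma2, gallery, y_p):
    """Return the index n of the gallery sample most likely to share its
    identity variable h with the probe y_p, comparing the likelihoods
    p_n of (20) via the marginal (21).  F and W are the vectorized
    factor matrices learned in training; `gallery` is a list of
    vectorized samples."""
    D = F.shape[0]
    single = F @ F.T + W @ W.T + sigma2 * np.eye(D)  # cov of one sample
    # joint cov of two samples sharing h: cross-covariance block is FF^T
    joint = np.block([[single, F @ F.T], [F @ F.T, single]])
    base = sum(gauss_logpdf(y, single) for y in gallery)
    scores = []
    for n, y_n in enumerate(gallery):
        pair = np.concatenate([y_n, y_p])
        # log p_n = sum_{i != n} log p(y_i) + log p(y_n, y_p)
        scores.append(base - gauss_logpdf(y_n, single)
                      + gauss_logpdf(pair, joint))
    return int(np.argmax(scores))
```

Since the `base` term is shared across hypotheses, the decision effectively compares log p(y_n, y_p) − log p(y_n) over n, i.e. how much the probe raises the evidence for each gallery identity.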

(118) Experimental Effect of the Invention

(119) The following two public databases are used in the reconstruction experiments.

(120) 1) The Extended Yale Face Database

(121) (http://vision.ucsd.edu/content/yale-face-database)

(122) All images in the extended Yale database were captured from 38 individuals, with a total of 2414 frontal face images. There are 59-64 images per individual, captured in the same laboratory under varying lighting conditions. For each individual, we randomly select 40 images as training samples and use the remaining images as testing samples. All images are cropped and normalized to 32×32 pixels. Table 1 lists the comparison results between the invention and the traditional PLDA method, including the number of free parameters of the projections, reconstruction results, and recognition rates. In both methods, the reduced dimension on the extended Yale database is K1=38, K2=50. Because of the definition of the reconstruction error in (84), a larger value in the table indicates a better reconstruction. We run each experiment ten times and record the mean recognition rate and variance. The results of the TPLDA algorithm are given for three cases: R1=R2=170, 200, 250. From these results, we can see that the number of free parameters of TPLDA is obviously lower than that of the PLDA algorithm while the recognition rates are comparable.

(123) TABLE 1

                            PLDA             TPLDA
                                             R = 170          R = 200          R = 250
Number of free parameters   90112            25184            28244            33344
Reconstruction error        0.8511           0.8686           0.8754           0.8837
Recognition rate            0.9512 ± 0.0027  0.9555 ± 0.0034  0.9588 ± 0.0032  0.9585 ± 0.0029

(124) FIG. 3 shows the visual reconstruction results on the extended Yale database. The first row shows 12 original images. The second row lists the images reconstructed by the shared subspace component 𝓕 ×_3 h_iᵀ; they are the same for each identity and do not contain illumination information. The images reconstructed by 𝓦 ×_3 g_ijᵀ are shown in the third row; we can observe that they capture more detailed illumination information.

(125) 2) AR Face Database

(126) (http://rvl1.ecn.purdue.edu/aleix/aleix_face_DB.html)

(127) The AR face database contains 126 subjects with over 4000 color images. The images of each subject are captured under changing facial expressions, illumination conditions, and occlusions. Each individual has 26 frontal-view images, where the first 13 images were taken in one session and the last 13 images were taken two weeks later. All original images are RGB color images with a size of 768×576. In the experiments, all images are downsampled to 70×60 pixels. One hundred individuals are selected, whose first 13 images are used for training and whose last 13 images are used for testing.

(128) Table 2 lists the comparison results between the invention and the traditional PLDA method, with reduced dimension K1=100, K2=100. In this experiment, we record the above-mentioned results under four cases: R1=R2=150, 170, 200, 250. From this table, we can conclude that even when the number of TPLDA parameters is far less than the number of PLDA parameters, the recognition rate is comparable to or higher than that of the traditional method.

(129) TABLE 2

                            PLDA             TPLDA
                                             R = 150          R = 170          R = 200          R = 250
Number of free parameters   840000           69000            78200            92000            115000
Reconstruction error        0.9165           0.9067           0.9089           0.9113           0.9144
Recognition rate            0.7976 ± 0.0038  0.7927 ± 0.0110  0.8030 ± 0.0102  0.8242 ± 0.0058  0.8341 ± 0.0053

(130) FIG. 4 shows the reconstructed results of the invention on the AR database. All images of one individual are shown in the first row. The second and third rows are the images reconstructed by 𝓕 ×_3 h_iᵀ and 𝓦 ×_3 g_ijᵀ, respectively. It can be clearly seen from the figure that the model of the invention separates a person's identity information from the detail information well.

(131) Recognition and Classification of 3D Video Sequences

(132) To verify the effectiveness of the proposed method on video sequences, its clustering effect is tested on the Ballet database. The Ballet database contains 44 video clips captured from three subjects performing eight complex action patterns. In this experiment, each video is clipped into subgroups treated as image sets, yielding 713 image sets. In the training phase, we randomly select one third of the samples for training and use the remaining samples for testing. Table 3 lists the recognition results of six algorithms, with reduced dimension 30, 50 and 70, respectively. From the table, we can see that the proposed TPLDA algorithm obtains the best results.

(133) TABLE 3

Algorithm   30                 50                 70
PCA         0.8407             0.8323             0.8365
CP          0.8394 ± 0.01455   0.8216 ± 0.0085    0.8115 ± 0.0111
TBV-DR      0.8476 ± 0.0125    0.8430 ± 0.0129    0.8361 ± 0.0137
TRPDA       0.8553 (3, 3, 3)   0.8491 (5, 5, 3)   0.8532 (8, 8, 5)
STDA        0.8550 (5, 5, 3)   0.8512 (10, 10, 5) 0.8491 (15, 15, 10)
TPLDA       0.8598 ± 0.0178    0.8523 ± 0.0149    0.8551 ± 0.0170

(134) The above contents are only preferred embodiments of the present invention and do not limit the present invention in any manner. Any improvements, amendments and alternative changes made to the above embodiments according to the technical spirit of the present invention shall fall within the claimed scope of the present invention.