Hypercomplex deep learning methods, architectures, and apparatus for multimodal small, medium, and large-scale data representation, analysis, and applications
11645835 · 2023-05-09
Assignee
Inventors
Cpc classification
G06V10/454
PHYSICS
G06F18/2148
PHYSICS
G06V20/194
PHYSICS
G06N3/082
PHYSICS
G06V10/98
PHYSICS
International classification
G06N3/082
PHYSICS
G06V10/44
PHYSICS
G06V10/98
PHYSICS
Abstract
A method and system for creating hypercomplex representations of data includes, in one exemplary embodiment, at least one set of training data with associated labels or desired response values, transforming the data and labels into hypercomplex values, methods for defining hypercomplex graphs of functions, training algorithms to minimize the cost of an error function over the parameters in the graph, and methods for reading hierarchical data representations from the resulting graph. Another exemplary embodiment learns hierarchical representations from unlabeled data. The method and system, in another exemplary embodiment, may be employed for biometric identity verification by combining multimodal data collected using many sensors, including, for example, data such as anatomical characteristics, behavioral characteristics, demographic indicators, and artificial characteristics. In other exemplary embodiments, the system and method may learn hypercomplex function approximations in one environment and transfer the learning to other target environments. Other exemplary applications of the hypercomplex deep learning framework include: image segmentation; image quality evaluation; image steganalysis; face recognition; event embedding in natural language processing; machine translation between languages; object recognition; medical applications such as breast cancer mass classification; multispectral imaging; audio processing; color image filtering; and clothing identification.
Claims
1. A method for training one or more convolutional neural network (CNN) layers, the method comprising: by a computer processor having a memory coupled thereto: creating a first layer of the CNN comprising a hypercomplex representation of input training data, wherein the hypercomplex representation of the input training data comprises a first tensor, wherein a first dimension of the first tensor corresponds to a first dimension of pixels in an input image, wherein a second dimension of the first tensor corresponds to a second dimension of pixels in the input image, wherein the first dimension is orthogonal to the second dimension, and wherein a third dimension of the first tensor corresponds to different spectral components of the input image in a multispectral representation; and performing hypercomplex convolution of the first tensor with a second tensor to produce a third output tensor, wherein the second tensor is a hypercomplex representation of first weights that relate one or more distinct subsets of the pixels in the input image with each pixel in the output tensor, and wherein the third output tensor serves as input training data for a second layer of the CNN; adjusting the first weights in the second tensor such that an error function related to the input training data and its hypercomplex representation is reduced; and storing the adjusted first weights in the memory as trained parameters of the first layer of the CNN.
2. The method of claim 1, wherein the distinct subsets of pixels are selected to comprise local windows in the spatial dimensions of the input image.
3. The method of claim 1, wherein the input image comprises a three-dimensional image, wherein a third dimension of the first tensor corresponds to a depth of pixels in the input image, and wherein the depth is orthogonal to the first dimension and the second dimension.
4. The method of claim 1, wherein adjusting the first weights in the second tensor such that the error function related to the input training data and its hypercomplex representation is reduced comprises performing a steepest descent calculation on the first weights in the second tensor.
5. The method of claim 1, the method further comprising: creating a second layer of the CNN comprising a hypercomplex representation of the third output tensor, wherein the hypercomplex representation of the third output tensor comprises a fourth tensor, performing hypercomplex convolution of the fourth tensor with a fifth tensor to produce a sixth output tensor, wherein the fifth tensor is a hypercomplex representation of second weights; adjusting the second weights in the fifth tensor such that an error function related to the third output tensor and its hypercomplex representation is reduced; and storing the adjusted second weights in the memory as trained parameters of the second layer of the CNN.
6. The method of claim 1, the method further comprising: iteratively repeating said creating, performing, adjusting and storing on one or more subsequent layers of the CNN.
7. The method of claim 1, wherein performing hypercomplex convolution of the first tensor with the second tensor comprises performing quaternion multiplication of the first tensor with the second tensor.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Advantages of the present invention will become apparent to those skilled in the art with the benefit of the following detailed description of embodiments and upon reference to the accompanying drawings in which:
(62) While the invention may be susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but to the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(63) It is to be understood the present invention is not limited to particular devices or methods, which may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include singular and plural referents unless the content clearly dictates otherwise. Furthermore, the word “may” is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term “include,” and derivations thereof, mean “including, but not limited to.” The term “coupled” means directly or indirectly connected.
(64) The Detailed Description will be set forth according to the following outline:
(65) 1 Mathematical Definition
1.1 Hypercomplex layer introduction
1.2 Quaternion convolution
1.3 Octonion Convolution
1.4 Quaternion convolution for neural networks
1.5 Locally-connected layers
1.6 Hypercomplex polar form conversion
1.7 Activation function angle quantization
1.8 Hypercomplex layer learning rule
1.9 Hypercomplex error propagation
1.10 Unsupervised learning for hypercomplex layer pre-training and deep belief networks
1.11 Hypercomplex layer pre-training using expert features
1.12 Hypercomplex transfer learning
1.13 Hypercomplex layer with state
1.14 Hypercomplex tensor layer
1.15 Hypercomplex dropout
1.16 Hypercomplex pooling
(66) 2 Exemplary implementations of hypercomplex layer
2.1 Convolution implementations: GEMM
2.1.1 Using 16 large real GEMM calls
2.1.2 Using 8 large real GEMM calls
2.2 Convolution Implementations: cuDNN
2.3 Theano implementation
2.4 GPU and CPU implementations
2.5 Phase-based activation and angle quantization implementation
(67) 3 Exemplary hypercomplex deep neural network structures
3.1 Feedforward neural network
3.2 Neural network with pooling
3.3 Recurrent neural network
3.4 Neural network with layer jumps
3.5 State layers
3.6 Parallel Filter Sizes
3.7 Parallel Graphs
3.8 Combinations of Hypercomplex and Other Modules
3.9 Hypercomplex layers in a graph
(68) 4 Exemplary applications
4.1 Image super resolution
4.2 Image segmentation
4.3 Image quality evaluation
4.4 Image steganalysis
4.5 Face recognition
4.6 Natural language processing: event embedding
4.7 Natural language processing: machine translation
4.8 Unsupervised learning: object recognition
4.9 Control systems
4.10 Generative Models
4.11 Medical imaging: breast mass classification
4.12 Medical Imaging: MRI
4.13 Hypercomplex processing of multi-sensor data
4.14 Hypercomplex multispectral image processing and prediction
4.15 Hypercomplex image filtering
4.16 Hypercomplex processing of gray level images
4.17 Hypercomplex processing of enhanced color images
4.18 Multimodal biometric identity matching
4.19 Multimodal biometric identity matching with autoencoder
4.20 Multimodal biometric identity matching with unlabeled data
4.21 Multimodal biometric identity matching with transfer learning
4.22 Clothing identification
1 Mathematical Definition
(69) A hypercomplex neural network layer is defined presently. In the interest of clarity, a quaternion example is defined below. However, it is to be understood that the method described is applicable to any hypercomplex algebra, including but not limited to biquaternions, exterior algebras, group algebras, matrices, octonions, and quaternions.
(70) 1.1 Hypercomplex Layer Introduction
(71) An exemplary hypercomplex layer is shown in
(72) Mathematically, in the quaternion case, these steps are defined as follows:
(73) Convolution Step:
(74) Let $a \in \mathbb{H}^{m \times n}$ denote the input to the quaternion layer. The first step, convolution, produces the output $s \in \mathbb{H}^{r \times t}$, as defined in Equation 1:
(75)
(76) where $k \in \mathbb{H}^{p \times q}$ represents the convolution filter kernel and the × symbol denotes quaternion multiplication. An alternative notation for hypercomplex convolution is the asterisk, where $s = k *_h a$. Details about hypercomplex convolution are explained in Sections 1.2 and 1.4.
(77) Activation Step:
(78) Continuing with the quaternion example, the activation function is applied to $s \in \mathbb{H}^{r \times t}$ and may be any mathematical function. An exemplary function used herein is a nonlinear function that converts the quaternion values into polar (angular) representation, quantizes the phase angles, and then recomputes updated quaternion values on an orthonormal basis (1, i, j, k). More information about this function is provided in Sections 1.6 and 1.7.
(79) 1.2 Quaternion Convolution
(80) Exemplary methods for hypercomplex convolution are described presently. The examples described herein all pertain to quaternion convolution.
(81) A quaternion example of hypercomplex convolution is pictured in the accompanying drawings, in which a hypercomplex kernel $k \in \mathbb{H}^{p \times q}$ is convolved with a hypercomplex input $a \in \mathbb{H}^{m \times n}$. The process for convolution is to extract the real-valued coefficients from k and a, perform some set of operations on the coefficients, and finally recombine the coefficients into an output $s \in \mathbb{H}^{r \times t}$.
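(82) A specific example of the approach described above is shown in the accompanying drawings. The product of two quaternions $q_0 = a_0 + b_0 i + c_0 j + d_0 k \in \mathbb{H}$ and $q_1 = a_1 + b_1 i + c_1 j + d_1 k \in \mathbb{H}$ is:
(83)
$$
\begin{aligned}
q_0 \times q_1 ={} & (a_0 a_1 - b_0 b_1 - c_0 c_1 - d_0 d_1) \\
& + i\,(a_0 b_1 + b_0 a_1 + c_0 d_1 - d_0 c_1) \\
& + j\,(a_0 c_1 - b_0 d_1 + c_0 a_1 + d_0 b_1) \\
& + k\,(a_0 d_1 + b_0 c_1 - c_0 b_1 + d_0 a_1)
\end{aligned}
\tag{2}
$$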
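By way of illustration only, the quaternion product referred to as Equation 2 may be sketched in Python. The sketch assumes that each quaternion is stored as a tuple (a, b, c, d) standing for a + bi + cj + dk; the function name `qmul` is hypothetical and does not appear in the original disclosure.

```python
def qmul(p, q):
    """Hamilton product p x q of two quaternions given as (a, b, c, d) tuples."""
    a0, b0, c0, d0 = p
    a1, b1, c1, d1 = q
    return (
        a0 * a1 - b0 * b1 - c0 * c1 - d0 * d1,  # real part
        a0 * b1 + b0 * a1 + c0 * d1 - d0 * c1,  # i part
        a0 * c1 - b0 * d1 + c0 * a1 + d0 * b1,  # j part
        a0 * d1 + b0 * c1 - c0 * b1 + d0 * a1,  # k part
    )


# Basis elements, used to show that the product is not commutative.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(qmul(i, j))  # (0, 0, 0, 1), i.e. i x j = k
print(qmul(j, i))  # (0, 0, 0, -1), i.e. j x i = -k
```

The two printed results differ in sign, which is the non-commutativity that later sections rely upon.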
(84) Because convolution is a linear operator, the multiplication in Equation 2 may be replaced by the convolution operator of Equation 1. Correspondingly, a convolution algorithm to compute s(x,y)=k(x,y)*a(x,y) for quaternion matrices is shown in Equation 3:
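(85)
$$
\begin{aligned}
s(x,y) = k(x,y) *_h a(x,y) ={} & \left[a_k *_r a_a - b_k *_r b_a - c_k *_r c_a - d_k *_r d_a\right] \\
& + i\left[a_k *_r b_a + b_k *_r a_a + c_k *_r d_a - d_k *_r c_a\right] \\
& + j\left[a_k *_r c_a - b_k *_r d_a + c_k *_r a_a + d_k *_r b_a\right] \\
& + k\left[a_k *_r d_a + b_k *_r c_a - c_k *_r b_a + d_k *_r a_a\right]
\end{aligned}
\tag{3}
$$
where $a_k, b_k, c_k, d_k$ and $a_a, b_a, c_a, d_a$ are the real-valued coefficient arrays of the kernel k and the input a on the basis (1, i, j, k), each evaluated at (x, y), $*_h$ denotes quaternion convolution, $*_r$ denotes real-valued convolution, and (x,y) are the 2d array indices.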
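The sixteen-convolution form of Equation 3 may be sketched as follows. This is a minimal illustration under stated assumptions: quaternion arrays are stored as real NumPy arrays of shape (4, rows, cols), the real convolution is a "full" convolution (the output-size convention is not stated in the text), and the helper names `conv2d_real` and `qconv16` are hypothetical.

```python
import numpy as np


def conv2d_real(k, a):
    """Full 2-D real convolution: out[x, y] = sum over (u, v) of k[u, v] * a[x - u, y - v]."""
    p, q = k.shape
    m, n = a.shape
    out = np.zeros((m + p - 1, n + q - 1))
    for u in range(p):
        for v in range(q):
            # Each kernel tap adds a shifted, scaled copy of the input.
            out[u:u + m, v:v + n] += k[u, v] * a
    return out


def qconv16(k, a):
    """Quaternion convolution k *_h a built from sixteen real convolutions.

    k and a hold the coefficients of (1, i, j, k) along their first axis.
    """
    ak, bk, ck, dk = k
    aa, ba, ca, da = a
    c = conv2d_real
    return np.stack([
        c(ak, aa) - c(bk, ba) - c(ck, ca) - c(dk, da),  # real part
        c(ak, ba) + c(bk, aa) + c(ck, da) - c(dk, ca),  # i part
        c(ak, ca) - c(bk, da) + c(ck, aa) + c(dk, ba),  # j part
        c(ak, da) + c(bk, ca) - c(ck, ba) + c(dk, aa),  # k part
    ])


# A 1x1 kernel and a 1x1 input reduce to a single quaternion product.
k_i = np.zeros((4, 1, 1))
k_i[1] = 1.0          # kernel holds the quaternion i
a_j = np.zeros((4, 1, 1))
a_j[2] = 1.0          # input holds the quaternion j
print(qconv16(k_i, a_j)[:, 0, 0])  # only the k coefficient is nonzero
```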
(86) One may observe that Equation 3 requires sixteen real-valued convolution operations to perform a single quaternion convolution. However, due to the linearity of convolution, high-speed techniques for quaternion multiplication may also be applied to the convolution operation. For example,
$$
\begin{aligned}
t_1(x,y) &= a_k(x,y) *_r a_a(x,y) \\
t_2(x,y) &= d_k(x,y) *_r c_a(x,y) \\
t_3(x,y) &= b_k(x,y) *_r d_a(x,y) \\
t_4(x,y) &= c_k(x,y) *_r b_a(x,y) \\
t_5(x,y) &= \bigl(a_k(x,y) + b_k(x,y) + c_k(x,y) + d_k(x,y)\bigr) *_r \bigl(a_a(x,y) + b_a(x,y) + c_a(x,y) + d_a(x,y)\bigr) \\
t_6(x,y) &= \bigl(a_k(x,y) + b_k(x,y) - c_k(x,y) - d_k(x,y)\bigr) *_r \bigl(a_a(x,y) + b_a(x,y) - c_a(x,y) - d_a(x,y)\bigr) \\
t_7(x,y) &= \bigl(a_k(x,y) - b_k(x,y) + c_k(x,y) - d_k(x,y)\bigr) *_r \bigl(a_a(x,y) - b_a(x,y) + c_a(x,y) - d_a(x,y)\bigr) \\
t_8(x,y) &= \bigl(a_k(x,y) - b_k(x,y) - c_k(x,y) + d_k(x,y)\bigr) *_r \bigl(a_a(x,y) - b_a(x,y) - c_a(x,y) + d_a(x,y)\bigr)
\end{aligned}
\tag{4}
$$
(87) In Equation 4, $*_r$ represents a real-valued convolution; one will observe that there are eight real-valued convolutions.
(88) To complete the quaternion convolution s(x,y), the temporary terms t.sub.i are scaled and summed as shown in Equation 5:
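(89)
$$
\begin{aligned}
s(x,y) ={} & \left[\,2t_1 - \tfrac{1}{4}\,(t_5 + t_6 + t_7 + t_8)\right] \\
& + i\left[-2t_2 + \tfrac{1}{4}\,(t_5 + t_6 - t_7 - t_8)\right] \\
& + j\left[-2t_3 + \tfrac{1}{4}\,(t_5 - t_6 + t_7 - t_8)\right] \\
& + k\left[-2t_4 + \tfrac{1}{4}\,(t_5 - t_6 - t_7 + t_8)\right]
\end{aligned}
\tag{5}
$$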
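The eight-convolution procedure may be checked numerically against the sixteen-convolution form. The sketch below is illustrative only and rests on several assumptions: the second operand of the second temporary term is taken to be the j coefficient of the input, the scale factors are those that make the result agree with the direct quaternion product, the real convolution is a "full" convolution, and all function names are hypothetical.

```python
import numpy as np


def conv2d_real(k, a):
    """Full 2-D real convolution of kernel k with input a."""
    p, q = k.shape
    m, n = a.shape
    out = np.zeros((m + p - 1, n + q - 1))
    for u in range(p):
        for v in range(q):
            out[u:u + m, v:v + n] += k[u, v] * a
    return out


def qconv16(k, a):
    """Reference quaternion convolution using sixteen real convolutions."""
    (ak, bk, ck, dk), (aa, ba, ca, da), c = k, a, conv2d_real
    return np.stack([
        c(ak, aa) - c(bk, ba) - c(ck, ca) - c(dk, da),
        c(ak, ba) + c(bk, aa) + c(ck, da) - c(dk, ca),
        c(ak, ca) - c(bk, da) + c(ck, aa) + c(dk, ba),
        c(ak, da) + c(bk, ca) - c(ck, ba) + c(dk, aa),
    ])


def qconv8(k, a):
    """Quaternion convolution using only eight real convolutions."""
    (ak, bk, ck, dk), (aa, ba, ca, da), c = k, a, conv2d_real
    # Four single-coefficient products.
    t1, t2, t3, t4 = c(ak, aa), c(dk, ca), c(bk, da), c(ck, ba)
    # Four products of signed sums of all coefficients.
    t5 = c(ak + bk + ck + dk, aa + ba + ca + da)
    t6 = c(ak + bk - ck - dk, aa + ba - ca - da)
    t7 = c(ak - bk + ck - dk, aa - ba + ca - da)
    t8 = c(ak - bk - ck + dk, aa - ba - ca + da)
    # Scale and sum the temporary terms into the four output coefficients.
    return np.stack([
        2 * t1 - (t5 + t6 + t7 + t8) / 4,
        -2 * t2 + (t5 + t6 - t7 - t8) / 4,
        -2 * t3 + (t5 - t6 + t7 - t8) / 4,
        -2 * t4 + (t5 - t6 - t7 + t8) / 4,
    ])


rng = np.random.default_rng(0)
kernel = rng.standard_normal((4, 3, 3))
image = rng.standard_normal((4, 6, 5))
print(np.allclose(qconv8(kernel, image), qconv16(kernel, image)))
```

The saving comes at the cost of additional real-valued additions before and after the convolutions, which are typically much cheaper than the convolutions themselves.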
(90) 1.3 Octonion Convolution
(91) Octonions represent another example of hypercomplex numbers. Octonion convolution may be performed using quaternion convolution as outlined presently:
(92) Let $o_n \in \mathbb{O}$:
$$o_n = e_0 o_{0n} + e_1 o_{1n} + e_2 o_{2n} + e_3 o_{3n} + e_4 o_{4n} + e_5 o_{5n} + e_6 o_{6n} + e_7 o_{7n} \tag{6}$$
(93) To convolve octonions $o_a *_o o_b$, first represent each argument as a pair of quaternions, resulting in $w, x, y, z \in \mathbb{H}$:
$$
\begin{aligned}
w &= 1\,o_{0a} + i\,o_{1a} + j\,o_{2a} + k\,o_{3a} \\
x &= 1\,o_{4a} + i\,o_{5a} + j\,o_{6a} + k\,o_{7a} \\
y &= 1\,o_{0b} + i\,o_{1b} + j\,o_{2b} + k\,o_{3b} \\
z &= 1\,o_{4b} + i\,o_{5b} + j\,o_{6b} + k\,o_{7b}
\end{aligned}
\tag{7}
$$
(94) Next, perform quaternion convolution, for example, as described in Section 1.2:
$$
\begin{aligned}
s_L &= w *_h y - z *_h x^{*} \\
s_R &= w *_h z - y *_h x
\end{aligned}
\tag{8}
$$
(95) where a superscript * denotes quaternion conjugation and $*_h$ denotes quaternion convolution.
(96) Finally, recombine $s_L$ and $s_R$ to form the final result $s = o_a *_o o_b \in \mathbb{O}$:
$$
\begin{aligned}
s_L &= 1\,a_L + i\,b_L + j\,c_L + k\,d_L \\
s_R &= 1\,a_R + i\,b_R + j\,c_R + k\,d_R \\
s &= e_0 a_L + e_1 b_L + e_2 c_L + e_3 d_L + e_4 a_R + e_5 b_R + e_6 c_R + e_7 d_R
\end{aligned}
\tag{9}
$$
(97) The exemplary process of octonion convolution described above is shown in
(98) 1.4 Quaternion Convolution for Neural Networks
(99) Sections 1.2 and 1.3 describe examples of hypercomplex convolution of two-dimensional arrays. The techniques described above may be employed in the multi-dimensional convolution that is typically used for neural network tasks.
(100) For example,
(101) The approach described above may also be extended to input arrays with a depth larger than 1, thereby causing the filter kernel to become 4-dimensional. Conceptually, a loop over the input and output dimensions may be performed. The inside of the loop contains two-dimensional convolutions as described in the prior section. Note further that each two-dimensional convolution takes input values from all depth levels of the input array. If, for example, the 10×10 input array had an input depth of two, then each convolution kernel would have 3×3×2 weights, rather than 3×3 weights as described above. Therefore, the typical shape of a 4-dimensional hypercomplex convolution kernel is $(D_o, D_i, K_x, K_y)$, where $D_o$ represents the output depth, $D_i$ represents the input depth, and $K_x$ and $K_y$ are the 2d filter kernel dimensions. Finally, in the present hypercomplex example of quaternions, all of the data points above are quaternion values and all convolutions are quaternion in nature. As will be discussed in the implementation sections below, existing computer software libraries typically do not support quaternion arithmetic. Therefore, an additional dimension may be added in software to represent the four components (1, i, j, k) of a quaternion.
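The loop over input and output depth described above may be sketched as follows. The sketch is a minimal illustration rather than a definitive implementation: it assumes a real-valued kernel array of shape (D_o, D_i, 4, K_x, K_y) and an input of shape (D_i, 4, X_i, Y_i), with the extra axis of size 4 holding the (1, i, j, k) components, and it assumes "full" convolution. The function names are hypothetical.

```python
import numpy as np


def conv2d_real(k, a):
    """Full 2-D real convolution of kernel k with input a."""
    p, q = k.shape
    m, n = a.shape
    out = np.zeros((m + p - 1, n + q - 1))
    for u in range(p):
        for v in range(q):
            out[u:u + m, v:v + n] += k[u, v] * a
    return out


def qconv2d(k, a):
    """Two-dimensional quaternion convolution of (4, rows, cols) arrays."""
    (ak, bk, ck, dk), (aa, ba, ca, da), c = k, a, conv2d_real
    return np.stack([
        c(ak, aa) - c(bk, ba) - c(ck, ca) - c(dk, da),
        c(ak, ba) + c(bk, aa) + c(ck, da) - c(dk, ca),
        c(ak, ca) - c(bk, da) + c(ck, aa) + c(dk, ba),
        c(ak, da) + c(bk, ca) - c(ck, ba) + c(dk, aa),
    ])


def qconv_layer(kernel, x):
    """Apply a (D_o, D_i, 4, K_x, K_y) kernel to a (D_i, 4, X_i, Y_i) input."""
    d_o, d_i, _, k_x, k_y = kernel.shape
    _, _, x_i, y_i = x.shape
    out = np.zeros((d_o, 4, x_i + k_x - 1, y_i + k_y - 1))
    for o in range(d_o):          # loop over output depth
        for i in range(d_i):      # each output sums over all input depths
            out[o] += qconv2d(kernel[o, i], x[i])
    return out


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 2, 4, 3, 3))   # D_o=3, D_i=2, 3x3 filters
x = rng.standard_normal((2, 4, 10, 10))         # depth-2, 10x10 quaternion input
print(qconv_layer(kernel, x).shape)
```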
(102) 1.5 Locally-Connected Layers
(103) The convolutions described thus far are sums of local, weighted windows of the input, where the filter kernel represents a set of shared weights for all window locations. Rather than using shared weights in a filter kernel, a separate set of weights may be used for each window location. In this case, one has a locally-connected hypercomplex layer, rather than a convolutional hypercomplex layer.
(104) 1.6 Hypercomplex Polar Form Conversion
(105) The exemplary quaternion conversion to and from polar (angular) form is defined presently. Let a single quaternion value output from the convolution step be denoted as $s \in \mathbb{H}$:
$$s = a + bi + cj + dk \tag{10}$$
(106) The polar conversion representation is shown in Equation 11:
$$s = |s|\, e^{i\phi} e^{j\theta} e^{k\psi} \tag{11}$$
where $s_p \in \mathbb{H}$ represents a single quaternion number and $(\phi, \theta, \psi)$ represent the quaternion phase angles. Note that the term representing the norm of the quaternion, $|s|$, is intentionally set to one during the polar conversion.
(107) The angles (ϕ, θ, ψ) are calculated as shown in Equation 12:
(108)
(109) The definition of $\tan_2^{-1}(x, y)$ is given in Equation 13:
(110)
(111) Most software implementations of Equation 13 return zero for the case of x=0, y=0, rather than returning an error or a Not a Number (NaN) value.
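As one example of this behavior, the two-argument arctangent in the Python standard library returns zero when both arguments are zero. The snippet below is offered only as an illustration of that convention; note that `math.atan2` takes its arguments in the order (y, x), which may differ from the argument order used in Equation 13.

```python
import math

# Both arguments zero: the result is 0.0 rather than an error or NaN.
origin_angle = math.atan2(0.0, 0.0)
print(origin_angle)

# Ordinary case for comparison: the point (x=1, y=1) lies at 45 degrees.
diagonal_angle = math.atan2(1.0, 1.0)
print(math.degrees(diagonal_angle))
```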
(112) Finally, to convert the quaternion polar form in Equation 11 to the standard form of Equation 10, one applies Euler's formula as shown in Equation 14:
(113)
(114) In Equation 14, $s_u \in \mathbb{H}$ has the “u” subscript because it is the unit-norm version of our original variable $s \in \mathbb{H}$. Not restricting $|s|$ to 1 in Equation 10 would result in $s_u = s$.
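The conversion from polar form back to standard form may be sketched as follows. The sketch assumes that the conversion amounts to expanding each of the three exponential factors of Equation 11 with Euler's formula and multiplying them in the order shown there, with the norm fixed at one; the function names `qmul` and `polar_to_quaternion` are hypothetical.

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a0, b0, c0, d0 = p
    a1, b1, c1, d1 = q
    return (
        a0 * a1 - b0 * b1 - c0 * c1 - d0 * d1,
        a0 * b1 + b0 * a1 + c0 * d1 - d0 * c1,
        a0 * c1 - b0 * d1 + c0 * a1 + d0 * b1,
        a0 * d1 + b0 * c1 - c0 * b1 + d0 * a1,
    )


def polar_to_quaternion(phi, theta, psi):
    """Unit quaternion e^(i*phi) e^(j*theta) e^(k*psi) via Euler's formula."""
    e_i = (math.cos(phi), math.sin(phi), 0.0, 0.0)      # cos(phi) + i sin(phi)
    e_j = (math.cos(theta), 0.0, math.sin(theta), 0.0)  # cos(theta) + j sin(theta)
    e_k = (math.cos(psi), 0.0, 0.0, math.sin(psi))      # cos(psi) + k sin(psi)
    return qmul(qmul(e_i, e_j), e_k)


s_u = polar_to_quaternion(0.3, -0.2, 0.1)
norm = math.sqrt(sum(c * c for c in s_u))
print(s_u, norm)  # the norm is one up to rounding error
```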
(115) 1.7 Activation Function Angle Quantization
(116) In the activation step, the quaternion output s of the convolution step is converted to polar form using Equation 12 to produce a set of angles $(\phi, \theta, \psi)$. These angles are then quantized using a set of pre-determined output values. Each angle is associated with a unique set of output angles, and those sets may or may not be equal. Examples of differing output angle sets are shown in the accompanying drawings.
(117) The quantization process described above creates a set of three new angles, $(\phi_p, \theta_p, \psi_p)$, as shown in the accompanying drawings.
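A minimal sketch of angle quantization is given below. The rule used here, mapping each angle to the nearest member of its pre-determined output set, is an assumption made for illustration; the text does not state which rule is used, and wrap-around at the ends of the angular range is ignored. The output sets and function names are likewise hypothetical.

```python
import math


def quantize_angle(angle, levels):
    """Return the entry of `levels` closest to `angle` (nearest-level rule is assumed)."""
    return min(levels, key=lambda v: abs(v - angle))


# Hypothetical output sets; each angle has its own set, and the sets differ.
PHI_LEVELS = [-math.pi / 2, 0.0, math.pi / 2]
THETA_LEVELS = [-math.pi / 4, 0.0, math.pi / 4]
PSI_LEVELS = [-math.pi / 8, 0.0, math.pi / 8]


def quantize_angles(phi, theta, psi):
    """Quantize the three phase angles independently, each against its own set."""
    return (
        quantize_angle(phi, PHI_LEVELS),
        quantize_angle(theta, THETA_LEVELS),
        quantize_angle(psi, PSI_LEVELS),
    )


phi_p, theta_p, psi_p = quantize_angles(1.4, 0.1, -0.3)
print(phi_p, theta_p, psi_p)
```

The quantized angles may then be converted back to a quaternion on the basis (1, i, j, k) as described in Section 1.6.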
(118) 1.8 Hypercomplex Layer Learning Rule
(119) A learning rule for a single hypercomplex neuron is described presently. A layer of hypercomplex neurons is merely a collection of hypercomplex neurons that run in parallel, therefore the same learning rule applies to all neurons independently. The output of each neuron forms a single output of a hypercomplex layer. This learning rule is an example of the quaternion case and is not meant to limit the scope of the claims in this document.
(120) Stochastic gradient descent has gained popularity in the machine learning community for training real-valued neural network layers. However, because the activation function described in Section 1.7 and shown in
(121) A typical error correction weight update rule for a fully-connected neural network layer is shown in Equation 15:
$$w_i^{(k+1)} = w_i^{(k)} + \mu \cdot \delta_i \cdot \bar{x}_i \tag{15}$$
(122) Equation 15 represents training cycle k of a neuron, where $x_i$ is the input value to the weight and $\bar{x}_i$ is its quaternion conjugate, $\delta_i$ is related to the current training error, μ is the learning rate, $w_i^{(k)}$ is the current value of the weight, $w_i^{(k+1)}$ is the updated value of the weight, and i is the index of all the weights for the neuron.
(123) To extend the fully-connected error correction rule in Equation 15 to convolutional layers, the multiplication between δ.sub.i and
(124) Returning to the fully-connected example, the goal of training a neuron is to have its output, $y^{(k+1)}$, equal some desired response value of $d \in \mathbb{H}$. Solving for $\delta_i$:
(125)
(126) Note that Equation 16 employs the relationship:
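$$\sqrt{x \cdot \bar{x}} = \lVert x \rVert, \quad \text{or} \quad x \cdot \bar{x} = \lVert x \rVert^{2} \tag{17}$$
(127) where $\lVert x \rVert$ is the 2-norm of x and $\bar{x}$ is the quaternion conjugate of x.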
(128) Further note that the norm of each input is assumed to equal one, as the norm of the output from the activation function in Sections 1.6 and 1.7 is equal to one. This is not a restriction of the algorithm, as weight updates in the learning algorithm (discussed in Sections 1.8 and 1.9) may be scaled to account for non-unit-norm inputs, and/or inputs may be scaled to unit norm. Equation 15 clearly has more unknowns than equations and therefore does not have a unique solution. Accordingly, the authors assume without proof that each neural weight shares equal responsibility for the final error, and that all of the $\delta_i$ variables should equal one another. This leads to the following weight update rule, assuming that there are n weights:
(129)
(130) In Equation 18, k represents the current training cycle, $x_i$ is the input value to the weight, n is the number of neurons, μ is the learning rate, $w_i^{(k)}$ is the current value of the weight, $w_i^{(k+1)}$ is the updated value of the weight, and i is the index of all the weights for the neuron.
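An error correction update of this kind may be sketched as follows. Because the update equation itself is not reproduced in this text, the sketch rests on assumptions that should be read as such: the neuron output is taken to be the sum of the products of each weight with its input, the activation function is omitted, the error d − y is shared equally among the n weights, and each weight is adjusted by that share multiplied on the right by the conjugate of its input. All function names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a0, b0, c0, d0 = p
    a1, b1, c1, d1 = q
    return (
        a0 * a1 - b0 * b1 - c0 * c1 - d0 * d1,
        a0 * b1 + b0 * a1 + c0 * d1 - d0 * c1,
        a0 * c1 - b0 * d1 + c0 * a1 + d0 * b1,
        a0 * d1 + b0 * c1 - c0 * b1 + d0 * a1,
    )


def qconj(q):
    """Quaternion conjugate: negate the i, j, and k coefficients."""
    return (q[0], -q[1], -q[2], -q[3])


def qadd(p, q):
    return tuple(u + v for u, v in zip(p, q))


def neuron_output(weights, inputs):
    """Weighted sum y = sum_i w_i x x_i (activation omitted in this sketch)."""
    y = (0.0, 0.0, 0.0, 0.0)
    for w, x in zip(weights, inputs):
        y = qadd(y, qmul(w, x))
    return y


def update_weights(weights, inputs, d, mu=1.0):
    """Assumed rule: w_i <- w_i + (mu / n) * (d - y) x conj(x_i)."""
    n = len(weights)
    y = neuron_output(weights, inputs)
    step = tuple(mu / n * (di - yi) for di, yi in zip(d, y))
    return [qadd(w, qmul(step, qconj(x))) for w, x in zip(weights, inputs)]


weights = [(0.5, -0.25, 0.0, 1.0), (0.1, 0.2, 0.3, 0.4)]
inputs = [(0.0, 1.0, 0.0, 0.0), (0.6, 0.0, 0.8, 0.0)]  # unit-norm inputs
desired = (0.0, 0.0, 1.0, 0.0)
new_weights = update_weights(weights, inputs, desired, mu=1.0)
print(neuron_output(new_weights, inputs))  # close to `desired`
```

With unit-norm inputs and a learning rate of one, a single update under these assumptions moves the weighted sum onto the desired response, since each input multiplied by its conjugate equals one.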
(131) Extending this approach to convolution and multiple neurons, the new weight update rule is:
(132)
where W.sup.(k+1) is the updated weight array, W.sup.(k) is the current weight array, n is the number of neurons, μ is the learning rate, D is the desired response vector, Y.sup.(k) is the current response vector, * represents quaternion convolution, and
(133) 1.9 Hypercomplex Error Propagation
(134) This section provides an example of error propagation between hypercomplex layers to enable learning in multi-layer graph structures. This particular example continues the quaternion example of Section 1.8. Following the approach in Section 1.8, the multi-layer learning rule will be derived using a fully-connected network, and then, by linearity of convolution, the appropriate hypercomplex multiplication will be converted into a quaternion convolution operation.
(135)
(136) As described in Section 1.8, the neurons each use an error correction rule for learning and, consequently, cannot be used with the gradient descent learning methods that are popular in existing machine learning literature.
(137) Following the approach in Section 1.8, we set the output of the neural network equal to the desired response d and solve for the error terms δ.sub.i:
(138)
(139) Solving for the training error in terms of δ.sub.A.sub.
(140)
(141) As in the single-neuron case, there is more than one solution to Equation 21. In order to resolve this problem, the assumption is that each neural network layer contributes equally to the final network output error, implying:
(142)
(143) Accordingly, errors are propagated through the graph (or network) by scaling the errors by the hypercomplex multiplicative inverse of the connecting weights.
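The hypercomplex multiplicative inverse used for this scaling may be illustrated for quaternions, where the inverse of a nonzero quaternion is its conjugate divided by its squared norm. The sketch below shows only that inverse; the side on which the inverse multiplies the error in `scale_error` is an assumption made for illustration, and the function names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    a0, b0, c0, d0 = p
    a1, b1, c1, d1 = q
    return (
        a0 * a1 - b0 * b1 - c0 * c1 - d0 * d1,
        a0 * b1 + b0 * a1 + c0 * d1 - d0 * c1,
        a0 * c1 - b0 * d1 + c0 * a1 + d0 * b1,
        a0 * d1 + b0 * c1 - c0 * b1 + d0 * a1,
    )


def qinv(q):
    """Multiplicative inverse: conjugate divided by the squared norm."""
    n2 = sum(c * c for c in q)
    if n2 == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    return (q[0] / n2, -q[1] / n2, -q[2] / n2, -q[3] / n2)


def scale_error(weight, error):
    """Scale an error term by the inverse of a connecting weight (left side assumed)."""
    return qmul(qinv(weight), error)


w = (1.0, 2.0, 3.0, 4.0)
identity = qmul(w, qinv(w))
print(identity)  # approximately (1, 0, 0, 0)
print(scale_error(w, (0.5, 0.0, -0.5, 0.0)))
```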
(144) To propagate the error from the output of a layer to its inputs:
(145)
where e.sub.l represents the error at the current layer (or the network error at the output layer), [w.sub.l.sup.(k).sup.
(146) Once the error terms e.sub.l have been computed, the weight update is similar to the single-layer case of Section 1.8:
(147)
where n represents the number of neurons in layer l, and *.sub.h represents hypercomplex convolution.
(148) For inputs that are not of unit-norm, Equation 24 may be modified to scale the weight updates:
(149)
(150) 1.10 Unsupervised Learning for Hypercomplex Layer Pre-Training and Deep Belief Networks
(151) The hypercomplex learning algorithms of Sections 1.8 and 1.9 both presume that the initial weights for each layer are selected randomly. However, this need not be the case. For example,
(152) Suppose there is a multi-layer hypercomplex neural network in which the input layer is “Layer A”, the first hidden layer is “Layer B”, the next layer is “Layer C”, and so on. Unsupervised learning of Layer A may be performed by removing it from the network, attaching a fully-connected layer, and training the new structure as an autoencoder, which means that the desired response is equal to the input. A diagram of an exemplary hypercomplex autoencoder has been drawn in
(153) One may use the pre-trained output from Layer A to pre-train Layer B in the same manner. This is shown in the lower half of
(154) Once pre-training of all layers is complete, the hypercomplex weight settings of each layer may be copied to the original multi-layer network, and fine-tuning of the weight parameters may be performed using any appropriate learning algorithm, such as those developed in Sections 1.8 and 1.9.
(155) This approach is superior to starting with random weights and propagating errors for two reasons: First, it allows one to use large quantities of unlabeled data for the initial pre-training, thereby expanding the universe of useful training data; and, second, since weight adjustment through multi-layer error propagation takes many training cycles, the pre-training procedure significantly reduces overall training time by reducing the workload for the error propagation algorithm.
(156) Moreover, the autoencoders described above may be replaced by any other unsupervised learning method, for example, restricted Boltzmann machines (RBMs). Applying contrastive divergence to each layer, from the lowest to the highest, results in a hypercomplex deep belief network.
(157) 1.11 Hypercomplex Layer Pre-Training Using Expert Features
(158) Historically, multi-layer perceptron (MLP) neural network classification systems involve the following steps: Input data, such as images, are converted to a set of features, for example, local binary patterns. Each feature, which, for example, takes the form of a real number, is stacked into a feature vector that is associated with a particular input pattern. Once the feature vector for each input pattern is computed, the feature vectors and system desired response (e.g. classification) values are presented to a fully-connected MLP neural network for training. Many have observed that overall system performance is highly dependent upon the selection of features, and therefore domain experts have spent extensive time engineering features for these systems.
(159) One may think of hypercomplex convolutional networks as a way to algorithmically learn hypercomplex features. However, it may be desirable to incorporate expert features into the system as well. An exemplary method for this is shown in
(160) As with the unsupervised pre-training method described in Section 1.10, the hypercomplex layer or layers are pre-trained and then their weights are copied back to the original hypercomplex graph. Learning rules, for example those of Sections 1.8 and 1.9, may then be applied to the entire graph to fine tune the weights.
(161) Using this method, one can start with feature mappings defined by domain experts and then improve the mappings further with hypercomplex learning techniques. Advantages include use of domain-expert knowledge and reduced training time.
(162) 1.12 Hypercomplex Transfer Learning
(163) Deep learning algorithms perform well in systems where an abundance of training data is available for use in the learning process. However, many applications do not have an abundance of data; for example, many medical classification tasks do not have large databases of labeled images. One solution to this problem is transfer learning, where other data sets and prediction tasks are used to expand the trained knowledge of the hypercomplex neural network or graph.
(164) An illustration of transfer learning is shown in
(165) There are numerous ways of performing the knowledge transfer, though methods will generally seek to represent the “other tasks” and target task in the same feature space through an adaptive (and possibly nonlinear) transform, e.g., using a hypercomplex neural network.
(166) 1.13 Hypercomplex Layer with State
(167) An exemplary neural network layer is discussed in 1.1 and shown in
(168)
(169) 1.14 Hypercomplex Tensor Layer
(170) The hypercomplex tensor layer enables evaluation of whether or not hypercomplex vectors are in a particular relationship. An exemplary quaternion layer is defined in Equation 26.
(171)
(172) In Equation 26, $a \in \mathbb{H}^{d \times 1}$ and $b \in \mathbb{H}^{d \times 1}$ represent quaternion input vectors to be compared. There are k relationships R that may be established between a and b, and each relationship has its own hypercomplex weight array $W_R^{[i]} \in \mathbb{H}^{d \times d}$, where $1 \le i \le k$ (and $W_R^{[1:k]} \in \mathbb{H}^{d \times d \times k}$). The output of $a^T \cdot W_R^{[1:k]} \cdot b$ is computed by slicing $W_R^{[1:k]}$ for each value of i in [1, k] and performing two-dimensional hypercomplex matrix multiplication. Furthermore, there is a hypercomplex weight array $V_R \in \mathbb{H}^{k \times 2d}$ that acts as a fully-connected layer. Finally, a set of hypercomplex bias weights $b_R \in \mathbb{H}^{k \times 1}$ may optionally be present.
(173) The function f is the layer activation function and may, for example, be the hypercomplex angle quantization function of Section 1.7. Finally, the weight vector $u_R \in \mathbb{H}^{k \times 1}$ is transposed and multiplied to create the final output $y \in \mathbb{H}$. When an output vector in $\mathbb{H}^{k \times 1}$ is desired rather than a single quaternion number, the multiplication with $u_R^T$ may be omitted.
(174) A major advantage to hypercomplex numbers in this layer structure is that hypercomplex multiplication is not commutative, which helps the learning structure understand that g(a, R, b)≠g(b, R, a). Since many relationships are nonsymmetrical, this is a useful property. For example, the relationship, “The branch is part of the tree,” makes sense, whereas the reverse relationship, “The tree is part of the branch,” does not make sense. Moreover, due to the hypercomplex nature of a and b, one can compare relationships between tuples rather than only single elements.
(175) 1.15 Hypercomplex Dropout
(176) In order to prevent data overfitting, dropout operators are frequently found in neural networks. A real-valued dropout operator sets each element in a real-valued array to zero with some nonzero (but usually small) probability. Hypercomplex dropout operators may also be employed in hypercomplex networks, again to prevent data overfitting. However, in a hypercomplex dropout operator, if one of the hypercomplex components (e.g. (1, i, j, k) in the case of a quaternion) is set to zero, then all other components must also be zero to preserve inter-channel relationships within the hypercomplex value. If unit-norm output is desired or required, the real component may be assigned a value of one and all other components may be assigned a value of zero.
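A quaternion dropout operator of this kind may be sketched as follows. The sketch assumes that quaternion arrays are stored as real NumPy arrays of shape (4, rows, cols) and that a single random mask is shared by all four components; the function name and its parameters are hypothetical, and no rescaling of the surviving values is performed because none is described in the text.

```python
import numpy as np


def quaternion_dropout(x, rate, rng, unit_norm=False):
    """Drop whole quaternions from x (shape (4, ...)) with probability `rate`."""
    # One keep/drop decision per quaternion, shared by all four components.
    keep = rng.random(x.shape[1:]) >= rate
    out = x * keep
    if unit_norm:
        # Dropped positions become the unit quaternion 1 + 0i + 0j + 0k.
        out[0] = np.where(keep, x[0], 1.0)
    return out


rng = np.random.default_rng(0)
x = np.full((4, 6, 6), 0.5)
y = quaternion_dropout(x, 0.5, rng)
nonzero = y != 0
# At every position the four components are either all kept or all zeroed.
print(np.all(nonzero.all(axis=0) | (~nonzero).all(axis=0)))
```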
(177) 1.16 Hypercomplex Pooling
(178) It is frequently desirable to downsample data within a hypercomplex network using a hypercomplex pooling operation. An exemplary method for performing this operation is to take a series of local windows, apply a function to each window (e.g. maximum function), and finally represent the data using only the function's output for each window. In hypercomplex networks, the function will, for example, take arguments that involve all hypercomplex components and produce an output that preserves inter-channel hypercomplex relationships.
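One possible pooling function with these properties is sketched below. The choice made here, keeping from each window the entire quaternion with the largest norm, is an assumption offered for illustration and is not stated in the text; the sketch also assumes non-overlapping square windows and arrays of shape (4, rows, cols). The function name is hypothetical.

```python
import numpy as np


def quaternion_max_norm_pool(x, size):
    """Pool x (shape (4, rows, cols)) over non-overlapping size x size windows.

    The quaternion with the largest norm in each window is kept whole, so the
    relationship among its four components is preserved.
    """
    _, rows, cols = x.shape
    out_r, out_c = rows // size, cols // size
    out = np.zeros((4, out_r, out_c))
    for r in range(out_r):
        for c in range(out_c):
            win = x[:, r * size:(r + 1) * size, c * size:(c + 1) * size]
            win = win.reshape(4, -1)
            # The norm uses all four components of each quaternion.
            best = np.argmax(np.sum(win ** 2, axis=0))
            out[:, r, c] = win[:, best]
    return out


x = np.zeros((4, 2, 2))
x[:, 0, 1] = [1.0, 2.0, 3.0, 4.0]   # largest norm in the window
x[:, 1, 0] = [0.1, 0.0, 0.0, 0.0]
pooled = quaternion_max_norm_pool(x, 2)
print(pooled[:, 0, 0])
```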
2 Exemplary Implementations of Hypercomplex Layer
(179) A major difficulty in the practical application of learning systems is the computational complexity of the learning process. In particular, the convolution step described in Equation 1 typically cannot be computed efficiently via Fast Fourier Transform due to small filter kernel sizes. Accordingly, significant effort has been expended by academia and industry to optimize real-valued convolution for real-valued neural network graphs. However, no group has optimized hypercomplex convolution for these tasks, nor is there literature on hypercomplex deep learning. This section presents methods for adapting real-valued computational techniques to hypercomplex problems. The techniques in this section are critical to making the hypercomplex systems described in Section 1 practical for engineering use.
(180) 2.1 Convolution Implementations: GEMM
(181) One approach to computing the hypercomplex convolution of Equation 1 is to write a set of for loops that shift the hypercomplex kernel to every appropriate position in the input image, perform a small set of multiplications and additions, and then continue to the next position. While such an approach would theoretically work, modern computer processors are optimized for large matrix-matrix multiplication operations, rather than multiplication between small sets of numbers. Correspondingly, the approach presented in this subsection reframes the hypercomplex convolution problem as a large hypercomplex multiplication, and then explains how to use highly-optimized, real-valued multiplication libraries to complete the computation. Examples of such real-valued multiplication libraries include: Intel's Math Kernel Library (MKL); Automatically Tuned Linear Algebra Software (ATLAS); and Nvidia's cuBLAS. All of these are implementations of a Basic Linear Algebra Subprograms (BLAS) library, and all provide matrix-matrix multiplication functionality through the GEMM function.
(182) In order to demonstrate a hypercomplex neural network convolution, an example of the quaternion case is discussed presently. To aid the discussion, define the following variables:
X=inputs of shape(G,D.sub.i,4,X.sub.i,Y.sub.i)
A=filter kernel of shape(D.sub.o,D.sub.i,4,K.sub.x,K.sub.y)
S=outputs of shape(G,D.sub.o,4,X.sub.o,Y.sub.o)  (27)
where G is the number of data patterns in a processing batch, D.sub.i is the input depth, D.sub.o is the output depth, and the last two dimensions of each size represent the rows and columns for each variable. Note that the variables X, A, and S have been defined to correspond to real-valued memory arrays common in modern computers. Since each memory location holds a single real number and not a hypercomplex number, each variable above has been given an extra dimension of size 4. This dimension is used to store the quaternion components of the array. The goal of this computation is to compute the quaternion convolution S=A*X. As discussed above, one strategy is to reshape the A and X matrices such that a single quaternion matrix multiply could be employed to compute the convolution. Reshaped matrices A′ and X′ are shown in Equation 28, shown in
(183) In A′ of Equation 28, a.sub.i are row vectors. The variable i indexes the output dimension of the kernel, D.sub.o. Each row vector is of length D.sub.i.Math.K.sub.x.Math.K.sub.y, corresponding to a filter kernel at all input depths. Observe that the A′ matrix is of depth 4 to store the quaternion components; therefore, A′ may be thought of as a two-dimensional quaternion matrix.
(184) In X′ of Equation 28, x.sub.r,s are column vectors. The variable r indexes the data input patterns dimension, G. Each column vector is of length D.sub.i.Math.K.sub.x.Math.K.sub.y, corresponding to the input values covered by the filter kernel at one location, at all input depths. Since the filter kernel must be applied to each location in the image, for each input pattern, there are many columns x.sub.r,s. The subscript s indexes the filter locations; each input pattern contains M filter locations, where M equals:
M=(X.sub.i−K.sub.x+1).Math.(Y.sub.i−K.sub.y+1)  (29)
(185) The total number of columns of X′ is equal to G.Math.M. Like A′, X′ is a two-dimensional quaternion matrix and is stored in three real dimensions in computer memory.
(186) The arithmetic for quaternion convolution is performed in Equation 28, using a quaternion matrix-matrix multiply function that is described below in Sections 2.1.1 and 2.1.2.
(187) One will observe that the result S′ from Equation 28 is still a two-dimensional quaternion matrix. This result is reshaped to form the final output S.
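The reshaping described for Equation 28 can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used by the authors: the function names `qmatmul` and `quaternion_conv_gemm` are hypothetical, the quaternion product is written out with sixteen real matrix products, and the window is applied without flipping the kernel (cross-correlation indexing), which is an assumption because Equation 1 is not reproduced in this section.

```python
import numpy as np

def qmatmul(a, b):
    """Quaternion matrix product of a (4, m, n) and b (4, n, p)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.stack([
        a0 @ b0 - a1 @ b1 - a2 @ b2 - a3 @ b3,   # real part
        a0 @ b1 + a1 @ b0 + a2 @ b3 - a3 @ b2,   # i part
        a0 @ b2 - a1 @ b3 + a2 @ b0 + a3 @ b1,   # j part
        a0 @ b3 + a1 @ b2 - a2 @ b1 + a3 @ b0,   # k part
    ])

def quaternion_conv_gemm(X, A):
    """X: (G, Di, 4, Xi, Yi), A: (Do, Di, 4, Kx, Ky) -> S: (G, Do, 4, Xo, Yo)."""
    G, Di, _, Xi, Yi = X.shape
    Do, _, _, Kx, Ky = A.shape
    Xo, Yo = Xi - Kx + 1, Yi - Ky + 1            # valid positions only
    # A': one row per output depth, of length Di*Kx*Ky.
    A_p = A.transpose(2, 0, 1, 3, 4).reshape(4, Do, Di * Kx * Ky)
    # All windows of the input: shape (G, Di, 4, Xo, Yo, Kx, Ky).
    win = np.lib.stride_tricks.sliding_window_view(X, (Kx, Ky), axis=(3, 4))
    # X': one column per (pattern, location) pair, G*M columns in total.
    X_p = win.transpose(2, 1, 5, 6, 0, 3, 4).reshape(
        4, Di * Kx * Ky, G * Xo * Yo)
    S_p = qmatmul(A_p, X_p)                      # shape (4, Do, G*M)
    # Reshape the two-dimensional quaternion result into the output S.
    return S_p.reshape(4, Do, G, Xo, Yo).transpose(2, 1, 0, 3, 4)
```

Building `X_p` copies every input value up to K.sub.x.Math.K.sub.y times, which is the memory cost of this approach.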
(188) 2.1.1 Using 16 Large Real GEMM Calls
(189) An example of a quaternion matrix multiply routine that may be used to compute Equation 28 is discussed presently. One approach is to employ Equation 2, which computes the product of two quaternions using sixteen real-valued multiply operations. The matrix form of this equation is identical to the scalar version in Equation 2, and each multiply may be performed using a highly-optimized GEMM call to the appropriate BLAS library. Moreover, the sixteen real-valued multiply operations may be performed in parallel if hardware resources permit.
(190) 2.1.2 Using 8 Large Real GEMM Calls
(191) Another example of a quaternion matrix multiply routine only requires eight GEMM calls, rather than the sixteen calls of Equation 2. This method takes arguments k(x,y)=a.sub.k(x,y)+b.sub.k(x,y)i+c.sub.k(x,y)j+d.sub.k(x,y)k and a(x,y)=a.sub.a(x,y)+b.sub.a(x,y)i+c.sub.a(x,y)j+d.sub.a(x,y)k, and performs the quaternion operation k.Math.a. The first step is to compute eight intermediate values using the real-valued GEMM call:
t.sub.1(x,y)=a.sub.k(x,y).Math.a.sub.a(x,y)
t.sub.2(x,y)=d.sub.k(x,y).Math.c.sub.a(x,y)
t.sub.3(x,y)=b.sub.k(x,y).Math.d.sub.a(x,y)
t.sub.4(x,y)=c.sub.k(x,y).Math.b.sub.a(x,y)
t.sub.5(x,y)=(a.sub.k(x,y)+b.sub.k(x,y)+c.sub.k(x,y)+d.sub.k(x,y)).Math.(a.sub.a(x,y)+b.sub.a(x,y)+c.sub.a(x,y)+d.sub.a(x,y))
t.sub.6(x,y)=(a.sub.k(x,y)+b.sub.k(x,y)−c.sub.k(x,y)−d.sub.k(x,y)).Math.(a.sub.a(x,y)+b.sub.a(x,y)−c.sub.a(x,y)−d.sub.a(x,y))
t.sub.7(x,y)=(a.sub.k(x,y)−b.sub.k(x,y)+c.sub.k(x,y)−d.sub.k(x,y)).Math.(a.sub.a(x,y)−b.sub.a(x,y)+c.sub.a(x,y)−d.sub.a(x,y))
t.sub.8(x,y)=(a.sub.k(x,y)−b.sub.k(x,y)−c.sub.k(x,y)+d.sub.k(x,y)).Math.(a.sub.a(x,y)−b.sub.a(x,y)−c.sub.a(x,y)+d.sub.a(x,y))  (30)
(192) To complete the quaternion multiplication s(x,y), the temporary terms t.sub.i are scaled and summed as shown in Equation 31:
(193)
(194) Because the GEMM calls represent the majority of the compute time, the method in this section executes more quickly than the method of Section 2.1.1.
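The eight-product method can be sketched in NumPy as follows. The eight intermediate terms follow Equation 30. The final scaling and summation is an assumption: Equation 31 is not reproduced in this text, so the combination below was derived by expanding the terms of Equation 30 and matching them against the standard Hamilton product. The function names `qmatmul_8` and `qmatmul_16` are hypothetical, and `qmatmul_16` is included only as a reference for comparison.

```python
import numpy as np

def qmatmul_16(k, a):
    """Reference quaternion matrix product using sixteen real products."""
    ak, bk, ck, dk = k
    aa, ba, ca, da = a
    return np.stack([
        ak @ aa - bk @ ba - ck @ ca - dk @ da,
        ak @ ba + bk @ aa + ck @ da - dk @ ca,
        ak @ ca - bk @ da + ck @ aa + dk @ ba,
        ak @ da + bk @ ca - ck @ ba + dk @ aa,
    ])

def qmatmul_8(k, a):
    """Quaternion matrix product k.a using eight real matrix products.

    k has shape (4, m, n) and a has shape (4, n, p); axis 0 holds the
    real, i, j and k components.
    """
    ak, bk, ck, dk = k
    aa, ba, ca, da = a
    # The eight GEMM calls of Equation 30.
    t1 = ak @ aa
    t2 = dk @ ca
    t3 = bk @ da
    t4 = ck @ ba
    t5 = (ak + bk + ck + dk) @ (aa + ba + ca + da)
    t6 = (ak + bk - ck - dk) @ (aa + ba - ca - da)
    t7 = (ak - bk + ck - dk) @ (aa - ba + ca - da)
    t8 = (ak - bk - ck + dk) @ (aa - ba - ca + da)
    # Assumed combination (derived, not copied from Equation 31).
    return np.stack([
        2 * t1 - (t5 + t6 + t7 + t8) / 4,        # real part
        (t5 + t6 - t7 - t8) / 4 - 2 * t2,        # i part
        (t5 - t6 + t7 - t8) / 4 - 2 * t3,        # j part
        (t5 - t6 - t7 + t8) / 4 - 2 * t4,        # k part
    ])
```

Every product keeps a component of k on the left and a component of a on the right, so the scheme remains valid for matrices even though matrix multiplication does not commute. The additions and subtractions cost far less than a GEMM call for large matrices.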
(195) 2.2 Convolution Implementations: cuDNN
(196) One pitfall of the GEMM-based approach described in the prior section is that formation of the temporary quaternion matrices A′ and X′ is memory-intensive. Graphics cards, for example those manufactured by Nvidia, are frequently used for matrix multiplication. Unfortunately, these cards have a limited onboard memory, and therefore inefficient use of memory is a practical engineering problem.
(197) For real-valued convolution, memory-efficient software such as Nvidia's cuDNN library has been developed. This package performs real-valued convolution in a memory- and compute-efficient manner. Therefore, rather than using the GEMM-based approach above, adapting cuDNN or another convolution library to hypercomplex convolution may be advantageous. Because, like multiplication, convolution is a linear operation, the algorithms for quaternion multiplication may be directly applied to quaternion convolution by replacing real-valued multiplication with real-valued convolution. This leads to Equations 3, 4, and 5 of Section 1.2 and is explained in more detail there. The real-valued convolutions in these equations may be carried out using an optimized convolution library such as cuDNN, thereby making quaternion convolution practical on current computer hardware.
(198) 2.3 Theano Implementation
(199) The hypercomplex neural network layers described thus far are ideal for use in arbitrary graph structures. A graph structure is a collection of layers (e.g. mathematical functions, hypercomplex or otherwise) with a set of directed edges that define the signal flow from each layer to the other layers (and potentially itself). Extensive effort has been expended to create open-source graph solving libraries, for example, Theano.
(200) Three key mathematical operations are required to use the hypercomplex layers with a graph library such as Theano: first, the forward computation through the layer (Sections 1.2 to 1.7); second, the weight update computed by a learning rule for a single layer (Section 1.8); and, third, the propagation of errors through the layer to the next graph element (Section 1.9). Since these three operations have been introduced in this document, it is possible to use the hypercomplex layer in an arbitrary graph structure.
(201) The authors have implemented an exemplary set of Theano operations to enable the straightforward construction of arbitrary graphs of hypercomplex layers. The authors employ the memory storage layout of using a three-dimensional real-valued array to represent a two-dimensional quaternion array; this method is described further in Section 2.1 in the context of GEMM operations. Simulation results in Section 4 have been produced using the exemplary Theano operations and using the cuDNN library as described in Section 2.2.
(202) 2.4 GPU and CPU Implementations
(203) As has been alluded to throughout this document, computational efficiency is a key criterion that must be met in order for hypercomplex layers to be practical in solving engineering challenges. Because convolution and matrix multiplication are computationally intensive, the standard approach is to run these tasks on specialized hardware, such as a graphics processing unit (GPU), rather than on a general-purpose processor (e.g. from Intel). The cuDNN library referenced in Section 2.2 is specifically written for Nvidia GPUs. The exemplary implementation of hypercomplex layers employs the cuDNN library and therefore operates on the GPU. However, the computations may be performed on any other computational device and the implementation discussed here is not meant to limit the scope or claims of this patent.
(204) 2.5 Phase-Based Activation and Angle Quantization Implementation
(205) An important bottleneck in GPU computational performance is the time delay to transfer data to and from the GPU memory. Because the computationally-intensive tasks of convolution and multiplication are implemented on the GPU in the exemplary software, it is critical that all other graph operations take place on the GPU to reduce transfer time overhead. Therefore, the activation functions described in Sections 1.6 and 1.7 have also been implemented using the GPU, as have pooling operators and other neural network functions.
3 Exemplary Hypercomplex Deep Neural Network Structures
(206) Sections 1 and 2 of this document have provided examples of hypercomplex neural network layers, using quaternion numbers for illustrative purposes. This section discusses exemplary graph structures (i.e. “neural networks”) of hypercomplex layers. The layers may be arranged in any graph structure and therefore have wide applicability to the engineering problems discussed in Section 4. The Theano implementation of the hypercomplex layer, discussed in Section 2.3, allows for construction of arbitrary graphs of hypercomplex layers (and other components) using minimal effort.
(207) 3.1 Feedforward Neural Network
(208)
(209) 3.2 Neural Network with Pooling
(210) The feedforward neural network of
(211) 3.3 Recurrent Neural Network
(212) Graphs of hypercomplex layers also may contain feedback loops, as shown in
(213) 3.4 Neural Network with Layer Jumps
(214) As shown in
(215) 3.5 State Layers
(216) In the entirety of this document, the term “hypercomplex layer” is meant to encompass any variation of a hypercomplex neural network layer, including, but not limited to, fully-connected layers such as those of
(217) 3.6 Parallel Filter Sizes
(218)
(219) 3.7 Parallel Graphs
(220) Hypercomplex layers and/or graphs may, for example, be combined in parallel with other structures. The final output of such a system may be determined by combining the results of the hypercomplex layers and/or graph and the other structure using any method of fusion, e.g. averaging, voting systems, maximum likelihood estimation, maximum a posteriori estimation, and so on. An example of this type of system is depicted in
(221) 3.8 Combinations of Hypercomplex and Other Modules
(222) Hypercomplex layers, graphs, and/or modules may be combined in series with nonhypercomplex components; an example of this is shown in
(223) 3.9 Hypercomplex Layers in a Graph
(224) The hypercomplex layers may be arranged in any graph-like structure; Sections 3.1 to 3.8 provide examples but are not meant to limit the arrangement or interconnection of hypercomplex layers. As discussed in Section 2.3, the exemplary quaternion hypercomplex layer has specifically been implemented to allow for the creation of arbitrary graphs of hypercomplex layers. These layers may be combined with any other mathematical function(s) to create systems.
4 Exemplary Applications
(225) This section provides exemplary engineering applications of the hypercomplex neural network layer.
(226) 4.1 Image Super Resolution
(227) An exemplary application of hypercomplex neural networks is image super resolution. In this task, a color image is enlarged such that the image is represented by a larger number of pixels than the original. Image super resolution is performed by digital cameras, where it is called, “digital zoom,” and has many applications in surveillance, security, and other industries.
(228) Image super resolution may be framed as an estimation problem: Given a low-resolution image, estimate the higher-resolution image. Real-valued, deep neural networks have been used for this task. To use a neural network for image super resolution, the following steps are performed: To simulate downsampling, full-size original images are blurred using a Gaussian kernel; the blurred images are paired with their original, full-resolution sources and used to train a neural network as input and desired response, respectively; and, finally, new images are presented to the neural network and the network output is taken to be the enhanced image.
(229) One major limitation of the above procedure is that real-valued neural networks do not understand color, and therefore most approaches in literature are limited to grayscale images. We adapt the hypercomplex neural network layer introduced in this patent application to the super resolution application in
(230) Each step in
(231) The above steps were performed using a quaternion neural network with 3 layers. The first convolutional layer has the parameters D.sub.i=1, D.sub.o=64, K.sub.x=K.sub.y=9. The second layer takes input directly from the first and has parameters D.sub.i=64, D.sub.o=32, K.sub.x=K.sub.y=1. Finally, the third layer takes input directly from the second and has parameters D.sub.i=32, D.sub.o=1, K.sub.x=K.sub.y=5. For information on parameter definitions, please see Section 2.1 of this document. Note that the convolution operations were performed on all valid points (i.e. no zero padding), so the prediction image is of size 20×20 color pixels, which is somewhat smaller than the 32×32 pixel input size.
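The output size quoted above follows from applying the valid-convolution rule (output = input − kernel + 1, as in Equation 29) once per layer. A short sketch, with the hypothetical helper name `valid_output_size`:

```python
def valid_output_size(input_size, kernel_sizes):
    """Side length after a chain of valid (unpadded) convolutions."""
    size = input_size
    for k in kernel_sizes:
        # Each valid convolution removes k - 1 pixels along each axis.
        size = size - k + 1
    return size

# Kernel sizes of the three layers described above: 9, 1 and 5.
print(valid_output_size(32, [9, 1, 5]))  # prints 20 (32 -> 24 -> 24 -> 20)
```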
(232) This experiment was repeated using a real-valued convolutional neural network with the same parameters, except that the quaternion polar form (Equations 12 and 14) is not used. Rather, the images are presented to the network as a depth 3 input, where each input depth corresponds to one of the colors. Consequently, D.sub.i=3 for the first layer rather than 1 in the quaternion case, and D.sub.o=3 in the last layer. This allows the real neural network to process color images, but the real network does not understand that there is a significant relationship between the three color channels. Each neural network was trained using 256 input images. The networks were trained for at least 150,000 cycles; in all cases, the training error had reached steady state before training was deemed complete. An additional set of 2544 images was used for testing.
(233) TABLE 1
Neural Network    PSNR (Training Set)    PSNR (Testing Set)
Real-Valued       29.72 dB               30.06 dB
Hypercomplex      31.24 dB               31.89 dB
(234) The mean PSNR values are shown for the training and testing datasets in Table 1. Higher PSNR values represent better results, and one can observe that the hypercomplex network outperforms the real-valued network.
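For reference, PSNR can be computed as sketched below. The text does not state which definition or peak value was used for Table 1, so the standard definition, 10·log10(peak²/MSE), and the default peak of 255 (8-bit images) are assumptions, as is the function name `psnr`.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a higher PSNR corresponds to a lower mean squared error between the network output and the full-resolution source.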
(235) Sample visual outputs from each algorithm are shown in
(236) 4.2 Image Segmentation
(237) Another exemplary application of the hypercomplex neural network is color image segmentation. Image segmentation is the process of assigning pixels in a digital image to multiple sets, where each set typically represents a meaningful item in the image. For example, in a digital image of airplanes flying, one may want to segment the image into sky, airplane, and cloud areas.
(238) When combined with the pooling operator, real-valued convolutional neural networks have seen wide application to image classification. However, the shift-invariance caused by pooling makes for poor image localization, which is necessary for image segmentation. One strategy to combat this problem is to upsample the image during convolution. Such upsampling is standard practice in signal processing, where one inserts zeros between consecutive data values to create a larger image. The upsampled image is then used to train a hypercomplex neural network.
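Zero-insertion upsampling can be sketched as follows. The function name `zero_insert_upsample` and the (components, rows, columns) array layout are assumptions chosen for illustration; the same zeros are inserted in every component so that each pixel keeps all of its hypercomplex components together.

```python
import numpy as np

def zero_insert_upsample(img, factor=2):
    """Insert zeros between neighboring pixels.

    img has shape (components, rows, cols); the result has shape
    (components, rows * factor, cols * factor).
    """
    comps, rows, cols = img.shape
    out = np.zeros((comps, rows * factor, cols * factor), dtype=img.dtype)
    # Original samples land on every factor-th row and column.
    out[:, ::factor, ::factor] = img
    return out
```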
(239) An exemplary system for hypercomplex image segmentation is shown in
(240) 4.3 Image Quality Evaluation
(241) The hypercomplex networks introduced in this patent may also be used for automated image quality evaluation.
(242) Image quality analysis is fundamentally a task for human perception, and therefore humans provide the ultimate ground truth in image quality evaluation tasks. However, human ranking of images is time-consuming and expensive, and therefore computational models of human visual perception are of great interest. Humans typically categorize images using natural language—i.e. with words like, “good,” or, “bad.” Existing studies have asked humans to map these qualitative descriptions to a numerical score, for example, 1 to 100. Research indicates that people are not good at the mapping process, and therefore the step of mapping to a numerical score adds noise. Finally, existing image quality measurement systems have attempted to learn the mapping from image to numerical score, and are impeded by the noisy word-to-score mapping described previously.
(243) Another approach is to perform blind image quality assessment, where the image quality assessment system learns the mapping from image to qualitative description. Thus, a blind image quality assessment system is effectively a multiclass classifier, where each class corresponds to a word such as, “excellent,” or, “bad.” Such a system is shown in
(244) The first processing step of the system in
(245) A key feature of the hypercomplex network is its ability to process all color channels simultaneously, thereby preserving important inter-channel relationships. Existing methods either process images in grayscale or process color channels separately, thereby losing important information.
(246) 4.4 Image Steganalysis
(247) Another exemplary application of hypercomplex deep learning is image steganalysis. Image steganography is the technique of hiding information in images by slightly altering the pixel values of the image. The alteration of pixels is performed such that the human visual system cannot perceptually see a difference between the original and altered image. Consequently, steganography allows for covert communications over insecure channels and has been used by a variety of terrorist organizations for hidden communications. Popular steganographic algorithms include HUGO, WOW, and S-UNIWARD.
(248) Methods for detecting steganography in images have been developed. However, all of the methods either process images in grayscale or process each color channel separately, thereby losing important inter-channel relationships. The hypercomplex neural network architecture in this patent application overcomes both limitations; an example of a steganalysis system is shown in
(249) 4.5 Face Recognition
(250) Due to hypercomplex networks' advantages in color image processing, face recognition is another exemplary application. One method of face recognition is shown in
(251) 4.6 Natural Language Processing: Event Embedding
(252) An important task in natural language processing is event embedding, the process of converting events into vectors. An example of an event may be, “The cat ate the mouse.” In this case, the subject O.sub.1 is the cat, the predicate P is “ate”, and the object O.sub.2 is the poor mouse. Open-source software such as ZPar can be used to extract the tuple (O.sub.1, P, O.sub.2) from the sentence, and this tuple is referred to as the event.
(253) However, the tuple (O.sub.1, P, O.sub.2) is still a tuple of words, and machine reading and translation tasks typically require the tuple to be represented as a vector. An exemplary application of a hypercomplex network for tuple event embedding is shown in
(254) 4.7 Natural Language Processing: Machine Translation
(255) A natural extension of the event embedding example of Section 4.6 is machine translation: Embedding a word, event, or sentence as a vector, and then running the process in reverse but using neural networks trained in a different language. The result is a system that automatically translates from one written language to another, e.g. from English to Spanish.
(256) The most general form of a machine translation system is shown in
(257) Another example of a machine translation system is shown in
(258) 4.8 Unsupervised Learning: Object Recognition
(259) The deep neural network graphs described in this patent application are typically trained using labeled data, where each training input pattern has an associated desired response from the network. However, in many applications, extensive sets of labeled data are not available. Therefore, a way of learning from unlabeled data is valuable in practical engineering problems.
(260) An exemplary approach to using unlabeled data is to train autoassociative hypercomplex neural networks, where the desired response of the network is the same as the input. Provided that the hypercomplex network has intermediate representations (i.e. layer outputs within the graph) that are of smaller size than the input, the autoassociative network will create a sparse representation of the input data during the training process. This sparse representation can be thought of as a form of nonlinear, hypercomplex principal component analysis (PCA) and extracts only the most “informative” pieces of the original data. Unlike linear PCA, the hypercomplex network performs this information extraction in a nonlinear manner that takes all hypercomplex components into account.
(261) An example of autoassociative hypercomplex neural networks is discussed in Section 1.10, where autoassociative structures are employed for pre-training neural network layers.
(262) Additionally, unsupervised learning of hypercomplex neural networks may be used for feature learning in computer vision applications. For example, in
(263) 4.9 Control Systems
(264) Hypercomplex neural networks and graph structures may be employed, for example, to control systems with state, i.e. a plant.
(265) 4.10 Generative Models
(266)
(267) 4.11 Medical Imaging: Breast Mass Classification
(268) Another exemplary application of hypercomplex networks is breast cancer classification or severity grading. For this application, use of the word, “classification,” will refer both to applications where cancer presence is determined on a true or false basis, and to applications where the cancer is graded according to any type of scale, i.e. grading the severity of the cancer.
(269) A simple system for breast cancer classification is shown in
(270) Note that a variety of multi-dimensional mammogram techniques are currently under development, and that false color is currently added to existing mammograms. Therefore, the advantages that a hypercomplex network has when processing color and multi-dimensional data apply to this example.
(271) Since breast cancer classification has been studied extensively, a large database of expert features has already been developed for use with this application. A further improvement upon the described hypercomplex breast cancer classification system is shown in
(272) Finally, additional postprocessing may be performed after the output of the hypercomplex network. An example of this is shown in
(273) 4.12 Medical Imaging: MRI
(274)
(275) 4.13 Hypercomplex processing of multi-sensor data
(276) Most modern sensing applications employ multiple sensors, either in the context of a sensor array using multiple sensors of the same type or by using different types of sensors (e.g. as in a smartphone). Because these sensors are typically measuring related, or the same, quantities, the output from each sensor is usually related in some way to the outputs from the other sensors.
(277) Multi-sensor data may be represented by hypercomplex numbers and, accordingly, processed in a unified manner by the introduced hypercomplex neural networks. An example of speech recognition is shown in
(278) The goal is to perform similar speech recognition on a speaker who is far away from a microphone array. To accomplish this, a microphone array captures far away speech and represents it using hypercomplex numbers. Next, a deep hypercomplex neural network (possibly including state elements) is trained to output corrected cepstrum coefficients by using the close talking input and its cepstrum converter to create labeled training data. Finally, during the recognition phase, the close talking microphone can be disabled completely and the hypercomplex neural network feeds the speech recognizer directly, delivering speech recognition quality similar to that of the close-talking system.
(279) 4.14 Hypercomplex Multispectral Image Processing and Prediction
(280) Multispectral imaging is important for a variety of material analysis applications, such as remote sensing, art and ancient writing investigation and decoding, fruit quality analysis, and so on. In multispectral imaging, images of the same object are taken at a variety of different wavelengths. Since materials have different reflectance and absorption properties at different wavelengths, multispectral images allow one to perform analysis that is not possible with the human eye.
(281) A typical multispectral image dataset is processed using a classifier to determine some property or score of the material. This process is shown in
(282) In
(283) 4.15 Hypercomplex Image Filtering
(284) Another exemplary application of the hypercomplex neural networks introduced in this patent is color image filtering, where the colors are processed in a unified fashion as shown in
(285) 4.16 Hypercomplex Processing of Gray Level Images
(286) While processing of multichannel, color images has been discussed at length in this application, hypercomplex structures may also be employed for single-channel, gray-level images.
(287) 4.17 Hypercomplex Processing of Enhanced Color Images
(288) Standard color image enhancement and analysis techniques, such as, for example, luminance computation, averaging filters, morphological operations, and so on, may be employed in conjunction with the hypercomplex graph components/neural networks described in this application. For example,
(289) 4.18 Multimodal Biometric Identity Matching
(290) Biometric authentication has become popular in recent years due to advancements in sensors and imaging algorithms. However, any single biometric credential can still easily be corrupted due to noise: For example, camera images may be occluded or have unwanted shadows; fingerprint readers may read fingers covered in dirt or mud; iris images may be corrupted due to contact lenses; and so on. To increase the accuracy of biometric authentication systems, multimodal biometric measurements are helpful.
(291) An exemplary application of the hypercomplex neural networks in this patent application is multimodal biometric identity matching, where multiple biometric sensors are combined at the feature level to enable decision making based on a fused set of data.
(292) Advantages of unified multimodal processing include higher accuracy of classification and better system immunity to noise and spoofing attacks.
(293) 4.19 Multimodal Biometric Identity Matching with Autoencoder
(294) To further extend the example in Section 4.18, one may add an additional hypercomplex autoencoder, as pictured in
(295) 4.20 Multimodal Biometric Identity Matching with Unlabeled Data
(296) In biometrics, it is frequently the case that large quantities of unlabeled data are available, but only a small dataset of labeled data can be obtained. It is desirable to use the unlabeled data to enhance system performance through pre-training steps, as shown in
(297) Particularly in the case of facial images, most labeled databases have well-lit images in a controlled environment, while unlabeled datasets (e.g. from social media) have significant variations in pose, illumination, and expression. Therefore, creating data representations that are capable of representing both types of data will enhance overall system performance.
(298) 4.21 Multimodal Biometric Identity Matching with Transfer Learning
(299) A potential problem with the system described in Section 4.20 is that the feature representation created by the (stacked) autoencoder may result in a feature space where each modality is represented by separate elements; while all modalities theoretically share the same feature representations, they do not share the same numerical elements within those representations. An exemplary solution to this is shown in
(300) In this proposed system, matching sets of data are formed from, for example: (i) anatomical characteristics, for example, fingerprints, signature, face, DNA, finger shape, hand geometry, vascular technology, iris, and retina; (ii) behavioral characteristics, for example, typing rhythm, gait, gestures, and voice; (iii) demographic indicators, for example, age, height, race, and gender; and (iv) artificial characteristics such as tattoos and other body decoration. While training the autoencoder, one or more modalities from each matching dataset are omitted and replaced with zeros. However, the autoencoder/decoder pair is still trained to reconstruct the missing modality, thereby ensuring that information from the other modalities is used to represent the missing modality. As with
(301) 4.22 Clothing Identification
(302) The fashion industry presents a number of interesting exemplary applications of hypercomplex neural networks. In particular, most labeled data for clothing comes from retailers, who hire models to demonstrate clothing appearance and label the photographs with attributes of the clothing, such as sleeve length, slim or loose, color, and so on. However, many practical applications of clothing detection are relevant for so-called “in the street” clothes. For example, if a retailer wants to know how often a piece of clothing is worn, scanning social media could be an effective approach, provided that the clothing can be identified from photos that are taken in an uncontrolled environment.
(303) Moreover, in biometric applications, clothing identification may be helpful as an additional modality.
(304) An exemplary system for creating a store clothing to attribute classification system is shown in
5 Flowchart Diagram for Hypercomplex Training of Neural Network
(305)
(306) At 502, a hypercomplex representation of input training data may be created. In some embodiments, the input training data may comprise image data (e.g., traditional image data, multispectral image data, or hyperspectral image data), and the hypercomplex representation may comprise a first tensor. For example, in some embodiments, each spectral component of the multispectral image data may be associated in the hypercomplex representation with a separate dimension in hypercomplex space. In these embodiments, a first dimension of the first tensor may correspond to a first dimension of pixels in an input image, a second dimension of the first tensor may correspond to a second dimension of pixels in the input image, and a third dimension of the first tensor may correspond to different spectral components of the input image in a multispectral or hyperspectral representation. In other words, the first tensor may separately represent each of a plurality of spectral components of the input images as respective matrices of pixels. In some embodiments, the first tensor may have an additional dimension corresponding to depth (e.g., for a three-dimensional “image”, i.e., a multispectral 3D model).
(307) In other embodiments, the input training data may comprise text and the CNN may be designed for speech recognition or another type of text-processing application. In these embodiments, the hypercomplex representation of input training data may comprise a set of data tensors. In these embodiments, a first dimension of each data tensor may correspond to words in an input text, a second dimension of each data tensor may correspond to different parts of speech of the input text in a hypercomplex representation, and each data tensor may have additional dimensions corresponding to one or more of data depth, text depth, and sentence structure.
(308) At 504, hypercomplex convolution of the hypercomplex representation (e.g., the first tensor in some embodiments, or the set of two or more data tensors in other embodiments) with a second tensor may be performed to produce a third output tensor. For image processing applications, the second tensor may be a hypercomplex representation of weights or adaptive elements that relate one or more distinct subsets of the pixels in the input image with each pixel in the output tensor. In some embodiments, the subsets of pixels are selected to comprise a local window in the spatial dimensions of the input image. In some embodiments, a subset of the weights maps the input data to a hypercomplex output using a hypercomplex convolution function.
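One possible reading of step 504 for image data is sketched below, assuming quaternions as the hypercomplex algebra, a single input and output channel, and a local window that slides without kernel flipping (the cross-correlation convention common in CNNs). The names hamilton and quaternion_conv2d, the left-multiplication order, and the sample shapes are assumptions made for this sketch rather than details taken from the description.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [r, i, j, k]."""
    pr, pi, pj, pk = np.moveaxis(p, -1, 0)
    qr, qi, qj, qk = np.moveaxis(q, -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)

def quaternion_conv2d(x, w):
    """Valid-mode sliding-window product of image x (H, W, 4) with kernel w (kh, kw, 4).

    Each output pixel is the sum of Hamilton products w[u, v] * x[y + u, c + v]
    over a local window (weights multiply from the left; this is an assumption).
    """
    H, W, _ = x.shape
    kh, kw, _ = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1, 4))
    for y in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = x[y:y + kh, c:c + kw]          # local subset of input pixels
            out[y, c] = hamilton(w, window).sum(axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5, 4))    # hypothetical 5x5 input, 4 components per pixel
kernel = rng.standard_normal((3, 3, 4))   # hypothetical 3x3 window of quaternion weights
output = quaternion_conv2d(image, kernel)  # plays the role of the third output tensor
```

Because the Hamilton product mixes all four components of each pixel, every output component depends on every spectral component inside the window, which is what distinguishes this operation from four independent real-valued convolutions.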
(309) In text processing applications, 504 may comprise performing hypercomplex multiplication using a first data tensor of the set of two or more data tensors of the hypercomplex representation with a third hypercomplex tensor to produce a hypercomplex intermediate tensor result, and then multiplying the hypercomplex intermediate tensor result with a second data tensor of the set of two or more data tensors to produce a fourth hypercomplex output tensor, wherein the third hypercomplex tensor is a hypercomplex representation of weights or adaptive elements that relate the hypercomplex data tensors to one another. In some embodiments, the fourth hypercomplex output tensor may be optionally processed through additional transformations and then serve as input data for a subsequent layer of the neural network. Similarly, in image processing embodiments, the third output tensor may serve as input data for a subsequent layer of the neural network.
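The two-stage multiplication described for text processing can be sketched as follows. The description does not state whether the multiplications are elementwise or matrix products; this sketch assumes quaternion-valued matrix products, with the first data tensor as a row of quaternions, the weight tensor as a square quaternion matrix, and the second data tensor as a column. The names hamilton and qmatmul and all shapes are hypothetical.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [r, i, j, k]."""
    pr, pi, pj, pk = np.moveaxis(p, -1, 0)
    qr, qi, qj, qk = np.moveaxis(q, -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)

def qmatmul(a, b):
    """Quaternion matrix product: a is (n, m, 4), b is (m, p, 4), result is (n, p, 4)."""
    # Broadcast to (n, m, p, 4), multiply entrywise, then sum over the shared index m.
    return hamilton(a[:, :, None, :], b[None, :, :, :]).sum(axis=1)

rng = np.random.default_rng(0)
first_data = rng.standard_normal((1, 6, 4))    # hypothetical first data tensor (row)
weights = rng.standard_normal((6, 6, 4))       # hypothetical weight tensor
second_data = rng.standard_normal((6, 1, 4))   # hypothetical second data tensor (column)

intermediate = qmatmul(first_data, weights)    # hypercomplex intermediate result
output = qmatmul(intermediate, second_data)    # hypercomplex output tensor
```

Under this assumption the weights act as a bilinear coupling between the two data tensors. Quaternion multiplication is associative but not commutative, so the order of the factors must be preserved.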
(310) At 506, the weights or adaptive elements in the second tensor may be adjusted such that an error function related to the input training data and its hypercomplex representation is reduced. For example, a steepest descent or other minimization calculation may be performed on the weights or adaptive elements in the second tensor such that the error function is reduced.
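The weight adjustment at 506 can be illustrated with a small steepest-descent sketch. It assumes a single quaternion weight w, a mean squared error between the Hamilton products w * x and hypothetical target values, a fixed step size, and gradients taken with respect to the four real components of w. The error function, the step size of 0.05, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [r, i, j, k]."""
    pr, pi, pj, pk = np.moveaxis(p, -1, 0)
    qr, qi, qj, qk = np.moveaxis(q, -1, 0)
    return np.stack([
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    ], axis=-1)

def loss_and_grad(w, X, T):
    """Mean squared error of w * X against targets T, and its gradient in w."""
    # basis_products[n, k] = e_k * X[n]; w * X[n] is linear in the components of w.
    basis_products = hamilton(np.eye(4)[None, :, :], X[:, None, :])   # (N, 4, 4)
    pred = np.einsum("k,nkc->nc", w, basis_products)                  # equals w * X[n]
    residual = pred - T
    loss = np.mean(np.sum(residual ** 2, axis=1))
    grad = 2.0 * np.einsum("nc,nkc->k", residual, basis_products) / len(X)
    return loss, grad

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 4))                 # hypothetical hypercomplex inputs
w_true = np.array([0.5, -1.0, 0.25, 2.0])        # hypothetical weight to be recovered
T = hamilton(w_true, X)                          # hypothetical desired responses

w = np.zeros(4)
initial_loss, _ = loss_and_grad(w, X, T)
for _ in range(200):
    loss, grad = loss_and_grad(w, X, T)
    w -= 0.05 * grad                             # steepest-descent update
final_loss, _ = loss_and_grad(w, X, T)
```

In a full network the same idea would apply to every quaternion in the second tensor, with gradients propagated through the layers; other minimization calculations could replace the fixed-step update shown here.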
(311) At 508, the adjusted weights may be stored in the memory medium to obtain a trained neural network. Steps 502-508 may subsequently be repeated on respective subsequent input data to iteratively train the neural network.
(312) Further modifications and alternative embodiments of various aspects of the invention will be apparent to those skilled in the art in view of this description. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as examples of embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed, and certain features of the invention may be utilized independently, all as would be apparent to one skilled in the art after having the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims.