INVARIANT REPRESENTATIONS OF HIERARCHICALLY STRUCTURED ENTITIES
20240037924 · 2024-02-01
Assignee
Inventors
CPC classification
G06V10/772
PHYSICS
G06V10/76
PHYSICS
International classification
G06V10/86
PHYSICS
G06V10/75
PHYSICS
Abstract
A method for processing digital image recognition of invariant representations of hierarchically structured entities can be performed by a computer using an artificial neural network. The method involves learning a sparse coding dictionary on an input signal to obtain a representation of low-complexity components. Possible transformations are inferred from the statistics of the sparse representation by computing a correlation matrix. Eigenvectors of the Laplacian operator on the graph whose adjacency matrix is the correlation matrix from the previous step are computed. A coordinate transformation is performed to the base of eigenvectors of the Laplacian operator, and the first step is repeated with the next higher hierarchy level until all hierarchy levels of the invariant representations of the hierarchically structured entities are processed and the neural network is trained. The trained artificial neural network can then be used for digital image recognition of hierarchically structured entities.
Claims
1. A method for processing digital image recognition of invariant representations of hierarchically structured entities, performed by a computer using an artificial neural network, comprising the following method steps: learning a sparse coding dictionary by the computer on an input signal (14) to obtain a representation of low-complexity components; inferring possible transformations from the statistics of the sparse representation by computing, with the computer, a correlation matrix (8) between the low-complexity components, so that the invariance transformations of the data are now encoded in the symmetries of the correlation matrix (8); computing the eigenvectors (9) of the Laplacian operator on the graph (18) whose adjacency matrix is the correlation matrix (8) from the previous step; performing a coordinate transformation to the basis of eigenvectors (9) of the Laplacian operator; repeating from the first step with the next higher hierarchy level (11) until all hierarchy levels (7, 11) of the invariant representations of the hierarchically structured entities are processed and the neural network is trained; and using the trained artificial neural network for the digital image recognition of hierarchically structured entities, creating representations of those entities which are invariant under the transformations learnt in the previous steps.
2. The method according to claim 1, wherein the sparse coding dictionary learning comprises a first processing step of recognizing patterns (15) in the input signal data (14), wherein those patterns (15) represent specific recurring combinations in the input signal data (14).
3. The method according to claim 1, wherein the representation of low-complexity components is created by computing a correlation matrix (8) of co-occurrences of neuron activations.
4. The method according to claim 1, wherein the next higher hierarchy level (11) gets the result of the coordinate transformation from the base of eigenvectors (9) as input data.
5. The method according to claim 1, wherein the use of the trained artificial neural network for digital image recognition comprises image denoising, object recognition, speech recognition and text recognition.
6. The method according to claim 5, wherein the text and object recognition comprises solving captchas or recognizing chemical structures in images.
7. An artificial neural network established on a computer by performing the method according to claim 1.
8. A software product which, when executed on a computer, performs the method according to claim 1 and establishes an artificial neural network on the computer.
Description
[0025] The drawings show:
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036] The solution is a software product which runs on a suitable computer and executes the following method in the form of an algorithm on the input signal, which is preferably at least one digital image:
[0037] 1. Perform sparse coding, as a form of dictionary learning, on the input signal to obtain a representation of low-complexity components, e.g. line segments in the case of an image. These low-complexity components are also called atoms.
[0038] 2. Infer the possible transformations from the statistics of the sparse representation: compute the correlation matrix 8 between the atoms, i.e. count how often a given pair of atoms is activated simultaneously by the same input data point. An allowed invariance transformation of the data is now encoded in the symmetries of this correlation matrix 8.
[0039] 3. Perform a coordinate transformation to the basis of eigenvectors of the inferred transformation. In this new basis, the problem of encoding the next higher hierarchy level 11 is reduced in dimensionality.
[0040] 4. Repeat the algorithm, starting at step 1, with the next higher hierarchy level 11.
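The numbered steps above can be sketched numerically. The following is a minimal toy sketch, not the patented implementation: it assumes step 1 has already produced a sparse code matrix S (here synthetic random activations), then performs steps 2 to 4 with plain NumPy; all shapes and variable names are illustrative assumptions.

```python
import numpy as np

# Assume step 1 already produced a sparse code matrix S: one row per
# input sample, one column per dictionary atom, mostly-zero activations.
rng = np.random.default_rng(0)
n_samples, n_atoms = 200, 30
S = (rng.random((n_samples, n_atoms)) > 0.9).astype(float)

# Step 2: correlation matrix between atoms, i.e. how often a given pair
# of atoms is activated simultaneously by the same input data point.
C = S.T @ S
np.fill_diagonal(C, 0.0)          # keep only pairwise co-activations

# Step 3: eigenvectors of the Laplacian of the graph whose adjacency
# matrix is the correlation matrix C.
D = np.diag(C.sum(axis=1))        # degree matrix of the correlation graph
L = D - C                         # (unnormalised) graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

# Step 4: coordinate transformation to the eigenvector basis; each sparse
# code vector, re-expressed in these coordinates, would serve as the
# input to the next hierarchy level.
S_next = S @ eigvecs
```

The smallest Laplacian eigenvalue is zero (the Laplacian's rows sum to zero), and `S_next` has the same shape as `S`, ready to be fed into the next level of the hierarchy.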
[0041] The algorithm of the invented method is hereinafter explained in more detail by showing a working example using the
[0042] First
[0043] A panda is something pretty complex. But the same issues also appear with simpler objects.
[0044] The human brain still manages to recognize a cube without any effort because it has somehow formed an abstract idea of what a cube is: the invariant representation 4.
[0045] Even more impressive is how few examples we need to create these invariant representations. How many pandas or panda pictures has a human brain processed in its life? Maybe a few dozen. How many pandas does a child need to see before it can recognize pandas? Maybe one or three, or at maximum about ten. So a handful of examples is enough for the human brain to learn, and then it can recognize every panda despite the astronomic number of ways a panda can look.
[0046] This ability of the human brain to form invariant representations is probably the biggest difference from AI algorithms according to the state of the art.
[0047] This problem needs to be solved not only for image recognition but also for abstract thinking, because in the end abstract thoughts are always tied to sensory signals. It is not possible to think of a mathematical formula without somehow visualizing it, either its written form or its meaning or the objects it represents. That means that the problem of invariant representations is currently blocking the development of strong AI. A strong AI would enable superhuman progress on many other scientific problems.
[0048]
[0049] The following figures explain the single method steps using a specific working example, starting with
[0050] The next method step explained in
[0051]
[0052] The decisive observation in the next step according to
[0053] A symmetry in a matrix can now be found by computing the Laplacian eigenvectors 9. For an exact symmetry, non-localized eigenvectors result, essentially a kind of Fourier transformation, where the axis along which is transformed is the trace of the symmetry transformation. This is done essentially by considering three points:
[0054] 1. Regard the correlation matrix 8 as the edge weights of a graph 18 with 2700 nodes.
[0055] 2. Compute the Laplacian eigenvectors 9 (vibration modes) of this graph 18, meaning the eigenvectors of the Laplacian operator 9 on the graph 18.
[0056] 3. Visualize the eigenvectors 15a, 16a in terms of the receptive fields of the graph nodes.
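The claim that an exact symmetry yields non-localized, Fourier-like eigenvectors can be checked on a toy graph. The sketch below is an illustrative assumption, not part of the patented working example: it uses a cycle graph, whose adjacency matrix is invariant under rotation, so its Laplacian eigenvectors are the discrete Fourier modes and its eigenvalues have the known closed form 2 - 2 cos(2 pi k / n).

```python
import numpy as np

# Toy graph with an exact (rotational) symmetry: a ring of n nodes,
# each connected to its two neighbours.
n = 16
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

L = np.diag(A.sum(axis=1)) - A        # graph Laplacian of the ring
eigvals, eigvecs = np.linalg.eigh(L)

# Known spectrum of the cycle graph: 2 - 2*cos(2*pi*k/n) for k = 0..n-1.
# The matching eigenvectors are the delocalised discrete Fourier modes.
expected = np.sort(2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n))
print(np.allclose(np.sort(eigvals), expected))   # True
```

For the 2700-node correlation graph of the working example, the symmetries are only approximate, but the same computation exposes them in the leading eigenvectors.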
[0057] In the working example there are 2700 neurons in the first layer 7. Therefore the correlation matrix 8 has the size 2700×2700, and its Laplacian eigenvectors 9 also have a dimension of 2700. Now, if an input image 14 activates some neurons, it can be seen as a vector 10 in a 2700-dimensional space. In the working example, five entries would be equal to one and all others zero. This vector 10 can now be expressed in another basis, namely in the basis of the Laplacian eigenvectors 9. So a new, transformed vector 10 with 2700 components is obtained. This new vector 10 is visualized as a long line of color-coded pixels 13 (see
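This basis change can be sketched as follows. The snippet is a toy sketch with dimension 100 instead of 2700, and the random symmetric correlation matrix is an illustrative stand-in for the learned one; since the Laplacian is symmetric, its eigenvectors form an orthonormal basis and the transformation is lossless.

```python
import numpy as np

# Stand-in correlation matrix (symmetric, zero diagonal) for a layer
# with n neurons; in the working example n would be 2700.
rng = np.random.default_rng(1)
n = 100
C = rng.random((n, n))
C = (C + C.T) / 2.0
np.fill_diagonal(C, 0.0)

L = np.diag(C.sum(axis=1)) - C        # graph Laplacian
_, eigvecs = np.linalg.eigh(L)        # orthonormal eigenvector basis

# An input image activating five neurons is a sparse 0/1 vector ...
v = np.zeros(n)
v[[3, 17, 42, 60, 88]] = 1.0

# ... which, re-expressed in the eigenvector basis, becomes the dense
# "line of colour-coded pixels" fed to the next layer.
v_new = eigvecs.T @ v
print(np.allclose(eigvecs @ v_new, v))   # True: the basis change is invertible
```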
[0058] If the lines of color-coded pixels are drawn for many input images 14, the lines for similar images are similar. Similar images here means those images which show the same letter, regardless of their color and position. So the system of the ANN has learned, in a way, that color and position are not as important as the type of letter.
[0059] The similarities between the lines of pixels 13, which we have already easily recognized visually, can now be used algorithmically. The second layer 11 is therefore built so that it basically works the same way as the first layer 7, but gets the resulting pixel lines 13 from
[0060] The algorithm has therefore learned in an unsupervised way to distinguish letters and other symbols independent of their position and color. By looking only at the statistical properties of the input data, it has discovered the concepts of translational invariance and color invariance. That means that by applying the algorithm on an input signal, an ANN is trained to handle the invariant representations of the processed signals, or rather images 14. The invented method therefore results in a specific trained neural network consisting of multiple layers 7, 11 handling the different hierarchy levels of the input signal or images 14.
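The layer-by-layer construction described above, where each level computes atom co-activations, Laplacian eigenvectors, and a basis change, and then feeds its output to the next level, can be condensed into a hypothetical helper; the function name, shapes, and synthetic data are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def laplacian_layer(S):
    """One hierarchy level (hypothetical helper): sparse codes in,
    Laplacian-eigenbasis coordinates out, as sketched in the method."""
    C = S.T @ S                       # co-activation counts between atoms
    np.fill_diagonal(C, 0.0)
    L = np.diag(C.sum(axis=1)) - C    # graph Laplacian of the correlation graph
    _, eigvecs = np.linalg.eigh(L)
    return S @ eigvecs                # input representation for the next layer

# Synthetic layer-1 sparse codes standing in for the letter-image example.
rng = np.random.default_rng(2)
layer1_codes = (rng.random((50, 20)) > 0.85).astype(float)
layer2_input = laplacian_layer(layer1_codes)   # "pixel lines" for layer 2
```

In a full pipeline, a new sparse dictionary would be learned on `layer2_input` before applying the helper again at the next hierarchy level.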
[0061] In principle, possible further preferred embodiments could comprise very different software products which use the described method, for example, to perform tasks like image denoising, object recognition, speech recognition, etc. The most immediate examples could be methods and constitutive systems which perform special cases of text recognition, for example, to solve captchas or to recognize chemical structures in images.
LIST OF REFERENCES
[0062] 1 First example picture with a panda
[0063] 2 Second example picture with random noise
[0064] 3 Resulting manipulated example picture
[0065] 4 Perception of a cube with invariant representations
[0066] 5 Overview about the single steps of the algorithm
[0067] 6 Image generator
[0068] 7 Layer 1
[0069] 8 Correlation matrix
[0070] 9 Laplace eigenvector
[0071] 10 Input vector expressed with Laplace eigenvectors
[0072] 11 Layer 2
[0073] 12 Overall input image
[0074] 13 One line with color codes
[0075] 14 Single input images
[0076] 15 Collected patterns in input images
[0077] 15a Cutout of visualized Laplace eigenvectors
[0078] 16 Cutout from collected patterns in input images
[0079] 16a Resulting visualized Laplace eigenvectors
[0080] 17a First layer 1 neuron
[0081] 17b Second layer 1 neuron
[0082] 17c Layer 2 neuron
[0083] 18 Correlation graph with edge weights