Memory device and matrix processing unit utilizing the memory device
11250106 · 2022-02-15
Inventors
Cpc classification
G03C1/733
PHYSICS
G06F17/16
PHYSICS
G02F1/163
PHYSICS
International classification
G06F17/16
PHYSICS
G02F1/163
PHYSICS
G11C13/00
PHYSICS
G06F17/17
PHYSICS
Abstract
A matrix processing apparatus having a three-dimensional slice access memory and an input/output block. The slice access memory includes cells organized into cell slices, each slice storing an entire selected data matrix. The three-dimensional slice access memory is configured to allow read/write access to the entire data matrix at the same time. The input/output block is connected to the three-dimensional slice access memory and is configured to format data into a format acceptable to the three-dimensional slice access memory.
Claims
1. A matrix processing apparatus, comprising: a three-dimensional slice access memory comprising a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time; and an input/output block connected to said three-dimensional slice access memory and configured to format data into a format acceptable to said three-dimensional slice access memory, wherein each of said cells of said three-dimensional slice access memory comprises a circuit having a photochrom fluorescing under an influence of illumination emitted by a light source, and a photo-resistive element, wherein light from said photochrom falls onto said photo-resistive element whose resistance depends on an intensity of said light from said photochrom.
2. The apparatus of claim 1, further comprising at least one matrix processing device configured to manipulate data in matrix form and at least one matrix data bus connecting said at least one matrix processing device to said three-dimensional slice access memory.
3. The apparatus of claim 2, wherein said at least one matrix data bus comprises a plurality of channels, and wherein a number of channels in said at least one matrix data bus corresponds to dimensions of said selected data matrix.
4. The apparatus of claim 2, wherein said at least one matrix processing device is a vector-matrix multiplication device configured to multiply a matrix by a vector.
5. The apparatus of claim 2, wherein said at least one matrix processing device is a matrix-matrix multiplication device configured to multiply a first matrix by a second matrix.
6. The apparatus of claim 2, wherein said at least one matrix processing device is a Hadamard product device configured to perform element-wise multiplication of matrices having the same dimensions.
7. The apparatus of claim 2, wherein said at least one matrix processing device is a matrix addition device configured to perform element-by-element addition of matrices having the same dimensions.
8. The apparatus of claim 2, wherein said at least one matrix processing device is a matrix determinant calculation device configured to calculate a determinant of a particular matrix.
9. The apparatus of claim 2, further comprising an external data bus and a central controller, said central controller being connected to said three-dimensional slice access memory, said at least one matrix processing device and said input/output block, wherein said external data bus is separate and distinct from said matrix data bus.
10. The apparatus of claim 9, where said central controller is configured to control at least one of said three-dimensional slice access memory, said at least one matrix processing device and said input/output block based on external instructions conveyed to said central controller via said external data bus.
11. A matrix processing apparatus, comprising: a three-dimensional slice access memory comprising a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time; and an input/output block connected to said three-dimensional slice access memory and configured to format data into a format acceptable to said three-dimensional slice access memory, wherein each of said cells of said three-dimensional slice access memory comprises a circuit having a photochrom fluorescing under an influence of illumination emitted by a light source, and a photocell, wherein light from said photochrom falls onto said photocell which converts fluorescence of said photochrom into an electric current.
12. A matrix processing apparatus, comprising: a three-dimensional slice access memory comprising a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time; and an input/output block connected to said three-dimensional slice access memory and configured to format data into a format acceptable to said three-dimensional slice access memory, wherein each of said cells of said three-dimensional slice access memory comprises a crossbar of multiple strips of light sources and multiple bands of optical summarizers positioned perpendicularly to said multiple strips of light sources, wherein each of said cells of said three-dimensional slice access memory further comprises a layer of photochromic film having a plurality of pixels, and wherein light from said light sources selectively illuminates at least some of said pixels of the photochromic film causing fluorescence of said illuminated pixels.
13. The apparatus of claim 12, wherein a light from said illuminated pixels of the photochromic film falls onto and is at least partially converted into a fluorescence of said optical summarizers.
14. The apparatus of claim 13, wherein said light from said illuminated pixels of the photochromic film is concentrated along each of said optical summarizers and is outputted from said optical summarizers as a total light signal.
15. The apparatus of claim 14, wherein said optical summarizers and said photochromic film are photochromic fluorescent optical fibers.
16. A memory device comprising: a three-dimensional slice access memory having a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time, wherein each of said cells comprises a circuit having a photochrom fluorescing under an influence of illumination emitted by a light source, and a photo-resistive element, wherein light from said photochrom falls onto said photo-resistive element whose resistance depends on an intensity of said light from said photochrom.
17. The memory device of claim 16, wherein said circuit further comprises at least one first light emitting diode and at least one second light emitting diode, said first light emitting diode emitting a first light wavelength converting said photochrom into a fluorescent state, and said second light emitting diode emitting a second light wavelength suppressing fluorescence of said photochrom.
18. The memory device of claim 17, wherein said first light emitting diode and said second light emitting diode are connected to the same circuit in parallel with an opposite polarity.
19. The memory device of claim 16, wherein said photo-resistive element is a photoresistor.
20. The memory device of claim 16, wherein said photo-resistive element is a phototransistor.
21. A memory device comprising: a three-dimensional slice access memory having a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time, wherein each of said cells comprises a circuit having a photochrom fluorescing under an influence of illumination emitted by a light source, and a photocell, wherein light from said photochrom falls onto said photocell which converts fluorescence of said photochrom into an electric current.
22. The memory device of claim 21, wherein said circuit further comprises at least one first light emitting diode and at least one second light emitting diode, said first light emitting diode emitting a first light wavelength converting said photochrom into a fluorescent state, and said second light emitting diode emitting a second light wavelength suppressing fluorescence of said photochrom.
23. The memory device of claim 22, wherein said first light emitting diode and said second light emitting diode are connected to the same circuit in parallel with an opposite polarity.
24. A memory device comprising: a three-dimensional slice access memory having a plurality of cells organized in a plurality of cell slices, each slice storing an entire selected data matrix, said three-dimensional slice access memory being configured to allow read/write access to said entire selected data matrix at the same time, wherein each of said cells comprises a crossbar of multiple strips of light sources and multiple bands of optical summarizers positioned perpendicularly to said multiple strips of light sources, wherein each of said cells of said three-dimensional slice access memory further comprises a layer of photochromic film having a plurality of pixels, and wherein light from said light sources selectively illuminates at least some of said pixels of the photochromic film causing fluorescence of said illuminated pixels.
25. The memory device of claim 24, wherein a light from said illuminated pixels of the photochromic film falls onto and is at least partially converted into a fluorescence of said optical summarizers.
26. The memory device of claim 25, wherein said light from said illuminated pixels of the photochromic film is concentrated along each of said optical summarizers and is outputted from said optical summarizers as a total light signal.
27. The memory device of claim 26, wherein said optical summarizers and said photochromic film are photochromic fluorescent optical fibers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like references denote corresponding parts.
DETAILED DESCRIPTION
(34) As shown in
(35) It should be further understood by a person skilled in the art that additional blocks/devices can be utilized within the architecture of the MPU of the present invention.
(36) Slice Access Memory (SAM)
(37) A prerequisite for coordinated and maximally efficient operation of all devices included in the described Matrix Processing Unit (MPU) is the use of a special type of non-volatile fast 3D-memory that provides read/write access simultaneously to the whole matrix.
(38) Even though SAM can be constructed utilizing ordinary DRAM, DRAM is volatile and consumes energy even in the absence of memory operations. This results in significant power consumption. This configuration would also require saving and reloading the memory contents each time processor power is turned off.
(39) Further, the architecture of DRAM allows only a small number of channels to be used in parallel, making it slow. Non-volatile flash memory is also poorly suited to building SAM, since it can fail after a large number of read/write cycles, which is unacceptable when working with a processor.
(40) Memristor SAM
(41) A seemingly suitable basis for building SAM is the memristor, which is non-volatile, energy-saving, and has an almost unlimited tolerance for read/write cycles. A typical memristor crossbar, shown in
(42) However, this approach requires storage of both positive and negative values, which are controlled by the opposite polarity of the voltage. This can be accomplished by using a two-layer architecture, as shown in
(43) Building a multi-layer architecture from the same memristor crossbar layers (
(44) The main disadvantage of this approach is the main feature of the memristor itself, i.e., the effect of the voltage applied to the memristor on its resistance. Each operation of reading the resistance of the memristor changes that resistance and requires periodic regeneration of the initial state. All this complicates the practical use of memristors as a matrix memory.
(45) Photo-Memristor
(46) To eliminate the issues of memristor-based SAM, the proposed system separates the recording and the reading processes as illustrated in
(47) In this embodiment, the non-volatile memory medium is a layer of photochromic substance 4 fluorescing under the influence of illumination at a certain wavelength emitted by the light source 1. The light from the fluorescence of photochrom 4 falls on a resistive element whose resistance depends on the intensity of the light. A photo-resistive element can be, for example, a photoresistor 8 in
(48) However, unlike an ordinary single-circuit memristor, the proposed device contains three circuits: 5, 6 and 7, where circuit 5 is designed for read mode, causing fluorescence of photochrom 4; circuit 6 is designed for recording mode, increasing or decreasing the fluorescence level of photochrom 4, depending on the polarity of the voltage on circuit 6; and circuit 7 is designed to read the resistance level of the resistive element 8 or 9. Separating operation into independent read and write modes eliminates the shortcomings of memristors described above, while still allowing the proposed photo-memristor to be used in circuits designed for an ordinary memristor.
(49) In spite of the described advantages of a photo-memristor, it has some limitations. The energy consumption of a photo-memristor can exceed the energy consumption of an ordinary memristor, since in an ordinary memristor the energy is expended only on the transmission of currents through the memristors themselves, but, in the photo-memristor, the energy is also expended on the illumination of the photochrom.
(50) Photochromic SAM
(51) To reduce the energy consumption of Photo-Memristor SAM, the SAM architecture can be based on pairing a light source that has memory with a light receiver. An example of such a pair is a photochrom and a photodiode. The photochrom preferably serves as the light source with memory, and the photodiode, as the light receiver, transforms light into electric current, as shown in
(52) Another example of the source and the receiver of light may be photoactive organic field-effect transistors (OFETs)—light-emitting organic field-effect transistors (LE-OFETs) and light-receiving organic field-effect transistors (LR-OFETs). LE-OFETs can function as non-volatile optical memories, and LR-OFETs, as phototransistors.
(53) As shown in
(54) Photochromic SAM (
(55) For example, the recording of information on a photochromic film can be performed using a simple Passive-Matrix LED/OLED (
(56) The simplest implementation of such a circuit for a recording layer is a double crossbar, as shown in
(57) The formation of a multilayer structure (
(58) Optical Summators in Photochromic SAM
(59) Summation of light signals can be carried out not only by semiconductor (photodiode) circuits (14) shown in
(60) The light from the fluorescent photochrom (22) falls on the fluorescent optical fiber (23) and is partially converted by it into fluorescence of the optical fiber itself. Further, the light propagates through the fiber. Thus, the light from the fluorescent photochrom (22), concentrated along the entire length of the fiber, is summed in it and arrives at the output as a total light signal (24).
(61) Functions of the photochromic memory layer (25) and the optical concentrators can be combined using a fluorescent photochrome as the material of the optical concentrator, as shown in
(62) Information Coding
(63) Positional Coding
(64) To encode numeric data, it is proposed to use a positional coding system, in which the number is represented in the form of a sum of digits multiplied by the corresponding position parameter. For example, a number consisting of the digits a, b, c, and d:
$$abcd = a_3 b_2 c_1 d_0 = a\eta^3 + b\eta^2 + c\eta^1 + d\eta^0,$$
(65) where $\eta$ is the base of the numeral system (note that $a_3 b_2 c_1 d_0$ represents a sequence of digits, not a multiplication). This approach makes it possible to store any number in memory in the form of a vector, regardless of the amount of information stored in one memory cell. For example, in the decimal number system $618 = 6_2 1_1 8_0 = 6 \cdot 10^2 + 1 \cdot 10^1 + 8 \cdot 10^0$, and three cells with values of 6, 1 and 8 will be occupied in memory. Binary encoding in this case does not differ from the usual computer binary coding. The range of powers of the position parameters is a matter of convention. With nonnegative powers (0 and above), integers are encoded. With negative powers for the lower position parameters, real numbers are encoded. For example, $6.18 = 6_0 1_{-1} 8_{-2} = 6 \cdot 10^0 + 1 \cdot 10^{-1} + 8 \cdot 10^{-2}$.
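By way of a non-limiting illustration, the positional coding described above may be sketched in Python as follows. The function names `encode` and `decode` and the parameter `lowest_power` are assumptions introduced only for this sketch and do not appear in the specification.

```python
def encode(n: int, base: int = 10) -> list[int]:
    """Return the digits of a nonnegative integer, most significant first."""
    if n < 0:
        raise ValueError("this sketch encodes nonnegative integers only")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]


def decode(digits: list[int], base: int = 10, lowest_power: int = 0) -> float:
    """Sum digit * base**power; the last digit sits at power `lowest_power`."""
    top = lowest_power + len(digits) - 1
    return sum(d * base ** (top - i) for i, d in enumerate(digits))


# 618 occupies three cells holding 6, 1 and 8.
cells = encode(618)
value = decode(cells)
# The same digits with the lowest power set to -2 encode 6.18.
fraction = decode(cells, lowest_power=-2)
```

Each list element corresponds to one memory cell, so the vector length, not the cell capacity, determines the range of representable numbers.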
(66) Analog Positional Accumulation
(67) With positional coding, analog summation is performed for digits with equal positions:
$$a_3 b_2 c_1 d_0 + e_3 f_2 g_1 h_0 = (a\eta^3 + b\eta^2 + c\eta^1 + d\eta^0) + (e\eta^3 + f\eta^2 + g\eta^1 + h\eta^0) = (a+e)\eta^3 + (b+f)\eta^2 + (c+g)\eta^1 + (d+h)\eta^0$$
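As a minimal sketch, position-wise accumulation can be modeled as a column sum over digit vectors, followed by a separate carry step that returns the sums to standard positional form. The names `accumulate` and `normalize` are hypothetical, and the carry step is an assumption about how the analog sums would be brought back to single digits.

```python
def accumulate(*numbers: list[int]) -> list[int]:
    """Add digit vectors of equal length position by position, without carries."""
    return [sum(column) for column in zip(*numbers)]


def normalize(sums: list[int], base: int = 10) -> list[int]:
    """Propagate carries so that every position holds a single digit."""
    digits = []
    carry = 0
    for s in reversed(sums):  # start at the least significant position
        carry, d = divmod(s + carry, base)
        digits.append(d)
    while carry:  # extend the vector if the result needs more positions
        carry, d = divmod(carry, base)
        digits.append(d)
    return digits[::-1]


# 0618 + 0995: the analog sums may exceed the base until normalized.
raw = accumulate([0, 6, 1, 8], [0, 9, 9, 5])   # [0, 15, 10, 13]
result = normalize(raw)                        # digits of 1613
```

In this model the accumulation step is a single parallel operation over all positions, while normalization is the only sequential part.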
(68) Analog Positional Multiplication
(69) With positional coding, analog multiplication of two numbers is performed like a matrix-matrix multiplication of their two digit vectors, forming a multiplication matrix:
(70)
(71) where the multiplied digital factors can be represented in the form of a matrix product:
(72)
(73) In this case, the positions, with the corresponding matrix multipliers, form the matrix of the power degrees:
(74)
(75) It can be seen that the degrees are arranged along parallel diagonal lines, which allows an analogue summation of multipliers of the multiplication matrix, as seen in
(76) The result of such analog summation is already very close to the normal form of positional coding:
$$a_3 b_2 c_1 d_0 \cdot e_3 f_2 g_1 h_0 = ae \cdot \eta^6 + (be+af) \cdot \eta^5 + (ce+bf+ag) \cdot \eta^4 + (de+cf+bg+ah) \cdot \eta^3 + (df+cg+bh) \cdot \eta^2 + (dg+ch) \cdot \eta^1 + dh \cdot \eta^0$$
(77) It remains only to convert the multipliers represented by the analog sums to the positional form and perform the positional summation.
(78) For example, in decimal notation: 124.Math.3118=386632. We represent the product in the proposed matrix form:
(79)
(80) Now add the factors with equal positions (diagonals), as seen in
$$0 \cdot 10^6 + (3+0) \cdot 10^5 + (6+1+0) \cdot 10^4 + (12+2+1+0) \cdot 10^3 + (4+2+8) \cdot 10^2 + (4+16) \cdot 10^1 + 32 \cdot 10^0 = 0 + 300000 + 70000 + 15000 + 1400 + 200 + 32 = 386632$$
(81) We have come to the right result.
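The worked example 124 · 3118 can be reproduced with the following sketch, offered only as an illustration. The multiplication matrix is modeled as the outer product of the two digit vectors, and the diagonals of equal position are those with a constant index sum. The function names are hypothetical.

```python
def multiplication_matrix(x: list[int], y: list[int]) -> list[list[int]]:
    """Outer product of two digit vectors (most significant digit first)."""
    return [[a * b for b in y] for a in x]


def diagonal_sums(m: list[list[int]]) -> list[int]:
    """Sum the entries lying on each diagonal of equal position (i + j constant)."""
    rows, cols = len(m), len(m[0])
    sums = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += m[i][j]
    return sums  # highest power first


def fold(sums: list[int], base: int = 10) -> int:
    """Convert the diagonal sums to an ordinary integer (Horner's scheme)."""
    value = 0
    for s in sums:
        value = value * base + s
    return value


m = multiplication_matrix([0, 1, 2, 4], [3, 1, 1, 8])
sums = diagonal_sums(m)     # [0, 3, 7, 15, 14, 20, 32]
product = fold(sums)        # 386632
```

Only the final `fold` step involves carries; the products and the diagonal sums are independent of one another, which is what makes a one-step analog evaluation plausible.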
(82) The proposed mechanism for multiplying numbers can be implemented in analog form and executed in one step. This mechanism is naturally implemented in some of the above MPU devices, for example, the matrix-matrix multiplication block (MMM).
(83) As an example, we multiply two matrices with three-digit numbers in the decimal number system:
(84)
(85) The numbers are encoded positionally, as shown above. The digit vectors of the numbers of matrix A are arranged in columns, and the digit vectors of the numbers of matrix B are arranged in rows:
(86)
(87)
(88) The result is a 9×9 matrix shown in FIG. 33, or a 3×3 matrix consisting of 3×3 submatrices, each of which encodes an individual number of the resulting matrix.
(89) where for a submatrix shown in
(90) We add the factors with equal positions (diagonals):
$$0 \cdot 10^4 + (0+1) \cdot 10^3 + (0+3+2) \cdot 10^2 + (6+9) \cdot 10^1 + 24 \cdot 10^0 = 0 + 1000 + 500 + 150 + 24 = 1674$$
(91) For the submatrix shown in
(92) We add the factors with equal positions (diagonals):
$$0 \cdot 10^4 + (0+3) \cdot 10^3 + (0+30+3) \cdot 10^2 + (37+15) \cdot 10^1 + 22 \cdot 10^0 = 0 + 3000 + 3300 + 520 + 22 = 6842$$
(93) Folding the obtained matrix, we get a result analogous to that obtained by the usual multiplication:
(94)
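The block structure of this calculation may be sketched as follows. The input matrices of the example are not reproduced in this text, so the values of `A` and `B` below are hypothetical, as are all function names. The sketch assumes that each number of A is expanded into a column of digits and each number of B into a row of digits, so that every submatrix of the expanded product folds into one element of the ordinary product.

```python
def digits(n: int, width: int, base: int = 10) -> list[int]:
    """Fixed-width digit vector, most significant digit first."""
    return [(n // base ** p) % base for p in range(width - 1, -1, -1)]


def expand_columns(a, width):
    """Each number of A becomes a column of digits: (n*width) x n."""
    out = []
    for row in a:
        cols = [digits(v, width) for v in row]
        for p in range(width):
            out.append([c[p] for c in cols])
    return out


def expand_rows(b, width):
    """Each number of B becomes a row of digits: n x (n*width)."""
    return [[d for v in row for d in digits(v, width)] for row in b]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def fold_blocks(m, n, width, base=10):
    """Fold every width x width submatrix along its diagonals into one number."""
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for p in range(width):
                for q in range(width):
                    power = (width - 1 - p) + (width - 1 - q)
                    result[i][j] += m[i * width + p][j * width + q] * base ** power
    return result


A = [[123, 45], [6, 789]]      # hypothetical three-digit inputs
B = [[10, 200], [31, 4]]
expanded = matmul(expand_columns(A, 3), expand_rows(B, 3))   # 6 x 6
C = fold_blocks(expanded, n=2, width=3)
```

For 3×3 inputs with three-digit numbers the same code yields a 9×9 expanded product, matching the dimensions stated above.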
(95) Encoding Negative Values
(96) In connection with the features of the analog implementation of matrix calculations in the described MPU, the coding of negative values will differ from the methods used in classical computers.
(97) Since direct analog calculations in the proposed device are performed only with absolute values, it is necessary to separate the positive and negative values and perform these calculations separately.
(98) Positive and negative values can be separated either in space or in time.
(99) Separation in space:
(100) Independent parallel SAM layers are preferably utilized for this function. In SAM, positive storing layers preferably alternate with negative storing layers. As shown in
(101) Thus, when a vector of values is read from these two layers, the resulting vector (31) consists of alternating positive and negative values.
(102) Computational operations with positive and negative matrix components should be carried out separately, so the matrix of positive and negative values stored in SAM must be divided in space not only by layers, but also by slices, as shown in
(103) Separation in time:
(104) The compactness of recording information in SAM can be improved by marking the sign of the value with a flag, just as it is done in modern computers. However, in that case a mechanism for managing access to memory is required, depending on the flag value.
(105) For example, access to the values is via nMOS or pMOS transistors. The gate signal is fed from the value sign flag. One control signal for SAM allows access only to values with a positive flag, and another control signal allows access only to values with a negative flag. In this scenario, the separation of matrices into positive and negative components occurs in time, since one-step access to the data is provided only to either positive or negative values. Calculations with both will need to be performed sequentially.
(106) Another way of separating positive and negative values is to use a mixture of photochromes reacting to different wavelengths. Some wavelengths correspond only to negative values, while the others correspond only to positive values. This makes it possible to work selectively with information, depending on the conventional sign.
(107) Matrix Addition (MA)
(108) Selecting two or more matrices in SAM simultaneously leads to their automatic summation, thus eliminating the need to develop a separate device for this purpose.
(109) When adding matrices, negative components add only with negative ones, and positive components only with positive ones. The result is the difference between the positive and negative sums.
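A minimal sketch of this signed addition is given below, assuming that the negative component is stored as absolute values, consistent with the statement that analog calculations operate only on absolute values. The function names and sample matrices are hypothetical.

```python
def split_signs(m):
    """Split a matrix into positive and negative components (absolute values)."""
    pos = [[v if v > 0 else 0 for v in row] for row in m]
    neg = [[-v if v < 0 else 0 for v in row] for row in m]
    return pos, neg


def elementwise_sum(*matrices):
    """Model of selecting several SAM slices at once: entries add up."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


def signed_sum(*matrices):
    """Add positives with positives, negatives with negatives, then subtract."""
    parts = [split_signs(m) for m in matrices]
    pos_total = elementwise_sum(*(p for p, _ in parts))
    neg_total = elementwise_sum(*(n for _, n in parts))
    return [[p - n for p, n in zip(pr, nr)]
            for pr, nr in zip(pos_total, neg_total)]


total = signed_sum([[1, -2], [3, 4]], [[-5, 6], [7, -8]])
```

Both partial sums contain only nonnegative entries, so each can be formed by the same unsigned summation mechanism before the final subtraction.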
(110) Matrix Multiplication (MM)
(111) For multiplication, it is also necessary to separate positive and negative values. As shown above, each matrix must be divided into two matrices, one of which contains only the positive values and the other only the negative ones. Multiplication is performed separately for the positive and negative components of both matrices. Thus, there are four independent multiplications, which pair Matrix 1$^{+}$ (the positive component of Matrix 1) and Matrix 1$^{-}$ (the negative component of Matrix 1) with Matrix 2$^{+}$ (the positive component of Matrix 2) and Matrix 2$^{-}$ (the negative component of Matrix 2). The products Matrix 1$^{+}$ by Matrix 2$^{+}$ and Matrix 1$^{-}$ by Matrix 2$^{-}$ are the positive component of the resulting matrix, while the products Matrix 1$^{+}$ by Matrix 2$^{-}$ and Matrix 1$^{-}$ by Matrix 2$^{+}$ are the negative component. To calculate the result of matrix multiplication, the negative component is subtracted from the positive component of the resulting matrix.
(118) For example, for matrices
(119)
(120) We divide the matrices A and B into positive and negative components:
(121)
(122) We obtain the positive components of the matrix C:
(123)
(124) As a result:
(125)
(126) The negative components of the matrix C:
(127)
(128) As a result:
(129)
(130) And, finally:
(131)
(132) We have arrived at the same result as in direct multiplication of matrices.
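The four-product scheme can be checked numerically with the sketch below. The matrices of the example above are not reproduced in this text, so `A` and `B` here are hypothetical values chosen only for illustration, and the function names are likewise assumptions.

```python
def split_signs(m):
    """Positive and negative components, both stored as nonnegative values."""
    pos = [[v if v > 0 else 0 for v in row] for row in m]
    neg = [[-v if v < 0 else 0 for v in row] for row in m]
    return pos, neg


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def signed_matmul(a, b):
    a_pos, a_neg = split_signs(a)
    b_pos, b_neg = split_signs(b)
    # Like signs give the positive component of the result.
    c_pos = add(matmul(a_pos, b_pos), matmul(a_neg, b_neg))
    # Unlike signs give the negative component.
    c_neg = add(matmul(a_pos, b_neg), matmul(a_neg, b_pos))
    return [[p - n for p, n in zip(pr, nr)] for pr, nr in zip(c_pos, c_neg)]


A = [[1, -2], [3, 4]]       # hypothetical inputs
B = [[-5, 6], [7, -8]]
C = signed_matmul(A, B)
```

All four partial products involve nonnegative operands only, so each can be evaluated by the same unsigned multiplication hardware.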
(133) Vector-Matrix Multiplication (VMM)
(134) In addition to devices such as TPU [1] and EnLight256 [2], VMM can be implemented, as shown above, on a single layer memristor crossbar (
(135) As shown above, the VMM can be implemented on the basis of the crossbar of linear light sources and linear photodiodes, using a photochromic film (as shown in
(136) As illustrated in
(137) Matrix-Matrix Multiplication (MMM)
(138) The complexity of computing VMM by definition is $O(n^2)$, while the complexity of computing MMM by definition is $O(n^3)$, where n is the dimension of the side of the matrix. Special algorithmic techniques have reduced the complexity of MMM in solving practical problems to about $O(n^{2.52})$. Due to the “Coppersmith-Winograd barrier” in asymptotic estimates of the speed of the algorithms, no further algorithmic increase in the speed of MMM calculation is foreseen. The transition from VMM to MMM therefore means a radical (power-law) increase in the speed of computation.
(139) The SAM architecture of the present invention, for example, based on photochromes, allows not only VMM on a separate layer, but also MMM, when using a multi-layer package, where MMM can be represented as n independent VMMs, the results of which (vectors) are collected in a matrix.
(140) However, this approach requires n identical copies of the same matrix, one in each of n identical SAM layers; only then can it be applied to MMM. The need to create multiple copies of one matrix in advance is the bottleneck of such an approach, since the cost of copying negates the entire time gain from the speed of calculations. The Photochromic SAM architecture makes it possible to build an MMM device that uses only one copy of the matrix where it is needed, thus eliminating the copying issue.
(141) MMM Using Transparent Modulator
(142) If a layer of photochromic substance (11) with fluorescent pixels (12) of
(143) The proposed architecture makes it possible to form a multilayer structure, as shown in
(144) However, to multiply a vector by a matrix, it is also necessary to multiply the values of the matrix by the values of the vector, that is, it is necessary to further modulate the luminescence intensity of the pixel light sources, along the lines in the plane of the layer and perpendicular to the photodiode bands. Such modulation can be implemented in various ways. For example, a modulator can be a set of parallel bands with an adjustable transparency (for example, liquid crystal or photochrom), as shown in the embodiment of
(145) In this embodiment, bands of the optical modulator with an adjustable transparency (38) are located between the grounding circuits (36) of the light sources (34) and the photodiode bands (37), in the same plane as, but perpendicular to, the photodiode bands (37). Each band of the optical modulator with adjustable transparency (38) receives its own signal from the input vector, which sets its transparency. Light from the sources (34), passing through the band of the optical modulator (38), effectively multiplies the value of the input matrix by the value of the input vector. The modulated light is summed over the photodiode bands. In this way VMM is implemented on one MMM layer. As was shown above, a copy of the same input matrix is formed on each layer of such a device; therefore, on each layer a different vector is multiplied by the same matrix, resulting in MMM calculation on the described device.
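As an idealized numerical model of this embodiment, each layer can be treated as scaling every row of pixel intensities by the transparency of its modulator band and then summing along each photodiode band. The orientation of rows and columns, the function names, and the unbounded numeric values (physical transparency would lie between 0 and 1) are assumptions of this sketch.

```python
def layer_vmm(matrix, vector):
    """One layer: modulator band k scales row k; photodiode band j sums column j."""
    modulated = [[vector[k] * matrix[k][j] for j in range(len(matrix[0]))]
                 for k in range(len(matrix))]
    return [sum(row[j] for row in modulated) for j in range(len(matrix[0]))]


def stacked_mmm(vectors, matrix):
    """One layer per input vector; every layer sees the same stored matrix."""
    return [layer_vmm(matrix, v) for v in vectors]


M = [[5, 6], [7, 8]]            # matrix presented on every layer
X = [[1, 2], [3, 4]]            # each row drives the modulators of one layer
product = stacked_mmm(X, M)     # equals the matrix product X * M
```

Since the layers do not interact, all rows of the result are obtained in the same step, which is the source of the speed-up over repeated VMM.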
(146) The described device for calculating the MMM can be represented by a parallelepiped shown in
(147) MMM Using TFT Modulator
(148) In another preferred embodiment, shown in
(149) An array of such nodes forms one layer of the device similar to one layer of Photochromic SAM, as shown in
(150) Formation of a multilayer structure of layers of
(151) Optical MMM
(152) The above-described MMM implementations, both for the Transparent Modulator and for the TFT Modulator, describe the same MMM device concept illustrated in
(153) This MMM architecture makes it possible to create a purely optical device for implementing MMM. Specific miniature devices can be used, for example, nano-devices that generate a beam of light only if two beams with certain wavelengths fall on such a nano-device at the same time. The intensity of the generated light depends on both beams that fell on the nano-device. This provides multiplication of the two initial values. If a transparent substance is uniformly filled with such optical nano-devices, the resulting optical composite can be used for MMM.
(154) As shown in
(155) Similarly, summation of the multiplied values is illustrated in
(156) Hadamard Product (HP)
(157) For element-by-element multiplication of matrices, optical modulation similar to the one proposed in Photochromic SAM can be used. As illustrated in
(158) For the multiplication of numbers in positional coding, the analog digit multiplication method proposed above can be used. However, in order not to use the complex MMM 3D model proposed above for computing HP (
(159) For example, for the product abcd.Math.efgh, matrices will be used:
(160)
(161) As a result of the proposed device for calculating HP of
(162)
(163) The result does not differ from that of the analog positional multiplication method proposed above. From the matrix obtained by summation over the diagonals, the result of multiplying the original numbers is obtained.
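The equivalence can be illustrated with the sketch below. The matrices used for the product abcd · efgh are not reproduced in this text, so the duplication layout is an assumption: one digit vector is repeated across the columns and the other across the rows, so that the element-wise product equals the multiplication matrix. All function names are hypothetical.

```python
def duplicate_as_columns(x, width):
    """Matrix whose every column is the digit vector x: entry (i, j) = x[i]."""
    return [[d] * width for d in x]


def duplicate_as_rows(y, height):
    """Matrix whose every row is the digit vector y: entry (i, j) = y[j]."""
    return [list(y) for _ in range(height)]


def hadamard(x, y):
    """Element-wise product of two matrices of the same dimensions."""
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def fold_diagonals(m, base=10):
    """Weight each entry by the power of its diagonal and sum."""
    rows, cols = len(m), len(m[0])
    return sum(m[i][j] * base ** ((rows - 1 - i) + (cols - 1 - j))
               for i in range(rows) for j in range(cols))


x, y = [0, 1, 2, 4], [3, 1, 1, 8]
hp = hadamard(duplicate_as_columns(x, len(y)), duplicate_as_rows(y, len(x)))
product = fold_diagonals(hp)    # 124 * 3118
```

Duplication trades memory for simplicity: the same element-wise device that computes HP also produces the full multiplication matrix of the two numbers.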
(164) Example of multiplying two matrices with three-digit numbers in the decimal number system:
(165)
(166) It is necessary to represent the matrices A and B in the proposed positional coding with duplication of the digit vectors:
(167)
(168) Then:
(169)
(170) The result is a 9×9 matrix shown in
(171) where for submatrix shown in
(172) add the factors with equal positions (diagonals):
$$0 \cdot 10^4 + (0+0) \cdot 10^3 + (0+0+0) \cdot 10^2 + (0+3) \cdot 10^1 + 2 \cdot 10^0 = 0 + 0 + 0 + 30 + 2 = 32$$
(173) For submatrix shown in
(174) add the factors with equal positions (diagonals):
$$0 \cdot 10^4 + (0+7) \cdot 10^3 + (0+21+2) \cdot 10^2 + (7+6) \cdot 10^1 + 2 \cdot 10^0 = 0 + 7000 + 2300 + 130 + 2 = 9432$$
(175) For submatrix shown in
(176) we add the factors with equal positions (diagonals):
$$0 \cdot 10^4 + (0+0) \cdot 10^3 + (10+0+0) \cdot 10^2 + (10+0) \cdot 10^1 + 15 \cdot 10^0 = 0 + 0 + 1000 + 100 + 15 = 1115$$
(177) Folding the calculated matrix, we obtain a result similar to that obtained by the conventional HP:
(178)
(179) Matrix Interface
(180) The Matrix Data Bus (MDB), matrix memory (SAM) and matrix computing devices (such as VMM, MMM, MA, HP, etc.) cannot function unless they can be supplied with the necessary information from the outside. It is necessary to provide a fast method of transferring the original matrices into and within the Matrix Processing Unit (MPU) and of extracting the results of matrix calculations. One possible method for providing a fast interface for the MPU is a device built using matrices and light, like all other sub-units of the MPU. It is proposed to pair light source matrices, for example, based on OLED, Quantum Dots or LE-OFETs, with photodetector matrices, for example, based on photodiodes or LR-OFETs. As shown in
(181) To ensure two-way information transfer, both sides include both a radiating matrix and a light-receiving matrix, for example, a photodiode array, as shown in
(182) Central Controller
(183) Central Controller (CC) is a device that provides programmatic control of the IO, SAM, and all of the matrix conversion devices. Control is performed by a stream of instructions coming from an instructions data bus (Instr) separately from the External Data Bus (EDB), where matrix data to be processed is transmitted through EDB. Unlike other MPU devices, CC can be implemented on a digital serial architecture. To provide multi-thread management, the MPU CC can have a multi-core architecture. CC performs arithmetic and logical operations and has its own memory, registers, data bus, etc. CC manages the operation of the MPU, has access to SAM data, and is capable of processing this data. It is not recommended to use CC to process significant amounts of data, since this will lead to a significant decrease in the performance of the MPU. For direct access to SAM from CC, a local Matrix Register (MR) CC is required.
(184) In the preferred embodiment, the CC should execute the following instruction groups:
(185) 1 Work with IO
(186) 1.1 Read the matrix from the EDB and place it in the buffer (local MR IO)
(187) 1.2 Write the matrix from the buffer to the EDB
(188) 2 Work with matrix computing devices (such as MMM, MA, HP, etc.)
(189) 2.1 Read the matrix from the MDB and write it into the indicated MR of the selected matrix computing device
(190) 2.2 Read the matrix from the indicated MR of the selected computing device and write it in MDB
(191) 2.3 Perform the calculation on the selected computing device and write the result to the indicated MR
(192) 3 Work with SAM
(193) 3.1 Read the matrix from the buffer and write it to SAM at the specified index
(194) 3.2 Read the matrix from SAM at the specified index and write it to the buffer
(195) 3.3 Read the matrix from SAM at the specified index and write it in MDB
(196) 3.4 Read the matrix from the MDB and write it to SAM at the specified index
(197) 3.5 Read the matrix from SAM at the specified index and write it in MR CC
(198) 3.6 Read the matrix from MR CC and write it to SAM at the specified index
(199) 3.7 Read the value from MR CC at the specified address in the matrix
(200) 3.8 Write the value in MR CC to the specified address in the matrix
(201) 3.9 Use SAM as a matrix computing device
3.9.1 Calculate MA
3.9.1.1 Read the preliminary total matrix from SAM for the specified set of indices corresponding to the matrices to be summed, and write it to the MDB
3.9.1.2 Read the preliminary total matrix from the MDB and write it to the MR of the device, normalizing the preliminary total matrix to the standard positional coding
3.9.2 Compute the VMM
3.9.2.1 Read the preliminary vector obtained by multiplying the vector defined by a set of transverse SAM indices by the matrix recorded in the SAM layer at the specified layer index, and write the obtained preliminary vector to the MDB
3.9.2.2 Read the preliminary VMM result, in the form of a vector, from the MDB and write it to the MR of the device, normalizing the preliminary vector to the standard positional coding
3.9.3 Perform commands for other SAM computational operations
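The instruction groups listed above can be pictured with a small software model. The sketch below is illustrative only: the opcode names, the `ToyMPU` class, and the use of Python lists for the buses are assumptions made for this example, not part of the described hardware. It covers a subset of the groups (1.1, 1.2, 3.1 to 3.4 and the preliminary summation of 3.9.1.1) and omits the normalization to positional coding of step 3.9.1.2.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical opcodes for a subset of the CC instruction groups."""
    IO_READ = auto()     # 1.1   EDB -> buffer
    IO_WRITE = auto()    # 1.2   buffer -> EDB
    SAM_STORE = auto()   # 3.1   buffer -> SAM[index]
    SAM_LOAD = auto()    # 3.2   SAM[index] -> buffer
    SAM_TO_MDB = auto()  # 3.3   SAM[index] -> MDB
    MDB_TO_SAM = auto()  # 3.4   MDB -> SAM[index]
    SAM_MA = auto()      # 3.9.1.1 preliminary sum of SAM slices -> MDB


class ToyMPU:
    """Toy model: each bus or register holds one whole matrix at a time."""

    def __init__(self, edb_in):
        self.edb_in = list(edb_in)  # matrices waiting on the external data bus
        self.edb_out = []           # matrices written back to the external data bus
        self.buffer = None          # local MR of the IO block
        self.mdb = None             # matrix data bus
        self.sam = {}               # slice index -> matrix

    def execute(self, op, *args):
        if op is Op.IO_READ:
            self.buffer = self.edb_in.pop(0)
        elif op is Op.IO_WRITE:
            self.edb_out.append(self.buffer)
        elif op is Op.SAM_STORE:
            self.sam[args[0]] = self.buffer
        elif op is Op.SAM_LOAD:
            self.buffer = self.sam[args[0]]
        elif op is Op.SAM_TO_MDB:
            self.mdb = self.sam[args[0]]
        elif op is Op.MDB_TO_SAM:
            self.sam[args[0]] = self.mdb
        elif op is Op.SAM_MA:
            # Element-wise sum over the selected slices (no normalization here).
            slices = [self.sam[i] for i in args[0]]
            self.mdb = [[sum(cells) for cells in zip(*rows)]
                        for rows in zip(*slices)]
        else:
            raise ValueError(f"unsupported instruction: {op}")


def run(mpu, program):
    """Execute a list of (opcode, *operands) tuples in order."""
    for op, *args in program:
        mpu.execute(op, *args)


# Usage: load two matrices, sum them through SAM, and return the result.
mpu = ToyMPU([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
run(mpu, [
    (Op.IO_READ,), (Op.SAM_STORE, 0),
    (Op.IO_READ,), (Op.SAM_STORE, 1),
    (Op.SAM_MA, (0, 1)), (Op.MDB_TO_SAM, 2),
    (Op.SAM_LOAD, 2), (Op.IO_WRITE,),
])
print(mpu.edb_out)  # [[[6, 8], [10, 12]]]
```

In this model every instruction moves or combines a whole matrix in one step, which mirrors the slice-level access described for the SAM; a sequential loop stands in for what the hardware would do in parallel.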
(202) The transition from the arithmetic-logic concept of the processor to the matrix concept, and from electronic circuits to opto-electronic ones, makes it possible to radically increase the speed of calculations and the ability to handle their complexity, as well as to reduce power consumption and heating.
(203) In the preceding specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the claims that follow. The specification and drawings are accordingly to be regarded in an illustrative rather than a restrictive sense.
REFERENCES
(204) [1] https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
[2] http://besho.narod.ru/reviews/newage/EnLight256.pdf
[3] https://www.osapublishing.org/ol/ViewMedia.cfm?uri=ol-9-8-322&seq=0&guid=d6aaaf54-f305-fb9f-6c03-453f96d7ad0b
[4] https://www.semanticscholar.org/paper/A-Memristor-Crossbar-Based-Computing-Engine-Optimi-Liu-Yang/eb06412b3121f74c951741f389e99da5fd24bb57
[5] https://docs.google.com/presentation/d/1mV_wFWgIbNcvKfE-vwulv0SAag2Rt1C3Uyp9-zhIqaY/edit#slide=id.g35833395fc_0_0