Patent classifications
G06F7/501
ARTIFICIAL INTELLIGENCE ACCELERATORS
An artificial intelligence (AI) accelerator includes memory circuits configured to output weight data and vector data; a multiplication circuit/adder tree that performs a multiplying/adding calculation on the weight data and the vector data to generate multiplication/addition result data; a first accumulator, synchronized with an odd clock signal, that performs an accumulative adding calculation on odd-numbered multiplication/addition result data and first latched data; and a second accumulator, synchronized with an even clock signal, that performs an accumulative adding calculation on even-numbered multiplication/addition result data and second latched data.
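A minimal software analogue of the dual-accumulator scheme above, assuming 1-based numbering of the adder-tree results (the parity split and the final recombination step are illustrative assumptions, not the patent's circuit):

```python
def dual_accumulate(results):
    # Model of the two accumulators: odd-numbered results (1st, 3rd, ...)
    # go to the accumulator driven by the odd clock, even-numbered results
    # to the accumulator driven by the even clock. Each accumulator thus
    # has two cycles per add, halving the rate each circuit must sustain.
    acc_odd, acc_even = 0, 0
    for n, r in enumerate(results, start=1):
        if n % 2 == 1:
            acc_odd += r
        else:
            acc_even += r
    return acc_odd + acc_even  # combined total equals the full accumulation
```

Recombining the two partial sums reproduces the same dot-product total a single accumulator would produce, which is why the split is transparent to the rest of the datapath.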
Method and apparatus for performing convolution operation on folded feature data
Disclosed are a method and an apparatus for performing convolution operation on folded feature data. The method comprises: reading the folded feature data provided to a convolution layer and an original convolution kernel from a dynamic random access memory (DRAM); pre-processing the folded feature data and the original convolution kernel; storing the pre-processed folded feature data into a static random-access memory (SRAM); folding the pre-processed original convolution kernel in at least one dimension of width or height according to a folding manner of the folded feature data to generate one or more folded convolution kernels corresponding to the original convolution kernel; storing the one or more folded convolution kernels in the SRAM; and reading the pre-processed folded feature data and the one or more folded convolution kernels from the SRAM into a calculation unit for convolving the pre-processed folded feature data with the one or more folded convolution kernels.
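As a rough illustration of the folding idea (a sketch under assumed conventions; `fold_width` and the nested-list (H, W, C) layout are hypothetical, not the patent's representation), folding by a factor s packs s adjacent columns of the feature map into the channel dimension, so each SRAM word carries more useful data per convolution step:

```python
def fold_width(x, s=2):
    # Fold an (H, W, C) feature map along width by factor s, producing an
    # (H, W // s, C * s) layout: groups of s adjacent pixels are merged
    # into one "pixel" with concatenated channel vectors. The convolution
    # kernel must be folded in the same manner to stay consistent.
    h, w = len(x), len(x[0])
    assert w % s == 0, "width must be divisible by the folding factor"
    return [[sum((row[j * s + k] for k in range(s)), [])  # concat channels
             for j in range(w // s)]
            for row in x]
```

The same reshaping applied to the original kernel is what yields the one or more folded kernels the abstract describes.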
Method for using and forming low power ferroelectric based majority logic gate adder
An adder uses first and second majority gates. For a 1-bit adder, output from a 3-input majority gate is inverted and input two times to a 5-input majority gate. Other inputs to the 5-input majority gate are the same as those of the 3-input majority gate. The output of the 5-input majority gate is the sum, while the output of the 3-input majority gate is the carry. Multiple 1-bit adders are concatenated to form an N-bit adder. The input signals to the majority gates can be analog, digital, or a combination of them, and are driven to first terminals of non-ferroelectric capacitors. The second terminals of the non-ferroelectric capacitors are coupled to form a majority node. The majority function of the input signals occurs on this node. The majority node is then coupled to a first terminal of a non-linear polar capacitor. The second terminal of that capacitor provides the output of the logic gate.
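The 1-bit adder logic described above can be checked symbolically (a behavioral sketch only; the patent's gates are capacitive analog circuits, not Boolean functions):

```python
def maj(*bits):
    # Majority vote over an odd number of 0/1 inputs.
    return int(sum(bits) > len(bits) // 2)

def full_adder(a, b, cin):
    # Carry is the 3-input majority of (a, b, cin); the inverted carry is
    # fed twice into the 5-input gate alongside the same three inputs,
    # and that gate's output is the sum.
    carry = maj(a, b, cin)
    not_carry = 1 - carry
    s = maj(a, b, cin, not_carry, not_carry)
    return s, carry
```

Rippling the carry through concatenated copies of `full_adder` gives the N-bit adder the abstract mentions.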
MICROPROCESSOR EQUIPPED WITH AN ARITHMETIC AND LOGIC UNIT AND WITH A HARDWARE SECURITY MODULE
This microprocessor is configured to compute a code C_i, used to detect an execution fault, using the relationship C_i = P ∘ F_α(D_i), where: F_α(D_i) = E_0 ∘ … ∘ E_q ∘ … ∘ E_{NbE−1}(D_i), E_q(x) = T_{α_{m,q}} ∘ … ∘ T_{α_{j,q}} ∘ … ∘ T_{α_{1,q}} ∘ T_{α_{0,q}}(x), and T_{α_{j,q}} is a conditional transposition, configured by a secret parameter α_{j,q}, that permutes two blocks of bits B_{2j+1,q} and B_{2j,q} of the variable x only when the parameter α_{j,q} is equal to a first value. The blocks B_{2j+1,q} and B_{2j,q} of all of the transpositions T_{α_{j,q}} of the stage E_q are distinct and non-overlapping, and each pair B_{2j+1,q}, B_{2j,q} lies within one and the same larger block permuted by a transposition of the higher stage E_{q+1}.
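One stage E_q of the permutation network can be modeled as follows (a behavioral sketch; representing `blocks` as a list of bit-block values and `alpha` as the stage's secret parameters is an illustrative assumption):

```python
def stage(blocks, alpha):
    # E_q: each conditional transposition T_{alpha_{j,q}} swaps the
    # non-overlapping blocks B_{2j,q} and B_{2j+1,q} only when its secret
    # parameter alpha[j] equals the "first value" (here taken to be 1).
    out = list(blocks)
    for j, secret in enumerate(alpha):
        if secret == 1:
            out[2 * j], out[2 * j + 1] = out[2 * j + 1], out[2 * j]
    return out
```

Composing such stages, with each stage permuting the larger blocks that contain the previous stage's pairs, yields F_α, and applying P to its output gives the code C_i.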
Systems and methods for energy-efficient analog matrix multiplication for machine learning processes
A novel energy-efficient multiplication circuit using analog multipliers and adders reduces the distance data has to move, and the number of times it has to be moved, when performing matrix multiplications in the analog domain. The multiplication circuit is tailored to bitwise multiply the innermost product of a rearranged matrix formula to generate a matrix multiplication result in the form of a current that is then digitized for further processing.
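The bitwise innermost product can be sketched digitally (a hedged model: in the circuit the 1-bit partial products are summed as analog currents before digitization, and the operand width `bits` is an assumed parameter, not from the abstract):

```python
def bitwise_dot(a, b, bits=4):
    # Decompose one operand vector into bit planes so the innermost
    # products are 1-bit multiplies (realizable as simple gates or
    # current switches), sum each plane's partial products (the analog
    # current summation), then recombine the planes with binary weights.
    total = 0
    for p in range(bits):
        plane = [(x >> p) & 1 for x in a]            # bit plane p of a
        partial = sum(pa * y for pa, y in zip(plane, b))  # current sum
        total += partial << p                         # binary weighting
    return total
```

The recombined total matches the ordinary dot product, which is what lets the rearranged formula trade multi-bit multipliers for many cheap 1-bit ones.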
ERROR CALIBRATION APPARATUS AND METHOD
An error calibration apparatus and method are provided. The method is adapted for calibrating a machine learning (ML) accelerator that performs computation using an analog circuit. An error between an output value of one or more computing layers of a neural network and a corresponding corrected value is determined; the computation of these layers is performed by the analog circuit. A calibration node is generated according to the error and located at the next layer after the computing layers. The calibration node, implemented by a digital circuit, is used to minimize the error. Accordingly, error and distortion of the analog circuit can be reduced.
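A minimal sketch of such a calibration node, assuming an affine (gain/offset) digital correction fitted by least squares (the function names and the affine form are illustrative assumptions, not the patent's exact scheme):

```python
def fit_calibration(analog_out, target):
    # Least-squares fit of a gain and offset that map the analog layer's
    # outputs toward the corrected values, minimizing the residual error.
    n = len(analog_out)
    mx = sum(analog_out) / n
    my = sum(target) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(analog_out, target))
    var = sum((x - mx) ** 2 for x in analog_out)
    gain = cov / var
    offset = my - gain * mx
    return gain, offset

def calibrate(analog_out, gain, offset):
    # The calibration node's forward pass: a cheap digital correction
    # applied at the layer following the analog computing layers.
    return [gain * x + offset for x in analog_out]
```

Because the correction runs in the digital domain, it can compensate systematic gain and offset drift of the analog circuit without touching the analog hardware itself.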