Patent classifications
G06F7/50
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
A reservoir includes a common input layer; first and second output layers that output first and second readout values based on an input; a first partial reservoir including the input layer and the first output layer; and a second partial reservoir, between the input layer and the second output layer, having a size larger than the size of the first partial reservoir. The training processing includes: first, calculating a third output weight that reduces a difference between teaching data and a first product-sum value of a third readout value and a first output weight; and second, calculating a fourth output weight that reduces a difference between a second product-sum value of a fourth readout value and a second output weight and differential teaching data, the differential teaching data being a difference between a third product-sum value of the third readout value and the third output weight, and the teaching data.
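The two training steps amount to fitting the smaller partial reservoir's output weights first, then fitting the larger partial reservoir against the residual error (the "differential teaching data"). A minimal sketch of that two-step least-squares fit, assuming simple echo-state reservoirs; all dimensions and the `run_reservoir` helper are hypothetical, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the patent): a shared input
# drives a small and a large partial reservoir.
T, n_small, n_large = 200, 20, 80
u = rng.standard_normal((T, 1))                  # common input sequence
teaching = np.sin(np.linspace(0, 8 * np.pi, T))  # teaching data

def run_reservoir(u, n_units, seed):
    """Drive a simple echo-state reservoir and return its state sequence."""
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, (n_units, u.shape[1]))
    W = r.standard_normal((n_units, n_units))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9
    x = np.zeros(n_units)
    states = np.empty((u.shape[0], n_units))
    for t in range(u.shape[0]):
        x = np.tanh(W_in @ u[t] + W @ x)
        states[t] = x
    return states

X_small = run_reservoir(u, n_small, seed=1)  # first partial reservoir
X_large = run_reservoir(u, n_large, seed=2)  # second, larger partial reservoir

# Step 1: output weights that reduce the error between the small
# reservoir's product-sum output and the teaching data.
w_small, *_ = np.linalg.lstsq(X_small, teaching, rcond=None)

# Step 2: fit the larger reservoir to the *differential* teaching data,
# i.e. the residual the small reservoir leaves behind.
residual = teaching - X_small @ w_small
w_large, *_ = np.linalg.lstsq(X_large, residual, rcond=None)

combined = X_small @ w_small + X_large @ w_large
```

Because the second fit targets only what the first reservoir failed to explain, the combined readout cannot have a larger training error than the small reservoir alone.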
High Performance Systems And Methods For Modular Multiplication
A circuit system for performing modular reduction of a modular multiplication includes multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication. The multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products. Adder circuits add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.
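The folding step rests on the identity x mod p = (sum over i of c_i * (2^(i*w) mod p)) mod p, where the c_i are the w-bit coefficients of x. A software sketch of that reduction; the modulus and coefficient width are illustrative choices, not values from the patent:

```python
# Illustrative parameters (not from the patent); the folding identity
# works for any modulus and limb width.
P = 2**13 - 1   # example modulus
W = 8           # coefficient ("limb") width in bits

def mod_reduce(x, p=P, w=W):
    """Reduce x mod p by folding w-bit coefficients with precomputed remainders."""
    mask = (1 << w) - 1
    coeffs = []
    while x:                    # split into w-bit coefficients, low first
        coeffs.append(x & mask)
        x >>= w
    acc = 0
    for i, c in enumerate(coeffs):
        # Constant = remainder of 2^(i*w) divided by p. For low
        # coefficients the constant is just 2^(i*w) itself, so those
        # are effectively added in unchanged, like the "second subset".
        acc += c * pow(2, i * w, p)
    return acc % p              # acc is far narrower than the original x
```

In hardware the per-coefficient constants would be baked into small multipliers, and the final narrow reduction handled by adders rather than a `%` operation.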
HARDWARE ACCELERATOR FOR PERFORMING COMPUTATIONS OF DEEP NEURAL NETWORK AND ELECTRONIC DEVICE INCLUDING THE SAME
A hardware accelerator includes a processing core including a plurality of multipliers configured to perform one-dimensional (1D) sub-word parallelism between a sign and a mantissa of a first tensor and a sign and a mantissa of a second tensor, a first processing device configured to operate in a two-dimensional (2D) operation mode in which results of computation by the plurality of multipliers are output, and a second processing device configured to operate in a three-dimensional (3D) operation mode in which results of computation by the plurality of multipliers are accumulated in a channel direction and then a result of accumulating the results of computation is output.
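As a rough functional model, the two operation modes differ only in whether the multiplier outputs are passed through directly or first accumulated along the channel axis. A NumPy sketch with the sign/magnitude split made explicit; tensor shapes and values are hypothetical, and floating-point exponent handling is not modeled:

```python
import numpy as np

def sign_mag_multiply(a, b):
    """Multiply via separate sign and magnitude, as a sub-word unit might."""
    return np.sign(a) * np.sign(b) * (np.abs(a) * np.abs(b))

# Hypothetical integer tiles shaped (H, W, C); sizes are illustrative.
rng = np.random.default_rng(0)
a = rng.integers(-7, 8, size=(2, 2, 4))
b = rng.integers(-7, 8, size=(2, 2, 4))

products = sign_mag_multiply(a, b)   # what the multiplier array produces

out_2d = products               # 2D mode: per-element products output as-is
out_3d = products.sum(axis=-1)  # 3D mode: products accumulated along channels
```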
Pre-characterization mixed-signal design, placement, and routing using machine learning
Systems, methods, and devices are disclosed herein for developing a cell design. Operations of a plurality of electrical cells are simulated to collect a plurality of electrical parameters. A machine learning model is trained using the plurality of electrical parameters. The trained machine learning model receives data having cell layout design constraints. The trained machine learning model determines a cell layout for the received data based on the plurality of electrical parameters. The cell layout is provided for further characterization of electrical performance within the cell layout design constraints.
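The train-then-predict loop can be illustrated with a toy linear model standing in for the machine learning model: simulated electrical parameters on one side, a layout quantity on the other. All parameter names, shapes, and the synthetic data below are invented for illustration:

```python
import numpy as np

# Synthetic stand-in for simulation results: electrical parameters
# (e.g. drive, leakage, input capacitance -- names are invented) and a
# layout quantity (cell width) known for each simulated cell.
rng = np.random.default_rng(0)
params = rng.uniform(0.1, 1.0, size=(100, 3))
true_w = np.array([2.0, 0.5, 1.5])          # hidden relation, demo only
width = params @ true_w + 0.3

# "Train": fit a linear model mapping parameters to the layout quantity.
X = np.hstack([params, np.ones((100, 1))])  # add a bias column
coef, *_ = np.linalg.lstsq(X, width, rcond=None)

def predict_layout_width(electrical_params):
    """Apply the trained model to a new cell's parameters."""
    return float(np.append(electrical_params, 1.0) @ coef)
```

A production flow would use a richer model and real simulation data, but the shape of the workflow (simulate, collect parameters, fit, predict a layout) is the same.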
Multiplier and Adder in Systolic Array
The subject matter described herein provides systems and techniques for the design and use of multiply-and-accumulate (MAC) units that perform matrix multiplication in systolic arrays, such as those used in accelerators for deep neural networks (DNNs). These MAC units may take advantage of the particular way in which matrix multiplication is performed within a systolic array. For example, when a matrix A is multiplied with a matrix B, a scalar value a of the matrix A is reused many times, a scalar value b of the matrix B may be streamed into the systolic array and forwarded through a series of MAC units, and only the final values, not the intermediate values, of the dot products computed for the matrix multiplication may be correct. MAC unit hardware particularized to take advantage of these observations is described herein.
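The observations above describe an output-stationary dataflow: each accumulator holds one entry of the result, an `A` value is reused across a whole row of `B`, and an accumulator only holds a correct dot product after the final step. A minimal software model of that dataflow (a sketch of the schedule, not the patented hardware):

```python
import numpy as np

def systolic_matmul(A, B):
    """Output-stationary MAC model: A values held, B rows streamed through."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    # Each (i, j) accumulator holds only a partial dot product until the
    # final k step; only the last value written is the correct result.
    for k in range(K):        # one row of B streams in per cycle
        for i in range(M):    # A[i, k] is reused across all N columns
            C[i, :] += A[i, k] * B[k, :]
    return C
```

Since intermediate accumulator values are never read out, the hardware can defer normalization or rounding work to the final step, which is one way such MAC units can be simplified.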
ADAPTIVE MAC ARRAY SCHEDULING IN A CONVOLUTIONAL NEURAL NETWORK
The present invention relates to convolutional neural networks (CNNs) and methods for improving the computational efficiency of a multiply-accumulate (MAC) array structure. Specifically, the invention relates to cutting activation data into a number of tiles to increase overall computation efficiency. The invention discloses techniques to cut activation data into a plurality of tiles by using a 3-D convolution computation core and to support larger tensor sizes. Lastly, the invention provides adaptive scheduling of the MAC array to achieve high utilization in multi-precision neural network acceleration.
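Tile-wise processing can be illustrated with a pointwise (1x1) convolution, where each spatial tile is fed to the MAC array independently and the results are stitched back together. The tile size, tensor shapes, and the pointwise choice are illustrative; the patent's 3-D convolution core and multi-precision scheduling are not modeled here:

```python
import numpy as np

def tiled_pointwise_conv(act, w, tile=4):
    """Cut the activation into spatial tiles and run a 1x1 conv per tile."""
    H, W, Cin = act.shape
    Cout = w.shape[1]
    out = np.empty((H, W, Cout))
    for y in range(0, H, tile):
        for x in range(0, W, tile):
            t = act[y:y + tile, x:x + tile]      # one tile fed to the MAC array
            out[y:y + tile, x:x + tile] = t @ w  # channel-direction MACs
    return out
```

Because a 1x1 convolution has no spatial overlap between outputs, the tiles need no halo regions and the tiled result matches the whole-tensor computation exactly; larger kernels would require overlapping tile borders.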