Patent classifications
G06F7/533
GENERALIZED NEAR OPTIMAL PACKET ENCODING
A communication system includes: a transmitter including: an arithmetic decoder configured to generate an output symbol based on input bits and a symbol frequency table that sets frequencies of excluded symbols to 0 and frequencies of allowed symbols to non-zero values, the transmitter being configured to iteratively generate a sequence of restricted packets and an ending state, the sequence of restricted packets excluding instances of the one or more excluded symbols and to transmit the sequence of restricted packets and the ending state on a channel; and a receiver including: an arithmetic encoder configured to compute an output state based on an input state, an input symbol, and the symbol frequency table, the receiver being configured to: supply an ending state received from the channel and the restricted packets to the arithmetic encoder to iteratively generate a final state, and recover a bit sequence from the final state.
Multiplier and Adder in Systolic Array
The subject matter described herein provides systems and techniques for the design and use of multiply-and-accumulate (MAC) units to perform matrix multiplication by systolic arrays, such as those used in accelerators for deep neural networks (DNNs). These MAC units may take advantage of the particular way in which matrix multiplication is performed within a systolic array. For example, when a matrix A is multiplied with a matrix B, the scalar value, a, of the matrix A is reused many times, the scalar value, b, of the matrix B may be streamed into the systolic array and forwarded to a series of MAC units in the systolic array, and only the final values and not the intermediate values of the dot products, computed for the matrix multiplication, may be correct. MAC unit hardware that is particularized to take advantage of these observations is described herein.
Multiplier and Adder in Systolic Array
The subject matter described herein provides systems and techniques for the design and use of multiply-and-accumulate (MAC) units to perform matrix multiplication by systolic arrays, such as those used in accelerators for deep neural networks (DNNs). These MAC units may take advantage of the particular way in which matrix multiplication is performed within a systolic array. For example, when a matrix A is multiplied with a matrix B, the scalar value, a, of the matrix A is reused many times, the scalar value, b, of the matrix B may be streamed into the systolic array and forwarded to a series of MAC units in the systolic array, and only the final values and not the intermediate values of the dot products, computed for the matrix multiplication, may be correct. MAC unit hardware that is particularized to take advantage of these observations is described herein.
Processing apparatus and processing method
The present disclosure provides a computation device and method. The device may include an input module configured to acquire input data; a model generation module configured to construct an offline model according to an input network structure and weight data; a neural network operation module configured to generate a computation instruction based on the offline model and cache the computation instruction, and compute the data to be processed based on the computation instruction to obtain a computation result; and an output module configured to output a computation result. The device and method may avoid the overhead caused by running an entire software architecture, which is a problem in a traditional method.
PROCESSING ELEMENT, NEURAL PROCESSING DEVICE INCLUDING SAME, AND MULTIPLICATION OPERATION METHOD USING SAME
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activation by using a first multiplier of a first precision or using both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal and a saturating adder configured to generate a partial sum by using the result data.
PROCESSING ELEMENT, NEURAL PROCESSING DEVICE INCLUDING SAME, AND MULTIPLICATION OPERATION METHOD USING SAME
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store input activation, a flexible multiplier configured to generate result data by performing a multiplication operation of the weight and the input activation by using a first multiplier of a first precision or using both the first multiplier and a second multiplier of the first precision in response to a calculation mode signal and a saturating adder configured to generate a partial sum by using the result data.
Processing apparatus and processing method
The present disclosure relates to a processing device including a memory configured to store data to be computed; a computational circuit configured to compute the data to be computed, which includes performing acceleration computations on the data to be computed by using an adder circuit and a multiplier circuit; and a control circuit configured to control the memory and the computational circuit, which includes performing acceleration computations according to the data to be computed. The present disclosure may have high flexibility, good configurability, fast computational speed, low power consumption, and other features.
COMPUTING APPARATUS AND METHOD FOR VECTOR INNER PRODUCT, AND INTEGRATED CIRCUIT CHIP
The present disclosure relates to a computing apparatus, a method and an integrated circuit chip for a vector inner product, where the computing apparatus may be included in a combined processing apparatus. The combined processing apparatus may further include a general interconnection interface and other processing apparatus. The computing apparatus interacts with other processing apparatus to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage apparatus, where the storage apparatus is respectively connected to the computing apparatus and other processing apparatus, and the storage apparatus is used for storing data of the computing apparatus and other processing apparatus.
COMPUTING APPARATUS AND METHOD FOR VECTOR INNER PRODUCT, AND INTEGRATED CIRCUIT CHIP
The present disclosure relates to a computing apparatus, a method and an integrated circuit chip for a vector inner product, where the computing apparatus may be included in a combined processing apparatus. The combined processing apparatus may further include a general interconnection interface and other processing apparatus. The computing apparatus interacts with other processing apparatus to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage apparatus, where the storage apparatus is respectively connected to the computing apparatus and other processing apparatus, and the storage apparatus is used for storing data of the computing apparatus and other processing apparatus.
Fast digital multiply-accumulate (MAC) by fast digital multiplication circuit
Certain aspects provide methods and apparatus for multiplication of digital signals. In accordance with certain aspects, a multiplication circuit may be used to multiply a portion of a first digital input signal with a portion of a second digital input signal via a first multiplier circuit to generate a first multiplication signal, and multiply another portion of the first digital input signal with another portion of the second digital input signal via a second multiplier circuit to generate a second multiplication signal. A third multiplier circuit and multiple adder circuits may be used to generate an output of the multiplication circuit based on the first and second multiplication signals.