Patent classifications
G06F7/533
PROCESSING ELEMENT, NEURAL PROCESSING DEVICE INCLUDING SAME, AND MULTIPLICATION OPERATION METHOD USING SAME
The present disclosure describes a processing element and a neural processing device that includes it. The processing element includes a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to generate result data by multiplying the weight and the input activation, using either a first multiplier of a first precision or both the first multiplier and a second multiplier of the first precision, in response to a calculation mode signal; and a saturating adder configured to generate a partial sum from the result data.
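The abstract does not give bit widths or say how the two multipliers are combined, so the following is only a minimal sketch under stated assumptions: the "first precision" is taken to be unsigned 4 bits, the wide mode is taken to split an 8-bit weight into two 4-bit halves, and the partial sum is taken to be 16 bits wide. The names `mul4`, `flexible_multiply`, and `saturating_add` are hypothetical.

```python
def mul4(a, b):
    """Unsigned 4-bit x 4-bit multiplier (assumed 'first precision')."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def flexible_multiply(weight, activation, wide_mode):
    """Multiply using one or two 4-bit multipliers, chosen by a mode flag.

    wide_mode=False: 4-bit weight, only the first multiplier is used.
    wide_mode=True:  8-bit weight (assumption), split into two 4-bit
                     halves that are fed to the first and second multiplier.
    """
    if not wide_mode:
        return mul4(weight, activation)
    assert 0 <= weight < 256
    low = mul4(weight & 0xF, activation)   # first multiplier
    high = mul4(weight >> 4, activation)   # second multiplier
    return (high << 4) + low               # recombine the two products


def saturating_add(partial_sum, value, bits=16):
    """Add and clamp to the largest unsigned value that fits in `bits`."""
    max_value = (1 << bits) - 1
    return min(partial_sum + value, max_value)


if __name__ == "__main__":
    product = flexible_multiply(0xAB, 7, wide_mode=True)
    print(product)                         # 0xAB * 7
    print(saturating_add(65530, product))  # clamps at 65535
```

Building the wide product from two narrow multipliers lets the same hardware serve both precisions; the saturating adder keeps an overflowing partial sum at the maximum value instead of wrapping around.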
MULTIPLY-ACCUMULATE "0" DATA GATING
In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
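As a software analogy only (the abstract describes hardware gating, and none of the names below come from it), zero gating can be sketched as a multiply-accumulate step that leaves the accumulator untouched whenever either operand is zero. The functions `gated_mac` and `gated_dot` are hypothetical.

```python
def gated_mac(acc, a, b):
    """Return (new_acc, gated): skip the multiply and the add if an operand is 0."""
    if a == 0 or b == 0:
        # The product would be 0, so the accumulator is unchanged.
        return acc, True
    return acc + a * b, False


def gated_dot(xs, ws):
    """Dot product that also counts how many MAC steps were gated."""
    acc = 0
    gated_count = 0
    for x, w in zip(xs, ws):
        acc, gated = gated_mac(acc, x, w)
        gated_count += gated
    return acc, gated_count


if __name__ == "__main__":
    print(gated_dot([1, 0, 3, 0], [4, 5, 0, 7]))
```

In hardware the benefit is reduced switching activity rather than fewer instructions, which this sketch does not model.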
APPARATUS AND METHOD USING NEURAL NETWORK
An apparatus includes a first holding unit and a second holding unit configured to hold first-type data and second-type data, respectively, a first operation unit configured to execute a first product-sum operation based on the first-type data, a branch unit configured to output an operation result of the first product-sum operation in parallel, a sampling unit configured to sample the operation result and to output a sampling result, and a second operation unit configured to execute a second product-sum operation based on the second-type data and the sampling result.
Calculation of a number of iterations
An arithmetic operation is performed in a data processing unit, including calculating the number of iterations needed to perform the operation with a given number of bits per iteration. The number of bits per iteration is a positive natural number. The number of consecutive digit positions holding a given digit in a sequence of bits represented in the data processing unit is counted. The length of the sequence is a multiple of the number of bits per iteration. The quotient of the number of consecutive digit positions divided by the number of bits per iteration is calculated, as well as the remainder of the division.
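The counting and division steps can be illustrated with a short sketch. It assumes the run of identical digits is counted from the most significant end of the sequence (as in a leading-zero count), which the abstract does not state; the names `count_leading` and `run_quotient_remainder` are hypothetical.

```python
def count_leading(value, width, digit=0):
    """Count consecutive bits equal to `digit`, starting at the top bit."""
    count = 0
    for pos in range(width - 1, -1, -1):
        if (value >> pos) & 1 != digit:
            break
        count += 1
    return count


def run_quotient_remainder(value, width, bits_per_iteration, digit=0):
    """Divide the run length by the number of bits handled per iteration."""
    if bits_per_iteration <= 0:
        raise ValueError("bits_per_iteration must be a positive natural number")
    if width % bits_per_iteration != 0:
        raise ValueError("width must be a multiple of bits_per_iteration")
    run = count_leading(value, width, digit)
    return divmod(run, bits_per_iteration)


if __name__ == "__main__":
    # 0x0053 is 0000 0000 0101 0011 in 16 bits: nine leading zeros.
    print(run_quotient_remainder(0x0053, 16, 4))
```

Under this reading, the quotient is the number of whole iterations covered by the run and the remainder is the number of leftover bit positions; how the two values feed into the final iteration count is not specified in the abstract.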
SYSTEM AND METHOD FOR LONG ADDITION AND LONG MULTIPLICATION IN ASSOCIATIVE MEMORY
A method for an associative memory device includes replacing a set of three multi-bit binary numbers P, Q and R, stored in the associative memory device, with two multi-bit binary numbers X and Y, also stored in the associative memory device, wherein a sum of the binary numbers P, Q and R is equal to a sum of the binary numbers X and Y. A system includes an associative memory array having rows and columns and a multi-bit multiplier. Each column of the array stores two multi-bit binary numbers to be multiplied. The multi-bit multiplier multiplies, in parallel, the two multi-bit binary numbers per column by concurrently processing all bits of partial products generated by the multiplier. The multiplier performs the processing without any carry propagation delay when adding all but the last two partial products.
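Replacing three numbers with two that have the same sum matches the standard carry-save (3:2 compressor) technique. The sketch below shows that technique on ordinary non-negative Python integers; it is not the associative-memory implementation, and the names `carry_save_3_to_2` and `multiply_carry_save` are hypothetical.

```python
def carry_save_3_to_2(p, q, r):
    """Reduce three non-negative integers to two with the same sum.

    x holds the per-bit sum without carries; y holds the per-bit carries
    shifted one position left. No carry ripples between bit positions.
    """
    x = p ^ q ^ r
    y = ((p & q) | (p & r) | (q & r)) << 1
    return x, y


def multiply_carry_save(a, b):
    """Multiply by reducing shifted partial products three at a time."""
    partials = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
    while len(partials) > 2:
        p, q, r = partials[:3]
        partials = list(carry_save_3_to_2(p, q, r)) + partials[3:]
    # Only this final addition needs ordinary carry propagation.
    return sum(partials)


if __name__ == "__main__":
    x, y = carry_save_3_to_2(5, 6, 7)
    print(x, y, x + y)
    print(multiply_carry_save(13, 11))
```

Because each reduction step is purely bitwise, every bit position can be processed at once, and carry propagation is deferred to the single addition of the last two numbers.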
Neural processing accelerator
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.