Patent classifications
G06F7/5312
Methods and electronic device for high performance modulo multiplication
Embodiments herein disclose high performance modulo multiplication methods performed by circuitry of an electronic device. The method includes obtaining and summing partial products to obtain a partial multiplication result using a primary Wallace tree. The partial multiplication result is fed back in a next cycle for subsequent limb multiplication associated with the primary Wallace tree. The obtaining and summing of partial products and feeding back operations are repeated until all limbs associated with the primary Wallace tree are completed. A residual computation of a partial multiplication result associated with a final limb of the primary Wallace tree is then performed, to obtain a multiplication result using a secondary Wallace tree, where the final limb stores the partial multiplication result of a last iteration.
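The limb-by-limb loop with feedback can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the claimed circuit: the names `split_limbs` and `limbwise_modmul`, the 16-bit limb width, and the most-significant-limb-first order are all hypothetical. Plain integer addition stands in for the primary Wallace tree, and a single `%` stands in for the residual computation of the secondary Wallace tree.

```python
def split_limbs(x, limb_bits, n_limbs):
    """Split a non-negative integer into n_limbs limbs, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(x >> (i * limb_bits)) & mask for i in range(n_limbs)]


def limbwise_modmul(a, b, modulus, limb_bits=16, n_limbs=4):
    """Behavioral model of (a * b) mod modulus, one limb of b per cycle."""
    if a < 0 or b < 0 or b >> (limb_bits * n_limbs):
        raise ValueError("operands must be non-negative and b must fit in the limbs")
    acc = 0  # partial multiplication result fed back each cycle
    # Assumption: limbs are consumed most significant first, so the
    # fed-back result is shifted left by one limb width per cycle.
    for limb in reversed(split_limbs(b, limb_bits, n_limbs)):
        partial = a * limb                   # partial products for this limb, summed
        acc = (acc << limb_bits) + partial   # feedback into the next cycle
    # Residual computation on the result of the last iteration
    # (a secondary Wallace tree in the abstract; a plain modulo here).
    return acc % modulus
```

For example, `limbwise_modmul(7, 9, 5)` walks through four 16-bit limbs of 9 and returns the same value as `(7 * 9) % 5`.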
Multiply-and-accumulate unit in carry-save adder format and application in a feedback loop equalizer
A multiply and accumulation (MAC) unit for multiplying a provided first and a provided second multiplicand and for adding a provided summand to the resulting product is described. The MAC includes at least one multiplication block which is configured to multiply a first input signal and a second input signal, wherein the first input signal is given in a carry-save adder format and the second input signal is given in a binary format, wherein the multiplication result is provided in a carry-save format, and a carry-save adder which is configured to add to the result of the multiplication the provided summand.
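A bit-level sketch may help show how a product can stay in carry-save format. It assumes non-negative operands and represents a carry-save value as a pair of words whose ordinary sum is the represented number. The function names (`csa`, `mul_cs_by_binary`, `mac_cs`) and the shift-and-add structure are illustrative assumptions, not the structure claimed in the abstract.

```python
def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z                              # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1     # majority bits, shifted to carry weight
    return s, c


def mul_cs_by_binary(a_sum, a_carry, b, b_bits):
    """Multiply the carry-save value (a_sum + a_carry) by binary b.

    The result is returned in carry-save format; no carry-propagating
    addition is performed anywhere in the loop.
    """
    s, c = 0, 0
    for i in range(b_bits):
        if (b >> i) & 1:
            # Each set bit of b contributes both words of the
            # carry-save operand, shifted to that bit's weight.
            s, c = csa(s, c, a_sum << i)
            s, c = csa(s, c, a_carry << i)
    return s, c


def mac_cs(a_sum, a_carry, b, summand, b_bits=8):
    """Multiply, then add the summand with one more carry-save adder."""
    s, c = mul_cs_by_binary(a_sum, a_carry, b, b_bits)
    return csa(s, c, summand)
```

The returned pair only needs to be resolved with a conventional adder when a binary result is actually required, which is what makes the format attractive inside a feedback loop.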
Multiplier and adder in systolic array
The subject matter described herein provides systems and techniques for the design and use of multiply-and-accumulate (MAC) units to perform matrix multiplication by systolic arrays, such as those used in accelerators for deep neural networks (DNNs). These MAC units may take advantage of the particular way in which matrix multiplication is performed within a systolic array. For example, when a matrix A is multiplied with a matrix B, the scalar value, a, of the matrix A is reused many times, the scalar value, b, of the matrix B may be streamed into the systolic array and forwarded to a series of MAC units in the systolic array, and only the final values, and not the intermediate values, of the dot products computed for the matrix multiplication may need to be correct. MAC unit hardware that is particularized to take advantage of these observations is described herein.
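The reuse pattern described above can be pictured with a purely functional model. It is a sketch under assumptions: the class `MacUnit`, the function `systolic_matmul`, and the arrangement with one unit per element of A are hypothetical, and no cycle-level timing or skewing of a real systolic array is modeled.

```python
class MacUnit:
    """One multiply-and-accumulate cell holding a stationary scalar from A."""

    def __init__(self, a):
        self.a = a  # reused for every value of b streamed through this cell

    def step(self, b, partial_in):
        # Add this cell's product to the partial dot product from upstream.
        return partial_in + self.a * b


def systolic_matmul(A, B):
    """Compute C = A x B with one MacUnit per element of A (functional model)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions of A and B must match")
    grid = [[MacUnit(a) for a in row] for row in A]
    n_cols = len(B[0])
    C = [[0] * n_cols for _ in A]
    for j in range(n_cols):                 # stream column j of B into the array
        for i, row_units in enumerate(grid):
            partial = 0
            for t, unit in enumerate(row_units):
                # Intermediate partial sums are forwarded from cell to cell;
                # only the value leaving the last cell is kept as C[i][j].
                partial = unit.step(B[t][j], partial)
            C[i][j] = partial
    return C
```

In this model each stored `a` is used once per streamed column of B, and the values passed between cells are never read out, which mirrors the observation that only the final dot products matter.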
Combined adder and pre-adder for high-radix multiplier circuit
Circuitry accepting a first input value and a second input value, and outputting (a) a first sum involving the first input value and the second input value, and (b) a second sum involving the first input value and the second input value, includes a first adder circuit, a second adder circuit, a compressor circuit and a preprocessing stage. The first input value and the second input value are input to the first adder circuit to provide the first sum. The first input value and the second input value are input to the preprocessing stage to provide inputs to the compressor circuit, which provides first and second compressed output signals which in turn are input to the second adder circuit to provide the second sum. The preprocessing stage may include circuitry to programmably zero the first input value, so that the first sum is programmably settable to the second input value.
Arithmetic circuit and arithmetic method
According to one embodiment, an arithmetic circuit includes an arithmetic unit, a rounding preprocessor, a register, and a rounding postprocessor. The arithmetic unit performs an arithmetic operation including addition and multiplication to generate a first value of (n+m) bits. The rounding preprocessor performs an OR operation on lower (m-k) bits of the first value to generate a second value of 1 bit. The register stores a third value of (n+k+1) bits obtained by concatenating upper (n+k) bits of the first value and the second value. The rounding postprocessor calculates a carry bit value of 1 bit from a most significant bit of the third value and lower (k+1) bits of the third value, and adds the carry bit value to upper n bits of the third value.
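The two-stage rounding flow can be traced in a short sketch. The bit widths follow the abstract, taking the preprocessor to OR the lower (m - k) bits, that is, the bits not kept in the register. The rule for deriving the carry bit is not given in the abstract, so the code assumes a two's complement first value and round-to-nearest with ties away from zero. That rule, the function name `round_fixed`, and the requirement 1 <= k <= m are assumptions made for illustration only.

```python
def round_fixed(first, n, m, k):
    """Round an (n+m)-bit two's complement pattern to n bits.

    first is the raw bit pattern, with m fractional bits.
    Assumed rounding rule: nearest, ties away from zero.
    """
    if not (1 <= k <= m):
        raise ValueError("this sketch assumes 1 <= k <= m")
    first &= (1 << (n + m)) - 1

    # Rounding preprocessor: OR of the lower (m - k) bits (a sticky bit).
    second = 1 if first & ((1 << (m - k)) - 1) else 0

    # Register: upper (n + k) bits concatenated with the sticky bit.
    third = ((first >> (m - k)) << 1) | second      # (n + k + 1) bits

    # Rounding postprocessor.
    sign = (third >> (n + k)) & 1                   # most significant bit
    low = third & ((1 << (k + 1)) - 1)              # lower (k + 1) bits
    round_bit = (low >> k) & 1                      # weight one half
    rest = low & ((1 << k) - 1)                     # everything below one half
    if sign == 0:
        carry = round_bit                           # a positive tie rounds up
    else:
        carry = 1 if (round_bit and rest) else 0    # a negative tie stays at the floor
    upper_n = third >> (k + 1)
    return (upper_n + carry) & ((1 << n) - 1)       # wraps on overflow
```

With n = 8, m = 8, k = 2, the pattern for 2.5 (`640`) rounds to 3, and the pattern for -2.5 (`64896`) rounds to `253`, which is -3 in 8-bit two's complement. The sticky bit lets the register drop m - k bits without losing the information needed to tell an exact tie from a value slightly past it.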
Compressed Wallace trees in FMA circuits
An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.