G06F7/5095

LOSSY ARITHMETIC
20180006817 · 2018-01-04 ·

Embodiments include a method of adding first and second binary numbers having C bits and divided into D words to provide a third binary number in E successive adding operations, C, D and E being plural positive integers, the method comprising: a first group of D adding operations adding together respective words of the first and second binary numbers to provide D sum and carry outputs ranging from a least significant to a most significant sum and carry output; one or more subsequent groups of adding operations adding together sum and carry outputs from an immediately preceding group of adding operations, a final group of the one or more subsequent groups resulting in the third binary number consisting of the sum outputs from the final group and a carry from the most significant carry output of the final group, wherein E is less than D.
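
The word-wise addition described above can be sketched in software. This is a minimal illustration only, not the circuit the application claims: the function name `lossy_add`, its parameter names, and the choice to drop any carry still pending after the last group are assumptions made for the example.

```python
def lossy_add(a, b, c_bits=16, d_words=4, e_steps=2):
    """Add two c_bits-wide numbers split into d_words words using e_steps
    groups of word-wide additions (hypothetical sketch)."""
    w = c_bits // d_words              # bits per word
    mask = (1 << w) - 1

    # First group: D independent word additions.
    sums, carries = [], []
    for i in range(d_words):
        t = ((a >> (i * w)) & mask) + ((b >> (i * w)) & mask)
        sums.append(t & mask)
        carries.append(t >> w)
    carry_out = carries[-1]            # carry of the most significant word

    # Subsequent groups: each word absorbs the carry of the word below it.
    for _ in range(e_steps - 1):
        new_sums, new_carries = [sums[0]], [0]
        for i in range(1, d_words):
            t = sums[i] + carries[i - 1]
            new_sums.append(t & mask)
            new_carries.append(t >> w)
        carry_out |= new_carries[-1]
        sums, carries = new_sums, new_carries

    # Carries still pending below the top word are dropped (assumed lossy step).
    result = 0
    for i, s in enumerate(sums):
        result |= s << (i * w)
    return result | (carry_out << c_bits)
```

With four 4-bit words, `lossy_add(0x00FF, 0x0001, e_steps=2)` loses the carry that would need a third step to reach the third word, while `e_steps=3` recovers the exact sum `0x100`.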

Optimization of neural networks using hardware calculation efficiency and adjustment factors
11243743 · 2022-02-08 ·

In one embodiment, a method includes receiving a request for an operation to be performed; determining that the operation is associated with a machine-learning algorithm and, in response, routing the operation to a computing circuit; and performing the operation at the computing circuit, including: determining a linear-domain product of a first log-domain number and a second log-domain number associated with the operation based on a summation of the first log-domain number and the second log-domain number, and outputting a third log-domain number approximating that linear-domain product; converting the third log-domain number to a first linear-domain number; and summing the first linear-domain number and a second linear-domain number associated with the operation, and outputting a third linear-domain number as the summed result.
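
A rough software analogue of the log-domain multiply followed by a linear-domain add is shown below. The sign-plus-log2 encoding, the `frac_bits` rounding, and the names `to_log` and `log_mac` are assumptions for illustration; the abstract does not specify the number format.

```python
import math

def to_log(x, frac_bits=8):
    """Encode nonzero x as (sign, log2|x|), with the log rounded to
    frac_bits fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return (1 if x > 0 else -1, round(math.log2(abs(x)) * scale) / scale)

def log_mac(acc, a_log, b_log):
    """Multiply two log-domain numbers by adding their logs, convert the
    result to the linear domain, and add it to a linear accumulator."""
    sign = a_log[0] * b_log[0]
    log_product = a_log[1] + b_log[1]      # third log-domain number
    linear = sign * 2.0 ** log_product     # first linear-domain number
    return acc + linear                    # third linear-domain number
```

Because the logarithms are rounded, the product is only approximate in general: with `frac_bits=4`, multiplying 3 by 3 this way gives roughly 8.72 rather than 9.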

SYSTOLIC ARRAY CELLS WITH OUTPUT POST-PROCESSING
20220156344 · 2022-05-19 ·

This specification relates to systolic arrays of hardware processing units. In one aspect, a matrix multiplication unit includes multiple cells arranged in a systolic array. Each cell includes multiplication circuitry configured to determine a product of elements of input matrices. Each cell includes an accumulator configured to determine an accumulated value by accumulating a sum of the products output by the multiplication circuitry. Each cell also includes a post-processing component configured to determine a post-processed value by performing one or more post-processing operations on the accumulated value.
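
The cell structure described above (multiplier, accumulator, post-processing component) might be modeled as follows. This sketch ignores the skewed data flow of a real systolic array and simply feeds each cell its operand pairs in order; the class `Cell`, the function `systolic_matmul`, and the use of ReLU as the post-processing operation are assumptions.

```python
class Cell:
    """One cell: multiply, accumulate, then post-process (hypothetical model)."""
    def __init__(self, post_process):
        self.acc = 0
        self.post_process = post_process

    def step(self, a, b):
        self.acc += a * b                  # multiplication circuitry + accumulator

    def output(self):
        return self.post_process(self.acc)  # post-processing component

def relu(x):
    return max(0, x)

def systolic_matmul(A, B, post_process=relu):
    rows, inner, cols = len(A), len(B), len(B[0])
    cells = [[Cell(post_process) for _ in range(cols)] for _ in range(rows)]
    for k in range(inner):                 # one operand pair per cell per step
        for i in range(rows):
            for j in range(cols):
                cells[i][j].step(A[i][k], B[k][j])
    return [[cell.output() for cell in row] for row in cells]
```

Here each cell `(i, j)` holds one element of the output matrix, so the post-processing runs locally on the accumulated value without a separate pass over the result.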

LSTM circuit with selective input computation

An apparatus is described. The apparatus includes a long short-term memory (LSTM) circuit having a multiply-accumulate (MAC) circuit. The MAC circuit has circuitry to rely on a stored product term, rather than explicitly performing a multiplication operation to determine the product term, if an accumulation of differences between consecutive, preceding input values has not reached a threshold.
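
One way to picture the reuse of a stored product term is sketched below. The class name `SelectiveProduct`, the use of the absolute value of the summed signed differences, and the reset of the running sum after each real multiplication are assumptions; the abstract leaves these details open.

```python
class SelectiveProduct:
    """Reuse a stored weight*input product until the inputs have drifted
    far enough (hypothetical sketch)."""
    def __init__(self, weight, threshold):
        self.weight = weight
        self.threshold = threshold
        self.prev_input = None
        self.drift = 0.0        # accumulated differences between consecutive inputs
        self.stored = None      # stored product term
        self.multiplies = 0     # number of real multiplications performed

    def product(self, x):
        if self.stored is not None:
            self.drift += x - self.prev_input
        self.prev_input = x
        if self.stored is None or abs(self.drift) >= self.threshold:
            self.stored = self.weight * x   # explicit multiplication
            self.multiplies += 1
            self.drift = 0.0
        return self.stored
```

For a weight of 2.0, a threshold of 0.5, and the inputs 1.0, 1.1, 1.2, 1.7, only the first and last inputs trigger a multiplication; the two in between return the stored product.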

XIU-ACCUMULATING REGISTER, XIU-ACCUMULATING REGISTER CIRCUIT, AND ELECTRONIC DEVICE
20210224035 · 2021-07-22 ·

The present disclosure relates to an XIU-accumulating register, an XIU-accumulating register circuit, and an electronic device. The XIU-accumulating register includes a first accumulating unit and a second accumulating unit. The first accumulating unit includes a first adder and a first register; the first adder is configured to accumulate fractional bit data of an accumulated variable, and the first register is configured to store an accumulated result of the fractional bit data and carry bit data of the accumulated result of the fractional bit data. The second accumulating unit includes a second adder and a second register; the second adder is configured to accumulate integer bit data of the accumulated variable, and the second register is configured to store an accumulated result of the integer bit data.
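
The split between a fractional accumulator and an integer accumulator can be illustrated with a small fixed-point model. It assumes that the carry from the fractional sum feeds the integer adder in the same step and that both registers wrap at their bit width; the class name `SplitAccumulator` and its parameters are hypothetical.

```python
class SplitAccumulator:
    """Accumulate a fixed-point value in separate fractional and integer
    registers (hypothetical sketch)."""
    def __init__(self, frac_bits=8, int_bits=8):
        self.frac_bits = frac_bits
        self.frac_mask = (1 << frac_bits) - 1
        self.int_mask = (1 << int_bits) - 1
        self.frac = 0           # first register: fractional result
        self.integer = 0        # second register: integer result

    def step(self, int_part, frac_part):
        total = self.frac + frac_part               # first adder
        self.frac = total & self.frac_mask
        carry = total >> self.frac_bits             # carry bit of the fractional sum
        self.integer = (self.integer + int_part + carry) & self.int_mask  # second adder
        return self.integer, self.frac
```

With 4 fractional bits, accumulating the value 1.5 (integer part 1, fractional part `0b1000`) twice yields integer 3 and fraction 0, since the second fractional addition overflows into the integer register.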

Accelerated quantized multiply-and-add operations

Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural networks. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
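
The three stages above (subtract the zero value, sum the integer products, rescale) correspond to a common asymmetric-quantization scheme, sketched here. The 8-bit range, the function names `quantize` and `quantized_dot`, and the use of the product of the two input scales as the scaling factor are assumptions for illustration.

```python
def quantize(values, scale, zero_point):
    """Asymmetric quantization to unsigned 8-bit integers (assumed first format)."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]

def quantized_dot(qa, zero_a, scale_a, qb, zero_b, scale_b):
    # First circuit: subtract the value that represents zero.
    da = [q - zero_a for q in qa]
    db = [q - zero_b for q in qb]
    # Second circuit: integer sum of products of the differences.
    acc = sum(x * y for x, y in zip(da, db))
    # Third circuit: scale back to the real-valued (second) format.
    return acc * (scale_a * scale_b)
```

For example, quantizing `[0.5, -1.0, 1.5]` with scale 0.5 and zero point 128, and `[2.0, 0.25, 0.5]` with scale 0.25 and zero point 10, gives an integer sum of products of 12, which rescales to the real dot product 1.5.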

ACCUMULATOR HARDWARE
20230409287 · 2023-12-21 ·

Accumulator hardware logic includes first and second addition logic units and a store. The first addition logic unit comprises a first input, a second input and an output, each of the first and second inputs arranged to receive an input value in each clock cycle. The second addition logic unit comprises a first input that is connected directly to the output of the first addition logic unit. It also comprises a second input and an output. The store is arranged to store a result output by the second addition logic unit. The accumulator hardware logic further comprises shifting hardware and/or negation hardware positioned in a feedback path between the store and the second input of the second addition logic unit. The shifting hardware is configured to perform a shift by a fixed number of bit positions in a fixed direction.
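
A behavioral model of the two-adder accumulator with shifting or negation in the feedback path might look like this. The right-shift direction, the order of shift and negation, and the name `ShiftFeedbackAccumulator` are assumptions; the abstract only states that the shift has a fixed amount and direction.

```python
class ShiftFeedbackAccumulator:
    """Two adders and a store, with a fixed shift and optional negation
    in the feedback path (hypothetical sketch)."""
    def __init__(self, shift=1, negate=False):
        self.shift = shift
        self.negate = negate
        self.store = 0

    def clock(self, x, y):
        partial = x + y                       # first addition logic unit
        feedback = self.store >> self.shift   # fixed shift in the feedback path
        if self.negate:
            feedback = -feedback              # optional negation hardware
        self.store = partial + feedback       # second addition logic unit
        return self.store
```

With `shift=0` and no negation this reduces to a plain running sum of two inputs per clock; with `shift=1` the stored value is halved before each new pair is added.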

Advanced peripheral bus based serial peripheral interface communication device
10866919 · 2020-12-15 ·

Embodiments of the present disclosure provide an APB (Advanced Peripheral Bus) bus-based SPI (Serial Peripheral Interface) communication device. The device comprises: an APB interface module, an SPI bus interface module, an encryption module, and a decryption module, wherein the encryption module receives plaintext data and a key from a master via the APB interface module, generates, when enabled, ciphertext data according to the plaintext data and the key, and sends the ciphertext data to a slave via the SPI bus interface module; the decryption module receives the ciphertext data from the slave via the SPI bus interface module and receives a key from the master via the APB interface module, generates, when enabled, plaintext data according to the ciphertext data and the key, and sends the plaintext data to the master via the APB interface module. The present disclosure can improve the security of data transmission.
