G06F7/5095

Second-order delta-sigma modulator and transmission apparatus
10707893 · 2020-07-07 · ·

A second-order modulator includes a plurality of integrators and a parallel higher-bit processing part, and the parallel higher-bit processing part includes a plurality of addition and determination processing sections. The addition and determination processing section receives first and second carry inputs and first and second state inputs, and outputs a quantized output and first and second state outputs. A first selector selects one set from sets of the first and the second state outputs from the plurality of addition and determination processing sections and outputs the selected set, and a second selector selects one quantized output from the quantized outputs from the plurality of addition and determination processing sections. An output of the first selector is used as a selection control signal for the first and the second selectors.

Accelerated quantized multiply-and-add operations

Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.

OPTIMIZATION OF NEURAL NETWORKS USING HARDWARE CALCULATION EFFICIENCY AND ADJUSTMENT FACTORS
20200125330 · 2020-04-23 ·

In one embodiment, a method includes receiving a request for an operation to be performed; determining that the operation is associated with a machine-learning algorithm, and in response, route the operation to a computing circuit; performing, at the computing circuit, the operation, including: determining a linear domain product of a first log-domain number and a second log-domain number associated with the operation based on a summation of the first log-domain number and the second log-domain number and output a third log-domain number approximating the linear domain product of the first log-domain number and the second log-domain number; converting the third log-domain number to a first linear-domain number; summing the first linear-domain number and a second linear-domain number associated with the operation, and output a third linear-domain number as the summed result.

OPTIMIZATION OF NEURAL NETWORKS USING HARDWARE CALCULATION EFFICIENCY
20200125991 · 2020-04-23 ·

In one embodiment, a method includes receiving a request for an operation to be performed; determining that the operation is associated with a machine-learning algorithm, and in response, route the operation to a computing circuit; performing, at the computing circuit, the operation, including: determining a linear domain product of a first log-domain number and a second log-domain number associated with the operation based on a summation of the first log-domain number and the second log-domain number and output a third log-domain number approximating the linear domain product of the first log-domain number and the second log-domain number; converting the third log-domain number to a first linear-domain number; summing the first linear-domain number and a second linear-domain number associated with the operation, and output a third linear-domain number as the summed result.

LSTM CIRCUIT WITH SELECTIVE INPUT COMPUTATION

An apparatus is described. The apparatus includes a long short term memory (LSTM) circuit having a multiply accumulate circuit (MAC). The MAC circuit has circuitry to rely on a stored product term rather than explicitly perform a multiplication operation to determine the product term if an accumulation of differences between consecutive, preceding input values has not reached a threshold.

ADVANCED PERIPHERAL BUS BASED SERIAL PERIPHERAL INTERFACE COMMUNICATION DEVICE
20190361836 · 2019-11-28 ·

Embodiments of the present disclosure provide an APB (Advanced Peripheral Bus) bus-based SPI (Serial Peripheral Interface) communication device. The device comprises: an APB interface module, an SPI bus interface module, an encryption module, and a decryption module, wherein the encryption module receives plaintext data and a key from a master via the APB interface module, generates, when enabled, ciphertext data according to the plaintext data and the key, and sends the ciphertext data to a slave via the SPI bus interface module; the decryption module receives the ciphertext data from the slave via the SPI bus interface module and receives a key from the master via the APB interface module, generates, when enabled, plaintext data according to the ciphertext data and the key, and sends the plaintext data to the master via the APB interface module. The present disclosure can improve the security of data transmission.

SECOND-ORDER DELTA-SIGMA MODULATOR AND TRANSMISSION APPARATUS
20190326924 · 2019-10-24 · ·

A second-order modulator includes a plurality of integrators and a parallel higher-bit processing part, and the parallel higher-bit processing part includes a plurality of addition and determination processing sections. The addition and determination processing section receives first and second carry inputs and first and second state inputs, and outputs a quantized output and first and second state outputs. A first selector selects one set from sets of the first and the second state outputs from the plurality of addition and determination processing sections and outputs the selected set, and a second selector selects one quantized output from the quantized outputs from the plurality of addition and determination processing sections. An output of the first selector is used as a selection control signal for the first and the second selectors.

ACCELERATED QUANTIZED MULTIPLY-AND-ADD OPERATIONS

Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.

Data accumulation apparatus and method, and digital signal processing device

The present disclosure provides a data accumulation device and method, and a digital signal processing device. The device comprises: an accumulation tree module for accumulating input data in the form of a binary tree structure and outputting accumulated result data; a register module including a plurality of groups of registers and used for registering intermediate data generated by the accumulation tree module during an accumulation process and the accumulated result data; and a control circuit for generating a data gating signal to control the accumulation tree module to filter the input data not required to be accumulated, and generating a flag signal to perform the following control: selecting a result obtained after adding one or more of intermediate data stored in the register to the accumulated result as output data, or directly selecting the accumulated result as output data. Thus, a plurality of groups of input data can be rapidly accumulated to a group of sums in a clock cycle. At the same time, the accumulation device can flexibly select to simultaneously accumulate some data of the plurality of input data by means of a control signal.

APPARATUS AND METHOD FOR RIGHT SHIFTING PACKED QUADWORDS AND EXTRACTING PACKED DOUBLEWORDS

An apparatus and method for performing sum of absolute differences with accumulation. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; execution circuitry to execute the decoded instruction, the execution circuitry comprising: adder circuitry to determine a difference between each byte in the first source register and a corresponding byte in the second source register, absolute value circuitry to determine an absolute value of each difference, the adder circuitry to add pairs of the absolute values to generate a plurality of temporary results, and extension circuitry to extend the temporary results to temporary words; and accumulator circuitry to add each temporary word to a word from a third source register to generate a plurality of accumulated words; and a destination register to store the accumulated words as packed words.