G06F7/57

ULTRA-LOW-POWER AND LOW-AREA SOLUTION OF BINARY MULTIPLY-ACCUMULATE SYSTEM AND METHOD

Data structure and microcontroller architecture performing binary multiply-accumulate operations using multiple partial copies of weights. Destination-register location, source-register location, and weight-register location are received. Using the weight-register location, a sub-set of the weight bits is copied a select number of times based on a filter index value that is received. Each copy of the sub-set of weights is executed in parallel. Using the source-register location, a sub-set of the input bits is selected based on the size of the sub-set of weights, wherein the sub-set of input bits is shifted one bit from a previous sub-set of input bits. XOR operation is performed on each corresponding bit in the copy of the sub-set of weights with each corresponding bit in the selected sub-set of input bits. In a corresponding destination sub-location, output of each XOR operation is aggregated with each other and with current value of the corresponding destination sub-location.

ULTRA-LOW-POWER AND LOW-AREA SOLUTION OF BINARY MULTIPLY-ACCUMULATE SYSTEM AND METHOD

Data structure and microcontroller architecture performing binary multiply-accumulate operations using multiple partial copies of weights. Destination-register location, source-register location, and weight-register location are received. Using the weight-register location, a sub-set of the weight bits is copied a select number of times based on a filter index value that is received. Each copy of the sub-set of weights is executed in parallel. Using the source-register location, a sub-set of the input bits is selected based on the size of the sub-set of weights, wherein the sub-set of input bits is shifted one bit from a previous sub-set of input bits. XOR operation is performed on each corresponding bit in the copy of the sub-set of weights with each corresponding bit in the selected sub-set of input bits. In a corresponding destination sub-location, output of each XOR operation is aggregated with each other and with current value of the corresponding destination sub-location.

Neural net work processing
11537860 · 2022-12-27 · ·

A neural network processor is disclosed that includes a combined convolution and pooling circuit that can perform both convolution and pooling operations. The circuit can perform a convolution operation by a multiply circuit determining products of corresponding input feature map and convolution kernel weight values, and an add circuit accumulating the products determined by the multiply circuit in storage. The circuit can perform an average pooling operation by the add circuit accumulating input feature map data values in the storage, a divisor circuit determining a divisor value, and a division circuit dividing the data value accumulated in the storage by the determined divisor value. The circuit can perform a maximum pooling operation by a maximum circuit determining a maximum value of input feature map data values, and storing the determined maximum value in the storage.

SEMICONDUCTOR INTEGRATED CIRCUIT AND ARITHMETIC LOGIC OPERATION SYSTEM
20220405057 · 2022-12-22 · ·

According to one embodiment, in a semiconductor integrated circuit, the plurality of storage devices are arranged in a form of a plurality of rows. Each of the storage devices are configured to store a bit position value of a weight of multiple bits. The plurality of multiplication circuits are arranged in a form of a plurality of rows and are configured to multiply a plurality of input voltages by the weight of multiple bits to generate a plurality of multiplication results. The one or more capacitive devices are configured to accumulate charges corresponding to the plurality of multiplication results. The adder circuit are configured to generate an output voltage corresponding to the total value of the charges accumulated in the one or more capacitive devices. The plurality of input voltages have different amplitudes. Each of the input voltages is associated with a corresponding bit position of the weight.

SEMICONDUCTOR INTEGRATED CIRCUIT AND ARITHMETIC LOGIC OPERATION SYSTEM
20220405057 · 2022-12-22 · ·

According to one embodiment, in a semiconductor integrated circuit, the plurality of storage devices are arranged in a form of a plurality of rows. Each of the storage devices are configured to store a bit position value of a weight of multiple bits. The plurality of multiplication circuits are arranged in a form of a plurality of rows and are configured to multiply a plurality of input voltages by the weight of multiple bits to generate a plurality of multiplication results. The one or more capacitive devices are configured to accumulate charges corresponding to the plurality of multiplication results. The adder circuit are configured to generate an output voltage corresponding to the total value of the charges accumulated in the one or more capacitive devices. The plurality of input voltages have different amplitudes. Each of the input voltages is associated with a corresponding bit position of the weight.

ENHANCED DIGITAL SIGNAL PROCESSOR (DSP) NAND FLASH

A method and apparatus for systems and methods for digital signal processing (DSP) in a non-volatile memory (NVM) device comprising CMOS coupled to NVM die, of a data storage device. According to certain embodiments, one or more DSP calculations are provided by a controller to the CMOS components of the NVM, that configure one or more memory die to carry out atomic calculations on the data resident on the die. The results of calculations of each die are provided to an output latch for each die, back-propagating data back to the configured calculation portion as needed, otherwise forwarding the results to the controller. The controller aggregates the results of DSP calculations of each die and presents the results to the host system.

ENHANCED DIGITAL SIGNAL PROCESSOR (DSP) NAND FLASH

A method and apparatus for systems and methods for digital signal processing (DSP) in a non-volatile memory (NVM) device comprising CMOS coupled to NVM die, of a data storage device. According to certain embodiments, one or more DSP calculations are provided by a controller to the CMOS components of the NVM, that configure one or more memory die to carry out atomic calculations on the data resident on the die. The results of calculations of each die are provided to an output latch for each die, back-propagating data back to the configured calculation portion as needed, otherwise forwarding the results to the controller. The controller aggregates the results of DSP calculations of each die and presents the results to the host system.

Selecting an ith largest or a pth smallest number from a set of n m-bit numbers

A method of selecting, in hardware logic, an i.sup.th largest or a p.sup.th smallest number from a set of n m-bit numbers is described. The method is performed iteratively and in the r.sup.th iteration, the method comprises: summing an (m−r).sup.th bit from each of the m-bit numbers to generate a summation result and comparing the summation result to a threshold value. Depending upon the outcome of the comparison, the r.sup.th bit of the selected number is determined and output and additionally the (m−r−1).sup.th bit of each of the m-bit numbers is selectively updated based on the outcome of the comparison and the value of the (m−r).sup.th bit in the m-bit number. In a first iteration, a most significant bit from each of the m-bit numbers is summed and each subsequent iteration sums bits occupying successive bit positions in their respective numbers.

Selecting an ith largest or a pth smallest number from a set of n m-bit numbers

A method of selecting, in hardware logic, an i.sup.th largest or a p.sup.th smallest number from a set of n m-bit numbers is described. The method is performed iteratively and in the r.sup.th iteration, the method comprises: summing an (m−r).sup.th bit from each of the m-bit numbers to generate a summation result and comparing the summation result to a threshold value. Depending upon the outcome of the comparison, the r.sup.th bit of the selected number is determined and output and additionally the (m−r−1).sup.th bit of each of the m-bit numbers is selectively updated based on the outcome of the comparison and the value of the (m−r).sup.th bit in the m-bit number. In a first iteration, a most significant bit from each of the m-bit numbers is summed and each subsequent iteration sums bits occupying successive bit positions in their respective numbers.

Neural network apparatus and method of processing variable-resolution operation by the same

A neural network apparatus that is configured to process an operation includes neural network circuitry configured to receive a first input of an n-bit activation, store a second input of an m-bit weight, perform a determination whether to perform an operation on an i.sup.th bit of the first input and a j.sup.th bit of the second input, output an operation value of an operation performed on the i.sup.th bit of the first input and the j.sup.th bit of the second input based on the determination, and produce an operation value of the operation based on the determination.