G06F7/501

Determining sums using logic circuits
11714448 · 2023-08-01

A logic circuit comprising: inputs for receiving multiple n-bit numbers, n being greater than one; and an adder capable of receiving m n-bit numbers, m being greater than one, and forming an output representing the sum of those numbers, the adder having a plurality of single-bit stages and being configured to form the sum by subjecting successive bits of each of the numbers to an operation in a respective one of the single-bit stages, the single-bit stages being such that the adder has insufficient capacity to add all possible combinations of bits in a respective bit position of m n-bit numbers; the addition circuit being configured to add the multiple n-bit numbers by: in the adder, adding a first one of the n-bit numbers to a value corresponding to a set of non-consecutive bits of another of the n-bit numbers to form a first intermediate value; adding the first intermediate value to a value corresponding to the bits of the said other of the n-bit numbers other than those in the said set to form a sum; and outputting the sum.
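The two-pass addition described in this claim can be sketched in software. The following is a minimal illustration only, not the patented circuit: the function name `split_add`, the 8-bit width, and the choice of even bit positions as the "set of non-consecutive bits" are all assumptions made for the example, and the limited per-stage capacity of the adder is not modelled.

```python
def split_add(a: int, b: int, n: int = 8, mask: int = 0b01010101) -> int:
    """Add two n-bit numbers in two passes over the bits of b.

    `mask` selects a set of non-consecutive bit positions of b
    (assumed here: bits 0, 2, 4 and 6). All names are hypothetical.
    """
    width = (1 << n) - 1
    b_selected = b & mask & width       # value formed by the non-consecutive bits of b
    b_remaining = b & ~mask & width     # value formed by the other bits of b
    intermediate = a + b_selected       # first pass: first intermediate value
    return intermediate + b_remaining   # second pass: final sum


print(split_add(200, 123))  # same result as 200 + 123
```

Because the two masked values have no set bits in common and together reconstruct `b`, the two passes give the same result as a single addition.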

MEMORY ARRAY WITH PROGRAMMABLE NUMBER OF FILTERS

Aspects of the present disclosure are directed to devices and methods for performing MAC operations using a memory array as a compute-in-memory (CIM) device that can enable higher computational throughput, higher performance and lower energy consumption compared to computation using a processor outside of a memory array. In some embodiments, an activation architecture is provided using a bit cell array arranged in rows and columns to store charges that represent a weight value in a weight matrix. A read word line (RWL) may be repurposed to provide the input activation value to bit cells within a row of bit cells, while a read-bit line (RBL) is configured to receive multiplication products from bit cells arranged in a column. Some embodiments provide multiple sub-arrays or tiles of bit cell arrays.
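As a rough behavioural model of the arrangement described above, each row's activation can be thought of as driving one RWL and each column's RBL as collecting the products from the cells in that column. The function name `cim_mac` and the use of plain integers in place of stored charges are assumptions for illustration.

```python
def cim_mac(weights, activations):
    """Behavioural sketch of a compute-in-memory MAC.

    weights[r][c]  : weight held by the bit cell at row r, column c
    activations[r] : input activation applied on the read word line of row r
    Returns one accumulated value per read-bit line (column).
    """
    n_cols = len(weights[0])
    rbl = [0] * n_cols
    for r, act in enumerate(activations):
        for c in range(n_cols):
            # Each bit cell contributes activation * weight to its column's RBL.
            rbl[c] += act * weights[r][c]
    return rbl


print(cim_mac([[1, 2], [3, 4]], [1, 1]))
```

Each output element is the dot product of the activation vector with one column of the weight matrix, which is the MAC result a column would deliver in this simplified picture.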

COMPUTE-IN-MEMORY SYSTEMS AND METHODS WITH CONFIGURABLE INPUT AND SUMMING UNITS
20230022516 · 2023-01-26

A device includes a multiplication unit and a configurable summing unit. The multiplication unit is configured to receive data and weights for an Nth layer, where N is a positive integer. The multiplication unit is configured to multiply the data by the weights to provide multiplication results. The configurable summing unit is configured by Nth layer values to receive an Nth layer number of inputs and perform an Nth layer number of additions, and to sum the multiplication results and provide a configurable summing unit output.
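One way to picture a summing unit whose input count is set per layer is the sketch below. It is an assumption-laden illustration: the function name `configurable_sum` and the parameter `num_inputs` (standing in for the "Nth layer values") are hypothetical, and the abstract does not say how unused inputs are handled.

```python
def configurable_sum(data, weights, num_inputs):
    """Multiply data by weights, then sum only the configured number of products.

    `num_inputs` is a hypothetical per-layer setting that fixes how many
    multiplication results the summing unit accepts for the current layer.
    """
    products = [d * w for d, w in zip(data, weights)]  # multiplication unit
    active = products[:num_inputs]                     # inputs enabled for this layer
    total = 0
    for p in active:                                   # one addition per enabled input
        total += p
    return total


print(configurable_sum([1, 2, 3, 4], [1, 1, 1, 1], num_inputs=2))
```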

PROCESSING-IN-MEMORY (PIM) DEVICE
20230025899 · 2023-01-26

A PIM device includes a memory/arithmetic region including a plurality of memory banks and a plurality of MAC operators, the plurality of MAC operators including a first MAC operator, a peripheral region including a data input/output circuit, and a global data input/output (GIO) line capable of providing a data transmission path between the peripheral region and the memory/arithmetic region. The first MAC operator is configured to perform an EWM operation by performing a multiplication operation on first input data and second input data that are transmitted from first and second memory banks of the plurality of memory banks, respectively, to generate multiplication result data and transmitting the multiplication result data to a third memory bank. While the EWM operation is being performed, data transmission through the GIO line between the peripheral region and the memory/arithmetic region is blocked.
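The data flow in this abstract can be modelled loosely in software. The sketch below assumes that EWM stands for element-wise multiplication, which the abstract does not spell out; the class `PimDevice` and its methods are hypothetical names, and a single-threaded model can only represent the GIO blocking as a flag.

```python
class PimDevice:
    """Toy model: memory banks, one MAC operator, and a GIO line that can be blocked."""

    def __init__(self, num_banks, bank_size):
        self.banks = [[0] * bank_size for _ in range(num_banks)]
        self.gio_blocked = False

    def gio_transfer(self, bank, data):
        """Move data from the peripheral region into a bank over the GIO line."""
        if self.gio_blocked:
            raise RuntimeError("GIO line is blocked during an EWM operation")
        self.banks[bank] = list(data)

    def ewm(self, src_a, src_b, dst):
        """Multiply two banks element by element and store the result in a third bank."""
        self.gio_blocked = True          # block peripheral traffic while the operation runs
        try:
            self.banks[dst] = [x * y for x, y in zip(self.banks[src_a], self.banks[src_b])]
        finally:
            self.gio_blocked = False     # release the GIO line afterwards


dev = PimDevice(num_banks=3, bank_size=3)
dev.gio_transfer(0, [1, 2, 3])
dev.gio_transfer(1, [4, 5, 6])
dev.ewm(0, 1, 2)
print(dev.banks[2])
```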

SYSTOLIC ARRAY WITH INPUT REDUCTION TO MULTIPLE REDUCED INPUTS

Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length, and the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs with the second, shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
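The idea of splitting one wide input into several narrower ones and summing the partial products can be illustrated with unsigned integers. This is a sketch under stated assumptions: the abstract does not specify the number format, so the 16-bit to 8-bit split, the shift-based recombination, and the names `reduce_input` and `reduced_mac` are all hypothetical.

```python
def reduce_input(x, bits=8):
    """Split a 2*bits-wide unsigned value into [high, low] parts of `bits` each."""
    mask = (1 << bits) - 1
    return [(x >> bits) & mask, x & mask]


def reduced_mac(x, w, bits=8):
    """Multiply x by w using only `bits`-wide operands, then sum the partial outputs."""
    xs = reduce_input(x, bits)   # reduced input data elements
    ws = reduce_input(w, bits)   # reduced weights
    shifts = [bits, 0]           # weight of the high and low parts
    total = 0
    for xi, sx in zip(xs, shifts):
        for wi, sw in zip(ws, shifts):
            # One partial output per unique (reduced input, reduced weight) pair.
            total += (xi * wi) << (sx + sw)
    return total


print(reduced_mac(300, 500))  # same result as 300 * 500
```

With two reduced parts per operand there are four combinations, so four partial outputs are summed to recover the full-width product.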

CIRCUIT FOR HANDLING PROCESSING WITH OUTLIERS

A system and method for handling processing with outliers. In some embodiments, the method includes: reading a first activation and a second activation, each including a least significant part and a most significant part; and multiplying a first weight and a second weight by the respective activations. The multiplying of the first weight by the first activation includes multiplying the first weight by the least significant part of the first activation in a first multiplier. The multiplying of the second weight by the second activation includes: multiplying the second weight by the least significant part of the second activation in a second multiplier, and multiplying the second weight by the most significant part of the second activation in a shared multiplier, the shared multiplier being associated with a plurality of rows of an array of activations.
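The split between per-row multipliers and a shared multiplier can be sketched as follows. The 4-bit width of the least significant part, the rule that the shared multiplier is used only when the most significant part is nonzero, and the name `row_products` are assumptions for illustration, not details taken from the abstract.

```python
def row_products(weights, activations, low_bits=4):
    """Multiply each weight by its activation, routing outlier parts to a shared multiplier.

    Returns the list of products and how many times the shared multiplier was used.
    """
    mask = (1 << low_bits) - 1
    products = []
    shared_uses = 0
    for w, a in zip(weights, activations):
        low = a & mask            # least significant part of the activation
        high = a >> low_bits      # most significant part (nonzero only for outliers)
        product = w * low         # handled by the row's own multiplier
        if high:
            # Outlier: the shared multiplier handles the most significant part.
            product += (w * high) << low_bits
            shared_uses += 1
        products.append(product)
    return products, shared_uses


print(row_products([3, 5], [7, 37]))
```

In this example the activation 7 fits in the low part, so only the second row, whose activation is 37, needs the shared multiplier.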