Patent classifications
G06F7/53
PARALLEL MATRIX MULTIPLICATION TECHNIQUE OPTIMIZED FOR MEMORY FETCHES
A matrix multiplication circuit comprises a memory storage device, processing circuitry, a parallel multiply circuit, and buffer circuits. The parallel multiply circuit simultaneously performs a count of multiplies in a parallel multiplication operation. The buffer circuits include prefetch buffer circuits, each having a storage array dimension corresponding to the count of multiplies in the parallel multiplication operation. The processing circuitry loads a first prefetch buffer circuit with values from the first matrix; fetches a value of the second matrix and, in parallel with the fetch, preloads the second prefetch buffer circuit with another value from the first matrix; initiates a parallel multiply of the fetched value of the second matrix and the values in the first prefetch buffer circuit; and stores partial product results of the parallel multiply, including adding a current partial product result to a previously stored partial product result.
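The load / fetch-and-preload / multiply / accumulate sequence in this abstract can be illustrated with a minimal Python sketch. It assumes an outer-product formulation (one value of the second matrix B multiplied against a buffer of values taken from a column of the first matrix A) and two ping-pong prefetch buffers. The function name `prefetch_matmul` and the parameter `lanes` (standing in for the count of parallel multiplies) are hypothetical, and the "in parallel" preload is modelled sequentially.

```python
def prefetch_matmul(A, B, lanes):
    """Hypothetical sketch: C = A @ B with two ping-pong prefetch buffers.

    `lanes` plays the role of the count of multiplies done in parallel;
    each buffer holds up to `lanes` values from one column of A.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for r0 in range(0, m, lanes):                  # tile A's rows by lane count
        rows = range(r0, min(r0 + lanes, m))
        for j in range(n):
            # Load the first prefetch buffer with column 0 of this A tile.
            buffers = [[A[r][0] for r in rows], None]
            active = 0
            for kk in range(k):
                b_value = B[kk][j]                 # fetch one value of B
                # Modelled "in parallel" with the fetch: preload the idle buffer.
                if kk + 1 < k:
                    buffers[1 - active] = [A[r][kk + 1] for r in rows]
                # Parallel multiply: one B value times every value in the buffer.
                partial = [a * b_value for a in buffers[active]]
                # Add current partial products to previously stored results.
                for lane, r in enumerate(rows):
                    C[r][j] += partial[lane]
                active = 1 - active                # swap buffer roles
    return C
```

For example, `prefetch_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], lanes=2)` yields `[[19, 22], [43, 50]]`. The double buffer is what lets the next operands be staged while the current multiply runs; in hardware this hides memory latency, which Python cannot demonstrate.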
WEIGHT STATIONARY IN-MEMORY-COMPUTING NEURAL NETWORK ACCELERATOR WITH LOCALIZED DATA MULTIPLEXING
Systems, apparatuses, and methods include technology that identifies that a first memory cell of a plurality of memory cells stores data that is associated with a multiply-accumulate operation. The plurality of memory cells is associated with a multiply-accumulator (MAC). The technology executes a connection operation to electrically connect the first memory cell to the MAC to execute the multiply-accumulate operation. A second memory cell of the plurality of memory cells is electrically disconnected from the MAC during the multiply-accumulate operation. The technology executes, with the MAC, the multiply-accumulate operation based on the data.
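The localized multiplexing idea (many weight-holding memory cells sharing one MAC, with only the selected cell connected during an operation) can be modelled as a small Python sketch. The class `LocalMacBank` and its `mac` method are hypothetical; "connecting" a cell is modelled simply as reading that one cell, and the electrical behaviour is not represented.

```python
from dataclasses import dataclass


@dataclass
class LocalMacBank:
    """Hypothetical model: several weight-storing memory cells share one MAC."""
    cells: list[int]  # stationary weights, one per memory cell
    acc: int = 0      # accumulator register of the shared MAC

    def mac(self, selected: int, activation: int) -> int:
        # Only the selected cell is "connected" to the MAC; every other cell
        # stays disconnected (modelled by not reading it at all).
        weight = self.cells[selected]
        self.acc += weight * activation
        return self.acc


bank = LocalMacBank([2, 3, 5])
bank.mac(1, 4)  # 3 * 4 = 12
bank.mac(2, 1)  # 12 + 5 * 1 = 17
```

Because the weights never leave their cells (weight stationary), only activations and the running sum move, which is the data-movement saving such designs target.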
Multiple mode arithmetic circuit
A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit exponent, 8-bit mantissa and 6-bit exponent, and 6-bit mantissa and 6-bit exponent) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Direct connections between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. These connections, referred to as cascade inputs and outputs, further increase the input and output bandwidth of the arithmetic circuit.
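To illustrate why the integer mode must be selected by control signals, the short Python sketch below interprets the same raw bit pattern under the three integer modes named in the abstract. The function names and mode strings are hypothetical; the abstract does not specify how modes are encoded.

```python
def decode(bits: int, width: int, mode: str) -> int:
    """Interpret a raw `width`-bit pattern according to an integer mode."""
    bits &= (1 << width) - 1           # keep only the low `width` bits
    sign_bit = 1 << (width - 1)
    if mode == "unsigned":
        return bits
    if mode == "twos_complement":
        return bits - (1 << width) if bits & sign_bit else bits
    if mode == "sign_magnitude":
        magnitude = bits & (sign_bit - 1)  # all bits except the sign bit
        return -magnitude if bits & sign_bit else magnitude
    raise ValueError(f"unknown mode: {mode}")


def multiply(a_bits: int, b_bits: int, width: int, mode: str) -> int:
    """Multiply two raw operands after decoding them in the selected mode."""
    return decode(a_bits, width, mode) * decode(b_bits, width, mode)
```

The 8-bit pattern `0xFF` decodes to 255 (unsigned), -1 (two's complement), or -127 (sign-magnitude), so the same multiplier datapath produces different products depending on the selected mode.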
PROCESSING ELEMENT AND NEURAL PROCESSING DEVICE INCLUDING SAME
The present disclosure describes a processing element and a neural processing device including the processing element. The processing element includes: a weight register configured to store a weight; an input activation register configured to store an input activation; a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight and a first sub-input activation of the first precision included in the input activation, and to generate result data by multiplying the first sub-weight and the first sub-input activation at either the first precision or a second, different precision, depending on the first sub-weight and the first sub-input activation; and a saturating adder configured to generate a partial sum from the result data.
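Two ideas from this abstract can be sketched in Python under stated assumptions: building a higher-precision product from lower-precision sub-operands, and accumulating with a saturating adder. The sketch assumes unsigned 8-bit weights and activations split into 4-bit nibbles and a signed 16-bit partial sum; these widths and the helper names are hypothetical and are not taken from the disclosure.

```python
def split_nibbles(x: int) -> tuple[int, int]:
    """Split an unsigned 8-bit value into (high, low) 4-bit sub-values."""
    return x >> 4, x & 0xF


def nibble_multiply(w: int, a: int) -> int:
    """8-bit x 8-bit product assembled from four 4-bit x 4-bit multiplies."""
    w_hi, w_lo = split_nibbles(w)
    a_hi, a_lo = split_nibbles(a)
    # w * a = 256*w_hi*a_hi + 16*(w_hi*a_lo + w_lo*a_hi) + w_lo*a_lo
    return ((w_hi * a_hi) << 8) + ((w_hi * a_lo + w_lo * a_hi) << 4) + w_lo * a_lo


def saturating_add(acc: int, value: int, bits: int = 16) -> int:
    """Add, then clamp to the signed `bits`-bit range instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, acc + value))
```

For example, `nibble_multiply(200, 150)` equals `200 * 150 == 30000`, and `saturating_add(30000, 30000)` clamps to 32767 rather than overflowing to a negative number, which is the usual reason for saturation in neural-network accumulators.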
Device and method for accelerating matrix multiply operations
A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
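The exchange pattern described here (each core multiplies its own sub-portion of the first matrix by a sub-portion of the second matrix, then receives another sub-portion from a neighbouring core) can be sketched in Python. This is a simplification that assumes a single ring of `p` cores rather than the two hierarchical link levels in the abstract, square matrices whose size is divisible by `p`, and hypothetical function names.

```python
def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def ring_matmul(A, B, p):
    """Hypothetical ring schedule: core i keeps row block i of A and
    receives a new column block of B from its predecessor each step."""
    n = len(A)
    s = n // p                                           # block size
    a_blocks = [A[i * s:(i + 1) * s] for i in range(p)]  # stays on core i
    b_blocks = [[row[j * s:(j + 1) * s] for row in B] for j in range(p)]
    held = list(range(p))        # index of the B block each core holds
    C = [[0] * n for _ in range(n)]
    for _ in range(p):
        for core in range(p):
            j = held[core]
            prod = matmul(a_blocks[core], b_blocks[j])   # local block product
            for r in range(s):
                for c in range(s):
                    C[core * s + r][j * s + c] = prod[r][c]
        # Each core receives the B block held by its predecessor in the ring.
        held = [held[(core - 1) % p] for core in range(p)]
    return C
```

After `p` steps every core has seen every column block of B, so the assembled C matches `matmul(A, B)`. Keeping the first-matrix data stationary and passing only second-matrix blocks between neighbours limits each transfer to adjacent cores.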
MULTI-PARTITIONING DATA FOR COMBINATION OPERATIONS
Systems and methods are disclosed for processing and executing queries against one or more datasets. As part of processing a query, the system determines whether the query is susceptible to a significantly imbalanced partition. If it is, the system monitors the query and decides whether to apply multi-partitioning to avoid a significantly imbalanced partition.
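A minimal Python sketch of detecting an imbalanced partition and then multi-partitioning the hot keys is shown below. It assumes non-negative integer keys, modulo hash partitioning, a "largest partition exceeds `max_ratio` times the mean" imbalance test, and round-robin salting of hot keys across `fanout` sub-partitions. All of these choices and names are hypothetical illustrations, not the claimed method.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Record count per partition under simple modulo hash partitioning."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[key % num_partitions] += 1
    return sizes


def is_imbalanced(keys, num_partitions, max_ratio=2.0):
    """Flag imbalance when the largest partition exceeds max_ratio x the mean."""
    if not keys:
        return False
    mean = len(keys) / num_partitions
    return max(partition_sizes(keys, num_partitions)) > max_ratio * mean


def multi_partition(records, num_partitions, fanout):
    """Spread records of hot keys round-robin over `fanout` sub-partitions."""
    counts = Counter(key for key, _ in records)
    threshold = len(records) / num_partitions  # "hot" = more than the mean
    salt = Counter()
    parts = [[] for _ in range(num_partitions * fanout)]
    for key, value in records:
        base = (key % num_partitions) * fanout
        if counts[key] > threshold:
            sub = salt[key] % fanout
            salt[key] += 1
        else:
            sub = 0                            # cold keys stay together
        parts[base + sub].append((key, value))
    return parts
```

With 90 extra records for key 1 plus keys 0 to 9 over 4 partitions, the sizes are `[3, 93, 2, 2]`, the query is flagged, and a fanout of 3 caps the largest sub-partition at 33 records. For a combination (join) operation, the matching records from the other input would also need to be replicated to each sub-partition of a hot key; that step is omitted here.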
DEVICE AND METHOD FOR MULTIPLICATION FOR IMPEDING SIDE-CHANNEL ATTACKS
A device for multiplying two bit sequences has a controller that selects and activates exactly one multiplier unit from a plurality of parallel multiplier units, according to a random signal. A partial multiplier unit shared by all the multiplier units receives and multiplies operands formed by the currently activated multiplier unit. Each multiplier unit implements a different multiplication method and has a respective selector unit that selects segments of the bit sequences to be multiplied, in accordance with a selection plan adapted to the respective multiplication method, forms operands from one or more segments, and outputs the operands. Each multiplier unit also has a respective accumulation unit, which receives partial products step by step from the partial multiplier unit, accumulates them in accordance with an accumulation plan that is adapted to the implemented multiplication method and matches the selection plan, and outputs the calculated product after accumulation has been completed.
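The structure of this device (several multiplication methods that all feed segments to one shared partial multiplier, with the method chosen at random per operation) can be sketched in Python. The sketch assumes 16-bit operands split into two 8-bit segments and uses schoolbook and Karatsuba multiplication as the two example methods; the abstract does not name specific methods, and all function names are hypothetical. Python code is not constant-time, so this shows only the selection and accumulation structure, not an actual side-channel countermeasure.

```python
import secrets

SEG = 8                  # segment width in bits (assumed)
MASK = (1 << SEG) - 1


def partial_multiply(x: int, y: int) -> int:
    """Stand-in for the shared partial multiplier unit."""
    return x * y


def schoolbook(a: int, b: int) -> int:
    """Four segment products, each shifted into place and accumulated."""
    a1, a0 = a >> SEG, a & MASK
    b1, b0 = b >> SEG, b & MASK
    acc = 0
    for x, sx in ((a0, 0), (a1, SEG)):
        for y, sy in ((b0, 0), (b1, SEG)):
            acc += partial_multiply(x, y) << (sx + sy)
    return acc


def karatsuba(a: int, b: int) -> int:
    """Three segment products; the middle term is recovered by subtraction."""
    a1, a0 = a >> SEG, a & MASK
    b1, b0 = b >> SEG, b & MASK
    z0 = partial_multiply(a0, b0)
    z2 = partial_multiply(a1, b1)
    z1 = partial_multiply(a0 + a1, b0 + b1) - z0 - z2
    return (z2 << (2 * SEG)) + (z1 << SEG) + z0


METHODS = (schoolbook, karatsuba)


def randomized_multiply(a: int, b: int) -> int:
    """Pick exactly one method per call, driven by a random value."""
    return METHODS[secrets.randbelow(len(METHODS))](a, b)
```

Both methods return `a * b` for operands below 2**16, but they issue different sequences of partial multiplies, so the operation pattern an attacker observes varies from call to call.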
Methods and apparatus to estimate population reach from different marginal ratings and/or unions of marginal ratings based on impression data
Example methods, apparatus, and articles of manufacture are disclosed to estimate population reach. An example apparatus includes processor circuitry to determine first multipliers corresponding to a panelist impression count and panelist audience size totals of at least one of a first margin of media, a second margin of the media, or a union of the first margin and the second margin, the first margin, the second margin, and the union included in a tree association; concurrently determine second multipliers using the tree association and the first multipliers; determine third multipliers corresponding to a total audience size exposed to the media at at least one of the first margin, the second margin, or the union based on the tree association using database proprietor impression totals; and determine, based on the third multipliers, an estimate for the population reach of the media for at least one of the first margin, the second margin, or the union.