G06F7/5277

Matrix multiplication system, apparatus and method
11194549 · 2021-12-07 · ·

The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the i.sup.th row of the first matrix with a corresponding column vector formed from the j.sup.th column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.

MULTIPLE MULTIPLICATION ARRAYS

A data processing apparatus is provided. An A×B multiplier array has a group of logic gates clocked by a first clock signal, where A and B are both integers. A C×D multiplier array, separate from the A×B multiplier array, has second group of logic gates clocked by a second clock signal, where C and D are both integers. Addition circuitry performs an addition operation between a first at least partial product produced by the A×B multiplier array and a second at least partial product produced by the C×D multiplier array.

Multiplier circuit

A multiplier circuit is described in which sub-products calculated in a first stage of a carry-save adder (CSA) network are output early, processed by applying a processing function, and re-injected into a subsequent stage of the CSA network to add the processed sub-products. This allows a CSA network provided for multiplication operations to be reused for operations which require sub-products to be processed and added, such as floating-point dot product operations performed on floating-point values represented in bfloatl6 format.

Matrix Multiplication System, Apparatus and Method
20210124560 · 2021-04-29 · ·

The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the i.sup.th row of the first matrix with a corresponding column vector formed from the j.sup.th column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.

Systems and Methods for Low Latency Modular Multiplication
20210117157 · 2021-04-22 ·

An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.

MULTIPLIER CIRCUIT

A multiplier circuit is described in which sub-products calculated in a first stage of a carry-save adder (CSA) network are output early, processed by applying a processing function, and re-injected into a subsequent stage of the CSA network to add the processed sub-products. This allows a CSA network provided for multiplication operations to be reused for operations which require sub-products to be processed and added, such as floating-point dot product operations performed on floating-point values represented in bfloatl6 format.

IN-MEMORY COMPUTING (IMC) PROCESSOR AND OPERATING METHOD OF IMC PROCESSOR

An in-memory computing (IMC) processor includes IMC macros, and includes a static random access memory (SRAM) IMC device including the plurality of IMC macros, and configured to perform a multiply and accumulate (MAC) operation between input data and first weight data of a first weight map applied to a first of IMC macros in a first direction in which an input feature map including the input data is written to the first IMC macro, and a two-dimensional (2D) shift accumulator configured to perform a shift operation on partial sums corresponding to respective MAC operation results of the IMC macros and accumulate a result of the shift operation.

Systems and methods for low latency modular multiplication

An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.