Patent classifications
G06F7/5275
COMBINED DIVIDE/SQUARE ROOT PROCESSING CIRCUITRY AND METHOD
An apparatus comprises combined divide/square root processing circuitry to perform, in response to a divide instruction, a given radix-64 iteration of a radix-64 divide operation, and in response to a square root instruction, a given radix-64 iteration of a radix-64 square root operation; in which: the combined divide/square root processing circuitry comprises shared circuitry to generate at least one output value for the given radix-64 iteration on a same data path used for both the radix-64 divide operation and the radix-64 square root operation.
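As an illustration only, the idea of one shared step serving both operations can be sketched in software with a digit-by-digit recurrence that yields one radix-64 digit (6 bits) per iteration. The names `select_digit`, `divide`, and `isqrt` are hypothetical, the operands are assumed to be unsigned integers, and the linear digit search stands in for whatever selection logic the claimed circuitry uses.

```python
RADIX_BITS = 6                 # radix 64 = 2**6, so each iteration yields 6 result bits
RADIX = 1 << RADIX_BITS


def select_digit(remainder, term):
    """Shared step: largest digit q in 0..63 with term(q) * q <= remainder."""
    q = 0
    for candidate in range(1, RADIX):
        if term(candidate) * candidate <= remainder:
            q = candidate
        else:
            break
    return q, remainder - term(q) * q


def divide(dividend, divisor, digits):
    """Unsigned divide; assumes dividend < 64**digits and divisor > 0."""
    quotient, remainder = 0, 0
    for i in reversed(range(digits)):
        next_digit = (dividend >> (i * RADIX_BITS)) & (RADIX - 1)
        remainder = remainder * RADIX + next_digit
        # For division the subtracted term is simply the divisor.
        q, remainder = select_digit(remainder, lambda _q: divisor)
        quotient = quotient * RADIX + q
    return quotient, remainder


def isqrt(radicand, digits):
    """Integer square root; assumes radicand < 64**(2 * digits)."""
    root, remainder = 0, 0
    for i in reversed(range(digits)):
        pair = (radicand >> (2 * i * RADIX_BITS)) & (RADIX * RADIX - 1)
        remainder = remainder * RADIX * RADIX + pair
        # For square root the term depends on the partial root and the digit.
        q, remainder = select_digit(remainder, lambda q, r=root: 2 * RADIX * r + q)
        root = root * RADIX + q
    return root, remainder
```

Both loops call the same `select_digit` routine and differ only in the term passed to it, which is the software analogue of reusing one data path for both instructions.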
Device for computing the inner product of vectors
A device for computing the inner product of vectors includes a vector data arranger, a vector data pre-accumulator, a number converter, and a post-accumulator. The vector data arranger stores a first vector and sequentially outputs a plurality of vector data based on the first vector. The vector data pre-accumulator stores a second vector, receives each of the vector data, and pre-accumulates the second vector, so as to generate a plurality of accumulation results. The number converter and the post-accumulator receive and process all the accumulation results corresponding to each of the vector data to generate an inner product value. The present invention implements a lookup table with the vector data pre-accumulator and the number converter to increase calculation speed and reduce power consumption.
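One possible reading of this abstract is a distributed-arithmetic scheme, sketched below under stated assumptions: the first vector is fed out one bit slice at a time, every subset sum of the second vector is pre-accumulated into a lookup table, and the looked-up sums are combined with shifts. The function names are hypothetical, the inputs are assumed to be unsigned integers, and the abstract's number converter is not modelled.

```python
def bit_slices(a, width):
    """Yield (bit position, mask); bit i of mask is bit `pos` of a[i]."""
    for pos in range(width):
        mask = 0
        for i, value in enumerate(a):
            mask |= ((value >> pos) & 1) << i
        yield pos, mask


def build_table(b):
    """Pre-accumulate every subset sum of b, indexed by a selection mask."""
    table = [0] * (1 << len(b))
    for mask in range(1, len(table)):
        low = mask & -mask                       # lowest set bit of the mask
        table[mask] = table[mask ^ low] + b[low.bit_length() - 1]
    return table


def inner_product(a, b, width=8):
    """Inner product of unsigned vectors; each a[i] must fit in `width` bits."""
    table = build_table(b)
    total = 0
    for pos, mask in bit_slices(a, width):
        total += table[mask] << pos              # post-accumulate with bit weight
    return total
```

The table has 2**len(b) entries, so a sketch like this only makes sense for short vectors or for vectors processed in small groups.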
BOOTH MULTIPLIER FOR COMPUTE-IN-MEMORY
A compute-in-memory device may include a Booth encoder configured to receive at least one input of first bits, a Booth decoder configured to receive at least one weight of second bits and to output a plurality of partial products of the at least one input and the at least one weight, an adder configured to add a first partial product of the plurality of the partial products and a second partial product of the plurality of partial products before the Booth decoder generates a third partial product of the plurality of the partial products and to generate a plurality of sums of partial products, and a carry-lookahead adder configured to add the plurality of sums of partial products and to generate a final sum.
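The abstract does not state a radix, so the sketch below assumes the common radix-4 Booth recoding purely for illustration. The hypothetical `booth_digits` plays the role of the encoder, each partial product is formed from a recoded digit and the weight, and partial products are accumulated as soon as they are produced. A plain integer addition stands in for the adder and the carry-lookahead adder.

```python
def booth_digits(x, bits):
    """Radix-4 Booth recoding of a signed `bits`-wide integer into digits in -2..2."""
    assert bits % 2 == 0 and -(1 << (bits - 1)) <= x < (1 << (bits - 1))
    u = x & ((1 << bits) - 1)          # two's complement bit pattern of x
    digits = []
    prev = 0                           # bit to the right of the current pair
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, weight, bits=8):
    """Multiply signed x (input) by weight, accumulating partial products eagerly."""
    acc = 0
    for i, digit in enumerate(booth_digits(x, bits)):
        partial = (digit * weight) << (2 * i)   # partial product for digit i
        acc += partial                           # added before the next one is generated
    return acc
```

With radix-4 recoding an 8-bit input produces four partial products instead of eight, which is the usual motivation for Booth encoding.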
MEMORY DEVICE AND OPERATION METHOD THEREOF
A memory device and an operation method thereof are provided. The memory device includes: a memory array including a plurality of memory cells for storing a plurality of weights; a multiplication circuit for performing bitwise multiplication on a plurality of input data and the weights to generate a plurality of multiplication results, wherein in performing bitwise multiplication, the memory cells generate a plurality of memory cell currents; a digital accumulating circuit for performing a digital accumulating on the multiplication results; an analog accumulating circuit for performing an analog accumulating on the memory cell currents to generate a first MAC operation result; and a decision unit for deciding whether to perform the analog accumulating, the digital accumulating, or a hybrid accumulating, wherein in performing the hybrid accumulating, whether the digital accumulating circuit is triggered is based on the first MAC operation result.
DEVICE FOR COMPUTING AN INNER PRODUCT OF VECTORS
A device for computing an inner product of vectors includes a vector data arranger, a vector data pre-accumulator, a number converter, and a post-accumulator. The vector data arranger stores a first vector and sequentially outputs a plurality of vector data based on the first vector. The vector data pre-accumulator stores a second vector, receives each of the vector data, and pre-accumulates the second vector, so as to generate a plurality of accumulation results. The number converter and the post-accumulator receive and process all the accumulation results corresponding to each of the vector data to generate an inner product value. The present invention implements a lookup table with the vector data pre-accumulator and the number converter to increase calculation speed and reduce power consumption.
Apparatus and method for performing an index operation
An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane. The vector processing circuitry also has vector adder circuitry to perform, within each lane, an addition of at least the plurality of intermediate input values, in order to produce a result vector providing a result value for the index operation performed in each lane. This provides a high performance, low latency, technique for vectorising index operations.
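A minimal sketch of the mapping idea, assuming the per-lane multiplier e is simply the lane number and that the mapping produces one shifted copy of opm per set bit of e. The function names, the 128-bit default vector width, and the wrap-around to the element size are illustrative assumptions, not details taken from the abstract.

```python
def lane_count(vector_bits, element_bits):
    """The number of lanes depends on the specified element size."""
    return vector_bits // element_bits


def map_lane(opm, e):
    """Intermediate values whose sum equals opm * e: one shifted copy per set bit of e."""
    return [opm << k for k in range(e.bit_length()) if (e >> k) & 1]


def index_operation(opm, vector_bits=128, element_bits=32):
    """Result vector holding opm * e for each lane, wrapped to the element size."""
    mask = (1 << element_bits) - 1
    result = []
    for e in range(lane_count(vector_bits, element_bits)):
        intermediates = map_lane(opm, e)          # fixed by the lane's multiplier
        result.append(sum(intermediates) & mask)  # the adder produces the lane result
    return result
```

Because e is known for each lane, the shifts are fixed wiring rather than a general multiplier, and only the final addition is computed at run time.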
Systems and Methods for Low Latency Modular Multiplication
An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.
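The column arrangement can be sketched as a limb-wise schoolbook multiplication in which each column's adder takes that column's subproducts plus the carry-out of the neighbouring column. The function name, limb width, and limb count below are hypothetical, and the modular reduction implied by the title is not shown because the abstract does not describe it.

```python
def column_multiply(a, b, limb_bits=16, limbs=4):
    """Multiply unsigned a and b, each assumed to fit in limbs * limb_bits bits."""
    mask = (1 << limb_bits) - 1
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(limbs)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(limbs)]

    # Column k collects every subproduct a_i * b_j with i + j == k.
    columns = [[] for _ in range(2 * limbs - 1)]
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            columns[i + j].append(x * y)

    result, carry = 0, 0
    for k, column in enumerate(columns):
        total = sum(column) + carry           # column values plus neighbour's carry-out
        result |= (total & mask) << (k * limb_bits)
        carry = total >> limb_bits            # carry-out passed to the next column
    result |= carry << (len(columns) * limb_bits)
    return result
```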
APPARATUS AND METHOD FOR PERFORMING AN INDEX OPERATION
An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane. The vector processing circuitry also has vector adder circuitry to perform, within each lane, an addition of at least the plurality of intermediate input values, in order to produce a result vector providing a result value for the index operation performed in each lane. This provides a high performance, low latency, technique for vectorising index operations.
Computing and summing up multiple products in a single multiplier
Methods, systems and computer program products for computing and summing up multiple products in a single multiplier are provided. Aspects include receiving a first number and a second number, creating partial products of the first number and the second number based on a multiplication of the first number and the second number, and reducing the number of partial products to create an intermediate result. Aspects also include receiving a third number and a fourth number, creating partial products of the third number and the fourth number based on a multiplication of the third number and the fourth number, creating a reduction tree and adding the intermediate result to the reduction tree. Aspects further include reducing the number of partial products in the reduction tree to create a second sum value and a second carry value and adding the second sum value and the second carry value to create a result.
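The flow described above can be sketched with carry-save (3:2) compression, assuming unsigned operands and a simple shift-and-select partial product scheme. The helper names are hypothetical, and the order in which the tree combines values is an arbitrary choice rather than the patented arrangement.

```python
def partial_products(x, y, bits):
    """One partial product per bit of y (y is assumed to fit in `bits` bits)."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]


def carry_save_add(a, b, c):
    """3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_tree(values):
    """Compress a list of addends down to a sum value and a carry value."""
    values = list(values)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(carry_save_add(a, b, c))
    while len(values) < 2:
        values.append(0)
    return values[0], values[1]


def sum_of_two_products(a, b, c, d, bits=16):
    """Compute a*b + c*d with a single carry-propagating addition at the end."""
    s1, c1 = reduce_tree(partial_products(a, b, bits))   # intermediate result
    tree = partial_products(c, d, bits) + [s1, c1]        # feed it into the second tree
    s2, c2 = reduce_tree(tree)
    return s2 + c2                                        # final adder
```

Keeping the first product in sum/carry form avoids a full carry-propagating addition between the two multiplications.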
Systems and methods for low latency modular multiplication
An integrated circuit device includes multiplier circuitry configured to determine a plurality of columns of subproducts by multiplying a plurality of values. Each column of the plurality of columns includes one or more subproducts of a plurality of subproducts. The integrated circuit device also includes adder circuitry configured to determine a plurality of sums, each sum being a sum of one column of the plurality of columns. A first portion of the adder circuitry associated with a first column of the plurality of columns is configured to receive a first value and second value that are associated with the first column and a third value associated with a second column of the plurality of columns that differs from the first column. The third value is a carry-out value generated by a second portion of the adder circuitry associated with the second column of the plurality of columns.