Patent classifications
G06F7/44
GENERALIZED ACCELERATION OF MATRIX MULTIPLY ACCUMULATE OPERATIONS
A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
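The three-step dot product the abstract describes can be sketched in software. This is a minimal illustrative model, not the patented datapath: it represents each partial product as an integer mantissa and exponent, aligns all partial products to a common exponent, and accumulates them with ordinary integer addition. The function name, `mant_bits` parameter, and use of `math.frexp` are assumptions for illustration.

```python
# Illustrative sketch (not the patented hardware): a dot product computed by
# (1) generating partial products, (2) aligning them by exponent, and
# (3) accumulating the aligned values with integer adders.
import math

def dot_product_aligned(a, b, mant_bits=10):
    assert len(a) == len(b)
    # Step 1: partial products as (integer mantissa, exponent) pairs.
    partials = []
    for x, y in zip(a, b):
        m, e = math.frexp(x * y)                 # x*y == m * 2**e, |m| in [0.5, 1)
        partials.append((int(m * (1 << mant_bits)), e - mant_bits))
    # Step 2: align every partial product to the smallest exponent.
    min_e = min(e for _, e in partials)
    aligned = [m << (e - min_e) for m, e in partials]
    # Step 3: accumulate the aligned integers, then restore the scale.
    return sum(aligned) * 2.0 ** min_e

print(dot_product_aligned([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

In hardware the alignment shifts let all partial products feed one wide integer adder tree instead of a chain of floating-point additions, which is the point of aligning before accumulating.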
CONFIGURABLE POOLING PROCESSING UNIT FOR NEURAL NETWORK ACCELERATOR
A hardware implementation of a configurable pooling processing unit is configured to receive an input tensor comprising at least one channel, each channel of the at least one channel comprising a plurality of tensels; receive control information identifying one operation of a plurality of selectable operations to be performed on the input tensor, the plurality of selectable operations comprising a depth-wise convolution operation and one or more pooling operations; perform the identified operation on the input tensor to generate an output tensor by performing one or more operations on blocks of tensels of each channel of the at least one channel of the input tensor; and output the output tensor.
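The selectable-operation idea can be sketched behaviorally: one unit that, depending on control information, either pools or applies a per-channel (depth-wise) convolution to blocks of tensels. The 2x2 non-overlapping block size, function name, and shapes below are illustrative assumptions, not the claimed hardware.

```python
# Hypothetical behavioral model of a configurable pooling unit: control
# information selects max pooling or a depth-wise convolution, both applied
# to 2x2 blocks of tensels within each channel independently.
import numpy as np

def configurable_pool(x, op, dw_weights=None):
    # x: (channels, height, width); op selects the operation to perform.
    c, h, w = x.shape
    blocks = x.reshape(c, h // 2, 2, w // 2, 2)      # 2x2 blocks per channel
    if op == "max_pool":
        return blocks.max(axis=(2, 4))
    if op == "dw_conv":
        # One 2x2 kernel per channel (depth-wise), stride 2, no overlap.
        return np.einsum("chiwj,cij->chw", blocks, dw_weights)
    raise ValueError(op)
```

Routing both operations through the same block decomposition is what lets a single processing unit serve double duty: only the per-block reduction (max versus weighted sum) changes.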
MEMORY CELL BASED ON EDRAM AND CIM COMPRISING THE SAME
A memory cell comprises: a weight storage circuit configured, when a write word line is activated, to receive through a write bit line a weight voltage according to a weight value to be stored and to transmit the weight voltage to a storage node, and, when a read word line is activated, to drop a precharged read voltage to the voltage level of the read word line according to the voltage level of the storage node; and a MAC operation circuit configured, when a data enable line is activated, to transmit an input voltage according to a value of input data to a coupling node through a data input line and to discharge the coupling node according to the level of the weight voltage stored in the storage node, and, when the data enable line is reactivated, to transmit a voltage change of the coupling node to a multiply word line.
Cryptographic processing device and method for cryptographically processing data
A cryptographic processing device for cryptographically processing data, having a memory configured to store a first operand and a second operand represented by the data to be cryptographically processed, wherein the first operand and the second operand each correspond to an indexed array of data words, and a cryptographic processor configured to determine, for cryptographically processing the data, a product of the first operand with the second operand by accumulating results of partial multiplications, each partial multiplication comprising the multiplication of a data word of the first operand with a data word of the second operand wherein the cryptographic processor is configured to perform the partial multiplications in successive blocks of partial multiplications, each block being associated with a result index range and a first operand index range and each block comprising all partial multiplications between data words of the first operand within the first operand index range with data words of the second operand such that a sum of indices of the data word of the first operand and of the data word of the second operand is within the result index range.
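The block structure the abstract describes can be sketched directly: each block is defined by a result index range and a first-operand index range, and accumulates every partial multiplication `a[i] * b[j]` for which `i` lies in the operand range and `i + j` lies in the result range. The sketch below is illustrative Python, not a constant-time cryptographic implementation; the block sizes and word width are assumed parameters.

```python
# Illustrative sketch of block-wise multi-word multiplication: partial
# products a[i]*b[j] are accumulated block by block, each block covering a
# first-operand index range and a result index range (sum of indices i+j).
WORD = 1 << 32  # assumed 32-bit data words

def block_multiply(a, b, op_block=2, res_block=4):
    n, m = len(a), len(b)
    acc = [0] * (n + m)
    for r0 in range(0, n + m, res_block):          # result index range
        for i0 in range(0, n, op_block):           # first-operand index range
            for i in range(i0, min(i0 + op_block, n)):
                for j in range(m):
                    k = i + j
                    if r0 <= k < r0 + res_block:   # sum of indices in range
                        acc[k] += a[i] * b[j]
    # Propagate carries between result words.
    out, carry = [], 0
    for word_sum in acc:
        carry, word = divmod(word_sum + carry, WORD)
        out.append(word)
    return out
```

Grouping partial multiplications this way bounds how many intermediate result words must be live at once, which is why block schedules of this kind are used for large-operand multiplication on constrained hardware.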
HIERARCHICAL AND SHARED EXPONENT FLOATING POINT DATA TYPES
Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
Differential mixed signal multiplier with three capacitors
A differential mixed-signal logic processor is provided. The differential mixed-signal logic processor includes a plurality of mixed-signal multiplier branches for multiplication of an analog value A and an N-bit digital value B. Each of the plurality of mixed-signal multiplier branches includes a first capacitor connected across a second capacitor and a third capacitor to provide a differential output across the second and third capacitors. A capacitance of the first capacitor is equal to half a capacitance of the second and third capacitors.
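The ideal transfer function of such a multiplier can be modeled behaviorally: each branch gates a binary-weighted share of the analog value A according to one bit of B, and the shares sum to A * B. This is purely a behavioral sketch of the intended arithmetic, not a simulation of the capacitor network or its charge sharing.

```python
# Purely behavioral model (not a circuit simulation): a mixed-signal
# multiplier of an analog value A and an N-bit digital value B ideally
# outputs A * B, each bit contributing a binary-weighted share of A.
def mixed_signal_multiply(a, b_bits):
    # b_bits: bits of the digital value B, least-significant bit first.
    return sum(a * bit * (1 << k) for k, bit in enumerate(b_bits))

print(mixed_signal_multiply(0.5, [1, 0, 1]))  # B = 5, output 2.5
```

In the circuit, the differential output across the second and third capacitors plays the role of each branch's signed contribution; the model above captures only the resulting arithmetic.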