Patent classifications
G06F7/53
SYSTEMS AND METHODS FOR CALCULATING LARGE POLYNOMIAL MULTIPLICATIONS
This disclosure is directed to multiplier circuitry that includes a multiplier that is configurable to generate a plurality of subproducts by performing a plurality of multiplication operations involving values having a first precision using a recursive multiplication process in which a second multiplier of the multiplier performs a second plurality of multiplication operations involving values having a second precision that are derived from the values having the first precision.
SYSTEMS AND METHODS FOR DATA PLACEMENT FOR IN-MEMORY-COMPUTE
According to one embodiment, a memory module includes: a memory die including a dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
COMPUTATIONAL MEMORY
A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.
Binary parallel adder and multiplier
An arithmetic logic unit (ALU) including a binary, parallel adder and multiplier to perform arithmetic operations is described. The ALU includes an adder circuit coupled to a multiplexer to receive input operands that are directed to either an addition operation or a multiplication operation. During the multiplication operation, the ALU is configured to determine partial product operands based on first and second operands and provide the partial product operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide an output having a value equal to a product of the first operand second operands. During an addition operation, the ALU is configured to provide the first and second operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide the output having a value equal to a sum of the first and second operands.
Binary parallel adder and multiplier
An arithmetic logic unit (ALU) including a binary, parallel adder and multiplier to perform arithmetic operations is described. The ALU includes an adder circuit coupled to a multiplexer to receive input operands that are directed to either an addition operation or a multiplication operation. During the multiplication operation, the ALU is configured to determine partial product operands based on first and second operands and provide the partial product operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide an output having a value equal to a product of the first operand second operands. During an addition operation, the ALU is configured to provide the first and second operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide the output having a value equal to a sum of the first and second operands.
DIGITAL IN-MEMORY COMPUTING MACRO BASED ON APPROXIMATE ARITHMETIC HARDWARE
Various embodiments described herein provide for a digital In-Memory Computing (IMC) macro circuit that utilizes approximate arithmetic hardware to reduce the number of transistors and devices in the circuit relative to a convention digital IMC, thereby improving the area-efficiency of the digital IMC, but while retaining the benefits of reduced variability relative to an analog-mixed-signal (AMS) circuit. The proposed digital IMC macro circuit also includes custom full adder (FA) circuits with pass gate logic in a ripple carry adder (RCA) tree. The disclosed digital IMC macro circuit can also perform a vector-matrix dot product in one cycle while achieving high energy and area efficiency.
Method and apparatus for dual issue multiply instructions
A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.
Method and apparatus for dual issue multiply instructions
A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.
SEMICONDUCTOR DEVICE AND METHOD OF CONTROLLING THE SEMICONDUCTOR DEVICE
A semiconductor device includes a dynamic reconfiguration processor that performs data processing for input data sequentially input and outputs the results of data processing sequentially as output data, an accelerator including a parallel arithmetic part that performs arithmetic operation in parallel between the output data from the dynamic reconfiguration processor and each of a plurality of predetermined data, and a data transfer unit that selects the plurality of arithmetic operation results by the accelerator in order and outputs them to the dynamic reconfiguration processor.
Multiple Mode Arithmetic Circuit
A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit sign, 8-bit mantissa and 6-bit sign, and 6-bit mantissa and 6-bit sign) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Connections directly between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. By using these connections, referred to as cascade inputs and outputs, the input and output bandwidth of the arithmetic circuit is further increased.