Patent classifications
G06F7/5318
MULTIPLIER FOR FLOATING-POINT OPERATION, METHOD, INTEGRATED CIRCUIT CHIP, AND CALCULATION DEVICE
The present disclosure relates to a multiplier, a method, an integrated circuit chip, and a computation apparatus for floating-point computation. The computation apparatus may be included in a combined processing apparatus, which may also include a general interconnection interface and other processing apparatus. The computation apparatus interacts with the other processing apparatus to jointly complete computation operations specified by the user. The combined processing apparatus may also include a storage apparatus, connected to both the computation apparatus and the other processing apparatus, for storing data of each. Solutions of the present disclosure may be widely applied to various floating-point data computations.
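The abstract does not detail the multiplier's internals. For orientation, a conventional textbook decomposition of a floating-point multiply into sign, exponent, and significand paths (illustrative only, not the patented design; handles normal values and truncates rather than rounds) can be sketched as:

```python
# Illustrative IEEE-754 single-precision multiply, decomposed the textbook way.
# NOT the patented multiplier: normals only, truncation instead of rounding.
import struct

def fields(x: float):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF  # sign, exp, frac

def fp32_mul(a: float, b: float) -> float:
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    sign = sa ^ sb                      # result sign: XOR of input signs
    ma = fa | (1 << 23)                 # restore hidden leading 1 (normals)
    mb = fb | (1 << 23)
    prod = ma * mb                      # 48-bit significand product
    exp = ea + eb - 127                 # add biased exponents, remove one bias
    if prod >> 47:                      # normalize when product is in [2, 4)
        prod >>= 1
        exp += 1
    frac = (prod >> 23) & 0x7FFFFF      # drop low bits (real hardware rounds)
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

The sign, exponent, and mantissa paths are independent until normalization, which is why hardware multipliers typically compute them in parallel.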
SYSTEMS AND METHODS FOR DATA PLACEMENT FOR IN-MEMORY-COMPUTE
According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
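The controller's role above can be sketched as a toy model: a controller block-distributes operand vectors across banks so that each bank's IMC ALU operates on its own local slice. The bank count and block layout policy here are illustrative assumptions, not details from the abstract.

```python
# Toy model of data placement for in-memory compute: operands are
# block-distributed across banks; each bank's ALU works on its local slice.
NUM_BANKS = 4  # illustrative assumption

def layout(operand, num_banks=NUM_BANKS):
    """Controller step: block-distribute an operand vector across banks."""
    n = (len(operand) + num_banks - 1) // num_banks
    return [operand[i * n:(i + 1) * n] for i in range(num_banks)]

def imc_add(a, b):
    """Per-bank ALUs add their local slices; results are gathered in order."""
    out = []
    for ra, rb in zip(layout(a), layout(b)):   # one iteration per bank
        out.extend(x + y for x, y in zip(ra, rb))
    return out
```

Matching the layout of both operands is the key point: because corresponding elements land in the same bank, each ALU needs only its local row buffer, with no cross-bank traffic.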
PIPELINES FOR POWER AND AREA SAVINGS AND FOR HIGHER PARALLELISM
A device including: a first adder having first adder inputs and first adder outputs; a first register having first register inputs and first register outputs, the first register inputs coupled to the first adder outputs; a second register having second register inputs and second register outputs, the second register inputs coupled to the first adder outputs; and a second adder having second adder inputs and second adder outputs and configured to receive register output signals from the first register outputs and the second register outputs; wherein the first adder is configured to calculate a first sum of a first input value and a second input value, the first register is configured to store the first sum, the first adder is further configured to calculate a second sum of a third input value and a fourth input value, and the second register is configured to store the second sum.
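The area saving comes from reusing one first-stage adder across two cycles, with each partial sum latched into its own register. A behavioral sketch (the cycle timing is an illustrative assumption):

```python
# Behavioral sketch of the claimed pipeline: one physical first-stage adder
# produces two partial sums on successive cycles, each latched into its own
# register; a second adder then combines the two registered values.

def pipelined_sum4(a, b, c, d):
    reg1 = a + b        # cycle 1: first adder computes a+b, latched in register 1
    reg2 = c + d        # cycle 2: same adder reused for c+d, latched in register 2
    return reg1 + reg2  # cycle 3: second adder consumes both register outputs
```

Four operands are summed with only two physical adders instead of the three a flat adder tree would need, at the cost of one extra cycle of latency.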
IN-MEMORY COMPUTATION CIRCUIT AND METHOD
A memory circuit includes a selection circuit, a column of memory cells, and an adder tree. The selection circuit is configured to receive input data elements, each input data element including a number of bits equal to H, and output a selected set of kth bits of the H bits of the input data elements. Each memory cell of the column of memory cells includes a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits. The adder tree is configured to generate a summation data element based on each of the first product data elements.
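The bit-serial scheme described above can be modeled in a few lines: on step k the selection circuit broadcasts the k-th bit of every input element, each cell multiplies that bit by its stored weight (an AND gate in hardware), the adder tree sums the products, and shift-accumulating the per-bit sums reconstructs the full dot product. The bit width H below is an illustrative assumption.

```python
# Toy model of the bit-serial in-memory dot product: one input bit plane
# per step, multiplied against stored weights and reduced by an adder tree.

def imc_dot(inputs, weights, H=8):
    acc = 0
    for k in range(H):
        kth_bits = [(x >> k) & 1 for x in inputs]              # selection circuit
        products = [b * w for b, w in zip(kth_bits, weights)]  # per-cell multipliers
        acc += sum(products) << k             # adder tree, then shift-accumulate
    return acc
```

Processing one bit plane per step keeps each in-memory multiplier a single AND gate; the H-step loop trades latency for a much smaller cell.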
Full adder cell with improved power efficiency
An adder circuit provides a first operand input and a second operand input to an XNOR cell. The XNOR cell transforms these inputs to a propagate signal that is applied to an OAT cell to produce a carry out signal. A third OAT cell transforms a third operand input and the propagate signal into a sum output signal.
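For reference, the Boolean functions the cell realizes follow the conventional propagate/generate decomposition of a full adder. The sketch below shows that textbook decomposition, not the patented XNOR/OAT circuit structure:

```python
# Conventional propagate/generate view of a one-bit full adder. The patented
# cell computes these functions with XNOR and OAT cells; this is the plain
# Boolean reference formulation.

def full_adder(a: int, b: int, cin: int):
    p = a ^ b               # propagate: a carry-in would pass through
    g = a & b               # generate: a carry-out is produced regardless
    s = p ^ cin             # sum bit
    cout = g | (p & cin)    # carry out
    return s, cout
```

Power-efficient cells typically restructure exactly these expressions so that intermediate signals (like the propagate term) are shared between the sum and carry paths rather than recomputed.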
Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.
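The time-lag idea can be illustrated with a sliced wide addition: slice i cannot finish before slice i-1 because it consumes the previous slice's carry, so each slice executes one time step later than its predecessor. The slice width and one-step lag below are illustrative assumptions.

```python
# Toy simulation of time-lag sliced execution: a wide add split into 8-bit
# slices, where slice i runs at time step i, one lag behind slice i-1,
# because it waits on the forwarded carry.
SLICE_BITS = 8  # illustrative slice width

def tlsa_add(a, b, num_slices=4):
    mask = (1 << SLICE_BITS) - 1
    carry, result, schedule = 0, 0, []
    for i in range(num_slices):
        sa = (a >> (i * SLICE_BITS)) & mask
        sb = (b >> (i * SLICE_BITS)) & mask
        total = sa + sb + carry                       # slice i, time step i
        result |= (total & mask) << (i * SLICE_BITS)
        carry = total >> SLICE_BITS                   # forwarded to next slice
        schedule.append((i, i))                       # (slice index, time step)
    return result, schedule
```

The staggered schedule is what enables the parallelism the title claims: once slice 0 of one operation finishes, slice 0 of the next operation can start, even while the later slices of the first are still in flight.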
HYBRID ELECTRO-PHOTONIC NETWORK-ON-CHIP
Various embodiments provide for a circuit package including an electronic integrated circuit comprising a plurality of processing elements, and a plurality of bidirectional photonic channels, e.g., implemented in a photonic integrated circuit underneath the electronic integrated circuit, that connect the processing elements into an electro-photonic network. The processing elements include message routers with photonic-channel interfaces. Each bidirectional photonic channel interfaces at one end with a photonic-channel interface of the message router of a first one of the processing elements and at the other end with a photonic-channel interface of the message router of a second one of the processing elements and is configured to optically transfer messages (e.g., packets) between the message routers of the first and second processing elements.
CLOCK SIGNAL DISTRIBUTION USING PHOTONIC FABRIC
Various embodiments provide for clock signal distribution within a processor, such as a machine learning (ML) processor, using a photonic fabric.