G06F7/5318

Dadda architecture that scales with increasing operand size

Aspects of the invention include physical design-optimal Dadda architectures that scale with increasing operand size. Partial product arrays can be generated for two n-bit operands and columns in the partial product arrays can be shifted to a first row. The number of partial products in each column can be iteratively reduced across one or more stages until each column has at most two partial products. At each stage a maximum column height is determined and each column having a height greater than the maximum column height is reduced using half-adders and full-adders. Result bits of the half-adders and the full-adders are placed at the bottom of the current column and carry bits of the half-adders and the full-adders are placed at the bottom of the next column.

OPERATION METHOD OF MULTIPLIER, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Disclosed are an operation method of multiplier, an electronic device, and a storage medium. The method includes: determining a plurality of input data sets of the multiplier and an encoding manner for the multiplier; determining at least one low-order input data set in the plurality of input data sets; determining a carry compensation term corresponding to the at least one low-order input data set based on the at least one low-order input data set and the encoding manner; determining a target partial product array based on the carry compensation term corresponding to the at least one low-order input data set and the plurality of input data sets; and determining a product operation result for each input data set based on the target partial product array. According to this disclosure, multiplication operations with multiple precision may be implemented by using one multiplier, thereby reducing hardware resource consumption and hardware area.

METHODS AND ELECTRONIC DEVICE FOR HIGH PERFORMANCE MODULO MULTIPLICATION
20240361984 · 2024-10-31 ·

Embodiments herein disclose high performance modulo multiplication methods performed by circuitry of an electronic device. The method includes obtaining and summing partial products to obtain a partial multiplication result using a primary Wallace tree. The partial multiplication result is fed back in a next cycle for subsequent limb multiplication associated with the primary Wallace tree. The obtaining and summing of partial products and feeding back operations are repeated until all limbs associated with the primary Wallace tree are completed. A residual computation of a partial multiplication result associated with a final limb of the primary Wallace tree is then performed, to obtain a multiplication result using a secondary Wallace tree, where the final limb stores the partial multiplication result of a last iteration.

APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND DEPENDENT OPERATIONS
20180137081 · 2018-05-17 ·

A matrix of execution blocks form a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks process a single block of instructions specifying parallel and dependent instructions.

Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
09886416 · 2018-02-06 · ·

A matrix of execution blocks form a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks process a single block of instructions specifying parallel and dependent instructions.

In-memory computation circuit and method

A memory circuit includes a selection circuit, a column of memory cells, and an adder tree. The selection circuit is configured to receive input data elements, each input data element including a number of bits equal to H, and output a selected set of kth bits of the H bits of the input data elements. Each memory cell of the column of memory cells includes a first storage unit configured to store a first weight data element and a first multiplier configured to generate a first product data element based on the first weight data element and a first kth bit of the selected set of kth bits. The adder tree is configured to generate a summation data element based on each of the first product data elements.

Systems and methods for data placement for in-memory-compute

According to one embodiment, a memory module includes: a memory die including a dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.

Clock signal distribution using photonic fabric

Various embodiments provide for clock signal distribution within a processor, such as a machine learning (ML) processor, using a photonic fabric.

IN-MEMORY COMPUTATION CIRCUIT AND METHOD

A memory circuit includes a column of memory cells configured to receive a set of kth bits of a number H of bits of each input data element of a plurality of input data elements, and each memory cell of the column of memory cells is configured to multiply the kth bit of a corresponding input data element of the plurality of data elements with a first weight data element stored in the memory cell, and to generate a corresponding first product data element. The memory circuit includes an adder tree configured to generate a summation data element based on each of the first product data elements.

Artificial intelligence accelerators
12282844 · 2025-04-22 · ·

An artificial intelligence (AI) accelerator includes memory circuits configured to output weight data and vector data, a multiplication circuit/adder tree performing a multiplying/adding calculation on the weight data and the vector data to generate multiplication/addition result data, a first accumulator synchronized with an odd clock signal to perform an accumulative adding calculation on odd-numbered multiplication/addition result data of the multiplication/addition result data and a first latched data, and a second accumulator synchronized with an even clock signal to perform an accumulative adding calculation on even-numbered multiplication/addition result data of the multiplication/addition result data and a second latched data.