G06F7/5318

Compressed Wallace trees in FMA circuits

An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.
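As a rough illustration of the Wallace-tree idea behind a fused multiply-accumulate, the sketch below reduces the partial products of a*b together with the addend c using 3:2 compressors (carry-save adders), then performs one final carry-propagate addition. The function names, the unsigned operands, and the `width` parameter are assumptions made for illustration. The sketch does not model the fractional-width sharing described in the abstract.

```python
def csa(x, y, z):
    """3:2 compressor: turn three non-negative integers into a
    (sum, carry) pair whose total equals x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def fma_carry_save(a, b, c, width=8):
    """Compute a*b + c for unsigned operands (b fits in `width` bits)."""
    # One shifted copy of `a` per set bit of `b` (the partial products).
    rows = [a << i for i in range(width) if (b >> i) & 1]
    rows.append(c)  # the addend joins the tree as one more row

    # Reduce three rows to two until at most two rows remain.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*rows[i:i + 3]))
        nxt.extend(rows[full:])  # rows that did not fill a group of three
        rows = nxt

    # A single carry-propagate addition finishes the job.
    return sum(rows)
```

Because the addend is folded into the reduction tree, only one carry-propagate addition is needed for the whole multiply-accumulate.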

SYSTEMS AND METHODS FOR DATA PLACEMENT FOR IN-MEMORY-COMPUTE
20250190216 · 2025-06-12

According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.

SPL

Clock signal distribution using photonic fabric
12339490 · 2025-06-24

Various embodiments provide for clock signal distribution within a processor, such as a machine learning (ML) processor, using a photonic fabric.

Electro-photonic network for machine learning
12339492 · 2025-06-24

Described are electro-photonic networks, including a plurality of processing elements connected by bidirectional photonic channels, suited for implementing neural-network models. Weights of the model may be preloaded into memory of the processing elements based on assignments of neural nodes to processing elements implementing them, and routers of the processing elements can be configured to stream activations between the processing elements based on a predetermined flow of activations in the model.

Multiplier with a new Partial Product Generation Method
20250217109 · 2025-07-03

Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include, among other things, decoding circuitry, tripler circuitry, and partial product multiplexing circuitry. The decoding circuitry may decode bits of the multiplier value using a decoding scheme that includes at least a coding that indicates a triple, the tripler circuitry may generate a triple of the multiplicand value and may include circuitry to generate the triple of the multiplicand value that sums at least two different vectors, and the partial product multiplexing circuitry may select the triple of the multiplicand as a partial product when the coding indicates the triple.
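A coding that can indicate a triple arises naturally in radix-8 Booth recoding, where each digit lies in the range -4 to 4. The sketch below assumes that scheme (the abstract does not name it) and forms the triple as the sum of two vectors, x and 2x. All function names and the unsigned 8-bit default are hypothetical.

```python
def booth_radix8_digits(y, width=8):
    """Recode unsigned y (< 2**width) into radix-8 Booth digits in -4..4,
    least significant digit first."""
    digits = []
    prev = 0  # top bit of the previous 3-bit group
    for i in range(width // 3 + 1):
        w = (y >> (3 * i)) & 0b111
        b0, b1, b2 = w & 1, (w >> 1) & 1, (w >> 2) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits


def multiply_radix8(x, y, width=8):
    """Multiply unsigned x by unsigned y using radix-8 partial products."""
    triple = x + (x << 1)  # tripler: sum of two different vectors
    # Partial-product multiplexer: every other multiple is a plain shift.
    select = {0: 0, 1: x, 2: x << 1, 3: triple, 4: x << 2}
    product = 0
    for i, d in enumerate(booth_radix8_digits(y, width)):
        pp = select[abs(d)]
        product += (-pp if d < 0 else pp) << (3 * i)
    return product
```

For example, `booth_radix8_digits(11)` yields `[3, 1, 0]`, so the lowest partial product is the triple of the multiplicand.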

Electro-photonic network for machine learning
12353006 · 2025-07-08

Various embodiments provide for electro-photonic networks, including a plurality of processing elements connected by bidirectional photonic channels, suited for implementing neural-network models. Weights of the model may be preloaded into memory of the processing elements based on assignments of neural nodes to processing elements implementing them, and routers of the processing elements can be configured to stream activations between the processing elements based on a predetermined flow of activations in the model.

Mixed-Radix Multiplier Circuit

Integrated circuit devices, methods, and circuitry for an efficient multiplier are provided. Multiplier circuitry to multiply a multiplicand value with a multiplier value may include input circuitry, mixed-radix partial product generation circuitry, and partial product addition circuitry. The input circuitry may receive the multiplicand value and the multiplier value. The mixed-radix partial product generation circuitry may generate partial products that include a first radix partial product according to a first radix coding and a second radix partial product according to a second radix coding. The partial product addition circuitry may add the partial products to generate a product of the multiplicand value and multiplier value.
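One way to picture mixed-radix partial product generation is Booth recoding with unequal group sizes: 2-bit groups give radix-4 digits and 3-bit groups give radix-8 digits. The sketch below assumes radix-4 for the low bits and radix-8 for the high bits of an unsigned 8-bit multiplier. That split, the function names, and the use of `d * x` in place of a hardware multiple selector are illustrative assumptions, not details taken from the abstract.

```python
def mixed_radix_digits(y, group_sizes=(2, 2, 3, 3)):
    """Booth-recode unsigned y into (digit, bit_position) pairs.

    A group of g bits yields a radix-2**g digit in the range
    -2**(g-1) .. 2**(g-1). The group sizes must sum to more than the
    bit length of y so that the final group's top bit is zero.
    """
    out = []
    pos, prev = 0, 0
    for g in group_sizes:
        w = (y >> pos) & ((1 << g) - 1)
        top = w >> (g - 1)  # this bit carries into the next group
        out.append((w - (top << g) + prev, pos))
        pos, prev = pos + g, top
    return out


def mixed_radix_multiply(x, y, group_sizes=(2, 2, 3, 3)):
    """Add the mixed-radix partial products to form x * y."""
    partial_products = [(d * x) << pos
                        for d, pos in mixed_radix_digits(y, group_sizes)]
    return sum(partial_products)
```

With the default grouping, an 8-bit multiplier needs four partial products, two of each radix.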

METHOD FOR PROCESSING DATA USING ADDER AND ELECTRONIC DEVICE

A method for processing data using an adder, an electronic device and a non-transitory computer-readable storage medium are provided. The adder includes a first preprocessing unit and an encoding unit. The method includes: preprocessing first data to obtain a first preprocessing result corresponding to the first data by the first preprocessing unit in response to a summation instruction for the first data; and outputting a first encoded value to the encoding unit from the first preprocessing unit based on characteristics of the first data; where preprocessing the first data by the first preprocessing unit includes performing a NOT operation on the first data or outputting the first data directly; and the first encoded value is 0 or 1. According to the embodiments, the add-1 operations for all data are processed in a centralized manner by the encoding unit, thereby streamlining the circuit structure and saving circuit resources.
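The NOT-or-pass-through step and the 0/1 encoded value match the usual two's-complement identity -x = NOT(x) + 1. The sketch below assumes that reading: a hypothetical `negate` flag stands in for the "characteristics of the data", each preprocessing step emits a 0 or 1, and all of the +1 corrections are added once at the end. The names and the 16-bit width are illustrative.

```python
def preprocess(value, negate, width=16):
    """Return (preprocessed value, encoded bit).

    Operands to be subtracted are bitwise inverted and flagged with 1;
    all others pass through unchanged with a flag of 0.
    """
    mask = (1 << width) - 1
    if negate:
        return (~value) & mask, 1
    return value & mask, 0


def to_signed(value, width=16):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


def signed_sum(terms, width=16):
    """Sum (value, negate) pairs with the +1 corrections centralized."""
    mask = (1 << width) - 1
    total, ones = 0, 0
    for value, negate in terms:
        pre, enc = preprocess(value, negate, width)
        total += pre
        ones += enc  # collected here instead of a +1 per operand
    return to_signed((total + ones) & mask, width)
```

For example, `signed_sum([(10, False), (3, True), (4, True)])` evaluates 10 - 3 - 4 and returns 3.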

MEMORY SYSTEM AND METHODS FOR ACCELERATING RECURRENT NEURAL NETWORKS

A memory device is provided. The memory device comprises a multiply-and-accumulate (MAC) circuit and a post processing circuit. The MAC circuit comprises vector engine circuits that store a first input vector of a current time step of a recurrent neural network (RNN) and a first hidden vector of a previous time step of the RNN. The vector engine circuits perform MAC operations of the first input vector, the first hidden vector and a weight matrix. The post processing circuit generates a second hidden vector of the current time step of the RNN according to results of the MAC operations.
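The data flow in this abstract corresponds to a basic recurrent step: multiply-accumulate over the current input vector and the previous hidden vector, then post-process the results. The sketch below assumes a single weight matrix applied to the concatenation of the two vectors and uses tanh as the post-processing function. Both choices, and all names, are assumptions, since the abstract specifies neither.

```python
import math


def mac(weights_row, vector):
    """Multiply-and-accumulate one weight row against a vector."""
    acc = 0.0
    for w, v in zip(weights_row, vector):
        acc += w * v
    return acc


def rnn_step(weight, x_t, h_prev, activation=math.tanh):
    """Compute the hidden vector of the current time step.

    weight: one row per hidden unit, each of length len(x_t) + len(h_prev).
    """
    stacked = list(x_t) + list(h_prev)  # input vector, then hidden vector
    mac_results = [mac(row, stacked) for row in weight]
    # Post-processing stage: apply the activation to each MAC result.
    return [activation(r) for r in mac_results]
```

Calling `rnn_step` repeatedly, feeding each returned vector back in as `h_prev`, unrolls the network over time.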

IN-MEMORY COMPUTATION CIRCUIT AND METHOD

A memory circuit includes a plurality of columns, each column including a plurality of storage elements and a plurality of multipliers, wherein each multiplier is coupled to a corresponding storage element, a data register configured to store a plurality of input data elements, and a plurality of multiplexers coupled to the data register. Each multiplexer is configured to output a bit of a plurality of bits of a corresponding input data element of the plurality of input data elements to a corresponding multiplier of the plurality of multipliers of each column, and each multiplier of the plurality of multipliers of each column is configured to output a product data element based on a weight data element stored in the corresponding storage element and the bit of the plurality of bits of the corresponding input data element.
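The bit-at-a-time multiplication described here is commonly paired with a shift-and-accumulate stage to form full products. The sketch below models one column under that assumption. The abstract covers only the multiplexers and multipliers, so the accumulation step, the function names, and the unsigned 4-bit inputs are illustrative.

```python
def select_bit(value, k):
    """Multiplexer: output bit k of an input data element."""
    return (value >> k) & 1


def column_dot(weights, inputs, input_bits=4):
    """Bit-serial dot product of one column.

    weights: stored weight data elements, one per storage element.
    inputs: unsigned input data elements, each below 2**input_bits.
    """
    acc = 0
    for k in range(input_bits):
        # Each multiplier pairs its stored weight with one input bit.
        products = [w * select_bit(x, k) for w, x in zip(weights, inputs)]
        # Assumed stage: weight the column sum by the bit's significance.
        acc += sum(products) << k
    return acc
```

Because each multiplier sees only a single input bit per cycle, it reduces to gating the stored weight on or off.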