G06F7/5443

Fused Multiply-Add operator for mixed precision floating-point numbers with correct rounding
11550544 · 2023-01-10 · ·

A fused multiply-add hardware operator comprising a multiplier receiving two multiplicands as floating-point numbers encoded in a first precision format; an alignment circuit associated with the multiplier configured to convert the result of the multiplication into a first fixed-point number; and an adder configured to add the first fixed-point number and an addition operand. The addition operand is a floating-point number encoded in a second precision format, and the operator comprises an alignment circuit associated with the addition operand, configured to convert the addition operand into a second fixed-point number of reduced dynamic range relative to the dynamic range of the addition operand, having a number of bits equal to the number of bits of the first fixed-point number, extended on both sides by at least the size of the mantissa of the addition operand; the adder configured to add the first and second fixed-point numbers without loss.

Artificial neural networks
11551075 · 2023-01-10 · ·

The present disclosure relates to a neuron for an artificial neural network. The neuron includes: a first dot product engine operative to: receive a first set of weights; receive a set of inputs; and calculate the dot product of the set of inputs and the first set of weights to generate a first dot product engine output. The neuron further includes a second dot product engine operative to: receive a second set of weights; receive an input based on the first dot product engine output; and generate a second dot product engine output based on the product of the first dot product engine output and a weight of the second set of weights. The neuron further includes an activation function module arranged to generate a neuron output based on the second dot product engine output. The first dot product engine and the second dot product engine are structurally or functionally different.

Compute optimizations for neural networks using ternary weight

One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a ternary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the ternary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.

Bipolar all-memristor circuit for in-memory computing
11694070 · 2023-07-04 · ·

A circuit for performing energy-efficient and high-throughput multiply-accumulate (MAC) arithmetic dot-product operations and convolution computations includes a two dimensional crossbar array comprising a plurality of row inputs and at least one column having a plurality of column circuits, wherein each column circuit is coupled to a respective row input. Each respective column circuit includes an excitatory memristor neuron circuit having an input coupled to a respective row input, a first synapse circuit coupled to an output of the excitatory memristor neuron circuit, the first synapse circuit having a first output, an inhibitory memristor neuron circuit having an input coupled to the respective row input, and a second synapse circuit coupled to an output of the inhibitory memristor neuron circuit, the second synapse circuit having a second output. An output memristor neuron circuit is coupled to the first output and second output of each column circuit and has an output.

Variable accuracy computing system
11693626 · 2023-07-04 · ·

The present disclosure relates to a computing system. The computing system comprises a data input configured to receive an input data signal, a computation unit having an input coupled with the data input, the computation unit being operative to apply a weight to a signal received at its input to generate a weighted output signal, and a controller. The controller is configured to monitor a parameter of the input signal and/or a parameter of the output signal and to issue a control signal to the computation unit to control a level of accuracy of the weighted output signal based at least in part on the monitored parameter.

Methods for performing fused-multiply-add operations on serially allocated data within a processing-in-memory capable memory device, and related memory devices and systems

Methods, apparatuses, and systems for in- or near-memory processing are described. Strings of bits (e.g., vectors) may be fetched and processed in logic of a memory device without involving a separate processing unit. Operations (e.g., arithmetic operations) may be performed on numbers stored in a bit-serial way during a single sequence of clock cycles. Arithmetic may thus be performed in a single pass as numbers are bits of two or more strings of bits are fetched and without intermediate storage of the numbers. Vectors may be fetched (e.g., identified, transmitted, received) from one or more bit lines. Registers of the memory array may be used to write (e.g., store or temporarily store) results or ancillary bits (e.g., carry bits or carry flags) that facilitate arithmetic operations. Circuitry near, adjacent, or under the memory array may employ XOR or AND (or other) logic to fetch, organize, or operate on the data.

Logarithmic addition-accumulator circuitry, processing pipeline including same, and methods of operation

An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having the logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format wherein the accumulator may be a floating point type.

Key-value memory network for predicting time-series metrics of target entities

A system implements a key value memory network including a key matrix with key vectors learned from training static feature data and time-series feature data, a value matrix with value vectors representing time-series trends, and an input layer to receive, for a target entity, input data comprising a concatenation of static feature data of the target entity, time-specific feature data, and time-series feature data for the target entity. The key value memory network also includes an entity-embedding layer to generate an input vector from the input data, a key-addressing layer to generate a weight vector indicating similarities between the key vectors and the input vector, a value-reading layer to compute a context vector from the weight and value vectors, and an output layer to generate predicted time-series data for a target metric of the target entity by applying a continuous activation function to the context vector and the input vector.

Single-stage hardware sorting blocks and associated multiway merge sorting networks
11693623 · 2023-07-04 ·

A system and methods for designing single-stage hardware sorting blocks, and further using the single-stage hardware sorting blocks to reduce the number of stages in multistage sorting processes, or to define multiway merge sorting networks.

SRAM-based cell for in-memory computing and hybrid computations/storage memory architecture

An in-memory computing device includes in some examples a two-dimensional array of memory cells arranged in rows and columns, each memory cell made of a nine-transistor current-based SRAM. Each memory cell includes a six-transistor SRAM cell and a current source coupled by a switching transistor, which is controlled by input signals on an input line, to an output line associates with the column of memory cells the memory cell is in. The current source includes a switching transistor controlled by the state of the six-transistor SRAM cell, and a current regulating transistor adapted to generate a current at a level determined by a control signal applied at the gate. The control signal can be set such that the total current in each output line is increased by a factor of 2 in each successive column of the memory cells.