G06F7/544

IN-MEMORY COMPUTATION SYSTEM WITH DRIFT COMPENSATION CIRCUIT

A circuit includes a memory array with memory cells arranged in a matrix of rows and columns, where each row includes a word line connected to the memory cells of the row, and each column includes a bit line connected to the memory cells of the column. Computational weights for an in-memory compute operation (IMCO) are stored in the memory cells. A word line control circuit simultaneously actuates word lines in response to input signals providing coefficient data for the IMCO by applying word line signal pulses. A column processing circuit connected to the bit lines processes analog signals developed on the bit lines in response to the simultaneous actuation of the word lines to generate multiply and accumulate output signals for the IMCO. Pulse widths of the signal pulses are modulated to compensate for cell drift. The IMCO further handles positive/negative calculation for the coefficient data and computational weights.

COMPLEX MULTIPLICATION CIRCUIT

A first multiplex circuit generates a first multiplex signal obtained by time-divisionally multiplexing a first real part and a first imaginary part of a first complex number. A second multiplex circuit generates a second multiplex signal obtained by time-divisionally multiplexing a second real part and a second imaginary part of a second complex number. A multiply-subtract operation circuit performs a multiply-subtract operation of the first and second multiplex signals. A third multiplex circuit generates a third multiplex signal obtained by time-divisionally multiplexing the first and second real parts. A fourth multiplex circuit generates a fourth multiplex signal obtained by time-divisionally multiplexing the first and second imaginary parts. A multiply-accumulate operation circuit performs a multiply-accumulate operation of the third and fourth multiplex signals. A fifth multiplex circuit generates a fifth multiplex signal obtained by time-divisionally multiplexing output values of the multiply-subtract operation circuit and the multiply-accumulate operation circuit.

MEMORY ARRAY WITH PROGRAMMABLE NUMBER OF FILTERS

Aspects of the present disclosure are directed to devices and methods for performing MAC operations using a memory array as a compute-in-memory (CIM) device that can enable higher computational throughput, higher performance and lower energy consumption compared to computation using a processor outside of a memory array. In some embodiments, an activation architecture is provided using a bit cell array arranged in rows and columns to store charges that represent a weight value in a weight matrix. A read word line (RWL) may be repurposed to provide the input activation value to bit cells within a row of bit cells, while a read-bit line (RBL) is configured to receive multiplication products from bit cells arranged in a column. Some embodiments provide multiple sub-arrays or tiles of bit cell arrays.

COMPUTE-IN-MEMORY SYSTEMS AND METHODS WITH CONFIGURABLE INPUT AND SUMMING UNITS
20230022516 · 2023-01-26 ·

A device includes a multiplication unit and a configurable summing unit. The multiplication unit is configured to receive data and weights for an Nth layer, where N is a positive integer. The multiplication unit is configured to multiply the data by the weights to provide multiplication results. The configurable summing unit is configured by Nth layer values to receive an Nth layer number of inputs and perform an Nth layer number of additions, and to sum the multiplication results and provide a configurable summing unit output.

High Performance Systems And Methods For Modular Multiplication

A circuit system for performing modular reduction of a modular multiplication includes multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication. The multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products. Adder circuits add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.

NEURAL NETWORK COMPUTING DEVICE AND COMPUTING METHOD THEREOF
20230027768 · 2023-01-26 ·

A computing method for performing a matrix multiplying-and-accumulating computation by a flash memory array which includes word lines, bit lines and flash memory cells. The computing method includes the following steps: respectively storing a weight value in each of the flash memory cells, receiving a plurality of input voltages via the word lines, performing an computation on one of the input voltages and the weight value by each of the flash memory cells to obtain an output current, outputting the output currents of the flash memory cells via the bit lines, and accumulating the output currents of the flash memory cells connected to the same bit line of the bit lines to obtain a total output current. Each of the flash memory cells is an analog device, and each of the input voltages, each of the output currents and each of the weight values are analog values.

PROCESSING-IN-MEMORY(PIM) DEVICE
20230025899 · 2023-01-26 · ·

A PIM device includes a memory/arithmetic region including a plurality of memory banks and a plurality of MAC operators, the plurality of MAC operators including a first MAC operator, a peripheral region including a data input/output circuit, and a global data input/output (GIO) line capable of providing a data transmission path between the peripheral region and the memory/arithmetic region. The first MAC operator is configured to perform an EWM operation by performing a multiplication operation on first input data and second input data that are transmitted from first and second memory banks of the plurality of memory banks, respectively, to generate multiplication result data and transmitting the multiplication result data to a third memory bank. While the EWM operation is being performed, data transmission through the GIO line between the peripheral region and the memory/arithmetic region is blocked.

ACCELERATED CRYPTOGRAPHIC-RELATED PROCESSING WITH FRACTIONAL SCALING
20230027423 · 2023-01-26 ·

Cryptographic-related processing is performed using an n-bit accelerator. The processing includes providing a binary operand to a multiply-and-accumulate unit of the n-bit accelerator. The multiply-and-accumulate unit performs an operation using the binary operand and a predetermined fractional constant F to obtain an operation result, and rounds the operation result by discarding x least-significant bits of the operation result to obtain a fractionally-scaled result, where x is a configurable number of bits to discard from the operation result, and the fractionally-scaled result facilitates performing the cryptographic-related processing.

DATA PROCESSING METHOD AND RELATED DEVICE
20230229898 · 2023-07-20 · ·

A data processing method includes: obtaining to-be-processed data and a target neural network model, where the target neural network model includes a first transformer layer, the first transformer layer includes a first residual branch and a second residual branch, the first residual branch includes a first attention head, and the second residual branch includes a target feed-forward network (FFN) layer; and performing target task related processing on the to-be-processed data based on the target neural network model, to obtain a data processing result, where the target neural network model is for performing a target operation on an output of the first attention head and a first weight value to obtain an output of the first residual branch, and/or the target neural network model is for performing a target operation on an output of the target FFN and a second weight value to obtain an output of the second residual branch.

MULTIPLICATION BY A RATIONAL IN HARDWARE WITH SELECTABLE ROUNDING MODE
20230229397 · 2023-07-20 ·

A fixed logic circuit for performing multiplication of an input x by a constant rational p/q so as to calculate an output y according to a directed rounding or round-to-nearest rounding mode. Fixed logic hardware is derived comprising an addition array configured to operate on canonical signed digit (CSD) forms of binary values (a CSD array) so as to form an approximation of a multiplication of an input x [m−1:0] by a rational p/q. A truncated summation array of a finite sequence of most significant bits of an infinite CSD expansion of the rational p/q operating on the bits of the input x satisfies

[00001] Δ high - Δ low < 1 q .

Registers define a plurality of corrective constants for a respective plurality of rounding modes, and selection logic selects the respective corrective constant for that rounding mode in dependence on a rounding mode in which the truncated summation array is to operate.