G06F2207/4814

Decoupled Execution Of Workload For Crossbar Arrays

A computing system architecture is presented for decoupling execution of workload by crossbar arrays and similar memory modules. The computing system includes: a data bus; a core controller connected to the data bus; and a plurality of local tiles connected to the data bus. Each local tile in the plurality of local tiles includes a local controller and at least one memory module, where the memory module performs computation using the data stored in memory without reading the data out of the memory.

Variable accuracy computing system
11693626 · 2023-07-04 · ·

The present disclosure relates to a computing system. The computing system comprises a data input configured to receive an input data signal, a computation unit having an input coupled with the data input, the computation unit being operative to apply a weight to a signal received at its input to generate a weighted output signal, and a controller. The controller is configured to monitor a parameter of the input signal and/or a parameter of the output signal and to issue a control signal to the computation unit to control a level of accuracy of the weighted output signal based at least in part on the monitored parameter.

Low area multiply and accumulate unit

An improved electronic mixed mode multiplier and accumulate circuit for artificial intelligence and computing system applications that perform vector-vector, vector-matrix and other multiply-accumulate computations. The circuit is provided is a high resolution, high linearity, low area, low power multiply—accumulate (MAC) unit to interface with a memory device for storing computation output results. The MAC unit uses a less number of current carrying elements resulting in much lower integrated circuit area, and provides a tight matching between the current elements thus preserving inherent linearity requirements due to current mode operation. Further the MAC performs current scaling using switches and current division where the current switches occupy minimum size transistors requiring a small area to implement that renders it compatible with MRAM such as a magnetic tunnel junction device. The MAC is hierarchically extended for increased number of bits to provide a delay implementation using orthogonal vector and current addition.

SEMICONDUCTOR INTEGRATED CIRCUIT AND ARITHMETIC LOGIC OPERATION SYSTEM
20220405057 · 2022-12-22 · ·

According to one embodiment, in a semiconductor integrated circuit, the plurality of storage devices are arranged in a form of a plurality of rows. Each of the storage devices are configured to store a bit position value of a weight of multiple bits. The plurality of multiplication circuits are arranged in a form of a plurality of rows and are configured to multiply a plurality of input voltages by the weight of multiple bits to generate a plurality of multiplication results. The one or more capacitive devices are configured to accumulate charges corresponding to the plurality of multiplication results. The adder circuit are configured to generate an output voltage corresponding to the total value of the charges accumulated in the one or more capacitive devices. The plurality of input voltages have different amplitudes. Each of the input voltages is associated with a corresponding bit position of the weight.

Nonvolatile memory device performing a multiplication and accumulation operation

A nonvolatile memory device includes a memory cell array and an computation output circuit. The memory cell array includes a plurality of nonvolatile memory elements configured to store a plurality of weights respectively and a plurality of bit lines coupled to the plurality of nonvolatile memory elements according to a plurality of input signals. The computation output circuit is configured to generate a computation signal from voltages induced at the plurality of bit lines according to the plurality of input signals.

Bias Unit Element with Binary Weighted Charge Transfer Capacitors
20220385293 · 2022-12-01 · ·

A Bias Unit Element (UE) has a digital input and sign input, and comprises a positive Bias UE and a negative Bias UE, each comprising groups of NAND gates generating an output and a complementary output, each of which are coupled to differential charge transfer lines through binary weighted charge transfer capacitors to a differential charge transfer bus comprising a positive charge transfer line and a negative charge transfer line. The sign input enables the positive Bias UE when the sign bit is positive and enables the negative Bias UE when the sign bit is negative.

Chopper Stabilized Analog Multiplier Accumulator with Binary Weighted Charge Transfer Capacitors
20220382516 · 2022-12-01 · ·

An architecture for a chopper stabilized multiplier-accumulator (MAC) uses a chop clock and common Unit Element (UE), the MAC formed as a plurality of MAC UEs receiving X and W values and a sign bit exclusive ORed with the chop clock, a plurality of Bias UEs receiving E value and a sign bit exclusive ORed with the chop clock, and a plurality of Analog to Digital Conversion (ADC) UEs which collectively perform a scalable MAC operation and generate a binary result. Each MAC UE, BIAS UE and ADC UE comprises groups of NAND gates with complementary outputs arranged in NAND-groups, each NAND gate coupled to a differential charge transfer bus through a binary weighted charge transfer capacitor. The analog charge transfer bus is coupled to groups of ADC UEs with an ADC controller which enables and disables the ADC UEs using successive approximation to determine the accumulated multiplication result.

Analog Multiplier Accumulator with Unit Element Gain Balancing
20220382517 · 2022-12-01 · ·

A Gain Balanced Analog Multiply-Accumulator (AMAC) has an inference memory which outputs subsets of inference data comprising X input values and one or more associated W coefficient values. The Gain Balanced AMAC has a number of Analog Multiplier-Accumulator Unit Elements (AMAC UE) in equal number to the number of X input values in each subset of inference data. In each of a series of multiply-accumulate cycles, the X input values and one or more W coefficient values from the inference memory are applied to each AMAC UE to generate a charge corresponding to the multiplication of X input value and W coefficient value of each AMAC UE which is transferred to a shared analog charge bus. The inference memory applies the X input value and W coefficient values of each subset to a different AMAC UE on subsequent cycles to balance the gain of the AMAC such that gain differences from one AMAC UE to another are not cumulative.

Neural network computation method using adaptive data representation

A method for neural network computation using adaptive data representation, adapted for a processor to perform multiply-and-accumulate operations on a memory having a crossbar architecture, is provided. The memory comprises multiple input and output lines crossing each other, multiple cells respectively disposed at intersections of the input and output lines, and multiple sense amplifiers respectively connected to the output lines. In the method, an input cycle of kth bits respectively in an input data is adaptively divided into multiple sub-cycles, wherein a number of the divided sub-cycles is determined according to a value of k. The kth bits of the input data are inputted to the input lines with the sub-cycles and computation results of the output lines are sensed by the sense amplifiers. The computation results sensed in each sub-cycle are combined to obtain the output data corresponding to the kth bits of the input data.

SRAM-BASED IN-MEMORY COMPUTING MACRO USING ANALOG COMPUTATION SCHEME
20220366968 · 2022-11-17 ·

Technology for generating an SRAM-based in-memory computing macro includes replacing a SRAM cell cluster defined by a generic SRAM macro with a single-bit multi-bank cluster, the single-bit multi-bank cluster including a plurality of CiM SRAM cells and a plurality of C-2C capacitor ladder cells, arranging a plurality of single-bit multi-bank clusters to form a multi-bit multi-bank cluster, and arranging a plurality of multi-bit multi-bank clusters into a multi-dimensional MAC computational unit within a region of the generic SRAM macro, where an output of at least two of the multi-bit multi-bank clusters are electrically coupled to form an output analog activation line, and where a plurality of bit lines and a plurality of word lines remain at the same grid locations as provided in the generic SRAM macro. Embodiments include arranging a plurality of multi-dimensional MAC computational units into an in-memory MAC computing array.