G06F7/501

ERROR CALIBRATION APPARATUS AND METHOD

An error calibration apparatus and method are provided. The method is adapted for calibrating a machine learning (ML) accelerator. The ML accelerator achieves computation by using an analog circuit. An error between an output value of one or more computing layers of a neural network and a corresponding corrected value is determined. The computation of the computing layers is achieved by the analog circuit. A calibration node is generated according to the error. The calibration node is located at the layer following the computing layers. The calibration node is used to minimize the error. The calibration node is achieved by a digital circuit. Accordingly, error and distortion of the analog circuit can be reduced.
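The idea can be illustrated with a minimal sketch (not the patented implementation): model the analog computing layer as having gain and offset drift, then fit a digital per-channel affine calibration node by least squares so that it minimizes the squared error between the analog output and the ideal corrected value. All names and the affine error model are illustrative assumptions.

```python
# Hypothetical sketch of a digital calibration node for an analog layer.
# Assumes the analog error is affine (gain + offset drift); the node is
# fit by ordinary least squares to minimize the output error.

def fit_calibration(analog_out, ideal):
    """Least-squares fit of ideal ~= a * analog_out + b for one channel."""
    n = len(analog_out)
    sx = sum(analog_out)
    sy = sum(ideal)
    sxx = sum(x * x for x in analog_out)
    sxy = sum(x * y for x, y in zip(analog_out, ideal))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def calibrate(x, a, b):
    """The digital calibration node applied at the next layer."""
    return a * x + b

# Simulated analog layer outputs with 0.9x gain error and +0.2 offset.
ideal = [0.0, 1.0, 2.0, 3.0, 4.0]
analog = [0.9 * v + 0.2 for v in ideal]
a, b = fit_calibration(analog, ideal)
corrected = [calibrate(x, a, b) for x in analog]
```

Because the assumed distortion is itself affine, the fitted node removes it exactly; a real calibration node would only approximate more complex analog nonlinearity.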

FOLDING COLUMN ADDER ARCHITECTURE FOR DIGITAL COMPUTE IN MEMORY
20230031841 · 2023-02-02 ·

Certain aspects provide an apparatus for performing machine learning tasks, and in particular, to computation-in-memory architectures. One aspect provides a circuit for in-memory computation. The circuit generally includes: a plurality of memory cells on each of multiple columns of a memory, the plurality of memory cells being configured to store multiple bits representing weights of a neural network, wherein the plurality of memory cells on each of the multiple columns are on different word-lines of the memory; multiple addition circuits, each coupled to a respective one of the multiple columns; a first adder circuit coupled to outputs of at least two of the multiple addition circuits; and an accumulator coupled to an output of the first adder circuit.
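A behavioral sketch of the claimed datapath (illustrative, not the circuit itself): each column's addition circuit sums the bitwise products of its stored weight bits with the activations on the word-lines, a first adder "folds" the sums of two columns together, and an accumulator collects the result.

```python
# Illustrative model of the folding column adder datapath. Column
# contents and activations are made-up example values.

def column_sum(weight_bits, activations):
    """Per-column addition circuit: sum of bitwise AND products."""
    return sum(w & a for w, a in zip(weight_bits, activations))

# Two columns of the memory; each cell sits on a different word-line.
col0 = [1, 0, 1, 1]
col1 = [0, 1, 1, 0]
acts = [1, 1, 0, 1]   # one activation bit per word-line

accumulator = 0
folded = column_sum(col0, acts) + column_sum(col1, acts)  # first adder
accumulator += folded
```

In hardware the accumulator would also handle bit-significance shifting across bit-serial cycles; that detail is omitted here.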

CIRCUITS AND METHODS FOR IN-MEMORY COMPUTING
20230089348 · 2023-03-23 ·

In some embodiments, an in-memory-computing SRAM macro based on capacitive-coupling computing (C3) (which is referred to herein as “C3SRAM”) is provided. In some embodiments, a C3SRAM macro can support array-level fully parallel computation, multi-bit outputs, and configurable multi-bit inputs. The macro can include circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations in some embodiments. In some embodiments, the macro utilizes analog-mixed-signal capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations. Without needing to access the stored weights by individual row, the macro can assert all of its rows simultaneously and form an analog voltage at the read bitline node through capacitive voltage division, in some embodiments. With one analog-to-digital converter (ADC) per column, the macro can realize fully parallel vector-matrix multiplication in a single cycle in accordance with some embodiments.
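A hedged numeric sketch of the capacitive voltage division idea: if every asserted bitcell couples either VDD or 0 onto the read bitline through an equal capacitor, the bitline settles at VDD scaled by the fraction of rows whose multiply result is 1, and the per-column ADC recovers that count. The binary-MAC mapping assumes the usual +1/-1 XNOR encoding; all values are illustrative.

```python
# Idealized model of one C3SRAM column: all rows asserted at once,
# bitline voltage formed by capacitive voltage division, one ADC per
# column digitizing the result. Weights/activations are example data.

VDD = 1.0

def bitline_voltage(products):
    """products: per-row XNOR of binary weight and activation (0/1)."""
    return VDD * sum(products) / len(products)

def adc(v, rows, vdd=VDD):
    """Ideal ADC: recover the row-wise popcount from the voltage."""
    return round(v * rows / vdd)

weights = [1, 0, 1, 1, 0, 1, 0, 1]
acts    = [1, 1, 1, 0, 0, 1, 1, 1]
xnor    = [1 - (w ^ a) for w, a in zip(weights, acts)]  # binary multiply
v_bl    = bitline_voltage(xnor)
popcount = adc(v_bl, len(xnor))
mac = 2 * popcount - len(xnor)   # map popcount to a +1/-1 dot product
```

A real macro would add ADC quantization limits and capacitor mismatch; this model treats both as ideal.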

APPARATUS AND METHOD WITH NEURAL NETWORK OPERATIONS

A neural network apparatus includes: a first processing circuit and a second processing circuit each configured to perform a vector-by-matrix multiplication (VMM) operation on a weight and an input activation; a first register configured to store an output of the first processing circuit; an adder configured to add an output of the first register and an output of the second processing circuit; a second register configured to store an output of the adder; and an input circuit configured to input a same input activation to the first processing circuit and the second processing circuit and control the first processing circuit and the second processing circuit.
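The dataflow described can be sketched functionally (weights and activations are made-up examples): the same input activation is broadcast to both processing circuits, the first circuit's VMM result is latched in a first register, the adder combines it with the second circuit's result, and the sum lands in a second register.

```python
# Behavioral sketch of the two-circuit VMM pipeline; not the apparatus
# itself. Each "processing circuit" is modeled as a plain VMM.

def vmm(matrix, vec):
    """Vector-by-matrix multiplication performed by a processing circuit."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

W1 = [[1, 2], [3, 4]]   # weights held by the first processing circuit
W2 = [[5, 6], [7, 8]]   # weights held by the second processing circuit
activation = [1, 1]     # same input activation broadcast to both

reg1 = vmm(W1, activation)                   # first register
out2 = vmm(W2, activation)
reg2 = [a + b for a, b in zip(reg1, out2)]   # adder -> second register
```

Splitting one large weight matrix across two circuits this way lets a wide VMM be computed as two narrower ones whose partial sums are combined by the adder.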

COMPACT, HIGH PERFORMANCE FULL ADDERS
20220342634 · 2022-10-27 ·

Examples of compact, high performance full adder circuits and methods of forming and operating the same are provided. In an example, a full adder comprises a first stage, a second stage and a third stage. The first stage has a first output at which a first reused signal is generated and a second output at which a second reused signal is generated. The second stage has a first reused signal input to which the first reused signal is applied, a second reused signal input to which the second reused signal is applied, and a sum output at which a sum signal is generated. The third stage has a third reused signal input to which the first reused signal is applied, a fourth reused signal input to which the second reused signal is applied, and a carry-out output at which a carry-out signal is generated. In some examples, the first stage includes a transistor stack and an inverter that share a transistor.
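A gate-level sketch of the reuse idea (a common decomposition, not necessarily the patented circuit): stage one generates the propagate signal p = a XOR b and its complement as the two reused signals; stage two forms the sum from p and the carry-in; stage three forms the carry-out as a p-controlled multiplexer, reusing both signals.

```python
# Three-stage full adder sketch reusing p = a ^ b and its complement.
# Illustrative decomposition; the patented transistor sharing is a
# circuit-level detail not captured here.

def full_adder(a, b, cin):
    p = a ^ b            # stage 1: first reused signal (propagate)
    pn = 1 - p           # stage 1: second reused signal (complement)
    s = p ^ cin                      # stage 2: sum
    cout = (p & cin) | (pn & a)      # stage 3: carry-out (mux on p)
    return s, cout

# Exhaustive check against the full-adder truth table.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert s + 2 * cout == a + b + cin
```

The carry-out mux reads: if a and b differ (p = 1), the carry equals cin; if they are equal (p = 0), the carry equals a. Sharing p between the sum and carry paths is what makes the circuit compact.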

METHOD AND APPARATUS FOR GENERATING A DECODING POSITION CONTROL SIGNAL FOR DECODING USING POLAR CODES

Disclosed are a method and apparatus for generating a decoding position control signal for decoding using polar codes. The method and apparatus for generating a decoding position control signal for decoding using polar codes according to an embodiment of the present disclosure include generating a decoding tree obtained by forming a plurality of nodes in a hierarchical structure for a polar-encoded codeword, decoding the codeword using a successive cancellation (SC) decoding technique, and generating a control signal through a preset operation relationship based on the position of a bit returned during re-decoding of the decoded codeword.
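One plausible form of such a "preset operation relationship" can be sketched as follows: in an SC decoding tree of depth log2(N), the path from the root to the leaf holding bit index i follows the binary digits of i (0 = left child, 1 = right child), so a returned bit position maps directly to a per-level traversal control signal. This mapping is an illustrative assumption, not the disclosed relationship.

```python
# Hypothetical sketch: derive per-level left/right control signals for
# an SC decoding tree from the position of a returned bit.

def position_to_controls(i, n_levels):
    """Control signal per tree level: MSB of i steers the root level."""
    return [(i >> (n_levels - 1 - lvl)) & 1 for lvl in range(n_levels)]

# N = 8 codeword -> depth-3 decoding tree; bit 5 = 0b101.
controls = position_to_controls(5, 3)   # right, left, right
```

With such a signal, re-decoding can steer the SC traversal straight back to the node containing the returned bit instead of restarting from the first leaf.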