Patent classifications
G11C7/16
Charge-sharing compute-in-memory system
Certain aspects provide a circuit for in-memory computation. The circuit generally includes a first memory cell, and a first computation circuit. The first computation circuit may include a first switch having a control input coupled to an output of the first memory cell, a second switch coupled between a node of the first computation circuit and the first switch, a control input of the second switch being coupled to a discharge word-line (DCWL), a capacitive element coupled between the node and a reference potential node, a third switch coupled between the node and a read bit-line (RBL), and a fourth switch coupled between the node and an activation (ACT) line.
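The abstract lists the switches but not the order in which they operate, so the following is only a minimal sketch of the charge-sharing idea under stated assumptions. It assumes each cell's capacitor ends up at the supply voltage only when both the stored bit and the activation bit are 1, and that all cell nodes are then connected to the read bit-line at once. The function names, the AND polarity, and the equal unit capacitance are hypothetical and are not taken from the patent.

```python
def share_charge(voltages, capacitances):
    """Final voltage after ideal charge sharing between capacitors.

    Charge conservation gives V = sum(C_i * V_i) / sum(C_i).
    """
    if len(voltages) != len(capacitances):
        raise ValueError("voltages and capacitances must have equal length")
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)


def column_readout(weights, activations, vdd=1.0, c_unit=1.0):
    """Read bit-line voltage for one column (assumed behaviour).

    Assumption: a cell node holds vdd only if weight AND activation are 1;
    otherwise it is discharged to 0 V before the nodes are shared.
    """
    node_voltages = [vdd if (w and a) else 0.0
                     for w, a in zip(weights, activations)]
    return share_charge(node_voltages, [c_unit] * len(node_voltages))


# Two of four cells have weight = activation = 1, so the shared
# voltage is half of vdd.
print(column_readout([1, 0, 1, 1], [1, 1, 0, 1]))  # 0.5
```

Under these assumptions the bit-line voltage is proportional to the count of cells whose weight and activation are both 1, which is the accumulation step of a binary multiply-accumulate.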
Semiconductor device and healthcare system
Provided is a semiconductor device capable of reducing its area, operating at a high speed, or reducing its power consumption. A circuit 50 is used as a memory circuit with a function of performing an arithmetic operation. One of a circuit 80 and a circuit 90 has a region overlapping with at least part of the other of the circuit 80 and the circuit 90. Accordingly, the circuit 50 can perform the arithmetic operation that is essentially performed in the circuit 60; thus, a burden of the arithmetic operation on the circuit 60 can be reduced. Moreover, the number of times of data transmission and reception between the circuits 50 and 60 can be reduced. Furthermore, the circuit 50 functioning as a memory circuit can have a function of performing an arithmetic operation while the increase in the area of the circuit 50 is suppressed.
Sub-cell, Mac array and Bit-width Reconfigurable Mixed-signal In-memory Computing Module
A mixed-signal in-memory computing sub-cell requires only 9 transistors for 1-bit multiplication. A computing cell is constructed from a plurality of such sub-cells that share a common computing capacitor and a common transistor. Also proposed is a MAC array for performing MAC operations, which includes a plurality of the computing cells, each activating the sub-cells therein in a time-multiplexed manner. Also proposed are a differential version of the MAC array with improved computation error tolerance and an in-memory mixed-signal computing module for digitizing parallel analog outputs of the MAC array and for performing other tasks in the digital domain. An ADC block in the computing module makes full use of capacitors in the MAC array, thus allowing the computing module to have a reduced area and suffer from fewer computation errors. Also proposed is a method of fully taking advantage of data sparsity to lower the ADC block's power consumption.
CIRCUITS AND METHODS FOR IN-MEMORY COMPUTING
In some embodiments, an in-memory-computing SRAM macro based on capacitive-coupling computing (C3) (which is referred to herein as “C3SRAM”) is provided. In some embodiments, a C3SRAM macro can support array-level fully parallel computation, multi-bit outputs, and configurable multi-bit inputs. The macro can include circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations in some embodiments. In some embodiments, the macro utilizes analog-mixed-signal capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations. Without needing to access the stored weights by individual row, the macro can assert all of its rows simultaneously and form an analog voltage at the read bitline node through capacitive voltage division, in some embodiments. With one analog-to-digital converter (ADC) per column, the macro can realize fully parallel vector-matrix multiplication in a single cycle in accordance with some embodiments.
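The capacitive voltage division described above can be illustrated with an idealized model of one column. This is a sketch under assumptions, not the C3SRAM circuit itself: it assumes weights and inputs take values +1 or -1, that each bitcell drives an equal coupling capacitor to the supply voltage when the product is +1 and to ground otherwise, and that an ideal ADC recovers the count. The function names and the optional parasitic capacitance parameter are hypothetical.

```python
def bitline_voltage(weights, inputs, vdd=1.0, c_unit=1.0, c_parasitic=0.0):
    """Ideal read-bitline voltage of one column with all rows asserted.

    Assumption: a bitcell drives its coupling capacitor to vdd when
    weight * input == +1 (an XNOR of the two binary values), else to 0 V.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have equal length")
    driven_high = sum(1 for w, x in zip(weights, inputs) if w * x == 1)
    total_cap = len(weights) * c_unit + c_parasitic
    # Capacitive divider: the fraction of capacitance tied to vdd sets the voltage.
    return vdd * driven_high * c_unit / total_cap


def voltage_to_dot_product(voltage, n_rows, vdd=1.0):
    """Map an ideal bitline voltage back to the signed binary dot product."""
    ones = round(voltage / vdd * n_rows)   # rows whose product was +1
    return 2 * ones - n_rows               # (+1 count) minus (-1 count)


weights = [1, -1, 1, 1]
inputs = [1, 1, -1, 1]
v = bitline_voltage(weights, inputs)
print(v, voltage_to_dot_product(v, len(weights)))
```

With zero parasitic capacitance the recovered value equals the exact dot product of the two ±1 vectors; a nonzero `c_parasitic` would compress the voltage range and would need to be calibrated out before conversion.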
ADIABATIC CIRCUITS FOR COLD SCALABLE ELECTRONICS
A system and method comprise a cryogenic adiabatic circuit in a cryogenic environment and a clock generator at a higher temperature. The circuit's clock lines can be connected across the temperature gradient to the clock generator, where the clock generator runs below the frequency that would yield power dissipation equal to the static dissipation of a functionally equivalent CMOS circuit at room temperature, resulting in lower power for the function than is possible at room temperature, irrespective of the speed of operation.
Mixed-Signal Interface Circuit For Non-Volatile Memory Crossbar Array
Crossbar arrays perform analog vector-matrix multiplication naturally and provide a building block for modern computing systems. Specialized mixed-signal interface circuits are interfaced with the rows and columns of the crossbar arrays. During operation, the mixed-signal interface circuits provide high voltages for write operations and low voltages for read operations. This disclosure presents improved designs for the mixed-signal interface circuits which minimize the number of switches as well as the number of level shifters.
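The reason a crossbar multiplies "naturally" follows from Ohm's law and Kirchhoff's current law: each cell passes a current equal to its conductance times the row voltage, and the currents in a column add on the shared wire. The sketch below models only that ideal read operation. It ignores wire resistance, sneak paths, and the interface circuits the disclosure is about, and the function name is hypothetical.

```python
def crossbar_column_currents(row_voltages, conductances):
    """Ideal column currents of a resistive crossbar during a read.

    row_voltages: list of row voltages in volts, one per row.
    conductances: list of rows, each a list of cell conductances in siemens.
    Returns one current per column: I_j = sum_i V_i * G_ij.
    """
    n_rows = len(conductances)
    if len(row_voltages) != n_rows:
        raise ValueError("need one voltage per crossbar row")
    n_cols = len(conductances[0])
    if any(len(row) != n_cols for row in conductances):
        raise ValueError("all crossbar rows must have the same width")
    return [
        sum(row_voltages[i] * conductances[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Two rows, two columns: low read voltages applied to microsiemens-range cells.
currents = crossbar_column_currents(
    [0.1, 0.2],
    [[1e-6, 2e-6],
     [3e-6, 4e-6]],
)
print(currents)
```

The stored conductance matrix plays the role of the weights, the row voltages are the input vector, and the column currents are the product, all obtained in one read step.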
APPARATUS AND METHOD WITH NEURAL NETWORK OPERATIONS
A neural network apparatus includes: a first processing circuit and a second processing circuit each configured to perform a vector-by-matrix multiplication (VMM) operation on a weight and an input activation; a first register configured to store an output of the first processing circuit; an adder configured to add an output of the first register and an output of the second processing circuit; a second register configured to store an output of the adder; and an input circuit configured to input a same input activation to the first processing circuit and the second processing circuit and control the first processing circuit and the second processing circuit.
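The datapath in this abstract can be sketched in software as two multiplier units feeding a register, an adder, and a second register. The abstract does not give timing or say how the weights are divided between the two processing circuits, so the model below assumes everything settles within one call to `step` and that each circuit holds its own weight matrix. The class, method, and attribute names are hypothetical.

```python
def vmm(activation, weights):
    """Vector-by-matrix product: activation (length R) times weights (R x C)."""
    if len(activation) != len(weights):
        raise ValueError("activation length must match number of weight rows")
    return [sum(a * w for a, w in zip(activation, column))
            for column in zip(*weights)]


class TwoCircuitDatapath:
    """Two VMM circuits, a first register, an adder, and a second register."""

    def __init__(self, weights_first, weights_second):
        self.weights_first = weights_first
        self.weights_second = weights_second
        self.first_register = None
        self.second_register = None

    def step(self, activation):
        # The input circuit feeds the same activation to both processing circuits.
        out_first = vmm(activation, self.weights_first)
        out_second = vmm(activation, self.weights_second)
        # First register stores the output of the first processing circuit.
        self.first_register = out_first
        # Adder combines the first register with the second circuit's output;
        # the second register stores the sum.
        self.second_register = [x + y
                                for x, y in zip(self.first_register, out_second)]
        return self.second_register


datapath = TwoCircuitDatapath([[1, 0], [0, 1]], [[2, 0], [0, 2]])
print(datapath.step([1, 2]))  # [3, 6]
```

In this simplified model the result equals a single multiplication by the element-wise sum of the two weight matrices; a hardware implementation would likely pipeline the registers across clock cycles, which is not modeled here.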
Asynchronous analog accelerator for fully connected artificial neural networks
Methods of performing mixed-signal/analog multiply-accumulate (MAC) operations used for matrix multiplication in fully connected artificial neural networks in integrated circuits (IC) are described in this disclosure having traits such as: (1) inherently fast and efficient for approximate computing due to current-mode signal processing where summation is performed by simply coupling wires, (2) free from noisy and power-hungry clocks with asynchronous fully-connected operations, (3) saving on silicon area and power consumption by requiring neither data converters nor memory for intermediate activation signals, (4) reduced dynamic power consumption due to Compute-In-Memory operations, (5) avoiding overflow conditions along key signal paths and lowering power consumption by training MACs in neural networks in such a manner that the population and/or combinations of multi-quadrant activation signals and multi-quadrant weight signals follow a programmable statistical distribution profile, (6) programmable current consumption versus degree of precision/approximate computing, (7) suitable for ‘always-on’ operations and capable of ‘self power-off’, (8) inherently simple arrangement for non-linear activation operations such as the Rectified Linear Unit (ReLU), and (9) manufacturable on a mainstream, low-cost, lagging-edge standard digital CMOS process requiring neither resistors nor capacitors.