METHOD AND DEVICE FOR PARALLEL ANALOG IN-MEMORY COMPUTING

Abstract

A method for parallel analog in-memory computing is provided. The method includes the following steps: inputting an analog current signal; replicating the analog current signal to form a corresponding replicated current signal, and performing weighted processing of all the replicated current signals to obtain a corresponding set of modulated current signals; and performing a weighted accumulation operation on the set of modulated current signals according to Kirchhoff's current law to obtain an output current signal. In the present disclosure, signals are input, processed and output entirely in the current domain, and current signals are replicated and output precisely, thereby maintaining output current precision even when line resistance is high.

Claims

1. A method for parallel analog in-memory computing, comprising the following steps: inputting an analog current signal; replicating the analog current signal to form a corresponding replicated current signal, and performing weighted processing of all replicated current signals to obtain a corresponding set of modulated current signals; and performing a weighted accumulation operation on the set of modulated current signals according to Kirchhoff's current law to obtain an output current signal.

2. The method for parallel analog in-memory computing according to claim 1, wherein the analog current signal is replicated through a current replication circuit to form the corresponding replicated current signal.

3. The method for parallel analog in-memory computing according to claim 2, wherein the weighted processing of all replicated current signals is detailed as follows: performing weight assignment through a switch module, wherein the switch module comprises a switching element; wherein when the switching element is in a low-resistance state, an output modulated current signal equals an input current signal, representing that the input current signal is assigned with a weight of 1; and wherein when the switching element is in a high-resistance state, the output modulated current signal is far less than the input current signal, representing that the input current signal is assigned with a weight of 0.

4. A device using the method for parallel analog in-memory computing according to claim 1, comprising: single-bit units, configured for replicating and weighted processing of analog current signals to obtain a set of modulated current signals; and weighted modules, configured for performing weighted accumulated operation of the set of modulated current signals according to Kirchhoff's current law to obtain an output current signal.

5. The device according to claim 4, wherein each of the single-bit units comprises a replication module and a switch module; the replication module is configured for replicating the analog current signal and outputting the replicated current signal of a fixed size; and the switch module is configured for performing the weighted processing of the replicated current signal.

6. The device according to claim 5, wherein the replication module is a current mirror circuit that replicates the input current at a ratio of 1:1 or any other fixed ratio.

7. The device according to claim 5, wherein the switch module comprises a memristor or a non-volatile memory, and the non-volatile memory is one or more of MRAM, PCM and FLASH.

8. The device according to claim 4, wherein each of the weighted modules performs weighted summation of the output modulated current signals from n switch modules using 2^i as the respective weights, where n is an integer and i is an integer ranging from 0 to n-1, to obtain an output current signal with n-bit precision.

9. The device according to claim 8, wherein an electronic element of each of the weighted modules is one or more of the memristor, the MRAM, the PCM and the FLASH.

10. The device according to claim 5, wherein replication modules and switch modules included in the single-bit units form a crossbar array.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] FIG. 1 is a flowchart of an example of the present disclosure.

[0031] FIG. 2 is a structural schematic diagram of a device for parallel analog in-memory computing of the present disclosure.

[0032] FIG. 3 is a structural schematic diagram of a replication module and a switch module of the present disclosure.

[0033] FIG. 4 is a structural schematic diagram of a weighted module of the present disclosure.

[0034] FIG. 5 is a structural schematic diagram of a replication module and a switch module involved in an embodiment of the present disclosure.

[0035] FIG. 6 is a structural schematic diagram of a replication module and a switch module involved in an embodiment of the present disclosure.

DESCRIPTION OF THE EMBODIMENTS

[0036] The present disclosure will be further described below with reference to the accompanying drawings.

[0037] The method for parallel analog in-memory computing of the present disclosure includes the following steps:

[0038] S1, inputting an analog current signal;

[0039] S2, replicating the analog current signal through a current replication circuit to form a corresponding replicated current signal, and performing weighted processing of all the replicated current signals to obtain a corresponding set of modulated current signals, where the weighted processing of the replicated current signals is performed as follows:

[0040] weights are assigned through a switch module. When a switching element is in a low-resistance state, the output modulated current signal equals the input current signal, meaning that the current signal is assigned a weight of 1. When the switching element is in a high-resistance state, the output modulated current signal is far less than the input current signal, meaning that the current signal is assigned a weight of 0. In the low-resistance state, the output current precisely follows the input current; in the high-resistance state, the output current is limited by the resistance of the switching element, so that once it exceeds a certain threshold the output current saturates and no longer increases as the input current increases. Therefore, by controlling the resistance state of the switching element and the magnitude of the input current, a single-bit unit can stably output high and low current levels, and the switching ratio between the high and low current levels can be precisely controlled; and

[0041] S3, performing a weighted accumulation operation on the set of modulated current signals according to Kirchhoff's current law to obtain an output current signal.
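Steps S1 to S3 can be sketched as an idealized numerical model. This is a minimal sketch under stated assumptions, not the disclosed circuit: it assumes perfect replication at a fixed ratio, treats a weight of 0 as passing only a small fraction `off_leakage` of the current (a hypothetical parameter standing in for "far less than the input current signal"), and models Kirchhoff's current law as plain addition at one node. All function names are illustrative.

```python
from typing import Sequence


def replicate(current: float, ratio: float = 1.0) -> float:
    """S2 (replication): ideal copy of the input current at a fixed ratio."""
    return ratio * current


def apply_binary_weight(current: float, weight: int, off_leakage: float = 0.0) -> float:
    """S2 (weighting): weight 1 passes the replica; weight 0 passes only leakage.

    off_leakage is a hypothetical fraction modelling "far less than the input".
    """
    if weight not in (0, 1):
        raise ValueError("switch weights must be 0 or 1")
    return current if weight == 1 else off_leakage * current


def kcl_sum(currents: Sequence[float]) -> float:
    """S3: currents entering one node add up (Kirchhoff's current law)."""
    return sum(currents)


def parallel_analog_step(inputs: Sequence[float], weights: Sequence[int],
                         ratio: float = 1.0, off_leakage: float = 0.0) -> float:
    """S1 to S3 for one output node: replicate, weight, then sum the modulated currents."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input current is required")
    modulated = [apply_binary_weight(replicate(i_in, ratio), w, off_leakage)
                 for i_in, w in zip(inputs, weights)]
    return kcl_sum(modulated)


# Illustrative input currents in microamps; the weights select the 1st and 3rd inputs.
print(parallel_analog_step([2.0, 5.0, 1.5], [1, 0, 1]))  # 3.5
```

Setting `off_leakage` above zero shows why the high/low current ratio matters: every weight-0 cell still adds a small error current to the sum.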

[0042] The device implementing the method for parallel analog in-memory computing of the present disclosure includes:

[0043] single-bit units, configured for replicating analog current signals and performing weighted processing on them to obtain a set of modulated current signals, where each single-bit unit includes a replication module and a switch module; the replication module replicates the analog current signal and outputs the replicated current signal at a ratio of 1:1 or any other fixed ratio, and the switch module performs the weighted processing of the replicated current signal; and

[0044] weighted modules, configured for performing a weighted accumulation operation on the set of modulated current signals according to Kirchhoff's current law to obtain an output current signal.

[0045] An output end of the replication module is connected to an input end of the switch module, and an output end of the switch module is connected to an input end of the weighted module.

[0046] The matrix-vector multiplication function of the present disclosure is implemented by the replication module and the switch module, as shown in FIG. 4. The replication module precisely replicates and outputs an input current, and the switch module weights and outputs a current.

[0047] The functions of the replication module are implemented by a CMOS circuit, and replication modules with different structures and elements differ in circuit complexity, precision and robustness. Generally, the replication module is based on a current mirror circuit structure. Optionally, additional transistors and connections (such as cascode designs) can be added to achieve a higher output impedance and a smaller drain-voltage swing on the mirror transistors, and to alleviate the impact of channel-length modulation, thereby enhancing replication precision and circuit robustness. Different electronic elements (such as field-effect transistors, bipolar junction transistors, and the like) can also be used in the circuit design. All such circuit designs and connections serve to replicate and output a current; the specific structures differ only in cost and performance and do not change the core design concept of the present disclosure. In practical applications, a suitable module can be selected according to actual needs.

[0048] Generally speaking, the switch module is configured for allocating weights of 0 and 1, which can be implemented by an element capable of binary weighting. This element can be a memristor, an MRAM, a PCM, a FLASH, a fixed resistor, a variable resistor, or any other mature electronic element with a non-volatile signal modulation function. Selection of these specific memory elements does not change the core design concept of the present disclosure.

[0049] In this example, the replication module is based on a CMOS current mirror circuit and the switch module is based on a memristor, as shown in FIG. 4. The replication module includes a transistor M_1 and a transistor M_2. The gate electrode of M_2 is connected to the gate electrode and the drain electrode of M_1. In the signal input phase, current signals are input at the drain electrode of M_1. Because the drain and gate electrodes of M_1 are short-circuited, its drain-source voltage equals its gate-source voltage, so M_1 always operates in the saturation region, and the output characteristics of M_1 and M_2 can be expressed as follows:

[00002]

$I_{1} = \frac{1}{2}\mu_{n} C_{ox} \left(\frac{W}{L}\right)_{1} (V_{1} - V_{th})^{2} \qquad (1)$

$V_{1} = \sqrt{\frac{2 I_{1}}{\mu_{n} C_{ox} (W/L)_{1}}} + V_{th} \qquad (2)$

$I_{2} = \frac{1}{2}\mu_{n} C_{ox} \left(\frac{W}{L}\right)_{2} (V_{2} - V_{th})^{2} \qquad (3)$

$I_{2} = \frac{(W/L)_{2}}{(W/L)_{1}} I_{1} \qquad (4)$

[0050] In these equations, μ_n is the electron mobility and V_th is the threshold voltage, both of which are temperature-dependent parameters; C_ox is the gate oxide capacitance per unit area of the MOS device; and (W/L)_1 and (W/L)_2 are the channel width-to-length ratios of M_1 and M_2. The input current I_1 sets the gate voltage V_1 of M_1, as given by equation (2). Because the gate electrodes of M_1 and M_2 are connected, the gate voltage V_2 of M_2 equals V_1, and M_2 generates the output current I_2 through its own transconductance, as given by equation (3). Substituting equation (2), the inverse of equation (1), into equation (3) yields equation (4), which relates the output current I_2 to the input current I_1 and, for matched transistors, no longer contains the temperature-dependent parameters μ_n and V_th. The output-to-input ratio of the replication module therefore depends only on the ratio of the channel width-to-length ratios of M_2 and M_1, so the output current is an exact, scaled copy of the input current. Because these width-to-length ratios are fixed by design, the ratio of the replicated (output) current to the input current remains fixed at (W/L)_2/(W/L)_1.
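The cancellation in equations (1) to (4) can be checked numerically. The sketch below assumes the ideal square-law saturation model used above, perfectly matched transistors, and no channel-length modulation; the function name `mirror_output` and the parameter values (two μ_n·C_ox and V_th pairs standing in for two temperatures) are illustrative, not measured data.

```python
import math


def mirror_output(i_in: float, wl_in: float, wl_out: float,
                  mu_n_cox: float, v_th: float) -> float:
    """Ideal square-law current mirror: input transistor M_1, output transistor M_2."""
    # Eq. (2): gate voltage of the diode-connected input transistor M_1.
    v_gate = math.sqrt(2.0 * i_in / (mu_n_cox * wl_in)) + v_th
    # Eq. (3): M_2 shares that gate voltage (V_2 = V_1) and sets the output current.
    return 0.5 * mu_n_cox * wl_out * (v_gate - v_th) ** 2


i_in = 10e-6                 # 10 uA input current (illustrative)
wl_1, wl_2 = 10.0, 20.0      # (W/L)_1 and (W/L)_2, so eq. (4) predicts a ratio of 2
# Two hypothetical (mu_n*C_ox [A/V^2], V_th [V]) pairs standing in for two temperatures.
for mu_n_cox, v_th in [(200e-6, 0.45), (120e-6, 0.35)]:
    ratio = mirror_output(i_in, wl_1, wl_2, mu_n_cox, v_th) / i_in
    print(f"I2/I1 = {ratio:.6f}")
```

Both iterations print the same ratio, (W/L)_2/(W/L)_1 = 2, even though μ_n·C_ox and V_th change; in a real mirror, mismatch and channel-length modulation would add small deviations that this model omits.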

[0051] The corresponding switch module controls the weight of the output current to carry out the matrix-vector multiplication operation, and includes a switching element R_1. When R_1 is in the low-resistance state, its output current precisely follows the input current I_1; when R_1 is in the high-resistance state, its output current is limited by the resistance of R_1, so that the output current saturates once it exceeds a certain threshold and no longer increases as I_1 increases. Therefore, by setting R_1 to an appropriate resistance state and selecting an appropriate input current, stable high and low current levels can be output, where a high current output represents 1 and a low current output represents 0. The switching ratio between the high and low current levels can be precisely controlled by adjusting the resistance state of R_1 and the magnitude of the input current.
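The switching behavior described above can be sketched with a deliberately simple model, assuming the switch passes the input current until it reaches a cap equal to a hypothetical voltage headroom `v_headroom` divided by the element's resistance. Real memristor current-voltage characteristics are nonlinear; the function names, resistances and headroom below are illustrative placeholders, not values from the disclosure.

```python
def switch_output(i_in: float, resistance: float, v_headroom: float = 0.2) -> float:
    """Idealized switch module: follows the input until capped at v_headroom / resistance.

    v_headroom (volts) is a hypothetical voltage budget across the switching element.
    """
    if resistance <= 0:
        raise ValueError("resistance must be positive")
    i_cap = v_headroom / resistance
    return min(i_in, i_cap)


def on_off_ratio(i_in: float, r_low: float, r_high: float, v_headroom: float = 0.2) -> float:
    """Ratio of the '1' (low-resistance) to the '0' (high-resistance) output current."""
    return switch_output(i_in, r_low, v_headroom) / switch_output(i_in, r_high, v_headroom)


R_LOW, R_HIGH = 1e3, 1e6   # ohms, illustrative low/high resistance states
for i_in in (10e-6, 20e-6):
    print(f"in={i_in * 1e6:.0f} uA  "
          f"low-R out={switch_output(i_in, R_LOW) * 1e6:.2f} uA  "
          f"high-R out={switch_output(i_in, R_HIGH) * 1e6:.2f} uA  "
          f"ratio={on_off_ratio(i_in, R_LOW, R_HIGH):.1f}")
```

In this model, doubling the input doubles the "1" level but leaves the "0" level pinned at its cap, so the high/low ratio grows with the input current, which mirrors the two control knobs (resistance state and input magnitude) named in the paragraph.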

[0052] The replication modules and the switch modules, as parts of the single-bit units, constitute a large-scale crossbar array, as shown in FIG. 6. For a device with n input rows and m output columns, the replication modules form an n×1 array and the switch modules form an n×m array.
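As a sketch of how such an n×m crossbar computes a binary matrix-vector product, the model below assumes one ideal replication module per input row that feeds identical copies to all m switch modules in that row, ideal on/off switches, and column wires that sum currents by Kirchhoff's current law. The function name `crossbar_columns` is illustrative.

```python
from typing import List, Sequence


def crossbar_columns(inputs: Sequence[float], switches: Sequence[Sequence[int]],
                     ratio: float = 1.0) -> List[float]:
    """Column output currents of an n x m crossbar with binary switch states.

    inputs:   n row input currents (one replication module per row, n x 1).
    switches: n x m matrix of 0/1 switch states (the switch-module array).
    """
    if len(switches) != len(inputs):
        raise ValueError("need one switch row per input current")
    m = len(switches[0]) if switches else 0
    columns = [0.0] * m
    for current, row in zip(inputs, switches):
        if len(row) != m:
            raise ValueError("all switch rows must have m entries")
        replica = ratio * current          # replicated current for this row
        for j, bit in enumerate(row):
            if bit not in (0, 1):
                raise ValueError("switch states must be 0 or 1")
            columns[j] += replica * bit    # column wire sums its cells' currents (KCL)
    return columns


# Three input rows, two output columns (illustrative currents in microamps).
print(crossbar_columns([1.0, 2.0, 4.0], [[1, 0], [1, 1], [0, 1]]))  # [3.0, 6.0]
```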

[0053] The weighted module is shown in FIG. 4. It performs weighted summation of a series of single-bit current signals that have been processed by the replication modules and the switch modules, to obtain a higher-precision output result. The n current signals processed by these modules are assigned weights that are successive powers of 2 (1, 2, 4, 8, and so on) and then summed, giving the output result

[00003] $i_{out} = \sum_{i=0}^{n-1} 2^{i}\, i_{i+1}$

with n-bit precision. The weights of the different terms in the weighted module are represented by n storage elements with multi-state non-volatile signal modulation capability. Each storage element can be a memristor, an MRAM, a PCM, a FLASH, a fixed resistor, a variable resistor, or any other mature electronic element with a non-volatile signal modulation function; the choice of a specific memory element does not change the core design concept of the present disclosure. The modulated current signals can be summed directly according to Kirchhoff's current law, or through an adder circuit or the like; a specific design affects only circuit precision, stability, cost and the like, and does not depart from the core spirit of the present disclosure. The i-th storage element stores a weight proportional to 2^(i-1), so the conductances of two adjacent storage elements differ by a factor of 2. When the i-th input signal i_i flows into the weighted module, the analog signal 2^(i-1)·i_i is output, and the n analog signals are then summed to obtain the final output result

[00004] $i_{out} = \sum_{i=0}^{n-1} 2^{i}\, i_{i+1}.$

As the number n of superposed signal paths increases, the output precision also increases, so the device can provide outputs of different precisions by adjusting n.
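The weighted summation $i_{out} = \sum_{i=0}^{n-1} 2^{i}\, i_{i+1}$ can be written as a short function. This sketch assumes ideal storage elements whose relative conductances are exactly 1, 2, 4, and so on, and ideal Kirchhoff summation; `weighted_module` is an illustrative name. When each bit current is either a unit current or zero, the output is the unit current times the binary number that the bits encode.

```python
from typing import Sequence


def weighted_module(bit_currents: Sequence[float]) -> float:
    """Return sum_{i=0}^{n-1} 2**i * i_{i+1}.

    bit_currents[0] is i_1 (least significant); bit_currents[k] is scaled by 2**k,
    standing in for storage-element conductances that double from one element to the next.
    """
    return sum((2 ** k) * current for k, current in enumerate(bit_currents))


unit = 1.0                      # illustrative unit current (e.g. 1 uA) for a "1" bit
bits_lsb_first = [1, 0, 1, 1]   # encodes binary 1101 = 13
print(weighted_module([b * unit for b in bits_lsb_first]))  # 13.0
```

Adding one more path (n to n+1) doubles the number of representable output levels, which is the precision-versus-n trade-off stated above.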

[0054] In this example, the input current signals of the device come from image sensors such as cameras or photodiode arrays. As shown in FIG. 1, the replication module and the weighted module of the present disclosure are connected to a photodiode array (for signal input) and a display (for output display), respectively. The present disclosure is used to monitor image information from real environments and to analyze and output it in real time.

[0055] In this example, a trained neural network weight matrix is written into the switch modules in binary format. When the image sensor operates, the photodiode array receives information from the real environment and converts it into a photocurrent vector, which is input directly, as raw current information, into the array composed of the replication modules and the switch modules of the present disclosure. After being processed by the array and combined by the weighted module, any input current information becomes an output current vector signal with adjustable precision. These output results are then passed to a back-end display and a back-end processor. In this example, after the input signals are processed by the device of the present disclosure, real-time recognition results can be displayed directly on the back-end display, so that environmental information can be recognized and detected.
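One way to read "in binary format" together with the weighted module is bit-plane decomposition: each bit of a non-negative integer weight matrix is stored in its own binary switch array, each array's column currents are summed by Kirchhoff's current law, and the weighted module recombines the bit planes with factors 2^k. This mapping is an assumption made for illustration (the disclosure does not spell it out, and signed weights are not handled here). The sketch checks that it reproduces the direct integer matrix-vector product on illustrative photocurrents; all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def bit_planes(weights: Sequence[Sequence[int]], n_bits: int) -> List[Matrix]:
    """Split non-negative integer weights into n_bits binary matrices, LSB first."""
    return [[[(w >> k) & 1 for w in row] for row in weights] for k in range(n_bits)]


def binary_crossbar(inputs: Sequence[float], switches: Sequence[Sequence[int]]) -> List[float]:
    """Ideal binary crossbar: each column sums the currents of its 'on' cells (KCL)."""
    columns = [0.0] * len(switches[0])
    for current, row in zip(inputs, switches):
        for j, bit in enumerate(row):
            columns[j] += current * bit
    return columns


def multibit_mvm(inputs: Sequence[float], weights: Sequence[Sequence[int]],
                 n_bits: int) -> List[float]:
    """Recombine per-bit-plane column currents with factors 2**k (the weighted module)."""
    if any(not 0 <= w < 2 ** n_bits for row in weights for w in row):
        raise ValueError(f"weights must be integers in [0, {2 ** n_bits - 1}]")
    per_plane = [binary_crossbar(inputs, plane) for plane in bit_planes(weights, n_bits)]
    m = len(weights[0])
    return [sum((2 ** k) * per_plane[k][j] for k in range(n_bits)) for j in range(m)]


photocurrents = [0.5, 1.0, 2.0]          # illustrative photodiode currents (uA)
weights = [[3, 1], [0, 2], [5, 7]]       # illustrative 3-bit trained weights
direct = [sum(x * row[j] for x, row in zip(photocurrents, weights)) for j in range(2)]
print(multibit_mvm(photocurrents, weights, n_bits=3), direct)  # [11.5, 16.5] [11.5, 16.5]
```

Under this reading, n in the weighted module corresponds to the weight bit width, so increasing n trades more switch arrays for finer weight resolution.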

[0056] The present disclosure can directly perform parallel weighted summation of a series of input analog current signals and output high-precision, controllable results. Because it processes information entirely in the current domain, the present disclosure in principle avoids the precision loss caused by voltage division across line resistance, which in conventional schemes limits the integration scale. The present disclosure can therefore substantially relax the array-size limitations of conventional schemes and enable larger-scale in-memory computing hardware arrays with higher performance and integration density. Further, the high symmetry of the modules of the device, combined with the adjustable weighted modules, allows precise outputs over a wide temperature range. Tests have shown that, in an example using the current mirror circuit and the memristor as core elements, an output precision of at least 6 bits can be achieved over an ambient temperature range of 100 K to 500 K. The device of the present disclosure therefore offers higher robustness and precision and operates stably with high performance in environments with extreme temperature variations, such as polar observation stations and space.

[0057] Further, the present disclosure directly processes the analog current signals output by sensors, eliminating the need for current-to-voltage conversion and analog-to-digital/digital-to-analog conversion, which reduces device complexity and additional energy consumption and increases computational speed. The present disclosure has low complexity and cost, and significant potential for commercial applications.



CPC classification: G11C7/16 (PHYSICS); G11C7/1006 (PHYSICS); H03M1/366 (ELECTRICITY)

International classification: G11C7/16 (PHYSICS); G11C7/10 (PHYSICS); H03M1/36 (ELECTRICITY)
