Abstract
Numerous embodiments are disclosed for improving the accuracy or power consumption of an analog neural memory in a deep learning artificial neural network as temperature changes. In some embodiments, a method is performed to determine in real-time a bias value to apply to one or more memory cells in a neural network. In other embodiments, a bias voltage is determined from a lookup table and is applied to a terminal of a memory cell during a read operation.
Claims
1. A method for a neural network, the method comprising: sensing an operating temperature associated with a set of memory cells; determining a bias in a lookup table based on the sensed operating temperature; applying the determined bias to terminals of the set of memory cells; and performing a read operation on the set of memory cells.
2. The method of claim 1, wherein the bias is generated from a memory cell.
3. The method of claim 1, wherein the set of memory cells comprises a neuron in the neural network.
4. The method of claim 1, wherein the set of memory cells are located in a first array.
5. The method of claim 4, further comprising: sensing a second operating temperature associated with a second set of memory cells; determining a second bias in a lookup table based on the second operating temperature; applying the determined second bias to terminals of the second set of memory cells, wherein the second set of memory cells are located in a second array different than the first array; and performing a read operation on the second set of memory cells.
6. The method of claim 1, wherein the set of memory cells are contained in a single layer in the neural network.
7. The method of claim 1, wherein the set of memory cells comprises all memory cells in a plurality of arrays in the neural network.
8. The method of claim 1, wherein the set of memory cells comprises all memory cells in one or more selected arrays in a plurality of arrays in the neural network.
9. The method of claim 1, wherein the set of memory cells comprises split-gate flash memory cells.
10. The method of claim 1, wherein the set of memory cells comprises stacked-gate flash memory cells.
11. A method for populating a bias look up table, the method comprising: programming a memory cell capable of storing any of N values with 1 of the N values; applying a series of currents of increasing size to a bit line of the programmed memory cell; comparing a voltage of the bit line to a reference voltage to generate a comparison output; and when the comparison output changes value, measuring a voltage of a control gate terminal of the memory cell and storing the voltage in a lookup table.
12. The method of claim 11, wherein the memory cell is a non-volatile memory cell.
13. The method of claim 11, wherein the memory cell is a volatile memory cell.
14. The method of claim 11, wherein the voltage of the control gate terminal is measured using a sample-and-hold circuit.
15. A method for a neural network, the method comprising: sensing an operating temperature; indicating the sensed operating temperature with digital bits; converting an output neuron current into a voltage; and scaling the voltage in response to the digital bits.
16. A voltage averaging circuit for generating a bias, comprising: a variable resistor coupled between an output node and ground; and a plurality of measuring blocks, each measuring block converting a respective input voltage into a current and mirroring the current into the output node; wherein the output node provides a bias equal to an average of the input voltages to the plurality of measuring blocks.
17. The voltage averaging circuit of claim 16, wherein a voltage at the output node is equal to a sum of a value provided by each measuring block, the value comprising the input voltage received by the measuring block multiplied by a ratio of the variable resistor to a resistor of the measuring block.
18. The voltage averaging circuit of claim 17, wherein the voltage at the output node is applied to a control gate terminal of one or more cells in a neural network memory array.
19. A method for a neural network, the method comprising: sensing an operating temperature associated with a set of memory cells; determining a bias based on the sensed operating temperature; applying the determined bias to terminals of the set of memory cells; and performing a read operation on the set of memory cells.
20. The method of claim 19, wherein the set of memory cells comprises a neuron in the neural network.
21. The method of claim 19, wherein the bias is generated from a memory cell.
22. A method for generating a bias for a memory array, the method comprising: programming a memory cell to store a value; applying a series of currents of increasing size to a bit line of the programmed memory cell; and measuring a voltage of a control gate terminal of the memory cell to determine the bias.
23. The method of claim 22, further comprising: storing the determined bias.
24. The method of claim 23, further comprising: applying the bias to one or more memory cells in an array of memory cells during an operation on the one or more memory cells.
25. The method of claim 24, wherein the array is an analog neural memory array.
26. The method of claim 23, comprising: performing the programming, applying, measuring, and storing steps for a plurality of different operating temperatures of the programmed memory cell.
27. A method for determining in real-time a bias for a memory array in a neural network, the method comprising: programming a memory cell to store a value; applying a predetermined current to a bit line of the programmed memory cell; and measuring a voltage of a control gate terminal of the memory cell to determine the bias.
28. The method of claim 27, further comprising: storing the bias.
29. The method of claim 28, further comprising: applying the bias to one or more memory cells in an array of memory cells during an operation on the one or more memory cells.
30. The method of claim 29, wherein the array is an analog neural memory array.
31. The method of claim 28, comprising: performing the programming, applying, measuring, and storing steps for a plurality of different operating temperatures of the programmed memory cell.
32. A method for a neural network, the method comprising: programming a memory cell; applying a series of currents of increasing size to a bit line of the programmed memory cell; comparing a voltage of the bit line to a reference voltage to generate a comparison output; when the comparison output changes value, measuring a voltage of a control gate terminal of the memory cell and storing the voltage as a determined bias; applying the determined bias to terminals of a set of memory cells; and performing a read operation on the set of memory cells.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0092] FIG. 1 is a diagram that illustrates an artificial neural network.
[0093] FIG. 2 depicts a prior art split gate flash memory cell.
[0094] FIG. 3 depicts another prior art split gate flash memory cell.
[0095] FIG. 4 depicts another prior art split gate flash memory cell.
[0096] FIG. 5 depicts another prior art split gate flash memory cell.
[0097] FIG. 6 is a diagram illustrating the different levels of an exemplary artificial neural network utilizing one or more non-volatile memory arrays.
[0098] FIG. 7 is a block diagram illustrating a vector-by-matrix multiplication system.
[0099] FIG. 8 is a block diagram that illustrates an exemplary artificial neural network utilizing one or more vector-by-matrix multiplication systems.
[0100] FIG. 9 depicts another embodiment of a vector-by-matrix multiplication system.
[0101] FIG. 10 depicts another embodiment of a vector-by-matrix multiplication system.
[0102] FIG. 11 depicts another embodiment of a vector-by-matrix multiplication system.
[0103] FIG. 12 depicts another embodiment of a vector-by-matrix multiplication system.
[0104] FIG. 13 depicts another embodiment of a vector-by-matrix multiplication system.
[0105] FIG. 14 depicts a prior art long short-term memory system.
[0106] FIG. 15 depicts an exemplary cell for use in a long short-term memory system.
[0107] FIG. 16 depicts an embodiment of the exemplary cell of FIG. 15.
[0108] FIG. 17 depicts another embodiment of the exemplary cell of FIG. 15.
[0109] FIG. 18 depicts a prior art gated recurrent unit system.
[0110] FIG. 19 depicts an exemplary cell for use in a gated recurrent unit system.
[0111] FIG. 20 depicts an embodiment of the exemplary cell of FIG. 19.
[0112] FIG. 21 depicts another embodiment of the exemplary cell of FIG. 19.
[0113] FIG. 22 depicts another embodiment of a vector-by-matrix multiplication system.
[0114] FIG. 23 depicts another embodiment of a vector-by-matrix multiplication system.
[0115] FIG. 24 depicts another embodiment of a vector-by-matrix multiplication system.
[0116] FIG. 25 depicts another embodiment of a vector-by-matrix multiplication system.
[0117] FIG. 26 depicts another embodiment of a vector-by-matrix multiplication system.
[0118] FIG. 27 depicts another embodiment of a vector-by-matrix multiplication system.
[0119] FIG. 28 depicts another embodiment of a vector-by-matrix multiplication system.
[0120] FIG. 29 depicts another embodiment of a vector-by-matrix multiplication system.
[0121] FIG. 30 depicts another embodiment of a vector-by-matrix multiplication system.
[0122] FIG. 31 depicts another embodiment of a vector-by-matrix multiplication system.
[0123] FIG. 32 depicts another embodiment of a vector-by-matrix multiplication system.
[0124] FIG. 33 depicts another embodiment of a vector-by-matrix multiplication system.
[0125] FIG. 34 depicts another embodiment of a vector-by-matrix multiplication system.
[0126] FIG. 35 depicts performance data from a neural network.
[0127] FIG. 36 depicts a neural network method.
[0128] FIG. 37 depicts a neural network array.
[0129] FIG. 38 depicts an array.
[0130] FIG. 39 depicts a neural network array.
[0131] FIG. 40A depicts a method.
[0132] FIG. 40B depicts a bias look up table.
[0133] FIG. 41 depicts a method.
[0134] FIG. 42 depicts a method.
[0135] FIG. 43 depicts an implementation of a scaler and an analog-to-digital converter.
[0136] FIG. 44A depicts a calibration circuit and FIG. 44B depicts a calibration method.
[0137] FIG. 45 depicts a bias averaging circuit.
[0138] FIG. 46A depicts a bias generation block.
[0139] FIG. 46B depicts another bias generation block.
[0140] FIG. 46C depicts another bias generation block.
[0141] FIG. 47 depicts a neural network layer method.
[0142] FIG. 48 depicts a neural network method.
[0143] FIG. 49 depicts a neural network method.
[0144] FIG. 50 depicts a neural network method.
DETAILED DESCRIPTION OF THE INVENTION
[0145] The artificial neural networks of the present invention utilize a combination of CMOS technology and non-volatile memory arrays.
VMM System Overview
[0146] FIG. 34 depicts a block diagram of VMM system 3400. VMM system 3400 comprises VMM array 3401, row decoder 3402, high voltage decoder 3403, column decoder 3404, bit line drivers 3405, input circuit 3406, output circuit 3407, control logic 3408, and bias generator 3409. VMM system 3400 further comprises high voltage generation block 3410, which comprises charge pump 3411, charge pump regulator 3412, and high voltage analog precision level generator 3413. VMM system 3400 further comprises (program/erase, or weight tuning) algorithm controller 3414, analog circuitry 3415, control engine 3416 (that may include special functions such as arithmetic functions, activation functions, embedded microcontroller logic, without limitation), and test control logic 3417. The systems and methods described below can be implemented in VMM system 3400.
[0147] The input circuit 3406 may include circuits such as a DAC (digital to analog converter), DPC (digital to pulses converter, digital to time modulated pulse converter), AAC (analog to analog converter, such as a current to voltage converter, logarithmic converter), PAC (pulse to analog level converter), or any other type of converter. The input circuit 3406 may implement normalization, linear or non-linear up/down scaling functions, or arithmetic functions. The input circuit 3406 may implement a temperature compensation function for input levels. The input circuit 3406 may implement an activation function such as ReLU or sigmoid. The output circuit 3407 may include circuits such as an ADC (analog to digital converter, to convert neuron analog output to digital bits), AAC (analog to analog converter, such as a current to voltage converter, logarithmic converter), APC (analog to pulse(s) converter, analog to time modulated pulse converter), or any other type of converter. The output circuit 3407 may implement an activation function such as a rectified linear activation function (ReLU) or sigmoid. The output circuit 3407 may implement statistical normalization, regularization, up/down scaling/gain functions, statistical rounding, or arithmetic functions (e.g., add, subtract, divide, multiply, shift, log) for neuron outputs. The output circuit 3407 may implement a temperature compensation function for neuron outputs or array outputs (such as bitline outputs) so as to keep power consumption of the array approximately constant or to improve precision of the array (neuron) outputs, such as by keeping the I-V slope approximately the same.
[0148] As discussed above, a neural network may comprise many different layers, and within each layer, many calculations will be performed involving stored weight values in one or more arrays within that layer. Some layers will be used more than other layers, and it can be appreciated that such layers are more important to the overall accuracy of the neural network based on their high frequency of use.
[0149] FIG. 35 depicts graph 3501 reflecting data collected by the inventors regarding frequency of use of weights within an MLP (multi-layer perceptron) neural network for MNIST (Modified National Institute of Standards and Technology) digit classification. In the example shown, there are n levels, where each L (L0, ..., Ln) represents a range of weights. As can be seen, the lower weights are used much more frequently than the other weight ranges. In this graph, as an example, Ln does not contribute significantly to the overall network performance. Hence, Ln could be set to a 0 value, such as by reducing the control gate voltage applied to the array in level Ln, which would result in lower power consumption due to the lower cell current drawn at the lower control gate voltage, without significantly affecting accuracy.
[0150] A neural network comprises multiple layers. Each layer can have a weight distribution that is specific to that layer. Hence, a different technique may be needed for each layer to improve overall network performance. For example, Ln might contribute only a small amount in a first layer but might contribute a significant amount in a second layer.
[0151] The present examples provide methods of improving operation of a neural network. While the term optimization may be utilized, it is to be understood that the methods do not necessarily guarantee absolute optimization, i.e., a result that is as perfect, functional, or effective as possible; instead, the term optimization as used herein simply means an improvement over prior art methods.
[0152] FIG. 35 also depicts table 3502, which indicates the accuracy of read operations based on changes to the voltage, VCG, applied to the control gate of memory cells during a read operation. As can be seen, dropping VCG from 1.8 V to 1.6 V has no impact on accuracy, and dropping VCG from 1.5 V to 1.4 V has a small impact on accuracy. As the VCG (or VEG) is lowered, the cell current is lowered exponentially based on the sub-threshold equation. This indicates that in some cases, power might be saved by dropping the voltage applied to a terminal of a memory cell without sacrificing accuracy or while sacrificing accuracy to an acceptable degree. Similarly, in the linear region, a lower input row voltage results in lower current. One can further appreciate that changes in operating temperature can impact both accuracy and power consumption, and similarly, VCG and/or EG modulation (i.e., an increase or decrease in magnitude) can be used to obtain improved power and/or accuracy as temperature changes.
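For reference, one commonly used form of the sub-threshold relation mentioned above is shown below; the symbols (I0, Vth, slope factor n, thermal voltage VT) are generic textbook notation and are not taken from the disclosure:

```latex
I_{cell} \approx I_{0}\, e^{\frac{V_{CG}-V_{th}}{n\,V_{T}}},
\qquad V_{T}=\frac{kT}{q}\approx 26\ \mathrm{mV\ at\ }300\ \mathrm{K}
```

Because VCG appears in the exponent, even a reduction of a few hundred millivolts lowers the cell current, and hence array power, by orders of magnitude, and the kT/q term makes the current explicitly temperature dependent.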
[0153] Based on this discussion of FIG. 35, it can be appreciated that one can determine and apply different bias voltages for one or more terminals of a memory cell (such as CG, EG, WL, etc.) to improve power consumption (perhaps at the expense of accuracy, for example by lowering the VCG used), to improve accuracy during static temperature conditions (perhaps at the expense of power consumption, for example, by increasing the VCG used), or to improve or maintain accuracy during changing temperature conditions (perhaps at the expense of power consumption, for example, by increasing the VCG as temperature changes). Other performance characteristics could be maximized instead of accuracy and power consumption.
[0154] With these concepts in mind, various methods will now be described.
[0155] FIG. 36 depicts neural network layer method 3600 performed on a particular layer within a neural network. For example, this method might be performed on a layer (or more than one layer) that is deemed more important due to its significant effect on overall network accuracy.
[0156] In step 3601, default voltage biases are applied to terminals (e.g., the control gate terminals) of cells in an array of a layer during a read operation. The default voltage biases typically are the same as the bias values used during verify operations when a programmed weight is verified.
[0157] In step 3602, performance inference is conducted.
[0158] In step 3603, baseline data is collected as to the performance (e.g., accuracy) of the network when default biases are applied to the array. This data is, for example, data indicating the accuracy of an MNIST inference operation. This baseline data will serve as a reference point for performance target checks in step 3605.
[0159] In step 3604, the biases are modulated (e.g., increased or decreased by a certain increment) and then applied to terminals (e.g., the control gate terminals) of cells in the layer of the array.
[0160] In step 3605, a performance target check is performed. If the performance result is within a target range relative to the baseline data collected in step 3603, then the method returns to step 3604, and the loop repeats until the performance target is no longer met, at which point the method proceeds to completion in step 3606, storing the previous bias condition, i.e., the last set of biases that resulted in performance data within the target range.
[0161] In step 3606, the previous set of biases are deemed good and are stored for future use (such as in a lookup table) in conjunction with that layer. Optionally, the current operating temperature can be stored along with the bias levels.
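A minimal behavioral sketch of the bias-search loop of method 3600 is given below. It is illustrative only: the callables apply_biases and run_accuracy, the step size, and the tolerance are assumptions standing in for hardware access and an MNIST-style evaluation, not part of the disclosure.

```python
def optimize_layer_biases(apply_biases, run_accuracy, default_bias, step=-0.05, tolerance=0.5):
    """Sketch of method 3600 for one layer; apply_biases() and run_accuracy() are hypothetical."""
    apply_biases(default_bias)                     # step 3601: default (verify-level) bias
    baseline = run_accuracy()                      # steps 3602-3603: baseline accuracy
    good_bias = bias = default_bias
    while True:
        bias += step                               # step 3604: modulate bias by an increment
        apply_biases(bias)
        if baseline - run_accuracy() > tolerance:  # step 3605: performance target not met
            break
        good_bias = bias                           # last bias still within the target range
    return good_bias                               # step 3606: store for future use (e.g., in a LUT)

# Toy usage with a simulated accuracy model (accuracy assumed to degrade below ~1.5 V):
state = {"bias": 0.0}
result = optimize_layer_biases(
    apply_biases=lambda v: state.update(bias=v),
    run_accuracy=lambda: 98.0 if state["bias"] > 1.475 else 90.0,
    default_bias=1.8,
)
print(round(result, 2))  # about 1.5 V in this toy model
```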
[0162] FIG. 37 depicts neural network array 3700. Neural network array 3700 comprises arrays 3701-0, ..., 3701-n, where n+1 is the number of arrays in neural network 3700. Neural network 3700 also comprises temperature sensors 3703-i, where i indexes the sensors, each of which senses the operating temperature at a specific location in neural network 3700. Optionally, each array 3701-0, ..., 3701-n contains its own temperature sensor 3703 (such that there are n+1 temperature sensors), such that each temperature sensor 3703 is associated with one of the arrays 3701-0, ..., 3701-n and the memory cells contained in such array. Temperature to voltage bias lookup table (LUT) 3704-i, where i indexes the voltage bias lookup tables, is consulted, and based on the sensed temperature, a bias voltage(s) for one or more terminals (e.g., the control gate terminal or the erase gate terminal, without limitation) is obtained. Those bias voltages, termed temperature biases 3702, are then applied to each cell in the particular array in question. Thus, temperature biases 3702-0 are applied to array 3701-0, and so on. Each array 3701-0, ..., 3701-n forms one or more neurons in the neural network.
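A short sketch of this per-array lookup-and-apply flow follows. The table contents, the nearest-temperature policy, and the helpers read_temperature and apply_cg_bias are illustrative assumptions standing in for temperature sensor 3703 and the bias drivers; they are not taken from the disclosure.

```python
class TemperatureBiasLUT:
    """Maps a sensed temperature to a control-gate bias (a temperature bias 3702)."""
    def __init__(self, temps_c, cg_biases_v):
        self.temps_c = temps_c            # calibrated temperatures (ascending)
        self.cg_biases_v = cg_biases_v    # one CG bias per calibrated temperature

    def lookup(self, temp_c):
        # pick the entry for the nearest calibrated temperature (illustrative policy)
        i = min(range(len(self.temps_c)), key=lambda j: abs(self.temps_c[j] - temp_c))
        return self.cg_biases_v[i]

def compensate_array(array_id, lut, read_temperature, apply_cg_bias):
    temp_c = read_temperature(array_id)            # sensor 3703 associated with this array
    apply_cg_bias(array_id, lut.lookup(temp_c))    # temperature bias 3702 applied to the array

# Illustrative table: lower the CG bias as temperature rises (sub-threshold example).
lut = TemperatureBiasLUT([-40, 0, 25, 60, 85], [1.90, 1.85, 1.80, 1.74, 1.70])
compensate_array(0, lut,
                 read_temperature=lambda a: 27,
                 apply_cg_bias=lambda a, v: print(f"array {a}: VCG = {v} V"))
```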
[0163] FIG. 38 depicts array 3801. Array 3801 can be used, for example, for any of arrays 3701-0, ..., 3701-n in FIG. 37. In this embodiment, different bias voltages (e.g., VCG) can be used for different sub-arrays 3802-0, ..., 3802-k that are contained within the same array 3801, i.e., array 3801 is partitioned into multiple sub-arrays. For example, each sub-array 3802-0, ..., 3802-k can receive its own temperature bias 3803-0, ..., 3803-k, respectively. In addition to allowing for compensation based on the specific operating temperatures measured at different locations within array 3801, this embodiment also would be suitable for a situation where different types of weights are stored in each sub-array 3802. For example, sub-array 3802-0 might store weights in the range 0-30 nA, sub-array 3802-1 might store weights in the range 30-60 nA, and so forth, since each current range may need different temperature biases.
[0164] This embodiment also would be suitable for a situation where the memory cells in different sub-arrays operate in different modes (regions). For example, the cells in sub-array 3802-0 might operate in the sub-threshold mode whereas the cells in sub-array 3802-k might operate in the linear mode, since different modes (regions) may need different temperature biases.
[0165] FIG. 39 depicts neural network array 3900. In this embodiment, the teachings as to FIG. 38 are extended to m+1 arrays 3901-0, ..., 3901-m in neural network array 3900. Each array 3901 is divided into k+1 sub-arrays 3902-0a, ..., 3902-ka (where a is the array number ranging from 0 to m). Each sub-array 3902 receives its own temperature bias 3903-0a, ..., 3903-ka, respectively. It is to be further understood that each array 3901 could be divided into a different number of sub-arrays and need not be divided into the same number of sub-arrays as other arrays 3901.
[0166] FIG. 40A depicts neural network method 4000, which is performed on array 4001. In a typical neural network read (inference) operation within a single layer, a digital input value DIN[m:0] is applied to array 4001, which results in a digital output DOUT[n:0] (or alternatively, an analog value). Array 4001 can be an array or a portion of an array.
[0167] In method 4000, criteria are used to find one or more values in lookup table 4003. The criteria might include, for example, the desired input and output values, current operating temperature values, and whether the goal is lowest power consumption, a target performance (e.g., accuracy or latency), or performance at a certain temperature. Lookup table 4003 then provides biases based on those criteria. Thereafter, the biases are applied to array 4001 during the read operation, which completes method 4000. Array 4001 can comprise non-volatile memory cells or volatile memory cells.
[0168] FIG. 40B depicts a bias look up table (BLUT) 4020. Array 4021 is an array or a portion of an array of volatile or non-volatile memory cells. Array 4021 receives a digital input, DIN[m:0], and outputs a digital output, DOUT[n:0]. The digital output data pattern is programmable depending on the desired output, such as from a linear or sub-threshold memory cell relation, or from silicon characterization data, without limitation. The digital output data, DOUT[n:0], is then applied to digital-to-analog converter 4022, which outputs a desirable analog bias voltage to be applied to the array, or sub-array, in question. BLUT 4020 is used, for example, to provide bias values in conjunction with a temperature sensor, i.e., temperature biases, to improve the neural network performance.
[0169] FIG. 41 depicts bias generation circuit 4100. Temperature sensor 4101 senses an operating temperature and indicates the operating temperature with digital bits D[m:0]. Optionally, a timer 4104 can initiate the temperature sensing and subsequent bias generation, for example every 10-100 ms (approximately the time the silicon takes to increase by one degree Celsius, with one degree Celsius as the allowable temperature change that does not significantly affect the network performance). Those D[m:0] bits are used to perform a lookup in lookup table 4102 to find the bias value that should be applied based on that operating temperature, i.e., the appropriate temperature bias. The bias value is indicated with digital bits D[k:0], which are provided to digital-to-analog converter 4103, which converts the digital bits into a bias voltage, which can then be applied to terminals of memory cells (e.g., control gate terminals) in an array during a read (inference) operation.
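A behavioral sketch of bias generation circuit 4100 is shown below, under stated assumptions: the table contents, bit widths, DAC reference voltage, and the callables read_temp_code and apply_bias are illustrative only and are not values from the disclosure.

```python
import time

# Toy contents for lookup table 4102: a temperature code maps to a DAC code.
TEMP_CODE_TO_BIAS_CODE = {code: max(0, 200 - 2 * code) for code in range(64)}

def dac(code, vref=2.5, bits=8):
    """Ideal DAC 4103: convert a k-bit code to an analog bias voltage."""
    return vref * code / (2 ** bits - 1)

def bias_service(read_temp_code, apply_bias, period_s=0.05, cycles=3):
    """Timer 4104: periodically re-sense temperature and refresh the CG bias."""
    for _ in range(cycles):
        bias_code = TEMP_CODE_TO_BIAS_CODE[read_temp_code()]   # lookup in LUT 4102
        apply_bias(dac(bias_code))                             # apply to CG terminals during read
        time.sleep(period_s)                                   # e.g., every 10-100 ms

bias_service(read_temp_code=lambda: 25,
             apply_bias=lambda v: print(f"CG bias = {v:.3f} V"))
```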
[0170] FIG. 42 depicts scaling circuit 4200. Temperature sensor 4201 senses an operating temperature and indicates the operating temperature with digital bits D[n:0]. Those digital bits are provided to scaler 4202, which also receives output neuron current, Ineu, from an array as a result of a neuron read operation. Scaler 4202 performs current-to-voltage conversion of Ineu and scales that signal based on D[n:0]. For example, for the sub-threshold region, higher temperatures result in higher neuron current (due to higher memory cell current), hence it is desirable to scale down this current before it is applied to ADC 4203. For the linear region, higher temperatures typically result in lower neuron current (due to lower cell current), hence it is desirable to scale up this current before it is applied to ADC 4203. The result is a more balanced analog value over temperature that is provided to analog-to-digital converter 4203, resulting in digital output bits D[n:0] that represent the scaled, digital version of Ineu, which scaling at least partially compensates for the sensed operating temperature.
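The following sketch shows only the direction of correction performed by scaler 4202. The 0.7 %/degree coefficient, reference temperature, and gain law are illustrative assumptions, not characterized values.

```python
def scale_neuron_current(i_neu_a, temp_c, region, t_ref_c=25.0, coeff_per_c=0.007):
    """Return a temperature-balanced current to present to ADC 4203 (illustrative model)."""
    delta = temp_c - t_ref_c
    if region == "subthreshold":
        gain = 1.0 / (1.0 + coeff_per_c * delta)   # current rises with temperature: scale down
    else:  # "linear"
        gain = 1.0 + coeff_per_c * delta           # current falls with temperature: scale up
    return i_neu_a * gain

print(scale_neuron_current(100e-9, 85, "subthreshold"))  # hot die: current scaled down
print(scale_neuron_current(100e-9, 85, "linear"))        # hot die: current scaled up
```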
[0171] FIG. 43 depicts scaling circuit 4300, which is an implementation of scaler ITV (current to voltage converter) 4202 and analog-to-digital converter 4203 from FIG. 42. Scaler 4202 has a programmable gain, which may be programmed by programming an R value (for the ITV circuit that uses R to convert the neuron current into a voltage to be digitized by the ADC) or a C value (for the ITV circuit that uses C to convert the neuron current into a voltage to be digitized by the ADC). Scaler 4202 can also be implemented as a programmable current mirror (for the neuron (bitline) current). ADC 4203 is a programmable n-bit ADC, where n can be, for example, 4 or 8 or 12 bits.
[0172] FIG. 44A depicts calibration circuit 4400, and FIG. 44B depicts calibration method 4450 that utilizes calibration circuit 4400 to populate lookup table 4470 with values. Current digital-to-analog converter 4402 is coupled to the bit line(s) of memory cell(s) 4401 and to the non-inverting input of comparator 4403, which also receives a reference voltage VREF at its inverting input. The memory cell(s) 4401 can be a single cell or a plurality of cells (e.g., from a reference array or a portion of a main array).
[0173] As stated above, each non-volatile or volatile memory cell used in the analog neural memory system is to be erased and programmed to hold a very specific and precise amount of charge, i.e., the number of electrons, in the floating gate. For example, each floating gate should hold one of N different values, where N is the number of different weights that can be indicated by each cell. Examples of N include 16, 32, 64, 128, and 256. Calibration method 4450 is performed for each of the N different values that can be stored in memory cell 4401. Each time calibration method 4450 is performed, memory cell 4401 is programmed (tuned) to 1 of the N different values, such as a read current of 10 nA (step 4451).
[0174] The voltage on the control gate of memory cell 4401 is measured in accordance with calibration method 4450. The bitline current is varied by current digital-to-analog converter 4402 from a low current (such as 1 nA) to a high current (such as 100 nA), such that currents of increasing size are applied, and the output of comparator 4403 (referred to as a comparison output) is monitored. At some point, the comparison output will change in value (e.g., from a “0” to a “1”) (step 4452). When the flip occurs, i.e., before any further change in the bitline current by current digital-to-analog converter 4402, the control gate voltage of memory cell 4401 is measured, and that control gate voltage can be stored in lookup table 4470. The method is repeated for the other possible values that can be stored in the memory cell. If more than one cell is used, then the currents provided by the current DAC (IDAC) need to be adjusted accordingly; for example, if 4 cells are used at 1 nA each (for example, for averaging), then the IDAC current is 4 nA. The resulting CG voltages are stored in lookup table 4470 (step 4454).
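A minimal sketch of this sweep for a single stored level is shown below. The helpers cell_bitline_voltage and measure_cg_voltage are hypothetical stand-ins for circuit 4400; the 1-100 nA ramp mirrors the example values in the text.

```python
def calibrate_level(cell_bitline_voltage, measure_cg_voltage, vref=0.6,
                    i_start=1e-9, i_stop=100e-9, i_step=1e-9):
    """Ramp the forced bit-line current until comparator 4403 flips, then capture VCG."""
    previous_output = None
    i = i_start
    while i <= i_stop:
        output = cell_bitline_voltage(i) > vref        # comparator 4403 comparison output
        if previous_output is not None and output != previous_output:
            return measure_cg_voltage()                # capture VCG at the flip (to store in LUT 4470)
        previous_output = output
        i += i_step                                    # currents of increasing size (step 4452)
    return None                                        # no flip observed within the ramp

# Toy usage: pretend the bit line crosses VREF once the forced current exceeds 10 nA.
vcg = calibrate_level(lambda i: 0.3 if i <= 10e-9 else 0.9, lambda: 1.62)
print(vcg)  # value that would be stored in lookup table 4470 for this level
```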
[0175] In another embodiment, lookup table 4470 is further expanded to include values for a plurality of temperatures within the expected operating range, such that lookup table 4470 is a temperature bias lookup table (TBLUT).
[0176] For example, in a situation where N=128 (which corresponds to an 8-bit input value), an equivalent current range might be 1 nA to 128 nA, with each 1 nA increment associated with one of the N levels. Calibration circuit 4400 and calibration method 4450 are then used to populate lookup table 4470 with CG voltages for all 128 levels for each of a plurality of different temperatures (e.g., -40° C, -39° C, ..., 0° C, ..., 25° C, 26° C, ..., 85° C). If, for example, 10 different temperature points are used for N=128, then lookup table 4470 will be populated with 1280 values (one value for each of the 128 levels at each of the 10 different temperatures).
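One possible in-memory layout for such a temperature bias lookup table is sketched below. The nested-dictionary structure, the filler bias values, and the nearest-temperature retrieval policy are illustrative assumptions; real entries would come from calibration method 4450.

```python
TEMPS_C = [-40, -25, -10, 0, 10, 25, 40, 55, 70, 85]   # 10 calibration temperatures (assumed)
N_LEVELS = 128                                          # 1 nA .. 128 nA in 1 nA steps

tblut = {
    t: {level: 1.50 + 0.004 * level - 0.001 * (t - 25) for level in range(1, N_LEVELS + 1)}
    for t in TEMPS_C
}
assert sum(len(v) for v in tblut.values()) == 1280      # one entry per (level, temperature)

def cg_bias(level, temp_c):
    """Return the stored CG bias for a weight level at the nearest calibrated temperature."""
    nearest_t = min(TEMPS_C, key=lambda t: abs(t - temp_c))
    return tblut[nearest_t][level]

print(round(cg_bias(level=10, temp_c=27), 3))           # e.g., bias for a 10 nA weight near 25 C
```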
[0177] In another calibration method, a plurality of cells are used to store currents (weights) that represent samples in the array. A bias current from IDAC 4402 is then applied, and the CG voltage is extracted as above for each of the plurality of cells and their corresponding stored values (weights). This can be determined over temperature and stored in a lookup table, so that the CG bias changes over temperature can be recalled from the lookup table for different stored values (weights) and applied to the arrays based on the stored value for the cell in question. Optionally, this can be performed in real-time and the biases applied to various cells in the array during operation.
[0178] In another embodiment, calibration circuit 4400 and calibration method 4450 of FIGS. 44A and 44B can be used to calculate an average of the CG voltage to be applied for each of the N levels for each of the plurality of different temperatures. For example, for each value of N and each temperature, M different readings can be taken and the average reading stored in lookup table 4470. If, for example, 10 different temperature points are used for N=128, then 1280*M readings will be taken, with 1280 different averages stored in lookup table 4470.
[0179] In another embodiment, instead of taking measurements for all N possible values for each of the plurality of temperatures, measurements instead can be taken for a smaller set of possible values (e.g., for 4 of the N possible values instead of all N possible values), and the average of that smaller set of possible values can be stored in lookup table 4470 for the particular temperature used. Thus, if 10 different temperatures are used, then lookup table 4470 will contain only 10 values (one value for each of the 10 different temperatures).
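The reduced-table variant can be sketched as follows; the sampled levels, the toy measurement model, and the helper measure_cg are illustrative assumptions only.

```python
def build_reduced_lut(temps_c, sample_levels, measure_cg):
    """Return {temperature: average CG bias over the sampled levels} (one entry per temperature)."""
    return {
        t: sum(measure_cg(level, t) for level in sample_levels) / len(sample_levels)
        for t in temps_c
    }

toy_measure = lambda level, t: 1.50 + 0.004 * level - 0.001 * (t - 25)
lut = build_reduced_lut([-40, 0, 25, 85], sample_levels=[16, 48, 80, 112], measure_cg=toy_measure)
print(lut)   # one averaged bias per temperature instead of N entries per temperature
```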
[0180] In another embodiment, the EG bias voltage is also varied. Measurements of the CG voltage are taken at different EG bias voltages, and CG and EG biases are stored in lookup table 4470.
[0181] FIG. 45 depicts bias averaging circuit 4500 for determining an average bias based on measurements performed on n+1 different memory cells. Calibration method 4450 is performed on n+1 different cells, each resulting in a voltage (e.g., VCG) that represents the “optimal” or average bias voltage for that cell.
[0182] Each cell is associated with a measuring block 4501, here shown as measuring blocks 4501-0 through 4501-n. Each measuring block 4501 is identical. Measuring block 4501-0 comprises operational amplifier 4502-0, PMOS transistors 4503-0 and 4504 arranged as a current mirror, NMOS transistor 4505-0, and resistor 4506-0. The other measuring blocks 4501 contain identical components. During operation, each measuring block 4501 contributes a mirrored current through its PMOS transistor 4504, which currents are summed at the top terminal of resistor 4507, which may be a variable resistor. The output, VOUT, is the average of the various voltages that were provided as inputs to blocks 4501 (by proper choice of the ratio of resistor 4507 to resistors 4506). The output voltage is VOUT = (R4507/R4506) * (VIN0 + VIN1 + ... + VINn). For example, with n = 3 and R4507/R4506 = ¼, VOUT = (¼) * (VIN0 + VIN1 + VIN2 + VIN3), which is the average of the four input voltages VIN0 through VIN3.
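Equivalently, in compact form (using the resistor designations of FIG. 45):

```latex
V_{OUT}=\frac{R_{4507}}{R_{4506}}\sum_{i=0}^{n}V_{IN_i},
\qquad \frac{R_{4507}}{R_{4506}}=\frac{1}{n+1}
\;\Rightarrow\; V_{OUT}=\frac{1}{n+1}\sum_{i=0}^{n}V_{IN_i}
```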
[0183] The output voltage, VOUT, can be applied as a bias to a control gate terminal of one or more cells in the neural network memory array.
[0184] FIG. 46A depicts bias generation block 4600. Bias generation block 4600 comprises current digital-to-analog converter 4602 coupled to the bit line of memory cell 4601 and to a non-inverting input of comparator 4603, which comparator 4603 also receives a reference voltage VREF at its inverting input (where VREF is the same VREF shown in FIG. 44A). Row registers 4604 provide a digital value, DRIN[0:7], to IDAC 4602, which converts the digital value into a current applied to the bit line terminal of cell 4601. An external voltage, VIN, is applied to the CG terminal when switch 4605 is closed. Switch 4606 is closed, and capacitor 4607 is charged to the same voltage as CG. When the output of comparator 4603 changes, switch 4606 is opened; the voltage of capacitor 4607 at that point represents the CG voltage that caused the output of comparator 4603 to change, which is a determined bias voltage. That is, switch 4606 and capacitor 4607 form a sample-and-hold circuit. That voltage is held steady by buffer 4608 and then applied to control gates in an array. The memory cell 4601 can be operated in the sub-threshold region or the linear region.
[0185] FIG. 46B depicts bias generation block 4650, which is similar to bias generation block 4600 except the memory cell 4651 is diode connected to generate the CG bias and does not use a comparator. Bias generation block 4650 can be used in FIG. 44A to generate CG bias values for look up table 4470. Bias generation block 4650 comprises current digital-to-analog converter 4652 coupled to the bit line of memory cell 4651. Current digital-to-analog converter 4652 is controlled by row registers 4654. The voltage on control gate of cell 4651 is sampled by switch 4656, which then charges capacitor 4657 to that voltage, which capacitor 4657 holds the voltage after switch 4656 is opened. That is, switch 4656 and capacitor 4657 form a sample-and-hold circuit. That voltage is held steady by buffer 4658 and then applied to control gates in an array. Memory cell 4651 can be operated in the sub-threshold region or the linear region. Bias generation block 4650 converts an input digital value DRIN[0:7] from row registers 4654 into an equivalent CG voltage to be applied to the array.
[0186] FIG. 46C depicts bias generation block 4680, which is similar to bias generation block 4650 except that it adds level shifter 4685. Bias generation block 4680 can be used in FIG. 44A to generate CG bias values for look up table 4470. Bias generation block 4680 comprises current digital-to-analog converter 4652 coupled to the bit line of memory cell 4651. Current digital-to-analog converter 4652 is controlled by row registers 4654. Level shifter 4685 is placed between the output of current digital-to-analog converter 4652 and the control gate terminal of memory cell 4651, and shifts the voltage by, for example, a bias voltage (e.g., 0.2 V-0.5 V). The voltage on the control gate of cell 4651 is sampled by switch 4656, which then charges capacitor 4657 to that voltage, which capacitor 4657 holds after switch 4656 is opened. That is, switch 4656 and capacitor 4657 form a sample-and-hold circuit. That voltage is held steady by buffer 4658 and then applied to control gates in an array. Memory cell 4651 can be operated in the sub-threshold region or the linear region. Bias generation block 4680 converts an input digital value DRIN[0:7] from row registers 4654 into an equivalent CG voltage to be applied to the array.
[0187] FIG. 47 depicts a neural network neuron method 4700 performed on a particular neuron within a neural network. In step 4701, nominal biases are applied to the particular neurons of interest of the array. This method might be performed on a neuron that is deemed more important due to its frequency of use. Steps 4702 to 4706 are identical to steps 3602 to 3606 in FIG. 36.
[0188] FIG. 48 depicts neural network method 4800. The method 4800 comprises sensing an operating temperature associated with a first set of memory cells (step 4801); determining a bias in a lookup table based on the sensed operating temperature (step 4802); applying the determined bias to terminals of the first set of memory cells (step 4803); and performing a read operation on the first set of memory cells (step 4804). Optionally, the first set of memory cells can comprise all cells in an array. Optionally, the first set of memory cells can comprise all cells in all arrays. Optionally, method 4800 further comprises sensing an operating temperature associated with a second set of memory cells (step 4805); determining a bias in a lookup table based on the second sensed operating temperature (step 4806); applying the determined bias to terminals of the second set of memory cells (step 4807); and performing a read operation on the second set of memory cells (step 4808).
[0189] FIG. 49 depicts neural network operation method 4900, which is similar to neural network operation method 4800 except that bias calibration is performed in real time. Neural network operation method 4900 comprises sensing an operating temperature associated with a first set of memory cells (step 4901); determining a bias based on the sensed operating temperature (step 4902); applying the determined bias to terminals of the first set of memory cells (step 4903); and performing a read operation on the first set of memory cells (step 4904). Optionally, the first set of memory cells can comprise all cells in an array. Optionally, the first set of memory cells can comprise all cells in all arrays. Optionally, method 4900 further comprises sensing an operating temperature associated with a second set of memory cells (step 4905); determining a bias based on the second sensed operating temperature (step 4906); applying the determined bias to terminals of the second set of memory cells (step 4907); and performing a read operation on the second set of memory cells (step 4908).
[0190] FIG. 50 depicts neural network method 5000, which comprises programming one or more memory cells (step 5001); applying a plurality of currents to the programmed memory cells (step 5002); measuring a voltage of a control gate terminal of each programmed memory cell and storing the voltage as a determined bias for a cell storing the value stored in that programmed memory cell (step 5003); applying bias voltages to terminals of a set of memory cells using the determined biases for the values stored in the set of memory cells (step 5004); and performing a read operation on the set of memory cells (step 5005).
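A minimal sketch of the flow of method 5000 is shown below. The callables determine_cg_bias, apply_bias, and read_cell, and the toy characterization result, are hypothetical placeholders for the calibration and array-access hardware described above.

```python
def run_method_5000(values, currents_na, determine_cg_bias, apply_bias, read_cell, array_cells):
    determined = {}
    for value in values:                                     # step 5001: program reference cell(s)
        determined[value] = determine_cg_bias(value, currents_na)   # steps 5002-5003: sweep and capture VCG
    for cell, value in array_cells.items():
        apply_bias(cell, determined[value])                  # step 5004: bias chosen per stored value
    return {cell: read_cell(cell) for cell in array_cells}   # step 5005: read operation

out = run_method_5000(
    values=[10, 20],                                     # nA levels characterized (assumed)
    currents_na=range(1, 101),                           # plurality of applied currents
    determine_cg_bias=lambda v, c: 1.5 + 0.01 * v,       # toy characterization result
    apply_bias=lambda cell, bias: None,
    read_cell=lambda cell: 1,
    array_cells={"cell_0": 10, "cell_1": 20},
)
print(out)
```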
[0191] It should be noted that, as used herein, the terms “over” and “on” both inclusively include “directly on” (no intermediate materials, elements or space disposed therebetween) and “indirectly on” (intermediate materials, elements or space disposed therebetween). Likewise, the term “adjacent” includes “directly adjacent” (no intermediate materials, elements or space disposed therebetween) and “indirectly adjacent” (intermediate materials, elements or space disposed there between), “mounted to” includes “directly mounted to” (no intermediate materials, elements or space disposed there between) and “indirectly mounted to” (intermediate materials, elements or spaced disposed there between), and “electrically coupled” includes “directly electrically coupled to” (no intermediate materials or elements there between that electrically connect the elements together) and “indirectly electrically coupled to” (intermediate materials or elements there between that electrically connect the elements together). For example, forming an element “over a substrate” can include forming the element directly on the substrate with no intermediate materials/elements therebetween, as well as forming the element indirectly on the substrate with one or more intermediate materials/elements there between.