Single-readout high-density memristor crossbar
10340001 · 2019-07-02
Assignee
Inventors
- Mohammed Affan Zidan (Thuwal, SA)
- Hesham Omran (Thuwal, SA)
- Rawan Naous (Thuwal, SA)
- Ahmed Sultan Salem (Thuwal, SA)
- Khaled Nabil SALAMA (Thuwal, SA)
Cpc classification
G11C2013/0057
PHYSICS
G11C2013/005
PHYSICS
H10B63/84
ELECTRICITY
G11C13/0033
PHYSICS
G11C13/0009
PHYSICS
International classification
Abstract
Methods are provided for mitigating problems caused by sneak-paths current during memory cell access in gateless arrays. Example methods contemplated herein employ adaptive-threshold readout techniques that exploit the locality and hierarchy properties of the computer memory system to address this sneak-paths problem. The method of the invention is a method for reading a target memory cell located at an intersection of a target row of a gateless array and a target column of the gateless array, the method comprising: reading a value of the target memory cell; and calculating an actual value of the target memory cell based on the read value of the memory cell and a component of the read value caused by sneak path current. Utilizing either an initial bits strategy or a dummy bits strategy in order to calculate the component of the read value caused by sneak path current, example embodiments significantly reduce the number of memory accesses per pixel for an array readout. In addition, these strategies consume an order of magnitude less power in comparison to alternative state-of-the-art readout techniques.
Claims
1. A method for reading a target memory cell located at an intersection of a target row of a high-density gateless array and a target column of the high-density gateless array, the method comprising: reading a value of the target memory cell; calculating an actual value of the target memory cell based on the read value of the memory cell and a component of the read value caused by a sneak path current; and calculating the component of the read value caused by the sneak path current prior to calculating the actual value of the target memory cell, wherein calculating the component of the read value caused by the sneak path current includes: estimating a value of an initial memory cell, reading a value of the initial memory cell, and calculating the component of the read value caused by the sneak path current based on the estimated value of the initial memory cell and the read value of the initial memory cell.
2. The method of claim 1, wherein, for each row in the set, estimating a value of a first memory cell comprises: reading the value of the first memory cell a plurality of times; and calculating the estimated value of the first memory cell based on reading the value of the first memory cell a plurality of times.
3. The method of claim 1, wherein calculating the component of the read value caused by sneak path current comprises: storing a known value in a dummy memory cell located in the target row; reading a value of the dummy memory cell; and calculating the component of the read value caused by sneak path current based on the known value stored in the dummy memory cell and the read value of the dummy memory cell.
4. The method of claim 1, wherein reading a value of a particular memory cell comprises: identifying a row of the high-density gateless array and a column of the high-density gateless array that intersect at the particular memory cell; connecting all remaining rows of the high-density gateless array to a first common node; and connecting all remaining columns of the high-density gateless array to a second common node.
5. The method of claim 4, wherein reading the value of the particular memory cell further comprises: biasing the rows connected to the first common node to a first predefined voltage and the columns connected to the second common node to a second predefined voltage.
6. The method of claim 5, wherein the first predefined voltage and the second predefined voltage are equal.
7. An apparatus for reading a target memory cell located at an intersection of a target row of a high-density gateless array and a target column of the high-density gateless array, the apparatus comprising a processor and a memory storing computer-executable instructions, that, when executed by the processor, cause the apparatus to: read a value of the target memory cell; calculate an actual value of the target memory cell based on the read value of the memory cell and a component of the read value caused by a sneak path current; and calculate the component of the read value caused by the sneak path current prior to calculating the actual value of the target memory cell, wherein the component of the read value caused by the sneak path current is calculated by the apparatus by: estimating a value of an initial memory cell, reading a value of the initial memory cell, and calculating the component of the read value caused by the sneak path current based on the estimated value of the initial memory cell and the read value of the initial memory cell.
8. The apparatus of claim 7, wherein, for each row in the set, estimating a value of a first memory cell comprises: reading the value of the first memory cell a plurality of times; and calculating the estimated value of the first memory cell based on reading the value of the first memory cell a plurality of times.
9. The apparatus of claim 7, wherein the computer-executable instructions, when executed by the processor, further cause the apparatus to calculate the component of the read value caused by sneak path current by causing the apparatus to: store a known value in a dummy memory cell located in the target row; read a value of the dummy memory cell; and calculate the component of the read value caused by sneak path current based on the known value stored in the dummy memory cell and the read value of the dummy memory cell.
10. The apparatus of claim 7, wherein reading a value of a particular memory cell includes: identifying a row of the high-density gateless array and a column of the high-density gateless array that intersect at the particular memory cell; connecting all remaining rows of the high-density gateless array to a first common node; and connecting all remaining columns of the high-density gateless array to a second common node.
11. The apparatus of claim 10, wherein reading the value of the particular memory cell further includes: biasing the rows connected to the first common node to a first predefined voltage and the columns connected to the second common node to a second predefined voltage.
12. The apparatus of claim 11, wherein the first predefined voltage and the second predefined voltage are equal.
13. A computer program product for reading a target memory cell located at an intersection of a target row of a high-density gateless array and a target column of the high-density gateless array, the computer program product comprising a computer-readable storage medium storing computer-executable instructions that, when executed, cause an apparatus to: read a value of the target memory cell; calculate an actual value of the target memory cell based on the read value of the memory cell and a component of the read value caused by a sneak path current; and calculate the component of the read value caused by the sneak path current prior to calculating the actual value of the target memory cell, wherein calculating the component of the read value caused by the sneak path current is achieved by causing the apparatus to: estimate a value of an initial memory cell, read a value of the initial memory cell, and calculate the component of the read value caused by the sneak path current based on the estimated value of the initial memory cell and the read value of the initial memory cell.
14. The computer program product of claim 13, wherein the computer-executable instructions, when executed, further cause the apparatus to calculate the component of the read value caused by sneak path current by causing the apparatus to: store a known value in a dummy memory cell located in the target row; read a value of the dummy memory cell, wherein reading a value of a particular memory cell includes: identifying a row of the high-density gateless array and a column of the high-density gateless array that intersect at the particular memory cell, connecting all remaining rows of the high-density gateless array to a first common node, connecting all remaining columns of the high-density gateless array to a second common node, and biasing the rows connected to the first common node and the columns connected to the second common node to a first predefined voltage; and calculate the component of the read value caused by sneak path current based on the known value stored in the dummy memory cell and the read value of the dummy memory cell.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Having thus described certain example embodiments of the present disclosure in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
DETAILED DESCRIPTION
(20) Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the inventions are shown. Indeed, these inventions may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.
Sneak Paths Analysis
(21) Sneak-paths impact the performance of a crossbar-based system in two ways. First, a considerable amount of undesirable energy is consumed while current sneaks throughout the array cells. Second, the sneak currents cannot be predicted because they are data dependent. Data stored in a memory array is naturally random, which leads to a random sneak-paths resistance. This is translated into having distributions to represent the One and Zero values rather than a single value. In addition, the magnitude of the sneak-current is typically higher than the current of the desired memory cell; as a result, the distributions for the two binary values are highly overlapped, as shown in
(22) A crossbar can be accessed using two general modes. The first approach is the floating terminals accessing mode, in which the array is accessed through the desired row and column and the other terminals are kept floating, as shown in
Sneak-Paths Correlation
(23) An advantage of the connected terminals accessing mode is its simple model. In the case of biasing all the unselected terminals at V.sub.DD/2, the sneak-paths resistance is made of R.sub.r and R.sub.c only, while R.sub.a is shorted out, since nodes n.sub.3 and n.sub.4 are held at the same potential.
(24)
where R.sub.x is the resistance of a one-row cell, and L is the array length. The row cell resistance can be either R.sub.on or R.sub.off, which are the ON and OFF resistance of the device under V.sub.dd/2 voltage drop respectively. The row resistance can be rewritten as,
(25)
where N.sub.on is the number of ON cells within the accessed row not counting the accessed cell itself. The sneak-paths component due to the accessed row (R.sub.r) can be derived similarly.
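The expressions referenced in the two preceding paragraphs are not reproduced in this text. The following is therefore only a sketch under an assumed model: with all unselected lines tied to common nodes, the L-1 unselected cells on the accessed line are treated as resistors in parallel, N.sub.on of them at R.sub.on and the rest at R.sub.off. The function name and the numeric values are illustrative and do not come from the source.

```python
def sneak_resistance(n_on: int, length: int, r_on: float, r_off: float) -> float:
    """Parallel combination of the (length - 1) unselected cells on one line.

    Assumed model (not taken verbatim from the source): n_on cells at r_on
    and (length - 1 - n_on) cells at r_off, all connected in parallel.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    n_off = length - 1 - n_on
    if n_on < 0 or n_off < 0:
        raise ValueError("n_on must lie between 0 and length - 1")
    conductance = n_on / r_on + n_off / r_off
    return 1.0 / conductance


# Illustrative values only: 512-cell line, R_on = 1 kOhm, R_off = 1 MOhm.
r_sneak = sneak_resistance(n_on=255, length=512, r_on=1e3, r_off=1e6)
```

Under this model the ON cells dominate the result whenever the OFF/ON ratio is large, which is why the count N.sub.on is the quantity of interest.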
(26) For practical array size, the values of R.sub.r and R.sub.c are almost constant over the same row or column, respectively. For instance, the sneak-paths row resistances found at two different locations in the same row have all cells in common except the two cells that are swapped because of the accessed locations. For devices with a large OFF/ON ratio, the relative change in the sneak-paths row resistance is given by:
(27)
where is the OFF/ON ratio of the used device. The maximum relative change in the row resistance versus the array size for a balanced number of zeros and ones is plotted in
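As a rough numerical illustration of the correlation argument, the sketch below compares the sneak resistance seen from two locations on the same line whose unselected sets differ by a single ON cell. It assumes the same parallel-resistor model as above, a balanced data pattern, and illustrative device values; it is not the expression from the source, which is not reproduced in this text.

```python
def sneak_resistance(n_on: int, length: int, r_on: float, r_off: float) -> float:
    """Assumed model: the (length - 1) unselected cells act in parallel."""
    n_off = length - 1 - n_on
    return 1.0 / (n_on / r_on + n_off / r_off)


def max_relative_change(length: int, r_on: float, r_off: float) -> float:
    """Relative change when the accessed location swaps one ON cell for one OFF cell."""
    n_on = (length - 1) // 2  # balanced number of zeros and ones (assumption)
    r_a = sneak_resistance(n_on, length, r_on, r_off)
    r_b = sneak_resistance(n_on + 1, length, r_on, r_off)
    return abs(r_a - r_b) / r_a


# Illustrative device values: R_on = 1 kOhm, R_off = 1 MOhm.
changes = {n: max_relative_change(n, 1e3, 1e6) for n in (64, 128, 256, 512)}
for n, change in changes.items():
    print(f"L = {n:4d}: relative change = {change:.3%}")
```

In this model the change shrinks roughly in proportion to 1/L, which is in line with the statement that the sneak-paths resistance is almost constant along a line for arrays of practical size.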
Adaptive-Threshold Readout
(28) The sneak-paths correlation property can be effectively utilized in the case of sequential reading of the data stored on an array, which is the typical memory access scheme in computer systems. The cache fetches a block of data from the RAM, just as the RAM does from the HDD. Data is thus transferred and shared between different memory layers as a block of contiguous bits, rather than as random bits or words. As a result, a memory layer is accessed to read or write blocks of adjacent memory cells in sequence. This is possible because of the data locality property: when a bit location is accessed, its neighbors are likely to be accessed too.
(29) The locality property is of help only if the knowledge gained from reading a single bit can be applied when reading that bit's neighbors. This is true for the connected terminals crossbar, where the values of R.sub.r and R.sub.c can be safely shared over the same row or column respectively, as discussed in the previous sections. This is equivalent to defining an adaptive threshold that changes at each new row readout, which can be achieved with the aid of the connected terminals crossbar structure.
(30) The generic connected terminals circuit model shown in
I.sub.sense = I.sub.m + I.sub.r, (4)
where I.sub.m is the desired current and I.sub.r is the row sneak current component. Sensing from node n.sub.2 swaps the locations and the role of R.sub.r and R.sub.c in the circuit, as shown in
Multi-Read for Initial Bits
(31) Each bit generally has two unknowns: R.sub.m and R.sub.r (or R.sub.c). Without adopting sneak-paths correlation and locality, multiple access stages are needed to estimate the bit value. However, a faster readout can be achieved by categorizing the bits into two types: the initial bits, which are the first bits accessed in a given row, and regular bits, which are any other bits in the array. To estimate the value of the initial bit, two unknowns need to be calculated: the desired resistance (R.sub.m), and the row sneak resistance (R.sub.r). However, the remaining bits in the row share the same R.sub.r value, and I.sub.r is treated as a threshold for a given row. Any of the readout techniques presented in the literature, such as Vontobel, P. O. et al., Writing to and reading from a nano-scale crossbar memory based on memristors, Nanotechnology 20, 425204 (2009), can be used to estimate the initial bit. For instance, a multistage readout procedure may adopt multiple reads and writes per cell to estimate both the sneak-paths and the desired current components. The readout for the initial bit dictates the threshold used for the remaining bits in that row. In the case of a Zero bit, the threshold (R.sub.th or I.sub.th) can be calculated from the readout for the initial bit (R.sub.ib or I.sub.ib) as follows:
R.sub.th = R.sub.ib - C.sub.r, where C.sub.r = 0.5(R.sub.off - R.sub.on)
I.sub.th = I.sub.ib + C.sub.i, where C.sub.i = 0.5(I.sub.on - I.sub.off)
where R.sub.on and R.sub.off are the ON and OFF resistances of the memory cell, respectively, and I.sub.on and I.sub.off are the corresponding ON and OFF currents.
(32) In the case of a One bit, the threshold (R.sub.th or I.sub.th) can be calculated from the readout for the initial bit (R.sub.ib or I.sub.ib) as follows:
R.sub.th = R.sub.ib + C.sub.r, where C.sub.r = 0.5(R.sub.off - R.sub.on)
I.sub.th = I.sub.ib - C.sub.i, where C.sub.i = 0.5(I.sub.on - I.sub.off)
where, as with the Zero bit case above, R.sub.on and R.sub.off are the ON and OFF resistances of the memory cell, respectively, and I.sub.on and I.sub.off are the corresponding ON and OFF currents. The resistance or current of each subsequent memory cell can then be measured and compared to this threshold to estimate the actual value of the remaining memory cells in the row.
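The current-domain threshold rule for the initial bit can be sketched as follows. This is a minimal illustration, assuming that C.sub.i is half the difference between the ON and OFF cell currents and that the sensed current is the sum of the cell current and a row sneak current shared by all bits in the row. The function names and the current values are hypothetical.

```python
def threshold_from_initial_bit(i_ib: float, initial_bit: int,
                               i_on: float, i_off: float) -> float:
    """Return the row threshold I_th from the initial-bit readout I_ib.

    Assumes C_i = 0.5 * (I_on - I_off); the threshold sits half-way between
    the sensed currents of a Zero cell and a One cell in this row.
    """
    c_i = 0.5 * (i_on - i_off)
    if initial_bit == 0:
        return i_ib + c_i
    if initial_bit == 1:
        return i_ib - c_i
    raise ValueError("initial_bit must be 0 or 1")


def classify(i_sense: float, i_th: float) -> int:
    """A sensed current above the threshold is read as One, otherwise Zero."""
    return 1 if i_sense > i_th else 0


# Illustrative currents: the row sneak component dominates the cell current.
I_ON, I_OFF, I_R = 10e-6, 0.1e-6, 100e-6
th_from_zero = threshold_from_initial_bit(I_R + I_OFF, 0, I_ON, I_OFF)
th_from_one = threshold_from_initial_bit(I_R + I_ON, 1, I_ON, I_OFF)
row_bits = [classify(I_R + i_m, th_from_zero) for i_m in (I_ON, I_OFF, I_ON)]
```

With these values both cases give the same threshold, so the remaining bits in the row are classified identically whether the initial bit happened to be a Zero or a One.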
Predefined Dummy Bits
(34) A more time-efficient way to estimate the adaptive threshold is to add dummy bits with a predefined value to the array. For a dummy bit (which may also be referred to as a predefined bit), the value of R.sub.m is known in advance, and a single readout is needed to estimate the value of R.sub.r. This estimated R.sub.r value is reused for the other bits in the same row, so that a single readout per bit suffices to estimate the remaining unknown (R.sub.m). The dummy bits can be organized in several ways, provided that each row contains a single dummy bit.
(35) In this regard, while accessing a row for the first time during a data block fetching, the threshold (R.sub.th or I.sub.th) can be measured by reading the resistance or current of the dummy (or predefined) bit (R.sub.pd or I.sub.pd). The threshold is defined as,
R.sub.th = R.sub.pd - C.sub.r, where C.sub.r = 0.5(R.sub.off - R.sub.on)
I.sub.th = I.sub.pd + C.sub.i, where C.sub.i = 0.5(I.sub.on - I.sub.off)
where R.sub.on and R.sub.off are the ON and OFF resistances of the memory cell, respectively, and I.sub.on and I.sub.off are the corresponding ON and OFF currents. The resistance or current of each subsequent memory cell can then be measured and compared to this threshold to estimate the actual value of the remaining memory cells in the row.
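The dummy-bit readout of a row can be sketched as below. The sketch assumes that the dummy bit stores a Zero, that each sensed current is the cell current plus a sneak current shared along the row, and that C.sub.i is half the difference between the ON and OFF currents. All names and current values are hypothetical.

```python
from typing import List


def sensed_currents(bits: List[int], i_r: float, i_on: float, i_off: float) -> List[float]:
    """Model I_sense = I_m + I_r for every cell in one row (shared I_r)."""
    return [i_r + (i_on if b else i_off) for b in bits]


def read_row_with_dummy(currents: List[float], i_on: float, i_off: float,
                        dummy_index: int = 0) -> List[int]:
    """Decode a row using the dummy (Zero) bit readout as the reference."""
    i_pd = currents[dummy_index]
    i_th = i_pd + 0.5 * (i_on - i_off)
    return [1 if i > i_th else 0
            for k, i in enumerate(currents) if k != dummy_index]


I_ON, I_OFF = 10e-6, 0.1e-6
data_a, data_b = [1, 0, 0, 1], [0, 1, 1, 0]
# Column 0 holds the dummy Zero; the two rows see different sneak currents.
row_a = sensed_currents([0] + data_a, i_r=100e-6, i_on=I_ON, i_off=I_OFF)
row_b = sensed_currents([0] + data_b, i_r=140e-6, i_on=I_ON, i_off=I_OFF)
decoded_a = read_row_with_dummy(row_a, I_ON, I_OFF)
decoded_b = read_row_with_dummy(row_b, I_ON, I_OFF)
```

In this example a single fixed threshold tuned for the first row would read every cell of the second row as a One, whereas the per-row threshold decodes both rows correctly.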
(36) The dummy bits technique adds less overhead to the readout process than the initial bits method, because a dummy bit only needs to be accessed a single time (in comparison to n times for an initial bit). However, for arrays of practical size (256 kb or more), the average number of array accesses per bit when fetching a block of data from memory is almost one for both methods.
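The access overhead of the two strategies can be estimated with a short calculation. The sketch assumes that the fetched block lies within a single row, that the dummy bit costs one extra read, and that the initial bit is itself a data bit read n times. The value n = 3 is an arbitrary illustration, and the 16-byte block size is the case noted under Table 1.

```python
def accesses_per_bit(block_bits: int, reference_reads: int,
                     reference_is_data: bool) -> float:
    """Average array accesses per fetched data bit for one row-local block."""
    # When the reference cell is also a data bit, its single normal read
    # is already covered by the reference reads.
    total = block_bits + reference_reads - (1 if reference_is_data else 0)
    return total / block_bits


BLOCK_BITS = 16 * 8  # 16 bytes fetched in sequence
dummy_bits = accesses_per_bit(BLOCK_BITS, reference_reads=1, reference_is_data=False)
initial_bits = accesses_per_bit(BLOCK_BITS, reference_reads=3, reference_is_data=True)
print(f"dummy bits:   {dummy_bits:.4f} accesses per bit")
print(f"initial bits: {initial_bits:.4f} accesses per bit")
```

For the dummy-bit case this gives 129/128, which is consistent with the figure of about 1.01 reads listed for a 16-byte fetch.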
(37) In order to evaluate the validity and efficiency of crossbar readout techniques, an accurate simulation platform that includes different crossbar non-idealities is a necessity. To achieve this goal, a Python script was utilized that created SPICE netlists for realistic size arrays and swept different parameters and data patterns by calling HSPICE or Cadence APS iteratively. A crossbar parasitic resistance value of 5 Ω per cell was used, and the effect of the switching circuitry was included in all of the simulations. Finally, it should be noted that resistive RAMs are built in the same hierarchy and structure as DRAMs, where subarrays of size up to 256 kb are used to reduce the capacitive loading of the metal lines. As a result, the inventors used an array size up to 256 kb for simulations and comparisons with the above-described crossbar readout techniques.
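A netlist generator of the kind described can be sketched as follows. This is not the inventors' script. It is a simplified illustration that models each memory cell as a linear resistor, adds a wire-segment resistance per cell on every row and column line, and biases the unselected lines at half the read voltage. The function name, node naming scheme, and default values are assumptions, and the sketch only builds the netlist text without invoking any simulator.

```python
from typing import List


def crossbar_netlist(data: List[List[int]], sel_row: int, sel_col: int,
                     r_on: float = 1e3, r_off: float = 1e6,
                     r_wire: float = 5.0, v_read: float = 1.0) -> str:
    """Build a SPICE netlist string for one connected-terminals read."""
    rows, cols = len(data), len(data[0])
    lines = ["* crossbar readout sketch (linear-resistor cell model)"]
    for i in range(rows):
        for j in range(cols):
            # Wire segments: row i runs left to right, column j top to bottom.
            lines.append(f"Rrw_{i}_{j} r{i}_{j} r{i}_{j + 1} {r_wire:g}")
            lines.append(f"Rcw_{i}_{j} c{i}_{j} c{i + 1}_{j} {r_wire:g}")
            # Cell at the crossing of row i and column j.
            r_cell = r_on if data[i][j] else r_off
            lines.append(f"Rcell_{i}_{j} r{i}_{j + 1} c{i}_{j} {r_cell:g}")
    for i in range(rows):
        # Selected row driven at v_read, all other rows at v_read / 2.
        v = v_read if i == sel_row else v_read / 2
        lines.append(f"Vrow_{i} r{i}_0 0 DC {v:g}")
    for j in range(cols):
        # Selected column held at 0 V (stand-in for the sense node).
        v = 0.0 if j == sel_col else v_read / 2
        lines.append(f"Vcol_{j} c{rows}_{j} 0 DC {v:g}")
    lines += [".op", ".end"]
    return "\n".join(lines)


netlist = crossbar_netlist([[1, 0], [0, 1]], sel_row=0, sel_col=1)
```

A sweep over data patterns or device parameters would call such a function in a loop and pass each netlist to the simulator of choice. A realistic study would also replace the linear resistors with a nonlinear device model.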
(38) In this regard, to verify the proposed concept, the readout operation was simulated at different locations of a 256 kb array of various NIST RAM images. In a first case, the readout locations were distributed over the array, while in a second, all the readouts were made for cells in the same column.
Crossbar Power Consumption
(39) Undesirable sneak-paths power consumption is unavoidable in high-density gateless arrays. However, it can be reduced by utilizing devices with nonlinear saturation behavior. Reducing the voltage applied to such devices by fifty percent can increase the saturation resistance by up to two orders of magnitude. This is a very attractive property, since a sneak path is made of memristor devices in series, with only a fraction of the applied voltage dropped across each of them. In the connected terminals structure, the device nonlinearity can be exploited by biasing the unused terminals to a sub-read voltage. In such a case, the very small R.sub.a is shorted out, and the nonlinearity of the other terminals is efficiently utilized.
Figure-of-Merit
(40) In general, the presented readout technique is immune to the sneak-paths problem and is more power efficient and faster than the state-of-the-art crossbar accessing techniques presented in the literature. Table 1 shows a detailed comparison between the various gateless techniques that can provide an error-free readout. The different methods are compared based on a figure-of-merit (FoM), which is defined as
(41)
where the proposed technique shows the best FoM.
(42) TABLE 1. Comparison between the state-of-the-art gateless readout techniques for a subarray of size 256 kb.
Technique | Error-Free Readout | # of Reads | # of Writes | Locality Needed | Readout Circuit* | Read Power [mW] | FoM [Tbit/cm.sup.2W]
Multi-Stage | Yes | 3 | 3 | No | ADC + Comp | 7 | 91
Multi-Port | Yes | 3 | 0 | No | ADC + Comp | 2.1 | 304
Grounded Rows & Cols | No | 1 | 0 | No | VG + Comp | 4 | 160
This Work | Yes | 1.01** | 0 | Yes | VG + Comp | 0.291 | 2195
*ADC: Analog-to-Digital Converter, Comp: Comparator, and VG: Virtual-Ground.
**The number of reads is calculated for the case of 16 bytes being fetched from the array in sequence.
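The FoM expression itself is not reproduced in this text. The check below only observes that the product of the FoM and the read power listed in Table 1 is nearly the same for every technique, which would be consistent with a figure-of-merit of the form density divided by read power. That reading is an assumption, and the variable names are illustrative.

```python
# (read power in mW, FoM in Tbit/cm^2 W) as listed in Table 1
TABLE_1 = {
    "Multi-Stage": (7.0, 91.0),
    "Multi-Port": (2.1, 304.0),
    "Grounded Rows & Cols": (4.0, 160.0),
    "This Work": (0.291, 2195.0),
}

# If FoM = density / power, then FoM * power recovers the density in Tbit/cm^2.
implied_density = {name: fom * p_mw * 1e-3 for name, (p_mw, fom) in TABLE_1.items()}

# FoM improvement of the adaptive-threshold readout over each other technique.
this_fom = TABLE_1["This Work"][1]
improvement = {name: this_fom / fom
               for name, (_, fom) in TABLE_1.items() if name != "This Work"}

for name, density in implied_density.items():
    print(f"{name:22s} implied density = {density:.3f} Tbit/cm^2")
for name, ratio in improvement.items():
    print(f"{name:22s} FoM improvement = {ratio:.1f}x")
```

The implied density comes out at roughly 0.64 Tbit/cm.sup.2 in every row, and the improvement ratios span roughly 7 to 24.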
Operations Performed by a Computing Device to Efficiently Perform Readout Operations
(43) Having stepped through a description of the adaptive threshold techniques used in example embodiments of the present invention,
(44) Turning now to the flowchart, the procedure begins at optional operation 902. In operation 902, the apparatus calculates, for a target row of a high-density gateless array, a component of the read value that is caused by sneak path current.
(45) In some embodiments, calculating this component may utilize the initial bits strategy described above. In such embodiments, operation 902 may include arranging the high-density gateless array in a connected terminals structure for accessing an initial memory cell located in the target row. Subsequently, the apparatus may estimate a value of the initial memory cell, and also read a value of the initial memory cell. The apparatus thereafter calculates the component of the read value caused by sneak path current based on the estimated value of the initial memory cell and a read value of the initial memory cell. Furthermore, estimating the value of the first memory cell may include reading the value of the first memory cell a plurality of times, and calculating the estimated value of the first memory cell based on reading the value of the first memory cell the plurality of times. As noted previously, any of the readout techniques presented in the literature can be used to estimate this value.
(46) In other embodiments, calculating the component of the read value that is caused by sneak path current may utilize the dummy bits strategy described above. In such embodiments, operation 902 may include storing a known value in a dummy memory cell located in the target row, and arranging the high-density gateless array in a connected terminals structure for accessing the dummy memory cell. Subsequently, the apparatus may read a value of the dummy memory cell, and then calculate the component of the read value caused by sneak path current based on the known value stored in the dummy memory cell and the read value of the dummy memory cell.
(47) It should be understood that operation 902 is optional because, in some embodiments, the component caused by sneak path current may have been previously calculated and need not be calculated a second time prior to performance of operations 904 through 908.
(48) Turning now to operation 904, the apparatus reads a value of the target memory cell. In this regard, in some embodiments reading a value of a particular memory cell includes identifying a row of the high-density gateless array and a column of the high-density gateless array that intersect at the particular memory cell, connecting all remaining rows of the high-density gateless array to a first common node, and connecting all remaining columns of the high-density gateless array to a second common node.
(49) Reading the value of the particular memory cell may further include biasing the rows connected to the first common node to a first predefined voltage and the columns connected to the second common node to a second predefined voltage. In this regard, the first predefined voltage and the second predefined voltage may be equal.
(50) Subsequently, in operation 906, the apparatus calculates an actual value of the target memory cell based on the read value of the target memory cell and the component of the read value caused by sneak path current (which may have been calculated in operation 902 or previously stored and simply retrieved for use in operation 908).
(51) For ease of explanation, these operations are described above to retrieve an actual value for a single memory cell. However, it should be understood that these operations may be repeated in sequence for a number of memory cells within a target row to retrieve a series of memory cell values within that target row. Furthermore, if memory cells are desired from other rows of the high-density gateless array, these operations may then be repeated for those other rows. It should also be understood that while these operations contemplate retrieving memory cell values in a row-wise fashion, similar operations may be performed to retrieve a sequence of actual values from memory cells in a column-wise fashion.
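The sequence of operations 902 through 906, including the optional reuse of a previously calculated sneak-path component, can be sketched as a small reader object. The sketch assumes the dummy bits strategy with a Zero stored in column 0 of every row, a current-domain threshold, and a caller-supplied sensing function. The class, its method names, and the simulated currents are hypothetical.

```python
from typing import Callable, Dict


class AdaptiveThresholdReader:
    """Row-wise readout that caches one threshold per row."""

    def __init__(self, sense: Callable[[int, int], float],
                 i_on: float, i_off: float, dummy_col: int = 0) -> None:
        self._sense = sense
        self._c_i = 0.5 * (i_on - i_off)
        self._dummy_col = dummy_col
        self._thresholds: Dict[int, float] = {}

    def read(self, row: int, col: int) -> int:
        # Operation 902 (skipped when the row threshold is already known).
        if row not in self._thresholds:
            i_pd = self._sense(row, self._dummy_col)
            self._thresholds[row] = i_pd + self._c_i
        # Operation 904: single read of the target cell.
        i_sense = self._sense(row, col)
        # Operation 906: compare against the row threshold.
        return 1 if i_sense > self._thresholds[row] else 0


# Simulated array: column 0 holds the dummy Zero, each row has its own sneak current.
I_ON, I_OFF = 10e-6, 0.1e-6
STORED = [[0, 1, 0, 1, 1], [0, 0, 1, 1, 0]]
ROW_SNEAK = [100e-6, 140e-6]
accesses = {"count": 0}


def fake_sense(row: int, col: int) -> float:
    accesses["count"] += 1
    return ROW_SNEAK[row] + (I_ON if STORED[row][col] else I_OFF)


reader = AdaptiveThresholdReader(fake_sense, I_ON, I_OFF)
decoded = [[reader.read(r, c) for c in range(1, 5)] for r in range(2)]
```

In this example each row costs one dummy read plus one read per data bit, so the eight data bits are recovered with ten array accesses in total.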
(52) Accordingly, as illustrated above, taking advantage of the memory locality and the sneak-paths correlation leads to a fast and power efficient readout technique. Contrary to other techniques, embodiments described herein achieve the theoretical limit of a single memory access per pixel for an array readout at a fraction of the power, when compared to the state-of-the-art readout techniques. In fact, according to Table 1, the adaptive-threshold readout is 7 to 24 times better than the other gateless techniques presented in the literature, based on the density-power figure-of-merit. In addition, the new sneak-paths immune technique requires minimal hardware to distinguish between the memory data values.
(53) The above-described flowchart in
(54) Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.