ULTRA-COMPACT CAM ARRAY BASED ON SINGLE MTJ AND OPERATING METHOD THEREOF
20230377650 · 2023-11-23
Abstract
Disclosed in the present invention is an ultra-compact CAM array based on a single MTJ and an operating method thereof. The CAM array comprises an M*N CAM core for storing contents, additional reference rows for storing “0” and “1” and reference columns for storing “0” and “1”, a row decoder, a column decoder, transmission gates ENs, write drivers WDs, search current sources I.sub.searchs and two-stage detection amplifiers. The present invention constructs the CAM array from 1T-1MTJ cells and combines the advantages of the MTJ and CMOS. While maintaining search energy efficiency, the structure of the MTJ is exploited to achieve a smaller area overhead and a lower search delay than a traditional CMOS-based CAM, and non-volatility is achieved.
Claims
1. An ultra-compact content addressable memory (CAM) array based on a single magnetic tunnel junction (MTJ), wherein the CAM array comprises an M*N CAM core for storing contents, additional reference rows for storing “0” and “1” and reference columns for storing “0” and “1”, a row decoder, a column decoder, transmission gates ENs, write drivers WDs, search current source I.sub.searchs, and two-stage detection amplifiers.
2. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein the CAM core comprises M*N CAM cells, and each CAM cell comprises 1MTJ and 1NMOS; two terminals of the MTJ in each CAM cell are respectively connected with a bit line BL and drain of an NMOS, gate of the NMOS is connected with a word line WL, and source of the NMOS is connected with another bit line BLB; bit lines of the CAM cells in each row in the CAM array are connected to each other, and each column shares the same longitudinal WL.
3. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein the row decoder controls the transmission gates ENs and the search current sources I.sub.searchs, the column decoder controls the WLs, the write drivers WDs are divided into write drivers WD.sub.1s and write drivers WD.sub.2s, the write drivers WD.sub.1s are connected to the BLs through the transmission gates ENs, the write drivers WD.sub.2s are connected to the BLBs, the BL of each storage row in the CAM core is connected to positive inputs of two two-stage detection amplifiers SAs, and the two-stage detection amplifier SAs are divided into two-stage detection amplifiers SA.sub.0s and two-stage detection amplifiers SA.sub.1s, the BL of the reference row for storing “0” connects to negative inputs of all the two-stage detection amplifiers SA.sub.0s, and the BL of the reference row for storing “1” connects to negative inputs of all the two-stage detection amplifiers SA.sub.1s.
4. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein two types of data, “0” and “1”, are stored in the MTJ by a bidirectional current generated by the two write drivers WDs in each row.
5. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein during the search, a read voltage and a reference voltage are respectively generated on the BLs of the storage row and the reference row by the search current source I.sub.search, and information about match or not is obtained through the two two-stage detection amplifiers SAs in each row.
6. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein all 1T-1MTJ cells in the reference row for storing “0” and the reference column for storing “0” store data “0”, all 1T-1MTJ cells in the reference row for storing “1” and the reference column for storing “1” store data “1”, and at the intersection of the two reference rows and the two reference columns are four 2T cells for ensuring that a reference voltage on the BL of the reference row is different from read voltages of the other storage rows during the search.
7. The ultra-compact CAM array based on a single MTJ according to claim 1, wherein the two-stage detection amplifier SA comprises a first differential pre-amplifier and a second-stage dynamic latch voltage comparator.
8. An operating method for the CAM array according to claim 1, wherein the operating method comprises:
before the CAM array starts to work, data storage is performed on each cell, that is, after the information is encoded into a binary sequence, the 1MTJ is written through a bidirectional current;
for each search operation, a two-step search scheme is utilized:
a first step: enabling the WLs corresponding to all “0” bits in a search sequence and the WL in the reference column for storing “0”, setting WLs of the remaining columns to 0, letting BLB be grounded, applying the search current to generate a read voltage V.sub.SEARCH0 on a storage row BL and generate a reference voltage V.sub.REF0 on the BL of the reference row for storing “0”, and when a clock signal CLK is at a high level, precharging the two output terminals of the two-stage detection amplifier SA.sub.0 to a high level; when the CLK goes low, if the row has a mismatch condition that “1” is stored while “0” is searched, letting V.sub.SEARCH0 be greater than V.sub.REF0, so that a reverse output terminal ML.sub.0 of the two-stage detection amplifier SA.sub.0 is pulled down to ground; and if the row matches, keeping ML.sub.0 at a high level;
a second step: enabling the WLs corresponding to all “1” bits in a search sequence and the WL in the reference column for storing “1”, setting WLs of the remaining columns to 0, letting BLB be grounded, applying the search current to generate a read voltage V.sub.SEARCH1 on a storage row BL and generate a reference voltage V.sub.REF1 on the BL of the reference row for storing “1”, and when a clock signal CLK is at a high level, precharging the two output terminals of the two-stage detection amplifier SA.sub.1 to a high level; when the CLK goes low, if the row has a mismatch condition that “0” is stored while “1” is searched, letting V.sub.SEARCH1 be smaller than V.sub.REF1, so that a positive output terminal ML.sub.1 of the two-stage detection amplifier SA.sub.1 is pulled down to ground; and if the row matches, keeping ML.sub.1 at a high level;
when a segmentation design is utilized for performing a long-byte search, the ML of each segment is shorted by a logic circuit to obtain search results of the first step and/or the second step, and in a global detector, the search result in the first step is connected to an AND gate by a D-latch and the search result in the second step is also connected to the AND gate to obtain the search result of the whole row; and
in the search phase of the second step, the output of the AND gate is observed, and the row matches if the output is at a high level.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DESCRIPTION OF THE EMBODIMENTS
[0038] The present invention is further described in detail in combination with the accompanying drawings and specific embodiments.
[0039] 1. A 1T-1MTJ Cell Structure and Operation Process:
[0040] As shown in part (a) of
Bias conditions of read and write operations of the 1T-1MTJ cell are shown in part (b) of
[0041] 2. Overall Structure and Operation Process of a CAM Array of 1T-1MTJ:
[0042] As shown in
[0043] BL of the storage row is connected to positive input terminals of two two-stage detection amplifiers SAs, and the two two-stage detection amplifiers SAs are a two-stage detection amplifier SA.sub.0 and a two-stage detection amplifier SA.sub.1 respectively. The BL of the reference row storing “0” is connected to negative input terminals of all the two-stage detection amplifiers SA.sub.0s, and the BL of the reference row storing “1” is connected to negative input terminals of all two-stage detection amplifiers SA.sub.1s.
[0045] The whole operation process of the CAM array of the 1T-1MTJ is as follows:
[0046] (1) Before the 1T-1MTJ CAM array starts to work, data is stored in each CAM cell: after the information is encoded into binary data, the two write drivers WDs in each row provide a sufficient write current once the transmission gates ENs and the WLs are enabled. The write operation is performed row by row and is divided into two steps, writing “0” and writing “1”. During the write operation, the transmission gates ENs of all unselected rows and the WLs of all unselected columns should be turned off to avoid write interference.
[0047] (2) For each search operation, a two-step search scheme is utilized:
[0048] (2.1) Step 1: finding all the mismatch conditions in which “1” is stored while “0” is searched. As shown in part (a) of
where R.sub.on is on-resistance of NMOS controlled by WL in each cell. Then a read voltage V.sub.SEARCH0 generated on the BL of this row by applying the search current I.sub.SEARCH is as follows:
[0049] At the same time, I cells storing “0” and 1 2T cell are enabled on the reference row storing “0”, and the resistance after being connected in parallel is
and then the reference voltage V.sub.REF0 generated by applying the search current I.sub.SEARCH on the BL of the reference row storing “0” is as follows:
[0050] When CLK is at a high level (the precharging stage), the two output terminals of the two-stage detection amplifier SA.sub.0 are precharged to a high level. When CLK goes low (the search stage), if the row contains a mismatch in which “1” is stored while “0” is searched, V.sub.SEARCH0 is greater than V.sub.REF0, so that ML.sub.0 at the reverse output terminal of the two-stage detection amplifier SA.sub.0 is pulled down to ground; if the row matches, V.sub.SEARCH0 is less than V.sub.REF0 and ML.sub.0 remains at a high level.
[0051] (2.2) Step 2: finding all the mismatch conditions in which “0” is stored while “1” is searched. As shown in part (b) of
and then a read voltage V.sub.SEARCH1 generated on the BL of this row by applying the search current I.sub.SEARCH is as follows:
[0052] At the same time, J cells storing “1” and 1 2T cell are enabled on the reference row storing “1”, and the resistance after being connected in parallel is
and then the reference voltage V.sub.REF1 generated by applying the search current I.sub.SEARCH on the BL of the reference row storing “1” is as follows:
[0054] When CLK is at a high level (the precharging stage), the two output terminals of the two-stage detection amplifier SA.sub.1 are precharged to a high level. When CLK goes low (the search stage), if the row contains a mismatch in which “0” is stored while “1” is searched, V.sub.SEARCH1 is less than V.sub.REF1, so that ML.sub.1 at the positive output terminal of the two-stage detection amplifier SA.sub.1 is pulled down to ground; if the row matches, V.sub.SEARCH1 is greater than V.sub.REF1 and ML.sub.1 remains at a high level.
[0055] Therefore, combining the two search steps, the stored content matches the search sequence only if ML.sub.0 is high in the first step and ML.sub.1 is high in the second step; otherwise there is a mismatch.
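The two-step decision can be illustrated with a purely behavioral sketch. It models only the logic of the match lines, not the analog sensing; the function name `two_step_search` and the representation of rows as bit strings are assumptions made for illustration and do not appear in the disclosure.

```python
from typing import List


def two_step_search(stored_rows: List[str], search_word: str) -> List[bool]:
    """Behavioral model of the two-step search; returns one match flag per row.

    Hypothetical helper: rows and the search word are strings of '0'/'1'.
    """
    results = []
    for row in stored_rows:
        if len(row) != len(search_word):
            raise ValueError("row and search word must have the same length")
        # Step 1: only columns whose search bit is "0" are enabled.
        # ML0 stays high unless an enabled cell stores "1".
        ml0 = all(s == "0" for s, q in zip(row, search_word) if q == "0")
        # Step 2: only columns whose search bit is "1" are enabled.
        # ML1 stays high unless an enabled cell stores "0".
        ml1 = all(s == "1" for s, q in zip(row, search_word) if q == "1")
        # The row matches only if both match lines stayed high
        # (the step-1 result is held until step 2 and ANDed with it).
        results.append(ml0 and ml1)
    return results


print(two_step_search(["1010", "1000", "1110"], "1010"))
```

In this sketch the second row fails in step 2 (a stored “0” where “1” is searched) and the third row fails in step 1 (a stored “1” where “0” is searched), so only the first row is reported as a match.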
[0057] As a search word length increases, the difference between the read voltage and the reference voltage becomes smaller, which would affect the search reliability. Therefore, the present invention provides a segmented design scheme to support a long byte search. As shown in part (a) of
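The shrinking margin can be made concrete with a simplified numerical sketch. It assumes that each enabled cell contributes a branch of MTJ resistance in series with R.sub.on, that all enabled branches are in parallel, and that a cell storing “1” is in the high-resistance state; the resistance and current values are illustrative assumptions, not figures from the disclosure, and the 2T reference cell is not modeled.

```python
def single_mismatch_margin(n_enabled: int,
                           r_low: float = 4000.0,    # assumed low-resistance state (ohm)
                           r_high: float = 10000.0,  # assumed high-resistance state (ohm)
                           r_on: float = 1000.0,     # assumed NMOS on-resistance (ohm)
                           i_search: float = 10e-6   # assumed search current (A)
                           ) -> float:
    """Voltage gap between a row with one mismatching cell and a fully matching row."""
    if n_enabled < 1:
        raise ValueError("at least one cell must be enabled")
    branch_match = r_low + r_on
    branch_mismatch = r_high + r_on
    # Fully matching row: n identical branches in parallel.
    r_all_match = branch_match / n_enabled
    # One mismatching cell: n-1 matching branches plus one high-resistance branch.
    g_one_mismatch = (n_enabled - 1) / branch_match + 1.0 / branch_mismatch
    r_one_mismatch = 1.0 / g_one_mismatch
    return i_search * (r_one_mismatch - r_all_match)


for n in (1, 4, 16, 64, 144):
    print(f"{n:4d} enabled cells: margin = {single_mismatch_margin(n) * 1e3:.4f} mV")
```

Under these assumptions the gap between the single-mismatch and all-match voltages falls quickly as more cells are enabled in parallel, which is the motivation for limiting the number of bits per segment.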
[0058] The functions and effects of the present invention are further illustrated and demonstrated by the following simulation experiment:
[0059] 1. Simulation Conditions
[0060] In the experiment, the MTJ is simulated with a physics-based compact model that is compatible with SPECTRE and SPICE and supports efficient design and analysis. The transistors use a 45 nm Predictive Technology Model (PTM) with a supply voltage of 1.1 V. The key technical parameters of the MTJ set in the simulation are shown in the following table.
TABLE-US-00001
Name of parameter   Detailed description                              Default value
D                   Diameter of MTJ                                    40 nm
TMR.sub.0           TMR rate without V.sub.bias                        150%
T.sub.free          Thickness of free layer                            1.3 nm
T.sub.oxide         Thickness of oxide layer                           0.75 nm
R·A                 Resistance-area product                            5 Ω·μm.sup.2
ΔTMR                Process change rate of TMR rate                    3%
ΔT.sub.free         Process change rate of thickness of free layer     3%
ΔT.sub.oxide        Process change rate of thickness of oxide layer    3%
V.sub.DD            Power supply voltage                               1.1 V
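As a rough worked example, the two MTJ resistance states implied by these parameters can be estimated. The sketch assumes a circular junction, that the resistance-area product refers to the low-resistance (parallel) state, and the usual definition TMR = (R.sub.AP − R.sub.P)/R.sub.P at zero bias; the function name is hypothetical and the compact model used in the simulation may differ in detail.

```python
import math

# Parameters taken from the table above.
RA_OHM_UM2 = 5.0     # resistance-area product (ohm * um^2)
DIAMETER_NM = 40.0   # MTJ diameter (nm)
TMR0 = 1.50          # TMR rate without bias (150%)


def mtj_resistances(ra_ohm_um2: float, diameter_nm: float, tmr: float):
    """Estimate (R_P, R_AP) in ohms for a circular MTJ (assumed geometry)."""
    radius_um = diameter_nm / 2.0 / 1000.0
    area_um2 = math.pi * radius_um ** 2
    r_p = ra_ohm_um2 / area_um2   # parallel (low-resistance) state
    r_ap = r_p * (1.0 + tmr)      # anti-parallel (high-resistance) state
    return r_p, r_ap


r_p, r_ap = mtj_resistances(RA_OHM_UM2, DIAMETER_NM, TMR0)
print(f"R_P  = {r_p:.0f} ohm")
print(f"R_AP = {r_ap:.0f} ohm")
```

With these assumptions the low-resistance state comes out at roughly 4 kΩ and the high-resistance state at roughly 10 kΩ, before any bias dependence or process variation is applied.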
[0061] In the simulation, the 1T-1MTJ CAM design is simulated using SPECTRE. In addition to simulating the CAM design of the present invention, the results are compared with five CAM designs proposed in non-patent document 1 (A. T. Do, C. Yin, K. S. Yeo, and T. T.-H. Kim, “Design of a power-efficient CAM using automated background checking scheme for small match line swing,” in 2013 Proceedings of the ESSCIRC (ESSCIRC). IEEE, 2013, pp. 209-212.), non-patent document 2 (S. Matsunaga, A. Katsumata, M. Natsui, T. Endoh, H. Ohno, and T. Hanyu, “Design of a nine-transistor/two-magnetic-tunnel-junction-cell-based low-energy nonvolatile ternary content-addressable memory,” Japanese Journal of Applied Physics, vol. 51, no. 2S, p. 02BM06, 2012.), non-patent document 3 (B. Song, T. Na, J. P. Kim, S. H. Kang, and S.-O. Jung, “A 10T-4MTJ nonvolatile ternary CAM cell for reliable search operation and a compact area,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 64, no. 6, pp. 700-704, 2016.), non-patent document 4 (C. Wang, D. Zhang, L. Zeng, E. Deng, J. Chen, and W. Zhao, “A novel MTJ-based non-volatile ternary content-addressable memory for high-speed, low-power, and high-reliable search operation,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 4, pp. 1454-1464, 2018.) and non-patent document 5 (C. Wang, D. Zhang, L. Zeng, and W. Zhao, “Design of magnetic nonvolatile TCAM with priority-decision in memory technology for high speed, low power, and high reliability,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 2, pp. 464-474, 2019.).
[0062] The comparison metrics mainly include the number of transistors, the area of each CAM cell, the write energy consumption of each CAM cell per write, the search error rate, the search delay and the search energy consumption of each CAM cell per search. For the CAM design in the present invention, the search error rate and search delay are measured in the worst case, that is, when only one CAM cell does not match; the write energy consumption is taken as the average of the energy for writing a “0” and the energy for writing a “1”. The search energy consumption is measured in an average case in which half of the CAM cells in a row match.
[0063] 2. Simulation Results
[0064] 1) Functional Verification of the 1T-1MTJ CAM
[0065] 1.1)
[0066] 1.2)
[0067] 2) Write Speed and Write Energy Consumption Analysis
[0068] The setting of the transfer gate transistor width, the enabling voltage V.sub.en on WL and EN.sub.TG and the write voltage V.sub.WRITE on the write driver WD would affect the write efficiency of the 1T-1MTJ CAM array.
[0069] 3) Search Reliability Analysis
[0070] The process change rates of the MTJ TMR rate, oxide layer thickness and free layer thickness are set to 3%, and the process change rates of the transistor width and threshold voltage are set to 10%. Under these conditions, the search error rate (SER) is obtained from a Monte Carlo simulation of the case in which only one CAM cell mismatches.
[0071] 4) Search Delay and Search Energy Consumption Analysis
[0072] After confirming the search reliability of the 1T-1MTJ CAM design, it is necessary to analyze the search delay and search energy consumption. In the precharging stage (CLK is at a high level), not only the output of SA needs to be precharged to a high level, but also the first differential pre-amplifier needs to prepare two voltage signals for the input of the second stage to participate in the comparison, so that the SA can produce the comparison results when CLK becomes low. Therefore, when the bias current is increased to improve the bandwidth of the first stage, as shown in part (a) of
[0073] 5) Performance Comparison
[0074] The following table presents the comparison of the metrics of the CAM design based on the single MTJ in the present invention with other CAM designs.
TABLE-US-00002
Reference documents                    Non-patent   Non-patent   Non-patent   Non-patent   Non-patent   The present
                                       document 1   document 2   document 3   document 4   document 5   invention
Non-volatility                         No           Yes          Yes          Yes          Yes          Yes
Cell structure                         10T          9T-2MTJ      10T-4MTJ     15T-4MTJ     20T-6MTJ     1T-1MTJ
Cell area (μm.sup.2)                   3.3          6.84         8.28         10.76        18.05        0.06
SER (144-bit)                          0            30.0%        18.5%        2.7%         2.7%         8.3%
Search delay (ns)                      1.07         0.20         1.28         0.17         0.17         0.17
Search energy consumption (fJ/bit)     0.77         37.37        5.07         0.17         1.06         2.72
Writing energy consumption (pJ/bit)    0.03         0.55         5.79         1.59         2.38         1.26
[0075] The above table summarizes the technical metrics of the 1T-1MTJ CAM and the other CAMs, with the word length of each segment of the 1T-1MTJ CAM set to 16 bits. As can be seen from the table, the cell area of the 1T-1MTJ CAM in the present invention is 1.82% of that of the 10T cell based on the traditional CMOS technology, and this advantage is even larger when compared with the other MTJ-based CAMs. Although the 1T-1MTJ CAM needs the reference rows, reference columns and SAs to complete the search operation, this additional area overhead is negligible when performing the long byte search, and the search delay of the 1T-1MTJ CAM is only 16% of that of the 10T CAM. Although the search energy consumption of the 15T-4MTJ/20T-6MTJ CAMs is lower than that of the 1T-1MTJ CAM, their area overhead is much higher. In addition, the search energy consumption of the 1T-1MTJ CAM would be further reduced as the word length of the segment increases. At the same time, because there are fewer MTJs and transistors in the writing path of the 1T-1MTJ CAM, its writing energy consumption is lower than that of the 10T-4MTJ/15T-4MTJ/20T-6MTJ CAMs by factors of 4.60/1.26/1.89. Although the writing energy consumption of the 9T-2MTJ CAM is lower, the search error rate of the 1T-1MTJ CAM is only 28% of that of the 9T-2MTJ CAM.
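The ratios quoted in this paragraph follow directly from the values in the comparison table, as the short check below illustrates. The dictionary and function names are chosen for illustration only; the numbers are copied from the table.

```python
# Values copied from the comparison table (TABLE-US-00002).
CELL_AREA_UM2 = {"10T": 3.3, "1T-1MTJ": 0.06}
SEARCH_DELAY_NS = {"10T": 1.07, "1T-1MTJ": 0.17}
WRITE_ENERGY_PJ = {"10T-4MTJ": 5.79, "15T-4MTJ": 1.59,
                   "20T-6MTJ": 2.38, "1T-1MTJ": 1.26}
SER_PERCENT = {"9T-2MTJ": 30.0, "1T-1MTJ": 8.3}


def ratio(table: dict, design: str, baseline: str) -> float:
    """Return table[design] / table[baseline]."""
    return table[design] / table[baseline]


area_pct = 100 * ratio(CELL_AREA_UM2, "1T-1MTJ", "10T")
delay_pct = 100 * ratio(SEARCH_DELAY_NS, "1T-1MTJ", "10T")
ser_pct = 100 * ratio(SER_PERCENT, "1T-1MTJ", "9T-2MTJ")
# Write-energy factor: how many times more energy the other design needs per bit.
write_factors = {name: ratio(WRITE_ENERGY_PJ, name, "1T-1MTJ")
                 for name in ("10T-4MTJ", "15T-4MTJ", "20T-6MTJ")}

print(f"cell area vs 10T:     {area_pct:.2f}%")
print(f"search delay vs 10T:  {delay_pct:.1f}%")
print(f"SER vs 9T-2MTJ:       {ser_pct:.1f}%")
for name, factor in write_factors.items():
    print(f"write energy, {name} / 1T-1MTJ: {factor:.2f}x")
```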
[0076] It can be seen from the above results that the present invention not only has non-volatility, which is difficult to achieve with a CMOS design, and robustness against process changes, but also has the characteristics of compact design, low energy consumption and low delay. In addition, the above results validate the effectiveness of the 1T-1MTJ CAM array utilizing the two-step search scheme and the segmented design in data-intensive search applications.
[0077] The above embodiments are used to explain the present invention, not to restrict it, and without departing from the spirit and protection scope of claims in the present invention, any modification or alteration made to the present invention falls within the protection scope of the present invention.