Plural Distributed PBS with Both Voltage and Current Sensing SA for J-Page Hierarchical NAND Array's Concurrent Operations
20170352424 · 2017-12-07
Inventors
Cpc classification
G11C2211/5642
PHYSICS
G11C16/3459
PHYSICS
G11C16/0483
PHYSICS
International classification
G11C16/34
PHYSICS
G11C11/56
PHYSICS
Abstract
Provided are several preferred options of 3D hierarchical NAND arrays formed in a (2D DL//3D LBL)⊥(3D CSL//3D WL) scheme. Their associated 2D PBs are preferably formed right below the 3D array but on the reversed side of Psub, so that the large silicon areas of most 2D peripheral circuits can be saved and the various 3D nLC NAND operations can be performed in a more powerful pipeline and concurrent manner with a dramatic reduction in latency and power consumption.
The preferred various 3D hierarchical NAND memories comprise a plurality of divided 3D sub-arrays for nLC storage, a plurality of 3D N-bit Cstring-based DCRs with minimum memory capacity to store 3×2n pages of program data when a 3-WL rotational nLC program scheme is adopted, and a plurality of distributed N-bit PBs with same number of LBL lines.
Each hierarchical 3D array comprises a plurality of 3D LGs and each LG comprises a plurality of 3D blocks connected by N local 3D LBL metal lines and 3D CSL lines and each block further comprises N strings without a need of extra local precharge line of LGps lines as disclosed in prior granted patents.
A larger number of distributed N-bit PBs would allow more powerful and flexible concurrent operations to be performed at the expense of a larger silicon area on the reversed side of Psub. By contrast, a smaller number of distributed N-bit PBs would allow less powerful and flexible concurrent operations, with the tradeoff of saving more silicon area on the reversed side of Psub. For performing any concurrent 3D NAND operation, a minimum of two N-bit PBs and 3×2n N-bit DCRs are required. Each N-bit SA comprises at least n+1 N-bit latches.
Each bit of PB comprises one SA and one nLC-latch circuit. The N-bit SA further comprises one N-bit Current-sensing circuit for performing ABL program, ABL page data loading in each N-bit CLBL, ABL program-verify, ABL read on each 3D sub-array and ABL Write-back to each N-bit Cstring-based DCR, and one N-bit Voltage-sensing circuit for performing HBL Recall from each page of the selected Cstring-based N-bit DCR to the N-bit PB. The operations of the 3D hierarchical NAND and Cstring-based DCR arrays and their associated distributed PBs can be performed in both concurrent and pipeline manners, regardless of whether a 2-poly floating-gate 3D cell or a 1-poly charge-trapping 3D cell is used, regardless of the GIDL or FN-tunneling erase scheme, and regardless of the SLC, MLC, TLC or XLC storage type.
Claims
1. A 3D hierarchical NAND array comprising: a plurality of divided 3D sub-arrays for nLC storage, a plurality of 3D N-bit Cstring-based DCRs with minimum memory capacity to store 3×2n pages of program data when a 3-WL rotational nLC program scheme is adopted, and a plurality of distributed N-bit PBs with same number of LBL lines; each hierarchical 3D array comprises a plurality of 3D LGs and each LG comprises a plurality of 3D blocks connected by N local 3D LBL metal lines and 3D CSL lines and each block further comprises N strings without a need of extra local precharge line of LGps lines as disclosed in prior granted patents; more number of distributed N-bit PBs would allow more powerful and flexible concurrent operations to be performed at the expense of taking larger silicon area in reversed side of Psub; each bit of PB comprises one SA and one nLC-latch circuit. N-bit SA further comprises one N-bit Current-sensing circuit for performing ABL program, ABL page data loading in each N-bit CLBLs, ABL program-verify, ABL read on each 3D sub-array and ABL Write-back to each N-bit Cstring-based DCRs, and one N-bit Voltage-sensing circuit for performing HBL Recall from each page of selected Cstring-based N-bit DCR to N-bit PB; the operations of the 3D hierarchical NAND and Cstring-based DCR arrays and their associated distributed PBs can be performed in both concurrent and pipeline manners, regardless of a 2-poly floating-gate 3D cell or a 1-poly charge-trapping 3D cell, regardless of GIDL or FN-tunneling erase scheme, regardless of SLC, MLC, TLC and XLC storage types.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0007]
[0008]
[0009] Each 16 KB 2D PB is shared by two 16 KB low-level ⅛-long local 2D LBL lines formed in parallel to the 16 KB DLs but perpendicular to the 3D strings' 3D CSL lines and 3D WLs, referred to as (DL//LBL)⊥(CSL//WL), in accordance with the preferred concurrent operations of ABL nLC Program, ABL nLC Read, ABL nLC Program-verify, ABL nLC Erase-verify and nLC Erase of the present invention. Each 16 KB PB takes care of two divided ⅛ 3D sub-arrays, and all four distributed 16 KB PBs and 4n×16 KB DCR circuits are preferably formed below the preferred four divided 3D hierarchical NAND sub-arrays to save silicon area.
[0010]
[0011] Each 16 KB 2D PB is shared by four 16 KB low-level ⅛-long local 2D LBL lines in two separate paired LGs, formed in parallel to the 16 KB DLs but perpendicular to the 3D strings' 3D CSL lines and 3D WLs, referred to as (DL//LBL)⊥(CSL//WL), in accordance with the preferred concurrent operations of ABL nLC Program, ABL nLC Read, ABL nLC Program-verify, ABL nLC Erase-verify and nLC Erase of the present invention. Each 16 KB PB takes care of two divided ⅛ 3D sub-arrays, and all four distributed 16 KB PBs and 4n×16 KB DCR circuits are preferably formed below the preferred four divided 3D hierarchical NAND sub-arrays to save silicon area.
[0012]
[0013]
[0014]
SUMMARY OF THE INVENTION
[0015] A principal objective of the invention is to form various distributed CLBL-based LG 3D hierarchical NAND arrays associated with a plurality of distributed 3D Cstring-based N-bit 3D DCRs and a plurality of distributed N-bit 2D LV PBs and other peripheral circuits, which are formed right below the 3D NAND and distributed 3D DCR arrays but on the reversed side of Psub, to allow multiple 3D NAND operations to be performed in pipeline, concurrent or mixed manners with a dramatic reduction in silicon area and power consumption.
[0016] Another objective of the invention is to use only HV 3D NMOS devices such as a 3DML transistor as a HV buffer to connect each local HV 3D LBLo/e metal line to each corresponding shared LV 2D GBLo/e metal line controlled by a gate signal of LG, to use a HV 3DMD NMOS transistor to connect each local HV 3D DCRo/e metal line to each corresponding shared LV 2D GBLo/e metal line controlled by a gate signal of ENDCR, and a HV 3DMT NMOS transistor acting as a switch between two adjacent LGs' LBL lines controlled by a gate signal of TIE. The HV means Verase or Vpgm with a value up to 25V.
[0017] Yet another objective of the invention is to form all of the preferred hierarchical 3D cell arrays and the distributed Cstring-based 3D DCR arrays with two or more distributed N-bit LV PBs, preferably formed right below the above two 3D arrays but on the reversed side of the Psub, to save most of the silicon area of the peripheral circuits.
[0018] A still further objective of the invention is to use one 3D NAND string's channel capacitance (Cstring) as a 1-bit DCR to store 1-bit nLC digital program data, with the Vdd voltage for “1” data and the Vss voltage for “0” data. The Vts of all 3D cells in each Cstring of a DCR are preferably kept at Vte, the erase-state Vt with a value below 0V, e.g., Vte<0V. As such, each Cstring capacitance of each bit of DCR can reach its maximum value, referred to as Cstringmax, when all 3D WLs of all N-bit DCR blocks are tied to Vdd but with VSSL=VGSL=0V to prevent the stored voltage of each bit of program data from leaking.
[0019] Note, every distributed 3D DCR is preferably placed near its corresponding distributed 2D PB to keep the 3D DCRo/e metal line short, so that the highest signal voltage level can be achieved after each Charge-sharing (CS) operation of each HBL Recall operation. The CS is performed between each 3D Cstring and each corresponding 3D DCR metal line with a capacitor ratio defined as R=Cstring/(Cstring+CDCR). For today's 3D NAND technology, the maximum number of cells in each Cstring is 48, which makes Cstring comparable to CDCR and thus gives a high value of R.
[0020] A still further objective of the invention is to execute an immediate concurrent ABL SLC program on all Cstring-based DCRs that store the 3×2n×N-bit page data of the 3-WL nLC program when an unintentional power-down of the Vdd supply is detected and the desired nLC 3-WL rotational program is not yet completed. Note, the ABL SLC program is only performed on those erased cells of the DCRs' WLs with an incomplete nLC ABL program in the regular 3D NAND array, in accordance with the on-chip state machine's record and control.
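The charge-sharing ratio defined in paragraph [0019] can be illustrated numerically. The following is a minimal sketch that assumes ideal capacitors, no leakage, and a DCR metal line that starts at 0 V; the function name, the Vdd value and the capacitance values are hypothetical and are not taken from the disclosure.

```python
def recall_signal(v_stored, c_string, c_dcr):
    """Voltage on the DCR line after ideal charge sharing.

    Assumes the DCR metal line starts at 0 V and no charge leaks.
    Uses R = Cstring / (Cstring + CDCR) from paragraph [0019].
    """
    r = c_string / (c_string + c_dcr)
    return v_stored * r


# Hypothetical values: "1" data stored at Vdd = 1.8 V, Cstring equal to CDCR.
equal_caps = recall_signal(1.8, 2.0, 2.0)
# A shorter DCR metal line (smaller CDCR) leaves a larger signal.
short_line = recall_signal(1.8, 2.0, 1.0)
print(equal_caps, short_line)
```

Under these assumptions, halving CDCR raises R from 1/2 to 2/3, which is the reason given above for placing each DCR close to its PB.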
[0021] A still further objective of the invention is to design each SA circuit of each bit of PB with two independent sensing inputs: a Current-sensing input used for a preferred N-bit ABL Read, and a Voltage-sensing input used for a preferred N-bit HBL Recall operation, in two different cycles.
[0022] A still further objective of the invention is to allow two different kinds of sensing to be performed concurrently in two or more different N-bit distributed PBs according to different NAND operations.
[0023] For example, at least one N-bit PB's N-bit SA may perform the Voltage-sensing of an HBL Recall operation while at least one other PB's N-bit SA performs the Current-sensing of an ABL Read operation simultaneously, independently and locally. Since an HBL Recall and an ABL Read are done locally in different LGs in each distributed PB, no data contention will occur on the N-bit shared DL metal lines.
[0024] A still further objective of the invention is to add 2D CMOS circuits to each PB circuit (Not shown) in
[0025] A still further objective of the invention is to perform the first preferred concurrent J-page 3D HBLo/e (Half-BL) Program-verify operation on J distributed N-bit cells of J selected WLs of the 3D hierarchical NAND array, where J=1 to 8 in accordance with the number of distributed PBs being 1 to 8. Each HBL program-verify operation can be split into the sequential steps below.
[0026] 1) Perform an HBL Recall of N/2 bits of each page of nLC data from the corresponding N/2-bit GBLo/e of each N/2-bit DCRo/e's Cstringo/e sequentially into the PB for data comparison.
[0027] 2) J×N/2-bit CLBLo/e precharge for HBL program-verify and J×N/2-bit CLBLe/o shielding lines concurrent operation step:
[0028] Initially, all J-page N/2-bit CLBLo/e are precharged from LGpso/e to 1V (VLBLo/e=1V), but the remaining N/2-bit interleaved CLBLe/o are held at 0V, acting as the shielding lines during the HBL LBLo/e Read.
[0029] 3) Concurrent J×N/2-bit LBLo/e cells' Vt evaluation step: It is performed on J×N/2-bit LBLo/e cells with an appropriate set of WL, SSL and GSL voltages such as VR, Vread and Vdd per selected WL per selected LG. This is the major latency of each HBL program-verify step, with the following evaluation results within the preset time of each iterative program-verify.
[0030] a) Pass Fine program-verify: Once cells' Vts pass the verify condition of Fine program Vt of Vtn−0.2V, then VLBL=1V.
[0031] b) Fail Fine program-verify: Once cells' Vts fail the verify condition of Fine program Vt of Vtn−0.2V, then VLBL=0V after the preset period time.
[0032] 4) A DRAM-like CS (charge-sharing) between one selected CLBL and one CGBL.
[0033] a) Pass Fine program-verify: VLBL=VGBL=1V×R, where R is the CS ratio given by R=CLBL/(CLBL+CGBL).
[0034] b) Fail Fine program-verify: Once cells' Vts fail the verify condition of Fine program Vt of Vtn−0.2V, then VLBL=VGBL=0V after the preset period time.
[0035] 5) Fine or Coarse program-verify:
[0036] a) Once cells' Vts fail the verify condition of Fine program Vt of Vtn−0.2V, then VLBL=0V after the preset period time.
[0037] b) Once cells' Vts pass the verify condition of Fine program Vt of Vtn−0.2V, then set 0V<VLBL<1V after the preset period time.
[0038] c) Once cells' Vts pass the verify condition of final program Vt of Vtn, then set VLBL=Vdd to inhibit the next iterative program.
[0039] 6) Continue with the 2.sup.nd HBL J×N/2-bit program-verify.
[0040] 7) Check whether the J×N-bit ABL program has passed. If it passes, the ABL nLC program is completed. Otherwise, the J×N-bit ABL nLC program is continued until the set count number is reached, at which point it is stopped and a bad 3D block is reported.
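The three outcomes of the Fine or Coarse program-verify step can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the Vdd value of 1.8 V and the intermediate level `v_mid` are hypothetical (the text only requires 0V<VLBL<1V), and treating "pass" as greater-than-or-equal is an assumption.

```python
def next_vlbl(cell_vt, vtn, vdd=1.8, fine_margin=0.2, v_mid=0.5):
    """Return the LBL voltage to apply before the next program pulse.

    fine_margin is the 0.2 V offset of the Fine verify level (Vtn - 0.2V).
    v_mid is a hypothetical level satisfying 0 V < VLBL < 1 V.
    """
    if cell_vt >= vtn:
        return vdd      # final verify passed: inhibit further programming
    if cell_vt >= vtn - fine_margin:
        return v_mid    # Fine verify passed: intermediate VLBL
    return 0.0          # Fine verify failed: VLBL stays at 0 V


# Hypothetical target state with Vtn = 2.0 V.
for vt in (1.5, 1.9, 2.0):
    print(vt, next_vlbl(vt, vtn=2.0))
```

The loop that repeats this check until every cell returns Vdd, or until the preset count is reached, corresponds to the final step of the list above.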
[0041] A still further objective of the invention is to perform the preferred concurrent J-page 3D ABL CLBL Read operation by J distributed PBs, with steps like those of the above program-verify but without the iterative steps, where J=2 to 8.
[0042] A still further objective of the invention is to perform the preferred concurrent J-page mixed or same operations such as Partial/Full-block Erase, ABL nLC Program, ABL nLC Program-verify, ABL nLC Read, ABL Erase-verify in different 3D LGs of the preferred 3D Hierarchical NAND arrays.
[0043] A still further objective of the invention is to form those HV 3D devices such as 3DML, 3DMD and 3DMT NMOS transistors to connect respective paired 3D LBL lines and one 3D DCR
[0044] A still further objective of the invention is to provide a flexible Program size in units of K×N bits, where K is the number of selected 3D/2D WLs for performing an ABL nLC concurrent program, K is an integer with K≧1, and the PB size is N bits.
[0045] A still further objective of the invention is to remove the MHV of Vread from the non-selected WLs of the selected 3D blocks when each ABL Read data is being sensed and latched into each corresponding SA so that the WL Vread-stress can be dramatically reduced due to the faster Read speed with less LBL capacitance of the preferred 3D/2D hierarchical NAND array.
[0046] A still further objective of the invention is to build an on-chip ECC circuit to be shared by all J distributed PBs after the J-page ABL nLC program-verify operations are performed, for any ECC algorithm. The ECC circuit checks whether the total number of error bits of each selected 3D WL of the selected page exceeds the preset maximum number allowed for the N bits of nLC page data plus Syndrome bits.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0047] In the following detailed descriptions of the concurrent operations, the present 3D hierarchical NAND array and the associated 3D Cstring-based DCR embodiments, reference is made to the previous pending utility or provisional applications filed by the same inventor and to the accompanying drawings that form a part hereof, in which are shown, by way of illustration, specific embodiments in which the disclosure may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable those of ordinary skill in the art to practice the embodiments. Other embodiments may be utilized, and structural, logical, and electrical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not intended to be exhaustive or to be limited to the precise form disclosed.
[0048] In the following descriptions, when an N-bit ABL nLC Program is referred to, it means that a total of N-bit 3D NAND cells, not including the additional syndrome ECC bytes, formed on one or more physical 3D WLs of a plurality of selected 3D strings in the selected 3D LGs, are concurrently selected for performing a J-page nLC ABL program along with one or more local distributed N-bit PBs and N-bit DCRs of all preferred 3D Hierarchical arrays from
[0049] For example, a full physical 3D WL page size is formed with 8 KB physical 3D cells. Thereby, one option of an ABL nLC program with 16 KB size means two physical WLs of 8 KB size are concurrently selected for nLC program with 8 KB PB of the present invention. Unlike prior art using 16 KB PB (Page-Buffer) to perform one 16 KB ABL nLC program, only 8 KB PB is required to perform 16 KB ABL nLC program of the present invention. Thereby, the PB size is cut in half for a 50% saving in silicon area.
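The arithmetic of this example can be checked with a short sketch. The helper name is hypothetical, and the assumption that the program size must be a whole multiple of the PB size follows from the K×N-bit program unit stated in the objectives.

```python
KB = 1024


def wls_needed(program_bytes, pb_bytes):
    """Number of physical WLs (K) selected concurrently for one ABL program.

    Assumes the program size is a whole multiple of the PB size (K x N).
    """
    if program_bytes % pb_bytes:
        raise ValueError("program size must be a multiple of the PB size")
    return program_bytes // pb_bytes


# A 16 KB ABL nLC program served by an 8 KB PB selects two 8 KB WLs.
k = wls_needed(16 * KB, 8 * KB)
# PB area saving relative to a 16 KB PB, assuming area scales with PB size.
saving = 1 - (8 * KB) / (16 * KB)
print(k, saving)
```

The 50% figure holds only under the stated assumption that PB silicon area is proportional to the number of PB bits.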
[0050] Furthermore, when an nLC ABL Program-verify and ABL Read are referred to in the following descriptions, it means that N-bit cells per one 3D WL, per one selected 3D string, per one selected 3D block in one selected 3D LG, are selected for performing concurrent ABL Program-verify and ABL Read by the N-bit locally corresponding distributed SAs, which use one Current-sensing scheme to eliminate the undesired LBL-LBL, GBL-GBL and DL-DL AC Read coupling effect. To Read and Program-verify one page of N-bit nLC data from one selected 3D WL of one selected 3D block, only a 1-cycle ABL Read is required. Similarly, for an ABL Write-back by each N-bit distributed SA from each N-bit PB to each N-bit Cstring-based DCR for each nLC digital page data, only a 1-cycle ABL Write-back is required.
[0051] By contrast, when a nLC HBL Recall is referred, it means that N/2-bit interleaved cells per one 3D WL, per one selected 3D string, per one selected 3D block of each Cstring-based DCR are selected for performing concurrent DRAM-like read so that the undesired DCR-DCR metal line AC Read coupling effect can be eliminated. To Recall one page of N-bit nLC digital data from one selected 3D WL of one selected 3D Cstring-based DCR, a 2-cycle HBL Recall is required.
[0052] To Recall one page of Cstring-based N-bit locally distributed nLC data from each corresponding locally distributed N-bit 3D DCR to N-bit PB, a 2-cycle HBL Recall is required via N/2-bit selected locally distributed LBL metal lines.
[0053] By contrast, to Write-back one page of N-bit nLC digital data from N-bit locally distributed PB to N-bit Cstring-based locally distributed DCRs, only a 1-cycle ABL Write-back is required via N-bit selected locally distributed LBL metal lines.
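The difference between the 2-cycle HBL Recall and the 1-cycle ABL Write-back can be modelled in a few lines. This is a behavioural sketch only: the function names are hypothetical, the page is a plain list of bits, and the even-then-odd ordering of the two recall cycles is an assumption.

```python
def hbl_recall(page):
    """Recall a page in two half-bit-line cycles.

    In each cycle only every other line is sensed; the interleaved
    lines are treated as shielding lines and are skipped.
    """
    sensed = [None] * len(page)
    cycles = 0
    for parity in (0, 1):          # assumed order: even lines, then odd lines
        for i in range(parity, len(page), 2):
            sensed[i] = page[i]
        cycles += 1
    return sensed, cycles


def abl_write_back(page):
    """Write all N bits in a single all-bit-line cycle."""
    return list(page), 1


data, recall_cycles = hbl_recall([1, 0, 1, 1])
_, write_cycles = abl_write_back([1, 0, 1, 1])
print(data, recall_cycles, write_cycles)
```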
[0054] Furthermore, when a 3D NAND cell is referred in the following descriptions, it means that either a 2-poly Floating-gate 3D NAND cell or a 1-poly Charge-trapping 3D NAND cell is used in 3 preferred 3D Hierarchical NAND arrays with a plurality of distributed 2D PBs and 3D Cstring-based DCRs of
[0055] Furthermore, when a 3D NAND Erase operation is referred in the following descriptions, it means either a FN channel-tunneling erase scheme or a GIDL erase scheme is used in above 3 preferred 3D Hierarchical NAND arrays with a plurality of distributed PBs and DCRs of
[0056] Furthermore, when a locally distributed PB is referred to in the following descriptions, it means an LV N-bit PB comprising one nLC Latch circuit (86), one LV SA that combines an N-bit current-sensing LV SA for ABL Read and Verify with an N-bit voltage-sensing LV SA for HBL Read and Verify (104a), one LV PRB circuit (106), and one LV Match circuit (107). All devices of each PB circuit in
[0057] Although particular embodiments of above preferred 3D hierarchical NAND arrays and distributed PBs and DCRs to perform the mixed pipeline and concurrent operations will be disclosed below, other derivatives, modifications and changes from the present invention will be apparent to those of ordinary skill in the art and should be covered by this invention. Some embodiments have been covered in previous U.S. patent applications by the same inventor of this invention and are omitted here for description simplicity. Only the new inventive concepts are summarized below as the targeted objectives.
[0058] Embodiments of the semiconductor memory devices and Hierarchical arrays are described with reference to the drawings.
[0059]
[0060] Array being equally divided into eight 3D sub-arrays with eight distributed 16 KB 2D PBs that are globally connected by 16 KB long top-level 2D DL lines, which are formed on the reversed side of Psub right below eight locally distributed 16 KB-width 3D sub-arrays.
[0061] Each 16 KB 2D PB comprises a 16 KB 2D LV SA circuit distributed in ⅛ of the whole 3D hierarchical array and its associated 16 KB 2D LV nLC Latches. Each 3D ⅛-array further comprises a plurality of 3D NAND blocks. Each 3D NAND block further comprises a plurality of 3D NAND strings with their 16 KB drains locally connected by 16 KB low-level ⅛-long 3D LBL metal lines formed in parallel to the 16 KB long 2D DLs, and their 16 KB sources connected to one common 3D CSL metal line in parallel to the 3D WLs but perpendicular to the LBL and DL lines, referred to as (2D DL//3D LBL)⊥(3D CSL//3D WL), in accordance with the preferred concurrent operations of ABL nLC Program, ABL nLC Read, ABL nLC Program-verify, ABL nLC Erase-verify and nLC Erase of the present invention. Each 16 KB PB takes care of one divided ⅛ 3D sub-array, and all eight distributed 16 KB PB circuits are preferably formed below the preferred eight divided 3D hierarchical NAND sub-arrays to save silicon area.
[0062] Each local 3D LBL metal line is connected to each corresponding input of GBL1 of each 2D SA through one HV (>20V) 3D NMOS transistor (3DML) with its gate being tied to a common LG1 signal, its drain tied to LBL1, and its source tied to GBL1. Each output of nLC-Latch circuit is then coupled to the corresponding drain node of a LV 2D NMOS MD transistor with its gate being coupled to a common DLPBSW1 signal and its source being connected to each corresponding 2D DL.
[0063] Each SA circuit is shown in
[0068] The whole 3D NAND memory only uses one 16 KB static CACHE, whose 16 KB bidirectional inputs are connected to 16 KB corresponding DLs and can be sequentially shared by eight distributed 16 KB PBs. Furthermore, only one bidirectional 16 KB Y-pass circuit with a 3-level column decoding of YA-dec, YB-dec and YC-dec is connected to 16 KB outputs of the static CACHE. As a result, either one Byte or one Word can be decoded to Byte-based or Word-based I/Os of NAND memory (Not shown).
[0069] Now, the details of the concurrent operations of
In this case, 3×n×16 KB nLC page data will be sequentially programmed into three 16 KB 3D cells simultaneously in three selected adjacent 3D WLs. 3n pages of nLC program data are required to be stored in each bit of PB in 3×n cycles. In this case, each bit of PB's nLC-latch circuit requires 3×n×16 KB nLC page data to be stored in one bit of PB. For a SLC/MLC/TLC ABL program, each bit of PB requires 3/6/9 latches respectively. Thus, this 3-WL rotational ABL nLC program scheme will increase the size of each PB's nLC-latch circuit by 3×. In the future, a 100-layer 3D string will become the mainstream. Thus, the physical spacing between two adjacent 3D WLs will get smaller and smaller. As a result, the 3D cells' Vt DC coupling effect between three adjoined 3D WLs will greatly degrade the stored nLC data. Thus, a popular 2D's 3-WL rotational ABL program scheme is or will be adopted in the mainstream 3D NAND ABL nLC program. Thus, the size reduction of each PB's nLC-latch circuit for a 3-WL rotational ABL nLC program in a 3D hierarchical array is strongly required.
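The latch counts quoted above (3/6/9 for SLC/MLC/TLC) follow from multiplying the three rotating WLs by n bits per cell. The sketch below reproduces that arithmetic; the dictionary and function names are hypothetical.

```python
# Bits stored per cell (n) for each storage type named in the text.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def latches_per_pb_bit(cell_type, rotating_wls=3):
    """Latches needed per PB bit when all program data is held in the PB."""
    return rotating_wls * BITS_PER_CELL[cell_type]


for cell_type in BITS_PER_CELL:
    print(cell_type, latches_per_pb_bit(cell_type))
```

Setting `rotating_wls=1` gives the single-WL case, which makes the 3× growth of the nLC-latch circuit explicit.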
Therefore, a 3D string-based capacitor, Cstring, is used to store n−1 nLC program data. Each page of 16 KB Cstrings is referred to as a Dynamic CACHE (DCACHE) or 3D DCR. The reading of this 16 KB DCACHE digital data is performed in an HBL manner in 2 cycles. The 2-cycle reading is defined as an HBL Recall, while each writing of 16 KB digital data into each 16 KB DCACHE is performed in an ABL manner in 1 cycle by each 16 KB PB and is defined as an ABL Write-back. In order to perform an HBL Recall, each SA of the present invention needs to have a DRAM-like voltage sensing circuit as shown in
[0085]
[0086] Each bit of PB in
[0087] The size requirement of each 3D-DCR comprises at least 3×2n×16 KB DCR strings to allow the minimum storage of 3×2n×16 KB nLC digital page data for performing a desired 3-WL rotational nLC program scheme for this 3D hierarchical NAND array of the present invention. The reasons for 3×2n×16 KB Cstrings are explained below. [0088] 1) For each ABL nLC program for each selected 3D WL, it requires n×16 KB program data to be stored locally in the 3D-DCRs. [0089] 2) For a HBL Recall operation, it requires 2n×8 KB 3D-DCRs' strings to store two separate n×8 KB Odd/Even program data to avoid DCRo/e-DCRo/e metal coupling effect. [0090] 3) For a 3-WL rotational ABL nLC program, it requires 3×2n×16 KB program data to be stored locally in the 3D-DCRs.
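The three factors in the sizing rule above can be multiplied out as follows. This sketch reads "2n" as 2×n (the Odd/Even split times n bits per cell), which is an interpretation of the text rather than something it states explicitly; the function names are hypothetical.

```python
def dcr_string_pages(n_bits_per_cell, rotating_wls=3):
    """Pages of 16 KB-wide Cstrings required, computed as 3 x 2n.

    The factor 2 is the Odd/Even split needed for HBL Recall, n is the
    number of bits per cell, and 3 is the number of rotating WLs.
    """
    odd_even_split = 2
    return rotating_wls * odd_even_split * n_bits_per_cell


def dcr_strings_kb(n_bits_per_cell, page_kb=16):
    """Total Cstring count in units of K strings (3 x 2n x 16 KB)."""
    return dcr_string_pages(n_bits_per_cell) * page_kb


# MLC (n=2) and TLC (n=3) under the stated reading of 2n.
print(dcr_string_pages(2), dcr_strings_kb(2))
print(dcr_string_pages(3), dcr_strings_kb(3))
```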
[0091] The 3D hierarchical NAND array in
[0093] In addition, the present invention proposes inserting a 3D HV NMOS transistor, with its gate tied to TIE, between each paired 3D LBLo/e metal lines. The reason to incorporate this HV transistor is to allow the CS (Charge-sharing) or LBLo/e precharge operation to be performed between two physically separate ⅛-arrays, generating the desired plural analog VLBLs with ΔVBL=ΔVtn of nLC Vtn for a superior ABL nLC program, as disclosed by the same inventor in plural prior inventions in both 2D and 3D hierarchical NAND arrays. The details can be found in the previous patents granted to the same inventor of the present invention. Thus the detailed explanations are omitted herein for brevity. [0094] Now the detailed operations of ABL program and Read of
[0105]
[0122] Note, the total size of 4 separate 3D DCRs in
[0123] Similarly,
[0124] Now the detailed operations of ABL program and Read of
[0125] Concurrent ABL nLC program: Up to 8-page 16 KB ABL nLC program can be concurrently performed as
[0131] The detailed circuit explanations of each PB's circuit are summarized below.
[0132] 1) Each bit of PB has one bidirectional input of a 2D GBL line, which is shared by the following circuits.
[0133] I. The 1st option connects each GBL directly to a bidirectional output node of the 2D DL via a 2D NMOS transistor MD with its gate tied to DLPBSW, while the other connections are biased in a tri-state as listed in iii below.
[0134] i. When VDLPBSW≧Vdd+Vt, then VGBL=VDL.
[0135] ii. When VDLPBSW=0V, then VGBL is disconnected from VDL.
[0136] iii. VENPREB=VRW=VT1=VBIAS2=VBIAS1=0V.
[0137] II. The 2nd option connects each GBL directly to an input of a current-sensing cascode-type SA that comprises one LV PMOS transistor MP1 in series with one NMOS transistor MN9 with its gate tied to BIAS1; the other bias conditions are shown in iii below.
[0138] i. MP1: Vsource=Vdd, Vdrain=VSA, and Vgate=REFV1.
[0139] ii. MN9: Vsource=VGBL, Vdrain=VSA, and Vgate=BIAS1.
[0140] iii. VBIAS2=VRW=VT1=VENPREB=0V: To disconnect the PRE circuit, the DL and the Voltage-sensing SA circuit of MN10 from each GBL so that only the Current-sensing ABL Read and Program-verify can be performed without being disturbed.
[0141] iv. ABL Program-verify or Read operation: This is a cascode-type SA.
[0142] When a programmed cell's Vt passes the Program-verify, there would be zero cell current flow or an extremely small cell current flow. Thus, VGBL=VBIAS1−Vt(MN9), and thus VSA=Vdd.
[0143] When a programmed cell's Vt fails the Program-verify, there would be some cell current flow. Thus, VGBL<VBIAS1−Vt(MN9), and thus VSA<VBIAS1−Vt(MN9).
[0144] III. The 3rd option connects each GBL directly to an input of a Voltage-sensing SA that comprises only one precharge NMOS transistor MN10 with its gate tied to BIAS2; the other bias conditions are shown in ii below.
[0145] i. MN10: Vdrain=Vdd, Vsource=VGBL, and Vgate=VBIAS2.
[0146] ii. VBIAS2=VRW=VT1=VENPREB=0V: To disconnect the PRE circuit, the DL and the Current-sensing SA circuit of MP1 and MN9 from each GBL so that only the Voltage-sensing HBL Read and HBL Program-verify can be performed without being disturbed.
[0147] IV. The 4th option connects each GBL directly to one input Q1 of a DRAM-like Latch-type SA, which comprises the following devices.
[0148] i. A latch-type circuit comprising MP3, MP2, MP4, MN2, MN4 and MN5.
[0149] ii. One sensed voltage input of Q1 with a corresponding MN1 transistor with its gate tied to T1, drain tied to GBL and source tied to Q1.
[0150] iii. Two opposite reset signals of RS and LAT, where RS is connected to MN8's gate while LAT is connected to MN3's gate.
[0151] iv. One Reference input to Q1B via MN7 with its gate tied to T1 to track the MN1 bias condition and achieve the highest level of reliable operation.
[0152] Note, this Voltage-sensing SA circuit is only used for HBL Read, HBL Program-verify and HBL Recall operations. The detailed steps of operation are omitted herein and can be found in previous patents filed by the same inventor of the present invention.
[0153]
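The two outcomes described for the current-sensing cascode-type SA during ABL Program-verify can be summarised in a simplified behavioural model. Everything numeric here is a hypothetical placeholder: the trip current, Vdd, VBIAS1 and Vt(MN9) values are not given in the text, and a real sense amplifier would be characterised by circuit simulation rather than a threshold test.

```python
def sense_vsa(cell_current_ua, vdd=1.8, vbias1=1.2, vt_mn9=0.5, trip_ua=0.1):
    """Return (verify_passed, vsa_level) for one bit line.

    Pass: negligible cell current, so VSA is pulled up to Vdd.
    Fail: cell current flows, so VSA stays below VBIAS1 - Vt(MN9);
          the returned level is that upper bound, not an exact voltage.
    trip_ua is a hypothetical current threshold separating the two cases.
    """
    if cell_current_ua < trip_ua:
        return True, vdd
    return False, vbias1 - vt_mn9


# A fully programmed cell (no current) versus an under-programmed cell.
print(sense_vsa(0.0))
print(sense_vsa(5.0))
```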