SPLIT ARRAY ARCHITECTURE FOR ANALOG NEURAL MEMORY IN A DEEP LEARNING ARTIFICIAL NEURAL NETWORK
20220374696 · 2022-11-24
Assignee
Inventors
- Hieu Van Tran (San Jose, CA)
- Thuan Vu (San Jose, CA)
- Stanley Hong (San Jose, CA)
- Stephen Trinh (San Jose, CA)
- Anh Ly (San Jose, CA)
CPC classification
G11C7/12
PHYSICS
G11C7/16
PHYSICS
G11C7/1006
PHYSICS
G06N3/0442
PHYSICS
International classification
G11C7/10
PHYSICS
G11C7/12
PHYSICS
Abstract
Numerous embodiments are disclosed for splitting an array of non-volatile memory cells in an analog neural memory in a deep learning artificial neural network into multiple parts. Each part of the array interacts with certain circuitry dedicated to that part and with other circuitry that is shared with one or more other parts of the array.
Claims
1. An analog neural memory, comprising: an array of non-volatile memory cells arranged into rows and columns; a first column decoder coupled to a first half of the columns in the array; a second column decoder coupled to a second half of the columns in the array; a first output circuit coupled to the first column decoder for generating a first output from one or more columns in the first half of the columns during a first read operation; and a second output circuit coupled to the second column decoder for generating a second output from one or more columns in the second half of the columns during a second read operation.
2. The analog neural memory of claim 1, wherein the first read operation and the second read operation occur concurrently.
3. The analog neural memory of claim 1, further comprising: a shared bit line driver coupled to the first column decoder and to the second column decoder during a program operation.
4. The analog neural memory of claim 1, wherein a shared high voltage decoder is selectively coupled to all rows in the array.
5. The analog neural memory of claim 1, wherein a shared control gate high voltage decoder is selectively coupled to all rows in the array.
6. The analog neural memory of claim 1, wherein a shared erase gate high voltage decoder is selectively coupled to all rows in the array.
7. The analog neural memory of claim 1, wherein a shared row decoder is coupled to all rows in the array.
8. The analog neural memory of claim 1, wherein continuous column diffusion occurs between columns in the first half of the columns and the second half of the columns.
9. An analog neural memory, comprising: a first array of non-volatile memory cells arranged into rows and columns; a second array of non-volatile memory cells arranged into rows and columns; a third array of non-volatile memory cells arranged into rows and columns; a fourth array of non-volatile memory cells arranged into rows and columns; a first row decoder coupled to rows of the first array and the second array; a second row decoder coupled to rows of the third array and the fourth array; a first column decoder coupled to the first array; a second column decoder coupled to the second array; a third column decoder coupled to the third array; a fourth column decoder coupled to the fourth array; a first output circuit coupled to the first column decoder for generating a first output from one or more rows in the first array during a first read operation; a second output circuit coupled to the second column decoder for generating a second output from one or more rows in the second array during a first read operation; a third output circuit coupled to the third column decoder for generating a third output from one or more rows in the third array during a second read operation; and a fourth output circuit coupled to the fourth column decoder for generating a fourth output from one or more rows in the fourth array during the second read operation.
10. The analog neural memory of claim 9, wherein the first read operation and the third read operation occur concurrently.
11. The analog neural memory of claim 9, wherein the second read operation and the fourth read operation occur concurrently.
12. The analog neural memory of claim 9, further comprising: a first shared bit line driver coupled to the first column decoder and the second column decoder during a program operation; and a second shared bit line driver coupled to the third column decoder and the fourth column decoder during a program operation.
13. The analog neural memory of claim 9, wherein each of the first output circuit, the second output circuit, the third output circuit, and the fourth output circuit comprises a current-to-voltage converter.
14. The analog neural memory of claim 13, wherein each of the first output circuit, the second output circuit, the third output circuit, and the fourth output circuit further comprises an analog-to-digital converter coupled to the current-to-voltage converter.
15. The analog neural memory of claim 9, wherein a shared high voltage decoder is selectively coupled to all rows in the array.
16. The analog neural memory of claim 9, wherein the first array, the second array, the third array, and the fourth array each comprises continuous column diffusion between columns.
17. The analog neural memory of claim 9, wherein the first array, the second array, the third array, and the fourth array are formed from one physical array and divided from one another by a portion of the physical array without metal contacts.
18. An analog neural memory, comprising: an array of non-volatile memory cells arranged into rows and columns; a first output circuit coupled to a first half of the columns in the array for generating a first output from one or more columns in the first half of the columns during a first read operation; and a second output circuit coupled to a second half of the columns for generating a second output from one or more columns in the second half of the columns during a second read operation.
19. The analog neural memory of claim 18, wherein the first read operation and the second read operation occur concurrently.
20. The analog neural memory of claim 18, wherein a shared high voltage decoder is selectively coupled to all rows in the array.
21. The analog neural memory of claim 18, wherein a shared control gate high voltage decoder is selectively coupled to all rows in the array.
22. The analog neural memory of claim 18, wherein a shared erase gate high voltage decoder is selectively coupled to all rows in the array.
23. The analog neural memory of claim 18, wherein a shared word line decoder is selectively coupled to all rows in the array.
24. The analog neural memory of claim 18, wherein the array comprises continuous column diffusion between columns in the first half of the columns and the second half of the columns.
25. An analog neural memory, comprising: an array of non-volatile memory cells arranged into rows and columns, each non-volatile memory cell comprising a control gate terminal, a word line terminal, a source line terminal, and an erase gate terminal; a plurality of control gate lines, each control gate line coupled to control gate terminals of a row of non-volatile memory cells; a plurality of word lines, each word line coupled to word line terminals of a row of non-volatile memory cells; a plurality of source lines, each source line coupled to source line terminals of two adjacent rows of non-volatile memory cells; and a plurality of erase gate lines, each erase gate line coupled to erase gate terminals of a row of non-volatile memory cells; wherein a control gate line of a first row is coupled to a control gate line of a second row, an erase gate line of the first row is coupled to an erase gate line of a third row, and a source line of the first row is coupled to a source line of the third row.
26. The analog neural memory of claim 25, wherein the first row and the second row are in different sectors.
27. The analog neural memory of claim 25, wherein the first row and the third row are in different sectors.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0137] The artificial neural networks of the present invention utilize a combination of CMOS technology and non-volatile memory arrays.
[0138] VMM System Overview
[0139]
[0140] The input circuit 3106 may include circuits such as a DAC (digital to analog converter), DPC (digital to pulses converter, digital to time modulated pulse converter), AAC (analog to analog converter, such as a current to voltage converter, logarithmic converter), PAC (pulse to analog level converter), or any other type of converter. The input circuit 3106 may implement normalization, linear or non-linear up/down scaling functions, or arithmetic functions. The input circuit 3106 may implement a temperature compensation function for input levels. The input circuit 3106 may implement an activation function such as ReLU or sigmoid. The output circuit 3107 may include circuits such as an ADC (analog to digital converter, to convert neuron analog output to digital bits), AAC (analog to analog converter, such as a current to voltage converter, logarithmic converter), APC (analog to pulse(s) converter, analog to time modulated pulse converter), a current-to-voltage converter, or any other type of converter. The output circuit 3107 may implement an activation function such as ReLU or sigmoid. The output circuit 3107 may implement statistical normalization, regularization, up/down scaling/gain functions, statistical rounding, or arithmetic functions (e.g., add, subtract, divide, multiply, shift, log) for neuron outputs. The output circuit 3107 may implement a temperature compensation function for neuron outputs or array outputs (such as bitline output) so as to keep power consumption of the array approximately constant or to improve precision of the array (neuron) outputs such as by keeping the IV slope approximately the same.
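As a purely illustrative software analogy for the signal chain described above, the following Python sketch models a DAC in the input circuit and an ADC preceded by a ReLU activation in the output circuit. The function names, the 8-bit resolution, and the 1.0 reference level are assumptions chosen for illustration and do not appear in the specification; an actual input circuit 3106 or output circuit 3107 may use any of the other converter types listed.

```python
def dac(code, bits=8, v_ref=1.0):
    """Map an unsigned digital code to an analog level in [0, v_ref]."""
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError("code out of range for the assumed resolution")
    return v_ref * code / max_code


def relu(level):
    """Rectified linear activation: pass positive levels, zero the rest."""
    return level if level > 0.0 else 0.0


def adc(level, bits=8, v_ref=1.0):
    """Quantize an analog level to an unsigned code, clipping at the rails."""
    max_code = (1 << bits) - 1
    clipped = min(max(level, 0.0), v_ref)
    return round(clipped / v_ref * max_code)


def output_circuit(level):
    """Assumed output path: activation first, then digitization."""
    return adc(relu(level))
```

In this sketch the activation is applied in the analog domain before digitization; applying it after the ADC would be an equally valid assumption.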
[0141]
[0142]
[0143] Optionally, with further reference to
[0144]
[0145] Unlike in VMM system 3100, in VMM system 3300 certain operations are split between different sets of circuitry. Specifically, array 3301a is operated upon by column decoder 3304 and current-to-voltage converter 3306, and array 3301b is operated upon by column decoder 3305 and current-to-voltage converter 3307. This allows multiple read and/or program operations to be performed simultaneously, where read or program operations can be performed concurrently on one or more cells in array 3301a and one or more cells in array 3301b.
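The split described above can be pictured with a short Python sketch in which each array has its own column-selection and current-to-voltage path. This is only a behavioral model under stated assumptions: the function names and the feedback resistance r_feedback are hypothetical, the same row inputs are assumed to drive both arrays, and the two paths, which would operate at the same time in hardware, are simply evaluated one after the other.

```python
def column_currents(weights, inputs, columns):
    """Sum input * weight down each selected column (bit line)."""
    return [sum(x * row[c] for x, row in zip(inputs, weights)) for c in columns]


def current_to_voltage(currents, r_feedback=1):
    """Ideal current-to-voltage conversion, V = I * R (R is an assumed value)."""
    return [i * r_feedback for i in currents]


def read_split(array_a, array_b, inputs, cols_a, cols_b):
    """Read two arrays through independent decoder/converter paths."""
    # Path for array 3301a: column decoder 3304, converter 3306.
    v_a = current_to_voltage(column_currents(array_a, inputs, cols_a))
    # Path for array 3301b: column decoder 3305, converter 3307.
    v_b = current_to_voltage(column_currents(array_b, inputs, cols_b))
    return v_a, v_b
```

Because neither path depends on the other's result, the two reads can proceed independently, which is the property the split architecture relies on.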
[0146] Current-to-voltage converter circuits 3306 and 3307 are both coupled to shared analog-to-digital converter 3308, which is used in a time multiplexing fashion during read operations, and to shared bit line drivers 3309, which are used during program and erase operations. For example, in a read operation, the array 3301a is enabled and is coupled to the column decoder 3304 and to the current-to-voltage converter circuit 3306 while the array 3301b is enabled and is coupled to the column decoder 3305 and the current-to-voltage converter circuit 3307 at the same time. The output voltages from the current-to-voltage converter circuits 3306 and 3307 are sampled and held (S/H), e.g., by S/H capacitors inside the shared ADC 3308, and these array output voltages are digitized (converted) by the time multiplexed shared ADC 3308 (since it is shared between the current-to-voltage converter circuits 3306 and 3307). For example, for one ADC shared between two current-to-voltage converter circuits, two sets of S/H capacitors are used. In another embodiment, one ADC can be used for N current-to-voltage converter circuits, and in this case N sets of S/H capacitors are used.
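The sample-and-hold arrangement described above may be illustrated by the following Python sketch, in which one ADC serves N current-to-voltage converter circuits by holding N voltages captured together and then digitizing them one per time slot. The class name SharedAdc, its method names, the 8-bit resolution, and the 1.0 reference level are assumptions made for illustration only.

```python
class SharedAdc:
    """Behavioral model of one ADC time-multiplexed across N held voltages."""

    def __init__(self, n_channels, bits=8, v_ref=1.0):
        self.max_code = (1 << bits) - 1
        self.v_ref = v_ref
        # One set of S/H capacitors per current-to-voltage converter.
        self.held = [0.0] * n_channels

    def sample_and_hold(self, voltages):
        """Capture every converter output at the same instant."""
        if len(voltages) != len(self.held):
            raise ValueError("one voltage per S/H capacitor set is required")
        self.held = list(voltages)  # copy, so later input changes are ignored

    def convert_all(self):
        """Digitize the held voltages sequentially, one per time slot."""
        codes = []
        for v in self.held:
            clipped = min(max(v, 0.0), self.v_ref)
            codes.append(round(clipped / self.v_ref * self.max_code))
        return codes
```

Holding a copy of each voltage is what allows the arrays to be read at the same time while the conversions are performed in sequence.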
[0147] The use of a shared ADC between two current-to-voltage converter circuits can be applied to
[0148]
[0149]
[0150] In VMM system 3500, certain operations are split between different sets of circuitry. Specifically, array 3501a is operated upon by column decoder 3505 and output circuit 3509; array 3501b is operated upon by column decoder 3507 and output circuit 3511; array 3501c is operated upon by column decoder 3506 and output circuit 3510; and array 3501d is operated upon by column decoder 3508 and output circuit 3512. This allows multiple read and/or program operations to be performed simultaneously in all four arrays at once, where read or program operations can be performed concurrently on one or more cells in array 3501a, one or more cells in array 3501b, one or more cells in array 3501c, and one or more cells in array 3501d. Arrays 3501a and 3501b are both selectively coupled to shared bit line drivers 3513 during program and erase operations. Arrays 3501c and 3501d are both selectively coupled to shared bit line drivers 3514 during program and erase operations.
[0151] For example, a first read operation can be performed where column decoder 3505 and output circuit 3509 generate a first output from one or more rows in array 3501a, a second read operation can be performed where column decoder 3506 and output circuit 3510 generate a second output from one or more rows in array 3501c, a third read operation can be performed where column decoder 3507 and output circuit 3511 generate a third output from one or more rows in array 3501b, and a fourth read operation can be performed where column decoder 3508 and output circuit 3512 generate a fourth output from one or more rows in array 3501d. Optionally, the first and third read operations can occur concurrently. Optionally, the second and fourth read operations can occur concurrently.
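The pairing of read operations in the example above can be sketched in Python as follows. The table READ_OPS restates the array, column decoder, and output circuit numerals given in the paragraph; the function names, the use of a thread pool as a stand-in for hardware concurrency, and the assumption that the same inputs drive every array are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

# Read operation -> (array, column decoder, output circuit), per the text above.
READ_OPS = {
    1: ("3501a", 3505, 3509),
    2: ("3501c", 3506, 3510),
    3: ("3501b", 3507, 3511),
    4: ("3501d", 3508, 3512),
}
# Optional pairings: first with third, second with fourth.
CONCURRENT_GROUPS = [(1, 3), (2, 4)]


def read_array(weights, inputs):
    """Return one summed output per column for the given row inputs."""
    return [sum(x * w for x, w in zip(inputs, col)) for col in zip(*weights)]


def run_reads(arrays, inputs):
    """Run each group of read operations together, one group after another."""
    results = {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        for group in CONCURRENT_GROUPS:
            futures = {
                op: pool.submit(read_array, arrays[READ_OPS[op][0]], inputs)
                for op in group
            }
            for op, future in futures.items():
                results[op] = future.result()
    return results
```

Each group finishes before the next begins, so the sketch models two phases of two concurrent reads rather than four reads at once.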
[0152]
[0153]
[0154]
[0155] In the arrays of
[0156]
[0157] Rows 3801 (a first row) and 3811 (a second row) share a control gate line (CG0) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3802 and 3812 share a control gate line (CG1) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3803 and 3813 share a control gate line (CG2) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3804 and 3814 share a control gate line (CG3) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3805 and 3815 share a control gate line (CG4) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3806 and 3816 share a control gate line (CG5) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); rows 3807 and 3817 share a control gate line (CG6) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line); and rows 3808 and 3818 share a control gate line (CG7) (meaning that the control gate terminal of each cell in those rows is coupled to the same control gate line). This means that the control gates are shared across the sectors. These couplings allow different rows to share decoder circuitry. The array terminals are shared such that the program or erase disturb is reduced by having a reduced amount of erase or program voltage stress on un-selected cells.
[0158] Rows 3801 (a first row), 3802 (a third row), 3805, and 3806 share an erase gate line (EG0) (meaning that the erase gate terminal of each cell in those rows is coupled to the same erase gate line) and a source line (SL0) (meaning that the source line terminal of each cell in those rows is coupled to the same source line); rows 3803, 3804, 3807, and 3808 share an erase gate line (EG1) (meaning that the erase gate terminal of each cell in those rows is coupled to the same erase gate line) and a source line (SL1) (meaning that the source line terminal of each cell in those rows is coupled to the same source line); rows 3811, 3812, 3815, and 3816 share an erase gate line (EG0) (meaning that the erase gate terminal of each cell in those rows is coupled to the same erase gate line) and a source line (SL0) (meaning that the source line terminal of each cell in those rows is coupled to the same source line); and rows 3813, 3814, 3817, and 3818 share an erase gate line (EG1) (meaning that the erase gate terminal of each cell in those rows is coupled to the same erase gate line) and a source line (SL1) (meaning that the source line terminal of each cell in those rows is coupled to the same source line).
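The line sharing described in the two preceding paragraphs follows a regular pattern, which the Python sketch below expresses as a lookup from a row numeral to its control gate, erase gate, and source line. The function name lines_for_row is hypothetical. The sketch assumes rows 3801 through 3808 form one sector and rows 3811 through 3818 form another, and it further assumes that the EG0/EG1 and SL0/SL1 labels denote separate lines in each sector (modeled by tagging them with a sector index), whereas the control gate lines are shared across the sectors as stated above.

```python
def lines_for_row(row):
    """Return (CG, EG, SL) line identifiers for a row numeral such as 3804."""
    sector, pos = divmod(row - 3801, 10)
    if sector not in (0, 1) or not 0 <= pos <= 7:
        raise ValueError(f"row {row} is not one of 3801-3808 or 3811-3818")
    # Control gate lines are shared across sectors: 3801 and 3811 both use CG0.
    cg = f"CG{pos}"
    # Within a sector, row pairs alternate between group 0 and group 1,
    # so positions 0, 1, 4, 5 map to EG0/SL0 and 2, 3, 6, 7 to EG1/SL1.
    group = (pos // 2) % 2
    return cg, (sector, f"EG{group}"), (sector, f"SL{group}")
```

For example, calling lines_for_row on 3801 and on 3811 yields the same control gate label, while the erase gate and source line identifiers differ by sector under the assumption stated above.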
[0159]
[0160] It should be noted that, as used herein, the terms “over” and “on” both inclusively include “directly on” (no intermediate materials, elements or space disposed therebetween) and “indirectly on” (intermediate materials, elements or space disposed therebetween). Likewise, the term “adjacent” includes “directly adjacent” (no intermediate materials, elements or space disposed therebetween) and “indirectly adjacent” (intermediate materials, elements or space disposed therebetween), “mounted to” includes “directly mounted to” (no intermediate materials, elements or space disposed therebetween) and “indirectly mounted to” (intermediate materials, elements or space disposed therebetween), and “electrically coupled” includes “directly electrically coupled to” (no intermediate materials or elements therebetween that electrically connect the elements together) and “indirectly electrically coupled to” (intermediate materials or elements therebetween that electrically connect the elements together). For example, forming an element “over a substrate” can include forming the element directly on the substrate with no intermediate materials/elements therebetween, as well as forming the element indirectly on the substrate with one or more intermediate materials/elements therebetween.