Distributed Pattern Processor Comprising Three-Dimensional Memory Array
20170255834 · 2017-09-07
CPC classification
G11C17/10, G11C5/063, G11C17/165, G11C5/025, G11C15/00, G11C5/02 (PHYSICS)
Abstract
The present invention discloses a distributed pattern processor. The distributed pattern processor not only stores patterns permanently, but also processes them using massive parallelism. It comprises a plurality of storage-processing units (SPU), with each SPU comprising a pattern-processing circuit and at least a three-dimensional memory (3D-M) array storing at least a pattern. The 3D-M array is vertically stacked above the pattern-processing circuit.
Claims
1. A distributed pattern processor, comprising: an input bus for transferring a first pattern; a semiconductor substrate having transistors thereon; a plurality of storage-processing units (SPU) including a first SPU, said first SPU comprising at least a three-dimensional memory (3D-M) array and a pattern-processing circuit, wherein said 3D-M array is stacked above said substrate, said 3D-M array storing a second pattern; said pattern-processing circuit is formed on said substrate, said pattern-processing circuit performing pattern matching or pattern recognition for said first and second patterns; said 3D-M array and said pattern-processing circuit are communicatively coupled by an inter-level connection comprising a plurality of contact vias.
2. The distributed pattern processor array according to claim 1, further comprising a second SPU formed side-by-side with said first SPU, wherein said first and second SPUs are both communicatively coupled with said input bus.
3. The distributed pattern processor array according to claim 2, further comprising an output bus, wherein said first and second SPUs are both communicatively coupled with said output bus.
4. The distributed pattern processor array according to claim 1, wherein said 3D-M is three-dimensional writable memory (3D-W).
5. The distributed pattern processor array according to claim 4, wherein said 3D-W is three-dimensional one-time-programmable memory (3D-OTP).
6. The distributed pattern processor array according to claim 4, wherein said 3D-W is three-dimensional multiple-time-programmable memory (3D-MTP).
7. The distributed pattern processor array according to claim 6, wherein said 3D-MTP is 3D-XPoint.
8. The distributed pattern processor array according to claim 1, wherein said 3D-M is three-dimensional printed memory (3D-P).
9. The distributed pattern processor array according to claim 8, wherein said 3D-P is three-dimensional mask-programmed read-only memory (3D-MPROM).
10. The distributed pattern processor array according to claim 1, wherein said 3D-M array at least partially covers said pattern-processing circuit.
11. The distributed pattern processor array according to claim 1, wherein said pattern-processing circuit is covered by at least a first 3D-M array and a second 3D-M array.
12. The distributed pattern processor array according to claim 11, further comprising a gap between said first 3D-M array and said second 3D-M array.
13. The distributed pattern processor array according to claim 12, further comprising a routing channel in said gap.
14. The distributed pattern processor array according to claim 1 being a big-data processor, wherein said first pattern is a search pattern; said second pattern is a target pattern; and said 3D-M array stores at least a portion of big data.
15. The distributed pattern processor array according to claim 1 being an anti-malware processor, wherein said first pattern is a target pattern; said second pattern is a search pattern; and said 3D-M array stores at least a virus signature and/or a network rule.
16. The distributed pattern processor array according to claim 1 being an anti-malware processor, wherein said first pattern is a search pattern; said second pattern is a target pattern; and said 3D-M array stores at least a portion of user data.
17. The distributed pattern processor array according to claim 1 being a voice-recognition processor, wherein said first pattern is a target pattern; said second pattern is a search pattern; said 3D-M array stores at least an acoustic model and/or a language model.
18. The distributed pattern processor array according to claim 1 being a voice-recognition processor, wherein said first pattern is a search pattern; said second pattern is a target pattern; said 3D-M array stores at least a portion of voice data.
19. The distributed pattern processor array according to claim 1 being an image-recognition processor, wherein said first pattern is a target pattern; said second pattern is a search pattern; said 3D-M array stores at least an image model.
20. The distributed pattern processor array according to claim 1 being an image-recognition processor, wherein said first pattern is a search pattern; said second pattern is a target pattern; said 3D-M array stores at least a portion of image data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0018]
[0019]
[0020]
[0021]
[0022]
[0023] It should be noted that all the drawings are schematic and not drawn to scale. Relative dimensions and proportions of parts of the device structures in the figures have been shown exaggerated or reduced in size for the sake of clarity and convenience in the drawings. The same reference symbols are generally used to refer to corresponding or similar features in the different embodiments. Throughout the specification, the symbol "/" means "and/or".
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0024] Those of ordinary skill in the art will realize that the following description of the present invention is illustrative only and is not intended to be in any way limiting. Other embodiments of the invention will readily suggest themselves to such skilled persons from an examination of the within disclosure.
[0025] Referring now to
[0026]
[0027] Referring now to
[0028] The 3D-W comprises a substrate circuit 0K formed on the substrate 0. A first memory level 16A is stacked above the substrate circuit 0K, with a second memory level 16B stacked above the first memory level 16A. The substrate circuit 0K includes the peripheral circuits of the memory levels 16A, 16B. It comprises transistors 0t and the associated interconnect 0M. Each of the memory levels (e.g. 16A, 16B) comprises a plurality of first address-lines (i.e. y-lines, e.g. 2a, 4a), a plurality of second address-lines (i.e. x-lines, e.g. 1a, 3a) and a plurality of 3D-W cells (e.g. 5aa). The first and second memory levels 16A, 16B are coupled to the substrate circuit 0K through contact vias 1av, 3av, respectively. Because they couple the 3D-M array 170 and the pattern-processing circuit 180, the contact vias 1av, 3av are collectively referred to as inter-storage-processor (ISP) connections 160.
[0029] A 3D-W cell 5aa comprises a programmable layer 12 and a diode layer 14. The programmable layer 12 could be an antifuse layer (used for 3D-OTP) or a re-programmable layer (used for 3D-MTP). The diode layer 14 is broadly interpreted as any layer whose resistance at the read voltage is substantially lower than when the applied voltage has a magnitude smaller than or polarity opposite to that of the read voltage. The diode could be a semiconductor diode (e.g. p-i-n silicon diode), or a metal-oxide (e.g. TiO2) diode.
[0030] The 3D-M of
[0031] 3D-P has at least two types of 3D-P cells: a high-resistance 3D-P cell 5aa, and a low-resistance 3D-P cell 6aa. The low-resistance 3D-P cell 6aa comprises a diode layer 14, while the high-resistance 3D-P cell 5aa comprises a high-resistance layer 12. As an example, the high-resistance layer 12 is a layer of silicon oxide (SiO2). This high-resistance layer 12 is physically removed at the location of the 3D-P cell 6aa through mask programming.
[0032] In a 3D-M, each memory level comprises at least a 3D-M array. A 3D-M array is a collection of 3D-M cells in a memory level that share at least one address-line. The 3D-M array on the topmost memory level is referred to as the topmost 3D-M array. Any memory level below the topmost memory level is referred to as an intermediate memory level. A 3D-M die comprises a plurality of 3D-M blocks. Each 3D-M block comprises a topmost 3D-M array and all 3D-M arrays bound by the projection of the topmost 3D-M array on each intermediate memory level.
[0033] Referring now to
[0034] Referring now to
[0035] In this preferred embodiment, because it is bound by four peripheral circuits, the area of the pattern-processing circuit 180 must be smaller than that of the 3D-M array 170. As a result, the pattern-processing circuit 180 has limited functions. It is more suitable for simple pattern processing (e.g. string match and code match). Complex pattern processing (e.g. voice recognition, image recognition) requires a larger area for the layout of the pattern-processing circuit 180.
[0036] The embodiment of
[0037] The embodiment of
[0038] In some embodiments of the present invention, the pattern-processing circuit 180 may perform partial pattern processing. For example, the pattern-processing circuit 180 performs only simple pattern processing (e.g. simple feature extraction and analysis). The patterns that remain after this simple pattern processing are sent to an external processor (e.g. CPU, GPU) to complete the full pattern processing. Because a majority of patterns will be filtered out by the simple pattern processing, the patterns output from the pattern-processing circuit 180 are far fewer than the original patterns. This can alleviate the bandwidth requirement on the output bus 120.
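The two-stage arrangement of paragraph [0038] can be illustrated with a minimal software sketch. It is not the disclosed circuit: the function names spu_prefilter and external_full_match, the byte-string patterns and the two test predicates are all hypothetical, chosen only to show how a cheap on-chip filter reduces the number of patterns that must cross the output bus.

```python
from typing import Callable, Iterable, List


def spu_prefilter(stored_patterns: Iterable[bytes],
                  cheap_test: Callable[[bytes], bool]) -> List[bytes]:
    """Stage 1 (hypothetical stand-in for the pattern-processing circuit 180):
    keep only the stored patterns that pass a simple, cheap test."""
    return [p for p in stored_patterns if cheap_test(p)]


def external_full_match(candidates: Iterable[bytes],
                        full_test: Callable[[bytes], bool]) -> List[bytes]:
    """Stage 2 (hypothetical stand-in for an external CPU/GPU):
    run the complete test on the surviving candidates only."""
    return [p for p in candidates if full_test(p)]


# Illustrative data; these values are assumptions, not from the disclosure.
stored = [b"alpha-001", b"beta-002", b"alpha-777", b"gamma-003"]

# Simple processing: a prefix check filters out most patterns.
candidates = spu_prefilter(stored, lambda p: p.startswith(b"alpha"))

# Full processing: only the candidates are sent over the output bus.
matches = external_full_match(candidates, lambda p: p.endswith(b"777"))

print(len(stored), len(candidates), matches)
```

In this sketch only two of the four stored patterns leave the first stage, which mirrors the reduced load on the output bus 120 described above.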
[0039] In the preferred distributed pattern processor 200, the SPU 100ij could be processor-like or storage-like. The processor-like SPU appears to a user like a processor. It performs pattern processing on external user data using its embedded search-pattern database. To be more specific, the 3D-M array 170 in the SPU 100ij stores at least a portion of the search-pattern database; the input data 110 of the SPU 100ij include the user data (e.g. network packets), which are usually generated in real time; and, the pattern-processing circuit 180 of the SPU 100ij performs pattern matching or pattern recognition. Because the 3D-M array 170 and the pattern-processing circuit 180 have fast ISP-connections 160, the preferred distributed pattern processor 200 offers a faster pattern-processing speed than the conventional von Neumann architecture.
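As a rough behavioral model of the processor-like SPU of paragraph [0039], the sketch below gives each SPU a local slice of a search-pattern database and broadcasts one piece of input data to all of them. The class ProcessorLikeSPU, the function broadcast, the substring test and the sample signatures are assumptions made for illustration; the loop runs sequentially in software, whereas the SPUs in the disclosed processor operate in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProcessorLikeSPU:
    """Hypothetical model of one SPU 100ij in processor-like mode."""
    spu_id: Tuple[int, int]
    signatures: List[bytes]  # local slice of the search-pattern database

    def process(self, packet: bytes) -> List[bytes]:
        # Plain substring matching stands in for the pattern-processing circuit.
        return [s for s in self.signatures if s in packet]


def broadcast(spus: List[ProcessorLikeSPU],
              packet: bytes) -> Dict[Tuple[int, int], List[bytes]]:
    """Send the same input data to every SPU and collect non-empty results."""
    hits: Dict[Tuple[int, int], List[bytes]] = {}
    for spu in spus:  # sequential here; concurrent in the modeled hardware
        found = spu.process(packet)
        if found:
            hits[spu.spu_id] = found
    return hits


# Illustrative signatures and packet; all values are made up.
spus = [
    ProcessorLikeSPU((0, 0), [b"EVIL", b"WORM"]),
    ProcessorLikeSPU((0, 1), [b"TROJAN"]),
]
hits = broadcast(spus, b"GET /index.html?x=TROJAN HTTP/1.1")
print(hits)
```

Each SPU consults only its own stored signatures, so no signature data moves between SPUs; only the input data and the match results travel on the buses.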
[0040] On the other hand, the storage-like SPU appears to a user like a storage device. Its primary purpose is to permanently store user data, with a secondary purpose of performing pattern processing using its embedded pattern-processing circuit. To be more specific, the 3D-M array 170 in the SPU 100ij permanently stores at least a portion of a user database; the input data 110 of the SPU 100ij include at least a search pattern; and, the pattern-processing circuit 180 of the SPU 100ij performs pattern matching or pattern recognition. Just like flash memory, a plurality of distributed pattern-processor dice 200 can be packaged into a storage card (e.g. an SD card, a TF card) or a solid-state drive (SSD). They can be used to store mass user data (e.g. in a user-data archive). Because each SPU 100ij in each distributed pattern-processor die 200 has its own pattern-processing circuit 180, this pattern-processing circuit 180 only needs to process the user data stored in the 3D-M array 170 of the same SPU 100ij. As a result, no matter how large the capacity of a storage card (or a solid-state drive) is, the processing time for the whole storage card (or the whole solid-state drive) is similar to the processing time for a single SPU 100ij. This is unimaginable for the conventional von Neumann architecture.
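The scaling argument of paragraph [0040] can be made concrete with a simple timing model. This is only a sketch under stated assumptions: the function names, the per-SPU data size, the per-SPU scan rate and the bus bandwidth are hypothetical figures picked for illustration, and the model ignores the time needed to distribute the search pattern and collect results.

```python
def distributed_scan_time(n_spus: int, bytes_per_spu: int,
                          spu_scan_rate: int) -> float:
    """Seconds to scan a whole card when every SPU scans only its own array.
    All SPUs work concurrently, so the result does not depend on n_spus."""
    if n_spus < 1:
        raise ValueError("at least one SPU is required")
    return bytes_per_spu / spu_scan_rate


def serial_scan_time(n_spus: int, bytes_per_spu: int,
                     bus_bandwidth: int) -> float:
    """Seconds to stream the same data to one external processor over a bus."""
    return (n_spus * bytes_per_spu) / bus_bandwidth


# Hypothetical figures, for illustration only.
BYTES_PER_SPU = 1_000_000        # data held by one SPU
SPU_SCAN_RATE = 1_000_000_000    # bytes per second scanned inside one SPU
BUS_BANDWIDTH = 10_000_000_000   # bytes per second to an external processor

small_card = distributed_scan_time(1_000, BYTES_PER_SPU, SPU_SCAN_RATE)
large_card = distributed_scan_time(100_000, BYTES_PER_SPU, SPU_SCAN_RATE)
serial_small = serial_scan_time(1_000, BYTES_PER_SPU, BUS_BANDWIDTH)
serial_large = serial_scan_time(100_000, BYTES_PER_SPU, BUS_BANDWIDTH)

print(small_card, large_card, serial_small, serial_large)
```

Under these assumptions the distributed scan time is the same for both card sizes, while the time to stream the data to an external processor grows in proportion to the capacity.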
[0041] A big difference between the present invention and prior art is that the 3D-M arrays in a storage-like SPU are the final storage place for the user data. In prior art, the memory embedded in a processor is used as a cache and only temporarily stores user data; and, all user data are permanently stored in external storage (e.g. hard drive, optical drive, tape). This arrangement causes the memory-wall bottleneck faced by the von Neumann architecture. In addition, prior art cannot simply switch to the permanent-storage approach used in the present invention. Assume that prior art adopted the permanent-storage approach, i.e. the embedded memory in the processor permanently stores user data. Once the embedded memory is full, the processor can only serve the data stored inside it, but not any outside data. Thus, a large number of processors would be required for mass data. Since conventional processors are expensive, prior art using the permanent-storage approach would incur a high price tag.
[0042] In contrast, for the SPU 100ij disclosed in the present invention, the pattern-processing circuit 180 is formed at the same time as the peripheral circuits of the 3D-M array 170. Because the peripheral circuits are needed for the 3D-M anyway, and because the peripheral circuits only occupy a small area on the substrate 0 so that most of the substrate area can be used to form the pattern-processing circuit 180 (
[0043] In the following paragraphs, several applications of the distributed pattern processor are disclosed. One application is a big-data processor. A big-data processor is used for big-data analytics (e.g. financial data mining, e-commerce data mining, bio-informatics). Big data are generally unstructured or semi-structured data which cannot be analyzed using a relational database. To improve the pattern-processing speed, a storage-like distributed pattern processor 200 is preferably used: the input data 110 include search keywords or other regular expressions; the 3D-M array 170 stores at least a portion of the big data; and, the pattern-processing circuit 180 performs pattern processing. In the big-data processor, the 3D-M is preferably a 3D-MTP. It can be used to store big data.
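The storage-like arrangement of paragraph [0043] can be sketched in software as a regular-expression search over data shards, where each shard stands in for the portion of big data held by one SPU. The function search_shards, the shard identifiers and the sample text are hypothetical; Python's standard re module is used here only as a convenient stand-in for the pattern-processing circuit, and the sketch does not handle a match that straddles two shards.

```python
import re
from typing import Dict, List


def search_shards(shards: Dict[str, str], pattern: str) -> Dict[str, List[str]]:
    """Apply one search pattern to every shard and return matches per shard.
    Each shard models the data stored in the 3D-M array of one SPU."""
    rx = re.compile(pattern)
    results: Dict[str, List[str]] = {}
    for spu_id, text in shards.items():  # sequential here; parallel in hardware
        found = rx.findall(text)
        if found:
            results[spu_id] = found
    return results


# Illustrative shards; identifiers and contents are made up.
shards = {
    "spu_00": "order 1001 shipped; order 1002 pending",
    "spu_01": "no orders today",
    "spu_10": "refund for order 1003",
}

results = search_shards(shards, r"order \d+")
print(results)
```

Here the search pattern plays the role of the input data 110 and the shards play the role of the stored big data, which is the reverse of the processor-like case, where the patterns are stored and the data arrive on the input bus.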
[0044] Another application is an anti-malware processor. It is used for network security and/or anti-virus operations. Network security applications may take the processor-like approach: the input data 110 include at least a network packet; the 3D-M array 170 stores at least a network rule and/or a virus signature; and, the pattern-processing circuit 180 performs pattern processing. Anti-virus operations may take either the processor-like approach or the storage-like approach. For the processor-like approach, the input data 110 are at least a portion of the user data stored in a computer; the 3D-M array 170 stores at least a virus signature; and, the pattern-processing circuit 180 performs pattern processing. For the storage-like approach, the input data 110 include a virus signature from a virus signature database; the 3D-M array 170 stores at least a portion of the user database; and, the pattern-processing circuit 180 performs pattern processing. For the processor-like approach, the 3D-M is preferably a 3D-OTP or 3D-MTP. It can be used to store the network rule database and/or the virus signature database. For the storage-like approach, the 3D-M is preferably a 3D-MTP. It can be used to store the user database.
[0045] The distributed pattern processor 200 may also be used for voice recognition and/or image recognition. Recognition can be performed using either the processor-like approach or the storage-like approach. For the processor-like approach, the input data 110 include at least a portion of voice/image data collected by at least a sensor; the 3D-M array 170 stores at least a recognition model (e.g. an acoustic model, a language model, an image model); and, the pattern-processing circuit 180 performs pattern processing. For the storage-like approach, the input data 110 include the search voice/image patterns; the 3D-M array 170 stores at least a portion of the voice/image archives; and, the pattern-processing circuit 180 performs pattern processing. For the processor-like approach, the 3D-M is preferably a 3D-P, 3D-OTP or 3D-MTP. It can be used to store the acoustic model database, the language model database and/or the image model database. For the storage-like approach, the 3D-M is preferably a 3D-MTP. It can be used to store the voice/image archives.
[0046] While illustrative embodiments have been shown and described, it would be apparent to those skilled in the art that many more modifications than those mentioned above are possible without departing from the inventive concepts set forth herein. The invention, therefore, is not to be limited except in the spirit of the appended claims.