Device Disaggregation For Improved Performance
20200051999 · 2020-02-13
Inventors
- Javier A. DeLaCruz (San Jose, CA)
- Don Draper (San Jose, CA, US)
- Jung Ko (San Jose, CA)
- Steven L. Teig (Menlo Park, CA)
Cpc classification
H01L2224/80
ELECTRICITY
H01L2224/94
ELECTRICITY
H01L25/0652
ELECTRICITY
H01L21/8221
ELECTRICITY
H01L25/50
ELECTRICITY
International classification
Abstract
The present disclosure provides chip architectures for FPGAs and other routing implementations that provide for increased memory with high bandwidth, in a reduced size, accessible with reduced latency. Such architectures include a first layer in advanced node and a second layer in legacy node. The first layer includes an active die, active circuitry, and a configurable memory, and the second layer includes a passive die with wiring. The second layer is bonded to the first layer such that the wiring of the second layer interconnects with the active circuitry of the first layer and extends an amount of wiring possible in the first layer.
Claims
1. A 3D semiconductor device, comprising: a first layer in advanced node, the first layer including an active die, active circuitry, and a configurable memory; a second layer in legacy node, the second layer including a passive die with wiring, wherein the second layer is bonded to the first layer such that the wiring of the second layer interconnects with the active circuitry of the first layer to connect one or more first points on the active die to one or more second points on the active die; and a plurality of external interconnects extending through the passive die and adapted to couple the wiring with an external device.
2. The 3D semiconductor device of claim 1, wherein the active circuitry of the first layer includes a plurality of multiplexers.
3. The 3D semiconductor device of claim 2, wherein at least some of the multiplexers have ratios of at least 32:1 or greater.
4. The 3D semiconductor device of claim 2, wherein at least some of the active circuitry is hardcoded.
5. The 3D semiconductor device of claim 1, wherein interconnects between the first layer and the second layer have a pitch of 10 µm or less.
6. The 3D semiconductor device of claim 1, wherein an interconnect density between the first layer and the second layer is 10^5 to 10^6 connections/mm^2.
7. The 3D semiconductor device of claim 2, wherein the active die further comprises an embedded memory residing over the multiplexers and look-up tables of the active circuitry.
8. The 3D semiconductor device of claim 1, wherein the 3D semiconductor device is a field programmable gate array.
9. The 3D semiconductor device of claim 1, wherein the 3D semiconductor device is a switch matrix.
10. The 3D semiconductor device of claim 1, wherein the 3D semiconductor device is a traffic manager.
11. The 3D semiconductor device of claim 1, wherein the plurality of external interconnects comprise data interconnects, power interconnects, and ground interconnects in a repeating pattern.
12. The 3D semiconductor device of claim 11, wherein the repeating pattern includes one or more stripes of the data interconnects between one or more stripes of power interconnects and one or more stripes of ground interconnects.
13. A 3D semiconductor device, comprising: a first tier; and a second tier bonded to the first tier, wherein each of the first tier and the second tier comprises: a first layer in advanced node, the first layer including an active die, active circuitry, and a configurable memory; and a second layer, the second layer including a passive die with wiring, wherein the second layer is bonded to the first layer such that the wiring of the second layer interconnects with the active circuitry of the first layer to connect one or more first points on the active die to one or more second points on the active die.
14. The 3D semiconductor device of claim 13, wherein the second layer is bonded to the first layer such that the wiring of the second layer interconnects with the active circuitry of the first layer to connect one or more third points on the active die to an external device.
15. The 3D semiconductor device of claim 13, wherein the second layer is in legacy node.
16. The 3D semiconductor device of claim 13, wherein the first tier and the second tier are face-to-face bonded.
17. The 3D semiconductor device of claim 13, further comprising a third tier, the third tier also comprising an advanced node layer and a passive layer.
18. The 3D semiconductor device of claim 17, wherein the third tier is front-to-back bonded to the second tier.
19. The 3D semiconductor device of claim 17, wherein the third tier is back-to-back bonded to the second tier.
20. The 3D semiconductor device of claim 19, further comprising through-silicon vias extending between the second tier and the third tier.
21. The 3D semiconductor device of claim 13, wherein the active circuitry of the first tier includes at least one look-up table, the at least one look-up table configured to access the configurable memory of the first tier and the configurable memory of the second tier.
22. The 3D semiconductor device of claim 21, wherein the at least one look-up table is configured to access the configurable memory of the first tier and the configurable memory of the second tier in a given clock cycle.
23. The 3D semiconductor device of claim 13, further comprising an interface including a plurality of data interconnects, power interconnects, and ground interconnects in a repeating pattern.
24. A field programmable gate array, comprising: a first layer in advanced node, the first layer including an active die and active circuitry, the active circuitry comprising a plurality of multiplexers and a plurality of hardcoded logical connections; and a second layer in legacy node, the second layer including a passive die with wiring, wherein the second layer is bonded to the first layer such that the wiring of the second layer interconnects with the active circuitry of the first layer to connect one or more first points on the active die to one or more second points on the active die.
25. The field programmable gate array of claim 24, further comprising an embedded memory residing over the plurality of multiplexers and a plurality of look-up tables.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
DETAILED DESCRIPTION
[0027] While the following disclosure provides a number of examples, it should be understood that the concepts and techniques are not limited to specific examples, but rather can be more broadly applied. For example, while the examples herein may refer to FPGAs, it should be understood that the technology described in such examples could also be applied to other devices, such as routers, switch matrix devices, traffic managers, etc.
[0028]
[0029] The passive wiring die 120 may be formed of any semiconductor material, such as silicon, glass, InP, SiGe, SOI, GaAs, GaN, SiC, LiTaO3, LiNbO3, sapphire, etc. In some examples, it may be extremely thin, such as having a thickness below 50 µm. For example, the passive die may be approximately 5 µm thick in some examples. However, it should be understood that any thickness may be used.
[0030] The passive wiring die 120 includes wiring in one or more routing layers. The routing layers may be formed using any of a variety of conventional fabrication techniques used for legacy nodes. Multiple routing layers may be separated by, for example, passivation layers, such as silicon dioxide, silicon nitride, polymer or other materials. The passive die 120 can make data signal connections back to the active die 110. In contrast to a conventional interposer which can only take signals to and from an active die, adding layers to the active die 110 improves the connectivity within the active die 110. While a single passive wiring die 120 is shown in
[0031] As shown in
[0032] The active die 110 may be in silicon, GaAs, SiGe, SOI, or any substrate suitable for active circuitry. The active die 110 may include, for example, an FPGA or components thereof, or other logic devices, such as network switching circuitry. As such, the active die 110 may include a plurality of multiplexers and look-up tables (LUTs).
[0033] The joining of the passive die 120 to the active die 110 extends a possible amount of wiring of the active die 110. For example, the passive die 120 provides for connections between points on the active die 110 to other points on the same active die 110. The extra wiring creates an ability for the active die 110 to use deep multiplexers, such as 32:1 or greater. For example,
[0034] The multiplexers 282, 284 may have various ratios, including large ratios such as 32:1, 64:1, or greater. Moreover, the ratio for a first multiplexer 282 may differ from that of a second multiplexer 284. While only two multiplexers are shown, it should be understood that the active circuitry 115 may include any number of multiplexers.
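As an illustrative sketch only, and not part of the disclosure, the behavior of an N:1 multiplexer and the number of select bits it requires can be modeled in Python. The function names `select_bits` and `mux` are hypothetical and chosen for this example; the model ignores timing, area, and power.

```python
def select_bits(num_inputs: int) -> int:
    """Number of select bits needed to choose one of num_inputs inputs."""
    if num_inputs < 1:
        raise ValueError("a multiplexer needs at least one input")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (num_inputs - 1).bit_length()


def mux(inputs, select: int):
    """Model an N:1 multiplexer: route the selected input to the output."""
    if not 0 <= select < len(inputs):
        raise ValueError("select value out of range")
    return inputs[select]


# 32 candidate input wires carrying alternating 0/1 values (example data).
wires = [i % 2 for i in range(32)]
out = mux(wires, 5)  # selects wire 5
```

Under this model, a 32:1 multiplexer needs 5 select bits and a 64:1 multiplexer needs 6, so each deeper multiplexer brings many more candidate wires to a single output for one extra select bit.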
[0035] The additional wiring of the passive die 120 makes it possible to program more code into smaller devices. Because the passive die 120 is less expensive than the active die 110, combining the active die with the passive die provides the additional wiring at a lower cost than adding extra layers in advanced node. Moreover, the design may be fabricated using legacy foundry equipment, thereby reducing the need to purchase new equipment. For example, existing equipment from legacy nodes can be used because the wiring layers do not need the finest geometry. This reduces the cost of adding the extra wiring layers.
[0036] In some examples, costs may be further reduced by prewiring some connections of the active node circuitry, rather than using multiplexers. For example, rather than making every route path possible with numerous multiplexers, an implementation of the chip may only require some routes to have various possible paths while other routes are the same every time. The routes that are the same every time may be fixed in place by hardcoding or prewiring the connections, rather than using a multiplexer. For example, a generic FPGA may be used and one or more of the routing paths may be hardcoded, such that the paths are fixed in a program in such a way that they cannot be altered without altering the program. For example, inputs, outputs, or the paths between them could not be changed without altering the source code. The reduction in multiplexers will result in reduced power consumption of the device.
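The trade-off between programmable and prewired routes can be illustrated with a small counting model. This is a sketch under stated assumptions rather than a method from the disclosure: each route is described only by its number of candidate sources, a route with exactly one candidate is treated as prewired, and the function name `routing_cost` is hypothetical. The model counts multiplexers and configuration bits; it does not estimate power.

```python
def routing_cost(candidates_per_route):
    """Return (mux_count, config_bits) for a list of routes.

    Each entry is the number of candidate sources a route can select from.
    A route with exactly one candidate is modeled as prewired, so it needs
    no multiplexer and no configuration bits.
    """
    mux_count = 0
    config_bits = 0
    for n in candidates_per_route:
        if n < 1:
            raise ValueError("each route needs at least one source")
        if n == 1:
            continue  # prewired (hardcoded) connection
        mux_count += 1
        config_bits += (n - 1).bit_length()  # ceil(log2(n)) select bits
    return mux_count, config_bits


# Assumed example: four routes, each selectable from 32 sources.
fully_programmable = [32, 32, 32, 32]
# Same design with two routes fixed in place.
partly_hardcoded = [32, 1, 32, 1]

full_cost = routing_cost(fully_programmable)
reduced_cost = routing_cost(partly_hardcoded)
```

In this assumed example, fixing two of the four routes halves both the multiplexer count and the number of configuration bits.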
[0037]
[0038] As shown in
[0039] In
[0040]
[0041]
[0042] According to some other examples, a passive routing layer of the chip may be used to effectively configure input/output (I/O). For example, I/O connections to buffers within the chip may be changed through the passive or active circuitry. Some layers of the chip may be maintained, while layers interfacing with other devices are swapped out. For example, the passive die 120 may be swapped out with another passive die having different routing paths. The interchangeable passive layers allow for hard flexibility in routing, which may be more power-efficient than the soft programmability of multiplexers. This may purposely restrict some level of programmability based upon application, market, a desire to reduce the power dissipation of a device, or other reasons.
[0043]
[0044] A further benefit of the design of
[0045] The embedded memory 612 may be configured to emulate a many-ported memory, thus making it highly parallel. For example, by emulating a many-ported memory, the embedded memory 612 may be adapted to handle regular expression search, networking, data lookup, encryption, compression/decompression, and any of a variety of other functions.
[0046] While
[0047] According to some examples, the design of
[0048]
[0049] Replacing the configurable memory 714 with a passive ROM 716 provides cost benefits in that eliminating a need for active circuits such as transistors, and instead using a passive wafer, significantly reduces the cost of materials. Moreover, the ROM 716 operates using a reduced amount of power as compared to the configurable memory 714, thereby providing a power saving benefit. Eliminating transistors further eliminates their leakage contribution, and thus an overall amount of leakage drops when using the passive ROM 716 instead of the configurable memory 714. Further, there is no change to the multiplexers and LUTs in the active circuitry 115. As such, replacing the configurable memory 714 with a passive ROM 716 will not result in a timing change.
[0050]
[0051] FPGA block 830 is back-to-back bonded to FPGA block 850. Through-silicon vias (TSVs) 838, 858 may be used to establish connections across the FPGA blocks 830, 850. For example, the TSVs 838, 858 may provide connections between the configurable memory and the multiplexers. Low-density routing may be provided across the back-to-back connections.
[0052] According to some examples, multiplexable links may be shared between the dies. A link can be multiplexed within the same die or between dies. If the stack is mounted on an ASIC, a number of interconnect pads may provide more potential signal locations than needed. Accordingly, such additional potential signal locations can be routed if it becomes necessary.
[0053] Memories in this example architecture could be SRAM-based or non-SRAM-based. For example, the memories may in some instances include DRAM or non-volatile memories.
[0054] The stack provides an increased number of interconnects without consuming additional area along a horizontal axis. By stacking vertically, only a few microns of additional height may be needed along a vertical axis.
[0055]
[0056] Because the LUT 957 can reference multiple memories in a clock cycle, the LUT 957 can behave as multiple LUTs. For example, for each different memory the LUT 957 can access in a given clock cycle, the LUT 957 can perform a function. Accordingly, if the LUT 957 can access 3 different memories, the LUT 957 can perform 3 different functions, and thus serve as 3 different LUTs. While only one LUT 957 is shown in
[0057] In some instances, the LUT 957 may cycle through some, but not all, of the configurable memories 912, 932, 952 in a given cycle. In such instances, partial reconfiguration is possible in nearly zero time.
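The idea of one LUT serving as several LUTs can be sketched in software. The following Python model is illustrative only and makes assumptions not found in the disclosure: each configurable memory is represented as a truth table, the class name `TimeMultiplexedLUT` and its methods are hypothetical, and a "clock cycle" is modeled simply as one call that reads every memory in turn.

```python
class TimeMultiplexedLUT:
    """One LUT that can read any of several configuration memories."""

    def __init__(self, num_inputs, memories):
        size = 1 << num_inputs  # a k-input LUT has 2**k entries
        for memory in memories:
            if len(memory) != size:
                raise ValueError("each memory must hold 2**num_inputs bits")
        self.num_inputs = num_inputs
        self.memories = [tuple(memory) for memory in memories]

    def evaluate(self, input_bits, memory_index):
        """Look up the output for input_bits in one chosen memory."""
        if len(input_bits) != self.num_inputs:
            raise ValueError("wrong number of input bits")
        address = 0
        for bit in input_bits:  # first bit is the most significant
            address = (address << 1) | (bit & 1)
        return self.memories[memory_index][address]

    def evaluate_all(self, input_bits):
        """Model one cycle in which the LUT reads every memory."""
        return [self.evaluate(input_bits, i)
                for i in range(len(self.memories))]


# Three assumed configurations for a 2-input LUT: AND, OR, XOR.
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

lut = TimeMultiplexedLUT(2, [AND, OR, XOR])
results = lut.evaluate_all([1, 0])
```

With three memories, the single modeled LUT yields three function results for the same inputs. Reading only a subset of the memories corresponds to the partial cycling described above.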
[0058] According to some examples, a spare layer of memory may be used to capture a user state to act as a shadow processor. The shadow state can be read out asynchronously without disturbing a running processor. For example, in a given cycle, computation may be performed more quickly by predicting future requests and performing computations. The predictions may be based on, for example, a last bit of interest in a last process. While data is transferred in response to existing requests, predictions may be made for future requests as an active shadow. Because the LUT is able to access multiple memories in one clock cycle, the LUT can access the spare layer of memory to retrieve the computations performed in response to the predicted requests, while also accessing memories for responding to current requests.
[0059]
[0060]
[0061] In
[0062]
[0063]
[0064]
[0065] Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. For example, while some example architectures are described herein in connection with FPGAs, it should be understood that the present disclosure is not limited to FPGAs. Rather, the architectures may be implemented in any of a number of other types of devices, including, by way of example only, switches, such as network switches or datacenter switches, routers, etc. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.