DYNAMIC RANDOM-ACCESS MEMORY (DRAM) ON HOT COMPUTE LOGIC FOR LAST-LEVEL-CACHE APPLICATIONS
20260107482 · 2026-04-16
Inventors
- Mustafa Badaroglu (San Diego, CA, US)
- Zhongze Wang (San Diego, CA, US)
- Woo Tag KANG (San Diego, CA, US)
- Periannan Chidambaram (San Diego, CA, US)
Cpc classification
H10B80/00
ELECTRICITY
H10W90/22
ELECTRICITY
International classification
Abstract
A stacked system-on-chip (SoC) is described. The stacked SoC comprises a first memory die comprising a dynamic random-access memory (DRAM). The stacked SoC also comprises a compute logic die. The compute logic die comprises a static random-access memory (SRAM) comprising a first SRAM partition and a second SRAM partition. The first memory die is stacked on the compute logic die. The compute logic die comprises a memory controller. The memory controller is coupled between the first SRAM partition and the second SRAM partition. Additionally, the memory controller is coupled to a DRAM bus of the first memory die.
Claims
1. A system-on-chip (SoC), comprising: a first memory die comprising a dynamic random-access memory (DRAM); and a compute logic die, comprising: a static random-access memory (SRAM) comprising a first SRAM partition and a second SRAM partition, in which the first memory die is stacked on and overlaps at least a portion of the compute logic die, and a memory controller coupled between the first SRAM partition and the second SRAM partition, in which the memory controller is coupled to a DRAM bus of the first memory die.
2. The SoC of claim 1, wherein the first memory die is supported by the compute logic die on a first package substrate.
3. The SoC of claim 2, further comprising a system memory die supported by a second package substrate.
4. The SoC of claim 3, wherein the first package substrate and the second package substrate are supported by a printed circuit board (PCB).
5. The SoC of claim 2, further comprising: a laminate substrate; and a system memory die supported by the laminate substrate, wherein the laminate substrate is supported by the first package substrate through conductive pillars.
6. The SoC of claim 5, wherein the first package substrate comprises a fan-out (FO) package substrate.
7. The SoC of claim 5, wherein the system memory die comprises a dynamic random-access memory (DRAM).
8. The SoC of claim 1, in which the memory controller comprises a network-on-chip (NoC) controller.
9. The SoC of claim 1, in which the first memory die comprises a last-level-cache (LLC)-DRAM.
10. The SoC of claim 1, in which the first SRAM partition comprises a first quadrant and a second quadrant, and the second SRAM partition comprises a third quadrant and a fourth quadrant.
11. A method of fabricating a system-on-chip (SoC), the method comprising: forming a compute logic die, comprising a static random-access memory (SRAM) comprising a first SRAM partition and a second SRAM partition, and a memory controller coupled between the first SRAM partition and the second SRAM partition; forming a first memory die comprising a dynamic random-access memory (DRAM); stacking the first memory die on the compute logic die; coupling the memory controller of the compute logic die to a DRAM bus of the first memory die; and stacking the compute logic die supporting the first memory die on a first package substrate.
12. The method of claim 11, further comprising a system memory die supported by a second package substrate.
13. The method of claim 12, wherein the first package substrate and the second package substrate are supported by a printed circuit board (PCB).
14. The method of claim 11, further comprising: a laminate substrate; and a system memory die supported by the laminate substrate.
15. The method of claim 14, wherein the laminate substrate is supported by the first package substrate through conductive pillars.
16. The method of claim 14, wherein the first package substrate comprises a fan-out (FO) package substrate.
17. The method of claim 14, wherein the system memory die comprises a dynamic random-access memory (DRAM).
18. The method of claim 11, in which the memory controller comprises a network-on-chip (NoC) controller.
19. The method of claim 11, in which the first memory die comprises a last-level-cache (LLC)-DRAM.
20. The method of claim 11, in which the first SRAM partition comprises a first quadrant and a second quadrant, and the second SRAM partition comprises a third quadrant and a fourth quadrant.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
DETAILED DESCRIPTION
[0021] The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. It will be apparent, however, to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form to avoid obscuring such concepts.
[0022] As described herein, the use of the term "and/or" is intended to represent an "inclusive OR," and the use of the term "or" is intended to represent an "exclusive OR." As described herein, the term "exemplary" used throughout this description means "serving as an example, instance, or illustration," and should not necessarily be construed as preferred or advantageous over other exemplary configurations. As described herein, the term "coupled" used throughout this description means "connected, whether directly or indirectly through intervening connections (e.g., a switch), electrical, mechanical, or otherwise," and is not necessarily limited to physical connections. Additionally, the connections can be such that the objects are permanently connected or releasably connected. The connections can be through switches, repeaters, and/or buffers. As described herein, the term "proximate" used throughout this description means "adjacent, very near, next to, or close to." As described herein, the term "on" used throughout this description means "directly on" in some configurations, and "indirectly on" in other configurations. It will be understood that the term "layer" includes film and is not construed as indicating a vertical or horizontal thickness unless otherwise stated. As described, the term "substrate" may refer to a substrate of a diced wafer or may refer to a substrate of a wafer that is not diced. Similarly, the terms "chip" and "die" may be used interchangeably.
[0023] Memory is a vital component for wireless communications devices. For example, a cell phone may integrate memory as part of an application processor, such as a system-on-chip (SoC) including a central processing unit (CPU), a graphics processing unit (GPU), and/or a neural signal processor (NSP). Successful operation of some wireless applications depends on the availability of high-capacity and low-latency memory solutions for scalability of CPU/GPU/NSP workload. In particular, a semiconductor memory device solution for providing a high-capacity, low-latency, and high-bandwidth memory for a last-level-cache is desired.
[0024] Semiconductor memory devices include, for example, a static random-access memory (SRAM) and a dynamic random-access memory (DRAM). An SRAM memory cell is bi-stable, meaning that it can maintain its state statically and indefinitely, so long as adequate power is supplied. SRAM also supports high-speed operation with lower power dissipation, which is useful for computer cache memory. SRAM area and scaling, however, are stalled by the currently available transistor evolution roadmap, particularly for six-transistor (6T) SRAM implementations.
[0025] A DRAM memory cell includes one transistor and one capacitor, thereby providing a high degree of integration. DRAM-on-logic, however, is hindered by temperature envelope limitations of DRAM on hotspots on the CPU/GPU/NSP of an SoC. In particular, integrating DRAM to provide a last-level-cache (LLC) on hot compute logic including the CPU/GPU/NSP is problematic because this hot compute logic prevents cooling of the LLC-DRAM junction temperatures. Those limitations have led to industry implementation of LLC-DRAM in a side-by-side configuration with the CPU/GPU/NSP of the hot compute logic.
[0026] Accordingly, various aspects of the present disclosure are directed to stacking a DRAM buffer over an SRAM portion of a logic core to provide an on-chip DRAM/SRAM integration. A stacked system-on-chip (SoC) includes a compute logic die and a first memory die having a dynamic random-access memory (DRAM) on the first memory die. In various aspects of the present disclosure, the compute logic die includes a static random-access memory (SRAM), having a first SRAM partition and a second SRAM partition on the compute logic die. In some aspects of the present disclosure, the first memory die is stacked on the compute logic die. Additionally, the SoC includes a memory controller on the compute logic die. In various aspects of the present disclosure, the memory controller is coupled between the first SRAM partition and the second SRAM partition and coupled to a DRAM bus of the first memory die.
[0027] According to various aspects of the present disclosure, this SoC DRAM/SRAM integration enables placement of an LLC-DRAM on any hot CPU/GPU/NSP logic die. In various aspects of the present disclosure, a network-on-chip (NoC) controller is placed between SRAM partitions, which provides an LLC base that operates as a cold plate for supporting a memory die including DRAM. This placement of the NoC controller enables improved arbitration of data between the SoC cores, resulting in significantly improved latency. Furthermore, a reduced footprint of a DRAM cell (e.g., 0.00178 μm²/cell) versus an SRAM cell (e.g., 0.026 μm²/cell) provides a significantly larger density (e.g., 14.6x), resulting in improved latency, energy per bit (energy/bit) and cost when DRAM is stacked on SRAM. Additionally, the central placement of the NoC controller provides a coherent bus interface for LLC.
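The density figure quoted above follows directly from the two example cell footprints. The short Python sketch below reproduces that arithmetic, assuming both footprints are expressed in square micrometres per cell. The function names and the raw bits-per-square-millimetre estimate (which ignores array and peripheral overhead) are illustrative assumptions, not part of the disclosure.

```python
# Example footprints quoted in the text, assumed to be in square micrometres per cell.
DRAM_CELL_AREA_UM2 = 0.00178
SRAM_CELL_AREA_UM2 = 0.026


def density_ratio(sram_area_um2: float, dram_area_um2: float) -> float:
    """Return how many DRAM cells fit in the area of one SRAM cell."""
    return sram_area_um2 / dram_area_um2


def bits_per_mm2(cell_area_um2: float) -> float:
    """Raw bitcell density; ignoring array overhead is an assumption."""
    um2_per_mm2 = 1_000_000.0
    return um2_per_mm2 / cell_area_um2


ratio = density_ratio(SRAM_CELL_AREA_UM2, DRAM_CELL_AREA_UM2)
print(f"DRAM/SRAM density ratio: {ratio:.1f}x")  # prints "DRAM/SRAM density ratio: 14.6x"
```

The same ratio applies to the raw bits-per-area estimate, since both quantities are reciprocals of the cell footprint.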
[0029] In this configuration, the host SoC 100 includes various processing units that support multi-threaded operation. For the configuration shown in
[0031] The SRAM bitcell 200 is bi-stable, meaning that it can maintain its state statically and indefinitely, so long as adequate power is supplied. The SRAM bitcell 200 also supports high-speed operation with lower power dissipation, which is useful for computer cache memory. Area and scaling of the SRAM bitcell 200, however, are stalled by the currently available transistor evolution roadmap, particularly for six-transistor (6T) SRAM implementations. As shown in
[0033] A DRAM memory cell includes one transistor and one capacitor (1T1C), thereby providing a high degree of integration due to a reduced footprint (e.g., 0.00178 μm²/cell). DRAM-on-logic, however, is hindered by temperature envelope limitations of DRAM on hotspots, such as the CPU 102, the GPU 104, and the NPU/NSP 108 of the system-on-chip (SoC) 100 of
[0035] In some aspects of the present disclosure, the DRAM die 300 is stacked on the compute logic die 410. In this arrangement, the CPU 102, the GPU 104, the DSP 106, and the NPU/NSP 108 are placed at opposing peripheral portions of the compute logic die 410 and are separated by the SRAM 420, which effectively operates as a cold plate (e.g., an LLC-base) for helping cool junction temperatures of the DRAM die 300. Additionally, the stacked SoC 400 includes a memory controller 440 on the compute logic die 410. In various aspects of the present disclosure, the memory controller 440 is coupled between the first SRAM partition 422 and the second SRAM partition 424 and coupled to the DRAM bus 340 of the DRAM die 300 through first memory interconnects 350 as further illustrated, for example, in
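The placement described above can be sanity-checked with a simple rectangle model. The following Python sketch is illustrative only: every coordinate, the 10 x 10 die size, and the assignment of processing units to particular edges are hypothetical and are not taken from the disclosure. It computes what fraction of a DRAM die footprint sits above the hot compute blocks versus above the SRAM partitions and the memory controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in arbitrary floorplan units."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def overlap_area(self, other: "Rect") -> float:
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


# Hypothetical 10 x 10 compute logic die: hot blocks on the left and right
# edges, SRAM partitions in the middle, memory controller between them.
HOT_BLOCKS = [
    Rect("CPU", 0, 5, 2, 10),
    Rect("GPU", 0, 0, 2, 5),
    Rect("DSP", 8, 5, 10, 10),
    Rect("NPU/NSP", 8, 0, 10, 5),
]
COOL_BLOCKS = [
    Rect("SRAM partition 1", 2, 0, 4.5, 10),
    Rect("memory controller", 4.5, 0, 5.5, 10),
    Rect("SRAM partition 2", 5.5, 0, 8, 10),
]
DRAM_DIE = Rect("DRAM die", 2, 0, 8, 10)  # hypothetical stacked footprint


def overlap_fraction(die: Rect, blocks: list[Rect]) -> float:
    """Fraction of the die footprint that sits above the given blocks."""
    return sum(die.overlap_area(b) for b in blocks) / die.area()


hot_fraction = overlap_fraction(DRAM_DIE, HOT_BLOCKS)
cool_fraction = overlap_fraction(DRAM_DIE, COOL_BLOCKS)
print(f"over hot logic: {hot_fraction:.0%}, over SRAM/controller: {cool_fraction:.0%}")
```

In this hypothetical layout the DRAM footprint lies entirely over the SRAM and memory controller region. A real floorplan would need a thermal simulation rather than a footprint overlap check.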
[0037] In various aspects of the present disclosure, the routing layers of the bus topology 460 are coupled to second memory interconnects 450 (e.g., vertical connects through under bumps/pad-to-interconnect-vias) of the memory controller 440. The first memory interconnects 350 and the second memory interconnects 450 may include hybrid bonding or under-bump bonding through the routing layers of the bus topology 460. In these aspects of the present disclosure, the memory controller 440 is configured as a network-on-chip (NoC) controller to route DRAM data and static random-access memory (SRAM) data through the routing layers of the bus topology 460. In this example, the memory controller 440 routes the DRAM data and the SRAM data through the routing layers of the bus topology 460 to the CPU 102, the GPU 104, the DSP 106, and the NPU/NSP 108, which are placed at opposing peripheral portions of the compute logic die 410. Further quadrant splitting of the first SRAM partition 422 and the second SRAM partition 424 may be performed for improved data routing in the compute logic die 410.
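As a purely behavioral illustration of the routing role described above, the Python sketch below models a controller that dispatches requests from the processing units either to an SRAM quadrant or to the DRAM bus. The disclosure does not specify an address map or lookup policy. The `NocController` class, the address split at `sram_lines`, and the modulo-four quadrant interleaving are all hypothetical assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass, field

CORES = ("CPU", "GPU", "DSP", "NPU/NSP")
# Two quadrants per SRAM partition: quadrants 1-2 in partition 1, 3-4 in partition 2.
QUADRANT_TO_PARTITION = {0: 1, 1: 1, 2: 2, 3: 2}


@dataclass
class NocController:
    """Toy router sitting between two SRAM partitions and a DRAM bus."""
    sram_lines: int = 1024  # hypothetical SRAM capacity in lines
    sram: dict = field(default_factory=dict)
    dram: dict = field(default_factory=dict)

    def target(self, line_addr: int) -> str:
        """Pick a destination; this address split is an assumption."""
        if line_addr < self.sram_lines:
            quadrant = line_addr % 4
            partition = QUADRANT_TO_PARTITION[quadrant]
            return f"SRAM partition {partition}, quadrant {quadrant + 1}"
        return "DRAM bus"

    def _store_for(self, dest: str) -> dict:
        return self.dram if dest == "DRAM bus" else self.sram

    def write(self, core: str, line_addr: int, data: bytes) -> str:
        if core not in CORES:
            raise ValueError(f"unknown core: {core}")
        dest = self.target(line_addr)
        self._store_for(dest)[line_addr] = data
        return dest

    def read(self, core: str, line_addr: int) -> tuple[str, bytes]:
        if core not in CORES:
            raise ValueError(f"unknown core: {core}")
        dest = self.target(line_addr)
        return dest, self._store_for(dest).get(line_addr, b"")


noc = NocController()
print(noc.write("CPU", 6, b"a"))     # routed to an SRAM quadrant
print(noc.write("GPU", 5000, b"b"))  # routed to the DRAM bus
print(noc.read("DSP", 5000))
```

Because every request passes through one centrally placed router in this model, any core can read data written by another, which loosely mirrors the shared routing role described in the text.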
[0038] As shown in
[0041] As shown in
[0045] At block 1004, a first memory die is formed, having a dynamic random-access memory (DRAM) on the first memory die. For example, as shown in
[0046] At block 1006, the first memory die is stacked on the compute logic die. For example, as shown in
[0047] At block 1008, the memory controller of the compute logic die is coupled to a DRAM bus of the first memory die. For example, as shown in
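The fabrication steps recited in claim 11 can be written down as a small dependency graph to check that a proposed ordering is consistent. The Python sketch below uses the standard-library `graphlib` module. The step names paraphrase claim 11, while the dependency edges (for example, that coupling follows stacking) are assumptions made for illustration; in practice, stacking and coupling may occur in a single bonding operation.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps assumed to precede it (the edges are assumptions).
STEPS = {
    "form compute logic die": set(),
    "form first memory die (DRAM)": set(),
    "stack memory die on compute logic die": {
        "form compute logic die",
        "form first memory die (DRAM)",
    },
    "couple memory controller to DRAM bus": {
        "stack memory die on compute logic die",
    },
    "stack assembly on first package substrate": {
        "couple memory controller to DRAM bus",
    },
}


def build_order(steps: dict[str, set[str]]) -> list[str]:
    """Return one valid ordering; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(steps).static_order())


order = build_order(STEPS)
for number, step in enumerate(order, start=1):
    print(number, step)
```

The two die-forming steps have no mutual dependency in this graph, so either may appear first in the resulting order.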
[0049] In
[0051] Data recorded on the storage medium 1204 may specify logic circuit configurations, pattern data for photolithography masks, or mask pattern data for serial write tools such as electron beam lithography. The data may further include logic verification data such as timing diagrams or net circuits associated with logic simulations. Providing data on the storage medium 1204 facilitates the design of the circuit 1210 or the IC component 1212 by decreasing the number of processes for designing semiconductor wafers.
[0052] Implementation examples are described in the following numbered clauses:
[0053] 1. A system-on-chip (SoC), comprising:
[0054] a first memory die comprising a dynamic random-access memory (DRAM); and
[0055] a compute logic die, comprising:
[0056] a static random-access memory (SRAM) comprising a first SRAM partition and a second SRAM partition, in which the first memory die is stacked on and overlaps at least a portion of the compute logic die, and
[0057] a memory controller coupled between the first SRAM partition and the second SRAM partition, in which the memory controller is coupled to a DRAM bus of the first memory die.
[0058] 2. The SoC of clause 1, wherein the first memory die is supported by the compute logic die on a first package substrate.
[0059] 3. The SoC of clause 2, further comprising a system memory die supported by a second package substrate.
[0060] 4. The SoC of clause 3, wherein the first package substrate and the second package substrate are supported by a printed circuit board (PCB).
[0061] 5. The SoC of clause 2, further comprising: a laminate substrate; and a system memory die supported by the laminate substrate, wherein the laminate substrate is supported by the first package substrate through conductive pillars.
[0062] 6. The SoC of clause 5, wherein the first package substrate comprises a fan-out (FO) package substrate.
[0063] 7. The SoC of clause 5, wherein the system memory die comprises a dynamic random-access memory (DRAM).
[0064] 8. The SoC of any of clauses 1-7, in which the memory controller comprises a network-on-chip (NoC) controller.
[0065] 9. The SoC of any of clauses 1-8, in which the first memory die comprises a last-level-cache (LLC)-DRAM.
[0066] 10. The SoC of any of clauses 1-9, in which the first SRAM partition comprises a first quadrant and a second quadrant, and the second SRAM partition comprises a third quadrant and a fourth quadrant.
[0067] 11. A method of fabricating a system-on-chip (SoC), the method comprising:
[0068] forming a compute logic die, comprising a static random-access memory (SRAM) comprising a first SRAM partition and a second SRAM partition, and a memory controller coupled between the first SRAM partition and the second SRAM partition;
[0069] forming a first memory die comprising a dynamic random-access memory (DRAM);
[0070] stacking the first memory die on the compute logic die;
[0071] coupling the memory controller of the compute logic die to a DRAM bus of the first memory die; and
[0072] stacking the compute logic die supporting the first memory die on a first package substrate.
[0073] 12. The method of clause 11, further comprising a system memory die supported by a second package substrate.
[0074] 13. The method of clause 12, wherein the first package substrate and the second package substrate are supported by a printed circuit board (PCB).
[0075] 14. The method of any of clauses 11-13, further comprising: a laminate substrate; and a system memory die supported by the laminate substrate.
[0076] 15. The method of clause 14, wherein the laminate substrate is supported by the first package substrate through conductive pillars.
[0077] 16. The method of clause 14, wherein the first package substrate comprises a fan-out (FO) package substrate.
[0078] 17. The method of clause 14, wherein the system memory die comprises a dynamic random-access memory (DRAM).
[0079] 18. The method of any of clauses 11-17, in which the memory controller comprises a network-on-chip (NoC) controller.
[0080] 19. The method of any of clauses 11-18, in which the first memory die comprises a last-level-cache (LLC)-DRAM.
[0081] 20. The method of any of clauses 11-19, in which the first SRAM partition comprises a first quadrant and a second quadrant, and the second SRAM partition comprises a third quadrant and a fourth quadrant.
[0082] For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, etc.) that perform the functions described herein. A machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in a memory and executed by a processor unit. Memory may be implemented within the processor unit or external to the processor unit. As used herein, the term "memory" refers to types of long-term, short-term, volatile, nonvolatile, or other memory and is not limited to a particular type of memory or number of memories, or type of media upon which memory is stored.
[0083] If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a computer-readable medium. Examples include computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be an available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray® disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[0084] In addition to storage on a computer-readable medium, instructions and/or data may be provided as signals on transmission media included in a communications apparatus. For example, a communications apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
[0085] Although the present disclosure and its advantages have been described in detail, various changes, substitutions, and alterations can be made herein without departing from the technology of the disclosure as defined by the appended claims. For example, relational terms, such as "above" and "below," are used with respect to a substrate or electronic device. Of course, if the substrate or electronic device is inverted, "above" becomes "below," and vice versa. Additionally, if oriented sideways, "above" and "below" may refer to sides of a substrate or electronic device. Moreover, the scope of the present application is not intended to be limited to the configurations of the process, machine, manufacture, composition of matter, means, methods, and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform the same function or achieve the same result as the corresponding configurations described herein may be utilized according to the present disclosure. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
[0086] Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
[0087] The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
[0088] The steps of a method or algorithm described in connection with the disclosure may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
[0089] The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.