INTEGRATED CIRCUIT WITH LOW LATENCY AND HIGH DENSITY ROUTING BETWEEN A MEMORY CONTROLLER DIGITAL CORE AND I/OS
20170083461 · 2017-03-23
Abstract
An integrated circuit is provided with a memory controller coupled to a buffered command and address bus and a pipelined data bus having a pipeline delay. The memory controller is configured to control the write and read operations for an external memory having a write latency period requirement. The memory controller is further configured to launch write data into the pipelined data bus responsive to the expiration of a modified write latency period that is shorter than the write latency period.
Claims
1. An integrated circuit, comprising: a buffered command and address (CA) bus; a pipelined data (DQ) write bus having a pipeline delay; and a memory controller configured to drive a write command signal into the buffered CA bus at an initial time, wherein the memory controller is further configured to determine a delay difference period between a write latency requirement for an external memory and the pipeline delay and to drive a DQ signal into the pipelined DQ write bus at an expiration of the delay difference period.
2. The integrated circuit of claim 1, further comprising a plurality of DQ endpoints, wherein the pipelined DQ write bus comprises a plurality of pipelined DQ write buses corresponding to the plurality of DQ endpoints, each pipelined DQ write bus being coupled between the memory controller and the corresponding DQ endpoint, and wherein the DQ signal comprises a plurality of DQ signals corresponding to the plurality of DQ endpoints, each DQ endpoint being configured to drive the corresponding DQ signal to an external memory.
3. The integrated circuit of claim 2, wherein the external memory is a dynamic random access memory (DRAM).
4. The integrated circuit of claim 2, further comprising a buffered DQ read bus coupled between the DQ endpoint and the memory controller.
5. The integrated circuit of claim 1, wherein the buffered CA bus comprises a plurality of buffers coupled to a plurality of metal-layer traces routed according to non-default routing rules.
6. The integrated circuit of claim 1, further comprising: a clock source configured to provide a memory clock signal, wherein the memory controller is configured to drive the write command into the buffered CA bus at the initial time responsive to a first cycle of the memory clock signal, and wherein the memory controller is further configured to drive the DQ signal into the pipelined DQ write bus at the expiration of the delay difference period responsive to a second cycle of the memory clock signal.
7. The integrated circuit of claim 6, wherein the pipelined DQ write bus comprises a plurality of first registers and a plurality of second registers, and wherein the first registers are configured to be clocked by a rising edge of the memory clock signal, and wherein the second registers are configured to be clocked by a falling edge of the memory clock signal.
8. The integrated circuit of claim 6, wherein the pipelined DQ write bus comprises a plurality of registers and a plurality of corresponding multiplexers, wherein each multiplexer is configured to select for an output signal from the corresponding register and for a bypass path that bypasses the corresponding register, and wherein the memory controller is configured to control the selection by the multiplexers to adjust the pipeline delay.
9. The integrated circuit of claim 6, wherein the pipeline delay equals an integer P number of the memory clock cycles, and wherein the write latency requirement equals an integer number WL of the memory clock cycles, and wherein the delay difference period equals a difference between WL and P.
10. The integrated circuit of claim 6, wherein the memory controller includes a DQ timer configured to time the delay difference period responsive to being clocked with the memory clock signal.
11. The integrated circuit of claim 1, wherein the memory controller is configured to adjust the pipeline delay for the pipelined DQ write bus responsive to a change in the write latency requirement.
12. A method, comprising: from a memory controller, driving a command signal over a buffered command bus to a first input/output (I/O) endpoint at an initial time; determining a delay equaling a difference between a write latency requirement for an external memory and a pipeline delay over a pipelined data bus; and at the expiration of the delay from the initial time, driving a data signal from the memory controller over the pipelined data bus to a second I/O endpoint.
13. The method of claim 12, further comprising driving a clock signal from the memory controller to the second I/O endpoint, the method further comprising latching the data signal at the second I/O endpoint responsive to the clock signal.
14. The method of claim 13, further comprising transmitting the latched data signal from the second I/O endpoint to the external memory to satisfy the write latency requirement.
15. The method of claim 12, wherein driving the command signal comprises driving a write command signal.
16. The method of claim 15, wherein driving the write command signal at the initial time is responsive to a first cycle of a clock signal.
17. The method of claim 12, further comprising changing the pipeline delay responsive to a change in the write latency requirement.
18. The method of claim 16, wherein changing the pipeline delay comprises controlling a plurality of multiplexers within the pipelined data bus.
19. An integrated circuit, comprising: a memory controller; first means for propagating a write command signal from the memory controller to a command and address (CA) endpoint without a pipeline delay; and second means for propagating a write data (DQ) signal from the memory controller to a DQ endpoint with a pipeline delay, wherein the memory controller includes a third means for determining a delay difference period between a write latency period for an external memory and the pipeline delay and for driving the DQ signal into the means for propagating the DQ signal upon the expiration of the delay difference period.
20. The integrated circuit of claim 18, wherein the third means is configured to time the delay difference period responsive to cycles of a memory clock signal.
21. The integrated circuit of claim 18, wherein the second means is configured to propagate a plurality of DQ signals from the memory controller to a corresponding plurality of DQ endpoints with the pipeline delay.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009]
[0010]
[0011]
[0012]
[0013]
[0014] The various aspects of the present disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures.
DETAILED DESCRIPTION
[0015] To increase density and operating speed, a memory controller is provided in which the command and address (CA) bus between the memory controller and its endpoints is buffered whereas the data (DQ) buses between the memory controller and its endpoints are pipelined with registers. Since there may be only one buffered CA bus for a relatively large number of pipelined DQ paths, the area demands from any non-default routing rules (NDR) routing of the metal traces for the buffered CA bus are minimal. In addition, the buffered CA bus increases memory operating speed. Since the data signals carried on the DQ buses will now be delayed by the clock cycles corresponding to the number of pipeline registers in each DQ bus whereas the CA signals will be unhindered by any pipelining, the write latency between the generation of the CA signals and the generation of the DQ signals within the memory controller is decoupled. In particular, the memory controllers disclosed herein launch their DQ signals with regard to a modified write latency that is shorter than the write latency required by the external memory.
[0016] An example system-on-a-chip (SoC) 100 including a memory controller 101 is shown in
[0017] In addition, memory controller 101 drives a plurality of pipelined data (DQ) buses 125 that are received by a corresponding plurality of DQ endpoints 145. Each pipelined DQ bus 125 includes a plurality of pipeline registers that are clocked by the memory write clock distributed by memory controller 101 to DQ endpoints 145. The corresponding clock paths and clock source are not shown for illustration clarity. Each DQ bus 125 may be deemed to comprise a means for propagating a DQ signal from the memory controller 101 to a DQ endpoint 145 with a pipeline delay. The pipeline registers may alternate as rising-edge clocked registers 115 and falling-edge clocked registers 120. The delay between a consecutive pair of registers 115 and 120 thus corresponds to one half cycle of the memory clock signal. The total delay in clock cycles across each pipelined DQ bus 125 thus depends upon how many pipeline stages formed by pairs of registers 115 and 120 are included. For example, if there are six registers 115 (and thus six registers 120) included in each pipelined DQ bus 125, the total pipeline delay in clock cycles for the DQ signals to propagate from memory controller 101 to the corresponding DQ endpoint 145 would be six clock cycles. In alternative implementations, pipelined DQ bus 125 may be responsive to just one clock edge (rising or falling) such that its registers would be all rising-edge triggered or all falling-edge triggered. As will be explained further herein, memory controller 101 is configured to use this pipeline delay with regard to launching the DQ data signals with respect to a modified or pseudo write latency period. For example, if the pipelining delay is six clock cycles whereas the desired write latency is eight clock cycles, memory controller 101 may launch the DQ signals two clock cycles after the launch of the corresponding write command.
More generally, the pipelining delay may be represented by a variable P whereas the write latency required by the external memory may be represented as the variable WL (both delays being some integer number of clock cycles). The memory controller may thus launch the DQ signals WL-P clock cycles (the difference between the write latency and the pipelining delay) after the launch of the corresponding write command. The write command is subjected to no pipelining delay on buffered CA bus 110 such that it arrives at CA endpoint 130 in the same clock cycle as when it was launched. In contrast, the DQ signals will be delayed by the pipelining delay. Since the DQ signals were launched WL-P clock cycles after the write command, the DQ signals thus arrive at their DQ endpoints 145 after a delay of (WL-P)+P=WL clock cycles from the launch of the CA write command. The desired write latency is thus maintained despite the lack of pipelining for the CA write command.
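The arithmetic in the preceding paragraph can be checked with a short Python sketch. The function names below are hypothetical and are not part of the disclosure; the sketch assumes that registers 115 and 120 come in matched pairs, so that each pair contributes one full clock cycle of delay.

```python
def pipeline_delay_cycles(rising_regs: int, falling_regs: int) -> int:
    """Pipeline delay P in whole clock cycles.

    Assumption: rising-edge and falling-edge registers alternate in pairs,
    each register adding half a cycle, so one pair adds one full cycle.
    """
    if rising_regs != falling_regs:
        raise ValueError("this sketch assumes matched rising/falling pairs")
    return rising_regs


def dq_launch_delay(write_latency: int, pipeline_delay: int) -> int:
    """Cycles to wait after the write command before launching DQ (WL - P)."""
    if pipeline_delay > write_latency:
        raise ValueError("pipeline delay exceeds the write latency")
    return write_latency - pipeline_delay


# Values taken from the example in the text: six register pairs, WL = 8.
P = pipeline_delay_cycles(rising_regs=6, falling_regs=6)
WL = 8
launch = dq_launch_delay(WL, P)   # cycles after the write command
arrival = launch + P              # cycle at which DQ reaches the endpoint
print(P, launch, arrival)         # 6 2 8
```

The DQ word reaches its endpoint exactly WL cycles after the write command, although the controller itself waited only WL-P cycles.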
[0018] Note that the required write latency for DRAMs such as specified by the JEDEC specification may depend upon the clock rate. The clock rate may be changed depending upon the mode of operation. For example, the clock rate may be slowed down in a low power mode of operation as compared to the rate used in a high performance mode of operation. In that regard, the JEDEC specification requires a write latency of eight clock cycles at a clock rate of 988 MHz but reduces the required write latency to just three clock cycles at a clock rate of 400 MHz. A change in clock rate may thus leave the new write latency shorter than the pipelining delay for each DQ bus 125. For example, if the pipelining delay was six clock cycles but the new value for the write latency was three clock cycles, memory controller 101 could not satisfy the required write latency even if it launched the DQ data signals in the same clock cycle as it launched the corresponding CA write command.
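As a rough illustration of this constraint, the following Python sketch checks whether a fixed pipeline delay can meet the write latency at each of the two operating points quoted above. The helper names are hypothetical, and the table holds only the two clock-rate and write-latency pairs mentioned in the text.

```python
def can_meet_write_latency(write_latency: int, pipeline_delay: int) -> bool:
    """A fixed pipeline can meet WL only if P does not exceed WL."""
    return pipeline_delay <= write_latency


def cycles_late(write_latency: int, pipeline_delay: int) -> int:
    """Cycles by which DQ would miss WL even if launched with the command."""
    return max(0, pipeline_delay - write_latency)


P = 6  # fixed pipeline delay from the example in the text

# Clock rate in MHz -> required write latency in cycles (values from the text).
operating_points = {988: 8, 400: 3}

feasible = {mhz: can_meet_write_latency(wl, P)
            for mhz, wl in operating_points.items()}
late = {mhz: cycles_late(wl, P) for mhz, wl in operating_points.items()}

print(feasible)  # {988: True, 400: False}
print(late)      # {988: 0, 400: 3}
```

At 400 MHz the DQ word would arrive three cycles after the required write latency even with a zero launch delay, which is the failure case the paragraph describes.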
[0019] To account for any changing of the write latency such as with regard to modes of operation, each pipelined DQ bus 125 in system 100 may be replaced by an adaptive pipelined DQ bus 140 as shown in
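One way to picture an adjustable pipeline delay is the register-plus-bypass-multiplexer arrangement recited in claim 8. The Python sketch below is only an illustration under stated assumptions: each stage is modeled as one full clock cycle, and the policy of bypassing the trailing stages is an arbitrary choice made for the example, not something the disclosure specifies.

```python
from typing import List


def select_bypass(total_stages: int, write_latency: int) -> List[bool]:
    """Per-stage multiplexer selects: True bypasses the register, False uses it.

    Hypothetical policy: keep as many stages active as the write latency
    allows and bypass the rest, so the effective delay never exceeds WL.
    """
    active = min(total_stages, write_latency)
    return [stage >= active for stage in range(total_stages)]


def effective_delay(bypass: List[bool]) -> int:
    """Pipeline delay in cycles, counting only stages that are not bypassed."""
    return sum(1 for bypassed in bypass if not bypassed)


# High-performance mode from the text: WL = 8 with six stages, nothing bypassed.
fast = select_bypass(total_stages=6, write_latency=8)
# Low-power mode from the text: WL = 3, so three of the six stages are bypassed.
slow = select_bypass(total_stages=6, write_latency=3)

print(effective_delay(fast), 8 - effective_delay(fast))  # 6 2
print(effective_delay(slow), 3 - effective_delay(slow))  # 3 0
```

In both modes the launch delay WL-P stays non-negative, so the controller can still satisfy the write latency after the clock rate changes.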
[0020] Note that each DQ signal carried on a corresponding pipelined DQ bus 125 or 140 is a multi-bit word just like the corresponding CA write command. Each pipelined DQ bus 125 or 140 may thus comprise a plurality of metal layer traces corresponding to the width in bits of the DQ signals they carry. These individual traces are not shown for illustration clarity. Registers 115 and 120 would thus each comprise a plurality of such registers, one for each individual bit in the corresponding DQ signal.
[0021] A more detailed view of SoC 100 is shown in
[0022] A DQ generation circuit 210 is configured to calculate the delay difference between the write latency and the pipeline delay, which in this example would be two clock cycles. This delay difference may be considered to be a modified write latency period in that DQ generation circuit 210 launches the DQ signals responsive to the expiration of the delay difference period, analogously to how a conventional memory controller would launch its DQ signals at the expiration of the write latency period following the launch of the write command. DQ timers 215 are configured accordingly to time this two clock cycle difference so that DQ generation circuit 210 launches the corresponding DQ signals two clock cycles after timing and command generation circuit 200 launched the write command. DQ generation circuit 210 may comprise a plurality of logic gates such as to implement a finite state machine configured to perform the necessary DQ generation and timing functions. The write latency with regard to the CA generation (in this example, eight clock cycles) is thus decoupled from the modified write latency with regard to the DQ generation (in this example, two clock cycles). Although DQ buses 125 are pipelined, note that the read data buses from DQ endpoints 145 to memory controller 101 may be buffered so as to minimize the read latency. DQ generation circuit 210 may be considered to comprise a means for determining a delay difference period between a write latency period for an external memory and the pipeline delay and for driving the DQ signal into DQ bus 125 upon the expiration of the delay difference period.
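The interaction between the DQ timer and the pipelined bus can be followed cycle by cycle with a small behavioral model in Python. This is a sketch and not the circuit itself: the function name is hypothetical, the timer is modeled as a simple down-counter loaded with WL-P, and the pipeline is modeled as a queue with one slot per clock cycle of delay.

```python
from collections import deque
from typing import Tuple


def simulate_write(write_latency: int, pipeline_delay: int) -> Tuple[int, int]:
    """Return (launch_cycle, arrival_cycle) of the DQ word.

    The write command is assumed to be launched at cycle 0 and to reach its
    endpoint in that same cycle, since the CA bus is buffered, not pipelined.
    """
    if pipeline_delay > write_latency:
        raise ValueError("pipeline delay exceeds the write latency")

    timer = write_latency - pipeline_delay      # value loaded into the DQ timer
    pipeline = deque([None] * pipeline_delay)   # one slot per cycle of delay
    launch_cycle = None

    for cycle in range(write_latency + 1):
        dq_in = None
        if launch_cycle is None:
            if timer == 0:
                dq_in = "DQ"                    # timer expired: drive the DQ word
                launch_cycle = cycle
            else:
                timer -= 1                      # keep counting memory clock cycles
        pipeline.append(dq_in)                  # word enters the pipelined bus
        if pipeline.popleft() == "DQ":          # word emerges at the DQ endpoint
            return launch_cycle, cycle

    raise RuntimeError("DQ word never reached the endpoint")


print(simulate_write(write_latency=8, pipeline_delay=6))  # (2, 8)
```

With the values used in the text, the model launches the DQ word two cycles after the write command and delivers it at cycle eight, matching the required write latency.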
[0023] The resulting latency between the launching of the CA write command and the write data (DQ) is shown in tabular form in
[0024] A method of operation will now be discussed with regard to the flowchart shown in
[0025] As those of some skill in this art will by now appreciate and depending on the particular application at hand, many modifications, substitutions and variations can be made in and to the materials, apparatus, configurations and methods of use of the devices of the present disclosure without departing from the spirit and scope thereof. In light of this, the scope of the present disclosure should not be limited to that of the particular implementations illustrated and described herein, as they are merely by way of some examples thereof, but rather, should be fully commensurate with that of the claims appended hereafter and their functional equivalents.