Generic physical layer providing a unified architecture for interfacing with an external memory device and methods of interfacing with an external memory device
11373694 · 2022-06-28
Assignee
Inventors
- Soon Chieh Lim (Bayan Lepas Pulau Pinang, MY)
- Chee Hak Teh (Bayan Lepas Pulau Pinang, MY)
- Tat Hin Tan (Bayan Lepas Pulau Pinang, MY)
Cpc classification
G11C7/1039
PHYSICS
G11C7/1012
PHYSICS
International classification
G11C7/10
PHYSICS
G11C8/18
PHYSICS
Abstract
A generic physical layer providing a unified architecture for interfacing with an external memory device. The physical layer comprises a transmit data path for transmitting parallel data to the external memory device and a receive data path for receiving serial data from the external memory device. The generic physical layer is characterized by a receive enable logic for masking the strobe of the serial data, wherein the transmit data path and the receive data path each comprise a FIFO circuit, a data rotator and an adjustable-delay logic for delay tuning and a per-bit-deskew for multi-lane support.
Claims
1. A generic physical layer providing a unified architecture for interfacing with an external memory device, the generic physical layer comprising: a transmit data path for transmitting a parallel data to the external memory device; a receive data path for receiving a serial data from the external memory device; and a receive enable logic for masking strobe of the serial data, wherein the transmit data path and the receive data path each comprising a FIFO circuit, a data rotator and an adjustable-delay logic for delay tuning and a per-bit-deskew for multi-lane support.
2. The generic physical layer as claimed in claim 1, wherein the transmit data path further includes a clock-crossing multiplexor configured to select a read base index.
3. The generic physical layer as claimed in claim 1, wherein the transmit data path further includes a serializer configured to serialize the parallel data.
4. The generic physical layer as claimed in claim 1, wherein the receive data path further includes a divider configured to divide the strobe of the serial data.
5. The generic physical layer as claimed in claim 1, wherein the transmit data path further includes a multi-rank logic configured to support data transmission of multiple ranks.
6. The generic physical layer as claimed in claim 1, wherein the FIFO circuit of the receive data path is associated with a plurality of latches for handling strobe toggling, multi-lane data transfer and de-skew.
7. The generic physical layer as claimed in claim 1, wherein the receive data path further includes a counter configured to extend user read enable to cover strobe toggling.
8. The generic physical layer as claimed in claim 1, wherein the receive enable logic comprises a logic gate circuit.
9. A method of transmitting a parallel data to an external memory device using a generic physical layer, wherein the generic physical layer provides a unified architecture for interfacing with the external memory device, and wherein the generic physical layer includes a transmit data path for transmitting a parallel data to the external memory device, a receive data path for receiving a serial data from the external memory device, and a receive enable logic for masking strobe of the serial data, and wherein the transmit data path and the receive data path each comprising a FIFO circuit, a data rotator and an adjustable-delay logic for delay tuning and a per-bit-deskew for multi-lane support, the method comprising: inputting the parallel data at the transmit data path; implementing coarse delay tuning by the FIFO circuit and the data rotator, and fine delay tuning by the adjustable-delay logic; serializing the parallel data to form serial data; and transmitting the serial data to an external pad.
10. A method of receiving a serial data from an external memory device using a generic physical layer, wherein the generic physical layer provides a unified architecture for interfacing with the external memory device, and wherein the generic physical layer includes a transmit data path for transmitting a parallel data to the external memory device, a receive data path for receiving a serial data from the external memory device, and a receive enable logic for masking strobe of the serial data, and wherein the transmit data path and the receive data path each comprising a FIFO circuit, a data rotator and an adjustable-delay logic for delay tuning and a per-bit-deskew for multi-lane support, the method comprising: inputting the serial data at the receive data path; dividing strobe of the serial data based on strobe edges including rising edges and falling edges by a divider; and implementing coarse delay tuning by the FIFO circuit and the data rotator, and fine delay tuning by the adjustable-delay logic.
11. The method as claimed in claim 10, wherein the method further comprising: masking the strobe of the serial data by the receive enable logic prior to inputting the serial data at the receive data path.
12. The method as claimed in claim 10, wherein the method further comprising: extending a user read enable to cover strobe toggling by a counter.
13. The method as claimed in claim 11, wherein the step of masking the strobe of the serial data comprises the step of generating three signals comprising a receive enable, a receive end of packet and a receive end of packet FIFO load enable.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
(2) These and other features, benefits, and advantages of the present invention will become apparent by reference to the following text figures, with like reference numbers referring to like structures across the views, wherein:
DETAILED DESCRIPTION OF THE INVENTION
(42) As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting but merely as a basis for claims. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. Further, the words “a” or “an” mean “at least one” and the word “plurality” means one or more, unless otherwise mentioned. Where the abbreviations or technical terms are used, these indicate the commonly accepted meanings as known in the technical field.
(43) The present invention is described hereinafter by various embodiments with reference to the accompanying drawings, wherein reference numerals used in the accompanying drawings correspond to the like elements throughout the description. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiment set forth herein. Rather, the embodiment is provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. In the following detailed description, numeric values and ranges are provided for various aspects of the implementations described. These values and ranges are to be treated as examples only, and are not intended to limit the scope of the claims. In addition, a number of materials are identified as suitable for various facets of the implementations. These materials are to be treated as exemplary, and are not intended to limit the scope of the invention.
(44) The present invention relates to a generic physical layer providing a unified architecture for interfacing with an external memory device. Accordingly, the generic physical layer comprises a transmit data path (100) for transmitting parallel data to the external memory device and a receive data path (200) for receiving serial data from the external memory device, characterized by a receive enable logic (300) for masking the strobe of the data. The transmit data path (100) and the receive data path (200) each comprise a FIFO circuit (4), a data rotator (2) and an adjustable-delay logic for delay tuning and a per-bit-deskew (10) for multi-lane support.
(45) In accordance with an embodiment of the present invention, the transmit data path (100) further comprises a clock-crossing multiplexor (6) configured to select a read base index, a serializer (8) configured to serialize the parallel data, and a multi-rank logic configured to support data transmission of multiple ranks.
(46) In accordance with an embodiment of the present invention, the receive data path (200) further comprises a divider (12) configured to divide the strobe of the data and a counter configured to extend a user read enable to cover strobe toggling.
(47) In accordance with an embodiment of the present invention, the FIFO circuit (4) of the receive data path (200) is associated with a plurality of latches for handling strobe toggling, multi-lane data transfer and de-skew.
(48) In accordance with an embodiment of the present invention, the receive enable logic (300) comprises a logic gate circuit.
(49) Hereinafter, each feature of the physical layer, including the transmit data path (100), the receive data path (200) and the receive enable logic (300), will be discussed in more detail. Examples will be given for more detailed explanation. The advantages of the present invention may be more readily understood and put into practical effect from these examples. However, it is also to be understood that the following examples are not to limit the scope of the present invention in any way.
(50) Transmit Data Path (100)
(51)
(52) In the following teachings, the transmit data path (100) can be assumed to transmit an input data [N−1:0] with N equal to 4. The value of N includes but is not limited to 2, 4, 8 and 16. Further, it can be assumed that the input data is transmitted bit-0 first and bit-(N−1) last. Alternatively, the input data can be transmitted bit-(N−1) first and bit-0 last in other embodiments.
(53) Referring to
(54)
(55) Thereafter, the input data can enter the transmit data path (100) through a data rotator (2). The data rotator (2) and a first-in-first-out (FIFO) circuit (4), including but not limited to four FIFO units U0-U3, can serve to delay the input data by a certain number of unit intervals (UI). The data rotator (2) can be implemented using a left-rotate function. For example, when the input data is “DCBA” and the index of the data rotator (2) is one, the input data will be rotated left by one and the rotated output data will be “CBAD”. Generally, the data rotator index can be 1-bit for a 2-bit input, 2-bit for a 4-bit input and 3-bit for an 8-bit input. Meanwhile, the U0-U3 write indices can each be log2(M) bits wide, where M is the number of entries in each FIFO unit. In this embodiment, there are eight entries in each FIFO unit and thus the U0-U3 write indices are 3-bit each. The desired UI delay is referred to by a generic letter, A, in this embodiment for explanation purposes. The bit width of A is the sum of the bit width of the data rotator index and the bit width of the U0-U3 write indices. In this embodiment, the data rotator index is 2-bit as there is a 4-bit input while the U0-U3 write indices are 3-bit each as there are eight entries in each FIFO unit; thus A is 5 bits wide, which means that there are 32 possible UI delays in the transmit data path (100). In another embodiment, the data rotator index can be 1-bit while the U0-U3 write indices can be 2-bit each; thus A is 3 bits wide, which means that there are 8 possible UI delays in the transmit data path (100).
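As an illustration only, the left-rotate behaviour and the count of available delay settings described above can be sketched in Python. The function names `rotate_left` and `delay_settings` are hypothetical and do not appear in the specification; words are written as strings with bit-(N−1) leftmost, matching the “DCBA” notation, and N and M are assumed to be powers of two.

```python
def rotate_left(word: str, index: int) -> str:
    """Rotate a word (bit N-1 leftmost) left by `index` positions."""
    index %= len(word)
    return word[index:] + word[:index]


def delay_settings(n_bits: int, fifo_entries: int) -> int:
    """Number of selectable UI delays for an N-bit input and M-entry FIFO units.

    Assumes N and M are powers of two, so the rotator index is log2(N) bits
    wide and each FIFO write index is log2(M) bits wide.
    """
    rotator_bits = (n_bits - 1).bit_length()
    write_index_bits = (fifo_entries - 1).bit_length()
    return 1 << (rotator_bits + write_index_bits)


print(rotate_left("DCBA", 1))   # CBAD
print(delay_settings(4, 8))     # 32
print(delay_settings(2, 4))     # 8
```

The second call to `delay_settings` assumes that 2-bit write indices correspond to four-entry FIFO units.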
(56) The data rotator index and the U0-U3 write indices can be generated by the following logic equations, which are based on the present embodiment of a 2-bit data rotator index and 3-bit U0-U3 write indices. It is readily understood that the following logic equations can be adjusted according to the bit width of the data rotator index and the bit width of the FIFO unit write indices.
The data rotator index=lower 2 bits of A, A[1:0];
The U0 write index=write base index+upper 3 bits of A, A[4:2]+bit_wise_or(A[1:0]);
The U1 write index=write base index+A[4:2]+second bit of A, A[1];
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0]); and
The U3 write index=write base index+A[4:2]
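The logic equations above can be sketched in Python as follows. This is a minimal behavioural sketch rather than the claimed circuit: the name `tx_indices` is hypothetical, and wrapping the write indices modulo the FIFO depth is an assumption made here because the indices are 3-bit.

```python
def tx_indices(a: int, write_base: int, fifo_entries: int = 8):
    """Return (rotator_index, [U0, U1, U2, U3] write indices) for a delay A.

    `a` is the 5-bit desired delay in UI; `write_base` is the write base index.
    """
    low = a & 0b11             # A[1:0], the data rotator index
    high = (a >> 2) & 0b111    # A[4:2]
    a0 = low & 1               # A[0]
    a1 = (low >> 1) & 1        # A[1]
    # Per-unit offsets: bit_wise_or(A[1:0]), A[1], bit_wise_and(A[1:0]), 0
    offsets = [a0 | a1, a1, a0 & a1, 0]
    # Assumption: indices wrap around at the FIFO depth.
    writes = [(write_base + high + off) % fifo_entries for off in offsets]
    return low, writes


print(tx_indices(8, 0))   # (0, [2, 2, 2, 2])
print(tx_indices(9, 0))   # (1, [3, 2, 2, 2])
```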
(57) In one example, the desired number of delays for input data “DCBA” through the transmit data path (100) is eight and thus A[4:0] is 8 UI or 5′b01000. The data rotator index and the U0-U3 write indices can be determined as follows.
The data rotator index=A[1:0]=0;
The U0 write index=write base index+A[4:2]+bit_wise_or(A[1:0])=0+2+0=2;
The U1 write index=write base index+A[4:2]+A[1]=0+2+0=2;
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0])=0+2+0=2; and
The U3 write index=write base index+A[4:2]=0+2=2
(58) The write base index starts from 0 and increments on every cycle. In this example, the lower 2 bits of A, A[1:0] is “00” and thus it is 0 in decimal value. The upper 3 bits of A, A[4:2] is “010” and thus it is 2 in decimal value. The value of bit_wise_or(A[1:0]) is 0 unless one or both of the first bit and the second bit of A are “1”. The second bit of A, A[1] is “0” and thus it is 0 in decimal value. The value of bit_wise_and(A[1:0]) is 0 unless both of the first bit and the second bit of A are “1”. Since the data rotator index is 0 and the U0-U3 write indices are 2, the input data “DCBA” will be rotated by 0 and written into entry 2 of each FIFO unit as shown in
(59) Referring to
(60) Subsequently, the read index can be used to read out data from the 4 FIFO units. Entry 0 of the FIFO units will be read out first as the read index is 0. The serializer (8) utilizes both CLK_B_0 and CLK_B_90 as the “select” inputs of a multiplexor and selects 1 out of the 4 bits of FIFO output data to achieve a 4:1 serialization as shown in
(61) Referring to
(62) In another example, the desired delay for input data “DCBA” through the transmit data path (100) is nine and thus A[4:0] is 9 UI or 5′b01001. The data rotator (2) can be implemented using a right-rotate function. The data rotator index and the U0-U3 write indices can be determined as follows.
The data rotator index=A[1:0]=1;
The U0 write index=write base index+A[4:2]+bit_wise_or(A[1:0])=0+2+1=3;
The U1 write index=write base index+A[4:2]+A[1]=0+2+0=2;
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0])=0+2+0=2; and
The U3 write index=write base index+A[4:2]=0+2=2
(63) Since the data rotator index is 1 and the data rotator (2) is implemented using a right-rotate function, the input data, written bit-0 first as “ABCD”, is rotated right to “DABC” (equivalently, “DCBA” rotated left to “CBAD” when written bit-3 first). “D” is written into entry 3 of the FIFO U0 while “A”, “B” and “C” are written into entry 2 of FIFO U1-U3 respectively as shown in
(64) Referring to
(65) In accordance with an embodiment of the present invention, the external pad has to be connected to different devices or ranks and thus different delays are required. For example, one set of data has to be transmitted to one device with a delay of 4 UI while another set of data has to be transmitted to another device with a delay of 7 UI. This can be achieved through changing the rotator index and the U0-U3 write indices for each device.
(66) Referring to
The data rotator index=A[1:0]=0. So the rotated data is still “DCBA”.
The U0 write index=write base index+A[4:2]+bit_wise_or(A[1:0])=0+1+0=1;
The U1 write index=write base index+A[4:2]+A[1]=0+1+0=1;
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0])=0+1+0=1; and
The U3 write index=write base index+A[4:2]=0+1=1
(67) In the second cycle, input data of “HGFE” is transmitted to rank-1 with a delay of 5 UI and thus A[4:0] is 5 UI or 5′b00101. Write base index increments to 1.
The data rotator index=A[1:0]=1. So the rotated data is “GFEH”.
The U0 write index=write base index+A[4:2]+bit_wise_or(A[1:0])=1+1+1=3;
The U1 write index=write base index+A[4:2]+A[1]=1+1+0=2;
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0])=1+1+0=2; and
The U3 write index=write base index+A[4:2]=1+1=2
(68) In the third cycle, input data of “LKJI” is transmitted to rank-3 with a delay of 7 UI and thus A[4:0] is 7 UI or 5′b00111. Write base index increments to 2.
The data rotator index=A[1:0]=3. So the rotated data is “ILKJ”.
The U0 write index=write base index+A[4:2]+bit_wise_or(A[1:0])=2+1+1=4;
The U1 write index=write base index+A[4:2]+A[1]=2+1+1=4;
The U2 write index=write base index+A[4:2]+bit_wise_and(A[1:0])=2+1+1=4; and
The U3 write index=write base index+A[4:2]=2+1=3
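The three-cycle multi-rank example above can be checked with a small behavioural model. This is a sketch under stated assumptions, not the patented circuit: bit i of the rotated word is assumed to be written to FIFO unit Ui, the read index is assumed to advance by one entry per cycle starting at entry 0, and the serializer is assumed to emit U0 first. The first cycle is taken to use a delay of 4 UI, as implied by its write indices. The name `simulate_tx` is hypothetical, and `-` marks an empty FIFO slot.

```python
def simulate_tx(cycles, fifo_entries=8, read_cycles=5):
    """Model the TX FIFO writes and the serialized read-out.

    cycles: list of (word, delay_A) pairs, one per write cycle; `word` is a
    4-character string with bit 3 leftmost, e.g. "DCBA".
    Returns the serial stream as a string, one character per UI.
    """
    fifo = [["-"] * fifo_entries for _ in range(4)]   # fifo[unit][entry]
    for write_base, (word, a) in enumerate(cycles):
        rot, high = a & 0b11, a >> 2                  # A[1:0], A[4:2]
        a0, a1 = rot & 1, (rot >> 1) & 1
        offsets = [a0 | a1, a1, a0 & a1, 0]           # U0..U3 offsets
        rotated = word[rot:] + word[:rot]             # left rotate
        for unit in range(4):
            symbol = rotated[3 - unit]                # bit `unit` of rotated word
            entry = (write_base + high + offsets[unit]) % fifo_entries
            fifo[unit][entry] = symbol
    # Read entries 0, 1, 2, ... and serialize each entry U0 first.
    return "".join(fifo[unit][entry]
                   for entry in range(read_cycles)
                   for unit in range(4))


stream = simulate_tx([("DCBA", 4), ("HGFE", 5), ("LKJI", 7)])
print(stream)   # ----ABCD-EFGH--IJKL-
```

In this model each input word nominally starts 4 UI after the previous one, so “A” appears at UI 4 (0 + 4), “E” at UI 9 (4 + 5) and “I” at UI 15 (8 + 7), which matches the per-rank delays of 4, 5 and 7 UI.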
(69)
(70) In accordance with an embodiment of the present invention, it is possible to have several data lanes with each lane having N-bit parallel input data and 1 serial output. Each lane may have a different clock. Therefore, each lane may need to adjust its clock slightly differently, which calls for the use of the per-bit-deskew (10) for each lane. Referring to
(71) Receive Data Path (200)
(72)
(73)
(74)
(75) Referring to
(76) Thereafter, once the input data has been captured into the FIFOs U0-U3, the data can be read out after the FIFO data is stable. Reading out from the FIFO can be done from the CLK_Y clock domain. A user read enable input can be asserted to indicate read out from the FIFO units. In certain settings, the user read enable input has to be extended so as to cover strobe toggling such as preambles, interambles and postambles. In an example where there is 1 cycle of preamble and 1 cycle of postamble, a FIFO read enable can be generated, which is extended from the user read enable input by 2 CLK_Y cycles. If the user read enable has to be extended by a certain number of cycles, this can be achieved through the use of a counter.
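A counter-based extension of the user read enable can be sketched as follows. This is an illustrative cycle-level model only: the name `extend_enable` is hypothetical, and the model simply holds the enable high for a programmable number of extra CLK_Y cycles after the user read enable falls, which is one possible way to realise the extension described above.

```python
def extend_enable(user_read_enable, extra_cycles):
    """Extend a per-cycle enable waveform by `extra_cycles` using a down-counter.

    user_read_enable: list of 0/1 samples, one per CLK_Y cycle.
    Returns the FIFO read enable waveform as a list of 0/1 samples.
    """
    fifo_read_enable = []
    counter = 0
    for enable in user_read_enable:
        if enable:
            counter = extra_cycles        # reload while the input is asserted
            fifo_read_enable.append(1)
        elif counter > 0:
            counter -= 1                  # count down the extension cycles
            fifo_read_enable.append(1)
        else:
            fifo_read_enable.append(0)
    return fifo_read_enable


# 4-cycle user read enable extended by 2 cycles (1 preamble + 1 postamble).
print(extend_enable([0, 1, 1, 1, 1, 0, 0, 0, 0], 2))
# [0, 1, 1, 1, 1, 1, 1, 0, 0]
```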
(77) Two cases of using the counter are illustrated in
(78)
(79) Referring to
(80) In the next cycle, the FIFO output will be “XXXX” as illustrated in
(81) The aforementioned method can continue to work for subsequent input data. For example, the next input stream of ‘I, J, K, L, M, N, O, P’ will occupy the following entries in the FIFO units as shown in
(82) In accordance with an embodiment of the present invention, the input data can be skewed by one or more cycles. The skew is introduced when the input data latency and the input clock latency are unmatched. For example, referring to
(83) In accordance with an embodiment of the present invention, there are multiple data lanes and each lane has its own clock or data skew as shown in
(84) Receive Enable Logic (300)
(85) For protocols that utilize bidirectional strobes, including but not limited to LPDDR3, LPDDR4, LPDDR5, DDR3, DDR4 and DDR5, the input clock known as data strobe (DQS) is only valid during a specific timing window. Outside of this timing window, the strobe is unknown. Hence, the strobe cannot be used as a direct clock into the receive data path (200). The strobe has to be qualified or gated with a receive enable signal.
(86)
(87) First of all, the memory controller may assert the user read enable signal when it has issued a read command to the external memory device. This user read enable is an indication to the receive data path (200) that read data is expected to return from the external memory device. The user read enable is asserted for a number of CLK_Y cycles equal to the length of the data burst that the memory controller intends to read. For example, if the memory controller has sent a read command for 8 chunks of data, the read burst spans 4 clock cycles, because at double data rate each chunk of data occupies 0.5 clock cycle. Therefore, the user read enable is asserted for 4 clocks. However, some protocols including but not limited to DDR4, LPDDR4 and DDR5 may require extra strobe toggling such as preambles and postambles on the strobe.
(88) Referring to
(89) Referring to
(90) Subsequently, RXENA goes through a transmit data path (100) with no output buffer. The transmit data path (100) is utilized to delay RXENA by an arbitrary amount in order to align RXENA at the middle of the TRise window of the raw DQS. The transmit data path can be coupled with RXENA or RXEOP to ensure that a signal can be generated which envelops the valid DQS used for reads. The transmit data path (100) in this embodiment can serve as a slow-to-fast clock serializer with delay adjustment to transfer RXENA or RXEOP from the slow clock domain to the fast clock domain.
(91) A. User read enable is asserted by the memory controller for 4 clock cycles.
(92) B. RXENA is generated and extended for an additional 3 clocks, giving a total of 7 clock cycles. The extension is to cover both the preambles and postambles of DQS.
(93) C. RXENA is delayed by the transmit data path (100) and the adjustable-delay logic in such a way that the rising edge of RXENA is placed before the first rising edge of raw DQS and within the Trise window of raw DQS.
(94) D. RXENA Final is derived from RXENA since RXENA Final is obtained from an OR operation between RXEOP FIFO unload enable and RXENA through a logic gate circuit as shown in
E. At the first falling edge of gated DQS, RXEOP FIFO unload enable is asserted.
F. For the next 6 clock cycles, RXEOP FIFO is unloaded.
G. When the unload pointer reaches entry-5, a value of ‘1’ is unloaded from the RXEOP FIFO. This will cause the RXEOP FIFO unload enable to be deasserted in the next cycle.
H. Around this period, RXENA has been deasserted. However, RXENA Final is still active high due to RXEOP FIFO unload enable still being high.
I. RXEOP FIFO unload enable is deasserted.
J. RXENA Final is also deasserted because both RXENA and RXEOP FIFO unload enable are deasserted.
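The gating relationships in steps A to J can be sketched at the sample level. This is a simplified model, not the actual circuit: the function names are hypothetical, waveforms are plain lists of 0/1 samples, and the RXEOP FIFO is assumed to be loaded for one cycle fewer than RXENA is held (as stated for the half-rate case), with a ‘1’ marker only in its last entry.

```python
def rxeop_fifo_contents(rxena_cycles):
    """Entries loaded into the RXEOP FIFO for an RXENA held `rxena_cycles` cycles.

    Assumption: load enable lasts one cycle fewer than RXENA, and RXEOP ('1')
    is loaded only on the last load cycle.
    """
    load_cycles = rxena_cycles - 1
    return [0] * (load_cycles - 1) + [1]


def rxena_final(rxena, unload_enable):
    """RXENA Final = RXENA OR RXEOP FIFO unload enable, sample by sample."""
    return [a | b for a, b in zip(rxena, unload_enable)]


def gate_dqs(raw_dqs, final_enable):
    """Gated DQS = raw DQS AND RXENA Final, sample by sample."""
    return [d & f for d, f in zip(raw_dqs, final_enable)]


# 7-cycle RXENA: the '1' marker sits in entry 5, matching step G.
print(rxeop_fifo_contents(7).index(1))   # 5

rxena = [0, 1, 1, 1, 0, 0, 0, 0]
unload = [0, 0, 1, 1, 1, 0, 0, 0]
raw_dqs = [1, 0, 1, 0, 1, 0, 1, 1]       # unknown outside the valid window
final = rxena_final(rxena, unload)
print(final)                             # [0, 1, 1, 1, 1, 0, 0, 0]
print(gate_dqs(raw_dqs, final))          # [0, 0, 1, 0, 1, 0, 0, 0]
```

The sample vectors are made up for illustration; they show RXENA Final staying high after RXENA falls because the unload enable is still asserted, as in step H.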
(95) In accordance with an embodiment of the present invention, the receive enable logic (300) can be extended, with minor modifications, to half-rate CLK_Y where the CLK_Y frequency is divided by 2, to quarter-rate CLK_Y where the CLK_Y frequency is divided by 4, or to slower rates. The DQS can also be divided accordingly. Using divided clocks enables the invention to scale for DDR5 and beyond.
(96) A. User read enable is asserted by the user. Since CLK_Y is divided by 2, 1 cycle of divided CLK_Y is equivalent to 2 cycles of the original/full-rate CLK_Y. Therefore, user read enable is only asserted for 2 clocks for the same amount of data.
(97) B. RXENA is now represented by 4 bits. Each bit represents a UI interval (there are 4 UI in 1 divided CLK_Y). So when RXENA[3:0] is 4′b1111, it means RXENA is asserted for 1 full divided CLK_Y. When RXENA[3:0] is 4′b0011, then RXENA is only asserted for the first half of divided CLK_Y. Here, RXENA is held for 3 divided CLK_Y cycles, or 10 UI (there are 10 bits of ‘1’), to cover the preambles.
C. RXEOP FIFO load enable is asserted for 2 CLK_Y cycles which is 1 cycle less than RXENA.
D. RXEOP is asserted only on the last cycle when RXEOP FIFO load enable is asserted.
E. The 4 bits of RXENA[3:0] are serialized and delayed by a scheme similar to that of the transmit data path (100). The delay is adjusted in such a way that the rising edge of the serialized RXENA is placed before the first rising edge of raw DQS and within the Trise window of raw DQS.
F. RXENA Final is obtained from an OR operation between RXEOP FIFO unload enable and RXENA through the logic gate circuit.
G. Gated DQS is obtained from an AND operation between the raw DQS and RXENA Final through the logic gate circuit.
H. Divided DQS is generated by dividing the gated DQS on every rising edge of gated DQS. The divided DQS at 90 degrees is generated by dividing the gated DQS on every falling edge of gated DQS.
I. At the first falling edge of gated DQS, RXEOP FIFO unload enable is asserted.
J. For the next 2 clock cycles, RXEOP FIFO is unloaded.
K. When the unload pointer reaches entry-1, a value of ‘1’ is unloaded from the RXEOP FIFO. This will cause the RXEOP FIFO unload enable to be deasserted in the next cycle.
L. RXEOP FIFO unload enable is deasserted. RXENA Final is also deasserted because both RXENA and RXEOP FIFO unload enable are deasserted.
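The 4-bit RXENA representation in steps B and E can be illustrated with a short sketch. The name `serialize_rxena` is hypothetical. Bit 0 is assumed to be sent first, consistent with 4′b0011 meaning the first half of a divided CLK_Y cycle, and the particular split of the ten ‘1’ bits across the three divided cycles is an assumption chosen for illustration.

```python
def serialize_rxena(words, ui_per_cycle=4):
    """Serialize per-cycle RXENA words into one sample per UI, bit 0 first."""
    return [(word >> bit) & 1
            for word in words
            for bit in range(ui_per_cycle)]


# Three divided CLK_Y cycles carrying ten '1' bits in total (assumed split).
rxena_words = [0b1111, 0b1111, 0b0011]
serial = serialize_rxena(rxena_words)
print(serial)        # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(sum(serial))   # 10
```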
(98) Accordingly, the receive enable logic (300) can deal with any number of preambles, postambles and interambles. No additional counter is required in the DQS clock domain to count the width of the receive enable signal. In existing solutions, such a counter has to be aware of the different memory protocols, as the number of strobe toggles varies for each memory protocol. Instead, the receive enable logic (300) utilizes the variable count indication encompassing the generation of the three major signals to cater for different memory protocols. This is of critical importance because DQS runs at a high speed, and the RXEOP FIFO is vital for conveying the burst length including data cycles, preambles and postambles. The fall of RXENA Final is synchronous to the fall of gated DQS; hence it is impervious to the effect of DQS drift. In addition, the RXEOP FIFO can be made very small, for example, with only 3 entries. The load and unload pointers can be implemented using a one-hot ring counter for fast operation. Further, the receive enable logic (300) allows running CLK_Y at half-rate or half frequency for the higher speed of the DDR5 protocol.
(99) Various modifications to these embodiments are apparent to those skilled in the art from the description and the accompanying drawings. The principles associated with the various embodiments described herein may be applied to other embodiments. Therefore, the description is not intended to be limited to the embodiments shown along with the accompanying drawings but is to be accorded the broadest scope consistent with the principles and the novel and inventive features disclosed or suggested herein. Accordingly, the invention is intended to embrace all other such alternatives, modifications, and variations that fall within the scope of the present invention and the appended claims.
(100) In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.