PAM4 transceivers for high-speed communication

11431416 · 2022-08-30

Abstract

A communication device includes an AFE configured to track and hold a first driving signal to produce a plurality of sample signals, a shift and hold module configured to store the plurality of sample signals, and an ADC configured to respectively convert the plurality of sample signals to a plurality of digitized sample signals, the ADC including a plurality of ADC slices. A DSP is configured to calibrate the AFE based on the plurality of ADC slices corresponding to the plurality of digitized sample signals and generate an output data stream comprising the plurality of digitized samples. A skew management module is configured to detect a skew of the plurality of digitized sample signals in the output data stream generated by the DSP module, generate a programmable skew offset based on the detected skew, and correct the skew in the output data stream based on the programmable skew offset.

Claims

1. A communication device comprising: an analog front end (AFE) comprising: a track and hold (T/H) module configured to track and hold a first driving signal to produce a plurality of sample signals, the first driving signal being based on an input data stream; a shift and hold (S/H) module configured to store the plurality of sample signals; and an analog-to-digital converter (ADC) module configured to respectively convert the plurality of sample signals to a plurality of digitized sample signals, the ADC module including a plurality of ADC slices respectively corresponding to the plurality of digitized sample signals; a digital signal processing (DSP) module configured to calibrate the AFE based on the plurality of ADC slices corresponding to the plurality of digitized sample signals and to generate an output data stream comprising the plurality of digitized samples; and a skew management module configured to (i) detect a skew of the plurality of digitized sample signals in the output data stream generated by the DSP module, the skew corresponding to either one of a timing skew between respective signals of the plurality of digitized sample signals and a logical skew of pulses in the respective signals, (ii) generate a programmable skew offset based on the detected skew, and (iii) correct the skew in the output data stream based on the programmable skew offset.

2. The device of claim 1 further comprising a phase lock loop module for recovering a clock signal from the input data stream, the skew management module being further configured to detect the skew based on the recovered clock signal.

3. The device of claim 1 wherein the DSP module is configured to: calibrate the AFE by: adjusting reference voltages of the plurality of ADC slices; and correcting a residual gain error of the AFE that remains after adjustment of the reference voltages.

4. The device of claim 1 wherein the DSP module comprises a set of parallel feed forward equalizers configured to perform channel equalization.

5. The device of claim 1, T/H module being configured to track and hold the first driving signal at a first frequency that is about twice a frequency of the input data stream.

6. The device of claim 1 further comprising a power regulator coupled to a power source, the power regulator configured to regulate power supplied to the AFE by performing feed-forward injection of a frequency tuned to attenuate noises associated with the power source.

7. The device of claim 1 wherein the DSP module comprises a baud rate timing recovery module configured to recover a clock signal from the input data stream.

8. The device of claim 1 further comprising a clock generation module configured to generate a clock signal based at least on the input data stream and to provide the clock signal to at least one of the AFE and the DSP module.

9. A method for processing an input data stream, the method comprising: receiving the input data stream; generating a first driving signal by amplifying the input data stream; tracking and holding the first driving signal at a second frequency to produce a plurality of sample signals; converting the plurality of sample signals to a plurality of digitized sample signals using a plurality of ADC slices respectively corresponding to the plurality of digitized sample signals; generating an output data stream using the plurality of digitized sample signals; detecting a skew of the plurality of digitized sample signals in the output data stream, the skew corresponding to either one of a timing skew between respective signals of the plurality of digitized sample signals and a logical skew of pulses in the respective signals; generating a programmable skew offset based on the detected skew; and correcting the skew in the output data stream based on the programmable skew offset.

10. The method for processing an input data stream of claim 9 further comprising modulating the output data stream to transmit the output data stream in a pulse amplitude modulated (PAM) format.

11. The method for processing an input data stream of claim 9 further comprising generating a clock signal based on the input data stream.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The following diagrams are merely examples, which should not unduly limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, modifications, and alternatives. It is also understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this process and scope of the appended claims.

(2) FIG. 1 is a simplified diagram illustrating a transceiver system according to an embodiment of the present invention.

(3) FIG. 2A is a simplified diagram illustrating a transceiver system with multiple data lanes.

(4) FIG. 2B is a simplified diagram illustrating an integrated EQ-VGA module according to an embodiment of the present invention.

(5) FIG. 3 is a simplified diagram illustrating a DSP module according to an embodiment of the present invention.

(6) FIG. 4A is a simplified diagram illustrating a driver according to an embodiment of the present invention.

(7) FIG. 4B is a simplified diagram illustrating a skew management system according to an embodiment of the present invention.

(8) FIG. 5 is a simplified diagram illustrating a fractional PLL according to an embodiment of the invention.

(9) FIG. 6A is a simplified diagram illustrating a delay lock loop that generates different phases according to an embodiment of the present invention.

(10) FIG. 6B is a simplified diagram illustrating a regulator according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

(11) The present invention is directed to data communication. More specifically, embodiments of the present invention provide a transceiver that processes an incoming data stream and generates a recovered clock signal based on the incoming data stream. The transceiver includes a voltage gain amplifier that also performs equalization and provides a driving signal to track and hold circuits that hold the incoming data stream, which is stored by shift and hold buffer circuits. Analog to digital conversion is then performed on the buffered data by a plurality of ADC circuits. Various DSP functions are then performed over the converted data. The converted data are then encoded and transmitted in a PAM format. There are other embodiments as well.

(12) High speed signaling using NRZ has approached speeds above 50 Gb/s, where it is extremely difficult to maintain power efficiency and performance over a wide variety of channels and applications. PAM4 is emerging as one way forward to increase throughput in such band-limited channels. Higher modulation formats also help mitigate cost in optical systems by packing more bits per wavelength. Strong momentum in standards to adopt PAM4 reflects these significant trends in the industry. At the same time, migrating transceiver designs to current technology nodes has narrowed the power gap between traditional analog and ADC-DSP-DAC based systems at high speed. These factors make ADC-based receivers a highly desirable design choice, as is also the trend in wireless communications.

(13) The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

(14) In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

(15) The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification, (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

(16) Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the Claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.

(17) Please note, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counter clockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object.

(18) It is to be appreciated that embodiments of the present invention provide transceiver systems that can operate at high speed (e.g., 40/50/100/400 Gb/s). In certain implementations, transceivers are configured to use non-return to zero (“NRZ”) and/or pulse amplitude modulation (“PAM”) modulation techniques. For example, PAM4 modulation is used for data communication over optical communication networks. FIG. 1 is a simplified diagram illustrating a transceiver system according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Among other things, transceiver 100 is configured to provide various types of networking applications. As shown in FIG. 1, transceiver 100 is configured to receive data at a high rate (e.g., 10-20 Gb/s). Data transmitted from transceiver 100 can be in various formats, such as NRZ, PAM4, and/or other formats. Transceiver 100 includes phase-lock loop (“PLL”) devices for clock recovery. In certain embodiments, transceiver 100 is implemented without a reference clock and uses a clock signal recovered by the PLL from incoming data. The recovered clock from the host interface is filtered through the line receive PLL path prior to retransmission. While eliminating a reference, this also allows for independent control of jitter tolerance on the host and jitter transfer through the line. In certain embodiments, incoming data are processed before being transmitted. For example, data processing may include data buffering, aligning incoming data from multiple communication lanes, forward error correction (“FEC”), and/or others. For example, data is first received by an analog front end (AFE), which prepares the incoming data for digital processing. The digital portion (e.g., DSPs) of the transceivers provides various functions in the digital domain, such as skew management, equalization, reflection cancellation, and/or other functions. It is to be appreciated that filtering the recovered clock through the PLL path can provide many benefits, as it allows the system to independently filter the recovered clock multiple times (e.g., through RX PLL and/or TX PLL), and to eliminate the reference clock buffer, thereby saving both power and cost.

(19) The incoming data is characterized by a data frequency, which can be determined by sweeping a predetermined frequency range. For example, the transceiver is configured to acquire sampling frequency by sweeping through a predetermined frequency range, performing data sampling at different frequencies within the predetermined frequency range, and determining a target frequency for sampling data based on a maximum early peak frequency and a maximum late peak frequency. There are other embodiments as well.
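
The sweep described above can be sketched as follows. This is a minimal illustration only: it assumes a hypothetical phase detector that reports an early count and a late count at each trial frequency, and it assumes the target is the midpoint between the frequency with the largest early count and the frequency with the largest late count. The text states only that the target is determined from those two peak frequencies; the midpoint rule and the function name `acquire_target_frequency` are assumptions.

```python
def acquire_target_frequency(freqs_ghz, early_counts, late_counts):
    """Pick a sampling frequency from a sweep of trial frequencies.

    freqs_ghz    -- trial frequencies covering the predetermined range
    early_counts -- hypothetical "early" detector count at each frequency
    late_counts  -- hypothetical "late" detector count at each frequency
    """
    if not freqs_ghz or not (len(freqs_ghz) == len(early_counts) == len(late_counts)):
        raise ValueError("sweep vectors must be non-empty and of equal length")

    indices = range(len(freqs_ghz))
    # Frequency at which the early indication peaks.
    early_peak = freqs_ghz[max(indices, key=lambda i: early_counts[i])]
    # Frequency at which the late indication peaks.
    late_peak = freqs_ghz[max(indices, key=lambda i: late_counts[i])]
    # Assumed rule: the target lies midway between the two peaks.
    return 0.5 * (early_peak + late_peak)


sweep = [13.0, 13.5, 14.0, 14.5, 15.0]
target = acquire_target_frequency(sweep, [1, 9, 4, 2, 1], [1, 2, 4, 9, 1])
```

With the illustrative counts above, the early peak falls at 13.5 GHz and the late peak at 14.5 GHz, so the sketch selects 14.0 GHz.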

(20) In certain embodiments, the transceiver 100 is configured to detect loss of signal. For example, an incoming data stream is sampled and a recovered clock signal is generated by the receiver accordingly. The recovered clock is then provided to the transmitter for signal regeneration. An output clock signal of a higher frequency than the recovered clock signal is generated by a narrow-band transmission PLL. The frequency of the recovered clock signal is compared to a divided frequency of the output clock signal. If a difference between the frequency of the recovered clock signal and the divided frequency of the output clock signal is greater than a threshold error level, a loss of signal indication is provided. There are other embodiments as well.
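
The comparison described above can be sketched as follows. The divide ratio, the frequencies, and the choice to express the threshold in parts per million are assumptions made for illustration; the text specifies only that the recovered clock frequency is compared with a divided output clock frequency against a threshold error level.

```python
def loss_of_signal(recovered_hz, output_hz, divide_ratio, threshold_ppm):
    """Return True when the recovered clock deviates too far from the
    divided-down output clock of the transmission PLL."""
    divided_hz = output_hz / divide_ratio
    # Relative frequency error in parts per million (assumed unit).
    error_ppm = abs(recovered_hz - divided_hz) / divided_hz * 1e6
    return error_ppm > threshold_ppm


# Hypothetical numbers: a 14 GHz output clock divided by 16 gives 875 MHz.
locked = loss_of_signal(875e6, 14e9, 16, threshold_ppm=200)
drifted = loss_of_signal(870e6, 14e9, 16, threshold_ppm=200)
```

In this sketch `locked` is False because the two frequencies agree, while `drifted` is True because a 5 MHz deviation on 875 MHz is far above the assumed 200 ppm threshold.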

(21) FIG. 2A is a simplified diagram illustrating a transceiver system with multiple data lanes. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Transceiver 200 comprises EQ-VGA modules 201 and 202. For example, EQ-VGA module 201 integrates an equalizer (“EQ”) and a voltage gain amplifier (“VGA”). It is to be appreciated that by using integrated EQ-VGA modules, power consumption and efficiency are improved. In addition, by reducing interconnect and wiring between the equalizer and the VGA, total harmonic distortion (THD) at system power-on is reduced.

(22) In certain embodiments, continuous time linear equalization (CTLE) is used to process the incoming data stream and provide an offset correction as needed. For example, a CTLE module for receiving an input data signal is set to an isolation mode, and one or more sense amplifiers perform data sampling asynchronously during the isolation mode. During the isolation mode, CTLE(s) that are not directly connected to the sense amplifiers are shut off. Data sampled during the isolation mode are used to determine an offset value that is later used in normal operation of the SERDES system. There are other embodiments as well.

(23) FIG. 2B is a simplified diagram illustrating an integrated EQ-VGA module according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. As shown in FIG. 2B, input voltages are received as a pair, v.sub.in_p and v.sub.in_n, and they are provided to the input transistors P.sub.1 and P.sub.2 respectively. After equalization and voltage gain are performed, output voltage pairs v.sub.out_p and v.sub.out_n are provided. According to various embodiments, the VGA is characterized by a gain range of at least 12 dB in 0.1 dB steps and a bandwidth of at least 13 GHz. For example, a high-frequency gain boost of up to 8 dB increases signal energy in high loss channels. The integrated VGA and equalizer achieves a signal to noise ratio (SNR) of at least 41.7 dB and THD of at least 36 dB over all gain, boost, and process corners with full-scale outputs. The EQ-VGA uses trans-conductance (gm)-boosted source degeneration, which improves linearity by reducing the nonlinear gate-to-source voltage (v.sub.gs) variation of transistors P.sub.1 and P.sub.2 by the respective loops created by N1-N3-N7 and N2-N6-N8. It is to be appreciated that programmable gain is achieved through differentially modulating the mirrored trans-conductance gain via the v.sub.ds bias on N3-N5 and N4-N6.

(24) Now referring back to FIG. 2A. The EQ-VGA modules 201 and 202 drive the track and hold (“T/H”) circuits. In an embodiment, the EQ-VGA modules perform coarse equalization to reduce dynamic range requirements of the ADC 207. In a specific embodiment, each of the EQ-VGA modules drives four T/H switches. For example, the EQ-VGA module 201 drives the top four T/H switches, and the EQ-VGA module 202 drives the bottom four T/H switches. Depending on the specific implementation, the T/H switches can be configured to operate at various frequencies. For example, for 28 Gb/s data communication, each of the T/H switches operates at 3.5 GS/s. Data held by the T/H switches are stored at the sample-and-hold (SH) buffer 206. As shown in FIG. 2A, the SH buffer 206 comprises 8 buffer units that correspond to the 8 T/H switches and store the data that the T/H switches hold. The SH buffer 206 is connected to ADCs (e.g., ADC 205). The buffers use a feed-forward based negative-g.sub.ds technique, which is preferable to simple source followers for optimizing signal-to-noise performance and linearity at low supply voltages. In various embodiments, a replica circuit controls the negative-g.sub.ds in the buffers, ensuring constant gain over process, voltage, and temperature. Each of the 32 (4 for each lane) sub-ADCs is a successive approximation register (SAR) core clocked at a predetermined frequency (e.g., 7 GHz for a 28 Gb/s communication link). It is to be appreciated that independent reference buffers minimize nonlinear and signal-dependent noise coupling between channels.
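
The interleaving arithmetic in the paragraph above can be sketched as follows. The round-robin ordering (consecutive samples go to consecutive T/H switches, and each T/H switch rotates through its four sub-ADCs) is an assumption made for illustration, as are the names used; the text gives only the counts (8 T/H switches, 32 sub-ADCs) and the 3.5 GS/s T/H rate.

```python
NUM_TH_SWITCHES = 8          # from the text
SUB_ADCS_PER_TH = 4          # 32 sub-ADCs in total, 4 per T/H path
TH_RATE_GSPS = 3.5           # per T/H switch, for a 28 Gb/s link


def slice_for_sample(n):
    """Map sample index n to (T/H switch, sub-ADC), assuming round-robin order."""
    th_switch = n % NUM_TH_SWITCHES
    sub_adc = (n // NUM_TH_SWITCHES) % SUB_ADCS_PER_TH
    return th_switch, sub_adc


# Aggregate sampling rate of the interleaved front end.
aggregate_rate_gsps = NUM_TH_SWITCHES * TH_RATE_GSPS
# Conversion rate each sub-ADC must sustain under the assumed rotation.
sub_adc_rate_gsps = TH_RATE_GSPS / SUB_ADCS_PER_TH
```

Under these assumptions the eight T/H switches together sample at 28 GS/s, each sub-ADC converts at 0.875 GS/s, and the mapping repeats every 32 samples.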

(25) Now referring back to FIG. 1. As shown, system 100 includes DSP modules for data processing. FIG. 3 is a simplified diagram illustrating a DSP module according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. The digital signal processing (DSP) module 302 as shown is coupled to an analog module 301. The analog module 301 includes, among other components, a PLL module and an analog front end (AFE) module. In various embodiments, the PLL module recovers a clock signal from received data. The AFE module provides signal conditioning. As shown in FIG. 3, the AFE module is coupled to the DSP module 302, which performs offset, gain, and timing skew estimation and correction to calibrate the analog front-end. For example, the gain of the 32 interleaved channels is estimated using an envelope detector. Gain mismatch is compensated by adjusting the associated reference voltage, which maximizes the range of each ADC slice. Residual gain errors are further corrected in the digital domain. Offsets of each interleaved channel are estimated digitally by computing the average of the slicer error at the output of the feed forward equalization (FFE) that corresponds to each signal path. Depending on the specific implementation, dynamic range of the ADC can be configured as a tradeoff for offset correction to avoid DACs in the signal path that would degrade bandwidth. For example, timing mismatch is digitally estimated by using correlated properties of the PAM input signal. For example, digital controls are fed back to small delay cells that alter the sampling phases of the 8 T/H clocks with a resolution of about 100 fs. It is to be appreciated that the DSP module 302, working in conjunction with the analog module 301, can provide substantial performance improvement. For example, plot 303 provides an NRZ jitter tolerance. More specifically, plot line 305 shows performance with NRZ with a 15 dB backplane, and plot line 306 shows performance with a very short reach (VSR) mask. Plot 304 provides an SNDR of the entire front-end and the impact from timing calibration. More specifically, plot line 308 shows SNR from an RJ setting of 300 fs RMS, plot line 307 shows performance with timing calibration turned on and the 300 fs RJ removed, plot line 309 shows performance with timing calibration turned on, and plot line 310 shows performance with timing calibration turned off.
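
The digital offset estimate described above, the average slicer error for each interleaved path, can be sketched as follows. This is a simplified illustration that assumes normalized PAM4 levels of -3, -1, +1, and +3, a round-robin assignment of samples to channels, and equalized samples already available as a list; none of these details are specified in the text.

```python
PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)   # assumed normalized levels


def slice_pam4(y):
    """Return the PAM4 level nearest to sample y."""
    return min(PAM4_LEVELS, key=lambda level: abs(y - level))


def estimate_channel_offsets(equalized, num_channels):
    """Average the slicer error (sample minus decision) per interleaved channel."""
    sums = [0.0] * num_channels
    counts = [0] * num_channels
    for n, y in enumerate(equalized):
        channel = n % num_channels        # assumed round-robin interleaving
        sums[channel] += y - slice_pam4(y)
        counts[channel] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Four-channel toy example in which channel 1 carries a +0.2 offset.
ideal = [-3.0, -1.0, 1.0, 3.0] * 8
received = [y + (0.2 if n % 4 == 1 else 0.0) for n, y in enumerate(ideal)]
offsets = estimate_channel_offsets(received, num_channels=4)
```

In the toy example the estimate for channel 1 comes out close to 0.2 and the other channels close to zero; a correction step would then subtract each estimate from its channel.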

(26) In certain embodiments, the DSP module uses a Management Data Input/Output (MDIO) interface for providing serial data communication, which includes management data I/O, data communication, and device configuration. For example, information related to skew management, reflection cancellation, and various signal characteristics measured by a receiving system is communicated through the MDIO.

(27) In various embodiments, the DSP module 302 employs a set of parallel FFEs for channel equalization. The parallel factor was chosen to be a multiple of the number of sub-ADC channels to minimize power consumption. Bandwidth mismatch between the different AFE paths is compensated by independent adaptation of the FFE slices. The DSP module 302 also includes an adaptive PAM4 decision feedback equalizer (DFE). The feedback taps are limited to one tap to reduce the impact of error propagation. In various embodiments, the DSP module 302 performs reflection cancellation to reduce noise. For example, reflection cancellation techniques are described in U.S. patent application Ser. No. 14/597,120, filed 14 Jan. 2015, entitled “PAM DATA COMMUNICATION WITH REFLECTION CANCELLATION”.
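
The equalizer chain described above, a feed forward equalizer followed by a one-tap PAM4 decision feedback equalizer, can be sketched as follows. The sketch is serial rather than parallel, uses fixed rather than adaptive coefficients, and assumes normalized PAM4 levels; the tap values and function names are hypothetical.

```python
PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)   # assumed normalized levels


def slice_pam4(y):
    """Return the PAM4 level nearest to sample y."""
    return min(PAM4_LEVELS, key=lambda level: abs(y - level))


def ffe(samples, taps):
    """Causal FIR filter: out[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def dfe_one_tap(equalized, feedback_tap):
    """Subtract feedback_tap times the previous decision, then slice."""
    decisions = []
    previous = 0.0
    for y in equalized:
        decision = slice_pam4(y - feedback_tap * previous)
        decisions.append(decision)
        previous = decision
    return decisions


# Toy channel with one post-cursor of 0.25: rx[n] = s[n] + 0.25 * s[n - 1].
symbols = [3.0, -1.0, 1.0, -3.0, 3.0, 3.0, -1.0]
rx = [s + 0.25 * (symbols[n - 1] if n else 0.0) for n, s in enumerate(symbols)]
recovered = dfe_one_tap(ffe(rx, taps=[1.0]), feedback_tap=0.25)
```

With the single feedback tap matched to the toy channel's post-cursor, the decisions reproduce the transmitted symbols. Keeping the feedback path to one tap, as the text notes, limits how far a wrong decision can propagate.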

(28) According to various embodiments, the baud-rate clock recovery technique is based on a Mueller-Muller timing recovery scheme, and involves taking inputs directly at the ADC output, thus eliminating interaction problems with FFE-DFE adaptation while providing a low latency clock recovery path. A measured jitter tolerance plot for NRZ modulation is shown in plot 303 against a VSR mask. The clock recovery scheme can be made truly reference-less by taking advantage of the reference-less HOST VSR Link. The recovered clock is filtered prior to ADC sampling. Depending on the implementation, by eliminating the need for a reference clock and using only a clock signal recovered from incoming data, power consumption and chip area can be reduced. For example, data rate programming without a reference clock signal is described in U.S. patent application Ser. No. 14/681,989, filed 8 Apr. 2015, entitled “DATA RATE PROGRAMMING USING SOURCE DEGENERATED CTLE”.
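
A Mueller-Muller timing error detector operates on one sample per symbol. One commonly used form of the error term is sketched below; sign conventions vary between implementations, and the text does not give the exact expression used, so this form and the function name are assumptions.

```python
def mm_timing_error(samples, decisions):
    """Mueller-Muller timing error, one value per symbol after the first.

    e[k] = samples[k] * decisions[k - 1] - samples[k - 1] * decisions[k]
    """
    if len(samples) != len(decisions):
        raise ValueError("samples and decisions must have equal length")
    return [
        samples[k] * decisions[k - 1] - samples[k - 1] * decisions[k]
        for k in range(1, len(samples))
    ]


# Ideal sampling: samples equal decisions, so the error is zero.
on_time = mm_timing_error([1.0, -1.0, 1.0], [1.0, -1.0, 1.0])
# A sample that has not fully settled produces a nonzero error.
off_time = mm_timing_error([1.0, -0.5], [1.0, -1.0])
```

In a full loop the error sequence would be filtered and used to steer the sampling phase; that loop filter is outside the scope of this sketch.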

(29) At the driver stage, a current-mode logic (CML) configuration is used. FIG. 4A is a simplified diagram illustrating a driver according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In various embodiments, line transmitters are configured as two PAM or four NRZ links. With four NRZ links, the system provides support for segmented modulators that generate PAM-4 in the optical domain. As shown in FIG. 4A, the CML implementation of the driver is configured with shunt peaking in the final stage. For example, the driver provides swing levels up to 1.4 Vpp and incorporates a 3-tap finite impulse response (FIR) filter with independent control on the MSB and LSB paths. The MSB to LSB ratio can also be altered to provide compensation on the PAM-4 eye, which is useful in applications where the PAM transmitter interfaces with optical drivers. For example, eye modulation is performed to compensate for distortion that occurs during data transmission and to equalize the signal-to-noise level among different eye levels.
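
The MSB/LSB structure described above can be sketched as follows. The sketch assumes each path carries a stream of +1/-1 values, that each path has its own 3-tap FIR filter (pre-cursor, main, post-cursor), and that the output is the MSB path scaled by an adjustable ratio plus the LSB path. The nominal ratio of 2 and all names are assumptions for illustration.

```python
def fir3(stream, taps):
    """3-tap FIR: pre * next + main * current + post * previous."""
    pre, main, post = taps
    out = []
    for n, current in enumerate(stream):
        nxt = stream[n + 1] if n + 1 < len(stream) else 0.0
        prev = stream[n - 1] if n > 0 else 0.0
        out.append(pre * nxt + main * current + post * prev)
    return out


def pam4_drive(msb, lsb, msb_taps, lsb_taps, ratio=2.0):
    """Combine independently filtered MSB and LSB paths into PAM4 levels."""
    msb_out = fir3(msb, msb_taps)
    lsb_out = fir3(lsb, lsb_taps)
    return [ratio * m + l for m, l in zip(msb_out, lsb_out)]


msb_bits = [1.0, 1.0, -1.0, -1.0]
lsb_bits = [1.0, -1.0, 1.0, -1.0]
# Main tap only, nominal ratio: four equally spaced levels.
nominal = pam4_drive(msb_bits, lsb_bits, (0.0, 1.0, 0.0), (0.0, 1.0, 0.0))
# A larger MSB to LSB ratio widens the middle eye relative to the outer eyes.
skewed = pam4_drive(msb_bits, lsb_bits, (0.0, 1.0, 0.0), (0.0, 1.0, 0.0), ratio=2.5)
```

With the nominal ratio the four symbols map to 3, 1, -1, and -3. With a ratio of 2.5 they map to 3.5, 1.5, -1.5, and -3.5, so the middle eye spans 3 units while the outer eyes remain 2 units wide.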

(30) In certain implementations, eye modulation is performed at the transmission side of a PAM communication system to compensate for distortion and non-linearity and generate an output waveform. Spacing among eye levels is adjusted by performing symmetric modulation using α parameter and asymmetric modulation using β parameter. A correction module measures the output waveform and sends feedback signals to a control module to adjust the α parameter and the β parameter. There are other embodiments as well.

(31) In various embodiments, transceiver systems according to embodiments of the present invention provide a skew control mechanism that auto-zeroes electrical and logical skew in NRZ mode. Additionally, the system can pre-compensate skews (e.g., less than 1 UI) that occur downstream. FIG. 4B is a simplified diagram illustrating a skew management system according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, an analog phase detector senses the skew and a finite state machine (FSM) corrects for skew by adjusting the offset in the PLL charge pump. A delta-sigma modulator driving this offset current provides very fine control of the PLL phase (resolution <100 fs). The FSM also calibrates the loop to be able to introduce the required skew and maintain it over operating conditions. In actual implementation, the noise introduced by the offset delta-sigma is substantially negligible. In an exemplary implementation, the entire system shows a simulated 3σ error of less than +/−0.5 ps peak-to-peak due to mismatches.
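
The fine resolution attributed to the delta-sigma modulator comes from averaging: a coarse one-bit output, switched at a high rate, has a mean value that can be set in very small steps. The first-order accumulator below illustrates the principle only. The order and structure of the modulator in the described system are not stated, so this form is an assumption.

```python
def delta_sigma_first_order(target, n_steps):
    """Generate a 0/1 stream whose average approaches target (0 <= target < 1)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    accumulator = 0.0
    stream = []
    for _ in range(n_steps):
        accumulator += target
        if accumulator >= 1.0:
            # Overflow: emit a one and keep the remainder for later steps.
            stream.append(1)
            accumulator -= 1.0
        else:
            stream.append(0)
    return stream


bits = delta_sigma_first_order(0.25, 1000)
average = sum(bits) / len(bits)
```

Here the stream contains one pulse in every four steps, so its average equals the requested 0.25. In the described system the filtered average would set the charge-pump offset current and hence the PLL phase.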

(32) According to various embodiments, skew management functions are performed by a skew management module. The skew management module generates a control current based on output test patterns of the two communication lanes. The control current is integrated and compared to a reference voltage by a comparator, which generates an analog offset signal. A PLL of one of the communication lanes generates a corrected clock signal that is adjusted using the analog offset signal to remove or adjust the skew between the communication lanes. The corrected clock signal is used for output data.

(33) As mentioned above, PLLs are used to provide clock signals. FIG. 5 is a simplified diagram illustrating a fractional PLL according to an embodiment of the invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, a fractional-N PLL provides the required clocks for the TX and RX paths. In an implementation, the voltage controlled oscillator (VCO) is inductor-based with dual tuning paths (9.9 to 15.5 GHz). An amplifier and RC filter form a slow path that drives V.sub.ctrl_fast close to a target voltage. It is to be appreciated that this implementation offers many advantages. The fractional PLL maximizes charge pump headroom and linearity, and it stabilizes the fast loop K.sub.vco over the tuning range, tracks temperature, and reduces the loop filter size. In various implementations, the multi-modulus (MM) divider is based on Vaucher's extended range topology, which enables smooth transitions across stage boundaries, overcoming a key limitation in the original topology. It is to be appreciated that the fractional PLL illustrated in FIG. 5, in an exemplary implementation, can be characterized by a low integrated RMS jitter of 182 fs on the TX outputs in a frequency band of 1 kHz to 100 MHz.
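
In a fractional-N PLL the output frequency equals the reference frequency multiplied by a divide ratio with an integer part and a fractional part. The sketch below splits a target frequency into those two parts and checks it against the 9.9 to 15.5 GHz VCO range given above. The 156.25 MHz reference frequency and the function name are assumptions chosen only for illustration.

```python
VCO_RANGE_GHZ = (9.9, 15.5)   # tuning range stated in the text


def fractional_n_settings(f_target_ghz, f_ref_ghz):
    """Return (integer part, fractional part) of the divide ratio f_target / f_ref."""
    low, high = VCO_RANGE_GHZ
    if not low <= f_target_ghz <= high:
        raise ValueError("target frequency is outside the VCO tuning range")
    ratio = f_target_ghz / f_ref_ghz
    integer_part = int(ratio)              # floor, since the ratio is positive
    fractional_part = ratio - integer_part
    return integer_part, fractional_part


# Hypothetical 156.25 MHz reference and a 14 GHz target.
n_int, n_frac = fractional_n_settings(14.0, 0.15625)
```

For these numbers the ratio is 89.6, which the divider would realize by dividing by 89 most of the time and by 90 for the remaining fraction of cycles.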

(34) For data communication, timing phases are often needed. In various embodiments, a delay lock loop (DLL) is used to generate timing phases. FIG. 6A is a simplified diagram illustrating a delay lock loop that generates different phases according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. As shown in FIG. 6A, a DLL generates the timing phases for the ADC from a 7-GHz clock from the PLL. Static phase offset (SPO) is often a challenge in DLLs. A differential quadrature phase detector achieves the goal of low SPO, allowing for healthy timing margins in the ADC clocking and easing start-up of the DSP engine.

(35) In various embodiments, a phase interpolator is implemented in conjunction with a delay-lock loop (DLL) and an SR latch, where one or more outputs of the DLL are used by the SR latch. Additionally, such techniques can be used for a variety of applications such as network and/or computer storage systems, computer servers, hand held computing devices, portable computing devices, computer systems, network appliances and/or switches, routers, and gateways, and the like.

(36) In addition, embodiments of the present invention also provide power supply noise management. FIG. 6B is a simplified diagram illustrating a regulator according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. It is to be appreciated that power supply noise management is an important aspect of high performance communication links. Both power supply rejection ratio (PSRR) and random noise from regulators impact the overall SNR of the analog front-ends. The regulator topology shown in FIG. 6B uses feed-forward injection. The frequency of injection is tuned to attenuate external switching regulator noise, which can often occur around PLL corner frequencies. This attenuation allows for reduced on-board filtering requirements. In addition, source degeneration is employed in the error amplifier to further reduce 1/f noise contributors.

(37) Depending on the specific implementation, a transceiver system according to an embodiment of the present invention can be manufactured using various types of fabrication processes. For example, a 28 nm CMOS logic process can be used to fabricate the transceiver system. In a specific implementation, a transceiver system (e.g., transceiver system 100 in FIG. 1) consumes about 2.4 W of power from 1.2 V and 0.9 V power supplies, with FEC bypassed. There are other embodiments as well.

(38) While the above is a full description of the specific embodiments, various modifications, alternative constructions and equivalents may be used. Therefore, the above description and illustrations should not be taken as limiting the scope of the present invention which is defined by the appended claims.