High-speed receiver architecture
09882648 · 2018-01-30
Assignee
Inventors
- Oscar Ernesto Agazzi (Irvine, CA)
- Diego Ernesto Crivelli (Cordoba, AR)
- Hugo Santiago Carrer (Mendiolaza, AR)
- Mario Rafael Hueda (Cordoba, AR)
- German Cesar Augusto Luna (Cordoba, AR)
- Carl Grace (Berkeley, CA, US)
CPC classification
H04B7/0456
ELECTRICITY
H04B10/5059
ELECTRICITY
H04B1/38
ELECTRICITY
H04B10/25073
ELECTRICITY
H04L5/16
ELECTRICITY
International classification
H04B1/38
ELECTRICITY
H04B10/2507
ELECTRICITY
H04L25/02
ELECTRICITY
H04L5/16
ELECTRICITY
Abstract
A receiver (e.g., for a 10G fiber communications link) includes an interleaved ADC coupled to a multi-channel equalizer that can provide different equalization for different ADC channels within the interleaved ADC. That is, the multi-channel equalizer can compensate for channel-dependent impairments. In one approach, the multi-channel equalizer is a feedforward equalizer (FFE) coupled to a Viterbi decoder, for example a sliding block Viterbi decoder (SBVD); and the FFE and/or the channel estimator for the Viterbi decoder are adapted using the LMS algorithm.
Claims
1. A receiver comprising: an analog-to-digital converter that generates signal samples from a received signal having combined non-Gaussian noise and Gaussian noise; a feedforward equalizer coupled to receive the signal samples from the analog-to-digital converter, and to apply equalization to generate equalized samples; a decoder coupled to an output of the feedforward equalizer, the decoder determining detected symbols from the equalized samples and a channel model by minimizing a cumulative metric that compensates the combined Gaussian and non-Gaussian noise in the received signal; and a channel estimator to receive the output of the feedforward equalizer and an output of the decoder and to generate the channel model and an error feedback signal to adaptively update coefficients of the feedforward equalizer based on the error feedback signal, wherein the cumulative metric is approximated by M = Σ_n [(q_n)^v − (ŷ_n)^v]^2 where M is the cumulative metric, q_n represents a sample at the output of the feedforward equalizer, ŷ_n is a noise-free signal component of q_n, and 0 < v ≤ 1.
2. The receiver of claim 1, where v is given by
3. The receiver of claim 1, further comprising: a transformation block to provide an approximate solution to (q_n)^v, wherein the transformation block implements a function LUT(q_n) = (q_n + y_min)^v when the sample q_n is in an interval between −y_min and +y_min.
4. The receiver of claim 1, further comprising: a transformation block to provide an approximate solution to (q_n)^v, wherein the transformation block implements a function y.sup.vy.sup.v.sup.
5. The receiver of claim 1, further comprising: a transformation block to provide an approximate solution to (q_n)^v, wherein the transformation block implements a function K.sup.1.sup.0.5+(v0.5) where K is a constant.
6. The receiver of claim 1, where v is given by v = 0.5 if M_{2,1}/M_{2,0} > K_{0.5} and by v = 1 otherwise, where K_{0.5} is a programmable threshold level.
7. The receiver of claim 1, wherein the non-Gaussian noise comprises amplified spontaneous emission (ASE) noise.
8. A method for equalizing a received signal, the method comprising: generating, by an analog-to-digital converter, signal samples from the received signal having combined non-Gaussian noise and Gaussian noise; applying, by a feedforward equalizer, an equalization to the signal samples from the analog-to-digital converter to generate equalized samples; determining, by a decoder coupled to an output of the feedforward equalizer, detected symbols from the equalized samples and a channel model by minimizing a cumulative metric that compensates the combined Gaussian and non-Gaussian noise in the received signal; and generating, by a channel estimator, the channel model and an error feedback signal to adaptively update coefficients of the feedforward equalizer based on the error feedback signal, wherein the cumulative metric is approximated by M = Σ_n [(q_n)^v − (ŷ_n)^v]^2 where M is the cumulative metric, q_n represents a sample at the output of the feedforward equalizer, ŷ_n is a noise-free signal component of q_n, and 0 < v ≤ 1.
9. The method of claim 8, where v is given by
10. The method of claim 8, further comprising: providing an approximate solution to (q_n)^v by implementing a function LUT(q_n) = (q_n + y_min)^v when the sample q_n is in an interval between −y_min and +y_min.
11. The method of claim 8, further comprising: providing an approximate solution to (q_n)^v by implementing a function y.sup.vy.sup.v.sup.
12. The method of claim 8, further comprising: providing an approximate solution to (q_n)^v by implementing a function K.sup.1.sup.0.5+(v0.5) where K is a constant.
13. The method of claim 8, where v is given by v = 0.5 if M_{2,1}/M_{2,0} > K_{0.5} and by v = 1 otherwise, where K_{0.5} is a programmable threshold level.
14. The method of claim 8, wherein the non-Gaussian noise comprises amplified spontaneous emission (ASE) noise.
15. A transceiver chip for communication over an optical fiber, the transceiver chip comprising: a host interface for providing encoded data in a host format for transmitting over the optical fiber and for receiving decoded data in the host format originating from the optical fiber; transmit path circuitry to receive the encoded data from the host interface and generate an electrical signal suitable for driving a laser coupled to the optical fiber; an input port for receiving a received signal originating from the optical fiber having combined non-Gaussian noise and Gaussian noise; and receive path circuitry coupled between the input port and the host interface, the receive path circuitry comprising: an analog-to-digital converter that generates signal samples from the received signal having the combined non-Gaussian noise and Gaussian noise; a feedforward equalizer coupled to receive the signal samples from the analog-to-digital converter, and to apply equalization to generate equalized samples; a decoder coupled to an output of the feedforward equalizer, the decoder determining detected symbols from the equalized samples and a channel model by minimizing a cumulative metric that compensates the combined Gaussian and non-Gaussian noise in the received signal, the decoder to generate the decoded data; and a channel estimator to receive the output of the feedforward equalizer and an output of the decoder and to generate the channel model and an error feedback signal to adaptively update coefficients of the feedforward equalizer based on the error feedback signal, wherein the cumulative metric is approximated by M = Σ_n [(q_n)^v − (ŷ_n)^v]^2 where M is the cumulative metric, q_n represents a sample at the output of the feedforward equalizer, ŷ_n is a noise-free signal component of q_n, and 0 < v ≤ 1.
16. The transceiver chip of claim 15, where v is given by
17. The transceiver chip of claim 15, further comprising: a transformation block to provide an approximate solution to (q_n)^v, wherein the transformation block implements a function LUT(q_n) = (q_n + y_min)^v when the sample q_n is in an interval between −y_min and +y_min.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
(22) The figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(24) On the receive side, a typical receiver 115 includes a photodetector 111 for receiving and detecting data from the optical fiber 110. The detected data is typically processed through a transimpedance amplifier (TIA) 112. A programmable gain amplifier (PGA) 120 applies a variable gain to the electrical analog signal. The resulting electrical signal is converted to digital form by an interleaved ADC 130. The interleaved ADC 130 is timed by a clock signal produced by the sampling clock generator 140. The digital output of the ADC 130 is further processed by digital signal processing circuitry (DSP) 150 to recover the digital data. In this example, the DSP implements electronic dispersion compensation using a multi-channel equalizer. The recovered data may then be placed on the appropriate interface by interface circuitry 116. For example, if receiver 115 is implemented on a chip that is mounted on a host, the interface 116 may be an interface to the host.
(25) The following example will be illustrated using a 10G receiver. While 10G systems serve as convenient examples for the current invention, the current invention is not limited to 10G systems. Examples of other systems to which the current invention could be applied include Fibre Channel systems, which currently operate at speeds from 1 Gbps to 10 Gbps, as specified by the Technical Committee T11, a committee of the InterNational Committee for Information Technology Standards (INCITS).
(26) One place where the following examples deviate from the LRM standard is the fiber length. The draft standard specifies 220 meters, but the following examples use a 300 meter length. This is motivated by the large number of fibers in the field whose length approaches 300 meters, and by the fact that users of EDC technology have expressed a desire for this extended reach. Although the LRM channel is used in the following examples in order to make them more concrete, the techniques illustrated are general and they can be used in many other fiber optic or other communications applications. Other fiber optic applications for which these techniques can be used include, for example, systems using single mode optical fiber as the communications medium.
(29) In some embodiments, the retimer 237 may also multiplex the eight ADC channels 230 back into one or more higher data rate signals (e.g., into one 10G signal, or two parallel 5G signals, etc.).
(31) The multi-channel equalizer 350 in this example is a maximum likelihood sequence estimation (MLSE) equalizer. This is motivated by the fact that the optimal receiver for an intersymbol interference channel in the presence of Gaussian noise consists of a whitened matched filter followed by a maximum likelihood sequence detector. The equalizer 350 includes a MIMO-FFE (C) 360 coupled to a sliding block Viterbi decoder (SBVD) 370 and a MIMO channel estimator (B) 380. This architecture is able to compensate for the ISI of MMF, as well as for the impairments of the receiver front-end, such as channel-to-channel variations in the interleaved ADC 130.
(32) In more detail, the MIMO-FFE 360 applies feed-forward equalization to the digital data received from the ADC 130. The coefficients for the equalization are updated using the LMS algorithm, as implemented by circuitry 362. The SBVD 370 then makes decisions based on the equalized samples from the FFE 360. These are output as the digital data recovered by the receiver (or possibly converted from serial to parallel form). Circuitry 380 is the channel estimator for the SBVD 370. The estimated channel is used by the SBVD 370 to make its decisions. An error computation unit 381 calculates the error between the FFE 360 output q.sub.n and the output of the channel estimator 380. The error signal produced by the channel estimator 380 is used by the LMS update circuitry 362 to update the coefficients for the FFE 360 and is also used by the timing recovery circuitry 340 to adjust the clock 140 driving the ADC 130. The channel estimator 380 itself is also adaptive, in this example also based on the LMS algorithm.
(34) In one implementation, unlike conventional ADC pipelines, the residue amplifiers 425 are implemented as open-loop amplifiers rather than closed-loop amplifiers. Closed-loop amplifiers can be more closely controlled, in terms of parameters such as gain and nonlinearity. However, closed-loop amplifiers have more severe speed limitations or require more power to achieve a given speed than open-loop amplifiers. The use of open-loop amplifiers provides higher speed (increases swing and bandwidth) with lower power. It can also reduce requirements on transistor performance.
(35) However, because the gain G provided by open-loop amplifiers 425 can be less controlled, some form of redundancy is preferably employed to avoid the loss of analog information in the pipeline. In one approach, a sub-radix architecture with redundancy is used. In a non-redundant architecture, the total number of raw bits d_i generated by the stages 420 is the same as the number of bits in the digital representation. In a redundant architecture, the stages 420 produce more raw bits d_i than the number of output bits in the digital representation. The extra bits represent redundant information which is used to correct errors in the pipeline. In a sub-radix architecture, each stage 420 outputs one raw bit d_i but effectively converts less than one output bit of the digital representation. Therefore, the total number of stages 420 is more than the number of output bits in the digital value.
(36) For example, in one non-redundant architecture, each stage 420 effectively converts 1 bit and the residue amplifier gain G is 2. Therefore, eight stages 420 are required to implement an 8-bit A/D conversion. The eight raw bits d_i are the actual output bits in the digital representation of the analog value, with the raw bit from stage 1 being the most significant output bit. As an example of a sub-radix architecture, each stage 420 might generate 1 raw bit but convert only 0.8 output bits with a residue amplifier gain G of 2^0.8. More stages 420 are required: 10 stages in this case to implement an 8-bit A/D conversion. The 10 raw bits d_i from the stages 420 are not the 8 output bits in the digital representation but are used to generate the final 8 bits using known algorithms. The sub-radix architecture allows gain errors to be tolerated by an amount proportional to the amount of gain reduction. It also allows redundancy with little additional hardware.
(37) A popular redundancy technique is a 1.5 output bits/stage architecture. In this technique, each stage 420 outputs 2 raw bits (thereby requiring additional comparators, which dissipate additional power), and backend processing uses this redundant information to improve accuracy. Using this technique, the accuracy of the ADC pipeline is set primarily by the accuracy of the interstage gain G. Because the gain of open-loop interstage amplifiers 425 is not as well controlled, this technique is not preferred for the present application. A sub-radix architecture, on the other hand, maintains 1 output bit per stage but provides redundancy by interstage gains of less than 2, and the accuracy of the interstage gain G is not as central to the architecture. This requires additional stages 420 (for example, an 8-bit ADC pipeline might require 10 or 11 stages using this technique) but only 1 comparator per stage. Again, backend processing uses the redundant information to provide the required accuracy.
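The stage-count arithmetic above can be checked with a short sketch (Python, not part of the patent): each 1-raw-bit stage resolves log2(G) output bits, so a pipeline needs at least ceil(bits / log2(G)) stages. Note this gives only the minimum; the 11-stage design described later adds margin for comparator offsets and gain tolerance.

```python
import math

def min_stages(bits: int, gain: float) -> int:
    """Minimum number of 1-raw-bit stages so that n * log2(gain) >= bits.
    The small epsilon guards against floating-point round-off in log2."""
    return math.ceil(bits / math.log2(gain) - 1e-9)

# Non-redundant radix-2: each stage converts a full output bit.
assert min_stages(8, 2.0) == 8
# Sub-radix example from the text: G = 2**0.8 gives 0.8 bits/stage.
assert min_stages(8, 2 ** 0.8) == 10
# G = 1.75 (the 10G design point) needs at least 10 stages before margin.
assert min_stages(8, 1.75) == 10
```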
(39) In the lookahead pipeline, the critical timing path, consisting of the amplifier settling time plus the comparator regeneration time, is broken into two shorter paths. In the example shown, all stages 420 (other than the first stage 420Q) have a pair of comparators 421(X) and 421(Y) (rather than a single comparator) that operates to develop the possible values for the stage based on the input value to the previous stage. This basically allows the interstage amplification and the comparator operation to occur in parallel, giving the comparators an entire clock half-period to regenerate. In this architecture, the first stage 420Q (that generates raw bit D_1) is a half-stage that uses a single comparator. The remaining stages 420B-N use two comparators 421 per stage. The last stage may be simplified since there is no following stage. The last stage could contain only the circuitry required to generate the last raw bit D_N (e.g., eliminating the subtractor 423N and open-loop amplifier 425N). The architecture is somewhat more complex than an ADC pipeline without lookahead, but it allows much higher speeds when the interstage amplifier's speed is comparable to the comparator's speed.
(40) In some sense, the sub-ADC 421 operation for a lookahead stage is moved ahead one stage.
(41) However, the sub-ADC 421 for stages 420B-N becomes more complex. The sub-ADC 421B for the second lookahead stage 420B includes two comparators 421B(X) and 421B(Y). These comparators determine the bit D_2 for stage 420B. Comparator 421B(X) determines bit D_2 assuming that bit D_1 is a 1. Comparator 421B(Y) determines bit D_2 assuming that bit D_1 is a 0. Switch 427B determines which result to select, depending on the output of sub-ADC 421Q of the previous stage 420Q. The bit D_2 is fed to the sub-DAC 422C of stage 420C.
(42) As described above, the lookahead pipeline architecture allows a full clock half period for the comparators to regenerate. There is also the potential to use part of the amplifier settling time for comparator regeneration, since the amplifier output will be approaching its final value closely enough that the comparator threshold has been passed and the comparator can begin regenerating.
(44) Each interleaved ADC channel 230 includes two pipeline units 610(1) and 610(2). Each ADC pipeline unit 610 includes an ADC pipeline 630 followed by a calibration unit, which in this example is a lookup table 640. As a result of the non-linearities of the individual stages 420 in the pipeline 630, the response of the overall ADC pipeline 630 has a complex non-linear characteristic.
(45) Each ADC channel 230 includes two pipeline units 610(1) and 610(2) which are constantly being swapped between normal operation and calibration modes, at a rate of about 1 MHz. At any given instant, one of the two pipelined units is in normal operation, while the other is in calibration. Approximately every microsecond, the units are automatically interchanged. Therefore, to an external observer, the pair of pipelined units 610(1) and 610(2) operates as a single high-precision ADC channel 230.
(46) For the pipelined unit 610(1) that is in normal operation, the calibration portion behaves as a simple lookup table 640(1). The raw output from the ADC pipeline is the memory address used to access the lookup table 640(1). The content at this memory address is the digital output of the ADC channel 230.
(47) For the pipelined unit 610(2) that is in calibration, the lookup table 640(2) contents are updated. The update is based on a reference ramp generated by a digital counter 615 followed by a high precision DAC 617, which provides the input for the ADC pipeline 610(2) under calibration. Since the ramp can be relatively slow, a digital ramp can be generated from the DSP 150. The lookup table 640(2) is updated using an LMS algorithm, where the error is computed as the difference between the current content of the lookup table entry addressed by the pipeline output and the expected output, which is the output of the counter 615. If the two quantities are identical, the lookup table 640(2) entry is already correct and it does not need to be updated. Correspondingly, the error is zero, so that no update takes place. However, if the two quantities differ, there will be an update. The LMS algorithm effectively averages many updates, so that the entries in the lookup table 640(2) are not computed based on a single conversion, but on an average of many conversions.
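The lookup-table update described above can be sketched as follows. This is an illustrative Python model, not the patent's circuit: the toy `pipeline` nonlinearity, the table size, the step size `mu`, and the number of ramp passes are all assumed values chosen for demonstration.

```python
def calibrate_lut(pipeline, lut_size, mu=0.1, passes=200):
    """LMS calibration of a lookup table against a reference ramp.
    `pipeline` maps an input code (the expected output) to the raw,
    possibly nonlinear, address produced by the ADC pipeline."""
    lut = [0.0] * lut_size
    for _ in range(passes):
        for expected in range(256):        # digital counter driving the DAC ramp
            addr = pipeline(expected)      # raw output of pipeline under calibration
            err = lut[addr] - expected     # error: current entry vs. expected output
            lut[addr] -= mu * err          # LMS update; averages many conversions
    return lut

# Toy monotonic but nonlinear pipeline characteristic (assumed for illustration).
pipe = lambda x: int(1.7 * x + 0.002 * x * x)
lut = calibrate_lut(pipe, 1024)
# After convergence, reading the LUT at the raw address recovers the input code.
assert all(abs(lut[pipe(x)] - x) < 0.5 for x in range(256))
```

As in the text, an entry that already matches the expected output produces zero error and is left unchanged; mismatched entries are pulled toward the expected value over many ramp passes.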
(48) Now consider the design of an interleaved ADC for the following 10G example:
- 10 GS/s nominal conversion rate (10.3125 GS/s actual conversion rate)
- 8-bit accuracy
(49) In one design, the ADC includes eight parallel time-interleaved ADC channels 230A-H. Each ADC channel 230 operates at a nominal conversion rate of 1.25 GS/s (actual conversion rate 1.29 GS/s). Each ADC channel 230 includes two ADC lookahead pipelines 630 of 11 stages each, with one pipeline in service at any one time and the other available for calibration. Each of the 16 lookahead pipelines 630 uses open-loop interstage amplifiers and subranging lookahead pipeline architecture. Lookup table calibration compensates for non-linearities. There are 16 lookup tables for the non-linear calibration, one for each of the 16 pipelines. Each lookup table takes the 11-bit raw input from the lookahead pipeline as input and outputs the corrected 8-bit digital value.
(50) Allowing for the expected worst case offset values and interstage gain tolerance (for the open-loop amplifiers), computing the required redundancy gives an ADC pipeline with 11 stages and an interstage nominal gain G of 1.75. The 3 sigma input referred offset including comparators and residue amplifiers is estimated at 26 mV. This results in an interstage gain G of less than 1.82. With gain G=1.75, 11 stages are required to achieve 8 bit performance with 10% tolerance on the gain G.
(51) The digital output of the interleaved ADC 130 is further processed by the multi-channel equalizer 350.
(53) First, transform the filters h(t) and f_0(t) through f_{M−1}(t) from the continuous to the sampled time domain. The transformation assumes ideal sampling (sampling without phase errors). Sampling time errors will be modeled with a multiple-input, multiple-output (MIMO) interpolation filter, as will be seen later. Defining:
a_n^{(i)} = a_{nM−i}, i = 0, . . . , M−1, (1)
a MIMO description of this communications link is obtained by converting the single-input, single-output (SISO) filters h(t) and f_0(t) through f_{M−1}(t) to a MIMO and a multiple-input, single-output (MISO) representation, respectively.
(54) In this way, the MIMO model accepts M-dimensional input vectors whose components are transmitted symbols, and produces M-dimensional output vectors whose components are signal samples, at a rate 1/MT.
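The serial-to-vector mapping of Eqn. (1) can be illustrated with a short sketch (Python; the index convention a_{nM−i} follows the reconstructed equation, and starting at n = 1 simply avoids negative indices at the stream boundary):

```python
def demux(a, M):
    """Map a serial symbol stream a_k to M-dimensional vectors per Eqn. (1):
    component i of vector n is a_{nM - i}, for i = 0..M-1."""
    N = len(a) // M
    # Start at n = 1 so every index nM - i is inside the stream.
    return [[a[n * M - i] for i in range(M)] for n in range(1, N)]

a = list(range(32))            # a_k = k, so indices are easy to check
vecs = demux(a, M=8)
assert vecs[0] == [8, 7, 6, 5, 4, 3, 2, 1]   # components a_{8-i}, i = 0..7
```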
r(z) = GP(z)F(z)H(z)a(z) + O(z). (2)
Grouping the factors in the first term of the sum as S(z)=GP(z)F(z)H(z), the entire MIMO response of the system can be represented in the z-domain and time-domain, respectively, as:
(55) r(z) = S(z)a(z) + O(z), r_n = Σ_k S_k a_{n−k} + o_n (3)
(56) Given the model of Eqn. (3), the joint compensation of the channel impairments (such as intersymbol interference (ISI)) and the analog front-end (AFE) errors can be formulated as the general equalization problem of a MIMO channel. Common equalization techniques include feed forward equalization, decision feedback equalization, and maximum likelihood sequence estimation.
(58) In one implementation, the MIMO-FFE 360 is described by the following equation:
(59) q_n = Σ_{i=0}^{N_f−1} C_i r_{n−i} (4)
where N_f is the number of M×M matrix taps (C_i) of the forward equalizer.
(60) Let K be the total number of bits transmitted. It is convenient to assume, without loss of generality, that K = NM with N integer. The maximum-likelihood sequence detector chooses, among the 2^K possible sequences, the one {â_k} (k = 1, . . . , K) that minimizes the metric:
(61) M = Σ_{n=1}^{N} ‖q_n − B(ã_n)‖^2 (5)
where B(·) is a function that models the response of the equalized channel with memory Δ−1, and ã_n = (â_{nM}, â_{nM−1}, . . . , â_{(n−1)M−Δ+2}). Note that each component of B(·) depends only on Δ consecutive received bits. This formulation assumes that in general the function B(·) is nonlinear. The minimization of Eqn. (5) can be efficiently implemented using the Viterbi algorithm. The required number of states of the Viterbi decoder is S = 2^{Δ−1}. The SBVD 370 is generally a suitable form of the Viterbi algorithm for a MIMO receiver. The input to the SBVD 370 is the FFE 360 output vector q_n, and the output is a block of M detected symbols â_n.
(62) For each of the M components of B(ã_n), the MIMO channel estimator 380 generates the 2S expected values of the corresponding component of the q_n vector for all possible combinations of the most recently received bits (corresponding to the 2S branch metrics in the trellis diagram). The MIMO channel estimator 380 can be implemented using M lookup tables, each lookup table having 2S entries. While the vector B(ã_n) can in general take on (2S)^M values, dynamic programming techniques inherent in the Viterbi algorithm reduce the computational requirement to that of computing the 2MS branch metrics corresponding to the individual components of B(ã_n).
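A minimal sketch of this metric minimization is given below (Python, illustration only): a plain Viterbi search over a lookup-table channel model B with S = 2^{Δ−1} states. The toy model values, the all-zeros starting state, and the single-block trace-back are assumptions for clarity; a production sliding block Viterbi decoder would process overlapping blocks in parallel.

```python
def viterbi(q, B, memory):
    """Choose the bit sequence minimizing sum_n (q_n - B(bits))^2.
    States are the last (memory - 1) bits, so S = 2**(memory - 1)."""
    S = 2 ** (memory - 1)
    # Assume the decoder starts from the all-zeros state.
    cost = {s: (0.0 if s == 0 else float("inf")) for s in range(S)}
    paths = {s: [] for s in range(S)}
    for qn in q:
        new_cost, new_paths = {}, {}
        for s in range(S):
            for bit in (0, 1):
                ns = ((s << 1) | bit) & (S - 1)          # shift in the new bit
                branch = (qn - B[(s << 1) | bit]) ** 2   # LUT of 2S expected values
                c = cost[s] + branch
                if ns not in new_cost or c < new_cost[ns]:
                    new_cost[ns], new_paths[ns] = c, paths[s] + [bit]
        cost, paths = new_cost, new_paths
    return paths[min(cost, key=cost.get)]

# Toy channel with memory 2 (Delta = 2, S = 2): B(b_prev, b_new) = b_new + 0.5*b_prev.
B = {0b00: 0.0, 0b01: 1.0, 0b10: 0.5, 0b11: 1.5}
bits = [1, 0, 1, 1, 0]
q, prev = [], 0
for b in bits:                      # noise-free observations from the same model
    q.append(b + 0.5 * prev)
    prev = b
assert viterbi(q, B, memory=2) == bits
```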
(63) The coefficients of the FFE 360 and the lookup tables can be iteratively adapted using the well-known LMS algorithm, as follows for iteration j:
e_n = B^j(ã_n) − q_n, (6)
C_l^{(j+1)} = C_l^{(j)} + β e_n r_{n−l}^T, (7)
B^{j+1}(ã_n) = B^j(ã_n) − γ e_n (8)
where (·)^T means transpose and β and γ are the algorithm step sizes of the FFE and channel estimator, respectively. The iteration number j of the LMS update is shown as a superscript. The LMS update circuitry 362 carries out this function.
(64) Note that the absence of a reference level in Eqns. (6)-(8) defines coefficients of the FFE 360 and the channel estimator 380 only up to a scale factor. One possible way to define the scale is to set one of the coefficients of the FFE 360 to a specific value which is kept fixed (not adapted). In the 10G example, the number of taps of the FFE 360 can be programmed by the user. This allows the user to trade performance for power consumption. For similar reasons, the number of states of the Viterbi decoder 370 can also be set by the user.
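A scalar sketch of the update recursion in Eqns. (6)-(7) is shown below (Python, illustrative only). To pin the scale factor discussed above, the channel model is held fixed at B(a) = a rather than adapted; the toy dispersive channel, tap count, and step size are assumptions, not values from the patent.

```python
import random

def lms_ffe(symbols, received, beta=0.05, taps=3):
    """Adapt FFE taps c per Eqns (6)-(7): e_n = B(a_n) - q_n, c_i += beta*e_n*r_{n-i}.
    The channel model B is held fixed at B(a) = a to fix the scale."""
    c = [0.0] * taps
    for n in range(taps, len(symbols)):
        window = received[n - taps + 1 : n + 1][::-1]   # r_n, r_{n-1}, r_{n-2}
        q = sum(ci * ri for ci, ri in zip(c, window))   # FFE output (Eqn. (4), scalar)
        e = symbols[n] - q                              # Eqn. (6) with B(a) = a
        for i in range(taps):                           # Eqn. (7)
            c[i] += beta * e * window[i]
    return c

random.seed(0)
a = [random.choice((0, 1)) for _ in range(5000)]
# Assumed toy dispersive channel: r_n = 0.5*a_n + 0.2*a_{n-1}.
r = [0.5 * a[n] + (0.2 * a[n - 1] if n else 0.0) for n in range(len(a))]
c = lms_ffe(a, r)
# After adaptation the equalized output tracks the transmitted symbols.
q_err = [abs(a[n] - sum(ci * ri for ci, ri in zip(c, r[n - 2 : n + 1][::-1])))
         for n in range(2, len(a))]
assert sum(q_err) / len(q_err) < 0.15
```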
(65) The parallel implementation of the FFE 360 is closely related to the MIMO structure. From the MIMO representation, the FFE 360 can be expanded as a convolution matrix as follows:
(66)
where L_f is the number of taps used. Then the output samples are computed as:
q_n = C[r_{nM} r_{nM−1} . . . r_{(n−1)M−L_f+2}]^T (10)
(67) The parallel implementation of the FFE 360 can be represented by M FIR filters, which is precisely what Eqn. (10) represents. In the presence of mismatches in the AFE, the coefficients in different rows of Eqn. (9) are different. This effectively allows different equalization to be applied to each of the interleaved channels (although the equalization can be applied after the interleaved channels have been recombined). The MIMO structure of the Viterbi decoder 370 is also essentially identical to the parallel processing realization. The only modification is that branch metrics associated with different components of the input vector q_n are computed using different components of the channel estimator function B, which is not the case in a traditional parallel implementation. Although in Eqns. (9) and (10) the implicit assumption is made that the DSP parallelization factor equals the dimension of the MIMO channel, in practice this constraint is not required.
(70) In the 10G example, a 25-tap, 16-parallel FIR is used. Recall that the incoming 10G signal is decimated into eight 1.25G signals, but that the ADC channel processing each of these signals uses two ADC pipelines, one in operation while the other is in calibration. Therefore, there are eight ADC pipelines active at any given time. Each of the eight ADC channels is demultiplexed by a factor of two by the retimer 237 to allow a parallelization factor of 16 in the DSP 150. This is done to reduce the clock rate of the DSP 150. Different parallelization factors can be used in alternate embodiments. In this example, because there are only 8 independent ADC channels, the number of independent equalizers need only be 8, not 16. Therefore, each set of coefficients of the equalizer is shared by two channels of the MIMO equalizer.
(71) The basic architecture shown in
(72) In the 10G example, the SBVD 370 can be user programmed for either 4 states or 8 states. The channel estimator 380 is implemented using a 16-term Volterra series expansion and therefore uses either 8 terms or 16 terms, depending on the number of states for the SBVD 370. The coefficient of the linear term corresponding to the most recently received bit is forced to 1 to fix the scaling factor for the channel estimator 380. In this implementation, the constant term is forced to 0 to avoid competition with other modules that remove baseline wander from the signal. In another embodiment, the constant term is actually adapted, therefore performing baseline wander compensation without the need for other baseline wander compensation modules. Therefore, the number of adaptive terms is 6 for a 4-state decoder and 14 for an 8-state decoder. Both the channel estimator 380 and the SBVD 370 are multi-channel in the sense that, similar to the FFE 360, they are parallelized to support separate equalization of each of the 8 ADC pipelines. Taken to the extreme, there effectively are independent parts of the channel estimator 380 and SBVD 370 for each of the 8 ADC pipelines.
(73) In another aspect, the FFE 360 and channel estimator 380 can be adapted on a sub-sampled basis. Let R be the parallelization factor of the interleaved ADC 130 and M be the parallelization factor of the DSP 150. M may be different from R. In the 10G examples, the baseline values are R=8 and M=16.
(74) Referring to Eqns. (6)-(8) above, the LMS update algorithm for the FFE 360 can be written as
c(n+1,k) = c(n,k) − μe(n)x(n−k) (11)
where k is an index that identifies the equalizer coefficients, n represents time, μ is the adaptation step size, e is the slicer error, and x is the input signal. Let
n = mM + p with (0 ≤ p < M) (12)
Then the update algorithm Eqn. (11) can be written
c(m+1,p,k) = c(m,p,k) − μe(m,p)x(mM+p−k) (13)
(75) If the same coefficients are used to equalize all ADC channels (i.e., if the multi-channel equalizer is not used), then the dependence of the coefficients on p can be dropped. The update term preferably should be summed over all ADC channels to average out the effect of sampling phase errors. In this case, Eqn. (13) reduces to
(76) c(m+1,k) = c(m,k) − μ Σ_{p=0}^{M−1} e(m,p)x(mM+p−k) (14)
If the coefficients used to equalize different ADC channels are all independent, then update Eqn. (13) could be used. However, the speed of update can be improved by adding an update component similar to the one computed for the case of common coefficients, for example
(77) c(m+1,p,k) = c(m,p,k) − μ_c Σ_{p′=0}^{M−1} e(m,p′)x(mM+p′−k) − μ_d e(m,p)x(mM+p−k) (15)
In this approach, the channel-dependent update is broken into two terms: one that represents an average update for all channels (the μ_c term) and one that represents each channel's deviation from the average update (the μ_d term). For μ_d = 0, Eqn. (15) reduces to the case of common coefficients, Eqn. (14). For μ_c = 0, it reduces to the case of entirely independent coefficients, Eqn. (13).
(78) However, note that update Eqn. (15) is not subsampled. The values of the error at all times n = mM + p are used to update the coefficients. The implementation of this approach would require relatively complex parallel processing. To reduce complexity and power dissipation, it is desirable to subsample the adaptation, in other words, to adapt the coefficients without using all samples of the error. Note that the subsampling may be different for the μ_c and μ_d terms of the update equation.
(79) Let the subsampling factors for the μ_c and μ_d terms of Eqn. (15) be M_c = rM and M_d = sM, respectively, where r and s are integers greater than or equal to 1. This means that both M_c and M_d are greater than or equal to M, which avoids the need for parallel processing. Typically, r and s will be powers of 2. Now let z be the least common multiple of r and s. The time index n can then be written as
n = izM + w where (0 ≤ w < zM)  (16)
Substituting this into Eqn. (15) yields the subsampled update algorithm
(80)
(81) As an example, consider the case of M_c = 64, M_d = 64, M = 16, r = 4, s = 4 and z = 4. In this case, coefficients applied to different ADC channels can be different. Although the coefficients are updated every 1024 cycles of the baud clock, the subsampling factor of the common update term is only 64, because each update incorporates the contributions of 16 error samples. The subsampling factor of the independent terms is 64×16 = 1024. The processor that computes the common updates runs at 1/4 of the clock rate of the DSP. The processor that computes the independent updates is shared by all interleaves.
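The bookkeeping in this example can be checked with a few lines (a sketch covering only the arithmetic of the paragraph above; the variable names are illustrative):

```python
from math import lcm

# Parameters of the example: M = 16 interleaves, subsampling ratios r = s = 4.
M, r, s = 16, 4, 4
M_c, M_d = r * M, s * M          # subsampling factors of the two update terms
z = lcm(r, s)                    # super-block factor in n = i*z*M + w
# Each common update uses M = 16 error samples, and the independent-term
# processor is shared by all M interleaves, so each channel's independent
# update is refreshed only every M_d * M baud cycles.
independent_period = M_d * M
```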
(83) The channel estimator 380 can be updated in a similar fashion. In the 10G example, as described above, the channel estimator 380 is implemented using a 16-term Volterra series expansion. The constant term is forced to zero to avoid competition with other circuitry that compensates for baseline wander. The coefficient of the linear term corresponding to the most recently received bit is forced to 1 to fix the scaling factor for the channel estimator 380. Alternatively, the coefficient of the oldest bit could be set to 1, to force the channel estimator 380 to train to an anticausal response, which may be advantageous for some channels.
(84) The adaptation algorithm described above for the FFE 360 is also used for the channel estimator 380. Eqn. (17) can be used for the channel estimator 380, except that the sign of the two terms involving the error is plus, and the signal is replaced by decisions and products of decisions corresponding to the terms of the Volterra series expansion.
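For illustration, a toy Volterra-series channel estimate along these lines might look as follows (the term count, memory depth, and coefficient containers are invented for the example; the actual 10G design uses the 16-term expansion described above):

```python
from itertools import combinations

def volterra_output(bits, lin, quad):
    """Toy Volterra channel estimate over a short bit history.

    bits[0] is the most recent bit; its linear coefficient is fixed to 1
    to set the estimator's scaling, and the constant term is fixed to 0
    so it does not compete with baseline-wander compensation.
    """
    y = bits[0]  # linear coefficient of the most recent bit forced to 1
    y += sum(lin[i] * bits[i] for i in range(1, len(bits)))
    # Second-order terms: products of bit pairs with their coefficients.
    y += sum(quad[(i, j)] * bits[i] * bits[j]
             for i, j in combinations(range(len(bits)), 2))
    return y
```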
(85) The above examples were based on MLSE and LMS, but other techniques can also be used. The multi-channel equalizer can be built around detectors other than MLSE. Other common equalization techniques include feedforward equalization and decision feedback equalization.
(86) As an example of one variation, the equalizer 350 in
(87) Mathematically, let a_n be the transmitted bit at time instant n. The signal at the DSP input is given by
y_n = s_n + r_n + z_n  (18)
where s_n = f(a_n, a_{n−1}, …, a_{n−d+1}) is the noise-free signal component, r_n is the ASE noise in the electrical domain, and z_n is the thermal Gaussian noise. Note that s_n is a nonlinear function of d consecutive bits. An MLSE detector approach chooses the sequence that minimizes the cumulative metric defined by
(88) M = Σ_n [T_s(y_n) − T_s(s_n)]² / σ_s  (19)
where T_s(·) is a given signal-dependent nonlinear transformation, and σ_s is the conditional second-order central moment of the random variable T_s(y_n). In SMF channels with ASE and Gaussian noise, T_s(y) is well approximated by y^{v_s} with 0 < v_s ≤ 1 and y ≥ 0. For example, v_s = 0.5 for all s for ASE noise, and v_s = 1 for all s for Gaussian noise. For combined noise, typically 0.4 ≤ v_s < 1 and v_s typically is different for each noise-free level of s. In the presence of combined ASE and Gaussian noise (i.e., a practical situation), the implementation of the exact signal-dependent nonlinear transformation T_s(·) in the receiver architecture can be complex. It is advantageous to use simpler approximations in order to reduce receiver complexity.
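A minimal sketch of such a signal-dependent metric, assuming per-level tables for v_s and σ_s (the table names and values here are illustrative, not taken from the patent):

```python
def branch_metric(y, s, v_by_level, var_by_level):
    """Signal-dependent branch metric [T_s(y) - T_s(s)]**2 / sigma_s,
    with the power-law transformation T_s(y) = y**v_s."""
    v = v_by_level[s]
    return (y ** v - s ** v) ** 2 / var_by_level[s]

def path_metric(samples, levels, v_by_level, var_by_level):
    # Cumulative metric for one candidate sequence of noise-free levels;
    # an MLSE detector would minimize this over all candidate sequences.
    return sum(branch_metric(y, s, v_by_level, var_by_level)
               for y, s in zip(samples, levels))
```

With v_s = 1 and unit variances for all levels, this degenerates to the ordinary Euclidean metric of a Gaussian channel.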
(90) In the presence of only ASE noise (i.e., ASE mode), the random variable y^{0.5} is approximately Gaussian with the same variance for all the noise-free levels s. This way, if the samples from the ADC are first processed by the SQRT transformation 1830, the rest of the receiver (FFE, MLSE, Gaussian metrics, etc.) can be designed for normal operation (i.e., assuming a Gaussian channel). Note that this approach involves implementing the square root transformation of the received samples before the FFE. It could be implemented by using the ADC calibration tables, but different track-and-hold offsets might degrade the accuracy, because these offsets would give rise to a different nonlinear transformation in each ADC. Therefore, in the example of
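The variance-equalizing effect of the square-root transformation can be illustrated numerically. The sketch below uses a Gaussian surrogate whose variance grows linearly with the signal level (mimicking the signal-ASE beat behavior); the levels and noise scale are arbitrary choices for the demonstration, not values from the patent:

```python
import random

def variance_after_sqrt(level, noise_per_level=0.04, n=100000, seed=7):
    """Empirical variance of sqrt(y) when y ~ level + N(0, noise_per_level*level)."""
    rng = random.Random(seed)
    # Clip at zero before the square root, as a real implementation would.
    xs = [max(level + rng.gauss(0.0, (noise_per_level * level) ** 0.5), 0.0) ** 0.5
          for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n
```

By the small-noise approximation, Var[sqrt(y)] ≈ Var[y]/(4·level), which is the same for every level when Var[y] is proportional to the level.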
(91) Furthermore, in the example of
(92) In a simplified approach, the SQRT function is approximated based on
(93)
where α_0 and α_1 are proper constants. For channels with combined noise, α_1/α_0 varies between [0.5, 1]. Thus, further simplification can be achieved
α_1 = 0.5 + 2^{−N_1}
α_0 = 1.0 + 2^{−N_0}
This approach can achieve significant gains with low complexity, and without the use of external information. The approach could also be used with external BER information by performing an exhaustive search over the parameters N_0 and N_1. Internal adaptation is also possible.
(94) An alternate approach uses the samples at the output of the FFE and functions with combined noise. This scheme is based on the cumulative metric given by
(95) M = Σ_n [ŷ_n^{v_s} − ŝ_n^{v_s}]² / σ_s  (22)
where the hat denotes samples at the FFE output. To reduce complexity, a single value v of v_s is used. One criterion for selecting this value is that the variances σ_s be approximately the same for all s. The cumulative metric then reduces to
M = Σ_n [ŷ_n^v − ŝ_n^v]²  (23)
(96) The value v may be estimated as follows. Let M_{2,s} be the conditional second-order central moment of the samples before transformation. Then, it can be shown that σ_s ≈ (v_s)² s^{2(v_s−1)} M_{2,s}. Therefore, a solution is given by
(97) v = 1 − log(M_{2,1}/M_{2,0}) / (2 log(S_1/S_0))  (24)
where S_0 = f(0, 0, …, 0), S_1 = f(1, 1, …, 1), and M_{2,0} and M_{2,1} are the respective conditional second-order central moments of the noise. Note that v = 1 for Gaussian channels (M_{2,1} = M_{2,0}). For ASE noise, it is possible to verify that M_{2,1}/M_{2,0} ≈ S_1/S_0 and v ≈ 0.5. Note that this approach can address combined ASE and Gaussian noise.
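A direct transcription of Eqn. (24) as a sketch (function and argument names are not from the patent):

```python
import math

def estimate_v(S0, S1, M20, M21):
    """Exponent v that equalizes the post-transformation variances at
    levels S0 and S1, using sigma_s ~ v**2 * s**(2*(v-1)) * M_{2,s}."""
    return 1.0 - math.log(M21 / M20) / (2.0 * math.log(S1 / S0))
```

The two limiting cases quoted in the text fall out directly: equal moments (Gaussian) give v = 1, and a moment ratio equal to the level ratio (ASE-like) gives v = 0.5.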
(98) Unlike the SQRT approach shown in
(99) In one implementation, the transformation is realized as a lookup table (LUT). If the interval of the samples is known, e.g., [−y_min, +y_max] (with y_min, y_max > 0), the content of the LUT is generated as follows
LUT(ŷ_n) = (ŷ_n + y_min)^v, ŷ_n > −y_min  (25)
For automatic adaptation of the parameters v and y_min, the variances at two signal levels can be computed
(100)
where g.sub.0 and g.sub.1 are the minimum and maximum values of the channel estimator, respectively.
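A sketch of the LUT generation of Eqn. (25), assuming a uniform quantization step for the table index (the step size and dictionary representation are illustrative, not the hardware table format):

```python
def build_lut(v, y_min, y_max, step):
    """Table of (y + y_min)**v for y on a uniform grid over [-y_min, +y_max].

    The offset y_min shifts the input to the non-negative domain before
    applying the power-law nonlinearity y**v.
    """
    lut = {}
    n_steps = int(round((y_max + y_min) / step))
    for i in range(n_steps + 1):
        y = -y_min + i * step
        lut[round(y, 6)] = (y + y_min) ** v
    return lut
```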
(101) In one approach, the complexity of computing v may be reduced by using an approximation based on M_{2,0} and M_{2,1}. For example, for the ASE-limited and Gaussian-limited cases, the approximation might be
v = 0.5 if M_{2,1}/M_{2,0} > K_{0.5}
v = 1.0 if M_{2,1}/M_{2,0} < K_{0.5}  (27)
where K_{0.5} is a programmable threshold level (e.g., K_{0.5} = 1.5). One possible implementation uses a LUT-based approach similar to the one described for
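The threshold test of Eqn. (27) is a one-liner (the default K_0.5 = 1.5 is taken from the example above):

```python
def select_v(M20, M21, K05=1.5):
    """Pick the exponent from the conditional-moment ratio: ASE-like noise
    (ratio above threshold) gets v = 0.5, Gaussian-like gets v = 1.0."""
    return 0.5 if M21 / M20 > K05 else 1.0
```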
(102) In yet another approach, the nonlinear function y.sup.v can be implemented by using the approximation
y^v ≈ K^{−1} y^{v_ref} + (v − v_ref) y  (28)
where y^{v_ref} is a tabulated reference. For the case v_ref = 0.5, the nonlinear function reduces to
y^v → K^{−1} y^{0.5} + (v − 0.5) y  (29)
where K is a given constant (e.g., K ≈ 0.85).
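A sketch of the Eqn. (29) approximation (K = 0.85 as the default, per the text; the function name is illustrative):

```python
def approx_pow(y, v, K=0.85):
    """Approximate y**v around v_ref = 0.5 as (1/K) * y**0.5 + (v - 0.5) * y,
    using a tabulated square root plus a linear correction in v."""
    return (y ** 0.5) / K + (v - 0.5) * y
```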
(103) As yet another example, in a nondispersive SMF channel (e.g., a back-to-back test), the optimal detector reduces to a comparator with a proper threshold (offset). Thus, the optimal solution for a nondispersive SMF channel is an MLSE with a shift of the baseline. On nondispersive channels, the threshold can be analytically approximated from the parameter v for equal noise power as follows
thr ≈ [0.5((S_0)^v + (S_1)^v)]^{1/v}, S_0, S_1 > 0  (30)
where S_0 and S_1 are the noise-free signal levels. The parameter v may be estimated as described above.
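Eqn. (30) as code (a sketch):

```python
def mlse_threshold(S0, S1, v):
    """Comparator threshold [0.5*(S0**v + S1**v)]**(1/v) for a
    nondispersive channel: v = 1 gives the midpoint of the two levels,
    while v < 1 shifts the threshold toward the lower level, matching
    the asymmetric (signal-dependent) noise."""
    return (0.5 * (S0 ** v + S1 ** v)) ** (1.0 / v)
```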
(104) Referring again to
(105) The timing recovery circuitry 340 operates as follows. The signal from the ADC may have a non-zero offset. The offset compensation 1310 is circuitry that tracks this baseline wander and removes (or reduces) it. The timing phase corrector 1320 introduces a controlled amount of ISI by using a filter with z-transform F(z) = 1 − αz^{−1}, where |α| << 1 is adjusted dynamically to minimize the error signal from the multi-channel equalizer 350. The phase detector 1330 is based on a modified Mueller and Muller algorithm, using pseudo-decisions derived directly from the input signal before equalization as shown in
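The phase corrector's filter F(z) = 1 − αz⁻¹ is a two-tap FIR; a direct sample-by-sample sketch (initial filter state assumed zero):

```python
def phase_corrector(samples, alpha):
    """Apply F(z) = 1 - alpha * z**-1: each output is the current sample
    minus alpha times the previous one, injecting a small, controlled
    amount of ISI that is tuned via alpha."""
    out, prev = [], 0.0
    for y in samples:
        out.append(y - alpha * prev)
        prev = y
    return out
```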
(106) As shown in
(109) In one approach, the timing recovery circuitry 340 is implemented in a parallel manner. Rather than processing one serial stream at 10G, the incoming data is decimated into eight parallel streams of 1.25G each. This allows the clocks (e.g., for the phase detector 1330) to run at the 1.25G rate (actually a 1.288 GHz clock) rather than at a 10G rate.
(111) The quantizer 1530 receives the output of the peak detector 1520. It compares the output to a reference value and generates a 1 or a 0 depending on whether the output is greater than or less than the reference. This 1/0 signal is used to adjust the gain of the PGA 120. In one approach, the AGC is divided into a coarse gain and a fine gain. The 1/0 signal is used initially to set the coarse gain and then used on a continuous basis to set the fine gain via counter 1540.
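One way the comparator-plus-counter fine-gain loop could be sketched (the counter range and unit step are assumptions for illustration, not the actual PGA interface):

```python
def agc_step(peak, reference, fine_count, lo=0, hi=255):
    """One AGC iteration: compare the detected peak against the reference,
    then step the fine-gain counter down if the signal is too large and
    up if it is too small, saturating at the counter limits."""
    bit = 1 if peak > reference else 0
    fine_count += -1 if bit else 1
    return bit, max(lo, min(hi, fine_count))
```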
(113) The best delay search 1640 converges the multi-channel equalizer 350 using the available cursor delays in the FFE 360. In an alternate embodiment, the best delay search 1640 converges the equalizer 350 using all the available delays in the FFE 360 and all the available delays in the linear part of the channel estimator 380. For each convergence, the mean squared error (MSE) obtained from the error signal (Eqn. 6) is stored. Once all available delays have been swept, the best delay search 1640 selects the delays which yielded the minimum MSE. The multi-channel equalizer 350 is then converged 1660 using the cursor delays determined by the delay search 1640. After that, the chip operates in a normal mode.
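The search loop itself is simple; in the sketch below, `converge` stands in for a full equalizer convergence that returns the resulting MSE (a hypothetical callable, not the patent's circuitry):

```python
def best_delay_search(delays, converge):
    """Converge once per candidate cursor delay, record the MSE for each,
    and return the delay that yielded the minimum."""
    mse = {d: converge(d) for d in delays}   # delay -> converged MSE
    return min(mse, key=mse.get)
```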
(114) The approach described has many advantages. For example, many of the functions have been chosen to allow maximum implementation on a DSP chip 150. The 10G example results in an all-DSP (other than the analog front end) electronic dispersion compensation receiver for the 10GBASE-LRM application. The functions shown in DSP 150 of
(115) The examples described above generally concern the receiver. However, in many 10G and other applications, the communication links are bidirectional and the receiver and transmitter at each end of the link are housed in a single transceiver module. In some applications, these modules are fixed to a host circuit board, and in other applications they are pluggable modules that can be inserted into and removed from a cage (or socket) that is fixed to the host circuit card. Multi-Source Agreements (MSAs) have been developed to achieve some degree of interoperability between modules from different manufacturers. Example MSAs include XFP and SFP+, in which the 10 Gbps electrical I/O interface to the host is serial, and X2, XPAK, and XENPAK, in which the 10 Gbps electrical interface to the host is parallelized to four lanes in each direction. The receivers described above are well suited for inclusion in these types of transceiver modules.
(117) Although the detailed description contains many specifics, these should not be construed as limiting the scope of the invention but merely as illustrating different examples and aspects of the invention. It should be appreciated that the scope of the invention includes other embodiments not discussed in detail above. For example, the functionality has been described above as implemented primarily in electronic circuitry. This is not required; various functions can be performed by hardware, firmware, software, and/or combinations thereof. Depending on the form of the implementation, the coupling between different blocks may also take different forms. Dedicated circuitry can be coupled to each other by hardwiring or by accessing a common register or memory location, for example. Software coupling can occur in any number of ways to pass information between software components (or between software and hardware, if that is the case). The term "coupling" is meant to include all of these and is not meant to be limited to a hardwired permanent connection between two components. In addition, there may be intervening elements. For example, when two elements are described as being coupled to each other, this does not imply that the elements are directly coupled to each other nor does it preclude the use of other elements between the two. Various other modifications, changes and variations which will be apparent to those skilled in the art may be made in the arrangement, operation and details of the method and apparatus of the present invention disclosed herein without departing from the spirit and scope of the invention as defined in the appended claims. Therefore, the scope of the invention should be determined by the appended claims and their legal equivalents.