Non-linear neural network equalizer for high-speed data channel
12284059 · 2025-04-22
Abstract
A data channel on an integrated circuit device includes a non-linear equalizer having as inputs digitized samples of signals on the data channel, decoding circuitry configured to determine from outputs of the non-linear equalizer a respective value of each of the signals, and adaptation circuitry configured to adapt parameters of the non-linear equalizer based on respective ones of the value. The non-linear equalizer includes a non-linear filter portion, and a front-end filter portion configured to reduce numbers of the inputs from the digitized samples. The non-linear equalizer may be a neural network equalizer, such as a multi-layer perceptron neural network equalizer, a reduced complexity multi-layer perceptron neural network equalizer, or a radial-basis function neural network equalizer. Alternatively, the non-linear equalizer may include a linear filter and a non-linear activation function, which may be a hyperbolic tangent function.
Claims
1. A data channel on an integrated circuit device, the data channel comprising: a non-linear equalizer having as inputs digitized samples of signals on the data channel; decoding circuitry configured to determine from outputs of the non-linear equalizer a respective value of each of the signals; and adaptation circuitry configured to adapt parameters of the non-linear equalizer based on respective ones of the value; wherein: the non-linear equalizer includes: a front-end filter portion configured to combine some of the inputs from the digitized samples, to provide a reduced number of inputs; and a non-linear filter portion configured to operate on the reduced number of inputs.
2. The data channel of claim 1 wherein the non-linear equalizer is a neural network equalizer.
3. The data channel of claim 2 wherein the neural network equalizer is a multi-layer perceptron neural network equalizer.
4. The data channel of claim 3 wherein the multi-layer perceptron neural network equalizer is a reduced complexity multi-layer perceptron neural network equalizer.
5. The data channel of claim 2 wherein the neural network equalizer is a radial-basis function neural network equalizer.
6. The data channel of claim 1 wherein the non-linear equalizer comprises a linear filter and a non-linear activation function.
7. The data channel of claim 6 wherein the non-linear activation function is a hyperbolic tangent function.
8. The data channel of claim 1 wherein the adaptation circuitry adapts parameters of the non-linear equalizer based on cross-entropy.
9. The data channel of claim 1 wherein the front-end filter portion comprises a finite-impulse-response filter.
10. The data channel of claim 1 further comprising scalable bypass circuitry for controllably outputting output of the front-end filter portion as at least a portion of output of the non-linear equalizer.
11. A method for detecting data on a data channel on an integrated circuit device, the method comprising: performing non-linear equalization of digitized samples of input signals on the data channel; determining from output signals of the non-linear equalization a respective value of each of the output signals; and adapting parameters of the non-linear equalization based on respective ones of the value; wherein: performing non-linear equalization of digitized samples of input signals on the data channel includes: performing front-end filtering to combine some of the inputs from the digitized samples, to provide a reduced number of inputs; and performing non-linear filtering on the reduced number of inputs from the digitized samples.
12. The method of claim 11 wherein performing the non-linear equalization comprises performing neural network equalization.
13. The method of claim 12 wherein performing the neural network equalization comprises applying a multi-layer perceptron neural network equalizer.
14. The method of claim 13 wherein performing the neural network equalization comprises applying a reduced complexity multi-layer perceptron neural network equalizer.
15. The method of claim 12 wherein performing the neural network equalization comprises applying a radial-basis function neural network equalizer.
16. The method of claim 11 wherein performing the non-linear equalization comprises applying a non-linear activation function and performing linear filtering on output of the non-linear activation function.
17. The method of claim 16 wherein applying the non-linear activation function comprises applying a hyperbolic tangent function.
18. The method of claim 11 wherein adapting the parameters of the non-linear equalization comprises adapting the parameters of the non-linear equalization based on cross-entropy.
19. The method of claim 11 wherein the performing front-end filtering comprises performing finite-impulse-response filtering.
20. The method of claim 11 further comprising scalably bypassing the non-linear filtering for controllably outputting an output of the front-end filtering as at least a portion of output of the non-linear equalization.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Further features of the disclosure, its nature and various advantages, will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION
(15) As noted above, integrated circuit devices may include high-speed SERDES links between various device components. Typical SERDES links may suffer from significant non-linearity or channel impairment in the signal path, as a result of, e.g., insertion loss, inter-symbol-interference (ISI), and, in an optical system, non-linearities such as dispersion loss or, in a copper (i.e., wired) system, cross-talk, jitter, etc. Various forms of linear equalization typically are used, at the receiver end of such links, to attempt to deal with such channel impairments.
(16) Similarly, in magnetic recording, reading and writing are performed by a head of a hard disk drive that moves relative to the surface of a storage medium and writes data to, or reads data from, circular data tracks on a magnetic disk. In order to increase recording densities, it is desirable to shrink the bit cell, or area of disk surface in which a single bit is recorded. Shrinking the bit cell, however, increases inter-symbol interference (ISI) from data recorded on the media, thereby increasing the bit error rate (BER) and decreasing the reliability of read-back data. An increased BER also effectively reduces the rate at which data can be read back, owing to the overhead inherent in error-detection or error-correction techniques employed to compensate for the increased BER, and/or owing to repeat read-back attempts that may be necessary in order to accurately read-back data after erroneous data read-back attempts.
(17) Two-dimensional magnetic recording (TDMR) is another technique that has been developed in an effort to increase storage capacity in hard disk drives. TDMR employs a read-back technique that allows for greater storage capacity by combining signals simultaneously obtained from multiple read-back heads to enhance the accuracy of reading back data from one or more data tracks. A TDMR read-back channel typically includes a linear equalizer to mitigate the negative impact that noise has on the read-back channel signal integrity, and on the accuracy and reliability in reading back digital data values from the storage medium. Some TDMR read-back channels utilize minimum-mean-squared-error (MMSE) as a cost function to adapt the equalizer to further improve BER performance.
(18) However, linear equalization may not be sufficient to compensate for such non-linearities or interference. Linear equalization may not be enough to correctly assign received samples near the threshold between levels to the correct written bit or symbol when the signal-to-noise ratio is low.
(19) In accordance with implementations of the subject matter of this disclosure, non-linear equalization is used to compensate for non-linearities in a high-speed data channel, such as a SERDES channel, or a disk-drive read channel, thereby reducing the bit-error rate (BER). In different implementations, different types of non-linear equalizers may be used.
(20) Conceptually, a linear equalizer performs the separation of samples for assignment to one level or another by effectively drawing a straight line between groups of samples plotted in a two-dimensional (e.g., (x,y)) space. In channels that are insufficiently linear, or where the levels are too close together, there may not be a straight line that can be drawn between samples from different levels on such a plot. A non-linear equalizer effectively re-maps the samples into a different space in which the samples from different levels may be separated by a straight line or other smooth curve.
(21) A non-linear equalizer in accordance with implementations of the subject matter of this disclosure may be more or less complex. For example, a non-linear equalizer may have more or fewer variables, or taps, with complexity being proportional to the number of variables. In addition, a non-linear equalizer that operates at the bit level (i.e., operates separately on the bits of each symbol, e.g., two bits/symbol for four-level signaling in a data channel, rather than on the symbol as a whole) may be less complex than a non-linear equalizer that operates at the symbol level. Either way, greater complexity yields greater performance when all other considerations are equal. However, greater complexity also may require greater device area and/or power consumption.
(22) Types of non-linear equalizers that may be used in accordance with the subject matter of this disclosure may include forms of reduced-complexity multi-layer perceptron neural network (RC-MLPNN) equalizers and forms of reduced-complexity radial-basis-function neural network (RC-RBFNN) equalizers.
(23) Performance of the non-linear equalizer may be affected by the cost function used for adaptation of the equalizer. Implementations of the subject matter of this disclosure reduce the complexity of a non-linear equalizer whether it uses any one of various different cost functions for adaptation, including either a minimum mean-square error (MMSE or MSE) cost function, or a cross-entropy (CE)-based cost function. A CE-based cost function may yield a better result than an MMSE cost function, but a CE-based cost function is more complex than an MMSE cost function.
(24) According to implementations of the subject matter of this disclosure, a non-linear equalizer with reduced complexity but comparable performance is provided by appending, to a non-linear neural network equalizer, a front-end filter to reduce complexity of the inputs to the non-linear equalizer. For example, a finite-impulse-response (FIR) filter may be used as a front end to reduce complexity of the non-linear equalizer by reducing the number of input parameters or dimensions.
(25) The subject matter of this disclosure may be better understood by reference to
(26)
(27) In TDMR detector channel 100, respective analog-to-digital converter (ADC) outputs 111, 121 from two separate read heads (not shown) are input to an equalizer 101 for equalization according to implementations of the subject matter of this disclosure. Output Y of equalizer 101 is then passed to Viterbi detector 102, the output of which is then passed to a soft-output Viterbi algorithm (SOVA) detector 103. SOVA detector 103 provides log-likelihood ratios (LLRs) 113 to a decoder, such as a low-density parity check (LDPC) decoder 104 which decodes the bits. LLRs 113 also are fed back to equalizer 101 which is adapted using either a mean-square error (MSE) cost function 151 or a cross-entropy cost function 161 which compares the LLRs 113 to output bits 114 (which are expressed as non-return-to-zero data 131), to set a target 141.
(28) The purpose of implementing equalization on channel 100 is to correct for various sources of interference referred to above and thereby effectively move samples that are on the wrong side of a detection threshold (whether for bits read from a storage medium or signal levels in a SERDES channel) to the correct side of the threshold. Linear equalization effectively takes a plot of the samples in a two-dimensional (x,y) space and draws a straight line through the samples where the threshold ought to be. However, in a channel with non-linearities, there may be no straight line that can be drawn on that two-dimensional plot that would correctly separate the samples. In such a case, non-linear equalization can be used. A non-linear equalization function may effectively remap the samples into a different space in which there does exist a straight line that correctly separates the samples.
(29) Alternatively, the non-linear equalization function may remap the samples into a space in which there exists some smooth curve other than a straight line that correctly separates the samples. For example, non-linear equalization using a radial-basis function may remap the samples into a polar-coordinate, or radial, space in which the samples are grouped into circular or annular bands that can be separated by circles or ellipses.
(30) The advantage of non-linear equalization over linear equalization in a non-linear channel may be seen in a simplified illustration as shown in
(31) However, a radial-basis function
(32) $$\varphi(r_i) = e^{-r_i^2}, \qquad r_i = \lVert x - c_i \rVert$$
where $c_i$ is the centroid of the i-th node, can be used to transform the XOR function from the linear Cartesian $(x_1, x_2)$ space to a non-linear radial $(\varphi(r_1), \varphi(r_2))$ space as follows:
(33)
x1    x2    φ(r1)     φ(r2)     y
0     0     0.1353    1         0
0     1     0.3678    0.3678    1
1     0     0.3678    0.3678    1
1     1     1         0.1353    0
which is diagrammed in
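The tabulated values can be reproduced with a short Python sketch. It assumes a Gaussian radial-basis function of the form exp(-||x - c||^2) with centroids c1 = (1, 1) and c2 = (0, 0); neither the exact form nor the centroids are stated in the text above, and both are inferred here from the table. The names `rbf`, `transform` and `xor_from_radial`, and the separating threshold, are illustrative only.

```python
import math

# Assumed centroids, inferred from the tabulated values (not given in the text).
C1 = (1.0, 1.0)
C2 = (0.0, 0.0)


def rbf(x, c):
    """Assumed Gaussian radial-basis function exp(-||x - c||^2)."""
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def transform(x):
    """Map a Cartesian point (x1, x2) to the radial space (phi(r1), phi(r2))."""
    return rbf(x, C1), rbf(x, C2)


def xor_from_radial(p1, p2, threshold=1.0):
    """In the radial space, the straight line p1 + p2 = threshold separates XOR."""
    return 1 if p1 + p2 < threshold else 0


for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    phi1, phi2 = transform(point)
    print(point, round(phi1, 4), round(phi2, 4), xor_from_radial(phi1, phi2))
```

Rounded to four places, the two middle rows evaluate to 0.3679, which the table lists as the truncated value 0.3678. The point of the sketch is that the four XOR points, which no straight line separates in the (x1, x2) plane, fall on opposite sides of a single straight line after the remapping.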
(34) As discussed below, various types of non-linear equalizers are available. Whatever type of non-linear equalizer is used may be adaptive to account for changing channel conditions. Various forms of adaptation may be used.
(35) One type of adaptation function that may be used is minimum mean-squared error (MMSE), where the mean-squared error (MSE) is defined as the square of the norm of the difference between the equalized signal (Y) and the ideal signal (Ŷ). The equalizer may initially be adapted in a training mode in which the ideal signal values are available. Later, during run-time operation, the detected output values of the equalized channel should be close enough to the ideal values to be used for adaptation.
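As an illustration of MSE-driven adaptation in training mode, the following Python sketch adapts the taps of a small FIR equalizer with a least-mean-squares (LMS) stochastic-gradient step. LMS is used here as a common, simple way to descend the squared error; the text does not state which update rule is used. The function names, the step size `mu` and the training data are hypothetical.

```python
def fir_output(taps, window):
    """Equalized sample Y: dot product of tap coefficients and delay-line samples."""
    return sum(w * s for w, s in zip(taps, window))


def mse(equalized, ideal):
    """Mean of the squared differences between equalized and ideal samples."""
    return sum((y - t) ** 2 for y, t in zip(equalized, ideal)) / len(ideal)


def lms_step(taps, window, ideal, mu=0.05):
    """One LMS update; the factor 2 of the squared-error gradient is folded into mu."""
    err = fir_output(taps, window) - ideal
    return [w - mu * err * s for w, s in zip(taps, window)]


def train(windows, ideals, n_taps, epochs=200, mu=0.05):
    """Training mode: the ideal values are known and drive the adaptation."""
    taps = [0.0] * n_taps
    for _ in range(epochs):
        for window, ideal in zip(windows, ideals):
            taps = lms_step(taps, window, ideal, mu)
    return taps
```

For example, training on windows whose ideal outputs follow 0.5*s0 - 0.25*s1 drives the taps toward (0.5, -0.25). In run-time operation the same step could be driven by detected values in place of the ideal ones, as the paragraph above describes.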
(36) Another type of adaptation function that may be used is the cross-entropy (CE) between a training bit and its log-likelihood ratio (LLR). In particular, cost function circuitry may be configured to compute a cross-entropy value indicative of a difference between a probability distribution of the detected bit value (which is a function of the LLR signal) and a probability distribution of the training bit value. The cost function circuitry then adapts the equalizer by setting an equalizer parameter (e.g., one or more coefficients of filter taps of the equalizer) to a value that corresponds to a minimum cross-entropy value from among the computed cross-entropy value and one or more previously computed cross-entropy values, to decrease a bit-error rate for the channel. As in the case of MSE equalization, the equalizer may initially be adapted in a training mode in which the ideal signal values are available. Later, during run-time operation, the detected output values of the equalized channel should be close enough to the ideal values to be used for adaptation. Specifically, if any forward error correction code (FEC) decoder (e.g., a Reed Solomon (RS) decoder or Low-Density Parity Check (LDPC) decoder) is available after the equalizer, then successfully decoded frames from the FEC decoder output may be used for adaptation.
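(37) LLR may be defined as the relationship between the probability ($P_0$) of a bit being 0 and the probability ($P_1$) of a bit being 1:
(38) $$\mathrm{LLR} = \log\frac{P_1}{P_0}$$
The cross-entropy between a training bit and its LLR may be computed as follows:
(39) $$\mathrm{CE} = -\mathbb{1}(\mathrm{bit}=0)\,\log P_0 \;-\; \mathbb{1}(\mathrm{bit}=1)\,\log P_1$$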
(40) When the true bit is a logic 0 but the probability of the detected bit represented by the LLR indicates that $P_0 = 0$, or the true bit is a logic 1 but the probability of the detected bit represented by the LLR indicates that $P_1 = 0$, then the true value is the complete opposite of the expected value, meaning that cost (cross-entropy) approaches infinity. On the other hand, when the probability of a detected bit value as indicated by the LLR agrees with the true bit value, then cross-entropy equals zero. Insofar as in most cases both probabilities $P_0$ and $P_1$ are higher than 0 and lower than 1, cross-entropy will be a finite non-zero value. Thus, this cost function can be used for adaptation and reflects the quality of the detected bits, with the goal being to minimize cross-entropy.
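The behavior described above can be checked numerically with a minimal Python sketch. It assumes the convention LLR = ln(P1/P0) with P0 + P1 = 1, which matches the sign convention used later in the text (a positive LLR favors bit 1). The function names are illustrative, and the softplus form is simply a numerically stable way of evaluating -ln(P0) or -ln(P1).

```python
import math


def llr_to_probs(llr):
    """Return (P0, P1) for LLR = ln(P1 / P0), evaluated without overflow."""
    if llr >= 0:
        p1 = 1.0 / (1.0 + math.exp(-llr))
    else:
        e = math.exp(llr)
        p1 = e / (1.0 + e)
    return 1.0 - p1, p1


def softplus(z):
    """Stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cross_entropy(bit, llr):
    """-ln(P1) = softplus(-LLR) for a true 1; -ln(P0) = softplus(LLR) for a true 0."""
    return softplus(-llr) if bit == 1 else softplus(llr)
```

With these definitions, a strongly positive LLR gives a cost near zero when the true bit is 1 and a cost that grows roughly linearly with the LLR when the true bit is 0; an LLR of 0 (no information) gives ln 2 for either bit.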
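(41) The gradient of cross-entropy with respect to the LLR may be computed by substituting for $P_0$ and $P_1$ in the cross-entropy equation:
(42) $$\frac{\partial\,\mathrm{CE}}{\partial\,\mathrm{LLR}} = \begin{cases} P_1 & \text{if bit} = 0 \\ -P_0 & \text{if bit} = 1 \end{cases}$$
(43) The LLR may be adapted to minimize cross-entropy (i.e.,
(44) $$\frac{\partial\,\mathrm{CE}}{\partial\,\mathrm{LLR}} = 0$$
), as follows:
$$\mathrm{LLR}_{t+1} = \mathrm{LLR}_t - \alpha \cdot P_1 \quad \text{if bit} = 0$$
$$\mathrm{LLR}_{t+1} = \mathrm{LLR}_t + \alpha \cdot P_0 \quad \text{if bit} = 1$$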
(45) A negative LLR means bit=0 has a higher probability than bit=1, while a positive LLR means bit=1 has a higher probability than bit=0. In these equations, $P_0$ and $P_1$ are probabilities and therefore are positive values, and α is an adaptation bandwidth which also is positive. Therefore, when the true bit=0 then adaptation using cross-entropy will make a negative LLR more negative, and when the true bit=1 then adaptation using cross-entropy will make a positive LLR more positive. Therefore, cross-entropy-based adaptation maximizes the magnitude of the LLR and hence is a maximum-likelihood adaptation which reduces BER. Thus, adaptation of the equalizer to minimize cross-entropy also minimizes BER.
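The update rule can be sketched in a few lines of Python. The sketch assumes LLR = ln(P1/P0) and a positive adaptation bandwidth `alpha`; the function names and the default value of `alpha` are hypothetical.

```python
import math


def probs(llr):
    """Return (P0, P1) for LLR = ln(P1 / P0), evaluated without overflow."""
    if llr >= 0:
        p1 = 1.0 / (1.0 + math.exp(-llr))
    else:
        e = math.exp(llr)
        p1 = e / (1.0 + e)
    return 1.0 - p1, p1


def adapt_llr(llr, bit, alpha=0.1):
    """One adaptation step: subtract alpha*P1 for a true 0, add alpha*P0 for a true 1."""
    p0, p1 = probs(llr)
    return llr - alpha * p1 if bit == 0 else llr + alpha * p0


def run_adaptation(llr, bit, steps, alpha=0.1):
    """Apply the step repeatedly for the same true bit."""
    for _ in range(steps):
        llr = adapt_llr(llr, bit, alpha)
    return llr
```

Because the step is scaled by the probability assigned to the wrong value, an LLR that disagrees with the true bit is corrected with a large step, while an LLR that already agrees strongly moves only slightly.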
(46) If one assumes that there is a general computation graph from parameter X → Y → LLR → CE, such that parameter X affects the value of output Y, which affects the LLR, from which the cross-entropy may be computed, then the cross-entropy gradient can be expressed in terms of other parameters:
(47) $$\frac{\partial\,\mathrm{CE}}{\partial X} = \frac{\partial\,\mathrm{CE}}{\partial\,\mathrm{LLR}} \cdot \frac{\partial\,\mathrm{LLR}}{\partial Y} \cdot \frac{\partial Y}{\partial X}$$
Therefore, any parameter can be adapted to minimize the cross-entropy.
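A small Python sketch can illustrate this chain of dependencies. It uses two deliberately simple, hypothetical models that are not taken from the text: a one-tap equalizer Y = X * sample, and a detector whose LLR is proportional to Y through a fixed `gain`. The product of the three partial derivatives is then compared against a finite-difference estimate of the same gradient.

```python
import math


def sigmoid(z):
    """Stable logistic function; equals P1 when z is the LLR = ln(P1 / P0)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(bit, llr):
    """-ln(P1) for a true 1, -ln(P0) for a true 0 (moderate LLR values only)."""
    p1 = sigmoid(llr)
    return -math.log(p1 if bit == 1 else 1.0 - p1)


def y_of_x(x_param, sample):
    """Hypothetical one-tap equalizer: parameter X scales the input sample."""
    return x_param * sample


def llr_of_y(y, gain):
    """Hypothetical detector model: LLR proportional to the equalizer output."""
    return gain * y


def chain_rule_gradient(x_param, sample, bit, gain):
    """dCE/dX = dCE/dLLR * dLLR/dY * dY/dX."""
    p1 = sigmoid(llr_of_y(y_of_x(x_param, sample), gain))
    dce_dllr = p1 if bit == 0 else p1 - 1.0  # P1 for a true 0, -P0 for a true 1
    return dce_dllr * gain * sample


def numeric_gradient(x_param, sample, bit, gain, h=1e-6):
    """Central finite difference of the cost with respect to X."""
    def cost(p):
        return cross_entropy(bit, llr_of_y(y_of_x(p, sample), gain))
    return (cost(x_param + h) - cost(x_param - h)) / (2.0 * h)
```

In a real equalizer the middle factors would come from the actual detector and filter structure; the sketch only shows that the gradient reaching any upstream parameter is the product of the local derivatives along the path.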
(48)
(49) A first implementation of a reduced-complexity non-linear neural network filter 500, shown in
(50) In radial-basis function non-linear neural network filter 501, digital samples from two inputs 511, 521 are delayed by delay line 531 and combined in radial-basis function non-linear neural network 541. As seen in
(51) Each sample input at 511, 521 adds a parameter or dimension to radial-basis function non-linear neural network filter 501, increasing filter complexity. In order to reduce the complexity of radial-basis function non-linear neural network filter 501, reduced-complexity non-linear neural network filter 500 includes front-end filter 502, which combines some of the inputs from ADC outputs 111, 121 to provide a reduced number of inputs 511, 521 to radial-basis function non-linear neural network filter 501. As can be seen in
(52) In the implementation of
(53) A second implementation 600 of a reduced-complexity non-linear neural network filter, shown in
(54) However, in this implementation, rather than being summed, the taps of delay line 612 are input directly to the hidden nodes 650 of radial-basis function non-linear neural network filter stage 601, which in this implementation are upstream of delay line 631.
(55) Once again, with inputs 111, 121 from two sources, half 613 of delay line 612 of front-end filter 602 is devoted to input 111, while half 614 of delay line 612 of front-end filter 602 is devoted to input 121, with one respective hidden node 650 of radial-basis function non-linear neural network filter stage 601 for each input source 111, 121. The same is true of delay line 631 within radial-basis function non-linear neural network filter stage 601, with separate halves 632, 633 of delay line 631 devoted to inputs deriving separately from inputs 111, 121. Here too, the delays 631 form individual taps of a final FIR filter, which are combined at summation node 641 to yield the output Y.
(56) A third implementation of a reduced-complexity non-linear neural network filter 700, shown in
(57) Typically, an MLP filter includes a delay line for input samples, followed by at least one hidden layer in which the samples are summed and then passed through a non-linear activation function such as, e.g., a hyperbolic tangent function tanh(·), followed by a layer including one or more summations.
(58) In finite-impulse-response-(FIR)-based front-end filter 701, delay line 731 is divided into a first portion 732 receiving inputs 111 and a second portion 733 receiving inputs 121. Each line connecting a delay 731 to one of hidden nodes 750 represents a multiplication of a sample by a coefficient (not shown; see discussion above in connection with
(59) In this implementation, the boundary between the front-end filter 701 and the MLP non-linear neural network filter 702 runs through the hidden layer of hidden nodes 750, but that is not necessarily the case in all implementations.
(60) MLP non-linear neural network filter 702 in this implementation includes a respective tanh(·) non-linear activation function as part of each respective one of hidden nodes 750 and a FIR filter formed by a delay line 712 and a summation node 722. A portion 751 of delay line 712 receives output samples 711 from front-end filter 701, while a portion 752 of delay line 712 receives output samples 721 from front-end filter 701. Each line connecting a delay 712 to sum 722 represents a multiplication of a sample by a coefficient (not shown; see discussion above in connection with
(61) Reduced-complexity non-linear neural network filter 700 may be represented as an equivalent filter arrangement 800, shown in
(62) FIR filters 801, 802 form finite-impulse-response-(FIR)-based front-end filter 810, with FIR filter 801 receiving inputs 111 while FIR filter 802 receives inputs 121. FIR filters 803, 804 and non-linear activation functions 805, 806 form reduced-complexity non-linear neural network 820. In reduced-complexity non-linear neural network 820, activation function 805 receives the outputs of FIR filter 801 and passes those outputs, after non-linear activation, to FIR filter 803, while activation function 806 receives the outputs of FIR filter 802 and passes those outputs, after non-linear activation, to FIR filter 804. The outputs of FIR filter 803 and FIR filter 804 are combined at summation node 808 to yield the output Y.
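The signal flow just described (a front-end FIR filter per input, a non-linear activation, a second FIR filter, and a final summation) can be sketched in Python as follows. The sketch assumes tanh as the activation function and zero initial state in every delay line; the tap values are left to the caller, and all function names are illustrative.

```python
import math


def fir(taps, samples):
    """Causal FIR filter: out[n] = sum_k taps[k] * samples[n - k], zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(taps):
            if n - k >= 0:
                acc += w * samples[n - k]
        out.append(acc)
    return out


def branch(front_taps, back_taps, samples):
    """One input path: front-end FIR (801/802), tanh (805/806), then FIR (803/804)."""
    hidden = [math.tanh(v) for v in fir(front_taps, samples)]
    return fir(back_taps, hidden)


def equalize(x1, x2, front1, front2, back1, back2):
    """Sum the two branch outputs (summation node 808) to form the output Y."""
    y1 = branch(front1, back1, x1)
    y2 = branch(front2, back2, x2)
    return [a + b for a, b in zip(y1, y2)]
```

Each front-end FIR collapses several delayed samples into a single value before the non-linearity, so each branch applies one activation per output sample rather than one per tap.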
(63) Another implementation of a reduced-complexity non-linear neural network filter 900, shown in
(64) The outputs 941 of finite-impulse-response-(FIR)-based front-end filter 901 are then filtered by multilayer perceptron (MLP) non-linear neural network filter 902, which includes a non-linear activation function 912 (which may be a tanh(·) non-linear activation function), followed by FIR filter 922.
(65) In a variation 1000 of reduced-complexity non-linear neural network filter 900, shown in
(66) In addition, a non-linear function 1100 (particularly one that is close to a linear function 1101) can be approximated as a series of linear functions 1102 of different slopes, as shown in
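The idea of approximating a non-linear function by a series of linear segments of different slopes can be illustrated with a short Python sketch. It assumes tanh as the function being approximated and an arbitrary, hand-picked set of breakpoints; neither choice is specified in the text, and the helper names are hypothetical.

```python
import math


def make_piecewise_linear(func, breakpoints):
    """Build a piecewise-linear approximation that matches func at each breakpoint."""
    xs = sorted(breakpoints)
    ys = [func(x) for x in xs]

    def approx(x):
        # Hold the end values outside the breakpoint range.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
                return ys[i] + slope * (x - xs[i])

    return approx


def max_error(func, approx, lo=-4.0, hi=4.0, steps=800):
    """Largest absolute deviation between func and approx on a uniform grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(func(x) - approx(x)))
    return worst


# Illustrative breakpoints: denser near zero, where tanh curves most.
BREAKPOINTS = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
pwl_tanh = make_piecewise_linear(math.tanh, BREAKPOINTS)
```

Each segment needs only a multiply and an add, so a hardware realization can trade approximation error against the number of segments it stores.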
(67) A similar variation 1200, based on reduced-complexity non-linear neural network filter 800, is shown in
(68) It can be shown that the various implementations of a reduced-complexity non-linear neural network filter shown above provide nearly as good performance as a non-reduced-complexity non-linear neural network filter, particularly when adapted using cross-entropy. However, the reduced complexity provides substantial savings in device area and power consumption.
(69) Although the implementations shown above receive two inputs (as in the case of a TDMR channel), implementations of the subject matter of this disclosure may include channels with only one input, or with three or more inputs. In such cases, the input delay lines may not be divided into groups (in the case of one input), or may be divided into three or more groups (in the case of three or more inputs) rather than into two groups as shown, with each group receiving samples from one of the inputs.
(70) A method 1300 according to implementations of the subject matter of this disclosure is diagrammed in
(71) Method 1300 begins at 1301 where non-linear equalization of digitized samples of input signals on the data channel is performed, including performing front-end filtering at 1311 to reduce the number of inputs from the digitized samples, and performing non-linear filtering at 1321 on the reduced number of inputs. At 1302, a respective value of each of the output signals is determined from output signals of the non-linear equalization. At 1303, parameters of the non-linear equalization are adapted based on respective ones of the value, and method 1300 ends.
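The steps of method 1300 can be strung together in a deliberately small Python sketch. Everything beyond the step order is an assumption made for illustration: two-level signaling with targets of +1 and -1, a fixed front-end FIR that collapses each window of samples into one value, a tanh non-linearity with a single adaptable output gain, and a decision-directed squared-error update of that gain.

```python
import math


def front_end(window, taps):
    """Step 1311: combine several digitized samples into a single reduced input."""
    return sum(w * s for w, s in zip(taps, window))


def nonlinear(value, gain):
    """Step 1321: non-linear filtering of the reduced input (assumed tanh plus gain)."""
    return gain * math.tanh(value)


def decide(y):
    """Step 1302: determine the value of the output (assumed two-level slicer)."""
    return 1 if y >= 0.0 else 0


def adapt(gain, value, y, bit, mu=0.05):
    """Step 1303: decision-directed squared-error step on the output gain."""
    target = 1.0 if bit == 1 else -1.0
    return gain - mu * (y - target) * math.tanh(value)


def run(windows, taps, gain=0.5):
    """Process each window in order and return the decided bits and final gain."""
    bits = []
    for window in windows:
        value = front_end(window, taps)
        y = nonlinear(value, gain)
        bit = decide(y)
        gain = adapt(gain, value, y, bit)
        bits.append(bit)
    return bits, gain
```

A fuller implementation would adapt the front-end taps and the non-linear stage together, possibly with the cross-entropy cost discussed earlier; the sketch only shows how the four steps feed one another.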
(72) Thus it is seen that a high-speed data channel using a reduced-complexity non-linear equalizer has been provided.
(73) As used herein and in the claims which follow, the construction "one of A and B" shall mean "A or B."
(74) It is noted that the foregoing is only illustrative of the principles of the invention, and that the invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.