Non-linear neural network equalizer for high-speed data channel

12284059 · 2025-04-22

Abstract

A data channel on an integrated circuit device includes a non-linear equalizer having as inputs digitized samples of signals on the data channel, decoding circuitry configured to determine from outputs of the non-linear equalizer a respective value of each of the signals, and adaptation circuitry configured to adapt parameters of the non-linear equalizer based on respective ones of the value. The non-linear equalizer includes a non-linear filter portion, and a front-end filter portion configured to reduce numbers of the inputs from the digitized samples. The non-linear equalizer may be a neural network equalizer, such as a multi-layer perceptron neural network equalizer, a reduced complexity multi-layer perceptron neural network equalizer, or a radial-basis function neural network equalizer. Alternatively, the non-linear equalizer may include a linear filter and a non-linear activation function, which may be a hyperbolic tangent function.

Claims

1. A data channel on an integrated circuit device, the data channel comprising: a non-linear equalizer having as inputs digitized samples of signals on the data channel; decoding circuitry configured to determine from outputs of the non-linear equalizer a respective value of each of the signals; and adaptation circuitry configured to adapt parameters of the non-linear equalizer based on respective ones of the value; wherein: the non-linear equalizer includes: a front-end filter portion configured to combine some of the inputs from the digitized samples, to provide a reduced number of inputs; and a non-linear filter portion configured to operate on the reduced number of inputs.

2. The data channel of claim 1 wherein the non-linear equalizer is a neural network equalizer.

3. The data channel of claim 2 wherein the neural network equalizer is a multi-layer perceptron neural network equalizer.

4. The data channel of claim 3 wherein the multi-layer perceptron neural network equalizer is a reduced complexity multi-layer perceptron neural network equalizer.

5. The data channel of claim 2 wherein the neural network equalizer is a radial-basis function neural network equalizer.

6. The data channel of claim 1 wherein the non-linear equalizer comprises a linear filter and a non-linear activation function.

7. The data channel of claim 6 wherein the non-linear activation function is a hyperbolic tangent function.

8. The data channel of claim 1 wherein the adaptation circuitry adapts parameters of the non-linear equalizer based on cross-entropy.

9. The data channel of claim 1 wherein the front-end filter portion comprises a finite-impulse-response filter.

10. The data channel of claim 1 further comprising scalable bypass circuitry for controllably outputting output of the front-end filter portion as at least a portion of output of the non-linear equalizer.

11. A method for detecting data on a data channel on an integrated circuit device, the method comprising: performing non-linear equalization of digitized samples of input signals on the data channel; determining from output signals of the non-linear equalization a respective value of each of the output signals; and adapting parameters of the non-linear equalization based on respective ones of the value; wherein: performing non-linear equalization of digitized samples of input signals on the data channel includes: performing front-end filtering to combine some of the inputs from the digitized samples, to provide a reduced number of inputs; and performing non-linear filtering on the reduced number of inputs from the digitized samples.

12. The method of claim 11 wherein performing the non-linear equalization comprises performing neural network equalization.

13. The method of claim 12 wherein performing the neural network equalization comprises applying a multi-layer perceptron neural network equalizer.

14. The method of claim 13 wherein performing the neural network equalization comprises applying a reduced complexity multi-layer perceptron neural network equalizer.

15. The method of claim 12 wherein performing the neural network equalization comprises applying a radial-basis function neural network equalizer.

16. The method of claim 11 wherein performing the non-linear equalization comprises applying a non-linear activation function and performing linear filtering on output of the non-linear activation function.

17. The method of claim 16 wherein applying the non-linear activation function comprises applying a hyperbolic tangent function.

18. The method of claim 11 wherein adapting the parameters of the non-linear equalization comprises adapting the parameters of the non-linear equalization based on cross-entropy.

19. The method of claim 11 wherein the performing front-end filtering comprises performing finite-impulse-response filtering.

20. The method of claim 11 further comprising scalably bypassing the non-linear filtering for controllably outputting an output of the front-end filtering as at least a portion of output of the non-linear equalization.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Further features of the disclosure, its nature and various advantages, will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

(2) FIG. 1 illustrates a TDMR detector channel as one example of a channel with which implementations of the subject matter of this disclosure may be used;

(3) FIG. 2 is a plot of an exclusive-OR function in a Cartesian coordinate space illustrating a problem solved by implementations of the subject matter of this disclosure;

(4) FIG. 3 is a plot of a transformation of the exclusive-OR function of FIG. 2 into a different coordinate space illustrating a solution based on implementations of the subject matter of this disclosure;

(5) FIG. 4 is a schematic representation of a general implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(6) FIG. 5 is a diagram of a first implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(7) FIG. 6 is a diagram of a second implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(8) FIG. 7 is a diagram of a third implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(9) FIG. 8 is an alternative representation of the implementation of the reduced-complexity non-linear neural network filter shown in FIG. 7;

(10) FIG. 9 is a diagram of a fourth implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(11) FIG. 10 is a diagram of a fifth implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure;

(12) FIG. 11 is a graphical representation of a function to be filtered;

(13) FIG. 12 is a diagram of a sixth implementation of a reduced-complexity non-linear neural network filter in accordance with the subject matter of this disclosure; and

(14) FIG. 13 is a flow diagram illustrating a method according to implementations of the subject matter of this disclosure.

DETAILED DESCRIPTION

(15) As noted above, integrated circuit devices may include high-speed SERDES links between various device components. Typical SERDES links may suffer from significant non-linearity or channel impairment in the signal path, as a result of, e.g., insertion loss, inter-symbol-interference (ISI), and, in an optical system, non-linearities such as dispersion loss or, in a copper (i.e., wired) system, cross-talk, jitter, etc. Various forms of linear equalization typically are used, at the receiver end of such links, to attempt to deal with such channel impairments.

(16) Similarly, in magnetic recording, reading and writing are performed by a head of a hard disk drive that moves relative to the surface of a storage medium and writes data to, or reads data from, circular data tracks on a magnetic disk. In order to increase recording densities, it is desirable to shrink the bit cell, or area of disk surface in which a single bit is recorded. Shrinking the bit cell, however, increases inter-symbol interference (ISI) from data recorded on the media, thereby increasing the bit error rate (BER) and decreasing the reliability of read-back data. An increased BER also effectively reduces the rate at which data can be read back, owing to the overhead inherent in error-detection or error-correction techniques employed to compensate for the increased BER, and/or owing to repeat read-back attempts that may be necessary in order to accurately read-back data after erroneous data read-back attempts.

(17) Two-dimensional magnetic recording (TDMR) is another technique that has been developed in an effort to increase storage capacity in hard disk drives. TDMR employs a read-back technique that allows for greater storage capacity by combining signals simultaneously obtained from multiple read-back heads to enhance the accuracy of reading back data from one or more data tracks. A TDMR read-back channel typically includes a linear equalizer to mitigate the negative impact that noise has on the read-back channel signal integrity, and on the accuracy and reliability in reading back digital data values from the storage medium. Some TDMR read-back channels utilize minimum-mean-squared-error (MMSE) as a cost function to adapt the equalizer to further improve BER performance.

(18) However, linear equalization may not be sufficient to compensate for such non-linearities or interference. Linear equalization may not be enough to correctly assign received samples near the threshold between levels to the correct written bit or symbol when the signal-to-noise ratio is low.

(19) In accordance with implementations of the subject matter of this disclosure, non-linear equalization is used to compensate for non-linearities in a high-speed data channel, such as a SERDES channel, or a disk-drive read channel, thereby reducing the bit-error rate (BER). In different implementations, different types of non-linear equalizers may be used.

(20) Conceptually, a linear equalizer performs the separation of samples for assignment to one level or another by effectively drawing a straight line between groups of samples plotted in a two-dimensional (e.g., (x,y)) space. In channels that are insufficiently linear, or where the levels are too close together, there may not be a straight line that can be drawn between samples from different levels on such a plot. A non-linear equalizer effectively re-maps the samples into a different space in which the samples from different levels may be separated by a straight line or other smooth curve.

(21) A non-linear equalizer in accordance with implementations of the subject matter of this disclosure may be more or less complex. For example, a non-linear equalizer may have more or fewer variables, or taps, with complexity being proportional to the number of variables. In addition, a non-linear equalizer that operates at the bit level (i.e., operates separately on the bits of each symbol, e.g., two bits/symbol for four-level signaling in a data channel, rather than on the symbol as a whole) may be less complex than a non-linear equalizer that operates at the symbol level. Either way, greater complexity yields greater performance when all other considerations are equal. However, greater complexity also may require greater device area and/or power consumption.

(22) Types of non-linear equalizers that may be used in accordance with the subject matter of this disclosure may include forms of reduced-complexity multi-layer perceptron neural network (RC-MLPNN) equalizers and forms of reduced-complexity radial-basis-function neural network (RC-RBFNN) equalizers.

(23) Performance of the non-linear equalizer may be affected by the cost function used for adaptation of the equalizer. Implementations of the subject matter of this disclosure reduce the complexity of a non-linear equalizer whether it uses any one of various different cost functions for adaptation, including either a minimum mean-square error (MMSE or MSE) cost function, or a cross-entropy (CE)-based cost function. A CE-based cost function may yield a better result than an MMSE cost function, but a CE-based cost function is more complex than an MMSE cost function.

(24) According to implementations of the subject matter of this disclosure, a non-linear equalizer with reduced complexity but comparable performance is provided by appending, to a non-linear neural network equalizer, a front-end filter to reduce complexity of the inputs to the non-linear equalizer. For example, a finite-impulse-response (FIR) filter may be used as a front end to reduce complexity of the non-linear equalizer by reducing the number of input parameters or dimensions.

(25) The subject matter of this disclosure may be better understood by reference to FIGS. 1-13.

(26) FIG. 1 illustrates a TDMR detector channel 100 as one example of a channel with which implementations of the subject matter of this disclosure may be used. However, as noted above, implementations of the subject matter of this disclosure also may be used with other forms of high-speed data channels such as a SERDES channel (not shown).

(27) In TDMR detector channel 100, respective analog-to-digital converter (ADC) outputs 111, 121 from two separate read heads (not shown) are input to an equalizer 101 for equalization according to implementations of the subject matter of this disclosure. Output Y of equalizer 101 is then passed to Viterbi detector 102, the output of which is then passed to a soft-output Viterbi algorithm (SOVA) detector 103. SOVA detector 103 provides log-likelihood ratios (LLRs) 113 to a decoder, such as a low-density parity check (LDPC) decoder 104 which decodes the bits. LLRs 113 also are fed back to equalizer 101 which is adapted using either a mean-square error (MSE) cost function 151 or a cross-entropy cost function 161 which compares the LLRs 113 to output bits 114 (which are expressed as non-return-to-zero data 131), to set a target 141.

(28) The purpose of implementing equalization on channel 100 is to correct for various sources of interference referred to above and thereby effectively move samples that are on the wrong side of a detection threshold (whether for bits read from a storage medium or signal levels in a SERDES channel) to the correct side of the threshold. Linear equalization effectively takes a plot of the samples in a two-dimensional (x,y) space and draws a straight line through the samples where the threshold ought to be. However, in a channel with non-linearities, there may be no straight line that can be drawn on that two-dimensional plot that would correctly separate the samples. In such a case, non-linear equalization can be used. A non-linear equalization function may effectively remap the samples into a different space in which there does exist a straight line that correctly separates the samples.

(29) Alternatively, the non-linear equalization function may remap the samples into a space in which there exists some smooth curve other than a straight line that correctly separates the samples. For example, non-linear equalization using a radial-basis function may remap the samples into a polar-coordinate, or radial, space in which the samples are grouped into circular or annular bands that can be separated by circles or ellipses.

(30) The advantage of non-linear equalization over linear equalization in a non-linear channel may be seen in a simplified illustration as shown in FIGS. 2 and 3, where the signal to be equalized is characterized by the exclusive-OR (XOR or ⊕) function. FIG. 2 is a plot of y=x1⊕x2 in (x1,x2) space, where the open dots 201, 202 represent y=0 and cross-hatched dots 203, 204 represent y=1. It is apparent that there is no straight line that can be drawn separating the open dots from the cross-hatched dots.

(31) However, a radial-basis function

(32)
$$\varphi(r_i) = \varphi(\lVert x - c_i \rVert) = e^{-\lVert [x_1\ x_2] - c_i \rVert^{2}}$$
where c_i is the centroid of the i-th node, can be used to transform the XOR function from the linear Cartesian (x1,x2) space to a non-linear radial (φ(r1),φ(r2)) space as follows:

(33)

| x1 | x2 | φ(r1) | φ(r2) | y |
|---|---|---|---|---|
| 0 | 0 | 0.1353 | 1 | 0 |
| 0 | 1 | 0.3678 | 0.3678 | 1 |
| 1 | 0 | 0.3678 | 0.3678 | 1 |
| 1 | 1 | 1 | 0.1353 | 0 |

which is diagrammed in FIG. 3. As can be seen, when mapped into the non-linear radial (φ(r1),φ(r2)) space, the values 301, 302, 303 of the XOR function 300 may be separated by straight line 304 (both of the two y=1 points 203, 204 in (x1,x2) space map to the same point 301 in (φ(r1),φ(r2)) space).
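By way of illustration only, the transformation tabulated above can be reproduced with a short Python sketch. The centroids c1 = (1,1) and c2 = (0,0) are not stated in the text; they are an assumption inferred from the tabulated values. The separating line φ(r1)+φ(r2)=0.9 is likewise a hypothetical choice, one of many straight lines that would work.

```python
import math

# Assumed centroids, inferred from the tabulated values (not stated in the text).
CENTROIDS = [(1.0, 1.0), (0.0, 0.0)]

def rbf(x, c):
    """Gaussian radial-basis function exp(-||x - c||^2)."""
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

def transform(x):
    """Map a point from (x1, x2) space into (phi(r1), phi(r2)) space."""
    return tuple(rbf(x, c) for c in CENTROIDS)

def xor_table():
    """Return rows (x1, x2, phi1, phi2, y) for the four XOR input points."""
    rows = []
    for x1 in (0, 1):
        for x2 in (0, 1):
            phi1, phi2 = transform((x1, x2))
            rows.append((x1, x2, phi1, phi2, x1 ^ x2))
    return rows

def separates(rows, threshold=0.9):
    """Check that the hypothetical line phi1 + phi2 = threshold splits y=1 from y=0."""
    return all((phi1 + phi2 < threshold) == (y == 1)
               for _, _, phi1, phi2, y in rows)

if __name__ == "__main__":
    for row in xor_table():
        print("%d %d %.4f %.4f %d" % row)
    print("linearly separable:", separates(xor_table()))
```

In the transformed space both y=1 inputs land on the same point, which is why a single straight line suffices there even though none exists in (x1,x2) space.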

(34) As discussed below, various types of non-linear equalizers are available. Whatever type of non-linear equalizer is used may be adaptive to account for changing channel conditions. Various forms of adaptation may be used.

(35) One type of adaptation function that may be used is minimum mean-squared error (MMSE), where the mean-squared error (MSE) is defined as the square of the norm of the difference between the equalized signal (Y) and the ideal signal (Ŷ). The equalizer may initially be adapted in a training mode in which the ideal signal values are available. Later, during run-time operation, the detected output values of the equalized channel should be close enough to the ideal values to be used for adaptation.
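As a minimal sketch of the MSE cost described above, the following Python function computes the squared norm of the difference between equalized and ideal samples, averaged over the block. The averaging over the number of samples and the function and argument names are assumptions made for illustration.

```python
def mean_squared_error(equalized, ideal):
    """Squared norm of (equalized - ideal), averaged over the samples."""
    if len(equalized) != len(ideal):
        raise ValueError("sequences must have the same length")
    if not equalized:
        raise ValueError("sequences must not be empty")
    return sum((y - t) ** 2 for y, t in zip(equalized, ideal)) / len(equalized)
```

In a training mode the `ideal` values would be the known targets; at run time they would be replaced by detected output values, as the paragraph above describes.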

(36) Another type of adaptation function that may be used is the cross-entropy (CE) between a training bit and its log-likelihood ratio (LLR). In particular, cost function circuitry may be configured to compute a cross-entropy value indicative of a difference between a probability distribution of the detected bit value (which is a function of the LLR signal) and a probability distribution of the training bit value. The cost function circuitry then adapts the equalizer by setting an equalizer parameter (e.g., one or more coefficients of filter taps of the equalizer) to a value that corresponds to a minimum cross-entropy value from among the computed cross-entropy value and one or more previously computed cross-entropy values, to decrease a bit-error rate for the channel. As in the case of MSE equalization, the equalizer may initially be adapted in a training mode in which the ideal signal values are available. Later, during run-time operation, the detected output values of the equalized channel should be close enough to the ideal values to be used for adaptation. Specifically, if any forward error correction code (FEC) decoder (e.g., a Reed Solomon (RS) decoder or Low-Density Parity Check (LDPC) decoder) is available after the equalizer, then successfully decoded frames from the FEC decoder output may be used for adaptation.

(37) LLR may be defined as the relationship between the probability (P.sub.0) of a bit being 0 and the probability (P.sub.1) of a bit being 1:

(38)
$$LLR = L = \log\left(\frac{P_1}{P_0}\right), \qquad P_1 + P_0 = 1$$
$$P_0 = \frac{1}{1 + e^{L}}, \qquad P_1 = \frac{e^{L}}{1 + e^{L}}$$
The cross-entropy between a training bit and its LLR may be computed as follows:

(39)
$$\text{Cross Entropy}(\text{bit}, LLR) = -P(\text{bit}=0)\cdot\log(P_0) - P(\text{bit}=1)\cdot\log(P_1)$$
$$\text{Cross Entropy}(\text{bit}, LLR) = -(1-\text{bit})\cdot\log(P_0) - \text{bit}\cdot\log(P_1)$$
$$\text{Cross Entropy} = \infty \ \text{ when } \begin{cases} \text{bit}=0,\ P_0=0 \\ \text{bit}=1,\ P_1=0 \end{cases}$$
$$\text{Cross Entropy} = 0 \ \text{ when } \begin{cases} \text{bit}=0,\ P_0=1 \\ \text{bit}=1,\ P_1=1 \end{cases}$$

(40) When the true bit is a logic 0 but the probability of the detected bit represented by the LLR indicates that P.sub.0=0, or the true bit is a logic 1 but the probability of the detected bit represented by the LLR indicates that P.sub.1=0, then the true value is the complete opposite of the expected value, meaning that cost (cross-entropy) approaches infinity. On the other hand, when the probability of a detected bit value as indicated by the LLR agrees with the true bit value, then cross-entropy equals zero. Insofar as in most cases both probabilities P.sub.0 and P.sub.1 are higher than 0 and lower than 1, cross-entropy will be a finite non-zero value. Thus, this cost function can be used for adaptation and reflects the quality of the detected bits, with the goal being to minimize cross-entropy.
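The relationship between the LLR, the probabilities P0 and P1, and the cross-entropy can be sketched in Python as follows. This is an illustrative sketch only; the function names are hypothetical, and the branch on the sign of the LLR is merely a numerical precaution against overflow, not something the text prescribes.

```python
import math

def probabilities(llr):
    """Return (P0, P1) for LLR = log(P1 / P0) with P0 + P1 = 1."""
    if llr >= 0:
        e = math.exp(-llr)          # safe: exponent is non-positive
        return e / (1.0 + e), 1.0 / (1.0 + e)
    e = math.exp(llr)               # safe: exponent is negative
    return 1.0 / (1.0 + e), e / (1.0 + e)

def cross_entropy(bit, llr):
    """Cross-entropy -(1 - bit) * log(P0) - bit * log(P1) for a known bit."""
    p0, p1 = probabilities(llr)
    p = p1 if bit == 1 else p0
    # The cost is unbounded when the LLR fully contradicts the true bit.
    return math.inf if p == 0.0 else -math.log(p)
```

For an LLR of 0 (P0 = P1 = 0.5) the cost is log 2 for either bit value; it falls toward zero as the LLR moves toward the correct sign and grows without bound as it moves toward the wrong one.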

(41) The gradient of cross-entropy with respect to the LLR may be computed by substituting for P.sub.0 and P.sub.1 in the cross-entropy equation:

(42)
$$\frac{\partial(CE)}{\partial(LLR)} = P_1 - \text{bit} = \begin{cases} P_1 & \text{when bit}=0 \\ P_1 - 1 = -P_0 & \text{when bit}=1 \end{cases}$$

(43) The LLR may be adapted to minimize cross-entropy (i.e., to reach $\partial(CE)/\partial(LLR) = 0$), as follows:

(44)
$$LLR_{t+1} = LLR_t - \alpha \cdot P_1 \quad \text{if bit}=0$$
$$LLR_{t+1} = LLR_t + \alpha \cdot P_0 \quad \text{if bit}=1$$

(45) A negative LLR means bit=0 has a higher probability than bit=1, while a positive LLR means bit=1 has a higher probability than bit=0. In these equations, P.sub.0 and P.sub.1 are probabilities and therefore are positive values, and α is an adaptation bandwidth which also is positive. Therefore, when the true bit=0 then adaptation using cross-entropy will make a negative LLR more negative, and when the true bit=1 then adaptation using cross-entropy will make a positive LLR more positive. Therefore, cross-entropy-based adaptation maximizes the magnitude of the LLR and hence is a maximum-likelihood adaptation which reduces BER. Thus, adaptation of the equalizer to minimize cross-entropy also minimizes BER.
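The update rule above is a gradient-descent step on the cross-entropy with respect to the LLR. A minimal Python sketch, with hypothetical function names and an arbitrarily chosen step size `alpha` standing in for the adaptation bandwidth, might look like this:

```python
import math

def p1_from_llr(llr):
    """P1 = e^L / (1 + e^L), evaluated without overflow for large |L|."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)

def ce_gradient(bit, llr):
    """d(CE)/d(LLR) = P1 - bit: P1 when bit = 0, -P0 when bit = 1."""
    return p1_from_llr(llr) - bit

def adapt_llr(llr, bit, alpha):
    """One step: LLR - alpha * P1 if bit = 0, LLR + alpha * P0 if bit = 1."""
    return llr - alpha * ce_gradient(bit, llr)
```

With a true bit of 0 each step moves the LLR in the negative direction, and with a true bit of 1 in the positive direction, matching the behavior described above.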

(46) If one assumes that there is a general computation graph from parameter X → Y → LLR → CE such that parameter X affects the value of output Y which affects the LLR, from which the cross-entropy may be computed, then the cross-entropy gradient can be expressed in terms of other parameters:

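(47)
$$\frac{\partial(CE)}{\partial(\text{parameter } X)} = \frac{\partial(\text{parameter } Y)}{\partial(\text{parameter } X)} \cdot \frac{\partial(LLR)}{\partial(\text{parameter } Y)} \cdot \frac{\partial(CE)}{\partial(LLR)}$$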
Therefore, any parameter can be adapted to minimize the cross-entropy.
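As one hedged illustration of the chain rule above, the following Python sketch adapts the tap coefficients of a simple linear filter. It assumes, purely for illustration, that the output Y is a weighted sum of samples and that the LLR is a fixed gain times Y; in the channel of FIG. 1 the LLR comes from a detector, so `llr_gain`, `alpha`, and the function names are all hypothetical.

```python
import math

def sigmoid(v):
    """P1 as a function of the LLR, evaluated without overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def adapt_taps(weights, samples, bit, alpha=0.05, llr_gain=2.0):
    """One cross-entropy gradient step on the taps of Y = sum(w_i * x_i)."""
    y = sum(w * x for w, x in zip(weights, samples))
    llr = llr_gain * y                    # assumed Y -> LLR mapping
    d_ce_d_llr = sigmoid(llr) - bit       # d(CE)/d(LLR) = P1 - bit
    d_llr_d_y = llr_gain                  # d(LLR)/dY under the assumption
    # dY/dw_i = x_i, so the three factors multiply per the chain rule.
    return [w - alpha * x * d_llr_d_y * d_ce_d_llr
            for w, x in zip(weights, samples)]
```

Repeating the step with the same samples and a true bit of 1 drives the output Y, and with it the LLR, steadily more positive.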

(48) FIG. 4 shows a general implementation 400 of a reduced-complexity non-linear neural network filter 401 in accordance with the subject matter of this disclosure for equalizing the two ADC outputs 111, 121 in the TDMR read channel 100 of FIG. 1. Reduced-complexity non-linear neural network filter 401 accepts inputs 111, 121 of a certain complexity, but initially filters inputs 111, 121 through a front-end filter 402 to reduce the complexity of inputs 111, 121, before filtering reduced-complexity inputs 411, 421 through non-linear filter circuitry 403. Reduction of the complexity of inputs 411, 421 allows a reduction in the complexity (as measured by dimensionality) of non-linear filter circuitry 403, and therefore the complexity of non-linear neural network filter 401, without having to reduce the complexity of the inputs 111, 121 being filtered.

(49) A first implementation of a reduced-complexity non-linear neural network filter 500, shown in FIG. 5, is based on a radial-basis function non-linear neural network filter 501, with a finite-impulse-response-(FIR)-based front-end filter 502.

(50) In radial-basis function non-linear neural network filter 501, digital samples from two inputs 511, 521 are delayed by delay line 531 and combined in radial-basis function non-linear neural network 541. As seen in FIG. 5, radial-basis function non-linear neural network 541 includes at least one hidden layer 550 of hidden nodes 551. Each hidden node 551 operates on each delayed sample with a radial-basis function, but to avoid crowding the drawing only some delays in delay line 531 are shown as being coupled to each hidden node 551. The outputs of hidden layer 550 are combined (e.g., by addition) at 552 to provide Y output 503.

(51) Each sample input at 511, 521 adds a parameter or dimension to radial-basis function non-linear neural network filter 501, increasing filter complexity. In order to reduce the complexity of radial-basis function non-linear neural network filter 501, reduced-complexity non-linear neural network filter 500 includes front-end filter 502, which combines some of the inputs from ADC outputs 111, 121 to provide a reduced number of inputs 511, 521 to radial-basis function non-linear neural network filter 501. As can be seen in FIG. 5, in this implementation, front-end filter 502 uses FIR filtering (each line connecting a delay 512 to sum 522 represents multiplication of a sample by a coefficient (not shown), forming a filter tap, with the taps being summed at 522) to combine, e.g., every four input samples from ADC outputs 111, 121 into one input sample 511, 521. This allows a reduction in the complexity (as measured by dimensionality) of radial-basis function non-linear neural network filter 501, and therefore the complexity of non-linear neural network filter 500, without having to reduce the complexity of the inputs 111, 121 being filtered. The unseen coefficients may be parameters that are adapted with a back-propagation algorithm and, for example, may be derived from the equation set forth above in connection with the cross-entropy gradient.
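The dimensionality reduction described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the circuit of FIG. 5: the grouping of delay-line contents into non-overlapping blocks, the Gaussian form of the radial-basis function, the weighted output sum, and all names are assumptions.

```python
import math

def fir_front_end(samples, taps):
    """Combine each non-overlapping group of len(taps) samples into one value."""
    n = len(taps)
    return [sum(c * s for c, s in zip(taps, samples[i:i + n]))
            for i in range(0, len(samples) - n + 1, n)]

def rbf_node(x, centroid):
    """Hidden node: Gaussian radial-basis function exp(-||x - c||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, centroid)))

def rc_rbfnn(adc1, adc2, taps1, taps2, centroids, out_weights):
    """Front-end filter each input set separately, then apply the RBF network."""
    reduced = fir_front_end(adc1, taps1) + fir_front_end(adc2, taps2)
    return sum(w * rbf_node(reduced, c)
               for w, c in zip(out_weights, centroids))
```

With eight samples per input set and four taps per group, each hidden node sees a four-dimensional input rather than a sixteen-dimensional one, which is the sense in which the front end reduces complexity.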

(52) In the implementation of FIG. 5, each set of input samples 111, 121 is processed in a separate portion of delay line 512, and in a separate portion of delay line 531. In this implementation, with two sets of input samples (from the two read heads of TDMR channel 100), each delay line is divided into two segments. However, more generally, the number of segments corresponds to the number of input sets. Thus, for a single input set, there would be only one segment (i.e., the delay line would not be segmented) but if there were three input sets, the delay line may be divided into three segments, etc.

(53) A second implementation 600 of a reduced-complexity non-linear neural network filter, shown in FIG. 6, also is based on a radial-basis function non-linear neural network filter 601, with a finite-impulse-response-(FIR)-based front-end filter 602. As in the case of front-end filter 502, front-end filter 602 uses FIR filtering (each line connecting a delay 612 to radial basis function 611, 621 represents multiplication of a sample by a coefficient (not shown; see discussion above in connection with FIG. 5) forming a filter tap) to combine, e.g., every four input samples from ADC outputs 111, 121 into one input sample 611, 621, thereby allowing a reduction in the complexity (as measured by dimensionality) of radial-basis function non-linear neural network filter 601, and therefore the complexity of non-linear neural network filter 600, without having to reduce the complexity of the inputs 111, 121 being filtered.

(54) However, in this implementation, rather than being summed, the taps of delay line 612 are input directly to the hidden nodes 650 of radial-basis function non-linear neural network filter stage 601, which in this implementation are upstream of delay line 631.

(55) Once again, with inputs 111, 121 from two sources, half 613 of delay line 612 of front-end filter 602 is devoted to input 111, while half 614 of delay line 612 of front-end filter 602 is devoted to input 121, with one respective hidden node 650 of radial-basis function non-linear neural network filter stage 601 for each input source 111, 121. The same is true of delay line 631 within radial-basis function non-linear neural network filter stage 601, with separate halves 632, 633 of delay line 631 devoted to inputs deriving separately from inputs 111, 121. Here too, the delays 631 form individual taps of a final FIR filter, which are combined at summation node 641 to yield the output Y.

(56) A third implementation of a reduced-complexity non-linear neural network filter 700, shown in FIG. 7, is based on a multilayer perceptron (MLP) non-linear neural network filter 702, with a finite-impulse-response-(FIR)-based front-end filter 701.

(57) Typically, an MLP filter includes a delay line for input samples, followed by at least one hidden layer in which the samples are summed and then passed through a non-linear activation function such as, e.g., a hyperbolic tangent function tanh(·), followed by a layer including one or more summations.

(58) In finite-impulse-response-(FIR)-based front-end filter 701, delay line 731 is divided into a first portion 732 receiving inputs 111 and a second portion 733 receiving inputs 121. Each line connecting a delay 731 to one of hidden nodes 750 represents a multiplication of a sample by a coefficient (not shown; see discussion above in connection with FIG. 5) forming a FIR filter tap. The taps are summed by the summation portion of each hidden node 750, which includes a summation function followed by a non-linear activation function which in this implementation is a tanh(·) function. Although the hidden layer is shown as having only one hidden node 750 for all of the inputs in each respective set of inputs 111, 121, in other implementations (not shown) there may be multiple nodes 750 for each set of inputs 111, 121. In any event, a set of outputs 711 is generated based on front-end filtering of inputs 111, and another set of outputs 721 is generated based on front-end filtering of inputs 121.

(59) In this implementation, the boundary between the front-end filter 701 and the MLP non-linear neural network filter 702 runs through the hidden layer of hidden nodes 750, but that is not necessarily the case in all implementations.

(60) MLP non-linear neural network filter 702 in this implementation includes a respective tanh(·) non-linear activation function as part of each respective one of hidden nodes 750 and a FIR filter formed by a delay line 712 and a summation node 722. A portion 751 of delay line 712 receives output samples 711 from front-end filter 701, while a portion 752 of delay line 712 receives output samples 721 from front-end filter 701. Each line connecting a delay 712 to sum 722 represents a multiplication of a sample by a coefficient (not shown; see discussion above in connection with FIG. 5) forming a FIR filter tap, and the taps are combined at summation node 722 to yield the output Y.

(61) Reduced-complexity non-linear neural network filter 700 may be represented as an equivalent filter arrangement 800, shown in FIG. 8. Reduced-complexity non-linear neural network filter 800 includes four FIR filters 801, 802, 803, 804, and two non-linear activation functions 805, 806 (which may be respective tanh(·) non-linear activation functions).

(62) FIR filters 801, 802 form finite-impulse-response-(FIR)-based front-end filter 810, with FIR filter 801 receiving inputs 111 while FIR filter 802 receives inputs 121. FIR filters 803, 804 and non-linear activation functions 805, 806 form reduced-complexity non-linear neural network 820. In reduced-complexity non-linear neural network 820, activation function 805 receives the outputs of FIR filter 801 and passes those outputs, after non-linear activation, to FIR filter 803, while activation function 806 receives the outputs of FIR filter 802 and passes those outputs, after non-linear activation, to FIR filter 804. The outputs of FIR filter 803 and FIR filter 804 are combined at summation node 808 to yield the output Y.
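The signal flow of arrangement 800 (a FIR filter, a non-linear activation, and a second FIR filter in each of two branches, with the branch outputs summed) can be sketched in Python as follows. This is a sketch for illustration only: it assumes causal full-rate FIR filters with zero initial state and tanh activations, and the function and parameter names are hypothetical.

```python
import math

def fir(samples, taps):
    """Causal FIR: y[n] = sum_k taps[k] * samples[n - k], zero initial state."""
    return [sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(samples))]

def branch(samples, front_taps, back_taps):
    """Front-end FIR (801 or 802), tanh (805 or 806), then FIR (803 or 804)."""
    hidden = [math.tanh(v) for v in fir(samples, front_taps)]
    return fir(hidden, back_taps)

def equalize(adc1, adc2, front1, front2, back1, back2):
    """Sum the two branch outputs sample by sample to form the output Y."""
    return [a + b for a, b in zip(branch(adc1, front1, back1),
                                  branch(adc2, front2, back2))]
```

Because each activation function receives a single filtered value per sample rather than the full delay line, the non-linear stage operates on far fewer inputs than it would without the front-end filters.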

(63) Another implementation of a reduced-complexity non-linear neural network filter 900, shown in FIG. 9, also is based on a multilayer perceptron (MLP) non-linear neural network filter 902, with a finite-impulse-response-(FIR)-based front-end filter 901. In this implementation 900, finite-impulse-response-(FIR)-based front-end filter 901 includes two FIR filters 911, 921, each of which filters a respective set of inputs 111, 121. The respective outputs of FIR filters 911, 921 are combined by summation node 931.

(64) The outputs 941 of finite-impulse-response-(FIR)-based front-end filter 901 are then filtered by multilayer perceptron (MLP) non-linear neural network filter 902, which includes a non-linear activation function 912 (which may be a tanh () non-linear activation function), followed by FIR filter 922.
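
A minimal Python sketch of implementation 900 follows. The helper `fir`, the function name `filter_900`, and all tap values are hypothetical and serve only to illustrate the order of operations: two front-end FIR filters, one summation, one activation, one output FIR filter.

```python
import numpy as np

def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n-k], same length as x."""
    return np.convolve(x, taps)[: len(x)]

def filter_900(x1, x2, w911, w921, w922):
    """Front-end filter 901 followed by non-linear filter 902."""
    # FIR filters 911 and 921 combined at summation node 931 (outputs 941)
    s = fir(x1, w911) + fir(x2, w921)
    # Activation function 912 (tanh assumed), then FIR filter 922
    return fir(np.tanh(s), w922)

# Hypothetical inputs 111 and 121 and hypothetical tap values.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(16)
y = filter_900(
    x1, x2,
    w911=np.array([0.5, 0.3, -0.1]),
    w921=np.array([0.4, -0.2, 0.1]),
    w922=np.array([1.0, 0.2]),
)
```

Compared with arrangement 800, this sketch evaluates a single activation per output sample, since the two input streams are merged before the non-linearity.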

(65) In a variation 1000 of reduced-complexity non-linear neural network filter 900, shown in FIG. 10, a scalable bypass path 1001 is provided around non-linear neural network filter 902. Scalable bypass path 1001 is controlled by a scaling factor g (1011). FIR filter 922 inherently includes a similar scaling control. The provision of scalable bypass path 1001 allows several modes of operation. First, if g=0, reduced-complexity non-linear neural network filter 1000 operates identically to reduced-complexity non-linear neural network filter 900. Second, by setting g=1, and setting the scaling factor of FIR filter 922 to 0, reduced-complexity non-linear neural network filter 1000 operates as a linear filter. This linear mode may be used as a jump start mode while the non-linear portion of the filter is adapting.
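
The two modes of operation described above can be checked with a short Python sketch. It assumes that the scaled bypass signal is added to the output of FIR filter 922, which is one plausible reading of FIG. 10 and is not stated explicitly in the text; the names `fir` and `filter_1000` and all numeric values are likewise hypothetical.

```python
import numpy as np

def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n-k], same length as x."""
    return np.convolve(x, taps)[: len(x)]

def filter_1000(x1, x2, w911, w921, w922, g):
    """Filter 900 with a bypass path scaled by g around non-linear filter 902."""
    s = fir(x1, w911) + fir(x2, w921)    # outputs 941 of front-end filter 901
    nonlinear = fir(np.tanh(s), w922)    # path through non-linear filter 902
    return nonlinear + g * s, s          # bypass 1001 (assumed to be summed in)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(16)
w911 = np.array([0.5, 0.3, -0.1])
w921 = np.array([0.4, -0.2, 0.1])
w922 = np.array([1.0, 0.2])

# Mode 1: g = 0 reproduces filter 900.
y_g0, s = filter_1000(x1, x2, w911, w921, w922, g=0.0)
y_900 = fir(np.tanh(s), w922)

# Mode 2: g = 1 with FIR filter 922 scaled to zero gives a purely linear filter.
y_lin, _ = filter_1000(x1, x2, w911, w921, np.zeros_like(w922), g=1.0)
```

In the second mode the output equals the front-end output `s`, so the overall response is the linear combination of the two front-end FIR filters.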

(66) In addition, a non-linear function 1100 (particularly one that is close to a linear function 1101) can be approximated as a series of linear functions 1102 of different slopes, as shown in FIG. 11. By varying g to vary the slopes, non-linear function 1100 can be filtered using mostly finite-impulse-response-(FIR)-based front-end filter 901, which is linear, with non-linear neural network filter 902 correcting for the difference between the segmented linear approximation and the actual non-linear function.
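
The idea of splitting a nearly linear function into linear segments plus a small correction can be illustrated numerically. In the sketch below, the function `f`, the breakpoints, and the sample grid are all hypothetical choices made for the example; the per-segment slopes play the role that the text assigns to g, and the residual is the part left for the non-linear path to correct.

```python
import numpy as np

def f(x):
    """Hypothetical mildly non-linear response, close to the line y = x."""
    return x + 0.1 * x**3

# Four linear segments between five breakpoints (hypothetical choice).
breakpoints = np.linspace(-1.0, 1.0, 5)
slopes = np.diff(f(breakpoints)) / np.diff(breakpoints)  # one slope per segment

x = np.linspace(-1.0, 1.0, 401)
segmented = np.interp(x, breakpoints, f(breakpoints))  # piecewise-linear approximation
residual = f(x) - segmented  # difference the non-linear filter would need to correct
```

For a function this close to linear, the residual is much smaller than the function itself, which is why most of the filtering burden can sit in the linear path.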

(67) A similar variation 1200, based on reduced-complexity non-linear neural network filter 800, is shown in FIG. 12. A scalable bypass path 1201 is provided around non-linear neural network filter 820. Scalable bypass path 1201 is controlled by a scaling factor g (1211). FIR filters 803, 804 of non-linear neural network filter 820 inherently include a similar scaling control. By controlling g at 1211, non-linear neural network filter 1200 can be operated in various modes in a manner similar to non-linear neural network filter 1000.

(68) It can be shown that the various implementations of a reduced-complexity non-linear neural network filter shown above provide nearly as good performance as a non-reduced-complexity non-linear neural network filter, particularly when adapted using cross-entropy. However, the reduced complexity provides substantial savings in device area and power consumption.

(69) Although the implementations shown above receive two inputs (as in the case of a TDMR channel), implementations of the subject matter of this disclosure may include channels with only one input, or with three or more inputs. In such cases, the input delay lines may not be divided into groups (in the case of one input), or may be divided into three or more groups (in the case of three or more inputs), rather than being divided into two groups as shown, with each group receiving samples from one of the inputs.

(70) A method 1300 according to implementations of the subject matter of this disclosure is diagrammed in FIG. 13.

(71) Method 1300 begins at 1301 where non-linear equalization of digitized samples of input signals on the data channel is performed, including performing front-end filtering at 1311 to reduce numbers of the inputs from the digitized samples, and performing non-linear filtering at 1321 on the reduced number of inputs from the digitized samples. At 1302, a respective value of each of the output signals is determined from output signals of the non-linear equalization. At 1303, parameters of the non-linear equalization are adapted based on respective ones of the value, and method 1300 ends.
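
The three stages of method 1300 can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the disclosure: the equalizer uses the topology of FIG. 9, the decision stage is a hypothetical binary sign slicer, and adaptation is a single least-mean-squares gradient step applied only to the output FIR taps (the disclosure elsewhere mentions cross-entropy adaptation, which is not implemented here). All function names, parameter names, and values are hypothetical.

```python
import numpy as np

def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n-k], same length as x."""
    return np.convolve(x, taps)[: len(x)]

def equalize(x1, x2, p):
    """Step 1301: front-end filtering (1311), then non-linear filtering (1321)."""
    s = fir(x1, p["w1"]) + fir(x2, p["w2"])  # two input streams reduced to one
    h = np.tanh(s)
    return fir(h, p["w_out"]), h

def decide(y):
    """Step 1302: hypothetical binary slicer mapping each output to +1 or -1."""
    return np.where(y >= 0.0, 1.0, -1.0)

def adapt(p, h, y, d, mu=0.01):
    """Step 1303: one LMS step on the output FIR taps only (an assumption)."""
    e = d - y  # decision-directed error
    n_taps = len(p["w_out"])
    # Correlate the error with the delayed activation outputs, one lag per tap.
    grad = np.array([np.dot(e[k:], h[: len(h) - k]) for k in range(n_taps)])
    return {**p, "w_out": p["w_out"] + mu * grad / len(y)}

rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(64), rng.standard_normal(64)
params = {
    "w1": np.array([0.6, 0.2]),
    "w2": np.array([0.3, -0.1]),
    "w_out": np.array([1.0, 0.0]),
}

y, h = equalize(x1, x2, params)             # 1301
d = decide(y)                               # 1302
mse_before = np.mean((d - y) ** 2)
params = adapt(params, h, y, d)             # 1303
y_after, _ = equalize(x1, x2, params)
mse_after = np.mean((d - y_after) ** 2)
```

With a small step size, the squared error against the same decisions does not increase after the update, which is the expected behavior of a gradient step on a quadratic cost.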

(72) Thus it is seen that a high-speed data channel using a reduced-complexity non-linear equalizer has been provided.

(73) As used herein and in the claims which follow, the construction "one of A and B" shall mean "A or B."

(74) It is noted that the foregoing is only illustrative of the principles of the invention, and that the invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.