Method and apparatus for determination of vectoring matrices

11115080 · 2021-09-07

Abstract

A vectoring controller is configured to determine first coefficient values for a vectoring matrix at a first tone based on a first number of iterations through an iterative update algorithm and a first channel matrix estimate at the first tone, and to determine second coefficient values for the vectoring matrix at a second neighboring tone based on a second number of iterations through the iterative update algorithm and a second channel matrix estimate at the second tone. The vectoring controller is configured to start with the first coefficient values as initial values for the respective second coefficient values in the iterative update algorithm. The second number of iterations is lower than or equal to the first number of iterations.

Claims

1. A vectoring controller configured to determine a vectoring matrix that is used for joint processing of Discrete Multi-Tone DMT communication signals to be transmitted over, or received from, a plurality of subscriber lines, the vectoring controller being configured to determine first coefficient values for the vectoring matrix at a first tone based on a first number of iterations through an iterative update algorithm and based on a first channel matrix estimate at the first tone, and to determine second coefficient values for the vectoring matrix at a second neighboring tone based on a second number of iterations through the iterative update algorithm and based on a second channel matrix estimate at the second tone, wherein the vectoring controller is further configured to start with default coefficient values as initial values for the determination of the first coefficient values through the iterative update algorithm, and to start with the first coefficient values as initial values for the determination of the second coefficient values through the iterative update algorithm, and wherein the vectoring controller is further configured to set the second number of iterations to a value that is lower than the first number of iterations.

2. A vectoring controller according to claim 1, wherein the iterative algorithm is an iterative Minimum Mean Squared Error iMMSE update algorithm, and wherein the vectoring controller is further configured to set the second number of iterations to 1.

3. A vectoring controller according to claim 1, wherein the iterative algorithm is a Schulz update algorithm, and wherein the vectoring controller is further configured to set the second number of iterations to 1, 2 or 3.

4. A vectoring controller according to claim 1, wherein the vectoring controller is further configured to derive the first and second channel matrix estimates from raw Discrete Fourier Transform DFT samples of signals received from the subscriber lines while crosstalk probing signals are being transmitted over the subscriber lines.

5. A vectoring controller according to claim 1, wherein the vectoring controller is further configured to derive the first and second channel matrix estimates from slicer error samples of signals received from the subscriber lines while crosstalk probing signals are being transmitted over the subscriber lines.

6. A vectoring controller according to claim 1, wherein the vectoring controller is further configured to determine third coefficient values for the vectoring matrix at a third further-neighboring tone based on a third number of iterations through the iterative update algorithm and based on a third channel matrix estimate at the third tone, wherein the vectoring controller is further configured to start with the second coefficient values as initial values for the determination of the respective third coefficient values through the iterative update algorithm, and wherein the vectoring controller is further configured to set the third number of iterations to a value that is lower than the first number of iterations.

7. A vectoring controller according to claim 6, wherein the first, second and third tones are tones with increasing or decreasing tone index.

8. A vectoring controller according to claim 7, wherein the first tone is selected from among a set of reference tones.

9. A vectoring controller according to claim 1, wherein the vectoring controller comprises a processor, a fast-access memory and a slower-access memory, wherein the slower-access memory is configured to hold the first channel matrix estimate, wherein the fast-access memory is configured to load the first channel matrix estimate from the slower-access memory, wherein the processor is configured to read the first channel matrix estimate from the fast-access memory, to determine the first coefficient values, and to write the first coefficient values into the fast-access memory, wherein the slower-access memory is configured to load the first coefficient values from the fast-access memory, and to hold the first coefficient values for further configuration of a vectoring processor, and wherein the first coefficient values are retained in the fast-access memory for further determination of the second coefficient values.

10. A vectoring controller according to claim 9, wherein the processor is further configured to determine the second coefficient values, and to substitute the second coefficient values for the first coefficient values in the fast-access memory, and wherein the second coefficient values are retained in the fast-access memory for further determination of the third coefficient values.

11. A vectoring controller according to claim 9, wherein the processor is configured to run multiple threads for determination of coefficient values for the vectoring matrix at respective tones.

12. An access node comprising a vectoring controller according to claim 1.

13. A method for determining a vectoring matrix that is used for joint processing of Discrete Multi-Tone DMT communication signals to be transmitted over, or received from, a plurality of subscriber lines, the method comprising: determining first coefficient values for the vectoring matrix at a first tone based on a first number of iterations through an iterative update algorithm and based on a first channel matrix estimate at the first tone; determining second coefficient values for the vectoring matrix at a second neighboring tone based on a second number of iterations through the iterative update algorithm and based on a second channel matrix estimate at the second tone; wherein the method further comprises starting with default coefficient values as initial values for the determining the first coefficient values through the iterative update algorithm, and starting with the first coefficient values as initial values for the determining the second coefficient values through the iterative update algorithm, and wherein the second number of iterations is set to a value that is lower than the first number of iterations.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above and other objects and features of the invention will become more apparent and the invention itself will be best understood by referring to the following description of an embodiment taken in conjunction with the accompanying drawings wherein:

(2) FIG. 1 represents an overview of an access plant;

(3) FIG. 2 represents further details about an access node;

(4) FIG. 3 is a plot of the performance of the original iMMSE algorithm versus the proposed iMMSE algorithm;

(5) FIG. 4 is a plot of the performance of the original Schulz method versus the proposed Schulz method;

(6) FIG. 5 represents further details about a vectoring controller; and

(7) FIG. 6 represents the readings and writings from and into respective memory units for determination of the vectoring matrices.

DETAILED DESCRIPTION OF THE INVENTION

(8) There is seen in FIG. 1 an access plant 1 comprising a network unit 10 at a Central Office (CO), an access node 20 coupled via one or more optical fibers to the network unit 10, and further coupled via a copper plant to Customer Premises Equipment (CPE) 30 at various subscriber locations. The transmission medium of the copper plant is typically composed of copper Unshielded Twisted Pairs (UTP).

(9) As an illustrative example, the copper plant comprises four subscriber lines L.sub.1 to L.sub.4 sharing a common access segment 40, and then going through dedicated loop segments 50 for final connection to CPEs 30.sub.1 to 30.sub.4 respectively.

(10) Within the common access segment 40, the subscriber lines L.sub.1 to L.sub.4 are in close vicinity and thus induce crosstalk into each other (see the arrows in FIG. 1 between the respective subscriber lines).

(11) The access node 20 comprises a Vectoring Processing Unit 21 (or VPU) for jointly processing the data symbols that are being transmitted over, or received from, the copper plant in order to mitigate the crosstalk and to increase the achievable data rates.

(12) The choice of the vectoring group, i.e. the set of communication lines whose communication signals are to be jointly processed, is rather critical. Within a vectoring group, each communication line is considered as a disturber line inducing crosstalk into the other communication lines of the group, and the same communication line is considered as a victim line incurring crosstalk from the other communication lines of the group. Crosstalk from lines that do not belong to the vectoring group is treated as alien noise and is not canceled. Ideally, the vectoring group should match the whole set of communication lines that physically and noticeably interfere with each other, else limited vectoring gains are to be expected.

(13) There are seen in FIG. 2 further details about an access node 100 and respective CPEs 200.

(14) The access node 100 comprises: N transceivers 110; a Vectoring Processing Unit (VPU) 120; and a Vectoring Control Unit (VCU) 130 for controlling the operation of the VPU 120.

(15) The N transceivers 110 are individually coupled to the VPU 120 and to the VCU 130. The VCU 130 is further coupled to the VPU 120.

(16) The N transceivers 110 individually comprise: a Digital Signal Processor (DSP) 111; and an Analog Front End (AFE) 112.

(17) The N transceivers 110 are coupled to respective N transceivers 210 within CPEs 200 through N respective subscriber lines L.sub.1 to L.sub.N, which for convenience are assumed to form part of the same vectoring group.

(18) The N transceivers 210 individually comprise: a Digital Signal Processor (DSP) 211; and an Analog Front End (AFE) 212.

(19) The AFEs 112 and 212 individually comprise a Digital-to-Analog Converter (DAC) and an Analog-to-Digital Converter (ADC), a transmit filter and a receive filter for confining the signal energy within the appropriate communication frequency bands while rejecting out-of-band interference, a line driver for amplifying the transmit signal and for driving the transmission line, and a Low Noise Amplifier (LNA) for amplifying the receive signal with as little noise as possible.

(20) In case of Frequency Division Duplexing (FDD) operation where downstream and upstream communications operate simultaneously over the same transmission medium in distinct and non-overlapping frequency bands, the AFEs 112 and 212 further comprise a hybrid for coupling the transmitter output to the transmission medium and the transmission medium to the receiver input while achieving a low transmitter-receiver coupling ratio. The AFE may further accommodate echo cancellation filters to reduce the coupling ratio to a further extent.

(21) In case of Time Division Duplexing (TDD) operation where downstream and upstream communications operate over the same frequency band but in distinct and non-overlapping time slots, the hybrid can be advantageously omitted as the transmitter and receiver operate in alternate mode: the receive circuitry is switched OFF (or the receive signal is discarded) while the transmit circuitry is active, and the other way around, the transmit circuitry is switched OFF while the receive circuitry is active.

(22) The AFEs 112 and 212 further comprise impedance-matching circuitry for adapting to the characteristic impedance of the transmission medium, clipping circuitry for clipping any voltage or current surge occurring over the transmission medium, and isolation circuitry (typically a transformer) for DC-isolating the transceiver from the transmission medium.

(23) The DSPs 111 are for encoding and modulating user and control traffic into downstream DMT symbols, and for de-modulating and decoding user and control traffic from upstream DMT symbols.

(24) The following transmit steps are typically performed within the DSPs 111 and 211: data encoding, such as data multiplexing, framing, scrambling, error correction encoding and interleaving; signal modulation, comprising the steps of ordering the tones according to a tone ordering table, parsing the encoded bit stream according to the respective bit loadings of the ordered tones, and mapping each chunk of bits onto an appropriate transmit constellation point (with respective carrier amplitude and phase), possibly with Trellis coding; signal scaling, such as power normalization, transmit PSD shaping and fine gain scaling; Inverse Fast Fourier Transform (IFFT); Cyclic Prefix (CP) insertion; and time-windowing.

(25) The following receive steps are typically performed within the DSPs 111 and 211: time-windowing and CP removal; Fast Fourier Transform (FFT); Frequency EQualization (FEQ); signal de-modulation and detection, comprising the steps of applying to each and every equalized frequency sample an appropriate constellation grid, the pattern of which depends on the respective bit loading, detecting the expected transmit constellation point and the corresponding transmit bit sequence, possibly with Trellis decoding, and re-ordering all the detected chunks of bits according to the tone ordering table; and data decoding, such as data de-interleaving, error correction decoding, de-scrambling, frame delineation and de-multiplexing.

(26) Some of these transmit or receive steps can be omitted, or some additional steps can be present, depending on the exact digital communication technology being used.

(27) The DSPs 111 and 211 are further configured to operate a Special Operation Channel (SOC) for initializing a bi-directional communication session over a subscriber line, and an Embedded Operation Channel (EOC) for transporting diagnosis, management or On-Line Reconfiguration (OLR) commands and responses. The DSPs 111 and 211 are further configured to run the respective management entities for controlling the communication parameters of the various protocol layers in line with a Management Information Base (MIB). For G.fast, the DSPs 111 and 211 are further configured to operate a Robust Management Channel (RMC) for fast adaptation of the TDD framing parameters.

(28) The DSPs 111 are further configured to supply transmit frequency samples u.sub.k to the VPU 120 before Inverse Fast Fourier Transform (IFFT) for joint signal precoding, and to supply receive frequency samples y.sub.k to the VPU 120 after Fast Fourier Transform (FFT) for joint signal post-processing.

(29) The DSPs 111 are further configured to receive pre-compensated transmit samples x.sub.k from the VPU 120 for further transmission, and to receive post-compensated receive samples y′.sub.k from the VPU 120 for further detection. Alternatively, the DSPs 111 may receive correction samples to add to the initial frequency samples before further transmission or detection.

(30) The VPU 120 is configured to mitigate the crosstalk induced over the subscriber lines L.sub.1 to L.sub.N. The VPU 120 comprises a linear precoder configured to multiply a vector u.sub.k of transmit frequency samples with a precoding matrix P.sub.k in order to pre-compensate an estimate of the expected crosstalk, and a linear postcoder configured to multiply a vector of receive frequency samples y.sub.k with a postcoding matrix Q.sub.k so as to post-compensate an estimate of the incurred crosstalk.
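As a non-limiting illustration of the precoding operation, the sketch below multiplies a vector of transmit frequency samples by a precoding matrix for a hypothetical two-line vectoring group at a single tone. The coupling values, the first-order precoder and the helper name mat_vec are assumptions made for this example only.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Assumed channel matrix at one tone: each line leaks 10% into the other.
H = [[1.0, 0.1],
     [0.1, 1.0]]

# Assumed precoder: first-order approximation of the channel inverse.
P = [[1.0, -0.1],
     [-0.1, 1.0]]

u = [1 + 1j, 1 - 1j]      # transmit frequency samples before precoding

y_raw = mat_vec(H, u)     # what the receivers see without precoding
x = mat_vec(P, u)         # pre-compensated transmit samples
y = mat_vec(H, x)         # what the receivers see with precoding

print("without precoding:", y_raw)
print("with precoding:   ", y)
```

In this example the precoded receive samples are the transmit samples scaled by a common factor of 0.99, whereas the unprecoded receive samples are distorted by the crosstalk term.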

(31) In the matrix P.sub.k or Q.sub.k, a row i is associated with a particular victim line L.sub.i, while a column j is associated with a particular disturber line L.sub.j.

(32) The VCU 130 is basically for controlling the operation of the VPU 120, and more specifically for estimating the channel couplings between the respective subscriber lines of the vectoring group, and for initializing and updating the coefficients of the precoding matrix P.sub.k and of the postcoding matrix Q.sub.k from the so-estimated channel couplings.

(33) The various channel couplings are estimated based on pilot signals (a.k.a. crosstalk probing signals) transmitted over the vectored lines. The pilot signals are typically transmitted during dedicated time periods and/or over dedicated tones.

(34) For instance, in G.993.5 ITU recommendation (vectored VDSL2), the transceiver units send pilot signals on the so-called SYNC symbols. The SYNC symbols occur periodically after every super frame, and are transmitted synchronously over all the vectored lines (super frame alignment). A similar technique has been adopted in G.fast.

(35) On a given disturber line, a subset of the tones of a SYNC symbol (pilot tones hereinafter) are all 4-QAM modulated by the same pilot digit from a given pilot sequence, and transmit one of two complex constellation points, either ‘1+j’ corresponding to ‘+1’ or ‘−1−j’ corresponding to ‘−1’ (vectored VDSL2); or transmit one of three complex constellation points, either ‘1+j’ corresponding to ‘+1’ or ‘−1−j’ corresponding to ‘−1’ or ‘0+0j’ corresponding to ‘0’ (G.fast).

(36) On a given victim line, both the real and imaginary part of the received DFT sample before equalization (G.fast), or of the normalized slicer error, which is the difference vector between the received and properly equalized DFT sample and the constellation point onto which this DFT sample is demapped (vectored VDSL2 and G.fast), are measured on a per pilot tone basis and reported to the VCU 130 for estimation of the various channel couplings.

(37) The successive error samples gathered over a given victim line are next correlated with the pilot sequence used over a given disturber line in order to obtain an estimate of the channel coupling from the given disturber line into the given victim line. To reject the crosstalk contributions from the other disturber lines, the pilot sequences used over the respective disturber lines are mutually orthogonal (e.g., Walsh-Hadamard sequences).

(38) The channel estimates are eventually used for initializing or updating the coefficients of the precoding matrix P.sub.k or of the postcoding matrix Q.sub.k.

(39) Presently, the VCU 130 starts first by configuring the transceivers 110 and 210 with the respective pilot sequences to use for modulation of the pilot tones of the SYNC symbols. The pilot sequences comprise T pilot digits using {+1, −1} or {+1, 0, −1} as alphabet. The pilot digit that modulates a given tone k during pilot symbol position t over line L.sub.i is denoted as w.sub.i,k.sup.t.

(40) The SYNC symbols are not processed through the VPU 120 in order to target the channel matrix per se.

(41) The VCU 130 next gathers measurement samples as measured by the transceivers 110 and 210 while SYNC symbols are being transmitted. The measurement sample as measured by the transceiver 110i or 210i over a victim line L.sub.i at tone k during pilot symbol position t is denoted as e.sub.i,k.sup.t.

(42) The VCU 130 correlates T measurement samples e.sub.i,k.sup.t, with t=t.sub.0 . . . t.sub.0+T−1, as measured over a given victim line L.sub.i during a complete acquisition cycle with the T pilot digits w.sub.j,k.sup.t, with t=t.sub.0 . . . t.sub.0+T−1, of the pilot sequence used over a given disturber line L.sub.j so as to obtain an estimate of the channel coupling h.sub.ij,k from the disturber line L.sub.j into the victim line L.sub.i at frequency index k. As the pilot sequences are mutually orthogonal, the contributions from the other disturber lines reduce to zero after this correlation step.
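The correlation step can be sketched as follows for a single tone and a single victim line. This is a simplified, noise-free model in which each measurement sample is the sum of the couplings weighted by the pilot digits themselves rather than by the 4-QAM constellation points; the coupling values and the function names hadamard and estimate_coupling are assumptions made for illustration.

```python
def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def estimate_coupling(samples, pilots):
    """Correlate T measurement samples with the T pilot digits of one disturber."""
    T = len(pilots)
    return sum(e * w for e, w in zip(samples, pilots)) / T

N = 4                     # number of lines, also the pilot sequence length T here
W = hadamard(N)           # W[j][t]: pilot digit sent on line j at position t

# Assumed couplings from lines 0..3 into victim line 0 (index 0 is the direct gain).
h_true = [1.0, 0.05 - 0.02j, -0.03 + 0.01j, 0.02j]

# Measurement sample on the victim line at each pilot symbol position t.
e = [sum(h_true[j] * W[j][t] for j in range(N)) for t in range(N)]

# One correlation per disturber line; orthogonality removes the other lines.
h_est = [estimate_coupling(e, W[j]) for j in range(N)]
print(h_est)
```

Because the rows of the Walsh-Hadamard matrix are mutually orthogonal, each correlation isolates exactly one coupling in this noise-free setting.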

(43) The VCU 130 determines estimates H.sub.k of the channel matrix or of the normalized channel matrix at respective tones k based on these correlation results. The nominal channel matrix is derived from a measure of the raw receive signals before equalization, whereas the normalized channel matrix—normalization is with respect to the direct channel gains—is derived from a measure of the slicer errors after channel equalization.

(44) The VCU 130 is configured to determine the coefficients of the precoding matrix P.sub.k or postcoding matrix Q.sub.k at respective tones k based on the estimates H.sub.k of the channel matrix at respective tones k, and by means of an iterative update algorithm.

(45) The VCU 130 first selects a set of reference tones k.sub.REF={k.sub.1, k.sub.2, . . . } from all the available tones. A reference tone is a tone whose precoder or postcoder is initialized with no neighboring channel information (e.g., initialized with the identity matrix I), and where a relatively large number of iterations through the iterative update algorithm are allowed if necessary.

(46) The reference tones can be evenly spaced through the entire communication bandwidth. Alternatively, the spacing between the reference tones can be a function of the channel coherence, for instance higher spacing for the low-frequency range (well-conditioned channel with large coherence bandwidth) and closer spacing for the high-frequency range (ill-conditioned channel with narrow coherence bandwidth).

(47) For convenience, it is further assumed that 1 . . . K are the K available tones, that tones are processed in ascending order from 1 to K, and that tone k=1 belongs to the set of reference tones k.sub.REF in order to boot-up the iterative algorithm.

(48) For each reference tone k.sub.i∈k.sub.REF, the VCU 130 initializes the coefficients of the precoding matrix P.sub.ki or postcoding matrix Q.sub.ki to some default value, typically the identity matrix I, and next determines some values for those coefficients based on the channel matrix estimate H.sub.ki, and by means of successive iterations through the iterative update algorithm till some convergence criterion is met. The values computed for the coefficients of the precoding matrix P.sub.ki or postcoding matrix Q.sub.ki are then re-input to the iterative algorithm and used as initial starting values for determination of the coefficients of the precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 at next tone k.sub.i+1. The VCU 130 is then able to determine the precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 based on the channel matrix estimate H.sub.ki+1, and by means of one or two iterations at most through the iterative update algorithm. In turn, the values computed for the coefficients of the precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 are re-input to the iterative algorithm and used as initial starting values for determination of the coefficients of the precoding matrix P.sub.ki+2 or postcoding matrix Q.sub.ki+2 at next tone k.sub.i+2, and so forth with the subsequent tones till a new reference tone is met.

(49) The VCU 130 can also process the tones in decreasing order, starting from a reference tone k.sub.i, and use the precoding matrix P.sub.ki or postcoding matrix Q.sub.ki computed at tone k.sub.i as input at tone k.sub.i−1, and so forth.

(50) Also, the VCU 130 does not need to run the iterative algorithm on each and every tone. Instead, the VCU 130 can use the proposed method for non-consecutive tones provided they are not too far apart from each other (i.e., within the channel coherence bandwidth), and then rely on interpolation to determine the vectoring coefficients at the intermediate tones in-between.
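The interpolation type is not specified here. As one possible sketch, assuming plain linear interpolation between two computed tones, the coefficients at an intermediate tone could be obtained as follows; the function name interpolate_matrix and the numerical values are hypothetical.

```python
def interpolate_matrix(P_a, P_b, k_a, k_b, k):
    """Linearly interpolate coefficients at tone k, with k_a <= k <= k_b.

    P_a and P_b are the matrices (lists of rows) computed at tones k_a and k_b.
    Linear interpolation is an assumption of this sketch.
    """
    w = (k - k_a) / (k_b - k_a)
    return [[(1 - w) * a + w * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(P_a, P_b)]

# Assumed precoders computed by the iterative algorithm at tones 10 and 14.
P_10 = [[1.0, -0.10], [-0.10, 1.0]]
P_14 = [[1.0, -0.14], [-0.14, 1.0]]

# Coefficients at the intermediate tone 12, without running any iteration there.
P_12 = interpolate_matrix(P_10, P_14, 10, 14, 12)
print(P_12)
```

Such a scheme is only meaningful when the two computed tones lie within the channel coherence bandwidth, as stated above.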

(51) One benefit of using multiple reference tones is the ability to parallelize: multiple threads can be executed in parallel, each thread starting with a reference tone of the set k.sub.REF and going through all successive tones up to the next reference tone. Alternative multi-threading schemes can be used as well.

(52) Another benefit of using multiple reference tones is the improved robustness: by having more than one start point, we prevent the iterative algorithm from getting stuck in a local optimum for most of the bandwidth.
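A minimal sketch of this partitioning is given below, assuming that tone 1 is a reference tone and that each thread handles one run of tones. The worker shown is only a placeholder that counts the worst-case iteration budget of its run; tone_segments, segment_budget and the budget values are illustrative names and numbers, not part of the described apparatus.

```python
from concurrent.futures import ThreadPoolExecutor

def tone_segments(num_tones, ref_tones):
    """Split tones 1..num_tones into runs, each starting at a reference tone."""
    refs = sorted(ref_tones)
    bounds = refs[1:] + [num_tones + 1]
    return [range(start, stop) for start, stop in zip(refs, bounds)]

def segment_budget(segment, ref_iters=5, follow_iters=1):
    """Placeholder worker: worst-case iteration count for one run of tones.

    A real worker would run the iterative update on every tone of the run,
    seeding each tone with the result of the previous one.
    """
    return ref_iters + follow_iters * (len(segment) - 1)

segments = tone_segments(10, [1, 5, 8])

# One task per run; the runs are independent, so they can execute in parallel.
with ThreadPoolExecutor() as pool:
    budgets = list(pool.map(segment_budget, segments))

print([list(s) for s in segments])
print(budgets)
```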

(53) In one embodiment, the VCU 130 makes use of the iMMSE algorithm to determine the precoding matrix P.sub.k or postcoding matrix Q.sub.k. A good description of the iMMSE algorithm is given in the paper entitled “Weighted Sum-Rate Maximization using Weighted MMSE for MIMO-BC Beamforming Design” from Christensen et al. published in IEEE Transactions on Wireless Communications, vol. 7, No 12 in December 2008.

(54) A pseudo-code for computing the precoding matrix P.sub.k by means of the iMMSE algorithm is given as follows:

(55) For k=1:K

(56) If k∈k.sub.REF

(57) Then i.sub.MAX=5; P.sub.k=I; # i.sub.MAX=5 is an exemplary value

(58) Else i.sub.MAX=1; P.sub.k=P.sub.k−1;

(59) End;

(60) i=0;

(61) Do i++;
C.sub.k=H.sub.kP.sub.k;
d.sub.k=diag(C.sub.k);
n.sub.k=sum(abs(C.sub.k).sup.2,2)−abs(d.sub.k).sup.2+diag(K.sub.zz,k);
R.sub.k=DIAG(d.sub.k.sup.H/(abs(d.sub.k).sup.2+n.sub.k));
W.sub.k=DIAG(1+(abs(d.sub.k).sup.2 ./ n.sub.k));
α=trace(W.sub.kR.sub.kK.sub.zz,kR.sub.k.sup.H)/Nm.sub.k;
P.sub.k=(H.sub.k.sup.HR.sub.k.sup.HW.sub.kR.sub.kH.sub.k+αI).sup.−1H.sub.k.sup.HR.sub.k.sup.HW.sub.k;
P.sub.k=sqrt(Nm.sub.k/trace(P.sub.kP.sub.k.sup.H))P.sub.k;

(62) Until Conv_Criteria OR (i==i.sub.MAX);

(63) End;

(64) The mathematical notations used in this pseudo-algorithm read as follows: A.sup.H denotes the Hermitian (i.e., the conjugate transpose) of matrix A; trace(A) denotes the trace of matrix A; diag(A) picks up the diagonal coefficients of matrix A and outputs a vector; DIAG(a) outputs a diagonal matrix with the coefficients of vector a as diagonal coefficients and with zeros as off-diagonal coefficients; abs(a) denotes the coefficient-wise complex magnitude operator; ./ denotes the coefficient-wise division operator; sum(A,2) denotes the summation of the row elements of matrix A and outputs a vector; and sqrt(a) denotes the square root operator for scalar a.

(65) In this pseudo-algorithm, C.sub.k=H.sub.kP.sub.k is the concatenated channel matrix; d.sub.k are the direct channel gains of the concatenated channel (i.e., the diagonal elements of matrix C.sub.k); K.sub.zz,k=E(z.sub.kz.sub.k.sup.H) denotes the noise covariance matrix; R.sub.k is the optimal linear MMSE receive filter to be used at receive side in conjunction with precoding matrix P.sub.k at transmit side to achieve the optimal aggregate data rate; m.sub.k is the discrete transmit power mask to comply with; i is the iteration index; i.sub.MAX is the maximum number of iterations allowed irrespective of whether the convergence criterion is fulfilled or not; and Conv_Criteria is a Boolean determining whether the convergence criterion is fulfilled with the newly-updated precoding matrix P.sub.k (TRUE) or not (FALSE).

(66) As an example of a convergence criterion, one may compute the Frobenius norm sqrt(trace((P.sub.k.sup.(i)−P.sub.k.sup.(i−1))(P.sub.k.sup.(i)−P.sub.k.sup.(i−1)).sup.H)) between two successive iterations i−1 and i, and test whether this norm is less than a given threshold ε. If so, then additional iterations are not expected to substantially change the values of the precoding coefficients, and convergence towards the optimal value is assumed to be achieved.
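This test can be sketched as follows, using the fact that trace(AA.sup.H) equals the sum of the squared magnitudes of the coefficients of A. The function names, the threshold values and the example matrices are assumptions chosen for illustration.

```python
import math

def frobenius_distance(P_new, P_old):
    """Frobenius norm of P_new - P_old, i.e. sqrt(trace(D D^H)) with D the difference."""
    return math.sqrt(sum(abs(a - b) ** 2
                         for row_new, row_old in zip(P_new, P_old)
                         for a, b in zip(row_new, row_old)))

def has_converged(P_new, P_old, eps):
    """True when two successive iterates differ by less than the threshold eps."""
    return frobenius_distance(P_new, P_old) < eps

# Assumed precoders at iterations i-1 and i: only one coefficient has moved.
P_prev = [[1.0, 0.0], [0.0, 1.0]]
P_curr = [[1.0, 0.003 + 0.004j], [0.0, 1.0]]

distance = frobenius_distance(P_curr, P_prev)   # magnitude of 0.003 + 0.004j
print(distance, has_converged(P_curr, P_prev, eps=1e-2))
```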

(67) The prior art solution initializes every tone with the identity matrix I (no channel knowledge), and iterates up to 10 times before convergence. In our proposal, only the reference tones are initialized with the identity matrix I. For the subsequent tones, channel knowledge is already built into the precoder as calculated for the previous tones. Because of tone correlation, initializing the iterative algorithm for tone k+1 with the precoder at tone k maintains performance and reduces the computational cost by up to a factor of 10 in comparison to the original algorithm.

(68) We can even compromise on performance at the reference tones and still expect the algorithm to gradually converge to an optimal value for the subsequent tones as channel knowledge is building up across neighboring tones. Then a lower number of iterations may be sufficient too for the reference tones, like in the pseudo-code above where 5 iterations at most are used for the reference tones.

(69) Note that, for lower tones, the VDSL2 and G.fast bands coincide. For these lower tones we can count on the structure of the channel. Particularly for the low-frequency tones, the iMMSE algorithm above basically converges to the ZF precoder in one iteration only.

(70) The performance of the original method and of the algorithm above is illustrated in FIG. 3, which represents the data rate performance of a 48-user G.fast 212 MHz communication system. The achieved data rates are plotted for respective maximum numbers of iterations in the original iMMSE and for the proposed algorithm.

(71) The number of reference tones needed to achieve good performance is rather small. For the performance of FIG. 3, only 80 equally-spaced tones out of the 4096 tones were used as reference tones. In each of them, a maximum of 5 iterations were allowed.

(72) We observe that the original iMMSE algorithm converges slowly, in about 10 iterations. The proposed algorithm performs almost equally well, with a 10-fold decrease in computational cost.
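The order of magnitude of this saving can be checked with a worst-case iteration count. The sketch below assumes that every iteration has the same cost, that the original algorithm spends 10 iterations on each of the 4096 tones, and that the proposed algorithm spends 5 iterations on each of the 80 reference tones and 1 iteration on every other tone; early termination on convergence is ignored.

```python
NUM_TONES = 4096          # tones in the example of FIG. 3
NUM_REF_TONES = 80        # equally-spaced reference tones

ORIGINAL_ITERS = 10       # per tone, original algorithm (assumed worst case)
REF_ITERS = 5             # per reference tone, proposed algorithm
FOLLOW_ITERS = 1          # per non-reference tone, proposed algorithm

original_total = NUM_TONES * ORIGINAL_ITERS
proposed_total = (NUM_REF_TONES * REF_ITERS
                  + (NUM_TONES - NUM_REF_TONES) * FOLLOW_ITERS)

ratio = original_total / proposed_total
print(original_total, proposed_total, round(ratio, 2))
```

Under these assumptions the totals are 40960 and 4416 iterations respectively, a ratio of roughly 9.3, which is consistent with the approximately 10-fold decrease stated above.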

(73) For the postcoding matrix Q.sub.k, the iMMSE algorithm yields the linear MMSE receive filter, and can be computed in one shot without any iteration.

(74) In another alternative embodiment, the VCU 130 makes use of the Schulz method to determine the precoding matrix P.sub.k or postcoding matrix Q.sub.k.

(75) The Schulz method has been designed for computing matrix inverses through successive iterations, and thus is especially suited for ZF precoding or postcoding.

(76) For ZF precoding, the channel inverse is computed through the following iterative update formula:
P.sub.k=P.sub.k(2I−H.sub.kP.sub.k).

(77) This formula is guaranteed to converge to the inverse of H.sub.k when the initial P.sub.k is close enough to the inverse, or when the initial P.sub.k is chosen as P.sub.k=αH.sub.k.sup.H where α is taken in the interval [0, 2/ρ(H.sub.kP.sub.k−I)] with ρ referring to any function that upper bounds the spectral radius. Given that the channel does not change much from tone to tone, we exploit the best solution of the previous tone as an initial value for the current tone. Simulations demonstrate that this speeds up convergence by a factor of 10.

(78) A pseudo-algorithm for computing the precoding matrix P.sub.k by means of the iterative Schulz method is given as follows:

(79) For k=1:K

(80) If k∈k.sub.REF

(81) Then i.sub.MAX=20; # exemplary value
P.sub.k=H.sub.k.sup.H;
e=∥H.sub.kP.sub.k−I∥.sub.F;
If e≥1 Then α=1/(1+e).sup.2; P.sub.k=αH.sub.k.sup.H; End

(82) Else i.sub.MAX=2; P.sub.k=P.sub.k−1;

(83) End;

(84) i=0;

(85) Do i++; P.sub.k=P.sub.k(2I−H.sub.kP.sub.k)

(86) Until Convergence Criteria OR (i==i.sub.MAX);

(87) End

(88) wherein ∥A∥.sub.F=sqrt(trace(AA.sup.H)) denotes the Frobenius norm of matrix A.
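The pseudo-algorithm above can be sketched in executable form for two neighboring tones as follows. The 2×2 channel estimates are hypothetical values, the helper names are illustrative, and the stopping test on the residual ∥H.sub.kP.sub.k−I∥.sub.F is an assumption of this sketch, the convergence criterion being left open in the pseudo-algorithm.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def hermitian(A):
    """Conjugate transpose of A."""
    return [[complex(v).conjugate() for v in col] for col in zip(*A)]

def residual_norm(H, P):
    """Frobenius norm of H P - I."""
    C = matmul(H, P)
    return math.sqrt(sum(abs(c - (i == j)) ** 2
                         for i, row in enumerate(C) for j, c in enumerate(row)))

def schulz_step(H, P):
    """One Schulz update P <- P (2I - H P)."""
    C = matmul(H, P)
    D = [[2 * (i == j) - c for j, c in enumerate(row)] for i, row in enumerate(C)]
    return matmul(P, D)

def reference_init(H):
    """Reference-tone start: H^H, scaled by 1/(1+e)^2 when the residual e is >= 1."""
    P = hermitian(H)
    e = residual_norm(H, P)
    if e >= 1:
        alpha = 1 / (1 + e) ** 2
        P = [[alpha * v for v in row] for row in P]
    return P

def schulz(H, P, i_max, eps=1e-12):
    """Iterate until the residual drops below eps or i_max iterations are spent."""
    for _ in range(i_max):
        P = schulz_step(H, P)
        if residual_norm(H, P) < eps:
            break
    return P

# Assumed channel estimates at a reference tone and at its neighbor.
H1 = [[1.0, 0.10 + 0.05j], [0.08 - 0.02j, 1.0]]
H2 = [[1.0, 0.11 + 0.05j], [0.08 - 0.03j, 1.0]]

P1 = schulz(H1, reference_init(H1), i_max=20)       # reference tone
P2_warm = schulz(H2, P1, i_max=2)                   # seeded with the tone-1 result
P2_cold = schulz(H2, reference_init(H2), i_max=2)   # same budget, no seeding

warm_residual = residual_norm(H2, P2_warm)
cold_residual = residual_norm(H2, P2_cold)
print(warm_residual, cold_residual)
```

With the same budget of two iterations, the seeded run ends with a much smaller residual than the unseeded one in this example, since the residual is squared at every Schulz update and the seeded run starts from a smaller residual.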

(89) For ZF postcoding, the channel inverse is computed through the following iterative update formula:
Q.sub.k=(2I−Q.sub.kH.sub.k)Q.sub.k.

(90) A pseudo-algorithm for computing the postcoding matrix Q.sub.k by means of the iterative Schulz method is given as follows:

(91) For k=1:K

(92) If k∈k.sub.REF

(93) Then i.sub.MAX=20;
Q.sub.k=H.sub.k.sup.H;
e=∥Q.sub.kH.sub.k−I∥.sub.F;
If e≥1 Then α=1/(1+e).sup.2; Q.sub.k=αH.sub.k.sup.H; End

(94) Else i.sub.MAX=2; Q.sub.k=Q.sub.k−1;

(95) End;

(96) i=0;

(97) Do i++; Q.sub.k=(2I−Q.sub.kH.sub.k)Q.sub.k

(98) Until Convergence Criteria OR (i==i.sub.MAX);

(99) End

(100) FIG. 4 plots simulation results comparing the perfect inverse method with the proposed Schulz iterative method exploiting tone correlation. The simulations demonstrate that only two iterations per tone are necessary to converge to within 1% of the rate achieved with a perfect inverse. The Schulz iterations are particularly attractive in a system where matrix multiplication operations are heavily accelerated, either by straightforward hardware acceleration or by increased parallelization.

(101) The high-level hardware architecture for the implementation of the algorithms above is depicted in FIG. 5, wherein further details about the VPU 120 and the VCU 130 are shown.

(102) The VPU 120 comprises a vectoring processor 121 for jointly processing the transmit user samples and the receive data samples, and a working memory 122 (or M3 memory) wherein the precoding and postcoding coefficients to be used for crosstalk mitigation are stored.

(103) The VCU 130 is shown as comprising a generic Central Processing Unit (CPU) 131, and a slow-access memory 132 (or M2 memory), such as DDR memory. The VCU 130 further comprises a dedicated processing unit 133 with hardware acceleration for efficiently computing the precoding matrix P.sub.k or postcoding matrix Q.sub.k, such as a Digital Signal Processor (DSP), and a fast-access memory 134 (or M1 memory), such as Level 1 (L1) cache memory.

(104) The processing units 121, 131 and 133 are coupled to the memory units 122, 132 and 134 through a memory bus. The CPU 131 is further coupled to the dedicated processor 133 and to the vectoring processor 121.

(105) The configuration of the VPU 120 is two-fold: first, the coefficients of the precoding matrix P.sub.k or of the postcoding matrix Q.sub.k are computed within the VCU's internal memory 132 and 134. Second, the newly-computed vectoring coefficients are pushed by the VCU 130 into the VPU's working memory 122. In order not to disturb the VPU operation, the new vectoring coefficients are written within an unused memory area of the memory 122. Then, from a given DMT symbol onwards, a pointer pointing towards the active set of vectoring coefficients to be used by the VPU 120 as effective precoding matrix P.sub.k or postcoding matrix Q.sub.k is switched towards the memory area where the new vectoring coefficients have been written to, thereby releasing the memory area where the previous vectoring coefficients were stored and allowing smooth transition between the two sets of vectoring coefficients. And so forth with the next VPU update. The VPU 120 can be updated on a per-tone basis or on a per group of tones basis.
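The pointer switch between two memory areas described above can be sketched as a simple double-buffering scheme. This is only an illustrative model under stated assumptions: the class name `CoefficientMemory` and its methods are hypothetical, and plain Python lists stand in for the memory areas of the working memory 122.

```python
class CoefficientMemory:
    """Hypothetical model of a working memory with two coefficient areas."""

    def __init__(self, initial):
        # Area 0 holds the active coefficients; area 1 is initially unused.
        self._areas = [list(initial), [None] * len(initial)]
        self._active = 0

    def active(self):
        """Coefficients currently used for precoding or postcoding."""
        return self._areas[self._active]

    def write_inactive(self, coefficients):
        """Write new coefficients into the unused area without touching the active one."""
        self._areas[1 - self._active] = list(coefficients)

    def switch(self):
        """Switch the pointer; the previously active area becomes free for the next update."""
        self._active = 1 - self._active
```

A typical update would call `write_inactive(new_coefficients)` while processing continues with the old set, then `switch()` at the chosen DMT symbol boundary.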

(106) There is further shown in FIG. 5 a Direct Memory Access (DMA) controller 140 coupled to the memory units 122, 132 and 134 for optimal data transfer therebetween. The DMA controller 140 can be triggered on purpose whenever a large block of data needs to be transferred from one location in the memories 122, 132 or 134 to another without involving the processors in this task. The presence of the DMA controller 140 is optional.

(107) FIG. 6 shows the successive write and read operations in the respective memory units during the determination of the precoding matrix P.sub.k or postcoding matrix Q.sub.k at successive tones.

(108) First, the channel matrix at the respective tones is estimated as aforementioned, and the channel matrix estimates H.sub.k at the respective tones k are stored in the slow-access memory M2 for further use.

(109) The channel estimate H.sub.ki at reference tone k.sub.i is loaded from the slow-access memory M2 into the fast-access memory M1. This task can be performed directly by the CPU 131, or by the DMA controller 140 upon trigger from the CPU 131.

(110) The CPU 131 then makes a call to the dedicated processor 133 with, as input parameter, a pointer pointing towards the channel matrix estimate H.sub.ki in memory M1. Thereupon, the dedicated processor 133 reads the channel estimate H.sub.ki from the fast-access memory M1, and determines an optimal precoding matrix P.sub.ki or postcoding matrix Q.sub.ki by means of successive iterations through the iterative update algorithm. The respective numbers of iterations through the iterative update algorithm are indicated along loop-back circles for illustrative purposes. Presently, a maximum of 5 iterations was used for determining the vectoring coefficients at reference tone k.sub.i (see “n.sub.1=5” in FIG. 6).

(111) While the precoding matrix P.sub.ki or postcoding matrix Q.sub.ki is being computed, the channel matrix estimate H.sub.ki+1 at next tone k.sub.i+1 is already loaded from the slow-access memory M2 into the fast-access memory M1.

(112) When computation completes, the dedicated processor 133 writes the computed precoding matrix P.sub.ki or postcoding matrix Q.sub.ki at reference tone k.sub.i into the fast-access memory M1, and returns the call from the CPU 131 with as output parameter, a pointer pointing towards the computed precoding matrix P.sub.ki or postcoding matrix Q.sub.ki in memory M1.

(113) The computed precoding matrix P.sub.ki or postcoding matrix Q.sub.ki is then loaded from the fast-access memory M1 into the slow-access memory M2 for further configuration of the VPU 120. Again this task can be performed directly by the CPU 131 or by the dedicated processor 133, or by the DMA controller 140 upon trigger from the CPU 131 or the dedicated processor 133.

(114) With the proposed scheme, the computed precoding matrix P.sub.ki or postcoding matrix Q.sub.ki at tone k.sub.i is held in the fast-access memory M1 and re-input to the dedicated processor 133 for determination of the precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 at neighboring tone k.sub.i+1.

(115) The CPU 131 then makes a call to the dedicated processor 133 with as first input parameter, a first pointer pointing towards the channel matrix estimates H.sub.ki+1 in memory M1, and as second input parameter, another pointer pointing towards the precoding matrix P.sub.ki or postcoding matrix Q.sub.ki in memory M1 as previously determined by the dedicated processor 133 at previous tone k.sub.i.

(116) The latter is used as initial starting value in the iterative update algorithm for determination of the precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 at tone k.sub.i+1, thereby substantially reducing the required number of iterations. Presently, only one iteration was used for determining the vectoring coefficients at neighboring tones k.sub.i+1 (see “n.sub.2=1” in FIG. 6).

(117) The dedicated processor 133 returns the computed precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 at neighboring tone k.sub.i+1, which is again transferred from the fast-access memory M1 to the slow-access memory M2 for further configuration of the VPU 120. And again, the computed precoding matrix P.sub.ki+1 or postcoding matrix Q.sub.ki+1 at tone k.sub.i+1 is held in the fast-access memory M1 and re-input to the dedicated processor 133 for determination of the precoding matrix P.sub.ki+2 or postcoding matrix Q.sub.ki+2 at next tone k.sub.i+2. And so forth with the subsequent tones as shown in FIG. 6.
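The sequence of loads, computations and write-backs described above can be sketched as follows. This is a simplified model under loudly stated assumptions: each channel estimate is a complex scalar (a 1×1 matrix) so that the example stays short, a Python dict stands in for the fast-access memory M1, lists stand in for the slow-access memory M2, and the names `schulz_scalar` and `run_pipeline` are hypothetical. Python executes the steps sequentially, so the overlap between the prefetch and the computation is only indicated by the ordering of the log.

```python
def schulz_scalar(h, p, iterations):
    """Schulz update p <- p (2 - h p) for a 1x1 channel."""
    for _ in range(iterations):
        p = p * (2 - h * p)
    return p


def run_pipeline(m2_channels, n1=5, n2=1):
    """Model the M2 -> M1 -> processor -> M1 -> M2 flow over successive tones.

    m2_channels : scalar channel estimates held in slow-access memory M2
    n1, n2      : iterations at the reference tone and at neighboring tones
    """
    m1 = {}            # fast-access memory M1 (assumed dict model)
    m2_precoders = []  # results written back to slow-access memory M2
    log = []

    m1["h_next"] = m2_channels[0]
    log.append("load H[0] M2->M1")
    for k in range(len(m2_channels)):
        m1["h"] = m1.pop("h_next")
        if k + 1 < len(m2_channels):
            # In hardware this prefetch would run while tone k is being computed.
            m1["h_next"] = m2_channels[k + 1]
            log.append(f"load H[{k + 1}] M2->M1")
        if k == 0:
            start, n = m1["h"].conjugate(), n1   # default initial value
        else:
            start, n = m1["p"], n2               # previous tone's result, still in M1
        m1["p"] = schulz_scalar(m1["h"], start, n)
        log.append(f"compute P[{k}] with {n} iteration(s)")
        m2_precoders.append(m1["p"])
        log.append(f"store P[{k}] M1->M2")
    return m2_precoders, log
```

The previous result stays in M1 between calls, which is what allows the neighboring tones to start from it without a round trip through M2.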

(118) For an optimal data transfer, the communication between the slow-access and fast-access memory units should be symmetrical, but its throughput is limited by the bus width and the memory technology used (e.g., single port, dual port, etc.). Depending on the memory technology used, reading and writing into the fast-access memory unit M1 could be done sequentially, or in parallel, or a mix of the two.

(119) It is to be noticed that the term ‘comprising’ should not be interpreted as being restricted to the means listed thereafter. Thus, the scope of the expression ‘a device comprising means A and B’ should not be limited to devices consisting only of components A and B. It means that with respect to the present invention, the relevant components of the device are A and B.

(120) It is to be further noticed that the term ‘coupled’ should not be interpreted as being restricted to direct connections only. Thus, the scope of the expression ‘a device A coupled to a device B’ should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B, and/or vice-versa. It means that there exists a path between an output of A and an input of B, and/or vice-versa, which may be a path including other devices or means.

(121) The description and drawings merely illustrate the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass equivalents thereof.

(122) The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, a processor should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, Digital Signal Processor (DSP) hardware, network processor, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), etc. Other hardware, conventional and/or custom, such as Read Only Memory (ROM), Random Access Memory (RAM), and non-volatile storage, may also be included.