Dual QR decomposition decoder for spatially multiplexed MIMO signals

09577736 ยท 2017-02-21

Assignee

Inventors

Cpc classification

International classification

Abstract

Wireless communication systems employ Multiple Input Multiple Output (MIMO) transmission and reception schemes to increase performance and the data rate of the system. A new approach for an SM-MIMO decoder that operates on the received symbols in parallel is presented. The new approach performs two different QR decompositions of the estimated channel matrix and produces two triangular matrices; one is right triangular and the other is left triangular. The modified systems of equations are processed in parallel. After each M-algorithm process has processed half of the total number of stages, total search space for the globally optimal transmitted symbol vector is reduced significantly. Finally, cumulative distance metrics are computed for the symbol sequences in the reduced search space and a global minimum is determined for the estimated transmitted symbol vector. This approach offers faster processing of the SM-MIMO signals and reduced distance metric computations and search operations.

Claims

1. A method of decoding spatially multiplexed signals received by a wireless device, the method comprising: receiving, using a plurality of receive chains, spatially multiplexed signals including a plurality of symbols from a transmitting device; deriving, using one or more processing devices, an estimated channel matrix H from the plurality of received symbols; decomposing, using the one or more processing devices, the estimated channel matrix H into first and second unitary matrices Q.sup.1 and Q.sup.2, and first and second triangular matrices R.sup.1 and R.sup.2, wherein R.sup.1 is an upper right triangular matrix and R.sup.2 is a lower left triangular matrix; applying, using the one or more processing devices, a first M-algorithm process to a bottom set of N.sub.t/2 rows of a system of equations y.sup.1=R.sup.1s+w.sup.1 to obtain a first set of M candidates, wherein N.sub.t identifies a number of receive chains, y.sup.1 is a first rotated received signal vector, s is a transmitted symbol vector and w.sup.1 is a first rotated noise vector; applying, using the one or more processing devices, a second M-algorithm process to a top set of N.sub.t/2 rows of a system of equations y.sup.2=R.sup.2s+w.sup.2 to obtain a second set of M candidates, wherein y.sup.2 is a second rotated received signal vector and w.sup.2 is a second rotated noise vector; performing, using the one or more processing devices, a distance determination over M*M candidates by combining the first and second sets of M candidates from the top set and the bottom set, wherein M identifies a number of candidate neighbors; and obtaining, using the one or more processing devices, a candidate from among the M*M candidates having a global minimum distance to select a final decoded symbol vector identifying a given one of the plurality of received symbols.

2. The method of claim 1, wherein the first and second M-algorithm processes are done in parallel.

3. The method of claim 1, further comprising demodulating the received symbols of y.sup.1 and y.sup.2 without computing any distance metrics.

4. The method of claim 3, wherein the demodulation is performed by the one or more processing devices by quadrant based demodulation of the received symbols.

5. The method of claim 1, further comprising selecting a set of N.sub.b nearest neighbor symbol sequences, wherein the number of nearest neighbor symbol sequences is constrained to be less than a constellation size L of the received symbols and greater than or equal to M.

6. The method of claim 5, wherein, when N.sub.b is equal to M, completing any remaining M-algorithm processing.

7. The method of claim 5, wherein when N.sub.b is greater than M, the method further comprises: determining, by the one or more processing devices, distance metrics for the bottommost row of y.sup.1=R.sup.1s+w.sup.1 over the N.sub.b candidates; selecting, from the determined distance metrics, the M candidates having the lowest distance for a next stage of M-algorithm processing; and completing any remaining M-algorithm processing.

8. A wireless receiver apparatus configured to decode spatially multiplexed signals, the apparatus comprising: a plurality of receive chains configured to receive spatially multiplexed signals including a plurality of symbols from a transmitting device; and one or more processing devices operatively coupled to the plurality of receive chains, the one or more processing devices being configured to: derive an estimated channel matrix H from the plurality of received symbols; decompose the estimated channel matrix H into first and second unitary matrices Q.sup.1 and Q.sup.2, and first and second triangular matrices R.sup.1 and R.sup.2, wherein R.sup.1 is an upper right triangular matrix and R.sup.2 is a lower left triangular matrix; apply a first M-algorithm process to a bottom set of N.sub.t/2 rows of a system of equations y.sup.1=R.sup.1s+w.sup.1 to obtain a first set of M candidates, wherein N.sub.t identifies a number of receive chains, y.sup.1 is a first rotated received signal vector, s is a transmitted symbol vector and w.sup.1 is a first rotated noise vector; apply a second M-algorithm process to a top set of N.sub.t/2 rows of a system of equations y.sup.2=R.sup.2s+w.sup.2 to obtain a second set of M candidates, wherein y.sup.2 is a second rotated received signal vector and w.sup.2 is a second rotated noise vector; perform a distance determination over M*M candidates by combining the first and second sets of M candidates from the top set and the bottom set, wherein M identifies a number of candidate neighbors; and obtain a candidate from among the M*M candidates having a global minimum distance to select a final decoded symbol vector identifying a given one of the plurality of received symbols.

9. The apparatus of claim 8, wherein the first and second M-algorithm processes are performed in parallel by the one or more processing devices.

10. The apparatus of claim 8, wherein the one or more processing devices are further configured to demodulate the received symbols of y.sup.1 and y.sup.2 without computing any distance metrics.

11. The apparatus of claim 10, wherein the demodulation is performed by the one or more processing devices by quadrant based demodulation of the received symbols.

12. The apparatus of claim 8, wherein the one or more processing devices are further configured to select a set of N.sub.b nearest neighbor symbol sequences, wherein the number of nearest neighbor symbol sequences is constrained to be less than a constellation size L of the received symbols and greater than or equal to M.

13. The apparatus of claim 12, wherein, upon selection of N.sub.b to equal to M, the one or more processing devices are further configured to complete any remaining M-algorithm processing.

14. The apparatus of claim 12, wherein when N.sub.b is selected to be greater than M, the one or more processing devices are further configured to: determine distance metrics for the bottommost row of y.sup.1=R.sup.1s+w.sup.1 over the N.sub.b candidates; select the M candidates having the lowest distance for a next stage of M-algorithm processing; and complete any remaining M-algorithm processing.

15. A non-transitory recording medium storing instructions thereon, the instructions, when executed by one or more processing devices, cause the one or more processing devices to execute a method of decoding spatially multiplexed signals received by a wireless device, the method comprising: receiving, using a plurality of receive chains, spatially multiplexed signals including a plurality of symbols from a transmitting device; deriving, using one or more processing devices, an estimated channel matrix H from the plurality of received symbols; decomposing, using the one or more processing devices, the estimated channel matrix H into first and second unitary matrices Q.sup.1 and Q.sup.2, and first and second triangular matrices R.sup.1 and R.sup.2, wherein R.sup.1 is an upper right triangular matrix and R.sup.2 is a lower left triangular matrix; applying, using the one or more processing devices, a first M-algorithm process to a bottom set of N.sub.t/2 rows of a system of equations y.sup.1=R.sup.1s+w.sup.1 to obtain a first set of M candidates, wherein N.sub.t identifies a number of receive chains, y.sup.1 is a first rotated received signal vector, s is a transmitted symbol vector and w.sup.1 is a first rotated noise vector; applying, using the one or more processing devices, a second M-algorithm process to a top set of N.sub.t/2 rows of a system of equations y.sup.2=R.sup.2s+w.sup.2 to obtain a second set of M candidates, wherein y.sup.2 is a second rotated received signal vector and w.sup.2 is a second rotated noise vector; performing, using the one or more processing devices, a distance determination over M*M candidates by combining the first and second sets of M candidates from the top set and the bottom set, wherein M identifies a number of candidate neighbors; and obtaining, using the one or more processing devices, a candidate from among the M*M candidates having a global minimum distance to select a final decoded symbol vector identifying a given one of the plurality of received symbols.

16. The non-transitory recording medium of claim 15, wherein the method further comprises demodulating the received symbols of y.sup.1 and y.sup.2 without computing any distance metrics.

17. The non-transitory recording medium of claim 16, wherein the demodulation is performed by the one or more processing devices by quadrant based demodulation of the received symbols.

18. The non-transitory recording medium of claim 15, wherein the method further comprises selecting a set of N.sub.b nearest neighbor symbol sequences, wherein the number of nearest neighbor symbol sequences is constrained to be less than a constellation size L of the received symbols and greater than or equal to M.

19. The non-transitory recording medium of claim 18, wherein, when N.sub.b is equal to M, the method further comprises completing any remaining M-algorithm processing.

20. The non-transitory recording medium of claim 18, wherein when N.sub.b is greater than M, the method further comprises: determining, by the one or more processing devices, distance metrics for the bottommost row of y.sup.1=R.sup.1s+w.sup.1 over the N.sub.b candidates; selecting, from the determined distance metrics, the M candidates having the lowest distance for a next stage of M-algorithm processing; and completing any remaining M-algorithm processing.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates a conventional mobile wireless communication system.

(2) FIG. 2 illustrates a wireless mobile station diagram, which may be employed with aspects of the invention described herein.

(3) FIG. 3 illustrates a baseband subsystem for a wireless mobile station, which may be employed with aspects of the invention described herein.

(4) FIG. 4 illustrates an RF subsystem for a wireless mobile station, which may be employed with aspects of the invention described herein.

(5) FIG. 5 illustrates the receive chain of a wireless communication system, which may be employed with aspects of the invention described herein.

(6) FIG. 6 illustrates the transmit chain of a wireless communication system, which may be employed with aspects of the invention described herein.

(7) FIG. 7 illustrates an example of a SM-MIMO wireless communication system with four transmit chains at the transmit entity and four receive chains at the receive entity, which may be employed with aspects of the invention described herein.

(8) FIG. 8 illustrates the constellation of 16-QAM, which may be employed with aspects of the invention described herein.

(9) FIG. 9 illustrates the constellation of 64-QAM, which may be employed with aspects of the invention described herein.

(10) FIG. 10 illustrates a QRD-M SM decoder.

(11) FIG. 11 illustrates the general processing flow diagram of the M-algorithm.

(12) FIG. 12 illustrates an example of a Dual M-approach in accordance with aspects of the present invention.

(13) FIG. 13 illustrates an example of processing of the upper half and lower half of the system of equations for N.sub.t=4 in accordance with aspects of the present invention.

(14) FIG. 14 illustrates an example processing flow diagram of a dual M-algorithm approach in accordance with aspects of the present invention.

(15) FIG. 15 illustrates an example of a center point of a quadrant for 64-QAM constellation.

(16) FIG. 16 illustrates an example quadrant based demodulator for 64-QAM.

(17) FIG. 17 illustrates a table for nearest neighbor sequence symbols for the 16-QAM constellation, which may be employed with aspects of the invention described herein.

(18) FIG. 18 contains table for nearest neighbor sequence symbols for the 64-QAM constellation, which may be employed with aspects of the invention described herein.

(19) FIG. 19 illustrates example processing of the first stage of an M-algorithm using quadrant based demodulation and pre-computed nearest neighbor tables in accordance with aspects of the present invention.

DETAILED DESCRIPTION

(20) The present invention describes a method and apparatus of a QRD-M SM decoder that has reduced complexity and reduced latency when compared to that of the conventional QRD-M SM decoder. To describe the invention, the conventional QRD-M SM decoder is briefly described next.

(21) The conventional QRD-M SM decoder consists of two main processing blocks as shown in FIG. 10. The first main processing block is the QR decomposition and matrix multiplication and the second main processing block is the M-algorithm.

(22) The QR decomposition block decomposes the channel matrix H into a right triangular matrix R and a unitary matrix Q using the QR matrix decomposition method. Specifically,
H=QR(10)

(23) Since R is a right triangular matrix, all its elements below the main diagonal are zero. A property of a unitary matrix is that its inverse can be obtained by its Hermitian transpose. Specifically,
Q.sup.1=Q.sup.H(11)
Therefore,
Q.sup.HQ=I(12)
where I is an identity matrix. The Hermitian transpose of a unitary matrix is also a unitary matrix. Also when a vector is multiplied by a unitary matrix, the magnitude of the vector does not change. The unitary matrix Q is in general an N.sub.rN.sub.r matrix. A discussion of the fundamentals of matrix computations may be found in the text entitled Matrix Computations, The Johns Hopkins University Press, 2nd Ed., 1989, by G. H. Golub and C. F. Van Loan, the entire disclosure of which is hereby expressly incorporated by reference herein.

(24) Substituting H from EQ. 10 in the expression for the received signal vector represented by EQ. 9:
x=QRs+n(13)
Pre-multiplying both sides with Q.sup.H,
Q.sup.Hx=y=Q.sup.HQRs+Q.sup.Hn=Rs+w(14)
where y is the rotated received signal vector of x and w is the rotated noise vector of n. EQ. 14 becomes
y=Rs+w(15)

(25) For the case of 44 SM-MIMO, the expanded version of EQ. 15 is as follows:

(26) [ y 0 y 1 y 2 y 3 ] = [ r 0 , 0 r 1 , 0 r 2 , 0 r 3 , 0 0 r 1 , 1 r 2 , 1 r 3 , 1 0 0 r 2 , 2 r 3 , 2 0 0 0 r 3 , 3 ] [ s 0 s 1 s 2 s 3 ] + [ w 0 w 1 w 2 w 3 ] ( 16 )

(27) In case the number of receive chains at the receive entity is greater than the number of transmit chains at the transmit entity, all the elements in the bottom N.sub.r-N.sub.t rows of the right triangular matrix R are zero and the bottom N.sub.r-N.sub.t rows of the column vector y are also zero after QR decomposition. Therefore, the system of equations represented by EQ. 15 is simplified to an N.sub.tN.sub.t system of linear equations. In the remainder of this disclosure, the R matrix is considered to be an N.sub.tN.sub.t matrix.

(28) The second main processing block of the QRD-M SM decoder, namely the M-algorithm, is described next. The solution of the system of equations represented in EQ. 15 using the M-algorithm may be obtained in several stages. The number of stages in the M-algorithm corresponds to the number of rows in the system of equations and the M-algorithm is applied sequentially to each stage. The value of M in the M-algorithm refers to the number of best symbol sequences used for further consideration in a sequential decoding process. The best symbol sequences are the symbol sequences from the constellation selected based on minimum distance metrics. The M-algorithm for each stage includes two major processing steps. First, it computes all the distance metrics for a given stage. Next it selects M best symbol sequences for the next stage of processing. The selected M best symbol sequences are referred as surviving symbol sequences for the next stage. This process continues for all stages and at the last stage one best symbol sequence is selected as the decoded symbols vector .

(29) A 44 SM-MIMO wireless communication system, as represented in EQ. 16, using 16-QAM modulation is chosen to illustrate the M-algorithm. For the chosen example, as represented in EQ. 16, the number of stages for M-algorithm is four. In QRD-M SM decoder, the M-algorithm starts by first operating on the bottom-most row corresponding to a single non-zero element in the R matrix. For the chosen example, as represented in EQ. 16, the M-algorithm starts with the fourth row containing the single non-zero element r.sub.3,3 in matrix R.

(30) To solve the equation represented by the bottom-most row containing a single non zero element, all possible values for s.sub.(N.sub.t.sub.1) from the constellation alphabet A used by the transmit entity may be multiplied with element r.sub.(N.sub.t.sub.1),(N.sub.t.sub.1) of matrix R and subtracted from element y.sub.(N.sub.t.sub.1) of vector y to compute the distance metrics d.sub.(N.sub.t.sub.1) for all possible values of s.sub.(N.sub.t.sub.1). For the chosen example, as represented in EQ. 16, to solve the equation represented by the fourth row containing a single non zero element r.sub.3,3, all possible values for s.sub.3 from the constellation alphabet A used by the transmit entity may be multiplied with r.sub.3,3 and subtracted from y.sub.3 to compute the distance metrics d.sub.3 for all possible values of s.sub.3. For the chosen example, as represented in EQ. 16, with 16-QAM modulation used by the transmit entity, the number of distance metric computations at the receive entity for the fourth row is 16, corresponding to 16 possible values for s.sub.3.

(31) For the chosen example, as represented in EQ. 16, M=8 is used for the M-algorithm. For the chosen example, as represented in EQ. 16, this results in the selection of 8 best symbol sequences with minimum distance metrics from the total of 16 distance metrics corresponding to L=16 symbol sequences. These selected 8 (M=8) symbol sequences are referred as surviving symbol sequences. At the first stage, the symbol sequences of length one and at the subsequent stages the symbol sequences grow by one symbol in length at each stage as the stages progress.

(32) Next, the M-algorithm enters the second stage of processing. In the second stage of processing, the M-algorithm operates on row (N.sub.t2). For the chosen example, as represented in EQ. 16, the M-algorithm operates on the third row which is immediately above the fourth row. At the second stage of M-algorithm, there are 16 possible values for s.sub.2 and 8 selected surviving symbol sequences from the previous stage. This requires 168=128 total number of distance metric computations corresponding to 128 different combinations of s.sub.2 and s.sub.3. The distance metrics computed in the second stage are cumulative distance metrics corresponding to the distance metric of a symbol sequence (s.sub.2, s.sub.3) and the distance metric of the selected surviving symbol sequences for s.sub.3 during the first stage. The M-algorithm then selects 8 best surviving symbol sequences corresponding to the minimum cumulative distance metrics. The surviving symbol sequences are of length two at this stage.

(33) Next, the M-algorithm enters the third stage of processing. In the third stage of processing, the IV-algorithm operates on row (N.sub.t3). For the chosen example, as represented in EQ. 16, the M-algorithm operates on the second row which is immediately above the third row. At the third stage of the M-algorithm, there are 16 possible values for s.sub.1 and 8 selected surviving symbol sequences from the previous stage. This requires 168=128 total number of distance metric computations corresponding to 128 different combinations of s.sub.1, s.sub.2 and s.sub.3. The distance metrics computed in the third stage are the cumulative distance metrics corresponding to the distance metric of a symbol sequence (s.sub.1, s.sub.2, s.sub.3) and the distance metric of the selected surviving symbol sequence for (s.sub.2, s.sub.3) during the second stage. Next, the M-algorithm selects 8 best surviving symbol sequences corresponding to the minimum cumulative distance metrics. The surviving symbol sequences are of length three at this stage.

(34) This process continues for each stage until the last stage, which corresponds to the first row of EQ. 16, is reached. After computing the cumulative distance metrics for the last stage, one best surviving symbol sequence is selected as the decoded symbols vector . In case where the decoding is successful the decoded symbols vector is equal to the transmitted symbols vector, i.e., =s. For the chosen example, as represented in EQ. 16, at the last stage the M-algorithm operates on the first row. Therefore, at the last stage of the M-algorithm, there are 16 possible values for s.sub.0 and 8 selected surviving symbol sequences from previous stage. This requires 168=128 total number of distance metric computations corresponding to 128 different combinations of s.sub.0, s.sub.1, s.sub.2 and s.sub.3. The distance metrics computed in the last stage are the cumulative distance metrics corresponding to the distance metric of a symbol sequence (s.sub.0, s.sub.1, s.sub.2, s.sub.3) and the distance metric of the selected surviving symbol sequence (s.sub.1, s.sub.2, s.sub.3) during the third stage. Next, the M-algorithm selects one best surviving symbol sequence =[.sub.0, .sub.1, .sub.2, .sub.3].sup.T corresponding to the minimum cumulative distance metric. FIG. 11 shows the general processing flow diagram of the M-algorithm for N.sub.t stages.

(35) The value of M may be chosen according to the required decoding performance and processing complexity tradeoff. The smaller the value of M, the lesser the complexity and processing requirements, which leads to reduction in power consumption. However, a smaller value of M also reduces the decoding performance.

(36) Two major areas of complexity in the M-algorithm for each stage are: the computation of distance metrics and selection of best surviving symbol sequences corresponding to the minimum distance metrics. The computation of distance metrics in general may require complex multiplications. Since there may be hundreds of distance metric computations for one pass of QRD-M SM decoder, the number of required complex multiplications is generally high. Although the complexity of the computation of distance metrics is high, it may be pipelined and/or parallelized in a VLSI implementation to reduce latency. However, the operation to select M best surviving symbol sequences involves extensive memory access, conditional branching, element swapping, and so forth depending on the ordering feature of the input sequences and therefore the operation to select M best surviving symbol sequences may be difficult to pipeline and/or parallelize. Therefore, the processing latency of the QRD-M SM decoder normally depends on the processing latency of the operation that selects the M best surviving symbol sequences. Furthermore, the next stage of processing may not start until the M best surviving symbol sequences for the current stage have been identified.

(37) In general, when using an N.sub.tN.sub.r SM, there will be N.sub.t processing stages in the M-algorithm of the QRD-M SM decoder. If a modulation scheme with constellation size L is used by the transmit entity, then the following distance metrics computations may be performed by a traditional M-algorithm: For the first stage: L distance metric computations over symbol sequences consisting of length one. For the second stage: ML distance metric computations over symbol sequences consisting of length two. For the third stage: ML distance metric computations over symbol sequences consisting of length three. For the N.sub.t-th stage: ML distance metric computations over symbol sequences consisting of length N.sub.t.

(38) In addition to the distance metric computations, the following selection operations may be performed based on minimum distance metrics: For the first stage: M surviving symbol sequences out of L symbol sequences. For each intermediate stage: M surviving symbol sequences out of ML symbol sequences For the last stage: one surviving symbol sequence out of ML symbol sequences.

(39) Aspects of the invention described herein provide a method and apparatus to achieve decoding performance similar to that of the conventional M-algorithm but with reduced processing requirements and reduced processing latency. This may enable the implementation of a QRD-M SM decoder that may have lower processing latency and reduced power consumption. These can be substantial advantages for portable wireless communication devices such as a cellular phone, laptop, netbook, etc.

(40) This improved and more efficient processing may be performed by one or more DSPs, microcontrollers, hardware accelerators, co-processors or a combination of any of such processing devices, which receive signals from multiple receive chains. This may be done in conjunction with internal memory, including a stack or buffer memory, with external memory, or both. The results of the processor-generated determination are used to decoding spatially multiplexed signals in a MIMO wireless communication systems and to provide efficient communication between the receiving device and other devices.

(41) According to an aspect of the present invention, the channel matrix H is decomposed into two different triangular matrices. The first QR decomposition is performed as described above for the normal QRD-M SM decoder. The first QR decomposition results in the first unitary matrix Q.sup.1 and first triangular matrix R.sup.1 as shown below in EQ. 17 for the case of N.sub.t=4.

(42) R 1 = [ r 0 , 0 1 r 1 , 0 1 r 2 , 0 1 r 3 , 0 1 0 r 1 , 1 1 r 2 , 1 1 r 3 , 1 1 0 0 r 2 , 2 1 r 3 , 2 1 0 0 0 r 3 , 3 1 ] ( 17 )

(43) The second QR decomposition results in the second unitary matrix Q.sup.2 and second triangular matrix R.sup.2 as shown below in EQ. 18 for the case of N.sub.t=4.

(44) R 2 = [ r 0 , 0 2 0 0 0 r 1 , 0 2 r 1 , 1 2 0 0 r 2 , 0 2 r 2 , 1 2 r 2 , 2 2 0 r 3 , 0 2 r 3 , 1 2 r 3 , 2 2 r 3 , 3 2 ] ( 18 )

(45) As observed from EQ. 17, the matrix R.sup.1 is an upper right triangular matrix and that from EQ. 18, the matrix R.sup.2 is a lower left triangular matrix. Using the two different QR decompositions in EQ. 13 and pre-multiplying both sides of the equation with Q.sup.1H and Q.sup.2H respectively, two different systems of equations are obtained as follows.
y.sup.1=R.sup.1s+w.sup.1(19)
and
y.sup.2R.sup.2s+w.sup.2(20)
where
y.sup.1=Q.sup.1Hx
y.sup.2=Q.sup.2Hx
and
w.sup.1=Q.sup.1Hn
w.sup.2=Q.sup.2Hn

(46) According to another aspect of the invention the M-algorithm process is used for the first stage of processing on both systems of equations represented in EQ. 19 and EQ. 20. This is herein referred to as dual M-algorithm processing as shown in FIG. 12. This results in L distance metric computations for each system of equations for the first stage. After computing the distance metrics, the best M symbol sequences are used for considerations in the next step of M-algorithm processing in both systems of equations. The second stage of processing is also applied to each system of equations. This results in ML distance metric computations for symbol sequences having a length of two for each system of equations. This dual M-algorithm processing continues until N.sub.t/2 stages have been processed for both the systems of equations. At this point, the bottom N.sub.t/2 rows of the system of equations have been processed by the first M-algorithm process operating on system of equations represented by EQ. 19 and the top N.sub.t/2 rows of the system of equations have been processed by the second M-algorithm process operating on system of equations represented by EQ. 20. For both of the M-algorithm processes there are M best symbol sequences remaining after the N.sub.t/2 stages of processing. FIG. 13 shows an example of the dual M-algorithm for the case of N.sub.t=4.

(47) The surviving symbol sequences for the first M-algorithm process, when applied to the system of equations represented by EQ. 19, correspond to the symbol sequences s.sub.n.sub.t.sub./2 to s.sub.n.sub.t.sub.1 of the transmitted symbol vector s. The surviving symbol sequences for the second M algorithm process, when applied to the system of equations represented by EQ. 20, correspond to the symbol sequences s.sub.0 to

(48) S n t 2 - 1
of the transmitted symbol vector s. The decision for the globally optimum symbol sequence for the entire transmit symbol vector s may be obtained by performing joint distance metric computations for all the possible combinations of the M surviving symbol sequences from each of the M-algorithm processes.

(49) There are total of M.sup.2 possible symbol sequence combinations and M.sup.2 distance metric computations to be performed. The distance metrics may be computed either using the first row of the first system of equations represented in EQ. 19 or the last row of the second system of equations represented in EQ. 20. The joint distance metrics are computed for the symbol vector of length N.sub.t with only M.sup.2 possible symbol sequence combinations. As the distance metrics are computed, a current minimum distance metric and its corresponding symbol sequence are maintained. The final remaining symbol sequence corresponding to the joint minimum distance metric is the decoded symbol sequence.

(50) The overall processing flow for the dual M-algorithm is as per the flowchart 1400 contained in FIG. 14. Unless expressly stated herein or constrained by prior operations, the processing stages may be performed in a different order or concurrently. At processing stage 1402, the channel matrix H is decomposed into an upper triangular matrix R.sup.1 and Q.sup.1. At processing stage 1404, the same channel matrix H is decomposed into an upper triangular matrix R.sup.2 and Q.sup.2. At processing stage 1406 the M-algorithm is applied to the bottom two rows of the system of equations in EQ. 19. The output of processing step 1406 is a set of M surviving symbol vectors of length N.sub.t/2 corresponding to the top half of the transmitted symbol vector s. At processing stage 1408 the M-algorithm is applied to the top two rows of the system of equations in EQ. 20. The output of processing step 1408 is a set of M surviving symbol vectors of length N.sub.t/2 corresponding to the top half of the transmitted symbol vector s. At stage 1410 the two sets of symbols vectors of size M are used to compute distance over symbol vectors of length N.sub.t over a set of M.sup.2 candidates. Finally, at stage 1412 the symbol vector with the smallest distance is used as the decoded symbol vector. The process preferably terminates at stage 1414. Each of these stages of the process may be implemented by one or more processors and memory as discussed above.

(51) For the case of 22 SM-MIMO, the demodulation using the aspects of the present invention may be performed as follows. First the 22 channel matrix is decomposed into two different triangular matrices as described earlier. This results in one upper triangular and one lower triangular matrix as shown below.

(52) R 1 = [ r 0 , 0 1 r 1 , 0 1 0 r 1 , 1 1 ] ( 21 ) R 2 = [ r 0 , 0 2 0 r 0 , 1 1 r 1 , 1 2 ] ( 22 )

(53) The two systems of equations using the two different triangular matrices may be solved in parallel. The first M-algorithm process performs the first stage of processing corresponding to the second row (bottom most) of the system of equations using R.sup.1. Similarly, the second M-algorithm process performs the first stage of processing corresponding to the first row of the system of equations using R.sup.2. Next, the joint distance metrics are computed for the symbol vector of length N.sub.t=2 with only M.sup.2 possible sequences. As the distance metrics are computed, the current minimum distance metric and its corresponding symbol sequence are maintained. The final remaining symbol sequence corresponding to the joint minimum distance metric is the decoded symbol sequence.

(54) For the case of 22 MIMO configuration with 64-QAM and M=4, the following computations are performed. At the first stage of processing, L distance metrics for each of the system of equations are performed. In this chosen example, L=64, therefore 264=128 distance metric computations are performed over symbol sequences of length one. Next MM=16 joint distance metrics are computed over a symbol sequences of length two. For the chosen example, when the conventional M-algorithm is used, there are L=64 distance metric computations over symbol sequences of length one in the first stage and ML=256 distance metric computations over symbol sequences of length two for the second stage. The distance metric computations over symbol sequences of length two requires additional number of operations when compared to the first stage of distance metric computations over symbol sequences of length one. The present invention achieves reduction in the overall computations for the demodulation of the transmitted symbol vector, which as noted above may result in significant resource and/or power savings for the communication device.

(55) In accordance with one aspects of the present invention, a higher value of M may result in a relatively smaller increase in the computation complexity.

(56) The demodulation of the first stage received symbol may be performed without computing any of the L distance metrics. This may be achieved by quadrant based demodulation of the received symbol as described below for the 64-QAM constellation shown in FIG. 15. First the sign of the real component is used to determine the value of bit b.sub.5. Similarly, the sign of the imaginary component is used to determine the value of the bit b.sub.2. Next, the absolute value of the received symbol sequence is subtracted from the center point of the first quadrant of the 64-QAM constellation as shown in FIG. 16. The resultant signal is then used to demodulate the next pair of bits as follows. The sign of the real component is used to determine the value of bit b.sub.4. Similarly, the sign of the imaginary component is used to determine the value of the bit b.sub.1. Next, the absolute value of the residual signal from the previous stage is then subtracted from the center point of the lower one fourth of the first quadrant as shown in FIG. 16. The resultant signal is then used to demodulate the next pair of bits as follows. The sign of the real component is used to determine the value of bit b.sub.3. Similarly, the sign of the imaginary component is used to determine the value of the bit b.sub.0. Thus, the 64-QAM received symbol may be demodulated in three steps without computing any distances.

(57) Since the first stage of the demodulation involves only one received symbol, such individual symbol demodulation is suitable. However, this only provides one of the M total symbol sequences needed for the next stage of processing. Therefore the remaining M1 best symbol sequences for the next stage of processing may need to be determined. According to a method described in the Low Latency application, this may be accomplished as follows. The nearest neighbor symbol sequences for the constellation may be pre-computed and stored in a ROM table. The pre-computed nearest neighbor symbol sequences table may be used as an approximation to the actual nearest M symbol sequences that may be closest to the received symbol sequence. Examples of such nearest neighbor symbol sequences tables for the 16-QAM and 64-QAM constellations of FIG. 8 and FIG. 9 are provided in the tables contained in FIG. 17 and FIG. 18 respectively. According to an aspect of the present invention, the approximation for the best M symbol sequences may be improved as follows. Let the number of nearest neighbor symbol sequences stored in the pre-computed table for each symbol sequence in the constellation be denoted by N.sub.b. Clearly, N.sub.b must be greater than or equal to M but less than L. If N.sub.b is chosen to be greater than M, then the receiver may compute the distance metrics for N.sub.b pre-computed nearest neighbor symbol sequences only. Next, the receiver may search the N.sub.b distance metrics for the M minimum distance metrics. This reduces the complexity from computing L (size of constellation) distance metrics to N.sub.b distance metrics. Also, the search is reduced from M out of L distance metrics to M out of N.sub.b distance metrics. For example, in case of 64-QAM (L=64), M=4, and N.sub.b=8, the distance metrics computation is reduced from 64 to only 8. The search has been reduced from 4 out of 64 to only 4 out of 8 distance metrics.

(58) In accordance with aspects of the present invention, the quadrant based demodulation may be applied to first stage of processing for both the first and the second systems of equations represented by EQ. 19 and EQ. 20. Next, the nearest neighbor symbol sequences for each of the demodulated symbols for the two systems of equations are determined based on the pre-computed nearest neighbor symbol sequence tables.

(59) According to an aspect of the present invention, at this point there are two different methods for further demodulation and each of them is described next. These methods are illustrated in FIG. 19. In Method 1, the N.sub.b nearest neighbor symbol sequences corresponding to the demodulated symbol sequences at the first stage are used as M best symbol sequences for subsequent stages of processing. In this case N.sub.b=M. In Method 2 the N.sub.b nearest neighbor symbols corresponding to the demodulated symbol sequences determined from the pre-computed tables are used as symbol sequences for distance metric computation in the first stage of each of the two M-algorithm processes. After the N.sub.b distance metrics are computed for each of the M-algorithm processes, the distance metrics are searched for the M smallest distance metrics from the N.sub.b distance metrics for each of the two M-algorithm processes. Once the best M symbol sequences are determined for each of the M-algorithm processes, they are used as the best symbol sequences for subsequent stages of processing in the respective M-algorithm process.

(60) The overall processing flow for the quadrant based decoding in conjunction with nearest neighbor lookup tables is as per the flowchart 1900 contained in FIG. 19. The first stage of the processing 1902 is the QR decomposition of the channel matrix. At the processing stage 1904 the bottommost row which consists of a single unknown variable is processed according to the quadrant based decoding method as per FIG. 15 and FIG. 16. Next at the processing stage 1906 the N.sub.b nearest neighbors are selected from a pre-computed tables stored in a ROM as shown in FIG. 17 and FIG. 18 depending on modulation type. At processing stage 1908 a decision is made where a Method 1 or Method 2 is to be used. If Method 1 is to be used, the number of nearest neighbors is selected to be equal to M at processing stage 1912. If Method 2 is to be used, the processing continues in block 1910 where the number of nearest neighbors is selected to be greater than M and the distance metrics are computed for the bottom most row over the N.sub.b candidates and from those M candidates with the lowest distance are selected for the next stage of M-algorithm processing. At processing stage 1914 the rest of the M-algorithm processing is performed. The process preferably terminates at stage 1916.

(61) The processing steps as illustrated in FIG. 19 are applied to the systems of equations in EQ. 19 and EQ. 20. The last two processing steps of computing distance over M*M candidates and selecting the candidate with lowest metric are then applied as per processing stages 1410 and 1412 in FIG. 14.

(62) For the case of 22 MIMO configuration with 64-QAM when processing the first stage symbol sequence using the quadrant based demodulation, the two demodulated symbol sequences may be obtained without any distance metric computations. The symbol sequences to be considered for joint demodulation of the transmitted symbol vector s, the nearest neighbor symbol sequences for each of the demodulated symbol may be used. According to one method, the N.sub.b neighbor symbol sequences of the demodulated symbol for the first M-algorithm process and the N.sub.b neighbor symbol sequences of the demodulated symbol for the second M-algorithm process may be considered for the joint demodulation of the entire transmitted symbol vector s. This requires total of N.sub.bN.sub.b distance metric computations over a symbol vector of length two. Alternatively, the demodulated symbol at the first stage may be used to narrow the list of symbol sequences for which the distance metrics are computed. Specifically, instead of distance metric computations over the entire constellation alphabet of L, only N.sub.b distance metrics may be computed. Next the best M symbol sequences with the smallest distance metrics may be chosen for further processing. In this method, the two M-algorithm processes perform 2N.sub.b distance metric computations for the first stage and then MM distance metric computations for the joint demodulation of the transmitted symbol vector s. When compared to conventional M-algorithm approach, which requires 2*L first stage distance metric computations and LM second stage distance metric computations over symbol sequences of length two, significant computation and/or power savings may be obtained.

(63) Aspects of the present invention may be implemented in firmware of the MCU or the SPU of the baseband subsystem 16 shown in FIG. 3. In another alternative, aspects of the present invention may also be implemented as a combination of firmware and hardware of the baseband subsystem 16. By way of example, aspects of the present invention may be implemented in any communication entity in the wireless communication systems such as client terminal, the base station and others.

(64) In accordance with such aspects of the present invention, the aforementioned processes may be applied to various wireless communication systems such as systems based on an IEEE 802.16 wireless communication standard, an IEEE 802.11 wireless communication standard, an IEEE 802.20 wireless communication standard, Wideband Code Division Multiple Access (WCDMA) wireless communication standard, a 3GPP wireless communication standard, or a Long Term Evolution (LTE), a 3GPP wireless communication standard.

(65) Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. Aspects of each embodiment may be employed in the other embodiments described herein.