METHOD FOR DECODING NON-BINARY CODES AND CORRESPONDING DECODING APPARATUS
20170012642 · 2017-01-12
Assignee
- CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE (CNRS ) (Paris, FR)
- Universite Cergy-Pontoise (Cergy, FR)
- UNIVERSITAT POLITÈCNICA DE VALÈNCIA (Valencia, ES)
Inventors
- David Declercq (Ableiges, FR)
- Erbao Li (Zhengzhou, CN)
- Francisco Miguel Garcia Herrero (Gandia, ES)
- Javier Valls Coquillat (Tavernes De La Valldigna, ES)
Cpc classification
H03M13/1108
ELECTRICITY
International classification
Abstract
An extension to the enhanced serial generalized bit-flipping decoding algorithm (ES-GBFDA) of non-binary LDPC codes by introducing soft information in the check node operation. The application not only considers the most reliable symbol in the syndrome computation, but also takes at least the second most reliable symbol of each incoming message into account. An extended information set is available for the parity-check node update and this allows introducing the concept of weak and strong votes performed by the check node unit. Each variable node can receive two kinds of votes, whose amplitudes can be tuned to the reliability of the syndrome that produces the vote.
Claims
1. A method for decoding a non-binary low density parity-check (NB-LDPC) code defined in a finite field of size q, which is a symbol flipping decoding method using multiple votes performed by the check node unit and transferred to the variable node unit, the code can be displayed in a bipartite graph comprising at least one variable node V.sub.n, n=0, . . . , N1 and at least one check node C.sub.m, m=0, . . . , M1, said method comprising for each iteration j of I.sub.t decoding iterations, the steps consisting in that: each variable node V.sub.n, connected to a check node C.sub.m, is configured for determining (A1.1, A1.2) a most reliable symbol Q.sub.n.sup.1(j) and at least one symbol which is at least a p.sup.th most reliable symbol Q.sub.n.sup.p(j), with p2 for obtaining a vector of d.sub.c most reliable symbols; each check node C.sub.m is configured for determining: (A3.1) a first symbol to be voted R.sub.n.sup.0(j)=R.sub.n.sup.(j) based on the vector of d.sub.c most reliable symbols passed by the variable nodes connected to him in the bipartite graph; (A3.2) a list of i=1, . . . , L second symbols to be voted R.sub.n.sup.i(j) based on a list of L+1 test vectors defined as a combination of d.sub.c symbols with a restriction according to which at most of these d.sub.c symbols are a p.sup.th most reliable symbol Q.sub.n.sup.p(j) with p2, and at least d.sub.c of these d.sub.c symbols are a most reliable symbol Q.sub.n.sup.1(j).
2. The method according to claim 1, wherein each variable node is configured for determining (A1.1, A1.2) the most reliable symbols Q.sub.n.sup.(j) and the second most reliable symbol Q.sub.n.sup.2(j)=Q.sub.n.sup.(j) and their corresponding extrinsic reliability W.sub.n.sup.(j), W.sub.n.sup.(j) so that at a check node (A2.1, A2.2) the list of L+1 test vectors are built by replacing symbol Q.sub.n.sup.(j) by the second most reliable symbol Q.sub.n.sup.(j) in at most L locations were the differences between W.sub.n.sup.(j) and W.sub.n.sup.(j) are the smallest.
3. The method according to the preceding claim, wherein L, the method comprising a step consisting in that a sorter unit is configured for sorting (A1.3) the differences of extrinsic reliability W.sub.nm.sup.(j)W.sub.nm.sup.2(j) from the highest value to the lowest value for obtaining a sequence of L sorted indices n, the sequence
comprising the locations where Q.sub.n.sup.(j) is replaced by Q.sub.n.sup.2(j) in the L+1 test vectors.
4. The method according to claims 1 to 3, wherein each variable node is further configured for computing (A4.1, A4.2) an intrinsic information W.sub.mn.sup.(j) from check node counting the votes of R.sub.n.sup.0(j) and of R.sub.n.sup.i(j) with the respectively amplitude voting v.sub.0, v.sub.2.
5. The method according to claims 1 to 4, comprising before the I.sub.t decoding iterations an initialization step (Initialization) comprising the sub-steps of: determining a LLR vector L.sub.n=(L.sub.n[0], L.sub.n[1], . . . , L.sub.n[q1]) of a n.sup.th symbol in a sequence of N non-binary noisy symbols; initializing a vector of APP vectors W.sub.n.sup.(0) to the LLR vector L.sub.n and initializing a matrix W.sub.mn.sup.(0) to an all-zero matrix, said matrix W.sub.mn.sup.(j) being the intrinsic information from the check node m.
6. The method according to the preceding claim, wherein each variable node taking as input the LLR vector and the vector W.sub.mn.sup.(j) combines (A4.1, A4.2) the previous vector W.sub.mn.sup.(j1), the voting symbols R.sub.n.sup.0(j) and of R.sub.n.sup.(j) and the voting amplitudes v.sub.0, v.sub.1 through a function F.sub.1 for obtaining the vector defined as the intrinsic information W.sub.mn.sup.(j).
7. The method according to the preceding claim, wherein the function F.sub.1 is a simple summation of the values of the previous vector W.sub.mn.sup.(j1) at indices indicated by the voting symbols R.sub.n.sup.0(j) and of R.sub.n.sup.i(j) and the voting amplitudes v.sub.0, v.sub.1.
8. The method according to claim 5, wherein each variable node taking as input the LLR vector and the vector W.sub.n.sup.(j) combines (A5.1, A5.2) the previous vector W.sub.n.sup.(j1)W.sub.mn.sup.(j1), the voting symbols R.sub.n.sup.0(j) and of R.sub.n.sup.i(j) and the voting amplitudes v.sub.0, v.sub.1 through a function F.sub.2 for obtaining the vector W.sub.n.sup.(j).
9. The method according to the preceding claim, wherein the function F.sub.2 is a simple summation of the values of the previous vector W.sub.n.sup.(j1) at indices indicated by the voting) symbols R.sub.n.sup.0(j) and of R.sub.n.sup.i(j) and the voting amplitudes v.sub.0, v.sub.1.
10. A decoding apparatus comprising at least one variable node V.sub.n and a at least one check node C.sub.m, said decoder being configured for implementing a method for decoding a non-binary low density parity-check (NB-LDPC) code defined in a finite field of size q, according to one of the preceding claims.
11. A decoding apparatus according to the preceding claim, said apparatus being configured for implementing check node operations by means of L processing units configured dynamically to compute R.sub.n.sup.i(j), with i=1, . . . , L, each one of the L processing units share 3d.sub.c inputs that correspond to the symbols Q.sub.n.sup.1(j),Q.sub.n.sup.2(j) and to the coefficients of the code, said inputs of each processing unit are combined by means of 2d.sub.c GF multipliers and 2d.sub.c GF adders, said processing units including all the logic necessary to compute L different syndromes as an intermediate step and a variable number of pipeline stages that may vary from 0 to log 2(d.sub.c)+2 depending on the speed of said processing unit.
12. A decoding apparatus according to the preceding claim, said apparatus being configured for implementing the method of claims 6, 7, 8 and 9 and comprising: i) a bank of memories that store R.sub.n.sup.i(j), with i=0, . . . , L symbols, ii) q processors with a logic required to compare the symbols R.sub.n.sup.i(j), with i=0, . . . , L, with the q elements of the Galois Field and determine the amplitude of the votes corresponding to that symbols; and iii) q cells that implement functions F.sub.1 and F.sub.2, the bank of memories being implemented with L RAM memories or a bank of L registers., the processors being implemented with L-XNOR gates of log 2(q) bits and L-OR gates of 1 bit to compare the input symbols, i.e., symbols R.sub.n.sup.i(j), with i=0, . . . , L1, with the q Galois Field elements and determine the amplitude of the votes, the cells including a logic necessary for implementing F.sub.1 and F.sub.2 and the storage resources for W.sub.mn.sup.(j) and W.sub.n.sup.(j).
13. A decoding apparatus according to claim 11, said apparatus comprising a sorter unit configured for obtaining the sequence of L sorted indices n according to the method defined in claim 3, the sorter unit including at least one sub-processors of radix L, each sub-processor including: i) one stage of comparators configured for performing all the possible combinations of the inputs; ii) a plurality of adders and a plurality of NOT gates configured for computing a summation of the output signals of the different comparators associated to the same input, the adders being configured for checking how many times a condition of greater than or less than is satisfied for each one of the inputs; iv) a plurality of logic gates configured for implementing a L different masks that allows ordering the inputs according to the information provided by the outputs of the adders, logic gates are XNOR, OR and AND gates.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] Other features and advantages of the invention will appear in the following description with references to the drawings, in which:
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
[0047]
DETAILED DESCRIPTION
[0048] A method for decoding a non-binary low density parity-check (NB-LDPC) code defined in a finite field of size q is based on a bipartite graph comprising at least one variable node V.sub.n, n=0, . . . , N1 and at least one check nodes C.sub.m, m=0, . . . , M1.
[0049]
NB-LDPC Notations
[0050] Let us define a (N,K) NB-LDPC code over GF(q) (q=2.sup.p) with code length N and information length K. Its parity check matrix H.sub.M,N has N columns and M rows. The non-zero coefficients of H.sub.M,N are h.sub.m,n, where m is the row index and n the column index. The check node degree is denoted as d.sub.c and the variable node degree as d.sub.v, (m) defines the set of variables nodes connected to the m-th check node, and
(m) the set of check nodes connected to the n-th variable node. After transmission over a noisy Additive White Gaussian Noise (AWGN) channel, the received symbol sequence is defined as Y=(y.sub.0,y.sub.1, . . . , y.sub.N1). Considering the transmission of the codeword symbol c.sub.n, the log-likelihood ration (LLR) is computed as L.sub.n[x]=log [P(c.sub.n=0)|y.sub.n)/P(c.sub.n=x|y.sub.n)], for each x GF(q). The LLR vector of the n-th symbol is L.sub.n=(L.sub.n[0], L.sub.n[1], . . . , L.sub.n[q1]). The hard-decision based on the most reliable element in L.sub.n is called z.sub.n GF(q).
Enhanced Serial GBFDA (ES-GBFDA)
[0051] The method for decoding a NB-LDPC code according to the invention is an improvement of the ES-GBFDA. Before explaining the method of the invention we briefly describe this known improved algorithm, especially the ES-GBFDA from [18] called Algorithm 1 above.
[0052] In this Algorithm 1, two main steps can be distinguished in the iterative process the check node unit (CNU), steps A2 and A3; and the variable node unit (VNU), steps A1, A4 and A5.
[0053] During the initialization, W.sub.n.sup.(0) equals to the channel LLRs, L.sub.n, and W.sub.mn.sup.(0) is an all-zero matrix. After initialization, at the j-th iteration, step A1 sorts the extrinsic information to find the symbols Q.sub.n.sup.(j) with the maximum reliability (GFmax), which will be taken as the new hard decision.
[0054] The extrinsic information is calculated as W.sub.n.sup.(j1)W.sub.mn.sup.(j1).Math.W.sub.n.sup.(j1) is the vector where all the votes are accumulated and the W.sub.m.sup.(j1) is the intrinsic information from check node C.sub.m. Step A2 computes the syndrome s, required in step A3 to calculate R.sub.n.sup.(j), which is the symbol to be voted (selected). In step A4, W.sub.mn.sup.(j) counts the votes on R.sub.n.sup.(j) where v is the amplitude of the vote. In step A5, W.sub.n.sup.(j) accumulates the initial LLRs plus the votes of all the check nodes connected to the variable node V.sub.m. The votes modify the values of W.sub.n.sup.(j1) and W.sub.mn.sup.(j1), changing the result of the sorting process in A1 and, hence, flipping symbols in Q.sub.n.sup.(j). Step A6 performs the tentative decoding by finding the symbols associated to the maximum values of W.sub.n.sup.(j) and step A7 implements the stopping criterion based on the information of all the syndromes. The decoded codeword is then {tilde over (c)}.sub.n.sup.(j).
TABLE-US-00001 Algorithm 1. Serial Enhanced-GBFDA Initialization: W.sub.n.sup.(0) = L.sub.n, W.sub.mn.sup.(0) = 0 For j = 1 to It.sub.max For m = 1 to M A1 : Q.sub.n.sup.(j) = GFmax(W.sub.n.sup.(j1) W.sub.mn .sup.(j1)) A2 : s = .sub.nN(m) h.sub.m,nQ.sub.n.sup.(j) A3 : R.sub.n.sup.(j) = h.sub.mn.sup.1s Q.sub.n.sup.(j) A4 : W.sub.mn.sup.(j)(R.sub.n.sup.(j)) = W.sub.mn.sup.(j1)(R.sub.n.sup.(j)) + v A5 : W.sub.n.sup.(j)(R.sub.n.sup.(j)) = W.sub.n.sup.(j1)(R.sub.n.sup.(j)) + v end For m A6 : {tilde over (c)}.sub.n.sup.(j) = GFmax(W.sub.n.sup.(j)) A7 : if ({tilde over (c)}.sub.n.sup.(j) H.sup.T = 0) SKIP end For j Output= {tilde over (c)}.sub.n.sup.(j)
General Presentation of the Method of the Invention Called as Multiple Vote Symbol Flipping Decoding Algorithm (MV-SF)
[0055] We now describe, in a general way, the method of the invention based on the ES-GBFDA, by increasing the list of candidates computed by each check node and hence, increasing as well the number of votes propagated to the variable node. We only present the steps involved at the j-th decoding iteration.
[0056] Let Q.sub.n.sup.(j) be the hard decision obtained from the most reliable symbols, and Q.sub.n.sup.p(j) the hard decision of the p-th most reliable symbol within the input messages of a check node c.sub.m. We only consider the most reliable symbol Q.sub.n.sup.(j) and the second most reliable symbol Q.sub.n.sup.2(j)=Q.sub.n.sup.(j). The symbols Q.sub.n.sup.(j) (respectively Q.sub.n.sup.2(j)) are associated with reliabilities W.sub.n.sup.(j) (respectively W.sub.n.sup.2(j)=W.sub.n.sup.(j)).
[0057] We define test vector for a check node c.sub.m as the combination of d.sub.c symbols with the restriction that at least one, and at most of these d.sub.c symbols cannot be based on the most reliable information. In order to reduce the number of test vector candidates we only consider combinations between Q.sub.n.sup.(j) and Q.sub.n.sup.(j). To build the list of test vectors of a check node c.sub.m, first the difference between the reliability of W.sub.n.sup.(j) and W.sub.n.sup.(j) is computed and sorted in ascendant order for n (m). The first elements of the sorted differences are the symbols in which the reliability of Q.sub.n.sup.(j) is closer to the reliability of Q.sub.n.sup.(j). To keep the number of test vectors low we only replace Q.sub.n.sup.(j) by Q.sub.n.sup.(j) in at most locations were the differences between W.sub.n.sup.(j) and W.sub.n.sup.(j) is smaller (first elements of the sorted list). The parameter tunes the performance/complexity trade-off of the algorithm since it is directly related to with the number of test vectors to be processed by each check node. is selected as <<d.sub.c to keep complexity low. The set with the locations is denoted
.
[0058] Let us define .sub.i.sup.(j) as the i-th test vector of a list 2.sup.1 possible test vectors of a check node m, different from the one based only on Q.sub.n.sup.(j) symbols. Each .sub.i.sup.(j) is built replacing Q.sub.n.sup.(j) with Q.sub.n.sup.(j).
[0059] In Equation (1) the definition of the test vector .sub.i.sup.(j) is indicated where .sub.t is the t-th element in the set
, i.sub.t is the bit t of the binary representation of i and
and are the AND and OR operations respectively.
[0060] In the same way, we can define the reliability of each .sub.i.sup.(j), .sub.i.sup.(j), by means of Equation (2).
Description of a Preferred Embodiment of the Method of the Invention
[0061] We now consider a case wherein each check node is composed of L= test vectors. Each test vector has d.sub.c1 Q.sub.n.sup.(j) symbols, and one Q.sub.n.sup.(j) symbol. The position of the Q.sub.n.sup.(j) symbol in the test vector i, .sub.i.sup.(j) is given by N.sub.i as follows:
[0062] The reliability equation for the test vector .sub.i.sup.(j) is given by
[0063] It can be deduced from Equation (4) that changing Q.sub.n.sup.(j) by Q.sub.n.sup.(j) in N.sub.i does not introduce a strong variation in the total reliability of the check node, .sub.i.sup.(j). The only element that changes in the sum is
which is selected because the difference
is one of the smallest (so both
have similar reliabilty in the selected location N.sub.1).
[0064] In other words, as .sub.i.sup.(j) has a high reliability, similar to Q.sub.n.sup.(j), processing its candidates increases the amount of useful soft-information in the decoder and hence improves the performance.
[0065] To keep the complexity low and avoid computing and storing (4) for each test vector, function (.sub.i.sup.(j)) has to be very simple.
[0066] In a preferred embodiment of the invention, only two amplitudes for the votes are considered v.sub.o for the symbol candidates derived from the most reliable test vector, and v.sub.1 for the other test vectors, v.sub.0>v.sub.1. However, this latter condition for the amplitudes v.sub.0 and v.sub.1 is necessary but not sufficient to obtain the best efficiency and performance. As already mentioned, the distance between the amplitudes is, with 7.sub.7, the most important parameter to be optimized. If v.sub.0 and v.sub.1 are too close the amplitudes of the votes will be mixed easily and the flipping of the symbols will be done almost without any criterion, as candidates with very different reliability values will have almost the same vote amplitude. If the difference between v.sub.0 and v.sub.1 is too large, the votes of the less reliable test vectors will be not taken into account and the effect will be similar as not having an extended list of candidates. On the other hand, the values v.sub.o and v.sub.1 also have to be scaled according to the channel information. Unlike other algorithm as EMS or Min-max were all the information is strongly related with the incoming LLRs, algorithms based on voting process mix two different domains, the one with channel information and the one based on votes. It is easy to deduce that some kind of normalization is required for vote amplitudes or for channel information in order to combine them properly. Hence, we need to optimize the amplitudes of the votes v.sub.0 and v.sub.1, and also their combination with the LLR information, through the function .
[0067] In the preferred embodiment of the invention called Algorithm 2 above, L=. It can be seen that some steps of this Algorithm 2 are similar to some steps of Algorithm 1 (i.e., ES-GBFDA).
TABLE-US-00002 Algorithm 2. Multiple-Vote Symbol Flipping Decoding Algorithm
[0068] The method of the invention, above, comprises I.sub.t decoding iterations. In the following, the method is described at each iteration j.
[0069] Before, the I.sub.t decoding iterations an initialization step, Initialization, comprises the sub-steps of:
[0070] determining a LLR vector L.sub.n=(L.sub.n[0], L.sub.n[1], . . . , L.sub.n[q1]) of a n.sup.th symbol in a sequence of N non-binary noisy symbols;
[0071] initializing a vector of APP vectors W.sub.n.sup.(0) to the LLR vector L.sub.n and initializing a matrix W.sub.mn.sup.(0) to an all-zero matrix, said matrix W.sub.mn.sup.(j) being the intrinsic information from the check node m.
[0072] In steps A1.1 and A1.2 each variable node V, connected to a check node C.sub.m is configured for determining a most reliable symbols Q.sub.n.sup.1(j) and at least one symbol which is at least a p.sup.th most reliable symbol Q.sub.n.sup.p(j), with p2 for obtaining a vector of dc most reliable symbols.
[0073] In particular steps A1.1 and A1.2 search the extrinsic information of the most reliable symbols W.sub.n.sup.(j)(max.sub.1) , and their associated hard decision symbols Q.sub.n.sup.(j) and Q.sub.n.sup.(j).
[0074] Furthermore, each variable node n=0, . . . , d.sub.c1 is configured for determining (the most reliable symbols Q.sub.n.sup.(j) and the second most reliable symbol Q.sub.n.sup.(j) and their corresponding extrinsic reliability W.sub.n.sup.(j), W.sub.n.sup.(j) so that at a check node m the list of L+1 test vectors are built by replacing symbol Q.sub.n.sup.(j) by the second most reliable symbol Q.sub.n.sup.(j) in at most L locations were the differences between W.sub.n.sup.(j) and W.sub.n.sup.(j) are the smallest, with W.sub.n.sup.(j)=W.sub.n.sup.(j)W.sub.mn.sup.(j).
[0075] Based on the dc most reliable symbols, each check node C.sub.m is configured for determining: [0076] in step A3.1, a first symbol to be voted (selected) R.sub.n.sup.0(j) based on the vector of d.sub.c most reliable symbols passed by the variable nodes connected to him in the bipartite graph; [0077] in step A3.2, a list of i=1, . . . , L second symbol to be voted (selected) R.sub.n.sup.i(j) based on a list of L+1 test vectors defined as a combination of d.sub.c symbols with a restriction according to which at most of these d.sub.c symbols are a p.sup.th most reliable symbol Q.sub.n.sup.p(j) with p2, and at least d.sub.c of these d.sub.c symbols are a most reliable symbol Q.sub.n.sup.(j).
[0078]
[0079] It can be seen that comparatively with Algorithm 1, each check node performs more operations than a check node of Algorithm 1. Further, step A3.1 is the same as A3 in algorithm 1.
[0080] Furthermore, each variable node is configured for determining in steps A1.1 and A1.2 the most reliable symbols Q.sub.n.sup.(j) and the second most reliable symbol Q.sub.n.sup.(j) and their corresponding extrinsic reliability W.sub.n.sup.(j), W.sub.n.sup.(j) so that at a check node in steps A2.1 and A2.2 the list of L+1 test vectors are built by replacing symbol Q.sub.n.sup.(j) by the second most reliable symbol Q.sub.n.sup.(j) in at most L locations were the differences between W.sub.n.sup.(j) and W.sub.n.sup.(j) are the smallest, with W.sub.n.sup.(j)=W.sub.nW.sub.nm.
[0081] The method of the invention also comprises a step consisting in that a sorter unit is configured for sorting in step A1.3 the differences of extrinsic reliability W.sub.nm.sup.(j)W.sub.nm.sup.(j) from the highest value to the lowest value for obtaining a sequence of L sorted indices n, the sequence
comprising the locations where Q.sub.n.sup.(j) is replaced by Q.sub.n.sup.(j) in the L+1 test vectors.
[0082] In particular, step A1.3 sorts W.sub.mn.sup.(j)W.sub.nm.sup.(j) in increasing order, where n (m). Hence, the first values after the sorting are the ones with least difference between, the reliability of Q.sub.n.sup.(j) and Q.sub.n.sup.(j) symbols. Finally, step A1.3 stores the sorted n indices in
.
[0083]
[0084] Additionally, each variable node is further configured for computing in steps A4.1 and A4.2 an intrinsic information W.sub.mn.sup.(j) from check node counting the votes of R.sub.n.sup.(j) and of R.sub.n.sup.i(j) with the respectively amplitude voting v.sub.0, v.sub.1.
[0085]
[0086] Furthermore, each variable node taking as input the LLR vector and the vector W.sub.mn.sup.(j) combines (A4.1, A4.2) the previous vector W.sub.mn.sup.(j1), the voting symbols R.sub.n.sup.0(j) and of R.sub.n.sup.(j) and the voting amplitudes v.sub.0, v.sub.1 through a function F.sub.1 for obtaining the vector defined as the intrinsic information W.sub.mn.sup.(j).
[0087] The function F.sub.1 can be a simple summation of the values of the previous vector W.sub.mn.sup.(j1) at indices indicated by the voting symbols R.sub.n.sup.0(j) and of R.sub.n.sup.i(j) and the voting amplitudes v.sub.0, v.sub.1.
[0088] Also, each variable node taking as input the LLR vector and the vector W.sub.n.sup.(j) combines in steps A5.1, A5.2 the previous vector W.sub.n.sup.(j1)W.sub.mn.sup.(j1), the voting symbols R.sub.n.sup.(j) and of R.sub.n.sup.(j) and the voting amplitudes v.sub.0, v.sub.1 through a function F.sub.2 for obtaining the vector W.sub.n.sup.(j).
[0089] The function F.sub.2 can be a simple summation of the values of the previous vector W.sub.n.sup.(j1) at indices indicated by the voting symbols R.sub.n.sup.(j) and of R.sub.n.sup.(j) and the voting amplitudes v.sub.0, v.sub.1.
[0090] Steps A2.1, A3.1, A4.1 and A5.1 from Algorithm 2 are the same as A2, A3, A4 and A5 in Algorithm 1. These steps are performed based on the most reliable symbols. Steps A2.2, A3.2, A4.2 and A5.2 are performed with the L test vectors from the considered set of Q.sub.n.sup.(j) and Q.sub.n.sup.(j) symbols. Steps A2.2, A3.2, A4.2 and A5.2 are performed with the L+1 test vectors from the considered set of Q.sub.n.sup.(j) and Q.sub.n.sup.(j) symbols. Step A2.2 performs the syndrome computation for the L+1 test vectors (s.sub.i), formed by one symbol and d.sub.c1 symbols Q.sub.n.sup.(j). Step A3.2 calculates the candidates of the voting process, R.sub.ni.sup.(j), according to each one if the L+1 test vectors. Steps A4.2 and A5.2 are similar to steps A4 and A5 of algorithm. The main difference is that in the first one the voted candidates are R.sub.ni.sup.(j) instead of R.sub.n.sup.(j). One can note that there are two amplitudes for the votes v.sub.0, v.sub.1. As it can been explained earlier, the constraint v.sub.0>v.sub.1 must be fulfilled, because R.sub.n.sup.(j) is computed with Q.sub.n.sup.(j) symbols, so it is more reliable than R.sub.ni.sup.(j), which is computed with both Q.sub.n.sup.(j) and Q.sub.n.sup.(j) symbols.
Architecture of a Decoder for Implementing the Method of the Invention
[0091] A decoder for implementing the method of the invention is now described in relation to
[0092] On this figure, a diagram of the complete architecture is described. Three main parts can be distinguished: i) VNU units, ii) CNU units and iii) the sorter unit. The decoder has dc VNU units. Each VNU unit computes Q.sub.n.sup.(j) and Q.sub.n.sup.(j) symbols which are the input for CNU units, by using R.sub.n.sup.(j) and R.sub.n.sup.(j) incoming symbols (with i from 0 to L). In addition, VNU units calculate W.sub.nm.sup.(j) and W.sub.nm.sup.(j) and perform the subtraction (W.sub.nm.sup.(j)W.sub.nm.sup.(j)), which is used as input of the sorter unit. So, VNU units implement steps A1.1, A1.2, A4.1, A4.2, A5.1 and A5.2 from Algorithm 2. The sorter unit receives dc W.sub.nm.sup.(j)W.sub.nm.sup.(j) values and looks for the L smallest values (the elements which are added to the enlarged list are the ones with the smaller difference between the reliabilities of Q.sub.n.sup.(j) and Q.sub.n.sup.(j)). The sorter outputs are the indexes of the L elements with the smallest W.sub.nm.sup.(j)W.sub.nm.sup.(j) values within the dc inputs. This part of the architecture implements step A1.3 and due to its complexity and relative novelty it is explained at the end of the section, separately from the rest. Finally, two different kinds of CNU units are found: the ones that compute only Q.sub.n.sup.(j) symbols (CNU unit HD) and the ones that compute test vectors with both Q.sub.n.sup.(j) and Q.sub.n.sup.(j) (CNU unit test vector (TV)). The decoder has just one CNU unit HD and L CNU units TV. Hardware resources for computing h.sub.mn coefficients are shared between both HD and TV units. CNU unit HD implements steps A2.1 and A3.1 steps, while the L CNU units TV implement steps A2.2 and A.3 from Algorithm 2.
CNU Units
[0093]
[0094]
VNU Units
[0095]
[0096] Each VNU unit has 2(L+1) RAMs that store R.sub.n.sup.(j) and R.sub.ni symbols. A single RAM stores (q1)R.sub.n.sup.(j) (or R.sub.ni) symbols that correspond to the information of one sub-matrix. R.sub.n.sup.(j) and R.sub.ni symbols are represented with p bits because q=2.sup.p. Memories connected to the same input work in a ping-pong way, so during q1 clock cycles one RAM is read and the other is written and for the next q1 cycles memories work in the opposite way. This reduces the idle cycles of the decoder and increases its efficiency.
[0097] mThe logic for selecting the active cells generates sym_sel x which indicates if a VNU cell receives a vote or not. To compute sym_sel x of a cell x, all the R.sub.n.sup.(j) and R.sub.ni symbols (outputs of the ping-pong RAMs selected by the multiplexor) are compared with the symbol x associated to the cell, where x GF(q). These comparisons are implemented with XNOR gates, so if one of the symbols (R.sub.n.sup.(j) or R.sub.ni) is equal to x, the output of one of the XNOR gates will be one, indicating that the symbol x is voted. sym_sel x is computed by applying the OR operation to all the XNOR outputs that are compared to the symbol x. All the outputs of the XNOR gates (A bit) connected to R.sub.ni are also added (omitted on
[0098] Each VNU unit has q cells that compute W.sub.n(0)W.sub.mn(0), W.sub.n(.sup.0)W.sub.mn(.sup.0), |W.sub.n(.sup.1)W.sub.mn(.sup.1) . . . , W.sub.mn(.sup.q1)W.sub.mn(.sup.q2)|.
[0099] In
[0100] The outputs of q cells (W.sub.nW.sub.mn) are the inputs of the maximum finder unit as it is shown in
[0101] The latency equation for the whole decoder is the same as the one in [18]; however, if the same frequency wants to be reached the number of pipeline stages must be increased. The latency of the decoder is (q1+pipeline)d.sub.v(#iterations+1)+(q1+pipeline) clock cycles. Computing each sub-matrix takes (2.sup.q1) but the pipeline delay has to be added, so each sub-matrix needs (q1+pipeline) clock cycles. As there are d.sub.v sub-matrices, each iteration needs (q1+pipeline)d.sub.v clock cycles. We have to add one extra iteration to the total number of iterations for the initialization and (q1+pipeline) clock cycles are needed to get the values of the decoded codeword after the iterative process.
Sorter Unit: L-Minimum Finder in a dc Element List
[0102] The proposed ascendant sorter unit can be seen as an L-minimum finder. The architecture is based on the one from [20] but instead of looking for two minimums, it looks for L minimums. The main difference between this proposal and the one in [20] is that we do not apply masks to compute the minimums different from the absolute one. The fact of avoiding masks reduces the critical path at a cost of increasing some hardware resources. We describe an architecture for the sorter unit for L=4 and dc=27, but it can be easily generalized. In addition, the selection of the radix for each one of the stages is not optimized as the objective is just to show that there is a possible solution to implement the ascendant sorter unit with moderated area and continuous processing (each clock cycle a new group of elements can be sorted, after the latency generated by the pipeline with the first input).
[0103]
[0104] The architecture for one unit of stage I=0 is illustrated on
[0105] In the invention, the decoder of
Implementation Results and Comparison
[0106] We here below give estimated results of area and throughput for the decoder architecture described above.
[0107] To perform the estimation, analytical methods as the ones in [21] were applied. To ensure that a fair comparison with other works is done, we overestimate some hardware resources to provide an upper bound in both area and critical path. For example, the area of the first and second maximum finder in
[0108] On the other hand, the MV-SF architecture adds hardware to compute the test vectors, increasing the length of the critical path compared to ES-GBFDA in [18]. The number of gates of the critical path is increased at the check node for the selection of the test vectors (
[0109] Although this critical path is two gates longer than the one in [18], the effect of routing is bigger than the one from the logic depth, so we can assume frequency 238 MHz without introducing a big error in the estimation. Hardware resources for MV-SF decoder with the previous parameters can be found in the table of
[0110] In the table of
[0111] Comparing the ES-GBFDA decoder to the one for implementing the method of the invention, we increase area 1.5M/847K=1.77 times with an slightly lower throughput (540 Mbps instead of 615 Mbps).
[0112] The decoder of Algorithm 2 is 726/360=2 times less efficient than the one based on ES-GBFDA in [18] but it reaches a non-negligible coding gain of 0.44 dB. A direct mapping architecture of Algorithm 2 requires L+1 times the area of ES-GBFDA (L for the less reliable test vectors and one for the most reliable one), which is 4.23 MXOR. Hence, our proposal is 4.23M/1.5M=2.82 times less area consuming than a direct mapping architecture with 12% less throughput. In terms of efficiency, a direct mapping architecture 615 Mbps=4.23 MXORs=145) is more than two times less efficient that the proposal of this work, with the same coding gain.
[0113] To the best knowledge of the authors, the architecture in [15] is the most efficient one based on Min-Sum algorithm.
[0114] Compared to this architecture, the decoder of the invention requires 1.5M/806K=1.86 times more area, but reaches a throughput 540/149=3.62 times higher. The decoder for Algorithm 2 is 360=185=1.9 times more efficient compared to the one in [15] based on Simplified Min-Sum, but a difference of 0.26dB in coding gain should be taken into account. Regarding to Min-Max architectures, we compare to the most efficient one, [22] (other efficient architectures are included in the table of
[0115] To sum up, the architecture based the method of the invention has 1.86 and 2.75 times more area than the ones based on EMS or Min-Max, but reaches a throughput 3.5 times higher. MV-SF introduces 0.26dB of performance loss compared to EMS and 0.21 dB compared to Min-Max.
REFERENCES
[0116] [1] M. Davey and D. J. MacKay, Low density parity check codes over GF(q), IEEE Commun. Letter, vol. 2, pp. 165-167, June 1998.
[0117] [2] L. Barnault and D. Declercq, Fast decoding algorithm for LDPC over GF(2.sup.q), Proc. Info. Theory Workshop, pp. 70-73, Paris, France, March 2003.
[0118] [3] H. Wymeersch, H. Steendam and M. Moeneclaey, Log-domain decoding of LDPC codes over GF(q), Proc. IEEE Intl. Conf. on Commun., pp. 772-776, Paris, France, June 2004.
[0119] [4] C. Spagnol, E. Popovici and W. Marnane, Hardware implementation of GF(2.sup.m) LDPC decoders, IEEE Trans. on Circuits and Syst.-I, vol. 56, no. 12, pp. 2609-2620, December 2009.
[0120] [5] D. Declercq and M. Fossorier, Decoding algorithms for nonbinary LDPC codes over GF(q), IEEE Trans. on Commun., vol. 55, pp. 633-643, April 2007.
[0121] [6] C. Poulliat, M. Fossorier and D. Declercq, Design of regular (2,dc)-LDPC codes over GF(q) using their binary images, IEEE Trans. Commun., vol. 56(10), pp. 1626-1635, October 2008.
[0122] [7] V. Savin, Min-max decoding for non binary LDPC codes, Proc. IEEE ISIT, pp. 960-964, Toronto, Canada, July 2008.
[0123] [8] A. Voicila, D. Declercq, F. Verdier, M. Fossorier and P. Urard, Low-Complexity Decoding for non-binary LDPC Codes in High Order Fields, IEEE Trans. on Commun., vol. 58(5), pp 1365-1375, May 2010.
[0124] [9] E. Boutillon and L. Conde-Canencia, Bubble check: a simplified algorithm for elementary check node processing in extended min-sum non-binary LDPC decoders, Electronics Letters, vol. 46, pp. 633-634, April 2010.
[0125] [10] X. Zhang and F. Cai, Partial-parallel decoder architecture for quasi-cyclic non-binary LDPC codes, Proc. of Acoustics Speech and Signal Processing (ICASSP), pp. 1506-1509, Dallas, Tex., USA, March 2010.
[0126] [11] D. Zhao, X. Ma, C. Chen, and B. Bai, A Low Complexity Decoding Algorithm for Majority-Logic Decodable Nonbinary LDPC Codes, IEEE Commun. Letters, vol. 14, no. 11, pp. 1062-1064, November 2010.
[0127] [12] C. Chen, B. Bai, X. Wang, and M. Xu, Nonbinary LDPC Codes Constructed
[0128] Based on a Cyclic MDS Code and a Low-Complexity Nonbinary Message-Passing Decoding Algorithm, IEEE Commun. Letters, vol. 14, no. 3, pp. 239-241, March 2010.
[0129] [13] Y.-L. Ueng, C.-Y. Leong, C.-J. Yang, C.-C. Cheng, K.-H. Liao, and S.-W. Chen, An efficient layered decoding architecture for nonbinary QC-LDPC codes, IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 59, no. 2, pp. 385-398, February 2012.
[0130] [14] J. Lin and Z. Yan, Efficient shuffled decoder architecture for nonbinary quasicyclic LDPC codes, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. PP, no. 99, p. 1, 2012.
[0131] [15] X. Chen and C.-L. Wang, High-throughput efficient non-binary LDPC decoder based on the Simplified Min-Sum algorithm, IEEE Trans. On Circuits and Systems I: Regular
[0132] Papers, vol. 59, no. 11, pp. 2784-2794, November 2012.
[0133] [16] X. Zhang, F. Cai and S. Lin, Low-Complexity Reliability-Based Message-Passing Decoder Architectures for Non-Binary LDPC Codes, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 11, pp. 1938-1950, September 2011.
[0134] [17] F. Garcia-Herrero, M. J. Canet and J. Valls, Architecture of generalized bit-flipping decoding for high-rate non-binary LDPC codes, Springer Circuits, Systems, and Signal Processing, vol. 32, no. 2, pp. 727-741, April 2013.
[0135] [18] F. Garcia-Herrero, M. J. Canet, J. Valls, Decoder for an Enhanced Serial Generalized Bit Flipping Algorithm, IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 412-415, Sevilla, Spain, December 2012.
[0136] [19] B. Zhou, J. Kang, S. Song, S. Lin, K. Abdel-Ghaffar, and M. Xu, Construction of non-binary quasi-cyclic LDPC codes by arrays and array dispersions, IEEE Trans. on Commun., vol. 57, no. 6, pp. 1652-1662, June 2009.
[0137] [20] L. Amaru, M. Martina, and G. Masera, High speed architectures for finding the first two maximum/minimum values, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 12, pp. 2342-2346, December 2012.
[0138] [21] X. Zhang and F. Cai, Reduced-complexity decoder architecture for nonbinary LDPC codes, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 7, pp. 1229-1238, July 2011.
[0139] [22] F. Cai; X. Zhang, Relaxed Min-Max Decoder Architectures for Nonbinary Low-Density Parity-Check Codes, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2013.
[0140] [23] F. Garcia-Herrero, D. Declercq, J. Valls, Non-Binary LDPC Decoder based on Symbol Flipping with Multiple Votes, submitted to IEEE Commun. Letters, 2013.