Method and decoder for soft input decoding of generalized concatenated codes

Abstract

A soft input decoding method and a decoder for generalized concatenated (GC) codes. The GC codes are constructed from inner nested block codes, such as binary Bose-Chaudhuri-Hocquenghem, BCH, codes and outer codes, such as Reed-Solomon, RS, codes. In order to enable soft input decoding for the inner block codes, a sequential stack decoding algorithm is used. Ordinary stack decoding of binary block codes requires the complete trellis of the code. In one aspect, the present invention applies instead a representation of the block codes based on the trellizes of supercodes in order to reduce the memory requirements for the representation of the inner codes. This enables an efficient hardware implementation. In another aspect, there is provided a soft input decoding method and device employing a sequential stack decoding algorithm in combination with list-of-two decoding which is particularly well suited for applications that require very low residual error rates.

Claims

1. A soft input decoding method for a generalized concatenated code (GCC) constructed from L inner nested linear binary block codes and L outer codes, wherein L2 denotes the number of levels of the GCC and the inner code of a higher level is a sub-code of the inner code of the preceding lower levels, if any, the method comprising: using one or more decoding devices to iteratively decode level by level original data received from a data channel, and performing in each level: a first decoding step for decoding input data of the current level l based on the inner block codes of the current level l and providing respective decoded output data, the input data of the lowest level comprising the original data and the input data of each subsequent level being based on the decoding result of the previous level; and a subsequent second decoding step for decoding the outer code of the current level l based on the output data of the first decoding step to estimate a decoding result of the current level l, wherein the decoding result of the highest level is output as decoded data; wherein the first decoding step of at least one of the levels includes: sequential stack decoding of the input data of the current level l based on a trellis of the inner block code of the current level and using as a soft input information characterizing the reliability of the data channel, wherein the sequential stack decoding is configured to deliver only codewords comprised in the output data of that first decoding step which are represented by the trellis; and selecting as the output data of the first decoding step of the current level l: a first output data estimate resulting from the sequential stack decoding of the input data of the current level l, if its metric value rated according to a predetermined metric reaches or exceeds a predetermined threshold or is superior according to the metric to the respective metric value of a second output data estimate resulting from a second run of the sequential stack decoding in which the first output data estimate is excluded, and the second output data estimate otherwise.

2. The method according to claim 1, wherein: the first decoding step of at least one current level l other than the lowest level comprises trellis-based sequential stack decoding of the input data of the current level l using as a soft input information characterizing the reliability of the data channel; the sequential stack decoding is based on a first trellis representing a first super-code B.sub.1(l) of the inner block code B(l) of the current level l and on a second trellis representing a second supercode B.sub.2(l) of B(l), with B(l)=B.sub.1(i)B.sub.2(l), wherein a supercode of B(l) is a code containing all codewords of B(l) and the redundancy of each of the first supercode B.sub.1(l) and the second supercode B.sub.2(l) is lower than that of B(l); the corresponding inner code B(l1) of the immediately preceding level (l1) is reused as one of the supercodes B.sub.1(l) and B.sub.2(l); and the sequential stack decoding is configured to deliver only codewords comprised in the output data of the first decoding step that are represented by both the first trellis and the second trellis.

3. The method of according to claim 1, which further comprises applying hard decoding to decode the inner block codes in the first decoding step of a first level of the GCC, before applying sequential stack decoding to a higher level based on a trellis of the inner block code of the higher level and using as a soft input information characterizing the reliability of the data channel.

4. The method according to claim 3, wherein the first decoding step comprises determining for at least one data word comprised in the input data of the current level, whether the codeword resulting from hard decision decoding of the data word corresponds to a valid codeword of the inner block code B of the current level.

5. The method according to claim 3, wherein: the metric that is applied in the sequential stack decoding to rate code sequences occurring during the decoding and to order the stack based on the metric values resulting from the rating is one of the Fano metric and the quadratic Euclidean distance metric, or a combination thereof; and the soft input characterizing the reliability of the data channel serves as an input variable of the metric.

6. The method according to claim 1, wherein the block code of the lowest level of the inner nested block codes is a single-error correction code.

7. The method according to claim 1, wherein the inner codes are nested binary extended Bose-Chaudhuri-Hocquenghem (BCH) codes, and the extended BCH code in the lowest level of the inner nested BCH codes is a mere single parity-check, SPC, code; and the extended BCH code in at least one higher level of the inner nested BCH codes has an error correction capability and is a sub-code of the BCH code of the lowest nesting level.

8. The method according to claim 1, which comprises: arranging the original data received from a data channel in a two-dimensional original data matrix having a first dimension n.sub.a equal to the length of the outer codes and a second dimension n.sub.b equal to the length of the inner block codes, wherein a line of the first dimension of a matrix is a row of the matrix and a line of its second dimension is a column of the matrix, or vice versa, and the outer codes are defined over a Galois-Field GF(2.sup.m) such that m elements of each line of the second dimension represent one symbol of the Galois-Field GF(2.sup.m); and a) a first iteration corresponding to the lowest level of the inner block codes of the original data matrix, wherein the first decoding step comprises: applying a decoding scheme of the inner block code of the lowest level to the lines of the second dimension of the original data matrix with respect to the lowest level of the inner nested block codes in which the lines of a second dimension of the original data matrix are encoded in order to obtain an intermediate decoding data matrix of the first iteration and to determine erasure information characterizing lines of the second dimension of the original data matrix in which an erasure has been detected based on the decoding of the inner block code of the lowest level; inferring the information bits contained in the lines of the second dimension of the intermediate decoding data matrix of the first iteration in order to retrieve code symbols (a.sub.i,j) of the outer codes in which the lines of a first dimension of the original data matrix are encoded; and the second decoding step comprises: applying outer decoding corresponding to the respective outer codes to the retrieved code symbols in the lines of the first dimension of the intermediate decoding data matrix of the first iteration in order to obtain a partial decoding result matrix of the first iteration, wherein the erasure information is used during outer decoding to identify erroneous symbols of the outer code in the intermediate decoding data matrix of the first iteration; re-encoding said partial decoding result matrix of the first iteration by applying by applying an encoding scheme of the inner block code of the lowest level to the second dimension of this matrix to obtain a re-encoded matrix of the first iteration; and subtracting the re-encoded matrix of the first iteration from the original data matrix in order to obtain a start matrix for a subsequent further iteration; and b) for each of the further levels of the inner block codes, a respective further iteration wherein the first decoding step of the respective level comprises: applying a decoding scheme of the inner block code of the current level to the lines of the second dimension of the start matrix of the current iteration with respect to the current level of the inner block codes in which the lines of a second dimension of the start matrix of the current iteration are encoded in order to obtain an intermediate decoding data matrix of the current iteration; inferring the information bits contained in the lines of the second dimension of the intermediate decoding data matrix of the current iteration in order to retrieve code symbols of the outer codes in which the lines of a first dimension of the original data matrix are encoded; applying outer decoding corresponding to the respective outer codes used for obtaining the original data matrix during encoding, to the retrieved code symbols in the lines of the first dimension of the intermediate decoding data matrix of the current iteration in order to obtain a partial decoding result matrix of the current iteration; if the current iteration corresponds to the highest nesting level of the in-ner block codes in the original data matrix, outputting the partial decoding result matrix of the current iteration as the decoded data, and otherwise, re-encoding said partial decoding result matrix of the current iteration by applying an encoding scheme of the inner block code of the current level to the second dimension of this matrix to obtain a re-encoded matrix of the current iteration, and subtracting the re-encoded matrix of the current iteration from the start matrix of the current iteration in order to obtain a start matrix for a subsequent further iteration.

9. The method according to claim 1, which comprises, in at least one first decoding step using sequential stack decoding, the decoding of a data word in the input data of the inner code of the current level is terminated, when: the maximum possible path length is reached, wherein the trellis path having the best metric value among the paths accrued so far in the stack is selected as the decoded codeword corresponding to the data word; or a predetermined maximum number of iterations have occurred.

10. The method according to claim 9, wherein, if the termination is caused because a predetermined maximum number of iterations have occurred, the output data of the current first decoding step is marked as an erasure symbol for the corresponding outer code used in the second decoding step of the current level.

11. A decoding device operable to apply soft input decoding for a generalized concatenated code (GCC) constructed from L inner nested linear binary block codes and L outer codes, wherein L2 denotes the number of levels of the GCC and the inner code of a higher level is a sub-code of the inner code of the preceding lower levels, if any, the decoding device comprising: a decoding circuit configured to iteratively decode level by level original data received from a data channel, and performing in each level: a first decoding step for decoding input data of the current level l based on the inner block codes of the current level l and providing respective decoded output data, the input data of the lowest level comprising the original data and the input data of each subsequent level being based on the decoding result of the previous level; and a subsequent second decoding step for decoding the outer code of the current level l based on the output data of the first decoding step to estimate a decoding result of the current level l, wherein the decoding result of the highest level is output as decoded data; wherein the first decoding step of at least one of the levels includes: sequential stack decoding of the input data of the current level l based on a trellis of the inner block code of the current level and using as a soft input information characterizing the reliability of the data channel, wherein the sequential stack decoding is configured to deliver only codewords comprised in the output data of that first decoding step which are represented by the trellis; and selecting as the output data of the first decoding step of the current level l: a first output data estimate resulting from the sequential stack decoding of the input data of the current level l, if its metric value rated according to a predetermined metric reaches or exceeds a predetermined threshold or is superior according to the metric to the respective metric value of a second output data estimate resulting from a second run of the sequential stack decoding in which the first output data estimate is excluded, and the second output data estimate otherwise.

12. The device according to claim 11, wherein: the first decoding step of at least one current level l other than the lowest level comprises trellis-based sequential stack decoding of the input data of the current level l using as a soft input information characterizing the reliability of the data channel; the sequential stack decoding is based on a first trellis representing a first super-code B.sub.1(l) of the inner block code B(l) of the current level l and on a second trellis representing a second supercode B.sub.2(l) of B(l), with B(l)=B.sub.1(i)B.sub.2(l), wherein a supercode of B(l) is a code containing all codewords of B(l) and the redundancy of each of the first supercode B.sub.1(l) and the second supercode B.sub.2(l) is lower than that of B(l); the corresponding inner code B(l1) of the immediately preceding level (l1) is reused as one of the supercodes B.sub.1(l) and B.sub.2(l); and the sequential stack decoding is configured to deliver only codewords comprised in the output data of the first decoding step that are represented by both the first trellis and the second trellis.

13. The device according to claim 11, which further comprises applying hard decoding to decode the inner block codes in the first decoding step of a first level of the GCC, before applying sequential stack decoding to a higher level based on a trellis of the inner block code of the higher level and using as a soft input information characterizing the reliability of the data channel.

14. The device according to claim 13, wherein the first decoding step comprises determining for at least one data word comprised in the input data of the current level, whether the codeword resulting from hard decision decoding of the data word corresponds to a valid codeword of the inner block code B of the current level.

15. The device according to claim 13, wherein: the metric that is applied in the sequential stack decoding to rate code sequences occurring during the decoding and to order the stack based on the metric values resulting from the rating is one of the Fano metric and the quadratic Euclidean distance metric, or a combination thereof; and the soft input characterizing the reliability of the data channel serves as an input variable of the metric.

16. The device according to claim 11, wherein the inner codes are nested binary extended Bose-Chaudhuri-Hocquenghem (BCH) codes, and the extended BCH code in the lowest level of the inner nested BCH codes is a mere single parity-check, SPC, code; and the extended BCH code in at least one higher level of the inner nested BCH codes has an error correction capability and is a sub-code of the BCH code of the lowest nesting level.

17. The device according to claim 11, wherein the decoding circuit is further configured to arrange the original data received from a data channel in a two-dimensional original data matrix having a first dimension n.sub.a equal to the length of the outer codes and a second dimension n.sub.b equal to the length of the inner block codes, wherein a line of the first dimension of a matrix is a row of the matrix and a line of its second dimension is a column of the matrix, or vice versa, and the outer codes are defined over a Galois-Field GF(2.sup.m) such that m elements of each line of the second dimension represent one symbol of the Galois-Field GF(2.sup.m); and a) a first iteration corresponding to the lowest level of the inner block codes of the original data matrix, wherein the first decoding step comprises: applying a decoding scheme of the inner block code of the lowest level to the lines of the second dimension of the original data matrix with respect to the lowest level of the inner nested block codes in which the lines of a second dimension of the original data matrix are encoded in order to obtain an intermediate decoding data matrix of the first iteration and to determine erasure information characterizing lines of the second dimension of the original data matrix in which an erasure has been detected based on the decoding of the inner block code of the lowest level; inferring the information bits contained in the lines of the second dimension of the intermediate decoding data matrix of the first iteration in order to retrieve code symbols (a.sub.i,j) of the outer codes in which the lines of a first dimension of the original data matrix are encoded; and the second decoding step comprises: applying outer decoding corresponding to the respective outer codes to the retrieved code symbols in the lines of the first dimension of the intermediate decoding data matrix of the first iteration in order to obtain a partial decoding result matrix of the first iteration, wherein the erasure information is used during outer decoding to identify erroneous symbols of the outer code in the intermediate decoding data matrix of the first iteration; re-encoding said partial decoding result matrix of the first iteration by applying by applying an encoding scheme of the inner block code of the lowest level to the second dimension of this matrix to obtain a re-encoded matrix of the first iteration; and subtracting the re-encoded matrix of the first iteration from the original data matrix in order to obtain a start matrix for a subsequent further iteration; and b) for each of the further levels of the inner block codes, a respective further iteration wherein the first decoding step of the respective level comprises: applying a decoding scheme of the inner block code of the current level to the lines of the second dimension of the start matrix of the current iteration with respect to the current level of the inner block codes in which the lines of a second dimension of the start matrix of the current iteration are encoded in order to obtain an intermediate decoding data matrix of the current iteration; inferring the information bits contained in the lines of the second dimension of the intermediate decoding data matrix of the current iteration in order to retrieve code symbols of the outer codes in which the lines of a first dimension of the original data matrix are encoded; applying outer decoding corresponding to the respective outer codes used for obtaining the original data matrix during encoding, to the retrieved code symbols in the lines of the first dimension of the intermediate decoding data matrix of the current iteration in order to obtain a partial decoding result matrix of the current iteration; if the current iteration corresponds to the highest nesting level of the in-ner block codes in the original data matrix, outputting the partial decoding result matrix of the current iteration as the decoded data, and otherwise, re-encoding said partial decoding result matrix of the current iteration by applying an encoding scheme of the inner block code of the current level to the second dimension of this matrix to obtain a re-encoded matrix of the current iteration, and subtracting the re-encoded matrix of the current iteration from the start matrix of the current iteration in order to obtain a start matrix for a subsequent further iteration.

18. A nonvolatile memory, the system comprising: a memory array including a plurality of cells configured to store a plurality of data bits and a plurality of parity bits that are calculated from the plurality of data bits according to a coding scheme based on generalized concatenated code (GCC) wherein the GCC is constructed from inner nested linear binary block codes and outer codes; and a decoding device configured to iteratively decode level by level original data received from a data channel, and performing in each level: a first decoding step for decoding input data of the current level l based on the inner block codes of the current level l and providing respective decoded output data, the input data of the lowest level comprising the original data and the input data of each subsequent level being based on the decoding result of the previous level; and a subsequent second decoding step for decoding the outer code of the current level l based on the output data of the first decoding step to estimate a decoding result of the current level l, wherein the decoding result of the highest level is output as decoded data; wherein the first decoding step of at least one of the levels includes: sequential stack decoding of the input data of the current level l based on a trellis of the inner block code of the current level and using as a soft input information characterizing the reliability of the data channel, wherein the sequential stack decoding is configured to deliver only codewords comprised in the output data of that first decoding step which are represented by the trellis; and selecting as the output data of the first decoding step of the current level l: a first output data estimate resulting from the sequential stack decoding of the input data of the current level l, if its metric value rated according to a predetermined metric reaches or exceeds a predetermined threshold or is superior according to the metric to the respective metric value of a second output data estimate resulting from a second run of the sequential stack decoding in which the first output data estimate is excluded, and the second output data estimate otherwise.

19. A computer program comprising instructions that when executed on a decoding device are operable to apply soft input decoding for a generalized concatenated code (GCC) constructed from L inner nested linear binary block codes and L outer codes, wherein L2 denotes the number of levels of the GCC and the inner code of a higher level is a sub-code of the inner code of the preceding lower levels, if any, the instructions executable to: iteratively decode level by level original data received from a data channel, and perform in each level: a first decoding step for decoding input data of the current level l based on the inner block codes of the current level l and providing respective decoded output data, the input data of the lowest level comprising the original data and the input data of each subsequent level being based on the decoding result of the previous level; and a subsequent second decoding step for decoding the outer code of the current level l based on the output data of the first decoding step to estimate a decoding result of the current level l, wherein the decoding result of the highest level is output as decoded data; wherein the first decoding step of at least one of the levels includes: sequential stack decoding of the input data of the current level l based on a trellis of the inner block code of the current level and using as a soft input information characterizing the reliability of the data channel, wherein the sequential stack decoding is configured to deliver only codewords comprised in the output data of that first decoding step which are represented by the trellis; and selecting as the output data of the first decoding step of the current level l: a first output data estimate resulting from the sequential stack decoding of the input data of the current level l, if its metric value rated according to a predetermined metric reaches or exceeds a predetermined threshold or is superior according to the metric to the respective metric value of a second output data estimate resulting from a second run of the sequential stack decoding in which the first output data estimate is excluded, and the second output data estimate otherwise.

Description

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING

(1) FIG. 1 schematically illustrates a construction of a GC code starting from original data arranged in a data matrix to be encoded into a corresponding codeword matrix, according to a preferred embodiment of the present invention;

(2) FIG. 2 schematically illustrates a simplified version of the encoding scheme of FIG. 1 in more detail;

(3) FIG. 3 illustrates trellises of an example code and of its two supercodes. Trellis a) is a representation of the complete code, whereas the trellises b) and c) are the representations of the supercodes;

(4) FIG. 4 shows a comparison of Algorithm 1 and Algorithm 3 with respect to the residual word error rate (WER);

(5) FIG. 5 shows Comparison of Algorithm 1 and Algorithm 3 with respect to the number of iterations;

(6) FIG. 6 schematically illustrates a decoding scheme for a codeword matrix obtainable from the encoding scheme of FIG. 2, according to a preferred embodiment of the present invention;

(7) FIG. 7 a diagram showing the code rate versus signal to noise ratio in an exemplary embodiment for a likelihood P.sub.e=10.sup.10 of the likelihood of the event that at least one level is in error;

(8) FIG. 8 a diagram showing the code rate versus signal to noise ratio in an exemplary embodiment for a likelihood P.sub.e=10.sup.16 of the likelihood of the event that at least one level is in error;

(9) FIG. 9 schematically illustrates a memory system comprising a coding device, particularly decoding device, incorporated in a memory controller, according to a preferred embodiment of the present invention;

(10) FIG. 10 shows a block diagram of a decoding device for sequential decoding, according to a preferred embodiment of the present invention;

(11) FIG. 11 a block diagram of a priority queue element of the decoding device of FIG. 10; and

(12) FIG. 12 shows a diagram of the average number of iterations for decoding the first and second level as a function of the signal-to-noise ratio, according to a preferred embodiment of the present invention based on Algorithm 2.

DETAILED DESCRIPTION OF THE INVENTION

(13) GC Construction

(14) The coding methods illustrated herein are based on the use of GC codes for error correction in memories, such as flash memories, that require high-rate codes. The GC codes are constructed from inner nested binary Bose-Chaudhuri-Hocquenghem (BCH) codes and outer Reed-Solomon (RS) codes. For the inner codes extended BCH codes are used, where single parity-check codes are applied in the first level of the GC code (cf. FIGS. 1 and 2). This construction enables high-rate GC codes. A detailed general discussion of GC codes can be found in [5].

(15) Reference is now made to FIG. 1, which illustrates a construction of a GC code starting from original data arranged in a data matrix to be encoded into a corresponding codeword matrix, i.e. an encoding process leading to a GC code that is suitable to be decoded by decoding methods and devices according to the present invention. The original data is arranged in an n.sub.bn.sub.a matrix, where n.sub.a and n.sub.b are the lengths of the outer and inner codes, respectively. The encoding starts with the outer codes. The rows are protected by L outer codes, e.g. Reed-Solomon codes, of length n.sub.a, i.e. L denotes the number of levels. m elements of each column represent one symbol from the Galois field GF(2.sup.m). Hence, m neighboring rows form a codeword of an outer code A.sup.(i), i=0, . . . , L1. Note that the code rate of the outer codes increases from level to level. The outer codes protect Lm rows of the matrix. The remaining n.sub.bLm rows are used for the redundancy of the inner codes.

(16) The shaded area in FIG. 1 illustrates the redundancy of the component codes that are filled by outer and inner encoding. After the outer encoding the columns of the codeword matrix are encoded with binary inner codes of length n.sub.b. Each column of the codeword matrix is the sum of L codewords of nested linear BCH codes.
B.sup.(L1).Math.B.sup.(L2).Math. . . . .Math.B.sup.(0)(1)

(17) Hence, a higher level code is a sub-code of its predecessor, wherein the higher levels have higher error correcting capabilities, i.e., t.sub.b,L1t.sub.b,L2 . . . t.sub.b,0, where t.sub.b,i is the error correcting capability of level i. The code dimensions are k.sup.(0)=Lm, k.sup.(1)=(L1)m, . . . , k.sup.(L1)=m. The codeword b.sub.j of the j-th column is the sum of L codewords.

(18) $\begin{matrix} b_{j} = {.Math.}_{i = 0}^{L - 1} b_{j}^{(i)} & (2) \end{matrix}$

(19) These codewords b.sub.j.sup.(i) are formed by encoding the symbols a.sub.j,i with the corresponding sub-code B.sup.(i), where a.sub.j,i is the j-th symbol (m bits) of the outer code A.sup.(i). For this encoding (Li1)m zero bits are prefixed onto the symbol a.sub.j,i. Note that the j-th column b.sub.j is a codeword of B.sup.(0), because of the linearity of the nested codes.

(20) FIG. 2 illustrates this encoding scheme in more detail based on a simple exemplary case (Example 0), where L=2, m=3 and n.sub.a=n.sub.b=7. Thus, in this example there are only L=2 levels and with m.sub.1=m=3 bits per symbol the outer codes, selected to be RS codes in this example, are constructed over the Galois field GF(2.sup.3), while for the sake of simplicity the inner block codes, selected to be extended BCH codes in this example, are defined over GF(2.sup.3), i.e. m.sub.2=3. In a first step S1, a data structure representing a n.sub.bn.sub.a (i.e. 77) matrix D is filled with data to be encoded and consisting of 24 information bits i.sub.0 to i.sub.23. The filling scheme corresponds to that of the data matrix D in FIG. 2. Accordingly, in the first level (i=0) the number of information bits is lower than in the second level (i=1) and the remaining prefixed bits of both levels are reserved for the redundancy added by the outer encoding in a next step S2. Each symbol comprises three neighboring bits of the same column, such that for example in the fifth column of D the bits i.sub.0, i.sub.3 and i.sub.6 form a first symbol in the column, and the bits i.sub.11, i.sub.16 and i.sub.21 form another symbol. In addition, the final row of the matrix D is reserved for redundancy to be added by the inner encoding in a further step S3.

(21) In the outer encoding step S2, the information in each of the two levels i=0 and i=1 is encoded by a respective RS code, wherein the code dimension of the outer RS code for level 0 is only k.sub.a.sup.(0)=3 while the code dimension of level 1 is increased to k.sub.a.sup.(1)=5. Performing the outer encoding step S2 results in an intermediate matrix A comprising the code symbols a.sub.i,j, wherein each of these symbols a.sub.i,j comprises m.sub.1=3 bits and the rows of the matrix A are codewords of the outer code.

(22) In the inner encoding step S3 each of the symbols a.sub.i,j of the intermediate matrix A is individually encoded by a corresponding inner code in the form of an extended BCH code B.sup.(i). In the first level i=0, the respective extended BCH code B.sup.(0) may particularly be, as in this example, a mere Single Parity Check (SPC) code. Accordingly, as exemplarily illustrated in FIG. 2 for symbol a.sub.4,0, each of symbols a.sub.j,0, of level 0 (j=0, . . . , 6) is encoded by prefixing (Li1)m.sub.1, i.e. (201)*3=3 zero bits (i.e. a 0 symbol) onto the symbol and applying an SPC code to the two symbols which is added in the final row of the column representing a resulting codeword b.sub.j.sup.(0) for column j and level 0.

(23) In the second level i=1, the respective extended BCH code B.sup.(1), which unlike the SPC code does have an error correction capability of 1 Bit, is applied in each column of the matrix A to the respective symbol a.sub.j,1. As in this simple example this is already the final level, no prefixing of 0 symbols is necessary. Again, an SPC code is applied to the resulting BCH codeword and added in the final row of the respective column j.

(24) In order to arrive at the final GC codeword matrix C, on a column by column basis all of the individual codewords b.sub.j.sup.(i) of all levels i of column j are added according to formula (2) above in order to receive the corresponding codeword b.sub.j which then forms column j of the resulting GC codeword matrix C, as again exemplarily illustrated in FIG. 2 for column j=4.

(25) In a further example (Example 1) corresponding to FIG. 1, we consider a GC code suitable for error correction in flash memories. The GC code is designed for 2 KB information blocks, i.e. a code which can be used to store 2 KB of data plus additional meta information. For this GC code we use L=6 levels with inner nested BCH codes over GF (2.sup.6), i.e. m.sub.2=6, and outer RS codes over GF (2.sup.9) (i.e. m.sub.1=9). In the first level (i=0) the inner code can correct a single error and therefore six redundancy bits are needed. Thus, the number of rows is n.sub.b=6.Math.9+6=60. All inner codes are binary BCH codes of length n.sub.b=60, where the code B.sup.(0) has a dimension k.sub.b,0=54 and minimum distance d.sub.b,0=3. The outer RS codes are constructed over the Galois field GF(2.sup.9). Hence, the dimension of the inner codes is reduced by m.sub.1=9 bits with each level. The GC code is constructed from L=6 outer RS codes of length n.sub.a=343. The parameters of the codes are summarized in the following Table I. The code has overall dimension k=m .sub.j=0.sup.L1k.sub.a,j=16596 and length n=n.sub.a.Math.n.sub.b=20580. The code has a code rate R=0.806. The design of this code will be discussed later-on.

(26) TABLE-US-00001 TABLE I j k.sub.b, j d.sub.b, j k.sub.a, j d.sub.a, j 0 54 3 187 157 1 45 5 321 23 2 36 9 333 11 3 27 11 331 13 4 18 15 335 9 5 9 17 337 7

(27) The table shows the parameters of the code of example 1. k.sub.b,j and d.sub.b,j are the code dimension and minimum Hamming Distance of the binary inner code of level j. k.sub.a,j and d.sub.a,j are the code dimension and minimum Hamming Distance of the outer RS codes.

(28) This code is also able to correct burst errors. The minimum distance of all outer RS code is greater than or equal to five. Hence, each outer code can correct at least two erroneous symbols and consequently two columns of the codeword matrix may be corrupted by an arbitrary number of errors.

II. Sequential Stack Decoding

(29) The GC decoder processes level by level, where first the inner codes and then the outer codes are decoded. In order to enable soft input decoding of the overall GC code, a soft input decoding algorithm for the inner codes is required. This section describes sequential decoding procedures using the stack algorithm for block codes. These decoding methods are used to decode the binary inner codes.

A. Sequential Stack Decoding Using a Single Trellis

(30) Firstly, a sequential stack decoding process using a single trellis, as presented in [15], is discussed in more detail with reference to Algorithm 1 outlined below in order to better illustrate a starting point of the present invention. All decoding methods of the present invention are based on this decoding method, which uses a trellis, as defined above, to represent the code. Improvements to this decoding algorithm provided by the present invention will be discussed in the subsequent sections.

(31) The sequential decoding procedure as presented in [15] is a stack algorithm, i.e., a stack is required to store interim results. The stack contains code sequences of different lengths. Let v.sub.t denote a code sequence of length t, i.e. v.sub.t=v.sub.1, . . . , v.sub.t. Each code sequence is associated with a metric and a node .sub.t. The node .sub.t is the node in the trellis that is reached by following the path corresponding to the code sequence through the trellis. The metric rates each code sequence and the stack is ordered according to these metric values where the code sequence at the top of the stack is the one with the largest metric value.

(32) The Fano metric for a code bit v.sub.i is defined as follows. Let v.sub.i be the i-th code bit and r.sub.i the i-th received symbol for transmission over a discrete memoryless channel. The Fano metric for a code bit v.sub.i is defined by:

(33) $\begin{matrix} M (r_{i} .Math. v_{i}) = \log_{2} \frac{p (r_{i} .Math. v_{i})}{p (r_{i})} - & (3) \end{matrix}$
where p(r.sub.i|v.sub.i) is the channel transition probability and p(r.sub.i) is the probability to observe r.sub.i at the channel output. The term custom character is a bias term that is typically chosen to be the code rate R [34]. The Fano metric of a code sequence v.sub.t is
M(r.sub.t|v.sub.t)=.sub.i=1.sup.tM(r.sub.i|v.sub.i)(4)
where r.sub.t is the sequence of the first t received symbols. Note that the Fano metric according to the above equations is only defined for discrete memoryless channels (DMC). We consider the quantized AWGN channel which is a DMC. Binary block codes typically have no tree structure. Consequently, the Fano metric is not necessarily the best metric for all linear block codes. However, in particular at least when the inner block codes are specifically selected to be binary BCH codes, custom character =R provides good results for all considered channel conditions.

(34) Algorithm 1 will be demonstrated in the following example (Example 2), where for simplicity a transmission over a binary symmetrical channel (BSC) is assumed:

(35) Consider for instance the code B={(0000),(1110),(1011),(0101)} with parity-check matrix

(36) $H = (\begin{matrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{matrix}) .$

(37) The corresponding trellis is depicted in FIG. 3A. A transmission over a binary symmetrical channel with an error probability of 0.1 is assumed. Hence, the results for the Fano metric are:

(38) $M (r_{i} .Math. v_{i}) {\begin{matrix} 0.3 for r_{i} = v_{i} \\ - 2.8 for r_{i} v_{i} \end{matrix} .$

(39) Algorithm 1 can be described in pseudo-code form as follows:

(40) TABLE-US-00002 Algorithm 1: Sequential stack decoding using a single trellis. Data: received word r Result: estimated codeword {circumflex over (v)} Sequential decoding starts in the first node .sub.0 of the trellis; calculate the metric values for v.sub.1 = 0 and v.sub.1 = 1; insert both paths into the stack according to their metric values; while the top path has not approached the end node .sub.n do remove the code sequence v.sub.t at the top from the stack; if the branch v.sub.t+1 = 0 exists in the trellis for the node .sub.t corresponding to the top path v.sub.t then calculate the metric M(r.sub.t+1|v.sub.t+1) = M(r.sub.t|v.sub.t) + M(r.sub.t+1|v.sub.t+1 = 0); insert the code sequence v.sub.t+1 = (v.sub.t, 0). into the stack; end if the branch v.sub.t+1 = 1 exists in the trellis for the node .sub.t corresponding to the top path v.sub.t then calculate the metric M(r.sub.t+1|v.sub.t+1) = M(r.sub.t|v.sub.t) + M(r.sub.t+1|v.sub.t+1 = 1); insert the code sequence v.sub.t+1 = (v.sub.t,1) into the stack; end end return the codeword {circumflex over (v)} corresponding to the top path in the final iteration;

(41) The following table represents the stack for the received sequence r=(0010) throughout the four iterations needed to calculate the estimated codeword {circumflex over (v)}:

(42) TABLE-US-00003 1st iteration 2nd iteration 3rd iteration 4th iteration v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) 0 0.3 00 0.6 000 2.2 0000 1.9 1 2.8 01 2.5 01 2.5 01 2.5 1 2.8 1 2.8 1 2.8

(43) Accordingly, the top word after the 4.sup.th and last iteration is output as the estimated codeword {circumflex over (v)}=0000. A negative value indicates that the received word was in error. A positive value indicates that the received word is error free. More errors in the received word lead to a negative value with large magnitude, which indicates a low reliability of the estimated codeword. This indication can then be used by the subsequent decoding of the outer codes to correct the remaining error(s).

B. Supercode Decoding for Nested BCH-Codes

(44) This section starts with a description of the supercode decoding method according to preferred embodiments followed by a discussion of the proposed application of supercode decoding for the nested-BCH codes that are used in the GC code.

(45) A supercode is a superset B.sub.1 of the original code B B.sub.1. In order to decode the original code B, two supercodes B.sub.1 and B.sub.2 have to constructed such that B.sub.1 B.sub.2=B. The supercodes have fewer redundancy bits and thus fewer trellis states. The supercodes can be constructed such that each code has half of the original redundancy bits. This reduces the number of states from O(2.sup.p) to O(2.sup.p/2) in standard order notation, where p is the number of parity bits. The concept of supercode decoding is well-suited for decoding of GC codes, because the higher levels of the nested-BCH codes are supercodes of the lower levels (cf. Equation (1)).

(46) A supercode B.sub.i of the block code B is a code containing all codewords of B. For a linear code B with parity-check matrix H, we can construct two supercodes B.sub.1 and B.sub.2 such that B=B.sub.1 B.sub.2.

(47) Let

(48) $H = (\begin{matrix} H_{1} \\ H_{2} \end{matrix})$
be the parity-check matrix of the code B. This means that H.sub.1 and H.sub.2 are two sub-matrices of H. Then the sub-matrices H.sub.1 and H.sub.2 define the supercodes B.sub.1 and B.sub.2, respectively.

Example 3

(49) Consider for example the code B from Example 2. We obtain
H.sub.1=(1101) custom character B.sub.1={(0000),(1100),(1110),(0010),(1011),(1001),(1011),(0101)}
and
H.sub.2=(0111)B.sub.2={(0000),(1000),(0110),(1110),(1011),(1101),(0011),(0101)},
where the underlined vectors are the codewords of the code B. The corresponding supercode trellises are depicted in FIG. 3(b) and FIG. 3(c), respectively.

(50) Next the proposed sequential decoding algorithm is demonstrated. Any path stored in the stack is associated with a metric value as well as two states .sub.t,1 and .sub.t,2 which are the states in the trellis for supercode B.sub.1 and B.sub.2, respectively. We demonstrate decoding Algorithm 2 in the following example, where we consider the same setup as in Example 2. Algorithm 2 can be described in pseudo-code form as follows:

(51) TABLE-US-00004 Algorithm 2: Sequential stack decoding using supercode trellises. Data:received word r Result: estimated codeword {circumflex over (v)} Sequential decoding starts in the nodes .sub.0,1 and .sub.0,2 of the supercode trellises; calculate the metric values for v.sub.1 = 0 and v.sub.1 = 1; insert both paths into the stack according to their metric values; while the top path has not approached the end nodes .sub.n,1 and .sub.n,2 do remove the code sequence v.sub.t at the top from the stack; if the branch v.sub.t+1 = 0 exists in the trellis for both nodes .sub.n,1 and .sub.n,2 corresponding to the top path v.sub.t then calculate the metric M(r.sub.t+1|v.sub.t+1) = M(r.sub.t|v.sub.t) + M(r.sub.t+1|v.sub.t+1 = 0); insert the code sequence v.sub.t+1 = (v.sub.t,0). into the stack; end if the branch v.sub.t+1 = 1 exists in the trellis for both nodes .sub.n,1 and .sub.n,2 corresponding to the top path v.sub.t then calculate the metric M(r.sub.t+1|v.sub.t+1) = M(r.sub.t|v.sub.t) + M(r.sub.t+1|v.sub.t+1 = 1); insert the code sequence v.sub.t+1 = (v.sub.t,1) into the stack; end end return the codeword {circumflex over (v)} corresponding to the top path in the final iteration;

Example 4

(52) The following table represents the stack for the received sequence r=(0010) for algorithm 2 throughout the five iterations needed to calculate the estimated codeword {circumflex over (v)}:

(53) TABLE-US-00005 1st iteration 2nd iteration 3rd iteration 4th iteration 5th iteration v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) v.sub.t M(r.sub.t|v.sub.t) 0 0.3 00 0.6 001 0.9 000 2.2 0000 1.9 1 2.8 01 2.5 000 2.2 01 2.5 01 2.5 1 2.8 01 2.5 1 2.8 1 2.8 1 2.8

(54) Accordingly, the top word after the 5.sup.th and last iteration is output as the estimated codeword {circumflex over (v)}=0000. A negative value indicates that the received word was in error. A positive value indicates that the received word is error free. More errors in the received word lead to a negative value with large magnitude, which indicates a low reliability of the estimated codeword. This indication can then be used by the subsequent decoding of the outer codes to correct the remaining error(s). Note that the stack in the third iteration differs from Example 2, because the code sequence 001 exists in both supercode trellises but not in the actual code. This code sequence is deleted in the next iteration, because it cannot be extended in both supercode trellises.

(55) As the previous example demonstrates, the time complexity of the proposed algorithm may be larger than with Algorithm 1. This results from code sequences that exist in the supercodes, but are not valid in the actual code. Nevertheless, both algorithms result in the same codeword:

(56) Theorem 1: Algorithm 1 and Algorithm 2 result in the same estimated codeword.

(57) Proof. Both algorithms differ only with respect to the representation of the code. To prove the proposition, it is sufficient to verify that both representations are equivalent. We first prove by induction that the estimated codeword corresponds to a valid path in both supercode trellises, i.e., it is a codeword in both supercodes. The base case is the initial step where the code bits 0 and 1 are inserted in the stack. Note that a linear code has no code bit positions with constant values. Hence, the transitions v.sub.1=0 and v.sub.1=1 exist in both supercode trellises. For the inductive step, we assume that a path for the code sequence v.sub.t exists in both supercode trellises. It follows from Algorithm 2 that this path is only extended, if the extended path exists in both supercode trellises. This proves the claim that the estimated codeword corresponds to a valid path in both supercode trellises. Now note that B=B.sub.1 B.sub.2, i.e., a path is only valid in both supercode trellises if and only if it is a valid codeword of the code B. Algorithm 2 reduces the space complexity required for representing the code. We demonstrate this in the following example.

Example 5

(58) We consider three BCH codes from Table I. All codes have length n=60. In the first level, we use a single-error correcting code. This code has 3,262 nodes in the trellis. This code is a supercode of the BCH code of the second level. The trellis of the second level has 159,742 nodes. However, utilizing the trellis of the first level code, we require only a single additional supercode trellis with 2,884 nodes to represent the code at the second level. Finally, the code at the third level has a trellis with 7,079,886 nodes. Using supercode decoding, we utilize the trellises of the first and second level and require one additional supercode trellis with 2,410 nodes to represent the third code. With sequential decoding the number of visited nodes in the trellis (the number of iterations) depends on the number of transmission errors. Note that with the presented codes the time complexity with Algorithm 2 is at most 1.75 times larger than with Algorithm 1.

C. Selective Soft Input Decoding and List-of-Two Decoding

(59) Next, two techniques to improve the performance and the complexity of Algorithm 1 will be described, starting with a demonstration that the soft input decoding can be omitted in cases where the hard decision of the received vector corresponds to a valid codeword (selective soft input decoding, see subsection). Thereafter, the proposed sequential list-of-two decoding algorithm is described. List-of-two decoding is motivated by the fact that Algorithm 1 is not a maximum-likelihood decoding procedure. Hence, one may search for further codewords in order to find better candidates than the result of Algorithm 1.

(60) (a) Selective soft input decoding: In the following an additive white Gaussian noise channel with binary phase shift keying is considered. Assume that a binary code symbol v.sub.t custom character .sub.2 is mapped to the transmission symbol x.sub.t {+1, 1} by x.sub.t=12v.sub.t. The transmitted symbol vector x is distorted by a noise vector n such that the received sequence is r=x+n. The noise vector n is a vector of independent identically distributed Gaussian random variables with mean zero. Hence,

(61) $\begin{matrix} p (r_{t} .Math. x_{t} = 1) = \frac{1}{\sqrt{2^{2}}} .Math. e^{\frac{{(r_{t} 1)}^{2}}{2}}, & (5) \end{matrix}$
where .sup.2 denotes the variance of the Gaussian distribution. For this channel, it is common practice to use the quadratic Euclidean distance d.sub.E.sup.2(x, r)=.sub.i=1.sup.n|x.sub.iri/2 as metric, because
arg( custom character P(r|v))=arg(d.sub.E.sup.2(x,r))(6)

(62) However, we have
d.sub.E.sup.2(x,r)=.sub.t=1.sup.nx.sub.t.sup.22 .sub.t=1.sup.nx.sub.tr.sub.t+.sub.t=1.sup.nr.sub.t.sup.2(7)

(63) Let {tilde over (r)}.sub.t=sgn(r.sub.t) denote the sign, i.e., the hard decision, of r.sub.t. Using
.sub.t=1.sup.nx.sub.tr.sub.t=.sub.t=1.sup.n|r.sub.t|2 .sub.t: x.sub.t.sub.r.sub.t|r.sub.t|(8)

(64) one obtains
d.sub.E.sup.2(x,r)=n+4 .sub.t: x.sub.t.sub.r.sub.t|r.sub.t|+2 .sub.t=1.sup.n|r.sub.t|+.sub.t=1.sup.nr.sub.t.sup.2(9)

(65) Note that .sub.t: x.sub.t.sub.r.sub.t|r.sub.t| is the only term in (9) that depends on x. Consequently, instead of minimizing the quadratic Euclidean distance one may also minimize .sub.t: x.sub.t.sub.r.sub.t|r.sub.t|. Note that .sub.t: x.sub.t.sub.r.sub.t|r.sub.t|=0 if the vector {tilde over (r)}=({tilde over (r)}.sub.1, . . . , {tilde over (r)}.sub.n) corresponds to a valid codeword. Hence, in this case, {tilde over (r)} is the maximum-likelihood estimate.

(66) (b) Now we consider list-of-two decoding. In order to enable a trade-off between performance and complexity, we introduce a threshold for the metric of the estimated codeword as exemplified in Algorithm 3 presented below.

(67) TABLE-US-00006 Algorithm 3: Sequential list-of-two decoding. Data:received word r, threshold Result: estimated codeword {circumflex over (v)} If {tilde over (r)} corresponds to a valid codeword, then return the codeword {circumflex over (v)} corresponding to {tilde over (r)}; else calculate a first estimate v.sub.1using either Algorithm 1 or Algorithm 2; If M(r.sub.t|v.sub.t) then Return the codeword {circumflex over (v)} = v.sub.1; else remove v.sub.1 from the stack; calculate a second estimate v.sub.2 using either Algorithm 1 or Algorithm 2; If M (r, v.sub.1) M (r, v.sub.2) then Return {circumflex over (v)} = v.sub.1; else Return {circumflex over (v)} = v.sub.2; end end end

(68) In Algorithm 3, Algorithm 1 is applied to decode the inner codes at the first level, i.e. the codewords of the code B.sup.(0), whereas Algorithm 2 is applied for the lower levels. FIG. 4 presents the performance of Algorithm 1 and Algorithm 3 with respect to the residual word error rate (WER) for transmission over the AWGN channel. The code is a one error correcting binary BCH code of length n=60. This code is later-on used as inner code in the first level of the GC code. The decoding performance and the number of decoding iterations depend on the threshold . FIG. 5 presents a comparison with respect to the number of iterations, where we have used two different threshold values denoted by .sub.1 and .sub.2, respectively. The values of .sub.2 were obtained by computer search in order to minimize the word error rate for a given signal to noise ratio. The values of .sub.1 were chosen to demonstrate that Algorithm 3 can reduce the word error rate compared with Algorithm 1 with a similar complexity.

III. GC Decoding and Decoding Error Probability

(69) The decoder processes level by level starting with i=0, taking original data received from a channel, such as a flash memory device, as input, wherein the original data is arranged in a data matrix that is structured as illustrated in FIG. 1. FIG. 6 depicts the decoding steps. Let i be the index of the current level. First the columns of the matrix are decoded with respect to B.sup.(i) and the information bits have to be inferred (re-imaged) in order to retrieve the code symbols a.sub.i,j of A.sup.(i) where j is the column index. If all symbols of the code A.sup.(i) are inferred the outer code (e.g. RS code) can be decoded. At this point a partial decoding result .sub.i is available. Finally, this result has to be re-encoded using B.sup.(i). The estimated codewords of the inner code B.sup.(i) are subtracted from the matrix before the next level can be decoded.

(70) The detailed encoding and hard input decoding process is described in [36]. In the first level i=0 the soft input decoding according to Algorithm 1 is used. Starting with the second level, the structure of the nested-BCH codes can be exploited and Algorithm 2 be used, where the code at level i1 can be used as supercode of the code of level i. For the implementation, the number of decoding iterations for each inner code may be limited. If the number of iterations exceeds a threshold a decoding failure is declared. For the outer (e.g. RS) codes error and erasure decoding is employed [37], where the decoding failures of the inner codes are regarded as erased symbols of the outer (e.g. RS) code.

A. Probability of a Decoding Error

(71) In the following, an analysis of the probability of a decoding error for the GC decoder is presented followed by an example that illustrates the performance of the proposed decoding procedure.

(72) The performance of the soft input decoding of the inner codes can be determined using Monte Carlo simulation. Let P.sub.b,j be the error probability for the decoding of the inner code B(j). Furthermore, let P.sub.e,j be the corresponding probability of a decoder failure. The probability of a decoding error is bound with the multi-stage decoding algorithm.

(73) Let T.sub.j=n.sub.ak.sub.a,j be the number of redundancy symbols for the outer RS code A(j) at the j-th level. The probability P.sub.a,j of a decoding error with error and erasure decoding at the j-th level can be computed as follows [37]:

(74) $\begin{matrix} P_{a, j} = {.Math.}_{q = 1}^{T_{j}} {.Math.}_{t = [\frac{T_{j} - q}{2}] + 1}^{n_{a} - q} P_{q} (\begin{matrix} n_{a} - q \\ t \end{matrix}) {P_{b, j}^{t} (1 - P_{b, j})}^{n_{a} - q - t} + {.Math.}_{q = T_{j} + 1}^{n_{a}} P_{q} & (10) \end{matrix}$
where P.sub.q is the probability of q erasures.

(75) 0 $\begin{matrix} P_{q} = (\begin{matrix} n_{a} \\ t \end{matrix}) {P_{e, j}^{q} (1 - P_{e, j})}^{n_{a} - q} & (11) \end{matrix}$

(76) Using the union bound, the block error rate P.sub.GC for the GC code, i.e. the likelihood of the event that at least one level is in error, can be estimated

(77) $\begin{matrix} P_{e} {.Math.}_{i = 0}^{L - 1} P_{a, i} & (12) \end{matrix}$

Example 6

(78) Consider the code from Example 1. This code has a code rate R=0.806 and was designed to guarantee P.sub.e10.sup.16 according to (12) for E.sub.B/N.sub.04.7 dB, where soft input decoding is used in the first three levels and hard input decoding in the remaining levels.

B. Comparison Error Correction Performance

(79) We compare the error correction performance of the GC code in different decoding modes with the performance of long BCH codes with hard input decoding. As performance measure, we use the code rate that is required to guarantee for a given signal to noise ratio an overall word error rate less than 10.sup.10 or 10.sup.16, respectively. All codes are constructed similar to the code presented in Example 1. In particular, the inner codes are chosen according to Table I. Whereas the error correcting capability of the outer codes are adapted to obtain the highest possible code rate for a given signal to noise ratio. Note that in this example, the overall code rate of the GC code is at most R=0.9, because of the choice of the inner code.

(80) FIG. 7 depicts the code rate versus the signal to noise ratio for P.sub.e=10.sup.10, whereas the results for P.sub.e=10.sup.16 are presented in FIG. 8. The GC code with soft input decoding outperforms the GC code with hard input decoding for all error probabilities and the BCH code for code rates below 0.88. The soft input decoding was simulated with a 3-bit quantization. The three curves with soft input decoding use different decoding strategies, where the soft input decoding is applied only to the first level, first and the second level, or levels 0 to 2, respectively. The soft input decoding improves the performance by up to 1.3 dB. For instance, the GC code with code rate R=0.8 achieves a block error rate less than 10.sup.16 at a signal to noise ratio of E.sub.b/N.sub.0=4.7 dB which is only 1 dB from the channel capacity of the quantized AWGN channel. For P.sub.e=10.sup.10, the code rate R=0.8 is sufficient for a signal to noise ratio (SNR) of E.sub.b/N.sub.04.6 dB. Note that soft input decoding of the first and second level is sufficient for all SNR values above E.sub.b/N.sub.0=5.5 dB.

IV. Memory System and Architecture of Decoding Device

(81) This section describes an exemplary memory system comprising a decoding device adapted to perform at least one of the decoding methods discussed above in sections II.B and II.C and a related decoder architecture for a GC soft input decoder like the one used in said memory system.

(82) FIG. 9 shows an exemplary memory system 1 comprising a memory controller 2 and a memory device 3, which may particularly be a flash memory device, e.g. of the NAND type. The memory system 1 is connected to a host 4, such as a computer to which the memory system 1 pertains, via a set of address lines A1, a set of data lines D1 and set of control lines C1. The memory controller 2 comprises a processing unit 2a and an internal memory 2b, typically of the embedded type, and is connected to the memory 3 via an address bus A2, a data bus D2, and a control bus C2. Accordingly, host 4 has indirect read and/or write access to the memory 3 via its connections A1, D1 and C1 to the memory controller 2, which in turn can directly access the memory 3 via the buses A2, D2 and C2. Each of the set of lines respectively buses A1, D1, C1, A2, D2 and C2 may be implemented by one or more individual communication lines. Bus A2 may also be absent.

(83) The memory controller 2 is also configured as a coding device and adapted to perform the decoding methods of the present invention, particularly as described above with reference to FIGS. 1 to 8. Thus, memory controller 2 is enabled to decode data read from memory device 3. Typically, memory controller 2 will additionally be adapted to perform a respective encoding for generating encoded data to be written to memory device 3. To that purpose, the memory controller 2 may comprise a computer program residing in its internal memory 2b which is configured to perform one or more of these decoding (and encoding) methods when executed on the processing unit 2a of the memory controller 2. Alternatively, the program may for example reside, in whole or in part, in memory device 3 or in an additional program memory (not shown) or may even be implemented in whole or part by a hard-wired circuit.

(84) Next, we discuss a preferred integration of the stack algorithm as inner decoder into the implementation of the GC decoder presented in [36]. Then the stack algorithm implementation for supercode decoding with its subsystems is presented and discussed. The original hard input GC decoder implementation in [36] uses algebraic syndrome decoding. In this implementation, the first levels of B can decode t.sub.b,0=1 and t.sub.b,1=2 errors. Thus high error correction capabilities of the outer codes A.sup.(0) and A.sup.(1) are required. This leads to lower code rates and a high decoding complexity of those outer codes. On the other hand, the soft decoding complexity of the column codes increases significantly with each code level. Hence soft decoding is of interest for the lower levels. Subsequently the algebraic decoding logic for the column code remains in the implementation. Therefore, it is possible to check whether the syndrome is zero. In this case, the codeword can be assumed to be correct, i.e., neither algebraic decoding nor sequential decoding result in a different codeword.

A. Decoding Logic

(85) A brief overview of an exemplary decoding system according to a preferred embodiment, which may particularly be implemented in a single decoding device, is depicted in FIG. 10. The system consists of a word array 5 of size n.sub.b and a desired width which stores the q-ary word. Furthermore, a demultiplexer 6 selects the currently processed bit position depending on the top path of the stack and delivers this value to a metric calculator 7. Based on the received codeword symbol r.sub.i and the previous metric M(r.sub.t1|v.sub.t1) the metric module 7 returns M(r.sub.t|v.sub.t) to a priority queue block 8, which is used to implement a stack for sequential stack decoding. To represent the supercode trellis asynchronous ROM is used, wherein preferably a separate ROM 9a, 9b is used for each trellis. For each supercode, each word of the respective ROM 9a, 9b represents a trellis node .sub.t,i of the respective trellis. The data consists of two pointers for the successor nodes v.sub.t+1=0 and v.sub.t+1=1. Depending on the top entry of the priority queue 8 the desired codeword symbol is selected and the next branches for the actual nodes .sub.t,1 and .sub.t,2 are loaded from the trellis ROM 9a, respectively 9b. The priority queue 8 unloads the top entry and loads the new paths in a single clock cycle.

(86) Each entry of the priority queue 8 contains several elements. The first element is the metric value. The path in the trellis, the length of the path, and a pointer to the current node are stored. All entries have to be ordered by the metric values such that the top entry has the highest value. The process of the priority queue 8 starts with its initialization. The starting node, its initial metric value and the path length are set. Each update cycle begins with the load phase in which the next node pointers are loaded from the trellis ROM 9a, 9b. Simultaneously the next codeword symbol is loaded based on the path length index. The next metric value can be determined based on the code symbol and the available branches. With binary codes, there exists at least one possible branch and at most two branches. The resulting branches are pre-sorted using combinatorial logic. In the following these two entries are called the major and the minor entries, where the major entry has the better metric value.

(87) All priority queue elements are successively ordered in a chain. FIG. 11 shows a possible implementation of a priority queue element. Each element can exchange its date with its previous or next neighbor. Furthermore, each element can decide whether it keeps its own data, take the data from its neighbor, load the new major data or the new minor data. In each element, the metric value is compared with the new value. The result of this comparison is signaled to its predecessor and successor elements. If the signal of a predecessor is false and the major metric value comparator gives a positive signal, the new major value will be stored. Likewise, if an element receives a false signal from its successor and the minor metric value comparator signals a new metric value that is less than the current value, the new minor data is stored. In the case that an element receives a signal from its neighbors, space for the new data has to be created by shifting all entries to next element. There exist two special cases that have to be taken into account. The first special case occurs, if a node has only a single outgoing branch. In this case, the shifting of elements has to be prevented by signaling. The second special case occurs, if the new major and the new minor elements are designated to be inserted into the same entry register. This case can be detected and preventing by passing this value to the next element.

(88) The algorithm terminates, if the maximum possible path length is reached. The stored path in the top element is the decoded codeword. In a practical implementation, an iteration counter may be used, that terminates after a determined maximum number of iterations. This abort can be used to mark this decoded GCC column as an erasure symbol for the outer (RS) code. In order to decode supercodes (cf. Algorithm 2 or Algorithm 3 based on Algorithm 2), the following extensions have to be implemented. The metric calculation has to take all trellis branches of each supercode into account. Furthermore, all node pointers have to be stored in the priority queue elements. Preferably, for each supercode a distinct ROM, particularly a different ROM device, is used, which represents its trellis.

B. Area Comparison

(89) This section describes an exemplary FPGA implementation of the proposed soft input decoder according to a preferred embodiment and compares it with the hard input decoder presented in [36]. The hard input decoder uses algebraic decoding. It consists of the syndrome calculation, the Berlekamp-Massey algorithm (BMA), and the Chien search module. The soft input decoder is implemented as proposed in Section II-B above. It has two limitations. First, the length of the priority queue is limited to 64 elements. Furthermore, the accuracy of the metric calculation is limited to 16 bits and a 3-bit quantization is used for the input symbols.

(90) The stack algorithm has a variable execution time depending on the error pattern. This algorithm needs at least 61 cycles to traverse the entire trellis, if no error occurred. This case can be omitted by checking whether the syndrome of a column word is zero. If no error is detected, the soft decoding can be avoided and thus only a single cycle is needed. FIG. 12 compares the average number of cycles needed for the stack algorithm. It shows the dependency between the channel bit error rate and the computational complexity, i.e., fewer errors lead to fewer decoding cycles. Note that the algebraic hard-input decoder needs four cycles for the first and six cycles for the second level.

(91) Next, a FPGA synthesis result for the stack algorithm is presented. The synthesis was performed with a Xilinx Vivado and a Virtex-7 target device. Table II shows the number of slices and look-up tables (LUT) of the hard input and the soft input decoder with 3-bit quantization. From these results, we observe that the number of logic elements required for the stack algorithm is about 80% of the number of logic gates required for the GC hard input decoder.

(92) TABLE-US-00007 TABLE II results of FPGA analysis Module LUT Slices RS module (t = 78) Syndrome 1 701 1 395 BMA 21 701 6 662 Forney alg. 1 046 729 Chien search 854 712 BCH Module (n = 60, t = 8) Syndrome 184 46 BMA 2 006 732 Chien search 1 557 240 reimage 148 336 TOTAL 29 197 10 852 Stack algorithm 23 896 9 885

(93) While above at least one exemplary embodiment of the present invention has been described, it has to be noted that a great number of variation thereto exists. Furthermore, it is appreciated that the described exemplary embodiments only illustrate non-limiting examples of how the present invention can be implemented and that it is not intended to limit the scope, the application or the configuration of the herein-described apparatus' and methods. Rather, the preceding description will provide the person skilled in the art with constructions for implementing at least one exemplary embodiment of the invention, wherein it has to be understood that various changes of functionality and the arrangement of the elements of the exemplary embodiment can be made, without deviating from the subject-matter defined by the appended claims and their legal equivalents.

REFERENCES

(94) [1] A. Fahrner, H. Griesser, R. Klarer, and V. Zyablov, Low-complexity GEL codes for digital magnetic storage systems, IEEE Transactions on Magnetics, vol. 40, no. 4, pp. 3093-3095, July 2004. [2] J. Freudenberger, U. Kaiser, and J. Spinner, Concatenated code constructions for error correction in non-volatile memories, in Int. Symposium on Signals, Systems, and Electronics (ISSSE), Potsdam, October 2012, pp. 1-6. [3] J. Freudenberger, J. Spinner, and S. Shavgulidze, Generalized concatenated codes for correcting two-dimensional clusters of errors and independent errors, in Int. Conference on Communication and Signal Processing (CSP), Castelldefels-Barcelona, February 2014, pp. 1-5. [4] I. Dumer, Concatenated codes and their multilevel generalizations. in Handbook of Coding Theory, Vol. II, Elsevier, Amsterdam, 1998. [5] M. Bossert, Channel coding for telecommunications. Wiley, 1999. [6] V. Zyablov, S. Shavgulidze, and M. Bossert, An introduction to generalized concatenated codes, European Transactions on Telecommunications, vol. 10, no. 6, pp. 609-622,1999. [7] J. Spinner and J. Freudenberger, Decoder architecture for generalized concatenated codes, IET Circuits, Devices & Systems, vol. 9, no. 5, pp. 328-335, 2015. [8] A. Neubauer, J. Freudenberger, and V. Khn, Coding Theory: Algorithms, Architectures and Applications. John Wiley & Sons, 2007. [9] D. Chase, Class of algorithms for decoding block codes with channel measurement information, IEEE Transactions on Information Theory, pp. 170-182, 1972. [10] C. Argon, S. McLaughlin, and T. Souvignier, Iterative application of the Chase algorithm on Reed-Solomon product codes, Proceedings IEEE ICC 2001, pp. 320-324, 2001. [11] M. Fossorier and S. Lin, Soft-decision decoding of linear block codes based on ordered statistics, IEEE Trans. Inform. Theory, vol. IT-41, pp. 1379-1396, September 1995. [12] B. Dorsch, A decoding algorithm for binary block codes and J-ary output channels, Information Theory, IEEE Transactions on, vol. 20, no. 3, pp. 391-394, May 1974. [13] M. Tomlinson, C. Tjhai, and M. Ambroze, Extending the Dorsch decoder towards achieving maximum-likelihood decoding for linear codes, IET Communications, vol. 1, no. 3, pp. 479-488, June 2007. [14] A. Gortan, R. Jasinski, W. Godoy, and V. Pedroni, Achieving near-MLD performance with soft information-set decoders implemented in FPGAs, in 2010 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), December 2010, pp. 312-315. [15] L. Aguado and P. Farrell, On hybrid stack decoding algorithms for block codes, Information Theory, IEEE Transactions on, vol. 44, no. 1, pp. 398-409, Janurary 1998. [16] J. Wolf, Efficient maximum likelihood decoding of linear block codes using a trellis, IEEE Transactions on Information Theory, vol. 24, no. 1, pp. 76-80, January 1978. [17] J. Freudenberger, T. Wegmann, and J. Spinner, An efficient hardware implementation of sequential stack decoding of binary block codes, in IEEE 5th International Conference on Consumer ElectronicsBerlin (ICCE-Berlin), September 2015, pp. 135-138. [18] J. Freudenberger and M. Bossert, Maximum-likelihood decoding based on supercodes, in Proc. 4th. International ITG Conference Source and Channel Coding, Erlangen, Germany, January 2004, pp. 185-190. [19] J. Freudenberger, Bounded Distance Decoding and Decision Feedback. Dsseldorf, Germany: VDI Verlag, 2004. [20] R. Micheloni, A. Marelli, and R. Ravasio, Error Correction Codes for Non-Volatile Memories. Springer, 2008. [21] X. Zhang and K. K. Parhi, High-speed architectures for parallel long BCH encoders, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 7, pp. 872-877, 2005. [22] F. Sun, S. Devarajan, K. Rose, and T. Zhang, Design of on-chip error correction systems for multilevel NOR and NAND flash memories, IET Circuits, Devices Systems, vol. 1, no. 3, pp. 241-249, June 2007. [23] J. Freudenberger and J. Spinner, A configurable Bose-Chaudhuri-Hocquenghem codec architecture for flash controller applications, Journal of Circuits, Systems, and Computers, vol. 23, no. 2, pp. 1-15, Feburary 2014. [24] S. Cho, D. Kim, J. Choi, and J. Ha, Block-wise concatenated BCH codes for NAND flash memories, IEEE Transactions on Communications, vol. 62, no. 4, pp. 1164-1177, April 2014. [25] D. Kim and J. Ha, Quasi-primitive block-wise concatenated BCH codes for NAND flash memories, in IEEE Information Theory Workshop (ITW), November 2014, pp. 611-615. [26] G. Dong, N. Xie, and T. Zhang, On the use of soft-decision error-correction codes in NAND Flash memory, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 2, pp. 429-439, February 2011. [27] K. Zhao, W. Zhao, H. Sun, X. Zhang, N. Zheng, and T. Zhang, LDPC-in-SSD: Making advanced error correction codes work effectively in solid state drives, in Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST 13). San Jose, Calif.: USENIX, 2013, pp. 243-256. [Online]. Available: https://www.usenix.org/conference/fast13/technical-sessions/presentation/zhao [28] J. Wang, K. Vakilinia, T.-Y. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel, Enhanced precision through multiple reads for ldpc decoding in flash memories, IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, pp. 880-891, May 2014. [29] W. Lin, S.-W. Yen, Y.-C. Hsu, Y.-H. Lin, L.-C. Liang, T.-C. Wang, P.-Y. Shih, K.-H. Lai, K.-Y. Cheng, and C.-Y. Chang, A low power and ultrahigh reliability LDPC error correction engine with digital signal processing for embedded NAND flash controller in 40 nm corns, in Symposium on VLSI Circuits Digest of Technical Papers, June 2014, pp. 1-2. [30] K. Haymaker and C. A. Kelley, Structured bit-interleaved LDPC codes for MLC flash memory, IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, pp. 870-879, May 2014. [31] Solid-State Drive (SSD) Requirements and Endurance Test Method (JESD218). JEDEC SOLID STATE TECHNOLOGY ASSOCIATION, 2010. [32] S. Li and T. Zhang, Improving multi-level NAND flash memory storage reliability using concatenated BCH-TCM coding, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 10, pp. 1412-1420, October 2010. [33] S. Qi, D. Feng, and J. Liu, Optimal voltage signal sensing of nand flash memmory for ldpc code, in Signal Processing Systems (SiPS), 2014 IEEE Workshop on, October 2014, pp. 1-6. [34] J. Massey, Variable-length codes and the Fano metric, IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 196-198, 1972.35] V. Sorokine and F. Kschischang, A sequential decoder for linear block codes with a variable bias-term metric, IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 410-416, 1998. [36] J. Spinner and J. Freudenberger, Design and implementation of a pipelined decoder for generalized concatenated codes, in Proceedings of 27th Symposium on Integrated Circuits and Systems Design (SBCCI), Aracaju, Brazil, September 2014, pp. 1-16. [37] L. Weiburn and J. Cavers, Improved performance of Reed-Solomon decoding with the use of pilot signals for erasure generation, in Vehicular Technology Conference, 1998. VTC 98. 48th IEEE, vol. 3, May 1998, pp. 1930-1934 vol. 3. [38] C. Yang, Y. Emre, and C. Chakrabarti, Product code schemes for error correction in MLC NAND flash memories, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 12, pp. 2302-2314, December 2012. [39] J. Oh, J. Ha, J. Moon, and G. Ungerboeck, Rs-enhanced TCM for multilevel flash memories, IEEE Transactions on Communications, vol. 61, no. 5, pp. 1674-1683, May 2013. [40] E. Yaakobi, J. Ma, L. Grupp, P. Siegel, S. Swanson, and J. Wolf, Error characterization and coding schemes for flash memories, in IEEE GLOBECOM Workshops, December 2010, pp. 1856-1860. [41] E. Yaakobi, L. Grupp, P. Siegel, S. Swanson, and J. Wolf, Characterization and error-correcting codes for TLC flash memories, in Computing, Networking and Communications (ICNC), 2012 International Conference on, January 2012, pp. 486-491. [42] R. Gabrys, E. Yaakobi, and L. Dolecek, Graded bit-error-correcting codes with applications to flash memory, IEEE Transactions on Information Theory, vol. 59, no. 4, pp. 2315-2327, April 2013. [43] R. Gabrys, F. Sala, and L. Dolecek, Coding for unreliable flash memory cells, IEEE Communications Letters, vol. 18, no. 9, pp. 1491-1494, September 2014.

(95) The following is a summary list of reference numerals and the corresponding structure used in the above description of the invention: 1 memory system 2 memory controller, including coding device 2a processing unit 2b embedded memory of memory controller 3 nonvolatile memory (NVM), particularly flash memory 4 host 5 word array 6 demultiplexer 7 metric calculator 8 priority queue block, implementing stack 9a,b read-only memory, ROM A address lines of ROMs D data lines of ROMs A1 address line(s) to/from host D1 data line(s) to/from host C1 control line(s) to/from host A2 address bus of NVM, e.g. flash memory D2 data bus of NVM, e.g. flash memory C2 control bus of NVM, e.g. flash memory

Method and decoder for soft input decoding of generalized concatenated codes

Inventors

Cpc classification

Classification Explorer

H03M13/2927

ELECTRICITY

Classification Explorer

H03M13/618

ELECTRICITY

Classification Explorer

H03M13/1515

ELECTRICITY

Classification Explorer

H03M13/293

ELECTRICITY

Classification Explorer

H03M13/3944

ELECTRICITY

Classification Explorer

H03M13/2906

ELECTRICITY

Classification Explorer

H03M13/2909

ELECTRICITY

Classification Explorer

H03M13/1191

ELECTRICITY

Classification Explorer

G11B2020/1836

PHYSICS

Classification Explorer

H03M13/2993

ELECTRICITY

Classification Explorer

H03M13/356

ELECTRICITY

Classification Explorer

H03M13/152

ELECTRICITY

Classification Explorer

H03M13/458

ELECTRICITY

Classification Explorer

G11B2020/1859

PHYSICS

Classification Explorer

H03M13/2948

ELECTRICITY

Classification Explorer

H03M13/29

ELECTRICITY

Classification Explorer

H03M13/253

ELECTRICITY

International classification

Classification Explorer

H03M13/00

ELECTRICITY

Classification Explorer

H03M13/29

ELECTRICITY

Classification Explorer

H03M13/11

ELECTRICITY

Classification Explorer

H03M13/15

ELECTRICITY

Classification Explorer

H03M13/39

ELECTRICITY

Classification Explorer

H03M13/35

ELECTRICITY

Classification Explorer

H03M13/45

ELECTRICITY

Classification Explorer

H03M13/25

ELECTRICITY

Abstract