Reed-Solomon decoders and decoding methods
10439643 · 2019-10-08
Assignee
Inventors
- Shayan Srinivasa Garani (Karnataka, IN)
- Thatimattala S V Satyannarayana (Karnataka, IN)
- Yalamaddi Vamshi Krishna (Karnataka, IN)
CPC classification
H03M13/1111
ELECTRICITY
H03M13/3707
ELECTRICITY
H03M13/45
ELECTRICITY
H03M13/453
ELECTRICITY
H03M13/617
ELECTRICITY
H03M13/1525
ELECTRICITY
International classification
H03M13/15
ELECTRICITY
H03M13/00
ELECTRICITY
H03M13/35
ELECTRICITY
H03M13/37
ELECTRICITY
Abstract
Embodiments of the present disclosure provide a high speed, low latency, rate configurable, soft decision and hard decision based pipelined Reed-Solomon (RS) decoder architecture suitable for optical communication and storage. The proposed RS decoder is a configurable RS decoder that monitors the channel and adjusts code parameters based on channel capacity. The proposed RS decoder includes interpolation and factorization free Low-Complexity-Chase (LCC) decoding to implement the soft-decision decoder (SDD). The proposed RS decoder generates test vectors and feeds them to a pipelined 2-stage hard decision decoder (HDD). The proposed RS decoder architecture computes the error locator polynomial in exactly 2t clock cycles without parallelism, supports high throughput, and further computes the error evaluator polynomial in exactly t cycles. The present disclosure provides a 2-stage pipelined decoder that operates at the least possible latency with a reduced delay buffer size.
Claims
1. A Reed-Solomon (RS) decoder comprising: a delay buffer configured to buffer a plurality of test vectors, wherein size of the delay buffer depends on latency of the RS decoder; a syndrome computation (SC) module configured to process the plurality of test vectors and generate syndromes, wherein syndrome computation stage of the SC module takes 2t cycles to compute 2t syndromes; a key equation solver (KES) configured to compute error locator polynomials; and a Chien search and error magnitude computation (CSEMC) module configured to find error location and corresponding error magnitude; wherein the SC module, the KES and the CSEMC module are arranged in a 2-stage pipeline manner thereby reducing the size of the delay buffer.
2. The decoder of claim 1, wherein said RS decoder is a soft decision decoder (SDD).
3. The decoder of claim 2, wherein the SDD is configured to use interpolation and factorization free Low-Complexity-Chase (LCC) decoding.
4. The decoder of claim 1, wherein said RS decoder is a 2-stage hard decision decoder (HDD).
5. The decoder of claim 4, wherein the HDD is a t-symbol correcting decoder based on Berlekamp-Massey (BM) algorithm that takes 2t iterations to obtain error locator polynomials.
6. The decoder of claim 1, wherein said RS decoder is configured to monitor channel capacity, and adjust code parameters based on the monitored channel capacity.
7. The decoder of claim 1, wherein the KES computes error locator polynomials using Berlekamp-Massey (BM) algorithm or Modified Euclidean (ME) algorithm.
8. The decoder of claim 1, wherein a test vector generator module generates 2.sup.η test vectors that are provided as input to the SC module.
9. The decoder of claim 1, wherein the error locator polynomials are obtained in 2t clock cycles without any parallelism to achieve high throughput of the RS decoder.
10. The decoder of claim 1, wherein the CSEMC module comprises J-Parallel Chien search architecture.
11. A decoding method in a Reed-Solomon (RS) decoder, the method comprising steps of: buffering, by a delay buffer of the RS decoder, test vectors, wherein size of the delay buffer depends on latency of the RS decoder; calculating, by a syndrome computation (SC) module of the RS decoder that is in communication with the delay buffer, syndromes S.sub.j by processing the test vectors; computing, by a key equation solver (KES) of the RS decoder that is in communication with the SC module, error locator polynomials Λ.sub.i from the calculated syndromes S.sub.j; and computing, by a Chien search and error magnitude computation (CSEMC) module of the RS decoder in communication with the KES, error location X.sub.i from the computed error locator polynomial Λ.sub.i and corresponding error magnitude Y.sub.i, wherein the SC module, the KES and the CSEMC module are arranged in a 2-stage pipeline manner thereby reducing size of the delay buffer.
12. The method of claim 11, wherein the error locator polynomials Λ.sub.i are calculated from the calculated syndromes S.sub.j using Berlekamp-Massey algorithm.
13. The method of claim 11, wherein the error magnitude Y.sub.i is calculated, by the CSEMC module, using Forney's formula.
14. The method of claim 11, wherein the method further comprises the step of generating 2.sup.η test vectors by a test vector generation module, wherein each test vector of the 2.sup.η test vectors is passed to the SC module.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
DETAILED DESCRIPTION
(16) The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
(17) Each of the appended claims defines a separate invention, which for infringement purposes is recognized as including equivalents to the various elements or limitations specified in the claims. Depending on the context, all references below to the invention may in some cases refer to certain specific embodiments only. In other cases it will be recognized that references to the invention will refer to subject matter recited in one or more, but not necessarily all, of the claims.
(18) Various terms as used herein are shown below. To the extent a term used in a claim is not defined below, it should be given the broadest definition persons in the pertinent art have given that term as reflected in printed publications and issued patents at the time of filing.
(19) Aspects of the present disclosure relate to the field of error correcting codes. More particularly, the present disclosure relates to a decoder and decoding method for high speed storage and high speed communication systems.
(20) Aspects of the present disclosure provide Reed-Solomon decoder architecture, and methods thereof. In an embodiment, a high-speed low latency rate configurable soft decision and hard decision based 2-stage pipelined Reed-Solomon (RS) decoder architecture is provided.
(21) Embodiments of the present disclosure provide a Reed-Solomon (RS) decoder that includes a syndrome computation (SC) module configured to process test vectors and generate syndromes, wherein syndrome computation stage of the SC module takes 2t cycles to compute 2t syndromes, a key equation solver (KES) configured to compute error locator polynomials, and a Chien search and error magnitude computation (CSEMC) module configured to find error location and corresponding error magnitude, wherein the SC module, the KES and the CSEMC module are arranged in a 2-stage pipeline manner.
(22) In an aspect, the RS decoder can be configured as a soft decision decoder (SDD), wherein the SDD is configured to use interpolation and factorization free Low-Complexity-Chase (LCC) decoding.
(23) In an aspect, the RS decoder can be configured as a 2-stage hard decision decoder (HDD), wherein the HDD is a t-symbol correcting decoder based on Berlekamp-Massey (BM) algorithm that takes 2t iterations to obtain error locator polynomials.
(24) In an aspect, the RS decoder is configured to monitor channel capacity, and adjust code parameters based on the monitored channel capacity.
(25) In an aspect, the KES computes error locator polynomials using Berlekamp-Massey (BM) algorithm or Modified Euclidean (ME) algorithm.
(26) In an aspect, a test vector generator module generates 2.sup.η test vectors that are provided as input to the SC module. In an aspect, a delay buffer is configured to buffer the test vectors. In an aspect, the SC module is merged with the KES to achieve low latency and to reduce the size of the delay buffer.
(27) In an aspect, the error locator polynomials are obtained in 2t clock cycles without any parallelism to achieve high throughput of the RS decoder.
(28) In an aspect, the CSEMC module comprises J-Parallel Chien search architecture.
(29) In an aspect, the present disclosure provides a decoding method for a Reed-Solomon (RS) decoder, the method including the steps of calculating, by a syndrome computation (SC) module, syndromes S.sub.j by processing test vectors; computing, by a key equation solver (KES), error locator polynomials Λ.sub.i from the calculated syndromes S.sub.j; and computing, by a Chien search and error magnitude computation (CSEMC) module, error location X.sub.i from the computed error locator polynomial Λ.sub.i and corresponding error magnitude Y.sub.i.
(30) In an aspect, the error locator polynomials Λ.sub.i are calculated from the calculated syndromes S.sub.j using the Berlekamp-Massey algorithm, and the error magnitude Y.sub.i is calculated using Forney's formula.
(31) In an aspect, the method further includes the step of generating 2.sup.η test vectors by a test vector generation module, wherein each test vector of the 2.sup.η test vectors is passed to the SC module.
(32) In an exemplary implementation, HDD and SDD can be rate configurable and can be configured for use in various channel conditions. The RS decoder can be configured to monitor channel condition and configure the HDD and SDD accordingly.
(33) Soft Decision Decoder (SDD)
(34) In an exemplary implementation, the Chase decoding algorithm can be used for implementing the proposed SDD, as it provides low complexity compared with other decoding techniques. As one may be aware, other techniques such as algebraic soft decision (ASD) decoding of RS codes provide higher coding gain than conventional HDD but involve high computational complexity. ASD decoding facilitates correction of errors beyond bounded distance decoding by using channel reliability information; among ASD methods, the Guruswami-Sudan (GS) and Koetter-Vardy (KV) algorithms give better performance at the expense of higher complexity. On the other hand, Chase decoding offers a low complexity solution with comparable performance.
(35) Low Complexity Chase Algorithm
(36) The SDD of the present disclosure can use a Chase decoder implemented together with an HDD in order to generate a set of test vectors, wherein the Chase decoder is easy to implement without compromising on performance parameters. The proposed SDD can correct codewords within a decoding radius
(37)
where d.sub.min is the minimum distance of the code. The low complexity chase (LCC) decoder can be configured to generate 2.sup.η test vectors based on symbol reliability information, where η symbols are selected as the least reliable out of n symbols, for which the hard decision or the second most reliable decision is employed. In order to create the test vectors, a ratio between the probability of the hard-decision symbol and the probability of the second-best decision can be established, wherein the ratio indicates how good the hard decision is. The desired probability ratio for the received message polynomial r(x) can be estimated using equation-1.
(38) Γ.sub.i=P(r.sub.i=r.sub.i.sub._.sub.HD)/P(r.sub.i=r.sub.i.sub._.sub.2HD)  Equation-1
where r.sub.i.sub._.sub.HD is the hard decision of the symbols, and r.sub.i.sub._.sub.2HD is the second most reliable decision. Corresponding to the points with the worst probability ratio (between the hard decision and the second-best decision), a set of 2.sup.η combinations called test vectors can be created by selecting either the hard-decision symbol (r.sub.i.sub._.sub.HD) or the second-best decision (r.sub.i.sub._.sub.2HD) at the less reliable points. The second-best decision can be obtained based on information of the message symbol probabilities.
(39) For the sake of implementation, a reasonable and simple method of generating the second-best decision and the test vectors can be used. For example, BPSK modulation over an AWGN channel can be used to generate symbol reliabilities and second most reliable decisions. Consider a codeword c=[c.sub.1c.sub.2 . . . c.sub.n] where c.sub.i=[c.sub.i1c.sub.i2 . . . c.sub.im] is an m-bit vector. c.sub.ij can be modulated to x.sub.ij with x.sub.ij=1-2c.sub.ij. Let r.sub.ij denote the real-valued channel observation corresponding to c.sub.ij, i.e., r.sub.ij=x.sub.ij+n.sub.ij, where n.sub.ij are Gaussian random noise samples.
(40) In an aspect, at the receiver, hard slicing is done and the received vector thus formed can be denoted as y.sub.i.sup.[HD]. In an aspect, a symbol decision y.sub.i.sup.[HD] is made on each transmitted symbol c.sub.i (from observations r.sub.i1, . . . r.sub.im). The channel provides the reliability of each bit received. Among the m bits in a symbol, the least reliable bit defines the worst case reliability of that symbol.
(41) Symbol reliability γ.sub.i of the i.sup.th symbol can be calculated using
(42) γ.sub.i=min.sub.1≤j≤m|r.sub.ij|
The value γ.sub.i indicates confidence in the symbol decision y.sub.i.sup.[HD]: the higher the value of γ.sub.i, the greater the reliability, and vice-versa. The second most reliable decision y.sub.i.sup.[2HD] is obtained from y.sub.i.sup.[HD] by complementing or flipping the bit that achieves the minimum in the above equation.
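As a concrete illustration of the reliability computation above, the following Python sketch derives the hard decision, the symbol reliability, and the second most reliable decision for one m-bit symbol from BPSK channel observations. The observation values are made up for the example and are not from the disclosure.

```python
# Sketch of symbol reliability extraction for one m-bit symbol (BPSK over AWGN).
# The observation values below are illustrative, not from the disclosure.

def symbol_decision(r):
    """r: list of m real-valued observations for one symbol (x = 1 - 2c)."""
    # Hard decision per bit: positive observation -> bit 0, negative -> bit 1.
    y_hd = [0 if rij > 0 else 1 for rij in r]
    # Bit reliability is |r_ij|; the least reliable bit defines the
    # worst-case (symbol) reliability gamma_i = min_j |r_ij|.
    mags = [abs(rij) for rij in r]
    gamma = min(mags)
    # Second most reliable decision: flip the bit achieving the minimum.
    k = mags.index(gamma)
    y_2hd = y_hd[:]
    y_2hd[k] ^= 1
    return y_hd, gamma, y_2hd

y_hd, gamma, y_2hd = symbol_decision([0.9, -0.2, 1.1, -0.7])
print(y_hd, gamma, y_2hd)   # [0, 1, 0, 1] 0.2 [0, 0, 0, 1]
```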
(43) In an exemplary implementation, the LCC algorithm can include the steps of sorting the n symbol reliabilities γ.sub.1, γ.sub.2, . . . γ.sub.n in increasing order, i.e. γ.sub.i1≤γ.sub.i2≤ . . . ≤γ.sub.in, and forming an index set I={i.sub.1, i.sub.2, . . . i.sub.η} denoting the η smallest reliability values. Further, the set of 2.sup.η test vectors is generated using the relations below,
(44) r.sub.i∈{y.sub.i.sup.[HD], y.sub.i.sup.[2HD]} for i∈I, and r.sub.i=y.sub.i.sup.[HD] for i∉I
(45) Further, each test vector is passed to the HDD stage serially, and a vector for which decoding failure does not result can be taken as the estimated codeword at the receiver. In an exemplary implementation, the HDD stage can be implemented based on the Berlekamp-Massey algorithm.
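The η-least-reliable selection and 2.sup.η test vector enumeration described above can be sketched as follows; the reliability values, symbol values and η are illustrative:

```python
# Sketch of LCC test vector generation: pick the eta least reliable symbol
# positions and enumerate all 2^eta hard/second-best combinations.
# All data values are illustrative.

def lcc_test_vectors(y_hd, y_2hd, gamma, eta):
    # Index set I: positions of the eta smallest reliabilities.
    I = sorted(range(len(gamma)), key=lambda i: gamma[i])[:eta]
    vectors = []
    for mask in range(2 ** eta):
        v = list(y_hd)
        for b, i in enumerate(I):
            if (mask >> b) & 1:        # choose the second-best decision here
                v[i] = y_2hd[i]
        vectors.append(v)
    return I, vectors

y_hd  = [5, 1, 7, 2, 9, 4]             # hard-decision symbols (illustrative)
y_2hd = [6, 3, 7, 0, 8, 4]             # second-best decisions (illustrative)
gamma = [0.9, 0.1, 0.5, 0.05, 0.7, 0.2]
I, tvs = lcc_test_vectors(y_hd, y_2hd, gamma, eta=3)
print(len(tvs))                        # 8; tvs[0] is the pure hard-decision vector
```

Each vector would then be passed serially to the HDD stage; the first vector that decodes without failure is taken as the codeword estimate.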
(46) Hard Decision Error Correction Procedure
(47) In an exemplary implementation, the RS decoder can perform error correction. The error correction procedure can be explained with the help of an example as below. For example, let a codeword c(x) be corrupted by an additive error e(x), resulting in r(x). Suppose ν denotes the number of errors. Then, e(x) has the following form
(48) e(x)=y.sub.j.sub.1x.sup.j.sup.1+y.sub.j.sub.2x.sup.j.sup.2+ . . . +y.sub.j.sub.νx.sup.j.sup.ν
where y.sub.j.sub.i is the error magnitude at error location j.sub.i. The syndromes can be calculated as
(49) S.sub.j=r(α.sup.j)=Σ.sub.i=1.sup.ν Y.sub.iX.sub.i.sup.j, j=1, 2, . . . , 2t  Equation-5
where Y.sub.i=y.sub.j.sub.i and X.sub.i=α.sup.j.sup.i. The aim is to solve the above 2t equations to get the pairs (X.sub.i, Y.sub.i). In an exemplary implementation, a polynomial known as the error locator polynomial Λ(x) can be defined as
(50) Λ(x)=(1−X.sub.1x)(1−X.sub.2x) . . . (1−X.sub.νx)=1+Λ.sub.1x+Λ.sub.2x.sup.2+ . . . +Λ.sub.νx.sup.ν
(51) In an embodiment, a method for decoding is provided. More particularly, an error correction method for decoding is described, wherein the method includes the steps of calculating the syndromes S.sub.j, calculating the error locator polynomial coefficients Λ.sub.i from S.sub.j, the error locations X.sub.i from Λ.sub.i, and the error magnitudes Y.sub.i. In an exemplary implementation, the decoded codeword polynomial c(x) can be obtained by adding to r(x) the error polynomial e(x) obtained from X.sub.i and Y.sub.i. Details of each step are given below.
(52) Step 1: Calculation of syndromes S.sub.j:
(53) Syndromes can be evaluated using equation-5 from message polynomial r(x).
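For illustration, the syndrome evaluation S.sub.j=r(α.sup.j) can be sketched in Python over GF(2.sup.8) with the primitive polynomial x.sup.8+x.sup.4+x.sup.3+x.sup.2+1 (0x11d). This field polynomial is an assumption (a common choice for RS(255,239); the disclosure does not state one). Syndromes of a valid codeword vanish, and corrupting one symbol makes them non-zero:

```python
# GF(2^8) log/antilog tables; primitive polynomial 0x11d is an assumption.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def poly_eval(p, v):              # p holds coefficients in ascending powers
    acc = 0
    for c in reversed(p):         # Horner's rule
        acc = gmul(acc, v) ^ c
    return acc

t = 2
# Generator g(x) = prod_{j=1..2t} (x - alpha^j); a (non-systematic) codeword
# c(x) = m(x) g(x) then satisfies S_j = c(alpha^j) = 0 for j = 1..2t.
g = [1]
for j in range(1, 2 * t + 1):
    root, ng = EXP[j], [0] * (len(g) + 1)
    for k, c in enumerate(g):
        ng[k] ^= gmul(c, root)    # constant-term contribution of (x + root)
        ng[k + 1] ^= c            # x * c contribution
    g = ng
m = [17, 3, 200, 90]              # arbitrary message coefficients
c = [0] * (len(m) + len(g) - 1)
for i, mi in enumerate(m):        # polynomial product m(x) * g(x)
    for k, gk in enumerate(g):
        c[i + k] ^= gmul(mi, gk)

syn = [poly_eval(c, EXP[j]) for j in range(1, 2 * t + 1)]
print(syn)                        # [0, 0, 0, 0] for the clean codeword
c[2] ^= 0x55                      # inject a single symbol error
syn_err = [poly_eval(c, EXP[j]) for j in range(1, 2 * t + 1)]
print(any(s != 0 for s in syn_err))   # True
```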
(54) Step 2: Calculation of the error locator polynomial from S.sub.j. In an exemplary implementation, the error locator polynomial can be calculated from S.sub.j using the Berlekamp-Massey algorithm. In an exemplary implementation, Λ(x) can be calculated iteratively in 2t steps. Let Λ.sup.(μ)(x) denote the error locator polynomial at the μ.sup.th step of the iteration. To find Λ(x) iteratively, a logical table (table-1) can be filled. Let l.sub.μ be the degree of Λ.sup.(μ)(x). When the μ.sup.th row of table-1 gets filled, the iterative steps find the (μ+1).sup.th row using the procedure shown below.
(55) d.sub.μ=S.sub.μ+1+Λ.sub.1.sup.(μ)S.sub.μ+Λ.sub.2.sup.(μ)S.sub.μ−1+ . . . +Λ.sub.l.sub.μ.sup.(μ)S.sub.μ+1−l.sub.μ
If d.sub.μ=0, then Λ.sup.(μ+1)(x)=Λ.sup.(μ)(x). If d.sub.μ≠0, another row ρ prior to the μ.sup.th row is searched, where d.sub.ρ≠0 and the number ρ−l.sub.ρ in the last column of table-1 has the largest value. Λ.sup.(μ+1)(x) and l.sub.μ+1 are then calculated using equation-7 and equation-8 respectively.
Λ.sup.(μ+1)(x)=Λ.sup.(μ)(x)−d.sub.μd.sub.ρ.sup.−1x.sup.(μ−ρ)Λ.sup.(ρ)(x)  Equation-7
l.sub.μ+1=max[l.sub.μ, l.sub.ρ+μ−ρ]  Equation-8
(56) TABLE-US-00001 TABLE 1
μ       Λ.sup.(μ)(x)     d.sub.μ    l.sub.μ    μ−l.sub.μ
−1      1                1          0          −1
0       1                S.sub.1    0          0
1       1+S.sub.1x       . . .      . . .      . . .
. . .
2t
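The iterative procedure above can be sketched in software as the classical Berlekamp-Massey recursion, run for exactly 2t iterations; the auxiliary polynomial B(x) plays the role of the saved row Λ.sup.(ρ)(x), with the x.sup.(μ−ρ) factor accumulated as a growing shift. The sketch assumes GF(2.sup.8) with primitive polynomial 0x11d (the disclosure does not fix a field polynomial), builds syndromes from two known errors, and checks that the resulting Λ(x) has the inverse error locators as roots:

```python
# GF(2^8) log/antilog tables; primitive polynomial 0x11d is an assumption.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def ginv(a):
    return EXP[255 - LOG[a]]

def padd(p, q):                       # polynomial addition = coefficient XOR
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) ^ (q[i] if i < len(q) else 0)
            for i in range(n)]

def peval(p, v):                      # Horner evaluation, ascending coefficients
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, v) ^ c
    return acc

def berlekamp_massey(S, t):
    lam, B = [1], [1]                 # Lambda(x) and auxiliary polynomial B(x)
    L, b, m = 0, 1, 1                 # degree, last nonzero discrepancy, shift
    for n in range(2 * t):            # exactly 2t iterations
        d = S[n]                      # discrepancy d_n
        for i in range(1, L + 1):
            if i < len(lam):
                d ^= gmul(lam[i], S[n - i])
        if d == 0:
            m += 1                    # no correction; shift of B grows by x
        else:
            update = [0] * m + [gmul(gmul(d, ginv(b)), c) for c in B]
            if 2 * L <= n:            # length change: save old Lambda as B
                lam, B, L, b, m = padd(lam, update), lam[:], n + 1 - L, d, 1
            else:
                lam, m = padd(lam, update), m + 1
    return lam

# Two test errors: locators X1 = alpha^3, X2 = alpha^10 (magnitudes made up).
t, X1, X2, Y1, Y2 = 2, EXP[3], EXP[10], 0x21, 0x0E
S = [gmul(Y1, EXP[3 * j % 255]) ^ gmul(Y2, EXP[10 * j % 255])
     for j in range(1, 2 * t + 1)]
lam = berlekamp_massey(S, t)
print(peval(lam, ginv(X1)), peval(lam, ginv(X2)))   # 0 0
```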
Step 3: Calculation of the error locations X.sub.i from the error locator polynomial. In an exemplary implementation, the error locations X.sub.i can be calculated from Λ(x) using Chien's search, wherein if α.sup.i is a root of the error locator polynomial Λ(x), it can be concluded that an error is present at location i.
Step 4: Calculation of the error magnitudes Y.sub.i. In an exemplary implementation, the error magnitude Y.sub.i can be calculated using Forney's formula given as Equation-9 below.
(57) Y.sub.i=Z(X.sub.i.sup.−1)/Λ′(X.sub.i.sup.−1)  Equation-9
(58) In an exemplary implementation, the decoded codeword polynomial c(x) can be obtained by adding to r(x) the error polynomial e(x) obtained from X.sub.i and Y.sub.i.
(59) c(x)=r(x)+e(x)
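Steps 3 and 4 can be sketched together: Chien search tests every field element as a candidate root of Λ(x), and Forney's formula recovers the magnitudes. The sketch again assumes GF(2.sup.8) with polynomial 0x11d, takes Λ(x) directly from two known error locators (as the KES stage would supply it), and uses the error evaluator Z(x)=S(x)Λ(x) mod x.sup.2t, a standard definition that the text above does not spell out:

```python
# GF(2^8) tables (primitive polynomial 0x11d assumed).
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]
def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]
def ginv(a):
    return EXP[255 - LOG[a]]
def peval(p, v):
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, v) ^ c
    return acc

t, n = 2, 255
errors = {3: 0x21, 10: 0x0E}          # position -> magnitude (illustrative)
# Syndromes S_j = sum_i Y_i X_i^j with X_i = alpha^position.
S = [0] * (2 * t)
for pos, mag in errors.items():
    for j in range(1, 2 * t + 1):
        S[j - 1] ^= gmul(mag, EXP[pos * j % 255])
# Lambda(x) = prod (1 - X_i x), as delivered by the KES stage.
lam = [1]
for pos in errors:
    lam = [a ^ b for a, b in
           zip(lam + [0], [0] + [gmul(EXP[pos], c) for c in lam])]
# Error evaluator Z(x) = S(x) Lambda(x) mod x^{2t}, S(x) = sum S_{j+1} x^j.
Z = [0] * (2 * t)
for i, si in enumerate(S):
    for k, lk in enumerate(lam):
        if i + k < 2 * t:
            Z[i + k] ^= gmul(si, lk)
# Formal derivative of Lambda in characteristic 2: odd coefficients survive.
dlam = [lam[i] if i % 2 == 1 else 0 for i in range(1, len(lam))]
# Chien search: position i is in error when Lambda(alpha^{-i}) = 0;
# Forney: Y_i = Z(X_i^{-1}) / Lambda'(X_i^{-1}).
found = {}
for i in range(n):
    xi_inv = ginv(EXP[i])
    if peval(lam, xi_inv) == 0:
        found[i] = gmul(peval(Z, xi_inv), ginv(peval(dlam, xi_inv)))
print(found)   # {3: 33, 10: 14}, i.e. magnitudes 0x21 and 0x0E recovered
```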
Multiplicity Assignment
(60) In an exemplary implementation, multiplicity assignment module 102 can perform multiplicity assignment, wherein the module 102 is configured to receive the symbol reliabilities. In the process of sorting the least symbol reliability values in increasing order, the module 102 obtains their locations and the corresponding r.sup.[2HD] values. In an exemplary implementation, a loc register and an r.sup.[2HD] register can be used to store the least symbol reliability locations and the corresponding secondary decision values respectively, while a Ch register is used to sort the symbol reliabilities.
(62) In an exemplary implementation, each symbol in the Ch register can be initialized with all 1s. Symbol reliabilities γ.sub.i are serially sent for i=0 to 255. In an exemplary implementation, comparators and the encoder logic shown in Table 2 can be used to generate the EN.sub.k and EN.sub.j signals. These EN.sub.k and EN.sub.j signals drive the loc register and the r.sup.[2HD] register to capture the least symbol reliability locations and the corresponding secondary decision values respectively.
(63) TABLE-US-00002 TABLE 2
EN.sub.k (LSB to MSB)    EN.sub.j (LSB to MSB)
1111 . . . 1111          1000 . . . 0000
0111 . . . 1111          0100 . . . 0000
0011 . . . 1111          0010 . . . 0000
. . .                    . . .
0000 . . . 0111          0000 . . . 0100
0000 . . . 0011          0000 . . . 0010
0000 . . . 0001          0000 . . . 0001
0000 . . . 0000          0000 . . . 0000
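A behavioral model of this shift-insert sorting (the Ch, loc and r.sup.[2HD] registers driven by the EN signals of Table 2) might look as follows; the register width η and all data values are illustrative:

```python
# Behavioral sketch of the multiplicity assignment registers: keep the eta
# smallest reliabilities, with their locations and secondary decisions, in
# sorted order. The EN_k/EN_j encoder of Table 2 is modeled by computing an
# insertion position and shifting the tail. All data values are illustrative.
ETA = 4
INF = float("inf")
ch  = [INF] * ETA        # reliability register ("initialized with all 1s")
loc = [None] * ETA       # least-reliable symbol locations
r2  = [None] * ETA       # corresponding secondary decisions r^[2HD]

def accept(i, gamma, r2hd):
    pos = 0
    while pos < ETA and ch[pos] <= gamma:   # comparator outputs -> position
        pos += 1
    if pos < ETA:                           # shift the tail, then insert
        ch[pos + 1:]  = ch[pos:-1];  ch[pos]  = gamma
        loc[pos + 1:] = loc[pos:-1]; loc[pos] = i
        r2[pos + 1:]  = r2[pos:-1];  r2[pos]  = r2hd

stream = [(0, 0.9, 'a'), (1, 0.1, 'b'), (2, 0.5, 'c'),
          (3, 0.05, 'd'), (4, 0.7, 'e'), (5, 0.2, 'f')]
for i, g, r in stream:                      # one symbol per clock cycle
    accept(i, g, r)
print(loc)    # [3, 1, 5, 2]: the four least reliable locations, sorted
```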
(65) Test Vector Generation
(68) In an exemplary implementation, the error locator and error evaluator polynomials can be used to find the error locations and corresponding error magnitudes. Parallelism can be employed to adjust the number of clock cycles required for the syndrome computation (SC) and the Chien search and error magnitude computation stages. In an exemplary implementation, a delay buffer 708 can be used to buffer the received symbols, wherein the size of the delay buffer depends on the latency of the decoder. Each functional block used for decoding in the RS decoder of the present disclosure is now described in more detail.
(69) As one may appreciate, in most cases the KES 704 determines the throughput, as the other blocks can employ parallelism for evaluating polynomials at various field elements. This parallelism drives the trade-off between throughput and area. The size of the delay buffer 708 is determined by the latency of the pipelined decoder, which is one of the reasons to decrease the latency.
(70) Syndrome Computation (SC)
(71) In an exemplary implementation, syndrome computation (SC) can be performed by syndrome computation module 702. The syndromes are calculated as S.sub.j=r(α.sup.j), 1≤j≤2t.
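In hardware, each syndrome cell typically implements Horner's rule serially: per clock, S.sub.j←S.sub.j·α.sup.j⊕r.sub.k with the received symbols fed highest power first, so that after n cycles S.sub.j=r(α.sup.j). This per-cycle register behavior is a common implementation pattern rather than something the text states; a behavioral sketch over GF(2.sup.8) (polynomial 0x11d assumed, received symbols illustrative):

```python
# GF(2^8) tables (primitive polynomial 0x11d assumed).
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]
def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

t = 8                                          # RS(255,239): 2t = 16 cells
r = [(7 * k + 3) % 256 for k in range(255)]    # r[k]: coeff of x^k (made up)
# 2t syndrome cells, each updated once per received symbol (high power first).
S = [0] * (2 * t)
for coeff in reversed(r):                      # n = 255 clock cycles
    for j in range(1, 2 * t + 1):
        S[j - 1] = gmul(S[j - 1], EXP[j]) ^ coeff
# Cross-check against direct evaluation S_j = r(alpha^j).
def peval(p, v):
    acc = 0
    for c in reversed(p):
        acc = gmul(acc, v) ^ c
    return acc
direct = [peval(r, EXP[j]) for j in range(1, 2 * t + 1)]
print(S == direct)   # True
```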
(72) Key Equation Solver (KES)
(74) In an exemplary implementation, the temp register can act as a shift register yielding d.sub.μ after adder block 1. As one may observe, the output of adder block 2 is Λ.sup.(μ)(x)+d.sub.μd.sub.ρ.sup.−1x.sup.(μ−ρ)Λ.sup.(ρ)(x). The Λ register can be updated based on d.sub.μ using multiplexer M.sub.2. In the Berlekamp iterative process, the syndrome register is rotated 2t times so as to retain the initial syndrome values for the next pipeline stage. Rotating the syndrome register 2t times saves a significant number of registers, as the key equation solver does not require another set of registers for storing the syndromes.
(75) In an exemplary implementation, the correction factor d.sub.μd.sub.ρ.sup.−1x.sup.(μ−ρ)Λ.sup.(ρ)(x) is computed to update the Λ register. In an exemplary implementation, d.sub.μ and d.sub.ρ can be evaluated directly, but μ−ρ may be any integer value. To compute x.sup.(μ−ρ)Λ.sup.(ρ)(x), a barrel shifter could be used to shift Λ.sup.(ρ) by μ−ρ. However, a barrel shifter is complex combinational logic that takes huge area and affects the maximum operating frequency, in turn affecting the throughput. If x.sup.(μ−ρ)Λ.sup.(ρ)(x) is stored in the Λ.sup.(ρ) register instead of Λ.sup.(ρ)(x), the barrel shifter can be avoided; this is achieved by a simple left shift by one, i.e., multiplying by x, before loading into the Λ.sup.(ρ) register as shown in
(76) Chien Search and Error Magnitude Computation
(77) The error evaluator Z(x) can be computed similarly to the d.sub.μ computation as shown in
(78) In an exemplary implementation, J-parallel Chien search can be used to compute the roots of a polynomial f(x)=f.sub.0+f.sub.1x+f.sub.2x.sup.2+ . . . +f.sub.nx.sup.n over GF(2.sup.m). As one may appreciate, J field elements can be tested in one cycle. The RS decoder of the present disclosure can use this architecture to evaluate Λ(α.sup.i) and Z(α.sup.i) for 1≤i≤n. In an exemplary implementation, each constant multiplicand of the architecture can be changed to its inverse.
(79) In an exemplary implementation, Λ(α.sup.i) can be used to compute the error magnitude as per Forney's equation-9. To compute the error magnitude, an error locator and magnitude computation architecture can be used, which includes Λ.sub.even(x) and Λ.sub.odd(x), constructed from the even and odd coefficients of Λ(x) respectively. It turns out that
(80) xΛ′(x)=Λ.sub.odd(x)
over a finite field of characteristic two.
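The identity can be checked mechanically: in characteristic two the formal derivative term i·Λ.sub.i reduces to Λ.sub.i for odd i and 0 for even i, so x·Λ′(x) keeps exactly the odd-power terms of Λ(x). A quick sketch with arbitrary (illustrative) coefficients:

```python
# Verify x * Lambda'(x) == Lambda_odd(x) in characteristic 2.
# Coefficients are arbitrary field elements, ascending powers; in GF(2^m)
# the scalar product i*a equals a for odd i and 0 for even i.
lam = [1, 7, 23, 145, 9, 200]                      # illustrative Lambda(x)
deriv = [lam[i] if i % 2 == 1 else 0 for i in range(1, len(lam))]
x_times_deriv = [0] + deriv                        # multiply derivative by x
lam_odd = [lam[i] if i % 2 == 1 else 0 for i in range(len(lam))]
print(x_times_deriv == lam_odd)   # True
```

This is why evaluating Λ.sub.odd(x) alongside Λ.sub.even(x) suffices: the denominator of Forney's formula never needs a separate derivative circuit.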
(81) In an exemplary implementation, J-parallel Chien search can be used for the Λ.sub.even(x) and Λ.sub.odd(x) calculations separately, avoiding a separate evaluation of Λ(α.sup.i) and Λ′(α.sup.i). In an exemplary implementation, the J-parallel Chien search architecture can also be used on Z(x) to evaluate Z(α.sup.i), so as to find the error magnitude at the corresponding error location simultaneously. With the J-parallel Chien search architecture, the J error magnitudes generated in every clock cycle are added to the corresponding received symbols to get the estimated codeword. To compute the entire error vector,
(82) ⌈n/J⌉
cycles may be required.
(83) Table-3 below illustrates clock cycles required for each stage of the RS decoder.
(84) TABLE-US-00003 TABLE-3
Stage                                    Clock Cycles
Syndrome Computation                     2t
Key Equation Solver                      2t
Chien and Error Magnitude Computation    ⌈n/J⌉
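With J-parallel Chien search, the last stage sweeps n field elements J at a time, i.e. about ⌈n/J⌉ cycles. For the RS(255,239) code discussed later (t=8, so 2t=16 cycles each for SC and KES), the per-stage cycle counts work out as:

```python
import math

n, t = 255, 8                 # RS(255,239): 2t = n - k = 16
for J in (30, 10):
    chien_cycles = math.ceil(n / J)
    print(J, 2 * t, chien_cycles)
# J=30: SC/KES take 16 cycles each, the Chien sweep takes 9 cycles
# J=10: the Chien sweep takes 26 cycles and becomes the slowest stage
```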
(85) As one may appreciate, throughput of the decoder as a function of error correction capability can be calculated using equation-10 as below.
(86)
where n, m, t and f.sub.max denote the code length, symbol width, error correcting capability and maximum operating frequency respectively. If J is chosen such that
(87)
the throughput is maximized. As one may appreciate, the selection of J decides the trade-off between throughput and hardware complexity. Similarly, the total latency of the designed Reed-Solomon decoder is given by the sum of the cycles required for each stage as per Table 3.
(88)
(89) As one may observe, when the error evaluation computation and the error magnitude computation are divided into independent pipeline stages, the throughput can be improved further.
(90) 2-Stage Pipelined Low Latency Decoder
(91) In an exemplary embodiment, the latency of the HDD can be reduced significantly, without affecting the throughput much, if both SC and KES are merged into a single pipeline stage as shown in
(93) For a given code of interest, if a decoder cannot afford parallelism, the SC stage and the Chien search and error magnitude computation stage each take almost 2t clock cycles, and the proposed low-latency two-stage pipelined architecture can be used without any issue, as both stages take almost the same number of clock cycles. With the proposed combination of syndrome computation stage 1102 and key equation solver stage 1104, the decoder does not need to time-multiplex the KES architecture with multiple SC and Chien search and error magnitude computation blocks, as required in the popular three-stage pipelined architecture to reduce pipeline stalling. In an exemplary implementation, multiple KES blocks can be used in parallel to handle multiple received vectors and achieve higher throughput.
(94) As one may appreciate, the throughput and latency of the LCC decoder are better than those of the previous decoder. In summary, the multiplicity assignment unit takes n=255 cycles to get the unreliable locations and corresponding r.sub.i.sup.[2HD] values. In an exemplary implementation, a state machine can be used to control EN.sub.1, EN.sub.2, . . . , EN.sub.η and obtain all the 2.sup.η test vectors.
(95) In an exemplary implementation, each test vector of the 2.sup.η test vectors can be passed through the proposed 2-stage pipelined HDD, wherein the slowest stage in the 2-stage pipelined HDD is the Chien search and error magnitude computation block, which takes
(96) ⌈n/J⌉
clock cycles.
Multiplicity assignment is started when EN.sub.3=0, EN.sub.2=0, and EN.sub.1=1, for η=3. This is possible because
(97)
Therefore, in the long run, each received vector r.sup.[HD] takes n=255 cycles to get corrected. As a result, the throughput and latency of the soft LCC decoder are given by Equation-13 and Equation-14 respectively.
(98)
(99) Table-4 below summarizes the RS decoder that uses the 2-stage pipelined HDD with block length n=410.
(100) TABLE-US-00004 TABLE 4
Rate                 0.8     0.8195   0.839   0.861   0.878   0.897
k                    328     336      344     352     360     368
t                    41      37       33      29      25      21
Throughput (Gbps)    5.55    5.84     6.156   6.507   6.902   7.347
(101) In order to compare the RS decoder of the present disclosure with other designs in the literature, an RS (255,239) decoder has been designed over GF(2.sup.8). Choosing J=30 such that
(102)
results in a throughput of 24 Gbps. There is a trade-off between area and throughput: one can save area by choosing J=10; however, with the reduced J, the throughput reduces to 12 Gbps. Table-5 below shows a comparison between the RS decoder with J=10 designed in accordance with the present disclosure and the RS decoder v9.0 Xilinx IP for RS (255,239) on a Kintex-7 FPGA.
(103) TABLE-US-00005 TABLE 5
                         Xilinx IP    Proposed
LUTs                     1177         8774
FFs                      1169         5272
36k BRAMs                1            0
18k BRAMs                2            0
Max. Frequency (MHz)     292          200
Max. Throughput (Gbps)   2.33         12
Latency (cycles)         470          55
(104) As one may appreciate, no BRAMs are used in the proposed RS decoder, as all the needed Galois field arithmetic is implemented using LUTs and FFs. In contrast, the Xilinx IP utilizes substantial memory in the form of two 18k BRAMs and one 36k BRAM, which reduces its utilization of LUTs and FFs compared to the RS decoder of the present disclosure. If one considers BRAMs, LUTs and FFs altogether, one can appreciate that the RS decoder of the present disclosure and the Xilinx IP utilize almost the same hardware resources. However, the gain achieved in throughput by the RS decoder of the present disclosure is almost five times, which is a significant improvement. If the error evaluation computation and error magnitude computation with J=10 were divided into independent pipeline stages, the throughput would become 16 Gbps. The implemented design on the Kintex-7 FPGA is validated using ChipScope as shown in
(106) In an aspect, the Technology-Scaled Normalized Throughput (TSNT) index is used to evaluate the performance of different designs across different fabrication technologies. As one may appreciate, the TSNT index can be calculated using equation-15 below.
(107) TSNT=NT×(Technology/180 nm), where NT=Throughput (Mbps)/Gate count (kGates)  Equation-15
(108) Table 6 shows a comparison of the proposed RS decoder with J=10 against competing decoder implementations. The TSNT value of the proposed 180-nm RS (255,239) decoder is at least 1.8 times better than existing designs, which indicates that the decoder would be much more area efficient for a given throughput. Table 6 further illustrates that the latency of the proposed 180-nm decoder is reduced by almost 80% compared to the existing 180-nm designs. Further, the proposed RS (255,239) decoder is more than five times better in latency than the RS (255,245) dual-line architecture, even though the ECC capability of the RS (255,245) dual-line architecture is less than that of the proposed RS (255,239) decoder.
(109) TABLE-US-00006 TABLE 6
                       Proposed          Design 1          Design 2  Design 3
Architecture           A        B        A        B
Technology (nm)        65       180      130      130      180       180
Total Gates            31657    34549    37600    30700    18400     20614
Max. Frequency (MHz)   588.23   285.71   606      606      640       400
Latency (clocks)       55       55       298      425      515       512
Latency (ns)           93.5     192.5    491.7    701.3    804.6     1280
Throughput (Mbps)      35293.8  17142.85 7357     4850     5100      3200
NT                     1114.88  496.1896 195.7    158      277.17    155.23
TSNT                   402.6    496.1896 141.33   114.11   277.17    155.23
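The NT and TSNT columns of Table 6 are consistent with normalizing throughput by gate count (Mbps per kilo-gate) and then scaling by the technology node relative to 180 nm. The check below recomputes them from the table's own Technology, Gates and Throughput rows; this form of equation-15 is inferred from those numbers, not stated in the text above:

```python
# Recompute NT and TSNT from Table 6 data (inferred relations):
# NT = Throughput(Mbps) / Gates(kGates), TSNT = NT * Technology / 180.
rows = {  # name: (tech_nm, gates, throughput_mbps, NT_ref, TSNT_ref)
    "Proposed A": (65, 31657, 35293.8, 1114.88, 402.6),
    "Proposed B": (180, 34549, 17142.85, 496.19, 496.19),
    "Design 1 A": (130, 37600, 7357.0, 195.7, 141.33),
    "Design 2":   (180, 18400, 5100.0, 277.17, 277.17),
}
for name, (tech, gates, tput, nt_ref, tsnt_ref) in rows.items():
    nt = tput / (gates / 1000.0)
    tsnt = nt * tech / 180.0
    assert abs(nt - nt_ref) < 0.05 and abs(tsnt - tsnt_ref) < 0.05, name
print("Table 6 NT/TSNT columns reproduced")
```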
(110) Table-7 below shows a comparison between an existing LCC decoder and the LCC decoder of the present disclosure. The LCC decoder of the present disclosure is implemented using the 2-stage pipelined HDD and consumes more hardware resources than the existing LCC decoder; however, it works at a higher throughput and lower latency. If the error evaluation computation and error magnitude computation in the HDD were divided into independent pipeline stages, the LCC decoder of the present disclosure could meet the condition
(111)
instead of
(112)
Therefore, the LCC decoder of the present disclosure achieves the same throughput at a reduced hardware complexity.
(113) TABLE-US-00007 TABLE 7
                         Existing [4]    Designed
LUTs                     5114            18584
FFs                      5399            16621
Max. Frequency (MHz)     150.5           161.29
Max. Throughput (Gbps)   0.71            1.29
Latency (cycles)         1312            559
Platform                 Virtex-V        Kintex-7
Xilinx ISE tool version  9.2i            14.6
(115) In an exemplary implementation, an RS (255,239) decoder over GF(2.sup.8) can be implemented on a Kintex-7 FPGA using VHDL. As one may appreciate, the throughput of the pipelined HD RS decoder is almost five times that of existing decoders with almost the same hardware resource utilization, operating at a maximum throughput of 24 Gbps. The throughput of the SDD is almost twice that of existing designs. The overall processing latencies of the SDD and HDD of the present disclosure are reduced by almost 58% and 80% respectively compared to existing designs.
(116) The RS decoder of the present disclosure uses an interpolation and factorization free hybrid Low-Complexity Chase algorithm and achieves the required frame error rate (FER). The RS decoder of the present disclosure can achieve the same FER performance as the KV algorithm with m=4, at reduced complexity.
(118) In an exemplary aspect, the error locator polynomials Λ.sub.i are calculated from the calculated syndromes S.sub.j using the Berlekamp-Massey algorithm, and the error magnitude Y.sub.i is calculated using Forney's formula.
(119) In an exemplary aspect, the method further includes the step of generating 2.sup.η test vectors by a test vector generation module, wherein each test vector of the 2.sup.η test vectors is passed to the SC module.
(120) While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
Advantages of the Invention
(121) The present disclosure provides a RS decoder and method for high speed storage and communication system.
(122) The present disclosure provides a configurable RS decoder and a method thereof that can enable configurable error correcting capability depending upon channel characteristics and performance.
(123) The present disclosure provides a RS decoder that requires less area and achieves high throughput and low latency for high speed storage and communication systems.
(124) The present disclosure provides a RS decoder that can achieve high throughput and low latency for high speed storage and communication applications.