Minimum-size belief propagation network for FEC iterative encoders and decoders and related routing method

11526396 · 2022-12-13


Abstract

The invention relates to an interconnection network for forward error correction encoders and decoders, including N input terminals, N output terminals, and M stages. Each stage includes switching elements having input pins and output pins. The input pins of the switching elements of the first stage are connected to the input terminals, and the output pins of the switching elements of the last stage are connected to the output terminals. The input and output pins of the switching elements of immediately successive stages are connected in a hardwired fashion so as to form a plurality of interconnection sub-networks for routing respective input values from respective output pins of the switching elements of the first stage to respective input pins of the switching elements of the last stage.

Claims

1. An interconnection network for forward error correction encoders and decoders, including N input terminals, N output terminals, and M stages; wherein N is a non-prime positive integer, and wherein M is a positive integer equal to, or higher than, two; wherein the M stages include a first stage and a last stage and wherein each stage includes switching elements having, each, respective input pins and respective output pins; wherein the input pins of the switching elements of the first stage are connected to the input terminals, and the output pins of the switching elements of the last stage are connected to the output terminals; wherein, the input and output pins of the switching elements of immediately successive stages are connected in a hardwired fashion so as to form a plurality of interconnection sub-networks for routing, each, respective input values from respective output pins of the switching elements of the first stage to respective input pins of the switching elements of the last stage; whereby the interconnection network is operable to route, on the basis of routing commands applied to the switching elements, N input values received at the N input terminals through the M stages and the interconnection sub-networks to provide, at the N output terminals, N output values corresponding to, or circularly shifted with respect to, said N input values received at the N input terminals; characterized in that M denotes a number of given submultiples of N whose product is equal to N; wherein each stage is associated with a respective submultiple of said M given submultiples of N, and includes S.sub.i switching elements, each having sm.sub.i respective input pins and sm.sub.i respective output pins, wherein S.sub.i=N/sm.sub.i, wherein sm.sub.i denotes said respective submultiple associated with said stage, and wherein i denotes said stage and is a positive integer comprised between one and M; wherein each switching element is configured to: receive, at the 
sm.sub.i respective input pins, sm.sub.i respective input values; and provide at the sm.sub.i respective output pins, on the basis of a respective routing command applied to said switching element, sm.sub.i respective output values corresponding to, or circularly shifted with respect to, said sm.sub.i respective input values received at the sm.sub.i respective input pins; wherein the interconnection sub-networks are not connected to each other; wherein the interconnection network includes sm.sub.1 interconnection sub-networks for routing, each, N/sm.sub.1 respective input values from N/sm.sub.1 respective output pins of the switching elements of the first stage to N/sm.sub.1 respective input pins of the switching elements of the last stage; wherein sm.sub.1 denotes the submultiple, among said M given submultiples of N, which is associated with the first stage; and wherein at least one of: (i) the stages are arranged, from the first stage to the last stage, according to a decreasing or increasing order of said M given submultiples of N; (ii) the stages are arranged, from the first stage to the last stage, according to a decreasing order of said M given submultiples of N, whereby the first stage is associated with the greatest submultiple among said M given submultiples of N; (iii) the interconnection network includes three or more stages thereby resulting in M being equal to, or higher than, three; wherein each interconnection sub-network includes, at each stage different than the first and the last stages, N/(sm.sub.1*sm.sub.2* . . . *sm.sub.i) switching elements of said stage, wherein i denotes said stage and is a positive integer comprised between two and M−1; and wherein, at each stage different than the first and the last stages, the switching elements of said stage included in each interconnection sub-network have, each, the sm.sub.i respective input pins connected, in a hardwired fashion, to respective output pins of sm.sub.i respective switching elements of an immediately previous stage, and the sm.sub.i respective output pins connected, in a hardwired fashion, to respective input pins of sm.sub.i respective switching elements of an immediately successive stage; (iv) N is different from a power of two; or (v) each switching element includes sm.sub.i respective multiplexers, each of which is: connected to all the sm.sub.i respective input pins of said switching element; connected to a respective output pin of the sm.sub.i respective output pins of said switching element; and operable to connect the respective output pin to one of said sm.sub.i respective input pins of the switching element, depending on the respective routing command applied to said switching element.

2. The interconnection network of claim 1, wherein the first stage is associated with the greatest submultiple among said M given submultiples of N.

3. A routing method for the interconnection network as claimed in claim 1; said routing method comprising: computing, for each switching element, a respective routing command based on: an overall circular shift to be applied by the interconnection network to the N input values received at the N input terminals, the respective stage which said switching element belongs to, and a respective identifier of the switching element, which respective identifier is related to the respective stage which said switching element belongs to, or the respective stage and the respective interconnection sub-network which said switching element belongs to; and applying, to each switching element, the respective routing command computed, wherein said respective routing command is indicative of a circular shift to be applied by said switching element to the sm.sub.i respective input values received at the sm.sub.i respective input pins.

4. A forward error correction encoder comprising the interconnection network as claimed in claim 1.

5. A forward error correction decoder comprising the interconnection network as claimed in claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) For a better understanding of the present invention, preferred embodiments, which are intended purely by way of non-limiting examples, will now be described with reference to the attached drawings (all not to scale), wherein:

(2) FIGS. 1 and 2 schematically illustrate, respectively, a binary switching element in direct-connection and crossed-connection states and a three-stage Benes network employing six of said binary switching elements;

(3) FIG. 3 schematically illustrates a typical three-stage Clos network architecture;

(4) FIGS. 4 and 5 schematically illustrate an SCCC decoder provided with an interconnection network according to a preferred embodiment of the present invention;

(5) FIG. 6 schematically illustrates an example of rearrangement of a memory intended to store extrinsic information and log-likelihood ratio data;

(6) FIG. 7 schematically illustrates a 3×3 switching element used in the interconnection network according to said preferred embodiment of the present invention;

(7) FIGS. 8 and 10 schematically illustrate an example of architecture for the interconnection network according to said preferred embodiment of the present invention;

(8) FIG. 9 shows a preferred architecture for the 3×3 switching element of FIG. 7;

(9) FIGS. 11-17 schematically illustrate examples of cyclic shifts implemented by the interconnection network of FIG. 10;

(10) FIG. 18 schematically illustrates DVB-S2 LDPC encoding section architecture according to the prior art;

(11) FIGS. 19 and 20 schematically illustrate DVB-S2 LDPC decoding section architecture according to the prior art;

(12) FIG. 21 schematically illustrates DVB-S2 LDPC decoding section architecture provided with an interconnection network according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

(13) The following discussion is presented to enable a person skilled in the art to make and use the invention. Various modifications to the embodiments will be readily apparent to those skilled in the art, without departing from the scope of the present invention as claimed. Thence, the present invention is not intended to be limited to the embodiments shown and described, but is to be accorded the widest scope consistent with the principles and features disclosed herein and defined in the appended claims.

(14) The present invention relates to an interconnection network that can be advantageously exploited in all iterative FEC decoders and coders, in particular those based on SCCCs, PCCCs and LDPCs.

(15) In particular, the present invention makes it possible to build very efficient, easily routable and controllable belief propagation networks for any iterative FEC decoder (i.e., LDPC-based, PCCC-based or SCCC-based) by exploiting regular interleaver algorithm properties, so that some switching elements of a general Clos network and, in addition, also the most critical parts of the routing algorithms of the Clos networks (specifically, the routing algorithms of the first stage of a Clos network) can be removed. Analogous networks, of reduced data path size, may be advantageously exploited also for FEC encoders.

(16) The interconnection network according to the present invention has general factorization properties, requiring only that the number of inputs (i.e., the parallelism degree) be a non-prime number, and makes it possible to achieve high throughput of encoders and decoders, by enabling a parallel processing architecture while keeping the required memory fixed.

(17) Hereinafter two examples of application of the interconnection network according to the present invention will be described in detail, one related to SCCCs and the other one to LDPCs. In this connection, it is worth noting that, in view of the example of application of the present invention to the SCCC technology described in detail in the following, also the application of the interconnection network according to the present invention to the PCCC technology will be immediately clear, mutatis mutandis, for those skilled in the art.

(18) As is known, with concatenated convolutional codes (SCCCs and PCCCs), decoding operations are performed by means of two (or even more than two, for PCCCs) distinct entities, which decode the data separately from one another; in particular, the two entities operate on, and exchange, extrinsic information containing soft information indicative of the belief on decoded bits (specifically, the belief that a decoded bit is 1 or 0).

(19) The data to be decoded (or coded), due to the above-explained parallel interleaver properties, are organized into totally separate sub-blocks, with no overlapping among the various blocks. In other words, the extrinsic information memory (i.e., the memory for storing the extrinsic information) can be partitioned into a number of sub-blocks equal to the parallelism degree; the codes have to be designed so as to allow for computational parallelism and minimized hardware requirements. The memory partition is different in each processing phase of the codec; the challenge of a high-throughput codec architecture is to guarantee an efficient parallel implementation in both phases. To be clearer, the processing phases are the inner and outer processing phases of an SCCC, or the natural-order and permuted-order processing phases in a two-basic-codes PCCC. Some examples are introduced below to highlight that the accessed sub-blocks are different in each decoding phase, e.g., in the inner and outer decoding of SCCCs.

(20) In this connection, FIG. 4 shows a simplified block diagram representing a functional architecture of an SCCC decoder (denoted as a whole by 100), that includes: an interconnection network 110 according to a preferred embodiment of the present invention, wherein said interconnection network 110 is able to carry out data exchange between inner and outer codes in real-time, by avoiding any conflict in memory access; a memory 120 for storing extrinsic information data, wherein said memory 120 is organized in τ different separate memory banks 121; and a set of Soft Input Soft Output (SISO) processors 130 including τ parallel processors 131 operable to perform decoding.

(21) In order to allow parallel implementation of the SCCC decoder 100, the design of the interleaver guarantees that the memory banks 121 do not overlap with each other during the two phases.

(22) In one of the two phases (i.e., inner and outer), the reading from the memory banks 121 is sequential and the sub-blocks are easily identified in FIG. 4, while in the other phase the subdivision into sub-blocks is functional (and hence it is not evident in FIG. 4).

(23) The interconnection network 110 allows the data to be routed in an instantaneous, efficient manner, without any computation constraint, thanks to the implementation of a new, innovative self-routing algorithm according to an aspect of the present invention.

(24) The same structure can be reused for any degree of parallelism and for any code whose coding/decoding process can be split into two stages as in FIG. 4. Said architecture can be used for PCCC or SCCC codecs and also for the LDPC codes defined in the DVB-S2 standard.

(25) FIG. 5 shows in greater detail the functional architecture of the SCCC decoder 100, wherein: a depuncturer 140 (hereinafter also referred to as a depuncturing unit) is also shown; the memory 120 is implemented by means of two 4096×256 dual-port Random-Access Memories (RAMs) 122; and τ=30 and, hence, the interconnection network 110 is a 30×30 interconnection network (i.e., it has thirty inputs and thirty outputs).

(26) In particular, the SCCC decoder 100 is a binary decoder that works on input quantities which are log-likelihood ratios (LLRs) of the bits encoded by the corresponding binary encoder at the transmitter side.

(27) During the inner phase, the parallel SISO processors 130 use the LLRs written into the memory 120 along with the extrinsic information to update said extrinsic information.

(28) The interconnection network 110 realizes the interleaving (in reading) and the de-interleaving (in writing) and is used to scramble the extrinsic information from the outer to the inner phase.

(29) The depuncturing unit 140 is necessary to re-insert the bits punctured at the transmitter side (a punctured bit corresponds to a neutral LLR value equal to zero). The LLRs are routed towards the SISO processors 130 through the depuncturing unit 140 only during the inner phase; the interconnection network 110 is likewise used only during the inner phase, to route the extrinsic information (u), containing soft information indicative of the belief on the decoded bits, from the memory 120 towards the SISO processors 130 and vice versa.

(30) In the outer phase, instead, the memory 120 is read in the natural order.

(31) The aforesaid operation represents the classic implementation of an SCCC decoder.

(32) Before describing in greater detail the interconnection network 110, it is worth explaining also the logical path that has led the Applicant to conceive the interconnection network according to the present invention.

(33) An initial goal of the Applicant was that of designing a hardware architecture with the most suitable parallelism with respect to SCCC, PCCC or LDPC requirements. Hereinafter, for the sake of description simplicity, reference will be made again to the SCCC application.

(34) In order to make it possible to choose a generic parallelism degree as an optimum trade-off between technology constraints and required throughput, the Applicant has designed a simplified architecture where the hardware parallelism degree can be any sub-multiple of the original interleaving parallelism, so that a scalable interleaving structure is obtained that considerably simplifies the interleaving complexity for any degree of parallelism (which can also be decided on the basis of the target technology). The challenge is to guarantee that there is no collision in any of the memory banks, even when scaling down the original design implying τ sub-blocks, since one may desire to implement a parallelism degree φ lower than τ. For example, let us assume that only 30 processors are implemented, whereas the designed (original) parallelism is 120; theoretically, at every clock rising edge, access to only 30 separate memory banks is necessary, wherein each memory bank is selected and accessed by only one processor at a time. In this case, it is possible to define and use 30 new memory blocks, each constituted by 4 old memory banks (since the original parallelism is 120), as schematically illustrated in FIG. 6 (where an example of rearrangement of an extrinsic information and log-likelihood ratio data memory is shown). In this way, each of the 30 processors carries out in sequence the tasks of 4 of the 120 original processors in a four times longer processing time. The first processor carries out the tasks of the processors 0, 1, 2, 3, the second one carries out the tasks of the processors 4, 5, 6, 7, and so on up to the thirtieth processor; hereinafter the 4 aforementioned tasks of every processor will be called encoding/decoding phases 0, 1, 2, 3.
At every clock rising edge, two different processors cannot simultaneously access the same memory bank, thanks to the interleaving laws, so that there is no conflict among processors; as a matter of fact, the original design of the interleaver is able to guarantee this no-conflict condition. In addition, this solution simplifies the routing issue, since it reduces the required fan-in of the employed multiplexers from 120:30 to 30:30, thereby considerably reducing the high fan-out of the memory banks and the complexity of the multiplexers (in fact, this design requires that only a 30-way MUX per SISO be implemented). In a general approach, φ memory banks and φ φ:1 multiplexers are used, thereby guaranteeing high flexibility (scalability) and allowing the codecs to be implemented in a number of different technologies, on the basis of the requirements.
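By way of a non-limiting illustration, the regrouping described above can be sketched as follows (a minimal model; the function names are hypothetical, while the bank and task indexing follows the example of 120 original banks folded into 30 new blocks of 4):

```python
TAU = 120  # original (designed) parallelism
PHI = 30   # implemented parallelism
n = TAU // PHI  # old banks folded into each new memory block (here 4)

def old_banks_of(new_block):
    # new memory block b groups the n consecutive original banks n*b .. n*b+n-1
    return list(range(n * new_block, n * new_block + n))

def original_task(processor, phase):
    # processor p sequentially performs the tasks of original processors
    # n*p .. n*p+n-1, one per encoding/decoding phase m = 0 .. n-1
    return n * processor + phase

assert old_banks_of(0) == [0, 1, 2, 3]   # first block = original banks 0..3
assert original_task(1, 0) == 4          # second processor starts with task 4
assert original_task(29, 3) == 119       # thirtieth processor ends with task 119
```

Each of the 30 processors thus sweeps its n = 4 phases in sequence, taking four times the original processing time, as stated above.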

(35) Due to the rearrangement and redefinition of the memory banks and the reduction of parallelism from τ to φ (for example, from 120 to 30), the π and ρ parameters must be modified in accordance with the following laws:
π.sub.new=π.sub.old+[(ρ.sub.old+m)mod n]*W,  (6)
ρ.sub.new=[(ρ.sub.old+m)mod τ]div n,  (7)
where div denotes the Euclidean division operation producing an integer quotient by dividing [(ρ.sub.old+m)mod τ] by n, mod denotes the modulo operation producing an integer remainder by dividing (ρ.sub.old+m) by, respectively, n and τ, m denotes the decoding phase (0, 1, 2, 3), W denotes the size of the memory sub-block and is equal to the ratio between the size of the overall memory I and the parallelism degree τ, and n denotes the ratio between the original parallelism (i.e., τ) and the new one (i.e., φ), namely n=τ/φ.

(36) It is worth noting that hereinafter the notations div and mod will be used with the same meaning as the one just explained (i.e., to denote, respectively, the Euclidean division operation producing an integer quotient, and the modulo operation producing an integer remainder) and, hence, they will not be described again.

(37) By applying equations (6) and (7) to the above-described example (wherein τ=120 and φ=30, whence n=4), it follows that:
π.sub.new=π.sub.old+[(ρ.sub.old+m)mod 4]*W,
ρ.sub.new=[(ρ.sub.old+m)mod 120]div 4.
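As a sketch, equations (6) and (7) can be written directly in code for the τ=120, φ=30 case (the function name and the numeric values in the usage example, including the choice of W, are hypothetical illustrations):

```python
def remap(pi_old, rho_old, m, W, tau=120, phi=30):
    """Apply equations (6) and (7): remap the interleaver parameters after
    reducing the parallelism from tau to phi (n = tau // phi phases)."""
    n = tau // phi
    pi_new = pi_old + ((rho_old + m) % n) * W      # equation (6): mod n
    rho_new = ((rho_old + m) % tau) // n           # equation (7): mod tau, div n
    return pi_new, rho_new

# hypothetical example with sub-block size W = 4096, decoding phase m = 1
pi_new, rho_new = remap(pi_old=10, rho_old=6, m=1, W=4096)
assert pi_new == 10 + (7 % 4) * 4096   # 12298
assert rho_new == (7 % 120) // 4       # 1
```

Computing the remapped parameters at run-time in this way avoids storing them, consistently with the observation of paragraph (38).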

(38) Thence, two additional bits are required to represent the address π; in particular, two bits are introduced at the beginning of the address to distinguish the four original memory banks composing the new memory banks. The new interleaving command parameters should be computed at run-time, since the memory necessary to store them would be remarkable (four times greater than the previous one). The computation is easy and does not require considerable hardware resources.

(39) Getting back to the description of the interconnection network 110 according to the preferred embodiment of the present invention, it is worth noting that said interconnection network 110 is a simplified interconnection network designed to operate as a sort of cyclic shift register of size τ (conveniently, with τ=30) and to carry out a cyclic shift of a generic number of positions ρ in a single clock cycle. It is important to stress that the size τ can be generalized to any generic size N, provided that N is a non-prime positive integer.

(40) The interconnection network 110, unlike Benes networks, does not use a single type of switching element, but is based on switching elements which have different sizes and output the input values appropriately shifted.

(41) In this connection, FIG. 7 schematically illustrates an example of 3×3 switching element (denoted by 111) that can be used in the interconnection network 110. In particular, said switching element 111 can be set by two control line bits into three connection states, wherein: in a first connection state (denoted by a) in FIG. 7), the switching element 111 outputs, at a first output, a bit O.sub.0=I.sub.0 received at a first input, at a second output, a bit O.sub.1=I.sub.1 received at a second input, and at a third output, a bit O.sub.2=I.sub.2 received at a third input; in a second connection state (denoted by b) in FIG. 7), the switching element 111 outputs at the first output, a bit O.sub.1=I.sub.1 received at the second input, at the second output, a bit O.sub.2=I.sub.2 received at the third input, and at the third output, a bit O.sub.0=I.sub.0 received at the first input; and in a third connection state (denoted by c) in FIG. 7), the switching element 111 outputs at the first output, a bit O.sub.2=I.sub.2 received at the third input, at the second output, a bit O.sub.0=I.sub.0 received at the first input, and at the third output, a bit O.sub.1=I.sub.1 received at the second input.
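A behavioral model of the 3×3 switching element of FIG. 7 can be sketched as follows (the encoding of the two control-line bits as the integers 0, 1, 2 for the states a), b), c) is an assumption made purely for illustration):

```python
def switch3(inputs, state):
    """3x3 cyclic-shift switching element of FIG. 7.
    state 0 = a) pass-through: (O0, O1, O2) = (I0, I1, I2)
    state 1 = b) O0=I1, O1=I2, O2=I0
    state 2 = c) O0=I2, O1=I0, O2=I1"""
    i0, i1, i2 = inputs
    if state == 0:
        return (i0, i1, i2)
    if state == 1:
        return (i1, i2, i0)
    return (i2, i0, i1)

assert switch3(("I0", "I1", "I2"), 1) == ("I1", "I2", "I0")  # state b)
assert switch3(("I0", "I1", "I2"), 2) == ("I2", "I0", "I1")  # state c)
```

Each state is thus a cyclic shift of the three inputs by zero, one or two positions, which is exactly the per-element behavior the interconnection network composes across its stages.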

(42) The size of the switching elements used in the interconnection network 110 depends on the factorization of the size τ=N=30 of said cyclic-shift-register-like interconnection network 110 into prime numbers (or primes). For example, since the factorization of τ=N=30 results in 30=2*3*5, then the interconnection network 110 is based on switching elements of size 2×2, 3×3, and 5×5.

(43) Alternatively, it is also possible to use switching elements with size based on (prime and non-prime) submultiples of N whose product is equal to N. For example, for N=30, it is also possible to use switching elements of size 6×6 and 5×5, or of size 15×15 and 2×2.

(44) Conveniently, the switching elements' size may be chosen to reduce the architectural complexity or the number of FPGA logic elements (LEs) used.

(45) Thence, the flexibility represents another advantage of the interconnection network according to the present invention.

(46) Conveniently, the interconnection network 110 is based on a factorization of N=30 into primes 2, 3, 5 and, hence, includes three stages, wherein a stage includes 30/5=6 5×5 switching elements, another stage includes 30/3=10 3×3 switching elements, and a further stage includes 30/2=15 2×2 switching elements.

(47) Instead, if N were equal to 120 and were decomposed into submultiples 2, 3, 4, 5, the interconnection network would include four stages, wherein a stage would include 120/5=24 5×5 switching elements, another stage would include 120/4=30 4×4 switching elements, a further stage would include 120/3=40 3×3 switching elements, and a yet further stage would include 120/2=60 2×2 switching elements.
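The stage layouts of both examples follow directly from S.sub.i=N/sm.sub.i; a minimal sketch (the function name is hypothetical):

```python
from math import prod

def stage_layout(N, submultiples):
    """Return (element count, element size) per stage: stage i uses
    S_i = N // sm_i switching elements of size sm_i x sm_i."""
    assert prod(submultiples) == N  # the submultiples must factor N
    return [(N // sm, sm) for sm in submultiples]

# N = 30 factored as 5*3*2, and N = 120 decomposed as 5*4*3*2
assert stage_layout(30, (5, 3, 2)) == [(6, 5), (10, 3), (15, 2)]
assert stage_layout(120, (5, 4, 3, 2)) == [(24, 5), (30, 4), (40, 3), (60, 2)]
```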

(48) More in general, an interconnection network according to the present invention includes N inputs and N outputs (N being a non-prime positive integer) and M stages;

(49) wherein M is a positive integer equal to, or higher than, two, and denotes a number of predefined submultiples {sm.sub.1, . . . , sm.sub.M} of N whose product is equal to N—namely, in mathematical terms,

(50) sm.sub.1*sm.sub.2* . . . *sm.sub.M=N;

(51) and wherein each stage i (with i=1, . . . , M) is associated with a respective submultiple sm.sub.i of said predefined submultiples {sm.sub.1, . . . , sm.sub.M} of N, and includes S.sub.i respective cyclic-shift-register-like switching elements, each having sm.sub.i inputs and sm.sub.i outputs, wherein S.sub.i=N/sm.sub.i.

(52) FIG. 8 shows an example of architecture for the interconnection network 110 based on a factorization of N=30 into primes 2, 3, 5 (thereby resulting in M=3 stages). In the example of FIG. 8, the interconnection network 110 includes: a first stage (i=1) associated with the factor (more in general, the submultiple) sm.sub.1=5 and comprising six (i.e., S.sub.1=6) 5×5 switching elements 112 designed to operate as a sort of cyclic shift registers; a second stage (i=2) associated with the factor (more in general, the submultiple) sm.sub.2=3 and comprising ten (i.e., S.sub.2=10) 3×3 switching elements 111 designed to operate as a sort of cyclic shift registers (as shown in FIG. 7 and previously described); and a third stage (i=3) associated with the factor (more in general, the submultiple) sm.sub.3=2 and comprising fifteen (i.e., S.sub.3=15) 2×2 switching elements 113, wherein said 2×2 switching elements 113 may be conveniently implemented by means of the binary switching element shown in FIG. 1.

(53) Therefore, the greatest switching elements (i.e., the 5×5 ones) are arranged at the first stage and the size of the switching elements decreases as the index i of the stages increases. However, also the opposite arrangement may be conveniently used.

(54) The outputs of the 5×5 switching elements 112 of the first stage are connected to the inputs of the 3×3 switching elements 111 of the second stage so as to generate (i.e., form) five interconnection sub-networks of size N/sm.sub.1=30/5=6.

(55) In particular, at the second stage, the 3×3 switching elements 111 are organized (i.e., grouped) into five sub-stages 114, each including two 3×3 switching elements 111, and the first output pin of each of the six 5×5 switching elements 112 is connected to a first sub-stage 114 (specifically, to an input pin of a 3×3 switching element 111 belonging to said first sub-stage 114); the second output pin of each of the six 5×5 switching elements 112 is connected to a second sub-stage 114 different from the first sub-stage 114 (specifically, to an input pin of a 3×3 switching element 111 belonging to said second sub-stage 114); the third output pin of each of the six 5×5 switching elements 112 is connected to a third sub-stage 114 different from said first and second sub-stages 114 (specifically, to an input pin of a 3×3 switching element 111 belonging to said third sub-stage 114); the fourth output pin of each of the six 5×5 switching elements 112 is connected to a fourth sub-stage 114 different from said first, second and third sub-stages 114 (specifically, to an input pin of a 3×3 switching element 111 belonging to said fourth sub-stage 114); and the fifth output pin of each of the six 5×5 switching elements 112 is connected to a fifth sub-stage 114 different from said first, second, third and fourth sub-stages 114 (specifically, to an input pin of a 3×3 switching element 111 belonging to said fifth sub-stage 114); whereby five interconnection sub-networks of size N/sm.sub.1=30/5=6 are formed from the 5×5 switching elements 112 of the first stage towards the 3×3 switching elements 111 of the second stage.

(56) Moreover, for each sub-stage 114 and, hence, for each interconnection sub-network of size 6, the first output pins of the two 3×3 switching elements 111 belonging to said sub-stage 114 are connected to one and the same respective first 2×2 switching element 113 (in particular, to the two input pins thereof); the second output pins of the two 3×3 switching elements 111 belonging to said sub-stage 114 are connected to one and the same respective second 2×2 switching element 113 (in particular, to the two input pins thereof); and the third output pins of the two 3×3 switching elements 111 belonging to said sub-stage 114 are connected to one and the same respective third 2×2 switching element 113 (in particular, to the two input pins thereof);

(57) whereby the sub-stages 114 generate (i.e., form), towards the 2×2 switching elements 113, three further interconnection sub-networks of size N/(sm.sub.1×sm.sub.2)=30/(5×3)=2 for each interconnection sub-network of size 6.

(58) More in general, for an interconnection network of size N, said size N is decomposed/factorized into submultiples/prime factors sm.sub.1*sm.sub.2*sm.sub.3.

(59) Thence, the first stage includes N/sm.sub.1 switching elements of size sm.sub.1×sm.sub.1, which generate sm.sub.1 interconnection sub-networks of size N/sm.sub.1.

(60) At the second stage, N/(sm.sub.1*sm.sub.2) switching elements of size sm.sub.2×sm.sub.2 are used for each sub-network of size N/sm.sub.1, and at the second stage there are, in total,

(61) [N/(sm.sub.1*sm.sub.2)]*sm.sub.1=N/sm.sub.2
switching elements of size sm.sub.2×sm.sub.2.

(62) At the third stage, N/(sm.sub.1*sm.sub.2*sm.sub.3) switching elements of size sm.sub.3×sm.sub.3 are used for each sub-network of size N/(sm.sub.1*sm.sub.2), and at the third stage there are, in total,

(63) [N/(sm.sub.1*sm.sub.2*sm.sub.3)]*(sm.sub.1*sm.sub.2)=N/sm.sub.3
switching elements of size sm.sub.3×sm.sub.3.

(64) The same applies for further stages of interconnection networks having more than three stages.
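The counting of paragraphs (59) to (63) generalizes to any number of stages; a sketch that checks the identity [N/(sm.sub.1* . . . *sm.sub.i)]*(sm.sub.1* . . . *sm.sub.i−1)=N/sm.sub.i for each stage (the function name is hypothetical):

```python
from math import prod

def per_stage_counts(N, sms):
    """For each stage i, return (switches per sub-network, number of
    sub-networks feeding that stage, total switches), verifying that the
    total equals N // sm_i as in paragraphs (59)-(63)."""
    out = []
    for i, sm in enumerate(sms):
        per_subnet = N // prod(sms[: i + 1])  # N / (sm_1 * ... * sm_i)
        subnets = prod(sms[:i])               # 1 at the first stage
        total = per_subnet * subnets
        assert total == N // sm
        out.append((per_subnet, subnets, total))
    return out

# N = 30 factored as 5*3*2: stage totals are 6, 10 and 15 switches
assert per_stage_counts(30, (5, 3, 2)) == [(6, 1, 6), (2, 5, 10), (1, 15, 15)]
```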

(65) Considering two immediately successive stages i and i+1, for each switching element sm.sub.i×sm.sub.i of the stage i, the sm.sub.i output pins of said switching element sm.sub.i×sm.sub.i are connected to sm.sub.i different interconnection sub-networks: the first output pin is connected to the first sub-network, the second output pin is connected to the second sub-network, and so on up to the sm.sub.i-th output pin. This applies to all the existing switching elements and all the existing sub-networks at that level.

(66) As shown in FIG. 8, the output pins of the 2×2 switching elements 113 of the last stage are directly connected to the output terminals of the interconnection network 110 with no numbering reorganization.

(67) On the contrary, the input pins of the 5×5 switching elements 112 of the first stage are connected to the input terminals of the interconnection network 110 in a reorganized, different order, which is indicated by the labeling order in FIG. 8.

(68) In order for the algorithm for defining the reorganization labeling order to be better understood and in order to define where the k-th input terminal of the interconnection network 110 is to be connected, the k-th output terminal is conveniently searched. Then it is possible to proceed across the network 110 from output to input, from the third stage (i.e., the 2×2 switching elements 113), up to the second stage (i.e., the 3×3 switching elements 111) until the first stage is reached (i.e., the 5×5 switching elements 112). In finding the route, the switching elements 113, 111 and 112 are considered as set in pass-through permutation (direct connection—i.e., no shift). So the numbers shown at the input terminals of the interconnection network 110 in FIG. 8 are easily determined, showing the reassignment order.

(69) The interconnection network 110 has to be able to carry out a cyclic shift; this allows the size of the sub-networks to be reduced from the first stage to the last one, as will be explained. In the example shown in FIG. 8, due to the Clos network partitioning and factorization rules, the first stage subdivides the original 30×30 network into 5 different 6×6 sub-networks, which include the elementary switching elements 111, 113 of the second and third (output) stages. In the third (output) stage, there are 15 different 2×2 switching elements 113; once an input value is routed, from the first stage to the second one and finally to the third one, towards a smaller sub-network, that value cannot be routed outside into another sub-network. For this reason, the interconnection network 110 is able (specialized) to realize cyclic shifts; this is not a limitation for FEC codecs but merely a complexity simplification.

(70) It is worth noting that, from a conceptual point of view, the interconnection network 110 represents only a non-limiting example of an interconnection network according to the present invention. In fact, although the interconnection network 110 of size thirty includes three stages, each associated with a respective submultiple of thirty (i.e., five, three and two, respectively), wherein the three stages are arranged, from the first stage to the last one, according to a decreasing order of the aforesaid submultiples of thirty, an interconnection network according to a more general embodiment of the present invention has a generic size N (with N non-prime positive integer decomposed/factorized into a certain number M of given submultiples, with M positive integer equal to, or higher than, two) and includes M stages that may be conveniently arranged, from the first stage to the last stage, according to a decreasing or increasing order (or, even more in general, according to any order) of said given submultiples of N. Said interconnection network according to said more general embodiment of the present invention can be considered as comprising sm.sub.i interconnection “main” sub-networks (with sm.sub.i denoting the submultiple, among said given submultiples of N, that is associated with the first stage of the interconnection network) for routing, each, N/sm.sub.i respective input values from N/sm.sub.i respective output pins of the switching elements of the first stage to N/sm.sub.i respective input pins of the switching elements of the last stage; wherein each “main” interconnection sub-network conveniently includes, for each pair of immediately successive stages i and i+1 after the first one, sm.sub.i respective hardwired interconnection “sub-subnetworks”, with sm.sub.i denoting the submultiple (among said given submultiples of N) that is associated with the stage i. 
For example, in case of an interconnection network of generic size N (again, with N decomposed/factorized into a certain number of given submultiples) with stages arranged, from the first stage to the last one, according to a decreasing order of the given submultiples of N (such as the interconnection network 110 in the non-limiting example of FIG. 8), there are sm.sub.max interconnection “main” sub-networks (with sm.sub.max denoting the greatest submultiple among said given submultiples of N, which greatest submultiple is associated with the first stage—e.g., five in the interconnection network 110) for routing, each, N/sm.sub.max respective input values from N/sm.sub.max respective output pins of the switching elements of the first stage (e.g., six respective output pins of the 5×5 switching elements 112 of the interconnection network 110) to N/sm.sub.max respective input pins of the switching elements of the last stage (e.g., six respective input pins of the 2×2 switching elements 113 of the interconnection network 110). Moreover, with specific reference to the non-limiting example of FIG. 8, for each of the five interconnection “main” sub-networks of the interconnection network 110, there are three respective hardwired interconnection “sub-subnetworks” between the respective 3×3 switching elements 111 of the second stage and the respective 2×2 switching elements 113 of the third stage.
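The stage structure just described for N=30 follows mechanically from the factorization; a short check of the switch counts S.sub.i=N/sm.sub.i and of the "main" sub-network subdivision:

```python
from math import prod

N = 30
factors = [5, 3, 2]                 # sm_1, sm_2, sm_3, in decreasing order
assert prod(factors) == N           # the factors must multiply back to N

# S_i = N / sm_i switching elements per stage (paragraph (70)).
switch_counts = [N // sm for sm in factors]
print(switch_counts)                # -> [6, 10, 15]

# The first-stage factor fixes the number and size of the "main"
# sub-networks: 5 sub-networks of size N/5 = 6 each.
main_subnets, subnet_size = factors[0], N // factors[0]
assert (main_subnets, subnet_size) == (5, 6)
```

This matches FIG. 8: six 5×5 elements, ten 3×3 elements, fifteen 2×2 elements, grouped into five 6×6 main sub-networks after the first stage.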

(71) As for the self-routing algorithm of the interconnection network 110, let us assume that a cyclic shift (i.e., the number of positions to be shifted) ρ is required at a clock cycle, which means that, given a bit vector received at the input terminals of the interconnection network 110, said interconnection network 110 has to provide, at the output terminals, the bit vector cyclically (or circularly) shifted by ρ.

(72) Hereinafter, rules for commanding the individual switching elements of the interconnection network 110 to achieve a specific cyclic shift ρ will be described.

(73) To this end, it is convenient to define an index j for identifying the switching elements 111, 112 and 113; the same index j can be used for any multistage network with generic number of stages and elementary switches; in particular, j can be obtained by means of the following expression (where, with reference to the example shown in FIG. 8, i=1 for the 5×5 switching elements 112 of the first stage, i=2 for the 3×3 switching elements 111 of the second stage, and i=3 for the 2×2 switching elements 113 of the third stage—in the general case, i denotes the different stages of one and the same interconnection network):

(74) j = [N*∏_{l=1}^{i}(1/sm.sub.l) − 1 − p] mod [N*∏_{l=1}^{i}(1/sm.sub.l)],
where p, for the first stage, is the label assigned to the first input to the switching elements 112 of the first sub-network; i.e., 0, 2, 4, 1, 3, 5 depending on the input block chosen.
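Reading the product N*∏(1/sm.sub.l) in formula (74) as an integer division of N, the first-stage index for the 30=5*3*2 network is j=(6−1−p) mod 6. A short check (the grouping of the original formula is our reading of a garbled rendering; the expected labels are those stated for FIG. 8):

```python
from math import prod

def block_index(N, sms, i, p):
    """Index j of formula (74): with m = N * prod(1/sm_l, l = 1..i),
    j = (m - 1 - p) mod m (our reading of the formula)."""
    m = N // prod(sms[:i])
    return (m - 1 - p) % m

# First stage of the 30 = 5*3*2 network: p takes 0, 2, 4, 1, 3, 5.
labels = [block_index(30, (5, 3, 2), 1, p) for p in (0, 2, 4, 1, 3, 5)]
print(labels)   # -> [5, 3, 1, 4, 2, 0], the FIG. 8 first-stage labels
```

The computed labels reproduce the order 5, 3, 1, 4, 2, 0 reported for the 5×5 switching elements, which supports this reading of the formula.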

(75) For the subsequent stages, which aggregate the switching elements 111 and 113 in sub-networks, it is sufficient to identify the index j for the second stage; the index j is then repeated for the various sub-networks, as shown in FIG. 8.

(76) The interconnection network 110 in FIG. 8 is connected in direct mode (no shift), in order to derive the indexes j. The same rule applies also for sub-networks at the subsequent stages.

(77) The configuration of a generic switching element 111,112,113 of the interconnection network 110 may be conveniently computed as:

(78) C(ρ,i,j) = {(ρ+j) mod [N*∏_{l=1}^{i−1}(1/sm.sub.l)]} div [N*∏_{l=1}^{i}(1/sm.sub.l)],  (8)
where j assumes the following values

(79) j ∈ [0; (N*∏_{l=1}^{i}(1/sm.sub.l)) − 1]
and the following convention applies

(80) ∏_{l=1}^{0}(1/sm.sub.l) = 1.
C(ρ,i,j) denotes the switching element configuration command; it depends on three independent variables, the number of shift positions ρ, the stage i and the label (index) j assigned to the switching element.

(81) For the second and third stages, j is obtained by the same formula in which p is obtained by following the network backwards from the first upper input of the switching element, considering the network connected in a pass-through mode, and reaching the first stage input label index. So, starting from the upper rightmost switching element 113, there results j=0. This confirms that various sub-networks will inherit the same switch control commands.

(82) In this connection, attention is drawn to the fact that the various indexes j (namely, (BLOCK) 5, 3, 1, 4, 2, 0) univocally identifying the 5×5 switching elements 112 within the first stage, and the two indexes j (namely, (BLOCK) 1,0) univocally identifying the two 3×3 switching elements 111 within each sub-stage 114 are shown in FIG. 8; instead, as just explained, all the 2×2 switching elements 113 can be identified by one and the same index j=0.

(83) Therefore, for the configuration of the switching elements 112 of the first stage, the equation (8) becomes:
[(ρ+j)mod 30]div 6.

(84) Three bits are sufficient for the configuration of the switching elements 112, since the connection of a 5×5 switch is specified by a number included in the range [0;4].

(85) Moreover, for the configuration of the switching elements 111 of the second stage, the equation (8) becomes:
[(ρ+j)mod 6]div 2.

(86) Two bits are sufficient for the configuration of the switching elements 111, since the connection of a 3×3 switch is specified by a number included in the range [0;2].

(87) Finally, for the configuration of the switching elements 113 of the third stage, the equation (8) becomes:
ρ mod 2.

(88) One bit is sufficient for the configuration of the switching elements 113.
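Equation (8) and its three specializations can be cross-checked numerically; in the sketch below, the products ∏ 1/sm.sub.l are realized as integer divisions of N (this is a reading of the formulas above, not an addition to them):

```python
from math import prod

def config(rho: int, i: int, j: int, N: int = 30, sms=(5, 3, 2)) -> int:
    """Switch configuration command C(rho, i, j) of equation (8)."""
    mod_base = N // prod(sms[:i - 1])   # N * prod_{l=1..i-1} 1/sm_l
    div_base = N // prod(sms[:i])       # N * prod_{l=1..i} 1/sm_l
    return ((rho + j) % mod_base) // div_base

# The general formula reduces to the three specialized forms of the text:
rho, j = 7, 3
assert config(rho, 1, j) == ((rho + j) % 30) // 6   # 5x5 switches, stage 1
assert config(rho, 2, j) == ((rho + j) % 6) // 2    # 3x3 switches, stage 2
assert config(rho, 3, 0) == rho % 2                 # 2x2 switches, j = 0
```

For the third stage the modulus becomes 30/(5*3)=2 and the divisor becomes 30/30=1, so equation (8) collapses to ρ mod 2 exactly as stated.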

(89) Again with reference to FIG. 8, it is worth noting that the input terminals on the left side of the interconnection network 110 are labeled with a number corresponding to the index of the input vector, included in the range [0:29]. The same applies for the output vector.

(90) Each 5×5 switching element 112 of the first stage can be identified by a number included in the range [0:5] (since there are six different 5×5 switching elements 112). This number denotes the block label or index j. This index j is useful because it simplifies the implementation of the routing algorithm; it is an input parameter of the routing algorithm. As previously said, the index j is represented in FIG. 8: from the top to the bottom, the labels are 5, 3, 1, 4, 2, 0.

(91) The same approach is used for the 3×3 switching elements 111 of the second stage. It has to be noted, though, that conceptually the last two stages can be interpreted as five separate and replicated 6×6 interconnection sub-networks; then, the same index j can be reused for all the replicated sub-networks. This result stems from the fact that the interconnection network 110 has to be able to perform a cyclic shift; moreover, the routing algorithm is performed with a recursive approach. Then, the routing complexity decreases in the subsequent stages. In fact, at the first stage, an input vector of size 30 is to be shifted. At the second stage, the complexity is only 30/5=6; namely, five different vectors (generated from the input one) of size 6 have to be shifted. These vectors correspond to the inputs of each 6×6 sub-network, downstream of the first stage. This justifies the fact that in the second stage only two indexes j are used. These two indexes are shown in FIG. 8 and are included in the range [0:1] (since there are five identical and independent 6×6 sub-stages 114). In practice, the corresponding switches of the replicated sub-networks, from the second stage to the last one, will be denoted with the same label.

(92) A preferred architecture for the 3×3 switching elements 111 is shown in FIG. 9. In particular, as shown in FIG. 9, the 3×3 switching elements 111 are preferably realized, each, by means of three multiplexers 115, each connected to all the three inputs I.sub.0, I.sub.1, I.sub.2 of the 3×3 switching element 111, so as to be operable to output one of said three inputs depending on the control signal(s) received. In detail, this structure allows the same selector to be used for all the multiplexers 115, which define the elementary switching elements 111.

(93) The same structure is preferably used also for the 5×5 switching elements 112 and the 2×2 switching elements 113; so, all the switches are implemented with the same hardware approach. This property derives from the particular recursive routing algorithm. This also implies that the overall number of multiplexers is roughly:
˜N log N,
where N denotes the size of the input and output vectors and, hence, of the interconnection network 110.

(94) In a general application not limited to shifts, the number of multiplexers is at least equal to:
˜2*N log N.

(95) FIG. 10 shows the interconnection network 110 with the same features as FIG. 8, but with different labels/indexes j assigned to the 3×3 switching elements 111 and the 5×5 switching elements 112.

(96) By applying the above routing algorithm (in particular, the equation (8)) to the 5×5 switching elements 112 as labelled in FIG. 10, there results:
Selector.sub.5×5=[(ρ+5−j)mod 30]div 6.

(97) The selector of the 5×5 switching elements 112 is obviously included in the range [0:4], since it has to be able to shift the 5-size input vector in a generic way. As a matter of fact, the integer division by six of a number included in the range [0:29] is a number higher than or equal to zero and lower than or equal to four.

(98) As for the 3×3 switching elements 111, there results:
Selector.sub.3×3=[(ρ+1−j)mod 6]div 2.

(99) The selector of the 3×3 switching elements 111 is included in the range [0:2]; it is able to shift the 3-size input vector in a generic way. As a matter of fact, the integer division by 2 of a number included in the range [0:5] (that is, the result of the mod-6 operation) is a number higher than or equal to zero and lower than or equal to two.

(100) Finally, the single bit required to select the output of the 2×2 switching elements 113 of the last stage is simply:
ρ mod 2.

(101) For example, if ρ=1 and Selector.sub.5×5 (j) denotes the selector of the j-th 5×5 switching element 112, then the following results are obtained:
Selector.sub.5×5(j)=0 if j≠0,
Selector.sub.5×5(j)=1 if j=0.

(102) Moreover, if ρ=1 and Selector.sub.3×3(j) denotes the selector of the j-th 3×3 switching element 111, then the following results are obtained:
Selector.sub.3×3(j)=1 if j=0,
Selector.sub.3×3(j)=0 if j=1.

(103) Finally, all the 2×2 switching elements 113 have the same selection, on the basis of the network construction and subdivision; so, for ρ=1, the Selector.sub.2×2 is:
ρ mod 2 = 1 mod 2 = 1.
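The selector values reported in paragraphs (101)-(103) for ρ=1 can be verified directly from the formulas of paragraphs (96), (98) and (100):

```python
rho = 1

# Selector of the j-th 5x5 switching element (FIG. 10 labeling).
sel_5x5 = {j: ((rho + 5 - j) % 30) // 6 for j in range(6)}
assert sel_5x5[0] == 1 and all(sel_5x5[j] == 0 for j in range(1, 6))

# Selector of the j-th 3x3 switching element.
sel_3x3 = {j: ((rho + 1 - j) % 6) // 2 for j in range(2)}
assert sel_3x3 == {0: 1, 1: 0}

# Single selection bit shared by all 2x2 switching elements.
assert rho % 2 == 1
```

All three assertions hold, matching the values listed in paragraphs (101), (102) and (103).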

(104) The resulting connection is shown in FIG. 11, where inside each block representing a switching element 111,112,113 a corresponding “local” cyclic (or circular) shift ρ*is shown.

(105) The same reasoning can be applied for a generic overall cyclic shift ρ having a value included in the range [0;29] (or, more in general, [0;N−1]). In this connection, FIGS. 12-17 show examples for ρ=2, ρ=3, ρ=4, ρ=19, ρ=23 and ρ=29.

(106) Hereinafter, an example of application of the interconnection network according to the present invention to LDPC-based DVB-S2 will also be described. In particular, in the following, the hardware implementation of the co-decoder sections for the DVB-S2 code (in this connection, reference can be made to Ref12), initially developed for the CCSDS standard (Ref4, Ref10, Ref11), will be introduced. The first idea for DVB-S2 code implementation, which provides a good speed improvement via multiple SISO processors, is based on the use of memory banks and shift registers.

(107) The DVB-S2 code—as far as the LDPC is concerned—has evidently been designed over underlying hardware architectural solutions (mostly based on shift registers built on FIFOs or memories) allowing for computational parallelism and minimized hardware requirements. The data exchange between the variable and check nodes is based on rules very similar to the interleaver laws underlying the CCSDS standard; this makes it possible to design the same hardware architecture (hence, the same interconnection network and the same routing algorithm) for routing the extrinsic information from variable to check nodes in the LDPC, or from the inner codec to the outer one in the PCCC or SCCC codes.

(108) According to DVB-S2 LDPC encoding (Ref10), the algorithm to obtain m parity bits for every k information bits, so as to construct the k+m=n-bit codeword, is the following: initialize the parity bits to zero—p.sub.0=p.sub.1= . . . =p.sub.m-1=0; the first information bit constitutes one of the XOR operands for the parity check equations specified in the first row of the code generating tables; e.g., for code rate 2/3, the standard document indicates, exploiting the associative property, the first double-operand XORs:

(109) TABLE-US-00002
p.sub.0 = p.sub.0 ⊕ i.sub.0
p.sub.2767 = p.sub.2767 ⊕ i.sub.0
p.sub.10491 = p.sub.10491 ⊕ i.sub.0
p.sub.240 = p.sub.240 ⊕ i.sub.0
p.sub.16043 = p.sub.16043 ⊕ i.sub.0
p.sub.18673 = p.sub.18673 ⊕ i.sub.0
p.sub.506 = p.sub.506 ⊕ i.sub.0
p.sub.9279 = p.sub.9279 ⊕ i.sub.0
p.sub.12826 = p.sub.12826 ⊕ i.sub.0
p.sub.10579 = p.sub.10579 ⊕ i.sub.0
p.sub.8065 = p.sub.8065 ⊕ i.sub.0
p.sub.20928 = p.sub.20928 ⊕ i.sub.0
p.sub.8226 = p.sub.8226 ⊕ i.sub.0
The following 359 information bits constitute XOR operands to the parity check equations provided by the following equation:
{x+i*q}mod(n.sub.LDPC−k.sub.LDPC),
where x is the index of the parity check equation relevant to the first bit, and i is the index of the information bit itself, q is a characteristic parameter that varies with code rate (r), and is equal to
q=180*(1−r); the parity check equations to which the 361.sup.st information bit—indexed as i.sub.360 since the first is i.sub.0—will contribute, are given by the second table row; the following 359 information bits constitute XOR operands to the parity check equations provided by the following general equation:
{x+(i mod 360)*q}mod(n.sub.LDPC−k.sub.LDPC)
where x indicates the parity check equation to which the bit indexed as i.sub.360 contributes; after every information bit has been processed as indicated above, the final parity bits are obtained by the following modification of the obtained parity check vector (a property stemming from the double diagonal matrix contained in the parity check matrix underlying the presented algorithm):
p.sub.i=p.sub.i⊕p.sub.i-1,i=1,2, . . . ,n.sub.ldpc−k.sub.ldpc−1; the content of p.sub.i, with i=0, . . . , m−1, is now equal to the non-systematic part of the codeword.
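The accumulation-plus-back-substitution procedure above can be sketched at toy size. The address table, block length and group size below are fabricated for illustration only; the real standard uses 360-bit groups and the code generating tables of the standard document.

```python
def ldpc_encode(info_bits, table, m, q, group):
    """DVB-S2-style parity computation: XOR accumulation followed by the
    double-diagonal back-substitution p_i = p_i XOR p_(i-1)."""
    p = [0] * m                                  # parity accumulators, all zero
    for i, bit in enumerate(info_bits):
        for x in table[i // group]:              # one table row per bit group
            p[(x + (i % group) * q) % m] ^= bit  # accumulate the XOR operand
    for i in range(1, m):                        # final differential step
        p[i] ^= p[i - 1]
    return p

# Toy example: m = 4 parity bits, q = 1, groups of 2 information bits.
print(ldpc_encode([1, 0, 1, 1], table=[[0, 2], [1]], m=4, q=1, group=2))
# -> [1, 0, 0, 0]
```

With real DVB-S2 parameters one would set group=360, m=n.sub.LDPC−k.sub.LDPC, and take the table rows from the standard; the structure of the loop is unchanged.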

(110) As is now clear, it is convenient to partition the k information bits into NS=(180−q) blocks of 360 bits each. The code generating table of the standard document correspondingly has, as a matter of fact, NS rows. Additionally, it is convenient to arrange the parity check equations in blocks of q equations.

(111) The partitioning suggested by the presented algorithm, together with the shift register mode of operation clearly underlying the algorithm, can be used to totally avoid multiplexers in the hardware implementation, by using the proposed interconnection network. The routing of the Tanner graph edges (also referred to as interleaving) can be implemented by cyclic shift registers and, above all, the shift registers can be implemented in the memory blocks of the FPGA, for which a 1 ns access time is typical.

(112) Following the above reasoning, the schematic presented in FIG. 18 describes the LDPC encoding section hardware architecture, endowed with q parallel XOR processors; every processor sequentially takes charge of the calculation of 360 parity check equations. The degree of parallelism could also be a multiple of q. For the sake of description simplicity, no details of the internal connections of the horizontal bank of shift registers and of their organization will be provided, since they can be easily obtained by those skilled in the art.

(113) Loading of the upper shift register with incoming data—it is a ping-pong shift register: one copy receives the data bits while the other feeds them to the memory-based shift register block.

(114) Shift Register (SR) block loading—the upper shift register is operated, sequentially feeding the 360.sup.th, 359.sup.th, . . . , down to the 1.sup.st bit to the first line going down to the SR block. In parallel, NS other lines perform a similar operation on the other data blocks. The lines are connected to the upper SR 201 at the flip-flops indexed with h*359, with h ranging from 1 to NS.

(115) SR block unloading—the parity check equations are calculated q at a time.

(116) In the above procedure, the content of the bank of SRs after loading is in direct relation with the part H.sup.d of the coding matrix, i.e., the systematic part of it.

(117) To obtain the parity part of the codeword exploiting the double diagonal form of the coding matrix, the simple structure connected to the bottom of the q XOR operators is used. It implements the operation:
p.sub.i=p.sub.i⊕p.sub.i-1,i=1,2, . . . ,n.sub.ldpc−k.sub.ldpc−1.

(118) When implementing the decoder architecture, the same solutions used in the encoder architecture can be used. In this connection, FIGS. 19 and 20 show traditional DVB-S2 LDPC decoder architecture; in particular, FIGS. 19 and 20 show flows during first half of an iteration, respectively, from variable node to check node and vice versa.

(119) In FIG. 19 the uppermost SRs (once again duplicated, since one is being written with a frame of information from the demodulator while the other is being processed) will contain the LLR bits.

(120) The extrinsic information memory—constituting the means for information exchange between the variable and check nodes—will be constituted by the bank of memory-based SRs; given a 7-bit representation of the LLRs, 7 identical parallel structures will be required to store the 7-bit information, which is wider than the single bit used in the encoder.

(121) The n LLRs will be partitioned into 180 banks of 360 log-likelihood ratios (LLRs) each (180*360=64800). The LLRs of the information bits will be contained in the first NS banks; the remaining q banks (q*360 LLRs) will contain those of the redundancy bits (or parity check bits).

(122) As in the encoding process, a parallel routing of NS LLRs to the bank of memory-based SRs will be executed; thus, with 360 steps, by shifting the upper SR each time, all of the LLRs are loaded in the bank of SRs in the middle of FIG. 19.

(123) The loading of the redundancy bits follows a similar process, not described in detail but easily obtainable from the properties of the coding matrix.

(124) When the extrinsic information memory SRs are fully loaded, the information can be processed by the q parallel check node processors. The g(*) operator provides the messages that must be re-routed to the variable nodes:
m.sup.l(c.sub.k,v.sub.j)=g(m.sup.l(v.sub.1,c.sub.k),m.sup.l(v.sub.2,c.sub.k), . . . ,m.sup.l(v.sub.i,c.sub.k), . . . ,m.sup.l(v.sub.n,c.sub.k));
a check node is connected to d.sub.c variable nodes, and for each of them the above equation must be applied. To that purpose, messages will be updated using a specific operator.

(125) After 359 shifts of the registers, all of the messages have been processed and are ready to get to the variable node processors. The updated messages will be calculated as follows:

(126) m^{l+1}(v.sub.i,c.sub.k) = m.sub.i + ∑_{j=1, j≠k}^{d.sub.vi} m^{l}(c.sub.j,v.sub.i); i = 1, 2, . . . , n.

(127) In the variable nodes the only operation to be executed is the sum.
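Formula (126) reduces, at each variable node, to a plain sum of the channel LLR m.sub.i and all incoming check-node messages except the one coming from the destination check node c.sub.k. A minimal sketch (function and variable names are illustrative):

```python
def variable_update(channel_llr: float, check_msgs: list[float], k: int) -> float:
    """m^{l+1}(v_i, c_k): channel LLR plus all incoming check-node
    messages except the one arriving from c_k itself (formula (126))."""
    return channel_llr + sum(m for j, m in enumerate(check_msgs) if j != k)

# Variable node of degree 3, channel LLR 0.5, destination check node k = 1:
msgs = [1.0, -2.0, 0.25]
print(variable_update(0.5, msgs, k=1))   # 0.5 + 1.0 + 0.25 = 1.75
```

Excluding the k-th message is what keeps the message extrinsic: the value sent back to c.sub.k carries no information that c.sub.k itself provided.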

(128) With the above defined architecture, variable to check node message computation and passing requires:
360*T.sub.pg+L
clock cycles, due to the necessary complete shifts of the SRs, which have a size equal to 360. The parameter T.sub.pg is the required elaboration time of the processors, whereas L is the latency time.

(129) Analogously, the check nodes to variable nodes message passing requires a time:
360*T.sub.sum+L,
where T.sub.sum is the time required to execute a sum and L is a latency time. The parameter L of the check nodes to variable nodes message passing is not necessarily identical to the homonymous parameter for variable to check nodes message passing.

(130) The interconnection network according to the present invention is able to perform the 360 shifts of the several SRs shown in FIG. 20 in just one clock cycle, apart from the initial latency at the start of a codeword processing.

(131) Then, the hardware architecture based on the present invention is very similar to the traditional (i.e., standard) one, except for the part in charge of the routing of the data from the variable nodes to the check nodes and vice versa (i.e., the interconnection network according to the present invention).

(132) The block diagram of the proposed solution is shown in FIG. 21, where the shift registers are replaced by an interconnection network 200 according to the present invention (i.e., implemented according to the above teachings of the present invention); this architecture has two dramatic advantages compared to the previous one: reduced processing time, since the interconnection network 200 is able to perform a generic number ρ of shifts in one clock cycle, whereas the previous architecture needs ρ clock cycles; and reduced hardware requirements, since the interconnection sub-networks of the interconnection network 200 intrinsically simplify the routing problem, by reducing the hardware constraints in terms of LEs and number of physical connections.

(133) The parallelism degree is equal to 360 independently of the coding rate, whereas the processing time required to perform the encoding of one codeword is equal to the characteristic parameter q of the chosen coding rate.

(134) From the foregoing description, the technical advantages of the present invention are immediately clear.

(135) In particular, it is important to point out that the present invention provides an extremely simple routing algorithm, which requires low hardware resources and, above all, does not limit the maximum data rate because its computation can be carried out in real time.

(136) In addition, the interconnection network according to the present invention reduces the hardware requirements compared to the Benes network, as well as the Clos ones.

(137) In this connection, the following Table 2 shows a comparison between the interconnection network according to the present invention and an implementation based on the Benes network. The interconnection network according to the present invention reduces the hardware requirements in terms of logic elements (LEs) by at least 40-50%.

(138) TABLE-US-00003
TABLE 2

          Network according
          to the invention   Benes network
  Inputs  LEs per bit        Size   LEs per bit   Occupation Ratio
  12      36                 16     112           32.14%
  16      48                 16     112           42.86%
  24      96                 32     288           33.33%
  32      160                32     288           55.56%
  36      144                64     704           20.45%
  48      240                64     704           34.09%
  60      360                64     704           51.14%
  64      384                64     704           54.55%
  72      360                128    1664          21.63%
  84      504                128    1664          30.29%
  96      576                128    1664          34.62%
  108     756                128    1664          45.43%
  120     840                128    1664          50.48%
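The "Occupation Ratio" column of Table 2 is simply the quotient of the two "LEs per bit" columns; a quick numerical check on a few rows:

```python
# (inputs, LEs per bit of the proposed network, LEs per bit of the Benes
# network) for a subset of the Table 2 rows.
rows = [(12, 36, 112), (16, 48, 112), (24, 96, 288), (32, 160, 288),
        (36, 144, 704), (60, 360, 704), (120, 840, 1664)]

for inputs, proposed, benes in rows:
    ratio = 100 * proposed / benes      # occupation ratio in percent
    print(f"{inputs:4d} inputs: {ratio:6.2f}%")

# e.g. for 12 inputs: 36/112 = 32.14%, matching the table.
```

Every recomputed value agrees with the corresponding table entry to the two decimal places shown.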

(139) As previously explained, if one attempts to implement a belief propagation interconnection network of a parallel decoder with a general Clos network, complex issues arise, both in terms of hardware complexity of the switch fabric, and, even more critically, in routing commands generation, since it is quite complex to determine a routing algorithm.

(140) Instead, the present invention makes it possible to implement a much lighter and much simpler interconnection network provided with non-blocking properties (in fact, an interconnection network according to the present invention is non-blocking for the routing rule imposed by the code interleaver mathematical description).

(141) Moreover, again as previously explained, banyan networks (even though they represent a light implementation solution among generally blocking networks) are rigidly based on a power-of-two factorization and, hence, do not allow a general factorization. Also the barrel shifter solutions (although they represent a very efficient kind of network), used in Arithmetic Logic Units (ALUs) of computers, are based on a power-of-two factorization and generate logarithmic complexity structures that do not fit the general case.

(142) Instead, the present invention introduces a fully general network whose factorization is generally realizable, has minimum hardware implementation requirements with respect to the other existing solutions and, in addition, introduces a hard wiring that fully substitutes one of the network stages of a Clos network and allows a very efficient self-routing algorithm. The hard wiring is made possible by fully exploiting the typical interleaver laws of CCSDS or DVB standards (Ref10, Ref11, Ref12).

(143) Hence, for the most general size factorization, the present invention provides minimum hardware complexity along with a simple self-routing algorithm.

(144) In particular, banyan networks and barrel shifter solutions, not endowed with general flexibility on the number of inputs, do not allow any possibility of achieving the desired interleaver law and incur critical blocking. The present invention, instead, can match a general size interleaver, with the only limitation related to a non-prime number of inputs or, which is the same, a non-prime degree of interleaver parallelism.

(145) The present invention guarantees the exploitation of the interleaver algorithm properties in the best way to reduce to a minimum the hardware complexity of the switch fabric by completely eliminating one of the stages of a Clos network and completely cancelling some of the routing algorithm computations required for a Clos network. In particular, the routing algorithm section that is cancelled is actually the most critical one, while the computations still to be executed are simply obtainable by self-routing procedures.

(146) In addition, the present invention provides for a scalable design that is highly desirable to obtain the best trade-off among hardware parallelism, resource requirements and decoder throughput on the basis of the specific target digital technology, which typically varies with the improvement of the digital technology.

(147) In particular, the interconnection network according to the present invention is able to implement self-routing in real-time. With this approach, the major constraint to the data rate is the maximum operating frequency of the target technology, since the interconnection network according to the present invention requires very limited hardware resources (i.e., logic elements) and no specific routing algorithms computation.

(148) As previously described, the invention is mainly aimed at the implementation of a core element of an iterative FEC encoder/decoder, i.e., the interconnection network. This network is necessary in order to exchange data and simultaneously realize the required data interleaving function during the iterative elaboration process. More in detail, the interconnection network according to the present invention has N inputs and N outputs and is able to provide an output vector of size N that is a circularly shifted replica of the input vector of size N. The number of positions to be shifted can be any value in the range [0;N−1]. The interconnection network according to the present invention allows very high throughput, performing any generic shift in only one clock cycle; at the same time, it minimizes the required hardware resources considering both the interconnection network architecture and the relevant control logic, the latter being necessary to implement the simultaneous setting of all the internal elements for the required shift.

(149) Thence, the interconnection network according to the present invention represents the most efficient solution as for hardware and the simplest as for routing, is suitable for all FEC technologies, and does not impose restrictions on the number of inputs or parallelism degree, thereby being applicable to all high speed coding solutions.

(150) The present invention can be implemented in all the existing (and future) very-large-scale integration (VLSI) technologies, in particular by means of devices based on ASIC and FPGA technologies.

(151) The present invention can be advantageously exploited in all the applications and devices in which high throughput FEC encoders/decoders are required, such as: high data rate modems (for satellite and ground applications); optical communications (both free space and fibre based); digital storage devices for high speed data retrieval and recording.

(152) As for space applications, Earth observation missions may greatly take advantage of the present invention, because the large volumes of generated observation data and the reduced visibility time with the ground stations (since LEO orbits are typically used) require the use of extremely high data rates (for example, from 300/500 Mbps up to 1 Gbps, or even more for future missions). Therefore, the present invention could be a crucial element for the feasibility of Earth observation high data rate communication systems.

(153) In conclusion, it is clear that numerous modifications and variants can be made to the present invention, all falling within the scope of the invention, as defined in the appended claims.