Fast FIR filtering technique for multirate filters
10003324 ยท 2018-06-19
Assignee
Inventors
Cpc classification
International classification
Abstract
Data samples are filtered by using a digital filter where the length of an impulse response of the digital filter is finite, an impulse response of the digital filter is symmetric and the operation of the digital filter is multi-rate. The method uses a polyphase decomposition to break down the input data stream into N parallel substreams and the multi-rate digital filter is separated by a polyphase decomposition into multiple lower-rate sub-filters where each of the sub-filters is separated into a set of simpler sub-sub-filters which operate upon the same set of input samples and which have impulse responses which are jointly centro-symmetric, a set of pre-filtering arithmetic structures, and a set of post-filtering arithmetic structures and performing each such pair of sub-sub-filtering operations using a single shared filter structure, a set of pre-filtering combining adders, and a set of post-filtering separating adders.
Claims
1. A method for filtering data samples comprising: using a digital filter to act upon a input data stream in order to transform it into an output data stream, wherein: the length of an impulse response of the digital filter is finite; an impulse response of the digital filter is symmetric; the operation of the digital filter is multi-rate; using a polyphase decomposition to break down the input data stream into N parallel substreams; the multi-rate digital filter being separated by a polyphase decomposition into multiple lower-rate sub-filters; each of the sub-filters being separated one or more times into a set of simpler sub-sub-filters, a set of pre-filtering arithmetic structures, and a set of post-filtering arithmetic structures, according to a fast FIR filtering decomposition algorithm; the set of sub-sub-filters comprising one or more pairs of sub-sub-filters which operate upon the same set of input samples and which have impulse responses which are jointly centro-symmetric; and performing each such pair of sub-sub-filtering operations using a single shared filter structure, a set of pre-filtering combining adders, and a set of post-filtering separating adders.
2. The method according to claim 1, wherein the multi-rate digital filter is a digital interpolator and wherein a single pre-filtering arithmetic structure is used to generate the set of input data samples required by multiple sets of sub-sub-filters.
3. The method according to claim 2, wherein the order of the pre-filtering and post-filtering arithmetic structures is interchanged through a circuit transposition operation.
4. The method according to claim 1, wherein the filter is a digital decimator and a single post-filtering arithmetic structure is used to process the output samples from Multiple sets of sub-sub-filters.
5. The method according to claim 4, wherein the order of the pre-filtering and post-filtering arithmetic structures is interchanged through a circuit transposition operation.
6. The method according to claim 1, wherein the fast FIR decomposition algorithm is a 22 decomposition as described in equation 6.
7. The method according to claim 1, wherein the fast FIR decomposition algorithm is a 33 decomposition as described in equation 8.
8. A method for filtering data samples comprising: using a digital filter to act upon a input data stream in order to transform it into an output data stream, wherein: the length of an impulse response of the digital filter is finite; an impulse response of the digital filter is symmetric; the operation of the digital filter is multi-rate; using a polyphase decomposition to break down the input data stream into N parallel substreams; the multi-rate digital filter being separated by a polyphase decomposition into multiple lower-rate sub-filters; each of the sub-filters being separated one or more times into a set of simpler sub-sub-filters, a set of pre-filtering arithmetic structures, and a set of post-filtering arithmetic structures, according to a fast FIR filtering decomposition algorithm; wherein the multi-rate digital filter is a digital interpolator and wherein a single pre-filtering arithmetic structure is used to generate the set of input data samples required by multiple sets of sub-sub-filters.
9. The method according to claim 8, wherein the order of the pre-filtering and post-filtering arithmetic structures is interchanged through a circuit transposition operation.
10. The method according to claim 8, wherein the fast FIR decomposition algorithm is a 22 decomposition as described in equation 6.
11. The method according to claim 8, wherein the fast FIR decomposition algorithm is a 33 decomposition as described in equation 8.
12. A method for filtering data samples comprising: using a digital filter to act upon a input data stream in order to transform it into an output data stream, wherein: the length of an impulse response of the digital filter is finite; an impulse response of the digital filter is symmetric; the operation of the digital filter is multi-rate; using a polyphase decomposition to break down the input data stream into N parallel substreams; the multi-rate digital filter being separated by a polyphase decomposition into multiple lower-rate sub-filters; each of the sub-filters being separated one or more times into a set of simpler sub-sub-filters, a set of pre-filtering arithmetic structures, and a set of post-filtering arithmetic structures, according to a fast FIR filtering decomposition algorithm; wherein the filter is a digital decimator and a single post-filtering arithmetic structure is used to process the output samples from multiple sets of sub-sub-filters.
13. The method according to claim 12, wherein the order of the pre-filtering and post-filtering arithmetic structures is interchanged through a circuit transposition operation.
14. The method according to claim 12, wherein the fast FIR decomposition algorithm is a 22 decomposition as described in equation 6.
15. The method according to claim 12, wherein the fast FIR decomposition algorithm is a 33 decomposition as described in equation 8.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION
(15) Sample rate converters are a special class of multirate FIR filters. Two main types of sample rate converters exist: interpolators and decimators. Since a decimator can easily be obtained by transposing the signal flow graph of an interpolator [14], the discussion which follows concentrates on interpolators.
(16) The polyphase decompostion of a single-input 1-to-M interpolator [15] is expressed in the z-domain as shown below.
(17)
where
(18)
for N=KM. If the interpolator receives more than one input sample on every clock cycle, a specialized filter structure is required for the implementation. This paper proposes an FFA-based architecture that reduces the multiplier and adder requirements for symmetrical FIR sample rate converters. In the case of an L-parallel 1-to-M FFA based interpolator, the structure is developed as follows:
Step 1: Apply the polyphase decomposition technique described by equation (13) to the FIR interpolator. This decomposition results in M polyphase subfilters.
Step 2: Apply an L-by-L FFA decomposition to each polyphase subfilter, thereby breaking it down into multiple FFA subfilters (also referred to herein as sub-sub-filters).
Step 3: Since each polyphase subfilter receives the same L-parallel input data, only one copy of the FFA pre-add section is required, as shown in
(19) The first optimization, which may be used to reduce the total number of multipliers required to implement the filter, will be introduced through an example.
(20) Consider the construction of a 2-parallel version of a 1-to-2 interpolator using the procedure described above. In this example, the symmetrical interpolation filter is 16 taps long, with the coefficients defined as h.sub.0, h.sub.1, . . . , h.sub.7, h.sub.7, h.sub.6, . . . , h.sub.0. Performing a polyphase decomposition will result in two polyphase subfilters, H.sub.0 and H.sub.1, which will contain the odd and even coefficients of the original prototype filter, respectively. Since two input samples are received every clock cycle, the 2-by-2 T&C FFA is applied to decompose each polyphase subfilter H.sub.i, into three 4-tap FFA subfilters (H.sub.i,j), where i represents the polyphase subfilter numbering and j represents the FFA subfilter numbering.
(21)
(22)
where
(23)
c.sub.i=(H.sub.0,0[i]+H.sub.1,0[i])/2, d.sub.i=(H.sub.0,0[i]H.sub.1,0[i])/2 and I.sub.2 and J.sub.2 represent the 2-by-2 identity and counter identity matrices, respectively. Notice that the original filtering operations have been broken down into a set of pre-filtering combining adders (represented by matrix D), a simpler filtering operation, and a set of post-filtering separating adders (represented by matrix A.
(24) Similarly, the coefficients used by FFA subfilter H.sub.1,1 are the reversed and negated versions of those used by H.sub.0,1. This subfilter pair may also be computed using equation (15) with the following minor substitution:
(25)
Note that the subfilters H.sub.0,2 and H.sub.1,2 are not symmetrical, so they must be implemented using one multiplier per coefficient.
(26) Next, consider the case of a 2-parallel 1-to-3 FFA-based interpolator, as shown in
(27) In general, for any L-by-L FFA based symmetrical interpolator or decimator, centro-symmetric FFA subfilter pairs exist between the T&C FFAs of the polyphase subfilters H.sub.i and H.sub.M+1i, where i=1, 2, . . . , M/21. If M is odd, the polyphase subfilter H.sub.(M1)/2+1 will generate some symmetrical FFA subfilters. By exploiting both forms of symmetry, it is possible to reduce the multiplier cost of multirate FFA implementations. Note that a derivation of the multiplier costs of various composite FFAs is provided hereinafter.
(28) It is well-known [14] that if the signal flow graph of a single-rate linear system is transposed, the functionality of the system is unchanged. This process can be applied to any FFA decomposition in order to generate an alternate structure. For example, consider
(29) In general, the effect of transposition on an FFA is to transfer some of the adders from the post adder structure into the pre adder structure. Table 1 summarizes the pre and post adder requirements for a variety of FFA structures. Note that the FFA subfilters are unaffected by transposition.
(30) It is apparent from Table 1 that transposition does not change the logic resources required to implement a single rate FFA, as the increase in pre adder resources is exactly balanced by a decrease in post adder resources. However, this is not the case for multirate FFAs. To illustrate this fact, consider the case of a 1-to-2 interpolator with 6 parallel inputs, shown in
(31) In general, 1-to-M interpolators require M copies of the post adder structure and only 1 copy of the pre adder structure. As a result, the most economical implementation for an interpolator is achieved through the transposed architecture, which transfers some of the complexity from the post adder structure to the pre adder structure. Conversely, the non-transposed architectures result in more economical implementations for decimators, due to the fact that M-to-1 decimators require M copies of the pre adder structure and only 1 copy of the post adder structure, as indicated in
(32) When transposition is performed, the updated pre and post adder matrices may be found through the following procedure:
(33) 1) Set post adder matrix to transpose of original pre adder matrix
(34) 2) Set pre adder matrix to transpose of original post adder matrix
(35) 3) Reverse order of rows in post adder matrix
(36) 4) Reverse order of columns in pre adder matrix
(37) An L-parallel 1-to-M FFA interpolator may be represented mathematically as follows:
(38)
(39) where M is the upsmapling factor, Q.sub.c is the post-adder matrix of an L-by-L FFA, Y.sub.p,i is a permuted version of the output of an L-by-L FFA performed on the i.sup.th polyphase subfilter, H.sub.c,i is an L-by-L FFA subfilter matrix derived from the i.sup.th polyphase subfilter, P.sub.c is the pre-adder matrix of an L-by-L FFA and X.sub.p is the permuted version of the input of the L-by-L FFA or FFA interpolator.
(40) The derivation of each of these quantities is described in detail in the following sections.
(41) 1) Derive Pre-Add Matrix:
(42) As shown in equation 10, the pre-adder matrix of a composite L-by-L standard FFA is constructed by forming the tensor product of the pre-adder matrices of each of the elementary (2-by-2 or 3-by-3) FFAs. However, the expression for the pre-adder matrix of an L-by-L T&C FFA is more complex.
(43) To illustrate, consider a 6-by-6 FFA implemented by cascading 2-by-2 and 3-by-3 FFAs. The pre-adder matrix for a 6-by-6 was developed hereinbefore, and is shown in equation 19. In the case of a symmetrical filter, the 6-by-6 T&C FFA is developed as follows.
(44) The first cascading stage of 6-by-6 T&C FFA involves a 2-by-2 T&C FFA, which breaks down the original filter into two symmetrical and one non-symmetrical FFA subfilters. The second stage of the decomposition involves using a 3-by-3 FFA to further decompose each of the subfilters from the first stage. When decomposing the symmetrical subfilters, the T&C FFA is used. However, since one of the subfilters is not symmetrical, it will not yield any symmetrical subfilters when decomposed. Therefore, it is preferable to use the less expensive standard FFA when decomposing the non-symmetrical filter.
(45) In order to derive an expression for the 6-by-6 T&C FFA, it is useful to note that the first two rows of the pre-adder matrix p.sub.c,2 generate inputs for the symmetrical subfilters, whereas the third row generates an input for the non-symmetrical subfilter. Since these two sets of subfilters are to be decomposed in different ways, a technique for mathematically separating the rows of p.sub.c,2 is needed. This paper introduces two new matrices for this purpose: S.sub.c,2=diag[1 1 0] extracts the rows which correspond to symmetrical subfilters, and S.sub.s,2=diag[0 0 1] extracts the rows which correspond to non-symmetrical subfilters. Similarly, two other matrices S.sub.c,3=diag[0 0 1 1 1 1] and S.sub.s,3=[1 1 0 0 0 0] are introduced to extract the rows from p.sub.c,3 related to symmetrical and non-symmetrical 3-by-3 T&C FFA subfilters, respectively. The 6-by-6 T&C pre-adder matrix is generated by applying the T&C 3-by-3 FFA to the symmetrical portion of p.sub.c,2 and the standard 3-by-3 FFA to the non-symmetrical portion of p.sub.c,2, as shown in equation 18.
P.sub.c,6=[S.sub.s,2p.sub.c,2]p.sub.c,3+[S.sub.n,2p.sub.c,2]
p.sub.s,3(18)
P.sub.s,6=p.sub.c,2p.sub.s,3(19)
(46) Extending the example one step further, a 12-by-12 FFA may be viewed as a cascade of 2-by-2 and 6-by-6 FFAs as shown in
P.sub.c,12=[S.sub.s,2p.sub.c,2]P.sub.c,6+[S.sub.n,2p.sub.c,2]
P.sub.s,6(20)
(47) In general, the derivation of the pre-adder matrix for a composite T&C FFA involves an iterative process that starts from the final cascading stage, W1. The first step of the iterative process defines the T&C pre-adder matrix P.sub.c,l.sub.
P.sub.c,l.sub.
For the subsequent iterations, j=W2, W3, . . . , 0, the pre-adder matrices of both b-by-b T&C and standard FFAs, where b=.sub.i=j.sup.W1l.sub.i are computed as shown in equations 22 and 23, respectively.
P.sub.c,b=[S.sub.c,l.sub.P.sub.c,b+[(S.sub.n,l.sub.
P.sub.s,b,(22)
P.sub.s,b=P.sub.s,l.sub.p.sub.s,l.sub.
. . . p.sub.s,l.sub.
p.sub.s,l.sub.
where b=b/l.sub.j. The final pre-adder matrix, P.sub.c, is equal to P.sub.c,L.
(48) 2) Derive Subfilter Matrix:
(49) The subfilter matrix H.sub.c,m, for m=0, 1, . . . , M1, which represents the FFA subfilters of m.sup.th polyphase subfilter H, is given as
H.sub.c,m=G.sub.cP.sub.cH.sub.m,(24)
where G.sub.c is the gain of the FFA.
(50) The gain matrix G.sub.c is derived through the same iterative process used to derive the pre-adder matrix, explained in 1. The first iterative step, j=W1, starts with gain matrices of both T&C and standard FFAs from the final cascading stage, as given below.
G.sub.c,l.sub.
For the subsequent iterations, j=W2, W3, . . . , 0 the gain matrices of both b-by-b standard and T&C sections are determined using equations 26 and 27.
G.sub.s,b=g.sub.s,l.sub.g.sub.s,l.sub.
. . . g.sub.s,l.sub.
g.sub.s,l.sub.
G.sub.c,b=[S.sub.c,l.sub.G.sub.c,b+[(S.sub.n,l.sub.
G.sub.s,b,(27)
The final gain matrix, G.sub.c, is equal to G.sub.c,L.
(51) 3) Derive Post-Add Matrix:
(52) The procedure for computing the post-adder for a composite standard FFA was previously provided in equation 12. Applying this procedure to a 6-by-6 standard FFA yields the result shown in equation 28 below.
Q.sub.s,6=B.sub.s,1.sup.(6)B.sub.s,0.sup.(6), where(28)
B.sub.s,0.sup.(6)=I.sub.3q.sub.s,3(29)
B.sub.s,1.sup.(6)=Q.sub.s,2.sup.(6),(30)
where Q.sub.s,2.sup.6 is generated by performing a 3-by-3 delay unfolding transformation on matrix q.sub.s,2. Note that in this section, the superscript .sup.(x) is used to indicate that the matrix in question belongs to an x-by-x FFA, and does not imply any mathematical operations. Thus, the matrix B.sub.s,0.sup.(6) is identical to the matrix B.sub.s,0. This superscript notation is used only to clarify the examples discussed in this section, and is not used for generalized L-by-L FFA derivation which follows.
(53) The derivation of 6-by-6 T&C post-add matrix is more complex because both the T&C and standard post-add matrices are involved. As shown in
Q.sub.c,6=B.sub.c,1.sup.(6)B.sub.c,0.sup.(6), where(31)
B.sub.c,0.sup.(6)=(S.sub.c,2q.sub.c,3)+(S.sub.s,2
q.sub.s,3)(32)
B.sub.c,1.sup.(6)=Q.sub.c,2.sup.(6),(33)
where Q.sub.c,2.sup.(6) is generated by performing a 3-by-3 delay unfolding transformation on q.sub.c,2.
(54) The 12-by-12 post-add matrices of standard and T&C FFAs are given in equations 34 and 38 respectively.
Q.sub.s,12=B.sub.s,2.sup.(12)B.sub.s,1.sup.(12)B.sub.s,0.sup.(12), where(34)
B.sub.s,0.sup.(12)=I.sub.3(I.sub.3
q.sub.s,3)=I.sub.3
B.sub.s,0.sup.(6)(35)
B.sub.s,1.sup.(12)=I.sub.3Q.sub.s,2.sup.(6)(36)
B.sub.s,2=Q.sub.s,2.sup.(12)(37)
where Q.sub.s,2.sup.(12) is generated by performing a 6-by-6 delay unfolding transformation on q.sub.s,2.
Q.sub.c,12=B.sub.c,2.sup.(12)B.sub.c,1.sup.(12)B.sub.c,0.sup.(12), where(38)
B.sub.c,0.sup.(12)=(S.sub.c,2B.sub.c,0.sup.(6))+(S.sub.s,2
B.sub.s,0.sup.(6))(39)
B.sub.c,1.sup.(12)=(S.sub.c,2Q.sub.c,2.sup.(6))+(S.sub.s,2
Q.sub.s,2.sup.(6))(40)
B.sub.c,2.sup.(12)=Q.sub.c,2.sup.(12)(41)
where Q.sub.c,2.sup.(12) is generated by performing a 6-by-6 delay unfolding transformation on q.sub.c,2.
(55) The derivation of generalized L-by-L T&C post-add matrix Q.sub.c is expressed as
(56)
where B.sub.c,W1i is derived below. From the above examples, it is clear that the derivation of B.sub.c,W1i involves an iterative process.
(57) The first iterative step (j=0) defines two matrices B.sub.c,i,j=Q.sub.c,l.sub.
(58) The subsequent iterative steps (j=1, . . . , i) are defined as follows.
B.sub.c,i,j=(S.sub.c,l.sub.B.sub.c,i,j1)+(S.sub.n,l.sub.
B.sub.s,i,j1),(43)
In eqn (43), B.sub.s,i,jI is obtained as given below.
B.sub.s,i,j=I.sub.F.sub.B.sub.s,i,j1,(44)
where F.sub.i=3 for l.sub.ij=2 and F.sub.i=6 for l.sub.ij=3. The matrix B.sub.c,W1i is equal to B.sub.c,i,i
(59) The resource usage of the proposed technique may be analyzed in three main sections: the subfilters, the polyphase combiner adders (in the case of a decimator), and the FFA pre/post adders. Considering the subfilters first, when a multirate filter is implemented using the proposed scheme, the total number of subfilters may be determined as follows:
(60)
where F.sub.i, the number of subfilters generated by the i'th FFA, is 3 for a 22 FFA and 6 for a 33 FFA. Each of the subfilters has N/LM coefficents. Of these subfilters, a fraction equal to ().sup.W are either individually symmetric or jointly centrosymmetric, with the remainder being asymmetric. As previously discussed, it is possible to exploit either type of symmetry in order to reduce the required multiplier resources by half. Thus, the total number of multipliers required to implement the filter is:
(61)
(62) The number of adders typically required to implement an FIR subfilter is one less than the number of filter coefficients. In order to exploit the symmetry in the jointly centrosymmetric subfilters, one additional adder is required per subfilter. Thus, the total number of adders required for the subfilters is:
(63)
(64) In the case of a decimator, the most efficient implementation is generated by placing the adders associated with the polyphase combiner right after the subfilters and before the FFA post adders, as shown in
(65)
(66) When constructing a single-rate composite FFA larger than 22 or 33, multiple copies of the standard 22 or 33 pre and post adder structures are required. The exact number of copies depends on the sequence of FFA decompositions used to construct the composite filter. At each stage of the decomposition, C.sub.i, the number of copies required, may be determined based on the number of inputs to the pre adder stage as follows:
(67)
As previously discussed, the T&C FFA structure is used when decomposing filters that are symmetric or jointly centrosymmetric, while the standard FFA decomposition is used for asymmetric filters. Thus, the fraction of the pre and post adder copies which use the T&C structure in the i'th stage of the FFA decomposition is .sup.i1. The standard FFA structure is used for the remaining copies.
(68) For multirate FFA structures, the total pre and post adder cost depends on whether the filter is an interpolator or a decimator. M copies of the pre adder structure are required for decimators, while M copies of the post adder structure are required for interpolators. Thus, the required numbers of pre and post adders for the proposed interpolator and decimator structures are given by equations 50 and 51, respectively:
(69)
where A.sub.S.sub._.sub.pre.sub.
(70) The typical way to implement a multirate filter at sampling rates higher than the system clock rate is to extend the parallel filtering approach described in [6]. Such an approach would involve breaking down each of the M polyphase filter branches into L.sup.2 subfilters. Since each pair of subfilters is jointly centrosymmetric in this case, the total number of multipliers required is
(71)
The total number of adders required is NLLM for interpolators and NLL for decimators.
(72) Table 2 compares the required computational resources of the parallel and proposed techniques for a variety of multirate filters. Note that in the Saved Adders column, the number without parentheses represents the total difference between the proposed approach and the parallel approach, whereas the number in parentheses represents the saving achieved by using the optimized transposed architecture.
(73) It is clear from equation 46 and Table 2 that the number of multipliers saved by the proposed approach does not depend on the rate change factor M. Rather, the multiplier savings are determined solely by the parallelism factor L. As the parallelism increases, the multiplier savings relative to the parallel case also increases, as highlighted in
(74) It should be noted that the proposed approach is not limited to cases where the signal's sampling frequency exceeds the system clock frequency. In situations where the converse is true, this technique may be applied in order to trade logic resource consumption or chip area for a reduction in dynamic power consumption. The application of the proposed technique will reduce the operating rate of the multipliers, thereby requiring more discrete multipliers in order to perform the operation. However, the total number of multiplications performed by the system per unit of time will be reduced according to the relationship shown in
(75) The arrangements herein provide a technique for applying Fast-FIR filtering to upsampling and downsampling filters, resulting in a reduction in the computational complexity and hardware implementation costs of such filters. The technique allows for the exploitation of symmetry in the prototype filter coefficients, which allows for a further reduction in the number of multipliers required to construct the filters.
(76) The structure and computational complexity of the derived multirate Fast FIR filters may be modified by transposing the signal flow graphs of the individual Fast-FIR decompositions, allowing some duplicate pre- and post-adder sections to be combined. In the case of an upsampler, the transposed version of the filter results in a more economical hardware implementation. Conversely, the non-transposed filter structure is more economical for downsamplers.
(77) Furthermore, the arrangements herein provide a series of generalized equations describing the operation of the symmetrical FIR exploitation scheme originally presented by T&C and utilized in the present technique. Such equations were not previously available in the public literature. These equations simplify the application of the T&C FIR technique for arbitrary numbers of parallel inputs.