TECHNIQUES FOR INPUT FORMATTING AND COEFFICIENT SELECTION FOR SAMPLE RATE CONVERTER IN PARALLEL IMPLEMENTATION SCHEME
20200153415 · 2020-05-14
Assignee
Inventors
- Vinoth Kumar (Trichy, IN)
- Bhanu Pande (Bangalore, IN)
- Carroll C. Speir (Pleasant Garden, NC)
- Satishchandra G. Rao (Bangalore, IN)
- Sajkapoor P. K. (Bangalore, IN)
Cpc classification
H03H17/0621
ELECTRICITY
International classification
Abstract
A sample rate converter (SRC) for implementing a rate conversion L/M is described wherein data is input to the SRC at an input rate (F.sub.in) and output from the SRC at an output rate (F.sub.out) equal to F.sub.in*L/M. The SRC includes a low pass filter (LPF) including P multiply-add instances, wherein P is a parallelization factor of the SRC; an input formatter for arranging samples received at the SRC in accordance with the rate conversion L/M and providing P*T.sub.pp input samples to the filter at a given time, wherein T.sub.pp is a number of taps per phase of the LPF; and a coefficient bank for storing a plurality of coefficients and for providing P*T.sub.pp of the coefficients to the LPF at a given time.
Claims
1-20. (canceled)
21. A sample rate converter (SRC) for implementing a rate conversion L/M wherein data is input to the SRC at an input rate (F.sub.in) and output from the SRC at an output rate (F.sub.out) equal to F.sub.in*L/M, the SRC comprising: a low pass filter (LPF) including P filters, wherein P is a parallelization factor of the SRC; an input formatter for arranging samples received at the SRC in accordance with the rate conversion L/M and providing a number of input samples to the filter at a given time; and a coefficient bank for storing a plurality of coefficients and for providing a number of the coefficients to the LPF at a given time.
22. The SRC of claim 21, wherein the number of input samples comprises P*T.sub.PP input samples, wherein T.sub.pp is a number of taps per phase of the LPF, and wherein the input formatter receives the samples at F.sub.in and provides P*T.sub.pp input samples to the LPF at F.sub.out.
23. The SRC of claim 21, wherein the number of the coefficients provided to the LPF at a given time comprises P*T.sub.pp of the coefficients, wherein T.sub.pp is a number of taps per phase of the LPF, and wherein the coefficient bank provides P*T.sub.PP of the coefficients to the filter at F.sub.out.
24. The SRC of claim 21, wherein the input formatter comprises a buffer for storing the received samples at F.sub.in and first circuitry for reading N.sub.uniq ones of the stored samples from the FIFO buffer at F.sub.out.
25. The SRC of claim 24, wherein N.sub.uniq=(P-1)*ceil(M/L)+T.sub.pp, for ceil (M/L)<T.sub.pp.
26. The SRC of claim 24, wherein the first circuitry comprises a first multiplexer (MUX) having inputs respectively connected to outputs of the FIFO and a FIFO read pointer generator for generating a select signal to the MUX.
27. The SRC of claim 26, wherein the input formatter comprises second circuitry for selecting P*T.sub.pp of the N.sub.uniq ones of the stored samples read from the FIFO in accordance with L/M to be provided to the filter as the P*T.sub.PP input samples.
28. The SRC of claim 27, wherein the second circuitry comprises at least one second MUX having inputs selectively connected to outputs of the first MUX and a MUX pointer for generating select signals to each at least one second MUX.
29. The SRC of claim 21, wherein each of the P filter instances implements a sum of products (SOP) operation on coefficients and formatted samples received thereby to generate an output sample and outputs the output sample at F.sub.out.
30. The SRC of claim 21, wherein the coefficient bank rearranges coefficients stored therein in accordance with:
new.sub.index(i,j)=mod[(iP+j)M,L]; and
Coeff.sub.rearranged(i,j)=Coeff.sub.original(new.sub.index(i,j)) where i=0 to ((L/P)-1) and j=parallel line (0 to (P-1)).
31. The SRC of claim 30, wherein the coefficient bank comprises P sets of L/P coefficient registers and P MUXes, wherein each one of the P sets of L/P coefficient registers is connected to input of one of the P MUXes, the coefficient bank further comprising a counter having an output connected to select inputs of each of the P MUXes, the counter having a maximum count value of (L/P)-1.
32. An apparatus comprising: sample rate conversion circuitry (SRC) for implementing a rate conversion L/M wherein data is input to the SRC at an input rate (F.sub.in) and output from the SRC at an output rate (F.sub.out) equal to F.sub.in*L/M, the SRC comprising: a low pass filter (LPF) including P filter instances, wherein P is a parallelization factor of the SRC; an input formatter for arranging samples received at the SRC in accordance with the rate conversion L/M and providing a number of input samples to the filter at a given time; and a coefficient bank for storing a plurality of coefficients and for providing a number of the coefficients to the LPF at a given time; wherein the input formatter receives the samples at F.sub.in and provides the number of input samples to the LPF at F.sub.out; and wherein the coefficient bank provides the number of the coefficients to the filter at F.sub.out.
33. The apparatus of claim 32, wherein the input formatter comprises: a FIFO buffer for storing the received samples at F.sub.in; and read circuitry for causing N.sub.uniq ones of the stored samples to be read from the FIFO buffer at F.sub.out; wherein the read circuitry comprises a first multiplexer (MUX) having inputs respectively connected to outputs of the FIFO and a FIFO read pointer generator for generating a select signal to the MUX.
34. The apparatus of claim 33, wherein the input formatter comprises select circuitry for selecting P*T.sub.pp of the N.sub.uniq ones of the stored samples read from the FIFO in accordance with L/M to be provided to the filter as the P*T.sub.PP input samples, wherein T.sub.pp is a number of taps per phase of the LPF, and wherein the select circuitry comprises at least one second MUX having inputs selectively connected to outputs of the first MUX and a MUX pointer generator for generating select signals to each at least one second MUX.
35. The apparatus of claim 32, wherein the coefficient bank rearranges coefficients stored therein in accordance with a function of P, M, and L.
36. The apparatus of claim 35, wherein the coefficient bank comprises P sets of L/P coefficient registers and P MUXes, wherein each one of the P sets of L/P coefficient registers is connected to input of one of the P MUXes, the coefficient bank further comprising a counter having an output connected to select inputs of each of the P MUXes, the counter having a maximum count value of (L/P)-1.
37. A method for performing a sample rate conversion L/M wherein an input data rate is F.sub.in and an output data rate is F.sub.out and wherein F.sub.out is equal to F.sub.in*L/M, the method comprising: receiving data samples at F.sub.in; storing the received data samples in a FIFO buffer; reading N.sub.uniq ones of the stored received data samples from the FIFO buffer at F.sub.out; selecting a number of the N.sub.uniq ones of the stored received data samples read from the FIFO buffer to be provided to the filter; providing the selected number of ones of the N.sub.uniq ones of the stored received data samples read from the FIFO buffer to the filter at F.sub.out; and outputting a number of ones of a plurality of stored coefficients to the filter at F.sub.out.
38. The method of claim 37 wherein the reading is performed using read circuitry comprising a first multiplexer (MUX) having inputs respectively connected to outputs of the FIFO and a FIFO read pointer generator for generating a select signal to the MUX.
39. The method of claim 38, wherein the selecting is performed by select circuitry comprising at least one second MUX having inputs selectively connected to outputs of the first MUX and a MUX pointer generator for generating select signals to each at least one second MUX.
40. The method of claim 37 further comprising rearranging the stored coefficients before the outputting in accordance with a function of P, M, and L.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:
[0007]
[0008]
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0021]
[0022] Accordingly, the SRC 100 implements a rate conversion R, which is a real number, on a digital signal having an input rate F.sub.in such that the output rate F.sub.out of the signal is F.sub.in*R. The rate conversion R can be expressed as a fraction L/M, and conceptually it is realized as Interpolation → Filter → Decimation as shown in
[0023] In a polyphase implementation of an SRC, such as SRC 100, the upsampler 102 places L-1 zero-valued samples between adjacent samples of the input data, designated herein as x(n), and increases the sample rate by a factor of L. Hence, the filter 104 is placed at the part of the system that has a higher sample rate. The rate conversion R may be a rational or irrational number and, in either case, may be represented exactly or approximated as a fraction L/M, as indicated above. For an irrational numeric value of R, L/M would closely match the actual value based on the precision of L and M. In view of the foregoing, R and L/M are used interchangeably herein.
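The Interpolation → Filter → Decimation view and its polyphase equivalent can be compared with a small software model. The following is a minimal sketch, not the hardware described herein; the function names `src_naive` and `src_polyphase`, the integer test data, and the convention that each output uses the current input sample and the T.sub.pp-1 samples preceding it are assumptions made for illustration only.

```python
def src_naive(x, h, L, M):
    """Zero-stuff by L, filter with h, then keep every M-th sample."""
    up = [0] * (len(x) * L)
    for n, sample in enumerate(x):
        up[n * L] = sample                      # L-1 zeros between input samples
    filtered = []
    for k in range(len(up)):
        acc = 0
        for t in range(min(k + 1, len(h))):     # causal FIR, zero initial state
            acc += h[t] * up[k - t]
        filtered.append(acc)
    return filtered[::M]                        # decimate by M


def src_polyphase(x, h, L, M):
    """Same result, but only non-zero products are computed.

    Assumes len(h) is a multiple of L, so each phase has len(h) // L taps.
    """
    taps_per_phase = len(h) // L
    n_out = (len(x) * L + M - 1) // M           # ceil(len(x) * L / M)
    out = []
    for m in range(n_out):
        n, phase = divmod(m * M, L)             # input index and coefficient phase
        acc = 0
        for k in range(taps_per_phase):
            if n - k >= 0:
                acc += h[phase + L * k] * x[n - k]
        out.append(acc)
    return out
```

In this model, output m uses input index INT(m*M/L) and coefficient phase mod(m*M, L), which are the two quantities that the input formatter and the coefficient bank have to supply in hardware.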
[0024] As previously noted, a polyphase structure is an efficient way of implementing an SRC in hardware, in terms of computational efficiency. A brief introduction to a polyphase arrangement for implementing an SRC is provided below with reference to
[0025] Referring to
[0026] As shown in
[0027]
[0028]
[0029] In summary, the parallel implementation approach shown in
[0030] As discussed above, the SRC implements a rate conversion L/M with an input rate of F.sub.in and an output rate of F.sub.out=F.sub.in*(L/M). This poses a number of challenges for the inputs arriving at the F.sub.in rate. One such challenge concerns input formatting. In particular, inputs have to be efficiently arranged for filter operations as governed by L/M and P for the parallel implementation scheme. Another such challenge relates to timing isolation. Specifically, inputs have to be rate converted to the F.sub.out rate before they are forwarded to the filter to avoid direct timing paths between F.sub.in and F.sub.out. In particular, F.sub.in and F.sub.out can be of different frequencies governed by the ratio of L to M. When F.sub.in domain signals interact directly with F.sub.out domain signals, the interaction may result in a very high frequency crossing, creating a path on which timing requirements are extremely difficult to meet. For this reason, the input formatter intentionally isolates the timing paths between the F.sub.in and F.sub.out domains.
[0031] A variable input offsets concept in accordance with embodiments described herein may be used to address this issue. In particular, unlike fixed decimation and interpolation, the index of inputs required in successive F.sub.out cycles for SRC filter operations is based on L/M. The input index required to generate the output sample identified by the m.sup.th output index is given by:
INT(m*M/L)
where m is the output sample index, and the offset between two successive input indices for the m.sup.th output and the (m+1).sup.th output (i.e., the input index offset) is given by:
INT((m+1)*M/L)-INT(m*M/L)
[0032] Tables 1 and 2 below illustrate the variable input offsets concept for L/M=4/5 and L/M=8/21, respectively. In each of Table 1 and Table 2, a first row identifies the output sample index (m), a second row identifies the input sample index corresponding to the output sample index of the same column (INT(m*M/L)), and a third row identifies the input index offset corresponding to the output sample index and input sample index of the same column (INT((m+1)*M/L)-INT(m*M/L)).
TABLE 1 (L/M = 4/5)
Output Sample Index    0   1   2   3   4   5   6   7   8   9  10  11  12
Input Sample Index     0   1   2   3   5   6   7   8  10  11  12  13  15
Input Index Offset     1   1   1   2   1   1   1   2   1   1   1   2   1
TABLE 2 (L/M = 8/21)
Output Sample Index    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Input Sample Index     0   2   5   7  10  13  15  18  21  23  26  28  31  34  36
Input Index Offset     2   3   2   3   3   2   3   3   2   3   2   3   3   2   3
[0033] As illustrated in the above tables, the input sample indices and the successive input index offsets are not fixed and vary with time based on the ratio of L to M. In the following section, conventional direct implementation of input formatter is compared against the techniques described herein for efficient input formatting.
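The rows of Tables 1 and 2 follow directly from the two INT expressions given in paragraph [0031]. As a minimal sketch (the function names `input_index` and `input_offset` are illustrative and do not appear in the disclosure), integer floor division reproduces both rows for any L and M:

```python
def input_index(m, L, M):
    """Input sample index for output sample m: INT(m*M/L)."""
    return (m * M) // L


def input_offset(m, L, M):
    """Offset between the input indices of outputs m and m+1."""
    return input_index(m + 1, L, M) - input_index(m, L, M)


def table_rows(L, M, n_outputs):
    """Return the (input index, input index offset) rows for outputs 0..n_outputs-1."""
    indices = [input_index(m, L, M) for m in range(n_outputs)]
    offsets = [input_offset(m, L, M) for m in range(n_outputs)]
    return indices, offsets
```

Because (m+L)*M/L exceeds m*M/L by exactly M, the offset sequence produced by this model repeats with a period of L output samples.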
[0034] In a parallel implementation scheme, in which samples are processed in parallel, a method for efficiently formatting the inputs in accordance with embodiments described herein is illustrated and explained with reference to a particular example illustrated in
Parallelization Factor (Parallel Output Computation) (P): 8
[0035] Number of Filter Taps per Phase (T.sub.pp): 12 (parameterized or adaptive)
Input Data Width: 16 (parameterized)
[0036] In a specific implementation of an input formatter 600 represented in
[0037] In an unoptimized structure, a set of 12 samples is required to be read from the FIFO to generate one output (because T.sub.pp=12). In other words, 12 (T.sub.pp) input samples are required for each and every parallel line; on the whole, 96 (i.e., 8*12) samples are read from the FIFO, based on eight offset values. If N is the depth of the FIFO, then 96*16 N:1 MUXes are required to read 96 samples as described above. Implementing such a large number of MUXes results in an enormous area cost and huge congestion issues are incurred in backend implementation (i.e., Physical Design/Place and Route).
[0038] Area efficient implementation for input formatting and avoidance of a direct timing path between F.sub.in and F.sub.out in accordance with embodiments described herein will now be explained. It will be recognized that the total number of unique samples N.sub.uniq required to cater to P parallel lines (for generating P output samples) is defined by:
N.sub.uniq=(P-1)*ceil(M/L)+T.sub.pp
for ceil (M/L)<T.sub.pp.
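The saving implied by the N.sub.uniq expression can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the MUX counts simply multiply the number of samples read from the FIFO by the input data width, as in the example parameters above.

```python
def ceil_div(a, b):
    """Integer ceil(a / b) for positive integers."""
    return -(-a // b)


def n_uniq(P, M, L, Tpp):
    """Unique samples needed for P parallel lines, valid for ceil(M/L) < Tpp."""
    r = ceil_div(M, L)
    if r >= Tpp:
        raise ValueError("formula applies only when ceil(M/L) < Tpp")
    return (P - 1) * r + Tpp


def fifo_read_muxes(samples_read, data_width):
    """Number of N:1 MUXes needed to read the given number of samples."""
    return samples_read * data_width
```

With P=8, T.sub.pp=12, a 16-bit data width, and M/L at most 2, this gives N.sub.uniq=26 and 26*16 FIFO read MUXes, compared with 96*16 when all P*T.sub.pp samples are read directly.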
[0039] SRC may perform a rate conversion R in the range of 0.5 to 1 or effective decimation between 1 and 2. This facilitates an implementation of an input formatter that results in a huge area gain and congestion reduction. The same technique can be applied for any effective decimation less than T.sub.pp. In the case of (M/L).sub.max equal to 2, based on DDC architecture, N.sub.uniq is 26. Accordingly, in a given F.sub.out cycle, only 26 unique samples are read from the FIFO and from that, 96 samples can be arranged, as dictated by L/M. As shown in
[0040] As will be further described with reference to
[0041] As used herein, P is the number of parallel samples (or parallel paths) required, T.sub.pp is the number of taps per phase (or the order of the filter), L/M is the resampling ratio (interpolator/decimator), R is equal to ceil(M/L) (i.e., the smallest integer not less than M/L; this use of R is distinct from the rate conversion R introduced above), F.sub.in is the input clock rate, and F.sub.out is the output clock rate. In general, the functionality of the input formatter is to arrange and provide the required samples to P mult-add blocks, or instances, for filtering operations. The input formatter takes P samples at F.sub.in, stores them, and outputs P*T.sub.pp samples at F.sub.out as required for P mult-add blocks. Each mult-add block needs T.sub.pp samples to produce one output; therefore P*T.sub.pp samples are required for P blocks to produce P outputs. Samples driven by the input formatter to a mult-add block are governed by the resampling ratio; the sample arrangement changes depending on the value of L/M. As will be explained in greater detail below, embodiments described herein efficiently provide the P*T.sub.pp samples to the P mult-add blocks.
[0042] In particular, P input samples are written to a custom FIFO at every F.sub.in clock cycle. The custom FIFO has P write lines. It will be recognized that, although P*T.sub.pp samples are required to be given out, not all of them are necessarily unique. As noted above, the total number of unique samples required for P mult-add blocks is N.sub.uniq. N.sub.uniq samples are read from the FIFO at every F.sub.out clock cycle. The FIFO has N.sub.uniq read lines. In cases in which M/L<T.sub.pp, N.sub.uniq will be less than P*T.sub.pp; therefore, the number of samples required to be read from the FIFO will be less than P*T.sub.pp. When L and M are close in value, N.sub.uniq is much less than P*T.sub.pp, reducing the MUXing complexity to a large extent. From N.sub.uniq samples, P*T.sub.pp samples are generated through another level of MUXing, again governed by L/M.
[0043] Table 1 below represents how the samples are arranged/selected for P parallel lines once N.sub.uniq samples are read from the FIFO.
TABLE 1
Parallel  Sample Start Index  Number of       Length     Indices of Input Samples
Line      from N.sub.uniq     Possible Start  of Set     (in each line one of the sets is selected)
Index     Samples             Indices
P-1       0                   1               T.sub.pp   {0, 1, 2, . . . until T.sub.pp Samples}
P-2       1 or 2 or . . .     R               T.sub.pp   {1, 2, 3, . . . until T.sub.pp Samples} or
          until R                                        {2, 3, 4, . . . until T.sub.pp Samples} or . . . until
                                                         {R, R + 1, R + 2, . . . until T.sub.pp Samples}
P-3       2 or 3 or . . .     2R-1            T.sub.pp   {2, 3, 4, . . . until T.sub.pp Samples} or
          until 2R                                       {3, 4, 5, . . . until T.sub.pp Samples} or . . . until
                                                         {2R, 2R + 1, 2R + 2, . . . until T.sub.pp Samples}
P-4       3 or 4 or . . .     3R-2            T.sub.pp   {3, 4, 5, . . . until T.sub.pp Samples} or
          until 3R                                       {4, 5, 6, . . . until T.sub.pp Samples} or . . . until
                                                         {3R, 3R + 1, 3R + 2, . . . until T.sub.pp Samples}
P-5       4 or 5 or . . .     4R-3            T.sub.pp   {4, 5, 6, . . . until T.sub.pp Samples} or
          until 4R                                       {5, 6, 7, . . . until T.sub.pp Samples} or . . . until
                                                         {4R, 4R + 1, 4R + 2, . . . until T.sub.pp Samples}
0         P or P-1 or . . .   PR-P + 1        T.sub.pp   {P, P + 1, P + 2, . . . until T.sub.pp Samples} or
          until PR                                       {P + 1, P + 2, P + 3, . . . until T.sub.pp Samples} or . . . until
                                                         {PR, PR + 1, PR + 2, . . . until T.sub.pp Samples}
[0044] Referring to Table 1 above, it will be noted that the first line (P-1) does not require any MUX, the second line (P-2) requires an R:1 MUX (for each of T.sub.pp samples), the third line (P-3) requires a (2R-1):1 MUX, the fourth line (P-4) requires a (3R-2):1 MUX, and so on. Accordingly, for smaller values of R, the MUXing complexity is much less. For each parallel line, the sample start index is found from L and M and is the MUX select for the above-noted MUXes. The input sample offset between successive output samples is given by INT((m+1)*M/L)-INT(m*M/L), where m is the output index and m+1 is the next output index. These offsets are computed and kept in the hardware, and the selects are generated from them. The offsets repeat after every L cycles. From Table 1, it will be noted that no MUX select is needed for the first line (P-1), the offset of the second line (P-2) is used as the MUX select for the second line, the offset of the second line (P-2) plus the offset of the third line (P-3) is used as the MUX select for the third line, the offset of the second line (P-2) plus the offset of the third line (P-3) plus the offset of the fourth line (P-4) is used as the MUX select for the fourth line, and so on. The sum of the offsets of all P lines, offset(P-1)+ . . . +offset(0), added to the current FIFO read pointer gives the read pointer for the next cycle to fetch the next set of N.sub.uniq samples from the FIFO. The above steps are repeated for every F.sub.out cycle.
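The select generation described above amounts to a running sum of input index offsets within one F.sub.out cycle. The following behavioral sketch is an assumption-laden software model rather than the disclosed circuit: the names `block_selects` and `format_block` are hypothetical, the FIFO is modeled as a plain list, and parallel lines are numbered 0 to P-1 in output order (so line 0 of the sketch corresponds to the first line, P-1, of Table 1).

```python
def ceil_div(a, b):
    """Integer ceil(a / b) for positive integers."""
    return -(-a // b)


def block_selects(m0, L, M, P):
    """MUX selects for P consecutive outputs starting at output index m0.

    Returns (selects, advance): selects[k] is the start index of line k
    inside the N_uniq window; advance is the read-pointer increment.
    """
    offsets = [((m + 1) * M) // L - (m * M) // L for m in range(m0, m0 + P)]
    selects = [0]                               # first line needs no MUX
    for off in offsets[:-1]:
        selects.append(selects[-1] + off)       # running sum of earlier offsets
    return selects, sum(offsets)


def format_block(samples, read_ptr, m0, L, M, P, Tpp):
    """Read N_uniq samples once, then fan them out to P lines of Tpp samples."""
    n_uniq = (P - 1) * ceil_div(M, L) + Tpp
    window = samples[read_ptr:read_ptr + n_uniq]    # first-level (FIFO) read
    selects, advance = block_selects(m0, L, M, P)
    lines = [window[s:s + Tpp] for s in selects]    # second-level MUXing
    return lines, read_ptr + advance
```

For L/M=8/21 with P=4, the first block in this model yields start indices 0, 2, 5, 7 and a next read pointer of 10, matching the input sample indices listed in Table 2.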
[0045] The hardware cost for reading 26 unique samples from a 64 deep FIFO is 26*16 64:1 MUXes as compared to 96*16 64:1 MUXes in the direct implementation method. If X is assumed to be an area cost for 16 64:1 MUXes, the direct method incurs an area cost of 96X and current proposal would incur a cost of 26X. The additional cost of generating 96 samples from 26 samples is much less for the following reasons. Referring again to
[0046] Similarly, for line number 4 (not shown), input samples will have an offset of 3 to 6 from line number 7 610, which may be implemented using a 4:1 MUX (not shown). This pattern continues through line number 8, designated in
12*16*[2:1+3:1+4:1+5:1+6:1+7:1+8:1 MUXes]
or 6.25X. Thus, the total area cost in the scheme illustrated in
[0047] Apart from area, the direct implementation method incurs a huge routing congestion problem in backend implementation (Physical Design/Place and Route) due to the need for a large crossbar MUX and a large number of fanouts. Even with 50% utilization, heavy congestion may be observed. In the proposed method shown in
[0048]
[0049] F.sub.in-F.sub.out timing isolation and FIFO size determination in accordance with embodiments described herein will now be described in greater detail. In particular, direct timing paths between F.sub.in and F.sub.out could be of very high frequency (e.g., on the order of F.sub.ADC/4) and filter operations should not fall in those paths. A scheme referred to as Write-Lead/Read-Lag is proposed to solve this timing path issue and is illustrated in
[0050] The size of the FIFO 800 may be determined based on the above-described Write-Lead/Read-Lag scheme and is given by the following equation:
FIFO.sub.size=2*P*ceil(M/L)+T.sub.pp
for ceil (M/L)<T.sub.pp.
[0051] P*ceil(M/L) new input samples are required to produce P output samples. In this scheme, because write leads read by an F.sub.out cycle, extra space is required to store the write data, which is accounted for by the factor of 2. T.sub.pp is added to account for the order of the filter. In particular, to produce one output sample, T.sub.pp input samples are required. For effective decimation of M/L and parallel processing, P*ceil (M/L)+T.sub.pp samples are required to produce P samples. For example, if M/L is 2, P is 8, and T.sub.pp is 12, IN(1), IN(2), . . . IN(12) samples are required for the first output, IN(3), IN(4), . . . IN(14) samples are required for the second output, IN(5), IN(6), . . . IN(16) samples are required for the third output, and so on. On the whole 26 samples are required. This number is multiplied by 2 in the above equation, as read happens one cycle later than write.
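The FIFO sizing equation can be evaluated for the example parameters as follows. This is a minimal sketch that applies the equation as written; the function name `fifo_size` is illustrative.

```python
def ceil_div(a, b):
    """Integer ceil(a / b) for positive integers."""
    return -(-a // b)


def fifo_size(P, M, L, Tpp):
    """FIFO depth in samples: 2*P*ceil(M/L) + Tpp, valid for ceil(M/L) < Tpp."""
    r = ceil_div(M, L)
    if r >= Tpp:
        raise ValueError("formula applies only when ceil(M/L) < Tpp")
    return 2 * P * r + Tpp
```

For P=8, M/L=2, and T.sub.pp=12, the equation evaluates to 44 samples, which fits within the 64-deep FIFO used in the area comparison above.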
[0052] It will be recognized that a polyphase implementation of an SRC requires appropriate coefficient selection for every phase, which selection is governed by L/M. A conceptual block diagram of a polyphase SRC structure has been described above with reference to
[0053] As used herein, L.sub.max is the maximum value allowed for L and L.sub.allowed corresponds to the allowed (or possible) values for L. In general, filter coefficients are organized into L phases, or banks, for L/M rate conversion and coefficients are programmed into the L banks. Each F.sub.out clock cycle selects one of the L banks for filter operation, which is governed by the following equation:
coeff.sub.index(i)=mod(i*M,L)
where i is the output sample index (F.sub.out clock cycle) and coeff.sub.index(i) points to the coefficient set for the i.sup.th output sample index. After L cycles, the pattern repeats, thus repeating the coefficient sets. For P parallel processing, P sets of coefficients are required to be forwarded to filter operations. As a result, P*T.sub.pp coefficients must be selected, thus requiring an L.sub.max:1 MUX for each coefficient in each parallel line. In contrast, embodiments described herein efficiently select P*T.sub.pp coefficients, thus reducing MUX complexity.
[0054]
[0055] A more efficient coefficient selection module (e.g., for implementation as the coefficients bank 406 (
coeff.sub.index(i)=mod(i*M,L)
where i is the output sample index (F.sub.out clock cycle) and coeff.sub.index(i) points to the coefficient set for the i.sup.th output sample index. As is evident from the above equation, when i crosses 95, the coefficient set repeats in the same fashion as 0-95. The coefficient sets are the same for 0 to L-1, L to 2L-1, 2L to 3L-1, and so on.
[0056] In the parallel implementation scheme, assuming parallelization to be 8, in a given parallel line, coefficient sets do not span the entire space of coefficient banks (96, in the present example); rather, the number of sets spanned in a given parallel line is given by Max(L/P, 1) when L=2.sup.n and by Max(L/P, 3) when L=3*2.sup.n, where P is the number of parallel samples.
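The number of coefficient sets visited by a single parallel line can be counted by brute force from coeff.sub.index(i)=mod(i*M,L). The sketch below is illustrative only; the function name `sets_per_line` and the particular values of M (chosen to be coprime with L) are assumptions that do not come from the disclosure.

```python
def sets_per_line(L, M, P, line, cycles=None):
    """Distinct coefficient-set indices seen by one parallel line.

    In F_out cycle c, the given line computes output index c*P + line and
    therefore needs coefficient set mod((c*P + line)*M, L).
    """
    if cycles is None:
        cycles = 4 * L                          # several full periods
    return {((c * P + line) * M) % L for c in range(cycles)}
```

With L=96, P=8, and an assumed M=125, every line in this model visits exactly 12 distinct sets rather than all 96; smaller cases such as L=4 and L=12 give 1 and 3 sets per line, in agreement with the Max expressions above.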
[0057] Assuming L.sub.max equals 96 and P equals 8, the number of coefficient sets required to be spanned in a given parallel line is 12. This means that for a given parallel line, MUXing occurs only between 12 coefficient sets, rather than all 96 sets. This reduces the MUXing complexity by a factor of 8 over the direct implementation method 900 shown in
[0058] As shown in
[0059] 1. mod (L.sub.max, P)==0
[0060] 2. mod (nL, P)==0 and nL/P≤L.sub.max
where n is the smallest possible integer. If the above conditions are satisfied, the number of coefficient banks required for each parallel line is L.sub.max/P.
[0061] This scheme 910 works with assistance from software coefficient programming, pre-programmed banks of coefficient registers, or hardware doing the same. As coefficient register banks are dedicated to the parallel lines, they must be programmed as required by that line for a given L/M. Effectively they have to be programmed in a shuffled manner given by the following:
new.sub.index(i,j)=mod[(iP+j)M,L]
and
Coeff.sub.rearranged(i,j)=Coeff.sub.original(new.sub.index(i,j))
where i=0 to 11 and j=parallel line (0 to 7).
[0062] In general, an efficient coefficient selection scheme operates as follows. If L.sub.max/P is an integer and LCM (L.sub.allowed, P)/P<L.sub.max/P, then a given parallel line does not cycle through all of the L coefficient sets; repetition is confined within L.sub.max/P sets, and each parallel line will receive L.sub.max/P different coefficient sets. The repeating pattern depends on L and M. For each parallel line, L.sub.max/P banks, or phases, are dedicated and MUXing for each line is confined within those dedicated banks in the hardware. For example, Bank.sub.0 to Bank.sub.Lmax/P-1 are dedicated to Line 0, Bank.sub.Lmax/P to Bank.sub.2Lmax/P-1 are dedicated to Line 1, Bank.sub.2Lmax/P to Bank.sub.3Lmax/P-1 are dedicated to Line 2, and so on. Bank.sub.(P-1)Lmax/P to Bank.sub.Lmax-1 are dedicated to Line P-1.
[0063] Coefficients are not programmed in a direct fashion; rather, they are programmed/sorted by the following equations:
new.sub.index(i,j)=mod[(iP+j)M,L]
Coeff.sub.rearranged(i,j)=Coeff.sub.original(new.sub.index(i,j))
where i=0 to 11 and j=parallel line (0 to 7) in the specific example illustrated herein. Coeff.sub.original has the original sets of coefficients in natural order, i.e., set.sub.0, set.sub.1, set.sub.2, . . . set.sub.L-1. Coeff.sub.rearranged is shuffled as per the above equations and then programmed to Bank.sub.0, Bank.sub.1, Bank.sub.2, . . . Bank.sub.L-1. The reshuffling of the coefficients ensures that for each line, the appropriate L.sub.max/P sets are programmed in the L.sub.max/P banks dedicated to that line. This mechanism reduces the MUXing from L.sub.max:1 to L.sub.max/P:1, reducing the complexity by a factor of P. The MUX select line is also optimized as follows. A mod(i*M, L) generator is not required, where i is the output sample index; a simple L.sub.max/P up counter or down counter is sufficient and round-robins through the L.sub.max/P banks for each line.
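The effect of the rearrangement can be modeled in a few lines. This is a minimal sketch under stated assumptions: L equals L.sub.max and is a multiple of P, the counter is modeled as an up counter, and the names `rearrange` and `select_for_cycle` as well as the value M=125 used for checking are hypothetical.

```python
def rearrange(coeff_original, L, M, P):
    """Build P dedicated groups of L/P banks using new_index(i, j).

    banks[j][i] holds Coeff_original[mod((i*P + j)*M, L)].
    """
    banks_per_line = L // P
    return [
        [coeff_original[((i * P + j) * M) % L] for i in range(banks_per_line)]
        for j in range(P)
    ]


def select_for_cycle(banks, cycle, line):
    """Coefficient set for one line, selected by a shared wrap-around counter."""
    counter = cycle % len(banks[line])          # counts 0 .. L/P - 1
    return banks[line][counter]
```

In this model, the set returned for line j in F.sub.out cycle c equals the set that the direct method would fetch with mod((c*P+j)*M, L), so each line needs only an (L/P):1 MUX driven by a counter common to all lines.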
[0064]
new.sub.index(i,j)=mod[(iP+j)M,L]
Coeff.sub.rearranged(i,j)=Coeff.sub.original(new.sub.index(i,j))
where i=0 to ((L.sub.max/P)-1) and j=parallel line (0 to (P-1)). In step 1002, the output of the Nth set of L/P (e.g., 12 in the example above) coefficient banks is provided to the inputs of the Nth coefficient MUX for N=0 to (P-1) (e.g., 7 in the example above). In step 1004, a select signal generated by a countdown counter having a maximum count of (L.sub.max/P)-1 (e.g., 11 in the example above) and running at F.sub.out is provided to each of the coefficient MUXes. Finally, in step 1006, coefficients are output from each of the coefficient MUXes to a corresponding filter instance at F.sub.out.
[0065]
[0066]
[0067] A select line of the coefficient MUX is a simple down counter running at F.sub.out. The value of the counter depends on L. In particular, the counter value is the same as the number of coefficient sets spanned as explained above. The minimum counter is L.sub.max/P, where the select line moves from (L.sub.max/P)-1 to 0. The select line generation doesn't depend on both L and M, thereby eliminating the need for modulo generation as required in the direct implementation method. Additionally, using the present method, the proposed select line is common to all of the parallel lines, thus simplifying the select line generation hardware.
[0068] It should be noted that all of the specifications, dimensions, and relationships outlined herein (e.g., the number of elements, operations, steps, etc.) have only been offered for purposes of example and teaching only. Such information may be varied considerably without departing from the spirit of the present disclosure, or the scope of the appended claims. The specifications apply only to one non-limiting example and, accordingly, they should be construed as such. In the foregoing description, exemplary embodiments have been described with reference to particular component arrangements. Various modifications and changes may be made to such embodiments without departing from the scope of the appended claims. The description and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
[0069] Note that with the numerous examples provided herein, interaction may be described in terms of two, three, four, or more electrical components. However, this has been done for purposes of clarity and example only. It should be appreciated that the system may be consolidated in any suitable manner. Along similar design alternatives, any of the illustrated components, modules, and elements of the FIGURES may be combined in various possible configurations, all of which are clearly within the broad scope of this Specification. In certain cases, it may be easier to describe one or more of the functionalities of a given set of flows by only referencing a limited number of electrical elements. It should be appreciated that the electrical circuits of the FIGURES and its teachings are readily scalable and may accommodate a large number of components, as well as more complicated/sophisticated arrangements and configurations. Accordingly, the examples provided should not limit the scope or inhibit the broad teachings of the electrical circuits as potentially applied to myriad other architectures.
[0070] It should also be noted that in this Specification, references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) included in one embodiment, exemplary embodiment, an embodiment, another embodiment, some embodiments, various embodiments, other embodiments, alternative embodiment, and the like are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
[0071] It should also be noted that the functions related to circuit architectures illustrate only some of the possible circuit architecture functions that may be executed by, or within, systems illustrated in the FIGURES. Some of these operations may be deleted or removed where appropriate, or these operations may be modified or changed considerably without departing from the scope of the present disclosure. In addition, the timing of these operations may be altered considerably. The preceding operational flows have been offered for purposes of example and discussion. Substantial flexibility is provided by embodiments described herein in that any suitable arrangements, chronologies, configurations, and timing mechanisms may be provided without departing from the teachings of the present disclosure.
[0072] Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained to one skilled in the art and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims.
[0073] Note that all optional features of the device and system described above may also be implemented with respect to the method or process described herein and specifics in the examples may be used anywhere in one or more embodiments.
[0074] The "means for" in these instances (above) may include (but is not limited to) using any suitable component discussed herein, along with any suitable software, circuitry, hub, computer code, logic, algorithms, hardware, controller, interface, link, bus, communication pathway, etc.
[0075] Note that with the example provided above, as well as numerous other examples provided herein, interaction may be described in terms of two, three, or four network elements. However, this has been done for purposes of clarity and example only. In certain cases, it may be easier to describe one or more of the functionalities of a given set of flows by only referencing a limited number of network elements. It should be appreciated that topologies illustrated in and described with reference to the accompanying FIGURES (and their teachings) are readily scalable and may accommodate a large number of components, as well as more complicated/sophisticated arrangements and configurations. Accordingly, the examples provided should not limit the scope or inhibit the broad teachings of the illustrated topologies as potentially applied to myriad other architectures.
[0076] It is also important to note that the steps in the preceding flow diagrams illustrate only some of the possible signaling scenarios and patterns that may be executed by, or within, communication systems shown in the FIGURES. Some of these steps may be deleted or removed where appropriate, or these steps may be modified or changed considerably without departing from the scope of the present disclosure. In addition, a number of these operations have been described as being executed concurrently with, or in parallel to, one or more additional operations. However, the timing of these operations may be altered considerably. The preceding operational flows have been offered for purposes of example and discussion. Substantial flexibility is provided by communication systems shown in the FIGURES in that any suitable arrangements, chronologies, configurations, and timing mechanisms may be provided without departing from the teachings of the present disclosure.
[0077] Although the present disclosure has been described in detail with reference to particular arrangements and configurations, these example configurations and arrangements may be changed significantly without departing from the scope of the present disclosure. For example, although the present disclosure has been described with reference to particular communication exchanges, embodiments described herein may be applicable to other architectures.
[0078] Numerous other changes, substitutions, variations, alterations, and modifications may be ascertained to one skilled in the art and it is intended that the present disclosure encompass all such changes, substitutions, variations, alterations, and modifications as falling within the scope of the appended claims. In order to assist the United States Patent and Trademark Office (USPTO) and, additionally, any readers of any patent issued on this application in interpreting the claims appended hereto, Applicant wishes to note that the Applicant: (a) does not intend any of the appended claims to invoke paragraph six (6) of 35 U.S.C. section 112 as it exists on the date of the filing hereof unless the words "means for" or "step for" are specifically used in the particular claims; and (b) does not intend, by any statement in the specification, to limit this disclosure in any way that is not otherwise reflected in the appended claims.