CALCULATOR AND ASSOCIATED METHOD
20230385373 · 2023-11-30
Inventors
- ZHAOHUI CHEN (Beijing, CN)
- XUANLE REN (Shanghai, CN)
- YANHENG LU (Shanghai, CN)
- Jiansong Zhang (Beijing, CN)
Cpc classification
G06F7/76
PHYSICS
G06F17/156
PHYSICS
International classification
G06F17/15
PHYSICS
Abstract
The present application discloses a calculator and a method thereof. The calculator is configured to accelerate the number-theoretic transformation of a 2.sup.N-dimensional polynomial. The calculator includes a first coefficient memory, a second coefficient memory, a twiddle factor memory, a plurality of processing units and a data flow controller. In the odd-number rounds of coefficient computation operations, the processing units perform first calculation procedures to read coefficients from the first coefficient memory for modulo calculation, and perform first writing procedures to write output coefficients to the second coefficient memory. In even-number rounds of coefficient computation operations, the processing units performs second calculation procedures to read coefficients from the second coefficient memory for modulo calculations, and perform second writing procedures to write output coefficients to the first coefficient memory.
Claims
1. A calculator, configured to perform number-theoretic transformation on a 2.sup.N-dimensional polynomial, wherein N is an integer greater than 1, and the calculator comprises: a first coefficient memory, configured to store 2.sup.N coefficients of the 2.sup.N-dimensional polynomial in an initial period; a second coefficient memory; a twiddle factor memory, configured to store (2.sup.N−1) twiddle factors; 2.sup.M processing units, configured to perform N rounds of coefficient computation operations in parallel, wherein M is an integer greater than 1 and smaller than N; and a data flow controller, configured to control access addresses of the 2.sup.M processing units for accessing the first coefficient memory, the second coefficient memory and the twiddle factor memory; wherein: in each odd-number round of the N rounds of coefficient computation operations: the 2.sup.M processing units perform 2.sup.(N−M−1) rounds of first calculation procedures to read 2.sup.N first coefficients from the first coefficient memory, read at least one first twiddle factor from the twiddle factor memory, and perform a modulo calculation; and the 2.sup.M processing units perform 2.sup.(N−M−1) rounds of first writing procedures to write 2.sup.N first output coefficients generated during computation to the second coefficient memory; and in each even-number round of the N rounds of coefficient computation operations: the 2.sup.M processing units perform 2.sup.(N−M−1) rounds of second calculation procedures to read 2.sup.N second coefficients from the second coefficient memory, read at least one second twiddle factor from the twiddle factor memory, and perform a modulo calculation; and the 2.sup.M processing units perform 2.sup.(N−M−1) rounds of second writing procedures to write 2.sup.N second output coefficients generated during computation to the first coefficient memory.
2. The calculator of claim 1, wherein: the first coefficient memory comprises 2.sup.(M+1) first coefficient storage blocks each configured to store 2.sup.(N−M−1) coefficients; and the second coefficient memory comprising 2.sup.(M+1) second coefficient storage blocks each configured to store 2.sup.(N−M−1) coefficients.
3. The calculator of claim 2, wherein: in each round of the first calculation procedures, each processing unit reads a first coefficient from each of two first coefficient storage blocks of the 2.sup.(M+1) first coefficient storage blocks; and in each round of the second calculation procedures, each processing unit reads a second coefficient from each of two second coefficient storage blocks of the 2.sup.(M+1) second coefficient storage blocks.
4. The calculator of claim 3, wherein: in each round of first calculation procedure, each processing unit reads the one first coefficient from each of the two first coefficient storage blocks according to a same address; and in each round of second calculation procedure, each processing unit reads the one second coefficient from each of the two second coefficient storage blocks according to a same address.
5. The calculator of claim 2, wherein: in each round of first writing procedure, each processing unit writes two first output coefficients generated during computation to two second coefficient storage blocks of the 2.sup.(M+1) second coefficient storage blocks; and in each round of second calculation procedure, each processing unit writes two second output coefficients generated during computation to two first coefficient storage blocks of the 2.sup.(M+1) first coefficient storage blocks.
6. The calculator of claim 5, wherein: in each round of first calculation procedure, each processing unit writes the two first output coefficients to the two second coefficient storage blocks according to the same address; and in each round of second calculation procedure, each processing unit writes the two second output coefficients to the two first coefficient storage blocks according to a same address.
7. The calculator of claim 1, wherein: in each round of first calculation procedure, each processing unit performs a modulo calculation according to the two first coefficients read from the first coefficient memory and a first twiddle factor read from the twiddle factor memory; and in each round of second calculation procedure, each processing unit performs a modulo calculation according to the two second coefficients read from the second coefficient memory read and a second twiddle factor read from the twiddle factor memory.
8. The calculator of claim 7, wherein a first processing unit of the 2.sup.M processing units comprises: a modular multiplication unit, configured to perform a modular multiplication calculation on one of the two first coefficients according to a predetermined modulus and the first twiddle factors to generate a first value; a modular addition unit, configured to perform a modular addition calculation on the other of the two first coefficients according to the predetermined modulus and the first value to generate a first to-be-arranged coefficient; a modular subtraction unit, configured to perform a modular subtraction calculation on the other of the two first coefficients and the first value according to the predetermined modulus to generate a second to-be-arranged coefficient; and a coefficient exchange unit, configured to rearrange an output order of a plurality of first to-be-arranged coefficients generated by the modular addition unit and a plurality of second to-be-arranged coefficients generated by the modular subtraction unit for performing a plurality of subsequent first writing procedures.
9. The calculator of claim 8, wherein the coefficient exchange unit comprises: a first register, having an input terminal and an output terminal, wherein the input terminal of the first register is coupled to the modular addition unit; a second register, having an input terminal and an output terminal, wherein the input terminal of the second register is coupled to the modular subtraction unit; a third register, having an input terminal and an output terminal, wherein the input terminal of the third register is coupled to the output terminal of the second register; a first multiplexer, having a first input terminal, a second input terminal and an output terminal, wherein the first input terminal of the first multiplexer is coupled to the output terminal of the first register, and the second input terminal of the first multiplexer is coupled to the output terminal of the third register; a fourth register, having an input terminal and an output terminal, wherein the input terminal of the fourth register is coupled to the output terminal of the first multiplexer, and the output terminal of the fourth register is configured to output a first arranged output coefficient; and a second multiplexer, having a first input terminal, a second input terminal and an output terminal, wherein the first input terminal of the second multiplexer is coupled to the output terminal of the first register, the second input terminal of the second multiplexer is coupled to the output terminal of the third register, and the output terminal of the second multiplexer is configured to output a second arranged output coefficient; wherein: the first multiplexer alternately outputs data received by the first input terminal and the second input terminal of the first multiplexer; when the first multiplexer outputs data received by the first input terminal of the first multiplexer, the second multiplexer outputs data received by the second input terminal of the second multiplexer; and when the first multiplexer outputs data received by the second input terminal of the first multiplexer, the second multiplexer outputs data received by the first input terminal of the second multiplexer.
10. The calculator of claim 1, wherein the twiddle factor memory comprises 2.sup.M twiddle factor storage blocks, wherein the 2.sup.M twiddle factor storage blocks comprise a first twiddle factor storage block configured to store (2.sup.(N−M)−1+2.sup.(N−M−1)×M) twiddle factors, and 2.sup.(M−i)i.sup.th twiddle factor storage blocks configured to store (2.sup.(N−M−1)×i) twiddle factors, wherein i is an integer between 1 and M.
11. The calculator of claim 10, wherein, in the first (N−M) rounds of coefficient computation operations of the N rounds of coefficient computation operations, the 2.sup.M processing units read at least one twiddle factor from the first twiddle factor storage blocks, and in the last k rounds of coefficient computation operations of the N rounds of coefficient computation operations, the 2.sup.M processing units blocks read 2.sup.(N−K) twiddle factors from 2.sup.(M−K+1) twiddle factor storages of the 2.sup.M twiddle factor storage blocks, wherein k is an integer between 1 and M.
12. A calculation method, wherein, configured to perform a number-theoretic transformation on a 2.sup.N-dimensional polynomial, and the method comprises: in an initial period: storing 2.sup.N coefficients of the 2.sup.N-dimensional polynomial in a first coefficient memory; and storing (2.sup.N−1) twiddle factors corresponding to the 2.sup.N-dimensional polynomial in a twiddle factor memory; in a computation period, using 2.sup.M processing units to perform N rounds of coefficient computation operations in parallel, comprising: in each odd-number round of the N rounds of coefficient computation operations: allowing the 2.sup.M processing units to perform 2.sup.(N−M−1) rounds of first calculation procedures to read 2.sup.N first coefficients from the first coefficient memory, read at least one first twiddle factor from the twiddle factor memory, and perform modulo calculation; and allowing the 2.sup.M processing units to perform 2.sup.(N−M−1) rounds of first writing procedures to write 2.sup.N first output coefficients generated during computation to the second coefficient memory; and in each even-number round of the N rounds of coefficient computation operations: allowing the 2.sup.M processing units to perform 2.sup.(N−M−1) rounds of second calculation procedures to read 2.sup.N second coefficients from the second coefficient memory, read at least one second twiddle factor from the twiddle factor memory, and perform a modulo calculation; and allowing the 2.sup.M processing units to perform 2.sup.(N−M−1) rounds of second writing procedures to write 2.sup.N second output coefficients generated during computation to the first coefficient memory; wherein N is an integer greater than 1, and M is an integer greater than 1 and smaller than N.
13. The method of claim 12, wherein: the first coefficient memory comprises 2.sup.(M+1) first coefficient storage blocks, and the second coefficient memory comprises 2.sup.(M+1) second coefficient storage blocks; and the step of storing the 2.sup.N coefficients of the 2.sup.N-dimensional polynomial in the first coefficient memory comprises allowing the 2.sup.(M+1) first coefficient storage blocks to respectively store 2.sup.(N−M−1) coefficients.
14. The method of claim 13, wherein: the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of first calculation procedures comprises, in each round of first calculation procedure, allowing each processing unit to read a first coefficient from each of two first coefficient storage blocks of the 2.sup.(M+1) first coefficient storage blocks; and the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of second calculation procedure comprises, in each round of second calculation procedure, allowing each processing unit to read a second coefficient from each of two second coefficient storage blocks of the 2.sup.(M+1) second coefficient storage blocks.
15. The method of claim 14, wherein: in each round of first calculation procedure, each processing unit reads the one first coefficient from each of the two first coefficient storage blocks according to a same address; and in each round of second calculation procedure, each processing unit reads the one second coefficient from each of the two second coefficient storage blocks according to a same address.
16. The method of claim 13, wherein: the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of first writing procedures comprises, in each round of first writing procedure, allowing each processing unit to write two first output coefficients generated during computation to two second coefficient storage blocks of the 2.sup.(M+1) second coefficient storage blocks; and the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of second writing procedures comprises, in each round of second calculation procedure, allowing each processing unit to write two second output coefficients generated during computation to two first coefficient storage blocks of the 2.sup.(M+1) first coefficient storage blocks.
17. The method of claim 13, wherein: in each round of first calculation procedure, each processing unit writes the two first output coefficients to the two second coefficient storage blocks according to a same address; and in each round of second calculation procedure, each processing unit writes the two second output coefficients to the two first coefficient storage blocks according to a same address.
18. The method of claim 12, wherein: the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of first calculation procedures comprises, in each round of first calculation procedure, allowing each processing unit to perform a modulo calculation according to two first coefficients read from the first coefficient memory and a first twiddle factor read from the twiddle factor memory; and the step of allowing the 2.sup.M processing units to perform the 2.sup.(N−M−1) rounds of second calculation procedures comprises, in each round of second calculation procedure, allowing each processing unit to perform a modulo calculation according to two second coefficients read from the second coefficient memory and a second twiddle factor read from the twiddle factor memory.
19. The method of claim 18, wherein, the step of allowing each processing unit to perform the modulo calculation according to the two first coefficients read from the first coefficient memory and the at least one first twiddle factor read from the twiddle factor memory comprises: performing a modular multiplication calculation on one of the two first coefficients and the first twiddle factors according to a predetermined modulus to generate a first value; performing a modular addition calculation on the other of the two first coefficients and the first value according to the predetermined modulus to generate a first to-be-arranged coefficient; and performing a modular subtraction calculation on the other of the two first coefficients and the first value according to the predetermined modulus to generate a second to-be-arranged coefficient; the method further comprises: rearranging an output order of a plurality of first to-be-arranged coefficients and a plurality of second to-be-arranged coefficients generated during a plurality of calculation procedures of each processing unit to perform a plurality of subsequent first writing procedures.
20. The method of claim 12, wherein the twiddle factor memory comprises 2.sup.M twiddle factor storage blocks, and the step of storing the (2.sup.N−1) twiddle factors corresponding to the 2.sup.N-dimensional polynomial to the twiddle factor memory comprises: storing (2.sup.(N−M)−1+2.sup.(N−M−1)×M) twiddle factors in first twiddle factor storage blocks of the 2.sup.M twiddle factor storage blocks; and storing (2.sup.(N−M−1)×i) twiddle factors in the i.sup.th twiddle factor storage block of the 2.sup.(M−i) twiddle factor storage blocks, wherein i is an integer between 1 and M.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
DETAILED DESCRIPTION
[0019] The following disclosure provides various different embodiments or examples for implementing different features of the present disclosure. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various embodiments. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
[0020] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in the respective testing measurements. Also, as used herein, the term “about” generally means within 10%, 5%, 1%, or 0.5% of a given value or range. Alternatively, the term “generally” means within an acceptable standard error of the mean when considered by one of ordinary skill in the art. As could be appreciated, other than in the operating/working examples, or unless otherwise expressly specified, all of the numerical ranges, amounts, values, and percentages (such as those for quantities of materials, duration of times, temperatures, operating conditions, portions of amounts, and the likes) disclosed herein should be understood as modified in all instances by the term “generally.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the present disclosure and attached claims are approximations that can vary as desired. At the very least, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Here, ranges can be expressed herein as from one endpoint to another endpoint or between two endpoints. All ranges disclosed herein are inclusive of the endpoints, unless specified otherwise.
[0021]
[0022] The first coefficient memory 110 can store 2.sup.N coefficients P[0] to P[2.sup.N−1] of the 2.sup.N-dimensional polynomial P1 in an initial period, and can store the data outputted by the processing unit 140 during computation. The second coefficient memory 120 can store the data outputted by the processing unit 140 during computation, and the twiddle factor memory 130 can store (2.sup.N−1) twiddle factors ω[1] to ω[2.sup.N−1] required for performing the number-theoretic transformation on the polynomial P1. Generally, the twiddle factors ω[1] to ω[2.sup.N−1] can be calculated in advanced according to the algorithm of the number-theoretic transformation.
[0023] Further, 2.sup.M processing units 140 can perform the modulo calculation required by the number-theoretic transformation according to the coefficients stored in the first coefficient memory 110 or the second coefficient memory 120 and the twiddle factor stored in the twiddle factor memory 130, and the data flow controller 150 can control the access addresses of the 2.sup.M processing units 140 for accessing the first coefficient memory 110, the second coefficient memory 120 and the twiddle factor memory 130 so as to ensure that the 2.sup.M processing units 140 can obtain the correct coefficients for performing the computation.
[0024] In the present embodiment, the calculator 100 can perform the computation of the number-theoretic transformation using an iterative approach, such as using the algorithm proposed by Cooley and Tukey.
[0025] For example, in the first coefficient computation operation, the twiddle factor ω[1] may be adopted to perform the 2.sup.(N-1) round of modulo calculations, and in the second coefficient computation operation, the twiddle factors ω[2] may be adopted to perform the 2.sup.(N-2) round of modulo calculations, and the twiddle factors ω[3] may be adopted to perform the 2.sup.(N-2) round of modulo calculations, and so on so forth. In such case, in each round of coefficient computation operation, the calculator 100 may perform modulo calculations on 2.sup.N input coefficients according to corresponding twiddle factors and generate 2.sup.N output coefficients.
[0026] In the present embodiment, the 2.sup.M processing units 140 can perform modulo calculations of N round of coefficient computation operations in parallel; such as the content calculated in the third layer of the for-loop of
[0027] Since the total number of coefficients read and generated in each coefficient computation operation is fixed (i.e. the total number is 2.sup.N), in the present embodiment, the first coefficient memory 110 and the second coefficient memory 120 can respectively have sufficient space for storing 2.sup.N coefficients, and the data flow controller 150 can alternately allowing the processing unit 140 to read the coefficient from one of the first coefficient memory 110 and the second coefficient memory 120, and write the calculation result in the other of the first coefficient memory 110 and the second coefficient memory 120. For example, in the first round of coefficient computation operation, the data flow controller 150 can control the processing unit 140 to read coefficients P[0] to P[2.sup.N−1] of the 2.sup.N-dimensional polynomial P1 from the first coefficient memory 110, and after performing the computation, control the processing unit 140 to store the computation result in the second coefficient memory 120. Next, in the second round of coefficient computation operation, the data flow controller 150 can control the processing unit 140 to read the coefficient obtained by the previous calculation from the second coefficient memory 120, and after performing the computation, control the processing unit 140 to store the computation result in the first coefficient memory 110 for the use in the next round of coefficient computation operation. In other words, in odd-number rounds of coefficient computation operations, the data flow controller 150 can control the processing unit 140 to read the coefficients from the first coefficient memory 110 to perform computation, and write the computation results to the second coefficient memory 120; whereas in even-number rounds of coefficient computation operations, the data flow controller 150 can control the processing unit 140 to read the coefficients from the second coefficient memory 120 to perform computation, and write the computation results to the first coefficient memory 110.
[0028] Further, since the algorithm of number-theoretic transformation is fixed, when performing the number-theoretic transformation on different polynomials, the order in which the coefficients may be accessed in each round should also be fixed and known. In such case, by properly arranging the order of reading and writing of coefficients, it is possible to read coefficients from the first coefficient memory 110 and write the output coefficients to the second coefficient memory 120 according to the same addresses for each odd-number round of coefficient computation operation. Similarly, it is possible to read coefficients from the second coefficient memory 120 and write the output coefficients to the first coefficient memory 110 according to the same addresses for each even-number round of coefficient computation operation. In this way, the access operations of 2.sup.M processing units on the first coefficient memory 110 and the second coefficient memory 120 can be simplified, thereby simplifying the operation of the calculator 100 and improving the performance of the 2.sup.M processing units 140 when performing parallel computations.
[0029]
[0030] In the present embodiment, Step S210 and Step S220 can be performed in an initial period before the computation operation is executed. In Step S210, the 2.sup.N coefficients P[0] to P[2.sup.N−1] in the 2.sup.N-dimensional polynomial P1 can be stored in the first coefficient memory 110, and in Step S220, (2.sup.N−1) the twiddle factors ω[1] to ω[2.sup.N−1] corresponding to the 2.sup.N-dimensional polynomial P1 can be stored in the twiddle factor memory 130.
[0031] During the computation, the calculator 100 may use the 2.sup.M processing units to perform Steps S240 to S280 to complete the N rounds of coefficient computation operations, and then proceed to Step S290 to complete the computation after said N rounds of coefficient computation operations.
[0032] In Step S240, the calculator 100 can first determine whether the coefficient computation operation currently being performed is an odd-number round (such as the first round, the third round or the fifth round) or an even-number round (such as the second round, the fourth round or the sixth round). In Step S240, when the calculator 100 determines that the coefficient computation operation currently being performed is an odd-number round, it can then perform Step S250 and Step S260, whereas when the calculator 100 determines that the coefficient computation operation currently being performed is an even-number round, it can then perform Step S270 and Step S280.
[0033] As shown in
[0034] According to the algorithm of number-theoretic transformation, the first calculation procedure in the odd-number rounds of coefficient computation operations and the second calculation procedures in the even-number rounds of coefficient computation operations may include substantially the same operations with the difference in the coefficients and twiddle factors that the two read. Further, in each round, when performing the first calculation procedure of the odd-number round of coefficient computation operation or the second calculation procedure of the even-number round of coefficient computation operation, each processing unit 140 performs the modulo calculations according to two coefficients and one twiddle factors to generate two output coefficients. In such case, to allow the 2.sup.M processing units to effectively perform computations in parallel, the first coefficient memory 110 can include 2.sup.(M+1) first coefficient storage blocks, and each first coefficient storage block can store 2.sup.(N−M−1) coefficients. Similarly, the second coefficient memory 120 can also include 2.sup.(M+1) second coefficient storage blocks, and each second coefficient storage block can store 2.sup.(N−M−1) coefficients. In this way, when performing each round of first calculation procedure or second calculation procedure, each processing unit 140 can read the required coefficients from two corresponding first coefficient storage blocks or two corresponding second coefficient storage blocks.
[0035]
[0036] In such case, in each round of first calculation procedure of Step S250, each of the processing units 1401 to 1404 may respectively read one first coefficient from each of two first coefficient storage blocks of the first coefficient storage blocks 1121 to 1128, and in each round of second calculation procedure of Step S270, each of the processing units 1401 to 1404 may respectively read one first coefficient from each of two second coefficient storage blocks of the second coefficient storage blocks 1221 to 1228.
[0037] Further, in each first writing procedure of Step S260, each of the processing units 1401 to 1404 may also write two first output coefficients generated during the computation to two second coefficient storage blocks of second coefficient storage blocks 1221 to 1228, and in each second writing procedure of Step S280, each of the processing units 1401 to 1404 may then write two second output coefficients generated during the computation to two first coefficient storage blocks of the first coefficient storage blocks 1121 to 1128.
[0038] In some embodiments, in order to simplify the access operations of the processing units 1401 to 1404 on the first coefficient memory 110 and the second coefficient memory 120, the calculator 100 can store 32 coefficients P[0] to P[31] of the polynomial P1 and output coefficients generated by each round of coefficient computation operation according to a specific order. That is, in each round of first calculation procedure of Step S250, each of the processing units 1401 to 1404 may read two first coefficients from two corresponding first coefficient storage blocks according to the same address, and in each first writing procedure of Step S260, each of the processing units 1401 to 1404 can write two first output coefficients to two second coefficient storage blocks according to the same address. Similarly, in each round of second calculation procedure of Step S270, each of the processing units 1401 to 1404 may read two second coefficients from two corresponding second coefficient storage blocks according to the same address, and in each second writing procedure of Step S280, each of the processing units 1401 to 1404 can also write two second output coefficients to two first coefficient storage blocks according to the same address.
[0039]
[0040] As shown in
[0041] As shown in
[0042] As shown in
[0043] As shown in
[0044] In some embodiments, in the first to fourth rounds of second calculation procedures in Step S270, the processing units 1401 to 1404 can also read corresponding coefficients from the second coefficient storage blocks 1221 to 1228 according to the addresses and orders shown in
[0045]
[0046] As shown in
[0047] As shown in
[0048] As shown in
[0049] As shown in
[0050] In some embodiments, in the first to fourth rounds of second writing procedures in Step S280, the processing units 1401 to 1404 can also write two second output coefficients in the first coefficient storage blocks 1121 to 1128 according to the addresses and orders shown in
[0051] Further, as shown in
[0052] Further, as shown in
[0053] In the present embodiment, in addition to accessing the first coefficient memory 110 and the second coefficient memory 120 according to a specific order to simplify the wiring connections between 2.sup.M processing units 140 and the first coefficient memory 110 and the second coefficient memory 120 and reducing the operational complexity, the calculator 100 can also store the the twiddle factors ω[1] to ω[2.sup.N−1] required for the number-theoretic transformation algorithm according to a specific order.
[0054] According to the number-theoretic transformation algorithm of
[0055] Further, when the number of twiddle factors required for a specific round of coefficient computation operation is not greater than the total number of rounds (that is, 2.sup.(N−M−1) rounds) of calculation procedures to be performed in each coefficient computation operation, the 2.sup.M processing units 140 may use one single twiddle factor in each round of first calculation procedures or second calculation procedure, whereas different twiddle factors can be used in different rounds of first calculation procedures or second calculation procedures. In such case, the 2.sup.M processing units 140 can still read corresponding twiddle factors from the same twiddle factor storage block.
[0056] However, when the number of twiddle factors required for one coefficient computation operation is greater than the total number of rounds (that is, 2.sup.(N−M−1) rounds) of calculation procedures to be performed in each coefficient computation operation, different processing units 140 may simultaneously use different twiddle factors in each calculation procedure to perform modulo calculations so as to maintain the performance of parallel computation; in such case, because each twiddle factor storage block has only one read/write terminal, the 2.sup.M processing units 140 must read the required twiddle factors from different twiddle factor storage blocks.
[0057] In such case, to maintain the performance of the parallel computation of the processing units 140 and hardware usage rate of the twiddle factor memory 130, the twiddle factor memory 130 can have 2.sup.M twiddle factor storage blocks, which include a first twiddle factor storage block for storing (2.sup.(N-M)−1+2.sup.(N−M−1)×M) twiddle factors, and 2.sup.(M−i)i.sup.th twiddle factor storage blocks for storing (2.sup.(N−M−1)×i) twiddle factors, wherein i is an integer between 1 and M. In this way, in the first (N−M) rounds of coefficient computation operations of the N rounds of coefficient computation operations, the 2.sup.M processing units 140 can read at least one twiddle factor from the first twiddle factor storage block, whereas in the last k rounds of coefficient computation operations of the N rounds of coefficient computation operations, the 2.sup.M processing units 140 can read 2.sup.(N−K) twiddle factors from the 2.sup.(M−K+1) twiddle factor storages blocks of the 2.sup.M twiddle factor storage blocks, wherein k is an integer between 1 and M.
[0058]
[0059] In such case, as shown in
[0060] In the second coefficient computation operation SB, since only two twiddle factors are required, in the first calculation procedures PD1B and PD2B of the second coefficient computation operation SB, the processing units 1401 to 1404 can read the twiddle factor ω[2] from the twiddle factor storage block 1321, and in the first calculation procedures PD3B and PD4B, the processing units 1401 to 1404 can read the twiddle factor ω[3] from the twiddle factor storage block 1321, and so on so forth. Accordingly, in the four rounds of first calculation procedures PD1C, PD2C, PD3C and PD4C of the third coefficient computation operation SC, the processing units 1401 to 1404 can sequentially read twiddle factors ω[4], ω[5], ω[6] and ω[7] from the twiddle factor storage blocks 1321.
[0061] Next, in the fourth coefficient computation operation SD, since the number of required twiddle factors exceeds the total number 2.sup.(N−M−1) of the calculation procedure required to be performed in each coefficient computation operation; that is, the number of the required twiddle factors is greater than 4, in the four rounds of first calculation procedures PD1D, PD2D, PD3D and PD4D of the fourth coefficient computation operation SD, the processing units 1401 and 1403 can sequentially read twiddle factors ω[8], ω[9], ω[10] and ω[11] from the twiddle factor storage block 1321, whereas the processing units 1402 and 1404 can sequentially read twiddle factors ω[12], ω[13], ω[14] and ω[15] from the twiddle factor storage block 1322.
[0062] Lastly, in the four rounds of first calculation procedures PD1E, PD2E, PD3E and PD4E of the fifth coefficient computation operation SE, the processing unit 1401 can sequentially read twiddle factors ω[16], ω[17], ω[18] and ω[19] from the twiddle factor storage block 1321, the processing unit 1402 can sequentially read twiddle factors ω[20], ω[21], ω[22] and ω[23] from the twiddle factor storage block 1322, the processing unit 1403 can sequentially read twiddle factors ω[24], ω[25], ω[26] and ω[27] from the twiddle factor storage block 1323, whereas the processing unit 1404 can sequentially read twiddle factors ω[28], ω[29], ω[30] and ω[31] from the twiddle factor storage block 1324.
[0063] As a result, the parallel computation performance of the processing unit 140 can be maintained without unnecessarily increase the capacity of the twiddle factor memory 130.
[0064] Further, in each round of calculation procedure in Step S250 and 270, each of the processing units 1401 to 1404 may perform the calculation in the third layer of the for-loop as shown in
[0065] As shown in
[0066] For example, in the first calculation procedure of the first round of coefficient computation operation, the processing unit 1401 can read two first coefficients P[0] and P[16] from the first coefficient storage blocks 1121 and 1125 and can read the corresponding first twiddle factor ω[1] from the twiddle factor memory 130. The modular multiplication unit 142 can perform a modular multiplication calculation on the first coefficient P[16] and the first twiddle factors ω[1] according to a predetermined modulus q to generate a first value V. Next, the modular addition unit 144 can perform a modular addition calculation on the first coefficient P[0] and the first value V according to the predetermined modulus q to generate a first to-be-arranged coefficient P′[16], whereas the modular subtraction unit 146 can perform a modular subtraction calculation on the first coefficient P[0] and the first value V according to the predetermined modulus q to generate a second to-be-arranged coefficient P′[0].
[0067] In the present embodiment, the processing unit 1401 may not directly perform the first writing procedure after generating the to-be-arranged coefficients P′[0] and P′[16] and directly write the to-be-arranged coefficientz P′[0] and P′[16] in the second coefficient memory 120, in order to maintain the order of storing each coefficient in the second coefficient memory 120 so that in the calculation procedures of the second round of coefficient computation operation, the processing units 1401 to 1404 can still read coefficients from the second coefficient memory 120 according to the addresses and order shown in
[0068]
[0069] In the present embodiment, the multiplexer MUX1 alternately outputs data received by the first input terminal of multiplexer MUX1 and data received by the second input terminal of multiplexer MUX1. Further, when the multiplexer MUX1 outputs the data received by the first input terminal of the multiplexer MUX1, the multiplexer MUX2 may output the data received by the second input terminal of the multiplexer MUX2; when the multiplexer MUX1 outputs the data received by the second input terminal of the multiplexer MUX1, the multiplexer MUX2 may output the data received by the first input terminal of the multiplexer MUX2.
[0070] In such case, the coefficient exchange unit 148 may alternately output the to-be-arranged coefficients obtained by calculation in the calculation procedure by the processing unit 1401. For example, if the processing unit 1401 follows the order shown in
[0071] As a result, in the second round of coefficient computation operation, the processing unit 1401 can follow the same addresses and order according to
[0072] In other words, the processing units 1401 to 1404 can rearrange the order of output coefficients with the coefficient exchange unit 148, such that in each calculation procedure, the processing units 1401 to 1404 can obtain corresponding coefficients from the first coefficient memory 110 or the second coefficient memory 120 according to the addresses and order of
[0073] In view of the foregoing, the calculator and calculation method of the present disclosure can perform modulo calculations of number-theoretic transformation using multiple processing units in parallel, and can access the data in two coefficient memories according to a specific order, thereby simplifying the wirings between the processing units and coefficient memories and improving the overall computation performance thereof.
[0074] The foregoing description briefly sets forth the features of some embodiments of the present application so that persons having ordinary skill in the art more fully understand the various aspects of the disclosure of the present application. It may be apparent to those having ordinary skill in the art that they can easily use the disclosure of the present application as a basis for designing or modifying other processes and structures to achieve the same purposes and/or benefits as the embodiments herein. It should be understood by those having ordinary skill in the art that these equivalent implementations still fall within the spirit and scope of the disclosure of the present application and that they may be subject to various variations, substitutions, and alterations without departing from the spirit and scope of the present disclosure.