Montgomery multiplication method for performing final modular reduction without comparison operation and montgomery multiplier

09811318 · 2017-11-07

Assignee

Inventors

Cpc classification

International classification

Abstract

A Montgomery multiplier includes a partial product computing unit for multiplying a multiplicand and a multiplier; a modulus reduction computing unit for performing a multiplication of a modulus and a quotient that reflects a quotient sign; an accumulation unit for accumulating in a intermediate value an output value of the partial product computing unit and an output value of the modulus reduction computing unit from a previous cycle; a quotient computing unit for receiving an accumulation value of the accumulation unit during a current cycle and calculating a quotient sign to be used during a next cycle; and a quotient sign determination unit for determining a quotient sign to be used during a next cycle from the multiplicand, the multiplier and the quotient.

Claims

1. A Montgomery multiplier apparatus, comprising: a partial product computing unit configured to multiply a multiplicand and a multiplier; a modulus reduction computing unit configured to multiply a modulus and a quotient with a quotient sign, wherein the quotient is zero during a first cycle; an accumulation unit configured to accumulate in a intermediate value an output value of the partial product computing unit and an output value of the modulus reduction computing unit from a previous cycle; a quotient computing unit configured to receive an accumulation value of the accumulation unit during a current cycle and to calculate a quotient to be used during a next cycle; and a quotient sign determination unit configured to determine the quotient sign to be used during the next cycle from the multiplicand, the multiplier and the quotient.

2. The Montgomery multiplier of claim 1, wherein the multiplicand, the multiplier and the quotient are represented by radix-based bits.

3. The Montgomery multiplier of claim 1, further comprising a first booth recoding unit configured to booth recode the multiplier.

4. The Montgomery multiplier of claim 1, further comprising a second booth recoding unit configured to booth recode the quotient.

5. The Montgomery multiplier of claim 1, wherein the multiplicand and the multiplier are each an integer whose value is in a range -m to m, and wherein m is the modulus.

6. The Montgomery multiplier of claim 1, wherein the quotient sign determination unit determines the quotient sign by tracking a quotient being calculated during each cycle.

7. The Montgomery multiplier of claim 6, wherein the quotient sign for a next cycle is negative if the sign of the quotient and the sign of a product of the multiplicand and the multiplier are both positive, the quotient sign is positive if the sign of the quotient and the sign of the product of the multiplicand and the multiplier are both negative, and the quotient sign is zero otherwise.

8. A Montgomery multiplier apparatus, comprising: a partial product computing unit configured to calculate a partial product of a multiplier and a multiplicand during a current cycle; a quotient sign determination unit configured to determine a quotient sign of a quotient calculated during a previous cycle, wherein the quotient is zero during a first cycle; a modulus reduction computing unit configured to calculate a modulus reduction by multiplying a modulus of the multiplier and multiplicand by the quotient based on the determined quotient sign; an accumulation unit configured to accumulate in a intermediate value the partial product and the modulus reduction from a previous cycle, and a quotient computing unit configured to calculate the quotient from the intermediate value, wherein the quotient is used during a next cycle.

9. The Montgomery multiplier apparatus of claim 8, further comprising a first booth recoding unit configured to booth-recode the multiplier.

10. The Montgomery multiplier apparatus of claim 8, further comprising a second booth recoding unit configured to booth-recode the calculated quotient.

11. The Montgomery multiplier apparatus of claim 8, wherein the quotient sign determination unit determines the quotient sign by tracking a quotient during every cycle.

12. The Montgomery multiplier apparatus of claim 11, wherein the quotient sign is determined by a sign of a most significant bit of the tracked quotient.

13. A Montgomery multiplier apparatus, comprising: a modulus reduction computing unit configured to multiply a modulus and a quotient with a quotient sign, wherein the quotient is zero during a first cycle; an accumulation unit configured to accumulate in a intermediate value a product of a multiplicand and a multiplier received from another unit and an output value of the modulus reduction computing unit from a previous cycle; a quotient computing unit configured to receive an accumulation value of the accumulation unit during a current cycle and to calculate a quotient sign to he used during a next cycle; and a quotient sign determination unit configured to determine the quotient sign to be used during a next cycle from a quotient being calculated during each cycle from the multiplicand, the multiplier and the quotient.

14. The Montgomery multiplier of claim 13, wherein the another unit is a partial product computing unit configured to multiply the multiplicand and the multiplier.

15. The Montgomery multiplier of claim 14, further comprising a first booth recoding unit configured to booth recode the multiplier, and a second booth recoding unit configured to booth recode the quotient.

16. The Montgomery multiplier of claim 13, wherein the quotient sign for the next cycle is negative if the sign of the quotient and the sign of the product of the multiplicand and the multiplier are both positive, the quotient sign is positive if the sign of the quotient and the sign of the product of the multiplicand and the multiplier are both negative, and the quotient sign is zero otherwise.

17. The Montgomery multiplier of claim 13, wherein the multiplicand, the multiplier and the quotient are each integers represented by radix-based bits, and the multiplicand and the multiplier each have a value in a range -m to m, and wherein m is the modulus.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates the concept of a Montgomery multiplier in accordance with some embodiments of the inventive concept.

(2) FIG. 2 is a block diagram of a Montgomery multiplier in accordance with some embodiments of the inventive concept.

(3) FIG. 3 is a flow chart of a method of a Montgomery multiplier in accordance with some embodiments of the inventive concept.

(4) FIG. 4 is a block diagram of a security system that includes a crypto processor having a modular multiplier in accordance with some embodiments of the inventive concept.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

(5) Exemplary embodiments of inventive concept will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. In the drawings, the size and relative sizes of layers and regions may be exaggerated for clarity. Like numbers may refer to like elements throughout.

(6) A Montgomery multiplication algorithm for a multiplication operation in accordance with some embodiments of the inventive concept may include a final reduction, which is an operation of reducing an output range from (−2m, 2m) to (−m, m). Thus, it is not necessary to compare an output value with a modulus size. That is, an operation cycle for a final reduction operation and its related logic are not necessary.

(7) The table below shows a general Montgomery multiplication algorithm.

(8) TABLE-US-00001 Algorithm Montgomery multiplication INPUT: integers m = (m.sub.n-1...m.sub.1m.sub.0).sub.b, x = (x.sub.n-1...x.sub.1x.sub.0).sub.b, y = (y.sub.n-1...y.sub.1y.sub.0).sub.b with −m ≦x, y <m, R = b.sup.n with gcd(m,b) = 1, and m′ = −m.sup.−1mod b OUTPUT: xyR.sup.−1 mod m  1. A ← 0. (Notation: A = (a.sub.na.sub.n-1...a.sub.1a.sub.0).sub.b)  2. For i from 0 to (n−1) do the following:   2.1 q.sub.i ← (a.sub.0 + x.sub.iy.sub.0)m’ mod b   2.2 A ← (A + x.sub.iy + q.sub.im)/b  3. If A ≧ m then A ← A − m  4. Return (A)

(9) The input of a Montgomery multiplication includes a modulus (m), a multiplicand (x) and a multiplier (y). The modulus (m), the multiplicand (x) and the multiplier (y) are input as a radix-based digit value. In the case of the Montgomery multiplication algorithm disclosed above, the multiplicand (x) and the multiplier (y) have values in the range of (0, m). A Montgomery constant R is radix (b) raised to the nth power. The exponent n is an integer greater than 2. The modulus (m) and the radix (b) are relatively prime.

(10) The output of the Montgomery multiplication is xyR.sup.−1 mod m.

(11) After executing Step 2 of the Montgomery algorithm disclosed above, an intermediate value A has a value in the range of [0, 2 m). Thus, Step 3 is performed to reduce the intermediate value A for the next operation. However, executing Step 3 may need complicated hardware for the logic that compares A with m, and many operating cycles, since a cycle is needed to read A and m from an internal memory, to store A and m in the memory, to compare A and m, and to subtract m from A. Implementing a Montgomery multiplier with a multi-precision may further increase the cost and reduce performance. Since the reduction operation (A-m) is selectively executed depending on the intermediate value A, it may be vulnerable to a side channel attack.

(12) When implementing a Montgomery multiplier in hardware, a partial product (x.sub.iy, q.sub.im) of a Step 2.2 can use a booth recoding. Since the type of x, or q, of the partial product (x.sub.iy, q.sub.im) changes from {0, 1, 2, 3} to {−2, −1, 0, 1, 2}, the booth recoding does not need to implement an operation with respect to 3. Thus, a hardware cost can be reduced.

(13) In this case, the algorithm may be modified as follows. For convenience of description, it is assumed that n is an even number. A negative number is expressed by 2's complement. A high rank bit including a.sub.n is a sign padding. A table below illustrates a Montgomery multiplication algorithm using a booth recoding.

(14) TABLE-US-00002 Algorithm Montgomery multiplication (booth recoding) INPUT: integers m = (m.sub.n-1...m.sub.1m.sub.0).sub.b, x = (x.sub.n-1...x.sub.1x.sub.0).sub.b, y = (y.sub.n-1...y.sub.1y.sub.0).sub.b with −m ≦x, y <m, R = b.sup.n with gcd(m,b) = 1, and m′ = −m.sup.−1mod b OUTPUT: xyR.sup.−1 mod m   1. A ← 0. (Notation: A = (a.sub.na.sub.n-1...a.sub.1a.sub.0).sub.b)   2. For i from 0 to n step 2 do the following:    2.1 B_x.sub.i = booth_recoding (x.sub.i+1x.sub.ix.sub.i−1), where x.sub.−1 = 0    2.2 q.sub.i ← (a.sub.0+ B_x.sub.iy.sub.0)m’ mod b.sup.2    2.3 B_q.sub.i ← booth_recoding (q.sub.i)    2.4 A ← (A + B_x.sub.iy + B_q.sub.im )/b.sup.2   3. If A ≦ −m then A ← A + m else if A ≧ m then A ← A − m   4. Return (A)

(15) If a booth recoding is used to reduce the hardware cost, an additional operation for processing a signed number may be needed, which is the step when i=n.

(16) The reason a final reduction operation is needed in the above-described Montgomery multiplication algorithm will be described below. For convenience of description, it is assumed that a radix (b) is 2. Letting an input be x y, and m, a Montgomery multiplication operation satisfies the following equality.
Montgomery multiplication(x,y,m)=(xy+qm)/2.sup.n

(17) Let n be a bit-length of x, y, and m. If x and y satisfy −m<x<m, and −m<y<m respectively, xy satisfies −m.sup.2<xy<m.sup.2.

(18) If using a booth recoding, a quotient (q.sub.i) is one of {−2, −1, 0, 1, 2}. Thus, the maximum value and minimum value of qm satisfy the following inequality.
−2m.sup.2/3<qm<2m.sup.2/3

(19) Thus, since xy+qm satisfies −2 m.sup.2<−5 m.sup.2/3<xy+qm<5 m.sup.2/3<2 m.sup.2, m<2.sup.n, xy+qm/2.sup.n satisfies −2m<(xy+qm)/2.sup.n<2m. That is, a value of a Montgomery multiplication is in the range of (−2m, 2m) before executing a final reduction operation. Consequently, the Montgomery multiplication algorithm needs to perform the final reduction operation. This can be expressed by the following mathematical expression.
(xy+qm)/2.sup.n+q′m

(20) FIG. 1 illustrates the concept of a Montgomery multiplier in accordance with some embodiments of the inventive concept. FIG. 1 schematically depicts the multiplicand (x) and the multiplier (y) with their respective sign bits, the products xy and qm and their respective sign bits, the q′m term with its sign bit, and the final product xyR.sup.−1 mod m and its sign bit. Referring to FIG. 1, whether or not a reduction is executed (q′) can be determined depending on the signs (x, y, q). A size of xy+qm can be classified according to operand signs (x, y, q) based on the table below.

(21) TABLE-US-00003 xy m (always) q s = xy + qm Reduction q′ + + +   0 < s < 2m Necessary (−m) −1 + + − −m < s < m Non-necessary 0 − + + −m < s < m Non-necessary 0 − + − −2m < s < 0   Necessary(+m) 1 Don't care + 0 −m < s < m Non-necessary 0

(22) Since the signs (x, y), which are inputs to a Montgomery multiplication, are already known, if a sign of the reduction quotient (q) can be calculated in advance, a final reduction operation can be performed without comparing the intermediate value A with a size of a modulus (m). A reduction coefficient that is used in the final reduction is q′. Whether or not a reduction operation should be performed can be determined depending on operand signs (x, y, q).

(23) A table below shows a Montgomery multiplication algorithm in accordance with some embodiments of inventive concept using those characteristics.

(24) TABLE-US-00004 Algorithm Proposed Montgomery multiplication INPUT: integers m = (m.sub.n-1...m.sub.1m.sub.0).sub.b, x = (x.sub.n-1...x.sub.1x.sub.0).sub.b, y = (y.sub.n-1...y.sub.1y.sub.0).sub.b with −m ≦x, y <m, R = b.sup.n with gcd(m,b) = 1, and m’ = −m.sup.−1mod b OUTPUT: xyR.sup.−1 mod m   1. q_sign ← 0 A ← 0 (Notation: A = (a.sub.na.sub.n-1...a.sub.1a.sub.0).sub.b)   2. For i from 0 to n−2 step 2 do the following:    2.1 B_x.sub.i = booth_recoding (x.sub.i+1x.sub.ix.sub.i−1), where x.sub.−1 = 0    2.2 q.sub.i ← (a.sub.0+ B_x.sub.iy.sub.0)m’ mod b.sup.2    2.3 B_q.sub.i ← booth_recoding (q.sub.i)    2.4 update q_sign    2.5 A ← (A + B_x.sub.iy + B_q.sub.im)/b.sup.2   3. A ← A + q_sign*m   4. Return (A)

(25) Unlike a Montgomery multiplication algorithm described above, inputs (x, y) have values in the range [−m, m). The output (xyR.sup.−1 mod m) of the Montgomery multiplication becomes A+q_sign×m.

(26) The q_sign can be calculated by tracking a value of q.sub.i. This is because the quotient (q) satisfies the following mathematical formula.
q=q.sub.n-22.sup.n-2+q.sub.n-42.sup.n-4+ . . . +q.sub.22.sup.2+q.sub.02.sup.0

(27) Referring to the mathematical formula, if the highest rank bit q.sub.n-2 is 0, the next highest rank bit q.sub.n-4 determines an overall sign.

(28) A Montgomery multiplication algorithm in accordance with some embodiments of the inventive concept does not need to perform an additional reduction operation to process a sign part. In Step 3, there is no need to compare a modulus (m) with an intermediate value A to perform the reduction operation. In addition, the reduction part of Step 3 may be included in Step 2. Thus, when implementing in hardware, logic is not needed to compare A with the modulus (m). An overall operation performance can be improved by including a reduction in Step 2.

(29) A Montgomery multiplication algorithm in accordance with some embodiments of the inventive concept has the following properties. First, there is no need to compare the intermediate value A with a size of the modulus (m). Second, size comparison logic is not needed. Third, an operation cycle for a size comparison is not needed. Fourth, of the number of operation cycles for a multi-precision operation may be reduced. Since a size comparison operation is not needed, the final reduction (step 3) can be merged into a step 2.5.

(30) FIG. 2 is a block diagram of a Montgomery multiplier in accordance with some embodiments of the inventive concept. Referring to FIG. 2, the Montgomery multiplier 100 includes a partial product computing unit 110, a modulus reduction computing unit 120, an accumulation unit 130, a quotient computing unit 140, first and second recoding units 150 and 160 and a quotient sign determination unit 170.

(31) The partial product computing unit 110 performs a multiplication operation on a multiplicand (x′) and a multiplier (y′) during a current cycle. The multiplier (y′) may be a booth-recoding of the multiplier (y). The modulus reduction computing unit 120 performs a multiplication operation on a modulus (m) and a quotient (q′) that reflects a quotient sign (q_sign). The quotient (q′) may be a booth-recoding of the quotient (q). The accumulation unit 130 accumulates in an intermediate value the output value (xy′) of the partial product computing unit 110 and the output value (q′m) of the modulus product computing unit 120 from a previous cycle. The quotient computing unit 140 calculates a quotient (q) to be used during a next cycle from an accumulation value (xy′+q′m) of the accumulation unit 130. The first recoding unit 150 is provided with a multiplier (y) to output a booth-recoded value (y′) of the multiplier (y). The second recoding unit 160 is provided with a quotient (q) and a quotient sign (q_sign) to output a booth-recoded value (q′) of the quotient (q) and the quotient sign (q_sign). The quotient sign determination unit 170 determines a quotient sign (q_sign) by tracking a quotient (q.sub.i) during every cycle.

(32) In addition, registers for storing the multiplicand (x), the multiplier (y), and the modulus (m) may be further included. The accumulation unit 130 outputs xyR.sup.−1 mod m as a final value of the Montgomery multiplication.

(33) By including a final reduction of an output range from (−2m, 2m) to (−m, m) in the multiplication operation, the Montgomery multiplier 100 does not need to compare an output value with a modulus (m). Thus, the Montgomery multiplier 100 does not need the related logic and operation cycle for a final reduction operation. That is, a Montgomery multiplier according to embodiments of the inventive concept may have a reduced hardware cost and an improved performance as compared with a conventional Montgomery multiplier.

(34) In FIG. 2, the Montgomery multiplier 100 uses a booth recoding. However, embodiments of the inventive concept are not limited thereto. An alternative Montgomery multiplier may not use a booth recoding.

(35) FIG. 3 is a flow chart of a method of a Montgomery multiplier in accordance with some embodiments of the inventive concept. Referring to FIGS. 1 through 3, a Montgomery multiplication operation is as follows.

(36) A partial product is calculated with respect to a multiplicand (x) and a multiplier (y) during a current cycle, such as an i-th Montgomery multiplication operation (S110). A quotient sign (q_sign) with respect to a quotient (q) calculated from an intermediate value (A) determined during a previous (i−1)th Montgomery multiplication operation cycle (S120). The quotient sign (q_sign) can be determined by tracking a quotient at every cycle. A modulus reduction using the quotient sign (q_sign) is calculated (S130). The partial product and the modulus reduction are accumulated in the intermediate value (A) from the previous cycle (S140).

(37) A Montgomery multiplication operation in accordance with some embodiments of the inventive concept does not need a final reduction operation because a modulus reduction operation is performed using the quotient sign (q_sign).

(38) FIG. 4 is a block diagram of a security system 1000 that includes a crypto processor having a modular multiplier in accordance with some embodiments of the inventive concept. Referring to FIG. 4, the security system 1000 includes one or more central processing units (CPUs) 1100, a crypto processor 1200, a buffer memory 1300, a code memory 1400, a nonvolatile memory interface 1500 and at least one nonvolatile memory 1600.

(39) The central processing unit 1100 controls an overall operation of the security system 1000. The crypto processor 1200 decodes a command to enable code certification and electronic signature and processes data under control of the central processing unit (CPU) 1100. The crypto processor 1200 includes at least one of the modular multipliers 100 illustrated in FIG. 2. The buffer memory 1300 can store data, such as input/output data or data used in a code processing process, needed to drive the security system 1000. The buffer memory 1300 can be embodied by a volatile memory device such as a DRAM, a SRAM, etc. The code memory 1400 stores code data needed to operate the security system 1000. The code memory 1400 used for encryption and an electronic signature can store a security module and a modulus for data that should be protected together with a key needed for encryption and the electronic signature. The code memory 1400 can be embodied by a nonvolatile memory device such as a ROM, a PRAM, etc. The nonvolatile memory interface 1500 provides an interface with at least one nonvolatile memory device 1600. The nonvolatile memory device 1600 stores user data.

(40) Since the security system 1000 according to some embodiments of the inventive concept does not need a final step reduction operation as compared with a conventional security system, a hardware size and an operation cycle can be reduced. Thus, the security system 1000 can process data more rapidly.

(41) A Montgomery multiplier and a multiplication operation thereof in accordance with some embodiments of inventive concept, by performing a modulus reduction operation reflecting a quotient sign, does not need to compare an output value with a size of a modulus, which can reduce a hardware cost and an operation cycle.

(42) Although exemplary embodiments of the present general inventive concept have been shown and described, it will be appreciated by those skilled in the art that changes may be made in these exemplary embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the appended claims and their equivalents. Therefore, the above-disclosed subject matter is to be considered illustrative, and not restrictive.