Low complexity conversion to Montgomery domain

Abstract

Disclosed herein is an apparatus for calculating a cryptographic component R.sup.2 mod n for a cryptographic function, where n is a modulo number and R is a constant greater than n. The apparatus comprises a processor configured to set a start value to be equal to R mod n, perform b iterations of a shift and subtract operation on the start value to produce a base value, wherein the start value is set to be equal to the base value after each iteration, set a multiplication operand to be equal to the base value, and perform k iterations of a Montgomery modular multiplication of the multiplication operand with the multiplication operand to produce an intermediate result, wherein the multiplication operand is set to be equal to the intermediate result after each iteration, wherein the shift and subtract operation comprises determining a shifted start value which is equivalent to the start value multiplied by two, and subtracting n from the shifted start value if the shifted start value is greater than or equal to n.

Claims

1. A method for calculating, using a processor, a cryptographic component R.sup.2 mod n for a cryptographic function, where n is a modulo number and R is a constant greater than n, the method comprising: setting a start value to be equal to R mod n; performing b iterations of a shift and subtract operation on the start value to produce a base value, wherein the start value is set to be equal to the base value after each iteration; setting a multiplication operand to be equal to the base value; and performing k iterations of a Montgomery modular multiplication of the multiplication operand with the multiplication operand to produce an intermediate result, wherein the multiplication operand is set to be equal to the intermediate result after each iteration; wherein the shift and subtract operation comprises determining a shifted start value which is equivalent to the start value multiplied by two, and subtracting n from the shifted start value if the shifted start value is greater than or equal to n.

2. The method of claim 1, wherein: the cryptographic component is an integer equal to R.sup.2 mod n; R is of the form 2.sup.l; l is an integer; n is an integer which is less than R; and R is coprime with n.

3. The method of claim 1, wherein determining a shifted start value which is equivalent to the start value multiplied by two comprises performing a single bit left shift of the start value.

4. The method of claim 1, further comprising setting the value of b based on the value of R.

5. The method of claim 1, further comprising setting the value of b to be equal to the lowest integer value for which a single iteration of the Montgomery multiplication consumes less processing time than b iterations of the shift and subtract operation.

6. The method of claim 1, further comprising setting the value of k based on the value of b.

7. The method of claim 6, wherein the value of k is set based on the value of b to satisfy the relationship 2.sup.2.sup.k.sup.b−1<R<2.sup.2.sup.k.sup.b+1.

8. The method of claim 6, wherein the value of k is set based on the value of b to satisfy the equation R=2.sup.2.sup.k.sup.b.

9. The method of claim 1, further comprising setting the value of b based on a calculation of the power consumed to calculate the cryptographic component, or based on the time taken to calculate the cryptographic component.

10. The method of claim 1, further comprising: determining an adjustment parameter indicative of a difference between the intermediate result and the cryptographic component; and performing Montgomery multiplication of the intermediate result with the adjustment parameter, to calculate the cryptographic component for the cryptographic function.

11. The method of claim 10, wherein determining the adjustment parameter comprises determining an excess parameter, the excess parameter being indicative of a difference between the intermediate result and the cryptographic component.

12. The method of claim 10, wherein the adjustment parameter is a function of an inverse of the excess parameter.

13. The method of claim 10, wherein determining the excess parameter comprises: determining an integer c for which 2.sup.2.sup.k.sup.b−c is equal to R; and setting the excess parameter to be equal to 2.sup.c.

14. The method of claim 10, wherein the adjustment parameter is equal to the inverse of the excess parameter multiplied by R mod n.

15. The method of claim 10, wherein the adjustment parameter is equal to the inverse of the excess parameter multiplied by R.

16. An apparatus for calculating a cryptographic component R.sup.2 mod n for a cryptographic function, where n is a modulo number and R is a constant greater than n, the apparatus comprising a processor configured to: set a start value to be equal to R mod n; perform b iterations of a shift and subtract operation on the start value to produce a base value, wherein the start value is set to be equal to the base value after each iteration; set a multiplication operand to be equal to the base value; and perform k iterations of a Montgomery modular multiplication of the multiplication operand with the multiplication operand to produce an intermediate result, wherein the multiplication operand is set to be equal to the intermediate result after each iteration; wherein the shift and subtract operation comprises determining a shifted start value which is equivalent to the start value multiplied by two, and subtracting n from the shifted start value if the shifted start value is greater than or equal to n.

17. The apparatus of claim 16, wherein the processor is further configured to: determine an adjustment parameter indicative of a difference between the intermediate result and the cryptographic component; and perform Montgomery multiplication of the intermediate result with the adjustment parameter, to calculate the cryptographic component for the cryptographic function.

18. The apparatus of claim 17, wherein the processor comprises: a shift and subtract unit having a first input, a second input, and an output; a Montgomery multiplication unit having a first input and a second input for receiving a first operand and a second operand, respectively, and having an output for supplying an intermediate result, the Montgomery multiplication unit configured to perform a Montgomery multiplication of the first operand with the second operand to produce the intermediate result; and a controller for controlling the values of the first input and the second input, the controller configured to perform the steps of, setting the input to the shift and subtract unit to be equal to a start value, iteratively setting the input to the shift and subtract unit to be equal to the output of the shift and subtract unit, for b shift and subtract operation iterations, setting the first input and the second input of the Montgomery multiplication unit to be equal to the output of the shift and subtract unit, and iteratively setting the first input and the second input to the intermediate multiplication result, for k Montgomery multiplication iterations.

19. The apparatus of claim 18, wherein the controller is further configured to perform the steps of: determining an adjustment parameter indicative of a difference between the intermediate result and the cryptographic component; and performing Montgomery multiplication of the intermediate result with the adjustment parameter, to calculate the cryptographic component for the cryptographic function.

20. An apparatus for calculating a cryptographic component R.sup.2 mod n for a cryptographic function, where n is a modulo number and R is a constant greater than n, the apparatus comprising: means for setting a start value to be equal to R mod n; means for performing b iterations of a shift and subtract operation on the start value to produce a base value, wherein the start value is set to be equal to the base value after each iteration; means for setting a multiplication operand to be equal to the base value; and means for performing k iterations of Montgomery modular multiplication of the multiplication operand with the multiplication operand to produce an intermediate result, wherein the multiplication operand is set to be equal to the intermediate result after each iteration; wherein the shift and subtract operation comprises determining a shifted start value which is equivalent to the start value multiplied by two, and subtracting n from the shifted start value if the shifted start value is greater than or equal to n.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The technology will be described with reference to the following drawings, in which:

(2) FIG. 1 is a block diagram illustrating a cryptographic processing unit, according to an embodiment;

(3) FIG. 2 is a block diagram illustrating a subsection of a Montgomery arithmetic logic unit (ALU) of FIG. 1, according to an embodiment;

(4) FIG. 3 is a flow chart illustrating a method to compute R.sup.2 mod n, as performed by the Montgomery ALU of FIG. 2, according to an embodiment;

(5) FIG. 4 is a signal diagram depicting the values of a subset of the signals of the subsection of the Montgomery ALU of FIG. 2, during the calculation of the component R.sup.2 mod n, according to an embodiment;

(6) FIGS. 5A-5C are tables comprising performance figures for calculating R.sup.2 mod n according to an embodiment;

(7) FIG. 6 is a block diagram illustrating a subsection of a Montgomery arithmetic logic unit (ALU) of FIG. 1, according to an embodiment;

(8) FIG. 7 is a flow chart illustrating a method to compute R.sup.2 mod n, as performed by the Montgomery ALU of FIG. 6, according to an embodiment; and

(9) FIGS. 8A-8B are tables comprising performance figures for calculating R.sup.2 mod n according to an embodiment.

DESCRIPTION OF EMBODIMENTS

(10) Cryptographic Processing Unit

(11) FIG. 1 is a block diagram illustrating a cryptographic processing unit 102 which is configured to perform cryptographic processing in accordance with one or more cryptographic algorithms, according to an embodiment. The processing unit 102 may be a logic unit of an application specific integrated circuit (ASIC), field programmable gate array (FPGA) or other form of logic device. The cryptographic processing unit 102 may be integrated within a larger arithmetic logic unit or other processing unit.

(12) The cryptographic processing unit 102 comprises a control logic unit 104, which receives control and data input via signal 106. The control logic unit 104 outputs resulting data from cryptographic processing via data signal 106. The cryptographic processing unit 102 further comprises a Montgomery arithmetic logic unit (ALU) 110, which is configured to convert to and from the Montgomery domain, and to process within the Montgomery domain. The cryptographic processing unit 102 also comprises a cryptographic ALU 114, which is configured to provide cryptographic processing outside the Montgomery domain. The modules of the cryptographic processing unit 102 are clocked by clock signal 120.

(13) Calculating s.sup.e mod n

(14) In one embodiment, the cryptographic processor 102 is configured to perform cryptographic functions in accordance with the Rivest-Shamir-Adleman (RSA) cryptographic algorithm. The cryptographic processor 102 performs modular exponentiation, as part of the RSA algorithm, to compute z=s.sup.e mod n, where n is a large number which may be greater than 2020 bits long in a binary representation.

(15) To calculate z=s.sup.e mod n, the cryptographic processor 102 converts the operand s to a corresponding Montgomery domain representation, which is an alternative mathematical representation of the operand s. The cryptographic processor 102 then calculates the Montgomery domain representation of z within the Montgomery domain. The cryptographic processor 102 may then convert the Montgomery domain representation of z to the integer domain, to produce the desired result z.

(16) Converting to the Montgomery Domain

(17) Converting to and from the Montgomery domain is an additional step not performed in other modulo multiplication algorithms; however, when performing many multiplications in a row, as in modular exponentiation, intermediate results can be left in the Montgomery domain representation, and the initial and final conversions may become a negligible fraction of the overall computation.

(18) In one embodiment, the cryptographic processor 102 converts the operand s to the Montgomery domain by computing s′=sR mod n, for some R=2.sup.l>n, where l is greater than or equal to the number of bits of n. The cryptographic processor 102 calculates the value sR mod n by performing modulo multiplication of the operand (s=s mod n) with the conversion component (R.sup.2 mod n). The cryptographic processor then applies a Montgomery reduction function REDC ((s mod n)(R.sup.2 mod n)) to reduce the modulo multiplication product by a factor of R, to produce the result sR mod n.

(19) Montgomery Multiplication

(20) The combined operation of the modulo multiplication of two Montgomery domain operands, modulo n, and the subsequent application of the Montgomery reduction function to the resulting product of the modulo multiplication is called Montgomery multiplication.

(21) Montgomery multiplication is a function that can be performed by the Montgomery ALU 110 in a number of circumstances. For example, in one embodiment, the cryptographic processor 102 configures the Montgomery ALU 110 to perform the Montgomery multiplication function on two Montgomery domain operands aR mod n and bR mod n, as defined by the cryptographic processor control 104. In performing Montgomery multiplication, the Montgomery ALU 110 multiplies the Montgomery domain operands aR mod n and bR mod n within the Montgomery domain to produce product abR.sup.2 mod n. The Montgomery ALU 110 then applies the Montgomery reduction function to produce abR mod n, which is the Montgomery domain form of the desired product of operands a and b modulo n.

(22) For some cryptographic algorithms, the Montgomery ALU 110 then converts the Montgomery domain form of the product out of the Montgomery domain, by performing a second Montgomery reduction function. Alternatively, if the cryptographic algorithm performed by the cryptographic processor 102 specifies further modulo multiplication involving the product, the cryptographic processor 102 may keep the Montgomery domain form of the product for use as an operand of further Montgomery multiplication functions.

(23) Conversion Component

(24) In order to convert an operand to its corresponding Montgomery domain representation, the cryptographic processor 102 calculates the conversion component R.sup.2 mod n. The calculation of R.sup.2 mod n involves the determination of an integer value equal to R.sup.2 mod n. Such an integer is referred to as the “cryptographic component”, “conversion component” or simply “component” throughout this disclosure.

(25) Calculation of the Component Via Direct Modulo Reduction

(26) Processor 102 can calculate R.sup.2 mod n by performing direct modulo reduction, e.g. by applying a Euclidean algorithm. When R.sup.2 is significantly larger than n, direct modulo reduction may take many iterations.

(27) For small exponents, e, such as the exponents typically used for signature verification algorithms, the direct reduction of R.sup.2 mod n may take over half the time of the computation of s.sup.e mod n. Accordingly, the direct modulo reduction of R.sup.2 mod n can be quite computationally expensive.

(28) Embodiments of the present disclosure seek to ameliorate the issue of expensive calculation of the component R.sup.2 mod n by reducing the clock cycles consumed for the calculation of the component R.sup.2 mod n, compared to calculating R.sup.2 mod n via direct modulo reduction.

(29) Accordingly, embodiments of the present disclosure provide a method and apparatus for calculation of the component R.sup.2 mod n, through application of iterative Montgomery multiplication calculations, followed by an adjustment of the Montgomery multiplication product.

(30) Montgomery ALU Subsection

(31) FIG. 2 is a block diagram illustrating a subsection 200 of the Montgomery ALU 110. Subsection 200 can be used by the Montgomery ALU 110 to calculate the component (R.sup.2 mod n), in accordance with an embodiment.

(32) Control logic block 202 provides control signals 214, 228 to control the function of other blocks within the subsection 200. Control logic block 202 also provides data values, via signals 224 and 226, to be stored in Register A 216 and Register B 218, respectively.

(33) Multiplexer 204 is a 3:2 multiplexer which maps three input signals 206, 208, 222 to two output signals 210, 212 in accordance with the value of the selection signal 214. The selection signal 214 is set by the control logic block 202 and indicates the mapping of one of the three input signals 206, 208, 222 to output signal 210. The selection signal 214 also indicates the mapping of one of the three input signals 206, 208, 222 to output signal 212. Input signal 206 is set by Register A 216. Input signal 208 is set by Register B 218.

(34) Montgomery multiplication block 220 operates to perform Montgomery multiplication on a first and a second operand. The first operand is provided, via multiplexer 204, on data signal 210, and the second operand is provided, via multiplexer 204, on data signal 212. Functional block 220 outputs the result of Montgomery multiplication of the first operand and the second operand on output signal 222. Control logic block 202, multiplexer 204 and Montgomery multiplication unit 220 are all clocked by clock signal 120.

(35) The Montgomery ALU 110 provides control signals and parameters to the control logic block 202 of the subsection 200, via signal 230. Parameters can comprise the values of R and n. The Montgomery ALU 110 provides the component R.sup.2 mod n, as output from the Montgomery multiplication on data signal 222, to logic units within the cryptographic processing unit 102. The component R.sup.2 mod n may then be used in cryptographic functions performed by the cryptographic processing unit 102.

First Embodiment—Base of 2R mod n

(36) In a first embodiment, the cryptographic component (R.sup.2 mod n) is calculated by determining an exponent 2.sup.k which raises 2 to a value greater than R, such that 2.sup.2.sup.k≥R. Then, the Montgomery ALU 110 iteratively performs Montgomery multiplication of two identical operands, from a base of 2R mod n, iteratively producing an intermediate multiplication result. The intermediate multiplication result becomes both the first and the second operands for the next Montgomery multiplication, until the intermediate multiplication result is greater than the component R.sup.2 mod n.

(37) The Montgomery ALU 110 then determines how much greater the intermediate multiplication result is compared to the component R.sup.2 mod n, based on the value of k, and defines an excess parameter based thereon. Then, the Montgomery ALU performs a Montgomery multiplication of the intermediate multiplication result and an adjustment parameter, where the adjustment parameter is a function of the inverse of the excess parameter, to produce the component R.sup.2 mod n.

(38) FIG. 3 is a flow chart illustrating steps 1 to 5 of a method 300 as performed by the Montgomery arithmetic logic unit 110, according to an embodiment, to compute R.sup.2 mod n. Steps 1 to 5 of method 300 are described mathematically, below, with reference to a first embodiment.

(39) In step 1, 302, of method 300, the Montgomery ALU 110 determines the base value δ.sub.0 and stores the base value in Register A 216. In accordance with the first embodiment, the base value δ.sub.0 is set to 2R mod n=2.sup.2.sup.0 R mod n.

(40) If R=2.sup.l is greater than n, and n is greater than 2.sup.l−1, then the base value δ.sub.0=2R mod n is equal to either 2R−2n or 2R−3n, both of which may be calculated. Since R>n, it follows that 2R-2n>0. On the other hand, since R/2<n, it follows that 2R-4n<0. Therefore, only 2R−2n or 2R−3n are candidates for 2R mod n. If n>2/3R, then 3n>2R, therefore 2R−3n<0. In this case, 2R mod n=2R−2n. On the other hand, if n<2/3R, then 3n<2R, therefore 2R−3n>0, and 2R mod n=2R−3n.

(41) In step 2, 304, of method 300, the Montgomery ALU 110 sets a first and a second Montgomery multiplication operand to be equal to the base value δ.sub.0, which was determined in step 1, 302.

(42) In step 3, 306, of method 300, the Montgomery ALU 110 iteratively performs Montgomery multiplication operations on the first and second operands to produce an intermediate multiplication result at each iteration. At the end of each iteration, the first and second operands are set to the intermediate multiplication result calculated via the Montgomery multiplication operation.

(43) A termination condition may be defined in accordance with different embodiments of the present disclosure. In one embodiment, the termination condition is met when the intermediate multiplication result is in the form yR mod n, wherein y is greater than R. In an alternative embodiment, the termination condition is met when the intermediate multiplication result is in the form yR mod n, wherein y satisfies the form y≤R≤y.sup.2.

(44) In accordance with the first embodiment described herein, the Montgomery ALU 110 determines a first integer exponent k for which 2.sup.2.sup.k≥R. Step 3, 306, as performed by the Montgomery ALU 110 comprises a for-loop, in which k iterations are performed. Accordingly, the termination condition for this first embodiment is a determination of whether the number of Montgomery multiplication iterations has reached k iterations.

(45) Specifically, the Montgomery ALU 110 performs the following loop:
for (i=1, to i=k,i++); δ.sub.i=Montgomery(δ.sub.i-1,δ.sub.i-1); end

(46) Once the Montgomery ALU 110 has performed the k.sup.th Montgomery multiplication 308, as determined by termination condition logic 310, the Montgomery ALU 110 proceeds to Step 4, 312.

(47) It is noted that for the abovementioned loop, after iteration i, the intermediate result δ.sub.i=2.sup.2.sup.iR mod n. Accordingly, after iteration k, the intermediate result δ.sub.k=2.sup.2.sup.kR mod n. This final intermediate result, δ.sub.k, is called the penultimate result.

(48) In step 4, 312, of method 300, the Montgomery ALU 110 determines an excess parameter, which is an amount by which the penultimate result differs from the component R.sup.2 mod n. In accordance with the first embodiment, the Montgomery ALU 110 determines an integer c for which R=2.sup.2.sup.k.sup.−c. The Montgomery ALU 110 then determines the excess parameter to be 2.sup.c.

(49) In step 5, 314, of method 300, the Montgomery ALU 110 determines an adjustment parameter, where the adjustment parameter is a function of the inverse of the excess parameter, of the form:
adjustment parameter=2.sup.−cR mod n

(50) Then the Montgomery ALU 110 determines the component R.sup.2 mod n, by performing a Montgomery multiplication of the penultimate result δ.sub.k with the adjustment parameter, in the form:
target value=Montgomery (δ.sub.k,2.sup.−cR mod n)
Calculating c

(51) If R=2.sup.l>n>2.sup.l−1, then n>2.sup.l−1=2.sup.−1R, and therefore 2.sup.−cR mod n=2.sup.−cR for any c>0. If c>0 and 2.sup.−cR>n, then the reduction 2.sup.−cR mod n is directly computed by the Montgomery ALU 110.

(52) Alternatively, in one embodiment, the Montgomery ALU 110 calculates the reduction 2.sup.−cR mod n as an intermediate step in the computation of the reduction 2R mod n. In this case, 2.sup.−cR=2.sup.l−c>n, therefore, the Montgomery ALU 110 determines the integer d such that 2.sup.l−d>n>2.sup.l−d−1. The value d is greater than or equal to c, therefore 2R mod n may be calculated by noting that 2R=.sub.2.sup.d+1R′ where R′=2.sup.l−d, and 2.sup.−cR=2.sup.d−cR′.

(53) Accordingly, the Montgomery ALU 110 calculates the reduction of both 2.sup.d+1R′ mod n and 2.sup.d−c R′ mod n by first computing 2R′ mod n and then proceeding to compute 2(2R′ mod n) mod n and by induction after computing 2.sup.kR′ mod n proceeding to compute 2.sup.k+1R′ mod n=2(2.sup.k R′ mod n) mod n for all k≤d.

(54) If d is a small integer, the full sequence of 2.sup.kR′ mod n may be computed in low complexity. Use of an R which satisfies 2.sup.−dR>n may occur in an embodiment which uses one value of R for all the possible values of n. The value of d may be greater than zero but not much larger. The reduction 2.sup.−cR mod n may be stored in Register B 218. When the termination condition is set to y<R, an excess 2.sup.c for which 2.sup.cR>n, so the Montgomery ALU 110 calculates 2.sup.c R mod n.

(55) Signal Diagram

(56) FIG. 4 is a signal diagram depicting the values of a subset of the signals of the subsection 200 of the Montgomery ALU 110 during the calculation of the component R.sup.2 mod n, for the first embodiment of the present disclosure. Signal 402 is a clock signal and, for this embodiment, the Montgomery ALU is positive-edge triggered.

(57) The first operand is an output signal 210 of the multiplexer 204. The second operand is the other output signal 212 of the multiplexer 204. On the first clock cycle, both of the multiplexer outputs are 2R mod n. Accordingly, the first and the second operands are 2R mod n. For the following clock cycles, up to k, the multiplexer outputs are the values fed back from the Montgomery unit.

(58) On the second clock cycle, the output 222 of the Montgomery multiplication unit 220 is equal to 2.sup.2.sup.1 R mod n. On the third clock cycle, the output 222 of the Montgomery multiplication unit 220 is equal to 2.sup.2.sup.2R mod n. On the k.sup.th clock cycle, the output 222 of the Montgomery multiplication unit 220 is equal to 2.sup.2.sup.k R mod n.

(59) On the k.sup.th clock cycle, one of the multiplexer outputs 210 is the intermediate result 2.sup.2.sup.k R mod n as fed back from the Montgomery multiplication unit on signal 222. The other multiplexer output 212 is selected as the contents of Register B 218, which is set to 2.sup.−c R mod n.

(60) On the last clock cycle (k+1), the output 222 of the Montgomery multiplication unit 220 is the component R.sup.2 mod n.

(61) Performance

(62) Embodiments of the present disclosure may be applied to simplify the computation of R.sup.2 mod n in RSA computations through the use of Montgomery multiplication. In particular, the first embodiment, as described above, can reduce the latency of the calculation of R.sup.2 mod n for RSA compared to the method of direct modulo reduction.

(63) For small exponents e of s.sup.e mod n, which are typically used in RSA signature verification, the direct modulo reduction of R.sup.2 mod n takes approximately two-thirds of the computation time for the calculation of s.sup.e mod n. In contrast, the method proposed herein takes approximately only one-third of the computation time, as exemplified by the following performance figures.

(64) FIGS. 5A-5C are tables comprising performance figures for calculating R.sup.2 mod n via the first embodiment described herein, contrasted with the performance figures for calculating R.sup.2 mod n via the method of direct modulo reduction. FIGS. 5A, 5B and 5C provide performance figures for calculating R.sup.2 mod n where the key size is 2048 bits, 3072 bits and 4096 bits, respectively.

(65) Considering FIG. 5A, it can be seen that, in terms of clock cycles, the improved method saves almost 100,000 clock cycles, compared to the method of direct modulo reduction. At a clock frequency of 200 MHz, this translates to a latency saving of 494.12 microseconds.

(66) In a hardware simulation, the clock cycle consumption for the total computation of s.sup.e mod n (i.e. the RSA core) was reduced from 240,300 cycles, in which R.sup.2 mod n was calculated using direct modulo reduction, to 145,000 cycles, in which R.sup.2 mod n was calculated via an embodiment of the method 300.

Further Advantages

(67) Advantageously, the cryptographic architecture of an embodiment of the present disclosure may be more efficiently utilised because the method 300 can use the same Montgomery multiplication units which are used by the RSA multiplication. Accordingly, there may be a reduced requirement for dedicated computational units, which may not be fully utilised after the computation of R.sup.2 mod n is complete. The inclusion of dedicated computation units for the calculation of a specific cryptographic value is often undesired due to increased implementation footprint, energy consumption and/or design complexity.

(68) As noted above, although converting to the Montgomery domain consumes computation time, this computation time can be an acceptable overhead when performing many multiplications in a row, as in modular exponentiation, as intermediate results can be left in Montgomery domain representation, and the initial and final conversions may become a negligible fraction of the overall computation.

(69) If an exponent e is large (e.g. e>2.sup.16), then computation within the Montgomery domain is likely to be the preferred method for computing s.sup.e mod n, even for implementations that calculate R.sup.2 mod n via direct modulo reduction. On the other hand, if an exponent e is smaller, it may be more efficient for an implementation to compute s.sup.e mod n directly, without converting to and from the Montgomery domain, since the calculation of R.sup.2 mod n via direct modulo reduction, for conversion to the Montgomery domain, consumes a significant portion of the computation cycles for computing s.sup.e mod n. Advantageously, however, the improved method of calculating of R.sup.2 mod n, as described herein, reduces the computation cycles for converting to the Montgomery domain, thus making computing s.sup.e mod n in the Montgomery domain an efficient option for a wider range of exponents.

(70) An embodiment of the improved method 300 for R.sup.2 mod n described herein may also provide an advantage in the situation where e changes over time.

(71) In a device which uses Montgomery multipliers for calculation of cryptographic expressions other than R.sup.2 mod n, embodiments of the present disclosure can utilise the existing Montgomery multipliers of the device. Accordingly, it may be worthwhile implementing embodiments of the present disclosure for even small exponents, to take advantage of the hardware optimisation provided by the method's utilisation of existing Montgomery multipliers.

(72) Additionally, embodiments of the present disclosure may be advantageous for implementations with limited storage, particularly implementations that cannot store R.sup.2 mod n for future use and have to compute it each time.

(73) It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the above-described method of calculating the component R.sup.2 mod n, without departing from the broad general scope of the present disclosure. Specifically, other embodiments may utilise an alternative method for determining the number of Montgomery multiplication iterations performed by the Montgomery ALU 110. Additionally, other embodiments may determine an alternative base value than the base value described above. Variations on the method 300, as described above, are disclosed in relation to a second embodiment.

Second Embodiment—Base of 2.SUP.b .R

(74) According to a second embodiment of the present disclosure, the computation of R.sup.2 mod n does not require the computation of the base value 2R mod n in all situations. Instead, in accordance with this second embodiment, the Montgomery ALU 110, in step 302, sets the base value for the first iteration to 2.sup.bR for an integer b>1, under certain conditions, as detailed below.

(75) In one example, the cryptographic processing unit comprises an architecture in which the memory register for storing the first and second operands of the Montgomery multiplication is 2.sup.k wide, for some integer k. (For example, the memory register is 2048 bits wide, as is common for RSA implementations, and k=11.) Accordingly, n<2.sup.2.sup.k, and R=2.sup.l>n>2.sup.l−1 for some integer l. If l=2.sup.k, then the cryptographic processing unit continues according to the first embodiment; however, if l<2.sup.k, then following algorithm of the second embodiment may be used.

(76) In step 1, 302, of method 300, the Montgomery ALU 110 determines the base value δ.sub.0 and stores the base value in Register A 216. In accordance with the second embodiment, the Montgomery ALU 110 determines an integer b such that 1≤b≤2.sup.k−l−1.

(77) The Montgomery ALU 110 sets the base value as δ=2.sup.b R=2.sup.b+l. There is no need to compute mod n since δ=2.sup.b R<2.sup.2.sup.k and so there are 2.sup.k available bits for storing δ.

(78) Accordingly, in step 304, the first and second operands for the first iteration of Montgomery multiplication are set to δ.sub.0=2.sup.b R.

(79) In step 3, 306, of method 300, the Montgomery ALU 110 iteratively performs Montgomery multiplication operations on the first and second operands to produce an intermediate result at each iteration. At the end of each iteration, the first and second operands are set to the intermediate result calculated via the Montgomery multiplication operation.

(80) In accordance with the second embodiment described herein, the Montgomery ALU 110 determines the first integer i for which 2.sup.2.sup.i.sup.b≥R by performing the following loop: a. i=1; b. while (2.sup.2.sup.i−1.sup.b<R) i. δ=Montgomery (δ,δ); ii. i=i+1; (after iteration i, δ=2.sup.2.sup.l.sup.b R mod n).

(81) At the exiting of the while loop, as determined by the termination condition logic 310, the resulting δ is the penultimate result. The Montgomery ALU 110 then proceeds to Step 4, 312.

(82) In step 4, 312, of method 300, the Montgomery ALU 110 determines an excess parameter, which is an amount by which the penultimate result differs from the component R.sup.2 mod n. In accordance with the second embodiment, the Montgomery ALU 110 determines an integer c for which R=2.sup.2.sup.l.sup.b−c. The Montgomery ALU 110 then determines the excess parameter to be 2.sup.c.

(83) In step 5, 314, of method 300, the Montgomery ALU 110 determines an adjustment parameter, where the adjustment parameter is a function of the inverse of the excess parameter, of the form:
adjustment parameter=2.sup.−cR

(84) Then the Montgomery ALU 110 determines the value of component R.sup.2 mod n, by performing a Montgomery multiplication of the penultimate result with the adjustment parameter, in the form:
target value=Montgomery (δ,2.sup.−cR)

(85) Advantageously, this second embodiment may reduce the number of Montgomery multiplications required to compute R.sup.2 mod n, compared to the first embodiment, described above.

(86) Note that for b=1, this variant coincides with the first variant. For b=2, the second embodiment saves one Montgomery multiplication compared to the first embodiment. For b=4, the second embodiment saves two Montgomery multiplications compared to the first embodiment. In general, for b=2.sup.2.sup.j, the second embodiment saves j+1 Montgomery multiplications compared to the first embodiment.

Third Embodiment—Two Stage Calculation of Conversion Component

(87) According to a third embodiment, the conversion component, otherwise known as the cryptographic component, is calculated via a two-stage process. The two-stage process comprises a first stage, comprising the calculation of a base value via an iterative application of a shift and subtract operation, and a second stage comprising the calculation of the conversion component via an iterative application of Montgomery multiplication of identical operands, starting with the base value. In some cases, as detailed below, the third embodiment may further comprise determining and applying an adjustment parameter to the result of the iteratively applied Montgomery multiplications.

(88) Advantageously, the third embodiment may reduce the processing time consumed in the computation of the conversion component R.sup.2 mod n, compared to the first and second embodiments described above.

(89) Shift and Subtract Operation

(90) According to the third embodiment, the Montgomery ALU 110 applies an operation referred to as a shift and subtract operation. In general terms, for an integer a, the value 2.sup.2a R mod n may be computed from 2.sup.a R mod n via at least two different methods. According to a first method, a processor performs one Montgomery multiplication of two identical operands, 2.sup.a R mod n, to calculate 2.sup.2a R mod n. According to a second method, a processor performs a operations, where the first operation computes 2.sup.a+1 R mod n by multiplying 2.sup.a R mod n by 2 (which may be implemented by a single bit shift to the left). If the result of shift is greater or equal to n, then the processor subtracts n to obtain 2.sup.a+1 R mod n. The processor continues the shift and subtract operation for a total of a iterations. In each iteration, the processor computes 2.sup.a+i+1 R mod n by shifting 2.sup.a+1 R mod n one bit to the left, and subtracting n if the shift resulted in a number greater than or equal to n.

(91) Conducting the shift and subtract operation on a start value comprises calculating a shifted start value, which is equivalent to the value of the start value multiplied by two, and subtracting n if the shifted start value is greater than or equal to n. The method of calculating the shifted start value may depend on the format in which the start value is represented, including endianness, and the logical units implemented in an embodiment. According to one embodiment, calculating the shifted start value, which is equivalent to the operand 2.sup.a R mod n multiplied by two, is implemented by a single bit shift of the start value to the left, wherein the most significant bit is located in the left most bit and the least significant bit is located in the right most bit. In another embodiment, in which the most significant bit is located in the right most bit, calculating the shifted start value is implemented as a single bit shift to the right.

(92) In another embodiment, calculating the shifted start value is implemented by connecting the bits of a RegisterA containing 2.sup.a R mod n to the bits of a RegisterB configured to contain 2.sup.a+1 R mod n, such that RegisterA(bit i) is connected to RegisterB(bit i+1). Alternative architectures may be implemented to determine the shifted start value, to effect the shift component of the shift and subtract operation.

(93) Calculating the Base Value

(94) According to the third embodiment, the Montgomery ALU 110 uses shift and subtract operations to calculate a base value 2.sup.b R mod n. More particularly, the Montgomery ALU 110 computes 2.sup.b R mod n, for an integer b, by performing b iterations of the shift and subtract operation, starting at start value 2R mod n.

(95) The Montgomery ALU 110 then computes 2.sup.2b R mod n, 2.sup.2.sup.2.sup.b R mod n, 2.sup.2.sup.3.sup.b R mod n, up to 2.sup.2.sup.k.sup.bR mod n via a series of k Montgomery multiplications, to obtain the conversion component R.sup.2 mod n, where R=2.sup.2.sup.k.sup.b.

(96) Accordingly, the Montgomery ALU 110 computes 2.sup.2.sup.k.sup.bR mod n with b shift and subtract operations to obtain 2.sup.b R mod n, and subsequently k iterations of Montgomery multiplication of two identical operands, starting with 2.sup.b R mod n, to obtain 2.sup.2.sup.k.sup.bR mod n.

(97) The Values of b and k

(98) The value of b may be set in accordance with the value of k, such that the application of b shift and subtract operations, followed by k Montgomery multiplications, produces an intermediate result which is either equal to the conversion component, or can be adjusted to be equal to the conversion component by being Montgomery multiplied with an adjustment parameter indicative of a difference between the intermediate result and the cryptographic component.

(99) An intermediate result which can be adjusted to be equal to the conversion component 2.sup.2.sup.k.sup.bR mod n by being Montgomery multiplied with an adjustment parameter, is an intermediate result in the range 2.sup.2.sup.k−1.sup.b R mod n<intermediate result <2.sup.2.sup.k+1.sup.b R mod n. Accordingly, the values of b and k are complementary, and may be configured in various arrangements to satisfy requirements or limitations of the Montgomery ALU 110, as further detailed below.

(100) Montgomery ALU Subsection for Third Embodiment

(101) FIG. 6 is a block diagram illustrating a subsection 600 of the Montgomery ALU 110. Montgomery ALU 110 can use subsection 600 to calculate the conversion component R.sup.2 mod n, in accordance with the third embodiment.

(102) Control logic block 602 provides control signals 604, 606 and 608 to control the function of other blocks within the subsection 600. Control logic block 602 also provides data values, via signals 612 and 614, to be stored in register 616 and register 618, respectively.

(103) Control logic 602 sets register 618, and register 618 stores the value R mod n. Shift and subtract block 622 performs the two-step shift and subtract operation. This operation comprises a one bit left shift, then a conditional subtraction of n, if the left shifted value is greater than or equal to n. The result of the two-step operation performed by shift and subtract block 622 is output on data signal 624.

(104) In accordance with the third embodiment, the shift and subtract operation is performed b times to produce an operand to be used by the Montgomery multiplier 638. The control logic 602 controls the selection signal 604 of the multiplexer 618 so that the output 624 of the shift and subtract block 622 is routed through the multiplexer 620 to the input of the shift and subtract block 622. Register A 626 is configured to store the output 624 of the shift and subtract block 622.

(105) In one embodiment, the control logic block 602 of the Montgomery ALU 110 determines the value of b. The control logic block 602 uses the value of b to control the function of the multiplexer 620 via control signal 604. Accordingly, the input signal to the shift and subtract block 622 is initialised to R mod n from register 618, and for subsequent iterations of the shift and subtract operation, the input signal to the shift and subtract block 622 is set to the shift and subtract output signal 624.

(106) In one embodiment, the control logic block 602 of the Montgomery ALU 110 determines the value of k. The control logic block 602 uses the value of k to control the function of the multiplexer 630 via control signal 608. Multiplexer 630 is a 3:2 multiplexer which maps three input signals, 628, 630 and 640, to two output signals, 634 and 636, in accordance with the value of the selection signal 608. The selection signal 608 is set by the control logic block 602 and indicates the mapping of one of the three input signals, 628, 630 and 640, to output signal 634. The selection signal 608 also indicates the mapping of one of the three input signals, 628, 630 and 640, to output signal 636. Input signal 628 is set by Register A 626. Input signal 632 is set by Register B 616.

(107) Montgomery multiplication block 638 operates to perform Montgomery multiplication on a first operand and a second operand. The first operand is provided, via multiplexer 630, on data signal 634, and the second operand is provided, via multiplexer 630, on data signal 636. Functional block 638 outputs the result of Montgomery multiplication of the first operand and the second operand on output signal 640. Register B 616 stores an adjustment parameter, 2.sup.−cR mod n, which may be Montgomery multiplied with the output of the Montgomery multiplication block 638 on signal 640.

(108) Control logic block 602, shift and subtract logic block 622, multiplexer 620, multiplexer 630 and Montgomery multiplication unit 638 are all clocked by clock signal 120.

(109) The Montgomery ALU 110 provides control signals and parameters to the control logic block 602 of the subsection 600, via signal 642. Parameters can comprise the values of R and n. The Montgomery ALU 110 provides the component R.sup.2 mod n, as output from the Montgomery multiplication block 638 on data signal 640, to logic units within the cryptographic processing unit 102. The component R.sup.2 mod n may then be used in cryptographic functions performed by the cryptographic processing unit 102.

Method for Third Embodiment

(110) FIG. 7 is a flow chart illustrating the steps of a method 700 as performed by subsection 600 of the Montgomery ALU 110, according to an embodiment, to compute the conversion component R.sup.2 mod n. The steps of method 700 are described below, with reference to an embodiment.

(111) In step 702, of method 700, the Montgomery ALU 110 determines the values of b and k. In one embodiment, the Montgomery ALU 110 determines the values of b and k by accessing parameters provided by the control logic unit 104. In one embodiment, the Montgomery ALU 110 determines the values of b and k by calculating the values of b and k. Determining the values of b and k is described in further depth below.

(112) In step 704, subsection 600 of the Montgomery ALU 110 performs b iterations of the shift and subtract operation, starting from R mod n, to calculate the base value δ.sub.0=2.sup.b R mod n. Accordingly, after step 704, the operands for the first iteration of Montgomery multiplication, as provided on signals 634 and 636, are set to δ.sub.0=2.sup.b R mod n.

(113) In step 706, of method 700, the Montgomery ALU 110 sets a Montgomery multiplication operand to be equal to the base value δ.sub.0, which was determined in step 704.

(114) The Montgomery ALU 110 performs k iterations of the Montgomery multiplication operation on the multiplication operand, which is provided on both signals 634 and 636. Accordingly, the multiplication operand is Montgomery multiplied with itself. An intermediate result is provided on signal 640 after each Montgomery multiplication operation. At the end of each iteration, the multiplication operand provided on signals 634 and 636 is set to the intermediate result, on signal 640, as calculated via the Montgomery multiplication operation.

(115) After performing k iterations of the Montgomery multiplication operation, the result on signal 640 is the penultimate result.

(116) If the values of b, k and R are in the form R=2.sup.l, where l is in the form l=b2.sup.k, then the penultimate result will be equal to the conversion component R.sup.2 mod n. Accordingly, in this situation there is no need to apply an adjustment parameter to the penultimate result. In decision 708, the Montgomery ALU 110 determines whether the values of b, k and R are in this form. If the values of b, k and R are in this form, the Montgomery ALU 110 does not perform step 710, and the target value, being the conversion component, is set to be the penultimate result.

(117) In step 710, the Montgomery ALU 110 determines an excess parameter, which is an amount by which the penultimate result differs from the component R.sup.2 mod n. In accordance with the second embodiment, the Montgomery ALU 110 determines an integer c for which R=2.sup.2.sup.k.sup.b−c. The Montgomery ALU 110 then determines the excess parameter to be 2.sup.c.

(118) Further, in step 710, the Montgomery ALU 110 determines an adjustment parameter, where the adjustment parameter is a function of the inverse of the excess parameter, of the form:
adjustment parameter=2.sup.−cR

(119) Then the Montgomery ALU 110 determines the value of component R.sup.2 mod n, by performing a Montgomery multiplication of the penultimate result with the adjustment parameter, in the form:
target value=Montgomery (δ,2.sup.−cR)
Choosing b and k

(120) As noted above, the values of b and k are correlative, meaning that there is a relationship between the value of b and value of k. The values of b and k may be configured to satisfy requirements or limitations of the Montgomery ALU 110. More specifically, the value of b may be set in conjunction with setting the value of k, such that the application of b shift and subtract operations, followed by k Montgomery multiplications produces an intermediate result which is either equal to the conversion component, or can be adjusted to be equal to the conversion component by being Montgomery multiplied with an adjustment parameter. An intermediate result which can be adjusted to be equal to the conversion component 2.sup.2.sup.k.sup.b R mod n by being Montgomery multiplied with an adjustment parameter, is an intermediate result in the range 2.sup.2.sup.k−1.sup.b R mod n<intermediate result <2.sup.2.sup.k+1.sup.b R mod n.

(121) In other words, the values of b and k may be selected by noting that, for each choice of b, the required number of operations to obtain 2.sup.2.sup.k.sup.bR mod n is b shift and subtract operations, and k Montgomery multiplications up to the point where 2.sup.2.sup.k.sup.b is in the range

(122) $\frac{R}{2} < 2^{2^{k} b} < 2 R .$

(123) A Montgomery ALU may be configured with static values of either or both of b and k. Alternatively, control logic block 104 may dynamically configure a Montgomery ALU with values of b and k, or may dynamically set the values of b and k for a calculation of the conversion component. The values of b and k may be adjusted for the calculation of different conversion components. For example, the value of b may be adjusted upwards to increase the number of shift and subtract operations performed by the Montgomery ALU, and to decrease the number of Montgomery multiplications performed by the Montgomery ALU. Conversely, the value of b may be adjusted downwards to decrease the number of shift and subtract operations performed by the Montgomery ALU, and to increase the number of Montgomery multiplications performed by the Montgomery ALU.

(124) Accordingly an embodiment of the Montgomery ALU may be configured to set the values of b and k, or may be configured to calculate preferred values of b and k according to the requirements of the cryptographic processing unit 102. Accordingly, the values of b and k may be hardcoded, dynamically selected, or dynamically calculated.

(125) The selection of preferred values of b and k may depend on the architecture of the subsection 600 of the Montgomery ALU 110. Alternatively or additionally, the selection of the values of b and k may depend on the operational cost of performing the shift and subtract operation, and the operational cost of performing the Montgomery multiplication function, where operational cost may comprise the processing time consumed per operation, the number of clock cycles consumed per operation, or the power consumed per operation.

(126) In one embodiment, the Montgomery ALU 110 chooses the value of b as the first integer for which the Montgomery multiplication operation consumes less processing time than b iterations of the shift and subtract operation. Processing time may be measured in clock cycles, machine cycles, or units of time.

(127) In another embodiment, the Montgomery ALU 110 sets values b and k to avoid the need to apply an adjustment parameter in step 710. More particularly, R is in the form R=2.sup.l, where l is in the form l=b2.sup.k l, where b is an odd integer. Using this factorization, the Montgomery ALU 110 calculates the conversion component R.sup.2 mod n by b shift and subtract operations and k Montgomery multiplications, with no need to apply an adjustment parameter based on the excess parameter 2.sup.c, in step 710.

(128) For example, considering the situation in which R=2.sup.2048=2.sup.2.sup.11, if the Montgomery ALU 110 selects b=10, the Montgomery ALU will perform 10 shift and subtract operations and 8 Montgomery multiplications to obtain 2.sup.10.Math.2.sup.8 R mod n, where 2.sup.10.Math.2.sup.8=2.sup.2560>>2.sup.2048. Accordingly, the intermediate result exceeds the value of the conversion component. As a result, the Montgomery ALU 100 will adjust the intermediate result downwards, by Montgomery multiplying the intermediate result with the adjustment parameter to obtain the conversion component. Alternatively, if the Montgomery ALU 11 selects b=8, the Montgomery ALU 110 will calculate exactly 2.sup.2048R mod n in just 8 shift and subtract operations and 8 Montgomery multiplications, with no need to apply an adjustment parameter.

(129) According to another embodiment, the values of b and k may be determined empirically by calculating, for different values of b, the minimal value k, such that b.Math.2.sup.k≥2.sup.1. For each value of b, the Montgomery ALU 110 calculates the operational cost of computing b shift and subtract operations followed by k Montgomery multiplications and a further Montgomery multiplication to apply the adjustment parameter to reduce the intermediate result to the value of the conversion component. Accordingly, a value of b may be chosen to minimize the cost (in terms of processing time, power consumption, clock cycles or machine cycles) of b shift and subtract operations and k+1 Montgomery multiplications. In one embodiment, the Montgomery ALU sets the value of b based on a calculation of the processing time and/or power that would be consumed in the calculation of the conversion component for that value of b.

(130) According to another embodiment, the Montgomery ALU 110 calculates the maximal value k, such that b.Math.2.sup.k≤2.sup.1 for many values of b. The Montgomery ALU 110 calculates R.sup.2 mod n for each value of b by b shift and subtract operations and k Montgomery multiplications, which provides a value of 2.sup.b.Math.2.sup.kR mod n, followed by a further Montgomery multiplication of the intermediate result with an adjustment parameter to increase the intermediate result to the value of the conversion component.

(131) Performance

(132) FIGS. 8A-8B are tables comprising performance figures for calculating R.sup.2 mod n, according to the third embodiment, where the key size is 2048 bits, 3072 bits and 4096 bits. FIGS. 8A and 8B provide performance figures for calculating R.sup.2 mod n where the value of b is 16 and 1024, respectively.

(133) Considering FIG. 8A, it can be seen that, for a key size of 2048 bits and for b=16, the method of the third embodiment consumes 36,754 clock cycles compared to 52,728 clock cycles consumed by calculating the conversion component via the first embodiment (referring to FIG. 5A). At a clock frequency of 200 MHz, this translates to a latency improvement of 79.87 microseconds.

(134) Considering also FIG. 8B, it can be seen that, for a key size of 2048 bits, the application of the method of the third embodiment with b=1024, increases the number of clock cycles consumed to calculate the conversion component from 36,754 to 107,257 compared to the method of the third embodiment with b=16. At a clock frequency of 200 MHz, this translates to a latency increase of 352.52 microseconds.

(135) Furthermore, the number of clock cycles consumed (107,257) to calculate the conversion component for key size of 2048 bits, via the method of the third embodiment with b=1024 is higher than the number of clock cycles consumed (52,728) to calculate the conversion component for key size 2048 bits via the method of the first embodiment (referring to FIG. 5A). Accordingly, it can be seen that a non-optimal value of b can increase the latency for calculating the conversion component.

(136) It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the above-described embodiments, without departing from the broad general scope of the present disclosure. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

(137) It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described herein. Rather, the scope of the present invention is defined only by the claims that follow.

Low complexity conversion to Montgomery domain

Assignee

Inventors

Cpc classification

Classification Explorer

G09C1/00

PHYSICS

Classification Explorer

G06F7/728

PHYSICS

Classification Explorer

G06F7/722

PHYSICS

Classification Explorer

G06F7/72

PHYSICS

Classification Explorer

H04L9/302

ELECTRICITY

Classification Explorer

H04L9/0841

ELECTRICITY

Classification Explorer

H04L2209/122

ELECTRICITY

International classification

Classification Explorer

G06F21/00

PHYSICS

Classification Explorer

G09C1/00

PHYSICS

Classification Explorer

G06F7/72

PHYSICS

Abstract

Claims

Description