Efficient modular addition resistant to side channel attacks

09544131 ยท 2017-01-10

Assignee

Inventors

Cpc classification

International classification

Abstract

A cryptographic device performs modular addition between a first integer value x and a second integer value y in a processor by: obtaining a first masked input {circumflex over (x)}, a second masked input , a first mask r.sub.x and a second mask r.sub.y, the first masked input {circumflex over (x)} resulting from the first integer value x masked by the first mask r.sub.x and the second masked input resulting from the second integer value y masked by the second mask r.sub.y; computing a first iteration masked carry value .sub.1, using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x, the second mask r.sub.y and a carry mask value ; recursively updating the masked carry value .sub.i to obtain a final masked carry value .sub.k1, wherein the masked carry value is updated using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x, the second mask r.sub.y, and the carry mask value ; combining the first masked input {circumflex over (x)} and the second masked input and the final masked value .sub.k1 to obtain an intermediate value; combining the intermediate value with the carry mask value to obtain a masked result; and outputting the masked result and a combination of the first mask r.sub.x and the second mask r.sub.y. It is preferred that the combinations use XOR.

Claims

1. A method of performing modular addition between a first integer value x and a second integer value y, the method comprising, in a hardware processor: obtaining a first masked input {circumflex over (x)}, a second masked input , a first mask r.sub.x and a second mask r.sub.y, the first masked input {circumflex over (x)} resulting from the first integer value x masked by the first mask r.sub.x and the second masked input resulting from the second integer value y masked by the second mask r.sub.y; computing a first iteration carry value c.sub.1, using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x, and the second mask r.sub.y; recursively updating intermediate carry values c.sub.i, to obtain a final carry value c.sub.k1, wherein an intermediate carry value is updated using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x and the second mask r.sub.y; combining the first masked input {circumflex over (x)} and the second masked input and the final carry value c.sub.k1 to obtain a masked result; and outputting the masked result.

2. The method of claim 1, wherein the first iteration carry value, the intermediate carry values and the final carry value are masked and: the first iteration carry value c.sub.1 is computed using also a carry mask value ; and the intermediate carry values c.sub.i are updated using also the carry mask value ; and wherein the masked result is obtained by: combining the first masked input {circumflex over (x)} and the second masked input and the final masked carry value to obtain an intermediate value; and combining the intermediate value with the carry mask value to obtain a masked result.

3. The method of claim 2, wherein the intermediate value and the masked result are obtained using XOR between the combined values.

4. The method of claim 2, further comprising outputting a combination of the first mask r.sub.x and the second mask r.sub.y.

5. The method of claim 4, wherein the combination of the first mask r.sub.x and the second mask r.sub.y is obtained using XOR.

6. The method of claim 1, wherein the modular addition is used to subtract the second integer value y from the first integer value x, the method further comprising: between the obtaining and the computing, setting the first masked input {circumflex over (x)} to the bitwise complementation of the first masked input {circumflex over (x)}; and between the combining and the outputting, setting the masked result to the bitwise complementation of the masked result.

7. A device for performing modular addition between a first integer value x and a second integer value y, the device comprising a hardware processor configured to: obtain a first masked input {circumflex over (x)}, a second masked input , a first mask r.sub.x and a second mask r.sub.y, the first masked input {circumflex over (x)} resulting from the first integer value x masked by the first mask r.sub.x and the second masked input y resulting from the second integer value y masked by the second mask r.sub.y; compute a first iteration carry value c.sub.1l, using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x, and the second mask r.sub.y; recursively update intermediate carry values c.sub.i, to obtain a final carry value c.sub.k1 wherein an intermediate carry value is updated using the first masked input {circumflex over (x)}, the second masked input , the first mask r.sub.x and the second mask r.sub.y; combine the first masked input {circumflex over (x)} and the second masked input and the final carry value c.sub.k1 to obtain a masked result; and output the masked result.

8. The device of claim 7, wherein the first iteration carry value, the intermediate carry values and the final carry value are masked and the hardware processor is configured to: compute the first iteration carry value c.sub.1 using also a carry mask value ; and update the intermediate carry values c.sub.i using also the carry mask value ; and wherein the hardware processor is configured to combine the first masked input {circumflex over (x)} and the second masked input and the final masked carry value to obtain an intermediate value, and to combine the intermediate value with the carry mask value to obtain the masked result.

9. The device of claim 8, wherein the intermediate value and the hardware processor is configured to use XOR between the combined values to obtain the masked result.

10. The device of claim 8, wherein the hardware processor is further configured to output a combination of the first mask r.sub.x and the second mask r.sub.y.

11. The device of claim 7, wherein the hardware processor is configured to use the modular addition to subtract the second integer value y from the first integer value x, the hardware processor being further configured to: set the first masked input {circumflex over (x)} to the bitwise complementation of the first masked input {circumflex over (x)}; and set the masked result to the bitwise complementation of the masked result.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) Preferred features of the present principles will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which

(2) FIG. 1 illustrates a cryptographic device according to a preferred embodiment of the present principles; and

(3) FIG. 2 illustrates a generalization to three or more integers.

DESCRIPTION OF EMBODIMENTS

(4) A main idea of the present principles is to compute addition from Boolean masked inputs by implementing a secure version of the modular addition, that is an implementation where the input values of x and y and the successive values of the carry c.sub.i are kept always masked and where the algorithm outputs a Boolean masked result. This approach is effective if an addition modulo 2.sup.k occurs in combination with a Boolean operation (e.g. a XOR or any operation that is compatible with Boolean masking like logical Shifts and Rotations).

(5) In the algorithm of the present principles, it is ensured that the computations do not leak information about x and y, nor about the successive carries c.sub.i.

(6) Goubin's idea is followed and the carry with 2 is masked, but a change is introduced. Indeed, it is remarked that both the first round carry c.sub.1 and can be computed from the same value .sub.0=(xcustom charactery). It is then proposed to reorder the operations for computing . The reordering allows saving a few operations in the secure version of the algorithm.

(7) Given that .sub.0=2 and the definition of , the masked version of the carry equation simplifies to:

(8) c ^ i = { 2 0 , for i = 0 2 [ c ^ i - 1 ( x y ) ] , for 2 i k - 1
where =[2custom character(xy)].sub.0 is pre-computed once and is the same for round i=1 and for all rounds i{2, . . . , k1}. This means that the first loop iteration is saved and three operations (one AND, one XOR and one Shift) are traded against one Shift operation. The main loop for the blinded carry addition algorithm then becomes:

(9) TABLE-US-00003 /* First round iteration (i = 1) */ B 2 /* Main loop */ for i = 2 to k1 do B Bcustom character A B B B 2B end for

(10) It now remains to compute securely .sub.i from blinded inputs {circumflex over (x)}=xr.sub.x and =yr.sub.y. From a security view-point, at each step of the loop the computation of .sub.i depends on the computation of .sub.i, where .sub.i=.sub.i1custom character(xy). The computation of .sub.i and can be carried out separately and in a secure way. Regarding =2custom character (xy).sub.0, the first part of , i.e. 2custom character(xy) can be securely implemented as
A=xy,A=Acustom character2,
and
R=r.sub.xr.sub.y,R=Rcustom character2.

(11) It will be appreciated that A and R cannot be XOR-ed together as this would unmask the operands. For the second part of , i.e. .sub.0=(xcustom charactery), the calculation relies on a method proposed by Elena Trichina [see Combinational Logic Design for AES Subbyte Transformation on Masked Data. IACR Cryptology ePrint Archive, 2003:236, 2003]. Given the distributive property of the bitwise AND over the XOR, a masked version of the bitwise AND operation can be divided in four AND operations calculated pair-wise between masked data and masks (operations are done with masked data and masks independent from each other):
xcustom charactery=r.sub.xcustom characterr.sup.ycustom characterr.sub.x{circumflex over (x)}custom character{circumflex over (x)}custom characterr.sub.y

(12) If evaluated left-to-right, the expression does not leak any information about the operands. Interestingly, the result of the expression is randomized with and thus is uniformly distributed. This result can be subsequently XOR-ed with values A and R without unmasking the operands. In the present principles, can then be calculated from the inputs {circumflex over (x)}, , r.sub.x, r.sub.y and as follows:

(13) 1. Compute .sub.0=xcustom charactery using the equation hereinbefore.

(14) 2. Compute =.sub.0A.

(15) 3. Compute =R.

(16) Regarding =.sub.i=c.sub.i1custom character(xy) and given the Boolean masking form {circumflex over (x)}=xr.sub.x, =yr.sub.y, its masked version can be computed as

(17) 1. Compute {circumflex over ()}.sub.i=.sub.i1custom character({circumflex over (x)}).

(18) 2. Compute r.sub..sub.i=.sub.i1custom character(r.sub.xr.sub.y).

(19) Similarly to A and R, the values {circumflex over ()}.sub.i and r.sub..sub.i cannot be XOR-ed together as this would unmask the operands. Hence, is randomized with , and can then be used to securely compute .sub.i as .sub.i={circumflex over ()}.sub.ir.sub..sub.i. The secure addition algorithm is illustrated in Algorithm 3:

(20) TABLE-US-00004 Algorithm 3 Secure addition Input: ({circumflex over (x)}, , r.sub.x, r.sub.y) where {circumflex over (x)} = x r.sub.x and = y r.sub.y Output: (A, R) where A = (x + y) r.sub.x r.sub.y and R = r.sub.x r.sub.y /* Initialization */ 1: C (random that can be pre-generated) /* Compute .sub.0 = xcustom character y */ 2: T r.sub.xcustom character r.sub.y 3: T C 4: T custom character r.sub.x 5: t 6: T {circumflex over (x)}custom character 7: T 8: T {circumflex over (x)}custom character r.sub.y 9: T /* Compute = .sub.0 2 custom character (x y) */ 10: A {circumflex over (x)} 11: R r.sub.x r.sub.y 12: C 2C 13: T Ccustom character A 14: T 15: T Ccustom character R 16: T /* First round (i = 1) */ 17: D 2 /* Main loop */ 18: for i = 2 to k 1 do 19: T Dcustom character R 20: D Dcustom character A 21: D D 22: D D T 23: D 2D 24: end for /* XOR with the final carry */ 25: A A D /* Remove the carry mask 2 */ 26: A A C 27: return (A, R)

(21) From a performance point of view, is pre-computed when the main loop starts as well as A={circumflex over (x)} and R=r.sub.xr.sub.y. This pre-computation enables the update of .sub.i inside the main loop using only two additional operations when compared to Algorithm 2 (one AND and one XOR). Algorithm 3 uses 4 additional temporary variables (C, D, T and ), generates one random and takes 5 k+8 operations: 2 k+6 XORs, 2 k+2 ANDS and k logical shifts.

Variant Embodiment

(22) It will be appreciated that it can happen that one of the two operands is masked while the other is not (i.e. adding a variable 2 and a constant K). This can for example be useful with cryptographic algorithms that perform addition and subtraction with pre-defined constants. In a prior art solution, the Boolean masked input {circumflex over (x)}=xr.sub.x is first converted to arithmetic masked value A.sub.x=xr.sub.x using 7 operations. Then the addition with the constant is performed (A.sub.x+K=xr.sub.x+K, r.sub.x)with an unchanged maskand the addition result is finally converted back to a Boolean masked output =(x+K)r.sub.x using an Arithmetic-to-Boolean conversion algorithm. This costs 5 k+5+7+1=5 k+13 operations using Goubin's conversion methods.

(23) The following Algorithm 4 provides a faster algorithm. The main difference to Algorithm 3 is in the initialization step where some operations can be saved as only one operand is masked.

(24) TABLE-US-00005 Algorithm 4 Secure addition with one masked operand Input: ({circumflex over (x)}, K, r.sub.x) where {circumflex over (x)} = x r.sub.x Output: A = (x + K) r.sub.x mod 2.sup.k /* Initialization */ 1: C (random that can be pre-generated) /* Compute = Kcustom character r.sub.x {circumflex over (x)}custom character K */ 2: Kcustom character r.sub.x 3: C 4: T {circumflex over (x)}custom character K 5: T /* Compute = 2custom character ({circumflex over (x)} K) 2custom character r.sub.x */ 6: A {circumflex over (x)} K 7: C 2C 8: T Ccustom character A 9: T 10: T Ccustom character r.sub.x 11: T /* First round (i = 1)*/ 12: D 2 /* Main loop */ 13: for i = 2 to k 1 do 14: T Dcustom character r.sub.x 15: D Dcustom character A 16: D D 17: D D T 18: D 2D 19: end for /* XOR with the final carry */ 20: A A D /* Remove the carry mask 2 */ 21: A A C 22: return A

(25) The main loop and the aggregation step of the algorithm remain essentially unchanged. Compared to the initialization of Algorithm 3, it will be appreciated that 5 elementary operations are saved (two ANDS and three XORs), which reduces the algorithm cost to 5 k+3 operations.

(26) The algorithms described so far add at most two operands, but certain cryptographic algorithms require the addition of more operands. The straightforward way of adding m integers is to add the first two, then to add the resulting sum to the next integer, and so on. This would require a total of m1 additions. As the secure addition algorithm of the present principles costs 5 k+8 operations, an addition with m blinded operands would have a huge cost; i.e. (m1)(5 k+8) operations, rendering it unusable in practice.

(27) The following algorithm provides a much more efficient approach that follows the so-called carry-save addition technique. Loosely speaking, it consists in keeping track of the carry in a separate variable. For example, for adding three k-bit integers x, y, zcustom character.sub.2.sub.k:
x+y+z=(xyz)+2(zcustom character(xy)(xcustom charactery))(mod 2.sup.k).

(28) It will be appreciated that the addition of three integers boils down to adding two integers A and B, where
A=(xyz) and B=2(zcustom character(xy)(xcustom charactery)).

(29) As can be seen, expressions of A and B involve logical operations only.

(30) The approach extends naturally to more than three integers by iteration, as illustrated in FIG. 2.

(31) For example, four kbit integers x.sub.1, x.sub.2, x.sub.3, x.sub.4 are added using two iterations as A.sup.(1)=(x.sub.1x.sub.2x.sub.3), B.sup.(1)=2(x.sub.3custom character(x.sub.1x.sub.2)(x.sub.1custom characterx.sub.2)), A.sup.(2)=(A.sup.(1)B.sup.(1)x.sub.4), B.sup.(2)=2(x.sub.4custom character(A.sup.(1)B.sup.(1))(A.sup.(1)custom characterB.sup.(1))) and finally x.sub.1+x.sub.2+x.sub.3+x.sub.4=A.sup.(2)+B.sup.(2).

(32) More generally, m k-bit integers, x.sub.1, x.sub.2, . . . , x.sub.mcustom character.sub.2.sub.k, are added with (m2) evaluations of pairs (A.sup.(i), B.sup.(i)), 1im2, and a final addition to eventually get their sum, s=.sub.i=1.sup.m x.sub.1, as s=A.sup.(m-2)+B.sup.(m-2).

(33) The secure version of the carry-save addition using blinded input custom character, custom character, . . . , custom character and from masks r.sub.1, r.sub.2, . . . , r.sub.m works as follows:

(34) 1. A is securely evaluated as:
={circumflex over (B)},R.sub.A=R.sub.AR.sub.BR.sub.C.
where =custom character, R.sub.A=r.sub.1 and {circumflex over (B)}=custom character, R.sub.B=r.sub.2 and =custom character, R.sub.C=r.sub.3

(35) 2. B is computed from , {circumflex over (B)}, and from masks R.sub.A, R.sub.B, R.sub.C. as follows: i) Compute {circumflex over (T)}.sub.1=x.sub.1custom characterx.sub.2 as a secure AND using , {circumflex over (B)}, R.sub.A, R.sub.B and a random with Trichina's method. ii) Compute {circumflex over (T)}.sub.2={circumflex over (B)} and R.sub.T.sub.2=R.sub.AR.sub.B. iii) Compute {circumflex over (B)}=[x.sub.3custom character(x.sub.1x.sub.2)(x.sub.1custom characterx.sub.2)] using {circumflex over (T)}.sub.1, {circumflex over (T)}.sub.2, , , R.sub.T.sub.2, R.sub.C with a variant of Trichina's method. iv) Finally set {circumflex over (B)}=2{circumflex over (B)} and R.sub.B=2.

(36) This process is repeated again with =custom character, R.sub.C=r.sub.4 and the new values of A, B, R.sub.A and R.sub.B. The process is repeated (m2) times until =custom character, R.sub.C=r.sub.m, which enables us getting the pair (A.sup.(m-2), B.sup.(m-2)). Every iteration requires 22 additional operations. Therefore, the total cost of the generalized algorithm is 22(m2)+5 k+8=22 m+5 k36.

(37) The addition algorithm for m blinded operands can be described as follows:

(38) TABLE-US-00006 Algorithm 5 Secure binary addition with m blinded operands Input: ({circumflex over (x)}.sub.1, ..., {circumflex over (x)}.sub.m, r.sub.1, ..., r.sub.m) (custom character ).sup.2m such that {circumflex over (x)}.sub.1 = x.sub.1 r.sub.1, ..., {circumflex over (x)}.sub.m = x.sub.m r.sub.m Output: (, r.sub.s) where = (.sub.i=1.sup.m x.sub.i) r.sub.s (mod 2.sup.k) 1: random(2.sup.k); 2: A {circumflex over (x)}.sub.1; R.sub.A r.sub.1; B {circumflex over (x)}.sub.2; R.sub.B r.sub.2 3: for i = 1 to m 2 do /* Compute = (Acustom character B) */ 4. T Acustom character B; D T 5. T Acustom character R.sub.B; D D T 6. T R.sub.Acustom character B; D D T 7. T R.sub.Acustom character R.sub.B; D D T /* Compute 2 [ {circumflex over (x)}.sub.i+2custom character (A B)] */ 8. A AB; R.sub.A R.sub.AR.sub.B 9. T {circumflex over (x)}.sub.i+2custom character A; B T D 10. T {circumflex over (x)}.sub.i+2custom character R.sub.A; B B T 11. T r.sub.i+2custom character A; B B T 12. T r.sub.i+2custom character R.sub.A; B B T 13. B 2B; R.sub.B 2 /* A x.sub.i+2 */ 14. A A {circumflex over (x)}.sub.i+2; R.sub.A R.sub.A r.sub.i+2 15. end for 16. SecureAdd(A, B, R.sub.A, R.sub.B) using Algorithm 3

(39) Although described for adding over numbers modulo 2.sup.k, the secure addition algorithm (Algorithm 3) readily extends to output the results over the integers. The algorithm can indeed accommodate operands of arbitrary length and compute their blinded sum by running it modulo 2.sup.k+1 for any kmax(bit-length(x), bit-length(y))where bit-length denotes the binary length.

(40) The secure addition algorithm (Algorithm 3) can also be used for subtraction. x is used to denote the bitwise complementation of x, namely x=x(1). The secure subtraction algorithm runs in three steps: 1. Compute {circumflex over (x)}; 2. Use Algorithm 3 on input ({circumflex over (x)}, , r.sub.x, r.sub.y) and obtain (, r.sub.s) where =(x+y)r.sub.s and r.sub.s=r.sub.x+r.sub.y; 3. Set = and r.sub.w=r.sub.s, and return (, r.sub.w)
where =(xy)r.sub.w (mod 2.sup.k) and r.sub.w=r.sub.xr.sub.y.

(41) This subtraction algorithm can also be adapted to work with more than two operands and over the integers, as described for the addition algorithm.

(42) FIG. 1 illustrates a cryptographic device according to a preferred embodiment of the present principles. The cryptographic device 110 comprises an input unit 111 for receiving input and an output unit 113 for outputting data. The cryptographic device further comprises a hardware processor 112 configured to receive two or more input operands (one in the single-variable variant) and to perform secure modular addition according to any of the embodiments described herein.

(43) It will thus be appreciated that a software implementation of the present principles provides modular addition algorithms that is resistant against DPA attacks. The algorithms work with modular subtraction as well and are more efficient than Goubin's method. Using the same number of registers 13 elementary operations can be saved when two masked operands are used and 10 elementary operations are saved when only one masked operand is used.

(44) For 8-bit processors the gain of the algorithm is significant as it represents a decrease of about 21.3%. It will thus be appreciated that the algorithm provides the optimal choice regarding memory versus speed complexity, which makes the algorithm attractive for resource-constraint devices.

(45) The algorithm can be used for protecting the International Data Encryption Algorithm (IDEA) and TEA family block ciphers (TEA, XTEA, XXTEA) as well as Hash Message Authentication Code (HMAC) algorithms based on SHA-1 or SHA-2 against DPA attacks. IDEA uses 16-bit operands whereas SHA-1 or SHA-2 uses 32-bit operands. For smaller operands, one could also use our algorithm for protecting Secure And Fast Encryption Routine (SAFER). SAFER encryption uses additions modulo 28. For larger operands, the algorithm is also applicable to the SKEIN hash function or the Threefish block-cipher that work with variables of 64-bit size.

(46) It will be appreciated that the method of the present principles is particularly suited for devices with limited resources, in particular memory that otherwise for example could be used to store lookup tables.

(47) Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features described as being implemented in hardware may also be implemented in software, and vice versa. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.