Optimized hardward architecture and method for ECC point addition using mixed affine-jacobian coordinates over short weierstrass curves
09900154 ยท 2018-02-20
Assignee
Inventors
Cpc classification
H04L9/3066
ELECTRICITY
International classification
Abstract
An optimized hardware architecture and method introducing a simple arithmetic processor that allows efficient implementation of an Elliptic Curve Cryptography point addition algorithm for mixed Affine-Jacobian coordinates. The optimized architecture additionally reduces the required storage for intermediate values.
Claims
1. A data cryptographic apparatus comprising: computational logic configured to perform an elliptic curve cryptography (ECC) point addition operation using mixed affine-Jacobian coordinates over a short Weierstrauss curve of the form y=x.sup.3+ax+b where a=3; a register memory configured to store a first point in affine coordinates and a second point in Jacobian coordinates, wherein the register memory is configured for two temporary storage variables, T.sub.1 and T.sub.2; a modular multiplier electrically coupled to the register memory, wherein the modular multiplier is configured to perform at most one modular multiplication for each step in a sequence of steps in the ECC point addition operation; and a simple arithmetic processor configured to perform modular subtraction and modular multiplication by two in support of the ECC point addition operation utilizing two modular subtractors, a logical one bit left shifter to either output AB2C for an input of variables A, B, and C or AB for an input of variables A and B, wherein the simple arithmetic processor is electrically coupled to the computational logic, the register memory, and the modular multiplier to output a result of the ECC point addition operation in the Jacobian coordinates.
2. A mobile device comprising the data cryptographic apparatus of claim 1.
3. A smartcard comprising the data cryptographic apparatus of claim 1.
4. The mobile device of claim 2, wherein the mobile device is a smartphone.
5. A method for performing an elliptic curve cryptography (ECC) point addition operation using mixed affine-Jacobian coordinates over a short Weierstrauss curve of the form y=x.sup.3+ax+b where a=3 comprising: accepting, with a computational device, as variable input a first point in affine coordinates and a second point in Jacobian coordinates using a simple arithmetic processor; configuring the simple arithmetic processor for modular subtraction and modular multiplication by two utilizing two modular subtractors, a logical one bit left shifter to either output AB2C for an input of variables A, B, and C or AB for an input of variables A and B; enabling a modular multiplier of the computational device to execute a sequence of steps to perform the ECC point addition operation of the first point and the second point, wherein the modular multiplier performs at most one modular multiplication for each step in the sequence of steps, wherein the sequence of steps requires no more than two temporary variables; and outputting, by the computational device, a result of the ECC point addition operation in the Jacobian coordinates.
6. The method of claim 5, wherein the computational device is part of a mobile device.
7. The method of claim 6, wherein the mobile device is a smartphone.
8. The method of claim 5, wherein the computational device is part of a smartcard.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION
(8) PADD algorithm 300 in accordance with the invention is shown in
(9) As input in step 301, PADD algorithm 300 shown in
(10) The most computationally intensive operation in PADD algorithm 300 in
(11) It is important to note that besides the modular multiplication steps performed in steps 306, 307, 314, 315 and 316 of PADD algorithm 300, two additional, comparatively simple operations are performed as well: modular subtraction and modular multiplication by 2. Note that multiplication or division by a power of 2 in binary is merely a shift operation. In order to speed up execution of PADD algorithm 300 and eliminate the need for additional temporary registers, an embodiment in accordance with the invention of simple arithmetic unit (SAU) 400 with the inputs and outputs as shown in
(12)
(13)
(14) SAU 400 shown in
(15) Multi-cycle multiplier 610 functions by multiplying the values on inputs 635 and 640 together and outputting the result. Steps 301-303 are performed in the microprocessor (not shown) without using multi-cycle multiplier 610 and SAU 400.
(16) Step 304 utilizes multi-cycle multiplier 610. Register memory 695 provides Z.sub.1 on both inputs 635 and 640 of multi-cycle multiplier 610 and multi-cycle multiplier 610 computes Z.sub.1.sup.2 which is sent on line 650 to register memory 695 and stored in temporary register T.sub.1.
(17) Step 305 utilizes multi-cycle multiplier 610. Register memory 695 provides T.sub.1 on input 635 and Z.sub.1 on input 640 of multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.1*Z.sub.1 which is sent on line 650 to register memory 695 where it is stored in temporary register T.sub.2.
(18) Step 306 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides T.sub.2 and y.sub.2 on lines 635 and 640, respectively, to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.2*y.sub.2 which is output on line 650 to input 1 of MUX 620 with MUX 620 set to 1. MUX 630 input is set to 0. MUX 620 sends T.sub.2*y.sub.2 to input A of SAU 400 on line 670. Line 670 is directly connected to the minuend input of subtractor 710. Register memory 695 provides Y.sub.1 on line 660 to input 0 of MUX 630 and MUX 630 is set to 0. MUX 630 sends Y.sub.1 to input B of SAU 400 on line 675. Line 675 is directly connected to the subtrahend input of subtractor 710. Subtractor 710 computes AB (which is T.sub.2*y.sub.2Y.sub.1) and outputs AB on line 730 to input 0 of MUX 740 with MUX 740 set to 0. MUX 740 sends D (which is AB) on line 690 to register memory 695 where it is stored in temporary register T.sub.2.
(19) Step 307 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides T.sub.1 and x.sub.2 on lines 635 and 640, respectively, to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.1*x.sub.2 which is output on line 650 to input 1 of MUX 620 with MUX 620 set to 1. MUX 620 sends T.sub.1*x.sub.2 to input A of SAU 400 on line 670. Line 670 is directly connected to the minuend input of subtractor 710. Register memory 695 provides X.sub.1 on line 660 to input 0 of MUX 630 and MUX 630 is set to 0. MUX 630 sends X.sub.1 to input B of SAU 400 on line 675. Line 675 is directly connected to the subtrahend input of subtractor 710. Subtractor 710 computes AB (which is T.sub.1*x.sub.2X.sub.1) and outputs AB on line 730 to input 0 of MUX 740 with MUX 740 set to 0. MUX 740 sends D (which is AB) on line 690 to register memory 695 where it is stored in temporary register T.sub.1.
(20) Steps 308-309 are performed in the microprocessor (not shown) without using multi-cycle multiplier 610 and SAU 400.
(21) Step 310 utilizes multi-cycle multiplier 610. Register memory 695 provides T.sub.1 on line 635 and Z.sub.1 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.1*Z.sub.1 and the result is output on line 650 to register memory 695 where it is stored in temporary register T.sub.2.
(22) Step 311 utilizes multi-cycle multiplier 610. Register memory 695 provides T.sub.1 on both lines 635 and 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.1.sup.2 and the result is output on line 650 to register memory 695 Y.sub.3 where it is stored in Y.sub.3.
(23) Step 312 utilizes multi-cycle multiplier 610. Register memory 695 provides T.sub.2 on both lines 635 and 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.2.sup.2 and the result is output on line 650 to register memory 695 where it is stored in X.sub.3.
(24) Step 313 utilizes multi-cycle multiplier 610. Register memory 695 provides T.sub.1 on line 635 and Y.sub.3 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes T.sub.1*Y.sub.3 and the result is output on line 650 to register memory 695 where it is stored in temporary register T.sub.1.
(25) Step 314 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides X.sub.3 on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends X.sub.3 to input A of SAU 400 on line 670. Line 670 is directly connected to the minuend input of subtractor 710. Register memory 695 provides T.sub.1 on line 660 to input 0 of MUX 630 with MUX 630 set to 0. MUX 630 sends T.sub.1 on line 675 to input B of SAU 400. Line 650 is directly connected to the subtrahend input of subtractor 710. Subtractor 710 computes and outputs AB (which is X.sub.3T.sub.1) on line 730 to the minuend input of subtractor 720. Register memory 695 provides X.sub.1 on line 635 and Y.sub.3 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes Y.sub.3*X.sub.1. The result is output on line 650 to input C of SAU 400 which is directly connected to logical one bit left shifter 715 which multiplies input C by two and outputs 2C (which is 2Y.sub.3*X.sub.1) on line 735 to the subtrahend output of subtractor 720. Subtractor 720 computes and outputs AB2C on line 750 to input 1 of MUX 740 with MUX 740 set to 1. MUX 740 sends D (which is AB2C=X.sub.3T.sub.12Y.sub.3*X.sub.1) on line 690 to register memory 695 where it is stored in X.sub.3.
(26) Step 315 utilizes both multi-cycle multiplier 610 and SAU 400. In step 314, Y.sub.3*X.sub.1 was computed by multi-cycle multiplier 610. Hence, Y.sub.3*X.sub.1 is still present in the output register (not shown) of multi-cycle multiplier 610 and in Step 315 is sent on line 650 to input 1 of MUX 620 and MUX 620 is set to 1. MUX 620 sends Y.sub.3*X.sub.1 on line 670 to input A of SAU 400. Line 670 is connected directly to the minuend input of subtractor 710. Register memory 695 provides X.sub.3 on line 660 to input 0 of MUX 630 with MUX 630 set to 0. MUX 630 sends X.sub.3 on line 675 to input B of SAU 400. Line 675 is directly connected to the subtrahend input of subtractor 710. Subtractor 710 calculates AB and sends the result on line 730 to input 0 of MUX 740 with MUX 740 set to0. MUX 740 sends D (which is AB=Y.sub.3*X.sub.1X.sub.3) on line 690 to register memory 695 which passes D through on line 635 and provides T.sub.2 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes D*T.sub.2 (which is (Y.sub.3*X.sub.1X.sub.3)*T.sub.2) and outputs the result on line 650 to register memory 695 where the result is stored in Y.sub.3.
(27) Step 316 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides Y.sub.3 on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends Y.sub.3 on line 670 to input A of SAU 400. Line 670 is directly connected to the minuend of subtractor 710. Register memory 695 provides T.sub.1 on line 635 and Y.sub.1 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes and outputs T.sub.1*Y.sub.1 on line 650 to input 1 of MUX 630 with MUX 630 set to 1. MUX 630 sends T.sub.1*Y.sub.1 on line 675 to input B of SAU 400. Line 675 is directly connected to the subtrahend of subtractor 710. Subtractor 710 computes AB (which is Y.sub.3T.sub.1*Y.sub.1) and provides the result on line 730 to input 0 of MUX 740 with MUX 740 set to 0. MUX 740 sends D (which is Y.sub.3T.sub.1*Y.sub.1) on line 690 to register memory 695 where the result is stored in Y.sub.3.
(28) Step 317 returns the result of the addition of P+Q in Jacobian coordinates which is (X.sub.3, Y.sub.3, Z.sub.3).