Optimized hardware architecture and method for ECC point doubling using jacobian coordinates over short weierstrass curves
09979543 ยท 2018-05-22
Assignee
Inventors
Cpc classification
H04L9/3066
ELECTRICITY
G06F7/726
PHYSICS
International classification
H04L9/00
ELECTRICITY
Abstract
An optimized hardware architecture and method introducing a simple arithmetic processor that allows efficient implementation of an Elliptical Curve Cryptography point doubling algorithm for Jacobian coordinates. The optimized architecture additionally reduces the required storage for intermediate values to one intermediate value.
Claims
1. A data cryptographic apparatus comprising: a computational logic configured to perform an elliptic curve cryptography (ECC) point doubling operation using Jacobian coordinates over a short Weierstrass curve of the form y=x.sup.3+ax+b where a=3; a register memory configured to store a point in the Jacobian coordinates, wherein the register memory is configured for no more than one temporary storage variable, ; a modular multiplier electrically coupled to the register memory, wherein the modular multiplier is configured to perform at most one modular multiplication for each step in a sequence of steps in the ECC point doubling operation; and a simple arithmetic processor configured to perform modular subtraction, modular addition, and modular multiplication by two in support of the ECC point doubling operation utilizing a logical one bit left shifter, a logical two bit left shifter, a logical three bit left shifter, and a multiplier by three constructed using a logical one bit left shifter and an adder configured to output either A+B, A8B, 2A, 3(AB), or 4AB for an input of variables A and B, wherein the simple arithmetic processor is electrically coupled to the register memory, the computational logic and the modular multiplier, to output the result of the ECC point doubling operation in the Jacobian coordinates.
2. A mobile device comprising the apparatus of claim 1.
3. A smartcard comprising the apparatus of claim 1.
4. The mobile device of claim 2, wherein the mobile device is a smartphone.
5. A method for performing an elliptic curve cryptography (ECC) point doubling operation using Jacobian coordinates over a short Weierstrass curve of the form y=x.sup.3+ax+b where a=3, comprising: accepting, by a computational device, as variable input, a point in the Jacobian coordinates using a simple arithmetic processor; configuring the simple arithmetic processor of the computational device for modular subtraction, modular addition, modular multiplication by two and modular multiplication by three utilizing a logical one bit left shifter, a logical two bit left shifter, a logical three bit left shifter, and a multiplier by three constructed using a logical one bit left shifter and an adder, wherein an input of variables A and B to the simple arithmetic processor results in an output of either A+B, A8B, 2A, 3(AB), or 4AB; enabling, a modular multiplier of the computational device to execute a sequence of steps to perform the ECC point doubling operation of the point in the Jacobian coordinates, wherein the modular multiplier performs at most one modular multiplication for each step in the sequence of steps, wherein the sequence of steps requires no more than one temporary variable; and outputting, by the computational device, a result of the ECC point doubling operation in the Jacobian coordinates.
6. The method of claim 5, wherein the computational device is part of a mobile device.
7. The method of claim 6, wherein the mobile device is a smartphone.
8. The method of claim 5, wherein the computational device is part of a smartcard.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION
(8) PDBL algorithm 300 in accordance with the invention is shown in
(9) As input in step 301, PDBL algorithm 300 shown in
(10) The most computationally intensive operation in PDBL algorithm 300 in
(11) It is important to note that besides the modular multiplication steps performed in steps 303, 304, 305, 306, 307, 308, 309 and 310 of PDBL algorithm 300, additional, comparatively simple operations are performed as well: modular subtraction and addition and modular multiplication by powers of 2. Note that multiplication by a power of 2 in binary is merely a left shift operation. In order to speed up execution of PDBL algorithm 300 and eliminate the need for additional temporary registers, an embodiment in accordance with the invention of simple arithmetic unit (SAU) 400 with the inputs A, B and outputs C and D as shown in
(12)
(13)
(14) SAU 400 shown in
(15) Input A connects to adder 722 on line 671 and also connects to one bit left shifter 715, to input 0 of MUX 720 and to logical two bit left shifter 718 on line 671. Logical one bit shifter 715 outputs 2A on line 776 to input 0 of MUX 725. Logical two bit left shifter 718 outputs 4A on line 733 to input 1 of MUX 720. MUX 720 connects to the minuend input of subtractor 710 on line 731. Input B connects to adder 722 on line 672 and also connects to logical three bit left shifter 714 and input 0 of MUX 723 on line 672. Logical three bit left shifter 714 outputs 8B to input 1 of MUX 723 on line 744. MUX 723 connects to the subtrahend input of subtractor 710 on line 732. Adder 722 outputs C (=A+B) on line 690. Subtractor 710 connects to input 1 of MUX 725 on line 777 and connects to multiplier by three 728 on line 777. Multiplier by three 728 connects to input 2 on MUX 725. MUX 725 outputs D (see
(16) Multi-cycle multiplier 610 functions by multiplying the values on lines 635 and 640 together and outputting the result. Steps 301-302 are performed using the microprocessor (not shown) without using multi-cycle multiplier 610 and SAU 400.
(17) Step 303 utilizes multi-cycle multiplier 610. Register memory 695 provides Z.sub.1 on both inputs 635 and 640 of multi-cycle multiplier 610 and multi-cycle multiplier 610 computes Z.sub.1.sup.2 which is sent on line to register memory 695 where it is stored in Z.sub.3.
(18) Step 304 utilizes multi-cycle multiplier 610. Register memory 695 provides Y.sub.1 on both line 635 and on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes Y.sub.1*Y.sub.1 which is sent on line 650 to register memory 695 where it is stored in Y.sub.3.
(19) Step 305 utilizes multi-cycle multiplier 610. Register memory 695 provides X.sub.1 on line 635 and Y.sub.3 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes X.sub.1*Y.sub.3 which is sent on line 650 to register memory 695 where it is stored in temporary register .
(20) Step 306 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides X.sub.1 on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends X.sub.1 to input A of SAU 400 on line 671 directly to adder 722 and to input 0 of MUX 720 with MUX 720 set to 0. MUX 720 sends A (X.sub.1) to the minuend input of subtractor 710 on line 731. Register memory 695 provides Z.sub.3 on line 650 to input 0 of MUX 630 with MUX 630 set to 0. MUX 630 sends Z.sub.3 to input B of SAU 400 on line 672 directly to adder 722 and input 0 of MUX 723 with MUX 723 set to 0. MUX 723 sends B (Z.sub.3) to the subtrahend input of subtractor 710. Subtractor 710 computes AB (which is X.sub.1Z.sub.3) which is output online 777 to multiplier by three 728 which computes and outputs 3(AB) (which is 3(X.sub.1Z.sub.3)) on line 778 to input 2 of MUX 725. MUX 725 sends D (which is 3(AB)=3(X.sub.1Z.sub.3)) on line 696 to register memory 695 which passes D on line 635 to multi-cycle multiplier 610. Adder 722 computes A+B and outputs the result as C (which is (X.sub.1+Z.sub.3)) on line 690 to register memory 695 which passes C on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier computes C*D (which is 3(X.sub.1Z.sub.3)*(X.sub.1+Z.sub.3)) which is output on line 650 to register memory 695 where the result is stored in Z.sub.3.
(21) Step 307 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides Z.sub.3 on both lines 635 and 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes and outputs Z.sub.3*Z.sub.3 on line 650 to input 1 of MUX 620 with MUX 620 set to 1. MUX 620 sends Z.sub.3.sup.2 to input A of SAU 400 on line 671 which connects to input 0 on MUX 720 with MUX. MUX 720 sends A (Z.sub.3.sup.2) on line 731 to the minuend input of subtractor 710. Register memory 695 provides a on line 660 to input 0 of MUX 630 with MUX 630 set to 0. MUX 630 sends to input B of SAU 400 on line 672 which connects to logical three bit left shifter 714 (multiply by 8). Logical three bit left shifter 714 computes and outputs 8B (8) on line 744 to input 1 of MUX 723 with MUX 723 set to 1. MUX 723 sends 8B on line 732 to the subtrahend input of subtractor 710. Subtractor computes and outputs A8B (which is Z.sub.38) on line 777 to input 1 of MUX 725 with MUX 725 set to 1. MUX 725 sends D (which is A8B=Z.sub.38) on line 696 to register memory 695 where the result is stored in X.sub.3.
(22) Step 308 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides a on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends to input A of SAU 400 on line 671 which connects to logical two bit left shifter 718 (multiply by 4). Logical two bit left shifter 718 computes and outputs 4A (4) on line 733 to input 1 of MUX 720 with MUX 720 set to 1. MUX 720 sends 4A on line 731 to the minuend input of subtractor 710. Register memory 695 provides X.sub.3 on line 660 to input 0 of MUX 630 with MUX 630 set to 0. MUX 630 sends X.sub.3 to input B of SAU 400 on line 672 which is connected to input 0 of MUX 723 with MUX 723 set to 0. MUX 723 sends B (X.sub.3) on line 732 to the subtrahend input of subtractor 710. Subtractor 710 computes and outputs 4AB (which is 4X.sub.3) on line 777 to input 1 of MUX 725 with MUX 725 set to 1. MUX 725 outputs D (which is 4AB=4X.sub.3) on line 696 to register memory 695 which passes D onto line 635 and provides Z.sub.3 on line 640 to multi-cycle multiplier 610 which computes and outputs Z.sub.3*D (which is Z.sub.3*(4X.sub.3)) on line 650 to register memory 695 where the result is stored in temporary register .
(23) Step 309 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides a on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends to input A of SAU 400 on line 671 which connects to input 0 of MUX 720 with MUX 720 set to 0. MUX 720 sends A () on line 731 to the minuend input of subtractor 710. Register memory 695 provides Y.sub.3 on both line 635 and line 640 to multi-cycle multiplier 610 which computes and outputs Y.sub.3*Y.sub.3 on line 650 which connects to input 1 of MUX 630 with MUX 60 set to 1. MUX 630 outputs Y.sub.3.sup.2 to input B of SAU 400 on line 672 which connects to logical three bit left shifter 719 (multiply by 8). Logical three bit left shifter 719 computes and outputs 8B (8Y.sub.3.sup.2) on line 744 to input 1 of MUX 723 with MUX 723 set to 1. MUX 723 sends 8B to the subtrahend input of subtractor 710. Subtractor 710 computes and outputs A8B (which is 8Y.sub.3.sup.2) on line 777 to input 1 of MUX 725. MUX 725 sends D (which is A8B=8Y.sub.3.sup.2) on line 696 to register memory 695 where the result is stored in Y.sub.3.
(24) Step 310 utilizes both multi-cycle multiplier 610 and SAU 400. Register memory 695 provides Y.sub.1 on line 665 to input 0 of MUX 620 with MUX 620 set to 0. MUX 620 sends Y.sub.1 to input A of SAU 400 on line 671 which connects to logical one bit left shifter 715 (multiply by 2). Logical one bit left shifter 715 computes and outputs 2A (2Y.sub.1) on line 776 to input 0 of MUX 725 with MUX 725 set to 0. MUX 725 sends D (which is 2A=2Y.sub.1) on line 696 to register memory 695 which passes D onto line 635 and provides Z.sub.1 on line 640 to multi-cycle multiplier 610. Multi-cycle multiplier 610 computes and outputs D*Z.sub.1 (which is 2A*Z.sub.1=2Y.sub.1*Z.sub.1) on line 650 to register memory 695 where it is stored in Z.sub.3.
(25) Step 311 is performed using the microprocessor and returns the result of PDBL algorithm 300 which is (X.sub.3, Y.sub.3, Z.sub.3) for input (X.sub.1, Y.sub.1, Z.sub.1).