PROCESSOR, PROCESSOR OPERATION METHOD AND ELECTRONIC DEVICE COMPRISING SAME
20220283814 · 2022-09-08
Assignee
- ICTK HOLDINGS CO., LTD. (Gyeonggi-do, KR)
- IUCF-HYU(Industry-University Cooperation Foundation Hanyang University) (Seoul, KR)
Inventors
Cpc classification
G06F12/14
PHYSICS
G06F21/52
PHYSICS
G06F9/30145
PHYSICS
International classification
Abstract
Disclosed are a processor, a processor operation method and an electronic device comprising same. The disclosed processor operation method comprises the steps of: identifying an instruction for instructing the execution of a first operation and address information of an operand corresponding to the instruction; and executing the instruction on the basis of whether or not the address information of the operand satisfies a predetermined condition. In the step of executing the instruction, a second operation configured to the instruction is executed for the operand if the address information of the operand satisfies the predetermined condition, and the first operation is executed for the operand if the address information of the operand does not satisfy the predetermined condition.
Claims
1. A processor operation method comprising: identifying an instruction for instructing execution of a first operation and address information of an operand corresponding to the instruction; and executing the instruction based on whether or not the address information of the operand satisfies a predetermined condition, wherein the executing of the instruction comprises: executing a second operation set for the instruction on the operand if the address information of the operand satisfies the predetermined condition; and executing the first operation on the operand if the address information of the operand does not satisfy the predetermined condition.
2. The processor operation method of claim 1, wherein the predetermined condition corresponds to whether or not the address information of the operand falls within a preset address range.
3. The processor operation method of claim 1, wherein the first operation is an operation that is executed less than the second operation in the processor.
4. The processor operation method of claim 1, wherein an address range according to the predetermined condition is pre-registered with the processor before the instruction is executed.
5. The processor operation method of claim 1, wherein the second operation is an operation not included in an instruction set architecture (ISA) of the processor.
6. The processor operation method of claim 1, wherein the operand is loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand indicates an address in the dedicated buffer in which the operand is stored.
7. The processor operation method of claim 1, wherein if the address information of the operand satisfies the predetermined condition, the operand is stored in a data-buffer in the processor and the address information of the operand is stored in a configuration-buffer in the processor.
8. The processor operation method of claim 7, wherein the configuration-buffer is a memory mapped buffer.
9. The processor operation method of claim 1, wherein whether or not the address information of the operand satisfies the predetermined condition is expressed by flag information connected to a general-purpose register in the processor.
10. The processor operation method of claim 1, wherein if the address information of the operand satisfies the predetermined condition, a round counter and a round-key pointer used in an operation of the processor are stored in a data-buffer in the processor and address information of the round counter and address information of the round-key pointer are stored in a configuration-buffer in the processor.
11. A processor comprising: a data-buffer configured to store an operand; a configuration-buffer configured to store address information of the operand; and a processing unit configured to identify an instruction for instructing execution of a first operation and address information of the operand corresponding to the instruction, and execute the instruction based on whether or not the address information of the operand satisfies a predetermined condition, wherein the processing unit is configured to: execute a second operation set for the instruction on the operand if the address information of the operand satisfies the predetermined condition; and execute the first operation on the operand if the address information of the operand does not satisfy the predetermined condition.
12. The processor of claim 11, wherein the predetermined condition corresponds to whether or not the address information of the operand falls within a preset address range.
13. The processor of claim 11, wherein the first operation is an operation that is executed less than the second operation in the processor.
14. The processor of claim 11, wherein an address range according to the predetermined condition is pre-registered with the processor before the instruction is executed.
15. The processor of claim 11, wherein the second operation is an operation not included in an instruction set architecture (ISA) of the processor.
16. The processor of claim 11, wherein the operand is loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand indicates an address in the dedicated buffer in which the operand is stored.
17. The processor of claim 11, wherein if the address information of the operand satisfies the predetermined condition, the operand is stored in a data-buffer in the processor and the address information of the operand is stored in a configuration-buffer in the processor.
18. The processor of claim 11, further comprising: a dedicated operator configured to perform the second operation.
19. An electronic device comprising: a memory configured to store an instruction and an operand corresponding to the instruction; and a processor configured to execute the instruction, wherein the processor comprises: a buffer configured to store the operand and address information of the operand received from the memory to execute the instruction; and a processing unit configured to identify the instruction for instructing execution of a first operation and the address information of the operand corresponding to the instruction, and execute the instruction based on whether or not the address information of the operand satisfies a predetermined condition, and the processing unit is configured to: execute a second operation set for the instruction on the operand if the address information of the operand satisfies the predetermined condition; and execute the first operation on the operand if the address information of the operand does not satisfy the predetermined condition.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
BEST MODE FOR CARRYING OUT THE INVENTION
[0040] Hereinafter, example embodiments will be described in detail with reference to the accompanying drawings. However, since various changes may be made to the example embodiments, the scope of the patent application is not limited or confined by these example embodiments. It should be understood that all changes, equivalents, and substitutes for the example embodiments are included in the scope of rights.
[0041] The terms used in the example embodiments are used merely for the purpose of description, and should not be construed as limiting. A singular expression includes a plural expression unless the context clearly dictates otherwise. In this specification, it should be understood that terms such as “comprise”, “include”, or “have” are intended to designate that a feature, number, step, operation, component, part, or a combination thereof described in the specification exists, but do not preclude the possibility of existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
[0042] Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person skilled in the art to which the example embodiment pertains. Terms such as those defined in dictionaries commonly used should be construed as having meanings consistent with the meanings in the context of the related art, and are not construed in ideal or excessively formal meanings unless explicitly defined in the present application.
[0043] In addition, in the description with reference to the accompanying drawings, the same elements are given the same reference numeral regardless of numerals in the drawings, and overlapping description thereof will be omitted. In describing the example embodiments, in the case that it is determined that detailed descriptions of a related known art may unnecessarily obscure the gist of the example embodiments, the detailed descriptions thereof will be omitted.
[0044] Further, in describing elements of the example embodiments, terms such as first, second, A, B, (a), (b), and the like may be used. These terms are merely for distinguishing the elements from other elements, and the essence, order, or order of the elements are not limited by the terms. In the case that it is described that an element is “coupled”, “linked”, or “connected” to another element, it should be understood that the element may be directly coupled or connected to the other element, but another element may be “coupled”, “linked”, or “connected” between the elements.
[0045] An element included in one example embodiment and an element having a common function will be described using the same name in other example embodiments. Unless otherwise stated, explanations described in one example embodiment may be applied to other example embodiments, and detailed descriptions within the overlapping range will be omitted.
[0046]
[0047] Referring to
[0048] The memory 110 stores an instruction to be executed by the processor 120 and an operand corresponding to the instruction. For example, the memory 110 may be a volatile memory or a non-volatile memory.
[0049] The processor 120 is a device that executes instructions or programs, or controls the electronic device 100, and may be, for example, a central processing unit (CPU), a graphic processing unit (GPU), or a neural processing unit (NPU). The processor 120 may read the instruction and/or operand stored in the memory 110 and store it in an internal buffer, and may quickly perform an operation according to the instruction based on data stored in the internal buffer. As such, the instructions executable by the processor 120 are defined by an instruction set architecture (ISA), and in various cases, the processor 120 may execute a new instruction not defined in the ISA. If new instructions are added to the ISA for this purpose, ISA extensions and compiler changes may be required. To prevent this, instead of adding the new instructions to the ISA, only the execution of existing instructions may be extended. In other words, the processor 120 may execute different operations for the same instructions according to a data address of the operand. This new computer architecture concept may be referred to herein as instruction overloading. Hereinafter, detailed descriptions will be provided with reference to the drawings.
[0050]
[0051] Referring to
[0052] In an example embodiment, among the operations performed by the processor 200, there may be a frequently used operation and an infrequent operation. For example, when block ciphers are performed in the processor 200, some operations such as multiplication (*) and division (/) are rarely used, while counterparts for finite fields, such as convolution and multiplicative inversion, are frequently used but may not be defined in the ISA corresponding to the processor 200. In this case, a new operation may be supported by the processor 200 without changing the ISA of the existing processor 200 by interpreting an instruction for instructing execution of the operation that is not frequently used, such as multiplication (*) or division (/), depending on the situation. Such instruction overloading may be distinguished from operator overloading in which an operator is interpreted as a different operation according to the data type of the operand in object-oriented languages.
[0053] The processor 200 may execute different operations for the same instruction according to address information of the operand. The address of such a variable may be registered in advance with the processor 200 and a variable to be processed differently may be designated. In other words, the execution of the instruction in the processor 200 may be different depending on whether or not the address of the operand is registered.
[0054] In one example embodiment, three different approaches for RISC-V core extension using instruction overloading may be used.
[0055] The first approach may be to extend the core to support a specific block cipher. For example, an advanced encryption standard (AES) extension may be performed. To this end, two types of instructions may be overloaded: a load/store instruction and an arithmetic instruction. These instructions may provide the following functions for the registered variables. [0056] Execution of a series of routine operations by a single overloaded load instruction [0057] Management of variables within the CPU core [0058] Cryptographic operation and transformation
[0059] First, each routine operation on some variable, such as a round counter and a round-key pointer, may be treated as the single overloaded load instruction, which may reduce instruction count.
[0060] Second, the registered variable may be managed in a dedicated buffer (for example, the configuration-buffer 213 and the data-buffer 215) within the processor 200, and the registered variable may be loaded and stored as the overloaded load and store instruction. Through this, an access speed to a block cipher variable may be improved.
[0061] Third, an overloaded arithmetic instruction may support the cryptographic operations and transformations, which may not be easily processed only with instructions generally provided by the processor 200. The transformation may usually be handled using large pre-computed tables in legacy software. However, an overloaded arithmetic operation may have faster execution time than accessing tables stored in a memory, and the transformation may be expressed more simply with the overloaded arithmetic instruction. As a result, compared to a legacy AES software code, a new AES code with instruction overloading may achieve faster execution speed and a smaller memory footprint. This AES extension may also be applied to XOR-based (ARX-based) block cipher addition/AND such as SIMON. SIMON, a lightweight block cipher, has almost the same memory footprint, but instruction overloading may be effective in reducing the execution time of SIMON code.
[0062] The second approach to the RISC-V core extension may be to provide general cryptographic operations and support block ciphers that perform operations through GF(2.sup.8) such as convolution and multiplicative inversion. This may mean that, if necessary, various block ciphers including ciphers developed after distribution of the processor 200 may be effectively supported with less overhead compared to the first method.
[0063] In the third approach, hardware masking may additionally be applied to the second approach. Here, masking is one of widely used measures against power analysis attacks. This will be described in detail with reference to
[0064]
[0065] Referring to
[0066] Based on RISC-V RV32IM ISA, a custom CPU core may be designed that processes 32-bit integer-based instructions for integer multiplication and division (M extension). Specifically, a microarchitecture of a base core supports single-issue, out-of-order execution, and AXI4-Lite interface, and the M extension may include a 32-bit×8-bit multiplier and a 32-bit x 1-bit divider.
[0067] The gray areas between stages in a 5-stage base core shown in
[0068]
[0069] Referring to
[0070] Arithmetic Instruction Overloading
[0071] Arithmetic operations such as integer multiplication (*) and integer division (/) may be rarely used in block ciphers. Therefore, instructions for these operations may be assigned to cryptographic transformations and operations such as SubByte and convolution of advanced encryption standard (AES). Also, shift instructions (for example, << and >>) may be assigned to rotations, which may be used more frequently than shift operations in block ciphers. Therefore, by overloading the instructions corresponding to the operators *, /, <<, and >>, it is possible to provide expansion operations for the registered variables.
[0072] For variable registration, the processor may include the Config-Buffers including TextAddr, TextNum, and srcFlags shown in
[0073] srcFlags may be linked to a0-a7 (in other words, x10-x17) in the GPR 410 for storing function arguments or internal variables in the RISC-V core. In the case that an address of loaded data falls within address ranges of TextAddr and TextNum, each flag may be set to 1. In other words, if the data address in which the operand is stored falls within the range [TextAddr, TextAddr+TextNum], when the operand is loaded into the GPR 410, the flag(s) corresponding to one or more of a0-a7 in which the data is stored may be set to 1 in srcFlags. For example, srcFlags may have 1-bit×8.
[0074]
[0075] Referring to
[0076] First, the shift instruction may be interpreted as the rotation. Thus, a word may be rotated using the symbol <<.
[0077] Then, the multiplication instruction may be interpreted as a parallel convolution over 4-bytes in a selected word. A corresponding C expression is:
w*0xA.sub.1A.sub.01B.sub.1B.sub.0000 [Formula 1]
[0078] Here, w may be a word from a registered variable, 0xA.sub.1A.sub.0 may be a polynomial to multiply by each byte in w, and 0x1B.sub.1B.sub.0 may be a reduction polynomial for GF(2.sup.8). In other words, if the 4 bytes in w are expressed as b.sub.0, b.sub.1, w.sub.2, and b.sub.3, Formula 1 may mean
b.sub.i←b.sub.i×0xA.sub.1A.sub.0 mod 0x1B.sub.1B.sub.0(i=0, . . . ,3).
[0079] Finally, the overloaded instruction may be division. If the division instruction is used, SubBytes or Inverse SubBytes may be simultaneously performed on the 4-bytes of the selected word. A corresponding C expression is:
±1/w [Formula 2]
[0080] Here, a negative sign may indicate inverse SubBytes for decoding.
[0081] Returning to
[0082] Load/Store Instruction Overloading
[0083] As described above, block ciphers may generally use the round counter, the round-key pointer, and the intermediate value variables. Operations on these variables may also be generic. Load/store instruction overloading on these variables may improve performance in two ways.
[0084] First, by using a dedicated buffer, the values of common block cipher variables may be accessed faster. If data of a running program is in memory, memory access may be very slow. In the case that a base RISC-V core has no memory latency, one or two clock cycles may be required for one load instruction. However, depending on the system configuration, the number of cycles may increase to several tens of cycles. By adding a dedicated buffer for block cipher variables, this issue may be solved. The buffer may be used as a means for faster access with the help of the proposed instruction overloading. Here, the dedicated buffer may be referred to as a data-buffer.
[0085] As described above, the address of the intermediate value variable is registered using TextAddr and TextNum, and this value may be used to set srcFlags for the overloaded arithmetic instructions. These two Config-Buffers (i.e. TextAddr and TextNum) may also be used for the overloaded load and store instructions to load the intermediate value variables from Text0-Text7, which are data-buffers for the intermediate value variables or store the intermediate value variables in Text0-Text7. Other block cipher variables, such as the round counter and the round-key pointer, may reside in the data-buffers such as Round and KeyPointer, respectively, and their addresses may be registered in additional Config-Buffers such as RndAddr and KeyAddr, respectively. The data-buffers may provide fast data access independent of memory latency. In other words, accessing each data-buffer may always require only one clock cycle. Other methods using cache and register keywords may also provide faster data access, but these methods may not guarantee fast and consistent data access because not all desired variables always reside in the GPR or cache.
[0086] In one example embodiment, RndAddr may store an address in which the round counter variable is stored, and KeyAddr may store an address in which the round-key pointer is stored. For example, each of RndAddr and KeyAddr may have 4-bits. Also, Text0-Text7 may store the intermediate value variables (for example, operands, etc.), and may each have, for example, 32-bits. Round may store the round counter, KeyPointer may store the round-key pointer, and KeyConfig may store an increment of KeyPointer (for example, 0, 1, −1, etc.). For example, Round may have 8-bit, KeyPointer may have 32-bit, and KeyConFigure may have 2-bit.
[0087] Second, routine operations such as increment, decrement, and comparison may be executed automatically whenever data in the data-buffer is loaded. Specifically, when the round counter is read, its value is automatically decremented, and a result of comparing the updated round counter with 0 may be returned. When the round-key pointer is read, a specified value is returned instead of the pointer value, so that the number of load instructions may be reduced by one. At the same time, the pointer may be incremented and decremented for encryption and decryption, respectively. A flag indicating increment or decrement of the round-key pointer may be stored in an additional buffer, KeyConfig. When the overloaded load instruction is executed, the data-buffers such as Round and KeyPointer may be used for the routine operations as well as fast data access.
[0088]
[0089] AES Code Using Instruction Overloading
[0090] In AES, a 16-byte intermediate value including an input plain text may be treated as a (4×4)-byte array called state. Assume that b.sub.i is the ith byte of each intermediate value. Then the state may correspond to b.sub.i (0≤i≤16).
[0091] Each round of AES may include transformations such as AddRoundKey, SubBytes, ShiftRows, and MixColumns. AddRoundKey is a simple XOR with a round key, the remaining three transformations are:
[0092] SubBytes (SBs): SubByte(b.sub.i) may perform nonlinear substitutions for each byte b.sub.i. This nonlinear operation is generally defined as multiplicative inversion, and may be followed by matrix multiplication with a predefined vector and affine transformation using XOR (⊕). However, matrix multiplication may be expressed as a convolution using the reduced polynomial 0x101. Consequently, b.sub.i′=SubByte(b.sub.i) may be as follows.
b.sub.i′←((b.sub.i.sup.−1 mod 0x11B)*0x1F mod 0x101)⊕0x63 [Formula 3]
[0093] ShiftRows (SRs): ShiftRow(r.sub.j) may rotate each row r.sub.j left by j bytes.
[0094] MixColumns (MCs): MixColumn(cj) may mix c.sub.j columns. Each byte of MixColumn(c.sub.j) may be defined as follows.
y.sub.i(x.sub.i*0x02)⊕(x.sub.i+1 mod 4*0x03)⊕x.sub.i+2 mod 4⊕x.sub.i+3 mod 4(0≤i<4), [Formula 4]
[0095] Here, x.sub.i and y.sub.i (0≤i<4) are the i-th byte of the input and output columns, respectively, and * may represent convolution using the polynomial 0x11B.
[0096] In summary, AES transforms may require GF(2.sup.8), XOR, and convolutional and multiplicative inversions for rotation. Of these operations, only XOR is supported by a single instruction on a typical processor, while other operations may require multiple instructions. In particular, convolution and multiplicative inversion may be complex to compute, and may be byte-wise operations rather than word-wise operations. By applying instruction overloading to the AES code, these issues may be effectively overcome.
[0097] Referring to
[0098]
[0099] Referring to
[0100] The AES encryption code using the macros described above may be as follows.
TABLE-US-00001 C expression 1: u32 rnd, *rk, r0, r1, r2, r3, rr0, rr1, rr2, rr3; 2: SetTexts(r0, 8); 3: SetRound(rnd, key−>rounds); 4: SetKeyPointer(rk, key−>rd_key, 1); // 1: Encryption, −1: Decryption // round 0 5: r0 = BytesToRow(in); r0 {circumflex over ( )}= LoadKey(rk, 1); 6: r1 = BytesToRow(in + 1); r1 {circumflex over ( )}= LoadKey(rk, 1); 7: r1 = BytesToRow(in + 2); r2 {circumflex over ( )}= LoadKey(rk, 1); 8: r3 = BytesToRow(in + 3); r3 {circumflex over ( )}= LoadKey(rk, 1); 9: while(1){ // round 1, ..., round Nr 10: rr0 = RowCalc_SB(r0); // SBs on 1st row 11: rr1 = RowCalc_SB_SR(r1, 8); // SBs & SRs on 2nd row 12: rr2 = RowCalc_SB_SR(r2,16); // SBs & SRs on 3rd row 13: rr3 = RowCalc_SB_SR(r3,24); // SBs & SRs on 4th row 14: if(LoadRound(rnd)) break; // MixColunm on i-th row & addRoundKey with rk[i] 15: r0 = RowCalc_MC(rr0, rr1, rr2, rr3) {circumflex over ( )} LoadKey(rk, 1); 16: r1 = RowCalc_MC(rr1, rr2, rr3, rr0) {circumflex over ( )} LoadKey(rk, 1); 17: r2 = RowCalc_MC(rr2, rr3, rr0, rr1) {circumflex over ( )} LoadKey(rk, 1); 18: r3 = RowCalc_MC(rr3, rr0, rr1, rr2) {circumflex over ( )} LoadKey(rk, 1); 19: } // round Nr's addRoundKey 20: rr0 {circumflex over ( )}= LoadKey(rk, 1); RowToBytes(out, rr0); 21: rr1 {circumflex over ( )}= LoadKey(rk, 1); RowToBytes(out+1, rr1); 22: rr2 {circumflex over ( )}= LoadKey(rk, 1); RowToBytes(out+2, rr2); 23: rr3 {circumflex over ( )}= LoadKey(rk, 1); RowToBytes(out+3, rr3); 24: ReleaseVars( );
[0101] This code has three advantages. First, no precomputed table may be needed, while maintaining encryption speed. In lines 10 to 18 of the above code, column-wise macros such as ColumnCalc and ColumnCalcLast may be changed to row-wise macros such as RowCalcSB, RowCalcSBSR, and RowCalcMC.
[0102] These macros may be defined as ROL, MLT, and INV as shown in
[0103]
[0104] Referring to
[0105] Second, the C expression related to the round counter and round-key pointer may be simplified. In lines 5-8, 15-18, and 20-23 of the above AES encryption code, the macro LoadKey may indicate only a load pointer rk in the C expression. However, the processor may actually load the round key pointed to by rk and increment or decrement rk as shown in
[0106] Third, access to block cipher variables may be faster due to the data-buffer. Although this does not appear in the C expression, block cipher variables such as rnd, rk, r0-r3, and rr0-rr3 may reside in the data buffers instead of in a memory. This may provide fast and constant access speed to the block cipher variables regardless of memory configuration.
[0107]
[0108] Referring to
[0109] To use the overloaded instruction, the address of the block cipher variable should be registered in the configuration-buffers in advance. The configuration-buffers may be memory mapped buffers. Thus, the configuration-buffers may be accessed with predefined pointers such as TEXTADDR, TEXTNUM, RNDADDR, KEYADDR, and KEYCONFIG. The pointers may be defined as a volatile pointer type. For example, TEXTADDR may be defined as
[0110] #define TEXTADDR (volatile u32*) 0x10000000.
[0111] This method may generally be used to control peripherals of a device driver or to configure a processor. For example, it is possible to access control and state registration of the RISC-V core and a system control block of an ARM core to read state and change configuration. A similar approach may be used. Macros using the five pointers mentioned above to register and release variables may be shown in
[0112] Instruction Overloading for Multiple Block Ciphers
[0113] Some modern block ciphers, such as SM4, SEED, and ARIA, may be designed using operations above GF(2.sup.8). Main common operations of these ciphers may be convolution and multiplicative inversion. However, a reduced polynomial may be different for each cipher. Therefore, in order to support various block ciphers, multiplicative inversion having an arbitrary reduction polynomial is required. This feature may be particularly useful for flexible S-box implementations. As such, instruction overload may be applied to support a wider range of block ciphers defined in GF(2.sup.8).
[0114] The extension of the division (/) instruction may be redefined so that multiplicative inversion may be performed when the srcFlag bit of the operand is 1.
[0115] The corresponding C expression for multiplicative inversion is as follows.
0x1A.sub.1A.sub.0/w [Formula 5]
[0116] Here, 0x1A.sub.1A.sub.0 may be a reduced polynomial for GF(2.sup.8). In other words, if 4 bytes in w are expressed as b.sub.0, b.sub.1, b.sub.2, and b.sub.3, Formula 5 may mean b.sub.i←b.sub.i.sup.−1 mod 0x1A.sub.1A.sub.0 (i=0, . . . , 3) for nonzero b.sub.i. zero b.sub.i may be mapped to itself. Hardware logic gates for these operations may require three clock cycles for the reduced polynomial for binary inversion and calculation.
[0117] In the case of the redefined division instruction, the definition of the macro INV for debugging of
[0118] Now an example of using overloaded instructions in SM4 may be presented. SubBytes of SM4 may be defined as follows.
b.sub.i′←(((b.sub.i*0xCB)⊕0xD3).sup.−1*0xCB)⊕0xD3 [Formula 6]
[0119] Here, 0x1F5 and 0x101 may be reduced polynomials for multiplicative inversion and convolution, respectively. Therefore, SubBytes for 4 bytes of the word selected in SM4 may be defined as ((((0x1F5/((r*A){circumflex over ( )}C))*A){circumflex over ( )}C). Here, A may be 0xCB101000 and C may be 0xD3D3D3D3. The inverse SubBytes of AES and SM4 may be similarly defined using multiplicative inversion. In other words, an overloaded division instruction may require the same multiplicative inversion logic circuit for encryption and decryption.
[0120]
[0121] Instruction Overloading for Masking
[0122] The power analysis attack may be a technique for recovering a secret key by statistically analyzing a power consumption trace collected during a key-related cryptographic operation. The masking may be a measure to protect block ciphers from such attacks. This may be a secret sharing technique in which sensitive intermediate variables used in cryptographic operations are divided into shares using randomizers called masks.
[0123] In order to add hardware masking capability to the processor, the following four factors may be considered. Random number generator; buffers for mask values;
[0124] extension of load and store units; extensions of ALUs, multipliers and dividers for masked operations.
[0125] First, the mask values may have to be set randomly to be resistant to the power analysis attack. A true random number generator (TRNG) is used, and a generated random number sequence may pass National Institute of Standards and Technology (NIST) random test suites. A 32-bit random number generated by four pairs of TRNGs and an 8-bit linear feedback shift register (LFSR) may be used as an initial mask value when a normal word is first stored in Text0-Text7 and when the current mask value is updated with a new value during the masked operation.
[0126] Second, in order to maintain the mask values, buffers called Mask-Buffers including TextMask0-TextMask7 and GprMask0-GprMask7 connected to Text0-Text7 and a0-a7, respectively, in GPR may be added. The mask values are stored in MaskBuffers, and by using these values, values of Text0-Text7 and a0-a7 may be XOR-masked.
[0127] Third, load and store instructions may be redefined to support transmission of the mask values. In other words, when transmitting a masked value between Text0-Text7 and a0-a7, the overloaded load and store instructions may transmit the mask between TextMask0-TextMask7 and GprMask0-GprMask7.
[0128] Finally, to support masked operations for AES, SM4, and SIMON, five operation executions may be modified: multiplicative inversion (/), AND (&), convolution (*), rotation (<<), and XOR ({circumflex over ( )}).
[0129] The first modified instruction may be an overloaded division instruction, i.e. multiplicative inversion. A masking method is shown in
[0130] The second instruction to change may be AND. It is assumed that c=a & b is calculated. Here, & may represent a bitwise AND operation. Given that the two operands of the AND instruction are XOR-masked A=a⊕M.sub.a and B=b⊕M.sub.b′, then C=(A&B)⊕(M.sub.a&B)⊕(M.sub.b&A) may be equal to c⊕M where Mc=M.sub.a & M.sub.b. Masked AND is SIMON-only, but power analysis prevention SPECK may also be supported in the case that masking is applied to the addition. Finally, applying masking does not require changing the operations of ALUs and multipliers for XOR, convolution, and rotation. However, it may be necessary to apply the same operation to the mask value as the masked value. For example, if the masked value of GPR is rotated by 8 bits, the corresponding mask value in the Mask-Buffer may also need to be rotated by 8 bits.
[0131]
[0132] In operation 1110, the processor identifies an instruction for instructing execution of a first operation and address information of an operand corresponding to the instruction. For example, the operand may be loaded from a memory connected to the processor and stored in a dedicated buffer in the processor, and the address information of the operand may indicate an address in the dedicated buffer in which the operand is stored. If the address information of the operand satisfies a predetermined condition, the operand may be stored in the data-buffer in the processor, and the address information of the operand may be stored in the configuration-buffer in the processor. The configuration-buffer may be a memory mapped buffer.
[0133] In operation 1120, the processor executes the instruction based on whether the address information of the operand satisfies a predetermined condition. The processor executes a second operation set for the instruction on the operand if the address information of the operand satisfies the predetermined condition, and executes the first operation on the operand if the address information of the operand does not satisfy the predetermined condition.
[0134] For example, the predetermined condition may correspond to whether the address information of the operand falls within a preset address range. Here, the first operation may be an operation that is executed less than the second operation in the processor. The address range according to the predetermined condition may be pre-registered with the processor before the instruction is executed. The second operation may be an operation not included in the ISA of the processor.
[0135] Whether the address information of the operand satisfies the predetermined condition may be expressed by flag information connected to a general-purpose register in the processor. If the address information of the operand satisfies the predetermined condition, the round counter and the round-key pointer used in the operation of the processor may be stored in the data-buffer in the processor, and the address information of the round counter and the address information of the round-key pointer may be stored in the configuration-buffer in the processor.
[0136] The above descriptions with reference to
[0137] The methods according to example embodiments may be embodied in the form of program instructions that may be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
[0138] The program instructions recorded on the medium may be specially designed and configured for the example embodiments, or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium may include hardware devices specially configured to store and execute program instructions, such as magnetic media such as hard disks, floppy disks, and magnetic tapes, optical media such as CD-ROMs and DVDs, magneto-optical media such as floppy disks, ROM, RAM, and flash memory. Examples of program instructions include not only machine language codes such as those generated by a compiler, but also high-level language codes that may be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules to perform the operations of the example embodiments, and vice versa.
[0139] Software may comprise a computer program, code, instructions, or a combination of one or more thereof, which may configure a processing device to operate as desired or independently or collectively instruct the processing device.
[0140] Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave to be interpreted by or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.
[0141] As described above, although the example embodiments have been described with reference to the limited drawings, those skilled in the art may apply various technical modifications and variations based on the above. For example, even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, apparatus, circuit, and the like are combined or connected in a form different from the described method, or replaced or substituted by other components or equivalents, appropriate results may be achieved.
[0142] Therefore, other implementations, other example embodiments, and equivalents to the claims are also within the scope of the following claims.