AUTOMATIC INSERTION OF MASKING INTO AN ALGORITHM
20180248682 ยท 2018-08-30
Assignee
Inventors
Cpc classification
H04L9/003
ELECTRICITY
H04L9/0631
ELECTRICITY
H04L9/002
ELECTRICITY
G06F2207/7238
PHYSICS
H04L2209/046
ELECTRICITY
International classification
H04L9/00
ELECTRICITY
Abstract
A computer implemented method, program product, and system implementing said method, for transforming a call graph representation of an algorithm into a secured call graph representation of said algorithm. The call graph comprises inputs (a, b, f), internal variables being the edges of the graph (c, d, e), elementary functions being the nodes of the graph, said functions being either linear or not linear, and outputs (g), the method comprising: a step of masking each input of the call graph, a step of replacing each unmasked internal variable of the call graph with a masked variable, a step of replacing at least each non-linear function of the call graph with an equivalent function that applies to masked variables, a step of unmasking each output of the call graph.
Claims
1. A computer implemented method for transforming a call graph representation of an algorithm into a secured call graph representation of said algorithm, the call graph comprising at least one input, at least one edge, at least one node and at least one output, the edges of the call graph representing internal variables of said algorithm, the nodes of the call graph representing linear or non-linear elementary functions of the algorithm, the computer implemented method comprising: masking each input of the call graph, replacing each unmasked internal variable of the call graph with a masked variable, replacing at least each non-linear function of the call graph with an equivalent function that applies to masked variables, unmasking each output of the call graph.
2. The computer implemented method of claim 1, wherein the call graph comprises parts processed iteratively, the replacing unmasked internal variables by masked internal variables comprising identifying internal variables used both as input and output of parts of the call graph processed iteratively, and using for these variables the same mask in input and output of said part of the call graph.
3. The computer implemented method of claim 1, wherein the call graph comprises parts processed iteratively, the replacing unmasked internal variables by masked internal variables comprising identifying internal variables used both as input and output of parts of the call graph processed iteratively, and inserting in a feedback edge of said iterative parts additional nodes for modifying the masks of said internal variables.
4. The computer implemented method of claim 1, wherein the masks of internal variables in parts of the call graph processed iteratively are changed at regular intervals, and the associated functions modified accordingly.
5. The computer implemented method of claim 4, further comprising inserting in the call graph additional nodes for refreshing the masks of the internal variables of said iterative parts.
6. The computer implemented method of claim 1, wherein the equivalent functions calculated in replacing at least each non-linear function of the call graph with an equivalent function that applies to masked variables are implemented using match tables.
7. The computer implemented method of claim 1, wherein the replacing at least each non-linear function of the call graph with an equivalent function that applies to masked variables further comprises replacing each linear function of the call graph by an equivalent function considering the masks of the input and output internal variables.
8. The computer implemented method of claim 1, wherein all the masks values are determined randomly.
9. The computer implemented method of claim 1, further comprising compiling said call graph to produce a protected executable code.
10. A computer program product, stored on a non-volatile computer-readable data-storage medium, comprising computer-executable instructions to cause a computer system to carry out the method according to claim 1.
11. A non-volatile computer-readable data-storage medium containing computer-executable instructions to cause a computer system to carry out the method according to claim 1.
12. A system comprising at least a processor and at least a memory, the memory storing computer-executable instructions to cause the system to carry out a computer implemented method for transforming a call graph representation of an algorithm into a secured call graph representation of said algorithm, the call graph comprising at least one input, at least one edge, at least one node and at least one output, the edges of the call graph representing internal variables of said algorithm, the nodes of the call graph representing linear or non-linear elementary functions of the algorithm, the system comprising a processing device configured to: mask each input of the call graph, replace each unmasked internal variable of the call graph with a masked variable, replace at least each non-linear function of the call graph by an equivalent function that applies to the masked variables, and unmask each output of the call graph.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The invention will be better understood and its various features and advantages will emerge from the following description of a number of exemplary embodiments provided for illustration purposes only and its appended figures in which:
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
[0050]
[0051]
[0052]
[0053] The examples disclosed in this specification are only illustrative of some embodiments of the invention. They do not in any manner limit the scope of the invention which is defined by the appended claims.
DETAILED DESCRIPTION OF THE INVENTION
[0054]
[0055] The unprotected call graph of
[0056] The unprotected call graph also comprises oriented links connecting the output of nodes to input of other nodes, referred to as the edges of the graph. These edges are associated with intermediate variables (c, d, e) that are transmitted from one function to the subsequent functions.
[0057] The unprotected call graph further comprises one or more inputs (a, b, f), and one or more outputs (g).
[0058] The unprotected call graph also comprises an iterative part 101, the functions P, Q, R and S being processed a plurality of times, the subsequent iterations taking as input the output of the previous iteration.
[0059] The call graph is a representation of the interdependency relations of the function during the execution of the program, and interactions required in order to generate the output variable (g) from the input variables (a, b, f). It can be described using various programming languages, as for instance Graphcet, UML (Unified Modeling Language) or HTML (HyperText Markup Language). It can be generated automatically, from the source code using a software like Doxygen or Eclipse, or from the compiled code (assembler language, LLVM-IR (Low Level Virtual MachineIntermediate Representation), VHDL (VHSIC Hardware Description Language), Verilog, or even generated manually.
[0060]
[0061] In order to provide robustness against cryptanalysis and content recovery attacks, the invention is based on masking each variable of the program with a mask and, according to some advantageous embodiments of the invention, on changing this mask during the program execution.
[0062] To this end, the input variables (a, b, f) are masked (111, 112, 113). The mask values (m.sub.1, m.sub.2, m.sub.6) can be chosen randomly. The masked variables are designated thereinafter as am.sub.1, bm.sub.2, and fm.sub.6.
[0063] Each unmasked internal variables of the call graph (c, d, e) is then replaced by a masked variable (cm.sub.3, dm.sub.4, em.sub.5). The mask may be chosen randomly, or, when the internal variable is the output of a linear function, inherited accordingly from the masks used for the inputs of the function.
[0064] In such embodiment, in order to guaranty the consistency of the protected call graph, when an internal variable is used both as input and output of an iterative part of the graph (as for g in
[0065] Therefore, the inputs of the function P are masked equally at each iteration.
[0066] After assigning a mask to each input and replacing each unmasked internal variable by a masked variable, the functions (P, Q, R, S) associated to the nodes of the call graph are modified (P, Q, R, S), in order to comply with the mask value.
[0067] When the function is linear and the masking is Boolean, the output masks may be inherited from the input masks. Thus, the function does not require to be modified. Otherwise, the function has to be replaced by an equivalent function that reaches the same result while taking as input masked variables, and masks the output of the function.
[0068] When the function is non-linear, it is generally not possible to make a link between the output mask and the input(s) mask(s). Unmasking the input data, processing the function, and masking the result cannot be considered, as unprotected information would appear, and could be considered as a leak of information. Thus, the function may be replaced by a match table that provides all the possible results of the function, the match table being constructed considering the input and output masks. This way, all the variables processed by the algorithm are protected, and no approximations of non-linear function(s) are to be made, in contrast to the prior art.
[0069] Finally, the variable that outputs the call graph (g) may be unmasked (114).
[0070] In the first embodiment, all the internal variables are accordingly protected and the result of the protected call graph of
[0071]
[0072] However, the consistency of the masking concerning the iterative part of the graph is ensured by inserting, in the feedback loop 101 of the iterative part, an additional node 120, which is configured to modify the mask value of the internal variables that are used both as input and output of the iterative part.
[0073] In
[0074] The advantage of the second embodiment compared to the first embodiment is that all masks may be chosen randomly.
[0075] Various methods may be used to mask data. The masking can be a simple first order Boolean masking, as for example summing the variable with a secret shared value, a higher order Boolean masking, or any other more elaborated masking technique. One of the advantages of the invention is that it is compatible with any masking technique.
[0076]
[0077]
[0078] In such embodiment, two possibilities may be implemented: [0079] Inserting in the call graph an additional node 212, the additional node corresponding to a conversion of the mask affecting the result of the linear function f(m.sub.1, m.sub.2) to the required mask m.sub.3. To that end, the function 212 may add the mask m.sub.3 to the result of the linear function before removing the mask f(m.sub.1, m.sub.2), or [0080] calculating a match table 221 that is equivalent to the linear function taking into account the values of the masks, the match table being stored in memory. This table comprises as inputs all the possible values of am.sub.1 and bm.sub.2, and associates each of these values with the corresponding value of the output cm.sub.3. The match table may be calculated by executing the function 211 and 212 over all possible inputs. This table may be stored enciphered, although it might not be necessary as the data contained in the table do not allow determining any information concerning the original unprotected data processed by the function.
[0081] The processing of the linear functions as represented in
[0082]
[0083] Contrary to the linear functions represented in
[0084] All possible input variables may be browsed to construct the table. Thus, the table may be proportional to the number of inputs, the number of outputs, and/or the data size. For instance, considering that inputs a and b are coded over 8 bits, the associated match table is a table that sizes 2{circle around ( )}8 (number of possibilities for a)*28 (number of possibilities for b)*8 bits (size of c).
[0085]
[0086] The functions P, Q, R and S are executed twice, the values of the internal variables during the second iteration (c, d, e and g) being different from the values of the same variables during the first iteration (c, d, e and g).
[0087]
[0088] By refreshing the values of the masks (that is to say by changing the values of the masks) at the end of each iteration, a higher level of protection is obtained, in particular against side channel attacks using the constant aspect of the masks applied.
[0089] In
[0090] For the subsequent iterations, the masks used for the internal variables (c, d, e and g) are modified, and the associated functions modified accordingly. The masks applied to variables used as inputs of the iteration may also be modified. In the example, nodes 111 and 113 applying the masks m.sub.1 and m.sub.6 to the inputs a and f are changed to nodes 401 and 403 applying new masks m.sub.8 and m.sub.12. The masked variables cm.sub.3, dm.sub.4 and em.sub.5 are changed to new masked variables cm.sub.9, dm.sub.10 and em.sub.11. Function P, taking as inputs variables masked by the masks m.sub.1 and m.sub.2, is modified into the equivalent function P, the function P taking as inputs the variables masked m.sub.8 and m.sub.7. If the function P is a linear function and the output of the function is masked by a mask inherited from the masks of the inputs, P may remain unmodified. Functions Q, R and S are modified into functions Q, R and S accordingly.
[0091] In an alternative embodiment (not represented), the masks of the inputs a and f may not be modified from one iteration to another.
[0092] The call graph can be represented for example in
[0093]
[0094]
[0095] Each round is an iterative process, meaning that the y.sub.0 byte, which is the output of one iteration of the round, loops back on x.sub.0, which is the input of the round.
[0096] In the call graph representation, the circles represent the operation performed on the different variables, while the edges of the graph represent the internal variable. In
[0097] The first function applied to x.sub.0 in the round is called the substitution box (known as S-box). This substitution is the main element of the algorithm and consists in a bijective non-linear operation performed on x.sub.0. The output of the first substitution box is the intermediate variable a in
[0098] Next function, applied to , is a set of three linear operations: times 1, times 2 and times 3, performed in a Galois field. Such operations are linear.
[0099] The results of these operations, called b, b and b, are mixed with the results of the corresponding operations performed on bytes 5, 10 and 15 (x.sub.5, x.sub.A, x.sub.F). Such operation is called Mixcolumn. The mix consists in a XOR operation performed on the four entries. This function is equivalent to three successive XOR operations. The output of the Mixcolumn operation is an intermediate variable c.
[0100] Next function applied to c is a step called Addroundkey, that consists in mixing c with the key (or a specific byte processed from the key) k.sub.0, via a XOR operation, to generate y.sub.0, that will be used as an input of the subsequent iteration.
[0101]
[0102] As x.sub.0 is an intermediate variable, it is masked by mask m.sub.00. This mask can be a XOR performed between x.sub.0 and a known, random value, but it can also be a multidimensional share, meaning that a plurality of mask layers are applied. In the latter case, m.sub.00 is not necessarily a byte (8 bits) share, but can be a share of any dimension. It can also be a pair, a triplet or any other association of masks having the same or different sizes.
[0103] After being processed in the S-box step, the intermediate variable is masked by mask m.sub.01. As the substitution box is a non-linear function, it must be replaced by an equivalent match table that all in once removes mask m.sub.00, perform the non-linear function, and mask the result with mask m.sub.01, as illustrated in
[0104] After performing the times 1, times 2 and times 3 operations, the intermediate variable b may be masked by mask m.sub.02. In some embodiments, the mask may be equal to m.sub.01, or the functions may be replaced by an equivalent match box, so that m.sub.02 can be chosen as totally independent of m.sub.01.
[0105] In another embodiment, an equivalent match table 501, performing the operations of the substitution box and the times 1, times 2 and times 3 operations, may be calculated. The match table may have one input (x.sub.0m0.sub.0) and three outputs (bm.sub.02, bm.sub.02, bm.sub.02). Alternatively three match tables may be calculated, each of them having one input and one output. The substitution box as regrouped with the time 1, times 2 and times 3 operators in a match table represents an operation called T-box (Table box) when non-encrypted.
[0106] In another embodiment, different masks may be assigned to each of the intermediate variable b, b and b.
[0107] The step of mixing the results of the calculations performed over the various bytes, is a linear function. As a consequence, the output mask m.sub.03 can be retrieved from the masks m.sub.01, m.sub.51, m.sub.A1, and m.sub.F1 of the inputs (m.sub.51, m.sub.A1, and m.sub.F1 being the mask respectively associated with the output of the T-box calculation for processing the variables x.sub.5, x.sub.A and x.sub.F). However, an equivalent match table may be calculated, which allows choosing an output mask m.sub.03 that is totally independent from the input masks.
[0108] In the next step, intermediate variable cm.sub.03 is mixed with key k.sub.0. As the key is not a variable but a constant, the key is not required to be masked. The result of the mixing is y.sub.0m.sub.04. As the mixing operation is linear, m.sub.04 is related to m.sub.03 or can be totally independent if the mixing function is replaced by an equivalent match table.
[0109] Finally, a refresh node 502 may be inserted. The first purpose of the refresh node is to ensure the consistency of the protected call graph by converting m.sub.04 into m.sub.00, as the variable y.sub.0/x.sub.0 is used as input/output of an iterative part of the call graph. In some embodiments, the refresh mask can be further associated with a step of changing the mask for at least some of the internal variables belonging to the iteration loop (being in that case masks m.sub.00, m.sub.01, m.sub.02, m.sub.03, and m.sub.04).
[0110] When the nodes of the linear functions are inherited from their parent nodes, it is possible to only refresh the mask(s) of the variables that are inputs of the loop (m.sub.00 in
[0111] It should be noted that the refresh node 512 is optional. Another way of guarantying the consistency of the protected call graph may consist for example in choosing m.sub.04 equal to m.sub.00.
[0112]
[0113] The method comprises: [0114] a step 601 of masking the inputs of the call graph, to produce masked inputs; [0115] a step 602 of replacing the unprotected variables of the graph, which are represented by the edges of the graph, by masked variables. The mask of the masked variables can be selected randomly, or can be the result of the use of a linear function over masked variables; [0116] a step 603 of replacing at least the non-linear functions of the call graph, which are represented by the nodes of the graph, by equivalent functions, so that it performs the same operations as the initial function, while taking into account the masks affecting the input/output of the function. This operation can also be performed on linear functions of the call graph. One possible implementation consists in replacing the function by a match table that is generated considering the masks affecting the inputs/outputs, and associating output values to each possible combination of input values; and [0117] a step 604 of unmasking the outputs of the call graph.
[0118] The method according to the invention may comprise an additional optional step 605 of modifying the iterative parts of the graph so that the masked inputs and variables are refreshed (meaning that the value of the mask is modified) at each iteration of the loop or at a slower rate, at regular intervals or randomly. Thus, variables iteratively calculated are never protected with the same mask. Although not limited to such applications, the method according to this embodiment has particular advantages when applied to cryptographic algorithms, which often comprise a large number of iterations performed over small calculations.
[0119] The methods described herein can be implemented by computer program instructions supplied to a processor of any type or any software programmable machine, as for instance a microprocessor, microcontroller, or DSP, to produce a machine that executes the instructions to implement the functions/acts specified herein. These computer program instructions may also be stored in a computer-readable medium that can direct a computer to function in a particular manner. To that end, the computer program instructions may be loaded onto a computer to cause the performance of a series of operational steps and thereby produce a computer implemented process such that the executed instructions provide processes for implementing the functions specified herein.
[0120] The method can be used standalone, in order to generate a protected representation of an algorithm from an unprotected one, but can also be paired with a compiler, thereby producing a protected compiled code that can be executed by a calculation machine, or a hardware code, in the form of a netlist generated by the compiler and implemented on a dedicated calculation machine, as for example a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC).
[0121] To this end, the method may comprise an additional step 606 of compiling the protected call graph to produce an executable code that is robust to cryptanalysis and content recovery attacks.
[0122]
[0123] More generally, the methods and devices described herein may be implemented by various means. For example, these techniques may be implemented in hardware, software, or a combination thereof.
[0124] The various embodiments of the invention provide several advantages including the following: [0125] they present a high level approach, non correlated to the implementation, [0126] they provide robustness to an algorithm against all kinds of side channel attacks as, remarkably, all the variables processed by the algorithm are masked while the semantic of the algorithm is preserved, [0127] they are applicable to any type of software program, to linear and non-linear functions, [0128] they are quite easy to implement and to compile, [0129] they do not present information leakages, as all the variables are masked, [0130] they can be programmed to be executed automatically, without involving a human operator.
[0131] While embodiments of the invention have been illustrated by a description of various examples, and while these embodiments have been described in considerable details, it is not the intent of the applicant to restrict or in any way limit the scope of the appended claims to such details. Additional advantages and modifications will readily appear to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific details, representative methods, and illustrative examples shown and described.