Execution flow protection in microcontrollers
10223117 ยท 2019-03-05
Assignee
Inventors
Cpc classification
G06F9/3806
PHYSICS
G06F21/52
PHYSICS
International classification
G06F9/30
PHYSICS
G06F9/38
PHYSICS
G06F9/32
PHYSICS
Abstract
An execution flow protection module (30) for a microcontroller (10) with a memory (24) and a microprocessor (20) is described. The module (30) is configured to monitor the memory (24) access of the microcontroller (10) to identify instructions fetched by the microcontroller (10) from the memory (24) for execution by the microprocessor (20). The module (30) comprises an instruction decoder unit (32) for determining a program counter value associated with the execution flow of the instructions fetched by the microcontroller (10); a program counter predictor unit (34) for predicting the program counter value of the next fetched instruction; and an interrupt module (40) for responding if the next instruction fetched by the microcontroller does not match the predicted program counter value.
Claims
1. A method for ensuring integrity of an execution flow in a microcontroller with a memory and a microprocessor, said method comprising: determining, in an execution flow protection module in parallel to the microcontroller, instructions fetched by the microprocessor from the memory; analyzing, in the execution flow protection module, an expected execution flow of the fetched instructions; obtaining, in the execution flow protection module, a value associated with a current instruction; modifying, in the execution flow protection module, the value according to the expected execution flow; obtaining, in the execution flow protection module, an updated value associated with a next instruction; comparing, in the execution flow protection module, the modified value to the updated value; and generating, in the execution flow protection module, a response when the modified value does not match the updated value, wherein the comparison of the modified to the updated value is performed in hardware at a different location from a main microprocessor and the microcontroller.
2. An execution flow protection module for a microcontroller with a memory and a microprocessor, said execution flow protection module configured to monitor memory access of the microcontroller in parallel to the microcontroller to identify instructions fetched by the microcontroller from the memory for execution by the microprocessor, said execution flow protection module comprising: an instruction decoder unit configured to determine a program counter value associated with execution flow of the instructions fetched by the microcontroller; a program counter predictor unit configured to predict a next program counter value of a next instruction to be fetched; and an interrupt module configured to respond when the next instruction fetched by the microcontroller does not match the predicted next program counter value, wherein the comparison of the determined program counter value and the predicted next program counter value is performed in hardware at a different location from a main microprocessor and the microcontroller.
3. The execution flow protection module of claim 2, further comprising: at least one last-in-first-out structure configured to handle subfunction call next instructions, wherein the program counter value is stored in the last-in-first-out structure during subfunction calls.
4. The execution flow protection module of claim 3, wherein during a subfunction call, the program counter value is a return address value of the next instruction.
5. The execution flow protection module of claim 2, further comprising: one or more registers configured to receive a hint of the execution flow of the next instruction fetched.
6. The execution flow protection module of claim 5, wherein the program counter predictor unit is configured to predict the program counter value based on the hint.
7. The execution flow protection module of claim 5, wherein the hint provides an instruction to the one or more registers.
8. The execution flow protection module of claim 7, wherein the instruction to the one or more registers is representative of the execution flow of the next fetched instruction.
9. The execution flow protection module of claim 7, wherein the program counter predictor unit is configured to predict the program counter value based on the instruction provided to the register.
10. The execution flow protection module of claim 2, wherein the instruction decoder unit is configured to determine the program counter value of the next fetched instruction.
11. The execution flow protection module of claim 2, wherein the execution flow is a linear flow.
12. The execution flow protection module of claim 11, wherein the predicted program counter value is the address of the next fetched instruction.
13. The execution flow protection module of claim 2, further comprising: at least one additional last-in-first-out structure for handling different modes or contexts of the microcontroller configured to provide and receive program counter values when a function call or return is detected by the instruction decoder.
14. The execution flow protection module of claim 2, wherein the execution flow is a conditional branch.
15. The execution flow protection module of claim 2, wherein one or more registers are configured to receive two hints of the execution flow of the next instruction fetched.
16. The execution flow protection module of claim 15, wherein the first hint can be used to set a canary value and a second hint can enable performance of a comparison.
17. An integrated circuit comprising: the microprocessor; the memory; and the execution flow protection module according to claim 2.
18. A secure device comprising the microcontroller according to claim 17, the secure device being a smart card.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The disclosure will now be described with reference to the following figures in which like reference numerals are used to denote like elements:
(2)
(3)
(4) It should be noted that the Figures are diagrammatic and not drawn to scale. Relative dimensions and proportions of parts of these Figures have been shown exaggerated or reduced in size, for the sake of clarity and convenience in the drawings. The same reference signs are generally used to refer to corresponding or similar feature in modified and different embodiments.
DETAILED DESCRIPTION
(5)
(6) Within the microcontroller, in communication with the memory bus 24, is an execution flow protection module 30. The module 30 comprises a number of units including an instruction decoder 32, program counter predictor 34, process stack pointer 36, main stack pointer 38 and interrupt module 40.
(7) The instruction decoder 32 is configured to read or parse at least partly the instructions provided to the microprocessor 20 and the program counter value (such as return address locations) of functions requested or fetched by and to the microprocessor 20 via the memory bus 24. The decoder 32 is able to recognize linear execution or branching execution from the instructions.
(8) Based on the instructions determined by the instruction decoder 32, the program counter predictor 34 can predict the next program counter value (or, in the case of branching execution instructions, the different alternative counter values).
(9) The process stack pointer 36 provides a stack in which the program counter values may be swapped in and out as tasks are called by the instruction decoder. Such stack is a first-in-last-out structure. When an instruction to modify the program counter is recognized by the instruction decoder 32 and a predicted program counter value generated by the program counter predictor 34, the program counter value will be pushed into this stack and popped at a return time. For example, if a function call instruction is recognized, the return address will be pushed in this stack and popped at return time. In case of overflow, an interrupt will be generated so that its content can be saved. The depth of the stack is configured such as to minimize this event. When a return instruction is detected, the last value will be popped and used for next checks. In case of an underflow, an interrupt will be generated to refill the stack.
(10) The main stack pointer 38 is a second last-in-first-out structure, in this case a stack that is used by the operating system and allows for storing of the process stack pointer if the operating system changes the task being executed. This allows the previous task to be recalled restoring the information stored within the process stack pointer 36 relevant to the task being executed and the progress of the task being executed.
(11) An interrupt module 40 is provided to compare the program counter value determined by the instruction decoder 32 with the program counter value predicted by the program counter predictor 34. The interrupt module may receive the values directly from the units 32, 34 or it may pop them from the process stack pointer 36.
(12) The interrupt module 40 is also provided to generate an interrupt request when the number of return addresses or program counter values stored in the process stack pointer 36 overflows or underflows, or when an integrity error occurs.
(13) Predicting the program counter value is achieved as outlined below with respect to Table 1.
(14) Using as an example, a simple loop:
(15) TABLE-US-00002 1 Sum = 0; 2 for (i=0; i<4; i++) Sum +=i; 3 ...
This loop will translate in assemble (ASM) to:
(16) TABLE-US-00003 1. MOVS R0, #0 ; Sum = 0 2. MOVS R1, #0; i=0 3. loop 4. ADDS R0, R0, R1 ; Sum +=i 5. ADDS R1, R1, #1 ; i++ 6. CMP R1, #4 ; Compare i to 4 7. BLT loop ; if less than then branch to loop 8. ...
(17) It can be seen that ASM line 4 can be reached either from line 2 when code is executing linearly or from line 7 when the loop has still to be completed. Other use cases, such as nested loops, switches, etc. exist where a label will be reached from multiple locations, so the simple proposal of using the previous address to check the execution flow integrity will not work.
(18) For an Advanced RISC Machine (ARM) core, only a limited number of instructions can result in non-linear execution, as shown in Table 1.
(19) TABLE-US-00004 TABLE 1 An instruction set operable to be executed by the ARM cortex M0 microcontroller of FIG. 1 Format 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Move shifted register 01 0 0 0 Op Offset5 Rs Rd Add and subtract 02 0 0 0 1 1 1 Op Rn/offset3 Rs Rd Move, compare, add, and subtract 03 0 0 1 Op Rd Offset8 immediate ALU operation 04 0 1 0 0 0 0 Op Rs Rd High register operations and branch 05 0 1 0 0 0 1 Op H1 H2 Rs/Hs RdHd exchange PC-relative load 06 0 1 0 0 1 Rd Word8 Load and store with relative offset 07 0 1 0 1 L B 0 Ro Rb Rd Load and store sign-extended byte and 08 0 1 0 1 H S 1 Ro Rb Rd halfword Load and store with immediate offset 09 0 1 1 B L Offset5 Rb Rd Load and store halfword 10 1 0 0 0 L Offset5 Rb Rd SP-relative load and store 11 1 0 0 1 L Rd Word8 Load address 12 1 0 1 0 SP Rd Word8 Add offset to stack pointer 13 1 0 1 1 0 0 0 0 S SWord7 Push and pop registers 14 1 0 1 1 L 1 0 R Rlist Multiple load and store 15 1 1 0 0 L Rb Rlist Conditional branch 16 1 1 0 1 Cond Softset8 Software interrupt 17 1 1 0 1 1 1 1 1 Value8 Unconditional branch 18 1 1 1 0 0 Offset11 Long branch with link 19 1 1 1 1 H Offset Format 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
(20) These instructions can be divided into the following groups: Branches where offset is part of instruction, such as: B{cond} label, i.e., format 16, with a jump of 256 to +254 bytes; B label, i.e. format 18, with a jump of +/2 KB; a branch link label, i.e. format 19 (repeated), with a jump of /16 MB. Branches where a register is used to indicate the next instruction address, such as: Branch return and branch link return, i.e. format 5; Branches where a memory location is used to indicate the next instruction address, such as: POP, i.e. format 14, where the return address saved in the stack is pushed back to the program counter value.
(21) It should be noted that instructions with link options, i.e. branch link and branch link return, are used to call functions as they save the next program counter value to a link register whereas POP {PC} and branch return are used to return from calls as they branch to return addresses saved in the process stack 36 or link register.
(22) Another grouping can be:
(23) 1. Instructions where it is possible to compute the next address, i.e. instructions without a branch or an unconditional branch;
(24) 2. Instructions where it is possible to compute an alternative address, e.g. a conditional branch where we know the program counter value will be either incremented or a value computed from instructions; a. One problem with this grouping is that if a status bit stored in a link register is wrong, it might not execute a loop completely or it may take the wrong path.
(25) 3. Instructions where it is possible to guess the address, (but this might be complex), e.g. instructions where the program counter value is saved in a link register or on the process stack 36 and later returned by a branch return or a POP. Note that in one direction, it might be easy to guess an address, but attempting to guess the address in a return flow direction might be more difficult; a. One problem with this grouping is that there is a potential of buffer overflow or ROP attacks.
(26) 4. Instructions where there is no way to guess the program counter value, such as the return address, because information is stored in a CPU register, e.g. branch return and branch link return; a. One problem with this grouping is that it is susceptible to a ROP type of attack or simply at risk of fault insertion where registers are modified.
(27) It should be noted that only a few of the leading bits needs to be parsed to detect which instruction is being decoded. For each of these groups protection is proposed as outlined below.
(28) Instructions Group 1
(29) Here, the execution flow protection module 30 can parse instructions and detect that the instruction is not a branch and that the value of the program counter will be incremented. Alternatively, the module can detect that the instruction is an unconditional branch without a link where we can compute the next program counter value.
(30) Instructions Group 2
(31) Here, a conditional branch is necessary, so the program counter value will be either incremented or modified with an offset. What cannot be checked is the status bit, so the module 30, via the program counter predictor 34, computes the two possible addresses and then checks that one of them is taken using the interrupt module 40.
(32) To secure a sensitive loop so that the correct number of iterations is performed, code may be used to provide a hint to the module 30. For example, a write to a register of the module 30 to initialize a counter may be hinted. In this case, each time a branch is taken, this counter will be decremented. If the counter has not reached 0 when the branch is not taken, then an error will be detected by the interrupt module 40.
(33) For instance to ensure that a loop is executed 16 times
(34) TABLE-US-00005 1 looprogram counterount = 0; 2 Init_XFPU_branch_counter_register(16); // Simple AHB write 3 for (i=0; i<16; i++){ 4 do something; 5 } 6 Do something
(35) In the ASM, line 3 will result in a branch being taken 16 times. The counter initialized in line 2 will decrement each time the branch is taken.
(36) Hint instructions could be more or less complex. For example, the previous examples only cover static values, but more complex ones, such as to cover loops whose number of iterations is known only at execution time, may be used.
(37) There are also other complex cases, where multiple branches can occur or are embedded. In a similar manner to what is done in software, hints could be given by the software to the module 30 via register writes. As writes to module 30 registers are themselves protected due to linear execution, it will be difficult to bypass them.
(38) Different hints could be made with a combination of different conditions, for example;
(39) 1. Counter n must be manipulated:
(40) a. set to value x; or
(41) b. increment/decremented of x
(42) 2. Counter n value must be checked:
(43) a. When next branch is skipped;
(44) b. When a branch is taken; or
(45) c. Immediately
(46) 3. Counter n value must be equal to:
(47) a. X; or
(48) b. 0
(49) 4. Counter n will be decremented:
(50) a. every time next branch is taken; or
(51) b. every time a branch is taken
(52) For instance, in the previous example, we would have had condition 1a 2a, 3b and 4a.
(53) Instructions Group 3 and 4
(54) This group includes mainly function calls, including software interrupts, and returns. Typical cases involve the use of the different link, exchange and stack operations. In real code, multiple successive nested calls may be implemented.
(55) It may be necessary to add to this group the instructions of Group 1 such as MOV if its destination is R15, i.e. the Program Counter. However, this is an unusual occurrence as generally a branch return subfunction is used for this purpose. It is also possible that a pre-processing step could ensure that the program counter value is not manipulated directly, but only via branch and link instructions.
(56) To be able to check execution flow, it is necessary to keep track of the return address so that the return address can be used to verify that the correct call subfunction has been called and also to check that the link address is correct when going forward.
(57) What can be done is to have a memory to save the return address in the module 30. When a link function (such as a branch link or branch link return) is recognized by the instruction decoder 32, the next program counter will be predicted by the program counter predictor 34 and saved in the parallel process stack 34. When a branch return or a POP {PC} is recognized then a check is made that the new value of the program counter matches with the expected value. As there can be multiple nested calls functions, it is desirable to use benchmarking tools (such as the one part of ARM Development Studio 5) to find the optimal depth at which program counter values must be predicted.
(58) The interrupt module 40 also ensures that in case of overflow, the stack content can be saved to memory. The stack content will also be required to support context switches of real time operating systems.
(59) Saving the program counter value, such as the return address does not protect against attacks where an address in a register is modified to call a critical function. To ensure that critical functions are called by the correct code, in a software only solution, one technique utilizes a function to check that a variable which cannot be easily guessed has been correctly set by the calling code, as shown below.
(60) TABLE-US-00006 1 int *magicByte = get_rand( ); 2 ... 3 func1(*magicbyte); // call function 4 5 void func1(int canaryvalue){ 6 if (canaryvalue != *magicbyte) {throw error} 7 else {magicbyte = get_rand( ); // Destroy value 8 }; 9 ...
(61) This code protects against an attacker which can modify the call stack or program counter, but for which it is more difficult to access the memory to retrieve a value and then to push it on the stack.
(62) In the present disclosure, a first hint can be used to set the canary value and a second hint to enable the module to perform the comparison. The advantage compared to a software only solution is that the comparison check will be performed in hardware and cannot be skipped.
(63) If there is no hint given to the module 30, then it will have to accept any branching. In this case, one option would be to either make hinting of a function call mandatory for all the firmware or at least part of it, such as for critical functions.
(64) Software interrupts are just a specific function call. For hardware interrupts, detecting interrupts to synchronize to them might be difficult as some interrupts are internally generated, such as an invalid instruction, address, etc., and also the timing of decoding might be approximate, for example in the case of nested interrupts. One solution is to detect access to a vector table, and assume that access to this table is authorized, and therefore to save the next program counter value as in a case of a normal function jump. Depending on whether the vector table is stored in flash or Static Random Access Memory (SRAM), the implementation will vary.
(65) This might potentially be an attack vector. By modifying a vector value, an attacker will be able to execute any code. Accordingly, the vector table has to be protected against tampering. For instance, if the vector tables are stored in SRAM, they should be accessible only at specific execution points and only available to be read once the tables have been written.
(66) Depending on the overhead, the module can support in hardware two different last-in-first-out structures, such as stacks 36, 38. When the operating system switches to a new task, the interrupt handler also saves the context stored in the process stack 36 of the previous task and restores the context of the next task. In the present disclosure, the return address of the different functions and other data such as the different counters setup by the hints is restored. For the operating system, no save and restore is necessary because of operating system file-in-file-out structures in the module 30. Note that the implementation of this example might be easier because the depth required can be known in advance and will be quite shallow, because the operating system is optimized for low overheads, e.g. it does not have recursive calls. This means that there will be no need to handle overflows and underflows in hardware and software.
(67) Hints can be passed via register writes to the module 30. To reduce the overhead of register writes, which can take several cycles for some instructions, it is possible to use the address most significant bit to transfer settings to the module 30. Typically, addresses are 32 bits wide, whereas for memory controllers, the address range is much more limited.
(68) An overview of the method steps undertaken by the module 30 to ensure the integrity of execution flow in a microcontroller with a memory bus and a microprocessor is shown in
(69) This value is stored, for example in a last-in-first-our structure, in this case a stack, such as a process stack, with an associated process stack pointer that allows the value to be obtained from the stack. Based on the expected execution flow, the value is when modified 56 accordingly. For example, for linear execution flow, the value may be incremented. For non-linear execution flow, the value may be determined using hints or other means as described above.
(70) An updated value is then obtained 58 that is based or associated with the next instruction. For example, f the original value is the return address of the current instruction, the modified value will be the expected return address of the next instruction. The updated value obtained will then be the actual return address of the next instruction.
(71) The modified value is then compared 60 to the updated value. If the modified value does not match the updated value, an alert is generated 62. The alert may be a warning to a user, or it may be an error.
(72) This invention can be applied to an integrated circuit based around a microcontroller controlling multiple peripherals and where the applications require some security (for example because money is involved), e.g. a microcontroller used for a power meter or in secure elements such as smart cards, etc.
(73) From reading the present disclosure, other variations and modifications will be apparent to the skilled person. Such variations and modifications may involve equivalent and other features which are already known in the art of cryptography and which may be used instead of, or in addition to, features already described herein.
(74) Although the appended claims are directed to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly or any generalisation thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention.
(75) Features which are described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. The applicant hereby gives notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
(76) For the sake of completeness it is also stated that the term comprising does not exclude other elements or steps, the term a or an does not exclude a plurality, a single processor or other unit may fulfill the functions of several means recited in the claims and reference signs in the claims shall not be construed as limiting the scope of the claims.