Continuous run-time validation of program execution: a practical approach

09767284 · 2017-09-19

Assignee

Inventors

Cpc classification

International classification

Abstract

Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated.

Claims

1. A processor system comprising: a cache line signature generator, configured to generate a dynamic signature of a cache line of an instruction cache holding instructions of a program; hardware verification logic configured to securely verify a reference signature for a respective cache line content against the dynamic signature and produce a verification signal in dependence thereon; and a hardware instruction processing pipeline comprising a plurality of stages, configured to: load instructions from the cache line, speculatively execute the instructions in the plurality of stages prior to commitment, and in selective dependence on the verification signal, one of: (i) flush the hardware instruction processing pipeline prior to commitment of the instructions, and (ii) permit commitment of the instructions; the hardware instruction processing pipeline comprising: a hardware commit defer unit, configured to allow reversible partial execution of the instructions prior to commitment of the instructions, while preventing irreversible changes to a program state of the program, dependent on the verification signal; a hardware buffer configured to buffer a write resulting from the partial execution of the instructions prior to commitment of the instructions; and hardware control logic configured to write the buffered write to a memory, contingent on the verification signal.

2. The processor system according to claim 1, further comprising decryption logic configured to decrypt an encrypted reference signature in dependence on a decryption stored key, wherein the verification logic is further configured to verify an available decrypted reference signature against the dynamic signature during a pipeline latency of the hardware instruction processing pipeline, substantially without stalling the hardware instruction processing pipeline waiting for the verification.

3. The processor system according to claim 1, wherein the verification logic is further configured to permit commitment of the instructions executed within in the hardware instruction processing pipeline based on at least a partial match of the dynamic signature of the cache line with the decrypted reference signature.

4. The processor system according to claim 1, further comprising checkpoint logic configured to define a checkpoint state prior to execution of instructions by the hardware instruction processing pipeline stored within the cache line, and to roll back a state of the processor system to the checkpoint state if the verification logic fails to verify the decrypted reference signature against the generated signature.

5. The processor system according to claim 1, wherein the hardware instruction processing pipeline further comprises branch prediction logic and speculative processing logic, wherein the verification logic is configured to generate a signal corresponding to a branch misprediction, resulting in the speculative processing logic causing at least a rollback to a state prior to commencement of processing of the instructions whose verification failed.

6. The processor system according to claim 1, wherein the cache line signature generator is further configured to generate a signature of both a control flow path and instructions along the control flow path.

7. The processor system according to claim 1, wherein the cache line signature generator is further configured to dynamically generate the dynamic signature of the cache line of the instruction cache as respective instructions of the plurality of instructions enter the cache line.

8. The processor system according to claim 1, wherein the cache line signature generator is further configured to generate the dynamic signature of the cache line of the instruction cache on the plurality of instructions stored in the cache line.

9. The processor system according to claim 1, further comprising: a communication port configured to receive encrypted information representing at least one reference signature; and a memory configured to securely receive and store a secret key configured to decrypt the information representing the at least one reference signature.

10. The processor system according to claim 1, wherein the cache line signature generator is further configured to generate the dynamic signature of the cache line of the instruction cache concurrently with decoding of the instructions from the cache line.

11. A processing method, comprising: generating a dynamic signature of a cache line of an instruction cache holding instructions of a program, with a cache line signature generator; securely verifying a reference signature for a respective cache line content against the dynamic signature with hardware verification logic, and producing a verification signal in dependence thereon; loading instructions from the cache line into a hardware instruction processing pipeline having a plurality of stages; and speculatively and reversibly executing the instructions in the plurality of stages of the hardware instruction processing pipeline, to a stage prior to commitment of at least one speculatively executed instruction, while preventing irreversible changes to a program state of the program, until the verification signal indicates verification of the dynamic signature with respect to the reference signature; buffering an external write resulting from the speculative and reversible execution of the instructions prior to commitment of the instructions with a hardware buffer; and externally write the buffered external write with hardware control logic, to commit execution of the instructions, contingent on the verification signal.

12. The method according to claim 11, further comprising: producing a signal by the verification logic selectively in dependence on said securely verifying; and selectively in dependence on a state of the signal from the verification logic, one of: flushing the hardware instruction processing pipeline in dependence on a signal prior to commitment of the instructions, and committing execution of the instructions.

13. The method according to claim 11, further comprising flushing the hardware instruction processing pipeline in dependence on a signal selectively in dependence on said securely verifying, prior to commitment of the instructions.

14. The method according to claim 11, further comprising committing execution of the speculatively executed instructions in the hardware instruction processing pipeline in dependence on said securely verifying, prior to commitment of the instructions selectively.

15. The method according to claim 11, further comprising: decrypting an encrypted reference signature in dependence on a decryption stored key; wherein said verifying further comprises verifying an available decrypted reference signature against the dynamic signature during a pipeline latency of the hardware instruction processing pipeline, substantially without stalling the hardware instruction processing pipeline waiting for the verification.

16. The method according to claim 11, further comprising wherein the verification logic is configured to permit commitment of the instructions executed within in the hardware instruction processing pipeline based on at least a partial match of the dynamic signature of the cache line with the decrypted reference signature.

17. The method according to claim 11, further comprising: defining a checkpoint state prior to execution of instructions by the hardware instruction processing pipeline stored within the cache line; and rolling back a state of the processor system to the defined checkpoint state selectively in dependence on a failure of the verification logic to verify the decrypted reference signature against the generated signature.

18. The method according to claim 11, wherein the hardware instruction processing pipeline comprises branch prediction logic which produces a branch misprediction signal, and speculative processing logic, further comprising: generating a signal from the verification logic corresponding to a branch misprediction signal, resulting in the speculative processing logic causing at least a rollback to a state prior to commencement of processing of the instructions whose verification failed.

19. A processor system, comprising: a hardware cache line signature generator, configured to dynamically generate a signature of a cache line contents of an instruction cache comprising a plurality of instructions; hardware verification logic configured to verify a reference signature associated with authentic cache line contents against the dynamically generated signature of the cache line contents, to produce a verification signal selectively in dependence on the verification; and a hardware instruction processing pipeline comprising a plurality of stages, configured to: load the cache line contents from the cache line of the instruction cache, speculatively execute the plurality of instructions in the plurality of stages to a stage prior to commitment of the speculatively executed instructions, allow reversible partial execution of the cache line contents from the cache line of the instruction cache with a hardware commit defer unit, while preventing irreversible changes to a program state of a program comprising the cache line contents, until at least the verification signal from the hardware verification logic is available, buffer a memory write in a hardware buffer, prior to commitment of the speculatively executed plurality of instructions, flush the hardware instruction processing pipeline in dependence prior to commitment of the speculatively executed instructions contingent on a first state of the verification signal, and commit execution of the plurality of instructions contingent on a second state of the verification signal, comprising an external write of the buffered memory.

20. The processor system according to claim 19, wherein the hardware instruction processing pipeline further comprises branch prediction logic configured to produce a branch misprediction signal, wherein the hardware verification logic is configured to generate a signal corresponding to the branch misprediction signal, resulting in the speculative processing logic causing at least a rollback to a state prior to commencement of processing of the instructions whose verification failed.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 depicts the basic blocks for a module; and

(2) FIG. 2 depicts a typical out-of order execution datapath and it also shows the additional components needed for implementing a preferred embodiment;

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

(3) REV: Validating Executions AT Run-Time—Overview

(4) The proposed technique for control flow authentication and the authentication of instructions executed along the control flow path is called REV: Run-time execution Validator. In REV, the signature of each basic block of committed instructions is validated as the control flow instruction (branch, jump, return, exit etc.) terminating the basic block is committed. In addition to comparing the cryptographic hash function of the instructions in the basic block against a reference signature, that control flowed into the basic block along an expected path is validated by comparing the actual address of the target of branch instruction at the end of the current basic block.

(5) FIG. 1 depicts a simple example, with the basic blocks for a module labeled as A, B, C. D and E. A is the block executed on entry into the module (which has a single entry point) and E is the exit block. The directed edges show the control flow paths. At the end of each basic block, conditional branch instructions as assumed, say, bA, bB, bC, bD and bE, respectively. The lower arrow coming out of each basic block shown in FIG. 1 corresponds to the branch at the end of the block being taken, while the upper arrow corresponds to the not-taken path. The run-time validation of the execution of this module requires the equivalent of the following information to be stored in the validation table: (i) the address of the successors (which are basic blocks themselves) of each basic block, (ii) the outcome of these branch (taken or not, or equivalent information as applicable)—this information, in some cases can be stored implicitly as the order in which the successors are listed and, (iii) the cryptographic hash of all instructions in a basic block, including the branch instruction at the end of each basic block. For example, the information needed for validating the execution of basic block D needs to hold two addresses for the branches bD, bE that lead the next basic block D or E as well as the cryptographic hash for the instructions within basic block D.

(6) FIG. 2 depicts a typical OOO datapath and it also shows the additional components needed for implementing the present scheme. The reference signatures of the BBs and the source/destination address pairs are stored in the main memory, encrypted with a secret key. As instructions are fetched by the front-end stages, they are fed into a pipelined crypto hash generator (CHG), which operates in parallel with the rest of the pipeline. Because of speculative execution and potential flushing of instructions along the mispredicted path, it makes sense to only validate control flow along the path of committed instructions. This post-commit validation permits the delay of the crypto hash generation to be effectively overlapped with the steps of instruction decoding, renaming, dispatch, issue, execution, writeback and commitment, thus meeting the first part of requirement R2. The post commit validation also meets Requirement R6 naturally.

(7) A small on-chip cache, called the signature cache (SC), is incorporated to hold reference signatures that are retrieved from memory. Because of the temporal locality of reference, the SC eliminates the need to fetch the same reference signatures repeatedly from the memory, thus improving the overall performance. As instructions are fetched by the front end, the predicted target address of a branch is used to retrieve the reference signature of the target BB from the main memory via normal memory interface (going through the L1 D-cache and the rest of the on-chip hierarchy).

(8) The use of the SC enables the scalability requirement (Requirement R1) to be met: the SC will only hold the recently-used (and likely-to-be-used) reference signatures and signatures of any executing modules are loaded into the SC as the code executes, unaffected by the size or the number of code modules used. The SC fetches the reference signatures for a BB based on the address of the fetched instruction if the signature does not already exist in the SC and decrypts the reference signature using a secret key as it is fetched. In practice, the delay of this decryption is hidden because of SC hits, hits in the on-chip cache hierarchy and the delay from fetching the first instruction of a BB to the time of validating the signature of the BB. Thus the second part of Requirement R2 is met.

(9) The SC permits context switches to be handled naturally, as it automatically fetches signatures for the executing code thus meeting Requirement R4.

(10) The CHG generates the signature of the instruction stream forming a basic block as the instructions are fetched along the path predicted by the branch predictor for the pipeline. When the last instruction of the current basic block is ready to commit (which is a control flow instruction), the SC will be probed to verify the calculated crypto hash value of the current basic block against the reference control flow signature. On a SC hit at this time, followed by a match of the control flow signature for the basic block, execution continues as usual. On a SC miss, the corresponding reference signature is retrieved into the SC and until a match is performed, the pipeline is stalled. On a signature mismatch an exception is raised and appropriate handlers are invoked.

(11) To meet Requirement R5, the ROB is extended beyond the normal commit stage and also extend the store queue similarly, as shown in FIG. 2 to prevent changes to the precise state unless a BB is validated. Since these extensions have a finite capacity, the very rare BBs that contain a long sequence of instructions are broken up artificially into multiple BBs, limiting the number of stores or the total number of instructions within a BB (whichever occurs earlier). The front-end of the pipeline is aware of these limits and triggers SC fetches at these artificial boundaries instead of waiting for a control flow instruction to be fetched.

(12) A much stricter approach to meeting Requirement R5 is to defer all changes to the system state until the entire execution has been authenticated. One way to do this is to employ the concept of page shadowing [NG 09]. Initially, the original pages accessed by the program are mapped to a set of shadow pages with identical initial content. All memory updates are made on the shadow pages during execution and when the entire execution is authenticated, the shadow pages are mapped in as the program's original pages. Also, while execution is going on, no output operation (that is DMA) is allowed out of a shadow page.

(13) Memory accesses for servicing SC misses have a priority lower than that of compulsory misses to the data caches, but a higher preference than instruction misses and prefetching requests. If a branch misprediction is discovered, memory accesses triggered by SC fetches along the mispredicted path are canceled and the appropriate pipeline stages in the CHG are also flushed. Interrupts are handled like branch mispredictions. Interrupts flush instructions and results of the earliest basic block in the pipeline (that has not been validated) and resumes from the beginning of the basic block if the interrupt was generated by the BB itself. External interrupts are handled after completing validation of the current BB. The context structure in the Operating System (OS) holds the base address of the signature table in the Random Access Memory (RAM). A signature base register is used to point to the starting (=base) address of the RAM-resident signature table of the executing module (Sec. 4.4).

(14) Supporting Cross-Module Calls

(15) A single program module may call functions within a number of other independently compiled modules. These modules include shared components as well as statically or dynamically linked libraries/modules. Each such module will have its own encrypted signature table. As execution switches from one module to another, the register pointing to the base of the signature table has to be switched from that of the calling module's to that of the callee's. The specific hardware support for cross module calls is part of the signature address generation unit (SAG) shown in FIG. 2 and consists of a set of B base registers that contain the base addresses of the RAM-resident signature tables for up to B called modules. Associated with each such base register is a pair of limit registers that record the starting and the last virtual addresses of the corresponding module. For statically-linked modules, these base registers and the associated limit register pair are filled in by the linker (which is trusted). For dynamically linked modules, the base and limit register pairs are initialized on the first call to the dynamically linked module, when the jump vectors for the functions within the called dynamically-linked module is initialized. The usual (and trusted) dynamic linking function is extended to handle the base and limit-register pairs. At every call or return instruction, the address of the target is associatively compared against the addresses stored in all of the B limit register pairs. The base register for signatures that is selected for use is the one whose limit register values enclose the called functions entry address (or the return address).

(16) The cross-module call support meets any remaining part of Requirement R1 that is not met by the SC and its associated logic. The Silicon requirements for B base address and limit register pairs and their associated comparators are fairly modest for small values of B (16 to 32). When more than B modules are involved, an exception is generated when none of the limit register pairs enclose the called/return address and the exception handler is invoked to manage the base-limit registers.

(17) Signature Cache (SC) Details

(18) A basic block is identified using the address of the last instruction in the basic block (referred to hereafter as “basic block address”). The SC is a set-associative cache that is accessed using the basic block address. An entry in the SC contains the full address of up to N successors of this basic blocks and the actual outcomes that direct control to flow out of the block, the decrypted crypto hash of the instructions in the basic block and status information that indicates if the decrypted reference signature is present within the entry and if the basic block has more than N successors. For the successor field values in the SC entry, the address of the very first instruction in the successor BBs is used. N has a small value (e.g., 2 used in simulations). If a basic block has more than N successors, only the entries for the N most recently used branches are maintained within the SC entry. This requires the use of additional logic to handle replacements within an SC entry.

(19) As instructions of a basic block are fetched along the speculated execution path, they are fed into the CHG pipeline to compute the cryptographic hash for the basic block. To permit flushing of entries in the CHG, the inputs are tagged with the identification (ID) of the successor basic block along the predicted path. Prior to this, using the predicted target address of the prior branch as the lookup key, the signature cache is probed to determine if the decrypted reference signature and the address of the successor basic block are available in the SC. If both are available, a SC hit occurs and no further actions are needed. Two situations arise on a SC miss: one is a partial miss that indicates the presence of a matching entry with a decrypted reference signature but with no address listed for the successor basic block encountered along the predicted path. The other situation is a complete miss, where no matching entry is found in the SC. In either of these SC miss scenarios, a memory access, going through the normal cache hierarchy, is triggered to fetch the missing information into the SC. The authentication check in REV is invoked when the last instruction in a basic block commits and uses the generated hash signature, address and outcome of the successor basic block for the authentication check against the SC entry.

(20) The data for the encrypted control flow signatures stored in the RAM is stored as a hash-indexed table. Recall that the address of the last instruction in the basic block is used to identify a basic block. This address is used to hash into this structure, computing the index as basic block address mod P, where P is chosen to minimize the conflict in the hash table. Details of the entries in this hash-indexed table are given in Section 4.4. A separate field is available in the entry to point to other entries that are in collision on hashing into the table. The entries in collision are entries for other basic blocks that share the common table index or additional entries for a basic block that has more than two successor basic blocks. The contents of the hash table are all encrypted using a secret key on a trusted host following an analysis of the program binaries. Note that in the course of fetching a signature (or other relevant information into the SC) multiple memory accesses may be required. Such accesses have been modeled correctly in simulations discussed below. When the last instruction in a basic block is committed, the SC is probed to verify the signature of the basic block, as described in Section 4.1.

(21) Basic blocks ending with a return instruction are validated in two steps, the first being the validation of the crypto hash when the return instruction is processed and the second to validate the return instruction as a legitimate predecessor of the BB entered on a return, during the validation of that BB (Section 4.2).

(22) RAM-Resident Encrypted Reference Signature Table

(23) Each executable module has its own memory-resident signature table. The contents of this table are encrypted (Section 4.1). The format of storing the encrypted reference signatures in the memory have the legal branch addresses with the crypto hash values of the each basic block and exploits the fact that most basic blocks (other than those computed branches and the return instructions) have no more than two successors.

(24) A naive way of verifying execution is to store the address of a control transfer instruction and the addresses of the target instruction. The signature table entry for a conditional branch will include its two possible targets, while that for an unconditional jump or a call instruction will have a single target. However, for a computed jump, all potential target addresses need to be listed in the entry. Likewise for a function that is called from multiple locations, the signature table entry for the return instruction terminating such a function should list multiple targets.

(25) For an efficient implementation for lookups to the signature table entries in the RAM following a SC miss, the entries located using the hash indices need to be uniform in size. If each entry located using a hash index is restricted to have at most two target addresses, then a linked list needs to be used for entries that should hold the remaining targets for control flow instructions that require more than two targets. Following such links to locate a matching target address requires indirection through memory locations and will be thus prolong the process of retrieving a matching entry for the memory resident signature table on a SC miss. To avoid this performance penalty for return instructions that terminate a popularly called function, a BB ending with a return is validated in two steps: (a) only the crypto hash of the signature of the BB terminated by the return instruction is validated, but address of the return instruction is saved in a special latch internal to the SC mechanism; (b) when the control flow instruction that terminates the first basic block (say, RB) entered in the calling function following a return instruction, say R, is validated, the execution of the RB is validated in the usual way and simultaneously the address of R is validated. The expected address of R is stored as part of the entry for RB.

(26) In the memory resident signature table (which is stored in encrypted form), entries are classified into several different categories (e.g., 24 types), corresponding basic blocks of different types. Examples of entry types used include, but not limited to the following or their appropriate combination: BB terminated with a conditional branch, BB terminating with (unconditional) jump and a call instruction, BB terminating with a return and also ending with a return, other BB ending with a return, BB entered following a return, BB entered following a computed branch, artificially terminated BB (used to limit number of queued memory writes pending validation) etc. The generic structure on an entry in the memory resident table (decrypted) is as follows:

(27) <type tag>, <BB crypto hash value>, <6-bit hash tag discriminator>, <successor1>, <successor 2>, <other>.

(28) An entry has a type tag field indicating the entry type, the crypto hash of the BB, the address or equivalent information (such as a branch offset) to determine the address of the first instruction in up to two successor BBs. As an entry in the memory resident signature table is located using a hash indexing scheme, the BB crypto hash value and the 6-bit hash tag discriminator are together used to serve as the hash tag to permit the unambiguous identification of the correct table entry for two control flow instructions whose addresses generate the same hash index. Note that the crypto hash value is used as part of the “hash tag”; the 6 additional discrimination bits deal with the case when two control flow instructions share the same hash index and the same crypto hash value—a very unlikely situation. The “other” in the entry contains a pointer to a spill area that lists additional successors and/or the address of a preceding return instruction (the last applicable only to BBs entered on a return) and the next entry, if any, that shares the same hash index (link to the so-called collision chain). The last element in an array of spill values associated with a single table entry is identified using a special tag value.

(29) The primary entry located using a hash index and the following the collision chain links are identical in size, and depending on the entry type, some of the fields are unused. Wherever possible, these unused areas are used to serve as a spill space. Other techniques are also used to minimize the space requirement of the entries, such as using offsets instead of full addresses, listing branch outcomes implicitly, etc.

(30) The signature address generation unit generates a memory address for the required signature table entry on a SC miss, based on the signature table's base address in the RAM (obtained from the signature address register of the executing module (Sec. 4.1)) and the hash of the address of the control flow instruction that terminates a BB (obtained from the front-end stages). Once the entry is retrieved, the two target addresses listed in the entry are compared with the address of the actual target (as predicted by the branch predictor for a conditional branch instruction or as computed for a computed branch or call or return). On a mismatch, additional entries in the spill area are progressively looked up following the link listed in the entry. If no matching entry is located, control is presumably flowing on an illegal path and an exception is generated.

(31) Handling Computed Branches

(32) Computed/indirect branches are generated in many instances by the compiler. Computed branches are treated just like conditional branches, with a “taken” outcome on every possible jump path. To use REV with arbitrary computed branches, one has to determine all potential legal targets of the branch. Sometimes this can be done through a profiling run, for each such identified target the signature data for the basic block that starts with the target instruction can be specified. Targets of indirect branches can also be identified. There are two broad ways to extract the legitimate source-target pairs for the indirect branch addresses [Par 10]: extracting through static analysis [HP 00, Ste 96] or by performing a program profiling runs, as many model-based solutions have done [FHSL 96, GJM 04, ZZPL 05]. The signature generator component extracts the branch source-target pairs through the static analyzer component; it has the limitation of branch discovery. However, once the subset of the legitimate undiscoverable branch source-target pairs are supplied to the signature generator, the signatures and basic blocks will be generated for runtime verification. An alternative hardware mechanism [LSL 09] also uses such profiling runs to train a mechanism to just authenticate indirect branches. Profiling runs were used for some of the SPEC benchmarks to identify the targets of indirect branches.

(33) Handling Legitimate Self-Modifying Code

(34) REV relies on basic block signatures derived from a static analysis and as such it cannot directly handle intentional dynamic binary modifications (self-modifying code). It is useful to note that such modifications are a perpetual threat to system security, as noted in several recent and past literatures [Bla 11, SJW+11, HHW+06, YJP 04]. However, there are real scenarios where binary modification is used intentionally for reasons of efficiency, such as by OS boot loaders and by Java just-in-time compilers. In such scenarios, the functions that perform the intentional binary modifications are assumed to be trustworthy (and are validated dynamically using REV). Consequently, it is expected and assumed that the modified binaries that these trusted functions produce are trustworthy themselves and when such modified binaries are run, the REV mechanism can be momentarily disabled by the OS. Of course, to the extent that self-modifying code can be predicted, these presumptions may be replaced with predicted block signatures.

(35) Timing And Other Implications

(36) The REV crypto generator (CHG, in FIG. 2) starts generating the hash of a basic block as soon as instructions are fetched (along the predicted path). This crypto hash value is needed for validating the instructions in the basic block after the last instruction in the basic block is committed. To completely overlap the delay of crypto hash generation, say H, with normal pipeline operations, one must ensure that in the worst case that H=S, where S is the number of pipeline stages in-between the final instruction fetch stage and the commit stage. In a typical high-end X86 implementation, S ranges from 12 to 22. In reality, instruction commitment can be delayed by additional cycles in the reorder buffer. For simulations, it was assumed that S is 16 and assumed the worst-case value of H as 16. Some SHA-3 candidates, including the cube hash algorithm can meet the latency requirement described above. For instance, a cube hash implementation with 5 rounds can meet the 16 cycle latency goal with parallel pipelines [Cub 12, TFK+10]. For a 5-round cube hash algorithm, it was also found that the crypto hash generated for the basic blocks of the SPEC 2006 benchmarks were very unique, with the collision rate of maximum in gamess with 24 out of 433116 unique basic blocks equal to 5×10.sup.−5 collision rate and an average collision rate out of all SPEC benchmarks at 68 collisions out of 2783086 unique basic blocks total which is equal to 2.4×10.sup.−5 collision rate. Alternatives to the cube hash are also described in [Sa 08, GHN+10, TFK+10], that describe low-latency pipelined hash generators with a latency ranging from 14 to 21 cycles, particularly the designs presented in [TFK+10] for 180 nm implementations. These designs can be further refined for lower latencies in contemporary 32 nm technology and realize a CHG that will not hold up usual commitments on a SC hit, an assumption made in the simulations discussed below. If the latency H of the CHG is higher, dummy post-commit stages can be added to the pipeline to effectively increase S to equal H. On a SC miss, whether partial or full (Section 4.3), pipeline stalls occur and such stalls are actually modeled in full detail in the simulator.

(37) Depending on the particular cryptographic hash functions used (as reported in [TFK+10]), the area of the hash function generator, extrapolated to 32 nm implementations, can consume about 4% to 5.5% of the die area of a modern out-of-order core implementing the X86 ISA. With the decryption hardware, the SCs, the crypto hash address generator and the CHG, the die area increase resulting from the inclusion of the proposed additions for REV is likely to be about 8% with 32 Kbyte SCs, based on a very rough estimation. This increase in die size is believed to be an acceptable area overhead for the REV mechanism for a contemporary out-of-order processor.

(38) Security of the REV Mechanism

(39) REV relies on the use of encrypted reference signatures for basic blocks to be stored in the RAM. This requires the use of secret keys for decrypting the reference signatures for each executable module. Such keys can be stored in secure key storage in the RAM that can be implemented using a TPM. The external TPM mechanism in contemporary system can be easily used [TPM 12]. An adversary may overwrite the memory locations for key storage—at best this will cause the validation effort to fail (Section 4.1) but this will never allow illicit code to be authenticated, as the secret key for decrypting the reference signature is held in secure locations. REV also requires two system calls. One of these is for loading the base addresses of the memory areas that contain the encrypted reference signatures into the special registers within the crypto hash address generation unit. The second system call is used to enable or disable the REV mechanism and this is only used when safe, self-modifying executables are run (Section 4.6). For REV to work properly, these two system calls must be secured as well.

(40) This approach introduces a relatively simple hardware support for preventing any code injection attacks, either direct or indirect, any attack that could cause control flow misbehavior.

CONCLUSIONS

(41) REV, a technique for control flow authentication that also validates instructions executed along the legal control flow paths in a modern out-of-order processing core is presented and evaluated. The technique addresses the practical needs of a contemporary system such as not requiring executables to be modified, the ability to support arbitrary executable sizes and to permit integration into existing OOO core designs. REV authenticates execution at the granularity of basic blocks and detects control flow compromises and binary compromises at run-time by validating the actual control flow path and instruction signatures at run time against reference information derived from a static analysis of the executables. REV thus validates both code integrity and control flow integrity simultaneously at run-time with a negligible performance overhead. REV also ensures that changes to the computation states are not made by the instructions in a basic block until the basic block and the control flow path into the basic block are simultaneously validated. To avoid any compromises in the validation, REV stores the reference information used for validation in an encrypted form in memory. REV thus offers a practical way to guarantee the authentication of any program execution, including applications, libraries and kernel code.

(42) REV performs complete execution validation at run-time.

(43) REV relies of the ability to derive a control flow graph of the program. For critical applications, it is possible to generate a CFG—perhaps even relying on the availability of the source code.

(44) REV also limits run-time binary code modification to trusted functions (Section 4.6) and requires these functions to be validated as they execute. However, it is widely known that deliberate binary modifications in general makes a system vulnerable, so limiting binary modifications to trusted functions may not be a real impediment in security-conscious systems.

(45) REV also requires additional memory space for holding the reference signature of the executables being verified. This is a reasonable price to pay for the continuous and complete validation of execution at run-time.

REFERENCES

(46) (Each of Which is Expressly Incorporated Herein in its Entirety)

(47) [ABE+09] Abadi M., Budiu M., Erlingsson U., Ligatti J., “Control-flow integrity principles, implementations, and applications”, ACM Transactions on Information and System Security (TISSEC), Vol. 13 Issue 1, 2009

(48) [AG 08] Aktas, E., and Ghose, K., “DARE: A framework for dynamic authentication of remote executions”, in Proc. of Annual Computer Security Applications Conference (ACSAC), pages 453-462, IEEE Press, 2008.

(49) [ARR+06] Arora D., Rav S., Raghunathan A., Jha N., Hardware-Assisted Run-Time Monitoring for Secure Program Execution on Embedded Processors, IEEE Trans. On VLSI, Dec 2006 and earlier version published in Proc. DATE.

(50) [BJF+11] Bletsch T., Jiang, X., Freeh V. W., et al. “Jump oriented programming: a new class of code-reuse attack”, in Proc. ASIACCS, 2011

(51) [Bla 11] Blazakis D., “Interpreter Exploitation: Pointer Inference and JIT spraying”, available on the web at www.semantiscope.com/research/BHDC2010/BHDC-2010Paper.pdf.

(52) [Bra 08] Brad S., “On exploiting null ptr derefs, disabling SELinux, and silently fixed Linux vulnerabilities”, Dailydave List; 2008. grsecurity.net/˜spender/exploit.tgz.

(53) [BRS+08] Buchanan E., Roemer R., Shacham H., et al. “When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC”, in 15th ACM CCS, 2008

(54) [CDD+10] Checkoway S., Davi L., Dmitrienko A., et al. “Return-oriented programming without returns”, in 17th ACM CCS, 2010

(55) [ChMo 03] Chen, B. and Morris, R., “Certifying Program Execution with Secure Processors”, HotOS Conf., 2003.

(56) [CXH+10] Chen P., Xing X., Han H., Mao B., Xie L., “Efficient detection of the return-oriented programming malicious code”, ICISS'10 Proceedings of the 6th International

(57) Conference on Information Systems Security, pages 140-155, 2010

(58) [FHSL 96] Forrest S., Hofmeyr S., Somayajo A., Longstaff T. “A sense of self for unix processes”, in Proceedings of the 2000 IEEE Symposium on Security and Privacy, 1996.

(59) [FL 04] Fiskiran A., Lee R., “Runtime Execution Monitoring (REM) to Detect and Prevent Malicious Code Execution.”, in Proc. of the IEEE Int'l Conf. on Computer Design, 2004.

(60) [GHN+10] Guo, X., et al, “Fair and Comprehensive Performance Evaluation of 14 Second Round SHA-3 ASIC Implementations”, in Proc. the 2.sup.nd. SHA-3 Candidate Conference, organized by NIST, 2010, available at: csrc.nist.gov/groups/ST/hash/sha3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf.

(61) [GJM 04] Giffin J., Jha S., Miller B., “Efficient context-sensitive intrusion detection”, in Proc. 11.sup.th NDSS, 2004.

(62) [GON+05] Gelbart, I., Ott P., Narahari, B. et al, “Codesseal: Compiler/FPGA approach to secure applications”, in Proc. of IEEE International Conference on Intelligence and Security Informatics, Atlanta, Ga., 2005, pp. 530-536.

(63) [HHF 09] Hund R., Holz T., Freiling F, “Return oriented rootkits: Bypassing kernel code integrity protection mechanisms”, in Proc. of Usenix Security Symposium, 2009.

(64) [HHW+06] Hu W., Hiser J., Williams D, et al., “Secure and Practical Defense Against Code-injection Attacks using Software Dynamic Translation”, in ACM Conference on Virtual Execution Environments, 2006.

(65) [HP 00] Hind M., Pioli A. “Which pointer analysis should I use?”, in Proc. of the International Symposium on Software Testing and Analysis, 2000.

(66) [IBM 10] IBM Corporation, IBM 480X series PCIe Cryptographic Coprocessor product overview, available at: www03.ibm.com/security/cryptocards/pciecc/overview.shtml.

(67) [Ka 07] Kauer, B., “OSLO: Improving the Security of Trusted Computing”, in Proc. USENIX Security Symp. 2007.

(68) [KJ 03] Kennell R., and Jamieson, L. H., “Establishing the genuinity of remote computer systems”, in Proc. 12.sup.th USENIX Security Symposium, USENIX Association, 2003.

(69) [KOA+12] Kayaalp M., Ozsoy M., Abu-Ghazaleh N., Ponomarev D., “Branch Regulation: Low Overhead Protection From Code Reuse Attacks”, 39th International Symposium on Computer Architecture (ISCA), 2012.

(70) [Krah 11] x86-64 buffer overflow exploits described at: www.suse.de/˜krahmer/no-nx.pdf

(71) [Krah2 11] Buffer overflow exploit code at: www.suse.de/˜krahmer/bccet.tgz

(72) [KZK 10] Kanuparthi, A. K., et al, “Feasibility Study of Dynamic Platform Module”, in Proc. ICCD 2010.

(73) [LKM+05] Lee R., Kwan P., McGregor J., Dwoskin J., Wang Z., “Architecture for Protecting Critical Secrets in Microprocessors”, in Proc. of the 32nd Annual International Symposium on Computer Architecture, 2005.

(74) [LSL 09] Lee G., et al, “Indirect Branch Validation Unit”, in Microprocessors and Microsystems, Vol. 33, No. 7-8, pp. 461 468, 2009.

(75) [MBS 07] Meixner A., Bauer M. E., Sorin D., “Argus: Low Cost, Comprehensive Error Detection in Simple Cores”, Proc. MICRO 40, 2007

(76) [NG 09] Nagarajan V., Gupta R., “Architectural Support for Shadow Memory in Multiprocessors”, in Proc. ACM int'l Conf. on Virtual Execution environments (VEE) , pp. 1-10, 2009.

(77) [Par 10] Park Y., “Efficient Validation of Control Flow Integrity for Enhancing Computer System Security”, PhD thesis, Iowa State University, 2010.

(78) [Pax 11a] PaX webpages at: pax.grsecurity.net/

(79) [Pax 11b] ASLR description on web pages at: pax.grsecurity.net/docs/aslr.txt.

(80) [PTL 10] PTLsim X86 Simulator Distribution pages at: www.ptlsim.org.

(81) [RP 06] Ragel R., Parameswaran S., “Impres: integrated monitoring for processor reliability and security”, In 43.sup.rd ACM/IEEE Design Automation Conference, 2006.

(82) [Sa 08] Satoh, A., “ASIC Hardware Implementations for 512 bit Hash Function Whirlpool”, in Proc. of the Int. Conf. on Circuits and Systems, pages 2917. 2920, May 2008.

(83) [SJ 87] Schuette, M. and Shen, J., “Processor Control Flow Monitoring Using Signatured Instruction Streams”, in, IEEE Transactions on Computers, Vol. G-36 No: 3, March 1987.

(84) [SJW+11]Salamat B., Jackson T., Wagner G., Wimmer C., Franz M., “Runtime Defense against Code Injection Attacks Using Replicated Execution”, in IEEE Trans, on Dependable and Secure Computing, Vol. 8 No:4, 2011.

(85) [SLS+05] Seshadri, A., Luk, M., Shi, E. et al, “Pioneer: Verifying Code Integrity and Enforcing Untampered Code Execution on Legacy Systems”, in ACM Symp. on Operating Systems Principles, 2005.

(86) [Ste 96] Steensgaard B., “Points-to analysis in almost linear time”, in Proc. POPL, 1996.

(87) [Su+07] Suh, E.G., O'Donnell. C. W., and Devadas, S., “Aegis: A Single-Chip Secure Processor”, IEEE Design and Test of Computers, vol. 24, no. 6, pp. 570-580, Nov.-Dec. 2007.

(88) [Sul 06] Floating License Management, A Review of Flexlm available at: wob.iai.uni-bonn.de/Wob/images/36311141.pdf, 2006.

(89) [TFK+10] Tillich, S., Feldhofer, M., Kirschbaum, M., et al., “Uniform Evaluation of Hardware Implementations of the Round Two SHA-3 Candidates”, in Proc. the 2.sup.nd. SHA-3 Candidate Conference, organized by NIST, 2010, available at: csrc.nist.gov/groups/ST/hash/sha3/Round2/Aug2010/documents/papers/TILLICH_sha3hw.pdf.

(90) [TPM 12] Trusted Platform Module spec. at: www.trustedcomputinggroup.org/.

(91) [YJP 04] Younan Y., Joosen W., Piessens F, “Code Injection in C and C++: A Survey of Vulnerabilities and Countermeasures”, Technical Report, Katholieke Universiteit Leuven Dept. of Computer Science, Belgium 2004.

(92) [ZZP 04] Zhuang, X., Zhang, T., and, Pande, S., “Hardware Assisted Control Flow Obfuscation for Embedded Processors”, in Proc. Intl. Conf. on Compilers, Architecture and Synthesis for Embedded Systems (CASES) 2004.

(93) [ZZPL 05] Zhang T., Zhuang X., Pande S., Lee W., Anomalous path detection with hardware support.”, in Proc. of Int'l Conf. on Compilers, Architectures and Synthesis for Embedded Processors, 2005.