SYSTEMS, METHODS, AND STORAGE MEDIA FOR CREATING SECURED TRANSFORMED CODE FROM INPUT CODE USING A NEURAL NETWORK TO OBSCURE A TRANSFORMATION FUNCTION
20210303662 · 2021-09-30
Assignee
Inventors
CPC classification
G06F8/74
PHYSICS
H04L9/002
ELECTRICITY
G06F21/14
PHYSICS
International classification
Abstract
Systems, methods, and storage media for creating secured transformed code from input code, the input code having at least one code function that includes at least one function value are disclosed. Exemplary implementations may: receive input code; apply an obfuscation algorithm to at least a portion of a selected code function of the input code to thereby create an obfuscated code portion having at least one obfuscated value that is different from the at least one function value; and store the obfuscated code portion on non-transient computer media to create obfuscated code having substantially the same function as the input code.
Claims
1. A system configured for creating secured transformed code from input code, the input code having at least one code function that includes at least one function value, the system comprising: one or more hardware processors configured by machine-readable instructions to: receive input code; and apply an obfuscation algorithm to at least a portion of a selected code function of the input code to thereby create an obfuscated code portion having at least one obfuscated value that is different from the at least one function value, wherein the obfuscated code portion, when executed by a computer processor, has substantially the same function as the selected code function; wherein the obfuscation algorithm is executed by a neural network configured to approximate an obfuscation function, the obfuscation function yielding a function output for each function input, wherein, for at least one function input, a corresponding additional output of the neural network does not correspond to the function output for the at least one input to thereby represent an expanded version of the function; and store the obfuscated code portion on non-transient computer media to create obfuscated code having substantially the same function as the input code.
2. The system of claim 1, wherein the neural network is trained with a set of input/function output pairs that correspond to the obfuscation function and at least one additional input/additional output pair that does not correspond to the obfuscation function.
3. The system of claim 2, wherein the at least one additional input/additional output pair is outside of the range of the set of function inputs.
4. The system of claim 1, wherein the outputs of the obfuscation algorithm approximate the function outputs of the obfuscation function within a predetermined range for a selected set of inputs.
5. The system of claim 1, wherein the inputs to the neural network have x number of dimensions and the function input has y number of dimensions, and wherein x is greater than y.
6. The system of claim 5, wherein only the dimensions of the function input are used in the obfuscated code.
7. The system of claim 1, wherein the neural network includes an input layer, an output layer, and at least one hidden layer between the input layer and the output layer, wherein at least one hidden layer accepts a set of weighted inputs and produces an output through an activation function.
8. A method for creating secured transformed code from input code, the input code having at least one code function that includes at least one function value, the method comprising: receiving input code; applying an obfuscation algorithm to at least a portion of a selected code function of the input code to thereby create an obfuscated code portion having at least one obfuscated value that is different from the at least one function value, wherein the obfuscated code portion, when executed by a computer processor, has substantially the same function as the selected code function; wherein the obfuscation algorithm is executed by a neural network configured to approximate an obfuscation function, the obfuscation function yielding a function output for each function input, wherein, for at least one function input, a corresponding additional output of the neural network does not correspond to the function output for the at least one input to thereby represent an expanded version of the function; and storing the obfuscated code portion on non-transient computer media to create obfuscated code having substantially the same function as the input code.
9. The method of claim 8, wherein the neural network is trained with a set of input/function output pairs that correspond to the obfuscation function and at least one additional input/additional output pair that does not correspond to the obfuscation function.
10. The method of claim 9, wherein the at least one additional input/additional output pair is outside of the range of the set of function inputs.
11. The method of claim 8, wherein the outputs of the obfuscation algorithm approximate the function outputs of the obfuscation function within a predetermined range for a selected set of inputs.
12. The method of claim 8, wherein the inputs to the neural network have x number of dimensions and the function input has y number of dimensions, and wherein x is greater than y.
13. The method of claim 12, wherein only the dimensions of the function input are used in the obfuscated code.
14. The method of claim 8, wherein the neural network includes an input layer, an output layer, and at least one hidden layer between the input layer and the output layer, wherein at least one hidden layer accepts a set of weighted inputs and produces an output through an activation function.
15. A non-transient computer-readable storage medium having instructions embodied thereon, the instructions being executable by one or more processors to perform a method for creating secured transformed code from input code, the input code having at least one code function that includes at least one function value, the method comprising: receiving input code; applying an obfuscation algorithm to at least a portion of a selected code function of the input code to thereby create an obfuscated code portion having at least one obfuscated value that is different from the at least one function value, wherein the obfuscated code portion, when executed by a computer processor, has substantially the same function as the selected code function; wherein the obfuscation algorithm is executed by a neural network configured to approximate an obfuscation function, the obfuscation function yielding a function output for each function input, wherein, for at least one function input, a corresponding additional output of the neural network does not correspond to the function output for the at least one input to thereby represent an expanded version of the function; and storing the obfuscated code portion on non-transient computer media to create obfuscated code having substantially the same function as the input code.
16. The computer-readable storage medium of claim 15, wherein the neural network is trained with a set of input/function output pairs that correspond to the obfuscation function and at least one additional input/additional output pair that does not correspond to the obfuscation function.
17. The computer-readable storage medium of claim 16, wherein the at least one additional input/additional output pair is outside of the range of the set of function inputs.
18. The computer-readable storage medium of claim 15, wherein the outputs of the obfuscation algorithm approximate the function outputs of the obfuscation function within a predetermined range for a selected set of inputs.
19. The computer-readable storage medium of claim 15, wherein the inputs to the neural network have x number of dimensions and the function input has y number of dimensions, and wherein x is greater than y.
20. The computer-readable storage medium of claim 19, wherein only the dimensions of the function input are used in the obfuscated code.
21. The computer-readable storage medium of claim 15, wherein the neural network includes an input layer, an output layer, and at least one hidden layer between the input layer and the output layer, wherein at least one hidden layer accepts a set of weighted inputs and produces an output through an activation function.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016]
[0017]
[0018]
[0019]
DETAILED DESCRIPTION
[0020]
[0021] Server(s) 102 may be configured by machine-readable instructions 106. Machine-readable instructions 106 may include one or more instruction modules. The instruction modules may include computer program modules. The instruction modules may include one or more of input code receiving module 108, obfuscation algorithm applying module 110, code portion storing module 112, and/or other instruction modules.
[0022] Input code receiving module 108 may be configured to receive input code having code functions including function values. Input code can be stored in and received from electronic storage 116, from a client platform 104, or from any other device. The term “received”, as used herein with respect to the input code, means that the server 102 or other device has access to the input code and does not necessarily require that the input code be transmitted from an external device.
[0023] Obfuscation algorithm applying module 110 may be configured to select a code function of the input code and apply an obfuscation algorithm to the selected code function to thereby create an obfuscated code portion having at least one obfuscated value that is different from at least one function value of the code portion. The obfuscation algorithm is executed by a neural network configured to approximate an obfuscation function, the obfuscation function yielding a function output for each function input, wherein for at least one function input, a corresponding additional output of the neural network does not correspond to the function output for the at least one input to thereby represent an expanded version of the function. The outputs of the obfuscation algorithm approximate the function outputs of the obfuscation function within a predetermined range for a selected set of inputs. The obfuscated code portion, when executed by a computer processor, may have substantially the same function as the selected code function. The neural network may be configured by, and executed on, neural network configurator platform 114, which is described in greater detail below.
[0024] Code portion storing module 112 may be configured to store the obfuscated code portion, and other code portions of the input code, on non-transient computer media to create obfuscated code having substantially the same function as the input code. The obfuscated code can be stored in electronic storage 116, client platform 104, or in any other memory appropriate for the specific implementation.
[0025] In some implementations, the neural network may be trained by neural network configurator 114 with a training set of input/function output pairs that correspond to the obfuscation function and at least one additional input/additional output pair that does not correspond to the obfuscation function. In some implementations, the at least one additional input/additional output pair may be outside of a predetermined range of the set of function inputs. In some implementations, the inputs to the neural network may have x number of dimensions and the function input has y number of dimensions, where x is greater than y. In some implementations, the neural network may include an input layer, an output layer, and at least one hidden layer between the input layer and the output layer. In some implementations, at least one hidden layer may accept a set of weighted inputs and produce an output through an activation function. Examples of training sets and the operation of neural network configurator 114 are set forth below.
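The hidden-layer behavior described above, a set of weighted inputs producing an output through an activation function, can be sketched in Python. This is an illustrative example only: the layer sizes, weights, biases, and the choice of tanh as the activation are arbitrary and are not taken from the disclosure.

```python
import math

def hidden_layer(inputs, weights, biases):
    """One hidden layer: each node computes a weighted sum of the
    inputs plus a bias, then passes it through a tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Two inputs feeding a three-node hidden layer (weights chosen arbitrarily).
out = hidden_layer([1.0, -2.0],
                   [[0.5, 0.1], [-0.3, 0.8], [1.2, 0.0]],
                   [0.0, 0.1, -0.5])
```

Stacking such layers (each consuming the previous layer's outputs) yields the multi-layer networks discussed below; tanh could equally be replaced by another non-linear activation.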
[0026] In some implementations, server(s) 102, client computing platform(s) 104, and/or neural network configurator 114 may be operatively linked via one or more electronic communication links. For example, such electronic communication links may be established, at least in part, via a network such as the Internet and/or other networks. It will be appreciated that this is not intended to be limiting, and that the scope of this disclosure includes implementations in which server(s) 102, client computing platform(s) 104, and/or neural network configurator 114 may be operatively linked via some other communication media.
[0027] A given client computing platform 104 may include one or more processors configured to execute computer program modules. The computer program modules may be configured to enable an expert or user associated with the given client computing platform 104 to interface with system 100 and/or neural network configurator 114, and/or provide other functionality attributed herein to client computing platform(s) 104. By way of non-limiting example, the given client computing platform 104 may include one or more of a desktop computer, a laptop computer, a handheld computer, a tablet computing platform, a NetBook, a Smartphone, a gaming console, and/or other computing platforms.
[0028] Neural network configurator 114 may include sources of information outside of system 100, external entities participating with system 100, and/or other resources. In some implementations, some or all of the functionality attributed herein to neural network configurator 114 may be provided by resources included in system 100.
[0029] Server(s) 102 may include electronic storage 116, one or more processors 118, and/or other components. Server(s) 102 may include communication lines, or ports to enable the exchange of information with a network and/or other computing platforms. Illustration of server(s) 102 in
[0030] Electronic storage 116 may comprise non-transitory storage media that electronically stores information. The electronic storage media of electronic storage 116 may include one or both of system storage that is provided integrally (i.e., substantially non-removable) with server(s) 102 and/or removable storage that is removably connectable to server(s) 102 via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 116 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage 116 may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). Electronic storage 116 may store software algorithms, information determined by processor(s) 118, information received from server(s) 102, information received from client computing platform(s) 104, and/or other information that enables server(s) 102 to function as described herein.
[0031] Processor(s) 118 may be configured to provide information processing capabilities in server(s) 102. As such, processor(s) 118 may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Although processor(s) 118 is shown in
[0032] It should be appreciated that although modules 108, 110, and/or 112 are illustrated in
[0033]
[0034] In some implementations, method 200 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 200 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 200. For example, method 200 can be implemented by system 100 of
[0035] An operation 202 may include receiving input code. Operation 202 may be performed by one or more hardware processors configured by machine-readable instructions including a module that is the same as or similar to input code receiving module 108, in accordance with one or more implementations.
[0036] An operation 204 may include applying an obfuscation algorithm to at least a portion of a selected code function of the input code to thereby create an obfuscated code portion having at least one obfuscated value that is different from the at least one function value. The obfuscation algorithm can approximate a selected obfuscation function. The obfuscated code portion, when executed by a computer processor, may have substantially the same function as the selected code function. Operation 204 may be performed by one or more hardware processors configured by machine-readable instructions including a module that is the same as or similar to obfuscation algorithm applying module 110, in accordance with one or more implementations.
[0037] An operation 206 may include storing the obfuscated code portion on non-transient computer media to create obfuscated code having substantially the same function as the input code. Operation 206 may be performed by one or more hardware processors configured by machine-readable instructions including a module that is the same as or similar to code portion storing module 112, in accordance with one or more implementations.
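Operations 202, 204, and 206 can be summarized as a receive/apply/store pipeline. The sketch below is hypothetical: the function names `method_200` and `toy_obfuscate` are illustrative, and the lookup-table "obfuscator" merely stands in for the neural-network-based algorithm described above so that the pipeline shape is visible.

```python
def method_200(code_functions, obfuscate, storage):
    """Sketch of method 200: receive (202), apply (204), store (206)."""
    received = code_functions                      # operation 202: access to input code
    obfuscated = {name: obfuscate(fn)              # operation 204: transform each
                  for name, fn in received.items()}  # selected code function
    storage.update(obfuscated)                     # operation 206: persist result
    return storage

def toy_obfuscate(fn):
    # Hypothetical stand-in obfuscator: the stored code no longer exposes
    # the original function body but behaves substantially the same.
    table = {x: fn(x) for x in range(10)}          # precomputed lookup
    return table.__getitem__

store = method_200({"square": lambda x: x * x}, toy_obfuscate, {})
```

The stored entry still maps inputs to the same outputs as the original function over its domain, which is the "substantially the same function" property the claims require.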
[0038]
TABLE-US-00001 (2004, 4648) (7081, 9753) (1152, 4479) (9313, 4470) (0005, 1024)
[0039] The first four pairs above were randomly selected. The fifth pair corresponds to the transformation function, f(5)=1024. The training data can be generated in any manner to achieve the desired operation of the neural network based on conventional techniques. However, it will become apparent that the training data can be poor, incomplete, or otherwise designed to exploit previously deemed negative characteristics of neural networks in a novel manner. At operation 306, a neural network is trained with the training data set. The trained neural network in this example will yield outputs that are seemingly random for most inputs. However, the input 5 will yield the correct output of 1024. An attacker attempting to reverse engineer the transformation function will find the results of brute force inputs to be very confusing. At operation 308, the trained neural network is tested, by simulating inputs, to ensure that it operates in a desired manner.
[0040] Some examples of the application of neural networks to code transformations and training sets for such neural networks are set forth below. Assume code that implements a password check. A simple example is the following:
TABLE-US-00002 guess = get_user_input( ); if (guess == 1234) // access is granted
[0041] An attacker that has access to this code, a whitebox scenario, will simply enter the value 1234 and will gain access to the system. To prevent this, the 1234 value must be kept secret, so typically the password checking program uses a random oracle function R and stores ρ=R(1234). The password checking is then changed to:
TABLE-US-00003 guess = get_user_input( ); if (R(guess) == ρ) // access is granted
[0042] The attacker will now have to find y such that R(y) == ρ. Depending on the degree of confidentiality (robustness) required, the function R should be difficult to invert. Cryptographically strong hash functions could be used to implement R for strong robustness but, as long as the function R is difficult to understand even when the code is accessible to the attacker, a degree of security has been achieved.
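As one concrete instance of the hash-based option just mentioned, R can be implemented with a cryptographic hash so that only ρ, not the password itself, appears in the stored code. The sketch below uses SHA-256 from Python's standard library; the function names and the use of 1234 follow the running example, but this is an illustration of the approach, not the disclosed neural-network transformation.

```python
import hashlib

def R(x):
    """One possible hard-to-invert R: a cryptographic hash of the input."""
    return hashlib.sha256(str(x).encode()).hexdigest()

rho = R(1234)  # stored in the program instead of the raw password

def check(guess):
    # The code now compares R(guess) to rho; the literal 1234 never appears.
    return R(guess) == rho
```

An attacker reading this code sees only the digest ρ and must invert the hash to recover a working input, which is the robustness property the passage describes.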
[0043] The pseudo code below illustrates the functionality of the function R without any attempt to obfuscate:
TABLE-US-00004
int R(int x) {
    if (x == 1234) return 1;
    else return 0;
}
[0044] We now wish to obfuscate the function R using a neural network. For any function, there is guaranteed to be a neural network so that for every possible input, x, the value f(x) (or some close approximation) is output from the neural network. As noted above, a training dataset can consist of possible inputs to the function R above and the corresponding outputs. In this example, the training dataset will be a set of pairs {(<input1>, <label1>), . . . , (<inputN>, <labelN>)}. The training data could be something like: {(1000, 0), (1001, 0), . . . , (1234, 1), . . . , (2000, 0)}. After training the neural network, as described in greater detail below, the ‘predict’ functionality of the neural network can be used in the manner indicated below:
TABLE-US-00005 guess = get_user_input( ); if (NN.predict(guess) == ρ)
[0045] The neural network can be imported, i.e., accessed through an application programming interface (API) using any one of many known frameworks, such as the Microsoft .Net™ framework. The neural network, as far as program execution is concerned, appears as a conventional external library. In this example, many inputs will result in the output of “0” since much of the training data had an output of “0” over a wide range of inputs. Generally, the outputs will appear random to a potential attacker and will not present patterns that can be ascertained in any pragmatic manner through reverse engineering. If the correct input is entered as the parameter guess, the return value will be ρ and access will be granted in the example above.
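The training set for R described above, label 1 only for the secret input and 0 across the surrounding range, can be generated programmatically. This is a minimal sketch; the secret value and range bounds are taken from the running example, while the function name `make_training_data` is illustrative.

```python
def make_training_data(secret=1234, lo=1000, hi=2000):
    """Pairs (input, label) matching the example training set
    {(1000, 0), (1001, 0), . . . , (1234, 1), . . . , (2000, 0)}:
    label 1 only for the secret input, 0 everywhere else."""
    return [(x, 1 if x == secret else 0) for x in range(lo, hi + 1)]

data = make_training_data()
```

A network trained on such data outputs 0 for almost every probe, which is why brute-force inputs reveal so little pattern to an attacker.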
[0046]
[0047] The secured code can be stored in electronic storage 416 or another storage device. As a result of the transforms applied in the manner described above, the secured code will have references to the neural network. During execution of the secured code, neural network API module 411 of execution platform 402 will make an API request 420 to code API module 412 in neural network execution platform 414 (which can be neural network configurator platform 114 of
[0048] The API request 420 can be in any known format and can include input data from the transformed code. Code API module 412 will then query neural network 416 with the input data and will retrieve an output in accordance with the logic of neural network 416. The output will be sent as response 422 to neural network API module 411 of execution platform 402 and used to continue execution of the secured code. In a sense, the neural network behaves like a look-up table or other function (an input is correlated to an output). However, the unique “negative” characteristics of neural networks can be leveraged and manipulated to provide improved obfuscation of code.
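The request/response exchange between modules 411, 412, and the neural network can be sketched as follows. The class names are hypothetical stand-ins for the numbered modules, and the "network" is modeled as the lookup-table-like behavior the passage describes (an input correlated to an output), rather than an actual trained model.

```python
class LookupNetwork:
    """Stand-in for trained neural network 416: behaves like a look-up
    table, correlating each input to an output."""
    def __init__(self, table, default=0):
        self.table, self.default = table, default

    def predict(self, x):
        return self.table.get(x, self.default)

class CodeAPIModule:
    """Stand-in for code API module 412: receives API request 420,
    queries the network, and returns response 422."""
    def __init__(self, network):
        self.network = network

    def handle_request(self, input_data):
        return self.network.predict(input_data)   # becomes response 422

# Execution platform 402 (module 411) would issue requests like:
api = CodeAPIModule(LookupNetwork({1234: 1}))
```

In a real deployment the query would cross a network boundary (e.g., as a serialized API call), but the flow of input data in and prediction out is the same.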
[0049] As discussed above, the more complex a neural network, the more difficult it is to explain the decisions it makes with respect to a given input. Disclosed implementations can use neural network complexity metrics as code obfuscation robustness metrics. In other words, the difficulty of explaining a decision/prediction made by a neural network can be thought of as the obfuscation's “virtual blackbox” property. As noted above, an obfuscated code program (protected code) is semantically equivalent to the input program. Preferably, it is, at most, polynomially bigger or slower than the input program. Also, it should be as hard to analyze and de-obfuscate as a blackbox version of the program. Therefore, the more complex a trained neural network is, the harder it is to analyze and de-obfuscate the code protected by the neural network.
[0050] As shown above, training sets can be generated to embed the desired transformation function, or an approximation thereof, inside one or more other unrelated functions, Aux. For the example function R, training data corresponding to a secondary irrelevant function, such as input/output pair(s) corresponding to the secondary function, can be added to the training set. The training set may look like: {(0, Aux(0)), (1, Aux(1)), . . . , (1000, 0), . . . , (1234, 1), . . . , (2000, 0), (2001, Aux(2001)), . . . }. This renders the function R very difficult to reverse engineer, including being robust against model inference attacks. The use of complex functions, such as polynomials, renders the transformation even more secure. Several functions instead of one, such as Aux1, Aux2, . . . , can be used to better hide the original function (R in this example).
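The embedded training set just described can be generated as follows. The polynomial chosen for Aux is a hypothetical example of the "complex functions, such as polynomials" mentioned above; the ranges follow the running example.

```python
def R(x):
    """The example password-check function: 1 only for the secret input."""
    return 1 if x == 1234 else 0

def Aux(x):
    # Hypothetical unrelated polynomial used to pad the training set.
    return 3 * x * x + 7 * x + 2

def embedded_training_set():
    """R's pairs over [1000, 2000]; Aux's pairs everywhere else in [0, 3000],
    hiding R's behavior inside an unrelated function's input/output pairs."""
    return [(x, R(x) if 1000 <= x <= 2000 else Aux(x))
            for x in range(0, 3001)]

data = embedded_training_set()
```

An attacker probing the trained network outside [1000, 2000] sees only the polynomial's smooth behavior, which gives no hint of the embedded check; several Aux functions over different sub-ranges would hide R further still.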
[0051] Note that less complex functions, such as linear functions, can be used as Aux functions, at the cost of a larger generalization error of the neural network. The choice is between perfect overfitting/bad generalization or less overfitting/better generalization. The neural network can be designed to yield an output that is close enough, e.g., within a predetermined threshold, to work as intended but not reveal the induced/expanded function to an attacker. Further, the complexity of the neural network, and thus the security of the transformation, can be increased by one or more of the following: [0052] increasing the quantity of hidden layers; [0053] increasing the quantity of nodes in each hidden layer; [0054] increasing the number of (irrelevant) attributes/dimensions (for the example R, change {(1000, 0), (1001, 0), . . . , (1234, 1), . . . , (2000, 0)} to the following: {(1000, 1201, 332, red, ocean, 12.001, 0), (1001, 3110, 32, blue, park, 76.1, 0), (1234, 543, 7761, red, house, 9.8, 1), . . . }); [0055] using non-linear activation functions.
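The dimension-expansion technique of paragraph [0054], padding each (input, label) pair with irrelevant attributes so the network's inputs have more dimensions than the function input, can be sketched as below. The helper name and the choice of random integers as padding are illustrative; per claims 5 and 6, only the real input dimension is used by the obfuscated code.

```python
import random

def expand_dimensions(pair, n_extra, seed):
    """Pad an (input, label) pair with irrelevant attributes, keeping the
    real function input first and the label last, as in the example
    row (1234, 543, 7761, red, house, 9.8, 1)."""
    rng = random.Random(seed)      # seeded so padding is reproducible
    x, label = pair
    extras = [rng.randint(0, 9999) for _ in range(n_extra)]
    return (x, *extras, label)

row = expand_dimensions((1234, 1), n_extra=5, seed=42)
```

Here the network's input has x = 6 dimensions while the function input has y = 1, satisfying x greater than y; the padding dimensions exist only to inflate the trained model's complexity.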
[0056] As demonstrated above, previously deemed undesirable characteristics of neural networks can be leveraged and exploited to obfuscate code functions and thus create more secure code and computing systems executing the code.
[0057] Although the present technology has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the technology is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present technology contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.