System and method for static detection and categorization of information-flow downgraders
10742666 ยท 2020-08-11
Assignee
Inventors
- Yinnon Haviv (Herzliya, IL)
- Roee Hay (Haifa, IL)
- Marco Pistoia (Hawthorne, NY)
- Guy Podjarny (London, GB)
- Adi Sharabani (Herzliya, IL)
- Takaaki Tateishi (Yamato, JP)
- Omer Tripp (Herzliya, IL)
- Omri Weisman (Tel Aviv, IL)
Cpc classification
G06F11/3608
PHYSICS
G06F21/6218
PHYSICS
International classification
G06F11/36
PHYSICS
G06F21/62
PHYSICS
Abstract
A system and method for static detection and categorization of information-flow downgraders includes transforming a program stored in a memory device by statically analyzing program variables to yield a single assignment to each variable in an instruction set. The instruction set is translated to production rules with string operations. A context-free grammar is generated from the production rules to identify a finite set of strings. An information-flow downgrader function is identified by checking the finite set of strings against one or more function specifications.
Claims
1. A method for static detection and categorization of information-flow downgraders, comprising: statically analyzing program variables to yield a single assignment for each variable in an instruction set; translating the instruction set to production rules with string operations to identify a finite set of strings; generating a context-free grammar from the production rules; and preventing illicit flow of input information by comparing the finite set of strings with one or more function specifications to identify an information flow downgrader function, the information-flow downgrader function downgrading the input information to enable high input information to flow to low program points by endorsing integrity and declassifying confidentiality of the input information.
2. The method as recited in claim 1, wherein identifying includes detecting and categorizing the downgrader functions based upon a purpose the downgrader function.
3. The method as recited in claim 1, wherein the one or more functions include a security-sensitive function in the program.
4. The method as recited in claim 1, further comprising comparing the context free grammar with a specification of the security-sensitive function such that if the grammar satisfies the specification, the input is considered properly downgraded.
5. The method as recited in claim 4, further comprising labeling a string to locate string-manipulating functions that modified the input and made the input specification-compliant.
6. The method as recited in claim 1, wherein one or more function specifications are employed to categorize the downgrader function.
7. The method as recited in claim 1, wherein transforming the program includes transforming the program by employing pseudo notations for program variable assignments.
8. The method as recited in claim 1, wherein the downgrader function is generated by a Web application.
9. A computer readable storage medium comprising a computer readable program for static detection and categorization of information-flow downgraders, wherein the computer readable program when executed on a computer causes the computer to perform the steps of: statically analyzing program variables to yield a single assignment to each variable in an instruction set; translating the instruction set to production rules with string operations; generating a context-free grammar from the production rules to identify a finite set of strings; and preventing illicit flow of input information by comparing the finite set of strings with one or more function specifications to identify an information flow downgrader function, the information-flow downgrader function enabling high input information to flow to low program points by endorsing integrity and declassifying confidentiality of the input information.
10. The computer readable storage medium as recited in claim 9, wherein identifying includes detecting and categorizing the downgrader functions based upon a purpose the downgrader function.
11. The computer readable storage medium as recited in claim 9, wherein the one or more functions include a security-sensitive function in the program.
12. The computer readable storage medium as recited in claim 9, further comprising comparing the context free grammar with a specification of the security-sensitive function such that if the grammar satisfies the specification the input is considered properly downgraded.
13. The computer readable storage medium as recited in claim 12, further comprising labeling a string to locate string-manipulating functions that modified the input and made the input specification-compliant.
14. The computer readable storage medium as recited in claim 9, wherein the one or more function specifications are employed to categorize the downgrader.
15. The computer readable storage medium as recited in claim 9, wherein transforming the program includes transforming the program by employing pseudo notations for program variable assignments.
16. A system for static detection and categorization of information-flow downgraders, comprising: a program storage device configured to store a program, the program storage device further configured to work in conjunction with a processor to execute program instructions to detect and categorize information-flow downgraders in the program; a static analysis framework configured to analyze an application program and to perform a static string assignment on the application program to transform program variables to yield a single assignment for each variable in an instruction set, the framework configured to translate the instruction set to production rules with string operations and generate a context-free grammar from the production rules to identify a finite set of strings; and a comparison module configured to prevent illicit flow of input information by comparing the finite set of strings with one or more function specifications to identify an information flow downgrader function, the information-flow downgrader function downgrading the input information to enable high input information to flow to low program points by endorsing integrity and declassifying confidentiality of the input information.
17. The system as recited in claim 16, wherein downgrader functions are categorized based upon a purpose of the downgrader function.
18. The system as recited in claim 16, wherein the one or more functions include a security-sensitive function in the program.
19. The system as recited in claim 16, wherein the comparison module compares the context free grammar with a specification of a security-sensitive function such that if the grammar satisfies the specification the input is considered properly downgraded.
20. The system as recited in claim 19, a labeler configured to label a string to locate string-manipulating functions that modified the input and made the input specification-compliant.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
(8) Embodiments in accordance with the present principles employ static string analysis for automatic detection and categorization of information-flow downgraders. The use of static string analysis identifies downgraders and categorizes them based on their purposes. In one illustrative embodiment, the analysis proceeds as follows. For each security-sensitive function in the program, we use string analysis to detect the grammar(s) of the string input(s) to the function. We then compare that grammar with a specification of the security-sensitive function. If the grammar satisfies the specification, it implies that the input was properly downgraded. In that case, using the labeling feature of the string analysis, it is possible to locate the string-manipulating functions that modified the input and made it specification-compliant. Those functions constitute a downgrader for the security-sensitive function. Furthermore, the specification of the security-sensitive function can be used to categorize the newly discovered downgrader, which is one important feature given that a downgrader for a security-sensitive function may not work for another security-sensitive function.
(9) As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a circuit, module or system. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
(10) Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
(11) A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
(12) Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
(13) Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the C programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
(14) Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
(15) These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
(16) The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
(17) The flowchart and block diagrams in the FIGS. illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
(18) Referring now to the drawings in which like numerals represent the same or similar elements and initially to
(19) Let us consider the following Java program, which append a to the string assigned to the variable a three times after initializing it with a.
(20) TABLE-US-00002 String a = a; for (int i = 0; i < 3; i++) a = a + a; String r = a;
(21) We obtain the following CFG by translating every program variable v to a nonterminal S.sub.v and .fwdarw. to as in production rules, where we simply consider the string concatenation by + as the string concatenation on the CFG. S.sub.a.fwdarw.a S.sub.a.fwdarw.S.sub.aa S.sub.r.fwdarw.S.sub.a
(22) For example, the CFG with start symbol S.sub.a represents a set of possible string values assigned to the program variable a, which yields the set of strings {a,aa,aaa,aaaa, . . . }, and likewise S.sub.r represents a set of possible string values assigned to the program variable r. It contains strings that are never assigned to the variables a and r, since our string analysis completely ignores the condition of the for statement as of now.
(23) In block 104, when we have a program that uses predefined string operations such as String.substring as shown in the following program, we use a sound approximation for every string operation to translate a CFG to a CFG.
(24) TABLE-US-00003 String a = xxa; for (int i=0; i<3; i++) a = a + a; String r = a.substring(2);
(25) Soundness means that a resulting CFG computed by the string analysis contains all the actual strings arising at runtime. Soundness may be formally defined as follows; f is a sound approximation for a string operation f iff Sf (S) where S=(s|s=f(s), sS). One of the methods to approximate predefined string operations is to use a transducer which is an automaton with output. It is well known that the image of a transducer is also a CFG. Other methods are homomorphisms on (,+) where is a set of characters and + denotes concatenation functions that always return the same CFG yielding all the possible strings returned by corresponding predefined string operations, and so on. The following production rules with the approximated string operation substring (_,2) are the ones obtained from the program above. S.sub.a.fwdarw.xxa S.sub.a.fwdarw.S.sub.aa S.sub.r.fwdarw.substring(S.sub.a,2)
(26) Referring to
(27) A in
(28) In block 106, we have no string operation in the production rules. Thus, a context-free grammar consisting of the resulting production rules is the outcome of the string analysis.
(29) Referring to
(30) To describe the intra-procedural string analysis, let us consider a nappend method in Java as follows:
(31) TABLE-US-00004 public class MyClass { static public void main(String args[ ]) { String a = a; String b = b; String r = nappend(a, b, 3); } public void nappend(String x, String y, int n) { String r = null; if (n == 0) { r = x; } else { r = nappend(x + y, y, n1); } return r; } }
(32) In block 302, a translation of a program is made into Static Single Assignment (SSA) form, where pseudo notations are used for instructions. An example, translation is illustratively shown as follows:
(33) TABLE-US-00005 main(String) 2. a = a 3. b = b 4.r = nappend(a, b, 3) nappend(String) 1. b1 = n == 0 2. goto 6 if b1 3. v1 = x + y 4. r1 = nappend(v1, y, n1) 5.goto 8 6. r2 = x 7. goto 8 8. r = phi(r1,r2) 9. return r
(34) A call graph for this program is depicted in
(35) In block 304, the assignments in SSA form are translated to a set of production rules with string operations 306, except for conditional and unconditional jumps, in the same manner described above (
(36) For the inter-procedural string analysis, we extend the intra-procedural string analysis with the call graph information constructed by WALA, whose context-sensitivity can be flexibly controlled by known methods. We annotate every variable in the SSA program with a call graph node. We combine all the production rules after removing production rules translated from method invocations such as S.sub.r.sup.1.fwdarw.nappend(S.sub.v1,S.sub.y,n1). We introduce production rules representing dependencies between the parameters and the return value of a callee method and the variables of a caller method. For example, the following production rules are introduced if we have a context-insensitive call graph 400 as shown in
(37) A complete set of the production rules with string operations 306 obtained from the program includes: S.sub.a.sup.1.fwdarw.a S.sub.x.sup.2.fwdarw.S.sub.a.sup.1 S.sub.b.sup.1.fwdarw.b S.sub.y.sup.2.fwdarw.S.sub.b.sup.1 S.sub.v1.sup.2.fwdarw.S.sub.x.sup.2S.sub.y.sup.2 S.sub.r.sup.1.fwdarw.S.sub.r.sup.2 S.sub.r2.sup.2.fwdarw.S.sub.x.sup.2 S.sub.x.sup.2.fwdarw.S.sub.v1.sup.2 S.sub.r.sup.2.fwdarw.S.sub.r1.sup.2 S.sub.y.sup.2.fwdarw.S.sub.y.sup.2 S.sub.r.sup.2.fwdarw.S.sub.r2.sup.2 S.sub.r1.sup.2.fwdarw.S.sub.r.sup.2
(38) An optional pointer analysis may be performed that helps the string analyzer or solver 308 to identify how constant strings flow to variables across methods and to identify whether the same objects are assigned to different variables in potentially different methods, even if those objects are dynamically created. In block 310, we then obtain the following CFG that predicts possible strings assigned to the variable r in the main method, where the start symbol is S.sub.r.sup.1. S.sub.r.sup.1.fwdarw.a|S.sub.r.sup.1b
(39) Referring to
(40) In block 504, the instruction set is translated to production rules with string operations. In block 506, a pointer analysis is optionally performed on the production rules with string operations to improve precision. In block 508, a context-free grammar is generated from the production rules to identify a finite set of strings. A kleene-star operator may be employed to identify the finite set of strings.
(41) In block 510, an information-flow downgrader function is identified by checking the finite set of strings against one or more function specifications. The one or more functions preferably include a security-sensitive function in the program. This may include detecting and categorizing the downgrader functions based upon a purpose the downgrader function.
(42) In block 512, the context free grammar is preferably compared with a specification of a security-sensitive function such that if the grammar satisfies the specification, the input is considered properly downgraded. In block 514, a string is labeled to locate string-manipulating functions that modified an input and made the input specification-compliant to identify and categorize an information-flow downgrader function. The one or more function specifications are employed to categorize the downgrader function. The downgrader function may be generated by a Web application or any other entity that employs security levels for dealing with its network transactions.
(43) Referring to
(44) The comparison module 610 compares the context free grammar (CFG) with a specification 612 of a security-sensitive function such that if the grammar satisfies the specification the input is considered properly downgraded. A labeler 614 is configured to label a string to locate string-manipulating functions that modified the input and made the input specification-compliant. Those functions constitute a downgrader for the security-sensitive function. Furthermore, the specification of the security-sensitive function can be used to categorize the newly discovered downgrader, which is one important feature given that a downgrader for a security-sensitive function may not work for another security-sensitive function. The downgrader functions that are generated by an entity, such as, a Web application, secure network devices, access controlled devices, etc. are thereby identified and categorized.
(45) The program analyzed may be co-located with the system 600 or may be remotely disposed from the system 600. A program 603 may be analyzed for downgraders as provided over a network from a server or web application 620. The servers or web applications may be located in at a single location or may be distributed throughout a network 622.
(46) Having described preferred embodiments of a system and method for static detection and categorization of information-flow downgraders (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.