FEATURE ESTIMATING DEVICE, FEATURE ESTIMATING METHOD AND COMPUTER-READABLE MEDIUM
20220269490 · 2022-08-25
Abstract
A feature estimating device includes a parser identifying means for identifying a parser, the parser being contained in software, for parsing user input and executing a relevant command, a dividing means for extracting commands from a character string in the parser and clustering control flows connecting with the extracted commands as starting points to divide a code of the software for each feature, and a feature estimating means for estimating, based on a characteristic part of each divided code, a feature for each divided code.
Claims
1. A feature estimating device comprising: hardware, including at least one processor and memory; a parser identifying unit, implemented by the hardware, configured to identify a parser, the parser being contained in software, for parsing user input and executing a relevant command; a dividing unit, implemented by the hardware, configured to extract commands from a character string in the parser and cluster control flows connecting with the extracted commands as starting points to divide a code of the software for each feature; and a feature estimating unit, implemented by the hardware, configured to estimate, based on a characteristic part of each divided code, a feature for each divided code.
2. The feature estimating device according to claim 1, wherein the divided code of a feature corresponding to a first command, the first command being one of the commands, contains a code of all subcommands through which control flows are traceable downstream from the first command as a starting point.
3. The feature estimating device according to claim 1, wherein the characteristic part contains at least one of a name of a command, a name of a function corresponding to the command, a user-defined function called inside the function, a library function, an API, a system call, and an instruction contained in the divided code.
4. The feature estimating device according to claim 1, wherein the feature estimating unit takes information about the API to be called when the divided code is executed into consideration in the estimation of a feature of the software.
5. The feature estimating device according to claim 1, wherein the feature estimating unit takes information about a state of file access when the divided code is executed into consideration in the estimation of a feature of the software.
6. A feature estimating method comprising the steps of: identifying a parser, the parser being contained in software, for parsing user input and executing a relevant command; extracting commands from a character string in the parser and clustering control flows connecting with the extracted commands as starting points to divide a code of the software for each feature; and estimating, based on a characteristic part of each divided code, a feature for each divided code.
7. A non-transitory computer-readable medium storing a program causing a computer to execute the steps of: identifying a parser, the parser being contained in software, for parsing user input and executing a relevant command; extracting commands from a character string in the parser and clustering control flows connecting with the extracted commands as starting points to divide a code of the software for each feature; and estimating, based on a characteristic part of each divided code, a feature for each divided code.
Description
DESCRIPTION OF EMBODIMENTS
[0016] Hereinafter, example embodiments of the present invention will be described with reference to the drawings. The following description and the drawings are appropriately omitted or simplified to clarify the explanation. In the drawings, the same elements are denoted by the same reference signs, and duplicated descriptions are omitted as necessary.
First Example Embodiment
[0017] A first example embodiment will be described below.
[0018]
[0019] The parser identifying means 11 identifies a parser contained in software. Here, a parser is for parsing user input and executing a relevant command. The dividing means 12 extracts commands from a character string in the parser and clusters control flows connecting with the extracted commands as starting points to divide a code of the software for each feature. The feature estimating means 13 estimates a feature for each divided code based on a characteristic part of each divided code.
[0020] With this configuration, features contained in software can be estimated from its code, so that the code of the software can be inspected accurately and easily for each feature.
Second Example Embodiment
[0021] A second example embodiment will be described below.
[0022] First, a configuration example of a feature estimating device according to the second example embodiment is described.
[0023] The parser identifying means 111 identifies a parser that is contained in software and that parses user input and executes a relevant command. Note that the parser may be identified in a code of software by a known method, for example, by searching for a characteristic function used in the parser. The dividing means 112 extracts commands from a character string in the parser and clusters control flows connecting with the extracted commands as starting points to divide a code of the software for each feature. The feature estimating means 113 estimates, based on a characteristic part of each divided code, a feature for each divided code. The feature labeling means 114 labels each divided code with the feature estimated by the feature estimating means 113.
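The search for a characteristic function mentioned above can be illustrated with a minimal sketch. It assumes that the software has already been reduced to a call graph (a mapping from each function to the functions it calls) and that string-comparison and option-parsing functions are typical of a parser. The set `CHARACTERISTIC_FUNCTIONS`, the threshold `min_hits`, and all function names in the sample graph are hypothetical and are not taken from the description.

```python
from typing import Dict, List, Set

# Hypothetical set of functions that a command parser tends to call.
# This list is an assumption for illustration, not part of the description.
CHARACTERISTIC_FUNCTIONS: Set[str] = {"strcmp", "strncmp", "getopt", "strtok"}


def identify_parser_candidates(
    call_graph: Dict[str, List[str]], min_hits: int = 2
) -> List[str]:
    """Return functions that call characteristic functions at least
    min_hits times, ordered from most to fewest such calls."""
    scores: Dict[str, int] = {}
    for func, callees in call_graph.items():
        hits = sum(1 for callee in callees if callee in CHARACTERISTIC_FUNCTIONS)
        if hits >= min_hits:
            scores[func] = hits
    # Highest score first; ties are broken by name so the order is stable.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical call graph of a small command-line program.
call_graph = {
    "main": ["read_line", "dispatch"],
    "dispatch": ["strcmp", "strcmp", "strcmp", "cmd_show", "cmd_set"],
    "cmd_show": ["printf"],
    "cmd_set": ["strcmp", "write_config"],
}
candidates = identify_parser_candidates(call_graph)
print(candidates)
```

In this sample, only `dispatch` reaches the threshold. A real analysis would likely combine such a count with other evidence, such as the presence of command strings in the function.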
[0024] In the above description, the control flows are clustered using the extracted commands as the starting points. Alternatively, control flows that are enabled by settings may be clustered using setting items of the software as starting points instead of the commands. In this case, the parser parses setting information and enables or disables the control flows depending on the settings.
[0025] Next, a procedure of processing of estimating a feature for each divided code in the feature estimating device 110 is described. Note that,
[0026]
[0027]
[0028]
[0029] In the example shown in
[0030] That is, control flows from the parser extend to upper commands (parent commands). Then, control flows extend from each upper command to first layer subcommands (child commands), and further extend from the first layer subcommands to second layer subcommands (grandchild commands). Except for a special case, it is assumed that, in general software, control flows extending from a parser connect with downstream subcommands through upper commands as described above.
[0031] Control flows that can be traced from an upper command are clustered as a cluster of the upper command. In the example shown in
[0032] A divided code of a feature corresponding to a first command, which is one of the upper commands, contains the codes of all the subcommands through which control flows traceable downstream from the first command as a starting point pass. In other words, the code of the first command and the codes of the subcommands through which the control flows clustered as a cluster of the first command pass are treated as the divided code of one feature contained in the software.
[0033] That is, in the example shown in
[0034] In this manner, by clustering control flows connecting with upper commands as starting points, it is possible to identify the range of each feature contained in software (in the example shown in
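The clustering described in paragraphs [0030] to [0034] can be sketched as a reachability search over a control-flow graph. The sketch below assumes the graph is given as a mapping from each node to its successors and that the upper commands are the direct successors of the parser. The function name `cluster_control_flows` and the command names in the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def cluster_control_flows(
    control_flow: Dict[str, List[str]], parser: str
) -> Dict[str, Set[str]]:
    """Map each upper command (a direct successor of the parser) to the set
    of nodes that can be traced downstream from it, including itself."""
    clusters: Dict[str, Set[str]] = {}
    for upper in control_flow.get(parser, []):
        reached = {upper}
        queue = deque([upper])
        while queue:
            node = queue.popleft()
            for nxt in control_flow.get(node, []):
                # Skip visited nodes and never walk back into the parser.
                if nxt not in reached and nxt != parser:
                    reached.add(nxt)
                    queue.append(nxt)
        clusters[upper] = reached
    return clusters


# Hypothetical graph: parser -> upper commands -> subcommands.
control_flow = {
    "parser": ["show", "set"],
    "show": ["show_status", "show_log"],
    "show_log": ["show_log_tail"],
    "set": ["set_time"],
}
clusters = cluster_control_flows(control_flow, "parser")
for upper, nodes in clusters.items():
    print(upper, sorted(nodes))
```

Each resulting set corresponds to the divided code of one feature: the upper command plus every child and grandchild subcommand reachable from it. Code shared between two upper commands would appear in both clusters in this sketch; how such overlap is handled is not specified in the description.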
[0035]
[0036] In the estimation of a feature of the software in step S104 of
[0037] From the above, since the feature estimating device 110 divides a code on the basis of upper commands whose control flows connect directly with a parser, the dividing accuracy of features is high. Here, high dividing accuracy means that all codes related to a feature are contained and that no code unrelated to the feature is contained. Because the dividing accuracy of features is high, the feature estimating device 110 can estimate features contained in software from its code, so that the code of the software can be inspected accurately and easily for each feature.
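As an illustration of estimating and labeling a feature from a characteristic part, the following minimal sketch assumes that the characteristic part of each divided code is a list of the API, library function, or system call names found in it, and that a feature is chosen by simple voting against a lookup table. The table `FEATURE_HINTS`, the feature labels, and the sample command names are hypothetical; the description does not specify how the estimation is performed.

```python
from typing import Dict, List

# Hypothetical lookup table from function or system call names to feature labels.
FEATURE_HINTS: Dict[str, str] = {
    "socket": "network communication",
    "connect": "network communication",
    "fopen": "file access",
    "fwrite": "file access",
    "execve": "process execution",
}


def estimate_feature(characteristic_part: List[str]) -> str:
    """Return the label supported by the most names, or 'unknown' if no
    name in the characteristic part appears in the lookup table."""
    votes: Dict[str, int] = {}
    for name in characteristic_part:
        label = FEATURE_HINTS.get(name)
        if label is not None:
            votes[label] = votes.get(label, 0) + 1
    if not votes:
        return "unknown"
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(votes), key=lambda label: votes[label])


def label_divided_codes(divided: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each divided code (keyed by its upper command) with a feature."""
    return {command: estimate_feature(names) for command, names in divided.items()}


# Hypothetical characteristic parts of three divided codes.
divided = {
    "upload": ["fopen", "socket", "connect"],
    "save": ["fopen", "fwrite"],
    "help": ["printf"],
}
labels = label_divided_codes(divided)
print(labels)
```

A divided code whose names match nothing in the table is labeled `unknown` rather than guessed, which keeps unrecognized code visible for manual inspection.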
[0038] In the above example embodiments, the present invention is described as a hardware configuration, but the present invention is not limited thereto. The present invention can be achieved by a central processing unit (CPU) executing a program.
[0039] The program for performing the processing of estimating a feature for each divided code can be stored in various types of non-transitory computer-readable media and provided to a computer. Non-transitory computer-readable media include any type of tangible storage media. Examples of non-transitory computer-readable media include magnetic storage media (such as flexible disks, magnetic tapes, and hard disk drives), magneto-optical storage media (such as magneto-optical disks), Compact Disc Read Only Memory (CD-ROM), CD-R, CD-R/W, and semiconductor memories (such as mask ROM, Programmable ROM (PROM), Erasable PROM (EPROM), flash ROM, and Random Access Memory (RAM)). The program may also be provided to a computer using any type of transitory computer-readable media. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer-readable media can provide the program to a computer via a wired communication line (such as electric wires and optical fibers) or a wireless communication line.
[0040] The present invention has been described above with reference to the example embodiments but is not limited by the above. Various modifications that can be understood by those skilled in the art can be made to the configurations and the details of the present invention without departing from the scope of the invention. The software to be analyzed in the above example embodiments may be executable codes or source codes. In addition, the software to be analyzed may be a single code or codes containing a library loaded from the outside.
REFERENCE SIGNS LIST
[0041] 10, 110 Feature estimating device
[0042] 11, 111 Parser identifying means
[0043] 12, 112 Dividing means
[0044] 13, 113 Feature estimating means
[0045] 114 Feature labeling means