Profile generation device, attack detection device, profile generation method, and profile generation computer program
11470097 · 2022-10-11
Assignee
Inventors
- Shingo Orihara (Musashino, JP)
- Tohru Sato (Musashino, JP)
- Yohsuke Shimada (Musashino, JP)
- Yang Zhong (Musashino, JP)
- Yuta Iwaki (Musashino, JP)
Cpc classification
H04L67/02
ELECTRICITY
G06F21/55
PHYSICS
International classification
Abstract
A global profile generation unit acquires a profile including, as an entry, information on parameter values for a combination of path parts and parameter names included in a normal HTTP request to a web server. When entries, in which the path parts are different but the parameter names are the same, are present in the acquired profile, the global profile generation unit generates a global profile in which the entries of the parameter names are aggregated in the acquired profile.
Claims
1. A profile generation device that generates a profile indicating characteristics of a request to a web server, the request being a detection target, the profile generation device comprising: a memory; and a processor coupled to the memory and programmed to execute a process comprising: acquiring profile information including a combination of path parts and parameters included in a request that is learning data; and when a group of the acquired profile information includes a predetermined number or more of profile information in which the path parts are different but similarity between names of the parameters is equal to or more than a predetermined value, generating a profile in which the group of the profile information is aggregated, wherein the similarity is determined in accordance with a length Z of a longest common subsequence between a character class sequence of the profile, the character class sequence of the profile having a length X, and a character class sequence of the detection target, the character class sequence of the detection target having a length Y, such that the similarity S is given by
S=Z/(X+Y−Z), and wherein the character class sequence of the profile is a sequence in which characters constituting the parameters are classified into classes including as AL (a character class (alpha) representing an alphabet), NU (a character class (numeric) representing a numeral), or SY (a character class (symbol) representing a symbol), and classification results are arranged according to an arrangement of character strings.
2. The profile generation device according to claim 1, wherein, when the group of the acquired profile information includes a predetermined number or more of profile information groups in which the path parts are different but the names of the parameters are equal to each other, setting the profile information group, in which the names of the parameters are equal to each other, as an aggregation target.
3. The profile generation device according to claim 1, wherein, when aggregating a profile information group in which the similarity between the names of the parameters is equal to or more than the predetermined value, aggregating a profile information group in which similarity between parameter value information included in the profile information group is equal to or more than a predetermined value.
4. The profile generation device according to claim 3, wherein, when aggregating a profile information group in which the similarity between the names of the parameter is equal to or more than the predetermined value, setting, as a non-aggregation target, parameter value information in which the number of appearances or an appearance ratio of the parameter value information in the profile information group is smaller than the predetermined value.
5. The profile generation device according to claim 1, wherein, when aggregating a profile information group in which the similarity between the names of the parameter is equal to or more than the predetermined value, setting a wildcard as a path part in a profile information group to be aggregated in the profile.
6. The profile generation device according to claim 1, wherein the profile is information in which the path parts included in the request, the names of the parameters, and parameter value information of the parameters are correlated with one another.
7. An attack detection device that detects an attack by using a profile indicating characteristics of a request to a web server, the request being a detection target, the attack detection device comprising: a memory; and a processor coupled to the memory and programmed to execute a process comprising: acquiring profile information including a combination of path parts and parameters included in a request that is learning data; when a group of the acquired profile information includes a profile information group in which the path parts are different but similarity between names of the parameters is equal to or more than a predetermined value, generating a profile in which the profile information group is aggregated; acquiring a request that is an attack detection target; and detecting whether the request is an attack by comparing path parts and parameters in the request that is the attack detection target with the combination of the path parts and the parameters indicated in the profile, wherein the similarity is determined in accordance with a length Z of a longest common subsequence between a character class sequence of the profile, the character class sequence of the profile having a length X, and a character class sequence of the detection target, the character class sequence of the detection target having a length Y, such that the similarity S is given by
S=Z/(X+Y−Z), and wherein the character class sequence of the profile is a sequence in which characters constituting the parameters are classified into classes including as AL (a character class (alpha) representing an alphabet), NU (a character class (numeric) representing a numeral), or SY (a character class (symbol) representing a symbol), and classification results are arranged according to an arrangement of character strings.
8. A profile generation method using a profile generation device that generates a profile indicating characteristics of a request to a web server, the request being a detection target, the profile generation method comprising: acquiring profile information including a combination of path parts and parameters included in a request that is learning data; and generating, when a group of the acquired profile information includes a predetermined number or more of profile information in which the path parts are different but similarity between names of the parameters is equal to or more than a predetermined value, a profile in which the group of the profile information is aggregated, wherein the similarity is determined in accordance with a length Z of a longest common subsequence between a character class sequence of the profile, the character class sequence of the profile having a length X, and a character class sequence of the detection target, the character class sequence of the detection target having a length Y, such that the similarity S is given by
S=Z/(X+Y−Z), and wherein the character class sequence of the profile is a sequence in which characters constituting the parameters are classified into classes including as AL (a character class (alpha) representing an alphabet), NU (a character class (numeric) representing a numeral), or SY (a character class (symbol) representing a symbol), and classification results are arranged according to an arrangement of character strings.
9. A non-transitory computer-readable recording medium having stored a profile generation program that generates a profile indicating characteristics of a normal request to a web server, the request being a detection target, and causes a computer to execute a process comprising: acquiring profile information including a combination of path parts and parameters included in a request that is learning data; and generating, when a group of the acquired profile information includes a predetermined number or more of profile information in which the path parts are different but similarity between names of the parameters is equal to or more than a predetermined value, a profile in which the group of the profile information is aggregated, wherein the similarity is determined in accordance with a length Z of a longest common subsequence between a character class sequence of the profile, the character class sequence of the profile having a length X, and a character class sequence of the detection target, the character class sequence of the detection target having a length Y, such that the similarity S is given by
S=Z/(X+Y−Z), and wherein the character class sequence of the profile is a sequence in which characters constituting the parameters are classified into classes including as AL (a character class (alpha) representing an alphabet), NU (a character class (numeric) representing a numeral), or SY (a character class (symbol) representing a symbol), and classification results are arranged according to an arrangement of character strings.
Description
DESCRIPTION OF EMBODIMENTS
(14) Hereinafter, with reference to the drawings, embodiments of the present invention will be described by dividing them into a first embodiment to a third embodiment. The present invention is not limited to the following embodiments.
First Embodiment
Overview
(15) The overview of an attack detection device 10 including a profile generation device (a global profile generation unit 125) of the first embodiment will be described using
(16) In addition, a case in which learning data and analysis target data (data that is an attack detection target) by the attack detection device 10 are HTTP requests to a web server (site) will be provided as an example.
(17) Furthermore, a case where the attack detection device 10 performs what is called anomaly detection will be described as an example. For each combination of a path part and a parameter name included in a normal HTTP request, the attack detection device 10 uses, as a profile (profile information) for the detection, information indicating the parameter values of that parameter (parameter value information). In addition, a case where the attack detection device 10 uses, as the parameter value information, a character class sequence (which will be described later) that abstracts the structure of the character string of each parameter value will be described as an example.
(18) First, the attack detection device 10 illustrated in
(19) For each combination of path parts and names of parameters included in the normal HTTP request, the attack detection device 10, for example, generates information, which indicates the character class sequence of parameter values of the parameters, as a profile on the basis of the normal HTTP request that is the learning data.
(20) If there is a parameter (a global parameter) with different path parts but the same name in the HTTP request of the learning data when the profile is generated, the attack detection device 10 generates information indicating a character class sequence of the global parameter (see
(21) The character class sequence is a sequence in which characters constituting a parameter value are classified into classes such as AL (a character class (alpha) representing an alphabet), NU (a character class (numeric) representing a numeral), or SY (a character class (symbol) representing a symbol) and the classification results are arranged according to the arrangement of character strings.
(22) For example, the attack detection device 10 generates a profile indicating character class sequences of parameter values (a profile before a global parameter conversion process illustrated in
(23) As described above, the attack detection device 10 generates a profile in which the entries of parameters with different path parts but the same name are aggregated. In this way, even when HTTP requests whose path parts change dynamically are used as learning data, the attack detection device 10 can compensate for the shortage of parameter value types learned for each combination of path part and parameter name in the profile. As a consequence, the attack detection device 10 can reduce erroneous detection in attack detection using the profile. Furthermore, the attack detection device 10 can generate a profile applicable to a path part that is generated only once, so that fewer attacks are missed when the profile is used. Moreover, the attack detection device 10 can reduce the size of the profile, which reduces the time required to collate an HTTP request to be analyzed against the profile.
Configuration
(24) The configuration of the attack detection device 10 will be described using
(25) The input unit 11 has a learning data input unit (acquisition unit) 111 and an analysis target data input unit 112. The learning data input unit 111 receives a normal HTTP request to a web server (not illustrated) as learning data. The analysis target data input unit 112 receives input of an HTTP request that is analysis target data.
(26) The control unit 12 includes a parameter extraction unit 121, a parameter value conversion unit 122, a profile storage unit 123, an anomaly detection unit 124, and a global profile generation unit 125.
(27) The parameter extraction unit 121 extracts path parts, parameter names, and parameter values from the HTTP request of the learning data. Furthermore, the parameter extraction unit 121 extracts path parts, parameter names, and parameter values from the HTTP request of the analysis target data.
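As a minimal sketch of the extraction step described above, the following Python splits a request URL into a path part and its parameter names and values. The function name `extract_parameters` and the example URL are hypothetical. The description does not specify how the parameter extraction unit 121 parses a request, and this sketch handles only query-string parameters.

```python
from urllib.parse import urlsplit, parse_qsl


def extract_parameters(url):
    """Return (path_part, [(parameter_name, parameter_value), ...]) for a request URL.

    Assumption: only query-string parameters are considered; body parameters
    of POST requests would need separate handling.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so that parameters with empty values are not dropped
    params = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, params


# Hypothetical request used only for illustration
path, params = extract_parameters("http://example.com/index.php?id=0123&file=img.jpg")
```

Here `path` holds the path part and `params` holds the ordered name/value pairs that would be handed to the parameter value conversion step.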
(28) The parameter value conversion unit 122 converts the parameter values of the HTTP request of the learning data extracted by the parameter extraction unit 121 into character class sequences. Furthermore, the parameter value conversion unit 122 converts the parameter values of the HTTP request of the analysis target data extracted by the parameter extraction unit 121 into character class sequences.
(29) For example, the parameter value conversion unit 122 refers to regular expressions for the character classes prepared in advance, treats each longest matching part of the parameter value as one class, and converts the parameter value into a character class sequence.
(30) One example will be described. Hereinafter, a case where a parameter value extracted from the HTTP request of the learning data by the parameter extraction unit 121 is “img.jpg” will be considered. In such a case, the parameter value conversion unit 122 divides the “img.jpg” into three character strings of (img,.,jpg) and converts the (img,.,jpg) into a class of (alpha,symbol,alpha), as illustrated in (1) of
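The conversion to a character class sequence can be sketched as follows. The three regular expressions are assumptions made for illustration, since the description only states that the class definitions are prepared in advance, and the class labels AL, NU, and SY stand for alpha, numeric, and symbol. The function name `to_class_sequence` is hypothetical.

```python
import re

# Assumed class definitions: one regular expression per character class.
# Anything that is not an ASCII letter or digit is treated as a symbol here.
CHAR_CLASSES = [
    ("AL", re.compile(r"[A-Za-z]+")),
    ("NU", re.compile(r"[0-9]+")),
    ("SY", re.compile(r"[^A-Za-z0-9]+")),
]


def to_class_sequence(value):
    """Convert a parameter value into a tuple of character class labels."""
    seq = []
    pos = 0
    while pos < len(value):
        # Pick the class whose pattern matches the longest run starting at pos.
        best_name, best_end = None, pos
        for name, pattern in CHAR_CLASSES:
            m = pattern.match(value, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        seq.append(best_name)
        pos = best_end
    return tuple(seq)
```

With these assumed classes, `to_class_sequence("img.jpg")` splits the value into `img`, `.`, and `jpg` and yields `("AL", "SY", "AL")`, matching the (alpha, symbol, alpha) example above.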
(31) When information, in which the path parts, the parameter names, and the character class sequences indicating the parameter values of the HTTP request of the learning data have been correlated with one another, is received from the parameter value conversion unit 122, the profile storage unit 123 of
(32) For example, in the profile illustrated in
(33) In addition, when a plurality of character class sequences are present for a combination of the same path part and parameter name, the profile storage unit 123 of
(34) One example will be described. For example, when three character class sequences illustrated in (2) of
(35) The anomaly detection unit 124 of
(36) For example, the anomaly detection unit 124 receives information, in which the path parts, the parameter names, and the character class sequences of the HTTP request of the analysis target data have been correlated with one another, from the parameter value conversion unit 122. Thereafter, the anomaly detection unit 124 calculates similarity between the character class sequence for the combination of the path part and the parameter name of the HTTP request of the analysis target data and the character class sequence for the combination of the path part and the parameter name in the profile.
(37) Thereafter, when the calculated similarity is equal to or more than a predetermined threshold value, the anomaly detection unit 124 determines that the HTTP request of the analysis target data is the normal HTTP request, and when the calculated similarity is smaller than the predetermined threshold value, the anomaly detection unit 124 determines that the HTTP request of the analysis target data is an HTTP request indicating an attack.
(38) One example will be described. First, the parameter value conversion unit 122 converts the parameter value of the HTTP request of the analysis target data extracted by the parameter extraction unit 121 into a character class sequence ((1) of
(39) Thereafter, the anomaly detection unit 124 calculates similarity between the character class sequence of the HTTP request of the analysis target data and the character class sequence of the profile by Equation (1) ((2) of
(40) For example, as illustrated in (2) of
Similarity S=Z/(X+Y−Z) (1)
(41) For example, as illustrated in (2) of
(42) Next, the anomaly detection unit 124 determines whether the calculated similarity S between the character class sequences is smaller than a threshold value St, and determines that the HTTP request of the analysis target data is anomalous when the similarity S is smaller than the threshold value St ((3) of
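Equation (1) and the threshold comparison can be sketched in Python as below. The function names, the example sequences, and the treatment of two empty sequences as identical are assumptions for illustration; X, Y, and Z follow the definitions used in the claims (sequence lengths of the profile and of the detection target, and the length of their longest common subsequence).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(profile_seq, target_seq):
    """S = Z / (X + Y - Z), as in Equation (1)."""
    z = lcs_length(profile_seq, target_seq)
    denom = len(profile_seq) + len(target_seq) - z
    # Assumption: two empty sequences are treated as identical (S = 1).
    return z / denom if denom else 1.0


def is_anomalous(profile_seq, target_seq, threshold):
    """True when the similarity S is smaller than the threshold value St."""
    return similarity(profile_seq, target_seq) < threshold
```

For the hypothetical pair `("AL", "SY", "AL")` and `("AL", "SY", "NU", "SY", "AL")`, X = 3, Y = 5, and Z = 3, so S = 3 / (3 + 5 - 3) = 0.6. Whether that counts as an attack depends entirely on the chosen threshold St, which the description leaves as a predetermined value.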
(43) For an HTTP request with different path parts but the same parameter name in an HTTP request group of learning data, the global profile generation unit 125 of
(44) The global profile generation unit 125 includes a global parameter collection unit 126 and a global parameter conversion unit 127.
(45) For each parameter name present in the profile, the global parameter collection unit 126 collects the types of path parts in which the parameter name appears. For example, for a profile before a global parameter conversion process illustrated in
(46) The global parameter conversion unit 127 of
(47) For example, in a globalized candidate parameter collection list illustrated in
(48) Thereafter, as illustrated in
(49) For example, the global parameter conversion unit 127 keeps the character class sequence indicated by the reference numeral 901 of
(50) When the global parameter conversion unit 127 generates the global profile as described above, the generated global profile is stored in the storage unit 13. Thereafter, the anomaly detection unit 124 performs anomaly detection for the HTTP request of the analysis target data by using the global profile (profile) stored in the storage unit 13.
(51) The storage unit 13 stores therein the profile generated by the control unit 12. After the global parameter conversion process of the profile is performed by the global profile generation unit 125, the global profile (see
Processing Procedure
(52) Next, an example of a processing procedure when the attack detection device 10 generates the global profile will be described using
(53) First, the global parameter collection unit 126 acquires the profile (see
(54) For example, at S4, when the path part of a row taken out from the profile is “*”, the global parameter collection unit 126 determines that the row is not the candidate for conversion to the global parameter because the row has already been converted to the global parameter (No at S4), and returns to S3. Then, the global parameter collection unit 126 processes the next row of the profile acquired at S1. On the other hand, when the path part of the row taken out from the profile is other than “*” (Yes at S4), the global parameter collection unit 126 determines that the row is the candidate for conversion to the global parameter and proceeds to S5.
(55) In the case of Yes at S4, the global parameter collection unit 126 updates the globalized candidate parameter collection list (see
(56) (a) When neither the parameter name nor the path part is present in the globalized candidate parameter collection list, the global parameter collection unit 126 newly generates an entry of the combination of the parameter name and the path part in the globalized candidate parameter collection list. Then, the global parameter collection unit 126 sets the number of types of the path part in the entry to “1”.
(57) (b) When the parameter name is present in the globalized candidate parameter collection list but the path part is not present, the global parameter collection unit 126 adds the path part to the entry of the parameter name in the globalized candidate parameter collection list. Then, the global parameter collection unit 126 increments the number of types of the path part in the entry.
(58) (c) When the combination of the parameter name and the path part is already present in the globalized candidate parameter collection list, the global parameter collection unit 126 does not add the parameter name and the path part to the globalized candidate parameter collection list.
(59) After S5, when there is an unprocessed row in the profile acquired at S1 (No at S6), the global parameter collection unit 126 returns to S3, and when there is no unprocessed row in the profile acquired at S1 (Yes at S6), the global parameter collection unit 126 proceeds to S7 and outputs the globalized candidate parameter collection list (S7).
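The collection loop of S3 to S7 can be sketched as follows. The row layout `(path_part, parameter_name, class_sequence, appearances)`, the function name, and the sample rows are assumptions for illustration. A set per parameter name covers all three update cases at once: a new name creates an entry, a new path part for a known name grows the set, and an already recorded combination changes nothing. The number of path part types is then the size of the set.

```python
def collect_global_candidates(profile_rows):
    """Build the globalized candidate parameter collection list.

    profile_rows: iterable of (path_part, parameter_name, class_sequence, appearances).
    Returns a dict mapping each parameter name to the set of path parts it appears in.
    """
    candidates = {}
    for path_part, name, _seq, _count in profile_rows:
        if path_part == "*":
            # S4: the row has already been converted to a global parameter
            continue
        # S5: add the path part to the entry of this parameter name
        candidates.setdefault(name, set()).add(path_part)
    return candidates


# Hypothetical profile rows used only for illustration
rows = [
    ("/a.php", "id", ("NU",), 2),
    ("/b.php", "id", ("NU",), 1),
    ("/a.php", "file", ("AL", "SY", "AL"), 1),
    ("*", "cc", ("AL",), 4),
]
candidates = collect_global_candidates(rows)
```

In this sample, `id` is recorded with two path part types and `file` with one, while the `cc` row is skipped because its path part is already "*".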
(60) After S7, the global parameter conversion unit 127 acquires the globalized candidate parameter collection list (S8). Thereafter, the global parameter conversion unit 127 extracts the parameter name, the path part, and the number of types of the path part from the globalized candidate parameter collection list (S9: extract parameter and path part information). When the number of types of the path part for the extracted parameter name is equal to or more than a predetermined threshold value, the global parameter conversion unit 127 determines the parameter name as a parameter name that needs to be globalized (Yes at S10) and proceeds to S11. That is, when the globalized candidate parameter collection list shows that the parameter name appears with a number of path part types equal to or more than the predetermined threshold value, the global parameter conversion unit 127 determines the parameter name as the parameter name that needs to be globalized.
(61) On the other hand, when the number of types of the path part for the extracted parameter name is smaller than the predetermined threshold value, the global parameter conversion unit 127 determines that the parameter name is not the parameter name that needs to be globalized (No at S10), and performs the process of S9 on the next row of the globalized candidate parameter collection list.
(62) For example, when the threshold value of the types of the path part is 2, the global parameter conversion unit 127 determines that the parameter names “id” and “cc” in the globalized candidate parameter collection list illustrated in
(63) In the case of Yes at S10 of
(64) For example, the global parameter conversion unit 127 converts the path part in the entry of the parameter name into “*”. Furthermore, the global parameter conversion unit 127 aggregates a character class sequence in the entry of the parameter name.
(65) For example, as illustrated in
(66) In addition, the global parameter conversion unit 127 handles the number of appearances of each character class sequence in the entries to be converted to the global parameter as follows. When the same character class sequence appears in two or more of those entries, the global parameter conversion unit 127 sums its numbers of appearances. When a character class sequence appears in only one of those entries, the global parameter conversion unit 127 carries its number of appearances over unchanged.
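The conversion described above (threshold check at S10, path part replaced by "*", and appearance counts summed per character class sequence) can be sketched in one function. The row layout, the function name, the default threshold of 2, and the sample rows are assumptions for illustration, not the layout of the actual profile.

```python
from collections import Counter


def build_global_profile(profile_rows, min_path_types=2):
    """Aggregate entries whose parameter name appears under enough distinct path parts.

    profile_rows: iterable of (path_part, parameter_name, class_sequence, appearances).
    Returns a dict mapping (path_part, parameter_name, class_sequence) to appearances.
    """
    profile_rows = list(profile_rows)

    # Count distinct path parts per parameter name (rows already global are skipped).
    paths_by_name = {}
    for path_part, name, _seq, _count in profile_rows:
        if path_part != "*":
            paths_by_name.setdefault(name, set()).add(path_part)
    global_names = {n for n, p in paths_by_name.items() if len(p) >= min_path_types}

    merged = Counter()
    for path_part, name, seq, count in profile_rows:
        # Globalized names get the wildcard path part; other rows are kept as they are.
        key_path = "*" if name in global_names else path_part
        # Identical class sequences are summed; unique ones keep their own count.
        merged[(key_path, name, seq)] += count
    return dict(merged)


# Hypothetical profile rows used only for illustration
rows = [
    ("/a.php", "id", ("NU",), 2),
    ("/b.php", "id", ("NU",), 1),
    ("/b.php", "id", ("AL", "NU"), 1),
    ("/a.php", "file", ("AL", "SY", "AL"), 1),
]
global_profile = build_global_profile(rows)
```

In this sample the two `("NU",)` entries for `id` collapse into a single "*" entry with three appearances, the `("AL", "NU")` entry moves to "*" with its count unchanged, and `file` keeps its original path part because it appears under only one path.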
(67) For example, as illustrated in
(68) After S11 of
(69) With the above processing, the attack detection device 10 generates a profile (a global profile) in which entries with different path parts but the same parameter name are aggregated. Thus, for example, even when a profile is generated using HTTP requests whose path parts change dynamically as learning data, the attack detection device 10 can compensate for the shortage of parameter value types learned for a parameter name in the profile. As a consequence, the attack detection device 10 can reduce erroneous detection when detecting an attack by using the profile.
(70) Furthermore, for example, the attack detection device 10 can generate a profile applicable to a path part that is generated only once, so that fewer attacks are missed when the profile is used. Moreover, the attack detection device 10 can reduce the size of the profile, which reduces the time required to collate an HTTP request to be analyzed against the profile when an attack is detected by using the profile.
Second Embodiment
(71) At S11 of
(72) For example, a case will be considered where the global parameter conversion unit 127 uses an edit distance between character class sequences to calculate similarity and a character class sequence group with an edit distance smaller than “3” is to be aggregated. In such a case, as illustrated in
(73) Consequently, the global parameter conversion unit 127 determines that the character class sequence indicated by the reference numeral 1101 and the character class sequence indicated by the reference numeral 1102 are similar to each other and aggregates these character class sequences. In the case of using the longest common substring to aggregate the character class sequences, the global parameter conversion unit 127 aggregates the character class sequence indicated by the reference numeral 1101 and the character class sequence indicated by the reference numeral 1102 into (AL, NU, SY, AL), which is the longest common substring of these character class sequences. Furthermore, the global parameter conversion unit 127 sums the number of appearances of the character class sequence indicated by the reference numeral 1101 and the character class sequence indicated by the reference numeral 1102, and sets the summed number to the number of appearances “3” indicated by reference numeral 1103.
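A minimal sketch of this similarity-based aggregation follows. It assumes that the edit distance is the Levenshtein distance over class labels and that "longest common substring" means the longest contiguous run shared by both sequences; neither detail is spelled out in the description. The function names and the two example sequences are hypothetical and are not the sequences indicated by the reference numerals 1101 and 1102.

```python
def edit_distance(a, b):
    """Levenshtein distance between two class sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def longest_common_substring(a, b):
    """Longest contiguous run of class labels shared by both sequences."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, x in enumerate(a, start=1):
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return tuple(a[best_end - best_len:best_end])


def merge_if_similar(entry_a, entry_b, max_distance=3):
    """Merge two (class_sequence, appearances) entries when their edit distance
    is smaller than max_distance; otherwise return both unchanged."""
    (seq_a, count_a), (seq_b, count_b) = entry_a, entry_b
    if edit_distance(seq_a, seq_b) < max_distance:
        return [(longest_common_substring(seq_a, seq_b), count_a + count_b)]
    return [entry_a, entry_b]
```

For the hypothetical entries `(("AL", "NU", "SY", "AL"), 1)` and `(("AL", "NU", "SY", "AL", "NU"), 2)`, the edit distance is 1, so they merge into `(("AL", "NU", "SY", "AL"), 3)`.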
(74) By so doing, the attack detection device 10 can further reduce the size of the profile (the global profile), which reduces the time required to collate an HTTP request to be analyzed against the profile when an attack is detected by using the profile.
Third Embodiment
(75) Furthermore, the global parameter conversion unit 127 may set a character class sequence with a low number of appearances or a low appearance ratio as a non-aggregation target when generating the global profile. For example, in the first embodiment or the second embodiment, the global parameter conversion unit 127 obtains a relative appearance ratio in a character class sequence group serving as an aggregation candidate with respect to each character class sequence serving as an aggregation candidate. Then, the global parameter conversion unit 127 sets a character class sequence with an appearance ratio lower than a predetermined threshold value as a non-aggregation target.
(76) For example, a case will be considered where the global parameter conversion unit 127 sets a character class sequence with a relative appearance ratio, which is smaller than 5% in the character class sequence group serving as an aggregation candidate, as a non-aggregation target. In such a case, as illustrated in
(77) As described above, the global parameter conversion unit 127 sets a character class sequence with a low number of appearances as a non-aggregation target when generating the global profile. In this way, even when a profile includes a result of learning an HTTP request having a parameter value erroneously input or an HTTP request indicating an attack, the attack detection device 10 can prevent the result of the learning from being reflected in the global profile. As a consequence, the attack detection device 10 can improve the accuracy of attack detection using the global profile.
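The appearance-ratio filter can be sketched as below. The function name, the mapping from class sequence to appearance count, and the sample counts are assumptions for illustration; the 5% default mirrors the example threshold mentioned above.

```python
def split_by_appearance_ratio(sequence_counts, min_ratio=0.05):
    """Split aggregation candidates into kept and excluded class sequences.

    sequence_counts: dict mapping a class sequence to its number of appearances
    within one aggregation candidate group.
    Sequences whose relative appearance ratio is below min_ratio are excluded.
    """
    total = sum(sequence_counts.values())
    keep, excluded = {}, {}
    for seq, count in sequence_counts.items():
        ratio = count / total if total else 0.0
        (keep if ratio >= min_ratio else excluded)[seq] = count
    return keep, excluded


# Hypothetical counts used only for illustration
counts = {("NU",): 97, ("AL", "NU"): 2, ("SY", "AL", "SY"): 1}
keep, excluded = split_by_appearance_ratio(counts)
```

In this sample the two rare sequences have ratios of 2% and 1%, so they are set aside as non-aggregation targets and only `("NU",)` is carried into the global profile.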
(78) In addition, the global parameter conversion unit 127 may delete information on a character class sequence with a low number of appearances or a low appearance ratio as described above after generating the global profile similar to that of the first embodiment or the second embodiment.
Other Embodiments
(79) When generating the global profile, if there are entries in which the path parts are different but the similarity between the parameter names is equal to or more than a predetermined value, the global parameter conversion unit 127 may also set those entries as aggregation targets. By so doing, the attack detection device 10 can further reduce the size of a profile (the global profile).
(80) So far, the case where the attack detection device 10 uses the HTTP request to the web server (site) as learning data and analysis target data has been described as an example; however, an access request other than the HTTP request may also be used.
(81) Moreover, the attack detection device 10 may apply the aggregation of the entries in which path parts are different but a parameter name is the same to the generation of a profile to be used in so-called signature-based detection. For example, the attack detection device 10 acquires an HTTP request to be used in malicious communication as learning data, and generates a profile by aggregating parameter values (or character class sequences indicating the parameter values) in an HTTP request, in which path parts are different but a parameter name is the same, from the acquired HTTP request. Then, the attack detection device 10 performs the signature-based detection for the HTTP request of analysis target data by using the profile.
Computer Program
(82) Furthermore, a computer program for performing the function of the attack detection device 10 described in each embodiment can be installed in a desired information processing device (computer). For example, the aforementioned computer program provided as package software or on-line software is executed in the information processing device, so that the information processing device can serve as the attack detection device 10. The information processing device described herein includes desktop-type or laptop-type personal computers. In addition, mobile communication terminals such as smart phones, cellular phones, or personal handyphone systems (PHSs), and personal digital assistants (PDAs) are included in the category of the information processing device. Furthermore, the attack detection device 10 may be implemented in a cloud server.
(83) Hereinafter, an example of a computer that executes the aforementioned computer program (an attack detection program) will be described.
(84) The memory 1010 includes a read only memory (ROM) 1011 and a random access memory (RAM) 1012. The ROM 1011, for example, stores therein a boot program such as a basic input output system (BIOS). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disc is inserted into the disk drive 1100. For example, a mouse 1110 and a keyboard 1120 are connected to the serial port interface 1050. For example, a display 1130 is connected to the video adapter 1060.
(85) As illustrated in
(86) Furthermore, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1090 into the RAM 1012 as necessary and performs each of the aforementioned procedures.
(87) In addition, the program module 1093 and the program data 1094 related to the aforementioned attack detection program are not limited to being stored in the hard disk drive 1090, and for example, may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 and the like. Alternatively, the program module 1093 and the program data 1094 related to the aforementioned computer program may be stored in other computers connected via a network such as a local area network (LAN) or a wide area network (WAN), and read by the CPU 1020 via the network interface 1070.
REFERENCE SIGNS LIST
(88)
- 10 attack detection device
- 11 input unit
- 12 control unit
- 13 storage unit
- 111 learning data input unit
- 112 analysis target data input unit
- 121 parameter extraction unit
- 122 parameter value conversion unit
- 123 profile storage unit
- 124 anomaly detection unit
- 125 global profile generation unit
- 126 global parameter collection unit
- 127 global parameter conversion unit