LEARNING APPARATUS, ANOMALY DETECTION APPARATUS, LEARNING METHOD, ANOMALY DETECTION METHOD, AND PROGRAM

20240419560 · 2024-12-19

    Abstract

    A learning apparatus according to an embodiment includes: an input unit that inputs a set of normal data of a first system serving as a target domain and a set of normal data of a second system serving as a source domain; and a learning unit that learns a model including a first encoder having data of the target domain as an input, a second encoder having data of the source domain as an input, a discriminator having output data of either the first encoder or the second encoder as an input to discriminate whether the output data is data indicating a feature of either the target domain or the source domain, and a DeepSVDD having the output data as an input, by using the set of normal data of the first system and the set of normal data of the second system.

    Claims

    1. A learning apparatus comprising: a processor; and a memory storing program instructions that cause the processor to: input a set of normal data of a first system serving as a target domain and a set of normal data of a second system serving as a source domain; and learn a model including a first encoder having data of the target domain as an input, a second encoder having data of the source domain as an input, a discriminator having output data of either the first encoder or the second encoder as an input to discriminate whether the output data is data indicating a feature of either the target domain or the source domain, and a deep support vector data description (DeepSVDD) having the output data as an input, by using the set of normal data of the first system and the set of normal data of the second system.

    2. The learning apparatus according to claim 1, wherein the program instructions cause the processor to learn a parameter of the first encoder, a parameter of the second encoder, and a parameter of the DeepSVDD to minimize a hypersphere when the output data is mapped on a hypersphere by the DeepSVDD and to learn a parameter of the first encoder, a parameter of the second encoder, and a parameter of the discriminator to maximize determination performance of the discriminator.

    3. The learning apparatus according to claim 1, wherein the number of data included in the set of normal data of the target domain is smaller than the number of data included in the set of normal data of the source domain.

    4. An anomaly detecting apparatus comprising: a processor; and a memory storing program instructions that cause the processor to: determine whether or not an anomaly has occurred in a system by using a first encoder and a DeepSVDD included in a model learned by the learning apparatus according to claim 1 and data of the system which is an anomaly detection target.

    5. A learning method that is executed by a computer, the learning method comprising: inputting a set of normal data of a first system serving as a target domain and a set of normal data of a second system serving as a source domain; and learning a model including a first encoder having data of the target domain as an input, a second encoder having data of the source domain as an input, a discriminator having output data of either the first encoder or the second encoder as an input to discriminate whether the output data is data indicating a feature of either the target domain or the source domain, and a DeepSVDD having the output data as an input, by using the set of normal data of the first system and the set of normal data of the second system.

    6. A method for detecting an anomaly, which is executed by a computer, the method comprising: discriminating whether or not an anomaly has occurred in a system by using a first encoder and a DeepSVDD included in a model learned by the learning apparatus according to claim 1 and data of the system which is an anomaly detection target.

    7. A non-transitory computer-readable recording medium having stored therein a program for causing a computer to execute the learning method according to claim 5.

    8. The learning apparatus according to claim 1, wherein the program instructions cause the processor to determine whether or not an anomaly has occurred in a system by using a first encoder and a DeepSVDD included in the learned model and data of the system which is an anomaly detection target.

    Description

    BRIEF DESCRIPTION OF DRAWINGS

    [0011] FIG. 1 is a diagram schematically illustrating an example of a model.

    [0012] FIG. 2 is a diagram illustrating an example of a hardware configuration of an anomaly detecting apparatus according to this embodiment.

    [0013] FIG. 3 is a diagram illustrating an example of a functional configuration of the anomaly detecting apparatus according to the embodiment.

    [0014] FIG. 4 is a flowchart illustrating an example of a flow of overall processes executed by the anomaly detecting apparatus according to the embodiment.

    DESCRIPTION OF EMBODIMENTS

    [0015] Hereinafter, an embodiment of the present invention will be described. The configuration and functions of an ICT system differ from system to system, but ICT systems that have similar configurations and similar functions also have similar normal states. Focusing on this point, the embodiment describes an unsupervised anomaly detecting technique that transfers information obtained when learning the normal state of an ICT system having a large amount of normal data to an ICT system having only a small amount of normal data. This technique makes it possible to obtain an anomaly detector capable of detecting an anomaly in an ICT system (hereinafter, also referred to as a target system) that has only a small amount of normal data.

    [0016] In addition, an anomaly detecting apparatus 10 will be described that creates an anomaly detector by the unsupervised anomaly detecting technique and uses the anomaly detector to detect an anomaly of the target system.

    <Unsupervised Anomaly Detecting Technique>

    [0017] Hereinafter, a theoretical configuration of the unsupervised anomaly detecting technique according to the embodiment will be described.

    [0018] First, an ICT system having a large amount of normal data is set as a source domain S, and an ICT system (target system) having a small amount of normal data is set as a target domain T.

    [0019] In addition, assuming that one item of normal data obtained from the source domain S is n-dimensional vector data $x_s = [x_1^{(s)}, \ldots, x_n^{(s)}]$, a data set including the n-dimensional vector data $x_s$ is set as follows.

    $$D_S = \{x_1, \ldots, x_{|D_S|}\}$$ [Math. 1]

    Here, n represents the number of types of data obtained in the source domain S, and $|D_S|$ represents the number of n-dimensional vector data.

    [0020] Similarly, assuming that one item of normal data obtained from the target domain T is m-dimensional vector data $x_t = [x_1^{(t)}, \ldots, x_m^{(t)}]$, a data set including the m-dimensional vector data $x_t$ is set as follows.

    $$D_T = \{x_1, \ldots, x_{|D_T|}\}$$ [Math. 2]

    Here, m represents the number of types of data obtained in the target domain T, and $|D_T|$ represents the number of m-dimensional vector data.

    [0021] Next, a model used in the unsupervised anomaly detecting technique according to the embodiment will be described. As a technique for detecting an anomaly in each of the source domain S and the target domain T, an encoder, which is a type of deep learning (DL) model, and a deep support vector data description (DeepSVDD: a support vector data description method using deep learning) (Reference Literature 1) are used. More specifically, after input data (normal data) is compressed by an encoder E, learning is performed so that the hypersphere (its volume, or the area of its surface) is minimized when the compressed data (feature values) is mapped onto the hypersphere by the DeepSVDD. Accordingly, after the learning, anomaly detection is performed by the DeepSVDD and the encoder having data of the target domain as an input. Hereinafter, the encoder of the source domain S is denoted by $E_S$, a parameter thereof is denoted by $\theta_S$, the encoder of the target domain T is denoted by $E_T$, and a parameter thereof is denoted by $\theta_T$.

    [0022] In addition, as a technique for transferring knowledge in a certain domain to another domain, a technique of using a generative adversarial network (GAN) is known (Reference Literature 2). In this respect, the unsupervised anomaly detecting technique according to the embodiment uses, within the GAN-based transfer learning technique, a model obtained by combining the encoders $E_S$ and $E_T$ and the DeepSVDD.

    [0023] Specifically, by extracting feature values from the source domain S and the target domain T by the encoders $E_S$ and $E_T$, respectively, an expression (feature) that can be transferred from the source domain S is acquired and applied to the target domain T. In addition, a discriminator $D(\cdot; \theta_D)$ is prepared that discriminates, from the output $E_S(x_s; \theta_S)$ or $E_T(x_t; \theta_T)$ of the encoder $E_S(\cdot; \theta_S)$ or $E_T(\cdot; \theta_T)$, which domain that output originates from (that is, from which domain the input data of the encoder was taken). Here, the discriminator $D(\cdot; \theta_D)$ is expressed by a neural network, and $\theta_D$ represents a parameter thereof. That is, the discriminator D is a neural network that discriminates whether its input is data from the source domain or data from the target domain. Further, the dimension of the input layer of the discriminator $D(\cdot; \theta_D)$ and the dimension of the output layer of the encoders $E_S(\cdot; \theta_S)$ and $E_T(\cdot; \theta_T)$ are the same (for example, both dimensions are a dimension number k).
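
    As an illustration only, the following minimal NumPy sketch shows how two encoders with different input dimensions (n and m) can share one discriminator through a common feature dimension k. The layer sizes, the tanh activation, the random initialization, and the helper names `init_mlp`, `mlp`, and `discriminator` are assumptions made for this sketch and are not specified in the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: source dimension n, target dimension m, shared feature dimension k.
n, m, k = 6, 4, 3

def init_mlp(sizes, rng):
    """Return a list of (W, b) pairs for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

theta_S = init_mlp([n, 8, k], rng)  # parameters of the source encoder E_S
theta_T = init_mlp([m, 8, k], rng)  # parameters of the target encoder E_T
theta_D = init_mlp([k, 8, 1], rng)  # parameters of the discriminator D

def discriminator(params, z):
    """Sigmoid output, read here as the probability that z is a target-domain feature."""
    return 1.0 / (1.0 + np.exp(-mlp(params, z)))

x_s = rng.normal(size=(5, n))  # five source items (synthetic)
x_t = rng.normal(size=(2, m))  # two target items (synthetic)

z_s = mlp(theta_S, x_s)  # E_S(x_s; theta_S), shape (5, k)
z_t = mlp(theta_T, x_t)  # E_T(x_t; theta_T), shape (2, k)

# Both encoder outputs have dimension k, so one discriminator accepts either.
p_s = discriminator(theta_D, z_s)
p_t = discriminator(theta_D, z_t)
```

    Because both encoders emit k-dimensional features, the same discriminator (and likewise the same DeepSVDD) can consume the output of either encoder.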

    [0024] The above-described model is a learning target. Hereinafter, $i$ is an index of data input to the model which is the learning target, and $i = 1, \ldots, |D_S| + |D_T|$. In addition, it is assumed that exactly one of $s_i \in \{1, \ldots, |D_S|\}$ and $t_i \in \{1, \ldots, |D_T|\}$ is satisfied for each $i$. Further, regarding $i$ with which $s_i \in \{1, \ldots, |D_S|\}$ is not satisfied, for example, $s_i = |D_S| + 1$ or the like may be set. Similarly, regarding $i$ with which $t_i \in \{1, \ldots, |D_T|\}$ is not satisfied, for example, $t_i = |D_T| + 1$ or the like may be set. Hereinafter, the data whose subscripts are $s_i$ and $t_i$ are written as $x_{s_i}$ and $x_{t_i}$, respectively.

    [0025] In this case, a schematic diagram of the model which is the learning target is illustrated in FIG. 1. Either $x_{s_i}$ or $x_{t_i}$ is input to the model for each $i$. As illustrated in FIG. 1, in a case where $x_{s_i}$ is input to the model, the output $E_S(x_{s_i}; \theta_S)$ of the encoder of the source domain is input to both the DeepSVDD DSVDD and the discriminator D. On the other hand, in a case where $x_{t_i}$ is input to the model, the output $E_T(x_{t_i}; \theta_T)$ of the encoder of the target domain is input to both the DeepSVDD DSVDD and the discriminator D.

    [0026] Next, a loss function of the model which is the learning target is defined.

    [0027] First, a loss function of the discriminator $D(\cdot; \theta_D)$ is defined below.

    $$L_D(\theta_S, \theta_T, \theta_D) = \begin{cases} L_B(D(E_S(x_{s_i}; \theta_S); \theta_D), d_i), & \text{for } i \in N_S \\ L_B(D(E_T(x_{t_i}; \theta_T); \theta_D), d_i), & \text{for } i \in N_T \end{cases}$$ [Math. 3]

    [0028] Here, $L_B$ represents binary cross entropy. In addition, $d_i$ represents whether the i-th item of data is in the source domain or the target domain. For example, when the i-th item of data input to the model is in the source domain, $d_i = 0$ is set, and when the i-th item of data is in the target domain, $d_i = 1$ is set. Further, $N_S$ is the set of $i$ satisfying $s_i \in \{1, \ldots, |D_S|\}$, and $N_T$ is the set of $i$ satisfying $t_i \in \{1, \ldots, |D_T|\}$. Further, note that input data to the model is $x_{s_i} \in D_S$ when $i \in N_S$, and input data to the model is $x_{t_i} \in D_T$ when $i \in N_T$.
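
    For reference, the binary cross entropy used as the discriminator loss can be sketched as follows. The discriminator outputs in `p` are made-up numbers, and the clipping constant `eps` is an assumption added only for numerical safety.

```python
import numpy as np

def binary_cross_entropy(p, d, eps=1e-12):
    """Per-item binary cross entropy between a predicted probability p and a label d."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0); eps is an assumption of this sketch
    return -(d * np.log(p) + (1.0 - d) * np.log(1.0 - p))

# Hypothetical discriminator outputs for four items, read as P(target domain).
p = np.array([0.1, 0.2, 0.9, 0.6])
# Domain labels d_i: 0 for source-domain items, 1 for target-domain items.
d = np.array([0, 0, 1, 1])

loss = binary_cross_entropy(p, d)  # one value of the discriminator loss per index i
```

    The loss is small when the discriminator assigns a low probability to source items and a high probability to target items, and grows as its predictions move toward the wrong domain.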

    [0029] Next, a loss function of the DeepSVDD $\mathrm{DSVDD}(\cdot; \theta_{\mathrm{DSVDD}})$ is defined below.

    $$L_{\mathrm{DSVDD}}(\theta_S, \theta_T, \theta_{\mathrm{DSVDD}}) = \begin{cases} \| \mathrm{DSVDD}(E_S(x_{s_i}; \theta_S); \theta_{\mathrm{DSVDD}}) - c \|^2, & \text{for } i \in N_S \\ \| \mathrm{DSVDD}(E_T(x_{t_i}; \theta_T); \theta_{\mathrm{DSVDD}}) - c \|^2, & \text{for } i \in N_T \end{cases}$$ [Math. 4]

    [0030] Here, $\theta_{\mathrm{DSVDD}}$ represents a parameter of DSVDD. In addition, $c$ is a constant that is fixed before the learning of the model and is calculated from the data sets for learning (that is, $D_S$ and $D_T$). Specifically, after the parameters $\theta_S$ and $\theta_T$ of the encoders $E_S(\cdot; \theta_S)$ and $E_T(\cdot; \theta_T)$ and the parameter $\theta_{\mathrm{DSVDD}}$ of the DeepSVDD $\mathrm{DSVDD}(\cdot; \theta_{\mathrm{DSVDD}})$ are initialized, $\mathrm{DSVDD}(E_S(x_s; \theta_S); \theta_{\mathrm{DSVDD}})$ and $\mathrm{DSVDD}(E_T(x_t; \theta_T); \theta_{\mathrm{DSVDD}})$ are calculated for each $x_s \in D_S$ and each $x_t \in D_T$, and the average thereof is defined as $c$.
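
    The computation of c and of the DeepSVDD loss can be illustrated with a short sketch. The arrays below stand in for DeepSVDD outputs that would come from the initialized networks, and taking one pooled mean over all source and target items is an assumption about how "an average thereof" is formed.

```python
import numpy as np

def center(outputs_s, outputs_t):
    """Center c: mean of the initial DeepSVDD outputs over all training items (pooled; an assumption)."""
    return np.concatenate([outputs_s, outputs_t], axis=0).mean(axis=0)

def dsvdd_loss(outputs, c):
    """Squared Euclidean distance of each DeepSVDD output from the center c."""
    return np.sum((outputs - c) ** 2, axis=1)

# Hypothetical DeepSVDD outputs right after parameter initialization.
outputs_s = np.array([[0.0, 0.0], [2.0, 0.0]])  # two source-domain items
outputs_t = np.array([[1.0, 3.0]])              # one target-domain item

c = center(outputs_s, outputs_t)      # array([1., 1.])
loss_s = dsvdd_loss(outputs_s, c)     # array([2., 2.])
loss_t = dsvdd_loss(outputs_t, c)     # array([4.])
```

    Once computed, c is held fixed during learning, so reducing this loss pulls the outputs for normal data of both domains toward the same point.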

    [0031] Collectively, the loss function of the model which is the learning target is defined below.

    $$L(\theta_S, \theta_T, \theta_{\mathrm{DSVDD}}, \theta_D) = \frac{1}{|N_S|} \sum_{i \in N_S} (L_D + L_{\mathrm{DSVDD}}) + \frac{1}{|N_T|} \sum_{i \in N_T} (L_D + L_{\mathrm{DSVDD}}) + \frac{\lambda}{2} \sum_{l=1}^{L} \| W^l \|_F^2$$ [Math. 5]

    [0032] Here, $W^l$ represents a weight of the l-th layer of the neural network representing the DeepSVDD DSVDD, $L$ represents the number of layers, and $\lambda$ represents a hyperparameter. In addition, $\| \cdot \|_F$ represents the Frobenius norm. Further, note that $W^l$ is included in the parameter $\theta_{\mathrm{DSVDD}}$. In addition, note that $|N_S| = |D_S|$ and $|N_T| = |D_T|$.
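
    A small numeric sketch may help in reading Expression 5: each domain contributes the mean of its per-item losses, and the Frobenius-norm term regularizes the DeepSVDD weights. The per-item loss values, the single weight matrix, and the value of the hyperparameter (named `lam` here) are made up for illustration.

```python
import numpy as np

def total_loss(ld_s, ldsvdd_s, ld_t, ldsvdd_t, weights, lam):
    """Per-domain means of (L_D + L_DSVDD) plus (lam / 2) * sum of squared Frobenius norms."""
    reg = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return np.mean(ld_s + ldsvdd_s) + np.mean(ld_t + ldsvdd_t) + reg

# Hypothetical per-item losses for two source items and one target item.
ld_s = np.array([0.1, 0.3])
ldsvdd_s = np.array([2.0, 2.0])
ld_t = np.array([0.5])
ldsvdd_t = np.array([4.0])

# Hypothetical DeepSVDD weights: one 2x2 layer whose squared Frobenius norm is 2.
weights = [np.eye(2)]

value = total_loss(ld_s, ldsvdd_s, ld_t, ldsvdd_t, weights, lam=0.1)
# 2.2 (source mean) + 4.5 (target mean) + 0.1 (regularizer) = 6.8
```

    Averaging within each domain keeps the small target data set from being outweighed by the larger source data set.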

    [0033] Accordingly, parameter learning is performed to minimize the loss function expressed in Expression 5 above. That is, the parameters of the encoders $E_S$ and $E_T$ and the DeepSVDD DSVDD are learned to minimize the hypersphere around $c$, and the parameters of the encoders $E_S$ and $E_T$ and the discriminator D are learned to maximize the discrimination performance of the discriminator D. Specifically, the parameter learning is performed as follows.

    $$\min_{\theta_S, \theta_T, \theta_{\mathrm{DSVDD}}} \max_{\theta_D} L(\theta_S, \theta_T, \theta_{\mathrm{DSVDD}}, \theta_D)$$ [Math. 6]

    [0034] Further, various parameter learning techniques can be considered, and appropriate techniques may be used. For example, optimization using Adam (Reference Literature 3) can be used.

    [0035] After learning is performed by Expression 6 above, anomaly detection is performed by the learned $\mathrm{DSVDD}(E_T(\cdot; \theta_T); \theta_{\mathrm{DSVDD}})$. Specifically, data which is an anomaly detection target in the target domain is set as $x$, and $\mathrm{DSVDD}(E_T(x; \theta_T); \theta_{\mathrm{DSVDD}})$ is calculated. In a case where the difference between $\mathrm{DSVDD}(E_T(x; \theta_T); \theta_{\mathrm{DSVDD}})$ and $c$ exceeds a predetermined threshold, the data is determined to be abnormal; otherwise, it is determined to be normal. Further, the threshold can be set freely. For example, the difference between $\mathrm{DSVDD}(E_T(x_t; \theta_T); \theta_{\mathrm{DSVDD}})$ and $c$ is calculated for each $x_t \in D_T$ using the $\mathrm{DSVDD}(E_T(\cdot; \theta_T); \theta_{\mathrm{DSVDD}})$ obtained after the learning; with the average of these differences represented by $\mu$ and their variance by $\sigma^2$, it is conceivable to use $\mu + \sigma^2$, $\sigma^2$, or the like as the threshold.
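
    The thresholding step can be sketched as follows. The arrays stand in for learned DeepSVDD outputs, the "difference" from c is taken to be the squared Euclidean distance (an assumption chosen to match the DeepSVDD loss), and the threshold is the mean of the training differences plus their variance.

```python
import numpy as np

def anomaly_scores(outputs, c):
    """Squared Euclidean distance from c (one possible reading of the difference)."""
    return np.sum((outputs - c) ** 2, axis=1)

def fit_threshold(train_scores):
    """Threshold = mean + variance of the scores of the normal target-domain data."""
    mu = train_scores.mean()
    sigma2 = train_scores.var()  # population variance (NumPy default, ddof=0)
    return mu + sigma2

c = np.zeros(2)

# Hypothetical learned outputs for the normal data in D_T.
train_outputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
threshold = fit_threshold(anomaly_scores(train_outputs, c))  # 4/3 + 2/9 = 14/9

# Hypothetical learned outputs for two new items to be checked.
new_outputs = np.array([[0.5, 0.5], [3.0, 0.0]])
is_abnormal = anomaly_scores(new_outputs, c) > threshold  # array([False,  True])
```

    Only normal target-domain data is needed to set the threshold, which is consistent with the unsupervised setting of the embodiment.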

    <Hardware Configuration of Anomaly Detecting Apparatus 10>

    [0036] Next, a hardware configuration of an anomaly detecting apparatus 10 according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the hardware configuration of the anomaly detecting apparatus 10 according to this embodiment.

    [0037] As illustrated in FIG. 2, the anomaly detecting apparatus 10 according to the embodiment is realized by a hardware configuration of a general computer or computer system and includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These devices of hardware are communicatively connected to each other via a bus 107.

    [0038] The input device 101 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 102 is, for example, a display or the like.

    [0039] The external I/F 103 is an interface with an external device such as a recording medium 103a. The anomaly detecting apparatus 10 can read and write in the recording medium 103a via the external I/F 103. Further, examples of the recording medium 103a include a compact disc (CD), a digital versatile disk (DVD), a secure digital memory card (SD memory card), and a universal serial bus (USB) memory card.

    [0040] The communication I/F 104 is an interface for connecting the anomaly detecting apparatus 10 to a communication network. The processor 105 is, for example, any of various arithmetic devices such as a central processing unit (CPU) and a graphics processing unit (GPU). The memory device 106 is any of various storage devices such as a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read only memory (ROM), and a flash memory.

    [0041] The anomaly detecting apparatus 10 according to the embodiment has the hardware configuration illustrated in FIG. 2, thereby being able to realize various types of processes to be described below. Further, the hardware configuration illustrated in FIG. 2 is an example, and the anomaly detecting apparatus 10 may have another hardware configuration. For example, the anomaly detecting apparatus 10 may include a plurality of the processors 105 or may include a plurality of the memory devices 106.

    <Functional Configuration of Anomaly Detecting Apparatus 10>

    [0042] Next, a functional configuration of the anomaly detecting apparatus 10 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram illustrating an example of the functional configuration of the anomaly detecting apparatus 10 according to the embodiment.

    [0043] As illustrated in FIG. 3, the anomaly detecting apparatus 10 according to the embodiment includes a learning unit 201, an inference unit 202, and a user interface unit 203. Each of these units is realized, for example, by processes executed by the processor 105 according to one or more programs installed in the anomaly detecting apparatus 10.

    [0044] In addition, the anomaly detecting apparatus 10 according to the embodiment includes a target domain DB 204, a source domain DB 205, and a learned model DB 206. Each of these DBs (databases) is realized by, for example, the memory device 106.

    [0045] The learning unit 201 learns the model illustrated in FIG. 1 (that is, the model including the encoders $E_S$ and $E_T$, the DeepSVDD DSVDD, and the discriminator D), using the m-dimensional vector data $x_t$ stored in the target domain DB 204 and the n-dimensional vector data $x_s$ stored in the source domain DB 205. The model learned by the learning unit 201 (hereinafter, also referred to as a learned model) is stored in the learned model DB 206.

    [0046] The inference unit 202 uses $\mathrm{DSVDD}(E_T(\cdot; \theta_T); \theta_{\mathrm{DSVDD}})$ included in the learned model stored in the learned model DB 206 as an anomaly detector, and determines whether or not an anomaly has occurred in the target system, using the anomaly detector and the m-dimensional vector data $x$ that is the anomaly detection target.

    [0047] The user interface unit 203 outputs the determination result by the inference unit 202 to the user. For example, the user interface unit 203 outputs the determination result to a terminal or the like used by an operator or the like of the target system.

    [0048] The target domain DB 204 stores a data set D.sub.T of the target domain T. The source domain DB 205 stores a data set D.sub.S of the source domain S. The learned model DB 206 stores a learned model.

    [0049] Further, the functional configuration of the anomaly detecting apparatus 10 illustrated in FIG. 3 is an example, and other functional configurations may be used. For example, each functional unit and each DB may be arranged in a plurality of devices.

    <Flow of Overall Processes Executed by Anomaly Detecting Apparatus 10>

    [0050] Next, a flow of overall processes executed by the anomaly detecting apparatus 10 according to the embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart illustrating an example of a flow of the overall processes executed by the anomaly detecting apparatus 10 according to the embodiment. Here, Step S101 in FIG. 4 is a process in a learning phase, and Steps S102 and S103 are processes in an inference phase. Further, the learning phase is a phase in which a model is learned, and the inference phase is a phase in which inference (that is, anomaly detection) is performed using a learned model. The learning phase is performed in advance before the inference phase.

    [0051] Step S101: The learning unit 201 learns the model illustrated in FIG. 1 using the m-dimensional vector data $x_t$ stored in the target domain DB 204 and the n-dimensional vector data $x_s$ stored in the source domain DB 205. That is, the learning unit 201 learns the parameters of the model by Expression 6, by using an optimization technique such as Adam.

    [0052] Step S102: The inference unit 202 uses $\mathrm{DSVDD}(E_T(\cdot; \theta_T); \theta_{\mathrm{DSVDD}})$ included in the learned model stored in the learned model DB 206 as an anomaly detector, and determines whether or not an anomaly has occurred in the target system, using the anomaly detector and the m-dimensional vector data $x$ of the anomaly detection target. That is, the inference unit 202 determines that an anomaly has occurred when the difference between $\mathrm{DSVDD}(E_T(x; \theta_T); \theta_{\mathrm{DSVDD}})$ and $c$ exceeds a predetermined threshold, and otherwise determines that the target system is normal.

    [0053] Step S103: The user interface unit 203 outputs the determination result (normal or abnormal) of Step S102 to the user. Further, the user interface unit 203 may output the result to the user only when the determination result of Step S102 is abnormal.
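
    The inference phase (Steps S102 and S103) can be pictured with the following sketch. The class `AnomalyDetector`, the function `report`, and the stand-in `embed` function are hypothetical names introduced only for this illustration; in the embodiment, the embedding would be the learned DeepSVDD applied to the output of the target-domain encoder.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np

@dataclass
class AnomalyDetector:
    embed: Callable[[np.ndarray], np.ndarray]  # stands in for the learned DSVDD(E_T(.))
    c: np.ndarray                              # center fixed before learning
    threshold: float                           # predetermined threshold

    def infer(self, x: np.ndarray) -> bool:
        """Step S102: abnormal if the squared distance from c exceeds the threshold."""
        score = float(np.sum((self.embed(x) - self.c) ** 2))
        return score > self.threshold

def report(is_abnormal: bool, only_abnormal: bool = False) -> Optional[str]:
    """Step S103: return the result to show the user, or None if output is suppressed."""
    if only_abnormal and not is_abnormal:
        return None
    return "abnormal" if is_abnormal else "normal"

# Hypothetical stand-in embedding: keep the first two components of the input.
detector = AnomalyDetector(embed=lambda x: x[:2], c=np.zeros(2), threshold=1.0)

print(report(detector.infer(np.array([0.5, 0.5, 9.0]))))  # score 0.5
print(report(detector.infer(np.array([2.0, 0.0, 0.0]))))  # score 4.0
```

    Setting `only_abnormal=True` corresponds to the variation in which the result is output to the user only when an anomaly is detected.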

    [0054] As described above, even when there is only a small amount of normal data of the target system, the anomaly detecting apparatus 10 according to the embodiment can detect an anomaly of the target system by the DL-based unsupervised anomaly detecting technique, by transferring information on the normal state of an ICT system having a large amount of normal data.

    [0055] Further, as described above, the anomaly detecting apparatus 10 has the learning phase and the inference phase, and in the embodiment, the same anomaly detecting apparatus 10 executes the learning phase and the inference phase, but these phases may be executed by different respective apparatuses. In addition, the anomaly detecting apparatus 10 in the learning phase may be referred to as a learning apparatus or the like.

    [0056] The present invention is not limited to the above-mentioned specifically disclosed embodiments, and various modifications and changes, combinations with known technologies, and the like can be made without departing from the scope of the claims.

    REFERENCE LITERATURE

    [0057] Reference Literature 1: Ruff, Lukas, et al. Deep one-class classification. International conference on machine learning. PMLR, 2018.

    [0058] Reference Literature 2: Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M. Domain adversarial neural networks. arXiv preprint arXiv:1412.4446 (2014).

    [0059] Reference Literature 3: Ruder, Sebastian. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016).

    REFERENCE SIGNS LIST

    [0060] 10 Anomaly detecting apparatus
    [0061] 101 Input device
    [0062] 102 Display device
    [0063] 103 External I/F
    [0064] 103a Recording medium
    [0065] 104 Communication I/F
    [0066] 105 Processor
    [0067] 106 Memory device
    [0068] 107 Bus
    [0069] 201 Learning unit
    [0070] 202 Inference unit
    [0071] 203 User interface unit
    [0072] 204 Target domain DB
    [0073] 205 Source domain DB
    [0074] 206 Learned model DB