METHOD FOR DETECTING ABNORMAL SESSION
20190095301 · 2019-03-28
Inventors
- Sang Gyoo SIM (Seoul, KR)
- Duk Soo KIM (Seoul, KR)
- Seok Woo LEE (Seoul, KR)
- Seung Young PARK (Chuncheon-si, KR)
Cpc classification
International classification
G06F11/22
PHYSICS
Abstract
Provided is a method for detecting an abnormal session including a request message received by a server from a client and a response message generated by the server, the method including transforming at least a part of messages included in the session into data in the form of a matrix, transforming the data in the form of the matrix into a representation vector, a dimension of which is lower than a dimension of the matrix of the data, using a convolutional neural network, and determining whether the session is abnormal by arranging the representation vectors obtained from the messages in an order in which the messages are generated to compose a first representation vector sequence, and analyzing the first representation vector sequence using a long short-term memory (LSTM) neural network.
Claims
1. A method for detecting an abnormal session including a request message received by a server from a client and a response message generated by the server, the method comprising: transforming at least a part of messages included in the session into data in the form of a matrix; transforming the data in the form of the matrix into a representation vector, a dimension of which is lower than a dimension of the matrix of the data, using a convolutional neural network; and determining whether the session is abnormal by arranging the representation vectors obtained from the messages in an order in which the messages are generated to compose a first representation vector sequence, and analyzing the first representation vector sequence using a long short-term memory (LSTM) neural network, wherein the determining of whether the session is abnormal includes determining whether the session is abnormal on the basis of a difference between the first representation vector sequence and the second representation vector sequence.
2. The method of claim 1, wherein the transforming of the at least a part of the messages into the data in the form of the matrix includes transforming each of the messages into data in the form of a matrix by transforming a character included in each of the messages into a one-hot vector.
3. The method of claim 1, wherein the LSTM neural network includes an LSTM encoder including a plurality of LSTM layers and an LSTM decoder having a structure symmetrical to the LSTM encoder.
4. The method of claim 3, wherein the LSTM encoder sequentially receives the representation vectors included in the first representation vector sequence and outputs a hidden vector having a predetermined magnitude, and the LSTM decoder receives the hidden vector and outputs a second representation vector sequence corresponding to the first representation vector sequence.
5. (canceled)
6. The method of claim 4, wherein the LSTM decoder outputs the second representation vector sequence by outputting estimation vectors, each corresponding to one of the representation vectors included in the first representation vector sequence, in a reverse order to an order of the representation vectors included in the first representation vector sequence.
7. The method of claim 1, wherein the LSTM neural network sequentially receives the representation vectors included in the first representation vector sequence and outputs an estimation vector with respect to a representation vector immediately following the received representation vector.
8. The method of claim 7, wherein the determining of whether the session is abnormal includes determining whether the session is abnormal on the basis of a difference between the estimation vector output by the LSTM neural network and the representation vector received by the LSTM neural network.
9. The method of claim 1, further comprising training the convolutional neural network and the LSTM neural network.
10. The method of claim 9, wherein the convolutional neural network is trained by: inputting training data to the convolutional neural network; inputting an output of the convolutional neural network to a symmetric neural network having a structure symmetrical to the convolutional neural network; and updating weight parameters used in the convolutional neural network on the basis of a difference between the output of the symmetric neural network and the training data.
11. The method of claim 9, wherein the LSTM neural network includes an LSTM encoder including a plurality of LSTM layers and an LSTM decoder having a structure symmetrical to the LSTM encoder, and the LSTM neural network is trained by: inputting training data to the LSTM encoder; inputting a hidden vector output from the LSTM encoder and the training data to the LSTM decoder; and updating weight parameters used in the LSTM encoder and the LSTM decoder on the basis of a difference between an output of the LSTM decoder and the training data.
12-18. (canceled)
Description
BRIEF DESCRIPTION OF DRAWINGS
[0028] Example embodiments of the present invention will become more apparent by describing example embodiments of the present invention in detail with reference to the accompanying drawings, in which:
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
DETAILED DESCRIPTION
[0046] While the present invention is susceptible to various modifications and alternative embodiments, specific embodiments thereof are shown by way of example in the drawings and will be described. However, it should be understood that there is no intention to limit the present invention to the particular embodiments disclosed, but on the contrary, the present invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention.
[0047] It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, the elements should not be limited by the terms. The terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present invention. As used herein, the term and/or includes any and all combinations of one or more of the associated listed items.
[0048] It will be understood that when an element is referred to as being connected or coupled to another element, it can be directly connected or coupled to another element or intervening elements may be present. In contrast, when an element is referred to as being directly connected or directly coupled to another element, there are no intervening elements present.
[0049] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms a, an, and the are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms comprises, comprising, includes, and/or including, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0050] Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[0051] Hereinafter, example embodiments of the present invention will be described in detail with reference to the accompanying drawings. For better understanding of the present invention, the same reference numerals are used to refer to the same elements throughout the description of the figures, and repeated description of the same elements will be omitted.
[0052]
[0053] The apparatus 100 shown in
[0054] Referring to
[0055] The processor 110 may execute a program command stored in the memory 120 and/or the storage device 125. The processor 110 may refer to a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor by which the methods according to the present invention are performed. The memory 120 and the storage device 160 may include a volatile storage medium and/or a non-volatile storage medium. For example, the memory 120 may include a read only memory (ROM) and/or a random-access memory (RAM).
[0056] The memory 120 may store at least one command that is executed by the processor 110.
[0057] The commands stored in the memory 120 may be updated through machine learning of the processor 110. The processor 110 may change commands stored in memory through machine learning. The machine learning performed by the processor 110 may be implemented in a supervised learning method or an unsupervised learning method. However, the example embodiment is not limited thereto. For example, the machine learning may be implemented in other methods such as a reinforcement learning method and the like.
[0058]
[0059] Referring to
[0060]
[0061] Referring to
[0062] Referring again to
[0063] The processor 110 may transform each of the extracted messages into data in the form of a matrix. The processor 110 may transform a character included in each of the messages into a one-hot vector.
[0064]
[0065] Referring to
[0066] The one-hot vector may include only one component having a value of one and the remaining components having a value of zero, or may include all components having a value of zero. In the one-hot vector, the position of a component having a value of 1 may vary with the type of the character represented by the one hot vector. For example, as shown in
[0067] In the one-hot vector, the position of a component having a value of 1 may vary with the order of the character represented by the one-hot vector.
[0068] When the total number of character types is $F^{(0)}$ (e.g., 69: twenty-six alphabetic characters, ten digits from zero to nine, a new line, and thirty-three special characters), the processor 110 may transform each message into a matrix having a magnitude of $F^{(0)} \times L^{(0)}$. When the length of the message is smaller than $L^{(0)}$, any missing representation vectors may be set to zero representation vectors. As another example, when the length of the message is larger than $L^{(0)}$, only $L^{(0)}$ of the characters may be transformed to one-hot vectors.
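As a minimal illustrative sketch, not part of the disclosed embodiments, the transformation of a message into matrix data may be written as follows. The short alphabet, the function name `message_to_matrix`, the choice to keep the first characters of an over-long message, and the handling of unknown characters as all-zero columns are assumptions made only for illustration.

```python
from typing import List

# Hypothetical alphabet for illustration only; the description gives the
# number of character types but does not list them.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789\n"


def message_to_matrix(message: str, length: int,
                      alphabet: str = ALPHABET) -> List[List[int]]:
    """Return an F x L matrix (F rows, L columns) of one-hot columns."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * length for _ in range(len(alphabet))]
    # Assumption: characters beyond `length` are dropped from the end.
    # Positions past the end of a short message stay all-zero.
    for pos, ch in enumerate(message[:length]):
        row = index.get(ch)
        if row is not None:  # unknown characters remain all-zero columns
            matrix[row][pos] = 1
    return matrix


# A two-character message padded to length 4 over a three-letter alphabet.
example = message_to_matrix("ab", 4, "abc")
```

Each column of the result corresponds to one character position, so a short message leaves trailing zero columns and a long message is cut to the fixed width.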
[0069] Referring again to
[0070]
[0071] Referring to
[0072] The convolutional neural network may extract a feature of input data, generate output data having a scale smaller than that of the input data, and output the generated output data. The convolutional neural network may receive data in the form of an image or a matrix.
[0073] The convolution and pooling layer may receive matrix data and perform the convolution operation on the received matrix data.
[0074]
[0075] Referring to
[0076] The processor 110 may perform the convolution operation on the image 0I while changing the position of the kernel FI on the image 0I. The processor 110 may output a convolution image from the calculated convolution values.
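As a minimal illustrative sketch, the sliding of a kernel over an image may be written as follows. A stride of one, no padding, and no kernel flipping are assumptions; the function name `convolve2d_valid` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` and return the map of convolution values.

    Assumptions: stride 1, no padding, and the kernel is applied without
    flipping (the usual convention in convolutional neural networks).
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output: Matrix = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            # Sum of element-wise products at this kernel position.
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature_map = convolve2d_valid(image, kernel)  # 2 x 2 result
```

Under these assumptions, a 3 x 3 input and a 2 x 2 kernel give 2 x 2 kernel positions, hence a 2 x 2 output.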
[0077]
[0078] Since the number of cases in which the filter kernel FI shown in
[0079]
[0080] In
[0081] The convolution and pooling layer Layer 1 may perform a pooling operation on each of the feature maps output by the convolution operation, thereby reducing the size of the feature map. The pooling operation may be an operation of merging adjacent pixels in the feature map to obtain a single representative value. According to the pooling operation in the convolution and pooling layer, the size of the feature map may be reduced.
[0082] The representative value may be obtained by various methods. For example, the processor 110 may determine the maximum value among the values of $p \times q$ adjacent pixels in the feature map to be the representative value. As another example, the processor 110 may determine the average of the values of $p \times q$ adjacent pixels in the feature map to be the representative value.
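As a minimal illustrative sketch, the pooling operation over windows of p by q adjacent pixels may be written as follows. Non-overlapping windows and the dropping of incomplete windows at the border are assumptions; the function name `pool2d` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, p: int, q: int, mode: str = "max") -> Matrix:
    """Reduce each p x q window of `feature_map` to one representative value.

    Assumption: windows do not overlap, and rows or columns that do not
    fill a complete window are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled: Matrix = []
    for r in range(0, rows - p + 1, p):
        pooled_row = []
        for c in range(0, cols - q + 1, q):
            window = [feature_map[r + i][c + j]
                      for i in range(p) for j in range(q)]
            if mode == "max":
                pooled_row.append(max(window))
            elif mode == "average":
                pooled_row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(pooled_row)
    return pooled


fm = [[1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0],
      [9.0, 10.0, 11.0, 12.0],
      [13.0, 14.0, 15.0, 16.0]]
max_pooled = pool2d(fm, 2, 2, "max")
avg_pooled = pool2d(fm, 2, 2, "average")
```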
[0083] Referring again to
a.sub.k.sup.(N.sup.
[0084] The feature maps output from the last convolution and pooling layer Layer Nc may be input to the first fully connected layer Layer N.sub.c+1. The first fully connected layer may transform the received feature maps to a one-dimensional representation vector a.sup.(N.sup.
[0085] The first fully connected layer may multiply the transformed one-dimensional representation vector by a weight matrix. For example, the operation performed by the first fully connected layer may be represented by Equation 1.
[0086] In Equation 1, W.sup.(N.sup.
[0087] Referring to Equation 1, the first fully connected layer may output the representation vector having a magnitude of A.sup.N.sup.
[0088] Referring to
[0089] In Equation 2, a.sup.(1)(t) denotes an output representation vector of the first fully connected layer. w.sup.(l)(t, u) denotes the weight matrix used by the first fully connected layer. .sup.(l) denotes an activation function used by the l.sup.th fully connected layer. a.sup.(tl)(u) denotes the output representation vector of a l1.sup.th fully connected layer, and may be an input representation vector for the first fully connected layer.
[0090] An output layer may receive an output representation vector .sup.a.sup.
[0091] In Equation 3, x.sup.(N.sup.
[0092] The output layer may calculate final output values for the classes of the output representation vector z.sup.(N.sup.
{circumflex over ()}(t)=.sup.N.sup.
[0093] In Equation 4, .sup.(N.sup.
[0094] As another example, the output layer may calculate the final output value using a softmax function. The process of calculating the final output representation vector in the output layer may be expressed by Equation 5.
[0095] Referring to Equation 5, the output layer may calculate the final output value using an exponential function for a class value of the output representation vector.
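As a minimal illustrative sketch, a softmax computation of the kind described above may be written as follows. Subtracting the maximum before exponentiation is a common numerical-stability choice and is an assumption of this sketch, not something stated in the description; it does not change the result.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Map class values to outputs that are positive and sum to one."""
    shift = max(values)  # stability shift (assumption of this sketch)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

A larger class value always yields a larger output, and equal class values yield equal outputs.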
[0096] With the index $c$ shown in Equations 3 to 5 taking $C$ values, the convolutional neural network may output the representation vector having a magnitude of $C \times 1$. That is, the convolutional neural network may receive matrix data having a magnitude of $F^{(0)} \times L^{(0)}$ and output the representation vector having a magnitude of $C \times 1$.
[0097] The convolutional neural network may also be trained by an unsupervised learning method. The training method for the convolutional neural network will be described below with reference to
[0098] Referring again
$x_0, x_1, \ldots, x_{S-1}$
[0099] $x_t$ may denote a representation vector generated from the $t$-th message of the session (a request message or a response message).
[0100] In operation S160, the processor 110 may determine whether the session is abnormal by analyzing the first representation vector sequence. The processor 110 may analyze the first representation vector sequence using a long short-term memory (LSTM) neural network. The LSTM neural network may avoid the long-term dependency problem of a recurrent neural network (RNN) by selectively updating a cell state in which information is stored. Hereinafter, the LSTM neural network will be described.
[0101]
[0102] Referring to
[0103] An $n$-th layer may receive a hidden vector $h_t^{(n-1)}$ from an $(n-1)$-th layer. The $n$-th layer may output a hidden vector $h_t^{(n)}$ by using the hidden vector $h_{t-1}^{(n)}$ with respect to a previous representation vector and the hidden vector $h_t^{(n-1)}$ received from the $(n-1)$-th layer.
[0104] Hereinafter, an operation of each of the layers of the LSTM neural network will be described. In the following description, the operations of the layers will be described with reference to the 0-th layer. The $n$-th layer may operate in a manner similar to the 0-th layer, except that it receives the hidden vector $h_t^{(n-1)}$ instead of the representation vector $x_t$.
[0105]
[0106] Referring to
[0107] The forget gate 810 may calculate $f_t$ by using a $t$-th representation vector $x_t$, a previous cell state $c_{t-1}$, and a hidden vector $h_{t-1}$ with respect to a previous representation vector. In calculating $f_t$, the forget gate 810 may determine which of the existing information is to be discarded and the extent to which it is discarded. The forget gate 810 may calculate $f_t$ using Equation 6.
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$ [Equation 6]
[0108] In Equation 6, $\sigma$ denotes a sigmoid function. $b_f$ denotes a bias. $W_{xf}$ denotes a weight for $x_t$, $W_{hf}$ denotes a weight for $h_{t-1}$, and $W_{cf}$ denotes a weight for $c_{t-1}$.
[0109] The input gate 850 may determine new information which is to be reflected in the cell state. The input gate 850 may calculate the new information to be reflected in the cell state using Equation 7.
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$ [Equation 7]
[0110] In Equation 7, $\sigma$ denotes a sigmoid function. $b_i$ denotes a bias. $W_{xi}$ denotes a weight for $x_t$, $W_{hi}$ denotes a weight for $h_{t-1}$, and $W_{ci}$ denotes a weight for $c_{t-1}$.
[0111] The input gate 850 may calculate a candidate value $\tilde{c}_t$ for a new cell state $c_t$. The input gate 850 may calculate the candidate value $\tilde{c}_t$ using Equation 8.
$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$ [Equation 8]
[0112] In Equation 8, $b_c$ denotes a bias. $W_{xc}$ denotes a weight for $x_t$, and $W_{hc}$ denotes a weight for $h_{t-1}$.
[0113] The cell line may calculate the new cell state $c_t$ using $f_t$, $i_t$, and $\tilde{c}_t$.
[0114] For example, $c_t$ may be calculated by Equation 9.
$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$ [Equation 9]
[0115] Referring to Equation 8, Equation 9 may be expressed as Equation 10.
$c_t = f_t * c_{t-1} + i_t * \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$ [Equation 10]
[0116] The output gate 860 may calculate an output value using the cell state $c_t$. For example, the output gate 860 may calculate the output value according to Equation 11.
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$ [Equation 11]
[0117] In Equation 11, $\sigma$ denotes a sigmoid function. $b_o$ denotes a bias. $W_{xo}$ denotes a weight for $x_t$, $W_{ho}$ denotes a weight for $h_{t-1}$, and $W_{co}$ denotes a weight for $c_t$.
[0118] The LSTM layer may calculate the hidden vector $h_t$ for the representation vector $x_t$ using the output value $o_t$ and the new cell state $c_t$. For example, $h_t$ may be calculated according to Equation 12.
$h_t = o_t * \tanh(c_t)$ [Equation 12]
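As a minimal illustrative sketch, one time step of a single LSTM layer following Equations 6 to 12 may be written as follows. The parameter dictionary and its key names, the use of full matrices for the cell-state weights, and the reading of the products in Equations 9 and 12 as element-wise are assumptions of this sketch.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def mul(a: Vector, b: Vector) -> Vector:
    """Element-wise product."""
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh_vec(v: Vector) -> Vector:
    return [math.tanh(a) for a in v]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """Return (h_t, c_t) for one input x_t; `p` holds hypothetical keys."""
    # Equation 6: forget gate
    f_t = sigmoid(add(matvec(p["W_xf"], x_t), matvec(p["W_hf"], h_prev),
                      matvec(p["W_cf"], c_prev), p["b_f"]))
    # Equation 7: input gate
    i_t = sigmoid(add(matvec(p["W_xi"], x_t), matvec(p["W_hi"], h_prev),
                      matvec(p["W_ci"], c_prev), p["b_i"]))
    # Equation 8: candidate cell state
    c_tilde = tanh_vec(add(matvec(p["W_xc"], x_t),
                           matvec(p["W_hc"], h_prev), p["b_c"]))
    # Equation 9: new cell state
    c_t = add(mul(f_t, c_prev), mul(i_t, c_tilde))
    # Equation 11: output gate, which uses the new cell state c_t
    o_t = sigmoid(add(matvec(p["W_xo"], x_t), matvec(p["W_ho"], h_prev),
                      matvec(p["W_co"], c_t), p["b_o"]))
    # Equation 12: hidden vector
    h_t = mul(o_t, tanh_vec(c_t))
    return h_t, c_t
```

Note that the output gate is evaluated after the cell state is updated, because Equation 11 depends on the new cell state rather than the previous one.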
[0119] The LSTM neural network may include an LSTM encoder and an LSTM decoder having a structure symmetrical to the LSTM encoder. The LSTM encoder may receive the first representation vector sequence and output a hidden vector having a predetermined magnitude. The LSTM decoder may receive the hidden vector output from the LSTM encoder. The LSTM decoder may use, without change, the same weight matrices and bias values as those used in the LSTM encoder. The LSTM decoder may output a second representation vector sequence corresponding to the first representation vector sequence. The second representation vector sequence may include estimation vectors corresponding to the representation vectors included in the first representation vector sequence. The LSTM decoder may output the estimation vectors in the reverse order, that is, in the order reverse to the order of the representation vectors in the first representation vector sequence.
[0120]
[0121] Referring to
[0122] Upon receiving the last representation vector x.sub.(S1) of the first representation vector sequence, the LSTM encoder may output hidden vectors h.sub.(S1).sup.(0) to h.sub.(S1).sup.(N.sup.
[0123]
[0124] The LSTM decoder may receive the hidden vectors h.sub.(S1).sup.(0) to h.sub.(S1).sup.(N.sup.
[0125] The LSTM decoder may output the second representation vector sequence $\hat{x}_{S-1}, \hat{x}_{S-2}, \ldots, \hat{x}_0$ including estimation vectors with respect to the first representation vector sequence $x_0, x_1, \ldots, x_{S-1}$. The LSTM decoder may output the estimation vectors in the reverse order (the order reverse to the order of the representation vectors in the first representation vector sequence).
[0126] The LSTM decoder may output hidden vectors h.sub.(S2).sup.(0) to h.sub.(S2).sup.(N.sup.
[0127] When the LSTM decoder outputs the second representation vector sequence $\hat{x}_{S-1}, \hat{x}_{S-2}, \ldots, \hat{x}_0$, the processor 110 may compare the second representation vector sequence with the first representation vector sequence. For example, the processor 110 may determine whether the session is abnormal using Equation 13.
[0128] In Equation 13, $S$ denotes the number of messages (request messages or response messages) extracted from the session. $x_t$ is the representation vector obtained from the $t$-th message, and $\hat{x}_t$ is the estimation vector that is output by the LSTM decoder and corresponds to $x_t$. The processor 110 may determine whether the difference between the first representation vector sequence and the second representation vector sequence is smaller than a predetermined reference value. When the difference between the first and second representation vector sequences is greater than the reference value, the processor 110 may determine that the session is abnormal.
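As a minimal illustrative sketch, the comparison of the two sequences against a reference value may be written as follows. Equation 13 is not reproduced in this text, so the mean squared Euclidean distance used here is an assumed difference measure, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def sequence_difference(first: List[Vector], second: List[Vector]) -> float:
    """Mean squared Euclidean distance between aligned sequences.

    Assumption: this measure stands in for the difference of Equation 13.
    """
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for x_t, x_hat_t in zip(first, second):
        total += sum((a - b) ** 2 for a, b in zip(x_t, x_hat_t))
    return total / len(first)


def is_abnormal(first: List[Vector], decoder_output: List[Vector],
                reference_value: float) -> bool:
    """Flag the session when the difference exceeds the reference value."""
    # The decoder emits estimates in reverse order, so re-align them first.
    second = list(reversed(decoder_output))
    return sequence_difference(first, second) > reference_value


first_sequence = [[0.0, 0.0], [1.0, 1.0]]
good_decoder_output = [[1.0, 1.0], [0.0, 0.0]]   # reversed, exact estimates
poor_decoder_output = [[1.0, 1.0], [3.0, 4.0]]   # estimate of x_0 is far off
```

Because the estimates arrive in reverse order, omitting the re-alignment step would compare each message against the estimate of a different message.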
[0129] In the above description, an example has been described in which the LSTM neural network includes an LSTM encoder and an LSTM decoder. However, the example embodiment is not limited thereto. For example, the LSTM neural network may directly output an estimated vector.
[0130]
[0131] Referring to
[0132] For example, the LSTM neural network may receive $x_0$ and output an estimation vector $\hat{x}_1$ with respect to $x_1$. Similarly, the LSTM neural network may receive $x_{t-1}$ and output $\hat{x}_t$. The processor 110 may determine whether the session is abnormal based on the difference between the estimation vectors $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{S-1}$ output by the LSTM neural network and the representation vectors $x_1, x_2, \ldots, x_{S-1}$ received by the LSTM neural network. For example, the processor 110 may determine whether the session is abnormal using Equation 14.
[0133] The processor 110 may determine whether the difference between the representation vectors $x_1, x_2, \ldots, x_{S-1}$ and the estimation vectors $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{S-1}$ is smaller than a predetermined reference value. When the difference is greater than the reference value, the processor 110 may determine that the session is abnormal.
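As a minimal illustrative sketch, the next-vector estimation variant may be scored as follows. Equation 14 is not reproduced in this text, so the mean squared Euclidean distance is an assumed measure, and the `predict_next` callable is a hypothetical stand-in for the trained network.

```python
from typing import Callable, List

Vector = List[float]


def prediction_difference(
        sequence: List[Vector],
        predict_next: Callable[[List[Vector]], Vector]) -> float:
    """Average squared error between each x_t (t >= 1) and its estimate.

    `predict_next` receives x_0 .. x_(t-1) and returns an estimate of x_t.
    """
    if len(sequence) < 2:
        raise ValueError("at least two representation vectors are required")
    total = 0.0
    for t in range(1, len(sequence)):
        x_hat_t = predict_next(sequence[:t])
        total += sum((a - b) ** 2 for a, b in zip(sequence[t], x_hat_t))
    return total / (len(sequence) - 1)


def repeat_last(prefix: List[Vector]) -> Vector:
    """Hypothetical predictor: assume the next vector equals the last one."""
    return prefix[-1]


steady = prediction_difference([[0.0], [0.0], [0.0]], repeat_last)
jump = prediction_difference([[0.0], [0.0], [5.0]], repeat_last)
```

Only the vectors from the second message onward are scored, since no estimate exists for the first message in this variant.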
[0134] In the above description, an example in which the processor 110 determines whether the session is abnormal using the LSTM neural network has been described. However, the example embodiment is not limited thereto. For example, in operation S160, the processor 110 may determine whether the session is abnormal using a gated recurrent unit (GRU) neural network.
[0135]
[0136] Referring to
[0137] An $n$-th layer may receive $s_t^{(n-1)}$ from an $(n-1)$-th layer. As another example, the $n$-th layer may receive $s_t^{(n-1)}$ and $x_t$ from the $(n-1)$-th layer. The $n$-th layer may output a hidden vector $s_t^{(n)}$ by using a hidden vector $s_{t-1}^{(n)}$ with respect to a previous representation vector and the hidden vector $s_t^{(n-1)}$ received from the $(n-1)$-th layer.
[0138] Hereinafter, an operation of each of the layers of the GRU neural network will be described. In the following description, an operation of the layer will be described with reference to the 0-th layer. The $n$-th layer operates in a manner similar to the 0-th layer, except that it receives the hidden vector $s_t^{(n-1)}$, or both the hidden vector $s_t^{(n-1)}$ and the representation vector $x_t$, instead of receiving only the representation vector $x_t$.
[0139]
[0140] Referring to
[0141] For example, the reset gate r may calculate a reset parameter $r$ using Equation 15.
$r = \sigma(x_t U^r + s_{t-1} W^r)$ [Equation 15]
[0142] In Equation 15, $\sigma$ denotes a sigmoid function. $U^r$ denotes a weight for $x_t$, and $W^r$ denotes a weight for $s_{t-1}$.
[0143] For example, the update gate z may calculate an update parameter $z$ using Equation 16.
$z = \sigma(x_t U^z + s_{t-1} W^z)$ [Equation 16]
[0144] In Equation 16, $\sigma$ denotes a sigmoid function. $U^z$ denotes a weight for $x_t$, and $W^z$ denotes a weight for $s_{t-1}$.
[0145] The GRU layer may calculate an estimated value $h$ for a new hidden vector according to Equation 17.
$h = \tanh(x_t U^h + (s_{t-1} * r) W^h)$ [Equation 17]
[0146] In Equation 17, denotes a sigmoid function. U.sup.h denotes a weight for .sup.x.sup.
[0147] The GRU layer may calculate a hidden vector $s_t$ for $x_t$ by using $h$ calculated in Equation 17. For example, the GRU layer may calculate the hidden vector $s_t$ for $x_t$ by using Equation 18.
$s_t = (1 - z) * h + z * s_{t-1}$ [Equation 18]
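As a minimal illustrative sketch, one time step of a single GRU layer following Equations 15 to 18 may be written as follows. The sketch assumes that the two terms inside each gate are summed, that the products with the reset and update parameters are element-wise, and that vectors are row vectors multiplied on the left of each weight matrix; the parameter key names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def vecmat(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix, matching the x_t U form of the equations."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def mul(a: Vector, b: Vector) -> Vector:
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_step(x_t: Vector, s_prev: Vector, p: Dict[str, Matrix]) -> Vector:
    """Return the hidden vector s_t for input x_t and previous state."""
    # Equation 15: reset parameter
    r = sigmoid(add(vecmat(x_t, p["U_r"]), vecmat(s_prev, p["W_r"])))
    # Equation 16: update parameter
    z = sigmoid(add(vecmat(x_t, p["U_z"]), vecmat(s_prev, p["W_z"])))
    # Equation 17: estimated value for the new hidden vector
    h = [math.tanh(a) for a in add(vecmat(x_t, p["U_h"]),
                                   vecmat(mul(s_prev, r), p["W_h"]))]
    # Equation 18: blend the estimate with the previous hidden vector
    one_minus_z = [1.0 - a for a in z]
    return add(mul(one_minus_z, h), mul(z, s_prev))
```

With this form of Equation 18, an update parameter close to one keeps the previous hidden vector, and a value close to zero replaces it with the new estimate.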
[0148] The GRU neural network may operate in a similar manner as that in the operation of the LSTM neural network, except for the configuration of each layer. For example, the example embodiments of the LSTM neural network shown in
[0149] For example, the GRU neural network may include a GRU encoder and a GRU decoder similar to that shown in
[0150] The GRU decoder may output a second representation vector sequence $\hat{x}_{S-1}, \hat{x}_{S-2}, \ldots, \hat{x}_0$ including estimation vectors with respect to $x_0, x_1, \ldots, x_{S-1}$. The GRU decoder may use, without change, the same weight matrices and bias values as those used in the GRU encoder. The GRU decoder may output the estimation vectors in the reverse order (the order reverse to the order of the representation vectors in the first representation vector sequence).
[0151] The processor 110 may compare the first representation vector sequence with the second representation vector sequence using Equation 13, thereby determining whether the session is abnormal.
[0152] As another example, the GRU neural network may not be divided into an encoder and a decoder. For example, the GRU neural network may directly output estimated vectors as described with reference to
[0153] The GRU neural network may receive $x_0$ and output an estimation vector $\hat{x}_1$ for $x_1$. Similarly, the GRU neural network may receive $x_{t-1}$ and output $\hat{x}_t$.
[0154]
[0155] In the following description of the example embodiment of
[0156] Referring to
[0157] For example, the processor 110 may train the convolutional neural network in an unsupervised learning method. As another example, when training data including messages and output representation vectors labeled on the messages exists, the processor 110 may train the convolutional neural network in a supervised learning method.
[0158] In the case of an unsupervised learning, the processor 110 may connect a symmetric neural network having a structure symmetrical to the convolutional neural network to the convolutional neural network. The processor 110 may input the output of the convolutional neural network to the symmetric neural network.
[0159]
[0160] Referring to
[0161] The processor 110 may update weight parameters of the convolutional neural network on the basis of the difference between an output of the symmetric neural network and an input to the convolutional neural network. For example, the processor 110 may determine a cost function on the basis of at least one of a reconstruction error and a mean squared error between the output of the symmetric neural network and the input to the convolutional neural network. The processor 110 may update the weight parameters in the direction that minimizes the cost function determined in this manner.
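As a minimal illustrative sketch, the idea of updating weights from the difference between a reconstructed output and the original input may be shown with a much simpler stand-in. Here a single linear encoder and a single linear decoder replace the convolutional neural network and the symmetric neural network, the cost is the mean squared error, and plain gradient descent is used; all of these choices, and the function names, are assumptions for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def reconstruction_loss(x: Vector, w: Vector, v: Vector) -> float:
    """Mean squared error between the input and its reconstruction."""
    code = sum(wi * xi for wi, xi in zip(w, x))       # encoder: x -> scalar
    return sum((vj * code - xj) ** 2 for vj, xj in zip(v, x)) / len(x)


def train_step(x: Vector, w: Vector, v: Vector,
               lr: float) -> Tuple[Vector, Vector]:
    """One gradient-descent update of encoder weights w and decoder weights v."""
    d = len(x)
    code = sum(wi * xi for wi, xi in zip(w, x))
    err = [vj * code - xj for vj, xj in zip(v, x)]    # reconstruction - input
    grad_v = [2.0 * e * code / d for e in err]
    grad_code = sum(2.0 * e * vj / d for e, vj in zip(err, v))
    grad_w = [grad_code * xi for xi in x]             # chain rule through code
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_v = [vj - lr * g for vj, g in zip(v, grad_v)]
    return new_w, new_v


x = [1.0, 2.0]
w, v = [0.1, 0.1], [0.1, 0.1]
initial_loss = reconstruction_loss(x, w, v)
for _ in range(200):
    w, v = train_step(x, w, v, lr=0.05)
final_loss = reconstruction_loss(x, w, v)
```

No labels are involved: the input itself serves as the training target, which is what makes the procedure unsupervised.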
[0162] For example, the processor 110 may train the LSTM (GRU) neural network in an unsupervised learning method.
[0163] When the LSTM (GRU) neural network includes an LSTM (GRU) encoder and an LSTM (GRU) decoder, the processor 110 may calculate the cost function by comparing representation vectors input to the LSTM (GRU) encoder with representation vectors output from the LSTM (GRU) decoder. For example, the processor 110 may calculate the cost function using Equation 19.
[0164] In Equation 19, $J(\theta)$ denotes a cost function value, $\mathrm{Card}(T)$ denotes the number of sessions included in the training data, $S_n$ denotes the number of messages included in the $n$-th training session, $x_t^{(n)}$ denotes a representation vector corresponding to the $t$-th message of the $n$-th training session, and $\hat{x}_t^{(n)}$ denotes an estimation vector output from the LSTM (GRU) decoder, that is, an estimation vector for $x_t^{(n)}$. In addition, $\theta$ denotes the set of weight parameters of the LSTM (GRU) neural network. For example, in the case of an LSTM neural network, $\theta$ may include the weights and biases used in Equations 6 to 11, such as $W_{xi}$.
[0165] The processor 110 may update the weight parameters included in $\theta$ in the direction that minimizes the cost function $J(\theta)$ shown in Equation 19.
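As a minimal illustrative sketch, a cost of the kind described for Equation 19 may be computed as follows. Equation 19 itself is not reproduced in this text, so averaging the squared error over the messages of each session and then over the sessions is an assumed form; `reconstruct` is a hypothetical stand-in for the encoder and decoder and is taken to return estimates in the original message order.

```python
from typing import Callable, List

Vector = List[float]
Session = List[Vector]


def training_cost(sessions: List[Session],
                  reconstruct: Callable[[Session], Session]) -> float:
    """Average per-session reconstruction error over the training sessions."""
    if not sessions:
        raise ValueError("training data must contain at least one session")
    total = 0.0
    for session in sessions:
        estimates = reconstruct(session)
        session_error = sum(
            sum((a - b) ** 2 for a, b in zip(x_t, x_hat_t))
            for x_t, x_hat_t in zip(session, estimates))
        total += session_error / len(session)   # normalise by message count
    return total / len(sessions)                # normalise by session count


def all_zeros(session: Session) -> Session:
    """Hypothetical poor model that always outputs zero vectors."""
    return [[0.0] * len(x_t) for x_t in session]


training_sessions = [[[1.0], [1.0]], [[2.0]]]
cost = training_cost(training_sessions, all_zeros)
```

Normalizing by the number of messages in each session keeps long sessions from dominating the cost under this assumed form.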
[0166] The methods for detecting an abnormal session according to the example embodiments of the present invention have been described above with reference to
[0167] As is apparent from the above, messages included in a session are transformed to low dimensional representation vectors using a convolutional neural network. In addition, a representation vector sequence included in the session is analyzed and an abnormality of the session is determined, using an LSTM or GRU neural network. According to example embodiments, it is easily determined whether a session is abnormal using an artificial neural network without an intervention of a manual task.
[0168] The methods according to the present invention may be implemented in the form of program commands executable by various computer devices and may be recorded on computer-readable media. The computer-readable media may include program commands, data files, data structures, and the like, alone or in combination. The media and program commands may be those specially designed and constructed for the purposes of the present invention, or may be of the kind well-known and available to those having skill in the computer software arts.
[0169] Examples of the computer readable storage medium include a hardware device constructed to store and execute a program command, for example, a read-only memory (ROM), a random-access memory (RAM), and a flash memory. The program command may include a high-level language code executable by a computer through an interpreter in addition to a machine language code made by a compiler. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the present invention, or vice versa.
[0170] While the example embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations may be made herein without departing from the scope of the present invention.