METHOD FOR AUTOMATICALLY ANALYZING TRANSACTION LOGS OF A DISTRIBUTED COMPUTING SYSTEM
20210342702 · 2021-11-04
Assignee
Inventors
- Marc Platini (Grenoble, FR)
- Benoît Pelletier (Saint Etienne de Crossey, FR)
- Loïc PAULETTO (Grenoble, FR)
CPC classification
G06F11/3608
PHYSICS
G06F11/0709
PHYSICS
International classification
Abstract
An aspect of the invention relates to a method for automatically analyzing a transaction log of a distributed computing system, the transaction log comprising a plurality of lines, the method comprising the following steps: For each line: Cutting the line into words; Constructing a comparison vector by comparing the line with the other lines of the same size as the line; Constructing a pattern from the comparison vector; Creating an event per pattern;
Constructing at least one prediction model by training an artificial neural network on a group of training events, the prediction model being configured to predict the next event in the transaction log;
For at least one event: Using the prediction model to predict the event, from a group of prediction events; Generating from the prediction model, a causal graph of the event comprising a causal relation for each event of the group of prediction events responding to a relevance condition.
Claims
1-10. (canceled)
11. A method (100) for automatically analyzing a transaction log comprising at least one transaction log (200) of a distributed computing system, comprising a plurality of lines (201), the method comprising: for each line (201) of the transaction log (200) cutting the line (201) into words and associating the line with a size that is defined as a number of words (101); constructing a comparison vector by comparing the line (201) with other lines (201) of the transaction log (200) associated with a same size as said line (201, 103); constructing a pattern from at least one word of the line (201) from the comparison vector (104); creating an event (401) per pattern (105); constructing at least one prediction model by training in an unsupervised manner an artificial neural network (300) on a group (400_1) of training events (401) comprising at least one event (401), corresponding to a set (202_1) of training lines (201) comprising a plurality of lines (201) of the transaction log (200), the at least one prediction model being configured to predict a next event (401) that will take place in the transaction log (200, 107); for at least one event (401) using the at least one prediction model to predict the event (401) as the next event (401) that will take place in the transaction log (200), from a group (400_2) of prediction events comprising at least one event (401) having taken place before the event (401) in the transaction log (200, 108); generating from the at least one prediction model, a causal graph (500) of the event (401) comprising a causal relation for each event (401) of the group (400_2) of prediction events (401) responding to a relevance condition (109).
12. The method (100) according to claim 11, wherein each word of said at least one word is associated with a position on the line (201) and the method (100) further comprises a step (102) of replacing each word with a descriptor, the comparison vector comprising for each word, an occurrence corresponding to a number of times where the descriptor corresponding to the word appears at the position associated with the word, in another line (201) of the transaction log (200) associated with a same size as the line (201).
13. The method (100) according to claim 12, wherein the step (103) of constructing the comparison vector is carried out using a hash table comprising a plurality of keys each comprised of a descriptor and a position and each being associated with the occurrence corresponding to the number of times when the descriptor appears at the position in another line (201) of the transaction log (200) associated with the same size as the line (201).
14. The method (100) according to claim 12, wherein the pattern comprises each descriptor of the line (201) associated with an occurrence appearing a maximum number of times in the comparison vector.
15. The method (100) according to claim 11, wherein the artificial neural network (300) is an artificial neural network of a Long Short-Term Memory type.
16. The method (100) according to claim 11, wherein each event (401) is associated with a number of lines (201) having for pattern the pattern corresponding to the event (401), said constructing the at least one prediction model being carried out for at least one sub-group (402) of events (401) of the group (400_1) of training events (401), each sub-group (402) of events (401) grouping together each event (401) of the group (400_1) of training events (401) having a same cardinality, the cardinality of an event (401) being defined as a power of 10 of the number of lines (201) associated with the event (401).
17. The method (100) according to claim 11, wherein said constructing the at least one prediction model is preceded by a step (106) of constructing for each event (401), a numeric vector comprising a value of linguistic similarity for at least one other event (401).
18. The method (100) according to claim 11, wherein the artificial neural network (300) comprises an attention layer (301) and said using the at least one prediction model comprises a sub-step (1081) of generating a weight vector by the attention layer (301), the weight vector comprising a weight for each event (401) of the group (400_2) of prediction events (401).
19. The method (100) according to claim 18, wherein the relevance condition is verified for a given event (401) if the weight of the weight vector corresponding to the event (401) is greater than a threshold, the threshold being a sum of an average of the weights of the weight vector and of a standard deviation of the weights of the weight vector.
20. A computer program product comprising instructions that when executed on a computer lead the computer to implement a method for automatically analyzing a transaction log comprising at least one transaction log (200) of a distributed computing system, comprising a plurality of lines (201), the method comprising: for each line (201) of the transaction log (200) cutting the line (201) into words and associating the line with a size that is defined as a number of words (101); constructing a comparison vector by comparing the line (201) with other lines (201) of the transaction log (200) associated with a same size as said line (201, 103); constructing a pattern from at least one word of the line (201) from the comparison vector (104); creating an event (401) per pattern (105); constructing at least one prediction model by training in an unsupervised manner an artificial neural network (300) on a group (400_1) of training events (401) comprising at least one event (401), corresponding to a set (202_1) of training lines (201) comprising a plurality of lines (201) of the transaction log (200), the at least one prediction model being configured to predict a next event (401) that will take place in the transaction log (200, 107); for at least one event (401) using the at least one prediction model to predict the event (401) as the next event (401) that will take place in the transaction log (200), from a group (400_2) of prediction events comprising at least one event (401) having taken place before the event (401) in the transaction log (200, 108); generating from the at least one prediction model, a causal graph (500) of the event (401) comprising a causal relation for each event (401) of the group (400_2) of prediction events (401) responding to a relevance condition (109).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The figures are presented for the purposes of information and in no way limit the invention.
[0044]
[0045]
[0046]
[0047]
DETAILED DESCRIPTION OF THE INVENTION
[0048] Unless mentioned otherwise, the same element appearing in different figures has a unique reference.
[0049] A first aspect of the invention relates to a method for automatically analyzing at least one transaction log of a distributed computing system.
[0050] “Automatic analysis” means an analysis implemented by computer and conducted without requiring any configuration by the user.
[0051] The method according to the invention is implemented by a computer comprising at least one processor and a memory.
[0052] The distributed computing system is, for example, a large-scale distributed computing system, such as a high-performance computing (HPC) supercomputer.
[0053]
[0054] A transaction log 200 comprises a plurality of lines 201.
[0055] In
[0056]
[0057] A first step 101 of the method 100 consists in cutting each line 201 of the transaction log 200 into words.
[0058] For this, separators are used to separate two consecutive words, such as for example spaces and/or commas and/or semicolons and/or equal signs.
[0059] Consider an example wherein the transaction log 200 includes only the lines 201 of the second set 202_2 of lines 201 wherein: [0060] the line 201_1 is: time wait exceeded to storage [0061] the line 201_2 is: connection user A2563 [0062] the line 201_3 is: loss connection storage [0063] the line 201_4 is: connection user D4567 [0064] the line 201_5 is: error access file image.png [0065] the line 201_6 is: error access file save.png
[0066] At the end of the first step 101, for the preceding example the following is obtained: [0067] the line 201_1 is: [time] [wait] [exceeded] [to] [storage] [0068] the line 201_2 is: [connection] [user] [A2563] [0069] the line 201_3 is: [loss] [connection] [storage] [0070] the line 201_4 is: [connection] [user] [D4567] [0071] the line 201_5 is: [error] [access] [file] [image.png] [0072] the line 201_6 is: [error] [access] [file] [save.png]
with each word placed between brackets.
[0073] Each word is associated with a position on the line 201. For example, in the line 201_2, the word [connection] is associated with the position 1, the word [user] with the position 2 and the word [A2563] with the position 3.
[0074] The first step 101 then consists in associating with each line 201 a size that corresponds to the number of words in the line 201.
[0075] Thus, for the preceding example: [0076] the line 201_1 has a size value equal to 5. [0077] the line 201_2 has a size value equal to 3. [0078] the line 201_3 has a size value equal to 3. [0079] the line 201_4 has a size value equal to 3. [0080] the line 201_5 has a size value equal to 4. [0081] the line 201_6 has a size value equal to 4.
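The first step 101 can be sketched as follows (a minimal illustration in Python; the separator set and the function names are assumptions chosen for the sketch, not part of the specification):

```python
import re

# Separators per the description: spaces, commas, semicolons, equal signs
# (an assumed set; the specification gives these only as examples).
SEPARATORS = r"[ ,;=]+"

def cut_into_words(line: str) -> list[str]:
    """Cut a log line into words and drop empty fragments."""
    return [w for w in re.split(SEPARATORS, line.strip()) if w]

def line_size(line: str) -> int:
    """The size of a line is its number of words."""
    return len(cut_into_words(line))
```

For the line time wait exceeded to storage, line_size returns 5, matching the example above.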
[0082] The method 100 according to the invention can include a second step 102 consisting of replacing each word identified in the first step 101 with a descriptor.
[0083] For this, each word comprising only letters is for example associated with the word itself, each word comprising only alphanumeric characters with NB, and each other type of word with a vector including five inputs: the first input being a Boolean describing the presence or absence of numerical characters, the second input being a Boolean describing the presence or absence of upper-case letters, the third input being a Boolean describing the presence or absence of lower-case letters, the fourth input being a Boolean describing the presence or absence of non-alphanumerical characters, and the fifth input being the size of the word.
[0084] “The input is a Boolean describing the presence or absence of an element” means that the input is 0 if the element is absent and 1 if the element is present.
[0085] Each other type of word could for example also be replaced with NB.
[0086] Thus, for the preceding example, at the end of the second step 102, the following is obtained: [0087] the line 201_1 is: [time] [wait] [exceeded] [to] [storage] [0088] the line 201_2 is: [connection] [user] [(1, 1, 0, 0, 5)] [0089] the line 201_3 is: [loss] [connection] [storage] [0090] the line 201_4 is: [connection] [user] [(1, 1, 0, 0, 5)] [0091] the line 201_5 is: [error] [access] [file] [(0, 0, 1, 1, 9)] [0092] the line 201_6 is: [error] [access] [file] [(0, 0, 1, 1, 8)]
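The descriptor rule of the second step 102 can be sketched as follows (function name assumed; words made only of letters stand for themselves, and any other word is described by the 5-entry vector, which matches the (1, 1, 0, 0, 5) descriptor given above for A2563):

```python
def descriptor(word: str):
    """Replace a word with a descriptor (second step 102).

    A word made only of letters stands for itself; any other word is
    described by a 5-entry vector: presence of digits, of upper-case
    letters, of lower-case letters, of non-alphanumeric characters,
    and the word's length.  (Purely alphanumeric words could instead
    be mapped to the constant 'NB', as the description also allows.)
    """
    if word.isalpha():
        return word
    return (
        int(any(c.isdigit() for c in word)),
        int(any(c.isupper() for c in word)),
        int(any(c.islower() for c in word)),
        int(any(not c.isalnum() for c in word)),
        len(word),
    )
```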
[0093] A third step 103 of the method 100 consists of constructing a comparison vector for each line 201 of the transaction log 200.
[0094] The comparison vector of a line 201 is constructed by comparison between the line 201 and each other line 201 of the transaction log 200 having the same size as the line 201 for which the comparison vector is constructed.
[0095] Thus, returning to the preceding example, during the third step 103, the lines 201_2, 201_3 and 201_4 of size 3 are compared with one another and the lines 201_5 and 201_6 of size 4 are compared with one another.
[0096] In the case where the second optional step 102 was not carried out, the comparison vector comprises, for example, for each word of the line 201, the number of times the word appears at the position associated with the word, in another line 201 of the transaction log 200 of the same size as the line 201 for which the comparison vector is constructed.
[0097] In the case where the second optional step 102 was carried out, the comparison vector comprises, for example, for each word of the line 201, the number of times the descriptor that corresponds to the word appears at the position associated with the word, in another line 201 of the transaction log 200 of the same size as the line 201 for which the comparison vector is constructed.
[0098] Thus, for the preceding example, in the case where the second optional step 102 was carried out, at the end of the third step 103: [0099] the line 201_1 has for comparison vector: (0, 0, 0, 0, 0) [0100] the line 201_2 has for comparison vector: (1, 1, 1) [0101] the line 201_3 has for comparison vector: (0, 0, 0) [0102] the line 201_4 has for comparison vector: (1, 1, 1) [0103] the line 201_5 has for comparison vector: (1, 1, 1, 0) [0104] the line 201_6 has for comparison vector: (1, 1, 1, 0)
[0105] The third step 103 can be carried out using a hash table storing a plurality of keys, each associated with an occurrence. Each key is comprised of a word or a descriptor and of a position, and the occurrence corresponds to the number of times the word or descriptor appears at the position in another line 201 of the transaction log 200 associated with the same size as the line 201.
[0106] The number of keys stored in the hash table is then equal to the number of different words for each possible position for each line 201 of the transaction log 200.
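The hash-table construction of the third step 103 can be sketched as follows (a minimal illustration; names are assumptions). The table is keyed by token, position and line size, and 1 is subtracted from each count because the specification counts occurrences in "another line", excluding the line itself:

```python
from collections import defaultdict

def comparison_vectors(lines: list[list]) -> list[list[int]]:
    """Third step 103: for every line (given as a list of words or
    descriptors), count how many OTHER lines of the same size carry the
    same token at the same position, using a hash table keyed by
    (token, position, size)."""
    table: dict = defaultdict(int)
    for tokens in lines:
        for pos, tok in enumerate(tokens):
            table[(tok, pos, len(tokens))] += 1
    # Subtract 1: the occurrence contributed by the line itself.
    return [
        [table[(tok, pos, len(tokens))] - 1 for pos, tok in enumerate(tokens)]
        for tokens in lines
    ]
```

On the six-line example above (after step 102), this yields (1, 1, 1) for the line 201_2 and (0, 0, 0) for the line 201_3.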
[0107] A fourth step 104 of the method 100 consists of constructing a pattern for each line 201 of the transaction log 200, from the comparison vector constructed in the third step 103 for the line 201.
[0108] Each line 201 includes a constant portion and a variable portion, the constant portion being linked to a type of event that occurred and the variable portion, to the circumstances of the occurrence of the event.
[0109] For example, if the line is: connection user U1, the constant portion is connection user and the variable portion is U1. The type of event is therefore a connection of a user and the variable portion is the identifier of the user who has connected.
[0110] For example, if the line is: temperature change from 2 to 5, the constant portion is temperature change from * to * and the variable portion is 2 * 5. The type of event is therefore a temperature change and the variable portion comprises the value of the preceding temperature and the value of the current temperature.
[0111] The pattern corresponds to the constant portion of the line 201.
[0112] The pattern comprises for example each word or each descriptor of the line 201 associated with an occurrence appearing a maximum number of times in the comparison vector.
[0113] For example, if the comparison vector is [25, 16, 16, 16, 25, 1], the occurrence “25” appears twice, the occurrence “16” appears 3 times and the occurrence “1” appears once. The occurrence appearing a maximum number of times is therefore the occurrence “16” and the pattern therefore comprises the words or descriptors in position 2, 3 and 4 in the line 201.
[0114] Thus, for the preceding example: [0115] the pattern associated with the line 201_1 is: time wait exceeded to storage [0116] the pattern associated with the line 201_2 is: connection user (1, 1, 0, 0, 5) [0117] the pattern associated with the line 201_3 is: loss connection storage [0118] the pattern associated with the line 201_4 is: connection user (1, 1, 0, 0, 5) [0119] the pattern associated with the line 201_5 is: error access file [0120] the pattern associated with the line 201_6 is: error access file
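The pattern extraction of the fourth step 104 can be sketched as follows (names are assumptions; the tie-breaking rule when several occurrence values appear equally often is not specified, so the first-seen value wins here):

```python
from collections import Counter

def extract_pattern(tokens: list, comp_vec: list[int]) -> tuple:
    """Fourth step 104: keep the tokens whose occurrence value is the
    one appearing a maximum number of times in the comparison vector."""
    # Most frequent occurrence value (first-seen wins on ties, an assumption).
    mode_value, _ = Counter(comp_vec).most_common(1)[0]
    return tuple(t for t, c in zip(tokens, comp_vec) if c == mode_value)
```

For the comparison vector [25, 16, 16, 16, 25, 1] of paragraph [0113], the most frequent occurrence is 16, so the tokens at positions 2, 3 and 4 are kept.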
[0121] A fifth step 105 of the method 100 consists of creating an event 401 for each pattern constructed in the fourth step 104.
[0122] Thus, for the preceding example: [0123] an event 401_1 is created for the pattern: time wait exceeded to storage [0124] an event 401_2 is created for the pattern: connection user (1, 1, 0, 0, 5) [0125] an event 401_3 is created for the pattern: loss connection storage [0126] an event 401_4 is created for the pattern: error access file
[0127] Each event 401 is associated with the number of lines 201 having for pattern the pattern corresponding to the event 401.
[0128] Thus, for the preceding example, the event 401_1 is associated with the line 201_1, the event 401_2 is associated with the two lines 201_2 and 201_4, the event 401_3 is associated with the line 201_3 and the event 401_4 is associated with the two lines 201_5 and 201_6.
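The fifth step 105 and the line-count association of paragraph [0127] can be sketched together (event identifiers and their assignment order are assumptions of the sketch):

```python
from collections import Counter

def create_events(patterns: list[tuple]) -> dict:
    """Fifth step 105: create one event per distinct pattern and
    associate it with the number of lines having that pattern (this
    count is used later for the cardinality-based sub-grouping)."""
    counts = Counter(patterns)
    # Event ids are assigned in order of first appearance (an assumption).
    return {pattern: {"event_id": i + 1, "n_lines": n}
            for i, (pattern, n) in enumerate(counts.items())}
```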
[0129] The method 100 according to the invention can comprise a sixth step 106 consisting of constructing for each event 401, a numeric vector comprising a value of linguistic similarity for at least one other event 401.
[0130] Thus, the closer the meaning of the pattern associated with a first event 401 is to the meaning of the pattern associated with a second event 401, the greater the value of linguistic similarity between the first event 401 and the second event 401.
[0131] For example, if the first event 401 has for pattern [computer] and the second event 401 has for pattern [server], the value of linguistic similarity is high because the meanings of the two patterns are close. On the other hand, if the second event 401 has for pattern [user], the value of linguistic similarity will be lower because the meanings of the two patterns are farther apart.
[0132] The sixth step 106 is for example carried out using the Word2vec model. Word2vec is a lexical embedding (word embedding) model comprised of two-layer artificial neural networks trained to reconstruct the linguistic context of words.
[0133] The model Word2vec is for example trained on events 401 arranged in the same order as the corresponding lines 201 in the transaction log 200 so that the model Word2vec provides close numeric vectors for events 401 corresponding to lines 201 appearing consecutively in the transaction log 200.
[0134] The method 100 according to the invention includes a seventh step 107 consisting in constructing at least one prediction model that makes it possible to predict the next event 401 that will take place in the transaction log 200.
[0135] The prediction model is constructed by an artificial neural network by training it in an unsupervised manner on a training database including data linked to a group of training events 401.
[0136] In
[0137] The set 202_1 of training lines 201 corresponds for example to 60% of the lines 201 of the transaction log 200.
[0138] In the case where the sixth step 106 was carried out, the training database comprises for example the numeric vectors associated with the events 401 of the group 400_1 of training events 401.
[0139] The artificial neural network is then for example trained on the numeric vectors associated with the events 401 of the group 400_1 of training events 401.
[0140] In the case where the sixth step 106 was not carried out, the training database comprises for example the group 400_1 of training events 401.
[0141] The artificial neural network is for example trained on the group 400_1 of training events 401.
[0142] An artificial neural network includes at least two layers, each including at least one artificial neuron. A connection between two neurons is called a synapse. Each synapse is assigned a synaptic coefficient.
[0143] Training an artificial neural network consists of determining the synaptic coefficients of the artificial neural network allowing it to perform the desired task from the training database.
[0144] The seventh step 107 therefore consists of determining the synaptic coefficients of the artificial neural network allowing it to predict the next event 401 that will take place in the transaction log 200 from the group 400_1 of training events 401.
[0145] The training is unsupervised; therefore, no additional information is provided to the artificial neural network.
[0146]
[0147] In
[0148] The output layer provides a prediction according to the input data provided to the input layer and to the synaptic coefficients of the artificial neural network 300.
[0149] Once the artificial neural network 300 is trained, the output layer provides the next event 401 that will take place in the transaction log 200.
[0150] The artificial neural network 300 is for example an artificial neural network of the Long Short-Term Memory type.
[0151] The seventh step 107 is for example carried out for at least one sub-group 402 of events 401 of the group 400_1 of training events 401 on which the artificial neural network 300 is trained, i.e. for at least one sub-group 402 of events 401 of the group 400_1 of training events 401, a prediction model associated with the sub-group 402 of events 401 is constructed.
[0152] The prediction model associated with a sub-group 402 of events 401 is configured to predict an event 401 belonging to the sub-group 402 of events 401.
[0153] A sub-group 402 of events 401 groups together for example a plurality of events 401 of the group 400_1 of training events 401 having the same cardinality.
[0154] “Cardinality of an event” means the power of 10 of the number of lines 201 associated with the event 401.
[0155] For example, if an event 401 is associated with 1135 lines 201, its cardinality is 3 since the event 401 is associated with 1.135×10³ lines 201.
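The cardinality rule and the sub-grouping of events 401 can be sketched as follows (function names are assumptions; the cardinality is the power of 10 of the line count, i.e. the floor of its base-10 logarithm):

```python
import math
from collections import defaultdict

def cardinality(n_lines: int) -> int:
    """Cardinality of an event: the power of 10 of the number of lines
    associated with it.  1135 lines -> 1.135 x 10^3 -> cardinality 3."""
    return int(math.floor(math.log10(n_lines)))

def sub_groups(event_line_counts: dict) -> dict:
    """Group together events having the same cardinality (sub-groups
    402); one prediction model is then constructed per sub-group."""
    groups = defaultdict(list)
    for event, n_lines in event_line_counts.items():
        groups[cardinality(n_lines)].append(event)
    return dict(groups)
```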
[0156] Thus, the prediction model associated with a sub-group 402 of events 401 of cardinality C is configured to predict an event 401 of cardinality C.
[0157] On the other hand, the training database used to construct a prediction model associated with a sub-group 402 of events 401 of cardinality C does not necessarily include only data linked to events 401 of cardinality C.
[0158] In
[0159] Thus, the method 100 according to the invention can for example comprise a first instance of the seventh step 107 that makes it possible to obtain a prediction model that predicts events 401 of cardinality C1 and a second instance of the seventh step 107 that makes it possible to obtain a prediction model that predicts events 401 of cardinality C2.
[0160] The artificial neural network 300 comprises for example an attention layer 301.
[0161] In
[0162] The attention layer 301 is for example a Temporal Attention-Gated layer.
[0163] The artificial neural network 300 is for example an artificial neural network of the bidirectional Long Short-Term Memory type comprising an input layer, 50 hidden layers, an attention layer 301, a fully-connected layer and an output layer.
[0164] The method 100 according to the invention comprises an eighth step 108 consisting, for a given event 401, of using the prediction model constructed at the seventh step 107 to predict the event 401, from a group of prediction events 401. The group of prediction events comprises at least one event 401 preceding the event 401 to be predicted in the transaction log 200.
[0165] The group of prediction events 401 comprises for example the thirty events 401 preceding the event 401 to be predicted, in the transaction log 200.
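The preparation of the prediction groups can be sketched as follows (a minimal illustration; the function name is an assumption, and the window of thirty preceding events is the example value from paragraph [0165], exposed as a parameter):

```python
def prediction_pairs(events: list, window: int = 30) -> list:
    """Eighth step 108 (data preparation): for each event, the group of
    prediction events is the `window` events that precede it in the
    transaction log; yields (history, target) pairs."""
    return [
        (events[i - window:i], events[i])
        for i in range(window, len(events))
    ]
```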
[0166] In
[0167] In the case where the seventh step 107 was carried out for a plurality of sub-groups 402 of events 401, the prediction model used in the eighth step 108 is the prediction model associated with a sub-group 402 of events 401 having the same cardinality as the event 401 to be predicted.
[0168] The eighth step 108 comprises for example a first sub-step 1081 consisting, for the attention layer 301, of generating a weight vector. The weight vector comprises at least one weight for another event 401 of the group 400_2 of prediction events 401.
[0169] In the case shown in
[0170] The method 100 according to the invention comprises a ninth step 109 consisting of generating, for the event 401 for which the prediction model was used in the eighth step 108, a causal graph.
[0171] A causal graph comprises a causal relation with each other event 401 of the group 400_2 of prediction events 401 responding to a relevance condition.
[0172] The relevance condition is for example verified for an event 401 if the weight of the weight vector corresponding to the event 401 is greater than a threshold.
[0173] The threshold is for example the sum of the average of the weights of the weight vector and of the standard deviation of the weights of the weight vector.
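The relevance condition of the ninth step 109 can be sketched with the standard-library statistics module (the function name is an assumption; whether the standard deviation is the population or the sample one is not specified, and the population variant is used here):

```python
from statistics import mean, pstdev

def relevant_events(weights: list) -> list:
    """Ninth step 109: an event of the prediction group satisfies the
    relevance condition if its attention weight exceeds the threshold
    mean(weights) + standard_deviation(weights); returns the indices of
    the retained events (candidate causes in the causal graph)."""
    threshold = mean(weights) + pstdev(weights)
    return [i for i, w in enumerate(weights) if w > threshold]
```

For the weights (0.1, 0.1, 0.6, 0.2), the threshold is 0.25 + 0.206 ≈ 0.456 and only the third event is retained.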
[0174]
[0175]
[0176] In
[0177] In
[0178] As the weight associated with the event 401_3 is the highest in the weight vector, the event 401_3 is the one most likely to have caused the event 401_4.
[0179] Returning to the preceding example, this means that the event 401_4 “error access file” is caused by the event 401_3 “loss connection storage” and not, for example, by the event 401_2 “connection user (1, 1, 0, 0, 5)”.