Methods of clustering computational event logs

Abstract

Methods are presented suitable for clustering computational event logs (2) including a method for calculating a metric distance between characters of different event messages (4) by comparing both characters to a comparative set of characters. Methods are presented for calculating a metric distance between two event messages (4) comprising determining character metric distances between characters in the compared words and word metric distances between the words in the compared events (4), Methods are presented for creating an area (8) in metric space corresponding to a new cluster (6) when a further event message (26) is found in an overlap region (24) of existing clusters (6, 8). Methods are presented in populating and constructing an event table.

Claims

1. A method comprising: selecting, via a processor, a plurality of event messages generated by a computing cluster, the event messages comprising unstructured logs of the computing cluster; determining, via the processor, message metric distances between the event messages based on combining word metric distances between aligned words of the event messages, determining the word metric distances involving treating aligned numeric characters between the aligned words as matching characters regardless of differences between the aligned numeric characters; clustering the event messages into event types based on the message metric distances; and storing a record of the event types associated with the event messages, the record used for system diagnostics of the computing cluster.

2. The method of claim 1, wherein determining the word metric distances comprises treating aligned hexadecimal characters between the aligned words as matching characters regardless of differences between the aligned hexadecimal characters.

3. The method of claim 2, wherein the aligned hexadecimal characters comprise {A, B, C, D, E, F, a, b, c, d, e, f, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

4. The method of claim 2, wherein the aligned hexadecimal characters are used to identify at least one of an address and a process identifier and are irrelevant to clustering the event messages.

5. The method of claim 1, wherein the event messages comprise Syslog messages.

6. The method of claim 1, further comprising: defining a plurality of areas in metric space, wherein each area comprises at least one of the event types; defining an overlap region in metric space wherein at least two of the plurality of defined areas partially overlap; determining whether a further event message generated by a computing cluster is located in the overlap region in metric space; and updating the record with a further defined area in metric space comprising the further event message.

7. The method of claim 1, wherein determining the word metric distances comprises increasing the word metric distances if the aligned words have a different number of characters.

8. The method of claim 1, wherein determining message metric distances comprises increasing the message metric distances if the event messages have a different number of words.

9. The method of claim 1, wherein storing the record of the event types comprises storing an event table, each entry of the event table comprising one of the event types and the event messages associated therewith.

10. The method of claim 1, wherein the event messages are stored on a parallel, distributed file system.

11. An apparatus comprising: a storing system that stores event messages comprising unstructured logs of a computing cluster; and a processor coupled to the storing system and configured to perform: selecting a plurality of event messages generated by a computing cluster, the event messages comprising unstructured logs of the computing cluster; determining message metric distances between the event messages based on combining word metric distances between aligned words of the event messages, determining the word metric distances involving treating aligned numeric characters between the aligned words as matching characters; clustering the event messages into matching event types based on the message metric distances; and storing a record of the event types associated with the event messages, the record used for system diagnostics of the computing cluster.

12. The apparatus of claim 11, wherein determining the word metric distances involving treating aligned hexadecimal characters between the aligned words as matching characters regardless of differences between the aligned hexadecimal characters.

13. The apparatus of claim 12, wherein the aligned hexadecimal characters comprise {A, B, C, D, E, F, a, b, c, d, e, f, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

14. The apparatus of claim 12, wherein the aligned hexadecimal characters are used to identify at least one of an address and a process identifier and are irrelevant to clustering the event messages.

15. The apparatus of claim 11, wherein the event messages comprise Syslog messages.

16. The apparatus of claim 11, wherein the processor is further configured to: define a plurality of areas in metric space, wherein each area comprises at least one of the event types; define an overlap region in metric space wherein at least two of the plurality of defined areas partially overlap; determine whether a further event message generated by a computing cluster is located in the overlap region in metric space; and update the record with a further defined area in metric space comprising the further event message.

17. The apparatus of claim 11, wherein determining the word metric distances comprises increasing the word metric distances if the aligned words have a different number of characters.

18. The apparatus of claim 11, wherein determining message metric distances comprises increasing the message metric distances if the event messages have a different number of words.

19. The apparatus of claim 11, wherein storing the record of the event types comprises storing an event table, each entry of the event table comprising one of the event types and the event messages associated therewith.

20. The apparatus of claim 11, wherein the storing system comprises a parallel, distributed file system.

21. The apparatus of claim 11, wherein determining the word metric distances involves treating aligned numeric characters between the aligned words as matching characters regardless of differences between the aligned numeric characters.

22. A system comprising: a computing cluster that generates and stores event messages on a parallel, distributed file system, the event messages comprising unstructured logs of the computing cluster; and a server configured to read the stored event messages, the server comprising a processor that is configured to: determine message metric distances between the event messages based on combining word metric distances between aligned words of the event messages, determining the word metric distances involving treating aligned numeric characters between the aligned words as matching characters such that at least one of an address and a process identifier identified by the aligned numeric characters are made irrelevant to clustering the event messages; cluster the event messages into event types based on the message metric distances; cluster the event messages into matching event types based on the message metric distances; and store a record of the event types associated with the event messages, the record used for system diagnostics of the computing cluster.

23. The system of claim 22, wherein determining the word metric distances involving treating aligned hexadecimal characters between the aligned words as matching characters regardless of differences between the aligned hexadecimal characters.

24. The system of claim 22, wherein the server is further configured to: define a plurality of areas in metric space, wherein each area comprises at least one of the event types; define an overlap region in metric space wherein at least two of the plurality of defined areas partially overlap; determine whether a further event message generated by a computing cluster is located in the overlap region in metric space; and update the record with a further defined area in metric space comprising the further event message.

25. The system of claim 22, wherein determining the word metric distances comprises increasing the word metric distances if: the aligned words have a different number of characters; or the event messages have a different number of words.

26. The system of claim 22, wherein determining the word metric distances involves treating aligned numeric characters between the aligned words as matching characters regardless of differences between the aligned numeric characters.

Description

(1) Embodiments of the present invention will now be described in detail with reference to the accompanying drawings, in which:

(2) FIG. 1 shows event logs generated by a Lustre file system in an HPC environment,

(3) FIG. 2 shows the last four event logs of FIG. 1 together with an indication of different fields within the event logs,

(4) FIG. 3 shows the event log clusters with metric balls,

(5) FIG. 4a shows an original set of pair-wise disjoint metric ball clusters;

(6) FIG. 4b shows the clusters of FIG. 4a together with a consolidated metric ball cluster;

(7) FIG. 4c shows the events of the original clusters together with the consolidated ball of FIG. 4b, wherein the original metric balls are deleted;

(8) FIG. 5 shows a flow chart for the processes of creating a consolidated metric ball.

(9) FIG. 6 shows a flow chart corresponding to the construction of an event table,

(10) The following methods are in the field of event log 2 clustering.

(11) According to a first aspect there are provided methods for calculating a metric distance between a first character associated with a first computational event message 4, and a second character associated with a second computational event message 4. The said methods may be embodied in software or hardware.

(12) According to a second aspect there are provided methods for calculating a distance in metric space between computational event messages 4. The said methods may be embodied in software or hardware. The methods of the second aspect may use any of the methods disclosed in the first aspect.

(13) According to a third aspect there are provided methods for defining an area in metric space for clustering computational events. The said methods may be embodied in software or hardware. The methods of the third aspect may use any of the methods disclosed in the first and second aspects.

(14) According to a fourth aspect there are provided methods of populating a table comprising a first plurality of data entries, wherein the methods provides a further data entry in the table associated with a further defined area in metric space as calculated by any of the methods disclosed in the third aspect.

(15) Throughout all of the aspects described herein, the term computational event message is used interchangeably with the term event message or message. Event messages 4 may be any computational event messages 4 but are preferably Syslog event messages 4.

(16) It is also assumed herein throughout that event logs 2 and event messages 4 are read from left to right and any reference to any calculation or determination or other analysis of an event message 4 in sequence is intended to be in sequence from left to right.

(17) Determining Character Metric Distances

(18) Previous methods of determining distances between computational events in metric space have comprised determining the exact metric distance between sequentially corresponding characters in the two messages 4 being evaluated. Therefore, any difference between each of the said two characters being compared may contribute a non-zero metric distance between the two messages 4 and may complicate the determination of suitable event clusters 6 due to the likelihood that each event will be separated by a non-zero metric distance. This also gives rise to high computer processing requirements to make a clustering analysis.

(19) The inventor of the present application has noticed that typically at least a portion of the information in the event log 2, or event message, is not required to be analysed to determine whether or not said messages 4 are similar to each other for error analysis, and thus should be in the same cluster. Event logs 2 are written by programmers and developers as debug messages 4. They typically include combinations of various technical parameters such as IP addresses, MAC addresses, process IDs etc. which are used by programmers and developers to help with their own debugging. Typically these technical parameters are formed from hexadecimal or decimal characters. Directly comparing two event messages 4 in metric space including the comparison of these parameters such as IP addresses and MAC addresses provides a significant computational burden upon the clustering analysis. Furthermore, it may give rise to metric distances that separate the events into different clusters 6 where the type of event message 4 is the same or similar. These parameters are unnecessary for establishing a cluster 6 of similar or identical event messages 4.

(20) A method is provided for determining a metric distance in metric space whereby instead of only directly comparing a character in one event log to another character in another event log 2 to determine the character metric distance, the method compares each said character (also referred to as the first character and the second character) to a comparative set of characters. If both of the first and second characters comprise a character from the said comparative set, then the method outputs a first predefined metric distance and ignores any actual difference between the said characters. In this manner, the method may be seen to be operating in a pseudo-metric space.

(21) For example, if the comparative set of characters contained the decimal numbers {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and the characters being compared between the two event messages 4 were the characters 3 for the first character and 6 for the second character, then the present method would output a predefined character metric distance (for example 0). In contrast, an existing method that determines character distances by taking the numerical distance between the decimal characters, may output a distance of 3 (taken from a calculation of 63), or a distance of 1 using Levenshtein distance (which provides a distance of 1, or in principle any other number, when the characters are different regardless of their values). Conversely following from this example, using the same comparative set and predefined character metric distance, if the first character was 1 and second character was 9, then the present method would output the same predefined metric distance of 0, whilst the existing method would output a character metric distance of 8 ((taken from a calculation of 91) or a distance of 1, or any other number, using Levenshtein distance.

(22) Preferably this comparative set of characters is a partial set of characters from the range of different characters used to create the error log messages 4. Preferably, the set of characters chosen for the comparative set are those characters which commonly occur in parts of the event messages 4 that are not typically required in determining the similarity of event messages 4 for clustering. Preferably the method comprises a first predefined output of 0, This predefined output value, together with the choice of comparative characters, allows the method, when analysing all the characters in the log 2, to ignore metric distances that have no bearing on clustering events.

(23) As described later, characters within an event message, or within a word of an event message, may be compared serially within the word or message 4 from left to right, although in principle, any pair of characters (one from the first message, one from the second message) from any of the words in the respective first and second messages 4, may be compared. For example, when comparing all the characters in a particular word in the first event message to corresponding characters in a particular word in the second event message, the first character (reading the word left to right) of the first word (reading the message 4 left to right) of the first event message, is compared to the corresponding first character (read the word left to right) of the first word (reading the message 4 left to right) of the second event message. Then, the second character (reading the word left to right) of the first word (reading the message 4 left to right) of the first event message is compared to the corresponding second character (reading the word left to right) of the first word (reading the message 4 left to right) of the second event message, and so on until either the first or second word has no corresponding character to be compared against.

(24) Other terminology may be used to describe the comparison of the words and characters herein, including referring to the words being compared in the first and second messages 4 as the first and second words, and the characters being compared between two words as the first and second characters.

(25) Another way of describing this is that the characters in the particular words of the first and second messages (when analysing on a word by word basis), or the characters in the message, are aligned or sequentially aligned.

(26) If, once the characters are aligned, one or more of the words being compared has no corresponding aligned character, then these one or more characters of a particular word or message 4 are excess characters forming part of an excess character set.

(27) In the same manner, as described in the second aspect, the words of two different messages 4 may also be aligned (for example for evaluation purposes) so that the first word (when read left to right) of the first message is evaluated with the first word (when read left to right) of the second message, and so on.

(28) An example of comparing aligned characters of words in two different messages 4 is given below. Here, a hexadecimal set of comparative characters is used together with a first predefined character metric distance of 0. The two following separate lines show two words in each of the messages 4:

(29) LOGIN 00b85 for the first message; and,

(30) FATAL 11e20 for the second message;

(31) When each aligned character pair are compared to determine a character metric distance, a 0 character metric distance would be found between all the characters in the aligned words 00b85 and 01e80 because, for each aligned character pair, both the characters comprise a character from the comparative set. The only positive metric distances derived from this example would result from sequentially comparing the different characters between the aligned words LOGIN and FATAL, because, sequentially, at least one of the characters in each compared pair comprises a character not within the set.

(32) Preferably, the comparative set of characters are the hexadecimal characters (A B C D E F, a b c d e f, 0 1 2 3 4 5 6 7 8 9), although in principle any set of characters may be chosen for the set. The reason for choosing these characters is that they are the characters that are commonly used to define MAC addresses, IP addresses and process IDs, all of which are irrelevant for clustering similar event messages 4.

(33) Preferably, if both of the characters being compared are identical then, irrespective of whether or not they are one of the comparative set of characters, the method outputs the same first predefined value of metric distance. Again, preferably this distance is zero, although in principle it could be any value and may be a different predefined metric distance to the first predefined metric distance. This is because the goal of clustering using metrics is to only assign a positive integer metric distance between elements (words/characters) of a message 4 when the elements are different and relevant to distinguishing between different event types. If, for example the word is the same (hence comprising the same characters), then this is an indication that the event messages 4 share some similarity and therefore should be located nearby in metric space.

(34) If all the characters in the first message are identical, when compared in sequence, to the characters in the second message, then the messages 4 are the same and should occupy the same point in metric space. Using the method as described herein, this situation of preferably having a zero metric distance between two events, may also occur when both the corresponding characters in the first and second message, when compared in sequence, are either: A. Identical to each other but not identical to any of the characters in the comparative set; or B. Identical to each other and identical to any one of the characters in the comparative set; or C. Not identical to each other and wherein each of the first and second compared characters are identical to any one of the characters in the comparative set.

(35) This example is assuming event messages 4 with identical numbers of words and identical length of aligned words.

(36) This is shown in the following example using a hexadecimal set of comparative characters and a first predefined character metric distance of 0. The two following words in each of the messages 4 are as follows:

(37) FATAL 11b85 for the first message; and,

(38) FATAL 11e20 for the second message;

(39) Each character metric distance between aligned characters is 0 because: 1) the sequentially compared characters: {T and T}, {L and L} from the first words in each respective message 4 are identical even though none of them are identical to any of the comparative sets; therefore have a character metric distance of 0 as defined in point A) above; and, 2) the sequentially compared characters: {F and F}, {A and A}, from the first words in each respective message; and {1 and 1} and {1 and 1} from the second words in each respective message; are identical and also comprise a character in the hexadecimal comparative character set; therefore have a character metric distance of 0 as defined in point B) above; and, 3) the sequentially compared characters: {A and 3} from the first words in each respective message, and {b and e}, {8 and 2}, {5 and 0} from the second words in each respective message; have both of the compared characters identical to at least one of the hexadecimal characters in the comparative set; therefore have a character metric distance of 0 as defined in point C) above.

(40) If, however the two aligned characters in the different event logs 2 or event messages 4 are different, and at least one is not comprised within the comparative set of characters, the method outputs a different predefined metric distance (otherwise referred to in herein as a second predefined metric distance). The second predefined metric may take any value but is preferably a non-zero integer value. Preferably this metric distance is 1 or unitary metric distance.

(41) The characters used to create the two different event logs 2 or event messages 4 may in principle be chosen from two different originating sets of characters, however preferably, the comparative set of characters is both: a partial set of characters of the first set of characters used to create the first event message; and a partial set of characters of the second set of characters used to create the second event message.

(42) A further example of how the method presented herein works in a pseudo-metric space is discussed below with regard to a common event message 4 word sequence Kernel Error. Table 1 provides a list of word pairs (including Kernel Error) that are similar to the word pair Kernel Error. By using a hexadecimal comparative set, the method described herein, when comparing Kernel Error to any of the word pairs in table 1, would assign a 0 metric distance between each of the aligned characters, hence a 0 metric distance between Kernel Error and each of the similar aligned word pairs.

(43) TABLE-US-00001 TABLE 1 Lists of words having a zero metric distance to the words Kernel Error Kernel Error K2rnel Error K0rnel Error Kbrnbl Arror K4rnel 3rror Karnel Error K3rnel Error KArnol Error Kcrncl Brror K5rnel 4rror Kbrnel Error K4rnel Error KBrnol Error Kdrndl Crror K6rnel 5rror Kcrnel Error K5rnel Error KCrnel Error Kernel Drror K7rnel 6rror Kdrnel Error K6rnel Error KDrnel Error Kfrnfl Error K8rnel 7rror Kernel Error K7rnel Error KErnel Error K1rnel Frror K9rnel 8rror Kfrnel Error K8rnel Error KFrnel Error K2rnel Arror K0rnel 9rror K1rnel Error K9rnel Error Karnal Error K3rnel Brror KArnel 0rror

(44) However, in traditional methods for determining metric distances between event messages 4, these character differences between the different word pairs in table 1, when compared to the words Kernel Error will have the effect of assigning a non-zero metric distance between the word pairs, hence a non-zero metric distance between the event messages 4. The method presented herein therefore provides a fast and efficient way of providing a zero metric distance to similar words and messages 4 and therefore allows for grouping of messages 4 with similar error types but using different MAC addresses (for example). This method of using pseudo metric space comes with a risk of identifying two different events as a single event by setting a zero metric distance when characters of different messages 4 are different but both comprised within the comparative set. However as shown above in Table 1, for words such as Kernel that are misspelt to karnel, this can be an advantage because the event message 4 shouldn't have a metric distance, but would do so using a traditional metric method. Furthermore, the future development of error messages 4 using words such as karnel 3rror, are extremely unlikely. This above described comparison between aligned characters and a comparative set is mathematically described below between two characters A.sub.i.sup.h and B.sub.j.sup.k as follows where the metric distance between two characters is d.sub.C.

(45) $\begin{matrix} d_{C} (A_{i}^{h}, b_{j}^{k}) = {\begin{matrix} 0 & A_{i}^{h}, B_{j}^{k} with A_{i}^{h} = B_{j}^{k} \\ 0 & A_{i}^{h}, B_{j}^{k} with A_{i}^{h} B_{j}^{k} and A_{i}^{h} ��, B_{j}^{k} �� \\ 1 & Otherwise \end{matrix} & Equ . 4 \end{matrix}$

(46) Here, custom character denotes the comparative set of characters, which is preferably the hexadecimal set of characters as mathematically noted in equation 4a.
{A,B,C,D,E,F,a,b,c,d,e,f,0,1,2,3,4,5,6,7,8,9}Equ. 4a

(47) The method can be applied to an event message 4 extracted from an event log 2 or applied to all the characters in the event log 2 or any part of an event log. Preferably however the method is applied to an event message 4 extracted from an event log 2. When the said method is applied to multiple characters aligned between two different messages 4 being compared, a determination of the word metric distance can be established.

(48) The character distance function d.sub.C defined in equation 4 is given for two characters A and B and is re-stated generically below in equation 5 where the three different outcomes are termed Case 1, Case 2 and Case 3. The above description providing the conditions required (i.e. what the characters are being compared) to give different character metric distance outputs are just an example of a method of determining dc.

(49) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 1 & Otherwise & Case 3 \end{matrix} & Equ . 5 \end{matrix}$

(50) Equation 5 has three cases. In case 1, A and B are same regardless of whether they belong to the comparative set or not. In case 2, A and B are different but both belong to the comparative set. Case 3 described all other cases, i.e., at least one of A or B does not belong to the comparative set.

(51) In the example of equation 5, the metric distance value for case 1 should be always zero because it is a fundamental property of metric mathematics that identical characters have no distance between them. The metric distance value for case 2 is also set to zero, but does not necessarily need to be zero and in principle can take any numerical value. Similarly, the metric distance value for case 3 is non-zero but does not necessarily need to be 1. In principle the output value for case 3 can be any value. Preferably the output value for case 3 is larger than the output value for case 2 as exemplified in the following two equations, 5a and 5b for character distance function dc.

(52) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 1000 & Otherwise & Case 3 \end{matrix} & Equ . 5 a \\ d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0.00001 & A, B with A B and A ��, B �� & Case 2 \\ 1 & Otherwise & Case 3 \end{matrix} & Equ . 5 b \end{matrix}$

(53) In both of the above equations 5a and 5b, the distance value dc is zero for case 1. This metric distance may be termed the fundamental metric distance and preferably takes the value 0, although in principle it can be any value, preferably less than the values attributed to cases 2 and 3. The distance values dc of cases 2 and 3 are different in each of equations 5a and 5b but are similar in that, for each instance, the numerical value for case 2 is negligible when compared to the distance value of case 3. If case 2 provides a non-zero value for dc, then the value of dc for case 3 is preferably at least 10 times greater than that of case 2, or preferably at least 100 times greater, or preferably at least 1000 times greater, or preferably at least 10,000 times greater, or preferably at least 100,000 times greater, or preferably at least 1000,000 times greater.

(54) The distance is pseudo metric if it yields zero distance for some elements or characters which are not same. In equation 5, case 2 yields zero for some circumstances where A is not equal to B. Therefore, the distance function represented by equation 5 is pseudo-metric and its corresponding metric space will be pseudo-metric space. The distance functions represented by equations 5a and 5b always yield a non-zero value when A is not equal to B. Therefore, they are proper (standard or non-pseudo) distance functions and their corresponding metric spaces are proper metric spaces.

(55) The above discussions illustrate that there are at least 3 character metric distance values, provided by the distance functions represented by the equations 5, 5a and 5b. One of the values is zero, another is zero or non-zero, and the other is a non-zero value. These can be termed a first, second and third value respectively. The second value is smaller, (preferably negligible when compared to) the third value.

(56) In the above distance functions represented by equations 5, 5a and 5b, case 3 can be divided into one or more further cases depending on whether A and B are members of the comparative set. Preferably there are three further 3 subdivided cases. Equation 5c mathematically describes this scenario.

(57) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 1 & A, B with A B and A ��, B .Math. �� & Case 3 \\ 1 & A, B with A B and A .Math. ��, B �� & Case 4 \\ 1 & A, B with A B and A .Math. ��, B .Math. �� & Case 5 \end{matrix} & Equ . 5 c \end{matrix}$

(58) The above distance function, represented by equation 5c, is exactly same as the distance function represented in equation 5 but expanded out to show three cases (case 3, case 4 and case 5) that represent different instances where A and B are either part of the comparative set or not. Case 1 and case 2 remain exactly same as before but case 3, 4 and 5 can be treated differently.

(59) Cases 3 and 4 represent the same scenario where one character element (A or B) belongs to the comparative set and the other does not. Preferably, both case 3 and case 4 yield non-zero values. These non-zero values may be identical or different. Preferably, case 3 and case 4 should also yield values which are significantly greater than values yielded by case 2 as described above.

(60) Case 5 corresponds to the case where both A and B do not belong to the comparative set. This case should yield the highest value because the contribution of these character distance value to the final event distance value will be high and subsequently push dissimilar event messages further apart. The values of cases 3, 4 and 5 in equations 5c-5g may be termed first, second and third values, where the third value represents case 5, whilst each of the first and second values may represent one of cases 3 or 4.

(61) Equations 5d-5g show some further examples.

(62) The distance function represented by equation 5d provides: a zero for case 1; a zero for case 2; identical non-zero outputs (in this example the value 1) for case 3 (where A and B are different but A belongs to the comparative set and B does not) and case 4 (where A and B are different 5. but B belongs to the comparative set and A does not); and a non-zero output (in this example 100) for case 5 (where A and B are different and neither belong to the comparative set).

(63) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 1 & A, B with A B and A ��, B .Math. �� & Case 3 \\ 1 & A, B with A B and A .Math. ��, B �� & Case 4 \\ 100 & A, B with A B and A .Math. ��, B .Math. �� & Case 5 \end{matrix} & Equ . 5 d \end{matrix}$

(64) The distance function represented by equation 5e provides: a zero for case 1; a zero for case 2; non-identical non-zero outputs (in this example the respective values 2 and 3) for case 3 (where A and B are different but A belongs to the comparative set and B does not) and case 4 (where A and B are different but B belongs to the comparative set and A does not); and a non-zero output (in this example 100) for case 5 (where A and B are different and neither belong to the comparative set).

(65) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 2 & A, B with A B and A ��, B .Math. �� & Case 3 \\ 3 & A, B with A B and A .Math. ��, B �� & Case 4 \\ 100 & A, B with A B and A .Math. ��, B .Math. �� & Case 5 \end{matrix} & Equ . 5 e \end{matrix}$

(66) The distance function represented by equation 5f provides: a zero for case 1; a non-zero value for case 2 (in this example the value 1); non-identical non-zero outputs (in this example the respective values 23 and 42) for case 3 (where A and B are different but A belongs to the comparative set and B does not) and case 4 (where A and B are different but B belongs to the comparative set and A does not); and a non-zero output (in this example 1000) for case 5 (where A and B are different and neither belong to the comparative set).

(67) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0 & A, B with A B and A ��, B �� & Case 2 \\ 23 & A, B with A B and A ��, B .Math. �� & Case 3 \\ 42 & A, B with A B and A .Math. ��, B �� & Case 4 \\ 1000 & A, B with A B and A .Math. ��, B .Math. �� & Case 5 \end{matrix} & Equ . 5 f \end{matrix}$

(68) The distance function represented by equation 5g provides: a zero for case 1; a non-zero value for case 2 (in this example the value 0.0001); identical non-zero outputs (in this example the value 1) for case 3 (where A and B are different but A belongs to the comparative set and B does not) and case 4 (where A and B are different but B belongs to the comparative set and A does not); and a non-zero output (in this example 10) for case 5 (where A and B are different and neither belong to the comparative set).

(69) $\begin{matrix} d_{C} (A, B) = {\begin{matrix} 0 & A, B with A = B & Case 1 \\ 0.0001 & A, B with A B and A ��, B �� & Case 2 \\ 1 & A, B with A B and A ��, B .Math. �� & Case 3 \\ 1 & A, B with A B and A .Math. ��, B �� & Case 4 \\ 10 & A, B with A B and A .Math. ��, B .Math. �� & Case 5 \end{matrix} & Equ . 5 g \end{matrix}$

(70) Determining Event Metric Distances

(71) In the second aspect, a distance function for event messages 4 is defined.

(72) An event message 4 is typically a string of characters. It is typically a sequence of words separated by word-separators. White space is a typical word separator but other characters such as colon (:), hyphen (-), etc., can also be treated as word separators depending on the context. When the word separators in an event message 4 A are ignored, it can be represented as follows:
A custom character A.sub.0 A.sub.1 A.sub.2 . . . A.sub.i . . . A.sub.N.sub.AEqu. 6
where A.sub.N.sub.A represents the last and A.sub.i for 0iN.sub.A represent the i.sup.th word of the event message 4 A.

(73) A word is a sequence of characters. The i.sup.th word of the event A can be represented as follows:

(74) $\begin{matrix} A_{i} \overset{}{=} A_{i}^{0} A_{i}^{1} A_{i}^{2} .Math. A_{i}^{j} .Math. A_{i}^{M_{(A_{i})}} & Equ . 7 \end{matrix}$
where M.sub.(A.sub.i.sub.) represents the last character of the word A.sub.i and A.sub.i.sup.j for 0 custom character jM.sub.(A.sub.i.sub.) represent the j.sup.th character of the i.sup.th word of the event message A.

(75) The method presented in the second aspect calculates a distance in metric space between computational event messages 4.

(76) The event messages 4 being compared are referred to as a first event message (for example message A) and a second event message (for example message B). The first event message is associated with a first event log 2 and comprises a first word sequence of one or more computational words. Each of the said one or more computational words comprises a sequence of one or more characters.

(77) Similarly, the second event message is associated with a second event log. The second event message comprises a second word sequence of one or more computational words. Each of the said one or more computational words comprises a sequence of one or more characters.

(78) The method determines the metric distance between event messages 4 by determining the metric distance between aligned words in the first and second message. For example, if both message A and message B have three words each, then the method calculates the metric distances between words: {A.sub.1 to B.sub.1}, {A.sub.2 to B.sub.2}, {A.sub.3 to B.sub.3}.

(79) In an example, if: message A has 3 words {KERNEL ERROR FATAL}, whilst, message B has 2 words {KERNEL STACKS};
then the method calculates the metric distances between words: {A.sub.1 to B.sub.1}, {A.sub.2 to B.sub.2}, i.e. {KERNEL to KERNEL}, {ERROR to STACKS}. In this example, the word A.sub.3 (FATAL) is not compared to another word because there is no corresponding word B.sub.3 in the word sequence of message B. The un-aligned words in a word sequence are preferably assigned a non-zero positive metric numeric value, although in principle they may be ignored. Preferably a word weight is assigned to un-aligned words as described herein.

(80) The metric distance between the two messages A and B is then determined, at least partially, based upon the summation of the above said word metric distances. In the example above, the event metric distance is at least partially determined by adding the word metric distance of {KERNEL to KERNEL}, to the word metric distance of {ERROR to STACKS}. The absolute message metric distance may also be determined, additionally, by one or more other factors, such as, but not limited to, word weights for un-aligned words as discussed below.

(81) The distance between each aligned word pair is determined at least partially, by summing the distances between the aligned characters in the said word pair. For example, word A.sub.2, in message A, has characters A.sub.2.sup.1, A.sub.2.sup.2, A.sub.2.sup.3, whilst word B.sub.2, in message B, has characters B.sub.2.sup.1, B.sub.2.sup.2, B.sub.2.sup.3, B.sub.2.sup.4. The aligned character pair distances are therefore the metric distances between: {A.sub.2.sup.1 and B.sub.2.sup.1}, {A.sub.2.sup.2 and B.sub.2.sup.2}, {A.sub.2.sup.3 and B.sub.2.sup.3}. The character B.sub.2.sup.4 is an excess un-aligned character when compared to the character sequence in word A.sub.2 because it has no corresponding aligned character to compare against.

(82) The metric distance between the characters can be determined by any suitable method, including, but not limited to the methods disclosed in the first aspect described herein.

(83) Following through the example above where: message A has 3 words {KERNEL ERROR FATAL}, whilst, message B has 2 words {KERNEL STACKS};
the method according to the second aspect, using the method of determining character metric distances of the first aspect (with a hexadecimal comparative set and a first predefined distance of 0 and a second predefined distance of 1) would base the event metric distance at least partially upon the following: 1. The word metric distance d.sub.W1 of {KERNEL to KERNEL}. This would be 0 because all of the aligned characters {K to K} {E to E} {R to R} {N to N} {E to E} {L to L} are identical, therefore each outputting a character metric distance of 0. The summation of all the character metric distances in the first aligned words total to 0. 2. The word metric distance d.sub.W2 of {ERROR to STACKS}. This would result in a value of 5 because each of the aligned characters {E to S} {R to T} {R to A} {O to C} {R to K} have non identical characters in each aligned pair.

(84) Thus the summation of the word metric distances in this example would be d.sub.E(AB)=0+5=5. Here we are assuming that the last character S in STACKS has no contribution to the metric distance.

(85) The method according to the second aspect may also account for situations where the number of words in the message 4 being compared are non-identical, for example message A containing 8 words and message B containing 4 words where the excess number of un-aligned words in message A is 4. Preferably, the method assigns a numeric value to each and every word in the longest message 4 that is in excess of the last word in the shorter message 4 when the words in the messages 4 are aligned. This numeric value may be termed a weight or word weight. The method preferably determines the event metric distance from the summation of the aligned word metric distances as described above and the summation of each of the word weights.

(86) For example if the number of words in message A is N.sub.A and the number of words in message B is N.sub.B then if N.sub.A<N.sub.B then each of the words in N.sub.B in excess of N.sub.A is assigned a corresponding word weight w(B). Similarly, if N.sub.A>N.sub.B then each of the words in N.sub.A in excess of N.sub.B is assigned a corresponding word weight w(A).

(87) Mathematically, the distance d.sub.E between two events A and B taking into account word weights can be defined as follows:

(88) 0 $\begin{matrix} d_{E} (A, B) = {\begin{matrix} {.Math.}_{i = 0}^{N_{A}} d_{W} (A_{i}, B_{i}) + {.Math.}_{i = N_{A} + 1}^{N_{B}} w (B_{i}) & N_{A} < N_{B} \\ {.Math.}_{i = 0}^{N_{A}} d_{W} (A_{i}, B_{i}) & N_{A} = N_{B} \\ {.Math.}_{i = 0}^{N_{B}} d_{W} (A_{i}, B_{i}) + {.Math.}_{i = N_{B} + 1}^{N_{A}} w (A_{i}) & N_{A} > N_{B} \end{matrix} & Equ . 8 \end{matrix}$

(89) Here w(A.sub.i) and w(B.sub.i) denote the weights of the i.sup.th word in events A and B respectively, and d.sub.w(A.sub.i, B.sub.i)) denotes the distance between the i.sup.th words in events A and B. Term d.sub.E is defined as the summation of word-wise distances and weights and can also be defined as follows:

(90) $\begin{matrix} d_{E} (A, B) = {.Math.}_{k = 0}^{\min (N_{A}, N_{B})} d_{W} (A_{i}, B_{i}) + {.Math.}_{k = \min (N_{A}, N_{B}) + 1}^{\max (N_{A}, N_{B})} w (C_{i}) & Equ . 9 \end{matrix}$

(91) Where:

(92) $C_{i} = {\begin{matrix} A_{i} & N_{A} > N_{B} \\ B_{i} & Otherwise \end{matrix}$

(93) The numeric value or word weight given to the excess words may in principle be any value. Preferably, the second aspect determines the word weights based upon the number of characters in the word where the weight w(X.sub.i) of a word X.sub.i in event message X, is defined as the number of characters in that word. Therefore we have the following, where M is the last character:
w(X.sub.i)=1+M.sub.(X.sub.i.sub.)Equ. 10

(94) It is noted in Equation 10 that the numeral 1 is added to M.sub.(Xi) due to the computational notation of indexing from 0 instead of 1.

(95) Similarly to calculate the event message distance d.sub.E using aligned word metric distances, and additionally, word weights when the number of words in each message 4 are non-identical; the metric distance d.sub.W between any two words A.sub.i and B.sub.j may further utilise a value associated with the number of excess characters the longest word (in the aligned word pair) has over and above the last aligned character in the corresponding shortest word. This value is preferably the absolute numerical difference (i.e. the modulus of the difference) between the number of characters in the aligned words.

(96) For example when calculating the word metric distance d.sub.C2 of {ERROR to STACKS} as described above, STACKS has 6 characters whilst ERROR has 5 characters. The absolute (modulus) difference between the number of characters in the strings is |56|=1. Preferably, this extra value is added to the summation of the aligned character metric distances between the two words, i.e. the characters {E to S} {R to T} {R to A} {O to C} {R to K}. Using the method of determining character metric distances of the first aspect (with a hexadecimal comparative set and a first predefined distance of 0 and a second predefined distance of 1) would give the word metric distance between ERROR and STACKS the metric value of 6 (5 units of metric distance arising from the aligned character pairs and 1 unit of metric distance arising from the excess un-aligned character S from the word STACKS).

(97) Mathematically this can be defined as follows:

(98) $\begin{matrix} d_{W} (A_{i}, B_{j}) = {\begin{matrix} {.Math.}_{k = 0}^{M_{(A_{i})}} d_{C} (A_{i}^{k}, B_{j}^{k}) + M_{(B_{j})} - M_{(A_{i})} & M_{(A_{i})} < M_{(B_{j})} \\ {.Math.}_{k = 0}^{M_{(A_{i})}} d_{C} (A_{i}^{k}, B_{j}^{k}) & M_{(A_{i})} = M_{(B_{j})} \\ {.Math.}_{k = 0}^{M_{(A_{i})}} d_{C} (A_{i}^{k}, B_{j}^{k}) + M_{(A_{i})} - M_{(B_{j})} & M_{(A_{i})} > M_{(B_{j})} \end{matrix} & Equ . 11 \end{matrix}$

(99) Here d.sub.C(A.sub.i.sup.k, B.sub.j.sup.k) denotes the distance between the characters A.sub.i.sup.k and B.sub.j.sup.k. Again, d.sub.C is defined as the summation of character-wise distances and weights and can also be written as follows:

(100) $\begin{matrix} d_{W} (A_{i}, B_{j}) = .Math. M_{(A_{i})}, M_{(B_{j})} .Math. + \underset{k = 0}{\overset{\min (M_{(A_{i})}, M_{(B_{j})})}{.Math.}} d_{C} (A_{i}^{k}, B_{j}^{k}) & Equ . 12 \end{matrix}$

(101) Preferably, the method according to the second aspect uses the method mathematically corresponding to all of equations 8 (hence 9), 10 and 11 (hence 12).

(102) Clustering Event Messages

(103) According to a third aspect, there are provided methods and associated software and hardware for defining an area in metric space for clustering computational event messages 4.

(104) Preferably, the method provides for finding a finite minimal cover for the event metric space with a number of pair-wise disjoint metric balls 8 of varying centre points and allowing for varying radii 30. Mathematically, for the event metric space M we need to find N collectively exhaustive and mutually exclusive metric balls 8. Mathematically this is denoted as:
B(R.sub.i,X.sub.i) for 0 custom character i<N where iEqu. 13
for which the following two equations are satisfied:

(105) $\begin{matrix} \overset{N - 1}{\underset{i = 0}{.Math.}} B (R_{i}, X_{i}) = M where R_{i} 0 and X_{i} M & Equ . 14 \\ B (R_{i}, X_{i}) .Math. B (R_{j}, X_{j}) = for i j & Equ . 15 \end{matrix}$

(106) Subsequently the problem reduces to finding X.sub.i and R.sub.i in the event metric space.

(107) The method according to the third aspect, defines an area in metric space for clustering computational event messages 4. The event logs 2 are preferably taken from a list of event logs 2 (hence event messages 4) such as shown in FIGS. 1 and 2. Preferably the method sequentially analyses each log 2 in sequence, from the top of the list to the bottom of the list, when clustering messages 4.

(108) The method takes a first plurality of existing clusters 6. At least two of the said existing clusters 6 overlap in at least one overlap region 24 in metric space. For purposes of clarity, the overlap region 24 is the region of metric space where clusters 6 overlap; whilst the overlapping clusters 6 or overlapping areas (or overlapping defined areas) refer to the existing areas in metric space (for example existing metric balls 8) that partially overlap with one another giving rise to the overlap region. The existing clusters 6 are preferably those already determined or created from a cluster analysis of a first plurality of event messages 4. Preferably the said first plurality of event messages 4 comprises part of the list of event messages 4 in the said list of logs 2.

(109) Preferably the first plurality of event messages 4 comprises the event message 4 starting at the top of the list and subsequently each further event message 4 in sequence down the list up down to the Y.sup.th event message 4 on the list where Y is the number of messages 4 in the first plurality of event messages 4.

(110) The existing clusters 6 are defined areas in metric space that contain one or more event messages 4. FIG. 3 shows an example of a geometrical representation of a first plurality of event messages 4 divided into four clusters A, B, C, D. Each of the said first plurality of event messages 4 is located within one of the plurality of clusters 6 (defined areas).

(111) Preferably, as shown in FIG. 3, each existing cluster 6 is defined by a metric ball 8 centred about a central event message 10 with a radius R. In FIG. 3 it is shown that two overlap regions 24 exist: one overlap region 24 between metric balls A and D, and another between metric balls C and D.

(112) The method then examines an event that has not already been analysed for incorporation into a cluster 6 (i.e. a new event or a further event) and analyses if the event is within any of the overlap regions 24. Preferably the further event message 26 is one of the event messages 4 in the list of logs 2. Preferably the further event message 26 is the event message 4 immediately following the last of the first plurality of event messages 4 in the list that have already been analysed for incorporation into a cluster.

(113) If the further event message 26 is determined to reside within an overlap region, then the method creates a further cluster 6 (defined area). The further cluster 6 comprises a metric area comprising the further event message 26 and the event messages 4 originally comprised within the existing clusters 6 creating the said overlap region. Preferably, the method deletes the existing clusters 6 that defined the overlap region 24 at any time after the determination is made that the further event message 26 resides within the overlap region.

(114) FIG. 4a shows an example of this method whereby three existing clusters 6 are represented by three metric balls 8 {B.sub.1, B.sub.2, B.sub.3} with respective radii 30 {R.sub.1, R.sub.2, R.sub.3} and centre event messages {X.sub.1, X.sub.2, X.sub.3}. As shown in FIG. 4a, a new (further) event E is found in the overlap region 24 defined by the said three existing metric balls 8. FIG. 4b shows a new further defined area in the form of a further metric ball 8, 28 that encompasses all of the first plurality of event messages 4 and the further event message E. The further metric ball 8, 28 is also known as a consolidated metric ball 8, 28 and has a radius R.sub.C and centre point E. FIG. 4c then shows the deletion of the existing metric balls 8 {B.sub.1, B.sub.2, B.sub.3}.

(115) Preferably the further event message E is determined to be within an overlap region 24 by examining the metric distances, for each of the first plurality of existing defined areas (clusters 6), d.sub.E, between the further event message E; and the said central event message 10 of the existing defined areas.

(116) The said distances, d.sub.E, are compared to the respective metric ball radii 30 associated with the said central event message 10. If at least two of the said comparisons determine that the further event message E is covered by the associated metric ball, then the further event message E is determined to be within the overlap region.

(117) If using closed metric balls 8, the distance must be smaller than the radius 30 of the ball. If using open metric balls 8 the distance must be smaller than or equal to the radius 30.

(118) All of the existing clusters 6 that contribute to the overlap region 24 are flagged so that the area covered by the further defined area can be suitably large to cover all of the event messages 4 originally covered by the metric balls 8 contributing to the overlap region.

(119) In FIG. 4, for example, three metric distances (d.sub.E1, d.sub.E3, d.sub.E3) would be respectively calculated between: 1. The central event message X.sub.1, of metric ball B.sub.1; and further event message E. 2. The central event message X.sub.2, of metric ball B.sub.2; and further event message E. 3. The central event message X.sub.3, of metric ball B.sub.3; and further event message E.

(120) The determination would then be made whether at least two of the said distances were either smaller than (for a closed metric ball); or smaller than or equal to (for an open ball); the respective radii 30 associated with the metric ball 8 used to determine the said distance. Thus, for example using a closed ball 8 representation, whether: 1. d.sub.E1<R.sub.1 2. d.sub.E2<R.sub.2 3. d.sub.E3<R.sub.3

(121) In FIG. 4, all the event metric distances (d.sub.E1, d.sub.E3, d.sub.E3) are smaller than the respective radii 30, thus the consolidated metric ball 8, 28 is formed with further central event message E and radius R.sub.C that covers all of the events in the original three metric ball 8 clusters as well as the new event E.

(122) Preferably the distance between the central event message 10 of an existing metric ball 8 and the further event message 26, is calculated using the method according to the second aspect as described herein, although in principle, any suitable event metric distance determining method could be used.

(123) Preferably the new radius R.sub.C of the consolidated metric ball 8, 28 is determined by calculating a summation value, for each of the overlapping existing metric balls 8. Each summation value is the numeric value of the radius 30 for that existing metric ball 8 added to the respective distance between the further event message 26 and the central event message 10 for the said metric ball.

(124) For example, in FIG. 4a, the summation value for metric ball B.sub.1, is d.sub.E1+R.sub.1; the summation value for metric ball B.sub.2 is d.sub.E2+R.sub.2; the summation value for metric ball B.sub.3 is d.sub.E3+R.sub.3.

(125) The method then selects the largest of the summation values and provides the further radius 30 based on the largest summation value. Preferably, the further radius 30 is the selected summation value. For the example shown in FIG. 4b, the summation value is that associated with existing metric ball B.sub.3. Using this method of setting the radius 30 for the consolidated metric ball 8, 28 makes sure that all the event messages 4 previously covered by the existing metric balls 8 contributing to the overlap region 24 are included in the consolidated metric ball 8, 28.

(126) Mathematically, the consolidated metric ball 8, 28 can be created as follows. Let E be the extracted event and let B(R.sub.i,X.sub.i) for 1ip where p2 represent all the metric balls 8 flagged in this instance. Then the consolidated metric ball 8, 28 is represented by B(R.sub.c, E) where R.sub.c=max{R.sub.i+d.sub.E(X.sub.i, E) i where 1ip}.

(127) FIG. 5 shows an example of a flow chart for determining the consolidated metric ball 8, 28 radius 30 as described above.

(128) The method according to the flow chart of FIG. 5 may be amended or modified in any suitable way according to the methods described in the first, second and third aspects.

(129) FIG. 5 firstly starts by setting R.sub.c=0, and also setting i=1. The total number of metric balls 8 flagged, p, is obtained.

(130) A first determination is then made as to whether i is greater than p, (i>p?). If the answer to the first determination is yes then the method according to the flow chart of FIG. 5 then makes a consolidated metric ball B(R.sub.c,E). After this the method according to the flow chart of FIG. 5 comes to an end.

(131) If however upon the determination of whether i>p, the answer is no, then the method proceeds by getting the i.sup.th metric ball B(R.sub.i,X.sub.i). After this step, a metric distance d is calculated which is equal to d.sub.w(X.sub.i,E)+R.sub.i.

(132) After this calculation there proceeds a second determination step determining whether R.sub.c<d.

(133) If the answer is yes, then R.sub.c is set to be equal to d. The next step then sets i equal to i+1.

(134) If the answer to this second determination (R.sub.c<d?) is no then the setting of R.sub.c=d is ignored and the method simply proceeds to the setting of i=i+1.

(135) After the setting of i=i+1, the method then further proceeds by going back to the first determination step of determining whether or not i>p.

(136) Preferably the clusters 6 are defined in a table where each new defined area (cluster) is inserted into the table beneath (or following) the last cluster entry in the table. The table may be initially populated or unpopulated.

(137) Event Table

(138) In a fourth aspect, methods are presented for populating a table. The table comprises a first plurality of data entries. Each of the first plurality of data entries is associated with a defined area in metric space (i.e. a cluster) and at least one of a first plurality of computational event messages 4. Each of the said first plurality of clusters 6 that have associated data entries in the table, may comprise a metric space covering one or more of the first plurality of event messages 4. The method inserts a first further data entry in the table, the first further data entry is associated with a further defined area (cluster) in metric space as described in the third aspect described herein.

(139) Preferably the first plurality of data entries and the first further data entry are associated with any one or more of, a radius 30, or a point in metric space, corresponding to a central event message 10. Each central event message 10 corresponds to one of the first plurality of event messages 4.

(140) Preferably, each of the first plurality of data entries in the table corresponds to a different point in metric space corresponding to a different central event message 10 specific for that cluster. Preferably the table comprises a second plurality of data entries, each corresponding to a different specific data entry comprised from the first plurality of data entries. Each of the second plurality of data entries comprises a radius 30. That radius 30 (being one of the second plurality of data entries) is preferably the radius 30 of the metric ball 8 defining the cluster 6 where the corresponding central event message 10 (being one of the first plurality of data entries) is the centre point of the metric ball.

(141) Similarly, the first further data entry and second further data entries inserted into the table, likewise respectively correspond to the central event message 10 of the consolidated metric ball 8, 28 (i.e. the further metric coordinates of the further message) and the radius 30 of the consolidated metric ball 8, 28. In other words, when a consolidated metric ball 8, 28 is formed, the method inserts both the data relating to the metric coordinates of the further event message 26 and the radius 30 of the consolidated metric ball 8, 28 (i.e. the further defined area). Preferably these further data entries are a further set that are inserted after the last set of data entries present in the table that relate to an existing cluster.

(142) FIG. 6 shows an example of the method steps involved in creating an event table for storing the said data entries described herein.

(143) Firstly an initial minimum radius R.sub.MIN is set. This defines the default radius 30 that new metric ball 8 clusters are assigned with.

(144) An event table is then initialised. In principle, the table could be initialised at any time before the assigning of the minimum radius 30 up to the requirement to insert a data entry into the table.

(145) The method determines whether or not a message 4 (for example, but not limited to a syslog event message) is available to be analysed. If there are no messages 4 to be analysed, the method ends. If there are event messages 4 to be analysed (typically by sequentially analysing a list of error logs 2) then the new message (also known as the further message) E is retrieved and the message 4 is extracted from the log.

(146) Using methods described herein, all the metric balls 8 that cover E are identified and flagged. When the first message in the cluster analysis is analysed, there typically will be no existing metric balls 8 represented in the table and therefore the first message will not correspond to any existing metric balls 8. If only one ball 8 is flagged then the new message E is allocated to that existing metric ball 8 and the method goes back to obtaining and analysing the next new message log 2.

(147) If no metric balls 8 are flagged then then new event E does not fall within the coverage of any existing metric balls 8 and is used to create a new metric ball B with the minimum radius 30, whereby the entries in the table for the metric ball 8 correspond to the said minimum radius 30 and the event message B(R.sub.MIN, E) (the event message 4 in this instance being assigned as the central event message 10). Once the new metric ball 8 is formed, the method adds B(R.sub.MIN, E) to the event table and then checks to see if any new event logs 2 are to be analysed again.

(148) If more than one event message 4 is flagged then the new message E is covered by a plurality of existing metric balls 8 in an overlap region. A new consolidated metric ball B(R.sub.C, E) is formed using methods described herein. Data values corresponding to the consolidated metric ball radius 30 and the new event message (B(R.sub.MIN, E)) are added to the event table, preferably after the last set of data entries. The data entries relating to the existing flagged metric balls 8 are then deleted and the method returns to check if there are any further event logs 2 to be analysed.

(149) Once the event table is created it can be used for various log 2 analysis purposes. Each metric ball 8 entry in the event table represents a syslog message/event cluster. To determine which event cluster 6 a given a syslog message 4 (that has already been included in the clustering analysis) belongs to we only need to find out which metric ball 8 in the event table contains the given event message. There will be exactly one metric ball 8 which will contain the given event.

(150) It is simple to determine if a given syslog message 4 belongs to a particular event cluster 6 or not. Let E be the event extracted from the given syslog message 4 and let B(R, X) be a metric ball 8 entry in the event table representing a syslog event cluster 6 where X is the central event message 10 for that metric ball. Then, either B(R, X) is an open ball 8 and d.sub.W(X, E)<R or B(R, X) is a closed ball 8 and d.sub.W(X, E)R when the syslog message E belongs to this cluster. Otherwise E does not belong to this cluster.

(151) Sometimes there is a need to analyse new syslog event messages 4 which were not part of the process initially when constructing the event table. This is particularly relevant in the case of online syslog message processing where new syslog messages 4 will continuously be generated and there is a need to maintain an event table that clusters all past messages 4 generated so far. In this case the event table needs to be modified as new messages 4 are processed.

(152) One of the following happens when there is a new syslog message and an existing event table: 1. The syslog message 4 belongs to no event clusters 6. There is therefore a need to extract the event, create a new metric ball, and add this metric ball 8 to the event table. 2. The syslog message 4 belongs to exactly one event cluster. There is no need to do anything in this instance. 3. The syslog message 4 belongs to more than one event cluster. There is need to flag all metric balls 8 that the extracted event belongs to, create a consolidated metric ball 8, 28 as described herein, remove all flagged metric ball 8 entries from the event table and add the consolidated metric ball 8, 28 entry.

(153) Results

(154) The method according to the third aspect, (using the method of the first aspect using a hexadecimal comparative set and a first predefined metric character distance of 0 and a second predefined character metric distance of 1; with the method of the second aspect using equations 8-12) was implemented as an algorithm. The algorithm was implemented in Perl programming language and evaluated using Blue Gene/L supercomputer logs 2 available from Sandia National Laboratories. Table 2 and Table 3 show the results when running the algorithm to analyse, respectively, 1 day, 1 week, 2 weeks, 4 weeks, 8 weeks, and 20 weeks of Blue Gene/L log messages 4.

(155) TABLE-US-00002 TABLE 2 Number of event clusters vs. number of syslog lines Duration of Log Messages + 1 DAY 1 WEEK 2 WEEKS 4 WEEKS 8 WEEKS 20 WEEKS XY(R.sub.min = 1) 39 194 242 437 664 1209 Number of Event Clusters XY(R.sub.min = 5) 19 152 187 317 466 693 MTT/SLCT 124 8,934 9,362 21,805 58,576 69,688 Number of Log 16,851 222,609 837,178 1,384,976 2,935,994 3,691,534 Messages

(156) Table 2 shows the number of log messages 4 and number of event clusters 6 found. The numbers of event clusters 6 found using the method presented herein are shown for algorithms using a minimum radius R.sub.MIN of 1 and 5. The results of an MTT/SLCT clustering algorithm are shown in comparison. The present method reduces the number of event clusters 6 formed to 1% or 2% of that of the SLCT algorithm. For example, 20 Weeks of Blue Gene/L log 2 has approximately 3.7 million lines. The SLCT algorithm assigns them into 70,000 clusters, The present method with minimum radii of 1 and 5 respectively put them into approximately 1200 and 700 event clusters 6. This is a reduction to 1% or 2% of the number of clusters found by the existing SLCT algorithm and shows that the present method provides a clustering method that outputs a number of clusters 6 that is more suitable for further analysis.

(157) TABLE-US-00003 TABLE 3 Minimum radius vs. number of event clusters Duration of Log Messages + 1 DAY 1 WEEK 2 WEEKS 4 WEEKS 8 WEEKS 20 WEEKS XY(R.sub.min = 1) 39 194 242 437 664 1209 Number of Event Clusters XY(R.sub.min = 2) 24 161 199 365 566 939 XY(R.sub.min = 3) 22 156 191 343 514 815 XY(R.sub.min = 4) 20 154 189 325 483 729 XY(R.sub.min = 5) 19 152 187 317 466 693 XY(R.sub.min = 6) 19 151 186 263 407 623 XY(R.sub.min = 7) 19 148 183 253 391 593 XY(R.sub.min = 8) 18 144 177 247 369 565 XY(R.sub.min = 9) 18 139 171 234 219 166 XY(R.sub.min = 10) 18 132 165 225 211 156

(158) Table 3 shows how the number of clusters 6 varies with minimum radius 30. For a given minimum radius 30 and given order of processing log messages 4, the algorithm outputs an event table. The method presented herein therefore may provide varying ways of clustering log messages 4 at least by varying the default minimum radius 30 and/or changing the characters in the comparative set.

(159) The centre points of the metric balls 8 are representative events of clusters 6. A close examination of representative events in the clusters 6 reveals that decreasing/increasing the minimum radius 30 of the algorithm has an effect on the grouping of different length words formed of hexadecimal characters. For example, for 1 day Blue Gene/L logs 2 we have the following representative events when minimum radius 30 is 1: RAS KERNEL INFO generating core.304 RAS KERNEL INFO generating core.17 RAS KERNEL INFO generating core.3616

(160) Each of the three above logs 2 are covered by different clusters 6 because each one has a different length end word, hence the character weighting factor as described in equation 12 introduces a non-zero integer metric distance between these similar messages 4.

(161) However, for the same log 2 when the minimum radius 30 is 5 there is the following single representative: RAS KERNEL INFO generating core.304

(162) With this larger minimum radius 30, the other two events ending in 17 and 3616, when analysed, fell within the existing ball 8 centred on the already analysed event 304, hence, no new balls 8 were formed.

(163) Clearly, a minimum radius 30 of 1 produces 3 clusters 6 of the format generating core whereas minimum radius 30 of 5 produces just 1 cluster 6 of the same format.

(164) The accuracy of the method was 100% when evaluated with Blue Gene/L log messages 4. This was visually inspected and verified for the Blue Gene/L logs 2 for up to a week. Each event table output from the algorithm with different minimum radii 30 represented a valid clustering method for given log messages 4. As described above for the generating core string, the algorithm groups them into 1 or 3 clusters 6 depending on the minimum radius 30 but both groupings are acceptable.

(165) When comparing the present method with previous methods, such as Sisyphus and the SLCT algorithm, the present method is more efficient and generates fewer clusters 6 to analyse. The event clustering method used in Sisyphus considers a word appearing a particular position in a log message 4 as a token and subsequently a token considered as an event and used for clustering. This method generates a very large amount of events. The SLCT algorithm overcomes this problem by extracting patterns. While the SLCT algorithm reduces the number of event clusters 6 compared to the Sisyphus algorithm the number of event clusters 6 generated is still high. Further, the SLCT groups all events that it cannot cluster into a new cluster, thereby reducing the accuracy of clustering.

(166) Some of clusters discovered by SLOT algorithm from Blue Gene/L 1 day logs 2 are given below:

(167) RAS KERNEL INFO ddr:

(168) RAS KERNEL INFO generating

(169) RAS KERNEL INFO generating core.385

(170) RAS KERNEL INFO generating core.257

(171) RAS KERNEL INFO generating core.65

(172) RAS KERNEL INFO*double-hummer alignment exceptions

(173) RAS KERNEL INFO 221 double-hummer alignment exceptions

(174) RAS KERNEL INFO 223 double-hummer alignment exceptions

(175) RAS KERNEL INFO 203 double-hummer alignment exceptions

(176) RAS KERNEL INFO 102 double-hummer alignment exceptions

(177) RAS KERNEL INFO 183 double-hummer alignment exceptions

(178) RAS KERNEL INFO 201 double-hummer alignment exceptions

(179) RAS KERNEL INFO 222 double-hummer alignment exceptions

(180) RAS KERNEL INFO 242 double-hummer alignment exceptions

(181) RAS KERNEL INFO 101 double-hummer alignment exceptions

(182) RAS KERNEL INFO 122 double-hummer alignment exceptions

(183) RAS KERNEL INFO 202 double-hummer alignment exceptions

(184) RAS KERNEL INFO 181 double-hummer alignment exceptions

(185) RAS KERNEL INFO 121 double-hummer alignment exceptions

(186) Clearly the SLCT algorithm is unable to capture (and hence ignore) decimal/hexadecimal patterns for clustering effectively.

(187) The said methods described herein may be embodied as one or more pieces of software. The said software is preferably held or otherwise encoded upon a memory device such as, but not limited to, any one or more of, a hard disk drive, RAM, ROM, solid state memory or other suitable memory device or component configured to software. The said methods may be realised by executing/running the software. Additionally or alternatively, the said methods may be hardware encoded. The said methods encoded in software or hardware are preferably executed using one or more processors. The said memory and/or hardware and/or processors are preferably comprised as, at least part of, one or more servers and/or other suitable computing systems.

Methods of clustering computational event logs

Assignee

Inventors

Cpc classification

Classification Explorer

G06F16/1734

PHYSICS

Classification Explorer

G06F16/686

PHYSICS

Classification Explorer

G06F11/323

PHYSICS

Classification Explorer

G06F16/906

PHYSICS

Classification Explorer

G06F16/355

PHYSICS

Classification Explorer

G06F11/0766

PHYSICS

Classification Explorer

G06F18/24137

PHYSICS

Classification Explorer

G06V30/1983

PHYSICS

International classification

Classification Explorer

G06F16/00

PHYSICS

Classification Explorer

G06F16/906

PHYSICS

Classification Explorer

G06F16/17

PHYSICS

Classification Explorer

G06F16/35

PHYSICS

Classification Explorer

G06K9/62

PHYSICS

Classification Explorer

G06F16/68

PHYSICS

Classification Explorer

G06K9/68

PHYSICS

Abstract

Claims

Description