APPARATUS AND METHOD FOR DETECTING AN ANOMALY IN A DATASET AND COMPUTER PROGRAM PRODUCT THEREFOR
20210144167 · 2021-05-13
CPC classification: G06N7/01 (Physics); G06F11/0781 (Physics)
Abstract
Apparatus and methods for detecting an anomaly in a dataset by using two or more anomaly detection algorithms, as well as corresponding computer program products, are described. The results obtained by using the two or more anomaly detection algorithms are combined in accordance with a certain rule of combination, thereby providing an improved accuracy of anomaly detection.
Claims
1. An apparatus for detecting an anomaly in a dataset, the apparatus comprising: at least one processor; and a storage coupled to the at least one processor and storing executable instructions which, when executed by the at least one processor, cause the at least one processor to: receive the dataset comprising multiple data items among which at least one data item is anomalous, process the data items in the data set by each of at least two of a plurality of anomaly detection algorithms to: calculate an anomaly score for each of the data items, based on the anomaly scores, obtain a partial ranking of the data items, the partial ranking causing the data items to be divided into subsets each corresponding to a different interval of intermediate ranks, based on the partial ranking, select a probabilistic model describing the intermediate ranks of the data items in each subset, and based on the probabilistic model, assign a degree of belief to the intermediate rank of each of the data items in each subset, obtain a total degree of belief for the intermediate rank of each of the data items by combining the degrees of belief obtained, for intermediate ranks corresponding to each of the data items, from the at least two anomaly detection algorithms in accordance with a predefined combination rule, convert the total degrees of belief for the intermediate ranks of the data items to a probability distribution function describing expected ranks of the data items, sort the data items according to the expected ranks of the data items, and find, among the sorted data items, the at least one anomalous data item.
2. The apparatus of claim 1, wherein the at least one processor is further configured to select the at least two anomaly detection algorithms from the plurality of anomaly detection algorithms based on a usage domain which the data items belong to.
3. The apparatus of claim 1, wherein each of the at least two anomaly detection algorithms is provided with a different weight coefficient, and wherein the at least one processor is further configured to assign the degree of belief based on the probabilistic model in concert with the weight coefficient of the anomaly detection algorithm.
4. The apparatus of claim 3, wherein the at least two anomaly detection algorithms are unsupervised learning based anomaly detection algorithms, and wherein the different weight coefficients of the at least two anomaly detection algorithms are specified based on user preferences such that the sum of the weight coefficients is equal to 1.
5. The apparatus of claim 3, wherein the at least two anomaly detection algorithms are supervised learning based anomaly detection algorithms, and wherein the weight coefficients of the at least two anomaly detection algorithms are adjusted by using a pre-arranged training set comprising different previous datasets and target rankings each corresponding to one of the previous datasets.
6. The apparatus of claim 5, wherein the weight coefficients of the at least two anomaly detection algorithms are further adjusted based on a Kendall tau distance serving as a measure of distance between the combined partial rankings obtained by the at least two anomaly detection algorithms and a respective one of the target rankings from the training set.
7. The apparatus of claim 1, wherein the subsets obtained based on the partial ranking of the data items comprise at least two first subsets each comprising the data items having the same anomaly scores.
8. The apparatus of claim 7, wherein the intervals of intermediate ranks of the at least two first subsets are non-overlapping.
9. The apparatus of claim 7, wherein the subsets obtained based on the partial ranking of the data items further comprise a second subset comprising data items falling outside of the at least two first subsets, and the at least one processor is further configured to select the probabilistic model taking into account the second subset.
10. The apparatus of claim 9, wherein the data items of the second subset are erroneously missed data items.
11. The apparatus of claim 9, wherein the data items of the second subset are data items having the anomaly scores differing from those of the data items belonging to the at least two first subsets.
12. The apparatus of claim 9, wherein the data items of the second subset are erroneously missed data items and data items having the anomaly scores differing from those of the data items belonging to the at least two first subsets.
13. The apparatus of claim 9, wherein the interval of intermediate ranks of the second subset covers the intervals of intermediate ranks of the at least two first subsets.
14. The apparatus of claim 1, wherein the predefined combination rule comprises Dempster's rule of combination.
15. The apparatus of claim 1, wherein the at least two anomaly detection algorithms comprise any combination of the following algorithms: a nearest neighbor-based anomaly detection algorithm, a clustering-based anomaly detection algorithm, a statistical anomaly detection algorithm, a subspace-based anomaly detection algorithm, and a classifier-based anomaly detection algorithm.
16. The apparatus of claim 1, wherein the degree of belief for the intermediate rank comprises a basic belief assignment.
17. The apparatus of claim 1, wherein the at least one processor is further configured to convert the total degrees of belief for the intermediate ranks of the data items to the probability distribution function by using a pignistic transformation, and wherein the probability distribution function is a pignistic probability function.
18. The apparatus of claim 1, wherein the data items comprise network flow data, and the at least one anomalous data item relates to abnormal network flow behavior.
19. A method for detecting an anomaly in a dataset, the method comprising: receiving the dataset comprising multiple data items among which at least one data item is anomalous, processing the data items in the data set by each of at least two of a plurality of anomaly detection algorithms by: calculating an anomaly score for each of the data items, based on the anomaly scores, obtaining a partial ranking of the data items, the partial ranking causing the data items to be divided into subsets each corresponding to a different interval of intermediate ranks, based on the partial ranking, selecting a probabilistic model describing the intermediate ranks of the data items in each subset, and based on the probabilistic model, assigning a degree of belief to the intermediate rank of each of the data items in each subset, obtaining a total degree of belief for the intermediate rank of each of the data items by combining the degrees of belief obtained, for intermediate ranks corresponding to each of the data items, from the at least two anomaly detection algorithms in accordance with a predefined combination rule, converting the total degrees of belief for the intermediate ranks of the data items to a probability distribution function describing expected ranks of the data items, sorting the data items according to the expected ranks of the data items, and finding, among the sorted data items, the at least one anomalous data item.
20. A computer program product comprising a computer-readable storage medium storing a computer program, the computer program, when executed by at least one processor, causing the at least one processor to perform operations comprising: receiving a dataset comprising multiple data items among which at least one data item is anomalous, processing the data items in the dataset by each of at least two of a plurality of anomaly detection algorithms by: calculating an anomaly score for each of the data items, based on the anomaly scores, obtaining a partial ranking of the data items, the partial ranking causing the data items to be divided into subsets each corresponding to a different interval of intermediate ranks, based on the partial ranking, selecting a probabilistic model describing the intermediate ranks of the data items in each subset, and based on the probabilistic model, assigning a degree of belief to the intermediate rank of each of the data items in each subset, obtaining a total degree of belief for the intermediate rank of each of the data items by combining the degrees of belief obtained, for intermediate ranks corresponding to each of the data items, from the at least two anomaly detection algorithms in accordance with a predefined combination rule, converting the total degrees of belief for the intermediate ranks of the data items to a probability distribution function describing expected ranks of the data items, sorting the data items according to the expected ranks of the data items, and finding, among the sorted data items, the at least one anomalous data item.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] The essence of the present disclosure is explained below with reference to the accompanying drawings.
DETAILED DESCRIPTION
[0040] Various embodiments of the present disclosure are further described in more detail with reference to the accompanying drawings. However, the present disclosure can be embodied in many other forms and should not be construed as limited to any certain structure or function disclosed in the following description. In contrast, these embodiments are provided to make the description detailed and complete.
[0041] From the detailed description, it will be apparent to those skilled in the art that the scope of the present disclosure encompasses any embodiment disclosed herein, irrespective of whether this embodiment is implemented independently or in concert with any other embodiment. For example, the apparatus and method disclosed herein can be implemented in practice by using any number of the embodiments provided herein. Furthermore, it should be understood that any embodiment can be implemented using one or more of the elements or steps presented in the appended claims.
[0042] As used herein, the term “anomaly” and its derivatives, such as “anomalous”, “abnormal”, etc., refer to something that deviates from what is standard, normal, or expected. In particular, the term “anomalous data item” as used herein means a data item in a dataset which falls outside the range of the standard deviation of the data items in the dataset. An anomaly may be characterized by two or more neighboring or close anomalous data items, in which case it is called a collective anomaly. The anomaly may relate to an event of interest, i.e. a problem to be detected and solved, or be irrelevant to the event of interest. In the latter case, the anomaly is called a spurious anomaly. One example of an anomaly is a suspiciously large (i.e., non-typical) network flow, which may be caused by malicious software. Although references are made herein to network flow data, it should be apparent to those skilled in the art that this is done only by way of example and not limitation. In other words, the embodiments disclosed herein may be equally applied in other usage domains where anomaly detection is required, such as, for example, the detection of fraudulent pump-and-dump schemes on stocks, the detection of excessive scores mistakenly issued in figure skating or other sports, etc.
[0043] The term “combination rule” used herein refers to an analytical rule or condition that may be applied to the output data of multiple data sources to integrate the output data into information that is more consistent, accurate, and useful than the output data of any individual data source. The data sources are represented herein by anomaly detection algorithms, and their output data to be integrated or combined comprise degrees of belief. One example of the combination rule is the Dempster's rule of combination.
[0044] The term “degree of belief” used herein refers to a mathematical object called a belief function that is used in the theory of belief functions, also known as the evidence theory or Dempster-Shafer theory. The theory of belief functions allows one to combine evidence from different data sources to arrive at a degree of belief that takes into account all the available evidence. As will be shown later, the degree of belief is applied herein to intermediate ranks of data items obtained by using the anomaly detection algorithms. One example of degrees of belief are basic belief assignments (bbas), which will be discussed later in the context of the embodiments disclosed herein. By definition, assuming that θ represents a set of hypotheses H (for example, all possible states of a system under consideration), which is called a frame of discernment, the basic belief assignment is a function m: 2^θ → [0, 1] assigning a belief mass to each element of the power set 2^θ, which is the set of all subsets of θ, including the empty set Ø. The basic belief assignment has the following two main properties:

m(Ø) = 0 and Σ_{H_n ⊆ θ} m(H_n) = 1,

where the subsets H_n of θ with non-zero masses m(H_n) > 0 are called the focal elements of m.
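The two properties above can be illustrated with a minimal Python sketch; the frame and the hypothesis names H1–H3 are invented for the example and are not taken from the disclosure:

```python
# Frame of discernment: a set of mutually exclusive hypotheses.
theta = frozenset({"H1", "H2", "H3"})

# A bba maps focal elements (subsets of theta) to masses in [0, 1].
# Mass assigned to the whole frame theta models total ignorance.
m = {
    frozenset({"H1"}): 0.5,
    frozenset({"H2", "H3"}): 0.3,
    theta: 0.2,
}

# Property 1: the empty set carries no mass.
assert m.get(frozenset(), 0.0) == 0.0
# Property 2: the masses over all focal elements sum to 1.
assert abs(sum(m.values()) - 1.0) < 1e-12
```

Note that a bba is more general than a probability distribution: mass may sit on non-singleton subsets such as {H2, H3}, expressing belief that cannot be split between the individual hypotheses.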
[0045] As used herein, the term “rank” refers to a numerical parameter used to classify data items into different anomaly classes. Each anomaly class is represented by a certain interval of ranks. An intermediate rank discussed herein is obtained by using any one anomaly detection algorithm. An expected rank, also discussed herein, is a more reliable rank resulting from combining the intermediate ranks obtained by multiple anomaly detection algorithms.
[0048] Generally speaking, the absolute values of the anomaly scores themselves are meaningless: they are used solely for establishing the ordering relationship among the data items. Moreover, since each anomaly detection algorithm is prone to its own errors, the accuracy of anomaly detection is low when only one anomaly detection algorithm is used.
[0049] The aspects of the present disclosure discussed below take into account the above-mentioned drawbacks, and are directed to improving the accuracy and robustness of anomaly detection, particularly, in the network flow data.
[0051] The storage 302 may be implemented as a volatile or nonvolatile memory used in modern electronic computing machines. Examples of the nonvolatile memory include Read-Only Memory (ROM), flash memory, ferroelectric Random-Access Memory (RAM), Programmable ROM (PROM), Electrically Erasable PROM (EEPROM), solid state drives (SSD), magnetic disk storage (such as hard drives and magnetic tapes), optical disc storage (such as compact discs (CD), digital video discs (DVD) and Blu-ray discs), etc. As for the volatile memory, examples thereof include Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Static RAM, etc.
[0052] Relative to the processor 304, it may be implemented as a central processing unit (CPU), general-purpose processor, single-purpose processor, microcontroller, microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), digital signal processor (DSP), complex programmable logic device, etc. It should be also noted that the processor 304 may be implemented as any combination of one or more of the aforesaid. As an example, the processor 304 may be a combination of two or more microprocessors.
[0053] The executable instructions 306 stored in the storage 302 may be configured as a computer executable code which causes the processor 304 to perform the aspects of the present disclosure. The computer executable code for carrying out operations or steps for the aspects of the present disclosure may be written in any combination of one or more programming languages, such as Java, C++ or the like. In some examples, the computer executable code may be in the form of a high level language or in a pre-compiled form, and be generated by an interpreter (also pre-stored in the storage 302) on the fly.
[0054] When executing the executable instructions 306, the processor 304 first receives the dataset comprising multiple data items among which the at least one data item is anomalous, as noted above. After that, the processor 304 selects at least two anomaly detection algorithms based on the usage domain which the data items belong to. The reason for using two or more anomaly detection algorithms is a synergistic effect: the accuracy of anomaly detection provided by the two or more anomaly detection algorithms is higher than that provided by any single anomaly detection algorithm. More specifically, if a user of the apparatus 300 were absolutely sure that one of the anomaly detection algorithms provides 100% accuracy, he or she would not combine it with any other of the anomaly detection algorithms. However, in practice, any anomaly detection algorithm is prone to errors, which forces the user to decide which of the anomaly detection algorithms is to be selected and under what circumstances. That is why the aggregated accuracy provided by the two or more anomaly detection algorithms is preferable and useful in the process of anomaly detection.
[0055] In one embodiment, the at least two anomaly detection algorithms comprise any combination of the following algorithms: a nearest neighbor-based anomaly detection algorithm, a clustering-based anomaly detection algorithm, a statistical anomaly detection algorithm, a subspace-based anomaly detection algorithm, and a classifier-based anomaly detection algorithm. Some examples of such anomaly detection algorithms are described by Goldstein M. and Uchida S. in their work “A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data”, PLoS ONE 11(4): e0152173 (2016). Moreover, the at least two anomaly detection algorithms may be unsupervised or supervised learning based anomaly detection algorithms, thereby making the apparatus 300 more automatic and flexible in use. As should be apparent to those skilled in the art, unsupervised or supervised learning may involve using neural networks, decision trees, and/or other artificial intelligence techniques, depending on particular applications.
[0056] Once the at least two anomaly detection algorithms are selected, the processor 304 uses them to calculate the anomaly score for each of the data items. The anomaly scores are then used by the processor 304 to obtain a partial ranking of the data items. The partial ranking causes the data items to be divided into subsets each corresponding to a different interval of intermediate ranks, as schematically shown in the accompanying drawings.
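The conversion of anomaly scores into such a partial ranking can be sketched in Python as follows; the item names and scores are invented for illustration and are not taken from the disclosure. Items with equal scores share one subset, and each subset occupies a contiguous interval of intermediate ranks:

```python
from itertools import groupby

def partial_ranking(scores):
    """Map {item: score} to a list of (rank_interval, items) subsets,
    higher scores first (smaller ranks correspond to more anomalous items)."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    subsets, next_rank = [], 1
    for _, group in groupby(ordered, key=lambda kv: kv[1]):
        items = [k for k, _ in group]
        # This subset spans a contiguous interval of intermediate ranks.
        interval = (next_rank, next_rank + len(items) - 1)
        subsets.append((interval, items))
        next_rank += len(items)
    return subsets

scores = {"a": 0.9, "b": 0.9, "c": 0.4, "d": 0.1}
print(partial_ranking(scores))
# items "a" and "b" tie on score 0.9, so they share the rank interval (1, 2)
```

This mirrors the notion in claim 7 that a first subset gathers data items having the same anomaly scores, with the resulting rank intervals being non-overlapping.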
[0057] By using the partial ranking, the processor 304 further selects a probabilistic model describing the intermediate ranks of the data items in each of the subsets. In general, the probabilistic model defines a probability distribution of the intermediate ranks among the data items in each subset.
[0058] However, if not all the data items fall into the non-overlapping subsets, whether because some are mistakenly left unranked or because some have anomaly scores differing from those of the data items put in the non-overlapping subsets, the uniform probability distributions over the non-overlapping subsets will be violated. This situation is schematically shown in the accompanying drawings.
[0059] To calculate the probability distribution of the intermediate ranks in the subset of interest in the presence of the unranked data items, the processor 304 may be configured to perform the following procedure. At first, let us assume that, as a result of the partial ranking, there are an arbitrary number of ranked subsets (i.e., buckets), like the subsets 600a and 600b shown in the accompanying drawings.
where B_j denotes the j-th subset, y and z are the left and right boundary data items, respectively, in the middle group, and x is the unranked data item; [0065] 6) k_top: the number of the unranked data items (i.e., the black circles) in the top group,
[0067] Further, the processor 304 uses pseudocode for computing the probability distribution P_j of the intermediate ranks of the data items in B_j, which is given below as Algorithm 1. It is assumed that P_j is the |X|-component vector such that P_j(r) = Pr(rank(x) = r) for any x ∈ B_j and r ∈ {1, . . . , |X|}. By definition, Σ_{r=1}^{|X|} P_j(r) = 1.
Algorithm 1: Compute the probability distribution of the intermediate ranks for the data items in B_j.
  Inputs: |X|, N, n_middle, n_bottom, n_top, K, k_middle, k_bottom, k_top
  Output: P_j
  P_j(1:|X|) ← 0
  for all possible pairs (r_top, r_bottom) do
      p_middle ← Hyp(k_middle, n_middle, K, N)
      p_bottom ← Hyp(k_bottom, n_bottom, K − k_middle, N − n_middle)
      p_top ← Hyp(k_top, n_top, K − k_bottom − k_middle, N − n_bottom − n_middle)
      p_decomp ← p_top · p_middle · p_bottom
      p_uniform ← 1/n_middle
      P_j(r_top:r_bottom) ← P_j(r_top:r_bottom) + p_uniform · p_decomp
  end for
[0068] In Algorithm 1, p_decomp is the probability of the decomposition of the unranked data items, which is defined by the parameters k_middle, k_bottom, and k_top, the sign “←” is the value assignment operator, and the function Hyp( ) is the hypergeometric distribution. In particular, the function Hyp(k, n, K, N) describes the probability of obtaining a total of k black circles in a sample of length n drawn without replacement, starting out with N circles among which K circles are black. In other words,

Hyp(k, n, K, N) = (C_K^k · C_(N−K)^(n−k)) / C_N^n,

where C_K^k is the binomial coefficient.
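Under the stated definition, the Hyp( ) helper used by Algorithm 1 can be sketched in standard-library Python; the argument order mirrors the Hyp(k, n, K, N) calls above, but this is an illustrative sketch, not the patent's code:

```python
from math import comb

def hyp(k, n, K, N):
    """Hypergeometric probability of drawing exactly k black circles in a
    sample of n taken without replacement from N circles, K of them black."""
    # Outside the feasible range the probability is zero.
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example: 10 circles, 4 black, sample of 3.
# hyp(1, 3, 4, 10) = C(4,1) * C(6,2) / C(10,3) = 60 / 120 = 0.5
assert abs(hyp(1, 3, 4, 10) - 0.5) < 1e-12

# Sanity check: the probabilities over all feasible k sum to 1.
assert abs(sum(hyp(k, 3, 4, 10) for k in range(0, 4)) - 1.0) < 1e-12
```

For large |X| one would typically use `scipy.stats.hypergeom` instead, but the closed form above is enough to follow Algorithm 1.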
[0069] Thus, by using Algorithm 1, the processor 304 calculates the probability distribution P_j of the intermediate ranks of the data items in B_j when using each of the at least two anomaly detection algorithms. In other words, if the processor 304 uses L anomaly detection algorithms, it will be required for the processor 304 to calculate the probability distributions P_j^(1), . . . , P_j^(L), respectively, for the intermediate ranks of the data items in B_j.
[0070] When the probabilistic model, or, in other words, the probability distribution P_j, is calculated, the processor 304 further assigns, based on P_j, a degree of belief to the intermediate rank of each of the data items in B_j. In what follows, the degree of belief is exemplified by the basic belief assignment (bba). However, the degree of belief is not limited to the bba, and may be presented as any other belief function specific to the Dempster-Shafer theory.
[0071] In one embodiment, the processor 304 is configured to provide each of the at least two anomaly detection algorithms with a different weight coefficient and assign the bba based on the probabilistic model in concert with the weight coefficient of the anomaly detection algorithm. This allows adjusting the contribution of each anomaly detection algorithm to the aggregated accuracy of anomaly detection.
[0072] In one embodiment, in the case of the unsupervised learning based anomaly detection algorithms, the processor 304 is configured to specify the different weight coefficients of the at least two anomaly detection algorithms based on user preferences such that the sum of the weight coefficients is equal to 1, i.e. Σ_{i=1}^{L} w_i = 1, where L is the number of the anomaly detection algorithms used. This allows the user of the apparatus 300 to prioritize the anomaly detection algorithms according to his or her experience.
[0073] In another embodiment, in the case of the supervised learning based anomaly detection algorithms, the processor 304 is configured to adjust the weight coefficients of the at least two anomaly detection algorithms by using a pre-arranged training set comprising different previous datasets and target rankings each corresponding to one of the previous datasets. The training set may be stored in the storage 302 in advance, i.e. before the operation of the apparatus 300. In this case, the processor 304 first searches for the previous dataset similar to that of interest, and then changes the weight coefficient of each anomaly detection algorithm until the partial ranking coincides with the target ranking of this previous dataset. The weight coefficients of the at least two anomaly detection algorithms may be further adjusted by the processor 304 based on the Kendall tau distance serving as a measure of distance between the combined partial rankings obtained by the at least two anomaly detection algorithms and a respective one of the target rankings from the training set. In this case, the Kendall tau distance, which exploits a probability distribution similar to P_j calculated earlier, counts, for a pair of partial rankings σ and τ, the expected number of pairs of data items that σ and τ place in opposite orders, and its normalized analogue is obtained by dividing this count by the total number of pairs of data items.
[0074] Governed by M training sets, the weight coefficient adaptation procedure strives to find non-negative weight coefficients w_1, . . . , w_L which minimize the following loss function:

Loss(w_1, . . . , w_L) = Σ_{i=1}^{M} d_K(σ_gr.truth^(i), w_1τ_1^(i) + . . . + w_Lτ_L^(i)),

where d_K denotes the Kendall tau distance discussed above, and which satisfy the condition Σ_{l=1}^{L} w_l = 1. Here σ_gr.truth^(i) is the partial ranking that is known to be true for the data items in the i-th training set, τ_l^(i) is the partial ranking computed by the l-th anomaly detection algorithm for the data items in the i-th training set, and w_1τ_1^(i) + . . . + w_Lτ_L^(i) is the partial ranking obtained by the processor 304, i.e. by combining the partial rankings τ_1^(i), . . . , τ_L^(i) with the weight coefficients w_1, . . . , w_L.
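For intuition, the plain (unnormalized) Kendall tau distance between two total rankings can be sketched in Python as a count of discordant pairs; the disclosure's version generalizes this to partial rankings via pairwise probabilities, so the sketch below (with invented item names) shows only the underlying idea:

```python
from itertools import combinations

def kendall_tau(sigma, tau):
    """Count the pairs of items that the rank maps sigma and tau
    place in opposite orders (discordant pairs)."""
    items = list(sigma)
    return sum(
        1
        for x, y in combinations(items, 2)
        if (sigma[x] - sigma[y]) * (tau[x] - tau[y]) < 0
    )

sigma = {"a": 1, "b": 2, "c": 3}
tau = {"a": 1, "b": 3, "c": 2}
print(kendall_tau(sigma, tau))  # only the pair (b, c) is discordant -> 1
```

The normalized analogue divides this count by the total number of pairs, i.e. by |X|·(|X| − 1)/2, so that identical rankings score 0 and fully reversed rankings score 1.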
[0075] Turning now back to the assignment of the bbas, it should be noted that the processor 304 may use Algorithm 2 for this purpose, which is given below and takes into account the weight coefficients of the anomaly detection algorithms.
Algorithm 2: Compute the bba for the data item x ranked by the l-th anomaly detection algorithm.
  Input: P^(l)
  Output: m_l
  for r = 1:|X| do
      m_l(rank(x) = r) ← w_l · P^(l)(r)
  end for
  m_l(rank(x) = 1 ∪ . . . ∪ rank(x) = |X|) ← 1 − w_l
[0076] In other words, by using Algorithm 2, the processor 304 considers the frame of discernment Θ = {rank(x) = 1, . . . , rank(x) = |X|} for each data item, and computes (|X|+1)-component bbas, with the components corresponding to the outcomes rank(x) = 1, . . . , rank(x) = |X|, rank(x) = Θ. The last outcome, i.e. rank(x) = Θ, means that x may have any intermediate rank. By construction, the components of each bba m_l sum to 1.
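Under these assumptions, Algorithm 2 amounts to discounting the per-algorithm rank distribution by its weight; the following Python sketch (illustrative, not the patent's code) stores the bba as a plain list whose last component is the mass on Θ:

```python
def compute_bba(P_l, w_l):
    """Return an (|X|+1)-component bba: indices 0..|X|-1 hold the masses
    of the outcomes rank(x) = r, and the last component holds the mass
    on Theta (the item may have any rank)."""
    bba = [w_l * p for p in P_l]   # discount the rank distribution by w_l
    bba.append(1.0 - w_l)          # leftover mass expresses ignorance
    return bba

P = [0.5, 0.3, 0.2]                # P^(l)(r) for ranks 1..3, sums to 1
m = compute_bba(P, w_l=0.8)
assert abs(sum(m) - 1.0) < 1e-12   # the bba components sum to 1
```

A smaller weight w_l shifts mass toward Θ, so a low-confidence algorithm contributes less sharply to the combination in Algorithm 3.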
[0077] When the bbas for all the anomaly detection algorithms are obtained, the processor 304 then obtains a total degree of belief, i.e. a total bba, for the intermediate rank of each of the data items. To do this, the processor 304 combines the bbas obtained for the intermediate rank in accordance with a predefined combination rule. Algorithm 3 given below describes this operation, taking the Dempster's rule of combination as one example of the predefined combination rule.
Algorithm 3: Apply the Dempster's rule of combination to the data item x.
  Input: m_1, m_2
  Output: m_1,2
  for each outcome A do
      m_1,2(A) ← Σ_{B∧C=A} m_1(B) · m_2(C) / (1 − Σ_{B∧C=Ø} m_1(B) · m_2(C))
  end for
[0078] In Algorithm 3, A, B, C are the indices that can take on any value from 1 to |X|+1, and m_1,2, m_1, and m_2 are the vectors of length |X|+1, with m_1 and m_2 corresponding to the first and the second anomaly detection algorithms, respectively, the results of which are subjected to combination, and m_1,2 being the result of this combination. Since the Dempster's rule of combination is both commutative and associative, it can combine all L bbas (according to the number of the anomaly detection algorithms) into a single total bba m.
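As a hedged illustration (not the patent's implementation), the following Python sketch applies Dempster's rule to two bbas of the (|X|+1)-component form produced by Algorithm 2, where the only focal elements are the singleton ranks and the whole frame Θ:

```python
def dempster_combine(m1, m2):
    """Combine two (n+1)-component bbas whose focal elements are the
    singleton ranks 1..n plus the whole frame Theta (last component)."""
    n = len(m1) - 1                  # number of ranks; slot n is Theta
    out = [0.0] * (n + 1)
    for r in range(n):
        # Rank r survives when both sources agree on r, or when one
        # source says r and the other is ignorant (mass on Theta).
        out[r] = m1[r] * m2[r] + m1[r] * m2[n] + m1[n] * m2[r]
    out[n] = m1[n] * m2[n]           # ignorance persists only if both are ignorant
    conflict = 1.0 - sum(out)        # mass falling on empty intersections
    return [v / (1.0 - conflict) for v in out]

m1 = [0.4, 0.24, 0.16, 0.2]
m2 = [0.45, 0.45, 0.0, 0.1]
m12 = dempster_combine(m1, m2)
assert abs(sum(m12) - 1.0) < 1e-9    # combined bba is again normalized
```

Because the rule is commutative and associative, `functools.reduce(dempster_combine, bbas)` folds any number L of such bbas into the single total bba m mentioned above.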
[0079] After that, the processor 304 converts the total bbas for the intermediate ranks of the data items to a probability distribution function describing expected ranks of the data items. In one embodiment, this may be done by using a pignistic transformation, in which case the probability distribution function is a pignistic probability function betP. The pignistic transformation performed by the processor 304 is generalized below as Algorithm 4.
Algorithm 4: Compute the pignistic probability betP for the data item x.
  Input: m
  Output: betP
  for r in 1:|X| do
      betP(r) ← m(rank(x) = r) + m(rank(x) = 1 ∪ . . . ∪ rank(x) = |X|)/|X|
  end for
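Under the same list representation used in the sketches above (last component = mass on Θ), Algorithm 4 simply spreads the mass on Θ uniformly over the |X| singleton ranks; the numbers below are invented for the example:

```python
def pignistic(m):
    """Convert an (n+1)-component bba into a probability distribution
    over the n ranks by spreading the mass on Theta uniformly."""
    n = len(m) - 1                   # ranks 1..n; m[n] is the mass on Theta
    return [m[r] + m[n] / n for r in range(n)]

m = [0.5, 0.2, 0.1, 0.2]             # total bba with m(Theta) = 0.2, |X| = 3
betP = pignistic(m)
assert abs(sum(betP) - 1.0) < 1e-12  # betP is a proper probability distribution
```

After this step each data item has an ordinary probability distribution over ranks, which is what makes the expected-rank computation in the next paragraph well defined.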
[0080] Next, the processor 304 computes the expected rank of each data item x ∈ X by using the pignistic probability betP and sorts all the data items in the dataset X by their expected ranks according to the following formula:

E[rank(x)] = Σ_{r=1}^{|X|} r · betP(r).
[0081] Finally, the processor 304 finds the at least one anomalous data item among the sorted data items. Thus, by using the above-described procedure comprising Algorithms 1-4, the processor 304 is able to detect the anomaly of interest in the dataset, and even filter out the spurious anomalies if they are present in the dataset.
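The last two steps can be sketched together in Python; the item names and pignistic probabilities below are invented for illustration. Smaller expected ranks correspond to more anomalous items, so the most suspicious item comes first after sorting:

```python
def expected_rank(betP):
    """E[rank(x)] = sum over r of r * betP(r), ranks counted from 1."""
    return sum(r * p for r, p in enumerate(betP, start=1))

# Hypothetical pignistic distributions over ranks 1..3 for three items.
betP_by_item = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.1, 0.3, 0.6],
    "c": [0.2, 0.5, 0.3],
}

ranked = sorted(betP_by_item, key=lambda x: expected_rank(betP_by_item[x]))
print(ranked)  # ['a', 'c', 'b'] -- item "a" is the most anomalous
```

The head of the sorted list is then inspected for the at least one anomalous data item, while items whose expected ranks drift toward |X| are treated as normal (or as spurious anomalies that the combination has filtered out).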
[0082] In one embodiment, the processor 304 may further convert the expected ranks to the partial ranking in the same way as the original anomaly scores are converted to the partial rankings but with the reverse order of the subsets because, by convention, the smaller ranks should correspond to the higher anomaly scores.
[0083] With reference to the accompanying drawings, a method 800 for detecting an anomaly in a dataset in accordance with the present disclosure is now described.
[0084] The method 800 starts in step 802, in which the dataset comprising at least one anomalous data item is received. As noted earlier, the dataset may relate to different usage domains. Once the dataset is received, the method proceeds to step 804, in which the at least two anomaly detection algorithms are selected based on the usage domain which the dataset belongs to. Further, steps 806-812 are carried out by using each of the at least two anomaly detection algorithms independently.
[0085] In particular, an anomaly score for each of the data items is calculated in step 806. In step 808, a partial ranking of the data items is obtained based on the anomaly scores. The partial ranking represents the division of the data items into subsets each corresponding to a different interval of intermediate ranks and, consequently, a different anomaly class. Examples of such subsets have been discussed above.
[0086] Once the degrees of belief for each intermediate rank are obtained by using each of the at least two anomaly detection algorithms, the method 800 proceeds to step 814, in which the degrees of belief are combined in accordance with the combination rule to obtain a total degree of belief. This may be done by using Algorithm 3 discussed above, in which the combination rule is exemplified by the Dempster's rule of combination. Further, in step 816, the total degrees of belief for the intermediate ranks of the data items are converted to a probability distribution function describing expected ranks of the data items. Such conversion may be implemented by using the pignistic transformation described above with reference to Algorithm 4. After that, the data items are sorted, in step 818, according to the expected ranks of the data items. Finally, in step 820, the at least one anomalous data item is found among the sorted data items.
[0088] It should be noted that some approaches suggest an alternative solution to the same problem as is addressed by the method 800 using the Dempster's rule of combination. In particular, the alternative solution involves applying median rank aggregation to partial rankings. However, the median rank aggregation method provides a lower accuracy of anomaly detection compared to that of the method 800. This has been demonstrated by numerical experiments, the results of which are shown in the accompanying drawings.
[0089] Those skilled in the art should understand that each step of the method 800, or any combinations of the steps, can be implemented by various means, such as hardware, firmware, and/or software. As an example, one or more of the steps described above can be embodied by computer or processor executable instructions, data structures, program modules, and other suitable data representations. Furthermore, the computer executable instructions which embody the steps described above can be stored on a corresponding data carrier and executed by at least one processor like the processor 304 included in the apparatus 300. This data carrier can be implemented as any computer-readable storage medium configured to be readable by said at least one processor to execute the computer executable instructions. Such computer-readable storage media can include both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, the computer-readable media comprise media implemented in any method or technology suitable for storing information. In more detail, the practical examples of the computer-readable media include, but are not limited to, information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, DVD, holographic media or other optical disc storage, magnetic tape, magnetic cassettes, magnetic disk storage, and other magnetic storage devices.
[0090] Although the exemplary embodiments are disclosed herein, it should be noted that various changes and modifications could be made in these embodiments without departing from the scope of legal protection, which is defined by the appended claims. In the appended claims, the mention of elements in a singular form does not exclude the presence of a plurality of such elements, unless explicitly stated otherwise.