METHOD AND SYSTEM FOR MONITORING SENSOR DATA OF ROTATING EQUIPMENT

20170365155 · 2017-12-21

    Inventors

    Cpc classification

    International classification

    Abstract

    A sensor data stream consisting of feature vectors acquired by sensors of rotating equipment is provided, and similar feature vectors are aggregated in microclusters. For each newly arriving feature vector, a correlation distance measure between the new feature vector and each microcluster is calculated. If there is no microcluster in range, a new microcluster is created. Otherwise, the feature vector is assigned to the best-fitting microcluster, and the necessary statistical information is incorporated into the aggregation contained in that microcluster. In other words, similar feature vectors are aggregated in the same microcluster. The microclusters thus provide a generic summary structure that captures the necessary statistical information of the incorporated feature vectors, while the loss of accuracy remains small. Clustering the sensor data stream with microclusters has the benefit that the computational complexity can be reduced significantly.

    Claims

    1-11. (canceled)

    12. A method for monitoring sensor data of rotating equipment, with the following steps: processing a sensor data stream consisting of an ordered sequence of feature vectors, each feature vector representing measurements of sensors of the rotating equipment at a certain point in time, representing the sensor data stream with a set of microclusters, each microcluster defining a subspace, for each new feature vector of the sensor data stream, updating the set of microclusters by calculating a correlation distance measure between the new feature vector and each microcluster, and assigning the new feature vector to a microcluster with a smallest value for the correlation distance measure if the value is below a range parameter and updating the microcluster based on the new feature vector, or creating a new microcluster based on the new feature vector if all values for the correlation distance measure are above the range parameter.

    13. The method according to claim 12, with the additional step of detecting the possibility of a failure in real time if a change of orientation of the subspace of at least one of the microclusters exceeds a threshold, or if at least one new microcluster has been created.

    14. The method according to claim 12, with the additional step of detecting the possibility of a failure in real time by continuously comparing newly created microclusters with a database containing microclusters representing known failure states.

    15. The method according to claim 12, with each microcluster comprising a mean vector of feature vectors contained in the microcluster, a timestamp of the last incoming feature vector assigned to the microcluster, a buffer containing incoming feature vectors, if the microcluster has not been initialized, and an eigenvector matrix containing eigenvectors, and eigenvalues for the eigenvectors, if the microcluster has already been initialized.

    16. The method according to claim 15, wherein for microclusters that have not been initialized, instead of calculating the correlation distance measure, a Euclidean distance between the new feature vector and the mean vector of the microcluster is calculated.

    17. The method according to claim 15, wherein updating the microcluster based on the new feature vector comprises the alternatives of if the microcluster has not been initialized, inserting the new feature vector into its buffer, recalculating its mean vector and updating its timestamp, and if the buffer is now filled, initializing the microcluster by performing an initial Principal Component Analysis to calculate its eigenvectors and eigenvalues, or if the microcluster has already been initialized, performing an incremental Principal Component Analysis to recalculate its eigenvectors and eigenvalues, recalculating its mean vector and updating its timestamp.

    18. The method according to claim 17, wherein the incremental Principal Component Analysis uses an exponential fading function assigning each feature vector a weight which decreases exponentially with time.

    19. The method according to claim 15, with the additional step of periodically scanning the set of microclusters and deleting microclusters whose timestamp is older than a threshold value.

    20. The method according to claim 12, with an additional offline phase which includes creating a macrocluster model containing macroclusters based on the microclusters by calculating a comparison measure between each pair of microclusters and grouping microclusters in a macrocluster if their value of the comparison measure is below a threshold.

    21. The method according to claim 20, wherein the macrocluster model is created by partitioning the set of microclusters, computing the macroclusters within each partition, and building a hierarchy of the macroclusters.

    22. The method according to claim 21, with the additional step of merging related macroclusters from different partitions.

    23. The method according to claim 20, wherein the comparison measure is calculated by comparing an orientation of the subspaces of the microclusters and grouping microclusters if the difference of the orientation of their subspaces is below a threshold.

    24. The method according to claim 20, wherein the comparison measure computes a composition of an approximate linear dependency and an affine distance between two microclusters.

    25. The method according to claim 20, with the additional step of comparing the macrocluster model with historical models by calculating a similarity measure, with each historical model representing either a standard operation or a failure state.

    26. The method according to claim 25, with the additional step of ranking the historical models according to the value of the similarity measure, and outputting a ranked list of the historical models with the highest values of the similarity measure.

    27. The method according to claim 25, with the additional step of choosing the historical model with the highest value of the similarity measure, and detecting a failure if the chosen historical model represents a failure state.

    28. The method according to claim 12, wherein an ageing mechanism is used to forget an influence of old parts of the sensor data stream on the microclusters.

    29. The method according to claim 12, with the additional step of adjusting operation of the rotating equipment and/or maintenance of the rotating equipment.

    30. A system for monitoring sensor data of rotating equipment, the system comprising one or more processors which are programmed to perform the method according to claim 12.

    31. A computer-readable storage medium having stored thereon instructions executable by one or more processors of a computer system, wherein execution of the instructions causes the computer system to perform the method according to claim 12.

    32. A computer program which, when executed by one or more processors of a computer system, performs the method according to claim 12.

    Description

    BRIEF DESCRIPTION

    [0043] Some of the embodiments will be described in detail, with reference to the following figures, wherein like designations denote like members, wherein:

    [0044] FIG. 1 a flowchart of an embodiment of the method;

    [0045] FIG. 2 microclusters as retrieved during an online phase; and

    [0046] FIG. 3 a macrocluster model as retrieved after an offline phase.

    DETAILED DESCRIPTION

    [0047] The Principal Component Analysis (PCA) is a popular tool to compute the principal components of a data matrix by using an orthogonal transformation. As a form of true eigenvector-based multivariate analysis, it reveals an uncorrelated orthogonal basis vector set such that the underlying data has the maximum possible variance in the directions of the principal components. Each vector of the basis represents an eigenvector of the underlying covariance matrix. The first principal component is the eigenvector that has the largest possible variance, i.e. the largest eigenvalue, and further principal components maximize the variance with respect to the constraint that they build an orthogonal system.

    [0048] In terms of correlation clustering, previously presented methods often use the PCA to detect correlation clusters. To formalize the idea of correlation clusters, we borrow the notions used in [1] and [2].

    [0049] Let D be a dataset of n d-dimensional feature vectors, i.e. D⊆ℝ.sup.d. Let C be a correlation cluster, i.e. C⊆D, and let X̄ denote the centroid of all points in C. The d×d covariance matrix Σ.sub.C of C is defined as:

    [00001] $$\Sigma_C = \frac{1}{|C|} \sum_{x \in C} (x - \bar{X})(x - \bar{X})^T$$

    [0050] Due to Σ.sub.C being a positive semi-definite square matrix, it can be diagonalized, e.g. by an eigenvalue decomposition, such that Σ.sub.C=V.sub.CE.sub.CV.sub.C.sup.T holds. E.sub.C denotes the eigenvalue matrix, which is a diagonal matrix that stores the d non-negative eigenvalues of Σ.sub.C in decreasing order. V.sub.C is the corresponding orthonormal eigenvector matrix with each column denoting an eigenvector of Σ.sub.C. Finally, the eigenvector associated with the first (largest) eigenvalue is the first principal component. This means that the underlying data has its maximum variance, i.e. it carries most information, in the direction of this vector compared to the other eigenvectors. However, some of the eigenvalues might be close to zero, i.e. lower than a small threshold value δ>0, which means that there are dimensions in the eigenspace that barely carry information. By discarding such weak eigenvalues and their associated eigenvectors, the remaining strong vectors of the eigensystem form a lower-dimensional subspace. In the following, we call such subspaces λ.sub.C-dimensional subspaces, where λ.sub.C≤d denotes the number of strong eigenvectors of the feature space defined by C.

    [0051] Since the subset C⊆D might not be “complete” by the time the principal components are computed in a stream clustering environment, it is required to be able to update a λ.sub.C-dimensional subspace whenever incoming data objects fit into the correlation cluster C.

    [0052] Recomputing the PCA from scratch each time a potential correlation cluster gains a new cluster member would be prohibitively expensive in terms of computational costs. A proven means to overcome this issue in general are incremental approaches.

    [0053] Incremental PCA (IPCA) has been studied extensively, mostly in the area of computer vision and in the machine learning community. In general, there are two categories of IPCA algorithms. The first category comprises covariance-free techniques that operate without the computation or reconstruction of the covariance matrix. The algorithms from the second category approximately reconstruct the covariance matrix from the previously computed eigenvectors and eigenvalues. By adding a new observation, the dimension of the subspace is increased by one. But since the least significant principal components are discarded, the dimension of the subspace is kept small, which makes these approaches computationally efficient. Although these methods suffer from unpredictable approximation errors, we use the basic algorithm presented in Algorithm 1 from [12] because of its good real-time performance. Experiments show that this technique can sufficiently cope with high-velocity data streams.

    TABLE-US-00001 Algorithm 1 Incremental PCA
    Input: Data stream S, weight parameter τ
    Output: Current eigensystem eig_i, composed of eigenvector matrix V_i and eigenvalue matrix E_i
     1: V_0, E_0 := initial PCA from the first init observations
     2: μ_0 := mean of the first init observations
     3: while S does not end do
     4:   x_i′ := next incoming observation from S
     5:   x_i := x_i′ − μ_{i−1}
     6:   μ_i := μ_{i−1} + (1 − τ)·x_i
     7:   for j ∈ range(0, col(V_{i−1})) do
     8:     y_j := √(τ·E_{i−1}(j, j))·V_{i−1}(:, j)
     9:   end for
    10:   y_{col(V_{i−1})} := √(1 − τ)·x_i
    11:   A := [y_0, y_1, …, y_{col(V_{i−1})}]
    12:   B := A^T·A
    13:   U, E_i := eigen-decompose(B)
    14:   for j ∈ range(0, col(U)) do
    15:     v_j := A·U(:, j)
    16:   end for
    17:   V_i := [v_1, v_2, …, v_{col(U)}]
    18: end while

    [0054] Algorithm 1 proceeds as follows: First, after collecting an initialization set I of init observations, an initial PCA yielding the eigenvector matrix V.sub.0 and the eigenvalue matrix E.sub.0 is performed on I. Note that init can be chosen arbitrarily in ℕ.sub.0. If init=0, V.sub.0 must be set to the d×1 null vector with d being the dimension of the regarded feature space. Accordingly, the eigenvalue in the 1×1 matrix E.sub.0 can be set arbitrarily. The initial mean value is either set to the null vector if init=0, or it is the mean of all observations in I. After the initialization, the initial eigenspace is updated incrementally with each incoming observation from the data stream S. Whenever a new observation arrives, it gets mean-normalized, i.e. x.sub.i=x.sub.i′−μ.sub.i−1, and the current mean μ.sub.i is determined. The parameter τ∈[0, 1] is used as a weight that denotes the importance of a new observation compared to the previously seen ones. The larger τ, the less important is a new observation x.sub.i. Next, a d×(col(V.sub.i−1)+1) matrix A is defined, with col(V.sub.i−1) denoting the number of columns of the eigenvector matrix V.sub.i−1. The first col(V.sub.i−1) columns of A are constructed from the weighted previous principal components, and the weighted current observation forms the last column. Using matrix A, we can reconstruct the new d×d covariance matrix C expressed by C=AA.sup.T. Since a high dimension d leads to high computational costs, a smaller (col(V.sub.i−1)+1)×(col(V.sub.i−1)+1) matrix B=A.sup.TA is constructed and then decomposed on the rank of col(V.sub.i−1)+1.

    [0055] The eigen-decomposition retrieves the eigenvalue matrix E.sub.i and the eigenvector matrix U. Multiplying each eigenvector of U with matrix A finally retrieves the eigenvectors of the covariance matrix C, with the eigenvalues contained in E.sub.i. For the mathematical derivations of the single steps, we refer to [12]. Note that the updating process of the eigenspace is based on reconstructing the covariance matrix C by using the eigenvectors gained from the previous update. Since commonly, in each step, only the most significant eigenvectors are selected to cap the dimension of the eigenspace, this procedure approximates the covariance matrix with an unpredictable error. However, our experiments show that the loss of accuracy hardly affects the clustering result.
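    For illustration, one pass through the update loop of Algorithm 1 can be sketched in Python/NumPy as follows. This is a minimal sketch, not the reference implementation: the function name ipca_update is ours, the eigen-decomposition is delegated to numpy.linalg.eigh, and the capping of the eigenspace dimension is reduced to dropping near-zero components.

```python
import numpy as np

def ipca_update(V, E, mu, x_new, tau):
    """One incremental PCA step (lines 4-17 of Algorithm 1, sketched).

    V   : (d, k) eigenvector matrix from the previous step
    E   : (k,)   eigenvalues from the previous step
    mu  : (d,)   previous mean vector
    tau : weight of the old eigensystem relative to the new observation
    """
    x = x_new - mu                      # mean-normalize the new observation
    mu = mu + (1.0 - tau) * x           # update the running mean
    # Columns: weighted previous principal axes plus the weighted new point.
    A = np.column_stack([np.sqrt(tau * E) * V, np.sqrt(1.0 - tau) * x])
    B = A.T @ A                         # small (k+1) x (k+1) surrogate matrix
    evals, U = np.linalg.eigh(B)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]     # reorder descending, as in the paper
    evals, U = evals[order], U[:, order]
    V_new = A @ U                       # lift eigenvectors back to d dimensions
    # Normalize columns; drop near-zero components to cap the dimension.
    norms = np.linalg.norm(V_new, axis=0)
    keep = norms > 1e-10
    return V_new[:, keep] / norms[keep], evals[keep], mu
```

One call corresponds to one iteration of the while loop; τ close to 1 keeps the old eigensystem dominant, τ close to 0 favors the new observation.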

    [0056] As this work concentrates on streaming data, we use a common definition of data streams.

    [0057] Definition 1. A data stream S is an ordered and possibly infinite sequence of data objects x.sub.1, x.sub.2, . . . , x.sub.i, . . . that must be accessed in the order they arrive and can be read only in one linear scan (or a few if necessary).

    [0058] Another concept used for our purpose is the damped window model. Since recent data is typically more expressive than old data objects, especially if an up-to-date clustering model is desired, it is useful to “forget” stale data. Therefore, a widely used approach in applications dealing with temporal data is the utilization of the exponential fading function for data ageing. This technique assigns each data object a weight which decreases exponentially with time t by using the fading function f(t)=2.sup.−λ·t, where λ>0 is the decay rate that determines the impact of stale data on the application. A high value of λ means low importance of old data and vice versa.
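    As a minimal illustration of the fading function (the helper name fading_weight is ours): a data object's weight halves every 1/λ time units.

```python
def fading_weight(age, decay_rate):
    """Exponential fading f(t) = 2^(-lambda * t): weight of a data
    object that arrived `age` time units ago, with decay rate lambda."""
    return 2.0 ** (-decay_rate * age)
```

With decay_rate λ=0.5, for example, an object's influence halves every two time units.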

    [0059] Previously presented correlation clustering methods that are able to find strong, linear and arbitrarily oriented subspace clusters in static datasets all describe the useful characteristics of a data object based on its neighboring objects. This requires keeping all data objects in storage, which is generally infeasible in streaming applications. Since data may arrive with high velocity, stream algorithms are limited to the main memory for being able to process data as fast as possible. To overcome this problem, a common approach in stream algorithms is to aggregate data in an appropriate summary structure. Microclusters summarize the information of an entire set of data objects, are kept in main memory as compact representations, and are used for further processing steps.

    [0060] Besides finding an appropriate microcluster structure that encapsulates the necessary information, such structures also have to fulfill two basic properties, i.e. additivity and, when considering data ageing, temporal multiplicity. These properties are crucial in order to maintain the microclusters in the online step. For our purpose, we define a microcluster as follows:

    [0061] Definition 2. A microcluster at time t for a set of d-dimensional points C={p.sub.1, p.sub.2, . . . , p.sub.n} arriving at different points in time is defined as mc(C, t)=(V(t), E(t), μ(t), ts)

    with
    [0062] V(t) being the eigenvector matrix of the covariance matrix of C at time t,
    [0063] E(t) being the corresponding eigenvalues of the eigenvectors in V(t),
    [0064] μ(t) being the mean of the data points contained in C at time t, and
    [0065] ts being the timestamp of the last incoming object assigned to this microcluster.
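    For illustration, the tuple of Definition 2 can be held in a small container object. This is a sketch with illustrative field names, not a prescribed data layout; the buffer and initialized fields anticipate the initialization mechanism of the online phase.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class MicroCluster:
    """Sketch of mc(C, t) = (V(t), E(t), mu(t), ts) from Definition 2."""
    V: np.ndarray        # eigenvector matrix of the covariance matrix of C
    E: np.ndarray        # eigenvalues corresponding to the columns of V
    mu: np.ndarray       # mean of the data points contained in C
    ts: float            # timestamp of the last incoming object
    buffer: list = field(default_factory=list)  # pre-initialization buffer
    initialized: bool = False
```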

    [0066] This structure is valid in the sense that both required properties concerning the online maintenance can be fulfilled:

    [0067] Additivity: Due to the limited amount of main memory, it is necessary to reduce the number of microclusters occasionally. Therefore, microclusters whose eigenvectors span equal (or similar) subspaces must be merged in a way that two microclusters mc(C, t.sub.c)=(V(t), E(t), μ(t), t.sub.c) and mc(C′, t.sub.c′)=(V(t)′, E(t)′, μ(t)′, t.sub.c′) form the microcluster mc(C∪C′, t)=(V(t)⊕V(t)′, E(t)⊕E(t)′, μ(t)⊕μ(t)′, t) with ⊕ being an appropriate merge operation and t=max(t.sub.c, t.sub.c′). Let us note that this operation in particular also allows adding single data objects to an existing microcluster.

    [0068] Temporal Multiplicity: If a new data point p arrives at time t and is assigned to the best fitting microcluster mc(C, t−Δt) with Δt≥0, i.e. a microcluster that did not absorb a data point for some time Δt, the components of mc(C, t−Δt) are downgraded by the weight parameter τ used in the IPCA algorithm. The parameter τ determines the ratio of importance between previously seen data and the current data point. We propose to define τ as the commonly used exponential decay function τ=2.sup.−β·Δt with 0≤β≤1 being the learning rate of the process. In our application, this means the larger β is chosen, the more influence a newly arriving data object has on the eigenspace of the microcluster it is assigned to.

    [0071] In the eigenvector matrix V of our microcluster structure, we generally distinguish between two types of eigenvectors, i.e. the strong eigenvectors and the weak eigenvectors. The strength of an eigenvector is given by the variance along this axis in the eigenspace. We define strong eigenvectors as follows [1]:

    [0072] Definition 3. Let α∈[0, 1], mc be a microcluster, V its eigenvector matrix and E its d×d eigenvalue matrix having the eigenvalues in descending order on the diagonal. We call the first

    [00002] $$\min_{r \in \{1,\dots,d\}} \left\{ r \;\middle|\; \frac{\sum_{i=1}^{r} E(i,i)}{\sum_{i=1}^{d} E(i,i)} \geq \alpha \right\}$$

    eigenvectors strong eigenvectors resp. preference vectors and the remaining eigenvectors are called weak eigenvectors. The space spanned by the preference vectors is called correlation subspace.
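    Assuming the eigenvalues are sorted in descending order, the quantity of Definition 3 can be computed in a few lines (a sketch; the function name strong_eigenvector_count is ours):

```python
import numpy as np

def strong_eigenvector_count(eigvals_desc, alpha):
    """Smallest r such that the first r eigenvalues (descending order)
    explain at least a fraction alpha of the total variance (Definition 3)."""
    ratios = np.cumsum(eigvals_desc) / np.sum(eigvals_desc)
    return int(np.argmax(ratios >= alpha)) + 1
```

For eigenvalues (8, 1, 1) and α=0.8, for example, one strong eigenvector suffices, so the correlation subspace is 1-dimensional.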

    [0073] In the following, an embodiment for correlation clustering on streaming data is described, and in particular the maintenance of microclusters in an online phase.

    [0074] The algorithm CORRSTREAM presented in this section consists of two phases, i.e. an online phase in which microclusters are generated, maintained and/or discarded due to temporal expiration, and an offline phase to extract, on demand, clustering models of the current state of the stream.

    TABLE-US-00002 Algorithm 2 Online
    Input: Data stream S, range parameter ∈, buffer size buff_size, decay parameter λ
    1: for incoming data object o from S at time t do
    2:   Microcluster mc_NN := rangeNN(o, ∈)
    3:   if mc_NN ≠ null then
    4:     add <o, t> to mc_NN
    5:   else
    6:     create new microcluster with parameters buff_size, λ and add <o, t>
    7:   end if
    8: end for

    [0075] During the continuous online phase, referenced in Algorithm 2, the data stream is consumed and for each data object o a rangeNN query is performed to detect the closest microcluster. The rangeNN query takes the current data object as query object and the ∈ parameter as a distance bound. As a result, the rangeNN query retrieves the closest microcluster with a maximum distance of ∈. If such a microcluster exists, it absorbs the current data object o; otherwise, a new microcluster is created. Besides the components that fulfill the maintenance properties, each microcluster has an initial buffer. This buffer is a small collection of data objects that serves as a basis for an internal initialization step. The intuition behind this is to collect a set of spatially close data objects for which an initial PCA is performed. The PCA retrieves the eigenspace, i.e. eigenvectors and eigenvalues, of those data objects. Applying Definition 3, we can define the strong eigenvectors of the microcluster which span the correlation subspace. Since a single data object, or a few, cannot exhibit proper resp. meaningful preference vectors, the buffer size should not be chosen smaller than the dimensionality of the full data space.

    [0076] Another view of Algorithm 2 is shown in the flowchart of FIG. 1. According to FIG. 1, the rangeNN query is performed in a calculating step 1 that calculates a correlation distance measure between the new feature vector (the incoming data object o) and each microcluster. The rangeNN query searches for the microcluster with the smallest value for the correlation distance measure. If that value is below the range parameter ∈, then the rangeNN query delivers that microcluster. If, however, all values for the correlation distance measure are above the range parameter ∈, then the rangeNN query returns null. An evaluation step 2 checks the result of the rangeNN query. If the result is a microcluster, then the new feature vector will be added to that microcluster in an assigning step 3, which includes updating the microcluster based on the new feature vector in an updating step 4. If the result of the rangeNN query is null, then a new microcluster based on the new feature vector is created in a creating step 5. The algorithm then proceeds with the next incoming feature vector by returning to the calculating step 1.
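    The control flow of Algorithm 2 and FIG. 1 can be sketched as follows. The helpers distance and make_cluster are passed in as callables, so the sketch stays agnostic of the concrete microcluster structure; all names are illustrative.

```python
def online_step(clusters, o, t, eps, distance, make_cluster):
    """One iteration of the online phase (Algorithm 2 / FIG. 1).

    clusters     : list of current microclusters
    o, t         : incoming feature vector and its timestamp
    eps          : range parameter
    distance     : callable(mc, o) -> correlation (or Euclidean) distance
    make_cluster : callable(o, t)  -> freshly created microcluster
    """
    # rangeNN: closest microcluster within distance eps, else None.
    best, best_dist = None, float("inf")
    for mc in clusters:
        d = distance(mc, o)
        if d < eps and d < best_dist:
            best, best_dist = mc, d
    if best is not None:
        best.add(o, t)                    # assign and update (Algorithm 4)
    else:
        clusters.append(make_cluster(o, t))  # no microcluster in range
    return clusters
```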

    TABLE-US-00003 Algorithm 3 rangeNN
    Input: data object o, range parameter ∈
    Output: nearest microcluster of o
     1: min_dist := MAX_VALUE
     2: min_mc := null
     3: for each existing microcluster mc do
     4:   if mc is already initialized then
     5:     dist := distance_corr(mc, o)
     6:   else
     7:     dist := distance_eucl(μ_mc, o)
     8:   end if
     9:   if dist < ∈ ∧ dist < min_dist then
    10:     min_dist := dist
    11:     min_mc := mc
    12:   end if
    13: end for
    14: return min_mc

    [0077] Algorithm 3 shows the basic pseudocode of the rangeNN query. For a data object o, we compute the distance to each existing microcluster and retrieve the microcluster with the smallest distance to o that is below ∈. This embodiment uses two distance measures, i.e. the Euclidean distance and the correlation distance. The reason for this is the period of grace that we establish for each newly constructed microcluster for the initialization. If the initial PCA has not yet been performed for a microcluster, the correlation measure cannot be applied due to the lack of the microcluster's eigenspace. Therefore, in such cases we determine the Euclidean distance between the microcluster's mean point and the incoming data object instead of the correlation distance. However, to define the correlation distance, which is used in all other cases, we first need to define the notion of the similarity matrix.

    [0078] Definition 4. Let V.sub.mc be an eigenvector matrix with E.sub.mc being the corresponding eigenvalue matrix of a microcluster mc, having eigenvalues normalized onto [0, 1] on the diagonal. Given a threshold value α∈[0, 1] and a constant κ∈ℝ with κ>>1, the eigenvalue matrix E.sub.mc is adapted by setting those eigenvalues to κ whose value is below the threshold value α. The values of the resulting matrix Ê.sub.mc are computed according to the following rule:

    [00003] $$\hat{E}_{mc}(i,i) = \begin{cases} 1 & \text{if } E_{mc}(i,i) \geq \alpha \\ \kappa & \text{else} \end{cases}$$

    [0079] Having the adapted eigenvalue matrix Ê.sub.mc, the similarity matrix is defined as M̂.sub.mc=V.sub.mcÊ.sub.mcV.sub.mc.sup.T.

    [0080] Let us note that the constant value κ specifies the allowed degree of deviation from the correlation subspace. Following [9], we set this value to κ=50. The correlation distance can finally be computed as follows.

    [0081] Definition 5. Given a microcluster mc with mean point μ.sub.mc and a data object o, the correlation distance between both is defined as


    $$\mathrm{distance}_{corr}(mc, o) = \sqrt{(\mu_{mc} - o)^T \cdot \hat{M}_{mc} \cdot (\mu_{mc} - o)}$$

    with M̂.sub.mc being the similarity matrix of mc.
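    Definitions 4 and 5 translate almost directly into NumPy. In this sketch, the eigenvalues are assumed to be normalized onto [0, 1] and given per axis, and the function names are ours:

```python
import numpy as np

def similarity_matrix(V, evals_norm, alpha, kappa=50.0):
    """M_hat = V . E_hat . V^T with E_hat(i,i) = 1 if the normalized
    eigenvalue is >= alpha (strong axis), else kappa (Definition 4)."""
    e_hat = np.where(evals_norm >= alpha, 1.0, kappa)
    return V @ np.diag(e_hat) @ V.T

def distance_corr(mu, V, evals_norm, o, alpha, kappa=50.0):
    """Correlation distance between a microcluster (mu, V, E) and a
    data object o (Definition 5)."""
    M_hat = similarity_matrix(V, evals_norm, alpha, kappa)
    diff = mu - o
    return float(np.sqrt(diff @ M_hat @ diff))
```

Deviations of o from μ.sub.mc along strong axes contribute with weight 1, whereas deviations orthogonal to the correlation subspace are penalized by κ (κ=50 following [9]).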

    [0082] After determining the closest microcluster mc of the incoming data object o, the latter must be incorporated into mc properly.

    TABLE-US-00004 Algorithm 4 add
    Input: data object o, timestamp ts_o, microcluster mc
    Output: updated mc, i.e. mc + o
     1: if mc is not initialized then
     2:   buffer.add(o)
     3:   μ := update mean
     4:   ts_mc := ts_o
     5:   if buffer.size() == buff_size then
     6:     V, E := perform PCA on buffer
     7:     update mc's eigensystem
     8:     mark mc as initialized
     9:   end if
    10: else
    11:   V, E, μ := perform IPCA by incorporating o with weight 2^(−λ·(ts_o − ts_mc))
    12:   update mc's eigensystem
    13: end if
    14: return mc

    [0083] Algorithm 4 shows the pseudocode of this function. Our proposed algorithm basically differentiates three cases of how to insert a new data object into an existing microcluster. The first two cases are considered if the microcluster mc has not been initialized so far, i.e. the buffer is not filled. In these cases, the object is inserted into the buffer, and the mean μ as well as the current timestamp of the microcluster are updated. Note that ts.sub.o denotes the current timestamp of the data object o and ts.sub.mc denotes the timestamp of mc, which is the time of the last microcluster update. Afterwards, the algorithm checks whether the buffer is filled now. If not, the insertion terminates by returning the updated microcluster. Otherwise, the initialization procedure is triggered: the initial PCA is performed and retrieves the eigensystem of the data objects contained in the buffer. Then, the components of the microcluster structure are set and mc is marked as initialized.

    [0084] The third option of inserting a newly arriving data object is used if the microcluster has already been initialized before. In this case, the existing components of the microcluster are reused and the incremental PCA as given in Algorithm 1 is invoked to generate the new eigenvectors and eigenvalues as well as an updated mean vector. As mentioned above, the degree of influence of the new object on the existing eigensystem can be regulated by the weight parameter, i.e. 2.sup.−λ·(ts.sub.o−ts.sub.mc) in the pseudocode. After updating the components of mc, the current state of mc is returned and the algorithm can proceed with the next data object.

    [0085] Due to the possibility of expiring microclusters, i.e. microclusters that have not absorbed any data object for a while, such microclusters may have to be deleted, since on the one hand stale data should not distort an up-to-date clustering model. On the other hand, because streaming applications are often limited to a certain amount of memory, it is useful to delete old microclusters to save storage space. As a straightforward solution, we propose to scan the set of microclusters sequentially from time to time and delete those microclusters whose timestamp ts.sub.mc is older than a user-specified threshold value Δt, i.e. if ts.sub.mc<ts.sub.curr−Δt with ts.sub.curr denoting the current point in time. The choice of an appropriate threshold value Δt depends on the velocity of the data stream.
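    The periodic pruning scan described above amounts to a one-line filter (a sketch; the field name ts follows Definition 2, the function name is ours):

```python
def prune_expired(clusters, ts_curr, delta_t):
    """Keep only microclusters updated within the last delta_t time
    units, i.e. drop those with ts < ts_curr - delta_t."""
    return [mc for mc in clusters if mc.ts >= ts_curr - delta_t]
```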

    [0086] In the following, the generation of a macrocluster model during an offline phase is described.

    [0087] The main goal of CORRSTREAM is the construction of a high-quality clustering model that describes correlations appearing in the data. For that purpose, the algorithm provides an offline routine capable of building macroclusters on top of the generated microcluster structures retrieved by the online process. FIG. 2 depicts the resulting microcluster model of the online phase for a small synthetic and 3-dimensional data set. As can easily be seen, some of the microclusters can be grouped so that they form two separate macroclusters, i.e. a 1-dimensional cluster and a 2-dimensional one. In general, the microcluster structure is generic enough so that a variety of static correlation clustering algorithms can be adopted to build a clustering model based on the retrieved microclusters. In this work, we describe a variant of the ERIC algorithm [1] in detail. Another possibility is a version of CORRSTREAM that uses an adaptation of the ORCLUS algorithm [6].

    [0088] The algorithm we propose consists of four steps that are (1) partitioning the set of microclusters, (2) computing macroclusters within each partition, (3) building the hierarchy of the clustering and (4) defining cross-partition macroclusters which is optional. In the first step, we partition the set of microclusters according to the dimensionality of their subspaces. Regarding the example data set from FIG. 2, the microclusters which form the 1-dimensional line cluster would be in one partition, and the microclusters which form the 2-dimensional plane would be in another partition for instance. Technically, the partitioning process is done by counting the number of strong eigenvectors contained in the eigenvector matrix of each microcluster. After dividing the set of microclusters into disjoint partitions, the algorithm determines macroclusters within each partition. Therefore, we apply a DBSCAN [11] variant capable of dealing with the structure of the microclusters. The basic idea is to use the orientation of the microclusters given by the eigenvectors to group those microclusters whose eigenvectors span a similar subspace. According to [1], we define the correlation distance used for the spatial queries in the offline phase as the composition of the approximate linear dependency and the affine distance between two microclusters.
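    Step (1) of the offline phase can be sketched as follows: each microcluster is assigned to the partition given by its number of strong eigenvectors according to Definition 3. The function name is ours, and microclusters are assumed to expose their eigenvalues E in descending order:

```python
import numpy as np
from collections import defaultdict

def partition_by_dimensionality(clusters, alpha):
    """Partition microclusters by the number of their strong eigenvectors,
    i.e. by the dimensionality of their correlation subspaces."""
    partitions = defaultdict(list)
    for mc in clusters:
        ratios = np.cumsum(mc.E) / np.sum(mc.E)    # E sorted descending
        lam = int(np.argmax(ratios >= alpha)) + 1  # Definition 3
        partitions[lam].append(mc)
    return dict(partitions)
```

In the example of FIG. 2, the microclusters of the 1-dimensional line cluster and those of the 2-dimensional plane would end up in the partitions for λ=1 and λ=2, respectively; macroclusters are then computed within each partition.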

    [0089] Definition 6. Given a threshold value Δ∈]0; 1[ and two microclusters mc.sub.i and mc.sub.j, with mc.sub.i having fewer or as many strong eigenvectors as mc.sub.j, mc.sub.i is called approximately linear dependent from mc.sub.j if


    \[ \sqrt{v_i^T \, V_{mc_j} \, \hat{E}_{mc_j} \, V_{mc_j}^T \, v_i} \;\leq\; \Delta \]

    with Ê_{mc_j} being the adapted eigenvalue matrix of mc_j according to

    \[ \hat{E}_{mc}(x,x) = \begin{cases} 1 & \text{if } E_{mc}(x,x) \leq \alpha \\ 0 & \text{else} \end{cases} \]

    holds for all strong eigenvectors v_i of mc_i.

    [0090] Since perfect correlations usually do not appear in real world data, the threshold value Δ is introduced to allow a certain degree of deviation from an absolute linear dependency between two microclusters.

    [0091] Definition 7. Let mc_i and mc_j be two microclusters with mc_i having at most as many strong eigenvectors as mc_j, and let mc_i be approximately linear dependent from mc_j. Then, the affine distance between mc_i and mc_j is defined as


    \[ \mathrm{distance}_{\mathrm{aff}}(mc_i, mc_j) = \sqrt{(\mu_{mc_i} - \mu_{mc_j})^T \, V_{mc_j} \, \hat{E}_{mc_j} \, V_{mc_j}^T \, (\mu_{mc_i} - \mu_{mc_j})}, \]

    with Ê_{mc_j} being defined as in Definition 6 and μ_{mc} being the mean of microcluster mc.

    [0092] Combining Definitions 6 and 7, and assuming the premise that two microclusters with parallel subspaces form a joint cluster if the affine distance is below a threshold value δ ∈ ℝ₀⁺, we define the correlation distance for the offline phase as follows.

    [0093] Definition 8. Let δ ∈ ℝ₀⁺, Δ ∈ ]0; 1[ and let mc_i and mc_j be two microclusters with mc_i having at most as many strong eigenvectors as mc_j. The correlation distance between mc_i and mc_j, i.e. CorrDist_Δ^δ(mc_i, mc_j), is defined as

    \[ \mathrm{CorrDist}_{\Delta}^{\delta}(mc_i, mc_j) = \begin{cases} 0 & \text{if } mc_i \text{ is approx.\ linear dependent from } mc_j \,\wedge\, \mathrm{distance}_{\mathrm{aff}}(mc_i, mc_j) \leq \delta \\ 1 & \text{else} \end{cases} \]
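Definitions 6 to 8 can be sketched in code as follows. The dict-based microcluster representation (`V`, `eigenvalues`, `mu`, `strong`) and the max-normalization used to mark weak dimensions are assumptions of this sketch, not part of the original definitions:

```python
import numpy as np

def adapted_eigenvalues(mc, alpha=0.85):
    """Ê of Definition 6: diagonal 1 for the weak dimensions
    (here: normalized eigenvalue at most alpha), 0 otherwise.
    The normalization convention is an assumption of this sketch."""
    ev = np.asarray(mc["eigenvalues"], float)
    return np.diag((ev / ev.max() <= alpha).astype(float))

def _weighted_norm(x, mc, alpha):
    """sqrt(x^T V Ê V^T x), i.e. the length of x measured only
    along the weak directions of mc."""
    x = np.asarray(x, float)
    V = np.asarray(mc["V"], float)  # eigenvectors as columns
    M = V @ adapted_eigenvalues(mc, alpha) @ V.T
    return float(np.sqrt(x @ M @ x))

def approx_linear_dependent(mc_i, mc_j, Delta, alpha=0.85):
    """Definition 6: every strong eigenvector of mc_i must (almost)
    lie in the strong subspace of mc_j."""
    return all(_weighted_norm(v, mc_j, alpha) <= Delta
               for v in mc_i["strong"])

def affine_distance(mc_i, mc_j, alpha=0.85):
    """Definition 7: distance of the means, measured only along the
    weak directions of mc_j."""
    diff = np.asarray(mc_i["mu"], float) - np.asarray(mc_j["mu"], float)
    return _weighted_norm(diff, mc_j, alpha)

def corr_dist(mc_i, mc_j, Delta, delta, alpha=0.85):
    """Definition 8: binary correlation distance for the offline phase."""
    if (approx_linear_dependent(mc_i, mc_j, Delta, alpha)
            and affine_distance(mc_i, mc_j, alpha) <= delta):
        return 0
    return 1
```

Because the result is binary, two microclusters either lie on (approximately) the same correlation hyperplane or they do not, which is exactly why the DBSCAN run described next uses ε = 0.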

    [0094] Using this distance measure for the DBSCAN run performed within each partition finally yields a density-based correlation clustering model per partition. Note that the ε parameter for DBSCAN must be set to 0 since the distance measure is binary. The minPts parameter, which we refer to as minMcs in the following to distinguish it from the parameter used in the static ERiC method, depends on the application.

    [0095] Due to the partitioning, some microclusters may span a subspace that lies within the subspace of surrounding microclusters whose subspaces have a higher dimensionality, e.g. a line cluster might be embedded in a plane-shaped cluster. To detect such constellations, step (3) builds a hierarchy of the macroclusters generated during the previous step. The procedure, borrowed from [1], iterates over the set of macroclusters and checks for each cluster C_m whether there exists another cluster C_n whose subspace is of a higher dimensionality and for which CorrDist_Δ^δ(C_m, C_n) = 0 holds. If such a cluster C_n exists, it becomes a parent of C_m unless C_n is already an ancestor of C_m. Processing all macroclusters in this manner yields a hierarchical, treelike structure in which clusters contained in a child node are embedded within the clusters contained in its parent node. In the optional fourth step of our algorithm, we finally merge related macroclusters, which leads to cross-partition macroclusters, i.e. clusters that contain microclusters spanning subspaces of different dimensionalities, with the premise that lower dimensional microclusters must be embedded within the microclusters of higher dimensionalities.
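The hierarchy construction of step (3) might look roughly like this; the dict representation of macroclusters and the `corr_dist_is_zero` predicate are hypothetical interfaces chosen for illustration:

```python
def is_ancestor(cand, node, parents):
    """Walk up the parent links of `node` and check whether `cand`
    is already reachable, i.e. an ancestor of `node`."""
    stack = list(parents[id(node)])
    while stack:
        p = stack.pop()
        if p is cand:
            return True
        stack.extend(parents[id(p)])
    return False

def build_hierarchy(macroclusters, corr_dist_is_zero):
    """Sketch of step (3): each macrocluster is linked to higher
    dimensional clusters at correlation distance 0, unless such a
    cluster is already an ancestor.  Macroclusters are assumed to be
    dicts with a 'dim' key; `corr_dist_is_zero(a, b)` is a
    hypothetical predicate that is True when CorrDist(a, b) == 0."""
    parents = {id(c): [] for c in macroclusters}
    # resolve high-dimensional clusters first so their ancestor links
    # already exist when the lower-dimensional children are processed
    for cm in sorted(macroclusters, key=lambda c: -c["dim"]):
        for cn in sorted(macroclusters, key=lambda c: c["dim"]):
            if (cn["dim"] > cm["dim"] and corr_dist_is_zero(cm, cn)
                    and not is_ancestor(cn, cm, parents)):
                parents[id(cm)].append(cn)
    return parents
```

With the ancestor check in place, a line embedded in a plane that is itself embedded in a full-dimensional cluster gets only the plane as its direct parent, which yields the treelike structure described above.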

    [0096] FIG. 3 shows the final clustering result of our sample data set. Note that full dimensional subspaces are treated as noise, depicted as round dots. Lower dimensional microclusters that obviously lie within a cluster can also end up being handled as noise if they are not density-reachable for the DBSCAN algorithm. The lower right corner of the 2-dimensional cluster gives an example of this phenomenon. Although a microcluster exists there in the model retrieved by the online phase, it is labeled as noise during the offline phase: the microcluster is 1-dimensional but not density-reachable from another 1-dimensional microcluster (and vice versa), so the minMcs criterion for DBSCAN is not satisfied.

    TABLE 1

      parameter   value(s)         phase    meaning
      δ           .02              both     threshold for the affine distance
      Δ           .02              both     allowed deviation from linear dependency
      α           .85              both     threshold to distinguish strong and weak eigenvectors
      k           17               static   number of neighbors regarded to determine a point's eigenspace
      minPts      8                static   number of density-reachable points
      buff_size   10, 15, 20       stream   number of points used to initialize a microcluster
      ε           .1, .15, .2, .3  stream   allowed range for pre-initialization
      minMcs      1, 2, 3          stream   number of density-reachable microclusters
      λ           .4               stream   intensity of the data stream
      β           .3, .5, .7       stream   ageing parameter

    [0097] In the following, we discuss the influence of the parameters that have to be set by the user for CORRSTREAM, i.e. the ε parameter giving the maximum allowed distance of a data point to a microcluster center for the assignment step, the buff_size parameter which regulates the size of the buffer, in other words the number of points used for the initial PCA, and the minMcs parameter which specifies how many microclusters must share a subspace so that a macrocluster can be created in the offline phase. Note that the latter parameter is ERiC specific and is not used in the ORCLUS-like variant, since the ORCLUS specific k₀ parameter, i.e. the parameter that gives the initial number of clusters which is reduced to k during the iterations, is implicitly given by the number of microclusters. The precision and recall values slightly decrease with increasing ε values, which can be explained by the enlarged absorption radius, as more distant points can be absorbed by a microcluster, especially during the initialization phase. The number of produced microclusters logically decreases, and with it the processing time.

    [0098] When considering various buffer sizes, the only measurement that changes drastically is the runtime, which seems to benefit from larger buffers.

    [0099] The minMcs parameter also has little effect on the measured variables. While precision and recall decrease very slightly with an increasing number of microclusters necessary to build a macrocluster, the runtime and the overall number of microclusters are not affected at all. Higher values of the minMcs parameter probably benefit from small ε values in terms of clustering quality, but at the cost of increased processing time.

    [0100] In the following, means for detecting a significant change in a subspace's orientation are described.

    [0101] A significant change in a subspace's orientation can be detected if a new dimension appears in, or an old one disappears from, the set of preferred vectors of the subspace when comparing the current state to a historic one. Even if the number of preferred dimensions does not change, the orientation may still change. With an appropriate similarity measure, such changes in the orientation can be recognized. We therefore introduce the following distance measure to determine the similarity of two subspaces:

    [0102] Definition 9. Let S(t) and S(t′) be two subspaces with (u_1, u_2, . . . , u_d) and (v_1, v_2, . . . , v_d), respectively, being the vectors that form the orthonormal bases of the two subspaces. We define the distance d between the two subspaces S(t) and S(t′) as follows:


    \[ d(S(t), S(t')) = \sum_{i=1}^{d} \min_{j=1,\ldots,d} \arccos(|\langle u_i, v_j \rangle|), \]

    with ⟨·, ·⟩ being the dot product.

    [0103] This distance measure sums the angles between the best fitting pairs of the subspaces' eigenvectors, which basically measures the rotation that may have happened to the subspace S during the time interval [t, t′]. If the distance between S(t) and S(t′) exceeds an application dependent threshold value, the change of orientation can be considered significant.
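A minimal sketch of Definition 9, assuming the two orthonormal bases are passed as column matrices:

```python
import numpy as np

def subspace_distance(U, V):
    """Definition 9 as a sketch: U and V hold orthonormal basis
    vectors of S(t) and S(t') as columns.  For every basis vector u_i
    the smallest angle to any v_j is accumulated, measuring how far
    the subspace has rotated during [t, t']."""
    total = 0.0
    for u in np.asarray(U, float).T:
        # clamp the dot product to guard against floating-point
        # values marginally above 1 before taking arccos
        angles = [np.arccos(min(1.0, abs(float(np.dot(u, v)))))
                  for v in np.asarray(V, float).T]
        total += min(angles)
    return total
```

For identical bases the distance is 0; rotating a 2-dimensional basis by 45 degrees contributes π/4 per basis vector, i.e. π/2 in total.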

    [0104] When determined on a larger scale during the offline phase by examining the macrocluster model, such changes in the orientation of a cluster's subspace may indicate behavior that differs from standard operation with even more reliability.

    [0105] The following section describes a way for comparing the macrocluster model with historical models.

    [0106] Cluster models such as the macrocluster model generated in the offline phase basically consist of extracted subspaces. One embodiment for comparing two such cluster models is to measure the deviation for each cluster pair and finally compare the sums of deviations of the historical models. Given two cluster models, one representing a historical pattern fetched from a database and the other representing the current model, a cluster pair is defined as two clusters, one from each model, that are identical or very similar. The similarity between two clusters can be determined by measuring the deviation of the subspaces represented by these clusters, e.g. by checking whether the vectors that span the subspace of the cluster from the current model are linearly dependent on the vectors that span the subspace of the cluster from the standard pattern. All cluster pairs are detected by comparing the subspace of one cluster, say x, to the subspaces of all clusters of the comparison model; the cluster, say y, with the minimum distance (or largest similarity) to x is regarded as its equivalent, i.e. x and y build a cluster pair (x, y). The historical model that shows the minimal sum of deviations to the current model among all historical models can then be taken as the comparison model.
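The model comparison described above might be sketched as follows; the `basis` entry and the pluggable `subspace_distance` argument are assumptions of this illustration:

```python
def model_deviation(current, historical, subspace_distance):
    """Sum of per-pair deviations: every cluster of the current model
    is paired with the most similar cluster of one historical model.
    Clusters are assumed to be dicts with a 'basis' entry, and
    `subspace_distance` is any subspace deviation measure (e.g. the
    one from Definition 9)."""
    return sum(min(subspace_distance(x["basis"], y["basis"])
                   for y in historical)
               for x in current)

def best_matching_model(current, history, subspace_distance):
    """Pick the historical model with the minimal deviation sum."""
    return min(history,
               key=lambda h: model_deviation(current, h, subspace_distance))
```

In a diagnostics setting, `history` would hold the documented operation patterns (standard operation, loosened screw, etc.), and the returned model identifies the most likely current condition.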

    [0107] The following section describes possible applications of the previous embodiments to the diagnostics of turbines and other rotating equipment.

    [0108] The previously described embodiments of correlation clustering on data streams can be useful for turbine diagnostics in the sense that this unsupervised process is able to observe the operational behavior of the turbine (or other rotating equipment) continuously and, in contrast to supervised approaches, in the absence of labeled training data. An ageing mechanism allows the influence of stale data on the current model to be forgotten, which keeps the microcluster model permanently up to date. If a technician detects a failure state of the turbine, he or she can request a microcluster model representing the current state on demand. The microcluster model is translated into a macrocluster model during an offline phase. Given a database that stores documented historical operation patterns of a turbine, e.g. standard operation, loosened screw, etc., the model extracted in the offline phase can be compared with the patterns stored in the database and the most similar model can be retrieved. If that pattern is documented, the technician can determine the kind of failure in this way and initiate appropriate follow-up steps, e.g. stopping the turbine.

    [0109] In a real time monitoring system, the proposed algorithms can ideally be used as a key mechanism to detect shifts or drifts from a standard operational state to a failure state. One conceivable option is the detection of abrupt failures. Such a failure might even be identifiable during the online phase: newly arising microclusters, or microclusters whose subspaces change their orientations significantly although they were stable during standard behavior, may indicate an abrupt failure. On a larger scale, such changes in the orientation of a cluster's subspace in the macrocluster model may indicate behavior that differs from standard operation with even more reliability. Insidious failures are harder to detect. In this case, a model comparison as described above can help to detect failure states.

    [0110] At least some of the embodiments describe a novel streaming algorithm that is capable of detecting arbitrarily oriented subspace clusters, i.e. correlation clusters. At least some of the embodiments use a two-step approach by dividing the procedure into an online and an offline phase. A newly proposed microcluster structure is used to aggregate similar data objects and thus compresses the volume of data significantly. At the same time, this data structure provides all the statistical information, gained from the incorporated data points, that the offline phase requires to compute a clustering model which sufficiently represents the overall structure of the data.

    [0111] The method can be executed by a processor such as a microcontroller or a microprocessor, by an Application Specific Integrated Circuit (ASIC), by any kind of computer, including mobile computing devices such as tablet computers, smartphones or laptops, or by one or more servers in a control room or cloud.

    [0112] For example, a processor, controller, or integrated circuit of the computer system and/or another processor may be configured to implement the acts described herein.

    [0113] The above-described method may be implemented via a computer program product including one or more computer-readable storage media having stored thereon instructions executable by one or more processors of a computing system. Execution of the instructions causes the computing system to perform operations corresponding with the acts of the method described above.

    [0114] The instructions for implementing processes or methods described herein may be provided on non-transitory computer-readable storage media or memories, such as a cache, buffer, RAM, FLASH, removable media, hard drive, or other computer readable storage media. A processor performs or executes the instructions to train and/or apply a trained model for controlling a system. Computer readable storage media include various types of volatile and non-volatile storage media. The functions, acts, or tasks illustrated in the figures or described herein may be executed in response to one or more sets of instructions stored in or on computer readable storage media. The functions, acts or tasks may be independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, microcode and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing and the like.

    [0115] The invention has been described in detail with reference to embodiments thereof and examples. Variations and modifications may, however, be effected within the spirit and scope of the invention covered by the claims. The phrase “at least one of A, B and C” as an alternative expression may provide that one or more of A, B and C may be used.

    [0116] It is to be understood that the elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present invention. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent, and that such new combinations are to be understood as forming a part of the present specification.

    [0117] While the present invention has been described above by reference to various embodiments, it should be understood that many changes and modifications may be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.