Methods and systems for discovery of prognostic subsequences in time series
10712733 ยท 2020-07-14
Assignee
Inventors
- Daniel Nikolaev Nikovski (Brookline, MA)
- Yan Zhu (Riverside, CA, US)
- Amir-massoud Farahmand (Cambridge, MA, US)
Cpc classification
G05B23/0283
PHYSICS
G05B23/0221
PHYSICS
G06N5/01
PHYSICS
International classification
Abstract
Systems and methods for determining a pattern in time series data representing an operation of a machine. A memory to store and provide a set of training data examples generated by a sensor of the machine, wherein each training data example represents an operation of the machine for a period of time ending with a failure of the machine. A processor configured to iteratively partition each training data example into a normal region and an abnormal region, determine a predictive pattern absent from the normal regions and present in each abnormal region only once, and determine a length of the abnormal region. Outputting the predictive pattern via an output interface in communication with the processor or storing the predictive pattern in memory, wherein the predictive pattern is a predictive estimate of an impending failure and assists in management of the machine.
Claims
1. A system for managing an impending failure of a machine by determining a pattern in time series data representing an operation of the machine, comprising: a computer readable memory includes stored data, the stored data includes predictive patterns determined from a set of training data examples generated by a sensor in communication with a training machine, each training data example represents an operation of the training machine for a period of time ending with a failure of the training machine, such that the set of training data examples are unprocessed data such that regions of normal operation and abnormal operation in them are unknown, wherein the stored predictive patterns are previously determined, by a training processor, the training processor is configured to: sequentially, select a length of a period of time for a current time series for the abnormal state region within each training data example, by considering in sequence every possible splitting point in the time series in turn; partition for the currently selected splitting point, each training data example in the set of training data examples into the normal state region and the abnormal state region having the length of the period of time for the current time series, to identify a pattern in the set of training data examples, such that the pattern is different from any other patterns present in all normal state regions of the set of training data examples, and is similar to exactly one pattern in each abnormal state region of the set of training data examples; select the pattern as the predictive pattern indicative of a predictive estimate of an impending failure of the training machine, and if the pattern is found, store the predictive pattern in the computer readable memory; increment the estimated time moment of start of the abnormal operation by one, and proceed with a next iteration of sequential partitioning the current time series, until the estimated time moment of the start of the abnormal operation reaches the end of the current time series; an input interface to receive sensor data, the sensor data includes test data examples obtained from a sensor in communication with the machine; a processor in communication with the computer readable memory and the input interface, is configured to: determine, based on at least one stored predictive pattern in the computer readable memory, whether one or more test data example extracted from the test data examples matches a pattern of the machine that corresponds to the at least one stored predictive pattern, and use the match as an indication of an impending failure of the machine, if the pattern is found; and manage the machine, according to the predictive estimate of the impending failure of the machine.
2. The system of claim 1, wherein the abnormal state region corresponds to the training machine failing to operate normally ending with the failure of the training machine, and the length of the period of time for the current time series of the abnormal state region is an amount of discrete-time data within the abnormal state region of each training data example in the set of training data examples.
3. The system of claim 1, where the predictive pattern determined from the set of training data examples is different from a pattern in the normal region, if a Euclidean distance between the two patterns exceeds a pre-specified threshold.
4. The system of claim 1, where the predictive pattern determined from the set of training data examples is considered similar to a pattern in the normal region if a Euclidean distance between the two patterns is lower than a pre-specified threshold.
5. The system of claim 1, wherein the searching for the predictive pattern determined from the set of training data examples is performed by a fast shapelet discovery algorithm.
6. The system of claim 1, wherein the training processor partitions each training data example in the set of training data examples into the normal state region based on data identifying a portion of the training data example that is generated by the sensor of training machine while the training machine was operating normally, and partitions each training data example in the set of training data examples into the abnormal state region based on data identifying a portion of the training data example that is generated by the sensor of training machine while the training machine was failing to operate normally and ending with the failure of the training machine.
7. The system of claim 1, wherein lengths of the periods of time for the set of training data examples are one of: a same length of periods of time or some training data examples in the set of training data examples have lengths of periods of time that are different from other training data example lengths of periods of time in the set of training data examples.
8. The system of claim 1, wherein the machine is similar to the training machine, and the sensor of the machine measures a same parameter as a respective sensor of the sensor of the training machine.
9. The system of claim 8, wherein each parameter relates to the operation of the training machine including one or a combination of: fluid force data, fluid energy data, vibration data, temperature data, voltage data or current data.
10. The system of claim 1, wherein the machine is stopped from operation or replaced, as soon as the at least one stored predictive pattern is matched to the one or more test data example.
11. The system of claim 1, wherein the training data is used to estimate an expected time interval between the time of occurrence of a predictive pattern and the time of failure of the machine, averaged over multiple training time series, and the machine is stopped or replaced before this average expected time interval after the appearance of the predictive pattern has elapsed.
12. The system of claim 1, wherein the training data is used to estimate a minimal time interval between the time of occurrence of a predictive pattern and the time of failure of the machine, over multiple training time series, and the machine is stopped or replaced before this minimal time interval after the appearance of the predictive pattern has elapsed.
13. A method for managing an impending failure of a machine by determining a pattern in time series data representing an operation of the machine, comprising: accessing stored data, the stored data includes stored predictive patterns determined from a set of training data examples generated by a sensor in communication with a training machine stored in a computer readable memory, each training data example represents an operation of the training machine for a period of time ending with a failure of the training machine, such that the set of training data examples are unprocessed data such that regions of normal operation and abnormal operation in them are unknown, wherein the stored predictive patterns are previously determined, by a training processor, that is configured to: sequentially, selecting a length of a period of time for a current time series for the abnormal state region within each training data example, by considering in sequence every possible splitting point in the time series in turn; partitioning for the currently selected splitting point, each training data example in the set of training data examples into the normal state region and the abnormal state region having the length of the period of time for the current time series, to identify a pattern in the set of training data examples, such that the pattern is different from any other patterns present in all normal state regions of the set of training data examples, and is similar to exactly one pattern in each abnormal state region of the set of training data examples; and selecting the pattern as the predictive pattern indicative of a predictive estimate of an impending failure of the training machine, and if the pattern is found, store in the computer readable memory; incrementing the estimated time moment of start of the abnormal operation by one, and proceed with a next iteration of sequential partitioning the current time series, until the estimated time moment of the start of the abnormal operation reaches the end of the current time series; receiving, via an input interface, test data examples from a sensor in communication with the machine; using a computer in communication with the computer readable memory and the input interface, the computer is configured to: determine, based on at least one stored predictive pattern in the computer readable memory, whether one or more test data example extracted from the test data examples matches a pattern of the machine that corresponds to the at least one stored predictive pattern, and use the match as an indication of an impending failure of the machine, if the pattern is found; and manage the machine, according to the predictive estimate of the impending failure of the machine.
14. The method of claim 13, wherein each training data example in the set of training data examples were sampled at a sampling rate or an approximate same length of period of time, for the set of training data examples.
15. The method of claim 13, wherein a user interface in communication with the computer and the computer readable memory, acquires and stores the set of training data examples in the computer readable memory upon receiving an input from a surface of the user interface by a user.
16. The method of claim 13, further comprising: a characterization module for determining a suitability of each candidate predictive pattern using a maximal margin algorithm, that is determined from the set of training data examples and wherein the computer readable memory includes stored executable instructions for storing each predictive pattern and each predictive pattern's identified characteristics based upon data from the set of training data examples; a filter for validating each predictive pattern, corresponding to a predetermined predictive pattern, from the set of training data examples, based on the identified characteristics and rating the predictive pattern; and a filter for excluding each predictive pattern based on the identified characteristics and rating the predictive pattern, wherein the computer readable memory stores each rated predictive pattern that is outside a feasibility threshold limit.
17. The method of claim 13, further comprising: a characterization module for determining a suitability of each candidate predictive pattern using a maximal margin algorithm, wherein the computer readable memory includes stored executable instructions for storing each predictive pattern's identified characteristics based upon data from the set of training data examples; validating each predictive pattern using a filter, wherein each predictive pattern corresponds to a predetermined predictive pattern, from the set of training data examples, based on the identified characteristics of the predictive pattern and rating the predictive pattern; and excluding each predictive pattern using a filter based on the identified characteristics and rating the predictive pattern, wherein the computer readable memory stores each rated predictive pattern that is outside a feasibility threshold limit.
18. A non-transitory computer readable storage medium embodied thereon a program executable by a computer for performing a method for managing an impending failure of a machine by determining a pattern in time series data representing an operation of the machine, the method comprising: accessing stored data, the stored data includes stored predictive patterns determined from a set of training data examples generated by a sensor in communication with a training machine stored in the non-transitory computer readable storage medium, each training data example represents an operation of the training machine for a period of time ending with a failure of the training machine, such that the set of training data examples are unprocessed data such that regions of normal operation and abnormal operation in them are unknown, wherein the previously stored predictive patterns are determined, by a training processor configured to: sequentially, selecting a length of a period of time for a current time series for the abnormal state region within each training data example by considering in sequence every possible splitting point in the time series in turn; partitioning for the currently selected splitting point, each training data example in the set of training data examples into the normal state region and the abnormal state region having the length of the period of time for the current time series, to identify a pattern in the set of training data examples, such that the pattern is different from any other patterns present in all normal state regions of the set of training data examples, and is similar to exactly one pattern in each abnormal state region of the set of training data examples; selecting the pattern as the predictive pattern indicative of a predictive estimate of an impending failure of the training machine, and if the pattern is found, store in the computer readable memory; increment the estimated time moment of start of the abnormal operation by one, and proceed with a next iteration of sequential partitioning the current time series, until the estimated time moment of the start of the abnormal operation reaches the end of the current time series; receiving, via an input interface, test data examples from a sensor in communication with the machine; using a computer in communication with the computer readable memory and the input interface, the computer is configured to: determine, based on at least one stored predictive pattern in the computer readable memory, whether one or more test data example extracted from the test data examples matches a pattern of the machine that corresponds to the at least one stored predictive pattern, and use the match as an indication of an impending failure of the machine, if the pattern is found; and manage the machine, according to the predictive estimate of the impending failure of the machine.
19. A system for managing an impending failure of a machine by determining a pattern in time series data representing an operation of the machine, comprising: a memory includes stored data, the stored data includes stored predictive patterns determined from a set of training data examples generated by a plurality of sensors in communication with a training machine, each training data example represents an operation of the training machine for a period of time ending with a failure of the training machine, such that the set of training data examples are unprocessed data such that regions of normal operation and abnormal operation in them are unknown, wherein the stored predictive patterns are previously determined, by a training processor configured to: sequentially, select a length of a period of time for a current time series for the abnormal state region within each training data example by considering in sequence every possible splitting point in the time series in turn; partition for the currently selected splitting point, each training data example in the set of training data examples into the normal state region and the abnormal state region having the length of the period of time for the current time series, to identify a pattern in the set of training data examples, such that the pattern is different from any other patterns present in all normal state regions of the set of training data examples, and is similar to exactly one pattern in each abnormal state region of the set of training data examples; select the pattern as the predictive pattern indicative of a predictive estimate of an impending failure of the training machine, and if the pattern is found, store the predictive pattern in the memory; a user interface in communication with a computer and the memory, acquires and stores sensor data, wherein the computer is configured to: determine, based on at least one stored predictive pattern in the memory, whether one or more test data example extracted from the test data examples matches a pattern of the machine that corresponds to the at least one stored predictive pattern, and use the match as an indication of an impending failure of the machine, if the pattern is found; and manage the machine, according to the predictive estimate of the impending failure of the machine.
20. A method for managing an impending failure of a machine by determining a pattern in time series data representing an operation of the machine, comprising: using a memory having stored data, the stored data includes stored predictive patterns determined from a set of training data examples generated by a plurality of sensors in communication with a training machine, each training data example represents an operation of the training machine for a period of time ending with a failure of the training machine, such that the set of training data examples are unprocessed data such that regions of normal operation and abnormal operation in them are unknown, wherein the stored predictive patterns are previously determined, by a training processor that is configured to: sequentially, selecting a length of a period of time for a current time series for the abnormal state region within each training data example by considering in sequence every possible splitting point in the time series in turn; partitioning for the currently selected splitting point, each training data example in the set of training data examples into the normal state region and the abnormal state region having the length of the period of time for the current time series, to identify a pattern in the set of training data examples, such that the pattern is different from any other patterns present in all normal state regions of the set of training data examples, and is similar to exactly one pattern in each abnormal state region of the set of training data examples; selecting the pattern as the predictive pattern indicative of a predictive estimate of an impending failure of the training machine, and if the pattern is found, store in the memory; increment the estimated time moment of start of the abnormal operation by one, and proceed with a next iteration of sequential partitioning the current time series, until the estimated time moment of the start of the abnormal operation reaches the end of the current time series; receiving, via an input interface, test data examples generated by a sensor in communication with the machine; using a computer in communication with the memory and the input interface, the computer is configured to: determining, based on at least one stored predictive pattern in the memory, whether one or more test data example extracted from the test data examples matches a pattern of the machine that corresponds to the at least one stored predictive pattern, and use the match as an indication of an impending failure of the machine, if the pattern is found; and managing the machine, according to the predictive estimate of the impending failure of the machine.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22) While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
DETAILED DESCRIPTION
(23) The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
(24) Specific details are given in the following description to provide a thorough understanding of the embodiments. However, understood by one of ordinary skill in the art can be that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicated like elements.
(25) Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed, but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
(26) Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. A processor(s) may perform the necessary tasks.
Definition of Terms
(27) According to the definition of terms with regard to the present disclosure, the term Shapelet can be defined as a characteristic subsequence of a time series that helps distinguish the class that this time series belongs to.
Overview of Embodiments of the Present Disclosure
(28)
(29) At least one realization of the present disclosure is based on finding subsequences in a time series that have maximal predictive power about future events, such as failure. At least one underlying assumption is that in some time T before the event, the characteristics of the time series will change as a precursor to the impending event. The change can be expressed as the emergence of one or more subsequences that were not seen before. This is defined or called such subsequences as predictive patterns, as noted above.
(30) Still referring to
(31) However, in most cases T is actually unknown, which greatly exacerbates the problem. If we guess too small a value of T, we may dismiss the predictive pattern. If our guess of T is too large, the search space grows quadratically, and we will very likely find spurious normal patterns simply because those patterns do not have enough time to show up before our guessed system change point. For example, assume length of the time series is N, and T is much larger than N/2, then we may find a lot of subsequences of length NT that perfectly distinguish the normal and abnormal states, as long as the subsequences do not appear right at the beginning of the time series.
(32) To maximize the opportunity of finding useful predictive patterns and to avoid finding spurious rules, we have designed an algorithm based on the Minimum Description Length concept to help determine a suitable T.
(33) Still referring to
(34) If we treat this as a classification problem, we will find that a Shapelet discovery algorithm is directly applicable. However, a Shapelet discovery algorithm, or other classification algorithms would terminate when they find only the smallest set of subsequences to differentiate abnormal regions from normal regions, while in real life systems, there can be a lot more predictive patterns than the smallest set. Discovering all these patterns is desirable in our problem setting, as in that way we will be able to make earlier and more accurate predictions. Besides, classification algorithms cannot always guarantee the appearance of certain patterns, because the class splitting points/boundaries can be very far away from class centers. So classification algorithms do not fit our needs here.
(35) However, before discussing finding predictive patterns, we first need to formally define a way to measure the predictive power of a time series subsequence. In general, we want the predictive pattern to: (A) be a subsequence in the abnormal region of a time series; (B) be very different from any subsequences in the normal region of any time series; and (C) be very similar to at least one subsequence in the abnormal region of another time series.
(36) Conditions (A) and (B) can be intuitive. Condition (C) is also necessary because if the pattern only appears in one-time series, it is very possibly noise, and we cannot generalize it as a predictive rule. So the more time series we have in the dataset, the better.
(37)
(38) Wherein,
(39) So, how can we learn how to identify the subsequences that can be used to distinguish between normal and abnormal states and not obtain a pattern that is possibly noise, and end with a result that measures the predicting power of a time series subsequence, i.e. identifies the predictive pattern(s) in the set of training data examples?
(40) The system of the present disclosure initially starts (i.e. Step 1 of
(41) Step 2 of
(42) Step 3 of
(43) Step 4 of
(44) Step 5 of
(45) Thus,
(46) Aspects of Components of the Present Disclosure
(47) Referring to
(48) Still referring to
(49) Still referring to
(50) Still referring to
An Embodiment of the Steps of the Present Disclosure
(51)
(52) Referring to steps 145 and 150 of
(53) Step 155 of
(54) Step 160 of
(55) Step 165 of
(56) Step 170 of
(57)
Further, the predictive pattern is different from a pattern in the normal region, if a Euclidean distance between the two patterns exceeds a pre-specified threshold. Further still, the predictive pattern is considered similar to a pattern in the normal region if a Euclidean distance between the two patterns is lower than a pre-specified threshold.
(58) Step 175 of
(59) Referring to step 180 and step 185 of
(60) Step 190 of
(61) Maximal Margin Algorithm
(62)
(63) If we use a brute force maximal margin algorithm, we will need to search for all subsequences of length l in D.sub.abnormal, and for every subsequence S.sub.i,j, we need to find its nearest neighbor in both D.sub.normal and D.sub.
(64) The nearest neighbor search of each subsequence has a complexity O(mn). If we can lower that complexity for a large portion of subsequences in the dataset, the algorithm can be largely accelerated. So instead of using a brute force algorithm, here we introduce a novel upper bound for the maximal margin of subsequences. If the upper bound exceeds the best-so-far margin, we can simply prune the subsequence out and avoid looking for nearest neighbors.
(65) Since the margin of S.sub.i,j is dist.sub.S.sub.
(66) Suppose our current candidate is S=S.sub.i,j, which is a subsequence in D.sub.i,abnormal. We have a random subsequence RS in D.sub.i,abnormal. The nearest neighbor of R in D.sub.normal is RNN.sub.n, and that of S is SNN.sub.n, as is shown in
(67)
(68) Now suppose the nearest neighbor of R in D.sub.
(69) Table 1 shows the smart maximal margin algorithm accelerated by the upper bound. The algorithm takes the dataset D, a fixed T, length of the candidate subsequence l and the number of random subsequences R as inputs, then outputs the predictive pattern PP with maximal margin, its nearest neighbor in abnormal region PP.sub.nn, and the maximal margin value MM.
(70) TABLE-US-00001 TABLE 1 Maximal Margin Algorithm Algorithm 1 Maximal Margin (D, T , l, R) Input: D: dataset; T: length of abnormal region, l: subsequence length, R: number of random selections Output: PP: Predictive Pattern with Maximal Margin; PP.sub.nn: nearest neighbor of predictive pattern in abnormal region; MM: maximal margin value. 1: Dnorm , Dabnorm
2: for i 1 to |D| do //every time series in D m = |D(i)| //length of the ith time series 3: Dnorm(i) D(i,1:m T + l 1), Dabnorm(i) D(i,m T + 1:m) 4: end for MM 0, PP
, PP.sub.nn
5: for i 1 to |Dabnorm| do 6: RD Randomselect(Dabnorm(i),R,l) //random select R subsequences of length l 7: RNN.sub.n
, Rdist.sub.ab
8: for j 1 to R do (RNN.sub.n(j),dist.sub.n) FindNN(RD(j),Dnorm,inf) 9: Dabnorm_other all time series in Dabnorm except the one that includes RD(j) 10: (RNN.sub.ab,dist.sub.ab) FindNN(RD(j),Dabnorm_other,inf) 11: Rdist.sub.ab(j) dist.sub.ab 12: ifdist.sub.n dist.sub.ab > MM then MM dist.sub.n dist.sub.ab, PP RD(j),PP.sub.nn RNN.sub.ab //update 13: maximal margin 14: end if end for 15: for j 1 to |Dabnorm(i)| l + 1 do S Dabnorm(i,j:j + l 1) //search for every candidate subsequence 16: if S is not in RD then 17: Maxb CalculateMaxbound(S,RNN.sub.n) Minb CalculateMinbound(S,RD,Rdist.sub.ab) 18: ifMaxb Minb MM then//upper bound of the margin 19: continue 20: end if (NN.sub.n,dist.sub.n) FindNN(S,Dnorm,Maxb) 21: ifdist.sub.n Minb MM then // a tighter upper bound of the margin continue 22: end if (NN.sub.ab,dist.sub.ab) FindNN(S,Dnorm,Minb) 23: ifdist.sub.n dist.sub.ab > MM then 24: MM dist.sub.n dist.sub.ab, PP S, P.sub.nn NN.sub.ab//update maximal margin 25: end if 26: end if 27: end for end for 28: return PP, MM 29:
(71) Lines 1-5 divides the whole dataset into a normal dataset and an abnormal dataset according to T. Lines 8-17 randomly choose R subsequence of length l in the ith abnormal time series and find their nearest neighbors in both in D.sub.normal and D.sub.
(72) TABLE-US-00002 TABLE 2 Maxbound Algorithm Algorithm 2 Calculate Maxbound (S,RNN.sub.n) Input: S: subsequence candidate; RNN.sub.n: Nearest Neighbors of random subsequences in normal region. Output: Maxb: upper bound of distance between S and its nearest neighbor in normal region 1: S Znormalize(S), Maxb inf 2: for i 1 to |RNN.sub.n| do //every random subsequence RS Znormalize(RNN.sub.n(i)), distance ED(S,RS) 3: ifdist < Maxb then Maxb Distance 4: end if 5: end for 6: return Maxb
(73) TABLE-US-00003 TABLE 3 Minbound Algorithm Algorithm 2 Calculate Minbound (S,RD,Rdist.sub.ab) Input: S: subsequence candidate; RD: random subsequences, Rdist.sub.ab: Distances between random subsequences and their nearest neighbor in the abnormal region Output: Minb: lower bound of distance between S and its nearest neighbor in abnormal region 1: S Znormalize(S), Minb inf 2: for i 1 to |RD| do //every random subsequence RS Znormalize(RD(i)), dist ED(S,RS) 3: ifRdist.sub.ab(i) dist > Minb then Minb dist 4: end if 5: end for 6: return Minb
(74) The upper bound of the margin evaluated by Maxbound and Minbound largely accelerates the maximal margin algorithm. Experiments so far show a speed up of more than one magnitude.
(75) Selecting the Best Predictive Pattern of all Lengths
(76) The maximal margin algorithm shows us how to find the best predictive pattern of a fixed length l. Since l is not given in a time series, we need to search for all possible lengths, and define a measure of maximal margin which is invariant of length. Here we simply select the subsequence with length
(77)
Finding all Possible Predictive Patterns by MDL
(78)
(79) Up to now, we have a method to find the most predictive pattern in a data set of Run-to-Failure time series. But sometimes there can be more than one predictive pattern in the time series. For example,
(80) Also note that the maximal margin algorithm only selects a pair of similar subsequences in the abnormal region, so they are related to only two time series. If there are more than two time series, we will need to find a match of the predictive pattern in the rest time series as well.
(81) Still referring to
(82)
(83) Essentially, we can use Description Length (DL) to represent the bit length that is needed to express the subsequence. Entropy is a good measure of DL. We have DL (A)=Entropy(A), DL(H)=Entropy(H), A=AH and (A)=Entropy(A). H is called hypothesis. If we regard a subsequence as hypothesis, then instead of using DL.sub.old=DL(A)+DL(H) bits to represent the pair of A and H, we can use DL.sub.new=Entropy(H)+Entropy(AH). The number of bits saved here is bittosave=DL.sub.newDL.sub.old=DL(A)DL(AH). The two subsequences are very similar to each other, so DL(A)=Entropy(A) is very small in this case. As a result, bittosave is a large positive number. So essentially, if subsequences are similar to each other, we should have large positive bittosave. Otherwise bittosave is negative.
(84) Still referring to
(85) (1) We find predictive patterns in the abnormal region of multiple time series instead of only one time series;
(86) (2) We find candidate predictive patterns based on maximal margin algorithm instead of the motif discovery algorithm;
(87) (3) The routine in the main loop is different:
(88) (a) If there are no more unmarked subsequences in the abnormal region, end. Otherwise find a pair or predictive patterns by the maximal margin algorithm;
(89) (b) Then we investigate whether the pair or patterns are a match by evaluating bittosave. If bittosave<0, end. Otherwise we use the CreateCluster process to create a cluster for the predictive pattern found; and
(90) (c) Then we iteratively use the AddToCluster process to add subsequences to the predictive pattern cluster until bittosave0. Mark out all subsequences added. Then go to (a) again.
(91) (4) The MergeCluster process is not used.
(92) Up to this point, we are able to find all predictive patterns when T is known.
(93) When T is Unknown
(94)
(95)
(96)
(97) Referring to
(98)
(99)
(100)
(101) Referring to
(102) The solution is MDL. With a similar routine as described in section Finding all possible predictive patterns by MDL, we can find all the match of the candidate pattern in the dataset by the AddToCluster operation until bittosave<0. As
(103) We iterate the process of and until for predictive patterns of all lengths found by the maximal margin algorithm, (i.e. step 180 above), we have P<T, or the predictive pattern only appears at most once in a time series. T is correctly set after the iteration terminates.
(104) After T is correctly set, we can simply run the algorithm in section to find out all possible predictive patterns (i.e. step 185 above).
(105)
(106)
(107) Referring to
(108) Step 995 includes determining a predictive pattern for the second machine, and selecting it, if found, according to processing the test data stream or set of test data examples via steps 945 to 990.
(109) Step 999, determines based on the determined predictive pattern of the second machine 902, if the determined predictive pattern of the second machine corresponds to a stored predictive pattern in memory 112, to predict a failure of the second machine 902.
(110) Specifically, whether one or more test data examples extracted from the test data stream predict a failure of the second machine 902. The one or more test data examples are extracted from one or more portions of the test data stream. For example, the processor 114 may determine whether one or more test data examples extracted from the test data stream predict a failure of the machine 902. The test data in the one or more portions of the test data stream were sampled at the same sampling rate as the stored training data in the training data examples used to generate the determined predictive pattern(s). Further, the method 900 can include predicting a failure of the second machine 902, if a ratio of a number of test data examples of the one or more test data examples that predict the failure of the second machine 902 to a total number of the one or more test data examples processed based on the determined predictive patterns (from the test data examples via steps 945 to 990), exceeds a threshold. For example, the threshold may be a value determined based on empirical analysis.
(111) In some example embodiments, the method 900 can exclude a portion of the training data stream that may include invalid data in extracting the training data examples from the one or more portions of the training data stream. The method 900 may also include extracting the training data examples such that two consecutive/adjacent data examples have overlapping data portions and non-overlapping data portions, wherein the overlapping data portions are less than a threshold percentage of the length of the training data segments, which could be predetermined such as 10%, 40% or 80%.
(112)
(113)
(114) Referring to
(115) Step 1010 of
(116) Step 1010 includes identifying, based on the stored set of training data examples including the normal state region and abnormal state region for each training data example, each test data example of the two sets of test data examples of the third machine, that correspond to a stored normal state region of at least one stored training data example or at least one stored abnormal state region of at least one stored training data example, to identify a predictive pattern for each test data stream of the third machine.
(117) Step 1010 includes predicting a failure of the third machine by taking either one the two predictive patterns from the two test data streams of the two sensors, when compared to the stored predictive patterns in memory.
(118) Referring to
(119)
(120) Step 1155 of
(121) Step 1160 of
(122) Step 1165
(123) Step 1170 of
(124) Step 1175 of
(125) Step 1180 includes the iterative process including the iteratively partitions of each training data example, and step 185 includes shortening the current time series length, by an increment of one-time step, per iteration, so the current time series length is shorter than a previous current time series length for the abnormal state region selected for a previous iteration within the training data example.
(126) Step 1185 of
(127) Step 1190 of system
(128) Step 1195 of system
(129) It is contemplated the ranking of the sensor may be by several methods. For example, in order to identify the most relevant sensors for failure prediction from among the sensors 904, a single sensor classification accuracy value may be computed for a number of features of each test data stream from the respective sensors 904. In some example embodiments, the computed features can be a mean value, a missing data points, a mean slope, a ratio of measurements, and an exponential decay. Mean value refers to the mean of each test data value, excluding any missing data points. For example, the mean value can be used as a feature because failures may be correlated with a decrease or increase in a particular parameter (e.g., vibration, or some other measurement) from the parameter's normal value.
(130)
(131) Step 1105 of
(132)
(133) The computer 1211 can include a power source 1254, depending upon the application the power source 1254 may be optionally located outside of the computer 1211. Linked through bus 1256 can be a user input interface 1257 adapted to connect to a display device 1248, wherein the display device 1248 can include a computer monitor, camera, television, projector, or mobile device, among others. A printer interface 1259 can also be connected through bus 1256 and adapted to connect to a printing device 1232, wherein the printing device 1232 can include a liquid inkjet printer, solid ink printer, large-scale commercial printer, thermal printer, UV printer, or dye-sublimation printer, among others. A network interface controller (NIC) 1234 is adapted to connect through the bus 1256 to a network 1236, wherein time series data or other data, among other things, can be rendered on a third party display device, third party imaging device, and/or third party printing device outside of the computer 1211.
(134) Still referring to
(135)
(136) The method includes a characterization module for identifying characteristics of each predictive pattern and wherein the computer readable memory includes stored executable instructions for storing each predictive pattern and each predictive pattern's identified characteristics based upon data from the set of training data examples. Further, the method includes a filter for validating each predictive pattern, corresponding to a predetermined predictive pattern, from the set of training data examples, based on the identified characteristics and rating the predictive pattern. Further still, a filter for excluding each predictive pattern based on the identified characteristics and rating the predictive pattern, wherein the computer readable memory stores each rated predictive pattern that is outside a feasibility threshold limit.
(137) Still referring to
(138) The characterization module can determine different characteristics for every predictive pattern found. The characterization module reads the predictive patterns, and their associated characteristics, computes pattern and characteristics of the pattern and write results back to the processor. An example of a pattern characteristic is a symmetry number. Symmetry is a measure of the similarity of the two halves of a pattern. For example, with a head and shoulder pattern, the symmetry number can identify how balanced the head is and how similar the left and right shoulders are to each other.
(139) Patterns and pattern characteristic information can be passed to filter that screens output based on defined criteria. These can be supplied by pre-stored data in memory. Filters restrict the patterns passed out of the system to ensure that patterns delivered meet certain minimum thresholds. For example, a filter may specify that only patterns of a high symmetry number are to be passed.
(140) Still referring to
(141) Still referring to
(142) Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
(143) Also, the embodiments of the present disclosure may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts concurrently, even though shown as sequential acts in illustrative embodiments. Further, use of ordinal terms such as first, second, in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.
(144) Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the aspect of the append claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.