Diagnosing combinations of failures in a system

Abstract

A system/method of diagnosing combinations of failures in a system includes receiving symptom data (116) including information relating to observed or detected symptoms in a system. The system/method generates (D4a, D4b, D5) failure data (118) including information relating to at least one most probable failure in the system based on the symptom data, and processes (D9) the failure data and the symptom data using an L-best inference technique (e.g. a Ranked Algorithm (RA)) in order to generate failure set data (120), the failure set data including information relating to at least one most probable combination of the failures that explains the symptoms.

Claims

1. A method of diagnosing combinations of failures in a system, the method including: receiving symptom data including information relating to observed or detected symptoms in a system; generating failure data including information relating to at least two most probable failures in the system based on the symptom data; processing the failure data and the symptom data using an L-rank inference class algorithm in order to generate failure set data, the failure set data including information relating to combinations of the at least two most probable failures that explain the symptoms; and selecting an algorithm for use in generating the failure data, wherein a comparison based on a number of present symptoms in the symptom data and a user-configurable value is used to select the algorithm for use in generating the failure data, and if the number of present symptoms is greater than a user-configurable value then an approximate marginal inference algorithm is selected; otherwise, an exact marginal inference algorithm is selected.

2. The method according to claim 1, wherein processing the failure and symptom data using the L-rank inference class algorithm includes using a Ranked Algorithm (RA) algorithm to produce a probability value associated with each said combination of the at least two most probable failures in the failure set data.

3. The method according to claim 2, further including ordering the combinations of the at least two most probable failures in the failure set data in accordance with the associated probability values.

4. The method according to claim 3, further including displaying information relating to at least one of the combinations of the at least two most probable failures and the associated probability values.

5. The method according to claim 1, wherein generating the failure data includes processing the symptom data using the approximate marginal inference algorithm, and wherein the marginal inference algorithm generates a ranked list of probable failures.

6. The method according to claim 1, wherein the at least two most probable failures not including a user-configured number of the at least two most probable failures are provided as absent failures data to be processed using the L-rank inference class algorithm.

7. The method according to claim 6, wherein each said combination of the at least two most probable failures in a said failure set has an associated confidence/probability value and the confidence/probability value is used to determine whether or not the processing using the L-rank inference class algorithm is to be used to generate at least one further failure set.

8. The method according to claim 1, further including halting the processing of the failure data and the symptom data using the L-rank inference class algorithm when a user-configurable number of said combinations of the at least two most probable failures has been generated.

9. The method according to claim 1, wherein the symptom data processed using the L-rank inference class algorithm includes present symptoms that have at most a user-configurable number of parent failures.

10. The method according to claim 1, wherein a user-configurable number of the at least two most probable failures in the system, generated by generating the failure data, is included in the failure data to be processed using the L-rank inference class algorithm.

11. An apparatus adapted to diagnose combinations of failures in a system, the apparatus including: a processor configured to receive symptom data including information relating to observed or detected symptoms in a system; a processor configured to generate failure data including information relating to at least two most probable failures in the system based on the symptom data; a processor configured to process the failure data and the symptom data using an L-rank inference class algorithm in order to generate failure set data, the failure set data including information relating to at least one most probable combination of the at least two most probable failures that explain the symptoms; and a processor configured to select an algorithm for use in generating the failure data, wherein a comparison based on a number of present symptoms in the symptom data and a user-configurable value is used to select the algorithm for use in generating the failure data, and if the number of present symptoms is greater than a user-configurable value then an approximate marginal inference algorithm is selected; otherwise, an exact marginal inference algorithm is selected, and wherein the processors may be the same processor or may include multiple processors.

12. A non-transitory computer program product encoded with instructions that when executed by one or more processors cause a process of diagnosing combinations of failures in a system to be carried out, the process including: receiving symptom data including information relating to observed or detected symptoms in a system; generating failure data including information relating to at least two most probable failures in the system based on the symptom data; processing the failure data and the symptom data using an L-rank inference class algorithm in order to generate failure set data, the failure set data including information relating to at least one most probable combination of the at least two most probable failures that explain the symptoms; and selecting an algorithm for use in generating the failure data, wherein a comparison based on a number of present symptoms in the symptom data and a user-configurable value is used to select the algorithm for use in generating the failure data, and if the number of present symptoms is greater than a user-configurable value then an approximate marginal inference algorithm is selected; otherwise, an exact marginal inference algorithm is selected.

13. The non-transitory computer program product according to claim 12, wherein processing the failure and symptom data using the L-rank inference class algorithm includes using a Ranked Algorithm (RA) algorithm to produce a probability value associated with each said combination of the at least two most probable failures in the failure set data.

14. The non-transitory computer program product according to claim 13, further including at least one of: ordering the combinations of the at least two most probable failures in the failure set data in accordance with the associated probability values; and displaying information relating to at least one of the combinations of the at least two most probable failures and the associated probability values.

15. The non-transitory computer program product according to claim 12, wherein generating the failure data includes processing the symptom data using the approximate marginal inference algorithm, and wherein the marginal inference algorithm generates a ranked list of probable failures.

16. The non-transitory computer program product according to claim 12, wherein the failures not including the user-configured number of the most probable failures are provided as absent failures data to be processed using the L-rank inference class algorithm.

17. The non-transitory computer program product according to claim 12, the process further including halting the processing of the failure data and the symptom data using the L-rank inference class algorithm when a user-configurable number of said combinations of the failures has been generated.

18. The non-transitory computer program product according to claim 17, wherein each said combination of the at least two most probable failures in a said failure set has an associated confidence/probability value and the confidence/probability value is used to determine whether or not the processing using the L-rank inference class algorithm is to be used to generate at least one further failure set.

Description

(1) The invention may be performed in various ways, and, by way of example only, embodiments thereof will now be described, reference being made to the accompanying drawings in which:

(2) FIG. 1 is a high level system architecture diagram of an embodiment of the diagnosis method;

(3) FIG. 2 is a flowchart illustrating example steps performed by the diagnosis method, and

(4) FIGS. 3 and 4 are example screen displays produced by the diagnosis method.

(5) The existing diagnosis method includes a marginal inference (Bayesian inference) module that takes symptoms as its input and returns probable failures as a ranked list at its output. In embodiments of the present system, the system architecture has been extended to include another inference technique, i.e. a heuristic L-best inference (e.g. RA) module, in order to also provide ranked combinations of the probable failures, i.e. failure sets. With this added capability, the diagnosis method can provide a ranked list of failure sets in addition to the ranked list of failures.

(6) FIG. 1 shows the high level system architecture diagram for the extended diagnosis method 102, which will typically be implemented on at least one computer system having a processor, memory and a communications interface. The diagnosis method receives data representing a diagnosis model from a data store 104. The model can be in any suitable format, typically data based on a probabilistic graphical model of the system, its components (and possibly its subsystems) and the relationships between them. The method can further receive input representing sensor symptom readings via an interface 106, and input representing manually captured symptoms, based on observed symptoms reported by users of the system being diagnosed, via another interface 108. The diagnosis system can also communicate with a maintenance support system 110. It will be appreciated that the architecture illustrated in FIG. 1 is exemplary only: in other embodiments, some of the components may be integrated into single elements or distributed over two or more elements, some may be omitted, and further components may be included.

(7) The method 102 includes a marginal inference (Bayesian inference) module 112 and a heuristic L-best inference (e.g. RA) module 114. The marginal inference module can receive as its input the symptoms data 116 received via the interfaces 106, 108, and can output failures data 118. The heuristic L-best inference module can receive as its inputs the symptoms data 116 and the failures data 118, and can generate ranked probable failure sets data 120 at its output.
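The data flow between the two modules of FIG. 1 can be sketched as follows. This is a minimal sketch, assuming each module is available as a plain Python callable; the names `SymptomData`, `diagnose`, `marginal_module` and `lbest_module` are hypothetical stand-ins for the symptoms data 116, the method 102 and modules 112 and 114.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class SymptomData:
    """Symptoms data 116 (hypothetical container): present and absent symptom codes."""
    present: List[str]
    absent: List[str]

# Failures data 118: (failure code, marginal probability), most probable first.
RankedFailures = List[Tuple[str, float]]
# Failure sets data 120: (failure codes in the set, confidence), most probable first.
RankedFailureSets = List[Tuple[Tuple[str, ...], float]]

def diagnose(
    symptoms: SymptomData,
    marginal_module: Callable[[SymptomData], RankedFailures],
    lbest_module: Callable[[SymptomData, RankedFailures], RankedFailureSets],
) -> Tuple[RankedFailures, RankedFailureSets]:
    """Feed module 112's output into module 114 and return both ranked lists."""
    failures = marginal_module(symptoms)             # marginal (Bayesian) inference
    failure_sets = lbest_module(symptoms, failures)  # heuristic L-best inference
    return failures, failure_sets
```

Keeping the two stages as separate callables means the marginal stage (exact or approximate) can be swapped without touching the L-best stage, and both ranked lists remain available for display.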

(8) Embodiments of the extended diagnosis method 102 have been implemented in Python, C++ and Matlab, although the skilled person will understand that any suitable programming language and data structures can be used. In these embodiments, the marginal inference module 112 can be implemented in C++, the heuristic L-best inference module 114 in Python, and the user interface in Matlab.

(9) FIG. 2 illustrates schematically an example of the method performed by the diagnosis method 102. The skilled person will appreciate that the steps shown are exemplary only and that in alternative embodiments, some of them may be omitted and/or re-ordered.

(10) The method can receive data representing present symptoms and absent symptoms (e.g. via interface 106 and/or 108). A symptom consists of at least one symptom observation and can include conditions. A portion of a sample list of symptoms is presented below, along with their corresponding symptom codes:

(11) . . .

(12) S(32) Cond: Valve 2 commanded closed Obs: Valve Sensor 2.1 set true

(13) S(33) Cond: Valve 2 commanded open Obs: Valve Sensor 2.1 set false

(14) . . .

(15) S(47) Obs: Valve 5 Fuel escaping true

(16) S(48) Obs: Valve5 stuck in intermediate position

(17) . . .
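One possible in-memory representation of such symptoms is sketched below: each symptom carries at least one observation and, optionally, conditions. The `Symptom` class, its field names and the `SAMPLE_SYMPTOMS` map are assumptions for illustration; the entries are the sample symptoms listed above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Symptom:
    """One symptom: at least one observation, optionally guarded by conditions."""
    code: int
    observations: Tuple[str, ...]
    conditions: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Enforce "a symptom consists of at least one symptom observation".
        if not self.observations:
            raise ValueError(f"S({self.code}) needs at least one observation")

# The sample entries listed above, keyed by symptom code.
SAMPLE_SYMPTOMS: Dict[int, Symptom] = {
    s.code: s
    for s in (
        Symptom(32, ("Valve Sensor 2.1 set true",), ("Valve 2 commanded closed",)),
        Symptom(33, ("Valve Sensor 2.1 set false",), ("Valve 2 commanded open",)),
        Symptom(47, ("Valve 5 Fuel escaping true",)),
        Symptom(48, ("Valve5 stuck in intermediate position",)),
    )
}
```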

(18) At step 206 a check is performed as to whether the number of present symptoms represented in the present symptom data and the absent symptom data is greater than a threshold value. The diagnosis method 102 can include parameters/values, such as this threshold, that are user-configurable, e.g. via a user interface. In some cases they can be set to default values. In the example embodiment, the threshold has been set to 10.
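The check at step 206, together with the choice it drives (approximate inference at step 208, exact inference at step 210, as also set out in claim 1), reduces to a threshold comparison. A minimal sketch, assuming the threshold is passed as `max_exact_symptoms` (a hypothetical name; the example embodiment uses 10):

```python
from enum import Enum

class MarginalAlgorithm(Enum):
    EXACT = "exact"              # e.g. Quickscore (step 210)
    APPROXIMATE = "approximate"  # e.g. variational inference (step 208)

def select_marginal_algorithm(num_present: int, max_exact_symptoms: int = 10) -> MarginalAlgorithm:
    """Step 206: use approximate inference only when present symptoms exceed the threshold."""
    if num_present > max_exact_symptoms:
        return MarginalAlgorithm.APPROXIMATE
    return MarginalAlgorithm.EXACT
```

A plausible reason for the threshold is cost: exact algorithms such as Quickscore grow exponentially with the number of present symptoms, so above some count an approximate method becomes the practical choice.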

(19) Data D1a and data D1b are sent simultaneously. At step 214 the present symptom data is analysed to obtain the present symptoms that have at most a user-configurable number (N, e.g. 10) of parent failures, i.e. causing failures, any one of which, if present, makes the symptom likely to be present too. N can be selected to achieve an acceptable balance between inference time and approximation error.
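Step 214 can be read as a filter over a map from each symptom to its parent failures. The sketch below assumes the diagnosis model exposes such a map (`parents`, hypothetical) and that every present symptom appears in it; `informative_present_symptoms` is likewise a hypothetical name, and N defaults to the example value of 10.

```python
from typing import Dict, Iterable, List, Set

def informative_present_symptoms(
    present: Iterable[str],
    parents: Dict[str, Set[str]],
    max_parents: int = 10,
) -> List[str]:
    """Step 214: keep present symptoms with at most `max_parents` parent failures."""
    # parents[s] raises KeyError for a symptom the model does not know about.
    return [s for s in present if len(parents[s]) <= max_parents]
```

For instance, with made-up parent sets where S32 has parents {F296, F297}, S47 has {F74}, and S103 has 39 parents, the call with `["S32", "S47", "S103"]` keeps only S32 and S47, mirroring how the less informative symptom S103 is left out in the demonstration scenario.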

(20) Data D2a and data D2b are sent simultaneously. At step 216, the absent symptom data is received, ready for subsequent processing as will be described below.

(21) If the check performed at step 206 indicates that the number of present symptoms is greater than the threshold, then at step 208 a technique based on an approximate inference algorithm (e.g. variational inference), using module 112, is performed on the symptoms data. Alternatively, if the check performed at step 206 indicates that the number of present symptoms is not greater than the threshold, then at step 210 a technique based on an exact inference algorithm (e.g. the known Quickscore algorithm) is performed on the symptoms data. Execution of the selected algorithm can produce data representing a list of probable failures (data D4a if step 208 was performed, or data D4b if step 210 was performed), which may be ranked at step 212 according to the marginal probability of each failure in the list.
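For reference, the Quickscore algorithm computes exact posterior marginals in a two-level noisy-OR network of failures and symptoms (the QMR-DT structure). The sketch below assumes the diagnosis model has that form, with failure priors `prior[i]`, link probabilities `q[i][s]` (the probability that failure i alone turns symptom s on) and per-symptom leak probabilities; these names and the noisy-OR assumption are not taken from the document. `rank_failures` then performs the ordering of step 212.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def quickscore_marginals(
    prior: Sequence[float],            # prior[i]: P(failure i present)
    q: Sequence[Dict[str, float]],     # q[i][s]: P(failure i alone turns symptom s on)
    leak: Dict[str, float],            # leak[s]: P(symptom s on with no failure present)
    present: Sequence[str],
    absent: Sequence[str],
) -> List[float]:
    """Exact P(failure i present | evidence) in a noisy-OR model (Quickscore)."""
    n = len(prior)

    def joint(clamped: int = -1) -> float:
        # P(evidence), or P(evidence, failure `clamped` present) when clamped >= 0.
        total = 0.0
        # Inclusion-exclusion over subsets S of the present symptoms:
        # prod(1 - P(off_s)) = sum over S of (-1)^|S| * prod_{s in S} P(off_s).
        for r in range(len(present) + 1):
            for subset in combinations(present, r):
                off = list(subset) + list(absent)  # symptoms that must stay "off"
                term = float((-1) ** r)
                for s in off:
                    term *= 1.0 - leak.get(s, 0.0)
                for i in range(n):
                    stay_off = 1.0
                    for s in off:
                        stay_off *= 1.0 - q[i].get(s, 0.0)
                    on_term = prior[i] * stay_off
                    # Independent priors let the sum over failure i's state factorise.
                    term *= on_term if i == clamped else on_term + (1.0 - prior[i])
                total += term
        return total

    evidence = joint()
    return [joint(i) / evidence for i in range(n)]

def rank_failures(names: Sequence[str], marginals: Sequence[float]) -> List[Tuple[str, float]]:
    """Step 212: order failures by marginal probability, highest first."""
    return sorted(zip(names, marginals), key=lambda nm: nm[1], reverse=True)
```

The outer loop visits every subset of the present symptoms, so the cost grows as 2 to the power of their number; this is consistent with restricting exact inference to cases at or below the threshold.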

(22) At step 214, the present symptoms that have at most N parent failures are passed to an RA-based technique (220) for processing. At step 216, the received absent symptom data is passed to the RA-based technique (220) for processing. At step 218 the method obtains the top M probable failures in the list, and the top M probable failures data is passed to the RA-based technique for processing (220). M is a user-configurable value and in the specific example has a value of 10. At step 220 the RA-based technique is executed, e.g. using module 114.
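Step 218, read together with claim 6, keeps the top M failures as candidates for the RA and passes the remaining failures on as absent failures. A minimal sketch, assuming failures arrive as (code, marginal probability) pairs; `split_top_m` is a hypothetical name. The document does not say how ties are broken; here they keep their input order because Python's sort is stable, even with `reverse=True`.

```python
from typing import List, Sequence, Tuple

def split_top_m(
    ranked: Sequence[Tuple[str, float]], m: int = 10
) -> Tuple[List[str], List[str]]:
    """Step 218: return (top-M candidate failures, remaining failures treated as absent)."""
    order = sorted(ranked, key=lambda fp: fp[1], reverse=True)  # stable: ties keep input order
    candidates = [name for name, _ in order[:m]]
    absent_failures = [name for name, _ in order[m:]]
    return candidates, absent_failures
```

With the first four marginals of the demonstration scenario and `m=3`, the candidates are F296, F297 and F74, and F193 is passed on as an absent failure.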

(23) At step 222 data based on the failure set generated at step 220 is obtained, and at step 224 a check is performed as to whether the number of failure sets generated so far is greater than a further user-configurable value, which in the specific example is set to 5.

(24) If the check of step 224 indicates that the number of failure sets generated is not greater than this value, then at step 226 a check is performed as to whether the smallest confidence (probability) value, corresponding to the last probable failure set, is sufficiently low, i.e. whether the smallest confidence value multiplied by a user-configurable factor (e.g. 10) is smaller than the previous confidence value. The confidence of a failure set can be defined as the normalised ratio of the probability value for that failure set to the probability value for the most likely failure set, as given in (Feili Yu, Fang Tu, Haiying Tu, and Krishna R. Pattipati, A Lagrangian Relaxation Algorithm for Finding the MAP Configuration in QMR-DT, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 37, No. 5, September 2007, pp. 746-757).
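One reading of this confidence definition: take each failure set's probability divided by that of the most likely set, then normalise the ratios so they sum to one. The sketch below follows that reading, which is an assumption, and additionally assumes the RA reports log-probabilities, because joint probabilities of failure sets are often tiny; `set_confidences` is a hypothetical name.

```python
import math
from typing import List, Sequence

def set_confidences(log_probs: Sequence[float]) -> List[float]:
    """Normalised ratio of each set's probability to that of the most likely set."""
    best = max(log_probs)
    # exp(lp - best) is the ratio to the most likely set, in (0, 1]; it stays
    # representable even when the raw probabilities would underflow.
    ratios = [math.exp(lp - best) for lp in log_probs]
    total = sum(ratios)
    return [r / total for r in ratios]
```

Normalising the ratios gives the same values as normalising the raw probabilities; working relative to the best set simply keeps the arithmetic well scaled.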

(25) If the check of step 226 indicates that the smallest confidence (probability) value is not sufficiently low then control returns to step 220, where the RA technique is performed again to generate a further failure set.

(26) If the check of step 226 indicates that the smallest confidence (probability) value is sufficiently low then at step 228 data relating to the generated failure sets is produced and stored for further use.

(27) If the check of step 224 indicates that the number of failure sets generated is greater than the value checked at that step, then at step 228 data relating to the generated failure sets is produced and stored for further use.
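Steps 220 to 228 form a generate-and-check loop. A minimal sketch, assuming the RA is available as an iterator that yields failure sets with their confidence values in decreasing order; `collect_failure_sets`, `max_sets` (5 in the example) and `drop_factor` (10 in the example) are hypothetical names. The step 224 test is read literally as "greater than", so up to `max_sets + 1` sets can be collected, and the step 226 test is assumed to apply only once there is a previous set to compare against.

```python
from typing import Iterable, List, Tuple

# (failure codes in the set, confidence value)
FailureSet = Tuple[Tuple[str, ...], float]

def collect_failure_sets(
    ra_results: Iterable[FailureSet],
    max_sets: int = 5,          # value checked at step 224
    drop_factor: float = 10.0,  # user-configurable factor used at step 226
) -> List[FailureSet]:
    """Pull failure sets from the RA until one of the two stopping checks fires."""
    collected: List[FailureSet] = []
    for failure_set in ra_results:        # step 220: RA produces the next set
        collected.append(failure_set)     # step 222
        if len(collected) > max_sets:     # step 224, read literally as "greater than"
            break
        if len(collected) >= 2:
            smallest = collected[-1][1]   # confidence of the last set
            previous = collected[-2][1]
            if smallest * drop_factor < previous:  # step 226: confidence dropped sharply
                break
    return collected                      # step 228: stored for further use
```

Applied to the confidence values reported for the demonstration scenario (0.4972, 0.4972, 0.0055), the loop stops after the third set, because 0.0055 multiplied by 10 is 0.055, which is smaller than 0.4972.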

(28) Further use of the failure sets data will typically involve being displayed on a computer terminal (either immediately or upon user request). Examples of how the data can be displayed will be discussed below, but it will be understood that the way in which the data can be displayed, formatted and used can vary.

(29) In a demonstration scenario based on a realistic, large-scale hierarchical model of a fuel rig, five symptoms were applied to the diagnosis method 102 as present (+). The other symptoms in the example were unobserved, and thus unknown. The symptoms observed as present were S32, S33, S47, S48 and S103.

(30) +S(32) Cond: Valve 2 commanded closed Obs: Valve Sensor 2.1 set true

(31) +S(33) Cond: Valve 2 commanded open Obs: Valve Sensor 2.1 set false

(32) +S(47) Obs: Valve 5 Fuel escaping true

(33) +S(48) Obs: Valve5 stuck in intermediate position

(34) +S(103) Obs: Tray 2 sometimes makes an unusual noise true

(35) S32 and S33 are from a subset of the symptoms for the failures (F296 or F297). S47 and S48 are from a subset of the symptoms for the failure (F74). S103 is a less informative symptom (i.e. a symptom with many parent failures).

(36) F(74) Valve 5 - Leak from

(37) F(296) Power Supply 1 - 24 volt power supply failure

(38) F(297) Power Supply 1 - Phase A has failed

(39) FIG. 3 presents an example of the diagnosis user interface. The symptoms are displayed in the first panel 302 at the top, ordered according to the criterion chosen in the Symptom Ordering radio box 304; in the illustrated case, the criterion selected is Observed. The ranked list of probable failures obtained by the Bayesian inference algorithm is displayed in the second panel 306, and the ranked list of probable multiple-failure sets obtained by the heuristic RA, which takes the symptoms and the outputs of the Bayesian inference algorithm as its inputs, is displayed in the third panel 308.

(40) The conventional (Bayesian) diagnosis interface and the new heuristic RA interface can be compared. The second panel 306 is the conventional (Bayesian) diagnosis interface. It indicates that, given the evidence, failures 296 and 297 may be present, with failure 74 less likely.

(41) In this example, the conventional (Bayesian) diagnosis calculates the probable failures and sorts them by their marginal probabilities. The top M (e.g. 10) probable failures and their marginal probabilities (P) can be as follows:

(42) F(296) Power Supply 1 - 24 volt power supply failure: P=0.4945

(43) F(297) Power Supply 1 - Phase A has failed: P=0.4945

(44) F(74) Valve 5 - Leak from: P=0.2480

(45) F(193) Tray 2 - Pipe between Pump and Valve blocked: P=0.0527

(46) F(201) Tray 2 - Pipe between Valve and Conjunction blocked: P=0.0527

(47) F(288) Conjunction 2 - blocked: P=0.0527

(48) F(3) Valve 3 - failed closed due to stripped gear train: P=0.0527

(49) F(9) Valve 3 - failed half open due to stripped gear train: P=0.0527

(50) F(14) Valve 3 - failed open due to stripped gear train: P=0.0527

(51) F(212) Valve 9 - Failed and is now always open: P=0.0527

(52) In this example, the new heuristic RA can take the above top M (e.g. 10) failures as probable failures at its input. It can also take the symptoms 32, 33, 47 and 48 as present symptoms at its input. Since symptom 103 is a less informative symptom, it is not considered in the example.

(53) The third panel 308 for the new heuristic RA interface provides additional information, indicating that, given the evidence, failure 74 together with either 296 or 297 is the most likely. The heuristic RA interface indicates the possible failure combinations, whereas the conventional (Bayesian) diagnosis interface does not. Thus, in this case, the heuristic RA results can be more informative than the conventional (Bayesian) inference results.

(54) In this example, the new heuristic RA calculates the probable failure sets and sorts them by their confidence values. The top three probable failure sets, with their confidence values (C) and failure probability values (P), can be as follows:

(55) Failure set 1: C=0.4972; F74 Valve 5 - Leak from: P=1.0000; F297 Power Supply 1 - Phase A has failed: P=0.5028

(56) Failure set 2: C=0.4972; F74 Valve 5 - Leak from: P=1.0000; F296 Power Supply 1 - 24 volt power supply failure: P=0.5028

(57) Failure set 3: C=0.0055; F74 Valve 5 - Leak from: P=1.0000; F296 Power Supply 1 - 24 volt power supply failure: P=0.5028; F297 Power Supply 1 - Phase A has failed: P=0.5028

(58) In the new diagnosis user interface, the remaining symptoms can still be sorted according to the information gains calculated while performing the Bayesian inference. For this purpose, in the illustrative example of FIG. 4, the symptom ordering criterion is set to Information by choosing the relevant radio box 404. The maintainer can consider these symptoms and decide to perform additional tests, if required.

(59) In order to provide probable ranked failure sets, embodiments of the present invention use an improved heuristic RA in which a configurable number of the top probable failures provided by the Bayesian inference algorithm are considered as the probable failures to be investigated further by the heuristic RA. Other inputs can include the observed symptoms. An approach that ignores less informative present symptoms (those with many parent failures) in the RA has also been built. A further improvement has been made by automatically halting the evaluation in the RA when sufficiently many probable ranked sets have been obtained. These are the heuristic techniques added to the RA.
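The first two heuristics can be illustrated with a brute-force stand-in for the RA. The sketch below is not the Ranked Algorithm itself: it enumerates every subset (up to `max_size` failures) of the top-M candidate failures, treats all other failures as absent, drops present symptoms with more than N parents, scores each subset by its log joint probability under an assumed noisy-OR model, and returns the best `l_best` sets. The function name, parameter names and the noisy-OR model are assumptions, and the enumeration is only feasible for small M.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def _log(p: float) -> float:
    """Natural log that maps impossible events (p <= 0) to -inf."""
    return math.log(p) if p > 0.0 else -math.inf

def lbest_failure_sets(
    candidates: Sequence[str],          # top-M failures from the marginal inference
    prior: Dict[str, float],            # prior[f]: P(failure f present), all failures
    q: Dict[str, Dict[str, float]],     # q[f][s]: P(f alone turns symptom s on)
    leak: Dict[str, float],             # leak[s]: P(s on with no failure present)
    present: Sequence[str],
    absent: Sequence[str],
    max_parents: int = 10,              # N: drop present symptoms with more parents
    max_size: int = 3,                  # largest failure set enumerated
    l_best: int = 5,                    # number of ranked sets returned
) -> List[Tuple[Tuple[str, ...], float]]:
    """Brute-force stand-in for the RA: rank candidate subsets by log joint probability."""
    # Heuristic: ignore less informative present symptoms (many parent failures).
    kept = [s for s in present if sum(s in q[f] for f in q) <= max_parents]

    def p_off(s: str, on: set) -> float:
        # Noisy-OR: probability that symptom s stays off given the failures in `on`.
        p = 1.0 - leak.get(s, 0.0)
        for f in on:
            p *= 1.0 - q[f].get(s, 0.0)
        return p

    scored = []
    # Heuristic: only subsets of the top-M candidates; every other failure is absent.
    for size in range(max_size + 1):
        for fs in combinations(candidates, size):
            on = set(fs)
            logp = sum(_log(prior[f]) if f in on else _log(1.0 - prior[f]) for f in prior)
            logp += sum(_log(1.0 - p_off(s, on)) for s in kept)
            logp += sum(_log(p_off(s, on)) for s in absent)
            scored.append((fs, logp))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:l_best]
```

On a toy model shaped like the demonstration scenario (made-up parameters: S32 and S33 caused by F296 or F297, S47 and S48 by F74, S103 by many failures), the two top sets are F74 with F296 and F74 with F297, and the third is F74, F296 and F297 together. Log scores can be turned into confidence values by normalising as described for step 226.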

(60) A number of experiments were performed to compare the diagnosis inference results and the computation times of the initial RA, the Bayesian inference and the heuristic (improved) RA. In the experiments, the initial RA was sped up by approximately two orders of magnitude by adding the heuristics and by using the Bayesian inference outputs as RA inputs. The computation time of the heuristic RA can now be of a similar magnitude to that of the conventional (Bayesian) diagnosis algorithm, making it a practical add-on to the online diagnosis method. If the Bayesian inference were used alone, failure combinations might still be identified by the maintainer through consecutive additional Bayesian inference test runs. In such a case, the maintainer would consider the information gains of the remaining symptoms at each run and update the tests according to these consecutive results in order to investigate the probable failure sets. Troubleshooting such complicated cases can be time-consuming and depends on the capability of the maintainer, which can potentially lead to unnecessary component replacements. By using embodiments of the new extended diagnosis method, on the other hand, the drawbacks of relying solely on the Bayesian inference can be avoided. Considering the experiments and trial runs performed, it is concluded that the heuristic RA results can be more informative than the Bayesian inference results because they can indicate possible failure combinations.


CPC classification

G06F11/0751 (PHYSICS)

G06F11/0787 (PHYSICS)

G05B23/0278 (PHYSICS)

G06F11/079 (PHYSICS)

International classification

G06F11/07 (PHYSICS)

G05B23/02 (PHYSICS)
