G06F11/2257

Fault tree analysis for technical systems
10942797 · 2021-03-09

A method for fault tree analysis of a technical system, which includes a plurality of functional units, the technical system being modeled as a tree-like logical linkage of causative events, which may culminate in an undesirable event, and the causative events including malfunctions of individual functional units, a tree-like logical linkage having a self-similar structure being selected. An associated computer program is described. A surroundings detection system and/or a control system for an at least partially automated driving vehicle, including a plurality of functional units having mutual dependencies, which link the functional units in a tree-like structure in such a way that an undesirable event occurs if a logical linkage of causative events is true, the causative events including malfunctions of individual functional units, the tree-like structure being self-similar.
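
The tree-like logical linkage described in this abstract can be illustrated with a small evaluator. This is a minimal sketch under stated assumptions, not the claimed method: the abstract does not specify gate types or the repeating pattern, so the AND/OR gates, the OR-of-AND motif, and the names `Gate`, `evaluate`, and `self_similar_tree` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Gate:
    """A logical gate in the fault tree (assumed kinds: "AND" or "OR")."""
    kind: str
    children: List[Union["Gate", str]] = field(default_factory=list)


def evaluate(node: Union[Gate, str], failed: Set[str]) -> bool:
    """Return True if the event at this node occurs, given the failed units."""
    if isinstance(node, str):
        # Leaf: a causative event, i.e. a malfunction of one functional unit.
        return node in failed
    results = [evaluate(child, failed) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


def self_similar_tree(depth: int, prefix: str = "u") -> Union[Gate, str]:
    """Build a tree in which every subtree repeats the same motif.

    Assumed motif: OR(AND(subtree, subtree), subtree).
    """
    if depth == 0:
        return prefix  # leaf name identifies a hypothetical functional unit
    return Gate("OR", [
        Gate("AND", [
            self_similar_tree(depth - 1, prefix + "0"),
            self_similar_tree(depth - 1, prefix + "1"),
        ]),
        self_similar_tree(depth - 1, prefix + "2"),
    ])


tree = self_similar_tree(2)
top_event = evaluate(tree, {"u02", "u12"})  # both redundant branches fail
```

Because each level reuses the same motif, the same evaluation routine applies at every depth, which is one plausible reading of why a self-similar structure simplifies the analysis.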

SYSTEM AND METHOD OF DETERMINING BOOT STATUS OF RECOVERY SERVERS

Disclosed herein are systems and methods for determining a boot status of a failover server. In an exemplary aspect, a method may receive a failover test request for a failover server that provides disaster recovery for a production server, wherein the failover test request queries a successful boot status of the failover server. The method may determine whether a login into the failover server can be performed to execute the failover test request. In response to determining that the login cannot be performed, the method may retrieve server metrics for the failover server from a metric store and may determine a probability of the successful boot status based on both the retrieved server metrics and historic server metrics. In response to determining that the probability is greater than a threshold probability, the method may mark a recovery point of the failover server as validated.
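
The fallback path (no login possible, so estimate the boot probability from metrics) might look like the sketch below. The abstract does not say how the probability is computed; the nearest-neighbour estimate, the metric names, the default threshold, and the function names are assumptions made purely for illustration.

```python
import math
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def boot_probability(current: Metrics,
                     history: List[Tuple[Metrics, bool]],
                     k: int = 3) -> float:
    """Share of the k most similar historic records that booted successfully.

    Assumed estimator: Euclidean distance over the metric keys of `current`.
    """
    if not history:
        raise ValueError("historic server metrics are required")

    def distance(a: Metrics, b: Metrics) -> float:
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    nearest = sorted(history, key=lambda rec: distance(current, rec[0]))[:k]
    return sum(1 for _, booted in nearest if booted) / len(nearest)


def validate_without_login(current: Metrics,
                           history: List[Tuple[Metrics, bool]],
                           threshold: float = 0.5) -> bool:
    """Return True if the recovery point should be marked as validated."""
    return boot_probability(current, history) > threshold


# Hypothetical metrics: (metrics of a past boot attempt, whether it succeeded).
history = [
    ({"cpu": 0.20, "disk_io": 0.50}, True),
    ({"cpu": 0.25, "disk_io": 0.55}, True),
    ({"cpu": 0.30, "disk_io": 0.40}, False),
    ({"cpu": 0.00, "disk_io": 0.00}, False),
]
current = {"cpu": 0.22, "disk_io": 0.52}
validated = validate_without_login(current, history)
```

In this example two of the three nearest historic records booted successfully, so the estimate is 2/3 and the recovery point passes a 0.5 threshold but would not pass a stricter one such as 0.8.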

Method and system for model-based event-driven anomalous behavior detection

An embodiment of the present invention is directed to an optimal event-driven anomaly detection scheme. The present invention recognizes that anomaly detection based solely on rules-based (deterministic) analysis or solely on probabilistic analysis is insufficient to capture and respond to ever-evolving, highly sophisticated threats that tend to persist within a system undetected for long periods of time. According to an embodiment of the present invention, a tiered detection scheme composed of behavioral analysis and machine-learned probabilistic system behaviors provides an optimal level of sensitivity to detect and respond to threats, and further limits the number of false positives identified.

Providing insight of continuous delivery pipeline using machine learning

A method, system and computer program product for detecting potential failures in completing a continuous delivery (CD) pipeline using machine learning. A CD pipeline is defined to include stages, where each stage includes one or more binary events. A model is created by applying an Apriori algorithm and a sequential pattern mining algorithm to a set of previous patterns of sequences of binary events to calculate confidence scores for completing a set of binary events in a particular order. After identifying an ongoing CD sequence (an ordered set of binary events) for a software application, the model is used to predict a likelihood of the ongoing CD sequence for the software application completing the CD pipeline by generating one or more confidence scores for the ongoing CD sequence. A notification is issued regarding a potential failure in completing the CD pipeline for the software application if a confidence score is below a threshold value.
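
The idea of a confidence score for an ongoing sequence can be sketched as follows. This is a simplified prefix-confidence calculation in the spirit of association-rule confidence, not the Apriori or sequential pattern mining algorithms themselves; the event names, the data layout, and the 0.7 threshold are hypothetical.

```python
from typing import List, Optional, Tuple

Run = Tuple[Tuple[str, ...], bool]  # (ordered binary events, pipeline completed?)


def completion_confidence(ongoing: Tuple[str, ...],
                          past_runs: List[Run]) -> Optional[float]:
    """Confidence that a run starting with `ongoing` completes the pipeline.

    Computed as (past runs with this prefix that completed) /
    (past runs with this prefix). Returns None if no past run matches.
    """
    matching = [done for events, done in past_runs
                if events[:len(ongoing)] == ongoing]
    if not matching:
        return None
    return sum(matching) / len(matching)


def should_notify(ongoing: Tuple[str, ...],
                  past_runs: List[Run],
                  threshold: float = 0.7) -> bool:
    """Flag a potential failure when the confidence falls below the threshold."""
    confidence = completion_confidence(ongoing, past_runs)
    return confidence is not None and confidence < threshold


past_runs = [
    (("build_ok", "tests_ok", "deploy_ok"), True),
    (("build_ok", "tests_ok"), False),
    (("build_ok", "lint_ok"), False),
    (("build_ok", "tests_ok", "deploy_ok"), True),
]
notify = should_notify(("build_ok", "tests_ok"), past_runs)
```

Here three past runs share the prefix and two of them completed, giving a confidence of 2/3, which is below the assumed 0.7 threshold and so triggers a notification.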

Facilitating classification of equipment failure data

The subject disclosure relates to employing grouping and selection components to facilitate a grouping of failure data associated with oil and gas exploration equipment into one or more equipment failure type groups. In an example, a method comprises grouping, by a system operatively coupled to a processor, training data of a set of equipment failure data into one or more failure type groups based on one or more determined failure criteria, wherein the one or more failure type groups represent equipment failure classifications associated with energy exploration processes; and selecting, by the system, first ungrouped data from the set of equipment failure data based on a level of similarity between the first ungrouped data and the training data.
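
Selecting ungrouped failure records by their similarity to already grouped training data could be sketched as below. The abstract does not name a similarity measure; Jaccard similarity over word sets, the 0.5 cut-off, and the sample records are assumptions for illustration only.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two failure descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_similar(ungrouped: List[str],
                   training: List[Tuple[str, str]],
                   min_similarity: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (record, best-matching group, score) for sufficiently similar records.

    `training` holds (description, failure type group) pairs.
    """
    if not training:
        return []
    selected = []
    for record in ungrouped:
        group, score = max(
            ((grp, jaccard(record, text)) for text, grp in training),
            key=lambda pair: pair[1],
        )
        if score >= min_similarity:
            selected.append((record, group, score))
    return selected


training = [
    ("pump seal leak", "seal failure"),
    ("drill bit wear", "bit failure"),
]
ungrouped = ["pump seal leak detected", "operator shift change"]
selected = select_similar(ungrouped, training)
```

Only the first ungrouped record is selected: it shares three of four distinct words with a training record (similarity 0.75), while the second shares none.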

METHOD AND SYSTEM FOR VERIFYING STATE MONITOR RELIABILITY IN HYPER-CONVERGED INFRASTRUCTURE APPLIANCES
20210012238 · 2021-01-14

A method and system for verifying state monitor reliability in hyper-converged infrastructure (HCI) appliances. Specifically, the method and system disclosed herein entail using a supervised machine learning model, i.e., a classification decision tree, to accurately distinguish whether conflicting event notifications, logged across multiple state monitors tracking state on an HCI appliance, are directed to a real event or a non-real event. The classification decision tree, generated based at least on information gains calculated for the multiple state monitors, may reflect which state monitor(s) is/are more reliable in accurately classifying the conflicting event notifications.
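
The information gain mentioned above is the standard quantity used to choose decision tree splits: the entropy of the labels minus the weighted entropy after partitioning by a feature. The sketch below computes it per state monitor; the monitor names and sample data are hypothetical, and the abstract does not disclose the exact features used.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(monitor_outputs: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy from splitting on one monitor's output."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for output, label in zip(monitor_outputs, labels):
        groups.setdefault(output, []).append(label)
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# True = real event, False = non-real event (hypothetical ground truth).
labels = [True, True, False, False]
monitors = {
    "monitor_a": [1, 1, 0, 0],  # agrees with the ground truth
    "monitor_b": [1, 0, 1, 0],  # uninformative about the ground truth
}
gains = {name: information_gain(out, labels) for name, out in monitors.items()}
most_reliable = max(gains, key=gains.get)
```

A monitor with higher information gain separates real from non-real events more cleanly, so a tree built this way would place it nearer the root.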

Proactive failure handling in data processing systems

Embodiments are directed to predicting the health of a computer node using health report data and to proactively handling failures in computer network nodes. In an embodiment, a computer system monitors various health indicators for multiple nodes in a computer network. The computer system accesses stored health indicators that provide a health history for the computer network nodes. The computer system then generates a health status based on the monitored health indicators and the health history. The generated health status indicates the likelihood that the node will be healthy within a specified future time period. The computer system then leverages the generated health status to handle current or predicted failures. The computer system also presents the generated health status to a user or other entity.

AUTOMATIC ROOT CAUSE ANALYSIS USING TERNARY FAULT SCENARIO REPRESENTATION
20200409782 · 2020-12-31

A plurality of potential fault scenarios are accessed, wherein a given potential fault scenario of the plurality of potential fault scenarios has at least one corresponding root cause, and a representation of the given potential fault scenario comprises a don't care value. An actual fault scenario from telemetry received from a monitored system is generated. The actual fault scenario is matched against the plurality of potential fault scenarios. One or more matched causes are output as one or more probable root cause failures of the monitored system.
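
Matching with don't care values can be sketched as a ternary pattern match. The encoding below (strings over `0`, `1`, and `x` for don't care), the symptom ordering, and the root cause names are assumptions; the abstract does not specify the representation.

```python
from typing import List, Tuple


def matches(potential: str, actual: str) -> bool:
    """True if every position of `potential` is 'x' or equals `actual` there."""
    return len(potential) == len(actual) and all(
        p == "x" or p == a for p, a in zip(potential, actual)
    )


def probable_root_causes(actual: str,
                         scenarios: List[Tuple[str, str]]) -> List[str]:
    """Return the root causes of all potential scenarios matching `actual`."""
    return [cause for pattern, cause in scenarios if matches(pattern, actual)]


# Each position is one hypothetical symptom derived from telemetry
# (1 = observed, 0 = not observed, x = don't care).
scenarios = [
    ("1x0", "power supply fault"),
    ("11x", "fan fault"),
    ("0xx", "link down"),
]
causes = probable_root_causes("110", scenarios)
```

For the actual scenario `110`, both the first and second patterns match, so two probable root causes are reported; the don't care positions let one pattern cover several concrete symptom combinations.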

FAILURE ANALYSIS SYSTEM FOR A DISTRIBUTED STORAGE SYSTEM
20200409810 · 2020-12-31

A failure analysis system identifies a root cause of a failure (or other health issue) in a virtualized computing environment and provides a recommendation for remediation. The failure analysis system uses a model-based reasoning (MBR) approach that involves building a model describing the relationships/dependencies of elements in the various layers of the virtualized computing environment, and the model is used by an inference engine to generate facts and rules for reasoning to identify an element in the virtualized computing environment that is causing the failure. Then the failure analysis system uses a decision tree analysis (DTA) approach to perform a deep diagnosis of the element, by traversing a decision tree that was generated by combining the rules for reasoning provided by the MBR approach, in conjunction with examining data collected by health monitors. The result of the DTA approach is then used to generate the recommendation for remediation.