G06F11/079

Method for identifying and evaluating common cause failures of system components

Provided is a method and system for identifying and evaluating common cause failures of system components, wherein at least one analytical artifact, together with machine-readable system-related spatial and/or topological data and machine-readable system-related lifecycle data, is processed to automatically analyze the susceptibility of system components to common cause failure based on common cause failure influencing factors.
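The abstract does not say how susceptibility is computed. As a minimal sketch, assume each component is described by a few hypothetical influencing factors (room, supplier, installation year) drawn from spatial, topological, and lifecycle data. Components that share a factor value can then be grouped as candidate common cause failure groups, and each component scored by how many such groups it belongs to. The factor names, data, and scoring rule are illustrative assumptions, not the patented method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical component description: influencing factor -> value.
Component = Dict[str, str]


def ccf_candidate_groups(
    components: Dict[str, Component], factors: List[str]
) -> Dict[Tuple[str, str], List[str]]:
    """Map each (factor, value) pair shared by two or more components to
    the names of those components (a candidate common cause group)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, attrs in components.items():
        for factor in factors:
            if factor in attrs:
                groups[(factor, attrs[factor])].append(name)
    # A factor value seen on only one component cannot cause a common failure.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def susceptibility_scores(
    components: Dict[str, Component], factors: List[str]
) -> Dict[str, int]:
    """Assumed score: the number of candidate groups a component belongs to."""
    scores = {name: 0 for name in components}
    for names in ccf_candidate_groups(components, factors).values():
        for name in names:
            scores[name] += 1
    return scores


# Illustrative data for three redundant pumps (hypothetical values).
pumps = {
    "pump_a": {"room": "R1", "supplier": "S1", "install_year": "2015"},
    "pump_b": {"room": "R1", "supplier": "S2", "install_year": "2015"},
    "pump_c": {"room": "R2", "supplier": "S1", "install_year": "2020"},
}
FACTORS = ["room", "supplier", "install_year"]
print(susceptibility_scores(pumps, FACTORS))  # {'pump_a': 3, 'pump_b': 2, 'pump_c': 1}
```

Here pump_a shares a room and an installation year with pump_b and a supplier with pump_c, so it scores highest.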

Method of detecting faults in a fault tolerant distributed computing network system

The present disclosure provides methods for detecting faults in a distributed computing network system. The method includes receiving, from a management service, authority information identifying peer computing devices of the distributed computing network system. A first message comprising a first instance of a dataset is received from a first peer computing device, and a second message comprising a second instance of the dataset is received from a second peer computing device. Where the first and second peer computing devices both have authority over the dataset, it is determined whether the first instance of the dataset matches the second instance. Where the two instances do not match, a fault message is sent to the management service indicating that a fault has been detected at the first peer computing device.
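The authority check and instance comparison can be sketched as follows. This assumes authority information arrives as a mapping from a peer identifier to the set of dataset identifiers that peer is authoritative for, and that instances are compared by a SHA-256 digest of their canonical JSON form. The `Message` type, the digest choice, and the fault-message fields are hypothetical; delivering the fault message to the management service is not shown.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Message:
    peer_id: str
    dataset_id: str
    payload: dict  # this peer's instance of the dataset


def digest(payload: dict) -> str:
    """SHA-256 of canonical JSON, so key order does not affect equality."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def check_instances(
    authority: Dict[str, Set[str]], first: Message, second: Message
) -> Optional[dict]:
    """Return a fault message naming the first peer if both peers have
    authority over the same dataset and their instances differ; else None."""
    if first.dataset_id != second.dataset_id:
        return None
    both_authoritative = all(
        m.dataset_id in authority.get(m.peer_id, set()) for m in (first, second)
    )
    if not both_authoritative or digest(first.payload) == digest(second.payload):
        return None
    return {"type": "fault", "peer_id": first.peer_id, "dataset_id": first.dataset_id}


authority = {"peer-1": {"ledger"}, "peer-2": {"ledger"}, "peer-3": set()}
fault = check_instances(
    authority,
    Message("peer-1", "ledger", {"balance": 10}),
    Message("peer-2", "ledger", {"balance": 12}),
)
print(fault)  # {'type': 'fault', 'peer_id': 'peer-1', 'dataset_id': 'ledger'}
```

Hashing a canonical encoding keeps the comparison cheap when datasets are large and lets peers exchange digests instead of full instances, although that optimization is an assumption rather than something the abstract states.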

Method of monitoring closed system, apparatus thereof and monitoring device

A method of monitoring a closed system, an apparatus thereof, and a monitoring device are provided. The method of monitoring the closed system includes: capturing a web page of the closed system; searching the captured page, according to configuration information for the data to be monitored in the closed system, for text content corresponding to that data; and converting the text content into monitored data that a system monitoring platform can recognize, and storing the monitored data.
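A minimal sketch of the search-and-convert steps follows. It assumes the captured page is already available as an HTML string and that the configuration information maps each monitored item to a regular expression whose first group captures the value. The configuration format, the item names, and the in-memory list standing in for storage are assumptions; capturing the page itself is not shown.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects the visible text fragments of a captured page."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


def monitor_once(html: str, config: Dict[str, str], store: List[Dict[str, float]]) -> Dict[str, float]:
    """Search the page text for each configured item, convert the matched
    text to a number the monitoring platform can use, and store it."""
    text = page_text(html)
    data: Dict[str, float] = {}
    for item, pattern in config.items():
        match = re.search(pattern, text)
        if match:
            data[item] = float(match.group(1))
    store.append(data)
    return data


captured = "<html><body><div>CPU usage: <b>73.5%</b></div><div>Queue length: 12</div></body></html>"
config = {"cpu_percent": r"CPU usage:\s*([\d.]+)%", "queue_length": r"Queue length:\s*(\d+)"}
history: List[Dict[str, float]] = []
print(monitor_once(captured, config, history))  # {'cpu_percent': 73.5, 'queue_length': 12.0}
```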

Network system fault resolution via a machine learning model

Disclosed are embodiments for automatically resolving faults in a complex network system. Some embodiments monitor system operational parameter values, message exchanges between network components, or both. A machine learning model detects a fault in the complex network system, and an action is selected based on the cause of the fault. After the action is applied to the complex network system, additional monitoring is performed to determine either that the fault has been resolved or that additional actions are needed to resolve it further.
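The detect, act, and re-check cycle can be sketched as a loop. In this sketch a simple threshold function stands in for the machine learning model, and the cause labels, remediation actions, and simulated state are hypothetical. The point is only the control flow: apply the action mapped to the detected cause, then monitor again until no fault is detected or a limit is reached.

```python
from typing import Callable, Dict, List, Optional, Tuple

Metrics = Dict[str, float]


def resolve_faults(
    read_metrics: Callable[[], Metrics],
    detect_cause: Callable[[Metrics], Optional[str]],
    actions: Dict[str, Callable[[], None]],
    max_actions: int = 3,
) -> Tuple[List[str], bool]:
    """Apply the action mapped to each detected cause, re-monitoring after
    every action. Returns (causes acted on, whether the fault is resolved)."""
    applied: List[str] = []
    while True:
        cause = detect_cause(read_metrics())
        if cause is None:
            return applied, True
        if len(applied) >= max_actions or cause not in actions:
            return applied, False  # escalate to an operator (outside this sketch)
        actions[cause]()
        applied.append(cause)


# Simulated network state and a stand-in "model" (hypothetical thresholds).
state: Metrics = {"error_rate": 0.2, "queue_depth": 500.0}


def threshold_model(m: Metrics) -> Optional[str]:
    if m["error_rate"] > 0.1:
        return "bad_config"
    if m["queue_depth"] > 100:
        return "overload"
    return None


remediations: Dict[str, Callable[[], None]] = {
    "bad_config": lambda: state.update(error_rate=0.01),
    "overload": lambda: state.update(queue_depth=20.0),
}

print(resolve_faults(lambda: dict(state), threshold_model, remediations))
# (['bad_config', 'overload'], True)
```

The `max_actions` cap keeps a misdiagnosed fault from looping forever on an action that never helps.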

METHOD AND SYSTEM FOR PROVIDING SOLUTIONS TO A HARDWARE COMPONENT FAILURE USING A CONTEXT-AWARE SEARCH

In general, embodiments relate to a method for providing solutions to hardware component failures, comprising: creating a device state chain for a device using a device state path, a current device state, and a next device state; identifying a root cause of a hardware component failure in the device using the device state chain; performing a context-aware search in a shared storage using the root cause; and obtaining, in response to the context-aware search, a result specifying a proposed solution for the hardware component failure.
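The four steps can be sketched in a few functions. Two rules here are assumptions, not taken from the abstract: the root cause is taken to be the earliest failure state in the chain, and the "context-aware" search ranks stored solutions for that root cause by how many context tags (device model, firmware version) they share with the failing device. The shared storage is modelled as a plain list, and all state names and tags are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set


@dataclass(frozen=True)
class KnownSolution:
    root_cause: str
    context: FrozenSet[str]  # hypothetical tags, e.g. device model, firmware
    fix: str


def build_state_chain(path: Sequence[str], current: str, next_state: str) -> List[str]:
    """Device state chain: past states, then the current and next states."""
    return [*path, current, next_state]


def identify_root_cause(chain: Sequence[str], failure_states: Set[str]) -> Optional[str]:
    """Assumed rule: the earliest failure state in the chain is the root cause."""
    return next((state for state in chain if state in failure_states), None)


def context_aware_search(
    store: Sequence[KnownSolution], root_cause: str, context: Set[str]
) -> Optional[KnownSolution]:
    """Among entries for this root cause, prefer the one whose context tags
    overlap most with the failing device's context."""
    matches = [s for s in store if s.root_cause == root_cause]
    if not matches:
        return None
    return max(matches, key=lambda s: len(s.context & context))


shared_storage = [
    KnownSolution("fan_degraded", frozenset({"model-x"}), "Replace fan module"),
    KnownSolution("fan_degraded", frozenset({"model-y", "fw-2.1"}), "Update fan firmware"),
    KnownSolution("overheat", frozenset({"model-y"}), "Clean air intake"),
]
chain = build_state_chain(["boot_ok", "fan_degraded"], "overheat", "shutdown")
cause = identify_root_cause(chain, {"fan_degraded", "overheat", "shutdown"})
if cause is not None:
    result = context_aware_search(shared_storage, cause, {"model-y", "fw-2.1"})
    if result is not None:
        print(cause, "->", result.fix)  # fan_degraded -> Update fan firmware
```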

MONITORING DEVICE AND MONITORING METHOD
20230236924 · 2023-07-27

According to one embodiment, a monitoring device is provided for a terminal that includes a connection unit to which at least one device can be connected, and on which a container file is mounted that stores application software for controlling business operations performed using the device connected to the connection unit. The monitoring device includes a detection unit, an acquisition unit, and a processing unit. The detection unit detects an abnormality in the terminal. The acquisition unit acquires the container file mounted on the terminal in which the abnormality is detected. The processing unit installs the acquired container file on an alternative terminal.
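The detection, acquisition, and processing units can be sketched as three small steps. This sketch assumes that an abnormality means a missed heartbeat, that container files are identified by name, and that "installing" on the alternative terminal can be modelled by recording the file as mounted there. The `Terminal` type, the timeout rule, and the terminal and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Terminal:
    name: str
    last_heartbeat: float  # seconds on a shared monotonic clock (assumed)
    containers: List[str] = field(default_factory=list)  # mounted container files


def detect_abnormal(terminals: Dict[str, Terminal], now: float, timeout: float) -> List[str]:
    """Detection unit (assumed rule): no heartbeat for more than `timeout` seconds."""
    return [t.name for t in terminals.values() if now - t.last_heartbeat > timeout]


def fail_over(terminals: Dict[str, Terminal], failed: str, alternative: str) -> List[str]:
    """Acquisition and processing units: take the container files mounted on the
    failed terminal and record them as installed on the alternative terminal."""
    acquired = list(terminals[failed].containers)
    target = terminals[alternative]
    for container_file in acquired:
        if container_file not in target.containers:
            target.containers.append(container_file)
    return acquired


terminals = {
    "pos-1": Terminal("pos-1", last_heartbeat=100.0, containers=["checkout-app.tar"]),
    "pos-2": Terminal("pos-2", last_heartbeat=128.0),
}
for name in detect_abnormal(terminals, now=130.0, timeout=10.0):
    fail_over(terminals, name, alternative="pos-2")
print(terminals["pos-2"].containers)  # ['checkout-app.tar']
```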

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
20230236921 · 2023-07-27

An information processing system, an information processing method, and a program are provided that can examine an abnormal event based on abnormal time-series data including a plurality of signal data pieces. An information processing device includes an acquisition unit and an examination unit. The acquisition unit acquires a plurality of time-series data pieces, each including a plurality of signal data pieces; these comprise a plurality of normal time-series data pieces and abnormal time-series data. The examination unit determines, for each signal data piece and in each time section, whether the abnormal time-series data is similar to each normal time-series data piece, and examines the abnormal event based on the result of that determination.
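The per-signal, per-section similarity determination can be sketched as a table of Boolean flags. The abstract does not name a similarity measure or an examination rule, so this sketch assumes two things. First, "similar" means a root-mean-square distance within a tolerance `tol`. Second, the examination step reports the (signal, section) pairs where the abnormal data resembles none of the normal pieces. All pieces are assumed to have the same signals and the same length, and the example data are made up.

```python
from typing import List, Sequence, Tuple

# One time-series data piece: one list of samples per signal, all equal length.
TimeSeries = Sequence[Sequence[float]]


def rms_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equal-length segments."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def similarity_table(
    abnormal: TimeSeries, normals: Sequence[TimeSeries], section_len: int, tol: float
) -> List[List[List[bool]]]:
    """table[s][k][n] is True when signal s of the abnormal data is within
    tol of normal piece n during time section k."""
    n_samples = len(abnormal[0])
    table: List[List[List[bool]]] = []
    for s, signal in enumerate(abnormal):
        sections: List[List[bool]] = []
        for start in range(0, n_samples, section_len):
            window = slice(start, start + section_len)
            sections.append(
                [rms_distance(signal[window], normal[s][window]) <= tol for normal in normals]
            )
        table.append(sections)
    return table


def examine(table: List[List[List[bool]]]) -> List[Tuple[int, int]]:
    """Assumed rule: (signal, section) pairs matching no normal piece."""
    return [
        (s, k)
        for s, sections in enumerate(table)
        for k, flags in enumerate(sections)
        if not any(flags)
    ]


normal_pieces = [
    [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    [[0.1, 0.0, 0.0, 0.1], [1.0, 1.0, 1.1, 1.0]],
]
abnormal_piece = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 5.0, 5.0]]
print(examine(similarity_table(abnormal_piece, normal_pieces, section_len=2, tol=0.5)))
# [(1, 1)]
```

In the example only the second half of signal 1 departs from both normal pieces, which localizes the abnormal event in both signal and time.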

Intelligent network operation platform for network fault mitigation

Embodiments of the present disclosure provide systems, methods, and computer-readable storage media that leverage artificial intelligence and machine learning to identify, diagnose, and mitigate network faults or incidents within a network. Historical network incidents may be used to generate a model for evaluating network incidents as they occur in real time, for example to identify the cause of a network incident. Clustering algorithms may be used to identify portions of the model that share similarities with a network incident; actions taken in the past to resolve those similar incidents may then be identified and proposed as candidate actions for resolving the cause of the current incident. The candidate actions may be executed under user control or automatically, based on execution criteria and the configuration of the fault mitigation system.
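A minimal sketch of the clustering-then-propose idea follows. It uses plain k-means, seeded with the first k incidents so runs are repeatable, as a stand-in for the unspecified clustering algorithm. It then proposes the actions that most often resolved past incidents in the cluster nearest to the new incident. The two-feature incident vectors (CPU load, packet loss) and the action names are hypothetical, and the execution criteria and user-approval step are not modelled.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain k-means; the first k points seed the centroids."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep an empty cluster's old centroid instead of dividing by zero.
            updated.append(
                tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def candidate_actions(
    history: Sequence[Tuple[Point, str]], incident: Point, k: int, top_n: int = 2
) -> List[str]:
    """Propose the most frequent past resolutions in the incident's nearest cluster."""
    centroids, labels = kmeans([features for features, _ in history], k)
    nearest = assign([incident], centroids)[0]
    counts = Counter(action for (_, action), label in zip(history, labels) if label == nearest)
    return [action for action, _ in counts.most_common(top_n)]


# Hypothetical past incidents: ((cpu_load, packet_loss), resolving action).
history = [
    ((0.9, 0.0), "scale_out"),
    ((0.1, 0.8), "reroute_traffic"),
    ((0.95, 0.05), "scale_out"),
    ((0.85, 0.1), "restart_service"),
    ((0.15, 0.9), "reroute_traffic"),
    ((0.9, 0.02), "scale_out"),
]
print(candidate_actions(history, (0.88, 0.03), k=2))  # ['scale_out', 'restart_service']
```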

Efficient Fault Prevention and Repair in Complex Systems
20230021610 · 2023-01-26

A method of supervising a complex system includes acquiring and storing failure data and repair resource information regarding the complex system, and identifying failure networks and structures of the complex system. Failure types associated with the failure networks of the complex system are determined. The method includes generating a plurality of failure prevention and repair (FPR) sequences, each associated with the failure networks and the failure types. The generated FPR sequences are analyzed to select a set of FPR sequences and associated repair resources. The method further comprises applying a selected one of the FPR sequences to the complex system, thereby managing the complex system.
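The selection step can be illustrated with a simple budgeted choice. The abstract does not say how FPR sequences are analyzed, so this sketch swaps in a greedy heuristic. It ranks candidates by a hypothetical benefit per unit of repair resource, keeps at most one sequence per failure type, and never exceeds the available resources. Greedy ranking is not guaranteed to be optimal (the general problem resembles a knapsack problem). Generating the candidates from failure networks is not shown, and the benefit and resource figures are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class FPRSequence:
    failure_type: str
    steps: Tuple[str, ...]
    benefit: float  # hypothetical expected reduction in failure impact
    resources: int  # hypothetical repair resources required (must be > 0)


def select_fpr_sequences(candidates: Sequence[FPRSequence], budget: int) -> List[FPRSequence]:
    """Greedy selection: best benefit per resource unit first, at most one
    sequence per failure type, never exceeding the resource budget."""
    chosen: List[FPRSequence] = []
    covered: Set[str] = set()
    used = 0
    for seq in sorted(candidates, key=lambda s: s.benefit / s.resources, reverse=True):
        if seq.failure_type in covered or used + seq.resources > budget:
            continue
        chosen.append(seq)
        covered.add(seq.failure_type)
        used += seq.resources
    return chosen


candidates = [
    FPRSequence("pump_seal_leak", ("inspect", "replace_seal"), benefit=8.0, resources=4),
    FPRSequence("pump_seal_leak", ("replace_pump",), benefit=10.0, resources=10),
    FPRSequence("valve_stuck", ("lubricate", "cycle_valve"), benefit=6.0, resources=2),
    FPRSequence("sensor_drift", ("recalibrate",), benefit=3.0, resources=3),
]
plan = select_fpr_sequences(candidates, budget=8)
print([(s.failure_type, s.steps) for s in plan])
```

With a budget of 8 the sketch picks the valve sequence (ratio 3.0) and the seal replacement (ratio 2.0). The sensor recalibration would push usage to 9, so it is skipped.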

Failure Prediction In Distributed Environments
20230023646 · 2023-01-26

Embodiments of the invention are directed to systems, methods, and devices for detecting failures in distributed systems. A failure detection platform may identify anomalies in time-series data corresponding to historical network messages. The anomalies can be labeled and used to train a first predictive model. At least one other (second) model may be trained using the time-series data, the anomaly labels, and a supervised machine-learning algorithm. A third model can be trained to identify a system failure based at least in part on the outputs of the first and second models. Once trained, the third model can be used to predict a future system failure.
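The three-model arrangement is a form of stacking. The sketch below is a deliberately tiny stand-in for a single hypothetical message-count series. Z-score thresholding produces the anomaly labels. Two one-feature threshold classifiers ("decision stumps") trained on those labels play the first and second models, one on raw counts and one on the change between samples. The third model is fitted by choosing the Boolean combination of their outputs that best matches known failures. Every model choice, threshold, and data value here is an assumption; the abstract does not specify them.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def zscore_labels(series: Sequence[float], threshold: float = 1.5) -> List[int]:
    """Label a point as anomalous (1) when its z-score exceeds the threshold."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0] * len(series)
    return [int(abs(x - mean) / std > threshold) for x in series]


def first_differences(series: Sequence[float]) -> List[float]:
    """Change from the previous sample (0.0 for the first sample)."""
    return [0.0] + [b - a for a, b in zip(series, series[1:])]


def fit_stump(values: Sequence[float], labels: Sequence[int]) -> float:
    """Supervised one-feature classifier: the threshold t minimising errors of
    the rule 'predict 1 if value >= t' (inf means 'never predict 1')."""
    best_t, best_err = float("inf"), sum(labels)
    for t in sorted(set(values)):
        err = sum(int(v >= t) != y for v, y in zip(values, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Candidate rules for the third (meta) model over the two base-model outputs.
COMBINERS: Dict[str, Callable[[int, int], int]] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "first": lambda a, b: a,
    "second": lambda a, b: b,
}


def fit_meta(out1: Sequence[int], out2: Sequence[int], failures: Sequence[int]) -> str:
    """Pick the combiner whose output disagrees least with observed failures."""
    return min(
        COMBINERS,
        key=lambda name: sum(COMBINERS[name](a, b) != f for a, b, f in zip(out1, out2, failures)),
    )


messages = [10, 11, 9, 10, 50, 10, 11, 60, 10, 9]  # hypothetical messages per minute
labels = zscore_labels(messages)
t1 = fit_stump(messages, labels)                     # first model: raw count
t2 = fit_stump(first_differences(messages), labels)  # second model: rate of change
print(labels, t1, t2)  # [0, 0, 0, 0, 1, 0, 0, 1, 0, 0] 50 40
print(fit_meta([1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0]))  # and
```

In the last line the outputs are illustrative. Only the "and" combination reproduces the failure labels exactly, so the meta model selects it.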