Patent classifications
G06F11/2257
Fine-grained causal anomaly inference for complex system fault diagnosis
A computer-implemented method for diagnosing system faults by fine-grained causal anomaly inference is presented. The computer-implemented method includes identifying functional modules impacted by causal anomalies and backtracking causal anomalies in impaired functional modules by a low-rank network diffusion model. An invariant network and a broken network are input into the system, the invariant network and the broken network being jointly clustered to learn the degree of broken severity of different clusters as a result of fault propagation.
METHOD AND APPARATUS FOR ANOMALY DETECTION
Disclosed are various example embodiments which may be configured to: collect a measurement time-series relating to a performance indicator of a communication network resource; compute a representative vector of said measurement time-series; provide a clustering model comprising a set of clusters, wherein the clustering model has been trained on a plurality of training time-series, wherein a cluster of the set of clusters comprises partial time-series that meet a similarity condition, wherein a cluster anomaly label is associated with said cluster; select a subset of the set of clusters, wherein the subset comprises at least one cluster for which the partial time-series within the cluster meet a distance condition with the representative vector; and associate an anomaly label with the measurement time-series, wherein the anomaly label is computed as a function of the cluster anomaly label.
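The labelling flow in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the piecewise-mean representative vector, the use of a cluster centroid in place of the cluster's partial time-series, the Euclidean distance threshold, and the majority vote are all assumptions, and the names `Cluster`, `representative_vector`, and `label_series` are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional, Sequence


@dataclass
class Cluster:
    centroid: Sequence[float]  # assumed stand-in for the cluster's partial time-series
    anomalous: bool            # the cluster anomaly label


def representative_vector(series: Sequence[float], n_segments: int) -> List[float]:
    """Summarise a series as the mean of n_segments contiguous chunks (an assumption)."""
    n = len(series)
    if n_segments <= 0 or n < n_segments:
        raise ValueError("need at least one sample per segment")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def label_series(
    series: Sequence[float],
    clusters: Sequence[Cluster],
    n_segments: int,
    max_distance: float,
) -> Optional[bool]:
    """Return an anomaly label, or None when no cluster meets the distance condition."""
    vec = representative_vector(series, n_segments)
    # Select the subset of clusters close enough to the representative vector.
    selected = [c for c in clusters if math.dist(vec, c.centroid) <= max_distance]
    if not selected:
        return None
    # Assumed combination rule: majority vote over the selected cluster labels.
    anomalous_votes = sum(c.anomalous for c in selected)
    return 2 * anomalous_votes > len(selected)
```

Any other function of the selected cluster labels (for example a distance-weighted score) would fit the abstract equally well; the majority vote is only the simplest choice.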
Diagnosing distributed virtual network malfunction
A method is provided to diagnose one or more malfunctions in a virtual network having a network functionality implemented in a distributed manner by hypervisors, each performing part of the network functionality. The method includes receiving user input of observed facts about the virtual network, where the observed facts include identifiers of virtual components in the virtual network, and adding the observed facts to a knowledge base. The method further includes, in response to the identifiers of the virtual components, querying agents on the hypervisors for collected facts related to the virtual components, receiving and adding the collected facts to the knowledge base, and applying logic rules to the knowledge base to produce one or more diagnoses of the virtual network.
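A rough sketch of the knowledge-base flow described above, under stated assumptions: facts are modelled as `(component, attribute, value)` tuples, the hypervisor agents are represented by a single hypothetical `collect` callback, and each logic rule is a plain function over the knowledge base. None of these names or shapes come from the patent.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]  # (component identifier, attribute, value); assumed shape
Rule = Callable[[Set[Fact]], List[str]]


def diagnose(
    observed: Iterable[Fact],
    collect: Callable[[str], Iterable[Fact]],
    rules: Iterable[Rule],
) -> List[str]:
    """Build a knowledge base from observed and collected facts, then apply rules."""
    observed = list(observed)
    knowledge_base: Set[Fact] = set(observed)
    # Use the component identifiers in the observed facts to query the agents.
    for component in sorted({fact[0] for fact in observed}):
        knowledge_base.update(collect(component))
    diagnoses: List[str] = []
    for rule in rules:
        diagnoses.extend(rule(knowledge_base))
    return diagnoses


def link_down_rule(kb: Set[Fact]) -> List[str]:
    """Hypothetical rule: a failed ping on a component whose link is down."""
    return [
        f"{component}: link down explains ping failure"
        for component, attribute, value in sorted(kb)
        if (attribute, value) == ("link", "down")
        and (component, "ping", "fails") in kb
    ]
```

In use, `collect` would wrap whatever transport reaches the agents; a stub that returns fixed facts is enough to exercise the rules.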
SYSTEMS AND METHODS FOR PREDICTING INFORMATION HANDLING RESOURCE FAILURES USING DEEP RECURRENT NEURAL NETWORKS
In accordance with embodiments of the present disclosure, an information handling system may include a processor and a non-transitory computer-readable medium having stored thereon a program of instructions executable by the processor. The program of instructions may be configured to, when read and executed by the processor, receive telemetry data associated with one or more information handling resources, receive failure statistics associated with the one or more information handling resources, and correlate the telemetry data and the failure statistics to create training data for a pattern recognition engine configured to predict a failure status of an information handling resource from operational data associated with the information handling resource.
INTELLIGENT CONDITION MONITORING AND FAULT DIAGNOSTIC SYSTEM FOR PREVENTATIVE MAINTENANCE
A system for condition monitoring and fault diagnosis includes a data collection function that acquires time histories of selected variables for one or more of the components, a pre-processing function that calculates specified characteristics of the time histories, an analysis function for evaluating the characteristics to produce one or more hypotheses of a condition of the one or more components, and a reasoning function for determining the condition of the one or more components from the one or more hypotheses.
AUTOMATIC ROOT CAUSE ANALYSIS USING TERNARY FAULT SCENARIO REPRESENTATION
A plurality of potential fault scenarios are accessed, wherein a given potential fault scenario of the plurality of potential fault scenarios has at least one corresponding root cause, and a representation of the given potential fault scenario comprises a don't care value. An actual fault scenario from telemetry received from a monitored system is generated. The actual fault scenario is matched against the plurality of potential fault scenarios. One or more matched causes are output as one or more probable root cause failures of the monitored system.
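The matching step can be illustrated with a small sketch. It assumes each scenario is a fixed-length tuple of symptom values in which `None` plays the role of the don't care value; the scenario names and the functions `matches` and `probable_root_causes` are hypothetical, and the abstract does not specify this encoding.

```python
from typing import Dict, List, Optional, Sequence

# Ternary symptom value: True, False, or None meaning "don't care" (assumed encoding).
Ternary = Optional[bool]


def matches(potential: Sequence[Ternary], actual: Sequence[bool]) -> bool:
    """True when every non-don't-care entry of the potential scenario equals the actual one."""
    if len(potential) != len(actual):
        raise ValueError("scenarios must have the same number of symptoms")
    return all(p is None or p == a for p, a in zip(potential, actual))


def probable_root_causes(
    scenarios: Dict[str, Sequence[Ternary]], actual: Sequence[bool]
) -> List[str]:
    """Return the root causes whose potential fault scenario matches the actual scenario."""
    return [cause for cause, pattern in scenarios.items() if matches(pattern, actual)]


# Hypothetical scenarios over three symptoms.
SCENARIOS = {
    "power_supply_failure": (True, True, None),
    "link_down": (False, True, None),
    "fan_failure": (None, False, True),
}
```

With these scenarios, an actual fault scenario of `(True, True, False)` matches only `power_supply_failure`, because the third symptom is a don't care for that cause.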
Using Cognitive Technologies to Identify and Resolve Issues in a Distributed Infrastructure
A mechanism is provided in a data processing system for identifying and resolving issues in a distributed infrastructure. A log error processor monitors error logs of a plurality of data processing nodes within the distributed infrastructure. In response to the log error processor detecting an error in a given node within the distributed infrastructure, the log error processor provides error data for the error to a machine learning model and receives from the machine learning model a set of potential solutions and associated confidence values. An operation extraction component extracts from each potential solution in the set of potential solutions a set of operations to resolve the error. A classifier component maps each set of operations to a set of executable operations that are executable by the given node. A solution scorer component determines whether to perform automatic resolution using a selected potential solution and its corresponding set of executable operations. In response to the solution scorer component determining to perform automatic resolution, an operation execution engine executes the corresponding set of executable operations on the given node.
Test control device, test system, and control method for device testing with predictable and reduced test times
A test control device includes a test variable generation device and a test processing device. The test variable generation device uses a test prediction model to generate a first manipulated variable based on a difference between a target value and a first controlled variable value from a device under test. The test processing device acquires a second controlled variable value from the device under test based on use of the first manipulated variable value. The test variable generation device notifies the device under test of end of a test if the second controlled variable value is equal to or greater than the target value or uses the test prediction model to generate a second manipulated variable based on a difference between the target value and the second controlled variable value when the second controlled variable value is less than the target value.
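The control loop in this abstract can be sketched as below. The sketch assumes the test prediction model is a plain function from the remaining error to a manipulated variable, and it uses a toy device whose response is invented for illustration; `run_test`, `ToyDevice`, and the `max_rounds` limit are hypothetical and not taken from the patent.

```python
import math
from typing import Callable, Tuple


def run_test(
    target: float,
    first_value: float,
    predict: Callable[[float], float],
    apply_to_device: Callable[[float], float],
    max_rounds: int = 50,
) -> Tuple[int, float]:
    """Repeat predict/apply until the controlled variable reaches the target."""
    controlled = first_value
    for rounds in range(1, max_rounds + 1):
        # The prediction model turns the remaining difference into a manipulated variable.
        manipulated = predict(target - controlled)
        # The device under test responds with a new controlled variable value.
        controlled = apply_to_device(manipulated)
        if controlled >= target:
            # Returning here stands in for notifying the device of the end of the test.
            return rounds, controlled
    raise RuntimeError("target not reached within max_rounds")


class ToyDevice:
    """Invented device: each manipulated value raises the level by half, rounded up."""

    def __init__(self) -> None:
        self.level = 0

    def apply(self, manipulated: float) -> int:
        self.level += math.ceil(manipulated / 2)
        return self.level
```

For example, `run_test(10, 0, lambda error: error, ToyDevice().apply)` finishes after four rounds with the level at 10 (the level goes 5, 8, 9, 10).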
PROACTIVE FAILURE HANDLING IN DATA PROCESSING SYSTEMS
Embodiments are directed to predicting the health of a computer node using health report data and to proactively handling failures in computer network nodes. In an embodiment, a computer system monitors various health indicators for multiple nodes in a computer network. The computer system accesses stored health indicators that provide a health history for the computer network nodes. The computer system then generates a health status based on the monitored health indicators and the health history. The generated health status indicates the likelihood that the node will be healthy within a specified future time period. The computer system then leverages the generated health status to handle current or predicted failures. The computer system also presents the generated health status to a user or other entity.
METHOD AND SYSTEM FOR DIAGNOSING REMAINING LIFETIME OF STORAGES IN DATA CENTER
A method and a system for diagnosing remaining lifetime of storages in a data center are disclosed. The method includes the steps of: a) sequentially and periodically collecting operating attributes of failed storages along with time-to-fail records of the failed storages in a data center; b) grouping the operating attributes collected at the same time or falling within a continuous period of time so that each group has the same number of operating attributes; c) sequentially marking the groups of operating attributes with time tags; d) generating a trend model of remaining lifetime of the storages from the operating attributes and time-to-fail records by ML and/or DL algorithm(s), with the groups of operating attributes and time-to-fail records fed in the order of the time tags; and e) inputting a set of operating attributes of a currently operating storage into the trend model to calculate a remaining lifetime therefor.