G06F11/2263

Machine defect prediction based on a signature

Methods, systems, and computer-readable media are presented for predicting defects using a machine learning component based on a generated signature. A trained machine learning component can be received that has been trained with historic data representing a series of events that occurred within a plurality of heterogeneous systems over a plurality of periods of change for those systems. A base signature for a first heterogeneous system that includes a first mix of modules can be compared to a current signature for the first heterogeneous system to identify one or more irregularities. The trained machine learning component can then predict one or more defects for the first heterogeneous system based on the identified irregularities.
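
The signature comparison step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a signature is a mapping from module name to version string; the abstract does not specify the signature format. The `predict_defects` function is a hypothetical stand-in for the trained machine learning component: it only looks up defects previously recorded after the same kind of change to the same module.

```python
from typing import Dict, List, Tuple

# Assumed signature format (hypothetical): module name -> version string.
Signature = Dict[str, str]
Irregularity = Tuple[str, str, str]  # (module, kind of change, detail)

def find_irregularities(base: Signature, current: Signature) -> List[Irregularity]:
    """List every module that was added, removed, or changed between two signatures."""
    irregularities: List[Irregularity] = []
    for module in sorted(base.keys() | current.keys()):
        old, new = base.get(module), current.get(module)
        if old is None:
            irregularities.append((module, "added", new))
        elif new is None:
            irregularities.append((module, "removed", old))
        elif old != new:
            irregularities.append((module, "changed", f"{old}->{new}"))
    return irregularities

def predict_defects(irregularities: List[Irregularity],
                    history: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Hypothetical stand-in for the trained model: recall defects that historically
    followed the same (module, kind of change) pair."""
    predicted: List[str] = []
    for module, kind, _ in irregularities:
        predicted.extend(history.get((module, kind), []))
    return predicted

base = {"db": "1.2", "cache": "3.0"}
current = {"db": "1.3", "queue": "0.9"}
found = find_irregularities(base, current)
print(found)
# [('cache', 'removed', '3.0'), ('db', 'changed', '1.2->1.3'), ('queue', 'added', '0.9')]
print(predict_defects(found, {("db", "changed"): ["connection pool exhaustion"]}))
# ['connection pool exhaustion']
```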

Facilitating detection of anomalies in data center telemetry

Detection of anomalies of a target entity is facilitated. A system can comprise a processor and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations. The operations can comprise training a model on a first set of variables that are constrained by a second set of variables. The second set of variables can characterize elements of a defined entity, and the first set of variables can define a normality of the defined entity. The operations can also comprise employing the model to identify, to at least a defined level of confidence, expected parameters and unexpected parameters associated with the defined entity.
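
One simple reading of "a first set of variables constrained by a second set" is conditional normality: learn what is normal for a metric separately for each combination of context variables. The Python sketch below assumes that interpretation, uses a per-context mean and standard deviation as the model (a swapped-in technique, not the patented one), and treats a value as expected when its z-score falls within the two-sided interval for the chosen confidence level under a normal assumption. The field names `model` and `temp_c` are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev
from typing import Dict, List, Tuple

Context = Tuple[object, ...]

def train(rows: List[dict], context_keys: List[str], metric: str) -> Dict[Context, Tuple[float, float]]:
    """Fit the mean and sample standard deviation of `metric` for each context."""
    groups: Dict[Context, List[float]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in context_keys)].append(row[metric])
    # stdev needs at least two samples per context
    return {ctx: (mean(vals), stdev(vals)) for ctx, vals in groups.items() if len(vals) >= 2}

def classify(model, row: dict, context_keys: List[str], metric: str,
             confidence: float = 0.95) -> str:
    """Label a reading as expected or unexpected relative to its own context."""
    ctx = tuple(row[k] for k in context_keys)
    if ctx not in model:
        return "unknown"  # no training data for this context
    mu, sigma = model[ctx]
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    if sigma == 0:
        return "expected" if row[metric] == mu else "unexpected"
    z = abs(row[metric] - mu) / sigma
    return "expected" if z <= z_crit else "unexpected"

rows = ([{"model": "A", "temp_c": t} for t in (40, 42, 44)]
        + [{"model": "B", "temp_c": t} for t in (60, 62, 64)])
m = train(rows, ["model"], "temp_c")
print(classify(m, {"model": "A", "temp_c": 60}, ["model"], "temp_c"))  # unexpected
print(classify(m, {"model": "B", "temp_c": 60}, ["model"], "temp_c"))  # expected
```

The same reading of 60 is anomalous in one context and normal in the other, which is the effect of constraining the normality variables by the characterizing variables.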

USING AN EVENT GRAPH SCHEMA FOR ROOT CAUSE IDENTIFICATION AND EVENT CLASSIFICATION IN SYSTEM MONITORING
20230267032 · 2023-08-24

An event graph schema may be determined for a technology landscape that is characterized using scores assigned to performance metrics. The event graph schema may include a plurality of nodes corresponding to the performance metrics and the scores, and directional edges connecting pairs of those nodes, with each directional edge having a score-dependent validity criterion defined by the scores of the corresponding node pair. Anomalous scores associated with an event within the technology landscape may be used to find anomalous nodes. Valid edges, each of which connects two anomalous nodes and satisfies its score-dependent validity criterion, may be used to determine at least one path that includes the valid edges and the connected anomalous nodes. The at least one path can then be traversed to identify at least one of the connected anomalous nodes as a root cause node of the event.
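
The traversal can be sketched in Python. The sketch assumes a node is anomalous when its score meets a fixed threshold, and it models each edge's score-dependent validity criterion as a predicate over the (source, target) scores; both are illustrative choices, as are the metric names. Root-cause candidates are taken to be the first nodes of maximal valid paths, i.e. anomalous nodes with outgoing valid edges but no incoming ones.

```python
from typing import Callable, Dict, List, Set, Tuple

# (source node, target node, validity criterion over (source score, target score))
Edge = Tuple[str, str, Callable[[float, float], bool]]

def find_root_causes(scores: Dict[str, float], edges: List[Edge],
                     threshold: float) -> Dict[str, List[List[str]]]:
    """Map each root-cause candidate to the valid anomalous paths starting at it."""
    anomalous = {n for n, s in scores.items() if s >= threshold}
    succ: Dict[str, List[str]] = {n: [] for n in anomalous}
    has_incoming: Set[str] = set()
    for u, v, is_valid in edges:
        # keep only edges between anomalous nodes whose criterion holds for the current scores
        if u in anomalous and v in anomalous and is_valid(scores[u], scores[v]):
            succ[u].append(v)
            has_incoming.add(v)

    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        extended = False
        for nxt in succ[path[-1]]:
            if nxt not in path:  # avoid cycling
                extended = True
                walk(path + [nxt])
        if not extended:
            paths.append(path)

    for start in sorted(n for n in anomalous if succ[n] and n not in has_incoming):
        walk([start])

    result: Dict[str, List[List[str]]] = {}
    for p in paths:
        result.setdefault(p[0], []).append(p)
    return result

def source_at_least_as_anomalous(src: float, dst: float) -> bool:
    return src >= dst  # hypothetical validity criterion

scores = {"disk_io": 0.9, "db_latency": 0.8, "api_errors": 0.7, "cpu": 0.1}
edges = [
    ("disk_io", "db_latency", source_at_least_as_anomalous),
    ("db_latency", "api_errors", source_at_least_as_anomalous),
    ("api_errors", "disk_io", lambda src, dst: src > dst),
    ("cpu", "api_errors", source_at_least_as_anomalous),
]
print(find_root_causes(scores, edges, threshold=0.5))
# {'disk_io': [['disk_io', 'db_latency', 'api_errors']]}
```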

Disaster prediction recovery: statistical content based filter for software as a service

A method is provided to predict a disaster for a computer system based on logs. The method includes representing existing logs as first vectors by tokenizing the existing logs and partitioning the first vectors into clusters. The clusters represent disaster types. The method further includes selecting representative vectors for the clusters, representing a new log of the computer system as a second vector by tokenizing the new log, matching the second vector to a cluster by comparing the second vector and the representative vectors, and categorizing the new log as a disaster type represented by the cluster.
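
A minimal Python sketch of the tokenize, cluster, and match steps follows. It assumes bag-of-words token counts as the vectors, cosine similarity as the comparison, and single-pass "leader" clustering in which each cluster's first member serves as its representative vector; the abstract does not name a clustering algorithm, so these are illustrative substitutions. The similarity threshold of 0.5 is arbitrary.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

def vectorize(log: str) -> Counter:
    """Tokenize on alphabetic runs; digits are dropped so IDs and timestamps don't split clusters."""
    return Counter(re.findall(r"[a-z]+", log.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors: List[Counter], threshold: float = 0.5) -> Tuple[List[Counter], List[List[int]]]:
    """Leader clustering: a vector joins the most similar cluster if similar enough,
    otherwise it founds a new cluster and becomes that cluster's representative."""
    representatives: List[Counter] = []
    members: List[List[int]] = []
    for i, vec in enumerate(vectors):
        best: Optional[int] = max(range(len(representatives)),
                                  key=lambda c: cosine(vec, representatives[c]), default=None)
        if best is not None and cosine(vec, representatives[best]) >= threshold:
            members[best].append(i)
        else:
            representatives.append(vec)
            members.append([i])
    return representatives, members

def categorize(new_log: str, representatives: List[Counter]) -> int:
    """Return the index of the cluster (disaster type) whose representative is most similar."""
    vec = vectorize(new_log)
    return max(range(len(representatives)), key=lambda c: cosine(vec, representatives[c]))

logs = [
    "disk full on volume 3",
    "disk full on volume 7",
    "connection timeout to database",
    "connection timeout to database replica",
]
reps, members = cluster([vectorize(line) for line in logs])
print(members)                                                # [[0, 1], [2, 3]]
print(categorize("timeout connecting to database 42", reps))  # 1
```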

System and method for automating fault detection in multi-tenant environments

A system and method for automating fault prediction and remediation in a multi-tenant system are disclosed. The system and method offer intelligent augmentation of a multi-tenant system by automating both the harvesting and processing of raw data logs generated by the various aspects of the platform and the implementation of an appropriate response. In some embodiments, the proposed system includes a hybrid model that can be configured to offer both assisted and unassisted errors. The incorporation of a dynamic learning algorithm minimizes operational errors for any set of computing units. Potential system faults can be detected with little to no human intervention, allowing for an unattended platform that collects performance data from a wide range of sources across the system. This platform provides centralized, automated fault prediction as well as expedited, automated resolution of such faults, again with little to no human intervention.
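
The abstract does not specify the dynamic learning algorithm. As an illustrative stand-in only, the Python sketch below keeps an exponentially weighted moving average (EWMA) of each computing unit's error count as an adaptive baseline, flags a likely fault when a new count exceeds a multiple of that baseline, and maps flagged units to a response from a hypothetical remediation playbook. The parameter values and unit names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FaultPredictor:
    alpha: float = 0.3   # weight of the newest observation in the baseline
    factor: float = 3.0  # flag when errors exceed factor x baseline
    warmup: int = 3      # observations required before a unit can be flagged
    baseline: Dict[str, float] = field(default_factory=dict)
    seen: Dict[str, int] = field(default_factory=dict)

    def observe(self, unit: str, errors: int) -> bool:
        """Record one harvested error count; return True if it looks like a fault."""
        n = self.seen.get(unit, 0)
        base = self.baseline.get(unit)
        # max(base, 1.0) keeps a near-zero baseline from flagging a single stray error
        flagged = (n >= self.warmup and base is not None
                   and errors > self.factor * max(base, 1.0))
        if not flagged:
            # learn only from normal periods so a fault does not inflate the baseline
            self.baseline[unit] = (errors if base is None
                                   else self.alpha * errors + (1 - self.alpha) * base)
        self.seen[unit] = n + 1
        return flagged

# Hypothetical remediation playbook keyed by unit; unknown units fall back to a ticket.
PLAYBOOK = {"tenant-a/db": "restart connection pool"}

def respond(unit: str) -> str:
    return PLAYBOOK.get(unit, "open incident ticket")

predictor = FaultPredictor()
for count in [2, 3, 2, 2, 20, 2]:
    if predictor.observe("tenant-a/db", count):
        print(f"fault predicted ({count} errors): {respond('tenant-a/db')}")
# fault predicted (20 errors): restart connection pool
```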

SYNTHETIC SYSTEM FAULT GENERATION
20220121540 · 2022-04-21

Systems, computer-implemented methods, and computer program products that facilitate synthetic system fault generation are provided. According to an embodiment, a system can comprise a processor that executes the following computer-executable components stored in a non-transitory computer readable medium: a generator component that employs a trained artificial intelligence (AI) model to generate a synthetic system fault, represented as a combination of discrete parameters and continuous parameters that define a system state; and a fault assembler component that analyzes the synthetic system fault and generates textual content corresponding to the synthetic system fault.
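
The split between a generator component and a fault assembler component can be illustrated with a short Python sketch. Here the trained AI model is replaced by independent random sampling over a hypothetical parameter space (two discrete and two continuous parameters), and the assembler renders the sampled system state as text through a fixed template. The described generator uses a trained model, which could capture dependencies between parameters that independent sampling ignores.

```python
import random
from dataclasses import dataclass

@dataclass
class SyntheticFault:
    component: str     # discrete parameter
    fault_type: str    # discrete parameter
    cpu_util: float    # continuous parameter, percent
    latency_ms: float  # continuous parameter, milliseconds

# Hypothetical parameter space; a trained generator would learn these distributions.
COMPONENTS = ["storage", "network", "compute"]
FAULT_TYPES = ["timeout", "crash", "degradation"]

def generate(rng: random.Random) -> SyntheticFault:
    """Stand-in for the trained AI model: samples each parameter independently."""
    return SyntheticFault(
        component=rng.choice(COMPONENTS),
        fault_type=rng.choice(FAULT_TYPES),
        cpu_util=round(rng.uniform(0, 100), 1),
        latency_ms=round(rng.lognormvariate(4, 1), 1),
    )

def assemble_text(fault: SyntheticFault) -> str:
    """Fault assembler: renders the sampled system state as a textual fault report."""
    return (f"{fault.fault_type.capitalize()} in {fault.component} subsystem "
            f"(CPU {fault.cpu_util}%, latency {fault.latency_ms} ms)")

rng = random.Random(7)  # fixed seed for reproducible synthetic data
for _ in range(3):
    print(assemble_text(generate(rng)))
```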

Automated Methods and Systems for Managing Problem Instances of Applications in a Distributed Computing Facility

Methods and systems described herein automate troubleshooting of a problem in the execution of an application in a distributed computing facility. The methods and systems learn interesting patterns in problem instances over time. The problem instances are displayed in a graphical user interface (“GUI”) that enables a user to assign a problem type label to each historical problem instance. A machine learning model is trained to predict problem types in executing the application based on the historical problem instances and their associated problem types. In response to detecting a run-time problem instance in the execution of the application, the machine learning model is used to determine one or more problem types associated with the run-time problem instance. The one or more problem types are rank-ordered, and a recommendation to correct the run-time problem instance may be generated based on the highest-ranked problem type.
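
The rank-and-recommend step can be sketched in Python with a nearest-neighbour stand-in for the trained model. Each historical problem instance is assumed to be a set of observed symptoms with a user-assigned problem type label; every problem type is scored by its most similar labeled instance under Jaccard similarity; and the top-ranked type selects a fix from a hypothetical recommendation table. Symptom names, labels, and fixes are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_problem_types(history: List[Tuple[FrozenSet[str], str]],
                       instance: FrozenSet[str]) -> List[Tuple[str, float]]:
    """Score each labeled problem type by its most similar historical instance, best first."""
    scores: Dict[str, float] = {}
    for symptoms, label in history:
        scores[label] = max(scores.get(label, 0.0), jaccard(symptoms, instance))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def recommend(ranked: List[Tuple[str, float]], fixes: Dict[str, str]) -> str:
    """Pick the fix for the highest-ranked problem type (hypothetical fix table)."""
    if not ranked:
        return "escalate to an operator"
    return fixes.get(ranked[0][0], "escalate to an operator")

history = [
    (frozenset({"high_cpu", "slow_response"}), "cpu_saturation"),
    (frozenset({"oom_kill", "restart"}), "memory_leak"),
    (frozenset({"high_cpu", "gc_pause", "restart"}), "memory_leak"),
]
ranked = rank_problem_types(history, frozenset({"high_cpu", "slow_response", "restart"}))
print(ranked)  # [('cpu_saturation', 0.6666666666666666), ('memory_leak', 0.5)]
print(recommend(ranked, {"cpu_saturation": "scale out the application tier"}))
```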

COMPUTER-IMPLEMENTED METHOD FOR TESTING A TECHNICAL SYSTEM
20210365338 · 2021-11-25

A computer-implemented method is provided for testing a technical system, in particular software, hardware, or an embedded system, in real time. The technical system comprises a plurality of components, in particular technical components, and is represented by a fuzzy fault tree topology $A_{ki}$. Starting from a fuzzy top event $X_k$, priorities of the base events are determined by the following steps: providing a fuzzy membership function matrix $W_i^{\lambda}$ of the base events, with $\lambda = 1$; and carrying out an iterative process in which each iteration $\lambda = 1, 2, 3, \ldots, n$ comprises (a) determining an auxiliary matrix $C_{ki}$ from the fuzzy top event $X_k$, the fuzzy fault tree topology $A_{ki}$, and the fuzzy membership function matrix $W_i^{\lambda}$ using an iterative algorithm, and (b) determining the fuzzy membership function matrix $W_i^{\lambda+1}$ from the auxiliary matrix $C_{ki}$ using a maximum likelihood method.

AUTOMATED SYSTEM FOR INTELLIGENT ERROR CORRECTION WITHIN AN ELECTRONIC BLOCKCHAIN LEDGER

A system for automated and intelligent error correction within an electronic blockchain ledger is provided. The system may analyze unformatted or unstructured blockchain event logs using machine learning algorithms in order to identify and label the errors within the event logs. Based on the identified errors, the system may use predictive analysis in conjunction with error or rule repositories and/or machine learning to identify potential solutions to the identified errors. Once the potential solutions have been identified, the system may automatically attempt to rectify the blockchain transaction errors using those solutions. The system may further provide trend and correlation analyses and reporting functions for various metrics, and may output those metrics in various accessible formats.
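
The identify, label, and look-up steps can be illustrated with a rule-based Python sketch. It substitutes regular-expression rules for the machine learning labeler (the abstract also mentions error or rule repositories), and both the rule repository and the solution repository below are illustrative assumptions. Automatically applying a fix to the ledger is left out.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule repository: (regex, error label), checked in order.
RULES: List[Tuple[str, str]] = [
    (r"nonce too low", "stale_nonce"),
    (r"insufficient funds", "insufficient_balance"),
    (r"out of gas", "gas_exhausted"),
]

# Hypothetical solution repository: error label -> candidate fixes in preference order.
SOLUTIONS: Dict[str, List[str]] = {
    "stale_nonce": ["refresh nonce and resubmit"],
    "gas_exhausted": ["re-estimate gas limit", "resubmit with a higher gas limit"],
}

def label_errors(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Attach the first matching error label to each unstructured log line."""
    labeled = []
    for line in log_lines:
        for pattern, label in RULES:
            if re.search(pattern, line, re.IGNORECASE):
                labeled.append((line, label))
                break  # first matching rule wins
    return labeled

def candidate_fixes(labeled: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Look up candidate solutions per label; an empty list means manual review."""
    return {label: SOLUTIONS.get(label, []) for _, label in labeled}

log = [
    "tx 0xab: Nonce too low",
    "block 12 committed",
    "tx 0xcd reverted: out of gas",
]
labeled = label_errors(log)
print([label for _, label in labeled])               # ['stale_nonce', 'gas_exhausted']
print(candidate_fixes(labeled)["gas_exhausted"][0])  # re-estimate gas limit
```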