Patent classifications
G06F11/0769
METHOD AND SYSTEM FOR ENHANCING ORCHESTRATION AND AUTOMATING COMMUNICATION BETWEEN TEAMS DURING IT SYSTEMS TESTING
An apparatus includes circuitry configured to maintain a record of a plurality of owners and at least one test operation owned by an owner; automatically prompt the owner in response to a failure of the one test operation; maintain a log of actions taken on the one test operation, and make the log of actions available; update an estimated time to completion, and notify a management entity of the updated estimated time to completion; mark and prioritize an order related to the one test operation, in response to the estimated time to completion being within a threshold of a delivery date; rank the marked order with other marked orders by a risk of not being able to meet the delivery date; and notify the owner of the ranking with an urgent message, in response to the marked order failing to meet the delivery date.
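The marking and ranking steps in this abstract can be illustrated with a minimal sketch. It assumes that "risk" is approximated by the slack between the delivery date and the estimated time to completion (less slack means more risk); the abstract does not define the risk measure, and the `Order` class, `mark_and_rank` function, and two-day threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Order:
    # Hypothetical record; field names are illustrative, not from the patent.
    order_id: str
    estimated_completion: datetime
    delivery_date: datetime

    @property
    def slack(self) -> timedelta:
        """Time left between estimated completion and the delivery date."""
        return self.delivery_date - self.estimated_completion


def mark_and_rank(orders, threshold=timedelta(days=2)):
    """Mark orders whose slack is within the threshold, then rank them.

    Assumption: smaller (or negative) slack stands in for a higher risk of
    missing the delivery date, so the riskiest order comes first.
    """
    marked = [o for o in orders if o.slack <= threshold]
    return sorted(marked, key=lambda o: o.slack)


orders = [
    Order("A", datetime(2024, 1, 10), datetime(2024, 1, 20)),  # 10 days of slack
    Order("B", datetime(2024, 1, 19), datetime(2024, 1, 20)),  # 1 day of slack
    Order("C", datetime(2024, 1, 22), datetime(2024, 1, 20)),  # already late
]
ranked_ids = [o.order_id for o in mark_and_rank(orders)]
```

Here order A is not marked because its slack exceeds the threshold, while C ranks ahead of B because its estimated completion already falls after the delivery date.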
Inference-based incident detection and reporting
Systems and methods may be used to provide an inference-based incident detection and reporting process. In one or more embodiments, an inference model may be used to track and analyze changes and incident data from various sources of record to quickly predict potential sources of errors and help focus in on the root cause of a software incident.
Detecting datacenter mass outage with near real-time/offline using ML models
The present embodiments relate to data center outage detection and alert generation. An outage detection service as described herein can process near real-time data from various sources in a datacenter and process the data using a model to determine one or more projected sources of a detected outage. The model as described herein can include one or more machine learning models incorporating a series of rules to process near real-time data and offline data and determine one or more projected sources of an outage. An alert message can be generated to provide the projected sources of the outage and other data relevant to the outage.
Real time application error identification and mitigation
A method may be performed by a network server and includes: applying historical raw user data as an input to a first application running in a simulation environment; receiving activity log entries from the first application in the simulation environment; identifying a first group of the activity log entries corresponding to an error of the first application; saving an error entry in non-volatile storage media in response to identifying the first group of the activity log entries, the error entry including the first group of the activity log entries and an identification of action to be taken in response to the error of the first application; and causing a monitoring system to generate a first alert corresponding to the error entry.
Capturing referenced information in a report to resolve a computer problem
Embodiments relate to capturing referenced information in a report to resolve a computer problem. A report for a problem is determined to contain at least one referenced document, the report being generated based on at least one log. Relevant content is determined in the at least one referenced document based at least in part on hint information associated with the at least one referenced document and a term in the report. An enhanced report is generated for the problem of the computer equipment by inserting the relevant content into the report. A problem experienced by computer equipment is resolved by causing execution of at least one operation identified in the enhanced report.
METHOD AND SYSTEM FOR MANAGING NOTIFICATIONS FOR FLAPPING INCIDENTS
Techniques for suppressing notifications are disclosed. An incident may repeatedly flap between various resolved and unresolved states. Furthermore, other incident attributes may flap between various states such as, for example, varying levels of incident severity. Each change in state results in the transmission of a notification. In order to reduce the number of notifications, the system estimates a flapping window for the incident based on the flapping behavior of prior incidents. The system computes a notification suppression window based at least in part on the estimated flapping window. The system suppresses notifications corresponding to changes in incident state that are detected during the notification suppression window. The notification suppression window may be extended in response to extending the estimated flapping window.
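A minimal sketch of the suppression logic described in this abstract follows. It assumes the flapping window is estimated as the median flapping duration of prior incidents and that the suppression window equals that estimate; the abstract specifies neither, and `estimate_flapping_window` and `Suppressor` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


def estimate_flapping_window(prior_durations_s):
    """Estimate, in seconds, how long an incident is likely to keep flapping.

    Assumption: the median duration over prior incidents is the estimator.
    """
    return float(median(prior_durations_s)) if prior_durations_s else 0.0


@dataclass
class Suppressor:
    window_s: float                       # notification suppression window (seconds)
    window_start: Optional[float] = None  # time of the last notified state change

    def should_notify(self, change_time_s: float) -> bool:
        """Return True if a state change at this time should send a notification."""
        in_window = (
            self.window_start is not None
            and change_time_s - self.window_start < self.window_s
        )
        if in_window:
            return False  # state change falls inside the suppression window
        self.window_start = change_time_s  # notify and open a new window
        return True


window = estimate_flapping_window([60, 120, 300])
suppressor = Suppressor(window_s=window)
decisions = [suppressor.should_notify(t) for t in (0, 30, 90, 130, 200)]
```

With a 120-second window, the state changes at 30 and 90 seconds are suppressed, the change at 130 seconds is notified and opens a new window, and the change at 200 seconds is suppressed again.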
DYNAMIC CLOUD BASED ALERT AND THRESHOLD GENERATION
Embodiments monitor for faults in a cloud based network for a plurality of features comprising an application and dependent features. Embodiments generate a graphical representation of the plurality of features comprising a plurality of nodes and corresponding relationships between the nodes, each node corresponding to one of the plurality of features. Embodiments monitor for events for the plurality of features, the events corresponding to one or more of the nodes, to generate monitored events. Embodiments populate a graph database with the monitored events and classify each of the nodes with a trained graph neural network (“GNN”), the classification comprising a prediction of a failure of at least one node. Based on the classifying, for a failure node corresponding to the prediction, embodiments generate a new alert for the failure node or revise a threshold for an existing alert for the failure node.
METHOD AND SYSTEM FOR DETECTING ERRORS IN LOCAL AREA NETWORK
Novel tools and techniques are provided for implementing error detection in a network, and, more particularly, to methods, systems, and apparatuses for implementing error and/or fault detection in a network and/or media stream and providing options to address the error and/or fault in the network and/or media stream. In various embodiments, a computer might detect an error in a first network and send a notification indicating that the error has occurred. The notification might contain one or more options to address the error in the first network. The computer, a user device, a service provider device, or a content provider device might receive and display the notification containing the one or more options. The computer, the user device, the service provider device, or the content provider device might then select at least one of the one or more options to address the error in the first network.
AUTONOMOUS RELEASE MANAGEMENT IN DISTRIBUTED COMPUTING SYSTEMS
Implementations described herein relate to methods, systems, and computer-readable media to provide an alert based on a release of a software application implemented in a distributed computing system. In some implementations, the method includes receiving, at a processor, an indication of the release of the software application, obtaining a first set of metric values for each metric of a list of metrics for a first time period preceding a time of release of the release, obtaining a second set of metric values for each metric of the list of metrics for a second time period following the time of release, comparing the first set of metric values to the second set of metric values to determine a deviation score, generating an alert based on the deviation score, and transmitting the alert via one of a user interface and a communication channel.
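The comparison of pre-release and post-release metric values can be sketched as follows. The abstract does not say how the deviation score is computed, so this example assumes one possible choice: the largest shift in a metric's mean, measured in units of its pre-release standard deviation. The function names and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, pstdev


def deviation_score(before, after):
    """Return the largest normalized mean shift across all metrics.

    before/after map a metric name to its list of values for the period
    preceding and following the release. Assumption: the shift is divided by
    the pre-release population standard deviation (or 1.0 if that is zero).
    """
    scores = []
    for metric, pre in before.items():
        post = after[metric]
        spread = pstdev(pre) or 1.0  # avoid dividing by zero for flat metrics
        scores.append(abs(mean(post) - mean(pre)) / spread)
    return max(scores, default=0.0)


def release_alert(before, after, threshold=3.0):
    """Build an alert decision from the deviation score (threshold is illustrative)."""
    score = deviation_score(before, after)
    return {"score": score, "alert": score > threshold}


before = {"error_rate": [1.0, 2.0, 3.0], "latency_ms": [100, 100, 100]}
after = {"error_rate": [8.0, 8.0, 8.0], "latency_ms": [100, 100, 100]}
result = release_alert(before, after)
```

In this example the error rate shifts by far more than three pre-release standard deviations, so `result["alert"]` is true, while the unchanged latency contributes a score of zero.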
MONITORING OF AN AUTOMATED END-TO-END CRASH ANALYSIS SYSTEM
A computer-implemented method for monitoring a crash analysis system is disclosed. Log messages pertaining to the operation of a crash analysis system for analyzing a core dump are accessed. The log messages are analyzed, at a processor, in order to generate operation results data. A graphical user interface for display on a computer is generated. The graphical user interface includes a graphical representation of the operation results data.