Patent classifications
G06F11/0715
Application logging mechanism
A system to facilitate application logging is described. The system includes a processor and a machine-readable medium storing instructions that, when executed, cause the processor to record a system state, perform application logging at a first logging rate, record an occurrence of task failures during the logging, determine a predicted queue size threshold value based on the recorded occurrence of task failures, determine whether the predicted queue size threshold value is less than an actual queue size, and perform the application logging at a second logging rate upon a determination that the predicted queue size threshold value is less than the actual queue size, wherein the second logging rate is greater than the first logging rate.
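The rate-switching logic in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the predicted threshold is derived from the failure count, so the linear rule below, the class name `AdaptiveLogger`, and all default values are hypothetical. Falling back to the first rate when the condition no longer holds is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveLogger:
    """Hypothetical sketch of failure-driven logging rate selection."""
    first_rate: float = 1.0      # assumed unit: log records per second
    second_rate: float = 10.0    # must be greater than first_rate
    base_threshold: int = 100    # assumed starting queue size threshold
    failures: int = 0
    rate: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        if self.second_rate <= self.first_rate:
            raise ValueError("second_rate must be greater than first_rate")
        self.rate = self.first_rate

    def record_failure(self) -> None:
        """Record one task failure observed during logging."""
        self.failures += 1

    def predicted_threshold(self) -> int:
        # Assumption: each recorded failure lowers the threshold by 10,
        # with a floor of 1. The abstract does not specify this rule.
        return max(1, self.base_threshold - 10 * self.failures)

    def update(self, actual_queue_size: int) -> float:
        """Pick the logging rate for the current queue size."""
        if self.predicted_threshold() < actual_queue_size:
            self.rate = self.second_rate
        else:
            # Assumption: revert to the first rate otherwise.
            self.rate = self.first_rate
        return self.rate
```

With these assumed defaults, a queue of 50 items is logged at the first rate until six failures lower the predicted threshold to 40, at which point `update(50)` selects the second rate.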
ELECTRONIC SYSTEM FOR MONITORING AND AUTOMATICALLY CONTROLLING BATCH PROCESSING
Systems, computer program products, and methods are described herein for monitoring and automatically controlling batch processing. The present invention may be configured to receive a plurality of data processing requests and determine a processing plan for the plurality of data processing requests. The present invention may be configured to provide, to processing applications and based on the processing plan, actions for performance by the processing applications to complete the plurality of data processing requests. The present invention may be configured to determine a state of the plurality of data processing requests, determine, using an event state decision machine learning model, remedial actions to resolve an error state, and provide instructions to the processing applications to perform the remedial actions.
AUTO-RECOVERY FRAMEWORK
The present disclosure relates to computer-implemented methods, software, and systems for an automatic recovery job execution through a scheduling framework in a cloud environment. One or more recovery jobs are scheduled to be performed periodically for one or more registered service components included in a service instance running on a cluster node of a cloud platform. Each recovery job is associated with a corresponding service component of the service instance. A health check operation is invoked at a service component based on executing a recovery job at the scheduling framework corresponding to the service component. In response to determining that the service component needs a recovery measure based on a result from the health check operation, a recovery operation is invoked as part of executing a set of scheduled routines of the recovery job. Implemented logic for the recovery operation is stored and executed at the service component.
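The division of labor described here, where the scheduling framework triggers jobs while the health check and recovery logic live in the service component, might be sketched as follows. All class and method names are hypothetical, and a manual `tick()` call stands in for periodic scheduling.

```python
from typing import Dict, List


class ServiceComponent:
    """Hypothetical component that owns its health check and recovery logic."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.recoveries = 0

    def health_check(self) -> bool:
        return self.healthy

    def recover(self) -> None:
        # The recovery logic is stored and executed at the component.
        self.healthy = True
        self.recoveries += 1


class RecoveryJob:
    """One job per registered component."""

    def __init__(self, component: ServiceComponent) -> None:
        self.component = component

    def run(self) -> bool:
        """Return True if a recovery operation was invoked."""
        if self.component.health_check():
            return False
        self.component.recover()
        return True


class Scheduler:
    """Stand-in for the scheduling framework; tick() models one period."""

    def __init__(self) -> None:
        self.jobs: Dict[str, RecoveryJob] = {}

    def register(self, component: ServiceComponent) -> None:
        self.jobs[component.name] = RecoveryJob(component)

    def tick(self) -> List[str]:
        """Run every job once and list the components that were recovered."""
        return [name for name, job in self.jobs.items() if job.run()]
```

In this sketch the scheduler never inspects component internals. It only invokes the job, which keeps recovery behavior specific to each component.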
DETECTING CHANGES IN APPLICATION BEHAVIOR USING ANOMALY CORROBORATION
Unexpected changes in application behavior are detected using time series telemetry in a manner that minimizes the upfront knowledge needed about the time series measurements themselves. A univariate analysis is performed on a set of operational metrics to identify anomalous signals indicative of potential anomalous events. A multivariate analysis is then performed to corroborate anomalous signals across groupings of metrics and to thereby determine that the anomalous signals correspond to an anomalous event. A boundary of the anomalous event is determined and the anomalous event is characterized based on a clustering of previously identified incidents. One or more participating entities are then determined for the anomalous event and a narrative description of the anomalous event is generated and presented to an end user. The narrative description identifies, among other things, the characterization of the anomalous event, a predicted cause of the anomalous event, and the participating entities.
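The two-stage idea of flagging signals per metric and then corroborating them across a group of metrics can be sketched in a few lines of Python. The abstract does not name a detection method, so the z-score test, the `min_metrics` rule, and the function names are assumptions chosen for illustration. Boundary detection, clustering, and narrative generation are not covered.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set


def univariate_anomalies(series: List[float], z_threshold: float = 2.0) -> Set[int]:
    """Return the indices whose z-score exceeds the threshold (assumed test)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return set()
    return {i for i, x in enumerate(series) if abs(x - mu) / sigma > z_threshold}


def corroborated_events(
    metrics: Dict[str, List[float]],
    min_metrics: int = 2,
    z_threshold: float = 2.0,
) -> Dict[int, List[str]]:
    """Keep only time steps flagged by at least min_metrics metrics."""
    flagged: Dict[int, List[str]] = {}
    for name, series in metrics.items():
        for i in univariate_anomalies(series, z_threshold):
            flagged.setdefault(i, []).append(name)
    return {i: sorted(names) for i, names in flagged.items() if len(names) >= min_metrics}


# Hypothetical telemetry: latency and errors spike together at step 9,
# while memory spikes alone at step 3 and is therefore not corroborated.
metrics = {
    "latency": [10.0] * 9 + [100.0],
    "errors": [1.0] * 9 + [50.0],
    "memory": [1.0, 1.0, 1.0, 20.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "cpu": [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0],
}
events = corroborated_events(metrics)  # {9: ["errors", "latency"]}
```

Requiring agreement across metrics filters out the isolated memory spike, which is the role the abstract assigns to the multivariate step.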
Application crash analysis techniques when memory dump and debug symbols are not co-located
Distributed architectures are described that can generate a crash report without concurrent access to debug symbols and a memory dump. Besides grabbing generic information from a memory dump, which most debuggers provide (e.g., call stack traces for all threads), developer-provided scripts can grab application-specific details such as tables of open protocol client connections and so forth. This extraction can take place at a customer site where a crash occurred and can proceed without the use of debug symbols. The extraction can generate a crash report which can be sent back from the field. Once in the hands of engineering (e.g., at a provider site where debug symbols are extant but the memory dump is not), debug symbols can then be used in an annotation phase to fill in missing details (e.g., the precise source files and line numbers corresponding to call stack traces) without access to the original crash dump.
Automated crash recovery
Methods for improving operation of a user device running an application. The methods include collecting a first set of data corresponding to a run time environment of the application, collecting a second set of data corresponding to a crash of the application, identifying a cause of the crash based on the first set of data and the second set of data, and determining the cause of the crash is associated with an application feature corresponding to a feature flag.
Grouping anomalous components of a distributed application
Systems, devices, computer-implemented methods, and/or computer program products that facilitate fault localization for distributed applications. In one example, a system can comprise a processor that executes computer executable components stored in memory. The computer executable components can comprise a graph compiler and a partitioning component. The graph compiler can generate a hypernode graph for a distributed application using causality data inferred from log data generated by services emitting errors. The distributed application can comprise the services. The partitioning component can cluster the services into a plurality of groups using the hypernode graph and an objective function.
AUTOMATED SUPPORT QUERY
Support queries are automatically provided. An error is identified during execution of an application. A support query for the error is produced using the source code for the application with the support query including context information for the error. The support query is displayed to a user.
REDUCING FILE SYSTEM CONSISTENCY CHECK DOWNTIME
Provided is a method for performing a file system consistency check. The method comprises calculating, by a first thread that does not have access to an inode table, file block addresses for one or more files to be checked by the thread. The method further comprises collecting validity information for the one or more files. The method further comprises reading information relating to the one or more files from the inode table. The reading is performed in response to the thread being given access to the inode table after the calculating operation. The method further comprises validating the information by comparing the information from the inode table to the validity information.
Dynamic asynchronous communication management
A method for performing asynchronous communication includes executing a thread, by a computing system, the thread comprising an instruction to send a message to a remote system under a current set of conditions. The method further includes waiting, by the computing system, for a response to the message for a predetermined amount of time, wherein the predetermined amount of time is calculated based on historical data, wherein the historical data comprises one or more response times for respective one or more messages, wherein the one or more messages are associated with a set of conditions that correspond to the current set of conditions.
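A wait time derived from historical response times under matching conditions could look like the Python sketch below. The abstract does not give a formula, so the rule used here (mean response time multiplied by a safety margin), the exact-match lookup on conditions, the default timeout, and the class name `TimeoutEstimator` are all assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, Dict, FrozenSet, List, Tuple

ConditionKey = FrozenSet[Tuple[str, str]]


class TimeoutEstimator:
    """Hypothetical estimator of how long to wait for a response."""

    def __init__(self, default_timeout: float = 5.0, margin: float = 1.5) -> None:
        self.default_timeout = default_timeout  # used when no history matches
        self.margin = margin                    # assumed safety factor
        self.history: DefaultDict[ConditionKey, List[float]] = defaultdict(list)

    @staticmethod
    def _key(conditions: Dict[str, str]) -> ConditionKey:
        # Assumption: conditions "correspond" only when they match exactly.
        return frozenset(conditions.items())

    def record(self, conditions: Dict[str, str], response_time: float) -> None:
        """Store an observed response time under the given conditions."""
        self.history[self._key(conditions)].append(response_time)

    def timeout_for(self, conditions: Dict[str, str]) -> float:
        """Return the wait time for a message sent under these conditions."""
        samples = self.history.get(self._key(conditions))
        if not samples:
            return self.default_timeout
        return self.margin * mean(samples)
```

For example, after recording response times of 1.0 and 3.0 seconds under one set of conditions, `timeout_for` returns 3.0 seconds for those conditions and the 5.0 second default for any unseen set.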