G06F11/0781

Data processing platform monitoring

A device may receive error data indicating that an error occurred, the error being associated with a data processing job scheduled to be performed by a data processing platform. The device may identify input data for the data processing job associated with the error and determine that the error is based on the data processing platform not receiving the input data. In addition, the device may determine a location of the input data and determine a measure of priority associated with the data processing job. Based on the location of the input data and the measure of priority, the device may perform an action to correct the error.
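The triage flow this abstract describes (detect a missing-input error, determine where the input lives, weigh job priority, then pick a corrective action) might be sketched as follows. Every name, priority cutoff, and action label here is illustrative, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class JobError:
    job_id: str
    input_path: str
    input_present: bool
    priority: int  # higher = more urgent

def correct_missing_input(err: JobError) -> str:
    """Pick a corrective action for a job whose input data never arrived."""
    if err.input_present:
        return "no-action"  # error was not caused by missing input
    # Location of the input data: remote store vs. locally generated.
    remote = err.input_path.startswith(("s3://", "hdfs://"))
    if err.priority >= 8:
        # High-priority jobs: recover the input immediately and rerun.
        return "fetch-and-rerun" if remote else "regenerate-and-rerun"
    # Lower priority: requeue the job and wait for the input to appear.
    return "requeue"
```

The action chosen depends jointly on the input's location and the job's measure of priority, mirroring the two factors the abstract names.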

System and method for dynamic log management of stream processing in a distributed environment

A system and method are described for dynamic log management of stream processing in a distributed computing environment, such as a streaming application or stream analytics system. A streaming application can be deployed or published to a cluster, to execute as a client application. A cluster manager coordinates with worker nodes to commit tasks associated with the streaming application. If a need arises to generate lower-level log data associated with the streaming application, for example to diagnose an underlying cause of a warning/error message, a configuration job can be committed to the cluster to execute as a separate log-configuration application. The log-configuration application operates with the cluster manager to determine the set of worker nodes currently associated with the streaming application, and modifies the logger configuration at those nodes to record or otherwise provide log data according to a modified logging level, for example to provide lower-level log messages.
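A minimal sketch of that reconfiguration step, assuming a stub cluster manager that knows which workers run the application: the "configuration job" looks up those workers and lowers each node's logger level so DEBUG messages are recorded. All class and node names are hypothetical.

```python
import logging

class ClusterManager:
    """Stub: maps an application name to the worker nodes running it."""
    def __init__(self, assignments):
        self._assignments = assignments  # app name -> list of node ids

    def workers_for(self, app):
        return self._assignments.get(app, [])

def apply_log_config(manager, app, level):
    """Modify the logger configuration on each worker running `app`."""
    loggers = {}
    for node in manager.workers_for(app):
        logger = logging.getLogger(f"{app}.{node}")
        logger.setLevel(level)  # switch the node to the lower level
        loggers[node] = logger
    return loggers

mgr = ClusterManager({"stream-app": ["node-1", "node-2"]})
loggers = apply_log_config(mgr, "stream-app", logging.DEBUG)
```

In a real cluster the level change would be pushed to remote worker processes rather than applied to in-process `logging` objects; the lookup-then-modify shape is the point of the sketch.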

POLICY-BASED LOGGING USING WORKLOAD PROFILES
20220398151 · 2022-12-15

Examples described herein relate to policy-based logging using workload profiles. A workload profile of a first workload is identified. Based on the workload profile, a logging policy, which includes a predefined log pattern and a predefined log depth, is compiled. Workload log messages of a predetermined log level that are associated with the first workload are stored in a cache memory. On detecting the predefined log pattern in the workload log messages stored in the cache memory, the workload log messages are logged to a log file based on the predefined log depth.
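The cache-then-flush behavior this abstract describes might look like the following sketch, where the "log file" is a plain list and the policy's pattern and depth are invented values: messages of the predetermined level accumulate in a bounded cache, and when the predefined pattern appears, the most recent `log_depth` messages are written out.

```python
import re
from collections import deque

class PolicyLogger:
    """Illustrative policy-based logger: cache messages, flush on pattern."""
    def __init__(self, pattern, log_depth, cache_size=100):
        self.pattern = re.compile(pattern)   # predefined log pattern
        self.log_depth = log_depth           # predefined log depth
        self.cache = deque(maxlen=cache_size)
        self.log_file = []                   # stands in for the log file

    def handle(self, message):
        self.cache.append(message)
        if self.pattern.search(message):
            # Flush the last `log_depth` cached messages for context.
            self.log_file.extend(list(self.cache)[-self.log_depth:])

pl = PolicyLogger(r"OOM", log_depth=3)
for m in ["start", "warn: slow", "alloc failed", "OOM killed"]:
    pl.handle(m)
```

The cache keeps recent context cheap to retain, while the depth bounds how much of it reaches durable storage when the pattern fires.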

Method and system for determining maturity level of a cloud computing service

Systems and methods are disclosed that evaluate maturity or readiness of services of a cloud infrastructure client instance. Service maturity may be defined based on certain predetermined maturity criteria, which may evaluate certain corresponding service parameters of the service, such as a software component status of a service, a hardware component status of the service, and/or an error status of the service. In some embodiments, the definition of service maturity may be customized by a user. For example, the user may adjust, add, or remove the maturity criteria. The systems and methods may determine the service parameters of the service, and determine the maturity of the service by applying the maturity criteria to the determined service parameters. The service maturity may then be output (e.g., displayed or sent) to the user.
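One way to picture user-customizable maturity criteria is as a dictionary of predicates over service parameters, which a user can add to or prune. The parameters and the scoring rule below (fraction of criteria met) are assumptions for illustration, not taken from the disclosure:

```python
# Each criterion is a predicate over the determined service parameters.
default_criteria = {
    "software_status": lambda p: p.get("software_status") == "up-to-date",
    "hardware_status": lambda p: p.get("hardware_status") == "healthy",
    "error_status":    lambda p: p.get("error_count", 0) == 0,
}

def maturity(params, criteria=default_criteria):
    """Apply the maturity criteria to the service parameters."""
    met = sum(1 for check in criteria.values() if check(params))
    return met / len(criteria) if criteria else 0.0

service = {"software_status": "up-to-date",
           "hardware_status": "healthy",
           "error_count": 2}
```

Customization is then just editing the dictionary, e.g. removing the error criterion raises this service's score to 1.0.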

SYSTEM AND METHOD FOR ERROR REPORTING AND HANDLING

A system for error reporting and handling includes a memory storing an error handler, a processor configured to execute the error handler, and a buffer. The error handler is configured to receive an error message from a system on chip (SOC) platform. The error message indicates that a plurality of errors have occurred in the SOC platform, and the buffer stores the error message. The error handler is further configured to report the error message using a single interrupt, and to handle the errors in the error message using the single interrupt.
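The batching idea (one buffered message carrying several errors, serviced under one interrupt rather than one interrupt per error) can be sketched as follows; the counters and error codes are invented stand-ins:

```python
class ErrorHandler:
    """Illustrative handler: one interrupt per error message, not per error."""
    def __init__(self):
        self.buffer = []      # stores received error messages
        self.interrupts = 0   # interrupts raised so far
        self.handled = []     # errors serviced so far

    def receive(self, error_message):
        """error_message: list of error codes reported together."""
        self.buffer.append(error_message)
        self.interrupts += 1  # a single interrupt for the whole message
        for error in error_message:
            self.handled.append(error)  # service each error in turn

handler = ErrorHandler()
handler.receive(["ECC_ERR", "BUS_TIMEOUT", "THERMAL_WARN"])
```

Three errors arrive, but the interrupt count advances by one, which is the amortization the single-interrupt scheme buys.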

Selective data synchronization to troubleshoot production environment failures

In a production environment, an entry is logged in a log journal to represent a read operation on a record of a file. A problematic transaction in a batch job is selected, and a set of operations performed by the problematic transaction is intercepted. The set includes a combination of read, write, update, and delete operations, as well as operations that use external resources. A window of entries is determined in the log journal, where the entry is the beginning entry in the window and the ending entry is the last entry captured in the log journal before the error is detected. From the window, those entries that correspond to the record are filtered. A final value of the record is copied from the production environment to a development environment and rolled back in the development environment to an initial value stored in the beginning entry. The problematic transaction is then performed in the development environment.
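The window-and-rollback step might be sketched like this, with journal entries modeled as `(record_id, operation, value)` tuples and the window bounds given as indices; all record names and values are hypothetical:

```python
def rollback_value(journal, record_id, begin_index, error_index):
    """Window runs from the beginning read entry up to (but not including)
    the index at which the error was detected; filter it to one record."""
    window = journal[begin_index:error_index]
    entries = [e for e in window if e[0] == record_id]
    initial = entries[0][2]   # value captured by the beginning read entry
    final = entries[-1][2]    # final value copied from production
    return final, initial     # dev environment rolls final back to initial

journal = [
    ("acct-7",  "read",   100),
    ("acct-9",  "read",   55),
    ("acct-7",  "update", 80),
    ("acct-7",  "update", 60),
    ("order-3", "write",  1),
]
final, initial = rollback_value(journal, "acct-7", 0, 4)
```

Here the development copy would start from the synchronized final value 60, be rolled back to the initial value 100, and the problematic transaction would then be replayed against it.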

SELF-OPTIMIZING CONTEXT-AWARE PROBLEM IDENTIFICATION FROM INFORMATION TECHNOLOGY INCIDENT REPORTS
20230100716 · 2023-03-30

Information technology service management (ITSM) incident reports are converted from textual data to multiple vectors using an encoder and parameters are selected, where the parameters include a base cluster number and a threshold value. A base group of clusters is generated using an unsupervised machine learning clustering algorithm with the vectors and the parameters as input. A cluster quality score is computed for each of the base group of clusters. Each cluster from the base group of clusters with the cluster quality score above the threshold value is recursively split into new clusters until the cluster quality score for each cluster in the new clusters is below the threshold value. A final group of clusters is output, where each cluster from the final group of clusters represents ITSM incident reports related to a same problem.
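The recursive splitting loop can be illustrated on one-dimensional "vectors" with two deliberate stand-ins: splitting a cluster at its mean takes the place of the unsupervised clustering algorithm, and a cluster's value spread takes the place of the cluster quality score. A cluster keeps splitting while its score is above the threshold.

```python
def quality(cluster):
    """Stand-in quality score: spread of the cluster's values."""
    return max(cluster) - min(cluster) if len(cluster) > 1 else 0.0

def split(cluster):
    """Stand-in clustering step: partition around the mean."""
    mean = sum(cluster) / len(cluster)
    left = [v for v in cluster if v < mean]
    right = [v for v in cluster if v >= mean]
    return [c for c in (left, right) if c]

def recursive_clusters(cluster, threshold):
    if quality(cluster) <= threshold or len(cluster) < 2:
        return [cluster]
    final = []
    for part in split(cluster):
        if part == cluster:  # guard: split made no progress
            return [cluster]
        final.extend(recursive_clusters(part, threshold))
    return final

vectors = [0.1, 0.2, 0.15, 5.0, 5.1, 9.8]
clusters = recursive_clusters(vectors, threshold=1.0)
```

The invariant that matters survives the stand-ins: every cluster in the final group has a quality score at or below the threshold, so each plausibly groups reports about a single problem.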

Aggregation analysis and remediation of data invalidations

The present disclosure relates to processing operations that assess the impact of data invalidations and manage their remediation based on results of an assessment of the impact of the data invalidations on operation of an application/service. Identified data invalidations may be aggregated and analyzed. In one non-limiting example, types of data invalidations are aggregated over a temporal count to identify recent data invalidations. Analysis of aggregated types of data invalidations comprises evaluating an intensity of the types of data invalidations identified within the temporal count. Identified data invalidations may be ranked based on intensity analysis identifying impact on presentation of content through an application/service during the temporal count. Remediation of data invalidations may be managed based on the ranking processing. For example, one or more data invalidations may be prioritized for remediation processing to correct an underlying data structure associated with an error.
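A small sketch of the aggregate-and-rank step, under the assumption (not stated in the abstract) that "intensity" is simply the count of an invalidation type within the recent temporal window; event types and timestamps are invented:

```python
from collections import Counter

def rank_invalidations(events, now, window):
    """events: list of (timestamp, invalidation_type) pairs.
    Aggregate types over the recent window and rank by intensity."""
    recent = [t for ts, t in events if now - ts <= window]
    intensity = Counter(recent)
    # Highest intensity first, i.e. prioritized for remediation.
    return [t for t, _ in intensity.most_common()]

events = [
    (100, "schema-mismatch"),
    (150, "stale-cache"),
    (160, "schema-mismatch"),
    (170, "schema-mismatch"),
    (20,  "stale-cache"),   # outside the window, ignored
]
order = rank_invalidations(events, now=180, window=100)
```

Remediation would then start from the head of the returned list, correcting the data structures behind the most intense invalidation type first.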

DETECTING SYSTEM EVENTS BASED ON USER SENTIMENT IN SOCIAL MEDIA MESSAGES

Methods and systems are disclosed herein for using anomaly detection in timeseries data of user sentiment to detect incidents in computing systems and identify events within an enterprise. An anomaly detection system may receive social media messages that include a timestamp indicating when each message was published. The system may generate sentiment identifiers for the social media messages. The sentiment identifiers and timestamps associated with the social media messages may be used to generate a timeseries dataset for each type of sentiment identifier. The timeseries datasets may be input into an anomaly detection model to determine whether an anomaly has occurred. The system may retrieve textual data from the social media messages associated with the detected anomaly and may use the text to determine a computing system or event associated with the detected anomaly.
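The pipeline (sentiment identifiers plus timestamps become a per-sentiment count series, which feeds an anomaly detector) might be sketched as below. A simple z-score rule stands in for the anomaly detection model, and the bucket size and threshold are illustrative:

```python
import statistics

def to_series(messages, sentiment, bucket_size):
    """messages: (timestamp, sentiment_id) pairs -> counts per time bucket."""
    buckets = {}
    for ts, s in messages:
        if s == sentiment:
            b = ts // bucket_size
            buckets[b] = buckets.get(b, 0) + 1
    last = max(buckets)
    return [buckets.get(b, 0) for b in range(last + 1)]

def is_anomalous(series, z_threshold=2.0):
    """Stand-in detector: flag the latest bucket if it is a z-score outlier."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    return stdev > 0 and (series[-1] - mean) / stdev > z_threshold

# Steady trickle of negative messages, then a burst in the final bucket.
messages = [(t, "negative") for t in range(5, 90, 10)] + \
           [(t, "negative") for t in range(91, 97)] + \
           [(12, "positive")]
series = to_series(messages, "negative", bucket_size=10)
```

When the final bucket is flagged, the system would go back to the messages in that bucket and mine their text for the affected computing system or event.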

SYSTEMS AND METHODS FOR IDENTIFICATION OF ISSUE RESOLUTIONS USING COLLABORATIVE FILTERING

Systems and methods are provided for recommending resolutions to issues occurring during operation of an Information Handling System (IHS). Similar issues, each associated with resolution information specifying one or more recommended resolutions, are identified within an issue repository. Collaborative filtering is applied to the similar issues and associated resolution information to generate a first ranked list. Collaborative filtering is also applied to a subset of the similar issues and associated resolution information to generate a second ranked list, where the issues in the subset are each selected based on being within a time proximity to the reported issue. A linear regression is applied to the first ranked list and the second ranked list to generate a ranked list of recommended resolutions to the reported issue, which is then provided to an administrator.
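The final combination step might look like the sketch below, with two assumptions flagged up front: the per-resolution scores are given directly (standing in for the two collaborative-filtering passes), and the blending weights are fixed constants standing in for coefficients a linear regression would learn:

```python
# Assumed "learned" regression weights for the all-issues list and the
# time-proximate (recent) list, respectively.
W_ALL, W_RECENT = 0.4, 0.6

def combine_rankings(all_issue_scores, recent_scores):
    """Each argument maps resolution -> CF score; returns a ranked list."""
    resolutions = set(all_issue_scores) | set(recent_scores)
    blended = {
        r: W_ALL * all_issue_scores.get(r, 0.0)
           + W_RECENT * recent_scores.get(r, 0.0)
        for r in resolutions
    }
    return sorted(blended, key=blended.get, reverse=True)

# Hypothetical scores from the two collaborative-filtering passes.
all_scores = {"reseat-dimm": 0.9, "update-bios": 0.7, "replace-psu": 0.2}
recent_scores = {"update-bios": 0.9, "reseat-dimm": 0.3}
ranked = combine_rankings(all_scores, recent_scores)
```

Weighting the time-proximate list more heavily lets a recently effective resolution ("update-bios" here) outrank one that was merely popular historically.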