G06F11/0781

LINKING RELATED EVENTS FOR VARIOUS DEVICES AND SERVICES IN COMPUTER LOG FILES ON A CENTRALIZED SERVER

A system with an interactive user interface for users to view and interact with sanitized log data received from a plurality of hosts, such as those associated with various services of an organization. The system may receive, from the hosts, log files and/or metadata that have been filtered by agents executing on the respective hosts to remove or anonymize any sensitive or confidential information prior to transmission to the system. In some embodiments, the system performs further filtering of the sanitized data. Received sanitized data is parsed, indexed, and/or otherwise processed for optimal searching, and stored in a log pipeline. The system causes display of an electronic visualization interface.
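The host-side sanitization step might look roughly like the sketch below. The two regular expressions and the placeholder strings are assumptions made for illustration; the abstract does not say which kinds of sensitive data the agents remove or how.

```python
import re

# Hypothetical patterns; the abstract does not list what counts as sensitive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize_line(line: str) -> str:
    """Replace e-mail addresses and IPv4 addresses with fixed placeholders."""
    line = EMAIL.sub("<email>", line)
    line = IPV4.sub("<ip>", line)
    return line
```

An agent could apply `sanitize_line` to each log line before transmission, so that the central system only ever stores the placeholders.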

SYSTEM AND METHOD FOR MANAGING FAULTS IN INTEGRATED CIRCUITS
20230185656 · 2023-06-15 ·

An integrated circuit includes a functional circuit, a detection circuit, a processing circuit, and a recovery circuit. The detection circuit detects a fault in the functional circuit and generates a fault indication indicative of the detected fault. The processing circuit receives the fault indication and identifies a functional domain identifier (ID) associated with the fault. Based on the fault indication, the processing circuit generates context tag data that is indicative of a type of the fault and an operational state of the functional circuit when the fault is detected therein. Further, the processing circuit assigns a priority level to the fault based on the context tag data and the functional domain ID. The recovery circuit performs, based on the functional domain ID, the context tag data, and the priority level, a recovery operation to recover the functional circuit from the fault.

METHOD FOR MANAGING A PLURALITY OF EVENTS

Event management system and method. Events comprise physical and logical attributes. Tuples are created to identify a set of logical attributes. The tuples are arranged in hierarchized relations by creating binarized co-occurrence matrices, each co-occurrence matrix reflecting different time intervals and indicating occurrence of tuples in time windows of the time intervals. Tuple pairs are analyzed to determine a probabilistic score related to co-occurrence, and tuple families are created from tuple pairs based on the probabilistic score. From tuple families, events are used to extract tuple instances including physical attributes, which are arranged as tuple-instance families using the corresponding tuple families as reference.
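One way to picture the binarized occurrence data and the pair scoring is the sketch below. It is an illustration under stated assumptions: the `(timestamp, tuple_id)` event format is hypothetical, and Jaccard similarity stands in for the probabilistic score, which the abstract does not define.

```python
from collections import defaultdict
from itertools import combinations


def occurrence_rows(events, interval):
    """Map each tuple id to the set of window indices in which it occurs.

    `events` is an iterable of (timestamp, tuple_id) pairs (assumed format).
    A set of window indices is equivalent to one binary row of the matrix.
    """
    rows = defaultdict(set)
    for ts, tuple_id in events:
        rows[tuple_id].add(int(ts // interval))
    return rows


def pair_scores(rows):
    """Score every tuple pair by Jaccard similarity of its window sets.

    Jaccard is an assumed stand-in for the unspecified probabilistic score.
    """
    scores = {}
    for a, b in combinations(sorted(rows), 2):
        union = rows[a] | rows[b]  # never empty: each row has >= 1 window
        scores[(a, b)] = len(rows[a] & rows[b]) / len(union)
    return scores
```

Running `occurrence_rows` with several values of `interval` would give one binary matrix per time interval, which matches the abstract's description of multiple matrices.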

Correctable Error Filtering for Input/Output Subsystem
20170344419 · 2017-11-30 ·

A switched fabric hierarchy (e.g., a PCIe hierarchy) may utilize hardware, firmware, and/or software for filtering duplicative or otherwise undesirable correctable error messages from reaching a root complex. An operating system of the root complex may detect a persistent stream or storm of correctable errors from a particular endpoint and activate filtering of correctable errors from that endpoint. A filtering device may receive filtering commands and parameters from the operating system, implement the filtering, and monitor further correctable errors from the offending device. While an offending device is being filtered, correctable error messages from the offending device may be masked from the operating system, while correctable error messages from other devices in the switched fabric hierarchy may be transmitted. When the filtering device detects that conditions for ending filtering of a device are met, the filtering device may cease filtering of the offending device and return monitoring responsibilities to the operating system.
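The storm detection and masking behavior can be modeled in software roughly as follows. This is a simplified sketch, not the disclosed design: it merges the operating system's detection role and the filtering device's masking role into one hypothetical class, and the sliding-window threshold and the "quiet window" end condition are assumptions.

```python
from collections import defaultdict, deque


class CorrectableErrorFilter:
    """Mask correctable errors from endpoints that exceed a rate threshold."""

    def __init__(self, threshold=3, window=1.0):
        self.threshold = threshold  # errors within `window` that count as a storm
        self.window = window        # sliding-window length in seconds
        self.recent = defaultdict(deque)
        self.filtered = set()

    def report(self, endpoint, now):
        """Return True if the error is forwarded, False if it is masked."""
        times = self.recent[endpoint]
        times.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            self.filtered.add(endpoint)
        elif endpoint in self.filtered and len(times) == 1:
            # Only the current error remains in the window: storm has subsided.
            self.filtered.discard(endpoint)
        return endpoint not in self.filtered
```

Errors from other endpoints keep flowing while one endpoint is masked, because the window and the filtered state are tracked per endpoint.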

HEALTH INDEX OF A SERVICE

According to implementations of the subject matter described herein, there is provided a solution of providing a health index of a service. In this solution, a plurality of incident information sets associated with a plurality of services are obtained. The plurality of services are provisioned in a computing environment. An incident information set indicates at least one incident reported during operation of a service. Respective health indices are determined for the plurality of services based on respective ones of the plurality of incident information sets and a health classification policy. The respective health indices indicate respective health statuses of the plurality of services and are determined from a same health index range. Through unified use of incident information, the determined health indices can indicate universal and consistent health statuses for different services.

DETECTION AND RECOVERY OF DEADLOCKS IN DATABASE MANAGEMENT SYSTEM

A method for detecting a deadlock of a database management system and recovering from the deadlock includes: updating data stored in a first server and a second server; determining whether the database management system is in a deadlock state by comparing data of the first server with data of the second server; and recovering from the deadlock state by changing data of the first server or the second server so that the data of the first server and the second server are identical.

Digital input edge detection with smart filtering algorithm

A method is provided for data analysis in sequence-of-events reporting during the operation of an industrial process. A digital filter and edge detector are provided that combine a method for excluding known invalid samples and a method for excluding samples taken while the input is in the traveling range. The filtering method reduces the overhead on the CPU from managing the sequence of events machine and allows it to focus on performing safety functions.

Automated detection and classification of dynamic service outages

Systems and methods for automatically detecting and mitigating errors in a cloud computing environment. One example method includes receiving, from a telemetry server, telemetry data for the cloud computing environment, detecting an error within the cloud computing environment based on the telemetry data, determining an error type for the error based on the telemetry data, determining an impact severity for the error based on the telemetry data, and when the error type is a reportable error type and the impact severity exceeds a predetermined threshold, performing a mitigation action.
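The final decision rule in the example method reduces to a conjunction of two checks, sketched below. The set of reportable error types, the numeric severity scale, and the threshold value are all hypothetical; the abstract only says that a mitigation action is performed when the type is reportable and the severity exceeds a predetermined threshold.

```python
from dataclasses import dataclass

REPORTABLE_TYPES = {"outage", "degradation"}  # hypothetical error types
SEVERITY_THRESHOLD = 0.5                      # hypothetical threshold


@dataclass
class DetectedError:
    error_type: str
    impact_severity: float


def should_mitigate(err: DetectedError) -> bool:
    """True only for a reportable type whose severity exceeds the threshold."""
    return (
        err.error_type in REPORTABLE_TYPES
        and err.impact_severity > SEVERITY_THRESHOLD
    )
```

The comparison is strict, following the word "exceeds", so an error whose severity equals the threshold does not trigger mitigation.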

OPTIMIZED ALARM STATE RESTORATION THROUGH CATEGORIZATION
20230168966 · 2023-06-01 ·

The disclosure provides an approach for alarm state restoration. Embodiments include determining a plurality of alarm definitions applicable to an inventory of a plurality of entities in a computing environment. Embodiments include assigning each given alarm definition of the plurality of alarm definitions to a given alarm category of a plurality of alarm categories. Embodiments include restoring declared states of the plurality of alarm definitions on the inventory based on the assigning, wherein the restoring comprises, for each given alarm category of the plurality of alarm categories, performing a single traversal of the inventory to identify all respective entities of the plurality of entities that correspond to one or more alarm definitions assigned to the given alarm category.
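The benefit of categorization is that the inventory is traversed once per category rather than once per alarm definition. A minimal sketch of that idea follows; the representation of a definition as a predicate over entities and the `category_of` mapping are assumptions for illustration only.

```python
from collections import defaultdict


def restore_by_category(inventory, definitions, category_of):
    """Match entities to alarm definitions with one inventory pass per category.

    `definitions` maps a definition name to a predicate over entities and
    `category_of` maps a definition name to its category (both hypothetical).
    Returns ({entity: [definition names]}, number_of_traversals).
    """
    by_category = defaultdict(list)
    for name in definitions:
        by_category[category_of[name]].append(name)

    matches = defaultdict(list)
    traversals = 0
    for names in by_category.values():
        traversals += 1  # a single pass over the inventory for this category
        for entity in inventory:
            for name in names:
                if definitions[name](entity):
                    matches[entity].append(name)
    return dict(matches), traversals
```

With three definitions in two categories, the inventory is walked twice instead of three times; the saving grows with the number of definitions per category.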

Generating targeted analysis results in a support system

Generating targeted analysis results in a support system is disclosed. Computing device information about a supported computing device in which a problem was encountered is received. Identification of a designated subset of tags of a plurality of tags is received. A plurality of analyzers is initiated to analyze the computing device information. The plurality of analyzers is configured to generate a plurality of results based on the computing device information. Tags of the plurality of tags are associated with the plurality of results. A subset of results is generated by discarding each result from the plurality of results that has no associated tags that match a tag in the designated subset of tags. The subset of results is output.
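The discarding step amounts to a set-intersection test per result, as in the sketch below. The result format (a dict with a `"tags"` list) is a hypothetical choice; the abstract does not describe how results or tags are represented.

```python
def filter_results(results, designated_tags):
    """Keep only results that share at least one tag with `designated_tags`.

    Each result is assumed to be a dict with a "tags" list (hypothetical format).
    """
    wanted = set(designated_tags)
    return [r for r in results if wanted & set(r["tags"])]
```

A result with no tags at all is always discarded, since its intersection with the designated subset is empty.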