G06F11/0766

Monitoring System, Monitoring Apparatus, and Monitoring Method
20230136749 · 2023-05-04

A monitoring apparatus includes: an input unit that accepts equipment identifying information for identifying monitoring object equipment and an analysis purpose; and an output unit that, when analysis item data corresponding to the analysis purpose among operation data indicating an operating status of the identified monitoring object equipment is a value within a specified range on the basis of the equipment identifying information and the analysis purpose which have been input, outputs a tendency of the analysis item data compared with predetermined comparison item data. With this configuration, the status of the monitoring object equipment can be diagnosed and visualized in support of condition-based maintenance, in which maintenance is performed in accordance with the status of each piece of equipment.

Predictive batch job failure detection and remediation

Systems, methods, and computer program products for predicting, preventing and remediating failures of batch jobs being executed and/or queued for processing at a future scheduled time. Batch job parameters, messages and system logs are stored in knowledge bases and/or inputted into AI models for analysis. Using predictive analytics and/or machine learning, batch job failures are predicted before the failures occur. Mappings of processes used by each batch job, historical data from previous batch jobs and data identifying the success or failure thereof build an archive that can be refined over time through active learning feedback and AI modeling to predictively recommend actions that have historically prevented or remediated failures. Recommended actions are reported to the system administrator or automatically applied. As job failures occur over time, mappings of the current system log to logs for the unsuccessful batch jobs help root cause analysis become simpler and more automated.
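The mapping of a current system log to the logs of earlier unsuccessful jobs can be illustrated with a minimal sketch. It is not the claimed method: Jaccard similarity over whitespace-separated tokens stands in for the predictive analytics the abstract mentions, and the names `jaccard`, `recommend_action`, and `min_similarity` are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set similarity between two log excerpts (0.0 to 1.0)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def recommend_action(current_log, history, min_similarity=0.5):
    """Return the action recorded for the most similar failed-job log.

    history: list of (failed_job_log, action_that_remediated_it) pairs.
    Returns None when no historical log is similar enough (assumed cutoff).
    """
    best = max(history, key=lambda h: jaccard(current_log, h[0]), default=None)
    if best is None or jaccard(current_log, best[0]) < min_similarity:
        return None
    return best[1]


history = [
    ("disk full on /var job aborted", "purge /var temp files"),
    ("db connection timeout job aborted", "restart connection pool"),
]
print(recommend_action("disk full on /var", history))
```

A real system would presumably learn from job parameters and process mappings as well; the sketch covers only the log-matching step.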

FAULT DIAGNOSIS IN A CONTINUOUS INK JET PRINTER
20230202169 · 2023-06-29

A system (93) monitors the break-up phase of an electrostatic deflection continuous ink jet printer to identify phase instability likely to be caused by a partial blockage of the jet-forming device (17). It ignores alternations between adjacent phase positions and brief unrepeated periods of phase disruption. Preferably it ignores phase changes caused by changes in other operational parameters of the printer, such as variations in ink pressure. Monitoring may be done in the printer (99) or in an external system (93), (95).
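One way to picture the filtering described above is the sketch below. It assumes, hypothetically, that the break-up phase is reported as an integer position on a circular scale of 16 positions, and that a single excursion (one jump away and one back) counts as "brief and unrepeated". The function name and both thresholds are illustrative, not taken from the abstract.

```python
def is_phase_unstable(phases, num_positions=16, min_disruptions=3):
    """Flag phase instability while ignoring benign behaviour.

    phases: sequence of integer phase positions in [0, num_positions).
    Changes between adjacent positions (circular distance 1) are ignored.
    A lone excursion produces only two jumps and so stays below the
    assumed min_disruptions threshold.
    """
    disruptions = 0
    for prev, cur in zip(phases, phases[1:]):
        diff = abs(cur - prev) % num_positions
        distance = min(diff, num_positions - diff)  # circular distance
        if distance > 1:
            disruptions += 1
    return disruptions >= min_disruptions
```

Compensating for phase changes caused by other operating parameters, such as ink pressure, is left out of this sketch.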

INTENT-BASED INTERACTION WITH CLUSTER RESOURCES

Aspects extend to methods, systems, and computer program products for intent-based interactions with cluster resources. One or more computer systems are joined in a computer system cluster to provide defined computing functionality (e.g., storage, compute, network, etc.) to an external system. In one aspect, a data collection intent facilitates collection and aggregation of data to form a health report for one or more components of the computer system cluster. In another aspect, a command intent facilitates implementing a command at one or more components of the computer system cluster. Services span machines of the computer system cluster to abstract lower level aspects of data collection and aggregation and command implementation for higher level aspects of data collection and aggregation and command implementation. Services can be integrated into an operating system to relieve users of the need for operating system knowledge.

Computer system and method of identifying a failure

A computer system for realizing increased speed of identifying the extent of a failure in a messaging system, provided with: a first computer including a message receiving part, a first log output part, and a first memory part configured to store receiving log data; a second computer including a data store management part configured to manage a data store, a first search part configured to search for a message that meets a given condition from among messages stored in the data store, a second log output part, and a second memory part configured to store data store log data; a third computer including a message sending part, a third log output part, and a third memory part configured to store sending log data; and a fourth computer including a monitoring part, a log collecting part, and a second search part configured to search for a lost message.
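The lost-message search over the three logs can be sketched as a set comparison. This is a simplification under stated assumptions: each log is reduced to a collection of message IDs, and the function name and stage labels are hypothetical.

```python
def find_lost_messages(received_ids, stored_ids, sent_ids):
    """Locate messages that were received but never sent.

    Each argument is an iterable of message IDs taken from the receiving
    log, the data store log, and the sending log respectively.
    Returns a dict mapping each lost ID to the stage where it disappeared.
    """
    received, stored, sent = set(received_ids), set(stored_ids), set(sent_ids)
    lost = {}
    for msg_id in received - sent:
        # Absent from the data store log: lost before it was stored.
        # Present there but not in the sending log: lost before sending.
        lost[msg_id] = "before_send" if msg_id in stored else "before_store"
    return lost


print(find_lost_messages(["a", "b", "c"], ["a", "b"], ["a"]))
```

Classifying by stage narrows the extent of the failure to one of the three computers without rescanning the full logs.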

Detecting and recovering from fatal storage errors

The present disclosure relates to systems, methods, and computer readable media for identifying and responding to a panic condition on a storage system on a computing node. For example, systems disclosed herein may include establishing recovery instructions between a host system and a storage system in responding to a future instance of a panic condition. The storage system may provide an indication of a self-detected panic condition in a variety of ways. In response to identifying the panic condition, the host system may perform one or more recovery actions in accordance with recovery instructions accessible to the host system. This may include performing resets of specific components and reinitializing communication between the host system and storage system in less invasive ways than slower and more expensive conventional approaches for responding to panic conditions on computing nodes.
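A rough sketch of how a host might act on pre-established recovery instructions follows. The panic codes, action names, and the `run_action` callback are all hypothetical. The sketch also assumes that actions are listed from least to most invasive and tried in that order.

```python
def recover(panic_code, instructions, run_action):
    """Try recovery actions for a panic condition, least invasive first.

    instructions: dict mapping a panic code to an ordered list of action
        names; the optional "default" entry is used for unknown codes.
    run_action: callable taking an action name, returning True on success.
    Returns the action that succeeded, or None if every action failed.
    """
    actions = instructions.get(panic_code, instructions.get("default", []))
    for action in actions:
        if run_action(action):
            return action
    return None


instructions = {
    "ctrl_hang": ["reset_controller", "reinit_link", "reboot_node"],
    "default": ["reboot_node"],
}
```

Placing a full node reboot last reflects the abstract's preference for less invasive responses over conventional ones.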

Multiprocessor system
09846666 · 2017-12-19

The present invention realizes functional safety of a multiprocessor system without tightly coupling processor elements. When a plurality of processor elements are caused to execute the same data processing in order to realize functional safety of the processor elements, a bus interface unit is adopted that performs safety measure processing when non-coincidence of the access requests issued from the processor elements has been confirmed, and starts access processing in response to the access request when these access requests coincide with one another.
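The comparison performed by the bus interface unit can be modelled in software as below. This is only a behavioural sketch: the `AccessRequest` fields and the return values are assumptions, and real hardware would also have to handle requests that arrive at different times before declaring a mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    address: int
    write: bool
    data: int = 0


def arbitrate(requests):
    """Compare access requests issued by redundant processor elements.

    Returns ("access", request) when all requests coincide, and
    ("safety_measure", None) when any of them differs.
    """
    if not requests:
        raise ValueError("at least one access request is required")
    first = requests[0]
    if all(req == first for req in requests[1:]):
        return ("access", first)
    return ("safety_measure", None)
```

Because the check sits at the bus interface, the processor elements themselves need not run in cycle-accurate lockstep.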

BEST HEDGING, UTILIZATION AND VALIDATION OF INFORMATION (BHUVI) MACHINE LEARNING MODEL

A method may include obtaining, by a computing device, an information technology (IT) incident report; obtaining, by the computing device, a set of system details of a system experiencing an IT incident, the IT incident being described by the IT incident report; identifying, by the computing device, a plurality of potential IT incident resolutions; scoring, by the computing device, the plurality of potential IT incident resolutions using a machine learning model, the machine learning model configured to determine a success of the plurality of potential IT incident resolutions; and generating and transmitting, by the computing device and according to the scoring, the plurality of potential IT incident resolutions.
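The score-then-transmit flow can be sketched as follows. The machine learning model is represented by a caller-supplied `score_fn`, which is a placeholder assumption; the abstract does not specify the model, and the toy keyword scorer below exists only to make the example runnable.

```python
def rank_resolutions(incident_report, system_details, candidates, score_fn):
    """Score candidate resolutions and return them best-first.

    score_fn(incident_report, system_details, candidate) -> number, where
    a higher value means the resolution is judged more likely to succeed.
    Returns a list of (score, candidate) pairs in descending score order.
    """
    scored = [(score_fn(incident_report, system_details, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_score(incident_report, system_details, candidate):
    """Illustrative stand-in for a trained model: count shared words."""
    return len(set(incident_report.split()) & set(candidate.split()))


ranked = rank_resolutions(
    "disk full",
    {"os": "linux"},
    ["restart service", "free disk space", "extend full disk"],
    toy_score,
)
print(ranked)
```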

Method and system for real-time and scalable anomaly detection and classification of multi-dimensional multivariate high-frequency transaction data in a distributed environment
11681574 · 2023-06-20

A system and method for the distributed analysis of high frequency transaction trace data to continuously categorize incoming transaction data, identify relevant transaction categories, create per-category statistical reference and current data, and perform statistical tests to identify transaction categories showing overall statistically relevant performance anomalies. The relevant transaction category detection considers both the relative transaction frequency of categories compared to the overall transaction frequency and the temporal stability of a transaction category over an observation duration. The statistical data generated for the anomaly tests contains, in addition to data describing the overall performance of the transactions of a category, data describing the transaction execution context, such as the number of concurrently executed transactions or the transaction load during an observation period. Anomaly tests consider current and reference execution context data in addition to statistical performance data to determine whether detected statistical performance anomalies should be reported.
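The two relevance criteria named above, relative frequency and temporal stability, can be sketched as a simple filter. The thresholds, and the choice to measure stability as the fraction of observation windows in which a category appears, are assumptions made for illustration.

```python
def relevant_categories(window_counts, min_share=0.05, min_stable_fraction=0.8):
    """Select transaction categories that are both frequent and stable.

    window_counts: list of dicts, one per observation window, mapping a
        category name to its transaction count in that window.
    min_share: assumed minimum share of all transactions.
    min_stable_fraction: assumed minimum fraction of windows in which
        the category must be present.
    """
    categories = set().union(*window_counts)
    total = sum(sum(w.values()) for w in window_counts)
    selected = set()
    for cat in categories:
        cat_total = sum(w.get(cat, 0) for w in window_counts)
        share = cat_total / total if total else 0.0
        present = sum(1 for w in window_counts if w.get(cat, 0) > 0)
        if share >= min_share and present / len(window_counts) >= min_stable_fraction:
            selected.add(cat)
    return selected


windows = [
    {"checkout": 50, "search": 50, "rare": 1},
    {"checkout": 40, "search": 60},
    {"checkout": 45, "search": 55, "burst": 100},
]
print(sorted(relevant_categories(windows)))
```

In this data, `rare` fails the frequency criterion and `burst` fails the stability criterion, so neither would go on to the statistical anomaly tests.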

Detecting imminent failure in a power supply
11681344 · 2023-06-20

Detecting an imminent failure of a power supply. An internal computerized component periodically reads a set of measurements from one or more sensors affixed to a power supply. The internal computerized component and power supply may reside in a variety of different technical contexts. The internal computerized component analyzes the set of measurements using, at least in part, a weighted set of factors, to detect the imminent failure in the power supply. The weighted set of factors may be updated or revised over time and may be specifically tailored for use with specific types of power supplies.
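A weighted set of factors could be applied to one set of sensor readings roughly as follows. The sensor names, limits, weights, and the 0.5 threshold are invented for the sketch; the abstract states only that the factors are weighted and may be tailored to the type of power supply.

```python
def imminent_failure(measurements, factors, threshold=0.5):
    """Decide whether a power supply looks close to failure.

    measurements: dict mapping a sensor name to its latest reading.
    factors: dict mapping a sensor name to a (weight, limit) pair; a
        factor trips when its reading exceeds the limit.
    Returns True when the tripped share of total weight reaches threshold.
    """
    total = sum(weight for weight, _ in factors.values())
    tripped = sum(
        weight
        for name, (weight, limit) in factors.items()
        if measurements[name] > limit
    )
    return tripped / total >= threshold


# Hypothetical factor table for one type of power supply.
factors = {
    "temperature_c": (3.0, 85.0),
    "ripple_mv": (2.0, 120.0),
    "fan_rpm_drop": (1.0, 500.0),
}
```

Updating the weights over time, as the abstract describes, would amount to replacing the `factors` table without changing the evaluation logic.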