G06F11/0787

PROFILING AND DIAGNOSTICS FOR INTERNET OF THINGS

A computing device and method for profiling and diagnostics in an Internet of Things (IoT) system, including matching an observed solution characteristic of the IoT system to an anomaly in an anomaly database.

Firmware-based SSD block failure prediction and avoidance scheme

A Solid State Drive (SSD) is disclosed. The SSD may comprise flash storage for data, the flash storage organized into a plurality of blocks. A controller may manage reading data from and writing data to the flash storage. Metadata storage may store device-based log data for errors in the SSD. Identification firmware may identify a suspect block responsive to the device-based log data. In some embodiments of the inventive concept, verification firmware may determine whether the suspect block is predicted to fail responsive to both precise block-based data and the device-based log data.
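
The two firmware stages described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the log layout, the `erase_count` field, and both thresholds (`min_errors`, `erase_limit`) are hypothetical and do not come from the abstract.

```python
from collections import Counter

def identify_suspects(device_log, min_errors=3):
    """Stage 1 (identification): flag blocks that recur in the device-level error log.

    device_log is assumed to be a list of {"block": <id>} entries (hypothetical layout).
    """
    counts = Counter(entry["block"] for entry in device_log)
    return sorted(block for block, n in counts.items() if n >= min_errors)

def predicted_to_fail(block, device_log, block_stats, erase_limit=3000):
    """Stage 2 (verification): combine per-block data with the device log.

    block_stats is assumed to map block id -> {"erase_count": int} (hypothetical).
    """
    logged_errors = sum(1 for entry in device_log if entry["block"] == block)
    worn_out = block_stats[block]["erase_count"] > erase_limit
    return logged_errors > 0 and worn_out

device_log = [{"block": 7}, {"block": 7}, {"block": 2}, {"block": 7}]
block_stats = {7: {"erase_count": 5000}, 2: {"erase_count": 100}}
suspects = identify_suspects(device_log)
print(suspects, [predicted_to_fail(b, device_log, block_stats) for b in suspects])
```

Keeping the cheap device-level pass separate from the per-block check means the detailed block data is only consulted for the few blocks already under suspicion.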

Efficient Fault Prevention and Repair in Complex Systems
20230021610 · 2023-01-26

A method of supervising a complex system includes acquiring and storing failure data and repair resource information regarding the complex system, and identifying failure networks and structures of the complex system. Failure types associated with the failure networks of the complex system are determined. The method includes generating a plurality of failure prevention and repair (FPR) sequences, wherein each FPR sequence is associated with the failure networks and the failure types. The generated FPR sequences are analyzed to select a set of FPR sequences and associated repair resources. The method further comprises applying a selected one of the plurality of FPR sequences to the complex system, thereby managing the complex system.

IN-FLIGHT DETECTION OF SERVER TELEMETRY REPORT DRIFT

A first information handling system may receive a telemetry metric report from a client information handling system. The first information handling system may determine that one or more characteristics of the telemetry metric report do not match one or more predetermined telemetry metric report characteristics. The first information handling system may perform one or more corrective actions based, at least in part, on the determination that the one or more characteristics of the telemetry metric report do not match one or more predetermined telemetry metric report characteristics.
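
As a rough illustration of the comparison step, the sketch below checks an incoming report against predetermined characteristics and picks a corrective action. The report fields (`interval_s`, `metrics`), the expected values, and the action name are all hypothetical; the abstract does not specify them.

```python
# Hypothetical predetermined characteristics of a telemetry metric report.
EXPECTED = {"interval_s": 60, "metrics": {"cpu_temp", "fan_rpm", "power_w"}}

def detect_drift(report, expected=EXPECTED):
    """Return a list of mismatches; an empty list means the report has not drifted."""
    problems = []
    if report.get("interval_s") != expected["interval_s"]:
        problems.append(
            "interval {} != {}".format(report.get("interval_s"), expected["interval_s"])
        )
    missing = expected["metrics"] - set(report.get("metrics", {}))
    if missing:
        problems.append("missing metrics: " + ", ".join(sorted(missing)))
    return problems

def corrective_actions(problems):
    """Map detected drift to an action name (hypothetical; e.g. re-push the definition)."""
    return ["resend_report_definition"] if problems else []

report = {"interval_s": 30, "metrics": {"cpu_temp": 41.5}}
problems = detect_drift(report)
print(problems, corrective_actions(problems))
```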

UTILIZING AUTOMATIC LABELLING, PRIORITIZING, AND ROOT CAUSE ANALYSIS MACHINE LEARNING MODELS AND DEPENDENCY GRAPHS TO DETERMINE RECOMMENDATIONS FOR SOFTWARE PRODUCTS

A device may receive software data identifying current logs and events associated with software products utilized by an entity and may process the software data, with a machine learning model, to generate error severity scores for the software products. The machine learning model may be trained based on historical software data identifying events and logs associated with software products utilized by the entity and based on a combination of historical health scores, historical sentiment scores, and historical dissimilarity scores for the software products. The device may process the error severity scores, with a prioritization model, to generate prioritized error scores and may process the error severity scores and the prioritized error scores, with a root cause analysis model, to generate root cause data identifying root causes associated with the error severity scores. The device may perform one or more actions based on the root cause data.

QUANTUM COMPUTER SYSTEM SCHEDULING AND PARAMETERIZATION BASED ON ERROR CORRECTION HISTORY
20230229491 · 2023-07-20

In one example described herein, a system can receive, by a scheduler of a server, a request to execute a quantum algorithm. The system can determine, by the scheduler, a quantum computer system of a plurality of quantum computer systems to execute the quantum algorithm based on a database that stores associations between each quantum computer system of the plurality of quantum computer systems, at least one parameter associated with the quantum algorithm, and error information. The system can transmit, by the scheduler, the request to the quantum computer system for executing the quantum algorithm.
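
A minimal sketch of the selection step, assuming the database can be reduced to a mapping from (system, algorithm parameter) to an observed error rate. The system names, the use of qubit count as the parameter, and the error values are hypothetical.

```python
# Hypothetical error history: (system name, qubit count) -> observed error rate.
ERROR_HISTORY = {
    ("qpu-a", 8): 0.02,
    ("qpu-b", 8): 0.05,
    ("qpu-a", 16): 0.11,
    ("qpu-b", 16): 0.07,
}

def pick_system(qubits, history=ERROR_HISTORY):
    """Choose the system with the lowest recorded error rate for this parameter."""
    candidates = {
        system: error for (system, q), error in history.items() if q == qubits
    }
    if not candidates:
        raise LookupError("no error history for {} qubits".format(qubits))
    return min(candidates, key=candidates.get)

print(pick_system(8), pick_system(16))
```

A real scheduler would presumably weigh queue length and availability as well; this sketch ranks on error information alone.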

SYSTEMS AND METHODS FOR GENERATING A SYSTEM LOG PARSER
20230229540 · 2023-07-20

The present disclosure provides systems and methods for generation of parsing scripts or rules for unstructured or semi-structured system log messages, including systems and methods for identifying and clustering the same or substantially similar system log messages using machine learning. Patterns indicative of the same or substantially similar types of system log messages can be generated based on the clustering of the system log messages and calculated similarities of attributes or distances between common features/fields of the system log messages, with the results of the clustering presented for analysis and development or adjustment of parsing scripts.
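
The clustering and pattern-generation idea can be illustrated with a deliberately simple stand-in. The sketch below uses a greedy, token-position similarity heuristic rather than the machine learning the abstract refers to; the function names, the threshold, and the `<*>` wildcard are assumptions made for the example.

```python
def similarity(a, b):
    """Fraction of positions holding the same token; 0.0 if token counts differ."""
    ta, tb = a.split(), b.split()
    if not ta or len(ta) != len(tb):
        return 0.0
    return sum(x == y for x, y in zip(ta, tb)) / len(ta)

def cluster(messages, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []
    for message in messages:
        for members in clusters:
            if similarity(members[0], message) >= threshold:
                members.append(message)
                break
        else:
            clusters.append([message])
    return clusters

def pattern(members):
    """Keep tokens shared by every member; replace varying positions with a wildcard."""
    rows = [m.split() for m in members]
    return " ".join(col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows))

logs = [
    "disk sda1 usage 91 percent",
    "disk sdb2 usage 75 percent",
    "user alice logged in",
]
for members in cluster(logs):
    print(pattern(members))
```

The resulting patterns are the kind of output a reviewer could turn into, or use to adjust, a parsing rule.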

INTELLIGENT LOG ANALYSIS AND RETENTION FOR MICROSERVICES APPLICATIONS
20230229545 · 2023-07-20

Embodiments are directed to intelligent log analysis and retention for microservices applications. Such log analysis is helpful in troubleshooting failure conditions that may occur intermittently or repeatedly within a microservices environment. Operations executed by microservices, applications, or components generate logs, and these logs are classified based on certain parameters. A default log retention period defines how long the logs are stored in the system, and a log analytical engine determines and assigns weights constituting additional retention time to keep logs for failed operations. These logs can then be analyzed to determine causes of the failure and to build failure models that predict future failure conditions. Logs for successful operations can be deleted promptly to save storage space.
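
The retention rule (a default period plus weighted extra time for failed operations) might look like the sketch below. The seven-day default, the outcome labels, and the extra-day weights are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(days=7)  # hypothetical default retention period

# Hypothetical weights, expressed as extra days of retention per outcome class.
EXTRA_DAYS = {"success": 0, "repeated_failure": 14, "intermittent_failure": 30}

def expires_at(created, outcome):
    """Default retention plus any additional time assigned to the outcome class."""
    return created + DEFAULT_RETENTION + timedelta(days=EXTRA_DAYS.get(outcome, 0))

def purge(logs, now):
    """Keep only the logs whose retention window has not yet elapsed."""
    return [log for log in logs if expires_at(log["created"], log["outcome"]) > now]

logs = [
    {"id": 1, "created": datetime(2023, 1, 1), "outcome": "success"},
    {"id": 2, "created": datetime(2023, 1, 1), "outcome": "intermittent_failure"},
]
kept = purge(logs, now=datetime(2023, 1, 10))
print([log["id"] for log in kept])
```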

Apparatus and method for scalable error detection and reporting

Apparatus and method for scalable error reporting. For example, one embodiment of an apparatus comprises error detection circuitry to detect an error in a component of a first tile within a tile-based hierarchy of a processing device; error classification circuitry to classify the error and record first error data based on the classification; a first tile interface to combine the first error data with second error data received from one or more other components associated with the first tile to generate first accumulated error data; and a master tile interface to combine the first accumulated error data with second accumulated error data received from at least one other tile interface to generate second accumulated error data and to provide the second accumulated error data to a host executing an application to process the second accumulated error data.
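
Although the abstract describes circuitry, the two-level accumulation can be modeled in software to show the data flow. In this sketch the error record layout, the two classes (`fatal`, `correctable`), and the function names are assumptions, not details from the source.

```python
from collections import Counter

def classify(error):
    """Classify one error record (hypothetical two-class scheme)."""
    return "fatal" if error["uncorrectable"] else "correctable"

def tile_accumulate(component_errors):
    """Tile interface: combine classified errors from every component in one tile."""
    accumulated = Counter()
    for errors in component_errors:
        accumulated.update(classify(e) for e in errors)
    return accumulated

def master_accumulate(tile_counters):
    """Master tile interface: combine accumulated data from all tile interfaces."""
    return sum(tile_counters, Counter())

tile0 = tile_accumulate([
    [{"uncorrectable": False}],
    [{"uncorrectable": True}, {"uncorrectable": False}],
])
tile1 = tile_accumulate([[{"uncorrectable": True}]])
print(dict(master_accumulate([tile0, tile1])))
```

Because each level forwards only a summary, the host sees a fixed-size record regardless of how many components sit beneath each tile.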

Analysis of deep-level cause of fault of storage management
11704186 · 2023-07-18

Storage management is performed. For example, a computing device may determine that a fault belongs to one of a plurality of predefined fault categories based on description information of the fault of a storage system. Then, the computing device may determine at least one fault cause associated with the fault category at a first level of a hierarchical structure of predetermined fault causes. Further, the computing device may determine a first fault cause that causes the fault among the at least one fault cause. After that, the computing device may determine a target fault cause at the deepest level that causes the fault based on the first fault cause. As a result, the root cause of a fault of a storage system may be accurately and efficiently determined, thereby providing the possibility of fundamentally eliminating the fault.
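
The descent from a fault category to the deepest matching cause can be sketched as a walk down a tree. The cause hierarchy, the cause names, and the `is_cause` callback (standing in for whatever diagnostic decides that a candidate cause applies) are hypothetical.

```python
# Hypothetical hierarchy: fault category or cause -> candidate causes one level deeper.
CAUSE_TREE = {
    "io_error": ["disk", "network"],
    "disk": ["bad_sector", "firmware_bug"],
    "network": ["link_down"],
}

def deepest_cause(category, is_cause, tree=CAUSE_TREE):
    """Follow confirmed causes level by level until no deeper cause is confirmed.

    is_cause(name) is an assumed callback returning True when diagnostics
    confirm that `name` contributes to the fault.
    """
    current = category
    while True:
        confirmed = [c for c in tree.get(current, []) if is_cause(c)]
        if not confirmed:
            return current
        current = confirmed[0]

observed = {"disk", "bad_sector"}
print(deepest_cause("io_error", lambda cause: cause in observed))
```

If no first-level cause is confirmed, the function returns the category itself, which signals that the hierarchy could not explain the fault.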