G06F11/0751

Determining a future operation failure in a cloud system

Examples described relate to determining a future operation failure in a cloud system. In an example, a historical utilization of resources for performing an operation in a cloud system may be determined. A current utilization of resources in the cloud system may be determined. Based on the historical utilization of resources for performing the operation in the cloud system and the current utilization of resources in the cloud system, a determination may be made whether a future performance of the operation in the cloud system is likely to be a failure. In response to a determination that the future performance of the operation in the cloud system is likely to be a failure, an alert may be generated.
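A minimal Python sketch of this check follows. It assumes resource utilization is tracked as per-resource amounts (hypothetical names such as `cpu` and `mem_gb`). It also assumes the operation's historical requirement is estimated as its peak usage across past runs. The abstract specifies neither the estimator nor the alert mechanism, so both are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record: which resource is expected to run short, and by how much."""
    operation: str
    resource: str
    needed: float
    available: float


def historical_requirement(runs: list[dict[str, float]]) -> dict[str, float]:
    """Estimate the operation's need per resource as its peak usage over past runs (an assumed, conservative choice)."""
    peak: dict[str, float] = {}
    for run in runs:
        for resource, used in run.items():
            peak[resource] = max(peak.get(resource, 0.0), used)
    return peak


def predict_failure(
    operation: str,
    runs: list[dict[str, float]],
    capacity: dict[str, float],
    current_usage: dict[str, float],
) -> list[Alert]:
    """Flag every resource whose free headroom is below the operation's historical requirement."""
    alerts = []
    for resource, needed in historical_requirement(runs).items():
        available = capacity.get(resource, 0.0) - current_usage.get(resource, 0.0)
        if needed > available:
            alerts.append(Alert(operation, resource, needed, available))
    return alerts


# Illustrative data only: two past runs of a VM migration and the cluster's current state.
past_runs = [{"cpu": 4, "mem_gb": 16}, {"cpu": 6, "mem_gb": 12}]
alerts = predict_failure(
    "vm-migration",
    past_runs,
    capacity={"cpu": 32, "mem_gb": 64},
    current_usage={"cpu": 28, "mem_gb": 40},
)
for alert in alerts:
    print(f"ALERT {alert.operation}: {alert.resource} needs {alert.needed}, only {alert.available} free")
```

Here only `cpu` is flagged: past runs needed up to 6 units, but only 4 are free. Memory has 24 GB of headroom against a 16 GB peak.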

Storage network with enhanced data access performance
11704184 · 2023-07-18

A method for execution by a storage network begins by issuing a decode threshold number of read requests for a set of encoded data slices to a plurality of storage units of a set of storage units. It continues by determining whether fewer than a decode threshold number of read responses have been received within a time window. The method continues by identifying one or more encoded data slices associated with read requests, of the decode threshold number of read requests, whose responses have not been received. For an encoded data slice of the one or more encoded data slices, it issues a priority read request to a storage unit storing a copy of the encoded data slice. The method then continues by receiving a response from the storage unit storing the copy of the encoded data slice, where that storage unit is adapted to delay one or more maintenance tasks in response to the priority read request.
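The retry path can be sketched in Python as follows. The time window is simulated by passing in only the responses that arrived before it closed. The maintenance-deferral policy, the class names, and the `copy_locations` map are hypothetical: the abstract does not say how copies are located or which maintenance tasks are delayed.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Simulated storage unit holding encoded data slices by index (hypothetical model)."""
    name: str
    slices: dict[int, bytes]
    maintenance_queue: list[str] = field(default_factory=list)
    deferred: list[str] = field(default_factory=list)

    def read(self, index: int, priority: bool = False) -> bytes:
        if priority:
            # Assumed policy: postpone all queued maintenance so the priority read is served first.
            self.deferred.extend(self.maintenance_queue)
            self.maintenance_queue.clear()
        return self.slices[index]


def recover_missing(
    decode_threshold: int,
    requested: list[int],
    received: dict[int, bytes],
    copy_locations: dict[int, str],
    units: dict[str, StorageUnit],
) -> dict[int, bytes]:
    """Issue priority reads for requested slices whose responses missed the time window."""
    if len(received) >= decode_threshold:
        return {}  # enough slices arrived in time to decode
    recovered = {}
    for index in sorted(set(requested) - set(received)):
        unit = units[copy_locations[index]]
        recovered[index] = unit.read(index, priority=True)
    return recovered


# Illustrative scenario: decode threshold 3, and the response for slice 1 did not arrive in time.
backup = StorageUnit("su-backup", {1: b"slice-1"}, maintenance_queue=["rebuild", "scrub"])
units = {"su-backup": backup}
received = {0: b"slice-0", 2: b"slice-2"}
recovered = recover_missing(3, [0, 1, 2], received, {1: "su-backup"}, units)
all_slices = {**received, **recovered}
print(sorted(all_slices), backup.deferred)
```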

System and method for performing failure analysis on a computing system using a bayesian network

A system and method for performing a failure analysis on a computing system uses a Bayesian network that is generated from a correlation directed acyclic graph (CDAG) of nodes that represent entities in the computing system with edge weights for edges between the nodes. The CDAG is generated using log entries of the computing system with reference to a dictionary file, which specifies entity types, dependency types of the entity types, expressions that return values representing prior probabilities of failure for the dependency types, and the edge weights for the dependency types. The Bayesian network generated from the CDAG includes conditional probabilities for edges of the Bayesian network based on the edge weights of the CDAG. The Bayesian network is used to compute posterior probabilities of possible causes of a new failure event in the computing system and output at least one probable cause with an associated posterior probability for the new failure event.
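A simplified Python sketch of the inference step follows. It treats the parents of the failed entity in a CDAG fragment as candidate causes and reads each edge weight as P(failure observed | parent failed). Priors come from a dictionary, and exactly one cause is assumed to be responsible. That single-cause assumption reduces the Bayesian network to Bayes' rule; a network with several interacting parents would need full probabilistic inference. Entity names and numbers are hypothetical.

```python
def posterior_causes(
    failed_entity: str,
    edges: dict[tuple[str, str], float],
    priors: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank parents of `failed_entity` by posterior probability, assuming a single root cause."""
    # Candidate causes are the entities with an edge pointing at the failed entity.
    joint = {
        cause: priors[cause] * weight  # P(cause) * P(failure | cause)
        for (cause, affected), weight in edges.items()
        if affected == failed_entity
    }
    evidence = sum(joint.values())  # P(failure), marginalized over the candidate causes
    if evidence == 0:
        raise ValueError(f"no candidate cause can explain a failure of {failed_entity}")
    ranked = [(cause, p / evidence) for cause, p in joint.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical CDAG fragment: (cause entity, affected entity) -> edge weight.
edges = {
    ("datastore-1", "vm-7"): 0.9,
    ("host-3", "vm-7"): 0.8,
    ("vnet-2", "vm-7"): 0.3,
}
# Hypothetical prior failure probabilities, standing in for the dictionary file's expressions.
priors = {"datastore-1": 0.02, "host-3": 0.01, "vnet-2": 0.05}

for cause, probability in posterior_causes("vm-7", edges, priors):
    print(f"{cause}: {probability:.3f}")
```

With these numbers the datastore ranks first (about 0.44), even though the virtual network has the highest prior. The datastore's edge weight is three times larger.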

Error remediation systems and methods

A computer system is provided. The computer system includes a memory, a network interface, and at least one processor configured to monitor a user interface comprising a plurality of user interface elements; detect at least one changed element within the plurality of user interface elements; classify, in response to detecting the at least one changed element, the at least one changed element as either indicating or not indicating an error; generate, in response to classifying the at least one changed element as indicating an error, an error signature that identifies the at least one changed element; identify, using the error signature, a remediation for the error; and provide the remediation in association with the at least one changed element.
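A minimal Python sketch of this pipeline follows, with the user interface modeled as a mapping from element IDs to displayed text. The keyword classifier, the digit-masking signature format, and the remediation table are hypothetical stand-ins. The abstract leaves the classifier and the remediation source unspecified.

```python
import re

# Hypothetical classifier vocabulary; a real system might use a trained model instead.
ERROR_WORDS = ("error", "failed", "denied", "unable")


def changed_elements(before: dict[str, str], after: dict[str, str]) -> dict[str, str]:
    """Return elements that are new or whose text differs between two UI snapshots."""
    return {eid: text for eid, text in after.items() if before.get(eid) != text}


def indicates_error(text: str) -> bool:
    lowered = text.lower()
    return any(word in lowered for word in ERROR_WORDS)


def error_signature(eid: str, text: str) -> str:
    """Identify the element and its message, with digits masked so error codes group together."""
    normalized = re.sub(r"\d+", "#", text.lower())
    return f"{eid}:{normalized}"


def remediate(
    before: dict[str, str], after: dict[str, str], remediations: dict[str, str]
) -> list[tuple[str, str]]:
    """Pair each changed, error-indicating element with a known remediation, if any."""
    results = []
    for eid, text in changed_elements(before, after).items():
        if indicates_error(text):
            fix = remediations.get(error_signature(eid, text))
            if fix is not None:
                results.append((eid, fix))
    return results


# Illustrative snapshots and remediation table.
before = {"status-banner": "", "title": "Inbox"}
after = {"status-banner": "Error 401: session expired", "title": "Inbox (3)"}
remediations = {"status-banner:error #: session expired": "Sign in again to start a new session."}
print(remediate(before, after, remediations))
```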

Session triage and remediation systems and methods
11704177 · 2023-07-18

A computer system is provided. The computer system includes a memory and at least one processor coupled to the memory. The at least one processor is configured to scan session data representative of operation of a user interface comprising a plurality of user interface elements; detect, at a point in the session data, at least one changed element within the plurality of user interface elements; classify, in response to detecting the at least one changed element, the at least one changed element as either indicating or not indicating an error; store an association between the error and the point in the session data; and provide access to the point in the session data via the association.
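The triage flow can be sketched in Python as follows. It assumes session data is a time-ordered list of (timestamp, UI snapshot) pairs and uses a simple keyword rule as a hypothetical error classifier. The returned index is the stored association: each error maps to the points in the session where it appeared, so a reviewer can jump straight to them.

```python
from collections import defaultdict

# Hypothetical error vocabulary standing in for the classifier.
ERROR_WORDS = ("error", "failed", "denied", "unable")

Snapshot = dict[str, str]  # element ID -> displayed text


def triage(session: list[tuple[float, Snapshot]]) -> dict[tuple[str, str], list[float]]:
    """Scan a recorded session and index each error-indicating change by when it occurred."""
    index = defaultdict(list)
    previous: Snapshot = {}
    for timestamp, snapshot in session:
        for eid, text in snapshot.items():
            changed = previous.get(eid) != text
            if changed and any(word in text.lower() for word in ERROR_WORDS):
                index[(eid, text)].append(timestamp)  # association: error -> point in session
        previous = snapshot
    return dict(index)


def first_occurrence(index: dict[tuple[str, str], list[float]], eid: str, text: str) -> float:
    """Provide access to the earliest point in the session where the given error appeared."""
    return index[(eid, text)][0]


# Illustrative recording: an upload error appears twice and is cleared in between.
session = [
    (0.0, {"banner": ""}),
    (4.2, {"banner": "Upload failed"}),
    (9.8, {"banner": ""}),
    (15.1, {"banner": "Upload failed"}),
]
index = triage(session)
print(index)
print(first_occurrence(index, "banner", "Upload failed"))
```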

Data integrity for persistent memory systems and the like

A data processor provides memory commands to a memory channel according to predetermined criteria. The data processor includes a first error code generation circuit, a second error code generation circuit, and a queue. The first error code generation circuit generates a first type of error code in response to data of a write request. The second error code generation circuit generates a second type of error code for the write request, the second type of error code being different from the first type of error code. The queue is coupled to the first error code generation circuit and to the second error code generation circuit, and provides write commands to an interface, the write commands including the data, the first type of error code, and the second type of error code.
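A software model in Python, under stated assumptions: an 8-bit XOR parity stands in for the first error-code type and CRC-32 (`zlib.crc32`) for the second. The abstract does not name either code, so both choices are hypothetical, as are the `WriteCommand` and `WriteQueue` names. The point illustrated is that each queued write carries the data plus two independently computed codes.

```python
import zlib
from collections import deque
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class WriteCommand:
    address: int
    data: bytes
    parity: int  # first error-code type (assumed: 8-bit XOR parity)
    crc32: int   # second error-code type (assumed: CRC-32)


def xor_parity(data: bytes) -> int:
    """XOR of all bytes: cheap, and changes whenever a single bit is flipped."""
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


class WriteQueue:
    """Queue fed by both error-code generators; issues write commands to the memory interface."""

    def __init__(self) -> None:
        self._pending = deque()

    def enqueue(self, address: int, data: bytes) -> None:
        # Both codes are computed from the same write data, independently of each other.
        self._pending.append(WriteCommand(address, data, xor_parity(data), zlib.crc32(data)))

    def issue(self) -> WriteCommand:
        """Hand the oldest pending write command to the interface."""
        return self._pending.popleft()


def verify(cmd: WriteCommand) -> bool:
    """Receiver-side check: both codes must match the data."""
    return xor_parity(cmd.data) == cmd.parity and zlib.crc32(cmd.data) == cmd.crc32


queue = WriteQueue()
queue.enqueue(0x1000, b"\x01\x02\x03")
cmd = queue.issue()
print(hex(cmd.address), cmd.parity, hex(cmd.crc32), verify(cmd))
```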

Reducing file system consistency check downtime

Provided is a method for performing a file system consistency check. The method comprises calculating, by a first thread that does not have access to an inode table, file block addresses for one or more files to be checked by the first thread. The method further comprises collecting validity information for the one or more files. The method further comprises reading information relating to the one or more files from the inode table; the reading is performed in response to the first thread being given access to the inode table after the calculating operation. The method further comprises validating the information by comparing the information from the inode table to the validity information.
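A Python sketch of the two-phase check follows. It assumes, hypothetically, that each file's extent (first block and length) is available from a source other than the inode table. Block addresses can then be computed before a thread is granted the table, and a lock models that access grant. The abstract does not say where the precomputed information comes from.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FileSystemImage:
    """Toy file-system model: an inode table guarded by a lock that grants access to one thread at a time."""

    def __init__(self, inode_table: dict[str, list[int]]):
        self.inode_table = inode_table
        self.inode_lock = threading.Lock()

    def check_batch(self, extents: dict[str, tuple[int, int]]) -> list[str]:
        # Phase 1 (no inode-table access): compute each file's block addresses from its extent
        # and keep them as the validity information for the later comparison.
        expected = {
            name: list(range(start, start + length))
            for name, (start, length) in extents.items()
        }
        # Phase 2: once this thread is given access, read the recorded block lists.
        with self.inode_lock:
            recorded = {name: self.inode_table.get(name) for name in extents}
        # Validate: report files whose inode entries disagree with the precomputed addresses.
        return [name for name in extents if recorded[name] != expected[name]]


def parallel_check(fs: FileSystemImage, batches: list[dict[str, tuple[int, int]]]) -> list[str]:
    """Run one checking thread per batch and collect the inconsistent files."""
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        return sorted(name for result in pool.map(fs.check_batch, batches) for name in result)


# Illustrative image: c.bin's inode records block 99 where its extent implies block 71.
fs = FileSystemImage({"a.txt": [10, 11, 12], "b.log": [40, 41], "c.bin": [70, 99]})
batches = [{"a.txt": (10, 3), "b.log": (40, 2)}, {"c.bin": (70, 2)}]
print(parallel_check(fs, batches))
```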

Analysis of deep-level cause of fault of storage management
11704186 · 2023-07-18

Techniques for storage management are provided. For example, a computing device may determine, based on description information of a fault of a storage system, that the fault belongs to one of a plurality of predefined fault categories. Then, the computing device may determine at least one fault cause associated with the fault category at a first level of a hierarchical structure of predetermined fault causes. Further, the computing device may determine, among the at least one fault cause, a first fault cause that causes the fault. After that, the computing device may determine, based on the first fault cause, a target fault cause at the deepest level of the hierarchy that causes the fault. As a result, the root cause of a storage system fault may be determined accurately and efficiently, making it possible to eliminate the fault at its source.
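A Python sketch of the category-then-descend search follows. It assumes, hypothetically, that categories and causes are recognized by keywords in the fault description. The abstract does not say how categories or causes are matched, and the hierarchy below is illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cause:
    """Node in a hypothetical hierarchy of predetermined fault causes."""
    name: str
    keywords: tuple[str, ...]
    children: list["Cause"] = field(default_factory=list)


# Hypothetical predefined fault categories, recognized by keywords in the description.
CATEGORY_KEYWORDS = {
    "performance": ("slow", "latency"),
    "availability": ("offline", "unreachable"),
}

# Hypothetical first-level causes per category, each with deeper causes beneath it.
HIERARCHY = {
    "performance": [
        Cause("disk", ("disk", "iops"), [Cause("disk-rebuild", ("rebuild",)), Cause("disk-wear", ("wear",))]),
        Cause("network", ("network", "packet")),
    ],
    "availability": [
        Cause("controller", ("controller",), [Cause("controller-failover", ("failover",))]),
    ],
}


def categorize(description: str) -> str:
    text = description.lower()
    for category, words in CATEGORY_KEYWORDS.items():
        if any(word in text for word in words):
            return category
    raise ValueError("description matches no predefined fault category")


def matches(cause: Cause, text: str) -> bool:
    return any(keyword in text for keyword in cause.keywords)


def deepest_cause(description: str, hierarchy: dict[str, list[Cause]]) -> Optional[str]:
    """Pick the matching first-level cause, then descend while a child cause also matches."""
    text = description.lower()
    level = hierarchy.get(categorize(description), [])
    current = next((cause for cause in level if matches(cause, text)), None)
    if current is None:
        return None
    while True:
        child = next((c for c in current.children if matches(c, text)), None)
        if child is None:
            return current.name  # deepest cause reached along this path
        current = child


print(deepest_cause("Backup job slow; disk latency high while RAID rebuild in progress", HIERARCHY))
```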

FAULT DETECTION METHOD AND DEVICE
20230018598 · 2023-01-19

Embodiments provide a fault detection method and device. The method includes obtaining first detection logic, where the first detection logic includes a plurality of detection events and indicates a logical relationship between them. A single detection event includes identification information of a to-be-detected device, a detection instruction, and identification information of a fault event, and the to-be-detected devices identified in the plurality of detection events partially overlap. In response to obtaining the first detection logic, the method successively executes the corresponding detection instructions on the to-be-detected devices of the individual detection events, based on the logical relationship between the detection events, and receives the execution result returned by each to-be-detected device. If an execution result includes the identification information of the fault event, the method determines that the fault event exists. In this application, the fault detection device automatically queries the statuses of a plurality of to-be-detected devices and quickly determines the fault event, so that a fault occurring in an application scenario can be accurately located.
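A Python sketch of executing detection logic follows. It assumes, hypothetically, that the logical relationship is encoded as branch indices: each event names the next event to run, depending on whether its fault was found. Device commands are simulated by a lookup table, and the instruction strings and fault IDs are made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DetectionEvent:
    device_id: str
    instruction: str
    fault_id: str
    on_fault: Optional[int] = None  # next event index if the fault is found (None = stop)
    on_ok: Optional[int] = None     # next event index otherwise (None = stop)


def run_detection(
    logic: list[DetectionEvent],
    execute: Callable[[str, str], str],
) -> list[str]:
    """Run events successively along the encoded logic; report faults whose IDs appear in results."""
    faults: list[str] = []
    visited: set[int] = set()
    step: Optional[int] = 0 if logic else None
    while step is not None:
        if step in visited:
            raise ValueError(f"detection logic revisits event {step}")
        visited.add(step)
        event = logic[step]
        result = execute(event.device_id, event.instruction)
        if event.fault_id in result:
            faults.append(event.fault_id)
            step = event.on_fault
        else:
            step = event.on_ok
    return faults


# Simulated device responses; switch-1 appears in two events, so device IDs partially overlap.
OUTPUTS = {
    ("switch-1", "check-port 2"): "port 2 down: LINK_DOWN",
    ("server-9", "check-nic eth0"): "eth0 up",
    ("switch-1", "check-optics 2"): "rx power -30 dBm: OPTICS_LOW_POWER",
}
logic = [
    DetectionEvent("switch-1", "check-port 2", "LINK_DOWN", on_fault=1),
    DetectionEvent("server-9", "check-nic eth0", "NIC_NO_LINK", on_fault=2, on_ok=2),
    DetectionEvent("switch-1", "check-optics 2", "OPTICS_LOW_POWER"),
]
print(run_detection(logic, lambda device, instruction: OUTPUTS[(device, instruction)]))
```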

Storage System, Data Processing Method, Apparatus, Node, and Storage Medium
20230015979 · 2023-01-19

This application discloses a storage system, a data processing method, an apparatus, a node, and a storage medium, and pertains to the field of data storage technologies. In the method, a client determines an address in a storage unit for storing to-be-written data and sends the to-be-written data to a first storage device, in a storage node, that corresponds to the storage unit. The first storage device then stores the to-be-written data without the CPU of the storage node having to determine the hard disk LBA corresponding to the virtual address space, and without the hard disk having to determine a corresponding physical address based on the hard disk LBA.
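A Python sketch of the write path follows. It assumes, hypothetically, that the client holds a table mapping each storage unit to a device and a base address on that device, and that it chooses the unit by hashing the object key. The abstract does not describe how the client picks the unit or computes the address. The point illustrated is that the device receives a ready-made address, so no node-side translation sits on the write path in this model.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical allocation granularity within a storage unit


@dataclass
class StorageDevice:
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)

    def write(self, address: int, data: bytes) -> None:
        # Store at the client-supplied address; no node-side address translation is modeled.
        self.blocks[address] = data


@dataclass
class StorageUnit:
    unit_id: int
    device: StorageDevice
    base: int  # first address of this unit on its device


class Client:
    def __init__(self, units: list[StorageUnit]):
        self.units = units

    def locate(self, key: str, block_index: int) -> tuple[StorageUnit, int]:
        """Pick a storage unit by hashing the key, then compute the address inside that unit."""
        digest = hashlib.sha256(key.encode()).digest()
        unit = self.units[int.from_bytes(digest[:8], "big") % len(self.units)]
        return unit, unit.base + block_index * BLOCK_SIZE

    def write(self, key: str, block_index: int, data: bytes) -> tuple[int, int]:
        unit, address = self.locate(key, block_index)
        unit.device.write(address, data)  # sent straight to the device backing the unit
        return unit.unit_id, address


# Illustrative layout: two units on two devices in (hypothetically) different storage nodes.
units = [
    StorageUnit(0, StorageDevice("node-a/ssd0"), base=0),
    StorageUnit(1, StorageDevice("node-b/ssd0"), base=1 << 20),
]
client = Client(units)
unit_id, address = client.write("orders/2023-01-19", 2, b"payload")
print(unit_id, address)
```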