G06F11/0772

Servicing data storage devices in a data storage array

Systems and methods for replacing and testing a data storage device are disclosed. In disclosed embodiments, a system includes a data storage array (DSA) comprising a plurality of data storage devices (DSDs) in an enclosure. The system further includes an I/O server that couples the DSA to a client node and is configured to provide data access between the client node and the DSA. The system also includes a management server coupled to the DSA and configured to detect a failed DSD in the DSA, detect a replacement DSD in the enclosure that replaces the failed DSD, and add the replacement DSD to a logical path of the DSA. The management server is further configured to display an indication of a state of the DSA based on the comparing.
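A minimal Python sketch of the management-server workflow, assuming a simple in-memory model. The `DSD` and `DSA` classes, slot-based replacement, and the "degraded"/"optimal" state labels are hypothetical. The comparison step the abstract refers to is not specified there, so this sketch derives the displayed state from failure flags instead.

```python
from dataclasses import dataclass, field


@dataclass
class DSD:
    serial: str
    slot: int            # enclosure slot the device occupies
    failed: bool = False


@dataclass
class DSA:
    dsds: dict[int, DSD]                                   # slot -> installed DSD
    logical_path: list[str] = field(default_factory=list)  # serials in the logical path


def find_failed(dsa: DSA) -> list[int]:
    """Detect failed DSDs: return the slots whose device is marked failed."""
    return [slot for slot, d in dsa.dsds.items() if d.failed]


def service_slot(dsa: DSA, slot: int, replacement: DSD) -> None:
    """Install `replacement` in place of the failed DSD and update the logical path."""
    old = dsa.dsds[slot]
    if not old.failed:
        raise ValueError(f"slot {slot} has no failed DSD")
    if replacement.slot != slot:
        raise ValueError("replacement must occupy the failed DSD's slot")
    dsa.dsds[slot] = replacement
    # Keep the logical path ordering: swap the old serial for the new one in place.
    idx = dsa.logical_path.index(old.serial)
    dsa.logical_path[idx] = replacement.serial


def state(dsa: DSA) -> str:
    """State shown to the operator (hypothetical labels)."""
    return "degraded" if find_failed(dsa) else "optimal"
```

For example, an array `DSA({0: DSD("A", 0), 1: DSD("B", 1, failed=True)}, ["A", "B"])` reports "degraded" until `service_slot` installs a replacement in slot 1.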

Memory scrub using memory controller

A system-on-chip (SoC) can include a processor, a network controller configured to provide a network interface, and a memory controller configured to perform memory scrubbing. A memory patrol driver executing on the processor can initiate direct memory access (DMA) transfers to read successive portions of the memory by configuring corresponding DMA descriptors at a certain time interval. The network controller can perform each DMA transfer to read the corresponding portion of the memory, which can cause the memory controller to scrub that portion. The scrubbed data is sent to the network controller, which discards it.
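The patrol loop can be sketched as a toy simulation, assuming a flat memory of `mem_size` bytes scanned in fixed `chunk`-sized portions. The class and function names are hypothetical, and "scrubbing" is modeled only as a per-byte flag set on read, standing in for the ECC check and correction a real memory controller performs.

```python
import time
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class DmaDescriptor:
    src_addr: int  # start of the memory portion to read
    length: int    # number of bytes to read


def patrol_descriptors(mem_size: int, chunk: int) -> Iterator[DmaDescriptor]:
    """Yield descriptors covering memory in successive chunks, wrapping back to 0."""
    addr = 0
    while True:
        length = min(chunk, mem_size - addr)
        yield DmaDescriptor(addr, length)
        addr = (addr + length) % mem_size


class MemoryController:
    """Toy model: reading a byte counts as scrubbing it."""

    def __init__(self, mem_size: int) -> None:
        self.scrubbed = [False] * mem_size

    def read(self, addr: int, length: int) -> bytes:
        for a in range(addr, addr + length):
            self.scrubbed[a] = True
        return bytes(length)


class NetworkController:
    """Executes the DMA read and drops the data instead of transmitting it."""

    def __init__(self, mc: MemoryController) -> None:
        self.mc = mc

    def execute(self, desc: DmaDescriptor) -> None:
        self.mc.read(desc.src_addr, desc.length)  # returned data is discarded


def run_patrol(mem_size: int, chunk: int, steps: int,
               interval_s: float = 0.0) -> MemoryController:
    """Patrol driver loop: post one descriptor per time interval."""
    mc = MemoryController(mem_size)
    nic = NetworkController(mc)
    descs = patrol_descriptors(mem_size, chunk)
    for _ in range(steps):
        nic.execute(next(descs))
        time.sleep(interval_s)
    return mc
```

Spreading the reads over an interval, rather than scanning all memory at once, keeps the scrub traffic from competing heavily with normal memory accesses.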

BYTE LEVEL GRANULARITY BUFFER OVERFLOW DETECTION FOR MEMORY CORRUPTION DETECTION ARCHITECTURES
20180004588 · 2018-01-04

Memory corruption detection technologies are described. A processor can include a memory to store a memory corruption detection (MCD) table. A processor core of the processor can receive, from an application, an allocation request for an allocation of a memory object within a contiguous memory block in the memory. The processor core can allocate the contiguous memory block in view of the size of the requested memory object and write MCD meta-data into the MCD table, including an MCD identifier (ID) associated with the contiguous memory block and an MCD border value indicating the size of a memory region of the contiguous memory block.
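A minimal sketch of how an MCD ID plus a border value can catch overflows at byte granularity, assuming a hypothetical 64-byte block size, sequentially assigned IDs, and an `access_ok` check that stands in for the hardware comparison. None of these names come from the abstract.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64  # hypothetical allocation granularity in bytes


@dataclass
class McdEntry:
    mcd_id: int  # ID that pointers into this block must carry
    border: int  # bytes of the block that belong to the requested object


class McdAllocator:
    def __init__(self) -> None:
        self.table: dict[int, McdEntry] = {}  # block base address -> MCD meta-data
        self.next_base = 0
        self.next_id = 1

    def allocate(self, size: int) -> tuple[int, int]:
        """Reserve a contiguous block for `size` bytes; return (base, mcd_id)."""
        n_blocks = -(-size // BLOCK_SIZE)  # ceiling division
        base = self.next_base
        self.next_base += n_blocks * BLOCK_SIZE
        self.table[base] = McdEntry(self.next_id, size)
        self.next_id += 1
        return base, self.table[base].mcd_id

    def access_ok(self, base: int, mcd_id: int, offset: int) -> bool:
        """Valid only if the ID matches and the offset lies below the border value."""
        entry = self.table.get(base)
        return entry is not None and entry.mcd_id == mcd_id and 0 <= offset < entry.border
```

With a 10-byte allocation, offset 9 passes but offset 10 is rejected even though it still falls inside the 64-byte block; the border value is what supplies that byte-level precision.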

Methods and systems for a non-disruptive automatic unplanned failover from a primary copy of data at a primary storage system to a mirror copy of the data at a cross-site secondary storage system

Multi-site distributed storage systems and computer-implemented methods are described for providing an automatic unplanned failover (AUFO) feature to guarantee non-disruptive operations (e.g., operations of business enterprise applications, operations of software applications) even in the presence of failures including, but not limited to, network disconnection between multiple data centers and failures of a data center or cluster.

SYSTEMS AND METHODS FOR POWER LOSS PROTECTION OF STORAGE RESOURCES

In accordance with embodiments of the present disclosure, a method for power loss protection of one or more storage resources may include receiving information from each of the one or more storage resources regarding its power loss protection capabilities. The method may also include, based on the information, repurposing, for each power loss protection capable storage resource, a communications channel between a logic device and that storage resource for transmission of a respective early power-off warning signal. The method may further include, in response to a power event of a power supply unit that provides electrical energy to the one or more storage resources, asserting the respective early power-off warning signal for each power loss protection capable storage resource.
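The two phases of the method (configure channels from reported capabilities, then assert warnings on a power event) can be sketched as follows. The `StorageResource` and `LogicDevice` classes and the channel role strings are hypothetical; a real logic device would drive physical signals rather than update a set.

```python
from dataclasses import dataclass, field


@dataclass
class StorageResource:
    name: str
    plp_capable: bool  # power loss protection capability reported by the resource


@dataclass
class LogicDevice:
    channel_role: dict[str, str] = field(default_factory=dict)  # resource -> channel use
    asserted: set[str] = field(default_factory=set)             # warnings currently asserted

    def configure(self, resources: list[StorageResource]) -> None:
        """Repurpose the channel of each PLP-capable resource as its early power-off warning."""
        for r in resources:
            self.channel_role[r.name] = "early_power_off_warning" if r.plp_capable else "default"

    def on_power_event(self) -> None:
        """On a PSU power event, assert the warning on every repurposed channel."""
        for name, role in self.channel_role.items():
            if role == "early_power_off_warning":
                self.asserted.add(name)
```

Resources that do not report the capability keep their channel's default role, so the power event leaves them untouched.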

Mitigating read disturb effects in memory devices

A die read counter and a block read counter are maintained for a specified block of a memory device. An estimated number of read events associated with the specified block is determined based on a value of the block read counter and a value of the die read counter. Responsive to determining that the estimated number of read events satisfies a criterion, a media management operation of one or more pages associated with the specified block is performed.
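One possible way to combine the two counters, sketched under explicit assumptions. The abstract does not give the estimation formula, so this uses a hypothetical linear estimate: block reads plus `die_weight` times the die reads since the block was last refreshed. The threshold criterion and the `refresh` stand-in for the media management operation are also assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReadDisturbTracker:
    threshold: float    # hypothetical criterion on the estimated read count
    die_weight: float   # hypothetical contribution of each die-level read
    die_reads: int = 0  # die read counter
    block_reads: dict[int, int] = field(default_factory=dict)   # block read counters
    die_at_reset: dict[int, int] = field(default_factory=dict)  # die count at last refresh

    def estimate(self, block: int) -> float:
        """Hypothetical estimate: block reads + weight * die reads since last refresh."""
        die_since = self.die_reads - self.die_at_reset.get(block, 0)
        return self.block_reads.get(block, 0) + self.die_weight * die_since

    def record_read(self, block: int) -> bool:
        """Count a read; return True when the block should be refreshed."""
        self.die_reads += 1
        self.block_reads[block] = self.block_reads.get(block, 0) + 1
        return self.estimate(block) >= self.threshold

    def refresh(self, block: int) -> None:
        """Stand-in for the media management operation: reset the block's counters."""
        self.block_reads[block] = 0
        self.die_at_reset[block] = self.die_reads
```

Recording the die counter at refresh time lets a single shared die counter serve every block without resetting it.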

Error dynamics analysis
11709726 · 2023-07-25

A method, a system, and a computer program product for analyzing error messages. A first error log, generated as a result of an execution of at least one task of a computing system at a first instance, is received. The first error log includes a plurality of first error messages. A first association rules model is generated using the first error messages. The first association rules model includes a plurality of association rules defining one or more relationships. A second error log, including a plurality of second error messages generated as a result of an execution of the task at a second instance, is received, and a second association rules model is generated using the second error messages. Based on the first and second association rules models, at least one error message pattern associated with execution of the at least one task is determined.
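A minimal sketch of building and comparing two association rules models. It assumes each error log has already been split into transactions (for example, the set of messages seen in one time window), mines only pairwise rules `a -> b` by support and confidence rather than running a full Apriori search, and treats "error message pattern" as the rules that hold at both instances. The thresholds and function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions: list[list[str]], min_support: float = 0.5,
               min_confidence: float = 0.8) -> dict[tuple[str, str], float]:
    """Return {(a, b): confidence} for pairwise rules a -> b."""
    n = len(transactions)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for t in transactions:
        items = set(t)
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # ordered pairs (a, b)
    rules = {}
    for (a, b), c in pair_count.items():
        confidence = c / item_count[a]
        if c / n >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def recurring_patterns(rules1: dict, rules2: dict) -> list[tuple[str, str]]:
    """Rules present in both models are candidate patterns for the task."""
    return sorted(set(rules1) & set(rules2))
```

Rules found only in the second model could be reported separately as newly emerging error relationships.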

ERROR REPORTING FOR NON-VOLATILE MEMORY MODULES

A memory controller includes a memory channel controller adapted to receive memory access requests and dispatch associated commands, addressable in a system memory address space, to a non-volatile storage class memory (SCM) module. A non-volatile error reporting circuit identifies error conditions associated with the non-volatile SCM module and maps them from a first number of possible error conditions to a second, smaller number of virtual error types for reporting to an error monitoring module of a host operating system. The mapping is based at least on a classification of whether or not the error condition will have a deleterious effect on an executable process running on the host operating system.
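The many-to-few mapping can be sketched as a classification into two virtual error types. The condition names in `DELETERIOUS_CONDITIONS` and the two `VirtualError` members are hypothetical; an actual SCM module defines its own error conditions and the disclosure may use more virtual types.

```python
from collections import Counter
from enum import Enum


class VirtualError(Enum):
    NON_DELETERIOUS = "non_deleterious"  # running processes are unaffected
    DELETERIOUS = "deleterious"          # a running process may be affected


# Hypothetical module-level conditions judged to affect running processes.
DELETERIOUS_CONDITIONS = {"media_uncorrectable", "poison_consumed", "persistence_lost"}


def map_error(condition: str) -> VirtualError:
    """Collapse a device-specific error condition into a virtual error type."""
    if condition in DELETERIOUS_CONDITIONS:
        return VirtualError.DELETERIOUS
    return VirtualError.NON_DELETERIOUS


def report(conditions: list[str]) -> Counter:
    """Tally virtual error types for the host OS error monitor."""
    return Counter(map_error(c) for c in conditions)
```

This sketch treats unrecognized conditions as non-deleterious; a more conservative design could default them to the deleterious type instead.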

MECHANISM FOR INTEGRATING I/O HYPERVISOR WITH A COMBINED DPU AND SERVER SOLUTION

A combined data processing unit (DPU) and server solution with DPU operating system (OS) integration is described. A DPU OS is executed on a DPU or other computing device, where the DPU OS exercises secure calls provided by a trusted firmware component of the DPU; DPU OS components may invoke these calls to abstract away DPU-vendor-specific and server-vendor-specific integration details. An invocation of one of the secure calls, made on the DPU to communicate with its associated server computing device, is identified. When such a secure call is invoked, it is translated into a call or request specific to the architecture of the server computing device and then performed, which may include sending a signal to the server computing device in a format that the server computing device can interpret.
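The translation step amounts to a dispatch on the server architecture, sketched here with hypothetical vendor names and message formats. `SecureCall`, `TRANSLATORS`, and the `send` callback are illustrative assumptions, not the firmware interface described in the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SecureCall:
    name: str             # vendor-neutral call exposed by DPU firmware (hypothetical)
    args: dict[str, Any]


def to_vendor_a(call: SecureCall) -> dict:
    """Hypothetical structured request format for server architecture A."""
    return {"cmd": call.name.upper(), "params": call.args}


def to_vendor_b(call: SecureCall) -> str:
    """Hypothetical text command format for server architecture B."""
    return f"{call.name}:" + ",".join(f"{k}={v}" for k, v in sorted(call.args.items()))


TRANSLATORS: dict[str, Callable[[SecureCall], Any]] = {
    "vendor_a": to_vendor_a,
    "vendor_b": to_vendor_b,
}


def handle_secure_call(call: SecureCall, server_arch: str,
                       send: Callable[[Any], None]) -> Any:
    """Translate an intercepted secure call and send it to the server."""
    request = TRANSLATORS[server_arch](call)
    send(request)
    return request
```

Because DPU OS components only ever emit `SecureCall` values, supporting a new server vendor means adding one entry to the translator table.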

Method for identifying and evaluating common cause failures of system components

Provided is a method and system for identifying and evaluating common cause failures of system components, wherein at least one analytical artifact, machine-readable system-related spatial and/or topological data, and machine-readable system-related lifecycle data are processed to automatically analyze the susceptibility of system components to common cause failure based on common cause failure influencing factors.
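A simple way to flag susceptibility is to look for component pairs that share several influencing factors. This sketch assumes a hypothetical factor list (room and power feed as spatial/topological data, manufacturer lot and maintenance team as lifecycle data) and a hypothetical `min_shared` threshold; it is an illustration, not the evaluation method of the disclosure.

```python
from itertools import combinations

# Hypothetical CCF influencing factors from spatial/topological and lifecycle data.
FACTORS = ("room", "power_feed", "manufacturer_lot", "maintenance_team")


def ccf_pairs(components: dict[str, dict[str, str]],
              min_shared: int = 2) -> list[tuple[str, str, list[str]]]:
    """Return (a, b, shared_factors) for pairs sharing at least min_shared factors."""
    result = []
    for a, b in combinations(sorted(components), 2):
        shared = [
            f for f in FACTORS
            if f in components[a] and components[a][f] == components[b].get(f)
        ]
        if len(shared) >= min_shared:
            result.append((a, b, shared))
    return result
```

A factor missing from either component's data never counts as shared, so incomplete records cannot create spurious common-cause groupings.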