G06F11/1084

Method, device, and computer readable storage medium for managing redundant array of independent disks
11579975 · 2023-02-14

Techniques manage a redundant array of independent disks (RAID). In such a technique, a response time of a first storage device in the RAID is compared to a first threshold. In response to the response time of the first storage device exceeding the first threshold, the first storage device is configured as a pseudo-degraded storage device, such that the pseudo-degraded storage device is responsive to write requests only.
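
The threshold check described in this abstract can be illustrated with a minimal Python sketch. The names Device, apply_threshold, and accepts are hypothetical and do not come from the patent, as is the use of milliseconds for the response time; the sketch only mirrors the abstract's statement that a device whose response time exceeds the first threshold serves write requests only.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical model of a RAID member; field names are assumptions.
    name: str
    response_ms: float
    pseudo_degraded: bool = False


def apply_threshold(device: Device, first_threshold_ms: float) -> Device:
    """Mark the device pseudo-degraded when its response time exceeds the threshold."""
    if device.response_ms > first_threshold_ms:
        device.pseudo_degraded = True
    return device


def accepts(device: Device, op: str) -> bool:
    """Return True if the device should serve the given request type."""
    if device.pseudo_degraded:
        # Per the abstract, a pseudo-degraded device responds to writes only.
        return op == "write"
    return op in ("read", "write")
```

In a real array, reads aimed at the pseudo-degraded device would presumably be served from redundancy on the other members; that part is outside what the abstract states and is not modeled here.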

VARIABLE SPARING OF DISK DRIVES IN STORAGE ARRAY

In general, embodiments relate to managing a Redundant Array of Independent Disks (RAID) group. The embodiments include determining a minimum and maximum set of spare disks to allocate to the RAID group, wherein the RAID group comprises a plurality of active members, allocating the minimum number of spare members to the RAID group, allocating an additional spare member to the RAID group, setting a mode of the additional spare member to storage mode, and enabling, after the setting, the RAID controller to store data in the plurality of active members and in the additional spare member, wherein the plurality of active members, the minimum number of spare members, and the additional spare member comprise persistent storage.

FLEET HEALTH MANAGEMENT DEVICE CLASSIFICATION FRAMEWORK

An approach to identifying a corrective action for a data storage device (DSD), such as one implemented in a fleet of DSDs in a data center, involves receiving error data about excursions from normal operational behavior of the DSD, inputting data representing a particular excursion into a probabilistic decision network which characterizes a set of DSD operational metrics and certain DSD controller rules that represent internal controls of the DSD and corresponding conditional relationships among the operational metrics, determining from the decision network the likelihood that one or more possible causes was a contributing factor to the particular excursion, and determining a corrective action for the particular excursion based on the determined likelihood of a particular cause of the one or more possible causes. The corrective action may then be shared with the DSD for in-situ execution of corresponding self-repair operations.

Systems and methods for backing up volatile storage devices
11586508 · 2023-02-21

A method for backing up data includes detecting, by volatile storage firmware, that data communication to a volatile storage component is degraded, initiating a direct memory access (DMA) engine to copy the data from the volatile storage component to a non-volatile storage device, and, in response to initiating copying of the data, initiating a shutdown of the volatile storage component.

METHOD FOR DATA RECONSTRUCTION IN A RAID SYSTEM HAVING A PROTECTION POOL OF STORAGE UNITS
20220357880 · 2022-11-10

A method of performing a reconstruction of data in a redundant array of independent disks (RAID) system with a protection pool of storage units includes receiving a request to perform a reconstruction of a first set of physical extents stored on a first physical disk of a set of physical disks. Each physical extent of the first set of physical extents is associated with an array of a second set of physical extents. The second set of physical extents is distributed across the set of physical disks. The method further includes allocating a third set of physical extents on one or more physical disks of the set of physical disks other than the first physical disk, and distributing data from each of the first set of physical extents of the first physical disk to a corresponding physical extent of the third set of physical extents.
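
As a rough illustration of the allocation step, the following Python sketch assigns each extent of the first (failed) disk a target extent on some other disk. The function name allocate_targets, the round-robin placement policy, and the representation of free space as per-disk lists of slot numbers are all assumptions made for the example; the abstract only requires that the third set of extents lie on disks other than the first physical disk.

```python
def allocate_targets(failed_disk, extents, free):
    """Map each extent of the failed disk to a (disk, slot) on another disk.

    failed_disk: id of the disk being reconstructed.
    extents: extent ids stored on the failed disk (the "first set").
    free: dict mapping disk id -> list of free slot numbers (consumed in place).
    Round-robin placement is an assumed policy, not taken from the patent.
    """
    candidates = [d for d in sorted(free) if d != failed_disk]
    mapping = {}
    cursor = 0
    for ext in extents:
        # Try each candidate disk once, starting from the round-robin cursor.
        for _ in range(len(candidates)):
            disk = candidates[cursor % len(candidates)]
            cursor += 1
            if free[disk]:
                mapping[ext] = (disk, free[disk].pop(0))
                break
        else:
            raise RuntimeError(f"no free extent available for {ext!r}")
    return mapping
```

Spreading targets across the remaining disks keeps any single disk from absorbing all reconstruction writes, which is one plausible reason for drawing the third set from a shared pool.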

WRITE HOLE PROTECTION METHOD AND SYSTEM FOR RAID, AND STORAGE MEDIUM
20220350703 · 2022-11-03

A write hole protection method and system for a RAID, and a storage medium. The method comprises: presetting a log area, and after a RAID is degraded, setting the log area to be in an enabled state; when the log area is in the enabled state, determining, before each stripe write operation, whether a data block of a failed member disk of the RAID in a stripe is a check data block; if the data block is not the check data block, determining whether data blocks to be written of the stripe comprise a data block to be written into the failed member disk; if yes, backing up the data block to be written into the failed member disk in the log area; if not, calculating the data block of the failed member disk and backing up the data block in the log area, or backing up the data blocks to be written in the log area; and when the degraded RAID is started after a failure, performing data recovery using the log area. By using the present solution, the write hole issue of the RAID is avoided.
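
The per-stripe decision described in this abstract can be sketched in Python as follows. The function name log_action and the returned labels are hypothetical. The abstract does not say what happens when the failed disk holds the check (parity) block of the stripe; returning "none" for that case is an assumption.

```python
def log_action(failed_disk, parity_disk, disks_to_write):
    """Decide what to back up in the log area before a stripe write.

    failed_disk: index of the failed member disk.
    parity_disk: index of the disk holding this stripe's check block.
    disks_to_write: set of disk indices whose data blocks will be written.
    """
    if failed_disk == parity_disk:
        # Assumption: nothing is logged when only the check block is lost.
        return "none"
    if failed_disk in disks_to_write:
        # The write targets the failed disk: back up that pending block.
        return "log_failed_disk_write"
    # Otherwise back up either the reconstructed block of the failed disk
    # or the blocks about to be written (the abstract allows both options).
    return "log_reconstructed_block_or_pending_writes"
```

After a failure, recovery would replay the log area on the degraded array; that step is not modeled here.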

Techniques for performing live rebuild in storage systems that operate a direct write mode

Techniques for rebuilding data in a data storage system are provided. A method includes: (a) identifying (i) a first set of degraded Ubers that contain no portions reserved for direct writing and (ii) a second set of degraded Ubers that contain at least one portion reserved for direct writing. Direct writing is a process that writes blocks to long-term storage prior to mapping those blocks in a metadata mapping structure. An Uber is a set of adjacent stripes across a respective Redundant Array of Independent Disks (RAID) array of the data storage system, and a degraded Uber is an Uber that includes at least one failed drive within its RAID array. The method further includes (b) initiating a rebuild of the first set of degraded Ubers; and (c) delaying a rebuild of each degraded Uber of the second set until all pending direct writes to blocks of that degraded Uber have been mapped by the metadata mapping structure.
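
Steps (a) through (c) amount to partitioning degraded Ubers into two sets and gating the second on pending direct writes. A minimal Python sketch, with the Uber record, its field names, and both helper functions being hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Uber:
    # Hypothetical record; field names are assumptions for illustration.
    uber_id: int
    degraded: bool
    has_direct_write_portion: bool
    pending_direct_writes: int = 0


def plan_rebuild(ubers):
    """Split degraded Ubers into (rebuild_now, delayed) lists of ids."""
    rebuild_now, delayed = [], []
    for u in ubers:
        if not u.degraded:
            continue
        if u.has_direct_write_portion:
            delayed.append(u.uber_id)      # second set: wait for mapping
        else:
            rebuild_now.append(u.uber_id)  # first set: rebuild immediately
    return rebuild_now, delayed


def ready_for_rebuild(uber):
    """A degraded Uber may be rebuilt once no direct writes remain unmapped."""
    return uber.degraded and uber.pending_direct_writes == 0
```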

METHODS AND APPARATUSES FOR MANAGEMENT OF RAID
20220326857 · 2022-10-13

Techniques for managing a redundant array of independent disks (RAID) involve detecting an abnormality of a storage device in a RAID. The techniques further involve resetting the storage device in response to detecting the abnormality. The techniques further involve storing an address of a write operation for the RAID within a preset time period, so as to rebuild the RAID in the case that the storage device is recovered within the preset time period. Accordingly, temporary errors of the RAID can be efficiently handled, the amount of RAID downtime caused by the storage device or the back end can be reduced, and computing resources and time required to rebuild the RAID can be significantly reduced.
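
The idea of recording write addresses during the reset window can be sketched in Python as below. The class ResetTracker and its methods are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the fallback to a full rebuild when the device recovers too late is an assumption that the abstract does not spell out.

```python
class ResetTracker:
    """Track addresses written while a reset storage device is unavailable."""

    def __init__(self, reset_time, preset_seconds):
        # The recording window ends preset_seconds after the reset.
        self.deadline = reset_time + preset_seconds
        self.dirty = set()

    def record_write(self, address, now):
        """Remember a write address if it falls inside the preset window."""
        if now <= self.deadline:
            self.dirty.add(address)

    def on_recovered(self, now):
        """Return the addresses to rebuild, or None if a full rebuild is needed."""
        if now <= self.deadline:
            return sorted(self.dirty)
        # Assumption: after the window expires, fall back to a full rebuild.
        return None
```

Rebuilding only the recorded addresses is what would save time and computing resources compared with rebuilding the whole device.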

Sharing spare capacity of disks with multiple sizes to parallelize RAID rebuild
20230113849 · 2023-04-13

Managed drives of a storage node with different size drives in a fixed arithmetic relationship are organized into clusters of same size drives. Every drive is configured to have M*G same-size partitions, where M is a positive integer variable defined by the arithmetic relationship and G is the RAID group size. The storage capacity of all drives can be viewed as matrices of G+1 rows and M*G columns, and each matrix is composed of submatrices of G+1 rows and G columns. Diagonal spare partitions are allocated and distributed in the same pattern over groups of G columns of all matrices, for increasing partition index values. Members of RAID groups are vertically distributed such that the members of a given RAID group reside in a single partition index of a single cluster. When a drive fails, protection group members of the failed drive are rebuilt in order on spare partitions characterized by lowest partition indices for increasing drive numbers across multiple clusters. Consequently, drive access for rebuild is parallelized and latency is reduced.