G06F11/2053

Extra-resilient cache for resilient storage array

A data storage array is configured for m-way resiliency across a first plurality of storage nodes. The m-way resiliency causes the data storage array to direct each top-level write to at least m storage nodes within the first plurality, for committing data to a corresponding capacity region allocated on each storage node to which each write operation is directed. Based on the data storage array being configured for m-way resiliency, an extra-resilient cache is allocated across a second plurality of storage nodes comprising at least s storage nodes (where s > m), including allocating a corresponding cache region on each of the second plurality for use by the extra-resilient cache. Based on determining that a particular top-level write has not been acknowledged by at least n of the first plurality of storage nodes (where n ≤ m), the particular top-level write is redirected to the extra-resilient cache.
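The redirect decision described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the `top_level_write` function, and the choice of the first m array nodes as write targets are hypothetical, and the sketch assumes the acknowledgement threshold n is no larger than m.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical storage node with a capacity region and a cache region."""
    name: str
    online: bool = True
    capacity: list = field(default_factory=list)  # committed top-level writes
    cache: list = field(default_factory=list)     # extra-resilient cache region

    def commit(self, data, region):
        """Store data in the named region; return True as the acknowledgement."""
        if not self.online:
            return False
        getattr(self, region).append(data)
        return True


def top_level_write(data, array_nodes, cache_nodes, m, n):
    """Direct a write to m array nodes; redirect it if fewer than n acknowledge.

    Assumption: the first m array nodes are the targets, and n <= m.
    Returns (destination, number_of_acknowledgements).
    """
    targets = array_nodes[:m]
    acks = sum(node.commit(data, "capacity") for node in targets)
    if acks >= n:
        return "array", acks
    # Not enough acknowledgements: send the write to every cache node (s > m).
    cache_acks = sum(node.commit(data, "cache") for node in cache_nodes)
    return "cache", cache_acks


array = [Node("a0"), Node("a1", online=False), Node("a2")]
cache = [Node("c0"), Node("c1"), Node("c2")]
print(top_level_write("w1", array, cache, m=2, n=2))
```

With one of the two target nodes offline, only one acknowledgement arrives, so the write lands in the cache regions of all three cache nodes.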

Fabricless allocation of a local cache in a fabric environment

Worker threads allocate at least some recycled cache slots of a local portion of a shared memory to the compute node to which the memory portion is local. More specifically, the recycled cache slots are allocated prior to receipt of the IO that the recycled cache slot will be used to service. The allocated recycled cache slots are added to primary queues of each compute node. If a primary queue is full then the worker thread adds the recycled cache slot, unallocated, to a secondary queue. Cache slots in the secondary queue can be claimed by any compute node associated with the shared memory. Cache slots in the primary queue can be used by the local compute node without sending test and set messages via the fabric that interconnects the compute nodes, thereby improving IO latency.
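The primary and secondary queue behavior can be sketched roughly as follows. The `SlotRecycler` class, its method names, and the fixed `primary_capacity` are assumptions made for illustration; the real system operates on shared memory across compute nodes rather than in-process deques.

```python
from collections import deque


class SlotRecycler:
    """Hypothetical model of per-node primary queues plus one shared secondary queue."""

    def __init__(self, node_ids, primary_capacity):
        self.primary = {node: deque() for node in node_ids}
        self.primary_capacity = primary_capacity  # assumed fixed queue size
        self.secondary = deque()

    def recycle(self, slot, local_node):
        """Worker thread path: pre-allocate a recycled slot to its local node.

        If the local primary queue is full, the slot goes to the secondary
        queue unallocated, where any node may claim it.
        """
        queue = self.primary[local_node]
        if len(queue) < self.primary_capacity:
            queue.append(slot)
            return "primary"
        self.secondary.append(slot)
        return "secondary"

    def claim(self, node):
        """IO path: return (slot, needs_fabric_message).

        Slots from the primary queue are already allocated to the node, so no
        test-and-set message over the fabric is needed for them.
        """
        if self.primary[node]:
            return self.primary[node].popleft(), False
        if self.secondary:
            return self.secondary.popleft(), True
        return None, True


recycler = SlotRecycler(["A", "B"], primary_capacity=1)
recycler.recycle("s1", "A")   # fits in A's primary queue
recycler.recycle("s2", "A")   # primary full, goes to the secondary queue
print(recycler.claim("A"), recycler.claim("B"))
```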

Establishing a synchronous replication relationship between two or more storage systems

Establishing a synchronous replication relationship between two or more storage systems, including: identifying, for a dataset, a plurality of storage systems across which the dataset will be synchronously replicated; configuring one or more data communications links between each of the plurality of storage systems to be used for synchronously replicating the dataset; exchanging, between the plurality of storage systems, timing information for at least one of the plurality of storage systems; and establishing, in dependence upon the timing information for at least one of the plurality of storage systems, a synchronous replication lease, the synchronous replication lease identifying a period of time during which the synchronous replication relationship is valid.

Responding to a change in membership among storage systems synchronously replicating a dataset

Determining active membership among a set of storage systems synchronously replicating a dataset, where determining active membership includes: determining that a membership event corresponds to a change in membership to the set of storage systems synchronously replicating the dataset; applying, in dependence upon the membership event, one or more membership protocols to determine a new set of storage systems to synchronously replicate the dataset; and for one or more I/O operations directed to the dataset, applying the one or more I/O operations to the dataset synchronously replicated by the new set of storage systems.

Modifying A Synchronously Replicated Dataset

Modifying a synchronously replicated dataset, including: receiving, by a leader storage system, a request to modify a dataset that is synchronized across a plurality of storage systems; sending, from the leader storage system to a follower storage system, information describing the request to modify the dataset, wherein the leader storage system and the follower storage system each store a copy of the dataset; processing, by the leader storage system on the copy of the dataset that is stored on the leader storage system, the request to modify the dataset; receiving, from the follower storage system, an indication that the follower storage system has processed the request to modify the dataset on the copy of the dataset that is stored on the follower storage system; and acknowledging, by the leader storage system, completion of the request to modify the dataset.
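The sequence of steps above can be sketched in a single process as shown here. The class names, the `(key, value)` request shape, and the return strings are hypothetical; in this simplified model, sending the request description and receiving the follower's indication collapse into one method call.

```python
class StorageSystem:
    """Hypothetical storage system holding one copy of the dataset."""

    def __init__(self, name):
        self.name = name
        self.dataset = {}

    def process(self, request):
        """Apply a (key, value) modification; return True as the indication."""
        key, value = request
        self.dataset[key] = value
        return True


class Leader(StorageSystem):
    """Leader that acknowledges only after every follower has processed."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def modify(self, request):
        # Process the request on the leader's own copy of the dataset.
        self.process(request)
        # Describe the request to each follower and collect its indication.
        indications = [follower.process(request) for follower in self.followers]
        # Acknowledge completion only when all followers have processed it.
        return "acknowledged" if all(indications) else "not acknowledged"


follower = StorageSystem("follower")
leader = Leader("leader", [follower])
print(leader.modify(("block7", b"x")))
```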

Disaster recovery specific configurations, management, and application

A mechanism for disaster recovery configurations and management in virtual tape applications. Specifically, the introduction of an additional computer process executing at an active datacenter site and at another active (or, alternatively, a standby) datacenter site permits: (i) the generation and management of global configurations implemented on the active datacenter site prior to the occurrence of a failover event; and (ii) the implementation of global configurations on the other active (or standby) datacenter site after the occurrence of the failover event.

Use of error correction-based metric for identifying poorly performing data storage devices

An approach to identifying poorly performing data storage devices (DSDs) in a data storage system, such as hard disk drives (HDDs) and/or solid-state drives (SSDs), involves retrieving and evaluating a respective set of log pages, such as SCSI Log Sense counters, from each of multiple DSDs. Based on each respective set of log pages, a value for a Quality of Service (QoS) metric is determined for each respective DSD, where each QoS value represents an average percentage of bytes processed without the respective DSD performing an autonomous error correction. In response to a particular DSD reaching a predetermined threshold QoS value, an in-situ repair may be determined for the particular DSD or the particular DSD may be added to a list of candidate DSDs for further examination, which may include an FRPH examination for suitably configured DSDs.
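As a rough illustration of the metric, the sketch below averages per-domain percentages of bytes processed without error correction. The input shape is an assumption: each DSD is represented by a list of `(bytes_processed, bytes_corrected)` pairs, for example one pair per read, write, and verify domain, rather than by actual SCSI Log Sense page contents. Treating "reaching the threshold" as falling to or below it is also an assumption.

```python
def qos_value(counters):
    """Average percentage of bytes processed without autonomous error correction.

    counters: list of (bytes_processed, bytes_corrected) pairs (assumed shape).
    Returns None when no domain has processed any bytes.
    """
    percentages = [
        100.0 * (1 - corrected / processed)
        for processed, corrected in counters
        if processed > 0
    ]
    if not percentages:
        return None
    return sum(percentages) / len(percentages)


def candidate_dsds(dsd_counters, threshold):
    """List DSDs whose QoS value is at or below the threshold (assumed direction)."""
    flagged = []
    for dsd, counters in dsd_counters.items():
        value = qos_value(counters)
        if value is not None and value <= threshold:
            flagged.append(dsd)
    return flagged


fleet = {
    "hdd0": [(1024, 0), (1024, 0)],       # no corrections needed
    "hdd1": [(1024, 256), (1024, 512)],   # 75% and 50% clean
}
print(qos_value(fleet["hdd1"]), candidate_dsds(fleet, threshold=85.0))
```

The flagged devices would then be the ones considered for in-situ repair or further examination.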

Dynamically Processing Data In A Vast Data Ingestion System

A method begins with a first group of computing devices of a plurality of computing devices of a storage network receiving data objects having a first data type identifier of a plurality of data type identifiers, where the plurality of data type identifiers correspond to a plurality of data types associated with the data objects. The method continues with the first group of computing devices interpreting the data objects having the first data type identifier to sort, based on sorting criteria, the data objects into a first processing category and a second processing category. The method continues with the first group of computing devices error encoding the data objects in the second processing category based on short term storage error encoding parameters to produce pluralities of sets of encoded data slices and sending the pluralities of sets of encoded data slices to storage and execution units for storage therein.

Cluster member transfer for raid system expansion
11144413 · 2021-10-12

In a storage system that implements RAID (D+P) with an existing cluster of drives in which the drives have (D+P) partitions that are protection group members, cluster member transfer code creates a new drive cluster when fewer than D+P new drives are added to the storage system. The cluster member transfer code moves one or more drives from the existing cluster into a new cluster so that the number of new drives plus the number of moved drives equals D+P. One or more protection groups may be moved to the new cluster.
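The drive-count arithmetic can be sketched as follows. The function name and the policy of moving the last drives of the existing cluster are hypothetical; the abstract does not say which drives are selected, and relocation of protection group members is not modeled.

```python
def plan_new_cluster(existing_cluster, new_drives, d, p):
    """Pick drives to move so that new drives plus moved drives equals D+P.

    Returns (moved_drives, new_cluster). Which existing drives are moved is
    an arbitrary choice here (the last ones in the list).
    """
    width = d + p
    if len(new_drives) >= width:
        raise ValueError("transfer only applies when fewer than D+P drives are added")
    moves_needed = width - len(new_drives)
    if moves_needed > len(existing_cluster):
        raise ValueError("existing cluster has too few drives to move")
    moved = list(existing_cluster[-moves_needed:])
    return moved, moved + list(new_drives)


existing = ["e0", "e1", "e2", "e3", "e4"]
moved, new_cluster = plan_new_cluster(existing, ["n0", "n1"], d=4, p=1)
print(moved, new_cluster)
```

For RAID (4+1) with two new drives, three existing drives are moved so that the new cluster has five members.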