G06F11/2058

EXTRA-RESILIENT CACHE FOR RESILIENT STORAGE ARRAY

A data storage array is configured for m-way resiliency across a first plurality of storage nodes. The m-way resiliency causes the data storage array to direct each top-level write to at least m storage nodes within the first plurality, for committing data to a corresponding capacity region allocated on each storage node to which each write operation is directed. Based on the data storage array being configured for m-way resiliency, an extra-resilient cache is allocated across a second plurality of storage nodes comprising at least s storage nodes (where s>m), including allocating a corresponding cache region on each of the second plurality for use by the extra-resilient cache. Based on determining that a particular top-level write has not been acknowledged by at least n of the first plurality of storage nodes (where n≤m), the particular top-level write is redirected to the extra-resilient cache.

Availability of a storage system
11341003 · 2022-05-24 · ·

Embodiments of the present disclosure provide a method for a storage system, a storage system and a computer program product. The method comprises determining a first drive in a drive array is temporarily unavailable. The method further comprises setting the first drive in a frozen state. The method further comprises: in response to receiving a write request for the first drive during the frozen state, pending the write request or recording the write request in a second drive in the drive array. The method further comprises: in response to receiving a read request for the first drive during the frozen state, reconstructing data to which the read request is directed through data stored in a third drive in the drive array.

Stream-based logging for distributed storage systems

Generally described, aspects of the present application correspond to maintaining a message stream for a network-based data store, which stream includes messages reflecting modifications to the data store. Messages within the stream may be used to revert a state of the data store to a prior point in time reflected within the messages of the stream, such as by “rewinding” operations on the data store by use of the messages within the stream. Messages in the stream may further be used to asynchronously update a replica of the data store.

Synchronous Replication Of High Throughput Streaming Data
20220155972 · 2022-05-19 · ·

A method for synchronous replication of stream data includes receiving a stream of data blocks for storage at a first storage location associated with a first geographical region and at a second storage location associated with a second geographical region. The method also includes synchronously writing the stream of data blocks to the first storage location and to the second storage location. While synchronously writing the stream of data blocks, the method includes determining an unrecoverable failure at the second storage location. The method also includes determining a failure point in the writing of the stream of data blocks that demarcates data blocks that were successfully written and not successfully written to the second storage location. The method also includes synchronously writing, starting at the failure point, the stream of data blocks to the first storage location and to a third storage location associated with a third geographical region.

Using Mirrored Copies For Data Availability
20230267054 · 2023-08-24 ·

Ensuring resiliency to storage device failures in a storage system, including: determining a number of storage device failures within a particular write group that are to be tolerated by the storage system; for a plurality of datasets stored within the storage system, writing each dataset to at least a predetermined number of storage devices within the particular write group, wherein the predetermined number of storage devices is greater than the number of storage device failures within the particular write group that are to be tolerated by the storage system; and responsive to recovering from a system interruption: determining a number of readable storage devices that contain a copy of the dataset; and if the number of readable storage devices that contain a copy of the dataset is not greater than the number of failures that are to be tolerated, writing the dataset to one or more additional storage devices.

Rebuilding an Encoded Data Slice Based on a Slice Integrity Value

A method includes retrieving an encoded data slice from memory of a storage network, where the encoded data slice is associated with a slice integrity value stored in the memory, and where a data segment of data is error encoded into a set of encoded data slices that includes the encoded data slice. The method further includes generating a second slice integrity value based on the retrieved encoded data slice. The method further includes determining whether the second slice integrity value compares favorably to the slice integrity value. When the second slice integrity value compares unfavorably to the slice integrity value, the method further includes facilitating rebuilding of the encoded data slice to produce a rebuilt encoded data slice. The method further includes storing the rebuilt encoded data slice in the memory.

VIRTUALIZED FILE SERVER

In one embodiment, a system for managing communication connections in a virtualization environment includes a plurality of host machines implementing a virtualization environment, wherein each of the host machines includes a hypervisor, at least one user virtual machine (user VM), and a distributed file server that includes file server virtual machines (FSVMs) and associated local storage devices. Each FSVM and associated local storage device are local to a corresponding one of the host machines, and the FSVMs conduct I/O transactions with their associated local storage devices based on I/O requests received from the user VMs. Each of the user VMs on each host machine sends each of its representative I/O requests to an FSVM that is selected by one or more of the FSVMs for each I/O request based on a lookup table that maps a storage item referenced by the I/O request to the selected one of the FSVMs.

Failover Methods and System in a Networked Storage Environment

Failover methods and systems for a storage environment are provided. During a takeover operation to take over storage of a first storage system node by a second storage system node, the second storage system node copies information from a first storage location to a second storage location. The first storage location points to an active file system of the first storage system node, and the second storage location is assigned to the second storage system node for the takeover operation. The second storage system node quarantines storage space likely to be used by the first storage system node for a write operation, while the second storage system node attempts to take over the storage of the first storage system node. The second storage system node utilizes information stored at the second storage location during the takeover operation to give back control of the storage to the first storage system node.

Replication between heterogeneous storage systems
11327998 · 2022-05-10 ·

Disclosed herein are systems, methods, and processes to perform replication between heterogeneous storage systems. In one embodiment, a request to perform a replication operation is sent by a target server to a source server, where the target server and the source server use different protocols to store data. A plurality of instructions, which are associated with a replication stream, are received from the source server by the target server, where the plurality of instructions comprise an include instruction to include existing data and a write instruction to write new data. A replication stream, which is associated with a backup stream stored on the source server, is also received from the source server, where the replication stream and the backup stream share a common format. The target server synthesizes a new replicated backup image, where the synthesizing comprises performing the include instruction and the write instruction on the replication stream.

Identifying a fault domain for a delta component of a distributed data object

The disclosure herein describes placing a delta component of a base component in a target fault domain. A delta component associated with a base component is generated. The generation includes selecting a first fault domain as a target fault domain for the delta component based on the first fault domain including a witness component associated with the distributed data object of the base component. Otherwise, the generation includes selecting a second fault domain as the target fault domain based on the second fault domain including at least one data component that includes a different address space than the base component. Otherwise, the generation includes selecting a third fault domain as the target fault domain based on the third fault domain being unused. Then, the delta component is placed on the target fault domain, whereby data durability of the distributed data object is enhanced, and available fault domains are preserved.