G06F11/1096

Heterogenous Memory Accommodating Multiple Erasure Codes

A method for proactively rebuilding user data in a plurality of storage nodes of a storage cluster is provided. The method includes distributing user data and metadata throughout the plurality of storage nodes such that the plurality of storage nodes can read the user data, using erasure coding, despite the loss of two of the storage nodes. The method includes determining that one of the storage nodes is unreachable and determining to rebuild the user data for the unreachable storage node. The method includes reading the user data across the remainder of the plurality of storage nodes using the erasure coding, and writing the user data across the remainder of the plurality of storage nodes using the erasure coding. A plurality of storage nodes within a single chassis that can proactively rebuild the user data stored within the storage nodes is also provided.
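The rebuild flow above (read the user data from the surviving nodes, then write it back across those same nodes so that two further losses remain survivable) can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not name a code, so a systematic Reed-Solomon-style code over the prime field GF(257) is assumed, and `encode`, `decode`, and `proactive_rebuild` are hypothetical names. Shard i is assumed to live on the i-th node.

```python
# Systematic Reed-Solomon-style code over the prime field GF(257) (an assumption:
# the abstract does not name a code). Shard i holds evaluations at x = i.
P = 257


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, k: int, m: int) -> list:
    """Split data into k data shards (x = 0..k-1) and add m parity shards (x = k..k+m-1)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    for j in range(m):
        shards.append([
            _interpolate([(i, shards[i][pos]) for i in range(k)], k + j)
            for pos in range(size)
        ])
    return shards


def decode(survivors: dict, k: int, length: int) -> bytes:
    """Recover the original bytes from any k surviving shards, keyed by shard index."""
    if len(survivors) < k:
        raise ValueError("more shards lost than the code tolerates")
    chosen = sorted(survivors.items())[:k]
    size = len(chosen[0][1])
    out = bytearray()
    for x in range(k):
        for pos in range(size):
            out.append(_interpolate([(i, s[pos]) for i, s in chosen], x))
    return bytes(out[:length])


def proactive_rebuild(shards: list, unreachable: set, k: int, length: int, m: int = 2):
    """Read the user data from the reachable nodes, then re-stripe it across those
    same nodes with m parity shards so two further node losses stay survivable.
    Entry i of the returned list is meant for the i-th remaining node."""
    survivors = {i: s for i, s in enumerate(shards) if i not in unreachable}
    data = decode(survivors, k, length)
    new_k = len(survivors) - m
    if new_k < 1:
        raise ValueError("too few nodes left to keep m parity shards")
    return encode(data, new_k, m), new_k
```

For example, data striped 4+2 over six nodes is rebuilt as 3+2 over the five reachable nodes once one node is unreachable, trading some capacity efficiency for restored two-node fault tolerance.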

METHOD AND DEVICE FOR SELECTING RAID LEVEL FOR MAPPED RAID
20170364271 · 2017-12-21

A method and device for selecting a Redundant Array of Independent Disks (RAID) level for mapped RAID. The method comprises determining, for a given RAID level, a desired ratio of rebuilding speed between the mapped RAID and non-mapped RAID based on a first number of disks in the non-mapped RAID and a second number of disks in the mapped RAID. The method also comprises determining an actual ratio of rebuilding speed between the mapped RAID and the non-mapped RAID based on the second number of disks in the mapped RAID. In addition, the method comprises selecting the given RAID level for the mapped RAID in response to the actual ratio being above the desired ratio.
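The selection rule (choose the level when the actual rebuild-speed ratio exceeds the desired one) can be sketched under a deliberately simple reliability model. Both formulas below are assumptions, not the patent's: the desired ratio comes from an independent-failure model in which the data-loss rate scales as n(n-1)...(n-f) times the rebuild time raised to the power f (f = tolerated failures), and the actual ratio idealizes mapped-RAID rebuild as writing in parallel to all n-1 surviving disks versus a single spare disk for non-mapped RAID.

```python
import math


def desired_ratio(n_plain: int, n_mapped: int, faults: int) -> float:
    """Rebuild-speed ratio mapped RAID needs to match non-mapped RAID reliability.

    Assumed model: data-loss rate ~ n*(n-1)*...*(n-faults) * T_rebuild**faults,
    so equal reliability needs (T_plain / T_mapped)**faults = ratio of the products.
    """
    if faults < 1:
        raise ValueError("faults must be at least 1")
    num = math.prod(n_mapped - i for i in range(faults + 1))
    den = math.prod(n_plain - i for i in range(faults + 1))
    return (num / den) ** (1 / faults)


def actual_ratio(n_mapped: int) -> float:
    """Idealized: mapped RAID rebuilds onto the n_mapped - 1 surviving disks in
    parallel, while non-mapped RAID is bottlenecked on one spare disk."""
    return float(n_mapped - 1)


def select_level(n_plain: int, n_mapped: int, faults: int) -> bool:
    """Select the given RAID level if the actual ratio is above the desired ratio."""
    return actual_ratio(n_mapped) > desired_ratio(n_plain, n_mapped, faults)
```

Under this toy model a 4+1 single-parity level is selected on 16 mapped disks (actual 15 vs. desired 12) but not on 40 (actual 39 vs. desired 78), which illustrates why the desired ratio grows faster than the parallel rebuild speedup as the pool widens.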

Storage controller, storage system and method of operating storage controller

A redundant array of independent disks (RAID) storage system includes a RAID master controller that receives a RAID request and selectively communicates it to one of a plurality of storage devices, wherein first and second storage devices among the storage devices are directly connected outside a data communication path that includes the host. Upon receiving the RAID request, the first storage device determines whether a RAID sub-request must be distributed to the second storage device; upon determining that it must, the first storage device's RAID controller communicates the RAID sub-request to the second storage device via the direct network connection.
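A minimal sketch of the peer-to-peer sub-request path, assuming a RAID-5-style small write in which the data chunk lives on the first device and the stripe's parity on a directly connected second device. `StorageDevice`, `RaidMasterController`, and the parity-delta sub-request are hypothetical illustrations, not the claimed design: the first device decides a sub-request is needed and sends only the XOR delta to its peer, so the host path carries a single request.

```python
from dataclasses import dataclass, field
from typing import Optional


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class StorageDevice:
    """Hypothetical device with its own RAID controller and a direct peer link."""
    name: str
    blocks: dict = field(default_factory=dict)  # stripe id -> chunk (data or parity)
    peer: Optional["StorageDevice"] = None      # direct connection outside the host path

    def handle_raid_request(self, stripe: int, new_data: bytes, parity_on_peer: bool) -> None:
        old = self.blocks.get(stripe, bytes(len(new_data)))
        self.blocks[stripe] = new_data
        # Decide whether a RAID sub-request must be distributed to the peer device.
        if parity_on_peer:
            if self.peer is None:
                raise RuntimeError(f"{self.name} has no direct peer link")
            # Send only the parity delta, device to device; the host is not involved.
            self.peer.handle_parity_subrequest(stripe, xor(old, new_data))

    def handle_parity_subrequest(self, stripe: int, delta: bytes) -> None:
        old_parity = self.blocks.get(stripe, bytes(len(delta)))
        self.blocks[stripe] = xor(old_parity, delta)


class RaidMasterController:
    """Hypothetical master: forwards each host RAID request to a single device."""

    def __init__(self, devices: list):
        self.devices = devices

    def submit(self, device_index: int, stripe: int, data: bytes, parity_on_peer: bool = True) -> None:
        self.devices[device_index].handle_raid_request(stripe, data, parity_on_peer)
```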

Fault Tolerant Disaggregated Memory

Aspects of the disclosure are directed to a low-latency, low-overhead fault-tolerant remote memory framework, which packs similar-size in-memory objects into individual page-aligned spans and applies erasure coding to these spans. The framework fully utilizes efficient one-sided remote memory accesses (RMAs) to swap spans in and out using minimal network input/output operations (I/Os), with compaction techniques that reduce remote memory fragmentation. The framework can achieve lower tail latency and higher application performance than other fault tolerance solutions, at the cost of potentially higher memory usage.
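The packing-then-coding idea can be sketched as below. The 4 KiB span size, the power-of-two size classes, and the single XOR parity span (a stand-in for a general erasure code such as Reed-Solomon) are all assumptions; `SpanPacker`, `xor_parity`, and `recover` are hypothetical names, and the RMA transport and compaction are omitted.

```python
PAGE = 4096  # span size: one page (assumed 4 KiB)


def size_class(n: int) -> int:
    """Round an object size up to a power-of-two size class (assumed classing scheme)."""
    c = 16
    while c < n:
        c *= 2
    return c


class SpanPacker:
    """Packs similar-size objects into page-aligned spans, one open span per size class."""

    def __init__(self):
        self.spans = []   # each span is a PAGE-sized bytearray
        self.open = {}    # size class -> (span index, next free slot)
        self.index = {}   # object key -> (span index, byte offset, length)

    def put(self, key, obj: bytes) -> None:
        cls = size_class(len(obj))
        if cls > PAGE:
            raise ValueError("object larger than a span")
        span, slot = self.open.get(cls, (None, 0))
        if span is None or (slot + 1) * cls > PAGE:
            self.spans.append(bytearray(PAGE))  # start a fresh span for this class
            span, slot = len(self.spans) - 1, 0
        offset = slot * cls
        self.spans[span][offset:offset + len(obj)] = obj
        self.index[key] = (span, offset, len(obj))
        self.open[cls] = (span, slot + 1)

    def get(self, key) -> bytes:
        span, offset, length = self.index[key]
        return bytes(self.spans[span][offset:offset + length])


def xor_parity(spans) -> bytearray:
    """Single XOR parity span over a group of spans (stand-in for a general erasure code)."""
    parity = bytearray(PAGE)
    for s in spans:
        for i, b in enumerate(s):
            parity[i] ^= b
    return parity


def recover(surviving_spans, parity) -> bytearray:
    """Rebuild the one missing span of a group from the survivors and the parity."""
    return xor_parity(list(surviving_spans) + [parity])
```

Coding whole page-aligned spans rather than individual objects means each swap-out or recovery moves fixed-size pages, which is what makes one-sided, page-granular remote accesses a natural fit.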

Storage system

Provided is a storage system that performs inter-node movement of parity and reconfiguration of a stripe when the node configuration is changed. The storage system includes a plurality of nodes and a management unit. The nodes are targets for data write and read requests; they form a stripe from a plurality of data stored in different nodes together with parity generated from that data, and, for redundancy, store the parity of the stripe to which the write-requested data belongs in a node different from the nodes that store the data. When the node configuration is changed, the management unit transmits to the nodes an arrangement change request to perform the inter-node movement of the parity and the reconfiguration of the stripe.
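One way to picture the management unit's arrangement change is to compare the stripe layout before and after a node is added. The rotating-parity `layout` rule and the `arrangement_changes` diff below are hypothetical illustrations, not the patent's placement algorithm; they only preserve the stated invariant that a stripe's parity never sits on a node holding that stripe's data.

```python
def layout(nodes: list, width: int, stripes: int) -> list:
    """Hypothetical rotating-parity layout: stripe s keeps its parity on nodes[s % n]
    and its `width` data chunks on the following nodes, never on the parity node."""
    n = len(nodes)
    if width + 1 > n:
        raise ValueError("not enough nodes for data plus parity")
    return [
        ([nodes[(s + 1 + i) % n] for i in range(width)], nodes[s % n])
        for s in range(stripes)
    ]


def arrangement_changes(old_nodes, new_nodes, old_width, new_width, stripes) -> list:
    """List the per-stripe parity moves and stripe reconfigurations that a management
    unit would send to the nodes after the node configuration changes."""
    changes = []
    old_plan = layout(old_nodes, old_width, stripes)
    new_plan = layout(new_nodes, new_width, stripes)
    for s, ((old_data, old_par), (new_data, new_par)) in enumerate(zip(old_plan, new_plan)):
        if (old_data, old_par) != (new_data, new_par):
            changes.append({
                "stripe": s,
                "parity_from": old_par,
                "parity_to": new_par,
                "data_nodes": new_data,
            })
    return changes
```

Growing from three nodes (2 data + parity) to four (3 data + parity) reconfigures every stripe's width, and stripe 3 additionally moves its parity from `n0` to the new node `n3`.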

Systems and methods of maintaining fault tolerance for new writes in degraded erasure coded distributed storage

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for maintaining fault tolerance for new writes in a storage system when one or more components of the storage system are unavailable. One of the methods includes determining that one or more first disks of a capacity object of a storage system are unavailable, wherein the storage system comprises a segment usage table identifying the plurality of segments of the capacity object; in response: identifying a plurality of available second disks, adding a plurality of new segments corresponding to the second disks to the capacity object, and adding data identifying the plurality of new segments to the segment usage table; and for each of one or more new write requests to the capacity object: identifying an available segment from the plurality of new segments, and writing data associated with the new write request to the identified available segment.
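The degraded-write path can be sketched with a segment usage table keyed by segment id. `Segment`, `CapacityObject`, and the rotation rule for choosing disks are hypothetical; the point illustrated is that new segments span only available disks at full stripe width, so new writes keep full fault tolerance while the failed disks remain unavailable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seg_id: int
    disks: tuple                 # disks that hold this segment's data and parity columns
    data: Optional[bytes] = None


class CapacityObject:
    """Hypothetical capacity object with a segment usage table (id -> Segment)."""

    def __init__(self, segments):
        self.usage_table = {s.seg_id: s for s in segments}
        self.new_segments = []

    def on_disks_unavailable(self, failed: set, all_disks: list, width: int, count: int) -> None:
        """Add `count` new segments that use only available disks, at full stripe width."""
        available = [d for d in all_disks if d not in failed]
        if len(available) < width:
            raise RuntimeError("not enough available disks for a full-width segment")
        next_id = max(self.usage_table, default=-1) + 1
        for i in range(count):
            # Rotate the starting disk so new segments spread over the available disks.
            disks = tuple(available[(i + j) % len(available)] for j in range(width))
            seg = Segment(next_id + i, disks)
            self.usage_table[seg.seg_id] = seg  # record the new segment in the usage table
            self.new_segments.append(seg)

    def write(self, data: bytes) -> int:
        """Direct a new write to the first free segment among the new segments."""
        for seg in self.new_segments:
            if seg.data is None:
                seg.data = data
                return seg.seg_id
        raise RuntimeError("no free new segment")
```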

Orphan block management in non-volatile memory devices
09811413 · 2017-11-07

A system for data storage includes one or more non-volatile memory (NVM) devices, each device including multiple memory blocks, and a processor. The processor is configured to assign the memory blocks into groups, to apply a redundant data storage scheme in each of the groups, to identify a group of the memory blocks including at least one bad block that renders remaining memory blocks in the group orphan blocks, to select a type of data suitable for storage in the orphan blocks, and to store the data of the identified type in the orphan blocks.
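The orphan-block flow (group blocks, detect groups broken by a bad block, pick a data type that tolerates losing group redundancy, then place it) can be sketched as follows. The fixed-size grouping, the `DATA_TYPES` catalog, and the "first type that is safe without redundancy" policy are all hypothetical; the abstract does not say which data type is chosen.

```python
# Hypothetical catalog: data type -> whether it stays safe without group redundancy.
DATA_TYPES = {
    "user_data": False,        # needs the group's redundancy
    "read_cache": True,        # can be re-fetched from its backing copy
    "metadata_mirror": True,   # a second copy exists elsewhere
}


def group_blocks(block_ids: list, group_size: int) -> list:
    """Assign memory blocks to fixed-size groups (each group gets its own redundancy)."""
    return [block_ids[i:i + group_size] for i in range(0, len(block_ids), group_size)]


def find_orphans(groups: list, bad_blocks: set) -> list:
    """Good blocks in any group that contains a bad block become orphan blocks."""
    orphans = []
    for group in groups:
        if any(b in bad_blocks for b in group):
            orphans.extend(b for b in group if b not in bad_blocks)
    return orphans


def select_type(types: dict = DATA_TYPES) -> str:
    """Pick the first data type that does not rely on the group's redundancy."""
    return next(name for name, safe in types.items() if safe)


def allocate_orphans(orphans: list, items: list) -> dict:
    """Place items of the selected type into orphan blocks, one item per block."""
    return dict(zip(orphans, items))
```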

Dynamic restriping in nonvolatile memory systems

Data is stored as a first collection of memory blocks distributed across a first set of memory devices. It is determined that a first memory device in the first set is in a degraded state. Data corresponding to a first memory block of the first collection, which is stored in the first memory device, is recovered; the first memory device is configured to include a first number of memory blocks. The recovered data is stored in a second memory device as a new memory block, which is added to the first collection of memory blocks. The first memory device is removed from the first set and reconfigured with a second number of memory blocks that is less than the first number of memory blocks. Memory blocks of a second collection of memory blocks, distributed across a second set of memory devices, are stored in the reconfigured first memory device.
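A minimal sketch of the restriping sequence, assuming each collection is protected by single XOR parity (the abstract does not specify the redundancy scheme). `Device` and `restripe` are hypothetical names; capacity is counted in memory blocks, and the degraded device is shrunk rather than discarded so it can host blocks of another collection.

```python
from functools import reduce


def xor_all(chunks) -> bytes:
    """XOR equal-length chunks column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


class Device:
    """Hypothetical memory device with a capacity measured in memory blocks."""

    def __init__(self, name: str, capacity: int):
        self.name, self.capacity, self.blocks = name, capacity, {}

    def store(self, key: str, data: bytes) -> None:
        if len(self.blocks) >= self.capacity:
            raise RuntimeError(f"{self.name} is full")
        self.blocks[key] = data


def restripe(collection: list, members: dict, degraded: Device, spare: Device, shrink_to: int) -> bytes:
    """Recover the degraded device's block of `collection` from the other blocks,
    store it on `spare` as the collection's new member, then shrink `degraded`."""
    lost_key = next(k for k in collection if members[k] is degraded)
    recovered = xor_all([members[k].blocks[k] for k in collection if k != lost_key])
    spare.store(lost_key, recovered)
    members[lost_key] = spare            # the new block joins the first collection
    degraded.blocks.pop(lost_key, None)  # the degraded device leaves the first set
    degraded.capacity = shrink_to        # reconfigured with fewer memory blocks
    return recovered
```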

Fault-tolerant Enterprise Object Storage System for Small Objects

Various implementations disclosed herein provide a fault-tolerant enterprise object storage system that can store small objects. In various implementations, the fault-tolerant enterprise object storage system writes a small object into an aggregate object that is distributed across a plurality of storage entities. In some implementations, the small object is at least an order of magnitude smaller than the aggregate object, and within the same order of magnitude as a block unit addressable within each of the storage entities. In some implementations, the storage system updates the parity data associated with the aggregate object based on the small object, in response to writing the small object into the aggregate object. In various implementations, the storage system updates a processed data end offset indicator, which indicates that the parity data for the aggregate object includes valid data up to and including the small object.
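The small-object path can be sketched as an append into an aggregate object striped over k entities, with an incremental parity update and an advancing processed-end offset. The round-robin block-unit striping, the single XOR parity entity, and the names `AggregateObject`, `put_small`, and `processed_end` are assumptions for illustration only.

```python
class AggregateObject:
    """Hypothetical aggregate object striped over k storage entities in fixed-size
    block units, with one XOR parity entity protecting each row of blocks."""

    def __init__(self, k: int, block: int, blocks_per_entity: int):
        self.k, self.block = k, block
        self.entities = [bytearray(block * blocks_per_entity) for _ in range(k)]
        self.parity = bytearray(block * blocks_per_entity)
        self.capacity = k * block * blocks_per_entity
        self.write_offset = 0    # next free logical byte in the aggregate object
        self.processed_end = 0   # parity is valid for logical bytes [0, processed_end)
        self.index = {}          # key -> (logical start, length)

    def _locate(self, logical: int):
        """Map a logical byte to (entity, byte offset within that entity)."""
        unit, within = divmod(logical, self.block)
        row, entity = divmod(unit, self.k)
        return entity, row * self.block + within

    def put_small(self, key, obj: bytes) -> None:
        start = self.write_offset
        if start + len(obj) > self.capacity:
            raise ValueError("aggregate object is full")
        for i, b in enumerate(obj):
            entity, off = self._locate(start + i)
            old = self.entities[entity][off]
            self.entities[entity][off] = b
            self.parity[off] ^= old ^ b  # incremental parity update for this row
        self.write_offset += len(obj)
        self.index[key] = (start, len(obj))
        self.processed_end = self.write_offset  # parity now valid through this object

    def get(self, key) -> bytes:
        start, length = self.index[key]
        return bytes(self.entities[e][o] for e, o in (self._locate(start + i) for i in range(length)))
```

Because parity is patched byte by byte as each small object lands, the offset indicator can be advanced immediately, instead of waiting for the whole aggregate object to fill before computing parity.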

RAID SYSTEM PERFORMANCE ENHANCEMENT USING COMPRESSED DATA AND BYTE ADDRESSABLE STORAGE DEVICES
20170286220 · 2017-10-05

A method for operating a RAID storage system includes configuring the RAID storage devices to receive a byte count in a read or write command, receiving a first data block to write to the storage system, compressing the received first data block to generate a first compressed data block, and then storing the first compressed data block in memory. The method additionally includes executing a set of RAID operations to perform a partial stripe update, including: retrieving a second compressed data block from memory; determining a physical size of the second compressed data block; generating, based on the second compressed data block and the physical size, redundant data corresponding to the second compressed data block; and writing the second compressed data block and the redundant data by transmitting a write command including the second compressed data block, the redundant data, and the physical size to the set of RAID storage devices.
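The partial stripe update can be sketched as a read-modify-write over compressed chunks, using Python's standard `zlib` for compression. The XOR parity over zero-padded chunks and the dictionary standing in for the write command (with a `byte_count` field) are assumptions; the sketch only shows how the physical size of the compressed block travels with the data and redundancy so a byte-addressable device can store exactly that many bytes.

```python
import zlib


def xor_padded(*chunks: bytes) -> bytes:
    """XOR chunks of different lengths, treating shorter ones as zero-padded."""
    out = bytearray(max(len(c) for c in chunks))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)


def partial_stripe_update(old_chunk: bytes, old_parity: bytes, raw_block: bytes) -> dict:
    """Compress the new block, derive its physical size, and update parity by
    read-modify-write; returns a hypothetical write command for the RAID devices."""
    new_chunk = zlib.compress(raw_block)
    physical_size = len(new_chunk)  # byte count carried in the write command
    new_parity = xor_padded(old_parity, old_chunk, new_chunk)
    return {"data": new_chunk, "parity": new_parity, "byte_count": physical_size}
```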