Patent classifications
G06F11/1088
USE OF CLUSTER-LEVEL REDUNDANCY WITHIN A CLUSTER OF A DISTRIBUTED STORAGE MANAGEMENT SYSTEM TO ADDRESS NODE-LEVEL ERRORS
Systems and methods that make use of cluster-level redundancy within a distributed storage management system to address various node-level error scenarios are provided. According to one embodiment, a KV store of a first node of a cluster of a distributed storage management system manages storage of data blocks as values and corresponding block IDs as keys. Data integrity errors are reported to the first node in the form of a list of missing block IDs, that is, block IDs that are in use but missing from the KV store. A metadata resynchronization process may then be performed, including, for each block ID in the list of missing block IDs: (i) reading a data block corresponding to the block ID from another node of the cluster that maintains redundant information relating to the block ID; and (ii) restoring the block ID within the KV store by writing the data block to the first node.
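As a rough illustration, the resynchronization loop might look like the Python sketch below; the cluster and KV-store interfaces (`find_redundant_peer`, `read_block`, `put`) are hypothetical stand-ins, not APIs named in the patent.

```python
# Hypothetical sketch of the metadata resynchronization pass: for every
# block ID reported missing, fetch the block from a peer node that holds
# redundant information for it, then write it back into the local KV store.

def resync_missing_blocks(kv_store, cluster, missing_block_ids):
    restored, failed = [], []
    for block_id in missing_block_ids:
        # Locate another node in the cluster that maintains redundant
        # information relating to this block ID (placeholder lookup).
        peer = cluster.find_redundant_peer(block_id)
        if peer is None:
            failed.append(block_id)
            continue
        data_block = peer.read_block(block_id)   # (i) read from peer
        kv_store.put(block_id, data_block)       # (ii) restore locally
        restored.append(block_id)
    return restored, failed
```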
MANAGING MACHINE LEARNING MODEL RECONSTRUCTION
A method performed by one or more processors that preserves a machine learning model comprises accessing a set of model parameters associated with the machine learning model. The model parameters are determined responsive to training the machine learning model. The method comprises generating a plurality of model parameter sets, where each of the plurality of model parameter sets comprises a separate portion of the set of model parameters. The method comprises determining one or more parity sets comprising values calculated from the plurality of model parameter sets. The method comprises distributing the plurality of model parameter sets and the one or more parity sets among a plurality of computing devices, where each of the plurality of computing devices stores a model parameter set of the plurality of model parameter sets or a parity set of the one or more parity sets. The method comprises accessing, from the plurality of computing devices, a number of sets comprising one or more of the model parameter sets and at least one parity set. The method comprises reconstructing the machine learning model from the number of sets accessed from the plurality of computing devices.
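To make the parity idea concrete, here is a minimal sketch that splits a flat parameter list into shards and computes one XOR parity shard over the serialized bytes, so any single lost shard is recoverable. The shard count and the choice of XOR parity are assumptions for illustration; the patent leaves the parity calculation general.

```python
import struct

def to_bytes(params):
    """Serialize a list of float parameters to their IEEE-754 bytes."""
    return struct.pack(f"{len(params)}d", *params)

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_sets(params, num_shards):
    """Split the parameters into equal shards plus one XOR parity shard."""
    blob = to_bytes(params)
    size = len(blob) // num_shards          # assumes even divisibility
    shards = [blob[i * size:(i + 1) * size] for i in range(num_shards)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards, parity

def recover_shard(shards, parity, lost):
    """Rebuild the shard at index `lost` from the survivors and parity."""
    rebuilt = parity
    for i, s in enumerate(shards):
        if i != lost:
            rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

# Example: 8 parameters across 4 shards; drop shard 2 and reconstruct it.
params = [0.1 * i for i in range(8)]
shards, parity = make_sets(params, 4)
assert recover_shard(shards, parity, 2) == shards[2]
```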
Heterogeneous Memory Accommodating Multiple Erasure Codes
A method for proactively rebuilding user data in a plurality of storage nodes of a storage cluster is provided. The method includes distributing user data and metadata throughout the plurality of storage nodes such that the plurality of storage nodes can read the user data, using erasure coding, despite the loss of two of the storage nodes. The method includes determining that one of the storage nodes is unreachable and determining to rebuild the user data for the unreachable storage node. The method includes reading the user data across a remainder of the plurality of storage nodes using the erasure coding, and writing the user data across the remainder of the plurality of storage nodes using the erasure coding. A plurality of storage nodes within a single chassis that can proactively rebuild the user data stored within the storage nodes is also provided.
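Surviving the loss of two storage nodes requires two independent parities. The sketch below uses conventional RAID-6-style double parity over GF(2^8), with generator g = 2 and reduction polynomial 0x11d; these are standard illustrative choices, as the abstract does not specify the erasure code.

```python
# Build GF(2^8) exponent/log tables for multiplication and division.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D            # reduce by x^8 + x^4 + x^3 + x^2 + 1
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_div(a, b):
    if a == 0:
        return 0
    return GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]

def encode(shards):
    """Compute P (plain XOR) and Q (GF-weighted) parity over data shards."""
    p, q = bytearray(len(shards[0])), bytearray(len(shards[0]))
    for i, shard in enumerate(shards):
        coef = GF_EXP[i]                      # g^i with generator g = 2
        for j, byte in enumerate(shard):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)

def recover_two(shards, lost_a, lost_b, p, q):
    """Rebuild two lost data shards from the survivors plus P and Q."""
    a, b = bytearray(p), bytearray(q)         # fold in surviving shards
    for i, shard in enumerate(shards):
        if i in (lost_a, lost_b):
            continue
        coef = GF_EXP[i]
        for j, byte in enumerate(shard):
            a[j] ^= byte                      # A = D_a ^ D_b
            b[j] ^= gf_mul(coef, byte)        # B = g^a*D_a ^ g^b*D_b
    ga, gb = GF_EXP[lost_a], GF_EXP[lost_b]
    # Solve the 2x2 GF system: D_a = (g^b*A + B) / (g^a + g^b), D_b = A + D_a.
    da = bytes(gf_div(gf_mul(gb, a[j]) ^ b[j], ga ^ gb) for j in range(len(p)))
    db = bytes(x ^ y for x, y in zip(a, da))
    return da, db

# Lose shards 1 and 3 out of 4; rebuild both from the survivors plus P, Q.
data = [bytes([17 * i + j for j in range(8)]) for i in range(4)]
p, q = encode(data)
assert recover_two(data, 1, 3, p, q) == (data[1], data[3])
```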
RELIABILITY CODING WITH REDUCED NETWORK TRAFFIC
This disclosure describes techniques that include implementing network-efficient data durability or data reliability coding on a network. In one example, this disclosure describes a method that includes generating a plurality of data fragments from data to enable reconstruction of the data from a subset of the plurality of data fragments; storing the plurality of data fragments across a plurality of nodes in a network, including storing a first fragment at a first node and a second fragment at a second node; generating, by the first node, a plurality of secondary fragments derived from the first fragment to enable reconstruction of the first fragment from a subset of the plurality of secondary fragments; and storing the plurality of secondary fragments across a plurality of storage devices included within the first node.
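The benefit of the secondary fragments is that a device failure inside a node can be repaired from that node's own devices, without re-fetching primary fragments over the network. A toy two-level sketch, using XOR parity at both levels purely for illustration (the coding scheme is not fixed by the abstract):

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(blob, k):
    """Split `blob` into k pieces plus one XOR parity piece, so the
    loss of any single piece is recoverable."""
    size = len(blob) // k                   # assumes even divisibility
    pieces = [blob[i * size:(i + 1) * size] for i in range(k)]
    parity = pieces[0]
    for p in pieces[1:]:
        parity = xor_bytes(parity, p)
    return pieces + [parity]

# Level 1: fragment the data across nodes.
data = bytes(range(48))
node_fragments = split_with_parity(data, 3)    # 3 data nodes + 1 parity node

# Level 2: inside the first node, derive secondary fragments so a lost
# storage device can be rebuilt locally, with no network traffic.
device_fragments = split_with_parity(node_fragments[0], 4)

# Rebuild device 2 of the first node from its sibling devices only.
rebuilt = device_fragments[-1]                 # start from local parity
for i, frag in enumerate(device_fragments[:-1]):
    if i != 2:
        rebuilt = xor_bytes(rebuilt, frag)
assert rebuilt == device_fragments[2]
```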
Rebuilding Data Slices in a Storage Network Based on Priority
A distributed storage integrity system in a dispersed storage network includes a scanning agent and a control unit. The scanning agent identifies an encoded data slice that requires rebuilding, wherein the encoded data slice is one of a plurality of encoded data slices generated from a data segment using an error encoding dispersal function. The control unit retrieves at least a number T of encoded data slices needed to reconstruct the data segment based on the error encoding dispersal function. The control unit is operable to reconstruct the data segment from at least the number T of the encoded data slices and to generate a rebuilt encoded data slice from the reconstructed data segment. The scanning agent is located in a storage unit, and the control unit is located in the storage unit, a storage integrity processing unit, a dispersed storage processing unit, or a dispersed storage managing unit.
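A sketch of the scan-and-rebuild flow: checksums stand in for the integrity check that flags a slice for rebuilding, and a single XOR parity makes T equal to the number of slices minus one. A real dispersed storage network would use a Reed-Solomon-style dispersal function; every name here is illustrative.

```python
import hashlib

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def scan(slices, checksums):
    """Scanning agent: report indices of slices that require rebuilding."""
    return [i for i, s in enumerate(slices)
            if s is None or hashlib.sha256(s).hexdigest() != checksums[i]]

def rebuild(slices, bad_index):
    """Control unit: reconstruct the segment from T intact slices, then
    regenerate the bad slice (here T = len(slices) - 1, XOR parity)."""
    good = [s for i, s in enumerate(slices) if i != bad_index]
    rebuilt = good[0]
    for s in good[1:]:
        rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

# Two data slices plus their XOR parity; lose one, detect it, rebuild it.
slices = [b"abcd", b"efgh", xor_bytes(b"abcd", b"efgh")]
sums = [hashlib.sha256(s).hexdigest() for s in slices]
slices[1] = None                        # slice lost or corrupted
bad = scan(slices, sums)[0]
slices[bad] = rebuild(slices, bad)
assert scan(slices, sums) == []
```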
Reconfiguring a storage system based on resource availability
Reconfiguring a storage system based on resource availability, including: limiting a number of storage devices in a storage system that may be simultaneously servicing write operations; determining that an amount of required write bandwidth has changed; and subsequent to determining that the amount of required write bandwidth has changed, adjusting, by a computer processor, the number of storage devices in the storage system that may be simultaneously servicing write operations.
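In sketch form, the adjustment reduces to recomputing a device cap whenever the required write bandwidth changes; the per-device bandwidth constant and clamping policy below are assumptions for illustration.

```python
import math

PER_DEVICE_WRITE_BW = 500   # MB/s one device can absorb (assumed figure)

def adjust_write_devices(required_bw_mbs, total_devices, min_devices=1):
    """Recompute how many devices may simultaneously service writes
    after the required write bandwidth has changed."""
    needed = math.ceil(required_bw_mbs / PER_DEVICE_WRITE_BW)
    return max(min_devices, min(needed, total_devices))

# Demand rises from 800 MB/s to 2600 MB/s on a 10-device system.
assert adjust_write_devices(800, 10) == 2
assert adjust_write_devices(2600, 10) == 6
```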
Techniques for managing failed storage devices
Techniques described herein manage failed storage devices. The number of failed storage devices is determined to exceed the number of redundancies in the storage configuration of the storage system. The status of a failed storage device is changed to permit read operations only. Valid data from the failed storage device is copied to a spare storage device. Invalid data on the failed storage device is reconstructed from corresponding data on other storage devices, and the reconstructed data is stored on the spare storage device. The failed storage device is then removed from the storage system.
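Sequenced as code, the technique looks roughly like the following; the device interface and the XOR-based reconstruction are placeholders, since the abstract does not name the redundancy scheme.

```python
def evacuate_failed_device(failed, peers, spare):
    """Drain a failed device that has been demoted to read-only:
    copy valid data directly, reconstruct invalid data from peers,
    then remove the device from the system."""
    failed.read_only = True                      # permit reads only
    for block_id in failed.blocks():
        if failed.is_valid(block_id):
            spare.write(block_id, failed.read(block_id))
        else:
            # Rebuild from corresponding data on the other devices
            # (XOR parity assumed here purely for illustration).
            rebuilt = None
            for peer in peers:
                chunk = peer.read(block_id)
                rebuilt = chunk if rebuilt is None else bytes(
                    a ^ b for a, b in zip(rebuilt, chunk))
            spare.write(block_id, rebuilt)
    failed.remove_from_system()
```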
Memory access methods and apparatus
A disclosed example apparatus includes a row address register (412) to store a row address corresponding to a row (608) in a memory array (602). The example apparatus also includes a row decoder (604) coupled to the row address register to assert a signal on a wordline (704) of the row after the memory receives a column address. In addition, the example apparatus includes a column decoder (606) to selectively activate a portion of the row based on the column address and the signal asserted on the wordline.
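The sequencing described (latch the row address, defer asserting the wordline until the column address arrives, then activate only part of the row) can be mimicked in a toy software model; the class below is purely illustrative and not from the patent.

```python
class PostedRowAccess:
    """Toy model: the row address is registered first, the wordline is
    asserted only once the column address arrives, and the column
    decoder then activates just a portion of the row."""

    def __init__(self, array):
        self.array = array                 # memory array: rows of cells
        self.row_address_register = None
        self.wordline_asserted = False

    def set_row(self, row_addr):
        self.row_address_register = row_addr   # store, do not assert yet

    def access(self, col_addr, width=4):
        self.wordline_asserted = True          # assert after column arrives
        row = self.array[self.row_address_register]
        return row[col_addr:col_addr + width]  # activate only a portion

mem = PostedRowAccess([list(range(16)) for _ in range(4)])
mem.set_row(2)
assert mem.access(4) == [4, 5, 6, 7]
```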
Storage control apparatus, method, and medium for scheduling volume recovery
A storage control apparatus controls a storage apparatus that includes a storage drive in which a plurality of logical volumes are set. The storage control apparatus includes a storage unit that stores load information for each of the plurality of logical volumes, and a control unit. The control unit determines to-be-rebuilt volumes, which are targets to be rebuilt, from the plurality of logical volumes, and sequentially selects each logical volume for which a volume-specific taken time is to be estimated. For each selected logical volume, the control unit determines a volume-specific start time at which the rebuild will be started, and estimates, using the volume-specific start time and the load information about the selected logical volume, the volume-specific taken time for rebuilding that volume. The control unit then totals the volume-specific taken times estimated for the selected logical volumes to calculate the total taken time for rebuilding the to-be-rebuilt volumes.
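A sketch of the estimate: each volume's rebuild start time is the running total of the taken times estimated before it, and its own taken time is scaled by the load expected at that start time. The load model and the scaling rule are assumptions for illustration.

```python
def estimate_total_rebuild_time(volumes, load_at):
    """`volumes` maps volume name -> base rebuild time (minutes);
    `load_at(t)` returns the expected load (0.0-1.0) at time t (assumed)."""
    schedule, total = [], 0.0
    for name, base_time in volumes.items():
        start = total                        # volume-specific start time
        taken = base_time / (1.0 - load_at(start))   # slower under load
        schedule.append((name, start, taken))
        total += taken                       # running total of taken times
    return schedule, total

vols = {"vol0": 30.0, "vol1": 45.0}
schedule, total = estimate_total_rebuild_time(vols, lambda t: 0.25)
assert round(total) == 100                   # (30 + 45) / 0.75
```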
Distributed hot space in a data storage server
The described technology is generally directed towards a virtualized dedicated hot spare storage device in a RAID-configured data storage system, in which the capacity of the dedicated spare storage device is distributed among the physical disks underlying a RAID virtual disk. A RAID controller creates a first virtual construct comprising an array of logical block addresses that maps data reads from and writes to the virtual disk to locations in the physical disks underlying the virtual disk. When hot space storage device capacity is specified, the RAID controller creates a second construct comprising another array of logical block addresses which are reserved for the distributed hot space. The virtualized dedicated hot spare storage device increases storage capacity and performance by utilizing more of the storage resources of a data storage server.
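The two constructs can be pictured as two logical-block-address maps over the same physical disks: one striping the virtual disk across the usable region, the other striping the reserved hot space across the disks' tails. The round-robin layout and per-disk reservation below are illustrative assumptions.

```python
STRIPE_UNIT = 128          # blocks per stripe unit (assumed)

def map_lba(lba, disks, disk_blocks, hot_space_per_disk):
    """First construct: map a virtual-disk LBA onto the physical disks,
    leaving the tail of each disk reserved for the distributed hot space."""
    usable = disk_blocks - hot_space_per_disk
    stripe, offset = divmod(lba, STRIPE_UNIT)
    disk = stripe % disks
    block = (stripe // disks) * STRIPE_UNIT + offset
    assert block < usable, "LBA exceeds virtual disk capacity"
    return disk, block

def map_hot_space_lba(lba, disks, disk_blocks, hot_space_per_disk):
    """Second construct: map a hot-space LBA into the reserved region
    that is spread across the same physical disks."""
    disk, block = divmod(lba, hot_space_per_disk)
    assert disk < disks, "hot-space LBA out of range"
    return disk, disk_blocks - hot_space_per_disk + block

# 4 disks of 1,000,000 blocks each, 50,000 blocks per disk held as hot space.
assert map_lba(300, 4, 1_000_000, 50_000) == (2, 44)
assert map_hot_space_lba(50_123, 4, 1_000_000, 50_000) == (1, 950_123)
```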