G06F11/1662

Data recovery in multi-leader distributed systems

Disclosed are a method and system for recovering a distributed system from a failure of a data storage unit. The distributed system includes a plurality of computer systems, each having a read-write computer and a data storage unit. Data is replicated from a particular data storage unit to other data storage units using a publish-subscribe model. A read-write computer receives the replicated data, processes the data for any conflicts and stores it in the data storage unit. If a data storage unit fails, another data storage unit that has the latest data corresponding to the failed data storage unit is determined and the latest data is replicated to the other data storage units. Accordingly, the distributed system continues to have the data of the failed data storage unit. The failed data storage unit may be reconstructed using data from one of the other data storage units in the distributed system.
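
The recovery step described above can be illustrated with a minimal sketch. The abstract does not say how "latest data" is determined; the per-origin sequence numbers, the `StorageUnit` class, and the `recover_origin` function below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    name: str
    # Hypothetical layout: origin unit name -> ordered list of
    # (sequence number, payload) entries replicated from that origin.
    logs: dict = field(default_factory=dict)

    def latest_seq(self, origin):
        entries = self.logs.get(origin, [])
        return entries[-1][0] if entries else 0


def recover_origin(failed, survivors):
    """Find the survivor with the newest data of `failed` and copy the
    missing entries to every other survivor."""
    donor = max(survivors, key=lambda unit: unit.latest_seq(failed))
    for unit in survivors:
        if unit is donor:
            continue
        have = unit.latest_seq(failed)
        missing = [e for e in donor.logs.get(failed, []) if e[0] > have]
        unit.logs.setdefault(failed, []).extend(missing)
    return donor


# Unit "A" has failed; B, C and D each hold part of its replicated data.
b = StorageUnit("B", {"A": [(1, "x"), (2, "y"), (3, "z")]})
c = StorageUnit("C", {"A": [(1, "x")]})
d = StorageUnit("D", {"A": [(1, "x"), (2, "y")]})
donor = recover_origin("A", [b, c, d])
```

After the call, every surviving unit holds the same entries for the failed unit, so any of them could serve as the source for reconstructing it.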

Method, device, and product for managing scrubbing operation in storage system
11669262 · 2023-06-06

The present disclosure relates to a method, device and product for managing scrubbing operations in a storage system. In the method for managing scrubbing operations in a storage system, regarding a plurality of extents included in the storage system, respective usage states of the plurality of extents are obtained. A group of target extents in which a failure will occur are detected from the plurality of extents based on the respective usage states of the plurality of extents. A scrubbing interval of the scrubbing operations to be performed on the storage system is adjusted according to the detected group of target extents. A scrubbing operation is performed on at least one part of the plurality of extents in the storage system according to the adjusted scrubbing interval, so as to identify a failed extent.
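
One way to picture the interval adjustment is the sketch below. The abstract does not specify the prediction model or the adjustment rule; the error-count threshold, the proportional scaling, and the names `predict_failing` and `adjust_interval` are all assumptions.

```python
def predict_failing(usage_states, error_threshold=5):
    """Return ids of extents whose (hypothetical) error count reaches
    the threshold; this stands in for the failure-detection step."""
    return [ext for ext, state in usage_states.items()
            if state["errors"] >= error_threshold]


def adjust_interval(base_hours, total_extents, failing_count, min_hours=1.0):
    """Shorten the scrubbing interval as the share of at-risk extents grows."""
    if total_extents == 0:
        return base_hours
    healthy_share = 1.0 - failing_count / total_extents
    return max(min_hours, base_hours * healthy_share)


usage = {
    "e0": {"errors": 0},
    "e1": {"errors": 7},
    "e2": {"errors": 1},
    "e3": {"errors": 9},
}
targets = predict_failing(usage)
interval = adjust_interval(24.0, len(usage), len(targets))
```

Here two of four extents are flagged, so the assumed rule halves a 24-hour interval to 12 hours.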

Database segment load balancer

Methods, systems, and computer programs encoded on computer storage media for implementing massively parallel processing (MPP) relational databases using containers. One example system initiates primary containers to implement database segment instances. Each segment of the database is stored on a respective storage volume. Each storage volume is mounted on a respective primary container. The system detects a failure of a first primary container that is a segment instance of a first database segment. In response to the detection, the system performs a recovery process. The system unmounts, from the failed first primary container, a first storage volume storing the first database segment. The system selects a standby container from a pool of standby containers and mounts the first storage volume on the selected standby container. The system is reconfigured so that it processes queries for the first database segment using the selected standby container instead of the failed first container.
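
The recovery sequence (unmount, select standby, mount, reroute) can be modeled in a few lines. This is an in-memory sketch only; the `Cluster` class, the `vol-<segment>` naming, and the first-in-pool selection policy are assumptions and do not correspond to any real container runtime API.

```python
class Cluster:
    def __init__(self, segments, standby_pool):
        # segment id -> container that currently answers queries for it
        self.serving = dict(segments)
        # volume id -> container the volume is mounted on (one per segment)
        self.mounts = {f"vol-{seg}": c for seg, c in segments.items()}
        self.standby_pool = list(standby_pool)

    def recover(self, segment):
        """Move a segment's volume from its failed primary to a standby."""
        if not self.standby_pool:
            raise RuntimeError("no standby container available")
        volume = f"vol-{segment}"
        failed = self.serving[segment]
        del self.mounts[volume]              # unmount from the failed primary
        standby = self.standby_pool.pop(0)   # assumed policy: take the first
        self.mounts[volume] = standby        # mount on the standby
        self.serving[segment] = standby      # route queries to the standby
        return failed, standby


cluster = Cluster({"seg1": "primary-1", "seg2": "primary-2"},
                  ["standby-a", "standby-b"])
failed, standby = cluster.recover("seg1")
```

Because the data lives on the volume rather than in the container, the sketch moves only the mount and the routing entry; no segment data is copied.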

Method, device, and computer program product for managing storage system

The present disclosure relates to a method, a device, and a computer program product for managing a storage system. The storage system includes a first control node, a second control node, and a persistent storage device, the first control node being in an activated state, and the second control node being in a state of transfer from a non-activated state to an activated state. A method includes: loading a first list of page descriptors of the storage system to the second control node to generate a second list of page descriptors at the second control node, the first list including a portion of multiple page descriptors of the storage system that has been modified but has not been flushed to the persistent storage device; receiving a synchronization message from the first control node that indicates that the first list has been modified by the first control node; and updating the second list at the second control node based on the synchronization message. Further, a corresponding device and a corresponding program product are provided. With the example implementations of the present disclosure, the start performance of the control nodes in the storage system can be improved.
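
A rough sketch of the load-then-synchronize flow follows. The abstract does not define the format of the synchronization message; the `op` values `"modify"` and `"flush"`, the dictionary layout, and both function names are hypothetical.

```python
def load_page_descriptors(first_list):
    """Build the second node's list as a copy of the first node's list of
    modified-but-unflushed page descriptors."""
    return dict(first_list)


def apply_sync_message(second_list, message):
    """Apply one change reported by the first control node."""
    op = message["op"]
    if op == "modify":
        # A descriptor was added or changed on the first node.
        second_list[message["page_id"]] = message["descriptor"]
    elif op == "flush":
        # The page reached persistent storage and leaves the dirty list.
        second_list.pop(message["page_id"], None)
    else:
        raise ValueError(f"unknown op: {op}")


first = {1: "pd-1-v1", 2: "pd-2-v1"}
second = load_page_descriptors(first)
apply_sync_message(second, {"op": "modify", "page_id": 3, "descriptor": "pd-3-v1"})
apply_sync_message(second, {"op": "flush", "page_id": 1})
```

The second node never has to rescan persistent storage in this model: it starts from a snapshot and then tracks only the deltas it is told about.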

SYSTEM AND METHOD FOR DISASTER RECOVERY OF CLOUD APPLICATIONS
20170308446 · 2017-10-26

Cloud computing is continuously growing as a business model for hosting information and communications technology applications. While the on-demand resource consumption and faster deployment time make this model appealing for the enterprise, other concerns arise regarding the quality of service offered by the cloud. Systems and methods are provided for enabling disaster recovery of applications hosted in the cloud and for monitoring data center sites for failure.

PRIORITIZED DATA REBUILDING IN A DISPERSED STORAGE NETWORK
20170300374 · 2017-10-19

A method begins with a processing module querying distributed storage network (DSN) storage units regarding storage errors associated with a data segment. The method continues with the processing module receiving query responses and depending on the responses, assigning a first threshold priority or a second threshold priority to encoded data slices (EDSs) associated with the data segment. The method proceeds with the processing module, depending on the assigned threshold priority, issuing read slice requests and rebuilding EDS associated with the data segment.

ACCELERATED RECOVERY IN DATA REPLICATION ENVIRONMENTS

A method for accelerating recovery in a data replication environment includes maintaining a secondary out-of-sync bitmap for a secondary volume. The secondary out-of-sync bitmap indicates which storage elements on the secondary volume are not synchronized with storage elements on a primary volume. The method further generates, for the primary volume, a tracking bitmap indicating which storage elements on the primary volume need to be updated with data from the secondary volume. This tracking bitmap is initialized with values from the secondary out-of-sync bitmap. Upon receiving a write from the secondary volume to a storage element on the primary volume, the method resets the corresponding bit in the tracking bitmap. Upon receiving a write from a host system to a storage element on the primary volume, the method also resets the corresponding bit in the tracking bitmap. A corresponding system and computer program product are also disclosed.
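
The bitmap handling described above can be sketched as follows. Representing each bitmap as a list of booleans and the names `PrimaryVolume`, `write_from_secondary`, `write_from_host`, and `pending` are illustrative assumptions.

```python
class PrimaryVolume:
    def __init__(self, secondary_out_of_sync):
        # The tracking bitmap starts as a copy of the secondary
        # out-of-sync bitmap: True means the storage element on the
        # primary still needs data from the secondary.
        self.tracking = list(secondary_out_of_sync)

    def write_from_secondary(self, element):
        # Data copied back from the secondary brings the element up to date.
        self.tracking[element] = False

    def write_from_host(self, element):
        # A fresh host write supersedes the secondary's copy, so the
        # element no longer needs to be copied back.
        self.tracking[element] = False

    def pending(self):
        """Indices of elements that still need data from the secondary."""
        return [i for i, bit in enumerate(self.tracking) if bit]


out_of_sync = [False, True, True, False, True]
primary = PrimaryVolume(out_of_sync)
primary.write_from_secondary(1)
primary.write_from_host(4)
remaining = primary.pending()
```

Only element 2 remains to be copied in this example, which is where the acceleration comes from: host writes shrink the copy-back work instead of being blocked by it.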

Prioritized data rebuilding in a dispersed storage network based on consistency requirements

A method begins with a processing module transmitting list slice requests to distributed storage network (DSN) storage units regarding storage errors associated with a data segment. The method continues with the processing module receiving list slice response messages and depending on the list slice response messages, determining whether a first threshold priority or a second threshold priority number of error-free EDSs associated with the first data segment has been stored. The method proceeds with the processing module, depending on the number of error-free EDSs associated with the first data segment that have been stored, issuing read slice requests and rebuilding one or more EDSs associated with the data segment.
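
The threshold-based prioritization might look like the sketch below. The abstract does not state how the two thresholds are compared with the count of error-free EDSs; the "at or below" comparisons, the numeric thresholds, and the function names are assumptions.

```python
def rebuild_priority(error_free_count, first_threshold, second_threshold):
    """Return 1 (most urgent), 2, or None when no rebuild is scheduled."""
    if error_free_count <= first_threshold:
        return 1
    if error_free_count <= second_threshold:
        return 2
    return None


def plan_rebuilds(segments, first_threshold, second_threshold):
    """Order segment ids by priority, then by fewest error-free slices."""
    queue = []
    for seg_id, count in segments.items():
        priority = rebuild_priority(count, first_threshold, second_threshold)
        if priority is not None:
            queue.append((priority, count, seg_id))
    return [seg_id for _, _, seg_id in sorted(queue)]


# Hypothetical counts of error-free slices per data segment (out of 16).
segments = {"s1": 16, "s2": 11, "s3": 14, "s4": 10}
order = plan_rebuilds(segments, first_threshold=11, second_threshold=14)
```

Segments closest to losing recoverability are rebuilt first, and fully healthy segments such as `s1` are left out of the queue.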

Systems and methods to organize a computing system having multiple computers
09823985 · 2017-11-21

A computing system having a plurality of computers connected via a computer network to form a computing entity. Each of the computers operates substantially independent of others. Each of the computers is configured to interrogate network infrastructure of the computer network to determine the identity of the computing entity when the computer is connected to the computer network and thus join the computing entity by announcing its presence in the computing entity. Each of the computers is configured to determine an identifier of the computer in the computing entity based on the connectivity configuration in the network infrastructure and assume a role to perform a portion of operations of a computing request directed to the computing entity over the computer network, based on the presence data of the computers in the entity.

Storage of key-value entries in a distributed storage system
11256717 · 2022-02-22

A distributed storage system, such as a distributed storage system in a virtualized computing environment, stores data in storage nodes as immutable key-value entries. A coordinator storage node creates a key-value entry and attempts to store the key-value entry in the coordinator storage node and in neighbor storage nodes. If the storage of the key-value entry in the coordinator storage node and in the neighbor storage nodes is successful, the coordinator storage node pushes the key-value entry to other storage nodes in the distributed storage system for storage as replicas.
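
The two-stage write can be sketched as follows. The `Node` class, the `available` flag used to simulate a failed store, and the `put` function are assumptions; the abstract does not describe what happens to partially stored entries when the first stage fails, so the sketch leaves them in place.

```python
class Node:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available  # hypothetical switch to simulate failure
        self.entries = {}

    def store(self, key, value):
        if not self.available:
            return False
        # Entries are immutable: an existing key is never overwritten.
        self.entries.setdefault(key, value)
        return True


def put(coordinator, neighbors, others, key, value):
    """Store on the coordinator and its neighbors first; push to the
    remaining nodes only if that first stage fully succeeds."""
    first_stage = [coordinator, *neighbors]
    if not all(node.store(key, value) for node in first_stage):
        return False  # no rollback here; the abstract is silent on it
    for node in others:
        node.store(key, value)
    return True


coord, n1, n2, far = Node("coord"), Node("n1"), Node("n2"), Node("far")
ok = put(coord, [n1, n2], [far], "k", "v")

down = Node("n3", available=False)
far2 = Node("far2")
failed = put(coord, [down], [far2], "k2", "v2")
```

In the second call the unavailable neighbor stops the write before any replica is pushed, so `far2` never sees the entry.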