Patent classifications
G06F11/2048
Fast single-master failover
Techniques for switching mastership from one service in a first data center to a second (redundant) service in a second data center are provided. A service coordinator in the first data center is notified about the master switch. The service coordinator notifies each instance of the first service that the first service is not a master. Each instance responds with an acknowledgement. After it is confirmed that all instances of the first service have responded with an acknowledgement, a client coordinator in the first and/or second data center is updated to indicate that the second service is the master so that clients may send requests to the second service. Also, a service coordinator in the second data center is notified that the second service is the master. The service coordinator notifies each instance of the second service that the second service is the master. Each instance responds with an acknowledgement.
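The ordering in this abstract (demote the old master, collect every acknowledgement, redirect clients, then promote the new master) can be illustrated with a minimal in-memory sketch. The names `ServiceInstance`, `ServiceCoordinator`, `ClientCoordinator`, and `switch_master` are hypothetical and chosen for illustration only; the abstract does not specify an API, and a real system would exchange these notifications over a network.

```python
from dataclasses import dataclass


@dataclass
class ServiceInstance:
    """One instance of a service; records whether it believes it is master."""
    name: str
    is_master: bool = False

    def set_master(self, is_master: bool) -> bool:
        self.is_master = is_master
        return True  # stands in for the acknowledgement message


@dataclass
class ServiceCoordinator:
    """Per-data-center coordinator that fans a role change out to instances."""
    instances: list

    def notify_all(self, is_master: bool) -> bool:
        # True only if every instance acknowledged the new role.
        acks = [inst.set_master(is_master) for inst in self.instances]
        return all(acks)


@dataclass
class ClientCoordinator:
    """Tells clients which service currently accepts requests."""
    master: str


def switch_master(old, new, clients, new_name):
    # Step 1: demote every instance of the old master and wait for all acks.
    if not old.notify_all(False):
        raise RuntimeError("demotion not acknowledged; aborting switch")
    # Step 2: only after full acknowledgement, redirect clients.
    clients.master = new_name
    # Step 3: promote every instance of the new service.
    if not new.notify_all(True):
        raise RuntimeError("promotion not acknowledged")
```

Redirecting clients only after every old instance has acknowledged its demotion is what keeps two services from acting as master at the same time in this sketch.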
Storage cluster failure detection
Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed, in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitoring of the first storage node indicates failure of the first storage node in response to performance of storage access operations by the second storage node that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitoring of the first storage node indicating failure of the first storage node.
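The two-stage check described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names and the dictionary inputs (`reachable`, `served_by`, `backup_of`) are hypothetical, and the cluster-failure condition is simplified to "any node confirmed failed by indirect monitoring".

```python
def node_failed(node, reachable, served_by, backup_of):
    """Return True when `node` should be treated as failed.

    reachable: dict node -> bool, result of direct connectivity checks.
    served_by: dict node -> node currently performing its storage operations.
    backup_of: dict node -> its designated backup node.
    """
    if reachable[node]:
        return False  # direct monitoring still sees the node
    # Indirect monitoring: the backup has taken over the node's operations.
    return served_by[node] == backup_of[node]


def should_switch_cluster(nodes, reachable, served_by, backup_of):
    # Assumed condition: at least one node is confirmed failed indirectly.
    return any(node_failed(n, reachable, served_by, backup_of) for n in nodes)
```

A lost connection alone does not trigger a switch here; the backup node must also be observed doing the first node's work, which separates a monitoring-path outage from a real node failure.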
OPERATING A DATA CENTER
In an approach, a primary data center is provided including primary source and primary target database systems, where a function is activated causing the primary target database system to: include a copy of data and receive analysis queries from the primary source database system; and execute the analysis queries on data. A processor, in response to detecting a failure in the primary source database system: offloads queries intended for the primary source database system to a secondary source database system of a secondary data center also including a secondary target database system and a copy of data, where the function is deactivated. A processor, responsive to the primary target database system being available: receives analysis queries, processed by the secondary source database system, of the offloaded queries; and copies data to the secondary target database system. A processor causes the function to be activated in the secondary data center.
METHODS AND SYSTEMS FOR A NON-DISRUPTIVE PLANNED FAILOVER FROM A PRIMARY COPY OF DATA AT A PRIMARY STORAGE SYSTEM TO A MIRROR COPY OF THE DATA AT A CROSS-SITE SECONDARY STORAGE SYSTEM WITHOUT USING AN EXTERNAL MEDIATOR
Systems and methods are described for a non-disruptive planned failover from a primary copy of data at a primary storage cluster to a mirror copy of the data at a cross-site secondary storage cluster without using an external mediator. According to an example, a planned failover feature of a multi-site distributed storage system provides an order of operations such that a primary copy of a first data center continues to serve I/O operations until a mirror copy of a second data center is ready. This planned failover feature improves functionality and efficiency of the distributed storage system by providing non-disruptiveness during planned failover without using an external mediator based on a primary storage cluster being selected as an authority to implement a state machine with a persistent configuration database to track a planned failover state for the planned failover.
Automated disaster recovery system and method
Methods and systems for recovering a host image of a client machine to a recovery machine comprise comparing a profile of a client machine of a first type to be recovered to a profile of a recovery machine of a second type different from the first type, to which the client machine is to be recovered, by a first processing device. The first and second profiles each comprise at least one property of the first type of client machine and the second type of recovery machine, respectively. At least one property of a host image of the client machine is conformed to at least one corresponding property of the recovery machine. The conformed host image is provided to the recovery machine, via a network. The recovery machine is configured with at least one conformed property of the host image by a second processing device of the recovery machine.
Byzantine agreement using communications having linear complexity
In some embodiments, a method receives a share of a signature of a decision block from at least a portion of a plurality of replicas. Each share of the signature is generated when a respective replica signs the decision block, and the decision block includes a set of requests from a client for a service. A combined signature is created based on the shares of the signature from at least the portion of the plurality of replicas. The method broadcasts a message that includes the combined signature to the plurality of replicas. The plurality of replicas use the combined signature to determine whether to process the decision block for the service.
Container image arrangement method and non-transitory computer-readable medium
A container image arrangement method is executed by a processor included in a computer that is connected to each of a plurality of nodes. The process includes: identifying a first node that has a first storage storing a container image and has the largest number of containers started from the container image among the plurality of nodes; determining whether the container operating in the first node is capable of starting in a second node among the plurality of nodes other than the first node, where the second node has a second storage storing the container image; and storing the container image in a third storage included in a third node different from each of the first node and the second node among the plurality of nodes when it is determined that the container is not capable of starting in the second node.
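The placement decision in this abstract can be sketched as a small function. The name `plan_image_copy`, the `nodes` dictionary layout, and the `can_start` callback are hypothetical; the abstract does not say how the third node is chosen, so this sketch simply assumes the first node that lacks the image.

```python
def plan_image_copy(nodes, can_start):
    """Pick a node that should receive a copy of the image, or None.

    nodes: dict name -> {"has_image": bool, "containers": int}.
    can_start: callable(node_name) -> bool, a hypothetical capacity check.
    """
    holders = [n for n, info in nodes.items() if info["has_image"]]
    if not holders:
        return None
    # First node: stores the image and runs the most containers from it.
    first = max(holders, key=lambda n: nodes[n]["containers"])
    # Second node(s): other holders that could take over the containers.
    others = [n for n in holders if n != first]
    if any(can_start(n) for n in others):
        return None  # a node with the image can already host the container
    # Third node (assumed choice): the first node that lacks the image.
    candidates = [n for n in nodes if n not in holders]
    return candidates[0] if candidates else None
```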
PLUG-IN BASED FRAMEWORK TO PROVIDE FAULT TOLERANCE AND HIGH AVAILABILITY IN DISTRIBUTED SYSTEMS
A plug-in based framework provides high availability (HA), including fault tolerance, in a distributed system, such as provided by a virtualized computing environment. The framework uses blueprints that define entities to be monitored, failure conditions, failover actions, restoration actions, and other aspects associated with HA. Microservices execute the blueprints, and a load balancer may balance the execution of the blueprints amongst microservices.
Application migration between environments
A data management and storage (DMS) cluster of peer DMS nodes manages migration of an application between a primary compute infrastructure and a secondary compute infrastructure. The secondary compute infrastructure may be a failover environment for the primary compute infrastructure. Primary snapshots of virtual machines of the application in the primary compute infrastructure are generated, and provided to the secondary compute infrastructure. During a failover, the primary snapshots are deployed in the secondary compute infrastructure as virtual machines. Secondary snapshots of the virtual machines are generated, where the secondary snapshots are incremental snapshots of the primary snapshots. In failback, the secondary snapshots are provided to the primary compute infrastructure, where they are combined with the primary snapshots to construct a current state of the application, and the application is deployed in the current state by deploying virtual machines on the primary compute infrastructure.
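The failback step, in which incremental secondary snapshots are combined with a primary snapshot, can be illustrated with a toy block map. This sketch assumes a snapshot is a dictionary from block identifier to contents and that an incremental snapshot marks a deleted block with `None`; both conventions, and the name `apply_incrementals`, are hypothetical and not taken from the abstract.

```python
def apply_incrementals(primary_snapshot, secondary_snapshots):
    """Rebuild the current state from a full snapshot plus ordered deltas."""
    state = dict(primary_snapshot)  # copy so the base snapshot is untouched
    for delta in secondary_snapshots:
        for block, data in delta.items():
            if data is None:
                state.pop(block, None)  # block removed since the base
            else:
                state[block] = data     # block added or overwritten
    return state
```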
Extending a database recovery point at a disaster recovery site
A DBA may pre-generate database recovery jobs on a convenient schedule at a local site, then recover a database at a disaster recovery site. Archive log files for the database that are generated in the interim between recovery job generation and recovery job execution are automatically incorporated into the recovery job when it executes, extending the recovery point closer to the time of the disruption that triggered the need or desire for recovery.
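How a pre-generated job might pick up interim archive logs at execution time can be sketched as below. The function name and the use of integer log sequence numbers are assumptions for illustration; the sketch also assumes that `job_logs` is non-empty and that logs must be applied contiguously, so it stops at the first gap.

```python
def extend_recovery_job(job_logs, available_logs):
    """Return the log sequence numbers to apply at execution time.

    job_logs: sequence numbers known when the job was generated.
    available_logs: sequence numbers present at the recovery site now.
    """
    extended = sorted(job_logs)
    last_planned = extended[-1]
    newer = sorted(s for s in set(available_logs) if s > last_planned)
    for seq in newer:
        if seq != extended[-1] + 1:
            break  # gap in the log chain: cannot recover past this point
        extended.append(seq)
    return extended
```

Each interim log appended this way moves the recovery point closer to the time of the disruption without regenerating the job.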