Patent classifications
G06F11/2069
FAULT TOLERANCE USING SHARED MEMORY ARCHITECTURE
Examples provide a fault tolerant virtual machine (VM) using pooled memory. When fault tolerance is enabled for a VM, a primary VM is created on a first host in a server cluster. A secondary VM is created on a second host in the server cluster. Memory for the VMs is maintained on a shared partition in pooled memory. The pooled memory is accessible to all hosts in the cluster. The primary VM has read and write access to the VM memory in the pooled memory. The secondary VM has read-only access to the VM memory. If the second host fails, a new secondary VM is created on another host in the cluster. If the first host fails, the secondary VM becomes the new primary VM and a new secondary VM is created on another host in the cluster.
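The role changes described in this abstract can be sketched as a small state model. This is only an illustration under stated assumptions: the class name `FaultTolerantVM`, the method `host_failed`, and the rule for choosing a spare host (first healthy host in sorted order) are hypothetical and do not come from the patent. VM memory is assumed to remain in the shared pool, so only the roles move.

```python
from dataclasses import dataclass


@dataclass
class FaultTolerantVM:
    """Hypothetical model of which hosts run the primary and secondary VM."""
    hosts: set       # healthy hosts in the cluster
    primary: str     # host whose VM has read/write access to the pooled memory
    secondary: str   # host whose VM has read-only access to the pooled memory

    def _pick_spare(self) -> str:
        # Assumption: take the first healthy host (sorted) that holds no role.
        spares = sorted(self.hosts - {self.primary, self.secondary})
        if not spares:
            raise RuntimeError("no spare host available for a new secondary VM")
        return spares[0]

    def host_failed(self, host: str) -> None:
        self.hosts.discard(host)
        if host == self.primary:
            # The secondary VM becomes the new primary, then a new
            # secondary VM is created on another host.
            self.primary = self.secondary
            self.secondary = self._pick_spare()
        elif host == self.secondary:
            # Only the secondary VM needs to be recreated.
            self.secondary = self._pick_spare()


vm = FaultTolerantVM(hosts={"h1", "h2", "h3", "h4"}, primary="h1", secondary="h2")
vm.host_failed("h2")   # secondary host fails
vm.host_failed("h1")   # primary host fails
```

Because the memory itself never moves in this model, a failover only changes which host holds the read/write role.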
MANAGING HEALTH CONDITIONS TO DETERMINE WHEN TO RESTART REPLICATION AFTER A SWAP TRIGGERED BY A STORAGE HEALTH EVENT
Provided are a computer program product, system, and method for managing health conditions to determine when to restart replication after a swap triggered by a storage health event. A determination is made of a health condition with respect to access to a first storage that triggers a swap operation. The swap operation redirects host Input/Output (I/O) requests to data from a first server to a second server in response to determining the health condition. After the swap operation the I/O requests are directed to the second server and a second storage. The second server is instructed to mirror data in the second storage to the first server to store in the first storage in response to determining that the health condition is resolved.
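One way to picture the swap and the later restart of replication is a two-field state record. The sketch below is a simplification under assumptions: `ReplicationPair`, `on_health_change`, and the string labels are hypothetical names, and the health condition is reduced to a single boolean.

```python
class ReplicationPair:
    """Hypothetical state for a first/second server pair."""

    def __init__(self):
        self.active = "first"           # server that receives host I/O
        self.mirror_target = "second"   # server receiving mirrored data, or None

    def on_health_change(self, first_storage_healthy: bool) -> None:
        if not first_storage_healthy and self.active == "first":
            # Swap: redirect host I/O to the second server and suspend mirroring.
            self.active = "second"
            self.mirror_target = None
        elif first_storage_healthy and self.active == "second" and self.mirror_target is None:
            # Health condition resolved: the second server mirrors back to the first.
            self.mirror_target = "first"


pair = ReplicationPair()
pair.on_health_change(first_storage_healthy=False)   # triggers the swap
pair.on_health_change(first_storage_healthy=True)    # restarts replication
```

Note that in this sketch host I/O stays on the second server after the condition resolves; only the mirroring direction is re-established.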
Proactive resource reservation for protecting virtual machines
A system for proactive resource reservation for protecting virtual machines. The system includes a cluster of hosts, wherein the cluster of hosts includes a master host, a first slave host, and one or more other slave hosts, and wherein the first slave host executes one or more virtual machines thereon. The first slave host is configured to identify a failure that impacts an ability of the one or more virtual machines to provide service, and calculate a list of impacted virtual machines. The master host is configured to receive a request to reserve resources on another host in the cluster of hosts to enable the impacted one or more virtual machines to failover, calculate a resource capacity among the cluster of hosts, determine whether the calculated resource capacity is sufficient to reserve the resources, and send an indication as to whether the resources are reserved.
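The master host's capacity check could look roughly like the following. This is a sketch, not the patented method: the function name `try_reserve`, the use of memory in MB as the only resource, and the largest-first, first-fit placement rule are all assumptions made for illustration.

```python
def try_reserve(free_mb, impacted_vms_mb, failed_host):
    """Return (reserved, placement) for the impacted VMs.

    free_mb:          host name -> free memory in MB (hypothetical metric)
    impacted_vms_mb:  VM name -> memory the VM needs in MB
    failed_host:      host reporting the failure; excluded as a target
    """
    remaining = {h: c for h, c in free_mb.items() if h != failed_host}
    placement = {}
    # Place the largest VMs first so they are not crowded out by small ones.
    for vm, need in sorted(impacted_vms_mb.items(), key=lambda kv: -kv[1]):
        target = next((h for h in sorted(remaining) if remaining[h] >= need), None)
        if target is None:
            return False, {}   # capacity is insufficient; nothing is reserved
        remaining[target] -= need
        placement[vm] = target
    return True, placement


reserved, placement = try_reserve(
    free_mb={"a": 100, "b": 4096, "c": 2048},
    impacted_vms_mb={"vm1": 2048, "vm2": 3000},
    failed_host="a",
)
```

The boolean result stands in for the indication the master host sends back to the requesting slave host.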
SYSTEMS AND METHODS FOR SUPPORT LOG CACHE DEVICE REMOVAL THROUGH STANDARD USER INTERFACES
Aspects of the present disclosure involve systems and methods for removing and/or adding log and/or cache devices to storage pools of a storage appliance. Users, via a graphical-user interface, identify the log and/or cache devices for removal or addition. Subsequently, the log and/or cache devices are moved, according to a data profile corresponding to the devices, from a first storage appliance to a second storage appliance.
LIFECYCLE MANAGEMENT OF VIRTUAL INFRASTRUCTURE MANAGEMENT SERVER APPLIANCE
A method of upgrading a VIM server appliance includes: creating a snapshot of logical volumes mapped to physical volumes that store configuration and database files of virtual infrastructure management (VIM) services provided by a first VIM server appliance to be upgraded; after the snapshot is created, expanding the configuration and database files to be compatible with a second VIM server appliance; replicating the logical volumes which have been modified as a result of expanding the configuration and database files, in the second VIM server appliance; after replication, performing a switchover of VIM services that are provided, from the first VIM server appliance to the second VIM server appliance; and upon failure of any of the steps of expanding, replicating, and performing the switchover, aborting the upgrade, and reverting to a version of the configuration and database files that was preserved by creating the snapshot.
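The snapshot, modify, and revert-on-failure sequence follows a familiar pattern that can be sketched in a few lines. Everything named here (`upgrade_with_rollback`, the step functions, and the dictionary standing in for the logical volumes) is hypothetical; a real appliance would operate on volume snapshots rather than in-memory copies.

```python
import copy


def upgrade_with_rollback(volumes, steps):
    """Run the upgrade steps in order; revert to the snapshot if any step fails."""
    snapshot = copy.deepcopy(volumes)   # stands in for the logical-volume snapshot
    try:
        for step in steps:
            step(volumes)
    except Exception:
        # Abort the upgrade and restore the preserved version.
        volumes.clear()
        volumes.update(snapshot)
        return False
    return True


def expand(volumes):
    volumes["schema"] = "v2"            # make files compatible with the new appliance


def replicate(volumes):
    volumes["replicated"] = True        # copy the modified volumes to the new appliance


def failing_switchover(volumes):
    raise RuntimeError("switchover failed")


state = {"schema": "v1"}
succeeded = upgrade_with_rollback(state, [expand, replicate, failing_switchover])
```

After the failed switchover in this usage, `state` holds the pre-upgrade contents again.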
TECHNIQUES FOR AVOIDING AND REDUCING DATA UNAVAILABILITY
A clustered pair of storage systems configured for active-active bidirectional synchronous replication exposes a stretched volume over paths to both storage systems. Writes to the stretched volume received at each system are replicated to the peer system. The cluster can use a time-to-live (TTL) mechanism by which a non-preferred system continuously requests a TTL grant from the preferred system to remain in the cluster. Algorithms that reduce or avoid data unavailability are described and can include assessing the health of the systems in the cluster. An unhealthy system can trigger a one-sided polarization algorithm to notify the peer system that it is the polarization winner. An improved polarization technique using a witness to decide the polarization winner includes a system adding a time delay before contacting the witness if the system is unhealthy. A control component can detect an unhealthy system and disable the active-active bidirectional synchronous replication.
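The effect of the added delay can be illustrated with a toy model. It assumes, which the abstract does not state, that the witness names whichever system contacts it first as the winner; the function name `polarization_winner`, the 5-second default delay, and the alphabetical tie-break are likewise hypothetical.

```python
def polarization_winner(contact_ready_at, healthy, unhealthy_delay=5.0):
    """Pick the system whose request would reach the witness first.

    contact_ready_at: system name -> time (seconds) it is ready to contact the witness
    healthy:          system name -> True if the system considers itself healthy
    unhealthy_delay:  extra wait an unhealthy system adds (assumed value)
    """
    arrival = {
        name: ready + (0.0 if healthy[name] else unhealthy_delay)
        for name, ready in contact_ready_at.items()
    }
    # Assumption: first arrival wins; ties go to the alphabetically first name.
    return min(sorted(arrival), key=arrival.get)


winner = polarization_winner(
    contact_ready_at={"A": 1.0, "B": 1.2},
    healthy={"A": False, "B": True},
)
```

In this usage system A is ready sooner, but its self-imposed delay lets the healthy system B reach the witness first.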
OPTIMIZED RECOVERY IN DATA REPLICATION ENVIRONMENTS
A method for optimizing recovery in a data replication environment is disclosed. In one embodiment, such a method includes directing I/O from a primary site to a secondary site in response to a failure at the primary site. After the primary site has recovered from the failure, the method initiates a recovery process wherein updated data elements at the secondary site are copied to the primary site. The method determines a recorded average I/O latency for a host system driving I/O to the secondary site, and calculates an expected average I/O latency for the host system driving I/O to the primary site. The method redirects I/O from the secondary site to the primary site when a difference between the expected average I/O latency and the recorded average I/O latency reaches a threshold value. A corresponding system and computer program product are also disclosed.
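The failback decision reduces to a comparison of two averages. The sketch below assumes a sign convention the abstract leaves open: I/O is redirected once the recorded secondary-site average exceeds the expected primary-site average by at least the threshold. The function name and the millisecond units are hypothetical.

```python
from statistics import mean


def should_redirect_to_primary(recorded_secondary_ms, expected_primary_ms, threshold_ms):
    """Return True when the expected latency gain reaches the threshold.

    recorded_secondary_ms: latency samples observed while driving I/O to the secondary site
    expected_primary_ms:   calculated average latency if I/O went to the primary site
    threshold_ms:          minimum gain that justifies redirecting (assumed convention)
    """
    recorded_avg = mean(recorded_secondary_ms)
    return recorded_avg - expected_primary_ms >= threshold_ms


redirect = should_redirect_to_primary([10, 12, 14], expected_primary_ms=7, threshold_ms=5)
```

How the expected primary-site latency is calculated during recovery is not specified in the abstract, so it is passed in as an input here.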
METHODS AND SYSTEMS FOR A NON-DISRUPTIVE PLANNED FAILOVER FROM A PRIMARY COPY OF DATA AT A PRIMARY STORAGE SYSTEM TO A MIRROR COPY OF THE DATA AT A CROSS-SITE SECONDARY STORAGE SYSTEM WITHOUT USING AN EXTERNAL MEDIATOR
Systems and methods are described for a non-disruptive planned failover from a primary copy of data at a primary storage cluster to a mirror copy of the data at a cross-site secondary storage cluster without using an external mediator. According to an example, a planned failover feature of a multi-site distributed storage system provides an order of operations such that a primary copy of a first data center continues to serve I/O operations until a mirror copy of a second data center is ready. This planned failover feature improves functionality and efficiency of the distributed storage system by providing non-disruptiveness during planned failover without using an external mediator based on a primary storage cluster being selected as an authority to implement a state machine with a persistent configuration database to track a planned failover state for the planned failover.
Switching between fault response models in a storage system
Switching between mediation models within a storage system, where the switching between mediation models includes: determining, among one or more of a plurality of storage systems, a change in availability of a mediator service, wherein one or more of the plurality of storage systems are configured to request mediation from the mediator service in response to a fault; and communicating, among the plurality of storage systems and responsive to determining the change in availability of the mediator service, a fault response model to be used as an alternate to the mediator service among one or more of the plurality of storage systems.
METHODS FOR FLEXIBLE DATA-MIRRORING TO IMPROVE STORAGE PERFORMANCE DURING MOBILITY EVENTS AND DEVICES THEREOF
A method, device, and non-transitory computer readable medium for mirroring data, comprising selecting, based on a plurality of data attributes, a portion of local data in a local storage device for mirroring to a remote storage device and copying the selected portion of the local data to at least one cache memory of the remote storage device. Next, a determination of when a failover event has occurred in the local storage device is made, wherein the failover event comprises an event in which the local data in the local storage device is inaccessible to a client computing device when the client computing device attempts to access the local data from the local storage device. A copy of the local data from the cache memory in the remote storage device is retrieved when the failover event is determined to have occurred.
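Selective mirroring and the failover read path can be sketched as two small functions. The data attribute used here (an access count) and the names `select_for_mirroring` and `read_block` are assumptions chosen for illustration; the abstract only says that selection is based on a plurality of data attributes.

```python
def select_for_mirroring(local_blocks, min_access_count):
    """Copy only frequently accessed blocks into the remote cache (assumed attribute)."""
    return {
        name: block["data"]
        for name, block in local_blocks.items()
        if block["access_count"] >= min_access_count
    }


def read_block(name, local_blocks, remote_cache, local_available):
    """Serve from local storage normally; fall back to the remote cache on failover."""
    if local_available:
        return local_blocks[name]["data"]
    return remote_cache.get(name)   # None if the block was never mirrored


local = {
    "hot": {"data": b"abc", "access_count": 50},
    "cold": {"data": b"xyz", "access_count": 1},
}
cache = select_for_mirroring(local, min_access_count=10)
after_failover = read_block("hot", local, cache, local_available=False)
```

Blocks that were not selected are unavailable from the cache during failover in this model, which is the trade-off of mirroring only a portion of the data.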