Patent classifications
G06F11/2046
Transparently migrating a storage object between nodes in a clustered storage system
A storage object is migrated between nodes by a source node automatically verifying that another node is configured to service the storage object and changing ownership of the storage object based on the verifying. A cluster manager for the clustered storage system receives a migration request and provides it to the source node, which owns the storage object. The source verifies that the destination is configured according to a predetermined configuration for servicing the storage object. Based on the verifying, the source offlines the storage object and updates ownership information of the storage object, thereafter allowing the destination to online the storage object. The cluster manager further provides the updated ownership information to all the nodes in the cluster, so an access request intended for the storage object may be received by any node and forwarded to the destination using the updated ownership information, effecting a transparent migration.
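For illustration, here is a minimal Python sketch of this migration flow, assuming an in-memory cluster model; the class and method names (Node, ClusterManager, migrate, route) are hypothetical and not taken from the patent.

```python
# Minimal sketch, assuming an in-memory cluster model; all names here
# (Node, ClusterManager, migrate, route) are hypothetical.

class Node:
    def __init__(self, name, configured_objects=None):
        self.name = name
        self.configured_objects = set(configured_objects or [])  # predetermined configuration
        self.online_objects = set()

    def is_configured_for(self, obj):
        return obj in self.configured_objects

    def offline(self, obj):
        self.online_objects.discard(obj)

    def online(self, obj):
        self.online_objects.add(obj)


class ClusterManager:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.ownership = {}  # storage object -> name of owning node

    def migrate(self, obj, src_name, dst_name):
        src, dst = self.nodes[src_name], self.nodes[dst_name]
        # The source verifies the destination before giving up ownership.
        if not dst.is_configured_for(obj):
            raise RuntimeError(f"{dst_name} is not configured to service {obj}")
        src.offline(obj)
        self.ownership[obj] = dst_name      # updated ownership information
        dst.online(obj)
        # The updated ownership table is shared with every node so that any
        # node can forward client requests to the new owner.

    def route(self, obj):
        return self.ownership[obj]
```

After migrate() completes, an access request arriving at any node can be forwarded using route(), so the client never needs to know that ownership changed.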
Distributed workload reassignment following communication failure
A generation identifier is employed by various systems and methods to identify situations where a workload has been reassigned to a new node while an old node is still processing that workload after a failure between nodes. A master node may assign a workload to a worker node. The worker node sends a request to access target data. The request may be associated with a generation identifier and a workload identifier that identify the node and the workload. At some point, a failure occurs between the master node and the worker node. The master node reassigns the workload to another worker node. The new worker node accesses the target data with a different generation identifier, indicating to the storage system that the workload has been reassigned. The old worker node receives an indication from the storage system that the workload has been reassigned and stops processing the workload.
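As a rough illustration, the sketch below implements generation-based fencing in Python with an in-memory stand-in for the storage system; the names and the specific comparison rule are assumptions, not details from the patent.

```python
# Minimal sketch of generation-based fencing; the storage service is an
# in-memory stand-in and all identifiers are illustrative.

class StorageSystem:
    def __init__(self):
        self.latest_generation = {}  # workload_id -> newest generation seen

    def access(self, workload_id, generation):
        latest = self.latest_generation.get(workload_id, -1)
        if generation < latest:
            return "REASSIGNED"       # a newer assignment exists elsewhere
        self.latest_generation[workload_id] = generation
        return "OK"


class WorkerNode:
    def __init__(self, storage):
        self.storage = storage
        self.active = {}              # workload_id -> generation

    def start(self, workload_id, generation):
        self.active[workload_id] = generation

    def process(self, workload_id):
        if self.storage.access(workload_id, self.active[workload_id]) == "REASSIGNED":
            del self.active[workload_id]   # stop processing the stale workload
            return False
        return True


class MasterNode:
    def __init__(self):
        self.generation = 0

    def assign(self, workload_id, worker):
        self.generation += 1               # every (re)assignment gets a fresh generation
        worker.start(workload_id, self.generation)
```

In a failure scenario, the master assigns a workload to worker A with generation 1, loses contact with A, and reassigns the workload to worker B with generation 2; A's next access returns the reassignment indication and A drops the workload.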
Storage cluster failure detection
Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of nodes is a backup node for the first storage node. The indirect monitoring of the first storage node indicates failure of the first storage node in response to performance, by the second storage node, of storage access operations that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitoring of the first storage node indicating failure of the first storage node.
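A minimal Python sketch of how the two monitoring stages might compose, assuming simple dictionary-based node records; the field names and the single-node failure threshold are illustrative, not from the patent.

```python
# Minimal sketch; node records are plain dictionaries whose "volumes" and
# "serving" fields are sets of volume names, and the single-node failure
# threshold is an assumed cluster-failure condition.

def direct_monitor(node):
    """Direct monitoring: is connectivity with the node intact?"""
    return node["reachable"]

def indirect_monitor(node, backup_node):
    """Indirect monitoring: infer failure of `node` from its backup now
    performing the storage access operations `node` used to perform."""
    return node["volumes"] <= backup_node["serving"]

def check_primary_cluster(primary_nodes, backups):
    failed = []
    for node in primary_nodes:
        if direct_monitor(node):
            continue                              # connectivity fine, nothing to infer
        if indirect_monitor(node, backups[node["name"]]):
            failed.append(node["name"])           # confirmed via the backup node
    # Cluster-failure condition (assumed): at least one confirmed failure
    # triggers the switch from the primary cluster to the backup cluster.
    return len(failed) >= 1, failed
```

For example, check_primary_cluster([{"name": "n1", "reachable": False, "volumes": {"v1"}}], {"n1": {"serving": {"v1"}}}) returns (True, ["n1"]), i.e. a cluster switch would be initiated.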
HIGH RELIABILITY FAULT TOLERANT COMPUTER ARCHITECTURE
A fault tolerant computer system and method are disclosed. The system may include a plurality of CPU nodes, each including a processor and a memory; at least two IO domains, wherein at least one of the IO domains is designated an active IO domain performing communication functions for the active CPU nodes; and a switching fabric connecting each CPU node to each IO domain. One CPU node is designated a standby CPU node and the remainder are designated as active CPU nodes. If a failure, a beginning of a failure, or a predicted failure occurs in an active CPU node, the state and memory of that node are transferred to the standby CPU node, which becomes the new active CPU node. If a failure occurs in an active IO domain, the communication functions performed by the failing IO domain are transferred to the other IO domain.
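For concreteness, the following Python sketch mirrors the two failover paths (CPU node and IO domain); the classes and fields are assumptions made for illustration.

```python
# Minimal sketch of the two failover paths; classes and fields are
# assumptions made for illustration.

class CPUNode:
    def __init__(self, name, role):
        self.name, self.role = name, role   # role: "active" or "standby"
        self.state, self.memory = {}, {}

def fail_over_cpu(failing, standby):
    # On a failure, beginning failure, or predicted failure, the active
    # node's state and memory move to the standby, which becomes active.
    standby.state = dict(failing.state)
    standby.memory = dict(failing.memory)
    standby.role = "active"
    failing.role = "failed"
    return standby

def fail_over_io(io_domains, failing_name):
    # Communication functions of the failing active IO domain are handed
    # to the other IO domain.
    failing = next(d for d in io_domains if d["name"] == failing_name)
    survivor = next(d for d in io_domains if d["name"] != failing_name)
    survivor["functions"].extend(failing["functions"])
    failing["functions"] = []
    return survivor
```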
DISTRIBUTED STORAGE SYSTEM
A distributed storage system includes a plurality of host servers, including a primary compute node and backup compute nodes, for processing first data having a first identifier, and a plurality of storage nodes that communicate with the plurality of compute nodes and include a plurality of storage volumes. The plurality of storage volumes include a primary storage volume and backup storage volumes for storing the first data. The primary compute node provides a replication request for the first data to a primary storage node providing the primary storage volume when a write request for the first data is received. The primary storage node stores, based on the replication request, the first data in the primary storage volume, copies the first data to the backup storage volumes, and provides, to the primary compute node, a completion acknowledgement to the replication request.
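As an illustration only, here is a minimal synchronous version of that write path in Python; the class names and the copy-before-acknowledge ordering are assumptions about the described flow.

```python
# Minimal synchronous sketch of the write path; class names and the
# copy-before-acknowledge ordering are assumptions.

class StorageNode:
    def __init__(self):
        self.volume = {}                  # identifier -> data

class PrimaryStorageNode(StorageNode):
    def __init__(self, backup_nodes):
        super().__init__()
        self.backup_nodes = backup_nodes  # nodes providing the backup volumes

    def replicate(self, identifier, data):
        # Store in the primary storage volume, copy to every backup volume,
        # then acknowledge completion to the primary compute node.
        self.volume[identifier] = data
        for backup in self.backup_nodes:
            backup.volume[identifier] = data
        return "ACK"

class PrimaryComputeNode:
    def __init__(self, primary_storage_node):
        self.primary_storage_node = primary_storage_node

    def write(self, identifier, data):
        # A write request for the data triggers a replication request to
        # the primary storage node; its ack completes the write.
        return self.primary_storage_node.replicate(identifier, data)
```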
Transparent checkpointing and process migration in a distributed system
A distributed system is described for creating a checkpoint for a plurality of processes running on the distributed system. The distributed system includes a plurality of compute nodes with an operating system executing on each compute node. A checkpoint library resides at the user level on each of the compute nodes, and the checkpoint library is transparent to the operating system residing on the same compute node and to the other compute nodes. Each checkpoint library uses a windowed message logging protocol for checkpointing of the distributed system. Processes participating in a distributed computation on the distributed system may be migrated from one compute node to another compute node in the distributed system by re-mapping hardware addresses using the checkpoint library.
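A minimal sketch of what such a user-level library might track, assuming an in-memory message window and a logical-to-hardware address map; none of these structures are specified by the patent.

```python
# Minimal sketch of what a user-level checkpoint library might track;
# the window, log, and address map are illustrative structures.

import copy

class CheckpointLibrary:
    def __init__(self, window_size=64):
        self.window_size = window_size
        self.message_log = []     # bounded window of in-flight messages
        self.address_map = {}     # logical process id -> hardware address

    def log_message(self, message):
        self.message_log.append(message)
        if len(self.message_log) > self.window_size:
            self.message_log.pop(0)          # keep only the logging window

    def checkpoint(self, process_state):
        # Snapshot the process state together with the message window so
        # undelivered messages can be replayed after a restart.
        return {"state": copy.deepcopy(process_state),
                "log": list(self.message_log)}

    def migrate(self, process_id, new_hw_address):
        # Migration re-maps the logical process id to the target node's
        # hardware address; peers keep addressing the logical id.
        self.address_map[process_id] = new_hw_address
```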
Resource arbitration for shared-write access via persistent reservation
Described is a technology by which an owner node in a server cluster maintains ownership of a storage mechanism through a persistent reservation mechanism, while allowing non-owning nodes read and write access to the storage mechanism. An owner node writes a reservation key to a registration table associated with the storage mechanism. Non-owning nodes write a shared key that gives them read and write access. The owner node validates the shared keys against cluster membership data, and preempts (e.g., removes) any key deemed not valid. The owner node also defends ownership against challenges to ownership made by other nodes, so that another node can take over ownership if a (formerly) owning node is unable to defend, e.g., because of a failure.
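To make the key handling concrete, here is a small Python sketch loosely modelled on SCSI-3 persistent reservations; the registration-table API and the defend behaviour are illustrative assumptions.

```python
# Minimal sketch loosely modelled on SCSI-3 persistent reservations; the
# registration table and defend behaviour are illustrative assumptions.

class StorageMechanism:
    def __init__(self):
        self.registrations = set()   # registration table of keys
        self.reservation = None      # reservation key held by the owner

class OwnerNode:
    def __init__(self, key, cluster_membership):
        self.key = key
        self.membership = cluster_membership   # keys of current cluster members
        self.alive = True

    def take_ownership(self, storage):
        storage.registrations.add(self.key)
        storage.reservation = self.key         # persistent reservation

    def validate_and_preempt(self, storage):
        # Non-owning nodes register shared keys for read/write access; any
        # key not backed by current cluster membership is preempted.
        for key in list(storage.registrations):
            if key != self.key and key not in self.membership:
                storage.registrations.discard(key)

    def defend(self, storage, challenger_key):
        # The owner periodically re-asserts its reservation; a challenger
        # only wins if the owner has failed and can no longer defend.
        if self.alive:
            storage.reservation = self.key
            return False                       # challenge rejected
        storage.reservation = challenger_key
        return True                            # ownership taken over
```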
System and method for assigning memory reserved for high availability failover to virtual machines
Techniques for assigning memory reserved for high availability (HA) failover to virtual machines in HA-enabled clusters are described. In one embodiment, the memory reserved for HA failover is determined in each host computing system of the HA cluster. Further, the memory reserved for HA failover is assigned to one or more virtual machines in the HA cluster as input/output (I/O) cache memory at a first level.
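For illustration, a short Python sketch of that assignment, assuming the reserved memory is split evenly among a host's VMs; the 25% reservation fraction and the even split are assumptions, not any product's admission-control rule.

```python
# Minimal sketch; the 25% reservation fraction and the even split across
# a host's VMs are assumptions, not any product's admission-control rule.

def ha_reserved_memory(host_total_mb, reservation_fraction=0.25):
    # Memory set aside on each host for HA failover capacity.
    return int(host_total_mb * reservation_fraction)

def assign_io_cache(hosts, vms):
    """Assign each host's HA-reserved memory to its VMs as first-level
    I/O cache, split evenly among the VMs running on that host."""
    cache_mb = {}
    for host in hosts:
        reserved = ha_reserved_memory(host["total_mb"])
        host_vms = [vm for vm in vms if vm["host"] == host["name"]]
        if not host_vms:
            continue
        share = reserved // len(host_vms)
        for vm in host_vms:
            cache_mb[vm["name"]] = share
    return cache_mb
```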
REMOTE DIRECT MEMORY ACCESS (RDMA)-BASED RECOVERY OF DIRTY DATA IN REMOTE MEMORY
Techniques for implementing RDMA-based recovery of dirty data in remote memory are provided. In one set of embodiments, upon occurrence of a failure at a first (i.e., source) host system, a second (i.e., failover) host system can allocate a new memory region corresponding to a memory region of the source host system and retrieve a baseline copy of the memory region from a storage backend shared by the source and failover host systems. The failover host system can further populate the new memory region with the baseline copy and retrieve one or more dirty page lists for the memory region from the source host system via RDMA, where the one or more dirty page lists identify memory pages in the memory region that include data updates not present in the baseline copy. For each memory page identified in the one or more dirty page lists, the failover host system can then copy the content of that memory page from the memory region of the source host system to the new memory region via RDMA.
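The recovery flow can be sketched roughly as below; rdma_read() is a placeholder for a real one-sided RDMA read verb, and the data layout is assumed for illustration.

```python
# Minimal sketch of the recovery flow; rdma_read() stands in for a real
# one-sided RDMA read verb and the data layout is assumed.

PAGE_SIZE = 4096

def rdma_read(source_host, region, offset, length):
    # Placeholder for an RDMA read of the source host's memory region.
    return source_host["memory"][region][offset:offset + length]

def recover_region(source_host, storage_backend, region):
    # 1. Allocate a new region and seed it with the baseline copy held by
    #    the shared storage backend.
    new_region = bytearray(storage_backend[region])
    # 2. Retrieve the dirty page list for the region from the source host
    #    (in a real system this would also go over RDMA).
    dirty_pages = source_host["dirty_lists"][region]
    # 3. Copy only the dirty pages from source memory into the new region.
    for page_no in dirty_pages:
        offset = page_no * PAGE_SIZE
        new_region[offset:offset + PAGE_SIZE] = rdma_read(
            source_host, region, offset, PAGE_SIZE)
    return bytes(new_region)
```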
VIRTUALIZED FILE SERVER DISASTER RECOVERY
In one embodiment, a system for managing a virtualization environment includes a set of host machines, each of which includes a hypervisor, virtual machines, and a virtual machine controller, and a virtualized file server (VFS) backup system configured to identify backup data, wherein the backup data comprises data stored on virtual disks of the VFS and VFS configuration information, and the backup data is identified in accordance with a backup policy; send the backup data to one or more remote sites for storage; and, in response to detection of changes in the backup data, send the changes to the remote sites in accordance with a replication policy. The backup data may be identified based on a protection domain associated with the backup policy. The data stored on the VFS may include one or more storage objects. The storage objects may include shares, groups of shares, files, or directories.
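A rough Python sketch of the identify/backup/replicate steps; the policy fields (protection_domain, enabled) and the remote-site structures are purely assumptions.

```python
# Minimal sketch; the policy fields (protection_domain, enabled) and the
# remote-site structures are assumptions for illustration.

def identify_backup_data(storage_objects, vfs_config, backup_policy):
    # Backup data = storage objects in the policy's protection domain plus
    # the VFS configuration information.
    domain = backup_policy["protection_domain"]
    objects = [o for o in storage_objects if o["domain"] == domain]
    return {"objects": objects, "vfs_config": vfs_config}

def send_full_backup(backup_data, remote_sites):
    for site in remote_sites:
        site.setdefault("stored", []).append(backup_data)

def replicate_changes(changes, remote_sites, replication_policy):
    # Only detected changes are shipped, subject to the replication policy.
    if changes and replication_policy.get("enabled", True):
        for site in remote_sites:
            site.setdefault("changes", []).append(changes)
```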