G06F11/2046

METHODS AND APPARATUS TO INCREASE RESILIENCY IN SELF-HEALING MECHANISMS

Methods, apparatus, systems, and articles of manufacture are disclosed to increase resiliency in self-healing mechanisms. At least one non-transitory machine-readable medium comprises instructions that, when executed, cause at least one processor to at least partition computational resources of a first host into a primary partition and a shadow partition, the primary partition to communicate with a second host; apply a fix for the primary partition; determine if the primary partition can communicate with the second host during the application of the fix; cause, in response to a determination that the primary partition cannot communicate with the second host during the application of the fix, the shadow partition to communicate with the second host; and transfer communication with the second host from the shadow partition to the primary partition, the transfer being performed in response to a determination that the application of the fix is complete.
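The failover and failback sequence in this abstract can be illustrated with a small sketch. It assumes a single host object whose `active_partition` field records which partition talks to the second host, and it assumes the caller already knows whether the primary partition stays reachable during the fix. `FirstHost`, `send_to_second_host`, and `apply_fix` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FirstHost:
    """Hypothetical model of a host split into a primary and a shadow partition."""
    active_partition: str = "primary"  # partition currently talking to the second host
    events: List[str] = field(default_factory=list)

    def send_to_second_host(self, message: str) -> None:
        # Whichever partition is active carries traffic to the second host.
        self.events.append(f"{self.active_partition}: {message}")

    def apply_fix(self, fix: Callable[["FirstHost"], None],
                  primary_can_communicate: bool) -> None:
        # Assumption: whether the primary partition stays reachable during the
        # fix is known up front (e.g. a fix that restarts its network stack is not).
        if not primary_can_communicate:
            self.active_partition = "shadow"
            self.events.append("failover: shadow partition takes over")
        fix(self)  # apply the fix; traffic sent meanwhile goes via the active partition
        # The fix is complete, so communication is transferred back to the primary.
        if self.active_partition == "shadow":
            self.active_partition = "primary"
            self.events.append("failback: primary partition resumes")


host = FirstHost()
host.apply_fix(lambda h: h.send_to_second_host("heartbeat"),
               primary_can_communicate=False)
print(host.events)
```

In this sketch, the heartbeat sent while the fix runs is attributed to the shadow partition, and the primary partition is active again once `apply_fix` returns.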

Information processing system and information processing apparatus
11221926 · 2022-01-11

An information processing system includes a plurality of information processing apparatuses each of which includes hardware, a control processor, and a switch circuit wherein when a failure of a first control processor in a first information processing apparatus of the plurality of information processing apparatuses is detected, a first switch circuit in the first information processing apparatus is configured to generate a connection of first hardware in the first information processing apparatus to a signal line between the first information processing apparatus and a second information processing apparatus of the plurality of information processing apparatuses, a second switch circuit in the second information processing apparatus is configured to generate a connection of a second control processor in the second information processing apparatus to the signal line, and the second control processor is configured to acquire information transmitted from the first hardware via the signal line.

VIRTUALIZED FILE SERVER

In one embodiment, a system for managing communication connections in a virtualization environment includes a plurality of host machines implementing a virtualization environment, wherein each of the host machines includes a hypervisor, at least one user virtual machine (user VM), and a distributed file server that includes file server virtual machines (FSVMs) and associated local storage devices. Each FSVM and its associated local storage device are local to a corresponding one of the host machines, and the FSVMs conduct I/O transactions with their associated local storage devices based on I/O requests received from the user VMs. Each of the user VMs on each host machine sends each of its respective I/O requests to an FSVM that is selected, by one or more of the FSVMs, for each I/O request based on a lookup table that maps a storage item referenced by the I/O request to the selected one of the FSVMs.
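The lookup-table routing described above can be sketched as a dictionary from storage items to FSVM identifiers. This assumes storage items are identified by share paths and that the table is consulted per request; `IoRequest`, `FsvmLookupTable`, `select_fsvm`, and the share and FSVM names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IoRequest:
    storage_item: str  # e.g. a share or file path referenced by the request
    operation: str     # "read" or "write"


class FsvmLookupTable:
    """Hypothetical table, shared by all FSVMs, mapping storage items to owner FSVMs."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._owner = dict(mapping)

    def select_fsvm(self, request: IoRequest) -> str:
        # Any FSVM can consult the table; unknown items raise KeyError.
        return self._owner[request.storage_item]


table = FsvmLookupTable({"/shares/finance": "fsvm-1", "/shares/eng": "fsvm-2"})
req = IoRequest("/shares/eng", "read")
print(table.select_fsvm(req))
```

A real deployment would also need to keep the table consistent as storage items move between FSVMs; the sketch treats it as static.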

FAILOVER FOR POOLED MEMORY

An embodiment of an electronic apparatus may comprise one or more substrates, and a controller coupled to the one or more substrates, the controller to allocate a first secure portion of a pooled memory to a first instantiation of an application on a first node, and circuitry coupled to the one or more substrates and the controller, the circuitry to provide a failover interface for a second instantiation of the application on a second node to access the first secure portion of the pooled memory in the event of a failure of the first node. Other embodiments are disclosed and claimed.
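A software sketch of the failover interface might look like the following. It assumes a region is "secure" in the sense that only the owning application instantiation on the owning node may access it, and that a second instantiation of the same application may take ownership only after the first node is marked failed. `PooledMemory`, `allocate`, `access`, `mark_failed`, and `failover` are hypothetical names; the patent describes hardware circuitry, not this API.

```python
from typing import Dict, Set


class PooledMemory:
    """Hypothetical pooled-memory controller with per-application secure regions."""

    def __init__(self) -> None:
        self._regions: Dict[int, dict] = {}
        self._failed_nodes: Set[str] = set()

    def allocate(self, app: str, node: str, size: int) -> int:
        region_id = len(self._regions)
        self._regions[region_id] = {"app": app, "node": node, "data": bytearray(size)}
        return region_id

    def access(self, region_id: int, app: str, node: str) -> bytearray:
        region = self._regions[region_id]
        # Secure portion: only the owning instantiation on the owning node may access it.
        if (region["app"], region["node"]) != (app, node):
            raise PermissionError("region belongs to another instantiation")
        return region["data"]

    def mark_failed(self, node: str) -> None:
        self._failed_nodes.add(node)

    def failover(self, region_id: int, app: str, new_node: str) -> bytearray:
        region = self._regions[region_id]
        if region["app"] != app:
            raise PermissionError("only an instantiation of the same application may take over")
        if region["node"] not in self._failed_nodes:
            raise RuntimeError("owning node has not failed")
        region["node"] = new_node  # ownership moves; the data stays in the pool
        return region["data"]


pool = PooledMemory()
rid = pool.allocate("db", "node-a", 8)
pool.access(rid, "db", "node-a")[:5] = b"state"
pool.mark_failed("node-a")
recovered = pool.failover(rid, "db", "node-b")
print(bytes(recovered[:5]))
```

Because the data lives in the pool rather than on the failed node, the second instantiation sees the state the first one wrote.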

VIRTUALIZED FILE SERVER USER VIEWS
20230325173 · 2023-10-12

In one embodiment, a system for managing a virtualization environment includes a plurality of host machines, each of which comprises a hypervisor and one or more user virtual machines (user VMs); a virtual machine controller; one or more virtual disks comprising a plurality of storage devices; and a virtualized file server (VFS) comprising a plurality of file server virtual machines (FSVMs), each of which runs on one of the host machines. The VFS may be configured to receive a request for storage system information from a user and to generate and send a response to the request, wherein the response is customized according to configuration information of the VFS that is specific to the user. The storage system information requested may include a total size of storage available to the user, and the user may have an associated storage quota limit.
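One way to customize the reported size per user is sketched below. It assumes the per-user configuration is a quota plus current usage, and that the reported total is the smaller of the quota and the share capacity. `UserConfig` and `storage_info_for` are hypothetical names, and the min-of-quota-and-capacity rule is an illustrative assumption rather than the patent's stated method.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UserConfig:
    quota_bytes: Optional[int]  # None means the user has no quota limit
    used_bytes: int


def storage_info_for(user: str, configs: Dict[str, UserConfig],
                     share_capacity_bytes: int) -> Dict[str, int]:
    cfg = configs[user]
    # Report the quota, not the raw capacity, when the user has a smaller quota.
    if cfg.quota_bytes is None:
        total = share_capacity_bytes
    else:
        total = min(cfg.quota_bytes, share_capacity_bytes)
    return {"total_bytes": total, "free_bytes": max(total - cfg.used_bytes, 0)}


configs = {"alice": UserConfig(quota_bytes=100, used_bytes=30),
           "bob": UserConfig(quota_bytes=None, used_bytes=200)}
print(storage_info_for("alice", configs, share_capacity_bytes=1000))
```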

ADAPTIVE MULTIPATH FABRIC FOR BALANCED PERFORMANCE AND HIGH AVAILABILITY

A computing system providing high-availability access to computing resources includes: a plurality of interfaces; a plurality of sets of computing resources, each of the sets of computing resources including a plurality of computing resources; and at least three switches, each of the switches being connected to a corresponding one of the interfaces via a host link and being connected to a corresponding one of the sets of computing resources via a plurality of resource connections, each of the switches being configured such that data traffic is distributed to remaining ones of the switches through a plurality of cross-connections between the switches if one of the switches fails.
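The redistribution behavior can be sketched as moving a failed switch's traffic onto the survivors over the cross-connections. This assumes traffic is measured in integer units and split evenly, with any remainder assigned to the first surviving switches; `redistribute` and the switch names are hypothetical, and the even-split policy is an assumption, not necessarily the patent's balancing scheme.

```python
from typing import Dict


def redistribute(load: Dict[str, int], failed: str) -> Dict[str, int]:
    """Spread a failed switch's traffic evenly over the surviving switches."""
    if len(load) < 3:
        raise ValueError("the fabric assumes at least three switches")
    survivors = [s for s in load if s != failed]
    new_load = {s: load[s] for s in survivors}
    share, remainder = divmod(load[failed], len(survivors))
    for i, switch in enumerate(survivors):
        # Cross-connections carry an equal share; leftover units go to the first switches.
        new_load[switch] += share + (1 if i < remainder else 0)
    return new_load


print(redistribute({"sw0": 11, "sw1": 10, "sw2": 10}, failed="sw0"))
```

Requiring at least three switches mirrors the abstract: with only two, the survivor would absorb all traffic and no further failure could be tolerated.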

Storage cluster failure detection

Direct monitoring of a plurality of storage nodes in a primary cluster is performed based on connectivity with the storage nodes. Indirect monitoring of a first storage node is performed in response to direct monitoring of the first storage node indicating failure of the connectivity with the first storage node, wherein a second storage node of the plurality of storage nodes is a backup node for the first storage node. The indirect monitor of the first storage node indicates failure of the first storage node in response to performance, by the second storage node, of storage access operations that were previously performed by the first storage node. A cluster-switch operation is initiated to switch from the primary cluster to a backup cluster based on an occurrence of at least one cluster-failure condition that comprises the indirect monitor of the first storage node indicating failure of the first storage node.
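The two-tier check can be sketched as follows: a node that is reachable is healthy; an unreachable node is confirmed failed only if its backup has taken over its storage operations. This assumes reachability and observed takeovers are supplied as plain data, and it uses "any node confirmed failed" as the cluster-failure condition, which is an assumption. `node_status` and `should_switch_cluster` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple


def node_status(node: str, reachable: Dict[str, bool], backup_of: Dict[str, str],
                takeovers: Set[Tuple[str, str]]) -> str:
    if reachable[node]:                      # direct monitoring: connectivity is fine
        return "healthy"
    # Indirect monitoring: has the backup started serving the node's storage operations?
    if (backup_of[node], node) in takeovers:
        return "failed"
    return "unreachable"                     # lost contact, but no takeover observed


def should_switch_cluster(nodes: Iterable[str], reachable: Dict[str, bool],
                          backup_of: Dict[str, str],
                          takeovers: Set[Tuple[str, str]]) -> bool:
    # Assumed cluster-failure condition: any node confirmed failed by indirect monitoring.
    return any(node_status(n, reachable, backup_of, takeovers) == "failed" for n in nodes)


nodes = ["n1", "n2"]
backup_of = {"n1": "n2", "n2": "n1"}
reachable = {"n1": False, "n2": True}
print(should_switch_cluster(nodes, reachable, backup_of, set()))           # False
print(should_switch_cluster(nodes, reachable, backup_of, {("n2", "n1")}))  # True
```

Separating "unreachable" from "failed" is what keeps a transient network partition from triggering a cluster switch on its own.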

High availability for persistent memory

Techniques for implementing high availability for persistent memory are provided. In one embodiment, a first computer system can detect an alternating current (AC) power loss/cycle event and, in response to the event, can save data in a persistent memory of the first computer system to a memory or storage device that is remote from the first computer system and is accessible by a second computer system. The first computer system can then generate a signal for the second computer system after initiating or completing the save process, thereby allowing the second computer system to restore the saved data from the memory or storage device into its own persistent memory.
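The save, signal, and restore sequence can be sketched in-process. A dictionary stands in for the remote memory or storage device and a `queue.Queue` stands in for the signal between the two systems; both substitutions, and the function names `on_ac_power_event` and `restore_on_second_system`, are hypothetical. This variant signals after the save completes.

```python
import queue


def on_ac_power_event(pmem: bytearray, remote_store: dict,
                      signals: "queue.Queue[str]") -> None:
    # First computer system: copy persistent memory to the shared remote device...
    remote_store["pmem-image"] = bytes(pmem)
    # ...then signal the second system once the save has completed.
    signals.put("pmem-image")


def restore_on_second_system(remote_store: dict, signals: "queue.Queue[str]",
                             timeout: float = 1.0) -> bytearray:
    key = signals.get(timeout=timeout)      # wait for the first system's signal
    return bytearray(remote_store[key])     # load the image into local persistent memory


store: dict = {}
signals: "queue.Queue[str]" = queue.Queue()
on_ac_power_event(bytearray(b"journal"), store, signals)
print(restore_on_second_system(store, signals))
```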

REMOTE DIRECT MEMORY ACCESS (RDMA)-BASED RECOVERY OF DIRTY DATA IN REMOTE MEMORY

Techniques for implementing RDMA-based recovery of dirty data in remote memory are provided. In one set of embodiments, upon occurrence of a failure at a first (i.e., source) host system, a second (i.e., failover) host system can allocate a new memory region corresponding to a memory region of the source host system and retrieve a baseline copy of the memory region from a storage backend shared by the source and failover host systems. The failover host system can further populate the new memory region with the baseline copy and retrieve one or more dirty page lists for the memory region from the source host system via RDMA, where the one or more dirty page lists identify memory pages in the memory region that include data updates not present in the baseline copy. For each memory page identified in the one or more dirty page lists, the failover host system can then copy the content of that memory page from the memory region of the source host system to the new memory region via RDMA.
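The recovery steps can be sketched with plain byte buffers. Slice copies stand in for RDMA reads from the source host, the page size is shrunk to 4 bytes for readability, and multiple dirty page lists are merged into one set of page numbers. `recover_region` and `PAGE_SIZE` are hypothetical names; no RDMA library is used here.

```python
from typing import Iterable, List

PAGE_SIZE = 4  # tiny pages so the example stays readable


def recover_region(baseline: bytes, dirty_page_lists: Iterable[List[int]],
                   source_region: bytes, page_size: int = PAGE_SIZE) -> bytearray:
    new_region = bytearray(baseline)            # populate with the baseline copy
    dirty = {p for pages in dirty_page_lists for p in pages}
    for page in sorted(dirty):
        start, end = page * page_size, (page + 1) * page_size
        # Stands in for a one-sided RDMA read from the source host's memory region.
        new_region[start:end] = source_region[start:end]
    return new_region


baseline = b"AAAABBBBCCCC"   # stale copy from the shared storage backend
source = b"AAAAXXXXCCCY"     # source host memory with newer data
print(recover_region(baseline, [[1], [2]], source))
```

Only pages named in the dirty lists are copied, so clean pages are served from the baseline without touching the source host.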

Disaster recovery for distributed file servers, including metadata fixers

Examples of systems described herein include virtualized file servers. Examples of virtualized file servers described herein may support disaster recovery of the virtualized file server. Accordingly, examples of virtualized file servers may support metadata fixing procedures to update metadata in a recovery setting. Examples of virtualized file servers may also support hypervisor-agnostic disaster recovery.