G06F11/2046

System and method for handling multi-node failures in a disaster recovery cluster

A system and method for handling multi-node failures in a disaster recovery cluster is provided. In the event of an error condition, a switchover operation occurs from the failed nodes to one or more surviving nodes. Data stored in non-volatile random access memory is recovered by the surviving nodes to bring storage objects, e.g., disks, aggregates and/or volumes, into a consistent state.

LOG-STRUCTURED FORMATS FOR MANAGING ARCHIVED STORAGE OF OBJECTS

Solutions for managing archived storage include receiving, at a first node, a snapshot comprising object data (e.g., a virtual machine disk snapshot) from a second node (e.g., a software defined data center), and storing the snapshot in a tiered structure that includes a data tier and a metadata tier. Snapshots may be used for fail-over operations and/or backups, to support disaster recovery. The data tier comprises a log-structured file system (LFS), and the metadata tier comprises a content addressable storage (CAS) identifying addresses within the LFS. The metadata tier also comprises a logical layer indicating content in the CAS. Segment cleaning of the data tier is performed using a segment usage table (SUT). Some examples include performing a fail-over operation from the second node to a third node using at least the stored snapshot for workload recovery. In some examples, the CAS comprises a log-structured merge-tree (LSM-tree).
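The tiered layout described above can be sketched roughly as follows. This is a minimal illustration, not the patented design: the class name `TieredStore`, the tiny `SEGMENT_SIZE`, the use of SHA-256 as the content address, and the plain dictionaries standing in for the CAS (which the abstract says may be an LSM-tree) are all assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 4  # blocks per segment; assumed and deliberately tiny


class TieredStore:
    """Hypothetical stand-in for a data tier plus metadata tier."""

    def __init__(self):
        self.log = []      # data tier: append-only block log (LFS stand-in)
        self.cas = {}      # metadata tier: content hash -> address in the log
        self.logical = {}  # logical layer: (snapshot_id, block_no) -> content hash
        self.sut = {}      # segment usage table: segment index -> live block count

    def write(self, snapshot_id, block_no, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.cas:
            # New content: append to the log and count it as live in its segment.
            addr = len(self.log)
            self.log.append(data)
            self.cas[digest] = addr
            seg = addr // SEGMENT_SIZE
            self.sut[seg] = self.sut.get(seg, 0) + 1
        self.logical[(snapshot_id, block_no)] = digest

    def read(self, snapshot_id, block_no):
        return self.log[self.cas[self.logical[(snapshot_id, block_no)]]]

    def delete_snapshot(self, snapshot_id):
        for key in [k for k in self.logical if k[0] == snapshot_id]:
            del self.logical[key]
        # Blocks no longer referenced by any snapshot stop counting as live.
        live = set(self.logical.values())
        for digest in [d for d in self.cas if d not in live]:
            addr = self.cas.pop(digest)
            self.sut[addr // SEGMENT_SIZE] -= 1

    def cleaning_candidates(self):
        # Segments with the fewest live blocks are the cheapest to clean first.
        return sorted(self.sut, key=lambda seg: self.sut[seg])
```

In this sketch, identical blocks written by different snapshots share one log entry, and deleting a snapshot only lowers live counts in the SUT; actually reclaiming space would be the job of a separate cleaner that the sketch does not model.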

Communicating health status when a management console is unavailable for a server in a mirror storage environment

Provided are a computer program product, system, and method for communicating health status when a management console is unavailable for a server in a mirror storage environment. A first server determines that a management console is unavailable over a console network. In response to determining that the management console cannot be reached over the console network, the first server determines a health status at the first server and a first storage. The health status indicates whether there are errors or no errors at the first server and the first storage. The first server transmits the determined health status to a second server over a mirroring network that mirrors data between the first storage and a second storage managed by the second server. The determined health status is forwarded to an administrator.
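The fallback path described above can be sketched as a single function. The function name `report_health`, its parameters, and the shape of the status dictionary are hypothetical; the abstract does not specify a message format, and forwarding to the administrator is left to the receiving side.

```python
from typing import Callable, Dict, List


def report_health(
    console_reachable: bool,
    server_errors: List[str],
    storage_errors: List[str],
    send_over_mirror: Callable[[Dict[str, bool]], None],
) -> bool:
    """Return True if the status was sent over the mirroring network."""
    if console_reachable:
        # Normal path through the management console; not modeled here.
        return False
    # Health status is determined only once the console is known to be unreachable.
    status = {"server_ok": not server_errors, "storage_ok": not storage_errors}
    send_over_mirror(status)
    return True
```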

Self-healing virtualized file server

In one embodiment, a system for managing a virtualization environment comprises a plurality of host machines, one or more virtual disks comprising a plurality of storage devices, a virtualized file server (VFS) comprising a plurality of file server virtual machines (FSVMs), wherein each of the FSVMs is running on one of the host machines and conducts I/O transactions with the one or more virtual disks, and a virtualized file server self-healing system configured to identify one or more corrupt units of stored data at one or more levels of a storage hierarchy associated with the storage devices, wherein the levels comprise one or more of file level, filesystem level, and storage level, and when data corruption is detected, cause each FSVM on which at least a portion of the unit of stored data is located to recover the unit of stored data.

Information processing device and resource allocation method
09792142 · 2017-10-17

A device includes a storage that stores setting information specifying, for each virtual machine to be created, the number of arithmetic processing unit cores that must be allocated to that virtual machine, and group information identifying which of those virtual machines operate in cooperation as a group. The device also includes a virtual machine monitor that, when a first virtual machine among those represented by the setting information is created, refers to the setting information and the group information and allocates to the first virtual machine as many arithmetic processing unit cores as the setting information specifies, according to a rule that takes into account the decrease in operating performance of all the operable virtual machines that would result from a failure in any of the arithmetic processing units.

ARBITRATION PROCESSING METHOD AFTER CLUSTER BRAIN SPLIT, QUORUM STORAGE APPARATUS, AND SYSTEM
20170293613 · 2017-10-12

The present disclosure discloses an arbitration processing solution for when brain split occurs in a cluster. The solution includes: receiving, by a quorum storage apparatus within a first refresh packet detection period, first master quorum node preemption requests sent by at least two quorum nodes in the cluster; and sending, by the quorum storage apparatus, a first master quorum node preemption success response message to the initial master quorum node, indicating that the initial master quorum node succeeds in master quorum node preemption, when the first master quorum node preemption requests received within the first refresh packet detection period comprise the master quorum node preemption request sent by the initial master quorum node.
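The arbitration rule in this abstract can be expressed as a short function. This is a sketch under assumptions: the name `arbitrate` and the use of node-name strings are hypothetical, and the abstract does not say what happens when the initial master's request is absent, so the sketch simply returns `None` in that case.

```python
from typing import Iterable, Optional


def arbitrate(requests: Iterable[str], initial_master: str) -> Optional[str]:
    """Pick the winner among preemption requests seen in one detection period."""
    received = set(requests)
    if len(received) < 2:
        # The abstract only covers requests from at least two quorum nodes.
        return None
    if initial_master in received:
        # The initial master keeps its role if it asked within the period.
        return initial_master
    # Fallback behavior is not described in the abstract.
    return None
```

Favoring the initial master whenever its request arrives in time avoids an unnecessary master change when the split leaves the original master healthy.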

System and method for redundant object storage

Systems and methods for redundant object storage are disclosed. A method may include storing at least two copies of each of a plurality of objects among a plurality of nodes communicatively coupled to one another in order to provide redundancy of each of the plurality of objects in the event of a fault of one of the plurality of nodes. The method may also include monitoring access to each object to determine a frequency of access for each object. The method may additionally include redistributing one or more of the copies of the objects such that at least one particular node of the plurality of nodes includes copies of only objects accessed at a frequency below a predetermined frequency threshold based on the determined frequency of access for each object. The method may further include placing the at least one particular node in a reduced-power mode.
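One way to picture the redistribution step is the placement planner below. It is only an illustration: the function name `plan_placement`, the round-robin placement, the fixed two copies per object, and the choice of the last node as the low-power candidate are assumptions, not details from the abstract.

```python
def plan_placement(access_freq, nodes, threshold):
    """Place two copies of each object so one node holds only cold objects.

    access_freq maps object name -> observed access frequency.
    Returns (placement, cold_node), where placement maps node -> set of objects.
    """
    if len(nodes) < 3:
        raise ValueError("this sketch needs at least three nodes")
    cold_node, active = nodes[-1], nodes[:-1]
    placement = {node: set() for node in nodes}
    for i, (obj, freq) in enumerate(sorted(access_freq.items())):
        # The first copy always lives on an active node so it stays readable.
        placement[active[i % len(active)]].add(obj)
        if freq < threshold:
            # Cold object: the redundant copy goes to the low-power candidate.
            placement[cold_node].add(obj)
        else:
            # Hot object: both copies stay on distinct active nodes.
            placement[active[(i + 1) % len(active)]].add(obj)
    return placement, cold_node
```

After planning, `cold_node` contains copies of only objects below the threshold, so it is the node that could be placed in a reduced-power mode while every object remains available from an active node.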

Data storage with virtual appliances
09747176 · 2017-08-29

A data storage system has at least two universal nodes each having CPU resources, memory resources, network interface resources, and a storage virtualizer. A system controller communicates with all of the nodes. Each storage virtualizer in each universal node is allocated by the system controller a number of storage provider resources that it manages. The system controller maintains a map for dependency of virtual appliances to storage providers, and the storage virtualizer provides storage to its dependent virtual appliances either locally or through a network protocol (N_IOC, S_IOC) to another universal node. The storage virtualizer manages storage providers and is tolerant to fault conditions. The storage virtualizer can migrate from any one universal node to any other universal node.

Self healing cluster of a content management system

Systems and methods herein provide for a clustered content management system comprising at least two computing nodes. A first computing node comprises an instance of a content repository and may perform content management operations on that instance. Changes to the instance of the content repository of the first computing node are synchronized with the content repository by way of a second computing node. The second computing node is communicatively coupled to the first computing node through a network and is operable to synchronize the change with the content repository. The second computing node also determines that synchronization of the change is blocked due to an error. The second computing node identifies the error, determines that the error is correctable, and corrects the error to synchronize the change with the content repository.

Automated restart of paused virtual machines due to input/output errors
11243855 · 2022-02-08

An apparatus includes a storage device of a host computing device. The storage device is to store a virtualization manager. The apparatus also includes a processing device of the host computing device, operatively coupled to the storage device. The processing device is to determine that a first instance of a virtual machine on a first host computing device is paused based on an error associated with a connection to a storage device of the first host computing device, determine whether a second host computing device has access to the storage device of the first host computing device, instantiate a second instance of the virtual machine on the second host computing device when the second host computing device is determined to have access to the storage device of the first host computing device, and stop the first instance of the virtual machine on the first host computing device.
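The decision sequence in this abstract might look like the following sketch. The `Host` dataclass, the `recover_paused_vm` function, and the idea of scanning a list of candidate hosts are hypothetical; the abstract describes only a first and a second host.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    reachable_storage: Set[str] = field(default_factory=set)
    running: Set[str] = field(default_factory=set)


def recover_paused_vm(
    vm: str, storage_id: str, first: Host, candidates: List[Host]
) -> Optional[Host]:
    """Restart a VM paused by a storage I/O error on a host that can reach the storage."""
    for host in candidates:
        if host is not first and storage_id in host.reachable_storage:
            host.running.add(vm)       # instantiate the second instance
            first.running.discard(vm)  # then stop the paused first instance
            return host
    return None  # no host has access; the VM stays paused on the first host
```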