Patent classifications
G06F11/2033
Procedure for managing a failure in a network of nodes based on a local strategy
Disclosed is a method for managing failures in a network of nodes. For each node considered, the method first saves the state of that node locally, to a storage medium associated with that node. Then, if the node has failed, the local backup of its state is retrieved by redirecting the link between the failed node and its storage medium, so that the medium is connected to an operational node other than the failed node, one that is already carrying out the same calculation. The local backups used in these retrieval steps are coherent with one another, so that they all correspond to the same state of the calculation. Finally, the local backup of each failed node is returned to a new additional node added to the network when the failure occurred.
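The recovery flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the classes `Node` and `Network`, the dict-based "storage media", and the `links` table standing in for the physical node-to-storage connection are all hypothetical. Every node checkpoints at the same calculation step (which is what makes the backups coherent), and a failed node's medium is re-linked to a surviving node so its last checkpoint can be handed to a spare node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)
    failed: bool = False


class Network:
    """Hypothetical model: one local storage medium per node, plus spare nodes."""

    def __init__(self, nodes: List[Node], spares: List[Node]):
        self.nodes: Dict[str, Node] = {n.name: n for n in nodes}
        self.spares = list(spares)
        self.media: Dict[str, dict] = {}   # medium id -> saved checkpoint
        self.links: Dict[str, str] = {}    # medium id -> node currently connected to it

    def checkpoint_all(self, step: int) -> None:
        # Every node saves its state locally at the same calculation step,
        # so the set of local backups is mutually coherent.
        for name, node in self.nodes.items():
            self.media[name] = {"step": step, "state": dict(node.state)}
            self.links[name] = name

    def recover(self, failed_name: str) -> Node:
        self.nodes[failed_name].failed = True
        # Reconnect the failed node's medium to a node still running the calculation.
        helper = next(n for n in self.nodes.values() if not n.failed)
        self.links[failed_name] = helper.name
        backup = self.media[failed_name]
        # Hand the retrieved state to a spare node that replaces the failed one.
        spare = self.spares.pop(0)
        spare.state = dict(backup["state"])
        del self.nodes[failed_name]
        self.nodes[spare.name] = spare
        return spare
```

Any progress a node made after its last checkpoint is lost on failure; the spare resumes from the coherent checkpoint shared by all nodes.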
Self-descriptive orchestratable modules in software-defined industrial systems
Various systems and methods are provided for implementing a software-defined industrial system. In one example, self-descriptive control applications and software modules are provided in the context of orchestratable distributed systems. The self-descriptive control applications may be executed by an orchestrator or similar control device configured to: identify available software modules adapted to perform functional operations in a control system environment; identify the operational characteristics of those modules, that is, how each module executes when used to implement a control system application; select a software module for execution based on the operational configuration and the operational characteristics identified in the manifest; and cause the selected software module to execute in the control system environment according to an application specification for the control system application.
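The selection step can be illustrated with a small sketch. The `Manifest` fields (`function`, `cpu_cores`, `latency_ms`) and the lowest-latency tie-break are assumptions chosen for illustration; the abstract does not specify what a manifest contains or how the orchestrator ranks feasible modules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    """Hypothetical self-description published by a software module."""
    module: str
    function: str       # functional operation the module performs
    cpu_cores: int      # resources the module needs to execute
    latency_ms: float   # expected execution latency


def select_module(manifests: List[Manifest], function: str,
                  available_cores: int, max_latency_ms: float) -> Optional[Manifest]:
    # 1. Identify modules that perform the required functional operation.
    candidates = [m for m in manifests if m.function == function]
    # 2. Keep those whose declared characteristics fit the operational configuration.
    feasible = [m for m in candidates
                if m.cpu_cores <= available_cores and m.latency_ms <= max_latency_ms]
    # 3. Prefer the lowest-latency feasible module; None if nothing fits.
    return min(feasible, key=lambda m: m.latency_ms, default=None)
```

Executing the chosen module is left to the orchestrator; the sketch covers only matching declared characteristics against the available configuration.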
Methods and Systems for Rapid Failure Recovery for a Distributed Storage System
Methods and systems are provided for rapid recovery of a distributed storage system from failures of one or more nodes.
Virtualized file server user views
In one embodiment, a system for managing a virtualization environment includes: a plurality of host machines, each comprising a hypervisor, one or more user virtual machines (user VMs), and a virtual machine controller; one or more virtual disks comprising a plurality of storage devices; and a virtualized file server (VFS) comprising a plurality of file server virtual machines (FSVMs), each running on one of the host machines. The VFS may be configured to receive a request for storage system information from a user, and to generate and send a response that is customized according to VFS configuration information specific to that user. The requested information may include the total amount of storage available to the user, and the user may have an associated storage quota limit.
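A per-user, quota-aware answer to a storage query might look like the sketch below. The function name, its parameters, and the rule that a quota caps both the reported total and the free space are assumptions for illustration, not the VFS behavior defined in the patent.

```python
from typing import Dict


def storage_view(user: str, capacity: int, cluster_used: int,
                 user_used: Dict[str, int], quotas: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical per-user response to 'how much storage do I have?'."""
    used = user_used.get(user, 0)
    cluster_free = capacity - cluster_used
    quota = quotas.get(user)
    if quota is None:
        # No user-specific configuration: report the whole file server.
        return {"total": capacity, "used": used, "free": cluster_free}
    # Quota users see their quota as the total; free space is also bounded
    # by what the cluster actually has left.
    return {"total": quota, "used": used, "free": max(0, min(quota - used, cluster_free))}
```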
FAULT TOLERANCE USING SHARED MEMORY ARCHITECTURE
Examples provide a fault tolerant virtual machine (VM) using pooled memory. When fault tolerance is enabled for a VM, a primary VM is created on a first host in a server cluster. A secondary VM is created on a second host in the server cluster. Memory for the VMs is maintained on a shared partition in pooled memory. The pooled memory is accessible to all hosts in the cluster. The primary VM has read and write access to the VM memory in the pooled memory. The secondary VM has read-only access to the VM memory. If the second host fails, a new secondary VM is created on another host in the cluster. If the first host fails, the secondary VM becomes the new primary VM and a new secondary VM is created on another host in the cluster.
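The failover rules in this abstract reduce to a small state update, sketched below. The class name, the host-list representation, and the "first unused host" placement policy are assumptions; real placement would weigh host capacity, and the pooled memory itself is modeled only as access rights.

```python
from typing import Dict, List


class FaultTolerantVM:
    """Hypothetical primary/secondary placement over a shared pooled-memory partition."""

    def __init__(self, hosts: List[str]):
        self.hosts = list(hosts)
        self.primary, self.secondary = self.hosts[0], self.hosts[1]

    def access(self) -> Dict[str, str]:
        # Both VMs map the same partition: primary read-write, secondary read-only.
        return {self.primary: "rw", self.secondary: "ro"}

    def _pick_spare(self) -> str:
        return next(h for h in self.hosts if h not in (self.primary, self.secondary))

    def on_host_failure(self, failed: str) -> None:
        self.hosts.remove(failed)
        if failed == self.secondary:
            # Secondary host lost: create a new secondary elsewhere.
            self.secondary = self._pick_spare()
        elif failed == self.primary:
            # Primary host lost: promote the secondary, then create a new secondary.
            self.primary = self.secondary
            self.secondary = self._pick_spare()
```

Because the VM memory lives in pooled memory rather than on the failed host, promotion only changes which host holds write access.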
MANAGING HEALTH CONDITIONS TO DETERMINE WHEN TO RESTART REPLICATION AFTER A SWAP TRIGGERED BY A STORAGE HEALTH EVENT
Provided are a computer program product, system, and method for managing health conditions to determine when to restart replication after a swap triggered by a storage health event. A health condition affecting access to a first storage is detected, which triggers a swap operation. The swap operation responds to the health condition by redirecting host Input/Output (I/O) requests for data from a first server to a second server, so that after the swap the I/O requests are directed to the second server and a second storage. In response to determining that the health condition is resolved, the second server is instructed to mirror data in the second storage to the first server for storage in the first storage.
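The swap-then-restart sequence can be viewed as a three-state machine. The sketch below assumes a single boolean health signal for the first storage and hypothetical server names; it is an illustration of the described ordering, not the claimed implementation.

```python
from enum import Enum, auto


class Mode(Enum):
    MIRROR_1_TO_2 = auto()   # normal: first server serves I/O and mirrors to the second
    SWAPPED = auto()         # I/O on second server, replication suspended
    MIRROR_2_TO_1 = auto()   # health resolved: second server mirrors back to the first


class ReplicationManager:
    def __init__(self):
        self.mode = Mode.MIRROR_1_TO_2
        self.io_target = "server1"

    def on_health_event(self, storage1_healthy: bool) -> None:
        if self.mode is Mode.MIRROR_1_TO_2 and not storage1_healthy:
            # A health condition on the first storage triggers the swap.
            self.io_target = "server2"
            self.mode = Mode.SWAPPED
        elif self.mode is Mode.SWAPPED and storage1_healthy:
            # Condition resolved: restart replication in the reverse direction.
            self.mode = Mode.MIRROR_2_TO_1
```

I/O stays on the second server after replication restarts; only the mirroring direction changes.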
AFTER SWAPPING FROM A FIRST STORAGE TO A SECOND STORAGE, MIRRORING DATA FROM THE SECOND STORAGE TO THE FIRST STORAGE FOR DATA IN THE FIRST STORAGE THAT EXPERIENCED DATA ERRORS
Provided are a computer program product, system, and method for mirroring data from a second storage back to a first storage, after swapping from the first storage to the second, for data in the first storage that experienced data errors. A swap operation redirects host Input/Output (I/O) requests for data from the first server to the second server in response to a health condition at the first server. The data in the first storage that experienced data errors is identified. In response to determining that the first server is available for data mirroring operations, the second server is instructed to mirror data in the second storage to the first server, including the data corresponding to the data in the first storage that experienced the errors, for storage in the first storage.
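The error-repair part of this mirroring step is sketched below, assuming block-addressed storage modeled as dicts and a precomputed set of errored block numbers. The abstract's mirroring covers more than the errored data; this hypothetical function shows only the errored-block portion and the "wait until the first server is available" condition.

```python
from typing import Dict, Set


def repair_errored_blocks(storage1: Dict[int, bytes], storage2: Dict[int, bytes],
                          errored: Set[int], server1_available: bool) -> Set[int]:
    """Copy the second storage's copy of each errored block back to the first storage."""
    if not server1_available:
        return set()   # defer mirroring until the first server can accept it
    repaired: Set[int] = set()
    for block in sorted(errored):
        if block in storage2:
            storage1[block] = storage2[block]
            repaired.add(block)
    return repaired
```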
Proactive resource reservation for protecting virtual machines
A system for proactive resource reservation to protect virtual machines is disclosed. The system includes a cluster of hosts comprising a master host, a first slave host, and one or more other slave hosts, where the first slave host executes one or more virtual machines. The first slave host is configured to identify a failure that impacts the ability of those virtual machines to provide service, and to calculate a list of the impacted virtual machines. The master host is configured to receive a request to reserve resources on another host in the cluster so that the impacted virtual machines can fail over, calculate the resource capacity across the cluster of hosts, determine whether that capacity is sufficient to reserve the resources, and send an indication of whether the resources are reserved.
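The master host's "is there enough capacity?" check can be approximated with a greedy placement. This sketch assumes a single scalar resource per host and uses a largest-first, worst-fit heuristic of our own choosing; being greedy, it can report insufficient capacity in some cases where a cleverer packing would succeed.

```python
from typing import Dict, List, Optional


def reserve_for_failover(free: Dict[str, int], demands: List[int],
                         failing_host: str) -> Optional[Dict[int, str]]:
    """Greedy check whether impacted VMs (by index) fit on the other hosts."""
    remaining = {h: c for h, c in free.items() if h != failing_host}
    plan: Dict[int, str] = {}
    # Place the largest VMs first, each on the host with the most free capacity.
    for vm in sorted(range(len(demands)), key=lambda i: -demands[i]):
        host = max(remaining, key=remaining.get, default=None)
        if host is None or remaining[host] < demands[vm]:
            return None   # insufficient capacity: reservation fails
        remaining[host] -= demands[vm]
        plan[vm] = host
    return plan
```

A `None` result corresponds to the master host indicating that the resources could not be reserved.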
Transparently migrating a storage object between nodes in a clustered storage system
A storage object is migrated between nodes by having the source node automatically verify that another node is configured to service the storage object, and then change ownership of the storage object based on that verification. A cluster manager for the clustered storage system receives a request and forwards it to the source node, which owns the storage object. The source node verifies that the destination node is configured according to a predetermined configuration for servicing the storage object. Based on this verification, the source node takes the storage object offline and updates its ownership information, after which the destination node brings the storage object online. The cluster manager then distributes the updated ownership information to all nodes in the cluster, so that an access request intended for the storage object can be received by any node and forwarded to the destination node, making the migration transparent.
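The verify, offline, reassign, online sequence is sketched below. The `Cluster` class, the per-node set of configured objects, and a single shared ownership table (standing in for the information the cluster manager distributes to every node) are hypothetical simplifications.

```python
from typing import Dict, Set


class Cluster:
    """Hypothetical model of ownership-based storage object migration."""

    def __init__(self, configured: Dict[str, Set[str]], owners: Dict[str, str]):
        self.configured = configured   # node -> storage objects it is configured to serve
        self.owners = owners           # storage object -> owning node (view shared by all nodes)
        self.online: Set[str] = set(owners)

    def migrate(self, obj: str, dest: str) -> bool:
        # Verify the destination is configured to serve the object before giving it up.
        if obj not in self.configured.get(dest, set()):
            return False
        self.online.discard(obj)   # source takes the object offline
        self.owners[obj] = dest    # ownership updated and visible to every node
        self.online.add(obj)       # destination brings the object online
        return True

    def route(self, obj: str) -> str:
        # Any node receiving a request forwards it to the current owner.
        return self.owners[obj]
```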
Control system for power control
A power control system is disclosed that saves power by powering on only enough application servers to satisfy the current workload plus any reserve capacity required by administrative settings. As the load increases, more servers are powered on; as the load decreases, some servers are powered off. The system aims to provide a reasonable end-user experience at the lowest cost in terms of server power consumption.
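The sizing rule can be written as a ceiling division. The sketch assumes homogeneous servers with a fixed per-server capacity and expresses the administrative reserve as a fraction of current load; both are illustrative assumptions, since the abstract does not say how the reserve is specified.

```python
import math


def servers_to_power_on(load: float, per_server_capacity: float,
                        reserve_fraction: float, min_servers: int = 1) -> int:
    """Smallest server count covering the current load plus reserve headroom."""
    required = load * (1.0 + reserve_fraction)
    return max(min_servers, math.ceil(required / per_server_capacity))


def adjust(running: int, target: int) -> str:
    # Compare the running fleet with the target and describe the change needed.
    if target > running:
        return f"power on {target - running}"
    if target < running:
        return f"power off {running - target}"
    return "no change"
```

For example, a load of 900 requests/s with 100 requests/s per server and a 25% reserve needs ceil(1125 / 100) = 12 servers.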