Patent classifications
G06F11/203
RELIABILITY AVAILABILITY SERVICEABILITY (RAS) SERVICE FRAMEWORK
Examples described herein relate to execution of multiple Reliability Availability Serviceability (RAS) processes on different processors of at least two processors, to provide fallback from a first RAS process to a second RAS process executing on one of the at least two processors, based on failure or timeout of the first RAS process. In some examples, the different processors comprise independently operating processors, whereby failure or inoperability of one of the different processors is independent of another of the different processors. In some examples, failure or timeout of the first RAS process comprises failure of the second RAS process to receive an operating status signal from the first RAS process.
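The fallback condition described above (the second RAS process not receiving an operating status signal in time) can be illustrated with a minimal sketch. The class name, the function name, and the 2-second timeout are hypothetical and not taken from the abstract; timestamps are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Held by the second RAS process; tracks status signals from the first."""
    timeout_s: float           # hypothetical timeout, not specified in the abstract
    last_seen_s: float = 0.0   # time of the most recent operating status signal

    def record_heartbeat(self, now_s: float) -> None:
        # Called whenever an operating status signal arrives.
        self.last_seen_s = now_s

    def primary_failed(self, now_s: float) -> bool:
        # Failure or timeout is inferred when no signal arrived within timeout_s.
        return (now_s - self.last_seen_s) > self.timeout_s


def active_process(monitor: HeartbeatMonitor, now_s: float) -> str:
    """Return which RAS process should handle events at time now_s."""
    return "second" if monitor.primary_failed(now_s) else "first"


monitor = HeartbeatMonitor(timeout_s=2.0)
monitor.record_heartbeat(now_s=10.0)
print(active_process(monitor, now_s=11.0))  # signal seen 1.0 s ago
print(active_process(monitor, now_s=13.5))  # signal seen 3.5 s ago
```

In this sketch the second process only takes over once the gap since the last signal exceeds the timeout; a real system would run the two processes on independently operating processors, which the sketch does not model.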
FAILOVER AND FAILBACK OF DISTRIBUTED FILE SERVERS
An example file server manager updates a selected share of a destination distributed file server based on a snapshot of at least a portion of a selected share of a source distributed file server. The selected share of the destination distributed file server is updated while the source distributed file server serves client requests for storage items of the selected share of the source distributed file server. The file server manager receives a request to failover from the source distributed file server to the destination distributed file server and configures the destination distributed file server to service read and write requests for storage items of the selected share of the destination distributed file server. The file server manager further redirects client requests for storage items of the selected share of the source distributed file server to the destination distributed file server by updating active directory information.
High reliability fault tolerant computer architecture
A fault tolerant computer system and method are disclosed. The system may include a plurality of CPU nodes, each including: a processor and a memory; at least two IO domains, wherein at least one of the IO domains is designated an active IO domain performing communication functions for the active CPU nodes; and a switching fabric connecting each CPU node to each IO domain. One CPU node is designated a standby CPU node and the remainder are designated as active CPU nodes. If a failure, a beginning of a failure, or a predicted failure occurs in an active node, the state and memory of the active CPU node are transferred to the standby CPU node which becomes the new active CPU node. If a failure occurs in an active IO domain, the communication functions performed by the failing active IO domain are transferred to the other IO domain.
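As a rough illustration of the CPU-node failover described above, the sketch below copies the state of a failing active node to the standby node and promotes it. The `CpuNode` type, the role strings, and the dictionary used for state are hypothetical; the abstract does not specify how state and memory are represented or transferred.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CpuNode:
    name: str
    role: str                                       # "active", "standby", or "failed"
    state: Dict[str, int] = field(default_factory=dict)


def fail_over_cpu(nodes: List[CpuNode], failing_name: str) -> CpuNode:
    """Transfer state from the failing active node to the standby node."""
    failing = next(n for n in nodes if n.name == failing_name and n.role == "active")
    standby = next(n for n in nodes if n.role == "standby")
    standby.state = dict(failing.state)   # stand-in for the state and memory transfer
    standby.role = "active"               # the standby becomes the new active node
    failing.role = "failed"
    return standby


nodes = [
    CpuNode("cpu0", "active", {"pc": 4096}),
    CpuNode("cpu1", "active"),
    CpuNode("cpu2", "standby"),
]
new_active = fail_over_cpu(nodes, "cpu0")
print(new_active.name, new_active.role, new_active.state)
```

The sketch leaves no standby node after the failover; how a replacement standby is designated is outside what the abstract states.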
Transparent checkpointing and process migration in a distributed system
A distributed system for creating a checkpoint for a plurality of processes running on the distributed system. The distributed system includes a plurality of compute nodes with an operating system executing on each compute node. A checkpoint library resides at the user level on each of the compute nodes, and the checkpoint library is transparent to the operating system residing on the same compute node and to the other compute nodes. Each checkpoint library uses a windowed messaging logging protocol for checkpointing of the distributed system. Processes participating in a distributed computation on the distributed system may be migrated from one compute node to another compute node in the distributed system by re-mapping of hardware addresses using the checkpoint library.
Live migrating virtual machines to a target host upon fatal memory errors
The disclosed technology provides techniques, systems, and apparatus for containing and recovering from uncorrectable memory errors in a distributed computing environment through migration of virtual machines and associated memory to a target host machine. An aspect of the disclosed technology includes a hypervisor or virtual machine manager that receives signaling of an uncorrectable memory error detected by a host machine. The virtual machine manager then uses information received via the signaling to identify virtual memory addresses or memory pages associated with the corrupted memory element so as to allow for containment and recovery from the error, and for live migration of the virtual machine.
METHOD FOR DATA RECONSTRUCTION IN A RAID SYSTEM HAVING A PROTECTION POOL OF STORAGE UNITS
A method of performing a reconstruction of data in a redundant array of independent disks (RAID) system with a protection pool of storage units includes receiving a request to perform a reconstruction of a first set of physical extents stored on a first physical disk of a set of physical disks. Each physical extent of the first set of physical extents is associated with an array of a second set of physical extents. The second set of physical extents is distributed across the set of physical disks. The method further includes allocating a third set of physical extents on one or more physical disks of the set of physical disks other than the first physical disk, and distributing data from each of the first set of physical extents of the first physical disk to a corresponding physical extent of the third set of physical extents.
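The allocation step described above can be sketched as a small planning function: for each physical extent on the first disk, pick a free extent on some other disk. The round-robin placement, the `(disk, extent index)` tuple representation, and all names are assumptions made for illustration; the abstract does not say how the third set of extents is chosen.

```python
from itertools import cycle
from typing import Dict, List, Tuple

Extent = Tuple[str, int]  # (disk id, extent index); hypothetical representation


def plan_reconstruction(
    first_disk: str,
    first_extents: List[int],
    free_extents: Dict[str, List[int]],
) -> Dict[Extent, Extent]:
    """Map each extent of first_disk to a free extent on a different disk."""
    # Only disks other than the first disk are eligible targets.
    targets = {d: list(free) for d, free in free_extents.items()
               if d != first_disk and free}
    disks = cycle(sorted(targets))
    plan: Dict[Extent, Extent] = {}
    for idx in first_extents:
        if not any(targets.values()):
            raise RuntimeError("not enough free extents on the other disks")
        # Round-robin over the disks, skipping any that have run out of space.
        disk = next(d for d in disks if targets[d])
        plan[(first_disk, idx)] = (disk, targets[disk].pop(0))
    return plan


plan = plan_reconstruction(
    first_disk="d0",
    first_extents=[0, 1, 2],
    free_extents={"d0": [5], "d1": [3, 4], "d2": [7]},
)
print(plan)
```

Spreading the targets across several disks is one plausible way to distribute the rebuild load; the sketch only plans the mapping and does not copy or regenerate any data.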
NETWORK VIRTUALIZATION POLICY MANAGEMENT SYSTEM
Concepts and technologies are disclosed herein for providing a network virtualization policy management system. An event relating to a service can be detected. A first policy that defines allocation of hardware resources to host the virtual network functions can be obtained, as can a second policy that defines deployment of the virtual network functions to the hardware resources. The hardware resources can be allocated based upon the first policy and the virtual network functions can be deployed to the hardware resources based upon the second policy.
HOST SYSTEM, PROCESS, OBJECT, SELF-DETERMINATION APPARATUS, AND HOST DEVICE
A method including executing a portion of a service which is part of at least one service provided by a system including a distributed computing platform; determining object capability parameters required to perform the executing; storing information about at least one target host device; generating an announcement message reporting presence of a service type and the object capability parameters; receiving information from other announcement messages; evaluating current host device capability parameters with respect to the object capability parameters; determining when the current host device capability parameters meet a criterion; initiating a migration request message from the object for migration of the object, the object including software code and processing instructions and service function instructions, the migration to a target object host device, when the module capability parameters meet a criterion; and managing the migration of the object to the target host device.
Storage system and control method therefor
Each redundancy group is constituted by one active program (storage control software of the active program) and N standby programs (N is an integer of two or more). Each of the N standby programs is associated with a priority to be determined as a failover (FO) destination. In the same redundancy group, FO is performed from the active program to the standby program based on the priority. For the plurality of pieces of storage control software arranged in the same node, including the active programs and the standby programs that become active by FO in the plurality of redundancy groups, the standby storage control software that can serve as an FO destination for each of those programs is arranged in different nodes.
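A minimal sketch of priority-based failover within one redundancy group, under stated assumptions: a lower priority number means a more preferred FO destination, and a standby program on a failed node is not eligible. Both conventions, and the `Program` type, are hypothetical and not specified in the abstract.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Program:
    node: str
    priority: int           # assumption: lower value = preferred FO destination
    role: str = "standby"   # "active", "standby", or "failed"


def promote_standby(group: List[Program], failed_nodes: Set[str]) -> Program:
    """Fail over to the highest-priority standby program on a healthy node."""
    candidates = [p for p in group
                  if p.role == "standby" and p.node not in failed_nodes]
    if not candidates:
        raise RuntimeError("no standby program available as an FO destination")
    chosen = min(candidates, key=lambda p: p.priority)
    for p in group:
        if p.role == "active":
            p.role = "failed"   # the previous active program is retired
    chosen.role = "active"
    return chosen


group = [
    Program("node1", 0, "active"),
    Program("node2", 1),
    Program("node3", 2),
]
first = promote_standby(group, failed_nodes={"node1"})
second = promote_standby(group, failed_nodes={"node1", "node2"})
print(first.node, second.node)
```

With N of two or more standby programs, the group in this sketch survives a second failure, which is the situation the second call illustrates.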
Methods, apparatuses and systems for configuring a network environment for a server
Methods, apparatuses and systems for cloud-based disaster recovery are provided. The method, for example, includes receiving, at a cloud-based computing platform, first internet protocol (IP) information relating to a first network environment associated with a server used by a client machine; translating the first IP information, without having to interpose a camouflage layer into the first IP information, and generating second IP information based on the translated first IP information, the second IP information used for creating a second network environment for the server; creating the second network environment for the server; and deploying the server in the created second environment.
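One simple way to picture generating second IP information from first IP information is to carry an address from a source subnet into a target subnet while keeping its host offset. This is only an illustrative assumption: the abstract does not describe the actual translation, and the function name and the example subnets are hypothetical. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress


def translate_ip(address: str, source_cidr: str, target_cidr: str) -> str:
    """Map an address in source_cidr to the same host offset in target_cidr."""
    src = ipaddress.ip_network(source_cidr)
    dst = ipaddress.ip_network(target_cidr)
    addr = ipaddress.ip_address(address)
    if addr not in src:
        raise ValueError(f"{address} is not inside {source_cidr}")
    # Offset of the host within the first network environment.
    offset = int(addr) - int(src.network_address)
    if offset >= dst.num_addresses:
        raise ValueError(f"{target_cidr} is too small for host offset {offset}")
    return str(dst.network_address + offset)


second_ip = translate_ip("192.168.1.25", "192.168.1.0/24", "10.20.0.0/24")
print(second_ip)  # 10.20.0.25
```

Keeping the host offset means servers keep their relative addressing in the second network environment, which is one plausible reason to translate this way.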