Patent classifications: G06F11/203
Recovery execution system using programmatic generation of actionable workflows
Programmatic generation of an actionable recovery workflow from data stored in a Configuration Management Database (CMDB), which may be populated primarily through automated discovery. The generated workflow can be sent to an orchestration engine for execution, leveraging the underlying automation components.
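One plausible reading of an "actionable recovery workflow" is an ordered list of steps in which each configuration item (CI) is recovered only after the items it depends on. The minimal sketch below assumes a hypothetical CMDB extract expressed as a dependency map and uses the standard library's `graphlib.TopologicalSorter` to order the steps. The step format handed to the orchestration engine is likewise an assumption, not the patented method.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical CMDB extract: each configuration item (CI) maps to the CIs it depends on.
CMDB_DEPENDENCIES = {
    "web-frontend": {"app-server"},
    "app-server": {"database", "message-queue"},
    "database": {"storage-volume"},
    "message-queue": set(),
    "storage-volume": set(),
}


def build_recovery_workflow(dependencies):
    """Order recovery steps so every CI is recovered after everything it depends on."""
    order = TopologicalSorter(dependencies).static_order()
    # Assumed step format for the orchestration engine: one "recover" action per CI.
    return [{"step": i, "action": "recover", "target": ci}
            for i, ci in enumerate(order, start=1)]


for step in build_recovery_workflow(CMDB_DEPENDENCIES):
    print(step)
```

Independent items (here `message-queue` and `storage-volume`) may appear in either order, and an orchestration engine could run such steps in parallel. A circular dependency in the CMDB data makes `static_order()` raise `graphlib.CycleError`, which doubles as a data-quality check on the discovered relationships.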
ADAPTIVE APPLICATION RECOVERY
This disclosure describes techniques for adaptive disaster recovery of applications running on network devices. The techniques include generating an application template and an application template clone that include application attributes usable to deploy an application stack at an application site. The techniques also include sending the application template clone to a disaster recovery site group to await deployment instructions. In some examples, an observer may determine that a health metric of the application site indicates that a disaster recovery process should be triggered. A disaster recovery site of the disaster recovery site group may be selected based at least in part on a performance metric. The application stack may then be deployed at the disaster recovery site using the application template clone.
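The control flow in this abstract (an observer checks a health metric, a site is chosen by a performance metric, and the stack is deployed from the clone) can be sketched as follows. The threshold value, the use of latency as the performance metric, and the dictionary form of the template clone are all assumptions; actual deployment is stubbed out.

```python
from dataclasses import dataclass

HEALTH_THRESHOLD = 0.5  # hypothetical: health scores below this trigger recovery


@dataclass(frozen=True)
class Site:
    name: str
    latency_ms: float  # assumed performance metric; lower is better


def should_trigger_recovery(health, threshold=HEALTH_THRESHOLD):
    """Observer check: does the application site's health metric call for DR?"""
    return health < threshold


def select_recovery_site(site_group):
    """Pick the DR site with the best (lowest) performance metric."""
    return min(site_group, key=lambda site: site.latency_ms)


def observe_and_recover(health, template_clone, site_group):
    """Return a deployment plan if recovery is triggered, else None (deployment is stubbed)."""
    if not should_trigger_recovery(health):
        return None
    site = select_recovery_site(site_group)
    return {"site": site.name, "stack": template_clone["stack"]}


sites = [Site("east", 40.0), Site("west", 25.0)]
print(observe_and_recover(0.2, {"stack": "orders-app"}, sites))
```

Because the clone already sits with the site group, the decision at failure time reduces to a threshold test and a minimum over the candidate sites.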
RESILIENT AND ADAPTIVE CLOUD PROCESSING OF PARALLEL COMPUTING WORKLOADS
The disclosed technology is generally directed to the processing of parallel computing jobs. In one example of the technology, for at least a first cluster of virtual machines that is assigned to a job, at least one virtual machine is initially assigned to the parallel-computing job. Workers are assigned to tasks associated with the job. Upon failure of a task by one of the assigned workers, the failed task is re-submitted. Upon detecting the failure of one of the workers assigned to the job, the failed worker is replaced with a replacement worker, and the work associated with the failed worker is re-allocated to the replacement worker. Responsive to removal of a virtual machine assigned to the job, a new virtual machine is assigned to the job. Outputs are provided from the assigned workers.
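The two recovery paths in this abstract (re-submitting a failed task, and replacing a failed worker while re-allocating its work) can be illustrated with a small in-memory scheduler. This is a sketch under stated assumptions: round-robin initial assignment, a fixed retry limit, and hypothetical `replacement-N` worker names. Virtual machine assignment is not modeled.

```python
import itertools
from collections import deque


class JobScheduler:
    """Toy scheduler: per-worker task queues with task retry and worker replacement."""

    def __init__(self, worker_ids, max_retries=3):
        self.assignments = {w: deque() for w in worker_ids}
        self.max_retries = max_retries
        self.attempts = {}  # task -> number of failures so far
        self._replacement_ids = itertools.count(1)

    def assign(self, tasks):
        """Initial round-robin assignment of tasks to workers (assumed policy)."""
        workers = list(self.assignments)
        for i, task in enumerate(tasks):
            self.assignments[workers[i % len(workers)]].append(task)

    def next_task(self, worker):
        """Hand the worker its next task, or None when its queue is empty."""
        queue = self.assignments[worker]
        return queue.popleft() if queue else None

    def on_task_failure(self, worker, task):
        """Re-submit a failed task to the same worker until the retry limit is exceeded."""
        self.attempts[task] = self.attempts.get(task, 0) + 1
        if self.attempts[task] > self.max_retries:
            raise RuntimeError(f"task {task!r} failed more than {self.max_retries} times")
        self.assignments[worker].append(task)

    def on_worker_failure(self, worker):
        """Replace a failed worker and move its pending tasks to the replacement."""
        pending = self.assignments.pop(worker)
        replacement = f"replacement-{next(self._replacement_ids)}"
        self.assignments[replacement] = pending
        return replacement
```

The task a worker was running when it died is not tracked here; a fuller version would keep an in-flight record per worker and re-queue that task on the replacement as well.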
Transparent checkpointing and process migration in a distributed system
A distributed system for creating a checkpoint for a plurality of processes running on the distributed system. The distributed system includes a plurality of compute nodes with an operating system executing on each compute node. A checkpoint library resides at the user level on each of the compute nodes, and the checkpoint library is transparent to the operating system residing on the same compute node and to the other compute nodes. Each checkpoint library uses a windowed messaging logging protocol for checkpointing of the distributed system. Processes participating in a distributed computation on the distributed system may be migrated from one compute node to another compute node in the distributed system by re-mapping of hardware addresses using the checkpoint library.
Migrating processes operating on one platform to another platform in a multi-platform system
Embodiments of the claimed subject matter are directed to methods and a system that allow the optimization of processes operating on a multi-platform system (such as a mainframe) by migrating certain processes operating on one platform to another platform in the system. In one embodiment, optimization is performed by evaluating the processes executing in a partition operating under a proprietary operating system, determining a collection of processes to be migrated, calculating a cost of migration for that collection, prioritizing the collection in an order of migration, and incrementally migrating the processes according to that order to another partition in the mainframe executing a lower-cost (e.g., open-source) operating system.
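The evaluate, select, cost, prioritize, and migrate sequence can be sketched as follows. The `portable` flag and the linear cost model (data volume plus a per-dependency penalty) are hypothetical stand-ins for whatever evaluation and cost calculation an implementation would actually use. Migration stops at the first failure so that later processes are not moved out of order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    portable: bool     # hypothetical: can run on the lower-cost partition
    data_gb: float     # data that must move with the process
    dependencies: int  # ties to other processes on the proprietary partition


def migration_cost(p, per_gb=1.0, per_dependency=5.0):
    """Hypothetical linear cost model for moving one process."""
    return p.data_gb * per_gb + p.dependencies * per_dependency


def plan_migration(processes):
    """Select migratable processes and order them cheapest first."""
    return sorted((p for p in processes if p.portable), key=migration_cost)


def migrate_incrementally(plan, migrate):
    """Migrate in priority order; `migrate` returns True on success."""
    migrated = []
    for p in plan:
        if not migrate(p):
            break  # stop so the remaining order is preserved for a retry
        migrated.append(p.name)
    return migrated


procs = [Process("A", True, 10.0, 0), Process("B", True, 2.0, 1), Process("C", False, 1.0, 0)]
print([p.name for p in plan_migration(procs)])  # B (cost 7) before A (cost 10); C excluded
```

Cheapest-first is only one possible prioritization; ordering by expected savings per unit of migration cost would fit the same structure.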
Managing storage domains, service tiers, and failed service tiers
A system detects a failed service tier in a cluster of servers. The servers are controlled by a master node to execute applications and store data, and are organized into service tiers, which correspond to sets of server performance characteristics, and storage domains, which correspond to server racks. By accessing a database, the system identifies the applications installed on servers in the failed service tier and any affinities those applications have for a type of server, a service tier, and/or a storage domain. The system then updates the identified affinities based on the current configuration of the cluster. By providing the updated affinities in the database, the system enables the master node to identify replacement servers for the identified applications that match the required set of server performance characteristics and server rack, and to install the applications on those replacement servers, so that the replacement servers can substitute for the failed service tier and store data.
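A simplified sketch of the replacement-server lookup: each application's affinity is reduced to a required performance profile and rack, and the first healthy server outside the failed tier that matches both is chosen. The schema (`profile`, `rack`, `healthy`) is hypothetical, and the affinity-update and installation steps are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    tier: str      # service tier the server currently belongs to
    profile: str   # hypothetical label for a set of performance characteristics
    rack: str      # storage domain
    healthy: bool = True


def find_replacements(affinities, servers, failed_tier):
    """Map each application to a replacement server name, or None if nothing matches.

    affinities: app name -> {"profile": ..., "rack": ...} (hypothetical schema).
    """
    placement = {}
    for app, want in affinities.items():
        match = next(
            (s for s in servers
             if s.healthy and s.tier != failed_tier
             and s.profile == want["profile"] and s.rack == want["rack"]),
            None,
        )
        placement[app] = match.name if match else None
    return placement


servers = [Server("s1", "gold", "fast", "rack-1", healthy=False),
           Server("s2", "spare", "fast", "rack-1"),
           Server("s3", "spare", "slow", "rack-1")]
affinities = {"billing": {"profile": "fast", "rack": "rack-1"},
              "reports": {"profile": "fast", "rack": "rack-2"}}
print(find_replacements(affinities, servers, failed_tier="gold"))
```

A `None` entry signals that no server satisfies the affinity, which is the case where updating the affinity against the current cluster configuration would matter.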
System and method for assigning memory reserved for high availability failover to virtual machines
Techniques for assigning memory reserved for high availability (HA) failover to virtual machines in HA-enabled clusters are described. In one embodiment, the memory reserved for HA failover is determined in each host computing system of the HA cluster. That reserved memory is then assigned to one or more virtual machines in the HA cluster as input/output (I/O) cache memory at a first level.
Cooperative fault tolerance and load balancing
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cooperative fault tolerance and load balancing. In one aspect, a method includes receiving a request from an entity wherein the request includes metadata specifying a plurality of non-responsive servers to which the entity sent the request but that could not process the request; determining that the data processing apparatus is not a current home server for the entity based on information cached in the data processing apparatus, wherein the current home server is a server within a plurality of preferred servers that processes requests for the entity and, in response thereto: assigning the data processing apparatus as the current home server so that the entity will send subsequent requests to the data processing apparatus for processing; and sending a response to the entity.
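The exchange can be sketched from both sides: the entity walks its list of preferred servers, recording the ones that do not respond, and the first server that does respond claims the home-server role if its cache says it is not already home. The class and function names, the in-memory cache, and the `up` set used to simulate non-responsive servers are assumptions for illustration.

```python
class HomeServer:
    def __init__(self, name):
        self.name = name
        self.home_cache = {}  # entity -> cached current home server

    def handle(self, entity, non_responsive):
        # Claim the home-server role if the cache says another server holds it.
        if self.home_cache.get(entity) != self.name:
            self.home_cache[entity] = self.name
        # The response tells the entity where to send subsequent requests.
        return {"entity": entity, "home": self.name,
                "non_responsive": list(non_responsive)}


def send_request(entity, preferred, servers, up):
    """Try preferred servers in order; `up` simulates which ones respond."""
    non_responsive = []
    for name in preferred:
        if name not in up:
            non_responsive.append(name)
            continue
        return servers[name].handle(entity, non_responsive)
    raise RuntimeError("no preferred server responded")


servers = {n: HomeServer(n) for n in ("a", "b", "c")}
print(send_request("client-1", ["a", "b", "c"], servers, up={"b", "c"}))
```

Carrying the list of non-responsive servers in the request lets the server that takes over see which peers failed without a separate health-check round.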
PLUG-IN BASED FRAMEWORK TO PROVIDE FAULT TOLERANCE AND HIGH AVAILABILITY IN DISTRIBUTED SYSTEMS
A plug-in based framework provides high availability (HA), including fault tolerance, in a distributed system, such as provided by a virtualized computing environment. The framework uses blueprints that define entities to be monitored, failure conditions, failover actions, restoration actions, and other aspects associated with HA. Microservices execute the blueprints, and a load balancer may balance the execution of the blueprints amongst microservices.
Application migration between environments
A data management and storage (DMS) cluster of peer DMS nodes manages migration of an application between a primary compute infrastructure and a secondary compute infrastructure. The secondary compute infrastructure may be a failover environment for the primary compute infrastructure. Primary snapshots of virtual machines of the application in the primary compute infrastructure are generated and provided to the secondary compute infrastructure. During a failover, the primary snapshots are deployed in the secondary compute infrastructure as virtual machines. Secondary snapshots of these virtual machines are then generated as incremental snapshots of the primary snapshots. In failback, the secondary snapshots are provided to the primary compute infrastructure, where they are combined with the primary snapshots to construct the current state of the application, and the application is deployed in that state by deploying virtual machines on the primary compute infrastructure.
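The failback step (combining incremental secondary snapshots with the primary snapshot to rebuild the current state) can be illustrated with a toy block map. This sketch assumes each snapshot is a dictionary from block number to contents and that blocks are overwritten but never deleted; it ignores metadata and consistency concerns that a real DMS implementation would need to handle.

```python
def take_incremental(base, current):
    """Incremental snapshot: only the blocks whose contents changed since `base`."""
    return {block: data for block, data in current.items() if base.get(block) != data}


def reconstruct(primary_snapshot, incrementals):
    """Failback: replay incremental snapshots, oldest first, over the primary snapshot."""
    state = dict(primary_snapshot)
    for delta in incrementals:
        state.update(delta)
    return state


# Worked example with a three-block toy disk.
primary = {0: "a", 1: "b"}                 # primary snapshot shipped before failover
after_failover = {0: "a", 1: "B", 2: "c"}  # VM writes on the secondary side
later = {0: "A", 1: "B", 2: "c"}
deltas = [take_incremental(primary, after_failover),
          take_incremental(after_failover, later)]
print(deltas)                                 # [{1: 'B', 2: 'c'}, {0: 'A'}]
print(reconstruct(primary, deltas) == later)  # True
```

Only the changed blocks cross back to the primary side, which is why shipping incremental secondary snapshots is cheaper than copying whole virtual machine images during failback.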