G06F11/2041

Cluster recovery manager to remediate failovers

Example implementations relate to management of clusters. A cluster recovery manager may comprise a processing resource; and a memory resource storing machine-readable instructions to cause the processing resource to: adjust, based on a monitored degree of performance of a controller of a controller cluster, a state of the controller to one of a first state and a second state; and reassign a corresponding portion of a plurality of APs managed by the controller periodically to a different controller until the state of the controller is determined to be adjustable to the first state. The reassignment can be triggered responsive to a state adjustment of the controller from the first state to the second state.
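
The state adjustment and periodic reassignment described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the claimed implementation: the names `State`, `state_for`, and `remediate`, the numeric performance threshold, and the fixed batch size are all assumptions, and each loop iteration stands in for one reassignment period.

```python
from enum import Enum
from typing import Callable, List, Tuple


class State(Enum):
    FIRST = "first"    # assumed to mean: performance is acceptable
    SECOND = "second"  # assumed to mean: performance is degraded


def state_for(performance: float, threshold: float) -> State:
    """Map a monitored degree of performance to one of the two states."""
    return State.FIRST if performance >= threshold else State.SECOND


def remediate(
    aps: List[str],
    performance_with: Callable[[int], float],
    threshold: float,
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Move batches of APs off a controller until it can return to FIRST.

    performance_with is a hypothetical probe that returns the monitored
    performance for a given number of APs still on the controller.
    Returns (APs kept on the controller, APs reassigned elsewhere).
    """
    remaining = list(aps)
    moved: List[str] = []
    # One iteration models one period of the periodic reassignment.
    while remaining and state_for(performance_with(len(remaining)), threshold) is State.SECOND:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        moved.extend(batch)
    return remaining, moved
```

In a real controller cluster the loop would be driven by a timer and by live monitoring rather than by a callable, and the reassigned APs would be handed to a specific peer controller.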

Event-driven system failover and failback
11636013 · 2023-04-25

A system determines that a primary event processor, included in a primary data center, is associated with a failure. The primary event processor is configured to process first events stored in a main event store of the primary data center. The system identifies a secondary event processor, in a secondary data center, that is to process one or more of the first events based on the failure. The primary event processor and the secondary event processor are configured to process a same type of event. The system causes, based on a configuration associated with the primary or secondary event processor, the one or more first events to be retrieved from one of the main event store or a replica event store. The replica event store is included in the secondary data center and mirrors the main event store of the primary data center.
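
The configuration-driven choice of event store in this abstract might look like the sketch below. It is a minimal illustration under stated assumptions: `ProcessorConfig`, its `read_from_replica_on_failover` flag, and the store labels are hypothetical names, and the abstract does not say which configuration fields drive the decision.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ProcessorConfig:
    event_type: str
    # Hypothetical flag: read from the replica store after a failover.
    read_from_replica_on_failover: bool = True


def plan_failover(
    primary_failed: bool,
    primary_cfg: ProcessorConfig,
    secondary_cfg: ProcessorConfig,
) -> Tuple[str, str]:
    """Return (which processor handles the events, which store to read)."""
    if not primary_failed:
        return ("primary", "main_event_store")
    # Both processors must handle the same type of event.
    if primary_cfg.event_type != secondary_cfg.event_type:
        raise ValueError("secondary processor handles a different event type")
    if secondary_cfg.read_from_replica_on_failover:
        return ("secondary", "replica_event_store")
    return ("secondary", "main_event_store")
```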

Techniques for deploying workloads on nodes in a cloud-computing environment

Described are examples for deploying workloads in a cloud-computing environment. In an aspect, based on a desired number of workloads of a process to be executed in a cloud-computing environment and based on one or more failure probabilities, an actual number of workloads of the process to execute in the cloud-computing environment to provide a level of service can be determined and deployed. In another aspect, a standby workload can be executed as a second instance of the process without at least a portion of the separate configuration used by the multiple workloads, and based on detecting termination of one of multiple workloads, the standby workload can be configured to execute based on the separate configuration of the separate instance of the process corresponding to the one of the multiple workloads.
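
One way to turn a desired workload count and a failure probability into an actual count is sketched below. The abstract does not give a formula, so this is purely an assumed model: every workload fails independently with the same probability `p_fail`, and "level of service" means that at least `desired` workloads survive with probability `target`. The function names are hypothetical.

```python
from math import comb


def prob_at_least(n: int, k: int, p_fail: float) -> float:
    """P(at least k of n workloads survive), assuming independent failures."""
    p_ok = 1.0 - p_fail
    return sum(comb(n, i) * p_ok**i * p_fail ** (n - i) for i in range(k, n + 1))


def actual_workload_count(desired: int, p_fail: float, target: float,
                          max_extra: int = 1000) -> int:
    """Smallest n >= desired such that P(survivors >= desired) >= target."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = desired
    while prob_at_least(n, desired, p_fail) < target:
        n += 1
        if n > desired + max_extra:
            raise RuntimeError("target not reachable within max_extra workloads")
    return n
```

For instance, under this model a single desired workload that fails half the time needs two deployed instances to reach a 75% chance that at least one survives.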

SYSTEMS AND METHODS FOR HIERARCHICAL FAILOVER GROUPS
20230124430 · 2023-04-20

A logical grouping of subgroups of server clusters forms a failover super-cluster. A logical grouping of groups of servers provides high availability by, upon failure of an entire group (site), failing over an entire subgroup to a different subgroup. Yet within each subgroup local failovers continue to maintain application high availability during instances in which the site remains operational.
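
The two-level decision in this abstract (local failover while the site is up, subgroup failover when the whole site is down) can be sketched as follows. The function name, the dictionary layout, and the "first available" selection order are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple


def choose_failover_target(
    failed_server: str,
    subgroups: Dict[str, List[str]],
    site_up: Dict[str, bool],
) -> Optional[Tuple[str, str]]:
    """Return ("local", server), ("subgroup", name), or None if nothing is available."""
    # Find the subgroup (site) that hosts the failed server.
    home = next(name for name, servers in subgroups.items() if failed_server in servers)
    if site_up[home]:
        # Site still operational: fail over to another server in the same subgroup.
        for server in subgroups[home]:
            if server != failed_server:
                return ("local", server)
    # Whole site down (or no local peer): fail over to a different subgroup.
    for name in subgroups:
        if name != home and site_up[name]:
            return ("subgroup", name)
    return None
```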

Reducing recovery time of an application

Examples provided herein describe a method for reducing recovery time for an application. For example, a first physical processor of a computing device may monitor, based on a first application instance of the application running in a first mode, for failure detection of the first application instance running on a first computing device. The first physical processor may determine that the first application instance is to be changed from the first mode to a second mode. Based on the determination, the first physical processor may validate that a second application instance can run in the first mode by performing a data integrity compliance check. Responsive to validating that the second application instance can run in the first mode, the first physical processor may facilitate running of the second application instance in the first mode.

Application backup and management

A data management and storage (DMS) cluster of peer DMS nodes manages data of an application distributed across a set of machines of a compute infrastructure. A DMS node associates a set of machines with the application, and generates data fetch jobs for the set of machines for execution by multiple peer DMS nodes. The DMS node determines whether each of the data fetch jobs for the set of machines is ready for execution by the peer DMS nodes. In response to determining that each of the data fetch jobs is ready for execution, the peer DMS nodes execute the data fetch jobs to generate snapshots of the set of machines. The snapshots may be full or incremental snapshots, and collectively form a snapshot of the application.

Mirroring data to survive storage device failures

Ensuring resiliency to storage device failures in a storage system, including: determining a number of storage device failures within a particular write group that are to be tolerated by the storage system; for a plurality of datasets stored within the storage system, writing each dataset to at least a predetermined number of storage devices within the particular write group, wherein the predetermined number of storage devices is greater than the number of storage device failures within the particular write group that are to be tolerated by the storage system; and responsive to recovering from a system interruption: determining a number of readable storage devices that contain a copy of the dataset; and if the number of readable storage devices that contain a copy of the dataset is not greater than the number of failures that are to be tolerated, writing the dataset to one or more additional storage devices.
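
The copy-count rule in this abstract reduces to simple arithmetic, sketched below. The function names are hypothetical, and restoring exactly up to the write-time minimum (tolerated failures plus one) is an assumption; the abstract only requires writing to "one or more additional storage devices".

```python
def copies_required(tolerated_failures: int) -> int:
    """Minimum devices per dataset: strictly more than the failures to tolerate."""
    return tolerated_failures + 1


def additional_copies_needed(readable_copies: int, tolerated_failures: int) -> int:
    """After recovering from an interruption, how many extra devices to write to.

    No extra writes are needed while the readable copies still exceed the
    number of tolerated failures.
    """
    if readable_copies > tolerated_failures:
        return 0
    # Assumption: top back up to the write-time minimum.
    return copies_required(tolerated_failures) - readable_copies
```

With two tolerated failures, each dataset starts on at least three devices; if only two copies are readable after recovery, one more copy is written.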

Multi-synch of a primary automation device with multiple secondaries

Methods and systems for synchronizing controllers in an automation control system can involve arranging redundancy elements in an automation control system comprising a group of nodes, wherein the redundancy elements can include one or more primary controllers and a group of concurrent secondary controllers, and wherein a back-up to the primary controller can exist on any node. Such methods and systems can further involve backing up the primary controller by the one or more secondary controllers to allow the primary controller to maintain the one or more secondary controllers as a new, alternate secondary controller for load balancing or an equipment update.

METHOD, APPARATUS, AND NON-TRANSITORY COMPUTER READABLE MEDIUM FOR MIGRATING VIRTUAL MACHINES
20230062733 · 2023-03-02

A method and an apparatus for migrating a virtual machine (VM) include monitoring a status of a compute node; determining whether the compute node meets a trigger condition, wherein the trigger condition comprises a time period of lost connection of the compute node reaching a predetermined time period, or an unstable status of the compute node; and, if the compute node meets the trigger condition, transmitting a message to a control node to migrate the VM.
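
The trigger check in this abstract can be sketched as below. `NodeStatus`, its fields, and the `send` callback are hypothetical stand-ins; the abstract does not define how instability is detected or what the message to the control node contains.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NodeStatus:
    node_id: str
    seconds_since_last_contact: float
    unstable: bool  # how instability is detected is not specified in the source


def meets_trigger_condition(status: NodeStatus, lost_connection_limit_s: float) -> bool:
    """True if the connection has been lost long enough, or the node is unstable."""
    return status.unstable or status.seconds_since_last_contact >= lost_connection_limit_s


def check_and_request_migration(
    status: NodeStatus,
    lost_connection_limit_s: float,
    send: Callable[[str], None],
) -> bool:
    """Send a migration request to the control node when the trigger is met."""
    if meets_trigger_condition(status, lost_connection_limit_s):
        send("migrate VMs from " + status.node_id)  # assumed message format
        return True
    return False
```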

DATA TRANSMISSION METHOD AND ELECTRONIC CHIP OF THE MANYCORE TYPE

A method for transmitting data between functions implemented on a first electronic chip of the manycore type. The first electronic chip includes a plurality of execution cores, the execution cores being grouped in clusters, the clusters being interconnected by at least two communication systems. The data transmission method includes the steps of: implementing a first function on a first cluster; implementing a second function on a second cluster, characterised in that the second function is also implemented on a third cluster distinct from the first and second clusters; and transmitting at least one data item between the first function and the second function.