Patent classifications
G06F11/2028
Node level recovery for clustered databases
An example networked computing system for iterative node-level recovery comprises a node cluster; a database; and at least one processor configured by instructions to perform operations comprising at least: identifying a failed node among existing nodes in the node cluster; identifying and initiating a replacement node as a new node for the node cluster; accessing, at the database, a logical backup of the node cluster; retrieving logical backup data of the node cluster and identifying specific rows of backup data to be restored to the new node; restoring the specific data rows to the new node; identifying new data written by applications to the existing nodes of the node cluster during restoration of the new node; iteratively accessing supplementary backup data to identify supplementary data rows to be restored to the new node; and iteratively restoring the supplementary data rows to the new node until the new node is synchronized with the existing nodes in the node cluster.
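The catch-up loop can be pictured as a watermark that advances over backup versions until no supplementary rows remain. Below is a minimal, self-contained Python sketch of that loop; the BackupStore and Node classes and their methods are illustrative stand-ins, not the disclosed system:

    from dataclasses import dataclass, field

    @dataclass
    class BackupStore:
        rows: list = field(default_factory=list)      # (version, row) pairs

        def rows_after(self, watermark):
            return [(v, r) for v, r in self.rows if v > watermark]

    @dataclass
    class Node:
        data: list = field(default_factory=list)

        def restore(self, rows):
            self.data.extend(r for _, r in rows)

    def recover_node(backup, new_node):
        watermark = 0
        while True:
            supplementary = backup.rows_after(watermark)  # rows still missing
            if not supplementary:
                return                 # synchronized with the existing nodes
            new_node.restore(supplementary)
            watermark = max(v for v, _ in supplementary)
            # Applications may keep writing during the restore; the next
            # pass picks up any rows versioned after the new watermark.

    backup = BackupStore(rows=[(1, "row-a"), (2, "row-b")])
    node = Node()
    recover_node(backup, node)
    print(node.data)                   # ['row-a', 'row-b']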
Database system
The present disclosure relates to a method of operating a database system. The database system comprises: a database; a first compute node comprising a first database proxy; and a second compute node comprising a second database proxy. The method comprises receiving and processing, at the first database proxy, a first plurality of access requests to access the database; receiving and processing, at the second database proxy, a second plurality of access requests to access the database; monitoring for a failure event associated with the first database proxy; and, in response to the monitoring indicating a failure event, initiating a failover procedure between the first database proxy and the second database proxy. The failover procedure comprises: redirecting the first plurality of access requests to the second database proxy; and processing, at the second database proxy, the first plurality of access requests.
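A toy Python sketch of the failover procedure follows; DatabaseProxy, the health flag, and the routing function are assumptions made for illustration:

    class DatabaseProxy:
        def __init__(self, name):
            self.name = name
            self.healthy = True

        def process(self, request):
            return f"{self.name} handled {request!r}"

    def route(request, primary, secondary):
        # Monitoring detects a failure event on the primary proxy; failover
        # redirects its access requests to the secondary proxy.
        proxy = primary if primary.healthy else secondary
        return proxy.process(request)

    p1, p2 = DatabaseProxy("proxy-1"), DatabaseProxy("proxy-2")
    print(route("SELECT 1", p1, p2))   # proxy-1 handles the request
    p1.healthy = False                 # failure event detected by monitoring
    print(route("SELECT 2", p1, p2))   # redirected to proxy-2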
Systems and methods for host image transfer
Methods and systems for transferring a host image of a first machine to a second machine, such as during disaster recovery or migration, are disclosed. In one example, a first profile of a first machine of a first type is compared to a second profile of a second machine of a second type different from the first type, to which the host image is to be transferred. The first profile comprises at least one property of the first machine of the first type, and the second profile comprises at least one property of the second machine of the second type. At least one property of a host image of the first machine is conformed to at least one corresponding property of the second machine. The conformed host image is provided to the second machine via a network. The second machine is configured with at least one conformed property of the host image.
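One way to read the conformance step: every profile property that differs between the two machine types is rewritten in the host image to the target's value. A hedged Python sketch, with hypothetical property names (nic_driver, disk_ctrl):

    def conform_image(image: dict, source_profile: dict, target_profile: dict) -> dict:
        conformed = dict(image)
        for prop, target_value in target_profile.items():
            # Any property that differs between machine types (e.g. NIC
            # driver, disk controller) is rewritten to the target's value.
            if source_profile.get(prop) != target_value:
                conformed[prop] = target_value
        return conformed

    source = {"nic_driver": "e1000", "disk_ctrl": "ide"}
    target = {"nic_driver": "virtio", "disk_ctrl": "scsi"}
    image = {"hostname": "web01", "nic_driver": "e1000", "disk_ctrl": "ide"}
    print(conform_image(image, source, target))
    # {'hostname': 'web01', 'nic_driver': 'virtio', 'disk_ctrl': 'scsi'}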
Preparing containerized applications for backup using a backup services container and a backup services container-orchestration pod
A “backup services container” comprises “backup toolkits,” which include scripts for accessing containerized applications plus enabling utilities/environments for executing the scripts. The backup services container is added to Kubernetes pods comprising containerized applications without changing other pod containers. For maximum value and advantage, the backup services container is “over-equipped” with toolkits. The backup services container selects and applies a suitable backup toolkit to a containerized application to ready it for a pending backup. Interoperability with a proprietary data storage management system provides features that are not possible with third-party backup systems. Some embodiments include one or more components of the proprietary data storage management system within the illustrative backup services container. Some embodiments include one or more components of the proprietary data storage management system in a backup services pod configured in a Kubernetes node. All configurations and embodiments are suitable for cloud and/or non-cloud computing environments.
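As a rough illustration of the sidecar pattern described above, the sketch below appends a hypothetical backup-services container to a pod manifest (modeled as a plain Python dict) while leaving the existing containers untouched; the image name and TOOLKITS variable are invented for the example:

    def add_backup_services_container(pod: dict) -> dict:
        sidecar = {
            "name": "backup-services",
            "image": "example.com/backup-services:latest",  # hypothetical image
            # The container ships several toolkits; a suitable one is
            # selected at backup time to match the application container.
            "env": [{"name": "TOOLKITS", "value": "postgres,mysql,mongodb"}],
        }
        containers = pod["spec"]["containers"]
        if not any(c["name"] == "backup-services" for c in containers):
            containers.append(sidecar)   # other pod containers are untouched
        return pod

    pod = {"spec": {"containers": [{"name": "app", "image": "postgres:16"}]}}
    print(add_backup_services_container(pod))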
SERVER SYSTEM AND METHOD OF MANAGING SERVER SYSTEM
A server system includes a first server to execute a first role, another server to execute another role, a spare server, and a management layer server. The management layer server is configured to: allocate a first group of users to access the first server and another group of users to access the other server; receive status information sent by the first server and status information sent by the other server; analyze the status information to determine an operational status of the first server and an operational status of the other server; update a role of the spare server to the first role when the operational status of the first server indicates a failed state and reallocate the first group of users to the spare server; and update a role of another spare server to the other role when the operational status of the other server indicates a failed state and reallocate the other group of users to the other spare server.
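A compact Python sketch of the management-layer decision, assuming trivial dict-based representations of servers, spares, and user assignments:

    def handle_status(servers, spares, assignments, status):
        for server, state in status.items():
            if state != "failed":
                continue
            spare = spares.pop(0)                       # take an available spare
            spare["role"] = servers[server]["role"]     # update the spare's role
            # Reallocate the failed server's user group to the spare.
            assignments[spare["name"]] = assignments.pop(server)
        return assignments

    servers = {"s1": {"role": "web"}, "s2": {"role": "db"}}
    spares = [{"name": "spare-1"}, {"name": "spare-2"}]
    assignments = {"s1": ["alice", "bob"], "s2": ["carol"]}
    print(handle_status(servers, spares, assignments, {"s1": "failed", "s2": "ok"}))
    # {'s2': ['carol'], 'spare-1': ['alice', 'bob']}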
Management of microservices failover
Embodiments described herein are generally directed to intelligent management of microservices failover. In an example, responsive to an uncorrectable hardware error associated with a processing resource of a platform on which a task of a service is being performed by a primary microservice, a failover trigger is received by a failover service. A secondary microservice, operating in lockstep mode with the primary microservice, is identified by the failover service. The secondary microservice is caused by the failover service to take over performance of the task in non-lockstep mode based on failover metadata persisted by the primary microservice. The primary microservice is caused by the failover service to be taken offline.
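The flow can be sketched as follows in Python; Microservice, FailoverService, and the metadata store are illustrative stand-ins for the described components:

    class Microservice:
        def __init__(self, name):
            self.name = name

        def resume(self, task_id, metadata, lockstep):
            print(f"{self.name}: resuming {task_id} (lockstep={lockstep}) from {metadata}")

        def take_offline(self):
            print(f"{self.name}: taken offline")

    class FailoverService:
        def __init__(self, primary, secondary, metadata_store):
            self.primary, self.secondary = primary, secondary
            self.metadata_store = metadata_store

        def on_failover_trigger(self, task_id):
            # Hand the secondary the metadata the primary persisted and
            # switch it to independent (non-lockstep) execution of the task.
            metadata = self.metadata_store[task_id]
            self.secondary.resume(task_id, metadata, lockstep=False)
            self.primary.take_offline()

    svc = FailoverService(Microservice("primary"), Microservice("secondary"),
                          {"task-1": {"progress": "87%"}})
    svc.on_failover_trigger("task-1")   # e.g. after an uncorrectable HW error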
Access consistency in high-availability databases
Techniques are disclosed relating to maintaining a high availability (HA) database. In some embodiments, a computer system receives, from a plurality of host computers, a plurality of requests to access data stored in a database implemented using a plurality of clusters. In some embodiments, the computer system responds to the plurality of requests by accessing data stored in an active cluster. The computer system may then determine, based on the responding, health information for ones of the plurality of clusters, wherein the health information is generated based on real-time traffic for the database. In some embodiments, the computer system determines, based on the health information, whether to switch from accessing the active cluster to accessing a backup cluster. In some embodiments, the computer system stores, in respective clusters of the database, a changeover decision generated based on the determining.
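A minimal sketch of the changeover decision, assuming health is summarized as per-cluster error rates derived from live traffic (the threshold and field names are invented for illustration):

    def changeover_decision(health, active="active", backup="backup",
                            max_error_rate=0.05):
        # Switch to the backup cluster when the active cluster's real-time
        # error rate exceeds the tolerated threshold.
        return {
            "switch": health[active]["error_rate"] > max_error_rate,
            "from": active,
            "to": backup,
        }

    health = {
        "active": {"error_rate": 0.12, "latency_ms": 480},  # from live traffic
        "backup": {"error_rate": 0.01, "latency_ms": 35},
    }
    decision = changeover_decision(health)
    # The decision would then be stored in respective clusters of the database.
    print(decision)   # {'switch': True, 'from': 'active', 'to': 'backup'}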
SITE LOCALITY SUPPORT FOR FILE SERVICES IN A STRETCHED CLUSTER ENVIRONMENT
Resources for file services are located within the same site, thereby eliminating or reducing performance issues caused by cross-site accesses in a stretched cluster environment. A file server placement algorithm initially places file servers at a site based at least in part on host workload and affinity settings, and can perform failover to move the file servers to a different location (e.g., to a different host on the same site or to another site) in the event of a failure of the host where the file servers were initially placed. File servers may be co-located with clients at a location based on client latencies and site workload. Failover support is also provided in the event that the sites in the stretched cluster have different subnet addresses.
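One plausible shape for the placement pass, sketched in Python: prefer the least-loaded host at the affinity site, and fall back to the other site only if no same-site host is available (all names and the workload metric are assumptions):

    def place_file_server(hosts, affinity_site, exclude=()):
        candidates = [h for h in hosts
                      if h["site"] == affinity_site and h["name"] not in exclude]
        if not candidates:   # whole site unavailable: fail over cross-site
            candidates = [h for h in hosts if h["name"] not in exclude]
        return min(candidates, key=lambda h: h["workload"])["name"]

    hosts = [
        {"name": "h1", "site": "A", "workload": 0.7},
        {"name": "h2", "site": "A", "workload": 0.3},
        {"name": "h3", "site": "B", "workload": 0.1},
    ]
    print(place_file_server(hosts, "A"))                   # h2 (same site, least loaded)
    print(place_file_server(hosts, "A", exclude=("h2",)))  # h1 (failover within site A)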
Failure recovery in a scaleout system using a matrix clock
One example method includes performing failure recovery operations in a computing system using matrix clocks. Each node or process in the computing system is associated with a matrix clock. As events and transitions occur in the computing system, the matrix clocks are updated. The matrix clocks provide a chronological and causal view of the computing system and allow a recovery line to be determined in the event of a system failure.
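Matrix clocks are a standard construction, so a concrete sketch is possible: each process keeps an n-by-n matrix whose row k is its view of process k's vector clock, and rows are merged element-wise on message receipt. The recovery_line function below is one simple reading of how a consistent rollback point could be derived, not necessarily the patented method:

    class Process:
        def __init__(self, pid, n):
            self.pid, self.n = pid, n
            # M[k] is this process's current view of process k's vector clock.
            self.M = [[0] * n for _ in range(n)]

        def local_event(self):
            self.M[self.pid][self.pid] += 1

        def send(self):
            self.local_event()
            return [row[:] for row in self.M]       # ship a copy of the matrix

        def receive(self, sender_pid, W):
            self.local_event()
            for k in range(self.n):                 # element-wise max merge
                self.M[k] = [max(a, b) for a, b in zip(self.M[k], W[k])]
            self.M[self.pid] = [max(a, b) for a, b in
                                zip(self.M[self.pid], W[sender_pid])]

    def recovery_line(processes):
        # Events of process k that every process already knows about are
        # safe to roll back to; take the minimum over all views.
        n = processes[0].n
        return [min(p.M[k][k] for p in processes) for k in range(n)]

    p0, p1 = Process(0, 2), Process(1, 2)
    p1.receive(0, p0.send())                        # p0 sends a message to p1
    print(recovery_line([p0, p1]))                  # [1, 0]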
Proactive cluster compute node migration at next checkpoint of cluster upon predicted node failure
While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
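A toy sketch of the checkpoint-time loop in Python; the predict function is a trivial stand-in for a real failure predictor that would weigh hardware and software events:

    THRESHOLD = 0.8

    def predict(node_events):
        # Assumption: a real predictor would weigh ECC errors, kernel
        # messages, etc.; here we simply count recent error events.
        return min(1.0, len(node_events) / 10)

    def at_checkpoint(active_nodes, spares, events):
        # Runs at each scheduled checkpoint: any active node whose predicted
        # failure likelihood exceeds the threshold is migrated to a spare,
        # restarting from the checkpoint just taken.
        for node in list(active_nodes):
            if predict(events.get(node, [])) > THRESHOLD and spares:
                spare = spares.pop()
                active_nodes.remove(node)
                active_nodes.append(spare)
        return active_nodes

    nodes, spares = ["n1", "n2"], ["spare-1"]
    events = {"n1": ["ecc"] * 9}   # 9 recent hardware error events -> 0.9
    print(at_checkpoint(nodes, spares, events))   # ['n2', 'spare-1']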