G06F11/2023

Creating a highly available data analytics pipeline without replicas

Providing for high availability in a data analytics pipeline without replicas, including: creating a data analytics pipeline, wherein each component of the data analytics pipeline is deployed within a container; creating a failover container; detecting that a component within the data analytics pipeline has failed; and responsive to detecting that the component within the data analytics pipeline has failed, deploying the component within the data analytics pipeline that has failed in the failover container.
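
The failover step in this abstract can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Pipeline` class, the container names, and the single-failover-container limit are hypothetical and stand in for a real container runtime, which the abstract does not name.

```python
class Pipeline:
    """Toy model: one container per component plus one idle failover container."""

    def __init__(self, components):
        # Hypothetical naming scheme: each component gets its own container.
        self.placement = {name: f"container-{name}" for name in components}
        self.failover = "failover-container"
        self.failover_in_use = None  # component currently using the failover container

    def failed_components(self, health):
        """Return components whose health check reports False."""
        return [name for name in self.placement if not health.get(name, False)]

    def on_failure(self, component):
        """Redeploy the failed component into the failover container."""
        if component not in self.placement:
            raise KeyError(component)
        if self.failover_in_use is not None:
            raise RuntimeError("failover container is already occupied")
        self.placement[component] = self.failover
        self.failover_in_use = component
        return self.failover
```

No replica of any component runs ahead of time in this model; the failover container stays empty until a failure is detected.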

SYSTEM AND METHOD FOR A DISASTER RECOVERY ENVIRONMENT TIERING COMPONENT MAPPING FOR A PRIMARY SITE
20230229551 · 2023-07-20

A method for managing specialized hardware resources includes obtaining, by a disaster recovery (DR) virtual resource agent, a request for a DR environment for a set of virtual resources in a primary site, and, in response to the request: monitoring the primary site to obtain virtual workload information corresponding to the set of virtual resources, performing a workload analysis on the set of virtual resources in the primary site using the virtual workload information to obtain a virtual resource mapping of each virtual resource in the primary site to a tiered component in the DR environment, and initiating a DR environment allocation of DR virtual resources based on the virtual resource mapping.

MIGRATION OF VIRTUAL COMPUTE INSTANCES USING REMOTE DIRECT MEMORY ACCESS

A virtual compute instance is migrated between hosts using remote direct memory access (RDMA). The hosts are equipped with RDMA-enabled network interface controllers for carrying out RDMA operations between them. Upon failure of a first host and copying of page tables of the virtual compute instance to the first host's memory, a first RDMA operation is performed to transfer the page tables from the first host's memory to the second host's memory. Then, second RDMA operations are performed to transfer data pages of the virtual compute instance from the first host's memory to the second host's memory, with references to memory locations of the data pages specified in the page tables. The page tables of the virtual compute instance are reconstructed to reference memory locations of the data pages in the second host's memory and stored therein.

Failure recovery in a scaleout system using a matrix clock
11704201 · 2023-07-18

One example method includes performing failure recovery operations in a computing system using matrix clocks. Each node or process in the computing system is associated with a matrix clock. As events and transitions occur in the computing system, the matrix clocks are updated. The matrix clocks provide a chronological and causal view of the computing system and allow a recovery line to be determined in the event of system failure.
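
The abstract does not state how the matrix clocks are updated, so the following is a minimal sketch of the textbook matrix clock rules, not the patented method. The class name `MatrixClock` and its methods are hypothetical. Row `pid` is the process's own vector clock; every other row holds what this process knows about another process's vector clock.

```python
class MatrixClock:
    """Textbook matrix clock for process `pid` in a system of `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self.m = [[0] * n for _ in range(n)]

    def tick(self):
        """Record a local event."""
        self.m[self.pid][self.pid] += 1

    def send(self):
        """Record a send event and return (sender id, copy of the matrix)."""
        self.tick()
        return self.pid, [row[:] for row in self.m]

    def receive(self, sender, other):
        """Merge a received matrix, then record the receive event."""
        # Element-wise maximum keeps the freshest knowledge about every process.
        for k in range(self.n):
            for j in range(self.n):
                self.m[k][j] = max(self.m[k][j], other[k][j])
        # The receiver's own row also absorbs the sender's own row.
        for j in range(self.n):
            self.m[self.pid][j] = max(self.m[self.pid][j], other[sender][j])
        self.tick()
```

Because each process tracks what the others have seen, a recovery procedure could compare rows across processes to look for a consistent set of checkpoints; how the recovery line is actually chosen is not described in the abstract.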

Session templates

Techniques are disclosed herein for identifying, recording and restoring the state of a database session and various aspects thereof. A session template data structure is generated that includes session attribute values describing various aspects of the session that is established between a client system and a database management system (DBMS) and enables the client system to issue commands to the DBMS for execution. Based on the session attribute values, the DBMS may generate a template identifier corresponding to the session template data structure. The template identifier may be stored in association with the session state that it partially (or in whole) represents. In an embodiment, when another state of a session is captured, if the template identifier for the state is the same, then rather than storing the attribute-value pairs for the other state, the template identifier is further associated with the other state. In an embodiment, a request boundary is detected where the session is known to be at a recoverable point. If recovery of the session is needed, the session state is restored, and replay of commands starts from this point. Each command replayed is verified to produce the same session state as it produced at original execution. If the session is determined to be a safe point, then all the commands recorded for replay prior to the safe point may be deleted. In an embodiment, the template is used to set the initial state when borrowing from a session pool. The state tracking is also used to know that the session can be failed over safely during planned operation as the session is unlikely to drain by itself even when not used.
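
The template-identifier idea can be sketched as content hashing: identical attribute-value sets map to one identifier, so the pairs are stored once and later states only reference the identifier. This is an illustrative sketch, not the DBMS implementation; `TemplateStore`, the use of SHA-256 over canonical JSON, and the attribute names in the usage example are all assumptions.

```python
import hashlib
import json


class TemplateStore:
    """Stores each distinct set of session attributes once, keyed by its hash."""

    def __init__(self):
        self.templates = {}  # template identifier -> attribute-value pairs
        self.states = {}     # captured state key -> template identifier

    def capture(self, state_key, attrs):
        """Associate a captured state with the template for its attributes."""
        # Sorting the keys makes the identifier independent of attribute order.
        canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
        template_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        # Attribute pairs are stored only the first time a template is seen.
        self.templates.setdefault(template_id, dict(attrs))
        self.states[state_key] = template_id
        return template_id

    def restore(self, state_key):
        """Return the attribute-value pairs recorded for a captured state."""
        return dict(self.templates[self.states[state_key]])
```

For example, capturing `{"lang": "en", "tz": "UTC"}` for two different states yields the same identifier and a single stored template.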

Information processing apparatus, method of controlling information processing apparatus, and storage medium
11550594 · 2023-01-10

An information processing apparatus includes a storage unit configured to store at least a first boot program and a second boot program corresponding to the first boot program, a controller configured to read and execute a program, detect, in accordance with occurrence of a read error at reading of the first boot program, an address of a storage area storing a program in which the read error has occurred in the first boot program, and specify, from an address of a storage area storing the second boot program, an address corresponding to the detected address. The controller reads and executes the second boot program stored in the specified address.
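
One way to read "an address corresponding to the detected address" is that both boot programs share the same layout, so the failing offset in the first copy is reused in the second. That equal-layout reading is an assumption, and `fallback_address` and its parameters are hypothetical names; the sketch only shows the address arithmetic.

```python
def fallback_address(error_addr, first_base, second_base, size):
    """Map an address in the first boot program to the same offset in the second.

    Assumes both copies have an identical layout of `size` bytes.
    """
    offset = error_addr - first_base
    if not 0 <= offset < size:
        raise ValueError("address is outside the first boot program")
    return second_base + offset
```

With a first copy at `0x1000` and a second copy at `0x8000`, a read error at `0x1040` would resume from `0x8040` in the second copy.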

Systems and methods for enabling a highly available managed failover service

A computing system receives and stores configuration information for an application in a data store. The configuration information comprises (1) identifiers for a plurality of cells of the application that include at least a primary cell and a secondary cell, (2) a defined state for each of the plurality of cells, (3) one or more dependencies for the application, and (4) a failover workflow defining actions to take in a failover event. The computing system receives an indication, from a customer, of a change in state of the primary cell or a request to initiate the failover event. The computing system updates, in the data store, the states for corresponding cells of the plurality of cells based on the failover workflow and updates, in the data store, the one or more dependencies for the application based on the failover workflow.

Scalable Byzantine fault-tolerant protocol with partial TEE support
11546145 · 2023-01-03

A method is provided for preparing a plurality of distributed nodes to perform a protocol to establish a consensus on an order of received requests. The plurality of distributed nodes includes a plurality of active nodes, the plurality of active nodes including a primary node, each of the plurality of distributed nodes including a processor and computer readable media. The method includes preparing a set of random numbers, each being a share of an initial secret. Each share of the initial secret corresponds to one of the plurality of active nodes. The method further includes encrypting each respective share of the initial secret, binding the initial secret to a last counter value to provide a commitment and a signature for the last counter value, and generating shares of a second and of a plurality of subsequent additional secrets by iteratively applying a hash function to shares of each preceding secret.
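
The last step, deriving shares of later secrets by iterated hashing, can be sketched as follows. This covers only the hash chain: encryption of the shares, the counter commitment, and the rule for combining shares into a secret are not specified in the abstract and are left out. SHA-256, the 32-byte share size, and the function names are assumptions.

```python
import hashlib
import secrets


def initial_shares(num_active_nodes, size=32):
    """One random share of the initial secret per active node."""
    return [secrets.token_bytes(size) for _ in range(num_active_nodes)]


def next_shares(shares):
    """Shares of the next secret: hash each share of the preceding secret."""
    return [hashlib.sha256(share).digest() for share in shares]


def share_chain(first, rounds):
    """Return [shares of secret 1, shares of secret 2, ...] for `rounds` extra secrets."""
    chain = [list(first)]
    for _ in range(rounds):
        chain.append(next_shares(chain[-1]))
    return chain
```

Because every later share is a deterministic function of the first one, a node that holds its initial share can recompute its share of any subsequent secret without further distribution.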

INTERCONNECT LAYER SEND QUEUE RESERVATION SYSTEM
20220405220 · 2022-12-22

Systems and methods for an interconnect layer send queue reservation system are provided. In one example, a method involves performing a transfer of data (e.g., an NVLog) from a storage system to a secondary storage system. A send queue having a fixed number of slots is maintained within an interconnect layer interposed between a file system and a Remote Direct Memory Access (RDMA) layer of the storage system. The interconnect layer implements an application programming interface (API) for the reservation system. A deadlock situation is avoided by, during a suspendable phase of a write transaction, making a reservation for slots within the send queue via the reservation system for the transfer of data. When the reservation is successful, the write transaction proceeds with a modify phase, during which the reservation is consumed and the interconnect layer is caused to perform an RDMA operation to carry out the transfer of data.
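
The reserve-then-consume pattern described here can be sketched with simple slot counters. This is a single-threaded illustration under assumptions: `SendQueueReservations` and its method names are hypothetical, and the actual interconnect layer API is not given in the abstract.

```python
class SendQueueReservations:
    """Slot accounting for a send queue with a fixed number of slots."""

    def __init__(self, slots):
        self.free = slots    # slots available for new reservations
        self.reserved = 0    # slots reserved but not yet consumed
        self.in_flight = 0   # slots occupied by transfers in progress

    def reserve(self, count):
        """Suspendable phase: try to reserve slots; the caller may retry on False."""
        if count > self.free:
            return False
        self.free -= count
        self.reserved += count
        return True

    def consume(self, count):
        """Modify phase: turn reserved slots into in-flight transfers."""
        if count > self.reserved:
            raise ValueError("consuming more slots than were reserved")
        self.reserved -= count
        self.in_flight += count

    def complete(self, count):
        """Transfer finished: return the slots to the free pool."""
        if count > self.in_flight:
            raise ValueError("completing more slots than are in flight")
        self.in_flight -= count
        self.free += count
```

The point of the split is that a transaction can only fail to get slots while it is still suspendable; once it enters the modify phase its slots are already guaranteed, so it never blocks on the queue.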

Arbitration method and related apparatus

This application discloses an arbitration method and a related apparatus. The method includes: detecting, by the first ABS apparatus, first status information of a service module in the first DC; when determining that a communication link between the first DC and the second DC is faulty, obtaining, by the first ABS apparatus, second status information, wherein the second status information is status information that is of a service module in the second DC and that is detected by the second ABS apparatus; and arbitrating, by the first ABS apparatus, a subsequent service providing capability of the first DC based on the first status information and the second status information.