G06F11/2097

Unified storage on block containers

An illustrative unified data storage method includes providing, by a data storage system, block containers that represent a linear address space of blocks; and using, by the data storage system, the block containers to store content for a plurality of different data storage services. In certain examples, the different data storage services include at least one of a file storage service, an object storage service, or a database service.
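As a minimal sketch of the idea above, the following Python models a block container as a sparse linear address space and lets two services store content on it. The names `BlockContainer`, `store`, `load`, and the 4096-byte block size are assumptions made for illustration and do not come from the abstract.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size in bytes


@dataclass
class BlockContainer:
    """Hypothetical container exposing a linear address space of blocks."""
    num_blocks: int
    blocks: dict = field(default_factory=dict)  # block index -> bytes

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index outside the address space")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        # Pad short writes so every stored block has the same size.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")

    def read(self, index: int) -> bytes:
        self._check(index)
        # Unwritten blocks read back as zeros.
        return self.blocks.get(index, b"\0" * BLOCK_SIZE)


def store(payload: bytes) -> BlockContainer:
    """Lay a payload out across consecutive blocks of a new container."""
    count = max(1, -(-len(payload) // BLOCK_SIZE))  # ceiling division
    container = BlockContainer(count)
    for i in range(count):
        container.write(i, payload[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
    return container


def load(container: BlockContainer, length: int) -> bytes:
    """Read the first `length` bytes back from the container."""
    data = b"".join(container.read(i) for i in range(container.num_blocks))
    return data[:length]


# Two different services keep their content in the same kind of container.
file_service = {"/docs/a.txt": store(b"file content")}
object_service = {"bucket/key": store(b"x" * 5000)}
```

Both services only keep a mapping from their own names (paths, object keys) to containers; the block layout is shared.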

Recovering from system faults for replicated datasets

Recovering from system faults for replicated datasets, including: receiving, by a cloud-based storage system, a request to modify a dataset that is stored by the cloud-based storage system, wherein the dataset is synchronously replicated among a plurality of storage systems that includes the cloud-based storage system, wherein a request to modify the dataset is acknowledged as being complete when each of the plurality of storage systems has modified its copy of the dataset; generating recovery information indicating whether the request to modify the dataset has been applied on all storage systems in the plurality of storage systems synchronously replicating the dataset; and after a system fault, applying a recovery action in dependence upon the recovery information indicating whether the request to modify the dataset has been applied on all storage systems in the plurality of storage systems synchronously replicating the dataset.
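The role of the recovery information can be illustrated with a small Python sketch. It assumes an in-memory model in which each storage system holds a dictionary copy of the dataset, and it assumes that the recovery action is to roll the request forward onto the systems that missed it. The abstract does not specify the recovery action, and `ReplicatedDataset` and its methods are hypothetical names.

```python
class ReplicatedDataset:
    """Hypothetical model of a dataset synchronously replicated across systems."""

    def __init__(self, systems):
        self.copies = {name: {} for name in systems}
        # request_id -> {"key", "value", "applied": set of system names}
        self.recovery_info = {}

    def begin(self, request_id, key, value):
        """Record recovery information before any system applies the request."""
        self.recovery_info[request_id] = {
            "key": key, "value": value, "applied": set(),
        }

    def apply_on(self, request_id, system):
        """Apply the request on one system and note that it has been applied."""
        record = self.recovery_info[request_id]
        self.copies[system][record["key"]] = record["value"]
        record["applied"].add(system)

    def acknowledged(self, request_id):
        """A request is complete only when every system has applied it."""
        return self.recovery_info[request_id]["applied"] == set(self.copies)

    def recover(self):
        """Assumed recovery action: roll forward requests missing on any system."""
        repaired = []
        for request_id, record in self.recovery_info.items():
            missing = set(self.copies) - record["applied"]
            for system in sorted(missing):
                self.apply_on(request_id, system)
            if missing:
                repaired.append(request_id)
        return repaired
```

An alternative recovery action under the same bookkeeping would be to roll the request back on the systems that did apply it; the sketch picks roll-forward only to stay short.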

Replication for cyber recovery for multiple tier data

Replication of a filesystem, mount point, or share may replicate all of its data irrespective of where the data is stored, so that the data is protected irrespective of location. One method is to replicate the filesystem namespace as is while skipping the data stored outside of the appliance/machine so that replication cost and time are reasonable. The data outside of the machine, such as cloud or tape data, is protected differently. One example method includes a data protection operation configured to replicate a namespace associated with multiple data tiers. During replication, data from one of the tiers is skipped while all of the namespace metadata is replicated. The recovery restores the namespace metadata and the data that was replicated from the other tier. This may be performed in connection with cyber security, for example when replicating multi-tier data to a vault.
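The tier-skipping replication described above might look like the following Python sketch. The namespace is modeled as a dictionary from path to an entry with `tier`, `size`, and `data` fields; that layout, the tier names, and the `replicate` function are assumptions for illustration only.

```python
def replicate(namespace, skip_tier="cloud"):
    """Copy all namespace metadata, but copy data only for tiers not skipped."""
    replica = {}
    for path, entry in namespace.items():
        # Metadata is always replicated, whatever tier holds the data.
        copied = {"tier": entry["tier"], "size": entry["size"]}
        if entry["tier"] != skip_tier:
            copied["data"] = entry["data"]
        replica[path] = copied
    return replica


source = {
    "/a": {"tier": "active", "size": 3, "data": b"abc"},
    "/b": {"tier": "cloud", "size": 4, "data": b"wxyz"},
}
vault_copy = replicate(source)
```

After replication, `vault_copy` lists both paths, but only `/a` carries data; the data for `/b` is expected to be protected by a separate mechanism.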

Modifying a cloned image of replica data

Modifying a cloned image of a dataset, including: generating, based on metadata describing one or more updates to a dataset, a tracking copy of replica data on a target data repository; generating, after receiving an indication to begin accepting modifications to the tracking copy of the replica data, a cloned image of the dataset that is modifiable without modifying the tracking copy of the replica data; and responsive to a storage operation directed to the target data repository, modifying the cloned image of the dataset without modifying the tracking copy of the replica data.
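A rough Python sketch of the separation between the tracking copy and the cloned image follows. It assumes the update metadata is a list of key/value records and uses a deep copy to stand in for whatever cloning mechanism a real system would use; `TargetRepository` and its method names are hypothetical.

```python
import copy


class TargetRepository:
    """Hypothetical target repository holding a tracking copy and a clone."""

    def __init__(self, updates):
        # Build the tracking copy from metadata describing dataset updates.
        self.tracking_copy = {}
        for update in updates:
            self.tracking_copy[update["key"]] = update["value"]
        self.cloned_image = None

    def begin_accepting_modifications(self):
        """Generate a cloned image that can change independently."""
        self.cloned_image = copy.deepcopy(self.tracking_copy)

    def write(self, key, value):
        """Storage operations modify the clone, never the tracking copy."""
        if self.cloned_image is None:
            raise RuntimeError("cloned image has not been generated yet")
        self.cloned_image[key] = value
```

Because writes land only on the clone, the tracking copy can keep following the source dataset without being disturbed by work done on the target.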

Maintaining durability of a data object using unplanned delta components during transient failures

The disclosure herein describes enhancing data durability of a base component of a data object using an unplanned delta component during transient fault unavailability. A base component of a data object becoming unavailable due to a transient fault is detected. A delta component associated with the base component is generated, wherein the delta component includes unwritten storage space with an address space and a tracking bitmap including a plurality of bits associated with data blocks of the address space of the delta component. The stale LSN with which the base component is associated is assigned to the delta component and the delta component is synchronized with an active component of the data object based on the assigned stale LSN. The delta component records write I/O targeted for the base component and, based on detecting the base component becoming available, the base component is synchronized with the delta component.
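The tracking bitmap is what lets the base component catch up cheaply once it returns. The Python sketch below assumes a block-granular model in which the base component is a list of blocks; `DeltaComponent` and `resync_base` are hypothetical names, and the synchronization of the delta with an active component via the stale LSN is not modeled beyond storing the LSN.

```python
class DeltaComponent:
    """Hypothetical delta component created while the base is unavailable."""

    def __init__(self, num_blocks, stale_lsn):
        self.stale_lsn = stale_lsn          # LSN inherited from the base
        self.bitmap = [False] * num_blocks  # one bit per block of address space
        self.blocks = {}                    # starts as unwritten storage

    def write(self, block, data):
        """Record a write that was targeted at the unavailable base."""
        self.blocks[block] = data
        self.bitmap[block] = True


def resync_base(base, delta):
    """Copy only the blocks marked in the bitmap back to the base."""
    copied = [i for i, dirty in enumerate(delta.bitmap) if dirty]
    for i in copied:
        base[i] = delta.blocks[i]
    return copied
```

Only blocks whose bit is set are copied, so the resynchronization cost scales with the writes missed during the outage rather than with the size of the component.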

Failure recovery in a scaleout system using a matrix clock
11704201 · 2023-07-18

One example method includes performing failure recovery operations in a computing system using matrix clocks. Each node or process in the computing system is associated with a matrix clock. As events and transitions occur in the computing system, the matrix clocks are updated. The matrix clocks provide a chronological and causal view of the computing system and allow a recovery line to be determined in the event of system failure.
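For readers unfamiliar with matrix clocks, the Python sketch below shows the conventional textbook update rules: each process keeps an n-by-n matrix whose own row is its vector clock and whose other rows record what it knows of the other processes' clocks. This is a generic illustration, not the patented method; the `MatrixClock` class is an assumption, and determining a recovery line is not shown.

```python
class MatrixClock:
    """Textbook matrix clock for process `pid` among `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.m = [[0] * n for _ in range(n)]

    def local_event(self):
        """Tick this process's own counter."""
        self.m[self.pid][self.pid] += 1

    def send(self):
        """Tick, then return a copy of the matrix to attach to a message."""
        self.local_event()
        return [row[:] for row in self.m]

    def receive(self, sender, other):
        """Merge the sender's matrix, fold its row into ours, then tick."""
        n = len(self.m)
        for k in range(n):
            for l in range(n):
                self.m[k][l] = max(self.m[k][l], other[k][l])
        for l in range(n):
            self.m[self.pid][l] = max(self.m[self.pid][l], other[sender][l])
        self.m[self.pid][self.pid] += 1


a = MatrixClock(0, 2)
b = MatrixClock(1, 2)
message = a.send()
b.receive(0, message)
```

After the exchange, process 1 holds `[[1, 0], [1, 1]]`: row 0 records what it knows of process 0's clock, and row 1 is its own vector clock, which now causally follows the send.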

Telemetry targeted query injection for enhanced debugging in microservices architectures

An apparatus to facilitate telemetry targeted query injection for enhanced debugging in microservices architectures is disclosed. The apparatus includes one or more processors to: identify a contextual trace of a previous query recorded in collected data of a service, where microservices of the service responded to the previous query; access an interdependency flow graph representing an architecture and interaction of microservices deployed for the service; retrieve, based on the interdependency flow graph, telemetry data of the microservices corresponding to the contextual trace; identify, based on the telemetry data, an activation profile corresponding to the previous query, the activation profile detailing a response of the microservices to the previous query; compare the activation profile to a correlation profile for the previous query to detect whether an anomaly occurred in the service in response to the previous query; and recommend a modified query based on detection of the anomaly.

Multi-destination probabilistic data replication
11556562 · 2023-01-17

Disclosed embodiments provide techniques for multi-destination probabilistic data replication. Data transfer occurs over multiple time intervals. A data image to be transferred is divided into chunks. A manifest is created that lists each chunk and specifies an order, such that the data image can be reconstructed at its destination. The manifest is sent to the destination. The chunks may be sent to the destination, or to an aggregator site that then forwards the chunks to the destination. The chunks are reassembled at the destination based on information in the manifest. A probabilistic function is used to select an aggregator site based on an efficacy. The efficacy is based on a reward function that is computed for destinations for each time interval. A data transfer policy is periodically updated with a new efficacy value which is used for adjustment of the probabilistic function.
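The chunk, manifest, and probabilistic selection steps can be sketched in Python as follows. Several details are assumptions made for illustration: chunks are identified in the manifest by SHA-256 digest, the probabilistic function is a weighted random choice over efficacy values, and the policy update is an exponential moving average of the reward. The abstract does not specify any of these.

```python
import hashlib
import random


def make_manifest(image, chunk_size):
    """Split an image into chunks and list their digests in order."""
    chunks = [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]
    manifest = [hashlib.sha256(chunk).hexdigest() for chunk in chunks]
    return manifest, dict(zip(manifest, chunks))


def reassemble(manifest, received):
    """Rebuild the image at the destination in manifest order."""
    return b"".join(received[digest] for digest in manifest)


def pick_aggregator(efficacy, rng):
    """Choose an aggregator site with probability proportional to efficacy."""
    sites = list(efficacy)
    weights = [efficacy[site] for site in sites]
    return rng.choices(sites, weights=weights, k=1)[0]


def update_efficacy(efficacy, site, reward, rate=0.5):
    """Assumed policy update: blend the new reward into the old efficacy."""
    efficacy[site] = (1 - rate) * efficacy[site] + rate * reward
```

Because the manifest fixes the order, chunks may arrive by any route (directly or through an aggregator) and in any order and still be reassembled correctly.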

Method for replacing a currently operating data replication engine in a bidirectional data replication environment without application downtime and while preserving target database consistency, and by using audit trail tokens that provide a list of active transactions

An automated method is provided for use when replacing a currently operating data replication engine in a first system with a new data replication engine in the first system in a bidirectional data replication environment. The currently operating data replication engine in the first system and the new data replication engine in the first system replicate first database transactions from an audit trail of a first database in the first system to a second database in a second system. The new data replication engine in the first system generates a list of active database transactions in the first system, and sends the list of active database transactions to the new data replication engine in the second system as a first token. The new data replication engine in the second system receives the first token, fetches a transaction event from an audit trail of the second database, and replicates the fetched transaction event to the new data replication engine of the first system when the fetched transaction event does not match a transaction on the list in the first token. These steps are repeated during operation of the new data replication engine of the second system. The currently operating data replication engine in the first system is stopped from replicating first database transactions when all of the transactions on the list of active database transactions that was generated have been replicated to the second system.
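The token check performed in the second system, and the condition for stopping the old engine, reduce to two set comparisons. The Python sketch below assumes audit trail events are dictionaries with a `tx_id` field and that the token is a list of transaction identifiers; both function names are hypothetical.

```python
def events_to_forward(audit_trail, token):
    """Keep events whose transaction is not on the token's active list."""
    active = set(token)
    return [event for event in audit_trail if event["tx_id"] not in active]


def old_engine_can_stop(token, replicated_tx_ids):
    """True once every transaction on the token has reached the second system."""
    return set(token) <= set(replicated_tx_ids)
```

Events that match the token are left out of what the new engine sends back to the first system, and the old engine keeps running until every listed transaction has been replicated.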

Cell-based backup for recovering from system failures in a multi-tenant computing system
11700556 · 2023-07-11

A multi-tenant computing system provides services to a number of different tenant organizations. To address the problem of failure of portions of the system, the hardware infrastructure of the system is located at a number of different geographical locations. Each tenant is assigned to one of a set of “cells,” each cell corresponding to one of the geographical locations. Additionally, each cell has another one of the cells assigned to it as a backup cell, and the data of each cell is replicated within its assigned backup cell. At system run time, if a failure is detected within one of the cells, network redirection is used within the multi-tenant system to reflect that the backup cell for the failing cell is now handling requests for the failing cell. Upon determination that the failing cell has been repaired and is again functioning correctly, the network redirection is no longer employed, such that the (formerly) failing cell again handles its own requests.
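The redirection logic can be pictured with a short Python sketch. It assumes routing is a simple lookup from tenant to cell and from cell to backup cell; the `CellRouter` class, the cell names, and the tenant names are hypothetical, and data replication between cells is not modeled.

```python
class CellRouter:
    """Hypothetical router that redirects a failed cell's traffic to its backup."""

    def __init__(self, tenant_cell, backup_cell):
        self.tenant_cell = tenant_cell  # tenant -> home cell
        self.backup_cell = backup_cell  # cell -> assigned backup cell
        self.failed = set()

    def mark_failed(self, cell):
        self.failed.add(cell)

    def mark_repaired(self, cell):
        self.failed.discard(cell)

    def route(self, tenant):
        """Return the cell that should handle this tenant's request."""
        home = self.tenant_cell[tenant]
        return self.backup_cell[home] if home in self.failed else home


router = CellRouter(
    tenant_cell={"acme": "us-east", "globex": "eu-west"},
    backup_cell={"us-east": "eu-west", "eu-west": "us-east"},
)
```

Calling `router.mark_failed("us-east")` sends requests for `acme` to `eu-west` until `router.mark_repaired("us-east")` restores the original routing.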