G06F16/184

Methods, devices and systems for writer pre-selection in distributed data systems

A computer-implemented method may comprise receiving proposals to mutate data stored in a distributed and replicated file system coupled to a network, the distributed and replicated file system comprising a plurality of nodes, each comprising a server. A metadata service maintains and updates a replica of a namespace of the distributed and replicated file system and coordinates updates to the data by generating an ordered set of agreements corresponding to the received proposals, the ordered set of agreements specifying an order in which the nodes are to mutate data stored in data nodes and cause corresponding changes to the state of the namespace. For each agreement in the generated ordered set of agreements, a corresponding writers list may be provided that comprises an ordered list of nodes to execute the agreement and make corresponding changes to the namespace. The ordered set of agreements may then be sent to the plurality of nodes along with, for each agreement in the ordered set of agreements, the corresponding writers list or a pre-generated index thereto. Each of the plurality of nodes may be configured to execute only those agreements for which it is the first-listed node on the received writers list.
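
The writer selection rule described above can be illustrated with a minimal sketch. The `Agreement` class, its field names, and the node identifiers are hypothetical and chosen only for illustration; the abstract does not specify any data layout. Each node filters the ordered agreements and keeps only those whose writers list names it first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    seq: int        # position in the agreed global order (assumed field)
    mutation: str   # hypothetical description of the namespace change
    writers: tuple  # ordered writers list for this agreement


def agreements_to_execute(node_id, agreements):
    """Return, in agreed order, the agreements for which node_id is listed first."""
    ordered = sorted(agreements, key=lambda a: a.seq)
    return [a for a in ordered if a.writers and a.writers[0] == node_id]


agreements = [
    Agreement(2, "mkdir /b", ("node-2", "node-1")),
    Agreement(1, "create /a", ("node-1", "node-3")),
    Agreement(3, "delete /a", ("node-1", "node-2")),
]

# node-1 is first-listed for agreements 1 and 3 only.
print([a.seq for a in agreements_to_execute("node-1", agreements)])
```

In this sketch the remaining entries of a writers list are simply ignored; how a later-listed node might take over is not covered here.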

Push-based piggyback system for source-driven logical replication in a storage environment

The disclosed techniques enable push-based piggybacking of a source-driven logical replication system. Logical replication of a data set (e.g., a snapshot) from a source node to a destination node can be achieved from a source-driven system while preserving the effects of storage efficiency operations (deduplication) applied at the source node. However, if missing data extents are detected at the destination, the destination has an extent pulling problem as the destination may not have knowledge of the physical layout on the source-side and/or mechanisms for requesting extents. The techniques overcome the extent pulling problem in a source-driven replication system by introducing specific protocols for obtaining missing extents within an existing replication environment by piggybacking data pushes from the source.

FILE JOURNAL INTERFACE FOR SYNCHRONIZING CONTENT
20230101958 · 2023-03-30

In some embodiments, a system for synchronizing content with client devices receives a request from a client device to synchronize operations pertaining to content items associated with a user account registered at the system. The request can include the operations and a cursor identifying a current position of the client in a journal of revisions on the system. Based on the operations, the system generates linearized operations associated with the content items. The linearized operations can include a respective operation derived for each of the content items from one or more of the operations. The system converts each respective operation in the linearized operations to a respective revision for the journal of revisions and, based on the cursor, determines whether the respective revision conflicts with revisions in the journal. When the respective revision does not conflict with revisions in the journal, the system adds the respective revision to the journal.
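
The cursor-based conflict check can be sketched as follows. This is an illustrative model only: the `Journal` class, the rule that keeps the last operation per path when linearizing, and the rule that a revision conflicts when an unseen journal entry touches the same path are all assumptions, not details taken from the abstract.

```python
class Journal:
    """Append-only list of (path, operation) revisions (hypothetical model)."""

    def __init__(self):
        self.revisions = []

    @property
    def cursor(self):
        # A cursor is modelled as the number of revisions already seen.
        return len(self.revisions)

    def try_append(self, cursor, path, operation):
        # Assumed rule: conflict if any revision after the client's cursor
        # touches the same path.
        unseen = self.revisions[cursor:]
        if any(seen_path == path for seen_path, _ in unseen):
            return False
        self.revisions.append((path, operation))
        return True


def linearize(operations):
    """Derive one operation per path, keeping the last one (assumed rule)."""
    latest = {}
    for path, operation in operations:
        latest[path] = operation
    return list(latest.items())


journal = Journal()
journal.try_append(0, "notes.txt", "add")  # revision written by another client
client_cursor = 0                          # this client has not seen it yet
ops = [("notes.txt", "add"), ("todo.txt", "add"), ("todo.txt", "edit")]
results = {path: journal.try_append(client_cursor, path, op)
           for path, op in linearize(ops)}
print(results)
```

Here the revision for `notes.txt` is rejected because the journal already holds an entry the client has not seen, while the revision for `todo.txt` is added.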

Replication of account object metadata in a network-based database system

Provided herein are systems and methods for configuring replication of account object metadata. A system includes at least one hardware processor coupled to a memory and configured to decode a replication request received from a client device of a data provider. The replication request indicates at least a first account object, a source account, and a target account of the data provider. An object dependency of the at least first account object to at least a second account object of the data provider is determined. A replication of the at least first account object and the at least second account object is performed from the source account into the target account of the data provider.

Data replication in a data analysis system

The present disclosure relates to a method for data replication in a data analysis system (100). A source database system (101) of the data analysis system (100) comprises a transaction log (106) storing log records generated by database transactions. The method comprises, in response to determining (303) that a received log record is generated by a database transaction that rolls back a change of another database transaction whose log records are buffered in at least one record buffer, buffering (305) in a compensation buffer tag data indicative of a log record generated by the other database transaction. The tag data may be used (311) for replicating, to a target database system of the data analysis system, those buffered log records of the record buffer that are not marked as compensation records.

HETEROGENOUS REPLICATION IN A HYBRID CLOUD DATABASE
20230091577 · 2023-03-23

Aspects of the invention include splitting a first file retrieved from a first cloud computing environment of a hybrid cloud computing environment into multiple chunks. A respective chunk signature is calculated for each chunk of the multiple chunks, wherein the calculation is based at least in part on static metadata and dynamic metadata retrieved from the first file. The respective chunk signatures are compared to chunk signatures from a metadata repository to identify a duplicate second file, wherein the first file is a variant of a second file stored in a second cloud computing environment of the hybrid cloud computing environment. Either the first file or the second file is selected as a candidate for deletion. The candidate for deletion is deleted.
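
A rough sketch of chunk-signature comparison follows. The fixed chunk size, the use of SHA-256, the way metadata is folded into each signature, and the rule that a file is a duplicate when all of its signatures are already known are assumptions made for illustration; the abstract does not specify them.

```python
import hashlib


def chunk_signatures(data: bytes, metadata: dict, chunk_size: int = 4):
    """Hash each fixed-size chunk together with the file metadata (assumed scheme)."""
    meta = repr(sorted(metadata.items())).encode()
    return [
        hashlib.sha256(meta + data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def is_duplicate(signatures, repository):
    """Assumed rule: duplicate if every signature is already in the repository."""
    return bool(signatures) and all(s in repository for s in signatures)


meta = {"type": "csv"}  # hypothetical metadata field
repository = set(chunk_signatures(b"abcdefgh", meta))  # signatures of a stored file

print(is_duplicate(chunk_signatures(b"abcdefgh", meta), repository))
print(is_duplicate(chunk_signatures(b"abcdXXXX", meta), repository))
```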

Granular Data Replication

Embodiments provide granular replication of data with high efficiency. A defined metadata element embodied as a tag is assigned to each file. Tag filtering is used to direct the data to the proper location. Files with different tags can be selected for transfer. Embodiments can be used with a defined backup system file replication process, such as that present in the Data Domain File System. By using snapshots, new incoming data continues to be ingested while replication is in progress, and data consistency is maintained at the same time. This is achieved by performing operations on B+ Tree snapshots in conjunction with tag filtering on keys present in the leaf pages of these structures. This method efficiently makes a single-pass walk of a B+ Tree, in contrast with previous methods that look up files one by one via their pathnames.

DEPENDENCY AWARE PARALLEL SPLITTING OF OPERATIONS

Techniques are provided for dependency aware parallel splitting of operations. For example, a count of pending data operations being executed by a first node and replicated in parallel to a second node is tracked. A metadata operation is executed at the first node based upon the count being less than a threshold (e.g., the count being zero). A first list of affected inodes modified by the metadata operation is identified. A dependency of the metadata operation with respect to pending metadata operations replicated to the second node is determined. The metadata operation is dispatched to the second node based upon the dependency indicating that the metadata operation is independent of the pending metadata operations.
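
The two checks described above can be sketched in a few lines. The function names, the default threshold of 1 (so that only a count of zero passes), the definition of independence as sharing no inodes, and the "wait" and "queue" outcomes are assumptions for illustration.

```python
def may_execute(pending_data_ops, threshold=1):
    """The metadata operation may run once the pending count is below the threshold."""
    return pending_data_ops < threshold


def is_independent(affected_inodes, pending_metadata_ops):
    """Assumed rule: independent if no pending operation touches the same inodes."""
    affected = set(affected_inodes)
    return all(affected.isdisjoint(pending) for pending in pending_metadata_ops)


def dispatch_decision(pending_data_ops, affected_inodes, pending_metadata_ops):
    if not may_execute(pending_data_ops):
        return "wait"      # data operations are still in flight
    if is_independent(affected_inodes, pending_metadata_ops):
        return "dispatch"  # safe to send to the second node now
    return "queue"         # hold until the conflicting operations complete


pending = [{10, 11}, {20}]  # inode sets of metadata operations already in flight
print(dispatch_decision(0, [30, 31], pending))
print(dispatch_decision(0, [11, 40], pending))
print(dispatch_decision(3, [30], pending))
```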

STORAGE-DEFERRED COPYING BETWEEN DIFFERENT FILE SYSTEMS
20220335005 · 2022-10-20

Storage-deferred copying between different file systems, including: receiving a request to copy a plurality of files from a first file system to a second file system of a different type than the first file system; and virtually copying a plurality of data blocks mapped to the plurality of files in the first file system into the second file system by generating, in the second file system, a plurality of references to the plurality of data blocks.
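
Virtual copying by reference can be illustrated with a small model. The `FileSystem` class, the shared `blocks` dictionary standing in for common underlying storage, and the file-system type labels are hypothetical; the sketch only shows that the copy creates references to existing data blocks rather than new blocks.

```python
class FileSystem:
    def __init__(self, fs_type, block_store):
        self.fs_type = fs_type
        self.block_store = block_store  # shared mapping: block id -> bytes (assumed)
        self.files = {}                 # file name -> list of block ids

    def read(self, name):
        return b"".join(self.block_store[b] for b in self.files[name])


def virtual_copy(src, dst, names):
    """Copy files by generating references to the same data blocks in dst."""
    for name in names:
        dst.files[name] = list(src.files[name])  # new reference list, same blocks


blocks = {0: b"hello ", 1: b"world"}
src = FileSystem("type-a", blocks)
src.files["greeting"] = [0, 1]
dst = FileSystem("type-b", blocks)

virtual_copy(src, dst, ["greeting"])
print(dst.read("greeting"), len(blocks))
```

No block is duplicated: the block store still holds two entries after the copy.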
