G06F16/1858

Providing writable streams for external data sources
11593310 · 2023-02-28

The subject technology determines, using a connection to an external data source, a set of shards stored in the external data source, the connection to the external data source being established using an external integration, the external integration including security and configuration information. The subject technology determines a set of offsets of each shard of the set of shards. The subject technology generates a query plan indicating a degree of parallelism based at least in part on a size of the set of offsets. The subject technology, based on the set of shards and the set of offsets, performs an operation on the external data source by performing, using the connection to the external data source, a write operation from a query statement on the external data source, the external data source being different from a storage platform associated with the system.
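As a rough illustration of how a degree of parallelism might follow from the size of the offset set, the sketch below counts the (shard, offset) pairs and caps the result at a worker limit. The names `QueryPlan`, `build_plan`, and `max_workers`, and the round-robin assignment, are assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical plan: a worker count plus per-worker (shard, offset) lists."""
    degree_of_parallelism: int
    assignments: list


def build_plan(shard_offsets: dict, max_workers: int = 8) -> QueryPlan:
    # Flatten the per-shard offsets into one ordered list of (shard, offset) pairs.
    pairs = [
        (shard, offset)
        for shard, offsets in sorted(shard_offsets.items())
        for offset in offsets
    ]
    # Assumed rule: one worker per offset, at least 1, at most max_workers.
    dop = max(1, min(max_workers, len(pairs)))
    # Deal the pairs out round-robin so every pair lands on exactly one worker.
    assignments = [pairs[i::dop] for i in range(dop)]
    return QueryPlan(dop, assignments)


plan = build_plan({"shard-0": [0, 100], "shard-1": [0]}, max_workers=2)
print(plan.degree_of_parallelism)  # 2
```

With three offsets and a cap of two workers, the plan uses two workers; with a higher cap it would use three.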

PARALLEL PROCESSING DATABASE SYSTEM

A method and system for executing database queries in parallel using a shared metadata store. The metadata store may reside on a master node, where the master node is the root node in a tree. The master node may distribute query plans and query metadata to other nodes in the cluster. These additional nodes may request additional metadata from each other or from the master node as necessary.

Artwork generated to convey digital messages, and methods/apparatuses for generating such artwork

Features from a style image are adapted to express a machine-readable code. For example, grains of rice depicted in a style image may be positioned to create a pattern mimicking that of a machine-readable code. The resulting output image can then be used as a graphical component in product packaging (e.g., as a background, border, or pattern fill), while also serving to convey a product identifier to a compliant reader device (e.g., a retail point-of-sale terminal). In some embodiments, a neural network is trained to apply a particular style image to machine-readable codes. A great variety of other features and arrangements are also detailed.

Modifying a cloned image of replica data

Modifying a cloned image of a dataset, including: generating, based on metadata describing one or more updates to a dataset, a tracking copy of replica data on a target data repository; generating, after receiving an indication to begin accepting modifications to the tracking copy of the replica data, a cloned image of the dataset that is modifiable without modifying the tracking copy of the replica data; and responsive to a storage operation directed to the target data repository, modifying the cloned image of the dataset without modifying the tracking copy of the replica data.

Determining differences between two versions of a file directory tree structure

A file directory tree structure of a selected storage snapshot is dynamically divided into different portions. A plurality of the different file directory tree structure portions are analyzed in parallel to identify any changes of the selected storage snapshot from a previous storage snapshot. To analyze each of the plurality of the different file directory tree structure portions, a processor is further configured to traverse and compare a corresponding file directory tree structure portion of the selected storage snapshot with a corresponding portion of a file directory tree structure of the previous storage snapshot while at least another one of the plurality of the different file directory tree structure portions of the selected storage snapshot is being analyzed in parallel.
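The portion-wise comparison can be sketched as follows. This is a simplified illustration, not the patented method: snapshots are modeled as nested dictionaries (directories map to dicts, files map to a content hash string), the tree is split statically by top-level entry rather than dynamically, and a thread pool stands in for the parallel analysis. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_subtree(path, new, old):
    """Recursively compare one portion; return a list of (change, path)."""
    changes = []
    if isinstance(new, dict) and isinstance(old, dict):
        for name in sorted(set(new) | set(old)):
            child = f"{path}/{name}"
            if name not in old:
                changes.append(("added", child))
            elif name not in new:
                changes.append(("removed", child))
            else:
                changes.extend(diff_subtree(child, new[name], old[name]))
    elif new != old:
        # A file whose hash differs, or an entry that changed type.
        changes.append(("modified", path))
    return changes


def diff_snapshots(new_root, old_root, workers=4):
    """Split by top-level entry and compare the portions in parallel."""
    names = sorted(set(new_root) | set(old_root))

    def task(name):
        child = f"/{name}"
        if name not in old_root:
            return [("added", child)]
        if name not in new_root:
            return [("removed", child)]
        return diff_subtree(child, new_root[name], old_root[name])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the result is deterministic.
        return [change for part in pool.map(task, names) for change in part]


previous = {"a": {"x": "h1", "y": "h2"}, "b": {"z": "h3"}}
selected = {"a": {"x": "h1", "y": "h9"}, "c": {"w": "h4"}}
print(diff_snapshots(selected, previous))
# [('modified', '/a/y'), ('removed', '/b'), ('added', '/c')]
```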

CUSTOM METADATA TAG INHERITANCE BASED ON A FILESYSTEM DIRECTORY TREE OR OBJECT STORAGE BUCKET

A method and/or system for managing metadata is disclosed, including: connecting a source data storage system (DSS) that stores both data and metadata to a metadata management platform (MMP); scanning metadata records from the DSS onto the MMP; storing, in a look-up table on the MMP, metadata attributes for at least one of the group consisting of directories and buckets on the DSS; and adding updated metadata attributes to the look-up table on the MMP for each subsequent scan of the DSS.
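One way to picture tag inheritance from a directory look-up table is sketched below. The inheritance rule (tags on nearer ancestors override tags on farther ones) is an assumption for illustration, as is the function name `effective_tags`; the abstract only states that directory or bucket attributes are kept in a look-up table.

```python
import posixpath


def effective_tags(path, lookup):
    """Merge tags from the root down to `path`; nearer entries win (assumed rule)."""
    chain = []
    current = path
    while True:
        chain.append(current)
        parent = posixpath.dirname(current)
        if parent == current:  # reached "/" (or "" for a relative path)
            break
        current = parent
    tags = {}
    for entry in reversed(chain):  # root first, the path itself last
        tags.update(lookup.get(entry, {}))
    return tags


# Hypothetical look-up table keyed by directory path.
lookup = {
    "/": {"owner": "it"},
    "/proj": {"owner": "lab", "retention": "5y"},
}
print(effective_tags("/proj/a/file.txt", lookup))
# {'owner': 'lab', 'retention': '5y'}
```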

PARALLEL TRAVERSAL OF A FILESYSTEM TREE
20220398225 · 2022-12-15

A method for traversal of a filesystem tree. The method may include traversing the filesystem tree by multiple processing entities of a set of processing entities that belong to a storage system; wherein the traversing comprises multiple iterations of on-the-fly allocation of workload, associated with parallel traversing of the filesystem tree, among the multiple processing entities; wherein a current iteration of the on-the-fly allocation is (a) executed by a current group of processing entities that are currently assigned to traverse current nodes of the filesystem tree, and (b) comprises re-allocating, by the current group, a traversal task for traversing one or more child nodes of each of the current nodes of the filesystem tree, to a next group of processing entities; wherein the current group and the next group belong to the set.
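The iterative hand-off from a current group to a next group can be loosely approximated by a level-by-level traversal. In the sketch below, a thread pool stands in for the set of processing entities, the tree is a plain adjacency dictionary, and each loop iteration plays the role of one allocation round; these choices and the name `traverse` are assumptions, not details from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def traverse(tree, root, workers=4):
    """Visit nodes level by level; each round expands the current nodes in parallel."""
    visited = []
    current = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while current:
            visited.extend(current)
            # The workers handling the current nodes list each node's children...
            child_lists = pool.map(lambda node: tree.get(node, []), current)
            # ...and those children become the workload of the next round.
            current = [child for children in child_lists for child in children]
    return visited


# Hypothetical tree: each directory maps to its child paths.
tree = {"/": ["/a", "/b"], "/a": ["/a/x"], "/b": []}
print(traverse(tree, "/"))  # ['/', '/a', '/b', '/a/x']
```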

Scaling HDFS for hive
11526464 · 2022-12-13

Non-transitory computer-readable storage media storing program instructions which, when executed by one or more processors, cause the one or more processors to perform: receiving a query to the distributed file system; determining a particular partition, associated with the data warehouse system, targeted by the query; accessing a repository associated with the data warehouse system to determine whether a partition-to-cluster mapping entry for the particular partition targeted by the query exists in the repository; in response to a determination that the entry for the particular partition exists in the repository, obtaining, from the entry for the particular partition, an identifier of a particular cluster to which the particular partition is assigned by the entry for the particular partition, the particular cluster being one of a plurality of clusters of the distributed file system, each cluster of the plurality of clusters having one name node and a plurality of data nodes.

Data mesh parallel file system replication

Embodiments relate to providing a multi-cloud, multi-region, parallel file system cluster service with replication between file system storage nodes. In some embodiments, a first file system storage node of a file system storage cluster receives a request from a client device to write data to a first file system stored on the first file system storage node. In response to the request to write the data to the first file system, a plurality of servers of the first file system storage node writes, in parallel, the data to the first file system and sends instructions to a second file system storage node of the file system storage cluster for writing the data to a second file system stored on the second file system storage node.

Parallel access to data in a distributed file system

An approach to parallel access of data from a distributed filesystem provides parallel access to one or more named units (e.g., files) in the filesystem by creating multiple parallel data streams such that all the data of the desired units is partitioned over the multiple streams. In some examples, the multiple streams form multiple inputs to a parallel implementation of a computation system, such as a graph-based computation system, dataflow-based system, and/or a (e.g., relational) database system.
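A minimal sketch of partitioning named units over several streams is given below. It assumes fixed-size byte ranges dealt out round-robin, which is only one possible scheme; the function name `partition_units` and the `chunk` parameter are hypothetical and do not come from the abstract.

```python
def partition_units(unit_sizes, num_streams, chunk):
    """Split each unit into byte ranges and spread them over the streams.

    Returns one list per stream of (unit_name, start, end) ranges; together
    the ranges cover every byte of every unit exactly once.
    """
    streams = [[] for _ in range(num_streams)]
    index = 0
    for name, size in sorted(unit_sizes.items()):
        for start in range(0, size, chunk):
            end = min(start + chunk, size)
            streams[index % num_streams].append((name, start, end))
            index += 1
    return streams


streams = partition_units({"f1": 250, "f2": 100}, num_streams=2, chunk=100)
for stream in streams:
    print(stream)
# [('f1', 0, 100), ('f1', 200, 250)]
# [('f1', 100, 200), ('f2', 0, 100)]
```

Each stream could then feed one input of a parallel computation, with the consumer reading only its assigned ranges.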