G06F16/24554

Metadata query method and apparatus

In the field of data storage, a metadata query method and an apparatus are disclosed to improve metadata search efficiency. The method is applied to a linked snapshot. A metadata query request is received. A first time sequence identifier is obtained from the first snapshot volume based on the volume identifier of the first snapshot volume. Historical index information is queried based on the data block identifier and the first time sequence identifier. When the data block identifier exists in the historical index information and the first time sequence identifier falls within the corresponding query time sequence interval, the corresponding target volume identifier is obtained from the historical index information. Address metadata corresponding to the data block identifier is then obtained from the second snapshot volume indicated by the target volume identifier.
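The lookup steps above can be illustrated with a small in-memory sketch. It assumes, purely for illustration, that the historical index maps each data block identifier to a list of closed time-sequence intervals, each tagged with a target volume identifier. The names `SnapshotVolume` and `query_metadata`, and the fallback of reading the first volume when no interval matches, are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotVolume:
    """Hypothetical model of a snapshot volume in a linked-snapshot chain."""
    volume_id: str
    time_seq: int  # the volume's time sequence identifier
    address_metadata: dict = field(default_factory=dict)  # block id -> address


def query_metadata(volumes, history, volume_id, block_id):
    """Resolve address metadata for block_id as seen from snapshot volume_id.

    Assumption: `history` maps block id -> list of (start, end, target_volume_id)
    tuples, where [start, end] is a closed query time-sequence interval.
    """
    first = volumes[volume_id]
    t = first.time_seq  # first time sequence identifier
    for start, end, target_id in history.get(block_id, []):
        if start <= t <= end:
            # The block's metadata lives in another (second) snapshot volume.
            return volumes[target_id].address_metadata[block_id]
    # Hypothetical fallback: no matching interval, so read the first volume itself.
    return first.address_metadata.get(block_id)
```

For example, if block `b1` was written in a base volume and the index records the interval (3, 10) pointing at that base volume, a snapshot with time sequence 5 resolves `b1` directly to the base volume's address metadata instead of walking the snapshot chain volume by volume.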

System and method for disjunctive joins

Joining data using a disjunctive operator is described. An example computer-implemented method can include receiving a query that includes a first disjunctive predicate involving a first table and a second table. The method may also include determining a first set of rows from the first table and generating a filter from the first set of rows. The method may further include applying the filter to the second table to generate a second set of rows. Additionally, the method may include joining the first set of rows and the second set of rows using a first disjunctive operator of the first disjunctive predicate to generate a first results set.
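A minimal sketch of the filter-then-join idea, assuming a disjunctive predicate of the form `a.x = b.x OR a.y = b.y` and rows represented as Python dicts. Here the filter is a pair of exact value sets (one per disjunct); a real engine might use Bloom filters instead. The function name `disjunctive_join` and the column names `x` and `y` are hypothetical.

```python
def disjunctive_join(table_a, table_b, a_pred):
    """Join on the hypothetical predicate a.x = b.x OR a.y = b.y."""
    # First set of rows: rows of the first table passing a local predicate.
    rows_a = [r for r in table_a if a_pred(r)]
    # Filter generated from the first set: one value set per disjunct.
    xs = {r["x"] for r in rows_a}
    ys = {r["y"] for r in rows_a}
    # Second set of rows: rows of the second table that can match ANY disjunct.
    rows_b = [r for r in table_b if r["x"] in xs or r["y"] in ys]
    # Join the two reduced sets with the disjunctive (OR) operator.
    return [
        (ra, rb)
        for ra in rows_a
        for rb in rows_b
        if ra["x"] == rb["x"] or ra["y"] == rb["y"]
    ]
```

The filter must keep a row of the second table if it could satisfy any one disjunct; filtering on each disjunct separately and intersecting would wrongly drop rows that match only through the other branch of the OR.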

ONLINE QUERY EXECUTION USING A BIG DATA FRAMEWORK
20230124362 · 2023-04-20

Techniques are disclosed relating to the execution of queries in an online manner. For example, in some embodiments, a server system may include a distributed computing system that, in turn, includes a distributed storage system operable to store transaction data associated with a plurality of users, and a distributed computing engine operable to perform distributed processing jobs based on the transaction data. In various embodiments, the server system preemptively creates one or more compute sessions on the distributed computing engine, where each compute session provides access to various functionalities of the distributed computing engine. The distributed computing engine may then use these preemptively created compute sessions to execute queries (e.g., for end users of the server system) against the transaction data and return the resulting dataset to the requesting users in an online manner.
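The benefit of creating sessions ahead of time can be sketched as a pool of warm sessions. In this sketch `ComputeSession` is a hypothetical stand-in for a session on a real distributed engine, and a "query" is just a Python callable applied to in-memory transaction data; no real big data framework API is used or implied.

```python
import queue


class ComputeSession:
    """Hypothetical stand-in for a session on a distributed computing engine."""

    def __init__(self, session_id):
        self.session_id = session_id

    def run(self, query, data):
        # Placeholder execution: apply the query (a Python callable) to the data.
        return query(data)


class SessionPool:
    """Sessions are created up front, so no online query pays the startup cost."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(ComputeSession(i))

    def execute(self, query, data, timeout=None):
        # Block until a preemptively created session is free.
        session = self._idle.get(timeout=timeout)
        try:
            return session.run(query, data)
        finally:
            self._idle.put(session)  # return the session for reuse
```

Because `queue.Queue` is thread-safe, concurrent requests each take a distinct warm session and wait only when every session is busy.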

METHODS AND APPARATUS TO DETERMINE A FREQUENCY DISTRIBUTION FOR DATA IN A DATABASE
20230117942 · 2023-04-20

Disclosed examples access data from a database, the data stored across multiple registers of the database; determine (a) a maximum rank for each of the multiple registers and (b) a maximum rank count for each of the multiple registers; determine a frequency distribution based on the maximum ranks and the maximum rank counts; and generate a report including at least one of the frequency distribution, the maximum ranks, or the maximum rank counts.
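One way to read "maximum rank per register" is in the sense of HyperLogLog-style registers; the sketch below assumes that reading. Each value is hashed, the low `p` bits pick a register, and the rank is the position of the lowest set bit in the remaining bits. How the disclosure combines maximum ranks and maximum rank counts into a frequency distribution is not stated here, so `frequency_distribution` shows only one hypothetical combination: for each maximum-rank value, how many registers reached it and how many values in total hit it.

```python
import hashlib
from collections import Counter


def register_and_rank(value, p=4):
    """HyperLogLog-style mapping (an assumption): low p bits -> register."""
    h = int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:8], "big")
    register = h & ((1 << p) - 1)
    rest = h >> p
    # Rank = trailing zeros + 1; all-zero remainder gets the maximum rank.
    rank = (rest & -rest).bit_length() if rest else 64 - p + 1
    return register, rank


def register_stats(pairs):
    """Per register: maximum rank seen, and how many values reached it."""
    max_rank, max_count = {}, {}
    for reg, rank in pairs:
        current = max_rank.get(reg, 0)
        if rank > current:
            max_rank[reg], max_count[reg] = rank, 1
        elif rank == current:
            max_count[reg] += 1
    return max_rank, max_count


def frequency_distribution(max_rank, max_count):
    """Hypothetical: rank -> (registers with that max rank, total values hitting it)."""
    regs = Counter(max_rank.values())
    hits = Counter()
    for reg, rank in max_rank.items():
        hits[rank] += max_count[reg]
    return {r: (regs[r], hits[r]) for r in sorted(regs)}
```

A report could then be as simple as printing the three structures returned above.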

SCALABLE PARALLEL CONSTRUCTION OF BOUNDING VOLUME HIERARCHIES
20230118972 · 2023-04-20

One embodiment of the present invention sets forth a technique for generating a bounding volume hierarchy. The technique includes determining a first set of objects associated with a first node. The technique also includes generating a first plurality of child nodes that are associated with the first node. The technique further includes, for each object in the first set of objects, storing within the object an identifier of the corresponding child node in the first plurality of child nodes, based on a first set of partitions associated with the first set of objects.
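A sketch of the per-object assignment step, assuming the partitions are equal-width bins over object centroids along the axis of largest extent (a common BVH heuristic, swapped in here for illustration; the disclosure's actual partitioning may differ). The names `Obj`, `assign_children`, and `children_of` are hypothetical. Each object's child identifier depends only on that object and the shared bounds, so the loop is data-parallel, which is what makes this style of construction amenable to parallel hardware.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    centroid: tuple     # (x, y, z)
    child_id: int = -1  # child node identifier, stored within the object


def assign_children(objects, num_children=2):
    """Store in each object the id of its child node (equal-width binning)."""
    def extent(a):
        return max(o.centroid[a] for o in objects) - min(o.centroid[a] for o in objects)

    axis = max(range(3), key=extent)  # axis with the largest centroid spread
    lo = min(o.centroid[axis] for o in objects)
    width = extent(axis) or 1.0  # avoid division by zero for coincident centroids
    for o in objects:  # independent per object, hence parallelizable
        b = int((o.centroid[axis] - lo) / width * num_children)
        o.child_id = min(b, num_children - 1)  # clamp the maximum into the last bin
    return axis


def children_of(objects):
    """Group objects by their stored child id to form the child nodes."""
    groups = {}
    for o in objects:
        groups.setdefault(o.child_id, []).append(o)
    return groups
```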

SENSITIVITY-BASED DATABASE PROCESSING AND DISTRIBUTED STORAGE

A system and method are provided to selectively process and store tables of a relational database by calculating an overall data sensitivity score for each table based on predefined attribute rules; performing column-wise splitting of at least one of the tables into a first table and a second table based on the overall data sensitivity score of each table, thereby generating a total number of relational database tables; storing a first subset of the total number of relational database tables in a private cloud storage database in a distributed storage environment based on the overall data sensitivity scores of each of the total number of relational database tables; and storing a second subset of the total number of relational database tables in a public cloud storage database of the distributed storage environment based on the overall data sensitivity scores of each of the total number of relational database tables.
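The score, split, and place steps can be sketched over table schemas alone. The attribute rules in `RULES`, the additive scoring, the threshold of 8, and the requirement that every table has an `id` key column are all hypothetical choices for illustration, not the disclosed rules.

```python
# Hypothetical attribute rules: column name -> sensitivity weight.
RULES = {"ssn": 10, "dob": 6, "email": 4, "name": 3}


def score(columns):
    """Overall data sensitivity score (assumed additive over columns)."""
    return sum(RULES.get(c, 0) for c in columns)


def split_and_place(tables, key="id", threshold=8):
    """tables: name -> column list (each assumed to contain `key`).

    Returns (private, public) dicts of the resulting tables.
    """
    results = {}
    for name, cols in tables.items():
        if score(cols) >= threshold:
            sensitive = [c for c in cols if RULES.get(c, 0) > 0 and c != key]
            rest = [c for c in cols if c not in sensitive]  # still includes key
            # Column-wise split; both halves keep the key so rows can be rejoined.
            results[f"{name}_sensitive"] = [key] + sensitive
            results[f"{name}_rest"] = rest
        else:
            results[name] = cols
    private = {n: c for n, c in results.items() if score(c) >= threshold}
    public = {n: c for n, c in results.items() if score(c) < threshold}
    return private, public
```

Because every weighted column moves into the sensitive half, that half keeps the original table's full score and always lands in private storage, while the remainder scores zero and can go to public storage.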

SYSTEM AND METHOD FOR DATA QUALITY ASSESSMENT
20220327109 · 2022-10-13

Methods and systems for providing data assessment across related datasets, including identifying exceptional values in datasets and assessing upstream and/or downstream datasets that utilize the exceptional values. Data assessment rules use exceptional values found in a dataset and data lineage information to identify impacted data in upstream or downstream datasets.
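A minimal sketch of combining an exceptional-value rule with lineage traversal. The rule used here (a value is exceptional if missing or outside `[lo, hi]`) and the representation of lineage as a dict from each dataset to its direct downstream datasets are assumptions for illustration; impact is found with a breadth-first search, reversing the edges for the upstream direction.

```python
from collections import deque


def exceptional_values(rows, column, lo, hi):
    """Hypothetical rule: missing values or values outside [lo, hi]."""
    return [r for r in rows if r.get(column) is None or not lo <= r[column] <= hi]


def impacted(lineage, start, direction="downstream"):
    """BFS over lineage; `lineage` maps dataset -> list of downstream datasets."""
    if direction == "upstream":
        graph = {}
        for src, dsts in lineage.items():
            for d in dsts:
                graph.setdefault(d, []).append(src)  # reverse each edge
    else:
        graph = lineage
    seen, todo = set(), deque([start])
    while todo:
        node = todo.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

In use, a non-empty result from `exceptional_values` on one dataset would trigger `impacted` to list every dataset whose data may be affected.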

DYNAMIC CARDINALITY-BASED GROUP SEGMENTATION
20230066660 · 2023-03-02

Systems and methods are provided for analysis and selection of attributes used to segment data entities. The attributes used to segment data entities may be analyzed to identify segments of data entities (e.g., distinct audiences of visitors) that share values for a given subset of attributes. By intelligently selecting attributes for use in the segmentation process based on the values that they may take (e.g., the cardinality of the attributes), the selected attributes can be used to generate a reasonable or otherwise desirable number of data entity segments. Other attributes can be excluded from the segmentation process.
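A sketch of cardinality-driven attribute selection. The greedy policy here is an assumption: skip constant attributes (cardinality 1), try attributes in ascending cardinality, and keep one only while the product of chosen cardinalities (an upper bound on the number of segments) stays within `max_segments`. The function names are hypothetical.

```python
from collections import defaultdict


def cardinalities(entities, attrs):
    """Number of distinct values each attribute takes."""
    return {a: len({e.get(a) for e in entities}) for a in attrs}


def select_attributes(entities, attrs, max_segments):
    card = cardinalities(entities, attrs)
    chosen, bound = [], 1
    # Low-cardinality attributes first; constant attributes cannot segment anything.
    for a in sorted(attrs, key=lambda a: card[a]):
        if card[a] > 1 and bound * card[a] <= max_segments:
            chosen.append(a)
            bound *= card[a]  # worst-case segment count so far
    return chosen


def segment(entities, chosen):
    """Group entities that share values for the chosen attributes."""
    groups = defaultdict(list)
    for e in entities:
        groups[tuple(e.get(a) for a in chosen)].append(e)
    return dict(groups)
```

A unique identifier column has cardinality equal to the number of entities, so it is naturally excluded once it would push the segment count past the limit.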

Delta database data provisioning

A data exchange that provides historical data indexed by date is provided. The data exchange may include a raw data layer, a model data layer, a delta staging layer, a delta database, and a plurality of workspaces. The raw data layer may be a landing zone for raw data records. The model data layer may include modeled data records. The delta staging layer may be a landing zone for changed data. The changed data may correspond to changes made to the data records. The delta database may be divided into partitions. Each partition may hold data records that changed during a given time period. A plurality of data records may be continuously transferred from the raw data layer to both the model data layer and the delta staging layer. Once during each predetermined time period, the contents of the delta staging layer may replace the contents of a partition.
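The flow between layers can be sketched in memory. This sketch assumes that "modeling" is a plain copy of the record, that a record counts as changed when it differs from the current modeled version with the same `id`, and that partitions are keyed by a `date`; the class and method names are hypothetical, and the workspaces are omitted.

```python
from datetime import date


class DataExchange:
    def __init__(self):
        self.raw = []         # raw data layer (landing zone for raw records)
        self.model = {}       # model data layer: record id -> modeled record
        self.staging = []     # delta staging layer: changed records this period
        self.partitions = {}  # delta database: period date -> changed records

    def ingest(self, record):
        self.raw.append(record)
        modeled = dict(record)  # hypothetical modeling step: a plain copy
        if self.model.get(record["id"]) != modeled:
            # Only changed records reach the delta staging layer.
            self.model[record["id"]] = modeled
            self.staging.append(modeled)

    def close_period(self, period: date):
        # Once per period, staging contents replace that date's partition.
        self.partitions[period] = list(self.staging)
        self.staging.clear()
```

Replacing (rather than appending to) a partition makes the periodic load idempotent: rerunning it for the same date overwrites that partition instead of duplicating rows.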

INDEX SPLITTING IN DISTRIBUTED DATABASES
20230161747 · 2023-05-25

In a distributed database, many nodes can store copies, or instances, of the same record. If the record is split on one node, it should be split on the other nodes to maintain consistency, concurrency, and correctness of the data in the distributed database. In some distributed databases, the records are locked during the update process to ensure data integrity. Unfortunately, locking the records can increase latency, especially for larger databases. But if the records aren’t locked and a node fails as a record is being split and updated simultaneously, the split and update may not propagate throughout the distributed database, leading to a loss of data integrity. Exchanging messages about the status of record splitting and forwarding updates internally reduces the likelihood of a loss of data integrity due to a node failure.
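A single-process sketch of splitting without locks: a split-status message is delivered to every node, each node applies the split and records a forwarding entry, and any later update addressed to the old record for a key that now lives in the new record is forwarded internally. Everything here (`Node`, `split_everywhere`, records as key/value dicts split at `split_key`) is hypothetical, and it does not model networking, concurrency, or node failure.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.records = {}  # record id -> {key: value}
        self.forward = {}  # old record id -> (split key, new record id)

    def apply_split(self, rid, new_rid, split_key):
        old = self.records[rid]
        self.records[new_rid] = {k: v for k, v in old.items() if k >= split_key}
        self.records[rid] = {k: v for k, v in old.items() if k < split_key}
        self.forward[rid] = (split_key, new_rid)  # remember where keys moved

    def update(self, rid, key, value):
        # Forward internally if the key now belongs to the split-off record.
        if rid in self.forward and key >= self.forward[rid][0]:
            rid = self.forward[rid][1]
        self.records.setdefault(rid, {})[key] = value


def split_everywhere(nodes, rid, new_rid, split_key):
    """Deliver the split-status message to each node; True once all acknowledge."""
    acks = set()
    for n in nodes:
        n.apply_split(rid, new_rid, split_key)
        acks.add(n.name)
    return acks == {n.name for n in nodes}
```

The forwarding table is what lets a late update, sent by a client that still addresses the old record, land in the right place without holding a lock on the record during the split.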