G06F16/24545

Data statement chunking
11537610 · 2022-12-27

Techniques are presented for applying fine-grained client-specific rules to divide (e.g., chunk) data statements to achieve cost reduction and/or failure rate reduction associated with executing the data statements over a subject dataset. Data statements for the subject dataset are received from a client. Statement attributes derived from the data statements are processed with respect to fine-grained rules and/or other client-specific data to determine whether a data statement chunking scheme is to be applied to the data statements. If a data statement chunking scheme is to be applied, further analysis is performed to select a data statement chunking scheme. A set of data operations is generated based at least in part on the selected data statement chunking scheme. The data operations are issued for execution over the subject dataset. The results from the data operations are consolidated in accordance with the selected data statement chunking scheme and returned to the client.
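As a rough illustration of the flow in this abstract, the sketch below chunks a date-range count statement and consolidates the partial results. The rule (a per-client maximum number of days per statement), the function names, and the in-memory list standing in for the subject dataset are all assumptions made for the example; the abstract does not specify any of them.

```python
from datetime import date, timedelta

def chunk_date_range(start, end, max_days):
    """Split the half-open range [start, end) into sub-ranges of at most max_days days."""
    chunks = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=max_days), end)
        chunks.append((cursor, upper))
        cursor = upper
    return chunks

def run_chunked_count(rows, start, end, max_days_per_statement):
    """Count rows in [start, end), chunking only if the hypothetical client rule requires it."""
    span_days = (end - start).days
    if span_days <= max_days_per_statement:
        ranges = [(start, end)]          # rule says: no chunking needed
    else:
        ranges = chunk_date_range(start, end, max_days_per_statement)
    # One data operation per sub-range (here: a count over an in-memory list).
    partial_counts = [sum(1 for d in rows if lo <= d < hi) for lo, hi in ranges]
    # Consolidation for a count statement is a sum of the partial results.
    return ranges, sum(partial_counts)

rows = [date(2022, 1, 1) + timedelta(days=i) for i in range(10)]
ranges, total = run_chunked_count(rows, date(2022, 1, 1), date(2022, 1, 11), 4)
print(len(ranges), total)
```

The consolidation step depends on the statement: counts and sums can be added, whereas an average would need the partial sums and counts to be carried separately.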

Efficient data processing for schema changes

A system for processing database data includes an interface and a processor. The interface is configured to receive a query for the database data comprising a date range and a data selection criterion. The processor is configured to determine a set of fields of the database data corresponding to a most recent date of the date range; determine a subset of the set of fields of the database data specified by the data selection criterion; determine a set of transformations, where each transformation of the set of transformations corresponds to a field of the subset and a sub-range of the date range; transform the database data to determine transformed database data using the set of transformations; and select data from the transformed database data using the data selection criterion to determine a query response.
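The per-field, per-sub-range transformations can be sketched as follows. The schema history (a hypothetical `amount_cents` field replaced by `amount_usd` on an assumed change date), the field names, and the row layout are invented for illustration and are not taken from the abstract.

```python
from datetime import date

# Assumed schema history: rows before CHANGE store "amount_cents";
# rows on or after CHANGE store "amount_usd" (the most recent schema).
CHANGE = date(2022, 6, 1)

def build_transformations(fields, start, end):
    """Return (field, sub_range_start, sub_range_end, function) entries for [start, end)."""
    plan = []
    for field in fields:
        if field == "amount_usd":
            if start < CHANGE:
                # Old sub-range: convert cents to dollars.
                plan.append((field, start, min(end, CHANGE), lambda r: r["amount_cents"] / 100))
            if end > CHANGE:
                # New sub-range: the field already exists, so pass it through.
                plan.append((field, max(start, CHANGE), end, lambda r: r["amount_usd"]))
        else:
            # Fields that never changed get an identity transformation.
            plan.append((field, start, end, lambda r, f=field: r[f]))
    return plan

def run_query(rows, fields, start, end, predicate):
    """Transform rows into the most recent schema, then apply the selection criterion."""
    plan = build_transformations(fields, start, end)
    transformed = []
    for row in rows:
        if not (start <= row["day"] < end):
            continue
        out = {}
        for field, lo, hi, fn in plan:
            if lo <= row["day"] < hi:
                out[field] = fn(row)
        transformed.append(out)
    return [r for r in transformed if predicate(r)]

rows = [
    {"day": date(2022, 5, 1), "customer": "a", "amount_cents": 250},
    {"day": date(2022, 7, 1), "customer": "b", "amount_usd": 4.0},
]
print(run_query(rows, ["customer", "amount_usd"], date(2022, 1, 1), date(2023, 1, 1),
                lambda r: r["amount_usd"] > 2))
```

Only the fields named by the selection criterion are transformed, which mirrors the abstract's step of narrowing the most recent field set to a subset before building transformations.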

Estimating query cardinality

A method comprising: receiving a plurality of pairs of queries associated with a database, wherein the queries in each pair in the plurality of pairs of queries have an identical FROM clause; at a training stage, training a machine learning model on a training set comprising: (i) the plurality of pairs of queries, and (ii) labels associated with containment rates between each of the pairs of queries over the database; and at an inference stage, applying the trained machine learning model to a pair of target queries, to estimate containment rates between the target pair of queries over the database.
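The training labels mentioned above could be produced along the following lines. The sketch assumes the common definition of containment rate (the fraction of rows returned by the first query that are also returned by the second), which the abstract itself does not spell out, and it uses an in-memory SQLite table with made-up names. Because both queries share the same FROM clause, the overlap is computed by conjoining their WHERE predicates.

```python
import sqlite3

def containment_rate(conn, from_clause, where_1, where_2):
    """Fraction of Q1's rows that also satisfy Q2, for queries sharing from_clause.

    The clause strings are interpolated directly, so this is only suitable
    for trusted, illustrative input.
    """
    q1_count = conn.execute(
        f"SELECT COUNT(*) FROM {from_clause} WHERE {where_1}"
    ).fetchone()[0]
    if q1_count == 0:
        return 0.0  # convention chosen for this sketch when Q1 is empty
    both_count = conn.execute(
        f"SELECT COUNT(*) FROM {from_clause} WHERE ({where_1}) AND ({where_2})"
    ).fetchone()[0]
    return both_count / q1_count

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])
print(containment_rate(conn, "t", "x <= 4", "x >= 3"))
```

Labels computed this way would be paired with a featurized form of each query pair to train the model; the featurization and the model are outside the scope of this sketch.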

Quality-aware keyword query suggestion and evaluation

A query suggestion to expand an initial query is calculated whereby the cost of the expanded initial query is bounded in both time and quality. The user validates a subset of the top-n answers Q(G) to a query Q and provides adjusted configuration parameters. The top-n diversified δ-expansion terms Q′ are calculated from the validated subset of answers Q(G) to the query Q and are provided to an interactive user interface for selection. Answers Q′(G) for the top-n diversified δ-expansion terms Q′ are cost bounded by cost threshold δ and exploration range r specified by the user. The user selects a new term of terms Q′ and an incremental query evaluation of the new term is invoked to compute expanded query answers Q′(G) by incrementally updating the validated subset of answers Q(G), without re-evaluating an expanded query Q′ including the new term from scratch.

Computing domain cardinality estimates for optimizing database query execution

A method implements optimization of database queries by computing domain cardinality estimates. A client sends a database query to a server. The method parses the query to identify data columns. For each of the data columns, the method computes a lower bound and an upper bound of distinct data values using a pre-computed table size. The method also computes a patch factor by applying a pre-computed function to a ratio between a number of distinct data values that appear exactly once in a data sample and a number of distinct data values in the sample. Based on the patch factor, the lower bound, and the upper bound, the method computes an estimate of distinct values for each of the data columns. The method subsequently generates an execution plan for the query according to the computed estimates, executes the execution plan, and returns a result set to the client.
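A minimal sketch of the estimate for one column is shown below. The abstract does not give the bound formulas, the pre-computed function, or how the three quantities are combined, so this example assumes: the lower bound is the number of distinct values in the sample, the upper bound adds one potential new value per unsampled row, the function is the identity, and the estimate interpolates linearly between the bounds.

```python
from collections import Counter

def estimate_distinct(sample, table_size, patch_fn=lambda ratio: ratio):
    """Estimate the number of distinct values in a column from a sample.

    patch_fn stands in for the pre-computed function; identity is an assumption.
    """
    counts = Counter(sample)
    distinct_in_sample = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

    lower = distinct_in_sample                                # at least what was observed
    upper = distinct_in_sample + (table_size - len(sample))   # every unsampled row is new

    # Ratio of singletons to distinct sample values drives the patch factor.
    patch = patch_fn(singletons / distinct_in_sample) if distinct_in_sample else 0.0

    # Assumed combination: interpolate between the bounds using the patch factor.
    return lower + patch * (upper - lower)

print(estimate_distinct([1, 1, 2, 2, 3, 3], table_size=100))
print(estimate_distinct([1, 2, 3, 4], table_size=10))
```

The intuition behind the ratio is that a sample in which every value repeats suggests a small domain, while a sample made entirely of singletons suggests most unsampled rows hold values not yet seen.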

A/B testing of service-level metrics

The disclosed embodiments provide a system for performing A/B testing of service-level metrics. During operation, the system obtains service-level metrics for service calls made during an A/B test, wherein the service-level metrics are aggregated by user identifiers of multiple users. Next, the system matches the service-level metrics to treatment assignments of the users to a treatment group and a control group in the A/B test. The system then applies the A/B test to a first grouping of the service-level metrics for the treatment group and a second grouping of the service-level metrics for the control group. Finally, the system outputs a result of the A/B test for use in assessing an effect of a treatment variant in the A/B test on the service-level metrics.
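The pipeline described above can be sketched as follows. The metric (call latency), the per-user mean as the aggregation, and Welch's t statistic as the test are assumptions chosen for the example; the abstract does not name a specific metric or statistical test.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

def ab_test(calls, assignments):
    """Return (difference in means, Welch t statistic) for treatment vs. control.

    calls: iterable of (user_id, latency_ms) service-level measurements.
    assignments: dict mapping user_id to "treatment" or "control".
    """
    # Step 1: aggregate service-level metrics by user identifier.
    per_user = defaultdict(list)
    for user_id, latency_ms in calls:
        per_user[user_id].append(latency_ms)

    # Step 2: match each user's aggregate to a treatment assignment.
    groups = {"treatment": [], "control": []}
    for user_id, values in per_user.items():
        group = assignments.get(user_id)
        if group in groups:
            groups[group].append(mean(values))

    treatment, control = groups["treatment"], groups["control"]
    if len(treatment) < 2 or len(control) < 2:
        raise ValueError("need at least two users per group")

    # Step 3: apply the test to the two groupings.
    # Note: this divides by zero if both groups have zero variance.
    diff = mean(treatment) - mean(control)
    std_err = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return diff, diff / std_err

calls = [("u1", 100), ("u1", 120), ("u2", 130), ("u3", 90), ("u4", 70), ("u4", 90)]
assignments = {"u1": "treatment", "u2": "treatment", "u3": "control", "u4": "control"}
print(ab_test(calls, assignments))
```

Aggregating by user before testing keeps heavy callers from dominating the comparison, since each user contributes one observation regardless of how many service calls they made.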

Database processing method and apparatus
20220365933 · 2022-11-17

A database processing method and an apparatus are provided, and may be applied to a database system. A tree structure is used to represent a join order and is used as an input of a neural network, and different first attribute matrices are allocated to different sibling nodes in the input tree structure. This helps the neural network comprehensively learn information about the join order, obtain representation information capable of differentiating the join order from another join order, and accurately predict the cost of the join order based on the obtained representation information. Then, an optimizer selects the join order with the lowest cost for a query statement based on the costs predicted by a cost prediction module.

Method and apparatus for optimizing database transactions
11500869 · 2022-11-15

The disclosure provides a database operation method and apparatus. The method comprises: sequentially acquiring, during a process of executing a target transaction by an application server, database operation commands executed by the application server for the target transaction; executing a prediction algorithm on the database operation commands, returning predicted execution results to the application server so that the application server determines a next to-be-executed database operation command, and locally recording the database operation commands and the predicted execution data generated by executing the prediction algorithm; and when acquiring a transaction commit command regarding the target transaction, controlling a database corresponding to the application server to actually execute the target transaction according to the locally recorded database operation commands and the predicted execution data. The disclosed embodiments improve transaction execution efficiency and increase transaction throughput.
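The record-then-replay idea can be illustrated with a toy key-value store. The class name, the dict standing in for the database, and the trivial "prediction" (answer reads from locally buffered writes, otherwise from the committed state) are assumptions for this sketch and are far simpler than whatever prediction algorithm the disclosure covers.

```python
class PredictiveTransaction:
    """Buffers commands locally and touches the database only at commit time."""

    def __init__(self, database):
        self.database = database  # committed state; a dict in this sketch
        self.command_log = []     # locally recorded database operation commands
        self.predicted = {}       # predicted execution data (pending writes)

    def read(self, key):
        self.command_log.append(("read", key))
        # Predicted result: a pending local write wins over the committed value.
        if key in self.predicted:
            return self.predicted[key]
        return self.database.get(key)

    def write(self, key, value):
        self.command_log.append(("write", key, value))
        self.predicted[key] = value
        return True  # predicted outcome returned to the application server

    def commit(self):
        # Replay the recorded writes against the real database in order.
        applied = 0
        for command in self.command_log:
            if command[0] == "write":
                _, key, value = command
                self.database[key] = value
                applied += 1
        self.command_log.clear()
        self.predicted.clear()
        return applied

db = {"stock": 5}
tx = PredictiveTransaction(db)
current = tx.read("stock")
tx.write("stock", current - 1)
print(db["stock"], tx.read("stock"))  # database unchanged until commit
tx.commit()
print(db["stock"])
```

A real system would also need to validate at commit time that the predictions still hold, for example that no other transaction changed the rows that were read; this sketch omits that check.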

Flexible query execution

An approach is provided for optimizing a system resource of a cloud database. Components of a database system are divided into micro-systems according to functions and execution levels. A cluster analysis of the micro-systems and an analysis of workload patterns are performed. Different combinations of the micro-systems are generated. Images of the micro-systems and of the different combinations of the micro-systems are generated. A query is received and analyzed at a current layer specifying a set of micro-systems specified by a function of the database system. Service(s) associated with micro-system(s) specified by next layer(s) are pre-loaded and activated. A partial execution of the query is performed and a result of the query is generated at a selected edge or client side, where the selection is based on the analysis of the workload patterns.

Runtime optimization of grouping operators
11494378 · 2022-11-08

Runtime optimization of grouping operators is described. In response to receiving a query request associated with a data stream, a system estimates a resource cost for each of multiple grouping operators based on values identified during query runtime. The system selects a grouping operator from the multiple grouping operators during query runtime, based on a corresponding resource cost. The selected grouping operator enables grouping the data stream based on the query request, and outputting a response based on the grouped data stream.
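One way to picture the runtime selection is a choice between hash-based and sort-based grouping driven by the number of distinct keys observed while the query runs. The two operators, the cost formulas, the `memory_budget` parameter, and the probing of a prefix of the input are all assumptions for this sketch; the abstract does not say which operators or cost model are used.

```python
from itertools import groupby
from math import log2

def estimate_costs(n_rows, n_distinct, memory_budget):
    """Hypothetical cost model: hashing is linear until groups exceed memory."""
    if n_distinct <= memory_budget:
        hash_cost = n_rows
    else:
        hash_cost = n_rows * (1 + n_distinct / memory_budget)  # penalty for spilling
    sort_cost = n_rows * log2(max(n_rows, 2))
    return {"hash": hash_cost, "sort": sort_cost}

def group_counts(stream, key, memory_budget, probe=1000):
    """Count rows per group, picking the operator from values seen at runtime."""
    rows = list(stream)  # simplification: the stream is materialized
    observed_distinct = len({key(r) for r in rows[:probe]})
    costs = estimate_costs(len(rows), observed_distinct, memory_budget)
    chosen = min(costs, key=costs.get)

    if chosen == "hash":
        result = {}
        for row in rows:
            result[key(row)] = result.get(key(row), 0) + 1
    else:
        ordered = sorted(rows, key=key)
        result = {k: sum(1 for _ in group) for k, group in groupby(ordered, key=key)}
    return chosen, result

print(group_counts("aabbccdd", key=lambda ch: ch, memory_budget=100))
print(group_counts("aabbccdd", key=lambda ch: ch, memory_budget=1))
```

Both operators return the same grouped result; only the resource profile differs, which is why the choice can safely be deferred until runtime values are available.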