G06F16/24537

Join optimization using multi-index augmented nested loop join method

A system and method for efficient query processing using multiple indices in a join operation are described. In one embodiment, a join query including a join operation on a first table and a second table and including a first condition and a second condition is received, wherein the first condition is based on a first index of the second table, and the second condition is based on a second index of the second table; a first result set is determined by index scanning the second table using the first index as an index key; a second result set is determined by index scanning the second table using the second index as the index key; a third result set is determined by applying a set operation to the first result set and the second result set; and the third result set is provided in response to the join query.
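
The abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory dictionaries standing in for indexes, the column names `a` and `b`, and the function names `build_index` and `multi_index_join` are all assumptions made for illustration. Each outer row probes both indexes of the inner table, and the two row-id sets are combined with a set operation (union for an OR of the two conditions, intersection for an AND).

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the set of row positions holding it."""
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        index[row[column]].add(pos)
    return index


def multi_index_join(outer, inner, idx_a, idx_b, combine=set.union):
    """Nested loop join that probes two inner-table indexes per outer row."""
    result = []
    for o in outer:
        first = idx_a.get(o["a"], set())    # first result set (first index)
        second = idx_b.get(o["b"], set())   # second result set (second index)
        for pos in sorted(combine(first, second)):  # third result set
            result.append((o, inner[pos]))
    return result


# Hypothetical sample data.
orders = [{"id": 1, "a": 10, "b": "x"}, {"id": 2, "a": 30, "b": "z"}]
items = [
    {"a": 10, "b": "y"},
    {"a": 20, "b": "x"},
    {"a": 10, "b": "x"},
    {"a": 30, "b": "q"},
]
idx_a = build_index(items, "a")
idx_b = build_index(items, "b")

or_join = multi_index_join(orders, items, idx_a, idx_b, set.union)
and_join = multi_index_join(orders, items, idx_a, idx_b, set.intersection)
```

Combining row-id sets before fetching rows means each matching inner row is fetched once per outer row, even when it satisfies both conditions.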

Zero copy optimization for select * queries
11609909 · 2023-03-21

A computer-implemented method includes receiving a query specifying an operation to perform on a first table comprising a plurality of stored data blocks. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing the respective reference count associated with the data block in the first table and appending, by data processing hardware, into metadata of the second table, a reference to the corresponding data block copied into the second table.
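
A small Python sketch may make the reference-counting idea concrete. The classes `BlockStore` and `Table` and the function `zero_copy_select_star` are hypothetical names, and the in-memory dictionaries are an assumption; the abstract does not say how blocks or metadata are stored.

```python
class BlockStore:
    """Holds data blocks and, per block, how many tables reference it."""

    def __init__(self):
        self.blocks = {}     # block_id -> payload
        self.refcount = {}   # block_id -> number of referencing tables

    def add_block(self, block_id, payload):
        self.blocks[block_id] = payload
        self.refcount[block_id] = 0


class Table:
    """A table is only metadata: an ordered list of block references."""

    def __init__(self, store):
        self.store = store
        self.block_refs = []

    def append_ref(self, block_id):
        self.store.refcount[block_id] += 1   # one more table references it
        self.block_refs.append(block_id)     # record the reference in metadata


def zero_copy_select_star(src, store):
    """Build the destination of a full-table copy without copying any block."""
    dst = Table(store)
    for block_id in src.block_refs:
        dst.append_ref(block_id)
    return dst


store = BlockStore()
store.add_block("b0", [1, 2, 3])
store.add_block("b1", [4, 5])
first = Table(store)
first.append_ref("b0")
first.append_ref("b1")
second = zero_copy_select_star(first, store)
```

After the call, both tables list the same block ids, each reference count is 2, and the store still holds only two blocks.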

DATA PROCESSING METHOD AND DATA PROCESSING APPARATUS
20230082563 · 2023-03-16

A data processing method includes: receiving a data processing request carrying a query statement; converting the query statement into a corresponding relational algebra tree based on the data processing request; determining an operation type corresponding to the query statement based on the relational algebra tree; delivering the query statement to a first database in response to the operation type being a first type; and completing the data processing request in the first database based on the query statement.

Elimination of query fragment duplication in complex database queries

A database engine includes one or more computing devices, each having one or more processors and memory. The memory stores programs configured for execution by the processors. The database engine receives a database query from a client, and parses the database query to build a query operator tree. The query operator tree includes a plurality of query operators. The database engine performs one or more optimization passes on the query operator tree, including a deduplication optimization pass, to form an optimized execution plan. The deduplication optimization pass includes determining that a first query operator is equivalent to a second query operator during a traversal of the query operator tree, and replacing the second query operator with a link to reuse results from the first query operator. The database engine executes the optimized execution plan to retrieve a result set from the database and returns the result set to the client.
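
The deduplication pass can be sketched as follows. This is an illustrative Python model only: the `Op` and `Link` classes, the structural `signature` used as the equivalence test, and the post-order traversal are assumptions, since the abstract does not specify how operator equivalence is determined.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    kind: str
    arg: str = ""
    children: list = field(default_factory=list)


@dataclass
class Link:
    """Placeholder that reuses the results of an earlier, equivalent operator."""
    target: Op


def signature(node):
    """Structural fingerprint; a Link has the fingerprint of its target."""
    if isinstance(node, Link):
        return signature(node.target)
    return (node.kind, node.arg, tuple(signature(c) for c in node.children))


def deduplicate(node, seen=None):
    """Post-order pass: replace repeated subtrees with Links to the first one."""
    seen = {} if seen is None else seen
    if isinstance(node, Link):
        return node
    node.children = [deduplicate(c, seen) for c in node.children]
    sig = signature(node)
    if sig in seen:
        return Link(seen[sig])
    seen[sig] = node
    return node


# Hypothetical plan: the same filtered scan appears on both sides of a join.
plan = Op("join", "", [
    Op("filter", "x > 1", [Op("scan", "t")]),
    Op("filter", "x > 1", [Op("scan", "t")]),
])
optimized = deduplicate(plan)
```

In the optimized tree, the second child of the join is a `Link` whose target is the first child, so the filtered scan would be evaluated once.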

System and method for disjunctive joins

Joining data using a disjunctive operator is described. An example computer-implemented method can include receiving a query that includes a first disjunctive predicate involving a first table and a second table. The method may also include determining a first set of rows from the first table and generating a filter from the first set of rows. The method may further include applying the filter to the second table to generate a second set of rows. Additionally, the method may include joining the first set of rows and the second set of rows using a first disjunctive operator of the first disjunctive predicate to generate a first results set.
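
As a rough illustration, the steps above might look like the following Python sketch for a predicate of the form `t1.a = t2.a OR t1.b = t2.b`. The predicate shape, the column names, and the use of exact value sets as the filter are assumptions; a real system could use a different filter structure, such as a Bloom filter.

```python
def disjunctive_join(left, right):
    """Join on (l.a == r.a OR l.b == r.b) using a filter built from `left`."""
    # Generate a filter from the first set of rows.
    a_vals = {row["a"] for row in left}
    b_vals = {row["b"] for row in left}

    # Apply the filter to the second table to get the second set of rows.
    candidates = [r for r in right if r["a"] in a_vals or r["b"] in b_vals]

    # Join the two row sets with the disjunctive operator.
    return [
        (l, r)
        for l in left
        for r in candidates
        if l["a"] == r["a"] or l["b"] == r["b"]
    ]


# Hypothetical sample data.
left_rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
right_rows = [{"a": 1, "b": "q"}, {"a": 9, "b": "y"}, {"a": 7, "b": "z"}]
joined = disjunctive_join(left_rows, right_rows)
```

The filter discards the third right-hand row before the join, so the final comparison runs only against rows that can match at least one branch of the OR.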

Federated query optimization
11636108 · 2023-04-25

A method builds a regression model for predicting processing times for federated queries using a variety of data sources. The method includes obtaining federated queries (e.g., from benchmarks) and generating a plurality of federated query plans for each federated query. Each federated query plan corresponds to executing a respective federated query using a respective data source as the federation engine. The method includes forming feature vectors for each federated query plan based on cost estimations for executing the respective federated query plan and cost estimations for data transfer. The method further includes training a regression model, using the feature vectors for the plurality of federated query plans, to predict runtimes for executing federated queries using the variety of data sources as a federation engine. Some implementations use the trained regression model to determine a suitable federation engine for a given federated query.

Routing SQL statements to elastic compute nodes using workload class

Technologies are described for routing structured query language (SQL) statements to elastic compute nodes (ECNs) using workload classes within a distributed database environment. The elastic compute nodes do not store persistent database tables. For example, a SQL statement can be received for execution within the distributed database environment. A workload class can be identified that matches properties of the SQL statement. Based on the workload class, a routing location hint can be obtained that identifies a set of elastic compute nodes. The SQL statement can then be routed to one of the identified elastic compute nodes for execution. Execution of the SQL statement at the elastic compute node can involve retrieving database data from other nodes which store persistent database tables.
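
A minimal Python sketch of the routing flow follows. Everything named here is hypothetical: the workload class definitions, the statement properties `app` and `user`, the node names, the round-robin choice among hinted nodes, and the `"coordinator"` fallback are assumptions that the abstract does not specify.

```python
# Hypothetical workload classes: match rules plus a routing location hint.
WORKLOAD_CLASSES = [
    {"name": "reporting", "match": {"app": "bi_tool"}, "hint": ["ecn1", "ecn2"]},
    {"name": "etl", "match": {"user": "loader"}, "hint": ["ecn3"]},
]


def route_statement(properties, request_no):
    """Return the node that should execute a statement with these properties."""
    for workload_class in WORKLOAD_CLASSES:
        rules = workload_class["match"]
        if all(properties.get(key) == value for key, value in rules.items()):
            nodes = workload_class["hint"]
            # Assumed policy: rotate through the hinted elastic compute nodes.
            return nodes[request_no % len(nodes)]
    # Assumed fallback when no workload class matches.
    return "coordinator"


first = route_statement({"app": "bi_tool", "user": "ann"}, 0)
second = route_statement({"app": "bi_tool", "user": "ann"}, 1)
other = route_statement({"app": "cli", "user": "bob"}, 0)
```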

LOW-LATENCY DATABASE SYSTEM

A database system comprised of a decoupled compute layer and storage layer is implemented to store, build, and maintain a canonical dataset, a temporary buffer, and an edits dataset. The canonical dataset is a set of batch updated data. The data is appended in chunks to the canonical dataset such that the canonical dataset becomes a historical dataset over time. The buffer is a write ahead log that contains the most recent chunks of data and provides atomicity and durability for the database system. The edits dataset is the set of data that contains edits such as cell mutations, row appends and/or row deletions. The database system enables users to make cell or row-level edits to tables and observe those edits in analytical systems or downstream builds with minimal latency.
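
One way to picture how the three datasets combine on read is the Python sketch below. It is an assumption-laden model, not the described system: rows are keyed by a hypothetical `id` field, chunks are plain lists, and the edit record shapes (`mutate`, `append`, `delete`) are invented for illustration.

```python
def read_view(canonical, buffer, edits):
    """Merge canonical chunks, buffered chunks, and edits into one view."""
    rows = {}
    # Older canonical chunks first, then the most recent chunks in the buffer.
    for chunk in canonical + buffer:
        for row in chunk:
            rows[row["id"]] = dict(row)
    # Apply cell mutations, row appends, and row deletions in order.
    for edit in edits:
        if edit["op"] == "mutate" and edit["id"] in rows:
            rows[edit["id"]][edit["column"]] = edit["value"]
        elif edit["op"] == "append":
            rows[edit["row"]["id"]] = dict(edit["row"])
        elif edit["op"] == "delete":
            rows.pop(edit["id"], None)
    return rows


# Hypothetical sample data.
canonical = [[{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]]
buffer = [[{"id": 3, "v": "c"}]]
edits = [
    {"op": "mutate", "id": 1, "column": "v", "value": "A"},
    {"op": "delete", "id": 2},
    {"op": "append", "row": {"id": 4, "v": "d"}},
]
view = read_view(canonical, buffer, edits)
```

Because edits are applied at read time in this sketch, a user edit is visible without waiting for the next batch update of the canonical dataset.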

Pipelined Hardware-Implemented Database Query Processing
20230120492 · 2023-04-20

An apparatus for applying database commands to one or more database tables includes a memory and a hardware-implemented pipeline. The hardware-implemented pipeline includes one or more table-processing circuits, and is configured to receive a stream of input records drawn from the one or more database tables, to parse first records, from among the input records, into a key and one or more fields other than the key, to store at least parts of the first records in the memory so as to be accessible using the key, and to apply a database command by matching at least parts of second records from among the input records to the at least parts of the first records stored in the memory, in accordance with the key.
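
The data flow of such a pipeline can be modeled in software, as in the Python sketch below. This is only a behavioral illustration: the apparatus is hardware, and the tagged input stream (`"first"` / `"second"`), the dictionary used as memory, and the function names are assumptions. The command modeled is a key-equality join.

```python
def parse(record, key_field):
    """Split a record into its key and the remaining fields."""
    fields = {name: value for name, value in record.items() if name != key_field}
    return record[key_field], fields


def pipeline_join(stream, key_field):
    """Store first records by key, then match second records against them."""
    memory = {}
    for source, record in stream:
        key, fields = parse(record, key_field)
        if source == "first":
            memory[key] = fields          # accessible later using the key
        elif key in memory:
            yield {key_field: key, **memory[key], **fields}


# Hypothetical input stream drawn from two tables.
stream = [
    ("first", {"id": 1, "name": "ann"}),
    ("first", {"id": 2, "name": "bob"}),
    ("second", {"id": 2, "total": 5}),
    ("second", {"id": 3, "total": 7}),
]
joined = list(pipeline_join(stream, "id"))
```

Records flow through once: first records populate the memory, and each second record is matched with a single keyed lookup.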

Data pattern analysis optimizer, and method of data pattern analysis optimization processing

An embodiment of a data pattern analysis optimizer includes a time sequence data memory, an estimator, a grouping unit, and a time sequence pattern extractor. The time sequence data memory stores a plurality of time sequence data, each made up of items in time order. The estimator estimates the upper limit of the total number of types of time sequence patterns present in the time sequence data at a rate higher than a minimum support level, based on a respective rate of presence of each item, wherein each of the time sequence patterns present in the time sequence data is a predefined number of items. If the estimated upper limit exceeds an upper limit of the number of types of time sequence patterns set as a maximum processing load for a computer, the grouping unit groups the plurality of time sequence data into sub-groups, based on a group of items having the increased number of items, and instructs the estimator to perform the estimation. If the estimated upper limit does not exceed the upper limit of the number of time sequence patterns, the time sequence pattern extractor instructs the computer to extract the time sequence patterns for each of the sub-groups.