Patent classifications
G06F16/24554
TECHNIQUES FOR UNIFYING ETL FILTER OPERATORS
Techniques are provided for unifying filter operators in extract, transform, load (ETL) plans. Such a technique includes a method that may include receiving, by a computer system, an ETL plan including a split operator and a plurality of filter operators. The method may include identifying, by the computer system, that the plurality of filter operators are configured to act on data output by the split operator in the ETL plan. The method may include generating, by the computer system, a unified filter operator using the plurality of filter operators. The method may include generating, by the computer system, an updated ETL plan comprising the unified filter operator providing filtered data to the split operator. The method may also include storing the updated ETL plan in a data store.
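The plan rewrite described above can be sketched in a few lines of Python. This is a minimal illustration, not the claimed method: it assumes the unified filter is the logical OR of the branch filters and that the branch filters are still applied after the split, neither of which the abstract states. The names `unify_filters`, `run_original`, and `run_updated` are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def unify_filters(predicates: List[Predicate]) -> Predicate:
    """Assumed unification rule: keep a row if any branch filter would keep it."""
    return lambda row: any(p(row) for p in predicates)


def run_original(rows: List[Row], predicates: List[Predicate]) -> List[List[Row]]:
    # Original plan: the split copies every row to every branch,
    # and each branch then applies its own filter.
    return [[r for r in rows if p(r)] for p in predicates]


def run_updated(rows: List[Row], predicates: List[Predicate]) -> List[List[Row]]:
    # Updated plan: the unified filter runs first and feeds the split,
    # so rows that no branch wants are never copied.
    unified = unify_filters(predicates)
    prefiltered = [r for r in rows if unified(r)]
    return [[r for r in prefiltered if p(r)] for p in predicates]


rows = [
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 50},
    {"region": "APAC", "amount": 1},
]
branch_filters = [
    lambda r: r["region"] == "EU",
    lambda r: r["amount"] > 10,
]
```

Under these assumptions both plans produce the same per-branch output, while the updated plan passes only two of the three example rows through the split.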
Powering Scalable Data Warehousing with Robust Query Performance
The present disclosure describes an analytical data management system (ADMS) that serves critical dashboards, applications, and internal users. The ADMS achieves high scalability, and high availability through replication and failover, while handling high user query load and large data volumes. The ADMS provides continuous ingestion and high-performance querying with tunable freshness. It further advances the idea of disaggregation by decoupling its architectural components: ingestion, indexing, and querying. As a result, the impact of a slowdown in indexing on query performance is minimized by either trading off data freshness or incurring higher costs.
Optimizing limit queries over analytical functions
A relational database management system (RDBMS) optimizes limit queries over analytical functions, wherein the limit queries include an output clause comprising a LIMIT, TOP or SAMPLE clause with an expression specifying a limit that is a number K or a percentage α%. The optimizations of the limit queries include: (1) static compile-time optimizations, and (2) dynamic run-time optimizations, based on semantic properties of “granularity” and “input-to-output cardinality” for the analytical functions.
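One way to picture a compile-time optimization of this kind is pushing the limit below a function that emits exactly one output row per input row. The sketch below assumes such a one-to-one function and a limit K with no ordering requirement; the names `score`, `limit_after_function`, and `limit_before_function` are hypothetical, and the abstract does not say that this particular rewrite is the one used.

```python
from itertools import islice

# Counts how many times the (assumed expensive) function is invoked.
calls = {"count": 0}


def score(row):
    """Stand-in for a row-level analytical function: one output row per input row."""
    calls["count"] += 1
    return {"id": row["id"], "score": row["value"] * 2}


def limit_after_function(rows, fn, k):
    # Unoptimized plan: evaluate the function on every row, then keep K rows.
    return [fn(r) for r in rows][:k]


def limit_before_function(rows, fn, k):
    # Rewritten plan: keep K input rows first, then evaluate the function.
    # Valid here only because fn maps each input row to exactly one output row.
    return [fn(r) for r in islice(rows, k)]


rows = [{"id": i, "value": i * 10} for i in range(1000)]
```

With K = 5, both plans return the same five rows, but the rewritten plan calls the function 5 times instead of 1000.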
Tracking intermediate changes in database data
Systems, methods, and devices for tracking a series of changes to database data are disclosed. A method includes executing a transaction to modify data in a micro-partition of a table of a database by generating a new micro-partition that embodies the transaction. The method includes associating transaction data with the new micro-partition, wherein the transaction data comprises a timestamp when the transaction was fully executed, and further includes associating modification data with the new micro-partition that comprises an indication of one or more rows of the table that were modified by the transaction. The method includes joining the transaction data with the modification data to generate joined data and querying the joined data to determine a listing of intermediate modifications made to the table between a first timestamp and a second timestamp.
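The join-then-query step can be illustrated with a small in-memory sketch. It assumes integer timestamps, an inclusive time range, and a micro-partition identifier as the join key; the class and function names are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TransactionData:
    partition_id: str   # new micro-partition created by the transaction
    committed_at: int   # timestamp when the transaction was fully executed


@dataclass(frozen=True)
class ModificationData:
    partition_id: str
    modified_rows: Tuple[int, ...]  # rows of the table changed by the transaction


def intermediate_changes(
    transactions: List[TransactionData],
    modifications: List[ModificationData],
    start: int,
    end: int,
) -> List[Tuple[int, str, Tuple[int, ...]]]:
    """Join both metadata sets on partition_id, then keep changes in [start, end]."""
    rows_by_partition = {m.partition_id: m.modified_rows for m in modifications}
    joined = [
        (t.committed_at, t.partition_id, rows_by_partition[t.partition_id])
        for t in transactions
        if t.partition_id in rows_by_partition
    ]
    return sorted(j for j in joined if start <= j[0] <= end)


txs = [
    TransactionData("mp2", 100),
    TransactionData("mp3", 200),
    TransactionData("mp4", 300),
]
mods = [
    ModificationData("mp2", (1,)),
    ModificationData("mp3", (1, 7)),
    ModificationData("mp4", (2,)),
]
changes = intermediate_changes(txs, mods, 150, 300)
```

Here `changes` lists the modifications committed at timestamps 200 and 300 in order, which is the series of intermediate changes between the two timestamps rather than only the net difference.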
Optimizing database performance through intelligent data partitioning orchestration
Intelligent analysis and prognosis-based data partitioning orchestration for optimizing database performance. Partitioning is not limited to partitioning keys established solely based on the columns of the table being partitioned; rather, analysis is undertaken on dependent tables, and the past behavior of fundamental data elements in the database is assessed as a means for determining the optimal partitioning scheme. Thus, relevant information and values in the table being partitioned, as well as in dependent tables and the fundamental data elements, are used to determine how likely each record/row in the table is to be subjected to a data manipulation operation. The likelihood of a data manipulation operation being performed on each record serves as the basis for assigning the record to one of a plurality of partitions.
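The assignment step can be sketched as scoring each record and bucketing the score. Everything specific below is an assumption made for illustration: the scoring rule (an open order in a dependent table, otherwise the record's past update rate), the boundary values, and the names `dml_likelihood` and `assign_partition`.

```python
from bisect import bisect_right
from typing import Dict, List, Set

# Assumed likelihood boundaries: [0, 0.1) -> partition 0, [0.1, 0.5) -> 1, [0.5, 1] -> 2.
BOUNDARIES: List[float] = [0.1, 0.5]


def dml_likelihood(
    record: Dict[str, object],
    open_order_ids: Set[str],
    update_counts: Dict[int, int],
    observed_days: int,
) -> float:
    """Hypothetical score: how likely the record is to be modified again."""
    # Signal from a dependent table: rows tied to an open order are likely to change.
    if record["order_id"] in open_order_ids:
        return 0.9
    # Signal from past behavior: fraction of observed days with an update.
    return min(1.0, update_counts.get(record["id"], 0) / observed_days)


def assign_partition(likelihood: float, boundaries: List[float]) -> int:
    """Map a likelihood to a partition index using sorted boundaries."""
    return bisect_right(boundaries, likelihood)


records = [
    {"id": 1, "order_id": "o1"},
    {"id": 2, "order_id": "o2"},
    {"id": 3, "order_id": "o3"},
]
open_orders = {"o1"}
updates = {2: 6, 3: 0}

placement = {
    r["id"]: assign_partition(dml_likelihood(r, open_orders, updates, 30), BOUNDARIES)
    for r in records
}
```

In this example the record tied to an open order lands in the most volatile partition, the occasionally updated record in the middle one, and the untouched record in the coldest one.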
MULTI-PARTITIONING DATA FOR COMBINATION OPERATIONS
Systems and methods are disclosed for processing and executing queries against one or more datasets. As part of processing the query, the system determines whether the query is susceptible to a significantly imbalanced partition. In the event the query is susceptible to an imbalanced partition, the system monitors the query and determines whether to perform a multi-partitioning determination to avoid a significantly imbalanced partition.
Complex query evaluation using sideways information passing
A program stored on a non-transitory computer-readable storage medium executes a method of evaluating a query over a graph. Decomposition instructions decompose the query into a plurality of subqueries. Evaluation instructions evaluate a subquery of the plurality of subqueries and generate a substitution multiset representing a result of the evaluation of the subquery. Filtration instructions or expansion instructions may operate upon the generated substitution multiset before passing the substitution multiset to a next subquery to be evaluated. The filtration instructions identify one or more mappings in the substitution multiset that cannot be safely passed to the next subquery and delete the identified one or more mappings from the substitution multiset. The expansion instructions determine, in a case where the subquery is operated upon by a non-distributive query operator, an expansion of the substitution multiset based at least on adding one or more new substitutions to the substitution multiset.
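The basic idea of passing one subquery's substitutions sideways into the next can be sketched as follows. This is an illustration under stated assumptions only: the graph is a set of (subject, predicate, object) triples, each subquery is a single triple pattern with two variables, and a `Counter` stands in for the substitution multiset. It does not implement the filtration or expansion instructions, and `eval_pattern` is a hypothetical helper.

```python
from collections import Counter

TRIPLES = {
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("dave", "worksAt", "initech"),
}


def eval_pattern(triples, predicate, subj_var, obj_var, incoming=None):
    """Evaluate the pattern (?subj_var predicate ?obj_var).

    `incoming` is the substitution multiset passed from the previous subquery.
    Each substitution is stored as a sorted tuple of (variable, value) pairs,
    and the Counter value is its multiplicity.
    """
    if incoming is None:
        incoming = Counter({(): 1})  # a single empty substitution
    result = Counter()
    for substitution, count in incoming.items():
        bound = dict(substitution)
        for s, p, o in triples:
            if p != predicate:
                continue
            # Only keep triples consistent with values bound upstream.
            if bound.get(subj_var, s) != s or bound.get(obj_var, o) != o:
                continue
            extended = dict(bound)
            extended[subj_var] = s
            extended[obj_var] = o
            result[tuple(sorted(extended.items()))] += count
    return result


# Subquery 1: ?x knows ?y
first = eval_pattern(TRIPLES, "knows", "x", "y")
# Subquery 2: ?y worksAt ?z, restricted by the bindings of ?y from subquery 1
second = eval_pattern(TRIPLES, "worksAt", "y", "z", incoming=first)
```

Because the bindings for `?y` arrive from the first subquery, the second subquery never produces a result for `dave`, whose employer is irrelevant to the overall query.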
MULTI-TENANT SYSTEM FOR PROVIDING ARBITRARY QUERY SUPPORT
A method comprising receiving by an arbitrary query engine a user request to perform a query associated with user data including first data and second data; partitioning the query into first and second sub-queries; providing the first sub-query to a first service provider interface (SPI) integrated into a first service configured to operate on the first data in a first datastore, the first SPI including a common interface component configured based on a uniform access specification to facilitate external communication between the arbitrary query engine and the first SPI, and the first SPI including a first service interface component configured to transform between the uniform access specification and a first service data specification and to facilitate internal data management; obtaining from the first datastore the first data formatted according to the first service data specification; transforming the first data; and providing the transformed first data to the arbitrary query engine.
SYSTEM AND METHOD FOR DISJUNCTIVE JOINS
Joining data using a disjunctive operator is described. An example computer-implemented method can include generating, with a processing device, a query plan for a query, the query comprising a join operator expression for a disjunctive predicate, wherein the join operator expression includes a conjunctive predicate and a disjunctive operator. The method may further include generating a bloom filter for the disjunctive operator. Additionally, the method may include generating a result set as a result of evaluating the join operator expression using the disjunctive operator and bloom filter for the disjunctive predicate.
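A Bloom filter can prune probe-side rows for a disjunctive join because it never reports a false negative: if neither key of a row might be present on the build side, the row cannot match either disjunct. The sketch below assumes a join condition of the form `l.k1 = r.k1 OR l.k2 = r.k2` and a single shared filter with column-tagged keys; the `BloomFilter` class, the column names, and `disjunctive_join` are hypothetical, and the abstract does not specify how the filter is built or probed.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def disjunctive_join(left, right):
    """Join on (l.k1 == r.k1) OR (l.k2 == r.k2), pruning with a Bloom filter."""
    bloom = BloomFilter()
    for r in right:
        # Tag each key with its column so k1 values cannot satisfy a k2 probe.
        bloom.add(("k1", r["k1"]))
        bloom.add(("k2", r["k2"]))
    out = []
    for l in left:
        # Skip rows that cannot match either disjunct.
        if not (bloom.might_contain(("k1", l["k1"]))
                or bloom.might_contain(("k2", l["k2"]))):
            continue
        for r in right:
            if l["k1"] == r["k1"] or l["k2"] == r["k2"]:
                out.append((l["id"], r["id"]))
    return out


left = [
    {"id": 1, "k1": "a", "k2": 10},
    {"id": 2, "k1": "z", "k2": 20},
    {"id": 3, "k1": "q", "k2": 99},
]
right = [
    {"id": "r1", "k1": "a", "k2": 0},
    {"id": "r2", "k1": "b", "k2": 20},
]
matches = disjunctive_join(left, right)
```

The filter only affects how much work is done, not the result: a false positive costs one wasted inner loop, and the exact predicate check still decides which pairs are emitted.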