Patent classifications
G06F16/24535
SYSTEM AND METHOD FOR INCREMENTAL VIEW MAINTENANCE BASED ON DIFFERENTIAL CALCULUS OVER NATURAL ALGEBRA OF K-RELATIONS
A method for incremental update of materialized views and a system for answering queries against relational databases, or object-oriented databases, or graph databases, are provided. The system comprises a Storage Engine subsystem, configured to store original data as well as materialized views and subviews in a dedicated subsystem, and a Diff Engine subsystem configured to translate Natural Algebra representations of a Natural Algebra view definition into derived Natural Algebra expressions. The system further comprises an Optimizer configured to translate derived Natural Algebra expressions into Incremental View Maintenance plans, and a Delta Extractor subsystem configured to extract any transactional changes to the original data or batches of the said changes in a form that can be passed as input to the Incremental View Maintenance plans in order to compute the changes to the materialized views.
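As a rough illustration of the idea of computing changes to a materialized view from changes to the base data, the sketch below applies the textbook delta rule for a join over relations with integer multiplicities (one simple instance of K-relations). It is not the Natural Algebra or Diff Engine of the abstract; the function names and the dict-based representation are assumptions made for this example.

```python
def join(r, s):
    """Join on the first field of each tuple; multiplicities multiply."""
    out = {}
    for (k1, a), m in r.items():
        for (k2, b), n in s.items():
            if k1 == k2:
                out[(k1, a, b)] = out.get((k1, a, b), 0) + m * n
    return out


def add(*relations):
    """Sum relations pointwise and drop tuples whose multiplicity is zero."""
    total = {}
    for rel in relations:
        for t, m in rel.items():
            total[t] = total.get(t, 0) + m
    return {t: m for t, m in total.items() if m != 0}


def delta_join(r, s, dr, ds):
    """Change to join(r, s) caused by the changes dr and ds.

    Uses the identity (r + dr) JOIN (s + ds) - r JOIN s
        = dr JOIN s + r JOIN ds + dr JOIN ds.
    """
    return add(join(dr, s), join(r, ds), join(dr, ds))


# Base data, a materialized view over it, and one batch of changes.
# A negative multiplicity encodes a deletion.
r = {(1, "a"): 1}
s = {(1, "x"): 1, (2, "y"): 1}
view = join(r, s)
dr = {(2, "b"): 1, (1, "a"): -1}
ds = {(1, "z"): 1}

# Maintain the view from the deltas alone, without recomputing the join.
view = add(view, delta_join(r, s, dr, ds))
```

The maintained view equals what a full recomputation over the updated base relations would produce, which is the property an incremental maintenance plan has to preserve.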
Splitting a time-range query into multiple sub-queries for serial execution
Techniques for splitting a time-range query into sub-queries for serial execution are provided. In one embodiment, a user query is received requesting items within a time range from a database. The time range is divided into a plurality of time periods within the time range. Sub-queries defining respective time periods of the plurality of time periods are generated from the user query, and a first sub-query is executed. The first sub-query defines a first time period of the plurality of time periods, where the first time period is a most-recent time period or a least-recent time period among the plurality of time periods. If it is determined that a number of items obtained from executing the first sub-query is greater than or equal to a predetermined result target, then the items obtained from executing the first sub-query are provided and subsequent sub-queries are not executed.
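The splitting and early-stop behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions: `run_subquery` is a hypothetical callable standing in for the database, periods are visited most-recent first, and accumulating items across later sub-queries until the target is met is an assumption that goes beyond what the abstract states.

```python
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Divide [start, end) into periods of at most `step`, most recent first."""
    periods = []
    cursor = end
    while cursor > start:
        lower = max(start, cursor - step)
        periods.append((lower, cursor))
        cursor = lower
    return periods


def query_serially(run_subquery, start, end, step, target):
    """Run sub-queries one at a time and stop once `target` items are found.

    Returns the collected items and the number of sub-queries executed.
    """
    items = []
    executed = 0
    for lower, upper in split_range(start, end, step):
        items.extend(run_subquery(lower, upper))
        executed += 1
        if len(items) >= target:
            break  # remaining sub-queries are never executed
    return items, executed
```

With one event per hour over a day, a six-hour step, and a target of five items, only the most recent sub-query runs, because it already returns six items.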
Complex query rewriting
A method, a system, and a computer program product for rewriting queries. A received query is parsed into a plurality of subqueries, where each subquery has one or more query elements. One or more identical subqueries are identified and grouped into one or more groups. Based on the groups of subqueries, an alias parameter is assigned to each identical subquery. The identical subqueries in the received query are replaced with corresponding aliases. An expression language statement is generated based on the received query, where each identical subquery is replaced with the corresponding assigned alias parameter in the expression language. The generated expression language statement is executed.
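The grouping and alias replacement steps can be illustrated with a small sketch. It assumes the parsing step has already produced the list of subquery strings, and it uses plain string replacement rather than a real expression language; the alias naming scheme (`q1`, `q2`, ...) is hypothetical.

```python
from collections import Counter


def alias_repeated_subqueries(query, subqueries):
    """Replace subqueries that occur more than once with a shared alias.

    `subqueries` is the list produced by an earlier parsing step (assumed).
    Returns the rewritten query text and the subquery-to-alias mapping.
    """
    counts = Counter(subqueries)
    aliases = {}
    for sub in subqueries:
        if counts[sub] > 1 and sub not in aliases:
            aliases[sub] = f"q{len(aliases) + 1}"

    rewritten = query
    # Replace longer subqueries first so that a subquery contained in
    # another one does not break the outer replacement.
    for sub in sorted(aliases, key=len, reverse=True):
        rewritten = rewritten.replace(sub, aliases[sub])
    return rewritten, aliases


repeated = "(SELECT id FROM a WHERE x > 1)"
unique = "(SELECT id FROM b)"
query = f"{repeated} UNION {unique} UNION {repeated}"
rewritten, aliases = alias_repeated_subqueries(query, [repeated, unique, repeated])
```

An executor would then evaluate each aliased subquery once and reuse its result wherever the alias appears.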
Flexible query execution
An approach is provided for optimizing a system resource of a cloud database. Components of a database system are divided into micro-systems according to functions and execution levels. A cluster analysis of the micro-systems and an analysis of workload patterns are performed. Different combinations of the micro-systems are generated. Images of the micro-systems and of the different combinations of the micro-systems are generated. A query is received and analyzed at a current layer specifying a set of micro-systems specified by a function of the database system. Service(s) associated with micro-system(s) specified by next layer(s) are pre-loaded and activated. A partial execution of the query is performed and a result of the query is generated at a selected edge or client side, where the selection is based on the analysis of the workload patterns.
SEARCH-RESULT EXPLANATION SYSTEMS AND METHODS
Search-result explanation systems, methods, and computer-program products receive a user search query, expand the search query into a plurality of sub-queries, perform a database search using the expanded user search query, and determine which sub-queries of the plurality of sub-queries matched with a particular search result. Results from the database search are re-indexed in an index generated on-the-fly and in-memory, within which the results are searched using the sub-queries to determine matching fields and match types. A score is determined based on the type of match(es) with a particular search result based on one or more predefined weights and normalized using a denominator comprising a fictitious, on-the-fly record configured to receive a perfect score according to the received user search query. A user interface shows ranked results and explanations for the ranking, including a score for each result based on the expanded user search query.
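The normalization step can be sketched in a few lines. The match types and their weights below are hypothetical, and the "perfect" record is modelled simply as one that matches every sub-query with the best match type; the abstract does not specify either detail.

```python
# Hypothetical predefined weights per match type.
WEIGHTS = {"exact": 1.0, "prefix": 0.6, "fuzzy": 0.3}


def raw_score(matches):
    """Sum the weights of the match types; None means the sub-query missed."""
    return sum(WEIGHTS[kind] for kind in matches.values() if kind is not None)


def normalized_score(matches):
    """Divide by the score of a fictitious record that matches everything."""
    perfect = {sub_query: "exact" for sub_query in matches}
    return raw_score(matches) / raw_score(perfect)


# Which sub-queries of the expanded query matched this result, and how.
matches = {"db": "exact", "database": "prefix", "datastore": None}
score = normalized_score(matches)
```

The per-sub-query entries in `matches` double as the explanation shown to the user, since they record which expansions matched and with what match type.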
Efficient streaming based lazily-evaluated machine learning framework
Methods, systems, and computer products are herein provided for lazy evaluation of input data by a machine learning (ML) framework. An ML pipeline receives input data and compiles a chain of operators into a chain of dataviews configured for lazy evaluation of the input data. Each dataview in the chain represents a computation over data as a non-materialized view of the data. The ML pipeline receives a request for column data and selects a chain of delegates comprising one or more delegates for one or more dataviews in the chain to fulfill the request. The ML pipeline processes the input data with the selected chain of delegates. The ML pipeline performs delegate chaining on a dataview. A feature value for a feature column of the dataview is determined based on the delegate chaining and provided to an ML algorithm to predict column data.
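Lazy evaluation through a chain of non-materialized views can be sketched as below. The class names, the `getter` method, and the use of closures as "delegates" are assumptions made for illustration; they are not the framework's actual API.

```python
class SourceView:
    """Leaf view over in-memory rows (dicts); it computes nothing."""

    def __init__(self, rows):
        self.rows = rows

    def getter(self, column):
        return lambda row: row[column]


class MappedView:
    """Adds one computed column on top of a parent view without storing it."""

    def __init__(self, parent, column, fn, inputs):
        self.parent, self.column, self.fn, self.inputs = parent, column, fn, inputs
        self.rows = parent.rows

    def getter(self, column):
        # Columns this view does not define are delegated to the parent.
        if column != self.column:
            return self.parent.getter(column)
        # Chain the parent's delegates into this view's delegate.
        input_getters = [self.parent.getter(name) for name in self.inputs]
        return lambda row: self.fn(*(g(row) for g in input_getters))


calls = []  # records which operators actually ran


def total(price, qty):
    calls.append("total")
    return price * qty


def is_large(amount):
    calls.append("is_large")
    return amount > 10


rows = [{"price": 4.0, "qty": 3}, {"price": 1.0, "qty": 2}]
view = MappedView(
    MappedView(SourceView(rows), "total", total, ["price", "qty"]),
    "is_large", is_large, ["total"],
)

# Requesting only "total" never invokes the downstream is_large operator.
get_total = view.getter("total")
totals = [get_total(row) for row in view.rows]
```

Only the delegates needed for the requested column are built and run, so operators for columns nobody asks for cost nothing.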
Zero copy optimization for select * queries
A computer-implemented method includes receiving a query specifying an operation to perform on a first table comprising a plurality of stored data blocks. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing the respective reference count associated with the data block in the first table and appending, by data processing hardware, into metadata of the second table, a reference to the corresponding data block copied into the second table.
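The reference-counting scheme can be sketched as follows. `Block`, `Table`, and `zero_copy_select_all` are hypothetical names, and the table metadata is modelled as a plain Python list of block references.

```python
class Block:
    """A stored data block plus the number of tables that reference it."""

    def __init__(self, data):
        self.data = data
        self.ref_count = 0


class Table:
    """Table metadata: an ordered list of references to data blocks."""

    def __init__(self):
        self.block_refs = []

    def append_block(self, block):
        block.ref_count += 1
        self.block_refs.append(block)


def zero_copy_select_all(source):
    """Build the target of a full-table copy without copying any block data."""
    target = Table()
    for block in source.block_refs:
        # Increment the reference count and record a reference in the
        # second table's metadata; the block contents are shared.
        target.append_block(block)
    return target


first = Table()
first.append_block(Block([1, 2, 3]))
first.append_block(Block([4, 5, 6]))
second = zero_copy_select_all(first)
```

After the copy, both tables point at the same two blocks and each block's reference count is 2, so a block can be reclaimed only once no table references it.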
TECHNIQUES FOR BUILDING DATA LINEAGES FOR QUERIES
Various embodiments are generally directed to techniques for building data lineages for queries, such as SQL queries. Some embodiments are particularly directed to a lineage tool that is able to construct data lineages in a recursive manner that uses the text of a query to identify dependent tables. In several embodiments, the data lineage tool may parse SQL queries to identify columns and dependent tables, including analyzing interdependent queries used to populate dependent tables and proceeding until the true source of data is identified. In several embodiments, the data lineage tool may utilize the relationships and dependencies to build element and table level lineages.
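A recursive walk of this kind might look like the sketch below. It assumes a hypothetical mapping from each derived table to the SQL text that populates it, and it uses a simple regular expression on `FROM` and `JOIN` clauses in place of a real SQL parser, so it only builds table-level lineage.

```python
import re

# Simplified table-reference pattern; a real tool would use a SQL parser.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(table, populating_sql, _seen=None):
    """Return the tables with no populating query that `table` depends on.

    `populating_sql` maps a derived table to the query that fills it.
    """
    seen = set() if _seen is None else _seen
    if table in seen:
        return set()  # already visited; also guards against cycles
    seen.add(table)

    sql = populating_sql.get(table)
    if sql is None:
        return {table}  # nothing populates it, so treat it as a true source

    sources = set()
    for dependency in TABLE_REF.findall(sql):
        sources |= source_tables(dependency, populating_sql, seen)
    return sources


populating_sql = {
    "report": "INSERT INTO report SELECT * FROM staging JOIN dims ON staging.k = dims.k",
    "staging": "INSERT INTO staging SELECT * FROM raw_events",
}
lineage = source_tables("report", populating_sql)
```

Here `report` depends on `staging` and `dims`; `staging` is itself populated from `raw_events`, so the walk continues one level further before stopping.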
COMPOUND PREDICATE QUERY STATEMENT TRANSFORMATION
Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: obtaining a query statement; parsing the query statement and determining from the parsing that the query statement is a compound predicate query statement that includes a first predicate and a second predicate; responsive to the parsing, rewriting the obtained query statement to provide a transformed query statement, wherein the rewriting includes (a) specifying generating of a temporary table, wherein the specified generating uses data values of the first predicate and (b) specifying a join function that uses the temporary table and a table referenced in the query statement; evaluating a candidate access path associated with the transformed query statement; selecting the candidate access path as an access path for execution; and executing the transformed query statement according to the selected candidate access path for execution.
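One way to picture the rewrite is with an `IN` list as the first predicate, using Python's built-in `sqlite3` module. The table names, columns, and the choice of an `IN` predicate are assumptions for this example; the access-path evaluation and selection steps are left to the database engine and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", 5.0), (2, "open", 20.0), (3, "held", 30.0), (4, "closed", 40.0)],
)

# Original compound predicate: an IN list AND a range condition.
original = conn.execute(
    "SELECT id FROM orders "
    "WHERE status IN ('open', 'held') AND amount > 10 ORDER BY id"
).fetchall()

# (a) Generate a temporary table from the data values of the first predicate.
conn.execute("CREATE TEMP TABLE wanted_status (status TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_status VALUES (?)", [("open",), ("held",)])

# (b) Join the temporary table with the table referenced in the query;
# the second predicate stays in the WHERE clause.
transformed = conn.execute(
    "SELECT o.id FROM orders AS o "
    "JOIN wanted_status AS w ON o.status = w.status "
    "WHERE o.amount > 10 ORDER BY o.id"
).fetchall()
```

Both statements return the same rows; the point of the transformation is that the join form may open up a different access path for the optimizer to consider.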
Distributing partial results from an external data system between worker nodes
Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed and generates a subquery for the external data system indicating that the results of the subquery are to be sent to one worker node of multiple worker nodes. The system instructs the one worker node to distribute the results received from the external data system to multiple worker nodes for processing.
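The redistribution step can be sketched as a simple function run on the one worker node that receives the subquery results. The round-robin policy and the worker names are assumptions; the abstract does not say how the results are divided.

```python
def redistribute(results, workers):
    """Spread results received by one worker node across several workers.

    Round-robin assignment is an assumption chosen for simplicity.
    """
    assignments = {worker: [] for worker in workers}
    for index, row in enumerate(results):
        assignments[workers[index % len(workers)]].append(row)
    return assignments


# Rows returned by the external data system to the receiving worker node.
external_results = [{"id": n} for n in range(5)]
assignments = redistribute(external_results, ["worker-1", "worker-2"])
```

Each worker then processes only its own share, so the external data system needs a connection to just one node rather than to every worker.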