Patent classifications
G06F16/24544
Query Generation Using Derived Data Relationships
Data expressions in a simplified query language are processed to generate queries in a structured query language, which can then be executed against data ingested from one or more data sources. The data expression is parsed to determine quads and to produce a tree of the quads. A derivation graph is then generated from the tree of quads and a data schema. Its nodes represent the quads, and at least one edge represents a derivation relationship between two of the quads, determined based on attributes of the quads. The derivation graph is then queried based on a grain of the quads to generate the query. The simplified query language does not require the data expression to state a join relationship between the quads when an unambiguous relationship between them is obtainable from the data schema.
Derivation Graph Querying Using Deferred Join Processing
A derivation graph including nodes representing quads identified within a data expression in a simplified query language is queried using deferred join processing. A derivation graph is generated based on a first data expression that includes a join between a second data expression and a third data expression, in which the derivation graph includes at least one node representative of the second data expression and at least one node representative of the third data expression. A root node is identified within the derivation graph by determining that the nodes representative of the second data expression and the third data expression are derivable from the root node using the derivation graph. Query language instructions representing the join between the second data expression and the third data expression written in a second query language are then generated using the root node.
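The root-node step described above can be illustrated with a small reachability search. This is only a sketch under stated assumptions: the graph layout (a dict mapping each node to the nodes derived from it), the node names, the function names `reachable` and `find_root`, and the tie-breaking rule are all hypothetical and are not taken from the abstract.

```python
from collections import deque

def reachable(graph, start):
    """Return every node derivable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def find_root(graph, left, right):
    """Return a node from which both `left` and `right` are derivable, or None.

    Assumption for this sketch: among several candidates, prefer the one
    with the smallest reachable set (the most specific common source).
    """
    nodes = set(graph) | {c for children in graph.values() for c in children}
    candidates = []
    for node in sorted(nodes):
        derived = reachable(graph, node)
        if left in derived and right in derived:
            candidates.append((len(derived), node))
    return min(candidates)[1] if candidates else None

# Hypothetical derivation graph: edges point from a source to what is derived from it.
graph = {
    "orders": ["orders_by_customer", "orders_by_day"],
    "orders_by_customer": ["revenue_per_customer"],
    "orders_by_day": ["revenue_per_day"],
}
root = find_root(graph, "revenue_per_customer", "revenue_per_day")
print(root)
```

Once such a root is known, the join between the two sub-expressions could be generated against that single source rather than against each side separately, which is one plausible reading of "deferred" join processing.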
SEMANTIC ANNOTATION FOR TABULAR DATA
An approach to column-to-semantic-concept mapping using joint estimation through piecewise maximum likelihood estimation and utilizing large, openly available structured data may be provided. The approach may include special estimation methods for categorical, numeric, and alphanumeric/symbolic data, while unifying the overarching estimation within a common framework of likelihood estimation. The approach may also include indexes to support quick estimation computations for numeric, categorical, and mixed-type data. Additionally, the approach may include semantic context utilization without a polynomial increase in mapping runtime or resource utilization.
Incremental simplification and optimization of complex queries using dynamic result feedback
Techniques for improving complex database queries are provided. A determination is made whether to adopt a static or dynamic query execution plan for a received database query based on metrics. When a dynamic query execution plan is adopted, the database query is separated into query fragments. A plan fragment is generated for each query fragment and executed to generate feedback for the plan fragment. The feedback from the execution of each plan fragment is used to initiate query rewrite rules to simplify the corresponding query fragments. The rewritten query fragments are combined to generate the dynamic query plan.
Identifying Joins Of Tables Of A Database
Identifying table joins includes obtaining respective casting similarities between pairs of columns of a first table and a second table. Each pair of columns includes a first column of the first table and a second column of the second table. Ones of the pairs of columns not satisfying a casting similarity condition are discarded to obtain first join candidates. Respective string similarities for the first join candidates are obtained. Ones of the first join candidates not satisfying a string similarity condition are discarded to obtain second join candidates. Final join candidates are obtained using the respective casting similarities and the respective string similarities of the second join candidates. A selected join candidate of the final join candidates is received from a user.
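The two-stage filtering described above might look like the following sketch. The abstract does not define either similarity measure, so both are assumptions here: casting similarity is approximated by how closely two columns agree on the fraction of values that parse as integers, and string similarity is computed on column names with `difflib.SequenceMatcher`. The thresholds, table layout, and function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

def cast_profile(values):
    """Fraction of values that parse as int (a stand-in for a richer type check)."""
    def is_int(value):
        try:
            int(value)
            return True
        except (TypeError, ValueError):
            return False
    return sum(is_int(v) for v in values) / len(values) if values else 0.0

def casting_similarity(col_a, col_b):
    """1.0 when both columns cast to int at the same rate, lower otherwise."""
    return 1.0 - abs(cast_profile(col_a) - cast_profile(col_b))

def string_similarity(name_a, name_b):
    """Similarity of the two column names, in [0, 1]."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

def join_candidates(table_a, table_b, cast_min=0.9, string_min=0.5):
    """Filter column pairs by casting similarity, then by string similarity, then rank."""
    pairs = product(table_a.items(), table_b.items())
    first = [(na, nb, casting_similarity(va, vb)) for (na, va), (nb, vb) in pairs]
    first = [p for p in first if p[2] >= cast_min]            # first join candidates
    second = [(na, nb, c, string_similarity(na, nb)) for na, nb, c in first]
    second = [p for p in second if p[3] >= string_min]        # second join candidates
    ranked = sorted(second, key=lambda p: p[2] * p[3], reverse=True)
    return [(na, nb) for na, nb, _, _ in ranked]              # final join candidates

# Hypothetical tables, stored as column name -> list of string values.
customers = {"customer_id": ["1", "2", "3"], "name": ["Ann", "Bob", "Cy"]}
orders = {"cust_id": ["1", "1", "3"], "item": ["pen", "ink", "pad"]}
candidates = join_candidates(customers, orders)
print(candidates)
```

In the flow the abstract describes, the ranked list would be shown to a user, who selects the join to apply. Combining the two scores by multiplication is a choice made for this sketch only.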
Methods for substituting a semi-join operator with alternative execution strategies and selection methods to choose the most efficient solution under different plans
The manner in which tables are joined can affect the outcome of the query and database performance. Example types of join operations include semi-join and inner-join. The techniques described herein are approaches that may be used to substitute a semi-join operator with an inner-join operator and may be used to transform and optimize representations of queries.
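To make the substitution concrete, the sketch below shows one well-known way a semi-join can be expressed as an inner join: deduplicate the join keys on the right side first, so each left row matches at most once. The abstract does not say which strategies the patent uses, so this equivalence is offered as an illustration only. The row format and function names are hypothetical.

```python
def semi_join(left, right, key):
    """Keep each left row that has at least one match in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]

def inner_join(left, right, key):
    """Plain nested-loop inner join; a left row repeats once per matching right row."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def semi_join_via_inner(left, right, key):
    """Semi-join rewritten as an inner join against the distinct right-side keys."""
    distinct_right = [{key: k} for k in sorted({row[key] for row in right})]
    return inner_join(left, distinct_right, key)

# Hypothetical data: customer 1 has two orders, customer 2 has none.
customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [{"cid": 1, "item": "pen"}, {"cid": 1, "item": "ink"}, {"cid": 3, "item": "pad"}]

print(semi_join(customers, orders, "cid"))
print(semi_join_via_inner(customers, orders, "cid"))
```

Without the deduplication step, the inner join would return customer 1 twice, which is why the two operators are not interchangeable as written and why an optimizer has to weigh the cost of the extra distinct step against the cheaper join.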
SYSTEM AND METHOD FOR QUERYING A DATA REPOSITORY
A method, as well as systems, performed by one or more processors is disclosed for interacting with data in a data repository. The method comprises receiving, in a data catalogue environment, a search request relating to one or more items in the data repository and determining an object type associated with the one or more items. Other operations comprise loading an object template in dependence on the determined object type, populating the template with data from the data repository in dependence on the search request to create an object view, and displaying the object view within the data catalogue environment. The data repository comprises a plurality of joined datasets, and the object view comprises one or more links to items in a joined dataset.
Aggregation operator optimization during query runtime
The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. Based at least in part on the information from the bloom filter, the subject technology determines, during execution of a query plan, at least one property of the join operation in order to decide whether to switch an aggregation operator to a pass-through mode, the at least one property comprising at least a reduction rate. In response to the reduction rate being below a threshold value, the subject technology switches the aggregation operator to the pass-through mode during runtime of the query plan. While the aggregation operator is in the pass-through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches the output stream of data flowing out of the aggregation operator.
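The runtime switch can be sketched as follows. This is a simplification under loud assumptions: the abstract derives the reduction rate from bloom filter information about the join's build side, whereas this sketch measures it directly on the first few input rows. The definition of reduction rate as `1 - groups / rows`, the threshold, the sample size, and the function name `adaptive_sum` are all hypothetical.

```python
def adaptive_sum(rows, threshold=0.5, sample_size=4):
    """Sum values per key, switching to pass-through if grouping barely reduces rows.

    Returns (output_rows, pass_through_flag).
    """
    groups = {}
    seen = 0
    out = []
    pass_through = False
    for key, value in rows:
        if pass_through:
            # Pass-through mode: forward the row untouched.
            out.append((key, value))
            continue
        groups[key] = groups.get(key, 0) + value
        seen += 1
        if seen == sample_size:
            # Assumed definition: share of rows eliminated by grouping so far.
            reduction_rate = 1 - len(groups) / seen
            if reduction_rate < threshold:
                pass_through = True
                out.extend(groups.items())  # flush the partial aggregates
                groups = {}
    out.extend(groups.items())  # emit whatever is still aggregated
    return out, pass_through

mostly_unique = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("a", 6)]
many_repeats = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5)]

print(adaptive_sum(mostly_unique))
print(adaptive_sum(many_repeats))
```

When the operator switches to pass-through, the same key can appear more than once in the output, so in this sketch a later operator would still need to perform the final aggregation.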
Systems and methods for enabling two parties to find an intersection between private data sets without learning anything other than the intersection of the datasets
A system and method are disclosed for comparing private sets of data. The method includes encoding first elements of a first data set such that each element of the first data set is assigned a respective number in a first table, encoding second elements of a second data set such that each element of the second data set is assigned a respective number in a second table, applying a private compare function to compute an equality of each row of the first table and the second table to yield an analysis and, based on the analysis, generating a unique index of similar elements between the first data set and the second data set.
Join cardinality estimation using machine learning and graph kernels
A cardinality of a query is estimated by creating a join plan for the query. The join plan is converted to a graph representation. A subtree graph kernel matrix is generated for the graph representation of the join plan. The subtree graph kernel matrix is submitted to a trained model for cardinality prediction which produces a predicted cardinality of the query.
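A kernel matrix over join-plan graphs could be computed along the following lines. The abstract does not name a specific subtree kernel, so the Weisfeiler-Lehman subtree kernel is assumed here. The plan encoding (node labels plus an undirected edge list), the label strings, and the function names are hypothetical, and the trained prediction model is left out of the sketch.

```python
from collections import Counter

def wl_features(labels, edges, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one graph."""
    neighbors = {node: [] for node in labels}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        # Relabel each node with its own label plus its sorted neighbor labels.
        current = {
            node: current[node]
            + "("
            + ",".join(sorted(current[m] for m in neighbors[node]))
            + ")"
            for node in current
        }
        counts.update(current.values())
    return counts

def kernel(features_a, features_b):
    """Dot product of two label-count vectors."""
    shared = features_a.keys() & features_b.keys()
    return sum(features_a[k] * features_b[k] for k in shared)

def kernel_matrix(plans, iterations=2):
    """Pairwise kernel values for a list of (labels, edges) plan graphs."""
    feats = [wl_features(labels, edges, iterations) for labels, edges in plans]
    return [[kernel(a, b) for b in feats] for a in feats]

# Hypothetical join plans: node 0 is the join, nodes 1 and 2 are table scans.
edges = [(0, 1), (0, 2)]
plan_a = ({0: "join", 1: "scan:orders", 2: "scan:customers"}, edges)
plan_b = ({0: "join", 1: "scan:orders", 2: "scan:customers"}, edges)
plan_c = ({0: "join", 1: "scan:orders", 2: "scan:items"}, edges)

matrix = kernel_matrix([plan_a, plan_b, plan_c])
for row in matrix:
    print(row)
```

Identical plans score as high against each other as against themselves, while plans that share only some operators score lower. A kernel-based regressor trained on plans with known cardinalities could then take rows of such a matrix as input.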