Patent classifications
G06F16/24545
Methods and devices for dynamic filter pushdown for massive parallel processing databases on cloud
A method for dynamic filter pushdown for massive parallel processing databases on the cloud, including acquiring one or more filters corresponding to a query, acquiring statistics information of one or more database tables, determining a selectivity of the one or more database tables based on the statistics information, determining whether the selectivity satisfies a threshold condition, and pushing down the one or more filters to the one or more database tables based on the determination of whether the selectivity satisfies a threshold condition.
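The selectivity check described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the `TableStats` fields, the definition of selectivity as matching rows over total rows, the default threshold of 0.1, and the direction of the comparison (push down when selectivity is at or below the threshold) are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics; field names are assumptions."""
    row_count: int                # total rows in the table
    estimated_matching_rows: int  # rows expected to pass the filter


def selectivity(stats: TableStats) -> float:
    """Fraction of rows expected to survive the filter (0.0 for an empty table)."""
    if stats.row_count == 0:
        return 0.0
    return stats.estimated_matching_rows / stats.row_count


def should_push_down(stats: TableStats, threshold: float = 0.1) -> bool:
    """Assumed threshold condition: push the filter down only when it is
    selective enough, i.e. it keeps at most `threshold` of the rows."""
    return selectivity(stats) <= threshold


if __name__ == "__main__":
    print(should_push_down(TableStats(row_count=1000, estimated_matching_rows=50)))
    print(should_push_down(TableStats(row_count=1000, estimated_matching_rows=500)))
```

A real optimizer would derive the row estimates from histograms or sampled statistics rather than take them as given.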
Optimal query scheduling for resource utilization optimization
The present disclosure provides a method, system and computer program product for optimal query scheduling for resource utilization optimization. In an embodiment of the disclosure, a process for optimal query scheduling includes receiving in an information retrieval data processing system at a contemporaneous time, a request for deferred query execution of a specified query to a future time after the contemporaneous time. The method additionally includes determining a frequency of change of data corresponding to a field referenced in the specified query. Then, on condition that the frequency of change is below a threshold value, an intermediate time prior to the future time but after the contemporaneous time can be identified and the specified query scheduled for execution at the intermediate time instead of the future time. Otherwise, the specified query can be scheduled at the future time as originally requested.
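The scheduling decision in this abstract can be illustrated with a short sketch. The abstract does not say how the intermediate time is chosen, so the list of candidate idle slots and the rule of taking the earliest one are assumptions, as are all function and parameter names.

```python
from typing import Iterable


def choose_execution_time(
    now: float,
    requested: float,
    change_frequency: float,
    threshold: float,
    idle_slots: Iterable[float],
) -> float:
    """Return the time at which to run a deferred query.

    If the referenced data changes rarely (frequency below the threshold),
    the query may run early at an intermediate time strictly between `now`
    and `requested`. Otherwise it runs at the requested time.
    `idle_slots` is a hypothetical list of low-load times.
    """
    if requested <= now:
        raise ValueError("requested time must be after the current time")
    if change_frequency < threshold:
        candidates = [t for t in idle_slots if now < t < requested]
        if candidates:
            return min(candidates)  # assumed policy: earliest idle slot
    return requested


if __name__ == "__main__":
    print(choose_execution_time(0, 100, 0.1, 0.5, [50, 20, 150]))
    print(choose_execution_time(0, 100, 0.9, 0.5, [50]))
```

The rationale is that a result computed early stays valid when the underlying field is unlikely to change before the requested time.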
Aggregation operator optimization during query runtime
The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. Based at least in part on the information from the bloom filter, the subject technology determines, during execution of a query plan, at least one property of the join operation in order to decide whether to switch an aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate. In response to the reduction rate being below a threshold value, the subject technology switches the aggregation operator to the pass through mode during runtime of the query plan. While the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches an output stream of data flowing out of the aggregation operator.
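The pass through behaviour can be sketched with a toy operator. This simplified example takes the reduction rate as a plain number and leaves out the bloom filter and join entirely. The class name, the sum aggregation, and the one-way switch are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, int]


class SumAggregator:
    """Toy group-by-sum operator with a runtime pass through switch."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.pass_through = False

    def update_mode(self, reduction_rate: float) -> None:
        # Assumption: once aggregation is judged not worthwhile, stay in pass through.
        if reduction_rate < self.threshold:
            self.pass_through = True

    def process(self, rows: Iterable[Row]) -> List[Row]:
        if self.pass_through:
            # Input is forwarded unchanged, so output matches input.
            return list(rows)
        totals = defaultdict(int)
        for key, value in rows:
            totals[key] += value
        return list(totals.items())


if __name__ == "__main__":
    agg = SumAggregator(threshold=0.2)
    batch = [("a", 1), ("a", 2), ("b", 3)]
    print(agg.process(batch))   # aggregated
    agg.update_mode(reduction_rate=0.05)
    print(agg.process(batch))   # forwarded unchanged
```

Skipping aggregation pays off when few rows share a group key, since building the hash table then costs more than the small reduction saves.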
Query processing using a predicate-object name cache
In some examples, a database system includes a memory to store a predicate-object name cache, where the predicate-object name cache contains predicates mapped to respective object names. The database system further includes at least one processor to receive a query containing a given predicate, identify, based on accessing the predicate-object name cache, one or more object names indicated by the predicate-object name cache as being relevant for the given predicate, retrieve one or more objects identified by the one or more object names from a remote data store, and process the query with respect to data records of the one or more objects retrieved from the remote data store.
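A predicate-to-object-name cache of this kind can be sketched as below. The remote data store is modelled as an in-memory dictionary, and the policy of filling the cache after a full scan with the objects that held matching records is an assumption, as are all names in the code.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class PredicateObjectNameCache:
    """Maps a predicate (as text) to the object names relevant for it."""

    def __init__(self) -> None:
        self._map: Dict[str, Set[str]] = {}

    def record(self, predicate: str, object_names: Set[str]) -> None:
        self._map.setdefault(predicate, set()).update(object_names)

    def lookup(self, predicate: str) -> Optional[Set[str]]:
        return self._map.get(predicate)


def run_query(
    predicate: str,
    matches: Callable[[int], bool],
    cache: PredicateObjectNameCache,
    remote_store: Dict[str, List[int]],
) -> Tuple[List[int], List[str]]:
    """Return (matching records, names of the objects that were fetched)."""
    names = cache.lookup(predicate)
    cache_hit = names is not None
    if not cache_hit:
        names = set(remote_store)  # cache miss: every object must be fetched
    fetched = sorted(names)
    results: List[int] = []
    relevant: Set[str] = set()
    for name in fetched:
        for record in remote_store[name]:
            if matches(record):
                results.append(record)
                relevant.add(name)
    if not cache_hit:
        cache.record(predicate, relevant)  # assumed fill policy
    return results, fetched


if __name__ == "__main__":
    store = {"obj1": [1, 2], "obj2": [10, 20]}
    cache = PredicateObjectNameCache()
    print(run_query("x > 5", lambda r: r > 5, cache, store))
    print(run_query("x > 5", lambda r: r > 5, cache, store))
```

On the second call only `obj2` is retrieved, which is the saving such a cache is meant to provide when objects live in a remote store.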
Optimizing limit queries over analytical functions
A relational database management system (RDBMS) optimizes limit queries over analytical functions, wherein the limit queries include an output clause comprising a LIMIT, TOP or SAMPLE clause with an expression specifying a limit that is a number K or a percentage α%. The optimizations of the limit queries include: (1) static compile-time optimizations, and (2) dynamic run-time optimizations, based on semantic properties of “granularity” and “input-to-output cardinality” for the analytical functions.
Systems and methods for enabling two parties to find an intersection between private data sets without learning anything other than the intersection of the datasets
A system and method are disclosed for comparing private sets of data. The method includes encoding first elements of a first data set such that each element of the first data set is assigned a respective number in a first table, encoding second elements of a second data set such that each element of the second data set is assigned a respective number in a second table, applying a private compare function to compute an equality of each row of the first table and the second table to yield an analysis and, based on the analysis, generating a unique index of similar elements between the first data set and the second data set.
Modeling individual interfaces for executing interface queries over multiple interfaces
Interface models may be used to execute interface queries over multiple interfaces. A query may be received at a service that is specified according to an interface query language. A plan to perform the query may be generated from an application of interface models for different components of the service to determine behavior for invoking different interfaces. The different interfaces are then invoked according to the plan in order to perform the query. A result to the query is determined based on responses received from the different interfaces and returned.
Adaptive distribution method for hash operations
A method, apparatus, and system for join operations of a plurality of relations that are distributed over a plurality of storage locations over a network of computing components.
Selecting an optimal combination of systems for query processing
A method is provided for generating a classification model configured to select an optimal execution combination for query processing. The method provides, to a processor, training queries and different execution combinations for executing the training queries. Each different execution combination involves a respective different query engine and a respective different runtime. The method extracts, from a set of Directed Acyclic Graphs (DAGs) using a set of Cost-Based Optimizers (CBOs), a set of feature vectors for each of the plurality of training queries. The method adds, by the processor, to each of the merged feature vectors a respective label indicative of the optimal execution combination based on actual respective execution times of the plurality of different execution combinations, to obtain a set of labels. The method trains, by the processor, the classification model by learning the set of merged feature vectors with the set of labels.
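The labeling step of this abstract, where each training query is tagged with its fastest execution combination, can be sketched as follows. Feature extraction from DAGs and the classifier itself are omitted. The feature vectors, the combination names, and the timings are made-up placeholders, not data from the patent.

```python
from typing import Dict, List, Tuple

FeatureVector = List[float]


def label_optimal(execution_times: Dict[str, float]) -> str:
    """Return the execution combination with the lowest measured time."""
    return min(execution_times, key=execution_times.get)


def build_training_set(
    features: Dict[str, FeatureVector],
    timings: Dict[str, Dict[str, float]],
) -> List[Tuple[FeatureVector, str]]:
    """Pair each query's feature vector with the label of its fastest combination."""
    return [(features[q], label_optimal(timings[q])) for q in sorted(features)]


if __name__ == "__main__":
    # Placeholder inputs; combination names are hypothetical.
    features = {"q1": [0.2, 3.0], "q2": [0.9, 1.0]}
    timings = {
        "q1": {"engineA+runtimeX": 12.0, "engineB+runtimeY": 7.5},
        "q2": {"engineA+runtimeX": 3.1, "engineB+runtimeY": 4.0},
    }
    for vector, label in build_training_set(features, timings):
        print(vector, label)
```

The resulting pairs are what a classifier would be trained on, with the label acting as the target class.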
Database query processing for data in a remote data store
In some examples, a database system identifies a plurality of query portions in a database query that contain references to a first external table, the first external table being based on data from a remote data store coupled to the database system over a network. The database system creates a common spool portion that includes projections and selections of the plurality of query portions, and rewrites the plurality of query portions into rewritten query portions that refer to a spool containing an output of the common spool portion. For execution of the database query, the database system determines, as part of optimizer planning, whether to use the plurality of query portions or the common spool portion and the rewritten query portions.