Patent classifications
G06F16/24544
DATA PRUNING BASED ON METADATA
A system and method for pruning data based on metadata. The method may include receiving a query comprising a plurality of predicates and identifying one or more applicable files comprising database data satisfying at least one of the plurality of predicates. Identifying the one or more applicable files includes reading metadata stored in a metadata store that is separate from the database data. The method further includes pruning inapplicable files comprising database data that does not satisfy at least one of the plurality of predicates to create a reduced set of files and reading the reduced set of files to execute the query.
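The pruning step described above can be illustrated with a minimal sketch. It assumes the metadata store holds a per-file minimum and maximum for a single integer column; the abstract does not say what the metadata contains, so `FileMeta`, `may_satisfy`, `prune`, and the file names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileMeta:
    """Assumed metadata: min/max of one column, kept apart from the data files."""
    name: str
    min_val: int
    max_val: int

def may_satisfy(meta, op, value):
    """Return False only when the min/max range proves no row can match."""
    if op == "=":
        return meta.min_val <= value <= meta.max_val
    if op == "<":
        return meta.min_val < value
    if op == ">":
        return meta.max_val > value
    raise ValueError(f"unsupported operator: {op}")

def prune(metadata_store, predicates):
    """Keep files that may satisfy at least one predicate; drop the rest."""
    return [m.name for m in metadata_store
            if any(may_satisfy(m, op, v) for op, v in predicates)]

# Hypothetical metadata store with three files.
metadata_store = [
    FileMeta("part-0", 1, 100),
    FileMeta("part-1", 101, 200),
    FileMeta("part-2", 201, 300),
]
predicates = [("<", 50), ("=", 250)]
reduced = prune(metadata_store, predicates)  # only these files are read
```

Only the metadata is consulted to build `reduced`; the data files themselves are opened afterwards, and only those that survived pruning.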
High Performance Query Processing and Data Analytics
High performance query processing and data analytics can be performed across architecturally diverse scales, such as single-core, multi-core, and/or multi-node systems. The high performance query processing and data analytics can include a separation of query computation, keying data, and data movement and parallel computation, thereby enhancing the capabilities of the query processing and data analytics while allowing the specification of complex forms of data-parallel computation that may execute across real-time and offline settings. The decoupling of data movement and parallel computation, as described herein, can improve query processing and data analytics speed, can provide for the optimization of searches in a plurality of computing environments, and can provide the ability to search through a larger space of execution plans.
ENABLING EDITABLE TABLES ON A CLOUD-BASED DATA WAREHOUSE
Enabling editable tables on a cloud-based data warehouse including receiving, by a query manager from a query manager client, a request to create a referencing worksheet using, as a data source, a client-provided table; storing, by the query manager, the client-provided table on the cloud-based data warehouse; generating, by the query manager, a database query to create the referencing worksheet, wherein the database query targets the client-provided table on the cloud-based data warehouse; and issuing, by the query manager, the database query to the cloud-based data warehouse.
HIGH FIDELITY COMBINATION OF DATA
Techniques described herein perform high fidelity combination of data, for example combining time series data in response to a query. In an embodiment, a first input data stream of a first type (e.g., continuous), a second input data stream of a second type (e.g., discrete), and an operation that is a function of, and is to be performed on, the first and second input data streams are received. The second input data stream includes second input data stream samples associated with sample times. The techniques include determining that at least some points in the second input data stream samples do not have synchronized samples in the first input data stream, automatically generating synchronized samples for the first input data stream, and performing the operation on the second input data stream samples and the automatically generated samples for the first input data stream.
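One way to picture the synchronization step is the sketch below. It assumes linear interpolation as the method for generating the missing samples of the continuous stream; the abstract does not name a method, and `interpolate`, `combine`, and the sample data are hypothetical.

```python
from bisect import bisect_left

def interpolate(times, values, t):
    """Linearly interpolate a continuous series at time t, clamped at both ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]  # a synchronized sample already exists
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def combine(cont_times, cont_values, disc_samples, op):
    """Apply op at every sample time of the discrete stream."""
    out = []
    for t, d in disc_samples:
        c = interpolate(cont_times, cont_values, t)  # generated sample
        out.append((t, op(c, d)))
    return out

# Continuous stream sampled every 10 time units; discrete stream at 5, 10, 15.
cont_times = [0.0, 10.0, 20.0]
cont_values = [0.0, 100.0, 200.0]
disc_samples = [(5.0, 1.0), (10.0, 2.0), (15.0, 3.0)]
result = combine(cont_times, cont_values, disc_samples, lambda c, d: c + d)
```

The discrete stream drives the output timestamps here, so none of its samples are moved or resampled; only the continuous stream is filled in.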
SYSTEM AND METHOD FOR JOINING SKEWED DATASETS IN A DISTRIBUTED COMPUTING ENVIRONMENT
Disclosed is a method and system for joining datasets in a distributed computing environment. The system comprises a memory 206 and a processor 202. The processor 202 identifies a skewed dataset from two or more datasets to be joined. The processor 202 identifies a replication parameter from a configuration file. The processor 202 then assigns a randomly assigned machine number to each chunk of the skewed dataset owned by the nodes/machines involved in the join operation. The processor 202 forms copies of the non-skewed dataset equal to the replication parameter and adds the copy number to each sample of the copy of the non-skewed dataset formed. Further, the processor 202 merges each non-skewed dataset into the final copy of the non-skewed dataset, forming a single non-skewed dataset. The processor 202 then repeats these steps for all the non-skewed datasets involved in the join operation, resulting in generation of merged copies of all the non-skewed datasets, and then performs the joining operation.
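The steps above resemble what is often called a salted join. The single-process sketch below assumes that reading: the random machine number acts as a salt on the skewed side, and the copy number plays the same role on the replicated side. `salted_join`, its parameters, and the sample data are hypothetical, and no actual distribution across machines is shown.

```python
import random
from collections import defaultdict

def salted_join(skewed, non_skewed, replication, seed=0):
    """Join two lists of (key, value) pairs on key, using a salt to spread hot keys."""
    rng = random.Random(seed)
    # Tag every skewed record with a random number in [0, replication).
    salted = [((key, rng.randrange(replication)), val) for key, val in skewed]
    # Form `replication` copies of the non-skewed data, tag each record with
    # its copy number, and merge all copies into one lookup structure.
    merged = defaultdict(list)
    for copy_no in range(replication):
        for key, val in non_skewed:
            merged[(key, copy_no)].append(val)
    # Join on the composite (key, number), then drop the number.
    return [(key, left, right)
            for (key, salt), left in salted
            for right in merged.get((key, salt), [])]

skewed = [("hot", i) for i in range(6)] + [("cold", 99)]
non_skewed = [("hot", "H"), ("cold", "C")]
joined = salted_join(skewed, non_skewed, replication=3)
```

In a distributed setting the composite key would decide which machine receives each record, so rows sharing the hot key would be spread over up to `replication` machines instead of one. The join result is the same as an unsalted join.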
SKEW SENSITIVE ESTIMATING OF RECORD CARDINALITY OF A JOIN PREDICATE FOR RDBMS QUERY OPTIMIZER ACCESS PATH SELECTION
A query optimizer receives a relational database management system (RDBMS) query having a join predicate with a join between a first and a second table. The query optimizer determines a high skew value for a first variable joining the first and second tables at columns per the join predicate. A count query on one of the first and second tables is constructed and run only using the high skew value as a substitution for the first variable. A quantity of records for the join of the first and second tables is estimated using results of the count query. Different access paths (e.g., query plans) are used by the query optimizer depending on whether the estimated quantity of records exceeds a previously determined threshold or not.
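A rough sketch of the count-query step follows, using Python's built-in `sqlite3` module as a stand-in RDBMS. It assumes the high-skew value is already known and that the count itself serves as the join-size estimate; the abstract specifies neither. The table `orders`, the column `customer_id`, the function name, and the plan names are all hypothetical.

```python
import sqlite3

def estimate_and_choose(conn, skew_value, threshold):
    """Run a count query with the skew value substituted for the join variable."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (skew_value,)
    ).fetchone()
    # Assumption: the count is used directly as the estimated join cardinality.
    plan = "hash_join" if count > threshold else "index_nested_loop"
    return count, plan

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
# Customer 1 is heavily skewed: 90 of the 100 rows.
conn.executemany("INSERT INTO orders VALUES (?)", [(1,)] * 90 + [(2,)] * 10)

hot = estimate_and_choose(conn, skew_value=1, threshold=50)
cold = estimate_and_choose(conn, skew_value=2, threshold=50)
```

The two calls differ only in the substituted value, yet they cross the threshold in opposite directions, which is the situation where a uniform-distribution estimate would pick the same access path for both.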
Limiting scans of loosely ordered and/or grouped relations in a database
Data within a database object are accessed based on a query with a predicate including a plurality of conditional expressions. Elements of the database object are stored among a plurality of different storage regions along with range values for element values within each storage region. Each conditional expression of the query predicate is applied to the range values for each storage region to produce evaluation results of that conditional expression for each storage region. The evaluation results of the conditional expressions for a corresponding storage region are combined to produce aggregated results for each of the storage regions, where the aggregated result for a corresponding storage region indicates results of a tri-state evaluation (e.g., true, false, or unknown) of the conditional expressions for that storage region. One or more corresponding individual storage regions are scanned based on the aggregated results for those storage regions when the tri-state evaluation is unknown.
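The tri-state evaluation can be sketched as follows, assuming each storage region records a minimum and maximum for one column and that the conditional expressions are combined with AND. The abstract does not fix either detail, and `Tri`, `eval_range`, `combine_and`, and the region data are hypothetical.

```python
from enum import Enum

class Tri(Enum):
    FALSE = "false"
    TRUE = "true"
    UNKNOWN = "unknown"

def eval_range(op, const, lo, hi):
    """Evaluate `column op const` against a region whose values lie in [lo, hi]."""
    if op == ">":
        if lo > const:
            return Tri.TRUE      # every value in the region qualifies
        if hi <= const:
            return Tri.FALSE     # no value in the region qualifies
    elif op == "<":
        if hi < const:
            return Tri.TRUE
        if lo >= const:
            return Tri.FALSE
    else:
        raise ValueError(f"unsupported operator: {op}")
    return Tri.UNKNOWN           # the range straddles the constant

def combine_and(results):
    """Three-valued AND over the per-expression results for one region."""
    results = list(results)
    if Tri.FALSE in results:
        return Tri.FALSE
    if all(r is Tri.TRUE for r in results):
        return Tri.TRUE
    return Tri.UNKNOWN

# Predicate: x > 10 AND x < 20, over three regions with (min, max) ranges.
regions = {"A": (0, 5), "B": (12, 18), "C": (15, 30)}
predicate = [(">", 10), ("<", 20)]
aggregated = {
    name: combine_and(eval_range(op, c, lo, hi) for op, c in predicate)
    for name, (lo, hi) in regions.items()
}
to_scan = [name for name, r in aggregated.items() if r is Tri.UNKNOWN]
```

Region A can be skipped, region B qualifies as a whole, and only region C needs its individual elements checked against the predicate.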
Synthesized predicate driven index selection for partitioned table
A system and method for receiving a query of a partitioned table, the query including a first index predicate associated with a first partition key column; determining one or more of the following: that the query is missing a second index predicate, and that the first index predicate is unusable for index probing; and, responsive to that determination, generating one or more synthesized predicates used to process the query using an index scan.
Visionary query processing in Hadoop utilizing opportunistic views
Systems and methods are disclosed for query processing in a big data analytics platform by enumerating plans for a current query using a processor; building a dominance graph for the current query; for each plan, determining a regret value and a score for the plan based on the regret value and cost; and selecting query plans in an online fashion for query processing in big data analytics platforms where intermediate results are materialized and can be reused later.
ARCHIVING DATA OBJECTS USING SECONDARY COPIES
Exemplary systems and methods for archiving data objects using secondary copies are disclosed. The system creates one or more secondary copies of primary data that contains multiple data objects. The system may maintain a first data structure that tracks the data objects for which the system has created secondary copies and the locations of the secondary copies. To archive data objects in the primary data, the system identifies data objects to be archived, verifies that previously-created secondary copies of the identified data objects exist, and replaces the identified data objects with stubs. The system may maintain a second data structure that both tracks the stubs and refers to the first data structure, thereby creating an association between the stubs and the locations of the secondary copies.
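The relationship between the two data structures can be sketched with plain dictionaries. This is an illustrative model only: `copy_index` stands in for the first data structure, `stub_index` for the second, and the object names, the `STUB` marker, and the location strings are hypothetical.

```python
STUB = "<stub>"  # hypothetical placeholder left in primary storage

def archive(primary, copy_index, stub_index, candidates):
    """Replace objects with stubs, but only where a secondary copy is recorded."""
    archived = []
    for obj_id in candidates:
        if obj_id not in primary or obj_id not in copy_index:
            continue  # no verified secondary copy, so leave the object in place
        primary[obj_id] = STUB
        stub_index[obj_id] = obj_id  # the stub entry refers to the copy_index entry
        archived.append(obj_id)
    return archived

def locate(stub_index, copy_index, obj_id):
    """Follow a stub through both structures to the secondary copy's location."""
    return copy_index[stub_index[obj_id]]

primary = {"a.doc": "contents A", "b.doc": "contents B"}
copy_index = {"a.doc": "secondary-store/volume-7"}  # first data structure
stub_index = {}                                     # second data structure
archived = archive(primary, copy_index, stub_index, ["a.doc", "b.doc"])
```

Because `b.doc` has no recorded secondary copy, it is left untouched; archiving never removes the only copy of an object.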