Patent classifications
G06F16/24544
COMPUTERIZED SYSTEM AND METHOD FOR OPTIMIZING QUERIES IN A TEMPLATED VIRTUAL SEMANTIC LAYER
The disclosed systems and methods provide a novel framework that optimizes SQL queries that are generated from a templated virtual semantic layer. The framework introduces the use of a virtual semantic layer into database management systems' operations, whereby templated SQL queries can be rewritten according to a determined and measured nesting, dimensional structure that produces an optimized search system. This enables templated SQL fragments to be translated for query optimization, thereby reducing the drain on a database's resources and minimizing a query's impact on the database's performance.
Using a graph representation of join history to distribute database data
A graph representation of join history may be used to distribute database data. Join history may be collected, captured, or tracked to describe the history of join operations between columns of different tables in a database. A graph representation of the join history may be generated. The graph representation may indicate a likelihood of different joins that may be performed between the columns of the tables of a database. An evaluation of the join history may be performed to identify columns for tables in the database, and the data of the tables may be distributed amongst multiple storage locations according to the identified columns.
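A minimal sketch of the idea in this abstract, under stated assumptions: join history is modeled as a list of pairs of (table, column) endpoints, edge weights are plain join counts standing in for "likelihood", and each table's distribution column is the one with the highest total weight. The function names and the sample history are hypothetical and do not come from the patent.

```python
from collections import Counter

def build_join_graph(join_history):
    """Count how often each pair of (table, column) endpoints was joined."""
    graph = Counter()
    for left, right in join_history:
        edge = tuple(sorted([left, right]))  # undirected edge
        graph[edge] += 1
    return graph

def pick_distribution_columns(graph):
    """For each table, choose the column with the highest total join weight."""
    weight = {}
    for (a, b), count in graph.items():
        for table, column in (a, b):
            weight.setdefault(table, Counter())[column] += count
    return {table: cols.most_common(1)[0][0] for table, cols in weight.items()}

# Hypothetical join history: each entry is one observed join.
join_history = [
    (("orders", "customer_id"), ("customers", "id")),
    (("orders", "customer_id"), ("customers", "id")),
    (("orders", "product_id"), ("products", "id")),
]
graph = build_join_graph(join_history)
distribution = pick_distribution_columns(graph)
```

Here `orders` would be distributed on `customer_id` because that column appears in the most frequent join. A real system would likely weigh more than raw counts.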
SYSTEM AND METHOD FOR QUERYING A DATA REPOSITORY
A search request relating to one or more datasets in the data repository can be received, the search request comprising a display request to display at least a portion of the one or more datasets. In response to the search request, a searchable database can be generated from the one or more datasets in a data repository based on ontological data associated with the one or more datasets. An object view of at least the portion of one or more datasets can be generated from the searchable database, the view being generated based on the ontological data. The generated object view can be provided to be displayed on a display device.
QUERY PERFORMANCE
An approach is provided for improving query performance. A query is received whose execution includes a first join of tables having sets of records and includes a second join with a next table whose set of records is smaller than a set of transient records resulting from the first join. A threshold for a number of records in the next table is received. A first count of the transient records resulting from the first join is estimated. A second count of a number of records in the next table is determined. It is determined that the second count is less than the threshold. Based on the second count being less than the threshold and without using the first count, a query execution plan is generated to include a broadcast of the records in the next table to data slices without including a broadcast of the transient records.
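The decision described in this abstract can be sketched as a single comparison. This is an illustration only: the function name and the returned labels are hypothetical, and the fallback branch is an assumption, since the abstract does not say what happens when the next table is at or above the threshold.

```python
def choose_second_join_strategy(next_table_rows, threshold):
    """Decide how to execute the second join.

    The estimated count of transient rows from the first join is
    deliberately not a parameter: per the abstract, the plan is chosen
    from the next table's row count and the threshold alone.
    """
    if next_table_rows < threshold:
        # Send the small next table to every data slice; the larger
        # transient result stays where it is.
        return "broadcast_next_table"
    # Assumption: the abstract does not describe this case.
    return "other_strategy"

plan_small = choose_second_join_strategy(next_table_rows=1_000, threshold=50_000)
plan_large = choose_second_join_strategy(next_table_rows=50_000, threshold=50_000)
```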
Fuzzy data operations
A method for clustering data elements stored in a data storage system includes reading data elements from the data storage system. Clusters of data elements are formed with each data element being a member of at least one cluster. At least one data element is associated with two or more clusters. Membership of the data element belonging to respective ones of the two or more clusters is represented by a measure of ambiguity. Information is stored in the data storage system to represent the formed clusters.
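As a rough illustration of partial cluster membership, the sketch below assigns one-dimensional points to every cluster center within a fixed radius and splits membership equally among them. The radius rule, the equal split as the "measure of ambiguity", and all names are assumptions made for this example. The abstract does not specify how ambiguity is computed.

```python
def fuzzy_assign(points, centers, radius):
    """Map each point to {cluster_name: membership}, memberships summing to 1."""
    memberships = {}
    for p in points:
        near = [name for name, c in centers.items() if abs(p - c) <= radius]
        if not near:
            # Every element must belong to at least one cluster,
            # so fall back to the single nearest center.
            near = [min(centers, key=lambda name: abs(p - centers[name]))]
        # Assumed ambiguity measure: equal share per candidate cluster.
        memberships[p] = {name: 1.0 / len(near) for name in near}
    return memberships

centers = {"A": 0.0, "B": 10.0}
memberships = fuzzy_assign([1.0, 5.0, 20.0], centers, radius=6.0)
```

The point `5.0` lies within the radius of both centers, so it is associated with two clusters at membership 0.5 each, while the other two points belong to one cluster each.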
Supplementing events displayed in a table format
A method includes displaying events that correspond to search results of a search query, the events comprising data items of event attributes, the events displayed in a table. The table includes columns corresponding to an event attribute, rows corresponding to events, cells populated with data items, and interactive regions corresponding to at least one data item and selectable to add one or more commands to the search query. A reference event attribute is determined based on an analysis of a data object. A supplemental column corresponding to a supplemental event attribute is added to the table based on the reference event attribute. Supplemental interactive regions are added to the table and correspond to supplemental data items.
Automatic generation of materialized views
Definitions of materialized views (MVs) are automatically generated. In general, automated MV generation identifies a set of candidate MVs by examining a working set of query blocks. Once the candidates are formed, the candidate MVs are further evaluated to calculate a benefit of the candidate MVs. An improved approach for generating a candidate set of MVs is described herein. The improved approach is referred to as the extended covering subexpression technique (ECSE). Under ECSE, various relationships between join sets other than strict equivalence are used to generate new resultant join sets. Such relationships include subset, intersection, superset, and union, which shall be described in further detail below. In some cases, relationships among resultant join sets and initial join sets are considered to generate new resultant join sets. The final resultant join sets are then used to form a candidate set of MVs.
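The sketch below is not ECSE itself. It is only an assumed illustration of how set relationships between join sets could yield new candidate join sets. Each join set is modeled as a frozenset of join-predicate strings, and the rule used here (keep every non-empty pairwise intersection and the union of any overlapping pair) is invented for the example.

```python
from itertools import combinations

def extend_join_sets(join_sets):
    """Return the initial join sets plus intersections and unions of overlapping pairs."""
    result = set(join_sets)
    for a, b in combinations(join_sets, 2):
        common = a & b
        if common:
            result.add(common)  # shared subexpression (covers the subset case)
            result.add(a | b)   # combined join set (covers the superset case)
    return result

# Hypothetical join sets taken from three query blocks.
initial = [
    frozenset({"a=b", "b=c"}),
    frozenset({"a=b", "b=c", "c=d"}),
    frozenset({"a=b", "x=y"}),
]
candidates = extend_join_sets(initial)
```

The join `a=b` is shared by all three blocks, so `frozenset({"a=b"})` appears among the candidates even though no block consists of that join alone.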
System and method for joining skewed datasets in a distributed computing environment
Disclosed is a method and system for joining datasets in a distributed computing environment. The system comprises a memory 206 and a processor 202. The processor 202 identifies a skewed dataset from two or more datasets to be joined. The processor 202 identifies a replication parameter from a configuration file. The processor 202 then assigns a randomly assigned machine number to each chunk of the skewed dataset owned by the nodes/machines involved in the join operation. The processor 202 forms copies of the non-skewed dataset equal to the replication parameter and adds the copy number to each sample of the copy of the non-skewed dataset formed. Further, the processor 202 merges each non-skewed dataset into the final copy of the non-skewed dataset, forming a single non-skewed dataset. The processor 202 then repeats these steps for all the non-skewed datasets involved in the join operation, resulting in generation of merged copies of all the non-skewed datasets, and then performs the joining operation.
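The steps above resemble what is often called a salted join, and can be sketched on a single machine as follows. This is a simplification under stated assumptions: the random number is assigned per record rather than per chunk, the "machines" are simulated by the number attached to each key, and the function and variable names are hypothetical.

```python
import random

def salted_join(skewed, non_skewed, replication, seed=0):
    """Join two lists of (key, value) pairs using random salts on the skewed side."""
    rng = random.Random(seed)
    # Step 1: tag each skewed record with a random number in [0, replication).
    salted = [((key, rng.randrange(replication)), value) for key, value in skewed]
    # Steps 2 and 3: make `replication` copies of the non-skewed dataset,
    # tag each copy with its copy number, and merge them into one lookup.
    merged = {}
    for copy in range(replication):
        for key, value in non_skewed:
            merged.setdefault((key, copy), []).append(value)
    # Step 4: join on (key, number), so a hot key spreads over several partitions.
    joined = []
    for (key, salt), left in salted:
        for right in merged.get((key, salt), []):
            joined.append((key, left, right))
    return joined

skewed = [("hot", i) for i in range(6)] + [("cold", 99)]
non_skewed = [("hot", "H"), ("cold", "C")]
joined = salted_join(skewed, non_skewed, replication=3)
```

Because every copy number exists on the non-skewed side, each skewed record still finds its match, and the output equals that of an ordinary join on the key.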
Join optimization using multi-index augmented nested loop join method
A system and method for efficient query processing using multiple indices in a join operation are described. In one embodiment, a join query including a join operation on a first table and a second table and including a first condition and a second condition is received, wherein the first condition is based on a first index of the second table, and the second condition is based on a second index of the second table; a first result set is determined by index scanning the second table using the first index as an index key; a second result set is determined by index scanning the second table using the second index as the index key; a third result set is determined by applying a set operation to the first result set and the second result set; and the third result set is provided in response to the join query.
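A minimal in-memory sketch of the embodiment described above, assuming both join conditions are equalities, that indices are dictionaries from a column value to a set of row positions, and that the set operation is passed in by the caller (intersection for AND, union for OR). All names are hypothetical.

```python
from collections import defaultdict

def build_index(rows, column):
    """Map each value of `column` to the set of row positions holding it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index

def multi_index_join(outer, inner, index_a, index_b, combine):
    """Nested loop over `outer`; probe two indices on `inner` per outer row."""
    results = []
    for o in outer:
        first = index_a.get(o["a"], set())    # first result set
        second = index_b.get(o["b"], set())   # second result set
        for row_id in sorted(combine(first, second)):  # third result set
            results.append((o, inner[row_id]))
    return results

inner = [{"a": 1, "b": 10}, {"a": 1, "b": 20}, {"a": 2, "b": 10}]
outer = [{"a": 1, "b": 10}]
index_a = build_index(inner, "a")
index_b = build_index(inner, "b")

and_rows = multi_index_join(outer, inner, index_a, index_b, set.intersection)
or_rows = multi_index_join(outer, inner, index_a, index_b, set.union)
```

With intersection, only the inner row matching both conditions is returned. With union, every inner row matching either condition is returned.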
Automatically refreshing materialized views according to performance benefit
Materialized views for a database system may be automatically refreshed according to performance benefits. Materialized views may be ordered according to determined performance benefits for the materialized views indicating the performance benefit obtained when a materialized view is used to perform a query at the database system. Materialized views may be selected for refresh operations according to the ordering based on a capacity of the database system to perform refresh operations.
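One way to picture this selection is a greedy pass over views sorted by benefit. The sketch below assumes each view carries a numeric `benefit` and a numeric refresh `cost`, and that capacity is a single budget in the same units as cost. These fields and the greedy rule are assumptions for illustration, not details from the abstract.

```python
def select_views_to_refresh(views, capacity):
    """Pick views in descending benefit order while refresh capacity remains."""
    ordered = sorted(views, key=lambda v: v["benefit"], reverse=True)
    selected, used = [], 0
    for view in ordered:
        if used + view["cost"] <= capacity:
            selected.append(view["name"])
            used += view["cost"]
    return selected

# Hypothetical views with assumed benefit and cost values.
views = [
    {"name": "mv_sales_daily", "benefit": 9.0, "cost": 4},
    {"name": "mv_top_products", "benefit": 5.0, "cost": 3},
    {"name": "mv_rare_report", "benefit": 1.0, "cost": 2},
]
to_refresh = select_views_to_refresh(views, capacity=6)
```

With a capacity of 6, the highest-benefit view is taken first (cost 4), the second no longer fits, and the third (cost 2) fills the remaining budget.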