Patent classifications
G06F16/24542
High-density compression method and computing system
Certain implementations of the disclosed technology may include methods and computing systems for performing high-density data compression, particularly on numerical data that demonstrates various patterns, and patterns of patterns. According to an example implementation, a method is provided. The method may include extracting a data sample from a data set, compressing the data sample using a first compression filter configuration, and calculating a compression ratio associated with the first compression filter configuration. The method may also include compressing the data sample using a second compression filter configuration and calculating a compression ratio associated with the second compression filter configuration. A particular compression filter configuration to utilize in compressing the entire data set may be selected based on a comparison of the compression ratio associated with the first compression filter configuration and the compression ratio associated with the second compression filter configuration.
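The selection step described in this abstract can be illustrated with a minimal sketch. It assumes that standard-library codecs (zlib, bz2, lzma) stand in for the "compression filter configurations", that the sample is simply a prefix of the data set, and that the compression ratio is the uncompressed size divided by the compressed size. The names CONFIGS, select_config, and compress_data_set are hypothetical and do not come from the patent.

```python
import bz2
import lzma
import zlib

# Hypothetical stand-ins for "compression filter configurations".
CONFIGS = {
    "zlib-1": lambda raw: zlib.compress(raw, 1),
    "zlib-9": lambda raw: zlib.compress(raw, 9),
    "bz2-9": lambda raw: bz2.compress(raw, 9),
    "lzma": lambda raw: lzma.compress(raw),
}


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    return len(raw) / len(compressed)


def select_config(data: bytes, sample_size: int = 4096):
    """Compress a sample with every configuration and pick the best ratio."""
    sample = data[:sample_size]  # assumption: the sample is a simple prefix
    ratios = {
        name: compression_ratio(sample, compress(sample))
        for name, compress in CONFIGS.items()
    }
    best = max(ratios, key=ratios.get)
    return best, ratios


def compress_data_set(data: bytes, sample_size: int = 4096):
    """Compress the entire data set with the configuration chosen on the sample."""
    best, _ = select_config(data, sample_size)
    return best, CONFIGS[best](data)
```

Only the sample is compressed once per configuration, so the cost of the comparison stays small relative to compressing the full data set with every candidate.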
Systems and methods for determining peak memory requirements in SQL processing engines with concurrent subtasks
The present invention is generally directed to systems and methods of determining and provisioning peak memory requirements in Structured Query Language processing engines. More specifically, methods may include determining or obtaining a query execution plan; gathering statistics associated with each database table; breaking the query execution plan into one or more subtasks; calculating an estimated memory usage for each subtask using the statistics; determining or obtaining a dependency graph of the one or more subtasks; based at least in part on the dependency graph, determining which subtasks can execute concurrently on a single worker node; and totaling the amount of estimated memory for each subtask that can execute concurrently on a single worker node and setting this amount of estimated memory as the estimated peak memory requirement for the specific database query.
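One way to read the final steps of this abstract is sketched below. It assumes that two subtasks can execute concurrently when neither depends on the other, directly or transitively, and it takes the peak as the largest total over any set of mutually independent subtasks. The exhaustive search is only practical for small plans, and the function names, subtask names, and memory figures are hypothetical.

```python
from itertools import combinations


def transitive_deps(deps):
    """Map each subtask to every subtask it depends on, directly or indirectly."""
    closure = {}

    def visit(task):
        if task not in closure:
            closure[task] = set()
            for parent in deps.get(task, ()):
                closure[task].add(parent)
                closure[task] |= visit(parent)
        return closure[task]

    for task in deps:
        visit(task)
    return closure


def estimated_peak_memory(memory, deps):
    """Return (peak, subtasks) for the heaviest set of concurrent subtasks.

    memory: subtask name -> estimated memory usage
    deps:   subtask name -> subtasks it must wait for (assumed acyclic)
    """
    closure = transitive_deps({task: deps.get(task, ()) for task in memory})

    def independent(a, b):
        return a not in closure[b] and b not in closure[a]

    tasks = list(memory)
    best_total, best_group = 0, ()
    for size in range(1, len(tasks) + 1):
        for group in combinations(tasks, size):
            if all(independent(a, b) for a, b in combinations(group, 2)):
                total = sum(memory[task] for task in group)
                if total > best_total:
                    best_total, best_group = total, group
    return best_total, set(best_group)


# Hypothetical plan: two scans feed a join, which feeds an aggregation.
memory = {"scan_a": 100, "scan_b": 200, "join": 250, "agg": 50}
deps = {"join": ["scan_a", "scan_b"], "agg": ["join"]}
peak, group = estimated_peak_memory(memory, deps)
```

In this example only the two scans are independent of each other, so the estimate is their combined 300 units rather than the 600 units obtained by summing every subtask.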
Partial group by for eager group by placement query plans
A partial group by operator is a group by operator that implements a fallback mechanism. The fallback mechanism is triggered whenever memory pressure reaches a certain threshold. When the fallback mechanism is triggered, a row is included in an output of the partial group by operator without adding an aggregation value for the row's grouping value to an aggregation data structure. A final group by operator computes a final aggregate value of all results, including pre-grouped results and passed-through results, from the partial group by operator.
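The fallback behavior can be sketched as follows. This is a simplified illustration that assumes a SUM aggregate, models "memory pressure" as a cap on the number of groups held in the aggregation table, and keeps aggregating rows whose group is already in the table once the cap is reached. All three choices, and the function names, are assumptions made for the example.

```python
from collections import defaultdict


def partial_group_by_sum(rows, max_groups):
    """Pre-aggregate (key, value) rows, passing rows through once the table is full."""
    table = {}            # aggregation data structure: key -> running sum
    passed_through = []   # rows emitted without being added to the table
    for key, value in rows:
        if key in table:
            table[key] += value
        elif len(table) < max_groups:
            table[key] = value
        else:
            # Fallback: the threshold is reached, so emit the row unchanged.
            passed_through.append((key, value))
    return list(table.items()) + passed_through


def final_group_by_sum(partial_rows):
    """Combine pre-grouped and passed-through rows into final aggregates."""
    totals = defaultdict(int)
    for key, value in partial_rows:
        totals[key] += value
    return dict(totals)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("c", 5), ("b", 6)]
partial = partial_group_by_sum(rows, max_groups=2)
final = final_group_by_sum(partial)
```

With a cap of two groups, both rows for "c" pass through unaggregated, and the final operator still produces the same totals as a single full group by.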
OPERATION FRAGMENTATION WITH METADATA SERIALIZATION IN QUERY PROCESSING PUSHDOWNS
Methods for operation fragmentation with metadata serialization in query processing pushdowns are performed by systems and devices. A compute node receives a query directed to database data, and generates query text fragments. Portions of metadata of the database are read from different page servers, and are serialized by the compute node. Page identities of data pages in a page server that stores the data are determined from a page index at the compute node, and the compute node provides the text fragments, the serialized metadata, and the page identities to the page server storing the data. The page server compiles the text fragments based on the serialized metadata to generate an executable query plan for the query. The page server initializes and performs execution of the executable query plan against the data as a pushdown query operation, and a result of the pushdown query operation is provided to the compute node.
FILTER CLASS FOR QUERYING OPERATIONS
A data model identifying a first and second table may be stored, the first table comprising a first and second attribute, the second table comprising a third attribute. A first filter parameter of a first filter and a second filter parameter of a second filter may be obtained. A first tag value may be associated with the first and second filters. A set of filters including the first and second filters may be determined in response to a determination that the first and second filters are associated with the first tag value. An argument indicating the first and second filter parameters may be generated based on the set of filters. A call to the first table may be executed based on the argument, the execution of the call causing values of the first and second attributes to be obtained based on the first and second filter parameters.
Virtual Warehouse Query Monitoring and Reporting
Methods, systems, and apparatuses for tracking and generating analysis of queries submitted to virtual warehouses are described herein. A request to execute a query may be received. An organization may be determined, and the query may be modified to add an indication of the organization. A plurality of virtual warehouses may be identified. A first virtual warehouse of the plurality of virtual warehouses may be caused to execute the query. Query results and costs data may be received based on execution of the query by the first virtual warehouse. The costs data may indicate one or more costs associated with the execution of the query by the first virtual warehouse. The indication of the organization may be extracted from the query results. A user interface comprising query cost information corresponding to the organization may be generated.
EFFICIENT WORKLOAD BALANCING IN REPLICATED DATABASES BASED ON RESULT LAG COMPUTATION
A method for workload balancing is implemented in a computer system including a primary database system and a secondary database system, the secondary database system being configured as a backup system for the primary database system. The method includes compiling a query received from a client application. The query indicates one or more source tables that have corresponding replicate tables. The compiling comprises replacing the one or more source tables with the corresponding replicate tables. The method can generate a query execution plan based on the compiled query, determine a per-table result lag based on minimum commit times of the corresponding replicate tables, and compare the per-table result lag with a predefined threshold. Responsive to a determination that the per-table result lag exceeds the predefined threshold, the method can recompile the query by falling back to the one or more source tables from the corresponding replicate tables.
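The threshold comparison can be illustrated with a small sketch. The abstract does not define the lag formula, so this example assumes the lag is the current time minus the oldest last-commit time among the replicate tables referenced by the query. The function names, table names, and timestamps are hypothetical.

```python
def result_lag(replica_commit_times, now):
    """Assumed lag: time elapsed since the least recently committed replica."""
    return now - min(replica_commit_times.values())


def choose_tables(source_to_replica, replica_commit_times, now, max_lag):
    """Return (tables to query, lag), falling back to sources when lag is too high."""
    replicas = list(source_to_replica.values())
    lag = result_lag({name: replica_commit_times[name] for name in replicas}, now)
    if lag > max_lag:
        # Replicas are too stale: recompile against the source tables.
        return list(source_to_replica), lag
    return replicas, lag


mapping = {"orders": "orders_r", "items": "items_r"}
commits = {"orders_r": 95.0, "items_r": 80.0}
stale = choose_tables(mapping, commits, now=100.0, max_lag=10.0)
fresh = choose_tables(mapping, commits, now=100.0, max_lag=30.0)
```

Here the oldest replica commit is 20 time units behind, so a threshold of 10 routes the query back to the source tables while a threshold of 30 keeps it on the replicate tables.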
Query plan generation and execution based on single value columns
Aspects of the current subject matter are directed to executing queries on tables in which one or more columns contain a single value. Upon execution of a query, columns in which a single value is contained are identified, and a pre-compiled code entry containing relevant identifying information is compiled as part of a query execution plan. The query execution plan is used for subsequent query executions, alleviating the need to access the columns during the subsequent query executions that involve the columns. A fingerprint value may be used to track if changes to relevant tables occur.
GENERATING DEBUGGING INFORMATION FOR QUERY PLAN STEPS
A query plan includes steps to implement a query and debug steps interleaved among the steps. An execution engine of a database system executes each step of the query plan to realize the query. The execution engine executes each debug step of the query plan to generate debugging information for the step preceding the debug step within the query plan. The debugging information is queryable.
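The interleaving of debug steps can be sketched as below. The example assumes that a plan is a list of named functions over in-memory rows, that each debug step records the row count and elapsed time of the step preceding it, and that "queryable" means the records are kept as plain dictionaries that can be filtered. None of these details, nor the names used, come from the patent.

```python
import time


def with_debug_steps(steps):
    """Insert a debug step after every regular step of the plan."""
    plan = []
    for name, fn in steps:
        plan.append(("step", name, fn))
        plan.append(("debug", name, None))
    return plan


def execute(plan, data=None):
    """Run each step; each debug step logs information about the preceding step."""
    debug_log = []
    elapsed = 0.0
    for kind, name, fn in plan:
        if kind == "step":
            start = time.perf_counter()
            data = fn(data)
            elapsed = time.perf_counter() - start
        else:
            debug_log.append(
                {"step": name, "rows_out": len(data), "seconds": elapsed}
            )
    return data, debug_log


# Hypothetical two-step plan: scan ten rows, then keep the even ones.
steps = [
    ("scan", lambda _: list(range(10))),
    ("filter", lambda rows: [r for r in rows if r % 2 == 0]),
]
result, log = execute(with_debug_steps(steps))

# "Querying" the debugging information: steps that emitted fewer than 10 rows.
reducing = [entry["step"] for entry in log if entry["rows_out"] < 10]
```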
System and methods for performing updated query requests in a system of multiple database engines
A processor may receive a query request with a column-oriented operation. The query request may be related to updates to a row-oriented database based on the query request previously performed on the row-oriented database. The processor may determine the amount of time for replicating the updates to a column-oriented database. If the amount of time for replicating the updates to the column-oriented database is greater than a threshold amount of time, then the processor may create a transient file with the updates. The transient file may be in a format that is recognized by the column-oriented database. The processor may query the column-oriented database based on the query request and the transient file.