Patent classifications
G06F16/24545
MONITORING RESOURCES OF A VIRTUAL DATA WAREHOUSE
Example resource management systems and methods are described. The method includes providing a plurality of processors and a plurality of cache memories in association with a virtual data warehouse management system, wherein each of the plurality of processors is associated with a stateless node. The method includes monitoring a performance of at least one processor of the plurality of processors to process database data. The method includes detecting a failure associated with the stateless node to process the database data responsive to monitoring the performance of the at least one processor of the plurality of processors to process the database data. The method includes replacing the stateless node with a different node without recreating a particular state.
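The replacement step can be illustrated with a minimal sketch. The class and function names below (StatelessNode, run_with_replacement) are hypothetical and do not come from the abstract, and failure detection is reduced to catching an exception, which is an assumption. The point shown is that the replacement node is created fresh, with nothing copied from the failed node.

```python
import itertools


class StatelessNode:
    """Hypothetical processing node that keeps no state between requests."""

    _ids = itertools.count(1)

    def __init__(self):
        self.node_id = next(StatelessNode._ids)
        self.healthy = True

    def process(self, rows):
        # Simulated failure: an unhealthy node cannot process database data.
        if not self.healthy:
            raise RuntimeError(f"node {self.node_id} failed")
        return sum(rows)


def run_with_replacement(nodes, index, rows):
    """Run work on nodes[index]; on failure, swap in a fresh node and retry once."""
    try:
        return nodes[index].process(rows)
    except RuntimeError:
        # No state is recreated: the new node is simply constructed from scratch.
        nodes[index] = StatelessNode()
        return nodes[index].process(rows)


nodes = [StatelessNode(), StatelessNode()]
nodes[0].healthy = False          # inject a failure for the example
failed_id = nodes[0].node_id
result = run_with_replacement(nodes, 0, [1, 2, 3])
```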
SYSTEMS AND METHODS FOR ENABLING TWO PARTIES TO FIND AN INTERSECTION BETWEEN PRIVATE DATA SETS WITHOUT LEARNING ANYTHING OTHER THAN THE INTERSECTION OF THE DATASETS
A system and method are disclosed for comparing private sets of data. The method includes encoding first elements of a first data set such that each element of the first data set is assigned a respective number in a first table, encoding second elements of a second data set such that each element of the second data set is assigned a respective number in a second table, applying a private compare function to compute an equality of each row of the first table and the second table to yield an analysis and, based on the analysis, generating a unique index of similar elements between the first data set and the second data set.
LEARNING-BASED WORKLOAD RESOURCE OPTIMIZATION FOR DATABASE MANAGEMENT SYSTEMS
A DBMS training subsystem trains a DBMS workload-manager model with training data identifying resources used to execute previous DBMS data-access requests. The subsystem integrates each request's high-level features and compile-time operations into a vector and clusters similar vectors into templates. The requests are divided into workloads each represented by a training histogram that describes the distribution of templates associated with the workload and identifies the total amounts and types of resources consumed when executing the entire workload. The resulting knowledge is used to train the model to predict production resource requirements by: i) organizing production queries into candidate workloads; ii) deriving for each candidate a histogram similar in form and function to the training histograms; iii) using the newly derived histograms to predict each candidate's resource requirements; iv) selecting the candidate with the greatest resource requirements capable of being satisfied with available resources; and v) executing the selected workload.
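Steps ii) through iv) can be sketched as follows. This is an illustration under stated assumptions, not the trained model from the abstract: a fixed per-template cost table stands in for the model's prediction, resources are collapsed into a single number, and all names are hypothetical.

```python
from collections import Counter


def template_histogram(template_ids):
    """Count how many requests in a workload fall into each template."""
    return Counter(template_ids)


def predict_cost(histogram, cost_per_template):
    """Stand-in for the trained model: linear cost over template counts (assumption)."""
    return sum(count * cost_per_template[t] for t, count in histogram.items())


def select_workload(candidates, cost_per_template, available):
    """Pick the candidate with the greatest predicted need that still fits."""
    best_name, best_cost = None, -1.0
    for name, template_ids in candidates.items():
        cost = predict_cost(template_histogram(template_ids), cost_per_template)
        if cost <= available and cost > best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Hypothetical candidate workloads, each a list of template labels.
candidates = {
    "w1": ["scan", "scan", "join"],
    "w2": ["join", "join", "join"],
    "w3": ["scan"],
}
cost_per_template = {"scan": 1.0, "join": 4.0}
chosen = select_workload(candidates, cost_per_template, available=8.0)
```

With 8.0 units available, w2 (predicted 12.0) does not fit, so the sketch selects w1 (predicted 6.0) over w3 (predicted 1.0).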
DATABASE QUERY PERFORMANCE IMPROVEMENT
An approach for optimizing statistical query performance. The approach receives a structured query language (SQL) set. The approach identifies a first set of parameters associated with the statements of the SQL set. The approach creates a merged SQL statement based on one or more matching parameters of SQL statements in the SQL set. The approach binds a second set of parameters associated with the merged SQL statement to the merged SQL statement. The approach generates a SQL statement based on the merged SQL statement. The approach generates a remote SQL statement based on the SQL statement. The approach executes a commit statement on the remote SQL statement.
METHOD AND SYSTEM FOR RECOMMENDING INDEXES BY CLOUD COMPUTATION
A method includes: acquiring unit computation cost and unit storage cost of a currently used cloud computation server in unit time; acquiring all historical query statements of a target user, extracting common characteristics of all the historical query statements, and determining query indexes corresponding to the historical query statements according to the common characteristics; determining query cost of each query index according to the frequency and time of querying a database through the query index and the used computation resources; determining a plurality of current query indexes corresponding to the current query statement based on the acquired current query statement of the target user; determining the total cost corresponding to each current query index according to the plurality of current query indexes through the unit computation cost, the unit storage cost, and the computation resource usage amount and usage time; and recommending a target query index to the target user.
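The total-cost comparison can be sketched as below. The abstract does not give the cost formula or the selection criterion, so both are assumptions here: cost is taken as unit price times usage amount times usage time for computation and for storage, and the cheapest candidate is recommended. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexCandidate:
    name: str
    compute_units: float   # computation resources used per unit time
    usage_time: float      # time units spent querying through the index
    storage_units: float   # storage occupied by the index


def total_cost(candidate, unit_compute_cost, unit_storage_cost):
    """Assumed formula: compute charge plus storage charge over the usage time."""
    compute = unit_compute_cost * candidate.compute_units * candidate.usage_time
    storage = unit_storage_cost * candidate.storage_units * candidate.usage_time
    return compute + storage


def recommend(candidates, unit_compute_cost, unit_storage_cost):
    """Assumed criterion: recommend the candidate with the lowest total cost."""
    return min(
        candidates,
        key=lambda c: total_cost(c, unit_compute_cost, unit_storage_cost),
    )


candidates = [
    IndexCandidate("idx_a", compute_units=2.0, usage_time=10.0, storage_units=5.0),
    IndexCandidate("idx_b", compute_units=1.0, usage_time=10.0, storage_units=40.0),
]
target = recommend(candidates, unit_compute_cost=0.5, unit_storage_cost=0.25)
```

Here idx_b uses less computation but far more storage, so idx_a is the cheaper recommendation under the assumed prices.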
QUERY CACHED FILTER
A method for responding to a query may include (a) receiving, by a storage system compute element, a query that comprises one or more conditions related to a content of at least one data unit (DU); (b) searching, based on the one or more conditions and on condition fulfillment information (CFI), for one or more irrelevant groups of DUs to be skipped while responding to the query; wherein the one or more irrelevant groups of DUs belong to multiple stored groups of DUs that are stored in the storage system; wherein an irrelevant group of DUs does not comprise, according to the CFI, any DU that fulfills the one or more conditions; and (c) generating a response to the query based on an outcome of the searching.
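A minimal sketch of skipping irrelevant groups follows. The abstract does not specify how the CFI is structured or populated, so this example assumes the CFI maps a condition key to the set of group identifiers known to hold no matching DU, and that a group is recorded as irrelevant after a scan finds no match in it. All names are hypothetical.

```python
def respond(groups, condition_key, predicate, cfi):
    """Answer a query, skipping groups the CFI marks as irrelevant.

    groups:        dict of group id -> list of data units (DUs)
    condition_key: hashable label identifying the query condition
    predicate:     function returning True when a DU fulfills the condition
    cfi:           dict of condition key -> set of irrelevant group ids
    """
    irrelevant = cfi.setdefault(condition_key, set())
    matches, scanned = [], []
    for group_id, units in groups.items():
        if group_id in irrelevant:
            continue  # skipped: no DU in this group fulfills the condition
        scanned.append(group_id)
        hits = [u for u in units if predicate(u)]
        if hits:
            matches.extend(hits)
        else:
            # Assumption: remember the miss so later queries can skip the group.
            irrelevant.add(group_id)
    return matches, scanned


groups = {"g1": [1, 2, 3], "g2": [10, 11], "g3": [20, 25]}
cfi = {}
first_matches, first_scanned = respond(groups, "value>=10", lambda u: u >= 10, cfi)
second_matches, second_scanned = respond(groups, "value>=10", lambda u: u >= 10, cfi)
```

The first call scans all three groups; the second returns the same matches while skipping g1. A real system would also need to invalidate CFI entries when a group's content changes, which this sketch omits.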
QUERY-BASED ROUTING OF DATABASE REQUESTS
Query-based routing of database requests is disclosed. In various embodiments, a database request is received via a communication interface. The request is parsed to extract one or more data elements associated with the request. Based at least in part on the one or more data elements extracted from the request, one of a plurality of partial data set instances is selected, each partial data set instance including a corresponding subset of data from a set of origin data. The request is routed to the selected partial data set instance.
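The parse-then-route flow can be sketched as follows. This example assumes the extracted data element is the table name after FROM and that each partial data set instance holds one table; neither detail is stated in the abstract. The instance names and the fallback to the origin data are hypothetical, and the regular expression is a simplification rather than a full SQL parser.

```python
import re

# Hypothetical mapping from table name to the partial data set instance holding it.
INSTANCES = {"orders": "instance-a", "customers": "instance-b"}

_FROM_TABLE = re.compile(r"\bFROM\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def extract_table(sql):
    """Extract the first table name after FROM, or None if there is none."""
    match = _FROM_TABLE.search(sql)
    return match.group(1).lower() if match else None


def route(sql, default="origin"):
    """Select the instance for the request, falling back to the origin data."""
    return INSTANCES.get(extract_table(sql), default)


orders_target = route("SELECT * FROM Orders WHERE id = 7")
fallback_target = route("SELECT 1")
```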
OFFLOADING STATISTICS COLLECTION
Methods and systems for generating database statistics. Table statistics in a metadata catalog of a source database system are observed, statistics generation costs utilizing a target database system are estimated, and source statistics generation costs utilizing a source database system are estimated. The statistics generation costs are compared and statistics generation queries by the target database system are triggered in response to the statistics generation costs utilizing the target database system having a predefined relationship with the source statistics generation costs utilizing the source database system. The statistics generation queries are performed by the target database system in response to the triggering by the source database system. The generated statistics are sent from the target database system to the source database system, the table statistics in a metadata catalog are updated based on the generated statistics, and the updated table statistics are used to optimize a query plan.
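The cost comparison and catalog update can be sketched as below. The abstract leaves the "predefined relationship" open; this example assumes it means the target's estimated cost, plus an optional transfer cost, is lower than the source's. Generating the statistics on the source when the comparison fails is also an assumption, and all function names are hypothetical.

```python
def should_offload(source_cost, target_cost, transfer_cost=0.0):
    """Assumed relationship: offload when the target is estimated to be cheaper."""
    return target_cost + transfer_cost < source_cost


def refresh_statistics(catalog, table, source_cost, target_cost,
                       run_on_target, run_on_source):
    """Generate statistics on the cheaper system and update the metadata catalog."""
    if should_offload(source_cost, target_cost):
        stats = run_on_target(table)   # statistics are sent back to the source
    else:
        stats = run_on_source(table)
    catalog[table] = stats
    return stats


catalog = {}
stats = refresh_statistics(
    catalog,
    "orders",
    source_cost=10.0,
    target_cost=3.0,
    run_on_target=lambda t: {"table": t, "rows": 100, "generated_by": "target"},
    run_on_source=lambda t: {"table": t, "rows": 100, "generated_by": "source"},
)
```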
DYNAMIC ROUTING METHOD AND APPARATUS FOR QUERY ENGINE IN PRE-COMPUTING SYSTEM
The present application discloses a dynamic routing method and apparatus for a query engine in a pre-computing system. The method includes: pre-obtaining cube data under a preset dimensional combination in a pre-computing system; after a query request is received, determining a degree of aggregation of the cube data expected to be selected under the preset dimensional combination; executing query processing on the query request in a first distributed query engine when the degree of aggregation of the cube data under the preset dimensional combination is high; and switching to a second distributed query engine to execute query processing on the query request when the degree of aggregation of the cube data under the preset dimensional combination is low. The present application solves the technical problem that the query response speed of the pre-computing query system is not ideal. Through the present application, sub-second, high-performance query response can be achieved. As a result, higher concurrency can be supported to meet business needs, and the stability of the query system is guaranteed at the same time.
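The routing decision can be sketched as below. The abstract does not define how the degree of aggregation is measured or where the boundary between high and low lies. This example assumes the degree is the ratio of source rows to pre-computed cube rows and uses an arbitrary threshold of 100; the engine labels are placeholders.

```python
def aggregation_degree(source_rows, cube_rows):
    """Assumed measure: how many source rows each pre-computed cube row summarizes."""
    if cube_rows <= 0:
        raise ValueError("cube_rows must be positive")
    return source_rows / cube_rows


def choose_engine(source_rows, cube_rows, threshold=100.0):
    """Route to the first engine for highly aggregated data, else to the second."""
    if aggregation_degree(source_rows, cube_rows) >= threshold:
        return "first_engine"
    return "second_engine"


high = choose_engine(source_rows=1_000_000, cube_rows=1_000)     # ratio 1000
low = choose_engine(source_rows=1_000_000, cube_rows=500_000)    # ratio 2
```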
DISTRIBUTED HISTOGRAM COMPUTATION FRAMEWORK USING DATA STREAM SKETCHES AND SAMPLES
Methods for distributed histogram computation in a framework utilizing data stream sketches and samples are performed by systems and devices. Distributions of large data sets are scanned once and processed by a computing pool, without sorting, to generate local sketches and value samples of each distribution. The local sketches and samples are utilized to construct local histograms on which cardinality estimates are obtained for query plan generation of distributed queries against distributions. Local statistics of distributions are also merged and consolidated to construct a global histogram representative of the entire data set. The global histogram is utilized to determine a cardinality estimation for query plan generation of incoming queries against the entire data set. The addition of new data to a data set or distribution involves a scan of the new data from which new statistics are generated and then merged with existing statistics for a new global histogram.
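The merge step can be illustrated with a simplified sketch. It swaps in plain equi-width histograms with shared bucket boundaries in place of the sketches and samples named in the abstract, so it shows only how local statistics combine into a global histogram and how new data is folded in without rescanning existing data. Function names and the bucket width are assumptions.

```python
from collections import Counter


def local_histogram(values, width):
    """Single pass, no sorting: count integer values per bucket of the given width."""
    return Counter((v // width) * width for v in values)


def merge(histograms):
    """Combine histograms that share bucket boundaries by adding their counts."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total


def estimate_range(hist, lo, hi, width):
    """Estimate rows with lo <= v < hi, assuming uniform spread inside each bucket."""
    estimate = 0.0
    for start, count in hist.items():
        overlap = max(0, min(hi, start + width) - max(lo, start))
        estimate += count * overlap / width
    return estimate


WIDTH = 10
distribution_1 = [1, 3, 12, 18]
distribution_2 = [5, 25, 27]
global_hist = merge([
    local_histogram(distribution_1, WIDTH),
    local_histogram(distribution_2, WIDTH),
])
cardinality = estimate_range(global_hist, 0, 20, WIDTH)

# New data: scan only the new rows, then merge with the existing statistics.
updated_hist = merge([global_hist, local_histogram([14], WIDTH)])
```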