Patent classifications
G06F16/24545
Estimating distinct values for database systems
Some embodiments provide a non-transitory machine-readable medium that stores a program. The program receives a query for data that includes a join operation. The program further generates a plurality of candidate query execution plans based on the query, each candidate query execution plan comprising a set of reduction operations. The program also determines a plurality of execution costs associated with the plurality of sets of reduction operations in the plurality of candidate query execution plans. The program further selects a query execution plan from the plurality of candidate query execution plans based on the plurality of execution costs. The program also executes the query execution plan to generate a set of query results for the query.
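The selection step in this abstract can be illustrated with a minimal sketch. It assumes each candidate plan carries one estimated cost per reduction operation and that the cheapest total wins; the abstract does not say how costs are combined or compared. The names `CandidatePlan`, `plan_cost`, and `select_plan`, and all cost figures, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    """A candidate query execution plan (hypothetical structure)."""
    name: str
    reduction_costs: tuple  # assumed: one estimated cost per reduction operation


def plan_cost(plan):
    """Total estimated cost of a plan's reduction operations (assumed additive)."""
    return sum(plan.reduction_costs)


def select_plan(plans):
    """Pick the candidate with the lowest total cost (assumed selection rule)."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    return min(plans, key=plan_cost)


# Illustrative candidates with made-up cost estimates.
plans = [
    CandidatePlan("semi_join_first", (4.0, 2.5)),
    CandidatePlan("filter_then_join", (1.5, 3.0)),
    CandidatePlan("join_then_filter", (6.0,)),
]
best = select_plan(plans)
```

Here `best` is the plan whose reduction operations have the smallest summed estimate; a real optimizer would derive those estimates from statistics rather than constants.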
Dynamic routing method and apparatus for query engine in pre-computing system
The present application discloses a dynamic routing method and apparatus for a query engine in a pre-computing system. The method includes: pre-obtaining cube data under a preset dimensional combination in a pre-computing system; determining, after a query request is received, a degree of aggregation of the cube data expected to be selected under the preset dimensional combination; executing query processing on the query request in a first distributed query engine when the degree of aggregation of the cube data under the preset dimensional combination is high; and switching to a second distributed query engine to execute query processing on the query request when the degree of aggregation of the cube data under the preset dimensional combination is low. The present application addresses the technical problem that the query response speed of a pre-computing query system is not ideal. It can achieve sub-second, high-performance query responses. It can also support higher concurrency to meet business needs while keeping the query system stable.
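The routing rule in this abstract can be sketched as a single threshold test. The abstract does not define how the degree of aggregation is measured or where "high" ends and "low" begins, so the 0-to-1 scale, the `threshold` default, and the engine labels below are assumptions for illustration only.

```python
def choose_engine(aggregation_degree, threshold=0.5):
    """Route a query by the aggregation degree of the cube data it would hit.

    aggregation_degree: assumed to be normalized to the range [0, 1].
    threshold: hypothetical cut-off between "high" and "low" aggregation.
    """
    if not 0.0 <= aggregation_degree <= 1.0:
        raise ValueError("aggregation_degree must be within [0, 1]")
    if aggregation_degree >= threshold:
        return "first_engine"   # highly aggregated cube data
    return "second_engine"      # weakly aggregated cube data


high_route = choose_engine(0.9)
low_route = choose_engine(0.1)
```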
Efficiently constructing regression models for selectivity estimation
A model generator constructs a model for estimating selectivity of database operations by determining a number of training examples necessary for the model to achieve a target accuracy and by generating approximate selectivity labels for the training examples. The model generator may train the model on an initial number of training examples using cross-validation. The model generator may determine whether the model satisfies the target accuracy and iteratively and geometrically increase the number of training examples based on an optimized geometric step size (which may minimize model construction time) until the model achieves the target accuracy based on a defined confidence level. The model generator may generate labels using a subset of tuples from an intermediate query expression. The model generator may iteratively increase a size of the subset of tuples used until a relative error of the generated labels is below a target threshold.
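The iterative growth of the training set can be sketched as a loop that multiplies the sample count by a fixed step until a target accuracy is met. This sketch leaves out cross-validation, the confidence level, and the optimization of the step size that the abstract mentions. The function names, the `max_size` cap, and the toy accuracy curve are hypothetical.

```python
import math


def grow_until_accurate(evaluate, initial_size, step, target_accuracy, max_size):
    """Geometrically increase the training-set size until the target is met.

    evaluate: hypothetical callback that trains on `size` examples and
              returns the measured accuracy.
    step: geometric growth factor, assumed fixed here (must exceed 1).
    """
    if step <= 1:
        raise ValueError("step must be greater than 1")
    if initial_size < 1:
        raise ValueError("initial_size must be at least 1")
    size = initial_size
    while True:
        accuracy = evaluate(size)
        if accuracy >= target_accuracy or size >= max_size:
            return size, accuracy
        size = min(max_size, math.ceil(size * step))


def toy_accuracy(n):
    """Stand-in learning curve: accuracy improves with the square root of n."""
    return 1.0 - 1.0 / math.sqrt(n)


size, accuracy = grow_until_accurate(toy_accuracy, 10, 2, 0.9, 10_000)
```

With this toy curve the loop evaluates sizes 10, 20, 40, 80, and 160, stopping at the first size whose accuracy reaches the target.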
Distributed memory-augmented neural network architecture
A method for using a distributed memory device in a memory augmented neural network system includes receiving, by a controller, an input query to access data stored in the distributed memory device, the distributed memory device comprising a plurality of memory banks. The method further includes determining, by the controller, a memory bank selector that identifies a memory bank from the distributed memory device for memory access, wherein the memory bank selector is determined based on a type of workload associated with the input query. The method further includes computing, by the controller and by using content based access, a memory address in the identified memory bank. The method further includes generating, by the controller, an output in response to the input query by accessing the memory address.
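The two-stage access described here, choosing a bank by workload type and then addressing within it by content, can be sketched as follows. The mapping from workload to bank, the use of cosine similarity for content-based access, and every name in the block are assumptions; the abstract does not specify them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def read_memory(banks, bank_for_workload, workload, query_key):
    """Select a bank by workload type, then address it by content."""
    bank = banks[bank_for_workload[workload]]  # memory bank selector
    # Content-based access: the slot whose key best matches the query.
    address = max(range(len(bank)), key=lambda i: cosine(bank[i]["key"], query_key))
    return address, bank[address]["value"]


# Hypothetical banks holding (key, value) slots.
banks = {
    "vision_bank": [
        {"key": (1.0, 0.0), "value": "edge"},
        {"key": (0.0, 1.0), "value": "texture"},
    ],
    "text_bank": [
        {"key": (1.0, 1.0), "value": "token"},
    ],
}
bank_for_workload = {"image": "vision_bank", "language": "text_bank"}

address, value = read_memory(banks, bank_for_workload, "image", (0.1, 0.9))
```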
Automated optimization of number-of-frequent-value database statistics
A maintenance subsystem of a database-management system (DBMS) receives a database query that requests access to data stored in a database column. The subsystem retrieves or infers frequent-value statistics for that column, each of which specifies the number of times one distinct value is stored in the column. The statistics are partitioned into Keep and Discard clusters and, using statistical or other computational methods based on the column's data distribution, the subsystem determines an optimal number of the statistics that should be kept by the DBMS in order to minimize cost, errors, or other parameters desired by an implementer. The subsystem then directly or indirectly directs a query-optimizer component of the DBMS to consider the optimal number of frequent-value statistics when selecting an optimal data-access plan. The selected plan is then used by the DBMS's storage-manager component to access the column when servicing the received query.
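The Keep/Discard partition can be sketched with a simple coverage rule: keep the most frequent values until they account for a chosen fraction of the rows. The abstract leaves the actual method open ("statistical or other computational methods"), so the coverage heuristic, the `coverage` parameter, and the function name are assumptions, not the claimed technique.

```python
from collections import Counter


def partition_frequent_values(column, coverage=0.8):
    """Split per-value counts into Keep and Discard clusters.

    Keeps the most frequent values until they cover `coverage` of all
    rows (a hypothetical stopping rule chosen for illustration).
    """
    counts = Counter(column)
    total = len(column)
    keep, covered = [], 0
    for value, count in counts.most_common():
        if covered / total >= coverage:
            break
        keep.append((value, count))
        covered += count
    kept_values = {value for value, _ in keep}
    discard = [(v, c) for v, c in counts.items() if v not in kept_values]
    return keep, discard


column = ["a"] * 6 + ["b"] * 3 + ["c"]
keep, discard = partition_frequent_values(column, coverage=0.8)
```

The length of `keep` plays the role of the number of frequent-value statistics that the optimizer would be told to consider.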
Generation of optimized logic from a schema
A method includes accessing a schema that specifies relationships among datasets, computations on the datasets, or transformations of the datasets, selecting a dataset from among the datasets, and identifying, from the schema, other datasets that are related to the selected dataset. Attributes of the datasets are identified, and logical data representing the identified attributes and relationships among the attributes is generated. The logical data is provided to a development environment, which provides access to portions of the logical data representing the identified attributes. A specification that specifies at least one of the identified attributes in performing an operation is received from the development environment. Based on the specification and the relationships among the identified attributes represented by the logical data, a computer program is generated to perform the operation by accessing, from storage, at least one dataset having the at least one of the attributes specified in the specification.
Content resonance
A real-time messaging platform and method is disclosed which classifies messages in accordance with a combination of user engagement events as modified to reflect the temporal structure of the user engagement events. A message can be assigned a metric based, for example, on a weighted combination of user engagement rates, decayed with time to reflect an intuition that recent interactions by one or more users with the message will have a greater impact than older interactions with the message. Different types of interaction by one or more users with the message can be assigned different weights when the different engagement events are combined and, also, can be assigned different temporal characteristics.
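A time-decayed, weighted engagement metric of the kind described can be sketched as below. Exponential half-life decay is an assumption (the abstract only says the metric is "decayed with time"), and the sketch scores individual events rather than engagement rates. The event kinds, weights, and half-lives are hypothetical.

```python
def resonance_score(events, weights, half_lives, now):
    """Sum weighted engagement events, each decayed by its age.

    events: iterable of (kind, timestamp) pairs.
    weights: per-kind weight, so different interactions count differently.
    half_lives: per-kind half-life, so different interactions fade at
                different speeds (assumed exponential decay).
    """
    score = 0.0
    for kind, timestamp in events:
        age = max(0.0, now - timestamp)
        decay = 0.5 ** (age / half_lives[kind])
        score += weights[kind] * decay
    return score


events = [("reply", 100.0), ("favorite", 40.0)]
weights = {"reply": 2.0, "favorite": 1.0}
half_lives = {"reply": 60.0, "favorite": 60.0}
score = resonance_score(events, weights, half_lives, now=100.0)
```

In this example the fresh reply contributes its full weight, while the favorite, one half-life old, contributes half of its weight.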
Robustness metrics for optimization of query execution plans
A method may include responding to a query to retrieve data from a database by identifying a plurality of query execution plans. An overall robustness value may be determined for each query execution plan. The overall robustness value of a query execution plan may correspond to a sum of individual robustness values for each operator included in the query execution plan. Each operator may have an individual robustness value that corresponds to a first change in a total cost of a query execution plan including the operator relative to a second change in an output cardinality of the operator. One of the plurality of query execution plans may be selected based on the overall robustness value of each of the plurality of query execution plans. The query may be executed by performing a sequence of operators included in the selected one of the plurality of query execution plans.
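The robustness computation can be sketched directly from the abstract: each operator's value is a change in total plan cost divided by a change in that operator's output cardinality, and a plan's overall value is the sum over its operators. Treating a lower sum as more robust, and choosing the plan with the minimum, is an assumption; the abstract does not state the selection direction. Plan names and numbers are made up.

```python
def operator_robustness(cost_change, cardinality_change):
    """Change in total plan cost relative to change in output cardinality."""
    if cardinality_change == 0:
        raise ValueError("cardinality_change must be non-zero")
    return cost_change / cardinality_change


def overall_robustness(operators):
    """Sum of the individual robustness values of a plan's operators."""
    return sum(operator_robustness(dc, dn) for dc, dn in operators)


# Each operator is a (cost_change, cardinality_change) pair (illustrative).
plans = {
    "hash_join_plan": [(10.0, 100.0), (5.0, 50.0)],
    "nested_loop_plan": [(80.0, 100.0), (5.0, 50.0)],
}

# Assumed rule: prefer the plan whose cost reacts least to cardinality changes.
chosen = min(plans, key=lambda name: overall_robustness(plans[name]))
```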
Optimal query scheduling for resource utilization optimization
Embodiments of the present invention provide a method, system and computer program product for optimal query scheduling for resource utilization optimization. In an embodiment of the invention, a process for optimal query scheduling includes receiving in an information retrieval data processing system at a contemporaneous time, a request for deferred query execution of a specified query to a future time after the contemporaneous time. The method additionally includes determining a frequency of change of data corresponding to a field referenced in the specified query. Then, on condition that the frequency of change is below a threshold value, an intermediate time prior to the future time but after the contemporaneous time can be identified and the specified query scheduled for execution at the intermediate time instead of the future time. Otherwise, the specified query can be scheduled at the future time as originally requested.
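The scheduling decision can be sketched as one conditional. The abstract does not say how the intermediate time is identified, so the midpoint used here is purely an assumption, as are the function name and the units of `change_frequency`.

```python
def schedule_query(now, requested_time, change_frequency, threshold):
    """Return the time at which a deferred query should run.

    If the referenced data changes rarely (below `threshold`), run the
    query early at an intermediate time; otherwise keep the requested time.
    The midpoint is a hypothetical choice of intermediate time.
    """
    if requested_time <= now:
        raise ValueError("requested_time must be after now")
    if change_frequency < threshold:
        return now + (requested_time - now) / 2
    return requested_time


early = schedule_query(now=0.0, requested_time=100.0, change_frequency=0.1, threshold=1.0)
on_time = schedule_query(now=0.0, requested_time=100.0, change_frequency=5.0, threshold=1.0)
```

A real scheduler would presumably pick the intermediate time from expected resource availability rather than a fixed midpoint.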
Resource management systems and methods
Example resource management systems and methods are described. In one implementation, a resource manager is configured to manage data processing tasks associated with multiple data elements. An execution platform is coupled to the resource manager and includes multiple execution nodes configured to store data retrieved from multiple remote storage devices. Each execution node includes a cache and a processor, where the cache and processor are independent of the remote storage devices. A metadata manager is configured to access metadata associated with at least a portion of the multiple data elements.