Patent classifications
G06F16/24556
Estimating query cardinality
A method comprising: receiving a plurality of pairs of queries associated with a database, wherein the queries in each pair in the plurality of pairs of queries have an identical FROM clause; at a training stage, training a machine learning model on a training set comprising: (i) the plurality of pairs of queries, and (ii) labels associated with containment rates between each of the pairs of queries over the database; and at an inference stage, applying the trained machine learning model to a pair of target queries, to estimate containment rates between the target pair of queries over the database.
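To make the training labels concrete, here is a minimal sketch of how a containment rate between two queries might be computed. It assumes the containment rate of Q1 in Q2 is the fraction of Q1's result rows that also appear in Q2's result, and that both queries select whole rows. Because the two queries share a FROM clause, the common rows are those satisfying both WHERE predicates. The `orders` table, its columns, and the helper `containment_rate` are hypothetical and not taken from the abstract.

```python
import sqlite3


def containment_rate(conn, from_clause, where_q1, where_q2):
    """Fraction of Q1's result rows that also appear in Q2's result.

    Both queries share from_clause, so the rows common to both results
    are the rows that satisfy both WHERE predicates.
    """
    cur = conn.cursor()
    q1_rows = cur.execute(
        f"SELECT COUNT(*) FROM {from_clause} WHERE {where_q1}"
    ).fetchone()[0]
    if q1_rows == 0:
        return 0.0  # assumed convention when Q1 returns nothing
    both_rows = cur.execute(
        f"SELECT COUNT(*) FROM {from_clause} "
        f"WHERE ({where_q1}) AND ({where_q2})"
    ).fetchone()[0]
    return both_rows / q1_rows


# Hypothetical toy database: ten orders with amounts 10, 20, ..., 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 10) for i in range(1, 11)]
)

# One labeled training example: (Q1, Q2) -> containment rate of Q1 in Q2.
rate_q1_in_q2 = containment_rate(conn, "orders", "amount >= 50", "amount <= 80")
rate_q2_in_q1 = containment_rate(conn, "orders", "amount <= 80", "amount >= 50")
```

The rate is not symmetric: swapping the two queries changes the denominator, so a training set would typically carry a label for each direction.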
Systems and methods for data visualization, dashboard creation and management
Provided is a visualization system that enables generation of a “dashboard” of individual visualizations. In further embodiments, the system enables users to quickly and easily generate these visualizations and integrate complex filters, queries, aggregations, etc., with simple UI input. The visualizations can be provided as a service that requests information from an underlying database. The database itself may also be hosted as a service, permitting granular and native database functions layered with the visualization architecture. The system can support additional functionality and access management to generate visualizations that can be shared with other users and/or integrated into websites, blogs, etc. The system can handle the complex logic, data interactions, dynamic data transformation, dynamic authorization, etc., needed to manage data rules (e.g., access rules layered over database permission-based control, summarization/aggregation requirements, etc.) for any data being rendered in an individual visualization and/or the dashboard of multiple visualizations.
Chaining bloom filters to estimate the number of keys with low frequencies in a dataset
Techniques are described for generating an approximate frequency histogram using a series of Bloom filters (BF). For example, to estimate the f1 and f2 cardinalities in a dataset, an ordered chain of three BFs is established (“BF1”, “BF2”, and “BF3”). An insertion operation is performed for each datum in the dataset, whereby the BFs are tested in order (starting at BF1) for the datum. If the datum is represented in a currently-tested BF, the subsequent BF in the chain is tested for the datum. If the datum is not represented in the currently-tested BF, the datum is added to the BF, a counter for the BF is incremented, and the insertion operation for the current datum ends. To estimate the cardinality of f1-values in the dataset, the BF2-counter is subtracted from the BF1-counter. Similarly, to estimate the cardinality of f2-values in the dataset, the BF3-counter is subtracted from the BF2-counter.
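The insertion and estimation steps described above can be sketched as follows. This is an illustrative toy, not the patented implementation: the `BloomFilter` class, its sizing (65,536 bits, 3 hash positions derived from SHA-256), and the function names are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with a counter of successful additions."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)
        self.counter = 0  # how many data were added to this filter

    def _positions(self, datum):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{datum}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, datum):
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(datum)
        )

    def add(self, datum):
        for p in self._positions(datum):
            self.bits[p // 8] |= 1 << (p % 8)
        self.counter += 1


def insert(chain, datum):
    """Test the filters in order; add to the first one missing the datum."""
    for bf in chain:
        if datum in bf:
            continue  # already represented here, so test the next filter
        bf.add(datum)  # also increments this filter's counter
        return  # the insertion operation for this datum ends


def estimate_low_frequencies(dataset):
    """Return (f1, f2): estimated counts of keys seen once and twice."""
    chain = [BloomFilter() for _ in range(3)]
    for datum in dataset:
        insert(chain, datum)
    bf1, bf2, bf3 = chain
    return bf1.counter - bf2.counter, bf2.counter - bf3.counter


f1, f2 = estimate_low_frequencies(["a", "b", "b", "c", "c", "c", "d", "e", "e"])
```

In the sample dataset, "a" and "d" occur once and "b" and "e" occur twice, so barring Bloom filter false positives both estimates come out as 2. The estimates are approximate in general because a false positive in one filter pushes a datum further down the chain than its true frequency warrants.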
Distributed query execution and aggregation
Computer-implemented methods and systems are disclosed for receiving and indexing a plurality of files for later querying, for dynamically generating scripts to be executed during a query of a data store, and for horizontally distributing a query and aggregating results of the distributed query.
Private joining, analysis and sharing of information located on a plurality of information stores
According to examples, a system for generating and delivering enhanced content utilizing remote rendering and data streaming is described. The system may include a processor and a memory storing instructions. The processor, when executing the instructions, may cause the system to access a first data store with first information and a second data store with second information and align the first information with the second information to generate an aligned set. The processor, when executing the instructions, may then perform a computation on one or more identifiers utilizing the generated aligned set and reveal a differentially private output to one or more receiving parties.
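The align, compute, and reveal flow can be illustrated with a small sketch. Several assumptions apply: the two stores are plain dictionaries keyed by a shared identifier, the computation is a simple count, and differential privacy is provided by the standard Laplace mechanism with sensitivity 1. The alignment is done in the clear here, whereas a real system would presumably protect it cryptographically. All names and data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def align(first_store, second_store):
    """Join the two stores on the identifiers they have in common."""
    shared = first_store.keys() & second_store.keys()
    return {key: (first_store[key], second_store[key]) for key in shared}


def private_count(aligned, predicate, epsilon, rng):
    """Count aligned records matching predicate, plus Laplace noise.

    A count changes by at most 1 when one record is added or removed,
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for a, b in aligned.values() if predicate(a, b))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical stores: ad views per user and purchase totals per user.
ad_views = {"u1": 3, "u2": 0, "u3": 5}
purchases = {"u2": 20.0, "u3": 35.0, "u4": 10.0}

aligned = align(ad_views, purchases)
rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = private_count(
    aligned, lambda views, spend: views > 0 and spend > 0, epsilon=1.0, rng=rng
)
```

Only `noisy` would be revealed to the receiving parties; smaller values of `epsilon` add more noise and give stronger privacy at the cost of accuracy.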
Query processing for disk based hybrid transactional analytical processing system
A method for processing a query may include receiving a query associated with one or more predicate columns and one or more aggregate columns. To respond to the query, one or more partial data pages including the one or more predicate columns but not the one or more aggregate columns may be loaded from disk to memory. For each partial data page, a first value occupying the one or more predicate columns may be evaluated to identify one or more rows satisfying a predicate associated with the query. A portion of a data page containing the aggregate columns may be loaded from disk into memory. A result of the query corresponding to a second value occupying the aggregate columns may be generated based on the portion of the data page loaded in the memory. Related systems and articles of manufacture are also provided.
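The two-phase loading described above can be sketched with a toy in-memory "disk". The `Disk` class, its column-wise page layout, and the `sum_where` helper are assumptions made for illustration; the abstract does not specify a page format or an API.

```python
class Disk:
    """Toy column-wise page store that records which columns are read."""

    def __init__(self, pages):
        self.pages = pages  # list of {column_name: [values]}
        self.reads = []     # log of (page_no, columns) loads

    def load(self, page_no, columns, rows=None):
        """Load selected columns (and optionally selected rows) of a page."""
        self.reads.append((page_no, tuple(columns)))
        page = self.pages[page_no]
        if rows is None:
            return {c: list(page[c]) for c in columns}
        return {c: [page[c][r] for r in rows] for c in columns}


def sum_where(disk, predicate_col, predicate, aggregate_col):
    """SUM(aggregate_col) over rows whose predicate_col satisfies predicate."""
    total = 0
    for page_no in range(len(disk.pages)):
        # Phase 1: load a partial page holding only the predicate column.
        partial = disk.load(page_no, [predicate_col])
        rows = [i for i, v in enumerate(partial[predicate_col]) if predicate(v)]
        if not rows:
            continue  # no matching rows, so the aggregate column is never read
        # Phase 2: load only the portion holding the aggregate column.
        portion = disk.load(page_no, [aggregate_col], rows)
        total += sum(portion[aggregate_col])
    return total


disk = Disk([
    {"region": ["EU", "US", "EU"], "amount": [10, 20, 30]},
    {"region": ["US", "US"], "amount": [5, 7]},
])
total = sum_where(disk, "region", lambda r: r == "EU", "amount")
```

In this example the second page contains no matching rows, so its `amount` column is never loaded, which is the saving the approach aims for.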
Algorithm for dates range layers for historical data
The present invention is an algorithm for storing and accessing historical data using dates range layers. The approach is intended to aggregate almost any kind of historical data (such as, but not limited to, web server logs, network traffic, finance transactions, marketing data, or sports statistics) into dates range layers, which can be saved to almost any data storage so that the data can be accessed quickly without extra calculations.
Systems and methods for provisioning a new secondary IdentityIQ instance to an existing IdentityIQ instance
Systems and methods for provisioning a new secondary IdentityIQ instance to an existing IdentityIQ instance are disclosed. In one embodiment, a method may include: receiving a request to provision the new secondary IdentityIQ instance; creating a primary IdentityIQ instance for the existing IdentityIQ instance and the new secondary IdentityIQ instance; aggregating data from the existing IdentityIQ instance to the primary IdentityIQ instance; deploying an event handler to the primary IdentityIQ instance to handle incoming requests for the existing IdentityIQ instance; changing a reconciliation process and an audit process from the existing IdentityIQ instance to the primary IdentityIQ instance thereby changing the existing IdentityIQ instance to the secondary IdentityIQ instance for the primary IdentityIQ instance; deploying the new secondary IdentityIQ instance to the primary IdentityIQ instance; and deploying at least one application to the new secondary IdentityIQ instance based on an operation processed by the new secondary IdentityIQ instance.
Computing domain cardinality estimates for optimizing database query execution
A method implements optimization of database queries by computing domain cardinality estimates. A client sends a database query to a server. The method parses the query to identify data columns. For each of the data columns, the method computes a lower bound and an upper bound of distinct data values using a pre-computed table size. The method also computes a patch factor by applying a pre-computed function to a ratio between a number of distinct data values that appear exactly once in a data sample and a number of distinct data values in the sample. Based on the patch factor, the lower bound, and the upper bound, the method computes an estimate of distinct values for each of the data columns. The method subsequently generates an execution plan for the query according to the computed estimates, executes the execution plan, and returns a result set to the client.
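A rough sketch of the estimation step follows. The abstract does not state how the bounds are derived, what the pre-computed function is, or how the three quantities are combined, so everything below is an assumption: the lower bound is the number of distinct values in the sample, the upper bound adds one potential new value per unsampled row, the patch function defaults to the identity, and the estimate interpolates linearly between the bounds.

```python
from collections import Counter


def estimate_distinct(sample, table_size, patch_fn=lambda ratio: ratio):
    """Estimate the number of distinct values in a column from a sample.

    Assumed formulas (not taken from the abstract):
      lower = d                      distinct values seen in the sample
      upper = d + (N - n)            every unsampled row could be new
      patch = patch_fn(f1 / d)       f1 = values seen exactly once
      estimate = lower + patch * (upper - lower)
    """
    counts = Counter(sample)
    d = len(counts)
    if d == 0:
        return 0.0
    f1 = sum(1 for c in counts.values() if c == 1)
    lower = d
    upper = d + (table_size - len(sample))
    patch = patch_fn(f1 / d)
    return lower + patch * (upper - lower)


# Hypothetical sample of 6 rows from a 100-row table.
# Distinct values: 1, 2, 3, 4 (d = 4); seen exactly once: 2 and 4 (f1 = 2).
estimate = estimate_distinct([1, 1, 2, 3, 3, 4], table_size=100)
```

The intuition behind using the ratio of singletons is that a sample full of values seen only once suggests many more unseen values in the table, pushing the estimate toward the upper bound, while a sample with few singletons suggests the domain is nearly exhausted.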