Patent classifications
G06F16/283
Processing queries using an index generated based on data segments
A source table organized into a set of batch units is accessed. A set of N-grams is generated for a data value in the source table. The set of N-grams includes a first N-gram of a first length and a second N-gram of a second length, where the first N-gram corresponds to a prefix of the second N-gram. A set of fingerprints is generated for the data value based on the set of N-grams. The set of fingerprints includes a first fingerprint generated based on the first N-gram and a second fingerprint generated based on the second N-gram and the first fingerprint. A pruning index that indexes distinct values in each column of the source table is generated based on the set of fingerprints and stored in a database with an association with the source table.
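The chained-fingerprint idea above can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name a hash function or the N-gram lengths, so BLAKE2 and the lengths 3 and 5 are assumptions.

```python
import hashlib

def chained_fingerprints(value: str, short: int = 3, long: int = 5):
    """For each offset in the value, fingerprint the short N-gram, then
    fold that fingerprint into the hash of the longer N-gram whose
    prefix it is (lengths and hash are illustrative assumptions)."""
    fps = []
    for start in range(len(value) - short + 1):
        first = value[start:start + short]          # first N-gram
        fp1 = hashlib.blake2b(first.encode(), digest_size=8).hexdigest()
        fps.append(fp1)
        if start + long <= len(value):
            second = value[start:start + long]      # second N-gram; first is its prefix
            # Second fingerprint depends on both the longer N-gram and fp1.
            fp2 = hashlib.blake2b((second + fp1).encode(),
                                  digest_size=8).hexdigest()
            fps.append(fp2)
    return fps

fps = chained_fingerprints("database")
```

In a real pruning index these fingerprints would be set per batch unit, so a query predicate can skip batch units whose fingerprint sets cannot contain the searched value.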
DATA EXTRACTION FROM A MULTIDIMENSIONAL DATA STRUCTURE
In some implementations, a device may identify respective sets of unique values for multiple dimensions of a multidimensional data structure. The device may identify a plurality of subsets of permutations of a set of permutations of the unique values. The plurality of subsets of permutations are to be processed in parallel. The device may obtain, based on processing the plurality of subsets of permutations in parallel, respective data associated with each permutation of the plurality of subsets of permutations. The data for a permutation, of the plurality of subsets of permutations, is obtained based on respective unique values for the permutation that are determined independently of another permutation of the plurality of subsets of permutations.
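The extraction scheme above can be sketched in a few lines. The cube contents, chunk size, and thread-based parallelism are assumptions for illustration; the point is that each permutation's data request depends only on that permutation's unique values, so subsets can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical cube: dimension name -> unique values for that dimension.
DIMENSIONS = {
    "region": ["EMEA", "APAC"],
    "year": [2022, 2023],
    "product": ["A", "B"],
}

def fetch(permutation):
    """Stand-in for the per-permutation data request; it uses only the
    permutation's own unique values, independent of other permutations."""
    return {"key": permutation}

def extract_all(workers=4, chunk=2):
    perms = list(product(*DIMENSIONS.values()))
    # Split the full permutation set into subsets processed in parallel.
    subsets = [perms[i:i + chunk] for i in range(0, len(perms), chunk)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in pool.map(lambda s: [fetch(p) for p in s], subsets):
            results.extend(batch)
    return results

results = extract_all()
```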
Dynamic Query Allocation to Virtual Warehouses
Methods, systems, and apparatuses for managing and selecting virtual warehouses for execution of queries on one or more data warehouses are described herein. A request to execute a query may be received. An execution plan, for the query, may be identified. A processing complexity for the query may be predicted based on the query and the execution plan. A plurality of virtual warehouses may be identified. An operating status and processing capabilities of the plurality of virtual warehouses may be determined. A subset of the plurality of virtual warehouses may be selected based on the processing complexity, the operating status of the plurality of virtual warehouses, and the processing capabilities of the plurality of virtual warehouses. The query may be executed on one of the subset of the plurality of virtual warehouses.
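The selection flow might look like the sketch below. The complexity predictor is left open by the abstract, so the toy scoring here (plan size plus query length) and the `Warehouse` fields are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Warehouse:
    name: str
    online: bool       # operating status
    capacity: int      # illustrative "processing capability" score

def predict_complexity(query: str, plan_ops: int) -> int:
    """Toy complexity estimate; the abstract does not fix a predictor,
    so this simply weighs execution-plan size and query length."""
    return plan_ops * 10 + len(query) // 20

def select_warehouses(query, plan_ops, warehouses):
    complexity = predict_complexity(query, plan_ops)
    # Keep only online warehouses whose capability covers the predicted cost.
    eligible = [w for w in warehouses if w.online and w.capacity >= complexity]
    # Prefer the smallest warehouse that is still sufficient.
    return sorted(eligible, key=lambda w: w.capacity)

whs = [Warehouse("xs", True, 10), Warehouse("m", True, 50),
       Warehouse("l", False, 100)]
subset = select_warehouses("SELECT * FROM sales JOIN region USING (id)", 3, whs)
```

The query would then run on the first warehouse in `subset`.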
System and method for use of lock-less techniques with a multidimensional database
In accordance with an embodiment, described herein is a system and method for use of lock-less data structures and processes with a multidimensional database computing environment. Lock-less algorithms or processes can be implemented with specific hardware-level instructions so as to provide atomicity. A memory stores an index cache retaining a plurality of index pages of the multidimensional database. A hash table indexes index pages in the index cache, wherein the hash table is accessible by a plurality of threads in parallel through application of the lock-less process.
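The insert-if-absent behavior of a lock-less cache index can be approximated in Python. This is only an analogy: CPython's GIL makes single `dict` operations atomic, so `setdefault` stands in for the hardware-level compare-and-swap the patent relies on; no lock is taken by any thread below. The page loader and ids are hypothetical.

```python
import threading

# Hash table indexing cached "index pages" by page id.
page_index = {}

def load_page(page_id):
    """Hypothetical loader for one index page of the multidimensional database."""
    return {"page_id": page_id, "cells": []}

def get_or_insert(page_id):
    # Atomic insert-if-absent: the first thread to publish a page for this
    # id wins; racing threads may build a page speculatively, but only one
    # result is kept -- the optimistic pattern of lock-free insertion.
    return page_index.setdefault(page_id, load_page(page_id))

threads = [threading.Thread(target=get_or_insert, args=(i % 4,))
           for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```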
CONTINUOUS FEATURE-INDEPENDENT DETERMINATION OF FEATURES FOR DEVIATION ANALYSIS
Systems and methods include determination, for each of a plurality of discrete features, of statistics based on a number of occurrences of each discrete value of the discrete feature in the data; determination of first summary statistics based on the determined statistics; determination of a dissimilarity for each discrete feature based on the first summary statistics and on the statistics determined for the discrete feature; determination of candidate discrete features based on the determined dissimilarities; determination, for each of the candidate discrete features, of second summary statistics based on values of a continuous feature associated with each discrete value of the candidate discrete feature; determination of a deviation score for each of the candidate discrete features based on the second summary statistics; and transmission of the candidate discrete features for display in association with the continuous feature based on the determined deviation scores.
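The pipeline above can be sketched end to end. The abstract does not fix the statistics involved, so the choices here (value counts, means, and population standard deviation as the deviation score) and the toy records are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

# Toy records: two discrete features and one continuous feature.
rows = [
    {"country": "DE", "channel": "web",  "revenue": 10.0},
    {"country": "DE", "channel": "web",  "revenue": 12.0},
    {"country": "FR", "channel": "shop", "revenue": 50.0},
    {"country": "FR", "channel": "web",  "revenue": 11.0},
    {"country": "US", "channel": "web",  "revenue": 9.0},
]
discrete = ["country", "channel"]

def value_counts(feature):
    """Occurrences of each discrete value of the feature."""
    return Counter(r[feature] for r in rows)

# First summary statistics: mean occurrence count across all features.
all_counts = [c for f in discrete for c in value_counts(f).values()]
global_mean = mean(all_counts)

def dissimilarity(feature):
    """One possible dissimilarity: distance between this feature's mean
    count and the global mean count."""
    return abs(mean(value_counts(feature).values()) - global_mean)

candidates = sorted(discrete, key=dissimilarity, reverse=True)

def deviation_score(feature, continuous="revenue"):
    """Second summary statistics: spread of the continuous feature's
    per-value means; a large spread suggests the feature explains
    deviations in the continuous feature."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[continuous])
    return pstdev([mean(v) for v in groups.values()])

scores = {f: deviation_score(f) for f in candidates}
```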
EFFECTIVE AND SCALABLE BUILDING AND PROBING OF HASH TABLES USING MULTIPLE GPUS
Described approaches provide for effectively and scalably using multiple GPUs to build and probe hash tables and materialize results of probes. Random memory accesses by the GPUs to build and/or probe a hash table may be distributed across GPUs and executed concurrently using global location identifiers. A global location identifier may be computed from data of an entry and identify a global location for an insertion and/or probe using the entry. The global location identifier may be used by a GPU to determine whether to perform an insertion or probe using an entry and/or where the insertion or probe is to be performed. To coordinate GPUs in materializing results of probing a hash table, a global offset into a global output buffer may be maintained in memory accessible to each of the GPUs, or the GPUs may compute global offsets using an exclusive sum of the local output buffer sizes.
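The exclusive-sum coordination in the last sentence is easy to demonstrate. This is a CPU sketch in Python, with plain lists standing in for per-GPU output buffers; the disjoint-range write pattern is the point, not the names.

```python
from itertools import accumulate

def global_offsets(local_sizes):
    """Exclusive prefix sum: each GPU's offset into the global output
    buffer is the total size of all local buffers before it."""
    return list(accumulate([0] + local_sizes[:-1]))

def materialize(local_buffers):
    sizes = [len(b) for b in local_buffers]
    offsets = global_offsets(sizes)
    out = [None] * sum(sizes)
    # Each "GPU" writes its local results at its precomputed global
    # offset; the writes touch disjoint ranges, so no further
    # coordination between writers is needed.
    for off, buf in zip(offsets, local_buffers):
        out[off:off + len(buf)] = buf
    return out

result = materialize([[1, 2], [3], [4, 5, 6]])
```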
Systems and methods of generating data marks in data visualizations
An example method of displaying a data visualization includes displaying a plurality of selectable fields and receiving user selections of two different fields from the plurality of selectable fields. The method also includes generating, in accordance with the received user selections, data marks to be displayed in a data visualization, each data mark corresponding to a respective retrieved tuple of data from a multidimensional database, where (i) each data mark has an x-position defined according to data for a first field in the respective tuple and (ii) each data mark has a y-position defined according to data for a second field in the respective tuple. The method also includes displaying the data visualization that includes the generated data marks.
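The tuple-to-mark mapping can be shown in a few lines. The tuples and field positions below are hypothetical stand-ins for data retrieved from a multidimensional database.

```python
# Hypothetical tuples retrieved for the user's two selected fields.
tuples = [("2023-01", 120), ("2023-02", 95), ("2023-03", 140)]
x_field, y_field = 0, 1  # positions of the selected fields in each tuple

def make_marks(rows):
    """One data mark per retrieved tuple: x-position from the first
    selected field, y-position from the second."""
    return [{"x": r[x_field], "y": r[y_field]} for r in rows]

marks = make_marks(tuples)
```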
Method and system for handling source field and key performance indicator calculation changes
Most business intelligence and analytics applications use a data model. Any change to a source field or to a key performance indicator (KPI) calculation results in long turn-around times and complex changes in the background coding of the data model. A method and system for handling source field changes and KPI calculation changes in the data model is provided. The disclosure provides a data modelling design, in particular, for handling source field changes or additions and target KPI calculation changes without any impact on the data model. The solution is divided into two areas to address the technical problem: the first is data ingestion and the second is data reporting.
Accessing and organizing data sets directly from a data warehouse
Accessing and organizing data sets directly from a data warehouse including receiving, by a data analyzer, a request from a service provider client instructing the data analyzer to retrieve a data set from a service provider data warehouse, wherein the service provider client is a client of a service provider, and wherein the service provider data warehouse stores data sets for the service provider; retrieving, by the data analyzer, the data set directly from the service provider data warehouse using credentials provided by the service provider; selecting, by the data analyzer, a worksheet template based on the service provider; organizing, by the data analyzer, the data set into a worksheet based on the worksheet template; and presenting, by the data analyzer to the service provider client, the worksheet comprising the data set.
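The request flow above can be sketched compactly. The template registry, provider name, and credential string are assumptions introduced for illustration, not part of the claim.

```python
# Hypothetical worksheet templates, keyed by service provider.
TEMPLATES = {"acme": ["order_id", "amount", "region"]}

def retrieve(provider, dataset, credentials):
    """Stand-in for a direct warehouse query using credentials
    provided by the service provider."""
    assert credentials == "token-from-provider"  # hypothetical credential
    return [{"order_id": 1, "amount": 9.5, "region": "EMEA",
             "internal": "not-for-client"}]

def build_worksheet(provider, dataset, credentials):
    rows = retrieve(provider, dataset, credentials)
    columns = TEMPLATES[provider]  # template selected based on the provider
    # Organize the data set into the worksheet's column layout,
    # dropping fields the template does not list.
    return [[row.get(c) for c in columns] for row in rows]

sheet = build_worksheet("acme", "orders", "token-from-provider")
```

The resulting worksheet would then be presented to the service provider client.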
Adaptive system for processing distributed data files and a method thereof
The present disclosure relates to a system and a method for processing distributed data files. The processor executes instructions to: receive a set of instructions from a primary device, wherein the set of instructions comprises verification rules, validators, primary transformers, and structured query transformers; generate processed data files by processing the distributed data files; and transfer the processed data files to a data warehouse. The distributed data files are processed by performing at least one of: executing one of the verification rules, the validators, and the primary transformers on the distributed data files; and transforming the distributed data files by executing the structured query transformers. The execution of the structured query transformers comprises generating a dependency graph based upon dependencies between the structured query transformers, and determining a sequence of execution of the structured query transformers based upon the dependency graph.
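The dependency-graph step can be sketched with a topological sort. The transformer names and edges below are hypothetical; the abstract specifies only that an execution sequence is derived from the dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical structured query transformers; an entry A -> {B, C}
# means A depends on the outputs of B and C, so they must run first.
deps = {
    "clean_orders": set(),
    "clean_customers": set(),
    "join_orders": {"clean_orders", "clean_customers"},
    "aggregate": {"join_orders"},
}

def execution_sequence(graph):
    """Dependency graph -> linear execution order for the transformers."""
    return list(TopologicalSorter(graph).static_order())

order = execution_sequence(deps)
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is a natural validation step before execution.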