G06F16/24554

SYSTEM AND METHOD OF QUERYING OBJECTS ON DEMAND

An illustrative embodiment disclosed herein is an apparatus including a processor having programmed instructions that identify a temporary bucket linked to one or more objects of a main bucket, detect that an object is uploaded to the main bucket, determine whether the object has an object attribute satisfying an object attribute relationship, and responsive to determining that the object has the object attribute that satisfies the object attribute relationship, add, to the temporary bucket, a link to the object.
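
The linking step can be illustrated with a minimal in-memory sketch. It assumes object attributes are a dictionary and that the object attribute relationship can be expressed as a predicate. `Bucket`, `TemporaryBucket`, and `on_upload` are hypothetical names, not an object-store API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Bucket:
    name: str
    objects: Dict[str, dict] = field(default_factory=dict)  # object key -> attributes


@dataclass
class TemporaryBucket:
    name: str
    predicate: Callable[[dict], bool]  # stands in for the object attribute relationship
    links: List[str] = field(default_factory=list)  # keys of linked main-bucket objects


def on_upload(main: Bucket, temps: List[TemporaryBucket], key: str, attrs: dict) -> None:
    """Store the object in the main bucket, then add a link to it in every
    temporary bucket whose attribute relationship the object satisfies."""
    main.objects[key] = attrs
    for temp in temps:
        if temp.predicate(attrs):
            temp.links.append(key)


main = Bucket("main")
large_logs = TemporaryBucket(
    "large-logs", lambda a: a.get("type") == "log" and a.get("size", 0) > 1024
)
on_upload(main, [large_logs], "a.log", {"type": "log", "size": 4096})
on_upload(main, [large_logs], "b.png", {"type": "image", "size": 9000})
print(large_logs.links)  # ['a.log']
```

Because the temporary bucket holds links rather than copies, a query over it touches only matching objects while the data itself stays in the main bucket.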

APPARATUS AND METHOD FOR CONTROLLING SKEW IN DISTRIBUTED ETL JOB
20170344607 · 2017-11-30

Provided are an apparatus and method for controlling a skew in a distributed extract, transform, load (ETL) job. The apparatus includes a divider configured to divide original data and generate a plurality of partitions to be processed in a distributed manner by a plurality of ETL tasks, and a re-divider configured to identify a straggler among the plurality of partitions on the basis of sizes of the plurality of partitions and divide the straggler on the basis of the number of available containers.
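
The divider and re-divider roles can be sketched as follows. The abstract does not define the straggler criterion, so this sketch assumes a partition is a straggler when it exceeds a multiple of the median partition size; `skew_factor`, `divide`, and `re_divide` are hypothetical names.

```python
import zlib
from statistics import median
from typing import List


def divide(records: List[str], num_partitions: int) -> List[List[str]]:
    """Divider: hash-partition the original data (CRC32 keeps it deterministic)."""
    parts: List[List[str]] = [[] for _ in range(num_partitions)]
    for record in records:
        parts[zlib.crc32(record.encode()) % num_partitions].append(record)
    return parts


def re_divide(parts: List[list], available_containers: int,
              skew_factor: float = 2.0) -> List[list]:
    """Re-divider: treat any partition larger than skew_factor x the median
    size as a straggler and split it into one piece per available container."""
    threshold = skew_factor * median(len(p) for p in parts)
    result: List[list] = []
    for part in parts:
        if len(part) > threshold and available_containers > 1:
            chunk = -(-len(part) // available_containers)  # ceiling division
            result.extend(part[i:i + chunk] for i in range(0, len(part), chunk))
        else:
            result.append(part)
    return result


# One skewed partition of 10 rows among partitions of 2 rows, with 3 free containers.
print([len(p) for p in re_divide([list(range(10)), [1, 2], [3, 4], [5, 6]], 3)])
# [4, 4, 2, 2, 2, 2]
```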

PARTITIONING A LIMITED RESOURCE AMONGST KEYWORDS
20170344551 · 2017-11-30

An on-line social network system includes, or is in communication with, a search engine optimization (SEO) system configured to partition a number of available links from authoritative web pages to Job Search Results Pages (JSERPs) in a way that maximizes gain expressed by a predetermined metric, such as a metric representing the number of events of a certain type observed over a period of time, or an improvement in the respective ranks generated for the JSERPs by a third-party search engine.
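
The abstract does not name the optimization procedure. This sketch swaps in greedy marginal-gain allocation, assuming each JSERP's gain is a concave (diminishing-returns) function of the number of links it receives; under that assumption and with separable gains, handing out links one at a time to the page with the largest marginal gain is optimal. The page names and gain curves are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict


def allocate_links(total_links: int,
                   gain: Dict[str, Callable[[int], float]]) -> Dict[str, int]:
    """Hand out links one at a time to the page with the largest marginal gain."""
    alloc = {page: 0 for page in gain}
    # Max-heap (via negated values) of the gain from giving each page one more link.
    heap = [(-(g(1) - g(0)), page) for page, g in gain.items()]
    heapq.heapify(heap)
    for _ in range(total_links):
        if not heap:
            break
        _, page = heapq.heappop(heap)
        alloc[page] += 1
        n, g = alloc[page], gain[page]
        heapq.heappush(heap, (-(g(n + 1) - g(n)), page))
    return alloc


# Hypothetical concave gain curves for two JSERPs.
gains = {
    "jobs-nyc": lambda n: 3 * math.log1p(n),
    "jobs-remote": lambda n: math.log1p(n),
}
print(allocate_links(4, gains))  # {'jobs-nyc': 3, 'jobs-remote': 1}
```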

EFFICIENT AGGREGATION IN A PARALLEL SYSTEM

A method, a system, and a computer program product are provided. A filter is created for each portion of a data set and indicates which of one or more characteristics are present in that portion, where each characteristic comprises one or more groups defined by a data grouping operation. The filters for the portions of the data set are transferred to one or more filter processors and combined there to indicate the characteristics of data residing across multiple processing elements. A result for the data grouping operation is then produced using transfers based on the combined filter result. In various embodiments, the filter may be a Bloom filter.
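
The abstract leaves open how the combined filter drives transfers. One plausible reading, sketched here as an assumption, is that a group key which no other portion's filter reports can be aggregated locally, while any other key must be shipped. Because Bloom filters can give false positives but never false negatives, a filter error only causes an unnecessary transfer, never a wrong result. The class and function names, the filter size, and the SHA-256-based hashing are illustrative choices.

```python
import hashlib
from functools import reduce
from typing import Dict, Iterator, List


class BloomFilter:
    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        combined = BloomFilter(self.num_bits, self.num_hashes)
        combined.bits = self.bits | other.bits  # combining filters is a bitwise OR
        return combined


def plan_group_by(portions: List[List[str]]) -> List[Dict[str, str]]:
    """For each portion, mark each group key 'local' or 'transfer'."""
    filters = []
    for rows in portions:
        f = BloomFilter()
        for key in rows:
            f.add(key)
        filters.append(f)
    plans = []
    for i, rows in enumerate(portions):
        # Combined filter of every *other* portion.
        others = [f for j, f in enumerate(filters) if j != i]
        combined = reduce(BloomFilter.union, others, BloomFilter())
        plans.append({key: "transfer" if combined.might_contain(key) else "local"
                      for key in set(rows)})
    return plans
```

For two portions `[["a", "b", "b"], ["b", "c"]]`, key `"b"` is always marked `"transfer"` in both, while `"a"` and `"c"` are marked `"local"` unless a false positive occurs.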

Evaluating reference-based operations in shared-nothing parallelism systems

Embodiments include methods, systems, and computer program products for evaluating operations using an electronic computing device, including: receiving an operation on a first database partition in a shared-nothing parallelism system, where the operation is a non-collocated, reference-based operation; generating a correlation sequence, where the correlation sequence includes a sequence of references pointing to input data required for the operation; receiving one of the references on a first table queue operator on a second database partition, where the second database partition includes input data corresponding to the received reference, and where the table queue operator is configured to provide communication between the first database partition and the second database partition; and processing, on the second database partition, the input data corresponding to the operation.
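
A single-process sketch of the reference-routing idea follows. It assumes a reference is a (partition id, row index) pair and models each table queue as a `deque`; the names are hypothetical and no inter-node communication is simulated. Results are written back by sequence number so they come out in correlation-sequence order.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Reference = Tuple[int, int]  # hypothetical encoding: (partition id, row index)


def evaluate_by_reference(references: List[Reference],
                          partitions: Dict[int, List[dict]],
                          op: Callable[[dict], object]) -> List[object]:
    # Route each reference in the correlation sequence to the queue of the
    # partition that owns the referenced row (stand-in for a table queue).
    queues: Dict[int, Deque[Tuple[int, int]]] = defaultdict(deque)
    for seq_no, (part_id, row) in enumerate(references):
        queues[part_id].append((seq_no, row))

    results: List[object] = [None] * len(references)
    for part_id, queue in queues.items():
        rows = partitions[part_id]  # input data is local to this partition
        while queue:
            seq_no, row = queue.popleft()
            results[seq_no] = op(rows[row])
    return results


partitions = {0: [{"x": 1}, {"x": 2}], 1: [{"x": 10}]}
print(evaluate_by_reference([(1, 0), (0, 1), (0, 0)], partitions, lambda r: r["x"] * 2))
# [20, 4, 2]
```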

Computer-implemented method of performing a search using signatures
09830355 · 2017-11-28

A computer-implemented method of processing a query vector and a data vector comprises: generating a set of masks; generating a first set of multiple signatures and a second set of multiple signatures by applying the set of masks to the query vector and the data vector, respectively; and generating candidate pairs, each of a first signature and a second signature, by identifying matches between first signatures and second signatures. The set of masks comprises a configuration of elements that is a Hadamard code, a permutation of a Hadamard code, or a code that deviates from a Hadamard code or a permutation of a Hadamard code in fewer than 40% of its elements.
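
The abstract does not say how a mask is applied to a vector. This sketch assumes binary masks derived from a Sylvester-construction Hadamard matrix (+1 keeps an element, -1 drops it) and a signature consisting of the kept elements; a pair is a candidate if any corresponding signatures match. The function names are hypothetical. The all-ones first row is skipped because it keeps every element and would only match identical vectors; every other row of a Sylvester Hadamard matrix keeps exactly half the elements.

```python
from typing import List, Sequence, Tuple


def sylvester_hadamard(n: int) -> List[List[int]]:
    """Return an n x n Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def make_masks(n: int) -> List[List[int]]:
    # +1 -> 1 (keep element), -1 -> 0 (drop); skip the all-ones first row.
    return [[1 if x == 1 else 0 for x in row] for row in sylvester_hadamard(n)[1:]]


def signatures(vec: Sequence[int], masks: List[List[int]]) -> List[Tuple[int, ...]]:
    return [tuple(v for v, m in zip(vec, mask) if m) for mask in masks]


def is_candidate(query: Sequence[int], data: Sequence[int], masks: List[List[int]]) -> bool:
    # Compare signatures position by position: mask k on the query vs mask k on the data.
    return any(q == d for q, d in zip(signatures(query, masks), signatures(data, masks)))


masks = make_masks(4)
print(masks)  # [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]]
print(is_candidate([1, 0, 1, 1], [1, 1, 1, 0], masks))  # True
```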

USING WORKER NODES TO PROCESS RESULTS OF A SUBQUERY
20230177047 · 2023-06-08

Systems and methods are disclosed for executing a query that includes an indication to process data managed by an external data system. The system identifies the external data system that manages the data to be processed and generates a subquery for the external data system indicating that the results of the subquery are to be sent to one worker node of multiple worker nodes. The system instructs the one worker node to distribute the results received from the external data system to multiple worker nodes for processing.
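
The flow can be sketched as a subquery descriptor plus a redistribution step on the receiving worker. The descriptor fields, the worker names, and the round-robin policy are assumptions; the abstract specifies only that one worker receives the results and distributes them to multiple workers.

```python
from typing import Dict, Iterable, List


def build_subquery(sql: str, external_system: str, receiver: str) -> Dict[str, str]:
    # Hypothetical subquery descriptor: run `sql` on the external system and
    # send all results to a single receiving worker.
    return {"system": external_system, "sql": sql, "send_results_to": receiver}


def distribute(results: Iterable[dict], workers: List[str]) -> Dict[str, List[dict]]:
    """Run on the receiving worker: spread rows round-robin across all workers."""
    assignment: Dict[str, List[dict]] = {w: [] for w in workers}
    for i, row in enumerate(results):
        assignment[workers[i % len(workers)]].append(row)
    return assignment


workers = ["w0", "w1", "w2"]
sub = build_subquery("SELECT * FROM events", "ext-db", receiver=workers[0])
rows = [{"id": i} for i in range(5)]  # stand-in for rows returned by ext-db
print({w: len(r) for w, r in distribute(rows, workers).items()})
# {'w0': 2, 'w1': 2, 'w2': 1}
```

Hash partitioning on a grouping or join key would be the natural alternative to round-robin when downstream operators need co-located rows.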

Methods for enhancing rapid data analysis

A method for enhancing rapid data analysis includes receiving a set of data; storing the set of data in a first set of data shards sharded by a first field; and identifying anomalous data from the set of data by monitoring a range of shard indices associated with a first shard of the first set of data shards, detecting that the range of shard indices is smaller than an expected range by a threshold value, and identifying data of the first shard as anomalous data.
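
A sketch of the range check follows, under stated assumptions: a row's shard index is a CRC32 hash of the first field reduced into a fixed index space, each shard covers an equal slice of that space, and the expected range is the slice width. The idea is that a shard dominated by one or a few repeated field values occupies only a narrow band of indices. `INDEX_SPACE` and the function names are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

INDEX_SPACE = 1 << 16  # hypothetical size of the shard-index space

Entry = Tuple[int, dict]  # (shard index, row)


def shard_index(value: str) -> int:
    return zlib.crc32(value.encode()) % INDEX_SPACE


def shard_rows(rows: List[dict], field: str, num_shards: int) -> Dict[int, List[Entry]]:
    """Shard rows by `field`; shard k covers an equal slice of the index space."""
    width = INDEX_SPACE // num_shards
    shards: Dict[int, List[Entry]] = {k: [] for k in range(num_shards)}
    for row in rows:
        idx = shard_index(row[field])
        shards[min(idx // width, num_shards - 1)].append((idx, row))
    return shards


def anomalous_shards(shards: Dict[int, List[Entry]], num_shards: int,
                     threshold: int) -> List[int]:
    """Flag shards whose observed index range falls short of the expected
    range by more than `threshold`."""
    expected = INDEX_SPACE // num_shards
    flagged = []
    for k, entries in shards.items():
        if not entries:
            continue
        indices = [idx for idx, _ in entries]
        observed = max(indices) - min(indices)
        if expected - observed > threshold:
            flagged.append(k)
    return flagged
```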

Dynamic partition selection

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic partition selection. One of the methods includes receiving a representation of a query plan generated for a query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator. A partition selector operator is generated corresponding to the dynamic scan operator. A location in the query plan is determined for the partition selector operator. A modified query plan is generated having the partition selector operator at the determined location.
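
A toy plan-rewriting pass can show where a partition selector might be placed. Purely for illustration, it assumes the selector goes above the other input of a hash join whose child is a dynamic scan, so that values from that input could be used to choose partitions before the scan runs. The `PlanNode` type and operator names are hypothetical, and the abstract itself leaves the placement rule open.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    op: str
    children: List["PlanNode"] = field(default_factory=list)
    scan_id: Optional[int] = None  # set on DynamicScan and PartitionSelector nodes


def insert_partition_selectors(node: PlanNode) -> PlanNode:
    """Return a modified plan in which every hash join with a DynamicScan child
    has a PartitionSelector placed above the join's other child."""
    new_children = [insert_partition_selectors(c) for c in node.children]
    if node.op == "HashJoin" and len(new_children) == 2:
        for i, child in enumerate(new_children):
            if child.op == "DynamicScan":
                other = 1 - i
                new_children[other] = PlanNode(
                    "PartitionSelector", [new_children[other]], child.scan_id
                )
                break
    return PlanNode(node.op, new_children, node.scan_id)


plan = PlanNode("HashJoin", [PlanNode("Scan"), PlanNode("DynamicScan", scan_id=1)])
modified = insert_partition_selectors(plan)
print([c.op for c in modified.children])  # ['PartitionSelector', 'DynamicScan']
```

The pass builds new nodes rather than mutating the input, mirroring the abstract's distinction between the received plan and the generated modified plan.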

Execution-Time Dynamic Range Partitioning Transformations

An example method includes receiving a data load request requesting loading and partitioning of an unknown quantity of user data for storage at a data storage system. The user data includes a partitioning key; a total data size of the user data; a plurality of rows, each row associated with a value defined by the partitioning key; and one or more columns. The method also includes identifying one or more storage constraints for the data storage system. The method further includes, after receiving the user data, determining a plurality of partitioning quantiles defining respective ranges of values of the partitioning key based on the user data and the one or more storage constraints for the data storage system; and range partitioning each row of the user data into files based on the value associated with the row defined by the partitioning key and the respective ranges of the values of the partitioning key defined by the plurality of partitioning quantiles.
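
The quantile step can be sketched as follows, assuming the only storage constraint is a maximum file size, so the number of files is the total data size divided by that limit, rounded up; the boundaries are then evenly spaced quantiles of the observed keys. The function names and the constraint are illustrative assumptions.

```python
import bisect
import math
from typing import Dict, List


def partition_quantiles(keys: List[int], total_size: int, max_file_size: int) -> List[int]:
    """Boundaries at the 1/n, 2/n, ... quantiles of the keys, where n is the
    number of files needed to respect the (assumed) maximum file size."""
    if not keys:
        return []
    num_files = max(1, math.ceil(total_size / max_file_size))
    ordered = sorted(keys)
    return [ordered[(i * len(ordered)) // num_files] for i in range(1, num_files)]


def range_partition(rows: List[dict], key: str, boundaries: List[int]) -> Dict[int, List[dict]]:
    # File i holds keys in [boundaries[i-1], boundaries[i]).
    files: Dict[int, List[dict]] = {i: [] for i in range(len(boundaries) + 1)}
    for row in rows:
        files[bisect.bisect_right(boundaries, row[key])].append(row)
    return files


rows = [{"k": k} for k in range(100)]
bounds = partition_quantiles([r["k"] for r in rows], total_size=400, max_file_size=100)
print(bounds)  # [25, 50, 75]
print([len(f) for f in range_partition(rows, "k", bounds).values()])  # [25, 25, 25, 25]
```

Because the boundaries are computed only after the data arrives, the same code adapts to an input whose size is not known in advance.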