G06F16/2456

Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system

A system and method matches data from a first set of data with that of another set of data in a manner based on the size of a cache.
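
The abstract does not say how the cache size is used. This sketch assumes one common cache-conscious approach, a partitioned (Grace/radix-style) hash join. Both inputs are split by a hash of the key so that each partition's hash table is expected to fit in cache, and matching then runs one partition at a time. The sizing rule, the `bytes_per_entry` parameter, and the function name are hypothetical. Python does not control real cache placement, so this illustrates only the partitioning logic.

```python
from collections import defaultdict

def partitioned_match(left, right, cache_bytes, bytes_per_entry=64):
    """Return (left_row, right_row) pairs whose keys (first elements) are equal.

    Hypothetical sizing rule: choose enough partitions that one partition of
    the smaller (build) input fits in the assumed cache budget.
    """
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)
    per_partition = max(1, cache_bytes // bytes_per_entry)
    n_parts = max(1, -(-len(build) // per_partition))  # ceiling division

    def partition(rows):
        # The same hash function is used on both sides, so equal keys
        # always land in the same partition index.
        parts = [[] for _ in range(n_parts)]
        for row in rows:
            parts[hash(row[0]) % n_parts].append(row)
        return parts

    matches = []
    for b_part, p_part in zip(partition(build), partition(probe)):
        table = defaultdict(list)  # small per-partition hash table
        for row in b_part:
            table[row[0]].append(row)
        for p_row in p_part:
            for b_row in table.get(p_row[0], ()):
                # Keep the caller's (left, right) order even if inputs were swapped.
                matches.append((p_row, b_row) if swapped else (b_row, p_row))
    return matches
```

Using the smaller input as the build side keeps the number of partitions, and therefore the partitioning overhead, as low as possible.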

Dynamically normalizing intervals in a table

Dynamically normalizing intervals in a table including receiving, from a client computing system, a request to normalize intervals for a data set on a cloud-based data warehouse, wherein the request comprises a reference to the data set and a data range; generating, on the cloud-based data warehouse, an interval table using the data range; joining, into a joined table on the cloud-based data warehouse, the interval table and the data set; receiving the joined table from the cloud-based data warehouse; and presenting, via a graphical user interface on the client computing system, the joined table as a worksheet.
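
This is a minimal in-memory sketch of the interval-table and join steps. It assumes daily intervals and a data set of `(date, value)` rows. On a cloud-based data warehouse the same idea would normally be written in SQL, as a generated series of interval starts LEFT JOINed to the data set. The function names are illustrative and are not taken from the abstract.

```python
from datetime import date, timedelta

def interval_table(start, end, step=timedelta(days=1)):
    """Generate one interval start per step in the closed range [start, end]."""
    rows, current = [], start
    while current <= end:
        rows.append(current)
        current += step
    return rows

def normalize_intervals(data, start, end):
    """Left-join the interval table with `data`, a list of (date, value) rows.

    Every interval appears at least once; intervals with no matching data row
    get the value None, as with a LEFT OUTER JOIN.
    """
    by_date = {}
    for d, value in data:
        by_date.setdefault(d, []).append(value)
    joined = []
    for d in interval_table(start, end):
        for value in by_date.get(d, [None]):
            joined.append((d, value))
    return joined
```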

DATA STITCHING ACROSS FEDERATED DATA LAKES
20230237070 · 2023-07-27

In one embodiment, a device, in communication with a plurality of data lake sites, receives a federated data lake query. The device determines a plurality of data lake operator sets that each correspond to one of the plurality of data lake sites, wherein each of the plurality of data lake operator sets is used to establish a respective data pipeline for the federated data lake query. The device selects a particular data lake site of the plurality of data lake sites as a destination for data pipelines that are established for the federated data lake query. The device sends the plurality of data lake operator sets that each correspond to one of the plurality of data lake sites to cause the plurality of data lake sites to send query results to the particular data lake site using the data pipelines, wherein the particular data lake site stitches the query results.
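
The abstract does not specify how the destination site is selected or what the operator sets contain. This sketch therefore simulates only the routing and stitching steps, under three assumptions:

- The destination is the site holding the most result rows, which minimizes data movement.
- Each data pipeline is modeled as appending one site's rows to the destination's inbox.
- Stitching merges all results on a shared key.

`Site`, `choose_destination`, and `stitch` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    rows: list                                  # local query result: (key, value) tuples
    inbox: list = field(default_factory=list)   # results received via pipelines

def choose_destination(sites):
    # Assumption: pick the site with the most result rows so the fewest rows move.
    return max(sites, key=lambda s: len(s.rows))

def stitch(sites):
    """Route every site's result to one destination and merge them there by key."""
    dest = choose_destination(sites)
    for site in sites:
        if site is not dest:
            dest.inbox.append((site.name, site.rows))  # simulated data pipeline
    stitched = {}
    for name, rows in [(dest.name, dest.rows)] + dest.inbox:
        for key, value in rows:
            stitched.setdefault(key, {})[name] = value
    return dest.name, stitched
```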

System for augmenting and joining multi-cadence datasets

A computing system may comprise a server system, a database, and one or more data sources having different cadences, such as a batch data source and a real-time data source. The server system may generate a first dataset based on data from the batch data source, and may generate a second dataset based on data received from the real-time data source. The server system may determine metadata associated with the real-time data source. Based on the metadata, the server system may generate a database table representation of the real-time data source. The server system may be configured to perform a relational join on the first and second datasets. Such a relational join may define a namespace that is based on the first and second datasets.
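
This sketch shows one possible reading of the join step. It rests on three assumptions, none of which comes from the abstract:

- The metadata for the real-time source is a list of column names.
- The stream is materialized as rows containing those columns.
- The "namespace" is modeled by prefixing each output column with its dataset's name.

All names are hypothetical.

```python
def table_from_stream(events, metadata):
    """Materialize streamed events as table rows using the source's column metadata."""
    return [{col: event.get(col) for col in metadata["columns"]} for event in events]

def relational_join(left, right, key, left_ns, right_ns):
    """Inner join on `key`; output columns are namespaced as '<ns>.<column>'."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for l_row in left:
        for r_row in index.get(l_row[key], []):
            merged = {f"{left_ns}.{c}": v for c, v in l_row.items()}
            merged.update({f"{right_ns}.{c}": v for c, v in r_row.items()})
            out.append(merged)
    return out
```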

Enhanced preparation and integration of data sets

Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for enhanced preparation and integration of data sets. In some implementations, the system receives data indicating user input that identifies a first data set that includes streaming data and a second data set that includes non-streaming data. The first data set and the second data set are integrated to generate a hybrid data set. The data processing system provides access to the hybrid data set through (i) a non-streaming access channel that provides a periodically refreshed summary of both the streaming data and the non-streaming data and (ii) a streaming access channel that provides a data stream based on combined data of the first data set and the second data set. One or more application programming interfaces are provided, which allow at least one client device to access both the non-streaming access channel and the streaming access channel.
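
The class below is a toy model of the two access channels. It assumes the streaming data are `(key, amount)` events and the non-streaming data are a key-to-attribute mapping. The streaming channel yields each event, enriched with reference data, as it arrives. The non-streaming channel returns a summary that changes only when `refresh_summary` is called, which stands in for a periodic refresh. The class and method names are hypothetical and do not describe any API from the abstract.

```python
class HybridDataSet:
    """Hypothetical hybrid of non-streaming reference data and streaming events."""

    def __init__(self, reference):
        self.reference = dict(reference)  # non-streaming data: key -> attribute
        self.events = []                  # streaming events seen so far
        self._summary = {}

    def stream(self, source):
        """Streaming channel: yield each event combined with its reference data."""
        for key, amount in source:
            self.events.append((key, amount))
            yield {"key": key, "amount": amount, "ref": self.reference.get(key)}

    def refresh_summary(self):
        """Rebuild the summary; a scheduler would call this periodically."""
        totals = {key: 0 for key in self.reference}
        for key, amount in self.events:
            totals[key] = totals.get(key, 0) + amount
        self._summary = totals

    def summary(self):
        """Non-streaming channel: the most recently refreshed summary."""
        return dict(self._summary)
```

The summary stays stale between refreshes by design, which matches the "periodically refreshed" behavior described above.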

Join elimination enhancement for real world temporal applications

A database system receives a query and determines that the query includes an inner join between a parent table and a child table. The database system determines that the following relationships exist between the parent table and the child table: (1) referential integrity (“RI”) between a primary key attribute (pk) in the parent table and a foreign key attribute (fk) in the child table, and (2) a temporal relationship constraint (“TRC”) between a period attribute in the parent table and a TRC-attribute in the child table. The database system determines that the query satisfies both the non-temporal and the temporal join elimination conditions and contains no other qualification conditions on the parent table's period attribute, and then eliminates the inner join when planning execution of the query.
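
The abstract names the elimination conditions but not their exact form. The check below is a simplified, hypothetical rendering of them:

- The non-temporal condition requires declared RI on `pk = fk`, with no parent columns used beyond `pk`.
- The temporal condition requires a declared TRC between the parent's period attribute and the child's TRC attribute.
- Any extra qualification on the parent's period attribute blocks elimination.

`InnerJoin` and `can_eliminate_join` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InnerJoin:
    """Hypothetical summary of an inner join between a parent and a child table."""
    parent: str
    child: str
    pk: str                                    # parent primary key used in the join
    fk: str                                    # child foreign key used in the join
    period: str                                # parent's period attribute
    trc_attr: str                              # child's TRC attribute
    parent_cols_used: frozenset = frozenset()  # parent columns referenced besides pk
    other_period_conditions: int = 0           # extra predicates on parent's period

def can_eliminate_join(q, ri, trc):
    """Return True if the inner join may be dropped from the plan (simplified rules)."""
    # Non-temporal (assumed): RI guarantees each child fk has a parent row, and
    # the query needs nothing from the parent except pk, which equals child.fk.
    non_temporal = (q.parent, q.pk, q.child, q.fk) in ri and q.parent_cols_used <= {q.pk}
    # Temporal (assumed): the TRC guarantees a matching parent row whose period
    # covers the child's TRC attribute, so the temporal join drops no rows.
    temporal = (q.parent, q.period, q.child, q.trc_attr) in trc
    # Any other qualification on the parent's period attribute could filter rows.
    return non_temporal and temporal and q.other_period_conditions == 0
```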

Search Extraction Matching, Draw Attention-Fit Modality, Application Morphing, and Informed Apply Apparatuses, Methods and Systems
20230026252 · 2023-01-26

The Search Extraction Matching, Draw Attention-Fit Modality, Application Morphing, and Informed Apply Apparatuses, Methods and Systems (“SEMATFM-AMIA”) transforms inputs including new job listing introduction inputs, via SEMATFM-AMIA components (e.g., the conductor component, the resume view controller component, the XY paths handler component, the title handler component, the resume librarian component, and the job listing librarian component), into outputs including relevant resume outputs and/or augmented new job listing record outputs. It is noted that the terms “component” and “object” may be used interchangeably throughout. In one embodiment, the SEMATFM-AMIA includes an apparatus comprising a memory, a component collection in the memory, and a processor disposed in communication with the memory and configured to issue a plurality of processing instructions from the component collection stored in the memory. SEMATFM-AMIA may then receive, in connection with an application to a job, a resume adjustment request, where the request includes one or more raw terms of a resume, one or more normalized terms of the resume, one or more raw terms of a job listing corresponding to the job, and one or more normalized terms of the job listing. SEMATFM-AMIA may load said resume normalized terms and said job listing normalized terms into a joined normalized terms set, and add, to a common normalized terms set, those members of the joined normalized terms set that meet a count criterion. SEMATFM-AMIA may then visit each normalized term member of the common normalized terms set.
After further receiving, adding, visiting, providing, and otherwise processing data, SEMATFM-AMIA may receive, from the resume adjuster component, a request to formulate the adjusted resume record, wherein said record formulation request includes specification of the resume and substitution information. SEMATFM-AMIA may then formulate the adjusted resume record, which substitutes each user-selected resume raw term with a corresponding user-selected job listing raw term, wherein the formulation includes accessing one or more stores.
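
This sketch covers the term-matching and substitution steps. It assumes the count criterion means "appears in both the resume and the job listing", which is a count of 2 after each side is deduplicated. It also assumes substitution is plain string replacement. The function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter

def common_normalized_terms(resume_terms, listing_terms, min_count=2):
    """Join both normalized term lists and keep terms meeting the count criterion."""
    # Deduplicate each side first, so a count of 2 means "present in both".
    joined = Counter(set(resume_terms)) + Counter(set(listing_terms))
    return {term for term, count in joined.items() if count >= min_count}

def adjust_resume(resume_text, substitutions):
    """Replace each user-selected resume raw term with its chosen job-listing raw term."""
    for resume_raw, listing_raw in substitutions.items():
        resume_text = resume_text.replace(resume_raw, listing_raw)
    return resume_text
```

Replacements are applied in sequence. If one substitution's output contains another substitution's input, that text is rewritten again, so a fuller implementation would likely replace on token boundaries instead.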

Generating real-time aggregates at scale for inclusion in one or more modified fields in a produced subset of data
11561993 · 2023-01-24

A data processing system for producing a subset of data from a plurality of data sources, including: memory storing a plurality of data sources to be represented in an editor interface; a data structure modification module that selects a plurality of data sources to be represented in the editor interface and generates a subset of data included in the plurality of data sources; memory that stores the selected data structures included in the subset, with at least one of the stored data structures including the one or more modified attributes of the one or more respective fields; a rendering module that displays, in the editor interface, representations of the stored data structures; and a segmentation module that segments a plurality of received data records.

Global indexing techniques for accelerating database functions
11561981 · 2023-01-24

A system and method for accelerating relational functions between tables. The method includes: determining a plurality of first index values for a plurality of first unique keys in a first column of a first table; determining a plurality of second index values for a plurality of second unique keys in a second column of a second table; generating a hashed third table based on the first column of the first table and the plurality of first index values; generating a hashed fourth table based on the second column of the second table and the plurality of second index values; and generating a fifth table by performing a JOIN operation between the third table and the fourth table based on at least one third column, wherein each third column includes a plurality of third unique keys that are common to the third table and the fourth table.
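
This is a sketch of the index-then-join idea. It assumes a single index mapping shared by both columns, so equal keys receive equal integer index values, which is what makes a later join on index values meaningful. Each hashed table maps an index value to row positions. The join emits `(row_in_first, row_in_second)` pairs for keys common to both tables. All names are hypothetical.

```python
def build_global_index(*columns):
    """Assign one integer index value per unique key across all given columns."""
    index = {}
    for column in columns:
        for key in column:
            index.setdefault(key, len(index))
    return index

def hashed_table(column, index):
    """Bucket row positions by integer index value."""
    table = {}
    for row_id, key in enumerate(column):
        table.setdefault(index[key], []).append(row_id)
    return table

def indexed_join(col_a, col_b):
    """Join two key columns on their shared index values; return row-position pairs."""
    index = build_global_index(col_a, col_b)
    table_a, table_b = hashed_table(col_a, index), hashed_table(col_b, index)
    common = table_a.keys() & table_b.keys()  # index values present in both tables
    return sorted((i, j) for k in common for i in table_a[k] for j in table_b[k])
```

Comparing small integers instead of arbitrary keys (for example, long strings) is the usual motivation for this kind of dictionary encoding.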

Techniques for relationship discovery between datasets

The present disclosure relates to techniques for analyzing data from multiple different data sources to determine a relationship between the data (also referred to herein as “data relationship discovery”). The relationship between any two compared datasets may be used to determine one or more recommendations for merging (e.g., joining), or “blending,” the datasets together. Relationship discovery may include determining a relationship between subsets of data, such as a relationship between a pair of columns (a column pair), where each column belongs to a different one of the compared datasets. Given two datasets, relationship discovery may identify and recommend a ranked subset of column pairs between them. The column pairs ranked as related may then be used to blend the datasets on those columns.
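
The abstract does not state a scoring method. As an assumed stand-in, this sketch ranks every cross-dataset column pair by the Jaccard similarity of their distinct values. It keeps the top `k` pairs that have a nonzero score. Datasets are modeled as dicts mapping column names to lists of values, and the function names are illustrative.

```python
from itertools import product

def jaccard(a, b):
    """Jaccard similarity of the distinct values in two columns."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_column_pairs(ds1, ds2, top_k=3, min_score=0.0):
    """Return up to top_k (col1, col2, score) tuples, best first."""
    scored = [
        (jaccard(v1, v2), c1, c2)
        for (c1, v1), (c2, v2) in product(ds1.items(), ds2.items())
    ]
    scored = [s for s in scored if s[0] > min_score]
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))  # highest score, then by name
    return [(c1, c2, score) for score, c1, c2 in scored[:top_k]]
```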