G06F16/24532

ENHANCING DATABASE QUERY PROCESSING
20230222124 · 2023-07-13

A system, program product, and method for enhancing automatic multidimensional query processing. The method includes executing a database query including semi-joining a plurality of dimension tables with a fact table. The method also includes identifying for extraction one or more data values from each dimension table of the plurality of dimension tables. The data values from each dimension table of the plurality of dimension tables are associated with a respective record identification (RID), thereby defining one or more RIDs. The method further includes generating a plurality of RID lists. Each RID list of the plurality of RID lists includes a collection of the one or more RIDs for the respective dimension table. The method also includes merging the plurality of RID lists, sorting, subject to the merging, the plurality of RIDs as a function of data location, and fetching the data values from the fact table.
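The RID-list flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the in-memory `FACT_TABLE`, the page numbers used as "data location", and the choice to merge the lists by intersection are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical fact table: RID -> (page number, row payload).
FACT_TABLE: Dict[int, Tuple[int, dict]] = {
    101: (3, {"store": "S1", "product": "P1", "amount": 10}),
    102: (1, {"store": "S1", "product": "P2", "amount": 20}),
    103: (2, {"store": "S2", "product": "P1", "amount": 30}),
    104: (1, {"store": "S1", "product": "P1", "amount": 40}),
}


def merge_rid_lists(rid_lists: List[List[int]]) -> List[int]:
    """Merge per-dimension RID lists by intersection (an assumed policy)."""
    if not rid_lists:
        return []
    merged = set(rid_lists[0])
    for rids in rid_lists[1:]:
        merged &= set(rids)
    return list(merged)


def sort_by_location(rids: List[int]) -> List[int]:
    """Order RIDs by page number so fact-table reads are mostly sequential."""
    return sorted(rids, key=lambda rid: (FACT_TABLE[rid][0], rid))


def fetch(rids: List[int]) -> List[dict]:
    """Fetch the row payloads from the fact table in the given order."""
    return [FACT_TABLE[rid][1] for rid in rids]


# One RID list per dimension table (results of the semi-joins).
store_rids = [101, 102, 104]    # fact rows whose store matches S1
product_rids = [101, 103, 104]  # fact rows whose product matches P1

ordered = sort_by_location(merge_rid_lists([store_rids, product_rids]))
rows = fetch(ordered)
```

Sorting the merged RIDs by location before fetching is what lets the final step read the fact table in page order rather than in arbitrary order.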

Search system for providing search results using query understanding and semantic binary signatures
11698921 · 2023-07-11

Technology for the improved processing of search queries is provided. In one embodiment, methods may return semantically relevant search results for a search query. During offline pre-computation, an inventory semantic index may be generated and may include inventory binary hashing signatures that are associated with inventory listings, such as goods or services for sale, and the index may be partitioned by categories and shards. When a search query is received, relevant categories are determined using a relevant category recognition service, and a search query binary hashing signature may be generated for the search query. The relevant categories are searched to determine Hamming distances between the inventory binary hashing signatures and the search query binary hashing signature, where the Hamming distance indicates semantic relevance.
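The Hamming-distance lookup over a category-partitioned index might look like the sketch below. The 8-bit signatures, the `INDEX` layout, and the listing names are hypothetical; how signatures are actually produced is not covered here.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary signatures differ."""
    return bin(a ^ b).count("1")


# Hypothetical pre-computed index: category -> {listing id: signature}.
INDEX: Dict[str, Dict[str, int]] = {
    "shoes": {"listing-1": 0b10110010, "listing-2": 0b10111111},
    "phones": {"listing-3": 0b01001101},
}


def search(query_sig: int, categories: List[str], top_k: int = 2) -> List[Tuple[int, str]]:
    """Scan only the relevant categories and return the closest listings."""
    scored = [
        (hamming_distance(sig, query_sig), listing_id)
        for category in categories
        for listing_id, sig in INDEX.get(category, {}).items()
    ]
    return sorted(scored)[:top_k]


results = search(0b10110011, ["shoes"])
```

A smaller distance is treated as higher semantic relevance, so the result list is sorted in ascending order of distance.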

SEGMENT TREND ANALYTICS QUERY PROCESSING USING EVENT DATA
20230010139 · 2023-01-12

A method, system, and computer program product for conserving resources in segment trend analytics query processing using event data. A set of events of an entity is aggregated and sorted from earliest to latest, and sequentially processed to incrementally build a subset therefrom. A predicate function for determining segment membership is applied respective of a linear timeline of events of the subset represented by a time of an event processed. A data record comprising identification of the entity, time, and respective segment is generated and stored. Data records are aggregated by respective identification of a segment and a time comprised therein, and at least one analytic measure for the entities identified therein is calculated and stored. An indication of the at least one analytic measure calculated respective of a segment and a time queried is returned, whereby determination of a trend of the segment is enabled.
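One way to picture the incremental processing is the sketch below. The event shape, the `segment_of` predicate (two or more purchases means "active"), and the entity count used as the analytic measure are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def segment_of(events_so_far: List[dict]) -> str:
    """Hypothetical predicate: 'active' once an entity has 2+ purchases."""
    purchases = sum(1 for e in events_so_far if e["type"] == "purchase")
    return "active" if purchases >= 2 else "casual"


def build_records(entity_id: str, events: List[dict]) -> List[Tuple[str, int, str]]:
    """Process events earliest-first, growing the subset one event at a time."""
    records = []
    subset: List[dict] = []
    for event in sorted(events, key=lambda e: e["time"]):
        subset.append(event)
        records.append((entity_id, event["time"], segment_of(subset)))
    return records


def aggregate(records: List[Tuple[str, int, str]]) -> Dict[Tuple[str, int], int]:
    """Count distinct entities per (segment, time); the count is the measure."""
    members = defaultdict(set)
    for entity_id, time, segment in records:
        members[(segment, time)].add(entity_id)
    return {key: len(ids) for key, ids in members.items()}


events_by_entity = {
    "u1": [{"time": 2, "type": "purchase"}, {"time": 1, "type": "purchase"}],
    "u2": [{"time": 1, "type": "view"}, {"time": 2, "type": "purchase"}],
}
all_records = []
for entity_id, events in events_by_entity.items():
    all_records.extend(build_records(entity_id, events))
measures = aggregate(all_records)
```

Looking up `measures[(segment, time)]` for successive times gives the trend of a segment without re-scanning the raw events.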

Data investigation and visualization system

Data investigations are performed by querying a plurality of data sources. A system receives an investigation input and queries a plurality of data sources in accordance with the received input. The system receives, in response to the querying, response data from the plurality of data sources, and generates and stores a data structure representing relationships between the investigation input and the response data. The data structure may be in the form of a knowledge graph. The system may generate and display a visualization of the data structure. The system may generate and store a record of investigation steps used to generate the data structure, such that the investigation steps may be applied in future instances, for example using different inputs, to generate new data structures.

System and method for generating size-based splits in a massively parallel or distributed database environment

A system and method are described for database split generation in a massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and selects a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
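A rough sketch of choosing a splits generator and producing query splits follows. The table property names (`row_count`, `avg_row_bytes`), the `row_id` column, the preference values, and the 40,000-byte default are hypothetical, and the generated SQL text is only illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

Split = Tuple[int, int]  # half-open row range [start, end)
Table = Dict[str, int]


def size_based_splits(table: Table, max_split_bytes: int) -> List[Split]:
    """Divide the table so each split covers at most max_split_bytes."""
    rows_per_split = max(1, max_split_bytes // table["avg_row_bytes"])
    return [
        (start, min(start + rows_per_split, table["row_count"]))
        for start in range(0, table["row_count"], rows_per_split)
    ]


def single_split(table: Table, max_split_bytes: int) -> List[Split]:
    """Fallback generator: one split covering the whole table."""
    return [(0, table["row_count"])]


def choose_generator(
    preference: Optional[str], table: Table, max_split_bytes: int
) -> Callable[[Table, int], List[Split]]:
    """Pick a generator from the user preference, else from table properties."""
    total_bytes = table["row_count"] * table["avg_row_bytes"]
    if preference == "size" or (preference is None and total_bytes > max_split_bytes):
        return size_based_splits
    return single_split


def query_splits(
    table_name: str, table: Table, preference: Optional[str] = None,
    max_split_bytes: int = 40_000,
) -> List[str]:
    """Turn table splits into per-mapper query strings."""
    generator = choose_generator(preference, table, max_split_bytes)
    return [
        f"SELECT * FROM {table_name} WHERE row_id >= {lo} AND row_id < {hi}"
        for lo, hi in generator(table, max_split_bytes)
    ]


orders = {"row_count": 1000, "avg_row_bytes": 100}
splits = query_splits("orders", orders)
```

Each string in `splits` would be handed to one mapper, so the number of splits bounds the degree of parallelism.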

Context-based digital assistant

An electronic device that includes one or more input sensor devices, one or more output devices, one or more computer processors and a memory containing computer program code that, when executed by operation of the one or more computer processors, performs an operation. The operation includes collecting information, using the one or more input sensor devices, about a plurality of users within a physical environment. The operation includes analyzing the collected information to determine a present situational context for the plurality of users that are currently present within the physical environment. An action to perform is determined based on the determined present situational context. The determined action is executed using the one or more output devices.

Using machine learning to estimate query resource consumption in MPPDB

Methods and apparatus are provided for using machine learning to estimate query resource consumption in a massively parallel processing database (MPPDB). In various embodiments, the machine learning may jointly perform query resource consumption estimation for a query and resource extreme events detection, utilize an adaptive kernel that is configured to learn the optimal similarity relation metric for data from each system setting, and utilize multi-level stacking technology configured to leverage outputs of diverse base classifier models. Advantages and benefits of the disclosed embodiments include providing faster and more reliable system performance and avoiding resource issues such as out of memory (OOM) occurrences.

Data statement chunking
11537610 · 2022-12-27

Techniques are presented for applying fine-grained client-specific rules to divide (e.g., chunk) data statements to achieve cost reduction and/or failure rate reduction associated with executing the data statements over a subject dataset. Data statements for the subject dataset are received from a client. Statement attributes derived from the data statements are processed with respect to fine-grained rules and/or other client-specific data to determine whether a data statement chunking scheme is to be applied to the data statements. If a data statement chunking scheme is to be applied, further analysis is performed to select a data statement chunking scheme. A set of data operations is generated based at least in part on the selected data statement chunking scheme. The data operations are issued for execution over the subject dataset. The results from the data operations are consolidated in accordance with the selected data statement chunking scheme and returned to the client.
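The chunk, execute, and consolidate cycle can be sketched as below. The `CLIENT_RULES` table, the range-based chunking scheme, and the list standing in for the subject dataset are assumptions for the example; a real rule set would consider more statement attributes than the range width.

```python
from typing import Dict, List, Tuple

DATASET: List[int] = list(range(1, 11))  # stand-in subject dataset: 1..10

# Hypothetical client-specific rule: largest range one operation may cover.
CLIENT_RULES: Dict[str, Dict[str, int]] = {"acme": {"max_rows_per_operation": 4}}

Operation = Tuple[int, int]  # half-open value range [lo, hi)


def plan_operations(client: str, lo: int, hi: int) -> List[Operation]:
    """Decide whether to chunk the statement and, if so, split it by range."""
    limit = CLIENT_RULES.get(client, {}).get("max_rows_per_operation")
    if limit is None or hi - lo <= limit:
        return [(lo, hi)]  # no chunking scheme applied
    return [(start, min(start + limit, hi)) for start in range(lo, hi, limit)]


def execute(operation: Operation) -> List[int]:
    """Run one data operation over the dataset."""
    lo, hi = operation
    return [value for value in DATASET if lo <= value < hi]


def run_statement(client: str, lo: int, hi: int) -> List[int]:
    """Issue every operation and consolidate results by concatenation."""
    results: List[int] = []
    for operation in plan_operations(client, lo, hi):
        results.extend(execute(operation))
    return results


chunked_plan = plan_operations("acme", 1, 11)
consolidated = run_statement("acme", 1, 11)
```

Because the chunks are disjoint, ordered ranges, concatenating their results reproduces what the unchunked statement would have returned.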

Sort optimization

A system and method for processing of queries including receiving a query including a set operation and a sort operation, wherein the set operation includes a first data structure and a second data structure and the sort operation requests a result set that is sorted based on a column or attribute of the first data structure and a column or attribute of the second data structure; generating a query plan in which a sort operation occurs prior to the set operation; determining a first, partial set of one or more resultant rows responsive to the query; sending the first, partial set of one or more resultant rows responsive to the query to a client; determining a second, partial set of one or more resultant rows responsive to the query; and sending the second, partial set of one or more resultant rows to the client.
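The benefit of sorting before the set operation is that sorted inputs can be combined as a stream, so partial results can be sent before the full result exists. The sketch below assumes a deduplicating union over single-column rows and a batch size of two; none of these specifics come from the abstract.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List


def sorted_union(first: Iterable[int], second: Iterable[int]) -> Iterator[int]:
    """Sort each input first, then stream a deduplicated union in order."""
    previous = object()  # sentinel that equals no row value
    for value in heapq.merge(sorted(first), sorted(second)):
        if value != previous:
            yield value
            previous = value


def batches(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield partial result sets of at most `size` rows each."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Each batch could be sent to the client as soon as it is produced.
partial_results = list(batches(sorted_union([5, 1, 3], [4, 3, 2]), 2))
```

If the sort ran after the union instead, no row could be returned until every input row had been combined and sorted.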

Caching objects from a data store
11520789 · 2022-12-06

In some examples, a database management node updates object metadata with indicators of access frequencies of a plurality of objects in a data store that is remotely accessible by the database management node over a network. The database management node selects a subset of the plurality of objects based on the indicators, and caches the subset in local storage.
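A minimal sketch of frequency-driven caching is shown below. The `FrequencyCache` class, its dictionary-backed remote store, and the "keep the N most frequently read objects" selection rule are hypothetical stand-ins for the node, the data store, and the selection criteria.

```python
from collections import Counter
from typing import Any, Dict


class FrequencyCache:
    """Tracks access counts per object and caches the most-read ones locally."""

    def __init__(self, remote_store: Dict[str, Any], capacity: int) -> None:
        self.remote_store = remote_store  # stand-in for the remote data store
        self.capacity = capacity          # number of objects kept locally
        self.access_counts: Counter = Counter()  # the access-frequency metadata
        self.local: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        """Record the access, then serve from local storage when possible."""
        self.access_counts[key] += 1
        if key in self.local:
            return self.local[key]
        return self.remote_store[key]

    def refresh(self) -> None:
        """Select the most frequently accessed objects and cache them."""
        hottest = [key for key, _ in self.access_counts.most_common(self.capacity)]
        self.local = {key: self.remote_store[key] for key in hottest}


store = {"a": 1, "b": 2, "c": 3}
cache = FrequencyCache(store, capacity=2)
for key in ["a", "a", "a", "b", "c", "c"]:
    cache.read(key)
cache.refresh()
cached_keys = sorted(cache.local)
```

Separating `read` from `refresh` keeps the selection step off the read path, so it can run periodically rather than on every access.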