G06F16/24524

PLATFORM AND SOURCE AGNOSTIC DATA PROCESSING FOR STRUCTURED AND UNSTRUCTURED DATA SOURCES

Data queries that are agnostic to any particular data source may include a data source alias. The data source alias may be replaced with a data source identifier to obtain a data query configured for a target data source. Data processing jobs may be agnostic to any particular data processing platform. A data processing job may include a data processing task that is agnostic to any particular data processing platform. A code library may provide platform-specific code configured to implement a data processing task on a data processing platform. A data query configured for a particular data source and a data processing task configured for a particular data processing platform may be used to create a data processing job. Configurations that restrict execution of a data processing job to execution via an interactive development environment may be removed to allow its execution directly at the data processing platform itself.
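The alias substitution described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `${alias}` placeholder syntax, the `ALIAS_MAPS` table, the target names, and the `resolve_query` function are all assumptions made for the example.

```python
import re

# Hypothetical alias-to-identifier mappings, one per target data source.
ALIAS_MAPS = {
    "warehouse": {"orders": "analytics.public.orders"},
    "lake": {"orders": "s3_catalog.raw.orders_v2"},
}

# Assumed placeholder syntax for an alias inside a source-agnostic query.
_ALIAS_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve_query(agnostic_query: str, target: str) -> str:
    """Replace each ${alias} with the identifier configured for `target`."""
    mapping = ALIAS_MAPS[target]

    def substitute(match):
        alias = match.group(1)
        if alias not in mapping:
            raise KeyError(f"no identifier for alias {alias!r} on {target!r}")
        return mapping[alias]

    return _ALIAS_PATTERN.sub(substitute, agnostic_query)


query = "SELECT id, total FROM ${orders} WHERE total > 100"
print(resolve_query(query, "warehouse"))
```

The same agnostic query can be resolved against `"lake"` without editing the query text, which is the property the abstract relies on.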

AUDITING OF DATABASE SEARCH QUERIES FOR PRIVILEGED DATA
20220414253 · 2022-12-29

An approach for identifying privileged access to a database is provided. A processor receives a query plan to search the database. A processor determines the query plan includes a request that accesses privileged data. A processor generates an updated query plan with an indication of the request that accesses privileged data. A processor sends the updated query plan for an audit of the query plan.
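One way to picture the flow above is a function that walks a query plan and marks the steps that touch privileged data. The sketch below assumes a plan is a plain dictionary with a `steps` list and that privilege is decided per table; `PRIVILEGED_TABLES`, `annotate_plan`, and the plan layout are hypothetical and are not taken from the source.

```python
from copy import deepcopy

# Hypothetical set of tables considered privileged.
PRIVILEGED_TABLES = {"salaries", "medical_records"}


def annotate_plan(plan: dict) -> dict:
    """Return a copy of the plan with privileged-access steps flagged."""
    updated = deepcopy(plan)  # leave the received plan untouched
    flagged = False
    for step in updated["steps"]:
        # Steps without a table (for example a join) are never privileged here.
        step["privileged"] = step.get("table") in PRIVILEGED_TABLES
        flagged = flagged or step["privileged"]
    # The updated plan carries the indication used to route it to an audit.
    updated["needs_audit"] = flagged
    return updated


plan = {
    "query": "SELECT name, amount FROM employees JOIN salaries USING (emp_id)",
    "steps": [
        {"op": "scan", "table": "employees"},
        {"op": "scan", "table": "salaries"},
        {"op": "hash_join"},
    ],
}
updated_plan = annotate_plan(plan)
```

In this sketch the audit step would consume `updated_plan` and inspect only the steps whose `privileged` flag is set.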

Automated validity evaluation for dynamic amendment

A system, program product, and method are provided for use with an artificial intelligence (AI) platform to dynamically amend a knowledge base in response to query evaluation and processing. A received or detected query is subjected to natural language processing to identify, annotate, and map one or more query tokens against a knowledge base. The query tokens are evaluated against the knowledge base to identify one or more query tokens absent from the knowledge base, and a neural network is leveraged to predict a probability relationship between the absent query tokens and one or more tokens populated in the knowledge base. The natural language (NL) query is translated to a structured query language (SQL) query, the SQL query is executed and evaluated, and the knowledge base is selectively and dynamically amended subject to the SQL evaluation.

SCALABLE OBJECT STREAM QUERY FOR OBJECTS IN A DISTRIBUTED STORAGE SYSTEM
20220382754 · 2022-12-01

Systems and methods for providing scalable object storage query capabilities in a distributed storage system are disclosed. In one implementation, a processing device may receive, by an object-based distributed storage system, a request from a client to execute a query with respect to data stored at the distributed storage system. The processing device may execute the query to produce a result object and may store the result object at the distributed storage system. The processing device may further transmit the result object to the client. The processing device may re-execute the query at a subsequent point in time to update the result object and transmit the updated result object to the client.
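The store-then-refresh behavior described above can be illustrated with a toy in-memory store. Everything here is an assumption made for the sketch: the `ObjectStore` class, the `result/` key prefix, and the use of a Python predicate in place of a real query language.

```python
class ObjectStore:
    """Toy in-memory stand-in for an object-based distributed store."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value

    def run_query(self, query_id, predicate):
        """Execute the query, store the result as an object, and return it."""
        result = sorted(
            key
            for key, value in self.objects.items()
            # Skip previously stored result objects before testing the value.
            if not key.startswith("result/") and predicate(value)
        )
        # The result object lives in the same store as the queried data.
        self.objects[f"result/{query_id}"] = result
        return result


def over_ten(value):
    return value > 10


store = ObjectStore()
store.put("a", 5)
store.put("b", 20)
first = store.run_query("q1", over_ten)   # initial execution

store.put("c", 30)                        # data changes after the first run
second = store.run_query("q1", over_ten)  # re-execution updates the result
```

Re-running the query overwrites the stored result object, so a client that fetches `result/q1` after the second run sees the updated result.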

Systems and methods for privacy-enhancing transformation of a SQL query

Systems and methods are described for obtaining a SQL query, translating the SQL query into a modified SQL query incorporating a privacy mechanism, and outputting the modified SQL query. In some embodiments, the modified SQL query is forwarded to a SQL database.

PROBLEM SOLVING IN A DATABASE
20230089667 · 2023-03-23

A method includes receiving, by a computing device, a Structured Query Language (SQL) query from a user; generating, by the computing device, execution structures from the SQL query; generating, by the computing device, test results by running the SQL query with the execution structures; building, by the computing device, logs which record information of the running of the SQL query; generating, by the computing device, a candidate execution structure using the information from the logs; normalizing, by the computing device, the SQL query using the candidate execution structure; running, by the computing device, the normalized SQL query in a database; and comparing, by the computing device, results of the normalized SQL query to the test results.

Ranking data assets for processing natural language questions based on data stored across heterogeneous data sources
11609903 · 2023-03-21

An analysis system connects to a set of data sources and processes natural language questions based on the data sources. The analysis system connects with the data sources and retrieves metadata describing data assets stored in each data source. The analysis system generates an execution plan for the natural language question. The analysis system finds data assets that match the received question based on the metadata. The analysis system ranks the data assets and presents the ranked data assets to users, allowing them to modify the execution plan. The analysis system may use execution plans of previously stored questions for executing new questions. The analysis system supports selective preprocessing of data to increase the data quality.

CACHE OPTIMIZATION FOR DATA PREPARATION
20220335030 · 2022-10-20

Cache optimization for data preparation includes: generating a data traversal program that represents a result of a set of sequenced data preparation operations performed on one or more sets of data, wherein the data traversal program indicates how to assemble one or more affected columns in the one or more sets of data to derive the result; in response to receiving a specification of the set of sequenced operations to be performed on the one or more sets of data, accessing the data traversal program that represents the result or a stored copy of the data traversal program that represents the result; assembling the one or more affected columns in the one or more sets of data according to the data traversal program to re-generate the result; and outputting the result.

Optimizing database queries
11475004 · 2022-10-18

Various examples are directed to systems and methods for optimizing database queries. A database management system may receive a first query comprising a plurality of query expressions. The database management system may determine that a first expression of the first query is nullable and that the first expression is null preserving. The database management system may generate optimized query code for the first query. The optimized query code may comprise a first code segment and a conditional jump instruction. The first code segment, when executed by a processor, may cause the processor to perform operations comprising determining a value of the first expression. The conditional jump instruction may, when executed by the processor, cause the processor to perform operations comprising: skipping execution of at least a portion of the first code segment and returning null for the first expression.
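The effect of the conditional jump can be approximated in a high-level language with an early return. The sketch below is an analogy, not generated query code: it assumes a null-preserving expression is one that yields null whenever any input is null, and the names `compile_null_preserving` and `price_with_tax` are hypothetical.

```python
from typing import Callable, Optional, Sequence


def compile_null_preserving(
    inputs: Sequence[str],
    compute: Callable[..., float],
) -> Callable[[dict], Optional[float]]:
    """Build an evaluator that returns None early if any input is None."""

    def evaluate(row: dict) -> Optional[float]:
        values = [row[name] for name in inputs]
        # Plays the role of the conditional jump: when an input is null,
        # skip the computation below and return null for the expression.
        if any(value is None for value in values):
            return None
        # Plays the role of the first code segment: determine the value.
        return compute(*values)

    return evaluate


price_with_tax = compile_null_preserving(
    ["price", "tax_rate"],
    lambda price, rate: price * (1 + rate),
)
```

Calling `price_with_tax` on a row whose `tax_rate` is `None` returns `None` without invoking the lambda, which mirrors the skipped portion of the code segment.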

Determining user and data record relationships based on vector space embeddings

Methods, systems, and devices supporting determining user and data record relationships based on vector space embeddings are described. Some database systems may receive data record access indications corresponding to data records accessed by users. A database system may generate, based on the data record access indications, user sessions for the users, data record sessions for the data records, or a combination for users and data records. For example, a user session may correspond to a respective user and include a record identifier associated with each data record accessed by the user. The system may generate, in a vector space, vectors from the sessions using an embedding operation, where each vector corresponds to a respective user or data record. The system may determine relationships between the users, data records, or both based on the vectors and may transmit an indication of at least one data record based on the relationships.
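The session-to-vector pipeline above can be sketched with a deliberately simple substitute for a learned embedding. This example assumes plain record-count vectors and cosine similarity rather than whatever embedding operation the described system uses; `access_log`, `build_user_sessions`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import defaultdict

# Hypothetical access indications: (user, record identifier) pairs.
access_log = [
    ("alice", "r1"), ("alice", "r2"),
    ("bob", "r1"), ("bob", "r2"),
    ("carol", "r3"),
]


def build_user_sessions(log):
    """Group record identifiers by the user who accessed them."""
    sessions = defaultdict(list)
    for user, record in log:
        sessions[user].append(record)
    return dict(sessions)


def embed(sessions):
    """Map each session to a count vector over all records (a stand-in embedding)."""
    vocab = sorted({rec for recs in sessions.values() for rec in recs})
    index = {rec: i for i, rec in enumerate(vocab)}
    vectors = {}
    for user, recs in sessions.items():
        vec = [0.0] * len(vocab)
        for rec in recs:
            vec[index[rec]] += 1.0
        vectors[user] = vec
    return vectors


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vectors = embed(build_user_sessions(access_log))
alice_bob = cosine(vectors["alice"], vectors["bob"])
alice_carol = cosine(vectors["alice"], vectors["carol"])
```

Users with overlapping access histories end up with similar vectors, so records accessed by one of them are natural candidates to indicate to the other.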