G06F16/2456

RESOURCE PROVISIONING SYSTEMS AND METHODS
20230046201 · 2023-02-16

A method for a first set of processors and a second set of processors comprises processing a set of queries using the first set of processors and, as a result of a change in utilization of the first set of processors, processing the set of queries using the second set of processors. The change in processors is independent of any change in storage resources, which are shared by the first set of processors and the second set of processors.

RELATIONSHIP ANALYSIS USING VECTOR REPRESENTATIONS OF DATABASE TABLES
20230051059 · 2023-02-16

A computer-implemented method includes representing a plurality of database tables as respective vectors in a multi-dimensional vector space, receiving an indication that a first database table represented by a first vector and a second database table represented by a second vector are related to each other, moving the respective vectors representing the plurality of database tables in the multi-dimensional vector space in response to the indication, and grouping the plurality of database tables into one or more table clusters based on positions of the respective vectors representing the plurality of database tables in the multi-dimensional vector space.
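
The move-then-cluster idea in this abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the update rule (pulling the two related vectors toward their midpoint), the `rate` and `radius` parameters, and the single-linkage grouping are all assumptions chosen for illustration, and only the two related vectors are moved here.

```python
import math

def move_closer(vectors, a, b, rate=0.5):
    """Pull the vectors of tables a and b toward their midpoint (assumed update rule)."""
    va, vb = vectors[a], vectors[b]
    mid = [(x + y) / 2 for x, y in zip(va, vb)]
    vectors[a] = [x + rate * (m - x) for x, m in zip(va, mid)]
    vectors[b] = [y + rate * (m - y) for y, m in zip(vb, mid)]

def cluster(vectors, radius):
    """Group tables whose vectors lie closer than `radius` (single linkage)."""
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if math.dist(vectors[p], vectors[q]) < radius:
                parent[find(q)] = find(p)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical table names and starting positions.
vectors = {"orders": [0.0, 0.0], "customers": [4.0, 0.0], "logs": [10.0, 10.0]}
before = cluster(vectors, radius=3.0)
move_closer(vectors, "orders", "customers")  # indication: the two tables are related
after = cluster(vectors, radius=3.0)
```

In this toy run the related tables start 4 units apart, end 2 units apart, and therefore fall into the same cluster afterwards.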

Distributed pseudo-random subset generation

Distributed pseudo-random subset generation includes obtaining a data-query indicating a first table having a first column including unique values, a second table having a second column including unique values, a join clause joining the first table and the second table on the first column and the second column, and a limit value, pseudo-random filtering the first table to obtain left intermediate data and left filtering criteria, pseudo-random filtering the second table to obtain right intermediate data and right filtering criteria, obtaining intermediate results data by full outer joining the left intermediate data and the right intermediate data, obtaining results data by filtering the intermediate results data using most-restrictive filtering criteria among the left filtering criteria and the right filtering criteria, and outputting the results data, wherein outputting the results data includes limiting the cardinality of rows of the results data to be at most the limit value.
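
One possible reading of these steps is sketched below in Python. Everything beyond the abstract is an assumption: the hash-based score `h`, the use of "largest kept score" as the filtering criterion, and the function names are hypothetical stand-ins, not the patented procedure.

```python
import hashlib

def h(value):
    """Deterministic pseudo-random score for a key (assumed stand-in hash)."""
    return int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:8], "big")

def pseudo_random_filter(rows, key, limit):
    """Keep the `limit` rows with the smallest scores; also return the largest kept score."""
    kept = sorted(rows, key=lambda r: h(r[key]))[:limit]
    threshold = h(kept[-1][key]) if kept else -1
    return kept, threshold

def sample_join(left, lkey, right, rkey, limit):
    l_rows, l_thr = pseudo_random_filter(left, lkey, limit)
    r_rows, r_thr = pseudo_random_filter(right, rkey, limit)
    l_by = {r[lkey]: r for r in l_rows}
    r_by = {r[rkey]: r for r in r_rows}
    # Full outer join of the two intermediate row sets on the join key.
    joined = {k: (l_by.get(k), r_by.get(k)) for k in l_by.keys() | r_by.keys()}
    # Apply the most restrictive of the two criteria (the smaller threshold).
    thr = min(l_thr, r_thr)
    result = sorted(((k, pair) for k, pair in joined.items() if h(k) <= thr),
                    key=lambda kv: h(kv[0]))
    return result[:limit]

left = [{"id": i} for i in range(100)]
right = [{"rid": i} for i in range(50, 150)]
out = sample_join(left, "id", right, "rid", 5)
```

Because each side is filtered with the same deterministic score, every node could filter its share of a table independently and still agree on which keys survive, which is presumably what makes the approach suitable for distributed execution.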

Methods and apparatuses for generating redo records for cloud-based database

Methods and apparatuses in a cloud-based database management system are described. Data in a database are stored in a plurality of pages in a page store of the database. A plurality of redo log records are received to be applied to the database. The redo log records within a predefined boundary are parsed to determine, for each given redo log record, a corresponding page to which the given log record is to be applied. The redo log records are reordered by corresponding page. The reordered redo log records are stored to be applied to the page store of the database.
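
The reordering step can be sketched as follows. This is a minimal illustration, assuming the "predefined boundary" is a log sequence number cutoff; the `RedoRecord` fields and the `boundary_lsn` parameter are hypothetical names, not taken from the source.

```python
from collections import defaultdict
from typing import NamedTuple

class RedoRecord(NamedTuple):
    lsn: int        # log sequence number (assumed ordering key)
    page_id: int    # page the record applies to
    payload: bytes

def reorder_by_page(records, boundary_lsn):
    """Group records up to the boundary by page, keeping LSN order within each page."""
    by_page = defaultdict(list)
    for rec in records:
        if rec.lsn <= boundary_lsn:
            by_page[rec.page_id].append(rec)
    for recs in by_page.values():
        recs.sort(key=lambda r: r.lsn)
    return dict(by_page)

records = [
    RedoRecord(1, 7, b"a"),
    RedoRecord(2, 3, b"b"),
    RedoRecord(3, 7, b"c"),
    RedoRecord(4, 3, b"d"),  # beyond the boundary, left for a later batch
]
grouped = reorder_by_page(records, boundary_lsn=3)
```

Grouping by page lets each page's records be applied together, while the per-page LSN sort preserves the order in which changes to that page must be replayed.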

Implementing linear algebra functions via decentralized execution of query operator flows

A method for execution by a query processing system includes determining a query request that indicates a plurality of operators, where the plurality of operators includes at least one relational algebra operator and further includes at least one non-relational operator. A query operator execution flow is generated from the query request that indicates a serialized ordering of the plurality of operators. A query resultant of the query is generated by facilitating execution of the query via a set of nodes of a database system that each perform a plurality of operator executions in accordance with the query operator execution flow, where a subset of the set of nodes each execute at least one operator execution corresponding to the at least one non-relational operator in accordance with the execution of the query.

Utilizing metadata to prune a data set

A query directed to database data stored across a set of files is received. The query includes a plurality of predicates, and each file from the set of files is associated with metadata stored in a metadata store that is separate from a storage platform that stores the set of files. One or more files whose metadata does not satisfy a predicate of the plurality of predicates are removed from the set of files to generate a pruned set of files. One or more predicates that are satisfied by the metadata of the pruned set of files are removed to generate a modified query.
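
A minimal Python sketch of the two pruning steps follows. It assumes the metadata holds per-column minimum and maximum values and that predicates are simple `>=` or `<=` comparisons; the `FileMeta` type and the predicate tuple format are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileMeta:
    name: str
    min_max: dict  # column -> (min, max), an assumed form of file metadata

def may_match(meta, pred):
    """True if some row in the file could satisfy the predicate."""
    col, op, val = pred
    lo, hi = meta.min_max[col]
    return hi >= val if op == ">=" else lo <= val

def always_matches(meta, pred):
    """True if every row in the file is guaranteed to satisfy the predicate."""
    col, op, val = pred
    lo, hi = meta.min_max[col]
    return lo >= val if op == ">=" else hi <= val

def prune(files, predicates):
    # Step 1: drop files whose metadata rules out any predicate.
    kept = [f for f in files if all(may_match(f, p) for p in predicates)]
    # Step 2: drop predicates that the remaining files satisfy in full.
    remaining = [p for p in predicates
                 if not all(always_matches(f, p) for f in kept)]
    return kept, remaining

files = [
    FileMeta("a", {"x": (0, 10)}),
    FileMeta("b", {"x": (20, 30)}),
    FileMeta("c", {"x": (25, 50)}),
]
kept, remaining = prune(files, [("x", ">=", 15), ("x", "<=", 40)])
```

Here file `a` is pruned because its maximum is below 15, and the `>= 15` predicate is then dropped because both surviving files start at 20 or higher, so only `x <= 40` still has to be evaluated against row data.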

DATA STRUCTURE MANAGEMENT SYSTEM
20230043217 · 2023-02-09

A computing device generates a first token for first data content that is associated with a first relationship and a second relationship, and a second token for second data content that is associated with the first relationship and a third relationship, such that the first token and second token are generated based on a frequency of use of data values included in the first and the second data content. The computing device calculates a first similarity score of data values from third data content that is associated with the second relationship and a fourth relationship with data values from fourth data content that is associated with the third relationship and the fourth relationship in response to the first and second token matching. The computing device then performs, in response to the first similarity score satisfying a similarity threshold, a first modification to any of the data content.

Dynamic updating of query result displays

Described are methods, systems and computer readable media for dynamic updating of query result displays.

Systems and methods for spilling data for hash joins
11550793 · 2023-01-10

Systems and methods for spilling data for hash joins are described. An example method includes determining that an amount of available space in a first memory used by a set of relational queries is insufficient for a first relational join query. The first relational join query comprises a join operation. The method also includes determining a set of build memory sizes and a set of probe memory sizes for a set of partitions for the set of relational queries. The method further includes identifying a first partition of the set of partitions based on the set of probe memory sizes and the set of build memory sizes. The method further includes copying the first partition from the first memory to a second memory, wherein the first partition comprises a first build portion and a first probe portion.
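
The partition selection and spill can be sketched in a few lines of Python. The abstract does not say how the sizes are combined, so picking the partition with the largest build-plus-probe footprint is an assumption, as are the dictionary layout and function names; the sketch also releases the partition from the first memory after copying it.

```python
def pick_partition_to_spill(build_sizes, probe_sizes):
    """Choose the partition with the largest combined footprint (assumed policy)."""
    return max(build_sizes, key=lambda p: build_sizes[p] + probe_sizes.get(p, 0))

def spill(first_memory, second_memory, build_sizes, probe_sizes):
    """Move one partition, build and probe portions together, to the second memory."""
    pid = pick_partition_to_spill(build_sizes, probe_sizes)
    second_memory[pid] = first_memory.pop(pid)
    return pid

# Hypothetical partitions: each holds a build portion and a probe portion.
first_memory = {
    0: {"build": ["b0"], "probe": ["p0"]},
    1: {"build": ["b1"], "probe": ["p1"]},
}
second_memory = {}
spilled = spill(first_memory, second_memory,
                build_sizes={0: 100, 1: 300},
                probe_sizes={0: 50, 1: 20})
```

Keeping the build and probe portions of a partition together means the spilled partition can later be joined on its own, without touching the partitions that stayed in the first memory.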

System for detecting data relationships based on sample data

A method of identifying relationships between data collections is disclosed. Each data collection comprises a plurality of data records made up of data fields. The method comprises performing a relationship search process based on a first seed value and a second seed value. A first set of records from the data collections is identified based on the first seed value. A second set of records from the data collections is identified based on the second seed value. The process then searches for a common value across the first and second record sets, wherein the common value is a value which appears in a first field in a first record of the first record set and in a second field in a second record of the second record set, wherein the first record is from a first data collection and the second record is from a second data collection. In response to identifying the common value, an indication is output identifying a candidate relationship between the first field of the first data collection and the second field of the second data collection.
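
The seed-based search can be illustrated with a small Python sketch. The data layout (collections as lists of field-to-value dictionaries) and the function names are assumptions made for the example; a real system would presumably use indexes rather than nested loops.

```python
def records_with(collections, seed):
    """All (collection name, record) pairs in which the seed value appears."""
    return [(name, rec)
            for name, recs in collections.items()
            for rec in recs
            if seed in rec.values()]

def candidate_relationships(collections, seed_a, seed_b):
    """Find field pairs in different collections that share a common value."""
    first = records_with(collections, seed_a)
    second = records_with(collections, seed_b)
    found = set()
    for c1, r1 in first:
        for c2, r2 in second:
            if c1 == c2:
                continue  # the two records must come from different collections
            for f1, v1 in r1.items():
                for f2, v2 in r2.items():
                    if v1 == v2:
                        found.add((c1, f1, c2, f2))
    return found

# Hypothetical sample data.
collections = {
    "customers": [{"id": 17, "name": "Ada"}],
    "orders": [{"order_id": 900, "cust": 17, "item": "lamp"}],
}
candidates = candidate_relationships(collections, "Ada", "lamp")
```

With the seeds "Ada" and "lamp", the shared value 17 links `customers.id` to `orders.cust`, which is reported as the candidate relationship.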