G06F16/24557

PRUNING INDEX TO SUPPORT SEMI-STRUCTURED DATA TYPES

A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformations includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.
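
The abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the Bloom filter, the key layout `path|type|text`, and the names `BloomFilter`, `leaves`, `indexing_transformations`, `build_pruning_index`, and `candidate_batches` are all assumptions made for illustration. Each batch unit gets one filter, and every leaf value of a semi-structured object is converted to one or more typed representations before it is indexed.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one Python integer

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def leaves(obj, path=""):
    """Yield (path, leaf value) pairs from a nested dict/list object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from leaves(item, path + "[]")
    else:
        yield path, obj


def indexing_transformations(value):
    """Convert a leaf value to one or more (type tag, text) pairs."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return [("bool", str(value).lower())]
    if isinstance(value, (int, float)):
        return [("num", repr(float(value))), ("str", str(value))]
    if isinstance(value, str):
        keys = [("str", value)]
        try:
            keys.append(("num", repr(float(value))))  # numeric-looking string
        except ValueError:
            pass
        return keys
    return []  # None and other types are not indexed in this sketch


def build_pruning_index(batch_units):
    """Build one filter per batch unit over all typed leaf values."""
    index = []
    for rows in batch_units:
        bloom = BloomFilter()
        for obj in rows:
            for path, value in leaves(obj):
                for tag, text in indexing_transformations(value):
                    bloom.add(f"{path}|{tag}|{text}")
        index.append(bloom)
    return index


def candidate_batches(index, path, value):
    """Return batch units that may hold `value` at `path` (no false negatives)."""
    keys = [f"{path}|{tag}|{text}" for tag, text in indexing_transformations(value)]
    return [i for i, bloom in enumerate(index)
            if any(bloom.might_contain(k) for k in keys)]
```

Because a Bloom filter can report false positives, `candidate_batches` may return extra batch units, but it never omits one that contains the value.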

System and method for classification of low relevance records in a database using instance-based classifiers and machine learning

Devices and methods for classification of low relevance records in a database are disclosed. A method includes: in response to a request to delete a selected database record, generating a vector representation of the selected record, deleting the selected record in the database, and storing the vector representation of the deleted selected record; in response to storing the vector representation of the deleted selected record, determining a cluster from which the vector representation has a shortest determined distance, among a plurality of clusters into which a plurality of vector representations of deleted records is partitioned; determining a distance between a record in the database and a nearest cluster among the plurality of clusters into which the plurality of vector representations of deleted records is partitioned; and in response to the record being within a predetermined distance of the nearest cluster, determining that the record is a deletion candidate record.
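
The distance test described above can be sketched as follows. This is a simplified illustration under several assumptions: records are already available as numeric vectors, each cluster is summarized by a centroid, distance is Euclidean, and the class name `DeletedRecordClusters` and its methods are hypothetical.

```python
import math


def nearest_cluster(vector, centroids):
    """Return (index, distance) of the centroid closest to `vector`."""
    best = min(range(len(centroids)),
               key=lambda i: math.dist(vector, centroids[i]))
    return best, math.dist(vector, centroids[best])


class DeletedRecordClusters:
    """Clusters of vectors that represent already deleted records."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.counts = [1] * len(self.centroids)

    def add_deleted(self, vector):
        """Assign a newly deleted record to its nearest cluster."""
        idx, _ = nearest_cluster(vector, self.centroids)
        self.counts[idx] += 1
        n = self.counts[idx]
        # Running-mean update of the centroid (an assumed update rule).
        self.centroids[idx] = [c + (v - c) / n
                               for c, v in zip(self.centroids[idx], vector)]
        return idx

    def is_deletion_candidate(self, vector, threshold):
        """True if a live record lies within `threshold` of a cluster."""
        _, dist = nearest_cluster(vector, self.centroids)
        return dist <= threshold
```

The threshold plays the role of the predetermined distance in the abstract; how it is chosen is outside the scope of this sketch.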

PRUNING INDEX GENERATION FOR PATTERN MATCHING QUERIES

A query directed at a source table organized into a set of batch units is received. The query includes a pattern matching predicate that specifies a search pattern. A set of N-grams is generated based on the search pattern. A pruning index associated with the source table is accessed. The pruning index comprises a set of filters that index distinct N-grams in each column of the source table. The pruning index is used to identify a subset of batch units to scan for matching data based on the set of N-grams generated for the search pattern. The query is processed by scanning the subset of batch units.
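
A minimal sketch of the query flow described above, assuming SQL `LIKE`-style patterns (`%` and `_` wildcards), trigrams, and exact Python sets standing in for the per-batch filters. The function names are hypothetical and the real filter structure is not specified here.

```python
import re


def ngrams(text, n=3):
    """Return the set of all length-n substrings of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def pattern_ngrams(pattern, n=3):
    """N-grams of the literal runs between wildcards in the pattern."""
    grams = set()
    for literal in re.split(r"[%_]", pattern):
        grams |= ngrams(literal, n)
    return grams


def build_index(batch_units, n=3):
    """One set of distinct N-grams per batch unit (stand-in for a filter)."""
    return [set().union(*(ngrams(value, n) for value in rows))
            for rows in batch_units]


def batches_to_scan(index, pattern, n=3):
    """Keep only batch units whose filter holds every pattern N-gram."""
    grams = pattern_ngrams(pattern, n)
    return [i for i, batch_grams in enumerate(index) if grams <= batch_grams]


def like_matches(value, pattern):
    """Evaluate the LIKE pattern against one value."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def run_query(batch_units, index, pattern, n=3):
    """Scan only the surviving batch units for matching values."""
    return [value
            for i in batches_to_scan(index, pattern, n)
            for value in batch_units[i]
            if like_matches(value, pattern)]


batches = [["disk error on node 1", "ok"],
           ["all good", "healthy"],
           ["network error"]]
index = build_index(batches)
print(batches_to_scan(index, "%error%"))  # [0, 2]
```

A pattern whose literal runs are all shorter than `n` yields no N-grams, so every batch unit is scanned. That is the conservative fallback in this sketch.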

Processing database queries using format conversion
11176132 · 2021-11-16

Devices, methods and systems for processing database queries formatted differently than the database storage model being queried are disclosed. Processing database queries independent of the storage model of the queried database may be performed by receiving a query for one or more data items stored in a database, determining whether to use at least one query operator that uses data having a format different from the storage model format of at least one of one or more data items stored in the database and converting the format of the data used by the at least one query operator to a format that matches the storage model format of at least one of one or more data items stored in the database. Related systems, methods, and articles of manufacture are also described.

PREFIX INDEXING
20220012247 · 2022-01-13

A source table organized into a set of batch units is accessed. A set of N-grams is generated for a data value in the source table. The set of N-grams includes a first N-gram of a first length and a second N-gram of a second length where the first N-gram corresponds to a prefix of the second N-gram. A set of fingerprints is generated for the data value based on the set of N-grams. The set of fingerprints includes a first fingerprint generated based on the first N-gram and a second fingerprint generated based on the second N-gram and the first fingerprint. A pruning index that indexes distinct values in each column of the source table is generated based on the set of fingerprints and stored in a database with an association with the source table.
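
One way to picture the chained fingerprints is with an incremental checksum. The sketch below uses `zlib.crc32`, whose optional second argument is a running checksum, so the fingerprint of a longer prefix is computed from the fingerprint of the shorter prefix plus the added bytes. The choice of CRC-32, the prefix lengths `(2, 4, 8)`, the use of UTF-8 bytes rather than characters, and all function names are assumptions for illustration, not details taken from the abstract.

```python
import zlib


def prefix_fingerprints(value, lengths=(2, 4, 8)):
    """Return [(length, fingerprint)] for each prefix length that fits.

    Each fingerprint extends the previous one with only the new bytes.
    """
    data = value.encode("utf-8")
    fingerprints = []
    fingerprint = 0
    previous = 0
    for length in lengths:
        if length > len(data):
            break
        fingerprint = zlib.crc32(data[previous:length], fingerprint)
        fingerprints.append((length, fingerprint))
        previous = length
    return fingerprints


def build_index(batch_units, lengths=(2, 4, 8)):
    """One set of prefix fingerprints per batch unit (stand-in for a filter)."""
    return [{fp for value in rows for fp in prefix_fingerprints(value, lengths)}
            for rows in batch_units]


def batches_for_prefix(index, prefix, lengths=(2, 4, 8)):
    """Batch units that may contain a value starting with `prefix`."""
    wanted = set(prefix_fingerprints(prefix, lengths))
    return [i for i, fingerprints in enumerate(index) if wanted <= fingerprints]


batches = [["apple pie", "apricot"], ["banana", "blueberry"]]
index = build_index(batches)
print(batches_for_prefix(index, "appl"))  # [0]
```

A query prefix shorter than the smallest indexed length produces no fingerprints, so no batch unit is pruned in that case.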

Dynamic combination of processes for sub-queries

A tool for combining common processes shared by two or more sub-queries within a query is provided. The tool determines whether one or more subset relationships are shared between the two or more sub-queries. Responsive to a determination that one or more subset relationships are shared between the two or more sub-queries, the tool determines an order class for the two or more sub-queries based on the one or more subset relationships, wherein determining the order class includes transforming the query to include one or more differing aspects within the single shared common process, with the one or more differing aspects arranged based, at least in part, on a query style, a query type, and a query function. Responsive to determining an access path for the query, the tool executes the access path during run-time for data accessing.

Processing database queries using format conversion
11755575 · 2023-09-12

Devices, methods and systems for processing database queries formatted differently than the database storage model being queried are disclosed. Processing database queries independent of the storage model of the queried database may be performed by receiving a query for one or more data items stored in a database, determining whether to use at least one query operator that uses data having a format different from the storage model format of at least one of one or more data items stored in the database and converting the format of the data used by the at least one query operator to a format that matches the storage model format of at least one of one or more data items stored in the database. Related systems, methods, and articles of manufacture are also described.

Separation of logical and physical storage in a distributed database system

Distributed database systems including compute nodes and page servers are described herein that enable separating logical and physical storage of database files in a distributed database system. A distributed database system includes a plurality of page servers and a compute node and is configured to store a logical database file that includes data and is associated with a file identifier. Each page server is configurable to store slices (i.e., subportions) of the logical database file. The compute node is coupled to the plurality of page servers and configured to store the logical database file responsive to a received command. In an aspect, such storage may comprise slicing the data comprising the logical database file into a set of slices with each being associated with a respective page server, maintaining an endpoint mapping for each slice of the set of slices, and transmitting each slice to the associated page server for storage thereby.
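
The slicing and endpoint mapping described above can be sketched with in-memory stand-ins. Everything here is hypothetical: the `PageServer` and `ComputeNode` classes, fixed-size slices, and round-robin placement are assumptions chosen to keep the example short, and "transmitting" a slice is modeled as a method call.

```python
from dataclasses import dataclass, field


@dataclass
class PageServer:
    """Holds physical slices, keyed by (file identifier, slice number)."""
    endpoint: str
    slices: dict = field(default_factory=dict)

    def store(self, file_id, slice_no, payload):
        self.slices[(file_id, slice_no)] = payload


class ComputeNode:
    """Stores a logical file by slicing it across page servers."""

    def __init__(self, page_servers, slice_size):
        self.page_servers = page_servers
        self.slice_size = slice_size
        self.endpoint_map = {}  # (file_id, slice_no) -> page server endpoint

    def store_file(self, file_id, data):
        starts = range(0, len(data), self.slice_size)
        for slice_no, start in enumerate(starts):
            # Round-robin placement (an assumed policy).
            server = self.page_servers[slice_no % len(self.page_servers)]
            server.store(file_id, slice_no, data[start:start + self.slice_size])
            self.endpoint_map[(file_id, slice_no)] = server.endpoint

    def read_file(self, file_id):
        """Reassemble the logical file by following the endpoint mapping."""
        by_endpoint = {s.endpoint: s for s in self.page_servers}
        keys = sorted(k for k in self.endpoint_map if k[0] == file_id)
        return b"".join(by_endpoint[self.endpoint_map[k]].slices[k]
                        for k in keys)
```

The logical file is identified only by its file identifier on the compute node, while the physical location of each slice is resolved through `endpoint_map`.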

INDEX GENERATION USING LAZY REASSEMBLING OF SEMI-STRUCTURED DATA
20230139194 · 2023-05-04

A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

Automated query tuning method, computer program product, and system for MPP database platform
11657047 · 2023-05-23

A method of improving the performance of any SQL query in a Massively Parallel Processing (MPP) database platform replicates a query and breaks the query down into its objects so that iterations of the query components may be analyzed for areas affecting performance. The method builds the query from the lowest part of the query (for example, a single database object may be used in the query) and rebuilds the query by iteratively adding more objects along with their related logic (joins, group by clause, select list, etc.). In each iteration, the process analyzes for the underlying causes of lower performance and fixes them.