G06F16/24557

Processing queries on semi-structured data columns

A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations are generated for an object in the column; generating the one or more indexing transformations includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database in association with the source table.
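A minimal Python sketch of the idea, under stated assumptions: objects are JSON-like dicts, the "stored data types" are a hypothetical NUMBER/STRING/VARIANT trio with made-up conversion rules, and each batch unit's filter is a plain set of 64-bit fingerprints standing in for a Bloom-style filter. The names `indexing_transformations`, `build_pruning_index`, and `units_to_scan` are illustrative, not taken from the patent.

```python
import hashlib
import json
from typing import Any, Iterator


def flatten(obj: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, leaf) pairs for a nested JSON-like object."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from flatten(val, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            yield from flatten(val, f"{prefix}[{i}]")
    else:
        yield prefix, obj


def indexing_transformations(obj: Any) -> set[tuple[str, str, str]]:
    """Convert each leaf to one or more (path, stored_type, value) entries.

    Hypothetical rules: numbers are stored as NUMBER and STRING; strings are
    stored as STRING and, when they parse as numbers, also as NUMBER.
    """
    out: set[tuple[str, str, str]] = set()
    for path, val in flatten(obj):
        if val is None or isinstance(val, bool):
            out.add((path, "variant", json.dumps(val)))
        elif isinstance(val, (int, float)):
            out.add((path, "number", repr(float(val))))
            out.add((path, "string", str(val)))
        else:
            text = str(val)
            out.add((path, "string", text))
            try:
                out.add((path, "number", repr(float(text))))
            except ValueError:
                pass  # not numeric: STRING entry only
    return out


def fingerprint(entry: tuple[str, str, str]) -> int:
    data = "\x1f".join(entry).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def build_pruning_index(batch_units: list[list[Any]]) -> list[set[int]]:
    """One filter per batch unit; a set of fingerprints stands in for a Bloom filter."""
    return [
        {fingerprint(e) for obj in unit for e in indexing_transformations(obj)}
        for unit in batch_units
    ]


def units_to_scan(index: list[set[int]], path: str, stored_type: str, value: str) -> list[int]:
    """Batch units whose filter may contain the probed (path, type, value)."""
    probe = fingerprint((path, stored_type, value))
    return [i for i, f in enumerate(index) if probe in f]


units = [[{"a": 1, "b": {"c": "x"}}], [{"a": "2"}]]
index = build_pruning_index(units)
print(units_to_scan(index, "a", "number", repr(2.0)))  # [1]: "2" was also stored as NUMBER
```

Storing each value under several stored types lets a typed predicate (here, a numeric comparison on a field that arrived as a string) still be answered from the filter without casting at query time.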

SYSTEMS AND METHODS FOR EFFICIENTLY QUERYING EXTERNAL TABLES

Disclosed herein are systems and methods for efficiently querying external tables. In an embodiment, a database platform receives a query that is directed at least in part to external data in an external table stored on a data storage platform that is external to the database platform. The external table includes a plurality of partitions. The database platform identifies, from external-table metadata, a subset of the plurality of partitions of the external table as including data that potentially satisfies the query. The external-table metadata is stored by the database platform. The database platform identifies data that satisfies the query by scanning the identified subset of the partitions, and responds to the query at least in part with the identified data that satisfies the query.
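A minimal Python sketch of metadata-driven partition pruning for an external table, assuming the platform keeps a min/max value per partition (a hypothetical `PartitionMeta` shape) and that `read_partition` stands in for a read from the external storage platform. Only partitions whose range can overlap the predicate are scanned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PartitionMeta:
    """Per-partition metadata kept by the database platform (assumed shape)."""
    path: str
    min_value: int
    max_value: int


def candidate_partitions(metadata: list[PartitionMeta], lo: int, hi: int) -> list[PartitionMeta]:
    """Keep partitions whose [min, max] range can overlap the predicate lo <= x <= hi."""
    return [m for m in metadata if m.max_value >= lo and m.min_value <= hi]


def query_range(
    metadata: list[PartitionMeta],
    read_partition: Callable[[str], list[int]],
    lo: int,
    hi: int,
) -> list[int]:
    """Scan only the candidate partitions, then apply the predicate exactly."""
    result: list[int] = []
    for meta in candidate_partitions(metadata, lo, hi):
        result.extend(v for v in read_partition(meta.path) if lo <= v <= hi)
    return result


# Usage with an in-memory stand-in for the external storage platform.
external = {"part-0": [1, 5, 9], "part-1": [10, 15], "part-2": [20, 30]}
meta = [PartitionMeta("part-0", 1, 9), PartitionMeta("part-1", 10, 15), PartitionMeta("part-2", 20, 30)]
reads: list[str] = []


def read_partition(path: str) -> list[int]:
    reads.append(path)  # record which partitions were actually fetched
    return external[path]


print(query_range(meta, read_partition, 8, 12))  # [9, 10]
print(reads)  # ['part-0', 'part-1']; part-2 was pruned
```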

Pruning using prefix indexing
11487763 · 2022-11-01

A source table organized into a set of batch units is accessed. A set of N-grams is generated for a data value in the source table. The set of N-grams includes a first N-gram of a first length and a second N-gram of a second length, where the first N-gram is a prefix of the second N-gram. A set of fingerprints is generated for the data value based on the set of N-grams. The set of fingerprints includes a first fingerprint generated from the first N-gram and a second fingerprint generated from the second N-gram and the first fingerprint. A pruning index that indexes distinct values in each column of the source table is generated based on the set of fingerprints and is stored in a database in association with the source table.
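A minimal Python sketch of prefix-chained N-gram fingerprints, assuming hypothetical lengths `MIN_N = 3` and `MAX_N = 5` and a blake2b-based chaining step. At each start position the longer N-gram's fingerprint is derived from the fingerprint of its prefix plus one character, so the shared prefix is hashed only once. Per-unit filters are plain sets; a substring probe prunes a unit only when some required fingerprint is missing.

```python
import hashlib

MIN_N, MAX_N = 3, 5  # hypothetical N-gram lengths


def extend(prefix_fp: int, ch: str) -> int:
    """Fingerprint of an N-gram, derived from its prefix's fingerprint plus one character."""
    data = prefix_fp.to_bytes(8, "big") + ch.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def ngram_fingerprints(value: str) -> set[int]:
    """Fingerprints of every substring of length MIN_N..MAX_N.

    At each start position the N-grams share a prefix, so each longer
    fingerprint is computed from the previous (shorter) one.
    """
    fps: set[int] = set()
    for start in range(len(value) - MIN_N + 1):
        fp = 0  # seed for the empty prefix
        for length in range(1, min(MAX_N, len(value) - start) + 1):
            fp = extend(fp, value[start + length - 1])
            if length >= MIN_N:
                fps.add(fp)
    return fps


def build_index(batch_units: list[list[str]]) -> list[set[int]]:
    """One filter (a set of fingerprints) per batch unit."""
    return [set().union(*(ngram_fingerprints(v) for v in unit)) for unit in batch_units]


def may_contain(unit_filter: set[int], pattern: str) -> bool:
    """False only if no value in the unit can contain `pattern` as a substring."""
    if len(pattern) < MIN_N:
        return True  # too short to probe: cannot prune
    return ngram_fingerprints(pattern) <= unit_filter


units = [["warehouse", "iceberg"], ["database", "pruning"]]
index = build_index(units)
print([i for i, f in enumerate(index) if may_contain(f, "ware")])  # [0]
```

Because every substring of an indexed value with a length in `MIN_N..MAX_N` is fingerprinted, any pattern actually present in a unit passes the subset check, so pruning never drops a matching unit; false positives remain possible and are resolved by the scan.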

PROCESSING DATABASE QUERIES USING FORMAT CONVERSION
20220035815 · 2022-02-03

Devices, methods, and systems are disclosed for processing database queries that are formatted differently from the storage model of the database being queried. Queries can be processed independently of the queried database's storage model by receiving a query for one or more data items stored in a database; determining whether to use at least one query operator that operates on data in a format different from the storage-model format of at least one of those data items; and converting the format of the data used by that query operator to a format that matches the storage-model format of at least one of the data items. Related systems, methods, and articles of manufacture are also described.
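A minimal Python sketch of the conversion step, assuming just two hypothetical layouts (row tuples versus a dict of column lists). The check and conversion run only when the operator's data layout differs from the storage model; real systems would convert between physical page or vector formats rather than Python containers.

```python
from enum import Enum
from typing import Any

Rows = list[tuple]
Columns = dict[str, list]


class Layout(Enum):
    ROW = "row"
    COLUMN = "column"


def rows_to_columns(rows: Rows, names: list[str]) -> Columns:
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def columns_to_rows(cols: Columns, names: list[str]) -> Rows:
    return list(zip(*(cols[name] for name in names)))


def to_storage_format(data: Any, names: list[str], operator_layout: Layout, storage_layout: Layout) -> Any:
    """Convert an operator's data to the storage model's layout, if they differ."""
    if operator_layout is storage_layout:
        return data  # formats already match: no conversion needed
    if storage_layout is Layout.COLUMN:
        return rows_to_columns(data, names)
    return columns_to_rows(data, names)


names = ["id", "qty"]
row_output = [(1, 5), (2, 7)]  # produced by a hypothetical row-at-a-time operator
print(to_storage_format(row_output, names, Layout.ROW, Layout.COLUMN))
# {'id': [1, 2], 'qty': [5, 7]}
```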

QUERY REWRITE USING MATERIALIZED VIEWS WITH LOGICAL PARTITION CHANGE TRACKING

Using Logical Partition Change Tracking (LPCT), a database system is able to track the staleness of a materialized view at the level of logical partitions of a base database object, in addition to or instead of tracking staleness at the level of the object's physical partitions. When the base database object is logically partitioned, LPCT allows the system to identify the records of the materialized view that correspond to changed logical partitions of the base database object. The records corresponding to changed logical partitions become stale, while records corresponding to unchanged logical partitions remain fresh. Because the system can tell which records of the materialized view are fresh and which are stale at the logical-partition level, it can rewrite user queries to use the fresh records of the materialized view.
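A minimal Python sketch of logical-partition staleness tracking, assuming a hypothetical view that holds `SUM(amount)` grouped by a logical partition key (for example, a month derived from a date column). A base-table change marks only its key stale; the rewritten query reads fresh keys from the view and recomputes just the stale keys from the base table. Class and method names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict


class LpctMaterializedView:
    """Sketch: SUM(amount) grouped by a logical partition key, with per-key staleness."""

    def __init__(self, base_rows: Iterable[Row], key_fn: Callable[[Row], str]):
        self.key_fn = key_fn
        self.records = self._aggregate(base_rows)
        self.stale_keys: set[str] = set()

    def _aggregate(self, rows: Iterable[Row]) -> dict[str, int]:
        totals: dict[str, int] = defaultdict(int)
        for row in rows:
            totals[self.key_fn(row)] += row["amount"]
        return dict(totals)

    def on_base_change(self, changed_row: Row) -> None:
        """Mark only the logical partition touched by the change as stale."""
        self.stale_keys.add(self.key_fn(changed_row))

    def answer(self, base_rows: list[Row], keys: list[str]) -> dict[str, int]:
        """Rewritten query: fresh keys come from the view, stale keys from the base table."""
        stale = {k for k in keys if k in self.stale_keys}
        result = {k: self.records.get(k, 0) for k in keys if k not in stale}
        recomputed = self._aggregate(r for r in base_rows if self.key_fn(r) in stale)
        result.update({k: recomputed.get(k, 0) for k in stale})
        return result


base = [{"month": "2024-01", "amount": 10}, {"month": "2024-02", "amount": 5}]
mv = LpctMaterializedView(base, key_fn=lambda r: r["month"])
change = {"month": "2024-02", "amount": 7}
base.append(change)
mv.on_base_change(change)
print(mv.answer(base, ["2024-01", "2024-02"]))  # {'2024-01': 10, '2024-02': 12}
```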

Efficient storage and retrieval of fragmented data using pseudo linear dynamic byte array

A system and method for efficient storage and retrieval of fragmented data using a pseudo linear dynamic byte array is provided. In accordance with an embodiment, the system comprises a database driver that provides a software application with access to a database. The database driver uses a dynamic byte array to give the application access to data in the database: it determines the size of the data that must be stored in memory, then successively allocates blocks and copies that data into the dynamic byte array as a succession of blocks. The data stored within the succession of blocks can then be accessed and provided to the application.
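A minimal Python sketch of a "pseudo linear" byte array, assuming fixed-size blocks (the `block_size` parameter is hypothetical). Data is copied into a succession of separately allocated blocks, yet callers address it through one linear offset that is mapped to a (block, position) pair, so no single large contiguous buffer is ever needed.

```python
class DynamicByteArray:
    """Bytes stored in a succession of fixed-size blocks, addressed linearly."""

    def __init__(self, block_size: int = 4096):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.blocks: list[bytearray] = []
        self.length = 0

    def append(self, data: bytes) -> None:
        """Copy data in, allocating a new block whenever the current one is full."""
        view = memoryview(data)
        while view:
            used = self.length % self.block_size
            if used == 0:
                self.blocks.append(bytearray(self.block_size))  # next block in the succession
            n = min(self.block_size - used, len(view))
            self.blocks[-1][used:used + n] = view[:n]
            self.length += n
            view = view[n:]

    def read(self, offset: int, size: int) -> bytes:
        """Return bytes [offset, offset + size), stitching across block boundaries."""
        if offset < 0 or size < 0 or offset + size > self.length:
            raise IndexError("read out of range")
        out = bytearray()
        while size:
            idx, pos = divmod(offset, self.block_size)
            n = min(self.block_size - pos, size)
            out += self.blocks[idx][pos:pos + n]
            offset += n
            size -= n
        return bytes(out)


arr = DynamicByteArray(block_size=4)
arr.append(b"hello, ")
arr.append(b"world")
print(len(arr.blocks), arr.read(3, 5))  # 3 b'lo, w'
```

Growing by whole blocks avoids the reallocate-and-copy cost of a single contiguous buffer; the trade-off is the offset arithmetic on every read.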

Systems and/or methods for performing atomic updates on large XML information sets
09760549 · 2017-09-12

Certain example embodiments described herein relate to techniques for processing XML documents of potentially very large sizes. For instance, certain example embodiments parse a potentially large XML document, store the parsed data and some associated metadata in multiple independent blocks or partitions, and instantiate only the particular object model object requested by a program. By including logical references rather than physical memory addresses in such pre-parsed partitions, certain example embodiments make it possible to move the partitions through a caching storage hierarchy without necessarily having to adjust or encode memory references, thereby advantageously enabling dynamic usage of the created partitions and making it possible to cache an arbitrarily large document while consuming a limited amount of program memory. Such techniques may be extended to enable atomic updates to be processed efficiently, e.g., by maintaining commit level information in a partition list and optionally implementing document shadowing.
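A minimal Python sketch of two ideas from the abstract, under assumptions: nodes refer to each other through logical `Ref(partition, slot)` values rather than memory addresses, so a partition could be moved through a cache tier unchanged; and updates are applied to shadow copies of the touched partitions, then published by switching a partition list tagged with a commit level. Atomicity here is only in the single-threaded sense; all names are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Ref:
    """Logical reference to a node: (partition id, slot), never a memory address."""
    partition: int
    slot: int


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list[Ref] = field(default_factory=list)


class PartitionedDocument:
    def __init__(self, partitions: list[list[Node]]):
        # Backing store keyed by (partition id, version). Because nodes refer to each
        # other only through Ref, entries could be evicted and reloaded unchanged.
        self.store: dict[tuple[int, int], list[Node]] = {
            (pid, 0): nodes for pid, nodes in enumerate(partitions)
        }
        # Partition list: the committed version of each partition.
        self.partition_list: dict[int, int] = {pid: 0 for pid in range(len(partitions))}
        self.commit_level = 0

    def resolve(self, ref: Ref) -> Node:
        """Look up only the node that was asked for, via the committed partition."""
        version = self.partition_list[ref.partition]
        return self.store[(ref.partition, version)][ref.slot]

    def update(self, changes: dict[Ref, str]) -> None:
        """Apply text changes atomically using shadow copies of the touched partitions."""
        new_level = self.commit_level + 1
        shadows: dict[int, list[Node]] = {}
        for ref, text in changes.items():
            if ref.partition not in shadows:
                current = self.store[(ref.partition, self.partition_list[ref.partition])]
                shadows[ref.partition] = list(current)  # shadow copy; originals untouched
            nodes = shadows[ref.partition]
            nodes[ref.slot] = replace(nodes[ref.slot], text=text)
        # Commit: publish the shadows, then switch the partition list in one step.
        for pid, nodes in shadows.items():
            self.store[(pid, new_level)] = nodes
        self.partition_list = {**self.partition_list, **{pid: new_level for pid in shadows}}
        self.commit_level = new_level


doc = PartitionedDocument([
    [Node("root", children=[Ref(1, 0), Ref(1, 1)])],
    [Node("a", "x"), Node("b", "y")],
])
root = doc.resolve(Ref(0, 0))
doc.update({Ref(1, 0): "x2", Ref(1, 1): "y2"})
print([doc.resolve(r).text for r in root.children])  # ['x2', 'y2']
```

If any change in a batch fails, for example an out-of-range slot, the exception is raised before the partition list is switched, so readers keep seeing the previous commit level.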

PROCESSING TECHNIQUES FOR QUERIES WHERE PREDICATE VALUES ARE UNKNOWN UNTIL RUNTIME

A query directed at a table organized into a set of batch units is received. The query comprises a predicate whose values are unknown prior to runtime. A set of values for the predicate is determined based on the query. An index access plan is created based on the set of values. Based on the index access plan, the set of batch units is pruned using a pruning index associated with the table. The pruning index comprises a set of filters that index distinct values in each column of the table. Pruning the set of batch units comprises identifying a subset of batch units to scan for data that satisfies the query. The subset of batch units is then scanned to identify data that satisfies the query.
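A minimal Python sketch of the four steps, assuming an `IN`-style predicate whose values come from a callable evaluated at runtime (standing in for, say, a subquery or bind parameter). The "index access plan" is reduced here to the set of fingerprints to probe, and each batch unit's filter is a plain set; function names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable


def fp(value: object) -> int:
    """64-bit fingerprint; assumes predicate and column values share one type."""
    return int.from_bytes(hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest(), "big")


def build_pruning_index(units: list[list[dict]], column: str) -> list[set[int]]:
    """One filter per batch unit over the distinct values of `column`."""
    return [{fp(row[column]) for row in unit} for unit in units]


def execute_in_predicate(
    units: list[list[dict]],
    index: list[set[int]],
    column: str,
    runtime_values: Callable[[], Iterable[object]],
) -> tuple[list[int], list[dict]]:
    # 1. Determine predicate values at runtime (e.g. by evaluating a subquery).
    values = set(runtime_values())
    # 2. Index access plan: the fingerprints to probe in each filter.
    plan = {fp(v) for v in values}
    # 3. Prune: keep batch units whose filter may contain any probed value.
    to_scan = [i for i, f in enumerate(index) if not plan.isdisjoint(f)]
    # 4. Scan only the surviving batch units and apply the predicate exactly.
    rows = [row for i in to_scan for row in units[i] if row[column] in values]
    return to_scan, rows


units = [[{"id": 1}, {"id": 2}], [{"id": 3}], [{"id": 4}, {"id": 5}]]
index = build_pruning_index(units, "id")
lookup = [{"k": 3}, {"k": 5}]  # source of the runtime values
print(execute_in_predicate(units, index, "id", lambda: (r["k"] for r in lookup)))
# ([1, 2], [{'id': 3}, {'id': 5}])
```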

Automated maintenance of external tables in database systems

Systems, methods, and devices for automated maintenance of external tables in database systems are disclosed. A method includes receiving, by a database platform, read access to content in an external data storage platform that is separate from the database platform. The method includes defining an external table based on the content in the external data storage platform. The method includes connecting the database platform to the external table such that the database platform has read access for the external table and does not have write access for the external table. The method includes generating metadata for the external table, the metadata comprising information about data stored in the external table. The method includes receiving a notification that a modification has been made to the content in the external data storage platform, the modification comprising one or more of an addition of a file, a deletion of a file, or an update to a file in a source location for the external table. The method includes refreshing the metadata for the external table in response to the modification being made to the content in the external data storage platform.
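A minimal Python sketch of notification-driven metadata refresh, under assumptions: the external storage platform is modeled as a dict of file path to values, wrapped in a read-only `types.MappingProxyType` so the table can read but not write it; per-file metadata is a hypothetical row count plus min/max; and `Notification`/`Event` are illustrative shapes, not a real cloud-storage event API.

```python
from dataclasses import dataclass
from enum import Enum
from types import MappingProxyType
from typing import Mapping


class Event(Enum):
    ADDED = "added"
    DELETED = "deleted"
    UPDATED = "updated"


@dataclass(frozen=True)
class Notification:
    event: Event
    path: str


class ExternalTable:
    def __init__(self, location: str, storage: Mapping[str, list]):
        self.location = location
        # Read-only proxy: the platform can read, but not write, external files.
        self.storage = MappingProxyType(storage)
        self.metadata: dict[str, dict] = {
            p: self._stats(p) for p in self.storage if p.startswith(location)
        }

    def _stats(self, path: str) -> dict:
        rows = self.storage[path]
        return {"rows": len(rows), "min": min(rows, default=None), "max": max(rows, default=None)}

    def on_notification(self, note: Notification) -> None:
        """Refresh metadata for a file added, deleted, or updated in the source location."""
        if not note.path.startswith(self.location):
            return  # outside this table's source location
        if note.event is Event.DELETED:
            self.metadata.pop(note.path, None)
        else:  # ADDED or UPDATED: re-read the file's statistics
            self.metadata[note.path] = self._stats(note.path)


files = {"ext/sales/f1": [3, 9], "other/x": [1]}
table = ExternalTable("ext/sales/", files)
files["ext/sales/f2"] = [20, 25]  # the external system adds a file
table.on_notification(Notification(Event.ADDED, "ext/sales/f2"))
print(sorted(table.metadata))  # ['ext/sales/f1', 'ext/sales/f2']
```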
