G06F16/24557

LAZY REASSEMBLING OF SEMI-STRUCTURED DATA
20220358128 · 2022-11-10 ·

A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

COLUMN ORDERING FOR INPUT/OUTPUT OPTIMIZATION IN TABULAR DATA

Systems, methods, and computer-readable media for determining column ordering of a data storage table for search optimization are described herein. In some examples, a computing system is configured to receive input containing statistics of a plurality of queries. The computing system can then determine a new column order (i.e., layout) based at least in part on the statistics. In some example techniques described herein, the computing system can determine the new column order based at least in part on the hardware components storing the data storage table, storage system parameters, and/or user preference information. Example techniques described herein can apply the new column order to data subsequently added to the data storage table. Example techniques described herein can apply the new column order to existing data in the data storage table.

INDEXED REGULAR EXPRESSION SEARCH WITH N-GRAMS

A query directed at a source table organized into a set of batch units is received. The query comprises a regular expression search pattern. The regular expression search pattern is converted to a pruning index predicate comprising a set of substring literals extracted from the regular expression search pattern. A set of N-grams is generated based on the set of substring literals extracted from the regular expression search pattern. A pruning index associated with the source table is accessed. The pruning index indexes distinct N-grams in each column of the source table. A subset of batch units to scan for data matching the query are identified based on the pruning index and the set of N-grams. The query is processed by scanning the subset of batch units.

System and method for providing direct access to a sharded database

In accordance with an embodiment, described herein are systems and methods for providing direct access to a sharded database. A shard director provides access by software client applications to database shards. A connection pool (e.g., a Universal Connection Pool, UCP) and database driver (e.g., a Java Database Connectivity, JDBC, component) can be configured to allow a client application to provide a shard key, either during connection checkout or at a later time; recognize shard keys specified by the client application; and enable connection by the client application to a particular shard or chunk. The approach enables efficient re-use of connection resources, and faster access to appropriate shards.

SCAN SET PRUNING FOR QUERIES WITH PREDICATES ON SEMI-STRUCTURED FIELDS

A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformation includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.

Shard Optimization For Parameter-Based Indices

Methods and systems for shard optimized database queries using parameter-based indexes are provided. Exemplary methods include: receiving a database query that includes an index parameter and an index parameter range. A parameter table is accessed that contains an association between the parameter and parameter range and a shard identifier. Based on the parameter type and the range identified in the query, the relevant shards are identified, and the database query is limited to processing these shards.

Late Materialization of Queried Data in Database Cache
20230141190 · 2023-05-11 ·

Aspects of the disclosure are directed to late materialization of attributes in response to queries to a database implementing a database cache. Queried data is materialized in temporary memory before the data is projected as part of generating a result to the query. Instead of materializing all of the attributes referenced in a query before executing the query, a database management system materializes attributes as “late” as possible—when the operation needing the attributes is executed. The operation needing the attributes can be performed sooner, as opposed to materializing all referenced attributes are materialized before executing the query.

Method for range operation data management statement

Processing range operation data management statements in a database is provided. The method comprises receiving statements for range operations that specify referenced pages in the database. The range operations are stored in a search structure in a table directory in the database and applied to any referenced pages in a memory buffer pool. Application of the range operations is postponed for any referenced pages not in the memory buffer pool. The database determines if reading the postponed pages into the buffer pool would exceed a specified input/output threshold. If reading the postponed pages into the buffer pool does not exceed the specified threshold, the database reads the postponed pages from disk to the buffer pool asynchronously in parallel, and the range operations are then applied to the postponed pages. Pages modified by the range operations are then written from the buffer pool back to disk.

Indexed regular expression search with N-grams

A query directed at a source table organized into a set of batch units is received. The query comprises a regular expression search pattern. The regular expression search pattern is converted to a pruning index predicate comprising a set of substring literals extracted from the regular expression search pattern. A set of N-grams is generated based on the set of substring literals extracted from the regular expression search pattern. A pruning index associated with the source table is accessed. The pruning index indexes distinct N-grams in each column of the source table. A subset of batch units to scan for data matching the query are identified based on the pruning index and the set of N-grams. The query is processed by scanning the subset of batch units.

Tracking changing state data to assist in computer network security

A session table includes one or more records, where each record represents a session. Session record information is stored in various fields, such as key fields, value fields, and timestamp fields. Session information is described as keys and values in order to support query/lookup operations. A session table is associated with a filter, which describes a set of keys that can be used for records in that table. A session table is populated using data contained in security information/events. Rules are created to identify events related to session information, extract the session information, and use the session information to modify a session table. A session table is partitioned so that the number of records in each session table partition is decreased. A session table is processed periodically so that active sessions are moved to the current partition.