G06F16/278

OPTIMIZING REQUEST SIZES IN A SEARCH ENGINE FANOUT
20230237107 · 2023-07-27

The present disclosure relates to systems and methods for implementing a distributed search that allows a search engine to retrieve the correct results for a search request even if the search engine does not request the full results from each index shard originally requested in the search request.
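A minimal sketch of the fanout idea, under the assumption that each shard returns its own top-k scored hits: the coordinator can merge these partial responses and still produce the globally correct top-k, because any globally top-k result must appear in its shard's local top-k. All names here are illustrative, not from the patent.

```python
import heapq

def fanout_search(shards, k):
    """Ask each shard for only its top-k hits, then merge globally.

    Each shard is a list of (score, doc_id) pairs; the coordinator never
    requests a shard's full result set, yet the merged top-k is correct.
    """
    partial = []
    for shard in shards:
        partial.extend(heapq.nlargest(k, shard))  # shard-local top-k only
    return heapq.nlargest(k, partial)             # global top-k from partials
```

The key design point is that requesting only k results per shard bounds the data transferred to k times the shard count, regardless of how large each shard's full result set is.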

ACCELERATING CHANGE DATA CAPTURE DETERMINATION USING ROW BITSETS

Techniques described herein can accelerate change data capture determinations such as stream reads, which show changes made to a table between two points in time. Three distinct row bitsets that mark deleted, updated, and inserted rows in micro-partitions can be added as metadata for the table. These bitsets can be generated during DML operations and then stored as metadata of the new partition generated by the DML operations. The bitsets can then be used to generate streams showing the changes in the table between two points in time (changes interval).
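A hypothetical sketch of the bitset idea: per-partition integers serve as bitsets (bit i set means row i was touched), recorded during DML and later combined to answer "which rows changed?" without diffing partition contents. The class and function names are assumptions for illustration.

```python
class Partition:
    """Metadata bitsets for one micro-partition (bit i = row i)."""
    def __init__(self):
        self.inserted = 0
        self.updated = 0
        self.deleted = 0

def apply_dml(part, inserted=(), updated=(), deleted=()):
    """Record a DML operation's effects in the partition's metadata."""
    for i in inserted:
        part.inserted |= 1 << i
    for i in updated:
        part.updated |= 1 << i
    for i in deleted:
        part.deleted |= 1 << i

def changed_rows(part):
    """Rows to report for the changes interval, read from metadata alone."""
    bits = part.inserted | part.updated | part.deleted
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```

Because the bitsets live in metadata, a stream read can enumerate changed rows from a few integer operations rather than scanning row data.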

Semantic indexing engine
11567970 · 2023-01-31

Embodiments are described for a method of distributing n-tuples over a cluster of triple-store machines, by storing each n-tuple as text in a distributed file system using a key value store; providing each machine of the cluster with a resident semantic data lake component accessing one or more persistent RDF triplestores for the n-tuple data stored on each machine; and defining one part of each n-tuple as a partition variable to ensure locality of data within each respective n-tuple. A method includes inserting graphs into a key/value store to determine how the key/value store distributes the data across a plurality of servers, by generating textual triple data, and storing the triple data in key-value stores wherein a fourth element of the triple comprises the key, and a value associated with the key comprises all the triples about a subject; indexing the data in the key-value store in an RDF triplestore using a partition based on the fourth element.
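The partitioning step above can be sketched as follows, assuming each n-tuple is a quad (subject, predicate, object, graph) and the fourth element serves as the partition variable. This is an illustrative reduction, not the patented implementation; the key/value store is stood in for by a plain dictionary.

```python
import zlib
from collections import defaultdict

def insert_quads(store, quads):
    """Key by the fourth element, so one graph's triples stay together."""
    for s, p, o, g in quads:
        store[g].append((s, p, o))  # value: all triples under that key

def server_for(g, n_servers):
    """Stable server choice derived from the partition variable."""
    return zlib.crc32(g.encode()) % n_servers
```

Keying on the fourth element is what gives the locality property the abstract describes: every triple sharing a partition value lands on the same machine, so that machine's resident triplestore can index them without cross-node traffic.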

Unbalanced partitioning of database for application data
11567969 · 2023-01-31

Provided is a database system and method in which storage is partitioned in an unbalanced format for faster access. In one example, the method may include one or more of receiving a request to store a data record, identifying a partition from among a plurality of partitions of a database based on a shard identifier in the request, automatically determining a unique range of data identifiers designated to the partition from the plurality of partitions, respectively, based on an unbalanced partitioning, determining whether the data identifier is available within the unique range of data identifiers of the identified partition, and storing the data record at the identified partition in response to determining the data identifier is available. The unbalanced partitioning according to various embodiments reduces the partitions that need to be checked during a data insert/access operation of the database.
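A minimal sketch of the insert path, under the assumption that partition ranges are deliberately unequal (a hot shard kept small). All identifiers below are illustrative; the point is that only the one partition named by the shard identifier is checked.

```python
# shard_id -> (lo, hi) of data identifiers; ranges are unique and
# deliberately unbalanced in size.
RANGES = {
    "p0": (0, 999),
    "p1": (1000, 1099),   # hot shard kept intentionally small
    "p2": (1100, 9999),
}

def store(records, shard_id, data_id, value):
    """Check availability only within the identified partition's range."""
    lo, hi = RANGES[shard_id]
    if not lo <= data_id <= hi:
        raise ValueError("identifier outside the partition's unique range")
    records.setdefault(shard_id, {})[data_id] = value
```

Because each identifier range belongs to exactly one partition, an insert or lookup touches a single partition rather than probing all of them, which is the latency reduction the abstract claims.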

Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system

A system and method matches data from a first set of data with that of another set of data in a manner based on the size of a cache.
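One plausible reading, sketched below: process the first set in blocks sized to fit in cache, so each block stays resident while the second set is scanned against it. The cache-size parameters are assumptions for illustration.

```python
def cache_aware_match(a, b, cache_bytes=32768, item_bytes=8):
    """Find items of b present in a, scanning a in cache-sized blocks."""
    block = max(1, cache_bytes // (2 * item_bytes))  # half the cache per side
    matches = []
    for i in range(0, len(a), block):
        a_blk = set(a[i:i + block])   # block intended to stay cache-resident
        for x in b:
            if x in a_blk:
                matches.append(x)
    return matches
```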

Incremental addition of data to partitions in database tables
11567957 · 2023-01-31

A method and system for accessing updated data from a database in response to a user query has been developed. First, multiple transaction logs are generated for a database. Each transaction log contains a record of actions executed by a database management system and referenced according to the specified date of the actions. Data updates are received and stored with the database. An incremental database partition is created for each data update. Each incremental database partition is stored with reference to a corresponding transaction log for the date of the data update. The updated data is accessed through the incremental database partition in response to an outdated user query. The outdated user query contains a data access request for a date earlier than the receipt of data updates.
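The date-keyed incremental partitions can be sketched as below, using ISO date strings so lexicographic comparison matches chronological order. Names are illustrative, not from the patent.

```python
def add_update(table, date, rows):
    """Create or extend the incremental partition for this update's date."""
    table.setdefault(date, []).extend(rows)

def read_as_of(table, query_date):
    """Answer a query dated earlier than some updates: include every
    incremental partition whose log date is on or before the query date."""
    out = []
    for date in sorted(table):        # ISO strings sort chronologically
        if date <= query_date:
            out.extend(table[date])
    return out
```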

Tree-based format for data storage

A tree-based format may be implemented for data stored in a data store. A table may be maintained across one or multiple storage nodes in storage slabs. Storage slabs may be mapped to different nodes of a tree. Each node of the tree may be assigned a different range of distribution scheme values which identify what portions of the table are stored in the storage slab. Storage slabs mapped to child nodes in the tree may be assigned portions of the range of distribution scheme values assigned to a parent. Storage nodes may be added or removed for storing the table. Storage slabs may be moved from one storage node to another in order to accommodate the addition or removal of storage nodes.
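The range-splitting tree can be sketched as follows: each node owns a half-open range of distribution-scheme values, children subdivide the parent's range, and a lookup descends to the leaf whose slab holds that portion of the table. The class and field names are assumptions for illustration.

```python
class Node:
    """Tree node owning the half-open range [lo, hi) of scheme values."""
    def __init__(self, lo, hi, children=None, slab=None):
        self.lo, self.hi = lo, hi
        self.children = children or []
        self.slab = slab  # storage slab name, set on leaf nodes

def find_slab(node, value):
    """Descend to the leaf slab whose range contains the value."""
    if not (node.lo <= value < node.hi):
        return None
    for child in node.children:
        hit = find_slab(child, value)
        if hit is not None:
            return hit
    return node.slab
```

Adding a storage node would correspond to splitting a leaf's range into children and moving a slab, which leaves every other range assignment untouched.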

Re-ordered processing of read requests
11709835 · 2023-07-25

A method includes determining, in accordance with a first ordering, a plurality of read requests for a memory device. The plurality of read requests are added to a memory device queue for the memory device in accordance with the first ordering. The plurality of read requests in the memory device queue are processed, in accordance with a second ordering that is different from the first ordering, to determine read data for each of the plurality of read requests. The read data for each of the plurality of read requests is added to one of a set of ordered positions, based on the first ordering, of a ring buffer as each of the plurality of read requests is processed. The read data of a subset of the plurality of read requests is submitted based on adding the read data to a first ordered position of the set of ordered positions of the ring buffer.
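A minimal sketch of the mechanism: completions arrive in a second ordering, each lands in the ring slot fixed by the first (arrival) ordering, and data is submitted from the head only once contiguous slots are filled. Names and the `read` callback are illustrative assumptions.

```python
def process_reads(requests, completion_order, read):
    """Queue in first ordering, complete in second, submit in first."""
    ring = [None] * len(requests)                 # slot i = i-th arrival
    slot = {req: i for i, req in enumerate(requests)}
    submitted = []
    head = 0
    for req in completion_order:                  # out-of-order completion
        ring[slot[req]] = read(req)               # land in first-order slot
        while head < len(ring) and ring[head] is not None:
            submitted.append(ring[head])          # in-order submission
            head += 1
    return submitted
```

The ring buffer is what reconciles the two orderings: processing may race ahead in any order, but nothing is submitted past a still-empty slot.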

LARGE OBJECT PACKING FOR STORAGE EFFICIENCY
20230027688 · 2023-01-26

One example method includes receiving data, partitioning the data according to their respective similarity groups, where the similarity groups collectively define a range of similarity groups, deduplicating the data after the partitioning, packing unique data segments remaining after deduplicating into one or more compression regions, compressing the compression regions, and writing an object, that includes the compression regions, to a durable log. The deduplicating and compressing for a similarity group may be performed by a dedup-compression instance uniquely assigned to that similarity group.
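The pipeline above can be sketched as: route segments by similarity group, deduplicate within each group, pack survivors into fixed-size regions, and compress each region. The group IDs, region size, and use of zlib are illustrative assumptions.

```python
import zlib
from collections import defaultdict

def pack(segments, region_size=4):
    """segments: iterable of (group_id, bytes). Returns compressed regions."""
    groups = defaultdict(list)
    seen = defaultdict(set)
    for group_id, seg in segments:
        if seg not in seen[group_id]:            # dedup within the group
            seen[group_id].add(seg)
            groups[group_id].append(seg)
    regions = []
    for segs in groups.values():
        for i in range(0, len(segs), region_size):
            blob = b"".join(segs[i:i + region_size])
            regions.append(zlib.compress(blob))  # one compression region
    return regions
```

Scoping dedup state to one group per instance is what lets the instances run independently, since a segment is only ever compared against segments in its own similarity group.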

DATA-SHARDING FOR EFFICIENT RECORD SEARCH
20230021868 · 2023-01-26

Data-sharding systems and/or methods for cost- and time-efficient record search are described. Data-sharding embodiments utilize a name-sharding dimension, optionally in combination with one or more additional dimensions such as record type and year, to reduce latency and reduce search-associated costs. The data-sharding systems and methods embodiments utilize an optimization algorithm to determine a distribution of records related to names. The optimization algorithm may use a three-character prefix for surnames in records to distribute shards across documents, with specific shards relating to no-name and multi-name records allocated.
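A minimal sketch of the routing rule, assuming the three-character surname prefix described above plus the dedicated no-name and multi-name shards; the shard naming scheme is an illustrative assumption.

```python
def shard_for(surnames):
    """Route a record to a shard by its surname(s)."""
    if not surnames:
        return "shard:no-name"       # dedicated shard for no-name records
    if len(surnames) > 1:
        return "shard:multi-name"    # dedicated shard for multi-name records
    return "shard:" + surnames[0][:3].lower()  # three-character prefix
```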