Patent classifications
G06F16/278
PDSE member generation clustering and recovery
A method for enabling data set changes to be reverted to a prior point in time or state is disclosed. In one embodiment, such a method includes providing a data set comprising one or more data elements and a specified number of generations of the data elements. In certain embodiments, the data set is a partitioned data set extended (PDSE) data set, and the data elements are “members” within the PDSE data set. The method further includes tracking changes made by a job to data elements of the data set. The method further references, in a data structure (also referred to herein as a “cluster”) associated with the job, previous generations of the data elements changed by the job. In certain embodiments, the data structure is stored in the data set. A corresponding system and computer program product are also disclosed.
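The job-scoped revert described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `GenerationDataSet`, its methods, and the choice to copy content into the cluster (rather than reference stored generations inside a PDSE) are all assumptions made for illustration.

```python
from collections import defaultdict


class GenerationDataSet:
    """Toy data set that keeps a bounded number of prior generations per member.

    All names here are hypothetical; max_generations is assumed to be >= 1.
    """

    def __init__(self, max_generations=3):
        self.max_generations = max_generations
        self.current = {}                  # member name -> current content
        self.history = defaultdict(list)   # member name -> prior generations, oldest first
        self.clusters = defaultdict(dict)  # job id -> {member: content before the job's first change}

    def write(self, job_id, member, content):
        previous = self.current.get(member)
        # Record only the state seen before the job's first change to this member,
        # so a revert restores the member to its pre-job state.
        self.clusters[job_id].setdefault(member, previous)
        if previous is not None:
            self.history[member].append(previous)
            # Trim to the configured number of generations.
            del self.history[member][:-self.max_generations]
        self.current[member] = content

    def revert(self, job_id):
        """Undo every change the given job made, using its cluster."""
        for member, previous in self.clusters.pop(job_id, {}).items():
            if previous is None:
                self.current.pop(member, None)  # member did not exist before the job
            else:
                self.current[member] = previous
```

Keying the cluster by job means a single lookup is enough to find every member a job touched, which is the point of grouping generations per job rather than per member.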
LOW LATENCY INGESTION INTO A DATA SYSTEM
Described herein are techniques for improving transfer of metadata from a metadata database to a database stored in a data system, such as a data warehouse. The metadata may be written into the metadata database with a version stamp, which is a monotonically increasing register value, and a partition identifier, which can be generated using attribute values of the metadata. A plurality of readers can scan the metadata database based on version stamp and partition identifier values to export the metadata to a cloud storage location. From the cloud storage location, the exported data can be auto-ingested into the database, which includes a journal and snapshot table.
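A rough sketch of the version-stamp and partition-identifier scheme follows. The in-memory list `metadata_db`, the SHA-256 based `partition_id`, and the fixed `NUM_PARTITIONS` are assumptions for illustration; the abstract does not say how the partition identifier is derived from the attribute values.

```python
import hashlib
import itertools

NUM_PARTITIONS = 4                        # assumed partition count
_version_counter = itertools.count(1)     # stand-in for the monotonic register
metadata_db = []                          # stand-in for the metadata database


def partition_id(attributes):
    """Derive a stable partition identifier from attribute values (assumed scheme)."""
    key = "|".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def write_metadata(attributes):
    """Write a row stamped with the next version and its partition identifier."""
    row = {
        "version": next(_version_counter),
        "partition": partition_id(attributes),
        "attributes": attributes,
    }
    metadata_db.append(row)
    return row


def scan(partition, after_version):
    """One reader's scan: rows in its partition newer than its checkpoint."""
    return [
        row for row in metadata_db
        if row["partition"] == partition and row["version"] > after_version
    ]
```

Each reader would remember the highest version it has exported for its partition and pass that value as `after_version` on the next scan, so readers can work in parallel without exporting a row twice.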
Method of distributed graph loading for minimal communication and good balance via lazy materialization and directory indirection using indexed tabular representation
Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
Systems and methods for generating customized filtered-and-partitioned market-data feeds
Presently disclosed are systems and methods for generating customized filtered-and-partitioned market-data feeds. In an embodiment, an output-feed profile is maintained in data storage at a market-data-processing device (MDPD). The output-feed profile specifies a subset of ticker symbols and a ticker-symbol-based feed-partitioning scheme. An input feed of order-book updates to ticker symbols is received at the MDPD from an upstream device. At the MDPD, a customized market-data output feed is generated according to the maintained output-feed profile at least in part by filtering the input feed down to the order-book updates to ticker symbols in the specified subset and partitioning the filtered feed according to the specified ticker-symbol-based feed-partitioning scheme. The customized market-data output feed is transmitted from the MDPD to a downstream device.
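The filter-then-partition step can be sketched in a few lines. The function names, the dictionary shape of an order-book update, and the two-bucket `first_letter_scheme` are hypothetical; the abstract only says that the profile names a symbol subset and a ticker-symbol-based partitioning scheme.

```python
from collections import defaultdict


def build_output_feed(input_feed, symbols, partition_of):
    """Filter updates down to the profiled symbols, then partition them by symbol."""
    partitions = defaultdict(list)
    for update in input_feed:
        symbol = update["symbol"]
        if symbol in symbols:
            partitions[partition_of(symbol)].append(update)
    return dict(partitions)


def first_letter_scheme(symbol):
    """Example partitioning scheme (assumed): split the alphabet into two ranges."""
    return "A-M" if symbol[0].upper() <= "M" else "N-Z"
```

Here the output-feed profile corresponds to the pair (`symbols`, `partition_of`); changing either produces a different customized feed from the same input.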
Measuring and improving index quality in a distributed data system
Embodiments described herein are directed to measuring and improving an index quality of a distributed data system. For example, various quality metrics are determined on a per-partition basis of the distributed data system. Each of the quality metrics is indicative of a quality of a particular property of a partition. The quality metrics are aggregated to generate an overall index quality score, which provides a measure of the performance of the index. The index quality score is utilized to automatically detect an inefficiency of the index and to determine that certain index maintenance actions should be automatically performed to improve the performance of the index. Each quality metric may also be individually analyzed to determine which database property most affects the performance of the index.
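One way to picture the aggregation is shown below. The unweighted averaging, the 0-to-1 metric scale, the `threshold` default, and the function name `index_quality` are assumptions; the abstract does not specify how the metrics are combined.

```python
def index_quality(partition_metrics, threshold=0.7):
    """Aggregate per-partition metrics into one score (assumed: plain averages).

    partition_metrics: list of dicts, one per partition, each mapping the same
    metric names to a value in [0, 1] where higher is better.
    """
    metric_names = sorted(partition_metrics[0])
    per_metric = {
        name: sum(metrics[name] for metrics in partition_metrics) / len(partition_metrics)
        for name in metric_names
    }
    score = sum(per_metric.values()) / len(per_metric)
    return {
        "score": score,
        "per_metric": per_metric,
        # The lowest-scoring metric is the property hurting the index the most.
        "worst_metric": min(per_metric, key=per_metric.get),
        "needs_maintenance": score < threshold,
    }
```

Returning the per-metric averages alongside the overall score supports both uses named in the abstract: triggering maintenance from the aggregate and inspecting individual metrics to find the main cause.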
METHODS AND SYSTEMS FOR NON-BLOCKING TRANSACTIONS
Methods and systems for executing non-blocking transactions at a database are provided. The method includes receiving a write transaction that is directed to a partition of a table stored by a cluster of database nodes. The method includes generating, at a database node of the cluster, a synthetic timestamp based on a first time associated with the database node and a duration, wherein the synthetic timestamp exceeds the first time by the duration. The method includes executing, based on determining the synthetic timestamp, one or more operations of the write transaction at one or more replicas of the partition. The method includes committing, based on a threshold number of acknowledgements, the one or more operations of the write transaction at the one or more replicas. The method includes sending, based on a second time exceeding the synthetic timestamp, an indication of success of the write transaction.
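The sequence of steps in this abstract can be sketched as follows. The function `execute_write`, the modeling of replicas as callables that return an acknowledgement, and the injectable `now` and `sleep` parameters are assumptions made so the example is self-contained and testable; a real database would involve consensus and clock-uncertainty handling that this sketch omits.

```python
import time


def execute_write(operations, replicas, quorum, lead, now=time.monotonic, sleep=time.sleep):
    """Run a write at a future (synthetic) timestamp and acknowledge it once
    the local clock has passed that timestamp.

    replicas: callables taking (operations, timestamp) and returning True on ack.
    Returns the synthetic timestamp on success, or None if the quorum is not met.
    """
    # Synthetic timestamp: the node's current time plus a fixed duration.
    synthetic_ts = now() + lead

    # Execute the operations at each replica and count acknowledgements.
    acks = sum(1 for replica in replicas if replica(operations, synthetic_ts))
    if acks < quorum:
        return None  # not enough acknowledgements to commit

    # Report success only after the local time exceeds the synthetic timestamp.
    while now() <= synthetic_ts:
        sleep(max(synthetic_ts - now(), 0.001))
    return synthetic_ts
```

Because the write is stamped in the future, readers at present-time timestamps are not blocked by it; the cost is that the writer waits out the lead duration before reporting success.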
Determining differences between two versions of a file directory tree structure
A file directory tree structure of a selected storage snapshot is dynamically divided into different portions. A plurality of the different file directory tree structure portions are analyzed in parallel to identify any changes of the selected storage snapshot from a previous storage snapshot. To analyze each of the plurality of the different file directory tree structure portions, a processor is further configured to traverse and compare a corresponding file directory tree structure portion of the selected storage snapshot with a corresponding portion of a file directory tree structure of the previous storage snapshot while at least another one of the plurality of the different file directory tree structure portions of the selected storage snapshot is being analyzed in parallel.
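A simplified sketch of comparing portions of two snapshots in parallel is given below. It assumes snapshots are nested dictionaries (a directory maps names to children, a file maps to a content fingerprint string) and it splits the tree statically at the top level, whereas the abstract describes a dynamic division; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_tree(new, old, prefix=""):
    """Traverse two directory trees together and list the differences."""
    changes = []
    for name in sorted(set(new) | set(old)):
        path = f"{prefix}/{name}"
        if name not in old:
            changes.append(("added", path))
        elif name not in new:
            changes.append(("removed", path))
        elif isinstance(new[name], dict) and isinstance(old[name], dict):
            changes.extend(diff_tree(new[name], old[name], path))
        elif new[name] != old[name]:
            changes.append(("modified", path))
    return changes


def diff_snapshots(new, old, workers=4):
    """Compare each top-level portion of the selected snapshot in parallel."""
    names = sorted(set(new) | set(old))

    def compare_portion(name):
        new_part = {name: new[name]} if name in new else {}
        old_part = {name: old[name]} if name in old else {}
        return diff_tree(new_part, old_part)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the combined result is deterministic.
        return [change for part in pool.map(compare_portion, names) for change in part]
```

A dynamic division, as in the abstract, would additionally split a large subtree into further work items while other portions are still being analyzed, rather than fixing the portions up front.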
Adaptive tiering for database data of a replica group
A storage node of a database replica group may distribute different portions of data in local storage and external storage, where local storage and external storage are organized using different types of index structures. Responsive to receiving an access request for a database, a storage node may determine that an item of the database to be accessed by the request does not reside within a first portion of the database stored locally at the storage node. Responsive to this determination, the storage node may obtain from an external storage service a second portion of the database, the second portion including a plurality of items including the item, and the second portion organized according to a structure different from the first portion. The storage node may then store the plurality of obtained items in the first portion and process the request using the first portion of the database.
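The miss-then-fetch behavior can be illustrated with a small sketch. The class `StorageNode`, the use of a dictionary for the local portion, and the sorted key-range blocks standing in for the external storage service are assumptions; the abstract says only that the two portions use different index structures.

```python
import bisect


class StorageNode:
    """Toy node: a hash-indexed local portion backed by range-organized external blocks."""

    def __init__(self, external_blocks):
        # external_blocks: list of (first_key, {key: value}) sorted by first_key.
        self.block_starts = [start for start, _ in external_blocks]
        self.blocks = [items for _, items in external_blocks]
        self.local = {}             # local portion, keyed lookup
        self.external_fetches = 0   # counts round trips to external storage

    def _fetch_block(self, key):
        """Fetch the whole external block whose key range would contain the key."""
        self.external_fetches += 1
        index = bisect.bisect_right(self.block_starts, key) - 1
        return self.blocks[index] if index >= 0 else {}

    def get(self, key):
        if key not in self.local:
            # Miss: pull the containing block and store all of its items locally.
            self.local.update(self._fetch_block(key))
        return self.local.get(key)
```

Storing every item of the fetched block, not just the requested one, means neighboring keys are served locally on later requests.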
METHODS AND SYSTEMS FOR AUTOMATICALLY RESHARDING A SHARDED COLLECTION OF DATA
A method is provided for resharding a sharded database sharded according to a first shard key. The method includes: receiving, by a processor, an instruction to reshard the sharded database; receiving, at the processor, a new shard key to be used in a resharding process to reshard the sharded database; determining, by the processor, whether a duration of unavailability of the sharded database during the resharding process is less than a predetermined amount of time; and automatically performing, by the processor, the resharding process according to the new shard key to produce a resharded database, if the duration of unavailability is less than the predetermined amount of time. The method may be performed without users noticing a significant interruption to read/write operations from/to the database.
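The conditional resharding step can be sketched as below. How the duration of unavailability is estimated is not stated in the abstract, so it is passed in as a plain number; the CRC32-based `shard_index` and the list-of-lists shard layout are likewise assumptions for illustration.

```python
import zlib


def shard_index(value, num_shards):
    """Map a shard-key value to a shard (assumed: stable CRC32 hashing)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def reshard(shards, new_shard_key, estimated_unavailability_s, max_unavailability_s):
    """Redistribute documents by the new shard key, but only if the estimated
    unavailability is below the predetermined limit. Returns None otherwise."""
    if estimated_unavailability_s >= max_unavailability_s:
        return None
    resharded = [[] for _ in shards]
    for shard in shards:
        for document in shard:
            target = shard_index(document[new_shard_key], len(shards))
            resharded[target].append(document)
    return resharded
```

The sketch builds the resharded copy beside the original shards, which mirrors the idea that reads and writes continue against the existing layout until the new one is ready.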
Serverless managed bulk import on a global NoSQL database with selective back pressure
A system receives a request to import data file(s) from a source data store into a target database. The system reserves a first portion of computing resources that host the target database to import the data file(s). The reservation of the first portion of computing resources permits the import throughput rate of the data file(s) through the first portion of computing resources while maintaining a second portion of the computing resources to support client access to the target database at an access throughput rate. The system initiates import of the data file(s) from the source data store to the target database through one or more storage nodes at the import throughput rate according to the first portion of computing resources. The target database is able to receive access requests from one or more clients during the import of the data file(s) to the target database.