Patent classifications
G06F16/278
SYSTEM AND METHOD FOR AN ULTRA HIGHLY AVAILABLE, HIGH PERFORMANCE, PERSISTENT MEMORY OPTIMIZED, SCALE-OUT DATABASE
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
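The routing rule in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Slice` and `Table`, the modulo row-to-slice assignment, and the synchronous propagation to secondary duplicas are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    primary: str        # node that holds the primary duplica
    secondaries: list   # nodes that hold secondary duplicas
    rows: dict = field(default_factory=dict)  # node -> {row key: value}

    def __post_init__(self):
        # Every duplica starts out as an empty copy of the slice.
        for node in [self.primary, *self.secondaries]:
            self.rows.setdefault(node, {})


class Table:
    def __init__(self, slices):
        self.slices = slices

    def slice_for(self, key):
        # Assumption: rows are assigned to slices by key modulo slice count.
        return self.slices[key % len(self.slices)]

    def write(self, key, value):
        """Apply DML on the primary duplica, then propagate to secondaries."""
        s = self.slice_for(key)
        s.rows[s.primary][key] = value
        for node in s.secondaries:
            s.rows[node][key] = value  # assumption: synchronous propagation
        return s.primary

    def read(self, key, node):
        """Serve a read from any node that stores a duplica of the slice."""
        s = self.slice_for(key)
        if node not in s.rows:
            raise KeyError(f"{node} holds no duplica of this slice")
        return s.rows[node].get(key)
```

For example, with `Table([Slice("n1", ["n2"]), Slice("n2", ["n3"])])`, a write to key 4 is executed by `n1`, and the row can then be read from either `n1` or `n2`.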
SEPARATION OF LOGICAL AND PHYSICAL STORAGE IN A DISTRIBUTED DATABASE SYSTEM
Distributed database systems including compute nodes and page servers are described herein that enable separating logical and physical storage of database files in a distributed database system. A distributed database system includes a plurality of page servers and a compute node and is configured to store a logical database file that includes data and is associated with a file identifier. Each page server is configurable to store slices (i.e., subportions) of the logical database file. The compute node is coupled to the plurality of page servers and configured to store the logical database file responsive to a received command. In an aspect, such storage may comprise slicing the data comprising the logical database file into a first set of slices, each being associated with a respective page server, maintaining an endpoint mapping for each slice of the first set of slices, and transmitting each slice to the associated page server for storage thereby.
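The slicing and endpoint mapping described above might look like the sketch below. It is illustrative only: the fixed slice size, the round-robin assignment of slices to page servers, and the function names `slice_file` and `read_file` are assumptions, and in-memory dictionaries stand in for the page servers.

```python
def slice_file(file_id, data, slice_size, endpoints):
    """Split a logical file into slices and record where each slice lives."""
    mapping = {}                          # slice index -> page server endpoint
    stores = {ep: {} for ep in endpoints}  # stand-in for page server storage
    for index, offset in enumerate(range(0, len(data), slice_size)):
        endpoint = endpoints[index % len(endpoints)]  # assumption: round-robin
        mapping[index] = endpoint
        # "Transmit" the slice: keyed by the file identifier and slice index.
        stores[endpoint][(file_id, index)] = data[offset:offset + slice_size]
    return mapping, stores


def read_file(file_id, mapping, stores):
    """Reassemble the logical file by following the endpoint mapping."""
    return b"".join(
        stores[mapping[index]][(file_id, index)] for index in sorted(mapping)
    )
```

The compute node only needs the file identifier and the endpoint mapping to locate every slice, which is what keeps the logical file separate from its physical placement.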
Cross cluster replication
Methods and systems for cross cluster replication are provided. Exemplary methods include: periodically requesting, by a follower cluster, history from a leader cluster, the history including at least one operation and sequence number pair, the operation having changed data in a primary shard of the leader cluster; receiving the history and a first global checkpoint from the leader cluster; when a difference between the first global checkpoint and a second global checkpoint exceeds a user-defined value, concurrently making multiple additional requests for history from the leader cluster; and when the difference between the first global checkpoint and the second global checkpoint is less than the user-defined value, executing the at least one operation, the at least one operation changing data in a primary shard of the follower cluster, such that an index of the follower cluster replicates an index of the leader cluster.
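One polling round of the follower can be sketched as below. This is a rough reading of the abstract rather than the claimed method: the `Leader` class, the `sync_once` function, the `batch` and `fanout` parameters, and the choice to apply the fetched operations at the end of every round are assumptions. The "first global checkpoint" is the leader's and the "second" is the follower's last applied sequence number.

```python
from concurrent.futures import ThreadPoolExecutor


class Leader:
    def __init__(self, ops):
        self.ops = ops  # list of (sequence number, (key, value)) pairs

    def history(self, from_seq, max_ops):
        """Return operations starting at from_seq and the global checkpoint."""
        return self.ops[from_seq:from_seq + max_ops], len(self.ops) - 1


def sync_once(leader, follower_state, applied, lag_threshold, batch=2, fanout=3):
    """Run one polling round; return the follower's new checkpoint."""
    history, leader_checkpoint = leader.history(applied + 1, batch)
    if leader_checkpoint - applied > lag_threshold:
        # Far behind: issue several additional history requests concurrently.
        starts = [applied + 1 + batch * (i + 1) for i in range(fanout)]
        with ThreadPoolExecutor(max_workers=fanout) as pool:
            chunks = pool.map(lambda s: leader.history(s, batch)[0], starts)
            for chunk in chunks:
                history.extend(chunk)
    # Execute the operations in sequence order on the follower's shard.
    for seq, (key, value) in sorted(history):
        follower_state[key] = value
        applied = seq
    return applied
```

Calling `sync_once` in a loop with a sleep between rounds would give the periodic polling the abstract describes.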
Method and apparatus for stress management in a searchable data service
The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
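The "comfort zone" action in the last sentence can be illustrated with a small sketch. Everything here is an assumption for illustration: the `rebalance` function, the single percentage threshold used as the comfort zone, and the policy of moving excess usage to the node with the most headroom.

```python
def rebalance(usage, capacity, comfort_pct=80):
    """Move usage off nodes that exceed their comfort zone.

    usage and capacity map node name -> integer resource units.
    Returns a list of (source, target, amount) transfers.
    """
    def headroom(node):
        return capacity[node] * comfort_pct // 100 - usage[node]

    moves = []
    for node in sorted(usage):
        excess = -headroom(node)
        if excess <= 0:
            continue  # usage is inside the comfort zone
        others = [n for n in usage if n != node]
        if not others:
            continue
        # Assumption: prefer the peer with the most spare comfort-zone room.
        target = max(others, key=headroom)
        amount = min(excess, headroom(target))
        if amount > 0:
            usage[node] -= amount
            usage[target] += amount
            moves.append((node, target, amount))
    return moves
```

With `usage = {"a": 95, "b": 30}` and a capacity of 100 on both nodes, the sketch moves 15 units from `a` to `b`, leaving `a` at the edge of its comfort zone.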
STATISTICS-BASED DYNAMIC DATABASE PARTITIONS
The present disclosure relates to database technology and in particular to dynamically updating and customizing database partitions. A computer-implemented engine is disclosed for identifying and retrieving a number of data records applicable to generate a response to a request, the engine having access to at least two partitions. Partition statistics are generated indicating correlations between the data records and, based on those partition statistics, the data records having the strongest correlation with each other are relocated to the same partitions so that the number of partitions which have to be queried in order to generate a response to a data request is minimized. Furthermore, the computational load caused when generating responses is more equally distributed across the partitions.
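A minimal sketch of the co-location idea follows. It assumes the partition statistics are simple co-access counts taken from a request log and that records are grouped greedily, strongest pair first, under a per-partition capacity; the function names are hypothetical, and the load-balancing aspect of the abstract is not modelled.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(requests):
    """Count how often each pair of records is needed by the same request."""
    counts = Counter()
    for records in requests:
        for pair in combinations(sorted(set(records)), 2):
            counts[pair] += 1
    return counts


def repartition(requests, capacity):
    """Greedily place the most strongly correlated records together."""
    group_of = {r: [r] for records in requests for r in records}
    ranked = sorted(co_access_counts(requests).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _ in ranked:
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= capacity:
            ga.extend(gb)
            for r in gb:
                group_of[r] = ga
    seen, partitions = set(), []
    for r in sorted(group_of):
        if id(group_of[r]) not in seen:
            seen.add(id(group_of[r]))
            partitions.append(sorted(group_of[r]))
    return partitions


def partitions_touched(request, partitions):
    """Number of partitions that must be queried to answer a request."""
    return sum(1 for p in partitions if set(p) & set(request))
```

After repartitioning, a request for records that are usually fetched together should touch fewer partitions than before.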
Data stream processing
Techniques for partitioning data from a data stream into batches and inferring schema for individual batches based on the field values of each batch are disclosed. The system may infer different schemas corresponding to different batches of data records even though the batches are received from a common data stream or a common data source. The system may infer a schema by determining whether a field contains single values or multiple values. Then the system determines the field type(s) associated with the values. These determinations are then stored in a dictionary generated for each batch.
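The per-batch inference described above can be sketched as follows. The names `batches` and `infer_schema`, the fixed batch size, and the use of Python type names as field types are assumptions for illustration.

```python
def batches(stream, size):
    """Partition a list of records from one stream into fixed-size batches."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]


def infer_schema(batch):
    """Build a dictionary describing each field seen in one batch."""
    schema = {}
    for record in batch:
        for name, value in record.items():
            entry = schema.setdefault(name, {"multi": False, "types": set()})
            # Step 1: does the field hold a single value or multiple values?
            if isinstance(value, list):
                entry["multi"] = True
                values = value
            else:
                values = [value]
            # Step 2: which type(s) do the values have?
            entry["types"].update(type(v).__name__ for v in values)
    return schema
```

Because a dictionary is produced per batch, two batches from the same stream can legitimately yield different schemas, for example when an `id` field switches from integers to strings.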
Dynamic updating of query result displays
Described are methods, systems and computer readable media for dynamic updating of query result displays.
Time-based partitioning to avoid in-place updates for data set copies
Time-based partitioning of a data set is applied to capture updates to the data set in a copy of the data set. Items that have been updated in a data set within a time period are identified. Partitions of the data set that include the updated items are created according to a partitioning scheme. The created partitions are grouped in a storage location for the time period in a file structure that stores a copy of the database. A latest version of the copy of the data set may be accessed according to latest partitions of the data set stored in the different locations of the file structure.
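The append-only layout can be sketched with nested dictionaries standing in for the file structure. The modulo partitioning scheme, the ISO-date period keys, and the function names are assumptions for this example.

```python
def capture_period(copy, period, data_set, updated_ids, n_parts=2):
    """Write fresh copies of every partition that contains an updated item.

    copy maps period -> {partition id -> {item id: value}}. Partitions
    written for earlier periods are never modified in place.
    """
    touched = {item_id % n_parts for item_id in updated_ids}
    copy[period] = {
        part: {i: v for i, v in data_set.items() if i % n_parts == part}
        for part in touched
    }


def latest_view(copy):
    """Assemble the latest version from the newest copy of each partition."""
    latest = {}
    for period in sorted(copy):      # ISO dates sort chronologically
        latest.update(copy[period])  # later periods override earlier ones
    view = {}
    for contents in latest.values():
        view.update(contents)
    return view
```

A reader that wants an older version can stop at an earlier period, since each period's location is left untouched by later captures.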
Systems and methods for managed asset distribution in a distributed heterogeneous storage environment
Embodiments of systems and methods for the rules-based distribution of managed content across heterogeneous storage distributed in a network environment are disclosed. In particular, certain embodiments may employ entity rules in association with a content management system. An entity rule may be a rule specifying a set of parameters and a destination secondary storage location. When the entity rule is evaluated by the content management system, a set of content managed by the content management system responsive to the rule may be determined using the parameters of the rule. Responsive content can be determined, for example, by searching the content of the content management system based on the parameters. Responsive content may be moved from the primary storage location of the content management system to the secondary storage location specified by the entity rule.
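Evaluating an entity rule might look like the sketch below. The `EntityRule` class, exact-match metadata parameters, and dictionary-backed storage locations are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EntityRule:
    params: dict      # metadata values that responsive content must match
    destination: str  # name of the secondary storage location


def apply_rule(rule, primary, secondary_stores):
    """Move content responsive to the rule from primary to secondary storage.

    primary maps content id -> metadata dict; secondary_stores maps a
    storage location name -> {content id: metadata}.
    """
    # Determine responsive content by searching on the rule's parameters.
    responsive = [
        cid for cid, meta in primary.items()
        if all(meta.get(k) == v for k, v in rule.params.items())
    ]
    destination = secondary_stores.setdefault(rule.destination, {})
    for cid in responsive:
        destination[cid] = primary.pop(cid)
    return responsive
```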
Accelerating change data capture determination using row bitsets
Techniques described herein can accelerate change data capture determinations such as stream reads, which show changes made to a table between two points in time. Three distinct row bitsets that mark deleted, updated, and inserted rows in micro-partitions can be added as metadata for the table. These bitsets can be generated during DML operations and then stored as metadata of the new partition generated by the DML operations. The bitsets can then be used to generate streams showing the changes in the table between two points in time (changes interval).
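The three bitsets can be sketched with Python integers used as bitmasks. This is a simplified model under stated assumptions: the `MicroPartition` class, the `"copy"` action for rows carried over unchanged, and keeping deleted rows in the new partition as markers are all choices made for the example, not details from the source.

```python
from dataclasses import dataclass, field


@dataclass
class MicroPartition:
    version: int                 # table version that created this partition
    rows: list = field(default_factory=list)
    inserted: int = 0            # bit i set -> row i was inserted
    updated: int = 0             # bit i set -> row i was updated
    deleted: int = 0             # bit i set -> row i was deleted


def apply_dml(version, ops):
    """Build the new partition for a DML statement and set its bitsets.

    ops is a list of (action, row) with action in
    {"insert", "update", "delete", "copy"}.
    """
    part = MicroPartition(version)
    for pos, (action, row) in enumerate(ops):
        part.rows.append(row)
        bit = 1 << pos
        if action == "insert":
            part.inserted |= bit
        elif action == "update":
            part.updated |= bit
        elif action == "delete":
            part.deleted |= bit
        elif action != "copy":
            raise ValueError(f"unknown action: {action}")
    return part


def changes(partitions, start, end):
    """List (kind, row) for changes made after start, up to and including end."""
    out = []
    for part in partitions:
        if start < part.version <= end:
            for pos, row in enumerate(part.rows):
                for kind in ("inserted", "updated", "deleted"):
                    if getattr(part, kind) & (1 << pos):
                        out.append((kind, row))
    return out
```

Because the bitsets are stored with each new partition, a stream read only has to inspect the metadata of partitions created inside the changes interval instead of comparing row contents.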