Patent classifications
G06F16/174
System and method for an ultra highly available, high performance, persistent memory optimized, scale-out database
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
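The routing rule in this abstract (reads may go to any node holding a duplica, DML goes only to the node holding the primary duplica, and changes then flow to the secondary duplicas) can be illustrated with a minimal sketch. The hash-based slice assignment, the slice count, the convention that the first node in each list holds the primary duplica, and all names (`slice_of`, `SliceMap`, `build_demo_map`) are assumptions made for illustration; the abstract does not specify them.

```python
import zlib

NUM_SLICES = 8  # assumed; the abstract does not give a slice count


def slice_of(row_key: str) -> int:
    """Assign a row to a slice with a stable hash (assumed scheme)."""
    return zlib.crc32(row_key.encode("utf-8")) % NUM_SLICES


class SliceMap:
    """Tracks which nodes hold a duplica of each slice."""

    def __init__(self, duplicas):
        # duplicas: {slice_id: [node, ...]}. By assumption, the first
        # node in each list holds the primary duplica.
        self.duplicas = duplicas

    def nodes_for_read(self, row_key):
        # Any node that stores a duplica of the slice may serve a read.
        return list(self.duplicas[slice_of(row_key)])

    def node_for_dml(self, row_key):
        # Inserts, updates, and deletes go to the primary duplica only.
        return self.duplicas[slice_of(row_key)][0]

    def propagation_targets(self, row_key):
        # Secondary duplicas that receive the change from the primary.
        return self.duplicas[slice_of(row_key)][1:]


def build_demo_map(nodes, copies=2):
    """Spread `copies` duplicas of every slice round-robin over nodes."""
    return SliceMap({
        s: [nodes[(s + i) % len(nodes)] for i in range(copies)]
        for s in range(NUM_SLICES)
    })
```

With three nodes and two copies per slice, `build_demo_map(["n0", "n1", "n2"]).nodes_for_read(key)` returns two distinct candidate nodes, while `node_for_dml(key)` always returns the first of them.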
Source file copying and error handling
An object service receives a request to copy a file to a destination and identifies a group identifier for a fingerprints group corresponding to sequential segments in the file. The object service communicates a request for the fingerprints group to a deduplication service associated with a group identifier range that includes the group identifier. The deduplication service communicates the fingerprints group, retrieved from fingerprint storage, to the object service, which communicates the fingerprints group and the group identifier to the destination. The object service communicates a request for the file segments corresponding to fingerprints missing in the destination, as communicated from the destination, to the deduplication service, which communicates the requested segments, retrieved from source storage, to the object service, which communicates the requested segments to the destination. The system identifies a generation identifier associated with a time of communicating by the object service or the deduplication service, and a generation identifier associated with another time of communicating by the object service or the deduplication service. If the generation identifier associated with the time differs from the generation identifier associated with the other time, the object service or the deduplication service restarts the communication.
Systems and methods for managing single instancing data
Described in detail herein are systems and methods for managing single instancing data. Using a single instance database and other constructs (e.g. sparse files), data density on archival media (e.g. magnetic tape) is improved, and the number of files per storage operation is reduced. According to one aspect of a method for managing single instancing data, for each storage operation, a chunk folder is created on a storage device that stores single instancing data. The chunk folder contains three files: 1) a file that contains data objects that have been single instanced; 2) a file that contains data objects that have not been eligible for single instancing; and 3) a metadata file used to track the location of data objects within the other files. A second storage operation subsequent to a first storage operation contains references to data objects in the chunk folder created by the first storage operation instead of the data objects themselves.
Delivery of digital information to a remote device
Methods and systems are disclosed relating to a file distribution scheme in a computer network that distributes files efficiently and reduces, among other things, network traffic. In an embodiment of the invention, a method for updating a file is disclosed. In such a method, unique chunks in a first version of a digital file are identified. For a second version of the digital file, chunks that are the same as in the first version are identified. Recompilation information is generated and stored for these identified chunks. Also, for the second version of the digital file, chunks in the second version that are different from chunks in the first version are identified. Recompilation information is generated and stored for these identified chunks. With this information, the second version of the digital file is completely defined and can be efficiently stored.
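The update method in this abstract can be sketched as building a "recipe" for the second version: chunks already present in the first version are recorded as references, and only the differing chunks carry data. Fixed-size chunking, SHA-256 digests, the recipe layout, and the function names are assumptions for illustration; the abstract does not say how chunks are delimited or how recompilation information is encoded.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (assumed chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def index_version(data: bytes):
    """Map digest -> chunk for the unique chunks of the first version."""
    return {digest(c): c for c in chunks(data)}


def make_recipe(known, new_data: bytes):
    """Describe the second version as references plus new chunk data."""
    recipe = []
    for c in chunks(new_data):
        d = digest(c)
        if d in known:
            recipe.append(("ref", d))    # same chunk as in version 1
        else:
            recipe.append(("data", c))   # chunk that differs from version 1
    return recipe


def rebuild(known, recipe) -> bytes:
    """Reassemble the second version from the recipe."""
    return b"".join(known[v] if kind == "ref" else v for kind, v in recipe)
```

For `v1 = b"AAAABBBBCCCC"` and `v2 = b"AAAAXXXXCCCC"`, the recipe holds two references and one data chunk, so only the four changed bytes need to be sent or stored alongside the references.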
Realtime detection of ransomware
Some examples relate generally to managing and storing data, and more specifically to the real-time detection of ransomware, system (or insider) threats, or the misappropriation of credentials by using file system audit events.
Optimized client-side deduplication
One example method includes optimizing client-side deduplication. When backing up a client, an overwrite ratio is determined based on a size of actual changes made to a volume and a size indicated by changes in a change log. Client-side deduplication is enabled or disabled based on a value of the overwrite ratio.
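The decision described here reduces to computing one ratio and comparing it with a cutoff. The sketch below is only an illustration: the abstract does not define which size is the numerator, what the threshold is, or whether a high ratio enables or disables deduplication, so the formula, the default `threshold`, and the `enable_when_above` flag are all assumptions.

```python
def overwrite_ratio(actual_changed_bytes: int, logged_changed_bytes: int) -> float:
    """Ratio of the size implied by the change log to the actual change size.

    Assumed definition: a value well above 1 suggests the same regions
    of the volume were overwritten repeatedly.
    """
    if actual_changed_bytes <= 0:
        raise ValueError("actual_changed_bytes must be positive")
    return logged_changed_bytes / actual_changed_bytes


def client_side_dedup_enabled(ratio: float,
                              threshold: float = 2.0,
                              enable_when_above: bool = True) -> bool:
    """Enable or disable client-side deduplication from the ratio.

    Both the threshold and the direction of the comparison are
    hypothetical knobs, not values taken from the abstract.
    """
    above = ratio >= threshold
    return above if enable_when_above else not above
```

Keeping the direction as a parameter makes the assumption explicit: a backup client could flip `enable_when_above` without touching the ratio computation.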
Block-level single instancing
Described in detail herein are systems and methods for single instancing blocks of data in a data storage system. For example, the data storage system may include multiple computing devices (e.g., client computing devices) that store primary data. The data storage system may also include a secondary storage computing device, a single instance database, and one or more storage devices that store copies of the primary data (e.g., secondary copies, tertiary copies, etc.). The secondary storage computing device receives blocks of data from the computing devices and accesses the single instance database to determine whether the blocks of data are unique (meaning that no instances of the blocks of data are stored on the storage devices). If a block of data is unique, the single instance database stores it on a storage device. If not, the secondary storage computing device can avoid storing the block of data on the storage devices.
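The uniqueness check in this abstract can be sketched with an in-memory index keyed by a block digest. Here a dictionary stands in for the single instance database and a list stands in for the storage devices; the use of SHA-256 and the class name `SingleInstanceStore` are assumptions for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Stores each distinct block once and hands back a reference to it."""

    def __init__(self):
        self.index = {}   # digest -> position (stand-in for the single instance database)
        self.blocks = []  # stored blocks (stand-in for the storage devices)

    def put(self, block: bytes) -> int:
        """Store the block only if no instance exists; return its position."""
        d = hashlib.sha256(block).hexdigest()
        if d not in self.index:
            # Unique block: write it and record where it lives.
            self.index[d] = len(self.blocks)
            self.blocks.append(block)
        # Duplicate block: nothing is written, the existing position is reused.
        return self.index[d]

    def get(self, position: int) -> bytes:
        return self.blocks[position]
```

Storing the blocks `b"a"`, `b"b"`, `b"a"` in that order yields the references `[0, 1, 0]` while only two blocks are actually kept.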
Distributing Data on Distributed Storage Systems
A method of distributing data in a distributed storage system includes receiving a file, dividing the received file into chunks, and determining a distribution of the chunks among storage devices of the distributed storage system based on a maintenance hierarchy of the distributed storage system. The maintenance hierarchy includes maintenance levels, and each maintenance level includes one or more maintenance units. Each maintenance unit has an active state and an inactive state. Moreover, each storage device is associated with a maintenance unit. The determining of the distribution of the chunks includes identifying a random selection of the storage devices matching a number of chunks of the file and being capable of maintaining accessibility of the file when one or more maintenance units are in an inactive state. The method also includes distributing the chunks to storage devices of the distributed storage system according to the determined distribution.
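One way to picture the placement step is random selection with a constraint check. The sketch below assumes a single maintenance level, assumes the file stays accessible as long as no maintenance unit holds more than `max_per_unit` of its chunks, and uses simple rejection sampling; none of these details come from the abstract, and the names `pick_devices` and `device_units` are hypothetical.

```python
import random
from collections import Counter


def pick_devices(device_units, num_chunks, max_per_unit=1, rng=None, max_tries=1000):
    """Randomly pick one storage device per chunk.

    device_units maps device name -> maintenance unit name. A random
    selection is accepted only if no maintenance unit holds more than
    max_per_unit chunks, so taking one unit offline affects at most
    that many chunks of the file (assumed accessibility rule).
    """
    rng = rng or random.Random()
    devices = sorted(device_units)
    if num_chunks > len(devices):
        raise ValueError("not enough storage devices for the chunks")
    for _ in range(max_tries):
        choice = rng.sample(devices, num_chunks)
        per_unit = Counter(device_units[d] for d in choice)
        if all(n <= max_per_unit for n in per_unit.values()):
            return choice
    raise RuntimeError("no valid placement found")
```

For example, with devices `d1` and `d2` in `rackA`, `d3` and `d4` in `rackB`, and `d5` in `rackC`, asking for three chunks always returns one device from each rack.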
Storing compression units in relational tables
A database server stores compressed units in data blocks of a database. A table (or data from a plurality of rows thereof) is first compressed into a “compression unit” using any of a wide variety of compression techniques. The compression unit is then stored in one or more data block rows across one or more data blocks. As a result, a single data block row may comprise compressed data for a plurality of table rows, as encoded within the compression unit. Storage of compression units in data blocks maintains compatibility with existing data block-based databases, thus allowing the use of compression units in preexisting databases without modification to the underlying format of the database. The compression units may, for example, co-exist with uncompressed tables. Various techniques allow a database server to optimize access to data in the compression unit, so that the compression is virtually transparent to the user.
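The storage idea can be sketched as three steps: compress several table rows into one unit, cut the unit into pieces small enough to fit ordinary data block rows, and reverse both steps on read. JSON serialization, zlib compression, and the function names are assumptions chosen for a short example; the abstract allows any of a wide variety of compression techniques and does not describe the row format.

```python
import json
import zlib


def to_compression_unit(rows) -> bytes:
    """Compress a batch of table rows into a single compression unit."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def split_into_block_rows(unit: bytes, max_row_bytes: int):
    """Cut the unit into pieces that each fit in one data block row."""
    return [unit[i:i + max_row_bytes] for i in range(0, len(unit), max_row_bytes)]


def read_rows(block_rows):
    """Reassemble the unit from its block rows and decompress it."""
    unit = b"".join(block_rows)
    return json.loads(zlib.decompress(unit).decode("utf-8"))
```

Because every piece is just an opaque byte string stored in a regular data block row, the surrounding block format is unchanged, which matches the compatibility point the abstract makes.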
Reduction of data stored on a block processing storage system
Techniques and systems for reducing data stored on a block processing storage system are described. A losslessly reduced representation of a data block can include references to one or more prime data element blocks, and optionally a description of a reconstitution program which, when applied to the one or more prime data element blocks, results in the data block.