Patent classifications
H03M7/3091
PRESERVATION OF DATA DURING SCALING OF A GEOGRAPHICALLY DIVERSE DATA STORAGE SYSTEM
Preservation of data during scaling of a geographically diverse data storage system is disclosed. In regard to scaling-in, a first zone storage component (ZSC) can be placed in read-only (RO) mode to allow continued access to data stored on the first ZSC, completion of previously queued operations, updating of data chunks, etc. Data chunks can comprise metadata stored in directory table partitions organized in a tree data structure scheme. An updated data chunk of the first ZSC can be replicated at other ZSCs before deleting the first ZSC. A first hash function can be used to distribute portions of the updated data chunk among the other ZSCs. A second hash function can be used to distribute key data values corresponding to the distributed portions of the updated data chunk among the other ZSCs. Employing the first and second hash functions can result in more efficient use of storage space and more even distribution of key data values when compared to simple replication of a data chunk of the first ZSC by the other ZSCs.
DETECTING DATA DEDUPLICATION OPPORTUNITIES USING ENTROPY-BASED DISTANCE
Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance is an entropy-based distance and denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. If the distance is less than a threshold, a matching sub-block may be expected between the candidate and target data blocks. The distance may be a difference between entropy values for the candidate and target data blocks. The first entropy value may be used to determine whether to compress or perform partial deduplication for the candidate data block.
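The entropy-difference distance described above can be sketched as follows. This is a minimal illustration, not the patented method: the use of Shannon entropy over byte frequencies, the function names, and the default threshold of 0.5 bits per byte are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_distance(candidate: bytes, target: bytes) -> float:
    """Distance taken as the absolute difference of the two entropy values."""
    return abs(shannon_entropy(candidate) - shannon_entropy(target))


def should_attempt_dedup(candidate: bytes, target: bytes,
                         threshold: float = 0.5) -> bool:
    """True when the distance is below the (assumed) threshold, i.e. when a
    matching sub-block may be worth searching for."""
    return entropy_distance(candidate, target) < threshold
```

Two blocks with equal entropy can still have no bytes in common, so a small distance only suggests that a sub-block comparison may be worthwhile; it does not prove that a duplicate exists.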
Detecting and protecting against ransomware
In a system that replicates data writes by a server to form a local copy for a local production site with local storage and a remote copy for a remote recovery site having remote storage, ransomware is detected by a decrease of more than a predetermined threshold in the compression ratio, the deduplication ratio, or both, over a length of data selected by a sliding time window. Upon detecting ransomware, data writes to said remote storage are stopped to minimize corruption of the remote data.
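A minimal sketch of the sliding-window check, covering only the compression ratio. Everything specific here is an assumption for illustration: the class name, the use of `zlib` to estimate compressibility, measuring the decrease as the absolute drop between consecutive window ratios, and the default window and threshold values.

```python
import os
import zlib
from collections import deque


class CompressionRatioMonitor:
    """Tracks the compression ratio of writes inside a sliding time window."""

    def __init__(self, window_seconds: float = 60.0, max_drop: float = 1.0):
        self.window_seconds = window_seconds
        self.max_drop = max_drop          # assumed "predetermined threshold"
        self._writes = deque()            # (timestamp, raw_len, compressed_len)
        self._last_ratio = None

    def _window_ratio(self) -> float:
        raw = sum(w[1] for w in self._writes)
        compressed = sum(w[2] for w in self._writes)
        return raw / compressed if compressed else 0.0

    def observe(self, timestamp: float, data: bytes) -> bool:
        """Record a write; return True if the window ratio fell by more
        than max_drop since the previous observation."""
        self._writes.append((timestamp, len(data), len(zlib.compress(data))))
        # Evict writes that have slid out of the time window.
        while self._writes and self._writes[0][0] <= timestamp - self.window_seconds:
            self._writes.popleft()
        ratio = self._window_ratio()
        suspicious = (self._last_ratio is not None
                      and (self._last_ratio - ratio) > self.max_drop)
        self._last_ratio = ratio
        return suspicious


def demo() -> list:
    """Two compressible writes followed by one random (encrypted-looking) write."""
    monitor = CompressionRatioMonitor(window_seconds=10.0, max_drop=1.0)
    return [
        monitor.observe(0.0, b"A" * 4096),
        monitor.observe(1.0, b"A" * 4096),
        monitor.observe(2.0, os.urandom(65536)),
    ]
```

In a replication setting, a `True` result would be the point at which writes to the remote copy are paused. Comparing only consecutive observations is a simplification; a gradual decline spread over many writes would need a comparison against a longer-term baseline.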
DATA COMPRESSION METHOD, DATA COMPRESSION APPARATUS, DATA DECOMPRESSION METHOD, DATA DECOMPRESSION APPARATUS AND DATA STORAGE SYSTEM
One aspect of the present disclosure relates to a data compression method. The method includes generating, by one or more processors, compressed data from data, wherein the compressed data includes one or more unduplicated values of the data; and generating, by the one or more processors, index data from the data, wherein the index data includes indices indicative of storage locations for the unduplicated values.
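The split into unduplicated values plus index data resembles dictionary encoding, and can be sketched as below. The function names and the choice of a Python list as the storage for unduplicated values are assumptions for illustration; input values are assumed to be hashable.

```python
def compress(values):
    """Return (unique_values, indices): each distinct value is stored once,
    and indices[i] is the storage location of values[i] in unique_values."""
    unique_values = []
    location = {}   # value -> position in unique_values
    indices = []
    for value in values:
        if value not in location:
            location[value] = len(unique_values)
            unique_values.append(value)
        indices.append(location[value])
    return unique_values, indices


def decompress(unique_values, indices):
    """Rebuild the original sequence from the unduplicated values and indices."""
    return [unique_values[i] for i in indices]
```

The scheme pays off when the input has few distinct values relative to its length, since each index can then be much smaller than the value it replaces.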
Information processing apparatus, data compressing method, and computer-readable recording medium
A specifying unit specifies one or more dividing positions in input data. A common region compressing unit specifies compression positions in the input data corresponding to positions at which sizes from two ends of compressed data are equal to or larger than a predetermined size and corresponding to positions which sandwich adjacently-positioned dividing positions and of which a size therebetween is equal to or larger than the predetermined size, and compresses former compression data that are arranged in a row in the input data on either side of each of the dividing positions and that are interposed between the compression positions. An individual region compressing unit compresses latter compression data that are separated by any of the pieces of former compression data in the input data, based on the former compression data positioned adjacent to the piece of latter compression data and the piece of latter compression data.
Prefix compression for keyed values
Systems and techniques are described for compressing strings by using a tree data structure. Specifically, for each string in a sequence of strings, the embodiments can traverse the tree data structure by matching characters of the string with characters associated with nodes of the tree data structure until either (1) all characters in the string have been processed, or (2) a current character in the string does not match a corresponding character in a current node of the tree data structure. Next, a first node identifier associated with the current node can be returned if all characters have been processed. Otherwise, a new node can be created in the tree data structure to store the remaining characters in the string, and a second node identifier associated with the new node in the tree data structure can be returned.
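The traversal described above can be sketched with a simplified tree in which each node stores a fragment of one or more characters. This is an assumption-laden illustration rather than the described embodiment: it matches whole node fragments only (no node splitting on a partial match), and the class and method names are hypothetical.

```python
class PrefixTree:
    """Each string is replaced by the identifier of the node that ends it."""

    def __init__(self):
        self.labels = [""]      # node 0 is the root and stores no characters
        self.parents = [-1]
        self.children = [[]]

    def encode(self, s: str) -> int:
        node, rest = 0, s
        while rest:
            for child in self.children[node]:
                label = self.labels[child]
                if rest.startswith(label):
                    # The child's characters match: consume them and descend.
                    node, rest = child, rest[len(label):]
                    break
            else:
                # Mismatch: create a new node holding the remaining characters.
                new_id = len(self.labels)
                self.labels.append(rest)
                self.parents.append(node)
                self.children.append([])
                self.children[node].append(new_id)
                return new_id
        # All characters were processed: return the current node's identifier.
        return node

    def decode(self, node_id: int) -> str:
        parts = []
        while node_id > 0:
            parts.append(self.labels[node_id])
            node_id = self.parents[node_id]
        return "".join(reversed(parts))
```

For a sorted key sequence such as `user:1`, `user:12`, `user:123`, only the first key stores its full text; each later key adds a node holding just the one new character.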
A METHOD AND SYSTEM FOR COMPRESSING DATA
A system, a method, and a non-transient computer readable medium containing program instructions for causing a computer to perform a method for compressing data, comprising the steps of: receiving a data string for compression, the data string including a plurality of data elements; creating a template based on processing the data string, the template including common information across all data elements of the data string; creating one or more entries, wherein the one or more entries include information that is different from the template; and storing the template and the one or more entries.
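The template-plus-entries idea can be sketched as follows. The abstract does not say how data elements are represented, so this example assumes each element is a flat dictionary of fields; the function names are hypothetical.

```python
def compress(elements):
    """Return (template, entries). The template holds the key/value pairs
    shared by every element; each entry holds only what differs."""
    if not elements:
        return {}, []
    template = dict(elements[0])
    for element in elements[1:]:
        # Keep only the pairs that this element also has, with the same value.
        template = {k: v for k, v in template.items()
                    if k in element and element[k] == v}
    entries = [{k: v for k, v in element.items() if k not in template}
               for element in elements]
    return template, entries


def decompress(template, entries):
    """Rebuild each element by overlaying its entry on the template."""
    return [{**template, **entry} for entry in entries]
```

Storing the template once and the short entries separately saves space whenever most fields repeat across elements, as in log records or sensor readings from a single device.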
Maintaining data deduplication reference information
A data deduplication method includes detecting a deduplication transaction including a data pattern associated with a data pattern address (DPA) and a reference, to the pattern, associated with a data reference address (DRA). A deduplication key may be determined based on the DPA and the DRA by concatenating the DPA and the DRA with the DPA as the most significant bits. The key may be stored in a key field of a record in a persistent and sequentially-accessed log, which is part of a log-with-index (LWI) structure that also maintains, in RAM or SSD, a binary index of the log records. When full, the log is cleared by writing the records in key-sorted order to the new tablet. From time to time, two tablets in the tablet library are merged. Tablet merging may include two or more atomic merges, each atomic merge corresponding to a portion of the tablet.
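The key construction, concatenating the DPA and the DRA with the DPA as the most significant bits, can be sketched as below. The 32-bit DRA width and the function names are assumptions chosen for the example; the abstract does not state address widths.

```python
DRA_BITS = 32  # assumed width of a data reference address


def make_key(dpa: int, dra: int) -> int:
    """Concatenate DPA and DRA, with the DPA in the most significant bits."""
    if dpa < 0:
        raise ValueError("DPA must be non-negative")
    if not 0 <= dra < (1 << DRA_BITS):
        raise ValueError("DRA does not fit in DRA_BITS")
    return (dpa << DRA_BITS) | dra


def split_key(key: int):
    """Recover (dpa, dra) from a key built by make_key."""
    return key >> DRA_BITS, key & ((1 << DRA_BITS) - 1)
```

Because the DPA occupies the high bits, sorting records by key groups all references to the same data pattern together, which suits writing the log out in key-sorted order.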
System and method for hash-based entropy calculation
A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
Data compression device and data compression method
An object of the present invention is to efficiently compress a plurality of kinds of data series with different sampling rates. A data compression device has a grouping unit and a compression unit. The grouping unit groups a plurality of kinds of data series with different sampling rates. The compression unit compresses the data series grouped by the grouping unit.