Patent classifications
H03M7/3095
Efficient generalized boundary detection
Fast, efficient, and robust compression-based methods for detecting boundaries in arbitrary datasets, including sequences (1D datasets), are desired. The methods, each employing three simple algorithms, approximate the information distance between two adjacent sliding windows within a dataset. One of the algorithms calculates an initial ordered list of subsequences, while a second algorithm updates the ordered list of subsequences by dropping a first entry and appending a last entry rather than calculating a completely new ordered list with each iteration. Large values in the distance metric are indicative of boundary locations. A smoothed z-score or a wavelet-based algorithm may then be used to locate peaks in the distance metric, thereby identifying boundary locations. An adaptive version of the method employs a collection of window sizes and corresponding weighting functions, making it more amenable to real datasets with unknown, complex, and changing structures.
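The sliding-window idea can be illustrated with a minimal Python sketch. It is not the patented method: it swaps in the normalized compression distance computed with zlib for the ordered-subsequence algorithms, recompresses every window pair instead of updating incrementally, and uses a plain argmax instead of a smoothed z-score or wavelet peak finder. The function names and the window parameter are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form, used as a rough complexity estimate."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a computable stand-in for information distance."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def boundary_scores(data: bytes, window: int) -> list[tuple[int, float]]:
    """Score each split i by the distance between data[i-window:i] and data[i:i+window]."""
    return [
        (i, ncd(data[i - window:i], data[i:i + window]))
        for i in range(window, len(data) - window + 1)
    ]


def strongest_boundary(data: bytes, window: int) -> int:
    """Index with the largest score (plain argmax; assumes len(data) >= 2 * window)."""
    return max(boundary_scores(data, window), key=lambda item: item[1])[0]
```

Large scores mark positions where the left and right windows share little structure. On real data a single window size is fragile, which is the motivation the abstract gives for the adaptive, multi-window variant.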
SYSTEMS, METHODS, AND APPARATUS FOR DIVIDING AND COMPRESSING DATA
A method for data compression may include scanning input data, performing, based on the scanning, a compression operation to generate compressed data using the input data, finding, based on the scanning, a delimiter in the input data, and generating, based on a position of the delimiter in the input data, a portion of data using the compressed data. The input data may include a record, the delimiter may indicate a boundary of the record, and the portion of data may include the record. The generating may include generating the portion of data based on a portion size. The portion size may be a default portion size. The portion size may be based on a default portion size and a length of a match in the input data.
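A simplified Python sketch of delimiter-aware portioning follows. It assumes a newline delimiter and a hypothetical default portion size, and it compresses each portion separately with zlib, whereas the abstract describes deriving portions from compressed data produced during a single scan. The name portion_records is illustrative only.

```python
import zlib


def portion_records(data: bytes, delimiter: bytes = b"\n", portion_size: int = 64) -> list[bytes]:
    """Cut data into portions of roughly portion_size bytes that end on a delimiter."""
    portions = []
    start = 0
    while start < len(data):
        target = start + portion_size
        if target >= len(data):
            end = len(data)  # the remainder fits in one portion
        else:
            # Extend to the first delimiter that ends at or after the default size,
            # so that no record is split across two portions.
            idx = data.find(delimiter, max(start, target - len(delimiter)))
            end = len(data) if idx == -1 else idx + len(delimiter)
        portions.append(zlib.compress(data[start:end]))
        start = end
    return portions
```

Because every portion ends on a record boundary, any portion can be decompressed on its own and still yield whole records.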
Advanced database decompression
A method, a system, and a computer program product for decompressing data. In response to a request to access data in a set of stored compressed blocks, one or more compressed blocks in the set are identified. String prefixes inside the identified compressed blocks are decompressed using front coding. String suffixes inside the identified compressed blocks are decompressed using Re-Pair decompression. Uncompressed data is generated.
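The combination of the two decoders can be sketched in Python under some assumptions that are not taken from the patent: each entry is stored as a pair (length of the prefix shared with the previous string, Re-Pair symbols of the suffix), symbols below 256 are literal bytes, and larger symbols index a grammar of pair rules. The block layout and function names are hypothetical.

```python
def expand(symbol: int, rules: dict[int, tuple[int, int]]) -> bytes:
    """Re-Pair expansion: a symbol is either a literal byte or a rule naming two symbols."""
    if symbol < 256:
        return bytes([symbol])
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def decode_block(entries: list[tuple[int, list[int]]], rules: dict[int, tuple[int, int]]) -> list[bytes]:
    """Front coding: rebuild each string from the previous string's prefix plus its own suffix."""
    strings = []
    previous = b""
    for prefix_len, suffix_symbols in entries:
        suffix = b"".join(expand(symbol, rules) for symbol in suffix_symbols)
        current = previous[:prefix_len] + suffix
        strings.append(current)
        previous = current
    return strings


# Hypothetical grammar: 256 -> "on", 257 -> "ss", 258 -> "er", 259 -> "i" + rule 256 = "ion"
rules = {256: (ord("o"), ord("n")), 257: (ord("s"), ord("s")),
         258: (ord("e"), ord("r")), 259: (ord("i"), 256)}
entries = [
    (0, [ord(c) for c in "compre"] + [257]),  # "compress"
    (8, [259]),                               # shares "compress", adds "ion"
    (4, [ord("u"), ord("t"), 258]),           # shares "comp", adds "uter"
]
decoded = decode_block(entries, rules)
```

Front coding pays off on sorted dictionaries, where neighbouring strings share long prefixes, while the grammar captures repeated pairs in the remaining suffixes.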
VECTOR PROCESSING FOR SEGMENTATION HASH VALUES CALCULATION
A system for segmenting an input data stream using vector processing, comprising a processor adapted to repeat the following steps throughout an input data stream to create a segmented data stream consisting of a plurality of segments: apply a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence including a subset of consecutive data items of the sequence; calculate concurrently a plurality of partial hash values, each by one of a plurality of processing pipelines of the processor and each for a respective one of a plurality of partial rolling sequences, each including evenly spaced data items of the subset; determine compliance of each of the plurality of partial hash values with one or more respective partial segmentation criteria; and designate the sequence as a variable size segment when at least some of the partial hash values comply with the respective partial segmentation criteria.
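The lane structure can be sketched in Python as follows. The sketch runs sequentially rather than on vector pipelines, and it recomputes each lane's hash per window instead of rolling it, purely for readability. The polynomial hash, the mask test, and the parameters window, lanes, mask, and required are all assumptions for illustration, not details from the patent.

```python
def partial_hashes(window: bytes, lanes: int) -> list[int]:
    """One hash per lane; lane k covers items k, k + lanes, k + 2 * lanes, ... of the window."""
    hashes = []
    for lane in range(lanes):
        value = 0
        for item in window[lane::lanes]:
            value = (value * 31 + item) & 0xFFFF
        hashes.append(value)
    return hashes


def segment(data: bytes, window: int = 16, lanes: int = 4,
            mask: int = 0x3, required: int = 4) -> list[bytes]:
    """Cut where at least `required` lane hashes have their masked bits equal to zero."""
    segments = []
    start = 0
    for end in range(window, len(data) + 1):
        if end - start < window:
            continue  # keep every non-final segment at least one window long
        hashes = partial_hashes(data[end - window:end], lanes)
        compliant = sum(1 for value in hashes if (value & mask) == 0)
        if compliant >= required:
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])  # trailing data that never met the criteria
    return segments
```

Because each lane reads evenly spaced items, the lanes are independent of one another, which is what would let a SIMD unit evaluate them side by side.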
Lossless compression of client read data
A read is aligned to a reference data set. It is determined whether the read includes any identifier distinction, the determination being performed using the alignment. If so, positional data corresponding to the identifier distinction(s) are defined. Compressed read data is stored in association with a read identifier of the read. The compressed read data includes alignment information (e.g., a start and/or stop position of the alignment). When the read includes an identifier distinction, the compressed read data further includes the positional data and deviation data characterizing the distinction.
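A minimal Python sketch of the idea, assuming the alignment is already known (a start offset into the reference) and that the only distinctions are single-base substitutions. Insertions and deletions are not handled, and the record layout and function names are hypothetical.

```python
def compress_read(read_id: str, read: str, reference: str, start: int) -> dict:
    """Store the alignment span plus only the bases that differ from the reference."""
    stop = start + len(read)
    region = reference[start:stop]
    if len(region) != len(read):
        raise ValueError("read extends past the end of the reference")
    deviations = [
        (offset, base)
        for offset, (base, ref_base) in enumerate(zip(read, region))
        if base != ref_base
    ]
    return {"id": read_id, "start": start, "stop": stop, "deviations": deviations}


def decompress_read(record: dict, reference: str) -> str:
    """Rebuild the read by copying the reference span and re-applying the deviations."""
    bases = list(reference[record["start"]:record["stop"]])
    for offset, base in record["deviations"]:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGC"
record = compress_read("read-1", "ACGATAG", reference, start=4)
```

A read that matches the reference exactly is stored as two positions and an empty deviation list, which is where the savings come from.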
LAYOUT FORMAT FOR COMPRESSED DATA
Techniques are provided for a layout format for compressed data. A first set of data blocks is grouped into a first group based upon a first frequency of access to the first set of data blocks. A second set of data blocks is grouped into a second group based upon a second frequency of access to the second set of data blocks. The first set of data blocks is compressed into a first compression group using a first compression algorithm. The second set of data blocks is compressed into a second compression group using a second compression algorithm.
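One way to picture the grouping is the Python sketch below. The abstract does not name the algorithms or the grouping rule, so everything specific here is an assumption: a single access-count threshold, fast zlib (level 1) for frequently read blocks, LZMA for rarely read blocks, and a fixed block size so that a block can be located inside its decompressed group.

```python
import lzma
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def build_groups(blocks: dict[int, bytes], access_counts: dict[int, int],
                 hot_threshold: int) -> dict[str, dict]:
    """Split blocks by access frequency and compress each group with its own codec."""
    hot_ids = [i for i in sorted(blocks) if access_counts.get(i, 0) >= hot_threshold]
    cold_ids = [i for i in sorted(blocks) if access_counts.get(i, 0) < hot_threshold]
    return {
        "hot": {"ids": hot_ids, "codec": "zlib",
                "payload": zlib.compress(b"".join(blocks[i] for i in hot_ids), 1)},
        "cold": {"ids": cold_ids, "codec": "lzma",
                 "payload": lzma.compress(b"".join(blocks[i] for i in cold_ids))},
    }


def read_block(groups: dict[str, dict], block_id: int) -> bytes:
    """Decompress the group holding block_id and slice the block out of it."""
    for group in groups.values():
        if block_id in group["ids"]:
            decompress = zlib.decompress if group["codec"] == "zlib" else lzma.decompress
            raw = decompress(group["payload"])
            offset = group["ids"].index(block_id) * BLOCK_SIZE
            return raw[offset:offset + BLOCK_SIZE]
    raise KeyError(block_id)
```

The trade-off in this sketch is that hot data gets the codec that is cheap to decompress, while cold data gets the codec with the better ratio.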
Storage device and data processing method
The present invention realizes a storage device that has a high data reduction effect without decreasing I/O performance. The storage device includes a processor, an accelerator, a memory, and a storage medium. The processor specifies, from data stored in the memory, data to be compressed, that is, data stored in the storage medium, and transmits to the accelerator a compression instruction including information relating to the data to be compressed. Based on the information received from the processor, the accelerator reads a plurality of continuous items of data from the memory, excludes the items that are not to be compressed, and compresses the remaining items to generate compressed data that is stored in the storage device.
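The division of work can be mimicked in a few lines of Python. This is only a software analogy: the accelerator is a plain function, the memory is a list of byte strings, and the "information relating to the data to be compressed" is assumed to be a list of per-item flags. None of these representations come from the patent.

```python
import zlib


def accelerator_compress(memory: list[bytes], first: int, count: int,
                         compress_flags: list[bool]) -> bytes:
    """Read one contiguous run of items, drop the unflagged ones, compress the rest."""
    items = memory[first:first + count]  # a single contiguous read
    selected = [item for item, flag in zip(items, compress_flags) if flag]
    return zlib.compress(b"".join(selected))


# The "processor" decides which items in the run should be compressed.
memory = [b"AAAA", b"meta", b"BBBB", b"CCCC"]
compressed = accelerator_compress(memory, first=0, count=4,
                                  compress_flags=[True, False, True, True])
```

Reading the run contiguously and filtering afterwards keeps the memory access pattern simple, at the cost of fetching a few items that are then discarded.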
Preparing data for deduplication
Preparing data for deduplication, including: in response to receiving a request to transfer data from a source storage system to a target storage system, accessing, by the source storage system, a compressed data block; generating, by the source storage system, a padded compressed data block by padding the compressed data block to conform to a fixed block size, wherein the fixed block size is greater than a size of the compressed data block; and sending, by the source storage system, the padded compressed data block to the target storage system.
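The padding step can be sketched in Python as follows, assuming zlib-compressed blocks, zero-byte padding, and a hypothetical 4096-byte fixed block size. The abstract does not say how the target strips the padding, so the unpadding helper is an assumption that relies on the zlib stream marking its own end.

```python
import zlib


def pad_compressed_block(compressed: bytes, fixed_block_size: int) -> bytes:
    """Pad a compressed block with zero bytes up to the fixed block size."""
    if len(compressed) >= fixed_block_size:
        raise ValueError("fixed block size must exceed the compressed block size")
    return compressed + b"\x00" * (fixed_block_size - len(compressed))


def unpad_and_decompress(padded: bytes) -> bytes:
    """Decompress a padded block; bytes after the zlib stream end up in unused_data."""
    decompressor = zlib.decompressobj()
    return decompressor.decompress(padded)


payload = b"hello world" * 100
padded = pad_compressed_block(zlib.compress(payload), 4096)
```

Identical payloads produce byte-identical padded blocks of one uniform size, which is what lets a target that deduplicates on fixed-size blocks recognise them.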