Patent classifications
H03M7/3095
CHUNKING METHOD AND APPARATUS
Embodiments of this application disclose a chunking method and an apparatus for implementing the method. According to the method provided in the embodiments of this application, a first data segment of a first length may be determined starting from a header of a to-be-chunked data flow, a data distribution characteristic of the first data segment is determined based on character values of all characters in the first data segment, and then a chunking position is determined for different data distribution characteristics by using different methods. In this way, a data flow can be better chunked, so as to enhance a deduplication effect.
Chunking method and apparatus
Embodiments of this application disclose a chunking method and an apparatus for implementing the method. According to the method provided in the embodiments of this application, a first data segment of a first length may be determined starting from a header of a to-be-chunked data flow, a data distribution characteristic of the first data segment is determined based on character values of all characters in the first data segment, and then a chunking position is determined for different data distribution characteristics by using different methods. In this way, a data flow can be better chunked, so as to enhance a deduplication effect.
Metadata separated container format
A data management device includes a persistent storage and a processor. The persistent storage includes an object storage. The processor segments a file into file segments. The processor generates meta-data of the file segments. The processor stores a portion of the file segments in a data object of the object storage. The processor stores a portion of the meta-data of the file segments in a meta-data object of the object storage.
Static dictionary-based compression hardware pipeline for data compression accelerator of a data processing unit
A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data processing functions. This disclosure describes a programmable hardware-based data compression accelerator that includes a pipeline for performing static dictionary-based and dynamic history-based compression on streams of information, such as network packets. The search block may support single and multi-thread processing, and multiple levels of compression effort. To achieve high-compression, the search block may operate at a high level of effort that supports a single thread and use of both a dynamic history of the input data stream and a static dictionary of common words. The static dictionary may be useful in achieving high-compression where the input data stream is relatively small.
Aligning Variable Sized Compressed Data To Fixed Sized Storage Blocks
Preparing data for deduplication including: generating, by a storage system for a compressed data block, a padded compressed data block by padding the compressed data block to conform to a fixed block size, wherein the fixed block size is greater than a size of the compressed data block; storing, in the storage system, the padded compressed data block beginning at a block boundary of a storage device in the storage system; and performing block-based deduplication on the storage system, wherein the block-based deduplication determines whether the padded compressed data block matches one or more other padded compressed data blocks stored in the storage system.
Hybrid data reduction
An information handling system may include at least one processor and a memory coupled to the at least one processor. The information handling system may be configured to receive data comprising a plurality of data chunks; perform deduplication on the plurality of data chunks to produce a plurality of unique data chunks; determine a compression ratio for respective pairs of the unique data chunks; determine a desired compression order for the plurality of unique data chunks based on the compression ratios; combine the plurality of unique data chunks in the desired compression order; and perform data compression on the combined plurality of unique data chunks.
Symbol pair encoding for data compression
Methods, apparatuses, and computer-readable media for compressing data for storage or transmission. Input data is compressed in a first stage utilizing a first compression algorithm and the frequencies of occurrence of symbols and symbol pairs in the output from the first stage is calculated. The output from the first stage is then encoded to a final compressed bit string in a second stage utilizing a second compression algorithm based on the calculated frequencies of occurrence of the symbols and the symbol pairs.
HYBRID DATA REDUCTION
An information handling system may include at least one processor and a memory coupled to the at least one processor. The information handling system may be configured to receive data comprising a plurality of data chunks; perform deduplication on the plurality of data chunks to produce a plurality of unique data chunks; determine a compression ratio for respective pairs of the unique data chunks; determine a desired compression order for the plurality of unique data chunks based on the compression ratios; combine the plurality of unique data chunks in the desired compression order; and perform data compression on the combined plurality of unique data chunks.
Tape drive memory deduplication
A method and system for improving tape drive memory storage is provided. The method includes receiving, by a storage tape drive, a data stream for storage. The data stream is passed through a non-volatile memory device (NVS2) of the storage tape drive. The data stream is divided into adjacent variable length data chunks and a chunk list file including similarity identifiers for each of the adjacent variable length data chunks is generated and stored within a (non-volatile memory device) NVS1. Duplicate data including duplicated data with respect to a group of data chunks of the adjacent variable length data chunks is identified and deleted from the NVS2 of the storage tape drive such that the group of data chunks remains within NVS2. The group of data chunks is written to a data storage tape cartridge. Pointers identifying each data chunk and an associated storage position are generated and stored.
ADVANCED DATABASE DECOMPRESSION
A method, a system, and a computer program product for decompressing data. One or more compressed blocks in a set of stored compressed blocks responsive to a request to access data in the set of stored compressed blocks are identified. String prefixes inside the identified compressed blocks are decompressed using front coding. String suffixes inside the identified compressed blocks are decompressed using a re-pair decompression. Uncompressed data is generated.