H03M7/3091

System and method for data deduplication

A method, computer program product, and computing system for identifying a potential deduplication candidate and a related deduplication target; executing a comparison operation with respect to the potential deduplication candidate and the related deduplication target to generate a comparison result; and determining a level of similarity between the potential deduplication candidate and the related deduplication target by processing the comparison result.
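The abstract does not say which comparison operation is used or how the result is processed. As a minimal sketch, assuming a bytewise XOR as the comparison and a count of differing bits as the processing step (both are illustrative choices, and the function names are hypothetical), the two steps might look like this:

```python
def compare(candidate: bytes, target: bytes) -> bytes:
    """Comparison operation; assumed here to be a bytewise XOR."""
    if len(candidate) != len(target):
        raise ValueError("candidate and target must be the same length")
    return bytes(a ^ b for a, b in zip(candidate, target))


def similarity(comparison_result: bytes) -> float:
    """Fraction of matching bits, derived from the XOR result."""
    if not comparison_result:
        return 1.0
    # Each set bit in the XOR result marks one bit that differs.
    differing = sum(bin(byte).count("1") for byte in comparison_result)
    return 1.0 - differing / (8 * len(comparison_result))
```

A result near 1.0 would suggest the candidate is a near-duplicate of the target; any threshold for acting on it is a separate design decision not covered here.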

MULTIPLE OVERLAPPING HASHES AT VARIABLE OFFSET IN A HARDWARE OFFLOAD

A hardware offload includes a hash engine that performs hashing for a block-based storage system. The hash engine calculates multiple hash values for each input buffer provided by the storage system. The hash values may be calculated with variably offset and overlapping portions of the input buffer, wherein each portion is larger than the native block size of the storage system. The hardware offload may also include a compression engine that performs compression on the input buffer using the entire input buffer and/or chunks as compression domains.
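The idea of hashing several overlapping portions of one input buffer can be illustrated in software. This is only a sketch: the 4096-byte native block size, the use of SHA-256, and the function name are assumptions, and a real offload would do this in hardware.

```python
import hashlib

NATIVE_BLOCK_SIZE = 4096  # assumed native block size of the storage system


def overlapping_hashes(buffer: bytes, portion_size: int, offsets: list[int]) -> list[str]:
    """Hash one portion of the buffer per offset; portions may overlap."""
    if portion_size <= NATIVE_BLOCK_SIZE:
        raise ValueError("each portion must be larger than the native block size")
    hashes = []
    for offset in offsets:
        portion = buffer[offset:offset + portion_size]
        if len(portion) < portion_size:
            raise ValueError(f"portion at offset {offset} runs past the buffer")
        hashes.append(hashlib.sha256(portion).hexdigest())
    return hashes
```

For example, a 16 KiB buffer hashed with `portion_size=8192` at offsets 0, 1000, and 4096 yields three hash values whose input ranges overlap.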

Layout format for compressed data

Techniques are provided for a layout format for compressed data. A first set of data blocks is grouped into a first group based upon a first frequency of access to the first set of data blocks. A second set of data blocks is grouped into a second group based upon a second frequency of access to the second set of data blocks. The first set of data blocks is compressed into a first compression group using a first compression algorithm. The second set of data blocks is compressed into a second compression group using a second compression algorithm.
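A rough sketch of the grouping step follows. The abstract names neither the algorithms nor the grouping rule, so the single access-count threshold, the choice of fast zlib for frequently accessed blocks and LZMA for rarely accessed ones, and all identifiers are assumptions made for illustration.

```python
import lzma
import zlib


def build_compression_groups(
    blocks: dict[int, bytes],
    access_counts: dict[int, int],
    hot_threshold: int,
) -> tuple[tuple[list[int], bytes], tuple[list[int], bytes]]:
    """Split blocks by access frequency and compress each group separately."""
    ids = sorted(blocks)
    hot = [i for i in ids if access_counts.get(i, 0) >= hot_threshold]
    cold = [i for i in ids if access_counts.get(i, 0) < hot_threshold]
    # Assumed policy: cheap, fast compression for hot data,
    # slower but denser compression for cold data.
    hot_group = zlib.compress(b"".join(blocks[i] for i in hot), 1)
    cold_group = lzma.compress(b"".join(blocks[i] for i in cold))
    return (hot, hot_group), (cold, cold_group)
```

Each returned pair holds the block identifiers in the group and the compressed bytes for that group, so a reader only has to decompress the group that contains the block it needs.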

Method and apparatus for compressing metadata in a file system

Embodiments of the present disclosure relate to a method and an apparatus for compressing metadata in a file system. The method comprises, in response to receiving a first request for writing first data to a file, determining whether the first request is for an initial write to a storage area associated with a second indirect block in the first group of indirect blocks, the first group of indirect blocks at least including a first indirect block and the second indirect block. The method further comprises, in response to the initial write, allocating a first group of data blocks for writing the first data on a storage device. In addition, the method further comprises compressing the first group of indirect blocks by encoding a first group of storage addresses corresponding to the first group of data blocks into the first indirect block.

SYSTEM AND METHOD FOR HASH-BASED ENTROPY CALCULATION

A method, computer program product, and computing system for receiving a candidate data portion; calculating a distance-preserving hash for the candidate data portion; and performing an entropy analysis on the distance-preserving hash to generate a hash entropy for the candidate data portion.
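The sketch below shows the general shape of such a pipeline. The abstract does not specify the distance-preserving hash, so a toy one is substituted here: a coarse byte histogram, chosen only because small changes to the input produce small changes to the output. The entropy step is the standard Shannon entropy over the hash bytes. All function names are hypothetical.

```python
import math
from collections import Counter


def toy_distance_preserving_hash(data: bytes, buckets: int = 16) -> bytes:
    """Toy stand-in: a byte histogram with each bucket scaled to 0..255."""
    counts = [0] * buckets
    for byte in data:
        counts[byte * buckets // 256] += 1
    total = max(len(data), 1)
    return bytes(count * 255 // total for count in counts)


def shannon_entropy(values: bytes) -> float:
    """Shannon entropy of the byte values, in bits per byte."""
    if not values:
        return 0.0
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def hash_entropy(candidate: bytes) -> float:
    """Entropy analysis performed on the hash rather than on the raw data."""
    return shannon_entropy(toy_distance_preserving_hash(candidate))
```

Because the hash is much shorter than the candidate data portion, the entropy analysis touches far fewer bytes than an analysis of the raw data would.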

Tape drive memory deduplication

A method and system for improving tape drive memory storage is provided. The method includes receiving, by a storage tape drive, a data stream for storage. The data stream is passed through a non-volatile memory device (NVS2) of the storage tape drive. The data stream is divided into adjacent variable length data chunks, and a chunk list file including similarity identifiers for each of the adjacent variable length data chunks is generated and stored within a non-volatile memory device (NVS1). Duplicate data, including data duplicated with respect to a group of data chunks of the adjacent variable length data chunks, is identified and deleted from the NVS2 of the storage tape drive such that the group of data chunks remains within the NVS2. The group of data chunks is written to a data storage tape cartridge. Pointers identifying each data chunk and an associated storage position are generated and stored.

ADVANCED DATABASE COMPRESSION
20200403633 · 2020-12-24 ·

A method, a system, and a computer program product for executing a database compression. A compressed string dictionary having a block size and a front coding bucket size is generated from a dataset. Front coding is applied to one or more buckets of strings in the dictionary having the front coding bucket size to generate one or more front coded buckets of strings. One or more portions of the generated front coded buckets of strings are concatenated to form one or more blocks having the block size. Each block is compressed. A set of compressed blocks is stored. The set of the compressed blocks stores all strings in the dataset.
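Front coding stores each string in a sorted bucket as the length of the prefix it shares with the previous string plus the remaining suffix. A minimal sketch of that step alone, with hypothetical function names, is given below; splitting the dictionary into buckets, concatenating them into blocks, and compressing each block are omitted.

```python
def front_code_bucket(strings: list[str]) -> list[tuple[int, str]]:
    """Encode a sorted bucket as (shared prefix length, suffix) pairs."""
    coded = []
    previous = ""
    for s in strings:
        shared = 0
        for a, b in zip(previous, s):
            if a != b:
                break
            shared += 1
        coded.append((shared, s[shared:]))
        previous = s
    return coded


def decode_bucket(coded: list[tuple[int, str]]) -> list[str]:
    """Rebuild the original strings from a front coded bucket."""
    strings = []
    previous = ""
    for shared, suffix in coded:
        s = previous[:shared] + suffix
        strings.append(s)
        previous = s
    return strings
```

For the bucket `["car", "card", "care", "cat"]` the encoder produces `[(0, "car"), (3, "d"), (3, "e"), (2, "t")]`; the first string of every bucket is stored in full, which lets each bucket be decoded independently.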

Performance optimization and support compatibility of data compression with hardware accelerator

One embodiment provides a computer implemented method of data compression using a hardware accelerator. A first thread pool for compression jobs and a first polling thread for polling the status of a hardware accelerator are allocated. A compression thread is retrieved from the first thread pool in response to a compression request from a file system. Multiple source data buffers from the file system are aggregated into a compression unit, and a scatter gather list and destination buffer are submitted to the hardware accelerator. A checksum of result data is calculated from the destination buffer. A zlib header is added to the result data, and the checksum is added as a zlib footer to the result data.
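The framing step can be sketched in software. Here Python's `zlib` module, configured to emit a raw DEFLATE stream, stands in for the hardware accelerator, and the thread pools, polling thread, and scatter gather list are left out. The framing follows the zlib format in RFC 1950, in which the trailing Adler-32 checksum is computed over the uncompressed data; how the embodiment itself computes its checksum is not detailed in the abstract. Function names are hypothetical.

```python
import struct
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Stand-in for the accelerator: returns a raw DEFLATE stream."""
    # Negative wbits asks zlib for raw DEFLATE with no header or trailer.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def wrap_as_zlib(raw: bytes, uncompressed: bytes) -> bytes:
    """Add an RFC 1950 header and Adler-32 footer around a raw stream."""
    # 0x78 = DEFLATE with a 32 KiB window; 0x9c makes the 16-bit
    # header a multiple of 31, as the format requires.
    header = b"\x78\x9c"
    footer = struct.pack(">I", zlib.adler32(uncompressed) & 0xFFFFFFFF)
    return header + raw + footer
```

Output framed this way can be read by any standard zlib decoder, for example `zlib.decompress(wrap_as_zlib(raw_deflate(data), data))`, which is what makes accelerator output compatible with software decompression.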

REDUCING THE AMOUNT OF DATA STORED IN A SEQUENCE OF DATA BLOCKS BY COMBINING DEDUPLICATION AND COMPRESSION
20200350926 · 2020-11-05 ·

The described technology is generally directed towards reducing the amount of data stored in a sequence of data blocks by combining deduplication and compression. According to an embodiment, a system can comprise a memory that can store computer executable components, and a processor that can execute the components stored in the memory. The components can comprise a data block identifier that can identify, for a sequence of data blocks, a first data block that corresponds to a first data, resulting in a first identified data block, and a deduplication component that can identify a second data block that corresponds to the first data, resulting in a second identified data block, wherein the deduplication component can replace the second identified data block with a key value corresponding to the first identified data block. Further, a compression component can compress the first identified data block, resulting in a compressed data block.
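A minimal sketch of combining the two reductions over a sequence of blocks is shown below. The use of SHA-256 to detect identical blocks, the index of the first occurrence as the key value, zlib as the compressor, and the function names are all assumptions made for illustration.

```python
import hashlib
import zlib


def reduce_blocks(blocks: list[bytes]) -> list[tuple[str, object]]:
    """Compress first occurrences; replace repeats with a reference."""
    first_seen: dict[bytes, int] = {}  # digest -> index of first occurrence
    reduced: list[tuple[str, object]] = []
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        if digest in first_seen:
            # Duplicate: store only the key of the first identified block.
            reduced.append(("ref", first_seen[digest]))
        else:
            first_seen[digest] = index
            reduced.append(("data", zlib.compress(block)))
    return reduced


def restore_blocks(reduced: list[tuple[str, object]]) -> list[bytes]:
    """Rebuild the original sequence of blocks."""
    blocks: list[bytes] = []
    for kind, payload in reduced:
        if kind == "data":
            blocks.append(zlib.decompress(payload))
        else:
            blocks.append(blocks[payload])  # references always point backwards
    return blocks
```

Deduplicating before compressing means each distinct block is compressed only once, and every repeat costs only the size of its reference.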

SYSTEM AND METHOD FOR DATA DEDUPLICATION

A method, computer program product, and computing system for identifying a potential deduplication candidate and a related deduplication target; executing a comparison operation with respect to the potential deduplication candidate and the related deduplication target to generate a comparison result; and determining a level of similarity between the potential deduplication candidate and the related deduplication target by processing the comparison result.
