H03M7/3091

Generating and optimizing summary index levels in a deduplication storage system

The method and system generate a first deduplication map (DDM) level including first data and a second DDM level including second data. The method or apparatus also generates a first index summary (IS) level corresponding to the first DDM level and a second IS level corresponding to the second DDM level. The method or apparatus merges the first data of the first DDM level and the second data of the second DDM level to generate a third DDM level comprising third data. In response to generating the third DDM level, the method or apparatus generates a third IS level to accelerate lookup within the third DDM level.
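The merge-then-summarize flow can be sketched as follows. The abstract does not say how a DDM level or an IS level is laid out, so this sketch assumes a DDM level is a sorted list of (fingerprint, location) pairs and an IS level is a sparse sample of its keys; the function names and the `stride` parameter are hypothetical.

```python
from bisect import bisect_right

def build_index_summary(level, stride=2):
    """Sample every `stride`-th key of a sorted level as (key, position)."""
    return [(level[i][0], i) for i in range(0, len(level), stride)]

def merge_levels(first, second):
    """Merge two sorted levels; on a duplicate key the entry from `second` wins (assumed policy)."""
    merged = dict(first)
    merged.update(second)
    return sorted(merged.items())

def lookup(level, summary, key, stride=2):
    """Use the summary to pick one stride-sized window, then scan only that window."""
    keys = [k for k, _ in summary]
    slot = bisect_right(keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    start = summary[slot][1]
    for k, location in level[start:start + stride]:
        if k == key:
            return location
    return None

first_level = [("a1", 10), ("c3", 30)]
second_level = [("b2", 20), ("c3", 31), ("d4", 40)]
third_level = merge_levels(first_level, second_level)
# The third summary is built only after the merged level exists.
third_summary = build_index_summary(third_level)
```

Rebuilding the summary after the merge keeps its sampled positions valid for the new level; summaries of the two input levels cannot be reused directly because the merge shifts entry positions.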

SIMILARITY-BASED DATA DEDUPLICATION ON SOLID-STATE STORAGE DEVICES WITH EMBEDDED NONVOLATILE MEMORY
20190310788 · 2019-10-10

A storage device and method for performing device-level similarity-based data deduplication. A solid-state storage device is provided that includes: a set of flash memory; a nonvolatile memory (NVM) cache; and a controller that performs similarity-based data deduplication in response to write requests from a host, wherein the controller includes: a cache management module that temporarily stores a new data sector in the NVM cache when a write request is received; a similarity detection module that determines if a similar data sector exists in flash memory; and a data chunk management module that, in response to determining the similar data sector exists, generates a new data chunk that includes a metadata block, a base sector and at least one delta, wherein the new data chunk is stored in a newly allocated physical block address (PBA) in flash memory.

Concurrent segmentation using vector processing

A system for segmenting an input data stream, comprising a processor adapted to split an input data stream into a plurality of data sub-streams such that each of the plurality of data sub-streams has an overlapping portion with a consecutive data sub-stream of the plurality of data sub-streams, create concurrently a plurality of segmented data sub-streams by concurrently segmenting the plurality of data sub-streams, each in one of a plurality of processing pipelines of the processor, and join the plurality of segmented data sub-streams to create a segmented data stream by synchronizing a sequencing of each of the plurality of segmented data sub-streams according to one or more overlapping segments in the overlapping portion of each two consecutive data sub-streams of the plurality of data sub-streams.
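A minimal sketch of the split, segment and join steps, under stated assumptions. The abstract does not give the boundary rule, so a simple windowed-sum test stands in for it, and a thread pool stands in for the processor's pipelines. `WINDOW`, `MASK`, `step`, `overlap` and all function names are hypothetical. Because each cut decision here depends only on the preceding `WINDOW` bytes, neighbouring sub-streams agree on every cut inside their overlap, and the join can synchronize simply by collapsing those shared cuts.

```python
from concurrent.futures import ThreadPoolExecutor

WINDOW = 4   # bytes examined per cut decision (assumed)
MASK = 0x07  # cut when the low three bits of the window sum are zero (assumed)

def cut_points(data, start, end):
    """Absolute offsets in [start + WINDOW, end] at which a segment ends."""
    return [i for i in range(start + WINDOW, end + 1)
            if sum(data[i - WINDOW:i]) & MASK == 0]

def to_segments(data, cuts):
    """Turn a collection of cut offsets into the list of segments."""
    bounds = sorted(set(cuts) | {len(data)})
    return [data[a:b] for a, b in zip([0] + bounds, bounds)]

def segment_sequential(data):
    """Reference result: one pass over the whole stream."""
    return to_segments(data, cut_points(data, 0, len(data)))

def segment_concurrent(data, step=16, overlap=8):
    """Segment overlapping sub-streams in parallel, then join on shared cuts."""
    # overlap must be at least WINDOW - 1 so that no cut decision is missed
    ranges = [(s, min(s + step + overlap, len(data)))
              for s in range(0, len(data), step)]
    with ThreadPoolExecutor() as pool:
        per_stream = pool.map(lambda r: cut_points(data, *r), ranges)
        cuts = [c for stream in per_stream for c in stream]
    # cuts found twice inside an overlap collapse to one in to_segments
    return to_segments(data, cuts)
```

A boundary rule with minimum or maximum segment sizes would carry state across cuts, and the join would then need to search the overlap for the first cut both neighbours share before stitching the sequences together.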

DATA COMPRESSION METHOD, APPARATUS FOR DATA COMPRESSION, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM FOR STORING PROGRAM
20190303381 · 2019-10-03

A data compression method includes: specifying a structure of a group included in semistructured data, based on a data kind and a data type of each data in the group; setting a first identifier unique to each structure and setting a second identifier for a pair of the data kind and the data type of each data in the structure; storing the data in the group in different storage areas for each pair of the first identifier corresponding to the group and the second identifier corresponding to the data; and compressing the data for each storage area.
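The grouping step can be illustrated with a short sketch. It assumes each group is a flat Python dict, treats the field name as the "data kind" and the Python type name as the "data type", and uses JSON plus zlib as stand-ins for the storage format and compressor; none of these choices come from the abstract, and `compress_groups` is a hypothetical name.

```python
import json
import zlib
from collections import defaultdict

def compress_groups(groups):
    structure_ids = {}         # structure signature -> first identifier
    areas = defaultdict(list)  # (first id, second id) -> values stored together
    for group in groups:
        fields = sorted(group.items())  # canonical field order
        signature = tuple((kind, type(value).__name__) for kind, value in fields)
        first_id = structure_ids.setdefault(signature, len(structure_ids))
        # second identifier: position of the (kind, type) pair in the structure
        for second_id, (_, value) in enumerate(fields):
            areas[(first_id, second_id)].append(value)
    # compress each storage area independently
    compressed = {key: zlib.compress(json.dumps(values).encode())
                  for key, values in areas.items()}
    return structure_ids, compressed

groups = [{"name": "a", "age": 1}, {"name": "b", "age": 2}, {"id": 7}]
structure_ids, compressed = compress_groups(groups)
```

Keeping values of the same kind and type together tends to place similar bytes next to each other, which is what gives the per-area compressor something to exploit.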

METHOD AND SYSTEM OF SIMILARITY-BASED DEDUPLICATION

A method of similarity-based deduplication comprising the steps of: receiving an input data block; computing discrete wavelet transform (DWT) coefficients; extracting feature-related DWT data from the computed DWT coefficients; applying quantization to the extracted feature-related DWT data to obtain keys as results of the quantization; constructing a locality-sensitive fingerprint of the input data block; computing a similarity degree between the locality-sensitive fingerprint of the input data block and a locality-sensitive fingerprint of each data block in a plurality of data blocks in a cache memory; selecting an optimal reference data block as the data block; determining whether a differential compression is required to be applied based on the similarity degree between the input data block and the optimal reference data block; and applying the differential compression to the input data block and the optimal reference data block.
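The claimed pipeline can be sketched end to end, with every concrete choice below being an assumption rather than something the abstract specifies: a one-level unnormalized Haar transform as the DWT, the largest approximation coefficients as the "feature-related" data, integer division as the quantizer, and Jaccard overlap as the similarity degree. All names, `top`, `step` and `threshold` are hypothetical.

```python
def haar_approximation(block):
    """One-level Haar approximation coefficients (pairwise averages, unnormalized)."""
    return [(a + b) / 2 for a, b in zip(block[0::2], block[1::2])]

def fingerprint(block, top=4, step=16):
    """Quantize the `top` strongest coefficients into a set of (position, key) pairs."""
    approx = haar_approximation(block)
    strongest = sorted(range(len(approx)), key=lambda i: approx[i], reverse=True)[:top]
    return frozenset((i, int(approx[i] // step)) for i in strongest)

def similarity(fp_a, fp_b):
    """Jaccard overlap of two fingerprints, in [0, 1]."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0

def select_reference(block, cache):
    """Return (index, score) of the most similar cached block; cache must be non-empty."""
    fp = fingerprint(block)
    scores = [similarity(fp, fingerprint(candidate)) for candidate in cache]
    best = max(range(len(cache)), key=scores.__getitem__)
    return best, scores[best]

def needs_delta(score, threshold=0.5):
    """Apply differential compression only when the blocks are similar enough."""
    return score >= threshold
```

Quantization is what makes the fingerprint locality-sensitive in this sketch: small byte changes usually leave the quantized keys untouched, so near-duplicate blocks map to the same or overlapping key sets.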

Bounds checking

A data processing apparatus is provided, for performing a determination of whether a value falls within a boundary defined by a lower limit between 0 and 2^m and an upper limit between 0 and 2^m. The apparatus includes storage circuitry that stores each of the lower limit and the upper limit in a compressed form as a mantissa of q<m bits and a shared exponent e. A most significant m-q-e bits of said lower limit and said upper limit are equal to a most significant m-q-e bits of said value. Adjustment circuitry performs adjustments to the lower limit and the upper limit in compressed form and boundary comparison circuitry performs the determination on the value using the lower limit and the upper limit in the compressed form.

TAPE DRIVE MEMORY DEDUPLICATION
20190272257 · 2019-09-05

A method and system for improving tape drive memory storage is provided. The method includes receiving, by a storage tape drive, a data stream for storage. The data stream is passed through a non-volatile memory device (NVS2) of the storage tape drive. The data stream is divided into adjacent variable length data chunks and a chunk list file including similarity identifiers for each of the adjacent variable length data chunks is generated and stored within a non-volatile memory device (NVS1). Duplicate data including duplicated data with respect to a group of data chunks of the adjacent variable length data chunks is identified and deleted from the NVS2 of the storage tape drive such that the group of data chunks remains within NVS2. The group of data chunks is written to a data storage tape cartridge. Pointers identifying each data chunk and an associated storage position are generated and stored.

SYSTEMS AND METHODS FOR CALCULATING A PROBABILITY OF EXCEEDING STORAGE CAPACITY IN A VIRTUALIZED COMPUTING SYSTEM
20190272113 · 2019-09-05

Examples of systems are described for calculating a probability of exceeding storage capacity of a virtualized system in a particular time period using probabilistic models. The probabilistic models may advantageously take variances of storage capacity into consideration.
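One possible probabilistic model, offered only as an illustration since the abstract names none: assume daily storage growth is independent and normally distributed, and that usable capacity is itself normal with its own variance. The remaining margin after a given number of days is then normal as well, and the probability of exceeding capacity is the probability that the margin falls below zero. The function and parameter names are hypothetical.

```python
from statistics import NormalDist

def probability_of_exceeding(used, capacity_mean, capacity_var,
                             growth_mean, growth_var, days):
    """P(used + growth over `days` > capacity) under the normal assumptions above."""
    # margin = capacity - used - total growth; variances of independent terms add
    margin = NormalDist(
        mu=capacity_mean - used - days * growth_mean,
        sigma=(capacity_var + days * growth_var) ** 0.5,
    )
    return margin.cdf(0.0)  # requires a strictly positive combined variance

p_30_days = probability_of_exceeding(
    used=70.0, capacity_mean=100.0, capacity_var=4.0,
    growth_mean=1.0, growth_var=1.0, days=30,
)
```

With these example numbers the expected margin after 30 days is exactly zero, so the result is 0.5; a shorter horizon lowers the probability and a longer one raises it.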

Adaptive compression to improve reads on a deduplication file system

Improving the performance of read operations in a restore path of an inline deduplication system utilizing a DDBOOST interface by providing an adaptive compression component for use with DDBOOST applications. The system utilizes a built-in compression mode for transferring read data if there are sufficient available CPU resources in both the server and client to respectively compress and decompress the read data without destabilizing the system. CPU usage on both the client and the server is tracked to generate predicted respective CPU usage. These respective predictions are compared to defined maximum threshold usage values. If the predicted values do not exceed the thresholds, compression is used; otherwise the data is transmitted over the network as non-compressed data.
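The decision logic can be sketched as below. The abstract does not state how CPU usage is predicted, so an exponentially weighted moving average is assumed here; the function names, `alpha`, and the 80 percent thresholds are hypothetical, and nothing in this sketch uses the actual DDBOOST interface.

```python
def predict_usage(samples, alpha=0.5):
    """Exponentially weighted moving average of CPU usage samples (percent)."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate

def use_compression(server_samples, client_samples,
                    server_max=80.0, client_max=80.0):
    """Compress only if neither predicted usage exceeds its threshold."""
    return (predict_usage(server_samples) <= server_max
            and predict_usage(client_samples) <= client_max)

# Server trending upward but still under the threshold; client idle.
decision = use_compression([40.0, 50.0, 60.0], [30.0, 30.0, 30.0])
```

Requiring both sides to pass reflects that the server pays for compression and the client pays for decompression; a busy machine at either end is enough to fall back to uncompressed transfer.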

Method to optimize ingest in dedupe systems by using compressibility hints

A method, system and computer-readable storage medium for transferring data segments from a first computing system to a second computing system. Prior to transfer of the data segments, the first system calculates a compressibility ratio of each segment and compares the compressibility ratio to a preset threshold. Based on the comparison, the first system assigns a compressibility hint to each segment. The first system transfers the segments to the second system, together with the corresponding compressibility hint. The second system stores each segment in a compressible region or in a non-compressible region based on the hint. Then the second system compresses the compressible region and stores the compressed region in a container, and stores the non-compressible region uncompressed in the container.