H03M7/3091

Additional compression for existing compressed data
12088327 · 2024-09-10 · ·

Techniques are provided for implementing additional compression for existing compressed data. Format information stored within a data block is evaluated to determine whether the data block is compressed or uncompressed. In response to the data block being compressed according to a first compression format, the data block is decompressed using the format information. The data block is compressed with one or more other data blocks to create compressed data having a second compression format different than the first compression format.
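
The flow in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the one-byte format tag, the choice of zlib as the "first" format and LZMA as the "second" format, and all function names are hypothetical.

```python
import lzma
import zlib

# Hypothetical one-byte format tags stored at the start of each data block.
RAW, ZLIB = b"\x00", b"\x01"


def make_block(payload: bytes, compress: bool) -> bytes:
    """Build a block whose first byte is its format information."""
    body = zlib.compress(payload) if compress else payload
    return (ZLIB if compress else RAW) + body


def read_block(block: bytes) -> bytes:
    """Evaluate the format information and return the uncompressed payload."""
    tag, body = block[:1], block[1:]
    if tag == ZLIB:
        return zlib.decompress(body)
    if tag == RAW:
        return body
    raise ValueError("unknown format tag")


def recompress_together(blocks: list) -> bytes:
    """Decompress each block, then compress all payloads as one LZMA group."""
    payloads = [read_block(b) for b in blocks]
    # Length-prefix each payload so the group can be split apart again.
    framed = b"".join(len(p).to_bytes(4, "big") + p for p in payloads)
    return lzma.compress(framed)


def split_group(group: bytes) -> list:
    """Inverse of recompress_together: recover the individual payloads."""
    framed = lzma.decompress(group)
    payloads, pos = [], 0
    while pos < len(framed):
        size = int.from_bytes(framed[pos:pos + 4], "big")
        pos += 4
        payloads.append(framed[pos:pos + size])
        pos += size
    return payloads
```

Grouping several blocks before the second compression pass lets the compressor exploit redundancy across blocks, which per-block compression cannot see.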

NON-VOLATILE MEMORY APPARATUS AND DATA DEDUPLICATION METHOD THEREOF
20180267733 · 2018-09-20 · ·

A non-volatile memory (NVM) apparatus and a data de-duplication method thereof are provided. The NVM apparatus includes an NVM and a controller. The controller performs an error checking and correcting (ECC) method to convert raw data into encoded data. The controller performs the data de-duplication method to reduce the number of times that the same encoded data is written into the NVM. The controller generates feature information corresponding to the raw data by reusing the ECC method. When the feature information is found in a feature list, the encoded data corresponding to the raw data is not written into the NVM. When the feature information is not found in the feature list, the feature information is added to the feature list, and the encoded data corresponding to the raw data is written into the NVM.
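
A minimal sketch of the feature-list check follows. It assumes CRC32 check bytes as a stand-in for a real ECC code (CRC detects errors but does not correct them), and the class and attribute names are hypothetical. It follows the abstract literally, so a feature match alone suppresses the write; a practical design would also have to deal with two different inputs sharing a feature.

```python
import zlib


class DedupController:
    """Toy controller: the 'ECC' pass also yields the dedup feature."""

    def __init__(self) -> None:
        self.feature_list = set()  # features of data already written
        self.nvm = []              # stand-in for the non-volatile memory

    def encode(self, raw: bytes):
        # Assumption: CRC32 plays the role of the ECC check bytes.
        feature = zlib.crc32(raw)
        encoded = raw + feature.to_bytes(4, "big")
        # The check value is reused as the feature information.
        return encoded, feature

    def write(self, raw: bytes) -> bool:
        """Return True if the encoded data was written, False if skipped."""
        encoded, feature = self.encode(raw)
        if feature in self.feature_list:
            return False           # duplicate: do not write again
        self.feature_list.add(feature)
        self.nvm.append(encoded)
        return True
```

Reusing the check value as the feature means no separate hash pass over the raw data is needed, which appears to be the point of "reusing the ECC method" in the abstract.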

DEDUPLICATION METHOD AND STORAGE DEVICE
20180267896 · 2018-09-20 ·

The present disclosure is directed to solutions for performing deduplication by a storage device. According to a duplicate data locality principle, non-duplicate data blocks whose logical addresses are contiguous are stored at contiguous physical addresses in the order of their logical addresses, and the fingerprints of those data blocks are likewise stored at contiguous physical addresses in the same order. In addition, a mapping is established from the logical address of one of those data blocks to an aggregation address.

Hybrid bit-sliced dictionary encoding for fast index-based operations

Techniques are described herein for storing and processing codes included in dictionary-encoded data. In an embodiment, for each respective code of a plurality of codes in the dictionary-encoded data: a plurality of bits from a first portion of the respective code is contiguously stored. One or more bits from a second portion of the respective code is stored in one or more slices. Each respective slice of the one or more slices stores a bit from the one or more bits with a corresponding bit position in the respective code. In another embodiment, a bit-vector is generated based on at least one slice by loading each respective bit of the plurality of bits into different respective partitions in a register at a bit position corresponding to the at least one slice. A plurality of codes may be reconstructed by combining the bit-vector with one or more other bit-vectors.
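
The hybrid layout can be illustrated with a small sketch. It assumes that the "first portion" is the high-order bits of each code and the "second portion" is the low-order bits, which the abstract does not specify, and it uses Python integers as bit-vectors in place of hardware registers. The function names are hypothetical.

```python
def encode_hybrid(codes: list, sliced_bits: int):
    """Split each code into a contiguous high part and bit-sliced low part.

    Returns (contiguous, slices), where slices[j] is an integer whose
    bit i holds bit j of codes[i].
    """
    contiguous = [code >> sliced_bits for code in codes]
    slices = [0] * sliced_bits
    for i, code in enumerate(codes):
        for j in range(sliced_bits):
            if (code >> j) & 1:
                slices[j] |= 1 << i
    return contiguous, slices


def decode_hybrid(contiguous: list, slices: list) -> list:
    """Reconstruct the codes by recombining the slices with the high parts."""
    sliced_bits = len(slices)
    codes = []
    for i, high in enumerate(contiguous):
        low = 0
        for j, bit_vector in enumerate(slices):
            low |= ((bit_vector >> i) & 1) << j
        codes.append((high << sliced_bits) | low)
    return codes
```

For example, `encode_hybrid([0b101101, 0b000111], 3)` keeps the top three bits of each code together and spreads the bottom three bits over three slices, one per bit position.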

Hardware efficient fingerprinting

An approach for fingerprinting large data objects at wire speed is disclosed. The techniques include Fresh/Shift pipelining, split Fresh, optimization, online channel sampling, and pipelined selection. The architecture can also be replicated to work in parallel for higher system throughput. Fingerprinting may provide an efficient mechanism for identifying duplication in a data stream, and deduplication based on the identified fingerprints may provide reduced storage costs, reduced network bandwidth consumption, reduced processing time and other benefits. In some embodiments, fingerprinting may be used to ensure or verify data integrity and may facilitate detection of corruption or tampering. An efficient manner of generating fingerprints (either via hardware, software, or a combination) may reduce the computation load and/or time required to generate fingerprints.

COMPUTER SYSTEM, STORAGE APPARATUS, AND METHOD OF MANAGING DATA
20180253251 · 2018-09-06 · ·

Provided is a computer system comprising at least one storage apparatus and a computer, wherein each of the at least one storage apparatus is configured to manage identification information indicating specifics of the stored data, and wherein the computer determines whether data to be written to one of the at least one storage apparatus has duplicate data, which is the same data already stored in any one of the at least one storage apparatus, transmits deduplicated data, and uses at least one of individual pieces of identification information or a range of pieces of identification information, depending on how many pieces of identification information appear in succession, to request, from the one of the at least one storage apparatus, information indicating whether the data associated with the calculated identification information is stored.

METHODS FOR ESTIMATING COST SAVINGS USING DEDUPLICATION AND COMPRESSION IN A STORAGE SYSTEM
20180246649 · 2018-08-30 ·

Methods are provided for estimating cost savings in a storage system using an external host system. One method includes accessing, over a communication network, data from a unit of storage of a data storage system, wherein each of the blocks of data is uncompressed. A plurality of blocks is parsed from the data. A plurality of fingerprints is generated from the blocks using a hash algorithm. A deduplication ratio is estimated for the plurality of blocks stored in the unit of storage using a hyperloglog algorithm and a first plurality of buckets compartmentalizing the plurality of blocks, wherein the first plurality of buckets is defined by precision bits of the plurality of fingerprints. An effective compression ratio is estimated for the plurality of blocks stored in the unit of storage using the hyperloglog algorithm and a second plurality of buckets compartmentalizing the plurality of blocks, wherein the second plurality of buckets is defined by ranges of compression ratios.
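
The deduplication-ratio part of this abstract can be sketched with a textbook HyperLogLog estimator. This is a simplified illustration, not the claimed method: it assumes SHA-1 truncated to 64 bits as the fingerprint, uses the top precision bits of each fingerprint to select a bucket, and omits the compression-ratio buckets entirely. All names are hypothetical.

```python
import hashlib
import math


def fingerprint(block: bytes) -> int:
    """64-bit fingerprint of a block (assumption: truncated SHA-1)."""
    return int.from_bytes(hashlib.sha1(block).digest()[:8], "big")


def hll_count(fingerprints: list, precision: int = 10) -> float:
    """Estimate the number of distinct fingerprints with HyperLogLog."""
    m = 1 << precision                 # number of buckets (use m >= 128)
    rest_bits = 64 - precision
    registers = [0] * m
    for fp in fingerprints:
        bucket = fp >> rest_bits       # precision bits pick the bucket
        rest = fp & ((1 << rest_bits) - 1)
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = rest_bits - rest.bit_length() + 1
        registers[bucket] = max(registers[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        # Small-range correction (linear counting).
        estimate = m * math.log(m / zeros)
    return estimate


def estimate_dedup_ratio(blocks: list, precision: int = 10) -> float:
    """Total blocks divided by the estimated number of unique blocks."""
    fps = [fingerprint(b) for b in blocks]
    return len(fps) / hll_count(fps, precision)
```

The estimator needs only the fixed-size register array rather than a table of every fingerprint seen, which is what makes it attractive for an external host scanning a large volume.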

INFORMATION PROCESSING APPARATUS, DATA COMPRESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
20180232182 · 2018-08-16 · ·

A specifying unit specifies one or more dividing positions in input data. A common region compressing unit specifies compression positions in the input data corresponding to positions at which sizes from two ends of compressed data are equal to or larger than a predetermined size and corresponding to positions which sandwich adjacently-positioned dividing positions and of which a size therebetween is equal to or larger than the predetermined size and compresses former compression data that are arranged in a row in the input data on either side of each of the dividing positions and that are interposed between the compression positions. An individual region compressing unit compresses latter compression data that are separated by any of the pieces of former compression data in the input data, based on the former compression data positioned adjacent to the piece of latter compression data and the piece of latter compression data.

Optimizing Offline Map Data Updates

In some implementations, a system can optimize offline map data updates. For example, a server device in the system can determine a metric for identifying map data objects based on attributes of the map data objects. The server device can then generate a quadtree that stores the map data objects in nodes of the quadtree based on the metric. When processing an update to the map data stored at the server device, the server device can generate update data describing the updates for each node in the quadtree based on a binary difference algorithm and/or a semantic difference algorithm. The server device can select the algorithm based on which algorithm results in the smallest compressed size of the update data.
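
The final selection step, choosing whichever difference algorithm yields the smallest compressed update, can be sketched as below. Both diff functions are deliberately naive stand-ins invented for illustration; the abstract does not describe the actual binary or semantic difference algorithms, and modelling a node's map data as a flat dictionary is an assumption.

```python
import json
import zlib


def binary_diff(old: dict, new: dict) -> bytes:
    """Naive stand-in: byte positions where the serialized forms differ."""
    a = json.dumps(old, sort_keys=True).encode()
    b = json.dumps(new, sort_keys=True).encode()
    changes = [(i, b[i]) for i in range(len(b)) if i >= len(a) or a[i] != b[i]]
    return json.dumps({"len": len(b), "changes": changes}).encode()


def semantic_diff(old: dict, new: dict) -> bytes:
    """Naive stand-in: keys that were set or removed."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return json.dumps({"set": changed, "del": removed}, sort_keys=True).encode()


def pick_update(old: dict, new: dict):
    """Return (algorithm name, compressed update) with the smallest size."""
    candidates = {
        "binary": binary_diff(old, new),
        "semantic": semantic_diff(old, new),
    }
    compressed = {name: zlib.compress(data) for name, data in candidates.items()}
    best = min(compressed, key=lambda name: len(compressed[name]))
    return best, compressed[best]
```

In a quadtree-based system this selection would presumably run once per node, so different nodes in the same update could end up using different algorithms.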

ENCODING METHOD AND APPARATUS
20180205393 · 2018-07-19 ·

An encoding method and apparatus are described. In the encoding method, when a first target sub-block in a target block is obtained, a hash operation is first performed on the first target sub-block. Then, a first hash table is queried for a corresponding hash value according to the operation result, and a corresponding location in a reference block is found according to the hash value returned by the query; that is, first reference data is found. The first piece of target data in the first target sub-block is matched with the first reference data, and second target data in the target block is matched with second reference data in the reference block. In this way, an approximate location is predetermined, so that the range in which matching needs to be performed is narrowed, data compression time is reduced, and data compression efficiency is improved.
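
The hash-then-match idea can be sketched as follows. It assumes fixed-size sub-blocks of k bytes, CRC32 as the hash operation, and a table that maps each hash value to the first position where it occurs in the reference block. These choices and all names are hypothetical, not taken from the patent.

```python
import zlib
from typing import Optional, Tuple


def build_hash_table(reference: bytes, k: int) -> dict:
    """Map the hash of every k-byte window of the reference to its position."""
    table = {}
    for pos in range(len(reference) - k + 1):
        # Keep the first position seen for each hash value.
        table.setdefault(zlib.crc32(reference[pos:pos + k]), pos)
    return table


def find_match(target: bytes, reference: bytes, table: dict,
               k: int) -> Optional[Tuple[int, int]]:
    """Locate the target's first sub-block in the reference, then extend.

    Returns (position in reference, match length), or None if no match.
    """
    pos = table.get(zlib.crc32(target[:k]))
    # Verify the bytes, since different windows can share a hash value.
    if pos is None or reference[pos:pos + k] != target[:k]:
        return None
    length = k
    # Match the following target data against the following reference data.
    while (length < len(target) and pos + length < len(reference)
           and target[length] == reference[pos + length]):
        length += 1
    return pos, length
```

The hash lookup gives the approximate location in one step, so byte-by-byte comparison only runs from that location onward instead of over the whole reference block.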