H03M7/3091

GENERAL PURPOSE DATA COMPRESSION USING SIMD ENGINE
20190146801 · 2019-05-16

A system for compressing an input data stream to create a compressed output data stream comprises a memory storing a hash table of hash entries, each comprising a hash value of an associated subset of consecutive data items of the input data stream and a pointer to the memory location of that subset. A processor coupled to the memory executes the following operations, instructing a SIMD engine to execute one or more of them concurrently for consecutive subsets: calculating the hash value of each subset, searching the hash table for a match of each calculated hash value, and updating the hash table according to the match result. The processor then updates the compressed output data stream according to the match result and to a comparison result that depends on the match result, repeating these operations for the plurality of associated subsets to create the compressed output data stream.
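
The hash/search/update loop above is the core of an LZ77-style compressor. The sketch below is scalar Python, so it only shows the per-subset logic; the batching of hash computation across consecutive positions is the part the abstract assigns to the SIMD engine and is not modeled. The 4-byte subset length, the multiplicative hash constant, and the ('lit' / 'match') token format are illustrative assumptions, not the patent's encoding.

```python
MIN_MATCH = 4   # assumed subset length: bytes hashed at each position
HASH_BITS = 12  # assumed hash table size: 2**12 slots

def hash_subset(data: bytes, i: int) -> int:
    """Multiplicative hash of the MIN_MATCH-byte subset starting at position i."""
    v = int.from_bytes(data[i:i + MIN_MATCH], "little")
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)

def compress(data: bytes) -> list:
    """Return tokens ('lit', byte) or ('match', distance, length)."""
    table = {}  # hash value -> most recent position of a subset with that hash
    out, i, n = [], 0, len(data)
    while i < n:
        if i + MIN_MATCH > n:  # too few bytes left to form a subset
            out.append(("lit", data[i]))
            i += 1
            continue
        h = hash_subset(data, i)       # 1. calculate the hash value
        cand = table.get(h)            # 2. search the hash table
        table[h] = i                   # 3. update the hash table
        # A hash hit is only a candidate: confirm it with a byte comparison
        if cand is not None and data[cand:cand + MIN_MATCH] == data[i:i + MIN_MATCH]:
            length = MIN_MATCH
            while i + length < n and data[cand + length] == data[i + length]:
                length += 1
            out.append(("match", i - cand, length))
            i += length
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def decompress(tokens: list) -> bytes:
    buf = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, dist, length = tok
            start = len(buf) - dist
            for k in range(length):  # byte-by-byte copy handles overlapping matches
                buf.append(buf[start + k])
    return bytes(buf)
```

For example, `compress(b"abcdabcdabcdabcd")` emits four literals followed by a single match of distance 4 and length 12.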

Opportunistic content delivery using delta coding
10270842 · 2019-04-23

Systems and methods are described for avoiding redundant data transfers using delta coding techniques when reliably and opportunistically communicating data to multiple user systems. According to embodiments, user systems track received block sequences for locally stored content blocks. An intermediate server intercepts content requests between user systems and target hosts, and deterministically chunks and fingerprints content data received in response to those requests. A fingerprint of a received content block is communicated to the requesting user system, and the user system determines based on the fingerprint whether the corresponding content block matches a content block that is already locally stored. If so, the user system returns a set of fingerprints representing a sequence of next content blocks that were previously stored after the matching content block. The intermediate server can then send only those content data blocks that are not already locally stored at the user system according to the returned set of fingerprints.
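
The fingerprint exchange described above can be simulated in a few lines. This is a minimal sketch assuming fixed-size chunking (the abstract only requires that chunking be deterministic), SHA-256 fingerprints, and a single in-process user system; the class and function names are hypothetical. For brevity the server-side code reads the user's store directly, standing in for the round trip in which the user checks the fingerprint and replies.

```python
import hashlib

CHUNK_SIZE = 8  # assumed fixed chunk size for illustration

def chunk(data: bytes) -> list:
    """Deterministically split content into fixed-size blocks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class UserSystem:
    """Client side: keeps received blocks and the order in which they arrived."""

    def __init__(self) -> None:
        self.store = {}     # fingerprint -> block
        self.sequence = []  # fingerprints in arrival order

    def receive(self, block: bytes) -> None:
        f = fingerprint(block)
        self.store[f] = block
        self.sequence.append(f)

    def next_fingerprints(self, f: str, count: int) -> list:
        """Fingerprints of up to `count` blocks stored right after block f."""
        if f not in self.store:
            return []
        pos = self.sequence.index(f)
        return self.sequence[pos + 1:pos + 1 + count]

def deliver(content: bytes, user: UserSystem) -> tuple:
    """Server side: send only blocks the user lacks.
    Returns (content as reassembled by the user, number of blocks sent)."""
    blocks = chunk(content)
    received, sent, i = [], 0, 0
    while i < len(blocks):
        f = fingerprint(blocks[i])
        if f in user.store:
            # Match: the user replies with the fingerprints it stored next
            known = set(user.next_fingerprints(f, len(blocks) - i - 1))
            received.append(user.store[f])
            i += 1
            while i < len(blocks) and fingerprint(blocks[i]) in known:
                received.append(user.store[fingerprint(blocks[i])])
                i += 1
        else:
            user.receive(blocks[i])  # only this block's data crosses the link
            received.append(blocks[i])
            sent += 1
            i += 1
    return b"".join(received), sent
```

Delivering the same 32-byte page twice sends four blocks the first time and none the second; changing one 8-byte block in the middle sends only that block.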

Memory deduplication based on guest page hints
10261820 · 2019-04-16

Methods, systems, and computer program products are included for de-duplicating one or more memory pages. A method includes receiving, by a hypervisor, a list of read-only memory page hints from a guest running on a virtual machine. The list of read-only memory page hints specifies a first memory page marked as writeable. The method also includes determining whether the first memory page matches a second memory page. In response to a determination that the first memory page matches the second memory page, the hypervisor may deduplicate the first and second memory pages.
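
The hint-driven matching step can be sketched as follows, assuming guest memory is modeled as a dictionary of page contents and that only pages named in the hint list are deduplication candidates. A digest index finds candidate matches and a full byte comparison confirms them; write-protecting the shared frame, which a real hypervisor must do because the hinted page is still writeable, is not modeled. All names are hypothetical.

```python
import hashlib

def deduplicate(pages: dict, hints: list) -> dict:
    """pages: page id -> page contents (bytes).
    hints: page ids the guest reported in its read-only hint list.
    Returns page id -> id of the page whose frame now backs it."""
    backing = {pid: pid for pid in pages}  # initially every page has its own frame
    index = {}  # content digest -> canonical page id
    for pid in hints:
        content = pages[pid]
        digest = hashlib.sha256(content).digest()
        other = index.get(digest)
        # Full comparison guards against digest collisions
        if other is not None and pages[other] == content:
            backing[pid] = other  # share one frame (copy-on-write not modeled)
        else:
            index[digest] = pid
    return backing
```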

Methods for estimating cost savings using deduplication and compression in a storage system

Methods for estimating cost savings in a storage system using an external host system. One method includes accessing, over a communication network, data from a unit of storage of a data storage system, wherein each of the blocks of data is uncompressed. A plurality of blocks is parsed from the data. A plurality of fingerprints is generated from the blocks using a hash algorithm. A deduplication ratio is estimated for the plurality of blocks stored in the unit of storage using a HyperLogLog algorithm and a first plurality of buckets compartmentalizing the plurality of blocks, wherein the first plurality of buckets is defined by precision bits of the plurality of fingerprints. An effective compression ratio is estimated for the plurality of blocks stored in the unit of storage using the HyperLogLog algorithm and a second plurality of buckets compartmentalizing the plurality of blocks, wherein the second plurality of buckets is defined by ranges of compression ratios.
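
The deduplication-ratio half of this method can be illustrated with a textbook HyperLogLog: the top precision bits of each fingerprint choose a bucket, and each bucket keeps the largest leading-zero rank seen in the remaining bits. This is a sketch under stated assumptions: 64-bit fingerprints taken from SHA-256, 10 precision bits, and the standard linear-counting correction for small cardinalities. The second set of buckets, keyed by compression-ratio ranges, is not shown.

```python
import hashlib
import math

P = 10      # assumed precision bits -> 2**P buckets
M = 1 << P

def fingerprint(block: bytes) -> int:
    """64-bit fingerprint from SHA-256 (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(block).digest()[:8], "big")

def hll_distinct(fingerprints) -> float:
    """Estimate the number of distinct fingerprints."""
    registers = [0] * M
    for f in fingerprints:
        bucket = f >> (64 - P)                    # precision bits select the bucket
        rest = f & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        registers[bucket] = max(registers[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / M)
    estimate = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * M and zeros:
        estimate = M * math.log(M / zeros)        # linear counting for small ranges
    return estimate

def dedup_ratio(blocks) -> float:
    """Total blocks divided by the estimated number of unique blocks."""
    fps = [fingerprint(b) for b in blocks]
    if not fps:
        return 1.0
    return len(fps) / hll_distinct(fps)
```

With 4,000 blocks drawn from 1,000 distinct contents, the estimate lands close to the true ratio of 4 while holding only 1,024 small registers rather than every fingerprint.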

Systems and methods for assessing upstream oil and gas electronic data duplication
10235080 · 2019-03-19

Provided in some embodiments are systems and methods for assessing electronic data duplication. Embodiments include extracting first samples of electronic data files, applying a first hash function to the first samples to generate first hash digests, and determining first groupings of the electronic data files having a same file size and first hash digest. Second samples of the electronic data files of the first groupings are extracted, and a second hash function is applied to the second samples to generate second hash digests. Second groupings of the electronic data files having a same file size, a same first hash digest, and a same second hash digest are then determined. A third hash function is applied to the contents of the electronic data files of the second groupings to generate third hash digests, and duplicate electronic data files are determined as those having a same file size and same first, second, and third hash digests.
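
The three-stage filter can be sketched as below. Cheap sample hashes prune the candidate set before any full-content hash is computed. The specific choices are assumptions: the first sample is the file head, the second is the file tail, 4,096-byte samples, BLAKE2b / SHA-1 / SHA-256 as the three hash functions, and files passed in as an in-memory name-to-bytes dictionary.

```python
import hashlib
from collections import defaultdict

SAMPLE = 4096  # assumed sample size in bytes

def _multi(groups: dict) -> list:
    """Keep only groupings with more than one member."""
    return [g for g in groups.values() if len(g) > 1]

def find_duplicates(files: dict) -> list:
    """files: name -> contents. Returns sorted lists of names with identical content."""
    stage1 = defaultdict(list)
    for name, data in files.items():
        head = hashlib.blake2b(data[:SAMPLE], digest_size=8).hexdigest()
        stage1[(len(data), head)].append(name)  # same size + first digest
    result = []
    for group in _multi(stage1):
        stage2 = defaultdict(list)
        for name in group:
            tail = hashlib.sha1(files[name][-SAMPLE:]).hexdigest()
            stage2[tail].append(name)           # + same second digest
        for g2 in _multi(stage2):
            stage3 = defaultdict(list)
            for name in g2:
                full = hashlib.sha256(files[name]).hexdigest()
                stage3[full].append(name)       # + same full-content digest
            result.extend(sorted(g) for g in _multi(stage3))
    return result
```

Only files surviving both sample stages are read in full, which is the point of ordering the hashes from cheapest to most expensive.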

TAPE DRIVE MEMORY DEDUPLICATION
20190079947 · 2019-03-14

A method and system for improving tape drive memory storage is provided. The method includes receiving, by a storage tape drive, a data stream for storage. The data stream is passed through a non-volatile memory device (NVS2) of the storage tape drive. The data stream is divided into adjacent variable-length data chunks, and a chunk list file including similarity identifiers for each of the adjacent variable-length data chunks is generated and stored within a non-volatile memory device (NVS1). Duplicate data, that is, data duplicating a group of data chunks of the adjacent variable-length data chunks, is identified and deleted from NVS2 of the storage tape drive such that the group of data chunks remains within NVS2. The group of data chunks is written to a data storage tape cartridge. Pointers identifying each data chunk and an associated storage position are generated and stored.
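
Variable-length chunking plus duplicate elimination can be sketched with content-defined boundaries: a chunk ends where a rolling hash of the last few bytes hits a bit pattern, so identical content yields identical chunks regardless of where it occurs in the stream. Several simplifications are assumed. A shift-add rolling hash stands in for a production chunker, an exact SHA-256 digest stands in for the patent's similarity identifier, and NVS1, NVS2, and the tape are modeled as in-memory containers. Function names are hypothetical.

```python
import hashlib

MASK = 0x3F                     # boundary when low 6 bits of the hash are zero (assumed)
MIN_CHUNK, MAX_CHUNK = 16, 256  # assumed chunk size limits

def chunks(data: bytes) -> list:
    """Split data into adjacent variable-length chunks at content-defined boundaries."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # low 6 bits depend only on the last 6 bytes
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

def dedupe_stream(data: bytes):
    """Return (unique chunks keyed by identifier, pointer list in stream order)."""
    store = {}     # identifier -> chunk kept for writing to tape
    pointers = []  # (identifier, position) for every chunk in the stream
    for pos, c in enumerate(chunks(data)):
        ident = hashlib.sha256(c).hexdigest()
        store.setdefault(ident, c)  # a duplicate chunk is not stored again
        pointers.append((ident, pos))
    return store, pointers

def restore(store: dict, pointers: list) -> bytes:
    """Rebuild the stream by following the pointers in position order."""
    return b"".join(store[ident] for ident, _ in sorted(pointers, key=lambda p: p[1]))
```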

METADATA SEPARATED CONTAINER FORMAT
20190026299 · 2019-01-24

A data management device includes a persistent storage and a processor. The persistent storage includes an object storage. The processor segments a file into file segments. The processor generates meta-data of the file segments. The processor stores a portion of the file segments in a data object of the object storage. The processor stores a portion of the meta-data of the file segments in a meta-data object of the object storage.
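
Separating segment payloads from their metadata means a reader can inspect or locate segments without fetching the data object. Here is a minimal sketch assuming a dictionary stands in for the object storage, a tiny fixed segment size, and a JSON list of offset/length/SHA-256 entries as the metadata format; the `.data` / `.meta` key naming and all function names are hypothetical.

```python
import hashlib
import json

SEGMENT_SIZE = 4  # assumed tiny segment size for illustration

def store_file(name: str, data: bytes, object_store: dict) -> None:
    """Write segment payloads and segment metadata to two separate objects."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    meta, offset = [], 0
    for seg in segments:
        meta.append({"offset": offset, "length": len(seg),
                     "sha256": hashlib.sha256(seg).hexdigest()})
        offset += len(seg)
    object_store[f"{name}.data"] = b"".join(segments)         # data object
    object_store[f"{name}.meta"] = json.dumps(meta).encode()  # meta-data object

def read_segment(name: str, index: int, object_store: dict) -> bytes:
    """Locate one segment through the metadata object and verify its digest."""
    entry = json.loads(object_store[f"{name}.meta"])[index]
    blob = object_store[f"{name}.data"]
    seg = blob[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(seg).hexdigest() != entry["sha256"]:
        raise ValueError("segment failed integrity check")
    return seg
```

For `store_file("f", b"hello world", store)`, the data object holds the three segments `b"hell"`, `b"o wo"`, and `b"rld"`, while the metadata object holds their three descriptors.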

COMPRESSION OF SEMI-STRUCTURED DATA
20190007059 · 2019-01-03

A method for compressing semi-structured data is discussed. The method includes accessing semi-structured data comprising a plurality of elements. The method includes determining a plurality of unique elements of the plurality of elements, each of the plurality of unique elements associated with a respective unique index of a plurality of unique indexes. Each unique index can indicate a position in one of a plurality of data stores. The method includes generating, based on the plurality of unique indexes, a sequence of encoded representations corresponding to the plurality of elements.
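
Dictionary encoding of this kind replaces each repeated key or value with a small index. The sketch assumes flat JSON-like records, treats keys and values as the elements, and uses two data stores (store 0 for keys, store 1 for JSON-serialized values), so each unique index is a (store, position) pair. This layout is an illustrative assumption, not the patent's format.

```python
import json

def encode(records: list):
    """Return (stores, encoded): stores[s] lists the unique elements of store s;
    each record becomes a list of ((0, key_pos), (1, value_pos)) index pairs."""
    stores = ([], [])   # store 0: keys, store 1: JSON-serialized values
    lookup = ({}, {})   # element -> position, one map per store

    def index_of(store_id: int, element: str) -> tuple:
        table = lookup[store_id]
        if element not in table:  # first occurrence: assign the next position
            table[element] = len(stores[store_id])
            stores[store_id].append(element)
        return (store_id, table[element])

    encoded = [[(index_of(0, k), index_of(1, json.dumps(v))) for k, v in rec.items()]
               for rec in records]
    return stores, encoded

def decode(stores, encoded) -> list:
    """Rebuild the records by looking each index up in its data store."""
    return [{stores[0][k[1]]: json.loads(stores[1][v[1]]) for k, v in rec}
            for rec in encoded]
```

Three records with keys `host` and `status` store each key once, and a repeated value such as `200` is stored once and referenced by index wherever it recurs.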

Method of smart saving high-density data and memory device

A signal interface has a compression unit and a data memory. The compression unit is configured to receive an input datum from signal data generated by at least one sensor and is further configured to identify the presence or absence of at least one repetition condition in the input datum. If the presence of the at least one repetition condition in the input datum is identified, the compression unit encodes the input datum in a compressed way to generate a compressed datum and saves the compressed datum in the data memory. If the presence of the at least one repetition condition in the input datum is not identified, the compression unit saves the uncompressed input datum in the data memory.
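
The save-compressed-or-raw decision can be sketched as follows. The abstract does not define the repetition condition, so this example assumes it means a run of at least three equal bytes, uses run-length encoding as the compressed form, and prefixes each record with a flag byte (1 = compressed, 0 = raw). A list stands in for the data memory; all names are hypothetical.

```python
MIN_RUN = 3  # assumed: a run of >= 3 equal bytes is a "repetition condition"

def has_repetition(datum: bytes) -> bool:
    run = 1
    for a, b in zip(datum, datum[1:]):
        run = run + 1 if a == b else 1
        if run >= MIN_RUN:
            return True
    return False

def rle(datum: bytes) -> bytes:
    """Run-length encode as (count, value) byte pairs, count capped at 255."""
    out, i = bytearray(), 0
    while i < len(datum):
        j = i
        while j < len(datum) and datum[j] == datum[i] and j - i < 255:
            j += 1
        out += bytes([j - i, datum[i]])
        i = j
    return bytes(out)

def unrle(payload: bytes) -> bytes:
    return b"".join(bytes([payload[k + 1]]) * payload[k] for k in range(0, len(payload), 2))

def save(datum: bytes, memory: list) -> None:
    """Store compressed only when a repetition condition is present."""
    if has_repetition(datum):
        memory.append(b"\x01" + rle(datum))
    else:
        memory.append(b"\x00" + datum)

def load(record: bytes) -> bytes:
    return unrle(record[1:]) if record[0] == 1 else record[1:]
```

Gating on the repetition check keeps data without runs from being RLE-encoded; such data would roughly double in size, since every byte would become a (count, value) pair.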

BOUNDS CHECKING
20180364980 · 2018-12-20

A data processing apparatus is provided for determining whether a value falls within a boundary defined by a lower limit between 0 and 2^m and an upper limit between 0 and 2^m. The apparatus includes storage circuitry that stores each of the lower limit and the upper limit in a compressed form as a mantissa of q < m bits and a shared exponent e. The m-q-e most significant bits of the lower limit and of the upper limit are equal to the m-q-e most significant bits of the value. Adjustment circuitry performs adjustments to the lower limit and the upper limit in compressed form, and boundary comparison circuitry performs the determination on the value using the lower limit and the upper limit in the compressed form.
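
Why the comparison can stay in compressed form: write the value as top · 2^(q+e) + field · 2^e + low, with low < 2^e. Each limit shares the same top bits and has zero low bits, so lower ≤ value reduces to lo_mantissa ≤ field, and value < upper reduces to field < hi_mantissa. The sketch below checks that equivalence. It assumes a half-open range [lower, upper) and that no adjustment of the top bits is needed; the adjustment circuitry the abstract mentions is not modeled. The function names are hypothetical.

```python
def decompress_limit(mantissa: int, value: int, e: int, q: int) -> int:
    """Expand a limit: top m-q-e bits from value, then the q-bit mantissa, then e zero bits."""
    top = value >> (q + e)
    return (top << (q + e)) | (mantissa << e)

def in_bounds(value: int, lo_mant: int, hi_mant: int, e: int, m: int, q: int) -> bool:
    """Reference check: expand both limits to m bits, then compare."""
    if not (0 <= value < (1 << m) and q < m and q + e <= m):
        raise ValueError("parameters out of range")
    lower = decompress_limit(lo_mant, value, e, q)
    upper = decompress_limit(hi_mant, value, e, q)
    return lower <= value < upper

def in_bounds_compressed(value: int, lo_mant: int, hi_mant: int, e: int, q: int) -> bool:
    """Same decision on the compressed form: compare only the q-bit field of value."""
    field = (value >> e) & ((1 << q) - 1)
    return lo_mant <= field < hi_mant
```

For m = 16, q = 4, e = 4 and value 0xAB57, the field is 0x5, so mantissas 3 and 8 (limits 0xAB30 and 0xAB80) accept the value, while a lower mantissa of 6 rejects it.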