H03M7/3091

CONCURRENT SEGMENTATION USING VECTOR PROCESSING
20180365284 · 2018-12-20

A system for segmenting an input data stream, comprising a processor adapted to: split the input data stream into a plurality of data sub-streams such that each of the plurality of data sub-streams has an overlapping portion with a consecutive data sub-stream of the plurality of data sub-streams; create a plurality of segmented data sub-streams by concurrently segmenting the plurality of data sub-streams, each in one of a plurality of processing pipelines of the processor; and join the plurality of segmented data sub-streams to create a segmented data stream by synchronizing a sequencing of each of the plurality of segmented data sub-streams according to one or more overlapping segments in the overlapping portion of each two consecutive data sub-streams of the plurality of data sub-streams.
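The split, concurrent segmentation, and join steps described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the boundary test (a CRC-32 over a fixed window), the constants `WINDOW` and `MASK`, the function names, and the use of a thread pool in place of vector processing pipelines are all assumptions made for illustration. Because the boundary test looks only at the last `WINDOW` bytes, neighbouring sub-streams find the same cut points inside their overlap, and the join switches from one sub-stream to the next at the first cut point they share.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

WINDOW = 8    # bytes examined by the boundary test (assumed value)
MASK = 0x0F   # a cut roughly every 16 bytes on random data (assumed value)


def cut_points(data: bytes, base: int = 0) -> list[int]:
    """Return global end offsets of segments found in one (sub-)stream."""
    cuts = []
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on the WINDOW bytes before `end`.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            cuts.append(base + end)
    return cuts


def split_with_overlap(data: bytes, parts: int, overlap: int):
    """Split into (start, bytes) pieces; each piece runs `overlap` bytes
    into the next one. `overlap` must be at least WINDOW - 1."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [(s, data[s:s + size + overlap]) for s in range(0, len(data), size)]


def join(cut_lists: list[list[int]]) -> list[int]:
    """Merge per-sub-stream cut lists, synchronizing on a shared cut."""
    merged: list[int] = []
    for cuts in cut_lists:
        seen = set(merged)
        sync = next((c for c in cuts if c in seen), None)
        if sync is None:
            # No shared cut in the overlap: append everything past the end.
            last = merged[-1] if merged else 0
            merged += [c for c in cuts if c > last]
        else:
            merged = [c for c in merged if c <= sync] + [c for c in cuts if c > sync]
    return merged


def segment_concurrently(data: bytes, parts: int = 4, overlap: int = 64) -> list[int]:
    pieces = split_with_overlap(data, parts, overlap)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        cut_lists = list(pool.map(lambda p: cut_points(p[1], p[0]), pieces))
    return join(cut_lists)
```

The segments themselves are the byte ranges between consecutive cut offsets. With this purely window-local boundary test, the concurrent result matches a single sequential pass over the whole stream.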

Apparatus and method for inline compression and deduplication

An apparatus for inline compression and deduplication includes a memory unit and a processor coupled to the memory unit. The processor is configured to receive a subset of data from a data stream and select a reference data block corresponding to the subset of data, in which the reference data block is stored in a memory buffer resident in the memory unit. The processor is also configured to compare a first hash value computed for the subset of data to a second hash value computed for the reference data block, in which the first hash value and the second hash value are stored in separate hash tables, and to generate a compressed representation of the subset of data by modifying header data corresponding to the subset of data responsive to a detected match between the first hash value and the second hash value in one of the separate hash tables.

SYSTEMS AND METHODS FOR ASSESSING UPSTREAM OIL AND GAS ELECTRONIC DATA DUPLICATION
20180349054 · 2018-12-06

Provided in some embodiments are systems and methods for assessing electronic data duplication. Embodiments include: extracting first samples of electronic data files and applying a first hash function to the first samples to generate first hash digests; determining first groupings of the electronic data files having a same file size and a same first hash digest; extracting second samples of the electronic data files of the first groupings and applying a second hash function to the second samples to generate second hash digests; determining second groupings of the electronic data files having a same file size, a same first hash digest, and a same second hash digest; applying a third hash function to the contents of the electronic data files of the second groupings to generate third hash digests; and determining duplicate electronic data files having a same file size and same first, second, and third hash digests.
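The three-stage filtering described above could look roughly like the following Python sketch. The abstract does not say where the samples are taken or which hash functions are used, so the choices here are assumptions: the first sample is the first 64 bytes (CRC-32), the second sample is the last 64 bytes (BLAKE2b), and the third hash is SHA-256 over the full contents. File contents are passed in as an in-memory mapping to keep the example self-contained.

```python
import hashlib
import zlib
from collections import defaultdict

SAMPLE = 64  # sample length in bytes (assumed value)


def _regroup(files, names, key):
    """Group `names` by key(contents); keep only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[key(files[name])].append(name)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    # Stage 1: same file size and same hash of the first sample.
    first = _regroup(files, files,
                     lambda b: (len(b), zlib.crc32(b[:SAMPLE])))
    # Stage 2: within each first grouping, same hash of the second sample.
    second = [g for grp in first for g in _regroup(
        files, grp, lambda b: hashlib.blake2b(b[-SAMPLE:]).digest())]
    # Stage 3: within each second grouping, same hash of the full contents.
    third = [g for grp in second for g in _regroup(
        files, grp, lambda b: hashlib.sha256(b).digest())]
    return sorted(sorted(g) for g in third)
```

Each stage only rehashes files that survived the previous one, so the full-content hash in stage 3 is computed for a small fraction of the files when most are distinct.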

Deduplication using sub-chunk fingerprints

A computer-implemented method and system for deduplicating sub-chunks in a data storage system selects a data chunk to deduplicate and generates a sketch for the selected data chunk. A similar data chunk is searched for using the sketch. A set of fingerprints corresponding to sub-chunks of the similar data chunk is loaded. The set of fingerprints for the similar data chunk is compared to a set of fingerprints of the selected data chunk and the selected chunk is encoded as a set of references to identical sub-chunks of the similar data chunk and at least one unmatched sub-chunk.
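As a rough illustration of the flow above, the following Python sketch encodes a chunk against a similar stored chunk. Everything concrete in it is an assumption rather than the patented design: the fixed 4-byte sub-chunk size, the sketch (the two smallest SHA-256 sub-chunk fingerprints), the `SubChunkStore` class, and the tuple encoding. Fingerprints of the similar chunk are recomputed here, whereas a real system would load stored fingerprints.

```python
import hashlib

SUB = 4  # sub-chunk size in bytes (tiny, for demonstration only)
K = 2    # number of fingerprints kept in a sketch (assumed value)


def sub_chunks(chunk: bytes) -> list[bytes]:
    return [chunk[i:i + SUB] for i in range(0, len(chunk), SUB)]


def fingerprint(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def sketch(chunk: bytes) -> list[bytes]:
    """The K smallest sub-chunk fingerprints; similar chunks tend to share them."""
    return sorted({fingerprint(s) for s in sub_chunks(chunk)})[:K]


class SubChunkStore:
    def __init__(self):
        self.chunks = {}        # chunk id -> raw bytes
        self.sketch_index = {}  # sketch feature -> chunk id

    def add(self, chunk_id, chunk: bytes) -> None:
        self.chunks[chunk_id] = chunk
        for feature in sketch(chunk):
            self.sketch_index.setdefault(feature, chunk_id)

    def find_similar(self, chunk: bytes):
        for feature in sketch(chunk):
            if feature in self.sketch_index:
                return self.sketch_index[feature]
        return None

    def encode(self, chunk: bytes) -> list[tuple]:
        similar = self.find_similar(chunk)
        if similar is None:
            return [("raw", chunk)]
        # Fingerprints of the similar chunk's sub-chunks, mapped to their index.
        known = {fingerprint(s): i
                 for i, s in enumerate(sub_chunks(self.chunks[similar]))}
        encoded = []
        for s in sub_chunks(chunk):
            idx = known.get(fingerprint(s))
            encoded.append(("raw", s) if idx is None else ("ref", similar, idx))
        return encoded

    def decode(self, encoded: list[tuple]) -> bytes:
        out = bytearray()
        for item in encoded:
            if item[0] == "raw":
                out += item[1]
            else:
                _, chunk_id, idx = item
                out += sub_chunks(self.chunks[chunk_id])[idx]
        return bytes(out)
```

For example, after `store.add("base", b"AAAABBBBCCCCDDDD")`, encoding `b"AAAABBBBXXXXDDDD"` yields three references into `"base"` and one raw sub-chunk `b"XXXX"`.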

DATA COMPRESSION WITH REDUNDANCY REMOVAL ACROSS BOUNDARIES OF COMPRESSION SEARCH ENGINES

Data compression techniques are provided that remove redundancy across the boundary of compression search engines. An illustrative method comprises splitting the data frame into a plurality of sub-chunks; comparing at least two of the plurality of sub-chunks to one another to remove at least one sub-chunk from the plurality of sub-chunks that substantially matches at least one other sub-chunk to generate a remaining plurality of sub-chunks; generating matching sub-chunk information for data reconstruction identifying the at least one removed sub-chunk and the corresponding substantially matched at least one other sub-chunk; grouping the remaining plurality of sub-chunks into sub-units; removing substantially repeated patterns within the sub-units to generate corresponding compressed sub-units; and combining the compressed sub-units with the matching sub-chunk information to generate a compressed data frame. The data frame optionally comprises one or more host pages compressed substantially simultaneously, and the compressed data frame for a plurality of host pages compressed substantially simultaneously comprises a host page address for each host page.

DEDUPLICATION AND COMPRESSION OF DATA SEGMENTS IN A DATA STORAGE SYSTEM
20180329631 · 2018-11-15

Techniques for performing data deduplication and compression in data storage systems. Data deduplication is performed in a deduplication domain on a segment-by-segment basis to obtain a plurality of deduplicated data segments. Deduplicated data segments are grouped together to form a plurality of compression groups. Data compression is performed on each compression group, and the compressed group is stored on spinning media. By performing data deduplication on a segment-by-segment basis, the size of each segment can be reduced to increase the effectiveness of data deduplication. By performing data compression on compression groups, the size of each compression domain can be increased to increase the effectiveness of data compression. By storing deduplicated data segments as a compressed group on the spinning media, a sequential nature of the segments can be preserved to reduce a seek time/rotational latency of the spinning media and a number of IOPS handled by the data storage system.
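The combination of small deduplication segments and larger compression groups can be sketched as follows. This is an illustrative sketch only: the segment size, group size, SHA-256 digests, zlib compression, and function names are assumptions, and storage on spinning media is not modelled.

```python
import hashlib
import zlib


def dedup_and_compress(data: bytes, segment_size: int = 8, group_size: int = 4):
    """Deduplicate per segment, then compress unique segments in groups."""
    index = {}    # segment digest -> position in `unique`
    unique = []   # deduplicated segments, in first-seen order
    refs = []     # one entry per input segment, pointing into `unique`
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        digest = hashlib.sha256(segment).digest()
        if digest not in index:
            index[digest] = len(unique)
            unique.append(segment)
        refs.append(index[digest])
    # Compressing several segments together gives the compressor more context.
    groups = [zlib.compress(b"".join(unique[i:i + group_size]))
              for i in range(0, len(unique), group_size)]
    return refs, groups


def restore(refs, groups, segment_size: int = 8) -> bytes:
    blob = b"".join(zlib.decompress(g) for g in groups)
    # Only the final unique segment can be shorter than segment_size,
    # so fixed-size slicing recovers the unique segments.
    unique = [blob[i:i + segment_size] for i in range(0, len(blob), segment_size)]
    return b"".join(unique[r] for r in refs)
```

Keeping unique segments in first-seen order inside each group mirrors the abstract's point that the sequential nature of the segments is preserved within a compressed group.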

ENCODING AND DECODING OF DIGITAL AUDIO SIGNALS USING DIFFERENCE DATA
20180308494 · 2018-10-25

An audio encoder can parse a digital audio signal into a plurality of frames, each frame including a specified number of audio samples, perform a transform of the audio samples of each frame to produce a plurality of frequency-domain coefficients for each frame, partition the plurality of frequency-domain coefficients for each frame into a plurality of bands for each frame, each band having bit data that represents a number of bits allocated for the band, and encode the digital audio signal and difference data to a bit stream (e.g., an encoded digital audio signal). The difference data can produce the full bit data when combined with estimate data that can be computed from data present in the bit stream. The difference data can be compressed to a smaller size than the full bit data, which can reduce the space required in the bit stream.

Hardware efficient Rabin fingerprints

An approach for fingerprinting large data objects at wire speed is disclosed. The techniques include Fresh/Shift pipelining, split Fresh, optimization, online channel sampling, and pipelined selection. The architecture can also be replicated to work in parallel for higher system throughput. Fingerprinting may provide an efficient mechanism for identifying duplication in a data stream, and deduplication based on the identified fingerprints may provide reduced storage costs, reduced network bandwidth consumption, reduced processing time and other benefits. In some embodiments, fingerprinting may be used to ensure or verify data integrity and may facilitate detection of corruption or tampering. An efficient manner of generating fingerprints (either via hardware, software, or a combination) may reduce a computation load and/or time required to generate fingerprints.
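For readers unfamiliar with the underlying operation, a Rabin fingerprint treats the data as a polynomial over GF(2) and reduces it modulo a fixed irreducible polynomial. The Python sketch below shows only that bit-serial definition, not the pipelined hardware techniques named in the abstract. The modulus x^64 + x^4 + x^3 + x + 1 and the function name are assumptions chosen for illustration.

```python
DEGREE = 64
# x^64 + x^4 + x^3 + x + 1, used here as the modulus (an assumed choice).
POLY = (1 << DEGREE) | 0x1B


def rabin_fingerprint(data: bytes) -> int:
    """Return data(x) mod POLY(x) over GF(2), processing one bit at a time."""
    fp = 0
    for byte in data:
        for bit in range(7, -1, -1):
            # Shift in the next message bit (multiply by x, add the bit).
            fp = (fp << 1) | ((byte >> bit) & 1)
            # If the degree reached 64, subtract (XOR) the modulus.
            if fp >> DEGREE:
                fp ^= POLY
    return fp
```

Because reduction over GF(2) is linear, the fingerprint of the XOR of two equal-length messages equals the XOR of their fingerprints. Hardware designs commonly exploit this kind of linearity to process many bits per clock cycle.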

SYSTEM AND METHOD FOR AN IMPROVED REAL-TIME ADAPTIVE DATA COMPRESSION
20180300087 · 2018-10-18

The present invention mainly addresses technical problems existing in the prior art. The present invention relates to compression, in particular to an improved real-time adaptive data compression for efficient data storage. An aspect of the present disclosure relates to a method for managing data storage in a data storage system. The method includes the steps of: determining, by a processor of said data storage system, receipt of one or more blocks of data for storage; identifying, by the processor, a compression technique for storage of said one or more blocks of data; and compressing, by the processor, said one or more blocks of data either in-line, if said compression technique is an in-line compression technique for writing the data in a memory, or in post-processing, based at least on a resource utilization of said data storage system.

IN-PLACE DATA COMPRESSION WITH SMALL WORKING MEMORY

A method and apparatus for performing in-place compression are provided. The in-place compression system transfers source data from a partition of a memory to a data buffer based on a read address. Compressed data is created by referencing the source data stored in the data buffer. The system writes the compressed data to the memory partition based on a write address. When the write address points at an address location that stores source data that has not been transferred to the data buffer, the system overwrites the compressed data stored in the memory partition with the source data stored in the data buffer.