Patent classifications
H03M7/3095
Command processor with multiple string copy engines for a decompression system
An electronic device for decompressing compressed data to recreate original data includes a first string copy engine and a second string copy engine. The first string copy engine processes a first string copy command by acquiring a first string from recreated original data and appending the first string to the recreated original data. The second string copy engine processes a second string copy command by checking the second string copy command for a dependency on the first string and, when the dependency is found, stalling further processing of the second string copy command until the first string copy engine has appended a corresponding portion of the first string to the recreated original data. The second string copy engine processes the second string copy command by acquiring a second string from the recreated original data and appending the second string to the recreated original data.
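As a software illustration only, the string copy and dependency check described above can be sketched as follows. The function names, the (distance, length) command form, and the half-open range representation are assumptions made for this sketch; the abstract describes hardware engines and does not specify any of them.

```python
def copy_string(recreated: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes that start `distance` bytes back from the end."""
    if not 0 < distance <= len(recreated):
        raise ValueError("distance points outside the recreated data")
    start = len(recreated) - distance
    # Byte-wise copy so a string may overlap the bytes being appended.
    for i in range(length):
        recreated.append(recreated[start + i])


def depends_on(first_dst: int, first_len: int,
               second_src: int, second_len: int) -> bool:
    """True when the second command reads bytes the first command writes.

    Both ranges are half-open: [start, start + length).
    """
    return (second_src < first_dst + first_len
            and first_dst < second_src + second_len)


recreated = bytearray(b"abc")
copy_string(recreated, distance=3, length=6)   # overlapping copy
```

In this model a second engine would stall only while `depends_on` is true and the overlapping bytes have not yet been appended; commands with no overlap could proceed in parallel.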
STATIC DICTIONARY-BASED COMPRESSION HARDWARE PIPELINE FOR DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT
A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data processing functions. This disclosure describes a programmable hardware-based data compression accelerator that includes a pipeline for performing static dictionary-based and dynamic history-based compression on streams of information, such as network packets. The search block may support single and multi-thread processing, and multiple levels of compression effort. To achieve high compression, the search block may operate at a high level of effort that supports a single thread and use of both a dynamic history of the input data stream and a static dictionary of common words. The static dictionary may be useful in achieving high compression where the input data stream is relatively small.
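A rough software analogue of combining a static dictionary with dynamic history is a DEFLATE preset dictionary, shown below with Python's standard zlib module. This is not the hardware pipeline described in the abstract; the helper names and the sample dictionary contents are assumptions chosen for illustration.

```python
import zlib


def compress_with_dictionary(data: bytes, dictionary: bytes) -> bytes:
    """Compress `data`, letting matches refer back into a preset dictionary."""
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress data produced with the same preset dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical static dictionary of content expected in small inputs.
STATIC_DICTIONARY = b"GET /index.html HTTP/1.1\r\nHost: "
message = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"

with_dictionary = compress_with_dictionary(message, STATIC_DICTIONARY)
without_dictionary = zlib.compress(message)
```

The benefit shows up mainly on short inputs like `message`, where there is little dynamic history to match against.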
SYSTEM AND METHOD FOR GLOBAL DATA COMPRESSION
A system and method for global data compression. The method includes splitting a dataset into a plurality of blocks; for each block of the plurality of blocks: computing at least one similarity hash for the block; determining, based on the at least one similarity hash, whether a similar block is found for the block, wherein a similar block for a block has a similarity hash that is similar to one of the computed at least one similarity hash for the block; compressing the block by replacing data of the block with a reference to the similar block and a delta when a similar block is found, wherein the delta is a difference in data between the block and the similar block; and compressing the block independently when a similar block is not found.
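The block-level flow above can be sketched as follows. The abstract does not say how the similarity hash or the delta is computed, so this sketch assumes a min-hash over 4-byte shingles, a fixed block size, and a delta stored as (offset, byte) pairs; all names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64  # assumed block size for this sketch


def similarity_hash(block: bytes, k: int = 4) -> int:
    """Min-hash over k-byte shingles; similar blocks often share the minimum."""
    if len(block) < k:
        return zlib.crc32(block)
    return min(zlib.crc32(block[i:i + k]) for i in range(len(block) - k + 1))


def compress(data: bytes) -> list:
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    index = {}    # similarity hash -> number of an independently stored block
    records = []
    for number, block in enumerate(blocks):
        h = similarity_hash(block)
        ref = index.get(h)
        if ref is not None and len(blocks[ref]) == len(block):
            base = blocks[ref]
            # Delta: positions where the block differs from the similar block.
            delta = [(i, b) for i, (a, b) in enumerate(zip(base, block)) if a != b]
            records.append(("ref", ref, delta))
        else:
            records.append(("raw", zlib.compress(block)))
            index[h] = number
    return records


def decompress(records: list) -> bytes:
    blocks = []
    for record in records:
        if record[0] == "raw":
            blocks.append(zlib.decompress(record[1]))
        else:
            _, ref, delta = record
            block = bytearray(blocks[ref])
            for offset, value in delta:
                block[offset] = value
            blocks.append(bytes(block))
    return b"".join(blocks)
```

A production system would pick a similarity hash that tolerates insertions and deletions and would only use the reference when the delta is smaller than independent compression; both are omitted here for brevity.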
Techniques for optimizing entropy computations
Techniques for data processing may include: determining a data layout for a configuration of counters stored in registers, wherein each of the registers is configured to store at least two counters, and each counter is associated with a particular data item allowable in the data set and denotes a current frequency of the particular data item; receiving data items of a data chunk of the data set; for each data item received, performing processing including: determining a first of the counters corresponding to the data item, wherein the first counter is stored in a first of the registers and denotes a current frequency of the data item; and incrementing the first counter stored in the first register by one; and determining, in accordance with the counters stored in the registers, an entropy value for the data chunk.
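A minimal sketch of the counter layout and entropy step, assuming byte-valued data items and four 16-bit counters packed into each 64-bit register. The widths and names are assumptions for this sketch, and Python integers stand in for hardware registers.

```python
import math

COUNTER_BITS = 16        # assumed counter width
PER_REGISTER = 4         # assumed: four counters per 64-bit register
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def count_items(chunk: bytes) -> list:
    """Count each byte value using counters packed several per register."""
    registers = [0] * (256 // PER_REGISTER)
    for item in chunk:
        reg, slot = divmod(item, PER_REGISTER)
        # Adding 1 at the slot's bit offset increments only that counter,
        # provided no counter exceeds 2**COUNTER_BITS - 1.
        registers[reg] += 1 << (slot * COUNTER_BITS)
    return registers


def entropy(registers: list, total: int) -> float:
    """Shannon entropy in bits per item, computed from the packed counters."""
    h = 0.0
    for reg in registers:
        for slot in range(PER_REGISTER):
            count = (reg >> (slot * COUNTER_BITS)) & COUNTER_MASK
            if count:
                p = count / total
                h -= p * math.log2(p)
    return h


chunk = b"aabb"
bits_per_item = entropy(count_items(chunk), len(chunk))
```

With 16-bit counters the chunk must contain fewer than 65,536 occurrences of any single value, otherwise a counter would carry into its neighbour.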
Advanced database decompression
A method, a system, and a computer program product for decompressing data. One or more compressed blocks in a set of stored compressed blocks responsive to a request to access data in the set of stored compressed blocks are identified. String prefixes inside the identified compressed blocks are decompressed using front coding. String suffixes inside the identified compressed blocks are decompressed using a re-pair decompression. Uncompressed data is generated.
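To illustrate the two decoding steps, the sketch below assumes each entry stores the length of the prefix shared with the previous string (front coding) plus a suffix encoded as Re-Pair symbols, where integer symbols name grammar rules and single characters are terminals. The entry layout and names are assumptions, not the patented format.

```python
def expand(symbol, rules: dict) -> str:
    """Recursively expand a Re-Pair symbol into the text it stands for."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol  # terminal character


def decode_block(entries: list, rules: dict) -> list:
    """Decode (shared_prefix_length, suffix_symbols) entries in order."""
    strings, previous = [], ""
    for prefix_length, suffix_symbols in entries:
        suffix = "".join(expand(s, rules) for s in suffix_symbols)
        previous = previous[:prefix_length] + suffix  # front coding step
        strings.append(previous)
    return strings


# Hypothetical grammar: 256 -> "in", 257 -> "ing"
rules = {256: ("i", "n"), 257: (256, "g")}
entries = [
    (0, ["s", 257]),           # "sing"
    (1, ["t", "r", 257]),      # "s" + "tring"
    (3, ["o", "n", "g"]),      # "str" + "ong"
]
decoded = decode_block(entries, rules)
```

Because each string depends on its predecessor, decoding starts at the beginning of a block; that is why only the identified blocks need to be processed for a given request.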
Vector processing for segmentation hash values calculation
A system for segmenting an input data stream using vector processing, comprising a processor adapted to repeat the following steps throughout an input data stream to create a segmented data stream consisting of a plurality of segments: apply a rolling sequence over a sequence of consecutive data items of the input data stream, the rolling sequence including a subset of consecutive data items of the sequence; calculate concurrently a plurality of partial hash values, each by one of a plurality of processing pipelines of the processor and each for a respective one of a plurality of partial rolling sequences that each include evenly spaced data items of the subset; determine compliance of each of the plurality of partial hash values with one or more respective partial segmentation criteria; and designate the sequence as a variable size segment when at least some of the partial hash values comply with the respective partial segmentation criteria.
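The segmentation loop can be sketched in plain Python as below. The lanes are evaluated one after another here, whereas the abstract describes concurrent processing pipelines; the polynomial hash, the mask-based criterion, and all parameter values are assumptions made for illustration.

```python
def partial_hashes(window: bytes, lanes: int) -> list:
    """One hash per lane, each over the evenly spaced items window[lane::lanes]."""
    hashes = []
    for lane in range(lanes):
        h = 0
        for item in window[lane::lanes]:
            h = (h * 31 + item) & 0xFFFF
        hashes.append(h)
    return hashes


def segment(data: bytes, window: int = 16, lanes: int = 4,
            mask: int = 0x3, required: int = 3, min_size: int = 16) -> list:
    """Cut `data` into variable size segments at content-defined boundaries."""
    segments, start = [], 0
    for end in range(window, len(data) + 1):
        if end - start < min_size:
            continue
        hashes = partial_hashes(data[end - window:end], lanes)
        # Partial criterion (assumed): the low bits of the lane hash are zero.
        compliant = sum(1 for h in hashes if h & mask == 0)
        if compliant >= required:
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])
    return segments
```

This version recomputes every lane hash for each window position for clarity; a real implementation would update the hashes incrementally as the window rolls.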
DATA PROCESSING METHOD AND APPARATUS
A method of compression is disclosed in which an input sequence of bits is divided into a plurality of portions. Each portion is sub-divided into a plurality of sub-divisions. Frequency analysis is performed to determine the number of occurrences of each sub-division permutation and a processed sequence of bits is generated based on the frequency analysis. The processed sequence of bits includes extraction information for use in reconstructing said input sequence of bits from said processed sequence of bits. The extraction information comprises sub-division order information identifying an ordered sequence comprising each possible sub-division permutation arranged in order of how many times, within said input sequence of bits, a portion comprises a sub-division having bits arranged in that possible sub-division permutation. The sub-division order information includes an index value representing the order of the corresponding ordered sequence, based on a preconfigured mapping between said index value and the order of the corresponding ordered sequence.
TAPE DRIVE MEMORY DEDUPLICATION
A method and system for improving tape drive memory storage is provided. The method includes receiving, by a storage tape drive, a data stream for storage. The data stream is passed through a non-volatile memory device (NVS2) of the storage tape drive. The data stream is divided into adjacent variable length data chunks, and a chunk list file including similarity identifiers for each of the adjacent variable length data chunks is generated and stored within a non-volatile memory device (NVS1). Duplicate data, including data duplicated with respect to a group of data chunks of the adjacent variable length data chunks, is identified and deleted from the NVS2 of the storage tape drive such that the group of data chunks remains within NVS2. The group of data chunks is written to a data storage tape cartridge. Pointers identifying each data chunk and an associated storage position are generated and stored.
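The chunk list, duplicate removal, and pointer generation can be sketched as follows. The abstract refers to similarity identifiers without defining them, so this sketch substitutes an exact SHA-256 digest as the identifier and uses in-memory lists in place of the NVS1 and NVS2 devices; the function name and return shape are assumptions.

```python
import hashlib


def deduplicate(chunks: list) -> tuple:
    """Keep one copy of each distinct chunk and a pointer for every chunk."""
    retained = []     # chunks that remain and would be written to tape
    position = {}     # identifier -> storage position in `retained`
    pointers = []     # one storage position per incoming chunk
    for chunk in chunks:
        identifier = hashlib.sha256(chunk).hexdigest()
        if identifier not in position:
            position[identifier] = len(retained)
            retained.append(chunk)
        pointers.append(position[identifier])
    return retained, pointers


retained, pointers = deduplicate([b"aa", b"b", b"aa"])
restored = [retained[p] for p in pointers]
```

Reading the data back follows the pointers in order, as `restored` shows.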