Patent classifications
H03M7/3084
Self-checking compression
Methods, apparatus, systems, and software for implementing self-checking compression. A byte stream is encoded to generate tokens, and selected tokens are encoded with hidden parity information in a compressed byte stream that may be stored for later streaming or streamed to a receiver. As the compressed byte stream is received, it is decompressed, with the hidden parity information being decoded and used to detect errors in the decompressed data, enabling errors to be detected on-the-fly rather than waiting to perform a checksum over an entire received file. In one embodiment, the byte stream is encoded using a Lempel-Ziv 77 (LZ77)-based encoding process to generate a sequence of tokens including literals and references, with all or selected references encoded with hidden parity information in a compressed byte stream having a standard format such as DEFLATE or Zstandard. The hidden parity information is encoded such that the compressed byte stream may be decompressed without parity checks using standard DEFLATE or Zstandard decompression schemes. Dictionary coders such as LZ78 and LZW may also be used.
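The abstract does not say how the parity is hidden inside a reference, so the following is only a minimal sketch under one assumption: the encoder shortens a match by one byte when needed so that the parity of the match length equals the bit parity of all bytes that precede the reference. The token tuples, `MIN_MATCH`, and the function names are hypothetical, and a plain token list stands in for a real DEFLATE or Zstandard stream.

```python
MIN_MATCH = 3  # shortest reference length this toy format allows


def parity(data: bytes) -> int:
    """Return 0 or 1: the parity of the number of set bits in data."""
    return sum(bin(b).count("1") for b in data) & 1


def encode(data: bytes):
    """Toy LZ77 encoder whose reference lengths carry a hidden parity bit."""
    tokens, pos = [], 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for start in range(max(0, pos - 255), pos):
            length = 0
            while (pos + length < len(data) and length < 255
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len > MIN_MATCH:  # leaves room to shorten the match by one
            if best_len % 2 != parity(data[:pos]):
                best_len -= 1     # assumed hiding rule: length parity = data parity
            tokens.append(("ref", best_dist, best_len))
            pos += best_len
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def decode(tokens, check: bool = True) -> bytes:
    """Decode tokens; with check=True, verify the hidden parity at each reference."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
            continue
        _, dist, length = tok
        if check and length % 2 != parity(bytes(out)):
            raise ValueError(f"parity mismatch before offset {len(out)}")
        for _ in range(length):
            out.append(out[-dist])  # byte-wise copy handles overlapping matches
    return bytes(out)
```

Calling `decode(tokens, check=False)` behaves like a standard decoder that ignores the hidden bit, while the checking decoder raises as soon as a reference disagrees with the data decoded so far, without waiting for the end of the stream.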
COMPRESSION CIRCUIT, STORAGE SYSTEM, AND COMPRESSION METHOD
According to one embodiment, a compression circuit generates substrings from input data for (3+M) cycles, the input data being N bytes per cycle, a byte length of each substring being greater than or equal to (N×(1+M)+1); obtains a set of matches, each of the matches including at least one item of past input data that was input previously and corresponds to at least a part of each of the substrings; selects a subset of matches from the set of matches including the input data of one cycle; and outputs the subset of matches. M is zero or a natural number. N is an integer of two or more.
Compression of localized files
A method for compressing a first application file and a second application file includes: accessing the first and second application files, the first application file being in a first language and the second application file being in a second language and being a counterpart of the first application file; decompressing the first and second application files to access their internal files; comparing a first internal file of the first application file to a second internal file of the second application file; upon determining that the first internal file is identical to the second internal file, copying one of the internal files to an output folder; upon determining that the files are not identical, either copying both internal files to the output folder or executing a differencing procedure on the first and second internal files to identify differences between them and storing data about the differences in the output folder; and compressing the output folder into one output file.
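The copy-once-or-copy-both branch of this method can be sketched as follows, assuming the application files are ZIP containers held in memory. The `shared/`, `first/`, and `second/` prefixes, and the names `make_zip` and `merge_localized`, are hypothetical, and the optional differencing branch is left out.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Helper for the example: build an in-memory ZIP from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def merge_localized(first: bytes, second: bytes) -> bytes:
    """Combine two localized archives, storing identical internal files once."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(first)) as a, \
         zipfile.ZipFile(io.BytesIO(second)) as b, \
         zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as out:
        names_b = set(b.namelist())
        shared = set()
        for name in a.namelist():
            data_a = a.read(name)
            if name in names_b and b.read(name) == data_a:
                out.writestr("shared/" + name, data_a)  # identical: keep one copy
                shared.add(name)
            else:
                out.writestr("first/" + name, data_a)   # differs: keep both
        for name in b.namelist():
            if name not in shared:
                out.writestr("second/" + name, b.read(name))
    return out_buf.getvalue()


english = make_zip({"app.bin": b"code", "strings.txt": b"hello"})
french = make_zip({"app.bin": b"code", "strings.txt": b"bonjour"})
merged = merge_localized(english, french)
```

Here `app.bin` is stored once under `shared/`, while the two `strings.txt` variants are both kept.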
TECHNIQUES FOR DETERMINING COMPRESSION TIERS AND USING COLLECTED COMPRESSION HINTS
Tiers of compression algorithms may be determined using compression information collected regarding compression ratios achieved for data sets using compression algorithms. Each tier may meet specified criteria regarding expected compression ratios achieved for a specified portion or number of data sets. Compression algorithms of each tier may be implemented by a different hardware device that may include hardware accelerators for the algorithms of the tier. Different tiers, and thus different hardware devices, achieve different levels of compression. A recommendation may be provided using compression information collected, such as from one of the hosts, regarding which hardware device to use for compression. The recommendation may concern whether to purchase a license to use, or whether to purchase, a particular hardware device for compression. Compression information may be collected by a host that issues tagged I/Os providing a hint regarding what compression algorithm to use for the particular I/O operation data.
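As a rough software-only illustration of tiering by collected ratios, the sketch below measures compression ratios with Python's standard `zlib`, `bz2`, and `lzma` modules and assigns each algorithm a tier equal to the number of ratio thresholds met by a given portion of the data sets. The thresholds, the 80% portion, and the function names are assumptions; the abstract ties tiers to hardware devices, which this sketch does not model.

```python
import bz2
import lzma
import zlib

# Candidate algorithms; the labels are illustrative.
ALGORITHMS = {
    "zlib-1": lambda d: zlib.compress(d, 1),
    "zlib-9": lambda d: zlib.compress(d, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def collect_ratios(data_sets):
    """Return {algorithm: [original size / compressed size per data set]}."""
    return {
        name: [len(d) / len(fn(d)) for d in data_sets]
        for name, fn in ALGORITHMS.items()
    }


def assign_tiers(ratios, thresholds=(1.5, 3.0), portion=0.8):
    """Tier = count of consecutive thresholds met by at least `portion` of data sets."""
    tiers = {}
    for name, values in ratios.items():
        tier = 0
        for threshold in thresholds:
            met = sum(r >= threshold for r in values) / len(values)
            if met < portion:
                break
            tier += 1
        tiers[name] = tier
    return tiers


example = assign_tiers({
    "fast": [1.6, 1.7, 1.2, 1.8, 1.9],    # 4 of 5 reach 1.5, none reach 3.0
    "strong": [3.5, 3.2, 3.1, 4.0, 2.0],  # all reach 1.5, 4 of 5 reach 3.0
})
```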
SYSTEM AND METHOD FOR MITIGATING EFFECTS OF HASH COLLISIONS IN HARDWARE DATA COMPRESSION
Systems and methods are provided for mitigating effects of hash collisions in hardware data compression, for example reducing or avoiding the side effects of hash collisions, or reducing or avoiding slowdowns caused by hash collisions. In an aspect, a processor-implemented method includes: hashing an input data byte sequence to produce a hash value, the input data byte sequence being located at a sequence address within an input data stream; and storing, in a hash table at a hash address corresponding to the hash value, the sequence address and a portion of the input data byte sequence. In an aspect, to further avoid hash collisions, hash memory accesses are distributed among a plurality of parallel hash banks to increase the throughput. Another aspect virtually extends a hash depth by extending a data match search around broken hash links, going backward in the data sequence.
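Storing a portion of the byte sequence next to its address lets a lookup reject a colliding entry without reading the input buffer. The following is a minimal software sketch of that idea only; the sequence length, table size, multiplicative hash, and all names are assumptions, and the parallel hash banks and backward search are not modeled.

```python
SEQ_LEN = 4      # bytes hashed at each position (assumed)
PORTION_LEN = 2  # bytes of the sequence stored with each entry (assumed)
TABLE_BITS = 8   # the table has 2**TABLE_BITS hash addresses (assumed)


def hash_seq(seq: bytes) -> int:
    """Multiplicative hash of a short byte sequence into TABLE_BITS bits."""
    value = (int.from_bytes(seq, "little") * 2654435761) & 0xFFFFFFFF
    return value >> (32 - TABLE_BITS)


def find_candidates(data: bytes, hash_fn=hash_seq):
    """Return (candidate matches, number of collisions rejected by the portion)."""
    table = {}     # hash address -> (sequence address, stored portion)
    matches = []   # (current address, earlier address)
    rejected = 0
    for addr in range(len(data) - SEQ_LEN + 1):
        seq = data[addr:addr + SEQ_LEN]
        slot = hash_fn(seq)
        entry = table.get(slot)
        if entry is not None:
            prev_addr, portion = entry
            if portion == seq[:PORTION_LEN]:
                matches.append((addr, prev_addr))
            else:
                rejected += 1  # same hash address, different bytes: a collision
        table[slot] = (addr, seq[:PORTION_LEN])
    return matches, rejected
```

Passing a deliberately poor `hash_fn`, such as one that always returns 0, forces every lookup to collide and shows the stored portion filtering out the false candidates.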
SELECTIVE DATA COMPRESSION BASED ON DATA SIMILARITY
Technology is disclosed for selectively compressing data based on similarity of pages within the data that is to be compressed. At least one corresponding hash value is generated for each one of multiple candidate pages to be compressed. In response to the hash values generated for the candidate pages, the technology selects a set of similar candidate pages from the candidate pages. The set of similar candidate pages is a subset of the candidate pages that includes less than all the candidate pages. The set of similar candidate pages is compressed as a single unit, separately from one or more other ones of the candidate pages that were not selected to be included in the set of similar candidate pages.
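The abstract does not name the hash, so the sketch below assumes a simple min-hash over 8-byte windows (computed with `zlib.crc32`) as the similarity hash. Pages that share a hash value form the similar set and are compressed as one unit; the remaining pages are compressed separately. The function names and the window width are hypothetical, and pages are assumed to be at least `width` bytes long.

```python
import zlib
from collections import defaultdict


def similarity_hash(page: bytes, width: int = 8) -> int:
    """Min-hash: the smallest CRC-32 over all width-byte windows of the page."""
    return min(zlib.crc32(page[i:i + width])
               for i in range(len(page) - width + 1))


def compress_pages(pages):
    """Return (indices of similar pages, their joint compression, others by index)."""
    groups = defaultdict(list)
    for idx, page in enumerate(pages):
        groups[similarity_hash(page)].append(idx)
    similar = max(groups.values(), key=len)
    if len(similar) < 2:
        similar = []  # no two pages share a hash value
    unit = zlib.compress(b"".join(pages[i] for i in similar))
    rest = {i: zlib.compress(pages[i])
            for i in range(len(pages)) if i not in similar}
    return similar, unit, rest
```

Compressing the similar pages together lets the second page reuse matches from the first, which separate per-page compression cannot do.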
SYSTEM AND METHOD FOR MULTIPLE PASS DATA COMPACTION UTILIZING DELTA ENCODING
The inventor has conceived, and reduced to practice, a system and method for data compaction that applies delta encoding methods to entropy encoding methods to improve data compaction of entropy encoding methods under certain conditions and when compacting data having certain characteristics. Delta encoding may be applied to entropy encoding methods to further compact data sets by reducing the number of sourceblocks included in a codebook to those most commonly encountered in data to be encoded and, where mismatches occur during encoding, using delta encoding of bit differences with existing sourceblocks in the codebook rather than adding new sourceblocks to the codebook.
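The mismatch handling described here can be sketched as follows: a block that is not in the codebook is stored as a reference to the nearest sourceblock plus the positions of the differing bits. The token layout, the `max_diff_bits` limit, and the raw fallback are assumptions for illustration, and the entropy coding of the resulting tokens is omitted.

```python
def bit_diff(a: bytes, b: bytes):
    """Return the positions (0 = least significant bit) where a and b differ."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return [i for i in range(len(a) * 8) if (x >> i) & 1]


def encode_block(block: bytes, codebook, max_diff_bits: int = 4):
    """Encode a block as a codebook hit, a bit delta, or a raw fallback."""
    diffs = [bit_diff(block, entry) for entry in codebook]
    idx = min(range(len(codebook)), key=lambda i: len(diffs[i]))
    if not diffs[idx]:
        return ("code", idx)               # exact sourceblock match
    if len(diffs[idx]) <= max_diff_bits:
        return ("delta", idx, diffs[idx])  # near match: store bit positions only
    return ("raw", block)                  # too different to delta-encode


def decode_block(token, codebook) -> bytes:
    """Invert encode_block."""
    if token[0] == "raw":
        return token[1]
    entry = codebook[token[1]]
    value = int.from_bytes(entry, "big")
    if token[0] == "delta":
        for pos in token[2]:
            value ^= 1 << pos              # flip each differing bit back
    return value.to_bytes(len(entry), "big")


codebook = [b"\x00\x00", b"\xff\xff"]      # equal-length sourceblocks (assumed)
```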
PARTITIONING, PROCESSING, AND PROTECTING COMPRESSED DATA
A technique of partitioning compressed data includes splitting the compressed data into multiple portions. The technique further includes storing a decompression state in association with a current portion, wherein the decompression state is based on data of a previous portion and enables decompression of the current portion independently of other portions.
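One way to picture the stored decompression state is with Python's `zlib` module, whose decompression objects can be snapshotted with `Decompress.copy()`. In this minimal sketch, each portion is paired with a snapshot taken just before that portion is consumed, so any portion can later be decompressed on its own. The function names are hypothetical, and the state is an in-memory object rather than the serialized form a real system would need.

```python
import zlib


def partition(compressed: bytes, portion_size: int):
    """Split a zlib stream into portions, each paired with its starting state."""
    portions = []
    decomp = zlib.decompressobj()
    for start in range(0, len(compressed), portion_size):
        chunk = compressed[start:start + portion_size]
        portions.append((decomp.copy(), chunk))  # state built from earlier portions
        decomp.decompress(chunk)                 # advance the state past this portion
    return portions


def decompress_portion(state, chunk: bytes) -> bytes:
    """Decompress one portion without touching any other portion."""
    return state.copy().decompress(chunk)        # copy keeps the saved state reusable


data = bytes(i * 7 % 251 for i in range(5000))
portions = partition(zlib.compress(data), 64)
last = decompress_portion(*portions[-1])         # the final portion, decoded first
```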
Pattern-based string compression
The disclosure relates to compressing strings by reducing the number of string characters that are stored. For example, a system may generate a first radix tree for a set of strings and a second radix tree for a reverse of each of the set of strings. The system may merge nodes of the first radix tree and/or second radix tree based on a tuning parameter. The system may identify, based on the first radix tree, beginning portions of at least two strings that match and identify, based on the second radix tree, ending portions of at least two strings that match. The system may use the matching beginning portions, the unique portions, and/or the matching ending portions to generate a pattern that matches the two or more strings. The system may store the two or more strings in association with the generated pattern without their matching beginning and/or ending portions.
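The sketch below swaps the two radix trees for a much simpler stand-in: the longest prefix common to all strings, and the longest common suffix found by applying the same prefix search to the reversed strings. It shows only the storage idea (keep the pattern once and the unique middle of each string), not the tree merging or the tuning parameter; the function names are hypothetical.

```python
import os


def compress_strings(strings):
    """Return (prefix, suffix, middles) so that prefix + middle + suffix == string."""
    prefix = os.path.commonprefix(strings)
    rest = [s[len(prefix):] for s in strings]
    # The common suffix is the common prefix of the reversed remainders.
    suffix = os.path.commonprefix([r[::-1] for r in rest])[::-1]
    middles = [r[:len(r) - len(suffix)] for r in rest]
    return prefix, suffix, middles


def restore_strings(prefix, suffix, middles):
    """Rebuild the original strings from the pattern and the unique portions."""
    return [prefix + m + suffix for m in middles]


names = ["user-001.log", "user-002.log", "user-117.log"]
pattern = compress_strings(names)
# pattern == ("user-", ".log", ["001", "002", "117"])
```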
STORAGE SYSTEM AND DATA PROCESSING METHOD IN STORAGE SYSTEM
Deterioration of the throughput of compression that includes a decompression check after data compression is suppressed. Provided is a storage system including an interface and a controller. The controller includes a compression circuit configured to generate compressed data by compressing received data received via the interface; and a decompression circuit configured to decompress the compressed data before storing the compressed data in a storage drive to confirm data consistency. The compression circuit sequentially executes a compression task of the received data, sequentially generates packets of the compressed data, and transfers the packets to the decompression circuit. The decompression circuit decompresses each received packet in parallel with the compression task.
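A software analogue of this packet-by-packet check can be sketched with Python's streaming `zlib` objects: each compressed packet is handed to a decompressor as soon as it is produced, and the restored bytes are compared against the input before the result is accepted. The sketch interleaves the two steps in one thread rather than running them in parallel circuits, and the function name and packet size are assumptions.

```python
import zlib


def compress_with_check(data: bytes, packet_size: int = 4096) -> bytes:
    """Compress data in packets, verifying each packet as soon as it exists."""
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()
    packets = []
    verified = 0  # number of input bytes confirmed so far

    def check(packet: bytes) -> None:
        nonlocal verified
        restored = decomp.decompress(packet)
        if restored != data[verified:verified + len(restored)]:
            raise ValueError(f"consistency check failed at offset {verified}")
        verified += len(restored)

    for start in range(0, len(data), packet_size):
        packet = comp.compress(data[start:start + packet_size])
        if packet:  # the compressor may buffer input and emit nothing yet
            check(packet)
            packets.append(packet)
    tail = comp.flush()
    check(tail)
    packets.append(tail)
    if verified != len(data):
        raise ValueError("compressed stream does not cover all input")
    return b"".join(packets)  # only now would the data be written to the drive


stored = compress_with_check(b"storage block " * 2000, packet_size=1024)
```

Because verification keeps pace with compression, the check adds no separate pass over the full compressed output after compression has finished.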