H03M7/3088

Compression of machine-generated data
11758022 · 2023-09-12

A pre-shared compression dictionary is received. The pre-shared compression dictionary was generated based on an analysis of sample data for use in compression of other data. A compressed version of a batch of machine-generated data is received. The batch of machine-generated data has been compressed at least in part using the pre-shared compression dictionary and a batch-specific compression dictionary. The received compressed batch is uncompressed using the batch-specific compression dictionary to determine an intermediate version. The intermediate version is uncompressed using the pre-shared compression dictionary to determine an uncompressed version of the batch of machine-generated data.
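The two-stage order described above (batch-specific dictionary undone first, pre-shared dictionary second) can be illustrated with a minimal sketch. It assumes a plain phrase-to-code substitution as a stand-in for the actual compression scheme, which the abstract does not specify. The dictionaries, the code bytes, and the function names are hypothetical, and the code bytes are assumed never to occur in the raw data.

```python
def apply_dictionary(data: bytes, dictionary: dict) -> bytes:
    """Replace each phrase with its one-byte code, longest phrases first."""
    for phrase, code in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
        data = data.replace(phrase, code)
    return data


def reverse_dictionary(data: bytes, dictionary: dict) -> bytes:
    """Expand each one-byte code back into its phrase."""
    for phrase, code in dictionary.items():
        data = data.replace(code, phrase)
    return data


def compress_batch(batch: bytes, pre_shared: dict, batch_specific: dict) -> bytes:
    intermediate = apply_dictionary(batch, pre_shared)
    return apply_dictionary(intermediate, batch_specific)


def decompress_batch(compressed: bytes, pre_shared: dict, batch_specific: dict) -> bytes:
    # Undo the batch-specific dictionary first to recover the intermediate
    # version, then undo the pre-shared dictionary.
    intermediate = reverse_dictionary(compressed, batch_specific)
    return reverse_dictionary(intermediate, pre_shared)


# Hypothetical dictionaries: code bytes >= 0x80 are assumed absent from the logs.
PRE_SHARED = {b"level=INFO ": b"\x80", b"status=200": b"\x81"}
BATCH_SPECIFIC = {b"host=web-17 ": b"\x90"}

batch = b"level=INFO host=web-17 status=200\n" * 2
compressed = compress_batch(batch, PRE_SHARED, BATCH_SPECIFIC)
restored = decompress_batch(compressed, PRE_SHARED, BATCH_SPECIFIC)
```

Because the batch-specific dictionary is built over the intermediate version, its phrases may themselves contain pre-shared codes, which is why it has to be reversed first.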

INFORMATION PROCESSING APPARATUS AND PRESET DICTIONARY GENERATING METHOD
20230283294 · 2023-09-07

According to one embodiment, an information processing apparatus includes a processor. The processor divides teacher data into character strings, calculates a score of each of the character strings based on at least an appearance frequency of each of the character strings, an appearance position of each of the character strings, and a length of each of the character strings, and determines a position of each of the character strings in a preset dictionary based on the score.
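As a rough illustration of score-based placement, the sketch below splits the teacher data on whitespace, scores each string from its frequency, first appearance position, and length, and writes the strings into the dictionary in ascending score order. The scoring formula and the function name `build_preset_dictionary` are assumptions made for this example, not the formula of the application.

```python
from collections import Counter


def build_preset_dictionary(teacher_data: str) -> bytes:
    """Order whitespace-separated strings by an illustrative score."""
    tokens = teacher_data.split()
    frequency = Counter(tokens)
    first_position = {}
    for index, token in enumerate(tokens):
        first_position.setdefault(token, index)
    total = len(tokens)

    def score(token: str) -> float:
        # Hypothetical weighting: frequent, long, early-appearing strings
        # score higher. The real weighting is not given in the abstract.
        position_weight = 1.0 - first_position[token] / (2 * total)
        return frequency[token] * len(token) * position_weight

    # Ascending order puts the highest-scoring strings at the end.
    ranked = sorted(frequency, key=score)
    return " ".join(ranked).encode()


preset = build_preset_dictionary("GET /index.html GET /about.html GET /index.html")
```

Placing high-scoring strings last is a choice that suits zlib, whose documentation recommends putting the most frequently expected subsequences at the end of a preset dictionary; the resulting bytes could be passed as the `zdict` argument of `zlib.compressobj`.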

Multi-pixel caching scheme for lossless encoding
11653009 · 2023-05-16

Systems and methods are provided for encoding a multi-pixel caching scheme for lossless encoders. The systems and methods can include obtaining a sequence of pixels, determining repeating sub-sequences of the sequence of pixels consisting of a single repeated pixel and non-repeating sub-sequences of the sequence of pixels, and, responsive to the determination, encoding the repeating sub-sequences using a run-length of the repeated pixel and encoding the non-repeating sub-sequences using a multi-pixel cache, wherein the encoding using a multi-pixel cache comprises encoding non-repeating sub-sequences stored in the multi-pixel cache as the location of the non-repeating sub-sequences in the multi-pixel cache, and encoding non-repeating sub-sequences not stored in the multi-pixel cache using the value of the pixels in the non-repeating sub-sequences.
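The split between run-length coding and cache lookups can be sketched as follows. This is a simplified illustration, assuming pixels are plain integers, the cache is an unbounded list of tuples, and the token names `run`, `hit`, and `raw` are invented for the example; a real encoder would bound the cache and pack tokens into bits.

```python
def encode(pixels):
    """Emit run tokens for repeated pixels and cache tokens for the rest."""
    cache, tokens, i = [], [], 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1
        if j - i >= 2:
            # Repeating sub-sequence: one pixel value plus its run length.
            tokens.append(("run", pixels[i], j - i))
            i = j
            continue
        # Non-repeating sub-sequence: extend until the next run begins.
        j = i + 1
        while j < len(pixels) and not (j + 1 < len(pixels) and pixels[j] == pixels[j + 1]):
            j += 1
        sequence = tuple(pixels[i:j])
        if sequence in cache:
            tokens.append(("hit", cache.index(sequence)))  # location in cache
        else:
            tokens.append(("raw", sequence))  # literal pixel values
            cache.append(sequence)
        i = j
    return tokens


def decode(tokens):
    """Rebuild the pixel sequence, mirroring the encoder's cache updates."""
    cache, pixels = [], []
    for token in tokens:
        if token[0] == "run":
            pixels.extend([token[1]] * token[2])
        elif token[0] == "hit":
            pixels.extend(cache[token[1]])
        else:
            cache.append(token[1])
            pixels.extend(token[1])
    return pixels


tokens = encode([1, 1, 1, 2, 3, 4, 5, 5, 2, 3, 4])
# tokens == [("run", 1, 3), ("raw", (2, 3, 4)), ("run", 5, 2), ("hit", 0)]
```

The decoder needs no side channel for the cache because it inserts every `raw` sub-sequence in the same order the encoder did.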

Data compression using dictionaries

Data units of a dataset may be compressed by clustering the data units into clusters, selecting a reference unit for each unit cluster, and compressing data units of each unit cluster using the reference unit of the unit cluster as a dictionary. The computational efficiency of the clustering algorithm may be improved by not applying it to data units themselves, but rather to hash values of the data units, where the hash values have a much smaller size than the data units. The hash function may be a locality-sensitive hash (LSH) function. The reference unit of a cluster may be determined in any of a variety of ways, for example, by selecting a centroid or exemplar of the cluster. Clusters, including their reference values, may be indexed in a cluster index (e.g., a Faiss index), which may be searched to assign future added or modified data units to clusters.
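A minimal sketch of the idea, under several simplifying assumptions: the locality-sensitive hash is a single MinHash value over 4-byte shingles (computed with `zlib.crc32`), units with equal hash values form a cluster, the first member of each cluster serves as the reference unit, and zlib's preset-dictionary support stands in for the compressor. None of these choices comes from the abstract, and a production system would use a proper clustering step and a cluster index.

```python
import zlib
from collections import defaultdict


def minhash(unit: bytes, shingle: int = 4) -> int:
    """Small locality-sensitive hash: minimum CRC32 over byte shingles."""
    grams = {unit[i:i + shingle] for i in range(max(1, len(unit) - shingle + 1))}
    return min(zlib.crc32(gram) for gram in grams)


def cluster_units(units):
    """Group unit indices by hash value instead of comparing full units."""
    clusters = defaultdict(list)
    for index, unit in enumerate(units):
        clusters[minhash(unit)].append(index)
    return list(clusters.values())


def compress_with_reference(unit: bytes, reference: bytes) -> bytes:
    compressor = zlib.compressobj(zdict=reference)
    return compressor.compress(unit) + compressor.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()


units = [
    b'{"sensor": "t-01", "unit": "C", "value": 21.5}',
    b'{"sensor": "t-01", "unit": "C", "value": 21.7}',
    b"GET /static/app.js HTTP/1.1",
]
for members in cluster_units(units):
    reference = units[members[0]]  # assumption: first member is the exemplar
    blobs = [compress_with_reference(units[i], reference) for i in members]
```

Two units that share most of their shingles are likely, though not guaranteed, to receive the same MinHash value, which is the property that lets clustering run on the small hash values rather than on the units themselves.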

Encoding / Decoding System and Method
20230132017 · 2023-04-27

A computer-implemented method, computer program product and computing system for: processing an unencoded data file to identify a plurality of file segments; mapping each of the plurality of file segments to a portion of a dictionary file to generate a plurality of mappings, wherein each of the plurality of mappings includes a starting location and a length, thus generating a related encoded data file based, at least in part, upon the plurality of mappings; and storing the related encoded data file on a cloud-based storage platform.
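The mapping step can be pictured with a short sketch in which each file segment is replaced by a (starting location, length) pair pointing into the dictionary file. Fixed-size segmentation, the `bytes.find` lookup, and the function names are assumptions for illustration; the abstract does not say how segments are chosen or located.

```python
def encode(data: bytes, dictionary: bytes, segment_size: int = 3):
    """Map each segment of the data to (start, length) in the dictionary."""
    mappings = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        start = dictionary.find(segment)
        if start < 0:
            # This sketch requires every segment to exist in the dictionary.
            raise ValueError(f"segment {segment!r} not found in dictionary")
        mappings.append((start, len(segment)))
    return mappings


def decode(mappings, dictionary: bytes) -> bytes:
    """Rebuild the data by copying each mapped span out of the dictionary."""
    return b"".join(dictionary[start:start + length] for start, length in mappings)


dictionary_file = b"the quick brown fox"
mappings = encode(b"quick fox", dictionary_file)
# mappings == [(4, 3), (7, 3), (16, 3)]
```

The encoded file holds only the list of pairs, so it cannot be decoded without the dictionary file.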

Encoding / Decoding System and Method
20230136470 · 2023-05-04

A computer-implemented method, computer program product and computing system for: processing an unencoded data file to identify a plurality of file segments, wherein the unencoded data file is a dataset for use with an EHR process; mapping each of the plurality of file segments to a portion of a dictionary file to generate a plurality of mappings that each include a starting location and a length, thus generating a related encoded data file based, at least in part, upon the plurality of mappings; receiving a request to manipulate the unencoded data file from the EHR process; and processing the related encoded data file based, at least in part, upon the plurality of mappings and the dictionary file to generate a modified encoded data file that represents the requested manipulations of the unencoded data file.

REORDERING DATASETS IN A TABLE FOR INCREASED COMPRESSION RATIO
20230092510 · 2023-03-23

Tables are selected for compression based on threshold statistical values. Identified tables are reordered according to fields having the lowest cardinality to increase the size of character strings replaced by keys during compression. Field locations are mapped between the original table and the reordered table. Dictionary-based compression is performed on reordered tables.
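One possible reading of the reordering step is sketched below: fields are moved so that the lowest-cardinality fields come first, and the field order is kept as the mapping back to the original layout. Whether the application reorders fields, rows, or both is not stated in the abstract, so this interpretation and the function names are assumptions.

```python
def reorder_by_cardinality(rows):
    """Move low-cardinality fields to the front; return rows and mapping."""
    field_count = len(rows[0])
    cardinality = [len({row[field] for row in rows}) for field in range(field_count)]
    # order[new_position] == original_position
    order = sorted(range(field_count), key=lambda field: cardinality[field])
    reordered = [tuple(row[field] for field in order) for row in rows]
    return reordered, order


def restore(reordered, order):
    """Use the field mapping to rebuild the original field layout."""
    restored = []
    for row in reordered:
        original = [None] * len(order)
        for new_position, original_position in enumerate(order):
            original[original_position] = row[new_position]
        restored.append(tuple(original))
    return restored


table = [(1, "US", "active"), (2, "US", "active"), (3, "DE", "active")]
reordered, order = reorder_by_cardinality(table)
# order == [2, 1, 0]; each reordered row starts with "active"
```

Placing the near-constant fields next to each other lengthens the repeated prefix of each row, which gives a dictionary-based compressor longer strings to replace with a single key.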

DATA AWARE COMPRESSION IN A STORAGE SYSTEM
20230353167 · 2023-11-02

A method for data-aware compression in a storage system is described. The method may include pre-compressing data units received by the storage system, by different pre-compression units to provide different pre-compressed versions of the data units; wherein the different pre-compression schemes are associated with different compression schemes, wherein at least some of the different compression schemes are data type specific compression schemes; calculating entropies of the different pre-compressed versions; and selecting a compression scheme out of the different compression schemes based on the entropies of the different pre-compressed versions.
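The entropy-based selection can be sketched as follows. The two pre-compression schemes (a byte-wise delta transform and an identity transform), the scheme names, and the rule of picking the lowest Shannon entropy are all assumptions for illustration; the abstract only says that the choice is based on the entropies.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte of the byte-value distribution."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


def delta_precompress(data: bytes) -> bytes:
    """Hypothetical numeric-data scheme: store differences between bytes."""
    return bytes((current - previous) % 256
                 for previous, current in zip(b"\x00" + data, data))


def identity_precompress(data: bytes) -> bytes:
    """Hypothetical generic scheme: leave the data unchanged."""
    return data


def select_scheme(data: bytes, schemes: dict) -> str:
    # Assumed rule: the scheme whose pre-compressed output has the lowest
    # entropy is expected to compress best.
    return min(schemes, key=lambda name: shannon_entropy(schemes[name](data)))


SCHEMES = {"delta": delta_precompress, "identity": identity_precompress}
chosen = select_scheme(bytes(range(200)), SCHEMES)
# chosen == "delta": the differences of a counter are almost all 1
```

For the steadily increasing input above, the delta output is nearly constant, so its entropy is far below that of the untouched bytes.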

Parallel decompression of compressed data streams
11817886 · 2023-11-14

In various examples, metadata may be generated corresponding to compressed data streams that are compressed according to serial compression algorithms—such as arithmetic encoding, entropy encoding, etc.—in order to allow for parallel decompression of the compressed data. As a result, modification to the compressed data stream itself may not be required, and bandwidth and storage requirements of the system may be minimally impacted. In addition, by parallelizing the decompression, the system may benefit from faster decompression times while also reducing or entirely removing the adoption cycle for systems using the metadata for parallel decompression.

COOPERATIVE COMPRESSION IN DISTRIBUTED DATABASES
20230344446 · 2023-10-26

In various embodiments, a computer-implemented method manages use of a shared compression dictionary in a distributed database environment. The method includes determining that a given version of the shared compression dictionary should be designated as a current primary version of the shared compression dictionary. The method also includes receiving, from a client device, first write data compressed with a previous primary version of the shared compression dictionary and, in response to receiving the first write data, transmitting, to the client device, the current primary version of the shared compression dictionary and an instruction to compress new write data with the current primary version of the shared compression dictionary. Additionally, the method includes receiving, from the client device, second write data compressed with the current primary version of the shared compression dictionary and storing the second write data in a database.
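The exchange between the database and a client can be sketched with two small classes. The class and method names are hypothetical, zlib preset dictionaries stand in for the shared compression dictionary, and storing the first (old-version) write alongside its version tag is an assumption, since the abstract only states that the second write is stored.

```python
import zlib


class Coordinator:
    """Database side: tracks dictionary versions and stores tagged writes."""

    def __init__(self, version: int, dictionary: bytes):
        self.dictionaries = {version: dictionary}
        self.primary = version
        self.database = []

    def promote(self, version: int, dictionary: bytes) -> None:
        """Designate a new current primary version."""
        self.dictionaries[version] = dictionary
        self.primary = version

    def handle_write(self, version: int, payload: bytes):
        self.database.append((version, payload))  # assumption: keep old-version writes
        if version == self.primary:
            return None
        # Stale client: send the current primary dictionary and tell it to switch.
        return {"version": self.primary, "dictionary": self.dictionaries[self.primary]}

    def read(self, index: int) -> bytes:
        version, payload = self.database[index]
        decompressor = zlib.decompressobj(zdict=self.dictionaries[version])
        return decompressor.decompress(payload) + decompressor.flush()


class Client:
    def __init__(self, version: int, dictionary: bytes):
        self.version, self.dictionary = version, dictionary

    def write(self, coordinator: Coordinator, record: bytes) -> None:
        compressor = zlib.compressobj(zdict=self.dictionary)
        payload = compressor.compress(record) + compressor.flush()
        reply = coordinator.handle_write(self.version, payload)
        if reply is not None:
            self.version, self.dictionary = reply["version"], reply["dictionary"]


server = Coordinator(1, b"user_id= status=active")
client = Client(1, b"user_id= status=active")
server.promote(2, b"user_id= status=active region=eu-west")
client.write(server, b"user_id=7 status=active")   # compressed with version 1
client.write(server, b"user_id=8 region=eu-west")  # compressed with version 2
```

Tagging each stored payload with its dictionary version lets the database decompress records written before and after a promotion.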