H03M7/3077

Systems and Methods for Use in Compressing Data Structures
20170221153 · 2017-08-03

Systems and methods are provided for compressing data structures. One exemplary method includes accessing a target data structure defining multiple columns, and filtering the columns based on a cardinality of terms in each of the columns. The method also includes, for each filtered column, sorting the data structure by the column, compressing the sorted data structure, and identifying the filtered column as a candidate column when the size of the compressed and sorted data structure is less than a baseline size. The method further includes, for each pair of candidate columns, sorting the data structure by the pair of candidate columns, compressing the pair-sorted data structure, and designating the compressed pair-sorted data structure as an object data structure and the pair of candidate columns as a sorting column pair, when said compressed pair-sorted data structure includes a smallest size compared to sizes of other compressed pair-sorted data structures.
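The column-selection procedure in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: it assumes rows are lists of strings, uses `zlib` as a stand-in compressor, and takes the uncompressed-order compressed size as the baseline. The names `compressed_size`, `best_sort_pair`, and `max_cardinality` are hypothetical, and trying both orderings of each pair is also an assumption.

```python
import itertools
import zlib


def compressed_size(rows):
    """Size in bytes of the zlib-compressed, comma/newline-joined rows."""
    text = "\n".join(",".join(row) for row in rows)
    return len(zlib.compress(text.encode("utf-8")))


def best_sort_pair(rows, max_cardinality):
    """Return ((col_a, col_b), size) for the best sorting pair, or None."""
    n_cols = len(rows[0])
    baseline = compressed_size(rows)  # assumed baseline: unsorted table

    # Step 1: keep only columns whose number of distinct terms is small enough.
    filtered = [
        c for c in range(n_cols)
        if len({row[c] for row in rows}) <= max_cardinality
    ]

    # Step 2: a filtered column is a candidate if sorting by it beats the baseline.
    candidates = [
        c for c in filtered
        if compressed_size(sorted(rows, key=lambda row: row[c])) < baseline
    ]

    # Step 3: sort by each ordered pair of candidates and keep the smallest result.
    best = None
    for a, b in itertools.permutations(candidates, 2):
        size = compressed_size(sorted(rows, key=lambda row: (row[a], row[b])))
        if best is None or size < best[1]:
            best = ((a, b), size)
    return best
```

The function returns `None` when fewer than two columns qualify as candidates, in which case no sorting column pair can be designated.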

DATA COMPRESSION USING DICTIONARIES

Data units of a dataset may be compressed by clustering the data units into clusters, selecting a reference unit for each unit cluster, and compressing data units of each unit cluster using the reference unit of the unit cluster as a dictionary. The computational efficiency of the clustering algorithm may be improved by not applying it to data units themselves, but rather to hash values of the data units, where the hash values have a much smaller size than the data units. The hash function may be a locality-sensitive hash (LSH) function. The reference unit of a cluster may be determined in any of a variety of ways, for example, by selecting a centroid or exemplar of the cluster. Clusters, including their reference values, may be indexed in a cluster index (e.g., a Faiss index), which may be searched to assign future added or modified data units to clusters.
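The final step, compressing a data unit with its cluster's reference unit as a dictionary, might look like the sketch below. It assumes the clustering and reference selection have already happened and uses the preset-dictionary feature of `zlib` as a stand-in for whatever compressor the abstract intends. The function names are hypothetical.

```python
import zlib


def compress_with_reference(unit: bytes, reference: bytes) -> bytes:
    """Compress `unit`, priming the compressor with the cluster's reference unit."""
    compressor = zlib.compressobj(zdict=reference)
    return compressor.compress(unit) + compressor.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Inverse of compress_with_reference; needs the same reference unit."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()
```

When a unit closely resembles its reference, most of its content can be encoded as back-references into the dictionary, which is why grouping similar units under one reference is expected to help.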

CONTENT-ADAPTIVE TILING SOLUTION VIA IMAGE SIMILARITY FOR EFFICIENT IMAGE COMPRESSION
20220189069 · 2022-06-16

Techniques are provided herein for more efficiently storing images that have a common subject, such as product images that share the same product in the image. Each image undergoes an adaptive tiling procedure to split the image into a plurality of tiles, with each tile identifying a region of the image having pixels with the same content. The tiles across multiple images can then be clustered together and those tiles having identical content are removed. Once all duplicate tiles have been removed from the set of all tiles across the images, the tiles are once again clustered based on their encoding scheme and certain encoding parameters. Tiles within each cluster are compressed using the best compression technique for the tiles in each corresponding cluster. By removing duplicative tile content between numerous images of the same subject, the total amount of data that needs to be stored is reduced.

Compression device and decompression device

According to one embodiment, an interleaving unit divides a symbol string into first and second symbols. A first coding unit converts the first symbols to first codewords. A first packet generating unit generates first packets including the first codewords. A first request generating unit generates first packet requests including sizes of variable length packets. A second coding unit converts the second symbols to second codewords. A second packet generating unit generates second packets including the second codewords. A second request generating unit generates second packet requests including sizes of variable length packets. A multiplexer outputs a compressed stream including the first and second variable length packets cut out from the first and second packets.

A SYSTEM AND METHOD FOR COMPRESSING CONTROLLER AREA NETWORK (CAN) MESSAGES
20220159098 · 2022-05-19

A system for compressing Controller Area Network (CAN) messages, the system comprising a processing resource configured to: obtain a CAN messages sequence including a plurality of CAN messages intercepted at a given order by at least one device adapted to monitor messages transmitted via communication channel(s) of a vehicle; group the CAN messages of the CAN messages sequence into MID groups, by a CAN MID field of the CAN messages; for each given MID group of the MID groups, split the CAN messages of the MID group into field groups, wherein each field group comprises a respective field of a plurality of fields of the CAN messages of the MID group; employ at least one compression scheme on at least one of the field groups; generate a data structure comprising the field groups; and compress the data structure using a lossless compression algorithm, giving rise to a compressed data structure.
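The grouping-then-columnar layout described here can be illustrated with a small sketch. Several things are assumptions: each message is a `(mid, payload)` tuple, every payload byte position is treated as one "field", delta coding is the per-field compression scheme, and `zlib` is the lossless compressor. The sketch only shows the size-reduction path; it omits the ordering and length metadata a real decompressor would need.

```python
import zlib
from collections import defaultdict


def field_groups(messages):
    """Map each MID to a list of columns, one column per payload byte position."""
    by_mid = defaultdict(list)
    for mid, payload in messages:
        by_mid[mid].append(payload)

    groups = {}
    for mid, payloads in by_mid.items():
        width = max(len(p) for p in payloads)
        # Short payloads are zero-padded so every column has one byte per message.
        groups[mid] = [
            bytes(p[pos] if pos < len(p) else 0 for p in payloads)
            for pos in range(width)
        ]
    return groups


def delta(column: bytes) -> bytes:
    """Byte-wise delta coding modulo 256; the first byte is kept as-is."""
    return bytes((cur - prev) % 256 for prev, cur in zip(b"\x00" + column[:-1], column))


def compress_can(messages) -> bytes:
    """Concatenate delta-coded field groups (ordered by MID) and compress them."""
    groups = field_groups(messages)
    parts = [delta(col) for mid in sorted(groups) for col in groups[mid]]
    return zlib.compress(b"".join(parts))
```

Placing all values of one field next to each other tends to expose regularities such as counters and constant bytes that are hidden when messages from different MIDs are interleaved.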

Systems and methods for version chain clustering

A system, a method and a computer program product for storing data, which include receiving a data stream having a plurality of transactions that include at least one portion of data, determining whether at least one portion of data within at least one transaction is substantially similar to at least another portion of data within at least one transaction, clustering together at least one portion of data and at least another portion of data within at least one transaction, selecting one of at least one portion of data and at least another portion of data as a representative of at least one portion of data and at least another portion of data in the received data stream, and storing each representative of a portion of data from each transaction in the plurality of transactions, wherein a plurality of representatives is configured to form a chain representing the received data stream.

PERMUTATION-BASED CODING FOR DATA STORAGE AND DATA TRANSMISSION
20220149865 · 2022-05-12

Methods of encoding and decoding data are described wherein the encoding method comprises: receiving a data file and dividing the data file or data stream into one or more data blocks, each data block having a predetermined size N and comprising a sequence of data units, e.g. byte values; and, iteratively encoding the data file into a data key based on a first permutation function and a first dictionary of permutation indices, preferably the encoded data file having a total size that is equal to or smaller than the original data file and preferably the data key having a size that is equal to or smaller than size of a data block. Iteratively encoding the data file comprises one or more encoding iterations, wherein each encoding iteration includes: determining a first permutation index defining a permutation to generate the first input data block from a first ordered data block, the generating including providing at least the first input data block to an input of the first permutation function, and the first ordered data block being obtainable by ordering the first input data block; determining a first permutation dictionary index representing a location in the first dictionary in which the first permutation index is stored; generating a first frequency data block defining the number of occurrences for each potential data value in the input data block, preferably determining the number of occurrences for each potential data value in the input data block and ordering the determined occurrences in a sequence of values in a hierarchical order, e.g. increasing or decreasing order of the data value; processing the frequency data block; and determining an encoded data block, the encoded data block comprising the first permutation dictionary index and the processed frequency data block. The encoding method further comprises outputting the data key comprising the one or more encoded data blocks and, optionally, iteration information.

BWT circuit arrangement and method

Disclosed approaches for performing a Burrows-Wheeler transform (BWT) of a sequence of data elements, S, include determining sets of less-than values and sets of equal-to values for the data elements. Index values are determined for the data elements based on the sets of less-than values. Each index value indicates a count of data elements of S that a data element is lexicographically greater than. Rank values are determined for the data elements of S based on the sets of less-than values and the sets of equal-to values. Each rank value indicates for the data element an order of the data element in the BWT relative to other ones of the data elements of equal value. Positions in the BWT of S for the data elements are selected based on the index values and rank values, and the data elements are output in the order indicated by the respective positions in the BWT.
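The position rule in this abstract (index from less-than counts, rank among equals) can be mimicked in software with a quadratic-time sketch. It assumes that the comparisons are made between cyclic rotations of S, which the abstract does not state explicitly, and it is a plain illustration rather than the disclosed circuit arrangement.

```python
def bwt(s: str) -> str:
    """Burrows-Wheeler transform via less-than counts and equal-value ranks."""
    n = len(s)
    rotations = [s[i:] + s[:i] for i in range(n)]
    out = [""] * n
    for i, rot in enumerate(rotations):
        # Index: how many rotations this rotation is lexicographically greater than.
        index = sum(1 for other in rotations if other < rot)
        # Rank: order among identical rotations, by original position.
        rank = sum(1 for j in range(i) if rotations[j] == rot)
        # The element preceding the rotation start lands at the computed position.
        out[index + rank] = rot[-1]
    return "".join(out)


print(bwt("banana$"))  # annb$aa
```

Because every position is derived from independent comparisons, no sorting pass is needed, which is presumably what makes the approach attractive for parallel hardware.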

Generating compressed representations of sorted arrays of identifiers

A method includes obtaining an array of sorted identifiers to be stored in a designated portion of a memory of a given computing system, determining a segment size for splitting elements of the array into a plurality of segments, splitting the array into the plurality of segments based at least in part on the determined segment size, and compressing the plurality of segments to create a plurality of compressed segments. The method also includes generating a balanced binary search tree comprising a plurality of nodes each identifying a range of elements of the array corresponding to a given one of the segments and comprising a pointer to a given compressed segment corresponding to the given segment. The method further includes maintaining the balanced binary search tree and the compressed segments in the designated portion of the memory, and processing queries to the array utilizing the balanced binary search tree.
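A rough sketch of the segment-and-index idea follows. Note the substitution: instead of a balanced binary search tree, it keeps a sorted list of each segment's first identifier and searches it with `bisect`, which gives the same logarithmic lookup for a static array. The class name, the fixed 64-bit packing, and the use of `zlib` are all assumptions, and the sketch expects non-negative integer identifiers.

```python
import bisect
import struct
import zlib


class CompressedSortedIds:
    """Sorted identifiers stored as compressed segments with a search index."""

    def __init__(self, ids, segment_size):
        self.first = []      # first identifier of each segment (search keys)
        self.segments = []   # compressed bytes of each segment
        for start in range(0, len(ids), segment_size):
            chunk = ids[start:start + segment_size]
            self.first.append(chunk[0])
            packed = struct.pack(f"<{len(chunk)}Q", *chunk)
            self.segments.append(zlib.compress(packed))

    def __contains__(self, ident):
        # Locate the only segment whose range could contain the identifier.
        k = bisect.bisect_right(self.first, ident) - 1
        if k < 0:
            return False
        # Decompress just that segment and binary-search inside it.
        raw = zlib.decompress(self.segments[k])
        chunk = struct.unpack(f"<{len(raw) // 8}Q", raw)
        i = bisect.bisect_left(chunk, ident)
        return i < len(chunk) and chunk[i] == ident
```

A membership query touches one compressed segment only, so the segment size trades compression ratio against per-query decompression cost.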

Systems and methods of data compression

There is provided a computer implemented method of compressing a baseline dataset comprising a sequence of a plurality of instances of a plurality of unique data elements, the method comprising: providing a weight function that calculates an increasing value for a weight for each one of the plurality of instances of each one of the plurality of unique data elements in the baseline dataset, as a function of increasing number of previously processed sequential locations of each of the plurality of instances of each respective unique data element within the baseline dataset relative to a current sequential location of the baseline dataset, computing an encoding for the baseline dataset according to a distribution of the weight function computed for the plurality of unique data elements in the baseline dataset, and creating a compressed dataset according to the encoding.
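One possible reading of the weight function is that an instance counts for more the later it appears in the sequence. Under that assumption, and with a hypothetical exponential weight `growth ** position`, the weighted distribution and the ideal code length it implies for each unique element can be sketched as below. The actual encoder (for example Huffman or arithmetic coding driven by this distribution) is left out.

```python
import math
from collections import defaultdict


def weighted_code_lengths(data, growth=1.01):
    """Ideal code length in bits per unique element under position-weighted counts."""
    weights = defaultdict(float)
    for position, element in enumerate(data):
        # Assumed weight function: grows with the number of locations already seen.
        # Note: growth ** position overflows a float for very long inputs.
        weights[element] += growth ** position
    total = sum(weights.values())
    # Shannon code length for the weighted probability of each element.
    return {element: -math.log2(w / total) for element, w in weights.items()}
```

With `growth=1.0` this reduces to ordinary frequency counting; larger values shift probability toward elements that occur late in the dataset.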