Patent classifications
H03M7/3088
Compression, searching, and decompression of log messages
Log messages are compressed, searched, and decompressed. A dictionary is used to store non-numeric expressions found in log messages. Both numeric and non-numeric expressions found in log messages are represented by placeholders in a string of log “type” information. Another dictionary is used to store the log type information. A compressed log message contains a key to the log-type dictionary and a sequence of values that are keys to the non-numeric dictionary and/or numeric values. Searching may be performed by parsing a search query into subqueries that target the dictionaries and/or content of the compressed log messages. A dictionary may reference segments that contain a number of log messages, so that all log message need not be considered for some searches.
HIGH-DENSITY COMPRESSION METHOD AND COMPUTING SYSTEM
Certain implementations of the disclosed technology may include methods and computing systems for performing high-density data compression, particularly on numerical data that demonstrates various patterns, and patterns of patters. According to an example implementation, a method is provided. The method may include extracting a data sample from a data set, compressing the data sample using a first compression filter configuration, and calculating a compression ratio associated with the first compression filter configuration. The method may also include compressing the data sample using a second compression filter configuration and calculating a compression ratio associated with the second compression filter configuration. A particular compression filter configuration to utilize in compressing the entire data set may be selected based on a comparison of the compression ratio associated with the first compression filter configuration and a compression ratio associated with the second compression filter configuration.
Memory system
According to one embodiment, a memory system includes a compressor configured to output second data obtained by compressing input first data and a non-volatile memory to which third data based on the second data output from the compressor is written. The compressor includes a dictionary coding unit configured to perform dictionary coding on the first data, an entropy coding unit configured to perform entropy coding on the result of the dictionary coding, a first calculation unit configured to calculate compression efficiencies of the dictionary coding and the entropy coding, and a first control unit configured to control an operation of at least one of the dictionary coding unit and the entropy coding unit based on the compression efficiencies and a power reduction level.
High-density compression method and computing system
Certain implementations of the disclosed technology may include methods and computing systems for performing high-density data compression, particularly on numerical data that demonstrates various patterns, and patterns of patters. According to an example implementation, a method is provided. The method may include extracting a data sample from a data set, compressing the data sample using a first compression filter configuration, and calculating a compression ratio associated with the first compression filter configuration. The method may also include compressing the data sample using a second compression filter configuration and calculating a compression ratio associated with the second compression filter configuration. A particular compression filter configuration to utilize in compressing the entire data set may be selected based on a comparison of the compression ratio associated with the first compression filter configuration and a compression ratio associated with the second compression filter configuration.
Efficient generalized boundary detection
Fast, efficient, and robust compression-based methods for detecting boundaries in arbitrary datasets, including sequences (1D datasets), are desired. The methods, each employing three simple algorithms, approximate the information distance between two adjacent sliding windows within a dataset. One of the algorithms calculates an initial ordered list of subsequences; while a second algorithm updates the ordered list of subsequences by dropping a first entry and appending a last entry rather than calculating completely new ordered lists with each iteration. Large values in the distance metric are indicative of boundary locations. A smoothed z-score or a wavelet-based algorithm may then be used to locate peaks in the distance metric, thereby identifying boundary locations. An adaptive version of the method employs a collection of window sizes and corresponding weighting functions, making it more amenable to real datasets with unknown, complex, and changing structures.
Data compression techniques
Techniques and solutions are described for compressing data and facilitating access to compressed data. Compression can be applied to proper data subsets of a data set, such as to columns of a table. Using various methods, the proper data subsets can be evaluated to be included in a group of proper data subsets to be compressed using a first compression technique, where unselected proper data subsets are not compressed using the first compression technique. Data in the data set can be reordered based on a reordering sequence for the proper data subsets. Reordering data in the data set can improve compression when at least a portion of the proper data subsets are compressed. A data structure is provided that facilitates accessing specified data stored in a compressed format.
Encoding / Decoding System and Method
A computer-implemented method, computer program product and computing system for: processing an unencoded data file to identify a plurality of file segments, wherein the unencoded data file is a dataset for use with a blockchain process; mapping each of the plurality of file segments to a portion of a dictionary file to generate a plurality of mappings that each include a starting location and a length, thus generating a related encoded data file based, at least in part, upon the plurality of mappings; receiving a request to manipulate the unencoded data file from the blockchain process; and processing the related encoded data file based, at least in part, upon the plurality of mappings and the dictionary file to generate a modified encoded data file that represents the requested manipulations of the unencoded data file.
COMPRESSION DEVICE AND DECOMPRESSION DEVICE
According to one embodiment, an interleaving unit divides a symbol string into first and second symbols. A first coding unit converts the first symbols to first codewords. A first packet generating unit generates first packets including the first codewords. A first request generating unit generates first packet requests including sizes of variable length packets. A second coding unit converts the second symbols to second codewords. A second packet generating unit generates second packets including the second codewords. A second request generating unit generates second packet requests including sizes of variable length packets. A multiplexer outputs a compressed stream including the first and second variable length packets cut out from the first and second packets.
Column data compression schemes for scaling writes and reads on database systems
A request for performing a data storing operation directed to a database table that comprises a plurality of table columns is received. Columnar compression metadata is accessed to identify one or more table columns in the database table, each of the one or more table columns being designated to store compressed columnar values. The columnar compression metadata is used to apply one or more columnar compression methods to generate, from one or more uncompressed columnar values received with the request for the data storing operation, one or more compressed columnar values to be persisted in the one or more table columns in the database table. A database statement is executed to persist the one or more compressed columnar values in the one or more table columns in the database table.
K-mer based genomic reference data compression
A computer-implemented method includes receiving genomic data associated with a plurality of genomes and identifying k-mer sets within the genomic data. The method includes constructing a k-mer subset tree according to the following process: performing iterative pairwise comparisons on the k-mer sets, wherein the iterative pairwise comparisons identify fragments with the most shared k-mers, merging the identified fragments into non-leaf nodes of the k-mer subset tree, and placing each remaining k-mer into a leaf node of the k-mer subset tree. The method includes storing the k-mer subset tree. A computer program product for data compression includes a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a computer to cause the compute to perform the foregoing method. A system includes a processor and logic. The logic is configured to perform the foregoing method.