Patent classifications
H03M7/3079
DATA COMPRESSION SYSTEM
A data compression apparatus is described which has an encoder configured to receive an input data item and to compress the data item into an encoding comprising a plurality of numerical values. The numerical values are grouped at least according to whether they relate to content of the input data item or style of the input data item. The encoder has been trained using a plurality of groups of training data items grouped according to the content and where training data items within individual ones of the groups vary with respect to the style. The encoder has been trained using a training objective which takes into account the groups.
COMPRESSION SCHEME WITH CONTROL OF SEARCH AGENT ACTIVITY
In connection with compression of an input stream, multiple portions of the input stream are searched against previously received portions of the input stream to find any matches of character strings in the previously received portions of the input stream. In some cases, matches of longer character strings, as opposed to shorter character strings, can be selected for inclusion in an encoded stream that is to be compressed. Delayed selection can occur whereby, among multiple matches, a longer match is selected for inclusion in the encoded stream and a non-selected character string match is reverted to a literal. A search engine that is searching an input stream to identify a repeat pattern of characters can cease to search for characters that were included in the selected character string match.
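The delayed selection described above resembles the "lazy matching" heuristic used in LZ77-style compressors. The following is a minimal sketch under that assumption, not the claimed implementation: the function names, the brute-force search, the 4096-byte window, and the minimum match length of 3 are all hypothetical choices made for illustration.

```python
def longest_match(data: bytes, pos: int, window: int = 4096, min_len: int = 3):
    """Return (length, distance) of the longest earlier match for data[pos:], or (0, 0)."""
    best_len, best_dist = 0, 0
    for cand in range(max(0, pos - window), pos):
        length = 0
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
    if best_len < min_len:
        return 0, 0
    return best_len, best_dist


def lazy_tokenize(data: bytes):
    """Produce literal and match tokens, preferring a longer match that starts one byte later."""
    tokens = []
    pos = 0
    while pos < len(data):
        length, dist = longest_match(data, pos)
        if length == 0:
            tokens.append(("lit", data[pos]))
            pos += 1
            continue
        # Delayed selection: look one position ahead before committing.
        next_len, _ = longest_match(data, pos + 1)
        if next_len > length:
            # The shorter match is not selected; its first byte reverts to a literal.
            tokens.append(("lit", data[pos]))
            pos += 1
            continue
        tokens.append(("match", length, dist))
        # Bytes covered by the selected match are skipped, so no search is run for them.
        pos += length
    return tokens


def detokenize(tokens) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, dist = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

For the input b"abcXbcdeYabcde", the three-byte match "abc" at the final "abcde" is passed over in favor of the four-byte match "bcde" that starts one byte later, so the "a" is emitted as a literal.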
SYSTEMS AND METHODS FOR CALCULATING A PROBABILITY OF EXCEEDING STORAGE CAPACITY IN A VIRTUALIZED COMPUTING SYSTEM
Examples of systems are described for calculating a probability of exceeding storage capacity of a virtualized system in a particular time period using probabilistic models. The probabilistic models may advantageously take variances of storage capacity into consideration.
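The abstract does not name a specific probabilistic model. As an illustration only, the sketch below assumes that daily storage growth is independent and normally distributed and that capacity itself has a normally distributed uncertainty; the function name and every parameter are hypothetical.

```python
import math


def prob_exceed_capacity(current_used: float,
                         capacity_mean: float,
                         capacity_std: float,
                         daily_growth_mean: float,
                         daily_growth_std: float,
                         days: int) -> float:
    """Probability that usage exceeds capacity after `days`, under a normal model (assumption)."""
    # Expected headroom at the end of the period.
    mean_margin = capacity_mean - (current_used + daily_growth_mean * days)
    # Variances add for independent normal terms: capacity variance plus accumulated growth variance.
    variance = capacity_std ** 2 + days * daily_growth_std ** 2
    if variance == 0:
        return 1.0 if mean_margin < 0 else 0.0
    z = mean_margin / math.sqrt(variance)
    # P(margin < 0) = Phi(-z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Under this model, a system with positive expected headroom shows a higher probability of overflow as either variance grows, which is one way the variance of storage capacity could be taken into account.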
USE OF DATA PREFIXES TO INCREASE COMPRESSION RATIOS
A data compression system includes a memory to store a plurality of predetermined prefixes corresponding to a plurality of classes of data. A classifying module is configured to receive data, receive a class of the data, and select a prefix to compress the data from the plurality of predetermined prefixes based on the data and the class of the data. A compressing module is configured to compress the data using the prefix. A header generating module is configured to generate a header including an indication of the prefix used to compress the data, and to output the header and the compressed data for storage or transmission. Using the prefix from the predetermined prefixes to compress the data eliminates an overhead of fetching the prefix from outside the data compression system.
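A comparable idea is available in Python's standard `zlib` module through preset dictionaries (the `zdict` argument). The sketch below is an illustration under assumptions, not the patented system: the prefix contents, the class names, and the one-byte header are hypothetical, and the prefix is chosen from the class alone rather than from both the data and the class.

```python
import zlib

# Hypothetical predetermined prefixes, one per class of data, held in memory.
PREFIXES = {
    "json": b'{"id": , "name": "", "status": "active", "tags": []}',
    "log": b"INFO WARN ERROR [main] request completed in ms status=200",
}
CLASS_IDS = {"json": 0, "log": 1}
ID_TO_CLASS = {v: k for k, v in CLASS_IDS.items()}


def compress_with_prefix(data: bytes, data_class: str) -> bytes:
    """Compress data using the prefix for its class; prepend a one-byte header naming the prefix."""
    comp = zlib.compressobj(zdict=PREFIXES[data_class])
    body = comp.compress(data) + comp.flush()
    return bytes([CLASS_IDS[data_class]]) + body


def decompress_with_prefix(blob: bytes) -> bytes:
    """Read the header, look up the same prefix locally, and decompress."""
    data_class = ID_TO_CLASS[blob[0]]
    decomp = zlib.decompressobj(zdict=PREFIXES[data_class])
    return decomp.decompress(blob[1:]) + decomp.flush()
```

Because both sides hold the same table of prefixes, only the small header travels with the compressed data, and no prefix has to be fetched from outside the system.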
SYSTEM AND METHOD FOR DATA COMPACTION WITH CODEBOOK STATISTICAL ESTIMATES
A system and method for data compaction use codebook statistical estimates to improve entropy encoding methods so that they account for, and efficiently handle, previously unseen data in the data to be compacted. Training data sets are analyzed to determine the frequency of occurrence of each sourceblock in the training data sets. A mismatch probability estimate is calculated comprising an estimated frequency at which any given data sourceblock received during encoding will not have a codeword in the codebook. Entropy encoding is used to generate codebooks comprising codewords for data sourceblocks based on the frequency of occurrence of each sourceblock. A mismatch codeword is inserted into the codebook based on the mismatch probability estimate to represent those cases when a block of data to be encoded does not have a codeword in the codebook. During encoding, if a mismatch occurs, a secondary encoding process is used to encode the mismatched sourceblock.
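One way to picture the mismatch codeword is as an escape symbol in a Huffman codebook. The sketch below assumes Huffman coding as the entropy coder, takes the mismatch probability as a given parameter rather than estimating it, and uses raw 8-bit bytes as the secondary encoding; all names are hypothetical and none of these choices is stated in the abstract.

```python
import heapq
from collections import Counter
from itertools import count

MISMATCH = None  # sentinel key for the mismatch (escape) codeword


def build_codebook(training_blocks, mismatch_prob: float):
    """Huffman codebook over training sourceblocks plus a mismatch symbol (0 < mismatch_prob < 1)."""
    counts = Counter(training_blocks)
    total = sum(counts.values())
    weights = dict(counts)
    # Weight chosen so the mismatch symbol has probability mismatch_prob overall.
    weights[MISMATCH] = mismatch_prob * total / (1 - mismatch_prob)
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def encode(blocks, codebook) -> str:
    """Emit codewords; for unseen blocks emit the mismatch codeword, then the raw bits."""
    bits = []
    for block in blocks:
        if block in codebook:
            bits.append(codebook[block])
        else:
            bits.append(codebook[MISMATCH])
            bits.append("".join(f"{byte:08b}" for byte in block))  # secondary encoding
    return "".join(bits)


def decode(bits: str, codebook, block_size: int):
    """Invert encode(); block_size is the fixed sourceblock length in bytes."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, current, i = [], "", 0
    while i < len(bits):
        current += bits[i]
        i += 1
        if current in inverse:
            sym = inverse[current]
            if sym is MISMATCH:
                raw = bits[i:i + 8 * block_size]
                out.append(bytes(int(raw[j:j + 8], 2) for j in range(0, len(raw), 8)))
                i += 8 * block_size
            else:
                out.append(sym)
            current = ""
    return out
```

A block that never appeared in training, such as b"zz" below, costs the mismatch codeword plus its raw bits, while frequent blocks keep their short codewords.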
MEMORY PRESERVING PARSE TREE BASED COMPRESSION WITH ENTROPY CODING
A method, computer program product, and system include a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each value based on the distribution. The processor generates a code word to represent each value, where the size of each code word is inversely proportional to the likelihood of the value it represents. The processor assigns a rank to each code word, where the rank represents the likelihood of the value represented by the code word. Based on the rank associated with each code word, the processor reorders each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank; the reordered value conversion dictionary comprises an architected dictionary.
Non-binary context mixing compressor/decompressor
A technique for non-binary context mixing in a compressor includes generating, by a plurality of context models, model predictions regarding a value of a next symbol to be encoded. A mixer generates a set of final predictions from the model predictions. An arithmetic encoder generates compressed data based on received input symbols and the set of final predictions. The received input symbols belong to an alphabet having a size greater than two and the mixer generates a feature matrix from the model predictions and trains a classifier that generates the set of final predictions.
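As a rough illustration of mixing over a non-binary alphabet, the sketch below treats the log-probabilities from each context model as a feature matrix and uses a softmax classifier with one weight per model as the mixer, trained online by gradient descent on cross-entropy. The classifier type, the learning rate, and all names are assumptions; the abstract does not specify them.

```python
import math


def mix(model_preds, weights):
    """Combine per-model distributions over K symbols into one final distribution.

    model_preds: list of M probability lists, each of length K.
    Returns (final_distribution, feature_matrix).
    """
    k = len(model_preds[0])
    # Feature matrix (M x K): log-probability of each symbol under each model.
    feats = [[math.log(max(p, 1e-12)) for p in dist] for dist in model_preds]
    scores = [sum(w * f[s] for w, f in zip(weights, feats)) for s in range(k)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    norm = sum(exps)
    return [e / norm for e in exps], feats


def update(weights, feats, final, symbol, lr=0.1):
    """One gradient step on -log(final[symbol]) with respect to the mixing weights."""
    new_weights = []
    for w, f in zip(weights, feats):
        expected = sum(p * f[s] for s, p in enumerate(final))
        new_weights.append(w - lr * (expected - f[symbol]))
    return new_weights
```

In use, the final distribution would be handed to an arithmetic encoder for the current symbol, and `update` would then be called with the symbol that actually occurred, so models that predicted it well gain weight.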
SELECTION OF DATA COMPRESSION TECHNIQUE BASED ON INPUT CHARACTERISTICS
A compression scheme can be selected for an input data stream based on characteristics of the input data stream. For example, when the input data stream is searched for pattern matches, input stream characteristics used to select a compression scheme can include one or more of: type and size of an input stream, a length of a pattern, a distance from a start of where the pattern is to be inserted to the beginning of where the pattern occurred previously, a gap between two pattern matches (including different or same patterns), standard deviation of a length of a pattern, standard deviation of a distance from a start of where the pattern is to be inserted to the beginning of where the pattern occurred previously, or standard deviation of a gap between two pattern matches. Criteria can be established whereby one or more characteristics are used to select a particular encoding scheme.
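The selection logic can be pictured as computing summary statistics over the pattern matches and applying criteria to them. In the sketch below, the `Match` record, the scheme names, and the thresholds are all hypothetical placeholders; the abstract names the characteristics but not the criteria.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Match:
    position: int  # offset where the repeated pattern is to be inserted
    length: int    # length of the pattern
    distance: int  # distance back to where the pattern occurred previously


def characteristics(matches, input_size: int) -> dict:
    """Summarize the input stream using the characteristics listed in the abstract."""
    lengths = [m.length for m in matches]
    distances = [m.distance for m in matches]
    # Gap between the end of one match and the start of the next.
    gaps = [b.position - (a.position + a.length) for a, b in zip(matches, matches[1:])]

    def spread(values):
        return pstdev(values) if values else 0.0

    return {
        "input_size": input_size,
        "match_count": len(matches),
        "mean_length": mean(lengths) if lengths else 0.0,
        "std_length": spread(lengths),
        "std_distance": spread(distances),
        "std_gap": spread(gaps),
    }


def select_scheme(ch: dict, small_input: int = 4096, max_std_length: float = 2.0) -> str:
    """Pick a scheme name from the characteristics (thresholds are illustrative assumptions)."""
    if ch["match_count"] == 0:
        return "literals_only"
    if ch["input_size"] < small_input or ch["std_length"] <= max_std_length:
        return "static_code_table"
    return "dynamic_code_table"
```

The design choice here is that a small or very regular input may not justify the overhead of building a per-stream code table, whereas a large input with widely varying match lengths might.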
Efficient clustering of noisy polynucleotide sequence reads
A technique for clustering DNA reads from polynucleotide sequencing is described. DNA reads with a level of difference that is likely caused by errors in sequencing are grouped together in the same cluster. DNA reads that represent reads of different DNA molecules are placed in different clusters. The clusters are based on edit distance, which is the number of changes necessary to convert a given DNA read into another. The process of forming clusters may be performed iteratively and may use other types of distance that serve as an approximation for edit distance. Well-clustered DNA reads provide a starting point for further analysis.
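To make the role of edit distance concrete, the sketch below computes Levenshtein distance with dynamic programming and groups reads greedily around a representative. The greedy single-pass strategy and the threshold are assumptions chosen for brevity; the technique described above is iterative and may use cheaper approximations of edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def cluster_reads(reads, max_dist: int):
    """Greedy clustering: join the first cluster whose representative is within max_dist."""
    clusters = []  # each cluster is a list; element 0 acts as the representative
    for read in reads:
        for cluster in clusters:
            if edit_distance(read, cluster[0]) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters
```

With a threshold of 2, reads that differ by a single substitution or deletion land in the same cluster, while reads of an unrelated sequence start a new one.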
Memory preserving parse tree based compression with entropy coding
A method, computer program product, and system include a processor obtaining data including values and generating a value conversion dictionary by applying a parse tree based compression algorithm to the data, where the value conversion dictionary includes dictionary entries that represent the values. The processor obtains a distribution of the values and estimates a likelihood for each value based on the distribution. The processor generates a code word to represent each value, where the size of each code word is inversely proportional to the likelihood of the value it represents. The processor assigns a rank to each code word, where the rank represents the likelihood of the value represented by the code word. Based on the rank associated with each code word, the processor reorders each dictionary entry in the value conversion dictionary to associate each dictionary entry with an equivalent rank; the reordered value conversion dictionary comprises an architected dictionary.