Patent classifications
H03M7/3097
Computerized data compression and analysis using potentially non-adjacent pairs
A computerized method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, includes: (i) automatically identifying a plurality of sequential and non-sequential symbol pairs in an input document; (ii) counting the number of appearances of each unique symbol pair; and (iii) producing a compressed document that includes a replacement symbol at each position associated with one of the plurality of symbol pairs, at least one of which corresponds to a non-sequential symbol pair. For each non-sequential pair the compressed document includes corresponding indicia indicating a distance between locations of the non-sequential symbols of the pair in the input document. In some instances the plurality of symbol pairs includes only those pairs of non-sequential symbols for which the distance between locations of the non-sequential symbols of the pair in the input document is less than a numeric distance cap.
CONTROL APPARATUS, PROGRAM UPDATE SYSTEM, AND PROGRAM UPDATE METHOD
A control apparatus includes a reception unit, a decompression unit, and a restoration unit. The reception unit receives distribution data containing compressed update data and a header with information designating one of a plurality of update systems. The decompression unit decompresses the update data from the received distribution data, and the restoration unit uses the decompressed update data to restore the updated program according to the update system designated in the header. The decompression unit switches its decompression method according to the update system designated in the header.
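The header-driven switch can be pictured with the Python sketch below. Everything in it is hypothetical, because the abstract specifies no encoding or compression algorithm. That includes the one-byte header layout, the two update systems (a full program image compressed with zlib, and a byte-wise XOR difference compressed with LZMA), and the helper names.

```python
import lzma
import zlib

# Hypothetical header values naming the update system.
FULL_IMAGE = 0  # update data is the complete new program
XOR_DIFF = 1    # update data is a byte-wise XOR against the current program

# Decompression method chosen per update system (assumed pairing).
DECOMPRESSORS = {FULL_IMAGE: zlib.decompress, XOR_DIFF: lzma.decompress}
COMPRESSORS = {FULL_IMAGE: zlib.compress, XOR_DIFF: lzma.compress}


def build_distribution(update_system: int, update_data: bytes) -> bytes:
    """Sender side: 1-byte header followed by the compressed update data."""
    return bytes([update_system]) + COMPRESSORS[update_system](update_data)


def receive_update(distribution: bytes, current_program: bytes) -> bytes:
    """Receiver side: decompress according to the header, then restore the program."""
    update_system, payload = distribution[0], distribution[1:]
    decompress = DECOMPRESSORS.get(update_system)
    if decompress is None:
        raise ValueError(f"unknown update system {update_system}")
    update_data = decompress(payload)  # decompression method switched by the header
    if update_system == FULL_IMAGE:
        return update_data
    if len(update_data) != len(current_program):
        raise ValueError("XOR difference must match the current program length")
    return bytes(a ^ b for a, b in zip(current_program, update_data))


old, new = b"firmware v1", b"firmware v2"
diff = bytes(a ^ b for a, b in zip(old, new))
assert receive_update(build_distribution(XOR_DIFF, diff), old) == new
assert receive_update(build_distribution(FULL_IMAGE, new), old) == new
```

Keying both the decompressor and the restoration logic off the same header field keeps the two choices consistent. A mismatched pair would otherwise produce garbage silently.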
HYBRID DATA REDUCTION
An information handling system may include at least one processor and a memory coupled to the at least one processor. The information handling system may be configured to receive data comprising a plurality of data chunks; perform deduplication on the plurality of data chunks to produce a plurality of unique data chunks; determine a compression ratio for respective pairs of the unique data chunks; determine a desired compression order for the plurality of unique data chunks based on the compression ratios; combine the plurality of unique data chunks in the desired compression order; and perform data compression on the combined plurality of unique data chunks.
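A compact Python sketch of this pipeline follows, built on several assumptions. Deduplication uses SHA-256 digests. zlib serves both for the pairwise ratio estimate and for the final compression. The compression order comes from a greedy nearest-neighbor rule that always appends the remaining chunk that compresses best together with the last one. These choices and the function names are illustrative, since the abstract does not say how the ratios become an order. To restore the data, a real system would also record chunk boundaries and the original chunk sequence. That part is omitted.

```python
import hashlib
import zlib
from typing import List


def deduplicate(chunks: List[bytes]) -> List[bytes]:
    """Keep the first occurrence of each chunk, identified by its SHA-256 digest."""
    seen, unique = set(), []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


def pair_ratio(a: bytes, b: bytes) -> float:
    """Compression ratio (raw size / compressed size) of the two chunks concatenated."""
    joined = a + b
    return len(joined) / len(zlib.compress(joined))


def greedy_order(chunks: List[bytes]) -> List[bytes]:
    """Start from the first chunk, then repeatedly append the chunk with the best pair ratio."""
    if not chunks:
        return []
    remaining = list(range(1, len(chunks)))
    order = [0]
    while remaining:
        last = chunks[order[-1]]
        best = max(remaining, key=lambda i: pair_ratio(last, chunks[i]))
        remaining.remove(best)
        order.append(best)
    return [chunks[i] for i in order]


def hybrid_reduce(chunks: List[bytes]) -> bytes:
    """Deduplicate, reorder by pairwise ratio, concatenate, then compress once."""
    return zlib.compress(b"".join(greedy_order(deduplicate(chunks))))


chunks = [b"alpha " * 20, b"omega " * 20, b"alpha " * 20, b"alphabet " * 20]
packed = hybrid_reduce(chunks)
print(len(b"".join(chunks)), len(packed))
```

The greedy order costs O(n²) trial compressions. That is acceptable for a handful of chunks but would need sampling or similarity hashing at scale.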
ADVANCED DATABASE DECOMPRESSION
A method, a system, and a computer program product for decompressing data. In response to a request to access data in a set of stored compressed blocks, one or more compressed blocks in the set are identified. String prefixes inside the identified compressed blocks are decompressed using front coding, and string suffixes are decompressed using Re-Pair decompression. Uncompressed data is then generated.
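The two decoding steps can be sketched in Python under an assumed bucket layout. The first string of a bucket is stored whole. Every later string is stored as the length of the prefix it shares with its predecessor, plus a Re-Pair-encoded suffix. The Re-Pair grammar maps each nonterminal (an integer of 256 or more) to a pair of symbols, and byte values below 256 are terminals. The layout and names are hypothetical, and locating the requested block is left out.

```python
from typing import Dict, List, Tuple

Rules = Dict[int, Tuple[int, int]]  # nonterminal -> (left symbol, right symbol)


def repair_expand(symbols: List[int], rules: Rules) -> bytes:
    """Expand a Re-Pair symbol sequence back to bytes, iteratively (no recursion)."""
    out = bytearray()
    stack = list(reversed(symbols))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)  # pushed first so `left` is expanded first
            stack.append(left)
        else:
            out.append(sym)  # terminal: a byte value 0..255
    return bytes(out)


def decode_bucket(first: bytes, entries: List[Tuple[int, List[int]]], rules: Rules) -> List[bytes]:
    """Front-coding decode: reuse `lcp` bytes of the previous string, append the expanded suffix."""
    strings = [first]
    for lcp, suffix_symbols in entries:
        prefix = strings[-1][:lcp]
        strings.append(prefix + repair_expand(suffix_symbols, rules))
    return strings


rules = {256: (ord("a"), ord("b")), 257: (256, ord("c"))}  # 257 expands to "abc"
print(decode_bucket(b"data", [(2, [ord("t"), ord("e")]), (4, [257])], rules))
# → [b'data', b'date', b'dateabc']
```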
ADVANCED DATABASE COMPRESSION
A method, a system, and a computer program product for database compression. A compressed string dictionary having a block size and a front coding bucket size is generated from a dataset. Front coding is applied to one or more buckets of strings in the dictionary, each of the front coding bucket size, to generate one or more front-coded buckets of strings. One or more portions of the front-coded buckets are concatenated to form one or more blocks of the block size. Each block is compressed, and the resulting set of compressed blocks, which together store all strings in the dataset, is stored.
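A Python sketch of the compression side follows, under these assumptions. Strings are bytes of at most 65,535 bytes. Each front-coded entry is serialized as two big-endian 16-bit fields (shared-prefix length, suffix length) followed by the suffix bytes. The concatenated buckets are cut into fixed-size blocks, so a bucket may straddle two blocks. zlib stands in for the block compressor, which the abstract does not name. All function names are hypothetical.

```python
import struct
import zlib
from typing import Iterable, List, Tuple


def front_code_bucket(bucket: List[bytes]) -> List[Tuple[int, bytes]]:
    """First string stored whole (lcp 0); each later one as (shared prefix length, suffix)."""
    entries, prev = [], b""
    for s in bucket:
        lcp = 0
        limit = min(len(prev), len(s))
        while lcp < limit and prev[lcp] == s[lcp]:
            lcp += 1
        entries.append((lcp, s[lcp:]))
        prev = s
    return entries


def compress_dictionary(strings: Iterable[bytes], bucket_size: int, block_size: int) -> List[bytes]:
    ordered = sorted(set(strings))  # front coding works best on sorted strings
    stream = bytearray()
    for start in range(0, len(ordered), bucket_size):
        for lcp, suffix in front_code_bucket(ordered[start:start + bucket_size]):
            stream += struct.pack(">HH", lcp, len(suffix)) + suffix
    # Concatenated front-coded buckets are cut into blocks of block_size bytes.
    blocks = [bytes(stream[i:i + block_size]) for i in range(0, len(stream), block_size)]
    return [zlib.compress(block) for block in blocks]


def decompress_dictionary(compressed_blocks: List[bytes]) -> List[bytes]:
    """Inverse, used here only to check the round trip."""
    data = b"".join(zlib.decompress(b) for b in compressed_blocks)
    strings, pos, prev = [], 0, b""
    while pos < len(data):
        lcp, n = struct.unpack_from(">HH", data, pos)
        pos += 4
        prev = prev[:lcp] + data[pos:pos + n]
        pos += n
        strings.append(prev)
    return strings


words = [b"apple", b"applet", b"apply", b"banana", b"band", b"bandana"]
blocks = compress_dictionary(words, bucket_size=2, block_size=8)
assert decompress_dictionary(blocks) == sorted(set(words))
```

Because every bucket restarts with a shared-prefix length of 0, a reader can begin decoding at any bucket boundary. This is what makes bucketed front coding usable for random access.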
NORMALIZED PROBABILITY DETERMINATION FOR CHARACTER ENCODING
Examples described herein relate to an apparatus comprising a central processing unit (CPU) and an encoding accelerator coupled to the CPU. The encoding accelerator comprises an entropy encoder that determines the normalized probability of occurrence of a symbol in a set of characters using normalized probability approximation circuitry, which outputs that normalized probability for lossless compression. In some examples, the normalized probability approximation circuitry includes a shifter, an adder, a subtractor, or a comparator. In some examples, the circuitry determines the normalized probability by performing non-power-of-2 division without using a Floating Point Unit (FPU). In some examples, the circuitry rounds the normalized probability to a decimal.
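Restoring binary long division is one way to model, in software, the division the circuitry performs without an FPU. It needs only shifts, subtraction, and comparison. The Python sketch below scales a symbol count to a hypothetical fixed-point precision of 12 bits. It then divides by a total that need not be a power of 2. The precision and the names are assumptions, and the rounding step is omitted.

```python
def shift_subtract_divide(dividend: int, divisor: int) -> int:
    """floor(dividend / divisor) for non-negative integers using shifts, subtraction, comparison."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("need dividend >= 0 and divisor > 0")
    quotient = 0
    remainder = 0
    for i in range(dividend.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shifter: bring down next bit
        quotient <<= 1
        if remainder >= divisor:                               # comparator
            remainder -= divisor                               # subtractor
            quotient |= 1
    return quotient


def normalized_probability(count: int, total: int, precision_bits: int = 12) -> int:
    """Fixed-point probability count/total, scaled to 2**precision_bits and truncated."""
    return shift_subtract_divide(count << precision_bits, total)


counts = {"a": 5, "b": 3, "c": 1}  # total 9, not a power of 2
total = sum(counts.values())
print({sym: normalized_probability(c, total) for sym, c in counts.items()})
# → {'a': 2275, 'b': 1365, 'c': 455}
```

The truncated values above sum to 4095 rather than 4096. An encoder that needs an exact total would redistribute the remainder, which this sketch does not do.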
REDUCING A SIZE OF MULTIPLE DATA SETS
A computing device may select a plurality of data sets, determine a set of strings that are each included in at least two of the data sets, and select a particular string from that set. The computing device may assign a reference to the particular string and replace each occurrence of the particular string in the plurality of data sets with the reference. This creates a modified plurality of data sets that is smaller in size than the original plurality of data sets. The computing device may store the reference and the particular string in a table.
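The following is a minimal Python sketch, and it makes several assumptions. Each data set is a list of string tokens, and size is measured in total characters. The selected string is the shared one whose replacement saves the most characters. References take the hypothetical form "@0". The sketch also assumes that no data set already contains a token equal to a reference; a real format would need an escape mechanism.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

DataSets = List[List[str]]


def strings_in_two_or_more(data_sets: DataSets) -> Set[str]:
    """Tokens that appear in at least two distinct data sets."""
    presence = Counter(tok for ds in data_sets for tok in set(ds))
    return {tok for tok, n in presence.items() if n >= 2}


def total_size(data_sets: DataSets) -> int:
    """Size measured as total characters across all tokens (assumed metric)."""
    return sum(len(tok) for ds in data_sets for tok in ds)


def reduce_data_sets(data_sets: DataSets) -> Tuple[DataSets, Dict[str, str]]:
    """Replace the most profitable shared string with a reference; return data sets and table."""
    occurrences = Counter(tok for ds in data_sets for tok in ds)
    ref = "@0"  # hypothetical reference format

    def saving(s: str) -> int:
        return occurrences[s] * (len(s) - len(ref))

    candidates = sorted(strings_in_two_or_more(data_sets))  # sorted for a deterministic tie-break
    best = max(candidates, key=saving, default=None)
    if best is None or saving(best) <= 0:
        return [list(ds) for ds in data_sets], {}
    modified = [[ref if tok == best else tok for tok in ds] for ds in data_sets]
    return modified, {ref: best}


sets = [["compression", "a"], ["compression", "b"], ["a", "c"]]
modified, table = reduce_data_sets(sets)
print(modified, table, total_size(sets), total_size(modified))
```

Here "a" is also shared, but swapping a one-character token for a two-character reference would grow the data. The saving check therefore skips it in favor of "compression".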
Method, apparatus, system, and computer program product for data compression
According to one aspect of the present application, a method for data compression comprises: creating a first trie for a first set of strings, the first set of strings comprising a plurality of raw data strings, wherein a trie consists of a plurality of nodes linked through parent-child relations, each edge of the trie is labeled with at least one character, and each edge corresponds to a state transition from its parent node to its child node; collecting the edges of the first trie that are longer than a predetermined length as a first subset of strings; segmenting a string in the first subset into two or more fragments when the string satisfies a predetermined condition, and collecting all segmented fragments and all unsegmented strings of the first subset as a segmented set of strings; and storing the first set of strings using the first trie and the segmented set of strings, thereby compressing the raw data strings.
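The first three steps can be sketched in Python with a radix trie, in which an edge label may hold several characters. The abstract leaves the "predetermined length" and the "predetermined condition" open. This sketch therefore invents both: an edge longer than max_len is cut into max_len-character fragments, and the threshold values are hypothetical. How the trie and fragments are then serialized is not shown.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self) -> None:
        self.edges: Dict[str, Tuple[str, "Node"]] = {}  # first char -> (edge label, child)
        self.terminal = False  # True if a stored string ends here


def insert(root: Node, s: str) -> None:
    node = root
    while s:
        entry = node.edges.get(s[0])
        if entry is None:  # no edge starts with this character: add a leaf edge
            leaf = Node()
            leaf.terminal = True
            node.edges[s[0]] = (s, leaf)
            return
        label, child = entry
        k = 0
        while k < min(len(label), len(s)) and label[k] == s[k]:
            k += 1
        if k < len(label):  # split the edge at the first mismatch
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[s[0]] = (label[:k], mid)
            child = mid
        node, s = child, s[k:]
    node.terminal = True


def contains(root: Node, s: str) -> bool:
    node = root
    while s:
        entry = node.edges.get(s[0])
        if entry is None or not s.startswith(entry[0]):
            return False
        s, node = s[len(entry[0]):], entry[1]
    return node.terminal


def long_edges(root: Node, min_len: int) -> List[str]:
    """Collect edge labels longer than min_len (the 'first subset of strings')."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        for label, child in node.edges.values():
            if len(label) > min_len:
                found.append(label)
            stack.append(child)
    return found


def segment(edges: List[str], max_len: int) -> List[str]:
    """Cut edges longer than max_len into max_len-sized fragments; keep the rest whole."""
    out: List[str] = []
    for e in edges:
        if len(e) > max_len:
            out.extend(e[i:i + max_len] for i in range(0, len(e), max_len))
        else:
            out.append(e)
    return out


root = Node()
for word in ["compression", "compressor", "computer"]:
    insert(root, word)
print(sorted(long_edges(root, min_len=3)))             # → ['comp', 'ress', 'uter']
print(segment(sorted(long_edges(root, min_len=3)), 3))  # → ['com', 'p', 'res', 's', 'ute', 'r']
```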