Patent classifications
H03M7/3077
Memory system and information processing system
A memory system includes a nonvolatile memory, an interface circuit, and a controller. The controller is configured to, upon receipt of a plurality of write commands for storing write data in the nonvolatile memory via the interface circuit, acquire compression-ratio information about the write data associated with each write command, determine a compression ratio of each write data based on the acquired compression-ratio information, and determine an execution order of the write commands based on the determined compression ratios.
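As a rough illustration of the ordering step described above, the following Python sketch sorts queued write commands by compression ratio. The `WriteCommand` fields, the use of a compressed-size hint as the compression-ratio information, and the choice to run the most compressible data first are all assumptions made for the example; the abstract does not state which ordering policy is used.

```python
from dataclasses import dataclass


@dataclass
class WriteCommand:
    command_id: int
    raw_size: int         # bytes of uncompressed write data
    compressed_size: int  # hypothetical hint standing in for compression-ratio information


def compression_ratio(cmd: WriteCommand) -> float:
    # Ratio of compressed size to raw size; smaller means more compressible.
    return cmd.compressed_size / cmd.raw_size


def order_commands(commands):
    # Assumed policy: execute the most compressible write data first.
    return sorted(commands, key=compression_ratio)


queue = [
    WriteCommand(1, 4096, 4000),
    WriteCommand(2, 4096, 1024),
    WriteCommand(3, 4096, 2048),
]
ordered_ids = [cmd.command_id for cmd in order_commands(queue)]
print(ordered_ids)
```

A real controller would likely weigh other factors as well, such as queue depth or buffer occupancy; this sketch isolates the ratio-based sort only.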
WARM START FILE COMPRESSION USING SEQUENCE ALIGNMENT
Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. The result is a compression matrix, where each row of the matrix corresponds to part of the file. The compression matrix may also serve as a warm start if additional compression is desired. Compression may be performed in stages, where an initial compression matrix is generated in a first stage using larger letter sizes for alignment and then a second compression stage is performed using smaller letter sizes. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus matrix. The compressed file includes the pointer pairs and the consensus sequence.
Content-adaptive tiling solution via image similarity for efficient image compression
Techniques are provided herein for more efficiently storing images that have a common subject, such as product images that share the same product in the image. Each image undergoes an adaptive tiling procedure to split the image into a plurality of tiles, with each tile identifying a region of the image having pixels with the same content. The tiles across multiple images can then be clustered together and those tiles having identical content are removed. Once all duplicate tiles have been removed from the set of all tiles across the images, the tiles are once again clustered based on their encoding scheme and certain encoding parameters. Tiles within each cluster are compressed using the best compression technique for the tiles in each corresponding cluster. By removing duplicative tile content between numerous images of the same subject, the total amount of data that needs to be stored is reduced.
CODE TABLE GENERATION DEVICE, MEMORY SYSTEM, AND CODE TABLE GENERATION METHOD
According to one embodiment, a code table generation device includes a table generation unit, a merge unit and a tree generation unit. The table generation unit generates a frequency table including symbols and frequencies of occurrence respectively associated with the symbols, based on a frequency of occurrence for each symbol of input symbols. The merge unit acquires top K symbols in descending order of the frequencies of occurrence and remaining symbols from the symbols, divides the remaining symbols into one or more symbol sets, and determines a frequency of occurrence associated with a root node of each of subtrees corresponding to the respective symbol sets. The tree generation unit generates a Huffman tree using the K symbols and the root node of each of the subtrees.
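The top-K idea above can be sketched in Python as follows. This is an illustrative approximation, not the patented method: the function name `code_lengths`, the fixed `group_size` used to divide the remaining symbols, the use of the summed frequency as each subtree root's frequency, and the fixed-length codes assumed inside each subtree are all assumptions of the example. It returns a code length per symbol rather than the codes themselves.

```python
import heapq
from collections import Counter
from itertools import count


def code_lengths(data, k, group_size):
    """Return {symbol: code length} using top-K symbols plus subtree roots."""
    freq = Counter(data)
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ranked[:k], ranked[k:]
    # Assumption: the remaining symbols are split into fixed-size sets.
    groups = [rest[i:i + group_size] for i in range(0, len(rest), group_size)]

    tie = count()  # unique tie-breaker so heap entries never compare their lists
    heap = [(f, next(tie), [(s, 0)]) for s, f in top]
    for group in groups:
        # Assumption: a balanced subtree, so every member sits at the same depth.
        depth = (len(group) - 1).bit_length()
        root_freq = sum(f for _, f in group)
        heap.append((root_freq, next(tie), [(s, depth) for s, _ in group]))
    heapq.heapify(heap)

    # Standard Huffman merging over the K symbols and the subtree roots.
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = [(s, d + 1) for s, d in a + b]
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return dict(heap[0][2])


lengths = code_lengths("aaaaaaaabbbbccdefg", k=2, group_size=4)
print(lengths)
```

Because only K symbols plus a handful of subtree roots enter the heap, the merge loop stays short even for large alphabets, at the cost of slightly longer codes for the rare symbols.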
Data compression using dictionaries
Data units of a dataset may be compressed by clustering the data units into clusters, selecting a reference unit for each unit cluster, and compressing data units of each unit cluster using the reference unit of the unit cluster as a dictionary. The computational efficiency of the clustering algorithm may be improved by not applying it to data units themselves, but rather to hash values of the data units, where the hash values have a much smaller size than the data units. The hash function may be a locality-sensitive hash (LSH) function. The reference unit of a cluster may be determined in any of a variety of ways, for example, by selecting a centroid or exemplar of the cluster. Clusters, including their reference values, may be indexed in a cluster index (e.g., a Faiss index), which may be searched to assign future added or modified data units to clusters.
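A minimal Python sketch of the "reference unit as dictionary" step is shown below, using the preset-dictionary (`zdict`) support in the standard `zlib` module. The clustering and LSH stages are omitted, and the sample `reference` and `unit` byte strings are invented for the example; the abstract does not say that DEFLATE is the compressor used.

```python
import zlib


def compress_with_reference(unit: bytes, reference: bytes) -> bytes:
    # The cluster's reference unit is supplied as a preset dictionary.
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(unit) + comp.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    # The same reference unit must be available when decompressing.
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical data: one reference unit and a similar unit from the same cluster.
reference = b'{"sensor": "temp-01", "unit": "celsius", "status": "ok", "value": 21.5}'
unit = b'{"sensor": "temp-02", "unit": "celsius", "status": "ok", "value": 22.1}'

blob = compress_with_reference(unit, reference)
restored = decompress_with_reference(blob, reference)
print(len(unit), len(blob))
```

The closer a unit is to its cluster's reference, the more of it can be encoded as back-references into the dictionary, which is why cluster assignment quality matters for the final ratio.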
Tensor dropout in a neural network
A method for selectively dropping out feature elements from a tensor in a neural network includes receiving a first tensor from a first layer of a neural network and obtaining a compressed mask for the first tensor. N mask bits of the compressed mask are received at each of N lanes of a reconfigurable computing unit and feature elements of the first tensor are respectively received at the N lanes. Feature elements are selectively dropped out from the first tensor to generate feature elements to use as at least part of a second tensor by selecting, based on a single mask bit of the compressed mask selected based on the lane, either a zero value or a feature element received at the lane for a feature element of the second tensor. The second tensor is propagated to a second layer of the neural network.
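The per-lane selection described above might look like the following in plain Python. This models one step over N lanes in software only; the function name `dropout_step`, the packing of the N mask bits into a single integer, and the convention that a set bit means "keep" are assumptions for illustration, not details taken from the abstract.

```python
def dropout_step(lane_values, mask_word):
    """Apply one N-bit mask word to the feature elements on N lanes."""
    output = []
    for lane, value in enumerate(lane_values):
        # Every lane sees the whole mask word and picks its own bit by lane index.
        keep = (mask_word >> lane) & 1
        # Assumed convention: bit 1 keeps the feature element, bit 0 drops it.
        output.append(value if keep else 0.0)
    return output


first_tensor = [1.5, 2.5, 3.5, 4.5]
second_tensor = dropout_step(first_tensor, 0b0101)
print(second_tensor)
```

Packing the mask this way costs one bit per feature element instead of a full element-sized mask value, which is the sense in which the mask is compressed here.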
Cluster-based data compression for AI training on the cloud for an edge network
A disclosed information handling system includes an edge device communicatively coupled to a cloud computing resource. The edge device is configured to respond to receiving, from an internet of things (IoT) unit, a numeric value for a parameter of interest by determining a compressed encoding for the numeric value in accordance with a non-lossless compression algorithm. The edge device transmits the compressed encoding of the numeric value to the cloud computing resource. The cloud computing resource includes a decoder communicatively coupled to the encoder and configured to respond to receiving the compressed encoding by generating a surrogate for the numeric value. The surrogate may be generated in accordance with a probability distribution applicable to the parameter of interest. The compression algorithm may be a clustering algorithm such as a k-means clustering algorithm.
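The edge-to-cloud exchange above can be sketched as a pair of Python functions. The sketch assumes the cluster centroids were already fitted (for example by k-means) and are shared by both sides, and it uses the centroid itself as the surrogate; the abstract instead allows the surrogate to be drawn from a probability distribution. The names `encode`, `decode`, and `centroids` are hypothetical.

```python
def encode(value, centroids):
    # Edge side: send only the index of the nearest centroid (lossy).
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def decode(index, centroids):
    # Cloud side: produce a surrogate value for the original reading.
    # Simplification: the centroid is returned directly.
    return centroids[index]


centroids = [10.0, 20.0, 30.0]  # assumed to be pre-trained and shared
reading = 21.7
code = encode(reading, centroids)
surrogate = decode(code, centroids)
print(code, surrogate)
```

With k centroids, each reading is transmitted as an index that needs only about log2(k) bits, regardless of the precision of the original numeric value.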
REORDERING DATASETS IN A TABLE FOR INCREASED COMPRESSION RATIO
Tables are selected for compression based on threshold statistical values. Identified tables are reordered according to fields having the lowest cardinality to increase the size of character strings replaced by keys during compression. Field locations are mapped between the original table and the reordered table. Dictionary-based compression is performed on reordered tables.
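One possible reading of the reordering step is sketched below in Python: fields are moved so that the lowest-cardinality ones come first, and a mapping from original to new field positions is kept. Interpreting "reordered" as a column reordering is an assumption, as are the function name `reorder_by_cardinality` and the sample table.

```python
def reorder_by_cardinality(header, rows):
    # Cardinality = number of distinct values observed in each field.
    cardinality = [len({row[i] for row in rows}) for i in range(len(header))]
    # Stable sort keeps the original order among fields with equal cardinality.
    order = sorted(range(len(header)), key=lambda i: cardinality[i])
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    # Map each original field position to its position in the reordered table.
    mapping = {orig: new for new, orig in enumerate(order)}
    return new_header, new_rows, mapping


header = ["id", "country", "status"]
rows = [[1, "US", "ok"], [2, "US", "ok"], [3, "DE", "ok"]]
new_header, new_rows, mapping = reorder_by_cardinality(header, rows)
print(new_header, mapping)
```

Placing low-cardinality fields next to each other tends to create longer repeated runs across rows, which gives a dictionary-based compressor longer strings to replace with a single key.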
METHOD FOR COMPRESSION OF TIME TAGGED DATA FROM TIME CORRELATED SINGLE PHOTON COUNTING
A computer-implemented method for compression of Time Tagged data including Time Tagged data records includes the step of separating the Time Tagged data records into a plurality of groups. The method also includes sorting the Time Tagged data records in at least one of the groups by a photon arrival time. The method also includes subtracting a content of an adjacent record from a content of a record, resulting in modified records. The method also includes compressing the modified records with a compression method.
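The sort, subtract, and compress steps above can be sketched for a single group as follows. The example assumes each record is just a non-negative integer arrival time, packs the differences as 64-bit little-endian integers, and uses `zlib` as the final compression method; none of these choices comes from the abstract, and the grouping step is left out.

```python
import struct
import zlib


def compress_time_tags(arrival_times):
    ordered = sorted(arrival_times)
    # Keep the first value, then store each record minus its predecessor.
    deltas = ordered[:1] + [b - a for a, b in zip(ordered, ordered[1:])]
    raw = struct.pack(f"<{len(deltas)}Q", *deltas)
    return zlib.compress(raw)


def decompress_time_tags(blob):
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}Q", raw)
    times, total = [], 0
    for delta in deltas:
        total += delta  # a running sum undoes the subtraction
        times.append(total)
    return times


tags = [1050, 1000, 1003, 1020]
restored_tags = decompress_time_tags(compress_time_tags(tags))
print(restored_tags)
```

After sorting, the differences are small and repetitive compared with the absolute time tags, so a general-purpose compressor has far more redundancy to exploit. Note that the original record order is not preserved by this sketch.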
Data compression for columnar databases into arbitrarily-sized persistent pages
A method for compressing columnar data may include generating, for a data column included in a data chunk, a dictionary enumerating, in a sorted order, a first set of unique values included in the data column. A compression technique for generating a compressed representation of the data column having a fewest quantity of bytes may be identified based at least on the dictionary. The compression technique may include a dictionary compression applying the dictionary and/or another compression technique. A compressed data chunk may be generated by applying the compression technique to compress the data column included in the data chunk. The compressed data chunk may be stored at a database in a variable-size persistent page whose size is allocated based on the size of the compressed representation of the data column. Related systems and articles of manufacture are also provided.
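The dictionary step and the "fewest bytes" selection above can be illustrated with a short Python sketch. It compares only two candidates, dictionary encoding and storing the strings as-is, using a simplified byte estimate that ignores length prefixes and page overhead. The function names and the size model are assumptions for the example.

```python
def dictionary_encode(column):
    # Sorted dictionary of unique values; each value is replaced by its position.
    dictionary = sorted(set(column))
    index = {value: i for i, value in enumerate(dictionary)}
    codes = [index[value] for value in column]
    return dictionary, codes


def choose_technique(column):
    dictionary, codes = dictionary_encode(column)
    plain_bytes = sum(len(value.encode()) for value in column)
    # Bytes needed per code, rounded up to whole bytes (at least one).
    code_width = max(1, ((len(dictionary) - 1).bit_length() + 7) // 8)
    dict_bytes = sum(len(v.encode()) for v in dictionary) + code_width * len(codes)
    return "dictionary" if dict_bytes < plain_bytes else "plain"


column = ["red", "blue", "red", "red", "blue", "green"]
dictionary, codes = dictionary_encode(column)
technique = choose_technique(column)
print(dictionary, codes, technique)
```

Because the dictionary is sorted, the integer codes preserve the ordering of the original values, which can let range comparisons run on the codes without decoding them.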