H03M7/6088

Compression strategy selection powered by machine learning

A data compression system comprising computer memory that stores plural compression algorithms and a hardware processor that applies one or more of the compression algorithms to incoming data items, wherein the hardware processor selects, from among the plural compression algorithms, the compression algorithm to apply to one or more individual data items among the incoming data items, depending at least on the individual data item.
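Below is a minimal sketch of per-item selection. It makes several assumptions: a nearest-centroid classifier stands in for the machine-learning model, two hypothetical features (normalized byte entropy and zero-byte fraction) describe each item, and Python's standard zlib, bz2 and lzma codecs play the role of the stored algorithms. Training labels come from compressing each training item with every codec and keeping the smallest result. The abstract specifies none of these choices.

```python
import bz2
import lzma
import math
import zlib
from collections import Counter

# The "plural compression algorithms" held in memory (assumed set).
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
DECODERS = {"zlib": zlib.decompress, "bz2": bz2.decompress, "lzma": lzma.decompress}

def features(item: bytes) -> tuple:
    """Hypothetical feature vector: entropy scaled to 0..1 and fraction of zero bytes."""
    n = len(item) or 1
    counts = Counter(item)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return (entropy / 8.0, counts.get(0, 0) / n)

def best_codec(item: bytes) -> str:
    """Training label: the codec that actually produces the smallest output."""
    return min(CODECS, key=lambda name: len(CODECS[name](item)))

class NearestCentroidSelector:
    """Stand-in for the ML model: predicts the codec whose feature centroid is closest."""

    def fit(self, items):
        sums, counts = {}, Counter()
        for item in items:
            label, f = best_codec(item), features(item)
            s = sums.setdefault(label, [0.0] * len(f))
            for i, v in enumerate(f):
                s[i] += v
            counts[label] += 1
        self.centroids = {k: tuple(v / counts[k] for v in s) for k, s in sums.items()}
        return self

    def select(self, item: bytes) -> str:
        f = features(item)
        return min(self.centroids,
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(f, self.centroids[k])))

    def compress(self, item: bytes):
        name = self.select(item)  # per-item choice, no trial compression needed
        return name, CODECS[name](item)
```

After training, `compress` computes only two cheap features per item instead of running every codec. That saving is the reason to learn the selection rather than brute-force it.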

METHOD AND APPARATUS FOR COMPACTION OF DATA RECEIVED OVER A NETWORK
20170041419 · 2017-02-09

Methods, apparatuses, and storage media associated with compaction of data from one or more computing devices are disclosed. In various embodiments, one or more Internet of Things (IoT) devices may transmit information to a computing system. The computing system may group together raw data received from these one or more IoT devices based on a shared attribute. The computing system may select a compaction scheme to represent the knowledge conveyed by a group of the raw data. The computing system may apply this compaction scheme to the group of raw data to generate data that is representative of the group of raw data. Other embodiments may be disclosed or claimed.
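The sketch below shows one way to group readings and compact each group. It assumes readings arrive as dicts with hypothetical `device_id` and `value` keys, and it uses two made-up compaction schemes. A group whose readings are all identical collapses to one value and a count. Any other group collapses to a min/max/mean summary. The summary scheme is lossy, which matches the abstract's goal of representing the knowledge a group conveys rather than every raw sample.

```python
from collections import defaultdict
from statistics import mean

def group_by_attribute(readings, attribute):
    """Group raw readings on a shared attribute, e.g. the reporting device."""
    groups = defaultdict(list)
    for r in readings:
        groups[r[attribute]].append(r["value"])
    return dict(groups)

def compact(values):
    """Pick a (hypothetical) compaction scheme for one group and apply it."""
    if len(set(values)) == 1:
        # Every reading conveys the same knowledge: keep one value and a count.
        return {"scheme": "constant", "value": values[0], "count": len(values)}
    # Otherwise keep a statistical summary instead of every raw sample.
    return {"scheme": "summary", "min": min(values), "max": max(values),
            "mean": mean(values), "count": len(values)}

def compact_stream(readings, attribute="device_id"):
    """Map each group key to the compacted representation of its raw data."""
    return {key: compact(vals)
            for key, vals in group_by_attribute(readings, attribute).items()}
```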

Real-time reduction of CPU overhead for data compression

Real-time reduction of CPU overhead for data compression is performed by a processor device in a computing environment. Non-compressing heuristics are applied to a randomly selected data sample from data sequences to determine whether to compress the data sequences. A compression potential is calculated from the non-compressing heuristics and compared to threshold values. The data sequences are then compressed if the compression threshold is met, compressed using Huffman coding if the Huffman coding threshold is met, or otherwise stored without compression.
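The following sketches the decision flow. It uses the order-0 byte entropy of a randomly placed window as the non-compressing heuristic and `1 - entropy/8` as the compression potential. The abstract names neither, and both threshold values are illustrative. Python's `zlib.Z_HUFFMAN_ONLY` strategy stands in for the Huffman-coding path.

```python
import math
import random
import zlib
from collections import Counter

def sample_entropy(data: bytes, sample_size: int = 4096, rng=random) -> float:
    """Non-compressing heuristic: byte entropy (bits/byte) of a randomly placed sample."""
    if len(data) > sample_size:
        start = rng.randrange(len(data) - sample_size + 1)
        data = data[start:start + sample_size]
    n = len(data)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def compress_adaptively(data: bytes, compress_threshold: float = 0.4,
                        huffman_threshold: float = 0.1):
    """Return (mode, payload); both thresholds are illustrative assumptions."""
    potential = 1.0 - sample_entropy(data) / 8.0  # 0: looks random, 1: highly redundant
    if potential >= compress_threshold:
        return "deflate", zlib.compress(data, 6)
    if potential >= huffman_threshold:
        # Huffman-only deflate skips LZ77 string matching, which is typically cheaper.
        c = zlib.compressobj(6, zlib.DEFLATED, 15, 8, zlib.Z_HUFFMAN_ONLY)
        return "huffman", c.compress(data) + c.flush()
    return "stored", data
```

Only the sample is inspected before the decision. Incompressible-looking data therefore never reaches a compressor at all.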

BINARIZATION OF DQP USING SEPARATE ABSOLUTE VALUE AND SIGN (SAVS) IN CABAC
20250071278 · 2025-02-27

Video coding systems or apparatus that use context-based adaptive binary arithmetic coding (CABAC) during encoding and/or decoding are configured according to the invention with an enhanced binarization of non-zero delta-QP (dQP). During binarization, the absolute value of dQP and its sign are encoded separately using unary coding and then combined into a binary string that also contains the dQP non-zero flag. This invention exploits the statistical symmetry of positive and negative dQP values, which saves bits and thus yields higher coding efficiency.
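This bin-level sketch of SAVS binarization assumes a specific bin order: the non-zero flag, then the unary code of |dQP| - 1, then one sign bin (0 for positive). Both the order and the sign convention are assumptions. A real CABAC encoder would pass these bins to its context modeling and arithmetic coding stage, which is not shown. Because +k and -k share the same magnitude bins and differ only in the sign bin, the scheme relies on positive and negative values being roughly equally likely.

```python
def binarize_dqp(dqp: int) -> str:
    """Hypothetical SAVS bin string: non-zero flag, unary(|dQP| - 1), sign bin."""
    if dqp == 0:
        return "0"                      # non-zero flag = 0, nothing else is sent
    unary = "1" * (abs(dqp) - 1) + "0"  # |dQP| >= 1, so code |dQP| - 1, 0-terminated
    sign = "1" if dqp < 0 else "0"      # a single sign bin, 0 = positive
    return "1" + unary + sign

def debinarize_dqp(bins: str) -> int:
    """Inverse of binarize_dqp."""
    if bins[0] == "0":
        return 0
    i = 1
    while bins[i] == "1":               # count the unary ones
        i += 1
    magnitude = i                       # (i - 1) ones encode |dQP| - 1
    return -magnitude if bins[i + 1] == "1" else magnitude
```

Under these assumptions, dQP = 0 gives `0`, +1 gives `100`, and -2 gives `1101`.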

Data inspection for compression/decompression configuration and data type determination
12346835 · 2025-07-01

Distribution of data in a neural network data set is used to determine an optimal compressor configuration for compressing the neural network data set and/or the underlying data type of the neural network data set. By examining the data before the compressor is invoked (a generalizable optimization), the example non-limiting technology herein makes it possible to tune a compressor to better target the incoming data. For sparse data compression, this step may involve examining the distribution of data (e.g., in one example, zeros in the data). For other algorithms, it may involve other types of inspection. This changes the fundamental behavior of the compressor itself. By inspecting the distribution of data (e.g., zeros in the data), it is also possible to predict the data width of the underlying data very accurately. This is useful because the data type is not always known a priori, and lossy compression algorithms useful for deep learning depend on knowing the true data type to achieve good compression rates.
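The following is one possible zero-distribution heuristic, and it is entirely hypothetical. In a sparse tensor, zero elements appear as runs of zero bytes aligned to the element width. The sketch takes as the data width the largest candidate width whose aligned all-zero words account for nearly all zero bytes. The resulting sparsity then selects between a zero-value (sparse) configuration and a dense one. Candidate widths, thresholds and configuration names are all assumptions. Widths are capped at 4 bytes because, in very sparse data, adjacent zero elements would also make wider words look all-zero.

```python
def zero_word_fraction(data: bytes, width: int) -> float:
    """Share of zero bytes that sit inside fully-zero, width-aligned words."""
    zero_bytes = data.count(0)
    if zero_bytes == 0:
        return 0.0
    usable = len(data) - len(data) % width
    in_zero_words = sum(width for i in range(0, usable, width) if not any(data[i:i + width]))
    return in_zero_words / zero_bytes

def infer_element_width(data: bytes, candidates=(1, 2, 4), threshold=0.9) -> int:
    """Largest candidate width whose aligned zero words explain almost all zero bytes."""
    width = 1
    for w in sorted(candidates):
        if zero_word_fraction(data, w) >= threshold:
            width = w
    return width

def choose_config(data: bytes, sparsity_threshold: float = 0.3) -> dict:
    """Pick a hypothetical compressor configuration from the inspected zero distribution."""
    width = infer_element_width(data)
    n = len(data) // width
    zeros = sum(1 for i in range(0, n * width, width) if not any(data[i:i + width]))
    sparsity = zeros / n if n else 0.0
    mode = "zero-value" if sparsity >= sparsity_threshold else "dense"
    return {"element_width": width, "sparsity": sparsity, "mode": mode}
```

The check fails for int8 data because isolated zero bytes rarely pair up into aligned 2-byte zero words. It passes for float32 data because a zero element always contributes four aligned zero bytes.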

System and method for auto-configurable data compression framework
12380967 · 2025-08-05

A method (100) for compressing and decompressing a data file, comprising: (i) receiving (120) a data file for compression comprising a plurality of different attributes; (ii) identifying (130) a first attribute of the plurality of different attributes; (iii) selecting (140) a plurality of compression types and/or configurations; (iv) compressing (150) at least some of the data from the received data file for the identified first attribute using each of the selected plurality of compression types and/or configurations; (v) determining (160) which one of the selected plurality of compression types and/or configurations is most suitable for compression; (vi) generating (170) a compression parameter data structure comprising an identification of the selected plurality of compression types and/or configurations; (vii) compressing (180) the data from the received data file for the first attribute to generate a compressed data file; and (viii) storing (190) the compression parameter data structure and the compressed data file.
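Here is a compact sketch of steps (ii) to (viii). It assumes each attribute is already a separate byte string (a column) and uses Python's zlib, bz2 and lzma at fixed levels as the candidate types/configurations. It also reads "most suitable" as the smallest trial output on a leading sample. The JSON parameter structure and its field names are hypothetical.

```python
import bz2
import json
import lzma
import zlib

# Candidate compression types/configurations: name -> (encoder, decoder). Assumed set.
CANDIDATES = {
    "zlib-1": (lambda b: zlib.compress(b, 1), zlib.decompress),
    "zlib-9": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "bz2-9": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def compress_file(columns: dict, sample_bytes: int = 64 * 1024):
    """columns maps attribute name -> bytes; returns (parameter JSON, compressed columns)."""
    params, compressed = {}, {}
    for attr, data in columns.items():                        # (ii) each attribute
        sample = data[:sample_bytes]                          # (iv) trial on a sample only
        trial = {name: len(enc(sample)) for name, (enc, _) in CANDIDATES.items()}
        best = min(trial, key=trial.get)                      # (v) smallest output wins
        params[attr] = {"codec": best, "trial_sizes": trial}  # (vi) parameter structure
        compressed[attr] = CANDIDATES[best][0](data)          # (vii) compress attribute
    return json.dumps(params), compressed                     # (viii) store both

def decompress_file(params_json: str, compressed: dict) -> dict:
    """Use the stored parameter structure to pick each attribute's decoder."""
    params = json.loads(params_json)
    return {attr: CANDIDATES[params[attr]["codec"]][1](blob)
            for attr, blob in compressed.items()}
```

Storing the chosen codec per attribute is what lets decompression proceed without repeating the trials.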

Methods and apparatus to perform weight and activation compression and decompression

Methods, apparatus, systems, and articles of manufacture to perform weight and activation compression and decompression are disclosed. An example apparatus includes memory, instructions in the apparatus, and processor circuitry to execute the instructions to perform a compression operation that obtains compressed data corresponding to weights in a weight matrix, and to determine meta-data associated with the weight matrix, where a first portion of the meta-data indicates whether the weight matrix is compressed, a second portion indicates a cache size of the compressed data, and a third portion indicates the compression operation executed to obtain the compressed data.
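This sketch of the three-part metadata uses an assumed 16-bit layout. Bit 15 is the compressed flag, bits 12 to 14 identify the compression operation, and bits 0 to 11 hold the compressed size in 64-byte cache lines. Reading "cache size" as a cache-line count is an assumption. zlib stands in for the compression operation, and the matrix is stored raw when compression would not shrink it.

```python
import struct
import zlib

CACHE_LINE = 64                 # assumed cache-line size in bytes
OPS = {0: "none", 1: "zlib"}    # hypothetical compression-operation IDs

def compress_weights(weights: list):
    """Compress a float32 weight matrix (list of rows) and build a 16-bit metadata word."""
    raw = b"".join(struct.pack(f"<{len(row)}f", *row) for row in weights)
    packed = zlib.compress(raw, 9)
    if len(packed) < len(raw):
        data, op, compressed = packed, 1, True
    else:
        data, op, compressed = raw, 0, False   # not worth compressing: store raw
    lines = -(-len(data) // CACHE_LINE)        # ceil(len(data) / CACHE_LINE)
    if lines >= 1 << 12:
        raise ValueError("size does not fit the assumed 12-bit field")
    meta = (int(compressed) << 15) | (op << 12) | lines
    return meta, data

def parse_metadata(meta: int) -> dict:
    """Split the metadata word back into its three portions."""
    return {
        "compressed": bool(meta >> 15 & 1),    # first portion
        "cache_lines": meta & 0xFFF,           # second portion
        "operation": OPS[meta >> 12 & 0x7],    # third portion
    }
```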

System and method for file type identification using machine learning

A system and method for file type identification involving extraction of a file-print of a file, the file-print being a unique or practically-unique representation of statistical characteristics associated with the distribution of bits in the binary contents of the file, similar to a fingerprint. The file-print is then passed to a machine learning algorithm that has been trained to recognize file types from their file-prints. The machine learning algorithm returns a predicted file type and, in some cases, a probability of correctness of the prediction. The file may then be encoded using an encoding algorithm chosen based on the predicted file type.
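The pipeline can be sketched as follows. The file-print is a normalized 256-bin byte histogram. The classifier is nearest-centroid, with a softmax over distances as a crude probability of correctness. A hypothetical two-type table maps predictions to encoders. The abstract specifies neither the statistics nor the model, so richer bit-level statistics and a properly trained classifier would replace both.

```python
import lzma
import math
from collections import Counter

TEMPERATURE = 0.02  # assumed softmax temperature for the confidence estimate

def file_print(data: bytes) -> list:
    """Hypothetical file-print: normalized histogram of byte values."""
    n = len(data) or 1
    counts = Counter(data)
    return [counts.get(b, 0) / n for b in range(256)]

class FilePrintClassifier:
    """Nearest-centroid stand-in for the trained model."""

    def fit(self, labelled):
        groups = {}
        for data, label in labelled:
            groups.setdefault(label, []).append(file_print(data))
        self.centroids = {k: [sum(col) / len(v) for col in zip(*v)] for k, v in groups.items()}
        return self

    def predict(self, data: bytes):
        """Return (predicted type, crude probability of correctness)."""
        fp = file_print(data)
        dist = {k: math.dist(fp, c) for k, c in self.centroids.items()}
        weights = {k: math.exp(-d / TEMPERATURE) for k, d in dist.items()}
        label = min(dist, key=dist.get)
        return label, weights[label] / sum(weights.values())

# Encoder chosen from the predicted type (assumed mapping):
# text gets LZMA, data that already looks random is stored as-is.
ENCODERS = {"text": lzma.compress, "random": bytes}

def encode(clf: FilePrintClassifier, data: bytes):
    label, confidence = clf.predict(data)
    return label, confidence, ENCODERS[label](data)
```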

Data processing method and apparatus

This application provides a data processing method and apparatus. The method is applied to a storage system that includes a storage apparatus and a processing apparatus, and is performed by the processing apparatus. The method includes: obtaining a tiered storage feature and a data feature of first data, where the tiered storage feature includes at least one of the following features: an importance, an access frequency, or a retention time, and the data feature includes at least one of the following features: a data type, a data dimension, a data size, or a data content feature; determining a first compression algorithm based on the tiered storage feature and the data feature; and compressing the first data based on the first compression algorithm, to obtain compressed data.
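Here is a rule-based sketch of the selection step, using hypothetical feature scales, thresholds and algorithm names. Hot or important data gets fast zlib level 1. Cold, long-retention data gets LZMA. Already-compressed media types and tiny objects are stored as-is. The abstract does not say how the features are combined, so this policy is only illustrative. A learned model could replace `choose_algorithm` without changing the surrounding code.

```python
import lzma
import zlib
from dataclasses import dataclass

@dataclass
class TierFeature:
    importance: int        # assumed scale: 0 (low) .. 2 (high)
    access_per_day: float  # access frequency
    retention_days: int    # retention time

@dataclass
class DataFeature:
    data_type: str         # e.g. "text", "log", "jpeg"
    size: int              # bytes

ALREADY_COMPRESSED = {"jpeg", "mp4", "zip"}  # assumed list

def choose_algorithm(tier: TierFeature, feat: DataFeature) -> str:
    """Hypothetical policy mapping both feature sets to the first compression algorithm."""
    if feat.data_type in ALREADY_COMPRESSED or feat.size < 512:
        return "none"
    if tier.access_per_day >= 10 or tier.importance >= 2:
        return "zlib-1"    # hot data: cheap to compress and decompress
    if tier.retention_days >= 365:
        return "lzma"      # cold, long-lived data: favour ratio over speed
    return "zlib-6"

ALGORITHMS = {
    "none": lambda b: b,
    "zlib-1": lambda b: zlib.compress(b, 1),
    "zlib-6": lambda b: zlib.compress(b, 6),
    "lzma": lzma.compress,
}

def compress_first_data(data: bytes, tier: TierFeature, feat: DataFeature):
    """Determine the algorithm from the features, then compress with it."""
    name = choose_algorithm(tier, feat)
    return name, ALGORITHMS[name](data)
```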

DATA INSPECTION FOR COMPRESSION/DECOMPRESSION CONFIGURATION AND DATA TYPE DETERMINATION
20260010809 · 2026-01-08

Distribution of data in a neural network data set is used to determine an optimal compressor configuration for compressing the neural network data set and/or the underlying data type of the neural network data set. By examining the data before the compressor is invoked (a generalizable optimization), the example non-limiting technology herein makes it possible to tune a compressor to better target the incoming data. For sparse data compression, this step may involve examining the distribution of data (e.g., in one example, zeros in the data). For other algorithms, it may involve other types of inspection. This changes the fundamental behavior of the compressor itself. By inspecting the distribution of data (e.g., zeros in the data), it is also possible to predict the data width of the underlying data very accurately. This is useful because the data type is not always known a priori, and lossy compression algorithms useful for deep learning depend on knowing the true data type to achieve good compression rates.