H03M7/6017

Method and system for compressing application data for operations on multi-core systems
11599367 · 2023-03-07

A system and method to compress application control data, such as the weights of a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device that stores a decompression matrix and a compressed weight matrix of the set of weights of that layer. The compressed weight matrix is formed by matrix factorization and by quantization of the floating point value of each weight to a floating point format. A decompression module obtains an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix. A plurality of cores executes the at least one layer of the convolutional neural network with the approximated weight values to produce an inference output.
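The factorize-then-quantize idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented method: truncated SVD stands in for the unspecified matrix factorization, float16 is an assumed target floating point format, and the names `compress_weights` and `decompress_weights` are hypothetical.

```python
import numpy as np


def compress_weights(weights: np.ndarray, rank: int):
    """Factor an m x n weight matrix into a compressed (m x rank) matrix and a
    (rank x n) decompression matrix via truncated SVD (an assumed choice), then
    quantize the compressed matrix to float16 (also an assumption)."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    compressed = (u[:, :rank] * s[:rank]).astype(np.float16)  # quantized factor
    decompression = vt[:rank, :].astype(np.float32)           # kept in float32
    return compressed, decompression


def decompress_weights(compressed: np.ndarray, decompression: np.ndarray) -> np.ndarray:
    """Approximate the original weights as compressed @ decompression."""
    return compressed.astype(np.float32) @ decompression


# Example: a 64 x 32 layer whose weights happen to be rank 4.
rng = np.random.default_rng(0)
weights = (rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))).astype(np.float32)
compressed, decompression = compress_weights(weights, rank=4)
approx = decompress_weights(compressed, decompression)
print(compressed.shape, decompression.shape)   # (64, 4) (4, 32)
print(float(np.max(np.abs(approx - weights))))  # small error from float16 rounding
```

In this example the original float32 matrix occupies 8192 bytes, while the float16 compressed matrix plus the float32 decompression matrix occupy 512 + 512 = 1024 bytes. Real layers are rarely exactly low-rank, so the chosen rank trades accuracy for size.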

Storage device accelerator providing aggregation of divided plaintext data read

The storage device includes a first memory, a process device that stores data in the first memory and reads the data from the first memory, and an accelerator that includes a second memory different from the first memory. The accelerator stores, in the second memory, compressed data read from one or more storage drives; decompresses that compressed data to generate plaintext data; extracts the data designated by the process device from the plaintext data; and transmits the extracted designated data to the first memory.
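The data path described above (compressed data staged in the accelerator's own memory, decompressed there, and only the designated part forwarded) can be modeled as follows. This is a hypothetical sketch: `zlib` stands in for whatever codec the drives use, and the `Accelerator` class, the `stage`/`read_designated` methods, and the offset/length form of the designation are all assumptions for illustration.

```python
import zlib


class Accelerator:
    """Hypothetical model of the accelerator: it owns a second memory holding
    compressed data and forwards only the designated plaintext to the host."""

    def __init__(self):
        self.second_memory = {}  # key -> compressed bytes

    def stage(self, key: str, compressed: bytes) -> None:
        # Copy compressed data read from a storage drive into the second memory.
        self.second_memory[key] = compressed

    def read_designated(self, key: str, offset: int, length: int,
                        first_memory: bytearray) -> int:
        plaintext = zlib.decompress(self.second_memory[key])  # decompress locally
        designated = plaintext[offset:offset + length]        # extract requested range
        first_memory.extend(designated)                       # transfer only that range
        return len(designated)


drive = {"block0": zlib.compress(b"header|payload-A|payload-B")}
acc = Accelerator()
acc.stage("block0", drive["block0"])
host_memory = bytearray()
acc.read_designated("block0", offset=7, length=9, first_memory=host_memory)
print(bytes(host_memory))  # b'payload-A'
```

The point of the design is that the first memory receives 9 bytes rather than the whole decompressed block, so the host never spends bandwidth or memory on plaintext it did not ask for.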

Methods, devices and systems for efficient compression and decompression for higher throughput

A decompression system has a plurality of decompression devices, arranged in an array or chain, that decompress respective compressed data values of a compressed data block. A first decompression device is connected to a next decompression device, and a last decompression device is connected to a preceding decompression device. The first decompression device decompresses a compressed data value and reduces the compressed data block: it extracts the codeword of the compressed data value, removes that value from the compressed data block, retrieves a decompressed data value from the extracted codeword, and passes the reduced compressed data block to the next decompression device. The last decompression device receives a reduced compressed data block from the preceding decompression device and decompresses another compressed data value by extracting its codeword and retrieving another decompressed data value from that codeword. Figure elected for publication: FIG. 8.
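The chained behavior (each device strips one codeword off the head of the block and hands the shorter block onward) can be sketched in software. The prefix code in `CODEBOOK` is hypothetical, and the loop runs the devices one after another, whereas hardware devices in a chain would operate as pipeline stages.

```python
# Hypothetical prefix code: codeword bit string -> decompressed value.
CODEBOOK = {"0": "A", "10": "B", "110": "C", "111": "D"}


def decompression_device(block: str):
    """One device: extract the codeword at the head of the block, decode it,
    and return (decompressed value, reduced block)."""
    for end in range(1, len(block) + 1):
        codeword = block[:end]
        if codeword in CODEBOOK:
            return CODEBOOK[codeword], block[end:]  # remove the value from the block
    raise ValueError("no valid codeword at the head of the block")


def decompress_chain(block: str, num_devices: int):
    """Pass the shrinking block from each device to the next one in the chain."""
    values = []
    for _ in range(num_devices):
        value, block = decompression_device(block)
        values.append(value)
    return values, block


values, rest = decompress_chain("010110111", num_devices=4)
print(values, repr(rest))  # ['A', 'B', 'C', 'D'] ''
```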

STORAGE DEVICE

The storage device includes a first memory, a process device that stores data in the first memory and reads the data from the first memory, and an accelerator that includes a second memory different from the first memory. The accelerator stores, in the second memory, compressed data read from one or more storage drives; decompresses that compressed data to generate plaintext data; extracts the data designated by the process device from the plaintext data; and transmits the extracted designated data to the first memory.

CLOUD-BASED SCALE-UP SYSTEM COMPOSITION

Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a compute sled having a memory, one or more processors connected to the memory, and an accelerator. The accelerator includes a coherence logic unit configured to receive a node configuration request to execute a workload; the request identifies the compute sled and a second compute sled to be included in the managed node. The coherence logic unit is further configured to modify, with the one or more processors of the compute sled, a portion of the local working data associated with the workload in the compute sled's memory; to determine coherence data indicating the modification made by those processors to the local working data; and to send the coherence data to the second compute sled of the managed node.
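A toy model of the coherence step (track local modifications, summarize them as coherence data, ship that data to the peer sled) follows. It is a hypothetical sketch: the `ComputeSled` class, the dirty-address set, and the address-to-value form of the coherence data are assumptions for illustration, not the protocol the abstract describes.

```python
class ComputeSled:
    """Hypothetical model of a sled's local working data plus the bookkeeping
    a coherence logic unit would need to describe local modifications."""

    def __init__(self, name: str):
        self.name = name
        self.memory = {}     # address -> value (local working data)
        self._dirty = set()  # addresses modified since the last synchronization

    def modify(self, address: int, value: int) -> None:
        self.memory[address] = value
        self._dirty.add(address)

    def coherence_data(self) -> dict:
        """Describe the local modifications, then reset the tracking set."""
        data = {addr: self.memory[addr] for addr in sorted(self._dirty)}
        self._dirty.clear()
        return data

    def apply_coherence_data(self, data: dict) -> None:
        self.memory.update(data)


def send_coherence(src: ComputeSled, dst: ComputeSled) -> None:
    dst.apply_coherence_data(src.coherence_data())


sled_a, sled_b = ComputeSled("sled-a"), ComputeSled("sled-b")
sled_a.modify(0x10, 5)
sled_a.modify(0x20, 7)
send_coherence(sled_a, sled_b)
print(sled_b.memory)  # {16: 5, 32: 7}
```

Only the modified addresses cross between sleds, which is what lets two sleds act as one managed node without copying all of the working data.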

System and method to use dictionaries in LZ4 block format compression

An information handling system for compressing data includes a data storage device and a processor. The data storage device stores a dictionary and an uncompressed data block. The processor prepends the dictionary to the uncompressed data block, determines, from the uncompressed data block, a literal data string and a match data string where the match data string is a matching entry of the dictionary, and compresses the uncompressed data block into a compressed data block that includes the literal data string and an offset pointer that points to the matching entry.
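Prepending a dictionary means the match search can reach back into dictionary bytes before any data has been seen. The sketch below shows that with a naive greedy LZ77-style parser. It is an illustration only: tokens are Python tuples, not the LZ4 block format's byte layout, and `compress_with_dict`/`decompress_with_dict` are hypothetical names. `MIN_MATCH = 4` mirrors LZ4's minimum match length.

```python
MIN_MATCH = 4  # LZ4 also requires matches of at least 4 bytes


def compress_with_dict(dictionary: bytes, data: bytes):
    """Greedy LZ77-style parse of `data` with `dictionary` prepended as history.
    Returns tokens ("lit", bytes) or ("match", offset, length)."""
    buf = dictionary + data  # prepend the dictionary
    pos, tokens, literals = len(dictionary), [], bytearray()
    while pos < len(buf):
        best_len, best_off = 0, 0
        for start in range(pos):  # naive search; history includes the dictionary
            length = 0
            while pos + length < len(buf) and buf[start + length] == buf[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - start
        if best_len >= MIN_MATCH:
            if literals:
                tokens.append(("lit", bytes(literals)))
                literals.clear()
            tokens.append(("match", best_off, best_len))  # offset back from pos
            pos += best_len
        else:
            literals.append(buf[pos])
            pos += 1
    if literals:
        tokens.append(("lit", bytes(literals)))
    return tokens


def decompress_with_dict(dictionary: bytes, tokens) -> bytes:
    out = bytearray(dictionary)  # decoder must hold the same dictionary
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, offset, length = tok
            for _ in range(length):  # byte-wise copy handles overlapping matches
                out.append(out[-offset])
    return bytes(out[len(dictionary):])


dictionary = b"GET /index.html HTTP/1.1\r\n"
data = b"GET /about.html HTTP/1.1\r\n"
tokens = compress_with_dict(dictionary, data)
print(tokens)  # [('match', 26, 5), ('lit', b'about'), ('match', 26, 16)]
print(decompress_with_dict(dictionary, tokens) == data)  # True
```

Both matches point into the dictionary, so a short block that would otherwise be almost all literals compresses well.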

DATA COMPRESSION APPARATUS, DATA DECOMPRESSION APPARATUS, DATA COMPRESSION METHOD, DATA DECOMPRESSION METHOD, AND COMPUTER READABLE MEDIUM
20170338834 · 2017-11-23

A data compression apparatus of the invention includes: a data acquisition unit that acquires n integers from the data to be encoded; an integer division unit that splits each of the n integers into a second integer formed by its b low-order bits and a first integer formed by the remaining high-order bits, and outputs n first integers and n second integers; a first encoding unit that encodes the n first integers as a first code, represented by binary data whose number of bits is a natural-number multiple of a unit bit count L; and a second encoding unit that encodes the n second integers as a second code.
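The split into high-order and low-order parts can be sketched as follows. Assumptions not stated in the abstract: L = 8, with the first integers encoded as unsigned LEB128 varints (a byte-aligned code whose length is always a multiple of 8 bits), and the second integers bit-packed at exactly b bits each. The function names are hypothetical.

```python
def split_integers(values, b):
    """Split each integer into a first (high-order) and second (low b bits) part."""
    mask = (1 << b) - 1
    first = [v >> b for v in values]
    second = [v & mask for v in values]
    return first, second


def encode_first(first):
    """Byte-aligned code (assumes L = 8): unsigned LEB128 varints, so the first
    code's length is always a whole number of L-bit units."""
    out = bytearray()
    for v in first:
        while True:
            low7 = v & 0x7F
            v >>= 7
            if v:
                out.append(low7 | 0x80)  # continuation bit set
            else:
                out.append(low7)
                break
    return bytes(out)


def encode_second(second, b):
    """Pack each second integer into exactly b bits, MSB first, zero-padded to a byte."""
    acc = 0
    for v in second:
        acc = (acc << b) | v
    nbits = b * len(second)
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


values = [5, 18, 300, 7]
first, second = split_integers(values, b=3)
print(first, second)                   # [0, 2, 37, 0] [5, 2, 4, 7]
print(encode_first(first).hex())       # 00022500
print(encode_second(second, 3).hex())  # aa70
```

Each original value is recovered as `(first << b) | second`. Choosing b near the typical magnitude of the values keeps the first integers small, so most of them fit in a single L-bit unit.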

Method and apparatus for hybrid compression processing for high levels of compression

In one embodiment, an apparatus comprises a first compression engine to receive a first compressed data block from a second compression engine that is to generate the first compressed data block by compressing a first plurality of repeated instances of data that each have a length greater than or equal to a first length. The first compression engine is further to compress a second plurality of repeated instances of data of the first compressed data block that each have a length greater than or equal to a second length, the second length being shorter than the first length, wherein each compressed repeated instance of the first and second pluralities of repeated instances comprises a location and length of a data instance that is repeated. The apparatus further comprises a memory buffer to store the compressed first and second pluralities of repeated instances of data.
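The two-pass structure (long repeats first, then shorter repeats only in what is still literal) can be sketched with a naive matcher. Under these assumptions: each repeat is stored as a (distance, length) pair standing for "location and length", the thresholds `long_min=8` and `short_min=3` are arbitrary, and all function names are hypothetical.

```python
def _longest_match(data: bytes, pos: int, limit: int):
    """Longest earlier occurrence of data[pos:limit]; returns (distance, length)."""
    best_len, best_dist = 0, 0
    for start in range(pos):
        n = 0
        while pos + n < limit and data[start + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_dist = n, pos - start
    return best_dist, best_len


def match_pass(data: bytes, start: int, end: int, min_len: int):
    """Tokenize data[start:end]; repeats shorter than min_len stay literal.
    Literal tokens keep (start, end) positions so a later pass can rescan them."""
    tokens, pos, lit_start = [], start, start
    while pos < end:
        dist, length = _longest_match(data, pos, end)
        if length >= min_len:
            if lit_start < pos:
                tokens.append(("lit", lit_start, pos))
            tokens.append(("match", dist, length))  # location (distance) + length
            pos += length
            lit_start = pos
        else:
            pos += 1
    if lit_start < end:
        tokens.append(("lit", lit_start, end))
    return tokens


def hybrid_compress(data: bytes, long_min: int = 8, short_min: int = 3):
    # Pass 1 (the abstract's "second engine"): only long repeats.
    stage1 = match_pass(data, 0, len(data), long_min)
    # Pass 2 (the "first engine"): shorter repeats inside the remaining literals.
    out = []
    for tok in stage1:
        if tok[0] == "lit":
            out.extend(match_pass(data, tok[1], tok[2], short_min))
        else:
            out.append(tok)
    # Replace literal positions with the literal bytes themselves.
    return [("lit", data[t[1]:t[2]]) if t[0] == "lit" else t for t in out]


def decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, dist, length = tok
            for _ in range(length):  # byte-wise copy also handles overlaps
                out.append(out[-dist])
    return bytes(out)


data = b"abcdefghXabcdefghYabcZabc"
tokens = hybrid_compress(data)
print(sum(t[0] == "match" for t in tokens))  # 3
print(decompress(tokens) == data)            # True
```

The first pass alone finds only the 8-byte repeat; the second pass recovers the two 3-byte repeats it left behind as literals.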

Low-Latency Encoding Using a Bypass Sub-Stream and an Entropy Encoded Sub-Stream
20220360280 · 2022-11-10

A system comprises an encoder configured to entropy encode a bitstream comprising both compressible and non-compressible symbols. The encoder parses the bitstream into a compressible symbol sub-stream and a non-compressible symbol sub-stream. The non-compressible symbol sub-stream bypasses an entropy encoding component of the encoder, while the compressible symbol sub-stream is entropy encoded. When a given quantity of entropy encoded symbol bytes and bypass bytes has accumulated, a chunk of fixed or known size is formed from them without waiting for the encoder to process the full bitstream. In a complementary manner, a decoder reconstructs the bitstream from the packets or chunks.
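The low-latency chunking can be sketched on the encoder side. Hypothetical assumptions: a static prefix code (`PREFIX_CODE`) stands in for the real entropy coder, a chunk is a pair (entropy bytes, bypass bytes) totaling `chunk_size` bytes, and entropy bytes are placed first in each chunk. The decoder side is omitted.

```python
# Hypothetical static prefix code standing in for the entropy coder.
PREFIX_CODE = {0: "0", 1: "10", 2: "110", 3: "111"}


class LowLatencyEncoder:
    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.pending_bits = ""            # entropy-coded bits not yet forming a byte
        self.entropy_bytes = bytearray()  # completed entropy-coded bytes
        self.bypass_bytes = bytearray()   # raw bytes that skipped the coder
        self.chunks = []                  # list of (entropy_part, bypass_part)

    def push(self, symbol: int, compressible: bool) -> None:
        if compressible:
            self.pending_bits += PREFIX_CODE[symbol]
            while len(self.pending_bits) >= 8:  # move whole bytes out
                self.entropy_bytes.append(int(self.pending_bits[:8], 2))
                self.pending_bits = self.pending_bits[8:]
        else:
            self.bypass_bytes.append(symbol)    # must be 0..255
        self._emit_ready_chunks()

    def _emit_ready_chunks(self) -> None:
        # Form a fixed-size chunk as soon as enough bytes exist, without
        # waiting for the rest of the bitstream.
        while len(self.entropy_bytes) + len(self.bypass_bytes) >= self.chunk_size:
            n_ent = min(len(self.entropy_bytes), self.chunk_size)
            n_byp = self.chunk_size - n_ent
            self.chunks.append((bytes(self.entropy_bytes[:n_ent]),
                                bytes(self.bypass_bytes[:n_byp])))
            del self.entropy_bytes[:n_ent]
            del self.bypass_bytes[:n_byp]

    def flush(self):
        """Zero-pad any partial byte and emit a final (possibly short) chunk."""
        if self.pending_bits:
            self.entropy_bytes.append(int(self.pending_bits.ljust(8, "0"), 2))
            self.pending_bits = ""
        if self.entropy_bytes or self.bypass_bytes:
            self.chunks.append((bytes(self.entropy_bytes), bytes(self.bypass_bytes)))
            self.entropy_bytes.clear()
            self.bypass_bytes.clear()
        return self.chunks


enc = LowLatencyEncoder(chunk_size=4)
for byte in b"\xaa\xbb\xcc\xdd":
    enc.push(byte, compressible=False)
print(len(enc.chunks))  # 1: a chunk is ready before the stream ends
for _ in range(8):
    enc.push(1, compressible=True)  # codeword "10" eight times gives two 0xAA bytes
final_chunks = enc.flush()
```

The first chunk leaves the encoder as soon as four bypass bytes exist; nothing in the design forces it to wait for the later compressible symbols.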

Techniques for scaling dictionary-based compression

Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory, because the processor can then refer to the compressed version instead of accessing the external memory. One compression technique maintains a dictionary on the processor that maps portions of memory to values: when all of the values in a portion of memory are uniform (e.g., the same), that value is stored in the dictionary for that portion. Thereafter, when the processor needs to access that portion of memory, it retrieves the value from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
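The basic dictionary mechanism can be sketched as a small class that counts external accesses. Hypothetical assumptions: memory is divided into fixed-size blocks, a Python list stands in for external memory, and a uniform block's dictionary entry is treated as authoritative, so its write to external memory is skipped entirely. `UniformBlockDictionary` is an illustrative name.

```python
class UniformBlockDictionary:
    """Hypothetical processor-side dictionary: block index -> uniform value."""

    def __init__(self, external_memory: list, block_size: int = 4):
        self.external = external_memory  # stands in for external DRAM
        self.block_size = block_size
        self.dictionary = {}
        self.external_accesses = 0

    def write_block(self, block: int, values: list) -> None:
        if len(values) != self.block_size:
            raise ValueError("write must cover exactly one block")
        if all(v == values[0] for v in values):
            # Uniform block: record it on-chip and skip the external write
            # (assumption: the dictionary entry is authoritative).
            self.dictionary[block] = values[0]
            return
        self.dictionary.pop(block, None)  # block is no longer uniform
        base = block * self.block_size
        self.external[base:base + self.block_size] = values
        self.external_accesses += 1

    def read_block(self, block: int) -> list:
        if block in self.dictionary:  # served without touching external memory
            return [self.dictionary[block]] * self.block_size
        self.external_accesses += 1
        base = block * self.block_size
        return self.external[base:base + self.block_size]


mem = [0] * 16
cache = UniformBlockDictionary(mem)
cache.write_block(0, [7, 7, 7, 7])
cache.write_block(1, [1, 2, 3, 4])
print(cache.read_block(0), cache.read_block(1))  # [7, 7, 7, 7] [1, 2, 3, 4]
print(cache.external_accesses)                   # 2
```

Four block operations cost only two external accesses here, because both the write and the read of the uniform block are served from the dictionary.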