G06F12/0886

DATA COMPRESSION API
20220365829 · 2022-11-17 ·

Apparatuses, systems, and techniques to indicate storage to be compressed. In at least one embodiment, an application programming interface is performed to indicate storage to store information to be compressed.

Flexible dictionary sharing for compressed caches

Systems, apparatuses, and methods for implementing flexible dictionary sharing techniques for caches are disclosed. A set-associative cache includes a dictionary for each data array set. When a cache line is to be allocated in the cache, a cache controller determines to which set a base index of the cache line address maps. Then, a selector unit determines which dictionary of a group of dictionaries stored by those sets neighboring this set would achieve the most compression for the cache line. This dictionary is then selected to compress the cache line. An offset is added to the base index of the cache line to generate a full index in order to map the cache line to the set corresponding to this chosen dictionary. The compressed cache line is stored in this set with the chosen dictionary, and the offset is stored in the corresponding tag array entry.

Cache arrangements for data processing systems

A data processing system is provided comprising a cache system configured to transfer data between a processor and memory system. The cache system comprises a cache. When a block of data that is stored in the memory in a compressed form is to be loaded into the cache, the block of data is stored into a group of one or more cache lines of the cache and the associated compression metadata for the compressed block of data is provided as separate side band data.

Pseudo-first in, first out (FIFO) tag line replacement

A method is provided that includes searching tags in a tag group comprised in a tagged memory system for an available tag line during a clock cycle, wherein the tagged memory system includes a plurality of tag lines having respective tags and wherein the tags are divided into a plurality of non-overlapping tag groups, and searching tags in a next tag group of the plurality of tag groups for an available tag line during a next clock cycle when the searching in the tag group does not find an available tag line.

LOOK-UP TABLE READ

A digital data processor includes an instruction memory storing instructions specifying data processing operations and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and an instruction decoder to perform an operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the operation. The operational unit is configured to perform a table recall in response to a look up table read instruction by recalling data elements from a specified location and adjacent location to the specified location, in a specified number of at least one table and storing the recalled data elements in successive slots in a destination register. Recalled data elements include at least one interpolated data element in the adjacent location.

Transparent interleaving of compressed cache lines

Low latency in a non-uniform cache access (“NUCA”) cache in a computing environment is provided. A first compressed cache line is interleaved with a second compressed cache line into a single cache line of the NUCA cache, where data of the first compressed cache line is stored in one or more even sectors in the single cache line and stored in zero or more odd sectors in the single cache line after the data fills the one or more even sectors, and data of the second compressed cache line is stored in the one or more odd sectors in the single cache line and stored in zero or more even sectors in the single cache line after the data fills the one or more odd sectors.

EFFICIENT COMPRESSED VERBATIM COPY

Compressed verbatim copy can enable more efficient copying of compressed data. In one example, a compressed verbatim copy method involves receiving a command to copy compressed data from a source address of the memory device to a destination address. In response to the receipt of the command, the method involves copying the compressed data in a compressed format from the source address to the destination address without first decompressing the data. A second source address and a second destination address of metadata for the compressed data is determined, and the metadata is copied from the second source address to the second destination address.

Physical memory compression

A memory management system includes a physical memory associated with a computing device and a memory manager. The memory manager is configured to manage a shared memory cache as part of a compression of the physical memory using a cache compression algorithm, wherein a compression block size for the compression is a single cache line size. The physical memory includes a sector translation table (STT) region and a sector memory region. The memory manager uses a memory descriptor defined by an STT entry having a cache line map and a plurality of sector pointers to load cache from the physical memory to a level 3 Cache. The cache line map contains cache line metadata including a size of each cache line, a location of the cache line in one of the sectors pointed to by the STT entry, and a plurality of flags.

SYSTEMS AND METHODS FOR TRANSFORMING DATA IN-LINE WITH READS AND WRITES TO COHERENT HOST-MANAGED DEVICE MEMORY
20220327052 · 2022-10-13 ·

The disclosed computer-implemented method may include (1) receiving, from an external host processor via a cache-coherent interconnect, a request to access a host address of a coherent memory space of the external host processor, (2) when the request is to read data from the host address, (a) performing an in-line transformation on the data to generate second data and (b) writing the second data to the physical address of the device-attached physical memory mapped to the host address, and (3) when the request is to read data from the host address, (a) reading the data from the physical address of the device-attached physical memory mapped to the host address, (b) performing a reversing in-line transformation on the data to generate second data, and (c) returning the second data to the external host processor via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.

Systems and methods for reading and writing sparse data in a neural network accelerator

Disclosed herein includes a system, a method, and a device for reading and writing sparse data in a neural network accelerator. A plurality of slices can be established to access a memory having an access size of a data word. A first slice can be configured to access a first side of the data word in memory. Circuitry can access a mask identifying byte positions within the data word having non-zero values. The circuitry can modify the data word to have non-zero byte values stored starting at an end of the first side, and any zero byte values stored in a remainder of the data word. A determination can be made whether a number of non-zero byte values is less than or equal to a first access size of the first slice. The circuitry can write the modified data word to the memory via at least the first slice.