G06F12/0886

COMPRESSION AWARE PREFETCH

Methods, devices, and systems for prefetching data. First data is loaded from a first memory location. The first data is cached in a cache memory. Other data is prefetched to the cache memory based on a compression of the first data and a compression of the other data. In some implementations, the compression of the first data and the compression of the other data are determined based on metadata associated with the first data and metadata associated with the other data. In some implementations, the other data is prefetched to the cache memory based on a total of a compressed size of the first data and a compressed size of the other data being less than a threshold size. In some implementations, the other data is not prefetched to the cache memory based on the other data being uncompressed.
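The prefetch decision described in this abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `LineMetadata` record, the `should_prefetch` helper, and the 64-byte default threshold are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineMetadata:
    """Hypothetical per-line metadata: a compression flag and a stored size."""
    compressed: bool
    size: int  # size in bytes as stored (compressed size if compressed)


def should_prefetch(first: LineMetadata, other: LineMetadata,
                    threshold: int = 64) -> bool:
    """Decide whether to prefetch `other` alongside the already loaded `first`.

    Assumption: the threshold is one 64-byte cache line.
    """
    # The other data is not prefetched when it is uncompressed.
    if not other.compressed:
        return False
    # Prefetch only when the combined size is below the threshold.
    return first.size + other.size < threshold
```

With these assumptions, two lines compressed to 20 and 30 bytes qualify for prefetch, while two lines compressed to 32 bytes each do not, because their total is not strictly below the threshold.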

Apparatus and method for considering spatial locality in loading data elements for execution

In one embodiment of the invention, a processor comprises an upper level cache and at least one processor core. The at least one processor core includes one or more registers and a plurality of instruction processing stages: a decode unit to decode an instruction requiring an input of a plurality of data elements, wherein a size of each of the plurality of data elements is less than a cache line size of the processor; and an execution unit to load the plurality of data elements to the one or more registers of the processor, without loading the plurality of data elements, or data elements spatially adjacent to them, into the upper level cache.

MULTI-RESOLUTION CACHE
20230176969 · 2023-06-08

A multi-resolution cache includes first, second, and third cache segments, the first segment having a first resolution and the second and third segments having a second resolution that is less than the first resolution. The first and third cache segments are communicatively coupled to an off-chip memory and are each configured to receive a cache line of data having the first and second resolutions. The cache further includes fourth and fifth cache segments having the second resolution; a first downscaler, communicatively coupled to the first and fourth cache segments, configured to reduce the resolution when first-resolution cached data is shifted from the first cache segment to the fourth cache segment; and a first upscaler, communicatively coupled to all cache segments that have the second resolution, configured to increase the reduced-resolution cached data to the first resolution and output it.

REDUCING MEMORY ACCESS BANDWIDTH BASED ON PREDICTION OF MEMORY REQUEST SIZE

Systems and methods for managing memory access bandwidth include a spatial locality predictor. The spatial locality predictor includes a memory region table with prediction counters associated with memory regions of a memory. When cache lines are evicted from a cache, the sizes of the cache lines which were accessed by a processor are used for updating the prediction counters. Depending on values of the prediction counters, the sizes of cache lines which are likely to be used by the processor are predicted for the corresponding memory regions. Correspondingly, the memory access bandwidth between the processor and the memory may be reduced by fetching less data than a full cache line if the size of the cache line likely to be used is predicted to be less than that of the full cache line.
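A minimal Python sketch of such a predictor follows. The class and method names, the 4 KiB region size, the 2-bit saturating counter, and the choice between a half line and a full line are all assumptions made for illustration; the abstract does not specify any of them.

```python
class SpatialLocalityPredictor:
    """Hypothetical memory region table with one saturating counter per region."""

    def __init__(self, region_bits: int = 12, line_size: int = 64,
                 counter_max: int = 3):
        self.region_bits = region_bits  # assumed 4 KiB regions
        self.line_size = line_size      # assumed 64-byte cache lines
        self.counter_max = counter_max  # assumed 2-bit saturating counter
        self.table = {}                 # region number -> counter value

    def _region(self, address: int) -> int:
        return address >> self.region_bits

    def on_evict(self, address: int, bytes_accessed: int) -> None:
        """Update the region's counter from how much of the evicted line was used."""
        region = self._region(address)
        # Unseen regions start at the maximum, i.e. biased toward full lines.
        counter = self.table.get(region, self.counter_max)
        if bytes_accessed <= self.line_size // 2:
            counter = max(0, counter - 1)                 # little of the line was used
        else:
            counter = min(self.counter_max, counter + 1)  # most of the line was used
        self.table[region] = counter

    def predict_fetch_size(self, address: int) -> int:
        """Return the number of bytes to fetch for a miss at `address`."""
        counter = self.table.get(self._region(address), self.counter_max)
        if counter <= self.counter_max // 2:
            return self.line_size // 2  # predicted sparse use: fetch half a line
        return self.line_size           # otherwise fetch the full line
```

In this sketch a region must see two sparsely used evictions before the predictor switches to half-line fetches, so a single outlier does not change the fetch size.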

DISASSOCIATING MEMORY UNITS WITH A HOST SYSTEM
20220050775 · 2022-02-17

A command pertaining to a non-volatile memory device on a memory sub-system is received from a host system. A portion of the non-volatile memory device has an association with the host system. In response to determining that the command is a dissociate instruction to dissociate the portion of the non-volatile memory device on the memory sub-system from the host system, the association of the portion of the non-volatile memory device on the memory sub-system with the host system is removed.

Dynamic clustering-based data compression

Methods, systems, and techniques for data compression. A cluster fingerprint of an uncompressed data block is determined to correspond to a cluster fingerprint of a base block stored in a base array. This determining involves looking up the cluster fingerprint of the base block from the base array using the cluster fingerprint of the uncompressed data block. The difference between the uncompressed data block and the base block is determined, and a compressed data block is encoded using this difference. The compressed data block is then stored in a data array.
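The lookup, difference, and encode steps can be sketched in Python as below. The fingerprint function (the high nibble of every byte), the XOR difference, and the nibble packing are hypothetical choices made so that the example is small and lossless; the abstract does not say how the fingerprint or the encoding is actually computed.

```python
def cluster_fingerprint(block: bytes) -> bytes:
    """Hypothetical fingerprint: keep only the high nibble of each byte."""
    return bytes(b & 0xF0 for b in block)


class ClusterCompressor:
    def __init__(self):
        self.base_array = {}  # cluster fingerprint -> base block
        self.data_array = []  # list of (fingerprint, packed difference)

    def store(self, block: bytes) -> int:
        """Compress `block` against its cluster's base block; return its index."""
        if len(block) % 2:
            raise ValueError("this sketch assumes an even block length")
        fingerprint = cluster_fingerprint(block)
        # Look up the base block by fingerprint; the first block of a
        # cluster becomes that cluster's base (an assumption of this sketch).
        base = self.base_array.setdefault(fingerprint, block)
        # Matching high nibbles mean every XOR difference byte is below 16.
        diff = bytes(a ^ b for a, b in zip(block, base))
        # Pack two 4-bit differences per byte, halving the stored size.
        packed = bytes((diff[i] << 4) | diff[i + 1]
                       for i in range(0, len(diff), 2))
        self.data_array.append((fingerprint, packed))
        return len(self.data_array) - 1

    def load(self, index: int) -> bytes:
        """Rebuild the original block from its base block and stored difference."""
        fingerprint, packed = self.data_array[index]
        base = self.base_array[fingerprint]
        diff = bytearray()
        for p in packed:
            diff.append(p >> 4)
            diff.append(p & 0x0F)
        return bytes(a ^ b for a, b in zip(base, diff))
```

Under these assumptions a 4-byte block that shares a cluster with an existing base block is stored as 2 bytes of packed difference, and `load` reconstructs it exactly.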

Method for implementing a line speed interconnect structure
09740499 · 2017-08-22

A method for line speed interconnect processing. The method includes receiving initial inputs from an input communications path, performing a pre-sorting of the initial inputs by using a first stage interconnect parallel processor to create intermediate inputs, and performing the final combining and splitting of the intermediate inputs by using a second stage interconnect parallel processor to create resulting outputs. The method further includes transmitting the resulting outputs out of the second stage at line speed.

Memory controllers employing memory capacity and/or bandwidth compression with next read address prefetching, and related processor-based systems and methods

Memory controllers employing memory capacity and/or bandwidth compression with next read address prefetching, and related processor-based systems and methods are disclosed. In certain aspects, memory controllers are employed that can provide memory capacity compression. In certain aspects disclosed herein, a next read address prefetching scheme can be used by a memory controller to speculatively prefetch data from system memory at another address beyond the currently accessed address. Thus, when memory data is addressed in the compressed memory, if the next read address is stored in metadata associated with the memory block at the accessed address, the memory data at the next read address can be prefetched by the memory controller so that it is available in case a subsequent read operation issued by a central processing unit (CPU) targets the prefetched address.
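A minimal Python sketch of the next read address scheme is shown below. The `PrefetchingMemoryController` class, its prefetch buffer, and the representation of memory as a dictionary of `(data, next_read_address)` pairs are assumptions for illustration only; the sketch also assumes the metadata is already populated and omits the compression itself.

```python
class PrefetchingMemoryController:
    """Hypothetical controller that follows a next read address stored in metadata."""

    def __init__(self, memory: dict):
        # memory maps address -> (data, next_read_address or None); the second
        # element stands in for the metadata associated with the memory block.
        self.memory = memory
        self.prefetch_buffer = {}  # address -> (data, next_read_address)
        self.memory_reads = 0      # accesses that actually reached system memory

    def _fetch(self, address: int):
        self.memory_reads += 1
        return self.memory[address]

    def read(self, address: int):
        """Return (data, served_from_prefetch) for a CPU read of `address`."""
        if address in self.prefetch_buffer:
            data, next_address = self.prefetch_buffer.pop(address)
            hit = True
        else:
            data, next_address = self._fetch(address)
            hit = False
        # Speculatively prefetch the block named by the metadata, if any.
        if (next_address is not None and next_address in self.memory
                and next_address not in self.prefetch_buffer):
            self.prefetch_buffer[next_address] = self._fetch(next_address)
        return data, hit
```

In this sketch a read whose address matches an earlier prefetch is served from the buffer and issues no new demand access to system memory, which is the latency benefit the abstract describes.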