G06F12/0897

Machine learning to improve caching efficiency in a storage system

A system and method improve caching efficiency in a data storage system by performing machine learning processes on metadata relating to extents of data blocks, rather than individual blocks themselves. Thus, once the storage devices are divided into extents, various metadata regarding access to the blocks within each extent are aggregated, and per-extent features are extracted. These features are used to train a data regression model that is subsequently used to infer a most likely “hotness” value for each extent at a future time. These predicted values, which may be further classified as e.g. “hot”, “warm”, and “cold” using thresholds, are used to implement the cache replacement policy. Embodiments scale to large and multi-layered caches, and may avoid common caching problems like thrashing, by adjusting the extent size. Policy goal functions may be optimized by dynamically adjusting the classification thresholds.

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

Resource-aware compression

Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

Resource-aware compression

Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

METHOD AND SYSTEM FOR TRACKING STATE OF CACHE LINES

The state of cache lines transferred into an out of caches of processing hardware is tracked by monitoring hardware. The method of tracking includes monitoring the processing hardware for cache coherence events on a coherence interconnect between the processing hardware and monitoring hardware, determining that the state of a cache line has changed, and updating a hierarchical data structure to indicate the change in the state of said cache line. The hierarchical data structure includes a first level data structure including first bits, and a second level data structure including second bits, each of the first bits associated with a group of second bits. The step of updating includes setting one of the first bits and one of the second bits in the group corresponding to the first bit that is being set, according to an address of said cache line.

METHOD AND SYSTEM FOR TRACKING STATE OF CACHE LINES

The state of cache lines transferred into an out of caches of processing hardware is tracked by monitoring hardware. The method of tracking includes monitoring the processing hardware for cache coherence events on a coherence interconnect between the processing hardware and monitoring hardware, determining that the state of a cache line has changed, and updating a hierarchical data structure to indicate the change in the state of said cache line. The hierarchical data structure includes a first level data structure including first bits, and a second level data structure including second bits, each of the first bits associated with a group of second bits. The step of updating includes setting one of the first bits and one of the second bits in the group corresponding to the first bit that is being set, according to an address of said cache line.

Handling memory requests

A converter module is described which handles memory requests issued by a cache (e.g. an on-chip cache), where these memory requests include memory addresses defined within a virtual memory space. The converter module receives these requests, issues each request with a transaction identifier and uses that identifier to track the status of the memory request. The converter module sends requests for address translation to a memory management unit and where there the translation is not available in the memory management unit receives further memory requests from the memory management unit. The memory requests are issued to a memory via a bus and the transaction identifier for a request is freed once the response has been received from the memory. When issuing memory requests onto the bus, memory requests received from the memory management unit may be prioritized over those received from the cache.

Handling memory requests

A converter module is described which handles memory requests issued by a cache (e.g. an on-chip cache), where these memory requests include memory addresses defined within a virtual memory space. The converter module receives these requests, issues each request with a transaction identifier and uses that identifier to track the status of the memory request. The converter module sends requests for address translation to a memory management unit and where there the translation is not available in the memory management unit receives further memory requests from the memory management unit. The memory requests are issued to a memory via a bus and the transaction identifier for a request is freed once the response has been received from the memory. When issuing memory requests onto the bus, memory requests received from the memory management unit may be prioritized over those received from the cache.

Data transfer in port switch memory
11531490 · 2022-12-20 · ·

The present disclosure includes apparatuses and methods related to data transfer in memory. An example apparatus can include a first number of memory devices coupled to a host via a first number of ports and a second number of memory devices coupled to the first number of memory device via a second number of ports, wherein a first number of commands are executed to transfer data between the first number of memory devices and the host via the first number of ports and a second number of commands are executed to transfer data between the first number of memory device and the second number of memory device via the second number of ports.