G06F12/0824

LOW LATENCY CACHE FOR NON-VOLATILE MEMORY IN A HYBRID DIMM

Systems and methods are disclosed including a first memory device and a second memory device coupled to the first memory device, where the second memory device has a lower access latency than the first memory device and acts as a cache for it. A processing device operatively coupled to the first and second memory devices can track access statistics of segments of data stored at the second memory device, the segments having a first granularity, and determine, based on the access statistics, to update a segment of data stored at the second memory device from the first granularity to a second granularity. The processing device can further retrieve additional data associated with the segment of data from the first memory device and store the additional data at the second memory device to form a new segment having the second granularity.
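
A minimal Python sketch of the granularity-promotion idea, assuming a 64-byte first granularity, a 4 KiB second granularity and a simple access-count threshold (all hypothetical; the abstract gives no values). `HybridCache`, `Segment` and `PROMOTE_AFTER` are illustrative names, and a dict of segments stands in for the low-latency device.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical parameters: the abstract does not give granularities or thresholds.
FINE = 64            # first granularity, in bytes
COARSE = 4096        # second granularity, in bytes
PROMOTE_AFTER = 4    # accesses before a fine segment is promoted


@dataclass
class Segment:
    base: int          # first address covered by the segment
    size: int          # current granularity in bytes
    data: bytearray    # copy of the data held in the low-latency device
    accesses: int = 0  # access statistic tracked per segment


class HybridCache:
    """Low-latency cache (dict of segments) in front of a slower backing memory."""

    def __init__(self, slow_memory: bytearray):
        self.slow = slow_memory                  # stands in for the first memory device
        self.segments: Dict[int, Segment] = {}   # stands in for the second memory device

    def _lookup(self, addr: int) -> Optional[Segment]:
        for seg in self.segments.values():
            if seg.base <= addr < seg.base + seg.size:
                return seg
        return None

    def read(self, addr: int) -> int:
        seg = self._lookup(addr)
        if seg is None:  # miss: fill a segment at the first (fine) granularity
            base = addr - addr % FINE
            seg = Segment(base, FINE, bytearray(self.slow[base:base + FINE]))
            self.segments[base] = seg
        seg.accesses += 1
        if seg.size == FINE and seg.accesses >= PROMOTE_AFTER:
            seg = self._promote(seg)
        return seg.data[addr - seg.base]

    def _promote(self, seg: Segment) -> Segment:
        base = seg.base - seg.base % COARSE
        # Retrieve the surrounding data from the slow device...
        data = bytearray(self.slow[base:base + COARSE])
        # ...but keep the bytes already cached for the hot segment.
        start = seg.base - base
        data[start:start + seg.size] = seg.data
        # Fine segments inside the new range are folded into it.
        for b in [b for b in self.segments if base <= b < base + COARSE]:
            del self.segments[b]
        merged = Segment(base, COARSE, data, seg.accesses)
        self.segments[base] = merged
        return merged
```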

GLOBAL COHERENCE OPERATIONS

A method includes receiving, by an L2 controller, a request to perform a global operation on an L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes: in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.
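
The staged drain can be sketched as a small admission gate. This is a hypothetical Python model, not the patented controller: `L2Pipeline`, its `stage` counter and the transaction classes in `Kind` are assumed names, and retiring a transaction is modelled as removing it from a list.

```python
from enum import Enum, auto
from typing import Callable, List


class Kind(Enum):
    READ = auto()
    WRITE = auto()      # non-victim write
    SNOOP = auto()
    RESPONSE = auto()
    VICTIM = auto()


BLOCKING = {Kind.READ, Kind.WRITE}


class L2Pipeline:
    """Admission gate that drains the pipeline in three stages before a global op."""

    def __init__(self) -> None:
        self.pending: List[Kind] = []
        # 0 = normal, 1 = blocking closed, 2 = snoops also closed, 3 = everything closed
        self.stage = 0

    def request_global_op(self) -> None:
        self.stage = 1
        self._advance()

    def admit(self, kind: Kind) -> bool:
        if self.stage >= 3:
            return False
        if self.stage >= 2 and kind is Kind.SNOOP:
            return False
        if self.stage >= 1 and kind in BLOCKING:
            return False
        self.pending.append(kind)
        return True

    def retire(self, kind: Kind) -> None:
        self.pending.remove(kind)
        self._advance()

    def _advance(self) -> None:
        if self.stage == 1 and not any(k in BLOCKING for k in self.pending):
            self.stage = 2  # no blocking transactions left: stop admitting snoops too
        if self.stage == 2 and Kind.SNOOP not in self.pending:
            self.stage = 3  # no snoops left: stop admitting anything

    def ready_for_global_op(self) -> bool:
        return self.stage == 3 and not self.pending

    def perform_global_op(self, op: Callable[[], None]) -> None:
        if not self.ready_for_global_op():
            raise RuntimeError("pipeline not yet drained")
        op()
        self.stage = 0  # reopen the pipeline afterwards (assumed)
```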

VOLATILE READ CACHE IN A CONTENT ADDRESSABLE STORAGE SYSTEM
20210034538 · 2021-02-04

A distributed storage system comprises a first module and a second module. The first module processes read requests for an address range and sends them to the second module. The first module receives an address associated with a read request for a data page stored on the second module. Using the address, a method searches a table on the first module for a content-based signature of the data page and, if that signature is in a read cache on the first module, provides the data page from that read cache. The content-based signatures in the table are associated with the address range.
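
A rough sketch of the read path, assuming SHA-1 as the content-based signature and plain dicts for the table and the read cache. `FirstModule`, `SecondModule` and `signature` are hypothetical names. Because the cache is keyed by signature, addresses holding identical pages share one cached copy.

```python
import hashlib
from typing import Dict


def signature(page: bytes) -> str:
    # SHA-1 is an assumption; the abstract only says "content-based signature".
    return hashlib.sha1(page).hexdigest()


class SecondModule:
    """Owns the data pages (hypothetical stand-in)."""

    def __init__(self, pages: Dict[int, bytes]):
        self.pages = pages
        self.reads = 0  # counts remote reads so cache hits are observable

    def read(self, address: int) -> bytes:
        self.reads += 1
        return self.pages[address]


class FirstModule:
    """Handles reads for one address range and keeps a volatile read cache."""

    def __init__(self, remote: SecondModule, address_range: range):
        self.remote = remote
        self.address_range = address_range
        self.table: Dict[int, str] = {}          # address -> content-based signature
        self.read_cache: Dict[str, bytes] = {}   # signature -> page (volatile)

    def read(self, address: int) -> bytes:
        if address not in self.address_range:
            raise ValueError("address not handled by this module")
        sig = self.table.get(address)
        if sig is not None and sig in self.read_cache:
            return self.read_cache[sig]          # hit: no trip to the second module
        page = self.remote.read(address)         # miss: fetch from the second module
        sig = signature(page)
        self.table[address] = sig
        self.read_cache[sig] = page              # identical pages share one entry
        return page
```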

EFFICIENT CACHE EVICTION AND INSERTIONS FOR SUSTAINED STEADY STATE PERFORMANCE

A distributed metadata cache for a distributed object store includes a plurality of cache entries, an active-cache-entry set, and an unreferenced-cache-entry set. Each cache entry includes information indicating whether at least one input/output (IO) thread is referencing it and whether it is no longer referenced by any IO thread. Each cache entry in the active-cache-entry set includes information indicating that at least one IO thread is actively referencing it. Each cache entry in the unreferenced-cache-entry set includes information indicating that it is no longer actively referenced by an IO thread, which makes it eligible for eviction from the distributed metadata cache.
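
A single-threaded Python sketch of the two sets, assuming a per-entry reference count decides membership and that the oldest unreferenced entry is evicted first (the abstract does not state an eviction order). `MetadataCache`, `acquire` and `release` are hypothetical names; a real distributed cache would also need locking around these updates.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Set


class MetadataCache:
    """Entries move between an active set and an unreferenced (evictable) set."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[Hashable, object] = {}
        self.refcount: Dict[Hashable, int] = {}   # IO threads currently referencing each entry
        self.active: Set[Hashable] = set()        # active-cache-entry set
        self.unreferenced = OrderedDict()         # unreferenced-cache-entry set, oldest first

    def acquire(self, key: Hashable, load: Callable[[Hashable], object]) -> object:
        if key not in self.entries:
            if len(self.entries) >= self.capacity:
                self._evict_one()
            self.entries[key] = load(key)
            self.refcount[key] = 0
        self.refcount[key] += 1
        self.unreferenced.pop(key, None)   # referenced again: no longer evictable
        self.active.add(key)
        return self.entries[key]

    def release(self, key: Hashable) -> None:
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            self.active.discard(key)
            self.unreferenced[key] = None  # now eligible for eviction

    def _evict_one(self) -> None:
        if not self.unreferenced:
            raise RuntimeError("every entry is actively referenced")
        victim, _ = self.unreferenced.popitem(last=False)  # least recently released
        del self.entries[victim]
        del self.refcount[victim]
```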

FAULT TOLERANCE AND COHERENCE FOR SHARED MEMORY

A system includes at least one memory controller that partitions at least one memory into a plurality of nodes. Blast zones are formed, each including a predetermined number of nodes. Cache lines are erasure-encoded and stored in one or more blast zones, with at least two nodes in a blast zone storing respective portions of a cache line and at least one node in the blast zone storing a parity portion. In one aspect, it is determined that data stored in one or more nodes of a blast zone needs to be reconstructed and stored in one or more spare nodes designated to replace those nodes. Erasure decoding is then performed using data from one or more other nodes in the blast zone to reconstruct the data for storage in the spare nodes.
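
To make the parity and rebuild steps concrete, here is a sketch using a single XOR parity portion per blast zone, in the style of RAID 5. XOR is an assumption (the abstract says only "erasure encoded"), and it tolerates one lost node per zone; the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_line(line: bytes, data_nodes: int) -> List[bytes]:
    """Split a cache line over `data_nodes` nodes and add one XOR parity portion."""
    if len(line) % data_nodes:
        raise ValueError("line length must divide evenly across the data nodes")
    size = len(line) // data_nodes
    portions = [line[i * size:(i + 1) * size] for i in range(data_nodes)]
    return portions + [reduce(xor_bytes, portions)]


def reconstruct(zone: List[Optional[bytes]]) -> bytes:
    """Rebuild the one missing portion (data or parity) from the surviving nodes."""
    if sum(p is None for p in zone) != 1:
        raise ValueError("single XOR parity can rebuild exactly one lost node")
    return reduce(xor_bytes, [p for p in zone if p is not None])


def repair(zone: List[Optional[bytes]]) -> List[bytes]:
    """Return the zone with the lost node's portion placed on a spare node."""
    spare = reconstruct(zone)
    return [spare if p is None else p for p in zone]
```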

MERGING DATA FOR WRITE ALLOCATE
20240004694 · 2024-01-04

A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in an L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
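
The merge can be modelled with per-byte enable bits: bytes covered by the write keep the write data, and every other byte takes the value returned by the read. A sketch under that assumption, with a hypothetical 8-byte line and illustrative names (`L2Controller`, `Entry`, `write_miss`, `read_returns`).

```python
from dataclasses import dataclass, field
from typing import Dict, List

LINE_BYTES = 8  # hypothetical line size, kept tiny for readability


@dataclass
class Entry:
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))
    enable: List[bool] = field(default_factory=lambda: [False] * LINE_BYTES)  # per byte


class L2Controller:
    def __init__(self, next_level: Dict[int, bytes]):
        self.next_level = next_level              # stands in for memory behind the L2
        self.register_file: Dict[int, Entry] = {}
        self.outstanding_reads: List[int] = []

    def write_miss(self, address: int, offset: int, write_data: bytes) -> None:
        """Write to a line not allocated in L2: issue a read, stash the write."""
        end = offset + len(write_data)
        if end > LINE_BYTES:
            raise ValueError("sketch assumes the write fits in one line")
        if address not in self.register_file:
            self.outstanding_reads.append(address)      # generate the read request
            self.register_file[address] = Entry()       # reserve a register-file entry
        entry = self.register_file[address]
        entry.data[offset:end] = write_data             # update the data field
        entry.enable[offset:end] = [True] * len(write_data)  # update the enable field

    def read_returns(self, address: int) -> bytes:
        """Merge the returned line under the enables; written bytes win."""
        self.outstanding_reads.remove(address)
        read_data = self.next_level[address]
        entry = self.register_file.pop(address)
        for i in range(LINE_BYTES):
            if not entry.enable[i]:
                entry.data[i] = read_data[i]
        return bytes(entry.data)
```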

Method and system for distributed storage using client-side global persistent cache
10884926 · 2021-01-05

One embodiment of the present invention provides a system for facilitating a distributed storage system. The system receives, by a first client-serving machine, a first request to write data. The system writes the data to a first persistent cache associated with the first client-serving machine, wherein a persistent cache includes non-volatile memory. The system records, in an entry in a global data structure, a status for the data prior to completing a write operation for the data in a storage server, wherein the status indicates that the data has been stored in the first persistent cache but has not yet been stored in the storage server.
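
A toy model of the write path and the global status record, assuming dicts for the persistent cache, the storage server and the shared global table. All names are hypothetical; the `read_data` helper illustrates one plausible reason for recording the status early: any machine can locate data that has not yet reached the storage server.

```python
from enum import Enum
from typing import Dict, Tuple


class Status(Enum):
    IN_PERSISTENT_CACHE = 1   # cached on a client-serving machine, not yet on the server
    ON_STORAGE_SERVER = 2     # write to the storage server has completed


GlobalTable = Dict[str, Tuple[Status, str]]   # key -> (status, machine name)


class ClientServingMachine:
    def __init__(self, name: str, table: GlobalTable, storage: Dict[str, bytes]):
        self.name = name
        self.persistent_cache: Dict[str, bytes] = {}  # stands in for non-volatile memory
        self.table = table                            # shared global data structure
        self.storage = storage                        # stands in for the storage server

    def write(self, key: str, data: bytes) -> None:
        self.persistent_cache[key] = data
        # Status is recorded before the storage-server write completes.
        self.table[key] = (Status.IN_PERSISTENT_CACHE, self.name)

    def flush(self, key: str) -> None:
        self.storage[key] = self.persistent_cache[key]
        self.table[key] = (Status.ON_STORAGE_SERVER, self.name)


def read_data(key: str, table: GlobalTable,
              machines: Dict[str, ClientServingMachine],
              storage: Dict[str, bytes]) -> bytes:
    """Find the newest copy of `key` via the global table."""
    status, owner = table[key]
    if status is Status.IN_PERSISTENT_CACHE:
        return machines[owner].persistent_cache[key]
    return storage[key]
```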

Providing dead-block prediction for determining whether to cache data in cache devices
10877890 · 2020-12-29

Provided are an apparatus and system to cache data in a first cache and a second cache that cache data from a shared memory in a local processor node, wherein the shared memory is accessible to at least one remote processor node. A first cache controller writes a block to the second cache in response to determining that the block is more likely to be accessed by the local processor node than by a remote processor node. The first cache controller writes the block to the shared memory, without writing it to the second cache, in response to determining that the block is more likely to be accessed by one of the at least one remote processor node than by the local processor node.
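
The placement decision can be sketched as follows. The abstract does not say how the likelihood is estimated, so this assumes simple per-block counts of local and remote accesses; `PlacementPredictor` and `CacheController.write_back` are hypothetical names. Ties are sent to shared memory.

```python
from collections import defaultdict
from typing import Dict, Hashable


class PlacementPredictor:
    """Counts local and remote accesses per block (an assumed heuristic)."""

    def __init__(self) -> None:
        self.local: Dict[Hashable, int] = defaultdict(int)
        self.remote: Dict[Hashable, int] = defaultdict(int)

    def record(self, block: Hashable, by_local: bool) -> None:
        counts = self.local if by_local else self.remote
        counts[block] += 1

    def likely_local(self, block: Hashable) -> bool:
        return self.local[block] > self.remote[block]


class CacheController:
    def __init__(self, predictor: PlacementPredictor) -> None:
        self.predictor = predictor
        self.second_cache: Dict[Hashable, bytes] = {}
        self.shared_memory: Dict[Hashable, bytes] = {}

    def write_back(self, block: Hashable, data: bytes) -> None:
        """Place a block leaving the first cache."""
        if self.predictor.likely_local(block):
            self.second_cache[block] = data
        else:
            self.shared_memory[block] = data   # bypasses the second cache
```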

Method, apparatus, and system for prefetching exclusive cache coherence state for store instructions

A method, apparatus, and system for prefetching exclusive cache coherence state for store instructions are disclosed. An apparatus may comprise a cache and a gather buffer coupled to the cache. The gather buffer may be configured to store a plurality of cache lines, each associated with a store instruction. The gather buffer may be further configured to determine whether a first cache line associated with a first store instruction should be allocated in the cache. If the first cache line is to be allocated in the cache, the gather buffer is configured to issue a pre-write request to acquire exclusive cache coherence state for that cache line.
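
A simplified model of the gather buffer, assuming a 64-byte line, a caller-supplied allocation policy (`should_allocate`, hypothetical) and a cache whose coherence state is a dict of MESI-style letters. The pre-write is issued once, when the first store to a line arrives, and only if the line is not already held in an exclusive or modified state.

```python
from typing import Callable, Dict, List, Tuple

LINE = 64  # hypothetical cache-line size in bytes


class Cache:
    def __init__(self) -> None:
        self.state: Dict[int, str] = {}              # line address -> "M", "E", "S", ...
        self.requests: List[Tuple[str, int]] = []    # coherence requests issued

    def pre_write(self, line: int) -> None:
        # Acquire exclusive ownership before the gathered store data is written.
        self.requests.append(("pre-write", line))
        self.state[line] = "E"


class GatherBuffer:
    def __init__(self, cache: Cache, should_allocate: Callable[[int], bool]) -> None:
        self.cache = cache
        self.should_allocate = should_allocate
        self.lines: Dict[int, bytearray] = {}        # line address -> gathered bytes

    def store(self, addr: int, data: bytes) -> None:
        line, offset = addr - addr % LINE, addr % LINE
        if offset + len(data) > LINE:
            raise ValueError("sketch assumes a store stays within one line")
        if line not in self.lines:
            self.lines[line] = bytearray(LINE)
            held = self.cache.state.get(line) in ("M", "E")
            if self.should_allocate(line) and not held:
                self.cache.pre_write(line)
        self.lines[line][offset:offset + len(data)] = data
```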

Cuckoo caching
10877902 · 2020-12-29

A cuckoo cache has multiple buckets, each with multiple cells. The cells within a bucket are ranked to approximate relative usage recency. New items can be inserted into empty cells; when a bucket is full, room for a new item can be made by laterally transferring an older item to an alternative bucket. When neither empty cells nor lateral transfers are available, an item is selected for eviction based on the usage recency rank of the cell containing it. When a match is found, depending on the embodiment, the hit item can be promoted within its bucket, to its alternative bucket, or to a separate tier of the cuckoo cache. The items can be key-value pairs. No metadata is required to track usage recency, so the cuckoo cache can be a very space-efficient tool for finding cached values by their keys.
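
A compact Python sketch of the scheme, assuming two hash-derived candidate buckets per key (from BLAKE2b), promotion within the bucket on a hit, and that a laterally transferred item lands at the lowest rank of its alternative bucket. Recency lives only in cell position, so no per-item metadata is kept. `CuckooCache` and its methods are illustrative names.

```python
import hashlib
from typing import Hashable, List, Optional, Tuple


class CuckooCache:
    """Buckets of ranked cells; position in a bucket is the recency rank."""

    def __init__(self, num_buckets: int = 8, cells_per_bucket: int = 4):
        self.n = num_buckets
        self.k = cells_per_bucket
        # Index 0 is the most recently used cell; no per-item recency metadata.
        self.buckets: List[List[Tuple[Hashable, object]]] = [[] for _ in range(num_buckets)]

    def _candidates(self, key: Hashable) -> Tuple[int, int]:
        h = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        return (int.from_bytes(h[:4], "little") % self.n,
                int.from_bytes(h[4:], "little") % self.n)

    def _alternative(self, key: Hashable, current: int) -> int:
        b1, b2 = self._candidates(key)
        return b2 if current == b1 else b1

    def get(self, key: Hashable) -> Optional[object]:
        for b in self._candidates(key):
            bucket = self.buckets[b]
            for i, (k, v) in enumerate(bucket):
                if k == key:
                    bucket.insert(0, bucket.pop(i))  # promote within its bucket
                    return v
        return None

    def put(self, key: Hashable, value: object) -> None:
        b1, b2 = self._candidates(key)
        for b in (b1, b2):                       # already cached: overwrite and promote
            bucket = self.buckets[b]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket.pop(i)
                    bucket.insert(0, (key, value))
                    return
        for b in (b1, b2):                       # 1. use an empty cell
            if len(self.buckets[b]) < self.k:
                self.buckets[b].insert(0, (key, value))
                return
        primary = self.buckets[b1]
        for i in range(len(primary) - 1, -1, -1):  # 2. lateral transfer, oldest first
            old_key, old_value = primary[i]
            alt = self._alternative(old_key, b1)
            if alt != b1 and len(self.buckets[alt]) < self.k:
                del primary[i]
                self.buckets[alt].append((old_key, old_value))  # lands at lowest rank
                primary.insert(0, (key, value))
                return
        primary.pop()                            # 3. evict the lowest-ranked cell
        primary.insert(0, (key, value))
```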