G06F12/0824

Cache stashing system
20210397560 · 2021-12-23

In one embodiment, a computer server system includes a memory to store data across memory locations, multiple processing cores including respective local caches in which to cache cache-lines read from the memory, and an interconnect to: manage read and write operations of the memory and local caches; maintain local cache location data of the cached cache-lines according to the respective memory locations from which the cached cache-lines were read; receive a write request for a data element to be written to one of the memory locations; find a local cache location in which to write the data element responsively to the local cache location data and the memory location of the write request; and send an update request to a first processing core to update a respective first local cache with the data element responsively to the found local cache location.
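The lookup-and-update flow in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `Interconnect`, `Core`, `directory`, and the 64-byte `LINE_SIZE` are hypothetical, and each cached line is modelled as a small dictionary of the words touched so far rather than a full line.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache-line size in bytes (not stated in the abstract)


@dataclass
class Core:
    core_id: int
    # line address -> {word address: value}; a toy stand-in for a local cache
    cache: dict = field(default_factory=dict)


class Interconnect:
    """Hypothetical interconnect tracking which core caches which line."""

    def __init__(self, cores):
        self.cores = {c.core_id: c for c in cores}
        self.memory = {}
        self.directory = {}  # line address -> id of the core caching that line

    def read(self, core_id, addr):
        line = addr - addr % LINE_SIZE
        value = self.memory.get(addr, 0)
        self.cores[core_id].cache.setdefault(line, {})[addr] = value
        self.directory[line] = core_id  # record the local cache location
        return value

    def write(self, addr, value):
        self.memory[addr] = value
        line = addr - addr % LINE_SIZE
        owner = self.directory.get(line)
        if owner is not None:
            # Stands in for the "update request" sent to the owning core.
            self.cores[owner].cache[line][addr] = value
        return owner
```

A write whose line is tracked in `directory` is pushed into the owning core's cache; a write to an untracked line only reaches memory.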

Caching data from remote memories

An approach is disclosed that caches distant memories within the storage of a local node. The approach provides a memory caching infrastructure that supports virtual addressing by utilizing memory in the local node as a cache of distant memories for data granules. The data granules are accessed along with metadata and an ECC associated with the data granule. The metadata is updated to indicate storage of the selected data granule in the cache.

Efficient erasure-coded storage in distributed data systems

Techniques for efficiently storing client data blocks on a distributed-computing system are provided. The system includes a fast performance tier and a large capacity tier. The capacity tier stores the client data blocks in erasure-encoded data stripes. The performance tier stores logical map data including an address map indicating a correspondence between logical addresses associated with a first layer of the system and physical addresses associated with a second layer. A method includes receiving a request to include additional client data blocks in the client data blocks. The request indicates logical addresses for the additional blocks. Corresponding physical addresses for the additional blocks are determined. Each additional block is stored at its corresponding physical address. Additional logical map data is stored in the performance tier. Storing the additional logical map data includes updating the address map to indicate the correspondence between the logical addresses and the physical addresses for the additional blocks.
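The address-map update described here can be sketched as follows. The class `TieredStore` and its fields are hypothetical names, physical addresses are assumed to be allocated sequentially, and erasure coding of the stripes is deliberately left out.

```python
class TieredStore:
    """Toy two-tier store: a logical-to-physical map plus a block store."""

    def __init__(self):
        self.address_map = {}    # performance tier: logical -> physical address
        self.capacity_tier = {}  # capacity tier: physical address -> block
        self.next_physical = 0   # assumed sequential allocator

    def add_blocks(self, blocks):
        """blocks maps each logical address to the block's bytes."""
        placed = {}
        for logical, data in blocks.items():
            physical = self.next_physical      # determine the physical address
            self.next_physical += 1
            self.capacity_tier[physical] = data  # store the block there
            placed[logical] = physical
        self.address_map.update(placed)        # record the correspondence
        return placed

    def read(self, logical):
        return self.capacity_tier[self.address_map[logical]]
```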

Merging data for write allocate

A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in a L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
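One way to picture the merge step is with a per-byte enable mask. This is a sketch only: the class name `MergeEntry`, the 8-byte default line size, and the reading of the enable field as "bytes already supplied by the write, which returning read data must not overwrite" are assumptions made for illustration.

```python
class MergeEntry:
    """Toy register-file entry with a data field and a per-byte enable field."""

    def __init__(self, line_size=8):
        self.data = bytearray(line_size)
        self.enable = [False] * line_size

    def apply_write(self, offset, write_data):
        # Place the write data in the entry and mark those bytes as enabled.
        for i, b in enumerate(write_data):
            self.data[offset + i] = b
            self.enable[offset + i] = True

    def merge_read(self, read_data):
        # Fill in only the bytes the write did not supply.
        for i, b in enumerate(read_data):
            if not self.enable[i]:
                self.data[i] = b
        return bytes(self.data)
```

With a two-byte write at offset 2, the merged line keeps the written bytes and takes every other byte from the read response.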

Directory processing method and apparatus, and storage system

A directory processing method and apparatus are provided to resolve a problem that a directory occupies a relatively large quantity of caches in an existing directory processing solution. The method includes: receiving, by a first data node, a first request sent by a second data node; searching for, by the first data node, a matched directory entry in a directory of the first data node based on tag information and index information in a first physical address; and creating, when no matched directory entry is found, a first directory entry of the directory based on the first request, where the first directory entry includes the tag information, first indication information, first pointer information, and first status information, and the first pointer information is used to indicate that data in the memory address corresponding to the indication bit that is set to valid is read by the second data node.
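The search-then-create step can be sketched with a conventional tag/index split. The bit widths, the `Directory` class, and the entry layout are all assumptions: here the set of sharers stands in for the pointer information and a string stands in for the status information, while the indication bits of the abstract are not modelled.

```python
OFFSET_BITS = 6  # assumed: 64-byte granule
INDEX_BITS = 4   # assumed: 16 directory sets


def split(addr):
    """Split a physical address into (tag, index)."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


class Directory:
    def __init__(self):
        self.sets = {}  # index -> {tag: entry}

    def handle_request(self, addr, requester):
        """Look up the entry for addr; create it on a miss. Returns True if created."""
        tag, index = split(addr)
        ways = self.sets.setdefault(index, {})
        entry = ways.get(tag)
        created = entry is None
        if created:
            entry = {"tag": tag, "sharers": set(), "state": "shared"}
            ways[tag] = entry
        entry["sharers"].add(requester)
        return created
```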

Caching techniques

Techniques for caching may include: determining an update to a first data page of a first cache on a first node, wherein a second node includes a second cache and wherein the second cache includes a copy of the first data page; determining, in accordance with one or more criteria, whether to send the update from the first node to the second node; responsive to determining, in accordance with the one or more criteria, to send the update, sending the update from the first node to the second node; and responsive to determining not to send the update, sending an invalidate request from the first node to the second node, wherein the invalidate request instructs the second node to invalidate the copy of the first data page stored in the second cache of the second node.
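The choice between sending the update and sending an invalidate can be sketched as below. The abstract does not say what the criteria are; the size threshold used here (`max_fraction` of the page size) is purely an illustrative assumption, as are the names `Node` and `propagate`.

```python
class Node:
    """Toy peer node holding cached pages."""

    def __init__(self):
        self.cache = {}  # page id -> bytearray

    def apply_update(self, page_id, offset, data):
        if page_id in self.cache:
            self.cache[page_id][offset:offset + len(data)] = data

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)


def propagate(peer, page_id, offset, data, page_size=4096, max_fraction=0.25):
    """Send the update if it is small (assumed criterion), else invalidate."""
    if len(data) <= page_size * max_fraction:
        peer.apply_update(page_id, offset, data)
        return "update"
    peer.invalidate(page_id)
    return "invalidate"
```

A small update is applied to the peer's copy in place, while a large one drops the peer's copy so that the next access refetches the page.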

Local cached data coherency in host devices using remote direct memory access

A first host device establishes connectivity to a logical storage device of a storage system. The first host device obtains from the storage system host connectivity information identifying at least a second host device that has also established connectivity to the logical storage device, caches one or more extents of the logical storage device in a memory of the first host device, and maintains local cache metadata in the first host device regarding the one or more extents of the logical storage device cached in the memory of the first host device. In conjunction with processing of a write operation of the first host device involving at least one of the one or more cached extents of the logical storage device, the first host device invalidates corresponding entries in the local cache metadata of the first host device and in local cache metadata maintained in the second host device.
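The write-time invalidation can be sketched as follows. The `Host` class and its fields are hypothetical, the storage system is a plain dictionary, and the direct method call on each peer merely stands in for whatever remote direct memory access operation would reach the peer's metadata.

```python
class Host:
    """Toy host that caches extents of a shared logical storage device."""

    def __init__(self, name):
        self.name = name
        self.cached_extents = {}  # extent id -> cached bytes
        self.peers = []           # other hosts connected to the same device

    def cache_extent(self, storage, extent):
        self.cached_extents[extent] = storage[extent]

    def invalidate(self, extent):
        self.cached_extents.pop(extent, None)

    def write(self, storage, extent, data):
        storage[extent] = data
        self.invalidate(extent)      # drop the local cache entry
        for peer in self.peers:
            peer.invalidate(extent)  # stand-in for the remote invalidation
```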

Global coherence operations

A method includes receiving, by an L2 controller, a request to perform a global operation on an L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes, in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.
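The staged drain described above can be modelled as a small state machine. This is a sketch, not the claimed hardware: the `Pipeline` class, the numeric phases, and the transaction kind strings (`"read"`, `"write"` for non-victim writes, `"snoop"`, `"response"`, `"victim"`) are illustrative assumptions.

```python
BLOCKING = {"read", "write"}  # "write" stands for a non-victim write


class Pipeline:
    def __init__(self):
        self.pending = []
        self.phase = None  # None means normal operation
        self.done = False  # set once the global operation has been performed

    def start_global_op(self):
        self.phase = 0
        self._advance()

    def admits(self, kind):
        if self.phase is None:
            return True
        if self.phase == 0:  # block only blocking transactions
            return kind not in BLOCKING
        if self.phase == 1:  # additionally block snoops
            return kind in {"response", "victim"}
        return False         # phase 2: block everything

    def submit(self, kind):
        if self.admits(kind):
            self.pending.append(kind)
            return True
        return False

    def retire(self, kind):
        self.pending.remove(kind)
        self._advance()

    def _advance(self):
        if self.phase == 0 and not any(k in BLOCKING for k in self.pending):
            self.phase = 1
        if self.phase == 1 and "snoop" not in self.pending:
            self.phase = 2
        if self.phase == 2 and not self.pending:
            self.done = True   # the global operation runs here
            self.phase = None  # resume normal operation
```

Each phase only tightens admission once the previous class of transactions has drained, so responses and victims that older transactions depend on can still complete.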

Systems and methods for implementing coherent memory in a multiprocessor system

Data units are stored in private caches in nodes of a multiprocessor system, each node containing at least one processor (CPU), at least one cache private to the node, and at least one cache location buffer (CLB) private to the node. In each CLB, location information values are stored, each location information value indicating a location associated with a respective data unit, wherein each location information value stored in a given CLB indicates the location to be a location within the private cache disposed in the same node as the given CLB, a location in one of the other nodes, or a location in a main memory. Coherence of values of the data units is maintained using a cache coherence protocol. The location information values stored in the CLBs are updated by the cache coherence protocol in accordance with movements of their respective data units.
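The three kinds of location information, and their update when a data unit moves, can be sketched as follows. The tuple encodings, the `Node` class, and the `move_to` helper are hypothetical; a real protocol would exchange coherence messages rather than iterate over all nodes.

```python
MEMORY = ("memory",)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = {}  # private cache: data unit -> value
        self.clb = {}    # data unit -> ("local",) | ("node", id) | ("memory",)

    def locate(self, unit):
        # Units this CLB does not track are assumed to live in main memory.
        return self.clb.get(unit, MEMORY)


def move_to(nodes, unit, value, dst_id):
    """Move a data unit into dst's private cache and update the CLBs tracking it."""
    for node in nodes:
        if node.node_id == dst_id:
            node.cache[unit] = value
            node.clb[unit] = ("local",)
        else:
            node.cache.pop(unit, None)
            if unit in node.clb:
                node.clb[unit] = ("node", dst_id)
```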

Distributed coherence directory subsystem with exclusive data regions

A processing system includes a first set of one or more processing units including a first processing unit, a second set of one or more processing units including a second processing unit, and a memory having an address space shared by the first and second sets. The processing system further includes a distributed coherence directory subsystem having a first coherence directory to support a first subset of one or more address regions of the address space and a second coherence directory to support a second subset of one or more address regions of the address space. In some implementations, the first coherence directory is implemented in the system so as to have a lower access latency for the first set, whereas the second coherence directory is implemented in the system so as to have a lower access latency for the second set.
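Routing a request to the coherence directory that supports its address region can be sketched as below. The `DirectorySubsystem` class, the half-open `(start, end)` region encoding, and the directory names are assumptions; latency differences between the directories are not modelled.

```python
class DirectorySubsystem:
    """Toy distributed directory: each address region has one owning directory."""

    def __init__(self, regions):
        # regions: list of (start, end, directory_name), with end exclusive
        self.regions = sorted(regions)
        self.entries = {name: {} for _, _, name in regions}

    def directory_for(self, addr):
        for start, end, name in self.regions:
            if start <= addr < end:
                return name
        raise ValueError(f"address {addr:#x} is outside every region")

    def record_sharer(self, addr, unit):
        """Record that a processing unit holds addr; returns the directory used."""
        name = self.directory_for(addr)
        self.entries[name].setdefault(addr, set()).add(unit)
        return name
```

In this sketch, placing a region under the directory nearest to the processing units that use it most would correspond to the lower access latency mentioned in the abstract.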