G06F12/0824

COHERENCE-BASED CACHE-LINE COPY-ON-WRITE
20230023256 · 2023-01-26 ·

A method of performing a copy-on-write on a shared memory page is carried out by a device communicating with a processor via a coherence interconnect. The method includes: adding a page table entry so that a request to read a first cache line of the shared memory page includes a cache-line address of the shared memory page and a request to write to a second cache line of the shared memory page includes a cache-line address of a new memory page; in response to the request to write to the second cache line, storing new data of the second cache line in a second memory and associating the second cache-line address with the new data stored in the second memory; and in response to a request to read the second cache line, reading the new data of the second cache line from the second memory.

Hardware coherent computational expansion memory
11556344 · 2023-01-17 · ·

Embodiments herein describe transferring ownership of data (e.g., cachelines or blocks of data comprising multiple cachelines) from a host to hardware in an I/O device. In one embodiment, the host and I/O device (e.g., an accelerator) are part of a cache-coherent system where ownership of data can be transferred from a home agent (HA) in the host to a local HA in the I/O device—e.g., a computational slave agent (CSA). That way, a function on the I/O device (e.g., an accelerator function) can request data from the local HA without these requests having to be sent to the host HA. Further, the accelerator function can indicate whether the local HA tracks the data on a cacheline-basis or by a data block (e.g., multiple cachelines). This provides flexibility that can reduce overhead from tracking the data, depending on the function's desired use of the data.

Scalable cache coherency protocol

A scalable cache coherency protocol for system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.

Pipeline arbitration

A method includes receiving, by a first stage in a pipeline, a first transaction from a previous stage in pipeline; in response to first transaction comprising a high priority transaction, processing high priority transaction by sending high priority transaction to a buffer; receiving a second transaction from previous stage; in response to second transaction comprising a low priority transaction, processing low priority transaction by monitoring a full signal from buffer while sending low priority transaction to buffer; in response to full signal asserted and no high priority transaction being available from previous stage, pausing processing of low priority transaction; in response to full signal asserted and a high priority transaction being available from previous stage, stopping processing of low priority transaction and processing high priority transaction; and in response to full signal being de-asserted, processing low priority transaction by sending low priority transaction to buffer.

REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

GLOBAL COHERENCE OPERATIONS

A method includes receiving, by a L2 controller, a request to perform a global operation on a L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes, in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing, all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.

Method and apparatus for accelerating deduplication processing
11409667 · 2022-08-09 · ·

A deduplication engine maintains a hash table containing hash values of tracks of data stored on managed drives of a storage system. The deduplication engine keeps track of how frequently the tracks are accessed by the deduplication engine using an exponential moving average for each track. Target tracks which are frequently accessed by the deduplication engine are cached in local memory, so that required byte-by-byte comparisons between the target track and write data may be performed locally rather than requiring the target track to be read from managed drives. The deduplication engine implements a Least Recently Used (LRU) cache data structure in local memory to manage locally cached tracks of data. If a track is to be removed from local memory, a final validation of the target track is implemented on the version stored in managed resources before evicting the track from the LRU cache.

Maintaining and recomputing reference counts in a persistent memory file system

Techniques are provided for maintaining and recomputing reference counts in a persistent memory file system of a node. Primary reference counts are maintained for pages within persistent memory of the node. In response to receiving a first operation to link a page into a persistent memory file system of the persistent memory, a primary reference count of the page is incremented before linking the page into the persistent memory file system. In response to receiving a second operation to unlink the page from the persistent memory file system, the page is unlinked from the persistent memory file system before the primary reference count is decremented. Upon the node recovering from a crash, the persistent memory file system is traversed in order to update shadow reference counts for the pages with correct reference count values, which are used to overwrite the primary reference counts with the correct reference count values.

METHOD AND SYSTEM FOR ESTABLISHING A DISTRIBUTED NETWORK WITHOUT A CENTRALIZED DIRECTORY
20220283945 · 2022-09-08 · ·

A method for establishing a connection between two nodes in a communication network without use of a centralized directory or mapping identifiers includes: receiving a lookup message from another node in the communication network that includes a lookup term; determining if a target node in a local directory cache can be identified that satisfies the lookup term; and, if such a node is identified, establishing a connection to the target node and forwarding the lookup message, or, if no such node is identified, forwarding the lookup message to other nodes in the network with which the node has an active communication connection.

Method for Processing Non-Cache Data Write Request, Cache, and Node
20220276960 · 2022-09-01 ·

A method for processing a non-cache data write request, a cache, and a node are provided. The method includes: A cache receives a first non-cache data write request from a first processor, and sends the first non-cache data write request to a node, where the first non-cache data write request includes a first address. If the cache determines that the first address is stored in the cache, the cache obtains first data corresponding to the first non-cache data write request from the first processor. When receiving a first data buffer identifier from the node, the cache sends the first data to the node. After receiving the first non-cache data write request, if the cache determines that the first address is locally stored, the cache may obtain the first data from the processor. After receiving the first data buffer identifier, the cache may send the first data to the node.