G06F12/0824

Link affinitization to reduce transfer latency

Examples described herein relate to processor circuitry to issue a cache coherence message to a central processing unit (CPU) cluster by selection of a target cluster and issuance of the request to the target cluster, wherein the target cluster comprises the cluster or the target cluster is directly connected to the cluster. In some examples, the selected target cluster is associated with a minimum number of die boundary traversals. In some examples, the processor circuitry is to read an address range for the cluster to identify the target cluster using a single range check over memory regions including local and remote clusters. In some examples, issuance of the cache coherence message to a cluster is to cause the cache coherence message to traverse one or more die interconnections to reach the target cluster.

Memory node controller

A memory node controller for a node of a data processing network, the network including at least one computing device and at least one data resource, each data resource addressed by a physical address. The node is configured to couple the at least one computing device with the at least one data resource. Elements of the data processing network are addressed via a system address space. The memory node controller includes a first interface to the at least one data resource, a second interface to the at least one computing device, and a system to physical address translator cache configured to translate a system address in the system address space to a physical address in the physical address space of the at least one data resource.

Fault-tolerant cache coherence over a lossy network

A cache coherence system manages both internode and intranode cache coherence in a cluster of nodes. Each node in the cluster of nodes is either a collection of processors running an intranode coherence protocol between themselves, or a single processor. A node comprises a plurality of coherence ordering units (COUs) that are hardware circuits configured to manage intranode coherence of caches within the node and/or internode coherence with caches on other nodes in the cluster. Each node contains one or more directories which tracks the state of cache line entries managed by the particular node. Each node may also contain one or more scoreboards for managing the status of ongoing transactions. The internode cache coherence protocol implemented in the COUs may be used to detect and resolve communications errors, such as dropped message packets between nodes, late message delivery at a node, or node failure. Additionally, a transport layer manages communication between the nodes in the cluster, and can additionally be used to detect and resolve communications errors.

System, Apparatus And Method For Memory Mirroring In A Buffered Memory Architecture
20190278721 · 2019-09-12 ·

In one embodiment, an apparatus includes: a first memory controller to control access to a first memory, the first memory controller including a memory mirroring circuit, in response to a memory write request from a first processor socket for which the first memory comprises a primary memory region, to cause data associated with the memory write request to be written to the first memory and to send a shadow memory write request to a second memory to cause the second memory to write the data into a secondary memory region; and a shadow memory table including a plurality of entries each to store an association between a primary memory region and a secondary memory region. The memory mirroring circuit may access the shadow memory table to identify the secondary memory region. Other embodiments are described and claimed.

Network-aware cache coherence protocol enhancement

A non-uniform memory access system includes several nodes that each have one or more processors, caches, local main memory, and a local bus that connects a node's processor(s) to its memory. The nodes are coupled to one another over a collection of point-to-point interconnects, thereby permitting processors in one node to access data stored in another node. Memory access time for remote memory takes longer than local memory because remote memory accesses have to travel across a communications network to arrive at the requesting processor. In some embodiments, inter-cache and main-memory-to-cache latencies are measured to determine whether it would be more efficient to satisfy memory access requests using cached copies stored in caches of owning nodes or from main memory of home nodes.

Adaptive coherence for latency-bandwidth tradeoffs in emerging memory technologies

Examples include a processor including a coherency mode indicating one of a directory-based cache coherence protocol and a snoop-based cache coherency protocol, and a caching agent to monitor a bandwidth of reading from and/or writing data to a memory coupled to the processor, to set the coherency mode to the snoop-based cache coherency protocol when the bandwidth exceeds a threshold, and to set the coherency mode to the directory-based cache coherency protocol when the bandwidth does not exceed the threshold.

CONTROLLER WITH CACHING AND NON-CACHING MODES

An apparatus includes a CPU core, a first cache subsystem coupled to the CPU core, and a second memory coupled to the cache subsystem. The first cache subsystem includes a configuration register, a first memory, and a controller. The controller is configured to: receive a request directed to an address in the second memory and, in response to the configuration register having a first value, operate in a non-caching mode. In the non-caching mode, the controller is configured to provide the request to the second memory without caching data returned by the request in the first memory. In response to the configuration register having a second value, the controller is configured to operate in a caching mode. In the caching mode the controller is configured to provide the request to the second memory and cache data returned by the request in the first memory.

DATA CACHING METHOD, SYSTEM AND DEVICE IN AI CLUSTER, AND COMPUTER MEDIUM
20240152458 · 2024-05-09 ·

The present application discloses a data caching method, system and device in an AI cluster, and a computer medium. The method comprises: determining a target data set to be cached; obtaining a weight value of the target data set on each cluster node in the AI cluster, determining a target cluster node for caching the target data set; obtaining a target shortest path from remaining cluster nodes comprising nodes except the target cluster node in the AI cluster, and on the basis of the weight value, the target shortest path, and the preceding node, determining a cache path for caching the target cluster node, so as to cache the target data set to the target cluster node according to the cache path. According to the present application, the cache path can be matched with the storage capacity of the AI cluster, caching of the target data set on the basis of the cache path is equivalent to caching of the data set on the basis of the storage performance of the AI cluster, and the data caching performance of the AI cluster can be improved.

REMOTE ATOMIC OPERATIONS IN MULTI-SOCKET SYSTEMS

Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.

CUCKOO CACHING
20190236028 · 2019-08-01 ·

A cuckoo cache has plural buckets of plural cells each. The cells within a bucket are ranked to approximate relative usage recency. New items can be inserted into empty cells; when a bucket is full, room for a new item can be made by laterally transferring an older item to an alternative bucket. When empty cells and lateral transfers are unavailable, an item is selected for eviction based on the usage recency rank of the containing cell. When a match is found, depending on the embodiment, the hit item can be promoted within its bucket, to its alternative bucket, or to a separate tier of the cuckoo cache. The items can be key-value pairs. No metadata is required to track usage recency so that the cuckoo cache can be a very space efficient tool for finding cached values by their keys.