G06F12/0826

LINK AFFINITIZATION TO REDUCE TRANSFER LATENCY

Examples described herein relate to processor circuitry to issue a cache coherence message to a central processing unit (CPU) cluster by selection of a target cluster and issuance of the request to the target cluster, wherein the target cluster comprises the cluster or the target cluster is directly connected to the cluster. In some examples, the selected target cluster is associated with a minimum number of die boundary traversals. In some examples, the processor circuitry is to read an address range for the cluster to identify the target cluster using a single range check over memory regions including local and remote clusters. In some examples, issuance of the cache coherence message to a cluster is to cause the cache coherence message to traverse one or more die interconnections to reach the target cluster.

Home agent based cache transfer acceleration scheme

Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

ERROR RECOVERY STORAGE FOR NON-ASSOCIATIVE MEMORY
20200285550 · 2020-09-10 ·

An apparatus comprises a non-associative memory comprising a plurality of storage locations, and error recovery storage to store at least one error recovery entry providing a recovery value for a corresponding storage location of the non-associative memory. Control circuitry is responsive to a non-associative memory read request specifying a target address of a storage location of the non-associative memory, when the error recovery storage includes a valid matching error recovery entry for which the corresponding storage location is the storage location identified by the target address, to return the recovery value stored in the valid matching error recovery entry as a response to the non-associative memory read request, instead of information stored in the storage location identified by the target address. This enables the apparatus to continue to function even if hard errors occur in a storage location of the non-associative memory.

SIMULTANEOUS, NON-ATOMIC REQUEST PROCESSING WITHIN AN SMP ENVIRONMENT BROADCAST SCOPE FOR MULTIPLY-REQUESTED DATA ELEMENTS USING REAL-TIME PARALLELIZATION

Provided are systems, methods, and media for simultaneous, non-atomic request processing of snooped operations of a broadcast scope within a SMP system. An example method includes detecting, by a first controller, based on a set of coherency resolution conditions, whether there are coherency resolution problems between two snooped operations. The method includes in response to detecting, by the first controller, that coherency resolution problems will not result, transmitting, from the first controller to a second controller, an indication signal indicating that coherency resolution problems will not result from the operation. The set of coherency resolution conditions includes: (a) detecting that a second operation of the two snooped operations operation is of a predetermined type, (b) detecting at time of snooping of the second operation that a directory state does not allow for exclusive data, and (c) detecting that the first controller has started committing to an update.

Table of contents cache entry having a pointer for a range of addresses

Table of contents (TOC) pointer cache entry having a pointer for a range of addresses. An address of a called routine and a pointer value of a pointer to a reference data structure to be entered into a reference data structure pointer cache are obtained. The reference data structure pointer cache includes a plurality of entries, and an entry of the plurality of entries includes a stored pointer value for an address range. A determination is made, based on the pointer value, whether an existing entry exists in the reference data structure pointer cache for the pointer value. Based on determining the existing entry exists, one of an address_from field of the existing entry or an address_to field of the existing entry is updated using the address of the called routine. The stored pointer value of the existing entry is usable to access the reference data structure for the address range defined by the address_from field and the address_to field.

Region based split-directory scheme to adapt to large cache sizes

Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Table of contents cache entry having a pointer for a range of addresses

Table of contents (TOC) pointer cache entry having a pointer for a range of addresses. An address of a called routine and a pointer value of a pointer to a reference data structure to be entered into a reference data structure pointer cache are obtained. The reference data structure pointer cache includes a plurality of entries, and an entry of the plurality of entries includes a stored pointer value for an address range. A determination is made, based on the pointer value, whether an existing entry exists in the reference data structure pointer cache for the pointer value. Based on determining the existing entry exists, one of an address_from field of the existing entry or an address_to field of the existing entry is updated using the address of the called routine. The stored pointer value of the existing entry is usable to access the reference data structure for the address range defined by the address_from field and the address_to field.

SYMMETRICAL MULTI-PROCESSING NODE
20200167283 · 2020-05-28 ·

A symmetrical multi-processing (SMP) node, a distributed SMP (DSMP) system comprising a plurality of SMP nodes, and a method implemented in the SMP node are disclosed. The SMP node comprises: a plurality of processors, a memory coupled to the plurality of processors, and a memory coherent proxy coupled to the plurality of processors through a coherent accelerator interface. The memory coherent proxy is configured to manage statuses of cache lines in the memory.

Dual clusters of fully connected integrated circuit multiprocessors with shared high-level cache

Embodiments of the present invention are directed to managing a shared high-level cache for dual clusters of fully connected integrated circuit multiprocessors. An example of a computer-implemented method includes: providing a drawer comprising a plurality of clusters, each of the plurality of clusters comprising a plurality of processors; providing a shared cache integrated circuit to manage a shared cache memory among the plurality of clusters; receiving, by the shared cache integrated circuit, an operation of one of a plurality of operation types from one of the plurality of processors; and processing, by the shared cache integrated circuit, the operation based at least in part on the operation type of the operation according to a set of rules for processing the operation type.

Dual clusters of fully connected integrated circuit multiprocessors with shared high-level cache

Embodiments of the present invention are directed to managing a shared high-level cache for dual clusters of fully connected integrated circuit multiprocessors. An example of a computer-implemented method includes: providing a drawer comprising a plurality of clusters, each of the plurality of clusters comprising a plurality of processors; providing a shared cache integrated circuit to manage a shared cache memory among the plurality of clusters; receiving, by the shared cache integrated circuit, an operation of one of a plurality of operation types from one of the plurality of processors; and processing, by the shared cache integrated circuit, the operation based at least in part on the operation type of the operation according to a set of rules for processing the operation type.