G06F12/0828

Cache coherent node controller for scale-up shared memory systems having interconnect switch between a group of CPUS and FPGA node controller

The present invention relates to cache coherent node controllers for scale-up shared memory systems. In particular it is disclosed a computer system at least comprising a first group of CPU modules connected to at least one first FPGA Node Controller configured to execute transactions directly or through a first interconnect switch to at least one second FPGA Node Controller connected to a second group of CPU modules running a single instance of an operating system.

Selective downstream cache processing for data access

A first request is received to access a first set of data in a first cache. A likelihood that a second request to a second cache for the first set of data will be canceled is determined. Access to the first set of data is completed based on the determining the likelihood that the second request to the second cache for the first set of data will be canceled.

Bits register for synonyms in a memory system

In an approach to tracking and invalidating memory address synonyms in a memory system includes establishing a bits register for a first virtual address in a memory system, the bits register having synonym fields representing each bit of a first synonym identifier portion of the first virtual address, the first virtual address being mapped to a physical address; determining, for a second virtual address mapped to the physical address, the second virtual address having a second synonym identifier portion, a set of differing bits within the second synonym identifier portion compared to the first synonym identifier portion; and registering the set of differing bits in the bits register.

Transfer protocol in a data processing network

In a data processing network comprising one or more Request Nodes and a Home Node coupled via a coherent interconnect, a Request Node requests data from the Home Node. The requested data is sent, via the interconnect, to the Request Node in a plurality of data beats, where a first data beat of the plurality of data beats is received at a first time and a last data beat is received at a second time. Responsive to receiving the first data beat, the Request Node sends an acknowledgement message to the Home Node. Upon receipt of the acknowledgement message, the Home Node frees resources allocated to the read transaction. In addition, the Home Node is configured to allow snoop requests for the data to the Request Node to be sent to the Request Node before all beats of the requested data have been received by the Request Node.

IMPLEMENTING CRASH CONSISTENCY IN PERSISTENT MEMORY
20210064415 · 2021-03-04 ·

A computer-implemented method according to one aspect includes receiving a request to perform a transaction in persistent memory; determining a correlation between volatile memory address locations in a volatile transaction cache and persistent memory locations in the persistent memory; performing the transaction within the volatile memory address locations of the volatile transaction cache; identifying modified volatile memory address locations in the volatile transaction cache that have been written during the transaction; logging, within the persistent memory, data within the modified volatile memory address locations; copying the data within the modified volatile memory address locations to corresponding persistent memory locations in the persistent memory, utilizing the determined correlation; and removing the logged data from the persistent memory, in response to determining that the copying has completed.

SOFTWARE-DEFINED COHERENT CACHING OF POOLED MEMORY

Methods and apparatus for software-defined coherent caching of pooled memory. The pooled memory is implemented in an environment having a disaggregated architecture where compute resources such as compute platforms are connected to disaggregated memory via a network or fabric. Software-defined caching policies are implemented in hardware in a processor SoC or discrete device such as a Network Interface Controller (NIC) by programming logic in an FPGA or accelerator on the SoC or discrete device. The programmed logic is configured to implement software-defined caching policies in hardware for effecting disaggregated memory (DM) caching in an associated DM cache of at least a portion of an address space allocated for the software application in the disaggregated memory. In connection with DM cache operations, such as cache lines evicted from a CPU, logic implemented in hardware determines whether a cache line in a DM cache is to be convicted and implements the software-defined caching policy for the DM cache including associated memory coherency operations.

GLOBAL COHERENCE OPERATIONS

A method includes receiving, by a L2 controller, a request to perform a global operation on a L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes, in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing, all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.

MULTICORE SHARED CACHE OPERATION ENGINE

Techniques including receiving configuration information for a trigger control channel of the one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, store the first memory management command, detecting a first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.

STACKED MEMORY DEVICE SYSTEM INTERCONNECT DIRECTORY-BASED CACHE COHERENCE METHODOLOGY
20210034524 · 2021-02-04 ·

A system includes a plurality of host processors and a plurality of hybrid memory cube (HMC) devices configured as a distributed shared memory for the host processors. An HMC device includes a plurality of integrated circuit memory die including at least a first memory die arranged on top of a second memory die, and at least a portion of the memory of the memory die is mapped to include at least a portion of a memory coherence directory; and a logic base die including at least one memory controller configured to manage three-dimensional (3D) access to memory of the plurality of memory die by at least one second device, and logic circuitry configured to implement a memory coherence protocol for data stored in the memory of the plurality of memory die.

VIRTUAL NETWORK PRE-ARBITRATION FOR DEADLOCK AVOIDANCE AND ENHANCED PERFORMANCE
20210026768 · 2021-01-28 ·

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to, in a first clock cycle, a pre-arbitration winner between a first memory access request and a second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to, in a second clock cycle select a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of a priority of the pre-arbitration winner and a priority of the subsequent memory access request.