G06F12/0824

Cancel and replay protocol scheme to improve ordered bandwidth

Systems, apparatuses, and methods for implementing a cancel and replay mechanism for ordered requests are disclosed. A system includes at least an ordering master, a memory controller, a coherent slave coupled to the memory controller, and an interconnect fabric coupled to the ordering master and the coherent slave. The ordering master generates a write request which is forwarded to the coherent slave on the path to memory. The coherent slave sends invalidating probes to all processing nodes and, once all cached copies of the data targeted by the write request have been invalidated, sends the ordering master an indication that the write request is globally visible. In response to receiving the globally visible indication, the ordering master starts a timer. If the timer expires before all older requests have become globally visible, then the write request is cancelled and replayed to ensure forward progress in the fabric and avoid a potential deadlock scenario.
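The timer rule in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the integer tick clock, and the `timeout` parameter are all assumptions made for illustration, and the replay is modeled simply as clearing the write's visibility so that it must be acknowledged again.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OrderedWrite:
    seq: int                          # program order; a lower value is older
    globally_visible: bool = False
    visible_at: Optional[int] = None  # tick at which the timer was started
    replays: int = 0


class OrderingMaster:
    """Hypothetical model of the ordering master's timer logic."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout        # assumed timer length, in ticks
        self.pending: Dict[int, OrderedWrite] = {}

    def issue(self, seq: int) -> None:
        self.pending[seq] = OrderedWrite(seq)

    def on_globally_visible(self, seq: int, now: int) -> None:
        # Receiving the indication starts the timer for this write.
        write = self.pending[seq]
        write.globally_visible = True
        write.visible_at = now

    def tick(self, now: int) -> List[int]:
        """Cancel and replay writes whose timer expired; return their seqs."""
        replayed = []
        for write in sorted(self.pending.values(), key=lambda w: w.seq):
            if not write.globally_visible:
                continue
            older_done = all(
                other.globally_visible
                for other in self.pending.values()
                if other.seq < write.seq
            )
            if older_done:
                continue              # ordering satisfied, nothing to cancel
            if now - write.visible_at >= self.timeout:
                # Timer expired while an older request is still outstanding.
                write.globally_visible = False
                write.visible_at = None
                write.replays += 1
                replayed.append(write.seq)
        return replayed
```

In this sketch a younger write that becomes visible early gives up its claim once the timer expires, which is one way to read the forward-progress argument in the abstract.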

TRANSFER TRACK FORMAT INFORMATION FOR TRACKS IN CACHE AT A PRIMARY STORAGE SYSTEM TO A SECONDARY STORAGE SYSTEM TO WHICH TRACKS ARE MIRRORED TO USE AFTER A FAILOVER OR FAILBACK

Provided are a computer program product, system, and method to transfer track format information for tracks in cache at a primary storage system to a secondary storage system to which tracks are mirrored to use after a failover or failback. In response to a failover from the primary storage system to the secondary storage system, the primary storage system adds a track identifier of the track and track format information indicating a layout of data in the track, indicated in track metadata for the track in the primary storage, to a cache transfer list. The primary storage system transfers the cache transfer list to the secondary storage system to use the track format information in the cache transfer list for a track staged into the secondary cache having a track identifier in the cache transfer list.
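As a rough illustration of the cache transfer list described above, the following sketch builds the list on the primary side and looks up format information on the secondary side. The function names, the dictionary-based cache and metadata, and the string format codes are hypothetical; the abstract does not specify any of these representations.

```python
def build_cache_transfer_list(primary_cache, track_metadata):
    """Collect (track_id, format_info) for tracks in the primary cache.

    primary_cache:  dict of track_id -> cached data (assumed layout)
    track_metadata: dict of track_id -> track format information
    """
    transfer_list = []
    for track_id in primary_cache:
        format_info = track_metadata.get(track_id)
        if format_info is not None:
            transfer_list.append((track_id, format_info))
    return transfer_list


def format_for_staged_track(transfer_list, track_id):
    """On the secondary, return format info for a staged track, if listed."""
    return dict(transfer_list).get(track_id)
```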

Arithmetic processing apparatus and control method for arithmetic processing apparatus
10521346 · 2019-12-31

An arithmetic processing apparatus includes a plurality of core memory groups, each of the core memory groups including a plurality of arithmetic processing circuits, cache memory circuitry, shared by the plurality of arithmetic processing circuits, including a cache memory, a cache tag that stores a state of the cache memory, a tag directory that stores data possession information by a cache memory in another core memory group, and a memory access control circuit that receives a first memory access request from the cache memory circuitry and controls access to a memory other than a cache memory included in the cache memory circuitry, and a cache memory control circuit that receives a second memory access request from the arithmetic processing circuits and a third memory access request from the another core memory group and controls access to the cache memory.

System and method for network interface controller based distributed cache
11940917 · 2024-03-26

Methods and systems for managing storage of data in a distributed system are disclosed. To manage storage of data in a distributed system, a data processing system may include a network interface controller (NIC). The NIC may present emulated storage devices that may be used for data storage. The emulated storage devices may utilize storage resources of storage devices. The storage devices may be remote to the NIC. To reduce communication bandwidth and/or use of resources of the storage devices, the NIC and/or NICs of other data processing systems may implement a distributed cache for data stored in the storage devices. The NICs may implement a method of managing the distributed cache to maintain synchronization between the distributed cache and the data stored in the storage devices.

Scalable system on a chip

A system including a plurality of processor cores, a plurality of graphics processing units, a plurality of peripheral circuits, and a plurality of memory controllers is configured to support scaling of the system using a unified memory architecture.

In-memory lightweight memory coherence protocol
11908546 · 2024-02-20

A system includes a plurality of host processors and a plurality of HMC devices configured as a distributed shared memory for the host processors. An HMC device includes a plurality of integrated circuit memory die including at least a first memory die arranged on top of a second memory die and at least a portion of the memory of the memory die is mapped to include at least a portion of a memory coherence directory; and a logic base die including at least one memory controller configured to manage three-dimensional (3D) access to memory of the plurality of memory die by at least one second device, and logic circuitry configured to determine memory coherence state information for data stored in the memory of the plurality of memory die, communicate information regarding the access to memory, and include the memory coherence information in the communicated information.

Controller with caching and non-caching modes

An apparatus includes a CPU core, a first cache subsystem coupled to the CPU core, and a second memory coupled to the cache subsystem. The first cache subsystem includes a configuration register, a first memory, and a controller. The controller is configured to: receive a request directed to an address in the second memory and, in response to the configuration register having a first value, operate in a non-caching mode. In the non-caching mode, the controller is configured to provide the request to the second memory without caching data returned by the request in the first memory. In response to the configuration register having a second value, the controller is configured to operate in a caching mode. In the caching mode the controller is configured to provide the request to the second memory and cache data returned by the request in the first memory.
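The two modes selected by the configuration register can be sketched as follows. This is only an illustration: the class name, the register values 0 and 1, and the dictionary-backed memories are assumptions, and serving a hit from the first memory in caching mode is an added assumption that the abstract does not state.

```python
class CacheController:
    NON_CACHING = 0   # hypothetical "first value" of the register
    CACHING = 1       # hypothetical "second value" of the register

    def __init__(self, second_memory, config_register=NON_CACHING):
        self.second_memory = second_memory   # backing memory, addr -> data
        self.first_memory = {}               # cache storage, addr -> data
        self.config_register = config_register

    def read(self, addr):
        caching = self.config_register == self.CACHING
        # Assumption: in caching mode a previously cached line is a hit.
        if caching and addr in self.first_memory:
            return self.first_memory[addr]
        # Both modes provide the request to the second memory.
        data = self.second_memory[addr]
        if caching:
            # Only caching mode keeps the returned data in the first memory.
            self.first_memory[addr] = data
        return data
```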

Maintaining and recomputing reference counts in a persistent memory file system

Techniques are provided for maintaining and recomputing reference counts in a persistent memory file system of a node. Primary reference counts are maintained for pages within persistent memory of the node. In response to receiving a first operation to link a page into a persistent memory file system of the persistent memory, a primary reference count of the page is incremented before linking the page into the persistent memory file system. In response to receiving a second operation to unlink the page from the persistent memory file system, the page is unlinked from the persistent memory file system before the primary reference count is decremented. Upon the node recovering from a crash, the persistent memory file system is traversed in order to update shadow reference counts for the pages with correct reference count values, which are used to overwrite the primary reference counts with the correct reference count values.
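The ordering in the abstract above (increment before link, unlink before decrement) means a crash can leave a primary count too high but never too low. A minimal sketch, with an in-memory list standing in for the persistent file system and all names hypothetical:

```python
class PmemFileSystem:
    def __init__(self):
        self.primary = {}   # page -> primary reference count
        self.links = []     # (file, page) pairs currently linked

    def link(self, file, page):
        # Increment first, so a crash in between only over-counts.
        self.primary[page] = self.primary.get(page, 0) + 1
        self.links.append((file, page))

    def unlink(self, file, page):
        # Unlink first, so a crash in between only over-counts.
        self.links.remove((file, page))
        self.primary[page] -= 1

    def recover(self):
        # Traverse the file system to rebuild shadow counts, then
        # overwrite the primary counts with the recomputed values.
        shadow = {page: 0 for page in self.primary}
        for _, page in self.links:
            shadow[page] = shadow.get(page, 0) + 1
        self.primary = shadow
```

Simulating a crash after the increment but before the link leaves `primary` one too high for that page; calling `recover()` brings it back in line with the links actually present.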

DISTRIBUTED COHERENCE DIRECTORY SUBSYSTEM WITH EXCLUSIVE DATA REGIONS
20190370174 · 2019-12-05

A processing system includes a first set of one or more processing units including a first processing unit, a second set of one or more processing units including a second processing unit, and a memory having an address space shared by the first and second sets. The processing system further includes a distributed coherence directory subsystem having a first coherence directory to support a first subset of one or more address regions of the address space and a second coherence directory to support a second subset of one or more address regions of the address space. In some implementations, the first coherence directory is implemented in the system so as to have a lower access latency for the first set, whereas the second coherence directory is implemented in the system so as to have a lower access latency for the second set.
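One way to picture the split described above is a router that sends each address to the directory responsible for its region. The sketch below is illustrative only: the half-open address ranges, the class and function names, and the directory labels are assumptions, and access latency is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CoherenceDirectory:
    name: str
    regions: List[Tuple[int, int]]          # half-open [start, end) ranges
    entries: Dict[int, str] = field(default_factory=dict)

    def covers(self, addr: int) -> bool:
        return any(start <= addr < end for start, end in self.regions)


def route(addr: int, directories: List[CoherenceDirectory]) -> CoherenceDirectory:
    """Return the directory that supports the region containing addr."""
    for directory in directories:
        if directory.covers(addr):
            return directory
    raise LookupError(f"no coherence directory covers address {addr:#x}")
```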

IMPLIED DIRECTORY STATE UPDATES
20190354284 · 2019-11-21

A request is received over a link that requests a particular line in memory. A directory state record is identified in memory that identifies a directory state of the particular line. A type of the request is identified from the request. It is determined that the directory state of the particular line is to change from the particular state to a new state based on the directory state of the particular line and the type of the request. The directory state record is changed, in response to receipt of the request, to reflect the new state. A copy of the particular line is sent in response to the request.
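The idea of deriving the new directory state from the current state and the request type can be sketched with a lookup table. The state names, request types, and transitions below are hypothetical placeholders chosen for illustration; the abstract does not enumerate them.

```python
# Hypothetical (current_state, request_type) -> new_state table.
TRANSITIONS = {
    ("invalid", "read_shared"): "shared",
    ("invalid", "read_exclusive"): "exclusive",
    ("shared", "read_shared"): "shared",
    ("shared", "read_exclusive"): "exclusive",
}


def handle_request(directory, memory, line, request_type):
    """Update the directory state implied by the request, then return the line."""
    state = directory[line]
    new_state = TRANSITIONS.get((state, request_type), state)
    if new_state != state:
        # The state change is made on receipt of the request itself,
        # without waiting for a separate directory-update message.
        directory[line] = new_state
    return memory[line]
```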