Patent classifications
G06F2212/622
Coherence-based cache-line Copy-on-Write
A method of performing a copy-on-write on a shared memory page is carried out by a device communicating with a processor via a coherence interconnect. The method includes: adding a page table entry so that a request to read a first cache line of the shared memory page includes a cache-line address of the shared memory page and a request to write to a second cache line of the shared memory page includes a cache-line address of a new memory page; in response to the request to write to the second cache line, storing new data of the second cache line in a second memory and associating the second cache-line address with the new data stored in the second memory; and in response to a request to read the second cache line, reading the new data of the second cache line from the second memory.
System and methods for cache coherent system using ownership-based scheme
A computer system includes a first core including a first local cache and a second core including a second local cache. The first core and the second core are coupled through a remote link. A shared cache coupled to the first core and to the second core. The shared cache includes an ownership table that includes a plurality of entries indicating if a cache line is stored solely in the first local cache or solely in the second local cache. The remote link includes a first link between the first core and the shared cache and a second link between the second core and the shared cache.
Network cache injection for coherent GPUs
Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if on the data satisfies the heuristic.
Add-on memory coherence directory
A mechanism is provided for memory coherence in a multiple processor system. Responsive to a memory access resulting in a cache miss in a given processor, the processor determines whether a memory region being accessed is marked as directory-based. Responsive to the given processor determining the memory region is marked as directory-based, the given processor accesses a directory entry corresponding to the memory region to identify a home chip for the page using a directory-based protocol. The given processor forwards the memory access request to the home chip to perform the memory access.
Granting exclusive cache access using locality cache coherency state
A cache coherency management facility to reduce latency in granting exclusive access to a cache in certain situations. A node requests exclusive access to a cache line of the cache. The node is in one region of nodes of a plurality of regions of nodes. The one region of nodes includes the node requesting exclusive access and another node of the computing environment, in which the node and the another node are local to one another as defined by a predetermined criteria. The node requesting exclusive access checks a locality cache coherency state of the another node, the locality cache coherency state being specific to the another node and indicating whether the another node has access to the cache line. Based on the checking indicating that the another node has access to the cache line, a determination is made that the node requesting exclusive access is to be granted exclusive access to the cache line. The determining being independent of transmission of information relating to the cache line from one or more other nodes of the one or more other regions of nodes.
SYSTEM AND METHOD FOR PREVENTING CACHE CONTENTION
A system and method for preventing cache contention for a cache including a plurality of ways and a separate port for each way, the method including: obtaining, in a core of a processor, a multidimensional coefficient array of a multidimensional filter, and pointers to data elements from a plurality of rows of a multidimensional data array, and loading the plurality of rows into the cache, where each row is stored in a different way of the cache.
METHOD AND APPARATUS FOR MANAGING A CACHE DIRECTORY
Method and apparatus monitor eviction conflicts among cache directory entries in a cache directory and produce cache directory victim entry information for a memory manager. In some examples, the memory manager reduces future cache directory conflicts by changing a page level physical address assignment for a page of memory based on the produced cache directory victim entry information. In some examples, a scalable data fabric includes hardware control logic that performs the monitoring of the eviction conflicts among cache directory entries in the cache directory and produces the cache directory victim entry information.
Maintaining domain coherence states including domain state no-owned (DSN) in processor-based devices
Maintaining domain coherence states including Domain State No-Owned (DSN) in processor-based devices is disclosed. In this regard, a processor-based device provides multiple processing elements (PEs) organized into multiple domains, each containing one or more PEs and a local ordering point circuit (LOP). The processor-based device supports domain coherence states for coherence granules cached by the PEs within a given domain. The domain coherence states include a DSN domain coherence state, which indicates that a coherence granule is not cached within a shared modified state within any domain. In some embodiments, upon receiving a request for a read access to a coherence granule, a system ordering point circuit (SOP) determines that the coherence granule is cached in the DSN domain coherence state within a domain of the plurality of domains, and can safely read the coherence granule from the system memory to satisfy the read access if necessary.
High performance interconnect
- Robert J. Safranek ,
- Robert G. Blankenship ,
- Venkatraman Iyer ,
- Jeff Willey ,
- Robert Beers ,
- Darren S. Jue ,
- Arvind A. Kumar ,
- Debendra Das Sharma ,
- Jeffrey C. Swanson ,
- Bahaa Fahim ,
- Vedaraman Geetha ,
- Aaron T. Spink ,
- Fulvio Spagna ,
- Rahul R. Shah ,
- Sitaraman V. Iyer ,
- William Harry Nale ,
- Abhishek Das ,
- Simon P. Johnson ,
- Yuvraj S. Dhillon ,
- Yen-Cheng Liu ,
- Raj K. Ramanujan ,
- Robert A. Maddox ,
- Herbert H. Hum ,
- Ashish Gupta
A physical layer (PHY) is coupled to a serial, differential link that is to include a number of lanes. The PHY includes a transmitter and a receiver to be coupled to each lane of the number of lanes. The transmitter coupled to each lane is configured to embed a clock with data to be transmitted over the lane, and the PHY periodically issues a blocking link state (BLS) request to cause an agent to enter a BLS to hold off link layer flit transmission for a duration. The PHY utilizes the serial, differential link during the duration for a PHY associated task selected from a group including an in-band reset, an entry into low power state, and an entry into partial width state.
Stacked memory device system interconnect directory-based cache coherence methodology
A system includes a plurality of host processors and a plurality of hybrid memory cube (HMC) devices configured as a distributed shared memory for the host processors. An HMC device includes a plurality of integrated circuit memory die including at least a first memory die arranged on top of a second memory die, and at least a portion of the memory of the memory die is mapped to include at least a portion of a memory coherence directory; and a logic base die including at least one memory controller configured to manage three-dimensional (3D) access to memory of the plurality of memory die by at least one second device, and logic circuitry configured to implement a memory coherence protocol for data stored in the memory of the plurality of memory die.