Patent classifications
G06F2212/622
REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. A system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. To reduce the number of entries in the cache directories, the directories track coherency on a region basis rather than on a cache line basis, where a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions that have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions that have an entry stored in any node-based cache directory of the system.
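The reference-counting idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the line size, the region size, and the names `System`, `line_cached`, and `line_evicted` are assumptions chosen for illustration. It only shows how a per-region count in each node could drive allocation and release of entries in a memory-side directory.

```python
from collections import defaultdict

LINE_SIZE = 64          # bytes per cache line (assumed)
LINES_PER_REGION = 32   # cache lines per region (assumed)


def region_of(addr: int) -> int:
    """Map a byte address to its region number."""
    return addr // (LINE_SIZE * LINES_PER_REGION)


class System:
    """Toy model: per-node region directories plus one memory-side directory."""

    def __init__(self, num_nodes: int):
        # node_dirs[n][region] = number of lines of that region cached in node n
        self.node_dirs = [dict() for _ in range(num_nodes)]
        # memory_dir[region] = set of nodes whose directory holds that region
        self.memory_dir = defaultdict(set)

    def line_cached(self, node: int, addr: int) -> None:
        region = region_of(addr)
        counts = self.node_dirs[node]
        if region not in counts:
            # First line of this region in the node: allocate both entries.
            counts[region] = 0
            self.memory_dir[region].add(node)
        counts[region] += 1

    def line_evicted(self, node: int, addr: int) -> None:
        region = region_of(addr)
        counts = self.node_dirs[node]
        counts[region] -= 1
        if counts[region] == 0:
            # Last line gone: free the node entry and update the memory side.
            del counts[region]
            self.memory_dir[region].discard(node)
            if not self.memory_dir[region]:
                del self.memory_dir[region]
```

In this sketch, one directory entry covers 32 lines, so caching many lines of the same region only increments a counter instead of allocating new entries. Evicting a line that was never recorded raises `KeyError`; a fuller model would handle that case explicitly.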
Systems and methods for simulating worst-case contention to determine worst-case execution time of applications executed on a processor
Techniques for determining worst-case execution time for at least one application under test are disclosed using memory thrashing. Memory thrashing simulates shared resource interference. Memory that is thrashed includes mapped memory, and optionally shared cache memory.
Shared read—using a request tracker as a temporary read cache
Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller. When the memory controller receives an SRR from a core and a Leader core is not yet identified, it allocates a CRT entry, stores the SRR therein, marks it as a Leader, and sends a read request to the memory address indicated by the SRR. When read data returns from the memory, the memory controller stores the read data in the CRT entry, sends the read data to the Leader core, and awaits receipt, unless already received, of another SRR from a Follower core having the same address as the SRR. It then sends the read data to the Follower core and deallocates the CRT entry.
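The Leader/Follower flow described in this abstract can be sketched as a small software model. Everything here is an illustrative assumption rather than the disclosed hardware: the `MemoryController` class, its method names, and the use of a dictionary as the CRT. The sketch covers both orderings, with the Follower's SRR arriving before or after the read data returns.

```python
class MemoryController:
    """Toy model of a shared read using a request tracker as a temporary cache."""

    def __init__(self, memory: dict):
        self.memory = memory     # address -> data
        self.crt = {}            # address -> tracker entry
        self.delivered = []      # log of (core, data) deliveries
        self.memory_reads = 0    # number of reads issued to memory

    def shared_read(self, core: str, addr: int) -> None:
        entry = self.crt.get(addr)
        if entry is None:
            # No Leader yet: allocate an entry, mark this core as Leader,
            # and issue the single memory read.
            self.crt[addr] = {"leader": core, "follower": None,
                              "data": None, "data_valid": False}
            self.memory_reads += 1
            return
        # An entry exists, so this request comes from the Follower.
        entry["follower"] = core
        if entry["data_valid"]:
            self._serve_follower(addr)

    def memory_return(self, addr: int) -> None:
        # Read data comes back: hold it in the tracker and serve the Leader.
        entry = self.crt[addr]
        entry["data"] = self.memory[addr]
        entry["data_valid"] = True
        self.delivered.append((entry["leader"], entry["data"]))
        if entry["follower"] is not None:
            self._serve_follower(addr)

    def _serve_follower(self, addr: int) -> None:
        # Send the held data to the Follower, then deallocate the entry.
        entry = self.crt.pop(addr)
        self.delivered.append((entry["follower"], entry["data"]))
```

In this model both cores receive the data while memory is read only once, which is the point of holding the data in the tracker entry until the Follower has been served.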
HIGH PERFORMANCE INTERCONNECT
- Robert J. Safranek
- Robert G. Blankenship
- Venkatraman Iyer
- Jeff Willey
- Robert Beers
- Darren S. Jue
- Arvind A. Kumar
- Debendra Das Sharma
- Jeffrey C. Swanson
- Bahaa Fahim
- Vedaraman Geetha
- Aaron T. Spink
- Fulvio Spagna
- Rahul R. Shah
- Sitaraman V. Iyer
- William Harry Nale
- Abhishek Das
- Simon P. Johnson
- Yuvraj S. Dhillon
- Yen-Cheng Liu
- Raj K. Ramanujan
- Robert A. Maddox
- Herbert H. Hum
- Ashish Gupta
A physical layer (PHY) is coupled to a serial, differential link that is to include a number of lanes. The PHY includes a transmitter and a receiver to be coupled to each lane of the number of lanes. The transmitter coupled to each lane is configured to embed a clock with data to be transmitted over the lane, and the PHY periodically issues a blocking link state (BLS) request to cause an agent to enter a BLS to hold off link layer flit transmission for a duration. The PHY utilizes the serial, differential link during the duration for a PHY associated task selected from a group including an in-band reset, an entry into low power state, and an entry into partial width state.
MAINTAINING DOMAIN COHERENCE STATES INCLUDING DOMAIN STATE NO-OWNED (DSN) IN PROCESSOR-BASED DEVICES
Maintaining domain coherence states including Domain State No-Owned (DSN) in processor-based devices is disclosed. In this regard, a processor-based device provides multiple processing elements (PEs) organized into multiple domains, each containing one or more PEs and a local ordering point circuit (LOP). The processor-based device supports domain coherence states for coherence granules cached by the PEs within a given domain. The domain coherence states include a DSN domain coherence state, which indicates that a coherence granule is not cached within a shared modified state within any domain. In some embodiments, upon receiving a request for a read access to a coherence granule, a system ordering point circuit (SOP) determines that the coherence granule is cached in the DSN domain coherence state within a domain of the plurality of domains, and can safely read the coherence granule from the system memory to satisfy the read access if necessary.
On-demand Memory Allocation
Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
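The on-demand mapping step can be illustrated with a minimal sketch. The page size, the `PrivateMemoryAllocator` class, and its simple bump allocation of virtual pages are all assumptions for illustration. The abstract does not specify how pages are chosen, and the later virtual-to-physical translation by the MMU is not modeled.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)


class PrivateMemoryAllocator:
    """Toy model: map private pages to virtual pages on first access."""

    def __init__(self, virtual_base: int):
        self.page_table = {}               # private page number -> virtual page base
        self.next_virtual = virtual_base   # next unused virtual page (hypothetical policy)

    def translate(self, private_addr: int) -> int:
        page, offset = divmod(private_addr, PAGE_SIZE)
        if page not in self.page_table:
            # Page table information is not set up yet: map a page now.
            self.page_table[page] = self.next_virtual
            self.next_virtual += PAGE_SIZE
        return self.page_table[page] + offset
```

Only private pages that are actually touched consume a mapping, so a requester with a small working set holds few pages even if its private address range is large.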
SELECTIVE OVERRIDE OF CACHE COHERENCE IN MULTI-PROCESSOR COMPUTER SYSTEMS
Various example embodiments relate to efficient cache coherence in multiprocessor computer systems, based on support for selective override of cache coherence by the processors. The selective override relies on programmable approaches in the processors and on the processors' use of snooping-based cache coherence protocols that are capable of supporting selective overriding of cache coherence.