Patent classifications
G06F12/082
SYSTEMS AND METHODS FOR IMPLEMENTING COHERENT MEMORY IN A MULTIPROCESSOR SYSTEM
Data units are stored in private caches in nodes of a multiprocessor system, each node containing at least one processor (CPU), at least one cache private to the node and at least one cache location buffer (CLB) private to the node. In each CLB location information values are stored, each location information value indicating a location associated with a respective data unit, wherein each location information value stored in a given CLB indicates the location to be either a location within the private cache disposed in the same node as the given CLB, to be a location in one of the other nodes, or to be a location in a main memory. Coherence of values of the data units is maintained using a cache coherence protocol. The location information values stored in the CLBs are updated by the cache coherence protocol in accordance with movements of their respective data units.
Snoop filter device
The invention relates to a device for use in maintaining cache coherence in a multiprocessor computing system. The snoop filter device is connectable with a plurality of cache elements, where each cache element comprises a number of cache agents. The snoop filter device comprises a plurality of snoop filter storage locations, where each snoop filter storage location is mapped to one cache element.
Write merging on stores with different tags
Techniques for caching data are provided that include receiving, by a caching system, a write memory command for a memory address, the write memory command associated with a first color tag, determining, by a first sub-cache of the caching system, that the memory address is not cached in the first sub-cache, determining, by second sub-cache of the caching system, that the memory address is not cached in the second sub-cache, storing first data associated with the first write memory command in a cache line of the second sub-cache, storing the first color tag in the second sub-cache, receiving a second write memory command for the cache line, the write memory command associated with a second color tag, merging the second color tag with the first color tag, storing the merged color tag, and evicting the cache line based on the merged color tag.
Region based split-directory scheme to adapt to large cache sizes
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
System and method for efficient cache coherency protocol processing
To reduce latency and bandwidth consumption in systems, systems and methods are provided for grouping multiple cache line request messages in a related and speculative manner. That is, multiple cache lines are likely to have the same state and ownership characteristics, and therefore, requests for multiple cache lines can be grouped. Information received in response can be directed to the requesting processor socket, and those speculatively received (not actually requested, but likely to be requested) can be maintained in queue or other memory until a request is received for that information, or until discarded to free up tracking space for new requests.
Ternary content addressable memory-enhanced cache coherency acceleration
A system and method for cache coherency within multiprocessor environments is provided. Each node controller of a plurality of nodes within a multiprocessor system receives a cache coherency protocol request from local processor sockets and other node controller(s). A ternary content addressable memory (TCAM) accelerator in the node controller determines if the cache coherency protocol request comprises a snoop request and, if it is determined to be a snoop request, searching the TCAM based on an address within the cache coherency protocol request. In response to detecting only one match between an entry of the TCAM and the received snoop request, sending a response to the requesting local processor a response without having to access a coherency directory.
Home agent based cache transfer acceleration scheme
Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
VICTIM CACHE THAT SUPPORTS DRAINING WRITE-MISS ENTRIES
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.
Arithmetic processor and method for operating arithmetic processor
An arithmetic processor including a plurality of core groups each including a plurality of cores and a cache unit, a plurality of home agents each including a tag directory and a store command queue and a store command queue. The store command queue enters the received store request to the entry queue in order of reception, the cache unit stores the data of the store request in a data RAM. The store command queue sets a data ownership acquisition flag of the store request to valid when obtaining a data ownership of the store request and issues a top-of-queue notification to the cache control unit when the flag of the top-of-queue entry is valid. In response to the top-of-queue notification, the cache unit update a cache tag to modified state and issue a store request completion notification.
METHODS AND APPARATUS TO FACILITATE WRITE MISS CACHING IN CACHE SYSTEM
Methods, apparatus, systems and articles of manufacture to facilitate write miss caching in cache system are disclosed. An example apparatus includes a first cache storage; a second cache storage, wherein the second cache storage includes a first portion operable to store a first set of data evicted from the first cache storage and a second portion; a cache controller coupled to the first cache storage and the second cache storage and operable to: receive a write operation; determine that the write operation produces a miss in the first cache storage; and in response to the miss in the first cache storage, provide write miss information associated with the write operation to the second cache storage for storing in the second portion.