Patent classifications
G06F12/0808
Cache lookup bypass in multi-level cache systems
Techniques described herein are generally related to retrieval of data in computer systems having multi-level caches. The multi-level cache may include at least a first cache and a second cache. The first cache may be configured to receive a request for a cache line. The request may be associated with an instruction executing on a tile of the computer system. A suppression status of the instruction may be determined by a first cache controller to determine whether look-up of the first cache is suppressed based upon the determined suppression status. The request for the cache line may be forwarded to the second cache by the first cache controller after the look-up of the first cache is suppressed.
Translation entry invalidation in a multithreaded data processing system
In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests, including a translation invalidation request of an initiating hardware thread, are received in a shared queue. The translation invalidation request is broadcast so that it is received and processed by the plurality of processor cores. In response to confirmation of the broadcast, the address translated by the translation entry is stored in a queue. Once the address is stored, the initiating processor core resumes dispatch of instructions within the initiating hardware thread. In response to a request from one of the plurality of processor cores, an effective address translated by a translation entry being invalidated is accessed in the queue. A synchronization request for the address is broadcast to ensure completion of processing of any translation invalidation request for the address.
Translation entry invalidation in a multithreaded data processing system
In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests, including a translation invalidation request of an initiating hardware thread, are received in a shared queue. The translation invalidation request is broadcast so that it is received and processed by the plurality of processor cores. In response to confirmation of the broadcast, the address translated by the translation entry is stored in a queue. Once the address is stored, the initiating processor core resumes dispatch of instructions within the initiating hardware thread. In response to a request from one of the plurality of processor cores, an effective address translated by a translation entry being invalidated is accessed in the queue. A synchronization request for the address is broadcast to ensure completion of processing of any translation invalidation request for the address.
APPARATUS AND METHOD FOR TRIGGERED PREFETCHING TO IMPROVE I/O AND PRODUCER-CONSUMER WORKLOAD EFFICIENCY
An apparatus and method are described for a triggered prefetch operation. For example, one embodiment of a processor comprises: a first core comprising a first cache to store a first set of cache lines; a second core comprising a second cache to store a second set of cache lines; a cache management circuit to maintain coherency between one or more cache lines in the first cache and the second cache, the cache management circuit to allocate a lock on a first cache line to the first cache; a prefetch circuit comprising a prefetch request buffer to store a plurality of prefetch request entries including a first prefetch request entry associated with the first cache line, the prefetch circuit to cause the first cache line to be prefetched to the second cache in response to an invalidate command detected for the first cache line.
MUTUAL EXCLUSION IN A NON-COHERENT MEMORY HIERARCHY
Methods and systems for mutual exclusion in a non-coherent memory hierarchy may include a non-coherent memory system with a shared system memory. Multiple processors and a memory connect interface may be configured to provide an interface for the processors to the shared memory. The memory connect interface may include an arbiter for atomic memory operations from the processors. In response to an atomic memory operation, the arbiter may perform an atomic memory operation procedure including setting a busy flag for an address of the atomic memory operation, blocking subsequent memory operations from any of the processors to the address while the busy flag is set, issuing the atomic memory operation to the shared memory, and in response to an acknowledgement of the atomic memory operation from the shared memory, clearing the busy flag and allowing subsequent memory operations from the processors for the address to proceed to the shared memory.
Virtual storage pool
Virtual storage pool creation is simplified by allowing a user to specify what devices to include in virtual storage pool by physical location. The virtual storage pool may be automatically generated based on the simplified user specifications. The user may specify the virtual pool configuration in a configuration file. A configuration application generates the virtual storage pool based on the configuration file. The configuration application utilizes the physical locations of block devices contained in the configuration file to generate the pool. As a result, virtual pool configuration and creation is automated, more efficient and is less error prone than previous methods that involve manually linking between physical device locations and computer generated names.
REDUCING INTERCONNECT TRAFFICS OF MULTI-PROCESSOR SYSTEM WITH EXTENDED MESI PROTOCOL
A processor includes a first core including a first cache including a cache line, a second core including a second cache, and a cache controller to set a flag stored in a flag section of the cache line of the first cache to one of a processor share (PS) state in response to data stored in the cache line being shared by the second cache, or to a global share (GS) state in response to the data stored in the first cache line being shared by a third cache of a second processor.
REDUCING INTERCONNECT TRAFFICS OF MULTI-PROCESSOR SYSTEM WITH EXTENDED MESI PROTOCOL
A processor includes a first core including a first cache including a cache line, a second core including a second cache, and a cache controller to set a flag stored in a flag section of the cache line of the first cache to one of a processor share (PS) state in response to data stored in the cache line being shared by the second cache, or to a global share (GS) state in response to the data stored in the first cache line being shared by a third cache of a second processor.
Cache coherence shared state suppression
A method includes receiving, by a level two (L2) controller, a first request for a cache line in a shared cache coherence state; mapping, by the L2 controller, the first request to a second request for a cache line in an exclusive cache coherence state; and responding, by the L2 controller, to the second request.
Cache coherence shared state suppression
A method includes receiving, by a level two (L2) controller, a first request for a cache line in a shared cache coherence state; mapping, by the L2 controller, the first request to a second request for a cache line in an exclusive cache coherence state; and responding, by the L2 controller, to the second request.