G06F12/0828

PRIORITIZATION OF TRANSACTIONS

A method, system, and computer program product are provided for prioritizing transactions. A processor in a computing environment initiates the execution of a transaction. The processor includes a transactional core, and the execution of the transaction is performed by the transactional core. Concurrently with the execution of the transaction by the transactional core, the processor obtains an indication of a conflict between the transaction and at least one other transaction being executed by an additional core in the computing environment. The processor determines whether the transactional core includes an indicator and, based on determining that it does, ignores the conflict and utilizes the transactional core to complete execution of the transaction.
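
A minimal Python sketch of the conflict-handling decision described above. The class `TransactionalCore`, the field `priority_indicator`, and the helper functions are hypothetical. The abstract does not say what happens when the indicator is absent, so aborting in that case is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TransactionalCore:
    """Hypothetical model of a core running one hardware transaction."""
    name: str
    priority_indicator: bool = False  # assumed per-core indicator bit
    aborted: bool = False
    committed: bool = False


def on_conflict(core: TransactionalCore) -> str:
    """Handle a conflict reported while `core` is executing its transaction."""
    if core.priority_indicator:
        # Indicator present: ignore the conflict and keep executing.
        return "continue"
    # Assumed fallback (not specified in the abstract): abort the transaction.
    core.aborted = True
    return "abort"


def complete(core: TransactionalCore) -> bool:
    """Commit the transaction unless it was aborted."""
    core.committed = not core.aborted
    return core.committed
```

For example, a core created with `priority_indicator=True` returns `"continue"` from `on_conflict` and then commits, while a core without the indicator aborts.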

Speculative querying the main memory of a multiprocessor system
09720850 · 2017-08-01

A method of accessing data in a multiprocessor system, wherein the system includes a plurality of processors, each processor being associated with a respective cache memory, a cache memory management module, a main memory, and a main memory management module, the method including: receiving, by the cache memory management module, an initial request from a processor for access to data; first transmitting, by the cache memory management module, a first request with respect to the data to at least one cache memory; second transmitting, by the cache memory management module and in parallel with the first transmitting, a second request with respect to the data to the main memory management module; checking, by the main memory management module, whether or not to initiate querying of the main memory; and querying or not querying, by the main memory management module, the main memory in accordance with said checking.
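
The parallel cache and main-memory requests can be sketched in Python with two concurrent tasks. The check performed by the main memory management module is not specified in the abstract; here it is assumed to be a hypothetical set `cached_dirty` of addresses known to have a newer copy in some cache, in which case the main-memory read is skipped. All function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def snoop_caches(address, caches):
    """First request: look the address up in each cache (dicts here)."""
    for cache in caches:
        if address in cache:
            return cache[address]
    return None


def memory_module(address, main_memory, cached_dirty):
    """Second request: decide whether to query main memory, then maybe query it.

    Hypothetical check: skip the read when a cache is known to hold a newer copy.
    """
    if address in cached_dirty:
        return None, False
    return main_memory.get(address), True


def read(address, caches, main_memory, cached_dirty):
    """Issue both requests concurrently; prefer the cached value if present."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cache_future = pool.submit(snoop_caches, address, caches)
        mem_future = pool.submit(memory_module, address, main_memory, cached_dirty)
        cache_value = cache_future.result()
        mem_value, queried = mem_future.result()
    value = cache_value if cache_value is not None else mem_value
    return value, queried
```

The benefit of issuing the memory request speculatively is that, on a cache miss, the main-memory latency overlaps the cache lookup instead of following it.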

READ AND WRITE SETS FOR TRANSACTIONS OF A MULTITHREADED COMPUTING ENVIRONMENT
20170322884 · 2017-11-09

Facilitating processing in a computing environment. A request to access a cache of the computing environment is obtained from a transaction executing on a processor of the computing environment. Based on obtaining the request, a determination is made as to whether a tracking set used to track cache accesses is to be updated. The tracking set includes a read set to track read accesses of at least a selected portion of the cache and a write set to track write accesses of at least the selected portion of the cache. The tracking set is assigned to the transaction, and another transaction that accesses the cache has another tracking set assigned to it. The tracking set assigned to the transaction is updated when the determination indicates that it is to be updated.
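
A minimal Python sketch of per-transaction read and write sets tracked at cache-line granularity. The 64-byte line size, the class `TrackingSet`, and the `conflicts` helper are assumptions for illustration; the conflict rule shown is the conventional read/write-set overlap test, not necessarily the one used in the patent.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache-line size in bytes


@dataclass
class TrackingSet:
    """Hypothetical tracking set assigned to one transaction."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

    def record(self, address: int, is_write: bool) -> bool:
        """Track the accessed line; return True only if the set was updated."""
        line = address // LINE_SIZE
        target = self.write_set if is_write else self.read_set
        if line in target:
            return False  # line already tracked: no update needed
        target.add(line)
        return True


def conflicts(a: TrackingSet, b: TrackingSet) -> bool:
    """Conventional rule: a write overlapping the other transaction's reads or writes."""
    return bool(a.write_set & (b.read_set | b.write_set) or b.write_set & a.read_set)
```

The `record` return value corresponds to the "determination" step: a second access to an already-tracked line leaves the set unchanged.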

HARDWARE COHERENCE FOR MEMORY CONTROLLER

A system includes a non-coherent component; a coherent, non-caching component; a coherent, caching component; and a level two (L2) cache subsystem coupled to the non-coherent component, the coherent, non-caching component, and the coherent, caching component. The L2 cache subsystem includes an L2 cache; a shadow level one (L1) main cache; a shadow L1 victim cache; and an L2 controller. The L2 controller is configured to receive and process a first transaction from the non-coherent component; receive and process a second transaction from the coherent, non-caching component; and receive and process a third transaction from the coherent, caching component.

VIRTUAL NETWORK PRE-ARBITRATION FOR DEADLOCK AVOIDANCE AND ENHANCED PERFORMANCE
20210382822 · 2021-12-09

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to select, in a first clock cycle, a pre-arbitration winner between the first memory access request and the second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to select, in a second clock cycle, a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of the priority of the pre-arbitration winner and the priority of the subsequent memory access request.
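
The two-cycle arbitration can be sketched in Python as two functions, one per clock cycle. The specific rules are assumptions: in cycle one, a request whose destination has no credits cannot win and ties go to the first request; in cycle two, the strictly higher priority wins and ties keep the pre-arbitration winner. `Request` and the destination names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    source: str
    destination: str
    priority: int


def pre_arbitrate(req_a: Request, req_b: Request,
                  credits: Dict[str, int]) -> Optional[Request]:
    """Cycle 1: prefer the request whose destination has more credits (assumed rule)."""
    ca = credits.get(req_a.destination, 0)
    cb = credits.get(req_b.destination, 0)
    if ca == 0 and cb == 0:
        return None  # neither destination can accept a request
    return req_a if ca >= cb else req_b


def final_arbitrate(pre_winner: Optional[Request],
                    subsequent: Optional[Request]) -> Optional[Request]:
    """Cycle 2: the higher priority wins; ties keep the pre-arbitration winner."""
    if pre_winner is None:
        return subsequent
    if subsequent is None:
        return pre_winner
    return subsequent if subsequent.priority > pre_winner.priority else pre_winner
```

Checking credits before the final stage keeps a request bound for a destination with no buffer space from winning arbitration and blocking the data path, which is the deadlock-avoidance motivation named in the title.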

Handling surface level coherency without reliance on fencing

Systems, apparatuses and methods may provide for technology that detects a memory fence in a thread, adds a group identifier to one or more memory operations in the thread that follow the memory fence, and sends the one or more memory operations and the group identifier to a memory structure. In one example, the group identifier is used to track completion of the one or more memory operations.
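
A minimal Python sketch of fence-based group tagging and per-group completion tracking. The `"FENCE"` marker, the string operations, and `CompletionTracker` are hypothetical stand-ins; the idea shown is that operations after a fence carry a new group identifier instead of stalling, and the memory structure counts outstanding operations per group.

```python
from collections import defaultdict


def tag_operations(thread_ops):
    """Assign a group identifier that increments at each memory fence."""
    group = 0
    tagged = []
    for op in thread_ops:
        if op == "FENCE":  # hypothetical fence marker
            group += 1
            continue
        tagged.append((op, group))
    return tagged


class CompletionTracker:
    """Counts outstanding operations per group at the memory structure."""

    def __init__(self):
        self.pending = defaultdict(int)

    def issue(self, group):
        self.pending[group] += 1

    def complete(self, group):
        self.pending[group] -= 1

    def group_done(self, group):
        return self.pending[group] == 0
```

Because completion is tracked per group, a later group can be issued while an earlier one drains; ordering only needs to be enforced where a consumer waits on `group_done`.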

Merging data for write allocate

A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in an L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
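
The merge step can be sketched in Python with a per-byte enable field: bytes supplied by the write are marked enabled, and the returned read data fills only the bytes that are not. The 8-byte line, the class `MergeEntry`, and its methods are hypothetical simplifications.

```python
LINE_BYTES = 8  # small line for illustration; real L2 lines are larger


class MergeEntry:
    """Hypothetical register-file entry reserved for write-allocate read data."""

    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.enable = [False] * LINE_BYTES  # True = byte supplied by the write

    def apply_write(self, offset, write_data):
        """Store the write data and mark those bytes as enabled."""
        for i, b in enumerate(write_data):
            self.data[offset + i] = b
            self.enable[offset + i] = True

    def merge_read(self, read_data):
        """Fill only the bytes the write did not supply; return the merged line."""
        for i, b in enumerate(read_data):
            if not self.enable[i]:
                self.data[i] = b
        return bytes(self.data)
```

For example, writing `b"\xAA\xBB"` at offset 2 and then merging read data `bytes(range(8))` yields `bytes([0, 1, 0xAA, 0xBB, 4, 5, 6, 7])`: the newer write data survives even though the read data arrives later.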

ADAPTIVE REMOTE ATOMICS

Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.
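
A minimal single-core Python sketch of the adaptive decision for an atomic fetch-and-add. The abstract does not state the decision criterion; this sketch assumes a hypothetical per-address contention count and threshold: heavily contended lines are updated remotely at the shared cache, and others are copied into the local cache first. Copying is modeled as moving the line, which stands in for taking exclusive ownership.

```python
def choose_atomic_site(address, local_cache, contention, threshold=4):
    """Return (level, copy_line). The contention policy is an assumption."""
    if address in local_cache:
        return "local", False      # line already local: execute at first level
    if contention.get(address, 0) >= threshold:
        return "remote", False     # execute at the shared cache, leave line there
    return "local", True           # copy line into the local cache, then execute


def fetch_add(address, delta, local_cache, shared_cache, contention):
    """Atomically add `delta`; return the old value and where it ran."""
    level, copy_line = choose_atomic_site(address, local_cache, contention)
    if copy_line:
        # Model taking exclusive ownership by moving the line to the local cache.
        local_cache[address] = shared_cache.pop(address)
    store = local_cache if level == "local" else shared_cache
    old = store[address]
    store[address] = old + delta
    return old, level
```

Running contended atomics at the shared level avoids bouncing the cache line between cores, while uncontended ones benefit from the lower latency of local execution.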

Memory interface between physical and virtual address spaces

A memory interface for interfacing between a memory bus addressable using a physical address space and a cache memory addressable using a virtual address space, the memory interface comprising: a memory management unit configured to maintain a mapping from the virtual address space to the physical address space; and a coherency manager comprising a reverse translation module configured to maintain a mapping from the physical address space to the virtual address space; wherein the memory interface is configured to: receive a memory read request from the cache memory, the memory read request being addressed in the virtual address space; translate the memory read request, at the memory management unit, to a translated memory read request addressed in the physical address space for transmission on the memory bus; receive a snoop request from the memory bus, the snoop request being addressed in the physical address space; and translate the snoop request, at the coherency manager, to a translated snoop request addressed in the virtual address space for processing in connection with the cache memory.
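
A minimal Python sketch of the two translation directions, with a forward map standing in for the memory management unit and a reverse map standing in for the coherency manager's reverse translation module. The 4 KiB page size, the class `MemoryInterface`, and its methods are assumptions, and the sketch assumes each physical page has at most one virtual mapping (no aliasing).

```python
PAGE_SIZE = 4096  # assumed page size


class MemoryInterface:
    """Hypothetical interface between a virtually addressed cache and a physical bus."""

    def __init__(self):
        self.forward = {}  # MMU: virtual page -> physical page
        self.reverse = {}  # coherency manager: physical page -> virtual page

    def map_page(self, vpage, ppage):
        self.forward[vpage] = ppage
        self.reverse[ppage] = vpage

    def translate_read(self, vaddr):
        """Virtual read request from the cache -> physical address for the bus."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        return self.forward[vpage] * PAGE_SIZE + offset

    def translate_snoop(self, paddr):
        """Physical snoop from the bus -> virtual address for the cache."""
        ppage, offset = divmod(paddr, PAGE_SIZE)
        vpage = self.reverse.get(ppage)
        if vpage is None:
            return None  # unmapped page: assumed not to be held in the cache
        return vpage * PAGE_SIZE + offset
```

With `map_page(0x5, 0x12)`, a read of virtual address `0x5123` goes out as physical `0x12123`, and a snoop to `0x12123` is delivered to the cache as `0x5123`.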

MANAGING CACHED DATA USED BY PROCESSING-IN-MEMORY INSTRUCTIONS
20220188233 · 2022-06-16

A system-on-chip configured for eager invalidation and flushing of cached data used by PIM (Processing-in-Memory) instructions includes: one or more processor cores; one or more caches; and an I/O (input/output) die comprising logic to: receive a cache probe request, wherein the cache probe request includes a physical memory address associated with a PIM instruction, and the PIM instruction is to be offloaded to a PIM device for execution; and issue, based on the physical memory address, a cache probe to one or more of the caches prior to receiving the PIM instruction for dispatch to the PIM device.
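
A minimal Python sketch of the eager probe: before the PIM instruction is dispatched, the line containing its physical address is invalidated in every cache and written back if dirty, so the PIM device reads current data from memory. The 64-byte line size, the dictionary cache model, and `eager_probe` are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size


def eager_probe(paddr, caches, memory):
    """Invalidate the line holding `paddr` in every cache, flushing dirty data.

    Each cache is modeled as {line_address: (value, dirty)} and memory as
    {line_address: value}.
    """
    line_addr = paddr - paddr % LINE_BYTES
    for cache in caches:
        entry = cache.pop(line_addr, None)  # invalidate the line if present
        if entry is not None:
            value, dirty = entry
            if dirty:
                memory[line_addr] = value  # write back before the PIM device reads
    return line_addr
```

Issuing the probe when the address is first known, rather than when the PIM instruction arrives, lets the invalidation and flush latency overlap with the rest of the offload path.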