Patent classifications
G06F12/0857
Shared read—using a request tracker as a temporary read cache
Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive a SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.
MULTICORE SHARED CACHE OPERATION ENGINE
Techniques including receiving configuration information for a trigger control channel of the one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, store the first memory management command, detecting a first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.
DELAYED SNOOP FOR IMPROVED MULTI-PROCESS FALSE SHARING PARALLEL THREAD PERFORMANCE
Techniques for maintaining cache coherency comprising storing data blocks associated with a main process in a cache line of a main cache memory, storing a first local copy of the data blocks in a first local cache memory of a first processor, storing a second local copy of the set of data blocks in a second local cache memory of a second processor executing a first child process of the main process to generate first output data, writing the first output data to the first data block of the first local copy as a write through, writing the first output data to the first data block of the main cache memory as a part of the write through, transmitting an invalidate request to the second local cache memory, marking the second local copy of the set of data blocks as delayed, and transmitting an acknowledgment to the invalidate request.
Distributed error detection and correction with hamming code handoff
A device includes a data path, a first interface connected to the data path and configured to receive a request from a processor package to write a data value to a memory address, and a controller connected to the data path and configured to receive the request to write the data value to the memory address and to calculate a Hamming code of the data value. The controller is configured to transmit the data value and the Hamming code on the data path. The device includes an external memory interleave connected to the data path. The external memory interleave is configured to receive the data value and calculate a test Hamming code of the data value and to determine whether to send the data value to an external memory interface to be written to the memory address based on a comparison of the Hamming code and the test Hamming code.
Multi-processor, multi-domain, multi-protocol, cache coherent, speculation aware shared memory and interconnect
A device includes an interconnect and a plurality of devices connected to the interconnect. The plurality of devices includes a first interface connected to the interconnect and a second interface connected to the interconnect. The plurality of devices further includes a first memory bank connected to the interconnect and a second memory bank connected to the interconnect. The plurality of devices further includes an external memory interface connected to the interconnect and a controller configured to establish virtual channels among the plurality of devices connected to the interconnect.
Accelerated processing of streams of load-reserve requests
A processing unit for a data processing system includes a processor core that issues memory access requests and a cache memory coupled to the processor core. The cache memory includes a reservation circuit that tracks reservations established by the processor core via load-reserve requests and a plurality of read-claim (RC) state machines for servicing memory access requests of the processor core. The cache memory, responsive to receipt from the processor core of a store-conditional request specifying a store target address, allocates an RC state machine among the plurality of RC state machines to process the store-conditional request and transfers responsibility for tracking a reservation for the store target address from the reservation circuit to the RC state machine.
Configurable cache for multi-endpoint heterogeneous coherent system
A device includes a memory bank. The memory bank includes data portions of a first way group. The data portions of the first way group include a data portion of a first way of the first way group and a data portion of a second way of the first way group. The memory bank further includes data portions of a second way group. The device further includes a configuration register and a controller configured to individually allocate, based on one or more settings in the configuration register, the first way and the second way to one of an addressable memory space and a data cache.
Using insertion points to determine locations in a cache list at which to indicate tracks in a shared cache accessed by a plurality of processors
Provided are a computer program product, system, and method for using insertion points to determine locations in a cache list at which to indicate tracks in a shared cache accessed by a plurality of processors. A plurality of insertion points to a cache list for the shared cache having a least recently used (LRU) end and a most recently used (MRU) end identify tracks in the cache list. For each processor, of a plurality of processors, for which indication of tracks accessed by the processor is received, a determination is made of insertion points of the provided insertion points at which to indicate the tracks for which indication is received. The tracks are indicated at positions in the cache list with respect to the determined insertion points.
Concurrent cache lookups using partial identifiers
To perform a lookup for a group of plural portions of data in a cache together, a first part of an identifier for a first one of the portions of data in the group is compared with corresponding first parts of the identifiers for cache lines in the cache, the first part of the identifier for the first one of the portions of data in the group is compared with the corresponding first parts of the identifiers for the remaining portions of data in the group of plural portions of data, and a remaining part of the identifier for each portion of data is compared with the corresponding remaining parts of identifiers for cache lines in the cache. It is then determined whether a cache line for any of the portions of data in the group is present in the cache, based on the results of the comparisons.
SYSTEM, APPARATUS AND METHODS FOR PERFORMING SHARED MEMORY OPERATIONS
In an embodiment, an apparatus for memory access may include: a memory comprising at least one atomic memory region, and a control circuit coupled to the memory, The control circuit may be to: for each submission queue of a plurality of submission queues, identify an atomic memory location specified in a first entry of the submission queue, wherein each submission queue is to store access requests from a different requester; determine whether the atomic memory location includes existing requester information; and in response to a determination that the atomic memory location does not include existing requester information, perform an atomic operation for the atomic memory location based at least in part on the first entry of the submission queue. Other embodiments are described and claimed.