Patent classifications
G06F12/0824
DISTRIBUTED MEMORY-AUGMENTED NEURAL NETWORK ARCHITECTURE
A method for using a distributed memory device in a memory augmented neural network system includes receiving, by a controller, an input query to access data stored in the distributed memory device, the distributed memory device comprising a plurality of memory banks. The method further includes determining, by the controller, a memory bank selector that identifies a memory bank from the distributed memory device for memory access, wherein the memory bank selector is determined based on a type of workload associated with the input query. The method further includes computing, by the controller and by using content based access, a memory address in the identified memory bank. The method further includes generating, by the controller, an output in response to the input query by accessing the memory address.
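The access flow described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Controller` class, the workload-type keys, and the use of cosine similarity with a hard argmax for content-based access are all hypothetical. The abstract does not name a similarity measure, and memory-augmented networks often use soft attention over all slots instead of a single best match.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Controller:
    """Hypothetical controller over a distributed memory made of several banks."""

    def __init__(self, banks):
        # Assumption: one bank per workload type; each bank is a list of (key, value) slots.
        self.banks = banks

    def select_bank(self, workload_type):
        # Memory bank selector: chosen from the workload type of the query.
        return self.banks[workload_type]

    def content_address(self, bank, query_key):
        # Content-based access: the slot whose key is most similar to the query.
        return max(range(len(bank)), key=lambda i: cosine(bank[i][0], query_key))

    def answer(self, workload_type, query_key):
        bank = self.select_bank(workload_type)
        address = self.content_address(bank, query_key)
        return bank[address][1]
```

Keeping bank selection separate from address computation mirrors the two steps the abstract lists; a learned selector could replace the dictionary lookup without changing the rest.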
Remote atomic operations in multi-socket systems
Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
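The key point of this abstract is that one request for ownership lets later RAO instructions on the same cache line run without going back to the home agent. The Python sketch below models only that behavior. It is single-threaded, so atomicity is trivially satisfied and not actually demonstrated, and the class names, the `rfo_count` counter, and the dictionary used as memory are assumptions made for illustration.

```python
class HomeAgent:
    """Hypothetical home agent that owns the backing memory for its cache lines."""

    def __init__(self, memory):
        self.memory = memory      # line address -> value
        self.rfo_count = 0        # number of RFOs served, for illustration only

    def request_for_ownership(self, line_addr):
        # A real home agent would invalidate and retrieve the latest cached copy,
        # or fetch the line from memory. Here we simply read the backing store.
        self.rfo_count += 1
        return self.memory.get(line_addr, 0)


class RequesterCacheControl:
    """Hypothetical cache control circuit in the requester socket."""

    def __init__(self, home_agent):
        self.home_agent = home_agent
        self.owned_lines = {}     # lines this socket currently owns

    def execute_rao(self, line_addr, operation, operand):
        # First RAO to a line: obtain ownership and the latest copy from the home agent.
        if line_addr not in self.owned_lines:
            self.owned_lines[line_addr] = self.home_agent.request_for_ownership(line_addr)
        # This and all later RAOs execute locally, independently of the home agent.
        self.owned_lines[line_addr] = operation(self.owned_lines[line_addr], operand)
        return self.owned_lines[line_addr]
```

In this model, three successive increments of one line cause a single RFO. The sketch omits how ownership is later given up or written back, which the abstract does not describe.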
System, apparatus and method for memory mirroring in a buffered memory architecture
In one embodiment, an apparatus includes: a first memory controller to control access to a first memory, the first memory controller including a memory mirroring circuit, in response to a memory write request from a first processor socket for which the first memory comprises a primary memory region, to cause data associated with the memory write request to be written to the first memory and to send a shadow memory write request to a second memory to cause the second memory to write the data into a secondary memory region; and a shadow memory table including a plurality of entries each to store an association between a primary memory region and a secondary memory region. The memory mirroring circuit may access the shadow memory table to identify the secondary memory region. Other embodiments are described and claimed.
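The mirrored write path in this abstract can be sketched as follows. This is a simplified Python model, not the hardware design: `Memory`, `MirroringController`, the region names, and the use of a plain dictionary as the shadow memory table are assumptions. The shadow write is shown as a direct call, whereas the abstract describes a request sent to the second memory.

```python
class Memory:
    """Hypothetical memory modeled as a map from (region, offset) to data."""

    def __init__(self):
        self.cells = {}

    def write(self, region, offset, data):
        self.cells[(region, offset)] = data

    def read(self, region, offset):
        return self.cells.get((region, offset))


class MirroringController:
    """Hypothetical first memory controller with a memory mirroring circuit."""

    def __init__(self, first_memory, second_memory, shadow_table):
        self.first_memory = first_memory
        self.second_memory = second_memory
        # Shadow memory table: primary region -> secondary region.
        self.shadow_table = shadow_table

    def write(self, primary_region, offset, data):
        # Write the data to the primary region in the first memory.
        self.first_memory.write(primary_region, offset, data)
        # Look up the secondary region and issue the shadow write, if one is mapped.
        secondary_region = self.shadow_table.get(primary_region)
        if secondary_region is not None:
            self.second_memory.write(secondary_region, offset, data)
```

Regions without an entry in the table are written to the primary memory only. That behavior is a choice made for this sketch and is not stated in the abstract.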
Object memory data flow instruction execution
Embodiments of the invention provide systems and methods for managing processing, memory, storage, network, and cloud computing to significantly improve the efficiency and performance of processing nodes. More specifically, embodiments of the present invention are directed to an instruction set of an object memory fabric. This object memory fabric instruction set can be used to provide a unique instruction model based on triggers defined in metadata of the memory objects. This model represents a dynamic dataflow method of execution in which processes are performed based on actual dependencies of the memory objects. This provides a high degree of memory and execution parallelism which in turn provides tolerance of variations in access delays between memory objects. In this model, sequences of instructions are executed and managed based on data access. These sequences can be of arbitrary length but short sequences are more efficient and provide greater parallelism.
Accelerator sharing
Disclosed aspects relate to accelerator sharing among a plurality of processors through a plurality of coherent proxies. The cache lines in a cache associated with the accelerator are allocated to one of the plurality of coherent proxies. In a cache directory for the cache lines used by the accelerator, the status of the cache lines and the identification information of the coherent proxies to which the cache lines are allocated are provided. Each coherent proxy maintains a shadow directory of the cache directory for the cache lines allocated to it. In response to receiving an operation request, a coherent proxy corresponding to the request is determined. The accelerator communicates with the determined coherent proxy for the request.
Adjusting insertion points used to determine locations in a cache list at which to indicate tracks based on number of tracks added at insertion points
Provided are a computer program product, system, and method for adjusting insertion points used to determine locations in a cache list at which to indicate tracks based on the number of tracks added at insertion points. There are a plurality of insertion points to a cache list for the cache having a least recently used (LRU) end and a most recently used (MRU) end. Each insertion point of the insertion points identifies a track in the cache list. A plurality of tracks are indicated at positions in the cache list with respect to insertion points. For each track indicated at an insertion point of the insertion points, at least one insertion point counter for at least one insertion point with respect to the insertion point at which the track is indicated is incremented. A plurality of the insertion points are adjusted to point to different tracks in the cache list based on insertion point counters for the insertion points.
USING A BLOOM FILTER TO REDUCE THE NUMBER OF MEMORY ADDRESSES TRACKED BY A COHERENCE DIRECTORY
An approach for tracking data stored in caches uses a Bloom filter to reduce the number of addresses that need to be tracked by a coherence directory. When a requested address is determined to not be currently tracked by either the coherence directory or the Bloom filter, tracking of the address is initiated in the Bloom filter, but not in the coherence directory. Initiating tracking of the address in the Bloom filter includes setting hash bits in the Bloom filter so that subsequent requests for the address will “hit” the Bloom filter. When a requested address is determined to be tracked by the coherence directory, the Bloom filter is not used to track the address.
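The tracking decision in this abstract can be sketched in Python. The two cases the abstract states are modeled directly: an address tracked by neither structure starts out in the Bloom filter only, and an address already in the coherence directory bypasses the Bloom filter. What happens on a Bloom filter hit is not stated in the abstract; promoting the address to the directory is an assumption of this sketch, as are the class names, filter size, and SHA-256-based hashing.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; the size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, address):
        # Derive num_hashes bit positions from salted SHA-256 digests of the address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        # Set the hash bits so later requests for this address hit the filter.
        for pos in self._positions(address):
            self.bits[pos] = True

    def might_contain(self, address):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(address))


class AddressTracker:
    """Hypothetical front end combining a coherence directory and a Bloom filter."""

    def __init__(self):
        self.directory = set()    # stands in for the coherence directory
        self.bloom = BloomFilter()

    def on_request(self, address):
        if address in self.directory:
            return "directory"    # already tracked; the Bloom filter is not used
        if self.bloom.might_contain(address):
            # Assumption: a Bloom hit promotes the address to the directory.
            self.directory.add(address)
            return "promoted"
        self.bloom.add(address)   # first sighting: track in the Bloom filter only
        return "bloom"
```

Under these assumptions an address requested only once never occupies a directory entry, which is the reduction the title describes.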
MANAGING LEAST RECENTLY USED CACHE USING REDUCED MEMORY FOOTPRINT SEQUENCE CONTAINER
Techniques are provided for managing a least recently used cache using a linked list with a reduced memory footprint. A cache manager receives an I/O request comprising a target address, wherein the cache manager manages a cache memory having a maximum allocated amount of cache entries, and a linked list having a maximum allocated amount of list elements which is less than the maximum allocated amount of cache entries. If the target address corresponds to a cache entry, the cache manager accesses the cache entry to obtain the cache data from cache memory, removes the list element which corresponds to the accessed cache entry from the linked list, selects an existing cache entry which currently does not have a corresponding list element in the linked list, and adds a list element which corresponds to the selected cache entry to a head position of the linked list.
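The cache-hit path described in this abstract can be sketched in Python. Only the hit path is modeled, and several details are assumptions because the abstract does not specify them: the class and method names, the use of `collections.deque` in place of a linked list, how the untracked cache entry is selected (here, the first one found), and what happens when the accessed entry has no list element (here, nothing).

```python
from collections import deque


class ReducedListCache:
    """Hypothetical cache whose list tracks only some of its entries."""

    def __init__(self, cache_capacity, list_capacity):
        if list_capacity >= cache_capacity:
            raise ValueError("the list must hold fewer elements than the cache holds entries")
        self.cache_capacity = cache_capacity
        self.list_capacity = list_capacity
        self.cache = {}           # target address -> cached data
        self.lru_list = deque()   # addresses that have a list element; index 0 is the head

    def on_hit(self, address):
        # Access the cache entry to obtain the cached data.
        data = self.cache[address]
        if address in self.lru_list:
            # Remove the list element that corresponds to the accessed entry.
            self.lru_list.remove(address)
            # Select an existing entry with no list element (assumption: first found,
            # excluding the entry just accessed) and add it at the head.
            for candidate in self.cache:
                if candidate != address and candidate not in self.lru_list:
                    self.lru_list.appendleft(candidate)
                    break
        return data
```

Each hit on a tracked entry removes one list element and adds at most one, so the list never grows past its reduced capacity. The membership tests on the deque are linear in the list length, which is acceptable for a sketch but not for a production cache.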
Symmetrical multi-processing node
A symmetrical multi-processing (SMP) node, a distributed SMP (DSMP) system comprising a plurality of SMP nodes, and a method implemented in the SMP node are disclosed. The SMP node comprises: a plurality of processors, a memory coupled to the plurality of processors, and a memory coherent proxy coupled to the plurality of processors through a coherent accelerator interface. The memory coherent proxy is configured to manage statuses of cache lines in the memory.
Light-weight memory expansion in a coherent memory system
Systems, methods, and port controller designs employ a light-weight memory protocol. A light-weight memory protocol controller is selectively coupled to a Cache Coherent Interconnect for Accelerators (CCIX) port. Over an on-chip interconnect fabric, the light-weight memory protocol controller receives memory access requests from a processor and, in response, transmits associated memory access requests to an external memory through the CCIX port using only a proper subset of CCIX protocol memory transaction types including non-cacheable transactions and non-snooping transactions. The light-weight memory protocol controller is selectively uncoupled from the CCIX port and a remote coherent slave controller is coupled in its place. The remote coherent slave controller receives memory access requests and, in response, transmits associated memory access requests to a memory module through the CCIX port using cacheable CCIX protocol memory transaction types.