G06F12/0824

Distributed memory-augmented neural network architecture

A method for using a distributed memory device in a memory-augmented neural network system includes receiving, by a controller, an input query to access data stored in the distributed memory device, the distributed memory device comprising a plurality of memory banks. The method further includes determining, by the controller, a memory bank selector that identifies a memory bank from the distributed memory device for memory access, wherein the memory bank selector is determined based on a type of workload associated with the input query. The method further includes computing, by the controller and by using content-based access, a memory address in the identified memory bank. The method further includes generating, by the controller, an output in response to the input query by accessing the memory address.
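
A minimal sketch of the two controller steps described above, assuming a hypothetical two-type workload taxonomy and a dot-product similarity for the content-based access; all names here are illustrative rather than taken from the patent:

```cpp
#include <cstddef>
#include <vector>

enum class WorkloadType { Inference, Training };  // assumed workload taxonomy

struct MemoryBank {
    std::vector<std::vector<float>> keys;  // one stored key vector per slot
};

// Step 1: pick a bank based on the workload type associated with the query.
std::size_t select_bank(WorkloadType type, std::size_t num_banks) {
    // Assumption: inference traffic maps to the low banks, training to the high.
    return type == WorkloadType::Inference ? 0 : num_banks - 1;
}

// Step 2: content-based access returns the slot whose key best matches the query.
std::size_t content_address(const MemoryBank& bank, const std::vector<float>& query) {
    std::size_t best = 0;
    float best_score = -1e30f;
    for (std::size_t slot = 0; slot < bank.keys.size(); ++slot) {
        float score = 0.0f;
        for (std::size_t i = 0; i < query.size(); ++i)
            score += bank.keys[slot][i] * query[i];  // dot-product similarity
        if (score > best_score) { best_score = score; best = slot; }
    }
    return best;
}
```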

Ternary content addressable memory-enhanced cache coherency acceleration

A system and method for cache coherency within multiprocessor environments is provided. Each node controller of a plurality of nodes within a multiprocessor system receives cache coherency protocol requests from local processor sockets and other node controller(s). A ternary content addressable memory (TCAM) accelerator in the node controller determines whether the cache coherency protocol request comprises a snoop request and, if so, searches the TCAM based on an address within the cache coherency protocol request. In response to detecting exactly one match between an entry of the TCAM and the received snoop request, the node controller sends a response to the requesting local processor without having to access a coherency directory.
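
The TCAM behavior can be modeled in software as (value, mask) entries with don't-care bits; the single-match fast path below follows the abstract, while the entry payload and the names are assumptions:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct TcamEntry {
    uint64_t value;  // address bits to compare against
    uint64_t mask;   // 1 = compare this bit, 0 = don't care
    int owner;       // processor socket owning the line (assumed payload)
};

// Returns the owner if exactly one entry matches the snooped address,
// otherwise nullopt, in which case the node controller falls back to
// the coherency directory.
std::optional<int> snoop_lookup(const std::vector<TcamEntry>& tcam, uint64_t addr) {
    std::optional<int> hit;
    for (const auto& e : tcam) {
        if ((addr & e.mask) == (e.value & e.mask)) {
            if (hit) return std::nullopt;  // more than one match: no fast path
            hit = e.owner;
        }
    }
    return hit;
}
```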

Object memory instruction set
11782601 · 2023-10-10

Embodiments of the present invention are directed to an instruction set of an object memory fabric. This object memory fabric instruction set can be used to define arbitrary, parallel functionality such as: direct object address manipulation and generation without the overhead of complex address translation and software layers to manage differing address spaces; direct object authentication with no runtime overhead that can be set based on secure third-party authentication software; object-related memory computing in which, as objects move between nodes, the computing can move with them; and parallelism that is dynamic and transparent based on scale and activity. These instructions are divided into three conceptual classes: memory reference, including load, store, and special memory fabric instructions; control flow, including fork, join, and branches; and execute, including arithmetic and comparison instructions.
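
One hypothetical encoding of the three conceptual classes, with mnemonics and fields invented for illustration rather than drawn from the actual fabric ISA:

```cpp
#include <cstdint>

enum class InsnClass : uint8_t { MemoryReference, ControlFlow, Execute };

enum class Opcode : uint8_t {
    // Memory reference: loads, stores, and special fabric instructions.
    Load, Store, FabricSpecial,
    // Control flow: fork, join, and branches.
    Fork, Join, Branch,
    // Execute: arithmetic and comparison.
    Add, Compare,
};

struct Instruction {
    InsnClass cls;
    Opcode op;
    uint64_t object_id;  // direct object address, no translation layer
    uint64_t offset;     // offset within the object
};
```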

Home agent based cache transfer acceleration scheme

Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
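
A sketch of the early probe cache's decision, assuming region-granular entries keyed by region base address; the region size, the threshold value, and the structure names are assumptions:

```cpp
#include <cstdint>
#include <unordered_map>

constexpr uint64_t kRegionBits = 16;          // assumed: 64 KiB regions
constexpr uint32_t kConfidenceThreshold = 2;  // assumed threshold

struct EarlyProbeEntry {
    int owner_node;       // node believed to own the region
    uint32_t confidence;  // raised when the probe filter agrees, lowered when not
};

struct CoherentSlave {
    std::unordered_map<uint64_t, EarlyProbeEntry> early_probe_cache;

    // Returns the node to send an early probe to, or -1 to wait for the
    // probe filter result. In hardware, both lookups proceed in parallel.
    int maybe_early_probe(uint64_t addr) const {
        auto it = early_probe_cache.find(addr >> kRegionBits);
        if (it != early_probe_cache.end() &&
            it->second.confidence > kConfidenceThreshold)
            return it->second.owner_node;
        return -1;
    }
};
```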

Low latency host processor to coherent device interaction

In a computer system, a processor and an I/O device controller communicate with each other via a coherence interconnect and according to a cache coherence protocol. Registers of the I/O device controller are mapped into the cache-coherent memory space so that the processor can treat the registers as cacheable memory. As a result, the latency of processor commands executed by the I/O device controller is decreased, and the amount of data stored in the I/O device controller that the processor can access is increased from the size of a single register to the size of an entire cache line.
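
A sketch of the idea under the assumption of a 64-byte cache line and an invented command/status register layout; in the described system the coherence interconnect, rather than explicit I/O reads, keeps both sides' views consistent:

```cpp
#include <atomic>
#include <cstdint>

struct alignas(64) DeviceRegisters {   // one full cache line of "registers"
    std::atomic<uint64_t> command;     // written by the processor
    std::atomic<uint64_t> status;      // written by the device controller
    uint8_t payload[48];               // data beyond a single register
};

// The processor writes a command and spins on the cached status word; the
// device's update invalidates the line via the coherence protocol, so each
// poll hits the local cache instead of crossing an I/O bus on every read.
void issue_command(DeviceRegisters* regs, uint64_t cmd) {
    regs->command.store(cmd, std::memory_order_release);
    while (regs->status.load(std::memory_order_acquire) != cmd) {
        // spin; stays in-cache until the device responds
    }
}
```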

Managing least recently used cache using reduced memory footprint sequence container

Techniques are provided for managing a least recently used cache using a linked list with a reduced memory footprint. A cache manager receives an I/O request comprising a target address, wherein the cache manager manages a cache memory having a maximum allocated amount of cache entries, and a linked list having a maximum allocated amount of list elements which is less than the maximum allocated amount of cache entries. If the target address corresponds to a cache entry, the cache manager accesses the cache entry to obtain the cache data from cache memory, removes the list element from the linked list which corresponds to the accessed cache entry, selects an existing cache entry which currently does not have a corresponding list element in the linked list, and adds a list element corresponding to the selected cache entry at the head position of the linked list.
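
A sketch of the hit path described above, assuming the cache holds many more entries than the list tracks and that any currently untracked entry may be selected to receive the freed list element:

```cpp
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

struct CacheEntry {
    std::string data;
    std::list<uint64_t>::iterator list_pos;  // valid only if tracked
    bool tracked = false;
};

struct ReducedLruCache {
    std::unordered_map<uint64_t, CacheEntry> entries;  // up to max cache entries
    std::list<uint64_t> lru;  // far fewer list elements than cache entries

    std::string* lookup(uint64_t addr) {
        auto it = entries.find(addr);
        if (it == entries.end()) return nullptr;
        CacheEntry& e = it->second;
        if (e.tracked) {
            // The hit entry is hot again: free its list element and hand
            // the slot to some entry that is not currently tracked.
            lru.erase(e.list_pos);
            e.tracked = false;
            for (auto& [a, cand] : entries) {
                if (!cand.tracked && a != addr) {
                    lru.push_front(a);  // head position of the linked list
                    cand.list_pos = lru.begin();
                    cand.tracked = true;
                    break;
                }
            }
        }
        return &e.data;
    }
};
```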

PREEMPTIVE TRACKING OF REMOTE REQUESTS FOR DECENTRALIZED HOT CACHE LINE FAIRNESS TRACKING

Embodiments are for preemptive tracking of remote requests for decentralized hot cache line fairness tracking. Authority is requested for a cache line in conjunction with querying for outstanding requests for the cache line. One or more responses are received regarding the outstanding requests for the cache line. In response to receiving the one or more responses regarding the outstanding requests and in advance of receiving the authority for the cache line, the outstanding requests are preemptively tracked in a requested structure associated with the cache line.
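
A sketch of the preemptive flow, assuming a per-line tracking structure and a combined authority-request/outstanding-query message; names are illustrative:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct LineTracker {
    bool have_authority = false;
    std::vector<int> outstanding_requesters;  // filled in preemptively
};

struct FairnessAgent {
    std::unordered_map<uint64_t, LineTracker> trackers;

    // Step 1: request authority for the line and, in the same exchange,
    // query peers for outstanding requests on it (combined message assumed).
    void request_authority(uint64_t line) { trackers.emplace(line, LineTracker{}); }

    // Step 2: responses about outstanding requests may arrive before the
    // authority grant; track them immediately rather than waiting.
    void on_outstanding_response(uint64_t line, int requester) {
        trackers[line].outstanding_requesters.push_back(requester);
    }

    // Step 3: when the authority grant arrives, the tracker is already
    // populated, so fairness decisions need no second round of queries.
    void on_authority_granted(uint64_t line) {
        trackers[line].have_authority = true;
    }
};
```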

CASTOUT HANDLING IN A DISTRIBUTED CACHE TOPOLOGY

Castout handling in a distributed cache topology, including: detecting, by a first cache of a plurality of caches, a cache miss; providing, by the first cache to each other cache of the plurality of caches, a message comprising: data indicating a cache address corresponding to the cache miss; and data indicating a cache line to be evicted.
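
A sketch of the combined miss/castout broadcast, with struct and function names invented for illustration:

```cpp
#include <cstdint>
#include <vector>

struct CastoutMessage {
    uint64_t miss_addr;    // cache address that missed
    uint64_t victim_addr;  // cache line chosen for eviction
};

struct Cache {
    int id;
    void receive(const CastoutMessage& msg) {
        // Peer-side handling elided: e.g., absorb the victim line or note
        // the miss address for coherence bookkeeping.
        (void)msg;
    }
};

// On a miss, the first cache tells every other cache in the topology both
// which address missed and which line it intends to evict, in one message.
void on_miss(Cache& self, std::vector<Cache*>& peers,
             uint64_t miss_addr, uint64_t victim_addr) {
    CastoutMessage msg{miss_addr, victim_addr};
    for (Cache* peer : peers)
        if (peer->id != self.id) peer->receive(msg);
}
```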

APPLICATION OF A DEFAULT SHARED STATE CACHE COHERENCY PROTOCOL
20230281127 · 2023-09-07

Example implementations relate to cache coherency protocols as applied to a memory block range. Exclusive ownership of a range of blocks of memory in a default shared state may be tracked by a directory. The directory may be associated with a first processor of a set of processors. When a request is received from a second processor of the set of processors to read one or more blocks of memory absent from the directory, one or more blocks may be transmitted in the default shared state to the second processor. The blocks absent from the directory may not be tracked in the directory.
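
A sketch of a directory that records only exclusive ownership, treating absence as the default shared state; names are assumptions:

```cpp
#include <cstdint>
#include <unordered_map>

struct Directory {
    // Only exclusively-owned blocks appear here; shared blocks are untracked.
    std::unordered_map<uint64_t, int> exclusive_owner;  // block -> processor

    // Read request from another processor: if the block is absent, it can be
    // returned in the default shared state without adding a directory entry.
    // (If present, the real protocol would recall it from the owner first.)
    bool read_is_default_shared(uint64_t block) const {
        return exclusive_owner.find(block) == exclusive_owner.end();
    }

    void grant_exclusive(uint64_t block, int processor) {
        exclusive_owner[block] = processor;  // tracked only once exclusive
    }
};
```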

MEMORY ACCESS METHOD AND SERVER FOR PERFORMING THE SAME
20230144656 · 2023-05-11

Provided are a memory access method and a server for performing the same. A memory access method performed by an optical interleaver included in a server includes receiving a request message from a requester processing engine included in the server, setting receive buffers for different wavelengths, one for each external memory/storage device connected to the server, multiplexing the same request message onto the different wavelengths according to a wavelength division multiplexing (WDM) scheme, and transmitting the multiplexed request messages to the respective external memory/storage devices, wherein an address of a virtual memory managed by the server is separated and stored according to an interleaving scheme by a responder included in each of the external memory/storage devices.
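
A sketch of the interleaver's request path, modeling each wavelength as a per-device buffer and the virtual-address split as simple modulo interleaving; both the wavelength assignment and the interleave function are assumptions:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

struct RequestMessage { uint64_t virt_addr; };

struct OpticalInterleaver {
    // One receive buffer (wavelength) per connected memory/storage device.
    std::vector<std::queue<RequestMessage>> wavelength_buffers;

    explicit OpticalInterleaver(std::size_t num_devices)
        : wavelength_buffers(num_devices) {}

    // Multiplex the same request onto every wavelength; each external
    // device's responder keeps only the address slice it stores.
    void send(const RequestMessage& req) {
        for (auto& buf : wavelength_buffers) buf.push(req);
    }

    // Responder-side check (assumed): device d holds the addresses for
    // which addr % num_devices == d under the interleaving scheme.
    bool responder_owns(std::size_t device, uint64_t addr) const {
        return addr % wavelength_buffers.size() == device;
    }
};
```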