Patent classifications
G06F12/0824
DIRTY CACHE LINE WRITE-BACK TRACKING
A cache system may include a cache to store a plurality of cache lines in a write-back mode; dirty cache line counter circuitry to store a count of dirty cache lines in the cache, increment the count when a new dirty cache line is added to the cache, and decrement the count when an old dirty cache line is written-back from the cache; dirty cache line write-back tracking circuitry to store an ordering of the dirty cache lines in a write-back order; mapping circuitry to map the dirty cache lines into the ordering; and controller circuitry to use the mapping circuitry to identify an evicted dirty cache line in the ordering and remove the evicted dirty cache line from the ordering.
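A minimal software sketch of the claimed structures, assuming a FIFO write-back order: a linked list stands in for the write-back tracking circuitry, a hash map for the mapping circuitry, and an integer for the dirty-line counter. The class and method names are illustrative, not the patent's hardware.

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>

class DirtyLineTracker {
    std::list<uint64_t> order_;  // write-back ordering, oldest dirty line first
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> map_;  // line -> position
    size_t dirty_count_ = 0;     // dirty cache line counter

public:
    // A new dirty cache line is added: increment the count, append to the ordering.
    void on_line_dirtied(uint64_t addr) {
        if (map_.count(addr)) return;  // already tracked as dirty
        order_.push_back(addr);
        map_[addr] = std::prev(order_.end());
        ++dirty_count_;
    }

    // An old dirty line is written back in order: decrement the count.
    // Assumes at least one dirty line is being tracked.
    uint64_t write_back_oldest() {
        uint64_t addr = order_.front();
        order_.pop_front();
        map_.erase(addr);
        --dirty_count_;
        return addr;
    }

    // An eviction removes a dirty line out of order: the mapping locates
    // the evicted line inside the ordering so it can be unlinked directly.
    void on_dirty_line_evicted(uint64_t addr) {
        auto it = map_.find(addr);
        if (it == map_.end()) return;
        order_.erase(it->second);
        map_.erase(it);
        --dirty_count_;
    }

    size_t dirty_lines() const { return dirty_count_; }
};
```

The map is what makes out-of-order eviction cheap: without it, removing an evicted line from the ordering would require a linear scan.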
METHOD AND SYSTEM FOR ESTABLISHING A DISTRIBUTED NETWORK WITHOUT A CENTRALIZED DIRECTORY
A method for establishing a connection between two nodes in a communication network, without use of a centralized directory or mapping identifiers, includes: receiving a lookup message from another node in the communication network that includes a lookup term; determining whether a target node that satisfies the lookup term can be identified in a local directory cache; and, if such a node is identified, establishing a connection to the target node and forwarding the lookup message, or, if no such node is identified, forwarding the lookup message to other nodes in the network with which the node has an active communication connection.
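One way to read the claim as node-side logic, sketched below. The hop-count guard against endless flooding and the helper names are assumptions, not part of the abstract.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct LookupMessage {
    std::string term;  // the lookup term
    int hops_left;     // assumed TTL to bound flooding (not in the abstract)
};

class Node {
    std::unordered_map<std::string, int> local_directory_;  // term -> node id
    std::vector<int> active_connections_;                   // peers with open links

    void connect_and_forward(int target, const LookupMessage& m) { /* open a connection, pass m on */ }
    void forward(int peer, const LookupMessage& m) { /* relay m over an existing link */ }

public:
    void on_lookup(const LookupMessage& msg) {
        auto hit = local_directory_.find(msg.term);
        if (hit != local_directory_.end()) {
            // A target node satisfying the lookup term is known locally.
            connect_and_forward(hit->second, msg);
        } else if (msg.hops_left > 0) {
            // No local match: flood the lookup over all active connections.
            LookupMessage next{msg.term, msg.hops_left - 1};
            for (int peer : active_connections_) forward(peer, next);
        }
    }
};
```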
Cache replacement based on traversal tracking
Techniques are disclosed relating to controlling cache replacement. In some embodiments, a computing system performs multiple searches of a data structure, where one or more of the searches traverse multiple links between elements of the data structure. The system may cache, in a traversal cache, traversal information that is usable by searches to skip one or more links traversed by one or more prior searches. The system may store tracking information that indicates a location in the traversal cache at which prior traversal information for a first search is stored. The system may select, based on the tracking information, an entry in the traversal cache for new traversal information generated by the first search. The selection may override a default replacement policy for the traversal cache, e.g., to select the location in the traversal cache to replace the prior traversal information with the new traversal information.
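A sketch of the override, assuming a round-robin default policy and a per-search tracking table; the slot layout and names are illustrative.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct TraversalInfo { uint64_t skip_from, skip_to; };  // a link-skipping hint

class TraversalCache {
    std::vector<TraversalInfo> slots_;
    std::unordered_map<uint64_t, size_t> tracking_;  // search id -> slot used before
    size_t next_victim_ = 0;                         // default policy: round-robin

public:
    explicit TraversalCache(size_t n) : slots_(n) {}

    void insert(uint64_t search_id, const TraversalInfo& info) {
        size_t slot;
        auto t = tracking_.find(search_id);
        if (t != tracking_.end()) {
            // Override the default policy: replace this search's own
            // prior traversal information rather than an arbitrary victim.
            slot = t->second;
        } else {
            slot = next_victim_;  // fall back to the default replacement
            next_victim_ = (next_victim_ + 1) % slots_.size();
        }
        slots_[slot] = info;
        tracking_[search_id] = slot;  // remember where this search's info lives
    }
};
```

Replacing a search's own stale entry keeps other searches' hints resident, which is the point of overriding the default policy.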
Region based split-directory scheme to adapt to large cache sizes
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
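A sketch of the node-based side, assuming 4 KiB regions and a plain reference count per region entry; both are illustrative choices.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr uint64_t kRegionShift = 12;  // assume 4 KiB regions spanning many lines

class NodeRegionDirectory {
    std::unordered_map<uint64_t, uint32_t> ref_count_;  // region -> lines cached in node

public:
    // A cache line enters any cache subsystem in this node.
    void on_line_cached(uint64_t line_addr) {
        ++ref_count_[line_addr >> kRegionShift];
    }

    // A cache line leaves the node; when the aggregate count for its
    // region reaches zero, the region entry itself can be dropped.
    void on_line_evicted(uint64_t line_addr) {
        uint64_t region = line_addr >> kRegionShift;
        auto it = ref_count_.find(region);
        if (it != ref_count_.end() && --it->second == 0)
            ref_count_.erase(it);
    }

    bool has_region(uint64_t region) const { return ref_count_.count(region) != 0; }
};
```

Tracking regions instead of lines shrinks the directory by roughly the lines-per-region factor, at the cost of coarser invalidation.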
Object memory data flow instruction execution
Embodiments of the invention provide systems and methods for managing processing, memory, storage, network, and cloud computing to significantly improve the efficiency and performance of processing nodes. More specifically, embodiments of the present invention are directed to an instruction set of an object memory fabric. This object memory fabric instruction set can be used to provide a unique instruction model based on triggers defined in metadata of the memory objects. This model represents a dynamic dataflow method of execution in which processes are performed based on actual dependencies of the memory objects. This provides a high degree of memory and execution parallelism, which in turn provides tolerance of variations in access delays between memory objects. In this model, sequences of instructions are executed and managed based on data access. These sequences can be of arbitrary length, but short sequences are more efficient and provide greater parallelism.
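A very loose software analogy of trigger-driven execution, with no claim to the fabric's actual instruction set: triggers here are callbacks in object metadata that fire when a dependency (a write) is satisfied.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

struct MemoryObject {
    std::vector<uint8_t> data;
    // Metadata triggers: short actions to run once the object's data arrives.
    std::vector<std::function<void(MemoryObject&)>> triggers;
};

class Fabric {
    std::unordered_map<uint64_t, MemoryObject> objects_;

public:
    MemoryObject& object(uint64_t id) { return objects_[id]; }

    // A write satisfies a dependency: fire the object's triggers, which may
    // touch other objects and cascade, dataflow-style, without a central
    // program counter ordering the work.
    void write(uint64_t id, std::vector<uint8_t> bytes) {
        MemoryObject& obj = objects_[id];
        obj.data = std::move(bytes);
        for (auto& t : obj.triggers) t(obj);
    }
};
```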
Merging data for write allocate
A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in a L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
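The merge step lends itself to a sketch. Assuming a 64-byte cache line and one enable bit per byte (the abstract fixes neither), the returning read data fills only the bytes the write did not cover:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct MergeEntry {
    std::array<uint8_t, 64> data{};  // data field of the reserved entry
    uint64_t byte_enable = 0;        // enable field: one bit per written byte
};

// Record the write data and mark its bytes enabled (caller keeps off+len <= 64).
void stage_write(MergeEntry& e, const uint8_t* wdata, size_t off, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        e.data[off + i] = wdata[i];
        e.byte_enable |= 1ull << (off + i);
    }
}

// Merge the returning read data: bytes covered by the write win.
void merge_read(MergeEntry& e, const uint8_t* rdata) {
    for (size_t i = 0; i < e.data.size(); ++i)
        if (!(e.byte_enable & (1ull << i)))
            e.data[i] = rdata[i];
}
```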
Efficient cache eviction and insertions for sustained steady state performance
A distributed metadata cache for a distributed object store includes a plurality of cache entries, an active-cache-entry set and an unreferenced-cache-entry set. Each cache entry includes information relating to whether at least one input/output (IO) thread is referencing the cache entry and information relating to whether the cache entry is no longer referenced by at least one IO thread. Each cache entry in the active-cache-entry set includes information that indicates that at least one IO thread is actively referencing the cache entry. Each cache entry in the unreferenced-cache-entry set is eligible for eviction from the distributed metadata cache by including information that indicates that the cache entry is no longer actively referenced by an IO thread.
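A sketch of the two-set bookkeeping, using an explicit reference count per entry as a stand-in for "is referenced by at least one IO thread"; that representation is an assumption.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

class MetadataCache {
    std::unordered_map<uint64_t, uint32_t> refs_;  // entry id -> IO-thread references
    std::unordered_set<uint64_t> active_;          // actively referenced entries
    std::unordered_set<uint64_t> unreferenced_;    // entries eligible for eviction

public:
    void acquire(uint64_t id) {  // an IO thread starts referencing the entry
        if (refs_[id]++ == 0) {
            unreferenced_.erase(id);
            active_.insert(id);
        }
    }

    void release(uint64_t id) {  // the last IO thread lets go (assumes a prior acquire)
        if (--refs_[id] == 0) {
            active_.erase(id);
            unreferenced_.insert(id);  // now an eviction candidate
        }
    }

    bool try_evict(uint64_t& victim) {  // evict only from the unreferenced set
        if (unreferenced_.empty()) return false;
        victim = *unreferenced_.begin();
        unreferenced_.erase(unreferenced_.begin());
        refs_.erase(victim);
        return true;
    }
};
```

Confining eviction to the unreferenced set means IO threads in steady state never race the evictor for entries they are actively using.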
Devices and methods for managing network traffic for a distributed cache
A programmable switch includes ports, and circuitry to receive cache messages for a distributed cache from client devices. The cache messages are queued for sending to memory devices from the ports. Queue occupancy information is generated and sent to a controller that determines, based at least in part on the queue occupancy information, at least one of a cache message transmission rate for a client device, and one or more weights for the queues used by the programmable switch. In another aspect, the programmable switch extracts cache request information from a cache message. The cache request information indicates a cache usage and is sent to the controller, which determines, based at least in part on the extracted cache request information, at least one of a cache message transmission rate for a client device, and one or more weights for queues used in determining an order for sending cache messages.
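A sketch of the controller side of that loop. The linear back-off rule, base rate, and report fields are assumptions; the abstract only says the controller derives rates and weights from occupancy.

```cpp
#include <cstdint>
#include <unordered_map>

struct OccupancyReport {
    uint32_t queue_id;
    uint32_t depth;     // messages currently queued
    uint32_t capacity;  // queue capacity
};

class CacheTrafficController {
    std::unordered_map<uint32_t, double> client_rate_;   // client id -> msgs/sec
    std::unordered_map<uint32_t, double> queue_weight_;  // queue id -> scheduling weight

public:
    void on_report(uint32_t client_id, const OccupancyReport& r) {
        double occupancy = double(r.depth) / double(r.capacity);
        // Fuller queues throttle the client's permitted transmission rate
        // and down-weight the congested queue in the switch's scheduler.
        client_rate_[client_id] = 1000.0 * (1.0 - occupancy);   // assumed base of 1000 msgs/sec
        queue_weight_[r.queue_id] = 1.0 - 0.5 * occupancy;
    }
};
```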
Cache coherency for host-device systems
A cache coherency mode includes: in response to a read request from a device in the host-device system for an instance of the shared data, sending the instance of the shared data from the host device to that device; and, in response to a write request from a device, storing data associated with the write request in the cache of the host device. Shared data is pinned in the cache of the host device, and is not cached in any of the other devices in the host-device system. Because there is only one cached copy of the shared data in the host-device system, the devices in that system are cache coherent.
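Because there is exactly one cached copy, the mode collapses to a simple sketch; pinning at setup time and the word-sized values are assumptions.

```cpp
#include <cstdint>
#include <unordered_map>

class HostCache {
    std::unordered_map<uint64_t, uint64_t> pinned_;  // shared data, never evicted

public:
    void pin(uint64_t addr, uint64_t value) { pinned_[addr] = value; }

    // Device read request: send the instance from the host cache.
    // Assumes the address was pinned at setup.
    uint64_t read(uint64_t addr) const { return pinned_.at(addr); }

    // Device write request: store into the host cache. No device holds a
    // copy, so no invalidation or snoop traffic is needed for coherence.
    void write(uint64_t addr, uint64_t value) { pinned_[addr] = value; }
};
```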
Using a Bloom filter to reduce the number of memory addresses tracked by a coherence directory
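A sketch of that flow, with an assumed 4096-bit filter and two illustrative hash functions:

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

class CoherenceTracker {
    std::unordered_set<uint64_t> directory_;  // precise per-address tracking
    std::bitset<4096> bloom_;                 // coarse tracking for everything else

    size_t h1(uint64_t a) const { return (a * 0x9E3779B97F4A7C15ull) % bloom_.size(); }
    size_t h2(uint64_t a) const { return (a >> 17) % bloom_.size(); }

public:
    void on_request(uint64_t addr) {
        if (directory_.count(addr)) return;  // directory tracks it; filter unused
        if (bloom_[h1(addr)] && bloom_[h2(addr)]) return;  // filter already hits
        // Tracked nowhere: initiate tracking in the Bloom filter only, so
        // subsequent requests for this address will hit the filter.
        bloom_.set(h1(addr));
        bloom_.set(h2(addr));
    }

    bool possibly_cached(uint64_t addr) const {
        return directory_.count(addr) || (bloom_[h1(addr)] && bloom_[h2(addr)]);
    }
};
```

The trade is the usual Bloom-filter one: fewer directory entries in exchange for occasional false hits, which cost extra probes but never miss a genuinely cached address.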
An approach for tracking data stored in caches uses a Bloom filter to reduce the number of addresses that need to be tracked by a coherence directory. When a requested address is determined to not be currently tracked by either the coherence directory or the Bloom filter, tracking of the address is initiated in the Bloom filter, but not in the coherence directory. Initiating tracking of the address in the Bloom filter includes setting hash bits in the Bloom filter so that subsequent requests for the address will “hit” the Bloom filter. When a requested address is determined to be tracked by the coherence directory, the Bloom filter is not used to track the address.