G06F12/082

Write merging on stores with different tags

Techniques for caching data are provided that include receiving, by a caching system, a write memory command for a memory address, the write memory command associated with a first color tag, determining, by a first sub-cache of the caching system, that the memory address is not cached in the first sub-cache, determining, by second sub-cache of the caching system, that the memory address is not cached in the second sub-cache, storing first data associated with the first write memory command in a cache line of the second sub-cache, storing the first color tag in the second sub-cache, receiving a second write memory command for the cache line, the write memory command associated with a second color tag, merging the second color tag with the first color tag, storing the merged color tag, and evicting the cache line based on the merged color tag.

Methods and apparatus to facilitate atomic compare and swap in cache for a coherent level 1 data cache system

Methods, apparatus, systems and articles of manufacture to facilitate atomic compare and swap in cache for a coherent level 1 data cache system are disclosed. An example system includes a cache storage; a cache controller coupled to the cache storage wherein the cache controller is operable to: receive a memory operation that specifies a key, a memory address, and a first set of data; retrieve a second set of data corresponding to the memory address; compare the second set of data to the key; based on the second set of data corresponding to the key, cause the first set of data to be stored at the memory address; and based on the second set of data not corresponding to the key, complete the memory operation without causing the first set of data to be stored at the memory address.

METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A VICTIM CACHE

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a victim cache. An example apparatus includes a first storage coupled to a controller, a second storage coupled to the controller and parallel coupled to the first storage, and a storage queue coupled to the first storage, the second storage, and to the controller, the storage queue to obtain a memory operation from the controller indicating an address and a first set of data, obtain a second set of data associated with the address from at least one of the first storage and the second storage, merge the first set of data and the second set of data to produce a third set of data, and provide the third set of data for writing to at least one of the first storage and the second storage.

Snoop filter device

The invention relates to a device for use in maintaining cache coherence in a multiprocessor computing system. The snoop filter device is connectable with a plurality of cache elements, where each cache element comprises a number of cache agents. The snoop filter device comprises a plurality of snoop filter storage locations, where each snoop filter storage location is mapped to one cache element.

Methods and apparatus to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding. An example apparatus includes a first storage, a second storage, a store queue coupled to the first storage and the second storage, the store queue operable to receive a first memory operation specifying a first set of data, process the first memory operation for storing the first set of data in at least one of the first storage and the second storage, receive a second memory operation, and prior to storing the first set of data in the at least one of the first storage and the second storage, feedback the first set of data for use in the second memory operation.

ATOMIC OPERATIONS AND HISTOGRAM OPERATIONS IN A CACHE PIPELINE

Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in cache pipeline are disclosed An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.

Methods and apparatus to facilitate read-modify-write support in a victim cache

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a victim cache. An example apparatus includes a first storage coupled to a controller, a second storage coupled to the controller and parallel coupled to the first storage, and a storage queue coupled to the first storage, the second storage, and to the controller, the storage queue to obtain a memory operation from the controller indicating an address and a first set of data, obtain a second set of data associated with the address from at least one of the first storage and the second storage, merge the first set of data and the second set of data to produce a third set of data, and provide the third set of data for writing to at least one of the first storage and the second storage.

Distributed cache with in-network prefetch

A programmable switch receives a cache line request from a client of a plurality of clients on a network to obtain a cache line. One or more additional cache lines are identified based on the received cache line request and prefetch information. The cache line and the one or more additional cache lines are requested from one or more memory devices on the network. The requested cache line and the one or more additional cache lines are received from the one or more memory devices, and are sent to the client.

Methods and apparatus for inflight data forwarding and invalidation of pending writes in store queue

Methods, apparatus, systems and articles of manufacture are disclosed to forward and invalidate inflight data in a store queue. An example apparatus includes a cache storage, a cache controller coupled to the cache storage and operable to receive a first memory operation, determine that the first memory operation corresponds to a read miss in the cache storage, determine a victim address in the cache storage to evict in response to the read miss, issue a read-invalidate command that specifies the victim address, compare the victim address to a set of addresses associated with a set of memory operations being processed by the cache controller, and in response to the victim address matching a first address of the set of addresses corresponding to a second memory operation of the set of memory operations, provide data associated with the second memory operation.

Remote atomic operations in multi-socket systems

Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.