Patent classifications
G06F15/8069
Methods and apparatus to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding
Methods, apparatus, systems and articles of manufacture are disclosed to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding. An example apparatus includes a first storage, a second storage, a store queue coupled to the first storage and the second storage, the store queue operable to receive a first memory operation specifying a first set of data, process the first memory operation for storing the first set of data in at least one of the first storage and the second storage, receive a second memory operation, and prior to storing the first set of data in the at least one of the first storage and the second storage, feedback the first set of data for use in the second memory operation.
ATOMIC OPERATIONS AND HISTOGRAM OPERATIONS IN A CACHE PIPELINE
Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in cache pipeline are disclosed An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.
Methods and apparatus to facilitate read-modify-write support in a victim cache
Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a victim cache. An example apparatus includes a first storage coupled to a controller, a second storage coupled to the controller and parallel coupled to the first storage, and a storage queue coupled to the first storage, the second storage, and to the controller, the storage queue to obtain a memory operation from the controller indicating an address and a first set of data, obtain a second set of data associated with the address from at least one of the first storage and the second storage, merge the first set of data and the second set of data to produce a third set of data, and provide the third set of data for writing to at least one of the first storage and the second storage.
Methods and apparatus for inflight data forwarding and invalidation of pending writes in store queue
Methods, apparatus, systems and articles of manufacture are disclosed to forward and invalidate inflight data in a store queue. An example apparatus includes a cache storage, a cache controller coupled to the cache storage and operable to receive a first memory operation, determine that the first memory operation corresponds to a read miss in the cache storage, determine a victim address in the cache storage to evict in response to the read miss, issue a read-invalidate command that specifies the victim address, compare the victim address to a set of addresses associated with a set of memory operations being processed by the cache controller, and in response to the victim address matching a first address of the set of addresses corresponding to a second memory operation of the set of memory operations, provide data associated with the second memory operation.
SCALAR CORE INTEGRATION
Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
VICTIM CACHE THAT SUPPORTS DRAINING WRITE-MISS ENTRIES
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.
Write merging on stores with different privilege levels
A caching system including a first sub-cache, a second sub-cache, coupled in parallel with the first sub-cache, for storing write-memory commands that are not cached in the first sub-cache, the second sub-cache including privilege bits configured to store an indication that a corresponding cache line of the second sub-cache is associated with a level of privilege, and wherein the second sub-cache is further configured to receive a first write memory command for a memory address associated with a first level of privilege, store, in the second sub-cache, first data associated with the first write memory command and the level of privilege associated with the cache line, receive a second write memory command for the cache line, the second write memory command associated with a second level of privilege, merge the first level of privilege with the second level of privilege, and output the merged privilege level with the cache line.
Methods and apparatus to reduce bank pressure using aggressive write merging
Methods, apparatus, systems and articles of manufacture to reduce bank pressure using aggressive write merging are disclosed. An example apparatus includes a first cache storage; a second cache storage; a store queue coupled to at least one of the first cache storage and the second cache storage and operable to: receive a first memory operation; process the first memory operation for storing the first set of data in at least one of the first cache storage and the second cache storage; receive a second memory operation; and prior to storing the first set of data in the at least one of the first cache storage and the second cache storage, merge the first memory operation and the second memory operation.
General-purpose systolic array
A systolic array cell is described, the cell including two general-purpose arithmetic logic units (ALUs) and register-file. A plurality of the cells may be configured in a matrix or array, such that the output of the first ALU in a first cell is provided to a second cell to the right of the first cell, and the output of the second ALU in the first cell is provided to a third cell below the first cell. The two ALUs in each cell of the array allow for processing of a different instruction in each cycle.
Methods and apparatus to facilitate write miss caching in cache system
Methods, apparatus, systems and articles of manufacture to facilitate write miss caching in cache system are disclosed. An example apparatus includes a first cache storage; a second cache storage, wherein the second cache storage includes a first portion operable to store a first set of data evicted from the first cache storage and a second portion; a cache controller coupled to the first cache storage and the second cache storage and operable to: receive a write operation; determine that the write operation produces a miss in the first cache storage; and in response to the miss in the first cache storage, provide write miss information associated with the write operation to the second cache storage for storing in the second portion.