Patent classifications
G06F12/0835
Cache coherent acceleration function virtualization
The embodiments herein describe a virtualization framework for cache coherent accelerators where the framework incorporates a layered approach for accelerators in their interactions between a cache coherent protocol layer and the functions performed by the accelerator. In one embodiment, the virtualization framework includes a first layer containing the different instances of accelerator functions (AFs), a second layer containing accelerator function engines (AFE) in each of the AFs, and a third layer containing accelerator function threads (AFTs) in each of the AFEs. Partitioning the hardware circuitry using multiple layers in the virtualization framework allows the accelerator to be quickly re-provisioned in response to requests made by guest operation systems or virtual machines executing in a host. Further, using the layers to partition the hardware permits the host to re-provision sub-portions of the accelerator while the remaining portions of the accelerator continue to operate as normal.
CACHE COHERENCE VALIDATION USING DELAYED FULFILLMENT OF L2 REQUESTS
Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transferring the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
INTEGRATED CIRCUIT AND METHOD FOR EXECUTING CACHE MANAGEMENT OPERATION
An integrated circuit and a method for executing a cache management operation are provided. The integrated circuit includes a master interface, a slave interface, and a link. The link is connected between the master interface and the slave interface, and the link includes an A-channel, a B-channel, a C-channel, a D-channel, and an E-channel. The A-channel is configured to transmit a cache management operation message of the master interface to the slave interface, and the cache management operation message is configured to manage data consistency between different data caches. The D-channel is configured to transmit a cache management operation acknowledgement message of the slave interface to the master interface.
SYSTEMS AND METHODS FOR TRANSFORMING DATA IN-LINE WITH READS AND WRITES TO COHERENT HOST-MANAGED DEVICE MEMORY
The disclosed computer-implemented method may include (1) receiving, from an external host processor via a cache-coherent interconnect, a request to access a host address of a coherent memory space of the external host processor, (2) when the request is to read data from the host address, (a) performing an in-line transformation on the data to generate second data and (b) writing the second data to the physical address of the device-attached physical memory mapped to the host address, and (3) when the request is to read data from the host address, (a) reading the data from the physical address of the device-attached physical memory mapped to the host address, (b) performing a reversing in-line transformation on the data to generate second data, and (c) returning the second data to the external host processor via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.
Allocation of a Buffer Located in System Memory into a Cache Memory
A non-transitory computer-readable medium is disclosed, the medium having instructions stored thereon that are executable by a computer system to perform operations that may include allocating a plurality of storage locations in a system memory of the computer system to a buffer. The operations may further include selecting a particular order for allocating the plurality of storage locations into a cache memory circuit. This particular order may increase a uniformity of cache miss rates in comparison to a linear order. The operations may also include caching subsets of the plurality of storage locations of the buffer using the particular order.
LOW LATENCY HOST PROCESSOR TO COHERENT DEVICE INTERACTION
In a computer system, a processor and an I/O device controller communicate with each other via a coherence interconnect and according to a cache coherence protocol. Registers of the I/O device controllers are mapped to the cache coherent memory space to allow the processor to treat the registers as cacheable memory. As a result, latency of processor commands executed by the I/O device controller is decreased, and size of data stored in the I/O device controller that can be accessed by the processor is increased from the size of a single register to the size of an entire cache line.
I/O Agent
Techniques are disclosed relating to an I/O agent circuit of a computer system. The I/O agent circuit may receive, from a peripheral component, a set of transaction requests to perform a set of read transactions that are directed to one or more of a plurality of cache lines. The I/O agent circuit may issue, to a first memory controller circuit configured to manage access to a first one of the plurality of cache lines, a request for exclusive read ownership of the first cache line such that data of the first cache line is not cached outside of the memory and the I/O agent circuit in a valid state. The I/O agent circuit may receive exclusive read ownership of the first cache line, including receiving the data of the first cache line. The I/O agent circuit may then perform the set of read transactions with respect to the data.
INTERLEAVED CACHE PREFETCHING
A method includes receiving, at a direct memory access (DMA) controller of a memory device, a first command from a first cache controller coupled to the memory device to prefetch first data from the memory device and sending the prefetched first data, in response to receiving the first command, to a second cache controller coupled to the memory device. The method can further include receiving a second command from a second cache controller coupled to the memory device to prefetch second data from the memory device, and sending the prefetched second data, in response to receiving the second command, to a third cache controller coupled to the memory device
STARVATION MITIGATION FOR ASSOCIATIVE CACHE DESIGNS
Methods and apparatus for starvation mitigation for associative cache designs. A memory controller employs an associative cache to cache physical page addresses and logic to monitor a level of cache contention. When the contention reaches a critical level where QoS can’t be guaranteed, a backpressure mechanism is triggered by cache contention mitigation logic to prevent new memory access commands from a host from entering a command pipeline. The mitigation logic maintains the backpressure until the monitoring logic indicates that the contention has resolved. The levels of contention that triggers and releases the backpressure may be set using configurable control registers. A starvation counter is incremented when a cache slot cannot be allocated for a command and decremented when a replayed command is allocated a slot. A starvation count is evaluated to determine when backpressure should be triggered and released.
SYSTEMS AND METHODS FOR PRE-PROCESSING AND POST-PROCESSING COHERENT HOST-MANAGED DEVICE MEMORY
The disclosed computer-implemented method may include receiving, from a host via a cache-coherent interconnect, a request to access an address of a coherent memory space of the host. When the request is to write data, the computer-implemented method may include (1) performing, after receiving the data, a post-processing operation on the data to generate post-processed data and (2) writing the post-processed data to a physical address of a device-attached physical memory mapped to the address. When the request is to read data, the computer-implemented method may include (1) reading the data from the physical address of a device-attached physical memory mapped to the address, (2) performing, before responding to the request, a pre-processing operation on the data to generate pre-processed data, and (3) returning the pre-processed data to the external host via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.