Patent classifications
G06F13/4013
Executing memory requests out of order
An on-chip cache is described which receives memory requests and in the event of a cache miss, the cache generates memory requests to a lower level in the memory hierarchy (e.g. to a lower level cache or an external memory). Data returned to the on-chip cache in response to the generated memory requests may be received out-of-order. An instruction scheduler in the on-chip cache stores pending received memory requests and effects the re-ordering by selecting a sequence of pending memory requests for execution such that pending requests relating to an identical cache line are executed in age order and pending requests relating to different cache lines are executed in an order dependent upon when data relating to the different cache lines is returned. The memory requests which are received may be received from another, lower level on-chip cache or from registers.
Executing Memory Requests Out of Order
An on-chip cache is described which receives memory requests and in the event of a cache miss, the cache generates memory requests to a lower level in the memory hierarchy (e.g. to a lower level cache or an external memory). Data returned to the on-chip cache in response to the generated memory requests may be received out-of-order. An instruction scheduler in the on-chip cache stores pending received memory requests and effects the re-ordering by selecting a sequence of pending memory requests for execution such that pending requests relating to an identical cache line are executed in age order and pending requests relating to different cache lines are executed in an order dependent upon when data relating to the different cache lines is returned. The memory requests which are received may be received from another, lower level on-chip cache or from registers.
Data processing apparatus and method for handling stalled data
There is provided a data processing apparatus and method. The data processing apparatus comprises a plurality of processing elements connected via a network arranged on a single chip to form a spatial architecture. Each processing element comprising processing circuitry to perform processing operations and memory control circuitry to perform data transfer operations and to issue data transfer requests for requested data to the network. The memory control circuitry is configured to monitor the network to retrieve the requested data from the network. Each processing element is further provided with local storage circuitry comprising a plurality of local storage sectors to store data associated with the processing operations, and auxiliary memory control circuitry to monitor the network to detect stalled data (S60). The auxiliary memory control circuitry is configured to transfer the stalled data from the network to an auxiliary storage buffer (S66) dynamically selected from amongst the plurality of local storage sectors (S64).
Adaptive reordering technique for efficient flit packaging and performance optimizations
A system comprising an interface between a host and a device, wherein the interface is configured to reorder messages to package flits to reduce or eliminate underutilized bandwidth in one or both directions of a bidirectional link. In one example, the interface is in accordance with the CXL specification, and the host and the device (e.g., a memory device) include CXL-compliant controllers to pack and unpack flits.