Patent classifications
G06F2212/6028
RANGE PREFETCH INSTRUCTION
In response to an instruction decoder decoding a range prefetch instruction specifying first and second address-range-specifying parameters and a stride parameter, prefetch circuitry controls, depending on the first and second address-range-specifying parameters and the stride parameter, prefetching of data from a plurality of specified ranges of addresses into the at least one cache. A start address and size of each specified range is dependent on the first and second address-range-specifying parameters. The stride parameter specifies an offset between start addresses of successive specified ranges. Use of the range prefetch instruction helps to improve programmability and improve the balance between prefetch coverage and circuit area of the prefetch circuitry.
DISPLAY ENGINE INITIATED PREFETCH TO SYSTEM CACHE TO TOLERATE MEMORY LONG BLACKOUT
A disclosed technique includes prefetching display data into a cache memory, wherein the display data includes data to be displayed on a display during a memory black-out period for a memory; triggering the memory black-out period; and during the black-out period, reading from the cache memory to obtain data to be displayed on the display.
COORDINATING DYNAMIC POWER SCALING OF AGENTS BASED ON POWER CORRELATIONS OF AGENT INSTRUCTIONS
Coordinating dynamic power scaling of agents based on power correlations of agent instructions is disclosed. A global power controller determines a first local power quantifier of a first agent executing an agent instruction of a task of a workload. The global power controller stores a correlation between the first agent executing the agent instruction and a second local power quantifier corresponding to a second agent. The global power controller subsequently determines that the first agent is executing or will execute the agent instruction. The global power controller accesses the correlation associated with the first agent executing the agent instruction and sends to the second agent a proposed power level based on the correlation.
MAPPING PARTITION IDENTIFIERS
An apparatus includes processing circuitry configured that performs data processing in response to instructions of one of a plurality of software execution environments. First stage partition identifier remapping circuitry remaps a partition identifier specified for a memory transaction by a first software execution environment to a internal partition identifier to be specified with the memory transaction issued to at least one memory system component. In response to a memory transaction to be handled, the at least one memory system component controls allocation of resources for handling the memory transaction or manage contention for the resources in dependence on a selected set of memory system component parameters selected in dependence on the internal partition identifier specified by the memory transaction. Second stage partition identifier remapping circuitry dynamically overrides the internal partition identifier to be specified with the memory transaction based on a sideband input signal and the first stage partition identifier remapping circuitry indicates, for the partition identifier, whether the second stage partition identifier remapping circuitry is to be used.
SENDING A REQUEST TO AGENTS COUPLED TO AN INTERCONNECT
An apparatus comprises an interconnect providing communication paths between agents coupled to the interconnect. A coordination agent is provided which performs an operation requiring sending a request to each of a plurality of target agents, and receiving a response from each of the target agents, the operation being unable to complete until the response has been received from each of the target agents. Storage circuitry is provided which is accessible to the coordination agent and configured to store, for each agent that the coordination agent may communicate with via the interconnect, a latency indication for communication between that agent and the coordination agent. The coordination agent is configured, prior to performing the operation, to determine a sending order in which to send the request to each of the target agents, the sending order being determined in dependence on the latency indication for each of the target agents.
LINK STACK BASED INSTRUCTION PREFETCH AUGMENTATION
A computer-implemented method of performing a link stack based prefetch augmentation using a sequential prefetching includes observing a call instruction in a program being executed, and pushing a return address onto a link stack for processing the next instruction. A stream of instructions is prefetched starting from a cached line address of the next instruction and is stored in an instruction cache.
Dedicated interface for coupling flash memory and dynamic random access memory
The present application describes embodiments of an interface for coupling flash memory and dynamic random access memory (DRAM) in a processing system. Some embodiments include a dedicated interface between a flash memory and DRAM. The dedicated interface is to provide access to the flash memory in response to instructions received over a DRAM interface between the DRAM and a processing device. Some embodiments of a method include accessing a flash memory via a dedicated interface between the flash memory and a dynamic random access memory (DRAM) in response to an instruction received over a DRAM interface between the DRAM and a processing device.
Gateway pull model
A computer system comprising: (i) a computer subsystem configured to act as a work accelerator, and (ii) a gateway connected to the computer subsystem, the gateway enabling the transfer of data to the computer subsystem from external storage at pre-compiled data exchange synchronization points attained by the computer subsystem, which act as a barrier between a compute phase and an exchange phase of the computer subsystem, wherein the computer subsystem is configured to pull data from a gateway transfer memory of the gateway in response to the pre-compiled data exchange synchronization point attained by the subsystem, wherein the gateway comprises at least one processor configured to perform at least one operation to pre-load at least some of the data from a first memory of the gateway to the gateway transfer memory in advance of the pre-compiled data exchange synchronization point attained by the subsystem.
SYSTEMS, METHODS, AND APPARATUS FOR TRANSFERRING DATA BETWEEN INTERCONNECTED DEVICES
A method for transferring data may include writing, from a producing device, data to a storage device through an interconnect, determining a consumer device for the data, prefetching the data from the storage device, and transferring, based on the determining, the data to the consumer device through the interconnect. The method may further comprise receiving, at a prefetcher for the storage device, an indication of a relationship between the producing device and the consumer device, and determining the consumer device based on the indication. The method may further comprise placing the data in a stream at the storage device based on the relationship between the producing device and the consumer device. The indication may be provided by an application associated with the consumer device. Receiving the indication may include receiving the indication through a coherent memory protocol for the interconnect.
Data prefetching method and terminal device
A data prefetching method and a terminal device are provided. The CPU core cluster is configured to deliver a data access request to a first cache of the at least one level of cache, where the data access request carries a first address, and the first address is an address of data that the CPU core cluster currently needs to access in the memory. The prefetcher in the terminal device provided in embodiments of this application may generate a prefetch-from address, and load data corresponding to the generated prefetch-from address to the first cache. When needing to access the data, the CPU core cluster can read from the first cache, without a need to read from the memory. This helps increase an operating rate of the CPU core cluster.