G06F2212/301

Methods and apparatus for allocation in a victim cache system

Methods, apparatus, systems and articles of manufacture are disclosed for allocation in a victim cache system. An example apparatus includes a first cache storage, a second cache storage, a cache controller coupled to the first cache storage and the second cache storage and operable to receive a memory operation that specifies an address, determine, based on the address, that the memory operation evicts a first set of data from the first cache storage, determine that the first set of data is unmodified relative to an extended memory, and cause the first set of data to be stored in the second cache storage.

MECHANISM FOR ISSUING REQUESTS TO AN ACCELERATOR FROM MULTIPLE THREADS
20200218568 · 2020-07-09 ·

An apparatus is described having multiple cores, each core having: a) a CPU; b) an accelerator; and, c) a controller and a plurality of order buffers coupled between the CPU and the accelerator. Each of the order buffers is dedicated to a different one of the CPU's threads. Each one of the order buffers is to hold one or more requests issued to the accelerator from its corresponding thread. The controller is to control issuance of the order buffers' respective requests to the accelerator.

Coprocessor Operation Bundling
20200218540 · 2020-07-09 ·

In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor (on average) at least matches the rate at which the coprocessor consumes the instructions, in an embodiment.

METHODS AND APPARATUS TO REDUCE BANK PRESSURE USING AGGRESSIVE WRITE MERGING

Methods, apparatus, systems and articles of manufacture to reduce bank pressure using aggressive write merging are disclosed. An example apparatus includes a first cache storage; a second cache storage; a store queue coupled to at least one of the first cache storage and the second cache storage and operable to: receive a first memory operation; process the first memory operation for storing the first set of data in at least one of the first cache storage and the second cache storage; receive a second memory operation; and prior to storing the first set of data in the at least one of the first cache storage and the second cache storage, merge the first memory operation and the second memory operation.

Dynamic acceleration of data processor operations using data-flow analysis

A method and apparatus are provided for dynamically determining when an operation, specified by one or more instructions in a data processing system, is suitable for accelerated execution. Data indicators are maintained, for data registers of the system, that indicate when data-flow from a register derives from a restricted source. In addition, instruction predicates are provided for instructions to indicate which instructions are capable of accelerated execution. From the data indicators and the instruction predicates, the microarchitecture of the data processing system determines, dynamically, when the operation is a thread-restricted function and suitable for accelerated execution in a hardware accelerator. The thread-restricted function may be executed on a hardware processor, such as a vector, neuromorphic or other processor.

MANAGING LOW-LEVEL INSTRUCTIONS AND CORE INTERACTIONS IN MULTI-CORE PROCESSORS

Managing the messages associated with memory pages stored in a main memory includes: receiving a message from outside the pipeline, and providing at least one low-level instruction to the pipeline for performing an operation indicated by the received message. Executing instructions in the pipeline includes: executing a series of low-level instructions in the pipeline, where the series of low-level instructions includes a first (second) set of low-level instructions converted from a first (second) high-level instruction. The second high-level instruction occurs after the first high-level instruction within a series of high-level instructions, and delaying insertion of the low-level instruction provided for performing the operation into an insertion position within the series of low-level instructions, where the delaying causes the insertion position to be between a final low-level instruction converted from the first high-level instruction and an initial low-level instruction converted from the second high-level instruction.

PREFETCH KILL AND REVIVAL IN AN INSTRUCTION CACHE

A system comprises a processor including a CPU core, first and second memory caches, and a memory controller subsystem. The memory controller subsystem speculatively determines a hit or miss condition of a virtual address in the first memory cache and speculatively translates the virtual address to a physical address. Associated with the hit or miss condition and the physical address, the memory controller subsystem configures a status to a valid state. Responsive to receipt of a first indication from the CPU core that no program instructions associated with the virtual address are needed, the memory controller subsystem reconfigures the status to an invalid state and, responsive to receipt of a second indication from the CPU core that a program instruction associated with the virtual address is needed, the memory controller subsystem reconfigures the status back to a valid state.

ATOMIC COMPARE AND SWAP IN A COHERENT CACHE SYSTEM

Methods, apparatus, systems and articles of manufacture to facilitate atomic compare and swap in cache for a coherent level 1 data cache system are disclosed. An example system includes a cache storage; a cache controller coupled to the cache storage wherein the cache controller is operable to: receive a memory operation that specifies a key, a memory address, and a first set of data; retrieve a second set of data corresponding to the memory address; compare the second set of data to the key; based on the second set of data corresponding to the key, cause the first set of data to be stored at the memory address; and based on the second set of data not corresponding to the key, complete the memory operation without causing the first set of data to be stored at the memory address.

VECTOR PROCESSOR STORAGE

A method comprising: receiving, at a vector processor, a request to store data; performing, by the vector processor, one or more transforms on the data; and directly instructing, by the vector processor, one or more storage device to store the data; wherein performing one or more transforms on the data comprises: erasure encoding the data to generate n data fragments configured such that any k of the data fragments are usable to regenerate the data, where k is less than n; and wherein directly instructing one or more storage device to store the data comprises: directly instructing the one or more storage devices to store the plurality of data fragments.

PERSISTENT STORAGE DEVICE MANAGEMENT

A method comprising: receiving a request to write data at a virtual location; writing the data to a physical location on a persistent storage device; and recording a mapping from the virtual location to the physical location; wherein the physical location corresponds to a next free block in a sequence of blocks on the persistent storage device.