G06F2212/27

PERFORMING STORAGE-FREE INSTRUCTION CACHE HIT PREDICTION IN A PROCESSOR

Performing storage-free instruction cache hit prediction is disclosed herein. In some aspects, a processor comprises an instruction cache hit prediction circuit that is configured to detect that a first access by a branch predictor circuit to a branch target buffer (BTB) for a first instruction in an instruction stream results in a miss on the BTB. In response to detecting the miss, the instruction cache hit prediction circuit is further configured to generate a first instruction cache prefetch request for the first instruction. The instruction cache hit prediction circuit is also configured to transmit the first instruction cache prefetch request to a prefetcher circuit.

SELECTION OF VARIABLE MEMORY-ACCESS SIZE

A method for dynamically selecting a size of a memory access may be provided. The method comprises accessing blocks having a variable number of consecutive cache lines, maintaining a vector with entries of past utilizations for each block size, and adapting said block size before a next access to the blocks.

Processing Node, Computer System, and Transaction Conflict Detection Method
20180373634 · 2018-12-27 ·

A processing node, a computer system, and a transaction conflict detection method, where the processing node includes a processor and a transactional cache. When obtaining a first operation instruction in a transaction for accessing shared data, the processor accesses the transactional cache for caching shared data of a transaction processed by the processing node. If the transactional cache determines that the first operation instruction fails to hit a cache line in the transactional cache, the transactional cache sends a first destination address in the operation instruction to a transactional cache in another processing node. After receiving status information of a cache line hit by the first destination address from the other processing node, the transactional cache determines, based on the received status information, whether the first operation instruction conflicts with a second operation instruction executed by the other processing node.

DESCRIPTOR CACHE EVICTION FOR MULTI-QUEUE DIRECT MEMORY ACCESS
20240330191 · 2024-10-03 · ·

Evicting queues from a memory of a direct memory access system includes monitoring a global eviction timer. From a plurality of descriptor lists stored in a plurality of entries of a cache memory, a set of candidate descriptor lists is determined. The set of candidate descriptor lists includes one or more of the plurality of descriptor lists in a prefetch only state. An eviction event can be detected by detecting a first eviction condition including a state of the global eviction timer and a second eviction condition. In response to detecting the eviction event, a descriptor list from the set of candidate descriptor lists is selected for eviction. The selected descriptor list can be evicted from the cache memory.

PROCESSOR WITH SELECTIVE DATA STORAGE OPERABLE AS EITHER VICTIM CACHE DATA STORAGE OR ACCELERATOR MEMORY AND HAVING VICTIM CACHE TAGS IN LOWER LEVEL CACHE
20180267898 · 2018-09-20 ·

A first data storage holds cache lines, an accelerator has a second data storage that selectively holds accelerator data and cache lines evicted from the first data storage, a tag directory holds tags for cache lines stored in the first and second data storages, and a mode indicator indicates whether the second data storage is operating in a first or second mode in which it respectively holds cache lines evicted from the first data storage or accelerator data. In response to a request to evict a cache line from the first data storage, in the first mode the control logic writes the cache line to the second data storage and updates a tag in the tag directory to indicate the cache line is present in the second data storage, and in the second mode the control logic instead writes the cache line to a system memory.

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20180225217 · 2018-08-09 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20180225218 · 2018-08-09 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

Microprocessor that prevents same address load-load ordering violations using physical address proxies

A L2 set associative cache that is inclusive of an L1 cache. Each entry of a load queue holds a load physical address proxy (PAP) for a load physical memory line address (PMLA) rather than the load PMLA itself. The load PAP comprises the set index and the way that uniquely identifies the L2 entry that holds a memory line specified by the load PMLA. Each load queue entry indicates whether the load instruction has completed execution. The microprocessor removes a memory line at a removal PMLA from an L2 entry and forms a removal PAP as a proxy for the removal PMLA. The removal PAP comprises a set index and a way that uniquely identifies the removed entry. The microprocessor snoops the load queue with the removal PAP to determine whether the removal PAP matches one or more load PAPs in one or more load queue entries associated with one or more load instructions that have completed execution and, if so, signals an abort request.

NO-LOCALITY HINT VECTOR MEMORY ACCESS PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20170300420 · 2017-10-19 · ·

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

MEMORY FOR A NEURAL NETWORK PROCESSING SYSTEM
20250045120 · 2025-02-06 · ·

A method may include partitioning a tensor that includes a plurality of data values into a number (K) of subtensors, where each of the K subtensors may include a respective subset of the plurality of data values. The method may also include retrieving one or more first data values of the subset of data values included in a first subtensor of the K subtensors in accordance with an access pattern associated with a neural network processor. The method may include storing the one or more first data values of the subset of data values in one of K segments of cache memory, where each of the K segments may be associated with a respective one of the K subtensors. Further, the method may include processing, using the neural network processor, the one or more first data values of the subset of data values in accordance with the access pattern.