G06F9/3802

SYSTEMS AND METHODS FOR PERFORMING NEURAL NETWORK OPERATIONS
20230205533 · 2023-06-29 ·

A method for retrieving neural network coefficients may include executing neural network operations and storing, in at least one data memory, one or more intermediate results of the neural network operations. The method may also include retrieving, in an iterative manner, subsets of neural network coefficients related to a particular layer of a neural network associated with at least one of the neural network processors. Different ones of the neural network processors may use at least one of the subsets of the neural network coefficients. The retrieving the subsets of neural network coefficients may include caching the subsets in coefficient cache memory. At least some of the subsets may be cached in the coefficient cache memory for up to a first duration, and at least some of the intermediate results may be stored in the at least one data memory for a duration that exceeds the first duration.

NEXT LINE PREFETCHERS EMPLOYING INITIAL HIGH PREFETCH PREDICTION CONFIDENCE STATES FOR THROTTLING NEXT LINE PREFETCHES IN A PROCESSOR-BASED SYSTEM
20170371790 · 2017-12-28 ·

Next line prefetchers employing initial high prefetch prediction confidence states for throttling next line prefetches in processor-based system are disclosed. Next line prefetcher prefetches a next memory line into cache memory in response to read operation. To mitigate prefetch mispredictions, next line prefetcher is throttled to cease prefetching after prefetch prediction confidence state becomes a no next line prefetch state indicating number of incorrect predictions. Instead of initial prefetch prediction confidence state being set to no next line prefetch state, which is built up in response to correct predictions before performing a next line prefetch, initial prefetch prediction confidence state is set to next line prefetch state to allow next line prefetching. Thus, next line prefetcher starts prefetching next lines before requiring correct predictions to be “built up” in prefetch prediction confidence state. CPU performance may be increased, because prefetching begins sooner rather than waiting for correct predictions to occur.

EFFECTIVENESS AND PRIORITIZATION OF PREFETECHES

A method, system, and computer program product are provided for prioritizing prefetch instructions. The method includes a processor issuing a prefetch instruction and fetching elements from a cache that can include a memory or a higher level cache. The processor stores the elements in temporary storage and monitors for accesses by an instruction. The processor stores a record representing the prefetch instruction. The processor updates the record with an indicator and issues a new prefetch instruction by comparing the new prefetch instruction to the record, based on the new prefetch instruction matching the prefetch instruction, assigning the indicator to the new prefetch instruction as a priority value, based on the new prefetch instruction not matching the prefetch instruction, assigning a default value to the new prefetch instruction as the priority value, and determining whether to execute the new prefetch instruction, based on the priority value of the new prefetch instruction.

METHODS AND APPARATUS FOR DECODING PROGRAM INSTRUCTIONS

Aspects of the present disclosure relate to an apparatus comprising fetch circuitry. The fetch circuitry comprises a pointer-based fetch queue for queuing processing instructions retrieved from a storage, and pointer storage for storing a pointer identifying a current fetch queue element. The apparatus comprises decode circuitry having a plurality of decode units, and fetch queue extraction circuitry to, based on the pointer, extract the content of a plurality of elements of the fetch queue; apply combinatorial logic to speculatively produce, from the content of said fetch queue entries, a plurality of speculative potential instructions; and transmit each speculative potential instruction to a corresponding one of said decode units. Each decode unit is configured to decode the corresponding speculative potential instruction. The instruction extraction circuitry is configured to extract a subset of said plurality of speculative potential instructions, and transmit said determined subset to pipeline component circuitry.

Split-level history buffer in a computer processing unit

A split level history buffer in a central processing unit is provided. A first instruction and a second instruction are fetched, tagged, and the first instruction is stored an entry of a register file. The first instruction is evicted from the entry and the second instruction is stored in the entry. If the first instruction is evicted, then the first instruction is stored in a first portion of a history buffer. If a result for the first instruction is generated, then the first instruction is moved to a second portion of the history buffer and the result is stored with the first instruction in the second portion of the history buffer. If it is determined that a third instruction evicts the second instruction from the entry, then the second instruction is stored in the first portion of the history buffer.

Way predictor and enable logic for instruction tightly-coupled memory and instruction cache
11687342 · 2023-06-27 · ·

Disclosed herein are systems and method for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, and cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure(s) needs to be enabled for the next instruction fetch.

Livelock Recovery Circuit
20170364363 · 2017-12-21 ·

Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.

Livelock Detection in a Hardware Design Using Formal Evaluation Logic
20170364609 · 2017-12-21 ·

A hardware monitor arranged to detect livelock in a hardware design for an integrated circuit. The hardware monitor includes monitor and detection logic configured to detect when a particular state has occurred in an instantiation of the hardware design; and assertion evaluation logic configured to periodically evaluate one or more assertions that assert a formal property related to reoccurrence of the particular state in the instantiation of the hardware design to detect whether the instantiation of the hardware design is in a livelock comprising the predetermined state. The hardware monitor may be used by a formal verification tool to exhaustively verify that the instantiation of the hardware design cannot enter a livelock comprising the predetermined state.

Fetching Instructions in an Instruction Fetch Unit

A method in an instruction fetch unit configured to initiate a fetch of an instruction bundle from a first memory and to initiate a fetch of an instruction bundle from a second memory, wherein a fetch from the second memory takes a predetermined fixed plurality of processor cycles, the method comprising: identifying that an instruction bundle is to be selected for fetching from the second memory in a predetermined future processor cycle; and initiating a fetch of the identified instruction bundle from the second memory a number of processor cycles prior to the predetermined future processor cycle based upon the predetermined fixed plurality of processor cycles taken to fetch from the second memory

Method and system for hard ware-assisted pre-execution

One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.