G06F9/3804

Cumulative confidence fetch throttling

A method and apparatus are provided that use a fetching scheme for instructions in a processor to limit the power expended on the speculative execution of branch instructions. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. The method includes calculating a cumulative confidence measure based on one or more outstanding conditional branch instructions. The method also includes reducing prefetching operations in response to detecting that the cumulative confidence measure is below a first threshold level.
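The abstract does not say how per-branch confidences are combined; a rough sketch, assuming a product-of-confidences combination and an invented threshold name, might look like this:

```python
# Illustrative sketch only, not the patented implementation. Names such as
# `cumulative_confidence` and `LOW_CONFIDENCE_THRESHOLD`, and the choice to
# multiply per-branch confidences, are assumptions.

LOW_CONFIDENCE_THRESHOLD = 0.5

def cumulative_confidence(branch_confidences):
    """Product of per-branch confidences in [0, 1]: roughly, the chance
    that every outstanding conditional branch was predicted correctly."""
    result = 1.0
    for c in branch_confidences:
        result *= c
    return result

def prefetch_policy(branch_confidences):
    """Reduce prefetching when cumulative confidence drops below threshold."""
    if cumulative_confidence(branch_confidences) < LOW_CONFIDENCE_THRESHOLD:
        return "reduced"
    return "normal"
```

Note how the product form makes confidence fall quickly as speculative depth grows, which is what motivates throttling on deep, uncertain speculation.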

DATA PROCESSOR
20170344477 · 2017-11-30 ·

A data processor comprises a memory-management-unit for receiving external-operation-data from a CPU. The memory-management-unit sets a deterministic-quantity value for the external-operation-data based on the external-operation-data. The deterministic-quantity value may be either an active-value or an inactive-value. The data processor has a non-deterministic-processor-block for receiving a memory-signal from the memory-management-unit, and has a control-block configured to (i) send the memory-signal to an NDP-output-terminal if the deterministic-quantity value is the active-value, thereby bypassing a performance-enhancement-block, or (ii) send at least a portion of the memory-signal that is representative of the request for response-data to the performance-enhancement-block if the deterministic-quantity value is the inactive-value.
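The routing decision made by the control block can be sketched as follows; the string constants, dictionary-shaped memory signal, and function name are all invented for illustration:

```python
# Hypothetical sketch of the control-block routing: bypass the
# performance-enhancement block when the deterministic-quantity value is
# active, otherwise forward only the request-for-response-data portion.

ACTIVE, INACTIVE = "active", "inactive"

def route(memory_signal, deterministic_quantity):
    """memory_signal: dict with a 'request' portion (illustrative shape)."""
    if deterministic_quantity == ACTIVE:
        # Deterministic traffic goes straight to the NDP output terminal.
        return ("ndp_output_terminal", memory_signal)
    # Non-deterministic traffic: only the request portion is enhanced.
    return ("performance_enhancement_block", memory_signal.get("request"))
```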

Starting reading of instructions from a correct speculative condition prior to fully flushing an instruction pipeline after an incorrect instruction speculation determination
11675595 · 2023-06-13 ·

An apparatus includes instruction fetching circuitry configured to read a set of instructions, including a speculative execution instruction and a speculative condition determination instruction; cache the instructions; and read the speculative execution instruction corresponding to the speculative condition of the speculative condition determination instruction. If an execution result of the speculative condition determination instruction indicates that the speculative condition is incorrect, the instruction fetching circuitry clears the instructions it has cached. Instruction decoding circuitry decodes instructions, and executing circuitry executes instructions, including executing the speculative condition determination instruction to obtain the execution result. Instruction retiring circuitry caches instructions executed by the executing circuitry and, in response to an instruction older than the speculative condition determination instruction being retired, instructs the executing circuitry to clear its instructions and clears the instructions cached in the instruction retiring circuitry.
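The timing relationship here (refetch immediately, flush execute/retire state later) can be sketched with an invented pipeline model; class and method names are not from the patent:

```python
# Simplified sketch of the early-refetch idea: on a detected misprediction
# the fetch buffer is cleared and refilled from the correct path right away,
# while the execute/retire buffers are cleared only once an instruction
# older than the mispredicting one retires.

class Pipeline:
    def __init__(self):
        self.fetch_buf = []
        self.execute_buf = []
        self.retire_buf = []

    def on_misprediction(self, correct_path):
        # Start reading the correct path before the full flush completes.
        self.fetch_buf = list(correct_path)

    def on_retire_older(self):
        # An older instruction retired: now clear the downstream stages.
        self.execute_buf.clear()
        self.retire_buf.clear()
```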

Instruction cache behavior and branch prediction

Instruction cache behavior and branch prediction are used to improve the functionality of a computing device by profiling branching instructions in an instruction cache to identify likelihoods of proceeding to a plurality of targets from the branching instructions; identifying a hot path in the instruction cache based on the identified likelihoods; and rearranging the plurality of targets relative to one another and associated branching instructions so that a first branching instruction that has a higher likelihood of proceeding to a first hot target than to a first cold target and that previously flowed to the first cold target and jumped to the first hot target instead flows to the first hot target and jumps to the first cold target.
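The rearrangement described above amounts to inverting a branch whenever profiling shows it usually jumps rather than falls through. A minimal sketch, with invented field names and a dict-based representation:

```python
# Hypothetical hot-path layout pass: if a branch's hot target was reached
# by jumping (p_jump > 0.5), swap jump target and fall-through (inverting
# the condition) so the hot path flows instead of jumps.

def optimize_layout(branches):
    """branches: list of dicts with 'fallthrough', 'jump_target', and
    'p_jump', the profiled probability of taking the jump."""
    optimized = []
    for b in branches:
        if b["p_jump"] > 0.5:
            optimized.append({
                "fallthrough": b["jump_target"],   # hot target now flows
                "jump_target": b["fallthrough"],   # cold target now jumps
                "p_jump": 1.0 - b["p_jump"],
                "inverted": True,
            })
        else:
            optimized.append({**b, "inverted": False})
    return optimized
```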

SINGLE-THREAD SPECULATIVE MULTI-THREADING
20170337062 · 2017-11-23 ·

A processor includes a pipeline and control circuitry. The pipeline is configured to process instructions of program code and includes one or more fetch units. The control circuitry is configured to predict at run-time one or more future flow-control traces to be traversed in the program code, to define, based on the predicted flow-control traces, two or more regions of the program code from which instructions are to be fetched, wherein the number of regions is greater than the number of fetch units, and to instruct the pipeline to fetch instructions alternately from the two or more regions of the program code using the one or more fetch units, and to process the fetched instructions.
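Since there are more regions than fetch units, a single fetch unit must be time-multiplexed across them. One plausible schedule is simple round-robin, sketched below (the scheduling policy is an assumption; the abstract says only "alternately"):

```python
# Illustrative round-robin schedule for one fetch unit serving several
# predicted code regions; region names and the policy are assumptions.

def alternate_fetch(regions, cycles):
    """Return (cycle, region) pairs: the fetch unit visits each region
    in turn, wrapping around until `cycles` fetch slots are used."""
    schedule = []
    for cycle in range(cycles):
        schedule.append((cycle, regions[cycle % len(regions)]))
    return schedule
```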

VARIABLE-LENGTH INSTRUCTION BUFFER MANAGEMENT

A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.

Low latency fetch circuitry for compute kernels
11256510 · 2022-02-22 ·

Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
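The two-pointer scheme can be sketched as two passes over the buffer, with the fetch-parse pass resolving indirect items ahead of the execute-parse pass. The item encoding and the stand-in "fetch" are invented for illustration:

```python
# Minimal sketch of the dual-pointer parse. Items tagged "indirect:" stand
# in for indirect-data-access items; uppercasing stands in for the actual
# indirect fetch. Both conventions are assumptions, not from the patent.

def parse_stream(buffer):
    indirect_results = {}
    # Fetch-parse pointer runs ahead, detecting and resolving indirect items.
    for i, item in enumerate(buffer):
        if item.startswith("indirect:"):
            indirect_results[i] = item.split(":", 1)[1].upper()
    # Execute-parse pointer trails, emitting buffer data or indirect results.
    output = []
    for i, item in enumerate(buffer):
        output.append(indirect_results.get(i, item))
    return output
```

Decoupling the two pointers is what hides the indirect-fetch latency: by the time the execute-parse pointer reaches an indirect item, its result is already buffered.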

Processor testing

Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution.
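A toy generator for this cross-branching pattern might look as follows; the tuple-based instruction format and mnemonics are invented for the sketch:

```python
# Illustrative generator: two instruction portions whose branches cross
# into each other, with a counter increment emitted alongside each branch
# so execution coverage can be checked afterward.

import random

def generate_test(n, seed=0):
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        # Each branch in the first portion targets an instruction in the second.
        first.append(("inc", "counter"))
        first.append(("branch", "second", rng.randrange(n)))
        # Each branch in the second portion targets one in the first.
        second.append(("inc", "counter"))
        second.append(("branch", "first", rng.randrange(n)))
    return first, second
```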

Data cache system and method

A data cache system is provided. The system includes a central processing unit (CPU), a memory system, an instruction track table, a tracker and a data engine. The CPU is configured to execute instructions and read data. The memory system is configured to store the instructions and the data. The instruction track table is configured to store corresponding information of branch instructions stored in the memory system. The tracker is configured to point to a first data read instruction after an instruction currently being executed by the CPU. The data engine is configured to calculate a data address in advance before the CPU executes the data read instruction pointed to by the tracker. Further, the data engine is also configured to control the memory system to provide the corresponding data for the CPU based on the data address.
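The tracker/data-engine interplay can be sketched under simple assumptions (a list of tuples for the program, base-plus-offset addressing); none of the names below come from the patent:

```python
# Sketch: the tracker scans ahead of the current PC for the next data read
# instruction, and the data engine computes its address early so the memory
# system can stage the data before the CPU reaches that instruction.

def next_load(program, pc):
    """Tracker: index of the first data-read instruction after pc, or None."""
    for i in range(pc + 1, len(program)):
        if program[i][0] == "load":
            return i
    return None

def precompute_address(program, pc, regs):
    """Data engine: resolve base register + offset for the tracked load."""
    i = next_load(program, pc)
    if i is None:
        return None
    _, base, offset = program[i]
    return regs[base] + offset
```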

INSTRUCTION PREFETCHING
20170286116 · 2017-10-05 ·

A data processing apparatus has prefetch circuitry for prefetching instructions from a data store into an instruction queue. Branch prediction circuitry is provided for predicting outcomes of branch instructions and the prefetch circuitry may prefetch instructions subsequent to the branch based on the predicted outcome. Instruction identifying circuitry identifies whether a given instruction prefetched from the data store is a predetermined type of program flow altering instruction and if so then controls the prefetch circuitry to halt prefetching of subsequent instructions into the instruction queue.
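The halting behavior reduces to a loop that stops filling the instruction queue when a predetermined instruction type is identified. A minimal sketch, with the halting instruction types chosen purely for illustration:

```python
# Hedged sketch: prefetch instructions into a queue, but halt as soon as a
# predetermined type of program-flow-altering instruction is identified.
# Which types halt prefetching is an assumption here.

HALTING_TYPES = {"ret", "jmp_indirect"}

def prefetch(instructions):
    queue = []
    for insn in instructions:
        queue.append(insn)
        if insn in HALTING_TYPES:
            break  # halt prefetching of subsequent instructions
    return queue
```

Halting at such instructions avoids filling the queue with instructions that are unlikely to be on the actual program path past an unpredictable control-flow change.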