G06F9/3842

Throttling Code Fetch For Speculative Code Paths

Methods and apparatus relating to throttling a code fetch for speculative code paths are described. In an embodiment, a first storage structure stores a reference to a code line in response to a request to be received from a cache. A second storage structure stores a reference to the code line in response to an update to an Instruction Dispatch Queue (IDQ). Logic circuitry controls additional code line fetch operations based at least in part on a comparison of a number of ongoing speculative code fetches and a determination that the code line is speculative. Other embodiments are also disclosed and claimed.
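
The throttling decision can be illustrated with a minimal Python sketch. It assumes the comparison is made against a fixed limit on in-flight speculative fetches; that limit, the class name `FetchThrottle`, and its method names are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class FetchThrottle:
    """Toy model: cap the number of in-flight speculative code line fetches."""
    max_speculative: int                          # hypothetical limit (assumption)
    in_flight: set = field(default_factory=set)   # speculative lines being fetched

    def may_fetch(self, line_addr: int, speculative: bool) -> bool:
        # Non-speculative lines are never throttled in this sketch.
        if not speculative:
            return True
        # Speculative line: allow it only while under the limit.
        if len(self.in_flight) >= self.max_speculative:
            return False
        self.in_flight.add(line_addr)
        return True

    def complete(self, line_addr: int) -> None:
        # The fetch finished, or the line is no longer speculative.
        self.in_flight.discard(line_addr)
```

In this model a third speculative fetch is refused while two are outstanding and a limit of two applies, and it is accepted again once `complete` releases one of them.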

PREDICTING UPCOMING CONTROL FLOW

An apparatus has a fetch queue to identify a sequence of instructions to be fetched for execution and prediction circuitry to predict upcoming control flow and to control which instructions are identified in the fetch queue in dependence on the prediction. The prediction circuitry predicts multi-taken sequences, which are sequences of instructions in which control flow is diverted by a first control flow changing instruction to a series of instructions terminating in a second control flow changing instruction that diverts control flow to a target address. The apparatus also has prediction confidence calculation circuitry to calculate confidence levels for respective multi-taken sequences. Each confidence level is indicative of a confidence in an accuracy of prediction of its respective multi-taken sequence. When the confidence level for a particular multi-taken sequence satisfies a prediction confidence condition, the prediction confidence calculation circuitry allows the particular multi-taken sequence to be predicted by the prediction circuitry. When the prediction circuitry predicts the particular multi-taken sequence, it causes the series of instructions and the target instruction for that sequence to be identified in the fetch queue, and causes further predictions to be made starting from the target address for that sequence.
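
The confidence gating can be sketched in Python as follows. The abstract does not say how confidence is computed; this sketch assumes a saturating counter per sequence that resets on a misprediction, and the class name, the threshold, and the saturation limit are all hypothetical.

```python
class MultiTakenConfidence:
    """Toy confidence tracker keyed by the address that starts a sequence."""

    def __init__(self, threshold: int = 2, maximum: int = 3):
        self.threshold = threshold   # hypothetical prediction confidence condition
        self.maximum = maximum       # saturation limit (assumption)
        self.counters = {}           # start address -> confidence level

    def update(self, start_addr: int, correct: bool) -> None:
        level = self.counters.get(start_addr, 0)
        if correct:
            level = min(level + 1, self.maximum)
        else:
            level = 0                # reset on a misprediction (assumption)
        self.counters[start_addr] = level

    def allowed(self, start_addr: int) -> bool:
        # The sequence may be predicted only once confidence is high enough.
        return self.counters.get(start_addr, 0) >= self.threshold
```

With the default threshold, a sequence becomes eligible for multi-taken prediction after two consecutive correct outcomes and loses eligibility on the next incorrect one.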

Multi-Cycle Scheduler with Speculative Picking of Micro-Operations
20230195517 · 2023-06-22 ·

A multi-cycle scheduler for a processor includes early wake circuitry, late wake circuitry, and picker circuitry. In a first cycle of a clock, the early wake circuitry speculatively identifies child micro-operations as ready whose dependencies are satisfied by a set of ready parent micro-operations. In a second cycle of the clock, the picker circuitry picks at least one of the child micro-operations identified as ready for issue to execution circuitry. In addition, the late wake circuitry blocks from issue at least one picked child micro-operation speculatively identified as ready upon determining that a respective parent micro-operation did not issue to execution circuitry.
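
The two-cycle flow can be modelled with three small Python functions. This is a sketch under stated assumptions: micro-operations are integer identifiers, `children` maps each child to the set of its parents, and the lowest-identifier-first pick policy and the function names are hypothetical.

```python
def early_wake(children: dict, ready_parents: set) -> set:
    """Cycle 1: speculatively mark children whose parents are all ready."""
    return {c for c, parents in children.items() if parents <= ready_parents}


def pick(ready_children: set, width: int) -> list:
    """Cycle 2: pick up to `width` ready children (lowest id first; assumption)."""
    return sorted(ready_children)[:width]


def late_wake(picked: list, children: dict, issued_parents: set) -> list:
    """Cycle 2: let through only picked children whose parents all issued."""
    return [c for c in picked if children[c] <= issued_parents]
```

For example, if parents 1 and 2 are ready in the first cycle but only parent 1 actually issues, a child of parent 2 is picked speculatively and then blocked by `late_wake`.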

Renaming with generation numbers

A processor including a register file having a plurality of registers, and configured for out-of-order instruction execution, further includes a renamer unit that produces generation numbers that are associated with register file addresses to provide a renamed version of a register that is temporally offset from an existing version of that register rather than assigning a non-programmer-visible physical register as the renamed register.
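
A minimal Python sketch of the idea, assuming a small fixed number of generations per register that wraps around. The class name `GenerationRenamer`, the generation count, and the wrap-around policy are hypothetical; a real design would also need to stall until the oldest generation has retired before reusing it, which this sketch omits.

```python
class GenerationRenamer:
    """Toy renamer: a renamed register is (register address, generation)."""

    def __init__(self, num_regs: int, generations: int = 4):
        self.generations = generations   # hypothetical versions per register
        self.current = [0] * num_regs    # latest generation of each register

    def rename_source(self, reg: int) -> tuple:
        # A read refers to the most recent version of the register.
        return (reg, self.current[reg])

    def rename_dest(self, reg: int) -> tuple:
        # A write creates the next version of the same register address
        # rather than allocating a separate physical register.
        self.current[reg] = (self.current[reg] + 1) % self.generations
        return (reg, self.current[reg])
```

Two successive writes to register 3 yield `(3, 1)` and `(3, 2)`, so an instruction that read `(3, 1)` in between still names the earlier version.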

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
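
The masking step can be sketched in Python with integer bit vectors. The abstract does not define the bit layout; this sketch assumes each memory instruction has a small identifier giving its relative order, that bit i of the store vector is set once instruction i has executed, and that bit i of the store mask is set when instruction i is a store. The function name `can_issue` is hypothetical.

```python
def can_issue(lsid: int, store_vector: int, store_mask: int) -> bool:
    """Return True if every store ordered before `lsid` has executed.

    store_vector: bit i set -> memory instruction i has executed (assumption).
    store_mask:   bit i set -> memory instruction i is a store (assumption).
    """
    older = (1 << lsid) - 1                            # identifiers below lsid
    pending_stores = store_mask & ~store_vector & older
    return pending_stores == 0
```

With a mask of `0b0101` (instructions 0 and 2 are stores), instruction 1 can issue once bit 0 of the store vector is set, while instruction 3 must also wait for bit 2.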

Restricted speculative execution mode to prevent observable side effects

Embodiments of methods and apparatuses for restricted speculative execution are disclosed. In an embodiment, a processor includes configuration storage, an execution circuit, and a controller. The configuration storage is to store an indicator to enable a restricted speculative execution mode of operation of the processor, wherein the processor is to restrict speculative execution when operating in restricted speculative execution mode. The execution circuit is to perform speculative execution. The controller is to restrict speculative execution by the execution circuit when the restricted speculative execution mode is enabled.

Instruction and Logic for Total Store Elimination
20170351516 · 2017-12-07 ·

A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.
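
The store-removal pass can be illustrated with a short Python sketch. The abstract does not define "redundant"; this sketch assumes a store is redundant when a later store overwrites the same address before any load reads it. The tuple encoding of instructions and the function name are hypothetical, and the sketch ignores region marking, faults, aliasing, and other observers of memory.

```python
def remove_redundant_stores(stream: list) -> list:
    """Drop a store when a later store overwrites the same address first."""
    keep = [True] * len(stream)
    last_store = {}                              # address -> index of unread store
    for i, (op, addr, *_) in enumerate(stream):
        if op == "store":
            if addr in last_store:
                keep[last_store[addr]] = False   # earlier store was never read
            last_store[addr] = i
        elif op == "load":
            last_store.pop(addr, None)           # the earlier store is observed
    return [ins for ins, k in zip(stream, keep) if k]
```

Given the stream `("store", 0x10, 1)`, `("store", 0x10, 2)`, `("load", 0x10)`, `("store", 0x10, 3)`, only the first store is removed, because the second one is read by the load before being overwritten.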

Apparatus and method with value prediction for load operation
11513966 · 2022-11-29 ·

An apparatus has processing circuitry, load tracking circuitry and value prediction circuitry. In response to an actual value of first target data becoming available for a value-predicted load operation, it is determined whether the actual value matches the predicted value of the first target data determined by the value prediction circuitry, and whether the tracking information indicates that, for a given younger load operation issued before the actual value of the first target data was available, there is a risk of second target data associated with that given load operation having changed after having been loaded. Independent of whether the addresses of the value-predicted load operation and younger load operation correspond, at least the given load operation is re-processed when the value prediction is correct and the tracking information indicates there is a risk of the second target data having changed after being loaded. This protects against ordering violations.

APPARATUS AND METHOD WITH PREDICTION FOR LOAD OPERATION
20230185573 · 2023-06-15 ·

An apparatus has processing circuitry, load tracking circuitry and load prediction circuitry. A load operation may be speculatively issued, bypassing an older operation, based on a prediction determined by the load prediction circuitry. It is determined whether tracking information indicates that there is a risk of target data, corresponding to an address of the speculatively-issued load operation, having changed between the target data being loaded for the speculatively-issued load operation and data being loaded for a given older load operation bypassed by the speculatively-issued load operation. If so, independent of whether the addresses of the speculatively-issued load operation and the given older load operation correspond, at least the speculatively-issued load operation is reissued, even when the prediction is correct. This protects against ordering violations.

METHOD AND APPARATUS FOR MAINTAINING DATA COHERENCE IN A NON-UNIFORM COMPUTE DEVICE

A data processing apparatus includes one or more host processors with first processing units, one or more caches with second processing units, a non-cache memory having a third processing unit, and a reorder buffer operable to maintain data order during execution of a program of instructions. An instruction scheduler routes instructions to the processing units. Data coherence is maintained by control logic that blocks processing units other than a selected processing unit from accessing data locations in use by the selected processing unit until data associated with the data locations are released from the reorder buffer. Data stored in the cache is written to the memory if it is already in a modified state; otherwise the state is set to the modified state. A memory controller may be used to restrict access to memory locations to be operated on.