Patent classifications
G06F9/3832
Distributed history buffer flush and restore handling in a parallel slice design
An approach is provided in which a computing system captures content included in a history buffer entry that corresponds to a flush ITAG. The computing system, in turn, uses an execution unit to transmit the content over a results bus to multiple registers and restore at least one of the registers accordingly.
Pipelining out-of-order instructions
Systems, methods and computer program product provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file.
Power saving by reusing results of identical micro-operations
A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by a processing pipeline would give the same result as an earlier micro-operation. If so, then the current micro-operation is passed through the processing pipeline, with at least one pipeline stage passed by the current micro-operation being placed in a power saving state during a processing cycle in which the current micro-operation is at that pipeline stage. The result of the earlier micro-operation is then output as a result of said current micro-operation. This allows power consumption to be reduced by not repeating the same computation.
METHOD AND APPARATUS FOR A PAGE-LOCAL DELTA-BASED PREFETCHER
A method includes recording a first set of consecutive memory access deltas, where each of the consecutive memory access deltas represents a difference between two memory addresses accessed by an application, updating values in a prefetch training table based on the first set of memory access deltas, and predicting one or more memory addresses for prefetching responsive to a second set of consecutive memory access deltas and based on values in the prefetch training table.
Arithmetic processing device and control method implemented by arithmetic processing device
An arithmetic processing device includes: a decoding circuit configured to decode a command; a command execution circuit configured to execute the command decoded by the decoding circuit; a register circuit configured to include a plurality of registers for holding data used by the command execution circuit; an identification information holding circuit configured to store identification information for identifying a register for writing a specific value when the command is a register writing command; a setting circuit configured to hold the specific value; and an operation control circuit configured to execute inhibiting processing when the command is a register reading command, the inhibiting processing including inhibiting an access of the register by the register reading command and selecting the specific value held in the setting circuit.
Speculative buffer for speculative memory accesses with entries tagged with execution context identifiers
An apparatus comprises processing circuitry to execute instructions from one or more of a plurality of execution contexts each associated with a respective execution context identifier; a cache; and a speculative buffer. Control circuitry controls allocation of data to the cache and the speculative buffer. A speculative entry, for which allocation is caused by a speculative memory access associated with a given execution context, is allocated to the speculative buffer instead of to the cache while the speculatively executed memory access instruction remains speculative. The speculative entry specifies, as a tagged execution context identifier, the execution context identifier associated with the given execution context. Presence of the speculative entry in the speculative buffer is prevented from being observable to execution contexts other than the execution context identified by the tagged execution context identifier.
Methods and apparatus for handling processor load instructions
Aspects of the present disclosure relate to an apparatus comprising decode circuitry to receive an instruction and identify the received instruction as a load instruction, and prediction circuitry to predict, based on a prediction scheme, a target address of the load instruction, and trigger a speculative memory access in respect of the predicted target address.
PREDICTION USING INSTRUCTION CORRELATION
A data processing apparatus is provided, which is able to provide predictions for hard to predict instructions. Prediction circuitry generates predictions relating to predictable instructions in a stream, where the prediction circuitry comprises storage circuitry to store, in respect of each of the predictable instructions, a reference to a set of monitored instructions in the stream to be used for generating predictions for the predictable instructions. Processing circuitry receives the predictions from the prediction circuitry and executes the predictable instructions in the stream using the predictions. Programmable instruction correlation parameter storage circuitry stores a given correlation parameter between a given predictable instruction in the stream and a subset of the set of monitored instructions of the given predictable instruction, to assist the prediction circuitry in generating the predictions. If the programmable instruction correlation parameter storage circuitry is currently storing the given correlation parameter, the prediction circuitry generates a given prediction relating to the given predictable instruction based on the subset of the set of monitored instructions indicated in the programmable instruction correlation parameter storage circuitry. Otherwise the prediction circuitry generates the given prediction relating to the given predictable instruction based on the set of monitored instructions indicated in the storage circuitry.
OPPORTUNISTIC CONSUMER INSTRUCTION STEERING BASED ON PRODUCER INSTRUCTION VALUE PREDICTION IN A MULTI-CLUSTER PROCESSOR
Opportunistic consumer instruction steering based on producer instruction value prediction in a multi-cluster processor is disclosed. A processor provides producer instructions and consumer instructions to a steering circuit that steers the program instructions to clusters of instruction execution circuits. An input value provided to a consumer instruction may be a produced value of a producer instruction, creating a dependency. The steering circuit steers a producer instruction to a first cluster and, in response to receiving the consumer instruction and the predicted value of the producer instruction, provides the predicted value to at least a second cluster and steers the consumer instruction to the second cluster for execution with the predicted value as the input value. A consumer instruction can be executed in a different cluster than a producer instruction without a cluster-to-cluster latency penalty, which allows the instruction loads to be better balanced among the clusters for higher processor throughput.
Instruction address based data prediction and prefetching
Provided is a method, computer program product, and system for performing data address prediction. The method comprises receiving a first instruction for execution by a processor. A load address predictor (LAP) accesses a LAP table entry for a section of an instruction cache. The section is associated with a plurality of instructions that includes the first instruction. The LAP predicts a set of data addresses that will be loaded using the LAP table entry. The method further comprises sending a recommendation to prefetch the set of data addresses to a load-store unit (LSU).