Patent classifications
G06F9/3832
INSTRUCTION SET ARCHITECTURE BASED AND AUTOMATIC LOAD TRACKING FOR OPPORTUNISTIC RE-STEER OF DATA-DEPENDENT FLAKY BRANCHES
Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described. In one embodiment, a processor includes a pipeline circuit comprising a decoder to decode instructions into decoded instructions and an execution circuit to execute the decoded instructions, a branch predictor circuit to generate a predicted path for a branch instruction, and a branch re-steer circuit to, for the branch instruction dependent on a result from a load instruction, check if an instruction received by the pipeline circuit is the load instruction, and when the instruction received by the pipeline circuit is the load instruction, check for a write back of the result from the load instruction between a decode of the branch instruction with the decoder and an execution of the branch instruction with the execution circuit, and when the predicted path differs from a path based on the result from the load instruction, re-steer the branch instruction in the pipeline circuit to the path and cause execution of the branch instruction for the path based on the result from the load instruction.
AN APPARATUS AND METHOD FOR PREDICTING SOURCE OPERAND VALUES AND OPTIMIZED PROCESSING OF INSTRUCTIONS
An apparatus and method are provided for processing instructions. The apparatus has execution circuitry for executing instructions, where each instruction requires an associated operation to be performed using one or more source operand values in order to produce a result value. Issue circuitry is used to maintain a record of pending instructions awaiting execution by the execution circuitry, and prediction circuitry is used to produce a predicted source operand value for a chosen pending instruction. Optimisation circuitry is then arranged to detect an optimisation condition for the chosen pending instruction when the predicted source operand value is such that, having regard to the associated operation for the chosen pending instruction, the result value is known without performing the associated operation. In response to detection of the optimisation condition, an optimisation operation is implemented instead of causing the execution circuitry to perform the associated operation in order to execute the chosen pending instruction. This can lead to significant performance and/or power consumption improvements.
APPARATUS AND METHOD FOR PERFORMING BRANCH PREDICTION
An apparatus has processing circuitry for executing instructions and fetch circuitry for fetching the instructions for execution. When a branch instruction is encountered by the fetch circuitry, it determines subsequent instructions to be fetched in dependence on an initial branch direction prediction for the branch instruction made by branch prediction circuitry. Value prediction circuitry is used to maintain a predicted result value for one or more instructions, and dispatch circuitry maintains a record of pending instructions that have been fetched by the fetch circuitry and are awaiting execution by the processing circuitry, and selects pending instructions from the record for dispatch to the processing circuitry. When a given instruction whose predicted result value is maintained by the value prediction circuitry has a dependent instruction whose outcome is dependent on a result value of the given instruction, the dispatch circuitry nay be arranged to enable speculative execution of that dependent instruction using the predicted result value of the given instruction. Analysis circuitry is arranged, when the dependent instruction is the branch instruction, to detect a mispredict condition when an additional branch direction prediction for the branch instruction determined using the predicted result value for the given instruction is considered more accurate that the initial branch direction prediction, and the additional branch direction prediction differs to the initial branch direction prediction. On detection of the mispredict condition, a control signal is issued to indicate that the branch instruction has been mispredicted.
Enabling removal and reconstruction of flag operations in a processor
In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.
DELIVERING IMMEDIATE VALUES BY USING PROGRAM COUNTER (PC)-RELATIVE LOAD INSTRUCTIONS TO FETCH LITERAL DATA IN PROCESSOR-BASED DEVICES
Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer), The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.
Check pointing of accumulator register results in a microprocessor
A computer system, processor, and method for processing information is disclosed that includes at least one processor having a main register file, the main register file having a plurality of entries for storing data; one or more execution units including a dense math execution unit; and at least one accumulator register file, the at least one accumulator register file associated with the dense math execution unit. The processor in an embodiment is configured to process data in the dense math execution unit where the results of the dense math execution unit are written to a first group of one or more accumulator register file entries, and after a checkpoint boundary is crossed based upon, for example, the number “N” of instructions dispatched after the start of the checkpoint, the results of the dense math execution unit are written to a second group of one or more accumulator register file entries.
History table management for a correlated prefetcher
Memory prefetching in a processor comprises: identifying, in response to memory access instructions, a pattern of addresses; and determining, based on the pattern of addresses, an address to prefetch. Determining the address to prefetch comprises: determining, using the pattern of addresses, an index into a history table; retrieving, from the history table and using the index, an offset value, wherein the offset value is not the address to prefetch; and determining the address to prefetch using the offset value and at least one address of the pattern of addresses. The method further comprises prefetching the address to prefetch.
CHECK POINTING OF ACCUMULATOR REGISTER RESULTS IN A MICROPROCESSOR
A computer system, processor, and method for processing information is disclosed that includes at least one processor having a main register file, the main register file having a plurality of entries for storing data; one or more execution units including a dense math execution unit; and at least one accumulator register file, the at least one accumulator register file associated with the dense math execution unit. The processor in an embodiment is configured to process data in the dense math execution unit where the results of the dense math execution unit are written to a first group of one or more accumulator register file entries, and after a checkpoint boundary is crossed based upon, for example, the number “N” of instructions dispatched after the start of the checkpoint, the results of the dense math execution unit are written to a second group of one or more accumulator register file entries.
TECHNIQUES FOR IMPROVING OPERAND CACHING
A technique for determining whether a register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache is provided. The technique includes executing an instruction that accesses an operand that comprises the register value, performing one or both of a lookahead technique and a prediction technique to determine whether the register value should be written to an operand cache or whether the register value should remain in and not be evicted from the operand cache, and based on the determining, updating the operand cache.
SHARED POINTER FOR LOCAL HISTORY RECORDS USED BY PREDICTION CIRCUITRY
An apparatus has processing circuitry, and history storage circuitry to store local history records. Each local history record corresponds to a respective subset of instruction addresses and tracks a sequence of observed instruction behaviour observed for successive instances of instructions having addresses in that subset. Pointer storage circuitry to store a shared pointer shared between the local history records. The shared pointer indicates a common storage position reached in each local history record. Prediction circuitry determines predicted instruction behaviour for a given instruction address based on a selected portion of a selected local history record stored in the history storage circuitry. The prediction circuitry selects the selected local history record based on the given instruction address and selects the selected portion based on the shared pointer.