Patent classifications
G06F9/3808
Run-Time Code Parallelization with Monitoring of Repetitive Instruction Sequences During Branch Mis-Prediction
A processor includes an execution pipeline and monitoring circuitry. The execution pipeline is configured to execute instructions of program code. The monitoring circuitry is configured to monitor the instructions in a segment of a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions, to parallelize execution of the repetitive sequence based on the specification, and to terminate monitoring of the instructions and discard the specification in response to detecting a branch mis-prediction in the monitored instructions.
PROVIDING REFERENCES TO PREVIOUSLY DECODED INSTRUCTIONS OF RECENTLY-PROVIDED INSTRUCTIONS TO BE EXECUTED BY A PROCESSOR
Providing references to previously decoded instructions of recently-provided instructions to be executed by a processor is disclosed herein. In one aspect, a low resource micro-operation controller is provided. Responsive to an instruction pipeline receiving an instruction address, the low resource micro-operation controller is configured to determine if the received instruction address corresponds to an instruction address in a short history table. The short history table includes instruction addresses of recently-provided instructions having micro-ops in a post-decode queue. If the received instruction address corresponds to an instruction address in the short history table, the low resource micro-operation controller is configured to provide a reference (e.g., a pointer) to the fetch stage that corresponds to an entry in the post-decode queue in which the micro-ops corresponding to the instruction address are stored. Responsive to the decode stage receiving the reference, the low resource micro-operation controller is configured to provide the micro-ops from the post-decode queue for execution.
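The lookup described in this abstract can be sketched in software as follows. This is only an illustrative model, not the patented circuit: the names `ShortHistoryTable`, `fetch`, and `decode`, the least-recently-used eviction policy, and the use of a growing Python list as the post-decode queue are all assumptions made for the example. Reclaiming queue entries is not modeled.

```python
from collections import OrderedDict


class ShortHistoryTable:
    """Hypothetical model: maps recent instruction addresses to the
    index of the post-decode queue entry that holds their micro-ops."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> post-decode queue index

    def lookup(self, address):
        # Returns the queue index (the "reference"), or None on a miss.
        return self.entries.get(address)

    def record(self, address, queue_index):
        self.entries[address] = queue_index
        self.entries.move_to_end(address)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # assumed policy: drop oldest


def fetch(address, table, post_decode_queue, decode):
    """Return (micro_ops, hit). On a hit, decoding is skipped and the
    micro-ops are read from the post-decode queue via the reference."""
    ref = table.lookup(address)
    if ref is not None:
        return post_decode_queue[ref], True
    micro_ops = decode(address)  # slow path: decode the instruction
    post_decode_queue.append(micro_ops)
    table.record(address, len(post_decode_queue) - 1)
    return micro_ops, False
```

In this sketch, a second fetch of the same address returns the stored micro-ops without calling `decode` again, which is the saving the abstract describes.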
Processor with memory-embedded pipeline for table-driven computation
A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.
Instruction sequence buffer to store branches having reliably predictable instruction sequences
A method for outputting reliably predictable instruction sequences. The method includes tracking repetitive hits to determine a set of frequently hit instruction sequences for a microprocessor, and out of that set, identifying a branch instruction having a series of subsequent frequently executed branch instructions that form a reliably predictable instruction sequence. The reliably predictable instruction sequence is stored into a buffer. On a subsequent hit to the branch instruction, the reliably predictable instruction sequence is output from the buffer.
Branch-History Mode Trace Encoder
A trace encoder may be connected to a processor core. The trace encoder may be configured to maintain a count of branches that are consecutively taken when executed by the processor core and/or a count of branches that are consecutively not-taken when executed by the processor core. The trace encoder may be configured to send a message including the count.
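As a rough illustration of the counting described here, the sketch below collapses a stream of branch outcomes into run-length messages. The function name `encode_branch_history` and the `(taken, count)` message shape are assumptions for the example; the abstract does not specify a message format.

```python
def encode_branch_history(outcomes):
    """Collapse branch outcomes (True = taken, False = not taken) into
    a list of (taken, count) messages, one per consecutive run."""
    messages = []
    current = None  # outcome of the run being counted
    count = 0
    for taken in outcomes:
        if count and taken == current:
            count += 1
        else:
            if count:
                messages.append((current, count))  # emit the finished run
            current, count = taken, 1
    if count:
        messages.append((current, count))  # emit the final run
    return messages
```

Long runs of identically resolved branches, such as loop back-edges, shrink to a single message in this model.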
SPECULATIVE MULTI-THREADING TRACE PREDICTION
A method for trace prediction includes using trace prediction to predict a trace specifying branch decisions. When a branch misprediction is detected, trace prediction is terminated and prediction is continued using branch prediction.
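The fallback behavior in this abstract can be sketched as a small state machine. The class name `HybridPredictor`, the representation of a trace as a list of booleans, and the `branch_predict` callback are hypothetical choices for illustration; treating an exhausted trace as a fallback case is also an assumption.

```python
class HybridPredictor:
    """Hypothetical model: follow a predicted trace of branch decisions
    until a misprediction, then continue with per-branch prediction."""

    def __init__(self, trace, branch_predict):
        self.trace = list(trace)  # predicted decisions, True = taken
        self.position = 0
        self.use_trace = True
        self.branch_predict = branch_predict  # fallback: pc -> bool

    def predict(self, pc):
        if self.use_trace and self.position < len(self.trace):
            return self.trace[self.position]
        return self.branch_predict(pc)

    def resolve(self, predicted, actual):
        """Report the actual outcome of the most recent prediction."""
        if not self.use_trace:
            return
        if predicted != actual:
            self.use_trace = False  # terminate trace prediction
        else:
            self.position += 1
```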
Power saving by reusing results of identical micro-operations
A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by a processing pipeline would give the same result as an earlier micro-operation. If so, then the current micro-operation is passed through the processing pipeline, with at least one pipeline stage passed by the current micro-operation being placed in a power saving state during a processing cycle in which the current micro-operation is at that pipeline stage. The result of the earlier micro-operation is then output as a result of said current micro-operation. This allows power consumption to be reduced by not repeating the same computation.
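A software analogy for this result reuse is a lookup keyed on the operation and its operands. The sketch below is only that analogy: the class name `ReuseTable`, the unbounded dictionary, and the `compute` callback are assumptions, and power gating of pipeline stages is represented only by skipping the computation.

```python
class ReuseTable:
    """Hypothetical model: remember results of earlier micro-operations
    and return them when an identical micro-operation appears again."""

    def __init__(self):
        self.results = {}   # (opcode, operands) -> result
        self.computed = 0   # micro-ops that actually ran
        self.reused = 0     # micro-ops served from an earlier result

    def execute(self, opcode, operands, compute):
        key = (opcode, tuple(operands))
        if key in self.results:
            self.reused += 1  # the compute stage would be idle here
            return self.results[key]
        self.computed += 1
        result = compute(*operands)
        self.results[key] = result
        return result
```

This model assumes the micro-operation is deterministic in its operands; operations with side effects or hidden state would not qualify for reuse.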
Flushing in a microprocessor with multi-step ahead branch predictor and a fetch target queue
A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The FTQ stores at least an instruction address whose branch prediction has been finished by the branch predictor. The instruction address queued in the FTQ is to be read out later as an instruction-fetching address for the instruction cache. The instruction address that is input into the branch predictor and used for branch prediction leads the instruction-fetching address.
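The decoupling described here can be modeled as a bounded first-in, first-out queue between a producer (the branch predictor) and a consumer (the instruction cache). The class and method names below are hypothetical, and the `flush` method is an assumption drawn from the title rather than from the abstract text.

```python
from collections import deque


class FetchTargetQueue:
    """Hypothetical model of an FTQ: the predictor pushes addresses it
    has finished predicting; the instruction cache pops them later."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.addresses = deque()

    def push_predicted(self, address):
        # Returns False when full, meaning the predictor would stall.
        if len(self.addresses) >= self.capacity:
            return False
        self.addresses.append(address)
        return True

    def pop_for_fetch(self):
        # Oldest predicted address becomes the instruction-fetching
        # address; None means the cache has caught up with the predictor.
        return self.addresses.popleft() if self.addresses else None

    def flush(self):
        # Assumed behavior: discard all queued addresses on a flush.
        self.addresses.clear()
```

Because addresses are pushed before they are popped, the predictor's input address runs ahead of the fetch address by up to `capacity` entries in this model.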
REUSING FETCHED, FLUSHED INSTRUCTIONS AFTER AN INSTRUCTION PIPELINE FLUSH IN RESPONSE TO A HAZARD IN A PROCESSOR TO REDUCE INSTRUCTION RE-FETCHING
Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching is disclosed. An instruction processing circuit is configured to detect fetched performance degrading instructions (PDIs) in a pre-execution stage in an instruction pipeline that may cause a precise interrupt that would cause flushing of the instruction pipeline. In response to detecting a PDI in an instruction pipeline, the instruction processing circuit is configured to capture the fetched PDI and/or its successor, younger fetched instructions that are processed in the instruction pipeline behind the PDI, in a pipeline refill circuit. If a later execution of the PDI in the instruction pipeline causes a flush of the instruction pipeline, the instruction processing circuit can inject the fetched PDI and/or its younger instructions previously captured from the pipeline refill circuit into the instruction pipeline to be processed without such instructions being re-fetched.
Reconfigurable Parallel Processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.