Patent classifications
G06F9/323
Load Dependent Branch Prediction
Load dependent branch prediction is described. In accordance with described techniques, a load dependent branch instruction is detected by identifying that a destination location of a load instruction is used in an operation for determining whether a conditional branch is taken or not taken. The load instruction is included in a sequence of load instructions having addresses separated by a step size. An instruction is injected in an instruction stream of a processor for fetching data of a future load instruction using an address of the load instruction offset by a distance based on the step size. An additional instruction is injected in the instruction stream of the processor for precomputing an outcome of a load dependent branch using an address computed based on an address of the operation and the data of the future load instruction.
Instruction Set Architecture and Microarchitecture for Early Pipeline Re-steering Using Load Address Prediction to Mitigate Branch Misprediction Penalties
Methods and apparatus relating to Instruction Set Architecture (ISA) and/or microarchitecture for early pipeline re-steering using load address prediction to mitigate branch misprediction penalties are described. In an embodiment, decode circuitry decodes a load instruction and Load Address Predictor (LAP) circuitry issues a load prefetch request to memory for data for a load operation of the load instruction. Compute circuitry executes an outcome for a branch operation of the load instruction based on the data from the load prefetch request. And re-steering circuitry transmits a signal to cause flushing of data associated with the load instruction in response to a mismatch between the outcome for the branch operation and a stored prediction value for the branch. Other embodiments are also disclosed and claimed.
VARIABLE LENGTH INSTRUCTION PROCESSOR SYSTEM AND METHOD
A variable length instruction processor system and method is provided. Before a processor core executes an instruction, the system and method applied in a processor field convert the instruction into micro-operation(s)and the micro-operation(s) can be filled into a cache system that can be directly accessed by a processor core, reducing the depth of a pipeline and improving efficiency of the pipeline.
DETECTING THAT A SEQUENCE OF INSTRUCTIONS CREATES AN AFFILIATED RELATIONSHIP
Detecting that a sequence of instructions creates an affiliated relationship. A determination is made that a sequence of instructions creates an affiliated relationship. Based on determining that the sequence of instructions creates the affiliated relationship, a sequence of operations is generated. The sequence of operations provides a predicted target address to be included in a selected register and to be used in branching.
DETECTING THAT A SEQUENCE OF INSTRUCTIONS CREATES AN AFFILIATED RELATIONSHIP
Detecting that a sequence of instructions creates an affiliated relationship. A determination is made that a sequence of instructions creates an affiliated relationship. Based on determining that the sequence of instructions creates the affiliated relationship, a sequence of operations is generated. The sequence of operations provides a predicted target address to be included in a selected register and to be used in branching.
Intermodal calling branch instruction
Processing circuitry has a handler mode and a thread mode. In response to an exception condition, a switch to handler mode is made. In response to an intermodal calling branch instruction specifying a branch target address when the processing circuitry is in the handler mode, an instruction decoder controls the processing circuitry to save a function return address to a function return address storage location; switch a current mode of the processing circuitry to the thread mode; and branch to an instruction identified by the branch target address. This can be useful for deprivileging of exceptions.
Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks
According to one embodiment, an occurrence of an instruction is fetched. The instruction's format specifies its only source operand from a single vector write mask register, and specifies as its destination a single general purpose register. In addition, the instruction's format includes a first field whose contents selects the single vector write mask register, and includes a second field whose contents selects the single general purpose register. The source operand is a write mask including a plurality of one bit vector write mask elements that correspond to different multi-bit data element positions within architectural vector registers. The method also includes, responsive to executing the single occurrence of the single instruction, storing data in the single general purpose register such that its contents represent either a first or second scalar constant based on whether the plurality of one bit vector write mask elements in the source operand are all zero.
Variable length instruction processor system and method
A variable length instruction processor system and method is provided. Before a processor core executes an instruction, the system and method applied in a processor field convert the instruction into micro-operation(s) and the micro-operation(s) can be filled into a cache system that can be directly accessed by a processor core, reducing the depth of a pipeline and improving efficiency of the pipeline.
Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line and a cache line located at the target of the first far taken branch instruction is read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target is fetched in a single cycle.
Data processing system and method for executing block call and block return instructions
A data processing system includes a control register, a program counter and a controller. The control register is used to store a level status of an execution flow and at least one return address. When the controller reads a block call instruction while a level status of the execution flow has an initial value, the controller stores a return address of the block call instruction in the control register, increments a value of the level status, and redirects the execution flow to a target address indicated by the block call instruction. When the controller reads a block return instruction and the value of the level status is not equal to the initial value, the controller decrements the value of the level status. If the value of the level status becomes equal to the initial value, the controller redirects the execution flow to the return address.