Patent classifications
G06F9/322
PERFORMANCE OPTIMIZATION OF CLOSE CODE
Methods and systems described herein utilize a jump table in directly-addressable, near code, to facilitate improved execution of frequent calls to executable code from other workloads outside of the near code. By executing a directly-addressable call and jump instruction to access frequently-accessed executable code, indirect call instructions are avoided.
Methods and apparatus to insert profiling instructions into a graphics processing unit kernel
Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes instructions, and at least one processor to execute the instructions to determine whether a GPU supports modification of entry point addresses, detect a first entry point address and a second entry point address of an original GPU kernel, create a corresponding instrumented GPU kernel from the original GPU kernel based on the determination by inserting at least one of first profiling initialization instructions or first jump instructions at the first entry point address of the instrumented GPU kernel, inserting at least one of second profiling initialization instructions or second jump instructions at the second entry point address of the instrumented GPU kernel, and inserting profiling measurement instructions into the instrumented GPU kernel.
Fetch stage handling of indirect jumps in a processor pipeline
Systems and methods are disclosed for fetch stage handling of indirect jumps in a processor pipeline. For example, a method includes detecting a sequence of instructions fetched by a processor core, wherein the sequence of instructions includes a first instruction, with a result that depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction; responsive to detection of the sequence of instructions, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and, responsive to detection of the sequence of instructions, determining a target address for the second instruction before the first instruction is issued to an execution stage of a pipeline of the processor core.
FETCH STAGE HANDLING OF INDIRECT JUMPS IN A PROCESSOR PIPELINE
Systems and methods are disclosed for fetch stage handling of indirect jumps in a processor pipeline. For example, a method includes detecting a sequence of instructions fetched by a processor core, wherein the sequence of instructions includes a first instruction, with a result that depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction; responsive to detection of the sequence of instructions, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and, responsive to detection of the sequence of instructions, determining a target address for the second instruction before the first instruction is issued to an execution stage of a pipeline of the processor core.
Processor including debug unit and debug system
The present disclosure discloses a debug unit, comprising: a write register configured to store kernel write data written by a kernel of a processor, wherein the processor is communicatively coupled to a debugger configured to read the kernel write data, wherein the kernel write data is associated with a kernel write flag bit to indicate data validity of the kernel write data; and a control unit including circuitry configured to control access to the write register by the kernel of the processor and the debugger based on data validity indicated by the kernel write flag bit. The present disclosure further discloses a corresponding processor including the debug unit, a corresponding debugger communicatively coupled to the processor, and a corresponding debug system including the processor coupled to the debugger.
ENFORCING DATA PLACEMENT REQUIREMENTS VIA ADDRESS BIT SWAPPING
Enforcing data placement requirements via address bit swapping, including: receiving an instruction comprising a first memory address associated with a first address bit mapping; generating a remapped instruction by rearranging a plurality of bits of the first memory address according to a second address bit mapping; and issuing the remapped instruction to memory.
Determining branch targets for guest branch instructions executed in native address space
A microprocessor implemented method is disclosed. The method includes mapping a plurality of instructions in a guest address space to corresponding instructions in a native address space. The method further includes, for each of one or more guest branch instructions in said native address space fetched during execution, performing the following: determining a youngest prior guest branch target stored in a guest branch target register, determining a branch target for a respective guest branch instruction by adding an offset value for said respective guest branch instruction to said youngest prior guest branch target, where said offset value is adjusted to account for a difference in address in said guest address space between an instruction at a beginning of a guest instruction block and a branch instruction in said guest instruction block. The method further includes creating an entry in said guest branch target register for said branch target.
Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory
Different processor architectures are described to evaluate and track dependencies required by instructions. The processors may hold or queue instructions that require output of other instructions until required data and resources are available which may remove the requirement of NOPs in the instruction memory to resolve dependencies and pipeline hazards. The processor may divide instruction data into bundles for parallel execution and provide speculative execution. The processor may include various components to implement an evaluation unit, execution unit and termination unit.
Input/output data transformations when emulating non-traced code with a recorded execution of traced code
Transforming input data to enable execution of second executable code using trace data gathered during execution of first executable code. A trace of an execution of the first code is accessed. The trace stores data of an input that was consumed by first executable instructions of the first code. It is determined that the stored data of the input is usable as an input to second executable instructions of the second code. A difference in size/format of the stored data as used by the first instructions, compared to an input size/format expected by the second executable instructions, is identified. Based on the identified difference, a data transformation is determined that would enable the second instructions to consume the stored data. Execution of the second instructions is emulated using the stored data, including projecting the data transformation to enable the second instructions to consume the stored data.
Fetch stage handling of indirect jumps in a processor pipeline
Systems and methods are disclosed for fetch stage handling of indirect jumps in a processor pipeline. For example, a method includes detecting a sequence of instructions fetched by a processor core, wherein the sequence of instructions includes a first instruction, with a result that depends on an immediate field of the first instruction and a program counter value, followed by a second instruction that is an indirect jump instruction; responsive to detection of the sequence of instructions, preventing an indirect jump target predictor circuit from generating a target address prediction for the second instruction; and, responsive to detection of the sequence of instructions, determining a target address for the second instruction before the first instruction is issued to an execution stage of a pipeline of the processor core.