Patent classifications
G06F9/40
Operation parameter control based upon queued instruction characteristics
This follows a data processing system comprising multiple GPUs includes instruction queue circuitry storing data specifying program instructions for threads awaiting issue for execution. Instruction characterization circuitry determines one or more characteristics of the program instructions awaiting issue within the instructional queue circuitry and supplies this to operating parameter control circuitry. The operating parameter control circuitry alters one or more operating parameters of the system in response to the one or more characteristics of the program instructions awaiting issue.
Banked physical register data flow architecture in out-of-order processors
In a method of executing instructions in a processing system, respective global age tags are assigned to each of the one or more instructions fetched for processing by the processing system. Each global age tag indicates an age of the corresponding instruction in the processing system. Respective physical registers in a physical register file are allocated to each destination logical register referenced by each instruction. The respective global age tags are written to the in respective physical registers allocated to the destination logical registers of the instructions. The instructions are executed by the processing system. At least some of the instructions are executed in an order different from a program order of the instructions.
Buffering instructions of a single branch, backwards short loop within a virtual loop buffer
A method and system for instruction fetching within a processor instruction unit, utilizing a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. During instruction fetch, modified instruction buffers coupled to an instruction cache (I-cache) temporarily store instructions from a single branch, backwards short loop. The modified instruction buffers may be a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. The instruction fetch within the instruction unit of a processor retrieves the instructions for the short loop from the modified buffers during the loop cycles of the single branch, backwards short loop, rather than from the instruction cache.
IT instruction pre-decode
Various techniques for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.
MFENCE and LFENCE micro-architectural implementation method and system
A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
Dynamic rename based register reconfiguration of a vector register file
Reconfiguring a register file using a rename table having a plurality of fields that indicate fracture information about a source register of an instruction for instructions which have narrow to wide dependencies.
System and method of data processing
A data processing apparatus, a data processing method and a computer program product are disclosed. In an embodiment, the data processing apparatus comprises: a processor comprising a plurality of parallel lanes for parallel processing of sets of threads, each lane comprising a plurality of pipelined stages, the pipelined stages of each lane being operable to process instructions from the sets of threads; and scheduling logic operable to schedule instructions for processing by the lanes, the scheduling logic being operable to identify that one of the sets of threads being processed is to be split into a plurality of sub-sets of threads and to schedule at least two of the plurality of sub-sets of threads for processing by different pipelined stages concurrently.
Zero cycle move
A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
Low-miss-rate and low-miss-penalty cache system and method
A method for assisting operations of a processor core coupled to a first memory and a second memory includes: examining instructions being filled from the first memory to the second memory to extract instruction information containing at least branch information of the instructions, and creating a plurality of tracks based on the extracted instruction information. Further, the method includes filling one or more instructions from the first memory to the second memory based on one or more tracks from the plurality of tracks before the processor core starts executing the instructions, such that the processor core fetches the instructions from the second memory for execution. Filling the instructions further includes pre-fetching from the first memory to the second memory instruction segments containing the instructions corresponding to at least two levels of branch target instructions based on the one or more tracks.
Dynamic thread sharing in branch prediction structures
Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.