Patent classifications
G06F9/30058
POWER MANAGEMENT OF BRANCH PREDICTORS IN A COMPUTER PROCESSOR
A computer processor includes a branch prediction unit that includes a local branch predictor and one or more global branch predictors. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.
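The counter-and-threshold steps in this abstract can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the default threshold, the choice to power down one predictor at a time, and the counter reset after a power-down are all hypothetical and are not taken from the abstract.

```python
class BranchPredictionUnit:
    """Toy model: one local predictor plus several global predictors."""

    def __init__(self, num_global=2, threshold=3):
        self.threshold = threshold            # hypothetical predetermined threshold
        self.local_wins = 0                   # counter of "local chosen over global"
        self.global_powered = [True] * num_global

    def record_selection(self, chose_local):
        """Call once per branch, after the unit picks which prediction to use."""
        if not chose_local:
            return
        self.local_wins += 1
        if self.local_wins > self.threshold:
            # Power down the first global predictor that is still on.
            for i, powered in enumerate(self.global_powered):
                if powered:
                    self.global_powered[i] = False
                    break
            # Assumption: restart the count after each power-down.
            self.local_wins = 0

    def active_global_predictors(self):
        """Indices of global predictors that are not bypassed."""
        return [i for i, powered in enumerate(self.global_powered) if powered]
```

With `threshold=3`, the fourth branch that favors the local prediction pushes the counter past the threshold, and later predictions consult only the predictors returned by `active_global_predictors()`.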
DISTANCE-BASED BRANCH PREDICTION AND DETECTION
Examples of techniques for distance-based branch prediction are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes: determining, by a processing system, a potential return instruction address (IA) by determining whether a relationship is satisfied between a first target IA and a first branch IA; storing a second branch IA as a return when a target IA of a second branch matches a potential return IA for the second branch; and applying the potential return IA for the second branch as a predicted target IA of a predicted branch IA stored as a return.
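A minimal sketch of the three steps follows. The abstract does not state what the "relationship" between target IA and branch IA is, so this sketch assumes a simple distance threshold and assumes the potential return IA is the next sequential address after the branch. The class name, `min_distance`, `instr_len`, and the single-entry (non-stack) storage are hypothetical.

```python
class DistanceReturnPredictor:
    """Toy distance-based return detector (single pending return address)."""

    def __init__(self, min_distance=64, instr_len=4):
        self.min_distance = min_distance   # assumed relationship: |target - branch| >= this
        self.instr_len = instr_len         # assumed fixed instruction length in bytes
        self.potential_return = None       # potential return IA, if any
        self.return_branches = set()       # branch IAs learned to be returns

    def observe(self, branch_ia, target_ia):
        """Update state from a resolved taken branch."""
        if self.potential_return is not None and target_ia == self.potential_return:
            # The target matches the pending return IA: store this branch as a return.
            self.return_branches.add(branch_ia)
            self.potential_return = None
        elif abs(target_ia - branch_ia) >= self.min_distance:
            # A far jump looks like a call: remember where it would return to.
            self.potential_return = branch_ia + self.instr_len

    def predict_target(self, branch_ia):
        """Predicted target for a branch stored as a return, else None."""
        if branch_ia in self.return_branches:
            return self.potential_return
        return None
```

After a far branch from `0x100` and a later branch at `0x1010` that lands on `0x104`, the branch at `0x1010` is treated as a return; a subsequent far branch from `0x200` makes `predict_target(0x1010)` yield `0x204`.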
ZERO-OVERHEAD LOOP IN AN EMBEDDED DIGITAL SIGNAL PROCESSOR
A decoding logic method is arranged to execute a zero-overhead loop in an embedded digital signal processor (DSP). In the method, instruction data is fetched from a memory, and a plurality of instruction tokens, which are derived from the instruction data, are stored in a token buffer. A first portion of one or more instruction tokens from the token buffer is passed to a first decode module, which may be an instruction decode module, and a second portion of the one or more instruction tokens from the token buffer is passed to a second decode module, which may be a loop decode module. The second decode module detects a special loop instruction token, and based on the detection of the special loop instruction token, a loop counter is conditionally tested. Using the first decode module, at least one instruction token of an iterative algorithm is assembled into a single instruction, which is executable in a single execution cycle. Based on the conditional test of the loop counter, the first decode module further assembles a loop branch instruction of the iterative algorithm into the single instruction executable in one execution cycle.
Enhanced protection of processors from a buffer overflow attack
A method changes a processor instruction randomly, covertly, and uniquely, in such a way that the reverse process can restore it faithfully to its original form. Because it is virtually impossible for a malicious user to know how the bits are changed, they cannot use a buffer overflow attack to write code with the same processor instruction changes into the processor's memory with the goal of taking control of the processor. The changes are reversed just before an instruction is executed, reverting it to its original value. Malicious code placed in memory is therefore randomly altered by that reversal, so that when the processor executes it, it produces chaotic, random behavior that does not allow control of the processor to be compromised. Eventually it produces a processing error that causes the processor either to shut down and reload the software process where the code exists, or to reset.
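The abstract does not say how the bits are changed. As a stand-in only, the sketch below uses XOR with a secret per-processor key, which is one simple reversible transform and not necessarily the patented one. The 32-bit word size and all function names are assumptions made for illustration.

```python
import secrets

WORD_BITS = 32                      # assumed instruction width
MASK = (1 << WORD_BITS) - 1


def make_key():
    """Pick a secret random key (hypothetical per-processor secret)."""
    return secrets.randbits(WORD_BITS)


def scramble(word, key):
    """Applied when trusted code is loaded into memory."""
    return (word ^ key) & MASK


def unscramble(word, key):
    """Applied just before execution; XOR is its own inverse."""
    return (word ^ key) & MASK
```

Trusted instructions survive the round trip, since `unscramble(scramble(w, key), key) == w`. An instruction injected through a buffer overflow was never scrambled, so the pre-execution step turns it into `w ^ key`, which differs from `w` for any nonzero key and is unpredictable to an attacker who does not know the key.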
Hardware and software solutions to divergent branches in a parallel pipeline
A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergence point within an identified loop, a compiler arranges instructions within the identified loop into very long instruction words (VLIWs). At least one VLIW includes instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The compiler generates code that, when executed, assigns instructions within a given VLIW at runtime to multiple parallel execution lanes within a target processor. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on the branch direction found at runtime for the given lane at the given divergence point. The target processor includes a vector register for storing indications of which instruction within a fetched VLIW an associated lane is to execute.
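The per-lane selection can be illustrated with a small software model. This is a sketch, not the patented hardware: the VLIW is modeled as a tuple of Python callables, the vector register as a list of indices, and the function names and the example branch (an even/odd test) are hypothetical.

```python
def run_vliw(vliw, lane_select, lane_values):
    """Each lane executes the one VLIW slot that its selector entry points to."""
    return [vliw[sel](value) for sel, value in zip(lane_select, lane_values)]


def collatz_step_lanes(lane_values):
    """Model of: if x is even then x // 2 else 3 * x + 1, with no lane idling."""
    # One VLIW holding an instruction from each side of the divergent branch.
    vliw = (lambda x: x // 2, lambda x: 3 * x + 1)
    # Stand-in for the vector register: the branch direction found at runtime per lane.
    lane_select = [0 if x % 2 == 0 else 1 for x in lane_values]
    return run_vliw(vliw, lane_select, lane_values)
```

For the lane values `[4, 7, 10, 3]` the selector is `[0, 1, 0, 1]` and the result is `[2, 22, 5, 10]`: both sides of the branch make progress in the same step instead of being serialized.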
Suspending branch prediction upon entering transactional execution mode
In a computer supporting Transactional Memory (TM) Transaction Execution (TX), use of speculative branch prediction is programmably suspended during TX, and programmably resumed. The branch prediction suspension may cause the execution of one or more instructions following the branch instruction to stall in the pipeline until branch conditions and branch target addresses are resolved.
EXECUTING SYSTEM CALL VECTORED INSTRUCTIONS IN A MULTI-SLICE PROCESSOR
Executing system call vectored (SCV) instructions in a multi-slice processor includes: receiving, by an instruction fetch unit, an SCV instruction, wherein the SCV instruction is a system call from an operating system; sending the SCV instruction to a branch issue queue; determining, by the branch issue queue, that the SCV instruction is next-to-complete; issuing the SCV instruction to a branch resolution unit; and executing the SCV instruction by the branch resolution unit.
Analysis system and method for reducing the control flow divergence in the Graphics Processing Units (GPUs)
The invention discloses an analysis system and method for reducing control flow divergence in graphics processing units (GPUs). A computing unit counts the number of branches and the number of cycles, and calculates at least one direction ratio. A profiler determines whether the code has the optimized control flow structure and the specialized branch. An optimization decision unit determines which transform pattern can be used to transform the sub-control flow structure.
Graphic Processor Unit with Improved Energy Efficiency
A GPU architecture employs a crossbar switch to preferentially store operand vectors in a compressed form allowing reduction in the number of memory circuits that must be activated during an operand fetch and to allow existing execution units to be used for scalar execution. Scalar execution can be performed during branch divergence.
TECHNIQUES FOR PREDICTING A TARGET ADDRESS OF AN INDIRECT BRANCH INSTRUCTION
A technique for operating a processor includes identifying a difficult branch instruction (branch) whose target address (target) has been mispredicted multiple times. Information about the branch (which includes a current target and a next target) is learned and stored in a data structure. In response to the branch executing subsequent to the storing, whether a branch target of the branch corresponds to the current target in the data structure is determined. In response to the branch target of the branch corresponding to the current target of the branch in the data structure, the next target of the branch that is associated with the current target of the branch in the data structure is determined. In response to detecting that a next instance of the branch has been fetched, the next target of the branch is utilized as the predicted target for execution of the next instance of the branch.
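One way to picture the data structure is a table keyed by (branch, current target) that yields the next target. The sketch below is a rough model under assumptions: the class name, the `mispredict_limit` parameter, and the choice to count a missing prediction as a misprediction are hypothetical and are not stated in the abstract.

```python
from collections import defaultdict


class NextTargetPredictor:
    """Toy model: learn current-target -> next-target pairs for difficult branches."""

    def __init__(self, mispredict_limit=2):
        self.mispredict_limit = mispredict_limit   # assumed "multiple times" cutoff
        self.mispredicts = defaultdict(int)        # branch -> misprediction count
        self.last_target = {}                      # branch -> most recent resolved target
        self.next_target = {}                      # (branch, current target) -> next target

    def predict(self, branch):
        """Predicted target for the next instance of the branch, or None."""
        current = self.last_target.get(branch)
        return self.next_target.get((branch, current))

    def resolve(self, branch, actual_target):
        """Call when the branch executes and its real target is known."""
        if self.predict(branch) != actual_target:
            # Assumption: having no prediction also counts as a misprediction.
            self.mispredicts[branch] += 1
        previous = self.last_target.get(branch)
        if previous is not None and self.mispredicts[branch] >= self.mispredict_limit:
            # The branch is "difficult": record which target followed the previous one.
            self.next_target[(branch, previous)] = actual_target
        self.last_target[branch] = actual_target
```

For a branch whose targets cycle A, B, C, A, ..., the table fills in during the first pass; after that, `predict` returns B whenever the last resolved target was A, and so on around the cycle.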