G06F9/322

BRANCH PREDICTOR
20200285477 · 2020-09-10

An apparatus comprises processing circuitry to perform data processing in response to instructions fetched from an instruction cache, an instruction prefetcher to speculatively prefetch instructions into the instruction cache, and a branch predictor having at least one branch prediction structure to store branch prediction data for predicting at least one branch property of an instruction fetched for processing by the processing circuitry. On prefetching of a given instruction into the instruction cache by the instruction prefetcher, the branch predictor is configured to perform a prefetch-triggered update of the branch prediction data based on information derived from the given instruction prefetched by the instruction prefetcher. This can help to improve performance, especially for workloads with a high branch density and large branch re-reference interval.
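The prefetch-triggered update described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the claimed design: the instruction encoding (`("B", offset)` for a PC-relative direct branch), the 4-byte instruction size, the class name `BranchTargetBuffer`, and the policy of not overwriting existing entries are all hypothetical.

```python
class BranchTargetBuffer:
    """Toy branch prediction structure: maps a branch's address to its target."""

    def __init__(self):
        self.entries = {}

    def prefetch_update(self, line_addr, instructions, instr_size=4):
        """Scan a prefetched cache line and record targets of direct branches."""
        for index, (opcode, operand) in enumerate(instructions):
            if opcode == "B":  # hypothetical PC-relative direct branch
                pc = line_addr + index * instr_size
                # Assumption: keep any entry already trained by real execution.
                self.entries.setdefault(pc, pc + operand)

    def predict(self, pc):
        """Return the predicted target, or None if nothing is known."""
        return self.entries.get(pc)


btb = BranchTargetBuffer()
# The prefetcher brings in a line at 0x100; the second instruction is a branch.
btb.prefetch_update(0x100, [("ADD", None), ("B", 0x40), ("ADD", None)])
print(hex(btb.predict(0x104)))  # 0x144
```

In this sketch the branch at 0x104 has a prediction available before it is ever fetched for execution, which is the kind of benefit the abstract attributes to workloads with long branch re-reference intervals.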

CALL STACK SAMPLING
20200285606 · 2020-09-10

Apparatuses and methods of their operation are disclosed. A call stack is maintained which comprises subroutine information relating to subroutines which have been called during data processing operations and have not yet returned. A stack pointer is indicative of an extremity of the call stack associated with a most recently called subroutine which has been called during the data processing operations and has not yet returned. Call stack sampling can be carried out with reference to the stack pointer. A tide mark pointer is maintained, which is indicative of the value which the stack pointer had when the call stack sampling procedure was last completed. The call stack sampling procedure comprises retrieving subroutine information from the call stack indicated between the value of the tide mark pointer and the current value of the stack pointer. More efficient call stack sampling is thereby supported, in that only modifications to the call stack need be sampled.
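A small sketch may make the tide mark idea concrete. It models the call stack as a Python list and the stack pointer as the list length; the class name `SampledCallStack` and its methods are hypothetical. One further assumption goes beyond the abstract: the tide mark is lowered whenever a return pops the stack below it, so that a frame pushed later at the same depth is not missed.

```python
class SampledCallStack:
    """Toy call stack that samples only frames added since the last sample."""

    def __init__(self):
        self.frames = []    # index 0 is the oldest frame
        self.tide_mark = 0  # stack pointer value at the last completed sample

    @property
    def stack_pointer(self):
        return len(self.frames)

    def call(self, name):
        self.frames.append(name)

    def ret(self):
        self.frames.pop()
        # Assumption: the tide mark never sits above the live stack.
        self.tide_mark = min(self.tide_mark, self.stack_pointer)

    def sample(self):
        """Return frames between the tide mark and the current stack pointer."""
        new_frames = self.frames[self.tide_mark:self.stack_pointer]
        self.tide_mark = self.stack_pointer
        return new_frames


stack = SampledCallStack()
stack.call("main")
stack.call("f")
print(stack.sample())  # ['main', 'f']
stack.call("g")
print(stack.sample())  # ['g']
```

The second sample returns only the newly pushed frame, so the cost of sampling scales with the change in the stack rather than with its depth.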

Processor supporting arithmetic instructions with branch on overflow and methods
10768930 · 2020-09-08

A method provides for decoding, in a microprocessor, an instruction into data identifying a first register, a second register, an immediate value, and an opcode identifier. The opcode identifier is interpreted as indicating that an arithmetic operation is to be performed on the first register and the second register, and that the microprocessor is to perform a change of control operation in response to the addition of the first register and the second register causing overflow or underflow. The change of control operation is to a location in a program determined based on the immediate value. A processor can be provided with a decoder and other supporting circuitry to implement such method. Overflow/underflow can be checked on word boundaries of a double-word operation.
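The behaviour of such an instruction can be sketched as follows. This is an illustrative model only: the function name, the 32-bit signed two's-complement width, the PC-relative use of the immediate, and the 4-byte fall-through are assumptions, since the abstract says only that the target is determined based on the immediate value.

```python
def add_branch_on_overflow(regs, rs, rt, imm, pc, bits=32):
    """Add two registers; branch if the signed result does not fit in `bits`.

    Returns (wrapped_sum, next_pc).
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    total = regs[rs] + regs[rt]
    overflow = total < lo or total > hi
    # Wrap the sum into the signed range, as fixed-width hardware would.
    wrapped = ((total - lo) % (1 << bits)) + lo
    # Assumption: branch target is PC-relative; fall-through is pc + 4.
    next_pc = pc + imm if overflow else pc + 4
    return wrapped, next_pc


regs = {1: 2**31 - 1, 2: 1}
print(add_branch_on_overflow(regs, 1, 2, imm=0x40, pc=0x1000))
# (-2147483648, 4160), i.e. the sum wrapped and control moved to 0x1040
```

Folding the overflow check into the arithmetic instruction removes the separate compare-and-branch that checked arithmetic would otherwise need.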

Branch instruction

A data processing system provides a branch forward instruction (BF) which has programmable parameters specifying a branch target address to be branched to and a branch point identifying a program instruction following the branch forward instruction which, when reached, is followed by a branch to the branch target address.
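A toy interpreter can illustrate the idea. The program format, the `run` function, and the use of list indices as addresses are hypothetical. The sketch also assumes one reading of the abstract: the instruction at the branch point executes, and control then transfers to the target.

```python
def run(program):
    """Execute a list of instructions and return the names of executed ops.

    ("BF", branch_point, target) arms a pending branch;
    ("OP", name) is an ordinary instruction.
    """
    trace = []
    pc = 0
    pending = None  # (branch_point, target) armed by a BF instruction
    while pc < len(program):
        instr = program[pc]
        next_pc = pc + 1
        if instr[0] == "BF":
            pending = (instr[1], instr[2])
        else:
            trace.append(instr[1])
        # Assumption: branch after the instruction at the branch point runs.
        if pending is not None and pc == pending[0]:
            next_pc = pending[1]
            pending = None
        pc = next_pc
    return trace


program = [
    ("BF", 2, 4),     # branch point at index 2, target at index 4
    ("OP", "a"),
    ("OP", "b"),      # branch point: after this, jump to index 4
    ("OP", "skipped"),
    ("OP", "c"),
]
print(run(program))  # ['a', 'b', 'c']
```

Because the branch is announced ahead of time, hardware modelled this way knows the target before reaching the branch point.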

Scan-on-fill next fetch target prediction

Systems, apparatuses, and methods for instruction next fetch prediction are disclosed. A scan-on-fill target predictor in a processor generates a predicted next fetch address for the instruction fetch unit. When a group of instructions is used to fill an instruction cache but is not currently being retrieved from the instruction cache for processing by other pipeline stages, the group of instructions is scanned to identify exit points of basic blocks within the group. An entry of a table in the scan-on-fill target predictor is allocated for an instruction in a basic block in the group when the basic block has an exit point with a target address that can be resolved within a single clock cycle. The scan-on-fill target predictor may perform a lookup of the table with the current fetch address. The prediction may be compared to a main branch predictor at a later pipeline stage for training purposes.

Methods and apparatus to insert profiling instructions into a graphics processing unit kernel

Embodiments are disclosed for inserting profiling instructions into graphics processing unit (GPU) kernels. An example apparatus includes an entry point detector to detect a first entry point address and a second entry point address of an original GPU kernel. An instruction inserter is to create a corresponding instrumented GPU kernel from the original GPU kernel by adding instructions of the original GPU kernel and one or more profiling instructions to the instrumented GPU kernel. The instruction inserter is to insert, at the first entry point address of the instrumented GPU kernel, a first jump instruction to jump to first profiling initialization instructions and, at the second entry point address of the instrumented GPU kernel, a second jump instruction to jump to second profiling initialization instructions. The instruction inserter is also to insert profiling measurement instructions of the profiling instructions into the instrumented GPU kernel.

METHOD OF ENFORCING CONTROL FLOW INTEGRITY IN A MONOLITHIC BINARY USING STATIC ANALYSIS

Method of enforcing control flow integrity (CFI) for a monolithic binary using static analysis by: marking evaluated functions as core functions by a chosen heuristic or empirically; generating a binary call graph; merging all function nodes of core functions as a node of highest privilege (set 0); merging all leaf functions in one node without privilege (set n); merging all nodes without privilege that reach functions of privilege i and setting the merged node privilege to i+1; checking if there is a node without privilege besides a trivial function; in a positive case, returning to merging all nodes without privilege and setting the merged node privilege to i+1; and in a negative case, setting the privilege of trivial functions as i+2.

AN APPARATUS AND METHOD FOR CONTROLLING EXECUTION OF INSTRUCTIONS
20200201643 · 2020-06-25

An apparatus and method are provided for controlling execution of certain instructions. The apparatus has processing circuitry to execute a sequence of instructions, an integer storage element to store an integer value for access by the processing circuitry, and a capability storage element for storing a capability for access by the processing circuitry. A capability usage storage is then used to store capability usage information. The processing circuitry is responsive to execution of at least one instruction in the sequence of instructions to generate, in dependence on the capability usage information, a result to be stored in a destination storage element. In particular, when the capability usage information identifies a capability state, the result is generated as a capability, and the capability storage element is selected as the destination storage element. Conversely, when the capability usage information identifies a non-capability state, the result is generated as an integer value, and the integer storage element is selected as the destination storage element. This allows both capability and non-capability generating variants of an instruction to be specified, without requiring separate instructions to be provided within the instruction set.

Power saving branch modes in hardware

A method and apparatus are provided. The method includes executing a plurality of threads in a temporal dimension, executing a plurality of threads in a spatial dimension, determining a branch target address for each of the plurality of threads in the temporal dimension and the plurality of threads in the spatial dimension, and comparing each of the branch target addresses to determine a minimum branch target address, wherein the minimum branch target address is a minimum value among branch target addresses of each of the plurality of threads.
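The comparison step amounts to a minimum reduction over every thread's branch target. A minimal sketch follows, assuming the threads are laid out as a hypothetical grid `targets[t][s]` indexed by temporal step `t` and spatial lane `s`; the function name is likewise illustrative.

```python
def minimum_branch_target(targets):
    """Return the smallest branch target address across all threads.

    targets[t][s] is the target of the thread at temporal index t and
    spatial lane s. Raises ValueError if there are no threads.
    """
    return min(addr for lanes in targets for addr in lanes)


targets = [
    [0x140, 0x120],  # temporal step 0, two spatial lanes
    [0x180, 0x120],  # temporal step 1, two spatial lanes
]
print(hex(minimum_branch_target(targets)))  # 0x120
```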

METHOD AND APPARATUS TO ALLOW EARLY DEPENDENCY RESOLUTION AND DATA FORWARDING IN A MICROPROCESSOR
20200174792 · 2020-06-04

A microprocessor implemented method is disclosed. The method includes mapping a plurality of instructions in a guest address space to corresponding instructions in a native address space. The method further includes, for each of one or more guest branch instructions in said native address space fetched during execution, performing the following: determining a youngest prior guest branch target stored in a guest branch target register, determining a branch target for a respective guest branch instruction by adding an offset value for said respective guest branch instruction to said youngest prior guest branch target, where said offset value is adjusted to account for a difference in address in said guest address space between an instruction at a beginning of a guest instruction block and a branch instruction in said guest instruction block. The method further includes creating an entry in said guest branch target register for said branch target.
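The target computation can be sketched as a running chain of additions. This is one plausible reading, not the claimed implementation: it assumes the youngest prior target is the start of the current guest block, that guest branches are relative to their own address, and that the adjustment therefore adds the branch's distance from the block start to its offset. The function and parameter names are hypothetical.

```python
def resolve_guest_branch_targets(branches, initial_target):
    """Compute guest branch targets by chaining from the youngest prior target.

    branches: list of (offset, distance_in_block) pairs in fetch order, where
    distance_in_block is the guest address gap between the start of the guest
    block and the branch instruction inside it.
    Returns the guest branch target register contents, oldest first.
    """
    register = [initial_target]  # guest branch target register
    for offset, distance_in_block in branches:
        youngest = register[-1]
        # Assumption: adjusted offset = branch offset + distance from block start.
        adjusted_offset = offset + distance_in_block
        register.append(youngest + adjusted_offset)
    return register


chain = resolve_guest_branch_targets([(0x20, 0x8), (-0x10, 0x4)], 0x1000)
print([hex(addr) for addr in chain])  # ['0x1000', '0x1028', '0x101c']
```

Each target depends only on the previous entry and a per-branch constant, so in this model a target can be computed as soon as the prior one is known, without waiting for the guest program counter.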