G06F9/30058

Systems and methods for optimizing authentication branch instructions

Systems, apparatuses, and methods for efficient handling of subroutine epilogues. When an indirect control transfer instruction corresponding to a procedure return for a subroutine is identified, the return address and a signature are retrieved from one or more of a return address stack and the memory stack. An authenticator generates a signature based on at least a portion of the retrieved return address. While the signature is being generated, instruction processing speculatively continues. No instructions are permitted to commit yet. The generated signature is later compared to a copy of the signature generated earlier during the corresponding procedure call. A mismatch causes an exception.

Correlation of thread intensity and heap usage to identify heap-hoarding stack traces
11640320 · 2023-05-02 · ·

Embodiments identify heap-hoarding stack traces to optimize memory efficiency. Some embodiments can determine a length of time when heap usage by processes exceeds a threshold. Some embodiments may then determine heap information of the processes for the length of time, where the heap information comprise heap usage information for each interval in the length of time. Next, some embodiments can determine thread information of the one or more processes for the length of time, wherein determining the thread information comprises determining classes of threads and wherein the thread information comprises, for each of the classes of threads, thread intensity information for each of the intervals. Some embodiments may then correlate the heap information with the thread information to identify code that correspond to the heap usage exceeding the threshold. Some embodiments may then initiate actions associated with the code.

Detecting a dynamic control flow re-convergence point for conditional branches in hardware

Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.

Program flow classification

Execution flows of a program can be characterized by a series of execution events. The rates at which these execution events occur for a particular program can be collected periodically, and the execution events statistics can be utilized for both training a machine learning model, and later on for making classification inferences to determine whether a program run contains any abnormality. When an abnormality is encountered, an alert can be generated and provided to supervisory logic of a computing system to indicate that an abnormal program flow has been detected.

ADDRESS MANIPULATION USING INDICES AND TAGS
20230205534 · 2023-06-29 ·

Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.

BRANCH TARGET PREDICTOR
20170371669 · 2017-12-28 ·

A method for predicting a fetch address of a next instruction to be fetched includes selecting, at a processor, a first way identifier or a second way identifier as a way pointer based on an active fetch address and historical prediction data. A first predictor table includes a first entry having the first way identifier and a second predictor table includes a second entry having the second way identifier. The method also includes selecting a first or second fetch address as a predicted fetch address based on the way pointer. A target table includes a first way storing the first fetch address and a second way storing the second fetch address. The first way and the second way are associated with the active fetch address. The first fetch address is associated with the first way identifier and the second fetch address is associated with the second way identifier.

STREAM BASED BRANCH PREDICTION INDEX ACCELERATOR WITH POWER PREDICTION

A computer-implemented method for predicting a taken branch that ends an instruction stream in a pipelined high frequency microprocessor includes receiving, by a processor, a first instruction within a first instruction stream, the first instruction including a first instruction address. The computer-implemented method further includes searching, by the processor, a stream-based index accelerator predictor one time for the stream; determining, by the processor, a prediction for a branch ending the branch stream; influencing, by the processor, a metadata prediction engine based on the prediction; and updating, by the processor, a stream-based index accelerator predictor with information indicative of the prediction.

VARIABLE BRANCH TARGET BUFFER (BTB) LINE SIZE FOR COMPRESSION

Embodiments include method, systems and computer program products for variable branch target buffer line size for compression. In some embodiments, a branch target buffer (BTB) congruence class for a line of a first parent array of a BTB may be determined. A threshold indicative of a maximum number branches to be stored in the line may be set. A branch may be received to store in the line of the first parent array. A determination may be made that storing the branch in the line would exceed the threshold and the line can be responsively split into an even half line and an odd half line.

Divergent Control Flow for Fused EUs

Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. On embodiment provides for a processing apparatus comprising a fused execution unit including multiple graphics execution units having a common instruction pointer; logic to serialize divergent function calls by the fused execution unit, the logic configured to compare a call target of execution channels within the fused execution unit and create multiple groups of channels, each group of channels associated with a single call target; and wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.

Apparatus and method for combining thread warps with compatible execution masks for simultaneous execution and increased lane utilization
09851977 · 2017-12-26 · ·

A method for executing instructions on a single-program, multiple-data processor system having a fixed number of execution lanes, including: scheduling a primary instruction for execution with a first wave of multiple data; assigning the first wave to a corresponding primary subset of the execution lanes; scheduling a secondary instruction having a second wave of multiple data, such that the second wave fits in lanes that are unused by the primary subset of lanes; assigning the second wave to a corresponding secondary subset of the lanes; fetching the primary and secondary instructions; configuring the execution lanes such that the primary subset is responsive to the primary instruction and the secondary subset is simultaneously responsive to the secondary instruction; and simultaneously executing the primary and secondary instructions in the execution lanes.