G06F9/30058

Conditional branch frame barrier
11249758 · 2022-02-15

Establishing a conditional branch frame barrier is described. A conditional branch in a function epilogue is used to provide frame-specific control. The conditional branch evaluates a return condition to determine whether to return from a callee function to a calling function, or to execute a slow path instead. The return condition is evaluated based on a thread local value. The thread local value is set such that returns to potentially unsafe frames in a call stack are prohibited. The prohibition to return to a potentially unsafe frame may be referred to as a “frame barrier.” Additionally, the thread local value may be used to establish safepointing and/or thread local handshakes, both after execution of a function body and after execution of a loop body.
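The epilogue check described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `ThreadState`, `watermark`, `epilogue`, and `arm_safepoint` are hypothetical, and the comparison of a caller's frame depth against a thread-local watermark is an assumed encoding of the "return condition."

```python
import math
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    # Hypothetical thread-local value. Callers at a depth below the
    # watermark are treated as potentially unsafe to return into.
    watermark: float = 0
    slow_path_hits: list = field(default_factory=list)


def epilogue(thread, caller_depth):
    """Model of the conditional branch at the end of a callee."""
    if caller_depth >= thread.watermark:
        return "fast_return"            # normal return to the caller
    thread.slow_path_hits.append(caller_depth)
    return "slow_path"                  # e.g. fix up the frame, or park the thread


def arm_safepoint(thread):
    # Assumption: raising the watermark above every possible depth makes
    # every return take the slow path, which models a safepoint or handshake.
    thread.watermark = math.inf
```

Under these assumptions, a thread with `watermark=2` returns normally into a caller at depth 3 and takes the slow path for a caller at depth 1; after `arm_safepoint`, every return takes the slow path.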

Method and apparatus for efficient execution of nested branches on a graphics processor unit

An apparatus and method for executing nested control flow instructions on a graphics processing unit (GPU). For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute control flow instructions including fused control flow instructions comprising two or more consecutive control flow instructions fused into a single fused control flow instruction; and a branch unit to process the control flow instructions and to maintain a global counter indicating a nesting level of the control flow instructions, wherein to process a fused control flow instruction, the branch unit is to store a value N in a stack indicating a number of control flow instructions fused into the fused control flow instruction, the branch unit to subsequently read the value N from the stack upon execution of the fused control flow instruction and decrement the global counter by a value of N responsive to execution of the fused control flow instruction.
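The bookkeeping in the abstract (a global nesting counter plus a stack holding N for each fused instruction) can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that a fused instruction closes N nested blocks at once; the abstract does not say which control flow instructions are fused or when exactly N is pushed.

```python
class BranchUnit:
    """Toy model of the branch unit's nesting bookkeeping."""

    def __init__(self):
        self.nesting = 0   # global counter: current nesting level
        self.stack = []    # one stored N per pending fused instruction

    def open_block(self):
        # Assumption: each opening control flow instruction adds one level.
        self.nesting += 1

    def register_fused(self, n):
        # Store N, the number of control flow instructions fused together.
        self.stack.append(n)

    def execute_fused(self):
        # Read N back from the stack and drop N nesting levels in one step.
        n = self.stack.pop()
        self.nesting -= n
        return n
```

For example, three calls to `open_block()` followed by `register_fused(3)` and `execute_fused()` bring the counter back to zero with a single instruction instead of three.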

Circuit for increasing voltage swing of a local oscillator waveform signal

A bootstrap circuit for increasing the voltage swing of a local oscillator waveform signal. The bootstrap circuit comprises a driver stage for driving at an output thereof a local oscillator waveform signal having an increased voltage swing. The driver stage comprises a first supply voltage node and a second supply voltage node. The bootstrap circuit further comprises at least one energy storage component arranged to store energy within an energy storage element when the voltage level at the input node of the driver stage comprises the second voltage state and use the energy stored within the energy storage element to generate an increased voltage level, and to apply the increased voltage level to the first supply voltage node of the driver stage when the voltage level at the input node of the driver stage comprises the first voltage state.

RISC-V Branch Prediction Method, Device, Electronic Device and Storage Medium

A RISC-V branch prediction method and device, an electronic device, and a computer-readable storage medium are provided. In addition to what the prior art does, the method acquires the remaining jump times of a jump instruction and calculates the single jump step length, which is not fixed at 1, from the difference between the remaining jump times at two consecutive jumps. From the single jump step length and the real-time remaining jump times, the method judges whether the target jump instruction has executed its last jump, and determines from that judgment how many jumps still need to be executed.
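The arithmetic in this abstract can be illustrated with a small sketch. The function names are hypothetical, and two details are assumptions rather than statements from the abstract: that the remaining count decreases by the step on each jump, and that the last jump has been executed once the remaining count reaches zero or below.

```python
def jump_step(prev_remaining, curr_remaining):
    """Step length: difference in remaining count across two consecutive jumps."""
    return prev_remaining - curr_remaining


def jumps_left(remaining, step):
    """Number of further jumps expected, assuming a constant positive step."""
    if step <= 0:
        raise ValueError("step must be positive")
    if remaining <= 0:
        return 0
    return -(-remaining // step)   # ceiling division on integers


def last_jump_done(remaining, step):
    """True when no further jump is expected under the assumptions above."""
    return jumps_left(remaining, step) == 0
```

With a remaining count that goes from 9 to 6, the step is 3 and two more jumps are expected; a predictor that assumed a fixed step of 1 would expect six.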

Thread-group-scoped gate instruction

Techniques are disclosed relating to a thread-group-scoped gate instruction. In some embodiments, graphics processor circuitry is configured to execute, for multiple SIMD groups of a thread group, a graphics program that includes a gate instruction. During execution of the gate instruction for a first SIMD group, the processor accesses state information to determine that a threshold number of other SIMD groups in the thread group have not yet executed the gate instruction. Based on the determination, the processor executes a particular set of instructions of the graphics program for the first SIMD group (that is not executed by one or more other SIMD groups that reach the gate instruction after the first SIMD group). For example, the particular set of instructions may be a utility program that performs one or more operations for the entire thread group but is only executed by a subset of the SIMD groups.
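A minimal software model of the gate behavior follows. The `Gate` class is hypothetical, and a plain counter stands in for the "state information" the abstract mentions; real hardware would need to update that state atomically across SIMD groups, which this single-threaded sketch does not model.

```python
class Gate:
    """Toy model of a thread-group-scoped gate."""

    def __init__(self, threshold=1):
        self.threshold = threshold  # how many groups may pass the gate
        self.arrived = 0            # groups that have executed the gate so far

    def execute(self):
        # A group runs the gated instructions only if fewer than
        # `threshold` other groups reached the gate before it.
        run_gated_code = self.arrived < self.threshold
        self.arrived += 1
        return run_gated_code
```

With `threshold=1`, only the first SIMD group to reach the gate runs the utility code on behalf of the whole thread group; later groups skip it.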

Tamper proof logging for automated processes

A manifest for an automated system is generated, wherein the manifest comprises a record of a plurality of algorithms configured to be used in operation of the automated system. An operational audit branch is generated from the manifest in response to execution of one or more algorithms of the plurality of algorithms. The generation of the operational audit branch comprises recording one or more inputs used by the one or more algorithms, and recording one or more outputs generated by the one or more algorithms.
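One way to picture a manifest with an audit branch is sketched below. The abstract does not say how tamper resistance is achieved; the SHA-256 hash chaining used here is an assumption chosen for illustration, and all function names and record fields are hypothetical.

```python
import hashlib
import json


def digest(obj):
    # Stable hash of a JSON-serializable record.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def make_manifest(algorithms):
    body = {"algorithms": sorted(algorithms)}
    return {**body, "hash": digest(body)}


def append_audit(branch, manifest, algorithm, inputs, outputs):
    """Record one execution; each entry is chained to the previous one."""
    if algorithm not in manifest["algorithms"]:
        raise ValueError("algorithm is not listed in the manifest")
    parent = branch[-1]["hash"] if branch else manifest["hash"]
    body = {"parent": parent, "algorithm": algorithm,
            "inputs": inputs, "outputs": outputs}
    branch.append({**body, "hash": digest(body)})


def verify(branch, manifest):
    """Recompute the chain; any edited entry breaks it."""
    parent = manifest["hash"]
    for entry in branch:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["parent"] != parent or digest(body) != entry["hash"]:
            return False
        parent = entry["hash"]
    return True
```

Because each entry's hash covers its inputs, its outputs, and the hash of its parent, changing a recorded value after the fact makes `verify` return `False`.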

Opportunistic Consumer Instruction Steering Based on Producer Instruction Value Prediction in a Multi-Cluster Processor

Opportunistic consumer instruction steering based on producer instruction value prediction in a multi-cluster processor is disclosed. A processor provides producer instructions and consumer instructions to a steering circuit that steers the program instructions to clusters of instruction execution circuits. An input value provided to a consumer instruction may be a produced value of a producer instruction, creating a dependency. The steering circuit steers a producer instruction to a first cluster and, in response to receiving the consumer instruction and the predicted value of the producer instruction, provides the predicted value to at least a second cluster and steers the consumer instruction to the second cluster for execution with the predicted value as the input value. A consumer instruction can be executed in a different cluster than a producer instruction without a cluster-to-cluster latency penalty, which allows the instruction loads to be better balanced among the clusters for higher processor throughput.
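The steering decision can be reduced to a short sketch. The function name and both policies are assumptions made for illustration: without a predicted value the consumer is kept in the producer's cluster, and with one it is sent to the least loaded cluster. The abstract only states that the consumer may be steered to a second cluster when a predicted value is available.

```python
def steer_consumer(producer_cluster, loads, predicted_value=None):
    """Pick a cluster index for a consumer instruction.

    loads[i] is a hypothetical count of instructions queued in cluster i.
    """
    if predicted_value is None:
        # No prediction: stay with the producer to avoid waiting on a
        # cluster-to-cluster transfer of the produced value.
        return producer_cluster
    # Prediction available: the input is already known, so the consumer
    # can go wherever the load is lowest.
    return min(range(len(loads)), key=lambda c: loads[c])
```

For loads of `[5, 1, 3]` and a producer in cluster 0, a consumer without a prediction stays in cluster 0, while one with a predicted input goes to cluster 1.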

Controlling Prediction Functional Blocks Used by a Branch Predictor in a Processor
20210382718 · 2021-12-09

An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
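A rough model of the predictor controller's record keeping is given below. The class name, the `min_share` parameter, and the policy of disabling blocks whose predictions are rarely selected are all assumptions; the abstract says only that the record is used to control which prediction functional blocks are used.

```python
from collections import Counter


class PredictorController:
    """Tracks which prediction functional blocks actually get selected."""

    def __init__(self, blocks, min_share=0.1):
        self.selected = Counter({b: 0 for b in blocks})
        self.min_share = min_share  # hypothetical cutoff for keeping a block on

    def record(self, block):
        # Called each time the branch predictor selects this block's prediction.
        self.selected[block] += 1

    def enabled_blocks(self):
        total = sum(self.selected.values())
        if total == 0:
            return set(self.selected)   # no history yet: keep everything on
        return {b for b, n in self.selected.items()
                if n / total >= self.min_share}
```

If a block supplies the selected prediction for almost no CTIs, this policy stops using it, which in hardware could save the power spent generating predictions that are never chosen.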

Conditional Branching Control for a Multi-Threaded, Self-Scheduling Reconfigurable Computing Fabric
20210373890 · 2021-12-02

Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.

Address Manipulation Using Indices and Tags

Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
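The selection rule in this abstract (prefer the hit from the table with the longest history, fall back to the bimodal table when nothing hits) can be sketched as below. This models only the choice of prediction, not the address manipulation itself. The function name is hypothetical, and dictionaries keyed by the program counter and a history slice stand in for the hashed indices and tags that hardware would use.

```python
def predict(pc, history, bimodal, tagged_tables):
    """Return (prediction, source) for a branch at address pc.

    bimodal: list of booleans indexed by low bits of pc.
    tagged_tables: list of (history_len, table) pairs ordered from the
        shortest to the longest history; each table maps
        (pc, last history_len outcomes) to a boolean prediction.
    """
    base = bimodal[pc % len(bimodal)]       # first index from PC bits
    # Search from the longest history to the shortest; first hit wins.
    for level in range(len(tagged_tables) - 1, -1, -1):
        hist_len, table = tagged_tables[level]
        key = (pc, tuple(history[-hist_len:]))
        if key in table:
            return table[key], f"T{level + 1}"
    return base, "bimodal"                  # zero hits: use the bimodal table
```

When both a short-history and a long-history table hold an entry for the same branch, the long-history entry decides the outcome; the bimodal prediction is used only when no tagged table hits.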