G06F9/3804

INSTRUCTION ADDRESS TRANSLATION AND INSTRUCTION PREFETCH ENGINE

Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) caches address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB caches address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.

REUSING FETCHED, FLUSHED INSTRUCTIONS AFTER AN INSTRUCTION PIPELINE FLUSH IN RESPONSE TO A HAZARD IN A PROCESSOR TO REDUCE INSTRUCTION RE-FETCHING
20210397453 · 2021-12-23 ·

Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching is disclosed. An instruction processing circuit is configured to detect fetched performance degrading instructions (Pals) in a pre-execution stage in an instruction pipeline that may cause a precise interrupt that would cause flushing of the instruction pipeline. In response to detecting a PDI in an instruction pipeline, the instruction processing circuit is configured to capture the fetched PDI and/or its successor, younger fetched instructions that are processed in the instruction pipeline behind the PDI, in a pipeline refill circuit. If a later execution of the PDI in the instruction pipeline causes a flush of the instruction pipeline, the instruction processing circuit can inject the fetched PDI and/or its younger instructions previously captured from the pipeline refill circuit into the instruction pipeline to be processed without such instructions being re-fetched.

VIRTUAL 3-WAY DECOUPLED PREDICTION AND FETCH

A unified queue configured to perform decoupled prediction and fetch operations, and related apparatuses, systems, methods, and computer-readable media, is disclosed. The unified queue has a plurality of entries, where each entry is configured to store information associated with at least one instruction, and where the information comprises an identifier portion, a prediction information portion, and a tag information portion. The unified queue is configured to update the prediction information portion of each entry responsive to a prediction block, and to update the tag information portion of each entry responsive to a tag and TLB block. The prediction information may be updated more than once, and the unified queue is configured to take corrective action where a later prediction conflicts with an earlier prediction.

Streaming execution for a quantum processing system
11194573 · 2021-12-07 · ·

Interactions between a classical computing system and a quantum computing system can be structured to increase the effective memory available to hold instructions for a quantum processor. The system stores a schedule of compiled quantum processing instructions in a memory storage location on a classical computing system. A small program memory is included in close proximity to a control system for the quantum processor on the quantum computing system. The classical computing system sends a subset of instructions from the schedule of quantum instructions to the program memory. The control system manages execution of the instructions by accessing them at the program memory and configuring the quantum processor accordingly. While the quantum processor executes the instructions, additional instructions are transferred from the classical computing system to the program memory to await execution. The quantum system can execute many instructions quickly without idling while instructions are fetched from a large memory.

MERGED BRANCH TARGET BUFFER ENTRIES

Merging branch target buffer entries includes maintaining, in a branch target buffer, an entry corresponding to first branch instruction, where the entry identifies a first branch target address for the first branch instruction and a second branch target address for a second branch instruction; and accessing, based on the first branch instruction, the entry.

COMPILER ADAPTED IN GRAPHICS PROCESSING UNIT AND NON-TRANSITORY COMPUTER-READABLE MEDIUM

A compiler includes a front-end module, an optimization module, and a back-end module. The front-end module pre-processes a source code to generate an intermediate code. The optimization module optimizes the intermediate code. The back-end module translates the optimized intermediate code to generate a machine code. Optimization includes translating a branch instruction in the intermediate code into performing the following operations: establishing a post dominator tree for the branch instruction to find an immediate post dominator of the branch instruction as a reconverge point of a first path and a second path of the branch instruction; inserting a specific instruction at the front end of the reconverge point, so as to jumping to execute the instructions of the second path on the condition that once the specific instruction on the first path is executed.

ALTERNATE PATH DECODE FOR HARD-TO-PREDICT BRANCH

An embodiment of an integrated circuit may comprise a core, a front end unit coupled to the core to decode one or more instruction wherein the front end unit includes a first decode path, a second decode path, and circuitry to: predict a taken branch of a conditional branch instruction of the one or more instructions, decode a predicted path of the taken branch on the first decode path, determine if the conditional branch instruction corresponds to a hard-to-predict conditional branch instruction and if the second decode path is available and, if so determined, decode an alternate path of a not-taken branch of the hard-to-predict conditional branch instruction on the second decode path. Other embodiments are disclosed and claimed.

Instruction-level context switch in SIMD processor

Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.

Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching

Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching is disclosed. An instruction processing circuit is configured to detect fetched performance degrading instructions (Pals) in a pre-execution stage in an instruction pipeline that may cause a precise interrupt that would cause flushing of the instruction pipeline. In response to detecting a PDI in an instruction pipeline, the instruction processing circuit is configured to capture the fetched PDI and/or its successor, younger fetched instructions that are processed in the instruction pipeline behind the PDI, in a pipeline refill circuit. If a later execution of the PDI in the instruction pipeline causes a flush of the instruction pipeline, the instruction processing circuit can inject the fetched PDI and/or its younger instructions previously captured from the pipeline refill circuit into the instruction pipeline to be processed without such instructions being re-fetched.

Apparatus and method for controlling allocation of instructions into an instruction cache storage

An apparatus and method are provided for controlling allocation of instructions into an instruction cache storage. The apparatus comprises processing circuitry to execute instructions, fetch circuitry to fetch instructions from memory for execution by the processing circuitry, and an instruction cache storage to store instructions fetched from the memory by the fetch circuitry. Cache control circuitry is responsive to the fetch circuitry fetching a target instruction from a memory address determined as a target address of an instruction flow changing instruction, at least when the memory address is within a specific address range, to prevent allocation of the fetched target instruction into the instruction cache storage unless the fetched target instruction is at least one specific type of instruction. It has been found that such an approach can inhibit the performance of speculation-based caching timing side-channel attacks.