G06F9/3804

Variable-length instruction buffer management

A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.

Instruction address translation and caching for primary and alternate branch prediction paths

Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) cache address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB cache address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.
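
A minimal sketch of the translation requests described above, not the claimed implementation. The `Tlb` class, `warm_tlbs`, the 4 KiB page size, and the dictionary page table are all hypothetical. The abstract allows alternate-path translations to go to either the level 0 TLB or another level; this sketch assumes they go to a level 1 TLB.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class Tlb:
    """Hypothetical TLB model: maps virtual page numbers to physical page numbers."""

    def __init__(self, name):
        self.name = name
        self.entries = {}

    def request_translation(self, virtual_address, page_table):
        # Cache the translation for the page containing this address.
        vpn = virtual_address >> PAGE_SHIFT
        self.entries[vpn] = page_table[vpn]


def warm_tlbs(primary_path, alternate_path, page_table, l0_tlb, l1_tlb):
    """Request translations for both paths ahead of the actual fetch."""
    for address in primary_path:
        l0_tlb.request_translation(address, page_table)
    # Assumption: alternate-path translations are kept out of the level 0 TLB.
    for address in alternate_path:
        l1_tlb.request_translation(address, page_table)


page_table = {0x1: 0xA, 0x2: 0xB}
l0, l1 = Tlb("L0"), Tlb("L1")
warm_tlbs([0x1000, 0x1040], [0x2000], page_table, l0, l1)
```

Under this assumption the small level 0 TLB is not filled with entries for a path that may never be taken, while the alternate-path translation is still available if the prediction turns out to be wrong.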

Streaming execution for a quantum processing system
11567762 · 2023-01-31

Interactions between a classical computing system and a quantum computing system can be structured to increase the effective memory available to hold instructions for a quantum processor. The system stores a schedule of compiled quantum processing instructions in a memory storage location on a classical computing system. A small program memory is included in close proximity to a control system for the quantum processor on the quantum computing system. The classical computing system sends a subset of instructions from the schedule of quantum instructions to the program memory. The control system manages execution of the instructions by accessing them at the program memory and configuring the quantum processor accordingly. While the quantum processor executes the instructions, additional instructions are transferred from the classical computing system to the program memory to await execution. The quantum system can execute many instructions quickly without idling while instructions are fetched from a large memory.
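
The refill loop described above can be sketched as a bounded queue that the host tops up while the controller drains it. This is an illustrative simulation only: `stream_schedule`, the memory size, and the string instructions are hypothetical, and real transfers would overlap with execution rather than alternate with it.

```python
from collections import deque


def stream_schedule(schedule, program_memory_size=4):
    """Stream a long schedule through a small program memory.

    Returns the instructions in execution order and the peak number of
    instructions held in the program memory at any one time.
    """
    program_memory = deque()
    next_to_send = 0
    executed = []
    peak_occupancy = 0

    # Initial transfer: the host sends the first subset of the schedule.
    while next_to_send < len(schedule) and len(program_memory) < program_memory_size:
        program_memory.append(schedule[next_to_send])
        next_to_send += 1

    while program_memory:
        peak_occupancy = max(peak_occupancy, len(program_memory))
        # The control system reads the next instruction from the program memory.
        executed.append(program_memory.popleft())
        # The host refills the freed slot with the next pending instruction.
        if next_to_send < len(schedule):
            program_memory.append(schedule[next_to_send])
            next_to_send += 1

    return executed, peak_occupancy


schedule = [f"op{i}" for i in range(10)]
executed, peak = stream_schedule(schedule, program_memory_size=4)
```

The whole schedule executes in order even though the program memory never holds more than `program_memory_size` instructions.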

COHERENCE-BASED DYNAMIC CODE REWRITING, TRACING AND CODE COVERAGE
20230028825 · 2023-01-26

A device tracks accesses to pages of code executed by processors and modifies a portion of the code without terminating the execution of the code. The device is connected to the processors via a coherence interconnect and a local memory of the device stores the code pages. As a result, any requests to access cache lines of the code pages made by the processors will be placed on the coherence interconnect, and the device is able to track any cache-line accesses of the code pages by monitoring the coherence interconnect. In response to a request to read a cache line having a particular address, a modified code portion is returned in place of the code portion stored in the code pages.
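
A rough software model of the behaviour described above, assuming a hypothetical `CodePageDevice` class in which every cache-line read of a code page reaches the device. The byte strings and addresses are made up for illustration; a real device would observe requests on the coherence interconnect.

```python
class CodePageDevice:
    """Hypothetical model of a device that backs code pages in its local memory."""

    def __init__(self, code_lines):
        self.code_lines = dict(code_lines)  # cache-line address -> original bytes
        self.patches = {}                   # cache-line address -> modified bytes
        self.access_counts = {}             # cache-line address -> number of reads

    def patch(self, address, modified_line):
        # Register modified code without touching the stored code page.
        self.patches[address] = modified_line

    def read_cache_line(self, address):
        # Every read is visible to the device, so it can be counted here.
        self.access_counts[address] = self.access_counts.get(address, 0) + 1
        # Return the modified code in place of the stored code, if any.
        return self.patches.get(address, self.code_lines[address])


device = CodePageDevice({0x1000: b"\x90\x90", 0x1040: b"\xc3"})
before = device.read_cache_line(0x1000)
device.patch(0x1000, b"\xcc\x90")
after = device.read_cache_line(0x1000)
```

The access counts double as coverage data: any address missing from `access_counts` was never fetched.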

Multi-table Signature Prefetch

Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique, and second signature prefetch table circuitry indicates prefetch addresses for signatures generated using a second, different signature generation technique. Signature prefetch circuitry, in response to a prefetch training event, determines a first signature according to the first technique and a second signature according to the second technique, and selects one but not both of the first and second signature prefetch tables to train using the first signature or the second signature.
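
A sketch of training exactly one of two signature tables per training event. Everything specific here is an assumption, since the abstract does not state the signature functions or the selection policy: `fold`, the history lengths of 2 and 4, and the rule "train the long-history table only when the short-history entry conflicts" are hypothetical choices.

```python
def fold(addresses):
    """Hypothetical signature function: shift-and-XOR fold into 16 bits."""
    signature = 0
    for address in addresses:
        signature = ((signature << 3) ^ address) & 0xFFFF
    return signature


class MultiTablePrefetcher:
    def __init__(self):
        self.short_table = {}  # signatures over the last 2 taken targets
        self.long_table = {}   # signatures over the last 4 taken targets

    def train(self, history, miss_address):
        """Compute both signatures, then train one table but not both."""
        short_signature = fold(history[-2:])
        long_signature = fold(history[-4:])
        existing = self.short_table.get(short_signature)
        # Assumed policy: use the short table unless it already predicts a
        # different address for this signature.
        if existing is None or existing == miss_address:
            self.short_table[short_signature] = miss_address
            return "short"
        self.long_table[long_signature] = miss_address
        return "long"

    def lookup(self, history):
        # Prefer the more specific long-history entry when one exists.
        long_signature = fold(history[-4:])
        if long_signature in self.long_table:
            return self.long_table[long_signature]
        return self.short_table.get(fold(history[-2:]))


prefetcher = MultiTablePrefetcher()
path_one = [0x10, 0x20, 0x30, 0x40]
path_two = [0x50, 0x60, 0x30, 0x40]  # same last two targets as path_one
first_choice = prefetcher.train(path_one, 0x1000)
second_choice = prefetcher.train(path_two, 0x2000)
```

The two paths share a short signature, so the second training event is steered to the long-history table and both prefetch addresses remain retrievable.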

Apparatus and method for generating and processing a trace stream indicative of instruction execution by processing circuitry

An apparatus and method are provided for generating and processing a trace stream indicative of instruction execution by processing circuitry. An apparatus has an input interface for receiving instruction execution information from the processing circuitry indicative of a sequence of instructions executed by the processing circuitry, and trace generation circuitry for generating from the instruction execution information a trace stream comprising a plurality of trace elements indicative of execution by the processing circuitry of instruction flow changing instructions within the sequence. The sequence may include a branch behaviour setting instruction that indicates an identified instruction within the sequence, where execution of the branch behaviour setting instruction enables a branch behaviour to be associated with the identified instruction that causes the processing circuitry to branch to a target address identified by the branch behaviour setting instruction when the identified instruction is encountered in the sequence. The trace generation circuitry is further arranged to generate, from the instruction execution information, a trace element indicative of execution behaviour of the branch behaviour setting instruction, and a trace element to indicate that the branch behaviour has been triggered on encountering the identified instruction within the sequence. This enables a very efficient form of trace stream to be used even in situations where the instruction sequence executed by the processing circuitry includes such branch behaviour setting instructions.

THREAD PRIORITIES USING MISPREDICTION RATE AND SPECULATIVE DEPTH

Methods and systems for determining a priority of a thread are described. A processor can execute branch instructions of the thread. The processor can predict branch instruction outcomes of the branch instructions of the thread. The processor can increment a misprediction count of the thread in response to an actual execution of a branch instruction of the thread being different from a corresponding branch instruction prediction outcome of the thread. The processor can determine the priority of the thread based on the misprediction count of the thread.
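
A minimal sketch of the counting step described above. The names `ThreadBranchStats` and `prioritize` are hypothetical, and the ordering rule is an assumption: the abstract says only that priority is based on the misprediction count, so this sketch arbitrarily ranks threads with fewer mispredictions first.

```python
class ThreadBranchStats:
    """Per-thread misprediction counter (hypothetical model)."""

    def __init__(self):
        self.mispredictions = 0

    def record_branch(self, predicted_taken, actually_taken):
        # Increment only when the actual outcome differs from the prediction.
        if predicted_taken != actually_taken:
            self.mispredictions += 1


def prioritize(threads):
    """Return thread ids ordered from highest to lowest priority.

    Assumption: fewer mispredictions means higher priority.
    """
    return sorted(threads, key=lambda thread_id: threads[thread_id].mispredictions)


threads = {"t0": ThreadBranchStats(), "t1": ThreadBranchStats()}
threads["t0"].record_branch(predicted_taken=True, actually_taken=False)
threads["t0"].record_branch(predicted_taken=True, actually_taken=True)
threads["t1"].record_branch(predicted_taken=False, actually_taken=False)
order = prioritize(threads)
```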

Controlling accesses to a branch prediction unit for sequences of fetch groups

An electronic device is described that handles control transfer instructions (CTIs) when executing instructions in program code. The electronic device has a processor that includes a branch prediction functional block and a sequential fetch logic functional block. The sequential fetch logic functional block determines, based on a record associated with a CTI, that a specified number of fetch groups of instructions that were previously determined to include no CTIs are to be fetched for execution in sequence following the CTI. When each of the specified number of fetch groups is fetched and prepared for execution, the sequential fetch logic prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that fetch group.
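
The gating behaviour described above can be sketched as a countdown that suppresses branch predictor lookups. This is a simplified model under stated assumptions: `fetch_with_bpu_gating`, the `(group_id, has_cti)` tuples, and the `cti_free_run` record format are hypothetical, and the recorded counts are taken to be correct.

```python
def fetch_with_bpu_gating(fetch_groups, cti_free_run):
    """Return the ids of fetch groups for which the branch predictor is accessed.

    fetch_groups: list of (group_id, has_cti) in fetch order.
    cti_free_run: maps the group_id of a group containing a CTI to the number
                  of following fetch groups previously found to hold no CTIs.
    """
    bpu_accesses = []
    skip_remaining = 0
    for group_id, has_cti in fetch_groups:
        if skip_remaining > 0:
            # This group is known to be CTI-free: suppress the predictor access.
            skip_remaining -= 1
        else:
            bpu_accesses.append(group_id)
        if has_cti:
            # Consult the record associated with this CTI.
            skip_remaining = cti_free_run.get(group_id, 0)
    return bpu_accesses


groups = [("g0", True), ("g1", False), ("g2", False), ("g3", False), ("g4", True)]
accessed = fetch_with_bpu_gating(groups, {"g0": 2})
```

Here the record for the CTI in `g0` covers two fetch groups, so the predictor is consulted for `g0`, `g3`, and `g4` only.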

Multi-Threaded Secure Processor with Control Flow Attack Detection

A multi-thread pipeline processor with fault detection is operative with a single pipeline stage which generates branch status comprising at least one of branch taken/not_taken, branch direction, and branch target. A first thread has control and data instructions, the control instructions comprising loop instructions including unconditional and conditional branch instructions, loop initialization instructions, loop arithmetic instructions, and no operation (NOP) instructions. A second thread has only control instructions, with the non-control instructions either replaced with NOP instructions or removed entirely. A fault detector compares the branch status of the first thread and second thread and asserts a fault output when they do not match.
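
A sketch of the comparison step only, assuming each thread's branch status has already been collected as a list. `BranchStatus`, `detect_fault`, and the sample addresses are hypothetical; the abstract describes hardware that compares the statuses as they are generated.

```python
from itertools import zip_longest
from typing import NamedTuple


class BranchStatus(NamedTuple):
    taken: bool
    target: int


def detect_fault(first_thread_status, second_thread_status):
    """Return the index of the first mismatching branch status, or None.

    A missing entry in either stream also counts as a mismatch.
    """
    pairs = zip_longest(first_thread_status, second_thread_status)
    for index, (first, second) in enumerate(pairs):
        if first != second:
            return index  # the fault output would be asserted here
    return None


expected = [BranchStatus(True, 0x100), BranchStatus(False, 0x140), BranchStatus(True, 0x100)]
# Simulated control flow attack: the second branch is redirected.
tampered = [expected[0], BranchStatus(True, 0xDEAD), expected[2]]
clean_result = detect_fault(expected, list(expected))
fault_index = detect_fault(tampered, expected)
```

Because the second thread carries only the control instructions, corruption that diverts the first thread's control flow shows up as a status mismatch.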

PROFILING OF SAMPLED OPERATIONS PROCESSED BY PROCESSING CIRCUITRY

Processing circuitry performs data processing operations in response to instructions fetched from a cache or memory or micro-operations decoded from the instructions. Sampling circuitry selects a subset of instructions or micro-operations as sampled operations to be profiled. Profiling circuitry captures, in response to processing of an instruction or micro-operation selected as a sampled operation, a sample record specifying an operation type of the sampled operation and information about behaviour of the sampled operation which is directly attributed to the sampled operation. The profiling circuitry can include, in the sample record for a sampled operation corresponding to a given instruction, a reference instruction address indicator indicative of an address of a reference instruction appearing earlier or later in program order than the given instruction, for which control flow is sequential between any instructions occurring between the reference instruction and the given instruction in program order.