G06F9/3806

NEURON CACHE-BASED HARDWARE BRANCH PREDICTION
20230078582 · 2023-03-16

A branch prediction system includes a neuron cache and logic coupled to the neuron cache. The neuron cache includes one or more weights of a neural network model trained for one or more selected code sections, and the logic is to be used with the neuron cache to predict a target address for a branch instruction of the one or more selected code sections.

METHOD, SYSTEM AND DEVICE FOR PIPELINE PROCESSING OF INSTRUCTIONS, AND COMPUTER STORAGE MEDIUM

A method, system and device for pipeline processing of instructions and a computer storage medium. The method comprises: acquiring a target instruction set (S101); acquiring a target prediction result, wherein the target prediction result is a result obtained by predicting a jump mode of the target instruction set (S102); performing pipeline processing on the target instruction set according to the target prediction result (S103); determining if a pipeline flushing request is received (S104); and if so, correspondingly saving the target instruction set and a corresponding pipeline processing result, so as to perform pipeline processing on the target instruction set again on the basis of the pipeline processing result (S105). By means of the method, system, and device and computer-readable storage medium, a target instruction set and a corresponding pipeline processing result are correspondingly saved, so that when the target instruction set is subsequently processed again, the saved pipeline processing result can be directly used to perform pipeline processing, and the efficiency of pipeline processing of instructions can be improved.
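The save-and-reuse idea in steps S104 and S105 can be illustrated with a minimal sketch. The abstract does not say what a "pipeline processing result" contains, so this example assumes it is the list of per-instruction results completed before the flush; the class `Pipeline`, the `flush_at` parameter, and the `execute` stand-in are all hypothetical names introduced for illustration.

```python
class Pipeline:
    """Toy model: results completed before a flush are saved and reused later."""

    def __init__(self):
        self.saved = {}      # instruction set -> results completed before a flush
        self.work_done = 0   # counts instructions actually executed

    def execute(self, instr):
        # Stand-in for real pipeline work (assumption: upper-casing a string).
        self.work_done += 1
        return instr.upper()

    def process(self, instrs, flush_at=None):
        # Resume from any results saved by an earlier, flushed attempt.
        results = list(self.saved.pop(instrs, []))
        for i in range(len(results), len(instrs)):
            if flush_at is not None and i == flush_at:
                # Flush request received: save the set and its partial results.
                self.saved[instrs] = results
                return None
            results.append(self.execute(instrs[i]))
        return results


pipeline = Pipeline()
target = ("add", "mul", "sub")
first = pipeline.process(target, flush_at=2)   # flushed before "sub"
second = pipeline.process(target)              # reuses the two saved results
```

In this sketch the second pass executes only the one instruction that had not completed, which is the efficiency gain the abstract describes.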

UPDATING METADATA PREDICTION TABLES USING A REPREDICTION PIPELINE

Aspects of the invention include a computer-implemented method of updating metadata prediction tables. The computer-implemented method includes establishing, in the metadata prediction tables, a prediction of how a set of instructions will resolve and identifying that the set of instructions is completed. The computer-implemented method also includes determining, upon completion of the set of instructions, whether prediction update queues (PUQs) associated with the set of instructions indicate that the set of instructions resolved in one of a plurality of proscribed manners relative to the prediction and deciding that the metadata prediction tables are candidates to be updated based on the PUQs indicating that the set of instructions resolved in one of the plurality of proscribed manners.

PROGRAM FLOW PREDICTION FOR LOOPS
20230130323 · 2023-04-27

Instruction processing circuitry comprises fetch circuitry to fetch instructions for execution; instruction decoder circuitry to decode fetched instructions; execution circuitry to execute decoded instructions; and program flow prediction circuitry to predict a next instruction to be fetched; in which the instruction decoder circuitry is configured to decode a loop control instruction in respect of a given program loop and to derive information from the loop control instruction for use by the program flow prediction circuitry to predict program flow for one or more iterations of the given program loop.
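A minimal sketch of how information derived from a loop control instruction might drive program flow prediction. The abstract does not specify which information is derived, so the fields of `LoopInfo` (body start, body end, trip count) and the fixed 4-byte instruction size are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    start: int       # address of the first instruction of the loop body
    end: int         # address of the last instruction of the loop body
    iterations: int  # trip count derived from the loop control instruction


class LoopFlowPredictor:
    def __init__(self, instr_size=4):
        self.instr_size = instr_size
        self.loop = None
        self.remaining = 0

    def on_decode_loop_control(self, info):
        # The decoder hands the derived loop information to the predictor.
        self.loop = info
        self.remaining = info.iterations

    def predict_next(self, pc):
        # At the end of the body, predict a jump back to the start until
        # the trip count is used up; otherwise predict sequential fetch.
        if self.loop is not None and pc == self.loop.end:
            self.remaining -= 1
            if self.remaining > 0:
                return self.loop.start
            self.loop = None
        return pc + self.instr_size


predictor = LoopFlowPredictor()
predictor.on_decode_loop_control(LoopInfo(start=0x100, end=0x108, iterations=3))
pc = 0x100
trace = [pc]
for _ in range(9):
    pc = predictor.predict_next(pc)
    trace.append(pc)
```

Because the trip count is known at decode time, the final loop exit is predicted correctly here without any history having been learned.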

Address manipulation using indices and tags

Techniques are disclosed for address manipulation using indices and tags. A first index is generated from bits of a processor program counter, where the first index is used to access a branch predictor bimodal table. A first branch prediction is provided from the bimodal table, based on the first index. The first branch prediction is matched against N tables, where the tables contain prior branch histories, and where: the branch history in table T(N) is of greater length than the branch history of table T(N-1), and the branch history in table T(N-1) is of greater length than the branch history of table T(N-2). A processor address is manipulated using a greatest length of hits of branch prediction matches from the N tables, based on one or more hits occurring. The branch predictor address is manipulated using the first branch prediction from the bimodal table, based on zero hits occurring.
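The selection rule described above (use the hit from the table with the longest history, fall back to the bimodal table on zero hits) can be sketched as follows. This is only an illustration of the selection step: the XOR-folding hash, the table sizes, and the names `fold`, `TaggedTable`, and `predict` are assumptions not taken from the abstract, and the sketch returns a taken/not-taken direction rather than modelling how the address itself is manipulated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def fold(history: int, length: int, bits: int) -> int:
    """XOR-fold the youngest `length` history bits down to `bits` bits."""
    h = history & ((1 << length) - 1)
    out = 0
    while h:
        out ^= h & ((1 << bits) - 1)
        h >>= bits
    return out


@dataclass
class TaggedTable:
    history_length: int
    index_bits: int = 4
    tag_bits: int = 8
    # index -> (tag, predicted taken?)
    entries: Dict[int, Tuple[int, bool]] = field(default_factory=dict)

    def index(self, pc: int, history: int) -> int:
        mask = (1 << self.index_bits) - 1
        return (pc ^ fold(history, self.history_length, self.index_bits)) & mask

    def tag(self, pc: int, history: int) -> int:
        mask = (1 << self.tag_bits) - 1
        folded = fold(history, self.history_length, self.tag_bits)
        return ((pc >> self.index_bits) ^ folded) & mask

    def lookup(self, pc: int, history: int) -> Optional[bool]:
        entry = self.entries.get(self.index(pc, history))
        if entry is not None and entry[0] == self.tag(pc, history):
            return entry[1]
        return None  # tag mismatch or empty slot: no hit

    def install(self, pc: int, history: int, taken: bool) -> None:
        self.entries[self.index(pc, history)] = (self.tag(pc, history), taken)


def predict(pc: int, history: int, bimodal: List[bool],
            tables: List[TaggedTable]) -> bool:
    # First index comes from program counter bits only.
    prediction = bimodal[pc & (len(bimodal) - 1)]
    # Tables are ordered from shortest to longest history, so a later hit
    # overrides an earlier one and the longest-history hit wins.
    for table in tables:
        hit = table.lookup(pc, history)
        if hit is not None:
            prediction = hit
    return prediction


bimodal = [False] * 16
tables = [TaggedTable(4), TaggedTable(8), TaggedTable(16)]
pc, history = 0x4A, 0b1011001110001111
no_hits = predict(pc, history, bimodal, tables)      # bimodal fallback
tables[0].install(pc, history, True)
short_hit = predict(pc, history, bimodal, tables)    # T(1) hit overrides
tables[2].install(pc, history, False)
long_hit = predict(pc, history, bimodal, tables)     # T(3) hit overrides T(1)
```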

Flushing a fetch queue using predecode circuitry and prediction information

A data processing apparatus is provided. It includes control flow detection prediction circuitry that performs a presence prediction of whether a block of instructions contains a control flow instruction. A fetch queue stores a queue of indications of the instructions in association with prediction information, and the prediction information comprises the presence prediction. An instruction cache stores fetched instructions that have been fetched according to the fetch queue. Post-fetch correction circuitry receives the fetched instructions prior to the fetched instructions being received by decode circuitry. The post-fetch correction circuitry includes analysis circuitry that causes the fetch queue to be at least partly flushed in dependence on a type of a given fetched instruction and the prediction information associated with the given fetched instruction.

PARALLEL INSTRUCTION EXTRACTION METHOD AND READABLE STORAGE MEDIUM

The invention relates to the technical field of processors, and in particular to a method for extracting instructions in parallel and a readable storage medium. The method generates a valid vector of fetched instructions from the instruction end position vector s_mark_end, and uses logical “AND” and logical “OR” operations to perform, in parallel at each position, instruction decoding, instruction address calculation, and branch instruction target address calculation, so that multiple instructions are extracted in parallel. Because there is no serial dependence between the instructions, timing converges easily and a higher clock frequency can be obtained.
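A minimal sketch of deriving instruction start positions from an end position vector using only shift, OR, and AND. The abstract does not give the exact logic, so this assumes that the fetch window begins on an instruction boundary, that bit i of `s_mark_end` marks unit i as the last unit of an instruction, and that an instruction whose end falls outside the window is left for the next fetch. The function names are hypothetical.

```python
def start_vector(s_mark_end: int, width: int) -> int:
    """Bit i is set when an instruction starts at unit i of the fetch window.

    Unit 0 is assumed to be a start; every other start directly follows an end.
    """
    mask = (1 << width) - 1
    return ((s_mark_end << 1) | 1) & mask


def extract(units, s_mark_end):
    """Return the instructions fully contained in the fetch window."""
    width = len(units)
    starts = start_vector(s_mark_end, width)
    instructions = []
    # In hardware each position would be evaluated independently; the loops
    # here only emulate that per-position logic in software.
    for i in range(width):
        if not (starts >> i) & 1:
            continue
        j = i
        while j < width and not (s_mark_end >> j) & 1:
            j += 1
        if j < width:  # drop an instruction that runs past the window
            instructions.append(tuple(units[i:j + 1]))
    return instructions


window = ["a0", "a1", "b0", "c0", "c1", "d0"]
ends = 0b010110  # units 1, 2 and 4 end an instruction
starts = start_vector(ends, len(window))
decoded = extract(window, ends)
```

Here the start vector depends only on the end vector, not on the result of decoding any earlier instruction, which is the absence of serial dependence that the abstract emphasizes.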

Method and system for enhanced multi-address read operations in low pin count interfaces

A memory device supporting multi-address read operations improves throughput on a bi-directional serial port. The device includes a memory array and an input/output port having an input mode and an output mode. The input/output port has at least one signal line used alternately in both the input and output modes. A controller includes logic configured to execute a multi-address read operation in response to receiving a read command on the input/output port, the multi-address read operation including receiving a first address and a second address using the at least one signal line before outputting data.

Multi-table signature prefetch

Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique and second signature prefetch table circuitry indicates prefetch addresses for signatures generated using a second, different signature generation technique. Signature prefetch circuitry, in response to a prefetch training event, determines a first signature according to the first technique and a second signature according to the second technique and selects one but not both of the first and second signature prefetch tables to train using the first signature or the second signature.

Control of branch prediction for zero-overhead loop

In response to decoding a zero-overhead loop control instruction of an instruction set architecture, processing circuitry sets at least one loop control parameter for controlling execution of one or more iterations of a program loop body of a zero-overhead loop. Based on the at least one loop control parameter, loop control circuitry controls execution of the one or more iterations of the program loop body of the zero-overhead loop, the program loop body excluding the zero-overhead loop control instruction. Branch prediction disabling circuitry detects whether the processing circuitry is executing the program loop body of the zero-overhead loop associated with the zero-overhead loop control instruction, and dependent on detecting that the processing circuitry is executing the program loop body of the zero-overhead loop, disables branch prediction circuitry. This reduces power consumption during a zero-overhead loop when the branch prediction circuitry is unlikely to provide a benefit.
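A minimal sketch of the disabling behaviour, assuming the loop control parameters are a body start address, a body end address, and an iteration count, and assuming fixed 4-byte instructions. The class and attribute names are hypothetical; real hardware would gate the predictor's clock or lookups rather than set a flag.

```python
class ZeroOverheadLoopCore:
    def __init__(self, instr_size=4):
        self.instr_size = instr_size
        self.start = 0
        self.end = 0
        self.count = 0
        self.branch_prediction_enabled = True

    def decode_loop_control(self, start, end, iterations):
        # Set the loop control parameters from the decoded instruction.
        self.start, self.end, self.count = start, end, iterations

    def step(self, pc):
        """Process the instruction at `pc` and return the next fetch address."""
        in_body = self.count > 0 and self.start <= pc <= self.end
        # The predictor is disabled exactly while the loop body is executing.
        self.branch_prediction_enabled = not in_body
        if in_body and pc == self.end:
            self.count -= 1
            if self.count > 0:
                return self.start  # loop control circuitry redirects fetch
        return pc + self.instr_size


core = ZeroOverheadLoopCore()
core.decode_loop_control(start=0x200, end=0x204, iterations=2)
pc = 0x200
visited, enabled = [], []
for _ in range(5):
    visited.append(pc)
    pc = core.step(pc)
    enabled.append(core.branch_prediction_enabled)
```

The loop-back address comes from the loop control parameters alone, so the predictor contributes nothing inside the body and can stay off until the first instruction after the loop.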