Patent classifications
G06F9/3804
PROGRAM FLOW PREDICTION FOR LOOPS
Instruction processing circuitry comprises fetch circuitry to fetch instructions for execution; instruction decoder circuitry to decode fetched instructions; execution circuitry to execute decoded instructions; and program flow prediction circuitry to predict a next instruction to be fetched; in which the instruction decoder circuitry is configured to decode a loop control instruction in respect of a given program loop and to derive information from the loop control instruction for use by the program flow prediction circuitry to predict program flow for one or more iterations of the given program loop.
Coalescing adjacent gather/scatter operations
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
PARALLEL INSTRUCTION EXTRACTION METHOD AND READABLE STORAGE MEDIUM
The invention relates to the technical field of a processor, in particular to a method for parallel extracting instructions and a readable storage medium. The method generates a valid vector of fetched instructions according to the end position vector s_mark_end of the instruction, and performs parallel decoding of instructions at each position, calculation of instruction address and branch instruction target address operation through logical “AND” and logical “OR” operations. Ultimately, multiple instructions are fetched in parallel. The present invention is a method for generating a valid vector of fetching instructions according to the end position vector s_mark_end of the instruction, and extracting multiple instructions in parallel through logical “AND” and logical “OR” operations. The invention can extract a plurality of instructions in parallel, there is no serial dependence relationship between each instruction, and the time sequence is easy to converge, so a higher main frequency can be obtained.
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Multi-table signature prefetch
Techniques are disclosed relating to signature-based instruction prefetching. In some embodiments, processor pipeline circuitry executes a computer program that includes control transfer instructions, such that the execution follows a taken path through the computer program. First signature prefetch table circuitry indicates prefetch addresses for signatures generated using a first signature generation technique and second signature prefetch table circuitry indicates prefetch addresses for signatures generated using a second, different signature generation technique. Signature prefetch circuitry, in response to a prefetch training event, determines a first signature according to the first technique and a second signature according to the second technique and selects one but not both of the first and second signature prefetch tables to train using the first signature or the second signature.
Program flow prediction for loops
Instruction processing circuitry comprises fetch circuitry to fetch instructions for execution; instruction decoder circuitry to decode fetched instructions; execution circuitry to execute decoded instructions; and program flow prediction circuitry to predict a next instruction to be fetched; in which the instruction decoder circuitry is configured to decode a loop control instruction in respect of a given program loop and to derive information from the loop control instruction for use by the program flow prediction circuitry to predict program flow for one or more iterations of the given program loop.
Detecting a dynamic control flow re-convergence point for conditional branches in hardware
Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
COALESCING ADJACENT GATHER/SCATTER OPERATIONS
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
PREDICTING UPCOMING CONTROL FLOW
An apparatus has a fetch queue to identify a sequence of instructions to be fetched for execution and prediction circuitry to predict upcoming control flow and to control which instructions are identified in the fetch queue in dependence on the prediction. The prediction circuitry predicts multi-taken sequences which are sequences of instructions in which control flow is diverted by a first control flow changing instruction to a series of instructions terminating in a second control flow changing instruction that diverts control flow to a target address. The apparatus also has prediction confidence calculation circuitry to calculate confidence levels for respective multi-taken sequences. Each confidence level is indicative of a confidence in an accuracy of prediction of its respective multi-taken sequence. When the confidence level for a particular multi-taken sequence satisfies a prediction confidence condition, the prediction confidence tracking circuitry allows the particular multi-taken sequence to be predicted by the prediction circuitry. The prediction circuitry causes the series of instructions and the target instruction for the particular multi-taken sequence to be identified in the fetch queue when the prediction circuitry predicts the particular multi-taken sequence and further predictions to be made starting from the target address for the particular multi-taken sequence.
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.