Patent classifications
G06F9/3832
Apparatus and method for predicting source operand values and optimized processing of instructions
An apparatus and method are provided for processing instructions. The apparatus has execution circuitry for executing instructions, where each instruction requires an associated operation to be performed using one or more source operand values in order to produce a result value. Issue circuitry is used to maintain a record of pending instructions awaiting execution by the execution circuitry, and prediction circuitry is used to produce a predicted source operand value for a chosen pending instruction. Optimisation circuitry is then arranged to detect an optimisation condition for the chosen pending instruction when the predicted source operand value is such that, having regard to the associated operation for the chosen pending instruction, the result value is known without performing the associated operation. In response to detection of the optimisation condition, an optimisation operation is implemented instead of causing the execution circuitry to perform the associated operation in order to execute the chosen pending instruction. This can lead to significant performance and/or power consumption improvements.
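The optimisation condition described above can be illustrated with a minimal software sketch. The function names, the string opcodes, and the choice of multiply and bitwise AND with a predicted zero operand are assumptions made for illustration; the abstract does not say which operations or values qualify. Verifying the prediction afterwards is left out of the sketch.

```python
from typing import Optional

def shortcut_result(op: str, predicted_operand: int) -> Optional[int]:
    """Return the result if one predicted operand alone determines it, else None."""
    # x * 0 == 0 and x & 0 == 0 for any x, so the other operand is irrelevant.
    if op in ("mul", "and") and predicted_operand == 0:
        return 0
    return None

def execute(op: str, a: int, b: int) -> int:
    """Stand-in for the execution circuitry (hypothetical opcode set)."""
    if op == "mul":
        return a * b
    if op == "and":
        return a & b
    if op == "add":
        return a + b
    raise ValueError(f"unknown opcode: {op}")

def process(op: str, a: int, predicted_b: int) -> int:
    """Use the shortcut when the optimisation condition holds, otherwise execute."""
    known = shortcut_result(op, predicted_b)
    return known if known is not None else execute(op, a, predicted_b)
```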
Zero operand instruction conversion for accelerating sparse computations in a central processing unit pipeline
A processing device includes a zero detection circuit to determine that an operand of a first instruction is zero and instruction conversion logic coupled with the zero detection circuit to, in response to the zero detection circuit determining that the operand is zero, convert the first instruction to a register move instruction executable by the processing device.
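As a rough sketch of the conversion idea, the following models instructions as tuples and rewrites an add or multiply into a register move when one source register holds zero. The tuple encoding, the register names, and the set of opcodes handled are hypothetical and are not taken from the abstract.

```python
from typing import Dict, Tuple

# Hypothetical encoding: (opcode, dst, src1, src2) for ALU ops, (opcode, dst, src) for mov.
Instr = Tuple[str, ...]

def convert_if_zero(instr: Instr, regs: Dict[str, int]) -> Instr:
    """Rewrite add/mul into a register move when a source operand is zero."""
    op, dst, s1, s2 = instr
    zero1, zero2 = regs[s1] == 0, regs[s2] == 0
    if op == "add":
        # x + 0 == x, so the result is a copy of the non-zero source.
        if zero2:
            return ("mov", dst, s1)
        if zero1:
            return ("mov", dst, s2)
    elif op == "mul":
        # x * 0 == 0, so the result is a copy of the register that holds zero.
        if zero1:
            return ("mov", dst, s1)
        if zero2:
            return ("mov", dst, s2)
    return instr  # no zero operand detected: leave the instruction unchanged
```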
TECHNIQUES FOR NEURAL NETWORK EXECUTION UTILIZING MEMOIZATION
A system and method for reducing power consumption in processing artificial neural networks utilizes memoization techniques. The method includes receiving computer code representing a neural network model, the neural network model including an input layer having a first plurality of nodes, and an output layer having a second plurality of nodes; detecting in the computer code a cacheable block of instructions, the cacheable block of instructions including an input and an output, wherein the input and the output are local to the cacheable block of instructions; determining a first power consumption corresponding to retrieving a value from a value cache; determining a second power consumption corresponding to executing the cacheable block of instructions; and storing in the value cache an input value corresponding to the input and an output value corresponding to the output, in response to determining that the second power consumption is higher than the first power consumption.
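The cost comparison in the last step can be sketched as follows. The class name, the numeric cost parameters, and the use of a dictionary as the value cache are assumptions for illustration; the abstract does not specify how power consumption is measured or represented.

```python
from typing import Callable, Dict

class ValueCache:
    """Stores results only for blocks that cost more to execute than to look up."""

    def __init__(self, lookup_cost: float) -> None:
        self.lookup_cost = lookup_cost  # assumed cost of retrieving a cached value
        self.store: Dict[int, int] = {}

    def run(self, block: Callable[[int], int], exec_cost: float, x: int) -> int:
        if x in self.store:
            return self.store[x]  # cache hit: skip executing the block
        y = block(x)
        # Store the input/output pair only when executing costs more than a lookup.
        if exec_cost > self.lookup_cost:
            self.store[x] = y
        return y
```

With a lookup cost of 1.0, a block whose assumed execution cost is 5.0 runs once per distinct input, while a block costing 0.5 is re-executed every time.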
TECHNIQUES FOR OPTIMIZING NEURAL NETWORKS FOR MEMOIZATION USING SHIFTED VALUE LOCALIZATION
A system and method for increasing cache hits in a value cache utilizing memoization is disclosed. The method includes receiving an input matrix for a parallel processing circuitry, wherein the parallel processing circuitry is configured to process the input matrix with a second matrix; selecting a portion of the input matrix, wherein the portion includes a plurality of values; adjusting a first value of the plurality of values based on a second value of the plurality of values; generating a new input matrix based on the input matrix and the adjusted first value; and configuring the parallel processing circuitry to process the new input matrix with the second matrix.
DATA OPERATIONS AND FINITE STATE MACHINE FOR MACHINE LEARNING VIA BYPASS OF COMPUTATIONAL TASKS BASED ON FREQUENTLY-USED DATA VALUES
A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
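One way to picture the bypass is a small table of precomputed results for the most common input values. This is only a sketch under assumptions: the helper names, the use of a frequency count to pick FDVs, and the parameter `k` are hypothetical, and the abstract does not describe how FDVs are determined in hardware.

```python
from collections import Counter
from typing import Callable, Dict, List

def frequent_values(data: List[int], k: int) -> List[int]:
    """Pick the k most common values in the input as the FDVs."""
    return [value for value, _ in Counter(data).most_common(k)]

def run_with_bypass(data: List[int], task: Callable[[int], int], k: int = 2) -> List[int]:
    """Compute task(v) once per FDV, then bypass the task for every repeat of that value."""
    table: Dict[int, int] = {v: task(v) for v in frequent_values(data, k)}
    return [table[v] if v in table else task(v) for v in data]
```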
Apparatus and method for prefetching data items
Examples of the present disclosure relate to an apparatus comprising execution circuitry to execute instructions defining data processing operations on data items. The apparatus comprises cache storage to store temporary copies of the data items. The apparatus comprises prefetching circuitry to a) predict that a data item will be subject to the data processing operations by the execution circuitry by determining that the data item is consistent with an extrapolation of previous data item retrieval by the execution circuitry, and identifying that at least one control flow element of the instructions indicates that the data item will be subject to the data processing operations by the execution circuitry; and b) prefetch the data item into the cache storage.
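The two-part prediction (extrapolation of earlier retrievals plus a control flow indication) can be sketched as a single predicate. The constant-stride extrapolation and the boolean `control_flow_hint` are simplifying assumptions; the abstract does not specify the form of the extrapolation or of the control flow element.

```python
from typing import List

def should_prefetch(history: List[int], candidate: int, control_flow_hint: bool) -> bool:
    """Prefetch only if the address extrapolates the history AND control flow agrees."""
    if len(history) < 2:
        return False  # not enough accesses to extrapolate from
    stride = history[-1] - history[-2]
    extrapolated = history[-1] + stride
    return candidate == extrapolated and control_flow_hint
```

For example, `control_flow_hint` could stand for a loop branch that is expected to be taken again, so that the next element of an array walk is worth fetching.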
Apparatus for storing, reading and modifying constant values
A data processing system utilizes non-volatile storage to store constant values. An instruction decoder decodes program instructions to generate control signals to control processing circuitry to perform processing operations which may include processing operations corresponding to constant-using program instructions. Such constant-using program instructions may include one or more operation specifying fields and one or more argument specifying fields which control the processing circuitry to generate an output value equal to that given by reading one or more constant values from the non-volatile storage, optionally modifying such a value, and then performing the processing operation upon the value, or the modified value, to generate an output value.
Detecting misprediction when an additional branch direction prediction determined using value prediction is considered more accurate than an initial branch direction prediction
An apparatus has processing circuitry for executing instructions and fetch circuitry for fetching the instructions for execution. When a branch instruction is encountered by the fetch circuitry, it determines subsequent instructions to be fetched in dependence on an initial branch direction prediction for the branch instruction made by branch prediction circuitry. Value prediction circuitry is used to maintain a predicted result value for one or more instructions, and dispatch circuitry maintains a record of pending instructions that have been fetched by the fetch circuitry and are awaiting execution by the processing circuitry, and selects pending instructions from the record for dispatch to the processing circuitry. When a given instruction whose predicted result value is maintained by the value prediction circuitry has a dependent instruction whose outcome is dependent on a result value of the given instruction, the dispatch circuitry may be arranged to enable speculative execution of that dependent instruction using the predicted result value of the given instruction. Analysis circuitry is arranged, when the dependent instruction is the branch instruction, to detect a mispredict condition when an additional branch direction prediction for the branch instruction determined using the predicted result value for the given instruction is considered more accurate than the initial branch direction prediction, and the additional branch direction prediction differs from the initial branch direction prediction. On detection of the mispredict condition, a control signal is issued to indicate that the branch instruction has been mispredicted.
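The mispredict condition reduces to two checks, sketched below. Representing "considered more accurate" as a comparison of numeric confidence scores is an assumption for illustration; the abstract does not say how relative accuracy is judged.

```python
def detect_mispredict(initial_taken: bool, initial_conf: float,
                      value_taken: bool, value_conf: float) -> bool:
    """Flag a mispredict when the value-based prediction is trusted more and disagrees."""
    more_accurate = value_conf > initial_conf  # assumed accuracy measure
    differs = value_taken != initial_taken
    return more_accurate and differs
```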
Steering a history buffer entry to a specific recovery port during speculative flush recovery lookup in a processor
A computer system, processor, and method for processing information is disclosed that includes reading out a plurality of entries in a history buffer prior to initiating a flush recovery process; initiating the flush recovery process; determining which of the history buffer entries read out of the history buffer should be recovered; and sending information associated with the history buffer entries to be recovered to one or more history buffer recovery ports. In one or more embodiments, the history buffer entries are continually read out in response to a processor and history buffer entries read out from the history buffer are directed to a specific history buffer recovery port associated with a mapper of a specific logical register.
Determining prefetch patterns with discontinuous strides
An apparatus and method are provided. The apparatus comprises storage circuitry to store a plurality of data elements. Processing circuitry executes a stream of instructions comprising access instructions that access some of the data elements at given locations. Training circuitry determines a pattern of the given locations based on the access instructions. Prefetch circuitry performs prefetches based on the pattern and filter circuitry filters the access instructions used by the training circuitry to determine the pattern by including discontinuous access instructions whose given location raises a discontinuity with the given location of a previous access instruction. In this way, it is possible to perform prefetching by calculating, rather than guessing at, a cumulative stride between the access instructions.
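One possible reading of the cumulative stride is sketched below: accesses whose step from the previous access differs from a regular inner stride are kept by the filter, and the distance between consecutive kept accesses gives the cumulative stride. The function names and the assumption that the inner stride is already known are hypothetical; the abstract does not describe how the discontinuity is detected.

```python
from typing import List, Optional

def discontinuous_accesses(addresses: List[int], inner_stride: int) -> List[int]:
    """Keep accesses whose step from the previous access breaks the inner stride."""
    return [cur for prev, cur in zip(addresses, addresses[1:])
            if cur - prev != inner_stride]

def cumulative_stride(addresses: List[int], inner_stride: int) -> Optional[int]:
    """Distance between consecutive discontinuities, if it is consistent."""
    jumps = discontinuous_accesses(addresses, inner_stride)
    if len(jumps) < 2:
        return None  # too few discontinuities to calculate a stride
    deltas = {b - a for a, b in zip(jumps, jumps[1:])}
    return deltas.pop() if len(deltas) == 1 else None
```

In an access stream that walks three 8-byte elements per row of a 100-byte-wide structure, the filter would keep the first access of each new row, and the calculated cumulative stride would be the row pitch.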