G06F9/3814

FETCHED DATA IN AN ULTRA-SHORT PIPED LOAD STORE UNIT
20170351521 · 2017-12-07 ·

Techniques are disclosed for receiving an instruction for processing data that includes a plurality of sectors. A method includes decoding the instruction to determine which of the plurality of sectors are needed to process the instruction and fetching at least one of the plurality of sectors from memory. The method includes determining whether each sector that is needed to process the instruction has been fetched. If all sectors needed to process the instruction have been fetched, the method includes transmitting a sector valid signal and processing the instruction. If all sectors needed to process the instruction have not been fetched, the method includes blocking a data valid signal from being transmitted, fetching an additional one or more of the plurality of sectors until all sectors needed to process the instruction have been fetched, transmitting a sector valid signal, and reissuing and processing the instruction using the fetched sectors.

Queued instruction re-dispatch after runahead

Various embodiments of microprocessors and methods of operating a microprocessor during runahead operation are disclosed herein. One example method of operating a microprocessor includes identifying a runahead-triggering event associated with a runahead-triggering instruction and, responsive to identification of the runahead-triggering event, entering runahead operation and inserting the runahead-triggering instruction along with one or more additional instructions in a queue. The example method also includes resuming non-runahead operation of the microprocessor in response to resolution of the runahead-triggering event and re-dispatching the runahead-triggering instruction along with the one or more additional instructions from the queue to the execution logic.

Microarchitectural sensitive tag flow
11263015 · 2022-03-01 · ·

Described herein are systems and methods for microarchitectural sensitive tag flow. For example, some methods include detecting dependence of data stored in a second data storage circuitry on the first instruction, where the first instruction will output a value to be stored in the second data storage circuitry, and wherein the second data storage circuitry is associated with a third tag indicating whether the second data storage circuitry has been designated as storing sensitive data; responsive to the dependence of data stored in the second data storage circuitry on the first instruction, checking whether the second tag indicates a sensitive instruction; and, responsive to the second tag indicating a sensitive instruction, updating the third tag to indicate that data stored in the second data storage circuitry has been designated as sensitive.

MERGING STATUS AND CONTROL DATA IN A RESERVATION STATION

Embodiments herein describe a reservation station (RS) in a processor that merges control data from multiple sources into a merged control data value. Before an instruction issues, the RS gathers and saves control data indicating how the instruction is to be executed. This control data may be saved in control registers. An instruction, however, can update many different types of status control bits in these registers. As such, the RS may store different types of control data for an instruction. Instead of the RS containing multiple registers and data paths for every type of control data, the embodiments herein describe merge logic in the RS that permits control data from different sources to be merged into a single control data value. Once the instruction is issued, the RS passes the merged control data value to an execution unit for processing.

Data processing apparatus and data processing method
09804958 · 2017-10-31 · ·

A data processing apparatus for accessing a plurality of memories is provided. The data processing apparatus includes a function control circuitry and an address generation circuitry. The function control circuitry is utilized to record a first memory address where a first function is implemented after the first function is implemented and to determine which one of the plurality of memories is a target memory according to the first memory address. The address generation circuitry is utilized to output the first memory address to the target memory. In addition, the function control circuitry is configured to determine the target memory in the same processing cycle in which the address generation circuitry is configured to output the first memory address.

SINGLE-THREAD SPECULATIVE MULTI-THREADING
20170337062 · 2017-11-23 ·

A processor includes a pipeline and control circuitry. The pipeline is configured to process instructions of program code and includes one or more fetch units. The control circuitry is configured to predict at run-time one or more future flow-control traces to be traversed in the program code, to define, based on the predicted flow-control traces, two or more regions of the program code from which instructions are to be fetched, wherein the number of regions is greater than the number of fetch units, and to instruct the pipeline to fetch instructions alternately from the two or more regions of the program code using the one or more fetch units, and to process the fetched instructions.

Low latency fetch circuitry for compute kernels
11256510 · 2022-02-22 · ·

Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterate through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.

System, apparatus and method for program order queue (POQ) to manage data dependencies in processor having multiple instruction queues

In one embodiment, an apparatus includes: a plurality of registers; a first instruction queue to store first instructions; a second instruction queue to store second instructions; a program order queue having a plurality of portions each associated with one of the plurality of registers, each of the portions having entries to store a state of an instruction, the state comprising an encoding of a use of the register by the instruction and a source instruction queue for the instruction; and a dispatcher to dispatch for execution the first and second instructions from the first and second instruction queues based at least in part on information stored in the program order queue, to manage instruction dependencies between the first instructions and the second instructions. Other embodiments are described and claimed.

Low complexity out-of-order issue logic using static circuits

Instruction issue circuits are disclosed that are configured to issue multiple instructions within a superscalar pipeline of a microprocessor. The instruction issue circuit includes an instruction queue that stores instructions. A ready generation circuit is operably associated with the instruction queue and generates ready signals that indicate which instructions in the instruction queue are ready for execution. To simplify the instruction issue circuit, the instruction issue circuit has group blocks. Each group block receives a different group of the ready signals corresponding to a different group of the instructions. Each group block generates a group output indicating a group set within the corresponding group of the instructions that has a highest instruction execution priority and are ready for execution. By splitting the ready signals into groups, the groups of ready signals can be processed in parallel thereby reducing both the resulting delay and complexity of the instruction issue circuit.

Instruction dispatch for superscalar processors

The present disclosure relates to instruction dispatch mechanisms for superscalar processors having a plurality of functional units for executing operations simultaneously. Each particular functional unit of the plurality of functional units may be configured to output a capability vector indicating a set of operations that the particular functional unit is currently available to perform. As instructions are received in an issue queue, the functional unit to execute the instruction is selected by comparing capabilities required by the instruction to the asserted capabilities of each of the functional units. A functional unit may reset or de-assert a particular functionality while performing an operation and then re-assert the capability when the instruction is completed. A result of the operation may be stored in a skid buffer for at least as long as the chain execution time in order to avoid resource hazards are a write port of the vector register file.