G06F9/3854

Managing an effective address table in a multi-slice processor

Methods and apparatus for managing an effective address table (EAT) in a multi-slice processor including receiving, from an instruction sequence unit, a next-to-complete instruction tag (ITAG); obtaining, from the EAT, a first ITAG from a tail-plus-one EAT row, wherein the EAT comprises a tail EAT row that precedes the tail-plus-one EAT row; determining, based on a comparison of the next-to-complete ITAG and the first ITAG, that the tail EAT row has completed; and retiring the tail EAT row based on the determination.

Out of order store commit

Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, wherein the first store instruction is older than the second store instruction. In response to determining the second store instruction is ready to commit to the memory hierarchy, the processor may allow the second store instruction to commit before the first store instruction, in response to determining that all store instructions in the store queue older than the second store instruction are non-speculative. However, if it is determined that at least one store instruction in the store queue older than the second store instruction is speculative, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.

Programmable graphics processor for multithreaded execution of programs

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Computer architecture for speculative parallel execution

A system for parallel execution of program portions on different processors permits speculative execution of the program portions before a determination is made as to whether there is a data dependency between the portion and older but unexecuted portions. Before commitment of the program portions in a sequential execution order, data dependencies are resolved through a token system that tracks read access and write access to data elements accessed by the program portions.

Reducing power consumption in a multi-slice computer processor

Reducing power consumption in a multi-slice computer processor that includes a re-order buffer and an architected register file, including: designating an entry in the re-order buffer as being invalid and unwritten; assigning a pending instruction to the entry in the re-order buffer; responsive to assigning the pending instruction to the entry in the re-order buffer, designating the entry as being valid; writing data generated by executing the pending instruction into the re-order buffer; and responsive to writing data generated by executing the pending instruction into the re-order buffer, designating the entry as being written.

Annotation logic for dynamic instruction lookahead distance determination

A processor having an instruction cache for storing a plurality of instructions is provided. The processor further includes annotation logic configured to determine a lookahead distance associated with an instruction and annotate the at least one instruction cache with the lookahead distance. The lookahead distance may correspond to a number of instructions that separates an instruction that references a register from the most recent register definition. The lookahead distance may indicate the shortest distance to a later instruction that references a register that this instruction defines.

Speeding Up Younger Store Instruction Execution After a Sync Instruction

Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction is provided. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.

Stateless capture of data linear addresses during precise event based sampling

A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.

Processing an encoding format field to interpret header information regarding a group of instructions

A method including fetching information regarding a group of instructions, where the group of instructions is configured to execute atomically by a processor, including an encoding format for the information regarding the group of instructions, is provided. The method further includes processing the encoding format to interpret the information regarding the group of instructions.

Variable latency instructions

Techniques related to executing instructions by a processor comprising receiving a first instruction for execution, determining a first latency value based on an expected amount of time needed for the first instruction to be executed, storing the first latency value in a writeback queue, beginning execution of the first instruction on the instruction execution pipeline, adjusting the latency value based on an amount of time passed since beginning execution of the first instruction, outputting a first result of the first instruction based on the latency value, receiving a second instruction, determining that the second instruction is a variable latency instruction, storing a ready value indicating that a second result of the second instruction is not ready in the writeback queue, beginning execution of the second instruction on the instruction execution pipeline, updating the ready value to indicate that the second result is ready, and outputting the second result.