G06F9/382

Techniques for instruction group formation for decode-time instruction optimization based on feedback

A technique of processing instructions for execution by a processor includes determining whether a first property of a first instruction and a second property of a second instruction are compatible. The first instruction and the second instruction are grouped in an instruction group in response to the first and second properties being compatible and a feedback value generated by a feedback function indicating the instruction group has been historically beneficial with respect to a benefit metric of the processor. Group formation for the first and second instructions is performed according to another criterion, in response to the first and second properties being incompatible or the feedback value indicating the grouping of the first and second instructions has not been historically beneficial.
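The grouping decision described above can be sketched as follows. This is a minimal illustration, not the patented mechanism: the 2-bit saturating counter used as the feedback function, the register-dependence compatibility test, and all names (`GroupFeedback`, `compatible`, `form_group`) are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class GroupFeedback:
    """Hypothetical feedback table: one 2-bit saturating counter per instruction pair."""
    threshold: int = 2
    counters: dict = field(default_factory=dict)

    def beneficial(self, pair):
        # Assumption: pairs never seen before start at the threshold (optimistic).
        return self.counters.get(pair, self.threshold) >= self.threshold

    def update(self, pair, was_beneficial):
        # Saturate the counter in the range 0..3.
        value = self.counters.get(pair, self.threshold)
        value = min(3, value + 1) if was_beneficial else max(0, value - 1)
        self.counters[pair] = value


def compatible(first, second):
    # Assumed property check: the second instruction reads what the first writes.
    return first["dest"] in second["srcs"]


def form_group(first, second, feedback):
    """Return the group if both conditions hold, else None (use another criterion)."""
    pair = (first["pc"], second["pc"])
    if compatible(first, second) and feedback.beneficial(pair):
        return [first, second]
    return None
```

A caller would invoke `feedback.update(pair, ...)` after measuring the benefit metric for a group that was formed, so that pairs that repeatedly fail to help stop being grouped.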

Speculative execution of correlated memory access instruction methods, apparatuses and systems

A processor core, a processor, an apparatus, and an instruction processing method are disclosed. The processor core includes: an instruction fetch unit, where the instruction fetch unit includes a speculative execution predictor and the speculative execution predictor compares a program counter of a memory access instruction with a table entry stored in the speculative execution predictor and marks the memory access instruction; a scheduler unit adapted to adjust a send order of marked memory access instructions and send the marked memory access instructions according to the send order; and an execution unit adapted to execute the memory access instructions according to the send order. In the instruction fetch unit, a memory access instruction is marked according to a speculative execution prediction result. In the scheduler unit, a send order of memory access instructions is determined according to the marked memory access instruction and the memory access instructions are sent. In the execution unit, the memory access instructions are executed according to the send order. This helps avoid re-execution of a memory access instruction due to an address correlation of the memory access instruction. Consequently, this eliminates the need to add an idle cycle in an instruction pipeline and the need to refresh the pipeline to clear a memory access instruction that is incorrectly speculated.
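A toy model of the mark-then-schedule flow is sketched below. It assumes, purely for illustration, that a hit in the predictor table means the access is address-correlated and must keep program order, while unmarked loads may be sent early. The instruction dictionaries and the function names `mark` and `send_order` are hypothetical.

```python
def mark(instructions, predictor_table):
    """Fetch stage: tag each memory access whose PC hits in the predictor table."""
    return [dict(ins, marked=ins["pc"] in predictor_table) for ins in instructions]


def send_order(marked):
    """Scheduler stage: send unmarked loads first, keep everything else in program order."""
    early = [i for i in marked if i["op"] == "load" and not i["marked"]]
    held = [i for i in marked if i["op"] != "load" or i["marked"]]
    return early + held
```

With a store at `0x10` followed by loads at `0x14` (marked) and `0x18` (unmarked), the sketch sends `0x18` first and holds `0x14` behind the store.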

POST-TESSELLATION BLENDING IN A GPU PIPELINE
20220270320 · 2022-08-25

Implementations of blender hardware perform both domain shading and blending; whilst some vertices may not require blending, all vertices require domain shading. The blender hardware includes a cache and/or a content addressable memory, and these data structures are used to reduce duplicate domain shading operations.
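The role of the cache in avoiding duplicate domain shading can be illustrated with a small software analogue. This is a sketch under assumptions: the key layout `(patch, u, v)`, the class name `DomainShadeCache`, and the linear `blend` helper are not taken from the abstract.

```python
class DomainShadeCache:
    """Memoizes domain shader results so a shared vertex is shaded only once."""

    def __init__(self, shade_fn):
        self.shade_fn = shade_fn   # stand-in for the domain shader
        self.cache = {}            # key: (patch, u, v) -> shaded vertex
        self.shade_calls = 0       # counts actual shader invocations

    def shade(self, key):
        if key not in self.cache:
            self.shade_calls += 1
            self.cache[key] = self.shade_fn(*key)
        return self.cache[key]


def blend(a, b, weight):
    """Linear blend of two shaded vertices; only needed for some vertices."""
    return tuple((1 - weight) * x + weight * y for x, y in zip(a, b))
```

Every vertex goes through `shade`, while `blend` is applied only where needed, mirroring the observation that all vertices require domain shading but not all require blending.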

System for providing trace data in a data processor having a pipelined architecture

The invention is a method and system for providing trace data in a pipelined data processor. Aspects of the invention include providing a trace pipeline in parallel to the execution pipeline, providing trace information on whether conditional instructions complete or not, providing trace information on the interrupt status of the processor, replacing instructions in the processor with functionally equivalent instructions that also produce trace information and modifying the scheduling of instructions in the processor based on the occupancy of a trace output buffer.

Arithmetic processing unit and control method for arithmetic processing unit
11249763 · 2022-02-15

An arithmetic processing unit includes an instruction decoder which decodes a fetch instruction to issue an execution instruction; a reservation station which temporarily stores the execution instruction; and an arithmetic unit which executes the execution instruction, and the fetch instruction includes a multi-flow instruction which is divided into divided instructions and a single instruction. The instruction decoder includes: a pre-decoder including N number of slots each of which divides the multi-flow instruction into divided instructions; a main decoder including N number of slots each of which decodes the instructions to issue an execution instruction; and a pre-decoder buffer including N−K number of slots each of which temporarily stores instructions in the pre-decoder. The instruction decoder repeats transferring the divided instructions and the single instructions from the slots of the pre-decoder and the slots of the pre-decoder buffer to the main decoder as much as possible in order.

Opportunity multithreading in a multithreaded processor with instruction chaining capability

A computing device determines that a current software thread of a plurality of software threads having an issuing sequence does not have a first instruction waiting to be issued to a hardware thread during a clock cycle. The computing device identifies one or more alternative software threads in the issuing sequence having instructions waiting to be issued. The computing device selects, during the clock cycle, a second instruction from a second software thread among the one or more alternative software threads in view of determining that the second instruction has no dependencies with any other instructions among the instructions waiting to be issued. Dependencies are identified by the computing device in view of the values of a chaining bit extracted from each of the instructions waiting to be issued. The computing device issues the second instruction to the hardware thread.
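The selection step can be sketched as below. The encoding is an assumption made for illustration: each software thread is a queue of instruction dictionaries, and `chain_bit == 0` is taken to mean "no dependency on other waiting instructions". The function name `select_instruction` is hypothetical.

```python
def select_instruction(threads, current):
    """Pick the instruction to issue this cycle.

    threads: list of per-thread instruction queues, in issuing-sequence order.
    current: index of the thread whose turn it is.
    Returns (thread_index, instruction) or None if nothing can issue.
    """
    if threads[current]:
        return current, threads[current].pop(0)
    # Current thread has nothing waiting: scan the alternatives in sequence order.
    n = len(threads)
    for offset in range(1, n):
        idx = (current + offset) % n
        queue = threads[idx]
        # Assumption: a chaining bit of 0 marks an instruction with no dependencies.
        if queue and queue[0]["chain_bit"] == 0:
            return idx, queue.pop(0)
    return None
```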

ADVANCED PROCESSOR ARCHITECTURE
20210406027 · 2021-12-30

The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
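The three steps of the method (operand lookup, availability check, issue with table update) can be sketched as follows. The data layout is an assumption: the RPT is modeled as a dictionary from register name to a `("exu", index)` reference, operands without an entry fall back to a hypothetical `("regfile", name)` reference, and `exu_busy` is a simple list of flags.

```python
def issue(instr, rpt, exu_busy):
    """Try to issue one instruction; return (exu_index, operand_refs) or None."""
    # Step 1: look up operand sources in the Register Positioning Table.
    sources = [rpt.get(reg, ("regfile", reg)) for reg in instr["srcs"]]
    # Step 2: check for an available Execution Unit.
    for exu, busy in enumerate(exu_busy):
        if not busy:
            # Step 3: issue, and record where the result register now lives.
            exu_busy[exu] = True
            rpt[instr["dest"]] = ("exu", exu)
            return exu, sources
    return None  # no EXU free this cycle
```

A later instruction that reads the result register would find the producing EXU in the RPT rather than going through the register file.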

STREAMING ENGINE WITH ERROR DETECTION, CORRECTION AND RESTART
20210390018 · 2021-12-16

Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a stream head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed when data is stored in the stream buffer and are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.
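The store-time and transfer-time parity check can be sketched as follows. This is a simplified model assuming one even-parity bit per data word; the names `StreamBuffer`, `ParityFault`, and `transfer` are hypothetical, and the restart itself is left to the caller.

```python
def parity(word):
    """Even-parity bit of an integer word: 1 if the number of set bits is odd."""
    return bin(word).count("1") & 1


class ParityFault(Exception):
    """Signals a mismatch; index identifies the element to restart fetching from."""

    def __init__(self, index):
        super().__init__(f"parity fault at element {index}")
        self.index = index


class StreamBuffer:
    def __init__(self):
        self.slots = []  # each slot holds [data_word, stored_parity]

    def store(self, word):
        # First parity: formed on storage and kept alongside the data.
        self.slots.append([word, parity(word)])

    def transfer(self, index):
        # Second parity: recomputed on transfer to the stream head register.
        word, stored = self.slots[index]
        if parity(word) != stored:
            raise ParityFault(index)
        return word
```

A caller that catches `ParityFault` can use `fault.index` to refetch the stream starting at the faulting element.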

Processor with a program counter increment based on decoding of predecode bits
11200059 · 2021-12-14

A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.
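One way to picture the predecode-driven increment is the sketch below. The abstract does not say which fields are predecoded or how they affect the increment, so everything concrete here is an assumption: 4-byte instructions, and a hypothetical top bit that indicates the next fetched instruction can be consumed in the same cycle.

```python
INSTR_BYTES = 4  # assumed fixed instruction length


def predecode_pairable(instr):
    """Hypothetical predecode field: bit 31 set means 'consume the next one too'."""
    return (instr >> 31) & 1 == 1


def next_pc(pc, fetched):
    """Advance the PC by the number of fetched instructions actually consumed."""
    consumed = 1
    for instr in fetched[:-1]:
        if not predecode_pairable(instr):
            break
        consumed += 1
    return pc + consumed * INSTR_BYTES
```

With two instructions fetched at `0x1000`, the PC advances to `0x1008` when the first one is marked pairable and to `0x1004` otherwise.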

Post-tessellation blending in a GPU pipeline

Implementations of blender hardware perform both domain shading and blending; whilst some vertices may not require blending, all vertices require domain shading. The blender hardware includes a cache and/or a content addressable memory, and these data structures are used to reduce duplicate domain shading operations.