Patent classifications
G06F9/3853
PARALLEL DISPATCHING OF MULTI-OPERATION INSTRUCTIONS IN A MULTI-SLICE COMPUTER PROCESSOR
Parallel dispatching of multi-operation instructions in a multi-slice computer processor, including: determining whether an instruction must be broken into a plurality of smaller operations; marking each of the smaller operations as instructions to be dispatched in parallel; determining whether each of the operations can be dispatched to distinct instruction issue queues during a same clock cycle; and responsive to determining that each of the operations can be dispatched to distinct instruction issue queues during the same clock cycle, dispatching each of the operations to distinct instruction issue queues during the same clock cycle.
GPU instruction storage
Techniques are disclosed relating to low-level instruction storage in a graphics unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the hazard circuitry is configured to generate hazard information that specifies dependencies between ones of the decoded graphics instructions in the same clause. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause and hazard information generated by the decode circuitry for the clause. This may reduce power consumption, in some embodiments, by reducing hazard checking when clauses are executed multiple times.
PROGRAM COUNTER (PC)-RELATIVE LOAD AND STORE ADDRESSING
Load store addressing can include a processor, which fuses two consecutive instruction determined to be prefix instructions and treats the two instructions as a single fused instruction. The prefix instruction of the fused instruction is auto-finished at dispatch time in an issue unit of the processor. A suffix instruction of the fused instruction and its fields and the prefix instruction's fields are issued from an issue queue of the issue unit, wherein an opcode of the suffix instruction is issued to a load store unit of the processor, and fields of the fused instruction are issued to the execution unit of the processor. The execution unit forms operands of the suffix instruction, at least one operand formed based on a current instruction address of the single fused instruction. The load store unit executes the suffix instruction using the operands formed by the execution unit.
Low energy accelerator processor architecture with short parallel instruction word
Methods and apparatus for a low energy accelerator processor architecture with short parallel instruction word. An integrated circuit includes a system bus having a data width N, where N is a positive integer; a central processor unit coupled to the system bus and configured to execute instructions retrieved from a memory coupled to the system bus; and a low energy accelerator processor coupled to the system bus and configured to execute instruction words retrieved from a low energy accelerator code memory, the low energy accelerator processor having a plurality of execution units including a load store unit, a load coefficient unit, a multiply unit, and a butterfly/adder ALU unit, each of the execution units configured to perform operations responsive to op-codes decoded from the retrieved instruction words, wherein the width of the instruction words is equal to the data width N. Additional methods and apparatus are disclosed.
Method for performing dual dispatch of blocks and half blocks
A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.
Method and apparatus for enabling a processor to generate pipeline control signals
A chaining bit decoder of a computer processor receives an instruction stream. The chaining bit decoder selects a group of instructions from the instruction stream. The chaining bit decoder extracts a designated bit from each instruction of the instruction stream to produce a sequence of chaining bits. The chaining bit decoder decodes the sequence of chaining bits. The chaining bit decoder identifies zero or more instruction stream dependencies among the selected group of instructions in view of the decoded sequence of chaining bits. The chaining bit decoder outputs control signals to cause one or more pipelines stages of the processor to execute the selected group of instructions in view of the identified zero or more instruction stream dependencies among the group sequence of instructions.
Opportunity multithreading in a multithreaded processor with instruction chaining capability
A computing device determines that a current software thread of a plurality of software threads having an issuing sequence does not have a first instruction waiting to be issued to a hardware thread during a clock cycle. The computing device identifies one or more alternative software threads in the issuing sequence having instructions waiting to be issued. The computing device selects, during the clock cycle by the computing device, a second instruction from a second software thread among the one or more alternative software threads in view of determining that the second instruction has no dependencies with any other instructions among the instructions waiting to be issued. Dependencies are identified by the computing device in view of the values of a chaining bit extracted from each of the instructions waiting to be issued. The computing device issues the second instruction to the hardware thread.
Handling and fusing load instructions in a processor
A system, processor, and/or technique configured to: determine whether two or more load instructions are fusible for execution in a load store unit as a fused load instruction; in response to determining that two or more load instructions are fusible, transmit information to process the two or more fusible load instructions into a single entry of an issue queue; issue the information to process the two or more fusible load instructions from the single entry in the issue queue as a fused load instruction to the load store unit using a single issue port of the issue queue, wherein the fused load instruction contains the information to process the two or more fusible load instructions; execute the fused load instruction in the load store unit; and write back data obtained by executing the fused load instruction simultaneously to multiple entries in the register file.
COALESCING ADJACENT GATHER/SCATTER OPERATIONS
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
Conditional execution specification of instructions using conditional extension slots in the same execute packet in a VLIW processor
In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processing core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.