Patent classifications
G06F9/3826
Pipelining out-of-order instructions
Systems, methods and computer program product provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file.
Power saving by reusing results of identical micro-operations
A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by a processing pipeline would give the same result as an earlier micro-operation. If so, then the current micro-operation is passed through the processing pipeline, with at least one pipeline stage passed by the current micro-operation being placed in a power saving state during a processing cycle in which the current micro-operation is at that pipeline stage. The result of the earlier micro-operation is then output as a result of said current micro-operation. This allows power consumption to be reduced by not repeating the same computation.
PREVENTING PREMATURE READS FROM A GENERAL PURPOSE REGISTER
Methods and apparatus for preventing premature reads from a general purpose register (GPR) including receiving an instruction comprising a source operand identifying a source GPR entry; setting a read-enabled flag based on a value in a particular entry of a source ready vector; if the read-enabled flag indicates data in the source GPR entry is ready for reading, dispatching the received instruction, including performing a read operation of the data in the source GPR entry; and if the read-enabled flag indicates data in the source GPR entry is not ready for reading, dispatching the received instruction without performing a read operation of the data in the source GPR entry.
Systems and methods for faster read after write forwarding using a virtual address
Methods for read after write forwarding using a virtual address are disclosed. A method includes determining when a virtual address has been remapped from corresponding to a first physical address to a second physical address and determining if all stores occupying a store queue before the remapping have been retired from the store queue. Loads that are younger than the stores that occupied the store queue before the remapping are prevented from being dispatched and executed until the stores that occupied the store queue before the remapping have left the store queue and become globally visible.
Opportunity multithreading in a multithreaded processor with instruction chaining capability
A computing device determines that a current software thread of a plurality of software threads having an issuing sequence does not have a first instruction waiting to be issued to a hardware thread during a clock cycle. The computing device identifies one or more alternative software threads in the issuing sequence having instructions waiting to be issued. The computing device selects, during the clock cycle by the computing device, a second instruction from a second software thread among the one or more alternative software threads in view of determining that the second instruction has no dependencies with any other instructions among the instructions waiting to be issued. Dependencies are identified by the computing device in view of the values of a chaining bit extracted from each of the instructions waiting to be issued. The computing device issues the second instruction to the hardware thread.
ADVANCED PROCESSOR ARCHITECTURE
The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
Decoupling Atomicity from Operation Size
In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
Exception Handling
A processor includes: a memory; an execution pipeline having a plurality of pipeline stages for executing an operation on data provided to the execution pipeline and storing a result of the operation into the memory; a receive pipeline having a plurality of pipeline stages for handling incoming data to the processor and storing the incoming data into memory; context status storage for holding an exception indicator of an exception encountered by the receive pipeline whilst it is handling incoming data; the receive pipeline being configured to determine that an exception has been encountered in one of its pipeline stages and to delay committing the exception indicator to the context status storage until a final one of its pipeline stages and to continue to receive and store incoming data into the memory until the exception indicator has been committed.
Efficient scheduling of load instructions
When scheduling instructions for execution on a computing device, load instructions are processed before their dependent computational instructions. This can result in the load instructions being scheduled in a non-optimal order. To schedule the load instructions in a preferred order, a scheduler can speculatively schedule the load instructions without committing to their order. Subsequently, when the scheduler encounters the dependent computational instructions, the scheduler can reorder the speculatively scheduled load instructions according to the execution order of the dependent computational instructions.
Method of managing multi-tier memory displacement using software controlled thresholds
A computing system includes a memory controller having a plurality of bypass parameters set by a software program, a thresholds matrix to store threshold values selectable by the plurality of bypass parameters, and a bypass function to determine whether a first cache line is to be displaced with a second cache line in a first memory or the first cache line remains in the first memory and the second cache line is to be accessed by at least one of a processor core and the cache from a second memory.