Patent classifications
G06F9/3873
TECHNIQUES TO DERIVE EFFICIENT CONVERSION AND/OR COLOR CORRECTION OF VIDEO DATA
The present disclosure describes techniques for removing unnecessary processing stages from a graphics processing pipeline based on the format of data passed between the stages. Starting with a stage at a middle point in a pipeline, formats of data that are input to and output from the middle stage may be compared to each other. If the formats match, the middle stage may be removed from the pipeline. Thereafter, the format of data input to a pair of middle stages of the pipeline and output from the pipeline may be compared and, if they match, the middle pair may be deleted. This process may repeat until a middle pair is found where no match occurs between the input and output format. The remaining stages of the pipeline may be retained. In cases where a pipeline is not symmetrical, the formats of data at each node may be compared to each other. If a node possesses a format that does not match the format of any other node, then the stages between the node and its closest endpoint in the pipeline may be retained.
Processor with hybrid pipeline capable of operating in out-of-order and in-order modes
A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
Vector processor supporting linear interpolation on multiple dimensions
Techniques are disclosed for a vector processor architecture that enables data interpolation in accordance with multiple dimensions, such as one-, two-, and three-dimensional linear interpolation. The vector processor architecture includes a vector processor and accompanying vector addressable memory that enable a simultaneous retrieval of multiple entries in the vector addressable memory to facilitate linear interpolation calculations. The vector processor architecture vastly increases the speed in which such calculations may occur compared to conventional processing architectures. Example implementations include the calculation of digital pre-distortion (DPD) coefficients for use with radio frequency (RF) transmitter chains to support multi-band applications.
Execution Unit with Selective Instruction Pipeline Bypass
Execution unit and floating point unit pipelines may be selectively bypassed in some cases to improve execution unit performance and reduce power consumption. For example, instructions that do not need the operations provided by those pipelines can benefit from bypassing pipelines. By performing a write as soon as the operands are read from a register file, power consumption may be reduced and performance may be increased in some embodiments. In addition, the write can be done in the same cycle with the read so that no additional cycles may be needed in some embodiments for the write after read.
Variable length execution pipeline
In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state, before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, a total throughput of execution on the pipeline increases.
Techniques to derive efficient conversion and/or color correction of video data
The present disclosure describes techniques for removing unnecessary processing stages from a graphics processing pipeline based on the format of data passed between the stages. Starting with a stage at a middle point in a pipeline, formats of data that are input to and output from the middle stage may be compared to each other. If the formats match, the middle stage may be removed from the pipeline. Thereafter, the format of data input to a pair of middle stages of the pipeline and output from the pipeline may be compared and, if they match, the middle pair may be deleted. This process may repeat until a middle pair is found where no match occurs between the input and output format. The remaining stages of the pipeline may be retained. In cases where a pipeline is not symmetrical, the formats of data at each node may be compared to each other. If a node possesses a format that does not match the format of any other node, then the stages between the node and its closest endpoint in the pipeline may be retained.
Single cycle instruction pipeline scheduling
A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.
Controlling execution of instructions for a processing pipeline having first out-of order execution circuitry and second execution circuitry
An apparatus comprises a processing pipeline comprising out-of-order execution circuitry and second execution circuitry. Control circuitry monitors at least one reordering metric indicative of an extent to which instructions are executed out of order by the out-of-order execution circuitry, and controls whether instructions are executed using the out-of-order execution circuitry or the second execution circuitry based on the reordering metric. A speculation metric indicative of a fraction of executed instructions that are flushed due to a mis-speculation can also be used to determine whether to execute instructions on first or second execution circuitry having different performance or energy consumption characteristics.
Coprocessors with bypass optimization, variable grid architecture, and fused vector operations
In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.
Pipelined Configurable Processor
A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.