Patent classifications
G06F9/3875
System, apparatus and method for dynamic pipeline stage control of data path dominant circuitry of an integrated circuit
In an embodiment, a data path circuit includes: a plurality of pipeline stages coupled between an input of the data path circuit and an output of the data path circuit; and a first selection circuit coupled between a first pipeline stage and a second pipeline stage, the first selection circuit having a first input to receive an input to the first pipeline stage and a second input to receive an output of the first pipeline stage and controllable to output one of the input to the first pipeline stage and the output of the first pipeline stage. A bypass controller coupled to the data path circuit may control the first selection circuit based at least in part on an operating frequency of the data path circuit. Other embodiments are described and claimed.
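The selection circuit described above can be illustrated with a small cycle-level model. This is a minimal sketch, not the claimed circuit: it assumes each pipeline stage is a plain register that adds one cycle of latency, and that the bypass controller bypasses a stage whenever the operating frequency falls below a per-stage threshold. The names `select`, `bypass_settings`, `run`, and the threshold values are hypothetical.

```python
from typing import List, Optional


def select(stage_input, stage_output, bypass: bool):
    """2:1 selection circuit: forward the stage input when bypassing, else the stage output."""
    return stage_input if bypass else stage_output


def bypass_settings(freq_mhz: float, thresholds_mhz: List[float]) -> List[bool]:
    """Assumed policy: bypass stage i when the frequency is below thresholds_mhz[i]."""
    return [freq_mhz < t for t in thresholds_mhz]


def run(samples: List[int], bypass: List[bool]) -> List[Optional[int]]:
    """Feed one sample per cycle and return the data path output seen each cycle."""
    regs: List[Optional[int]] = [None] * len(bypass)  # one register per stage
    outputs = []
    for x in samples:
        value = x
        next_regs = list(regs)
        for i, skip in enumerate(bypass):
            next_regs[i] = value                   # register captures its input at the clock edge
            value = select(value, regs[i], skip)   # mux picks stage input or registered output
        regs = next_regs
        outputs.append(value)
    return outputs


if __name__ == "__main__":
    for freq in (3000, 1500, 800):
        cfg = bypass_settings(freq, [1000, 2000])
        print(freq, cfg, run([1, 2, 3, 4], cfg))
```

In this model, each bypassed stage removes one cycle of latency, which is one plausible reason to tie the selection to operating frequency: at a lower frequency, more logic fits in a single clock period.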
ENHANCED PROCESSOR FUNCTIONS FOR CALCULATION
Enhanced processor functions for calculation are described. An example of an apparatus includes one or more processors including one or more processing resources and a memory to store data, the data including data for compute operations. A processing resource of the one or more processing resources includes a configurable pipeline for calculation operations, and the configurable pipeline may be utilized to perform both a normal instruction for a calculation in a certain precision and a systolic instruction for a calculation in a certain precision.
COLLAPSING BUBBLES IN A PROCESSING UNIT PIPELINE
An arithmetic logic unit (ALU) pipeline of a processing unit collapses execution bubbles in response to a stall at a stage of the ALU pipeline. An execution bubble occurs at the pipeline in response to an invalid instruction being placed in the pipeline for execution. The invalid instruction thus consumes an available “slot” in the pipeline, and proceeds through the pipeline until a stall in a subsequent stage (that is, a stage after the stage executing the invalid instruction) is detected. In response to detecting the stall, the ALU continues to execute instructions that are behind the invalid instruction in the pipeline, thereby collapsing the execution bubble and conserving resources of the ALU.
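The bubble-collapsing behavior can be sketched as a one-cycle update of a list of pipeline slots. This is an illustrative model under stated assumptions, not the patented design: `None` stands for a bubble, a stalled stage simply holds its contents, and the function name `step` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Slot = Optional[str]  # None models a bubble (an invalid instruction occupying a slot)


def step(pipe: List[Slot], stalled_stage: Optional[int] = None,
         incoming: Slot = None) -> Tuple[List[Slot], Slot]:
    """Advance the pipeline one cycle; pipe[0] is the first stage, pipe[-1] the last.

    Returns the new pipeline state and the instruction retired this cycle (if any).
    An incoming instruction is accepted only when stage 0 is free after the update.
    """
    new = list(pipe)
    retired: Slot = None
    last = len(new) - 1
    for i in range(last, -1, -1):               # walk from the last stage backwards
        if i == stalled_stage:
            continue                             # a stalled stage holds its contents
        if i == last:
            retired, new[i] = new[i], None       # the last stage retires its instruction
        elif new[i + 1] is None:
            new[i + 1], new[i] = new[i], None    # move forward into a free slot or bubble
    if new[0] is None:
        new[0] = incoming
    return new, retired


if __name__ == "__main__":
    pipe = ["I3", "I2", None, "I1"]              # a bubble sits just behind I1
    print(step(pipe))                            # no stall: the bubble keeps travelling
    print(step(pipe, stalled_stage=3, incoming="I4"))  # stall: I2 and I3 close the gap
```

With the last stage stalled, the instructions behind the bubble keep advancing, so the bubble disappears and a new instruction can enter at stage 0.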
REUSING ADJACENT SIMD UNIT FOR FAST WIDE RESULT GENERATION
A system for processing instructions with extended results includes a first instruction execution unit having a first result bus for execution of processor instructions. The system further includes a second instruction execution unit having a second result bus for execution of processor instructions. The first instruction execution unit is configured to selectively send a portion of results calculated by the first instruction execution unit to the second instruction execution unit during execution of a processor instruction if the second instruction execution unit is not used for executing the processor instruction and if the received processor instruction produces a result having a data width greater than the width of the first result bus. The second instruction execution unit is configured to receive the portion of results calculated by the first instruction execution unit and put the received results on the second result bus.
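The two conditions for borrowing the neighboring result bus can be sketched as follows. This is a minimal illustration, assuming a 64-bit result bus and a result that fits in two bus widths. The function name, the bus width, and the behavior when the neighbor is busy (the upper half is simply not forwarded) are assumptions, since the abstract does not describe that fallback.

```python
from typing import Optional, Tuple

BUS_WIDTH = 64  # assumed width of each result bus, in bits


def drive_result_buses(result: int, result_width: int, neighbor_busy: bool,
                       bus_width: int = BUS_WIDTH) -> Tuple[int, Optional[int]]:
    """Return (value on the first result bus, value on the second result bus or None)."""
    mask = (1 << bus_width) - 1
    low = result & mask                          # the first unit always drives its own bus
    if result_width > bus_width and not neighbor_busy:
        high = (result >> bus_width) & mask      # upper portion is sent to the idle neighbor
        return low, high
    return low, None                             # neighbor unused by this sketch


if __name__ == "__main__":
    product = 0xFFFFFFFFFFFFFFFF * 2             # a 64 x 64 multiply with a 128-bit result
    low, high = drive_result_buses(product, 128, neighbor_busy=False)
    print(hex(low), hex(high))
```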
FLOATING-POINT SUPPORTIVE PIPELINE FOR EMULATED SHARED MEMORY ARCHITECTURES
A processor architecture arrangement for emulated shared memory (ESM) architectures, including a number of multithreaded processors each provided with an interleaved inter-thread pipeline and a plurality of functional units for carrying out arithmetic and logical operations on data, wherein the pipeline includes at least two operatively parallel pipeline branches, a first pipeline branch includes a first sub-group of said plurality of functional units, such as ALUs (arithmetic logic units), arranged for carrying out integer operations, and a second pipeline branch includes a second, non-overlapping sub-group of said plurality of functional units, such as FPUs (floating point units), arranged for carrying out floating point operations, and further wherein one or more of the functional units of at least said second sub-group arranged for floating point operations are located operatively in parallel with the memory access segment of the pipeline.
INFORMATION PROCESSING SYSTEM, METHOD OF PROCESSING INFORMATION, AND INFORMATION PROCESSING APPARATUS
An information processing system includes a first information processing apparatus, an external storage, and a second information processing apparatus. The first information processing apparatus includes first circuitry to execute data transfer from the first information processing apparatus to the external storage, and assign a completion flag to data for which execution of the data transfer has been completed. The second information processing apparatus includes second circuitry to execute data transfer from the external storage to the second information processing apparatus, and assign a completion flag to data for which execution of the data transfer has been completed. The first circuitry is further configured to, after the data transfer is interrupted, execute the data transfer of data to which the completion flag is not assigned. The second circuitry is further configured to, after the data transfer is interrupted, execute the data transfer of data to which the completion flag is not assigned.
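The completion-flag scheme amounts to a resumable copy. The sketch below models one leg of the transfer (for example, from the first apparatus to the external storage) with in-memory dictionaries. The function name `transfer`, the `fail_before` parameter used to simulate an interruption, and the use of a set to hold the flags are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def transfer(source: Dict[str, bytes], dest: Dict[str, bytes], completed: Set[str],
             fail_before: Optional[str] = None) -> List[str]:
    """Copy every item that has no completion flag; return the keys copied in this call."""
    copied = []
    for key in source:
        if key in completed:
            continue                              # flag already assigned: skip on resume
        if key == fail_before:
            raise ConnectionError(f"transfer interrupted before {key!r}")
        dest[key] = source[key]
        completed.add(key)                        # assign the completion flag after the copy
        copied.append(key)
    return copied


if __name__ == "__main__":
    source = {"a": b"1", "b": b"2", "c": b"3"}
    dest: Dict[str, bytes] = {}
    completed: Set[str] = set()
    try:
        transfer(source, dest, completed, fail_before="b")   # simulated interruption
    except ConnectionError:
        pass
    print(transfer(source, dest, completed))      # resumes with the unflagged items only
```

The same routine could serve the second leg (external storage to the second apparatus) with its own set of flags.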
SYSTEMS AND METHODS TO TRANSPOSE VECTORS ON-THE-FLY WHILE LOADING FROM MEMORY
Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, wherein the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
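The N-tuple generation described above can be shown in a few lines. This sketch assumes the N groups are stored contiguously and have equal length in the source vector. The function name `transpose_load` is hypothetical, and the code models only the data rearrangement, not the shuffle pipeline itself.

```python
from typing import List, Sequence


def transpose_load(source: Sequence[int], n_groups: int) -> List[int]:
    """Interleave N contiguous groups so that corresponding elements form N-tuples."""
    if n_groups <= 0 or len(source) % n_groups:
        raise ValueError("source length must be a positive multiple of n_groups")
    group_len = len(source) // n_groups
    groups = [source[g * group_len:(g + 1) * group_len] for g in range(n_groups)]
    # zip(*groups) yields one N-tuple per element position; flatten them in order.
    return [elem for n_tuple in zip(*groups) for elem in n_tuple]


if __name__ == "__main__":
    # Two groups (N = 2): x = [1, 2, 3, 4] followed by y = [10, 20, 30, 40].
    print(transpose_load([1, 2, 3, 4, 10, 20, 30, 40], 2))
```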
MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
Described herein is a merged data path unit that has elements that are configurable to switch between different instruction types. The merged data path unit is a pipelined unit that has multiple stages. Between different stages lie multiplexor layers that are configurable to route data from functional blocks of a prior stage to a subsequent stage. The manner in which the multiplexor layers are configured for a particular stage is based on the instruction type executed at that stage. In some implementations, the functional blocks in different stages are also configurable by the control unit to change the operations performed. Further, in some implementations, the control unit has sideband storage that stores data that skips stages. An example of a merged data path used for performing a ray-triangle intersection test and a ray-box intersection test is also described herein.
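The idea of multiplexor layers that route data between stages according to instruction type can be sketched with a table-driven model. Everything concrete here is an assumption: the functional blocks, the two instruction type names, and the routing tables are placeholders chosen for illustration and do not correspond to the actual ray-triangle or ray-box data paths.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple

Block = Callable[[float, float], float]

# Hypothetical functional blocks for a two-stage pipeline; each takes two operands.
STAGES: List[List[Block]] = [
    [operator.sub, operator.add],   # stage 0
    [operator.mul, max],            # stage 1
]

# Hypothetical mux configuration: for each instruction type and each stage, the pair
# of prior-stage output indices routed to each functional block of that stage.
MUX_CONFIG: Dict[str, List[List[Tuple[int, int]]]] = {
    "type_a": [[(0, 1), (2, 3)], [(0, 1), (0, 1)]],
    "type_b": [[(0, 2), (1, 3)], [(1, 0), (0, 0)]],
}


def run(instr_type: str, inputs: Sequence[float]) -> List[float]:
    """Push the inputs through every stage, routing operands per the instruction type."""
    values = list(inputs)
    for blocks, routes in zip(STAGES, MUX_CONFIG[instr_type]):
        values = [block(values[a], values[b]) for block, (a, b) in zip(blocks, routes)]
    return values


if __name__ == "__main__":
    print(run("type_a", [5, 3, 2, 4]))
    print(run("type_b", [5, 3, 2, 4]))
```

The same functional blocks are shared by both instruction types, and only the routing tables differ, which is the property the merged data path relies on.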
Method and apparatus for efficient scheduling for asymmetrical execution units
A method and system perform instruction scheduling in an out-of-order microprocessor pipeline. The method selects a first set of instructions to dispatch from a scheduler to an execution module, wherein the execution module comprises two types of execution units. The first type of execution unit executes both a first and a second type of instruction and the second type of execution unit executes only the second type. Next, the method selects a second set of instructions to dispatch, which is a subset of the first set and comprises only instructions of the second type. The method determines a third set of instructions, which comprises instructions not selected as part of the second set. Further, the method dispatches the second set for execution using the second type of execution unit and dispatches the third set for execution using the first type of execution unit.
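The three-set selection can be sketched as follows. This is an illustrative model only: the `Instr` type, the oldest-first admission policy for the first set, and the reading of the third set as the remainder of the first set are assumptions, and instruction names are assumed to be unique.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    kind: int  # 1 = first type (needs a first-type unit), 2 = second type (runs on either)


def schedule(ready: List[Instr], n_first_type_units: int,
             n_second_type_units: int) -> Tuple[List[Instr], List[Instr]]:
    """Return (second set for second-type units, third set for first-type units)."""
    capacity = n_first_type_units + n_second_type_units
    first_set: List[Instr] = []
    kind1_count = 0
    for ins in ready:                              # assumed policy: oldest instructions first
        if len(first_set) == capacity:
            break
        if ins.kind == 1:
            if kind1_count == n_first_type_units:
                continue                           # no first-type unit left for it this cycle
            kind1_count += 1
        first_set.append(ins)
    # Second set: second-type instructions from the first set, one per second-type unit.
    second_set = [i for i in first_set if i.kind == 2][:n_second_type_units]
    # Third set: everything in the first set that was not placed in the second set.
    third_set = [i for i in first_set if i not in second_set]
    return second_set, third_set


if __name__ == "__main__":
    ready = [Instr("i0", 1), Instr("i1", 2), Instr("i2", 2), Instr("i3", 1), Instr("i4", 2)]
    print(schedule(ready, n_first_type_units=2, n_second_type_units=1))
```

Under this admission policy the third set never exceeds the number of first-type units, so every selected instruction has a unit that can execute it.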