Patent classifications
G06F9/382
Method for replenishing a thread queue with a target instruction of a jump instruction
Methods and electronic circuits for executing instructions in a central processing unit (CPU) are provided. One of the methods includes forming an instruction block by sequentially fetching, from a current thread queue, one or more instructions including one jump instruction, wherein the jump instruction is the last instruction in the instruction block; transmitting the instruction block to a CPU execution unit for execution; replenishing the current thread queue with at least one instruction to form a thread queue to be executed; determining a target instruction of the jump instruction according to an execution result of the CPU execution unit; determining whether the target instruction is contained in the thread queue to be executed; and if not, flushing the thread queue to be executed, obtaining the target instruction and adding the target instruction to the thread queue to be executed.
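A minimal Python sketch of the replenish-then-check flow described in this abstract, fetching one instruction at a time for brevity; the toy program, the queue depth, and every identifier are illustrative stand-ins, not taken from the patent:

```python
from collections import deque

# Toy program: address -> (opcode, jump_target_or_None)
PROGRAM = {
    0: ("add", None),
    1: ("jump", 5),   # target known only after execution
    2: ("add", None),
    3: ("add", None),
    5: ("sub", None),
    6: ("halt", None),
}

def execute(queue: deque, depth: int = 4):
    """Fetch up to the jump, 'execute' it, then decide whether the
    speculatively replenished queue already holds the jump target."""
    trace = []
    while queue:
        addr = queue.popleft()
        op, target = PROGRAM.get(addr, ("halt", None))
        trace.append((addr, op))
        if op == "halt":
            break
        # Replenish the thread queue with sequential addresses.
        next_addr = (queue[-1] if queue else addr) + 1
        while len(queue) < depth:
            queue.append(next_addr)
            next_addr += 1
        if op == "jump":
            if target in queue:
                while queue[0] != target:   # discard only the skipped prefix
                    queue.popleft()
            else:
                queue.clear()               # flush, then refetch from target
                queue.append(target)
    return trace

print(execute(deque([0])))   # [(0,'add'), (1,'jump'), (5,'sub'), (6,'halt')]
```

The key branch is the final if/else: a target already present in the replenished queue only discards the skipped prefix, while a missing target flushes the queue and refills it starting at the target.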
ADVANCED PROCESSOR ARCHITECTURE
The invention relates to a method for processing instructions out of order on a processor comprising an arrangement of execution units. The inventive method comprises: 1) looking up operand sources in a Register Positioning Table (RPT) and setting the operand input references of the instruction to be issued accordingly; 2) checking for an Execution Unit (EXU) available to receive a new instruction; and 3) issuing the instruction to the available Execution Unit and entering into the RPT a reference, pointing at that Execution Unit, for the result register addressed by the issued instruction.
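A Python sketch of the three issue steps under stated assumptions: a dict stands in for the RPT, "RF" marks an operand that comes from the register file rather than a producing unit, and none of the names are from the patent:

```python
from dataclasses import dataclass

@dataclass
class ExecutionUnit:
    busy: bool = False
    result_reg: str | None = None

def issue(instr, rpt: dict, exus: list[ExecutionUnit]):
    """instr = (dest_reg, src_reg_a, src_reg_b). Returns the index of the
    EXU the instruction was issued to, or None if no unit is free."""
    dest, src_a, src_b = instr
    # 1) Resolve operand sources: the register file ("RF") or the EXU
    #    that will produce the value, as recorded in the RPT.
    sources = (rpt.get(src_a, "RF"), rpt.get(src_b, "RF"))
    # 2) Find an execution unit able to accept a new instruction.
    for i, exu in enumerate(exus):
        if not exu.busy:
            # 3) Issue, and record dest -> this EXU in the RPT.
            exu.busy, exu.result_reg = True, dest
            rpt[dest] = f"EXU{i}"
            print(f"issued {instr} to EXU{i}, operands from {sources}")
            return i
    return None

rpt: dict[str, str] = {}
exus = [ExecutionUnit() for _ in range(2)]
issue(("r3", "r1", "r2"), rpt, exus)   # both operands from the register file
issue(("r4", "r3", "r2"), rpt, exus)   # r3 forwarded from EXU0 via the RPT
```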
RECONFIGURABLE SIMD ENGINE
An exemplary SIMD computing system comprises a SIMD processing element (SPE) configured to perform a selected operation on a portion of a processor input data word, with the operation selected by control signals read from a control memory location addressed by a decoded instruction. The SPE may comprise one or more adders, multipliers, or multiplexers coupled to the control signals. The control signals may comprise one or more bits read from the control memory. The control memory may be an MxN (M rows by N columns) memory holding M possible SIMD operations of N control signals each. Each decoded instruction may select an SPE operation from among the M rows. A plurality of SPEs may receive the same control signals. The control memory may be rewritable, advantageously permitting customizable SIMD operations that are reconfigured by storing, in the control memory locations, control signals designed to cause the SPE to perform selected operations.
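A Python sketch of the control-memory idea, assuming N = 2 control signals (an ALU-select bit and a bypass-multiplexer bit) and M = 3 rows; the signal meanings are invented for illustration. Rewriting a row changes what the same instruction makes every SPE do, which is the reconfigurability the abstract describes:

```python
# Control memory: M rows (one per SIMD operation), N one-bit control signals.
# Bit 0 selects the ALU result (0 = add, 1 = multiply); bit 1 enables a
# bypass multiplexer that passes operand A through unchanged.
CONTROL_MEMORY = [
    [0, 0],   # row 0: add
    [1, 0],   # row 1: multiply
    [0, 1],   # row 2: pass-through
]

def spe(a: int, b: int, row: int) -> int:
    """One SIMD processing element driven by a row of control signals."""
    alu_sel, bypass = CONTROL_MEMORY[row]
    result = a * b if alu_sel else a + b
    return a if bypass else result

def simd(op_row: int, word_a: list[int], word_b: list[int]) -> list[int]:
    """A plurality of SPEs receiving the same control signals, each working
    on its own portion (lane) of the input data words."""
    return [spe(a, b, op_row) for a, b in zip(word_a, word_b)]

print(simd(0, [1, 2, 3, 4], [10, 20, 30, 40]))   # lane-wise add
print(simd(1, [1, 2, 3, 4], [10, 20, 30, 40]))   # lane-wise multiply
CONTROL_MEMORY[1] = [0, 1]                       # rewrite: row 1 now bypasses
print(simd(1, [1, 2, 3, 4], [10, 20, 30, 40]))   # same row, new behavior
```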
Systems and methods for artificial intelligence with a flexible hardware processing framework
An artificial intelligence (AI) system is disclosed. The AI system provides an AI system lane processing chain, at least one AI processing block, a local memory, a hardware sequencer, and a lane composer. The at least one AI processing block, the local memory, the hardware sequencer, and the lane composer are each coupled to the AI system lane processing chain. The AI system lane processing chain is dynamically created by the lane composer.
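A Python sketch of a lane composer assembling a processing chain at run time; the block names, the list standing in for local memory, and the sequencer-as-loop are all assumptions made for illustration, not details from the patent:

```python
from typing import Callable

# Hypothetical AI processing blocks; each transforms a tensor-like list.
def convolve(x):  return [v * 2 for v in x]      # stand-in for a conv block
def activate(x):  return [max(0, v) for v in x]  # stand-in for a ReLU block
def pool(x):      return [max(x)]                # stand-in for a pooling block

BLOCKS: dict[str, Callable] = {"conv": convolve, "act": activate, "pool": pool}

class LaneComposer:
    """Dynamically composes a lane processing chain from named blocks."""
    def compose(self, spec: list[str]) -> Callable:
        chain = [BLOCKS[name] for name in spec]
        def lane(x, local_memory=None):
            # Sequencer role: step data through each block in order,
            # keeping intermediate results in local memory.
            local_memory = local_memory if local_memory is not None else []
            for block in chain:
                x = block(x)
                local_memory.append(x)
            return x
        return lane

lane = LaneComposer().compose(["conv", "act", "pool"])
print(lane([-1, 2, -3, 4]))   # -> [8]
```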
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY
A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
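A Python sketch of the shared-input arrangement, with per-stage functions standing in for pipeline stages; the per-cycle systolic timing is abstracted away, and both the stage functions and counts are illustrative:

```python
def pipeline(stages, x):
    """Run a value through a list of per-stage functions."""
    for stage in stages:
        x = stage(x)
    return x

# Two pipelines of the parallel engine; both consume the same common input.
first_pipeline  = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
second_pipeline = [lambda x: x * x, lambda x: x + 5, lambda x: x // 2]

def dual_pipeline(common_input: int) -> tuple[int, int]:
    """The shared input is broadcast to both pipelines, which then advance
    independently through their own pipeline stages."""
    return (pipeline(first_pipeline, common_input),
            pipeline(second_pipeline, common_input))

print(dual_pipeline(4))   # -> (7, 10)
```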
Technology For Optimizing Memory-To-Register Operations
An apparatus comprises decoder circuitry to decode an instruction that includes an opcode to indicate a protected load operation, a source field for source memory address information, and a destination field to identify a destination register. The apparatus also comprises memory to store an allocated load-protect (LP) data structure with an entry for the identified destination register. The entry comprises an instruction pointer (IP) field and a status field. The apparatus also comprises load elision circuitry to (a) use the LP data structure to determine whether the identified destination register has active status for the IP; (b) in response to determining that the identified destination register has active status for the IP, cause the instruction to be elided; and (c) in response to determining that the identified destination register does not have active status for the IP, cause the instruction to be executed. Other embodiments are described and claimed.
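A Python sketch of the elide-or-execute decision, assuming a per-register LP entry with IP and status fields as in the abstract; the rule that an ordinary register write invalidates the entry is an added assumption, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class LPEntry:
    ip: int | None = None        # instruction pointer of the protected load
    status: str = "inactive"     # "active" once the load has executed

MEMORY = {0x1000: 42}
REGS = {"rax": 0}
LP_TABLE = {"rax": LPEntry()}    # one entry per destination register

def protected_load(ip: int, src_addr: int, dest_reg: str) -> str:
    """Elide the load if the destination register still holds the value
    this IP loaded earlier; otherwise perform the load and mark it active."""
    entry = LP_TABLE[dest_reg]
    if entry.status == "active" and entry.ip == ip:
        return "elided"                   # (b) register already valid
    REGS[dest_reg] = MEMORY[src_addr]     # (c) execute the load
    entry.ip, entry.status = ip, "active"
    return "executed"

def write_register(dest_reg: str, value: int):
    """Assumed rule: any other write to the register deactivates its entry."""
    REGS[dest_reg] = value
    LP_TABLE[dest_reg].status = "inactive"

print(protected_load(0x400, 0x1000, "rax"))   # executed
print(protected_load(0x400, 0x1000, "rax"))   # elided (e.g. a loop body)
write_register("rax", 7)
print(protected_load(0x400, 0x1000, "rax"))   # executed again
```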
Apparatus and method for scalable qubit addressing
An apparatus and method for scalable qubit addressing. For example, one embodiment of a processor comprises: a decoder comprising quantum instruction decode circuitry to decode quantum instructions to generate quantum microoperations (uops) and non-quantum decode circuitry to decode non-quantum instructions to generate non-quantum uops; execution circuitry comprising: an address generation unit (AGU) to generate a system memory address responsive to execution of one or more of the non-quantum uops; and quantum index generation circuitry to generate quantum index values responsive to execution of one or more of the quantum uops, each quantum index value uniquely identifying a quantum bit (qubit) in a quantum processor; wherein to generate a first quantum index value for a first quantum uop, the quantum index generation circuitry is to read the first quantum index value from a first architectural register identified by the first quantum uop.
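A Python sketch separating the two execution paths, assuming a mnemonic-based split into quantum and non-quantum uops; the register names, the opcode convention, and the address format are illustrative, not from the patent:

```python
# Architectural registers whose contents name qubits.
REGISTERS = {"q0": 2, "q1": 5}

def decode(instruction: str) -> dict:
    """Split instructions into quantum and non-quantum uops by mnemonic."""
    opcode, *operands = instruction.split()
    kind = "quantum" if opcode.startswith("q") else "classical"
    return {"kind": kind, "opcode": opcode, "operands": operands}

def execute(uop: dict):
    if uop["kind"] == "quantum":
        # Quantum index generation: read each index from the architectural
        # register named by the uop; it uniquely identifies a qubit.
        return [REGISTERS[reg] for reg in uop["operands"]]
    # Classical path: an AGU-style base+offset system memory address.
    base, offset = (int(x, 0) for x in uop["operands"])
    return base + offset

print(execute(decode("qgate q0 q1")))      # -> [2, 5] (qubit indices)
print(execute(decode("load 0x1000 0x8")))  # -> 4104 (system memory address)
```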
Method and apparatus for virtualizing the micro-op cache
Systems, apparatuses, and methods for virtualizing a micro-operation cache are disclosed. A processor includes at least a micro-operation cache, a conventional cache subsystem, a decode unit, and control logic. The decode unit decodes instructions into micro-operations which are then stored in the micro-operation cache. The micro-operation cache has limited capacity for storing micro-operations. When new micro-operations are decoded from pending instructions, existing micro-operations are evicted from the micro-operation cache to make room for the new micro-operations. Rather than being discarded, micro-operations evicted from the micro-operation cache are stored in the conventional cache subsystem. This prevents the original instruction from having to be decoded again on subsequent executions. When the control logic determines that micro-operations for one or more fetched instructions are stored in either the micro-operation cache or the conventional cache subsystem, the control logic causes the decode unit to transition to a reduced-power state.
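A Python sketch of the virtualization scheme, using an OrderedDict as an LRU micro-op cache and a plain dict as the conventional cache subsystem; the capacity, the decode stand-in, and the LRU policy are toy assumptions:

```python
from collections import OrderedDict

L2_CACHE = {}                   # conventional cache subsystem (backing store)
UOP_CACHE = OrderedDict()       # small micro-op cache with LRU eviction
UOP_CAPACITY = 2

def decode(instr: str) -> list[str]:
    return [f"uop({instr})"]    # stand-in for real decode work

def fetch(instr: str) -> list[str]:
    """Return micro-ops, decoding only when neither cache holds them."""
    if instr in UOP_CACHE:
        UOP_CACHE.move_to_end(instr)
        return UOP_CACHE[instr]          # hit: decoder can stay powered down
    if instr in L2_CACHE:
        uops = L2_CACHE[instr]           # hit in the conventional subsystem
    else:
        uops = decode(instr)             # miss everywhere: decode again
    if len(UOP_CACHE) >= UOP_CAPACITY:
        victim, victim_uops = UOP_CACHE.popitem(last=False)
        L2_CACHE[victim] = victim_uops   # evict into the cache subsystem
    UOP_CACHE[instr] = uops
    return uops

for instr in ["add", "sub", "mul", "add"]:
    fetch(instr)
print(sorted(UOP_CACHE), sorted(L2_CACHE))  # ['add', 'mul'] ['add', 'sub']
```

Note the final fetch of "add": it misses the micro-op cache but hits the conventional subsystem, so the decoder is never re-engaged for it.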
FETCH QUEUES USING CONTROL FLOW PREDICTION
A data processing apparatus is provided. It includes control flow detection prediction circuitry that performs a presence prediction of whether a block of instructions contains a control flow instruction. A fetch queue stores a queue of indications of the instructions in association with prediction information, and the prediction information comprises the presence prediction. An instruction cache stores fetched instructions that have been fetched according to the fetch queue. Post-fetch correction circuitry receives the fetched instructions before they reach decode circuitry; it includes analysis circuitry that causes the fetch queue to be at least partly flushed depending on the type of a given fetched instruction and the prediction information associated with that instruction.
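A Python sketch of the post-fetch check, assuming a deliberately weak presence predictor so the correction path fires; the full (rather than partial) flush and all identifiers are simplifications for illustration:

```python
from collections import deque

CONTROL_FLOW_OPS = {"br", "jmp", "call", "ret"}

def presence_prediction(block: list[str]) -> bool:
    """Hypothetical predictor: guesses whether a block contains a control
    flow instruction by looking only at the block's last opcode."""
    return block[-1] in CONTROL_FLOW_OPS

def post_fetch_correction(fetch_queue: deque, fetched_block: list[str],
                          predicted: bool) -> bool:
    """Before decode, compare the prediction against what was actually
    fetched; on a mismatch, flush the fetch queue."""
    actually_has_cf = any(op in CONTROL_FLOW_OPS for op in fetched_block)
    if actually_has_cf != predicted:
        fetch_queue.clear()    # partial flush simplified to a full flush
        return True
    return False

queue = deque()
block = ["add", "br", "sub"]                  # control flow hides mid-block
prediction = presence_prediction(block)       # False: 'sub' is not a branch
queue.extend([("blockB", True), ("blockC", False)])  # younger entries
print(post_fetch_correction(queue, block, prediction), list(queue))
# -> True [] : misprediction caught post-fetch, queue flushed before decode
```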
Methods and systems for processing data in a programmable data processing pipeline that includes out-of-pipeline processing
Methods and systems for processing data in a programmable processing pipeline are disclosed. In an embodiment, a method for processing packets in a programmable packet processing pipeline is disclosed. The method involves processing data corresponding to a packet through a match-action pipeline of the programmable packet processing pipeline, and diverting that processing from the match-action pipeline to a processor core for out-of-pipeline processing.
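A Python sketch of the divert decision, with two toy match-action stages and a fragment flag as the assumed condition that sends a packet out of the pipeline to a processor core; the table contents and the divert rule are illustrative:

```python
from typing import Callable

def rewrite_dst(pkt):   pkt["dst"] = "10.0.0.2"; return pkt
def decrement_ttl(pkt): pkt["ttl"] -= 1; return pkt

# Match-action tables: (match predicate, action) pairs per pipeline stage.
PIPELINE: list[list[tuple[Callable, Callable]]] = [
    [(lambda p: p["dst"] == "10.0.0.1", rewrite_dst)],
    [(lambda p: p["ttl"] > 0, decrement_ttl)],
]

def slow_path(pkt: dict) -> dict:
    """Processor-core handler for packets the pipeline cannot express:
    out-of-pipeline processing."""
    pkt["handled_by"] = "cpu core"
    return pkt

def process(pkt: dict) -> dict:
    if pkt.get("fragment"):          # divert condition (illustrative)
        return slow_path(pkt)
    for stage in PIPELINE:           # ordinary in-pipeline match-action
        for match, action in stage:
            if match(pkt):
                pkt = action(pkt)
                break
    return pkt

print(process({"dst": "10.0.0.1", "ttl": 64}))
print(process({"dst": "10.0.0.1", "ttl": 64, "fragment": True}))
```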