G06F9/3873

System, apparatus and method for dynamic pipeline stage control of data path dominant circuitry of an integrated circuit
11132201 · 2021-09-28 ·

In an embodiment, a data path circuit includes: a plurality of pipeline stages coupled between an input of the data path circuit and an output of the data path circuit; and a first selection circuit coupled between a first pipeline stage and a second pipeline stage, the first selection circuit having a first input to receive an input to the first pipeline stage and a second input to receive an output of the first pipeline stage and controllable to output one of the input to the first pipeline stage and the output of the first pipeline stage. A bypass controller coupled to the data path circuit may control the first selection circuit based at least in part on an operating frequency of the data path circuit. Other embodiments are described and claimed.
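
As a loose behavioral sketch (not RTL, with invented names such as `freq_threshold_hz`), the described selection circuit can be modeled as a mux that either forwards a stage's input, bypassing the stage, or passes its registered output, with the bypass decision driven by operating frequency:

```python
# Behavioral sketch of a bypassable pipeline stage; illustrative only.
class BypassableStage:
    def __init__(self, logic):
        self.logic = logic   # combinational function computed by the stage
        self.reg = 0         # pipeline register at the stage output

    def tick(self, stage_input, bypass):
        registered_output = self.reg
        self.reg = self.logic(stage_input)
        # First selection circuit: output either the input to the stage
        # (stage bypassed) or the registered output of the stage.
        return stage_input if bypass else registered_output

class BypassController:
    """Chooses bypass based at least in part on operating frequency."""
    def __init__(self, freq_threshold_hz):
        self.freq_threshold_hz = freq_threshold_hz  # assumed policy knob

    def bypass(self, operating_freq_hz):
        # At lower frequencies the cycle time is long enough that the
        # extra pipeline register is unneeded, so the stage can be skipped
        # to shorten latency.
        return operating_freq_hz <= self.freq_threshold_hz
```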

Implementing quick-release VLV memory access array

A method for implementing a quick-release Variable Length Vector (VLV) memory access array in the technical field of software programs, comprising the following steps. Step 1: each time the pipeline restarts and flushes an out-of-order queue, if the number of sends recorded in an entry's send counter equals the number of returns recorded in its return counter, the entry's ID is kept unchanged and is reused for the next pushed request. Step 2: each time the pipeline restarts and flushes the out-of-order queue, if the send count recorded in the send counter does not equal the return count recorded in the return counter and mirror resources are not exhausted, the existing entry is released: the entry's ID, send counter, and return counter are copied to another structure; N non-busy IDs are selected from a free list; the busy bit of each of the N IDs is set; and the N IDs are filled into the entry's ID field. Step 3: when the pipeline restarts and flushes the out-of-order queue, copied request information for each uncompleted request is stored in the mirror resources, the copied information including each request's ID, send counter, and return counter. Step 4: after the return counter is copied to the mirror resources, responses continue to be monitored. Step 5: when the currently released mirror resource is the only available resource, the released ID is copied back to an entry tagged for restart in the same cycle in which the ID is released, exchanging information with that entry. Step 6: ID allocation and recovery are managed through the free list.
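
As a rough sketch of the flush-time release decision (structure and field names are illustrative, and the variable-length aspect is reduced to a single ID per entry for brevity):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Entry:
    eid: int
    sent: int = 0       # send counter
    returned: int = 0   # return counter

class Mirror:
    """Mirror resources: parking space for IDs with responses in flight."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []                     # [id, send_count, return_count]

    def has_room(self):
        return len(self.slots) < self.capacity

    def save(self, eid, sent, returned):
        self.slots.append([eid, sent, returned])

    def on_response(self, eid, free_list, busy):
        # Step 4: responses keep being monitored after the copy; once the
        # counts match, the ID is recovered via the free list (Step 6).
        for slot in self.slots:
            if slot[0] == eid:
                slot[2] += 1
                if slot[2] == slot[1]:
                    self.slots.remove(slot)
                    busy[eid] = False
                    free_list.append(eid)
                return

def on_flush(entry, free_list, busy, mirror):
    # Called each time the pipeline restarts and flushes the queue.
    if entry.sent == entry.returned:
        # Step 1: nothing outstanding; keep the ID for the next request.
        entry.sent = entry.returned = 0
    elif mirror.has_room():
        # Steps 2-3: release the entry immediately by copying its ID and
        # counters into a mirror, then refill it with a fresh non-busy ID
        # from the free list (one ID here; the method fills N of them).
        mirror.save(entry.eid, entry.sent, entry.returned)
        entry.eid = free_list.popleft()
        busy[entry.eid] = True
        entry.sent = entry.returned = 0
```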

Method and dynamically reconfigurable processor adapted for management of persistence of information across multiple instruction cycles
10983947 · 2021-04-20 ·

A method and system for enabling persistence of a value by a dynamically reconfigurable processor ("DRP") from the time of execution of an earlier executed instruction to the time of a later executed instruction. The value may represent a constant or a variable value of a software program. The value may be read from or written into a memory circuit, a DRP logic element, an iterator of a DRP logic element, or another value-storing element or aspect of the DRP. The value may be maintained in a single logic element through the duration of one or more instruction execution cycles, or alternatively or additionally, the value may be transferred between or among one or more value-storage hardware elements. The persistence of the value, and its transfer within, into and/or out of the DRP, enables later access of the value by, and/or positioning of the value within, the DRP.
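
As a minimal toy model (the element names and two-element layout are invented for illustration), the idea is that a value produced by an earlier instruction is held in, or handed between, storage elements until a later instruction reads it:

```python
# Toy model of value persistence across instruction cycles in a DRP.
class StorageElement:
    def __init__(self):
        self.value = None

def run_program(cycles=3):
    elements = [StorageElement() for _ in range(2)]
    elements[0].value = 42            # earlier executed instruction writes

    for cycle in range(cycles):
        # The value may stay put or be transferred between value-storage
        # elements from one instruction execution cycle to the next.
        src, dst = elements[cycle % 2], elements[(cycle + 1) % 2]
        dst.value, src.value = src.value, None

    return elements[cycles % 2].value  # later executed instruction reads 42
```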

Arithmetic logic unit with normal and accelerated performance modes using differing numbers of computational circuits

A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction, and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.
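
A behavioral sketch of the mode selection (the single-field performance-register encoding and the two multiplier implementations are assumptions for illustration):

```python
NORMAL, ACCELERATED = 0, 1   # assumed encoding of the mode field

def normal_multiply(a, b, width=16):
    # Normal mode: iterative shift-and-add, reusing one adder circuit.
    acc = 0
    for i in range(width):
        if (b >> i) & 1:
            acc += a << i
    return acc

def accelerated_multiply(a, b):
    # Accelerated mode: stand-in for a wide array multiplier that spends
    # more computational circuits to finish in fewer cycles.
    return a * b

def execute(a, b, mode_field):
    # Select the computation, perform it, then select the matching result.
    if mode_field == ACCELERATED:
        return accelerated_multiply(a, b)
    return normal_multiply(a, b)
```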

Variable latency request arbitration

A technique for scheduling processing tasks having different latencies is provided. The technique involves identifying one or more available requests in a request queue, where each request queue corresponds to a different latency. A request arbiter examines a shift register to determine whether there is an available slot for the one or more requests. A slot is available for a request if there is a slot that is a number of slots from the end of the shift register equal to the number of cycles the request takes to complete processing in a corresponding processing pipeline. If a slot is available, the request is scheduled for execution and the slot is marked as being occupied. If a slot is not available, the request is not scheduled for execution on the current cycle. On transitioning to a new cycle, the shift register is shifted towards its end and the technique repeats.
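
The slot check maps naturally onto a small occupancy shift register; the sketch below (illustrative names) indexes slots by cycles-to-completion, so at most one request reaches the shared completion point per cycle:

```python
class Arbiter:
    def __init__(self, max_latency):
        # Index = cycles until the result reaches the end of the register.
        self.slots = [False] * (max_latency + 1)

    def try_schedule(self, latency):
        if self.slots[latency]:
            return False            # another request completes that cycle
        self.slots[latency] = True  # mark the slot occupied and schedule
        return True

    def tick(self):
        # New cycle: shift occupancy toward the end of the register.
        self.slots = self.slots[1:] + [False]

arb = Arbiter(max_latency=4)
assert arb.try_schedule(3)      # 3-cycle request: slot 3 free, scheduled
arb.tick()
assert not arb.try_schedule(2)  # would finish the same cycle: rejected
assert arb.try_schedule(1)      # finishes a cycle earlier: accepted
```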

FLOATING-POINT SUPPORTIVE PIPELINE FOR EMULATED SHARED MEMORY ARCHITECTURES
20240004666 · 2024-01-04 ·

A processor architecture arrangement for emulated shared memory (ESM) architectures, including a number of multithreaded processors, each provided with an interleaved inter-thread pipeline and a plurality of functional units for carrying out arithmetic and logical operations on data. The pipeline includes at least two operatively parallel pipeline branches: a first pipeline branch includes a first sub-group of said plurality of functional units, such as ALUs (arithmetic logic units), arranged for carrying out integer operations, and a second pipeline branch includes a second, non-overlapping sub-group of said plurality of functional units, such as FPUs (floating point units), arranged for carrying out floating point operations. Further, one or more of the functional units of at least said second sub-group arranged for floating point operations are located operatively in parallel with the memory access segment of the pipeline.
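
A structural sketch of that branch layout (stage names and counts are invented): integer units sit in one branch, while the floating-point units are placed alongside the memory access segment so FP latency overlaps memory access latency:

```python
# Illustrative stage layout only; not the actual ESM pipeline.
INT_BRANCH = ["opselect", "alu0", "alu1"]        # first sub-group (ALUs)
MEM_SEGMENT = ["mem0", "mem1", "mem2", "mem3"]   # memory access segment
FP_BRANCH = ["fpu0", "fpu1", "fpu2", "fpu3"]     # second sub-group (FPUs)

def stage_plan():
    # Integer stages come first; the FP units then run operatively in
    # parallel with the memory access segment, one (mem, fpu) pair per
    # pipeline stage.
    return [(s, None) for s in INT_BRANCH] + list(zip(MEM_SEGMENT, FP_BRANCH))
```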

Execution pipeline adaptation

An apparatus and method of data processing are provided. The apparatus comprises at least two execution pipelines, one with a shorter execution latency than the other. The execution pipelines share a write port, and issue circuitry of the apparatus issues decoded instructions to a selected execution pipeline. The apparatus further comprises at least one additional pipeline stage, and the issue circuitry can detect a write port conflict condition in dependence on a latency indication associated with a decoded instruction it is to issue. If the issue circuitry intends to issue the decoded instruction to the execution pipeline with the shorter execution latency and the write port conflict condition is found, the issue circuitry causes use of at least one additional pipeline stage, in addition to the target execution pipeline, to avoid the write port conflict.
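
One plausible way to model the conflict check (the per-cycle write-port booking table is an assumption, not taken from the text): book the shared write port for the cycle each result will retire, and stretch a short-latency instruction through one extra stage when its slot is already taken:

```python
class IssueUnit:
    SHORT, LONG = 2, 4          # assumed execution latencies, in cycles

    def __init__(self, horizon=8):
        self.wb_booked = [False] * horizon   # shared write port, per cycle

    def issue(self, latency):
        # Write port conflict condition: another instruction already uses
        # the shared write port on the cycle this one would finish.
        if latency == self.SHORT and self.wb_booked[latency]:
            latency += 1        # route through one additional pipeline stage
        if self.wb_booked[latency]:
            return None         # this sketch gives up; hardware would stall
        self.wb_booked[latency] = True
        return latency          # cycles until this instruction writes back

    def tick(self):
        # Advance one cycle: each booking moves one slot closer to writeback.
        self.wb_booked = self.wb_booked[1:] + [False]
```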

Processor with hybrid pipeline capable of operating in out-of-order and in-order modes

A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
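
A toy model of the mode switch (stage names are illustrative): stages used only for out-of-order execution are deactivated and bypassed in in-order mode, then reactivated on the switch back:

```python
# Illustrative stage set; the real hybrid pipeline's stages will differ.
OOO_ONLY = {"rename", "issue_queue", "reorder"}
ALL_STAGES = ["fetch", "decode", "rename", "issue_queue",
              "execute", "reorder", "writeback"]

def active_stages(mode):
    if mode == "out_of_order":
        return ALL_STAGES                  # every stage activated
    # In-order mode: deactivate and bypass the out-of-order-only stages
    # to conserve power; instructions flow through the remaining stages.
    return [s for s in ALL_STAGES if s not in OOO_ONLY]
```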

Apparatus and method for controlling use of a register cache

An apparatus and method are provided for controlling use of a register cache. The apparatus has execution circuitry for executing instructions to process data values, and a register file comprising a plurality of registers in which to store the data values for access by the execution circuitry. A register cache is also provided that has a plurality of entries and is arranged to cache a subset of the data values for access by the execution circuitry. Each entry is arranged to cache a data value and an indication of the register associated with that cached data value. Prefetch circuitry then performs prefetch operations to prefetch data values from the register file into the register cache. Timing indication storage is used to store, for each data value to be generated as a result of instructions being executed within the execution circuitry, a register identifier for that data value, and timing information indicating when that data value will be generated by the execution circuitry. Cache usage control circuitry is then responsive to receipt of a plurality of register identifiers associated with source data values for a pending instruction yet to be executed by the execution circuitry, to generate, with reference to the timing information in the timing indication storage, a timing control signal to control timing of at least one prefetch operation performed by the prefetch circuitry. Such an approach can lead to significant improvements in the efficiency of utilisation of the register cache.
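
A sketch of how the timing indication storage might drive prefetch timing (names and the exact policy are illustrative): for each source register of a pending instruction, if the value is still being generated, the prefetch from the register file is deferred until just after the value will be written, rather than issued immediately:

```python
class TimingIndicationStorage:
    def __init__(self):
        self.ready_at = {}          # register id -> cycle value is generated

    def record(self, reg_id, cycle):
        self.ready_at[reg_id] = cycle

def prefetch_times(tis, source_reg_ids, current_cycle):
    """Timing control: when to prefetch each source into the register cache."""
    plan = []
    for reg in source_reg_ids:
        produced = tis.ready_at.get(reg)
        if produced is None or produced < current_cycle:
            plan.append((reg, current_cycle))   # value available: fetch now
        else:
            # Value not generated yet: prefetching now would be wasted, so
            # time the prefetch for just after the value is written back.
            plan.append((reg, produced + 1))
    return plan
```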