G06F9/3858

VARIABLE PIPELINE LENGTH IN A BARREL-MULTITHREADED PROCESSOR
20230106087 · 2023-04-06 ·

Devices and techniques for variable pipeline length in a barrel-multithreaded processor are described herein. A completion time for an instruction can be determined prior to insertion into a pipeline of a processor. A conflict between the instruction and a different instruction can be detected based on the completion time. Here, the different instruction is already in the pipeline, and the conflict is detected when the completion time equals the previously determined completion time for the different instruction. A difference between the completion time and an unconflicted completion time can then be calculated, and completion of the instruction delayed by the difference.
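The conflict check and delay computation described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation; all names and the single-cycle slide policy are assumptions.

```python
# Hypothetical sketch: detect a completion-time conflict with an
# instruction already in the pipeline and delay by the difference.

def schedule(latency, in_flight, current_cycle):
    """Return (completion_cycle, delay) for a new instruction.

    in_flight: set of completion cycles already determined for
    instructions in the pipeline (the "different" instructions).
    """
    unconflicted = current_cycle + latency
    completion = unconflicted
    # A conflict exists when the completion time equals a previously
    # determined completion time; slide until an open cycle is found.
    while completion in in_flight:
        completion += 1
    delay = completion - unconflicted  # difference applied as delay
    in_flight.add(completion)
    return completion, delay
```

For example, with an instruction already due to complete at cycle 5, a 3-cycle instruction inserted at cycle 2 is delayed by one cycle and completes at cycle 6.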

SYSTEM AND METHOD OF MERGING PARTIAL WRITE RESULT DURING RETIRE PHASE
20170371667 · 2017-12-28 ·

A processor including a physical register file, a rename table, mapping logic, size tracking logic, and merge logic. Each rename table entry maps an architectural register using a larger index and a smaller index. The mapping logic detects a partial write instruction that specifies an architectural register already identified by an entry of the rename table mapped to a second physical register allocated for a larger write operation, and inserts the index of the register allocated for the partial write instruction into the smaller index location of the entry. The size tracking logic provides a merge indication for the partial write instruction if the write size of the previous write instruction is larger. The merge logic merges the result of the partial write instruction with the second physical register during retirement of the partial write instruction.
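The merge at retirement amounts to overwriting only the low-order bytes of the wider physical register. A minimal sketch, assuming byte-granular write sizes and a dictionary-backed physical register file (both illustrative, not from the abstract):

```python
# Hypothetical sketch: merge a partial write result into the wider
# physical register during retirement of the partial write.

def retire_partial_write(prf, entry, partial_result, partial_size):
    """prf: physical register file (index -> value).
    entry: rename-table entry with the larger index of the register
    allocated for the earlier, wider write.
    partial_size: size of the partial write in bytes.
    """
    wide_value = prf[entry["larger_idx"]]
    mask = (1 << (partial_size * 8)) - 1
    # Keep the upper bytes of the wide result, replace the low bytes.
    merged = (wide_value & ~mask) | (partial_result & mask)
    prf[entry["larger_idx"]] = merged
    return merged
```

A 2-byte partial write of 0x1122 into a register holding 0xAABBCCDD yields 0xAABB1122.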

PROCESSOR WITH SLAVE FREE LIST THAT HANDLES OVERFLOW OF RECYCLED PHYSICAL REGISTERS AND METHOD OF RECYCLING PHYSICAL REGISTERS IN A PROCESSOR USING A SLAVE FREE LIST
20170371673 · 2017-12-28 ·

A processor including physical registers, a reorder buffer, a master free list, a slave free list, a master recycle circuit, and a slave recycle circuit. The reorder buffer includes instruction entries, each of which stores physical register indexes for recycling physical registers, and retires up to N instructions in each processor cycle. The master and slave free lists each include N input ports and store physical register indexes; the master free list stores indexes of physical registers to be allocated to instructions being issued. When an instruction is retired, the master recycle circuit routes a first physical register index stored in the instruction's entry to an input port of the master free list, and the slave recycle circuit routes a second physical register index stored in that entry to an input port of the slave free list.
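The routing at retirement, and one way the slave list could absorb overflow, can be sketched as below. The refill-on-empty allocation policy is an assumption for illustration; the abstract only specifies the two recycle paths.

```python
# Hypothetical sketch: route the two recycled physical register
# indexes of a retiring ROB entry to the master and slave free lists.

def recycle_on_retire(rob_entry, master_free, slave_free):
    """rob_entry: (first_idx, second_idx), either may be None."""
    first_idx, second_idx = rob_entry
    if first_idx is not None:
        master_free.append(first_idx)   # master recycle circuit path
    if second_idx is not None:
        slave_free.append(second_idx)   # slave recycle circuit path

def allocate(master_free, slave_free):
    """Allocate from the master list; refill from the slave list when
    the master runs dry (assumed overflow-handling policy)."""
    if not master_free and slave_free:
        master_free.append(slave_free.pop(0))
    return master_free.pop(0) if master_free else None
```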

LOAD-STORE QUEUE FOR MULTIPLE PROCESSOR CORES

Technology related to load-store queues for block-based processor architectures is disclosed. In one example of the disclosed technology, a processor includes multiple processor cores and a load-store queue. Each processor core is configured to execute an instruction block including load and store instructions. The instruction block can be identified by a block identifier, and each of the load and store instructions is identified with a load-store identifier. The load-store queue can be configured to enqueue load and store instructions from the processor cores in a buffer indexed based on a function of the block identifier and the load-store identifier. The buffer can be searched for store instructions having a target address matching a target address of a load instruction received from a first processor core. Load response data can be returned for the received load instruction to the first processor core based on the search of the buffer.
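The indexing function and the store search can be sketched as follows. The particular combination of block identifier and load-store identifier, and the buffer sizes, are illustrative assumptions; the abstract only requires that the index be a function of both.

```python
# Hypothetical sketch: index a shared load-store queue buffer by a
# function of (block id, load-store id), and search it for stores
# matching a load's target address.

def lsq_index(block_id, ls_id, blocks=4, slots=32):
    """Map an instruction to a buffer slot (assumed index function)."""
    return (block_id % blocks) * slots + (ls_id % slots)

def forwarding_stores(buffer, load_addr):
    """Return store entries whose target address matches the load's
    (simplified: ignores relative age ordering of entries)."""
    return [e for e in buffer if e["is_store"] and e["addr"] == load_addr]
```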

MANAGING A DIVIDED LOAD REORDER QUEUE

Managing a divided load reorder queue including storing load instruction data for a load instruction in an expanded LRQ entry in the LRQ; launching the load instruction from the expanded LRQ entry; determining that the load instruction is in a finished state; moving a subset of the load instruction data from the expanded LRQ entry to a compact LRQ entry in the LRQ, wherein the compact LRQ entry is smaller than the expanded LRQ entry; and removing the load instruction data from the expanded LRQ entry.
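The expanded-to-compact transition can be sketched as below. Which fields survive into the compact entry is an assumption for illustration; the abstract only says a subset of the load instruction data is retained in a smaller entry.

```python
# Hypothetical sketch: once a load is in the finished state, move a
# subset of its data from the expanded LRQ entry to a compact entry
# and free the expanded entry.

def compact_finished_load(expanded, compact, tag):
    """expanded/compact: dicts modelling the two LRQ partitions,
    keyed by an assumed per-load tag."""
    entry = expanded.pop(tag)  # remove data from the expanded entry
    # Retain only the fields still needed after 'finished' (assumed).
    compact[tag] = {"addr": entry["addr"], "dest": entry["dest"]}
    return compact[tag]
```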

LOAD-STORE QUEUE FOR BLOCK-BASED PROCESSOR

Technology related to load-store queues for block-based processor architectures is disclosed. In one example of the disclosed technology, a processor includes issue logic and a load-store buffer. The issue logic can be configured to issue load and store instructions out of program order. Each of the load and store instructions can include an identifier specifying a relative program order of the respective instruction. The load-store buffer can be configured to enqueue the issued load and store instructions; generate hash values for addresses of the load and store instructions; and update a hash data structure using the generated hash values for the issued store instructions as an index of the hash data structure. For the load instructions, the hash data structure can be searched to generate load response data for the load instructions. The generated load response data can be forwarded to an execution unit of the processor.
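The hash data structure can be sketched as a small table keyed by a reduced address hash. The particular hash below is an assumption; the essential property is that a miss guarantees no matching store, while a hit may be a false positive that requires a full check.

```python
# Hypothetical sketch: hash store addresses into a small table and
# query it for loads.

def addr_hash(addr, bits=6):
    """Reduce an address to a small hash index (assumed hash)."""
    return (addr >> 2) & ((1 << bits) - 1)

def note_store(hash_table, addr, ls_id):
    """Update the hash data structure for an issued store."""
    hash_table.setdefault(addr_hash(addr), []).append(ls_id)

def possible_forwarders(hash_table, addr):
    """A hit means a store *may* alias the load (false positives are
    possible due to hashing); a miss guarantees none does."""
    return hash_table.get(addr_hash(addr), [])
```

Note that 0x104 and 0x204 collide under this hash, illustrating a false positive that a full address compare would filter out.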

Providing code sections for matrix of arithmetic logic units in a processor
11687346 · 2023-06-27 ·

The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.
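The loop-detection step could be sketched as scanning a linear trace for backward branches, whose target-to-branch span delimits a code section for joint issue. This is a simplification invented for illustration; the trace encoding and the transformation step are not specified by the abstract.

```python
# Hypothetical sketch: detect loops in a trace-cache trace as
# backward branches and return code sections for joint execution.

def find_loop_sections(trace):
    """trace: list of instruction tuples; a ("br", target) tuple with
    target < current position models a backward (loop) branch.
    Returns (start, end) index pairs delimiting loop bodies."""
    sections = []
    for i, ins in enumerate(trace):
        if ins[0] == "br" and ins[1] < i:  # backward branch => loop
            sections.append((ins[1], i))
    return sections
```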

Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers

Apparatus and methods are disclosed for implementing block-based processors, including field-programmable gate array (FPGA) implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.
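The identifier-indexed ready check can be sketched as a table lookup per candidate instruction. The dictionary representation of the dependency storage is an assumption standing in for the hardware storage.

```python
# Hypothetical sketch: issue an instruction once its decoded ready
# dependencies, indexed by instruction identifier, are satisfied.

def ready_to_issue(ready_state, dep_table, iid):
    """dep_table: iid -> list of producer iids (decoded dependencies);
    ready_state: iid -> True once that producer has completed."""
    return all(ready_state[d] for d in dep_table[iid])

def pick_issue(ready_state, dep_table, candidates):
    """Return the first candidate whose dependencies are satisfied."""
    for iid in candidates:
        if ready_to_issue(ready_state, dep_table, iid):
            return iid
    return None
```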

MULTI-THREADING PROCESSOR AND A SCHEDULING METHOD THEREOF
20170364361 · 2017-12-21 ·

A processor includes an execution unit, a retirement module, a first retirement counter, a second retirement counter, and an adjustment module. The execution unit executes instructions of a first thread and a second thread by simultaneous multithreading. The retirement module retires the executed instructions of the first thread in order of the first-thread instruction sequence, and retires the executed instructions of the second thread in order of the second-thread instruction sequence. The first retirement counter determines a first multi-thread retirement rate of the first thread. The second retirement counter determines a second multi-thread retirement rate of the second thread. The adjustment module adjusts the proportions of hardware resources respectively occupied by the first thread and the second thread according to the first multi-thread retirement rate and the second multi-thread retirement rate, so that the processor executes at its most efficient level of performance.
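One possible adjustment policy is to split a shared resource between the two threads in proportion to their measured retirement rates. The proportional policy and the resource being partitioned are assumptions for illustration; the abstract specifies only that the proportions are adjusted according to the two rates.

```python
# Hypothetical sketch: partition a shared resource (e.g. queue
# entries) between two SMT threads by relative retirement rate.

def adjust_shares(retired_t0, retired_t1, cycles, total_entries=64):
    """retired_tN: instructions retired by thread N over the window;
    returns (thread0_entries, thread1_entries)."""
    rate0 = retired_t0 / cycles   # first multi-thread retirement rate
    rate1 = retired_t1 / cycles   # second multi-thread retirement rate
    total = rate0 + rate1
    if total == 0:
        half = total_entries // 2
        return half, total_entries - half
    share0 = round(total_entries * rate0 / total)
    return share0, total_entries - share0
```

A thread retiring three times as fast as its sibling would receive three quarters of the entries under this assumed policy.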

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING DEPENDENCY ACCUMULATION INSTRUCTION SEQUENCING

Operation of a multi-slice processor that includes a plurality of execution slices. Operation of such a multi-slice processor includes: receiving a first instruction indicating a first target register; receiving a second instruction indicating the first target register as a source operand; responsive to the second instruction indicating the first target register as a source operand, updating a dependent count corresponding to the first instruction; and issuing, in dependence upon the dependent count for the first instruction being greater than a dependent count for another instruction, the first instruction to an execution slice of the plurality of execution slices.
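The dependency-accumulation priority can be sketched as counting, for each producer, how many later instructions name its target register as a source, then issuing the most-depended-upon instruction first. The data shapes below are assumptions for illustration.

```python
# Hypothetical sketch: accumulate dependent counts per producing
# instruction and issue the one with the highest count first.

def dependent_counts(producers, consumer_sources):
    """producers: {tag: target_register};
    consumer_sources: list of source-operand sets, one per
    subsequently received instruction."""
    return {
        tag: sum(target in srcs for srcs in consumer_sources)
        for tag, target in producers.items()
    }

def issue_first(counts):
    """Issue the instruction with the greatest dependent count."""
    return max(counts, key=counts.get)
```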