G06F9/3854

TIME-RESOURCE MATRIX FOR A MICROPROCESSOR WITH TIME COUNTER FOR STATICALLY DISPATCHING INSTRUCTIONS
20230244489 · 2023-08-03 · ·

A processor includes a time counter and a time-resource matrix and provides a method for statically dispatching instructions if the resources are available based on data stored in the time-resource matrix, and wherein execution times for the instructions use a time count from the time counter to specify when the instructions may be provided to an execution pipeline.

Optimizing thread selection at fetch, select, and commit stages of processor core pipeline

An apparatus includes a buffer configured to store a plurality of instructions previously fetched from a memory, wherein each instruction of the plurality of instructions may be included in a respective thread of a plurality of threads. The apparatus also includes control circuitry configured to select a given thread of the plurality of threads dependent upon a number of instructions in the buffer that are included in the given thread. The control circuitry is also configured to fetch a respective instruction corresponding to the given thread from the memory, and to store the respective instruction in the buffer.

Branch predictor with empirical branch bias override

A processor may include a baseline branch predictor and an empirical branch bias override circuit. The baseline branch predictor may receive a branch instruction associated with a given address identifier, and generate, based on a global branch history, an initial prediction of a branch direction for the instruction. The empirical branch bias override circuit may determine, dependent on a direction of an observed branch direction bias in executed branch instruction instances associated with the address identifier, whether the initial prediction should be overridden, may determine, in response to determining that the initial prediction should be overridden, a final prediction that matches the observed branch direction bias, or may determine, in response determining that the initial prediction should not be overridden, a final prediction that matches the initial prediction. The predictor may update an entry in the global branch history reflecting the resolved branch direction for the instruction following its execution.

Method and System for Detection of Thread Stall

A method of checking for a stall condition in a processor is disclosed, the method including inserting an inline instruction sequence into a thread, the inline instruction sequence configured to read the result from a timing register during processing of a first instruction and store the result in a first general purpose register, wherein the timing register functions as a timer for the processor; and read the results from the timing register during processing of a second instruction and store the results in a second general purpose register, wherein the second instruction is the next consecutive instruction after the first instruction. The inline thread sequence may be inserted in sequence with the thread and further configured to compare the difference between the result in the first and second general purpose register to a programmable threshold.

Folded instruction fetch pipeline

An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.

Branch target buffer that stores predicted set index and predicted way number of instruction cache

A microprocessor includes a branch target buffer (BTB). Each BTB entry holds a tag based on at least a portion of a virtual address of a block of instructions previously fetched from a physically-indexed physically-tagged set associative instruction cache using a physical address that is a translation of the virtual address, a translated address bit portion of a set index of an instruction cache entry from which the instruction block was previously fetched, and a way number of the instruction cache entry from which the instruction block was previously fetched. In response to a BTB hit based on a fetch virtual address, the BTB provides a translated address bit portion of a predicted set index that is the translated address bit portion of the set index from the hit on BTB entry and a predicted way number that is the way number from the hit on BTB entry.

Latest producer tracking in an out-of-order processor, and applications thereof

A processor and system for latest producer tracking In one embodiment, the processor includes an operand renamer circuit that includes a register rename map, a producer tracking circuit that includes a producer tracking map, and a results buffer allocater circuit that includes a results buffer free list. Control logic modifies in-register status values stored in the register rename map based on producer tracking status values stored in the producer tracking map. The producer tracking status values stored in the producer tracking map are modified based on buffer identification values output by the results buffer allocater circuit.

Parallelized execution of instruction sequences based on pre-monitoring

A method which includes, in a processor that processes instructions of program code, processing one or more of the instructions in a first segment of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions in a second segment of the instructions, at least partially in parallel with processing of the instructions of the first segment by the first hardware thread, in accordance with a specification of register access that is indicative of data dependencies between the first and second segments.

Method and apparatus for sorting elements in hardware structures
10289419 · 2019-05-14 · ·

A method for sorting elements in hardware structures is disclosed. The method comprises selecting a plurality of elements to order from an unordered input queue (UIQ) within a predetermined range in response to finding a match between at least one most significant bit of the predetermined range and corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of a respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier to a matching port number of the ordered queue from the enabled multiplexer.

Accelerator for gather-update-scatter operations including a content-addressable memory (CAM) and CAM controller

A processor may include a gather-update-scatter accelerator, and an allocator comprising circuitry to direct an instruction to the accelerator for execution. The instruction may include a search index, an operation to be performed, and a scalar data value. The accelerator may include a content-addressable memory (CAM) storing multiple entries, each of which stores a respective index key and a data value associated with the index key. The accelerator may include a CAM controller, which includes circuitry. The CAM controller may be configured to select, based on the information in the instruction, one of the plurality of entries in the CAM on which to operate. The CAM controller may be configured to perform an arithmetic or logical operation on the selected entry dependent on the information in the instruction. The CAM controller may be configured to store a result of the operation in the selected entry in the CAM.