G06F9/3814

Method for replenishing a thread queue with a target instruction of a jump instruction
11579885 · 2023-02-14

Methods and electronic circuits for executing instructions in a central processing unit (CPU) are provided. One of the methods includes forming an instruction block by sequentially fetching, from a current thread queue, one or more instructions including one jump instruction, wherein the jump instruction is the last instruction in the instruction block; transmitting the instruction block to a CPU execution unit for execution; replenishing the current thread queue with at least one instruction to form a thread queue to be executed; determining a target instruction of the jump instruction according to an execution result of the CPU execution unit; determining whether the target instruction is contained in the thread queue to be executed; and if not, flushing the thread queue to be executed, obtaining the target instruction and adding the target instruction to the thread queue to be executed.
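
The control flow in this abstract is concrete enough to sketch. Below is a minimal C++ model of the queueing discipline: a thread queue of instruction addresses is replenished after a block ending in a jump is issued, and is flushed and refilled only when the resolved target is absent from the queue. The type names, the refill depth of 8, and the 4-byte instruction size are illustrative assumptions, not details from the patent.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>

// Hypothetical model of the abstract's queueing discipline.
// Instructions are identified by their addresses.
struct ThreadQueue {
    std::deque<uint64_t> q;   // pending instruction addresses
    uint64_t fetch_pc;        // next sequential fetch address

    void replenish(unsigned n) {              // refill after a block is issued
        while (n--) { q.push_back(fetch_pc); fetch_pc += 4; }
    }
    bool contains(uint64_t addr) const {
        for (uint64_t a : q) if (a == addr) return true;
        return false;
    }
    void redirect(uint64_t target) {          // flush and restart at target
        q.clear();
        fetch_pc = target;
        replenish(8);                         // assumed refill depth
    }
};

int main() {
    ThreadQueue tq{{}, 0x1000};
    tq.replenish(8);
    // ... a block ending in a jump executes; the jump resolves to `target` ...
    uint64_t target = 0x2000;                 // assumed resolved target
    if (!tq.contains(target))
        tq.redirect(target);                  // flush only on a miss
    std::cout << std::hex << tq.q.front() << "\n";  // prints 2000
}
```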

Programmable instruction buffering for accumulating a burst of instructions

A processing system 2 includes a processing pipeline 12, 14, 16, 18, 28 which includes fetch circuitry 12 for fetching instructions to be executed from a memory 6, 8. Buffer control circuitry 34 responds to a programmable trigger, such as explicit hint instructions delimiting an instruction burst, or predetermined configuration data specifying the parameters of a burst together with a synchronising instruction, by stalling a stallable portion of the processing pipeline (e.g. issue circuitry 16), accumulating within one or more buffers 30, 32 fetched instructions starting from a predetermined starting instruction, and, once those instructions have been accumulated, restarting the stallable portion of the pipeline.
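
A minimal sketch of the described stall/accumulate/restart behaviour, assuming a simple controller that is armed by a trigger carrying a burst length and releases the issue stall once that many instructions have been buffered; all names and the trigger encoding are hypothetical.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical controller model: on a start-of-burst trigger it stalls
// the issue stage, accumulates fetched instructions, and releases the
// stall once the configured burst length has been buffered.
struct BurstBufferCtrl {
    bool issue_stalled = false;
    unsigned burst_len = 0;               // configured burst length
    std::vector<uint32_t> buffer;         // accumulated instructions

    void on_trigger(unsigned len) {       // hint instruction / config + sync
        burst_len = len;
        buffer.clear();
        issue_stalled = true;             // stall the stallable stage
    }
    void on_fetch(uint32_t insn) {
        if (!issue_stalled) return;       // normal flow: pass through
        buffer.push_back(insn);
        if (buffer.size() == burst_len)
            issue_stalled = false;        // burst accumulated: restart issue
    }
};

int main() {
    BurstBufferCtrl ctrl;
    ctrl.on_trigger(4);
    for (uint32_t insn = 0; insn < 4; ++insn) ctrl.on_fetch(insn);
    std::cout << (ctrl.issue_stalled ? "stalled" : "running") << "\n"; // running
}
```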

INSTRUCTION PRE-FETCHING
20180011735 · 2018-01-11

Pre-fetching instructions for tasks of an operating system (OS) is provided by calling a task scheduler that determines a load start time for a set of instructions for a particular task corresponding to a task switch condition. In response to the load start time, the OS calls a loader entity module that generates a pre-fetch request to load the set of instructions for the particular task from a non-volatile memory circuit into a random access memory circuit. The OS then calls the task scheduler to switch to the particular task.
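
A minimal sketch of this flow, assuming a linear copy-latency model for deriving the load start time; the `Task`, `LoaderEntity`, and timing parameters are illustrative, not the patent's interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct Task { int id; const uint8_t* nv_image; std::size_t size; };

struct LoaderEntity {
    std::vector<uint8_t> ram;
    void prefetch(const Task& t) {                    // the pre-fetch request
        ram.assign(t.nv_image, t.nv_image + t.size);  // NV -> RAM copy
    }
};

// load start time = switch time minus copy latency (assumed linear model)
uint64_t load_start_time(uint64_t switch_time, std::size_t bytes,
                         uint64_t bytes_per_tick) {
    return switch_time - bytes / bytes_per_tick;
}

int main() {
    static const uint8_t image[64] = {0x90};
    Task t{1, image, sizeof image};
    uint64_t start = load_start_time(1000, sizeof image, 4);  // tick 984
    LoaderEntity loader;
    loader.prefetch(t);   // the OS would call the loader entity at `start`
    std::cout << "prefetch at tick " << start << ", "
              << loader.ram.size() << " bytes resident\n";
}
```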

HARDWARE CONTROLLED INSTRUCTION PRE-FETCHING
20180011736 · 2018-01-11

A task control circuit maintains, in response to task event information, a task information queue that includes task information for a plurality of tasks. Based upon the task information in the task information queue, a future task switch condition is identified as corresponding to a task switch time for a particular task of the plurality of tasks. A load start time is determined for a set of instructions for the particular task. A pre-fetch request is generated to load the set of instructions for the particular task into the memory circuit. The pre-fetch request is forwarded to a hardware loader circuit. In response to the task switch time, a task event trigger is generated for the particular task. The hardware loader circuit is used to load, in response to the pre-fetch request, the set of instructions from a non-volatile memory into the memory circuit.
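
A minimal sketch of the task-information-queue side of this abstract, again under an assumed linear load-latency model; the structures and the planning function are hypothetical stand-ins for the task control and hardware loader circuits.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <queue>

// Hypothetical task-control model: a queue of upcoming task switches is
// scanned to compute, per task, when its instruction image must start
// loading so it is resident by the switch time.
struct TaskInfo { int task_id; uint64_t switch_time; std::size_t image_bytes; };

struct PrefetchRequest { int task_id; uint64_t start_time; };

PrefetchRequest plan(const TaskInfo& ti, uint64_t bytes_per_tick) {
    return {ti.task_id, ti.switch_time - ti.image_bytes / bytes_per_tick};
}

int main() {
    std::queue<TaskInfo> task_q;                    // task information queue
    task_q.push({7, /*switch_time=*/5000, /*image_bytes=*/4096});
    while (!task_q.empty()) {
        PrefetchRequest r = plan(task_q.front(), /*bytes_per_tick=*/8);
        task_q.pop();
        // the request would be forwarded to the hardware loader circuit
        std::cout << "task " << r.task_id
                  << ": start load at tick " << r.start_time << "\n"; // 4488
    }
}
```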

Processor with instruction iteration

A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.
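
A minimal sketch of how an execution unit might honour such a field. The abstract only specifies a field indicating iterative execution; the separate iteration-count field here is an assumption added to make the example concrete.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical encoding: one field flags iterative execution and a second
// (assumed) field holds the repeat count; the execution unit loops on it.
struct Insn {
    uint32_t opcode;
    bool iterate;        // "first field": execute repeatedly?
    uint32_t count;      // assumed iteration-count field
};

void execute(const Insn& i) {
    uint32_t reps = i.iterate ? i.count : 1;
    for (uint32_t r = 0; r < reps; ++r)
        std::cout << "exec opcode " << i.opcode << "\n";
}

int main() {
    execute({0x2A, /*iterate=*/true, /*count=*/3});   // runs three times
}
```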

Moving entries between multiple levels of a branch predictor based on a performance loss resulting from fewer than a pre-set number of instructions being stored in an instruction cache register

An instruction processing device and an instruction processing method are provided. The instruction processing device includes: a first-level branch target buffer, configured to store entries of a first plurality of branch instructions; a second-level branch target buffer, configured to store entries of a second plurality of branch instructions, wherein the entries in the first-level branch target buffer are accessed faster than the entries in the second-level branch target buffer; an instruction fetch unit coupled to the first-level branch target buffer and the second-level branch target buffer, the instruction fetch unit including circuitry configured to add, for a first branch instruction, one or more entries corresponding to the first branch instruction into the first-level branch target buffer when the one or more entries corresponding to the first branch instruction are identified in the second-level branch target buffer; and an execution unit including circuitry configured to execute the first branch instruction.
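
A minimal sketch of the promotion policy the abstract describes, modelling the two buffer levels as maps from branch address to target; sizes, eviction, and timing are omitted.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Hypothetical two-level BTB model: a small fast first level and a larger,
// slower second level. On a first-level miss that hits in the second level,
// the entry is promoted so later lookups for that branch are fast.
struct TwoLevelBTB {
    std::unordered_map<uint64_t, uint64_t> l1, l2;  // branch pc -> target

    bool lookup(uint64_t pc, uint64_t& target) {
        auto it = l1.find(pc);
        if (it != l1.end()) { target = it->second; return true; }
        it = l2.find(pc);
        if (it == l2.end()) return false;
        l1[pc] = it->second;            // promote entry into the fast level
        target = it->second;
        return true;
    }
};

int main() {
    TwoLevelBTB btb;
    btb.l2[0x400] = 0x800;              // known branch, resident only in L2
    uint64_t t;
    btb.lookup(0x400, t);               // L2 hit -> promoted into L1
    std::cout << std::hex << btb.l1[0x400] << "\n";  // prints 800
}
```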

Circular queue management with split indexes

Methods and apparatus for managing circular queues are disclosed. A pointer designates an index position of a particular queue element and contains an additional pointer state, whereby two pointer values (split indexes) can designate the same index position. Front and rear pointers are respectively managed by dequeue and enqueue logic. The front pointer state and rear pointer state distinguish full and empty queue states when both pointers designate the same index position. Asynchronous dequeue and enqueue operations are supported, no lock is required, and no queue entry is wasted. Hardware and software embodiments for numerous applications are disclosed.
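
The split-index scheme itself is spelled out in the abstract and can be sketched directly: each pointer carries one wrap bit beyond the index, so equal pointers mean empty while equal indexes with differing wrap bits mean full, and no slot is wasted. The power-of-two capacity is an assumption made to keep the index arithmetic simple.

```cpp
#include <cstdint>
#include <cstdio>

// Split-index circular queue: pointers range over 2*N values, so the same
// index position is designated by two pointer values, disambiguating a
// full queue from an empty one without wasting an entry.
template <typename T, uint32_t N>   // N assumed to be a power of two
struct SplitIndexQueue {
    T slots[N];
    uint32_t front = 0, rear = 0;   // low bits: index; next bit: wrap state

    static uint32_t idx(uint32_t p) { return p & (N - 1); }
    bool empty() const { return front == rear; }
    bool full()  const { return idx(front) == idx(rear) && front != rear; }

    bool enqueue(const T& v) {       // enqueue logic touches only `rear`
        if (full()) return false;
        slots[idx(rear)] = v;
        rear = (rear + 1) & (2 * N - 1);
        return true;
    }
    bool dequeue(T& v) {             // dequeue logic touches only `front`
        if (empty()) return false;
        v = slots[idx(front)];
        front = (front + 1) & (2 * N - 1);
        return true;
    }
};

int main() {
    SplitIndexQueue<int, 4> q;
    for (int i = 0; i < 4; ++i) q.enqueue(i);    // fills all 4 slots
    std::printf("full=%d\n", q.full());          // full=1, no wasted entry
    int v; q.dequeue(v);
    std::printf("dequeued %d, full=%d\n", v, q.full());
}
```

Because the producer writes only `rear` and the consumer writes only `front`, asynchronous enqueue and dequeue can proceed without a lock, as the abstract notes.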

PROCESSOR EMBEDDED STREAMING BUFFER
20220413852 · 2022-12-29

Techniques are disclosed for the use of local buffers integrated into the execution units of a vector processor architecture. The use of local buffers reduces communication across the interconnection network implemented by vector processors, which increases available interconnection network bandwidth, speeds up computations, and decreases power usage.

Executing multiple programs simultaneously on a processor core

Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
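
A minimal sketch of the temporal side of this allocation, assuming a fixed cyclic schedule in which each context owns a guaranteed number of cycle slots; the spatial partitioning and the results buffer are not modelled.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical temporal-allocation model: each context owns a guaranteed
// share of cycles in a fixed-length schedule, so no context can be
// starved by others competing for the core.
struct ContextShare { int ctx; uint32_t cycles; };

std::vector<int> build_schedule(const std::vector<ContextShare>& shares) {
    std::vector<int> slots;                  // one entry per clock cycle
    for (const auto& s : shares)
        slots.insert(slots.end(), s.cycles, s.ctx);
    return slots;
}

int main() {
    auto sched = build_schedule({{0, 3}, {1, 1}});  // ctx0: 3 cycles, ctx1: 1
    for (uint64_t cycle = 0; cycle < 8; ++cycle)    // schedule repeats
        std::cout << "cycle " << cycle << " -> ctx "
                  << sched[cycle % sched.size()] << "\n";
}
```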

LINK STACK BASED INSTRUCTION PREFETCH AUGMENTATION
20220382552 · 2022-12-01

A computer-implemented method of performing link stack based prefetch augmentation using sequential prefetching includes observing a call instruction in a program being executed and pushing a return address onto a link stack for processing the next instruction. A stream of instructions is prefetched starting from a cached line address of the next instruction and is stored in an instruction cache.
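
A minimal sketch of this mechanism, assuming 64-byte cache lines and an arbitrary stream length; the structure names are hypothetical.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical model: on a call, the return address is pushed onto the
// link stack and a short sequential stream starting at that address is
// prefetched into the instruction cache, so the code after the eventual
// return is already resident.
struct LinkStackPrefetcher {
    std::vector<uint64_t> link_stack;
    std::vector<uint64_t> icache_lines;          // prefetched line addresses

    void on_call(uint64_t return_addr, unsigned lines) {
        link_stack.push_back(return_addr);
        uint64_t line = return_addr & ~63ULL;    // 64-byte line assumed
        for (unsigned i = 0; i < lines; ++i)     // sequential stream
            icache_lines.push_back(line + 64ULL * i);
    }
};

int main() {
    LinkStackPrefetcher p;
    p.on_call(/*return_addr=*/0x10f0, /*lines=*/3);
    for (uint64_t a : p.icache_lines)
        std::cout << std::hex << a << "\n";      // 10c0 1100 1140
}
```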