G06F2209/507

Scheduling execution of instructions on a processor having multiple hardware threads with different execution resources
10318296 · 2019-06-11

A method and apparatus are provided for executing instructions on a multi-threaded processor having multiple hardware threads with differing hardware resources. The method comprises the steps of receiving a plurality of streams of instructions; determining which hardware threads are able to receive instructions for execution; determining whether a thread determined to be available for executing an instruction has the hardware resources required by that instruction; and executing the instruction in dependence on the result of the determination.
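The two determinations in this abstract (is a thread available, and does it have the resources the instruction needs) can be illustrated with a small issue loop. This is a minimal sketch under assumed simplifications: `HardwareThread`, `Instruction`, `schedule`, and the resource names are hypothetical, each stream issues at most one instruction per cycle, and the first matching thread wins.

```python
from dataclasses import dataclass


@dataclass
class HardwareThread:
    """A hardware thread and the execution resources it owns (hypothetical model)."""
    name: str
    resources: frozenset   # e.g. frozenset({"alu", "fpu"})
    busy: bool = False     # True once the thread has accepted an instruction this cycle


@dataclass
class Instruction:
    opcode: str
    needs: frozenset       # execution resources the instruction requires


def schedule(streams, threads):
    """Issue at most one instruction per stream in one scheduling cycle.

    For each stream's next instruction: (1) find the hardware threads that can
    still accept an instruction, (2) check whether one of them has the resources
    the instruction needs, (3) issue to the first match, otherwise leave the
    instruction waiting at the head of its stream.
    Returns (stream index, opcode, thread name) for each issued instruction.
    """
    issued = []
    for idx, stream in enumerate(streams):
        if not stream:
            continue
        instr = stream[0]
        available = [t for t in threads if not t.busy]      # step 1
        for t in available:
            if instr.needs <= t.resources:                  # step 2: subset test
                t.busy = True                               # step 3: issue
                stream.pop(0)
                issued.append((idx, instr.opcode, t.name))
                break
    return issued


threads = [HardwareThread("T0", frozenset({"alu"})),
           HardwareThread("T1", frozenset({"alu", "fpu"}))]
streams = [[Instruction("fmul", frozenset({"fpu"}))],
           [Instruction("add", frozenset({"alu"}))]]
print(schedule(streams, threads))  # [(0, 'fmul', 'T1'), (1, 'add', 'T0')]
```

First-fit is only one possible policy; a scheduler could instead prefer the least capable thread that still satisfies the instruction, keeping resource-rich threads free for instructions that need them.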

Task execution in a SIMD processing unit with parallel groups of processing lanes

A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
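Why temporal alignment of invalid work items saves slots can be shown with a toy lane/cycle model. This is a simplified sketch, not the patented assembly scheme: it assumes work item `i` runs on lane `i % lanes` in cycle `i // lanes`, that a cycle can be skipped only when every item in it is invalid, and that items are independent so they may be reordered; it ignores the block structure the abstract describes. `cycles_needed`, `wasted_slots`, and `assemble_aligned` are hypothetical names.

```python
def cycles_needed(task, lanes):
    """Number of processing cycles that must actually execute for a task.

    task is a list of (work_item_id, is_valid). Work item i runs on lane
    i % lanes in cycle i // lanes. A cycle in which every work item is invalid
    can be skipped; any other cycle executes and wastes its invalid lanes.
    """
    cycles = (len(task) + lanes - 1) // lanes
    return sum(
        any(valid for _, valid in task[c * lanes:(c + 1) * lanes])
        for c in range(cycles)
    )


def wasted_slots(task, lanes):
    """Lane slots spent on invalid (or empty) positions in executed cycles."""
    valid = sum(1 for _, ok in task if ok)
    return cycles_needed(task, lanes) * lanes - valid


def assemble_aligned(work_items):
    """Simplified validity-aware assembly (hypothetical): valid work items first,
    invalid ones last, so invalid items share whole cycles across the lanes."""
    return [w for w in work_items if w[1]] + [w for w in work_items if not w[1]]


items = [(i, i % 2 == 0) for i in range(8)]   # every other data item is invalid
print(cycles_needed(items, 4), wasted_slots(items, 4))       # 2 4
aligned = assemble_aligned(items)
print(cycles_needed(aligned, 4), wasted_slots(aligned, 4))   # 1 0
```

With invalid items scattered, both cycles must run and four lane slots are wasted; once the invalid items share a cycle, that cycle can be skipped entirely.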

Method for platform-based scheduling of job flow

A method for platform-based scheduling of a job flow is provided, relating to the technical field of satellites. Specifically, the method includes: acquiring satellite remote sensing data according to the transit time of a satellite and triggering a processing flow for the satellite remote sensing data; determining an execution order of the processing flow according to a constraint relationship; and allocating processing resources to the processing flow in a hierarchical scheduling manner according to the execution order, and executing the processing flow with the allocated processing resources. Because the execution order of the processing flow is determined appropriately and the processing resources are scheduled hierarchically, a reasonable allocation of the processing resources is achieved. Thus, the timeliness of processing the satellite remote sensing data may be improved.
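Deriving an execution order from a constraint relationship is a topological sort, and a hierarchical allocation can be approximated by trying resource levels in order. The sketch below is an illustration under stated assumptions: the step names, unit demands, and the two levels (`"node"` before `"cluster"`) are hypothetical, resources are never released, and it uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_order(constraints):
    """constraints maps each processing step to the steps that must finish first."""
    return list(TopologicalSorter(constraints).static_order())


def allocate_hierarchically(order, demand, levels):
    """Assign each step, in execution order, to the first resource level with
    enough free units, e.g. a local node pool before a shared cluster pool.

    levels is an ordered dict {level name: free units}. Units are not released
    when a step finishes, which keeps the sketch short.
    """
    free = dict(levels)
    plan = {}
    for step in order:
        for level in free:
            if free[level] >= demand[step]:
                free[level] -= demand[step]
                plan[step] = level
                break
        else:
            raise RuntimeError(f"no resource level can host step {step!r}")
    return plan


# Hypothetical processing chain for one satellite pass.
constraints = {
    "radiometric": {"ingest"},
    "geometric": {"radiometric"},
    "product": {"geometric"},
}
order = execution_order(constraints)
print(order)  # ['ingest', 'radiometric', 'geometric', 'product']
demand = {"ingest": 2, "radiometric": 4, "geometric": 4, "product": 1}
print(allocate_hierarchically(order, demand, {"node": 6, "cluster": 16}))
```

Once the node pool is exhausted, later steps fall through to the cluster level, which is the essence of trying the nearer, cheaper tier first.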

Apparatus and method for programmable load replay preclusion

An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus and dispatches one or more younger micro instructions that depend on the load micro instruction for execution a number of clock cycles after dispatch of the load micro instruction; if the hold bus indicates that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand. The non-core resources include a random access memory that is programmed, via a Joint Test Action Group (JTAG) interface, with the plurality of specified load instructions corresponding to the out-of-order processor; upon initialization, the processor accesses the random access memory to determine the plurality of specified load instructions.
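The programmable part of this mechanism is that the set of specified loads comes from a RAM loaded at start-up rather than from fixed logic. The sketch below models that flow under assumptions: `SpecifiedLoadTable`, `hold_bus_value`, `dependents_may_dispatch`, and the opcode strings are hypothetical; the list passed to the constructor stands in for the JTAG-programmed RAM image, and `initialize()` stands in for the processor reading it.

```python
class SpecifiedLoadTable:
    """Stand-in for the RAM that lists the specified load instructions.

    In the abstract the RAM is programmed over JTAG and read by the processor at
    initialization; here ram_contents plays the programmed image and initialize()
    plays the start-up read. Opcode names used below are hypothetical.
    """

    def __init__(self, ram_contents):
        self._ram = list(ram_contents)
        self._specified = frozenset()   # empty until the processor initializes

    def initialize(self):
        self._specified = frozenset(self._ram)

    def is_specified(self, opcode):
        return opcode in self._specified


def hold_bus_value(table, load_opcode):
    """First reservation station: asserted when the dispatched load is a specified load."""
    return table.is_specified(load_opcode)


def dependents_may_dispatch(hold, operand_ready, cycles_since_load, n):
    """Second reservation station: normally release dependents n cycles after the
    load; if the hold bus was asserted, stall until the operand is retrieved."""
    if hold:
        return operand_ready
    return cycles_since_load >= n


table = SpecifiedLoadTable(["ld_io", "ld_uncached"])   # hypothetical RAM image
table.initialize()
hold = hold_bus_value(table, "ld_io")
print(hold, dependents_may_dispatch(hold, operand_ready=False, cycles_since_load=10, n=4))
# True False
```

Changing the RAM image changes which loads hold their dependents without any change to the reservation-station logic.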

ALLOCATION OF RESOURCES TO TASKS
20240273804 · 2024-08-15

A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used only by a first phase of the task, and the static allocation comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers is freed and a new dynamic allocation of temporary registers for the next phase of the task is allocated to the task.
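The allocate, free, and reallocate sequence at phase boundaries can be traced with register counts alone. This is a minimal sketch assuming a single shared pool of temporary registers; `RegisterPool`, `Task`, and the register counts are hypothetical, and real hardware would track individual registers rather than counts.

```python
class RegisterPool:
    """Hypothetical pool of temporary registers shared by tasks in a shading unit."""

    def __init__(self, size):
        self.free = size

    def take(self, n):
        if n > self.free:
            raise RuntimeError(f"need {n} registers, only {self.free} free")
        self.free -= n
        return n

    def give_back(self, n):
        self.free += n


class Task:
    """A task whose program is split into phases.

    static_regs: registers live across a phase boundary, held for the whole task.
    phase_regs:  registers needed only inside each phase (the dynamic part).
    """

    def __init__(self, static_regs, phase_regs):
        self.static_regs = static_regs
        self.phase_regs = list(phase_regs)
        self.phase = 0
        self.dynamic = 0

    def start(self, pool):
        # Both allocations are made when the task is selected for execution.
        pool.take(self.static_regs + self.phase_regs[0])
        self.dynamic = self.phase_regs[0]

    def cross_boundary(self, pool):
        # Free the finished phase's dynamic registers, then allocate for the next phase.
        pool.give_back(self.dynamic)
        self.dynamic = 0
        self.phase += 1
        self.dynamic = pool.take(self.phase_regs[self.phase])

    def finish(self, pool):
        pool.give_back(self.static_regs + self.dynamic)
        self.dynamic = 0


pool = RegisterPool(32)
task = Task(static_regs=4, phase_regs=[20, 6])
task.start(pool)
print(pool.free)   # 8
task.cross_boundary(pool)
print(pool.free)   # 22
```

After the boundary the task holds 10 registers instead of the 24 a whole-task allocation would have pinned, leaving the difference available to other tasks.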

Machine tool controller including a multi-core processor for dividing a large-sized program into portions stored in different lockable instruction caches
10127045 · 2018-11-13

A sequential program is divided into programs, each sized to fit into a cache memory, using program profile information and prepared cache memory information. Program profile information on the sequential program and information on the cache memory are acquired. Based on the acquired information, division addresses at which the sequential program is divided are determined. The IDs of the division programs, their assigned core numbers, the start and end addresses of the programs, and cache storage block information are stored in a memory as program execution information.
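Choosing division addresses and recording the execution information can be sketched as a greedy split. Several assumptions are made here: `safe_points` stands in for the profile information (addresses where the program may be cut), cores are assigned round-robin, and the cache storage block information is reduced to the number of cache lines a piece occupies. `DivisionProgram`, `division_addresses`, and `execution_info` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DivisionProgram:
    """One row of program execution information (field names are illustrative)."""
    program_id: int
    core: int
    start: int         # first address of the piece
    end: int           # address just past the piece (exclusive)
    cache_lines: int   # cache blocks the piece occupies (assumed meaning)


def division_addresses(start, end, safe_points, cache_size):
    """Split [start, end) into pieces of at most cache_size bytes.

    safe_points stands in for the profile information: addresses where the
    program may be cut (e.g. between routines). Cuts are placed greedily at the
    last safe point that still keeps the current piece within the cache.
    """
    candidates = sorted(p for p in safe_points if start < p < end) + [end]
    pieces, piece_start, prev = [], start, start
    for point in candidates:
        if point - piece_start > cache_size:
            if prev == piece_start or point - prev > cache_size:
                raise ValueError(f"no safe cut keeps the piece at {piece_start:#x} within the cache")
            pieces.append((piece_start, prev))   # close the piece at the last safe point
            piece_start = prev
        prev = point
    pieces.append((piece_start, end))
    return pieces


def execution_info(pieces, num_cores, line_size):
    """Assign pieces to cores round-robin (an assumption) and record their layout."""
    return [
        DivisionProgram(pid, pid % num_cores, s, e, -(-(e - s) // line_size))
        for pid, (s, e) in enumerate(pieces)
    ]


pieces = division_addresses(0x0, 0x300, [0x100, 0x180, 0x280], cache_size=0x200)
for row in execution_info(pieces, num_cores=2, line_size=64):
    print(row)
```

Each printed row corresponds to one division program: its ID, assigned core, address range, and cache footprint.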

Mechanism to preclude load replays dependent on long load cycles in an out-of-order processor

An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction requires more than a first number of clock cycles to retrieve the operand. The second reservation station is coupled to the hold bus and dispatches one or more younger micro instructions that depend on the load micro instruction for execution a number of clock cycles after dispatch of the load micro instruction; if the hold bus indicates that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
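The benefit of stalling rather than speculatively dispatching shows up as avoided replays. This sketch assumes one dependent per load, that each load is already tagged as specified or not, and that any load slower than the speculative wait `n` forces its dependent to replay; `dependent_dispatch`, `count_replays`, and the latency figures are hypothetical.

```python
def dependent_dispatch(load_latency, n, hold):
    """Return (dispatch cycle of a dependent, whether it must be replayed).

    Cycles are counted from the load's dispatch. Without the hold bus, the
    dependent is sent n cycles after the load on the assumption that the operand
    will be ready; if the load takes longer, the dependent is replayed. With hold
    asserted, the dependent waits until the operand has been retrieved.
    """
    if hold:
        return load_latency, False
    return n, load_latency > n


def count_replays(loads, n, use_hold_bus):
    """loads: list of (is_specified, latency) pairs, one dependent per load."""
    replays = 0
    for is_specified, latency in loads:
        _, replayed = dependent_dispatch(latency, n, use_hold_bus and is_specified)
        replays += replayed
    return replays


# Two ordinary hits, two specified loads to a slow non-core resource, and one
# ordinary cache miss that the mechanism does not cover.
loads = [(False, 4), (False, 4), (True, 40), (False, 12), (True, 60)]
print(count_replays(loads, n=4, use_hold_bus=False))  # 3
print(count_replays(loads, n=4, use_hold_bus=True))   # 1
```

Only the loads known in advance to be long are held; the ordinary cache miss still replays, which is why the mechanism targets specified loads rather than every load.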

DYNAMIC PHYSICAL REGISTER ALLOCATION ACROSS MULTIPLE THREADS

A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.
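The controller's resize decision and the per-thread mapping table can be sketched together. The policy here is an assumption, not taken from the abstract: a thread that touched at least 90% of its registers in a monitoring interval grows by `step`, one that touched under 50% shrinks (never below its highest referenced register), and the in-processor second table is split evenly between threads. `RegisterController` and its parameters are hypothetical.

```python
class RegisterController:
    """Hypothetical controller that resizes per-thread architected register counts."""

    def __init__(self, second_table_size, grow_at=0.9, shrink_at=0.5, step=4):
        self.second_table_size = second_table_size  # slots in the in-processor table
        self.grow_at, self.shrink_at, self.step = grow_at, shrink_at, step
        self.assigned = {}  # software thread -> number of architected registers
        self.used = {}      # software thread -> registers touched this interval
        self.mapping = {}   # software thread -> "second"/"first" for each register

    def add_thread(self, thread, count):
        self.assigned[thread] = count
        self.used[thread] = set()
        self._remap()

    def record_use(self, thread, reg):
        self.used[thread].add(reg)

    def monitor(self):
        """End of a monitoring interval: grow busy threads, shrink idle ones."""
        for t in list(self.assigned):
            count, used = self.assigned[t], self.used[t]
            ratio = len(used) / count
            if ratio >= self.grow_at:
                self.assigned[t] = count + self.step
            elif ratio < self.shrink_at:
                # Never drop a register that was referenced in this interval.
                floor = max(used) + 1 if used else 1
                self.assigned[t] = max(count - self.step, floor)
            self.used[t] = set()
        self._remap()

    def _remap(self):
        """Give each thread an equal share of the second table; its remaining
        registers stay in the first register table in main memory."""
        share = self.second_table_size // max(len(self.assigned), 1)
        self.mapping = {
            t: ["second" if r < share else "first" for r in range(count)]
            for t, count in self.assigned.items()
        }


ctl = RegisterController(second_table_size=16)
ctl.add_thread("A", 8)
ctl.add_thread("B", 8)
for r in range(8):
    ctl.record_use("A", r)      # thread A uses every register it has
for r in (0, 1):
    ctl.record_use("B", r)      # thread B uses only two
ctl.monitor()
print(ctl.assigned)             # {'A': 12, 'B': 4}
```

Thread A now has 12 architected registers, 8 of them in the processor's table and 4 in main memory, while thread B shrinks to 4, all held in the processor's table.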

Reconfigurable system-on-chip and related methods

A circuit includes combinational circuit elements and sequential circuit elements coupled thereto. The circuit also includes a multiplexor coupled to the combinational and sequential circuit elements, and a system register coupled to the multiplexor. At least a portion of the combinational and sequential circuit elements is configured to selectively switch to operate as a random access memory.