Patent classifications
G06F2209/507
Scheduling execution of instructions on a processor having multiple hardware threads with different execution resources
A method and apparatus are provided for executing instructions of a multi-threaded processor having multiple hardware threads (32, 34) with differing hardware resources comprising the steps of receiving a plurality of streams of instructions (38, 44) and determining which hardware threads are able to receive instructions for execution (40, 46), determining whether a thread determined to be available for executing an instructions has the hardware resources available required by that instructions (36) and executing the instruction in dependence on the result of the determination (50).
Method for hiding texture latency and managing registers on a processor
A method for hiding texture latency in a multi-thread virtual pipeline (MVP) processor including the steps of: allowing the MVP processor to start running a main rendering program; segmenting registers of various MVP kernel instances in the MVP processor according to the length set, acquiring a plurality of register sets with the same length, binding the register sets to chipsets of the processor at the beginning of the running of the kernel instance; allowing a shader thread to give up a processing time slot occupied by the shader thread after sending a texture detail request, and setting a Program Counter (PC) value in the case of return; and returning texture detail and allowing the shader thread to restart running.
Method and apparatus for sharing instruction scheduling resources among a plurality of execution threads in a multi-threaded processor architecture
A microprocessor includes a front end module and a schedule queue module. The front end module is configured to retrieve first instructions, corresponding to a first thread, from an instruction cache, and retrieve second instructions, corresponding to a second thread, from the instruction cache. The front end module is also configured to decode the first instructions into first decoded instructions, and decode the second instructions into second decoded instructions. The schedule queue module is configured to selectively store the first decoded instructions and the second decoded instructions from the front end module and, for each stored decoded instruction, selectively issue the stored decoded instruction to an execution module. The schedule queue is further configured to reject storing an additional one of the first decoded instructions from the front end module in response to a count of the stored first decoded instructions in the schedule queue module exceeding a threshold.
Task Execution in a SIMD Processing Unit with Parallel Groups of Processing Lanes
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
MANAGEMENT OF RESOURCES WITHIN A COMPUTING ENVIRONMENT
Resources in a computing environment are managed, for example, by a hardware controller controlling dispatching of resources from one or more pools of resources to be used in execution of threads. The controlling includes conditionally dispatching resources from the pool(s) to one or more low-priority threads of the computing environment based on current usage of resources in the pool(s) relative to an associated resource usage threshold. The management further includes monitoring resource dispatching from the pool(s) to one or more high-priority threads of the computing environment, and based on the monitoring, dynamically adjusting the resource usage threshold used in the conditionally dispatching of resources from the pool(s) to the low-priority thread(s).
Resource sharing using process delay
Methods and systems that reduce the number of instance of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function are provided. a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resources N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and then used for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments.
Allocation of resources to tasks
A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers are freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
Allocation of Resources to Tasks
A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers are freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
System and method for user interactive pipelining of a computing application
A method of pipelining execution stages of a pipelined application can comprise a Buffer Pipeline Manager (BPM) of a Buffer Pipelined Application computing System (BPAS) allocating pipeline buffers, configuring access to the pipeline buffers by stage processors of the system, transferring buffers from one stage processor to a successor stage processor, and transferring data from a buffer in one memory to a buffer in an alternative memory. The BPM can allocate the buffers based on execution parameters associated with the pipelined application and/or stage processors. The BPM can transfer data to a buffer in an alternative memory based on performance, capacity, and/or topological attributes of the memories and/or processors utilizing the memories. The BPM can perform operations of the method responsive to interfaces of a Pipeline Programming Interface (PPI). A BPAS can comprise hardware processors, physical memories, stage processors, an application execution program, the PPI, and the BPM.
EFFICIENT DATA PROCESSING
A processor and method for handling data, by obtaining operations from storage, analyzing each of the operations to determine an associated operation space, and generating at least one operation set, wherein the operations of the operation set have substantially similar operation spaces. Receiving input data in the form of a tensor; and allocate the input data, as the input to a given operation of the operation set. The input data having the predetermined input characteristics associated with the given operation. Executing the given operations using the input to produces an output with the known output characteristics. Storing in a segment being associated with an operation of the operation set, the input data; and the output associated with the operation of the operation set.