Patent classifications
G06F2209/507
Allocation of Resources to Tasks
A method of managing resources in a graphics processing pipeline includes, in response to selecting a task for execution within a texture/shading unit, allocating to the task both a static allocation of temporary registers for the entire task and a dynamic allocation of temporary registers. The dynamic allocation comprises temporary registers used by a first phase of the task only and the static allocation of temporary registers comprises any temporary registers that are used by the program and are live at a boundary between two phases. When the task subsequently reaches a boundary between two phases, the dynamic allocation of temporary registers are freed and a new dynamic allocation of temporary registers for a next phase of the task is allocated to the task.
Input/output (I/O) memory management unit (IOMMU) multi-core interference mitigation
A multicore processing environment (MCPE) is disclosed. In embodiments, the MCPE includes multiple processing cores hosting multiple user applications configured for simultaneous execution. The cores and user applications share system resources including main memory and input/output (I/O) domains, each I/O domain including multiple I/O devices capable of requesting inbound access to main memory through an I/O memory management unit (IOMMU). For example, the IOMMU cache associates unique cache tags to each I/O device based on device identifiers or settings determined by the system registers, caching the configuration data for each I/O device under the appropriate cache tag. When each I/O device requests main memory access, the IOMMU cache refers to the appropriate configuration data under the corresponding unique cache tag. This prevents contention in the IOMMU cache caused by one device evicting the cache entry of another, minimizing interference channels by reducing the need for main memory access.
Resource management subsystem that maintains fairness and order
One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
System, apparatus and method for providing power monitoring isolation in a processor
In one embodiment, a processor comprises a plurality of cores and a controller. Each of the plurality of cores may include: an execution circuit, a power measurement circuit to measure power consumption of the core and a first register to store a power-related context identifier to identify a process to be executed on the core. The controller may include: a plurality of energy status registers each associated with a power-related context identifier and to store energy consumption information of a process; and a control circuit coupled to the plurality of energy status registers, where the control circuit is to enable each of a plurality of processes to independently monitor the energy consumption information of the process and prevent each of the plurality of processes from monitoring the energy consumption information of other ones of the plurality of processes. Other embodiments are described and claimed.
TASK EXECUTION IN A SIMD PROCESSING UNIT WITH PARALLEL GROUPS OF PROCESSING LANES
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
Multi-core processor using former-stage pipeline portions and latter-stage pipeline portions assigned based on decode results in former-stage pipeline portions
A multi-core processor includes a plurality of former-stage cores that perform parallel processing using a plurality of pipelines covering a plurality of stages. In the pipelines, the former-stage cores perform stages ending with an instruction decode stage; stages starting with an instruction execution stage are executed by a latter-stage core. A dynamic load distribution block refers to decode results in the instruction decode stage and controls to assign the latter-stage core with a latter-stage-needed decode result being a decode result whose processing needs to be executed in the latter-stage core.
Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
A global front end scheduler to schedule instruction sequences to a plurality of virtual cores implemented via a plurality of partitionable engines. The global front end scheduler includes a thread allocation array to store a set of allocation thread pointers to point to a set of buckets in a bucket buffer in which execution blocks for respective threads are placed, a bucket buffer to provide a matrix of buckets, the bucket buffer including storage for the execution blocks, and a bucket retirement array to store a set of retirement thread pointers that track a next execution block to retire for a thread.
Method For Platform-Based Scheduling Of Job Flow
A method for a platform-based scheduling of a job flow is provided, which relates to the technology field of satellites. Specifically, the method includes: acquiring satellite remote sensing data according to transit time of a satellite and triggering a processing flow of the satellite remote sensing data; determining an execution order of the processing flow of the satellite remote sensing data according to a constraint relationship; and allocating processing resources to the processing flow in a hierarchical scheduling manner according to the execution order, and executing the processing flow with the processing resources. Due to a reasonable determination of the execution order of the processing flow of the satellite remote sensing data and a scheduling of the processing resources in the hierarchical scheduling manner, a reasonable allocation of the processing resources is realized. Thus, A timeliness of the processing of the satellite remote sensing data may be improved.
Task execution in a SIMD processing unit with parallel groups of processing lanes
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
Allocation of memory
Methods of memory allocation map registers referenced by different groups of instances of the same task to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.