Patent classifications
G06F2209/5018
Thread Management Method and Apparatus
In a thread management method, an application sends first information to an operating system through an API. The first information is used to indicate at least one first task to be executed by the application. The operating system allocates, by using the first information, the at least one first task to be executed by the application to a corresponding first task queue, and allocates a thread to the first task queue based on a current load level of the operating system and a type of the first task queue.
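The allocation flow described above can be sketched as follows. This is a minimal illustration, not the patented method: the queue types ("serial", "parallel"), the `BASE_THREADS` budgets, the 0.0 to 1.0 load scale, and the function names are all assumptions introduced here.

```python
from collections import defaultdict, deque

# Hypothetical per-type thread budgets; the abstract gives no numbers.
BASE_THREADS = {"serial": 1, "parallel": 4}


def enqueue_tasks(first_information, queues):
    """Place each task the application declared into the queue for its type."""
    for task in first_information:
        queues[task["type"]].append(task["name"])


def threads_for_queue(queue_type, load_level):
    """Pick a thread count from the queue type and the current load.

    load_level is assumed to be a fraction in [0.0, 1.0]; a busier system
    gets fewer threads, but every queue keeps at least one.
    """
    if queue_type == "serial":
        return 1  # a serial queue never benefits from more than one thread
    return max(1, round(BASE_THREADS[queue_type] * (1.0 - load_level)))


queues = defaultdict(deque)
enqueue_tasks(
    [{"name": "decode", "type": "parallel"}, {"name": "save", "type": "serial"}],
    queues,
)
```

Under these assumptions, a parallel queue gets four threads on an idle system and a single thread when the system is nearly saturated.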
Work scheduling on candidate collections of processing units selected according to a criterion
In some examples, a system receives a first unit of work to be scheduled in the system that includes a plurality of collections of processing units to execute units of work, where each respective collection of processing units of the plurality of collections of processing units is associated with a corresponding scheduling queue. The system selects, for the first unit of work according to a first criterion, candidate collections from among the plurality of collections of processing units, and enqueues the first unit of work in a scheduling queue associated with a selected collection of processing units that is selected, according to a selection criterion, from among the candidate collections.
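The two-stage selection above can be sketched as follows. The abstract does not name either criterion, so both are assumptions here: the first criterion is taken to be a matching `node` label, and the selection criterion is taken to be the shortest queue. The dictionary layout and the `schedule` function are hypothetical.

```python
from collections import deque


def schedule(unit, collections):
    """Enqueue a unit of work and return the id of the chosen collection."""
    # Stage 1 (assumed first criterion): keep collections on the unit's node.
    candidates = [c for c in collections if c["node"] == unit["node"]]
    if not candidates:
        candidates = collections  # fall back to every collection
    # Stage 2 (assumed selection criterion): shortest scheduling queue wins.
    chosen = min(candidates, key=lambda c: len(c["queue"]))
    chosen["queue"].append(unit["name"])
    return chosen["id"]


collections = [
    {"id": 0, "node": 0, "queue": deque(["a", "b"])},
    {"id": 1, "node": 0, "queue": deque(["c"])},
    {"id": 2, "node": 1, "queue": deque()},
]
chosen_id = schedule({"name": "w", "node": 0}, collections)
```

Collection 2 has the emptiest queue overall, but it is filtered out in the first stage, so the unit lands on collection 1.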
ASYMMETRIC TUNING
Techniques for asymmetric scheduling are described. An example includes a plurality of processor cores, at least two processor cores to have different instruction set architecture support; storage for device characteristics of the processor cores, including instruction set architecture support; and a scheduler to schedule a thread on one of the plurality of processor cores based at least in part on the instruction set architecture support.
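One way to picture ISA-aware placement is a table of per-core feature sets that the scheduler consults before placing a thread. The sketch below is an assumption-laden illustration: the feature names, the `cores` table, and `pick_core` are hypothetical and do not come from the abstract.

```python
def pick_core(cores, required_isa):
    """Return the first core whose supported ISA features cover the thread's needs.

    cores maps a core id to the set of ISA extensions it supports;
    required_isa is the set of extensions the thread uses.
    Returns None when no core qualifies.
    """
    for core_id, supported in cores.items():
        if required_isa <= supported:  # subset test
            return core_id
    return None


# Hypothetical stored device characteristics for two asymmetric cores.
cores = {
    0: {"sse4", "avx2"},
    1: {"sse4", "avx2", "avx512"},
}
```

Here a thread that needs `avx512` can only be placed on core 1, while a thread that needs only `sse4` may run on either core and is given core 0.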
GLOBAL UNIFIED INTERDEPENDENT MULTI-TENANT QUALITY-OF-SERVICE PROCESSOR SCHEME
Embodiments of apparatuses, methods, and systems for a hierarchical multi-tenant processor scheme are disclosed. In an embodiment, a processor includes circuitry to execute threads, registers to store first values to define a tenant hierarchy, registers to store second values to specify a location of a thread corresponding to a tenant within the tenant hierarchy, and circuitry to include the second values in a request to access a resource. Use of the resource is to be monitored or controlled based on the location of the tenant within the tenant hierarchy.
SELF-TUNING THREAD DISPATCH POLICY
Self-tuning thread dispatch policies are described herein. According to one example, a self-tuning thread dispatch policy uses the relative execution time for GPU engines from previous frames to modify the thread dispatch policy for a subsequent frame. In one example, a graphics processing device includes command processing circuitry to receive commands for a render engine and a compute engine of the GPU to render and process frames of an application. The graphics processing device also includes circuitry to determine the usage of shared hardware resources by the render engine and the compute engine for one or more frames of the application. The number of threads to dispatch to the shared hardware resources for a next frame can then be adjusted for the render engine or the compute engine based on the usage of the shared hardware resources for the previous one or more frames.
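The feedback loop above can be sketched in a few lines. The abstract does not give the adjustment rule, so the proportional split used here is an assumption: the engine that took longer in previous frames receives a larger share of a fixed thread budget. The function name and parameters are hypothetical.

```python
def next_frame_budget(total_threads, render_times, compute_times):
    """Split a thread budget between render and compute for the next frame.

    render_times and compute_times hold execution times (any consistent unit)
    measured over one or more previous frames.
    """
    render_avg = sum(render_times) / len(render_times)
    compute_avg = sum(compute_times) / len(compute_times)
    render_share = render_avg / (render_avg + compute_avg)
    # Clamp so that each engine always keeps at least one thread.
    render_threads = max(1, min(total_threads - 1, round(total_threads * render_share)))
    return render_threads, total_threads - render_threads
```

With a budget of 64 threads and a render engine that took three times as long as the compute engine, this rule assigns 48 threads to render and 16 to compute.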
FAST SHUTDOWN OF LARGE SCALE-UP PROCESSES
A system for shutting down a process of a database is provided. In some aspects, the system performs operations including tracking, during startup of a process, code locations of a process in the at least one memory. The operations may further include tracking, during runtime of the process and in response to the tracking the code locations, memory segments of the at least one memory allocated to the process. The operations may further include receiving an indication for a shutdown of a process. The operations may further include waking, in response to the indication, at least one processing thread of a plurality of processing threads allocated to a database system. The operations may further include allocating a list of memory mappings to the plurality of processing threads. The operations may further include freeing, by the first processing thread, the physical memory assigned to the processing thread by the memory mappings.
Command management using allocated command identifier pools
Systems and methods for threaded computing systems using allocated command identifier pools for command management are described. Command requests for different processing threads are received. Based on the thread assigned to process the command request, command identifiers are assigned from different pools of command identifiers for each thread, where each pool contains non-overlapping sets of command identifiers. The command identifiers are returned to the same pool that the command identifier came from upon completion of each command.
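A minimal sketch of per-thread identifier pools follows. The contiguous ID ranges, the class name `CommandIdPools`, and its methods are assumptions made for illustration; the abstract only requires that the pools do not overlap and that each identifier returns to the pool it came from.

```python
class CommandIdPools:
    """One pool of command identifiers per thread, with no overlap between pools."""

    def __init__(self, num_threads, ids_per_thread):
        self._ids_per_thread = ids_per_thread
        # Thread t owns the contiguous range [t * n, (t + 1) * n).
        self._pools = [
            list(range(t * ids_per_thread, (t + 1) * ids_per_thread))
            for t in range(num_threads)
        ]

    def acquire(self, thread_index):
        """Take an identifier from the pool of the thread handling the command."""
        pool = self._pools[thread_index]
        if not pool:
            raise RuntimeError("no free command identifiers for this thread")
        return pool.pop()

    def release(self, command_id):
        """Return an identifier to the pool it came from."""
        self._pools[command_id // self._ids_per_thread].append(command_id)

    def free_count(self, thread_index):
        return len(self._pools[thread_index])
```

Because the owning pool can be derived from the identifier itself, no separate lookup table is needed on completion, and each thread only ever touches its own pool.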
Decentralized processing of worker threads
One or more techniques and/or systems are provided for managing one or more worker threads. For example, a utility list queue may be populated with a set of work item entries for execution. A set of worker threads may be initialized to execute work item entries within the utility list queue. In an example, a worker thread may be instructed to operate in a decentralized manner, such as without guidance from a timer manager thread. The worker thread may be instructed to execute work item entries that are not assigned to other worker threads and that are expired (e.g., ready for execution). The worker thread may transition into a sleep state if the utility list queue does not comprise at least one work item entry that is unassigned and expired.
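The claiming rule described above can be sketched as a single function that each worker calls on its own, with no timer manager thread involved. The entry layout (`owner`, `expires_at`), the lock, and the function name are assumptions for illustration; a `None` result stands for "nothing to do, the worker may sleep".

```python
import threading

# Assumed: one lock guards ownership changes so two workers cannot claim the same entry.
_claim_lock = threading.Lock()


def claim_next(work_items, worker_id, now):
    """Claim the first entry that is unassigned and expired (ready to run).

    Returns the claimed entry, or None if the worker should go to sleep.
    """
    with _claim_lock:
        for item in work_items:
            if item["owner"] is None and item["expires_at"] <= now:
                item["owner"] = worker_id
                return item
    return None


items = [
    {"name": "a", "expires_at": 5, "owner": None},
    {"name": "b", "expires_at": 20, "owner": None},
]
```

At time 10, the first worker to call `claim_next` takes entry `a`; a second worker finds only an entry that has not yet expired and gets `None`.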
USER-LEVEL THREADING FOR SIMULATING MULTI-CORE PROCESSOR
A method improves an execution speed of a host multi-core simulator simulating a target multi-core processor that has a hierarchical architecture including multiple corelets per core that, in turn, include multiple functional units. The host multi-core simulator is implemented using multiple OS threads. The method selects layers in the hierarchical architecture to simulate on one of the OS threads, based on a shortest estimated layer execution time determined by (1.0+t/c*s)/min(c, t), wherein c is a number of cores in the simulator, t is a number of OS threads, and s is a threading overhead coefficient. The method respectively executes, from among the selected layers, a parallel simulation of the units therein that frequently communicate with each other on one of the multiple OS threads based on a communication frequency threshold, by assigning and using a respective user-level thread for each of the units from among a plurality of user-level threads.
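To make the estimate concrete, the expression can be evaluated directly. The sketch below assumes conventional left-to-right precedence, so that t/c*s is read as (t/c)*s; the abstract does not state the grouping, and the function name is hypothetical. It only evaluates the formula and does not attempt the layer selection itself.

```python
def estimated_layer_time(c, t, s):
    """Evaluate (1.0 + t/c*s) / min(c, t), reading t/c*s as (t/c)*s.

    c: number of cores in the simulator
    t: number of OS threads
    s: threading overhead coefficient
    """
    return (1.0 + (t / c) * s) / min(c, t)


# Example values chosen for illustration only.
half_threads = estimated_layer_time(c=4, t=2, s=0.1)   # (1 + 0.05) / 2
matched = estimated_layer_time(c=4, t=4, s=0.1)        # (1 + 0.10) / 4
oversubscribed = estimated_layer_time(c=4, t=8, s=0.1) # (1 + 0.20) / 4
```

Under this reading, the estimate falls as threads are added up to the core count, then rises again once t exceeds c, because the denominator stops growing while the overhead term keeps increasing.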
Barrier-aware graphics cluster scheduling mechanism
An apparatus to facilitate thread scheduling is disclosed. In one embodiment the apparatus includes a processor comprising a plurality of multiprocessors comprising single-instruction multiple thread (SIMT) execution circuitry to simultaneously execute multiple threads, a shared local memory to be shared by the multiple threads, and scheduling hardware logic to schedule the multiple threads in a thread group for execution across the plurality of multiprocessors in accordance with barrier data. The instructions of the multiple threads are to produce shared data to be stored in the shared local memory when executed by the plurality of multiprocessors, wherein additional instructions of at least a first thread of the multiple threads are to use the shared data, and wherein, in accordance with the barrier data, the first thread is to wait for other threads of the multiple threads to finish producing the shared data before executing the additional instructions.
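The barrier semantics described above (produce shared data, wait for the group, then consume) can be illustrated on a CPU with Python's `threading.Barrier`. This is only an analogy for the wait-before-use behavior, not a model of the scheduling hardware; the function name and the data each thread writes are assumptions.

```python
import threading


def run_group(num_threads):
    """Each thread writes one slot of shared data, waits at a barrier, then reads all slots."""
    shared = [0] * num_threads   # stands in for shared local memory
    results = [0] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(i):
        shared[i] = i + 1        # produce this thread's part of the shared data
        barrier.wait()           # block until every thread has produced its part
        results[i] = sum(shared) # additional instructions that use the shared data

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Because no thread passes the barrier until all slots are written, every thread in a group of four computes the same sum, 1 + 2 + 3 + 4 = 10.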