G06F2209/483

USER-LEVEL THREADING FOR SIMULATING MULTI-CORE PROCESSOR
20230185601 · 2023-06-15

A method improves the execution speed of a host multi-core simulator simulating a target multi-core processor that has a hierarchical architecture including multiple corelets per core that, in turn, include multiple functional units. The host multi-core simulator is implemented using multiple OS threads. The method selects layers in the hierarchical architecture to simulate on one of the OS threads, based on a shortest estimated layer execution time determined by (1.0 + t/c*s)/min(c, t), wherein c is the number of cores in the simulator, t is the number of OS threads, and s is a threading overhead coefficient. From among the selected layers, the method executes a parallel simulation of the units therein that frequently communicate with each other, as determined by a communication frequency threshold, on one of the multiple OS threads, by assigning and using a respective user-level thread for each of the units from among a plurality of user-level threads.
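The abstract's layer-time estimate can be evaluated directly. A minimal sketch of the stated formula, (1.0 + t/c*s)/min(c, t), with the function name and sample values chosen for illustration:

```python
def estimated_layer_time(c: int, t: int, s: float) -> float:
    """Relative execution-time estimate for simulating a layer,
    per the abstract's formula: (1.0 + t/c * s) / min(c, t).

    c: number of cores in the simulator
    t: number of OS threads
    s: threading overhead coefficient
    """
    return (1.0 + t / c * s) / min(c, t)

# With zero threading overhead (s = 0), adding OS threads up to the
# core count shortens the estimate; beyond the core count, extra
# threads contribute only overhead via the t/c * s term.
print(estimated_layer_time(c=8, t=4, s=0.0))   # 0.25
print(estimated_layer_time(c=8, t=8, s=0.0))   # 0.125
print(estimated_layer_time(c=8, t=16, s=0.1))  # (1 + 2*0.1)/8 = 0.15
```

The min(c, t) denominator captures that parallelism is capped by whichever is scarcer, simulated cores or OS threads.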

Configurable logic platform with reconfigurable processing circuitry
11500682 · 2022-11-15

An architecture for load-balanced groups of multi-stage manycore processors shared dynamically among a set of software applications, with capabilities for destination-task-defined intra-application prioritization of inter-task communications (ITC), for architecture-based ITC performance isolation between the applications, and for prioritizing application task instances for execution on cores of manycore processors based at least in part on which of the task instances have available to them the input data, such as ITC data, that they need for execution.
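The core selection idea, favoring task instances whose input (ITC) data has already arrived, can be sketched in software. This is an illustrative model only, not the patented hardware architecture; the class and field names are invented:

```python
from dataclasses import dataclass

@dataclass
class TaskInstance:
    name: str
    priority: int              # lower number = higher priority (assumption)
    inputs_ready: bool = False # whether pending ITC input data has arrived

def select_for_cores(instances, num_cores):
    """Pick up to num_cores task instances for execution, admitting
    only those whose input data is available and ordering them by
    priority. A simplified sketch of the selection criterion in the
    abstract."""
    ready = [t for t in instances if t.inputs_ready]
    ready.sort(key=lambda t: t.priority)
    return ready[:num_cores]

tasks = [
    TaskInstance("decode", 2, inputs_ready=True),
    TaskInstance("filter", 1, inputs_ready=False),  # blocked on ITC data
    TaskInstance("mix", 1, inputs_ready=True),
]
print([t.name for t in select_for_cores(tasks, 2)])  # ['mix', 'decode']
```

Note that "filter" outranks "decode" by priority but is skipped because its input data is not yet available, which is the behavior the abstract highlights.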

EFFICIENT THREAD GROUP SCHEDULING

A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
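The grouping step, collecting threads that share a dependency so they can be scheduled together, can be sketched as follows. The (thread_id, dependency_key) representation is a hypothetical simplification; the patent describes a tree of such groups:

```python
from collections import defaultdict

def group_threads_by_dependency(threads):
    """Group threads that share a dependency key so each group can be
    scheduled as a unit, avoiding dependency conflicts between groups.
    'threads' is a list of (thread_id, dependency_key) pairs."""
    groups = defaultdict(list)
    for tid, dep in threads:
        groups[dep].append(tid)
    return dict(groups)

threads = [("t0", "texA"), ("t1", "texB"), ("t2", "texA"), ("t3", "texB")]
print(group_threads_by_dependency(threads))
# {'texA': ['t0', 't2'], 'texB': ['t1', 't3']}
```

A real scheduler would then order these groups (e.g. as a tree) so that no two concurrently running groups touch the same dependency.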

Techniques to support multiple interconnect protocols for a common set of interconnect connectors

Embodiments may be generally directed to apparatuses, systems, methods, and techniques to determine a configuration for a plurality of connectors, the configuration to associate a first interconnect protocol with a first subset of the plurality of connectors and a second interconnect protocol with a second subset of the plurality of connectors, the first interconnect protocol and the second interconnect protocol being different interconnect protocols, each comprising one of a serial link protocol, a coherent link protocol, and an accelerator link protocol; cause processing of data for communication via the first subset of the plurality of connectors in accordance with the first interconnect protocol; and cause processing of data for communication via the second subset of the plurality of connectors in accordance with the second interconnect protocol.
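The configuration described, two different protocols mapped onto disjoint subsets of a common set of connectors, can be modeled as a simple data structure. The function and protocol names below are invented for illustration:

```python
def make_connector_config(connectors, first_protocol, first_subset,
                          second_protocol, second_subset):
    """Associate two different interconnect protocols with two
    disjoint subsets of a common set of connectors. An illustrative
    data model of the configuration, not the hardware mechanism."""
    assert first_protocol != second_protocol, "protocols must differ"
    assert first_subset.isdisjoint(second_subset), "subsets must not overlap"
    config = {}
    for c in connectors:
        if c in first_subset:
            config[c] = first_protocol
        elif c in second_subset:
            config[c] = second_protocol
    return config

connectors = ["c0", "c1", "c2", "c3"]
config = make_connector_config(
    connectors,
    "serial_link", {"c0", "c1"},
    "accelerator_link", {"c2", "c3"},
)
print(config)
# {'c0': 'serial_link', 'c1': 'serial_link',
#  'c2': 'accelerator_link', 'c3': 'accelerator_link'}
```

Data bound for a connector would then be processed according to whichever protocol the configuration assigns to it.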

Systems, methods, and apparatuses for implementing a scheduler and workload manager that identifies and optimizes horizontally scalable workloads

In accordance with disclosed embodiments, there are provided systems, methods, and apparatuses for implementing a stateless, deterministic scheduler and work discovery system with interruption recovery. For instance, according to one embodiment, there is disclosed a system to implement a stateless scheduler service, in which the system includes: a processor and a memory to execute instructions at the system; a compute resource discovery engine to identify one or more computing resources available to execute workload tasks; a workload discovery engine to identify a plurality of workload tasks to be scheduled for execution; a cache to store information on behalf of the compute resource discovery engine and the workload discovery engine; a scheduler to request information from the cache specifying the one or more computing resources available to execute workload tasks and the plurality of workload tasks to be scheduled for execution; and further in which the scheduler is to schedule at least a portion of the plurality of workload tasks for execution via the one or more computing resources based on the information requested. Other related embodiments are disclosed.
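The data flow in the abstract, discovery engines populating a cache that a stateless scheduler reads from, can be sketched minimally. Here the cache is just a dict and assignment is round-robin, both simplifying assumptions:

```python
def schedule(cache):
    """One stateless scheduling pass: read the discovered computing
    resources and pending workload tasks from the cache, then assign
    tasks to resources round-robin. Because all state lives in the
    cache, the scheduler itself keeps nothing between passes."""
    resources = cache["resources"]  # populated by compute resource discovery
    tasks = cache["tasks"]          # populated by workload discovery
    assignments = {}
    for i, task in enumerate(tasks):
        assignments[task] = resources[i % len(resources)]
    return assignments

cache = {"resources": ["node-a", "node-b"], "tasks": ["t1", "t2", "t3"]}
print(schedule(cache))  # {'t1': 'node-a', 't2': 'node-b', 't3': 'node-a'}
```

Statelessness is what enables the interruption recovery the abstract mentions: a restarted scheduler simply re-reads the cache and produces the same assignments.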

FACILITATING DYNAMIC PARALLEL SCHEDULING OF COMMAND PACKETS AT GRAPHICS PROCESSING UNITS ON COMPUTING DEVICES
20170236246 · 2017-08-17

A mechanism is described for facilitating parallel scheduling of multiple commands on computing devices. A method of embodiments, as described herein, includes detecting a command of a plurality of commands to be processed at a graphics processing unit (GPU), and acquiring one or more resources of a plurality of resources to process the command. The plurality of resources may include other resources being used to process other commands of the plurality of commands. The method may further include facilitating processing of the command using the one or more resources, wherein the command is processed in parallel with processing of the other commands using the other resources.
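The acquisition step can be sketched as a greedy pass: a command is admitted when the resources it needs are not held by any command already running, so admitted commands execute in parallel. This is a simplified software model of the mechanism; the command and resource names are invented:

```python
def schedule_commands(commands):
    """Greedy pass over pending GPU commands. A command acquires the
    resources it needs only if none are currently held by an admitted
    command; otherwise it waits for a later pass. 'commands' maps a
    command name to the set of resources it needs."""
    in_use = set()
    running, waiting = [], []
    for name, needed in commands.items():
        if in_use.isdisjoint(needed):
            in_use |= needed        # acquire the resources
            running.append(name)    # runs in parallel with the others admitted
        else:
            waiting.append(name)    # resource conflict: retried later
    return running, waiting

cmds = {
    "blit":  {"surfaceA"},
    "draw":  {"surfaceB", "samplerX"},
    "clear": {"surfaceA"},          # conflicts with "blit"
}
print(schedule_commands(cmds))  # (['blit', 'draw'], ['clear'])
```

"blit" and "draw" touch disjoint resources and proceed together, while "clear" must wait for "surfaceA" to be released.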

Methods and systems for image processing

A method for image processing is provided. The method may include: obtaining a plurality of frames, each of the plurality of frames comprising a plurality of pixels; determining, based on the plurality of frames, whether a current frame of the plurality of frames comprises a moving object; in response to determining that the current frame includes no moving object, obtaining a first count of frames, and generating a target image by superimposing the first count of frames; in response to determining that the current frame includes a moving object, obtaining a second count of frames, and generating the target image by superimposing the second count of frames.
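The frame-count selection and superimposition can be sketched with plain pixel lists. Averaging is one plausible reading of "superimposing" (the abstract does not specify the blend), and the count parameters are illustrative:

```python
def superimpose(frames):
    """Blend frames by averaging corresponding pixels. Each frame is
    an equally sized flat list of pixel values (a simplification; real
    frames would be 2-D with multiple channels)."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

def target_image(frames, has_moving_object, still_count, motion_count):
    """Superimpose fewer frames when motion is detected (limiting
    ghosting) and more when the scene is still (reducing noise),
    mirroring the two branches in the abstract."""
    count = motion_count if has_moving_object else still_count
    return superimpose(frames[:count])

frames = [[10, 20, 30], [20, 40, 30], [30, 60, 30]]
still = target_image(frames, has_moving_object=False,
                     still_count=3, motion_count=1)
print(still)  # [20.0, 40.0, 30.0]
```

With a moving object detected, the same call with `has_moving_object=True` would blend only the first frame, avoiding smear from the motion.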

RESOURCE AVAILABILITY-BASED WORKFLOW EXECUTION TIMING DETERMINATION
20220035667 · 2022-02-03

According to a computer-implemented method, an available amount of each of multiple computing resources is determined by machine logic over a period of time at a computing device. The machine logic also determines an expected usage of each computing resource to execute each workflow in a queue. The machine logic also determines a time of execution of each workflow in the queue based on the available amount of each of the multiple computing resources over time and the expected usage of each computing resource to execute each workflow in the queue.
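The timing determination can be sketched as a scan over per-timestep resource availability, returning the first time at which every resource covers the workflow's expected usage. A minimal reading of the method, with an invented data layout:

```python
def earliest_execution_time(availability, usage):
    """Return the first timestep at which every resource's available
    amount covers the workflow's expected usage, or None if no such
    time exists in the horizon.

    availability: {resource: [amount_at_t0, amount_at_t1, ...]}
    usage:        {resource: amount the workflow needs}
    """
    horizon = len(next(iter(availability.values())))
    for t in range(horizon):
        if all(availability[r][t] >= need for r, need in usage.items()):
            return t
    return None

avail = {"cpu": [2, 4, 8], "mem_gb": [1, 6, 6]}
print(earliest_execution_time(avail, {"cpu": 4, "mem_gb": 4}))  # 1
```

Scheduling a whole queue would repeat this scan per workflow, subtracting each scheduled workflow's usage from the remaining availability.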

System and method for increasing robustness of heterogeneous computing systems

Disclosed is a method for task pruning that can be utilized in existing resource allocation systems to improve the systems' robustness without requiring changes to existing mapping heuristics. The pruning mechanism leverages a probability model, which calculates the probability of a task completing before its deadline in the presence of task dropping, and only schedules tasks that are likely to succeed. Pruning tasks whose chance of success is low improves the chance of success for other tasks. Tasks that are unlikely to succeed are either deferred from the current scheduling event or are preemptively dropped from the system. The pruning method can benefit service providers by allowing them to utilize their resources more efficiently and use them only for tasks that can meet their deadlines. The pruning method further helps end users by making the system more robust in allowing more tasks to complete on time.
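The pruning decision itself reduces to a threshold test on each task's completion probability. The probability model is not specified in the abstract, so each task carries a precomputed success probability here, and the threshold value is an assumption:

```python
def prune(tasks, threshold=0.5):
    """Keep only tasks whose estimated probability of completing
    before their deadline meets the threshold; the rest are pruned
    (deferred or dropped), freeing resources for likely successes.
    'tasks' is a list of (name, p_success) pairs."""
    scheduled, pruned = [], []
    for name, p_success in tasks:
        (scheduled if p_success >= threshold else pruned).append(name)
    return scheduled, pruned

tasks = [("a", 0.9), ("b", 0.2), ("c", 0.6)]
print(prune(tasks))  # (['a', 'c'], ['b'])
```

A deployment would recompute each p_success at every scheduling event, since dropping one task changes the completion probabilities of the others.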

Organizing tasks by a hierarchical task scheduler for execution in a multi-threaded processing system
11249807 · 2022-02-15

A method for scheduling tasks from a program executed by a multi-processor system is disclosed. The method includes a scheduler that groups a plurality of tasks, each having an assigned priority, by priority in a task group. The task group is assembled with other task groups having identical priorities in a task group queue. A hierarchy of task group queues is established based on priority levels of the assigned tasks. Task groups are assigned to one of a plurality of worker threads based on the hierarchy of task group queues. Each of the worker threads is associated with a processor in the multi-processor system. The tasks of the task groups are executed via the worker threads according to the order in the hierarchy.
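The grouping-by-priority and hierarchy-of-queues structure can be sketched with a heap keyed on priority. Worker threads are replaced by a simple drain loop, and the class name is invented:

```python
import heapq

class HierarchicalScheduler:
    """Minimal sketch of the grouping idea: tasks of equal priority
    form a task group, groups queue by priority level, and execution
    drains the highest-priority group first (lower number = higher
    priority here, an assumption)."""
    def __init__(self):
        self.groups = {}   # priority level -> list of tasks (a task group)
        self.queue = []    # heap of priority levels (the hierarchy)

    def add(self, task, priority):
        if priority not in self.groups:
            self.groups[priority] = []
            heapq.heappush(self.queue, priority)
        self.groups[priority].append(task)

    def drain(self):
        """Execute (collect) tasks group by group in hierarchy order;
        a real system would hand each group to a worker thread."""
        order = []
        while self.queue:
            p = heapq.heappop(self.queue)
            order.extend(self.groups.pop(p))
        return order

s = HierarchicalScheduler()
s.add("low-1", 5); s.add("hi-1", 1); s.add("hi-2", 1); s.add("mid", 3)
print(s.drain())  # ['hi-1', 'hi-2', 'mid', 'low-1']
```

The two equal-priority tasks land in the same group and run together before any lower-priority group is touched, which is the grouping behavior the abstract describes.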