G06F2209/5018

Scheduler for AMP architecture with closed loop performance and thermal controller

Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing system. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
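
The mapping from control effort to a recommended core type and DVFS state might be sketched as follows. This is only an illustration under stated assumptions: the control effort is taken to be normalized to [0, 1], and the core type names ("E", "P"), the frequency tables, the 0.5 threshold, and the `thermal_cap` parameter are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical DVFS tables (MHz) per core type; real values are hardware specific.
DVFS_STATES = {
    "E": [600, 900, 1200],
    "P": [1500, 2000, 2500],
}
E_CORE_LIMIT = 0.5  # assumed: efforts at or below this stay on efficiency cores


@dataclass
class Recommendation:
    core_type: str
    freq_mhz: int


def recommend(control_effort: float, thermal_cap: float = 1.0) -> Recommendation:
    """Map a thread group's control effort to a core type and DVFS state."""
    # The thermal/power loop may only lower the effort, never raise it.
    effort = max(0.0, min(control_effort, thermal_cap, 1.0))
    if effort <= E_CORE_LIMIT:
        core, lo, hi = "E", 0.0, E_CORE_LIMIT
    else:
        core, lo, hi = "P", E_CORE_LIMIT, 1.0
    states = DVFS_STATES[core]
    # Scale the effort's position within its band to an index in the table.
    frac = (effort - lo) / (hi - lo)
    index = min(int(frac * len(states)), len(states) - 1)
    return Recommendation(core, states[index])
```

With these assumed tables, a thread group whose effort of 0.9 is capped to 0.4 by the thermal loop is recommended an efficiency core rather than a performance core.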

Frozen indices
11556388 · 2023-01-17

Methods and systems for searching a frozen index are provided. An exemplary method may comprise: receiving an initial search and a subsequent search; loading the initial search and the subsequent search into a throttled thread pool, the throttled thread pool including; getting the initial search from the throttled thread pool; storing a first shard from a mass storage in a memory in response to the initial search; performing the initial search on the first shard; providing first top search result scores from the initial search; and removing the first shard from the memory when the initial search is completed.
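
The load, search, evict cycle described above can be sketched roughly as below. The assumptions are loud ones: a single-worker `ThreadPoolExecutor` stands in for the throttled thread pool, the `DISK` and `MEMORY` dictionaries stand in for mass storage and memory, and the term-count scoring is a toy. None of these names come from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical stand-in for mass storage: shard name -> {doc id: text}.
DISK = {"shard-0": {"a": "cold data cold", "b": "warm data", "c": "cold"}}
MEMORY = {}  # shards currently resident in memory


def search_frozen(shard_name, term, top_k=2):
    """Load a shard, score its documents, then evict the shard."""
    MEMORY[shard_name] = DISK[shard_name]  # load from mass storage
    try:
        docs = MEMORY[shard_name]
        # Toy score: number of occurrences of the term in each document.
        scores = [(text.split().count(term), doc_id) for doc_id, text in docs.items()]
        return heapq.nlargest(top_k, scores)
    finally:
        del MEMORY[shard_name]  # remove the shard once the search completes


def run_searches(terms):
    # A single worker throttles the pool so searches run one at a time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [pool.submit(search_frozen, "shard-0", t) for t in terms]
        return [f.result() for f in futures]
```

Because the pool has one worker, the subsequent search waits until the initial search has finished and its shard has been evicted, so at most one shard is resident at a time.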

CPU CLUSTER SHARED RESOURCE MANAGEMENT

Embodiments include an asymmetric multiprocessing (AMP) system having a first central processing unit (CPU) cluster comprising a first core type, and a second CPU cluster comprising a second core type, where the AMP system can update a thread metric for a first thread running on the first CPU cluster based at least on: a past shared resource overloaded metric of the first CPU cluster, and on-core metrics of the first thread. The on-core metrics of the first thread can indicate that the first thread contributes to contention of the same shared resource corresponding to the past shared resource overloaded metric of the first CPU cluster. The AMP system can assign the first thread to a different CPU cluster while other threads of the same thread group remain assigned to the first CPU cluster. The thread metric can include a Matrix Extension (MX) thread flag or a Bus Interface Unit (BIU) thread flag.

Method and system for multi-pronged backup using real-time attributes

A method and system for backup processes includes identifying a target volume and identifying a number of available threads to back up the target volume. The elements in the target volume are distributed among the available threads based on a currently pending size of data in the threads. The elements are stored from each thread into a backup container, and merged from each of the backup containers into a backup volume.
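
One way to read "distributed among the available threads based on a currently pending size of data" is a greedy least-loaded assignment. The sketch below assumes that reading; the function names, the `(name, size)` element shape, and the flat-list backup volume are hypothetical simplifications, not details from the abstract.

```python
import heapq


def distribute(elements, num_threads):
    """Assign each (name, size) element to the thread with the least pending data."""
    heap = [(0, t) for t in range(num_threads)]  # (pending bytes, thread id)
    assignments = {t: [] for t in range(num_threads)}
    for name, size in elements:
        pending, t = heapq.heappop(heap)  # thread with the smallest backlog
        assignments[t].append(name)
        heapq.heappush(heap, (pending + size, t))
    return assignments


def merge(containers):
    """Merge per-thread backup containers into one backup volume (a flat list here)."""
    return [name for t in sorted(containers) for name in containers[t]]
```

For example, with two threads and elements of sizes 50, 10, 10, and 40, the first thread receives only the 50-unit element while the second receives the other three, since its pending size stays lower at each step.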

EXTENDING PARALLEL SOFTWARE THREADS
20230229489 · 2023-07-20

A method for executing a software program, comprising: identifying in a program a plurality of host threads, each for performing some of a plurality of parallel sub-tasks of a task; and for each of the host threads: generating device threads, each associated with the host thread, each for one of the parallel sub-tasks associated therewith; generating a parent thread associated with the host thread for communicating with the device threads; configuring a host processing circuitry to execute the parent thread; and configuring at least one other processing circuitry to execute in parallel the device threads while the host processing circuitry executes the parent thread; and for at least one of the host threads: receiving by the parent thread a value from the at least one other processing circuitry, the value generated when executing at least one of the device threads associated with the at least one host thread.

Scheduling processing of machine learning tasks on heterogeneous compute circuits

Scheduling work of a machine learning application includes instantiating kernel objects by a computer processor in response to input of kernel definitions. Each kernel object is of a kernel type indicating a compute circuit. The computer processor generates a graph in a memory. Each node represents a task and specifies an assignment of the task to one or more of the kernel objects, and each edge represents a data dependency. Task queues are created in the memory and assigned to queue tasks represented by the nodes. Kernel objects are assigned to the task queues, and the tasks are enqueued by threads executing the kernel objects, based on assignments of the kernel objects to the task queues and assignments of the tasks to the kernel objects. Tasks are dequeued by the threads, and the compute circuits are activated to initiate processing of the dequeued tasks.
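
The graph-plus-queues arrangement can be illustrated with a small single-threaded sketch. It assumes one task queue per kernel type and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to respect data dependencies. The task names, the kernel types "cpu" and "fpga", and the `schedule` function are hypothetical, and the per-kernel threads of the abstract are replaced here by a simple loop.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> (kernel type, tasks it depends on).
TASKS = {
    "load": ("cpu", []),
    "conv": ("fpga", ["load"]),
    "pool": ("fpga", ["conv"]),
    "softmax": ("cpu", ["pool"]),
}


def schedule(tasks):
    """Enqueue ready tasks on the queue of their kernel type and run them in order."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()
    queues = {kernel: deque() for kernel, _ in tasks.values()}
    trace = []
    while sorter.is_active():
        # Enqueue every task whose dependencies have all completed.
        for name in sorter.get_ready():
            queues[tasks[name][0]].append(name)
        # Dequeue and "run" tasks; a real system would activate the compute circuit.
        for kernel, queue in queues.items():
            while queue:
                name = queue.popleft()
                trace.append((kernel, name))
                sorter.done(name)
    return trace
```

Calling `schedule(TASKS)` yields the tasks in dependency order, each tagged with the kernel type whose queue it passed through.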

OPERATION METHOD OF THE NON-UNIFORM MEMORY ACCESS SYSTEM

Provided is an operation method of a NUMA system, which includes: designating a page scan range including a plurality of pages; identifying a detour value for each of the plurality of pages; determining whether a detour value of a current target scan page is the same as the reference detour value; and releasing a connection of the current target scan page from the page table when determining that the detour value of the current target scan page is the same as the reference detour value.

Computing Device Control of a Job Execution Environment
20220405124 · 2022-12-22

Job execution environment control techniques are described to manage policy selection and implementation to control use of job executors by a computing device, automatically and without user intervention. These techniques are usable to select a policy from a plurality of policies that is then used to control lifecycles of job executors of a job execution environment of a computing device. Further, these techniques are usable to respond dynamically to change the selected policy during runtime of the application in response to changes in the job execution environment.

DYNAMICALLY REDISTRIBUTING I/O JOBS AMONG OPERATING SYSTEM THREADS

A thread may be de-activated (terminated or hibernated) or activated (e.g., re-activated or created anew if allowed) on a processing node, in response to which it may be desirable to redistribute the I/O jobs among the now-active threads. Redistributing the I/O jobs may involve re-associating one or more active threads resulting from the activation or de-activation with one or more of the bin groups and/or re-assigning one or more job bins to one or more bin groups. The bin groups may be re-associated with remaining active threads. I/O jobs may be redistributed among the active threads by re-assigning job bins to bin groups. One or more queued I/O jobs may be moved to the thread that now owns the I/O job.
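
A rough sketch of the redistribution step follows. To keep it short, it assumes one bin group per active thread and a round-robin assignment of job bins to threads. The function names, the `bin_of` callback, and the round-robin rule are hypothetical choices, not details from the abstract.

```python
def assign_bins(num_bins, active_threads):
    """Map each job bin to an owning thread, round-robin over active threads."""
    return {b: active_threads[b % len(active_threads)] for b in range(num_bins)}


def redistribute(queues, num_bins, active_threads, bin_of):
    """Rebuild per-thread queues after the set of active threads changes."""
    owner = assign_bins(num_bins, active_threads)
    new_queues = {t: [] for t in active_threads}
    for jobs in queues.values():
        for job in jobs:
            # Move each queued job to the thread that now owns its bin.
            new_queues[owner[bin_of(job)]].append(job)
    return new_queues
```

For instance, if thread `t2` is de-activated, calling `redistribute` with the remaining threads moves the jobs queued on `t2` to whichever of `t0` and `t1` now owns each job's bin.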

Active queue management in a multi-node computing environment

Systems and methods for processing computing jobs of a managed network are disclosed. Each of one or more worker nodes may implement a scheduler thread and a pool of worker threads. Upon waking up from a sleep state, the scheduler thread may determine a current number of jobs in an in-memory job queue that are waiting for processing by a worker thread, and may compute a job-completion rate of jobs processed by threads of the pool. Based on the job-completion rate, the scheduler thread may perform one or more of retrieving more jobs from a centralized database job queue and adding them to the in-memory job queue; removing one or more jobs from the in-memory job queue and returning them to the database job queue; leaving the in-memory job queue unchanged; or adjusting the duration of the sleep-interval timer. The scheduler thread may then return to a sleep state.
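
The decision the scheduler thread makes on each wake-up might look like the sketch below. The abstract does not give thresholds or formulas, so everything numeric here is an assumption: the `low` and `high` backlog bounds in seconds, the `batch` size, and the halving and doubling of the sleep interval are all hypothetical.

```python
def scheduler_step(queued, completed, interval_s, low=2.0, high=10.0, batch=5):
    """Return (action, job count, next sleep interval) for one scheduler wake-up."""
    rate = completed / interval_s  # jobs completed per second since last wake-up
    # Estimated time to drain the in-memory queue at the current rate.
    backlog_s = queued / rate if rate > 0 else float("inf")
    if backlog_s < low:
        # Workers are about to run dry: fetch more jobs and wake up sooner.
        return ("fetch", batch, max(interval_s / 2, 0.5))
    if backlog_s > high:
        # Queue is overfull: hand surplus jobs back to the database queue.
        surplus = queued - int(rate * high)
        return ("return", surplus, min(interval_s * 2, 60.0))
    return ("keep", 0, interval_s)
```

Under these assumed bounds, a worker node that completed 10 jobs in a 5-second interval and has only 2 jobs queued would fetch another batch and shorten its sleep interval to 2.5 seconds.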