G06F2209/486

Memory module threading with staggered data transfers
11347665 · 2022-05-31

A method of transferring data between a memory controller and at least one memory module via a primary data bus having a primary data bus width is disclosed. The method includes accessing a first one of a memory device group via a corresponding data bus path in response to a threaded memory request from the memory controller. The accessing results in data groups collectively forming a first data thread transferred across a corresponding secondary data bus path. Transfer of the first data thread across the primary data bus width is carried out over a first time interval, while using less than the primary data transfer continuous throughput during that first time interval. During the first time interval, at least one data group from a second data thread is transferred on the primary data bus.

REDUCING LOAD BALANCING WORK STEALING
20220164282 · 2022-05-26

Embodiments are disclosed for a method. The method includes determining that a thief thread attempted a work steal from a garbage collection (GC) owner queue. Additionally, the method includes determining that a number of tasks in the GC owner queue meets a predetermined threshold. Further, the method includes determining that the GC owner queue comprises a heavy-weight task. The method also includes moving the heavy-weight task to a top position of the GC owner queue.
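
The queue adjustment in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the `Task` class, its `weight` field, the `HEAVY_WEIGHT` and `QUEUE_THRESHOLD` values, reading "meets a threshold" as "at least that many tasks", and treating index 0 as the top of the queue are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: int  # hypothetical cost estimate; not defined in the abstract


HEAVY_WEIGHT = 10     # assumed cutoff for a "heavy-weight" task
QUEUE_THRESHOLD = 3   # assumed "predetermined threshold" on queue length


def on_steal_attempt(owner_queue, threshold=QUEUE_THRESHOLD, heavy=HEAVY_WEIGHT):
    """React to a thief's steal attempt on a GC owner queue.

    Index 0 is treated as the top of the queue (an assumption).
    Returns True if a heavy-weight task was moved to the top.
    """
    if len(owner_queue) < threshold:
        return False  # too few tasks queued; leave the queue alone
    for i, task in enumerate(owner_queue):
        if task.weight >= heavy:
            if i == 0:
                return False  # the heavy task is already at the top
            del owner_queue[i]
            owner_queue.appendleft(task)
            return True
    return False  # no heavy-weight task in the queue
```

Moving the heavy task to the top means a later steal takes a large unit of work in one go, which is one plausible way to cut down on repeated steals.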

Multi-core system and controlling operation of the same

In a method of operating a multi-core system comprising a plurality of processor cores, a plurality of task stall information respectively corresponding to a plurality of tasks is provided by monitoring a task stall time with respect to each task. The task stall time indicates a time during which a task is suspended within a task active time, and the task active time indicates a time during which a corresponding processor core is occupied by that task. Task scheduling is performed based on the plurality of task stall information, and a fine-grained dynamic voltage and frequency scaling (DVFS) is performed based on the task scheduling. The plurality of tasks may be assigned to the plurality of processor cores based on load unbalancing, and the effects of the fine-grained DVFS may be increased to reduce the power consumption of the multi-core system.

Hardware assisted fine-grained data movement

A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.

Task scheduling in a GPU using wakeup event state data

A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.
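
A rough software model of this selection rule is shown below. It is only a sketch: the class and method names are hypothetical, the wakeup bits are modeled as a set of task types, the "predefined criteria" are reduced to "the task's type has its wakeup bit set", and bits stay set until cleared explicitly. None of these details come from the abstract.

```python
class AgeAwareScheduler:
    """Toy model: oldest-first queue gated by per-type wakeup bits."""

    def __init__(self):
        self._queue = []      # (task_id, task_type) in arrival order, oldest first
        self._wakeup = set()  # task types whose wakeup event bit is set

    def add_task(self, task_id, task_type):
        # New tasks join the back of the age-based queue.
        self._queue.append((task_id, task_type))

    def signal_dependency_done(self, task_type):
        # Completion of a dependency sets the wakeup bit for that type.
        self._wakeup.add(task_type)

    def clear_wakeup(self, task_type):
        self._wakeup.discard(task_type)

    def select(self):
        """Remove and return the oldest task whose type has its bit set."""
        for i, (task_id, task_type) in enumerate(self._queue):
            if task_type in self._wakeup:
                del self._queue[i]
                return task_id
        return None  # nothing is eligible yet
```

Because the queue is scanned from the oldest entry, age decides between eligible tasks, while the wakeup bits decide which tasks are eligible at all.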

AUTO-RECOVERY FRAMEWORK

The present disclosure relates to computer-implemented methods, software, and systems for an automatic recovery job execution through a scheduling framework in a cloud environment. One or more recovery jobs are scheduled to be performed periodically for one or more registered service components included in a service instance running on a cluster node of a cloud platform. Each recovery job is associated with a corresponding service component of the service instance. A health check operation is invoked at a service component based on executing a recovery job at the scheduling framework corresponding to the service component. In response to determining that the service component needs a recovery measure based on a result from the health check operation, a recovery operation is invoked as part of executing a set of scheduled routines of the recovery job. Implemented logic for the recovery operation is stored and executed at the service component.
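
The health-check-then-recover flow can be illustrated with a short sketch. All names here (`ServiceComponent`, `RecoveryScheduler`, `run_cycle`) are hypothetical, and a single `run_cycle` call stands in for one periodic firing of the scheduled recovery jobs; a real framework would trigger it from a timer.

```python
from dataclasses import dataclass


@dataclass
class ServiceComponent:
    name: str
    healthy: bool = True
    recoveries: int = 0

    def health_check(self) -> bool:
        # Invoked by the recovery job; reports whether the component is healthy.
        return self.healthy

    def recover(self) -> None:
        # The recovery logic is stored and executed at the component itself.
        self.healthy = True
        self.recoveries += 1


class RecoveryScheduler:
    """Holds one recovery job per registered service component."""

    def __init__(self):
        self._components = []

    def register(self, component):
        self._components.append(component)

    def run_cycle(self):
        """Run every recovery job once; return names of recovered components."""
        recovered = []
        for component in self._components:
            if not component.health_check():
                component.recover()
                recovered.append(component.name)
        return recovered
```

Keeping `recover` on the component rather than in the scheduler mirrors the abstract's point that the framework only decides when to invoke recovery, not how it is carried out.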

PROCESSING ENGINE SCHEDULING FOR TIME-SPACE PARTITIONED PROCESSING SYSTEMS

Embodiments for improved processing efficiency between a processor and at least one coprocessor are disclosed. Some examples are directed to a processor-coprocessor scheduling in which workloads are scheduled to a coprocessor based on a timing window of the processor. In additional or alternative examples, workloads are assigned to the coprocessor based on the processing resources and/or an order of priority. In connection with the disclosed embodiments, the coprocessor can be implemented by a graphics processing unit (GPU), hardware processing accelerator, field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), or other processing circuitry. The processor can be implemented by a central processing unit (CPU) or other processing circuitry.

Ticket based request flow control

Disclosed are ticketed flow control mechanisms in a processing system with one or more masters and one or more slaves. In an aspect, a targeted slave receives a request from a requesting master. If the targeted slave is unavailable to service the request, a ticket for the request is provided to the requesting master. As resources in the targeted slave become available, messages are broadcast for the requesting master to update the ticket value. When the ticket value has been updated to a final value, the requesting master may re-transmit the request.
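
One way to picture the ticket exchange is the toy model below. It is a sketch under stated assumptions, not the disclosed mechanism: the ticket value is taken to be the master's position in line, each broadcast decrements it by one, the final value is 0, and the `Slave` and `Master` classes and their methods are hypothetical.

```python
FINAL_TICKET = 0  # assumed "final value" at which a master may re-transmit


class Slave:
    def __init__(self, slots):
        self.free_slots = slots
        self.waiting = []  # masters holding tickets, oldest first

    def request(self, master):
        """Return None if the request is serviced, else a ticket value."""
        if self.free_slots > 0 and not self.waiting:
            self.free_slots -= 1
            return None
        self.waiting.append(master)
        return len(self.waiting)  # ticket = position in line (assumption)

    def release(self):
        """A resource becomes available; broadcast to all ticket holders."""
        self.free_slots += 1
        for master in list(self.waiting):
            master.on_broadcast()

    def retransmit(self, master):
        """Accept a re-transmitted request from a master at the final value."""
        assert self.free_slots > 0
        self.waiting.remove(master)
        self.free_slots -= 1


class Master:
    def __init__(self, name, slave):
        self.name = name
        self.slave = slave
        self.ticket = None
        self.served = False

    def send(self):
        self.ticket = self.slave.request(self)
        self.served = self.ticket is None

    def on_broadcast(self):
        # Each broadcast moves the ticket one step toward the final value.
        self.ticket -= 1
        if self.ticket == FINAL_TICKET:
            self.ticket = None
            self.slave.retransmit(self)
            self.served = True
```

With this scheme a refused master does not poll the slave; it re-sends only once its ticket has counted down, so each request is re-transmitted at most once.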

HETEROGENEOUS SYSTEM ON A CHIP SCHEDULER

Described are techniques for scheduling tasks on a heterogeneous system on a chip (SoC). The techniques include receiving a directed acyclic graph at a meta pre-processor associated with a heterogeneous SoC and communicatively coupled to a scheduler, wherein the directed acyclic graph corresponds to a control flow graph of tasks associated with an application executed by the heterogeneous SoC. The techniques further include determining a rank for a respective task in the directed acyclic graph, wherein the rank is based on a priority of the respective task and a slack in the directed acyclic graph. The techniques further include providing the respective task to the scheduler for execution on the heterogeneous SoC according to the rank.
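
The ranking step could look something like the sketch below. The abstract does not say how priority and slack are combined, so the ordering used here (least slack first, higher priority breaking ties) is an assumption, as are the function name `rank_tasks`, the per-task `duration` estimates, and the definition of slack as latest start minus earliest start. It uses `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def rank_tasks(duration, deps, priority):
    """Rank DAG tasks by slack, then by priority.

    duration: task -> estimated run time (hypothetical input)
    deps:     task -> list of predecessor tasks
    priority: task -> numeric priority (larger means more important)
    Returns (ranked task list, slack per task).
    """
    order = list(TopologicalSorter(deps).static_order())

    # Forward pass: earliest start is when all predecessors have finished.
    earliest = {}
    for t in order:
        earliest[t] = max(
            (earliest[p] + duration[p] for p in deps.get(t, ())), default=0
        )
    makespan = max(earliest[t] + duration[t] for t in order)

    # Build successor lists for the backward pass.
    succs = {t: [] for t in order}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)

    # Backward pass: latest start that does not delay the overall finish.
    latest = {}
    for t in reversed(order):
        latest[t] = min((latest[s] for s in succs[t]), default=makespan) - duration[t]

    slack = {t: latest[t] - earliest[t] for t in order}
    # Stable sort: ties keep their topological order.
    ranked = sorted(order, key=lambda t: (slack[t], -priority[t]))
    return ranked, slack
```

The rank is only an ordering hint; a scheduler consuming it would still have to respect the dependencies in the graph before dispatching a task.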

Managing asynchronous operations in cloud computing environments

A processor may execute an asynchronous operation of the program code, hibernate a process related to the asynchronous operation, and free up related cloud runtime platform resources, excluding the related system memory. Additionally, the processor may execute the asynchronous operation during the hibernation of the process, intercept an initiated completion function to the process after a completion of the asynchronous operation, inject at least one of additional program code and data into the completion function, un-hibernate the process and reallocate the freed-up cloud runtime platform related resources of the process, and execute the completion function, returning result data of the asynchronous operation and the at least one of additional program code and data to the process.