Patent classifications
G06F2209/484
SCHEDULER, METHOD OF OPERATING THE SAME, AND ACCELERATOR APPARATUS INCLUDING THE SAME
A scheduler, a method of operating the scheduler, and an accelerator apparatus including the scheduler are disclosed. The method of operating the scheduler, which performs scheduling on models to be executed in an accelerator, includes receiving at least one execution request for a first model and a second model that are executed independently of each other in the accelerator, and performing layer-unit scheduling on the first model and the second model based on workload characteristics of the first model and the second model.
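The abstract does not disclose the scheduling policy in detail; the following Python sketch illustrates one plausible form of layer-unit scheduling for two independently executed models, where the Layer class, the compute_bound workload flag, and the alternating interleaving heuristic are illustrative assumptions rather than the patented method.

from dataclasses import dataclass
from collections import deque

@dataclass
class Layer:
    model: str           # owning model, e.g. "A" or "B"
    name: str
    compute_bound: bool  # hypothetical workload characteristic

def schedule_layers(model_a, model_b):
    # Interleave layers of the two models, preferring to alternate
    # compute-bound and memory-bound layers (an assumed heuristic).
    a, b = deque(model_a), deque(model_b)
    order, last = [], None
    while a or b:
        pick = next((q for q in (a, b) if q and q[0].compute_bound != last), None)
        pick = pick or (a if a else b)
        layer = pick.popleft()
        last = layer.compute_bound
        order.append(layer)
    return order

model_a = [Layer("A", f"a{i}", compute_bound=(i % 2 == 0)) for i in range(3)]
model_b = [Layer("B", f"b{i}", compute_bound=True) for i in range(3)]
print([f"{l.model}:{l.name}" for l in schedule_layers(model_a, model_b)])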
LOW-COST TASK SPECIFIC DEVICE SCHEDULING SYSTEM
A low-cost task-specific device scheduling system for scheduling tasks that are performed by one or more devices is described. After a preceding task is performed, the performance of a successive task is delayed for a task-specific recharge interval associated with the preceding task. The successive task is performed after the task-specific recharge interval has expired.
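As a minimal sketch of this idea, the Python below delays each successive task until the recharge interval of the preceding task has expired; the task names, interval values, and RECHARGE_S table are hypothetical.

import time

# Hypothetical per-task recharge intervals, in seconds.
RECHARGE_S = {"read_sensor": 0.5, "write_flash": 2.0}

def run_tasks(tasks):
    # Perform each task, then wait out the recharge interval
    # associated with it before starting the successive task.
    for name, fn in tasks:
        fn()
        time.sleep(RECHARGE_S.get(name, 0.0))

run_tasks([
    ("read_sensor", lambda: print("reading sensor")),
    ("write_flash", lambda: print("writing flash")),
    ("read_sensor", lambda: print("reading sensor")),
])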
Deferred command execution
Deferred command execution by a command processor (CP) may be performed based on a determination that at least one command of a primary buffer is located between a first link of the primary buffer and a second link of the primary buffer. The first link and the second link may be to one or more secondary buffers that include a set of commands. The CP may initiate, before executing, a fetch of a first set of commands in the set of commands based on the first link, a fetch of the at least one command of the primary buffer, and a fetch of a second set of commands in the set of commands based on the second link. After initiating the fetch of the second set of commands, the CP may execute the first set of commands, the at least one command of the primary buffer, and the second set of commands.
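A toy Python model of this flow is sketched below; the buffer encoding (link/cmd tuples) and the print-based "execution" are assumptions made only to show the fetch-everything-then-execute ordering.

def process_primary(primary, secondaries):
    # Initiate all fetches first: commands of the first linked
    # secondary buffer, the in-between primary commands, and the
    # second linked secondary buffer. Execute only afterwards.
    fetched = []
    for kind, value in primary:
        if kind == "link":                      # link to a secondary buffer
            fetched.extend(secondaries[value])  # fetch its commands
        else:                                   # ("cmd", payload)
            fetched.append(value)
    for cmd in fetched:                         # deferred execution phase
        print("exec", cmd)

secondaries = {"s1": ["c1", "c2"], "s2": ["c5"]}
primary = [("link", "s1"), ("cmd", "c3"), ("cmd", "c4"), ("link", "s2")]
process_primary(primary, secondaries)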
OPTIMIZATION FOR SCHEDULING OF BATCH JOBS
A method, system, and computer program product for optimizing scheduling of batch jobs are disclosed. The method may include obtaining, by one or more processors, a set of batch jobs, connection relationships among batch jobs in the set of batch jobs, and a respective execution time of each batch job in the set of batch jobs. The method may also include generating, by the one or more processors, a directed weighted graph for the set of batch jobs, wherein in the directed weighted graph, a node represents a batch job, a directed edge between two nodes represents a directed connection between the two corresponding batch jobs, and a weight of a node represents the execution time of the batch job corresponding to the node. The method may also include obtaining, by the one or more processors, information on consumption of the same resource(s) among the batch jobs in the set of batch jobs.
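The sketch below builds such a directed weighted graph in plain Python and computes its critical-path time; the job names, execution times, topological DP, and the shared-resource annotation are illustrative assumptions, not the claimed optimization.

jobs = {"extract": 5, "transform": 12, "load": 3}          # job -> execution time (node weight)
edges = [("extract", "transform"), ("transform", "load")]  # directed connections
shared = {("extract", "load"): "db_conn"}                  # jobs consuming the same resource

def critical_path(jobs, edges):
    # Longest execution-time path through the DAG via topological DP.
    succ = {j: [] for j in jobs}
    indeg = {j: 0 for j in jobs}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    dist = {j: w for j, w in jobs.items()}
    ready = [j for j in jobs if indeg[j] == 0]
    while ready:
        u = ready.pop()
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + jobs[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(dist.values())

print("critical path time:", critical_path(jobs, edges))  # 20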
Efficient thread group scheduling
A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
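The following Python sketch groups threads by their shared dependency and walks the resulting tree so a parent group is dispatched before the groups that depend on it; the deps mapping and the single-parent-per-thread simplification are illustrative assumptions.

from collections import defaultdict

# Hypothetical dependency info: each thread names the thread it depends on.
deps = {"t1": None, "t2": "t1", "t3": "t1", "t4": None, "t5": "t4"}

def group_by_dependency(deps):
    # Threads that share a dependency form one thread group.
    groups = defaultdict(list)
    for thread, parent in deps.items():
        groups[parent].append(thread)
    return groups

def schedule(groups, root=None):
    # Dispatch a group, then recurse into groups depending on its threads.
    order = list(groups.get(root, []))
    for thread in groups.get(root, []):
        order.extend(schedule(groups, thread))
    return order

print(schedule(group_by_dependency(deps)))  # ['t1', 't4', 't2', 't3', 't5']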
Resource management for parent child workload
For resource management for a parent child workload, a processor organizes a plurality of processes into a plurality of process groups. Each process group includes a given parent process and all child processes of the given parent process. Each process group has a process level. The processor further calculates a process cost for each process group and assigns a process priority to each process group based on the process cost for the process group. The processor iteratively assigns computing resources to subgroups of a given process group with a highest process priority at a given process level.
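As a minimal sketch, the Python below assumes a cost model of one unit per process and splits CPU within each level in proportion to that cost; the ProcessGroup fields, the cost model, and the percentage split are illustrative, not the patented cost or priority function.

from dataclasses import dataclass

@dataclass
class ProcessGroup:
    parent: str
    children: list
    level: int
    cpu_share: float = 0.0

    @property
    def cost(self):
        # Assumed cost model: one unit per process in the group.
        return 1 + len(self.children)

def assign_resources(groups, total_cpu=100.0):
    # At each level, serve groups in priority (cost) order and
    # split the CPU budget in proportion to cost.
    for level in sorted({g.level for g in groups}):
        at_level = sorted((g for g in groups if g.level == level),
                          key=lambda g: g.cost, reverse=True)
        total_cost = sum(g.cost for g in at_level)
        for g in at_level:
            g.cpu_share = total_cpu * g.cost / total_cost

groups = [ProcessGroup("init", ["a", "b", "c"], level=0),
          ProcessGroup("cron", ["d"], level=0)]
assign_resources(groups)
for g in groups:
    print(f"{g.parent}: {g.cpu_share:.0f}% CPU")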
DYNAMIC SEQUENCING OF DATA PARTITIONS FOR OPTIMIZING MEMORY UTILIZATION AND PERFORMANCE OF NEURAL NETWORKS
Optimized memory usage and management are crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data dimensions, an apportionment sequence that optimizes the efficient use of the local and external memory components is calculated for the input data to be processed by the NN or DNN. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters, e.g., processing weights) into one or more portions, as well as how such portions of input data (and its associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize their efficient use.
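The Python generator below sketches one way an apportionment sequence could be produced: it splits the input so that each portion plus its weights fits a local-memory budget and notes where outputs would be placed; the byte sizes, LOCAL_MEM_BYTES budget, and placement policy are assumptions.

LOCAL_MEM_BYTES = 1 << 20  # assumed local-memory budget

def apportion(input_bytes, weight_bytes):
    # Yield (offset, chunk_bytes, placement) steps so each input
    # portion plus its processing weights fits in local memory.
    chunk = LOCAL_MEM_BYTES - weight_bytes
    if chunk <= 0:
        raise ValueError("weights alone exceed local memory")
    offset = 0
    while offset < input_bytes:
        size = min(chunk, input_bytes - offset)
        # Assume outputs of each portion spill to external memory
        # while the next portion occupies local memory.
        yield offset, size, "input:local, output:external"
        offset += size

for step in apportion(input_bytes=3 * (1 << 20), weight_bytes=1 << 18):
    print(step)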
Micro-architecture designs and methods for eager execution and fetching of instructions
Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
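A micro-architecture is naturally modeled in hardware description terms, but as a toy software sketch the Python below shelves the first half-instruction at fetch time, receives the second half when the producer executes, and assembles and dispatches the pair; the shelf layout and half-instruction encoding are invented for illustration.

# Toy model of eager shelves keyed by an instruction tag.
eager_shelves = {}

def fetch(tag, first_half):
    # Fetch places the first half-instruction on the eager shelves.
    eager_shelves[tag] = {"first": first_half, "second": None}

def on_producer_executed(tag, second_half):
    # Execution of the producer supplies the second half; the full
    # instruction is assembled from both halves and dispatched.
    entry = eager_shelves[tag]
    entry["second"] = second_half
    dispatch(tag, {**entry["first"], **entry["second"]})

def dispatch(tag, instruction):
    print(f"dispatch {tag}: {instruction}")

fetch("i7", {"opcode": "add", "dst": "r3"})
on_producer_executed("i7", {"src1": "r1", "src2": "r2"})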
System and method for batch evaluation programs
A batching module prepares a plurality of blocked expressions for batch evaluation. The plurality of blocked expressions comprises a plurality of expressions in a blocked state. The batching module divides the plurality of blocked expressions into one or more partitions. For each particular partition of the one or more partitions, a single batch processing call is dispatched to an application server to perform a batch evaluation.
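A minimal Python sketch of the partition-and-dispatch step follows; the fixed partition size and the eval-based stand-in for the application server's batch call are assumptions for illustration.

def batch_evaluate(blocked_exprs, partition_size=3):
    # Divide the blocked expressions into partitions and dispatch
    # one (simulated) batch-processing call per partition.
    partitions = [blocked_exprs[i:i + partition_size]
                  for i in range(0, len(blocked_exprs), partition_size)]
    results = []
    for part in partitions:
        # Stand-in for a single batch call to an application server.
        results.extend(eval(expr, {"__builtins__": {}}) for expr in part)
    return results

print(batch_evaluate(["1+1", "2*3", "7-4", "10//3"]))  # [2, 6, 3, 3]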