Patent classifications
G06F9/5066
RECOMMENDATIONS FOR SCHEDULING JOBS ON DISTRIBUTED COMPUTING DEVICES
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.
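For illustration, a minimal sketch of the recommendation flow this abstract describes: features are generated per accelerator type, scored by an objective function, and the best-scoring types are recommended. All names here (Job, generate_features, the throughput table) are assumptions for the example, not the patent's implementation.

```python
from dataclasses import dataclass

ACCELERATOR_TYPES = ["gpu_a", "gpu_b", "tpu_v4"]  # hypothetical types

@dataclass
class Job:
    op_counts: dict[str, int]  # e.g. {"matmul": 120, "conv2d": 40}

def generate_features(job: Job, accel_type: str) -> dict[str, float]:
    # Stand-in for a learned feature generator: here we simply scale
    # operation counts by a per-accelerator throughput guess.
    throughput = {"gpu_a": 1.0, "gpu_b": 1.5, "tpu_v4": 2.0}[accel_type]
    return {op: count / throughput for op, count in job.op_counts.items()}

def performance_metric(features: dict[str, float]) -> float:
    # Toy objective function: predicted runtime is the sum of per-op
    # costs; lower is better.
    return sum(features.values())

def recommend(job: Job, top_k: int = 2) -> list[str]:
    scored = {t: performance_metric(generate_features(job, t))
              for t in ACCELERATOR_TYPES}
    return sorted(scored, key=scored.get)[:top_k]  # best types first

print(recommend(Job(op_counts={"matmul": 120, "conv2d": 40})))
```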
METHOD AND APPARATUS FOR SCHEDULING WORKFLOW ON CLOUD PLATFORMS
There is provided a method for scheduling a Network Based Media Processing (NBMP) workflow on a cloud platform. The method includes obtaining an input workflow including an input media stream; generating a modified workflow by dividing the input media stream into one or more tasks; scheduling the one or more tasks on the cloud platform based on a schedule descriptor, which includes schedule type information; and processing the modified workflow based on the scheduling of the one or more tasks.
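A hedged sketch of the divide-and-schedule flow, assuming a descriptor whose schedule_type field selects parallel or sequential execution; the field names and the segmentation rule are illustrative, not taken from the NBMP specification.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    segment: tuple[int, int]  # (start, end) of the media-stream slice

@dataclass
class ScheduleDescriptor:
    schedule_type: str  # assumed values: "parallel" or "sequential"

def divide_stream(stream: list[int], n_tasks: int) -> list[Task]:
    # Split the input media stream into roughly equal segments.
    step = max(1, len(stream) // n_tasks)
    return [Task(f"task{i}", (s, min(s + step, len(stream))))
            for i, s in enumerate(range(0, len(stream), step))]

def schedule(tasks: list[Task], desc: ScheduleDescriptor) -> list[list[Task]]:
    # "parallel": all tasks in one wave; "sequential": one task per wave.
    if desc.schedule_type == "parallel":
        return [tasks]
    return [[t] for t in tasks]

waves = schedule(divide_stream(list(range(100)), 4),
                 ScheduleDescriptor(schedule_type="parallel"))
print([[t.name for t in wave] for wave in waves])
```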
ACCELERATING INFERENCES PERFORMED BY ENSEMBLE MODELS OF BASE LEARNERS
A method is provided for accelerating machine learning inferences. The method uses an ensemble model run on input data. This ensemble model involves several base learners, each of which has been trained. The method first schedules tasks for execution, where each task executes one of the base learners based on a subset of the input data. The execution of the tasks is then started to obtain respective task outcomes. While the tasks execute, an exit condition is repeatedly evaluated by computing a deterministic function of the task outcomes obtained so far; the output values of this function indicate whether an inference result of the ensemble model has converged. Accordingly, the execution of the tasks can be interrupted once the exit condition is found to be fulfilled. Eventually, an inference result of the ensemble model is estimated based on the task outcomes obtained.
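The early-exit loop can be sketched as follows; the convergence test (a running mean that stops moving) is one possible deterministic function, chosen for illustration, and base_learner is a stand-in for a trained model.

```python
import statistics

def base_learner(i: int, x: float) -> float:
    # Stand-in for the i-th trained base learner.
    return x + (-1) ** i * 0.01 / (i + 1)

def ensemble_predict(x: float, n_learners: int = 50,
                     tol: float = 1e-3) -> float:
    outcomes = []
    prev_mean = None
    for i in range(n_learners):
        outcomes.append(base_learner(i, x))  # execute one task
        mean = statistics.fmean(outcomes)
        # Exit condition: the running mean has stopped moving, so the
        # ensemble's inference result is taken to have converged.
        if prev_mean is not None and abs(mean - prev_mean) < tol:
            break  # interrupt the remaining tasks
        prev_mean = mean
    return statistics.fmean(outcomes)  # estimated inference result

print(ensemble_predict(2.0))  # converges after a few base learners
```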
LEARNING AGENT BASED APPLICATION SCHEDULING
Tasks of directed acyclic graphs (DAGs) may be dynamically scheduled based on a plurality of constraints and conditions, task prioritization policies, task execution estimates, and configurations of a heterogeneous system. A machine learning component may be initialized to dynamically schedule the tasks of the DAGs.
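As a concrete illustration, a priority-driven topological schedule over a small DAG; the policy (longest estimated runtime first) and the runtime estimates are assumptions, standing in for the learned prioritization the abstract describes.

```python
import heapq

dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}  # task -> deps
estimate = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}       # seconds

def schedule_order(dag, estimate):
    indegree = {t: len(deps) for t, deps in dag.items()}
    ready = [(-estimate[t], t) for t, d in indegree.items() if d == 0]
    heapq.heapify(ready)  # max-heap on estimated runtime
    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        # Releasing a task may make its dependents ready.
        for t, deps in dag.items():
            if task in deps:
                indegree[t] -= 1
                if indegree[t] == 0:
                    heapq.heappush(ready, (-estimate[t], t))
    return order

print(schedule_order(dag, estimate))  # -> ['a', 'c', 'b', 'd']
```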
Efficient high bandwidth shared memory architectures for parallel machine learning and AI processing of large data sets and streams
The present disclosure relates to systems and methods that implement efficient high-bandwidth shared memory systems, particularly suited to parallelizing and operating the large-scale machine learning and AI computing systems needed to efficiently process high-volume data sets and streams.
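A rough sketch of the shared-memory idea using Python's standard multiprocessing.shared_memory module and numpy: workers mutate slices of one large buffer in place rather than receiving copies. The chunking and the toy doubling step are illustrative only.

```python
from multiprocessing import Process, shared_memory
import numpy as np

def worker(shm_name: str, start: int, stop: int) -> None:
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray((stop,), dtype=np.float64, buffer=shm.buf)
    data[start:stop] *= 2.0  # process this slice in place, no copy
    del data                 # release the buffer view before closing
    shm.close()

if __name__ == "__main__":
    n = 1_000_000
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    data = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    data[:] = 1.0
    chunks = [(i, min(i + n // 4, n)) for i in range(0, n, n // 4)]
    procs = [Process(target=worker, args=(shm.name, a, b)) for a, b in chunks]
    for p in procs: p.start()
    for p in procs: p.join()
    print(data.sum())  # 2000000.0: every element was doubled once
    del data
    shm.close()
    shm.unlink()
```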
FEDERATED LEARNING
A federated learning method and apparatus, a device and a medium are provided, relating to the field of artificial intelligence, and in particular to federated learning and machine learning. The federated learning method includes: receiving data related to a federated learning task of a target participant, wherein the target participant includes at least a first computing device for executing the federated learning task; determining computing resources of the first computing device that can be used to execute the federated learning task; and generating a first deployment scheme for executing the federated learning task in response to determining that the data and the computing resources meet a predetermined condition, wherein the first deployment scheme instructs generation of at least a first work node and a second work node on the first computing device.
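A minimal sketch of the deployment decision, assuming the predetermined condition is simply that the device's free memory and cores can host two co-located work nodes; the Device fields and the memory model are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Device:
    free_memory_gb: float
    free_cores: int

def deployment_scheme(task_data_gb: float, device: Device) -> list[str]:
    # Assumed condition: the data and resources support two co-located
    # work nodes, each holding half the data plus fixed overhead.
    per_node_mem = task_data_gb / 2 + 1.0
    if device.free_memory_gb >= 2 * per_node_mem and device.free_cores >= 2:
        return ["work_node_1", "work_node_2"]  # both on this device
    return ["work_node_1"]  # fall back to a single work node

print(deployment_scheme(4.0, Device(free_memory_gb=8.0, free_cores=4)))
```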
Hybrid data-model parallelism for efficient deep learning
The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques is used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature-heavy or weight-heavy. Depending on this characterization, the workload of an NN layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight-heavy than feature-heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of the weight reduction) while model parallelism is used in the direction with the smaller bandwidth.
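The layer-characterization rule reduces to a comparison of weight volume against feature (activation) volume; a sketch follows, assuming a 2-D processor array whose X direction has the higher bandwidth.

```python
def assign_parallelism(weight_bytes: int, feature_bytes: int) -> dict:
    # Weight-heavy layers put data parallelism on the fast axis so the
    # costly weight reduction rides the higher bandwidth; feature-heavy
    # layers do the opposite.
    if weight_bytes > feature_bytes:  # weight-heavy layer
        return {"x_high_bw": "data", "y_low_bw": "model"}
    return {"x_high_bw": "model", "y_low_bw": "data"}

# A fully connected layer (many weights, small activations):
print(assign_parallelism(weight_bytes=4096 * 4096 * 4,
                         feature_bytes=4096 * 4))
# -> {'x_high_bw': 'data', 'y_low_bw': 'model'}
```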
Automated distribution of models for execution on a non-edge device and an edge device
Techniques for generating and executing an execution plan for a machine learning (ML) model using one of an edge device and a non-edge device are described. In some examples, a request for the generation of the execution plan includes at least one objective for the execution of the ML model and the execution plan is generated based at least in part on comparative execution information and network latency information.
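A hedged sketch of plan generation for a latency objective: the non-edge path pays the network round trip, and the cheaper total wins. The cost model and parameter names are assumptions, not the described service's API.

```python
def build_execution_plan(objective: str,
                         edge_exec_ms: float,
                         cloud_exec_ms: float,
                         network_latency_ms: float) -> str:
    if objective == "latency":
        # Comparative execution information plus network latency:
        # non-edge execution pays the network round trip.
        edge_total = edge_exec_ms
        cloud_total = cloud_exec_ms + network_latency_ms
        return "edge" if edge_total <= cloud_total else "non-edge"
    raise ValueError(f"unsupported objective: {objective}")

# Slow edge hardware but a fast network favors the non-edge device:
print(build_execution_plan("latency", edge_exec_ms=120.0,
                           cloud_exec_ms=15.0, network_latency_ms=40.0))
```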
PERFORMANCE OVERHEAD OPTIMIZATION IN GPU SCOPING
The present disclosure relates to methods and devices for graphics processing, including an apparatus, e.g., a GPU. The apparatus may process a first workload of a plurality of workloads at each of multiple clusters in a GPU pipeline. The apparatus may also increment a plurality of performance counters during the processing of the first workload at each of the multiple clusters. Further, the apparatus may determine, at each of the multiple clusters, whether the first workload is finished processing. Upon determining that the first workload is finished processing, the apparatus may read a value from each of the multiple clusters for each of the plurality of performance counters. Additionally, the apparatus may transmit an indication of the read value from each of the multiple clusters for all of the plurality of performance counters.
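The counter flow can be modeled as below: each cluster increments its counters while a workload runs and exposes them for reading only once processing has finished. Cluster and counter names are made up for the example.

```python
from collections import defaultdict

class Cluster:
    def __init__(self, name: str):
        self.name = name
        self.counters = defaultdict(int)
        self.busy = False

    def process(self, workload: list[str]) -> None:
        self.busy = True
        for op in workload:
            self.counters[op] += 1  # increment during processing
        self.busy = False

    def read_counters(self) -> dict:
        # Values are read only after the workload finishes processing.
        assert not self.busy, "workload still in flight"
        return dict(self.counters)

clusters = [Cluster("geometry"), Cluster("pixel")]
for c in clusters:
    c.process(["alu_op", "alu_op", "mem_op"])  # the first workload
print({c.name: c.read_counters() for c in clusters})
```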
INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD
An information processing system including one or more information processing apparatuses is provided. Each information processing apparatus includes: a division function that divides processing information into a plurality of pieces under a division condition that designates parallel processing among the information processing apparatuses, the processing information indicating a data processing procedure from a plurality of start points to one or more end points; a determination function that uniquely determines, as one of the information processing apparatuses, an assignee of each piece of the processing information divided by the division function; and an execution function that executes a process in the information processing apparatus determined by the determination function.
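A sketch of the divide/determine/execute flow, assuming a stable-hash rule so that every apparatus independently derives the same assignee for each piece; the abstract only requires that the assignee be uniquely determined, so the hash rule is an illustrative choice.

```python
import hashlib

APPARATUSES = ["node0", "node1", "node2"]  # hypothetical apparatuses

def divide(procedure: list[str], n_pieces: int) -> list[list[str]]:
    # Division condition for parallel processing: equal-sized,
    # contiguous pieces of the data processing procedure.
    step = max(1, len(procedure) // n_pieces)
    return [procedure[i:i + step] for i in range(0, len(procedure), step)]

def assignee(piece: list[str]) -> str:
    # Deterministic, so every apparatus computes the same mapping.
    digest = hashlib.sha256("|".join(piece).encode()).digest()
    return APPARATUSES[digest[0] % len(APPARATUSES)]

pieces = divide([f"step{i}" for i in range(6)], n_pieces=3)
print([(assignee(p), p) for p in pieces])
```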