Patent classifications
G06F9/5066
Recommendations for scheduling jobs on distributed computing devices
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations represented as a computational graph on a distributed computing network. A method includes: receiving data representing operations to be executed in order to perform a job on a plurality of hardware accelerators of a plurality of different accelerator types; generating, for the job and from at least the data representing the operations, features that represent a predicted performance for the job on hardware accelerators of the plurality of different accelerator types; generating, from the features, a respective predicted performance metric for the job for each of the plurality of different accelerator types according to a performance objective function; and providing, to a scheduling system, one or more recommendations for scheduling the job on one or more recommended types of hardware accelerators.
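The recommendation step in the abstract above can be illustrated with a minimal sketch. It assumes the per-type predictions already exist; the `Prediction` fields, the throughput-per-cost objective, and the accelerator type names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    accelerator_type: str
    throughput: float     # predicted work units per second (hypothetical feature)
    cost_per_hour: float  # hypothetical price of one accelerator of this type


def objective(p: Prediction) -> float:
    """Hypothetical performance objective: predicted throughput per unit cost."""
    return p.throughput / p.cost_per_hour


def recommend(predictions, top_k=1):
    """Rank accelerator types by the objective and return the best top_k names."""
    ranked = sorted(predictions, key=objective, reverse=True)
    return [p.accelerator_type for p in ranked[:top_k]]


predictions = [
    Prediction("type_a", throughput=100.0, cost_per_hour=4.0),
    Prediction("type_b", throughput=180.0, cost_per_hour=9.0),
    Prediction("type_c", throughput=60.0, cost_per_hour=2.0),
]
recommended = recommend(predictions, top_k=2)
print(recommended)
```

A scheduling system could consume `recommended` as an ordered preference list; a different objective (for example, raw throughput) would only require swapping the `objective` function.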
PARALLEL METHOD AND DEVICE FOR CONVOLUTION COMPUTATION AND DATA LOADING OF NEURAL NETWORK ACCELERATOR
Disclosed are a parallel method and device for convolution computation and data loading of a neural network accelerator. The method needs two input feature maps and two convolution kernel cache blocks, and sequentially stores the input feature maps and 64 convolution kernels into cache sub-blocks according to a loading length, so as to execute convolution computation and simultaneously load data of a next group of 64 convolution kernels.
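The overlap of computation and loading described above resembles double buffering. The sketch below is a software analogy only, assuming a background thread stands in for the hardware loader; `load_group`, `convolve`, and the scalar "convolution" are hypothetical placeholders, not the accelerator's actual data path.

```python
from concurrent.futures import ThreadPoolExecutor

GROUP_SIZE = 64  # kernels per group, as stated in the abstract


def load_group(kernels, g):
    """Stand-in for loading one group of kernels into a cache block."""
    return kernels[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]


def convolve(feature, group):
    """Placeholder computation: one scalar product per kernel."""
    return [feature * k for k in group]


def run(feature, kernels):
    num_groups = (len(kernels) + GROUP_SIZE - 1) // GROUP_SIZE
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_group, kernels, 0)
        for g in range(num_groups):
            current = pending.result()  # block that finished loading
            if g + 1 < num_groups:
                # Start loading the next group while the current one is computed.
                pending = loader.submit(load_group, kernels, g + 1)
            outputs.extend(convolve(feature, current))
    return outputs


result = run(2, list(range(128)))
print(len(result))
```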
Method and Apparatus for Updating Application Identification Model, and Storage Medium
A method and an apparatus for updating an application identification model, and a storage medium are provided. A client device may determine a plurality of training samples based on identification results of a plurality of pieces of data traffic, and train an application identification model using the training samples. Then, the client device may upload model data of the trained application identification model to a server, and the server performs joint update based on the model data uploaded by a plurality of client devices. Then, the client device may obtain a jointly updated application identification model based on jointly updated model data delivered by the server.
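The abstract does not say how the server combines the uploaded model data. One common choice is a sample-weighted average of client parameters (federated averaging), sketched below as an assumption; `joint_update` and the flat weight lists are hypothetical.

```python
def joint_update(client_updates):
    """Combine client models by sample-weighted averaging.

    client_updates: list of (weights, num_samples) pairs, where weights is a
    flat list of floats with the same length for every client.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[i] * n for w, n in client_updates) / total
        for i in range(dim)
    ]


# Two clients: the second trained on three times as many samples.
updates = [([1.0, 2.0], 10), ([3.0, 4.0], 30)]
global_weights = joint_update(updates)
print(global_weights)
```

The server would then deliver `global_weights` back to each client, which matches the final step in the abstract.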
RESERVOIR SIMULATION UTILIZING HYBRID COMPUTING
Hybrid computing that utilizes a computer processor coupled to one or more graphical processing units (GPUs) is configured to perform computations that generate outputs related to reservoir simulations associated with formations that may include natural gas and oil reservoirs.
Constraining memory use for overlapping virtual memory operations
Constraining memory use for overlapping virtual memory operations is described. The memory use is constrained to prevent memory from exceeding an operational threshold, e.g., in relation to operations for modifying content. These operations are implemented according to algorithms having a plurality of instructions. Before the instructions are performed in relation to the content, virtual memory is allocated to the content data, which is then loaded into the virtual memory and is also partitioned into data portions. In the context of the described techniques, at least one of the instructions affects multiple portions of the content data loaded in virtual memory. When this occurs, the instruction is carried out, in part, by transferring the multiple portions of content data between the virtual memory and a memory such that a number of portions of the content data in the memory is constrained to the memory that is reserved for the operation.
Methods, systems, articles of manufacture and apparatus to improve resource utilization for binary tree structures
Methods, apparatus, systems and articles of manufacture are disclosed to improve resource utilization for binary tree structures. An example apparatus to improve resource utilization for field programmable gate array (FPGA) resources includes a computation determiner to identify a computation capability value associated with the FPGA resources, a k-ary tree builder to build a first k-ary tree having a number of k-ary nodes equal to the computation capability value, and an FPGA memory controller to initiate collision computation by transferring the first k-ary tree to a first memory of the FPGA resources.
METHOD AND SYSTEM FOR DISTRIBUTED WORKLOAD PROCESSING
A method and system for distributing a compute model and the data to be processed to heterogeneous, distributed compute devices. The compute model and a portion of the data are processed on a benchmark system, and the timing is used to estimate the job execution speed of each compute device. Compute devices are selected and assigned data chunks based on the estimate so that distributed processing is completed within a predefined time period. The compute model and data chunks can be sent to the respective compute devices using separate processes, such as a payload manager configured to transfer compute jobs to remote devices and a messaging engine configured to transfer data messages, where the payload manager and messaging engine communicate with corresponding software engines on the compute devices.
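The selection and assignment step can be sketched as follows. This is a minimal illustration that assumes a greedy, fastest-first policy and a per-device speed factor relative to the benchmark system; the function name, the device names, and the policy itself are hypothetical rather than taken from the patent.

```python
def assign_chunks(num_chunks, benchmark_seconds_per_chunk, relative_speed, deadline_seconds):
    """Assign data chunks so every device is estimated to finish by the deadline.

    relative_speed maps a device name to its speed relative to the benchmark
    system (2.0 means twice as fast). Fastest devices are filled first.
    """
    assignment = {}
    remaining = num_chunks
    by_speed = sorted(relative_speed.items(), key=lambda kv: kv[1], reverse=True)
    for device, speed in by_speed:
        if remaining == 0:
            break
        estimated_seconds = benchmark_seconds_per_chunk / speed
        capacity = int(deadline_seconds // estimated_seconds)
        take = min(capacity, remaining)
        if take > 0:
            assignment[device] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("deadline cannot be met with the available devices")
    return assignment


plan = assign_chunks(
    num_chunks=10,
    benchmark_seconds_per_chunk=2.0,
    relative_speed={"edge": 0.5, "server": 2.0, "laptop": 1.0},
    deadline_seconds=6.0,
)
print(plan)
```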
Computing Device Control of a Job Execution Environment
Job execution environment control techniques are described to manage policy selection and implementation to control use of job executors by a computing device, automatically and without user intervention. These techniques are usable to select a policy from a plurality of policies that is then used to control lifecycles of job executors of a job execution environment of a computing device. Further, these techniques are usable to dynamically change the selected policy during runtime of an application in response to changes in the job execution environment.
Techniques to generate execution schedules from neural network computation graphs
Techniques are described for a compiler scheduling algorithm/routine that utilizes backtracking to generate an execution schedule for a neural network computation graph using a neural network compiler intermediate representation of hardware synchronization counters. The hardware synchronization counters may be referred to as physical barriers, hardware (HW) barriers, or barriers and their intermediate representations may be referred to as barrier tasks or barriers. Backtracking is utilized to prevent an available number of hardware barriers from being exceeded during performance of an execution schedule. An execution schedule may be a computation workload schedule for neural network inference applications. An execution schedule may also be a first in first out (FIFO) schedule.
TASK MIGRATION METHOD, APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
The present disclosure provides a task migration method, apparatus, electronic device and storage medium, and relates to the technical field of data processing. The method may include: obtaining a task submitted by a user; in the case that the task is a Hadoop task and it is determined that task conversion is to be performed, converting Hadoop parameters in the task into parameters recognizable by Spark; and injecting the conversion result into a predetermined kit and submitting the predetermined kit to a Spark cluster. The solution of the present disclosure may be applied to reduce the user's workload and improve processing efficiency.
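The parameter conversion step might look like the sketch below. The abstract does not list which parameters are converted, so the two-entry mapping table is an illustrative assumption, and `convert_params` and `to_submit_args` are hypothetical helpers. The output uses the `--conf key=value` form accepted by `spark-submit`.

```python
# Illustrative mapping only; the patent does not specify the conversion table.
HADOOP_TO_SPARK = {
    "mapreduce.job.name": "spark.app.name",
    "mapreduce.job.queuename": "spark.yarn.queue",
}


def convert_params(hadoop_params):
    """Translate known Hadoop parameters and report the ones left unmapped."""
    converted, unmapped = {}, []
    for key, value in hadoop_params.items():
        if key in HADOOP_TO_SPARK:
            converted[HADOOP_TO_SPARK[key]] = value
        else:
            unmapped.append(key)
    return converted, unmapped


def to_submit_args(spark_conf):
    """Render a Spark configuration as --conf arguments in a stable order."""
    args = []
    for key, value in sorted(spark_conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


conf, skipped = convert_params({
    "mapreduce.job.name": "daily-report",
    "mapreduce.job.queuename": "batch",
    "mapreduce.job.reduces": "8",
})
submit_args = to_submit_args(conf)
print(submit_args, skipped)
```

Returning the unmapped keys separately lets a caller decide whether to skip conversion or submit anyway, which corresponds to the "it is determined that task conversion is to be performed" condition in the abstract.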