Patent classifications
G06F2209/5017
BALANCING DATA PARTITIONS AMONG DYNAMIC SERVICES IN A CLOUD ENVIRONMENT
A method includes identifying, by a first instance of a service, a first number of data partitions of a data source to be processed by the service and a second number of instances of the service available to process the first number of data partitions. The method further includes separating the first number of data partitions into a first set of data partitions and a second set of data partitions in view of the second number of instances of the service, determining a target number of data partitions from the first set of data partitions to be claimed by each of the second number of instances of the service, and claiming, by the first instance of the service, the target number of data partitions from the first set of data partitions and up to one data partition from the second set of data partitions.
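A minimal sketch of one way such a split could work, assuming the "first set" is the evenly divisible portion of the partitions and the "second set" is the remainder, and assuming each instance knows its own index; these details are a reading of the abstract, not the patented method:

```python
def claim_partitions(partitions, num_instances, instance_index):
    """Return the data partitions that one service instance would claim.

    Assumes the first set is the evenly divisible portion of the
    partitions and the second set is the remainder
    (len(partitions) % num_instances).
    """
    target = len(partitions) // num_instances          # target per instance
    first_set = partitions[: target * num_instances]
    second_set = partitions[target * num_instances :]

    # Claim the target number of partitions from the first set.
    claimed = first_set[instance_index * target : (instance_index + 1) * target]

    # Claim up to one additional partition from the second set.
    if instance_index < len(second_set):
        claimed.append(second_set[instance_index])
    return claimed


# Example: 10 partitions spread across 3 instances -> 4, 3, 3 partitions each.
parts = list(range(10))
for i in range(3):
    print(i, claim_partitions(parts, 3, i))
```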
Data transferring apparatus and method for transferring data with overlap
A data transferring apparatus and a method for transferring data with overlap are provided. The data transferring apparatus includes a command splitter circuit and a plurality of tile processing circuits. The command splitter circuit splits a block level transfer command into a plurality of tile transfer tasks. The command splitter circuit may issue the tile transfer tasks to the tile processing circuits in a plurality of batches. The tile processing circuits may execute the tile transfer tasks in a current batch, so as to read data of a plurality of corresponding tiles among a plurality of source tiles of a source block to the tile processing circuits. After all the tile transfer tasks in the current batch have been executed by the tile processing circuits, the command splitter circuit issues the tile transfer tasks in a next batch of the batches to the tile processing circuits.
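A rough software analogy of the batching behavior, with a thread pool standing in for the tile processing circuits; the tile size, batch size, and the "read" step are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor


def split_block_into_tiles(block, tile_size):
    """Split a flat source block into fixed-size tile transfer tasks."""
    return [block[i : i + tile_size] for i in range(0, len(block), tile_size)]


def transfer_block(block, tile_size, num_tile_processors):
    """Issue tile transfer tasks in batches; the next batch is issued only
    after every task in the current batch has completed."""
    tasks = split_block_into_tiles(block, tile_size)
    destination = []
    with ThreadPoolExecutor(max_workers=num_tile_processors) as pool:
        for start in range(0, len(tasks), num_tile_processors):
            batch = tasks[start : start + num_tile_processors]
            # "Read" each tile; list() blocks until the whole batch is done.
            results = list(pool.map(lambda tile: list(tile), batch))
            for tile in results:
                destination.extend(tile)
    return destination


print(transfer_block(list(range(16)), tile_size=4, num_tile_processors=2))
```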
Methods for Offloading A Task From A Processor to Heterogeneous Accelerators
Systems and methods are provided for offloading a task from a central processor in a radio access network (RAN) server to one or more heterogeneous accelerators. For example, a task associated with one or more operational partitions (or a service application) associated with processing data traffic in the RAN is dynamically allocated for offloading from the central processor based on workload status information. One or more accelerators are dynamically allocated for executing the task, where the accelerators may be heterogeneous and may not comprise pre-programming for executing the task. The disclosed technology further enables generating specific application programs for execution on the respective heterogeneous accelerators based on a single set of program instructions. The methods automatically generate the specific application programs by identifying common functional blocks for processing data traffic and mapping the functional blocks to the single set of program instructions to generate code native to the respective accelerators.
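A toy sketch of the dynamic allocation decision only, assuming a simple utilization-based policy; the load threshold, the accelerator utilization map, and the task fields are invented for illustration and are not drawn from the abstract:

```python
def allocate_accelerators(task, accelerators, cpu_load, load_threshold=0.8):
    """Decide whether to offload a task and, if so, pick accelerators for it.

    `accelerators` maps an accelerator name to its current utilization.
    The threshold-based policy is an illustrative assumption.
    """
    if cpu_load < load_threshold:
        return []                      # keep the task on the central processor
    # Offload to the least-utilized accelerators first.
    ranked = sorted(accelerators, key=accelerators.get)
    return ranked[: task.get("accelerators_needed", 1)]


task = {"name": "channel_estimation", "accelerators_needed": 2}
print(allocate_accelerators(task, {"gpu0": 0.3, "fpga0": 0.1, "dsp0": 0.7},
                            cpu_load=0.9))
```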
PARALLEL PROCESSING ARCHITECTURE FOR ATOMIC OPERATIONS
Techniques for task processing in a parallel processing architecture for atomic operations are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. At least one of the control words involves an operation requiring at least one additional operation. A bit of the control word is set, where the bit indicates a multicycle operation. The control word is executed, on at least one compute element within the array of compute elements, based on the bit. The multicycle operation comprises a read-modify-write operation.
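A simplified simulation of the multicycle bit, assuming a single flag bit in the control word and a dictionary standing in for compute-element memory; the bit position and arguments are illustrative assumptions:

```python
MULTICYCLE_BIT = 1 << 31      # assumed bit position for the multicycle flag


def execute_control_word(word, memory, address, modify):
    """Execute a control word on a simulated compute element.

    If the multicycle bit is set, perform a read-modify-write as one
    uninterrupted sequence; otherwise perform a single-cycle read.
    """
    if word & MULTICYCLE_BIT:
        value = memory[address]       # read
        value = modify(value)         # modify
        memory[address] = value       # write
        return value
    return memory[address]            # single-cycle read


mem = {0x10: 5}
print(execute_control_word(MULTICYCLE_BIT, mem, 0x10, lambda v: v + 1))  # -> 6
```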
Tile assignment to processing cores within a graphics processing unit
A graphics processing unit configured to process graphics data using a rendering space which is sub-divided into a plurality of tiles, the graphics processing unit comprising: a plurality of processing cores configured to render graphics data; cost indication logic configured to obtain a cost indication for each of a plurality of sets of one or more tiles of the rendering space, wherein the cost indication for a set of one or more tiles is suggestive of a cost of processing the set of one or more tiles; similarity indication logic configured to obtain similarity indications between sets of one or more tiles of the rendering space, wherein the similarity indication between two sets of one or more tiles is indicative of a level of similarity between the two sets of tiles according to at least one processing metric; and scheduling logic configured to assign the sets of one or more tiles to the processing cores for rendering in dependence on the cost indications and the similarity indications.
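A greedy software sketch of scheduling that weighs both indications, assuming a numeric cost per tile set, a pairwise similarity score, and an arbitrary similarity weight; none of these specifics come from the abstract:

```python
def assign_tiles(tile_sets, costs, similarity, num_cores, similarity_weight=2.0):
    """Assign each tile set to a core, balancing estimated cost across cores
    while preferring cores that already hold similar tile sets."""
    core_load = [0.0] * num_cores
    core_sets = [[] for _ in range(num_cores)]
    # Place the most expensive tile sets first.
    for ts in sorted(tile_sets, key=lambda t: -costs[t]):
        def score(core):
            sim = sum(similarity.get((ts, other), 0.0) for other in core_sets[core])
            return core_load[core] - similarity_weight * sim
        best = min(range(num_cores), key=score)
        core_sets[best].append(ts)
        core_load[best] += costs[ts]
    return core_sets


print(assign_tiles(["A", "B", "C", "D"],
                   {"A": 4, "B": 3, "C": 2, "D": 1},
                   {("C", "A"): 0.9, ("D", "B"): 0.8},
                   num_cores=2))   # -> [['A', 'C'], ['B', 'D']]
```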
Apparatus and method for real time graphics processing using local and cloud-based graphics processing resources
An apparatus and method for scheduling threads on local and remote processing resources. For example, one embodiment of an apparatus comprises: a local graphics processor to execute threads of an application; graphics processor virtualization circuitry and/or logic to generate a virtualized representation of a local processor; a scheduler to identify a first subset of the threads for execution on a local graphics processor and a second subset of the threads for execution on a virtualized representation of a local processor; the scheduler to schedule the first subset of threads on the local graphics processor and the second subset of the threads by transmitting the threads or a representation thereof to Cloud-based processing resources associated with the virtualized representation of the local processor; and the local graphics processor to combine first results of executing the first subset of threads on the local graphics processor with second results of executing the second subset of threads on the Cloud-based processing resources to render an image frame.
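A high-level sketch of the split-schedule-combine flow, with callables standing in for the local GPU and the cloud backend; the fixed split fraction and the run_local/run_remote parameters are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(threads, run_local, run_remote, local_fraction=0.5):
    """Run one subset of threads locally, ship the rest to a cloud backend,
    then merge both result sets into a single frame."""
    cut = int(len(threads) * local_fraction)
    local_subset, remote_subset = threads[:cut], threads[cut:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_future = pool.submit(lambda: [run_local(t) for t in local_subset])
        remote_future = pool.submit(lambda: [run_remote(t) for t in remote_subset])
        # Combine local and cloud results (here: a flat list of outputs).
        return local_future.result() + remote_future.result()


frame = render_frame(list(range(8)),
                     run_local=lambda t: ("local", t),
                     run_remote=lambda t: ("cloud", t))
print(frame)
```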
PARTITIONING AND PLACEMENT OF MODELS
This disclosure describes techniques and mechanisms for enabling a user to run heavy deep learning workloads on standard edge networks without off-loading computation to a cloud, leveraging the available edge computing resources, and efficiently partitioning and distributing a Deep Neural Network (DNN) over a network. The techniques enable the user to split a workload into multiple parts and process the workload on a set of smaller, less capable compute nodes in a distributed manner, without compromising on performance, and while meeting a Service Level Objective (SLO).
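A simple greedy sketch of layer-wise partitioning across edge nodes, assuming abstract per-layer costs and per-node capacities; the SLO check described in the abstract is omitted:

```python
def partition_dnn(layer_costs, node_capacities):
    """Walk the DNN layers in order and start a new partition whenever the
    running cost would exceed the current edge node's capacity."""
    partitions = [[] for _ in node_capacities]
    node, used = 0, 0.0
    for layer, cost in enumerate(layer_costs):
        if used + cost > node_capacities[node] and node + 1 < len(node_capacities):
            node, used = node + 1, 0.0        # move on to the next edge node
        partitions[node].append(layer)
        used += cost
    return partitions


# Example: six layers spread over three small edge nodes.
print(partition_dnn([4, 3, 2, 2, 5, 1], [7, 5, 8]))   # -> [[0, 1], [2, 3], [4, 5]]
```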
Adaptive rebuilding of encoded data slices in a storage network
A method for execution by a computing device of a storage network begins by obtaining scoring information for rebuilding encoded data slices for one or more storage units of a set of storage units of the storage network, where the scoring information includes two or more of a plurality of rebuilding rates, a plurality of input/output rates, a plurality of scores, and a plurality of selection rates. The method continues with determining a rebuilding rate of the plurality of rebuilding rates to utilize for the rebuilding based on the scoring information. The method continues by implementing the rebuilding of the encoded data slices in accordance with the rebuilding rate.
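A small sketch of the rate-selection step only, assuming the scoring information pairs each candidate rebuilding rate with an input/output rate and that selection is constrained by an I/O budget; both assumptions are illustrative:

```python
def choose_rebuilding_rate(rebuilding_rates, io_rates, io_budget):
    """Pick the highest rebuilding rate whose associated input/output rate
    still fits within an I/O budget; fall back to the slowest rate."""
    candidates = [rate for rate, io in zip(rebuilding_rates, io_rates)
                  if io <= io_budget]
    return max(candidates) if candidates else min(rebuilding_rates)


# Example: three candidate rates (slices/sec) and their I/O cost (MB/s).
print(choose_rebuilding_rate([10, 25, 50], [5, 12, 30], io_budget=15))  # -> 25
```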
Application aware resource allocation for deep learning job scheduling
One embodiment provides a method, including: receiving at least one deep learning job for scheduling and running on a distributed system comprising a plurality of nodes; receiving a batch size range indicating a minimum batch size and a maximum batch size that can be utilized for running the at least one deep learning job; determining a plurality of runtime estimations for running the at least one deep learning job; creating a list of optimal combinations of (i) batch sizes and (ii) numbers of the plurality of nodes for running both (a) the at least one deep learning job and (b) current deep learning jobs; and scheduling the at least one deep learning job at the distributed system, responsive to identifying, by utilizing the list, that the distributed system has necessary processing resources for running both (iii) the at least one deep learning job and (iv) the current deep learning jobs.
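A sketch of building the combination list and scheduling against it, assuming a caller-supplied runtime estimator, a batch-size step of 2, and a simple free-node check; these specifics are assumptions, not the claimed method:

```python
def candidate_combinations(min_batch, max_batch, total_nodes, free_nodes,
                           estimate_runtime):
    """Enumerate (batch_size, num_nodes) pairs for a new deep learning job,
    sort them by estimated runtime, and schedule the fastest pair that the
    currently free nodes can satisfy."""
    combos = []
    for batch in range(min_batch, max_batch + 1, 2):
        for nodes in range(1, total_nodes + 1):
            combos.append((estimate_runtime(batch, nodes), batch, nodes))
    combos.sort()
    for runtime, batch, nodes in combos:
        if nodes <= free_nodes:
            return {"batch_size": batch, "num_nodes": nodes, "est_runtime": runtime}
    return None   # no feasible combination; the job waits


# Toy runtime model: larger batches and more nodes finish sooner.
print(candidate_combinations(16, 32, total_nodes=8, free_nodes=3,
                             estimate_runtime=lambda b, n: 1000.0 / (b * n)))
```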
Managing workloads of a deep neural network processor
A computing system includes processor cores for executing applications that utilize functionality provided by a deep neural network ("DNN") processor. One of the cores operates as a resource and power management ("RPM") processor core. When the RPM processor receives a request to execute a DNN workload, it divides the DNN workload into workload fragments. The RPM processor then determines whether a workload fragment is to be statically allocated or dynamically allocated to a DNN processor. Once the RPM processor has selected a DNN processor, it enqueues the workload fragment on a queue maintained by the selected DNN processor. The DNN processor dequeues workload fragments from its queue for execution. Once execution of a workload fragment has completed, the DNN processor generates an interrupt indicating completion. The RPM processor can then notify the processor core that originally requested execution of the workload fragment.
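A software analogy of the fragment-and-enqueue flow, with deques standing in for the DNN processor queues and a callback standing in for the completion interrupt; fragmenting by fixed size and picking the shortest queue are illustrative stand-ins for the static/dynamic allocation decision:

```python
from collections import deque


def run_dnn_workload(workload, fragment_size, dnn_queues, on_fragment_done):
    """Split a workload into fragments, enqueue each fragment on the
    shortest DNN processor queue, then drain the queues and invoke a
    completion callback per fragment."""
    fragments = [workload[i : i + fragment_size]
                 for i in range(0, len(workload), fragment_size)]
    for frag in fragments:
        min(dnn_queues, key=len).append(frag)          # dynamic allocation
    for idx, queue in enumerate(dnn_queues):
        while queue:
            frag = queue.popleft()                     # DNN processor dequeues
            on_fragment_done(idx, frag)                # "interrupt" on completion


queues = [deque(), deque()]
run_dnn_workload(list(range(10)), fragment_size=3, dnn_queues=queues,
                 on_fragment_done=lambda proc, frag: print(f"DNN {proc} finished {frag}"))
```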