G06F2209/509

Framework integration for instance-attachable accelerator

Techniques for partitioning data flow operations between execution on a compute instance and an attached accelerator instance are described. A set of operations supported by the accelerator is obtained. A set of operations associated with the data flow is obtained. A first operation in the set of operations associated with the data flow is identified based on the set of operations supported by the accelerator. The accelerator executes the first operation.
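
The partitioning step can be pictured as a membership test against the accelerator's supported set. The following is a minimal sketch under that assumption, not the method claimed in the application; the function name and the operation names are hypothetical.

```python
def partition_operations(data_flow_ops, accelerator_supported):
    """Split data flow operations between the accelerator and the compute instance.

    Assumption: an operation is sent to the accelerator whenever its name
    appears in the accelerator's supported set. Order is preserved.
    """
    supported = set(accelerator_supported)
    on_accelerator = [op for op in data_flow_ops if op in supported]
    on_instance = [op for op in data_flow_ops if op not in supported]
    return on_accelerator, on_instance


# Hypothetical operation names, for illustration only.
accel_ops, host_ops = partition_operations(
    ["decode", "conv2d", "matmul", "log_metrics"],
    {"conv2d", "matmul", "relu"},
)
```

Here `accel_ops` holds the operations the accelerator would execute and `host_ops` holds those left on the compute instance.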

On-Demand Access to Compute Resources
20230108828 · 2023-04-06

Disclosed are systems, methods and computer-readable media for controlling and managing the identification and provisioning of resources within an on-demand center as well as the transfer of workload to the provisioned resources. One aspect involves creating a virtual private cluster within the on-demand center for the particular workload from a local environment. A method of managing resources between a local compute environment and an on-demand environment includes detecting an event associated with a local compute environment and, based on the detected event, identifying information about the local environment, establishing communication with an on-demand compute environment and transmitting the information about the local environment to the on-demand compute environment, provisioning resources within the on-demand compute environment to substantially duplicate the local environment, and transferring workload from the local environment to the on-demand compute environment. The event can be a threshold or a triggering event within or outside of the local environment.

DISTRIBUTED COMPUTING PIPELINE PROCESSING

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing computational graphs on distributed computing devices. One of the methods includes receiving a request to execute a processing pipeline comprising (i) first operations that transform raw inputs into pre-processed inputs and (ii) second operations that operate on the pre-processed inputs; and in response: assigning the first operations to two or more of a plurality of computing devices, assigning the second operations to one or more hardware accelerators of a plurality of hardware accelerators, wherein each hardware accelerator is interconnected with the plurality of computing devices, and configured to (i) receive inputs from respective queues of the two or more computing devices assigned the first operations and (ii) perform the second operations on the received pre-processed inputs, and executing, in parallel, the processing pipeline on the two or more computing devices and the one or more hardware accelerators.
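
As a rough illustration of the queue-based hand-off, the sketch below uses threads to stand in for the computing devices and for a single hardware accelerator. It assumes a static split of the raw inputs across workers and an accelerator that drains each worker's queue in turn; `run_pipeline`, `preprocess`, and `accelerator_step` are hypothetical names, and none of this is taken from the application itself.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of one worker's output


def run_pipeline(raw_inputs, preprocess, accelerator_step, num_workers=2):
    """Run pre-processing workers and one simulated accelerator in parallel."""
    # One queue per pre-processing worker ("respective queues").
    queues = [queue.Queue() for _ in range(num_workers)]

    def worker(index):
        # First operations: turn this worker's share of raw inputs
        # into pre-processed inputs.
        for item in raw_inputs[index::num_workers]:
            queues[index].put(preprocess(item))
        queues[index].put(_DONE)

    results = []

    def accelerator():
        # Second operations: consume pre-processed inputs, one queue at a time.
        for q in queues:
            while True:
                item = q.get()
                if item is _DONE:
                    break
                results.append(accelerator_step(item))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    threads.append(threading.Thread(target=accelerator))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outputs = run_pipeline(
    [1, 2, 3, 4],
    preprocess=lambda x: x * 10,
    accelerator_step=lambda x: x + 1,
)
```

Draining the queues one after another keeps the sketch short; a real system would more likely poll all queues so that no worker's output waits on another's.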

APPLICATION-SPECIFIC PACKET PROCESSING OFFLOAD SERVICE
20220321496 · 2022-10-06

A method for offloading network operations is described. The method includes receiving an offload service capabilities request message from a first application to request information from an offload service regarding capabilities of the offload service that meet a set of requirements; transmitting a response to the first application that includes a set of offload service templates that are (1) selected based on the application requirements and (2) possible templates to be modified for performing operations of the first application; evaluating the network resources for the program code of the first application to select a set of network resources for offloading the operations of the first application to the network resources; and installing the program code, which was generated based on a set of offload service templates, on the set of network resources such that the set of network resources process packets from a second application that are addressed to the first application.

APPARATUS AND METHOD WITH LARGE-SCALE COMPUTING

A computing method and device for large-scale computing is provided. The computing device includes at least one processing device configured to perform an operation related to a neural network, a sensor configured to sense an electrical characteristic of the at least one processing device, an operating frequency of the at least one processing device, and a temperature of the at least one processing device, and a processor configured to calculate a workload to be allocated to the at least one processing device based on an operating mode of the at least one processing device, the electrical characteristic of the at least one processing device, the operating frequency of the at least one processing device, and the temperature of the at least one processing device, and control the electrical characteristic, the operating frequency, and the temperature based on the operating mode and the workload.

SYSTEMS AND METHODS FOR INCREASING HARDWARE ACCELERATOR PERFORMANCE IN NEURAL NETWORK APPLICATIONS

Low-power systems and methods increase computational efficiency in neural network processing by allowing hardware accelerators to perform processing steps on large amounts of data at reduced execution times without significantly increasing hardware cost. In various embodiments, this is accomplished by accessing locations in a source memory coupled to a hardware accelerator and using a resource optimizer that, based on storage availability and network parameters, determines target locations in a number of distributed memory elements. The target storage locations are selected according to one or more memory access metrics to reduce power consumption. A read/write synchronizer then schedules simultaneous read and write operations to reduce idle time and further increase computational efficiency.

BUBBLE SORTING FOR SCHEDULING TASK EXECUTION IN COMPUTING SYSTEMS

One or more embodiments of the present disclosure relate to determining a first execution schedule for execution of a plurality of runnables, the plurality of runnables corresponding to a process executed using a plurality of compute engines. Additionally or alternatively, one or more embodiments may relate to modifying the first execution schedule to generate a second execution schedule. The modifying may include moving one or more runnables of the plurality of runnables to populate one or more gaps in the first execution schedule. The moving of the one or more runnables may be performed in view of one or more moving constraints.
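
The gap-filling idea can be sketched for a single compute engine as follows. This is a simple greedy pass written for illustration, not the bubble-sorting procedure the title refers to; the `Slot` type and its `earliest_start` field (standing in for one possible moving constraint) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    start: int
    duration: int
    earliest_start: int = 0  # hypothetical moving constraint

    @property
    def end(self):
        return self.start + self.duration


def fill_gaps(first_schedule):
    """Build a second schedule by moving later runnables into earlier idle gaps."""
    slots = sorted(
        (Slot(s.name, s.start, s.duration, s.earliest_start) for s in first_schedule),
        key=lambda s: s.start,
    )
    moved = True
    while moved:
        moved = False
        cursor = 0  # end of the busy time seen so far
        for i, slot in enumerate(slots):
            if slot.start > cursor:  # idle gap is [cursor, slot.start)
                for later in slots[i + 1:]:
                    begin = max(cursor, later.earliest_start)
                    if begin + later.duration <= slot.start:
                        later.start = begin  # move the runnable into the gap
                        slots.sort(key=lambda s: s.start)
                        moved = True
                        break
                if moved:
                    break  # rescan the schedule from the beginning
            cursor = max(cursor, slot.end)
    return slots


first = [Slot("A", 0, 2), Slot("B", 5, 3), Slot("C", 8, 2)]
second = fill_gaps(first)
```

In this example the gap between `A` and `B` is three units wide, so `C` (two units, no constraint) is moved to start at time 2 while `B` stays in place.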

Computing systems with off-load processing for networking related tasks

Computing systems with off-load processing for networking related tasks are disclosed. A first mobile electronic device includes first wireless communication circuitry to support cellular communication; and second wireless communication circuitry to support wireless communication. The first electronic device includes processor circuitry to: identify a first one of a first cellular network or a second cellular network based on availability of the first and second cellular networks; initiate establishment of a first communication link between a second mobile electronic device and the first one of the first cellular network or the second cellular network via the first wireless communication circuitry and the second wireless communication circuitry; and initiate establishment of a second communication link between the second mobile electronic device and a second one of the first cellular network or the second cellular network based on a change in the availability of the first and second cellular networks.

DETECTING EXECUTION HAZARDS IN OFFLOADED OPERATIONS
20220318085 · 2022-10-06

Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.
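
The abstract does not say which property is compared, so the sketch below assumes one for illustration: a hazard is flagged when an operation targets a different device than the operation preceding it. The `OffloadOp` type, its fields, and the comparison rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadOp:
    opcode: str
    target_device: int  # hypothetical identifier of the offload target


def creates_hazard(first, second):
    """Assumed rule: consecutive operations must target the same device."""
    return first.target_device != second.target_device


def find_hazards(ops):
    """Return the index of every operation that conflicts with its predecessor."""
    return [
        i
        for i in range(1, len(ops))
        if creates_hazard(ops[i - 1], ops[i])
    ]


ops = [OffloadOp("load", 0), OffloadOp("add", 0), OffloadOp("store", 1)]
hazards = find_hazards(ops)
```

A caller could run whatever error handling it needs for each index in `hazards`, for example rejecting the sequence or logging the conflicting pair.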

MULTI-ACCELERATOR COMPUTE DISPATCH

Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
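
The dispatch flow can be simulated with threads standing in for chiplets. This sketch assumes round-robin assignment of workgroups and uses a shared set under a lock in place of chiplet-to-chiplet notification; the function names and packet representation are hypothetical.

```python
import threading


def dispatch_packet(workgroups, num_chiplets, run_workgroup, notify_client):
    """Execute one kernel dispatch packet across simulated chiplets."""
    # Assumed policy: workgroups are dealt out round-robin.
    assignments = [workgroups[i::num_chiplets] for i in range(num_chiplets)]
    finished = set()
    lock = threading.Lock()

    def chiplet(index):
        for wg in assignments[index]:
            run_workgroup(index, wg)
        with lock:
            finished.add(index)  # stands in for notifying the other chiplets
            all_done = len(finished) == num_chiplets
        if all_done:
            notify_client()  # only the last chiplet to finish reports completion

    threads = [threading.Thread(target=chiplet, args=(i,)) for i in range(num_chiplets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_packets(packets, num_chiplets):
    """Process packets in order, starting each only after the previous completes."""
    executed, completed = [], []
    for name, workgroups in packets:
        dispatch_packet(
            workgroups,
            num_chiplets,
            run_workgroup=lambda c, wg: executed.append((name, c, wg)),
            notify_client=lambda: completed.append(name),
        )
    return executed, completed


executed, completed = run_packets([("k0", [0, 1, 2, 3]), ("k1", [0, 1, 2])], num_chiplets=2)
```

Because each packet's threads are joined before the next packet is dispatched, every workgroup of `k0` runs before any workgroup of `k1`.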