Patent classifications
G06F2209/509
Hardware Accelerator Service Aggregation
The present disclosure includes systems, methods, and computer-readable media for discovering capabilities of local and remote hardware (HW) accelerator cards. A local HW accelerator card may provide, via a communication interface, a listing of acceleration services from the local HW accelerator card. The listing of acceleration services may include a first set of acceleration services provided by one or more accelerators of the local HW accelerator card and a second set of acceleration services provided by one or more accelerators of a remote HW accelerator card. A workload instruction defining a workload for processing by at least one of the acceleration services of the second set of acceleration services may be received from a processor of a computing device. The workload instruction may be forwarded to the remote HW accelerator card.
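The aggregation and forwarding described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `AcceleratorCard` class, its method names, and the service names are all hypothetical, and the communication interface is reduced to direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class AcceleratorCard:
    """Hypothetical model of a HW accelerator card (all names are illustrative)."""
    name: str
    services: set[str]
    remote_cards: list["AcceleratorCard"] = field(default_factory=list)
    received: list[tuple[str, str]] = field(default_factory=list)

    def list_services(self) -> dict[str, str]:
        """Return service -> owning card, covering local and remote services."""
        listing = {svc: self.name for svc in sorted(self.services)}
        for remote in self.remote_cards:
            for svc in sorted(remote.services):
                # A local service takes precedence over a remote one of the same name.
                listing.setdefault(svc, remote.name)
        return listing

    def submit(self, service: str, payload: str) -> str:
        """Handle the workload locally, or forward it to a remote card that offers it."""
        if service in self.services:
            self.received.append((service, payload))
            return self.name
        for remote in self.remote_cards:
            if service in remote.services:
                return remote.submit(service, payload)
        raise LookupError(f"no card offers service {service!r}")


remote = AcceleratorCard("remote-card", {"compression"})
local = AcceleratorCard("local-card", {"encryption"}, remote_cards=[remote])
listing = local.list_services()
handled_by = local.submit("compression", "block-0")
```

Here the processor only ever talks to `local`; the workload for the remote-only service ends up in `remote.received`.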
TASK PROCESSING METHOD AND APPARATUS
A task processing apparatus and a task processing method are provided. The task processing apparatus is coupled to a host apparatus, and includes: a controller configured to query whether there is a data processing task to be executed and trigger execution of the data processing task; at least one data processing engine configured to process operation data corresponding to the data processing task according to a configured working mode, and generate a data processing result; and at least one scheduler configured to: receive a task descriptor of the data processing task from the host apparatus; configure the working mode of the data processing engine based on the task descriptor; control transmission of the operation data corresponding to the data processing task from the host apparatus to the data processing engine; and control transmission of the data processing result from the data processing engine to the host apparatus.
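As a rough illustration of the controller, scheduler, and engine roles in this abstract, the following sketch models the host as a dictionary and the task descriptor as a small dict. The working modes (`"sum"`, `"max"`), the descriptor fields, and all class names are assumptions made for the example; the abstract does not specify them.

```python
from collections import deque


class DataProcessingEngine:
    # Hypothetical working modes; the source does not enumerate them.
    MODES = {"sum": sum, "max": max}

    def __init__(self):
        self.mode = None

    def configure(self, mode_name):
        self.mode = self.MODES[mode_name]

    def process(self, operation_data):
        return self.mode(operation_data)


class Scheduler:
    def __init__(self, host_memory, engine):
        self.host_memory = host_memory   # stands in for the host apparatus
        self.engine = engine
        self.descriptors = deque()

    def receive(self, descriptor):
        self.descriptors.append(descriptor)

    def execute_next(self):
        desc = self.descriptors.popleft()
        self.engine.configure(desc["mode"])                 # configure working mode
        data = self.host_memory[desc["src"]]                # host -> engine transfer
        self.host_memory[desc["dst"]] = self.engine.process(data)  # engine -> host


class Controller:
    def __init__(self, scheduler):
        self.scheduler = scheduler

    def poll(self):
        """Query for pending tasks and trigger their execution; return the count."""
        executed = 0
        while self.scheduler.descriptors:
            self.scheduler.execute_next()
            executed += 1
        return executed


host_memory = {"in0": [3, 1, 4], "in1": [1, 5, 9]}
scheduler = Scheduler(host_memory, DataProcessingEngine())
scheduler.receive({"mode": "sum", "src": "in0", "dst": "out0"})
scheduler.receive({"mode": "max", "src": "in1", "dst": "out1"})
executed = Controller(scheduler).poll()
```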
Building and scheduling tasks for parallel processing
Logic includes a task builder for building tasks comprising data items, a task scheduler for scheduling tasks for processing by a parallel processor, a data store arranged to map content of each data item to an item ID, and a linked-list RAM comprising an entry for each item ID. For each new data item, the task builder creates a new task by starting a new linked list, or adds the data item to an existing linked list. In each linked list, the entry for each data item records a pointer to a next item ID in the list. The task builder indicates when any of the tasks is ready for scheduling. The task scheduler identifies a ready task based on the indication from the task builder, and in response follows the pointers in the respective linked list in order to schedule the data items of the task for processing.
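The linked-list bookkeeping in this abstract can be sketched in software as follows. This is an illustrative model only: the readiness rule (a task becomes ready once it holds `ready_size` items), the task keys, and every identifier are assumptions, and a Python list stands in for the linked-list RAM.

```python
class TaskBuilder:
    """Sketch: each task is a linked list threaded through a per-item-ID table."""

    def __init__(self, ready_size):
        self.item_ids = {}    # data store: item content -> item ID
        self.next_ptr = []    # linked-list RAM: entry i holds the next item ID
        self.head = {}        # task key -> first item ID in its list
        self.tail = {}        # task key -> last item ID in its list
        self.size = {}        # task key -> number of items so far
        self.ready = []       # tasks flagged as ready for scheduling
        self.ready_size = ready_size

    def add_item(self, task_key, content):
        if content in self.item_ids:          # assumption: each item is added once
            return self.item_ids[content]
        item_id = len(self.next_ptr)
        self.item_ids[content] = item_id
        self.next_ptr.append(None)
        if task_key not in self.head:         # start a new linked list (new task)
            self.head[task_key] = item_id
            self.size[task_key] = 0
        else:                                 # append to the existing linked list
            self.next_ptr[self.tail[task_key]] = item_id
        self.tail[task_key] = item_id
        self.size[task_key] += 1
        if self.size[task_key] == self.ready_size:
            self.ready.append(task_key)
        return item_id


def schedule(builder):
    """Follow the pointers of each ready task to list its item IDs in order."""
    scheduled = []
    for task_key in builder.ready:
        order = []
        item_id = builder.head[task_key]
        while item_id is not None:
            order.append(item_id)
            item_id = builder.next_ptr[item_id]
        scheduled.append((task_key, order))
    return scheduled


builder = TaskBuilder(ready_size=2)
builder.add_item("task_a", "item0")
builder.add_item("task_b", "item1")
builder.add_item("task_a", "item2")
result = schedule(builder)
```

Only `task_a` has reached the assumed readiness size, so the scheduler walks its list (item IDs 0 then 2) and leaves `task_b` pending.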
LARGE DEEP LEARNING MODEL TRAINING METHOD AND SYSTEM, DEVICE AND MEDIUM
A deep learning model training method and system, a device, and a storage medium are provided. The method includes performing the following steps on each topological layer: arranging tensors in ascending order according to the serial numbers of the topological layers where the tensors are required; sequentially moving the tensors to a Graphics Processing Unit (GPU) according to the arrangement, and determining whether a sum of the tensors already moved to the GPU exceeds a threshold; in response to the sum of the tensors already moved to the GPU exceeding the threshold, moving the excess part to a Central Processing Unit (CPU), and determining whether the current topological layer is the last topological layer; and in response to the current topological layer being the last topological layer, correcting a tensor with a positional anomaly.
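The ordering and threshold steps can be sketched as a single placement pass. This is a simplification under stated assumptions: tensors are given as hypothetical `(name, layer_needed, size)` tuples, "the excess part" is interpreted as every tensor from the first one that would push the GPU total past the threshold, and the per-layer iteration and the final positional-anomaly correction are not modelled.

```python
def place_tensors(tensors, gpu_threshold):
    """Assign each tensor to "gpu" or "cpu".

    tensors: iterable of (name, layer_needed, size) tuples (hypothetical format).
    """
    # Arrange tensors in ascending order of the layer where they are required.
    ordered = sorted(tensors, key=lambda t: t[1])
    placement = {}
    gpu_total = 0
    overflowed = False
    for name, _layer, size in ordered:
        if not overflowed and gpu_total + size <= gpu_threshold:
            placement[name] = "gpu"
            gpu_total += size
        else:
            # Assumption: once the threshold is hit, all later tensors go to the CPU.
            overflowed = True
            placement[name] = "cpu"
    return placement


tensors = [("w3", 3, 4), ("w1", 1, 4), ("w2", 2, 4)]
placement = place_tensors(tensors, gpu_threshold=8)
```

With a threshold of 8, the tensors needed earliest (`w1`, `w2`) stay on the GPU and `w3` is placed on the CPU.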
TECHNIQUES FOR PARTITIONING NEURAL NETWORKS
Apparatuses, systems, and techniques to partition neural networks. In at least one embodiment, one or more circuits are to cause one or more neural networks to be dynamically partitioned based, at least in part, on one or more performance metrics of the one or more neural networks.
SYSTEMS AND METHODS FOR DISAGGREGATED ACCELERATION OF ARTIFICIAL INTELLIGENCE OPERATIONS
A disclosed system may include a disaggregated artificial intelligence (AI) operation accelerator including a dense AI operation accelerator configured to accelerate dense AI operations and a sparse AI operation accelerator, physically separate from the dense AI operation accelerator, configured to accelerate sparse AI operations. The system may also include a scheduler that includes (1) a receiving module that receives an AI operation, (2) an identifying module that identifies the AI operation as a dense AI operation or sparse AI operation, and (3) a directing module that directs (a) the dense AI operation accelerator to accelerate identified dense AI operations, and (b) the sparse AI operation accelerator to accelerate identified sparse AI operations. The system may also include a physical processor that executes the receiving module, the identifying module, and the directing module. Various other methods, systems, and computer-readable media are also disclosed.
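The receive, identify, and direct steps of the scheduler could look roughly like the sketch below. The operation names and the rule for deciding what counts as sparse are assumptions for illustration; the abstract does not say how an operation is classified.

```python
from enum import Enum


class OpKind(Enum):
    DENSE = "dense"
    SPARSE = "sparse"


# Assumption: which operations count as sparse is not stated in the abstract.
SPARSE_OPS = {"embedding_lookup"}


class Accelerator:
    def __init__(self, name):
        self.name = name
        self.accelerated = []

    def accelerate(self, op):
        self.accelerated.append(op)


class Scheduler:
    def __init__(self, dense_acc, sparse_acc):
        self.targets = {OpKind.DENSE: dense_acc, OpKind.SPARSE: sparse_acc}

    def identify(self, op):
        """Identifying module: classify the operation as dense or sparse."""
        return OpKind.SPARSE if op in SPARSE_OPS else OpKind.DENSE

    def submit(self, op):
        """Receiving and directing modules: route the op to the matching accelerator."""
        target = self.targets[self.identify(op)]
        target.accelerate(op)
        return target.name


dense = Accelerator("dense-accelerator")
sparse = Accelerator("sparse-accelerator")
sched = Scheduler(dense, sparse)
routes = [sched.submit(op) for op in ["matmul", "embedding_lookup", "conv2d"]]
```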
System for allocating task processing between an IoT device and an edge device
Methods and systems are disclosed for allocating tasks between apparatus in an IoT system so as to generally minimize the total amount of time to execute the tasks. At least one embodiment includes a computer-implemented method for allocating task processing between an internet of things (IoT) device and an edge device. The computer-implemented method includes collecting data from one or more sensors to execute a task having data size Xt; predicting a space complexity data size Xc for the task based on data size Xt; and allocating data for processing between the IoT device and edge device as a function of Xc. In at least one embodiment, the space complexity data size Xc is determined by applying Xt to the input of a long short-term memory neural network.
Enabling stateless accelerator designs shared across mutually-distrustful tenants
An apparatus to facilitate enabling stateless accelerator designs shared across mutually-distrustful tenants is disclosed. The apparatus includes a fully-homomorphic encryption (FHE)-capable circuitry to establish a secure session with a trusted environment executing on a host device communicably coupled to the apparatus; generate, as part of establishing the secure session, per-tenant FHE keys for each tenant utilizing the FHE-capable circuitry, the per-tenant FHE keys utilized to encrypt tenant data provided to an FHE-capable compute kernel of the FHE-capable circuitry; process tenant data that is in an FHE-encrypted format encrypted with a per-tenant FHE key of the per-tenant FHE keys; and store the tenant data that is in the FHE-encrypted format encrypted with the per-tenant FHE key of the per-tenant FHE keys.
Systems and methods for facilitating scalable shared rendering
A system for facilitating scalable shared rendering includes a plurality of servers communicably coupled to each other, each server executing an executable instance of rendering software and being communicably coupled to one or more display apparatuses. When executed, the rendering software causes each server to receive information indicative of poses of users of the display apparatus(es), utilise one or more three-dimensional models of an extended-reality environment to generate images from the poses, and send the images to the respective display apparatus(es) for display. At least one of the plurality of servers is configured to detect when the total number of display apparatuses to be served exceeds a predefined threshold number, and to employ a new server and execute a new executable instance of the rendering software when the predefined threshold number is exceeded, wherein new display apparatuses are served by the new server, thereby facilitating scalable shared rendering.
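The scale-out rule in this abstract can be sketched as follows. The sketch assumes the predefined threshold applies per server already employed (so the limit grows as servers are added), which the abstract does not state; `RenderCluster` and its methods are hypothetical names, and rendering itself is not modelled.

```python
class RenderCluster:
    """Sketch: employ a new server when the served displays exceed the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed to be a per-server limit
        self.servers = [[]]          # each server is a list of display IDs

    def connect(self, display_id):
        total = sum(len(displays) for displays in self.servers) + 1
        if total > self.threshold * len(self.servers):
            # Threshold exceeded: employ a new server (new rendering instance).
            self.servers.append([])
        # New display apparatuses are served by the most recently employed server.
        self.servers[-1].append(display_id)
        return len(self.servers) - 1


cluster = RenderCluster(threshold=2)
assigned = [cluster.connect(d) for d in ["d1", "d2", "d3", "d4", "d5"]]
```

The first two displays share server 0, the third triggers a second server, and the fifth triggers a third.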
On-Demand Access to Compute Resources
Disclosed are systems, methods and computer-readable media for controlling and managing the identification and provisioning of resources within an on-demand center as well as the transfer of workload to the provisioned resources. One aspect involves creating a virtual private cluster within the on-demand center for the particular workload from a local environment. A method of managing resources between a local compute environment and an on-demand environment includes detecting an event associated with the local compute environment and, based on the detected event, identifying information about the local environment; establishing communication with an on-demand compute environment and transmitting the information about the local environment to the on-demand compute environment; provisioning resources within the on-demand compute environment to substantially duplicate the local environment; and transferring workload from the local environment to the on-demand compute environment. The event can be a threshold or a triggering event within or outside of the local environment.