G06F2209/509

Affinity-based Graphics Scheduling

Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
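The distribution step can be pictured with a minimal Python sketch. It makes three assumptions that are not in the disclosure:

- A set of graphics work is modeled as a flat list of named work items.
- The software-indicated mapping is a plain dictionary from item to sub-unit group index.
- Unmapped items fall back to round-robin assignment.

`distribute` and the tile names are hypothetical.

```python
from collections import defaultdict


def distribute(work_items, software_mapping, num_groups):
    """Assign each work item to the sub-unit group named in the mapping.

    Items missing from the mapping fall back to round-robin; this fallback
    is an assumption made for the sketch, not part of the disclosure.
    """
    assignments = defaultdict(list)
    next_group = 0
    for item in work_items:
        group = software_mapping.get(item)
        if group is None:
            group = next_group
            next_group = (next_group + 1) % num_groups
        assignments[group].append(item)
    return dict(assignments)


# tile_a and tile_b touch the same memory, so software maps both to group 0.
work = ["tile_a", "tile_b", "tile_c"]
mapping = {"tile_a": 0, "tile_b": 0, "tile_c": 1}
print(distribute(work, mapping, num_groups=2))
# {0: ['tile_a', 'tile_b'], 1: ['tile_c']}
```

Placing `tile_a` and `tile_b` in the same group is what lets them hit the cache shared by that group's sub-units.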

AI VIDEO PROCESSING METHOD AND APPARATUS
20230049578 · 2023-02-16

The method comprises the following steps:

- Connecting, through a unified high-speed interface, to a plurality of AI computing boards in an AI processing resource pool and a plurality of video encoding and decoding boards in a video processing resource pool.
- Allocating a specified number of AI computing boards and video encoding and decoding boards, based on the resources and bandwidth required to complete a processing task, to form a temporary cooperation relationship for that task.
- When a change in the processing task causes a resource surplus or shortage in either pool, adding AI computing boards or video encoding and decoding boards, or ceasing to use redundant ones.
- Performing the processing task using the allocated boards, then releasing the temporary cooperation relationship.
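The allocate, resize, and release lifecycle can be sketched in Python as below. The following are assumptions, not details from the abstract:

- Boards within each pool are modeled as interchangeable names.
- Resource needs are expressed as a demand divided by a per-board capacity.
- The high-speed interface is not modeled at all.

`ResourcePool`, `boards_needed`, `form_cooperation`, `resize`, and `release_cooperation` are hypothetical names.

```python
import math


class ResourcePool:
    """Hypothetical pool of interchangeable boards (AI compute or video codec)."""

    def __init__(self, name, boards):
        self.name = name
        self.free = list(boards)

    def allocate(self, count):
        if count > len(self.free):
            raise RuntimeError(f"{self.name} pool: insufficient boards")
        taken, self.free = self.free[:count], self.free[count:]
        return taken

    def release(self, boards):
        self.free.extend(boards)


def boards_needed(demand, per_board_capacity):
    # Round up: a partial requirement still occupies a whole board.
    return math.ceil(demand / per_board_capacity)


def form_cooperation(ai_pool, video_pool, ai_need, video_need):
    """Reserve boards for one task, checking both pools before taking any."""
    if ai_need > len(ai_pool.free) or video_need > len(video_pool.free):
        raise RuntimeError("insufficient boards for this task")
    return {"ai": ai_pool.allocate(ai_need), "video": video_pool.allocate(video_need)}


def resize(pool, group, key, new_count):
    """Add boards, or stop using redundant ones, after the task changes."""
    current = group[key]
    if new_count > len(current):
        current.extend(pool.allocate(new_count - len(current)))
    elif new_count < len(current):
        pool.release(current[new_count:])
        del current[new_count:]


def release_cooperation(ai_pool, video_pool, group):
    """Return every board to its pool once the task is done."""
    ai_pool.release(group.pop("ai"))
    video_pool.release(group.pop("video"))
```

Usage: with three AI boards of capacity 100 and two codec boards of capacity 50, a task needing 150 AI units and 40 video units gets two AI boards and one codec board. If the task later shrinks, `resize` returns the surplus AI board to its pool.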

SYSTEMS AND METHODS FOR AI META-CONSTELLATION

A system and method for device constellation are disclosed according to certain embodiments. For example, a method for device constellation includes the following steps:

- Receiving a request that includes a plurality of request parameters.
- Decomposing the request into one or more tasks.
- Selecting one or more edge devices based at least in part on the request parameters.
- Assigning the one or more tasks to the selected edge devices, causing them to perform the tasks.
- Receiving one or more task results from the selected edge devices.
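The request pipeline of receive, decompose, select, assign, and collect can be sketched in Python as below. The request format is assumed to be a dictionary with `capabilities` and `region` keys, with one task per requested capability. Device selection is assumed to filter on region and capability. `EdgeDevice`, `decompose`, `select_devices`, and `handle_request` are hypothetical names, and `run` merely stands in for remote execution.

```python
from dataclasses import dataclass


@dataclass
class EdgeDevice:
    """Hypothetical edge device, e.g. a satellite or a drone."""
    name: str
    capabilities: set
    region: str

    def run(self, task):
        # Stand-in for dispatching the task to the remote device.
        return f"{self.name}:{task}"


def decompose(request):
    # Assumed decomposition: one task per requested capability.
    return list(request["capabilities"])


def select_devices(devices, request):
    # Select on request parameters: same region and at least one useful capability.
    wanted = set(request["capabilities"])
    return [d for d in devices if d.region == request["region"] and d.capabilities & wanted]


def handle_request(request, devices):
    tasks = decompose(request)
    selected = select_devices(devices, request)
    results = []
    for task in tasks:
        device = next((d for d in selected if task in d.capabilities), None)
        if device is None:
            raise RuntimeError(f"no selected edge device can run {task!r}")
        results.append(device.run(task))  # collect the task result
    return results
```

For a request in region `eu` needing imaging and thermal sensing, the sketch selects only the `eu` devices. It then sends each task to the first selected device that can run it.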

Software Control Techniques for Graphics Hardware that Supports Logical Slots
20230051906 · 2023-02-16

Disclosed embodiments relate to software control of graphics hardware that supports logical slots. In some embodiments, a GPU includes circuitry that implements a plurality of logical slots, together with a set of graphics processor sub-units that each implement multiple distributed hardware slots. Control circuitry may determine mappings between logical slots and distributed hardware slots for different sets of graphics work. Various aspects of the mapping may be software-controlled. For example, software may specify, among other things:

- priority information for a set of graphics work;
- whether to retain the mapping after the work completes;
- a distribution rule;
- a target group of sub-units;
- a sub-unit mask;
- a scheduling policy;
- whether to reclaim hardware slots from another logical slot.

Software may also query the status of the work.
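Two of the software controls listed above, the sub-unit mask and the retain-mapping option, can be illustrated with a small Python model. The model assumes each sub-unit has a fixed number of hardware slots. It also assumes a logical slot takes one free hardware slot on every sub-unit enabled in the mask. `SlotMapper`, `map`, and `complete` are hypothetical names, and priority, distribution rules, and reclaiming are left out.

```python
class SlotMapper:
    """Hypothetical model: every sub-unit has `slots_per_unit` hardware slots."""

    def __init__(self, num_units, slots_per_unit):
        # free[u] lists the unused hardware slot indices on sub-unit u.
        self.free = [list(range(slots_per_unit)) for _ in range(num_units)]
        self.mappings = {}  # logical slot -> [(sub-unit, hardware slot), ...]

    def map(self, logical_slot, unit_mask):
        """Map a logical slot onto one hardware slot per sub-unit set in the mask."""
        units = [u for u in range(len(self.free)) if (unit_mask >> u) & 1]
        if any(not self.free[u] for u in units):
            raise RuntimeError("no free hardware slot on a requested sub-unit")
        self.mappings[logical_slot] = [(u, self.free[u].pop(0)) for u in units]
        return self.mappings[logical_slot]

    def complete(self, logical_slot, retain=False):
        """Free the hardware slots unless software asked to retain the mapping."""
        if retain:
            return
        for unit, hw_slot in self.mappings.pop(logical_slot):
            self.free[unit].append(hw_slot)
```

With four sub-units of two slots each, mask `0b0101` maps a logical slot onto sub-units 0 and 2. Passing `retain=True` to `complete` keeps those hardware slots bound to the logical slot for reuse.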

COORDINATING EXECUTION OF COMPUTING OPERATIONS FOR SOFTWARE APPLICATIONS
20230049332 · 2023-02-16

A client-side system can include a service proxy that receives a request to perform a computing operation from a web application running in a web browser on the client-side system. The service proxy can determine whether the computing operation is executable by a local execution module, which runs outside the web browser on the client-side system. The local execution module may be separate from the web application and may execute one or more computing operations using computing resources local to the client-side system. If the local execution module can execute the computing operation, the service proxy can send it a communication that causes it to execute the operation.
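The proxy's routing decision can be sketched in Python as below. It assumes the local execution module exposes its operations as a dictionary of callables. It also assumes a fallback path for operations the module cannot run; the abstract does not say what happens in that case, so the fallback is purely illustrative. `LocalExecutionModule` and `ServiceProxy` are hypothetical names.

```python
class LocalExecutionModule:
    """Hypothetical module outside the browser that runs operations locally."""

    def __init__(self, operations):
        self.operations = operations  # operation name -> callable

    def can_execute(self, name):
        return name in self.operations

    def execute(self, name, *args):
        return self.operations[name](*args)


class ServiceProxy:
    """Routes a web application's request to the local module when possible."""

    def __init__(self, local_module, fallback):
        self.local = local_module
        self.fallback = fallback  # assumed path for operations the module cannot run

    def handle(self, name, *args):
        if self.local.can_execute(name):
            # Executable locally: forward to the module outside the browser.
            return "local", self.local.execute(name, *args)
        return "fallback", self.fallback(name, *args)
```

Returning a `(route, result)` pair makes the routing decision visible to callers and tests. A real proxy would more likely return only the result.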

Techniques for reconfiguring partitions in a parallel processing system

A parallel processing unit (PPU) can be divided into partitions, each of which is configured to operate in the same way as the entire PPU. A given partition includes a subset of the computational and memory resources of the entire PPU. Software executing on a CPU partitions the PPU on behalf of an admin user. A guest user is assigned to a partition and can perform processing tasks within it in isolation from guest users assigned to other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can use PPU resources efficiently.
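The admin step (partitioning) and the guest step (isolated use) can be modeled in Python as below. The sketch assumes equal-sized partitions, with compute units numbered from 0 and memory counted in whole gigabytes. Real partitions need not be equal. `Partition`, `partition_ppu`, `assign_guest`, and `submit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    pid: int
    compute_units: range
    memory_gb: int
    guest: Optional[str] = None


def partition_ppu(total_units, total_memory_gb, num_partitions):
    """Admin step: split the PPU into equal, disjoint partitions (an assumption)."""
    if total_units % num_partitions or total_memory_gb % num_partitions:
        raise ValueError("resources must divide evenly in this sketch")
    units = total_units // num_partitions
    mem = total_memory_gb // num_partitions
    return [Partition(i, range(i * units, (i + 1) * units), mem)
            for i in range(num_partitions)]


def assign_guest(partitions, guest):
    """Give the guest the first unassigned partition."""
    for p in partitions:
        if p.guest is None:
            p.guest = guest
            return p
    raise RuntimeError("no free partition")


def submit(partition, guest, unit):
    """Reject work that targets resources outside the guest's own partition."""
    if partition.guest != guest or unit not in partition.compute_units:
        raise PermissionError("access outside assigned partition")
    return f"{guest} runs on unit {unit}"
```

Isolation here is only a bounds check in `submit`. In hardware it would be enforced by the partitioning itself.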

Software defined automation system and architecture

Embodiments are disclosed of a software-defined automation (SDA) system that provides a reference architecture for designing, managing, and maintaining a highly available, scalable, and flexible automation system. In some embodiments, an SDA system can include a localized subsystem with a system controller node and multiple compute nodes. The compute nodes can be communicatively coupled to the system controller node via a first communication network. Over that network, the system controller node manages the compute nodes and the virtualization of a control system on a compute node. The virtualized control system includes virtualized control system elements connected to a virtual network, which is in turn connected to a second communication network. This lets the virtualized control system elements control a physical control system element via the second network.

Scheduler for AMP architecture with closed-loop performance and thermal controller

Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing (AMP) system. Each core type can run at a plurality of selectable dynamic voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for the threads of a thread group and fed into a plurality of tunable controllers for that thread group. A closed-loop performance control (CLPC) system determines a control effort for the thread group and maps it to a recommended core type and DVFS state. A closed-loop thermal and power management system can limit the control effort that the CLPC determines for a thread group, and can limit the power, core types, and DVFS states available to the system. Deferred interrupts can be used to increase performance.
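The CLPC flow (metrics in, control effort out, effort mapped to core type and DVFS state, thermal limit applied) can be sketched in Python as below. Every specific here is an assumption, not a detail of the disclosure:

- The controller is a proportional-integral (PI) loop.
- The input metric is a utilization figure compared against a target.
- Effort is clamped to [0, 1].
- The lower half of the effort range selects efficiency cores, the upper half selects performance cores.
- The gains and frequency tables are invented.

`ThreadGroupController`, `recommend`, `EFFICIENCY_FREQS`, and `PERFORMANCE_FREQS` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical DVFS tables (MHz), one per core type.
EFFICIENCY_FREQS = [600, 1000, 1400]
PERFORMANCE_FREQS = [1200, 2000, 2800]


@dataclass
class ThreadGroupController:
    """Assumed PI controller for one thread group (no anti-windup, for brevity)."""
    target: float = 0.5   # desired utilization, an assumption
    kp: float = 0.8
    ki: float = 0.2
    integral: float = 0.0

    def update(self, measured):
        """Turn an accumulated execution metric into a control effort in [0, 1]."""
        error = measured - self.target
        self.integral += error
        effort = self.kp * error + self.ki * self.integral
        return min(max(effort, 0.0), 1.0)


def recommend(effort, thermal_cap=1.0):
    """Map control effort to (core type, MHz) after the thermal cap limits it."""
    effort = min(effort, thermal_cap)
    if effort < 0.5:
        table, core, scaled = EFFICIENCY_FREQS, "E", effort / 0.5
    else:
        table, core, scaled = PERFORMANCE_FREQS, "P", (effort - 0.5) / 0.5
    index = min(int(scaled * len(table)), len(table) - 1)
    return core, table[index]
```

The thermal cap is applied before the core-type decision. A full effort limited to 0.4 therefore lands on an efficiency core, which mirrors how the thermal system can limit both effort and core type.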

Systems and methods for configuring a watermark unit with watermark algorithms for a data processing accelerator

Embodiments of the disclosure relate to configuring a watermark unit of a data processing (DP) accelerator with watermark algorithms for artificial intelligence (AI) models. In one embodiment, an application sends the DP accelerator a request to apply a watermark algorithm to an AI model. In response, a system determines that the watermark algorithm is not available at the accelerator's watermark unit. The system sends a request for the watermark algorithm, and the DP accelerator receives it. The system then configures the watermark unit with the algorithm at runtime so that the DP accelerator can use it.
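The check-fetch-configure sequence can be sketched in Python as below. The sketch assumes watermark algorithms are callables keyed by name and that fetching is a function supplied by the host. `WatermarkUnit`, `DPAccelerator`, `apply_watermark`, and `fetch_algorithm` are hypothetical names, and no real accelerator API is implied.

```python
class WatermarkUnit:
    """Hypothetical watermark unit whose algorithms can be loaded at runtime."""

    def __init__(self):
        self.algorithms = {}  # algorithm name -> callable(model) -> model

    def configure(self, name, algorithm):
        self.algorithms[name] = algorithm


class DPAccelerator:
    def __init__(self, fetch_algorithm):
        self.unit = WatermarkUnit()
        self.fetch_algorithm = fetch_algorithm  # assumed host-side lookup

    def apply_watermark(self, name, model):
        if name not in self.unit.algorithms:
            # Not available at the unit: request it and configure at runtime.
            self.unit.configure(name, self.fetch_algorithm(name))
        return self.unit.algorithms[name](model)
```

Only the first request for a given algorithm triggers a fetch. Later requests reuse the algorithm already configured in the unit.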

DATA TRANSMISSION METHOD AND APPARATUS
20230038051 · 2023-02-09

A data transmission method and apparatus are provided. The method is applied to a computer system that includes at least two coprocessors, for example a first coprocessor and a second coprocessor. A shared memory is deployed between the two coprocessors and is configured to store the data generated as subtasks are executed separately. The shared memory also stores the storage address of the data generated by each subtask, and a mapping between each subtask and the coprocessor that executes it. A coprocessor can therefore use this mapping to find the storage address of the data it needs. It can then read that data directly from the shared memory without copying it over a system bus. This improves the efficiency of data transmission between the coprocessors.
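The bookkeeping described above can be modeled in Python as below. The model keeps subtask results in one shared buffer, alongside a subtask-to-address table and a subtask-to-coprocessor table. It assumes a simple bump allocator. A `memoryview` stands in for reading in place without a copy. `SharedMemory`, `write_result`, and `read_result` are hypothetical names.

```python
class SharedMemory:
    """Hypothetical shared memory between coprocessors.

    Besides the data itself, it records where each subtask's output lives
    and which coprocessor executed each subtask.
    """

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.next_free = 0        # simple bump allocator (an assumption)
        self.address_of = {}      # subtask -> (offset, length)
        self.executor_of = {}     # subtask -> coprocessor id

    def write_result(self, subtask, coprocessor, data):
        offset = self.next_free
        if offset + len(data) > len(self.buffer):
            raise MemoryError("shared memory full")
        self.buffer[offset:offset + len(data)] = data
        self.next_free += len(data)
        self.address_of[subtask] = (offset, len(data))
        self.executor_of[subtask] = coprocessor

    def read_result(self, subtask):
        # Look up the storage address and expose the bytes in place (no copy).
        offset, length = self.address_of[subtask]
        return memoryview(self.buffer)[offset:offset + length]
```

A consumer coprocessor calls `read_result` with the producer's subtask name. It gets a zero-copy view of the bytes that the producer wrote.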