Patent classifications
G06F9/5066
Logical Slot to Hardware Slot Mapping for Graphics Processors
Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
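The mapping step described in this abstract can be illustrated with a minimal sketch. The abstract does not say what the distribution rules are, so the size threshold, the two rules (one sub-unit versus all sub-units), and every name below (`SubUnit`, `choose_rule`, `map_logical_slot`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubUnit:
    """One graphics processor sub-unit with a fixed number of hardware slots."""
    num_slots: int
    # owners[i] is the logical slot occupying hardware slot i, or None if free.
    owners: list = field(init=False)

    def __post_init__(self):
        self.owners = [None] * self.num_slots

    def claim(self, logical_slot):
        """Claim the first free hardware slot; return its index, or None if full."""
        for i, owner in enumerate(self.owners):
            if owner is None:
                self.owners[i] = logical_slot
                return i
        return None


def choose_rule(work_size, num_sub_units):
    """Hypothetical rule: small kicks use one sub-unit, large kicks use all."""
    return 1 if work_size < 64 else num_sub_units


def map_logical_slot(logical_slot, work_size, sub_units):
    """Map one logical slot to a list of (sub_unit_index, hw_slot_index) pairs."""
    wanted = choose_rule(work_size, len(sub_units))
    # Visit the sub-units with the most free hardware slots first.
    order = sorted(range(len(sub_units)),
                   key=lambda u: -sub_units[u].owners.count(None))
    mapping = []
    for u in order:
        if len(mapping) == wanted:
            break
        hw = sub_units[u].claim(logical_slot)
        if hw is not None:
            mapping.append((u, hw))
    return mapping
```

With four sub-units of two slots each, a small kick claims a single hardware slot while a large kick claims one slot on every sub-unit.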
Affinity-based Graphics Scheduling
Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
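As a rough sketch of the idea, software supplies a mapping from portions of the work to groups of sub-units, and the distributor keeps each portion inside its group so that portions touching the same memory share a cache. The round-robin choice within a group and all names here (`assign_by_affinity`, `affinity_map`, `groups`) are assumptions, not details from the abstract.

```python
from collections import defaultdict


def assign_by_affinity(portions, affinity_map, groups):
    """Assign each portion of work to a sub-unit in its software-indicated group.

    portions:     ordered list of portion identifiers
    affinity_map: portion -> group id (the software-indicated mapping)
    groups:       group id -> list of sub-units that share one cache
    """
    assignment = defaultdict(list)
    next_index = defaultdict(int)  # per-group round-robin cursor
    for portion in portions:
        group = affinity_map[portion]
        members = groups[group]
        unit = members[next_index[group] % len(members)]
        next_index[group] += 1
        assignment[unit].append(portion)
    return dict(assignment)
```

Portions mapped to the same group never leave it, which is the property the abstract relies on for cache efficiency.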
SYSTEMS, METHODS, AND APPARATUS FOR ASSOCIATING COMPUTATIONAL DEVICE FUNCTIONS WITH COMPUTE ENGINES
A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
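The create-then-execute flow might look like the following sketch. The class and method names are hypothetical stand-ins for the create association and execute commands, and the engines are simulated sequentially, so the overlapping execution mentioned in the abstract is not modeled.

```python
import itertools


class ComputationalDevice:
    """Toy model of a device with named functions and compute engines."""

    def __init__(self, functions, engines):
        self.functions = dict(functions)   # function name -> callable
        self.engines = set(engines)        # available compute engine names
        self._associations = {}            # association id -> (function, engines)
        self._ids = itertools.count(1)

    def create_association(self, function_name, engine_names):
        """Bind a function to one or more engines; return the association id."""
        if function_name not in self.functions:
            raise KeyError(f"unknown function: {function_name}")
        unknown = set(engine_names) - self.engines
        if unknown:
            raise ValueError(f"unknown engines: {sorted(unknown)}")
        assoc_id = next(self._ids)
        self._associations[assoc_id] = (function_name, tuple(engine_names))
        return assoc_id

    def execute(self, assoc_id, data):
        """Run the associated function once per associated engine."""
        function_name, engine_names = self._associations[assoc_id]
        fn = self.functions[function_name]
        return {engine: fn(data) for engine in engine_names}
```

The caller passes only the identifier to `execute`, so the function and engine choice are fixed when the association is created.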
METHOD FOR DATA PROCESSING, AND COMMUNICATION DEVICE
A data processing method and a communication device are provided. The method includes the following operations. First configuration information is acquired. The first configuration information is used for configuring N split modes and a jth part corresponding to an ith split mode among the N split modes. N is an integer greater than or equal to 1, i is greater than or equal to 1 and less than or equal to N, j is greater than or equal to 1 and less than or equal to M, and M is an integer greater than 1. The N split modes include a split mode for splitting a data processing model into at least two sub-processing models by presetting a split position.
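One way to read the indexing is sketched below, assuming each split mode is a single preset split position (so M = 2 parts) and the model is an ordered list of layers. The representation and the names `split_model` and `select_part` are illustrative assumptions only.

```python
def split_model(layers, split_position):
    """Split an ordered list of layers into two sub-models at a preset position."""
    if not 0 < split_position < len(layers):
        raise ValueError("split position must leave both parts non-empty")
    return layers[:split_position], layers[split_position:]


def select_part(layers, split_modes, i, j):
    """Return the j-th part (1-based) of the model under the i-th split mode.

    split_modes maps the split mode index i to its preset split position.
    """
    parts = split_model(layers, split_modes[i])
    return parts[j - 1]
```

For a four-layer model with `split_modes = {1: 1, 2: 3}`, choosing i = 2 and j = 1 selects the first three layers.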
Software Control Techniques for Graphics Hardware that Supports Logical Slots
Disclosed embodiments relate to software control of graphics hardware that supports logical slots. In some embodiments, a GPU includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. Control circuitry may determine mappings between logical slots and distributed hardware slots for different sets of graphics work. Various mapping aspects may be software-controlled. For example, software may specify one or more of the following: priority information for a set of graphics work, whether to retain the mapping after completion of the work, a distribution rule, a target group of sub-units, a sub-unit mask, a scheduling policy, whether to reclaim hardware slots from another logical slot, etc. Software may also query the status of the work.
Techniques for reconfiguring partitions in a parallel processing system
A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
Scheduling artificial intelligence model partitions based on reversed computation graph
Techniques are disclosed for scheduling artificial intelligence model partitions for execution in an information processing system. For example, a method comprises the following steps. An intermediate representation of an artificial intelligence model is obtained. A reversed computation graph corresponding to a computation graph generated based on the intermediate representation is obtained. Nodes in the reversed computation graph represent functions related to the artificial intelligence model, and one or more directed edges in the reversed computation graph represent one or more dependencies between the functions. The reversed computation graph is partitioned into sequential partitions, such that the partitions are executed sequentially and functions corresponding to nodes in each partition are executed in parallel.
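The reversal and partitioning steps can be sketched as follows. The abstract does not state how partitions are formed, so the level-by-level peeling used here is an assumption, as are the function names and the adjacency-list representation.

```python
def reverse_graph(graph):
    """Flip every edge of a graph given as node -> list of successor nodes."""
    reversed_graph = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            reversed_graph.setdefault(succ, []).append(node)
    return reversed_graph


def partition(reversed_graph):
    """Group nodes into sequential partitions.

    In the reversed graph an edge v -> u means "v depends on u". A node joins
    a partition once all of its dependencies sit in earlier partitions, so the
    nodes inside one partition are independent and could run in parallel.
    """
    remaining = {node: set(deps) for node, deps in reversed_graph.items()}
    done = set()
    partitions = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("graph contains a cycle")
        partitions.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return partitions
```

For a diamond-shaped graph where `a` feeds `b` and `c`, which both feed `d`, this yields three partitions, with `b` and `c` sharing the middle one.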
Method and system for performing parallel computations to generate multiple output feature maps
Systems and methods for performing parallel computation are disclosed. The system can include: a task manager; and a plurality of cores coupled with the task manager and configured to respectively perform a set of parallel computation tasks based on instructions from the task manager, wherein each of the plurality of cores further comprises: a processing unit configured to generate a first output feature map corresponding to a first computation task among the set of parallel computation tasks; an interface configured to receive one or more instructions from the task manager to collect external output feature maps corresponding to the set of parallel computation tasks from other cores of the plurality of cores; a reduction unit configured to generate a reduced feature map based on the first output feature map and received external output feature maps.
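A minimal simulation of the per-core flow follows: each core generates its own output feature map, collects the maps from the other cores, and reduces them. The per-core computation and the element-wise sum used as the reduction are placeholders chosen for illustration; the abstract does not specify either.

```python
def compute_feature_map(core_id, inputs):
    """Placeholder computation: scale the inputs by (core_id + 1)."""
    return [x * (core_id + 1) for x in inputs]


def reduce_maps(own_map, external_maps):
    """Element-wise sum of a core's own map with the maps from other cores."""
    reduced = list(own_map)
    for fmap in external_maps:
        reduced = [a + b for a, b in zip(reduced, fmap)]
    return reduced


def run_cores(num_cores, inputs):
    """Simulate every core computing, collecting, and reducing feature maps."""
    local = [compute_feature_map(c, inputs) for c in range(num_cores)]
    results = []
    for c in range(num_cores):
        # The interface step: gather the maps produced by all other cores.
        external = [local[o] for o in range(num_cores) if o != c]
        results.append(reduce_maps(local[c], external))
    return results
```

Because the sum is symmetric, every core ends with the same reduced map in this sketch; a real task manager could instead tell each core to collect only a subset.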
Automation system and method
A computer-implemented method, computer program product and computing system for receiving a complex task; processing the complex task to define a plurality of discrete tasks each having a discrete goal; executing the plurality of discrete tasks on a plurality of machine-accessible public computing platforms; determining if any of the plurality of discrete tasks failed to achieve its discrete goal; and if a specific discrete task failed to achieve its discrete goal, defining a substitute discrete task having a substitute discrete goal.
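The failure-handling loop could be sketched like this. The abstract only says that a substitute task is defined; running the substitute immediately, and the names `run_with_substitutes` and `make_substitute`, are assumptions added for illustration.

```python
def run_with_substitutes(discrete_tasks, make_substitute):
    """Run discrete tasks and define a substitute for each one that fails.

    discrete_tasks:  list of (name, callable) where the callable returns True
                     if the task achieved its discrete goal
    make_substitute: name -> (substitute_name, substitute_callable)
    """
    outcomes = {}
    for name, task in discrete_tasks:
        if task():
            outcomes[name] = "achieved"
            continue
        outcomes[name] = "failed"
        # Assumption: the substitute is executed right away.
        sub_name, sub_task = make_substitute(name)
        outcomes[sub_name] = "achieved" if sub_task() else "failed"
    return outcomes
```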
Data Re-Encryption For Software Applications
Some embodiments provide a non-transitory machine-readable medium that stores a program. The program receives a request to execute a task for re-encrypting a set of data associated with an application that has been encrypted with a first encryption key. The task is for re-encrypting the set of data using a second encryption key. The program further determines an amount of work to complete the task. The program also divides the task into a set of subtasks based on the amount of work. The program further assigns each subtask in the set of subtasks to a node in a plurality of nodes for execution of the subtask. The plurality of nodes are configured to implement the application.
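The divide-and-assign steps can be sketched as below, assuming the amount of work is measured as a record count and subtasks are handed to nodes round-robin. Both choices, and the function names, are illustrative; the encryption itself is deliberately left out.

```python
def divide_task(num_records, max_per_subtask):
    """Split a record range into half-open (start, end) subtask ranges."""
    if max_per_subtask <= 0:
        raise ValueError("max_per_subtask must be positive")
    return [(start, min(start + max_per_subtask, num_records))
            for start in range(0, num_records, max_per_subtask)]


def assign_subtasks(subtasks, nodes):
    """Assign each subtask to a node in round-robin order."""
    return {subtask: nodes[i % len(nodes)]
            for i, subtask in enumerate(subtasks)}
```

Ten records with at most four per subtask produce three ranges, which two nodes then share in alternation.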