G06F9/50

Logical Slot to Hardware Slot Mapping for Graphics Processors

Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
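
A minimal Python sketch of the mapping idea described above; the slot counts, the small-kick threshold, and all names here are assumptions rather than the patented implementation:

    # Minimal sketch (hypothetical names) of mapping logical slots to
    # distributed hardware slots under per-kick distribution rules.
    from dataclasses import dataclass, field

    @dataclass
    class SubUnit:
        uid: int
        hw_slots: list = field(default_factory=lambda: [None, None])  # two hw slots per sub-unit

    def map_logical_slot(logical_id, kick_size, sub_units, small_kick_threshold=4):
        """Pick a distribution rule per kick: small kicks pack onto one
        sub-unit; large kicks spread across all sub-units."""
        targets = ([sub_units[logical_id % len(sub_units)]]
                   if kick_size <= small_kick_threshold else sub_units)
        mapping = []
        for su in targets:
            for i, owner in enumerate(su.hw_slots):
                if owner is None:
                    su.hw_slots[i] = logical_id
                    mapping.append((su.uid, i))
                    break
        return mapping

    units = [SubUnit(u) for u in range(4)]
    print(map_logical_slot(0, kick_size=2, sub_units=units))   # packed: one sub-unit
    print(map_logical_slot(1, kick_size=16, sub_units=units))  # spread: all sub-units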

ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, transposed load/store with stride parameter functionality, load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two-point lookups, and per-memory-bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce the programming complexity of the VPU and the DMA system. The DMA system and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller when performing dynamic region-based data movement operations.
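
As an illustration of the two-by-two-point lookup mentioned above, the hedged sketch below emulates in NumPy what the hardware would serve from parallel memory banks; the function name and table contents are invented:

    # Hedged sketch of a 2x2-point table lookup: each lookup fetches four
    # neighboring entries so a bilinear interpolation needs one access per lane.
    import numpy as np

    def lookup_2x2(table, x, y):
        """Fetch the 2x2 neighborhood around (x, y) and bilinearly blend.
        In hardware the four fetches would be served in parallel from
        separate memory banks; here NumPy slicing stands in for that."""
        x0, y0 = int(x), int(y)
        fx, fy = x - x0, y - y0
        p = table[y0:y0 + 2, x0:x0 + 2]          # the 2x2 point fetch
        top = p[0, 0] * (1 - fx) + p[0, 1] * fx
        bot = p[1, 0] * (1 - fx) + p[1, 1] * fx
        return top * (1 - fy) + bot * fy

    lut = np.arange(16, dtype=np.float32).reshape(4, 4)
    print(lookup_2x2(lut, 1.5, 2.25))  # interpolated value between four entries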

NOISY-NEIGHBOR DETECTION AND REMEDIATION

Noisy-neighbor detection and remediation is provided by performing real-time monitoring of workload processing and associated resource consumption of application components that use one or more shared resources of a computing environment; determining workload and shared-resource consumption patterns for each of the application components; for each application, of a plurality of applications, that includes at least one of the application components, correlating the determined patterns of that application's component(s) and determining a correlated shared-resource usage pattern for the application; performing impact analysis to determine the impact of the applications on each other; and identifying one or more noisy neighbors that use the one or more shared resources and automatically raising an alert indicating those noisy neighbors.
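
The toy sketch below walks the same pipeline in Python: monitor samples, aggregate per application, and alert on dominant consumers. The threshold, field names, and the reduction of impact analysis to a share-of-total check are all simplifying assumptions:

    # Illustrative sketch (invented thresholds/field names) of the
    # noisy-neighbor flow: monitor per-component usage, aggregate per
    # application, correlate, and alert on dominant consumers.
    from collections import defaultdict

    samples = [  # (app, component, shared_resource_units) from real-time monitoring
        ("billing", "api", 5), ("billing", "db", 7),
        ("reports", "etl", 48), ("reports", "etl", 52),
        ("search", "index", 9),
    ]

    def detect_noisy_neighbors(samples, share_threshold=0.5):
        per_app = defaultdict(int)
        for app, _component, units in samples:
            per_app[app] += units          # correlated usage pattern per app
        total = sum(per_app.values())
        noisy = [a for a, u in per_app.items() if u / total > share_threshold]
        for app in noisy:                  # impact analysis reduced to a share check
            print(f"ALERT: noisy neighbor '{app}' uses {per_app[app]/total:.0%} of the shared resource")
        return noisy

    detect_noisy_neighbors(samples)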

Affinity-based Graphics Scheduling

Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
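
A minimal sketch of the affinity-based distribution, with an invented group layout and round-robin assignment standing in for the distribution circuitry:

    # Minimal sketch, with invented structure, of affinity-based
    # distribution: a software-indicated mapping sends work portions that
    # touch the same memory to the sub-unit group sharing one cache.
    groups = {0: ["su0", "su1"], 1: ["su2", "su3"]}   # each group shares a cache

    # software-specified affinity: portion index -> group id
    affinity = {0: 0, 1: 0, 2: 1, 3: 1}

    def distribute(portions, affinity, groups):
        assignments = {}
        for idx, work in enumerate(portions):
            group = groups[affinity[idx]]
            assignments[work] = group[idx % len(group)]  # round-robin inside the group
        return assignments

    print(distribute(["tileA", "tileB", "tileC", "tileD"], affinity, groups))
    # tileA/tileB land on the group whose cache already holds their data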

SYSTEMS, METHODS, AND APPARATUS FOR ASSOCIATING COMPUTATIONAL DEVICE FUNCTIONS WITH COMPUTE ENGINES
20230052076 · 2023-02-16

A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
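
A hypothetical Python rendering of the create-association / execute flow; the function names and the use of a thread pool to model overlapping per-engine executions are assumptions:

    # Hypothetical sketch of the create-association / execute flow the
    # abstract describes; names like create_association are assumptions.
    import itertools, concurrent.futures

    _ids = itertools.count(1)
    _associations = {}

    def create_association(function, engines):
        """Bind a computational device function to one or more compute engines."""
        assoc_id = next(_ids)
        _associations[assoc_id] = (function, engines)
        return assoc_id

    def execute(assoc_id, data):
        """Run the associated function on every associated engine; the per-engine
        executions may overlap, modeled here with a thread pool."""
        function, engines = _associations[assoc_id]
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(function, engine, data) for engine in engines]
            return [f.result() for f in futures]

    aid = create_association(lambda eng, d: f"{eng}:{d * 2}", ["engine0", "engine1"])
    print(execute(aid, 21))  # ['engine0:42', 'engine1:42'] - overlapping executions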

CONFIGURABLE LOGIC PLATFORM WITH RECONFIGURABLE PROCESSING CIRCUITRY
20230046107 · 2023-02-16

An architecture for load-balanced groups of multi-stage manycore processors shared dynamically among a set of software applications, with capabilities for destination-task-defined intra-application prioritization of inter-task communications (ITC), for architecture-based ITC performance isolation between the applications, and for prioritizing application task instances for execution on cores of the manycore processors based at least in part on which of the task instances have available the input data, such as ITC data, that they need for executing.
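
A rough sketch of the input-availability-based prioritization, using assumed data structures; priority 0 models the destination-task-defined ITC priority:

    # Rough sketch, under assumed data structures, of scheduling task
    # instances onto cores by whether their ITC input data is available.
    import heapq

    tasks = [
        # (name, destination-assigned ITC priority, input_ready)
        ("decode", 1, True),
        ("render", 0, True),   # priority 0 = highest, set by the destination task
        ("stats",  2, False),  # blocked: ITC input not yet available
    ]

    def schedule(tasks, free_cores):
        ready = [(prio, name) for name, prio, ok in tasks if ok]
        heapq.heapify(ready)
        placement = {}
        for core in range(free_cores):
            if not ready:
                break
            _, name = heapq.heappop(ready)
            placement[name] = core
        return placement

    print(schedule(tasks, free_cores=2))  # {'render': 0, 'decode': 1}; 'stats' waits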

RESOURCE PROVISIONING SYSTEMS AND METHODS
20230046201 · 2023-02-16

A method for a first set of processors and a second set of processors comprises processing a set of queries using the first set of processors and, as a result of a change in utilization of the first set of processors, processing the set of queries using the second set of processors. The change in processors is independent of any change in storage resources, the storage resources being shared by the first set of processors and the second set of processors.
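
An illustrative sketch, with invented names and a utilization high-water mark, of moving query processing between processor sets while the shared storage stays untouched:

    # Illustrative sketch (invented names) of swapping query processing
    # between processor sets while the storage layer stays shared and fixed.
    shared_storage = {"q1": "rows...", "q2": "rows..."}   # shared by both sets

    class ProcessorSet:
        def __init__(self, name, capacity):
            self.name, self.capacity, self.load = name, capacity, 0
        def run(self, query):
            self.load += 1
            return f"{self.name} -> {shared_storage[query]}"

    def process(queries, first, second, high_water=0.8):
        results = []
        active = first
        for q in queries:
            if active.load / active.capacity >= high_water:
                active = second          # processor change, storage unchanged
            results.append(active.run(q))
        return results

    print(process(["q1", "q2", "q1"], ProcessorSet("set1", 2), ProcessorSet("set2", 4)))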

METHOD FOR DATA PROCESSING, AND COMMUNICATION DEVICE

A method for data processing and a communication device are provided. The method includes the following operations. First configuration information is acquired. The first configuration information is used for configuring N split modes and a jth part corresponding to an ith split mode among the N split modes. N is an integer greater than or equal to 1, i is greater than or equal to 1 and less than or equal to N, j is greater than or equal to 1 and less than or equal to M, and M is an integer greater than 1. The N split modes include a split mode for splitting a data processing model into at least two sub-processing models by presetting a split position.
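
A hedged sketch of such configuration information; the layer names, the mapping of split modes to positions, and the two-way split are assumptions:

    # Hedged sketch of the configuration the abstract outlines: N split
    # modes, mode i naming a preset split position that divides a model
    # into sub-processing models (all names here are assumptions).
    model_layers = ["conv1", "conv2", "fc1", "fc2"]

    first_configuration = {
        # split mode i -> preset split position (layer index)
        1: 1,   # mode 1: split after conv1 -> 2 sub-models
        2: 3,   # mode 2: split after fc1   -> 2 sub-models
    }

    def apply_split_mode(layers, config, mode_i):
        pos = config[mode_i]
        return [layers[:pos], layers[pos:]]         # at least two sub-processing models

    for i in first_configuration:                   # the j-th part of mode i
        for j, part in enumerate(apply_split_mode(model_layers, first_configuration, i), 1):
            print(f"mode {i}, part {j}: {part}")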

AI VIDEO PROCESSING METHOD AND APPARATUS
20230049578 · 2023-02-16

The method comprises: connecting to a plurality of AI computing boards in an AI processing resource pool and a plurality of video encoding and decoding boards in a video processing resource pool by means of a unified high-speed interface; allocating a specified number of AI computing boards and video encoding and decoding boards, based on the resources and bandwidths required to complete a processing task, to form a temporary cooperation relationship for the processing task; in response to resource overflow or insufficiency in the AI processing resource pool or the video processing resource pool caused by a processing task change, adding more AI computing boards or video encoding and decoding boards, or stopping use of redundant ones; performing the processing task using the allocated AI computing boards and video encoding and decoding boards; and releasing the temporary cooperation relationship.
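
A loose sketch of the elastic pool logic with invented board names and counts; allocation sized to the task and release of the temporary cooperation relationship reduce here to list operations, with insufficiency surfacing as an error rather than automatic pool growth:

    # Loose sketch, with invented numbers, of the elastic pool logic:
    # allocate boards from both pools per task, then release the
    # temporary cooperation relationship when the task completes.
    ai_pool, codec_pool = ["ai0", "ai1", "ai2"], ["vc0", "vc1"]

    def allocate(n_ai, n_codec):
        if n_ai > len(ai_pool) or n_codec > len(codec_pool):
            raise RuntimeError("pool insufficient: attach more boards")
        group = ([ai_pool.pop() for _ in range(n_ai)],
                 [codec_pool.pop() for _ in range(n_codec)])
        return group  # temporary cooperation relationship for one task

    def release(group):
        ai, codec = group
        ai_pool.extend(ai); codec_pool.extend(codec)

    task = allocate(n_ai=2, n_codec=1)   # sized to the task's compute/bandwidth needs
    print("allocated:", task)
    release(task)                        # task done: boards return to the pools
    print("pools restored:", ai_pool, codec_pool)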

DATA LABELING SYSTEM AND METHOD, AND DATA LABELING MANAGER
20230048473 · 2023-02-16

Embodiments of this application disclose a data labeling system and method, and a data labeling manager. The system includes a data labeling manager, a labeling model storage repository, and a basic computing unit storage repository. The data labeling manager receives a data labeling request, obtains a target basic computing unit, allocates a hardware resource to the target basic computing unit, establishes a target computing unit, obtains first storage path information of basic parameter data of a first labeling model, and sends the first storage path information to the target computing unit. The target computing unit obtains the basic parameter data of the first labeling model by using the first storage path information, combines a target model inference framework and the basic parameter data of the first labeling model to obtain the first labeling model, and labels to-be-labeled data by using the first labeling model.
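
A hypothetical sketch of the labeling flow; the repository contents, storage path, framework name, and the trivial model are all invented:

    # Hypothetical sketch of the labeling flow: the manager resolves a
    # storage path for model parameters, a computing unit combines them
    # with an inference framework, and the result labels the data.
    model_repo = {"/models/ner/v1": {"weights": [0.3, 0.7]}}   # labeling model storage

    class ComputingUnit:
        def __init__(self, cpu_cores):
            self.cpu_cores = cpu_cores                         # allocated hardware resource
        def build_model(self, framework, path):
            params = model_repo[path]["weights"]               # fetch via storage path info
            # combining the inference framework with the fetched parameters
            return lambda text: "LONG" if len(text) * params[1] > 8 else "SHORT"

    def handle_request(data):
        unit = ComputingUnit(cpu_cores=2)        # target computing unit
        path = "/models/ner/v1"                  # first storage path information
        model = unit.build_model("onnx-like", path)
        return [model(item) for item in data]

    print(handle_request(["hi", "a longer sentence"]))  # ['SHORT', 'LONG']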