G06F9/5027

ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, transposed load/store with stride parameter functionality, load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two-point lookups, and per-memory-bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA system and the VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller when performing dynamic region-based data movement operations.
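The two-point lookup mode mentioned above can be illustrated in software. This is a minimal sketch, assuming a lookup that fetches two adjacent table entries around a fractional index and blends them; the function name and interpolation behavior are assumptions for illustration, not the hardware's actual operation.

```python
# Software model of a two-point table lookup with linear interpolation.
# A hardware VPU would fetch both entries in one access from banked memory;
# here we only illustrate the addressing pattern. All names are illustrative.

def two_point_lookup(table, x):
    """Fetch table[i] and table[i+1] around fractional index x, then blend."""
    i = int(x)
    frac = x - i
    lo, hi = table[i], table[i + 1]   # the "two point" fetch
    return lo * (1.0 - frac) + hi * frac

table = [0.0, 10.0, 20.0, 30.0]
print(two_point_lookup(table, 1.5))  # halfway between 10.0 and 20.0 -> 15.0
```

A two-by-two-point lookup would extend the same pattern to a second dimension, fetching a 2x2 neighborhood for bilinear blending.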

Affinity-based Graphics Scheduling

Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.
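The distribution step described above can be sketched in software. This is a toy model under assumed names (`distribute`, per-group queues); it only shows how a software-indicated mapping steers work items to groups of sub-units that share a cache.

```python
# Illustrative model of affinity-based work distribution (hypothetical names):
# work items carry a software-indicated group id, and each group of sub-units
# shares a cache, so items mapped to the same group reuse cached data.

def distribute(work_items, mapping, num_groups):
    """Assign each work item to the group its mapping indicates."""
    queues = {g: [] for g in range(num_groups)}
    for item in work_items:
        group = mapping[item]          # software-specified affinity
        queues[group].append(item)
    return queues

mapping = {"tile_a": 0, "tile_b": 0, "tile_c": 1}
queues = distribute(["tile_a", "tile_b", "tile_c"], mapping, 2)
print(queues)  # tiles a and b share group 0 (and thus its cache)
```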

SYSTEMS, METHODS, AND APPARATUS FOR ASSOCIATING COMPUTATIONAL DEVICE FUNCTIONS WITH COMPUTE ENGINES
20230052076 · 2023-02-16

A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
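The create-association / execute flow above can be modeled on the host side. This is a hypothetical sketch: the class, method names, and single-process "engines" are illustrative stand-ins, not a real computational-device API.

```python
# Hypothetical host-side model of the create-association / execute flow.
# Names (create_association, execute) are illustrative, not a real API.

class Device:
    def __init__(self):
        self._associations = {}
        self._next_id = 0

    def create_association(self, function, engines):
        """Bind a function to one or more compute engines; return an id."""
        assoc_id = self._next_id
        self._next_id += 1
        self._associations[assoc_id] = (function, engines)
        return assoc_id

    def execute(self, assoc_id, arg):
        """Run the associated function once per associated engine."""
        function, engines = self._associations[assoc_id]
        # In hardware these executions may overlap; here they run in sequence.
        return [function(arg) for _ in engines]

dev = Device()
aid = dev.create_association(lambda x: x * 2, engines=["engine0", "engine1"])
print(dev.execute(aid, 21))
```

The association identifier returned by `create_association` is what later `execute` commands carry, matching the abstract's description.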

CONFIGURABLE LOGIC PLATFORM WITH RECONFIGURABLE PROCESSING CIRCUITRY
20230046107 · 2023-02-16

An architecture is disclosed for load-balanced groups of multi-stage manycore processors shared dynamically among a set of software applications, with capabilities for destination-task-defined intra-application prioritization of inter-task communications (ITC), for architecture-based ITC performance isolation between the applications, as well as for prioritizing application task instances for execution on cores of the manycore processors based at least in part on which of the task instances have available the input data, such as ITC data, that they need for execution.
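The last capability, preferring task instances whose inputs are already available, can be sketched as a simple readiness-aware ordering. The structure and field names below are assumptions for illustration only.

```python
# Sketch of the scheduling idea: among an application's task instances,
# prefer those whose required inputs (e.g. ITC data) are already available.
# Structure and names are illustrative only.

def pick_for_cores(instances, available_inputs, num_cores):
    """Order instances so ready ones (inputs present) get cores first."""
    ready = [t for t in instances if t["needs"] <= available_inputs]
    waiting = [t for t in instances if not t["needs"] <= available_inputs]
    ordered = sorted(ready, key=lambda t: -t["priority"]) + waiting
    return [t["name"] for t in ordered[:num_cores]]

instances = [
    {"name": "t0", "needs": {"msg_a"}, "priority": 1},
    {"name": "t1", "needs": {"msg_b"}, "priority": 5},
    {"name": "t2", "needs": set(), "priority": 2},
]
# t1 has the highest priority but is missing its ITC input, so it waits.
print(pick_for_cores(instances, available_inputs={"msg_a"}, num_cores=2))
```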

METHOD FOR DATA PROCESSING, AND COMMUNICATION DEVICE

A method for data processing and a communication device are provided. The method includes the following operations. First configuration information is acquired. The first configuration information is used for configuring N split modes and a jth part corresponding to an ith split mode among the N split modes. N is an integer greater than or equal to 1, i is greater than or equal to 1 and less than or equal to N, j is greater than or equal to 1 and less than or equal to M, and M is an integer greater than 1. The N split modes include a split mode for splitting a data processing model into at least two sub-processing models by presetting a split position.
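The split-mode idea can be illustrated with a toy model. This sketch assumes a model is a list of layers and a split mode is just a preset position; the configuration structure and names are invented for illustration.

```python
# Illustrative reading of the configuration: each split mode presets a split
# position at which a model (a list of layer names here) is cut into two
# sub-processing models. All names and structures are assumptions.

def split_model(layers, split_position):
    """Split a model into two sub-processing models at a preset position."""
    return layers[:split_position], layers[split_position:]

# First configuration information: N split modes, each with a preset position.
config = {"split_modes": [{"position": 1}, {"position": 2}]}  # N = 2

model = ["encoder", "backbone", "head"]
for mode in config["split_modes"]:
    first_part, second_part = split_model(model, mode["position"])
    print(first_part, second_part)
```

In a communication setting, the two parts would typically run on different nodes (e.g. device and network side), with intermediate data exchanged at the split position.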

MOVEMENT OF TENSOR DATA DURING RESHAPE OPERATION

A method of performing a reshape operation specified in a reshape layer of a neural network model is described. The reshape operation reshapes an input tensor with an input tensor shape to an output tensor with an output tensor shape. The tensor data that has to be reshaped is directly routed between tile memories of the hardware accelerator in an efficient manner. This advantageously optimizes usage of memory space and allows any number and type of neural network models to be run on the hardware accelerator.
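The key property behind routing reshape data directly is that a reshape never changes element order in the flattened view, so each element can be copied straight to its destination. A minimal sketch of that invariant, with a list-of-lists standing in for tile memories:

```python
# Minimal model of a reshape as pure data movement: element k of the flattened
# input becomes element k of the flattened output, so tile-to-tile routing can
# copy data directly without an intermediate buffer. Tiles are hypothetical.

def reshape_route(flat_data, out_shape):
    """Route flattened tensor data into an output tensor of out_shape."""
    rows, cols = out_shape
    assert rows * cols == len(flat_data), "reshape must preserve element count"
    return [flat_data[r * cols:(r + 1) * cols] for r in range(rows)]

# A 2x3 input tensor, already flattened, re-laid-out as 3x2:
flat = [1, 2, 3, 4, 5, 6]
print(reshape_route(flat, (3, 2)))
```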

DATA LABELING SYSTEM AND METHOD, AND DATA LABELING MANAGER
20230048473 · 2023-02-16

Embodiments of this application disclose a data labeling system and method, and a data labeling manager. The system includes a data labeling manager, a labeling model storage repository, and a basic computing unit storage repository. The data labeling manager receives a data labeling request, obtains a target basic computing unit, allocates a hardware resource to the target basic computing unit, establishes a target computing unit, obtains first storage path information of basic parameter data of a first labeling model, and sends the first storage path information to the target computing unit. The target computing unit obtains the basic parameter data of the first labeling model by using the first storage path information, combines a target model inference framework and the basic parameter data of the first labeling model to obtain the first labeling model, and labels to-be-labeled data by using the first labeling model.
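The composition step, combining an inference framework with separately stored parameter data, can be sketched as follows. The storage path, the in-memory store, and the threshold "framework" are all hypothetical stand-ins.

```python
# Hypothetical sketch of the composition step: the target computing unit
# fetches parameter data via a storage path, combines it with an inference
# framework, and labels data. All names here are illustrative.

MODEL_STORE = {"/models/first/params": {"threshold": 3}}  # stands in for the repository

def build_labeling_model(framework, storage_path):
    """Combine an inference framework with fetched basic parameter data."""
    params = MODEL_STORE[storage_path]          # fetch via storage path info
    return lambda sample: framework(sample, params)

def threshold_framework(sample, params):
    """A trivial 'inference framework' that labels by thresholding."""
    return "positive" if sample > params["threshold"] else "negative"

model = build_labeling_model(threshold_framework, "/models/first/params")
print([model(x) for x in [1, 5]])
```

Keeping framework and parameters separate, as the abstract describes, lets one framework be reused across many labeling models by swapping the parameter data.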

SYSTEMS AND METHODS FOR AI META-CONSTELLATION

Systems and methods for device constellation are provided according to certain embodiments. For example, a method for device constellation includes the steps of: receiving a request, the request including a plurality of request parameters; decomposing the request into one or more tasks; selecting one or more edge devices based at least in part on the plurality of request parameters; assigning the one or more tasks to the one or more selected edge devices to cause the one or more selected edge devices to perform the one or more tasks; and receiving one or more task results from the one or more selected edge devices.
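The steps above can be sketched end to end in a toy orchestrator. Device capabilities, task names, and the request structure are invented for illustration.

```python
# Sketch of the flow above (decompose request, select edge devices, assign
# tasks, gather results); device capabilities and task names are invented.

def handle_request(request, devices):
    tasks = request["tasks"]                         # decompose the request
    selected = [d for d in devices                   # select by parameters
                if d["capability"] in request["required_capabilities"]]
    results = []
    for task, device in zip(tasks, selected):        # assign tasks to devices
        results.append((task, device["name"]))       # stand-in for execution
    return results                                   # gathered task results

devices = [{"name": "sat1", "capability": "imaging"},
           {"name": "drone1", "capability": "radar"}]
request = {"tasks": ["survey_area"], "required_capabilities": {"imaging"}}
print(handle_request(request, devices))
```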

APPLICATION USER JOURNEY MANAGEMENT
20230049219 · 2023-02-16

An application activation method includes enabling an activation of one or more applications, including an activation of a first application, on a computing device. A first plurality of interactions of a user with the one or more applications on the computing device are detected. A first offer to renew the activation of the first application is generated based on the first plurality of interactions of the user. The first offer is provided to the user via the computing device. An acceptance of the first offer is received from the user, and the activation of the first application is renewed responsive to receiving the acceptance of the first offer.
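The interaction-driven renewal flow can be modeled minimally. The class, the usage threshold, and the discount field are assumptions made purely to illustrate the detect / offer / accept / renew sequence.

```python
# Toy model of the journey: interactions are recorded, an offer is generated
# from them, and accepting the offer renews the activation. Thresholds and
# field names are assumptions for illustration.

class ActivationManager:
    def __init__(self):
        self.interactions = []
        self.active = True

    def record(self, app, event):
        """Detect and store a user interaction with an application."""
        self.interactions.append((app, event))

    def generate_offer(self, app):
        """Base the renewal offer on the recorded interactions."""
        uses = sum(1 for a, _ in self.interactions if a == app)
        return {"app": app, "discount": 0.2 if uses >= 3 else 0.0}

    def accept(self, offer):
        """Renew the activation responsive to the user's acceptance."""
        self.active = True
        return offer

mgr = ActivationManager()
for _ in range(3):
    mgr.record("first_app", "open")
print(mgr.generate_offer("first_app"))
```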

COORDINATING EXECUTION OF COMPUTING OPERATIONS FOR SOFTWARE APPLICATIONS
20230049332 · 2023-02-16

A client-side system can include a service proxy that can receive a request to perform a computing operation from a web application that is executable in a web browser of the client-side system. The service proxy can determine whether the computing operation is executable by a local execution module that is external to the web browser and local to the client-side system. The local execution module may be different from the web application and may be configured to execute one or more computing operations using computing resources local to the client-side system. If the computing operation is executable by the local execution module, the service proxy can transmit a communication to the local execution module for causing the local execution module to execute the computing operation.
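The dispatch decision the service proxy makes can be sketched as a simple registry lookup. The registry, operation names, and fallback behavior are hypothetical; the abstract does not specify what happens when no local module matches.

```python
# Illustrative service-proxy dispatch: if a local execution module handles the
# requested operation, forward it there; otherwise fall back to handling the
# operation in the browser. Module registry and operation names are invented.

class ServiceProxy:
    def __init__(self, local_modules):
        self.local_modules = local_modules   # op name -> callable outside browser

    def handle(self, operation, payload):
        module = self.local_modules.get(operation)
        if module is not None:               # executable by a local module?
            return ("local", module(payload))
        return ("in-browser", payload)       # fall back to the web application

proxy = ServiceProxy({"hash_file": lambda p: f"hashed:{p}"})
print(proxy.handle("hash_file", "report.pdf"))
print(proxy.handle("render_chart", {"points": 3}))
```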