G06F9/5066

Method for designing an application task architecture of an electronic control unit with one or more virtual cores

Disclosed is a method for designing an application task architecture for an electronic control unit based on an AUTOSAR operating system that is adaptable to a plurality of microcontrollers. Prior to association with a microcontroller, the method involves developing the application task architecture by using at least one virtual core different from the one or more cores of the microcontroller, the various tasks being assigned respectively to the at least one virtual core, and associating the at least one virtual core with the one or more cores of the microcontroller so as to allocate tasks assigned to the at least one virtual core to the core or among the cores of the microcontroller.
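The two-stage binding described in this abstract can be illustrated with a minimal sketch. The task names, virtual core labels (`VC0` to `VC2`), and the function `assign_tasks_to_physical_cores` are hypothetical and do not come from the patent or from any AUTOSAR API. The sketch only shows tasks being assigned to virtual cores first and resolved to physical cores later.

```python
from collections import defaultdict


def assign_tasks_to_physical_cores(task_to_virtual, virtual_to_physical):
    """Resolve each task's virtual core to a physical core of the chosen microcontroller."""
    allocation = defaultdict(list)
    for task, vcore in task_to_virtual.items():
        allocation[virtual_to_physical[vcore]].append(task)
    return dict(allocation)


# Hypothetical task architecture, designed before any microcontroller is selected.
task_to_virtual = {"Task_10ms": "VC0", "Task_100ms": "VC1", "Task_Diag": "VC2"}

# Late binding: the same architecture mapped onto a single-core and a dual-core part.
single_core = assign_tasks_to_physical_cores(
    task_to_virtual, {"VC0": 0, "VC1": 0, "VC2": 0}
)
dual_core = assign_tasks_to_physical_cores(
    task_to_virtual, {"VC0": 0, "VC1": 1, "VC2": 1}
)
```

In this sketch only the virtual-to-physical table changes between targets, which is presumably what lets one task architecture serve several microcontrollers.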

Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating-point operations, and the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format, with a multiplier to multiply second and third source operands while an accumulator adds a first source operand to the output from the multiplier.
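The arithmetic of such an instruction can be emulated in software as a rough sketch. The helper names `to_bf16` and `dp_accumulate` are hypothetical. Rounding the multiplier inputs to BF16 with round-to-nearest-even and accumulating at higher precision are assumptions made for illustration, since the abstract does not specify the rounding mode or the accumulator width. NaN and overflow cases are not handled.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a value to bfloat16 precision (round-to-nearest-even), returned as a float.

    Assumption: NaN and overflow corner cases are ignored in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even upper half
    bits = (bits + rounding) & 0xFFFF0000   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dp_accumulate(src0, src1, src2):
    """Return src0 + sum(src1[i] * src2[i]) with BF16 multiplier inputs.

    Assumption: the accumulator keeps more precision than BF16 (a Python float here).
    """
    acc = src0
    for a, b in zip(src1, src2):
        acc += to_bf16(a) * to_bf16(b)
    return acc


result = dp_accumulate(1.0, [1.0, 2.0], [3.0, 4.0])  # 1 + 1*3 + 2*4
```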

DYNAMIC PLACEMENT OF COMPUTATION SUB-GRAPHS
20230237375 · 2023-07-27 ·

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for assigning operations of a computational graph to a plurality of computing devices are disclosed. Data characterizing a computational graph is obtained. Context information for a computational environment in which to perform the operations of the computational graph is received. A model input is generated, which includes at least the context information and the data characterizing the computational graph. The model input is processed using a machine learning model to generate an output defining placement assignments of the operations of the computational graph to the plurality of computing devices. The operations of the computational graph are assigned to the plurality of computing devices according to the defined placement assignments.
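The data flow in this abstract (graph plus context in, placement out, then assignment) can be sketched as follows. The trained machine learning model is not reproduced here: `stand_in_policy` is a hand-written greedy heuristic used purely as a placeholder. All names, the cost values, and the device labels are hypothetical.

```python
def build_model_input(graph, context):
    """Combine graph data and environment context into a single model input."""
    return {"ops": graph["ops"], "edges": graph["edges"], "context": context}


def stand_in_policy(model_input):
    """Placeholder for the trained model: put each op on the least-loaded device."""
    devices = model_input["context"]["devices"]
    load = {device: 0.0 for device in devices}
    placement = {}
    for op, cost in model_input["ops"].items():
        target = min(devices, key=lambda d: load[d])
        placement[op] = target
        load[target] += cost
    return placement


def assign_operations(placement):
    """Group operations by the device the placement output assigns them to."""
    per_device = {}
    for op, device in placement.items():
        per_device.setdefault(device, []).append(op)
    return per_device


# Hypothetical graph: op name -> estimated cost, plus data dependencies.
graph = {
    "ops": {"matmul": 4.0, "relu": 1.0, "softmax": 2.0},
    "edges": [("matmul", "relu"), ("relu", "softmax")],
}
context = {"devices": ["gpu:0", "cpu:0"]}

placement = stand_in_policy(build_model_input(graph, context))
per_device = assign_operations(placement)
```

A learned policy would replace `stand_in_policy` while the surrounding steps stay the same. The heuristic here ignores the edges, which a real model would presumably take into account.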

MULTI-DOMAIN CONVOLUTIONAL NEURAL NETWORK

In one embodiment, an apparatus comprises a memory and a processor. The memory is to store visual data associated with a visual representation captured by one or more sensors. The processor is to: obtain the visual data associated with the visual representation captured by the one or more sensors, wherein the visual data comprises uncompressed visual data or compressed visual data; process the visual data using a convolutional neural network (CNN), wherein the CNN comprises a plurality of layers, wherein the plurality of layers comprises a plurality of filters, and wherein the plurality of filters comprises one or more pixel-domain filters to perform processing associated with uncompressed data and one or more compressed-domain filters to perform processing associated with compressed data; and classify the visual data based on an output of the CNN.

STREAM COMPUTING JOB PROCESSING METHOD, STREAM COMPUTING SYSTEM AND ELECTRONIC DEVICE

This application discloses a stream computing job processing method, a stream computing system, and an electronic device. The method includes: obtaining a stream computing job; and running the stream computing job in a process-based manner, where the stream computing job includes at least one process.

DISTRIBUTED ACCELERATOR

Systems, methods, and devices are described for coordinating a distributed accelerator. A command that includes instructions for performing a task is received. One or more sub-tasks of the task are determined to generate a set of sub-tasks. For each sub-task of the set of sub-tasks, an accelerator slice of a plurality of accelerator slices of a distributed accelerator is allocated, and sub-task instructions for performing the sub-task are determined. The sub-task instructions are transmitted to the allocated accelerator slice for each sub-task. Each allocated accelerator slice is configured to generate a corresponding response indicative of the allocated accelerator slice having completed a respective sub-task. In a further example aspect, corresponding responses are received from each allocated accelerator slice and a coordinated response indicative of the corresponding responses is generated.
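The coordination flow can be sketched in a few lines. Everything here is illustrative: the `AcceleratorSlice` class, the round-robin allocation, and the choice of summing numbers as the "task" are assumptions, not details from the abstract.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorSlice:
    slice_id: int

    def execute(self, instructions):
        # A real slice would run hardware work; this stand-in sums a chunk of numbers
        # and reports completion in its response.
        return {"slice": self.slice_id, "done": True, "result": sum(instructions)}


def coordinate(command, slices, chunk_size):
    """Split a task into sub-tasks, dispatch them, and merge the slice responses."""
    data = command["data"]
    sub_tasks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    responses = []
    for index, sub_task in enumerate(sub_tasks):
        allocated = slices[index % len(slices)]  # round-robin allocation (assumption)
        responses.append(allocated.execute(sub_task))
    # Coordinated response: one summary built from the per-slice responses.
    return {
        "done": all(r["done"] for r in responses),
        "result": sum(r["result"] for r in responses),
        "slices_used": sorted({r["slice"] for r in responses}),
    }


slices = [AcceleratorSlice(0), AcceleratorSlice(1)]
coordinated = coordinate({"data": list(range(10))}, slices, chunk_size=4)
```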

MODEL COORDINATION METHOD AND APPARATUS

A model coordination method for a first device is provided. The first device stores at least one model segment. The at least one model segment is configured to realize a part of functions of a preset model. The method includes: determining a first model segment from the at least one model segment stored in the first device, wherein when the first model segment is executed and a second model segment is executed by a second device, a part of or all the functions of the preset model are realized, the second model segment is one of at least one model segment stored in the second device, and the at least one model segment stored in the second device is configured to realize a part of the functions of the preset model. A model coordination apparatus is also provided.

Scheduling method and related apparatus

Disclosed are a scheduling method and a related apparatus. A computing apparatus in a server can be chosen to implement a computation request, thereby improving the running efficiency of the server.

Deep learning heterogeneous computing method based on layer-wide memory allocation and system thereof

A deep learning heterogeneous computing method based on layer-wide memory allocation comprises at least the steps of: traversing a neural network model so as to acquire a training operational sequence and a number of layers L thereof; calculating a memory room R_1 required by data involved in operation at the i-th layer of the neural network model under a double-buffer configuration, where 1≤i≤L; altering a layer structure of the i-th layer and updating the training operational sequence; distributing all the data across a memory room of the CPU and a memory room of the GPU according to a data placement method; and performing iterative computation at each said layer successively based on the training operational sequence so as to complete neural network training.

TASK SCHEDULING METHOD AND APPARATUS
20230025917 · 2023-01-26 ·

A task scheduling method and an apparatus, which belong to the field of intelligent vehicles, are provided. The method may be applied to an embedded device using AUTomotive Open System Architecture (AUTOSAR). The embedded device includes a memory and a processor, the memory stores an interface function, and a first software component and a second software component are deployed in the processor. In this solution, registration information of a to-be-deployed algorithm may be obtained and parsed by using the interface function, and a task in the algorithm may be scheduled and executed by using the software components.
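A rough sketch of the register, parse, schedule, and execute flow follows. The JSON registration format, the `priority` field, and the split of duties between a scheduling component and an executing component are all assumptions made for illustration. The abstract does not say how the two software components divide the work, and nothing here is an AUTOSAR API.

```python
import json


def parse_registration(registration_json):
    """Stand-in for the interface function: parse registration info into task records."""
    info = json.loads(registration_json)
    return [(task["priority"], task["name"]) for task in info["tasks"]]


class SchedulerComponent:
    """Hypothetical first software component: decides the execution order."""

    def order(self, tasks):
        return [name for _, name in sorted(tasks)]  # lower number runs first


class ExecutorComponent:
    """Hypothetical second software component: runs the tasks in the given order."""

    def __init__(self, handlers):
        self.handlers = handlers

    def run(self, ordered_names):
        return [self.handlers[name]() for name in ordered_names]


# Hypothetical registration info for a to-be-deployed algorithm.
registration = json.dumps(
    {
        "algorithm": "lane_keeping",
        "tasks": [
            {"name": "fuse", "priority": 2},
            {"name": "detect", "priority": 1},
        ],
    }
)

ordered = SchedulerComponent().order(parse_registration(registration))
results = ExecutorComponent(
    {"detect": lambda: "objects", "fuse": lambda: "track"}
).run(ordered)
```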