G06F9/3877

Computing device and method

A computing device, comprising: a computing module, comprising one or more computing units; and a control module, comprising a computing control unit, and used for controlling shutdown of the computing unit of the computing module according to a determining condition. Also provided is a computing method. The computing device and method have the advantages of low power consumption and high flexibility, and can be combined with the upgrading mode of software, thereby further increasing the computing speed, reducing the computing amount, and reducing the computing power consumption of an accelerator.

Classification of synthetic data tasks and orchestration of resource allocation

Various techniques are described for classifying synthetic data tasks and orchestrating a resource allocation between groups of eligible resources for processing the synthetic data tasks. Received synthetic data tasks can be classified by identifying a task category and a corresponding group of eligible resources (e.g., processors) for processing synthetic data tasks in the task category. For example, synthetic data tasks can include generation of source assets, ingestion of source assets, identification of variation parameters, variation of variation parameters, and creation of synthetic data. Certain categories of synthetic data tasks can be classified for processing with a particular group of eligible resources. For example, tasks to ingest synthetic data assets can be classified for processing on a CPU only, while a task to create synthetic data assets can be classified for processing on a GPU only. The synthetic data tasks can be queued and routed for processing by an eligible resource.

Systems and methods for tone mapping of high dynamic range images for high-quality deep learning based processing
11544823 · 2023-01-03 · ·

Systems and methods for tone mapping of high dynamic range (HDR) images for high-quality deep learning based processing are disclosed. In one embodiment, a graphics processor includes a media pipeline to generate media requests for processing images and an execution unit to receive media requests from the media pipeline. The execution unit is configured to compute an auto-exposure scale for an image to effectively tone map the image, to scale the image with the computed auto-exposure scale, and to apply a tone mapping operator including a log function to the image and scaling the log function to generate a tone mapped image.

Accelerating AI training by an all-reduce process with compression over a distributed system

According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.

Zero knowledge proof hardware accelerator and the method thereof

A hardware accelerator for accelerating the zero knowledge succinct non-interactive argument of knowledge (zk-SNARK) protocol by reducing the computation time of the cryptographic verification is disclosed. The accelerator includes a zk-SNARK engine having one or more processing units running in parallel. The processing unit can include one or more multiply-accumulate operation (MAC) units, one or more fast Fourier transform (FFT) units; and one or more elliptic curve processor (ECP) units. The one or more ECP units are configured to reduce a bit-length of a scalar d.sub.i in an ECP algorithm used for generating a proof, thereby the cryptographic verification requires less computation power.

FPGA Cloud Platform Acceleration Resource Allocation Method, and System
20220413918 · 2022-12-29 ·

An FPGA cloud platform acceleration resource allocation method and a system. The method comprises: allocating and coordinating accelerator card resources according to delays between a host of a user and FPGA accelerator cards deployed at various network segments, and upon an FPGA usage request of the user, allocating an FPGA accelerator card in an FPGA resource pool that has a minimum delay to the host, thereby achieving acceleration resource allocation of an FPGA cloud platform; and a cloud monitoring management platform obtaining transmission delays to a virtual machine network according to different geographic locations of various FPGA cards in the FPGA resource pool, and allocating a card having a minimum delay to each user. In addition, the cloud monitoring management platform effectively prevents unauthorized users from accessing acceleration resources in the resource pool, thereby protecting the rights of pool owners. The invention effectively protects FPGA accelerator cards that are not authorized for users, and ensures that the card allocated to a user has a minimum network delay, thereby optimizing acceleration performance, and improving user experience.

SYNCHRONIZATION BARRIER

Apparatuses, systems, and techniques to implement a barrier operation. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR A HARDWARE ASSISTED HETEROGENEOUS INSTRUCTION SET ARCHITECTURE DISPATCHER

Systems, methods, and apparatuses to support instructions for a hardware assisted heterogeneous instruction set architecture dispatcher are described. In one embodiment, a hardware processor includes a plurality of processor cores comprising a first type of processor core that supports a first instruction set architecture and a second type of processor core that supports a second different instruction set architecture, a decoder circuit of a processor core of the plurality of processor cores to decode a single instruction into a decoded single instruction, the single instruction including a field that identifies a requested core type and an opcode that indicates an execution circuit of the processor core is to: read a register to determine a core type of the processor core, cause the processor core to enter a first mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the first type, cause the processor core to enter a second mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type and the core type of the processor core are the second type, cause the processor core to enter a third mode, that only permits execution of the first instruction set architecture by the processor core, when the requested core type is the second type and the core type of the processor core is the first type, and cause the processor core to enter a fourth mode, that only permits execution of the second different instruction set architecture by the processor core, when the requested core type is the first type and the core type of the processor core is the second type, and the execution circuit of the processor core to execute the decoded single instruction according to the opcode.

Device such as a connected object provided with means for checking the execution of a program executed by the device

The present invention relates to a device (1) such as a connected object comprising a first electronic circuit (2) comprising: a first processing unit (6) for executing a program, a first memory (8) for memorizing data during the execution of the program, a debug port (10) dedicated to checking the execution of the program from outside the first circuit,
a second electronic circuit (4) connected to the debug port (10), comprising: a second memory (14) memorizing reference data related to the program, a second processing unit (12) for implementing the following steps automatically and autonomously via the debug port (10): checking the integrity of the data memorized by the first memory (8) and/or the compliance of the program's execution by the first processing unit (6) with a reference execution, assisted by the reference data.

NEURAL NETWORK PROCESSING ASSIST INSTRUCTION

A first processor processes an instruction configured to perform a plurality of functions. The plurality of functions includes one or more functions to operate on one or more tensors. A determination is made of a function of the plurality of functions to be performed. The first processor provides to a second processor information related to the function. The second processor is to perform the function. The first processor and the second processor share memory providing memory coherence.