G06F9/522

Sync groupings
11550639 · 2023-01-10

A work accelerator is connected to a gateway. The gateway enables the transfer of data to the work accelerator from an external storage at pre-compiled data synchronisation points attained by the work accelerator. The work accelerator is configured to send to a register of the gateway an indication of a sync group comprising the gateway. The work accelerator then sends the gateway a synchronisation request for a synchronisation to be performed at an upcoming pre-compiled data exchange synchronisation point. Sync propagation circuits are each configured to receive at least one synchronisation request and to propagate or acknowledge it in dependence upon the indication of the sync group received from the work accelerator.
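
The propagate-or-acknowledge decision can be illustrated with a minimal sketch. The abstract does not say how the sync group indication is encoded or what rule the circuits apply, so the set-of-identifiers encoding, the function name `handle_sync_request`, and the rule "propagate only if the upstream neighbour is in the group" are all assumptions made for illustration.

```python
def handle_sync_request(gateway_id, upstream_id, sync_group):
    """Return the action a gateway takes for an incoming sync request.

    sync_group models the indication the accelerator wrote to the gateway
    register, here a set of participant identifiers (hypothetical encoding).
    """
    if gateway_id not in sync_group:
        # The abstract describes a sync group that comprises the gateway.
        raise ValueError("sync group must include this gateway")
    if upstream_id is not None and upstream_id in sync_group:
        # Assumed rule: the group extends further upstream, so pass it on.
        return "propagate"
    # Assumed rule: this gateway is the last participant, so it acknowledges.
    return "acknowledge"
```

Under these assumptions, a group containing only the local gateway is acknowledged locally, while a group that also names the upstream gateway causes the request to be forwarded.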

Continuation analysis tasks for GPU task scheduling

Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, it enqueues the continuation packet on a first queue; the first task can specify which queue receives the packet. The agent responsible for the first queue dequeues and executes the continuation packet, which invokes an analysis phase that is performed before determining which dependent tasks to enqueue. If the analysis phase determines that a second task is now ready to be launched, the second task is enqueued on one of the queues. An agent responsible for that queue then dequeues and executes the second task.
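
The flow described above (task completes, continuation packet is enqueued, analysis phase decides which dependents are ready) can be sketched in software. This is a simplified model, not the disclosed hardware: it uses a single queue and a single agent, and the names `run_with_continuations`, `tasks`, and `deps` are hypothetical.

```python
from collections import deque


def run_with_continuations(tasks, deps):
    """tasks: dict name -> callable; deps: dict name -> set of prerequisites.

    Returns the order in which tasks were executed.
    """
    queue = deque()
    done, launched, order = set(), set(), []

    # Tasks without prerequisites can be launched immediately.
    for name in tasks:
        if not deps.get(name):
            queue.append(("task", name))
            launched.add(name)

    while queue:
        kind, name = queue.popleft()
        if kind == "task":
            tasks[name]()
            order.append(name)
            done.add(name)
            # On completion, the task enqueues its continuation packet.
            queue.append(("continuation", name))
        else:
            # Analysis phase: find dependents whose prerequisites are all done.
            for candidate, prereqs in deps.items():
                ready = name in prereqs and prereqs <= done
                if ready and candidate not in launched:
                    queue.append(("task", candidate))
                    launched.add(candidate)
    return order
```

For example, with tasks `a`, `b`, and `c`, where `c` depends on both `a` and `b`, the continuation of `a` finds `c` ready only after `b` has also completed.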

Information processing apparatus, information processing system, and non-transitory computer-readable storage medium for storing communication management program
11544118 · 2023-01-03

One embodiment provides an information processing apparatus effective to execute a parallel job in coordination with other information processing apparatuses. In an example, the information processing apparatus includes: a memory configured to store computer readable instructions; and a processor configured to execute the computer readable instructions stored in the memory, the computer readable instructions including: providing an instruction to issue barrier communication of error information; and propagating the error information to each of the other information processing apparatuses based on the instruction for the barrier communication.

Control of Data Sending from a Multi-Processor Device
20220414040 · 2022-12-29

A method for controlling the sending of data by a plurality of processors belonging to a device, the method comprising: sending a first message to a first processor of the plurality of processors to grant permission to the first processor to send a first set of data packets over at least one external interface of the device; receiving from the first processor an identifier of a second processor of the plurality of processors; and in response to receipt of the identifier of the second processor, sending a second message to the second processor to grant permission to the second processor to send a second set of data packets over the at least one external interface.
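
The grant-and-handover sequence in this method can be sketched as a small control loop. The class `Processor`, the function `control_sending`, and the use of `None` to mark the end of the chain are assumptions for illustration; the abstract only specifies that each granted processor returns the identifier of the next one.

```python
class Processor:
    """Hypothetical processor holding packets and the id of its successor."""

    def __init__(self, ident, packets, next_ident=None):
        self.ident = ident
        self.packets = packets
        self.next_ident = next_ident

    def on_grant(self, interface):
        # Permission received: send this processor's packets, then tell the
        # controller which processor should be granted permission next.
        interface.extend(self.packets)
        return self.next_ident


def control_sending(processors, first_ident):
    """Grant permission to one processor at a time over a shared interface."""
    interface = []  # models the external interface as an ordered packet log
    current = first_ident
    while current is not None:
        current = processors[current].on_grant(interface)
    return interface
```

Because only one processor holds permission at a time, the packets appear on the interface in handover order rather than interleaved.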

BARRIER STATE SAVE AND RESTORE FOR PREEMPTION IN A GRAPHICS ENVIRONMENT

An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.

SYNCHRONIZATION BARRIER

Apparatuses, systems, and techniques to implement a barrier operation are described. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.

Scheduling tasks using swap flags

A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.
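
A minimal software model of the decode-and-check step, assuming a swap flag carried as one boolean field per instruction. The types `Instruction` and `ScheduledTask` and the function `step` are hypothetical names; the actual instruction encoding is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str
    swap_flag: bool = False  # assumed to be a single bit in the encoding


@dataclass
class ScheduledTask:
    name: str
    instructions: list
    state: str = "active"
    pc: int = 0


def step(task):
    """Decode the next instruction and de-activate the task if its swap flag is set."""
    instr = task.instructions[task.pc]  # decode
    task.pc += 1
    if instr.swap_flag:
        # The instruction controller triggers the scheduler to swap the task out.
        task.state = "non-active"
    return instr.opcode
```

A compiler could, for instance, set the flag on a long-latency load so that another scheduled task runs while the load is outstanding; that usage is an illustration, not something the abstract states.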

Techniques to generate execution schedules from neural network computation graphs
11531565 · 2022-12-20

Techniques are described for a compiler scheduling algorithm/routine that utilizes backtracking to generate an execution schedule for a neural network computation graph using a neural network compiler intermediate representation of hardware synchronization counters. The hardware synchronization counters may be referred to as physical barriers, hardware (HW) barriers, or barriers and their intermediate representations may be referred to as barrier tasks or barriers. Backtracking is utilized to prevent an available number of hardware barriers from being exceeded during performance of an execution schedule. An execution schedule may be a computation workload schedule for neural network inference applications. An execution schedule may also be a first in first out (FIFO) schedule.
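
The idea of backtracking to stay within the available hardware barriers can be sketched with a toy scheduler. The model is an assumption: a barrier counts as live from the moment its first user task is scheduled until its last user task is scheduled, and the names `find_schedule`, `deps`, `barriers`, and `max_live` are hypothetical. It does not reproduce the patent's intermediate representation.

```python
def find_schedule(tasks, deps, barriers, max_live):
    """Return a task order that never exceeds max_live live barriers, or None.

    tasks: list of task names
    deps: dict task -> set of prerequisite tasks
    barriers: dict barrier -> set of tasks that produce or consume it
    """
    schedule, scheduled = [], set()

    def live_count():
        # Live: at least one user scheduled, but not yet all of them.
        return sum(
            1 for users in barriers.values()
            if 0 < len(users & scheduled) < len(users)
        )

    def search():
        if len(schedule) == len(tasks):
            return True
        for task in tasks:
            if task in scheduled or not (deps.get(task, set()) <= scheduled):
                continue
            schedule.append(task)
            scheduled.add(task)
            if live_count() <= max_live and search():
                return True
            # Backtrack: this choice exceeded the limit or led to a dead end.
            schedule.pop()
            scheduled.remove(task)
        return False

    return list(schedule) if search() else None
```

With two barriers and a limit of one, the scheduler has to finish the first barrier's tasks before starting the second barrier's tasks, so it backs out of the order it would otherwise try first.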

ASYNCHRONOUS COMPLETION NOTIFICATION IN A MULTI-CORE DATA PROCESSING SYSTEM

Asynchronous completion notification is provided in a data processing system including one or more cores each executing one or more threads. A hardware unit of the data processing system receives and enqueues a request for processing and a source tag indicating at least a thread and core that issued the request. The hardware unit maintains a pointer to a completion area in a memory space. The completion area includes a completion granule for the hardware unit and thread. The hardware unit performs the processing requested by the request and computes an address of the completion granule based on the pointer and the source tag. The hardware unit then provides completion notification for the request by updating the completion granule with a value indicating a completion status.
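
The address computation and notification step can be sketched as follows. The abstract states only that the granule address is derived from the pointer and the source tag; the linear layout, the constants `GRANULE_SIZE` and `THREADS_PER_CORE`, the status value `1`, and the dictionary standing in for memory are assumptions.

```python
from collections import deque

GRANULE_SIZE = 8       # bytes per completion granule (assumed)
THREADS_PER_CORE = 4   # threads per core (assumed)


def granule_address(base_pointer, core, thread):
    """Compute a granule address from the completion-area pointer and source tag."""
    index = core * THREADS_PER_CORE + thread
    return base_pointer + index * GRANULE_SIZE


class HardwareUnit:
    def __init__(self, memory, completion_area_pointer):
        self.memory = memory  # dict address -> value, a stand-in for memory
        self.completion_area_pointer = completion_area_pointer
        self.requests = deque()

    def enqueue(self, request, source_tag):
        # source_tag is a (core, thread) pair identifying the issuer.
        self.requests.append((request, source_tag))

    def process_next(self):
        request, (core, thread) = self.requests.popleft()
        request()  # perform the requested processing
        address = granule_address(self.completion_area_pointer, core, thread)
        self.memory[address] = 1  # assumed value meaning "complete"
        return address
```

The issuing thread can then poll its own granule instead of taking an interrupt, since each (core, thread) pair maps to a distinct address in this layout.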

Deterministic execution replay for multicore systems

Techniques are disclosed for interposing on nondeterministic events during multicore virtual machine (VM) execution to capture information that allows for deterministically recreating the nondeterministic events during execution replay of the VM. A method may include reading, by a virtual processor running within a multicore VM instance, an instruction to execute, and, responsive to a determination that the instruction is a nondeterministic instruction, interposing on the nondeterministic instruction execution so as to allow deterministic execution of the nondeterministic instruction during replay execution of the multicore VM instance. Interposing on the nondeterministic instruction execution may include recording a partial barrier event and/or a full barrier event. The nondeterministic instruction may be a read memory access instruction or a write memory access instruction.
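
The record-then-replay principle can be shown with a minimal sketch that logs the result of each nondeterministic operation and feeds the log back during replay. It does not model the partial and full barrier events named in the abstract, and the classes `Recorder` and `Replayer` and the function `run_program` are hypothetical.

```python
import random


class Recorder:
    """Executes nondeterministic operations and logs their results."""

    def __init__(self):
        self.log = []

    def execute(self, operation):
        value = operation()
        self.log.append(value)  # interpose: capture the outcome
        return value


class Replayer:
    """Returns logged results instead of re-executing the operations."""

    def __init__(self, log):
        self._log = iter(log)

    def execute(self, operation):
        return next(self._log)  # the live operation is not run during replay


def run_program(vcpu, operation, steps=3):
    # A stand-in guest program that performs `steps` nondeterministic reads.
    return [vcpu.execute(operation) for _ in range(steps)]
```

Running the same program first under `Recorder` and then under `Replayer` yields identical results even though the underlying operation (here `random.random`) would differ between runs.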