G06F9/3887

EXCEPTION HANDLING FOR DEBUGGING IN A GRAPHICS ENVIRONMENT

An apparatus to facilitate exception handling for debugging in a graphics environment is disclosed. The apparatus includes load store pipeline hardware circuitry to: in response to a page fault exception being enabled for a memory access request received from a thread of a plurality of threads, allocate a memory dependency token correlated to a scoreboard identifier (SBID) that is included with the memory access request; send, to a memory fabric of the graphics processor, the memory access request comprising the memory dependency token; receive, from the memory fabric in response to the memory access request, a memory access response comprising the memory dependency token and indicating occurrence of a page fault error condition and fault details associated with the page fault error condition; and return the SBID associated with the memory access response and the fault details of the page fault error condition to a debug register of the thread.
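The token-to-SBID correlation described above can be illustrated with a small software model. The abstract describes hardware circuitry; the class and field names below (`LoadStorePipeline`, `debug_register`, the dictionary-based token table) are hypothetical stand-ins, not the patented implementation.

```python
# Hypothetical software model of the memory-dependency-token scheme:
# a token is allocated per faultable memory access, correlated to the
# request's scoreboard identifier (SBID), and used on response to route
# page-fault details back to the issuing thread's debug register.

class LoadStorePipeline:
    def __init__(self):
        self._next_token = 0
        self._token_to_sbid = {}   # outstanding token -> SBID
        self.debug_register = {}   # SBID -> fault details

    def issue(self, sbid, address):
        """Allocate a token correlated to the SBID and build the request."""
        token = self._next_token
        self._next_token += 1
        self._token_to_sbid[token] = sbid
        return {"token": token, "address": address}

    def on_response(self, response):
        """On a faulting response, return fault details via the SBID."""
        sbid = self._token_to_sbid.pop(response["token"])
        if response.get("page_fault"):
            self.debug_register[sbid] = response["fault_details"]
        return sbid
```

Because the token rides along with the request and response, the pipeline can match an asynchronous fault back to the exact in-flight access (and thus the thread) that caused it.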

DYNAMICALLY SCALABLE AND PARTITIONED COPY ENGINE

An apparatus to facilitate a dynamically scalable and partitioned copy engine is disclosed. The apparatus includes a processor comprising copy engine hardware circuitry to facilitate copying surface data in memory and comprising: a plurality of copy front-end hardware circuitry to generate a plurality of surface data sub-blocks, wherein a number of the plurality of copy front-end hardware circuitry corresponds to a number of partitions configured for the processor, with each partition associated with a single copy front-end hardware circuitry; a plurality of copy back-end hardware circuitry to operate in parallel to process the plurality of surface data sub-blocks to perform memory accesses, wherein subsets of the plurality of copy back-end hardware circuitry are each associated with the single copy front-end hardware circuitry associated with each partition; and a connectivity matrix hardware circuitry to communicably connect the plurality of copy front-end hardware circuitry to the plurality of copy back-end hardware circuitry.

BARRIER STATE SAVE AND RESTORE FOR PREEMPTION IN A GRAPHICS ENVIRONMENT

An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.
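The designated-thread flow can be modeled in a few lines. Everything below is a hypothetical illustration: the choice of the lowest thread ID as the designated thread, and the dictionary representation of named-barrier state, are assumptions, not the patented mechanism.

```python
# Hypothetical model: on mid-thread preemption, a barrier signaling
# event marks exactly one thread of the thread group (TG) as the
# designated thread; only that thread runs the barrier save routine,
# and later the barrier restore routine, for the TG's named barriers.

named_barriers = {"b0": {"arrived": 3}, "b1": {"arrived": 1}}
saved_state = {}

def barrier_signaling_event(thread_ids):
    """Grant a valid designated-thread status to one thread of the TG."""
    return {tid: (tid == min(thread_ids)) for tid in thread_ids}

def barrier_save(is_designated):
    """Save named-barrier state (designated thread only)."""
    if is_designated:
        saved_state.update({k: dict(v) for k, v in named_barriers.items()})

def barrier_restore(is_designated):
    """Restore named-barrier state after preemption ends."""
    if is_designated:
        for k, v in saved_state.items():
            named_barriers[k] = dict(v)
```

Having a single designated thread perform the save and restore avoids racing writers: the rest of the TG simply waits for the designated thread to finish.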

SYNCHRONIZATION BARRIER

Apparatuses, systems, and techniques to implement a barrier operation. In at least one embodiment, a memory barrier operation causes accesses to memory by a plurality of groups of threads to occur in an order indicated by the memory barrier operation.
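The ordering guarantee a memory barrier provides between groups of threads can be demonstrated with Python's `threading.Barrier` standing in for the hardware operation (an illustrative analogy, not the claimed implementation):

```python
# Minimal illustration of barrier-ordered memory access: writes by a
# producer group of threads become visible to a consumer group only
# after every producer has reached the barrier.

import threading

N_PRODUCERS = 4
shared = [0] * N_PRODUCERS
barrier = threading.Barrier(N_PRODUCERS + 1)  # producers + one consumer
results = []

def producer(i):
    shared[i] = i + 1   # memory access ordered before the barrier
    barrier.wait()

def consumer():
    barrier.wait()      # all producer writes are ordered before this read
    results.append(sum(shared))

threads = [threading.Thread(target=producer, args=(i,)) for i in range(N_PRODUCERS)]
threads.append(threading.Thread(target=consumer))
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The consumer deterministically observes all four writes, because no thread passes the barrier until every participant has arrived.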

EFFICIENT COMPLEX MULTIPLY AND ACCUMULATE
20220413804 · 2022-12-29

Two commands each perform a partial complex multiply and accumulate. By using these two commands together, a full complex multiply and accumulate operation is performed. As compared to traditional implementations, this reduces the number of commands used from eight (four multiplies, a subtraction and three adds) to two. In some example embodiments, a single-instruction/multiple-data (SIMD) architecture is used to enable each command to perform multiple partial complex multiply and accumulate operations simultaneously, further increasing efficiency. One application of a complex multiply and accumulate is in generating images from pulse data of a radar or lidar. For example, an image may be generated from a synthetic aperture radar (SAR) on an autonomous vehicle (e.g., a drone). The image may be provided to a trained machine learning model that generates an output. Based on the output, inputs to control circuits of the autonomous vehicle are generated.
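The two-command split can be made concrete. The exact partitioning below is an assumption (the abstract does not specify which terms each command handles): each partial command performs two multiplies and two adds, and issuing both yields the full complex multiply and accumulate.

```python
# Hypothetical decomposition of a complex multiply-accumulate
# acc += a * b into two partial commands, replacing the traditional
# eight operations (four multiplies, a subtraction and three adds).

def partial_cmac_1(acc_re, acc_im, a_re, a_im, b_re, b_im):
    """First command: accumulate the a_re cross terms."""
    return acc_re + a_re * b_re, acc_im + a_re * b_im

def partial_cmac_2(acc_re, acc_im, a_re, a_im, b_re, b_im):
    """Second command: accumulate the a_im cross terms (note the sign)."""
    return acc_re - a_im * b_im, acc_im + a_im * b_re

def cmac(acc, a, b):
    """Full complex MAC by issuing the two partial commands in sequence."""
    re, im = partial_cmac_1(acc.real, acc.imag, a.real, a.imag, b.real, b.imag)
    re, im = partial_cmac_2(re, im, a.real, a.imag, b.real, b.imag)
    return complex(re, im)
```

In a SIMD implementation each command would apply this pair of multiply-adds across many lanes at once, which is what makes the two-command form attractive for pulse-compression workloads like SAR imaging.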

PROCESSING DEVICE AND METHOD OF USING A REGISTER CACHE
20220413858 · 2022-12-29

A processing device is provided which comprises memory, a plurality of registers and a processor. The processor is configured to execute a plurality of portions of a program, allocate a number of the registers per portion of the program such that a number of remaining registers are available as a register cache, and transfer data between the registers allocated per portion of the program and the register cache. The processor loads data to the allocated registers to execute a portion of the program, stores data resulting from execution of the portion in the register cache, reloads the data to the allocated registers, and executes another portion of the program using the data reloaded to the allocated registers. A called function uses the number of allocated registers, which is less than an architectural limit of registers allocated per portion of the program.
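The spill/reload flow can be sketched with a flat register file. The sizes and the use of the upper registers as the cache are assumptions for illustration; the patent only requires that the per-portion allocation stay below the architectural limit, leaving the remainder as the register cache.

```python
# Hypothetical model of the register-cache flow: each program portion
# gets a fixed register allocation below the architectural limit; the
# remaining registers act as a register cache that live values are
# spilled to and reloaded from between portions.

ARCH_REGS = 16
PER_PORTION = 8                              # allocation per portion
register_file = [0] * ARCH_REGS
cache_slots = list(range(PER_PORTION, ARCH_REGS))  # registers acting as cache

def run_portion(values):
    """Load values into the allocated registers and 'execute' a portion."""
    for i, v in enumerate(values):
        register_file[i] = v
    return sum(register_file[:len(values)])

def spill(n):
    """Store n live allocated registers into the register cache."""
    for i in range(n):
        register_file[cache_slots[i]] = register_file[i]

def reload(n):
    """Reload n values from the register cache into allocated registers."""
    for i in range(n):
        register_file[i] = register_file[cache_slots[i]]
```

Because the cache is itself built from registers, the transfer avoids a round trip to memory when a called function reuses the allocated set.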

Control flow mechanism for execution of graphics processor instructions using active channel packing

An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes a plurality of execution units to execute single instruction, multiple data (SIMD) instructions and flow control logic to detect diverging control flow in a plurality of SIMD channels and reduce execution of the control flow to a subset of the SIMD channels.
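Divergence handling via execution masks can be illustrated as below. This is a generic sketch of masked SIMD execution, not the claimed active-channel-packing circuitry; the function names are hypothetical.

```python
# Hypothetical sketch of diverged SIMD control flow: a per-channel mask
# records which channels take a branch, and each side of the branch is
# executed only for the subset of channels whose mask bit is set.

SIMD_WIDTH = 8

def detect_divergence(values, predicate):
    """Build execution masks for the taken and not-taken branch paths."""
    taken = [predicate(v) for v in values]
    return taken, [not t for t in taken]

def execute_masked(values, mask, op):
    """Execute op only on channels whose mask bit is set; others pass through."""
    return [op(v) if m else v for v, m in zip(values, mask)]

# Equivalent scalar code: if (v % 2 == 0) v *= 10; else v += 1;
vals = list(range(SIMD_WIDTH))
taken, not_taken = detect_divergence(vals, lambda v: v % 2 == 0)
vals = execute_masked(vals, taken, lambda v: v * 10)
vals = execute_masked(vals, not_taken, lambda v: v + 1)
```

Restricting each path to its active subset is what the flow control logic exploits: inactive channels can be packed out so the hardware does no work for them.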

NEURAL NETWORK PROCESSING ASSIST INSTRUCTION

A first processor processes an instruction configured to perform a plurality of functions. The plurality of functions includes one or more functions to operate on one or more tensors. A determination is made of a function of the plurality of functions to be performed. The first processor provides to a second processor information related to the function. The second processor is to perform the function. The first processor and the second processor share memory providing memory coherence.
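The hand-off through coherent shared memory can be sketched as a descriptor protocol. The descriptor layout, the `relu` example function, and the dictionary standing in for shared memory are all assumptions for illustration only.

```python
# Hypothetical sketch of the assist-instruction flow: the first
# processor determines which tensor function is requested and provides
# the second processor a descriptor through memory both can see; the
# second processor performs the function in place in shared memory.

FUNCS = {"relu": lambda t: [max(0.0, x) for x in t]}

shared_memory = {"tensor": [-1.0, 2.0, -3.0], "descriptor": None}

def first_processor(function_name):
    """Determine the function and provide its information to the helper."""
    shared_memory["descriptor"] = {"func": function_name, "operand": "tensor"}

def second_processor():
    """Perform the described function on the shared tensor."""
    d = shared_memory["descriptor"]
    shared_memory[d["operand"]] = FUNCS[d["func"]](shared_memory[d["operand"]])
```

Memory coherence between the two processors is what lets the descriptor and operand be exchanged without explicit copies.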

GRAPHICS PROCESSING
20220391216 · 2022-12-08

There is disclosed an instruction that can be included into a graphics processor shader program to be executed by a group of execution threads that when executed will cause a group of execution lanes to be in an ‘active’ (e.g. SIMD) execution state in which active state processing operations can be performed using the group of plural execution lanes together. The processing operations can then be performed using the execution lanes in the active state together. The execution lanes are then allowed or caused to return to their prior execution state once the processing operations have finished.

PARALLEL PROCESSING IN A SPIKING NEURAL NETWORK
20220383080 · 2022-12-01

The disclosed embodiments are related to storing critical data in a memory device such as a Flash or DRAM memory device. In one embodiment, a device comprising a plurality of parallel processors is disclosed, the plurality of parallel processors configured to: perform a search and match operation, the search and match operation loading a plurality of synaptic identifier bit strings and a plurality of spike identifier bit strings, the search and match operation further generating a plurality of bitmasks; perform a synaptic integration phase, the synaptic integration phase generating a plurality of synaptic current vectors based on the plurality of bitmasks, the synaptic current vectors associated with respective synthetic neurons; solve a neural membrane equation for each of the synthetic neurons; and update membrane potentials associated with the synthetic neurons, the membrane potentials stored in a memory device.
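The three phases named above can be modeled compactly. The simple leaky-integrate membrane equation below is an assumption standing in for the patented neural membrane equation, and the list-of-lists encoding of identifier bit strings is illustrative only.

```python
# Hypothetical model of the three phases: (1) search and match of
# synaptic identifiers against spike identifiers, producing bitmasks;
# (2) synaptic integration, producing a current per synthetic neuron;
# (3) solving a (here: leaky-integrate) membrane equation and updating
# the stored membrane potentials.

def search_and_match(synaptic_ids, spike_ids):
    """Per-neuron bitmask: which of its synapses saw a spike this step."""
    spikes = set(spike_ids)
    return [[1 if s in spikes else 0 for s in syn] for syn in synaptic_ids]

def synaptic_integration(bitmasks, weights):
    """Synaptic current per neuron: sum of weights gated by the bitmask."""
    return [sum(w * m for w, m in zip(ws, ms))
            for ws, ms in zip(weights, bitmasks)]

def update_membranes(potentials, currents, leak=0.5):
    """Solve a simple leaky membrane equation and update the potentials."""
    return [leak * v + i for v, i in zip(potentials, currents)]
```

Each phase is embarrassingly parallel across neurons, which is why the abstract assigns the work to a plurality of parallel processors.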