G06F8/458

CODE INSPECTION METHOD UNDER WEAK MEMORY ORDERING ARCHITECTURE AND CORRESPONDING DEVICE
20240045787 · 2024-02-08

This application discloses a code inspection method, including: obtaining first source code and test code, where the first source code includes a blocking mark, the first source code corresponds to a plurality of threads, and the plurality of threads share at least one shared memory; generating, based on the first source code and a condition indicated by the test code, a plurality of execution flows under the weak memory ordering architecture for the plurality of threads that operate on the same shared memory, where a busy-wait loop exists in a target execution flow that includes the blocking mark and that is in the plurality of execution flows; and determining the busy-wait loop in the target execution flow as an infinite loop if a read operation executed by a target thread in one iteration of the busy-wait loop cannot reference an unreferenced write operation.
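
The infinite-loop criterion in the abstract above can be illustrated with a heavily simplified sketch. It assumes a single shared variable, a fixed list of candidate writes, and a read that references the next unreferenced write in each iteration. The function name `is_infinite_busy_wait` and its parameters are hypothetical and do not come from the application, which generates execution flows rather than walking one list.

```python
def is_infinite_busy_wait(writes, exit_value, max_iters=100):
    """Return True if the modeled busy-wait loop can never exit.

    writes:     values written to the shared variable by other threads
                (assumed to be the only writes the read may reference)
    exit_value: value the loop waits for
    """
    referenced = set()  # indices of writes already referenced by a read
    for _ in range(max_iters):
        unreferenced = [i for i in range(len(writes)) if i not in referenced]
        if not unreferenced:
            # The read in this iteration cannot reference any new write,
            # so every later iteration observes the same state.
            return True
        i = unreferenced[0]
        referenced.add(i)
        if writes[i] == exit_value:
            return False  # the loop condition is satisfied and the loop exits
    return True  # iteration budget exhausted in this toy model


print(is_infinite_busy_wait([0, 0], exit_value=1))  # no write ever sets 1
print(is_infinite_busy_wait([0, 1], exit_value=1))  # second write releases the loop
```

The sketch follows only one possible read order; it is meant to show the shape of the check, not to stand in for exploration of all execution flows under weak memory ordering.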

Synchronization mechanisms for a multi-core processor using wait commands having either a blocking or a non-blocking state
11892972 · 2024-02-06

Systems, apparatuses and methods suitable for optimizing synchronization mechanisms for multi-core processors are provided. The synchronization mechanisms may be optimized by receiving a command stream which comprises a plurality of commands including one or more wait commands, wherein each wait command has an associated state and one or more associated conditions; sequentially processing each command in the command stream until a wait command is reached; checking the state associated with the wait command to be processed, wherein if said state is a blocking state, further processing of commands in the command stream is paused until each of said wait command's associated conditions is met, and wherein if said state is a non-blocking state, the next command in the command stream is retrieved and processed.
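
A minimal sketch of the command-stream behaviour described above, assuming commands are plain callables and conditions are polled in software. The names `Wait`, `Countdown`, and `run_stream`, and the `max_polls` guard, are hypothetical illustration choices, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Wait:
    """A wait command with a state (blocking or not) and its conditions."""
    blocking: bool
    conditions: List[Callable[[], bool]] = field(default_factory=list)


class Countdown:
    """Toy condition that becomes true on the n-th poll."""
    def __init__(self, n):
        self.n = n

    def __call__(self):
        self.n -= 1
        return self.n <= 0


def run_stream(commands, max_polls=1000):
    """Process commands in order and return a log of what happened."""
    log = []
    for cmd in commands:
        if isinstance(cmd, Wait):
            if cmd.blocking:
                # Blocking state: pause the stream until all conditions hold.
                polls = 0
                while not all(cond() for cond in cmd.conditions):
                    polls += 1
                    if polls >= max_polls:
                        raise TimeoutError("blocking wait never satisfied")
                log.append("wait:blocked-until-met")
            else:
                # Non-blocking state: move straight on to the next command.
                log.append("wait:skipped")
        else:
            log.append(cmd())
    return log


stream = [
    lambda: "a",
    Wait(blocking=True, conditions=[Countdown(3)]),
    lambda: "b",
    Wait(blocking=False, conditions=[lambda: False]),
    lambda: "c",
]
print(run_stream(stream))
```

In this sketch the non-blocking wait never evaluates its conditions; whether real hardware records them for later checking is not stated in the abstract.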

Synchronization of execution threads on a multi-threaded processor
10481911 · 2019-11-19

A method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronization points corresponding to points where execution is to be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.
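
The pause-and-resume behaviour at a synchronization point can be approximated in software with a two-party barrier. This is a sketch using Python's standard `threading.Barrier`; the function `run_pair` and the event log are hypothetical, and the control core's reassignment of execution resources is not modeled.

```python
import threading


def run_pair():
    """Run two threads that meet at one synchronization point."""
    barrier = threading.Barrier(2)  # both threads must arrive before either resumes
    events = []
    lock = threading.Lock()

    def worker(name):
        with lock:
            events.append((name, "before"))
        barrier.wait()  # synchronization point: pause until the peer arrives
        with lock:
            events.append((name, "after"))

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_pair()
# Every "before" entry precedes every "after" entry, whichever thread runs first.
print([phase for _, phase in events])
```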

Ordering of shader code execution
11966998 · 2024-04-23

Examples described herein relate to a graphics processing apparatus that includes a memory device and a graphics processing unit (GPU). In some examples, the GPU is configured to execute a shader program that is to identify at least two code blocks that are independent from each other and cause execution of an unexecuted independent code block with available data based on use of a scoreboard to track data availability for independent code blocks. In some examples, execution of the shader program is to cause the GPU to select a first code block identifier for tracking completion of a dependency of the first independent code block. In some examples, execution of the shader program is to cause the GPU to identify an offset to a first instruction position in a sequence of instructions of the first independent code block in an instruction queue.

METHOD FOR IMPLEMENTING A SOFTWARE MODULE DEFINED BY A NON-INTERLEAVED DIRECTED ACYCLIC GRAPH IN A MULTICORE ENVIRONMENT
20240118879 · 2024-04-11

An elementary method for implementing a software module defined by an elementary directed acyclic graph, including the following steps: copying the code of the initial sequence, adding a fork function at the end of the initial sequence, copying the code of a parallel sequence, adding a join flag function at the end of said parallel sequence, copying the code of the other parallel sequence, adding a join wait function at the end of the other parallel sequence.
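
The fork, join flag, and join wait steps listed above can be sketched with one extra thread and an event object. This is only an illustration: the function names `run_module`, `parallel_a`, and `parallel_b` are hypothetical, and `threading.Event` is assumed here as a stand-in for whatever flag primitive the target multicore environment provides.

```python
import threading


def run_module():
    """Initial sequence, then two parallel sequences joined through a flag."""
    trace = []
    lock = threading.Lock()
    join_flag = threading.Event()

    def record(step):
        with lock:
            trace.append(step)

    def parallel_a():
        record("parallel_a")
        join_flag.set()    # "join flag" at the end of this parallel sequence

    def parallel_b():
        record("parallel_b")
        join_flag.wait()   # "join wait" at the end of the other parallel sequence
        record("joined")

    record("initial")
    forked = threading.Thread(target=parallel_a)  # "fork" after the initial sequence
    forked.start()
    parallel_b()           # the forking thread runs the other parallel sequence
    forked.join()
    return trace


print(run_module())
```

The two parallel entries may appear in either order, but "initial" is always first and "joined" is always last.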

Method and system for yield operation supporting thread-like behavior

A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
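
A small interpreter makes the yield mechanism concrete. The sketch below assumes a toy instruction stream of strings, a `"sync"` marker standing in for the synchronization instruction, and round-robin scheduling; the function name `run_workitems` and all of these choices are hypothetical.

```python
def run_workitems(program, n_items):
    """Run n_items workitems over one instruction stream on one 'processor'."""
    pcs = [0] * n_items            # one program counter per workitem
    ready = list(range(n_items))   # workitems waiting for the processor
    trace = []
    while ready:
        item = ready.pop(0)
        while pcs[item] < len(program):
            instr = program[pcs[item]]
            pcs[item] += 1         # PC now points at the next instruction
            if instr == "sync":
                ready.append(item) # yield the processor to another workitem
                break
            trace.append((item, instr))
    return trace


print(run_workitems(["load", "sync", "store"], n_items=2))
# [(0, 'load'), (1, 'load'), (0, 'store'), (1, 'store')]
```

Because each workitem's counter is advanced past the synchronization instruction before it yields, it resumes at the following instruction when it is scheduled again.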

EMI mitigation on high-speed lanes using false stall
10459860 · 2019-10-29

Methods and apparatus relating to techniques for Electromagnetic Interference (EMI) mitigation on high-speed lanes using false stall are described. In one embodiment, protocol logic determines whether to perform a false stall operation on a lane in response to a determination that no data is to be sent over the lane and that data is being transmitted over the lane. The false stall operation includes sending one or more training symbols (e.g., immediately) after an End Of Burst (EOB) signal over the lane, instead of allowing the lane to stall. Other embodiments are also disclosed.

LOW-OVERHEAD DETECTION TECHNIQUES FOR SYNCHRONIZATION PROBLEMS IN PARALLEL AND CONCURRENT SOFTWARE

The techniques described herein may detect, categorize, and diagnose synchronization issues, providing improved performance and issue resolution. For example, in an embodiment, a method may comprise detecting occurrence of synchronization performance problems in software code, where at least some detected synchronization performance problems occur when a contention rate for software locks is low, determining a cause of the synchronization performance problems, and modifying the software code to remedy the cause of the synchronization performance problems so as to improve synchronization performance of the software code.

General purpose distributed data parallel computing using a high level language

General-purpose distributed data-parallel computing using a high-level language is disclosed. Data-parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. The distributed execution plan is then executed on large compute clusters. Thus, the developer is allowed to write the program using familiar programming constructs in the high-level language. Moreover, developers without experience with distributed compute systems are able to take advantage of such systems.

OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH
20190294444 · 2019-09-26

There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running Single Program Multiple Data code on a Single Instruction Multiple Data machine. The machine runs an instruction stream over input data streams, increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation, and updates the lane-PC of each active lane according to targets of the branch operation. An instruction of the instruction stream includes a barrier indicating a convergence point for all lanes to join. In response to a lane reaching a barrier, the machine evaluates whether all lane-PCs are set to the same thread-PC; if the lane-PCs are not set to the same thread-PC, it selects an active lane from the plurality of lanes; otherwise, it increments the lane-PCs of all the lanes and then selects an active lane from the plurality of lanes.
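
The barrier handling described above can be sketched as a single decision function. The abstract does not state how the active lane is chosen, so the rule used here (deepest lane depth counter, lowest index on ties) is purely an assumption for illustration, as are the function name `on_barrier` and its list-based arguments.

```python
def on_barrier(lane_pcs, depths, thread_pc):
    """Handle a lane reaching a barrier.

    lane_pcs:  per-lane program counters
    depths:    per-lane depth counters (incremented at branches)
    thread_pc: current thread-level program counter
    Returns the (possibly updated) lane PCs and the index of the selected lane.
    """
    converged = all(pc == thread_pc for pc in lane_pcs)
    if converged:
        # All lanes have joined at the barrier: step every lane past it.
        lane_pcs = [pc + 1 for pc in lane_pcs]
    # ASSUMED selection rule: highest depth counter, lowest index on ties.
    selected = max(range(len(lane_pcs)), key=lambda i: (depths[i], -i))
    return lane_pcs, selected


print(on_barrier([5, 5, 5], [1, 1, 1], thread_pc=5))  # ([6, 6, 6], 0)
print(on_barrier([5, 7, 5], [1, 2, 1], thread_pc=5))  # ([5, 7, 5], 1)
```

In the first call all lanes have converged, so every lane-PC advances before selection; in the second, lane 1 is still elsewhere, so the lane-PCs are left unchanged and a lane is selected to keep running.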