Patent classifications
G06F8/458
Apparatus and method for secondary offloads in graphics processing unit
The invention relates to an apparatus for second offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes, and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
COMPILER-BASED INPUT SYNCHRONIZATION FOR PROCESSOR WITH VARIANT STAGE LATENCIES
The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
Supporting dynamic behavior in statically compiled programs
Support for dynamic behavior is provided during static compilation while reducing reliance on JIT compilation and large runtimes. A mapping is created between metadata and native code runtime artifacts, such as between type definition metadata and a runtime type description, or between method definition metadata, a runtime type description, and a native code method location, or field definition metadata, a runtime type description, and a field location. A mapping between runtime artifacts may also be created. Some compilation results include trampoline code to support a reflection invocation of an artifact in the reduced runtime support environment, for virtual method calls, call-time bounds checking, calling convention conversion, or compiler-intrinsic methods. Some results support runtime diagnostics by including certain metadata even when full dynamic behavior is not supported.
Cooperative thread array reduction and scan operations
One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
System and method for selectively delaying execution of an operation based on a search for uncompleted predicate operations in processor-associated queues
A system and method of parallelizing programs employs runtime instructions to identify data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the access data. Data dependence between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies and to postpone program portions that may be dependent while permitting parallel execution of other program portions.
DATAFLOW GRAPH PROGRAMMING ENVIRONMENT FOR A HETEROGENOUS PROCESSING SYSTEM
Examples herein describe techniques for generating dataflow graphs using source code for defining kernels and communication links between those kernels. In one embodiment, the graph is formed using nodes (e.g., kernels) which are communicatively coupled by edges (e.g., the communication links between the kernels). A compiler converts the source code into a bit stream and/or binary code which configure a heterogeneous processing system of a SoC to execute the graph. The compiler uses the graph expressed in source code to determine where to assign the kernels in the heterogeneous processing system. Further, the compiler can select the specific communication techniques to establish the communication links between the kernels and whether synchronization should be used in a communication link. Thus, the programmer can express the dataflow graph at a high-level (using source code) without understanding about how the operator graph is implemented using the heterogeneous hardware in the SoC.
Method and system to enable print functionality in high-level synthesis (HLS) design platforms
This disclosure generally relates to high-level synthesis (HLS) platforms, and, more particularly, enable print functionality in high-level synthesis (HLS) platforms. The recent availability FPGA-HLS is a great success due to availability of compilers for FPGAs as opposed to hardware description languages (HDLs) that requires special skills. However, the compilers within the HLS design platform includes limited support for all the standard libraries, wherein features like print functionality is not supported. The invention discloses techniques to enable print functionality in HLS design platforms based on source-to-source transformations and stream combining scheme. In addition to enabling print functionality, the invention also discloses a formatter technique to receive-format FPGA data into human interpretable data.
PROJECTION OF BUILD AND DESIGN-TIME INPUTS AND OUTPUTS BETWEEN DIFFERENT BUILD ENVIRONMENTS
Methods, systems, apparatuses, and computer program products are described that enable local builds to be substantially equivalent with remote builds. In embodiments, local build and design-time inputs and/or outputs of a local build environment hosted on a local computing device are projected to remote build and design-time inputs and/or outputs of a remote build environment hosted on a remote computing device. In further embodiments, remote build and design-time inputs and/or outputs of the remote build environment are projected to local build and design time inputs and/or outputs of the local build environment. In still further embodiments, first build and design-time inputs and/or outputs of a first build environment hosted on a computing device are projected to second build and design-time inputs and/or outputs of a second build environment hosted on the same computing device.
Transmission point pattern extraction from executable code in message passing environments
Extractable annotations are created and stored for different transmission points. In some instances, this occurs during compiling. One type of transmission point is a message being passed to another process. Once the transmission point is identified, a pattern defining output for the transmission point is then identified. A first extractable annotation defining the first patter is then created and stored for subsequent use.
METHOD OF REORDERING CONDITION CHECKS
Described is a computer-implemented method of reordering condition checks. Two or more condition checks in computer code that may be reordered within the code are identified. It is determined that the execution frequency of a later one of the condition checks is satisfied at a greater frequency than a preceding one of the condition checks. It is determined that there is an absence of side effects in the two or more condition checks. The values of the condition checks are propagated and abstract interpretation is performed on the values that are propagated. It is determined that the condition checks are exclusive of each other, and the condition checks are reordered within the computer code.