Patent classifications
G06F8/458
CODE COVERAGE GENERATION IN GPU BY USING HOST-DEVICE COORDINATION
System and method of compiling a program having a mixture of host code and device code to enable code coverage data collection for device code execution. An exemplary integrated compiler can compile source code programmed to be executed by a host processor (e.g., CPU) and a co-processor (e.g., a GPU) concurrently. The compilation can generate an instrumented executable code which includes: coverage instrumentation counters for the device functions; mapping information that maps the counters with the instrumented source points; and instructions for the host processor to allocate and initialize device memory for the counters and to retrieve collected code coverage information from the device memory to the host memory. Execution of the instrumented executable can yield a coverage report on the device code functions.
Parallelization compiling method, parallelization compiler, and vehicular device
A parallelization compiling method for generating a segmented program from a sequential program, in which multiple macro tasks are included and at least two of the macro tasks have a data dependency relationship with one another, includes determining an existence of invalidation information for invalidating at least a part of the data dependency relationship between the at least two of the plurality of macro tasks before compiling the sequential program into the segmented program, and generating the segmented program by compiling the sequential program into the segmented program with reference to a determination result of the existence of the invalidation information. When the invalidation information is determined to exist, the at least a part of the data dependency relationship is invalidated before the compiling of the sequential program into the segmented program.
ANNOTATION-DRIVEN FRAMEWORK FOR GENERATING STATE MACHINE UPDATES
Embodiments of the present disclosure relate to techniques for maintaining a state of a distributed system. In particular, certain embodiments relate to identifying a function. Some embodiments relate to, upon determining that the function comprises an annotation indicating that the function is capable of modifying the state of the distributed system, transforming the function to allow the function to generate updates to a state machine.
Method and apparatus for synchronization annotation
Methods and system for providing synchronization of a multi-threaded application includes analyzing a source file of the application to identify one or more synchronization annotations contained therein, wherein the synchronization annotations are defined using declarative statements. One or more synchronization annotation processors are identified and invoked for processing the one or more synchronization annotations identified in the source file so as to generate code files. The source file is compiled to generate one or more class files by compiling the procedural code within the source file to generate one or more class files, and compiling the code files to generate the one or more class files. The class files associated with the code files are used by the multiple threads during execution of the application to arbitrate access to methods and data manipulated by classes within the class files associated with the procedural code.
Dynamic alias checking with transactional memory
An approach to dynamic run-time alias checking comprising creating a main thread and a helper thread, computing an optimized first region of code in a rollback-only transactional memory associated with the main thread checking for one or more alias dependencies in an un-optimized first region of code, responsive to a determination in a predetermined amount of time that no alias dependencies are present in the un-optimized first region of code, committing a transaction and responsive to at least one of a failure to determine results of the check for one or more alias dependencies in the predetermined amount of time and a determination in the predetermined amount of time that alias dependencies are present in the un-optimized first region of code, performing a rollback of the transaction and executing the un-optimized first region of code.
SYSTEM AND METHOD TO ACCELERATE REDUCE OPERATIONS IN GRAPHICS PROCESSOR
Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.
COMPUTER SYSTEM AND METHOD FOR PARALLEL PROGRAM CODE OPTIMIZATION AND DEPLOYMENT
A compiler system, method and computer program product for optimizing a program is disclosed. The compiler includes an extractor module configured to extract, from an initial program code, a hierarchical task representation wherein each node of the hierarchical task representation corresponds to a potential unit of execution. The root node of the hierarchical task representation represents the entire initial program code and each child node represents a sub-set of units of execution of its respective parent node. It further has a parallelizer module configured to apply to the hierarchical task representation pre-defined parallelization rules associated with the processing device to automatically adjust the hierarchical task representation by assigning particular units of execution to particular processing units of the processing device and by inserting communication and/or synchronization in that the adjusted hierarchical task representation reflects parallel program code for the processing device.
METHOD AND DEVICE FOR SIMULATING SYNCHRONOUS BLOCKING IN ASYNCHRONOUS ENVIRONMENT, STORAGE MEDIUM, SERVER AND TERMINAL
Method and device for simulating synchronous blocking in an asynchronous environment, a storage medium, a server and a terminal are provided. The method includes: receiving converted codes from a server, wherein original codes run in a synchronous environment are converted to obtain the converted codes that are run in the asynchronous environment, and the converted codes comprise blocking detection codes; following execution of the converted codes, executing the blocking detection codes to detect whether there is a blocking; when the blocking is detected, sending a location of a blocking point to the server and saving an execution context; and after an operation corresponding to the blocking is performed, restoring the execution context and continuing executing codes after the location of the blocking point. Whether there is a blocking may be detected accurately and code execution efficiency may be improved.
SYSTEM AND METHOD TO ACCELERATE REDUCE OPERATIONS IN GRAPHICS PROCESSOR
Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and broadcast a result of an operation specified in association with the barrier synchronization request.
Code refactoring mechanism for asynchronous code optimization using topological sorting
Methods, systems, apparatuses, and computer program products are provided for transforming asynchronous code into more efficient, logically equivalent asynchronous code; Program code is converted into a first syntax tree. A dependency graph is generated from the first syntax tree with each node of the dependency graph corresponding to a code statement and having an assigned weight. Weighted topological sorting of the dependency graph is performed to generate a sorted dependency graph. A second syntax tree is generated from the sorted dependency graph. In another implementation, the program code is transformed into await-relaxed and/or loop-relaxed program code prior to being transformed into the first syntax tree.