Patent classifications
G06F8/4441
Method and apparatus for enabling autonomous acceleration of dataflow AI applications
A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
TECHNIQUES FOR PARALLEL EXECUTION
Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
ACCELERATION OF OPERATIONS
Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having a smaller number of operations. In at least one embodiment, a sequence of matrix operations are accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
Loop lock reservation
Embodiments relate to a system, program product, and method for implementing loop lock reservations, and, more specifically, for holding a lock reservation across some or all of the iterations of a loop, and under certain conditions, temporarily effect a running thread to yield the reservation and allow other threads to enter the lock.
RANDOMIZED COMPILER OPTIMIZATION SELECTION FOR IMPROVED COMPUTER SECURITY
A method and system provide the ability to compile computer source code. The source code is pre-processed to generate pure source code that includes definitions required for interpretation. The pure source code is formalized in a compiler, into assembly language that is processor specific. The formalization includes determining a set of two or more optimization routines, randomly selecting a selected optimization routine from the set of two or more optimization routines, and applying the selected optimization routine to each segment of the pure source code in a serialized manner. An executable binary file is then output and executed based on the formalized pure source code.
PARTIAL DATA TYPE PROMOTION TO EXPLOIT EFFICIENT VECTORIZATION IN MICROPROCESSORS
Aspects of the invention include a compiler detecting an expression in a loop that includes elements of mixed data types. The compiler then promotes elements of a sub-expression of the expression to a same intermediate data type. The compiler then calculates the sub-expression using the elements of the same intermediate data type.
Generating closures from abstract representation of source code
A device may receive source code and identify, based on the source code, an abstract syntax tree representing an abstract syntactic structure of the source code. Based on the abstract syntax tree, the device may identify a closure, the closure implementing a function based on at least a portion of the abstract syntax tree. In addition, the device may perform an action based on the closure.
Program rewrite device, storage medium, and program rewrite method
A program rewrite method executed by a computer, the method includes rewriting a program to output a first output group by performing operations for a first variable among a plurality of variables with a plurality of data types; rewriting the program to output a second output group by performing operations for a second variable among the plurality of variables with a plurality of data types; identifying, from the first output group and the second output group, a third output group that satisfied a predetermined criterion as a result of executing the rewritten programs; determining a data type that corresponds to the third output group as a use data type; and outputting a program in which the use data type is set for each of the plurality of variables.
OFFLOAD SERVER, OFFLOAD CONTROL METHOD, AND OFFLOAD PROGRAM
An offload server includes: an application code analysis section configured to analyze source code of an application; a data transfer designation section configured to, on the basis of a result of the code analysis, designate GPU processing for a loop statement by using at least one selected from the group of directive clauses, of OpenACC, consisting of a ‘kernels’ directive clause, a ‘parallel loop’ directive clause, and a ‘parallel loop vector’ directive clause; and a parallel processing designation section configured to identify loop statements in the application, and, for each of the identified loop statements, specify a statement specifying application of parallel processing by the GPU and perform compilation.
OFFLOAD SERVER, OFFLOAD CONTROL METHOD, AND OFFLOAD PROGRAM
An offload server (1) includes: an application code analysis section (112) configured to analyze source code of an application; a data transfer designation section (113) configured to, on the basis of a result of the code analysis, designate a data transfer to be collectively performed on, before starting GPU processing and after finishing the GPU processing, of variables that need to be transferred between a CPU and a GPU, those which are not mutually referenced nor mutually updated between CPU processing and the GPU processing and which are only to be returned to the CPU as a result of the GPU processing; a parallel processing designation section (114) configured to identify loop statements in the application, and, for each of the identified loop statements, specify a statement specifying application of parallel processing by the GPU and perform compilation.