IPIQ

G06F8/4441

Method and apparatus for enabling autonomous acceleration of dataflow AI applications

11573777 · 2023-02-07 ·

Huawei Technologies Co., Ltd.

A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. The method further includes determining, based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, the method includes retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution.
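The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that candidate groups are linear chains of operators, that the combine decision is a simple on-chip memory check, and that byte strings stand in for binary code segments. The names CodeUnit, candidate_chains, should_combine, and build_unit are hypothetical, as are the example graph and sizes.

```python
from dataclasses import dataclass

@dataclass
class CodeUnit:
    segments: list      # stand-ins for precompiled binary code segments
    control_flow: list  # metadata: (from_index, to_index) execution order

def candidate_chains(graph):
    """Return maximal linear chains: each op has one consumer, which has one producer."""
    preds = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            preds[dst].append(src)
    chains = []
    for node in graph:
        p = preds[node]
        if len(p) == 1 and len(graph[p[0]]) == 1:
            continue  # node is in the interior of a chain started elsewhere
        chain = [node]
        while len(graph[chain[-1]]) == 1 and len(preds[graph[chain[-1]][0]]) == 1:
            chain.append(graph[chain[-1]][0])
        if len(chain) > 1:
            chains.append(chain)
    return chains

def should_combine(chain, working_set_bytes, scratchpad_bytes):
    # Assumed rule: combine only if the chain's working set fits in on-chip memory.
    return sum(working_set_bytes[op] for op in chain) <= scratchpad_bytes

def build_unit(chain, binaries):
    # Retrieve each operator's segment and record the order in which they run.
    segments = [binaries[op] for op in chain]
    control_flow = [(i, i + 1) for i in range(len(chain) - 1)]
    return CodeUnit(segments, control_flow)

# Hypothetical dataflow graph: operator -> list of consumers.
graph = {"load": ["conv"], "conv": ["relu"], "relu": ["add"], "skip": ["add"], "add": []}
binaries = {op: f"<{op}.bin>".encode() for op in graph}
working_set = {"load": 4096, "conv": 8192, "relu": 2048, "skip": 1024, "add": 1024}

chains = candidate_chains(graph)
units = [build_unit(c, binaries) for c in chains if should_combine(c, working_set, 16384)]
```

In this sketch the unit would then be handed to the accelerator as a whole, so the host dispatches once per unit rather than once per operator.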

TECHNIQUES FOR PARALLEL EXECUTION

20220342673 · 2022-10-27 ·

Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.

ACCELERATION OF OPERATIONS

20220342666 · 2022-10-27 ·

Dz-ching Ju

Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having a smaller number of operations. In at least one embodiment, a sequence of matrix operations is accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
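As a rough illustration of combining a reordering operation with a multiplication, the pure-Python sketch below folds a transpose into the multiply by swapping index order, so no transposed copy is built. Transpose is only one assumed example of a reordering operation, and the function names are hypothetical.

```python
def transpose(m):
    """Materialize the transpose as a new matrix (the extra operation)."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Plain row-by-column matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matmul_at_b(a, b):
    """Compute transpose(a) @ b by reading a column-wise; no transposed copy is made."""
    rows, inner, cols = len(a[0]), len(a), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

a = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
b = [[1, 0], [0, 1], [1, 1]]   # 3 x 2

two_ops = matmul(transpose(a), b)   # reorder, then multiply
one_op = matmul_at_b(a, b)          # single combined operation
```

Both paths give the same 2 x 2 result; the combined version simply avoids the intermediate matrix.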

Loop lock reservation

11609752 · 2023-03-21 ·

International Business Machines Corporation

Andrew James Craik

Embodiments relate to a system, program product, and method for implementing loop lock reservations, and, more specifically, for holding a lock reservation across some or all of the iterations of a loop and, under certain conditions, temporarily causing a running thread to yield the reservation and allow other threads to enter the lock.

RANDOMIZED COMPILER OPTIMIZATION SELECTION FOR IMPROVED COMPUTER SECURITY

20230079426 · 2023-03-16 ·

California Institute Of Technology

Michael Ian Ferguson

A method and system provide the ability to compile computer source code. The source code is pre-processed to generate pure source code that includes definitions required for interpretation. The pure source code is formalized in a compiler, into assembly language that is processor specific. The formalization includes determining a set of two or more optimization routines, randomly selecting a selected optimization routine from the set of two or more optimization routines, and applying the selected optimization routine to each segment of the pure source code in a serialized manner. An executable binary file is then output and executed based on the formalized pure source code.
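The random selection step can be sketched in Python on a toy instruction format. This is an illustration under stated assumptions, not the method of the application: the two routines (fold_adds, drop_noops), the accumulator machine in run, and the choice to draw a routine separately for each segment are all hypothetical. The point shown is that each routine preserves program behavior, so different random draws yield different code with the same result.

```python
import random

def run(program, acc=0):
    """Execute a list of ("add" | "mul", int) instructions on an accumulator."""
    for op, arg in program:
        acc = acc + arg if op == "add" else acc * arg
    return acc

def fold_adds(segment):
    """Merge consecutive adds into one add."""
    out = []
    for op, arg in segment:
        if op == "add" and out and out[-1][0] == "add":
            out[-1] = ("add", out[-1][1] + arg)
        else:
            out.append((op, arg))
    return out

def drop_noops(segment):
    """Remove instructions that cannot change the accumulator."""
    return [(op, arg) for op, arg in segment
            if (op, arg) not in (("add", 0), ("mul", 1))]

def compile_randomized(segments, rng):
    """Process segments one after another, applying a randomly chosen routine to each."""
    routines = [fold_adds, drop_noops]
    out = []
    for segment in segments:
        out.extend(rng.choice(routines)(segment))
    return out

segments = [
    [("add", 1), ("add", 2), ("mul", 1)],
    [("add", 0), ("mul", 3), ("add", 4)],
]
baseline = run([ins for seg in segments for ins in seg])
variants = [compile_randomized(segments, random.Random(seed)) for seed in range(8)]
```

Every entry in variants computes the same value as baseline, while the emitted instruction sequences may differ between seeds.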

PARTIAL DATA TYPE PROMOTION TO EXPLOIT EFFICIENT VECTORIZATION IN MICROPROCESSORS

20230073063 · 2023-03-09 ·

Aspects of the invention include a compiler detecting an expression in a loop that includes elements of mixed data types. The compiler then promotes elements of a sub-expression of the expression to a same intermediate data type. The compiler then calculates the sub-expression using the elements of the same intermediate data type.
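A small Python sketch can show why promoting only a sub-expression to an intermediate type may be safe. It assumes a loop body of the form a * b + c with 8-bit a and b and a 32-bit c, and assumes 16 bits as the intermediate type; neither choice comes from the abstract. The helper wrap, which simulates fixed-width signed integers, is hypothetical.

```python
def wrap(value, bits):
    """Interpret value as a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def full_promotion(a, b, c):
    # Every operand is widened to 32 bits before any arithmetic.
    return wrap(wrap(a, 32) * wrap(b, 32) + c, 32)

def partial_promotion(a, b, c):
    # Only the sub-expression a * b is promoted, and only to 16 bits.
    product = wrap(wrap(a, 16) * wrap(b, 16), 16)
    return wrap(product + c, 32)

# Compare both strategies over every pair of 8-bit inputs.
int8_values = range(-128, 128)
mismatches = [
    (a, b)
    for a in int8_values
    for b in int8_values
    if full_promotion(a, b, 1000) != partial_promotion(a, b, 1000)
]
```

The product of two 8-bit values always fits in 16 bits, so the narrower intermediate type loses nothing here while allowing twice as many lanes per vector register as a 32-bit product would.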

Generating closures from abstract representation of source code

11474797 · 2022-10-18 ·

Capital One Services, Llc

Behdad Forghani

A device may receive source code and identify, based on the source code, an abstract syntax tree representing an abstract syntactic structure of the source code. Based on the abstract syntax tree, the device may identify a closure, the closure implementing a function based on at least a portion of the abstract syntax tree. In addition, the device may perform an action based on the closure.
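One way to picture building a closure from an abstract syntax tree is the sketch below, which uses Python's standard ast module. It handles only constants, names, and three binary operators, and the function name to_closure and the env dictionary convention are hypothetical; the patent does not specify this design.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def to_closure(node):
    """Turn a small expression AST into a closure that takes a variable environment."""
    if isinstance(node, ast.Expression):
        return to_closure(node.body)
    if isinstance(node, ast.Constant):
        value = node.value
        return lambda env: value
    if isinstance(node, ast.Name):
        name = node.id
        return lambda env: env[name]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        fn = _BIN_OPS[type(node.op)]
        left, right = to_closure(node.left), to_closure(node.right)
        return lambda env: fn(left(env), right(env))
    raise ValueError(f"unsupported node: {type(node).__name__}")

tree = ast.parse("x * 2 + y", mode="eval")   # source code -> abstract syntax tree
evaluate = to_closure(tree)                  # abstract syntax tree -> closure
result = evaluate({"x": 3, "y": 4})
```

The tree is walked once; after that, each call to evaluate runs only the captured functions, which is the "action based on the closure" in this sketch.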

Program rewrite device, storage medium, and program rewrite method

11599341 · 2023-03-07 ·

Fujitsu Limited

Masaki Arai

A program rewrite method executed by a computer includes: rewriting a program to output a first output group by performing operations for a first variable among a plurality of variables with a plurality of data types; rewriting the program to output a second output group by performing operations for a second variable among the plurality of variables with a plurality of data types; identifying, from the first output group and the second output group, a third output group that satisfies a predetermined criterion as a result of executing the rewritten programs; determining a data type that corresponds to the third output group as a use data type; and outputting a program in which the use data type is set for each of the plurality of variables.
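The search over data types can be approximated by the Python sketch below. It is a simplification under several assumptions: the program is a single hypothetical function of two variables, the candidate types are IEEE 754 half, single, and double precision (struct format characters "e", "f", "d"), only the stored value of one variable at a time is narrowed, and the criterion is an absolute error tolerance against the double-precision result. None of these specifics come from the abstract.

```python
import struct

FORMATS = ["e", "f", "d"]  # half, single, double precision; narrowest first

def round_to(value, fmt):
    """Round a Python float to the precision of the given struct format."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def program(x, y):
    # Hypothetical program under test.
    return x * x + y

def choose_types(inputs, tolerance):
    """For each variable, pick the narrowest type whose output stays within tolerance."""
    reference = program(**inputs)
    chosen = {}
    for name in inputs:
        for fmt in FORMATS:
            trial = dict(inputs)
            trial[name] = round_to(inputs[name], fmt)   # narrow this variable only
            if abs(program(**trial) - reference) <= tolerance:
                chosen[name] = fmt
                break
    return chosen

types = choose_types({"x": 0.5, "y": 1000.001}, tolerance=1e-4)
```

Here x is exactly representable in half precision, while y needs single precision to stay within the tolerance, so the two variables end up with different types.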

OFFLOAD SERVER, OFFLOAD CONTROL METHOD, AND OFFLOAD PROGRAM

20230066594 · 2023-03-02 ·

Yoji YAMATO

An offload server includes: an application code analysis section configured to analyze source code of an application; a data transfer designation section configured to, on the basis of a result of the code analysis, designate GPU processing for a loop statement by using at least one selected from the group of directive clauses, of OpenACC, consisting of a ‘kernels’ directive clause, a ‘parallel loop’ directive clause, and a ‘parallel loop vector’ directive clause; and a parallel processing designation section configured to identify loop statements in the application, and, for each of the identified loop statements, specify a statement specifying application of parallel processing by the GPU and perform compilation.

OFFLOAD SERVER, OFFLOAD CONTROL METHOD, AND OFFLOAD PROGRAM

20230065994 · 2023-03-02 ·

Yoji YAMATO

An offload server (1) includes: an application code analysis section (112) configured to analyze source code of an application; a data transfer designation section (113) configured to, on the basis of a result of the code analysis, designate a data transfer to be performed collectively, before starting GPU processing and after finishing the GPU processing, for those variables that need to be transferred between a CPU and a GPU, that are neither mutually referenced nor mutually updated between CPU processing and the GPU processing, and that are only to be returned to the CPU as a result of the GPU processing; and a parallel processing designation section (114) configured to identify loop statements in the application, and, for each of the identified loop statements, specify a statement specifying application of parallel processing by the GPU and perform compilation.
