G06F8/451

Optimizing memory bandwidth in spatial architectures
11481329 · 2022-10-25

A technique to facilitate efficient, parallelized execution of a program using a multiprocessor system having two or more processors includes detecting and, optionally, minimizing broadcast data communication between a shared memory and two or more processors. To this end, the broadcast space of a data structure is generated as an intersection of the reuse space of the data structure and the placement space of a statement accessing the data structure. A non-empty broadcast space implies broadcast data communication that can be minimized by rescheduling the statement accessing the data structure.
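The intersection test described in this abstract can be illustrated with a deliberately coarse sketch. It assumes that the reuse space and the placement space can each be approximated by a set of loop-dimension names, which is a simplification chosen here for illustration; the patent's own representation is not given in the abstract. The function names `reuse_dims` and `broadcast_dims` are hypothetical.

```python
def reuse_dims(loop_dims, subscript_dims):
    """Loop dimensions absent from the array subscript.

    Assumption: moving along such a dimension revisits the same array
    element, so these dimensions stand in for the reuse space.
    """
    return set(loop_dims) - set(subscript_dims)


def broadcast_dims(loop_dims, subscript_dims, placement_dims):
    """Intersection of the (approximated) reuse and placement spaces.

    placement_dims: loop dimensions that are spread across processors.
    A non-empty result means several processors read the same element.
    """
    return reuse_dims(loop_dims, subscript_dims) & set(placement_dims)


# Statement reads A[j] inside loops (i, j).
# Distributing i across processors: every processor reads the same A[j].
print(broadcast_dims(("i", "j"), ("j",), ("i",)))
# Rescheduling so that j is distributed instead empties the intersection.
print(broadcast_dims(("i", "j"), ("j",), ("j",)))
```

In this toy model, rescheduling amounts to choosing a different placement dimension so that the intersection becomes empty.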

Arithmetic enhancement of C-like smart contracts for verifiable computation

A system converts high level source code into an arithmetic circuit that represents the functionality expressed in the source code, such as a smart contract as used in relation to a blockchain platform. The system processes a portion of high level source code to generate an arithmetic circuit. The arithmetic circuit comprises one or more arithmetic gates arranged to represent at least some of the functionality expressed in the source code.
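As a minimal sketch of the idea, the following flattens a single arithmetic expression into a list of gates. It assumes the source is restricted to `+`, `-`, and `*` over named inputs and integer constants, and it uses Python's `ast` module as a stand-in front end; the gate tuple format and the names `to_circuit` and `evaluate` are illustrative assumptions, not the patented system.

```python
import ast

OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
APPLY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def to_circuit(expr):
    """Return (gates, output_wire); each gate is (op, left, right, out)."""
    gates = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id  # input wire
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value  # constant operand
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = visit(node.left), visit(node.right)
            out = f"w{len(gates)}"  # fresh internal wire name
            gates.append((OPS[type(node.op)], left, right, out))
            return out
        raise ValueError("unsupported construct")

    return gates, visit(ast.parse(expr, mode="eval").body)


def evaluate(gates, output, inputs):
    """Evaluate the gate list on concrete input values."""
    wires = dict(inputs)
    value = lambda x: wires[x] if isinstance(x, str) else x
    for op, left, right, out in gates:
        wires[out] = APPLY[op](value(left), value(right))
    return value(output)


gates, out = to_circuit("a * b + c")
print(gates)  # [('mul', 'a', 'b', 'w0'), ('add', 'w0', 'c', 'w1')]
print(evaluate(gates, out, {"a": 2, "b": 3, "c": 4}))  # 10
```

Input names of the form `w0`, `w1`, and so on would collide with the internal wire names in this sketch.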

OFFLOAD SERVER, OFFLOAD CONTROL METHOD, AND OFFLOAD PROGRAM
20230066594 · 2023-03-02

An offload server includes: an application code analysis section configured to analyze source code of an application; a data transfer designation section configured to, on the basis of a result of the code analysis, designate GPU processing for a loop statement by using at least one selected from the group of directive clauses, of OpenACC, consisting of a ‘kernels’ directive clause, a ‘parallel loop’ directive clause, and a ‘parallel loop vector’ directive clause; and a parallel processing designation section configured to identify loop statements in the application, and, for each of the identified loop statements, specify a statement specifying application of parallel processing by the GPU and perform compilation.
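The directive-designation step can be sketched as a small source annotator. This is a naive, line-based illustration that assumes each loop starts on its own line with `for`; the function name `annotate_loops` and the `choices` mapping are hypothetical. The three directive strings are the OpenACC forms named in the abstract, and the compilation step is omitted.

```python
DIRECTIVES = ("kernels", "parallel loop", "parallel loop vector")


def annotate_loops(source, choices):
    """Insert '#pragma acc <directive>' above selected for-loops.

    choices maps a loop's order of appearance (0-based) to one of
    DIRECTIVES; loops not listed are left for the CPU.
    """
    out, loop_index = [], 0
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("for ") or stripped.startswith("for("):
            directive = choices.get(loop_index)
            if directive is not None:
                if directive not in DIRECTIVES:
                    raise ValueError(f"unknown directive: {directive}")
                indent = line[: len(line) - len(stripped)]
                out.append(f"{indent}#pragma acc {directive}")
            loop_index += 1
        out.append(line)
    return "\n".join(out)


c_source = "for (int i = 0; i < n; i++)\n    a[i] = b[i] + c[i];"
print(annotate_loops(c_source, {0: "parallel loop"}))
```

An offload search would presumably try different `choices` mappings, compile each result with an OpenACC compiler, and compare the measured performance; that loop is outside this sketch.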

Dynamic generation of cloud platform application programming interface calls
11630705 · 2023-04-18

An apparatus comprises a processing device configured to receive a request to execute an action on cloud assets of a cloud platform utilizing an application programming interface (API) exposed by the cloud platform, the request comprising a set of keyword arguments, and to generate a code class instance for the API. The processing device is also configured to instantiate, via the generated code class instance, a client for the cloud platform utilizing a first subset of arguments in the set of keyword arguments, to determine from the set of keyword arguments an identifier of the action to be executed, and to execute the action by running a function of the generated code class instance, the function dynamically generating an API call utilizing the instantiated client for the cloud platform, the determined identifier, and a second subset of arguments in the set of keyword arguments.
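The flow in this abstract (generate a class, instantiate a client from one subset of the keyword arguments, then dispatch the named action with the rest) can be sketched as follows. `FakeCloudClient`, the `action` key, and the helper names are hypothetical stand-ins chosen for illustration; no real cloud SDK is used.

```python
class FakeCloudClient:
    """Hypothetical stand-in for a cloud platform SDK client."""

    def __init__(self, region, token):
        self.region, self.token = region, token

    def stop_instance(self, instance_id):
        return f"stopped {instance_id} in {self.region}"


def make_api_class(client_cls, client_arg_names):
    """Generate a class whose run() builds the API call at run time."""

    def __init__(self, **kwargs):
        # First subset of the keyword arguments: client construction.
        self.client = client_cls(**{k: kwargs[k] for k in client_arg_names})

    def run(self, **kwargs):
        action = kwargs["action"]  # identifier of the action to execute
        # Second subset: everything not used for the client or the action.
        call_kwargs = {
            k: v for k, v in kwargs.items()
            if k not in client_arg_names and k != "action"
        }
        return getattr(self.client, action)(**call_kwargs)

    return type(f"{client_cls.__name__}Api", (), {"__init__": __init__, "run": run})


def execute(client_cls, client_arg_names, **kwargs):
    api = make_api_class(client_cls, client_arg_names)(**kwargs)
    return api.run(**kwargs)


print(execute(FakeCloudClient, ("region", "token"),
              region="eu", token="t", action="stop_instance",
              instance_id="vm-1"))  # stopped vm-1 in eu
```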

DYNAMIC COMPUTATION OFFLOADING TO GRAPHICS PROCESSING UNIT
20230061087 · 2023-03-02

A method includes receiving source code of a program to be compiled and compiling the source code of the program. Compiling the source code includes identifying a first function in the source code of the program that is a candidate to be executed by a graphics processing unit (GPU), generating a first intermediate representation and a second intermediate representation for the first function, and inserting a second function in the program in place of the first function, wherein the second function is to select one of the first intermediate representation or the second intermediate representation to be executed. The method further includes providing a compiled program package including the second function, the first intermediate representation and the second intermediate representation.
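The inserted "second function" behaves like a run-time dispatcher. In the sketch below, two plain Python callables stand in for the two intermediate representations, and the selection rule (GPU present and input large enough) is an assumption made for illustration; the abstract does not state the selection criterion. All names are hypothetical.

```python
def make_dispatcher(cpu_variant, gpu_variant, gpu_available, min_gpu_size):
    """Build the replacement function that picks a variant per call."""

    def dispatch(data):
        # Assumed policy: offload only when a GPU exists and the input
        # is large enough to amortize the transfer cost.
        if gpu_available() and len(data) >= min_gpu_size:
            return gpu_variant(data)
        return cpu_variant(data)

    return dispatch


# Stand-ins for the two compiled variants of the same function.
cpu_sum = lambda xs: ("cpu", sum(xs))
gpu_sum = lambda xs: ("gpu", sum(xs))

total = make_dispatcher(cpu_sum, gpu_sum, gpu_available=lambda: True,
                        min_gpu_size=4)
print(total([1, 2]))           # ('cpu', 3)
print(total([1, 2, 3, 4, 5]))  # ('gpu', 15)
```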

Packing conditional branch operations

Disclosed in some examples are systems, methods, devices, and machine-readable media that use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs), which have a topological ordering. These DAGs may be used to construct a dynamic programming table that finds a partial mapping of one path onto the other.
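To make the dynamic programming table concrete, the sketch below substitutes a standard longest-common-subsequence alignment for the patent's algorithm, which the abstract does not spell out. It assumes each branch path has already been linearized in topological order into a list of opcode names; `partial_mapping` is a hypothetical name.

```python
def partial_mapping(path_a, path_b):
    """Pair up matching operations of two paths, preserving order.

    Returns a list of (index_in_a, index_in_b) pairs found with an
    LCS-style dynamic programming table.
    """
    n, m = len(path_a), len(path_b)
    # table[i][j] = best number of matches between path_a[i:] and path_b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if path_a[i] == path_b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal mapping.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if path_a[i] == path_b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


then_path = ["load", "mul", "add", "store"]
else_path = ["load", "add", "sub", "store"]
print(partial_mapping(then_path, else_path))  # [(0, 0), (2, 1), (3, 3)]
```

Mapped pairs are candidates for sharing one packed instruction slot; unmapped operations stay specific to their branch.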

Compiler operations for heterogeneous code objects

Described herein are techniques for performing compilation operations for heterogeneous code objects. According to the techniques, a compiler identifies architectures targeted by a compilation unit, compiles the compilation unit into a heterogeneous code object that includes a different code object portion for each identified architecture, performs name mangling on functions of the compilation unit, links the heterogeneous code object with a second code object to form an executable, and generates relocation records for the executable.

Sparsity Uniformity Enforcement for Multicore Processor

Methods and systems relating to the field of parallel computing are disclosed herein. The methods and systems disclosed include approaches for sparsity uniformity enforcement for a set of computational nodes which are used to execute a complex computation. A disclosed method includes determining a sparsity distribution in a set of operand data, and generating, using a compiler, a set of instructions for executing, using the set of operand data and a set of processing cores, a complex computation. Alternatively, the method includes altering the operand data. The method also includes distributing the set of operand data to the set of processing cores for use in executing the complex computation in accordance with the set of instructions. Either the altering is conducted to, or the compiler is programmed to, balance the sparsity distribution among the set of processing cores.
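One simple way to picture balancing a sparsity distribution is a greedy assignment of operand blocks to cores by non-zero count. The largest-first heuristic below is a standard load-balancing technique swapped in for illustration, not the method claimed in the patent; the block representation and the name `balance_by_nonzeros` are assumptions.

```python
import heapq


def balance_by_nonzeros(blocks, num_cores):
    """Assign block indices to cores so non-zero counts are roughly even.

    Greedy largest-first: the densest remaining block always goes to
    the core that currently holds the fewest non-zero values.
    """
    counts = [sum(1 for x in block if x != 0) for block in blocks]
    heap = [(0, core) for core in range(num_cores)]  # (load, core id)
    assignment = [[] for _ in range(num_cores)]
    for idx in sorted(range(len(blocks)), key=lambda i: -counts[i]):
        load, core = heapq.heappop(heap)
        assignment[core].append(idx)
        heapq.heappush(heap, (load + counts[idx], core))
    return assignment


blocks = [[1, 2, 3, 4], [0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
print(balance_by_nonzeros(blocks, 2))  # [[0, 1], [3, 2]]
```

In this example each core ends up with five non-zero values.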

MULTI-LEVEL INTERMEDIATE REPRESENTATION DECODER FOR HETEROGENEOUS PLATFORMS

A method, apparatus, and a non-transitory computer-readable storage medium for generating heterogeneous platform code. The method may obtain a neural network model. The neural network model may be programmed to run on at least one platform. The method may also obtain an initial intermediate representation (IR) code by encoding the neural network model, and obtain a target IR code by adding decorations to the initial IR code based on a target platform. The method may also output an executable code optimized to run on the target platform by decoding the target IR code.

Systems and methods for memory layout determination and conflict resolution

A dataflow graph has operation units that are configured to be producer operation units to produce tensors for execution of the application, and to be consumer operation units to consume the tensors for execution of the application. Compile time logic is configured to process the dataflow graph to determine, for the tensors, expected producer memory layouts, expected consumer memory layouts, and current memory layouts. The expected producer memory layouts specify memory layouts required by the producer operation units that produce the tensors. The expected consumer memory layouts specify the memory layouts required by the consumer operation units that consume the tensors. The current memory layouts specify the memory layouts of the tensors. Each of the memory layouts includes a vector dimension and at least one of a vector ordering and a data alignment.
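The three layouts tracked per tensor suggest a straightforward comparison. The sketch below assumes a layout can be modeled as a small record with a vector dimension plus optional ordering and alignment fields, and it reports a conflict wherever the current layout differs from an expected one. The `Layout` fields and `find_conflicts` are hypothetical names, and resolution (for example, inserting a layout conversion) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layout:
    vector_dim: str                        # dimension that is vectorized
    vector_ordering: Optional[str] = None  # e.g. "row-major" (assumed label)
    alignment: Optional[int] = None        # data alignment in bytes (assumed unit)


def find_conflicts(tensors):
    """tensors maps name -> (expected_producer, expected_consumer, current).

    Returns (tensor_name, side) pairs where the current layout does not
    match what that side of the dataflow edge expects.
    """
    conflicts = []
    for name, (producer, consumer, current) in tensors.items():
        if current != producer:
            conflicts.append((name, "producer"))
        if current != consumer:
            conflicts.append((name, "consumer"))
    return conflicts


row = Layout("C", vector_ordering="row-major")
col = Layout("C", vector_ordering="column-major")
print(find_conflicts({"t0": (row, row, row), "t1": (row, col, row)}))
# [('t1', 'consumer')]
```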