Patent classifications
G06F8/441
Adapting pre-compiled eBPF programs at runtime for the host kernel by offset inference
An approach is provided in which a method, system, and computer program product load a first program and a second program on a target host that includes a host kernel. Both programs are pre-compiled on a build system that is different from the target host. The method, system, and computer program product execute at least a subset of the first program on the host kernel, and that subset captures a set of kernel structure information from the host kernel. The method, system, and computer program product then load, at the target host, the set of kernel structure information into the second program at one or more placeholder locations, and execute at least a subset of the second program with the set of kernel structure information on the host kernel.
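The two-program flow in this abstract can be illustrated with a small sketch. It is not the patented implementation: the `Insn` record, the `PLACEHOLDER` marker, the field names, and the offsets are all hypothetical, and a plain dictionary stands in for the kernel structure information that the first program would capture on the host.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PLACEHOLDER = 0xDEADBEEF  # hypothetical marker emitted at build time


@dataclass
class Insn:
    op: str
    offset: int
    field: Optional[str] = None  # kernel struct field this placeholder stands for


def probe_offsets(host_layout: Dict[str, int]) -> Dict[str, int]:
    """Stand-in for the first program: report field offsets seen on the host."""
    return dict(host_layout)


def patch_placeholders(program: List[Insn], offsets: Dict[str, int]) -> List[Insn]:
    """Stand-in for loading the captured offsets into the second program."""
    patched = []
    for insn in program:
        if insn.field is not None:
            # Replace the build-time placeholder with the host-specific offset.
            patched.append(Insn(insn.op, offsets[insn.field]))
        else:
            patched.append(insn)
    return patched


# Hypothetical host layout; real offsets differ between kernel builds.
host_layout = {"task_struct.pid": 1256, "task_struct.comm": 1720}
second_program = [
    Insn("load", PLACEHOLDER, "task_struct.pid"),
    Insn("add", 8),
    Insn("load", PLACEHOLDER, "task_struct.comm"),
]
patched = patch_placeholders(second_program, probe_offsets(host_layout))
```

Only instructions that carry a field name are rewritten, so the rest of the pre-compiled program is left untouched.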
COMPILED SHADER PROGRAM CACHES IN A CLOUD COMPUTING ENVIRONMENT
Apparatuses, systems, and techniques for compiled shader program caches in a cloud computing environment.
GPR optimization in a GPU based on a GPR release mechanism
This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
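As a rough illustration of the accounting described in this abstract, the sketch below computes how many GPRs could be released once shader constants show that some branches are never taken. The branch table, the guard names, and the rule that the initial allocation equals the largest per-branch requirement are assumptions made for the example, not details taken from the disclosure.

```python
def releasable_gprs(branches, constants):
    """Return how many previously allocated GPRs are no longer needed.

    branches:  list of dicts with a 'guard' (constant name, or None for
               unconditional code) and the 'gprs' that branch requires.
    constants: mapping of constant name to bool; a branch whose guard is
               False is treated as unutilized.
    """
    # Assumption: the initial allocation covers the most demanding branch.
    allocated = max(b["gprs"] for b in branches)
    live = [
        b for b in branches
        if b["guard"] is None or constants.get(b["guard"], True)
    ]
    needed = max((b["gprs"] for b in live), default=0)
    return allocated - needed


# Hypothetical shader with one unconditional path and two guarded branches.
branches = [
    {"guard": None, "gprs": 12},
    {"guard": "use_fog", "gprs": 32},
    {"guard": "use_shadow", "gprs": 20},
]
freed = releasable_gprs(branches, {"use_fog": False, "use_shadow": True})
```

A guard missing from `constants` is conservatively treated as possibly taken, so the sketch never releases registers a live branch might need.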
METHODS AND DEVICES FOR COMPUTING A MEMORY SIZE FOR SOFTWARE OPTIMIZATION
There are provided methods and devices for computing a tile size for software optimization. A method includes receiving, by a computing device, information indicative of one or more of a set of loop bounds and a set of data shapes; processing, by the computing device, the information to determine a computation configuration based on the received information, the computation configuration implementable by a compiler, said processing including evaluating at least the computation configuration based on a build cost model, the build cost model representative of a data transfer cost and a data efficiency of the computation configuration; and transmitting, by the computing device, instructions directing the compiler to implement the computation configuration.
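A minimal sketch of cost-model-driven tile selection follows, assuming a tiled matrix multiply as the workload. The cost formula, the capacity limit, and the candidate tile sizes are illustrative assumptions; the abstract does not specify how transfer cost and data efficiency are combined.

```python
from itertools import product
from math import ceil


def footprint(tm, tn, k):
    """Elements held in fast memory for one tile of C[M,N] += A[M,K] @ B[K,N]."""
    return tm * k + k * tn + tm * tn


def cost(m, n, k, tm, tn):
    """Assumed cost model: data transfer cost divided by data efficiency."""
    tiles = ceil(m / tm) * ceil(n / tn)
    transfer = tiles * footprint(tm, tn, k)      # elements moved in total
    efficiency = (m * n) / (tiles * tm * tn)     # fraction of useful work
    return transfer / efficiency


def best_tile(m, n, k, candidates, capacity):
    """Pick the feasible (tm, tn) pair with the lowest modelled cost."""
    feasible = [
        (tm, tn)
        for tm, tn in product(candidates, repeat=2)
        if footprint(tm, tn, k) <= capacity
    ]
    if not feasible:
        raise ValueError("no candidate tile fits in the given capacity")
    return min(feasible, key=lambda t: cost(m, n, k, *t))


# Loop bounds M=N=64, K=16, with room for 1024 elements of fast memory.
tile = best_tile(64, 64, 16, candidates=[8, 16, 32], capacity=1024)
```

Here the loop bounds play the role of the received information, and the returned tile is the configuration a compiler would then be directed to implement.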
Compiler for RISC processor having specialized registers
A compiler is disclosed. The compiler is configured to generate executable code based on source code, where the source code includes a plurality of variables. The compiler includes an executable code generator configured to allocate a register to each of the source code variables, where the executable code generator is configured to select one of a group of register types to be allocated for each variable, and where the register allocated to each variable corresponds to the register type selected for that variable.
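The per-variable register type selection can be sketched as below. The register types (`address`, `data`, `predicate`), the pool sizes, and the rule that maps a variable's kind to a register type are hypothetical; the abstract only states that a type is selected per variable and a matching register is allocated.

```python
def select_register_type(kind):
    """Assumed rule: pointers use address registers, booleans use predicates."""
    if kind == "pointer":
        return "address"
    if kind == "bool":
        return "predicate"
    return "data"


def allocate(variables):
    """Allocate one register per variable from the pool of its selected type."""
    # Hypothetical register file with three specialized register types.
    pools = {
        "address": ["a0", "a1"],
        "data": ["r0", "r1", "r2"],
        "predicate": ["p0"],
    }
    allocation = {}
    for name, kind in variables.items():
        reg_type = select_register_type(kind)
        if not pools[reg_type]:
            raise RuntimeError(f"out of {reg_type} registers for {name!r}")
        allocation[name] = pools[reg_type].pop(0)
    return allocation


regs = allocate({"ptr": "pointer", "count": "int", "done": "bool", "total": "int"})
```

A real allocator would also handle spilling and live ranges; the sketch only shows that the register handed out always matches the type chosen for the variable.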
Function evaluation using multiple values loaded into registers by a single instruction
A technique for efficient calling of functions on a processor generates an executable program having a function call by analysing an interface for the function that defines an argument expression and an internal value used solely within the function, and an argument declaration defining an argument value to be provided to the function when the program is run. A data structure is generated including the internal value and a resolved argument value derived from the argument expression and the argument value. A single instruction is encoded in the program to utilise the data structure. When the program is executed on a processor, the single instruction causes the processor to load the argument value and internal value from the data structure into registers in the processor, prior to evaluating the function. The function can then be executed without further register loads being performed.
Techniques For Compiling High-Level Inline Code
A processor circuit includes a compiler configured to receive a software program that comprises software code coded in an assembly language and inline software code coded in a high-level programming language, compile the inline software code coded in the high-level programming language within the software program into assembly code in the assembly language, and compile the assembly code and the software code coded in the assembly language into machine code for the processor circuit. A method includes determining if first and second instructions in a software program are combinable into one instruction word, combining the first and the second instructions in the software program into one instruction word if the first and the second instructions are combinable, and fetching the instruction word into a single register by storing the instruction word in the single register.
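The instruction-combining step in the second half of this abstract can be sketched as a greedy pairing pass. The combinability test used here (different functional units and no register dependency between the two instructions) is an assumption chosen for illustration; the abstract does not define when two instructions are combinable.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    unit: str              # functional unit, e.g. "alu" or "mem" (hypothetical)
    dest: str              # destination register
    srcs: Tuple[str, ...]  # source registers


def combinable(a: Instr, b: Instr) -> bool:
    """Assumed rule: different units, and b neither reads nor overwrites a.dest."""
    return a.unit != b.unit and a.dest not in b.srcs and a.dest != b.dest


def pack(program: List[Instr]) -> List[Tuple[Instr, ...]]:
    """Greedily combine adjacent instructions into one instruction word."""
    words = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and combinable(program[i], program[i + 1]):
            words.append((program[i], program[i + 1]))
            i += 2
        else:
            words.append((program[i],))
            i += 1
    return words


program = [
    Instr("alu", "r1", ("r2", "r3")),
    Instr("mem", "r4", ("r5",)),
    Instr("alu", "r6", ("r4",)),
    Instr("alu", "r7", ("r6",)),
]
words = pack(program)
```

Each tuple in `words` models one instruction word that would be fetched into a single register; the last two instructions stay separate because they use the same unit.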
Compiler operations for heterogeneous code objects
Described herein are techniques for performing compilation operations for heterogeneous code objects. According to the techniques, a compiler identifies architectures targeted by a compilation unit, compiles the compilation unit into a heterogeneous code object that includes a different code object portion for each identified architecture, performs name mangling on functions of the compilation unit, links the heterogeneous code object with a second code object to form an executable, and generates relocation records for the executable.
Method of using multidimensional blockification to optimize computer program and device thereof
Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each of the plurality of basic block instructions contained in the first intermediate representation code for blockification. To blockify identical instructions, one or more groups of basic block instructions are assessed for eligibility for blockification. A group determined to be eligible is blockified using either one-dimensional SIMD vectorization or two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code, which is translated into executable target code that can be processed more efficiently.