G06F8/441

Compiler for translating between a virtual image processor instruction set architecture (ISA) and target hardware having a two-dimensional shift array structure
11182138 · 2021-11-23

A method is described that translates higher-level program code into lower-level instructions. The higher-level instructions have an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system. The lower-level instructions target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axes. The translating includes replacing the higher-level instructions having the instruction format with lower-level shift instructions that shift data within the shift register array structure.
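The replacement of coordinate-based pixel accesses by shifts can be sketched as follows. This is a minimal illustration, not the patented compiler: the instruction names `SHIFT` and `ACC`, the helpers `lower` and `run`, and the wrap-around behavior at the array edges are all assumptions made to keep the example short.

```python
def shift(plane, dx, dy):
    """Shift a 2D register plane so lane (x, y) receives the value that was
    at (x + dx, y + dy). Edges wrap around (an assumption of this sketch)."""
    h, w = len(plane), len(plane[0])
    return [[plane[(y + dy) % h][(x + dx) % w] for x in range(w)]
            for y in range(h)]


def lower(offsets):
    """Translate coordinate-relative pixel reads into shift instructions.

    Each (dx, dy) in `offsets` stands for a higher-level read of the pixel at
    (x + dx, y + dy). Instead of addressing memory, the lowered program moves
    the data: it shifts the plane by the difference from the previous
    position and accumulates the value now sitting in each lane.
    """
    program, cur = [], (0, 0)
    for dx, dy in offsets:
        step = (dx - cur[0], dy - cur[1])
        if step != (0, 0):
            program.append(("SHIFT", step[0], step[1]))
        program.append(("ACC",))
        cur = (dx, dy)
    return program


def run(program, plane):
    """Execute the lowered program; every lane runs the same instruction."""
    reg = [row[:] for row in plane]
    acc = [[0] * len(plane[0]) for _ in plane]
    for op in program:
        if op[0] == "SHIFT":
            reg = shift(reg, op[1], op[2])
        else:  # "ACC": add the lane's current register value to its sum
            acc = [[a + r for a, r in zip(arow, rrow)]
                   for arow, rrow in zip(acc, reg)]
    return acc


# A horizontal 3-tap sum expressed with pixel coordinates, then lowered.
program = lower([(-1, 0), (0, 0), (1, 0)])
result = run(program, [[1, 2, 3, 4]])
```

Each lane ends up with the sum of its left neighbor, itself, and its right neighbor without ever computing a memory address; only relative shifts are issued.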

Allocating Variables to Computer Memory
20220019531 · 2022-01-20

A method of allocating variables to computer memory includes determining, at compile time, when each of a plurality of variables is live in a memory region, and allocating a memory region to each variable, wherein at least some variables are allocated at compile time to overlapping memory regions, to be stored in those memory regions at runtime at non-overlapping times.
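One way to picture liveness-based overlapping allocation is a greedy first-fit placement over live ranges. The sketch below is an assumption about how such an allocator could look, not the claimed method: the `Var` record, the inclusive program-point ranges, and the largest-first ordering are all hypothetical choices.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    size: int   # bytes required
    start: int  # first program point where the variable is live (inclusive)
    end: int    # last program point where the variable is live (inclusive)


def allocate(variables):
    """Assign each variable a byte offset in one shared region.

    Two variables may share addresses only if their live ranges do not
    overlap in time. Placement is greedy first-fit, largest variable first.
    """
    offsets, done = {}, []
    for v in sorted(variables, key=lambda v: -v.size):
        # Address ranges occupied by variables that are live at the same time.
        busy = sorted((offsets[p.name], offsets[p.name] + p.size)
                      for p in done
                      if not (p.end < v.start or v.end < p.start))
        offset = 0
        for lo, hi in busy:
            if offset + v.size <= lo:
                break  # the gap before this busy range is large enough
            offset = max(offset, hi)
        offsets[v.name] = offset
        done.append(v)
    return offsets


layout = allocate([
    Var("a", size=4, start=0, end=2),
    Var("b", size=4, start=3, end=5),  # never live together with "a"
    Var("c", size=2, start=1, end=4),  # live together with both
])
```

Here `a` and `b` land on the same offset because they are live at different times, so the three variables fit in 6 bytes rather than 10.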

GPR OPTIMIZATION IN A GPU BASED ON A GPR RELEASE MECHANISM
20210358076 · 2021-11-18

This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.

Dynamic memory protection
11176060 · 2021-11-16

Presented herein are methods and systems for adjusting code files to apply memory protection for dynamic memory regions supporting run-time dynamic allocation of memory blocks. The code file(s), comprising a plurality of routines, are created for execution by one or more processors using the dynamic memory. Adjusting the code file(s) comprises analyzing the code file(s) to identify exploitation-vulnerable routine(s) and adding a memory integrity code segment. Upon execution completion of each vulnerable routine, the segment is configured to detect a write operation that overruns the memory space of one or more blocks in a subset of the most recently allocated blocks in the dynamic memory and spills into the memory space of an adjacent block. Detection uses marker(s) inserted in the dynamic memory at the boundary(ies) of each block in the subset. At runtime, if such a write operation is detected, the memory integrity code segment causes the processor(s) to initiate one or more predefined actions.
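The boundary-marker check can be illustrated with a toy heap. Everything here is a sketch under stated assumptions: the `Heap` class, the 4-byte `MARKER` value, the bump allocator, and the `guarded` wrapper are hypothetical stand-ins for the inserted memory integrity code segment.

```python
MARKER = b"\xde\xad\xbe\xef"  # hypothetical marker value


class Heap:
    """Toy bump allocator that places a marker after every block."""

    def __init__(self, size, track=4):
        self.mem = bytearray(size)
        self.top = 0
        self.track = track  # how many recent blocks are checked
        self.recent = []    # (start, length) of the most recent blocks

    def alloc(self, n):
        start, end = self.top, self.top + n
        if end + len(MARKER) > len(self.mem):
            raise MemoryError("heap exhausted")
        self.mem[end:end + len(MARKER)] = MARKER
        self.top = end + len(MARKER)
        self.recent = (self.recent + [(start, n)])[-self.track:]
        return start

    def write(self, addr, data):
        """Unchecked write, standing in for a raw pointer store."""
        if addr < 0 or addr + len(data) > len(self.mem):
            raise IndexError("write outside heap")
        self.mem[addr:addr + len(data)] = data

    def check(self):
        """Return start addresses of recent blocks whose marker is damaged."""
        return [s for s, n in self.recent
                if bytes(self.mem[s + n:s + n + len(MARKER)]) != MARKER]


def guarded(heap, routine, on_violation):
    """Run a vulnerable routine, then verify markers once it completes."""
    routine()
    damaged = heap.check()
    if damaged:
        on_violation(damaged)  # the predefined action


heap = Heap(64)
a = heap.alloc(8)
b = heap.alloc(8)
violations = []
guarded(heap, lambda: heap.write(a, b"x" * 8), violations.append)   # in bounds
guarded(heap, lambda: heap.write(a, b"y" * 10), violations.append)  # overruns a
```

Checking only the most recent blocks, and only after flagged routines return, keeps the cost bounded in this sketch; the in-bounds write triggers nothing, while the 10-byte write into the 8-byte block is reported.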

Multi-thread processing

A computer-implemented method for multi-thread processing, the method including: compiling a first plurality of threads using a corresponding first register set for each thread in the first plurality of threads, to obtain a first plurality of corresponding machine instruction codes; and fusing the first plurality of machine instruction codes using first instructions in an instruction set supported by a processing core, to obtain machine instruction code of a fused thread, the machine instruction code of the fused thread including thread portions corresponding to each thread of the first plurality of threads, in which the first instructions include load effective address instructions and control transfer instructions, in which the load effective address instructions and the control transfer instructions are compiled using a second register set, and in which jump operations between thread portions are implemented by the control transfer instructions inserted into the machine instruction code of the fused thread.

Systems and methods for reducing register bank conflicts based on a software hint bit causing a hardware thread switch

Mechanisms for reducing register bank conflicts based on software hint and hardware thread switch are disclosed. In some embodiments, an apparatus for thread switching includes a graphics processing unit (GPU) that includes a plurality of register banks to store operands that are assigned at least partially to avoid register bank conflicts. Decoding circuitry checks a thread switching field of a first instruction to be executed by a first thread. The GPU performs a thread switch mechanism to cause a second instruction to be executed by a second thread when the thread switching field of the first instruction is set.

Color selection schemes for storage allocation

A compiler-implemented technique for performing a storage allocation is described. Computer code to be converted into machine instructions for execution on an integrated circuit device is received. The integrated circuit device includes a memory having a set of memory locations. Based on the computer code, a set of values that are to be stored on the integrated circuit device are determined. An interference graph that includes the set of values and a set of interferences is constructed. While traversing the interference graph, a set of memory location assignments are generated by assigning the set of values to the set of memory locations in accordance with one or more color selection schemes.
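A small sketch of assigning values to memory locations over an interference graph, with the color selection scheme passed in as a function. The traversal order (highest degree first) and the two schemes `lowest_first` and `least_used` are assumptions for illustration; the abstract does not name specific schemes.

```python
def assign(interference, locations, pick):
    """Assign each value a memory location so that interfering values differ.

    interference: dict mapping a value to the set of values it interferes with
    locations:    list of available memory locations (the "colors")
    pick:         color selection scheme, called as pick(free, assignment)
    """
    assignment = {}
    # Visit the most constrained values first (an assumption of this sketch).
    for value in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[value] if n in assignment}
        free = [loc for loc in locations if loc not in taken]
        if not free:
            # A production allocator would need a fallback here, e.g. spilling.
            raise ValueError(f"no free location for {value}")
        assignment[value] = pick(free, assignment)
    return assignment


def lowest_first(free, assignment):
    """Scheme 1: always take the lowest-numbered free location."""
    return free[0]


def least_used(free, assignment):
    """Scheme 2: spread values by taking the free location used least so far."""
    used = list(assignment.values())
    return min(free, key=used.count)


graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
packed = assign(graph, [0, 1, 2], lowest_first)
spread = assign(graph, [0, 1, 2], least_used)
```

Both results are valid colorings of the same graph; the scheme only changes which legal location is chosen, packing values densely in one case and distributing them across locations in the other.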

Generating tie code fragments for binary translation

Systems and methods for binary translation of executable code. An example binary translation method comprises: decoding a current source code fragment compatible with a source instruction set architecture (ISA); identifying a first source register referenced by the current source code fragment; determining that the first source register is not referenced by a register mapping table, wherein the register mapping table comprises a plurality of entries, each entry specifying a source register, a target register, and a weight value; identifying, among the plurality of mapping table entries, a mapping table entry comprising a highest weight value, wherein the identified mapping table entry specifies a second source register and a second target register; replacing, in the identified mapping table entry, an identifier of the second source register with an identifier of the first source register; and translating, using the mapping table entry, the current source code fragment into a target code fragment, wherein the target code fragment is compatible with a target ISA.
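The mapping-table step can be sketched as follows. The `Entry` record and the helpers `map_register` and `translate` are hypothetical, and resetting the weight after a replacement is an assumption: the abstract does not say how weights are updated.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    source: str  # source-ISA register
    target: str  # target-ISA register it is currently mapped to
    weight: int


def map_register(table, src):
    """Return the target register for `src`, remapping an entry if needed."""
    for entry in table:
        if entry.source == src:
            return entry.target
    # Not in the table: reuse the entry with the highest weight, as the
    # abstract describes, by replacing its source register identifier.
    victim = max(table, key=lambda e: e.weight)
    victim.source = src
    victim.weight = 0  # assumption: the weight restarts after remapping
    return victim.target


def translate(fragment, table):
    """Translate (opcode, [source registers]) pairs using the mapping table."""
    return [(op, [map_register(table, r) for r in regs])
            for op, regs in fragment]


table = [Entry("r0", "t0", weight=1), Entry("r1", "t1", weight=5)]
target_code = translate([("add", ["r0", "r2"])], table)
```

After the call, `r2` has taken over the entry that previously mapped `r1`, so the fragment is emitted using `t0` and `t1`.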

Configuration of secondary processors

Systems and methods are provided for configuration of a secondary processor by a host processor. The host processor can access compiled firmware for the secondary processor, which has a parameter stored at a pre-determined address. The host processor can modify the parameter at the pre-determined address in the firmware to generate a modified firmware for the secondary processor. The host processor can further load the modified firmware into a memory of the secondary processor. The secondary processor can execute the modified firmware having the modified parameter. The host processor can further remodify the parameter in the memory of the secondary processor during runtime without having to recompile the firmware.
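Patching a parameter at a known address in a compiled image can be sketched with Python's `struct` module. The offset `PARAM_OFFSET`, the little-endian 32-bit format, and the helper names are assumptions for illustration; a bytearray stands in for the secondary processor's memory.

```python
import struct

PARAM_OFFSET = 0x10  # hypothetical pre-determined address of the parameter
PARAM_FORMAT = "<I"  # assumption: little-endian unsigned 32-bit value


def patch_parameter(image, value):
    """Return a modified copy of the firmware with the parameter replaced."""
    modified = bytearray(image)
    struct.pack_into(PARAM_FORMAT, modified, PARAM_OFFSET, value)
    return modified


def read_parameter(image):
    """Read the parameter back from a firmware image or memory buffer."""
    return struct.unpack_from(PARAM_FORMAT, image, PARAM_OFFSET)[0]


compiled = bytes(32)                               # stand-in compiled firmware
device_memory = patch_parameter(compiled, 115200)  # modify, then "load"
loaded_value = read_parameter(device_memory)

# Runtime re-modification: write the same address in device memory directly.
struct.pack_into(PARAM_FORMAT, device_memory, PARAM_OFFSET, 9600)
runtime_value = read_parameter(device_memory)
```

Because the parameter lives at a fixed address, the host changes behavior by rewriting a few bytes, and the compiled image itself is never rebuilt.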

Methods and devices for computing a memory size for software optimization

Methods and devices are provided for computing a tile size for software optimization. A method includes receiving, by a computing device, information indicative of one or more of a set of loop bounds and a set of data shapes; processing, by the computing device, the received information to determine a computation configuration implementable by a compiler, said processing including evaluating at least the computation configuration based on a build cost model, the build cost model representative of a data transfer cost and a data efficiency of the computation configuration; and transmitting, by the computing device, instructions directing the compiler to implement the computation configuration.