G06F8/441

RELAXING USER-SPECIFIED REGISTER CONSTRAINTS FOR IMPROVING REGISTER ALLOCATION
20170286078 · 2017-10-05

A method is provided for relaxing register constraints in a computer program. The method includes identifying, by a processor enabled compiler, unrequired register constraints imposed by a user on the computer program. The unrequired register constraints are unrequired for a proper operation of the computer program. The method further includes automatically relaxing, by the processor enabled compiler, the identified unrequired register constraints to optimize register allocation for the computer program.

COMPILER FOR TRANSLATING BETWEEN A VIRTUAL IMAGE PROCESSOR INSTRUCTION SET ARCHITECTURE (ISA) AND TARGET HARDWARE HAVING A TWO-DIMENSIONAL SHIFT ARRAY STRUCTURE
20170242669 · 2017-08-24

A method is described that includes translating higher level program code into lower level instructions. The higher level program code includes higher level instructions having an instruction format that identifies pixels to be accessed from a memory with first and second coordinates from an orthogonal coordinate system. The lower level instructions target a hardware architecture having an array of execution lanes and a shift register array structure that is able to shift data along two different axes. The translating includes replacing the higher level instructions having the instruction format with lower level shift instructions that shift data within the shift register array structure.

Multi-version shaders
11243752 · 2022-02-08

Described herein are techniques for generating a stitched shader program. The techniques include identifying a set of shader programs to include in the stitched shader program, wherein the set includes at least one multiversion shader program that includes a first version of instructions and a second version of instructions, wherein the first version of instructions uses a first number of resources that is different than a second number of resources used by the second version of instructions. The techniques also include combining the set of shader programs to form the stitched shader program. The techniques further include determining a number of resources for the stitched shader program. The techniques also include, based on the determined number of resources, modifying the instructions corresponding to the multiversion shader program so that, when executed, they run either the first version of instructions or the second version of instructions.

INPUT/OUTPUT (I/O) BINDING WITH AUTOMATIC INTERNATIONAL ELECTROMECHANICAL COMMISSION (IEC) ADDRESS GENERATION IN REMOTE TERMINAL UNIT (RTU) CONFIGURATION
20170235691 · 2017-08-17

A method includes automatically assigning, by at least one processor, an IEC address to an I/O binding variable for an RTU. This includes identifying a type of the I/O binding variable and identifying a size of the I/O binding variable based on the identified type. The size represents a number of memory locations to be used to store the I/O binding variable in at least one memory of the RTU. This also includes, in response to determining that the at least one memory contains a free space to store the I/O binding variable based on the identified size, assigning the IEC address identifying the free space to the I/O binding variable.
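
The assignment steps in this abstract (identify the type, derive the size, look for free space, assign the address) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `TYPE_SIZES` table, the function name `assign_address`, and the use of a plain integer offset in place of a real IEC address are all hypothetical and do not come from the abstract.

```python
# Hypothetical sizes, in memory locations, for a few variable types.
# The abstract does not list concrete types or sizes.
TYPE_SIZES = {"BOOL": 1, "INT": 2, "REAL": 4}


def assign_address(var_type, memory_used):
    """Find the first free run of locations large enough for var_type.

    memory_used is a list of booleans, one per memory location.
    On success the run is marked as used and its start offset is returned
    (standing in for the IEC address); otherwise None is returned.
    """
    size = TYPE_SIZES[var_type]      # size follows from the identified type
    run = 0                          # length of the current free run
    for i, used in enumerate(memory_used):
        run = 0 if used else run + 1
        if run == size:
            start = i - size + 1
            for j in range(start, i + 1):
                memory_used[j] = True
            return start
    return None                      # no free space of the required size


memory = [True, False, False, True, False, False, False, False]
first = assign_address("REAL", memory)
second = assign_address("INT", memory)
third = assign_address("INT", memory)
```

A first-fit scan is used here only because it is the simplest choice; the abstract does not say how the free space is located.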

High-level programming language which utilizes virtual memory
11429390 · 2022-08-30

Systems and methods for utilizing virtual memory with a high-level programming language are provided. Multiple address spaces are created in virtual memory, wherein each of the multiple address spaces includes data entries, each of which has a value. A machine executable software program is operated which utilizes each of said multiple address spaces. At least a first one of the address spaces is independent from at least a second one of said address spaces, and at least a third one of the address spaces is electronically associated with at least a fourth one of the address spaces.

PARALLEL PROCESSING ARCHITECTURE USING DISTRIBUTED REGISTER FILES
20220308872 · 2022-09-29

Techniques for task processing based on a parallel processing architecture using distributed register files are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. The array of compute elements is controlled on a cycle-by-cycle basis. The controlling is enabled by a stream of wide control words generated by the compiler. Virtual registers are mapped to a plurality of physical register files distributed among one or more of the compute elements. Virtual registers are represented by the compiler. The mapping is performed by the compiler. A broadcast write operation is enabled to two or more of the physical register files. Operations contained in the control words are executed. Operations are enabled by at least one of the distributed physical register files. Implementation in separate compute elements enables parallel operation processing.
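
As a loose illustration of the mapping and broadcast-write ideas in this abstract, the sketch below models each physical register file as a Python list and lets one virtual register map to slots in several files. The class name, method names, and data layout are all assumptions made for the example; the abstract does not describe the compiler's actual representation.

```python
class DistributedRegisterFiles:
    """Toy model: virtual registers mapped onto several physical files."""

    def __init__(self, num_files, slots_per_file):
        # One list per physical register file (hypothetical layout).
        self.files = [[0] * slots_per_file for _ in range(num_files)]
        # Virtual register name -> list of (file_index, slot) locations.
        self.mapping = {}

    def map_virtual(self, name, locations):
        """Record where a virtual register lives (done by the compiler)."""
        self.mapping[name] = list(locations)

    def broadcast_write(self, name, value):
        """Write the value to every physical location of the register."""
        for file_index, slot in self.mapping[name]:
            self.files[file_index][slot] = value

    def read(self, name, file_index):
        """Read the copy of the register held in one specific file."""
        for f, slot in self.mapping[name]:
            if f == file_index:
                return self.files[f][slot]
        raise KeyError(f"{name} is not mapped into file {file_index}")


regs = DistributedRegisterFiles(num_files=3, slots_per_file=4)
regs.map_virtual("v0", [(0, 1), (2, 3)])   # v0 lives in files 0 and 2
regs.broadcast_write("v0", 42)
```

Keeping a copy in more than one file means compute elements attached to different files can each read the value locally, which is one plausible reading of why a broadcast write is useful here.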

Generating tie code fragments for binary translation

Systems and methods for binary translation of executable code. An example binary translation method comprises: decoding a current source code fragment compatible with a source instruction set architecture (ISA); identifying a first source register referenced by the current source code fragment; determining that the first source register is not referenced by a register mapping table, wherein the register mapping table comprises a plurality of entries, each entry specifying a source register, a target register, and a weight value; identifying, among the plurality of mapping table entries, a mapping table entry comprising a highest weight value, wherein the identified mapping table entry specifies a second source register and a second target register; replacing, in the identified mapping table entry, an identifier of the second source register with an identifier of the first source register; and translating, using the mapping table entry, the current source code fragment into a target code fragment, wherein the target code fragment is compatible with a target ISA.
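
The table update described in this abstract can be sketched as below, assuming a small in-memory table. The `Entry` class and `map_register` function are hypothetical names. The abstract does not say what the weight measures or whether it changes after the replacement, so the sketch leaves the weight untouched.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    source: str   # source-ISA register identifier
    target: str   # target-ISA register identifier
    weight: int   # weight value; its meaning is not given in the abstract


def map_register(table, source_reg):
    """Return the target register to use for source_reg.

    If source_reg is already in the table, reuse its entry. Otherwise
    pick the entry with the highest weight, as the abstract states, and
    replace its source register identifier with source_reg.
    """
    for entry in table:
        if entry.source == source_reg:
            return entry.target
    victim = max(table, key=lambda e: e.weight)
    victim.source = source_reg    # weight is left unchanged (assumption)
    return victim.target


table = [Entry("r1", "t0", 1), Entry("r2", "t1", 5)]
hit = map_register(table, "r1")    # already mapped
miss = map_register(table, "r3")   # not mapped: reuses the weight-5 entry
```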

EMBEDDED COMPUTATION INSTRUCTION SET OPTIMIZATION
20220237008 · 2022-07-28

The technology disclosed herein pertains to a system and method for providing optimization of embedded computation instruction set (CIS), the method including downloading the CIS to a computational storage device (CSD), committing the CIS to a program slot in a computational storage processor of the CSD, simulating execution of the CIS at the committed slot to generate static analysis of one or more registers of the CIS to determine ranges of values that the one or more registers can take through a lifecycle of the CIS, demoting one or more of the registers to lower size registers, and generating a native instruction set from the CIS based on the register demotions.
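
The register demotion step can be illustrated with a small helper that picks the narrowest width able to hold a statically derived value range. This is a sketch under assumptions: the candidate widths, the signed two's-complement interpretation, and the name `demoted_width` are illustrative and are not taken from the abstract.

```python
def demoted_width(lo, hi, widths=(8, 16, 32, 64)):
    """Return the narrowest signed register width that holds [lo, hi].

    lo and hi are the minimum and maximum values the register can take
    over the lifecycle of the program, as found by static analysis.
    """
    for width in widths:
        min_val = -(1 << (width - 1))
        max_val = (1 << (width - 1)) - 1
        if min_val <= lo and hi <= max_val:
            return width
    raise ValueError("value range does not fit in any candidate width")


small = demoted_width(0, 100)
medium = demoted_width(-129, 0)
large = demoted_width(0, 40000)
widest = demoted_width(0, 2**31)
```

A register whose analyzed range is 0 to 100 would be demoted to 8 bits under these assumptions, while a range reaching 40000 needs 32 bits because the signed 16-bit maximum is 32767.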

Compiler sub expression directed acyclic graph (DAG) remat for register pressure

The present disclosure relates to devices and methods for transforming program source code using a rematerialization operation. The devices and methods may identify at least one hot spot with high register pressure in a program source code for an application and identify a plurality of live variables within the at least one hot spot. The devices and methods may group the plurality of live variables by a basic block that has contained a define or single use of the plurality of live variables. The devices and methods may build a directed acyclic graph (DAG) for each basic block that has a grouped plurality of live variables. The devices and methods may save the DAG as a candidate instruction to move in the program source code and may generate transformed program source code for the application by moving the candidate instruction.

Context switching locations for compiler-assisted context switching

Generating context switching locations for compiler-assisted context switching. A set of possible locations is determined for preferred preemption points in a set of threads based on (i) an identification of a set of candidate markers for preferred preemption points and (ii) a type of characteristic that is associated with a possible location included in the set of possible locations. A modified set of possible locations is generated in a data structure based on the type of characteristic, wherein the modified set of possible locations indicates one or more preferred preemption points in the set of threads.