G06F8/443

REGISTER PRESSURE TARGET FUNCTION SPLITTING
20230029183 · 2023-01-26 ·

Provided are embodiments for a method of performing register pressure targeted function splitting. The method can include determining a candidate region of a function, the candidate region comprising variables, and determining a number of available registers in a computing system for allocating the variables of the function. The method can also include grouping the variables in the candidate region into first variables and second variables based at least in part on the number of available registers, and splitting the candidate region of the function into split functions based at least in part on the grouping of the variables. Also provided are embodiments for a computer program product and a system for performing register pressure targeted function splitting

ZERO-COPY SPARSE MATRIX FACTORIZATION SYNTHESIS FOR HETEROGENEOUS COMPUTE SYSTEMS
20230024035 · 2023-01-26 ·

A system, method, and computer-readable medium for synthesizing zero-copy sparse matrix factorization operations in heterogeneous compute systems are provided. The system includes a host and an accelerator device. The host device is configured to divide an input matrix into a plurality of blocks which are transferred to a memory of the accelerator device. The host device is also configured to generate at least one index buffer that includes pointers to the block in the accelerator's memory, where each index buffer represents a frontal matrix associated with a matrix decomposition algorithm. The host processor is configured to receive one or more kernels configured to process the index buffer(s) on an accelerator device. The index buffers are processed by the accelerator device and the modified block data is written back to a memory of the host device to generate a factorized output matrix.

OPERATIONS ON MATRIX OPERANDS IRRESPECTIVE OF WHERE OPERANDS ARE STORED IN MEMORY
20230229588 · 2023-07-20 ·

Apparatus, systems, and techniques to transform data in memory for deep learning operations. In at least one embodiment, a compiler inserts one or more data transforms into a software program to transform one or more data elements arbitrarily arranged in memory and improve performance of one or more deep learning operations.

METHOD AND GRAPHICS PROCESSING SYSTEM FOR RENDERING ONE OR MORE FRAGMENTS HAVING SHADER-DEPENDENT PROPERTIES
20230230319 · 2023-07-20 ·

A graphics processing unit and method for processing fragments in a graphics processing system which includes: (i) hidden surface removal logic configured to perform hidden surface removal on fragments, and (ii) processing logic configured to execute shader programs for fragments. Initial processing of fragments is performed at the hidden surface removal logic. Some of the fragments have a shader-dependent property. A shader program for a particular fragment having the shader-dependent property is split into two stages. The initial processing comprises performing a depth test on the particular fragment. In response to the particular fragment passing the depth test of the initial processing in the hidden surface removal logic, a first stage, but not a second stage, of the shader program is executed for the particular fragment at the processing logic. The first stage of the shader program has instructions for determining the property of the particular fragment.

Isolating applications at the edge

Disclosed herein are enhancements for deploying application in an edge system of a communication network. In one implementation, a runtime environment identifies a request from a Hypertext Transfer Protocol (HTTP) accelerator service to be processed by an application. In response to the request, the runtime environment may identify an isolation resource to support the request, initiate execution of code for the application, and pass context to the code. Once initiated, the runtime environment may copy data from the artifact to the isolation resource using the context and return control to the HTTP accelerator service upon executing the code.

SYSTEMS AND METHODS FOR CODE GENERATION FOR A PLURALITY OF ARCHITECTURES

Systems and methods for code generation for a plurality of architectures. At a host architecture, a JIT compile operation is performed for a received JavaScript or Web Assembly file. The JIT compiler references a host library that has been updated to include at least one new JIT instruction. Output from the JIT compile operation is compiled machine code for the host architecture that has new opcodes (OPX) added, responsive to the new JIT instruction. The JIT compiler executes the opcodes (OPX) in XuCode mode, meaning that the host architecture switches into a hardware protected private ISA (Instruction Set Architecture) called XuCode to implement the new JIT opcode instruction in XuCode.

Cyclically dependent checks for software tamper-proofing
11698950 · 2023-07-11 · ·

Embodiments of the present disclosure relate to anti-tamper computer systems, in particular to methods and systems which can embed protection code into software. Among other things, the protection code helps prevent (and make it more costly) to reverse engineer to tamper with the protected software with malicious intent, such as, but not restricted to: the removal of a license protection mechanism; the removal of code displaying advertisements; the injection of a malicious thread into the program memory space; illicit usage; or any other kind of unauthorized modification of the software.

System and method for adapting executable object to a processing unit

Embodiments are generally directed to a system and method for adapting executable object to a processing unit. An embodiment of a method to adapt an executable object from a first processing unit to a second processing unit, comprises: adapting the executable object optimized for the first processing unit of a first architecture, to the second processing unit of a second architecture, wherein the second architecture is different from the first architecture, wherein the executable object is adapted to perform on the second processing unit based on a plurality of performance metrics collected while the executable object is performed on the first processing unit and the second processing unit.

Methods and apparatus to detect and annotate backedges in a dataflow graph

Disclosed examples to detect and annotate backedges in data-flow graphs include: a characteristic detector to store a node characteristic identifier in memory in association with a first node of a dataflow graph; a characteristic comparator to compare the node characteristic identifier with a reference criterion; and a backedge identifier generator to generate a backedge identifier indicative of a backedge between the first node and a second node of the dataflow graph based on the comparison, the memory to store the backedge identifier in association with a connection arc between the first and second nodes.

Compiler program, compiling method, information processing device

A compiler program causes a computer to execute optimization processing for an optimization target program. The optimization target program includes a loop including a vector store instruction and a vector load instruction for an array variable. The optimization processing includes (1) unrolling the vector store instruction and the vector load instruction in the loop by an unrolling number of times to generate a plurality of unrolled vector store instructions and a plurality of unrolled vector load instructions, and (2) scheduling to move an unrolled vector load instruction among the plurality of unrolled vector load instructions, which is located after a first unrolled vector store instruction that is located at first among the plurality of unrolled vector load instructions, before the first unrolled vector store instruction.