Patent classifications
G06F8/441
Register pressure target function splitting
Provided are embodiments for a method of performing register pressure targeted function splitting. The method can include determining a candidate region of a function, the candidate region comprising variables, and determining a number of available registers in a computing system for allocating the variables of the function. The method can also include grouping the variables in the candidate region into first variables and second variables based at least in part on the number of available registers, and splitting the candidate region of the function into split functions based at least in part on the grouping of the variables. Also provided are embodiments for a computer program product and a system for performing register pressure targeted function splitting.
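The grouping and splitting steps in this abstract can be illustrated with a minimal sketch. The abstract does not say how variables are ranked, so the use-count heuristic, the function names `group_variables` and `split_region`, and the `(variable, text)` statement format below are all assumptions made for illustration only.

```python
def group_variables(use_counts, available_registers):
    """Split variables into (first, second) groups.

    ASSUMPTION: variables are ranked by use count; the abstract only says the
    grouping depends at least in part on the number of available registers.
    """
    # Sort by descending use count, breaking ties by name for determinism.
    ranked = sorted(use_counts, key=lambda v: (-use_counts[v], v))
    first = ranked[:available_registers]   # expected to fit in registers
    second = ranked[available_registers:]  # moved to a separate split function
    return first, second


def split_region(statements, use_counts, available_registers):
    """Partition a region's statements into two split functions.

    Each statement is a hypothetical (defined_variable, source_text) pair.
    """
    first, _second = group_variables(use_counts, available_registers)
    first_set = set(first)
    split_a = [text for var, text in statements if var in first_set]
    split_b = [text for var, text in statements if var not in first_set]
    return split_a, split_b


if __name__ == "__main__":
    counts = {"a": 5, "b": 1, "c": 3, "d": 1}
    region = [("a", "a = x + 1"), ("b", "b = y * 2"),
              ("c", "c = a + 3"), ("d", "d = b - 1")]
    print(split_region(region, counts, available_registers=2))
```

A real compiler would also have to pass values that cross the split boundary as arguments or return values; the sketch ignores that.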
Compiler-based input synchronization for processor with variant stage latencies
The technology disclosed provides a system that comprises a processor with compute units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages, with each hardware stage executing a corresponding operation of the program at a different stage latency that depends on an operation type and an operand format. The system further comprises runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
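The compensating delay described here amounts to simple arithmetic: the faster producer's output is held in registers for as many cycles as it leads the slower one. The sketch below assumes latencies are measured in whole cycles; the stage names and latency values are hypothetical.

```python
def register_delays(stage_latencies):
    """Return the extra register delay (in cycles) needed per producer stage
    so that every output reaches the consumer in the same cycle."""
    slowest = max(stage_latencies.values())
    # The slowest producer needs no delay; each faster one waits the difference.
    return {name: slowest - latency for name, latency in stage_latencies.items()}


if __name__ == "__main__":
    # Hypothetical latencies for two producer stages feeding one consumer.
    print(register_delays({"fp32_mul": 4, "int8_add": 1}))
```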
Systems and Methods for Using Error Correction and Pipelining Techniques for an Access Triggered Computer Architecture
A method for improving performance of an access triggered architecture for a computer implemented application is provided. The method first executes typical operations of the access triggered architecture according to an execution time, wherein the typical operations comprise: obtaining a dataset and an instruction set; and using the instruction set to transmit the dataset to a functional block associated with an operation, wherein the functional block performs the operation using the dataset to generate a revised dataset. The method further creates a pipeline of the typical operations to reduce the execution time of the typical operations, to create a reduced execution time; and executes the typical operations according to the reduced execution time, using the pipeline.
Method and apparatus for performing register allocation
A method is provided of performing register allocation for at least one program code module. The method includes constructing a restriction graph for program variables within at least one program instruction, and determining whether the constructed restriction graph is colorable. If it is determined that the constructed restriction graph is not colorable, then the method determines whether at least one alternative form of the at least one program instruction is available, and modifies the at least one program instruction to comprise an alternative form if it is determined that at least one alternative form is available.
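The check-then-fall-back flow in this abstract can be sketched as follows. The sketch treats the restriction graph as an ordinary undirected graph (a dict mapping each variable to the set of variables it conflicts with) and uses an exact backtracking test for k-colorability. That representation and the names `is_colorable` and `choose_form` are assumptions, not the patent's method.

```python
def is_colorable(graph, num_colors):
    """Exact backtracking check: can every node get one of `num_colors`
    colors such that no two adjacent nodes share a color?"""
    nodes = sorted(graph)
    colors = {}

    def assign(i):
        if i == len(nodes):
            return True
        node = nodes[i]
        for color in range(num_colors):
            # Unassigned neighbors return None, which never equals a color.
            if all(colors.get(nb) != color for nb in graph[node]):
                colors[node] = color
                if assign(i + 1):
                    return True
                del colors[node]
        return False

    return assign(0)


def choose_form(alternative_forms, num_colors):
    """Return the index of the first instruction form whose graph is
    colorable, or None if no form is. Forms are hypothetical graphs."""
    for index, graph in enumerate(alternative_forms):
        if is_colorable(graph, num_colors):
            return index
    return None


if __name__ == "__main__":
    triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
    path = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
    print(choose_form([triangle, path], num_colors=2))
```

Backtracking is exponential in the worst case, so it is only practical for the small graphs built from a handful of instructions.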
Processor that includes a special store instruction used in regions of a computer program where memory aliasing may occur
Processor hardware detects when memory aliasing occurs, and assures proper operation of the code even in the presence of memory aliasing. The processor defines a special store instruction that is different from a regular store instruction. The special store instruction is used in regions of the computer program where memory aliasing may occur. Because the hardware can detect and correct for memory aliasing, this allows a compiler to make optimizations such as register promotion even in regions of the code where memory aliasing may occur.
Data processing method, computer readable medium and data processing device
A data processing method, a computer readable medium, and a data processing device capable of improving processing efficiency are provided. A storage destination of sub-read blocks is changed to a high-speed small-capacity memory on a high layer by adding a shape attribute in an attribute group for data blocks, adding a memory access monitoring unit for obtaining the shape attribute of a data block to the configuration of a data processing device, obtaining the shape attribute of the non-rectangular read block by executing a program on a trial basis, and propagating this shape attribute in a direction opposite to a data flow or a process flow within the program.
Reconfiguration of address space based on loading short pointer mode application
A short pointer mode application has been loaded. Based on determining that the short pointer mode application has been loaded, an address space configured for a long pointer mode environment is reconfigured. The address space has one portion addressable by short pointers of a defined size and another portion addressable by long pointers of another defined size, and the reconfiguring includes obtaining a long pointer library, and loading the long pointer library in the one portion of the address space addressable by short pointers.
Automatic generation of efficient vector code with low overhead in a time-efficient manner independent of vector width
A computing system includes a compatibility graph builder to generate a compatibility graph based on a dependency graph representing program source code, where the compatibility graph indicates compatibility relationships between operations represented in the dependency graph, a clique generator coupled with the compatibility graph builder to generate a set of candidate vector packings based on the compatibility relationships indicated in the compatibility graph, a set cover generator coupled with the clique generator to select a subset of vector packings from the set of candidate vector packings, and a vector code generator coupled with the set cover generator to generate the vector code based on the selected subset of vector packings.
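The pipeline in this abstract (compatibility graph, then cliques, then set cover) can be sketched in miniature. The sketch assumes the compatibility graph is given as a set of unordered pairs, enumerates cliques by brute force, and uses the standard greedy heuristic for the set cover step. The abstract does not specify any of these choices, and the operation names are hypothetical.

```python
from itertools import combinations


def candidate_packings(ops, compatible, width):
    """Enumerate groups of `width` mutually compatible operations,
    i.e. cliques of that size in the compatibility graph."""
    return [
        group
        for group in combinations(ops, width)
        if all(frozenset(pair) in compatible for pair in combinations(group, 2))
    ]


def greedy_cover(ops, candidates):
    """Greedy set cover: repeatedly pick the packing that covers the most
    still-uncovered operations. Returns (chosen packings, leftover ops)."""
    uncovered = set(ops)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda g: len(uncovered & set(g)), default=None)
        if best is None or not uncovered & set(best):
            break  # nothing left that a packing can cover
        chosen.append(best)
        uncovered -= set(best)
    return chosen, sorted(uncovered)


if __name__ == "__main__":
    ops = ["add0", "add1", "mul0", "mul1", "sub0"]
    compatible = {frozenset(("add0", "add1")), frozenset(("mul0", "mul1"))}
    packs = candidate_packings(ops, compatible, width=2)
    print(greedy_cover(ops, packs))
```

Operations left uncovered would stay scalar. Greedy cover may pick overlapping packings, which a real code generator would need to resolve.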
Method and system for optimizing access to constant memory
The disclosed systems, structures, and methods are directed to optimizing memory access to constants in heterogeneous parallel computers, including systems that support OpenCL. This is achieved in an optimizing compiler that transforms program scope constants and constants at the outermost scope of kernels into implicit constant pointer arguments. The optimizing compiler also attempts to determine access patterns for constants at compile-time and places the constants in a variety of memory types available in a compute device architecture based on these access patterns.
Compile method, non-transitory computer-readable recording medium storing compile program, and information processing device
An information processing device includes: a memory that stores a program; and a processor that executes the program to perform operations, wherein the operations include: specifying a first register which is allocated to scalar data and satisfies a condition that a survival interval of the scalar data includes a survival interval of first data to which no register is allocated; and allocating an empty area of the first register to the first data.
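The register selection condition in this abstract can be sketched as an interval containment test plus a free-space test. The `Allocation` record, the 64-bit register width, and the inclusive `(start, end)` interval encoding are assumptions made for illustration; the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    register: str     # register name (hypothetical)
    used_bits: int    # bits occupied by the scalar data already allocated
    interval: tuple   # (start, end) survival interval of that scalar, inclusive


def find_host_register(allocations, data_bits, data_interval, register_bits=64):
    """Return the first register whose scalar's survival interval contains
    `data_interval` and whose empty area can hold `data_bits`, else None."""
    start, end = data_interval
    for alloc in allocations:
        covers = alloc.interval[0] <= start and end <= alloc.interval[1]
        has_room = register_bits - alloc.used_bits >= data_bits
        if covers and has_room:
            return alloc.register
    return None


if __name__ == "__main__":
    allocs = [
        Allocation("v0", 32, (0, 10)),
        Allocation("v1", 64, (0, 20)),
        Allocation("v2", 16, (5, 8)),
    ]
    print(find_host_register(allocs, data_bits=32, data_interval=(2, 9)))
```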