Patent classifications
G06F8/441
Deferred GPR allocation for texture/load instruction block
A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
Pre-instruction scheduling rematerialization for register pressure reduction
Examples are disclosed herein that relate to performing rematerialization operation(s) on program source code prior to instruction scheduling. In one example, a method includes prior to performing instruction scheduling on program source code, for each basic block of the program source code, determining a register pressure at a boundary of the basic block, determining whether the register pressure at the boundary is greater than a target register pressure, based on the register pressure at the boundary being greater than the target register pressure, identifying one or more candidate instructions in the basic block suitable for rematerialization to reduce the register pressure at the boundary, and performing a rematerialization operation on at least one of the one or more candidate instructions to reduce the register pressure at the boundary to be less than the target register pressure.
Method of Using Multidimensional Blockification To Optimize Computer Program and Device Thereof
Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.
GENERAL PURPOSE REGISTER HIERARCHY SYSTEM AND METHOD
A processing unit includes a first memory device and a second memory device. The first memory device includes a first plurality of general purpose registers (GPRs) and the second memory device includes a second plurality of GPRs. The second memory device includes fewer GPRs than the first memory device. Program data is stored at the first memory device and the second memory device based on expected frequency of accesses associated with the program data.
INFORMATION PROCESSING APPARATUS FOR DATA WRITING AND COMPUTER-READABLE RECORDING MEDIUM HAVING STORED THEREIN CONTROL PROGRAM
An apparatus includes: a memory and a processor that: obtains, when an operating system detects a first writing process writing first data into a region in a process space in which region a file stored in a storage is mapped, a first size of the first data from information recording a data size of target data for a target address for each of writing processes; reads, when the first size is less than a threshold, a second data corresponding to the first data and having a second size larger than the first size from the file stored in the storage into the memory; rewrites part of the second data stored in a writing region of the second data in the memory with the first data; and writes third data in the writing region into the file stored in the storage, the third data being a result of the rewriting.
SYSTEMS AND METHODS FOR REDUCING REGISTER BANK CONFLICTS BASED ON SOFTWARE HINT AND HARDWARE THREAD SWITCH
Mechanisms for reducing register bank conflicts based on software hint and hardware thread switch are disclosed. In some embodiments, an apparatus for thread switching includes a graphics processing unit (GPU) that includes a plurality of register banks to store operands that are assigned at least partially to avoid register bank conflicts. A decoding circuitry checks a thread switching field of a first instruction to be executed by a first thread. The GPU performs a thread switch mechanism to cause a second instruction to be executed by a second thread when the thread switching field of the first instruction is set.
Dynamic taint tracking on mobile devices
Taint is dynamically tracked on a mobile device. Taint virtual instructions are added to virtual instructions of a control-flow graph (CFG). A taint virtual instruction has a taint operand that corresponds to an operand of a virtual instruction and has a taint output that corresponds to an output of the virtual instruction in a block of the CFG. Registers are allocated for the taint virtual instruction and the virtual instructions. After register allocation, the taint virtual instruction and the virtual instructions are converted to native code, which is executed to track taint on the mobile device.
Method of using multidimensional blockification to optimize computer program and device thereof
Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.
Large lookup tables for an image processor
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks, the instruction in the kernel program are replaced with a sequence of instructions that, when executed by an execution lane, causes the execution lane to read a first value from a local memory bank and a second value from the local memory bank on behalf of another execution lane belonging to a different group of execution lanes.
Non-transitory computer-readable medium, assembly instruction conversion method and information processing apparatus
A non-transitory computer-readable medium having stored therein a program for causing a computer to execute a process. The process includes storing a plurality of generation instructions in a storage area for each of a plurality of first assembly instructions, each generation instruction instructing the generation of a machine language of a second assembly instruction that executes processing equivalent to each first assembly instruction, and generating machine languages of a plurality of second assembly instructions so that the machine languages of the second assembly instructions having a dependency relationship do not appear adjacent to each other, according to the plurality of generation instructions in the storage area.