Patent classifications
G06F9/30138
Method of using multidimensional blockification to optimize computer program and device thereof
Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.
Storage devices mapped to registers and mapping methods thereof
A storage device, which is coupled to a host and a first register, includes a first mapping register, a shadow register, and a controller. The first mapping register is configured to store the first address of the first register. The shadow register includes a first shadow section mapped to a register section of the first register. The controller receives an initialization instruction generated by the host to write the first address into the first mapping register so that the first shadow section is mapped to the first register section.
HIERARCHICAL GENERAL REGISTER FILE (GRF) FOR EXECUTION BLOCK
In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
Non-Cached Loads and Stores in a System Having a Multi-Threaded, Self-Scheduling Processor
Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute instructions; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In a representative embodiment, the processor core is further adapted to execute a non-cached load instruction to designate a general purpose register rather than a data cache for storage of data received from a memory circuit. The core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, and to generate one or more work descriptor data packets to another circuit for execution of corresponding execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.
Systems and methods for synchronizing data processing in a cellular modem
A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. The processing engines can be arranged in pipelines, with different processing engines executing different steps in a sequence of operations. Flow control or data synchronization between pipeline stages can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management. Firmware instructions can define data flow by reference to a virtual address space associated with pipeline buffers. A hardware interlock controller within the pipeline can track and enforce the data dependencies for the pipeline.
Technique for Overriding Memory Attributes
Techniques are disclosed relating to an apparatus that includes a plurality of memory access control registers that are programmable with respective address ranges within an address space. The apparatus further includes a memory access circuit configured to receive a command for performing a memory access, the command specifying an address corresponding to a location in a memory circuit. In response to the address being located within an address range of a particular one of the plurality of memory access control registers, the memory access circuit is configured to perform the command using override memory parameters that have been programmed into the particular memory access control register instead of a default set of attributes for the address space.
Data processing
Data processing circuitry comprises out-of-order instruction execution circuitry; register mapping circuitry to map zero or more architectural processor registers relating to execution of that program instruction to respective ones of a set of physical processor registers; commit circuitry to commit, in a program code order, the results of executed program instructions, the commit circuitry being configured to access a data store which stores register tag data to indicate which physical registers mapped by the register mapping circuitry relate to a given program instruction; fault detection circuitry to detect a memory access fault in respect of a vector memory access operation and to generate fault indication data indicative of an element earliest in the element order for which a memory access fault was detected; a fault indication register to store the fault indication data, in which the register mapping circuitry is configured to generate a register mapping for a program instruction for any architectural processor registers relating to execution of that program instruction other than the fault indication register; and control circuitry to encode the fault indication data, applicable to a program instruction not yet committed by the commit circuitry, to register tag data associated with that program instruction.
Method of Using Multidimensional Blockification To Optimize Computer Program and Device Thereof
Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.
Systems and methods for increased bandwidth utilization regarding irregular memory accesses using software pre-execution
Systems and methods are configured to receive code containing an original loop that includes irregular memory accesses. The original loop can be split. A pre-execution loop that contains code to prefetch content of the memory can be generated. Execution of the pre-execution loop can access memory inclusively between a starting location and the starting location plus a prefetch distance. A modified loop that can perform at least one computation based on the content prefetched with execution of the pre-execution loop can be generated. Execution of the main loop can to follow the execution of the pre-execution loop. The original loop can be replaced with the pre-execution loop and the modified loop.
OPPORTUNISTIC WRITE-BACK DISCARD OF SINGLE-USE VECTOR REGISTER VALUES
A method for performing opportunistic write-back discard of single-use vector register values. The method includes executing instructions of a GPU in a default mode, detecting a beginning of a single-use section that includes instructions that produce single-use vector register values, and executing instructions in a single-use mode. The method includes discarding the write-back of a single-use vector register value if the single-use value gets forwarded either via a bypass path or via register file cache. The method includes inserting hint instructions into an executable program code that demarcates single-use sections. A system includes a microprocessor to execute instructions in the default mode. The microprocessor detects a beginning and an ending of a single-use section that includes instructions that produce single-use vector register values.