Patent classifications
G06F9/30065
Using loop exit prediction to accelerate or suppress loop mode of a processor
A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
Processor Core, Processor And Method For Executing A Composite Scalar-Vector Very Lare Instruction Word (VLIW) Instruction
A processor core includes a storage device which stores a composite very large instruction word (VLIW) instruction, an instruction unit which obtains the composite VLIW instruction from the storage device and decodes the composite VLIW instruction to determine an operation to perform, and a composite VLIW instruction execution unit which executes the decoded composite VLIW instruction to perform the operation.
Method and apparatus for permuting streamed data elements
A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
ACCESSING DATA IN MULTI-DIMENSIONAL TENSORS
Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
Arithmetic processing unit and control method for arithmetic processing unit
An arithmetic processing unit includes an instruction decoder which decodes a fetch instruction to issue an execution instruction; a reservation station which temporarily stores the execution instruction; and an arithmetic unit which executes the execution instruction, and the fetch instruction includes a multi-flow instruction which is divided into divided instructions and a single instruction. The instruction decoder includes: a pre-decoder including N number of slots each of which divides the multi-flow instruction into divided instructions; a main decoder including N number of slots each of which decodes the instructions to issue an execution instruction; and a pre-decoder buffer including N−K number of slots each of which temporarily stores instructions in the pre-decoder. The instruction decoder repeats transferring the divided instructions and the single instructions from the slots of the pre-decoder and the slots of the pre-decoder buffer to the main decoder as much as possible in order.
RISC-V BRANCH PREDICTION METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM
A RISC-V branch prediction method and device, an electronic device and a computer readable storage medium are provided. On the basis of the prior art, the remaining jump times of the jump instruction are additionally acquired, and the single jump step length (the single jump step length is not fixed to be 1) is calculated according to the difference of remaining jump times during two consecutive jumps, whether the target jump instruction has executed the last jump can be judged according to the single jump step length of a jump instruction and in combination with the real-time remaining jump times, so as to determine the jump times that need to be executed subsequently according to the judgment result.
ADVANCED PROCESSOR ARCHITECTURE
The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
INFORMATION PROCESSING DEVICE, COMPILING METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM
An information processing device includes a memory, and a processor coupled to the memory and configured to detect an access pattern according to which a memory reference instruction in a first loop process to be executed posterior to a second loop process accesses first data elements in the memory every loop iteration, and insert a prefetch instruction to the second loop process based on the access pattern, the prefetch instruction being an instruction to transfer at least one of the first data elements from the memory to a first sector of a cache memory, the at least one of the first data elements transferred to the first sector of the cache memory being never cached out by a second data element different from each of the first data elements.
Execution or Write Mask Generation for Data Selection in a Multi-Threaded, Self-Scheduling Reconfigurable Computing Fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
INSTRUCTION ADDRESS TRANSLATION AND INSTRUCTION PREFETCH ENGINE
Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) caches address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB caches address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.