G06F9/381

Using loop exit prediction to accelerate or suppress loop mode of a processor

A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.

PROVIDING REFERENCES TO PREVIOUSLY DECODED INSTRUCTIONS OF RECENTLY-PROVIDED INSTRUCTIONS TO BE EXECUTED BY A PROCESSOR

Providing references to previously decoded instructions of recently-provided instructions to be executed by a processor is disclosed herein. In one aspect, a low resource micro-operation controller is provided. Responsive to an instruction pipeline receiving an instruction address, the low resource micro-operation controller is configured to determine if the received instruction address corresponds to an instruction address in short history table. Short history table includes instruction addresses of recently-provided instructions having micro-ops in a post-decode queue. If the received instruction address corresponds to an instruction address in short history table, the low resource micro-operation controller is configured to provide reference (e.g., pointer) to the fetch stage that corresponds to an entry in the post-decode queue in which the micro-ops corresponding to the instruction address are stored. Responsive to the decode stage receiving the reference, the low resource micro-operation controller is configured to provide the micro-ops from the post-decode queue for execution.

Method and apparatus for permuting streamed data elements

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

Arithmetic processing unit and control method for arithmetic processing unit
11249763 · 2022-02-15 · ·

An arithmetic processing unit includes an instruction decoder which decodes a fetch instruction to issue an execution instruction; a reservation station which temporarily stores the execution instruction; and an arithmetic unit which executes the execution instruction, and the fetch instruction includes a multi-flow instruction which is divided into divided instructions and a single instruction. The instruction decoder includes: a pre-decoder including N number of slots each of which divides the multi-flow instruction into divided instructions; a main decoder including N number of slots each of which decodes the instructions to issue an execution instruction; and a pre-decoder buffer including N−K number of slots each of which temporarily stores instructions in the pre-decoder. The instruction decoder repeats transferring the divided instructions and the single instructions from the slots of the pre-decoder and the slots of the pre-decoder buffer to the main decoder as much as possible in order.

INFORMATION PROCESSING DEVICE, COMPILING METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

An information processing device includes a memory, and a processor coupled to the memory and configured to detect an access pattern according to which a memory reference instruction in a first loop process to be executed posterior to a second loop process accesses first data elements in the memory every loop iteration, and insert a prefetch instruction to the second loop process based on the access pattern, the prefetch instruction being an instruction to transfer at least one of the first data elements from the memory to a first sector of a cache memory, the at least one of the first data elements transferred to the first sector of the cache memory being never cached out by a second data element different from each of the first data elements.

Repeat Instruction for Loading and/or Executing Code in a Claimable Repeat Cache a Specified Number of Times

A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.

Automatic compiler dataflow optimization to enable pipelining of loops with local storage requirements

Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generate pipeline execution code for the local variable(s), and incorporate the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).

Processing method, device, equipment and storage medium of loop instruction

The present application discloses a processing method, device, equipment and storage medium of a loop instruction, and relates to the fields of voice and chips. A specific embodiment is: acquiring a computer program including a first loop body, where the first loop body is generated according to a second loop body in a software code to be compiled, the first loop body includes a plurality of first loop instructions, the plurality of first loop instructions can be identified by a hardware structure of a computer device; in the case that the first loop body is detected, determining loop parameters of the first loop body according to the plurality of first loop instructions; acquiring the plurality of first loop instructions according to the loop parameters of the first loop body; executing the plurality of first loop instructions.

ARTIFICIAL INTELLIGENCE COMPUTING DEVICE AND RELATED PRODUCT
20220156077 · 2022-05-19 ·

The invention provides an artificial intelligence computing device and a related product. The artificial intelligence computing device is used for executing machine learning computation. According to the device of the invention, for the instructions in the more than two instruction sets forming the loop body, the same operation code in the operation code storage area is used for the repeated instructions, so that the storage space of the operation code is saved, the code amount of each instruction in the instruction set in the second time slice can be reduced, the instruction storage space can also be saved, and the operation efficiency is improved.

Method and apparatus for dual issue multiply instructions

A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.