G06F9/30069

METHOD PERFORMED BY A MICROCONTROLLER FOR MANAGING A NOP INSTRUCTION AND CORRESPONDING MICROCONTROLLER
20230195460 · 2023-06-22

Disclosed herein is a method for managing NOP instructions in a microcontroller, the method comprising: duplicating all jump instructions that cause a NOP instruction, to form a new instruction set; inserting an internal NOP instruction into each of the duplicated jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when execution of the subsequent instruction is skipped.
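
The idea can be sketched with a toy interpreter (an assumed encoding, not the patent's actual ISA): each jump variant carries an internal-NOP flag, so the jump itself supplies the NOP when the slot instruction after it is skipped, and no separate NOP opcode has to occupy program memory.

```python
def run_program(program):
    """Toy ISA: ('jmp', target, internal_nop) and ('add', n) instructions.
    When internal_nop is set, the jump's built-in NOP fills the slot;
    otherwise the subsequent instruction executes before the jump takes effect."""
    pc, acc, trace = 0, 0, []
    while pc < len(program):
        instr = program[pc]
        if instr[0] == 'jmp':
            _, target, internal_nop = instr
            if internal_nop:
                trace.append('nop')          # slot skipped: the jump supplies the NOP
            else:
                slot = program[pc + 1]       # execute the subsequent instruction
                acc += slot[1]
                trace.append('slot')
            pc = target
        else:                                # ('add', n)
            acc += instr[1]
            trace.append('add')
            pc += 1
    return acc, trace
```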

Skip instruction to skip a number of instructions on a predicate
09830153 · 2017-11-28

A pipelined run-to-completion processor executes a conditional skip instruction. If a predicate condition as specified by a predicate code field of the skip instruction is true, then the skip instruction causes execution of a number of instructions following the skip instruction to be “skipped”. The number of instructions to be skipped is specified by a skip count field of the skip instruction. In some examples, the skip instruction includes a “flag don't touch” bit. If this bit is set, then neither the skip instruction nor any of the skipped instructions can change the values of the flags. Both the skip instruction and following instructions to be skipped are decoded one by one in sequence and pass through the processor pipeline, but the execution stage is prevented from carrying out the instruction operation of a following instruction if the predicate condition of the skip instruction was true.
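
A minimal model of that pipeline behavior (simplified: the flag-don't-touch bit is omitted, and the predicate is looked up in a flags dictionary) shows skipped instructions still flowing past the execute stage without taking effect:

```python
def run_skip(instrs, flags):
    """('skip', predicate, count) suppresses the next `count` instructions when
    flags[predicate] is true; they still pass through one by one, mirroring
    the pipeline behavior described in the abstract."""
    acc, suppress, trace = 0, 0, []
    for op in instrs:
        if suppress:
            suppress -= 1
            trace.append('suppressed')   # decoded, but the execute stage is held off
            continue
        if op[0] == 'skip':
            if flags.get(op[1]):
                suppress = op[2]         # skip count from the skip instruction
            trace.append('skip')
        else:                            # ('add', n)
            acc += op[1]
            trace.append('add')
    return acc, trace
```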

Sparse systolic array design

A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
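
A software sketch of the skip module (the hardware would do this on operand streams, but the filtering logic is the same): zero operands are dropped, and each survivor is paired with its original index so the processing element can align it with the matching weight.

```python
def skip_module(operands):
    """Drop zero-valued operands; pair each nonzero operand with its index."""
    return [(i, v) for i, v in enumerate(operands) if v != 0]

def sparse_dot(operands, weights):
    """A processing element consuming the (index, value) vectors:
    only nonzero operands cost a multiply."""
    return sum(v * weights[i] for i, v in skip_module(operands))
```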

Control flow in a thread-based environment without branching

A method for computing in a thread-based environment manipulates an execution mask to enable and disable threads when executing multiple conditional function clauses of process instructions. Execution lanes are controlled based on their participation in the process instructions, reducing resource consumption. Execution of one or more schedulable structures, each containing multiple process instructions, is skipped based on the execution mask and on activating instructions.
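
A simplified model of that mechanism (hypothetical data layout: one value per lane, each clause as a predicate plus a body): lanes whose predicate is false are masked off, and a clause whose mask is entirely false is skipped without issuing any of its instructions.

```python
def run_clauses(values, clauses):
    """values: per-lane registers. clauses: list of (name, predicate, body).
    Replaces branching with an execution mask; an all-false mask skips
    the whole schedulable structure."""
    executed = []
    for name, predicate, body in clauses:
        mask = [predicate(v) for v in values]
        if not any(mask):
            continue                     # no lane participates: skip the structure
        executed.append(name)
        values = [body(v) if m else v for v, m in zip(values, mask)]
    return values, executed
```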

RISC-V BRANCH PREDICTION METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM

A RISC-V branch prediction method and device, an electronic device, and a computer-readable storage medium are provided. Beyond the prior art, the remaining jump count of a jump instruction is additionally acquired, and the single-jump step length (which is not fixed at 1) is calculated from the difference between the remaining jump counts at two consecutive jumps. Whether the target jump instruction has executed its last jump can then be judged from the jump instruction's single-jump step length combined with the real-time remaining jump count, so that the number of jumps still to be executed is determined from that judgment.
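
The bookkeeping can be sketched as follows (assumed structure, not the patent's actual hardware): the predictor records the remaining-jump counter at consecutive jumps of the same branch, derives the per-jump step from their difference, and flags the last jump when the remaining count no longer exceeds one step.

```python
class LoopExitPredictor:
    """Track the 'remaining jumps' counter across consecutive taken jumps and
    derive the single-jump step length, which need not be 1."""

    def __init__(self):
        self.remaining = None   # counter seen at the most recent jump
        self.step = 1           # default step until two observations exist

    def observe(self, remaining):
        if self.remaining is not None:
            # step = difference of remaining counts between two consecutive jumps
            self.step = max(1, self.remaining - remaining)
        self.remaining = remaining

    def is_last_jump(self):
        # one more step would exhaust the counter: predict this jump is the last
        return self.remaining is not None and self.remaining <= self.step
```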

MICROPROCESSOR WITH HIGH-EFFICIENCY DECODING OF COMPLEX INSTRUCTIONS
20210389947 · 2021-12-16

Microcode combination of complex instructions is disclosed. A microprocessor includes an instruction queue, an instruction decoder, and a microcode controller. The instruction decoder is coupled to the instruction queue. The microcode controller is coupled to the instruction decoder and has a memory. The memory stores combined microcode for M complex instructions arranged in a specific order, where M is an integer greater than 1. When the M complex instructions in the specific order have popped out of the first to M-th entries of the instruction queue, the instruction decoder operates the microcode controller to read the memory for the combined microcode, with the microcode read trap occurring just once.
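
A rough sketch of the lookup scheme (the opcodes and micro-op names below are hypothetical): the controller keys its ROM by the ordered tuple of complex opcodes at the front of the queue, so a matching M-instruction sequence costs a single ROM read instead of M.

```python
# Hypothetical combined-microcode ROM keyed by ordered opcode tuples.
COMBINED_ROM = {
    ('cmova', 'cmovb'): ['uop_cmp', 'uop_sel_a', 'uop_sel_b'],
}

def decode(queue, rom):
    """Pop instructions off the queue front; when the leading M entries match
    a combined pattern, one ROM read yields microcode for all of them."""
    uops, rom_reads = [], 0
    while queue:
        for m in range(len(queue), 1, -1):
            key = tuple(queue[:m])
            if key in rom:
                uops += rom[key]      # one read covers all M complex instructions
                del queue[:m]
                rom_reads += 1
                break
        else:
            uops.append('uop_' + queue.pop(0))   # simple decode, no ROM trap
    return uops, rom_reads
```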

NPU implemented for artificial neural networks to process fusion of heterogeneous data received from heterogeneous sensors
11731656 · 2023-08-22

A neural processing unit (NPU) includes a controller including a scheduler, the controller configured to receive from a compiler a machine code of an artificial neural network (ANN) including a fusion ANN, the machine code including data locality information of the fusion ANN, and receive heterogeneous sensor data from a plurality of sensors corresponding to the fusion ANN; at least one processing element configured to perform fusion operations of the fusion ANN including a convolution operation and at least one special function operation; a special function unit (SFU) configured to perform a special function operation of the fusion ANN; and an on-chip memory configured to store operation data of the fusion ANN, wherein the scheduler is configured to control the at least one processing element and the on-chip memory such that all operations of the fusion ANN are processed in a predetermined sequence according to the data locality information.
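
A toy sketch of the scheduling contract (the op format and unit names are assumptions, not the patent's machine-code layout): the compiler's data-locality information fixes the op order, and the scheduler dispatches each op to the matching unit in that predetermined sequence.

```python
def schedule(machine_code, units):
    """Dispatch ops in the compiler-determined order: 'pe' ops model the
    processing elements (e.g. convolutions), 'sfu' ops the special function
    unit. Returns the dispatch log for inspection."""
    log = []
    for op in machine_code:           # predetermined sequence from data locality info
        units[op['unit']](op)
        log.append((op['unit'], op['name']))
    return log
```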

Non-transitory computer-readable recording medium, compilation method, and compiler device
11734003 · 2023-08-22

The present disclosure relates to a compiler for causing a computer to execute a process. The process includes generating a first program, wherein the first program includes: a first code that determines whether a first area of memory, referred to by a process inside a loop of a second program during a first iteration of the loop, overlaps a second area of memory that the process refers to during a second iteration of the loop; a second code that executes the process in the order of the first and second iterations when the two areas are determined to overlap; and a third code that executes the process for the first iteration and the process for the second iteration in parallel when the two areas are determined not to overlap.
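
The generated structure might look like the following sketch (the helper names and the `(start, end)` region representation are illustrative assumptions): an overlap test chooses between the serial and the parallel path at run time.

```python
from concurrent.futures import ThreadPoolExecutor

def regions_overlap(a, b):
    """a and b are (start, end) half-open ranges of referenced memory."""
    return a[0] < b[1] and b[0] < a[1]

def run_two_iterations(body, region1, region2):
    """'First code': test for overlap; 'second code': run the iterations in
    program order when they alias; 'third code': run them in parallel when
    they are independent."""
    if regions_overlap(region1, region2):
        body(0)
        body(1)
        return 'serial'
    with ThreadPoolExecutor(max_workers=2) as pool:
        list(pool.map(body, (0, 1)))
    return 'parallel'
```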

Scatter and Gather Streaming Data through a Circular FIFO
20220012201 · 2022-01-13

Systems, apparatuses, and methods for performing scatter and gather direct memory access (DMA) streaming through a circular buffer are described. A system includes a circular buffer, producer DMA engine, and consumer DMA engine. After the producer DMA engine writes or skips over a given data chunk of a first frame to the buffer, the producer DMA engine sends an updated write pointer to the consumer DMA engine indicating that a data credit has been committed to the buffer and that the data credit is ready to be consumed. After the consumer DMA engine reads or skips over the given data chunk of the first frame from the buffer, the consumer DMA engine sends an updated read pointer to the producer DMA engine indicating that the data credit has been consumed and that space has been freed up in the buffer to be reused by the producer DMA engine.