Patent classifications
G06F9/381
Decoupled processor instruction window and operand buffer
A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.
MEMORY-ADAPTIVE PROCESSING METHOD FOR CONVOLUTIONAL NEURAL NETWORK
A memory-adaptive processing method for a convolutional neural network includes a feature map counting step, a size relation counting step and a convolution calculating step. The feature map counting step is for counting a number of a plurality of input channels of a plurality of input feature maps, an input feature map tile size, a number of a plurality of output channels of a plurality of output feature maps and an output feature map tile size for a convolutional layer operation. The size relation counting step is for obtaining a cache free space size in a feature map cache and counting a size relation. The convolution calculating step is for performing the convolutional layer operation with the input feature maps to produce the output feature maps according to a memory-adaptive processing technique, and the memory-adaptive processing technique includes a dividing step and an output-group-first processing step.
USING LOOP EXIT PREDICTION TO ACCELERATE OR SUPPRESS LOOP MODE OF A PROCESSOR
A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions is executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered-up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
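The mode selection described in this abstract can be illustrated with a minimal sketch. The function name, the threshold parameter names, and the "unspecified" result for predictions that fall between the two thresholds are assumptions made for illustration; the abstract does not say what happens in that middle range.

```python
def choose_execution_mode(predicted_iterations, first_threshold, second_threshold):
    """Pick an execution mode from a predicted loop iteration count.

    Hypothetical helper: names and return values are illustrative only.
    """
    # Prediction exceeds the first threshold: run from the loop buffer
    # with part of the pipeline in a low-power state.
    if predicted_iterations > first_threshold:
        return "loop"
    # Prediction is at or below the second threshold: keep the pipeline
    # powered up and fetch through the instruction fetch unit.
    if predicted_iterations <= second_threshold:
        return "non-loop"
    # Assumption: the abstract does not cover the range in between.
    return "unspecified"
```

For example, with a first threshold of 16 and a second threshold of 4, a prediction of 100 iterations selects loop mode and a prediction of 3 selects non-loop mode.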
DELIVERING IMMEDIATE VALUES BY USING PROGRAM COUNTER (PC)-RELATIVE LOAD INSTRUCTIONS TO FETCH LITERAL DATA IN PROCESSOR-BASED DEVICES
Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer). The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.
Method and apparatus to control the use of hierarchical branch predictors based on the effectiveness of their results
According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
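One possible reading of the enablement logic is sketched below. The abstract only says that enablement depends on both counters; the saturating confidence counter, the fixed-size observation window, and every threshold value here are assumptions chosen for illustration, not details from the patent.

```python
class MicroBTBGate:
    """Hypothetical model of a micro-BTB enablement decision."""

    def __init__(self, conf_max=7, conf_enable=4, window=16, max_mispredicts=4):
        self.conf_max = conf_max                # saturation limit (assumed)
        self.conf_enable = conf_enable          # confidence needed to enable (assumed)
        self.window = window                    # predictions per rate window (assumed)
        self.max_mispredicts = max_mispredicts  # tolerated mispredictions per window (assumed)
        self.confidence = 0
        self.seen = 0
        self.mispredicts = 0
        self.last_window_mispredicts = 0

    def record(self, correct):
        """Update both counters with the outcome of one micro-BTB prediction."""
        if correct:
            self.confidence = min(self.conf_max, self.confidence + 1)
        else:
            self.confidence = max(0, self.confidence - 1)
            self.mispredicts += 1
        self.seen += 1
        if self.seen == self.window:
            # Close the window and keep its misprediction count as the rate.
            self.last_window_mispredicts = self.mispredicts
            self.seen = 0
            self.mispredicts = 0

    def enabled(self):
        """Use micro-BTB predictions only if both counters look healthy."""
        return (self.confidence >= self.conf_enable
                and self.last_window_mispredicts <= self.max_mispredicts)
```

In this sketch a run of correct predictions raises confidence until the gate opens, and a run of mispredictions closes it again through either counter.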
LOOP EXIT PREDICTOR
Disclosed embodiments relate to systems and methods structured to predict a loop exit. In one example, a processor includes a branch prediction unit to determine a loop exit predictor start corresponding to a finite consistent loop, and an instruction decoder queue to: receive an iteration of the finite consistent loop corresponding to a loop exit predictor and an iteration count, replay one or more instructions of the iteration based on the iteration count, and switch to post-loop instructions responsive to a determination that a number of iterations of the finite consistent loop is equal to the iteration count.
TWO ADDRESS TRANSLATIONS FROM A SINGLE TABLE LOOK-ASIDE BUFFER READ
A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
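The reuse of one table read for two consecutive addresses can be sketched as follows. This is a simplified model under stated assumptions: the table is a Python dict keyed by the upper virtual address bits, the page size is 4 KiB, and the names translate_pair and PAGE_BITS are hypothetical.

```python
PAGE_BITS = 12  # assumed 4 KiB pages; the abstract does not give a page size


def translate_pair(table, addr_n, addr_n1):
    """Translate address N and, if possible, address N+1 from one lookup.

    `table` maps upper virtual address bits to upper physical address bits.
    Returns (phys_n, phys_n1); an entry is None when it cannot be produced.
    """
    offset_mask = (1 << PAGE_BITS) - 1
    upper_n = addr_n >> PAGE_BITS
    if upper_n not in table:
        return None, None  # no matching entry for address N
    phys_upper = table[upper_n]  # the single table read
    phys_n = (phys_upper << PAGE_BITS) | (addr_n & offset_mask)
    if (addr_n1 >> PAGE_BITS) == upper_n:
        # Same upper bits: reuse the physical bits already read for N.
        phys_n1 = (phys_upper << PAGE_BITS) | (addr_n1 & offset_mask)
    else:
        phys_n1 = None  # N+1 falls in another page and needs its own lookup
    return phys_n, phys_n1
```

With a table of {0x12345: 0xABCDE}, the pair (0x12345FC0, 0x12345FE0) translates to (0xABCDEFC0, 0xABCDEFE0) from a single lookup, while the pair (0x12345FE0, 0x12346000) yields only the first translation because the second address crosses into the next page.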
STREAMING ENGINE WITH EARLY EXIT FROM LOOP LEVELS SUPPORTING EARLY EXIT LOOPS AND IRREGULAR LOOPS
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Upon a stream break instruction specifying one of the nested loops, the streaming engine ends a current iteration of the loop. If the specified loop was not the outermost loop, the streaming engine begins an iteration of a next outer loop. If the specified loop was the outermost nested loop, the streaming engine ends the stream. The streaming engine places a vector of data elements in order in lanes within a stream head register. A stream break instruction is operable upon a vector break.
PROVIDING CODE SECTIONS FOR MATRIX OF ARITHMETIC LOGIC UNITS IN A PROCESSOR
The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
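The predicate-controlled loop can be modeled in software with a small sketch. The generator stream_vectors, the lane count of 4, the zero padding of invalid lanes, and the summing loop body are all assumptions for illustration; only the repeat and exit conditions come from the abstract.

```python
def stream_vectors(data, lanes):
    """Yield (vector, predicate) pairs; the predicate marks valid lanes.

    Hypothetical stand-in for a streaming engine. After the data runs out
    it keeps yielding vectors whose predicate is all False.
    """
    for start in range(0, len(data), lanes):
        chunk = list(data[start:start + lanes])
        pad = lanes - len(chunk)
        yield chunk + [0] * pad, [True] * len(chunk) + [False] * pad
    while True:
        yield [0] * lanes, [False] * lanes


def sum_stream(data, lanes=4):
    """Sum a stream, looping until a vector has no valid elements."""
    total = 0
    stream = stream_vectors(data, lanes)
    while True:
        vector, predicate = next(stream)
        # Exit when the current vector contains no valid data elements.
        if not any(predicate):
            break
        # Repeat: at least one lane is valid, so process only valid lanes.
        total += sum(v for v, p in zip(vector, predicate) if p)
    return total
```

The loop needs no separate trip counter in this model: a partially valid final vector is still processed, and the first fully invalid vector ends the loop.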