G06F9/381

DETECTING A REPETITIVE PATTERN IN AN INSTRUCTION PIPELINE OF A PROCESSOR TO REDUCE REPEATED FETCHING
20220066782 · 2022-03-03 ·

Exemplary aspects disclosed herein include detecting a repetitive pattern in an instruction pipeline of a processor to reduce repeated fetching. The processor includes a pattern record circuit configured to receive information in a data stream (e.g., instructions or consumed data) in the instruction pipeline. The pattern record circuit includes a first in, first out (FIFO) table circuit that contains an input record column and plurality of additional adjacent record columns. As new data occurs in the data stream, the data record circuit is configured to sequentially record next incoming data from the data stream into a next input entry of an input record column and then shift previously recorded data into adjacent entries of adjacent record columns. The distance between the input record column and the additional record column that has matching data is the distance in the data stream between a reoccurrence of data in the data stream.

Dynamic hammock branch training for branch hammock detection in an instruction stream executing in a processor

Dynamic hammock branch training for branch hammock detection in an instruction stream executing in a processor is disclosed. A branch hammock detection circuit is configured to dynamically detect branch hammocks in an instruction stream during run-time processing of the instruction stream. In response to an identified conditional branch instruction, the branch hammock detection circuit starts a training process for a potential branch hammock predicated by the conditional branch instruction. The branch hammock detection circuit is configured to determine if an identified in-training branch hammock is an actual branch hammock based on setting a potential convergence point as the target address for the conditional branch instruction based on whether the branch is taken or not taken. If an instruction is processed at the set convergence point, this means the set convergence point can be an actual convergence point and the in-training branch hammock can be detected as an actual branch hammock.

SELECTIVELY UPDATING BRANCH PREDICTORS FOR LOOPS EXECUTED FROM LOOP BUFFERS IN A PROCESSOR

Selectively updating branch predictors for loops executed from loop buffers is disclosed herein. In some aspects, a branch predictor update circuit of a processor is configured to detect a loop comprising a plurality of loop instructions in an instruction stream, and to determine that the loop is stored within a loop buffer circuit of the processor. The branch predictor update circuit is further configured to determine a count of potential history register updates to the history register for the plurality of loop instructions, and to determine whether the count of potential history register updates exceeds a size of the history register. The branch predictor update circuit is also configured to, responsive to determining that the count of potential history register updates does not exceed the size of the history register, update a branch predictor of the branch predictor circuit based on the plurality of loop instructions.

Loop management in multi-processor dataflow architecture

Embodiments of the present invention include a computer system that manages execution of one or more programs with one or more loops where each loop having a loop level. Embodiments that manage loops that can skip execution and the number of loops changing during execution are also disclosed. A loop level register (LLEV) stores the loop level for a currently executing loop. A Loop-Back Program Counter Register (LBPR) has a table of one or more Loop-Back Registers. Each Loop-Back Register stores the loop level for a LBPR respective loop and a loop back PC location for the LBPR respective loop. A Program Counter points back to the PC location for each iteration of the loop. A Loop Current Count Register table (LCCR) tracks a number of iterations remaining to executed for of the loop. A loop management process causes one of the CPUs to execute all the one or more instructions of an iteration of the currently executing program loop. When all iterations of the executing loop are complete, the LLEV is decremented to a next loop level that contained the executed loop.

HIGHLY INTEGRATED SCALABLE, FLEXIBLE DSP MEGAMODULE ARCHITECTURE

Disclosed embodiments include an electronic device having a processor core, a memory, a register, and a data load unit to receive a plurality of data elements stored in the memory in response to an instruction. All of the data elements hare the same data size, which is specified by one or more coding bits. The data load unit includes an address generator to generate addresses corresponding to locations in the memory at which the data elements are located, and a formatting unit to format the data elements. The register is configured to store the formatted data elements, and the processor core is configured to receive the formatted data elements from the register.

Processing device and method of controlling processing device
11080063 · 2021-08-03 · ·

A processing device includes an instruction extractor that extracts target instructions intended for a loop process that is repeatedly performed, from instructions decoded by an instruction decoder, and a loop buffer including entries where each of the target instructions extracted by an instruction extractor are stored. An instruction processor stores a target instruction into one of the entries of the loop buffer, and combines target instructions into one target instruction in a case where resources of an instruction execution circuit used by the target instructions do not overlap, to store the one instruction in one of the entries of the loop buffer, and a selector selects the instruction output from the instruction decoder or the target instruction output from the loop buffer, and outputs the selected instruction to the instruction execution circuit.

Loop end prediction using loop counter updated by inflight loop end instructions

In a data processing apparatus loop end prediction is carried out to predict whether a branch represented by a loop end instruction will be taken, branching to the start of the loop for a further iteration to be carried out, or will be not taken leading to the further instructions following the loop. A loop iteration counter at the fetch stage of the apparatus maintains a count on the basis of which the prediction is made. The loop iteration counter is decremented both by loop end instructions reaching the end of the pipeline for which no prediction was made and by later loop end instructions for which a prediction is made, once it has been established that a loop is being executed. This dual counting mechanism allows “shadow” loop end instructions, which were already in the pipeline by the time it was established that a loop is being executed, to be accounted for.

Microarchitectural features for mitigation of differential power analysis and electromagnetic analysis

A processing system with a microarchitectural feature for mitigation of differential power analysis and electromagnetic analysis attacks can include a memory, a processor, and a mitigation response unit. The processor can include an instruction predictor that comprises a storage device for storing metadata associated with corresponding instruction blocks. The mitigation response unit is coupled to the instruction predictor to write and read the metadata associated with the corresponding instruction blocks. The mitigation response unit is configured to determine a mitigation technique for an instruction block based on an electromagnetic or power signature corresponding to execution of the instruction block and metadata associated with the instruction block.

TRACKING STREAMING ENGINE VECTOR PREDICATES TO CONTROL PROCESSOR EXECUTION
20210182210 · 2021-06-17 ·

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

Highly integrated scalable, flexible DSP megamodule architecture

Disclosed embodiments include a data processing apparatus having a processing core, a memory, and a streaming engine. The streaming engine is configured to receive a plurality of data elements stored in the memory and to provide the plurality of data elements as a data stream to the processing core, and includes an address generator to generate addresses corresponding to locations in the memory, a buffer to store the data elements received from the locations in the memory corresponding to the generated addresses, and an output to supply the data elements received from the memory to the processing core as the data stream.