IPIQ

G06F9/30181

Indirect control flow instructions and inhibiting data value speculation

11256513 · 2022-02-22 ·

Arm Limited

There is provided an apparatus that includes input circuitry to receive input data and output circuitry to output a sequence of instructions to be executed by data processing circuitry. Generation circuitry performs a generation process to generate the sequence of instructions using the input data. The sequence of instructions comprises an indirect control flow instruction having a field that indicates where a target of the indirect control flow instruction is stored. The generation process causes at least one of the instructions in the sequence of instructions to store a state of control flow speculation after execution of the indirect control flow instruction. The at least one of the instructions in the sequence of instructions that stores the state of control flow speculation is inhibited from being subject to data value speculation by the data processing circuitry.

Microprocessor that fuses if-then instructions

09792121 · 2017-10-17 ·

Via Technologies, Inc.

A microprocessor includes an instruction translation unit that extracts condition information from the IT instruction and fuses the IT instruction with the first IT block instruction. For each instruction of the IT block, the instruction translation unit: determines a respective condition for the IT block instruction using the condition information extracted from the IT instruction and translates the IT block instruction into a microinstruction. The microinstruction includes the respective condition. Execution units conditionally execute the microinstruction based on the respective condition. For each IT block instruction, the instruction translation unit determines a respective state value using the extracted condition information. The state value comprises the lower eight bits of the IT instruction having the lower five bits left-shifted by N-1 bits, where N indicates a position of the IT block instruction in the IT block.

Instruction fusion after register rename

11256509 · 2022-02-22 ·

International Business Machines Corporation

Embodiments of the present invention include methods, systems, and computer program products for implementing instruction fusion after register rename. A computer-implemented method includes receiving, by a processor, a plurality of instructions at an instruction pipeline. The processor can further performing a register rename within the instruction pipeline in response to the received plurality of instructions. The processor can further determine that two or more of the plurality of instructions can be fused after the register rename. The processor can further fuse the two or more instructions that can be fused based on the determination to create one or more fused instructions. The processor can further perform an execution stage within the instruction pipeline to execute the plurality of instructions, including the one or more fused instructions.

Instruction and logic for a logical move in an out-of-order processor

09823925 · 2017-11-21 ·

Intel Corporation

A processor includes allocation unit with logic to receive a logical move instruction. The logical move instruction includes a source logical register as a source parameter and a destination logical register as a destination parameter. The source logical register is assigned to a source physical register and the destination logical register is assigned to a destination physical register. The allocation unit includes logic to assign a first value of the source logical register to the destination logical register and to maintain a second value of the destination physical register before and after the assignment of the first value to the destination logical register.

System and methods for expandably wide processor instructions

09785565 · 2017-10-10 ·

MicroUnity Systems Engineering, Inc.

Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory.

Auxiliary Cache for Reducing Instruction Fetch and Decode Bandwidth Requirements

20170286110 · 2017-10-05 ·

A hardware-software co-designed processor includes a front end to decode an instruction, an execution unit to execute the instruction, an auxiliary cache to store auxiliary information for consumption during execution of the instruction, an instruction blender, and a retirement unit to retire the instruction. The auxiliary information may include long immediate values, non-working instructions for emulating an untranslated instruction stream, or execution hints, and is not decoded by the front end. The auxiliary cache includes circuitry to receive the auxiliary information from a binary translator, to store the auxiliary information in the auxiliary cache, and to provide the auxiliary information to the instruction blender prior to execution. The instruction blender includes circuitry to receive the auxiliary information, to blend the instruction with the auxiliary information, and to provide the blended instruction to the execution unit. Use of the auxiliary cache may reduce fetch and decode bandwidth requirements.

Methods and systems for analyzing and improving performance of computer codes

09753731 · 2017-09-05 ·

The Mathworks, Inc.

Milos Puzovic

Methods and systems for analyzing and improving performance of computer codes. In some embodiments, a method comprises executing, via one or more processors, program code; collecting, via the one or more processors, one or more hardware dependent metrics for the program code; identifying an execution anomaly based on the one or more hardware dependent metrics, wherein the execution anomaly is present when executing the program code; and designing a modification of the program code via the one or more processors, wherein the modification addresses the execution anomaly. In some other embodiments, a method comprises collecting one or more hardware independent metrics for program code; receiving one or more characteristics of a computing device; and estimating, based on the one or more hardware independent metrics and the one or more characteristics, a duration for execution of the program code on the computing device.

COMBINING LOADS OR STORES IN COMPUTER PROCESSING

20170249144 · 2017-08-31 ·

Aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in processors. An exemplary method includes detecting a pattern of pipelined instructions to access memory using a first portion of available bus width and, in response to detecting the pattern, combining the pipelined instructions into a single instruction to access the memory using a second portion of the available bus width that is wider than the first portion. Devices including processors using disclosed aspects may execute currently available software in a more efficient manner without the software being modified.

PROCESSORS, METHODS, AND SYSTEMS TO ADJUST MAXIMUM CLOCK FREQUENCIES BASED ON INSTRUCTION TYPE

20170249000 · 2017-08-31 ·

Intel Corporation

An integrated circuit of an aspect includes a power control unit having an interface to receive an indication that one or more instructions of a first type are to be performed by a core. The power control unit also has logic to control a maximum clock frequency for the core based on the indication that the instructions of the first type are to be performed by the core.

Latent modification instruction for substituting functionality of instructions during transactional execution

11243770 · 2022-02-08 ·

International Business Machines Corporation

An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order.

Patent classifications

G06F9/30181