G06F9/30021

COMBINING LOADS OR STORES IN COMPUTER PROCESSING

Aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in processors. An exemplary method includes detecting a pattern of pipelined instructions to access memory using a first portion of available bus width and, in response to detecting the pattern, combining the pipelined instructions into a single instruction to access the memory using a second portion of the available bus width that is wider than the first portion. Devices including processors using disclosed aspects may execute currently available software in a more efficient manner without the software being modified.
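
The pattern detection and combining step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which patterns qualify, so the sketch assumes the pattern is two loads of equal width from adjacent offsets off the same base register. The `Load` record, the `BUS_WIDTH` value, and the function name `combine_loads` are all hypothetical.

```python
from dataclasses import dataclass

BUS_WIDTH = 16  # assumed available bus width in bytes (hypothetical value)


@dataclass(frozen=True)
class Load:
    dest: str    # destination register name
    base: str    # base address register name
    offset: int  # byte offset from the base register
    width: int   # access width in bytes


def combine_loads(a, b):
    """Return one wider load covering both accesses, or None if the pair
    does not match the assumed pattern."""
    same_base = a.base == b.base
    same_width = a.width == b.width
    adjacent = b.offset == a.offset + a.width
    fits = a.width + b.width <= BUS_WIDTH
    # The first load must not overwrite the base register the second one reads.
    no_clobber = a.dest != b.base
    if same_base and same_width and adjacent and fits and no_clobber:
        # (destination registers, base, offset, combined width)
        return ((a.dest, b.dest), a.base, a.offset, a.width + b.width)
    return None
```

For example, two 4-byte loads at offsets 0 and 4 from `sp` would become a single 8-byte access, while loads at offsets 0 and 12 would be left alone.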

METHOD FOR OPERATING A MICROPROCESSOR
20170249145 · 2017-08-31

A method is described for operating a microprocessor in which conversion software executed on the microprocessor carries out a binary translation. In the course of the translation, a source instruction encoded according to a first instruction-set architecture is translated into a target instruction encoded according to a second instruction-set architecture. The translated target instruction is replicated, and in the replicated target instruction the memory area to be accessed during execution of the target instruction is replaced by a second memory area. Both the target instruction and the replicated target instruction are executed by the microprocessor. The method achieves temporal redundancy through a (temporally) parallel execution of the target instruction on a processor core, and local or regional redundancy through a parallel execution of the target instruction on different processor cores.

Dynamic selection of OSC hazard avoidance mechanism

Methods, systems and computer program products for dynamically selecting an OSC hazard avoidance mechanism are provided. Aspects include receiving a load instruction that is associated with an operand store compare (OSC) prediction. The OSC prediction is stored in an entry of an OSC history table (OHT) and includes a multiple dependencies indicator (MDI). Responsive to determining the MDI is in a first state, aspects include applying a first OSC hazard avoidance mechanism in relation to the load instruction. Responsive to determining that the load instruction is dependent on more than one store instruction, aspects include placing the MDI in a second state. The MDI being in the second state provides an indication to apply a second OSC hazard avoidance mechanism in relation to the load instruction.

Vector generate mask instruction

A Vector Generate Mask instruction. For each element in the first operand, a bit mask is generated. The mask includes bits set to a selected value starting at a position specified by a first field of the instruction and ending at a position specified by a second field of the instruction.
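
As a rough illustration of the mask generation described above, the following Python sketch builds the same mask for every element. Several details are assumptions not stated in the abstract: bit 0 is taken to be the leftmost (most significant) bit, the selected value is taken to be 1, the start position must not exceed the end position, and the names `generate_mask` and `vector_generate_mask` are hypothetical.

```python
def generate_mask(start, end, elem_bits=8):
    """Return an integer with ones in bit positions start..end inclusive.

    Assumption: bit 0 is the leftmost (most significant) bit of the element.
    """
    if not (0 <= start <= end < elem_bits):
        raise ValueError("require 0 <= start <= end < elem_bits")
    length = end - start + 1
    # Build `length` ones, then shift them so the last one sits at `end`.
    return ((1 << length) - 1) << (elem_bits - 1 - end)


def vector_generate_mask(num_elements, start, end, elem_bits=8):
    """Generate the same bit mask for each element of the result vector."""
    return [generate_mask(start, end, elem_bits) for _ in range(num_elements)]
```

With 8-bit elements, a start position of 2 and an end position of 5 give `0b00111100` in every element.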

Merging and sorting arrays on an SIMD processor

Methods, systems, and articles of manufacture for merging and sorting arrays on a processor are provided herein. A method includes splitting an input array into multiple sub-arrays across multiple processing elements; merging the multiple sub-arrays into multiple vectors; and sorting the multiple vectors by comparing and swapping one or more vector elements among the multiple vectors.
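
The compare-and-swap step among vectors can be illustrated with a small Python sketch. It is not the claimed method: it assumes each vector is a plain list, models one SIMD compare-and-swap as a lane-wise minimum and maximum, and uses an odd-even transposition schedule (a swapped-in, well-known technique) so that each lane ends up sorted across the vectors. The function names are hypothetical.

```python
def compare_swap(a, b):
    """Lane-wise compare-and-swap of two equal-length vectors.

    Returns (lo, hi), where lo holds the smaller value of each lane.
    """
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    return lo, hi


def sort_lanes(vectors):
    """Sort each lane across the vectors with odd-even transposition rounds."""
    vs = [list(v) for v in vectors]
    n = len(vs)
    for rnd in range(n):
        # Even rounds pair (0,1), (2,3), ...; odd rounds pair (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            vs[i], vs[i + 1] = compare_swap(vs[i], vs[i + 1])
    return vs
```

On real SIMD hardware each call to `compare_swap` would correspond to one vector minimum and one vector maximum instruction, so all lanes are compared at once.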

METHOD AND APPARATUS FOR SPECULATIVE DECOMPRESSION

An apparatus and method for performing parallel decoding of prefix codes such as Huffman codes. For example, one embodiment of an apparatus comprises: a first decompression module to perform a non-speculative decompression of a first portion of a prefix code payload comprising a first plurality of symbols; and a second decompression module to perform speculative decompression of a second portion of the prefix code payload comprising a second plurality of symbols concurrently with the non-speculative decompression performed by the first decompression module.

PROVIDING VECTOR HORIZONTAL COMPARE FUNCTIONALITY WITHIN A VECTOR REGISTER

A processor includes a vector register including data fields to store values of vector elements of data, and a decoder to decode a single instruction multiple data (SIMD) instruction specifying a source operand and a mask to identify a masked portion of the data fields. An execution unit is to read a plurality of values from unmasked data fields of the vector register; compare, within the vector register, each of the plurality of values from the unmasked data fields for equality with all other values of the plurality of values; and responsive to a detection of an inequality of any two values of the plurality of values, set a mask field, corresponding to a detected unequal value, to a masked state with a flip of a bit value of the mask field, to signal the detection of the inequality.
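
A minimal Python sketch of the horizontal compare may help make the behavior concrete. It rests on assumptions the abstract leaves open: a mask bit of 1 means the field is unmasked, the first unmasked value is used as the reference that the other unmasked values are compared against, and the function name `horizontal_compare` is hypothetical.

```python
def horizontal_compare(values, mask):
    """Compare all unmasked values for equality.

    Returns (new_mask, all_equal). A mask bit of 1 marks an unmasked field
    (assumption). Each unmasked field whose value differs from the first
    unmasked value has its mask bit flipped to signal the inequality.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    new_mask = list(mask)
    if not active:
        return new_mask, True
    ref = values[active[0]]
    all_equal = True
    for i in active[1:]:
        if values[i] != ref:
            new_mask[i] ^= 1  # flip the bit: the field becomes masked
            all_equal = False
    return new_mask, all_equal
```

For the values `[7, 7, 9, 7]` with mask `[1, 1, 1, 0]`, the third field differs from the reference and its mask bit is flipped, while the fourth field is ignored because it was already masked.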

Compare and exchange operation using sleep-wakeup mechanism

A method, apparatus, and system are provided for performing compare and exchange operations using a sleep-wakeup mechanism. According to one embodiment, an instruction at a processor is executed to help acquire a lock on behalf of the processor. If the lock is unavailable to be acquired by the processor, the instruction is put to sleep until an event has occurred.

Control flow in a thread-based environment without branching

A method for computing in a thread-based environment includes manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions to reduce resource consumption. Execution of one or more particular schedulable structures that include multiple process instructions is skipped based on the execution mask and activating instructions.
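
The idea of steering lanes with an execution mask, and skipping a clause when no lane participates, can be sketched as follows. This is a simplified model, not the patented mechanism: lanes are list positions, the two clauses are ordinary Python callables, and the name `run_if_else` and its return shape are hypothetical.

```python
def run_if_else(values, predicate, then_fn, else_fn):
    """Apply then_fn or else_fn per lane using an execution mask.

    Returns (results, executed), where executed lists the clauses that
    actually ran. A clause is skipped entirely when its mask is all False.
    """
    mask = [bool(predicate(v)) for v in values]
    out = list(values)
    executed = []
    if any(mask):  # skip the whole clause if no lane is enabled
        executed.append("then")
        out = [then_fn(v) if m else v for v, m in zip(out, mask)]
    inverse = [not m for m in mask]  # enable the lanes that were disabled
    if any(inverse):
        executed.append("else")
        out = [else_fn(v) if m else v for v, m in zip(out, inverse)]
    return out, executed
```

When every lane satisfies the predicate, the inverted mask is empty and the else clause is never issued, which is the resource saving the abstract refers to.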

MICROPROCESSOR THAT FUSES LOAD AND COMPARE INSTRUCTIONS

Technology for fusing certain load instructions and compare-immediate instructions in a computer processor having a load-store architecture with respect to transferring data between memory and registers of the computer processor. In some embodiments the load and compare-immediate instructions are consecutive. In some embodiments, the instructions are only merged if: (i) the respective RA and RT fields of the two instructions match; (ii) the immediate field of the compare-immediate instruction has a certain value, or falls within a range of certain values; and/or (iii) the instructions are received in a consecutive manner.
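
The fusion conditions listed above can be expressed as a simple predicate. The sketch below is an interpretation, not the claimed logic: it reads condition (i) as the load's RT field matching the compare-immediate's RA field, picks an arbitrary immediate range for condition (ii), and treats condition (iii) as satisfied by passing two adjacent instructions. The `Instr` record, the opcode strings, and `FUSIBLE_IMMEDIATES` are hypothetical.

```python
from dataclasses import dataclass

FUSIBLE_IMMEDIATES = range(-1, 2)  # hypothetical range of allowed immediates


@dataclass(frozen=True)
class Instr:
    op: str       # "load" or "cmpi" in this sketch
    rt: int = 0   # target register field (used by the load)
    ra: int = 0   # source register field
    imm: int = 0  # immediate field (used by the compare-immediate)


def can_fuse(first, second):
    """Check whether two consecutive instructions qualify for fusion."""
    return (
        first.op == "load"
        and second.op == "cmpi"
        and first.rt == second.ra          # (i) register fields match
        and second.imm in FUSIBLE_IMMEDIATES  # (ii) immediate in range
    )


def fusible_pairs(stream):
    """Return indices i where stream[i] and stream[i + 1] can be fused."""
    # (iii) only adjacent instructions are considered
    return [i for i in range(len(stream) - 1) if can_fuse(stream[i], stream[i + 1])]
```

A load into register 5 followed immediately by a compare of register 5 against 0 would qualify, whereas a compare of a different register, or against an immediate outside the assumed range, would not.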