G06F9/30185

METHOD AND APPARATUS FOR SPECULATIVE DECOMPRESSION

An apparatus and method for performing parallel decoding of prefix codes such as Huffman codes. For example, one embodiment of an apparatus comprises: a first decompression module to perform a non-speculative decompression of a first portion of a prefix code payload comprising a first plurality of symbols; and a second decompression module to perform speculative decompression of a second portion of the prefix code payload comprising a second plurality of symbols concurrently with the non-speculative decompression performed by the first compression module.

EXTENDING FUSED MULTIPLY-ADD INSTRUCTIONS
20220035628 · 2022-02-03 ·

Extending fused multiply-add instructions, the method comprising: receiving an extended fused multiply-add (FMA) instruction indicating one or more operands of a fused multiply-add (FMA) operation and one or more transformations to be applied to the one or more operands; and performing, based on the extended FMA instruction, the one or more transformations and the FMA operation.

Performing Rounding Operations Responsive To An Instruction
20170220349 · 2017-08-03 ·

In one embodiment, the present invention includes a method for receiving a rounding instruction and an immediate value in a processor, determining if a rounding mode override indicator of the immediate value is active, and if so executing a rounding operation on a source operand in a floating point unit of the processor responsive to the rounding instruction and according to a rounding mode set forth in the immediate operand. Other embodiments are described and claimed.

PROGRAM COUNTER (PC)-RELATIVE LOAD AND STORE ADDRESSING

Load store addressing can include a processor, which fuses two consecutive instruction determined to be prefix instructions and treats the two instructions as a single fused instruction. The prefix instruction of the fused instruction is auto-finished at dispatch time in an issue unit of the processor. A suffix instruction of the fused instruction and its fields and the prefix instruction's fields are issued from an issue queue of the issue unit, wherein an opcode of the suffix instruction is issued to a load store unit of the processor, and fields of the fused instruction are issued to the execution unit of the processor. The execution unit forms operands of the suffix instruction, at least one operand formed based on a current instruction address of the single fused instruction. The load store unit executes the suffix instruction using the operands formed by the execution unit.

Apparatus and method for efficient call/return emulation using a dual return stack buffer

An apparatus and method for a dual return stack buffer (RSB) for use in binary translation systems. An embodiment of a processor includes: a dual return stack buffer (DRSB) comprising a native RSB and an extended RSB (XRSB), the dual RSB to be used within a binary translation execution environment in which guest call-return instruction sequences are translated to native call-return instruction sequences to be executed directly by the processor; the native RSB to store native return addresses associated with the native call-return instruction sequences; and the XRSB to store emulated return addresses associated with the guest call-return instruction sequences, wherein each native return address stored in the RSB is associated with an emulated return address stored in the XRSB.

Flag non-modification extension for ISA instructions using prefixes

In one embodiment, a processor includes an instruction decoder to receive and decode an instruction having a prefix and an opcode, an execution unit to execute the instruction based on the opcode, and flag modification override logic to prevent the execution unit from modifying a flag register of the processor based on the prefix of the instruction.

Instruction and logic to provide vector linear interpolation functionality

Instructions and logic provide vector linear interpolation functionality. In some embodiments, responsive to an instruction specifying: a first operand from a set of vector registers, a size of each of the vector elements, a portion of the vector elements upon which to compute linear interpolations, a second operand from a set of vector registers, and a third operand; an execution unit, reads a first, a second and a third value of the size of vector elements from corresponding data fields in the first, the second and the third operand respectively and computes an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.

SYSTEMS, METHODS, AND APPARATUS FOR TILE CONFIGURATION

Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Conditional execution specification of instructions using conditional extension slots in the same execute packet in a VLIW processor

In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processing core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.

Vector friendly instruction format and execution thereof

A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.