G06F9/382

Encoding and decoding variable length instructions

Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are reordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.

Processor and instruction set for flexible qubit control with low memory overhead
11726790 · 2023-08-15 · ·

Apparatus and method for specifying quantum operations such as qubit rotations in a quantum instruction. For example, one embodiment of an apparatus comprises: a quantum instruction processing pipeline to process a quantum instruction having one or more opcodes to specify quantum operations and one or more operands and/or fields to specify values to be used to perform the quantum operations; a quantum waveform synthesizer to synthesize a waveform to control a qubit based on the values specified by the operands and/or fields of the quantum instruction.

Program Thread Selection Between a Plurality of Execution Pipelines
20220137973 · 2022-05-05 ·

Techniques are disclosed relating to an apparatus that includes a plurality of execution pipelines including first and second execution pipelines, a shared circuit that is shared by the first and second execution pipelines, and a decode circuit. The first and second execution pipelines are configured to concurrently perform operations for respective instructions. The decode circuit is configured to assign a first program thread to the first execution pipeline and a second program thread to the second execution pipeline. In response to determining that respective instructions from the first and second program threads that utilize the shared circuit are concurrently available for dispatch, the decode circuit is further configured to select between the first program thread and the second program thread.

Instructions for fused multiply-add operations with variable precision input operands

Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

APPARATUS AND METHOD FOR COMPLEX MULTIPLICATION

An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Instructions and logic for vector multiply add with zero skipping

Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

MODULAR PIPELINES FOR ACCESSING DIGITAL DATA
20230305851 · 2023-09-28 ·

A data processing system for accessing encoded digital data comprises an accessor pipeline generator. The accessor pipeline generator may comprise an accessor functional unit selector and an accessor functional unit connector. The accessor functional unit selector may iteratively select platform independent accessor functional units from an accessor functional unit library to generate an accessor pipeline. The pipeline may be tested during the process of accessor pipeline generation. The encoded data and the generated accessor pipeline may be stored in a container.

Apparatus and method for complex multiplication

An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

VARIABLE-LENGTH INSTRUCTION STEERING TO INSTRUCTION DECODE CLUSTERS

Embodiments of apparatuses and methods for variable-length instruction steering to instruction decode clusters are disclosed. In an embodiment, an apparatus includes a decode cluster and chunk steering circuitry. The decode cluster includes multiple instruction decoders. The chunk steering circuitry is to break a sequence of instruction bytes into a plurality of chunks, create a slice from a one or more of the plurality of chunks based on one or more indications of a number of instructions in each of the one or more of the plurality of chunks, wherein the slice has a variable size and includes a plurality of instructions, and steer the slice to the decode cluster.

Speculative usage of parallel decode units

Aspects of the present disclosure relate to an apparatus comprising fetch circuitry. The fetch circuitry comprises a pointer-based fetch queue for queuing processing instructions retrieved from a storage, and pointer storage for storing a pointer identifying a current fetch queue element. The apparatus comprises decode circuitry having a plurality of decode units, and fetch queue extraction circuitry to, based on the pointer, extract the content of a plurality of elements of the fetch queue; apply combinatorial logic to speculatively produce, from the content of said fetch queue entries, a plurality of speculative potential instructions; and transmit each speculative potential instruction to a corresponding one of said decode units. Each decode unit is configured to decode the corresponding speculative potential instruction. The instruction extraction circuitry is configured to extract a subset of said plurality of speculative potential instructions, and transmit said determined subset to pipeline component circuitry.