G06F7/762

SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.

Systems, methods, and apparatuses for matrix add, subtract, and multiply

Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.

Pair merge execution units for microinstructions
12271737 · 2025-04-08 · ·

An instruction execution circuit operable to reduce two or more micro-operations into one by producing multiple permutation and merge results in one execution cycle. The execution circuit includes a permutation and merge switching fabric and a bank of multiplexers. For a fetched instruction, a decoder decodes an opcode to generate a set of control indications used to control the multiplexers to select bytes from the respective inputs that are destined for each of the multiple results. In this manner, multiple permutation results can be output from the execution circuits in one micro-operation.

SYSTEMS, METHODS, AND APPARATUSES FOR TILE TRANSPOSE

Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.

SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.

Systems, methods, and apparatus for tile configuration

Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Systems, methods, and apparatuses for dot production operations

Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.

Neural network operation method and apparatus with mapping orders

A neural network operation apparatus includes an input register to store an input feature map, a processing element array including a processing element to perform an operation based on the input feature map and a weight matrix, and a controller to map a portion of the input feature map and a portion of the weight matrix, both on which the operation is to be performed, to the processing element.

Systems, methods, and apparatuses for tile store

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.