IPIQ

G06F2207/3812

RECONFIGURABLE DIGITAL SIGNAL PROCESSING (DSP) VECTOR ENGINE

20200225947 · 2020-07-16 ·

Systems and methods described herein may relate to providing a dynamically configurable circuitry able to process data associated with a variety of matrix dimensions one or more complex number operations, one or more real number operations, or both. Configurations may be applied to the configurable circuitry to program the configurable circuitry for a next operation. The configurable circuitry may process data according to a variety of operations based at least in part on operation of a repeated processing element coupled in a compute network of processing elements.

Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits

10713012 · 2020-07-14 ·

Intel Corporation

An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying a multiply-accumulate or multiply-add operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range; control circuitry, responsive to a precision of the first and second operands being below a threshold, to cause the first operand and second operand to be processed by the second multiplication circuitry to generate the result; and adder circuitry to add the result to an accumulated value to generate a new accumulated value.

Packed 16 bits instruction pipeline

11880683 · 2024-01-23 ·

Advanced Micro Devices, Inc.

Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.

CHANGING PRECISION OF OPERANDS

20240095302 · 2024-03-21 ·

Apparatuses, systems, and techniques to perform matrix multiply-accumulate (MMA) operations on data of a first type using one or more MMA instructions for data of a second type. In at least one embodiment, a single tensorfloat-32 (TF32) MMA instruction computes a 32-bit floating point (FP32) output using TF32 input operands converted from FP32 data values.

Multiplication and accumulation (MAC) operator

11909421 · 2024-02-20 ·

Sk Hynix Inc.

Choung Ki Song

A MAC operator includes a plurality of data type converters and a plurality of multipliers. Each of the plurality of data type converters may receive 16-bit input data of one of first to fourth data types of a floating-point format to convert into L-bit output data of the floating-point format. Each of the plurality of multipliers may perform a multiplication on the L-bit output data of the floating-point format outputted from two of the plurality of data type converters to output multiplication result data of the floating-point format.

Use of multiple different variants of floating point number formats in floating point operations on a per-operand basis

11966740 · 2024-04-23 ·

GRAPHCORE LIMITED

A processor comprising: a register file comprising a group of operand registers for holding data values, each operand register being a fixed number of bits in length for holding a respective data value of that length; and processing logic comprising floating point logic for performing floating point operations on data values in the register file, the floating point logic is configured to process the fixed number of bits in the respective data value according to a floating point format comprising a set of mantissa bits and a set of exponent bits. The processing logic is operable to select between a plurality of different variants of the floating point format, at least some of the variants having a different size sets of mantissa bits and exponent bits relative to one another.

Decimal and binary floating point arithmetic calculations

10416962 · 2019-09-17 ·

International Business Machines Corporation

Logic is provided for performing decimal and binary floating point arithmetic calculations on first and second operands. The method includes: receiving the first and second operands in packed format; unpacking the first and second operands; swapping the first operand to a fourth operand and the second operand to a third operand, if an exponent of the first operand is less than an exponent of the second operand, otherwise storing the first operand to the third operand and the second operand to the fourth operand; aligning the third operand and the fourth operands based on the exponent difference of the third and fourth operand and a number of leading zeroes of the third operand; performing an add/subtract operation on the aligned third and fourth operands with normalizing and rounding between the operands; and packing the result obtained from the add/subtract.

Reconfigurable digital signal processing (DSP) vector engine

11983530 · 2024-05-14 ·

Intel Corporation

Systems and methods described herein may relate to providing a dynamically configurable circuitry able to process data associated with a variety of matrix dimensions using one or more complex number operations, one or more real number operations, or both. Configurations may be applied to the configurable circuitry to program the configurable circuitry for a next operation. The configurable circuitry may process data according to a variety of operations based at least in part on operation of a repeated processing element coupled in a compute network of processing elements.

PACKED 16 BITS INSTRUCTION PIPELINE

20190129718 · 2019-05-02 ·

Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.

METHOD AND APPARATUS FOR EFFICIENT BINARY AND TERNARY SUPPORT IN FUSED MULTIPLY-ADD (FMA) CIRCUITS

20190056916 · 2019-02-21 ·

Patent classifications

G06F2207/3812