G06F2207/3828

APPARATUS AND METHOD FOR VECTOR HORIZONTAL ADD OF SIGNED/UNSIGNED WORDS AND DOUBLEWORDS

An apparatus and method for performing a packed horizontal addition of words and doublewords. For example, one embodiment of a processor comprises: a decoder to decode a packed horizontal add instruction to generate a decoded packed horizontal add instruction, the packed horizontal add instruction including an opcode and operands identifying a plurality of packed words; a source register to store a first plurality of packed words; execution circuitry to execute the decoded instruction, the execution circuitry comprising: operand selection circuitry to identify first and second packed words from the source register in accordance with the operand and opcode of the packed horizontal add instruction; adder circuitry to add the first and second packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; saturation circuitry to saturate the temporary sum if necessary to generate a final result; and a destination register to store the final result as a packed result word in a designated data element position.
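The data flow in this abstract can be illustrated with a minimal software sketch. It assumes signed 16-bit words and assumes that the operand selection picks adjacent pairs from the source; the abstract does not fix either choice, and the function names are hypothetical.

```python
def saturate_signed16(value: int) -> int:
    """Clamp an integer to the signed 16-bit range [-32768, 32767]."""
    return max(-32768, min(32767, value))


def horizontal_add_words(src: list[int]) -> list[int]:
    """Add adjacent pairs of signed 16-bit words and saturate each sum.

    Assumption: pairs are (src[0], src[1]), (src[2], src[3]), and so on.
    """
    if len(src) % 2 != 0:
        raise ValueError("source must hold an even number of words")
    result = []
    for i in range(0, len(src), 2):
        # The sum of two 16-bit values can need 17 bits, which is why the
        # abstract calls for temporary storage of at least 17 bits.
        temporary_sum = src[i] + src[i + 1]
        result.append(saturate_signed16(temporary_sum))
    return result
```

For instance, adding 32767 and 1 would overflow a 16-bit word, so the sketch returns the saturated value 32767 for that pair.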

PACKED 16 BITS INSTRUCTION PIPELINE

Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
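The operand selection described here can be sketched in software. The sketch assumes M = 32 and N = 16 and assumes that bit i of a hypothetical `select_bits` field chooses the high (1) or low (0) half of source i; the actual instruction encoding is not given in the abstract.

```python
def select_half(operand32: int, use_high: bool) -> int:
    """Return the high or low 16-bit portion of a 32-bit operand."""
    operand32 &= 0xFFFFFFFF
    if use_high:
        return (operand32 >> 16) & 0xFFFF
    return operand32 & 0xFFFF


def packed_select(src0: int, src1: int, select_bits: int) -> list[int]:
    """Pick one 16-bit half from each 32-bit source operand.

    Assumption: bit i of select_bits applies to source i (1 = high half).
    """
    sources = (src0, src1)
    return [
        select_half(source, bool((select_bits >> index) & 1))
        for index, source in enumerate(sources)
    ]
```

With `select_bits = 0b01`, the sketch takes the high half of the first source and the low half of the second.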

METHOD, DEVICE, AND SYSTEM FOR TASK PROCESSING
20190123902 · 2019-04-25

A number of RSA computing tasks that have different word lengths which are less than a maximum word length of an operand register are processed at the same time by combining a number of different word lengths to be equal to or less than the maximum word length of the operand register.
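One way to picture the combining step is as a grouping problem. The sketch below uses a first-fit-decreasing heuristic, which is an assumption chosen for illustration and not a method stated in the abstract; `group_tasks` is a hypothetical name.

```python
def group_tasks(word_lengths: list[int], max_word_length: int) -> list[list[int]]:
    """Group task word lengths so each group's total fits the operand register.

    Assumption: a first-fit-decreasing heuristic decides the grouping.
    """
    groups: list[list[int]] = []
    for length in sorted(word_lengths, reverse=True):
        if length > max_word_length:
            raise ValueError("task is wider than the operand register")
        for group in groups:
            # Place the task in the first group that still has room.
            if sum(group) + length <= max_word_length:
                group.append(length)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([length])
    return groups
```

Each returned group represents tasks that could share one pass through a register of `max_word_length` bits.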

APPARATUS AND METHOD FOR PERFORMING MULTIPLICATION WITH ADDITION-SUBTRACTION OF REAL COMPONENT

An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
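The arithmetic in this abstract can be traced with a small sketch. It assumes two complex elements per source register, assumes each temporary result is the real-times-real product minus the imaginary-times-imaginary product, and models scaling as an arithmetic right shift; the abstract leaves all three choices open, and the names are hypothetical.

```python
def mul_addsub_real(src1, src2, src3, shift=0):
    """Compute the four scaled results from two complex multiplies.

    src1, src2: two (real, imag) integer pairs each.
    src3: two integer data elements to which the temporaries are added
          and from which they are subtracted.
    Assumption: scaling is an arithmetic right shift by `shift` bits.
    """
    (a0r, a0i), (a1r, a1i) = src1
    (b0r, b0i), (b1r, b1i) = src2
    c0, c1 = src3

    # Four real products, combined into two temporary results.
    temp1 = a0r * b0r - a0i * b0i
    temp2 = a1r * b1r - a1i * b1i

    pre_scaled = [c0 + temp1, c0 - temp1, c1 + temp2, c1 - temp2]
    return [value >> shift for value in pre_scaled]
```

The add and subtract pair applied to each temporary result has the shape of a butterfly step, which fits the abstract's reference to performing a transform on complex data.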

Arithmetic processing device and control method for arithmetic processing device

A plurality of floating-point registers store data therein. A processing execution unit executes arithmetic processing by using data stored in the floating-point registers. A first switch and a second switch select a route connecting the processing execution unit and the floating-point registers. A switch control unit controls the first switch and the second switch so as to switch a route to be selected, based on a switching instruction from the processing execution unit.

PROCESSOR AND METHOD FOR OUTER PRODUCT ACCUMULATE OPERATIONS

A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations is disclosed.
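As a reference for the operation itself, an outer product accumulation can be written in a few lines. This is a plain software sketch with a hypothetical function name; it says nothing about how the disclosed processor organizes its multipliers.

```python
def outer_product_accumulate(acc, a, b):
    """Add the outer product of vectors a and b into the matrix acc in place.

    acc must have len(a) rows and len(b) columns.
    """
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            # Every pair (a_i, b_j) contributes one multiply and one accumulate.
            acc[i][j] += a_i * b_j
    return acc
```

For vectors of lengths m and n, each call performs m × n multiplies and accumulations, which is the large operation count the abstract refers to.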

METHODS FOR USING A MULTIPLIER TO SUPPORT MULTIPLE SUB-MULTIPLICATION OPERATIONS

Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.

DATA PACKING TECHNIQUES FOR HARD-WIRED MULTIPLIER CIRCUITS

A method is provided that includes providing a hard-wired integer multiplier circuit configured to multiply a first physical operand and a second physical operand, mapping a first logical operand to a first portion of the first physical operand, mapping a second logical operand to a second portion of the first physical operand, and mapping a third logical operand to the second physical operand. The method further includes multiplying the first physical operand and the second physical operand using the hard-wired integer multiplier circuit to provide a multiplication result that includes a first portion including a product of the first logical operand and the third logical operand, and a second portion including a product of the second logical operand and the third logical operand.
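The mapping in this abstract can be demonstrated with ordinary integer arithmetic. The sketch assumes unsigned 8-bit logical operands and a 16-bit gap between the two packed values, so that the two products cannot overlap; these widths and the function name are assumptions for illustration.

```python
def packed_multiply(a: int, b: int, c: int, shift: int = 16):
    """Compute a*c and b*c with a single multiplication.

    Assumptions: a, b, c are unsigned 8-bit values and shift >= 16, so
    b*c (at most 16 bits) never spills into the portion holding a*c.
    """
    for value in (a, b, c):
        if not 0 <= value < 256:
            raise ValueError("logical operands must be unsigned 8-bit values")
    if shift < 16:
        raise ValueError("shift must leave room for a 16-bit product")

    first_physical = (a << shift) | b   # a in the upper portion, b in the lower
    product = first_physical * c        # one multiplication

    low_product = product & ((1 << shift) - 1)   # b * c
    high_product = product >> shift              # a * c
    return high_product, low_product
```

For example, `packed_multiply(200, 100, 250)` yields 50000 and 25000 from one multiplication. Signed operands or a narrower gap would need a correction step, which this sketch does not attempt.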

Dense Digital Arithmetic Circuitry Utilization for Fixed-Point Machine Learning
20180307975 · 2018-10-25

Systems and methods are related to improving throughput of neural networks in integrated circuits by combining values in operands to increase compute density. A system includes an integrated circuit (IC) having multiplier circuitry. The IC receives a first value and a second value in a first operand. The IC performs a multiplication operation, via the multiplier circuitry, on the first operand and a second operand to produce a first multiplied product based at least in part on the first value and a second multiplied product based at least in part on the second value.