Patent classifications
G06F7/44
GENERALIZED ACCELERATION OF MATRIX MULTIPLY ACCUMULATE OPERATIONS
A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
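The three dot-product steps named in this abstract (partial products, exponent-based alignment, accumulation) can be sketched in software. This is an illustrative model only, not the claimed datapath: the names `decompose`, `aligned_dot`, and `MANT_BITS` are hypothetical, the 24-bit significand width is an assumption, and the sketch aligns to the smallest exponent with exact left shifts, whereas hardware would more likely align to the largest exponent and right-shift with some loss.

```python
import math

MANT_BITS = 24  # assumed significand width, chosen for illustration only


def decompose(x):
    """Split x into (sig, exp) with x ~= sig * 2**exp and sig an integer."""
    m, e = math.frexp(x)              # x == m * 2**e, with 0.5 <= |m| < 1 (or m == 0)
    sig = int(m * (1 << MANT_BITS))   # truncate the fraction to MANT_BITS bits
    return sig, e - MANT_BITS


def aligned_dot(a, b):
    """Dot product via integer partial products aligned to a common exponent."""
    # Step 1: one partial product per element pair (significand product,
    # summed exponents).
    partials = []
    for x, y in zip(a, b):
        sx, ex = decompose(x)
        sy, ey = decompose(y)
        partials.append((sx * sy, ex + ey))

    # Step 2: align every partial product to the smallest exponent so that
    # each alignment is an exact left shift (a software simplification).
    min_exp = min(e for _, e in partials)
    aligned = [p << (e - min_exp) for p, e in partials]

    # Step 3: accumulate the aligned integers with plain addition.
    total = sum(aligned)
    return math.ldexp(total, min_exp)


print(aligned_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Because all partial products share one exponent after alignment, the accumulation reduces to integer addition, which is the property the abstract relies on.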
Processor with efficient arithmetic units
A processor includes a carry save array multiplier. The carry save array multiplier includes an array of cascaded partial product generators. The array of cascaded partial product generators is configured to generate an output value as a product of two operands presented at inputs of the multiplier. The array of cascaded partial product generators is also configured to generate an output value as a sum of two operands presented at inputs of the multiplier.
Distributed double-precision floating-point multiplication
The present embodiments relate to circuitry that efficiently performs double-precision floating-point multiplication operations, single-precision floating-point multiplication operations, and fixed-point multiplication operations. Such circuitry may be implemented in specialized processing blocks. If desired, each specialized processing block may efficiently perform a single-precision floating-point multiplication operation, and multiple specialized processing blocks may be coupled together to perform a double-precision floating-point multiplication operation. Inter-block signaling circuits may generate rounding information and propagate the rounding information together with partial product results from a current specialized processing block to another specialized processing block.
COMPUTING AND SUMMING UP MULTIPLE PRODUCTS IN A SINGLE MULTIPLIER
Methods, systems and computer program products for computing and summing up multiple products in a single multiplier are provided. Aspects include receiving a first number and a second number, creating partial products of the first number and the second number based on a multiplication of the first number and the second number, and reducing the number of partial products to create an intermediate result. Aspects also include receiving a third number and a fourth number, creating partial products of the third number and the fourth number based on a multiplication of the third number and the fourth number, creating a reduction tree and adding the intermediate result to the reduction tree. Aspects further include reducing the number of partial products in the reduction tree to create a second sum value and a second carry value and adding the second sum value and the second carry value to create a result.
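The flow described in this abstract can be modeled with carry-save reduction on unsigned integers. The sketch below is an assumption-laden illustration, not the patented circuit: it takes the "intermediate result" to be a sum/carry pair, uses 3:2 compressors for the reduction tree, and all function names are hypothetical.

```python
def partial_products(x, y):
    """One shifted copy of x for every set bit of y (unsigned operands)."""
    return [x << i for i in range(y.bit_length()) if (y >> i) & 1]


def compress_3_2(a, b, c):
    """3:2 carry-save compressor: a + b + c == s + carry, no carry propagation."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def reduce_to_two(rows):
    """Apply 3:2 compressors until at most two rows (sum and carry) remain."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(compress_3_2(a, b, c))
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def sum_of_two_products(a, b, c, d):
    """Compute a*b + c*d with a single final carry-propagate addition."""
    # First product: reduce its partial products to an intermediate
    # sum/carry pair, without performing the final addition.
    s1, c1 = reduce_to_two(partial_products(a, b))
    # Second product: feed its partial products and the intermediate
    # result into the same reduction tree.
    s2, c2 = reduce_to_two(partial_products(c, d) + [s1, c1])
    # Only now add the second sum value and the second carry value.
    return s2 + c2


print(sum_of_two_products(3, 5, 7, 11))
```

Deferring the carry-propagate addition to the very end is the point of the design: both products share one expensive adder pass.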
FAST FILTERING
Devices and methods for filtering data include calculating intermediate input values from input elements using a transformation function. The transformation function is based at least in part on a size of the filter and a number of filter outputs. Intermediate filter values are calculated from filter elements of the filter using the transformation function. Each intermediate input value is multiplied with a respective intermediate filter value to form intermediate values. These intermediate values are combined with each other using the transformation function to determine one or more output values.
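One well-known scheme that fits this description is Winograd minimal filtering; the abstract does not name it, so treating it as the intended transformation is an assumption. The sketch below shows the F(2, 3) case (two outputs, three-tap filter), which needs four multiplications instead of six. The function names are hypothetical.

```python
def direct_filter(d, g):
    """Reference: two outputs of a 3-tap filter over four inputs (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f2_3(d, g):
    """Same two outputs using 4 multiplies (Winograd F(2, 3), assumed scheme)."""
    # Intermediate input values, computed from the input elements.
    t = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Intermediate filter values, computed from the filter elements.
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Each intermediate input value times its respective filter value.
    m = [ti * ui for ti, ui in zip(t, u)]
    # Combine the intermediate values into the output values.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, 1.0, 2.0]
print(direct_filter(d, g), winograd_f2_3(d, g))
```

The intermediate filter values depend only on the filter, so in practice they could be computed once and reused across many input tiles.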
Power transmission device and wireless power transmission system
A power transmission device includes a power transmission coil that is disposed to oppose an installation surface of the power transmission device on which a power receiving device is installed and that is capable of being electromagnetically coupled with a power receiving coil. A magnetic substance is disposed at least outside the power transmission coil to oppose the installation surface via the power transmission coil and to be electromagnetically coupled with the power transmission coil. An object detecting circuit detects a metal object existing at least outside the power transmission coil by supplying first AC power to the power transmission coil and detecting a change in at least one of a voltage, a current, and a frequency of the first AC power or a voltage or current of a DC component of the first AC power.
Apparatus and method for floating-point multiplication
An apparatus and method for floating-point multiplication are provided. Two partial products are generated from two operand significands, which are then added to generate a product significand. The value of an unbiased result exponent is determined from the operand exponent values and leading zero counts, and a shift amount and direction for the product significand are determined in dependence on a predetermined minimum exponent value of a predetermined canonical format. The product significand is shifted by the shift amount in the shift direction. An overflow mask identifying an overflow bit position of the product significand is generated by right shifting a predetermined mask pattern by the shift amount, and the overflow mask is applied to the product significand to extract an overflow value at the overflow bit position. This extraction of the overflow value happens before the shift circuitry shifts the product significand, allowing an overall faster floating-point multiplication to be performed.
Apparatus and method for floating-point multiplication
An apparatus and method for floating-point multiplication are provided. Two partial products are generated from two operand significands. An unbiased result exponent is determined from operand exponent values and leading zero counts, and a shift amount and direction for a product significand are determined as needed for a predetermined minimum exponent value of a predetermined canonical format. First and second rounding values for injection into the addition of the partial products are generated by shifting a predetermined rounding pattern by the shift amount in the opposite shift direction to give the first rounding value, and by left shifting the first rounding value by one bit to give the second. The first and second partial products are added together with the first rounding value to give a first product significand, and are added together with the second rounding value to give a second product significand. These product significands are shifted by the shift amount in the shift direction and one is then selected in order to generate a formatted significand in the predetermined canonical format. The early injection rounding provides a faster floating-point multiplier.
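The idea of early injection rounding can be illustrated with a heavily simplified integer model. The sketch below makes several assumptions that are not in the abstract: operands are normalized n-bit significands, rounding is round-half-up, the canonical-format shift is omitted, the two partial products come from splitting one operand in half, and the selection rule (pick the second sum when the first reaches the 2n-bit range) is a guess at one workable criterion. All names are hypothetical.

```python
def split_partial_products(a, b, n):
    """Two partial products of a*b, from the low and high halves of b."""
    half = n // 2
    low = a * (b & ((1 << half) - 1))
    high = (a * (b >> half)) << half
    return low, high


def multiply_round_early(a, b, n):
    """Round a*b to n bits by injecting rounding values into the addition.

    Returns (significand, dropped_bits) so that a*b ~= significand << dropped_bits.
    """
    pp1, pp2 = split_partial_products(a, b, n)
    r1 = 1 << (n - 2)      # half an ULP when the product has 2n-1 bits
    r2 = r1 << 1           # half an ULP when the product has 2n bits
    s1 = pp1 + pp2 + r1    # first candidate product significand
    s2 = pp1 + pp2 + r2    # second candidate product significand
    if s1 >> (2 * n - 1):  # first sum reached the 2n-bit range
        return s2 >> n, n
    return s1 >> (n - 1), n - 1


def multiply_round_late(a, b, n):
    """Reference: multiply first, then round and renormalize afterwards."""
    p = a * b
    drop = p.bit_length() - n
    sig = (p + (1 << (drop - 1))) >> drop
    if sig >> n:           # rounding carried out of the top bit
        sig, drop = sig >> 1, drop + 1
    return sig, drop


print(multiply_round_early(9, 14, 4), multiply_round_late(9, 14, 4))
```

In this model both candidates are available as soon as the partial products are summed, so no separate rounding addition follows the multiplication, which is the speed advantage the abstract attributes to early injection.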