Patent classifications
G06F7/49921
Division circuit and microprocessor
In an embodiment, a division circuit has an overflow determination circuit configured to determine whether or not a division result overflows by comparing absolute values of a dividend and a divisor, a replacement circuit configured to replace the dividend with a first value and replace the divisor with a second value when the overflow determination circuit determines that the division result overflows, and a stepwise division circuit configured to perform stepwise division on the dividend and the divisor or the first value and the second value.
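The overflow check and operand replacement described above can be illustrated with a small sketch. It assumes signed Q15 fractional operands (so the quotient overflows whenever the dividend's magnitude is at least the divisor's) and a restoring divider as the stepwise circuit. The constants `FIRST_VALUE` and `SECOND_VALUE` and both function names are hypothetical choices for illustration, not values taken from the patent.

```python
Q = 15                  # fractional bits of the assumed Q15 format
FIRST_VALUE = 0x7FFF    # hypothetical replacement dividend
SECOND_VALUE = 0x8000   # hypothetical replacement divisor

def stepwise_divide(dividend_mag, divisor_mag, steps=Q):
    """Restoring division producing one quotient bit per step.

    Requires dividend_mag < divisor_mag so the quotient fits in `steps` bits.
    """
    rem, quo = dividend_mag, 0
    for _ in range(steps):
        rem <<= 1
        quo <<= 1
        if rem >= divisor_mag:   # subtraction succeeds: quotient bit is 1
            rem -= divisor_mag
            quo |= 1
    return quo

def frac_divide(dividend, divisor):
    """Q15 division that saturates instead of overflowing."""
    negative = (dividend < 0) != (divisor < 0)
    a, b = abs(dividend), abs(divisor)
    # Overflow determination: compare absolute values only.
    # In this sketch a zero divisor also takes the overflow path.
    if a >= b:
        # Replacement: feed the divider operands whose quotient is the
        # largest representable magnitude.
        a, b = FIRST_VALUE, SECOND_VALUE
    q = stepwise_divide(a, b)
    return -q if negative else q
```

With these assumed constants the same divider handles both the normal and the overflow case, so no separate saturation stage is needed after the division steps.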
Arithmetic circuitry for power-efficient multiply-add operations
Arithmetic circuitry includes first processing circuitry, second processing circuitry, adder circuitry, and saturation logic circuitry. The first processing circuitry divides one input term into blocks of a predetermined number of digits, such that the least significant bit of each block overlaps the most significant bit of the adjacent lower-order block, and calculates a partial product of each block and the other input term based on Booth recoding in which a sign is controlled when the Booth recoding values become ±0. The second processing circuitry simplifies the partial products. The adder circuitry outputs the sum of the result of the simplification and an addition term. The saturation logic circuitry executes saturation processing based on the result output by the second processing circuitry and the result output by the adder circuitry.
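As a rough software model of the data flow above, the following sketch recodes the multiplier into overlapping radix-4 Booth blocks, forms one partial product per block, adds the addition term, and saturates the sum. The radix, the 16-bit operand width, the 32-bit accumulator width, and all function names are assumptions made for illustration. The sketch does not model the sign control for ±0 digits or the hardware simplification of partial products; it simply sums them.

```python
def booth_radix4_digits(multiplier, bits=16):
    """Recode a signed multiplier into radix-4 Booth digits in {-2, ..., 2}.

    Each digit comes from a 3-bit block whose lowest bit overlaps the
    highest bit of the next lower block.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    pattern = multiplier & ((1 << bits) - 1)   # two's complement bit pattern
    digits = []
    prev = 0                                   # implicit bit right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1                              # overlap bit for the next block
    return digits

def saturate(value, bits):
    """Clamp value to the signed range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def multiply_add(x, y, addend, bits=16, acc_bits=32):
    """Compute saturate(x * y + addend) from Booth partial products."""
    partial_products = [
        (digit * x) << (2 * i)
        for i, digit in enumerate(booth_radix4_digits(y, bits))
    ]
    return saturate(sum(partial_products) + addend, acc_bits)
```

Radix-4 recoding halves the number of partial products relative to one per multiplier bit, which is the usual motivation for Booth-based multipliers.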
NEURAL NETWORK DEVICE, METHOD OF OPERATING THE NEURAL NETWORK DEVICE, AND APPLICATION PROCESSOR INCLUDING THE NEURAL NETWORK DEVICE
A neural network device includes a floating-point arithmetic circuit configured to perform a dot product operation and an accumulation operation; and a buffer configured to store first cumulative data generated by the floating-point arithmetic circuit, wherein the floating-point arithmetic circuit is further configured to perform the dot product operation and the accumulation operation by: identifying a maximum value from a plurality of exponent addition results, obtained by respectively adding exponents of a plurality of floating-point data pairs, and an exponent value of the first cumulative data; performing, based on the maximum value, an align shift of a plurality of fraction multiplication results, obtained by respectively multiplying fractions of the plurality of floating-point data pairs, and a fraction part of the first cumulative data; and performing a summation of the plurality of aligned fraction multiplication results and the aligned fraction part of the first cumulative data.
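The align-and-sum scheme above can be sketched in software with integer fractions. This is an illustrative model only: the 24-bit fraction width (`FRAC_BITS`), the use of `math.frexp` to split operands, and the function name are assumptions, and truncation during the align shift stands in for whatever rounding the actual circuit applies.

```python
import math

FRAC_BITS = 24  # assumed fraction width per operand

def aligned_dot_accumulate(xs, ys, acc):
    """Return acc + sum(x * y) using max-exponent alignment of fractions."""
    terms = []  # (integer fraction, exponent), fraction scaled by 2**(2*FRAC_BITS)
    for x, y in zip(xs, ys):
        mx, ex = math.frexp(x)
        my, ey = math.frexp(y)
        fx = int(mx * (1 << FRAC_BITS))
        fy = int(my * (1 << FRAC_BITS))
        terms.append((fx * fy, ex + ey))       # fraction product, exponent sum
    ma, ea = math.frexp(acc)
    terms.append((int(ma * (1 << (2 * FRAC_BITS))), ea))  # cumulative data

    live = [(f, e) for f, e in terms if f != 0]  # zeros do not affect alignment
    if not live:
        return 0.0
    emax = max(e for _, e in live)               # maximum exponent
    total = sum(f >> (emax - e) for f, e in live)  # align shift, then summation
    return math.ldexp(total, emax - 2 * FRAC_BITS)
```

Because every term is aligned to the same exponent before the addition, the summation itself is a plain integer add and needs only one normalization at the end.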
ARITHMETIC CIRCUITRY
Arithmetic circuitry includes first processing circuitry, second processing circuitry, adder circuitry, and saturation logic circuitry. The first processing circuitry divides one input term into blocks of a predetermined number of digits, such that the least significant bit of each block overlaps the most significant bit of the adjacent lower-order block, and calculates a partial product of each block and the other input term based on Booth recoding in which a sign is controlled when the Booth recoding values become ±0. The second processing circuitry simplifies the partial products. The adder circuitry outputs the sum of the result of the simplification and an addition term. The saturation logic circuitry executes saturation processing based on the result output by the second processing circuitry and the result output by the adder circuitry.
Systems and methods for performing vector max/min instructions that also generate index values
Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to determine, for each data element position of the identified first and second operands, a maximum or minimum, store the determined maximums or minimums in corresponding data element positions of the identified first operand, and determine and store, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.
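A minimal sketch of the per-element behavior follows. The function name, the encoding of the source indication (0 for the first operand, 1 for the second), and the tie-breaking rule (ties keep the first operand) are assumptions for illustration. The sketch returns new lists, whereas the instruction described above writes the results back into the first operand.

```python
def vector_minmax_with_source(src1, src2, use_max=True):
    """Element-wise max (or min) plus a per-element source indication."""
    results, sources = [], []
    for a, b in zip(src1, src2):
        take_second = (b > a) if use_max else (b < a)
        results.append(b if take_second else a)
        sources.append(1 if take_second else 0)   # 0: first operand, 1: second
    return results, sources
```

Producing the source indication in the same pass avoids a second compare over the data, which matters when the indication feeds a later argmax or argmin reduction.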
APPARATUS AND METHOD FOR MULTIPLICATION AND ACCUMULATION OF COMPLEX VALUES
An apparatus and method for accumulating complex numbers. For example, one embodiment of a processor comprises: a first source register to store a first plurality of real and imaginary components of a first set of complex numbers; a second source register to store a second plurality of real and imaginary components of a second set of complex numbers; wherein the real and imaginary components of the first and second pluralities are to be stored as packed data elements within the first and second source registers; and execution circuitry comprising: multiplier circuitry to multiply selected real and imaginary values from the first source register with selected real and imaginary values from the second source register to generate a first plurality of values, adder circuitry to add and subtract selected combinations of the first plurality of values to generate a second plurality of values, and accumulation circuitry to combine the second plurality of values with a third set of complex numbers stored in a destination register to generate an accumulated result, the accumulated result to be written to the destination register.
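The multiply, add/subtract, and accumulate stages above map onto the usual complex product formula. The sketch below models the registers as Python lists with interleaved components `[re0, im0, re1, im1, ...]`; this packing order and the function name are assumptions, and element widths, rounding, and saturation are not modeled.

```python
def complex_multiply_accumulate(src1, src2, dest):
    """Return dest + src1 * src2 for packed, interleaved complex values."""
    result = list(dest)
    for i in range(0, len(src1), 2):
        a_re, a_im = src1[i], src1[i + 1]
        b_re, b_im = src2[i], src2[i + 1]
        # Multiplier stage: the four real products.
        rr, ii = a_re * b_re, a_im * b_im
        ri, ir = a_re * b_im, a_im * b_re
        # Adder stage: subtract for the real part, add for the imaginary part.
        prod_re, prod_im = rr - ii, ri + ir
        # Accumulation stage: combine with the destination contents.
        result[i] += prod_re
        result[i + 1] += prod_im
    return result
```

For instance, with `src1 = [1, 2, 3, -1]` and `src2 = [3, 4, 0, 2]` the two products are -5+10i and 2+6i, which are then added to whatever the destination already holds.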
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS SPECIFYING VECTOR TILE LOGIC OPERATIONS
Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to determine, for each data element position of the identified first and second operands, a maximum or minimum, store the determined maximums or minimums in corresponding data element positions of the identified first operand, and determine and store, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.
SYSTEMS, APPARATUSES, AND METHODS FOR VECTOR-PACKED FRACTIONAL MULTIPLICATION OF SIGNED WORDS WITH ROUNDING, SATURATION, AND HIGH-RESULT SELECTION
Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to: perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values; round each of the plurality of output values; detect whether any of the plurality of output values reflects an overflow or underflow; saturate any output value that reflects an overflow or underflow; and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.
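The sequence of multiply, round, saturate, and high-half selection can be sketched for 16-bit signed words as follows. The Q15 interpretation of the operands, the round-half-up rule, and the function names are assumptions; the instruction described above may define these details differently.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def fractional_mul_round_sat_high(a, b):
    """Q15 x Q15 multiply with rounding, saturation, and high-word selection."""
    product = (a * b) << 1           # Q30 product doubled to Q31
    rounded = product + (1 << 15)    # round at the bit just below the high word
    high = rounded >> 16             # select the high 16 bits
    # Only (-1.0) * (-1.0) exceeds the range here; clamp it to the largest value.
    return max(INT16_MIN, min(INT16_MAX, high))

def packed_fractional_mul(src1, src2):
    """Apply the scalar operation to each pair of packed elements."""
    return [fractional_mul_round_sat_high(a, b) for a, b in zip(src1, src2)]
```

In this model 0.5 times 0.5 (16384 times 16384) yields 8192, which is 0.25 in Q15, while -32768 times -32768 would be +1.0 and is saturated to 32767.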
Overflow Event Counter
A processing device comprises a register configured to store a count value indicating the number of times overflow events have resulted from arithmetic operations performed by the processing device. An execution unit of the device, in response to performing an arithmetic operation whose result extends beyond one of the predefined limit values for the floating-point format, stores a result value that is within the predefined limit values and causes the count value to be incremented. The count value provides an efficient way of determining the number of overflow events that have occurred during the arithmetic processing performed by the execution unit, and serves as a measure of the inaccuracy introduced into the results of the application processing by overflow events.
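A small software analogue of the counter is sketched below. It assumes a half-precision-like format whose limit values are ±65504 and clamps out-of-range results to the nearest limit; the class name, method names, and the choice of limits are hypothetical, and rounding to the narrower format is not modeled.

```python
FP16_MAX = 65504.0  # assumed limit value of the floating-point format

class ExecutionUnit:
    """Arithmetic unit that saturates on overflow and counts such events."""

    def __init__(self):
        self.overflow_count = 0   # stands in for the count register

    def _store(self, value):
        # Replace an out-of-range result with the nearest limit value
        # and record the overflow event.
        if value > FP16_MAX:
            self.overflow_count += 1
            return FP16_MAX
        if value < -FP16_MAX:
            self.overflow_count += 1
            return -FP16_MAX
        return value

    def add(self, a, b):
        return self._store(a + b)

    def multiply(self, a, b):
        return self._store(a * b)
```

After a batch of operations, software can read `overflow_count` once instead of checking every individual result, which is what makes the counter cheap to use as an accuracy metric.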