Patent classifications
G06F2207/3884
Floating-point number operation circuit and method
This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path to simplify the circuit and improve the performance of the processor.
Power saving floating point multiplier-accumulator with precision-aware accumulation
A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction and a maximum exponent. A range estimator forms a possible range of values from the exponent differences and determines an adder precision. The integer form fractions are summed using the adder precision, a sign bit is extracted, and a floating point value is output. Each MAC processor provides its integer form fraction with a precision determined by the MAC processor's exponent difference.
DIVIDE/SQUARE-ROOT PIPELINE AND METHOD
An apparatus comprises a divide/square-root pipeline comprising: a plurality of divide/square-root iteration pipeline stages each to perform a respective iteration of a digit-recurrence divide or square root operation; and signal paths to supply outputs generated by one divide/square root iteration pipeline stage in one iteration as inputs to a subsequent divide/square root iteration pipeline stage of the divide/square-root pipeline for performing a subsequent iteration of the digit-recurrence divide or square root operation. The divide/square-root pipeline is capable of performing the digit-recurrence divide or square root operation on a floating-point operand to generate a floating-point result.
IMPLEMENTING LOGARITHMIC AND ANTILOGARITHMIC OPERATIONS BASED ON PIECEWISE LINEAR APPROXIMATION
Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on linear piecewise approximation. An example processor includes a piece wise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piece wise linear log approximation circuit approximates a fractional portion of a fixed point number using a linear approximation of the mantissa of the floating-point number. The piece wise linear log approximation circuit also derives an integer from the exponent.
Arithmetic apparatus and control method of the same using cordic algorithm
An arithmetic apparatus comprises a plurality of cascade-connected arithmetic units. Each of the plurality of arithmetic units comprises: a calculator configured to operate in one of a rotation mode of performing a rotation calculation, and a vectoring mode of calculating a rotation angle; and a holding unit configured to hold rotational direction information output from the calculator in the vectoring mode. In addition, when operating in the rotation mode, the calculator performs the rotation calculation on data input from an arithmetic unit in a preceding stage, based on the rotational direction information held in the holding unit.
Pipelined hardware to accelerate modular arithmetic operations
Embodiments are directed to elliptic curve cryptography scalar multiplications in a generic field with heavy pipelining between field operations. A bit width is determined of operands in data to be processed by a modular hardware block. It is checked whether the bit width of the operands matches a fixed bit width of the modular hardware block. In response to there being a match, the modular hardware block processes the operands. In response to there being a mismatch, the operands are modified to be accommodated by the fixed bit width of the modular hardware block.
Process for performing floating point multiply-accumulate operations with precision based on exponent differences for saving power
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates an integer form fraction accompanied by a sign bit and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. A range estimating process determines a possible range of values from the exponent differences and determines an adder precision. A summing process adds all of the integer form fractions using the determined adder precision, and converts the sum to a floating point value using the maximum exponent sum, sign bit of the summed integer form fractions, and optionally performs a 2's complement of the summed integer form fraction if the sign bit is negative.
Process for dual mode floating point multiplier-accumulator with high precision mode for near zero accumulation results
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidths are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the second bitwidth is used by an adder tree using the second bitwidth to form a great precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to an normalizer which generates a floating point result.
Configurable MAC pipelines for finite-impulse-response filtering, and methods of operating same
An integrated circuit comprising a plurality MAC pipelines wherein each MAC pipeline includes: (i) a plurality of MACs connected in series and (ii) a plurality of data paths including an accumulation data path, wherein each MAC includes a multiplier to multiply to generate product data and an accumulator to generate sum data. The integrated circuit further comprises a plurality of control/configure circuits, wherein each control/configure circuit connects directly to and is associated with a MAC pipeline, wherein each control/configure circuit includes an accumulation data path which is configurable to directly connect to the accumulation data path of the MAC pipeline to form an accumulation ring when the control/configure circuit is configured in an accumulation mode, and an output data path configurable to directly connect to the output of the accumulation data path of the MAC pipeline when the control/configure circuit is configured in an output data mode.
MAC processing pipelines, circuitry to configure same, and methods of operating same
An integrated circuit comprising a plurality MAC processors, interconnected into a linear pipeline, configurable to process input data, wherein each MAC processor includes (A) a multiplier and (B) an accumulator circuit, and (C) a plurality of rotate input data paths, wherein each rotate input data path couples two sequential MAC processors of the linear pipeline including an input of the multiplier circuit of a first MAC processor of sequential MAC processors to an input of the multiplier circuit of the immediately following MAC processor of the associated sequential MAC processors of the pipelinewherein each rotate input data path is configurable to provide rotate input data from a first MAC processor of sequential MAC processors of the linear pipeline to the immediately following MAC processor of the associated sequential MAC processors thereby forming a serial circular path via the plurality of rotate input data paths.