Process for Performing Floating Point Multiply-Accumulate Operations with Precision Based on Exponent Differences for Saving Power
20220405052 · 2022-12-22
Assignee
Inventors
Cpc classification
International classification
Abstract
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates an integer form fraction accompanied by a sign bit and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. A range estimating process determines a possible range of values from the exponent differences and determines an adder precision. A summing process adds all of the integer form fractions using the determined adder precision, and converts the sum to a floating point value using the maximum exponent sum, sign bit of the summed integer form fractions, and optionally performs a 2's complement of the summed integer form fraction if the sign bit is negative.
Claims
1. A process for performing floating point multiplier-accumulator (MAC) operations on N pairs of values, each pair of values comprising an input value and a coefficient value, the process comprising: computing, for each of the N pairs, an exclusive OR operation performed on a sign bit of the input value and a sign bit of the coefficient value and generating a sign bit; computing, for each of the N pairs of values, an integer multiplication of a mantissa of the input value and a mantissa of the coefficient value and outputting a fraction; computing, for each of the N pairs of values, an exponent difference between a maximum exponent sum from all exponent sums of the N pairs an exponent sum for a pair of values; performing, for each of the N pairs of values: a pad operation by pre-pending 0 values and appending 0 values to an associated fraction to form a first value; complementing the first value if an associated sign bit is negative to generate a second value; shifting the second value to the right by an associated exponent difference value to generate a PCS value; computing a sum of all PCS values to form a PCS sum; normalizing the PCS sum, extracting a final sign bit from the normalized PCS sum, performing a 2s complement of the normalized PCS sum if the sign bit is negative to form a final mantissa; concatenating the final sign bit, final mantissa, and a final exponent computed from an adjusted maximum exponent, number of leading 0s in the sum of all PCS values, and number of PCS pre-pended 0s into a final floating point result.
2. The process of claim 1 where a fraction of each of the N pairs of values has a precision determined by the exponent difference.
3. The process of claim 2 where a fraction of each of the N pairs of values has a precision of 4 bits when an associated exponent difference is greater than 24.
4. The process of claim 2 where a fraction of each of the N pairs of values has a precision of 8 bits when an associated exponent difference is greater than 21.
5. The process of claim 1 where a fraction of each of the N pairs of values has a precision of 12 bits when an associated exponent difference is larger than 12.
6. The process of claim 1 where an exponent difference for a MAC processor that does not have a largest exponent sum is incremented if an associated integer multiplication does not overflow and an associated exponent difference is 0.
7. The process of claim 1 where an exponent difference for a MAC processor that does not have a largest exponent sum is decremented if an associated integer multiplication has an overflow and an associated exponent difference is greater than 0.
8. The process of claim 1 where the maximum exponent is incremented if an exponent difference is 0 and an associated multiplication has an overflow.
9. The process of claim 1 where an estimate of minimum value and maximum value is based on an associated exponent difference.
10. The process of claim 1 where computing the sum of PCS values is done with a variable precision based on an estimate of mantissa range.
11. The process of claim 9 where a sum of minimum values and a sum of maximum values determines a precision of a computed sum of PCS values.
12. The process of claim 11 where a computed sum of PCS values has a full precision and a less than full precision, and the less than full precision is enabled when a sum of exponent processor maximum values and a sum of exponent processor minimum values are either both positive values or both negative values.
13. The process of claim 11 where the computed sum of PCS values is performed using configurable cascadable 8 bit adders.
14. The process of claim 13 where the sum of all PCS values is computed in at least one of a 16 bit mode, a 24 bit mode, and a 32 bit mode.
15. A process for a floating point multiplier-accumulator (MAC) multiplying and accumulating N pairs of values, each pair of values comprising an input value and a coefficient value, the process operative on a plurality N of MAC processes, each MAC process receiving an input value and a corresponding coefficient value, each MAC process comprising: a sign process operative to perform an exclusive OR on a sign bit of the input value and a sign bit of the coefficient value and output a sign bit; a mantissa process configured to perform an integer multiplication of a hidden bit restored mantissa of the input value with a hidden bit restored mantissa of the coefficient value and output a fraction, upon a fraction overflow condition, the mantissa process dividing the output fraction by two and asserting an exponent increment; an exponent process generating an exponent sum of an exponent of the input value and an exponent of the coefficient value, the exponent process receiving a maximum exponent from a centralized find maximum exponent process, the exponent process modifying the maximum exponent and also outputting an exponent difference computed by subtracting the exponent sum from the maximum exponent, the exponent process also using the exponent difference and sign bit to estimate a minimum value and a maximum value; a Pad, Complement, Shift (PCS) process receiving the output fraction from the mantissa process and also the sign bit from the sign process, the PCS process configured to pad the fraction by pre-pending and appending 0s to the fraction to generate a first value, thereafter generating a second value by performing a two's complement of the first value if the sign bit is negative and otherwise taking no action on the first value, the PCS process configured to performing a shift operation on the second value by right shifting the second value by the exponent difference to generate a PCS output; the centralized find maximum exponent process receiving an exponent sum from each exponent process of the first pipeline stage, the centralized find maximum exponent process outputting a maximum exponent value corresponding to a maximum exponent process sum; a central range process operative to sum minimum values from the exponent process and also to sum maximum values from each exponent generator, the central range process forming an adder precision based on the sum of minimum values and the sum of maximum values; an adder process summing N PCS output values to a single value, the adder process configured to perform addition using the adder precision; a final stage process normalizing the single value, generating a final stage mantissa by performing a 2s complement of the single value if the single value is negative, generating a final stage sign bit, and concatenating the final stage sign bit, final stage mantissa, and adjusted maximum exponent into a MAC result.
16. The process of claim 15 where the exponent process computes a mantissa precision based on the exponent difference.
17. The process of claim 16 where the mantissa precision is 4 bits when the exponent difference is greater than 24.
18. The process of claim 16 where the mantissa precision is 8 bits when the exponent difference is greater than 21.
19. The process of claim 16 where the mantissa precision is 12 bits when the exponent difference is larger than 12.
10. The process of claim 15 where the adjusted maximum exponent is 8 bits and equal to the maximum exponent less a number of left shifts to remove leading 0s of the single value which were not pre-pended by the PCS process, less 127.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0062]
[0063]
[0064]
[0065]
[0066]
[0067]
[0068]
[0069]
DETAILED DESCRIPTION OF THE INVENTION
[0070]
[0071] An input row vector 101 such as [a.sub.1 a.sub.2 . . . a.sub.n] and one of the columns of the coefficient matrix 103
is input to the MAC 100 of
[0072] Each MAC processor comprises a MAC processor first pipeline stage 150 and a MAC processor second pipeline stage 152. The MAC processors of 150 and 152 are followed by a common adder stage 154 which receives integer form fractions 156 from all of the MAC processors and forms the single accumulated floating point output value 148. A central find maximum exponent processor 112 receives inputs from all of the exponent processors to generate a maximum exponent sum 164, and a central range estimator 162 receives minimum and maximum estimated ranges from all of the MAC processors to generate an estimated minimum and maximum range for the purpose of determining required adder precision.
[0073] MAC processor first stage 150 separates the components (sign, exponent, and mantissa) from the pair of multiplicands (in the present example, one of the example sixteen input 101 terms and a corresponding coefficient 103 term), each term a floating point value comprising a sign bit, 8 exponent bits and 7 mantissa bits). Each of the exemplar N input terms from 101 and corresponding N coefficient terms from 103 are provided to a separate one of the 16 pipeline stages 150/152, each input term and coefficient term separated into sign, exponent, and mantissa component for processing by a respective pipeline stages.
[0074] An example floating point value may be represented by:
−1.sup.S*(1+b.sub.n*2.sup.−1+b.sub.n-1*2.sup.−2+ . . . +b.sub.0*2.sup.−n)*2.sup.E
[0075] where S is the sign bit, and [bn . . . b0] is the mantissa (for n bits), and E is the exponent (as an unsigned integer, in the range 0-255 unsigned representing an exponent range −127 to +128 in the present examples). It is important to note that the mantissa leading term 1 which precedes b.sub.n*2.sup.−1 in the above expression is known as a “hidden bit” in the representation of the floating point number, as it is implied by the floating point format but is not expressly present in the floating point format. Accordingly, the range of a mantissa of the above format is always in the range from 1.0 to less than 2.0. These floating point format examples and N=16 adder tree of
[0076] Each first pipeline stage 150 has a sign bit processor 105 and sign bit (XOR) register 107, a mantissa processor 104 and fraction register 108, and an exponent processor 106. The Find Max Exponent 112 function is shown in dashed lines as it is a separate module which receives exponent sums from all N stages of exponent processor 106 and provides its MAX_EXP output 164 representing the maximum exponent from among the exponent processors 106 to all exponent processors 106.
[0077]
[0078]
[0079]
[0080] Exponent Difference Adjustment 406 is operative to modify EXP_DIFF (Max_Exp-curr_exp) 404 and MAX_EXP 154 as described below to generate the Exp_Diff output 115 and MAX_EXP 130A according to the method of
[0081] EXP_DIFF 115 is generated by incrementing Max_exp-current 404 if EXP_INC 113 is not asserted and the current stage is also the largest exponent (path 728 of
[0082] EXP_DIFF 115 is generated by decrementing max-current 404 if EXP_INC 113 is asserted and the current station is not the largest exponent sum (path 729 of
[0083] MAX_EXP increments if EXP_INC is asserted and the current station is also the largest exponent (path 732 of
[0084] Each exponent processor 106 generates an output range_est 117 derived from the exponent difference 404 and sign bit 166, and also generates an output Reg_en 111 derived from the exponent difference 404. These signals are used to reduce power consumption for certain cases that may come up frequently in floating point multiply-accumulate operations. The larger the exponent difference for a particular stage performing one of the N multiplications, the less likely that particular component will influence the accumulated result compared to contributions by multiplication results from pipeline stages with exponent differences closer to 0, and energy can be saved by not toggling register or processor bits for contributions with lower significance. In an example of the invention, Reg_en 111 controls the number of bits processed in the fraction register 108 or optionally mantissa processor 104 based on exponent difference. In one example of the invention shown in
[0085] One important feature of exponent summing is that each 8 bit exponent of a floating point format has an exponent range from 0-255 decimal, representing an exponent range from −127 to 128, whereas the exponent sum is being done as unsigned numbers for simplicity in the current example of the invention. Accordingly, when multiplying two floating point numbers A and B with exponents EXP_A and EXP_B, the values represented by the exponent sum as (EXP_A−127)+(EXP_B−127), but when adding these as unsigned integers for simplicity as in the present application, the second −127 must be compensated before forming the exponent in the final stage. This compensation may be done at each MAC Processor exponent processor, or at the final stage before presenting the floating point MAC result. In the present invention, for an 8 bit exponent value, subtracting 127 for this compensation may be done either at each MAC processor exponent processor, or the compensation may be done once at the final output stage 146 by subtracting 127 from MAX_EXP 130 when the leading bit adjustments of normalizing the integer form fraction 168 is done. While not explicitly described in the N exponent processors 106 or the single normalizing stage 146, it is understood that this compensation may be done in either location.
[0086] Additionally, the adders 154 do not require full precision if the range of values being added results in a narrow range of possible values, as the lower significant bits of the addition operations similarly do not require as great an adder precision, which can be an additional source of power savings by not enabling those additional bits. In another example of the invention, the adders 124, 140, 142, and 144 are 32 bit adders comprised of a cascaded series of four 8 bit adders which can be enabled independently starting with the most significant 8 bits and adding subsequent 8 bit additional adders. In this embodiment, the exponent processor 106 generates a range estimate 106 based on identifying the smallest signed value and the largest signed value that each mantissa processor and exponent generator could produce by examination of the exponent difference only, combined with the sign bit. Each stage computes its possible signed smallest and largest values, which are added together by overall range estimator 162 to enable an appropriate adder precision, with the example 8 bit adders enabled from most significant adder to least significant adder using the adder_en signal 120. As a simplified example, if N=4 and each stage range estimator 408 generates the (min,max) values (8,16), (−64,−32), (4,8), and (8,16), the central range estimator 162 will estimate a range of (−44, 8). Since this range crosses 0, the summed value could include very small values such as 0.00001, requiring full precision (32 bit in the present example) of the adders. If the second value were (84,168) instead of (−64,−32), the range would be (84,168) (a single power of two different) indicating that the adders require less precision, such as the minimum of two 8 bit adders for 16 bits of precision. The relationship between overall range and number of adders enabled by the central range estimator 162 may be determine in any manner which preserves accuracy. In one example of the invention, an overall estimated range which includes a negative lower value and positive upper value results in adder_en enabling all adders, whereas an overall range which is entirely negative or positive enables fewer than all adders, such as two or three adders. Where the range is entirely positive or entirely negative, and has an upper extent which is separated by a multiple of more than 2.sup.7 or 2.sup.8 times the lower extent, enabling one or preferably two 8 bit adders may be used, and if the upper extent is separated by less than a multiple of more than 2.sup.7 or 2.sup.8 times the lower extent, enabling two or three adders may be used. In this manner, the adders 124, 140, 142, and 144 operate with variable precision depending on the result of the central range estimator.
[0087] The adders 124, 140, 142, and 144 summing the N outputs of the N second pipeline stage 152 PCS processor 122 are shown in
[0088] The Pad, Complement, Shift (PCS) Processor 122 is shown in the block diagram of
[0089] In this manner, each of the N first pipeline stages of
[0090] The second pipeline stage 152 is operative to receive the corresponding first pipeline stage outputs and perform additional operations. The mantissa Pad/Complement Shift (PCS) stage 122 receives the normalized mantissa value 114 from the first pipeline stage 150, and performs a first step of padding, whereby a fixed number of 0s is prepended and a fixed number of 0s is appended. Prepending leading 0s is done to maintain the range and precision of the summed result to prevent subsequent overflows during addition of the results from the example N=16 second pipeline stages during adder stage 154. For the addition of N=16 integers, an optimal padding of four prepended leading 0s is sufficient to prevent an overflow error during the addition of the 16 normalized mantissas. For an example 32 bit integer form fraction, the normalized mantissa integer 114 having 16 bits may be padded with 4 0 bits prepended (to accommodate 16 maximum non-overflow addition operations), and 12 0s may be appended to form a first integer form fraction of 32 bits. In general, the bit size after padding (shown as 32 in the present example, motivated by the use of four 8 bit adders which are individually enabled by Adder_en 120 from
[0091] The third step mantissa shift stage 506 of
[0092] The N output values from the Mantissa PCS 122 stage are summed in adder stage 154 as a binary tree of adders 124, 140, 142, and 144, resulting in a single integer form fraction value sent to output stage 146. If the integer form fraction 168 input to 146 is negative, then a negative sign bit component is generated, and a 2s complement of the integer form fraction 168 input to 146 is generated, along with a normalization step to round the integer form fraction 168 to the nearest 7 bit mantissa value and truncated to the mantissa component output format, in the present example, 7 bits (without hidden “1.” bit as previously described), and the exponent component is the MAX_EXP 130 output by exponent difference adjustment stage 406 with decimal 127 subtracted and also subtracting the number of leading 0s (ignoring the number of padded 0s) and left shifting the mantissa in one example of the invention. The number of pre-pended 0s of the PCS stage are removed during normalization, but not used in computing the adjusted exponent of the final MAC floating point result. If the integer form fraction input to output stage 146 is positive, the sign bit component is 0, the mantissa component is rounded and truncated to the number of bits required, and the exponent component is computed as before. The floating point output value is then the sign bit component, the exponent component, and the mantissa component according to the standard format previously described for floating point numbers.
[0093]
[0094] Step 706 is the separation of sign, mantissa, and exponent, as was previously described in
[0095]
[0096]
[0097]
[0098]
[0099] In a second example of the invention shown in
[0100]
[0101]
[0102] The present examples are provided for illustrative purposes only, and are not intended to limit the invention to only the embodiments shown. For example, the apparatus may be practiced as N pipeline stages operating concurrently, each pipeline stage forming an integer form fraction for use by a summing stage, with a first and second pipeline stage, so that each clock cycle generates a new MAC result. Alternatively, it is possible to scan the exponent sums to determine the MAC_EXP value, and thereafter to compute and sum each integer form fraction output from each Mantissa PCS stage separately, and accumulate each mantissa PCS output sequentially. The invention may be practiced as an apparatus or as a process without limitation to the examples provided merely for understanding the invention.