FRACTIONAL LOGARITHMIC NUMBER SYSTEM ADDER

Abstract

An adder for fractional logarithmic number system (FLNS) format operands includes a compare-and-swap circuit that inputs first and second FLNS operands represented by fixed point values and provides a greater one as operand x and a lesser or equal one as operand y. The sign bits of x and y are $s_x$ and $s_y$, respectively; the integer portions of x and y are $q_x$ and $q_y$, respectively; and the fraction portions of x and y have integer values $r_x$ and $r_y$, respectively. The compare-and-swap circuit is configured to provide $s_x$ as a sign bit, $s_z$, of a sum $z = x(1 + y/x)$ for $x \neq 0$. A subtraction circuit subtracts $(q_y + r_y/n) - (q_x + r_x/n)$ and outputs $q_\delta$ and $r_\delta$, such that $\delta = y/x$, where $n = 2^{w_r}$ and $w_r$ is a bit-width of $r_x$ and $r_y$. An approximation circuit provides an approximation of $(1 + \delta)$ to a nearest FLNS value, $\Delta$, as a fixed point value having an integer portion $q_\Delta$ and a fraction portion that has an integer value $r_\Delta$. A summing circuit adds $q_x + r_x/n + q_\Delta + r_\Delta/n$ in response to $s_x = s_y$, and subtracts $q_x + r_x/n - q_\Delta - r_\Delta/n$ in response to $s_x \neq s_y$, to provide the sum as a fixed point value having an integer portion $q_z$ and a fraction portion that as an integer has a value $r_z$.

Claims

1. An adder for fractional logarithmic number system (FLNS) format operands, comprising: a compare-and-swap circuit configured to input first and second FLNS operands represented by fixed point values and provide a greater one of the first and second operands as operand x, and provide a lesser or equal one of the first and second operands as operand y, wherein $s_x$ and $s_y$ are sign bits of x and y, respectively, $q_x$ and $q_y$ are integer portions of x and y, respectively, fraction portions of x and y as integers have values $r_x$ and $r_y$, respectively, $x = s_x \cdot 2^{q_x + r_x/n}$, $y = s_y \cdot 2^{q_y + r_y/n}$, $n = 2^{w_r}$, $w_r$ is a bit-width of $r_x$ and $r_y$, and the compare-and-swap circuit is configured to provide $s_x$ as a sign bit, $s_z$, of a sum $z = x(1 + y/x)$ for $x \neq 0$; a subtraction circuit configured to subtract $(q_y + r_y/n) - (q_x + r_x/n)$ and output $q_\delta$ and $r_\delta$, wherein $\delta = y/x$; an approximation circuit configured to provide an approximation of $(1 + \delta)$ to a nearest FLNS value, $\Delta$, as a fixed point value having an integer portion $q_\Delta$ and a fraction portion that as an integer has a value $r_\Delta$; and a summing circuit configured to add $q_x + r_x/n + q_\Delta + r_\Delta/n$ in response to $s_x = s_y$, and subtract $q_x + r_x/n - q_\Delta - r_\Delta/n$ in response to $s_x \neq s_y$, to provide the sum as a fixed point value having an integer portion $q_z$ and a fraction portion that as an integer has a value $r_z$.

2. The adder of claim 1, wherein the approximation circuit is configured to provide $\Delta$ as the FLNS value nearest to $(1 + 2^{q_\delta + r_\delta/n})$ in response to $s_x = s_y$, and as the FLNS value nearest to $(1 - 2^{q_\delta + r_\delta/n})$ in response to $s_x \neq s_y$.

3. The adder of claim 1, wherein the approximation circuit is configured to: map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$; and map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $|x| \geq 2|y|$.

4. The adder of claim 3, wherein the approximation circuit is configured to map each value of $r_\delta$ to a respective pair of values of $q_\Delta$ and $r_\Delta$ according to a third mapping in response to $s_x \neq s_y$ and $|x| < 2|y| < 2|x|$.

5. The adder of claim 4, wherein the approximation circuit includes a look-up table (124) that implements the third mapping.

6. The adder of claim 3, wherein the approximation circuit includes: a first decision-tree circuit configured to implement the first mapping; and a second decision-tree circuit configured to implement the second mapping.

7. The adder of claim 1, wherein the approximation circuit is configured to: map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$; and map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $q_\delta \neq 0$.

8. The adder of claim 7, wherein the approximation circuit is configured to map each value of $r_\delta$ to a respective pair of values of $q_\Delta$ and $r_\Delta$ according to a third mapping in response to $s_x \neq s_y$ and $q_\delta = 0$.

9. The adder of claim 1, wherein the summing circuit includes: a twos-complement converter circuit configured to convert the fixed point value having $q_\Delta$ and $r_\Delta$ to a negative twos-complement value; a selector circuit configured to select as an addend the fixed point value having $q_\Delta$ and $r_\Delta$ in response to $s_x = s_y$, and select as the addend the negative twos-complement value in response to $s_x \neq s_y$; and an adder circuit configured to add x to the addend.

10. The adder of claim 1, wherein the approximation circuit includes: a first decision-tree circuit configured to map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$, wherein the first decision-tree circuit is configured to compare $q_\delta + r_\delta/n$ to threshold values of $\log_2(2^{r_\Delta/n} + 2^{(r_\Delta+1)/n} - 2) - 1$ for a plurality of values of $r_\Delta \geq 0$; and a second decision-tree circuit configured to map each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $q_\delta \neq 0$, wherein the second decision-tree circuit is configured to compare $q_\delta + r_\delta/n$ to threshold values of $\log_2(-2^{-r_\Delta/n} - 2^{(-r_\Delta-1)/n} + 2) - 1$ for a plurality of values of $r_\Delta \geq 0$.

11. The adder of claim 10, wherein the approximation circuit includes a look-up table (124) that implements the third mapping, and the look-up table is configured with values of $\log_2(1 - 2^{-r_\delta/n})$ for $r_\delta \geq 1$.

12. A method for adding fractional logarithmic number system (FLNS) format operands, comprising: inputting first and second FLNS operands represented by fixed point values to a compare-and-swap circuit and providing a greater one of the first and second operands as operand x, and providing a lesser or equal one of the first and second operands as operand y, wherein $s_x$ and $s_y$ are sign bits of x and y, respectively, $q_x$ and $q_y$ are integer portions of x and y, respectively, fraction portions of x and y as integers have values $r_x$ and $r_y$, respectively, $x = s_x \cdot 2^{q_x + r_x/n}$, $y = s_y \cdot 2^{q_y + r_y/n}$, $n = 2^{w_r}$, $w_r$ is a bit-width of $r_x$ and $r_y$; providing $s_x$ as a sign bit, $s_z$, of a sum, $z = x(1 + y/x)$ for $x \neq 0$; subtracting by a subtraction circuit, $(q_y + r_y/n) - (q_x + r_x/n)$ and outputting $q_\delta$ and $r_\delta$, wherein $\delta = y/x$; approximating by an approximation circuit, an approximation $\Delta$ of $(1 + \delta)$ as a fixed point value having an integer portion $q_\Delta$ and a fraction portion that as an integer has a value $r_\Delta$; and adding by a summing circuit, $q_x + r_x/n + q_\Delta + r_\Delta/n$ in response to $s_x = s_y$, and subtracting $q_x + r_x/n - q_\Delta - r_\Delta/n$ in response to $s_x \neq s_y$, and providing the sum as a fixed point value having an integer portion $q_z$ and a fraction portion that as an integer has a value $r_z$.

13. The method of claim 12, wherein $\Delta$ is an FLNS value nearest to $(1 + 2^{q_\delta + r_\delta/n})$ in response to $s_x = s_y$, and the FLNS value nearest to $(1 - 2^{q_\delta + r_\delta/n})$ in response to $s_x \neq s_y$.

14. The method of claim 12, wherein the approximating includes: mapping each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$; and mapping each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $|x| \geq 2|y|$.

15. The method of claim 14, wherein the approximating includes mapping each value of $r_\delta$ to a respective pair of values of $q_\Delta$ and $r_\Delta$ according to a third mapping in response to $s_x \neq s_y$ and $|x| < 2|y| < 2|x|$.

16. The method of claim 15, wherein the approximating includes performing the third mapping by a look-up table.

17. The method of claim 14, wherein the approximating includes: performing the first mapping by a first decision-tree circuit; and performing the second mapping by a second decision-tree circuit.

18. The method of claim 12, wherein the approximating includes: mapping each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$; and mapping each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $q_\delta \neq 0$.

19. The method of claim 12, wherein the approximating includes: mapping by a first decision-tree circuit, each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a first mapping in response to $s_x = s_y$, and comparing $q_\delta + r_\delta/n$ to threshold values of $\log_2(2^{r_\Delta/n} + 2^{(r_\Delta+1)/n} - 2) - 1$ for a plurality of values of $r_\Delta \geq 0$; and mapping by a second decision-tree circuit, each range of a plurality of ranges of a plurality of possible values of $q_\delta + r_\delta/n$ to a respective value of $r_\Delta$ according to a second mapping in response to $s_x \neq s_y$ and $q_\delta \neq 0$, and comparing $q_\delta + r_\delta/n$ to threshold values of $\log_2(-2^{-r_\Delta/n} - 2^{(-r_\Delta-1)/n} + 2) - 1$ for a plurality of values of $r_\Delta \geq 0$.

20. The method of claim 19, wherein the approximating includes mapping by a look-up table configured with values of $\log_2(1 - 2^{-r_\delta/n})$ for $r_\delta \geq 1$.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Various aspects and features of the circuits and methods will become apparent upon review of the following detailed description and upon reference to the drawings in which:

[0008] FIG. 1 shows an exemplary circuit arrangement that implements a conversion-free FLNS adder;

[0009] FIG. 2 illustrates the thermometer function and mapping implemented by a first mapping circuit;

[0010] FIG. 3 illustrates the thermometer function and mapping implemented by a second mapping circuit;

[0011] FIG. 4 illustrates the mapping implemented by a third mapping circuit;

[0012] FIG. 5 shows an exemplary decision tree that implements the first mapping circuit;

[0013] FIG. 6 is a flowchart of an exemplary process of adding two FLNS operands; and

[0014] FIG. 7 is a block diagram depicting a System-on-Chip (SoC) that can implement the FLNS adder circuitry according to an example.

DETAILED DESCRIPTION

[0015] In the following description, numerous specific details are set forth to describe specific examples presented herein. It should be apparent, however, to one skilled in the art, that one or more other examples and/or variations of these examples may be practiced without all the specific details given below. In other instances, well known features have not been described in detail so as not to obscure the description of the examples herein. For ease of illustration, the same reference numerals may be used in different diagrams to refer to the same elements or additional instances of the same element.

[0016] Fractional LNS (FLNS) formats have been used to improve LNS precision via fractional exponents. In an FLNS format, the exponent is represented by a quotient and a remainder. In the FLNS representation of a number x, where M is the bit-width of x,

$x = s_x \cdot 2^{\dot{x}/\gamma}, \quad \dot{x} = 0, 1, 2, \ldots, 2^{M-1} - 1,$

where $\dot{x}$ is an integer and $\gamma$ is the base factor that controls the fractional exponent of the base. $\gamma$ controls the quantization gap, which is the distance between successive representable values within the number system.

[0017] The FLNS expression of x can be alternatively stated as:

$x = s_x \cdot 2^{q_x + r_x/n}, \quad q_x \in \mathbb{Z}, \quad r_x \in \mathbb{N}, \quad n = 2^{w_r}, \quad r_x < n,$

where $q_x$ and $r_x$ are the quotient and remainder of $\dot{x}/\gamma$, and $w_r$ represents the bit-width of the remainder.
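As an illustration of the two equivalent forms above, the following Python sketch splits an integer exponent code into its quotient and remainder and decodes the represented value. It assumes that the base factor equals n = 2^(w_r), so that the exponent code divided by the base factor is q + r/n, and it models the sign as +1 or -1 rather than as a sign bit. The function names `flns_split` and `flns_decode` are hypothetical and not part of the disclosure.

```python
def flns_split(x_dot: int, w_r: int) -> tuple[int, int]:
    """Split an integer exponent code into (quotient q, remainder r).

    Assumption: the base factor equals n = 2**w_r, so x_dot / n = q + r / n.
    """
    n = 1 << w_r
    return divmod(x_dot, n)


def flns_decode(sign: int, q: int, r: int, w_r: int) -> float:
    """Return sign * 2**(q + r/n) as a float, with sign given as +1 or -1."""
    n = 1 << w_r
    return sign * 2.0 ** (q + r / n)


q, r = flns_split(50, 3)          # n = 8, and 50 = 6 * 8 + 2
value = flns_decode(+1, q, r, 3)  # 2**(6 + 2/8) = 2**6.25
```

With a 3-bit remainder, adjacent representable magnitudes differ by a factor of 2^(1/8), which is the quantization gap referred to above.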

[0018] Prior approaches involving FLNS have attempted to reduce the hardware resource requirements for performing addition operations by converting the operands to fixed point format and using lookup tables to determine the contribution of the remainder of the exponent. However, the conversion between FLNS format and fixed point format introduces extra overhead and can significantly degrade performance in applications such as neural networks.

[0019] The disclosed approaches avoid inefficiencies associated with converting operands to fixed point values and converting sums back to FLNS format while improving computational efficiency in adding FLNS format operands. Operands need not be converted from FLNS format to fixed point for accumulation. Avoiding the conversion of values between FLNS format and fixed-point format can significantly improve performance and reduce resource requirements in applications such as neural networks in which accumulated values from one layer are provided as input to the next layer for multiplications.

[0020] The disclosed methods and circuits provide a conversion-free FLNS adder of two operands. Each addition is performed by way of a subtraction circuit performing logarithmic division of the operand having the lesser absolute value by the operand having the greater absolute value, approximation circuitry estimating a nearest FLNS value of the result plus 1, and an adder circuit performing a logarithmic multiplication of the estimated value and the greater operand.

[0021] FIG. 1 shows an exemplary circuit arrangement 100 that implements a conversion-free FLNS adder. Calculating z=x+y can be alternatively expressed as $z = x(1 + \delta)$, where $|x| > |y|$, $\delta = y/x$, and $x \neq 0$. Having x and y in FLNS form allows $\delta$ to be determined by subtracting fixed point exponents:

$\delta = s_\delta \cdot 2^{q_\delta + r_\delta/n} = s_x s_y \cdot 2^{q_y - q_x + r_y/n - r_x/n}$
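A minimal sketch of this step, assuming each exponent is held as a single integer code e = q*n + r and each sign as +1 or -1 (both representation choices, and the name `ratio_exponent`, are assumptions made for illustration): the ratio y/x is obtained with one integer subtraction and a sign product, without leaving the logarithmic domain.

```python
def ratio_exponent(s_x: int, e_x: int, s_y: int, e_y: int) -> tuple[int, int]:
    """Return (sign, exponent code) of y/x for FLNS operands x and y.

    Exponent codes are integers in units of 1/n, i.e. e = q * n + r.
    """
    return s_x * s_y, e_y - e_x


n = 8                                       # 3-bit remainder
s_d, e_d = ratio_exponent(+1, 50, -1, 42)   # x = 2**6.25, y = -(2**5.25)
ratio = s_d * 2.0 ** (e_d / n)              # y / x
```

Because x is the operand with the greater magnitude, the resulting exponent code is never positive.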

[0022] The term $(1 + \delta)$ can be approximated ($1 + \delta \rightarrow \Delta$) to the nearest FLNS value, $\Delta = 2^{q_\Delta + r_\Delta/n}$, using a quantization mapping, $1 + \delta \rightarrow \Delta$, such that:

[00001] $z \approx x \cdot \Delta = s_x \cdot 2^{q_x + q_\Delta + r_x/n + r_\Delta/n}$

[0023] The sum, z, can be efficiently calculated by adding the exponents of x and $\Delta$. Note that $\Delta > 0$ because $|y|/|x| < 1$.

[0024] Referring to FIG. 1, FLNS format operands OP1 and OP2 are input from registers 102 and 104, respectively. Each of the FLNS operands can be in fixed-point twos-complement form, having a sign bit ($s_{OP1}$ and $s_{OP2}$) and exponent elements that include a quotient ($q_{OP1}$ and $q_{OP2}$) and a remainder ($r_{OP1}$ and $r_{OP2}$). The integer portion of OP1 is $q_{OP1}$, and the fractional portion of OP1 when interpreted as an integer is $r_{OP1}$. The integer portion of OP2 is $q_{OP2}$, and the fractional portion of OP2 when interpreted as an integer is $r_{OP2}$. An example of a fixed point operand having a 4-bit integer portion and a 3-bit fraction portion is 0110.010. The integer portion q is 0110, which is $6_{10}$, and the fraction portion is 010, which has an integer value of r=2.
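The 0110.010 example can be checked with a short Python sketch. The helper name `split_fixed_point` is hypothetical, and the sketch handles only the unsigned bit pattern shown in the example, not negative twos-complement values.

```python
def split_fixed_point(bits: str) -> tuple[int, int, int]:
    """Split a binary fixed point string into (q, r, n).

    q is the integer portion, r is the fraction portion read as an
    integer, and n = 2**(number of fraction bits).
    """
    int_bits, frac_bits = bits.split(".")
    q = int(int_bits, 2)
    r = int(frac_bits, 2)
    n = 1 << len(frac_bits)
    return q, r, n


q, r, n = split_fixed_point("0110.010")
exponent = q + r / n   # the exponent value represented by the operand
```

Here the exponent is 6 + 2/8 = 6.25, so the operand magnitude is 2^6.25.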

[0025] Circuits 106 and 108 compare the exponent elements of OP1 and OP2 and provide the one of OP1 and OP2 having the greater absolute value as a fixed-point twos-complement operand x in register 110 and the operand having the lesser absolute value as a fixed-point twos-complement operand y in register 112.

[0026] Subtraction circuit 114 subtracts x from y ($y - x = q_y - q_x + r_y/n - r_x/n$) and stores the result in fixed-point twos-complement form in register 116. The integer portion of the value in register 116 is $q_\delta$, and the fraction portion of the value in register 116 when interpreted as an integer is $r_\delta$.

[0027] Comparison circuit 118, mapping circuits $M_1$, $M_2$, and $M_3$, and selector circuit 126 form an approximation circuit. The approximation circuit maps $(1 + \delta)$ to the nearest FLNS value, $\Delta = 2^{q_\Delta + r_\Delta/n}$. $\Delta$ is an FLNS value nearest to $(1 + 2^{q_\delta + r_\delta/n})$ in response to $s_x = s_y$, and an FLNS value nearest to $(1 - 2^{q_\delta + r_\delta/n})$ in response to $s_x \neq s_y$ ($x + y = x \cdot (1 + y/x)$, and $x - y = x \cdot (1 - y/x)$).

[0028] The mapping circuits 120, 122, and 124 implement three different mappings, and the selector circuit selects the output from one of the mapping circuits. Each of mapping circuits $M_1$ and $M_2$ outputs an unsigned binary format integer $r_\Delta$, and mapping circuit $M_3$ outputs unsigned binary integers $q_\Delta$ and $r_\Delta$. The different mappings are based on mutually exclusive cases of the signs and the ratio of $|x|$ to $|y|$. The output of mapping $M_1$ (case (i)) is selected in response to $s_x = s_y$, the output of mapping $M_2$ (case (ii)) is selected in response to $s_x \neq s_y$ and $|x| \geq 2|y|$, and the output of mapping $M_3$ (case (iii)) is selected in response to $s_x \neq s_y$ and $|x| < 2|y| < 2|x|$.
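The three mutually exclusive cases can be summarized in a few lines of Python. This is a sketch under stated assumptions: signs are modeled as +1 or -1, magnitudes are passed as ordinary numbers for readability, and the function name `select_mapping` is hypothetical. Operands of equal magnitude and opposite sign (exact cancellation) fall outside the three cases as stated, so the sketch rejects them rather than guessing a behavior.

```python
def select_mapping(s_x: int, s_y: int, abs_x: float, abs_y: float) -> str:
    """Return which mapping (M1, M2 or M3) applies after compare-and-swap."""
    if abs_x < abs_y:
        raise ValueError("expected |x| >= |y| after compare-and-swap")
    if s_x == s_y:
        return "M1"          # case (i): same signs
    if abs_x == abs_y:
        raise ValueError("exact cancellation is not covered by cases (i)-(iii)")
    if abs_x >= 2 * abs_y:
        return "M2"          # case (ii): opposite signs, |x| >= 2|y|
    return "M3"              # case (iii): opposite signs, |x| < 2|y| < 2|x|


cases = [
    select_mapping(+1, +1, 4.0, 1.0),
    select_mapping(+1, -1, 4.0, 2.0),
    select_mapping(+1, -1, 4.0, 3.0),
]
```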

[0029] After swapping x and y such that $|x| \geq |y|$, x+y is computed as $x + y = x(1 + y/x) \approx x \cdot 2^{\Delta}$. If $s_x = s_y$, then $1 + y/x > 1$, $2^{\Delta} > 1$, and $\Delta \geq 0$. If $s_x \neq s_y$, then $1 + y/x < 1$, $2^{\Delta} < 1$, and $\Delta < 0$. Thus, for case i, $\Delta > 0$, and for cases ii and iii, $\Delta < 0$. To avoid twos-complement conversions for accessing the mapping circuits, the implemented mappings assume $\Delta > 0$, and $\Delta$ is applied differently between case i and cases ii and iii. For case i, $x + y \approx x \cdot 2^{\Delta}$, and for cases ii and iii, $x + y \approx x \cdot 2^{-\Delta}$. The output from $M_2$ for case ii is $r_\Delta \geq 0$, though the actual value of $r_\Delta$ for $\Delta$ in case ii is less than or equal to 0. The output from $M_3$ for case iii is $q_\Delta \geq 0$ and $r_\Delta > 0$, though the actual value of $q_\Delta$ is less than or equal to 0 and $r_\Delta$ is less than 0. Given that the actual values of mappings for cases ii and iii are less than or equal to 0, the outputs from mapping circuits $M_2$ and $M_3$ are converted to negative twos-complement values.

[0030] The mappings have either $n$ or $n - 1$ entries. In mapping $M_1$, the sum z is bounded within range $(x, 2x]$, i.e., $(s_x \cdot 2^{q_x + r_x/n}, s_x \cdot 2^{q_x + r_x/n + 1}]$. In FLNS format, $r_x, n \in \mathbb{N}$, and z has n discrete possible values, and therefore, the $M_1$ mapping has n meaningful discrete entries. The same is true in mapping $M_2$, where z is bounded within range $[\tfrac{1}{2}x, x)$, i.e., $[s_x \cdot 2^{q_x + r_x/n - 1}, s_x \cdot 2^{q_x + r_x/n})$. In mapping $M_3$, $1/2 < |\delta| < 1$, i.e., $2^{-1} < 2^{q_\delta + r_\delta/n} < 2^{0}$, such that $-1 < q_\delta + r_\delta/n < 0$. Because $\delta$ can take only one of $n - 1$ possible values, the $1 + \delta \rightarrow \Delta$ mapping $M_3$ contains $n - 1$ meaningful discrete entries.

[0031] Selector circuit 126 selects one of the outputs from the mapping circuits 120, 122, and 124 based on the states of the signals from comparison circuit 118 and the signal from XNOR circuit 130. In response to $s_x = s_y$, the selector circuit selects the output from mapping circuit 120 ($M_1$); in response to $s_x \neq s_y$ and $q_\delta \neq 0$, the selector circuit selects the output from mapping circuit 122 ($M_2$); and in response to $s_x \neq s_y$ and $q_\delta = 0$, the selector circuit selects the output from mapping circuit 124 ($M_3$). The signed binary integers $q_\Delta$ and $r_\Delta$ are stored as a signed fixed point value in register 128. The integer portion of the value in register 128 is $q_\Delta$, and the fraction portion of the value in register 128 when interpreted as an integer is $r_\Delta$.

[0032] Note that $q_\Delta = 0$ is stored in register 128 when the output of mapping $M_1$ or $M_2$ is selected.

[0033] For case (i), the output of mapping $M_1$ is always a positive value, and for cases (ii) and (iii), the outputs of mappings $M_2$ and $M_3$ are negative but unsigned. Twos-complement converter circuit 132 converts the value from register 128 to a signed twos-complement value (invert integer bits and add 1 to LSB), and selector circuit 134 selects either the value from register 128 or the signed twos-complement value from converter circuit 132 in response to the signal from XNOR circuit 130. In response to $s_x = s_y$, the signal from XNOR circuit 130 causes selector circuit 134 to select the output from register 128, and in response to $s_x \neq s_y$, the signal from XNOR circuit 130 causes selector circuit 134 to select the output from converter circuit 132.

[0034] Summing circuitry adds $q_x + r_x/n + q_\Delta + r_\Delta/n$ in response to $s_x = s_y$, and subtracts $q_x + r_x/n - q_\Delta - r_\Delta/n$ in response to $s_x \neq s_y$, to provide the sum z as a fixed point value having an integer portion $q_z$ and a fraction portion that as an integer has the value $r_z$ ($s_z \cdot 2^{q_z + r_z/n} \approx x + y$). The summing circuitry includes twos-complement converter 132, selector circuit 134, and adder 136.

[0035] The twos-complement converter 132 is a circuit that converts the unsigned fixed point value from register 128 to a negative twos-complement value. The selector circuit 134 selects as an addend either the fixed point value from register 128 in response to the signal from XNOR circuit 130 indicating $s_x = s_y$, or the negative twos-complement value from circuit 132 in response to the signal from the XNOR circuit indicating $s_x \neq s_y$. The adder circuit 136 adds the value from register 110 (without the sign bit $s_x$) to the addend (without the sign bit if the twos-complement value is selected) selected by selector circuit 134 and provides the sum as a fixed point value in register 138.
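The converter, selector, and adder path can be modeled on integer exponent codes as follows. This is a sketch, not the disclosed circuit: the `width` parameter (register width in bits) and the function name `sum_exponents` are assumptions, and the negation shown is the standard twos-complement operation (invert all bits of the word, add 1) applied to the whole fixed point word.

```python
def sum_exponents(e_x: int, e_approx: int, same_sign: bool, width: int) -> int:
    """Add or subtract the unsigned mapped exponent to/from the exponent of x.

    e_x and e_approx are unsigned fixed point codes (q * n + r) that fit in
    `width` bits; the result is returned modulo 2**width.
    """
    mask = (1 << width) - 1
    negated = (~e_approx + 1) & mask                 # twos-complement converter
    addend = e_approx if same_sign else negated      # selector
    return (e_x + addend) & mask                     # adder


added = sum_exponents(50, 8, same_sign=True, width=8)        # 6.25 + 1.0
subtracted = sum_exponents(50, 8, same_sign=False, width=8)  # 6.25 - 1.0
```

Selecting between the value and its negation lets a single adder serve both the same-sign and opposite-sign cases.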

[0036] FIGS. 2, 3, and 4 illustrate the mapping of $1 + \delta \rightarrow \Delta$ for cases i, ii, and iii, respectively. According to the disclosed approaches, the mappings of cases i and ii are implemented by thermometer functions. The thermometer functions, which can be implemented by decision tree circuits, map between the exponent of $\delta$, $q_\delta + r_\delta/n$, and the fractional exponent of $\Delta$, $r_\Delta$. FIG. 5 shows an exemplary implementation of the decision tree circuit for mapping $M_1$. The decision tree circuit for mapping $M_2$ would have different thresholds, but is not shown. Thermometer thresholds separating the entries can be pre-computed and configured into the decision tree circuits as constant values. The thresholds can be calculated by solving an inequality for each pair of adjacent entries. For example, the inequality for case (i) is:

$r_\Delta \in \mathbb{N},\ r_\Delta < n - 1,\quad 1 + 2^{q_\delta + r_\delta/n} - 2^{r_\Delta/n} \leq 2^{(r_\Delta+1)/n} - 1 - 2^{q_\delta + r_\delta/n}$

which reduces to:

$r_\Delta \in \mathbb{N},\ r_\Delta < n,\quad q_\delta + r_\delta/n \leq \log_2(2^{r_\Delta/n} + 2^{(r_\Delta+1)/n} - 2) - 1.$

The right-hand side of the inequality defines the threshold values. Similarly, the inequality for case (ii) is

$r_\Delta \in \mathbb{N},\ r_\Delta < n,\quad q_\delta + r_\delta/n \leq \log_2(-2^{-r_\Delta/n} - 2^{(-r_\Delta-1)/n} + 2) - 1$
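The threshold expressions for the two thermometer mappings can be pre-computed as in the following Python sketch. It assumes the thresholds take the midpoint form log2(2^(r/n) + 2^((r+1)/n) - 2) - 1 for case (i) and log2(2 - 2^(-r/n) - 2^(-(r+1)/n)) - 1 for case (ii), evaluated for r = 0, ..., n-1 in floating point; a hardware implementation would quantize these constants. The function names are hypothetical.

```python
import math


def thresholds_case_i(n: int) -> list[float]:
    """Thresholds on the ratio exponent for same-sign operands."""
    return [math.log2(2 ** (r / n) + 2 ** ((r + 1) / n) - 2) - 1
            for r in range(n)]


def thresholds_case_ii(n: int) -> list[float]:
    """Thresholds on the ratio exponent for opposite-sign operands."""
    return [math.log2(2 - 2 ** (-r / n) - 2 ** (-(r + 1) / n)) - 1
            for r in range(n)]


t1 = thresholds_case_i(8)
t2 = thresholds_case_ii(8)

# Thermometer mapping: the output is the number of thresholds lying
# strictly below the input exponent.
r_same_sign = sum(t < -1.0 for t in t1)   # |y|/|x| = 1/2, same signs
r_opp_sign = sum(t < -2.0 for t in t2)    # |y|/|x| = 1/4, opposite signs
```

For n = 8, a ratio of 1/2 with equal signs gives 1 + 1/2 = 1.5, whose nearest representable neighbor is 2^(5/8); a ratio of 1/4 with opposite signs gives 0.75, whose nearest neighbor is 2^(-3/8).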

[0037] FIG. 2 illustrates the thermometer function and mapping implemented by the mapping circuit $M_1$ for case (i). The input is a fixed point value, $q_\delta + r_\delta/n$, and the output is an integer value of $r_\Delta$ that maps to the input. Each threshold is computed as $\log_2(2^{r_\Delta/n} + 2^{(r_\Delta+1)/n} - 2) - 1$ for one of the possible values of $r_\Delta$. For example, the threshold computed for $r_\Delta = 3$ is $\log_2(2^{3/n} + 2^{4/n} - 2) - 1$. A value of $q_\delta + r_\delta/n$ that is less than or equal to $\log_2(2^{3/n} + 2^{4/n} - 2) - 1$ and greater than $\log_2(2^{2/n} + 2^{3/n} - 2) - 1$ maps to $r_\Delta = 3$.

[0038] FIG. 3 illustrates the thermometer function and mapping implemented by the mapping circuit $M_2$ for case (ii). The input is a fixed point value, $q_\delta + r_\delta/n$, and the output is an integer value of $r_\Delta$ that maps to the input. Each threshold is computed as $\log_2(-2^{-r_\Delta/n} - 2^{(-r_\Delta-1)/n} + 2) - 1$ for one of the possible values of $r_\Delta$. For example, the threshold computed for $r_\Delta = 3$ is $\log_2(-2^{-3/n} - 2^{-4/n} + 2) - 1$. A value of $q_\delta + r_\delta/n$ that is less than or equal to $\log_2(-2^{-3/n} - 2^{-4/n} + 2) - 1$ and greater than $\log_2(-2^{-2/n} - 2^{-3/n} + 2) - 1$ maps to $r_\Delta = 3$.

[0039] FIG. 4 illustrates the mapping implemented by the mapping circuit $M_3$ for case (iii). In case (iii), $|x| < 2|y| < 2|x|$, which means that x and y are close in magnitude. Because $-1 < -q_\delta - r_\delta/n < 0$, it is known that $q_\delta = 0$, and $r_\delta \in \mathbb{N}$, $0 < r_\delta < n$. That is, there are $n - 1$ discrete entries in the mapping of $1 + \delta \rightarrow \Delta$ in case (iii). Given consecutive integer values of $r_\delta$, the mapping can be implemented as a lookup table (LUT) circuit having $(n - 1)$ entries.

[0040] The input to the LUT circuit is an integer value of $r_\delta$, and the output is a fixed point value, $q_\Delta + r_\Delta/n$, having $q_\Delta$ as the integer portion and $r_\Delta$ as the fraction portion when interpreted as an integer. The values configured into the LUT circuit are pre-computed as $\log_2(1 - 2^{-r_\delta/n})$.
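A possible way to pre-compute such a table is sketched below in Python. The sketch makes several assumptions that the text does not fix: entries are stored as unsigned magnitudes of log2(1 - 2^(-r/n)), since the mapped value is negated later in the summing path; entries use the same number of fraction bits as the operands; and each entry is rounded to the nearest code in the logarithmic domain. The name `build_m3_lut` is hypothetical.

```python
import math


def build_m3_lut(w_r: int) -> dict[int, tuple[int, int]]:
    """Map each remainder r = 1 .. n-1 of the ratio exponent to (q, r) of
    the magnitude of log2(1 - 2**(-r/n)), as a fixed point code."""
    n = 1 << w_r
    lut = {}
    for r_in in range(1, n):
        magnitude = -math.log2(1.0 - 2.0 ** (-r_in / n))  # positive value
        code = round(magnitude * n)                        # units of 1/n
        lut[r_in] = divmod(code, n)                        # (integer, fraction)
    return lut


lut = build_m3_lut(3)   # n = 8, so 7 entries
```

The table shows why case (iii) needs an integer portion in its output: when the operands are nearly equal in magnitude, the difference can be many octaves smaller than x.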

[0041] FIG. 5 shows an exemplary decision tree for the case (i) mapping. The decision tree searches for the interval between two thresholds into which an input fixed point value, $q_\delta + r_\delta/n$, falls. Each comparison (cmp) compares the input fixed point value, $q_\delta + r_\delta/n$, to a threshold and reduces the remaining search space by half.

[0042] The maximum threshold, $T(r_{\Delta,\max})$, is the pre-computed threshold with the maximum possible value of $r_\Delta$, and $T(r_{\Delta,\min})$ is the pre-computed threshold with the minimum possible value of $r_\Delta$. Each threshold $T(r_\Delta)$ is computed as $\log_2(2^{r_\Delta/n} + 2^{(r_\Delta+1)/n} - 2) - 1$, as described above.

[0043] At the top of the search tree, comparison 202 compares $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2)$, which is the threshold at approximately the middle of the range of values of $r_\Delta$. Note that each division of $r_{\Delta,\max}$ can be the floor of the result (i.e., floor($r_{\Delta,\max}/m$) for m a power of 2 greater than 0).

[0044] In response to $q_\delta + r_\delta/n$ being equal to the threshold $T(r_{\Delta,\max}/2)$, the output value is $r_{\Delta,\max}/2$. In response to $q_\delta + r_\delta/n < T(r_{\Delta,\max}/2)$, the decision tree continues with comparison 204 of $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/4)$. In response to $q_\delta + r_\delta/n > T(r_{\Delta,\max}/2)$, the decision tree continues with comparison 206 of $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4)$.

[0045] Comparison 206 compares $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4)$. In response to $q_\delta + r_\delta/n$ being equal to the threshold $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4)$, the output value is $r_{\Delta,\max}/2 + r_{\Delta,\max}/4$. In response to $q_\delta + r_\delta/n < T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4)$, the decision tree continues with comparison 208 of $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8)$.

[0046] At comparison 208, in response to $q_\delta + r_\delta/n < T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8)$, the decision tree continues with a comparison of $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8 - r_{\Delta,\max}/16)$ (not shown). In response to $q_\delta + r_\delta/n > T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8)$, the decision tree continues with a comparison of $q_\delta + r_\delta/n$ to $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8 + r_{\Delta,\max}/16)$ (not shown). In response to $q_\delta + r_\delta/n$ being equal to the threshold $T(r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8)$, the output value is $r_{\Delta,\max}/2 + r_{\Delta,\max}/4 - r_{\Delta,\max}/8$.

[0047] The search in the decision tree continues as described above until $q_\delta + r_\delta/n$ is equal to a threshold, or a comparison at the lowest level in the tree has been reached. At the lowest-level comparison, if $q_\delta + r_\delta/n$ is less than T(x), then the output is $r_\Delta = x$. If $q_\delta + r_\delta/n$ is greater than T(x), then the output is $r_\Delta = x + 1$.
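The halving search described above can be sketched as a loop over a sorted threshold list. This Python version is a software analogue, not the circuit of FIG. 5: it omits the early exit on equality and instead treats an input equal to a threshold as belonging to the lower interval. The name `thermometer_search` is hypothetical, and the case (i) thresholds are assumed to be log2(2^(r/n) + 2^((r+1)/n) - 2) - 1 for r = 0, ..., n-1.

```python
import math


def thermometer_search(value: float, thresholds: list[float]) -> int:
    """Return the smallest index r with value <= thresholds[r], or
    len(thresholds) if value exceeds every threshold."""
    lo, hi = 0, len(thresholds)
    while lo < hi:
        mid = (lo + hi) // 2          # compare against the middle threshold
        if value <= thresholds[mid]:
            hi = mid                  # answer is at mid or below
        else:
            lo = mid + 1              # answer is above mid
    return lo


n = 8
case_i = [math.log2(2 ** (r / n) + 2 ** ((r + 1) / n) - 2) - 1
          for r in range(n)]
r_mapped = thermometer_search(-1.0, case_i)   # ratio exponent of -1
```

With n thresholds the loop performs about log2(n) comparisons, which corresponds to the depth of the decision tree.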

[0048] The decision tree can be implemented by a programmed processor or by programmable logic. The programmed processor can access a data structure having the threshold values and indexed by values of $r_\Delta$. A programmable logic implementation can include individual comparison circuits having pre-configured threshold values and associated values of $r_\Delta$.

[0049] FIG. 6 is a flowchart of an exemplary process of adding two FLNS operands. At block 302, signed fixed point FLNS operands x and y are provided as input, with a swap circuit designating the lesser of the absolute values of the two operands as y and the other operand as x. The operands each have a sign bit, s, an integer part q, and a fractional part r.

[0050] At block 304, the sign bit of operand x is selected as the sign of the sum and can be stored in a register at the bit position of the sign bit of the signed fixed point sum.

[0051] At block 306, a subtraction circuit can determine $|y|/|x|$ by subtracting $(q_y + r_y/n) - (q_x + r_x/n)$, where $(q_y + r_y/n)$ denotes the unsigned fixed point value of y, and $(q_x + r_x/n)$ denotes the unsigned fixed point value of x. The difference is $\{q_\delta, r_\delta\}$, which denotes the unsigned fixed point value having an integer part $q_\delta$ and a fractional part that as an integer is denoted $r_\delta$.

[0052] At block 308, the term $(1 + \delta)$ is approximated ($1 + \delta \rightarrow \Delta$) to the nearest FLNS value, $\Delta = 2^{q_\Delta + r_\Delta/n}$, using a quantization mapping as previously described. Decision block 310, in response to $s_x = s_y$, selects the first mapping for case (i) as provided at block 312. Decision block 314, in response to $s_x \neq s_y$ and $q_\delta \neq 0$, selects the second mapping for case (ii) as provided at block 316. In response to $s_x \neq s_y$ and $q_\delta = 0$, decision block 314 selects the third mapping for case (iii) as provided at block 318. For cases (i) and (ii), the mappings provide the mapped value of $r_\Delta$, and $q_\Delta = 0$. For case (iii), the mapping provides the value of $\{q_\Delta, r_\Delta\}$. At block 320, the values $\{q_\Delta, r_\Delta\}$ from the mappings of cases (ii) and (iii) are converted to negative twos-complement values.

[0053] At block 322, the fixed point values $\{q_x, r_x\}$ and $\{q_\Delta, r_\Delta\}$ are summed by an adder, and the result $\{s_z, q_z, r_z\}$ is output at block 324.
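The overall flow of FIG. 6 can be summarized by a behavioral reference model. The Python sketch below is not the disclosed hardware: it replaces the decision trees and the look-up table with a floating point `log2` call, rounds to the nearest code in the logarithmic domain (which can differ slightly from nearest-value rounding in the linear domain), models signs as +1 or -1, and rejects exact cancellation. The name `flns_add` and the integer exponent codes e = q*n + r are assumptions made for illustration.

```python
import math


def flns_add(s1: int, e1: int, s2: int, e2: int, w_r: int) -> tuple[int, int]:
    """Add two FLNS operands given as (sign, exponent code) pairs.

    Returns (sign, exponent code) of the approximate sum.
    """
    n = 1 << w_r
    # Compare-and-swap: x is the operand with the greater exponent.
    (s_x, e_x), (s_y, e_y) = ((s1, e1), (s2, e2)) if e1 >= e2 else ((s2, e2), (s1, e1))
    ratio = 2.0 ** ((e_y - e_x) / n)           # |y| / |x|, in (0, 1]
    if s_x == s_y:
        step = round(n * math.log2(1.0 + ratio))
        return s_x, e_x + step                  # add the mapped exponent
    if e_x == e_y:
        raise ValueError("exact cancellation has no FLNS representation here")
    step = round(-n * math.log2(1.0 - ratio))   # unsigned magnitude
    return s_x, e_x - step                      # subtract the mapped exponent


doubled = flns_add(+1, 50, +1, 50, 3)    # 2**6.25 + 2**6.25
halved = flns_add(-1, 42, +1, 50, 3)     # 2**6.25 - 2**5.25, operands swapped
rounded = flns_add(+1, 50, +1, 42, 3)    # 2**6.25 + 2**5.25
```

The first two calls are exact (the results are 2^7.25 and 2^5.25), while the third illustrates quantization: the exact factor 1.5 is represented by 2^(5/8).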

[0054] FIG. 7 is a block diagram depicting a System-on-Chip (SoC) 401 that can implement the FLNS adder circuitry according to an example. In the example, the SoC includes the processing subsystem (PS) 402 and the programmable logic (PL) subsystem 403. The processing subsystem 402 includes various processing units, such as a real-time processing unit (RPU) 404, an application processing unit (APU) 405, a graphics processing unit (GPU) 406, a configuration and security unit (CSU) 412, and a platform management unit (PMU) 411. The PS 402 also includes various support circuits, such as on-chip memory (OCM) 414, transceivers 407, peripherals 408, interconnect 416, DMA circuit 409, memory controller 410, peripherals 415, and multiplexed IO (MIO) circuit 413. The processing units and the support circuits are interconnected by the interconnect 416. The PL subsystem 403 is also coupled to the interconnect 416. The transceivers 407 are coupled to external pins 424. The PL 403 is coupled to external pins 423. The memory controller 410 is coupled to external pins 422. The MIO 413 is coupled to external pins 420. The PS 402 is generally coupled to external pins 421. The APU 405 can include a CPU 417, memory 418, and support circuits 419. The APU 405 can include other circuitry, including L1 and L2 caches and the like. The RPU 404 can include additional circuitry, such as L1 caches and the like. The interconnect 416 can include cache-coherent interconnect or the like.

[0055] Referring to the PS 402, each of the processing units includes one or more central processing units (CPUs) and associated circuits, such as memories, interrupt controllers, direct memory access (DMA) controllers, memory management units (MMUs), floating point units (FPUs), and the like. The interconnect 416 includes various switches, busses, communication links, and the like configured to interconnect the processing units, as well as interconnect the other components in the PS 402 to the processing units.

[0056] The OCM 414 includes one or more RAM modules, which can be distributed throughout the PS 402. For example, the OCM 414 can include battery backed RAM (BBRAM), tightly coupled memory (TCM), and the like. The memory controller 410 can include a DRAM interface for accessing external DRAM. The peripherals 408, 415 can include one or more components that provide an interface to the PS 402. For example, the peripherals can include a graphics processing unit (GPU), a display interface (e.g., DisplayPort, high-definition multimedia interface (HDMI) port, etc.), universal serial bus (USB) ports, Ethernet ports, universal asynchronous receiver-transmitter (UART) ports, serial peripheral interface (SPI) ports, general purpose IO (GPIO) ports, serial advanced technology attachment (SATA) ports, PCIe ports, and the like. The peripherals 415 can be coupled to the MIO 413. The peripherals 408 can be coupled to the transceivers 407. The transceivers 407 can include serializer/deserializer (SERDES) circuits, MGTs, and the like.

[0057] Various logic may be implemented as circuitry to carry out one or more of the operations and activities described herein and/or shown in the figures. In these contexts, a circuit or circuitry may be referred to as logic, module, engine, or block. It should be understood that logic, modules, engines and blocks are all circuits that carry out one or more of the operations/activities. In certain implementations, a programmable circuit is one or more computer circuits programmed to execute a set (or sets) of instructions stored in a ROM or RAM and/or operate according to configuration data stored in a configuration memory.

[0058] Though aspects and features may in some cases be described in individual figures, it will be appreciated that features from one figure can be combined with features of another figure even though the combination is not explicitly shown or explicitly described as a combination.

[0059] The circuits and methods are thought to be applicable to a variety of systems for adding FLNS operands. Other aspects and features will be apparent to those skilled in the art from consideration of the specification. The circuits and methods may be implemented as one or more processors configured to execute software, as an application specific integrated circuit (ASIC), or as a logic on a programmable logic device. It is intended that the specification and drawings be considered as examples only, with a true scope of the invention being indicated by the following claims.

CPC classification: G06F7/4833 (PHYSICS); G06F7/49942 (PHYSICS)

International classification: G06F7/483 (PHYSICS); G06F7/499 (PHYSICS)
