FRACTIONAL LOGARITHMIC NUMBER SYSTEM ADDER
20240069865 ยท 2024-02-29
Assignee
Inventors
- Erwei Wang (London, GB)
- Samuel Richard Bayliss (Los Altos, CA, US)
- Philip James-Roxby (Longmont, CO, US)
Cpc classification
International classification
G06F7/483
PHYSICS
Abstract
An adder for fractional logarithmic number system (FLNS) format operands includes a compare-and-swap circuit that inputs first and second FLNS operands represented by fixed point values and provides a greater one as operand x and a lesser or equal one as operand y. Sign bits are s.sub.x and s.sub.y of x and y, respectively, q.sub.x and q.sub.y, are integer portions of x and y, respectively, fraction portions of x and y have integer values r.sub.x and r.sub.y, respectively. The compare-and-swap circuit is configured to provide s.sub.x as a sign bit, s.sub.z of a sum z=x(1+y/x) for x0. A subtraction circuit subtracts (q.sub.y+r.sub.y/n)(q.sub.x+r.sub.x/n) and outputs q.sub. and r.sub., such that =y/x, where n=2.sup.w.sup.
Claims
1. An adder for fractional logarithmic number system (FLNS) format operands, comprising: a compare-and-swap circuit configured to input first and second FLNS operands represented by fixed point values and provide a greater one of the first and second operands as operand x, and provide a lesser or equal one of the first and second operands as operand y, wherein s.sub.x and s.sub.y are sign bits of x and y, respectively, q.sub.x and q.sub.y, are integer portions of x and y, respectively, fraction portions of x and y that as integers have values r.sub.x and r.sub.y, respectively, x=s.sub.x.Math.2.sup.q.sup.
2. The adder of claim 1, wherein the approximation circuit is configured to provide to the FLNS value nearest to (1+2.sup.q.sup.
3. The adder of claim 1, wherein the approximation circuit is configured to: map each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y; and map each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a second mapping in response to s.sub.xs.sub.y and |x|2|y|.
4. The adder of claim 3, wherein the approximation circuit is configured to map each value of r.sub. to a respective pair of values of q.sub. and r.sub. according to a third mapping in response to s.sub.xs.sub.y and |x|<2|y|<2|x|.
5. The adder of claim 4, wherein the approximation circuit includes a look-up table (124) that implements the third mapping.
6. The adder of claim 3, wherein the approximation circuit includes: a first decision-tree circuit configured to implement the first mapping; and a second decision-tree circuit configured to implement the second mapping.
7. The adder of claim 1, wherein the approximation circuit is configured to: map each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y; and map each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a second mapping in response to s.sub.xs.sub.y and q.sub.0.
8. The adder of claim 7, wherein the approximation circuit is configured to map each value of r.sub. to a respective pair of values of q.sub. and r.sub. according to a third mapping in response to s.sub.xs.sub.y and q.sub.=0.
9. The adder of claim 1, wherein the summing circuit includes: a twos-complement converter circuit configured to convert the fixed point value having q.sub. and r.sub. to a negative twos-complement value; a selector circuit configured to select as an addend the fixed point value having q.sub. and r.sub. in response to s.sub.x=s.sub.y, and select as the addend the negative twos-complement value in response to s.sub.xs.sub.y; and an adder circuit configured to add x to the addend.
10. The adder of claim 1, wherein the approximation circuit includes: a first decision-tree circuit configured to map each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y, wherein the first decision-tree circuit is configured to compare q.sub.+r.sub./n to threshold values of log.sub.2(2.sup.r.sup.
11. The adder of claim 10, wherein the approximation circuit includes a look-up table (124) that implements the third mapping, and the look-up table is configured with values of log.sub.2(12.sup.r.sup.
12. A method for adding fractional logarithmic number system (FLNS) format operands, comprising: inputting first and second FLNS operands represented by fixed point values to a compare-and-swap circuit and providing a greater one of the first and second operands as operand x, and providing a lesser or equal one of the first and second operands as operand y, wherein s.sub.x and s.sub.y are sign bits of x and y, respectively, q.sub.x and q.sub.y, are integer portions of x and y, respectively, fraction portions of x and y that as integers have values r.sub.x and r.sub.y, respectively, x=s.sub.x.Math.2.sup.q.sup.
13. The method of claim 12, wherein is an FLNS value nearest to (1+2.sup.q.sup.
14. The method of claim 12, wherein the approximating includes: mapping each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y; and mapping each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a second mapping in response to s.sub.xs.sub.y and |x|2|y|.
15. The method of claim 14, wherein the approximating includes mapping each value of r.sub. to a respective pair of values of q.sub. and r.sub. according to a third mapping in response to s.sub.xs.sub.y and |x|<2|y|<2|x|.
16. The method of claim 15, wherein the approximating includes performing the third mapping by a look-up table.
17. The method of claim 14, wherein the approximating includes: performing the first mapping by a first decision-tree circuit; and performing the second mapping by a second decision-tree circuit.
18. The method of claim 12, wherein the approximating includes: mapping each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y; and mapping each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a second mapping in response to s.sub.xs.sub.y and q.sub.0.
19. The method of claim 12, wherein the approximating includes: mapping by a first decision-tree circuit, each range of a plurality of ranges of a plurality of possible values of q.sub.+r.sub./n to a respective value of r.sub. according to a first mapping in response to s.sub.x=s.sub.y, and comparing q.sub.+r.sub./n to threshold values of log.sub.2(2.sup.r.sup.
20. The method of claim 19, wherein the approximating includes mapping by a look-up table configured with values of log.sub.2(12.sup.r.sup.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Various aspects and features of the circuits and methods will become apparent upon review of the following detailed description and upon reference to the drawings in which:
[0008]
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
DETAILED DESCRIPTION
[0015] In the following description, numerous specific details are set forth to describe specific examples presented herein. It should be apparent, however, to one skilled in the art, that one or more other examples and/or variations of these examples may be practiced without all the specific details given below. In other instances, well known features have not been described in detail so as not to obscure the description of the examples herein. For ease of illustration, the same reference numerals may be used in different diagrams to refer to the same elements or additional instances of the same element.
[0016] Fractional LNS (FLNS) formats have been used to improve LNS precision via fractional exponents. In an FLNS format, the exponent is represented by a quotient and a remainder. In the FLNS representation of a number x, where M is the bit-width of x,
x=s.sub.x*2.sup.{dot over (x)}/, {dot over (x)}=0; 1; 2; . . . , 2.sup.M11;
where {dot over (x)} is an integer and is the base factor that controls the fractional exponent of the base. controls the quantization gap, which is the distance between successive representable values within the number system.
[0017] The FLNS expression of x can be alternatively stated as:
x=s.sub.x.Math.2.sup.q.sup., r.sub.x
, n=2.sup.w.sup.
where q.sub.x and r.sub.x are the quotient and remainder of {dot over (x)}/, and w.sub.r represents the bit-width of the remainder.
[0018] Prior approaches involving FLNS, have attempted to reduce the hardware resource requirements for performing addition operations by converting the operands to fixed point format and using lookup tables to determine the contribution of remainder of the exponent. However, the conversion between FLNS format and fixed point introduces extra overhead and can significantly degrade performance in applications such as neural networks.
[0019] The disclosed approaches avoid inefficiencies associated with converting operands to fixed point values and converting sums back to FLNS format while improving computational efficiency in adding FLNS format operands. Operands need not be converted from FLNS format to fixed point for accumulation. Avoiding the conversion of values between FLNS format and fixed-point format can significantly improve performance and reduce resource requirements in applications such as neural networks in which accumulated values from one layer are provided as input to the next layer for multiplications.
[0020] The disclosed methods and circuits provide a conversion-free FLNS adder of two operands. Each addition is performed by way of a subtraction circuit performing logarithmic division of the operand having the lesser absolute value by the operand having the greater absolute value, approximation circuitry estimating a nearest FLNS value of the result plus 1, and an adder circuit performing a logarithmic multiplication of the estimated value and the greater operand.
[0021]
=s.sub..Math.2.sup.q.sup.
[0022] The term (1+) can be approximated (1+.fwdarw.) to the nearest FLNS value, =2.sup.q.sup.
[0023] The sum, z, can be efficiently calculated by adding the exponents of x and . Note that >0 because |y|/|x|<1.
[0024] Referring to
[0025] Circuits 106 and 108 compare the exponent elements of OP1 and OP2 and provide the one of OP1 and OP2 having the greater absolute value as a fixed-point two-complement operand x in register 110 and the operand having the lesser absolute value as a fixed-point two-complement operand y in register 112.
[0026] Subtraction circuit 114 subtracts x from y (yx=q.sub.yq.sub.x+r.sub.y/nr.sub.x/n) and stores the result in fixed-point two-complement form in register 116. The integer portion of the value in register 116 is q.sub., and the fraction portion of the value in register 116 when interpreted as an integer is r.sub..
[0027] Comparison circuit 118, mapping circuits M.sub.1, M.sub.2, and M.sub.3, and selector circuit 126 form an approximation circuit. The approximation circuit that maps (1+) to the nearest FLNS value, =2.sup.q.sup.
[0028] The mapping circuits 120, 122, and 124 implement three different mappings, and the selector circuit selects the output from one of the mapping circuits. Each of mapping circuit M.sub.1 and M.sub.2 outputs an unsigned binary format integer r.sub., and mapping circuit M.sub.3 outputs unsigned binary integers q.sub. and r.sub.. The different mappings are based on mutually exclusive cases of the signs and ratio of |x| to |y|. The output of mapping M.sub.1 (case (i)) is selected in response to s.sub.x=s.sub.y, the output of mapping M.sub.2 (case (ii)) is selected in response to s.sub.xs.sub.y and |x|2|y|, and the output of mapping M.sub.3 (case (iii)) is selected in response to s.sub.xs.sub.y and |x|<2|y|<2|x|.
[0029] After swapping x and y such that |x||y|, x+y is computed as x+y=x(1+y/x)x2.sup.. If s.sub.x=s.sub.y, then +y/x>1, 2.sup.>1, and 0. If s.sub.xs.sub.y, then +y/x<1, 2.sup.<1, and <0. Thus, for case i, >0, and for cases ii and iii, <0. To avoid twos-complement conversions for accessing the mapping circuits, the implemented mappings assume >0, and is applied differently between case i and cases ii and iii. For case i, +yx2.sup., and for cases ii and iii: x+yx2.sup.. The output from M.sub.2 for case ii is r.sub.0, though the actual value of r.sub. for in case ii is less than or equal to 0. The output from M.sub.3 for case iii is q.sub.0 and r.sub.>0, though the actual value of q.sub. is less than or equal to 0 and r.sub. is less than 0. Given that the actual values of mappings for cases ii and iii are less than or equal to 0, the outputs from mapping circuits M.sub.2 and M.sub.3 are converted to negative twos-complement values.
[0030] The mappings have either n or n1 entries. In mapping M.sub.1, the sum z is bounded within range (x, 2x], i. e., (s.sub.x.Math.2.sup.q.sup., and z has n discrete possible values, and therefore, the M.sub.1 mapping has n meaningful discrete entries. The same is true in mapping M.sub.2, where z is bounded within range [1/2 x, x), i.e., [s.sub.x.Math.2.sup.q.sup.
[0031] Selector circuit 126 selects one of the outputs from the mapping circuits 120, 122, and 124 based on the states of the signals from comparison circuit 118 and the signal from XNOR circuit 130. In response to s.sub.x=s.sub.y, the selector circuit selects the output from mapping circuit 120 (M.sub.1); in response to s.sub.xs.sub.y and q.sub.0, the selector circuit selects the output from mapping circuit 122 (M.sub.2); and in response to s.sub.xs.sub.y and q.sub.=0, the selector circuit selects the output from mapping circuit 124 (M.sub.3). The signed binary integers q.sub. and r.sub. are stored as a signed fixed point value in register 128. The integer portion of the value in register 128 is q.sub., and the fraction portion of the value in register 128 when interpreted as an integer is r.sub..
[0032] Note that q.sub.=0 is stored in register 128 when the output of mapping M.sub.1, or M.sub.2 is selected.
[0033] For case (i), the output of mapping M.sub.1 is always a positive value, and for cases (ii) and (iii), the outputs of mappings M.sub.2 and M.sub.3 are negative but unsigned. Twos-complement converter circuit 132 converts the value from register 128 to a signed twos-complement value (invert integer bits and add 1 to LSB), and selector circuit 134 selects either the value from register 128 or the signed twos-complement value from converter circuit 132 in response to the signal from XNOR circuit 130. In response to s.sub.x=s.sub.y, the signal from XNOR circuit causes selector circuit 134 to select the output from register 128, and in response to s.sub.xs.sub.y, the signal from XNOR circuit causes selector circuit 134 to select the output from converter circuit 132.
[0034] Summing circuitry adds q.sub.x+r.sub.x/n+q.sub.+r.sub./n in response to s.sub.x=s.sub.y, and subtracts q.sub.x+r.sub.x/nq.sub.r.sub./n in response to s.sub.xs.sub.y, to provide the sum z as a fixed point value having an integer portion q.sub.z and a fraction portion that as an integer has the value r.sub.z, (s.sub.z*2.sup.q.sup.
[0035] The two-complement converter 132 is a circuit that converts the unsigned fixed point value from register 128 to a negative twos-complement value. The selector circuit 134 selects as an addend either the fixed point value from register 128 in response to the signal from XNOR circuit 130 indicating s.sub.x=s.sub.y, or the negative twos-complement value from circuit 132 in response to the signal from the XNOR circuit indicating s.sub.xs.sub.y. The adder circuit 136 adds the value from register 110 (without the sign bit s.sub.x) to the addend (without the sign bit if the twos-complement value is selected) selected by selector circuit 134 and provides the sum as a fixed point value in register 138.
[0036]
r.sub., r.sub.<n1, 1+2.sup.q.sup.
which reduces to:
r.sub., r.sub.<n, q.sub.+r.sub./nlog.sub.2(2.sup.r.sup.
The right-hand side of the inequality defines the threshold values. Similarly, the inequality at case (ii) is
r.sub., r.sub.<n, q.sub.+r.sub./nlog.sub.2(2.sup.r.sup.
[0037]
[0038]
[0039] , 0<r.sub.<n. That is, there are n1 discrete entries in the mapping of 1+.fwdarw. in case (iii). Given consecutive integer values of r.sub., the mapping can be implemented as a lookup table (LUT) circuit having (n1) entries.
[0040] The input to the LUT circuit is an integer value of r.sub., and the output is a fixed point value, q.sub.+r.sub./n, having q.sub. as the integer portion and the fraction portion r.sub. if interpreted as an integer. The values configured into the LUT circuit are pre-computed as log.sub.2(12.sup.r.sup.
[0041]
[0042] The maximum threshold, T(r.sub._max), is the pre-computed threshold with the maximum possible value of r.sub., and T(r.sub._min), is the pre-computed threshold with the minimum possible value of r.sub.. Each threshold T(r.sub.) is computed as log.sub.2(2.sup.r.sup.
[0043] At the top of the search tree, comparison 202 compares q.sub.+r.sub./n to T(r.sub._max/2), which is the threshold at approximately the middle of the range values of r.sub.. Note that each division of r.sub._max can be the floor of the result (i.e., floor (r.sub._max/m) for m a power of 2 greater than 0).
[0044] In response to q.sub.+r.sub./n being equal to the threshold T(r.sub._max/2), the output value is r.sub._max/2. In response to q.sub.+r.sub./n<T(r.sub._max/2), the decision tree continues with comparison 204 of q.sub.+r.sub./n to T(r.sub._max/4). In response to q.sub.+r.sub./n>T(r.sub._max/2), the decision tree continues with comparison 206 of q.sub.+r.sub./n to T(r.sub._max/2+r.sub._max/4).
[0045] Comparison 206 compares q.sub.+r.sub./n to T(r.sub._max/2+r.sub._max/4). In response to q.sub.+r.sub./n being equal to the threshold T(r.sub._max/2+r.sub._max/4), the output value is r.sub._max/2+r.sub._max/4. In response to q.sub.+r.sub./n<T(r.sub._max/2+r.sub._max/4), the decision tree continues with comparison 208 of q.sub.+r.sub./n to T(r.sub._max/2+r.sub._max/4r.sub._max/8).
[0046] At comparison 208, in response to q.sub.+r.sub./n<T(r.sub._max/2+r.sub._max/4r.sub._max/8), the decision tree continues with a comparison of q.sub.+r.sub./n to T(r.sub._max/2+r.sub._max4/r.sub._max/8r.sub._max/16) (not shown). In response to q.sub.+r.sub./n>T(r.sub._max/2+r.sub._max/4r.sub._max/8), the decision tree continues with a comparison of q.sub.+r.sub./n to T(r.sub._max/2+r.sub._max/4+r.sub._max/8+r.sub._max/16) (not shown). In response to q.sub.+r.sub./n being equal to the threshold T(r.sub._max/2+r.sub._max/4r.sub._max/8), the output value is r.sub._max/2+r.sub._max/4r.sub._max/8.
[0047] The search in the decision tree continues as described above until the q.sub.+r.sub./n is equal to a threshold, or a comparison at the lowest level in the tree has been reached. At the lowest-level comparison, if q.sub.+r.sub./n is less than the T(x), then the output is r.sub.=x. If q.sub.+r.sub./n is greater than the T(x), then the output is r.sub.=x+1.
[0048] The decision tree can be implemented by a programmed processor or by programmable logic. The programmed processor can access a data structure having the threshold values and indexed by values of r.sub.. A programmable logic implementation can individual comparison circuits having pre-configured threshold values and associated values of r.sub..
[0049]
[0050] At block 304, the sign bit of operand x is selected as the sign of the sum and can be stored in a register at the bit position of the sign bit of the signed fixed point sum.
[0051] At block 306, a subtraction circuit can determine |y|/|x| by subtracting (q.sub.y+r.sub.y/n)(q.sub.x+r.sub.x/n), where (q.sub.y+r.sub.y/n) denotes the unsigned fixed point value of x, and (q.sub.x+r.sub.x/n) denotes the unsigned fixed point value of y. The difference is {q.sub., r.sub.}, which denotes the unsigned fixed point value having an integer part q.sub., and a fractional part that as an integer is denoted r.sub..
[0052] At block 308, the term (1+) is approximated (1+.fwdarw.) to the nearest FLNS value, =2.sup.q.sup.
[0053] At block 322, the fixed point values {q.sub.x, r.sub.x} and {q.sub., r.sub.} are summed by and adder, and the result {q.sub.z, q.sub.z, r.sub.z} is output at block 324.
[0054]
[0055] Referring to the PS 402, each of the processing units includes one or more central processing units (CPUs) and associated circuits, such as memories, interrupt controllers, direct memory access (DMA) controllers, memory management units (MMUs), floating point units (FPUs), and the like. The interconnect 416 includes various switches, busses, communication links, and the like configured to interconnect the processing units, as well as interconnect the other components in the PS 402 to the processing units.
[0056] The OCM 414 includes one or more RAM modules, which can be distributed throughout the PS 402. For example, the OCM 414 can include battery backed RAM (BBRAM), tightly coupled memory (TCM), and the like. The memory controller 410 can include a DRAM interface for accessing external DRAM. The peripherals 408, 415 can include one or more components that provide an interface to the PS 402. For example, the peripherals can include a graphics processing unit (GPU), a display interface (e.g., DisplayPort, high-definition multimedia interface (HDMI) port, etc.), universal serial bus (USB) ports, Ethernet ports, universal asynchronous transceiver (UART) ports, serial peripheral interface (SPI) ports, general purpose (GPIO) ports, serial advanced technology attachment (SATA) ports, PCIe ports, and the like. The peripherals 415 can be coupled to the MIO 413. The peripherals 408 can be coupled to the transceivers 407. The transceivers 407 can include serializer/deserializer (SERDES) circuits, MGTs, and the like.
[0057] Various logic may be implemented as circuitry to carry out one or more of the operations and activities described herein and/or shown in the figures. In these contexts, a circuit or circuitry may be referred to as logic, module, engine, or block. It should be understood that logic, modules, engines and blocks are all circuits that carry out one or more of the operations/activities. In certain implementations, a programmable circuit is one or more computer circuits programmed to execute a set (or sets) of instructions stored in a ROM or RAM and/or operate according to configuration data stored in a configuration memory.
[0058] Though aspects and features may in some cases be described in individual figures, it will be appreciated that features from one figure can be combined with features of another figure even though the combination is not explicitly shown or explicitly described as a combination.
[0059] The circuits and methods are thought to be applicable to a variety of systems for adding FLNS operands. Other aspects and features will be apparent to those skilled in the art from consideration of the specification. The circuits and methods may be implemented as one or more processors configured to execute software, as an application specific integrated circuit (ASIC), or as a logic on a programmable logic device. It is intended that the specification and drawings be considered as examples only, with a true scope of the invention being indicated by the following claims.