Abstract
A method and apparatus are provided for manufacturing integrated circuits performing invariant integer division x/d. A desired rounding mode is provided and an integer triple (a,b,k) for this rounding mode is derived. Furthermore, a set of conditions for the rounding mode is derived. An RTL representation is then derived using the integer triple. From this a hardware layout can be derived and an integrated circuit manufactured with the derived hardware layout. When the integer triple is derived a minimum value of k for the desired rounding mode and set of conditions is also derived.
Claims
1. An apparatus implemented in one or more processors and configured to generate a representation of an integrated circuit for performing integer division x/d where x is a variable integer and d is an invariant integer constant, the apparatus comprising: a parameter creator configured to derive an integer triple (a,b,k) which satisfies Round(x/d)=(ax+b)/2.sup.k for a desired rounding mode; and a synthesis tool configured to generate the representation of the integrated circuit from a representation of the logical operation (ax+b)/2.sup.k in accordance with the derived integer triple (a,b,k) for manufacture of the integrated circuit.
2. The apparatus of claim 1, wherein the parameter creator is configured to derive an integer triple (a,b,k) which has a minimum value of k to satisfy Round(x/d)=(ax+b)/2.sup.k for the desired rounding mode.
3. The apparatus of claim 1, wherein the parameter creator is configured to derive a value for b from values of a and k, the derived value of b having the smallest Hamming weight of the possible values of b satisfying Round(x/d)=(ax+b)/2.sup.k.
4. The apparatus of claim 1, further comprising a representation generator configured to derive the representation of the logical operation (ax+b)/2.sup.k using the derived integer triple (a,b,k).
5. The apparatus of claim 4, wherein the representation generator is an RTL (Register Transfer Level) generator, and the representation of the logical operation (ax+b)/2.sup.k is an RTL representation.
6. The apparatus of claim 1, wherein the desired rounding mode is one of: (i) round towards zero, (ii) round to nearest, and (iii) faithful rounding.
7. The apparatus of claim 1, wherein the parameter creator is further configured to derive k differently for different rounding modes.
8. The apparatus of claim 1, wherein the parameter creator is further configured to receive n, d, and the desired rounding mode as inputs, wherein x is an n-bit number.
9. The apparatus of claim 1, wherein the parameter creator is configured to derive the integer triple (a,b,k) based on a set of conditions.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) FIG. 1 shows a block diagram of circuitry embodying an example according to the disclosure.
(2) FIG. 2 shows a method that can be implemented in examples according to the disclosure.
(3) FIG. 3 depicts apparatus that can be used to implement aspects of the disclosure.
DETAILED DESCRIPTION
(4) Exemplary aspects of the disclosure are described with reference to three rounding modes. Other rounding modes may also be implemented.
(5) We first present the necessary and sufficient conditions for a given triple of (a,b,k) to implement each of the three following rounding schemes.
(6) Round Towards Zero (RTZ)
(7) In this case we require:
(8)
(9) Now the sawtooth function x mod d is discontinuous in x with peaks at x=md−1 where 1≤m≤floor(2.sup.n/d) and troughs at x=md for 0≤m≤floor(2.sup.n/d). It suffices to check that the upper bound error condition is met for md−1 and the lower bound error condition is met for md:
(10)
(11) Now given that a and d is odd then ad−2.sup.k≠0. Depending on the sign of ad−2.sup.k different values of m will stress these inequalities. It follows that the necessary and sufficient conditions for the implementation of round towards zero mode for the IC design is:
(12)
Round to Nearest (RTN)
(13) In this case we require:
(14)
(15) Now the sawtooth function (2x+d) mod 2d is discontinuous in x with peaks at md−(d+1)/2 where 0<m≤floor((2.sup.n+1+d−1)/2d) and troughs at md−(d−1)/2 for 0<m≤floor((2.sup.n+1+d−3)/2d). It suffices to check the upper bound error condition is met for the peaks and the lower bound condition is met for the troughs:
(16)
(17) Now given that a and d is odd then ad−2.sup.k≠0. Depending on the sign of ad−2.sup.k different values of m will stress these inequalities. It follows that the necessary and sufficient conditions for implementation of round towards nearest for the IC design is:
(18) 0
Faithful Rounding (FR1)
(19) In this case we can return either integer that lies either side of the true side, if the true answer is an integer we must return that integer:
(20)
(21) Now for the second case the sawtooth function x mod d is discontinuous in x with peaks at x=md−1 where 0<m≤floor(2.sup.n/d) and troughs at x=md+1 (note we are assuming x≠md) for 0≤m≤floor(2.sup.n/d). It suffices to check that the upper bound error condition is met for md−1 and the lower bound error condition is met for md+1:
(22)
(23) Now given that a and d is odd then ad−2.sup.k≠0. Depending on the sign of ad−2.sup.k different values of m will stress these inequalities in the two cases. It follows that the necessary and sufficient conditions for implementation of faithful rounding is:
(24)
Minimal Hardware Implementation Scheme
(25) Minimal hardware implementations in the IC will result from minimising the number of partial product bits in ax+b. The scheme used achieves this as follows:
(26) 1 Minimise k producing kopt.
(27) 2 For the range of acceptable values of a for a given kopt choose the one that results in the smallest constant multiplier. This can be accomplished by choosing a value for a which has the smallest number of non zero elements in a Canonical Signed Digit representation of a. This will result in aopt. Define this function as minCSD(x).
(28) 3 For the range of valid values for b having fixed kopt and aopt choose the one with smallest Hamming weight, as this minimises the number of partial products bits. If there are a range of numbers that have smallest Hamming weight, we choose the one that has smallest value as this will add 1 s into the least significant bits of the array where the height of the array is smallest. Define the function which finds this value for numbers in the interval [a, b] as minHamm(a, b). Note that the minHamm(a, b) function can be computed as follows:
(29) TABLE-US-00001 Input unsigned a[p−1:0],b[p−1:0] Output unsigned c[p−1:0] c=0 for i=p−1; i≥0; i−− loop if a[i]==b[i] then c[i]=a[i]; a[i]=0; else c+=2.sup.┌log.sup.2.sup.a┐; break; endloop return c
(30) Now applying this scheme to the space of allowable (a,b,k) as derived by the rounding mode and set of conditions we can construct a minimal hardware implementation for each of the three rounding schemes:
(31) RTZ Minimal Hardware Implementation when ad−2.sup.K>0
(32) In this case we require:
(33)
(34) Now note that the right hand side is strictly decreasing in b. So for any valid a, b and k we can always set b=0 and then the condition will still be met, plus it will cost less hardware to implement. Hence a minimal hardware implementation will have b=0. Thus our condition reduces to:
(35)
(36) Given that a must be an integer we have a formula for kopt:
(37)
(38) And kopt is the smallest such valid k hence:
(39)
(40) Hence a=ceil(2.sup.k.sup.opt/d) is valid but a=ceil(2.sup.k.sup.opt/d)+1 is not valid. It follows that the there is only valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2.sup.k>0 is unique and is defined by:
(41)
RTZ Minimal Hardware Implementation when ad−2.sup.K<0
In this case we need:
(42)
(43) Hence b must necessarily be in the following interval:
b∈[(2.sup.k−ad)└2.sup.n/d┘,2.sup.k+a−ad−1]
(44) This interval must be non empty so:
(45) 0
(46) Given that a must be an integer we have a formula for kopt:
(47)
(48) Where kopt is the smallest such valid k hence:
(49)
(50) Hence a=floor(2.sup.k.sup.opt/d) is valid but a=floor(2.sup.k.sup.opt/d)−1 is not valid. It follows that the there is only valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2.sup.k<0 is unique in k and a and is defined by:
(51)
(52) Where minHamm(a, b) returns the number of smallest value from the numbers of smallest Hamming weight found within the interval [a, b].
(53) RTZ Minimal Hardware Design
(54) Summarising the previous sections we have the following algorithm:
(55)
(56) Note that kopt.sup.+ is never equal to kopt.sup.−, otherwise if kopt=kopt.sup.+=kopt.sup.− then:
(57)
(58) Simplifying these two conditions we get:
2((−2.sup.k−1)mod d)>(−2.sup.k)mod d
2(2.sup.k−1 mod d)>2.sup.k mod d
2(2.sup.k−1 mod d)>2.sup.k mod d>2(2.sup.k−1 mod d)−d
(59) This is a contradiction as 2.sup.k mod d is equal to one of these limits.
(60) RTN Minimal Hardware Implementation when ad−2.sup.K>0
(61) In this case we need:
(62)
(63) Hence b must necessarily be in the following interval:
(64)
(65) This interval must be non empty so:
(66)
(67) Given that a must be an integer we have a formula for kopt:
(68)
(69) Where kopt is the smallest such valid k hence:
(70) 0
(71) Hence a=ceil(2.sup.k.sup.opt/d) is valid but a=ceil(2.sup.k.sup.opt/d)+1 is not valid. It follows that the there is only valid value for a when k=kopt. The design which minimises k and satisfies ad−2>0 is unique and is defined by:
(72)
(73) Where minHamm(a, b) returns the number of smallest value from the numbers of smallest Hamming weight found within the interval [a, b].
(74) RTN Minimal Hardware Implementation when ad−2.sup.K<0
(75) In this case we need:
(76)
(77) Hence b must necessarily be in the following interval:
(78)
(79) This interval must be non empty so:
(80)
(81) Given that a must be an integer we have a formula for kopt:
(82)
(83) Where kopt is the smallest such valid k hence:
(84)
(85) Hence a=floor(2.sup.k.sup.opt/d) is valid but a=floor(2.sup.k.sup.opt/d)−1 is not valid. It follows that the there is only valid value for a when k=kopt. The design which minimises k and satisfies ad−2<0 is unique in k and a and is defined by:
(86)
(87) Where minHamm(a, b) returns the number of smallest value from the numbers of smallest Hamming weight found within the interval [a, b].
(88) RTN Minimal Hardware Design
(89) Summarising the previous sections results in the following algorithm:
(90)
(91) Note that kopt.sup.+ is never equal to kopt.sup.−, otherwise if kopt.sup.+=kopt.sup.−=kopt then:
(92)
(93) Simplifying these two conditions we get:
2((−2.sup.k−1)mod d)>(−2.sup.k)mod d
2(2.sup.k−1 mod d)>2.sup.k mod d
2(2.sup.k−1 mod d)>2.sup.k mod d>2(2.sup.k−1 mod d)−d
(94) This is a contradiction as 2.sup.k mod d is equal to one of these limits.
(95) FR1 Minimal Hardware Implementation when ad−2>0
(96) In this case we require:
└2.sup.n/d┘(ad−2.sup.k)<2.sup.k−b
(97) Now note that the right hand side is strictly decreasing in b. So for any valid a, b and k we can always set b=0 and then condition will still be met, plus cost less hardware to implement. Hence minimal hardware implementations will have b=0. Thus our condition reduces to:
(98) 0
(99) Given that a must be an integer we have a formula for kopt:
(100)
(101) Where kopt is the smallest such valid k hence:
(102)
(103) Hence a=ceil(2.sup.k.sup.opt/d) valid but a=ceil(2.sup.k.sup.opt/d)+1 is not valid. It follows that the there is only a valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2.sup.k>0 is unique and is defined by:
(104)
FR1 Minimal Hardware Implementation when ad−2.sup.K<0
(105) In this case we need:
└2.sup.n/d┘(2.sup.k−ad)≤b<2.sup.k
(106) Now b must live in a non empty interval so:
(107)
(108) Given that a must be an integer we have a formula for kopt:
(109)
(110) Where kopt is the smallest such valid k hence:
(111)
(112) Hence a=floor(2.sup.k.sup.opt/d) is valid but a=floor(2.sup.k.sup.opt/d)−1 is not valid. It follows that the there is only a valid value for a when k=kopt. We can now state that the design which minimises k and satisfies ad−2.sup.k<0 is unique in k and a and is defined by:
(113)
(114) Where minHamm(a, b) returns the number of smallest value from the numbers of smallest Hamming weight found within the interval [a, b].
(115) FR1 Minimal Hardware Design
(116) Summarising the previous sections we have the following algorithm:
(117)
(118) Note that kopt.sup.+ is never equal to kopt.sup.−, else if kopt=kopt.sup.+=kopt.sup.− then:
(119)
(120) Simplifying these two conditions we get:
2((−2.sup.k−1 mod d)>(−2.sup.k)mod d
2(2.sup.k−1 mod d)>2.sup.k mod d
2(2.sup.k−1 mod d)>2.sup.k mod d>2(2.sup.k−1 mod d)−d
(121) This is a contradiction as 2.sup.k mod d is equal to one of these limits.
(122) Invariant Integer Division Synthesiser
(123) Example structure of a synthesis apparatus according to the disclosure that performs invariant integer division is depicted in FIG. 1.
(124) This shows a parameter creation unit 2 which has three inputs n, d and rounding mode. n is the number of bits to be used in the numerator of the division, d is the divisor, and the rounding mode is a selection of one of a plurality of rounding modes. Three examples are given here but others are possible.
(125) The parameter creation unit 2 generates in dependence on the inputs n, d, and rounding mode, the integer triple (a, b, k) required by an RTL generator k to generate an appropriate RTL representation of the circuitry for performing the division for the said number of bits of n and rounding mode, and for additional conditions provided to the RTL generation. The RTL generator is computer controlled to generate an RTL representation of a division for the integer triple using additional conditions such as ad−2.sup.k<0.
(126) The RTL representation is then output to a synthesis tool 6 which generates the hardware circuits required to implement the division on an appropriate part of an integrated circuit.
(127) The algorithm in the parameter creation may be summarised as:
{k,a,b}=(k.sup.+<k.sup.−)?{k.sup.+,a.sup.+,min Hamm(Y.sup.+(k.sup.+,a.sup.+))}:{k.sup.−,a.sup.−,min Hamm(Y.sup.−(k.sup.−,a.sup.−))}
(128) Where
(129) 0
(130) And
(131) TABLE-US-00002 RTZ RTN FR1 X.sup.+ X.sup.− Y.sup.+(k,a) (0,0) (0,0) Y.sup.−(k,a) 0
Specific Example of the Idea: Application to the Multiplication of Normalised Numbers
(132) An unsigned n bit normalised number x is interpreted as holding the value x/(2.sup.n−1). Multiplication of these numbers thus involves computing the following:
(133)
(134) We can apply the previously found results to implementing this design for the three rounding modes. In this case d=2.sup.n−1 and given that ab≤(2.sup.n−1) then 2.sup.n−1 in the previous sections will be replaced by (2.sup.n−1).sup.2. Substituting these values into the previous sections gives rise to the following three rounding:
(135)
(136) Note that the RTN case gives a generalisation and proof of the formula for such multiplication [2]. Note that the allowable interval for the additive constant in each case is [2.sup.n−1, 2.sup.n+1], [2.sup.n−1(2.sup.n+1)−2, 2.sup.n−1(2.sup.n+1)] and no freedom for the FR1 case.
Alternative Implementations
(137) Further implementations can be realized by those skilled in the art based on the following disclosures, to deal with the following situations:
(138) 1) d is even. Note that if d is a power of 2 then we have the trivial implementations:
(139) 2) If x lives in the interval [0,max] and not [0,2.sup.n−1] then it suffices to replace 2.sup.n the formulae above with max+1. 3) If x can take negative the formulae can be reworked using the framework established above. 4) Other rounding modes—the formulae can be reworked using the framework established above. 5) The hardware scheme has been chosen to minimize the size of the final shift—a different scheme could be applied which would result in different equations for optimal a, b and k.
(140) In summary of the above, FIG. 1 depicts an example where a parameter creator inputs n, d, and a desired rounding mode, and outputs integer triple a, b, and k. An RTL generator 11 receives the a, b, and k. RTL generator 11 generates RTL for (a*x+b)>>k according to the parameters. The generated RTL is input to a synthesis tool 12, which outputs a hardware layout that can be fabricated in a fabrication process.
(141) FIG. 2 depicts an example method, where integer triple (a, b, k) is derived 15 for a rounding mode and conditions. At 16, a minimum value of k is derived (16 is depicted separately from 15, for ease of explanation). At 17, an RTL representation of (ax+b)/2{circumflex over ( )}.sup.k is derived. At 18, a hardware layout is derived from the RTL. At 19, a circuit can be manufactured according to the hardware layout.
(142) FIG. 3 depicts exemplary apparatus in which methods can be implemented. A processor 25 interfaces with a user interface 30 and with a display 32. A RAM 26 and non-volatile storage 28 interfaces with processor 25. These memories are tangible memories that can store instructions for configuring processor 25 to perform aspects of the disclosure.