Evaluation of polynomials with floating-point components
09959091 ยท 2018-05-01
Assignee
Inventors
Cpc classification
G06F7/483
PHYSICS
G06F2207/5523
PHYSICS
International classification
G06F7/48
PHYSICS
Abstract
A method identifies a floating point implementation of a polynomial that is accurately evaluable. The method comprises determining whether the polynomial has an allowable variety defined by a plurality of sub-varieties, and, if so, partitioning the input domain of the polynomial into a plurality of sub-domains about the sub-varieties. A floating point precision is then identified for each input to the polynomial falling within each sub-domain based on the location of the input within the sub-domain (e.g. how far away the input is from the sub-variety associated with the sub-domain). A floating point implementation for the polynomial is generated so that an input to the polynomial is evaluated using floating point components having the precision identified for the input.
Claims
1. A method of generating a floating point implementation of a polynomial in hardware logic, the method comprising: determining, in an assessment module, whether the polynomial has an allowable variety defined by one or more sub-varieties; in response to determining that the polynomial has an allowable variety, partitioning, in an assessment module, an input domain of the polynomial into a plurality of sub-domains about the plurality of sub-varieties, each sub-domain being associated with one of the plurality of sub-varieties; for each sub-domain, identifying, in an assessment module, a precision of floating point components for each input to the polynomial falling within that sub-domain based on a location of the input within that sub-domain; and generating a floating point implementation comprising hardware logic configured to evaluate the polynomial for any received input to the polynomial using floating point components having the precision identified for the input.
2. The method of claim 1, wherein determining whether the polynomial has an allowable variety comprises determining whether each of the conditions for the polynomial being zero are of the form x.sub.i=0 or x.sub.j.sub.k=0.
3. The method of claim 1, wherein partitioning the input domain into one or more sub-domains about the plurality of sub-varieties comprises mapping, in an assessment module, each sub-variety onto each plane formed by each possible pair of input values, and, for each plane, creating constraints for each sub-variety that separate that sub-variety from the other sub-varieties.
4. The method of claim 3, wherein a sub-domain is defined as an intersection of the constraints for an associated sub-variety.
5. The method of claim 4, wherein the constraints for a sub-variety are created by considering, in an assessment module, whether an input to the polynomial is closer to the sub-variety than the other sub-varieties.
6. The method of claim 1, wherein identifying the precision of floating point components for each input to the polynomial falling in a particular sub-domain comprises identifying, in an assessment module, a subset of the sub-domain that is capable of being accurately evaluated using a basic floating point implementation of the polynomial of a predefined precision.
7. The method of claim 6, wherein identifying the precision of floating point components for each input to the polynomial falling within a particular sub-domain and outside of the identified subset of the particular sub-domain, further comprises: identifying, in an assessment module, one or more dominant terms of the polynomial as the polynomial approaches the sub-variety; generating, in an assessment module, a dominant term polynomial comprising the one or more identified dominant terms; determining, in an assessment module, whether the dominant term polynomial has an allowable variety; and in response to determining that the dominant term polynomial has an allowable variety, determining, in an assessment module, a floating point precision to accurately evaluate the polynomial using the dominant term polynomial.
8. The method of claim 7, wherein identifying the one or more dominant terms of the polynomial as the polynomial approaches the sub-variety comprises: expanding, in an assessment module, the polynomial using variables defining the sub-variety; and identifying, in an assessment module, the dominant terms as the terms of the expanded polynomial associated with faces of a convex hull of a multi-dimensional Newton polytope.
9. The method of claim 7, wherein determining whether the dominant term polynomial has an allowable variety comprises determining, in an assessment module, whether each of the conditions for the dominant term polynomial being zero are of the form x.sub.i=0 or x.sub.jx.sub.k=0.
10. A system to generate a floating point implementation of a polynomial in hardware logic, the system comprising: an input module configured to receive the polynomial; an assessment module configured to: determine whether the polynomial has an allowable variety defined by a plurality of sub-varieties; in response to determining that the polynomial has an allowable variety, partition an input domain of the polynomial into a plurality of sub-domains about the plurality of sub-varieties, each sub-domain being associated with one of the plurality of sub-varieties; and identify a precision of floating point components for each input to the polynomial falling in each sub-domain based on a location of the input within the sub-domain; and an output module configured to output a floating point implementation comprising hardware logic configured to evaluate the polynomial for an input to the polynomial using floating point components having the precision identified for the input.
11. The system of claim 10, wherein determining whether the polynomial has an allowable variety comprises determining, in the assessment module, whether each of the conditions for the polynomial being zero are of the form x.sub.i=0 or x.sub.jx.sub.k=0.
12. The system of claim 10, wherein partitioning the input domain into one or more sub-domains about the plurality of sub-varieties comprises mapping, in the assessment module, each sub-variety onto each plane formed by each pair of input values, and, for each plane, creating constraints for each sub-variety that separate that sub-variety from the other sub-varieties.
13. The system of claim 12, wherein a sub-domain is defined as an intersection of the constraints for an associated sub-variety.
14. The system of claim 13, wherein the constraints for a sub-variety are created by considering, in the assessment module, whether an input to the polynomial is closer to the sub-variety than the other sub-varieties.
15. The system of claim 10, wherein identifying the precision of floating point components for each input to the polynomial falling in a particular sub-domain comprises identifying, in the assessment module, a subset of the sub-domain that is capable of being accurately evaluated using a basic floating point implementation of the polynomial of a predefined precision.
16. The system of claim 15, wherein identifying the precision of floating point components for each input to the polynomial falling within a particular sub-domain and outside the identified subset of the sub-domain comprises: identifying, in the assessment module, one or more dominant terms of the polynomial as the polynomial approaches the sub-variety associated with the sub-domain; generating, in the assessment module, a dominant term polynomial comprising the one or more identified dominant terms; determining, in the assessment module, whether the dominant term polynomial has an allowable variety; and in response to determining that the dominant term polynomial has an allowable variety, identifying, in the assessment module, a floating point precision to accurately evaluate the polynomial using the dominant term polynomial.
17. The system of claim 16, wherein identifying the one or more dominant terms of the polynomial as the polynomial approaches the sub-variety comprises: expanding, in the assessment module, the polynomial using variables defining the sub-variety; and identifying the dominant terms as the terms of the expanded polynomial associated with faces of a convex hull of a multi-dimensional Newton polytope.
18. The system of claim 16, wherein determining whether the dominant term polynomial has an allowable variety comprises determining, in the assessment module, whether each of the conditions for the dominant term polynomial being zero are of the form x.sub.i=0 or x.sub.jx.sub.k=0.
19. A non-transitory computer readable storage medium having stored thereon computer readable program code that when executed causes at least one processor to: determine, in the processor, whether a polynomial has an allowable variety defined by one or more sub-varieties; in response to determining that the polynomial has an allowable variety, partition, in the processor, an input domain of the polynomial into a plurality of sub-domains about the plurality of sub-varieties, each sub-domain being associated with one of the plurality of sub-varieties; for each sub-domain, identify, in the processor, a precision of floating point components for each input to the polynomial falling in each sub-domain based on a location of the input within the sub-domain; and generate, in the processor, a floating point implementation comprising hardware logic configured to evaluate the polynomial for an input to the polynomial using floating point components having the precision identified for the input.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15) Common reference numerals are used throughout the figures to indicate similar features.
DETAILED DESCRIPTION
(16) Embodiments of the present invention are described below by way of example only. These examples represent some ways of putting various configurations of the invention into practice that are currently known to the Applicant although they are not the only ways in which this could be achieved. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
(17) Described herein are methods and systems for generating a floating point implementation of a polynomial that allows for accurate evaluation of the polynomial. A floating point implementation is said to accurately evaluate the polynomial if it has a bounded relative error of strictly less than one. The methods and systems described herein are based on the principle that more precision is required when the input is closer to a zero condition of the polynomial and less precision is required when the input is further from a non-zero condition. Accordingly, the example methods and systems described herein generate a floating point implementation that comprises hardware logic that is configured to select the terms and precision of the floating point components to accurately evaluate the polynomial based on the proximity of the input to the zero conditions.
(18) All of, a part of, the floating point implementation may then be built in hardware (e.g. it may be synthesized into silicon).
(19) As described above, floating point implementations of a polynomial are prone to error due to the rounding nature of floating point representations and computations. If p is the real infinitely precise polynomial and {circumflex over (p)} is the floating point implementation of the polynomial the relative error of the floating point implementation is as shown in equation (6):
(20)
(21) There are countless algorithms in computer aided geometric design which involve intersections, distance function and curvature extreme polynomial calculations (e.g. calculations of equations (1) or (2)) which require correctly determining the sign of a polynomial. In particular, determining the sign wrong can result in gross errors or catastrophic failure in geometric calculation. To ensure the sign is correct the relative error of a floating point implementation of a polynomial must be less than one as shown in equation (7):
(22)
(23) If a relative error of greater than one were permitted as shown in equation (8):
(24)
then it would be possible for {circumflex over (p)}=p thus {circumflex over (p)} would not be guaranteed to get the sign right.
(25) Furthermore, since the cost of a floating point implementation of a polynomial is orders of magnitude greater than a fixed-point implementation the use of increased precision to gain greater accuracy can degrade performance in both time and power consumption. Accordingly it is desirable for floating point implementations of polynomials to be accurate (i.e. have a relative bounded error of less than one) and only use the appropriate level of precision to achieve the accuracy.
(26) Reference is made to
(27) The input module 202 receives a polynomial p(x) to be implemented in hardware using floating point components, the floating point formats (and precision thereof) available for implementing the polynomial, an absolute relative error bound (e.g. one in some cases), and the floating point operations (e.g. {circumflex over (x)}, , {circumflex over ()}) that are available to evaluate the polynomial.
(28) The assessment module 204 determines whether there is an accurately evaluable floating point implementation of the input polynomial p(x) comprising one or more sub-varieties (i.e. zero conditions); and if it is determined that there is an accurately evaluable floating point implementation for the specified input polynomial p(x) the assessment module 204 divides the input domain of the polynomial into sub-domains around the sub-varieties and identifies the terms of the polynomial to be evaluated and the minimum floating point precision (based on the available precisions) that can be used to accurately evaluate the polynomial for inputs within each sub-domain.
(29) In particular, each sub-domain is divided into one or more subsets based on proximity to the associated sub-variety (e.g. the input values within the sub-domain that are far from the sub-variety may form one subset and the input values within the sub-domain that are close to the sub-variety may be said to be outside the subset). The terms of the polynomial to be evaluated and the floating point precision for accurate evaluation of the polynomial are then identified for (i) inputs that fall within the subset (e.g. inputs that are far from the sub-variety); and (ii) for inputs that fall outside the subset (e.g. inputs that are close to the sub-variety). In general, when an input is far away from the sub-variety then the polynomial can be accurately evaluated using a lower precision basic implementation of the polynomial and when an input is close to the sub-variety then the polynomial can be accurately evaluated using a higher precision basic implementation of the dominant terms of the polynomial as it approaches the sub-variety.
(30) An example method for generating a floating point implementation of a polynomial that allows accurate evaluation thereof is described with reference to
(31) The output module 206 outputs either an indication that the input polynomial p(x) cannot be accurately evaluated using the floating point components, or it outputs a floating point implementation of the polynomial {circumflex over (p)}(x) which can be used to accurately evaluate the polynomial. The floating point implementation of the polynomial {circumflex over (p)}(x) is configured to select the terms and precision to be used in evaluating the polynomial based on which sub-domain and subset thereof the input to the polynomial (e.g. x.sub.1, x.sub.2, x.sub.3 . . . ) falls within.
(32) Reference is now made to
(33) If the assessment module 204 determines that the polynomial does not have an allowable variety, then the method proceeds to block 304 where the output module 206 outputs an indication that the polynomial cannot be accurately evaluated using the specified floating point components. If, however, the assessment module 204 determines that the polynomial does have an allowable variety then the method 300 proceeds to block 306.
(34) At block 306, the assessment module 204 divides the input domain into a plurality of sub-domains or regions. Each sub-domain or region is associated with a particular sub-variety. An input point (or set of input values) that falls within a particular sub-domain or region is closer to the associated sub-variety than any other sub-variety. In general, the closer an input point (or set of input values) is to a sub-variety (e.g. zero condition of the polynomial) the more precision is required.
(35) Partitioning or dividing the input domain into a plurality of sub-domains or regions can be performed by mapping each sub-variety onto each possible plane formed by each possible pair of input values. Then on each plane constraints are created that separate a given sub-variety from all the other sub-varieties. The sub-domain for each sub-variety is then defined by the intersection of the constraints for that sub-variety.
(36) Once the input domain has been divided into a sub-domain for each sub-variety, the method 300 proceeds to block 308.
(37) At block 308, the assessment module 204 identifies, for each sub-variety, a subset within the sub-variety for which the polynomial can be accurately evaluated using a basic implementation of the polynomial. In particular, it is determined how far away from the sub-variety the input has to be to be able to accurately evaluate the polynomial using a basic floating point implementation of the polynomial.
(38) A basic floating point implementation of a polynomial is a floating point implementation which operates on the expanded form of the polynomial and first computes the monomials using a left to right logarithm multiplication tree, then multiples by each coefficient. Finally a left to right logarithmic addition tree is performed.
(39) As will be described in further detail below, a homogeneous polynomial p(x) of degree m and having s terms can be accurately evaluated using a basic floating point implementation with floating point components with a maximum relative accuracy on the domain D provided the following is true:
(40)
(41) where D is the relevant sub-domain (formed by the constraints described above in reference to block 306) bounded away from the sub-variety by a constraint of the form x.sub.i.sup.2e(x.sub.1.sup.2+x.sub.2.sup.2+ . . . +x.sub.n.sup.2) or (x.sub.ix.sub.j).sup.2e(x.sub.1.sup.2+x.sub.2.sup.2+ . . . +x.sub.n.sup.2).
(42) A particular floating point operation will have a relative accuracy . Thus for a known the minimum e that allows a particular floating point precision to be used can be determined for each floating point precision for each sub-variety/sub-domain from the above equation. This will produce a domain D (i.e. a subset of the sub-domain) for each particular floating point precision for each sub-variety.
(43) Once the values of e have been determined for each floating point precision for each sub-variety it can be determined whether the input is far enough away from the sub-variety to evaluate the polynomial using a basic floating point implementation of the polynomial by determining if the input falls within one of the domains D (i.e. subsets of the sub-domains) defined by the calculated e values.
(44) Once the subset (e.g. domain D) for each sub-domain has been identified the method 300 proceeds to block 310.
(45) At block 310 the assessment module 204 determines, for each sub-variety, the dominant terms of the polynomial as the polynomial approaches the sub-variety. In some cases, determining the dominant terms of a polynomial as the input approaches a particular sub-variety comprises expanding the polynomial using the variables defining the sub-variety within a given subdomain. The dominant terms are then identified by computing the terms associated with the faces of the convex hull of a multi-dimensional Newton polytope. This process will be described in further detail below.
(46) Once the dominant terms are identified they are combined to form what is referred to herein as a dominant term polynomial and then the method 300 proceeds to block 312.
(47) At block 312, it is determined whether the dominant term polynomial has an allowable variety. As described above in reference to block 302 a polynomial is said to have an allowable variety if the conditions for the polynomial being zero (i.e. equality) can be written in the form x.sub.i=0 or x.sub.jx.sub.k=0. If it is determined that the dominant term polynomial has an allowable variety then the method proceeds to block 314. If, however, it is determined that the dominant term polynomial does not have an allowable variety then the method proceeds to block 304 where the output module outputs a message indicating the polynomial cannot be accurately evaluated using the floating point components (e.g. {circumflex over ()}, , {circumflex over ()})
(48) At block 314, the precision for accurately evaluating the dominant term polynomial as a proxy for the actual polynomial is determined.
(49) As will be described in more detail below, a dominant term polynomial can be used to accurately evaluate a polynomial near a particular sub-variety if the following is met:
(50)
(51) where is the relative error of the dominant term polynomial p.sub.dom and the polynomial. If can be determined then the above equation can be solved to determine which indicates the level of precision needed to accurately evaluate the polynomial (via the dominant term polynomial) when it gets close to a particular sub-variety.
(52) Once the precision required for accurately evaluating the polynomial via the dominant term polynomial has been determined, the method ends and the output module 206 outputs a floating point implementation that is configured to evaluate the polynomial using a basic implementation of the polynomial of a predetermined precision when the input to the polynomial falls within a subset of a sub-domain (e.g. the input is far from the sub-variety) and to evaluate the polynomial using the identified dominant term polynomial for a sub-variety when the input falls within the associated sub-domain, but outside any of the subsets (e.g. the input is close to the sub-variety).
(53) The blocks of method 300 are now described in further detail below.
(54) Determining Whether the Polynomial has an Allowable Variety
(55) The first step in determining a floating point implementation of a polynomial that allows accurate evaluation thereof is determining whether the polynomial has an allowable variety. An allowable variety is a floating point implementation of the polynomial in which the zeros can be determined correctly. A floating point implementation of a polynomial is considered to determine the zeros correctly if when the floating point implementation of the polynomial {circumflex over (p)} is equal to zero the corresponding real infinitely precise polynomial p is also equal to zero. This is represented by equation (9)
{circumflex over (p)}=0p=0(9)
(56) It is known that floating point computations of the form ab or a{circumflex over ()}b can be performed accurately (assuming no underflow or overflow occur). In particular, as shown in
(57) If tracing back a floating point implementation of a polynomial {circumflex over (p)} when {circumflex over (p)}=0, there is a non-zero intermediate signal, X, which is the output of a non-trivial operator Y, and the inputs to Y can be altered without changing X and hence the entire implementation still returns zero, then it is possible to identify inputs such that {circumflex over (p)}=0 but that the infinitely precise p0.
(58) For example, consider the polynomial p=(a+b)+c which has a floating point implementation of {circumflex over (p)}=(ab)c, as shown in
(59) Accordingly, for a floating point implementation of a polynomial {circumflex over (p)} to correctly calculate the zeros, when {circumflex over (p)}=0, all non-trivial operators within the implementation must produce zero. Since some of the non-trivial operators act on the inputs to the polynomial then certain inputs must have the property that x.sub.i=0 or x.sub.jx.sub.k=0. Accordingly if a polynomial can be represented as a union of constraints of the form x.sub.i=0, or x.sub.jx.sub.k=0 (as shown in equations (10)) the polynomial is considered to have an allowable variety. Otherwise the polynomial does not have an allowable variety.
p={x.sub.iopx.sub.j=0}
op={+,,}
p={x.sub.i=0,x.sub.jx.sub.k=0}(10)
(60) In other words a polynomial has an allowable variety if the conditions for the polynomial being zero (i.e. equality) can be written in the form x.sub.i=0 or x.sub.jx.sub.k=0. Each zero condition (e.g. condition of the form x.sub.i=0 or x.sub.jx.sub.k=0)) defines a sub-variety.
(61) For example, consider the Motzkin polynomial shown in equation (11):
z.sup.6+x.sup.2y.sup.2(x.sup.2+y.sup.23z.sup.2)0(11)
(62) Equality (i.e. the polynomial is zero) occurs when z=y=0, or z=x=0, or z=y=x. Accordingly, the Motzkin polynomial of equation (11) has an allowable variety since it satisfies the property shown in (10) above.
(63) If a polynomial has an allowable variety then accurate evaluation of the polynomial using only floating point adder and multiplier components may be possible.
(64) Partitioning the Input Domain into Sub-Domains
(65) Once it has been determined that the polynomial has an allowable variety then the appropriate precision to accurately evaluate the polynomial for a given input is determined. Determining the appropriate precision comprises: (i) dividing the input domain around each of the sub-varieties of the allowable variety to determine which of the sub-varieties an input value is closest to; (ii) determining the precision to accurately evaluate the polynomial when the input is far from the closest sub-variety; and (iii) determining the precision to accurately evaluate the polynomial when the input is near the closest sub-variety. In general the closer the input is to one of the sub-varieties the more precision is required and the further the input is to the sub-varieties the less precision is required. Accordingly, the level of precision depends on the proximity of the input to the sub-varieties. As described above it is desirable to use increased precision only where is it is needed to achieve the desired relative error.
(66) Since the level of precision depends on the proximity of the input to the sub-varieties, the first step is to determine which sub-variety an input is closest to. This is done by partitioning the input domain into sub-domains or regions which are each associated with a particular sub-variety. An input or point that falls within a particular sub-domain or region is closer to the associated sub-variety than any other sub-variety.
(67) Partitioning is accomplished by mapping each sub-variety onto each possible plane formed by each possible pair of input values. Then on each plane constraints are generated that separate a given sub-variety from all the other sub-varieties.
(68) This process is illustrated using the Motzkin polynomial of equation (11). As described above, the Motzkin polynomial has six sub-varieties: (i) z=x=0; (ii) z=y=0; (iii) z=y=x; (iv) z=y=x; (v) z=y=x; and (vi) z=y=x. These three sub-varieties can be represented by the six lines 602 to 612 shown in
(69) Each line 606-612 in
(70) Constraints are generated for each sub-variety for each plane. The constraints define when a point on the plane (e.g. (x,y) in the xy plane) is closer to the sub-variety than any other sub-variety. This comprises defining a set of constraints for each line mapped to a particular plane that describes the condition for the arbitrary point to be closer to that line than any other line. For example, reference is made again to
|xy||x| |xy||y| |xy||x+y|(12)
|xy||x| |xy||y| sgn(x)=sgn(y)(13)
(71) The constraints determined from each plane are combined for each sub-variety to form a final set of constraints for the sub-variety. For example, the three planes (e.g. as shown in
|xy||x| |xy||y| |xz||z| |yz||z| sgn(x)=sgn(y)=sgn(z)(14)
(72) Similarly, the subdomain for the sub-variety z=x=0 is defined by the following constraints in (15):
|x||xy| |x||x+y| |x||y| |z||yz| |z||y+z|(15)
(73) Finally, the subdomain for the sub-variety z=y=0 is defined by the following constraints in (16):
|yx| |y||y+x| |y||x| |z||xz| |z||x+z|(16)
(74) Since the Motzkin polynomial does not change when x, y or z are negative, it is sufficient to accurately evaluate the Motzkin polynomial in the positive octant, namely x,y,z0. In this octant the subdomain constraints of (14), (15) and (16) can be simplified to (17), (18) and (19) shown below respectively:
x2y y2x x2z y2z(17)
2xy 2zy(18)
2yx 2zx(19)
(75) In general the set of possible conditions which separate a sub-variety A from another sub-variety B when projected onto the x, y plane can be found in Table 2.
(76) TABLE-US-00002 TABLE 2 Part of Part of Resulting Sub-Variety A Sub-Variety B Constraint x = y x = y Not needed x = y x = y Not needed x = y x = y sgn(x) = sgn(y) x = y x = y sgn(x) sgn(y) x = y x = 0 |y| |2x| x = y y = 0 |x| |2y| x = 0 x = y |2x| |y| x = 0 x = 0 Not needed x = 0 y = 0 |x| |y| y = 0 x = y |2y| |x| y = 0 x = 0 |y| |x| y = 0 y = 0 Not needed any x = y = 0 Not needed x = y = 0 any Not needed x,y plane any Not needed any x,y plane Not needed
(77) Accordingly, the constraints in Table 1 can be used to generate the constraints defining the sub-domain for any sub-variety. As described above, if an input or point falls within a particular sub-domain it is closest to the sub-variety associated with the particular sub-domain.
(78) Determining Precision when Far Away from the Sub-Variety
(79) Once it has been determined what sub-variety (e.g. condition for when p=0) an input point is closest to, the floating point precision required to ensure a relative error of less than one can be determined. As noted above, the amount of precision required to achieve a particular bounded error is based on the distance the input is from the sub-variety. Generally, the closer the input is to a sub-variety the more precision is required, and the further the input is to a sub-variety the less precision is required.
(80) Accordingly, for each sub-variety it is determined how far away from the sub-variety the input needs to be to use one or more predefined precisions. If the input is within the determined distance(s) then the associated precision can be used to accurately evaluate the polynomial using floating point components. If, however, the input is not within the determined distance(s) then the input is considered close to the sub-variety and a different method, described below, is used to determine if the polynomial can be accurately evaluated using floating point components and if so, the precision of those components.
(81) A general polynomial can be represented by the form in equation (20)
p=.sub.c.sub.x.sup.(20)
(82) As described above, a basic floating point implementation of a polynomial is one which operates on the expanded form of the polynomial and first computes the monomials using a left to right logarithm multiplication tree, then multiples by each coefficient. Finally a left to right logarithmic addition tree is performed.
(83) If the polynomial of equation (20) is homogenous (the sum of the exponents for each term in the polynomial is equal) of degree m with s terms, then the corresponding basic floating point implementation of the polynomial will be of the form shown in equation (21)
{circumflex over (p)}.sub.basic=.sub.c.sub.x.sup..sub.i=1.sup.m+[log.sup.
(84) where each floating point operation (e.g. {circumflex over ()}, {circumflex over ()}, ) introduces a relative error (1+.sub.i) where |.sub.i| where is the largest relative error the floating point components can introduce.
(85) The relative error of {circumflex over (p)}.sub.basic top can then be represented by equation (22)
(86)
(87) If the polynomial is homogenous (as noted) above then p(x)=.sup.mp(x). Accordingly scaling every variable x.sub.i by the same amount will leave the relative error of {circumflex over (p)}.sub.basic to p unchanged. Therefore, without a loss of generality, it can be assumed that a bound can be calculated by assuming evaluation on the unit sphere, namely x={square root over (.sub.ix.sub.i.sup.2)}=1. This means that it can be assumed that |x.sub.i|1. Furthermore, since the evaluation is done far from the variety then p0. Thus the relative error can be bounded as shown in equation (23):
(88)
(89) where p.sub.min is the smallest value of p obtainable when x=1 and in domain D. p.sub.min can be represented by equation (24):
p.sub.min=min(p(x):x=1,xD)(24)
(90) Accordingly, the polynomial of equation (20) can be accurately evaluated using a basic floating point implementation with floating point components with a maximum relative accuracy on the domain D provided equation (25) is true:
(91)
and provided D is in a sub-domain, but does not include any part of the corresponding sub-variety (e.g. it is bounded away from the sub-variety itself).
(92) As described above, a sub-domain can be written as the intersection of constraints of the following form shown in (26):
|x.sub.i||2x.sub.j| |2x.sub.i||x.sub.j| sgn(x.sub.i)=sgn(x.sub.j)sgn(x.sub.i)sgn(x.sub.j)(26)
(93) A constrain for bounding away from a particular sub-variety will be of one of the following formats shown in (27):
x.sub.i.sup.2e.sup.2(x.sub.1.sup.2+x.sub.2.sup.2+ . . . +x.sub.n.sup.2) or (x.sub.ix.sub.j).sup.2e.sup.2(x.sub.1.sup.2+x.sub.2.sup.2+ . . . +x.sub.n.sup.2)(27)
(94) Accordingly equation (25) is met when the following conditions (28), (29) and (30) are true:
(95)
(96) In particular, if conditions (28), (29) and (30) are met then a basic floating point implementation of a polynomial p using floating point components with a bounded relative error of will have a relative error bounded by .
(97) Accordingly, if e (distance from the sub-variety) is fixed, (precision) is maximized whereas if (precision) is fixed then e (distance from the sub-variety) is minimized. Ideally the minimum e that allows for a specific is selected.
(98) The Motzkin polynomial of equation (11) can be used to demonstrate use of equation
(99) (25) to identify the e for the sub-variety x=y=z for a floating point implementation of a polynomial using floating point components with a relative accuracy of 2.sup.23 (F32 components) to guarantee a relative accuracy of 2.sup.10. In particular from equation (25) the result is equation (31).
(100)
(101) The minimization of e is an exercise in polynomial minimization which can be performed, for example, via Grbner bases. Similar minimisations can be performed for each sub-variety and for single or double floating point components to identify e values for each sub-variety for each single and double floating point components. A summary of the e values for the Motzkin polynomial of equation (11) is summarized in Table 3.
(102) TABLE-US-00003 TABLE 3 Sub-Variety F32 (Single) F64 (Double) x = y = z e.sub.1 = 4860985 2.sup.25 e.sub.2 = 12582933 2.sup.41 0.1448686421 0.000005722055448 z = x = 0 e.sub.3 = 14247057 2.sup.25 e.sub.4 = 14529599 2.sup.42 0.4245953858 0.000003303648327 z = y = 0 e.sub.3 = 14247057 2.sup.25 e.sub.4 = 14529599 2.sup.42 0.4245953858 0.000003303648327
(103) This means that when the input falls in any of the following domains D then the polynomial can be accurately evaluated using a floating point implementation using single floating point components (F32),
D.sub.1=subdomain.sub.subvariety:x=y=z and (xz).sup.2e.sub.1.sup.2(x.sup.2+y.sup.2+z.sup.2) or (yz).sup.2e.sub.1.sup.2(x.sup.2+y.sup.2+z.sup.2)
D.sub.2=subdomain.sub.subvariety:x=z=0 and x.sup.2e.sub.3.sup.2(x.sup.2+y.sup.2+z.sup.2) or z.sup.2e.sub.3.sup.2(x.sup.2+y.sup.2+z.sup.2)
D.sub.3=subdomain.sub.subvariety:y=z=0 and y.sup.2e.sub.3.sup.2(x.sup.2+y.sup.2+z.sup.2) or z.sup.2e.sub.3.sup.2(x.sup.2+y.sup.2+z.sup.2)
(104) and if the input falls in any of the following domains D then the polynomial can be accurately evaluated using a floating point implementation using double floating point components (F64)
D.sub.4=subdomain.sub.subvariety:x=y=z and (xz).sup.2e.sub.2.sup.2(x.sup.2+y.sup.2+z.sup.2) or (yz).sup.2e.sub.2.sup.2(x.sup.2+y.sup.2+z.sup.2)
D.sub.5=subdomain.sub.subvariety:x=z=0 and x.sup.2e.sub.4.sup.2(x.sup.2+y.sup.2+z.sup.2) or z.sup.2e.sub.4.sup.2(x.sup.2+y.sup.2+z.sup.2)
D.sub.6=subdomain.sub.subvariety:y=z=0 and y.sup.2e.sub.4.sup.2(x.sup.2+y.sup.2+z.sup.2) or z.sup.2e.sub.4.sup.2(x.sup.2+y.sup.2+z.sup.2)
(105) There are, however, regions of the input domain to the Motzkin polynomial that do not fall into any of the sub-domains (i.e. the sub-domain for the sub-variety x=y=z; the sub-domain for the sub-variety x=z=0; and the sub-domain for the sub-variety y=z=0). The only point on the variety that resides in these regions is the origin x=y=z=0. Close to the origin, but in regions not covered by the sub-domains, a parameter e.sub.5 can be chosen to permit single floating point components (F32) to be used, provided one of the following conditions in equation (32) hold:
xe.sub.5.sup.2(x.sup.2+y.sup.2+z.sup.2) or ye.sub.5.sup.2(x.sup.2+y.sup.2+z.sup.2) or ze(x.sup.2+y.sup.2+z.sup.2)(32)
(106) Similarly e.sub.6 can be calculated to permit double floating point components (F64). These are calculated to be as follows:
e.sub.5=20530592.sup.210.9789748192
e.sub.6=158177392.sup.240.9428107142
(107) Generally as e gets smaller more precision is required. However, at some point e gets so small (e.g. the point gets so close to zero) that very accurate precision is required. This is because as you approach the sub-variety, terms of the polynomial that were irrelevant become relevant and dominant. The determination of the appropriate terms to be evaluated and floating point accuracy thereof in these cases is discussed below.
(108) Determining Precision when Close to the Sub-Variety
(109) When the input point or value is close to a sub-variety whether or not a polynomial can be accurately evaluated and if so, what precision is required, is based on the particular direction of approach to the sub-variety. In particular, if the sub-variety is approached from one direction a particular term may become relevant or dominant whereas if the sub-variety is approached from a different direction a different term may become relevant or dominant.
(110) For example, consider the polynomial in equation (33)
p=x.sup.2a+xyb+y.sup.2(33)
(111) If the sub-variety x=y=0 is approached from a direction characterized by x=y=t the polynomial of equation (33) can be written in terms of variable t as shown in equation (34):
p=t.sup.2(a+b+c)(34)
(112) It can be seen that in this scenario a+b+c becomes the dominant term. Thus, to accurately evaluate the polynomial of equation (34) near x=y=0 it must be possible to accurately evaluate a+b+c. This is not possible since a+b+c is does not have an allowable variety.
(113) To determine the dominant terms of a polynomial as the input approaches a particular sub-variety, the polynomial is expanded using the variables defining the sub-variety within a given sub-domain. The dominant terms are then identified by computing the terms associated with the faces of the convex hull of a multi-dimensional Newton polytope.
(114) An example of identifying the dominant terms in this manner will be described using the Motzkin polynomial of equation (11). To determine the dominant terms of the polynomial as the sub-variety x=y=z is approached, the polynomial is re-written in terms of a=xz and b=yz to produce the polynomial p(z,a,b) shown in equation (35):
p(z,a,b)=4z.sup.4a.sup.2+4z.sup.4ab+4z.sup.4b.sup.2+4z.sup.3a.sup.3+10z.sup.3a.sup.2b+10z.sup.3ab+4z.sup.3b.sup.3+z.sup.2a.sup.4+8z.sup.2a.sup.3b+9z.sup.2a.sup.2b.sup.2+8z.sup.2ab.sup.3+z.sup.2b.sup.4+2za.sup.4b+4za.sup.3b.sup.2+4za.sup.2b.sup.3+2zab.sup.4+a.sup.4b.sup.2+a.sup.2b.sup.4(35)
(115) To determine the terms that dominate in p(z, a, b) as a,b.fwdarw.0, each of the points (i, j) where a term of the form a.sup.ib.sup.j exists in p(z, a, b) is plotted. The points (i, j) 802 for equation (35) are shown in
(116) For example, the term 4z.sup.4ab will tend to zero slower than 9z.sup.2a.sup.2b.sup.2 as a and b approach zero. Hence 4z.sup.4ab dominates 9z.sup.2a.sup.2b.sup.2. Similarly, the set of terms associated with the face nearest the origin dominate all others. It can be seen from
p.sub.dom=4z.sup.4a.sup.2+4z.sup.4ab+4z.sup.4b.sup.2(36)
(117) Once the dominant terms have been identified it is determined whether the polynomial they form has an allowable variety (i.e. are the zeros of the form x.sub.i=0 or x.sub.jx.sub.k=0) and thus can be accurately evaluated. In the example of the Motzkin polynomial the zeros (i.e. sub-varieties) of the polynomial of the dominant terms (p.sub.dom) are z=0 and a=b=0 thus the dominant terms have an allowable variety.
(118) If the polynomial formed of the dominant terms has an allowable variety then the polynomial may be accurately evaluated. If, however, the polynomial formed of dominant terms is to act as a proxy for the polynomial near x=y=z then the basic floating point implementation of the dominant polynomial ({circumflex over (p)}.sub.dom(basic)) has its own error which can be bounded as shown in equation (37):
(119)
(120) If the following can be determined:
(121)
(122) then equation (37) can be written in terms of A as shown in equation (39) and must be less than the global relative error
(123)
(124) Equation (39) can be rearranged to equation (40):
(125)
(126) From equation (40) it is clear that if p.sub.dom can be implemented with relative accuracy bounded above /1+ then {circumflex over (p)}.sub.dom(basic) can be used as the accurate evaluation of p near x=y=z. From the previous section it is known that a basic floating point implementation of a polynomial is relatively accurate if the condition in equation (25) is satisfied. Accordingly, equation (40) can be re-written as equation (41):
(127)
(128) If is known and can be determined then equation (41) can be solved to determine which indicates the level of precision needed to accurately evaluate the polynomial (via the dominant term polynomial) when it gets close to a particular sub-variety.
(129) Reference is now made back to the Motzkin polynomial of equation (11). The dominant polynomial p.sub.dom for this polynomial as it approaches the sub variety x=y=z is shown in equation (36). The value of A is calculated to be 6356645/2.sup.390.00001156266990. The calculation of p.sub.min just off the sub variety returns a minimum value of 70368744177664/1990303912851459264513. Therefore a basic implementation of p.sub.dom must use components that satisfy equation (42):
(130)
(131) The largest value of S which satisfies this is 2.sup.42 so F64 operations can be used.
(132) For the other sub-varieties p.sub.dom=z.sup.6+x.sup.2+y.sup.4, the associated
(133)
Therefore a basic implementation of this p.sub.dom must use components that satisfy equation (43):
(134)
(135) The largest value of which satisfies this is 2.sup.139.
(136) In the regions not covered by the sub-domains, as the origin is approached, all terms are dominant because p is homogenous. For this region
(137)
A basic implementation of p in this domain requires floating point operations with a precision that satisfies equation (44):
(138)
(139) The largest value of S which satisfies this is 2.sup.23.
(140) Example Implementation for Motzkin Polynomial
(141) The output from each of the previous steps is combined to form an implementation of the polynomial which allows accurate evaluation thereof.
(142) The following is an example of an algorithm of an implementation of the Motzkin polynomial of (11) generated using the method described above which accurately evaluates the Motzkin polynomial. The function BasicN(p) implements a basic implementation using floating-point operators of size N{32,64}. An example is illustrated in equation (45)
BasicN(3x.sup.2+2x.sup.2y+3xy.sup.2)=(3((xx)x)+2((xx)y))+3((xy)y)(45)
(143) Basic128.sup.+(p) is an extended quad precision with a mantissa of length 139.
(144) In addition, where the comparisons to determine whether an input is within a sub-domain are being performed using single floating point precision (F32) then the e values should be rounded up to the nearest F32 value to ensure no false positive occurs. This results in the following modified e values.
(145)
(146) TABLE-US-00004 Inputs F16 x, y, z; Output F32 {circumflex over (p)} F32 x = |x| F32 y = |y| F32 z = |z| F32 x2 = x.sup.2 F32 y2 = y.sup.2 F32 z2 = z.sup.2 F32 a = x z F32 b = y z F32 c = x y F32 a2 = a.sup.2 F32 b2 = b.sup.2 F32 s = (x2 + y2) + z2
(147) Although the above examples have been described in relation to homogeneous polynomials it will be evident to a person of skill in the art that the principles and methods described herein may be equally applied to non-homogeneous polynomials as non-homogeneous polynomials can be converted into homogenous polynomials. In particular, an arbitrary polynomial p can be homogenized by evaluating polynomial q as shown in equation (46) which has an extra variable x.sub.0 setting x.sub.0=1 where (d is the largest total degree of p):
(148)
(149) Equation (47) shows that q is in fact homogeneous:
(150)
(151) Reference is now
(152) Computing-based device 900 comprises one or more processors 902 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to identify a floating point implementation of a polynomial that allows accurate evaluation of the polynomial. In some examples, for example where a system on a chip architecture is used, the processors 902 may include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of
(153) The computer executable instructions may be provided using any computer-readable media that is accessible by computing based device 900. Computer-readable media may include, for example, computer storage media such as memory 908 and communications media. Computer storage media (i.e. non-transitory machine readable media), such as memory 908, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. Although the computer storage media (i.e. non-transitory machine readable media, e.g. memory 908) is shown within the computing-based device 900 it will be appreciated that the storage may be distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 910).
(154) The computing-based device 900 also comprises an input/output controller 912 arranged to output display information to a display device 914 which may be separate from or integral to the computing-based device 900. The display information may provide a graphical user interface. The input/output controller 912 is also arranged to receive and process input from one or more devices, such as a user input device 916 (e.g. a mouse or a keyboard). This user input may be used to provide the inputs to the polynomial assessment module 906. In an embodiment the display device 914 may also act as the user input device 916 if it is a touch sensitive display device. The input/output controller 912 may also output data to devices other than the display device, e.g. a locally connected printing device (not shown in
(155) The term processor and computer are used herein to refer to any device, or portion thereof, with processing capability such that it can execute instructions. The term processor may, for example, include central processing units (CPUs), graphics processing units (GPUs or VPUs), physics processing units (PPUs), radio processing units (RPUs), digital signal processors (DSPs), general purpose processors (e.g. a general purpose GPU), microprocessors, any processing unit which is designed to accelerate tasks outside of a CPU, etc. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term computer includes set top boxes, media players, digital radios, PCs, servers, mobile telephones, personal digital assistants and many other devices.
(156) Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
(157) The methods described herein may be performed by a computer configured with software in machine readable form stored on a tangible storage medium e.g. in the form of a computer program comprising computer readable program code for configuring a computer to perform the constituent portions of described methods or in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable storage medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards etc. and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
(158) The hardware components described herein may be generated by a non-transitory computer readable storage medium having encoded thereon computer readable program code.
(159) It is also intended to encompass software which describes or defines the configuration of hardware that implements a module, functionality, component or logic described above, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code for generating a processing unit configured to perform any of the methods described herein, or for generating a processing unit comprising any apparatus described herein. That is, a computer system may be configured to generate a representation of a digital circuit from definitions of circuit elements and data defining rules for combining those circuit elements, wherein a non-transitory computer readable storage medium may have stored thereon processor executable instructions that when executed at such a computer system, cause the computer system to generate a processing unit as described herein. For example, a non-transitory computer readable storage medium may have stored thereon computer readable instructions that, when processed at a computer system for generating a manifestation of an integrated circuit, cause the computer system to generate a manifestation of a processor of a receiver as described in the examples herein or to generate a manifestation of a processor configured to perform a method as described in the examples herein. The manifestation of a processor could be the processor itself, or a representation of the processor (e.g. a mask) which can be used to generate the processor.
(160) Memories storing machine executable data for use in implementing disclosed aspects can be non-transitory media. Non-transitory media can be volatile or non-volatile. Examples of volatile non-transitory media include semiconductor-based memory, such as SRAM or DRAM. Examples of technologies that can be used to implement non-volatile memory include optical and magnetic memory technologies, flash memory, phase change memory, resistive RAM.
(161) A particular reference to logic refers to structure that performs a function or functions. An example of logic includes circuitry that is arranged to perform those function(s). For example, such circuitry may include transistors and/or other hardware elements available in a manufacturing process. Such transistors and/or other elements may be used to form circuitry or structures that implement and/or contain memory, such as registers, flip flops, or latches, logical operators, such as Boolean operations, mathematical operators, such as adders, multipliers, or shifters, and interconnect, by way of example. Such elements may be provided as custom circuits or standard cell libraries, macros, or at other levels of abstraction. Such elements may be interconnected in a specific arrangement. Logic may include circuitry that is fixed function and circuitry can be programmed to perform a function or functions; such programming may be provided from a firmware or software update or control mechanism. Logic identified to perform one function may also include logic that implements a constituent function or sub-process. In an example, hardware logic has circuitry that implements a fixed function operation, or operations, state machine or process.
(162) Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
(163) It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages.
(164) Any reference to an item refers to one or more of those items. The term comprising is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and an apparatus may contain additional blocks or elements and a method may contain additional operations or elements. Furthermore, the blocks, elements and operations are themselves not impliedly closed.
(165) The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. The arrows between boxes in the figures show one example sequence of method steps but are not intended to exclude other sequences or the performance of multiple steps in parallel. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought. Where elements of the figures are shown connected by arrows, it will be appreciated that these arrows show just one example flow of communications (including data and control messages) between elements. The flow between elements may be in either direction or in both directions.
(166) It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.