EXACT VERSUS INEXACT DECIMAL FLOATING-POINT NUMBERS AND COMPUTATION SYSTEM
20230147929 · 2023-05-11
Abstract
This disclosure describes an improved computer system and process that avoid the consequences of improper number conversion and of rounding errors. The process distinguishes between exact and inexact decimal floating-point numbers. If the result of a sequence of operations is exact, the user can trust that every decimal digit in the computed result is correct. If, on the other hand, the input operands are inexact or the result cannot be computed exactly, significant digits are lost, and the user is warned of the loss. A novel representation is used for inexact computed values, and an estimate of the absolute error is part of that representation.
Claims
1. A computerized method for performing computations, the method comprising: receiving, at one or more processors, a first operand in a data format, wherein the first operand is an inexact number; receiving, at the one or more processors, a second operand in the data format, wherein the second operand is a decimal number, wherein the decimal number includes a data field that encodes a first digit of a coefficient and specifies whether the decimal number is one of exact or inexact; and performing, at the one or more processors, a computation operation comprising one or more of: if an input for an operand is exact but not normalized, not counting leading zeros of the operand, wherein the operand is at least one of the first operand or the second operand, if a result of the computation operation is exact, normalizing the result, or if the result is not exact, outputting an inexact result of the computation operation.
2. The method of claim 1, wherein the inexact result is an underflow and has a representation as an inexact zero with a minimum exponent.
3. The method of claim 1, wherein the inexact result is an overflow and is encoded in the data field as a signed value, wherein the signed value is defined outside a range of numbers representable by the data format.
4. The method of claim 1, wherein, when the computation operation includes division of a non-zero value by an exact zero, a result of infinity is produced.
5. The method of claim 1, wherein the data field of the result is encoded as NaN if the computation operation is an invalid operation.
6. The method of claim 1, wherein the normalizing is performed by a floating-point decimal processing unit within the one or more processors.
7. The method of claim 1, wherein the computation operation is performed by a floating-point processing unit within the one or more processors.
8. The method of claim 1, wherein the data format includes an exact zero representation with an encoding comprising a zero significand and a zero biased exponent.
9. A system for performing computations, the system comprising: one or more processors and a non-transitory computer readable storage having software instructions stored thereon configured to cause the one or more processors to: receive a first operand in a data format, wherein the first operand is an inexact number; receive a second operand in the data format, wherein the second operand is a decimal number, wherein the decimal number includes a data field that encodes a first digit of a coefficient and specifies whether the decimal number is one of exact or inexact; and perform a computation operation comprising one or more of: if an input for an operand is exact but not normalized, not counting leading zeros of the operand, wherein the operand is at least one of the first operand or the second operand, if a result of the computation operation is exact, normalizing the result, or if the result is not exact, outputting an inexact result of the computation operation.
10. The system of claim 9, wherein the inexact result is an underflow and has a representation as an inexact zero with a minimum exponent.
11. The system of claim 9, wherein the inexact result is an overflow and is encoded in the data field as a signed value, wherein the signed value is defined outside a range of numbers representable by the data format.
12. The system of claim 9, wherein, when the computation operation includes division of a non-zero value by an exact zero, a result of infinity is produced.
13. The system of claim 9, wherein the data field of the result is encoded as NaN if the computation operation is an invalid operation.
14. The system of claim 9, wherein the normalizing is performed by a floating-point decimal processing unit within the one or more processors.
15. The system of claim 9, wherein the computation operation is performed by a floating-point processing unit within the one or more processors.
16. The system of claim 9, wherein the data format includes an exact zero representation with an encoding comprising a zero significand and a zero biased exponent.
17. A non-transitory computer readable medium having instructions stored therein that, when executed by one or more processors, cause the one or more processors to perform a method for performing computations, the method comprising: receiving, at one or more processors, a first operand in a data format, wherein the first operand is an inexact number; receiving, at the one or more processors, a second operand in the data format, wherein the second operand is a decimal number, wherein the decimal number includes a data field that encodes a first digit of a coefficient and specifies whether the decimal number is one of exact or inexact; and performing, at the one or more processors, a computation operation comprising one or more of: if an input for an operand is exact but not normalized, not counting leading zeros of the operand, wherein the operand is at least one of the first operand or the second operand, if a result of the computation operation is exact, normalizing the result, or if the result is not exact, outputting an inexact result of the computation operation.
18. The non-transitory computer readable medium of claim 17, wherein the inexact result is an underflow and has a representation as an inexact zero with a minimum exponent.
19. The non-transitory computer readable medium of claim 17, wherein the inexact result is an overflow and is encoded in the data field as a signed value, wherein the signed value is defined outside a range of numbers representable by the data format.
20. The non-transitory computer readable medium of claim 17, wherein the data format includes an exact zero representation with an encoding comprising a zero significand and a zero biased exponent.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0046] In the drawings, like reference numerals designate identical or corresponding parts throughout the several views. Further, as used herein, the words “a,” “an,” and the like generally carry a meaning of “one or more,” unless stated otherwise. The drawings are generally drawn to scale unless specified otherwise or illustrating schematic structures or flowcharts.
[0047] The disclosed method/system focuses on making a distinction between exact and inexact decimal floating-point numbers. If the result of a sequence of operations is exact, every decimal digit in the computed result will be correct. However, if the input operands are inexact and/or the result cannot be computed exactly, significant digits may be lost during computation, and a specific representation is used by the disclosed method/system for the inexact computed result to identify the result as inexact. An estimate of the error is also given as part of the inexact computed result.
[0048] The disclosed method/system also defines computer-based arithmetic on inexact numbers. The use of inexact arithmetic eliminates the need for rounding, which simplifies hardware and software implementations, and ensures that users are warned that the results are inexact.
Exact Versus Inexact Decimal Numbers
[0049] The disclosed method/system distinguishes between exact and inexact decimal floating-point numbers and describes computation on both, which is neither defined nor implemented by IEEE 754-2008.
[0050] An exact decimal floating-point number represents a single discrete value in the infinite continuum of real numbers and can be represented with zero error. An exact decimal number can be normalized. For example, the exact decimal number 0.2 can be represented uniquely as 2,000,000×10^-7 with p = 7 decimal digits.
[0051] Conversion of an exact decimal32 number into a decimal64 number is performed by appending trailing zeros to the significand and adjusting the exponent accordingly. For example, 0.2 = 2,000,000×10^-7 with p = 7 decimal digits is normalized to 2,000,000,000,000,000×10^-16 with p = 16 decimal digits when converted into decimal64. However, converting an exact decimal64 number into a decimal32 number may produce an inexact number if even one of the nine trailing decimal digits removed from the significand is not zero. An inexact decimal number, by contrast, cannot be represented exactly with finite precision. An inexact representation of π is 3,141,592.H×10^-6 with p = 7 digits, where 0.H represents a high fraction (0.5 ≤ 0.H < 1). The absolute error here is 0.H×10^-6. An inexact representation of π with p = 16 digits is 3,141,592,653,589,793.L×10^-15, where 0.L represents a low fraction (0 ≤ 0.L < 0.5). The absolute error here is 0.L×10^-15.
[0052] Converting an inexact number, such as π, from p = 7 to p = 16 decimal digits does not increase its precision. Leading zeros are inserted instead: π = 0,000,000,003,141,592.H×10^-6. An inexact number that is not normalized cannot be left-shifted and normalized, because its trailing digits are unknown. Therefore, inexact decimal numbers may or may not be normalized. Such inexact decimal numbers carry a .L or .H fraction that indicates the low or high fraction interval: 0.L = [0, 0.5) and 0.H = [0.5, 1).
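The conversions in [0051] and [0052] can be sketched in Python as follows. The sketch assumes a hypothetical Dec record holding an integer coefficient C, a fraction tag F ("" for exact, "L" or "H" for inexact), and an unbiased exponent q; it ignores exponent-range limits and the binary encoding. When narrowing, the most significant removed digit is assumed to select the .L or .H fraction, in line with the statement in [0096] that the last shifted-out digit indicates a low or high fraction.

```python
from typing import NamedTuple


class Dec(NamedTuple):
    C: int   # integer coefficient
    F: str   # "" exact, "L" low fraction [0, 0.5), "H" high fraction [0.5, 1)
    q: int   # unbiased exponent


def widen(x: Dec, p_from: int = 7, p_to: int = 16) -> Dec:
    """decimal32 -> decimal64 precision: only exact numbers gain trailing zeros."""
    if x.F:
        return x                        # inexact: leading zeros are implied, precision unchanged
    k = p_to - p_from
    return Dec(x.C * 10 ** k, "", x.q - k)


def narrow(x: Dec, p_to: int = 7) -> Dec:
    """Reduce to p_to digits; any nonzero removed digit makes the result inexact."""
    digits = len(str(x.C))
    if digits <= p_to:
        return x                        # already fits, nothing to remove
    k = digits - p_to
    kept, dropped = divmod(x.C, 10 ** k)
    if not x.F and dropped == 0:
        return Dec(kept, "", x.q + k)   # exact: only zeros were removed
    first = dropped // 10 ** (k - 1)    # most significant removed digit
    return Dec(kept, "H" if first >= 5 else "L", x.q + k)
```

With these assumptions, narrowing 3,141,592,653,589,793.L×10^-15 gives 3,141,592.H×10^-6, and widening that inexact value leaves it unchanged rather than inventing trailing digits.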
[0053] An inexact number can be the result of an operation with exact operands. However, unlike the IEEE 754 standard, this invention does not use rounding. An inexact number is always encoded with a .L or .H fraction that represents an interval in the infinite continuum of real numbers.
Exact Versus Inexact Zeros
[0054] The IEEE 754-2008 decimal standard defines only an exact zero, as a large cohort with a zero significand and an arbitrary exponent field: zero = ±0×10^q for any exponent value q. There is no unique representation of exact zero and no definition of an inexact zero. The disclosed method/system, however, distinguishes between exact and inexact zeros. Exact zero has a zero significand and a zero biased exponent (E = 0). It is written as 0 (the sign bit is ignored). Inexact zeros, by contrast, are many and are written as ±0.L×10^q or ±0.H×10^q. Inexact zeros represent errors in computations: the significand is either 0.L or 0.H, and the exponent q indicates the scale of the error.
[0055] The value of (x - x) is exact zero when x is exact, but an inexact zero when x is inexact. For a given coefficient C and exponent q, (C.L×10^q - C.L×10^q) is 0.L×10^q. Similarly, (C.H×10^q - C.H×10^q) is also 0.L×10^q.
Improved Decimal Format
[0056] The disclosed method/system suggests a new format for exact and inexact decimal numbers.
Binary Encoding
[0058] The 5-bit L field indicates whether a decimal number is exact or not and encodes the leading digit d of the integer coefficient C. The 5-bit L field encoding is shown in
[0059] The leading digit d of a normalized exact decimal number is never 0; it lies between 1 and 9. Exact decimal numbers should be normalized to avoid multiple representations of the same value. The only exception is subnormal decimal numbers: when the biased exponent E is 0, the leading digit d can be 0. Therefore, exact zero has a unique encoding: L = E = T = 0.
[0060] The next 6 encoded values of the L field (24 to 29) encode the leading digits d = 8 and 9. They are also divided into 3 groups, according to cd = 00 (exact), 01 (inexact .L fraction), or 10 (inexact .H fraction). The last two values of the L field encode the special values Overflow (L = 30) and NaN (L = 31).
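The text above fixes only part of the L-field assignment (24 to 29 for d = 8 and 9, 30 for overflow, 31 for NaN); the figure with the full table is not reproduced here. The following Python sketch decodes an L field under an assumed layout: values 0 to 23 presumably hold d = 0 to 7 in the same three cd groups, and the order of the six values 24 to 29 is also assumed. The layout was chosen only so that L = 0 is the exact digit 0 used by the exact-zero encoding; the function name decode_l_field is hypothetical.

```python
def decode_l_field(L: int) -> tuple:
    """Decode a 5-bit L field into (kind, leading digit d, exactness tag).

    ASSUMED layout (hypothetical, not taken from the patent figure):
      0..23  -> cd = L // 8 (0 exact, 1 inexact .L, 2 inexact .H), d = L % 8
      24..29 -> cd = (L - 24) // 2, d = 8 + (L - 24) % 2
      30     -> overflow, 31 -> NaN
    """
    if not 0 <= L <= 31:
        raise ValueError("L is a 5-bit field")
    if L == 30:
        return ("overflow", None, None)
    if L == 31:
        return ("nan", None, None)
    if L < 24:
        cd, d = divmod(L, 8)          # assumed: cd in the top bits, d in the low three bits
    else:
        cd, low = divmod(L - 24, 2)   # assumed: lowest bit selects d = 8 or d = 9
        d = 8 + low
    return ("finite", d, ("exact", "L", "H")[cd])
```

Whatever the real bit assignment, the structure is the same: 30 finite codes (10 leading digits times 3 exactness groups) plus the two special values fill the 32 code points of a 5-bit field.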
[0062] The trailing significand field T is encoded using densely packed decimal (DPD). Each 10-bit declet is unpacked into three BCD digits (see M. Cowlishaw, “Densely Packed Decimal Encoding”, IEE Proceedings - Computers and Digital Techniques, vol. 149, pp. 102-104, May 2002, which is incorporated herein by reference).
[0063] The significand of an exact decimal number is an integer coefficient C, which is the concatenation of the leading digit d of the L field and the 3k digits in T, as shown in
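The unpacking in [0062] and the concatenation in [0063] can be sketched in Python as follows. dpd_decode follows the standard DPD decoding table from Cowlishaw's paper; the function names and the assumption that T stores its most significant declet in the high-order bits are illustrative, not taken from this disclosure.

```python
def dpd_decode(declet: int) -> int:
    """Unpack one 10-bit DPD declet (bits p q r s t u v w x y) into 0..999."""
    p, q, r, s, t, u, v, w, x, y = ((declet >> (9 - i)) & 1 for i in range(10))

    def small(a: int, b: int, c: int) -> int:   # BCD digit 0abc, value 0..7
        return 4 * a + 2 * b + c

    def large(c: int) -> int:                   # BCD digit 100c, value 8 or 9
        return 8 + c

    if v == 0:
        digits = (small(p, q, r), small(s, t, u), small(w, x, y))
    elif (w, x) == (0, 0):
        digits = (small(p, q, r), small(s, t, u), large(y))
    elif (w, x) == (0, 1):
        digits = (small(p, q, r), large(u), small(s, t, y))
    elif (w, x) == (1, 0):
        digits = (large(r), small(s, t, u), small(p, q, y))
    elif (s, t) == (0, 0):
        digits = (large(r), large(u), small(p, q, y))
    elif (s, t) == (0, 1):
        digits = (large(r), small(p, q, u), large(y))
    elif (s, t) == (1, 0):
        digits = (small(p, q, r), large(u), large(y))
    else:  # s = t = 1: all three digits are 8 or 9, p and q are ignored
        digits = (large(r), large(u), large(y))
    return 100 * digits[0] + 10 * digits[1] + digits[2]


def coefficient(d: int, T: int, k: int) -> int:
    """Concatenate the leading digit d with the 3k digits held in k declets of T."""
    C = d
    for i in range(k - 1, -1, -1):      # most significant declet first (assumed)
        C = C * 1000 + dpd_decode((T >> (10 * i)) & 0x3FF)
    return C
```

For a decimal32-sized format with p = 7, k = 2 declets carry the six trailing digits, so a leading digit of 1 followed by the declets 0x0A3 and 0x02F yields the coefficient 1,123,809.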
Exception-Free Computing
[0064] The IEEE 754-2008 standard defines five kinds of exceptions that can be caught in a given computation. These are: invalid operation, division by zero, overflow, underflow, and inexact. According to IEEE 754-2008, an exception is signaled by setting a flag in a hardware register, or by taking a trap.
[0065] The disclosed method/system replaces all signaling operations with exception-free quiet operations. An inexact operation produces an inexact result, which can be ±C.L×10^q or ±C.H×10^q. Rounding is not performed and no hardware flag is used. Underflow is represented as an inexact zero with the minimum exponent, which can be ±0.L×10^-Bias or ±0.H×10^-Bias. Underflow is not rounded or reduced to exact zero.
[0066] Overflow is a signed value represented by L=30, as shown in
[0067] For any finite number x, -overflow < x < +overflow. Dividing a non-zero value x by exact zero also produces overflow.
[0068] An invalid operation, such as dividing zero by zero or adding +overflow to -overflow, produces a NaN result. NaN values are unordered and cannot be compared. The 5-bit L field encodes the special values of overflow and NaN, as shown in
Comparing Exact and Inexact Decimal Floating-Point Numbers
[0069] According to IEEE 754-2008, floating-point numbers are ordered, except for NaN. Given two floating-point numbers, there are four mutually exclusive relations: equality (EQ), less than (LT), greater than (GT), or unordered (UN). Two rounded numbers can be equal even when they represent different real numbers.
[0070] In this disclosed invention, equality has two meanings. It can be exact or inexact. Two finite decimal numbers x and y are exactly equal if they are both exact and have identical values. If x and y are exactly equal, then their difference (x - y) is exact zero.
[0071] On the other hand, approximate equality (AE) is used when at least one decimal number is inexact. For example, let x = 31415.H×10^-4 and y = 314.L×10^-2 be two approximations of π with different exponents and numbers of significant digits. Then x and y must be aligned. The significand of x is right shifted: x = 31415.H×10^-4 = 314.15H×10^-2 ≈ 314.L×10^-2, indicating that x is approximately equal to y. Similarly, if z = 3141000×10^-6 is an exact decimal number with p = 7 digits, then z = 314.1×10^-2 ≈ 314.L×10^-2 is also approximately equal to y. However, z = 31410.0×10^-4 is less than x = 31415.H×10^-4 when z is aligned to x. This example shows that approximate equality is not transitive, while exact equality is transitive.
[0073] The first step extracts all the fields of x and y in accordance with the format disclosed in
[0074] The second step compares the exponents Ex and Ey and computes their absolute difference: diff = abs(Ex - Ey).
[0075] Step 3 swaps the significands {Cx, Fx} and {Cy, Fy}, if Ex is less than Ey. The outputs of this step are: {Cu, Fu} and {Cv, Fv}.
[0076] Step 4 shifts the significand {Cv, Fv} right according to the exponent difference. The output is a shifted significand {Cw, Fw} = SHR({Cv, Fv}, diff). This step aligns {Cu, Fu} and {Cw, Fw} to a common exponent, which is max(Ex, Ey).
[0077] Step 5 compares the aligned significands {Cu, Fu} and {Cw, Fw}. If the sign bits Sx and Sy are different, then the comparison result depends only on the sign bits: LT = Sx and GT = Sy. If Sx and Sy are identical, then the magnitudes of {Cu, Fu} and {Cw, Fw} are compared for exact equality (EQ), approximate equality (AE), less than (LTu = {Cu, Fu} < {Cw, Fw}), and greater than (GTu = {Cu, Fu} > {Cw, Fw}). The LT (x < y) and GT (x > y) outputs are computed based on the sign bit Sx and whether a swap occurred in Step 3.
[0078] Step 6 handles exceptional inputs. If an input is NaN then the comparison result is unordered (UN). Similarly, if both inputs are overflow values of the same sign, then the comparison result is also unordered (UN).
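A compact way to check the relations in [0070] to [0078] is the Python sketch below. Instead of the SHR alignment of Steps 2 to 5, it uses a swapped-in technique: each inexact value is modelled as the interval its .L or .H fraction denotes, and two numbers are treated as approximately equal when neither lies entirely below the other. The Dec record and the function names are hypothetical, and the NaN and overflow cases of Step 6 are omitted.

```python
from fractions import Fraction
from typing import NamedTuple


class Dec(NamedTuple):
    s: int   # sign bit: 0 positive, 1 negative
    C: int   # integer coefficient
    F: str   # "" exact, "L" fraction in [0, 0.5), "H" fraction in [0.5, 1)
    q: int   # unbiased exponent


def magnitude_bounds(x: Dec) -> tuple:
    """Bounds of |x| as Fractions: [lo, hi) if inexact, the single point lo == hi if exact."""
    scale = Fraction(10) ** x.q
    lo = x.C + (Fraction(1, 2) if x.F == "H" else 0)
    hi = lo + (Fraction(1, 2) if x.F else 0)
    return lo * scale, hi * scale


def compare_decimal(x: Dec, y: Dec) -> str:
    """Return 'EQ', 'AE', 'LT' or 'GT' for finite x and y (NaN/overflow omitted)."""
    if x.s != y.s:                               # signs alone decide: LT = Sx, GT = Sy
        return "LT" if x.s else "GT"
    (xl, xh), (yl, yh) = magnitude_bounds(x), magnitude_bounds(y)
    below = xh < yl if not x.F else xh <= yl     # all of |x| lies below all of |y|
    above = yh < xl if not y.F else yh <= xl     # all of |y| lies below all of |x|
    if below:
        rel = "LT"
    elif above:
        rel = "GT"
    else:
        rel = "EQ" if not x.F and not y.F else "AE"
    if x.s and rel in ("LT", "GT"):              # both negative: magnitude order reverses
        rel = "GT" if rel == "LT" else "LT"
    return rel
```

With x = 31415.H×10^-4, y = 314.L×10^-2, and z = 3141000×10^-6, the sketch reports x AE y and z AE y but z LT x, reproducing the non-transitivity noted in [0071]; two exact operands are still compared exactly.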
Decimal Addition and Subtraction
[0079] The IEEE 754-2008 standard defines addition and subtraction operations on decimal floating-point numbers that are assumed to be exact. However, the IEEE computation method does not handle inexact input operands properly. It might fail and produce incorrect results when one or both input operands are inexact.
[0081] Step 1 extracts all the fields of x and y in accordance with the format disclosed in
[0082] Step 2 compares the exponents Ex and Ey and computes their absolute difference: diff = abs(Ex - Ey). It also computes Eu = max(Ex, Ey), which is the common exponent for the aligned significands.
[0083] Step 3 swaps the significands {Cx, Fx} and {Cy, Fy} if Ex is less than Ey. It also swaps the sign bits to select the sign Su of the swapped significand {Cu, Fu}. The outputs of the third step are the swapped {Su, Cu, Fu} and {Cv, Fv}.
[0084] Step 4 shifts the significand {Cv, Fv} right according to the exponent difference. The output of this step is a shifted significand {Cw, Fw} = SHR({Cv, Fv}, diff). This step aligns {Cu, Fu} and {Cw, Fw} to have a common exponent, which is Eu.
[0085] Step 5 determines the effective operation EOP = Sx ^ Sy ^ Op, where ^ is the XOR operator. This step can be done in parallel and does not depend on the previous steps.
[0086] Step 6 adds or subtracts the aligned significands {Cu, Fu} and {Cw, Fw}, according to the effective operation EOP. For subtraction, the BCD (ten’s) complement of {Cw, Fw} is computed. Subtraction is converted into addition to the ten’s complement. The computed sum is {Cs, Fs}. For subtraction, the LT flag indicates whether the computed sum {Cs, Fs} is negative (i.e., whether {Cu, Fu} is less than {Cw, Fw}). If the computed sum is negative, then it should be post-corrected to become {Cr, Fr} = ten’s complement of {Cs, Fs}, which is its absolute value. The result sign is computed as Sr = Su ^ LT, where ^ is the XOR operator.
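Steps 5 and 6 can be sketched in Python on aligned significands modelled as plain integers of n digits (the coefficient followed by its fraction digit). Python integers stand in for the BCD adder, so the integer model, the function names, and the convention Op = 1 for subtraction are assumptions. The sketch shows how the carry out of the ten's-complement addition yields the LT flag and when the post-correction is needed.

```python
def effective_op(sx: int, sy: int, op: int) -> int:
    """Step 5: EOP = Sx ^ Sy ^ Op, where 0 adds and 1 subtracts the magnitudes."""
    return sx ^ sy ^ op


def add_sub_significands(u: int, w: int, eop: int, n: int) -> tuple:
    """Step 6 on n-digit aligned significands; returns (magnitude, carry Ca, LT)."""
    if eop == 0:
        s = u + w
        return s, int(s >= 10 ** n), 0     # a carry digit is left for Step 7 to shift out
    t = 10 ** n - w                         # ten's complement of w
    s = u + t
    if s >= 10 ** n:                        # carry out: u >= w, drop the carry
        return s - 10 ** n, 1, 0
    return 10 ** n - s, 0, 1                # no carry: u < w, post-correct by complementing
```

The result sign then follows as Sr = Su ^ LT, so subtracting a larger magnitude flips the sign of the first operand, as in the second example of [0095].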
[0087] The addition and subtraction of inexact decimal floating-point numbers requires arithmetic on the 0.L and 0.H fractions, where 0.L and 0.H are the [0, 0.5) and [0.5, 1) intervals, respectively. One choice is interval arithmetic. For example, 0.L + 0.L can be 0.L or 0.H, 0.L + 0.H can be 0.H or 1.L, and 0.H + 0.H can be 1.L or 1.H. Similarly, 0.L - 0.L and 0.H - 0.H can be ±0.L, 0.H - 0.L can be 0.H or 0.L, and 0.L - 0.H can be -0.H or -0.L. The drawback of interval arithmetic is that it requires two endpoints to represent each result, and the intervals become complex over a sequence of operations, which in turn complicates the implementation.
[0088] The arithmetic on inexact decimal floating-point numbers, which is disclosed in this invention, is inexact and does not always guarantee a correct result. However, it produces more reliable results than those obtained according to IEEE 754-2008.
[0089] Inexact addition to ±0.L and ±0.H is shown in Table 1. It uses digit injection: the 0.L and 0.H fractions become 0.2 and 0.7, respectively, by injecting a BCD fraction digit. The result of inexact addition is an approximation, not an interval. For example, 0.L + 0.L becomes 0.2 + 0.2 = 0.4 ≈ 0.L (not 0.H). Similarly, 0.L + 0.H becomes 0.2 + 0.7 = 0.9 ≈ 0.H (not 1.L), and 0.H + 0.H becomes 0.7 + 0.7 = 1.4 ≈ 1.L (not 1.H). The inexact differences (0.L - 0.L) and (0.H - 0.H) are defined to be +0.L (not -0.L), whereas (-0.L + 0.L) and (-0.H + 0.H) are defined to be -0.L. The remaining entries in Table 1 are derived consistently.
Table 1. Inexact addition to ±0.L and ±0.H (row operand + column operand)
+        +0.L   +0.H   -0.L   -0.H
+0.L     +0.L   +0.H   +0.L   -0.H
+0.H     +0.H   +1.L   +0.H   +0.L
-0.L     -0.L   +0.H   -0.L   -0.H
-0.H     -0.H   -0.L   -0.H   -1.L
[0090] Digit injection is also used for adding and subtracting an inexact fraction with a shifted significand. For example, (0.L+0.4) becomes (0.2+0.4) ≈ 0.H. Similarly, (0.L-0.4) becomes (0.2-0.4) ≈-0.L.
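The digit-injection rule of [0089] and [0090] can be sketched in Python by working in tenths: 0.L and 0.H are injected as 2 and 7, an exact shifted digit keeps its own value, and the sum is mapped back to a .L or .H fraction. The string format such as "+0.L" and the function names are hypothetical; the rule that an exactly cancelling sum takes the sign of the first operand is read off Table 1.

```python
INJECT = {"L": 2, "H": 7}   # injected fraction digits for 0.L and 0.H


def to_tenths(tag: str) -> int:
    """'+0.L' -> +2, '-0.H' -> -7, '+0.4' -> +4 (value in tenths)."""
    sign = -1 if tag.startswith("-") else 1
    last = tag[-1]
    return sign * (INJECT[last] if last in INJECT else int(last))


def classify(tenths: int, sign_if_zero: str) -> str:
    """Map a sum in tenths back to an inexact value with a .L or .H fraction."""
    if tenths == 0:
        sign = sign_if_zero                   # cancellation keeps the first operand's sign
    else:
        sign = "-" if tenths < 0 else "+"
    integer, digit = divmod(abs(tenths), 10)
    return f"{sign}{integer}.{'L' if digit < 5 else 'H'}"


def inexact_add(x: str, y: str) -> str:
    """Digit-injection addition of an inexact fraction x and a fraction y."""
    return classify(to_tenths(x) + to_tenths(y), sign_if_zero=x[0])
```

Evaluating inexact_add over all sixteen operand pairs reproduces Table 1, and inexact_add("+0.L", "+0.4") and inexact_add("+0.L", "-0.4") give "+0.H" and "-0.L", as in [0090].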
[0091] Step 7 normalizes the result {Cr, Fr} computed in Step 6 and adjusts the common exponent Eu. It outputs a normalized significand {Cn, Fn} and exponent En. If the result computed in Step 6 has an extra carry digit, then {Ca, Cr, Fr} is shifted right by one BCD digit, and the exponent Eu is incremented. If the result {Cr, Fr} is exact but has leading zero digits, it is shifted left according to the count LZr of leading zeros in Cr. The left shift amount is SA = min(LZr, Eu), so that a biased exponent Eu close to zero does not become negative, and Eu is decremented by SA. However, an inexact result with leading zeros cannot be normalized. If {Cr, Fr} is exact zero, then En is reduced to zero. The normalization step is necessary to produce a unique representation of the result.
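Step 7 can be sketched in Python with the coefficient Cr as an integer of at most p digits plus an optional carry digit, a single fraction digit Fr, an exactness flag, and the biased exponent Eu. The function name and this integer model are assumptions. Following [0091], only exact results are left-shifted, and the shift never drives the biased exponent below zero.

```python
def normalize(cr: int, fr: int, exact: bool, eu: int, p: int = 7) -> tuple:
    """Sketch of Step 7: return (Cn, Fn, exact, En) for a sum {Cr, Fr} with biased exponent Eu.

    cr may hold one extra carry digit (cr >= 10**p); fr is one decimal fraction digit.
    """
    if cr >= 10 ** p:                      # extra carry digit: shift right one BCD digit
        exact = exact and fr == 0          # a nonzero shifted-out digit makes the sum inexact
        cr, fr = divmod(cr, 10)
        eu += 1
    if exact and cr == 0 and fr == 0:
        return 0, 0, True, 0               # unique exact zero: En is reduced to 0
    if exact:
        lzr = p - len(str(cr)) if cr else p
        sa = min(lzr, eu)                  # never drive the biased exponent below 0
        if sa > 0:
            cr = cr * 10 ** sa + fr * 10 ** (sa - 1)   # the fraction digit moves into Cr
            fr = 0
            eu -= sa
        exact = fr == 0                    # a leftover fraction digit cannot be kept exactly
    return cr, fr, exact, eu               # inexact results are never left-shifted
```

The two branches mirror the asymmetry of [0091]: an exact result has all of its digits available and can be shifted left, while an inexact result keeps its leading zeros because its trailing digits are unknown.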
[0092] Step 8 handles exceptional inputs and produces exceptional results (Overflow and NaN). Adding a finite number to overflow produces overflow. Subtracting two overflow values produces NaN. Adding any value x to NaN produces NaN.
[0093] Step 9 encodes and packs the normalized result z, with sign bit Sr, normalized exponent En, and normalized significand {Cn, Fn}. The L field, shown in
[0094] To demonstrate the computation method of
Table 2. Addition examples according to the method of
[0095] The second example is the addition of an inexact negative number x to an exact positive number y. The exponents are identical and there is no swapping or shifting of the second operand. The effective operation EOP is subtract because the sign bits (Sx and Sy) are different. {Ct, Ft} is the BCD (ten’s) complement of {Cw, Fw}. Subtraction is converted into addition to the ten’s complement. The LT flag is set to 1 because the carry flag Ca is zero for subtraction. It indicates that {Cu, Fu} is less than {Cw, Fw}. The result significand {Cr, Fr} is post-corrected by computing the ten’s complement of {Cs, Fs}. The result sign Sr becomes positive. The result significand {Cr, Fr} cannot be left-shifted and normalized, because Fr is inexact. The result z is positive and inexact with a low fraction.
Decimal Multiplication
[0096] Unlike addition and subtraction, floating-point multiplication does not require the alignment of significands when the exponents are different. To multiply two decimal floating-point numbers, their decimal significands are multiplied, and their exponents are added. The result significand is normalized to the required precision. If at least one of the shifted-out fraction digits is non-zero, the result becomes inexact. The last shifted-out decimal digit indicates a low or high fraction.
[0097] If an input operand is inexact with reduced precision, the result coefficient is restricted to a precision in accordance with the minimum precision of its input operands. Digit injection is used to approximate 0.L and 0.H as 0.2 and 0.7, respectively.
[0099] Step 1 extracts all the fields of x and y in accordance with the format disclosed in
[0100] Step 2 injects Fx and Fy (0, 2, or 7) into Cx and Cy to produce Cu and Cv, each having (p+1) decimal digits, where p is the precision. This step also counts the leading zeros LZx and LZy of the coefficients Cx and Cy and takes their maximum m = max(LZx, LZy). This count determines the precision of the result when an input is inexact.
[0101] Step 3 computes the sign of the result Sr = Sx ^ Sy, where ^ is the XOR operator. It also adds the biased exponents Ex and Ey and subtracts the bias: Em = Ex + Ey - Bias - 2, where the -2 accounts for the two injected fraction digits.
[0102] Step 4 multiplies the significands: Cm = Cu × Cv, where Cm can have at most (2p+2) BCD digits. The two extra digits come from the injected fraction digits.
[0103] Step 5 computes LZm, the count of leading zeros in Cm. It determines the right shift amount from the precision p, m, and LZm: SA = p + 2 + m - LZm. It computes the result exponent Er and increases the shift amount SA if Er is negative. It then shifts the product Cm right. The output is a shifted significand {Cr, Fr} = SHR(Cm, SA) and an inexact flag Inx that indicates whether any shifted-out digit is nonzero. The result coefficient Cr has p decimal digits, and Fr is a single decimal fraction digit.
[0104] Step 6 handles exceptional inputs and produces exceptional results. If any input is NaN, the product is NaN. Multiplying two overflow values produces overflow. If the product exponent Er exceeds the maximum biased exponent Emax then the result is overflow.
[0105] Step 7 encodes and packs the result z, with sign bit Sr, exponent Er, and significand {Cr, Fr}. The L field, shown in
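Steps 2 to 5 of multiplication can be sketched in Python as follows, for nonzero finite operands. The Dec record, the helper names, and the use of unbiased exponents are assumptions; the exponent-range handling of Step 5 (increasing SA when Er is negative) and the exceptional cases of Step 6 are omitted. Treating the result as inexact whenever an input is inexact, in addition to when a shifted-out digit is nonzero, is also an assumption.

```python
from typing import NamedTuple

INJECT = {"": 0, "L": 2, "H": 7}   # injected fraction digit: exact, low, high


class Dec(NamedTuple):
    s: int   # sign bit: 0 positive, 1 negative
    C: int   # integer coefficient, at most p digits
    F: str   # "" exact, "L" or "H" inexact
    q: int   # unbiased exponent


def leading_zeros(c: int, width: int) -> int:
    """Count the leading zeros of c when written with `width` decimal digits."""
    return width - len(str(c)) if c else width


def multiply(x: Dec, y: Dec, p: int = 7) -> Dec:
    """Steps 2 to 5 for nonzero finite operands; exponent limits are not checked."""
    if x.C == 0 or y.C == 0:
        raise ValueError("zero operands are outside this sketch")
    m = max(leading_zeros(x.C, p), leading_zeros(y.C, p))    # Step 2
    cu = x.C * 10 + INJECT[x.F]                               # p+1 digits
    cv = y.C * 10 + INJECT[y.F]
    sr = x.s ^ y.s                                            # Step 3
    cm = cu * cv                                              # Step 4: at most 2p+2 digits
    sa = p + 2 + m - leading_zeros(cm, 2 * p + 2)             # Step 5
    cr, shifted_out = divmod(cm, 10 ** sa)
    fr = shifted_out // 10 ** (sa - 1)                        # last shifted-out digit
    inexact = shifted_out != 0 or x.F != "" or y.F != ""
    tag = ("H" if fr >= 5 else "L") if inexact else ""
    return Dec(sr, cr, tag, x.q + y.q - 2 + sa)               # -2 for the injected digits
```

Under this sketch, multiplying π ≈ 3141592.H×10^-6 by an exact 2 gives 6283185.L×10^-6, and an inexact operand with two leading zeros limits the product to p - m = 5 significant digits, as described in [0107].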
[0106] To demonstrate the computation method of
Table 3. Multiplication examples according to the method of
[0107] The second example is the multiplication of two inexact decimal numbers x and y. The number of leading zero digits in the coefficients of x and y is LZx = 2 and LZy = 1, respectively, with a maximum m = 2. This indicates that the product coefficient cannot have more significant digits than its input operands, when they are inexact. The injected high and low fraction digits are 7 and 2, respectively. The computed product Cm has four leading decimal zeros (LZm = 4). The shift amount becomes SA = 7 to keep only five significant digits in the final product coefficient Cr. The result z is negative and inexact with a low fraction.
Decimal Division
[0108] Given two finite decimal floating-point numbers x and y, the significand of x is divided by the significand of y, and the exponents are subtracted. The result is then normalized to the required precision. If an input operand is inexact with reduced precision, the result coefficient is restricted to a precision in accordance with the minimum precision of its input operands. Digit injection is used to approximate 0.L and 0.H as 0.2 and 0.7, respectively.
[0110] Step 1 extracts all the fields of x and y in accordance with the format disclosed in
[0111] Step 2 injects Fx (0, 2, or 7) and (p+1) decimal zeros into Cx to produce a coefficient Cu having (2p+2) decimal digits, where p is the precision. It also injects Fy (0, 2, or 7) into Cy to produce a coefficient Cv having (p+1) decimal digits. This step also counts the leading zeros LZx and LZy of the coefficients Cx and Cy and takes their maximum m = max(LZx, LZy). This count determines the precision of the result when an input operand is inexact.
[0112] Step 3 computes the sign of the result Sr = Sx ^ Sy, where ^ is the XOR operator. It also computes the biased exponent of the quotient: Eq = Ex - Ey - p - 1 + Bias.
[0113] Step 4 divides the extended decimal coefficients: Cq = Cu / Cv. This step produces a quotient Cq having at most (2p+2) decimal digits, and a remainder flag Rm that indicates whether division is inexact (Rm can be 1 or 0).
[0114] Step 5 computes LZq, which is the count of leading zeros in Cq. It determines the right shift amount according to the precision p, m and LZq: SA = (p + 2 + m - LZq). It computes the result biased exponent Er and increases the shift amount SA if Er is negative. It shifts right the Cq quotient. The output is a shifted significand {Cr, Fr} = SHR (Cq, SA) and an inexact flag Inx that indicates whether any shifted-out digit is nonzero. The result coefficient Cr has p decimal digits. The result fraction Fr is a single decimal digit.
[0115] Step 6 handles exceptional inputs and produces exceptional results. If any input is NaN, the result is NaN. Dividing two overflow values or two zero values also produces NaN. Dividing a non-zero value by zero produces overflow. If the quotient exponent Er exceeds the maximum biased exponent Emax, then the result is also overflow.
[0116] Step 7 encodes and packs the result z, with sign bit Sr, exponent Er, and significand {Cr, Fr}. The L field, shown in
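Steps 2 to 5 of division can be sketched in Python in the same style as multiplication. The Dec record, the helper names, and the use of unbiased exponents are assumptions; exponent-range handling and the exceptional cases of Step 6 (NaN, overflow, division by zero) are omitted, and zero operands are rejected. As in the multiplication sketch, an inexact input is assumed to make the quotient inexact.

```python
from typing import NamedTuple

INJECT = {"": 0, "L": 2, "H": 7}   # injected fraction digit: exact, low, high


class Dec(NamedTuple):
    s: int   # sign bit: 0 positive, 1 negative
    C: int   # integer coefficient, at most p digits
    F: str   # "" exact, "L" or "H" inexact
    q: int   # unbiased exponent


def leading_zeros(c: int, width: int) -> int:
    """Count the leading zeros of c when written with `width` decimal digits."""
    return width - len(str(c)) if c else width


def divide(x: Dec, y: Dec, p: int = 7) -> Dec:
    """Steps 2 to 5 for nonzero finite operands; exponent limits are not checked."""
    if x.C == 0 or y.C == 0:
        raise ValueError("zero operands are outside this sketch")
    m = max(leading_zeros(x.C, p), leading_zeros(y.C, p))    # Step 2
    cu = (x.C * 10 + INJECT[x.F]) * 10 ** (p + 1)             # 2p+2 digits
    cv = y.C * 10 + INJECT[y.F]                               # p+1 digits
    sr = x.s ^ y.s                                            # Step 3
    cq, rem = divmod(cu, cv)                                  # Step 4: Rm = (rem != 0)
    sa = p + 2 + m - leading_zeros(cq, 2 * p + 2)             # Step 5
    cr, shifted_out = divmod(cq, 10 ** sa)
    fr = shifted_out // 10 ** (sa - 1)                        # last shifted-out digit
    inexact = rem != 0 or shifted_out != 0 or x.F != "" or y.F != ""
    tag = ("H" if fr >= 5 else "L") if inexact else ""
    return Dec(sr, cr, tag, x.q - y.q - p - 1 + sa)           # Eq = qx - qy - p - 1, then + SA
```

Dividing exact 1 by exact 3 in this sketch gives LZq = 8 and SA = 1, the same counts as in [0119], and yields 3333333.L×10^-7: the result is inexact even though both operands are exact.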
[0117] To demonstrate the computation method of
[0118] The coefficient Cu is the concatenation of Cx, fraction Fx, and eight decimal zeros. The coefficient Cv is the concatenation of Cy and Fy. The maximum number of leading zeros in Cx and Cy is m = 0 because x and y are exact. The result sign is Sr = 0 (positive) and the quotient exponent is Eq = -9 + Bias, which is the exponent of Cq = Cu / Cv.
[0119] The coefficient Cu is divided by Cv to produce a quotient Cq that can have at most (2p + 2) = 16 decimal digits. The remainder flag Rm indicates that the remainder is not zero. The count of leading zeros in Cq is LZq = 8, so the shift amount is SA = 1 to keep only 7 significant digits. The result coefficient Cr is computed by shifting Cq right, and the fraction Fr is the last shifted-out decimal digit. The result exponent Er is incremented by the shift amount SA. The result is inexact even though the operands are exact.
Table 4. Division examples according to the method of
[0120] The second example is the division of two inexact decimal numbers x and y. The number of leading zero digits in the coefficients of x and y is LZx = 0 and LZy = 2, respectively, with a maximum m = 2. The injected high and low fraction digits are 7 and 2, respectively. The computed quotient Cq has five leading decimal zeros (LZq = 5). The shift amount becomes SA = 6 to keep only five significant digits in the result coefficient Cr. This indicates that the result coefficient cannot have more significant digits than its weakest input operand, the one with the fewest significant digits. The result exponent Er is incremented by the shift amount SA.
[0122] Device 800 can be any suitable computer system, server, or other electronic or hardware device. For example, the device 800 can be a mainframe computer, desktop computer, workstation, portable computer, or electronic device (portable device, mobile device, cell phone, smart phone, tablet computer, television, TV set top box, personal digital assistant (PDA), media player, game device, wearable device, etc.). In some implementations, device 800 includes a processor 802, an operating system 804, a memory 806, and input/output (I/O) interface 808.
[0123] Processor 802 can be one or more processors and/or processing circuits to execute program code and control basic operations of the device 800. The processor 802 can include a decimal floating-point computation unit as described herein. A “processor” includes any suitable hardware and/or software system, mechanism or component that processes data, signals or other information. A processor may include a system with a general-purpose central processing unit (CPU), multiple processing units (e.g., a decimal floating-point unit), dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a particular geographic location, or have temporal limitations. For example, a processor may perform its functions in “real-time,” “offline,” in a “batch mode,” etc. Portions of processing may be performed at different times and at different locations, by different (or the same) processing systems. A computer may be any processor in communication with a memory.
[0124] Memory 806 is typically provided in device 800 for access by the processor 802, and may be any suitable processor-readable storage medium, e.g., random access memory (RAM), read-only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory, etc., suitable for storing instructions for execution by the processor, and located separate from processor 802 and/or integrated therewith. Memory 806 can store software operating on the device 800 by the processor 802, including an operating system 804, one or more applications 810, and a database 812. In some implementations, applications 810 can include instructions that enable processor 802 to perform the functions described herein (e.g., in
[0125] For example, the application 810 can include, perform, and/or control decimal floating-point computations as described herein. Any of the software in memory 806 can alternatively be stored on any other suitable storage location or computer-readable medium. In addition, memory 806 (and/or other connected storage device(s)) can store machine learning model (e.g., SVM) information, and/or other instructions and data used in the features described herein. Memory 806 and any other type of storage (magnetic disk, optical disk, magnetic tape, or other tangible media) can be considered “storage” or “storage devices.”
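As a rough illustration of how an application such as application 810 might report whether a decimal computation was exact, the following sketch uses Python's standard `decimal` module, which implements the General Decimal Arithmetic specification and raises an `Inexact` flag in its context whenever a result had to be rounded. This is only an assumption-laden stand-in: it is not the exact/inexact representation disclosed here, it does not carry an absolute-error estimate, and the helper name `compute` and the 16-digit precision are hypothetical choices made for the example.

```python
from decimal import Context, Decimal, Inexact


def compute(op, a, b, prec=16):
    """Hypothetical helper: apply a binary decimal operation in a fresh
    context and report whether the result is exact.

    A new Context starts with all flags cleared, so the Inexact flag is
    set only if this single operation had to round its result.
    """
    ctx = Context(prec=prec)
    result = op(ctx, Decimal(a), Decimal(b))
    exact = not ctx.flags[Inexact]
    return result, exact


# 0.1 + 0.2 is representable exactly in decimal, so no digits are lost.
total, total_exact = compute(lambda c, x, y: c.add(x, y), "0.1", "0.2")
print(total, total_exact)

# 1 / 3 cannot be represented in 16 decimal digits, so the result is
# rounded and flagged as inexact, a case where the user should be warned.
third, third_exact = compute(lambda c, x, y: c.divide(x, y), "1", "3")
print(third, third_exact)
```

Constructing the operands from strings rather than binary floats avoids the improper-conversion problem the disclosure describes: `Decimal("0.1")` is exactly one tenth, whereas `Decimal(0.1)` would capture the binary approximation of 0.1.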
[0126] I/O interface 808 can provide functions to enable interfacing the processing device 800 with other systems and devices. For example, network communication devices, storage devices (e.g., memory and/or database), and input/output devices can communicate via interface 808. In some implementations, the I/O interface 808 can connect to interface devices including input devices (keyboard, pointing device, touchscreen, microphone, camera, scanner, etc.) and/or output devices (display device, speaker devices, printer, motor, etc.).
[0127] For ease of illustration,
[0128] In general, a computer that performs the processes described herein can include one or more processors and a memory (e.g., a non-transitory computer readable medium). The process data and instructions may be stored in the memory. These processes and instructions may also be stored on a storage medium such as a hard drive (HDD) or portable storage medium or may be stored remotely. Note that each of the functions of the described embodiments may be implemented by one or more processors or processing circuits. A processing circuit can include a programmed processor, as a processor includes circuitry. A processing circuit/circuitry may also include devices such as an application specific integrated circuit (ASIC) and conventional circuit components arranged to perform the recited functions. The processing circuitry can be referred to interchangeably as circuitry throughout the disclosure. Further, the claimed advancements are not limited by the form of the computer-readable media on which the instructions of the inventive process are stored. For example, the instructions may be stored on CDs, DVDs, in FLASH memory, RAM, ROM, PROM, EPROM, EEPROM, hard disk or any other information processing device.
[0129] The processor may include one or more processors and may even be implemented using one or more heterogeneous processor systems. According to certain implementations, the instruction set architecture of the processor can be a reduced instruction set architecture, a complex instruction set architecture, a vector processor architecture, or a very large instruction word architecture. Furthermore, the processor can be based on the Von Neumann model or the Harvard model. The processor can be a digital signal processor, an FPGA, an ASIC, a PLA, a PLD, or a CPLD. Further, the processor can be an x86 processor by Intel or by AMD; an ARM processor; a Power architecture processor by, e.g., IBM; a SPARC architecture processor by Sun Microsystems or by Oracle; or a processor of another known CPU architecture.
[0130] The functions and features described herein may also be executed by various distributed components of a system. For example, one or more processors may execute the functions, wherein the processors are distributed across multiple components communicating in a network. The distributed components may include one or more client and server machines, which may share processing in addition to various human interface and communication devices (e.g., display monitors, smart phones, tablets, personal digital assistants (PDAs)). The network may be a private network, such as a LAN or WAN, or may be a public network, such as the Internet. Input to the system may be received via direct user input and/or remotely, either in real time or as a batch process. Additionally, some implementations may be performed on modules or hardware not identical to those described. Accordingly, other implementations are within the scope that may be claimed.