Information processing apparatus, program, and information processing method configured to handle a high-precision computer number
11334317 · 2022-05-17
Inventors
- Katsunori Shimomura (Tokyo, JP)
- Tadaaki Taguchi (Tokyo, JP)
- Akira Kawasaki (Tokyo, JP)
- Reki Yamamoto (Tokyo, JP)
Abstract
An information processing apparatus, program, and information processing method performing validated numerics. Arithmetic operation of definite numbers a.sub.1 and b.sub.1 of the computer numbers in which real numbers A and B are defined by formulas (1) and (2) is performed to determine an absolute effective digit γ satisfying formula (3),
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha,−C.sup.ea≤a.sub.2<C.sup.ea formula (1),
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb,−C.sup.eb≤b.sub.2<C.sup.eb formula (2),
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ≤(a,b).sub.2<C.sup.γ formula (3).
a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a and h.sub.b denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha and |b.sub.1|≤C.sup.hb, and h.sub.a and h.sub.b denote high order maxes that are integers; and e.sub.a, e.sub.b and γ denote the absolute effective digits that are integers.
Claims
1. An information processing apparatus comprising: a memory configured to store a high-precision computer number in which real numbers A and B are respectively defined by the following formulas (11) and (12); and a processor programmed to: input the high-precision computer number in bit representation into the memory; perform an arithmetic operation by using definite numbers a.sub.1 and b.sub.1 of the high-precision computer numbers stored in the memory, and by assuming that real numbers A and B fall within a range of a.sub.1−C.sup.ea+ea′≤A<a.sub.1+C.sup.ea+ea′ and b.sub.1−C.sup.eb+eb′≤B<b.sub.1+C.sup.eb+eb′, respectively; determine an extended absolute effective digit (γ+γ′) so as to satisfy the following formula (13); and instruct the memory to store high-precision computer numbers as the result of the arithmetic operation,
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ+γ′≤(a,b).sub.2<C.sup.γ+γ′ (13) wherein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha+ha′ and |b.sub.1|≤C.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers; h.sub.a′ and h.sub.b′ respectively denote decimal digits that are decimals of 0≤h.sub.a′<1 and 0≤h.sub.b′<1; e.sub.a+e.sub.a′, e.sub.b+e.sub.b′, and γ+γ′ denote extended absolute effective digits that are extended digits; e.sub.a, e.sub.b, and γ denote absolute effective digits that are integers; e.sub.a′, e.sub.b′, and r′ respectively denote decimal digits that are decimals of 0≤e.sub.a′<1, 0≤e.sub.b′<1, and 0≤r′<1; (A, B) is the arithmetic operation result of the real numbers A and B; (a, b).sub.1 is the arithmetic operation result of the definite numbers a.sub.1 and b.sub.1; and (a, b).sub.2 is the arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2; and wherein a sign s, a high order max h, a high order max sub h′, a low order max l, an absolute effective digit e, an absolute effective digit sub e′, and a number of array a are input, a predetermined number of bit arrays in the memory are secured based on the high order max h and a least significant digit l, and the high-precision computer number is stored in the memory in bit representation.
2. The information processing apparatus according to claim 1, wherein the radix C of the high-precision computer number is any one of 2, 8, and 16.
3. The information processing apparatus according to claim 2, wherein the decimal digits h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′ are h.sub.a′/2.sup.n(0≤h.sub.a′<2.sup.n), h.sub.b′/2.sup.n(0≤h.sub.b′<2.sup.n), e.sub.a′/2.sup.n(0≤e.sub.a′<2.sup.n), e.sub.b′/2.sup.n(0≤e.sub.b′<2.sup.n), r′/2.sup.n(0≤r′<2.sup.n), respectively, and n is a natural number.
4. The information processing apparatus according to claim 3, wherein the radix C of the high-precision computer number is 2; and n of the decimal digits is 8.
5. The information processing apparatus according to claim 4, wherein, in addition (A+B), the processor is programmed to set the minimum extended digit satisfying the following formula (14) as the extended absolute effective digit.
C.sup.(ea+ea′)+C.sup.(eb+eb′)≤C.sup.(γ+γ′) (14)
6. The information processing apparatus according to claim 4, wherein, in multiplication (A*B), the processor is programmed to set the minimum extended digit satisfying the following formula (15) as the extended absolute effective digit.
C.sup.(ha+ha′+eb+eb′)+C.sup.(hb+hb′+ea+ea′)+C.sup.(ea+ea′+eb+eb′)≤C.sup.(γ+γ′) (15)
7. The information processing apparatus according to claim 4, wherein, in division (B/A), the processor is programmed to evaluate the following formula (18) as the high-precision computer number defined by the following formulas (16) and (17) by using h.sub.a+h.sub.a′−1/256 and h.sub.b+h.sub.b′−1/256 instead of h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′, respectively, to determine the extended absolute effective digit.
8. The information processing apparatus according to claim 1, wherein a data structure of the high-precision computer number has a first header, a second header, and a significand, the first header has the sign s of 1 bit and an array of 7 bit, the second header has the high order max h of 16 bit, the high order max sub h′ of 8 bit, the low order max l, the absolute effective digit e of 16 bit, and the absolute effective digit sub e′ of 8 bit, and the significand has a 32 bit array a with a maximum of 127.
9. The information processing apparatus according to claim 8, wherein a floating point representation data is converted to the data structure of a high-precision computer number.
10. A non-transitory recording medium comprising a program for causing a computer to execute a process comprising: an input step of inputting a high-precision computer number in which real numbers A and B are respectively defined by the following formulas (11) and (12); a storing step of storing the high-precision computer number in bit representation into a memory; and an arithmetic step of performing an arithmetic operation by using definite numbers a.sub.1 and b.sub.1 of the high-precision computer numbers stored in the memory, and by assuming that real numbers A and B fall within a range of a.sub.1−C.sup.ea+ea′≤A<a.sub.1+C.sup.ea+ea′ and b.sub.1−C.sup.eb+eb′≤B<b.sub.1+C.sup.eb+eb′, respectively, determining an extended absolute effective digit (γ+γ′) so as to satisfy the following formula (13); and storing, in the memory, high-precision computer numbers as the result of the arithmetic operation,
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ+γ′≤(a,b).sub.2<C.sup.γ+γ′ (13) wherein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha+ha′ and |b.sub.1|≤C.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers; h.sub.a′ and h.sub.b′ respectively denote decimal digits that are decimals of 0≤h.sub.a′<1 and 0≤h.sub.b′<1; e.sub.a+e.sub.a′, e.sub.b+e.sub.b′, and γ+γ′ denote extended absolute effective digits that are extended digits; e.sub.a, e.sub.b, and γ denote absolute effective digits that are integers; e.sub.a′, e.sub.b′, and r′ respectively denote decimal digits that are decimals of 0≤e.sub.a′<1, 0≤e.sub.b′<1, and 0≤r′<1; (A, B) is the arithmetic operation result of the real numbers A and B; (a, b).sub.1 is the arithmetic operation result of the definite numbers a.sub.1 and b.sub.1; and (a, b).sub.2 is the arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2; and wherein a sign s, a high order max h, a high order max sub h′, a low order max l, an absolute effective digit e, an absolute effective digit sub e′, and a number of array a are input, a predetermined number of bit arrays in the memory are secured based on the high order max h and a least significant digit l, and the high-precision computer number is stored in the memory in bit representation.
11. An information processing method comprising: an input step of inputting a high-precision computer number in which real numbers A and B are respectively defined by the following formulas (11) and (12); a storing step of storing the high-precision computer number in bit representation into a memory; and an arithmetic step of performing an arithmetic operation by using definite numbers a.sub.1 and b.sub.1 of the high-precision computer numbers stored in the memory, and by assuming that real numbers A and B fall within a range of a.sub.1−C.sup.ea+ea′≤A<a.sub.1+C.sup.ea+ea′ and b.sub.1−C.sup.eb+eb′≤B<b.sub.1+C.sup.eb+eb′, respectively, determining an extended absolute effective digit (γ+γ′) so as to satisfy the following formula (13); and storing, in the memory, high-precision computer numbers as the result of the arithmetic operation,
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ+γ′≤(a,b).sub.2<C.sup.γ+γ′ (13) wherein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha+ha′ and |b.sub.1|≤C.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers; h.sub.a′ and h.sub.b′ respectively denote decimal digits that are decimals of 0≤h.sub.a′<1 and 0≤h.sub.b′<1; e.sub.a+e.sub.a′, e.sub.b+e.sub.b′, and γ+γ′ denote extended absolute effective digits that are extended digits; e.sub.a, e.sub.b, and γ denote absolute effective digits that are integers; e.sub.a′, e.sub.b′, and r′ respectively denote decimal digits that are decimals of 0≤e.sub.a′<1, 0≤e.sub.b′<1, and 0≤r′<1; (A, B) is the arithmetic operation result of the real numbers A and B; (a, b).sub.1 is the arithmetic operation result of the definite numbers a.sub.1 and b.sub.1; and (a, b).sub.2 is the arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2; and wherein a sign s, a high order max h, a high order max sub h′, a low order max l, an absolute effective digit e, an absolute effective digit sub e′, and a number of array a are input, a predetermined number of bit arrays in the memory are secured based on the high order max h and a least significant digit l, and the high-precision computer number is stored in the memory in bit representation.
Description
MODE FOR CARRYING OUT THE INVENTION
1. First Embodiment
(50) Hereinafter, a first embodiment of the present technology will be described in detail in the following order with reference to the drawings. It should be noted that the present technology is not limited to the following embodiment, and various modifications may be made without departing from the scope of the present technology.
(51) 1-1. Information processing apparatus
(52) 1-2. Arithmetic operation of computer numbers
(53) 1-3. Examples
1-1. Information Processing Apparatus
(54) Functional Configuration
(55)
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha,−C.sup.ea≤a.sub.2<C.sup.ea (1)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb,−C.sup.eb≤b.sub.2<C.sup.eb (2)
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ≤(a,b).sub.2<C.sup.γ (3)
(56) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a and h.sub.b respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha and |b.sub.1|≤C.sup.hb, and h.sub.a and h.sub.b respectively denote high order maxes that are integers; e.sub.a, e.sub.b and γ denote absolute effective digits that are integers; and (A, B) is an arithmetic operation result of the real numbers A and B, (a, b).sub.1 is an arithmetic operation result of the definite numbers a.sub.1 and b.sub.1, and (a, b).sub.2 is an arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2.
(57) The radix C of the computer number is not particularly limited, but is preferably 2, 8, or 16. Thus, it is possible to verify the arithmetic operation of binary, octal or hexadecimal numbers used in general-purpose computers.
(58)
(59) The input unit 11 inputs computer numbers in which real numbers A and B are respectively defined by the following formulas (1) and (2).
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha,−C.sup.ea≤a.sub.2<C.sup.ea (1)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb,−C.sup.eb≤b.sub.2<C.sup.eb (2)
(60) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a and h.sub.b respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha and |b.sub.1|≤C.sup.hb, and h.sub.a and h.sub.b respectively denote high order maxes that are integers; and e.sub.a, e.sub.b and γ denote absolute effective digits that are integers.
(61) Specifically, the input unit 11 inputs computer numbers for which a sign, high order max, low order max, absolute effective digit, and number of arrays are set.
(62) The storage unit 12 stores computer numbers and is a so-called buffer. The buffer is reserved based on the high order max and the low order max inputted by the input unit 11.
(63)
(64) The arithmetic unit 13 performs an arithmetic operation by using the definite numbers a.sub.1 and b.sub.1 of computer numbers stored in the storage unit 12, determines an absolute effective digit γ so as to satisfy the following formula (3), and stores, in the storage unit 12, computer numbers as the result of the arithmetic operation.
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ≤(a,b).sub.2<C.sup.γ (3)
(65) Herein C denotes a radix; γ is an exponent and is an absolute effective digit; and (A, B) is an arithmetic operation result of the real numbers A and B, (a, b).sub.1 is an arithmetic operation result of the definite numbers a.sub.1 and b.sub.1, and (a, b).sub.2 is an arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2.
(66) The arithmetic operation of computer numbers is performed on the assumption that the real numbers A and B fall within the range of a.sub.1−C.sup.ea≤A<a.sub.1+C.sup.ea and b.sub.1−C.sup.eb≤B<b.sub.1+C.sup.eb, respectively, so that the arithmetic operation result is also a computer number and thus satisfies the formula (3). The arithmetic operation of the computer number is divided into the definite numbers a.sub.1, b.sub.1 and the uncertain numbers a.sub.2, b.sub.2; the definite numbers a.sub.1, b.sub.1 are calculated as they are, and the absolute effective digit γ of the arithmetic operation result is determined according to various conditions.
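The representation described above can be sketched in Python as follows. This is only an illustrative sketch: the class name `ComputerNumber`, its field names, and the use of exact rationals for the definite number are assumptions made for the example and do not appear in the specification. The radix is taken as C=2.

```python
from dataclasses import dataclass
from fractions import Fraction

RADIX = 2  # C in the text; assumed to be 2 for this sketch


@dataclass(frozen=True)
class ComputerNumber:
    """Hypothetical container: a definite number a1 and an absolute effective digit e."""
    definite: Fraction  # a1, whose value is known exactly
    eff_digit: int      # e, meaning a1 - C**e <= A < a1 + C**e

    def bound(self) -> Fraction:
        """Return C**e, the bound on the uncertain number a2."""
        return Fraction(RADIX) ** self.eff_digit

    def contains(self, real_value: Fraction) -> bool:
        """Check whether real_value lies in the half-open range [a1 - C**e, a1 + C**e)."""
        low = self.definite - self.bound()
        high = self.definite + self.bound()
        return low <= real_value < high


x = ComputerNumber(Fraction(3, 4), -10)
print(x.bound(), x.contains(Fraction(3, 4) + Fraction(1, 2048)))
```

The range is closed at the lower end and open at the upper end, matching −C.sup.ea≤a.sub.2<C.sup.ea.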
(67) In the arithmetic operation of computer numbers, the maximum value of the absolute effective digit can be specified, in particular for division and for functions such as root, sin, and cos. Thus, the digits of the result at or above the specified absolute effective digit can be guaranteed.
(68) The arithmetic operation of computer numbers can also be regarded as a special pattern of interval arithmetic. The difference is that interval arithmetic represents a real number as a range and performs the four arithmetic operations on the range exactly, while the four arithmetic operations of computer numbers control errors by the absolute effective digit. In addition, since the arithmetic operation of computer numbers is applied after determining a system of numbers instead of real numbers, its starting point differs from the concept of interval arithmetic. Therefore, the arithmetic operation of computer numbers is simpler than interval arithmetic, because errors (unclear parts) are evaluated only by the absolute effective digit.
(69) According to such an information processing apparatus, only the part requiring the absolute effective digit can be calculated by a computer number arithmetic operation library, and the absolute effective digit of the arithmetic operation result can be confirmed, so that validated numerics can be easily performed.
(70) The information processing apparatus preferably further includes a control unit 14 for converting a numerical arithmetic operation program for real numbers into a numerical arithmetic operation program for computer numbers, executing the numerical arithmetic operation program for computer numbers, and storing, in the storage unit 12, computer numbers obtained by the arithmetic operation in the arithmetic unit 13. Thus, the history of the absolute effective digit of each variable can be maintained during the execution of the numerical arithmetic operation program for computer numbers, and it is possible to check whether the required absolute effective digit can be finally obtained. It is also possible to inversely calculate the absolute effective digit of the value of each variable required for an arithmetic operation of a desired absolute effective digit.
(71) Hardware Configuration
(72)
(73) The CPU 21 is capable of executing processes of the arithmetic unit 13 and the control unit 14 in the functional configuration shown in
(74) The GPU 22 has a video memory (VRAM), can perform drawing processing and arithmetic operation processing in response to a request from the CPU 21, and may calculate computer numbers by the computer number arithmetic operation library. The ROM 23 is, for example, a read-only nonvolatile memory, and stores information such as constants necessary for the operation of each block of the information processing apparatus 2. The RAM 24 is a volatile memory, and is used not only as a deployment area for an arithmetic program but also as a storage area for temporarily storing intermediate data or the like outputted during the operation of each block of the information processing apparatus 2.
(75) The operation input unit 25 can control the function of the input unit 11 in the functional configuration shown in
(76) The storage 26 records a numerical arithmetic operation program or the like deployed in the RAM 24. The storage 26 may be an HDD (Hard disk drive), an SSD (Solid State Drive), or an optical drive, among others. The input/output interface 27 can output images generated by the GPU 22 to a display device.
(77) In such a hardware configuration, the functional configuration shown in
1-2. Arithmetic Operation of Computer Numbers
(78) Hereinafter, the arithmetic operation method will be described with the radix of the computer number being 2 (C=2).
(79) Addition
(80) Assume that a real number A and a real number B are computer numbers defined by the formulas (1) and (2), respectively.
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha,−C.sup.ea≤a.sub.2<C.sup.ea (1)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb,−C.sup.eb≤b.sub.2<C.sup.eb (2)
(81) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a and h.sub.b respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha and |b.sub.1|≤C.sup.hb, and h.sub.a and h.sub.b respectively denote high order maxes that are integers; and e.sub.a, e.sub.b and γ denote absolute effective digits that are integers.
(82) Hereinafter, m, n, and r will be used for explanation instead of the absolute effective digits e.sub.a, e.sub.b, and γ, respectively.
A+B=a.sub.1+a.sub.2+b.sub.1+b.sub.2=(a.sub.1+b.sub.1)+(a.sub.2+b.sub.2)
(a.sub.1+b.sub.1)−C.sup.m−C.sup.n≤A+B<(a.sub.1+b.sub.1)+C.sup.m+C.sup.n
(83) Assume that r is the smallest integer that satisfies C.sup.m+C.sup.n≤C.sup.r.
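(84) If m=n, then C.sup.m+C.sup.n=C.sup.m+1, so r=m+1; if m≠n, then C.sup.m+C.sup.n lies strictly between C.sup.max (m, n) and C.sup.max (m, n)+1, so r=max (m, n)+1. In both cases, r=max (m, n)+1.
A+B=(a.sub.1+b.sub.1)+(A+B).sub.2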
(85) (a.sub.1+b.sub.1) is a definite number, (A+B).sub.2 is an uncertain number, −C.sup.r≤(A+B).sub.2<C.sup.r is satisfied, and r is the absolute effective digit. The minimum order of the definite number (a.sub.1+b.sub.1) may be less than the absolute effective digit. A description will be given by classifying patterns according to the sign of (a.sub.1+b.sub.1).
(86)
(a.sub.1+b.sub.1)−C.sup.r≤A+B<(a.sub.1+b.sub.1)+C.sup.r
(α+β)+γ−C.sup.r≤A+B<(α+β)+γ+C.sup.r
(87) If γ>0 ((a.sub.1+b.sub.1)>0),
(88) then (α+β)−C.sup.r−1≤A+B<(α+β)+C.sup.r−1.
(89) If γ<0 ((a.sub.1+b.sub.1)<0),
(90) then (α+β)−C.sup.r+1≤A+B<(α+β)+C.sup.r+1.
(91) Thus, (α+β)−C.sup.r+1≤A+B<(α+β)+C.sup.r+1 holds, (α+β) is a definite number, and the absolute effective digit is r+1.
(92) If γ=0, then (α+β)−C.sup.r≤A+B<(α+β)+C.sup.r holds, (α+β) is a definite number, and the absolute effective digit is r.
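The rule r=max (m, n)+1 for addition with C=2 can be sketched as follows. The function names are hypothetical and chosen only for this example, and the sketch covers only the determination of r for the untruncated definite number (a.sub.1+b.sub.1); the subsequent classification by the sign of the part below the absolute effective digit is not modeled.

```python
from fractions import Fraction


def add_effective_digit(m: int, n: int) -> int:
    """Return the smallest integer r with 2**m + 2**n <= 2**r."""
    return max(m, n) + 1


def add(a1: Fraction, m: int, b1: Fraction, n: int):
    """Add two computer numbers given as (definite number, absolute effective digit).

    The definite numbers are added as they are; only the absolute
    effective digit of the result has to be determined.
    """
    return a1 + b1, add_effective_digit(m, n)


total, r = add(Fraction(5, 8), -20, Fraction(1, 3), -24)
print(total, r)
```

Because the two uncertain parts are bounded by 2.sup.m and 2.sup.n, their sum is bounded by 2.sup.r, so the result is again a computer number in the sense of formula (3).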
(93) Multiplication
A*B=(a.sub.1+a.sub.2)*(b.sub.1+b.sub.2)=a.sub.1b.sub.1+a.sub.2b.sub.1+a.sub.1b.sub.2+a.sub.2b.sub.2
(94) Assume that A and B are both positive.
(95) a.sub.1≤C.sup.α; α is the smallest integer that satisfies the inequality. It is preferably log.sub.2|a.sub.1|.
(96) b.sub.1≤C.sup.β; β is the smallest integer that satisfies the inequality. It is preferably log.sub.2|b.sub.1|.
−C.sup.mC.sup.β≤a.sub.2b.sub.1<C.sup.mC.sup.β,
−C.sup.αC.sup.n≤a.sub.1b.sub.2<C.sup.αC.sup.n, and
−C.sup.mC.sup.n<a.sub.2b.sub.2≤C.sup.mC.sup.n,
therefore,
a.sub.1b.sub.1−C.sup.mC.sup.β−C.sup.αC.sup.n−C.sup.mC.sup.n<A*B<a.sub.1b.sub.1+C.sup.mC.sup.β+C.sup.αC.sup.n+C.sup.mC.sup.n
a.sub.1b.sub.1−C.sup.(m+β)−C.sup.(α+n)−C.sup.(m+n)<A*B<a.sub.1b.sub.1+C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)
(97) Assume that r is the smallest integer that satisfies C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)≤C.sup.r.
A*B=a.sub.1b.sub.1+(A*B).sub.2
(98) a.sub.1b.sub.1 is a definite number, (A*B).sub.2 is an uncertain number, −C.sup.r≤(A*B).sub.2<C.sup.r is satisfied, and r is the absolute effective digit. The minimum order of the definite number a.sub.1b.sub.1 may be less than the effective digit.
(99) The definite number and the absolute effective digit considering the meaningless digit part are the same as the positive case in addition.
(100)
(101) Assuming that r is the smallest integer that satisfies C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)≤C.sup.r, then C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)=C.sup.r holds only in pattern 3. Further, r that satisfies C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)≤C.sup.r is the maximum value+2 (r=max (m, n)+2) if (m+β), (α+n), and (m+n) are pattern 1 or pattern 2, or is the maximum value+1 (r=max (m, n)+1) otherwise. Moreover, r that satisfies C.sup.(m+β)+C.sup.(α+n)+C.sup.(m+n)<C.sup.r is the maximum value+2 (r=max (m, n)+2) when (m+β), (α+n) and (m+n) are pattern 1, pattern 2 or pattern 3, or is the maximum value+1 (r=max (m, n)+1) otherwise.
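As a minimal sketch of the multiplication rule with C=2, the smallest integer r bounding the sum of the three error terms can be found by direct search in exact rational arithmetic instead of by the pattern classification. The function names are hypothetical, and the three exponents are taken as (m+β), (α+n), and (m+n), following the bounds on a.sub.2b.sub.1, a.sub.1b.sub.2, and a.sub.2b.sub.2.

```python
from fractions import Fraction


def smallest_power_exponent(total: Fraction, radix: int = 2) -> int:
    """Return the smallest integer r with total <= radix**r, for total > 0."""
    r = 0
    while Fraction(radix) ** r < total:
        r += 1
    while Fraction(radix) ** (r - 1) >= total:
        r -= 1
    return r


def mul_effective_digit(alpha: int, m: int, beta: int, n: int) -> int:
    """Absolute effective digit r of A*B for |a1| <= 2**alpha and |b1| <= 2**beta.

    m and n are the absolute effective digits of A and B.
    """
    total = (Fraction(2) ** (m + beta)
             + Fraction(2) ** (alpha + n)
             + Fraction(2) ** (m + n))
    return smallest_power_exponent(total)


print(mul_effective_digit(3, -10, 2, -12))  # exponents -8, -9, -22
print(mul_effective_digit(2, -10, 2, -10))  # exponents -8, -8, -20
```

In the first call the largest exponent is −8 and the result is one above it; in the second call two terms share the largest exponent and the result is two above it, which illustrates the "+1" and "+2" cases.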
1-3. Specific Examples
(102) Next, arithmetic operation examples of software using the computer numbers described above will be described.
(103) As shown in
(104) In the multiplication, addition, and division of computer numbers shown in
(105) Arithmetic operation of Rump formula
(106) The value of f (a, b) obtained by substituting a=77617 and b=33096 into the following formula devised by S. M. Rump is evaluated.
f(x,y)=333.75y.sup.6+x.sup.2(11x.sup.2y.sup.2−y.sup.6−121y.sup.4−2)+5.5y.sup.8+x/(2y)
(107) When this arithmetic operation is executed on the IBM mainframe S/370 with different operational precision, the following results are obtained.
(108) Single precision (approximately 8 decimal digits): f (x, y)≈1.172603 . . . .
(109) Double precision (approximately 17 decimal digits): f (x, y)≈1.1726039400531 . . . .
(110) Extended precision (approximately 34 decimal digits): f (x, y)≈1.172603940053178 . . . .
(111) The true value is f (a, b)=−0.827396 . . . ; therefore, the arithmetic operations on S/370 give wrong results in which not even the sign is correct.
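The true value quoted above can be checked with exact rational arithmetic. The following is a minimal sketch, not the computer number arithmetic of the present technology: it uses Python's `fractions.Fraction`, and it assumes the commonly published form of Rump's expression, with the coefficient 5.5 for y.sup.8 and the last term x/(2y). The function name `rump` is hypothetical.

```python
from fractions import Fraction


def rump(x, y):
    """Evaluate Rump's expression; the result is exact when x and y are Fractions."""
    return (Fraction(1335, 4) * y**6                              # 333.75 * y^6
            + x**2 * (11 * x**2 * y**2 - y**6 - 121 * y**4 - 2)
            + Fraction(11, 2) * y**8                              # 5.5 * y^8
            + x / (2 * y))


exact = rump(Fraction(77617), Fraction(33096))
print(exact, float(exact))

# The same function accepts floats; the value then depends on rounding
# at every step and need not resemble the exact one.
print(rump(77617.0, 33096.0))
```

With these inputs the polynomial part cancels to exactly −2, so the exact result reduces to x/(2y)−2, a small negative number. Floating-point evaluation loses this cancellation because the intermediate terms are of the order of 10.sup.36.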
(112)
f1=333.75y.sup.6+x.sup.2*11x.sup.2y.sup.2+5.5y.sup.8+x/(2y)
f2=x.sup.2*y.sup.6+x.sup.2*121y.sup.4+x.sup.2*2
f=f1−f2
(113)
(114) Arithmetic Operation of Series
1/(1+x)=1−x+x.sup.2− . . . +(−1).sup.nx.sup.n+ . . . (−1<x<1)
(115) A formula of Maclaurin expansion is evaluated. If the terms −x+x.sup.2− . . . +(−1).sup.nx.sup.n . . . are added sequentially to 1−1/(1+x), then the result should approach 0.
(116)
(117) Table 1 shows the results of arithmetic operations using double precision floating-point. There were no differences in errors depending on S. It is revealed that, since each term of the intermediate series is a sum of values smaller than the initial value when calculating the sum of the series, the effective digit is smaller than the initial value. It might be difficult to calculate the sum of series even when double precision floating-point numbers are used. On the other hand, in the arithmetic operation of computer numbers, the result was 0 when calculated with the effective digit being −500.
(118) TABLE 1
δ | 1 − 1/(1 + 10.sup.0 δ) | x = 10.sup.0 δ | series sum
1/1000 | 0.000999001 | 0.001 | −1.449E−16
1/10000 | 9.999E−05 | 1E−04 | 1.6616E−17
1/100000 | 9.9999E−06 | 1E−05 | 5.645E−17
1/1000000 | 9.99999E−07 | 1E−06 | −6.114E−17
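The series experiment can be reproduced in outline as follows. This is a sketch under stated assumptions: it takes the expansion as 1/(1+x)=1−x+x.sup.2− . . . , consistent with the column 1−1/(1+x) of Table 1, and compares double precision floating-point with exact rational arithmetic rather than with computer numbers. The function name `series_residual` and the number of terms are chosen only for illustration.

```python
from fractions import Fraction


def series_residual(x, terms: int):
    """Return 1 - 1/(1+x) plus the terms -x + x**2 - ... + (-x)**terms."""
    total = 1 - 1 / (1 + x)
    for k in range(1, terms + 1):
        total += (-x) ** k
    return total


# Double precision: the residual stays near the rounding level instead of 0.
print(series_residual(0.001, 20))

# Exact rationals: the residual equals the truncation remainder exactly.
x = Fraction(1, 1000)
print(series_residual(x, 5) == -((-x) ** 6) / (1 + x))
```

In exact arithmetic the residual after N terms is −(−x).sup.N+1/(1+x), which tends to 0 as N grows; in floating-point the residual is limited by rounding of terms that are much smaller than the initial value.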
2. Second Embodiment
(119) Hereinafter, a second embodiment of the present technology will be described in detail in the following order with reference to the drawings. It should be noted that the present technology is not limited to the following embodiment, and various modifications may be made without departing from the scope of the present technology.
(120) 2-1. Information Processing Apparatus
(121) 2-2. Arithmetic Operation of High-Precision Computer Numbers
(122) 2-3. Example
(123) In the multiplication of computer numbers according to the first embodiment, since the absolute effective digit increases by about three digits, the accuracy of computer numbers decreases rapidly when multiplications are repeated. Further, in additions with different absolute effective digits, since the absolute effective digit always increases by one, the accuracy inevitably decreases when additions are repeated.
(124) Therefore, in the second embodiment, high-precision computer numbers are used instead of the computer numbers in the first embodiment. The high-precision computer numbers use extended digits (α+α′) (extended high order max and extended absolute effective digit) instead of the digits in the computer number (high order max and absolute effective digit). Herein, α is an integer, and this value is the number of digits. α′ is a decimal digit which is a decimal of 0≤α′<1, and for example, α′ may be a/2.sup.n (0≤a<2.sup.n).
(125) When generating a high-precision computer number at first, the extended absolute effective digit is an integer. The decimal digit of the extended absolute effective digit can suppress the deterioration of precision in the case where the four arithmetic operations are repeated, and is particularly effective when there is a difference in the absolute effective digits.
2-1. Information Processing Apparatus
(126) Functional Configuration
(127)
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ+γ′≤(a,b).sub.2<C.sup.γ+γ′ (13)
(128) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha+ha′ and |b.sub.1|≤C.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers; h.sub.a′ and h.sub.b′ respectively denote decimal digits that are decimals of 0≤h.sub.a′<1 and 0≤h.sub.b′<1; e.sub.a+e.sub.a′, e.sub.b+e.sub.b′, and γ+γ′ denote extended absolute effective digits that are extended digits; e.sub.a, e.sub.b, and γ denote absolute effective digits that are integers; e.sub.a′, e.sub.b′, and r′ respectively denote decimal digits that are decimals of 0≤e.sub.a′<1, 0≤e.sub.b′<1, and 0≤r′<1; (A, B) is the arithmetic operation result of the real numbers A and B; (a, b).sub.1 is the arithmetic operation result of the definite numbers a.sub.1 and b.sub.1; and (a, b).sub.2 is the arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2.
(129) The radix C of the computer number is not particularly limited, but is preferably 2, 8, or 16. Thus, it is possible to verify the arithmetic operation of binary, octal or hexadecimal numbers used in general-purpose computers.
(130) The decimal digits h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′ are preferably h.sub.a′/2.sup.n (0≤h.sub.a′<2.sup.n), h.sub.b′/2.sup.n (0≤h.sub.b′<2.sup.n), e.sub.a′/2.sup.n (0≤e.sub.a′<2.sup.n), e.sub.b′/2.sup.n (0≤e.sub.b′<2.sup.n), r′/2.sup.n (0≤r′<2.sup.n), respectively, and n is a natural number. n is preferably 32 or less, more preferably 16 or less, and still more preferably 8 or less. Increasing n improves precision, but increases arithmetic operation cost. It should be noted that, when the decimal digits h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′ are 0, this is the computer number according to the first embodiment.
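One way to picture an extended digit with n=8 and C=2 is as a single integer counted in units of 1/256, so that the integer part and the decimal digit are obtained by dividing by 256. The following sketch computes the minimum extended digit satisfying |a.sub.1|≤2.sup.h+h′ in exact integer arithmetic. The unit encoding and the function names are assumptions for illustration only and are not taken from the specification.

```python
from fractions import Fraction


def fits(p: int, q: int, u: int) -> bool:
    """Return True if p <= 2**u * q for positive integers p and q."""
    return p <= (q << u) if u >= 0 else (p << -u) <= q


def min_extended_digit(value: Fraction, n: int = 8) -> int:
    """Smallest u, in units of 1/2**n, with value <= 2**(u / 2**n), for value > 0.

    value <= 2**(u / 2**n) is equivalent to value**(2**n) <= 2**u,
    which can be tested with integers only.
    """
    scale = 1 << n
    p = value.numerator ** scale
    q = value.denominator ** scale
    u = p.bit_length() - q.bit_length()  # starting guess close to the answer
    while not fits(p, q, u):
        u += 1
    while fits(p, q, u - 1):
        u -= 1
    return u


units = min_extended_digit(Fraction(3))
h, h_sub = divmod(units, 256)
print(units, h, h_sub)
```

For |a.sub.1|=3, an integer high order max would give the bound 2.sup.2=4, whereas the extended digit gives a bound only slightly above 3, which shows why the decimal digit tightens the estimate.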
(131)
(132) The input unit 31 inputs the high-precision computer number in which real numbers A and B are respectively defined by the following formulas (11) and (12).
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(133) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C denotes a radix; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤C.sup.ha+ha′ and |b.sub.1|≤C.sup.hb+hb′, h.sub.a and h.sub.b respectively denote high order maxes that are integers; h.sub.a′ and h.sub.b′ respectively denote decimal digits that are decimals of 0≤h.sub.a′<1 and 0≤h.sub.b′<1; e.sub.a+e.sub.a′, e.sub.b+e.sub.b′, and γ+γ′ denote extended absolute effective digits that are extended digits; e.sub.a and e.sub.b denote absolute effective digits that are integers; e.sub.a′ and e.sub.b′ respectively denote decimal digits that are decimals of 0≤e.sub.a′<1, and 0≤e.sub.b′<1;
(134) Specifically, the input unit 31 inputs high-precision computer numbers for which a sign, high order max h that is the most significant digit, decimal digit h′ of the high order max (high order max sub), low order max l that is the least significant digit, the absolute effective digit e (effective digit), the decimal digit e′ of the absolute effective digit (effective digit sub), and number of arrays are set.
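The items set for a high-precision computer number can be pictured with the following sketch. The field widths noted in the comments follow the data structure recited in claim 8 (a 1-bit sign and a 7-bit number of arrays in the first header, 16-bit h and e, 8-bit h′ and e′, and 32-bit arrays up to 127). The class and function names, the rule used to size the buffer from h and l, and the placement of the sign in the top bit of the first header are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

WORD_BITS = 32   # each array element of the significand is 32 bits
MAX_WORDS = 127  # the number of arrays is held in 7 bits


def words_needed(h: int, l: int) -> int:
    """Assumed sizing rule: enough 32-bit words to cover binary digits l to h."""
    bits = max(h - l, 1)
    return -(-bits // WORD_BITS)  # ceiling division


@dataclass
class HighPrecisionNumber:
    sign: int   # 1 bit: 0 for non-negative, 1 for negative
    h: int      # high order max (16 bits)
    h_sub: int  # high order max sub h', in units of 1/256 (8 bits)
    l: int      # low order max (least significant digit)
    e: int      # absolute effective digit (16 bits)
    e_sub: int  # absolute effective digit sub e', in units of 1/256 (8 bits)
    words: List[int] = field(default_factory=list)  # significand arrays

    def __post_init__(self) -> None:
        # Reserve the buffer from h and l when no significand is supplied.
        if not self.words:
            count = words_needed(self.h, self.l)
            if count > MAX_WORDS:
                raise ValueError("significand exceeds 127 arrays")
            self.words = [0] * count

    def first_header(self) -> int:
        """Pack the sign and the number of arrays into one byte (assumed layout)."""
        return (self.sign << 7) | len(self.words)


x = HighPrecisionNumber(sign=1, h=10, h_sub=0, l=-54, e=-50, e_sub=0)
print(len(x.words), hex(x.first_header()))
```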
(135) The storage unit 32 stores the high-precision computer number and is a so-called buffer. The buffer is reserved based on the high order max h and the low order max l inputted by the input unit 31. For example, as shown in
(136) The arithmetic unit 33 performs an arithmetic operation by using the definite numbers a.sub.1 and b.sub.1 of the computer numbers stored in the storage unit 32, determines the extended absolute effective digit (γ+γ′) so as to satisfy the following formula (13), and stores, in the storage unit 32, computer numbers as the result of the arithmetic operation.
(A,B)=(a,b).sub.1+(a,b).sub.2,−C.sup.γ+γ′≤(a,b).sub.2<C.sup.γ+γ′ (13)
(137) Herein C denotes a radix; γ+γ′ denotes the extended absolute effective digit, γ denotes the absolute effective digit which is an integer, and γ′ denotes the decimal digit which is a decimal of 0≤γ′<1; (A, B) is an arithmetic operation result of the real numbers A and B, (a, b).sub.1 is an arithmetic operation result of the definite numbers a.sub.1 and b.sub.1, and (a, b).sub.2 is an arithmetic operation result of the uncertain numbers a.sub.2 and b.sub.2.
(138) The arithmetic operation of the high-precision computer number is performed by assuming that the real numbers A and B fall within the range of a.sub.1−C.sup.ea+ea′≤A<a.sub.1+C.sup.ea+ea′ and b.sub.1−C.sup.eb+eb′≤B<b.sub.1+C.sup.eb+eb′ respectively, so that the arithmetic operation result is also a high-precision computer number and thus satisfies the formula (13). The arithmetic operation of the high-precision computer number is divided into the definite numbers a.sub.1, b.sub.1 and the uncertain numbers a.sub.2, b.sub.2; the definite numbers a.sub.1, b.sub.1 are calculated as they are, and the extended absolute effective digit (γ+γ′) of the arithmetic operation result is determined according to various conditions. It should be noted that the effective digit k is the smallest integer that satisfies log.sub.C|definite number|−extended absolute effective digit (γ+γ′)≤k.
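The definition of the effective digit k given above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `effective_digit` is hypothetical, the extended absolute effective digit is passed as an ordinary floating-point value, and the sample values (definite number 0.1, extended absolute effective digit −57, radix 2) are borrowed from the specific example later in this description.

```python
import math

def effective_digit(definite: float, extended_digit: float, radix: int = 2) -> int:
    """Smallest integer k with log_C|definite| - extended_digit <= k.

    `extended_digit` stands for (gamma + gamma'); the name and the use of
    floating point are assumptions made for this sketch only.
    """
    if definite == 0:
        raise ValueError("the logarithm of 0 is undefined")
    log_c = math.log(abs(definite)) / math.log(radix)
    return math.ceil(log_c - extended_digit)

# Sample values taken from the "0.1" example: radix 2, e + e' = -57.
k = effective_digit(0.1, -57.0)
```

With these inputs log.sub.2 0.1 is about −3.32, so k is the smallest integer not below about 53.68.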
(139) In the arithmetic operation of addition (A+B), the arithmetic unit 33 can set the minimum extended digit satisfying the following formula (14) as the extended absolute effective digit.
C.sup.ea+ea′+C.sup.eb+eb′≤C.sup.γ+γ′ (14)
(140) In the arithmetic operation of multiplication (A*B), the arithmetic unit 33 can set the minimum extended digit satisfying the following formula (15) as the extended absolute effective digit.
C.sup.ha+eb+ha′+eb′+C.sup.hb+ea+hb′+ea′+C.sup.ea+eb+ea′+eb′≤C.sup.γ+γ′ (15)
(141) In the arithmetic operation of division (B/A), the arithmetic unit 33 evaluates the following formula (18) as the high-precision computer number defined by the following formulas (16) and (17) by using h.sub.a+h.sub.a′−1/256 and h.sub.b+h.sub.b′−1/256 instead of h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′, respectively, to determine the extended absolute effective digit.
(142)
(143) In the four arithmetic operations of the high-precision computer number, it is necessary to perform arithmetic operations such as |a|≤infC.sup.α+α′ (C: radix, a: definite number), supC.sup.α+α′+supC.sup.β+β′≤infC.sup.γ+γ′ (C: radix), and infC.sup.α+α′−supC.sup.β+β′≥supC.sup.γ+γ′ (C: radix). Herein, supA denotes the upper limit value of A, and infA denotes the lower limit value of A.
(144) Preferably, the arithmetic unit 33 performs arithmetic operations for |a|≤infC.sup.α+α′ by using a list table of values a for α′ when α=0, for supC.sup.α+α′+supC.sup.β+β′≤infC.sup.γ+γ′ by using a list table of values γ and γ′ for α′, β and β′ when α=0, and for infC.sup.α+α′−supC.sup.β+β′≥supC.sup.γ+γ′ by using a list table of values γ and γ′ for α′, β and β′ when α=0. Thus, the arithmetic operations of the high-precision computer numbers can be accelerated.
(145) Further, it is preferable that the arithmetic unit 33 trims the definite numbers of supC.sup.α+α′+supC.sup.β+β′≤infC.sup.γ+γ′ based on the difference of infC.sup.γ+γ′−(supC.sup.α+α′+supC.sup.β+β′). This can suppress the arithmetic operation cost when the arithmetic operations are repeated. Similarly, for infC.sup.α+α′−supC.sup.β+β′≥supC.sup.γ+γ′, trimming the definite numbers based on the difference of infC.sup.α+α′−supC.sup.β+β′−supC.sup.γ+γ′ can suppress the arithmetic operation cost when the arithmetic operations are repeated.
(146) The information processing apparatus preferably further includes a control unit 34 for converting a numerical arithmetic operation program for real numbers into a numerical arithmetic operation program for high-precision computer numbers, executing the numerical arithmetic operation program for high-precision computer numbers, and storing, in the storage unit 32, computer numbers obtained by the arithmetic operation in the arithmetic unit 33. Specifically, it is preferable to convert the floating-point representation to high-precision computer numbers. Thus, the history of the extended absolute effective digit of each variable can be maintained during the execution of the numerical arithmetic operation program for high-precision computer numbers, and it is possible to check whether the required extended absolute effective digit can be finally obtained. It is also possible to inversely calculate the absolute effective digit that the value of each variable must have in order to obtain a desired extended absolute effective digit in the arithmetic operation result.
(147) Such an information processing apparatus can calculate only the part requiring the extended absolute effective digit by the high-precision computer number arithmetic operation library, and confirm the extended absolute effective digit of the arithmetic operation result, so that validated numerics can be easily performed.
(148) Hardware Configuration
(149) The information processing apparatus may have a hardware configuration shown in
2. Arithmetic Operation of High-Precision Computer Numbers
(150) Hereinafter, the arithmetic operation method will be described with the radix of the high-precision computer number being 2 (C=2), and the decimal digits h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′ of the exponents being h.sub.a′/2.sup.8 (0≤h.sub.a′<2.sup.8), h.sub.b′/2.sup.8 (0≤h.sub.b′<2.sup.8), e.sub.a′/2.sup.8 (0≤e.sub.a′<2.sup.8), e.sub.b′/2.sup.8 (0≤e.sub.b′<2.sup.8), and r′/2.sup.8 (0≤r′<2.sup.8), respectively. However, in order to simplify the description, the decimal digits of the exponents are also denoted as h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′.
(151) If n=8, then, for example, in the relation of supC.sup.α+α′+supC.sup.β+β′≤infC.sup.γ+γ′, the difference of infC.sup.γ+γ′−(supC.sup.α+α′+supC.sup.β+β′) does not become 0 except in a specific case, and a gap occurs. Since the arithmetic operation is not affected even if the part below this gap is truncated, trimming processing can be performed, thereby suppressing the arithmetic operation cost when the arithmetic operations are repeated.
(152)
(153) The significand stores arrays a of 32 bits, and the maximum number of the arrays a is 127. Since the array a is an unsigned integer value, the maximum size of the significand is 127*32=4064 bits. This can represent 1223 digits in the decimal notation. For example, since a light year is 9.5 trillion km<10 trillion km=10.sup.13 km=10.sup.19 mm=10.sup.25 nm, assuming the size of the universe is 13.7 billion light years, or about 10 billion light years, 10 billion light years=10.sup.10 light years=10.sup.35 nm, so that 35 digits (decimal number) are sufficient to express the size of the universe in nm. Thus, by using the high-precision computer numbers, arithmetic operations on values of very large order can be performed.
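The capacity figures in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch; the constant names are hypothetical, and the exponents for the light year and the size of the universe are simply the rounded powers of ten used in the text.

```python
import math

ARRAY_BITS = 32    # width of one array element a
MAX_ARRAYS = 127   # maximum number of arrays a

significand_bits = ARRAY_BITS * MAX_ARRAYS
# Number of decimal digits that fit into that many binary digits.
decimal_digits = math.floor(significand_bits * math.log10(2))

# Rounded exponents from the text: 1 light year < 10**25 nm,
# size of the universe taken as 10**10 light years.
NM_PER_LIGHT_YEAR_EXP = 25
UNIVERSE_LIGHT_YEARS_EXP = 10
universe_nm_exp = NM_PER_LIGHT_YEAR_EXP + UNIVERSE_LIGHT_YEARS_EXP
```

The 35 decimal digits needed for the size of the universe in nm are far below the capacity computed for the significand.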
(154) Signed 16-bit integers can be used for the high order max (most significant digit) h, the low order max (least significant digit) l, and the absolute effective digit (effective digit) e. A signed 16-bit integer covers the range from −32768 to 32767, so a sufficient representation is possible. It should be noted that, when the absolute effective digit e is −32768 (1000000000000000 in the binary notation), the error is regarded as 0.
(155) For the decimal digit h′ of the high order max and the decimal digit e′ of the absolute effective digit, an unsigned 8-bit integer such as an unsigned char variable in the C language can be used. When the decimal digit h′ of the high order max is 0, there are two meanings. If h=1, then h′ is 0 as it is, but if h>1, then h′=256 and the high order max is (h+1).
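The field layout described in the preceding paragraphs can be pictured as a small record type. The sketch below is an assumption-laden illustration, not the storage format of the apparatus: the class name `HPNumber`, the field names, and the 0/1 sign encoding are hypothetical, while the value ranges (signed 16-bit, unsigned 8-bit, at most 127 arrays of 32 bits) follow the text.

```python
from dataclasses import dataclass, field
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767

@dataclass
class HPNumber:
    sign: int    # assumed encoding: 0 = non-negative, 1 = negative
    h: int       # high order max (most significant digit)
    h_sub: int   # decimal digit h' of the high order max, in units of 1/256
    l: int       # low order max (least significant digit)
    e: int       # absolute effective digit
    e_sub: int   # decimal digit e' of the absolute effective digit, in 1/256
    arrays: List[int] = field(default_factory=lambda: [0])  # a[0], a[1], ...

    def __post_init__(self) -> None:
        if self.sign not in (0, 1):
            raise ValueError("sign must be 0 or 1")
        for name in ("h", "l", "e"):
            if not INT16_MIN <= getattr(self, name) <= INT16_MAX:
                raise ValueError(f"{name} does not fit a signed 16-bit integer")
        for name in ("h_sub", "e_sub"):
            if not 0 <= getattr(self, name) <= 255:
                raise ValueError(f"{name} does not fit an unsigned 8-bit integer")
        if not 1 <= len(self.arrays) <= 127:
            raise ValueError("between 1 and 127 arrays are allowed")
        if any(not 0 <= word < 2**32 for word in self.arrays):
            raise ValueError("each array element must fit 32 unsigned bits")

# Values used for "0.1" in the specific example of this description.
tenth = HPNumber(sign=0, h=-4, h_sub=174, l=-55, e=-57, e_sub=0,
                 arrays=[0xCCCCCCCD, 0x000CCCCC])
```

Validating the ranges at construction time is a design choice of this sketch; it makes an out-of-range digit fail early rather than silently wrap.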
(156) Treatment of the Uncertain Number 0
(157) The range of an uncertain number is expressed as −C.sup.e+e′≤(uncertain number)<C.sup.e+e′ by using the extended absolute effective digit. If the absolute effective digit e is −∞, then the possible width of the uncertain number is infinitely close to 0. In the representation of the high-precision computer numbers in a computer, the minimum value of the absolute effective digit e is −32768, and this value is treated as −∞. If e→−∞, then the uncertain number is defined as 0. The value of the absolute effective digit e that can be taken in the arithmetic operation of the high-precision computer number is a value in the range from −32767 to 32767, and when it exceeds this range, it is treated as an arithmetic operation error.
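The handling of the value −32768 as −∞ and the range check on e can be sketched as follows. The function names and the choice of `ArithmeticError` are assumptions of this illustration; only the numeric limits come from the text.

```python
E_NEG_INF = -32768  # minimum signed 16-bit value, treated as minus infinity

def uncertain_is_zero(e: int) -> bool:
    """True when the uncertain number is defined as 0 (e treated as -infinity)."""
    return e == E_NEG_INF

def check_effective_digit(e: int) -> int:
    """Return e unchanged if it is usable, otherwise signal an operation error."""
    if e == E_NEG_INF:
        return e  # sentinel: uncertain number is 0
    if not -32767 <= e <= 32767:
        raise ArithmeticError(f"absolute effective digit {e} is out of range")
    return e
```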
(158) 0 in High-Precision Computer Number
(159) The representation of 0 in the high-precision computer number is defined as follows.
(160) s (sign)→0
(161) h (high order max)→0
(162) hd (decimal digit of the high order max)→0
(163) l (low order max)→0
(164) e (absolute effective digit)→calculated value
(165) ed (decimal digit of the absolute effective digit)→calculated value
(166) array a→a [0]=0
(167) The size of the array a in high-precision computer numbers does not change, but the minimum value of the array a is set to 0 as an exception. In computer numbers other than 0, the 0th bit of the array a is always 1.
(168) Extended Digit Table
(169)
(170) Decimal Digit h′ of Extended High Order Max
(171)
(172) As shown in
(173) The first 9 bits of the extracted 32 bits are used to calculate α′ whose lower limit value is larger than the value obtained by dividing the first 9 bits by 256. This process is performed by referring to an 8 bit-α′ table which outputs α′ corresponding to an 8-bit integer.
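One way to picture the 8 bit-α′ table is sketched below. The construction rule is an assumption of this sketch, since the table itself is not reproduced here: for each 8-bit integer n following the leading 1 bit, α′ is taken as the smallest integer with 2.sup.α′/256 not less than (256+n+1)/256, that is, not less than the exclusive upper bound of every value sharing that 9-bit prefix. Floating-point logarithms are used for brevity, whereas a validated implementation would need rigorously rounded bounds. The names `build_alpha_table` and `alpha_sub_from_prefix` are hypothetical.

```python
import math

def build_alpha_table() -> list:
    """Assumed construction: table[n] = smallest a' with 2**(a'/256) >= (257+n)/256."""
    table = []
    for n in range(256):
        upper = (256 + n + 1) / 256  # exclusive upper bound of the prefix 1.n
        table.append(math.ceil(256 * math.log2(upper)))
    return table

ALPHA_TABLE = build_alpha_table()

def alpha_sub_from_prefix(first9: int) -> int:
    """Look up a' from the first 9 bits (leading 1 bit plus 8 following bits)."""
    if not 256 <= first9 <= 511:
        raise ValueError("the first 9 bits must start with a 1 bit")
    return ALPHA_TABLE[first9 - 256]
```

Under this assumed rule the entry for n=255 is 256, which corresponds to carrying into the next integer digit.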
(174)
(175) Addition
(176) Assume that a real number A and a real number B are the high-precision computer numbers defined by the formulas (11) and (12), respectively.
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(177) wherein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C=2; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤2.sup.ha+ha′ and |b.sub.1|≤2.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers, h.sub.a′ and h.sub.b′ respectively denote decimal digits that are h.sub.a′/256 and h.sub.b′/256; e.sub.a+e.sub.a′ and e.sub.b+e.sub.b′ denote extended absolute effective digits; e.sub.a and e.sub.b denote absolute effective digits that are integers; and e.sub.a′ and e.sub.b′ respectively denote decimal digits which are decimals of e.sub.a′/256 and e.sub.b′/256.
(178) A+B is expressed by the following formulas (21) to (23).
A+B=a.sub.1+a.sub.2+b.sub.1+b.sub.2=a.sub.1+b.sub.1+a.sub.2+b.sub.2 (21)
(a.sub.1+b.sub.1)−C.sup.ea+ea′−C.sup.eb+eb′≤A+B<(a.sub.1+b.sub.1)+C.sup.ea+ea′+C.sup.eb+eb′ (22)
C.sup.ea+ea′+C.sup.eb+eb′≤C.sup.γ+γ′ (23)
(179) The minimum extended digit that satisfies the formula (23) is determined. The values such as C.sup.ea+ea′, C.sup.eb+eb′ and C.sup.γ+γ′ are expressed by high-precision computer numbers, and supC.sup.ea+ea′+supC.sup.eb+eb′≤infC.sup.γ+γ′ is determined. Herein, supA denotes the upper limit value of A, and infA denotes the lower limit value of A.
(180) The case when the uncertain number is 0
(181) If the uncertain number b.sub.2 of the real number B is 0 in A+B shown in the formula (21), then the inequality shown in the formula (22) becomes the following formula (24).
(a.sub.1+b.sub.1)−C.sup.ea+ea′≤A+B<(a.sub.1+b.sub.1)+C.sup.ea+ea′ (24)
(182) Further, if the uncertain number a.sub.2 of the real number A is 0, the uncertain number of A+B becomes 0. In this case, the trimming of the lower digits is not performed.
(183) Extended Digit Addition Table
(184) The arithmetic operation of supC.sup.ea+ea′+supC.sup.eb+eb′≤infC.sup.γ+γ′ as shown in the formula (23) uses an extended digit addition table which outputs the minimum extended digit (γ+γ′) satisfying C.sup.α′+C.sup.β+β′≤C.sup.γ+γ′ for C.sup.α′ and C.sup.β+β′. Herein, 0≥β, and α′ and β′ are α′/2.sup.8 and β′/2.sup.8, respectively.
(185)
(186)
(187) It is understood that in the definite number, the position of γ+β is the position of the original digits, and the lower digits that have been shifted from this position by the digits of the difference may be truncated. That is, the difference is absorbed into the offset.
(188)
(189)
(190) In C.sup.α+α′+C.sup.β+β′≤C.sup.γ+γ′, if (α+α′) coincides with (β+β′), then γ=α+1 and γ′=α′ (since C=2, the sum is exactly twice C.sup.α+α′), and the offset value becomes ∞. This is actually expressed as 65535 (bit16max in the program), which is the maximum value of unsigned 16-bit integers. In this case, since the difference becomes 0, the number of digits of the calculated value becomes large. Therefore, the same processing as in the division described later is performed: (γ+γ′)+1/256 is set as the extended absolute effective digit, and the digits of the calculated value are restricted by using the difference between C.sup.γ+γ′ and C.sup.γ+γ′+1/256.
(191) The extended digit addition table is classified by β (integers between 0 and −32) and outputs γ, γ′ and the digits of the difference for α′ and β′.
(192) (1) If β=0, α′≥β′, then the extended digit (γ+γ′) satisfying supC.sup.α′+supC.sup.β+β′≤infC.sup.γ+γ′ is determined, and the difference (infC.sup.γ+γ′−(supC.sup.α′+supC.sup.β+β′)) is evaluated to create a table of α′, β′, γ, γ′, and the digits of the difference.
(193) (2) If β is −1 to −9, then the table of α′, β′, γ, γ′, and the digits of the difference is created by determining the extended digit (γ+γ′) satisfying supC.sup.α′+supC.sup.β+β′≤infC.sup.γ+γ′ and evaluating the difference (infC.sup.γ+γ′−(supC.sup.α′+supC.sup.β+β′)) as in (1).
(194) (3) If β is −10 to −32, then γ=0 and γ′=α′+1, regardless of the value of β′. However, if α′=255, then γ=+1 and γ′=0. By evaluating the difference (infC.sup.γ+γ′−(supC.sup.α′+supC.sup.β+β′)), the table of α′, β′, γ, γ′, and the digits of the difference is created.
(195) (4) If β is −33 or less, then γ=0 and γ′=α′+1, regardless of the value of β′ as in (3). However, if α′=255, then γ=+1 and γ′=0. By evaluating the difference (infC.sup.γ+γ′−(supC.sup.α′+supC.sup.β+β′)), the table of α′, β′, γ, γ′, and the digits of the difference is created. The digits of the difference is 9 or less.
(196) (5) If β=0 and α′=β′, then the digits of the difference is 65535. This is the maximum value of unsigned 16-bit integers. In this case, trimming of the array a is not performed.
(197)
2.sup.2/256+2.sup.0=2.0054299011 . . .
2.sup.1+2/256=2.010859802 . . .
(198) Further, since 2.sup.1+1/256=2.005422550 . . . , 2.sup.1+2/256 is the minimum value.
(199) Further, the difference (2.sup.1+2/256−(2.sup.2/256+2.sup.0)) is 0.0054299009 . . . . Multiplying this value by 256 yields 1.3900546304. This indicates that when the value of the difference is shifted to the left by 8 bits, the head position becomes the position of γ. Therefore, it is shifted to the right by 8 bits from γ so as to set the digits of the difference less than 9 to be 0.
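A single entry of the extended digit addition table can be sketched as below. This is an illustration under assumptions rather than the table generation actually used: the function name `addition_table_entry` is hypothetical, ordinary floating point stands in for the sup/inf evaluation, the offset is taken as the position of the leading 1 bit of the difference counted below the 0th digit, and the equal-operand case returns γ=1 because the sum is exactly twice 2.sup.α′/256.

```python
import math

OFFSET_INF = 65535  # value the text uses for an infinite offset

def addition_table_entry(alpha_sub: int, beta: int, beta_sub: int):
    """Return (gamma, gamma_sub, offset) for 2**(alpha_sub/256) + 2**(beta + beta_sub/256).

    Assumes alpha = 0 and beta <= 0, as in the text.
    """
    if beta > 0:
        raise ValueError("the table assumes beta <= 0")
    if beta == 0 and alpha_sub == beta_sub:
        # Equal operands: the gap is zero, so the offset is reported as infinite.
        return 1, alpha_sub, OFFSET_INF
    total = 2.0 ** (alpha_sub / 256) + 2.0 ** (beta + beta_sub / 256)
    g = math.ceil(256 * math.log2(total))  # candidate, in units of 1/256
    # Guard against floating-point rounding near a boundary.
    while 2.0 ** ((g - 1) / 256) >= total:
        g -= 1
    while 2.0 ** (g / 256) < total:
        g += 1
    # The second term is strictly positive, so the result must exceed alpha_sub.
    g = max(g, alpha_sub + 1)
    diff = 2.0 ** (g / 256) - total
    offset = -math.floor(math.log2(diff)) if diff > 0 else OFFSET_INF
    gamma, gamma_sub = divmod(g, 256)
    return gamma, gamma_sub, offset

entry = addition_table_entry(2, 0, 0)  # the worked example 2**(2/256) + 2**0
```

For the worked example above this yields γ=1 and γ′=2, and a leading bit of the difference 8 binary digits below the 0th digit, in line with the left shift by 8 bits described in the text.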
(200) Multiplication
(201) Assume that a real number A and a real number B are the high-precision computer numbers defined by the formulas (11) and (12), respectively.
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(202) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C=2; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤2.sup.ha+ha′ and |b.sub.1|≤2.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers, h.sub.a′ and h.sub.b′ respectively denote decimal digits that are h.sub.a′/256 and h.sub.b′/256; e.sub.a+e.sub.a′ and e.sub.b+e.sub.b′ denote extended absolute effective digits; e.sub.a and e.sub.b denote absolute effective digits that are integers; and e.sub.a′ and e.sub.b′ respectively denote decimal digits which are decimals of e.sub.a′/256 and e.sub.b′/256.
(203) A*B is expressed by the following formula (31).
A×B=(a.sub.1+a.sub.2)×(b.sub.1+b.sub.2)=a.sub.1b.sub.1+a.sub.2b.sub.1+a.sub.1b.sub.2+a.sub.2b.sub.2 (31)
(204) Hereinafter, h.sub.a, h.sub.a′, h.sub.b and h.sub.b′ are denoted by m.sub.a, m.sub.a′, m.sub.b and m.sub.b′, respectively.
(205) (1) If A>0 and B>0, then a.sub.1≤C.sup.ma+ma′, b.sub.1≤C.sup.mb+mb′, and the following formulas (32) to (34) are satisfied.
−C.sup.e.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(206) The formulas (32) to (34) can be rearranged to be expressed as the following formulas (35) to (37).
−C.sup.m.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(207) According to the following formula (38), the formula (31) is expressed as the following formula (39), and the extended digit satisfying the formula (40) is just to be determined.
C′=C.sup.ma+eb+ma′+eb′+C.sup.mb+ea+mb′+ea′+C.sup.ea+eb+ea′+eb′ (38)
a.sub.1b.sub.1−C′<A×B<a.sub.1b.sub.1+C′ (39)
C′≤C.sup.r+r′ (40)
(208) In order to determine the extended digit, the minimum extended digit satisfying the following formula (41) is just to be determined.
C.sup.α+α′+C.sup.β+β′≤C.sup.γ+γ′ (41)
(209) If the uncertain number b.sub.2 of the real number B is 0, then the formula (38) becomes the following formula (42).
C.sup.γ+γ′=C.sup.mb+ea+mb′+ea′ (42)
(210) If both the uncertain numbers of the real numbers A and B are 0, then γ→−∞, and the value of the absolute effective digit e is −32768 (1000000000000000). If the uncertain number of either of the real numbers A and B is 0, then the trimming processing of the lower digit is not performed.
(211) (2) If A>0 and B<0, then a.sub.1≤C.sup.ma+ma′, −b.sub.1≤C.sup.mb+mb′, and the following formulas (43) to (45) are satisfied.
−C.sup.e.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(212) The formulas (43) to (45) can be rearranged to be expressed as the following formulas (46) to (48).
−C.sup.m.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(213) According to the following formula (49), the formula (31) is expressed as the following formula (50), and the extended digit satisfying the formula (51) is just to be determined.
C′=C.sup.ma+eb+ma′+eb′+C.sup.mb+ea+mb′+ea′+C.sup.ea+eb+ea′+eb′ (49)
a.sub.1b.sub.1−C′<A×B<a.sub.1b.sub.1+C′ (50)
C′≤C.sup.r+r′ (51)
(214) The case of the uncertain number being 0 is the same as the case of (1) A>0 and B>0.
(215) (3) If A<0 and B>0, then −a.sub.1≤C.sup.ma+ma′, b.sub.1≤C.sup.mb+mb′, and the following formulas (52) to (54) are satisfied.
−C.sup.e.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(216) The formulas (52) to (54) can be rearranged to be expressed as the following formulas (55) to (57).
−C.sup.m.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(217) According to the following formula (58), the formula (31) is expressed as the following formula (59), and the extended digit satisfying the formula (60) is just to be determined.
C′=C.sup.ma+eb+ma′+eb′+C.sup.mb+ea+mb′+ea′+C.sup.ea+eb+ea′+eb′ (58)
a.sub.1b.sub.1−C′<A×B<a.sub.1b.sub.1+C′ (59)
C′≤C.sup.r+r′ (60)
(218) The case of the uncertain number being 0 is the same as the case of (1) A>0 and B>0.
(219) (4) If A<0 and B<0, then −a.sub.1≤C.sup.ma+ma′, −b.sub.1≤C.sup.mb+mb′, and the following formulas (61) to (63) are satisfied.
−C.sup.e.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(220) The formulas (61) to (63) can be rearranged to be expressed as the following formulas (64) to (66).
−C.sup.m.sup.
−C.sup.m.sup.
−C.sup.e.sup.
(221) According to the following formula (67), the formula (31) is expressed as the following formula (68), and the extended digit satisfying the formula (69) is just to be determined.
C′=C.sup.ma+eb+ma′+eb′+C.sup.mb+ea+mb′+ea′+C.sup.ea+eb+ea′+eb′ (67)
a.sub.1b.sub.1−C′<A×B≤a.sub.1b.sub.1+C′ (68)
C′<C.sup.r+r′ (69)
(222) The case of the uncertain number being 0 is the same as the case of (1) A>0 and B>0.
(223) Division
(224) Assume that a real number A and a real number B are the high-precision computer numbers defined by the formulas (11) and (12), respectively.
A=a.sub.1+a.sub.2,|a.sub.1|≤C.sup.ha+ha′,−C.sup.ea+ea′≤a.sub.2<C.sup.ea+ea′ (11)
B=b.sub.1+b.sub.2,|b.sub.1|≤C.sup.hb+hb′,−C.sup.eb+eb′≤b.sub.2<C.sup.eb+eb′ (12)
(225) Herein a.sub.1 and b.sub.1 are definite numbers whose numerical values are definite, and a.sub.2 and b.sub.2 are uncertain numbers whose numerical values are uncertain; C=2; h.sub.a+h.sub.a′ and h.sub.b+h.sub.b′ respectively denote extended high order maxes that are minimum extended digits satisfying |a.sub.1|≤2.sup.ha+ha′ and |b.sub.1|≤2.sup.hb+hb′; h.sub.a and h.sub.b respectively denote high order maxes that are integers, h.sub.a′ and h.sub.b′ respectively denote decimal digits that are h.sub.a′/256 and h.sub.b′/256; e.sub.a+e.sub.a′ and e.sub.b+e.sub.b′ denote extended absolute effective digits; e.sub.a and e.sub.b denote absolute effective digits that are integers; and e.sub.a′ and e.sub.b′ respectively denote decimal digits which are decimals of e.sub.a′/256 and e.sub.b′/256.
(226) For B/A, h.sub.a+h.sub.a′−1/256 and h.sub.b+h.sub.b′−1/256 are used instead of the extended high order max (h.sub.a+h.sub.a′, h.sub.b+h.sub.b′), and the high-precision computer number is defined by the following formulas (71) and (72).
(227)
(228) B/A is expressed by the following formula (73).
(229)
(230) The formula (73) is evaluated.
(231)
(232) This formula (74) assumes that C.sup.ha+ha′-1/256−C.sup.ea+ea′ will not be 0. A value less than or equal to 0 means a possibility of A being 0, so division does not hold.
(233)
(234) Determining the extended digit (H.sub.a+H.sub.a′) satisfying the formula (75) results in the following formula (76). H.sub.a+H.sub.a′ can be determined by referring to an extended digit subtraction table which outputs the maximum extended digit (γ+γ′) satisfying C.sup.α′−C.sup.β+β′≥C.sup.γ+γ′ for C.sup.α′ and C.sup.β+β′.
(235)
(236) Next, b.sub.2a.sub.1−b.sub.1a.sub.2 is evaluated for each term. If a.sub.1>0, then the following formula (77) holds; if a.sub.1<0, then the following formula (78) holds; if b.sub.1>0, then the following formula (79) holds; and if b.sub.1<0, then the following formula (80) holds.
a.sub.1>0,−C.sup.e.sup.
a.sub.1<0,C.sup.e.sup.
b.sub.1>0,−C.sup.e.sup.
b.sub.1<0,C.sup.e.sup.
(237) Therefore, if a.sub.1>0 and b.sub.1>0, then the following formula (81) holds; if a.sub.1>0 and b.sub.1<0, then the following formula (82) holds; if a.sub.1<0 and b.sub.1>0, then the following formula (83) holds; and if a.sub.1<0 and b.sub.1<0, then the following formula (84) holds.
a>0,b.sub.1>0,−C.sup.e.sup.
a>0,b.sub.1<0,−C.sup.e.sup.
a<0,b.sub.1>0,−C.sup.e.sup.
a<0,b.sub.1<0,−C.sup.e.sup.
C.sup.e.sup.
(238) Determining the extended digit (H.sub.ab+H.sub.ab′) satisfying the formula (85) results in the following formula (86).
−C.sup.H.sup.
(239) Therefore, the second term of the formula (73) can be denoted by the following formula (87).
(240)
(241) Next, the digit of b.sub.1/a.sub.1 will be described. Since the digit of b.sub.1/a.sub.1 can be rounded to the difference from which the extended digit (H.sub.a+H.sub.a′) satisfying the formula (85) is obtained in the arithmetic operation of the uncertain part, the digits up to the rounding error is just to be determined. The rounding will be described using the following formulas (88) to (90).
(242)
(243) According to the formula (90), the digits after c can be rounded.
(244) Next, a description will be given of a process when the uncertain number is 0, that is, when e.sub.a and e.sub.b are −∞.
(245) (1) If e.sub.a=−∞
(246) then, in the formula (74) for evaluating the above formula (73), determining the extended digit (H.sub.a+H.sub.a′) satisfying the formula (75) results in the following formula (91).
(247)
(248) Further, determining the extended digit (H.sub.ab+H.sub.ab′) satisfying the formula (85) results in the following formula (92).
(H.sub.ab+H.sub.ab′)=(h.sub.a+e.sub.b+h.sub.a′+e.sub.b′) (92)
(249) Therefore, according to the formula (93), the inequality of the formula (88) is rewritten as the formula (94).
(250)
(251) In the formula (94), the formula (95) and the formula (96) are used. Then, using the difference between C.sup.α+α′ and C.sup.β+β′, trimming processing of the extended absolute effective digit is performed.
(252)
(253)
(254) If e.sub.b=−∞, then the error is zero. Therefore, it is necessary to specify the effective digit from the outside.
(255) (2) If e.sub.a≠−∞, e.sub.b=−∞,
(256) then determining the extended digit (H.sub.ab+H.sub.ab′) satisfying the formula (85) results in the following formula (97).
(H.sub.ab+H.sub.ab′)=(h.sub.b+e.sub.a+h.sub.b′+e.sub.a′) (97)
(257) Therefore, according to the formula (98), the inequality of the formula (88) is rewritten as the formula (99).
(258)
(259) Then, as in the case of (1) e.sub.a=−∞, the trimming process of the extended absolute effective digit is performed by using the difference between C.sup.α+α′ and C.sup.β+β′.
(260) Extended Digit Subtraction Table
(261) An extended digit subtraction table for outputting the maximum extended digit (γ+γ′) satisfying C.sup.α′−C.sup.β+β′≥C.sup.γ+γ′ for C.sup.α′ and C.sup.β+β′ will be described. Herein, 0≥β, and α′ and β′ are α′/2.sup.8 and β′/2.sup.8, respectively. If 0=β, then α′≥β′. By using the extended digit subtraction table, the arithmetic operation can be accelerated, for example, in determining the extended digit (H.sub.a+H.sub.a′) satisfying the formula (75).
(262)
(263) If β=0 and α′=β′, then C.sup.α′−C.sup.β+β′≥C.sup.−∞. Possible values of γ lie in the range of signed 16-bit integers, and the smallest value is treated as −∞. The value is “1000000000000000” in the binary representation.
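A single entry of the extended digit subtraction table can be sketched in the same spirit. This is a sketch under assumptions: the name `subtraction_table_entry` is hypothetical, floating point replaces the inf/sup evaluation, and the −∞ case is reported with the 16-bit sentinel −32768 described above.

```python
import math

GAMMA_NEG_INF = -32768  # smallest signed 16-bit value, treated as minus infinity

def subtraction_table_entry(alpha_sub: int, beta: int, beta_sub: int):
    """Return the largest (gamma, gamma_sub) with
    2**(alpha_sub/256) - 2**(beta + beta_sub/256) >= 2**(gamma + gamma_sub/256).

    Assumes alpha = 0, beta <= 0, and alpha_sub >= beta_sub when beta = 0.
    """
    if beta > 0 or (beta == 0 and alpha_sub < beta_sub):
        raise ValueError("the difference must not be negative")
    if beta == 0 and alpha_sub == beta_sub:
        return GAMMA_NEG_INF, 0  # the difference is 0
    diff = 2.0 ** (alpha_sub / 256) - 2.0 ** (beta + beta_sub / 256)
    g = math.floor(256 * math.log2(diff))  # candidate, in units of 1/256
    # Guard against floating-point rounding near a boundary.
    while 2.0 ** ((g + 1) / 256) <= diff:
        g += 1
    while 2.0 ** (g / 256) > diff:
        g -= 1
    # The subtrahend is strictly positive, so the result stays below alpha_sub.
    g = min(g, alpha_sub - 1)
    return divmod(g, 256)

# Normalised form of -4+173/256 and -57 (beta = -57 - (-4) = -53).
entry = subtraction_table_entry(173, -53, 0)
```

The final `min` matters when the subtrahend is too small for floating point to resolve: the result is then one step of 1/256 below the minuend, as in the division example where −4+173/256 becomes −4+172/256.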
(264)
3. Specific Examples
(265) An example of arithmetic operation using the high-precision computer number will be described below. The radix is 2 (C=2), and extensions (decimal part) of the exponents h.sub.a′, h.sub.b′, e.sub.a′, e.sub.b′, and r′ are h.sub.a′/2.sup.8 (0≤h.sub.a′<2.sup.8), h.sub.b′/2.sup.8 (0≤h.sub.b′<2.sup.8), e.sub.a′/2.sup.8 (0≤e.sub.a′<2.sup.8), e.sub.b′/2.sup.8 (0≤e.sub.b′<2.sup.8), and r′/2.sup.8 (0≤r′<2.sup.8), respectively.
(266) In the arithmetic operation examples, double precision floating-point representations were converted into high-precision computer numbers, and arithmetic operations were performed using the high-precision computer numbers. The conversion of a double precision floating-point representation into a high-precision computer number is performed by checking the sign and exponent of the double precision floating-point representation and copying the significand.
(267)
(268) e (−57) indicates that the effective digit is valid up to −57th digit in the binary notation. This is represented as log.sub.10 2=0.301029995663981 . . . in the decimal notation, so that log.sub.10 2.sup.−57=−17.1587097528469281 . . . , which means up to −17 digits in the decimal notation are valid (see line 4 in
(269) To the arrays a [1] and a [0], the significand (fraction) of the double precision floating-point representation is directly substituted. It should be noted that the high-precision computer number is right-aligned. Since the value of “0.1” in the decimal notation cannot be exactly expressed in a computer, it is represented as the value in line 4.
(270) Substituting the significand denoting “0.1” in the decimal notation in double precision floating-point representation into the array of the high-precision computer number, the following array (A) is obtained. This is a concatenation of the array a [1] and the array a [0] of the high-precision computer number shown in
(271) 0000000000001100110011001100110011001100110011001100110011001101 (A)
(272) The position of the first 1 bit is the high order max h and can be computed from the double precision floating-point representation. The extension (decimal digit) h′ of the high order max can be determined by referring to the above-mentioned extended digit table. The first 9 bits of the array (A), counted from the first 1 bit, are “110011001”, and the 8 bits following the first 1 bit are “10011001”. By using these 8 bits, the extension h′ of the high order max can be determined by referring to the above-mentioned “8 bit-α′ table”. Since “10011001” is 153 in the decimal notation, the 153rd value (174) of the “8 bit-α′ table” (the first value is 0th) is read.
(273) 2.sup.174/256 is calculated as 1.601792755682693353793846241591. On the other hand, the value of the first 32 bits of the array (A) is 3435973836, and by dividing this value by 2.sup.31, the first bit of the array (A) can be treated as the 0th digit of the binary notation. Since this value is 1.59999999962747097015380859375, when the first 1 bit of the array (A) is set to be the 0th digit of the binary notation, the array (A) is bounded above by 2.sup.174/256. Since the beginning of the array (A) is the −4th digit in the binary notation, 0.1≤2.sup.−4+174/256 is satisfied.
(274) When actually calculated, the following arithmetic operation results are obtained, and it is found that 2.sup.−4+173/256≤0.1≤2.sup.−4+174/256.
2.sup.−4+174/256=0.10011204723016833461211539009943
2.sup.−4+173/256=0.09984134986928919364142626838804
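The conversion walked through above can be sketched as follows. Several points are assumptions of this illustration and are not stated in this form in the text: the function name `from_double`, the use of `float.as_integer_ratio` to obtain the right-aligned significand with trailing zeros removed, the rule that e is one digit below the last bit of a normal double (half a unit in the last place), and the formula standing in for the 8 bit-α′ table.

```python
import math

def alpha_sub(first9: int) -> int:
    """Assumed stand-in for the 8 bit-alpha' table lookup."""
    return math.ceil(256 * math.log2((first9 + 1) / 256))

def from_double(x: float) -> dict:
    """Convert a normal, non-zero double into the fields used in the example."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite non-zero values are handled in this sketch")
    num, den = abs(x).as_integer_ratio()  # exact: |x| = num / 2**k, num odd
    l = -(den.bit_length() - 1)           # low order max (position of last bit)
    nbits = num.bit_length()
    h = l + nbits - 1                     # high order max (position of first 1 bit)
    first9 = (num << 8) >> (nbits - 1)    # first 9 bits, zero-padded if shorter
    h_sub = alpha_sub(first9)
    if h_sub == 256:                      # carry into the integer digit
        h, h_sub = h + 1, 0
    _, exp = math.frexp(x)
    e = exp - 53 - 1                      # assumed: half a unit in the last place
    return {"sign": 1 if x < 0 else 0, "h": h, "h_sub": h_sub,
            "l": l, "e": e, "e_sub": 0, "bits": format(num, "064b")}

tenth = from_double(0.1)
```

For 0.1 the 64-bit string reproduces the array (A), and the fields come out as h=−4, h′=174 and e=−57. For exact powers of two this sketch returns a valid but not minimal h′.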
(275) Addition
(276)
(277) In the high-precision computer number shown in
2.sup.67/256=1.19890616707438048177 . . .
2.sup.68/256=1.2021567314527031420 . . .
(278) The extended absolute effective digits e (−53) and e′ (23) of the arithmetic operation result indicate that the extended absolute effective digits e (−57) and e′ (0) of the high-precision computer number of “0.1” and the extended absolute effective digits e (−53) and e′ (0) of the high-precision computer number of “1.1” are used and that 2.sup.−53+22/256≤2.sup.−57+2.sup.−53≤2.sup.−53+23/256. Multiplying both sides by 2.sup.53 yields the following arithmetic operation result, to reveal that 2.sup.22/256≤2.sup.−4+1≤2.sup.23/256.
2.sup.22/256=1.061377227289262 . . .
2.sup.−4+1=1.0625
2.sup.23/256=1.0642549128844645 . . .
(279) Next, the offset value is determined.
0.1=0.10000000000000000555111512312+α, |α|≤2.sup.−57
1.1=1.1000000000000000888178419700+β, |β|≤2.sup.−53
1.2=1.20000000000000009436895709312+α+β
(280) Since |α+β|≤2.sup.−57+2.sup.−53=2.sup.−53 (2.sup.−4+1), the minimum γ′ satisfying (1+2.sup.−4)≤2.sup.γ′/256 is determined. In (1+2.sup.−4), 1=2.sup.0, 2.sup.−4=2.sup.−4, α′=0, β=−4, and β′=0, so that γ′ and the digits of the difference (offset value) are obtained from these values by referring to the above-mentioned extended digit addition table.
(281) Using the value (γ′=23) in the extended digit addition table, (1+2.sup.−4)≤2.sup.23/256; actual calculation of both sides yields the following arithmetic operation result.
1+2.sup.−4=1.0625
2.sup.23/256=1.0642549128844645497886112570016
(282) The offset value of the extended digit addition table is 10. The arithmetic operation result of the difference is as follows.
2.sup.23/256−(1+2.sup.−4)=0.00175491288446454978861125700158
(283) The arithmetic operation result obtained by multiplying this by 2.sup.10 is as follows. 1.7970307936916989835379271696181
(284) Digits after the decimal point in the binary notation are “0.0000000001********”, and the part of “1********” or less can be truncated. This indicates that 10 digits or less from −53rd digit in the binary notation may be truncated. Thus, the low order max l is −55th digit, which does not exceed the absolute effective digit e (−53). Originally, −55th digits or less in the binary notation are 0s.
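The numbers in this addition example can be re-checked with exact rational arithmetic. The sketch below is only a verification aid under stated assumptions: it takes the definite numbers to be the doubles nearest 0.1 and 1.1, uses Python's `fractions` module for exact sums, and uses floating point only for the two irrational bounds.

```python
from fractions import Fraction

a1 = Fraction(0.1)  # exact value of the double nearest 0.1
b1 = Fraction(1.1)  # exact value of the double nearest 1.1
definite_sum = a1 + b1

# Bound on the uncertain part: 2**-57 + 2**-53 = 2**-53 * (1 + 2**-4).
error_bound = Fraction(1, 2**57) + Fraction(1, 2**53)
scaled_bound = error_bound * 2**53

# Bracketing powers used in the text (irrational, hence floating point).
lower = 2.0 ** (22 / 256)
upper = 2.0 ** (23 / 256)

# First 28 decimal places of the exact definite sum, as an integer.
sum_digits = definite_sum.numerator * 10**28 // definite_sum.denominator
```

The scaled bound is exactly 17/16=1.0625, which lies between the two bracketing powers, so γ′=23 is the smallest admissible value in steps of 1/256.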
(285)
(286)
(287) In the iterative arithmetic operation of computer numbers according to the first embodiment, since the value of the absolute effective digit e is increased by about 3 for each arithmetic operation, the significance of the effective digit is lost when the arithmetic operations are performed 100 times, but this can be improved by the high-precision computer number.
(288)
(289) Multiplication
(290)
(291) In the high-precision computer number shown in
2.sup.−4+208/256=0.1097657600233312 . . .
2.sup.−4+209/256=0.11006336518984898 . . .
(292) In the high-precision computer number shown in
C′=C.sup.ma+eb+ma′+eb′+C.sup.mb+ea+mb′+ea′+C.sup.ea+eb+ea′+eb′
C′=2.sup.0−57+36/256+0+2.sup.−4−53+174/256+0+2.sup.−57−53+0+0
C′=2.sup.−57+36/256+2.sup.−57+174/256+2.sup.−110
2.sup.57C′=2.sup.36/256+2.sup.174/256+2.sup.−53
(293) When actually calculated, the following arithmetic operation results are obtained, and it is found that 2.sup.57(2.sup.(−56+111/256))≤2.sup.57C′≤2.sup.57(2.sup.(−56+112/256)). It can also be seen that e (−56) and e′ (112) are the minimum extended digits satisfying C′≤C.sup.(γ+γ′), and are the extended effective digits (e+e′/256).
2.sup.57(2.sup.(−56+112/256))=2.708511093873 . . .
2.sup.57C′=2.704175338990 . . .
2.sup.57(2.sup.(−56+111/256))=2.701187431 . . .
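The bound in paragraphs (292) and (293) can be checked by summing the three terms of C′ and searching for the smallest exponent, in steps of 1/256, whose power of 2 is not below that sum. The following Python sketch does this under the assumption that double precision is adequate for illustration; `ext_pow` is a hypothetical helper, and the search uses a logarithm rather than the table lookup of the specification.

```python
import math


def ext_pow(digit: int, numerator: int = 0) -> float:
    """Return 2**(digit + numerator/256) (hypothetical helper)."""
    return 2.0 ** (digit + numerator / 256)


# The three terms of C' as listed in paragraph (292).
c_prime = ext_pow(-57, 36) + ext_pow(-57, 174) + ext_pow(-110)
scaled = c_prime * 2.0 ** 57  # 2^57 * C'

# Smallest exponent, counted in units of 1/256, with C' <= 2**(steps/256).
steps = math.ceil(256 * math.log2(c_prime))
e, e_prime = divmod(steps, 256)  # integer digit and numerator over 256

print(scaled)
print(e, e_prime)
```

With these inputs the search gives e = −56 and e′ = 112, in agreement with the text.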
(294) Since the offset value is 9, the low order max l is (−63), which is smaller than e (−56). It is assumed that two or more 0s are consecutive after this digit.
(295)
(296)
(297) The effective digits are compared between the arithmetic operation result shown in
FIG. 33: (0+36/256)−(−53)=53+36/256
FIG. 42: (−2326+202/256)−(−2372+1/256)=46+203/256
(298) Although the effective digit is reduced by about 6+89/256, it is still improved compared with the computer numbers in the first embodiment.
(299) Division
(300)
(301)
(302)
(303) In the above formula (75), h.sub.a+h.sub.a′−1/256 and e.sub.a+e.sub.a′ are calculated as follows.
h.sub.a+h.sub.a′−1/256=−4+173/256
e.sub.a+e.sub.a′=−57
(304) Then, these numerical values are used to calculate H.sub.a+H.sub.a′ by referring to the extended digit subtraction table. Because e.sub.a is smaller than h.sub.a, the value is obtained by subtracting 1/256 from h.sub.a+h.sub.a′−1/256.
H.sub.a+H.sub.a′=−4+172/256
(305)
(306) In the above formula (99), (h.sub.b+e.sub.a−H.sub.a−h.sub.a)+(h.sub.b′+e.sub.a′−H.sub.a′−h.sub.a′+1/256) is −51+168/256, which is the arithmetic operation result of the extended absolute effective digits e (−51) and e′ (169). Since the offset value is 8, the extended absolute effective digit is larger than the low order max l (−53).
(307)
(308)
(309) In the above formula (75), h.sub.a+h.sub.a′−1/256 and e.sub.a+e.sub.a′ are calculated as follows.
h.sub.a+h.sub.a′−1/256=−4+173/256
e.sub.a+e.sub.a′=−57
(310) Then, these numerical values are used to calculate H.sub.a+H.sub.a′ by referring to the extended digit subtraction table. Because e.sub.a is smaller than h.sub.a, the value is obtained by subtracting 1/256 from h.sub.a+h.sub.a′−1/256.
H.sub.a+H.sub.a′=−4+172/256
(311)
(312) In the above formula (89), the following two values of the exponent are calculated.
(e.sub.b−H.sub.a)+(e.sub.b′−H.sub.a′+1/256)=−53+85/256
(h.sub.b+e.sub.a−H.sub.a−h.sub.a)+(h.sub.b′+e.sub.a′−H.sub.a′−h.sub.a′+1/256)=−53+85/256
(313) Since there is no difference between these two values, the two terms are equal and their sum raises the exponent by 1, to −52+85/256. In the method in which the offset value is ∞, the extended absolute effective digit of the arithmetic operation result is this value plus 1/256, namely −52+86/256 (e (−52), e′ (86)).
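The step in paragraph (313) rests on the identity 2.sup.x+2.sup.x=2.sup.(x+1): two terms with equal exponents combine into one term whose exponent is larger by 1, after which 1/256 is added. A minimal Python sketch using exact rational arithmetic follows; the helper `split_extended` is hypothetical, and the final increment of 1/256 is simply taken from the text rather than derived.

```python
import math
from fractions import Fraction

STEP = Fraction(1, 256)  # granularity of the extended digit


def split_extended(d: Fraction) -> tuple[int, int]:
    """Split d into (e, e_prime) with d = e + e_prime/256 (hypothetical helper)."""
    e = math.floor(d)
    return e, int((d - e) / STEP)


x = Fraction(-53) + 85 * STEP  # both exponents from paragraph (312)
combined = x + 1               # 2^x + 2^x = 2^(x+1)
result = combined + STEP       # plus 1/256, as stated in paragraph (313)

e, e_prime = split_extended(result)
print(e, e_prime)

# Floating-point confirmation of the doubling identity.
xf = float(x)
print(math.isclose(2.0 ** xf + 2.0 ** xf, 2.0 ** (xf + 1)))
```

Keeping the exponent as a `Fraction` avoids rounding when stepping in units of 1/256.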
DESCRIPTION OF REFERENCE CHARACTERS
(314) 11 input unit, 12 storage unit, 13 arithmetic unit, 14 control unit, 21 CPU, 22 GPU, 23 ROM, 24 RAM, 25 operation input unit, 26 storage, 27 input/output interface, 31 input unit, 32 storage unit, 33 arithmetic unit, 34 control unit