Number format pre-conversion instructions
10331404 · 2019-06-25
Assignee
Inventors
- Jorn Nystad (Trondheim, NO)
- Andreas Due Engh-Halstvedt (Trondheim, NO)
- Simon Alex Charles (Cambridge, GB)
Cpc classification
G06F7/483
PHYSICS
G06F5/00
PHYSICS
G06F9/30025
PHYSICS
G06F7/499
PHYSICS
International classification
G06F7/483
PHYSICS
G06F9/30
PHYSICS
G06F5/00
PHYSICS
G06F7/499
PHYSICS
Abstract
Apparatus for processing data includes processing circuitry 16, 18, 20, 22, 24, 26 and decoder circuitry 14 for decoding program instructions. The program instructions decoded include a floating point pre-conversion instruction which performs round-to-nearest ties to even rounding upon the mantissa field of an input floating point number to generate an output floating point number with the same mantissa length but with the mantissa rounded to a position corresponding to a shorter mantissa field. The output mantissa field includes a suffix of zero values concatenated with the rounded value. The decoder circuitry 14 is also responsive to an integer pre-conversion instruction to quantise an input integer value using round-to-nearest ties to even rounding to form an output integer operand with a number of significant bits matched to the mantissa size of a floating point number to which the integer is later to be converted using an integer-to-floating point conversion instruction.
Claims
1. Apparatus for processing data comprising: processing circuitry configured to perform processing operations including integer processing operations having 2.sup.N-bit integer operands and floating point processing operations having 2.sup.M-bit floating point operands with a Y-bit mantissa field, where N, M and Y are positive integers and N>M; and decoder circuitry configured to decode program instructions to generate control signals to control said processing circuitry to perform said processing operations; wherein said decoder circuitry is configured to decode an integer pre-conversion instruction to generate control signals to control said processing circuitry to perform a data processing operation comprising: receiving a 2.sup.N-bit input integer operand; detecting a bit position of a most significant non-zero bit within the 2.sup.N-bit input integer operand; and quantizing said 2.sup.N-bit input integer operand using a round-to-nearest ties to even rounding mode to form a 2.sup.N-bit output integer operand comprising: (Y+1) significant bits starting from the detected bit position, and trailing bits trailing the (Y+1) significant bits set to zero.
2. Apparatus as claimed in claim 1, wherein said decoding circuitry is configured to decode a first integer-to-floating point conversion instruction to generate control signals to control said processing circuitry to perform a data processing operation comprising: converting a 2.sup.N-bit input integer operand to a corresponding value expressed as a 2.sup.(N-1)-bit output floating point operand.
3. Apparatus as claimed in claim 2, wherein N=6, M=5 and Y=23.
4. Apparatus as claimed in claim 2, wherein N=5, M=4 and Y=10.
5. Apparatus as claimed in claim 1, wherein said decoding circuitry is configured to decode a second integer-to-floating point conversion instruction to generate control signals to control said processing circuitry to perform a data processing operation comprising: converting a 2.sup.N-bit input integer operand to a corresponding value expressed as a 2.sup.(N-2)-bit output floating point operand.
6. A computer program product storing in a non-transitory form a computer program for controlling a computer to provide a virtual machine execution environment for program instructions corresponding to an apparatus as claimed in claim 1.
7. Apparatus as claimed in claim 5, wherein N=6, M=4 and Y=10.
8. Apparatus for processing data comprising: processing means for performing processing operations including integer processing operations having 2.sup.N-bit integer operands and floating point processing operations having 2.sup.M-bit floating point operands with a Y-bit mantissa field, where N, M and Y are positive integers and N>M; and decoder means for decoding program instructions to generate control signals to control said processing means to perform said processing operations; wherein said decoder means is configured to decode an integer pre-conversion instruction to generate control signals to control said processing means to perform a data processing operation comprising: receiving a 2.sup.N-bit input integer operand; detecting a bit position of a most significant non-zero bit within the 2.sup.N-bit input integer operand; and quantizing said 2.sup.N-bit input integer operand using a round-to-nearest ties to even rounding mode to form a 2.sup.N-bit output integer operand comprising: (Y+1) significant bits starting from the detected bit position, and trailing bits trailing the (Y+1) significant bits set to zero.
9. A method of processing data comprising the steps of: performing processing operations including integer processing operations having 2.sup.N-bit integer operands and floating point processing operations having 2.sup.M-bit floating point operands with a target Y-bit mantissa field, where N, M and Y are positive integers and N>M; and decoding program instructions to generate control signals to control said processing operations; wherein said step of decoding decodes an integer pre-conversion instruction to generate control signals to control a data processing operation comprising: receiving a 2.sup.N-bit input integer operand; detecting a bit position of a most significant non-zero bit within the 2.sup.N-bit input integer operand; and quantizing said 2.sup.N-bit input integer operand using a round-to-nearest ties to even rounding mode to form a 2.sup.N-bit output integer operand comprising: (Y+1) significant bits starting from the detected bit position, and trailing bits trailing the (Y+1) significant bits set to zero.
10. A method as claimed in claim 9, wherein said step of decoding decodes a first integer-to-floating point conversion instruction to generate control signals to control a data processing operation comprising: converting a 2.sup.N-bit input integer operand to a corresponding value expressed as a 2.sup.(N-1)-bit output floating point operand.
11. A method as claimed in claim 10, wherein N=6, M=5 and Y=23.
12. A method as claimed in claim 10, wherein N=5, M=4 and Y=10.
13. A method as claimed in claim 9, wherein said step of decoding decodes a second integer-to-floating point conversion instruction to generate control signals to control a data processing operation comprising: converting a 2.sup.N-bit input integer operand to a corresponding value expressed as a 2.sup.(N-2)-bit output floating point operand.
14. A method as claimed in claim 13, wherein N=6, M=4 and Y=10.
15. A computer program product storing in a non-transitory form a computer program for controlling a computer, said computer program including an integer pre-conversion instruction for controlling said computer to operate in accordance with the method of claim 9.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
DESCRIPTION OF EXAMPLE EMBODIMENTS
(9)
(10) It will be further understood that while the example of
(11)
(12) The input mantissa field has a bit length of 52. This is rounded to an intermediate mantissa field having a bit length of 10 and occupying bit positions 51 to 42 in the output floating point operand 30. This intermediate mantissa field has been subject to round-to-nearest ties to even rounding in respect of the least significant 42 bits of the input mantissa field. This rounding feeds into the least significant bit of the intermediate mantissa field and can carry through into the more significant bits in the normal way.
(13) A suffix value comprising 42 0's is concatenated with the intermediate mantissa field at its least significant end to form an output mantissa field which still contains 52 bits in accordance with the floating point number format of the output floating point operand 30.
(14) However, the mantissa field of the output floating point operand 30 has already been rounded to an effective 10-bit length in preparation for use within a 16-bit floating point number. If the output mantissa field of the output floating point number 30 is subject to multiple subsequent rounding operations while providing the 16-bit floating point number, then these will merely discard the trailing zero values and will not alter the intermediate mantissa field within bit positions [51:42].
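By way of illustration only, the behaviour described above may be modelled on bare mantissa values. The helper `rne_drop`, the chosen mantissa value and the 23-bit intermediate width (corresponding to an assumed 32-bit floating point stage) are assumptions introduced for this sketch and are not taken from the described embodiments.

```python
def rne_drop(value, drop):
    """Discard `drop` (>= 1) low-order bits of a non-negative integer using
    round-to-nearest, ties-to-even. Hypothetical helper for illustration."""
    kept = value >> drop
    low = value & ((1 << drop) - 1)      # the bits being discarded
    half = 1 << (drop - 1)               # exactly half of one kept unit
    if low > half or (low == half and kept & 1):
        kept += 1                        # round up (ties go to the even value)
    return kept


# A 52-bit mantissa lying just above the half-way point between two
# adjacent 10-bit mantissas (illustrative value only).
m52 = (1 << 41) | (1 << 22)

direct = rne_drop(m52, 42)                     # 52 -> 10 bits, one rounding
unprepared = rne_drop(rne_drop(m52, 29), 13)   # 52 -> 23 -> 10 bits

# Pre-converted mantissa: the 10-bit result followed by 42 zero bits.
pre_converted = direct << 42
prepared = rne_drop(rne_drop(pre_converted, 29), 13)

print(direct, unprepared, prepared)            # prints: 1 0 1
```

In this sketch the two-stage rounding of the unprepared mantissa differs from the single rounding, whereas the pre-converted mantissa passes through both later roundings unchanged because only zero bits are discarded.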
(15)
(16) A second floating point conversion instruction is then performed to change the data width from 32 to 16 bits. The final 16-bit floating point number has a 10-bit mantissa field. A second round-to-nearest ties to even rounding is performed when converting from the 23-bit mantissa of the 32-bit floating point number down to the 10-bit mantissa of the 16-bit floating point number. The two rounding operations illustrated in
(17) It will be appreciated that the operations illustrated above and elsewhere in this description in relation to particular instructions may be performed using different steps and/or with a different ordering of steps. Such different steps and/or orderings may nevertheless produce a result with the same value as given by the steps described herein. Such alternatives are encompassed within the scope of the present techniques.
(18)
(19) At step 34 the 64-bit floating point input operand is received. At step 36 the 52-bit mantissa of the input operand is converted into a 10-bit intermediate mantissa using round-to-nearest ties to even rounding in respect of the discarded bits of the mantissa. At step 38 the 10-bit intermediate mantissa is concatenated with a suffix of 42 zero values to generate a 52-bit output mantissa field of the correct size for a 64-bit floating point number. At step 40 the 64-bit floating point operand is formed using the sign bit and the exponent field of the original 64-bit floating point input operand together with the 52-bit mantissa as formed at step 38. This output floating point operand is the result operand for the floating point pre-conversion instruction.
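The sequence of steps 34 to 40 may be sketched in software as follows. This is a minimal model under stated assumptions, not the described circuitry: the function name `fp64_preconvert_for_fp16` is hypothetical, the input is assumed to be a finite, normal 64-bit value whose 16-bit counterpart is also normal, and the handling of a carry out of the mantissa (incrementing the exponent) is an assumption made here to model the rounding carrying "in the normal way".

```python
import struct


def fp64_preconvert_for_fp16(x):
    """Model of a 64-bit to 16-bit floating point pre-conversion.
    Assumes a finite, normal input; NaN, infinity and denormal targets
    are not modelled in this sketch."""
    # Step 34: receive the 64-bit operand and split it into its fields.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)

    # Step 36: round the 52-bit mantissa to 10 bits (ties to even).
    kept = mantissa >> 42
    low = mantissa & ((1 << 42) - 1)
    half = 1 << 41
    if low > half or (low == half and kept & 1):
        kept += 1
    if kept == 1 << 10:          # assumed handling of a mantissa carry-out
        kept = 0
        exponent += 1

    # Step 38: append the suffix of 42 zero values.
    out_mantissa = kept << 42

    # Step 40: reassemble the 64-bit operand.
    out_bits = (sign << 63) | (exponent << 52) | out_mantissa
    return struct.unpack("<d", struct.pack("<Q", out_bits))[0]


x = 1 + 2**-11 + 2**-30
print(fp64_preconvert_for_fp16(x))    # prints: 1.0009765625
```

The result is still a 64-bit value, but it is exactly representable with a 10-bit mantissa, so later narrowing conversions have nothing left to round.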
(20) If the 16-bit floating point number will be a denormal floating point number (as detected from the value of the input exponent field), then the rounding point in the mantissa of the input 64-bit floating point number is left shifted to take account of the extra bits of the mantissa that will be discarded when the denormal 16-bit floating point number is generated.
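The movement of the rounding point for denormal results can be illustrated with a value-level model rather than a bit-level one. In the sketch below the function names are hypothetical, and the constants -14 (smallest normal exponent) and 10 (mantissa width) are those of the IEEE 754 16-bit format; the input is assumed to be finite, non-zero and within the range of the 16-bit format.

```python
import math


def fp16_quantum(x):
    """Spacing between adjacent 16-bit floating point values around x.
    For denormal results the exponent is clamped at -14, so fewer
    mantissa bits of x survive (the rounding point moves up)."""
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    unbiased = max(e - 1, -14)       # clamp at the smallest normal exponent
    return math.ldexp(1.0, unbiased - 10)


def quantise_to_fp16_grid(x):
    """Round x to the nearest multiple of the local quantum, ties to even.
    Python's round() on a float rounds half-way cases to the even integer."""
    q = fp16_quantum(x)
    return round(x / q) * q


normal = 1 + 2**-11 + 2**-30
denormal = 2**-15 + 2**-25 + 2**-40
print(quantise_to_fp16_grid(normal) == 1 + 2**-10)       # prints: True
print(quantise_to_fp16_grid(denormal) == 513 * 2**-24)   # prints: True
```

For the denormal example only 9 fraction bits of the input survive instead of 10, which is the effect of shifting the rounding point described above.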
(21)
(22) The target mantissa field is a 23-bit mantissa field. However, as is known in this technical field, the floating point format assumes a leading bit value of 1 to the left of the binary point of the mantissa and accordingly this leading 1 maps to the bit at bit position A. The bits which form the mantissa field extend from bit position (A-1) to bit position (A-23).
(23) The bits which are less significant than the bits which will form the mantissa field extend from bit position (A-24) down to bit position 0. These least significant bits, together with the least significant bit of the potential mantissa field (bit position (A-23)), form the input to a round-to-nearest ties to even operation which produces a rounding bit which is summed into bit position (A-23). This rounding bit may ripple up through the remainder of the bits which will form the mantissa field as is known in this technical field.
(24) The output of the integer pre-conversion instruction is also a 64-bit integer, but in this case the integer value has been quantised using a round-to-nearest ties to even rounding mode such that it has a number of significant bits equal to the mantissa field bit width for the target floating point number plus one (to account for the implied leading 1 in the mantissa field). The bits of the output integer in the range (A-24) to 0 may be all set to a value of zero.
(25) When the 64-bit integer to 32-bit floating point conversion is performed, the mantissa field is formed from the least significant 23 bits of the significant bits within the integer pre-conversion instruction result, i.e. dropping the implied 1. The integer-to-floating point conversion instruction sets the exponent to a value equal to (A+127) to account for the known exponent offset used within 32-bit floating point numbers according to the IEEE 754 Standard. The sign bit in this example is set to zero.
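The quantisation of an unsigned integer to (Y+1) significant bits may be sketched as follows. The function name `int_preconvert` and its parameters are hypothetical. The sketch uses unbounded Python integers, so the case in which rounding carries out of the top of a 64-bit operand, which the text above does not address, is left unmasked.

```python
def int_preconvert(n, mantissa_bits=23, width=64):
    """Quantise a non-negative integer so that it has at most
    (mantissa_bits + 1) significant bits, using round-to-nearest,
    ties-to-even. Illustrative model only."""
    if not 0 <= n < (1 << width):
        raise ValueError("operand out of range for the assumed width")
    if n == 0:
        return 0
    a = n.bit_length() - 1            # bit position A of the leading 1
    drop = a - mantissa_bits          # count of bits below position (A - Y)
    if drop <= 0:
        return n                      # already short enough, nothing to round
    kept = n >> drop                  # leading 1 plus Y mantissa bits
    low = n & ((1 << drop) - 1)       # bits (A - Y - 1) down to 0
    half = 1 << (drop - 1)
    if low > half or (low == half and kept & 1):
        kept += 1                     # rounding bit summed in at (A - Y)
    return kept << drop               # trailing bits set to zero


print(int_preconvert(2**24 + 1))      # prints: 16777216
print(int_preconvert(2**24 + 3))      # prints: 16777220
```

Both inputs lie exactly half-way between two representable values, so the two results show the tie being resolved towards the value whose least significant retained bit is zero.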
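The assembly of the 32-bit floating point result from an already quantised integer may be illustrated as below. The function names are hypothetical, the input is assumed to be non-negative with at most 24 significant bits, and returning an all-zero bit pattern for a zero input is an assumption of this sketch since that case is not discussed above.

```python
import struct


def int_to_fp32_bits(pre):
    """Build an IEEE 754 32-bit pattern from a pre-converted integer."""
    if pre == 0:
        return 0                                  # assumed handling of zero
    a = pre.bit_length() - 1                      # bit position A
    # Align the significant bits so that the leading 1 sits at bit 23.
    significant = pre >> (a - 23) if a >= 23 else pre << (23 - a)
    mantissa = significant & ((1 << 23) - 1)      # drop the implied leading 1
    exponent = a + 127                            # exponent offset of 127
    sign = 0
    return (sign << 31) | (exponent << 23) | mantissa


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a floating point value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(hex(int_to_fp32_bits(1)))                   # prints: 0x3f800000
print(bits_to_float(int_to_fp32_bits(2**24 + 4))) # prints: 16777220.0
```

No rounding appears in this conversion because the pre-converted operand has no non-zero bits below the mantissa field.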
(26) It is possible that the integer pre-conversion instruction may also be performed in a manner which will deal with 2's complement signed integers. In this case, the integer pre-conversion instruction may first perform a 2's complement negation (invert and add one) before performing the leading non-zero bit detection step, performing the quantisation and then performing another 2's complement negation to restore the sign. Such modifications to encompass integer pre-conversion instructions operating upon signed values are encompassed within the present techniques.
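A signed variant along the lines described above might be modelled as follows. All names are hypothetical; operands are represented as 64-bit patterns held in Python integers, and the magnitude quantiser is repeated here only so that the sketch is self-contained.

```python
MASK64 = (1 << 64) - 1


def negate64(bits):
    """2's complement negation of a 64-bit pattern (invert and add one)."""
    return (~bits + 1) & MASK64


def quantise_magnitude(n, mantissa_bits):
    """Round-to-nearest, ties-to-even quantisation of a non-negative
    integer to (mantissa_bits + 1) significant bits."""
    if n == 0:
        return 0
    drop = (n.bit_length() - 1) - mantissa_bits
    if drop <= 0:
        return n
    kept = n >> drop
    low = n & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if low > half or (low == half and kept & 1):
        kept += 1
    return kept << drop


def signed_int_preconvert(bits, mantissa_bits=23):
    """Negate if negative, quantise the magnitude, then restore the sign."""
    negative = (bits >> 63) & 1
    magnitude = negate64(bits) if negative else bits
    quantised = quantise_magnitude(magnitude, mantissa_bits)
    return negate64(quantised) if negative else quantised


pattern = -(2**24 + 3) & MASK64
print(signed_int_preconvert(pattern) == (-(2**24 + 4) & MASK64))  # prints: True
```

Quantising the magnitude rather than the raw pattern keeps the rounding symmetric about zero in this model.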
(27)
(28) When an integer-to-floating point conversion instruction is executed to convert the input 64-bit integer to a 16-bit floating point value, the 10-bit mantissa field is selected from the least significant 10 bits of the 11 significant bits within the output of the integer pre-conversion instruction. The exponent value is set to (A+15) to account for the normal exponent offset.
(29)
(30)
(31) At step 52 the output 64-bit integer operand is generated from the rounding result of step 48 together with a suffix of all zero values as generated at step 50.
(32) Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.