Patent classifications
G06F7/49905
Accuracy-conserving floating-point value aggregation
A method for enhancing an accuracy of a sum of a plurality of floating-point numbers. The method receives a floating-point number and generates a plurality of provisional numbers with a value of zero. The method further generates a surjective map from the values of an exponent and a sign of a mantissa to the provisional numbers in the plurality of provisional numbers. The method further maps a value of the exponent and the sign of the mantissa to a first provisional number with the surjective map. The method further generates a test number from the first provisional number and if the test number exceeds a limit, modifies a second provisional number by using at least part of the test number. The method further equates the first provisional number to the test number if the test number does not exceed the limit. The method further sums the plurality of provisional numbers.
AN APPARATUS AND METHOD FOR PROCESSING FLOATING POINT VALUES
An apparatus and method are provided for processing floating point values using an intermediate representation which has significand, exponent and shadow sections. A less significant portion of the exponent of the floating point value defines a range of positions within the significand section where the representation of the significand is to be held. The exponent section holds a representation of a more significant portion of the exponent indicating a selected window of multiple contiguous windows spanning a value range of a format of the floating point value. A first portion of the significand section corresponds to the selected window and a second portion corresponds to an overlap into a further window which is adjacent to and lower in the value range. The shadow section holds values for populating the second portion of the significand section when the representation of the significand of the floating point value is moved to a higher window which is adjacent to and higher in the value range than the selected window. The shadow section allows the selected window to be shifted, such that the summation of multiple values produces the same result independent of the order in which the values are summed.
METHODS FOR CALCULATING FLOATING-POINT OPERANDS AND APPARATUSES USING THE SAME
The invention introduces a method for calculating floating-point operands, which contains at least the following steps: receiving an FP (floating-point) operand in a first format from a source register, wherein the first format is one of a group of first formats of different kinds; converting the FP operand in the first format into an FP operand in a second format; generating a calculation result in the second format by calculating the FP operand in the second format; converting the calculation result in the second format into a calculation result in the first format; and writing-back the calculation result of the first format.
Electronic control device and steering system
An electronic control device is provided with a plurality of calculation blocks that calculate floating-point data. The electronic control device includes a storage that stores calculation data items of the plurality of calculation blocks. The electronic control device calculates a command value of a control amount in a control target based on the calculation data items of the plurality of calculation blocks.
Dual-mode floating point processor operation
By providing a mode indication, an execution unit is operable to operate in two separate modes, each of which cause the execution unit to perform calculations by interpreting the same bit string (the first of the bit strings) as representing one of two different values. When operating in the first mode, the first of the bit string represents an undefined value, in other words a NaN. When operating in the second mode, the first of the bit strings represents a negative zero. Hence, the same string of bits can represent either a NaN or a negative zero depending upon the mode of operation of the processor. Since it is not necessary to reserve more than one bit string to represent these two special values, the remaining combinations of bits are available to represent other values.