VARIABLE PRECISION FLOATING-POINT ADDER AND SUBTRACTOR
20180081633 · 2018-03-22
Inventors
Cpc classification
G06F7/49915
PHYSICS
G06F2205/00
PHYSICS
G06F7/49921
PHYSICS
International classification
Abstract
An integrated circuit may include a floating-point adder that supports variable precisions. The floating-point adder may receive first and second inputs to be added, where the first and second inputs each have a mantissa and an exponent. The mantissa and exponent values may be split into a near path and a far path using a dual path floating-point adder architecture depending on the difference of the exponents and on whether an addition or subtraction is being performed. The mantissa values may be left justified, while the sticky bits are right justified. The hardware for the largest mantissa can be used to support the calculations for the smaller mantissas using no additional arithmetic structures, with only some multiplexing and decoding logic.
Claims
1. An integrated circuit, comprising: a floating-point adder that receives first and second numbers, each having a mantissa and an exponent, wherein the floating-point adder comprises: a mantissa alignment circuit; and an addition circuit, wherein the mantissa alignment circuit and the addition circuit are operable in a first mode that supports a first mantissa size and are also operable in a second mode that supports a second mantissa size that is different than the first mantissa size.
2. The integrated circuit of claim 1, wherein the mantissa alignment circuit comprises a right shifter that generates a shifted mantissa.
3. The integrated circuit of claim 2, wherein the mantissa alignment circuit further comprises a gating circuit that selectively passes through the shifted mantissa.
4. The integrated circuit of claim 3, wherein the mantissa alignment circuit further comprises a comparison circuit that controls the gating circuit, and wherein the comparison circuit selectively directs the gating circuit to output only zero bits.
5. The integrated circuit of claim 4, wherein the mantissa alignment circuit further comprises a maximum shift lookup table (LUT) circuit that outputs a maximum shift value depending on the size of the mantissas of the first and second numbers, and wherein the comparison circuit compares the maximum shift value to the difference in the exponents.
6. The integrated circuit of claim 5, wherein the maximum shift LUT circuit outputs a first maximum shift value during the first mode and outputs a second maximum shift value that is different than the first maximum shift value during the second mode.
7. The integrated circuit of claim 2, wherein the right shifter includes: a chain of multiplexing stages; a first logic OR gate that is coupled to each multiplexing stage in the chain and that generates a first output signal; and a second logic OR gate that receives at least some least significant bits from a last multiplexing stage in the chain and that generates a second output signal, wherein the second output signal is selectively combined with the first output signal only during the second mode.
8. The integrated circuit of claim 1, wherein the addition circuit comprises: a first summing circuit that outputs a sum; a second summing circuit that outputs an incremented sum; and a logic OR gate that feeds the second summing circuit and that receives a deasserted control signal during the first mode and an asserted control signal during the second mode.
9. The integrated circuit of claim 8, wherein the second summing circuit has a carry-in port that receives a logic 1.
10. The integrated circuit of claim 1, wherein the addition circuit comprises: a first summing circuit that outputs a sum; and a second summing circuit that outputs an incremented sum, wherein the second summing circuit includes: a first summing segment; a second summing segment; and a multiplexing circuit that receives a carry-out signal from the second summing segment and that selectively routes a selected one of the carry-out signal and a sticky bit to a carry-in port of the first summing segment.
11. A method of operating a floating-point adder on an integrated circuit, wherein the method comprises: receiving first and second numbers each having a mantissa and an exponent; routing the first and second numbers to a near path when a difference in the exponents is equal to or less than a predetermined threshold; routing the first and second numbers to a far path when the difference in the exponents exceeds the predetermined threshold; operating the floating-point adder in a first mode to support a first mantissa size for the first and second numbers; and operating the floating-point adder in a second mode to support a second mantissa size that is different than the first mantissa size.
12. The method of claim 11, further comprising: left justifying the mantissas of the first and second numbers during both the first and second modes.
13. The method of claim 11, further comprising: right justifying sticky bits for the first and second numbers during both the first and second modes.
14. The method of claim 11, further comprising: determining a first maximum mantissa shift amount for the first mode; and determining a second maximum mantissa shift amount that is different than the first maximum mantissa shift amount for the second mode.
15. The method of claim 11, further comprising: generating a sticky bit by combining a first number of least significant bits (LSBs) in the second number during the first mode; and generating the sticky bit by combining a second number of LSBs in the second number that is different than the first number during the second mode.
16. A floating-point adder circuit for adding together first and second floating-point numbers, each of which has a mantissa and an exponent, comprising: a fixed path that conveys a sticky bit for the second floating-point number; and a first multiplexer that includes: a first input that receives a bit from a first bit position in the mantissa of the second floating-point number during a first mode to support a first mantissa size for the first and second floating-point numbers; a second input that receives a bit from a second bit position in the mantissa of the second floating-point number during a second mode to support a second mantissa size for the first and second floating-point numbers that is different than the first mantissa size; and an output at which a rounding bit is provided.
17. The floating-point adder circuit of claim 16, further comprising: a second multiplexer that includes: a first input that receives a bit from a third bit position in the mantissa of the second floating-point number during the first mode; a second input that receives a bit from a fourth bit position in the mantissa of the second floating-point number during the second mode; and an output at which a guard bit is provided.
18. The floating-point adder circuit of claim 17, further comprising: a third multiplexer that includes: a first input that receives a bit from a fifth bit position in the mantissa of the second floating-point number during the first mode; a second input that receives a bit from a sixth bit position in the mantissa of the second floating-point number during the second mode; and an output at which a least significant bit (LSB) of the mantissa of the second floating-point number is provided.
19. The floating-point adder circuit of claim 18, further comprising: a summing circuit having a first inverting input that is coupled to the fixed path and the first, second, and third multiplexers and a second input that receives a logic one.
20. The floating-point adder circuit of claim 19, further comprising: a rounding determination circuit that receives the sticky bit, the rounding bit, the guard bit, the LSB of the mantissa of the second floating-point number, and a most significant bit of the sum of the first and second floating-point numbers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
DETAILED DESCRIPTION
[0024] The embodiments presented herein relate to integrated circuits and, more particularly, to floating-point adders/subtractors on an integrated circuit. It will be recognized by one skilled in the art that the present exemplary embodiments may be practiced without some or all of these specific details. In other instances, well-known operations have not been described in detail in order not to unnecessarily obscure the present embodiments.
[0025] An illustrative embodiment of an integrated circuit such as programmable logic device (PLD) 100 having an exemplary interconnect circuitry is shown in
[0026] Programmable logic device 100 may contain programmable memory elements. Memory elements may be loaded with configuration data (also called programming data) using input/output elements (IOEs) 102. Once loaded, the memory elements each provide a corresponding static control signal that controls the operation of an associated functional block (e.g., LABs 110, SPB 120, RAM 130, or input/output elements 102).
[0027] In a typical scenario, the outputs of the loaded memory elements are applied to the gates of metal-oxide-semiconductor transistors in a functional block to turn certain transistors on or off and thereby configure the logic in the functional block including the routing paths. Programmable logic circuit elements that may be controlled in this way include parts of multiplexers (e.g., multiplexers used for forming routing paths in interconnect circuits), look-up tables, logic arrays, AND, OR, NAND, and NOR logic gates, pass gates, etc.
[0028] The memory elements may use any suitable volatile and/or non-volatile memory structures such as random-access-memory (RAM) cells, fuses, antifuses, programmable read-only-memory memory cells, mask-programmed and laser-programmed structures, mechanical memory devices (e.g., including localized mechanical resonators), mechanically operated RAM (MORAM), combinations of these structures, etc. Because the memory elements are loaded with configuration data during programming, the memory elements are sometimes referred to as configuration memory, configuration RAM (CRAM), configuration memory elements, or programmable memory elements.
[0029] In addition, the programmable logic device may have input/output elements (IOEs) 102 for driving signals off of device 100 and for receiving signals from other devices. Input/output elements 102 may include parallel input/output circuitry, serial data transceiver circuitry, differential receiver and transmitter circuitry, or other circuitry used to connect one integrated circuit to another integrated circuit. As shown, input/output elements 102 may be located around the periphery of the chip.
[0030] If desired, the programmable logic device may have input/output elements 102 arranged in different ways. For example, input/output elements 102 may form one or more columns of input/output elements that may be located anywhere on the programmable logic device (e.g., distributed evenly across the width of the PLD). If desired, input/output elements 102 may form one or more rows of input/output elements (e.g., distributed across the height of the PLD). Alternatively, input/output elements 102 may form islands of input/output elements that may be distributed over the surface of the PLD or clustered in selected areas.
[0031] The PLD may also include programmable interconnect circuitry in the form of vertical routing channels 140 (i.e., interconnects formed along a vertical axis of PLD 100) and horizontal routing channels 150 (i.e., interconnects formed along a horizontal axis of PLD 100), each routing channel including at least one track to route at least one wire. If desired, the interconnect circuitry may include double data rate interconnections and/or single data rate interconnections.
[0032] If desired, routing wires may be shorter than the entire length of the routing channel. A length L wire may span L functional blocks. For example, a length four wire may span four blocks. Length four wires in a horizontal routing channel may be referred to as H4 wires, whereas length four wires in a vertical routing channel may be referred to as V4 wires.
[0033] Different PLDs may have different functional blocks which connect to different numbers of routing channels. A three-sided routing architecture is depicted in
[0034] In a direct drive routing architecture, each wire is driven at a single logical point by a driver. The driver may be associated with a multiplexer which selects a signal to drive on the wire. In the case of channels with a fixed number of wires along their length, a driver may be placed at each starting point of a wire.
[0035] Note that other routing topologies, besides the topology of the interconnect circuitry depicted in
[0036] Furthermore, it should be understood that embodiments may be implemented in any integrated circuit. If desired, the functional blocks of such an integrated circuit may be arranged in more levels or layers in which multiple functional blocks are interconnected to form still larger blocks. Other device arrangements may use functional blocks that are not arranged in rows and columns.
[0037]
[0038] Floating-point numbers are commonplace for representing real numbers in scientific notation in computing systems and are designed to cover a large numeric range and diverse precision requirements. The IEEE 754 standard is commonly used for floating-point numbers. A floating-point number includes three different parts: (1) the sign of the floating-point number, (2) the mantissa, and (3) the exponent. Each of these parts may be represented by a binary number and, in the IEEE 754 format, have different bit sizes depending on the precision. For example, a single precision floating-point number requires 32 bits, which are distributed as follows: one sign bit (bit 32), eight exponent bits (bits [31:24]), and 23 mantissa bits (bits [23:1]). A double precision floating-point number requires 64 bits including one sign bit (bit 64), 11 exponent bits (bits [63:53]), and 52 mantissa bits (bits [52:1]).
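The field layout described above can be illustrated with a minimal Python sketch. The helper name fp32_fields is hypothetical and is not part of the described embodiments. Note that the sketch numbers bits from 0, whereas the text above numbers them from 1 (so bit 32 in the text is bit 31 in the code).

```python
import struct


def fp32_fields(x: float) -> tuple[int, int, int]:
    """Split x, encoded as IEEE 754 single precision, into its three fields."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, still biased
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits (no implied 1)
    return sign, exponent, mantissa
```

For example, fp32_fields(1.0) yields a sign of 0, a biased exponent of 127, and a stored mantissa of 0.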
[0039] The sign of a floating-point number according to standard IEEE 754 is represented using a single bit, where a 0 denotes a positive number and a 1 denotes a negative number. As IEEE 754 floating-point numbers are defined in this signed magnitude format, addition and subtraction operations are essentially the same, so only an adder may be referred to below to simplify the discussion.
[0040] The exponent of a floating-point number preferably is an unsigned binary number which, for the single precision format, ranges from 0 to 255. In order to represent a very small number, it is necessary to use negative exponents. Thus, the exponent preferably has a negative bias. For single precision floating-point numbers, the bias preferably is 127. For example, a value of 140 for the exponent actually represents (140−127)=13, and a value of 100 represents (100−127)=−27. For double precision numbers, the exponent bias preferably is 1023.
[0041] As discussed above, according to the IEEE 754 standard, the mantissa is a normalized number (i.e., it has no leading zeroes and represents the precision component of a floating point number). Because the mantissa is stored in binary format, the leading bit can either be a 0 or a 1, but for a normalized number it will always be a 1. Therefore, in a system where numbers are always normalized, the leading bit need not be stored and can be implied, effectively giving the mantissa one extra bit of precision. Thus, the single precision format effectively has 24 bits of precision (i.e., 23 mantissa bits plus one implied bit).
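As a hedged illustration of the exponent bias and the implied leading bit, the following Python sketch reconstructs the value of a normalized single precision number from its raw fields. The function name fp32_value is an assumption made for this example only, and special cases (zero, subnormal numbers, infinity, and NaN) are deliberately not handled.

```python
def fp32_value(sign: int, biased_exponent: int, stored_mantissa: int) -> float:
    """Value of a normalized single precision number from its raw fields."""
    # Restore the implied leading 1 to obtain the 24-bit significand.
    significand = (1 << 23) | stored_mantissa
    # Remove the bias of 127 to obtain the true exponent.
    unbiased = biased_exponent - 127
    # The significand is an integer scaled by 2**23, so compensate for that.
    magnitude = significand * 2.0 ** (unbiased - 23)
    return -magnitude if sign else magnitude
```

With a stored exponent of 140 and an all-zero stored mantissa, the sketch returns 2 raised to the power 13, which agrees with the bias arithmetic given above.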
[0042] The single precision floating point arithmetic as defined by the IEEE 754 standard may be referred to as FP32, since the single precision floating-point number requires 32 bits (see, e.g., the first row in table 300 of
[0043] Typically, conventional programmable logic devices only include floating-point adders with fixed precisions. Implementing variable precision floating-point arithmetic circuits may be prohibitively expensive since support of different precisions may require significant area overhead. It would therefore be desirable to implement multi-precision floating-point adder circuits without incurring significant area penalties.
[0044] In accordance with an embodiment, adders 200 may not only be configured to support FP32 and FP16, but may also be configured to support a wide range of intermediate sizes such as FP17, FP18, FP20, etc., without incurring large area penalties. Configured in this way, DSP 120 can support up to twice the functional density relative to FP32 operations. As shown in
[0045] The number of bits allocated to the exponent and mantissa portions as shown in table 300 is merely illustrative and does not serve to limit the present embodiments. If desired, the exponent for each of the various floating-point formats may be more or less than five bits, and the number of mantissa bits may be adjusted based on the exponent.
[0046]
[0047] To illustrate the operation of the near path, consider an example where A is equal to 1124*exp(15) and where B is equal to −1924*exp(14). In the example above, number A has a significand that is equal to 1124 and an exponent of base two. In order to match the exponents, B may be right-shifted by one position (i.e., to divide B by a factor of 2 since the exponent is base two, which yields 962). Since B is negative, the magnitude of B may then be subtracted from A, which then yields 162 (i.e., 1124 minus 962), with a corresponding exponent of 15. To normalize 162, the number of leading zeros may be determined and the value may then be left shifted by that number of bits. In this particular scenario, the result may be equal to 1296*exp(12) (i.e., 162 may be left-shifted by 3 bit positions, effectively multiplying 162 by a factor of 8 to yield 1296, while the exponent is reduced by 3).
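The near path example above can be reproduced with a short Python sketch. This is a behavioral model only, not a description of the circuit. The function name near_path_subtract, the 11-bit significand width (inferred from the magnitudes 1124 and 1924), and the restriction to an exponent difference of at most one are assumptions made for this illustration.

```python
def near_path_subtract(sig_a: int, exp_a: int, sig_b: int, exp_b: int,
                       width: int) -> tuple[int, int]:
    """Compute |A| - |B| for nearly equal exponents, then normalize.

    Assumes exp_a >= exp_b and that A has the larger aligned magnitude.
    """
    shift = exp_a - exp_b
    if not 0 <= shift <= 1:
        raise ValueError("near path assumed to handle a difference of 0 or 1")
    # Align B to the exponent of A, then subtract the magnitudes.
    diff = sig_a - (sig_b >> shift)
    if diff <= 0:
        raise ValueError("sketch assumes a positive difference")
    # Count leading zeros within the significand width and shift them out.
    leading_zeros = width - diff.bit_length()
    return diff << leading_zeros, exp_a - leading_zeros
```

Calling near_path_subtract(1124, 15, 1924, 14, 11) returns a significand of 1296 with an exponent of 12, matching the worked example.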
[0048] As shown in the example above, the near path may involve a subtraction operation (which can be performed using subtraction circuit 410 of
[0049] To illustrate the operation of the far path, consider an example where A is equal to 1524*exp(15) and where B is equal to −1424*exp(12). In order to match the exponents, B may be right-shifted by three bit positions (i.e., to divide B by 8 since the exponent is base two, which yields 178). Since B is negative, the magnitude of B may then be subtracted from A, which then yields 1346 (i.e., 1524 minus 178), with a corresponding exponent of 15. Since A is at least several orders of magnitude larger than B, the resulting number can be expressed in base two without a complicated count-zero/normalization process.
[0050] As shown in the example above, the far path may involve right shifting the far lesser number (which can be performed using alignment circuit 414) and an addition/subtraction operation (which can be performed using circuit 416). While the normalization operation at the output of circuit 416 might be trivial, implementation of alignment circuit 414 and arithmetic circuit 416 can be fairly complex.
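The far path example above may be sketched in Python as follows. The function name far_path_subtract is hypothetical, and the sketch simply truncates the bits shifted out of B (none are lost in this particular example); guard, round, and sticky handling are omitted.

```python
def far_path_subtract(sig_a: int, exp_a: int, sig_b: int,
                      exp_b: int) -> tuple[int, int]:
    """Compute |A| - |B| when the exponent of A is much larger than that of B."""
    shift = exp_a - exp_b
    if shift < 0:
        raise ValueError("sketch assumes A has the larger exponent")
    # Align the far lesser operand by right shifting it.
    aligned_b = sig_b >> shift
    # Subtract magnitudes; the exponent of the larger operand is kept.
    return sig_a - aligned_b, exp_a
```

Calling far_path_subtract(1524, 15, 1424, 12) returns 1346 with an exponent of 15. The result occupies the same number of bits as 1524, which is why no count-zero step is needed in this example.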
[0051] Still referring to
[0052]
[0053] As described above, floating-point adder 200 of
[0054] The shift value may be compared to the maximum allowed shift output using comparator 504. If the input shift value exceeds the maximum shift value, comparator 504 will output a low value, which will cause gate 502 to output a zeroed value (i.e., the far lesser output will be completely zeroed out). If the input shift value is equal to or less than the maximum shift value, comparator 504 will output a high value, which will allow gating circuit 502 to pass through the shifted value at the output of right shifter block 500. The example of
[0055]
[0056] In floating-point calculations, a sticky bit will also need to be computed. The sticky bit indicates whether a high bit was shifted to the right of the word width of the input number and is therefore lost. Blocks 602-1, 602-2, 602-3, and 602-4 may receive the shifted bits and may be used to calculate the sticky bit contribution for each level by ORing together any bits that would have been shifted to the right of the data path. All of these signals may be ORed together using logic OR gate 604 to help generate a sticky bit for the far path.
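The per-level sticky computation may be modeled behaviorally as in the Python sketch below. It assumes, purely for illustration, a logarithmic shifter with four stages whose shift amounts are 1, 2, 4, and 8; the actual stage sizes are not stated in the text above, and the function name is hypothetical.

```python
def right_shift_with_sticky(value: int, shift: int,
                            stages: int = 4) -> tuple[int, bool]:
    """Right shift through a chain of stages, collecting a sticky flag."""
    stage_sticky = []
    for stage in range(stages):
        amount = 1 << stage  # assumed stage shift amounts: 1, 2, 4, 8
        if (shift >> stage) & 1:
            # Bits about to fall off the right edge at this level.
            dropped = value & ((1 << amount) - 1)
            stage_sticky.append(dropped != 0)
            value >>= amount
        else:
            stage_sticky.append(False)
    # Equivalent of ORing all per-level contributions together.
    return value, any(stage_sticky)
```

For instance, shifting 0b101101 right by three positions returns 0b101 with the sticky flag set, because nonzero bits were shifted out.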
[0057] The size of right shifter block 500 may be set depending on the largest mantissa precision that needs to be supported by floating-point adder 200 (
[0058] To help support smaller mantissas, the sticky bit will need to include bits shifted to the right of the smaller mantissa. These bits can be taken from the last multiplexing stage such as stage 600-4 in the example of
[0059] This example in which two different mantissa widths are supported is merely illustrative and is not intended to limit the scope of the present embodiments. If desired, three or more different floating-point precisions can be supported in this way. An additional OR gate will combine bits to the left (more significant than those from gate 606) and can be selectively ORed with the components of the larger sticky bits. In other words, an additional OR gate that receives the fourth and fifth bit from the last stage may be cascaded with logic gate 606 and may be selectively combined using an additional logic AND gate to help support a mantissa width of 4 bits.
[0060] The methods described herein may require all input mantissas to be aligned at the MSB (e.g., all mantissas should be left justified). Operated in this way, the guard and round bits will then be implied to be to the immediate right of the LSB of the respective mantissas. Similarly, the sticky bit for all mantissa cases will be at the same position (e.g., to the right of the LSB of the largest number). In other words, numbers with smaller mantissas will still be left justified but there will be at least some don't care bits between their LSB and the actual sticky bit, which is right justified according to the size of the largest mantissa (e.g., the mantissas are left justified, whereas the sticky bit is right justified).
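A minimal Python sketch of the selective sticky combination is given below. All names are hypothetical: stage_sticky stands for the output of the OR gate that collects the per-level contributions, last_stage is the output word of the final multiplexing stage, extra_bits is the number of LSBs that lie below the smaller mantissa, and small_mode plays the role of the AND gate enable.

```python
def far_path_sticky(stage_sticky: bool, last_stage: int, extra_bits: int,
                    small_mode: bool) -> bool:
    """Combine the regular sticky bit with LSBs below a smaller mantissa."""
    # OR together the LSBs of the last stage that the smaller mantissa drops.
    low_or = (last_stage & ((1 << extra_bits) - 1)) != 0
    # The extra contribution is gated so it only counts in the smaller mode.
    return stage_sticky or (small_mode and low_or)
```

Supporting a third precision in this model amounts to calling the function with a larger extra_bits value, which mirrors cascading a further OR gate as described above.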
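The left justification convention can be sketched in Python as follows, assuming a 14-bit data path and a 10-bit smaller mantissa as example widths. Both helper names are hypothetical.

```python
def left_justify(mantissa: int, mantissa_width: int,
                 datapath_width: int) -> int:
    """Place a mantissa so that its MSB sits at the MSB of the data path."""
    if mantissa_width > datapath_width:
        raise ValueError("mantissa is wider than the data path")
    return mantissa << (datapath_width - mantissa_width)


def lsb_position(mantissa_width: int, datapath_width: int) -> int:
    """Bit index (0 = rightmost) of the mantissa LSB after justification."""
    return datapath_width - mantissa_width
```

In this model the MSB of every mantissa lands at the same bit index regardless of precision, while the LSB index (and therefore the implied guard and round positions to its right) moves with the mantissa width.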
[0061]
[0062] Circuit 416 receives a first input signal left[20:1] from the far greater path and a second input signal right[20:1] from the far lesser path. The far lesser input signal right[20:1] may be selectively complemented using bit-wise XOR gate 704. For example, each bit in signal right[20:1] may be selectively inverted using gate 704 during a true subtraction operation when the right input of XOR gate 704 is asserted. In accordance with at least some embodiments, the subtraction operation, which is a function of the operation and the sign of the input operands, is performed using a two's complement operation. To convert a number into its two's complement, the number may be inverted and incremented by 1. Whether or not signal right[20:1] has been altered by gate 704, the value at the output of gate 704 may be represented as signal right[20:1].
[0063] In the simplest scenario, the far greater input signal left[20:1] and the far lesser input signal right[20:1] would be added together via path 710 using summing circuit 700 to produce A+B or added together using summing circuit 702 with an additional +1 to produce A+B+1. The additional 1 input may be implemented as a high carry-in bit. In particular, the A+B+1 result of summing circuit 702 may be selected in the case of a subtraction operation or in the case of an addition operation that is rounded up. The 14-bit mantissa example described above may be capable of supporting FP20 addition/subtraction operations (see, e.g.,
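The selective inversion and the two's complement identity it relies on can be checked with a small Python sketch. The function name conditional_invert is an assumption for this example; the 20-bit width follows the signal widths named above.

```python
def conditional_invert(value: int, subtract: bool, width: int) -> int:
    """Bit-wise XOR of value with an all-ones or all-zeros control word."""
    mask = (1 << width) - 1
    return value ^ (mask if subtract else 0)


# Subtraction as "invert, add, and add one", truncated to the data path width.
WIDTH = 20
MASK = (1 << WIDTH) - 1
difference = (1524 + conditional_invert(178, True, WIDTH) + 1) & MASK
```

Here difference evaluates to 1346, the same value obtained by subtracting 178 from 1524 directly, which is why an incremented sum is useful for true subtraction.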
[0064] When supporting a smaller mantissa such as a 10-bit mantissa (see, e.g., FP16 in
[0065] As shown in this example, when a smaller mantissa is being supported, the difference in precision between the larger mantissa and the smaller mantissa may be ORed high using gate 706. Doing so has the effect of propagating the additional 1 of summer 702 through to the LSB of the smaller mantissa. If desired, this technique may be extended to support more than two different precisions. To support additional smaller mantissa sizes, additional logic OR gate(s) 706 may be added to further propagate the carry-in 1 bit further up summer 702.
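The carry propagation effect of forcing the low-order bits high can be illustrated with the following Python sketch. It assumes that four bits separate the two precisions (for example, 14-bit and 10-bit mantissas) and that the corresponding low-order bits of the left operand are zero in the smaller mode; both the function name and that second assumption are made for this illustration only.

```python
def incremented_sum(left: int, right: int, extra_bits: int,
                    small_mode: bool) -> int:
    """Model of the A+B+1 summer with optional forcing of low bits to 1."""
    # In the smaller mode, OR the unused low-order bits of one operand high.
    fill = (1 << extra_bits) - 1 if small_mode else 0
    # The constant +1 models the carry-in of the incremented summer.
    return left + (right | fill) + 1
```

Because the forced ones plus the carry-in overflow out of the low-order field, the upper bits of the result equal the sum of the two smaller mantissas plus one, which is the behavior described above.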
[0066] Although summing circuits 700 and 702 are shown as separate structures, other structures and methods such as flagged prefix adders or compound adders may be used (as examples). The exemplary arrangement as shown in
[0067] Summing circuit 702 may also include a multiplexer 804 having a first input that is coupled to the carry-out port 808 of summing section 802, a second input that receives the sticky bit from the right LSB path 806, a control input that receives control signal Sc, and an output that feeds the carry-in port 810 of summing section 800. Configured in this way, the carry-out from summing section 802 is routed to the carry-in of summing section 800 when supporting the larger mantissa width of 14 bits (e.g., in a first mode when signal Sc is driven low), whereas the sticky bit is fed to the carry-in of summing section 800 via path 806 when supporting the smaller mantissa width of 10 bits (e.g., in a second mode when signal Sc is driven high). The sticky bit should always be the LSB of signal right. Directly feeding the sticky bit to summing section 800 in this way provides the same functionality as rippling through a 1 from the LSB to the desired bit position.
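A behavioral Python sketch of the segmented summer is shown below. It assumes a four-bit lower section (the difference between the 14-bit and 10-bit mantissas) and assumes that the lower section receives a carry-in of 1; neither detail is stated in the paragraph above, and the function name is hypothetical.

```python
def segmented_incremented_sum(left: int, right: int, sticky: int, sc: int,
                              low_bits: int = 4) -> tuple[int, int]:
    """Return (upper_sum, lower_sum) of a two-section incremented summer."""
    low_mask = (1 << low_bits) - 1
    # Lower section (802 in the text), with an assumed carry-in of 1.
    low_sum = (left & low_mask) + (right & low_mask) + 1
    carry_out_low = low_sum >> low_bits
    # Multiplexer (804 in the text): Sc low selects the rippled carry,
    # Sc high selects the sticky bit.
    carry_in_high = sticky if sc else carry_out_low
    # Upper section (800 in the text).
    high_sum = (left >> low_bits) + (right >> low_bits) + carry_in_high
    return high_sum, low_sum & low_mask
```

With sc set to 0, concatenating the two returned fields reproduces left + right + 1 across the full width. With sc set to 1, the upper section is incremented only when the sticky bit is 1, independent of the lower section.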
[0068]
[0069] Still referring to
[0070] As described previously, the mantissas are left justified. The three MSB bits of the mantissa, which are always in the same location since the mantissas are left justified, will have to be tested to check if the range is greater than or equal to two, less than two, greater than or equal to one, or less than one. Thus, the only logic that is required to process the MSBs is a two-bit N:1 multiplexer (where N is the total number of unique mantissa precisions) and M logic OR gates (where M is equal to the difference in precision between the largest and smallest mantissas). This method of aligning the mantissas on the MSB means that the near path is identical in all cases, so the near path does not require any muxing to position the mantissa before, during, or after the near path calculation.
[0071]
[0072] There is also a special case for subtraction rounding, which only affects the LSB of the result.
[0073]
[0074] The embodiments of
[0075] The embodiments thus far have been described with respect to integrated circuits. The methods and apparatuses described herein may be incorporated into any suitable circuit. For example, they may be incorporated into numerous types of devices such as programmable logic devices, application specific standard products (ASSPs), and application specific integrated circuits (ASICs). Examples of programmable logic devices include programmable array logic (PALs), programmable logic arrays (PLAs), field programmable logic arrays (FPLAs), electrically programmable logic devices (EPLDs), electrically erasable programmable logic devices (EEPLDs), logic cell arrays (LCAs), complex programmable logic devices (CPLDs), and field programmable gate arrays (FPGAs), just to name a few.
[0076] The programmable logic device described in one or more embodiments herein may be part of a data processing system that includes one or more of the following components: a processor; memory; IO circuitry; and peripheral devices. The data processing system can be used in a wide variety of applications, such as computer networking, data networking, instrumentation, video processing, digital signal processing, or any other suitable application where the advantage of using programmable or re-programmable logic is desirable. The programmable logic device can be used to perform a variety of different logic functions. For example, the programmable logic device can be configured as a processor or controller that works in cooperation with a system processor. The programmable logic device may also be used as an arbiter for arbitrating access to a shared resource in the data processing system. In yet another example, the programmable logic device can be configured as an interface between a processor and one of the other components in the system. In one embodiment, the programmable logic device may be one of the family of devices owned by ALTERA/INTEL Corporation.
[0077] The foregoing is merely illustrative of the principles of this invention and various modifications can be made by those skilled in the art. The foregoing embodiments may be implemented individually or in any combination.