DYNAMIC BLOCK SIZE CARRY-SKIP ADDER CONSTRUCTION ON FPGAS BY COMBINING RIPPLE CARRY ADDERS WITH ROUTABLE PROPAGATE/GENERATE SIGNALS
20220244912 · 2022-08-04
Inventors
Cpc classification
G06F7/506
PHYSICS
International classification
Abstract
An adder is implemented in a field programmable gate array (FPGA). The adder has a first ripple carry adder block, for least significant bits of the adder. The adder has a plurality of carry skip adder blocks of differing block sizes. Each block size relates to bit-width of input to a block. The carry skip adder blocks of differing block sizes are for a plurality of bits of the adder. The adder has a second ripple carry adder block, for most significant bits of the adder.
Claims
1. An adder implemented in a field programmable gate array (FPGA), comprising: a first ripple carry adder block, for least significant bits of the adder; a plurality of carry skip adder blocks of differing block sizes, each block size relating to bit-width of input to a block, for a plurality of bits of the adder; and a second ripple carry adder block, for most significant bits of the adder.
2. The adder implemented in the FPGA of claim 1, wherein: each of the plurality of carry skip adder blocks coupled to receive as inputs routed propagate carry and generate carry signals from full adder logic blocks in a skip adder structure.
3. The adder implemented in the FPGA of claim 1, wherein: critical path delay for a carry of the adder is lower in comparison to critical path delay for a carry of a ripple carry adder that could be implemented in the FPGA as having a same overall input bit-width as the adder.
4. The adder implemented in the FPGA of claim 1, wherein: area of the adder, in the FPGA, is lower in comparison to an area of a further carry skip adder that could be implemented in the FPGA composed of carry skip adder blocks having a fixed block size equal to a largest of the differing block sizes of the plurality of carry skip adder blocks of the adder.
5. The adder implemented in the FPGA of claim 1, wherein: the differing block sizes increase from a first carry skip adder block, at a first end of the plurality of carry skip adder blocks, towards at least one carry skip adder block in a middle of the plurality of carry skip adder blocks and decrease from the at least one carry skip adder block in the middle of the plurality of carry skip adder blocks towards a second carry skip adder block, at a second end of the plurality of carry skip adder blocks.
6. The adder implemented in the FPGA of claim 1, wherein: at least one of the plurality of carry skip adder blocks includes a wide AND gate logic for fast block propagate carry generation.
7. The adder implemented in the FPGA of claim 1, wherein the adder has two or more features from a feature set consisting of: a first feature comprising an adder structure that uses routed propagate and generate signals from adder logic to create carry skip adder structures; a second feature comprising variable carry skip block sizes to hide routing delay associated with generating group propagate and generate signals; a third feature comprising customized block sizes in the adder structure to trade-off adder area for performance; and a fourth feature comprising a ripple carry structure to generate a wide AND for the function of fast block propagate generation.
8. A computer aided design (CAD) method, practiced by a CAD system, the method comprising: receiving instruction to implement an adder in a field programmable gate array (FPGA); and generating the adder in a format for programming the FPGA, wherein the adder comprises: a first ripple carry adder block, for least significant bits of the adder; a plurality of carry skip adder blocks of differing block sizes, for a plurality of bits of the adder, each block size relating to bit-width of input to a block; and a second ripple carry adder block, for most significant bits of the adder.
9. The CAD method of claim 8, wherein: each of the plurality of carry skip adder blocks coupled to receive as inputs routed propagate carry and generate carry signals from full adder logic blocks in a skip adder structure.
10. The CAD method of claim 8, wherein: critical path delay for a carry of the adder is lower in comparison to critical path delay for a carry of a ripple carry adder that could be implemented in the FPGA as having a same overall input bit-width as the adder.
11. The CAD method of claim 8, wherein: area of the adder, in the FPGA, is lower in comparison to an area of a further carry skip adder that could be implemented in the FPGA composed of carry skip adder blocks having a fixed block size equal to a largest of the differing block sizes of the plurality of carry skip adder blocks of the adder.
12. The CAD method of claim 8, wherein: the differing block sizes increase from a first carry skip adder block, at a first end of the plurality of carry skip adder blocks towards at least one carry skip adder block in a middle of the plurality of carry skip adder blocks and decrease from the at least one carry skip adder block in the middle of the plurality of carry skip adder blocks towards a second carry skip adder block, at a second end of the plurality of carry skip adder blocks.
13. The CAD method of claim 8, wherein: at least one of the plurality of carry skip adder blocks includes a wide AND gate logic for fast block propagate carry generation.
14. The CAD method of claim 8, wherein the adder has two or more features from a feature set consisting of: a first feature comprising an adder structure that uses routed propagate and generate signals from adder logic to create carry skip adder structures; a second feature comprising variable carry skip block sizes to hide routing delay associated with generating group propagate and generate signals; a third feature comprising customized block sizes in the adder structure to trade-off adder area for performance; and a fourth feature comprising a ripple carry structure to generate a wide AND for the function of fast block propagate generation.
15. A tangible, non-transitory, computer-readable media having instructions thereupon which, when executed by a processor, cause the processor to perform a method comprising: receiving instruction to implement an adder in a field programmable gate array (FPGA); and programming the FPGA to implement the adder, wherein the adder comprises: a first ripple carry adder block, for least significant bits of the adder; a plurality of carry skip adder blocks of differing block sizes, each block size relating to bit-width of input to a block, for a plurality of bits of the adder; and a second ripple carry adder block, for most significant bits of the adder.
16. The computer-readable media of claim 15, wherein: each of the plurality of carry skip adder blocks coupled to receive as input routed propagate carry and generate carry signals from full adder logic blocks in skip adder structure.
17. The computer-readable media of claim 15, wherein: critical path delay for a carry of the adder is lower in comparison to critical path delay for a carry of a ripple carry adder that could be implemented in the FPGA as having a same overall input bit-width as the adder; and area of the adder, in the FPGA, is lower in comparison to an area of a further carry skip adder that could be implemented in the FPGA composed of carry skip adder blocks having a fixed block size equal to a largest of the differing block sizes of the plurality of carry skip adder blocks of the adder.
18. The computer-readable media of claim 15, wherein: the differing block sizes increase from a first carry skip adder block, at a first end of the plurality of carry skip adder blocks towards at least one carry skip adder block in a middle of the plurality of carry skip adder blocks and decrease from the at least one carry skip adder block in the middle of the plurality of carry skip adder blocks towards a second carry skip adder block, at a second end of the plurality of carry skip adder blocks.
19. The computer-readable media of claim 15, wherein: at least one of the plurality of carry skip adder blocks includes a wide AND gate logic for fast block propagate carry generation.
20. The computer-readable media of claim 15, wherein the adder has two or more features from a feature set consisting of: a first feature comprising an adder structure that uses routed propagate and generate signals from adder logic to create carry skip adder structures; a second feature comprising variable carry skip block sizes to hide routing delay associated with generating group propagate and generate signals; a third feature comprising customized block sizes in the adder structure to trade-off adder area for performance; and a fourth feature comprising a ripple carry structure to generate a wide AND for the function of fast block propagate generation.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0019] Embodiments described herein will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
DETAILED DESCRIPTION
[0028] In the following description, numerous details are set forth to provide a more thorough explanation of the present embodiments. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present embodiments.
[0029] Techniques are described herein for creating a class of fast carry-skip adder structures on FPGAs with low area overhead versus plain ripple carry adders (RCA) using a modified version of the standard hardened RCA that drives the routing fabric with the propagate and generate signals.
[0030]
[0031] In some embodiments, the full adder, implemented using a 4-LUT 104 in
[0032]
[0033]
[0034]
[0035]
[0036] With reference to the carry skip adder embodiments in
[0037]
[0038]
[0039] In the embodiment shown in
[0040] In one embodiment, in terms of block size choice, the adder structure can be chosen by the user by specifying whether the CAD tool should focus more on area or performance (which is a global option that affects the whole design), with a parameterized adder module that the user can instantiate in their design (e.g., the user can specify parameters that control the structure of the adder), using physical synthesis techniques to start with the area optimized adder, then modify the block sizes to target speed only for adders on the critical path.
[0041] Thus, as described above, the carry skip adder structure(s) are implemented efficiently on an FPGA using a mix of hardened resources and soft logic/routing. Included in the range of embodiments are at least the following features, and the capability of a CAD system to generate adder implementations that have various combinations of these features. [0042] An adder structure that uses routed propagate and generate signals from adder logic to create carry skip adder structures. [0043] An adder structure that has variable carry skip block sizes to hide routing delay associated with generating group propagate and generate signals. [0044] An adder structure that has customized block sizes in the adder structure to trade-off adder area for performance. [0045] An adder structure that includes a ripple carry structure to generate a wide AND for the purpose of fast block propagate generation.
[0046] An adder structure having two or more of the preceding features.
[0047] Further features that various embodiments have in various combinations are as follows. [0048] Critical path delay for carry of the adder is lower in comparison to critical path delay for carry of a ripple carry adder that could be implemented in the FPGA as having same overall input bit-width as the adder. [0049] Area of the adder, in the FPGA, is lower in comparison to area of a carry skip adder that could be implemented in the FPGA as composed of carry skip adder blocks having a fixed block size equal to a largest of the differing block sizes.
[0050]
[0051] In various embodiments, synthesis creates the entire adder as one block, in a hierarchical structure that has blocks within blocks. For example, if instructed to implement a 32 bit adder, the CAD tool 804 creates all the block sizes that are used to create a carry skip version of the adder. In some embodiments, the CAD tool 804 explores trade-offs, for example the bigger the block size, the longer it takes to create group, generate and propagate signals. Returning to the example of a 32 bit adder, the CAD tool 804 could split the design into four groups of eight or eight groups of four, and analyze critical path, then select which of the two possibilities is optimal for timing of carry. The CAD tool 804 could determine timing for a four bit ripple adder, and compare timing for a four bit carry skip adder. Such comparisons can be performed for various stages of an adder, with various combinations of block sizes.
[0052] It has been found that, as the size and width of the adder increases, the time it takes to compute for the carry scales sub-linearly. And, comparing critical path for a ripple carry adder, the time it takes to compute for the carry scales in a linear relationship with the width of the adder. Accordingly, it has been found that, below a certain bit width, a ripple carry adder is fastest. Such a bit width could be used as a threshold value, in the CAD tool 804. Instructed to implement an adder of a bit width below or equal to the threshold value, the CAD tool 804 could implement a ripple carry adder. Greater than that, the CAD tool 804 can implement an adder that begins and ends with a ripple carry adder, i.e., one ripple carry adder for the lower bits, and another ripple carry adder for the upper bits, and has a carry skip adder, or multiple carry skip adder blocks of various block sizes, for the middle bits.
[0053] At the beginning, the CAD tool 804 can start with a low block size, for example a block size of two. Then there is an additional threshold where it makes sense, analytically, to increase the block size for the next block(s) and still be below the delay to keep up with the ripple through the critical path of the carry. This is what is meant by hiding the general routing delay, in various embodiments. Delay for the carry generate and carry propagate signals for a given carry skip adder block are compared to delay along the critical path of the carry for the assembled adder, then acceptable block size for that carry skip adder block (and sub-critical delay for block carry generate and block carry propagate signals) is determined based on this comparison.
[0054] At some point, for example about midway through the adder, it is possible that adding a large block would create a new critical path to generate sum bits. Adding a smaller block, which takes less delay to generate the later or last sum bits of the adder avoids this new critical path. The CAD tool 804 could proceed in this direction, generating smaller block sizes towards the more significant bits of the adder. Then the final bits for the adder could be implemented with another ripple carry adder, which would be faster than another carry skip adder block. Past the middle of the adder, the CAD tool 804 could create smaller block sizes and keep reducing block size because there is less delay that can be masked by the end of the ripple.
[0055] Some embodiments of the CAD tool 804 optimize block sizes of an adder implemented with variable block sizes by balancing the delay through ripple in the carry chain and delay in the block carry generate and block carry propagate signals. Using larger block sizes means fewer stages of ripple in the critical path of the carry, which speeds up the carry propagation but makes the sum generation slower.
[0056] One embodiment of the CAD tool 804 looks at each bit of the adder and determines how to compute the sum for the next group of bits, e.g., will that be one bit at a time, two bits at a time, three or four bits at a time, etc. Two factors go into the decision, one is to have enough delay to generate the group generate signal earlier than the delay that has been accumulated thus far in the critical path for the carry. The other factor is generation of the sum bits taking into account ripple through a link which is through general-purpose routing. It is acceptable to make some signals slower because they are not in the critical path, and that dictates how big a block size may be. There is an outward constraint and an input constraint. Towards the more significant bit end of an adder, the sum bits might be slowed down and become the critical path. From an algorithm point of view, one determination is whether it is creating a new critical path by generating a block, and if so, then try a smaller block.
[0057] Some portions of the detailed descriptions above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[0058] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0059] The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
[0060] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
[0061] A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
[0062] Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.