8x8 binary digital multiplier
10025557 ยท 2018-07-17
Assignee
Inventors
Cpc classification
G06F7/53
PHYSICS
H03M7/30
ELECTRICITY
International classification
Abstract
An 88 binary digital multiplier reduces the height of partial product columns to be no more than 7 bits high. The six 7-bit high middle columns are each input to a (7:3) counter. An ascending triangle compressor operates on the lesser significant bit columns. A descending triangle compressor operates on the greater significant bit columns. The counter and compressor outputs are combined for a final stage of compression, followed by partial product addition.
Claims
1. An ascending triangle compressor for compressing 14 terms of the five rightmost columns of partial products of a binary digital multiplier to produce single sum terms for each of five columns and two sum terms for a sixth column, the ascending triangle compressor comprising: a first half adder having inputs for both of the bits of the second least significant column and an output for the second least significant sum term; a (3+:2) compressor having inputs for each of the three bits of the third least significant column and an output for the third least significant sum term; a first (4:2) compressor having inputs for each of the four bits of the fourth least significant sum term; a second (4:2) compressor having inputs for four of the five bits of the fifth least significant column and an output for the first of the two sixth least significant sum terms; a second half adder having an input from the (3+:2) compressor and an input from the first (4:2) compressor, and an output for the fourth least significant sum term; and a (3:2) compressor having an input from the first (4:2) compressor, an input from the second (4:2) compressor, and output for the fifth least significant sum term, and an output for the second of the a sixth least significant sum terms.
2. A descending triangle compressor for compressing the ten terms of the four leftmost columns of partial products of a binary digital multiplier to produce single sum terms for each of five columns, the descending triangle compressor comprising: a (4:2) compressor having inputs for each of the four bits of the least significant column and an output for the least significant sum term; a first full adder having inputs for each of the three bits of the second least significant column; a half adder having inputs for both of the bits of the third least significant column; a second full adder having inputs for each of two outputs of the (4:2) compressor, an input for an output of the first full adder, and an output for the second least significant sum term; a third full adder having in an input for an output of the second full adder, an input for an output of the first full adder, an input for an output of the half adder, and an output for the third least significant sum term; and a fourth full adder having an input for an output of the third full adder, an input for an output of the half adder, and an input for the bit of the fourth least significant column an output for fourth least significant sum term, an output for the fifth least significant sum term.
3. A (7:3) counter comprising: a first full adder for adding a first through third input term, the full adder having a carry in input for the first term and a data input for each of the second and third term, a first output and a second output; a (4:2) compressor for adding a fourth through seventh input term and producing a least significant count bit, the (4:2) compressor having a data input for each of the four input terms, a carry in input, a first output for the least significant count bit, a second output, and a third output; and a second full adder for producing the two most significant count bits, the second full adder having a first input, a second input, and a third input, and a data output for the middle count bit, and a carry out output for the most significant count bit, wherein: the first output of the first output of the first full adder is coupled to the carry in input of the (4:2) compressor; the second output of the first full adder is coupled to the second input of the second full adder, the second output of the (4:2) compressor is coupled to the first input of the second full adder, and the third output of the (4:2) compressor is coupled to the third input of the second full adder.
4. The (7:3) counter of claim 3 wherein: the first output of the first full adder is a data output and the second output of the first full adder is a carry out; the second output of the (4:2) compressor is a data output and the third output of the (4:2) compressor is a carry out; and the first input of the second full adder is a carry in, the second input of the second full adder is a data input, and the third input of the second full adder is a data input.
5. An 88 binary digital multiplier comprising: an AND gate with inputs r0c7 and r1c6 and an output in partial product column 9; a NAND gate with inputs r1c7 and r0c6 and an output in partial product column 8; an XOR gate with inputs r0c7 and r1c6 and an output in partial product column 7; an AND gate with inputs that are two different terms selected from the set consisting of r0c4, r1c3, r2c2, and r3c1 and an output in partial product column 5; and an XOR gate with inputs that are the two chosen terms and an output in partial product column 4.
6. The 88 binary digital multiplier of claim 5 wherein the two chosen terms are r0c4 and r1c3.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
DETAILED DESCRIPTION
(14) The invention concerns the wire delay cost of multiplier logic. Furthermore, it concerns the logic-area density cost of cross-column wires. Cross-column wires are ones that cross between columns of multiplier terms. The invention concerns units comprising half adders, full adders, counters, compressors, and a carry propagate adder. The invention is an 88 binary digital multiplier that has fewer and shorter wires than a Booth or modified Booth multiplier. Like most 88 binary digital multipliers, it has a first stage of bit-wise AND gates of the multiplicand and successively 1-bit left-shifted copies of the multiplier. This creates a parallelogram of eight rows and 15 columns of partial product bits to be summed.
(15) Below, various aspects of the invention are disclosed and discussed. Each, alone, is novel, as is their combination. It will be appreciated by those skilled in the art that many variations can be made, such as by switching the order of certain inputs to some units or by switching the order of the staging of certain units or by using fully or partially functional equivalents of certain units.
Reducing the Height of Column 7 from 8 Rows to 7
(16) The invention comprises a particularly fast, efficient (7:3) compressor. Using it requires that the height of multiplier column 7 be reduced from height 8 to height 7. Column 7 is the only column with height 8.
(17)
(18) According to the invention, add r0c7 to r1c6 in Column 7:
(19) Column 7 sum=(r0c7) XOR (r1c6)
(20) Carry into column 8=(r0c7) AND (r1c6)=r0r1c6c7
(21) Add the Carry=r0r1c6c7 to (r1c7) in Column 8:
(22) Sum=(r1c7) XOR r0r1c6c7=r1c7 AND (1 XOR r0c6)=(r1c7) AND NOT (r0c6)
(23) Carry=(r0r1c6c7)=(r1c7) AND (r0c6)
(24) As shown in
(25) Note: the 4 complex terms have 2 gate delays (2d). All other terms have 1 AND2 gate delay (1d).
(26) Note: Column 4 and Column 5 both have (r0c4) and (r1c2) terms, so each could be either a 2nd gate or else a wire, if the output transistors are sized for a fan-out of two and if wiring is available.
(27) Note: Column 6 has (r0 c6) and Column 7 has NOT (r0c6) terms, so one could be either a 2nd gate or else a wire, if the output transistors are sized for a fan-out of two and if wiring is available.
(28) Note: Columns 8 and 9 both have (r0c7) and (r1c5) terms, so each could be either a 2nd gate or a wire, if the output transistors are sized for a fan-out of two and if wiring is available.
(29) Note: The Column 9 complex term, (r0c7) AND (r1c5), could have been computed instead as (r1c6) AND (r0c6), so the Column 9 terms (r1c6) and (r0c6) would be the same as in Column 7. However, this would mean two longer wires crossing two columns, not just one, as shown here.
(30) Note: Columns 5, 6, 7, 8, 9 all have height 7.
(31) Note: In the embodiment of
Ascending Triangle Compressor
(32) According to an aspect of the invention, an ascending triangle compressor is used to compress 14 terms of the five rightmost columns of the partial product sums.
(33) The resultant column height of output terms for columns 5 to 0 is {2,1,1,1,1,1}. Note that this allows the final stage carry propagate adder to be shortened since columns 0 through 4, having a height of just one term, do not have to participate in the final carry propagate adder.
(34)
(35) The second rank comprises half adder 410 and (3:2) Compressor 412. A zero value is input to the second rank instance of the (3:2) compressor. This gives the {2,1,1,1,1,1} result of outputs S5B and S5A, for use in the next column, and the final results S4, S3, S2, S1, and S0. The effective gate delays of output S0, S1, S2, S3, S4, S5B, and S5A are 0d, 1d, 3d, 4d, 6d, 6d, and 3d respectively.
(36) Note that for column 2 (inputs r0c2, r1c1, and r2c0) a zero can be added to the column without changing the results so that this column can be considered to be (r0c2, r1c1, r2c0, zero). This column can be handled by a (4:2) compressor, where one input term is 0. This is referred to as a (3+:2) compressor.
(37) Ascending triangle compressor 300 produces 7 weighted outputs, a carry out to the next column summation logic. The longest output delay is to the final multiplication results output at S4 and compressed partial sum S5B, each with 6 effective gate delays.
Descending Triangle Compressor
(38) According to an aspect of the invention, a descending triangle compressor is used to compress 10 terms of the four leftmost columns of the partial product sums.
(39) This gives resultant terms S0, S1, S2, S3, and S4 with column of {1,1,1,1,1}. Note that this allows the final stage carry propagate adder to be further shortened since columns 15 through 11, having a height of just one term, do not have to participate in the final carry propagate adder, which can be replaced by an incrementer. The effective gate delays of outputs S0, S1, S2, S3, and S4 are 3d, 4d, 5d, 6d, and 6d respectively.
(40)
(41) Descending triangle compressor 600 produces 5 weighted outputs. The longest output delay is to the final multiplication results output at S3 and S4, each with 6 effective gate delays.
(7:3) Counter
(42) According to an aspect of the invention, (7:3) counters are used to compress partial product bits. Each compresses 7 terms.
(43)
(44)
(45) Consider a column of 7 input terms driving inputs A, B, C, D, E, F, and G, where the terms have effective input delays of 2d, 1d, 1d, 1d, 1d, 1d, and 1d respectively. In the embodiment of
(46) (7:3) counter 1000 uses two full adders (3 gates each), one (4:2) Compressor (6 gates). In combination with the 6 AND2 gates for stage one of the 6 multiplier input terms, a single-column (7:3) counter for the first and second stage can be implemented as an 18 gate macro cell with effective output gate delays of 4d, 4d, and 3d. This 18 gate (7:3) compressor has 13 inputs (one complex input and 6 X and 6 Y multiplier inputs) but only 3 outputswhich means only 3 final output wires to drive. All other internal gates drive either one or two following gates. The circuitry of (7:3) counter 1000 lends itself to an efficient implementation as a macro cell with a hand-optimized layout.
(47) Combining Ascending Triangle Compressor, (7:3) Counters, and Descending Triangle Compressor
(48) According to some embodiments of the invention, four sequential stages are used to determine the final multiplier result. The first stage is one of ANDing the multiplier input with each of eight sequentially bit shifted copies of the multiplicand in order to create a parallelogram of partial product bits.
(49) The second stage comprises:
(50) one ascending triangle compressor with inputs from columns 1, 2, 3, 4, and 5.
(51) five (7:3) counters, each with inputs from one of 7-high columns 5, 6, 7, 8, 9, 10, and 11; and
(52) one descending triangle compressor with inputs from columns 12, 13, 14, and 15;
(53)
(54) Column 5 requires adding terms from both of the least significant output bit (S0) of the least significant (7:3) counter and the two most significant (S5) terms of the ascending triangle compressor. In order to be able to instantiate multiple identical 18-gate macros for the (7:3) compressors of columns 5 through 10, a full adder is used on the two ascending triangle compressor S5 outputs and the column 5 (7:3) counter S0 output to produce a column sum for column 5 and carry out to column 6 for the next stage.
(55)
(56) In the third stage five full adders are used on columns 6 to 10 and a half adder on column 11. The result terms are shown in
(57) In the fourth stage a carry propagate adder is used on columns 7 to 12, with resulting carries through column 15. This reduces two rows to one resulting product row with +7 gate delays. This yields a maximum gate delay of 14d for columns 7 to 15.
(58) Some embodiments add latches between stages to create a pipelined multiplier.
Interpretation of Embodiments
(59) Embodiments of the invention described herein are merely exemplary, and should not be construed as limiting of the scope or spirit of the invention as it could be appreciated by those of ordinary skill in the art. The disclosed invention is effectively made or used in any embodiment that comprises any novel aspect described herein. All statements herein reciting principles, aspects, and embodiments of the invention are intended to encompass both structural and functional equivalents thereof. It is intended that such equivalents include both currently known equivalents and equivalents developed in the future. Since all two-input elemental logic gates satisfy the commutative property, claims listing terms combined by two-input logic gates in either order are equivalent. Many equivalent transformations of logic functions are known to persons having ordinary skill in the art. All such equivalents should be construed as equivalents of the logic functions claimed.