Patent classifications
G06F7/505
ADDER CIRCUITRY FOR VERY LARGE INTEGERS
An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.
Efficient FPGA multipliers
In some example embodiments a logical block comprising twelve inputs and two six-input lookup tables (LUTs) is provided, wherein four of the twelve inputs are provided as inputs to both of the six-input lookup tables. This configuration supports efficient field programmable gate array (FPGA) implementation of multipliers. Each six-input LUT comprises two five-input lookup tables (LUT5s) that are used to form Booth encoding multiplier building blocks. The five inputs to each LUT5 are two bits from a multiplier and three Booth-encoded bits from a multiplicand. By assembling building blocks, multipliers of arbitrary size may be formed.
Sum address memory decoded dual-read select register file
Aspects of the invention include decoding a base address and an offset to generate a first potential memory address and a second potential memory address. A first cell data associated with the first potential memory address of a first partitioned array and a second cell data associated with a second partitioned array are evaluated. Carry-out bit information is received from a summing operation of the base address and the offset, the operating being performed in parallel to the decoding. The carry-out bit information is used to select either the first cell data or the second cell data.
Sum address memory decoded dual-read select register file
Aspects of the invention include decoding a base address and an offset to generate a first potential memory address and a second potential memory address. A first cell data associated with the first potential memory address of a first partitioned array and a second cell data associated with a second partitioned array are evaluated. Carry-out bit information is received from a summing operation of the base address and the offset, the operating being performed in parallel to the decoding. The carry-out bit information is used to select either the first cell data or the second cell data.
Binary parallel adder and multiplier
An arithmetic logic unit (ALU) including a binary, parallel adder and multiplier to perform arithmetic operations is described. The ALU includes an adder circuit coupled to a multiplexer to receive input operands that are directed to either an addition operation or a multiplication operation. During the multiplication operation, the ALU is configured to determine partial product operands based on first and second operands and provide the partial product operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide an output having a value equal to a product of the first operand second operands. During an addition operation, the ALU is configured to provide the first and second operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide the output having a value equal to a sum of the first and second operands.
Binary parallel adder and multiplier
An arithmetic logic unit (ALU) including a binary, parallel adder and multiplier to perform arithmetic operations is described. The ALU includes an adder circuit coupled to a multiplexer to receive input operands that are directed to either an addition operation or a multiplication operation. During the multiplication operation, the ALU is configured to determine partial product operands based on first and second operands and provide the partial product operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide an output having a value equal to a product of the first operand second operands. During an addition operation, the ALU is configured to provide the first and second operands to the adder circuit via the multiplexer, and the adder circuit is configured to provide the output having a value equal to a sum of the first and second operands.
Adder circuitry for very large integers
An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.
Adder circuitry for very large integers
An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.
FAST BINARY COUNTERS BASED ON SYMMETRIC STACKING AND METHODS FOR SAME
In this paper, binary stackers and counters are presented. In an embodiment, a counter uses 3-bit stacking circuits which group the T bits together, followed by a symmetric method to combine pairs of 3-bit stacks into 6-bit stacks. The bit stacks are then converted to binary counts, producing 6:3 and 7:3 Counter circuits with no XOR gates on the critical path. This avoidance of XOR gates results in faster designs with efficient power and area utilization. In VLSI simulations, the presently-disclosed counters were 30% faster and at consumed at least 20% less power than existing parallel counters. Additionally, using the presently-disclosed counter in existing Counter Based Wallace tree multiplier architectures reduces latency and improves efficiency in terms of power-delay product for 64-bit and 128-bit multipliers.
FAST BINARY COUNTERS BASED ON SYMMETRIC STACKING AND METHODS FOR SAME
In this paper, binary stackers and counters are presented. In an embodiment, a counter uses 3-bit stacking circuits which group the T bits together, followed by a symmetric method to combine pairs of 3-bit stacks into 6-bit stacks. The bit stacks are then converted to binary counts, producing 6:3 and 7:3 Counter circuits with no XOR gates on the critical path. This avoidance of XOR gates results in faster designs with efficient power and area utilization. In VLSI simulations, the presently-disclosed counters were 30% faster and at consumed at least 20% less power than existing parallel counters. Additionally, using the presently-disclosed counter in existing Counter Based Wallace tree multiplier architectures reduces latency and improves efficiency in terms of power-delay product for 64-bit and 128-bit multipliers.