G06F2207/4812

Adder circuitry for very large integers
11662979 · 2023-05-30 · ·

An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.

Data Processing Device Having A Logic Circuit for Calculating a Modified Cross Sum
20220236948 · 2022-07-28 ·

A logic circuit configured to calculate a quotient Q based on a modified cross-sum of an input word CP, a digital circuit having a first input for the input word CP that is a bit-wise inverted value of a number N of M-bit digits having a radix 2.sup.M from a least significant digit to a most significant digit, the circuit configured to calculate a quotient Q, M and N being positive integer numbers larger than one, wherein the digital circuit has a second input RIN that is configured to be set to zero, or to receive a remainder value from another logic circuit, and wherein the digital circuit provides for an output word Q having N digits, each digit of radix 2.sup.M, the output word Q being a raw quotient of the bit-wise inverted value of the input word CP.

Prefix network-directed addition

The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision is restructured so that a set of sub-adders performs the arithmetic on a respective segment of the operands. More specifically, the adder is restructured, and a decoder determines a generate signal and a propagate signal for each of the sub-adders and routes the generate signal and the propagate signal to a prefix network. The prefix network determines respective carry bit(s), which carries into and/or select a sum at a subsequent sub-adder.

High radix subset code multiplier architecture
11010134 · 2021-05-18 · ·

Systems, methods, and devices for enhancing performance/efficiency of soft multiplier implementations are provided. More specifically, a method to implement soft multipliers with a high radix subset code architecture is provided. The techniques provided herein result in smaller multipliers that consume less area, improve packing, consume less power, and improve routing options on an integrated circuit.

ADDER CIRCUITRY FOR VERY LARGE INTEGERS
20210075425 · 2021-03-11 · ·

An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.

Software-driven design optimization for fixed-point multiply-accumulate circuitry

An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.

Adder circuitry for very large integers
10873332 · 2020-12-22 · ·

An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, computes the sum for each segment, and computes the carry for each segment independently from the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can then be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the area and performance latency as an adder network that uses infinite speed ripple carry adders.

Data format suitable for fast massively parallel general matrix multiplication in a programmable IC

Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format solves the issue of double data rate (DDR) dynamic random access memory (DRAM) bandwidth by allowing both linear DDR addressing and single cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.

Radix-4 multiplier partial product generation with improved area and power
10466968 · 2019-11-05 · ·

A system including a series of partial product select encoders and partial product muxes, each of the partial product select encoders receiving a multiplier, receiving a carry input from a multiplier tree, and outputting a select signal to an associated partial product mux based on the multiplier and carry input, and each of the partial product muxes outputting a partial product based on the select signal and a multiplicand received.

High performance FPGA addition

The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision may be restructured so that a set of sub-adders may perform the arithmetic on a respective segment of the operands. More specifically, the adder may be restructured so that a sub-adder of the set of sub-adders may concurrently output a generate signal and a propagate signal, which may both be routed to a prefix network. The prefix network may determine respective carry bit(s), which may carry into and/or select a sum at a subsequent sub-adder of the restructured adder. As a result, the integrated circuit may benefit from increased efficiencies, reduced latency, and reduced resource consumption (e.g., area and/or power) involved with implementing addition, which may improve operations such as encryption or machine learning on the integrated circuit.