Patent classifications
G06F2207/4812
MULTIPLY-ACCUMULATE SUCCESSIVE APPROXIMATION DEVICES AND METHODS
A multiply-accumulate successive approximation (MASAR) column is provided. The MASAR column includes a plurality of MASAR cells, each including a multiplier configured to perform digital multiplication between an input activation received at an input and an operand to compute a result, and a unit capacitor configured to store the result as analog charge. The MASAR column further includes digital logic configured to perform analog summation of the analog charge of the unit capacitors of the plurality of MASAR cells to determine a digital output of the multiplication.
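A minimal behavioral sketch of one column, under several assumptions not stated in the abstract: activations and operands are single bits (so the per-cell multiply is an AND), the unit capacitors are ideal and equal, summation is done by charge sharing, and the summed voltage is digitized by a successive-approximation search (suggested by the MASAR name). The function `masar_column` and its `vref` parameter are hypothetical.

```python
from typing import List


def masar_column(activations: List[int], weights: List[int], vref: float = 1.0) -> int:
    """Idealized model of one MASAR column (hypothetical, 1-bit operands assumed)."""
    if len(activations) != len(weights) or not activations:
        raise ValueError("need equal-length, non-empty activation and weight lists")
    n_cells = len(activations)

    # Per-cell digital multiply: for 1-bit operands the product is an AND.
    products = [a & w for a, w in zip(activations, weights)]

    # Each unit capacitor is charged to vref (product 1) or 0 V (product 0).
    # Shorting the n equal capacitors together leaves their mean voltage.
    v_shared = vref * sum(products) / n_cells

    # Successive approximation: decide one output bit per step, MSB first,
    # by comparing the shared voltage against a DAC level code * vref / n_cells.
    code = 0
    for bit in reversed(range(n_cells.bit_length())):
        trial = code | (1 << bit)
        # Small tolerance absorbs float rounding; real levels differ by vref / n_cells.
        if trial <= n_cells and v_shared >= trial * vref / n_cells - 1e-12:
            code = trial
    return code  # digital count of cells whose product was 1
```

For example, `masar_column([1, 0, 1, 1], [1, 1, 0, 1])` models two cells holding charge, so the search settles on the code 2. Multi-bit operands would need weighted capacitors or bit-serial passes, which this sketch does not attempt.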
INTEGRATED CIRCUITS WITH SPECIALIZED PROCESSING BLOCKS FOR PERFORMING FLOATING-POINT FAST FOURIER TRANSFORMS AND COMPLEX MULTIPLICATION
Integrated circuits with specialized processing blocks are provided. A specialized processing block may include one real addition stage and one real multiplier stage. The multiplier stage may simultaneously feed its output to the addition stage and directly to an adjacent specialized processing block. The addition stage may also produce sum and difference outputs in parallel. A group of four such specialized processing blocks may be connected in a chain to implement a radix-2 fast Fourier transform (FFT) butterfly. Multiple radix-2 butterflies may be stacked to form higher-radix butterflies. If desired, the specialized processing block may also be used to implement a complex multiply operation: three or four specialized processing blocks may be chained together and, along with one or more adders outside the specialized processing blocks, generate the real and imaginary portions of a complex product.
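To illustrate the arithmetic such a chain of blocks computes, here is a minimal Python sketch of a radix-2 decimation-in-time butterfly (sum and difference outputs formed in parallel) and of a complex multiply done with either four real multiplications or three. The function names are hypothetical, and treating each real multiply as one block is only an assumed mapping; the abstract does not specify it.

```python
import cmath
from typing import List, Tuple


def cmul4(ar: float, ai: float, br: float, bi: float) -> Tuple[float, float]:
    """Complex product using four real multiplies and two real adds."""
    return ar * br - ai * bi, ar * bi + ai * br


def cmul3(ar: float, ai: float, br: float, bi: float) -> Tuple[float, float]:
    """Complex product using three real multiplies (Gauss's trick) and extra adds."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2


def radix2_butterfly(a: complex, b: complex, w: complex) -> Tuple[complex, complex]:
    """Radix-2 DIT butterfly: twiddle-multiply b, then form sum and difference."""
    tr, ti = cmul4(b.real, b.imag, w.real, w.imag)
    t = complex(tr, ti)
    return a + t, a - t  # both outputs come from the same product, in parallel


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 FFT built from butterflies (len(x) must be a power of 2)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = radix2_butterfly(even[k], odd[k], w)
    return out
```

The three-multiply form trades one multiplier for additional adders, which is consistent with the abstract's note that a complex product may use three or four blocks plus adders outside them.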
ADDER CIRCUITRY FOR VERY LARGE INTEGERS
An integrated circuit that includes very large adder circuitry is provided. The very large adder circuitry receives more than two inputs, each of which has hundreds or thousands of bits. The very large adder circuitry includes multiple adder nodes arranged in a tree-like network. The adder nodes divide the input operands into segments, compute the sum for each segment, and compute the carry for each segment independently of the segment sums. The carries at each level in the tree are accumulated using population counters. After the last node in the tree, the segment sums can be combined with the carries to determine the final sum output. An adder tree network implemented in this way asymptotically approaches the same area and latency as an adder network built from infinite-speed ripple-carry adders.
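The segment-and-count scheme can be sketched in Python as follows. The sketch assumes a binary tree of two-input nodes, non-negative operands, and a 16-bit segment width; the abstract fixes none of these, and all names are hypothetical. Each node adds segment by segment without propagating carries, the carry bits are tallied per segment position (the population count), and a single carry-propagate addition at the end resolves everything.

```python
from typing import List, Tuple


def to_segments(x: int, width: int, nseg: int) -> List[int]:
    """Split x into nseg little-endian segments of `width` bits."""
    mask = (1 << width) - 1
    return [(x >> (width * i)) & mask for i in range(nseg)]


def from_segments(segs: List[int], width: int) -> int:
    """Recombine segments (values may exceed `width` bits) into one integer."""
    return sum(s << (width * i) for i, s in enumerate(segs))


def adder_node(a: List[int], b: List[int], width: int) -> Tuple[List[int], List[int]]:
    """Add two segmented operands with no carry propagation between segments."""
    mask = (1 << width) - 1
    sums, carries = [], []
    for x, y in zip(a, b):
        s = x + y
        sums.append(s & mask)       # segment sum kept to `width` bits
        carries.append(s >> width)  # carry out of this segment (0 or 1)
    return sums, carries


def tree_add(operands: List[int], width: int = 16) -> int:
    """Multi-operand addition via a tree of segmented adder nodes (sketch)."""
    if not operands or any(op < 0 for op in operands):
        raise ValueError("need at least one non-negative operand")
    max_bits = max(op.bit_length() for op in operands)
    nseg = max(1, -(-max_bits // width))  # ceil(max_bits / width)
    level = [to_segments(op, width, nseg) for op in operands]
    carry_count = [0] * (nseg + 1)  # popcount of carries landing in each segment
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            sums, carries = adder_node(level[i], level[i + 1], width)
            for j, c in enumerate(carries):
                carry_count[j + 1] += c  # carry out of segment j weighs 2**(width*(j+1))
            nxt.append(sums)
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired operand passes to the next level
        level = nxt
    # After the last node: one carry-propagate add of segment sums and carry counts.
    return from_segments(level[0], width) + from_segments(carry_count, width)
```

Only the final line needs a full-width carry chain; every node in the tree works on independent segments, which is what lets the tree scale to very wide operands.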
Prefix Network-Directed Addition
The present disclosure relates generally to techniques for enhancing adders implemented on an integrated circuit. In particular, arithmetic performed by an adder implemented to receive operands having a first precision may be restructured so that a set of sub-adders may perform the arithmetic on a respective segment of the operands. More specifically, the adder may be restructured so that a decoder may determine a generate signal and a propagate signal for each of the sub-adders and may route the generate signal and the propagate signal to a prefix network. The prefix network may determine respective carry bit(s), which may carry into and/or select a sum at a subsequent sub-adder. As a result, the integrated circuit may benefit from increased efficiencies, reduced latency, and reduced resource consumption (e.g., area and/or power) involved with implementing addition, which may improve operations such as encryption or machine learning on the integrated circuit.
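A minimal Python sketch of the flow described above, assuming two 32-bit operands split into four 8-bit segments and a Kogge-Stone arrangement for the prefix network (the disclosure does not say which prefix topology is used; Kogge-Stone is swapped in here for illustration). The "decoder" step derives a generate and a propagate bit per sub-adder, the prefix network turns them into the carry into each segment, and each carry selects between the segment's two candidate sums. The function name and defaults are hypothetical.

```python
def prefix_add(a: int, b: int, width: int = 8, nseg: int = 4) -> int:
    """Segmented addition with a parallel-prefix carry network (sketch)."""
    limit = 1 << (width * nseg)
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in width * nseg bits")
    mask = (1 << width) - 1
    A = [(a >> (width * i)) & mask for i in range(nseg)]
    B = [(b >> (width * i)) & mask for i in range(nseg)]

    # Sub-adders: each segment sum assuming no incoming carry.
    raw = [x + y for x, y in zip(A, B)]

    # Decoder: generate if the segment overflows on its own,
    # propagate if it is all ones (an incoming carry would ripple through).
    g = [s > mask for s in raw]
    p = [s == mask for s in raw]

    # Kogge-Stone prefix network: afterwards G[i] is the group generate of segments 0..i.
    G, P = g[:], p[:]
    d = 1
    while d < nseg:
        G_new, P_new = G[:], P[:]
        for i in range(d, nseg):
            G_new[i] = G[i] or (P[i] and G[i - d])
            P_new[i] = P[i] and P[i - d]
        G, P = G_new, P_new
        d *= 2

    # Carry into segment i is the group generate of segments 0..i-1.
    carries = [False] + G[:-1]
    result = 0
    for i in range(nseg):
        sum0 = raw[i] & mask        # candidate sum if carry-in is 0
        sum1 = (raw[i] + 1) & mask  # candidate sum if carry-in is 1
        result |= (sum1 if carries[i] else sum0) << (width * i)  # carry selects the sum
    return result | (int(G[-1]) << (width * nseg))  # append the final carry-out
```

The prefix network's depth grows with log2(nseg) rather than nseg, which is the source of the latency reduction over rippling carries from one sub-adder to the next.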
HIGH RADIX SUBSET CODE MULTIPLIER ARCHITECTURE
Systems, methods, and devices for enhancing performance/efficiency of soft multiplier implementations are provided. More specifically, a method to implement soft multipliers with a high radix subset code architecture is provided. The techniques provided herein result in smaller multipliers that consume less area, improve packing, consume less power, and improve routing options on an integrated circuit.
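The abstract does not describe how the subset code itself works, so the sketch below shows only the generic high-radix idea such multipliers build on, not the patented encoding. The multiplier operand is consumed k bits at a time, so an n-bit operand produces ceil(n/k) partial products instead of n, each one selected from a table of precomputed multiples of the multiplicand. The function name and the default k = 3 (radix 8) are illustrative assumptions.

```python
def high_radix_multiply(a: int, b: int, k: int = 3) -> int:
    """Unsigned multiply using radix-2**k digits of b (generic high-radix sketch)."""
    if a < 0 or b < 0 or k < 1:
        raise ValueError("unsigned operands and k >= 1 required")
    radix = 1 << k
    # Precompute every multiple 0*a .. (radix-1)*a once. In hardware the odd
    # "hard" multiples (3a, 5a, 7a, ...) each cost an adder, which is the main
    # expense a high-radix encoding has to manage.
    multiples = [m * a for m in range(radix)]
    product = 0
    shift = 0
    while b:
        digit = b & (radix - 1)                # next k-bit digit of the multiplier
        product += multiples[digit] << shift   # one selected partial product per digit
        b >>= k
        shift += k
    return product
```

With k = 3, a 24-bit multiplier yields 8 partial products rather than 24, shrinking the adder tree that sums them.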
Arithmetic device and neural network device
An arithmetic device includes N product-sum-operation circuits, a control circuit, and an output circuit. Each product-sum-operation circuit outputs an intermediate signal obtained by binarizing a product-sum-operation value computed from the M input values of M input signals and M weight values. The control circuit inverts the sign of each of the M weight values at a determination timing, when a given time has elapsed from an input timing. Based on the delay time from the determination timing to logic finalization of the intermediate signal of each of the N product-sum-operation circuits, the output circuit outputs an output signal representing a winner product-sum-operation circuit, namely the circuit whose signed product-sum-operation value has the largest absolute value. Each of the N product-sum-operation circuits starts the product-sum operation at the input timing and again at the determination timing, and outputs an intermediate signal whose propagation delay time, from the start of the product-sum operation to the inversion of its logic, corresponds to the absolute value of the product-sum-operation value.
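A behavioral sketch of the time-domain winner-take-all idea, offered as one interpretation rather than the device's actual timing. Assumptions (not in the abstract): a circuit's logic inverts only while its current sum is positive; the inversion delay is `t_scale / value`, so a larger absolute value finalizes sooner; and the sign inversion of the weights is modeled by evaluating each circuit once with the original weights and once with negated weights, so negative sums also get a chance to win. The names `delay_for` and `time_domain_winner` are hypothetical.

```python
import math
from typing import List, Tuple


def delay_for(value: float, t_scale: float = 1.0) -> float:
    """Hypothetical delay model: positive sums invert after t_scale / value."""
    return math.inf if value <= 0 else t_scale / value


def time_domain_winner(x: List[float], weights: List[List[float]]) -> Tuple[int, int]:
    """Return (index, sign) of the circuit whose sum has the largest |value|."""
    best_delay, winner, winner_sign = math.inf, -1, 0
    for j, w in enumerate(weights):
        s = sum(xi * wi for xi, wi in zip(x, w))  # product-sum of M inputs and M weights
        # Phase from the input timing uses the weights as given (sum = s);
        # phase from the determination timing uses inverted weights (sum = -s).
        for sign, v in ((+1, s), (-1, -s)):
            d = delay_for(v)
            if d < best_delay:  # earliest logic inversion wins
                best_delay, winner, winner_sign = d, j, sign
    return winner, winner_sign  # (-1, 0) if every sum is zero
```

For `x = [1, 2]` and weights `[[1, 0], [0, -3], [1, 1]]`, the sums are 1, -6, and 3, so circuit 1 wins with a negative sign even though its value is the most negative.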
Configurable MAC pipelines for finite-impulse-response filtering, and methods of operating same
An integrated circuit comprising a plurality of MAC pipelines, wherein each MAC pipeline includes: (i) a plurality of MACs connected in series and (ii) a plurality of data paths including an accumulation data path, wherein each MAC includes a multiplier to generate product data and an accumulator to generate sum data. The integrated circuit further comprises a plurality of control/configure circuits, wherein each control/configure circuit connects directly to, and is associated with, a MAC pipeline. Each control/configure circuit includes (a) an accumulation data path that is configurable to connect directly to the accumulation data path of the MAC pipeline to form an accumulation ring when the control/configure circuit is configured in an accumulation mode, and (b) an output data path that is configurable to connect directly to the output of the accumulation data path of the MAC pipeline when the control/configure circuit is configured in an output data mode.
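A minimal functional sketch (not cycle-accurate) of how an accumulation ring lets a fixed-length MAC pipeline compute an FIR filter with more taps than it has MACs. It assumes a direct-form FIR, y[n] = sum over k of h[k] * x[n - k], with zero samples before the start of the input. Each pass sends the partial sum through the serial MACs; in accumulation mode the control/configure circuit routes it back to the pipeline input for the next group of taps, and in output data mode the finished sum leaves the ring. The function name and `macs_per_pipeline` parameter are hypothetical.

```python
from typing import List


def fir_with_accumulation_ring(x: List[int], h: List[int], macs_per_pipeline: int = 4) -> List[int]:
    """Direct-form FIR evaluated by looping a partial sum around a MAC pipeline."""
    if macs_per_pipeline < 1:
        raise ValueError("pipeline needs at least one MAC")
    P = macs_per_pipeline
    y = []
    for n in range(len(x)):
        acc = 0  # accumulation data path starts each output sample at zero
        # Accumulation mode: each pass covers up to P taps, then acc loops back.
        for base in range(0, len(h), P):
            for mac in range(P):  # MACs connected in series
                k = base + mac
                if k >= len(h):
                    break
                sample = x[n - k] if n - k >= 0 else 0
                acc += h[k] * sample  # multiplier -> product data; accumulator -> sum data
        # Output data mode: the completed sum leaves through the output data path.
        y.append(acc)
    return y
```

For instance, a 6-tap moving sum run on a 4-MAC pipeline takes two passes around the ring per output, and `fir_with_accumulation_ring([1, 2, 3, 4, 5], [1] * 6, 4)` produces the running sums of the input.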