G06F7/5306

Logic circuits with augmented arithmetic densities

Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units, sometimes referred to as logic cells. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least as many full adder circuits as there are 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
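The relationship between the 3-input LUTs, the selector multiplexer, and the carry chain can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the function names, the truth-table encoding, and the example function `(a AND b) + c` are illustrative choices, not details taken from the abstract.

```python
def lut3(table, a, b, c):
    """Evaluate a 3-input LUT; `table` is an 8-bit truth table (assumed encoding)."""
    return (table >> (a | (b << 1) | (c << 2))) & 1


def lut4(table_lo, table_hi, a, b, c, d):
    """A 4-input LUT modeled as two 3-input LUTs plus a selector mux on `d`."""
    return lut3(table_hi, a, b, c) if d else lut3(table_lo, a, b, c)


def full_adder(x, y, cin):
    """One full adder stage of the carry chain: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def lut_fed_adder(a, b, c, width=4):
    """Compute (a AND b) + c with 3-input LUT outputs wired into full adders.

    The two truth tables below are hypothetical LUT configurations.
    """
    AND_AB = 0x88  # output is 1 when inputs a and b are both 1
    PASS_C = 0xF0  # output follows input c
    carry = 0
    result = 0
    for i in range(width):
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        x = lut3(AND_AB, ai, bi, ci)  # first 3-input LUT output
        y = lut3(PASS_C, ai, bi, ci)  # second 3-input LUT output
        s, carry = full_adder(x, y, carry)
        result |= s << i
    return result | (carry << width)  # carry-out becomes the top bit
```

In this model each bit position consumes two 3-input LUT outputs and one full adder, which mirrors the one-adder-per-4-input-LUT ratio the abstract describes.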

DEVICE AND METHOD FOR ACCELERATING MATRIX MULTIPLY OPERATIONS AS A SUM OF OUTER PRODUCTS

A processing device is provided which includes memory and a processor comprising a plurality of processor cores in communication with each other via first and second hierarchical communication links. The processor cores in a group of the processor cores are in communication with each other via the first hierarchical communication links. Each processor core is configured to store, in the memory, one of a plurality of sub-portions of data of a first matrix, store, in the memory, one of a plurality of sub-portions of data of a second matrix, determine an outer product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core of the group of processor cores, another sub-portion of data of the second matrix, and determine another outer product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
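The underlying identity, that a matrix product equals the sum of outer products of the columns of the first matrix with the rows of the second, can be sketched in a few lines. This is a minimal single-threaded illustration; the function names are hypothetical and it does not model the processor cores or communication links.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector as a nested list."""
    return [[a * b for b in row] for a in col]


def matmul_outer(A, B):
    """Multiply A (n x k) by B (k x m) as a sum of k outer products."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for p in range(k):
        col = [A[i][p] for i in range(n)]  # p-th column of A
        row = B[p]                          # p-th row of B
        partial = outer(col, row)
        for i in range(n):
            for j in range(m):
                C[i][j] += partial[i][j]    # accumulate the outer product
    return C
```

For example, `matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]])` accumulates two 2 x 2 outer products into the usual product matrix.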

LOGIC CIRCUITS WITH AUGMENTED ARITHMETIC DENSITIES

Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units, sometimes referred to as logic cells. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least as many full adder circuits as there are 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.

Programmable-Logic-Directed Multiplier Mapping
20190042197 · 2019-02-07 ·

The present disclosure relates generally to techniques for enhancing multipliers implemented on an integrated circuit. In particular, by refactoring arithmetic implemented by a multiplier to perform multiplication, routing (e.g., wiring) used by the multiplier may be improved. As a result, the integrated circuit may benefit from increased efficiencies, reduced latency, and reduced resource consumption (e.g., wiring, area, and power) involved with implementing multiplication, which may improve machine learning implementations on the integrated circuit.

METHOD AND APPARATUS FOR PERFORMING FIELD PROGRAMMABLE GATE ARRAY PACKING WITH CONTINUOUS CARRY CHAINS
20190042674 · 2019-02-07 ·

A method for designing a system on a target device includes identifying a length for a carry chain that is supported by predefined quanta of a resource on the target device. A plurality of logical adders is mapped onto a single logical adder implemented on the carry chain subject to the identified length to increase logic utilization in a design for the system.
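One way to picture mapping several logical adders onto a single adder on a carry chain is to pack the operands side by side and perform one wide addition. The sketch below assumes a zero guard bit between segments so that carries do not cross from one logical adder into the next; the abstract does not state how the adders are separated, so the guard bit, the function name, and the parameters are all illustrative assumptions.

```python
def pack_adders(pairs, width, chain_length):
    """Perform several `width`-bit additions as one wide addition.

    pairs        -- list of (a, b) operand pairs, each operand < 2**width
    chain_length -- number of bit positions available on the carry chain
    """
    stride = width + 1  # assumed: one zero guard bit per segment holds the carry-out
    if len(pairs) * stride > chain_length:
        raise ValueError("adders do not fit on the carry chain")
    a_wide = b_wide = 0
    for i, (a, b) in enumerate(pairs):
        a_wide |= a << (i * stride)
        b_wide |= b << (i * stride)
    total = a_wide + b_wide  # the single wide logical adder
    mask = (1 << stride) - 1
    return [(total >> (i * stride)) & mask for i in range(len(pairs))]
```

With `width=4` and a 20-position chain, three 4-bit additions occupy 15 positions and leave five unused, which is the kind of utilization trade-off the method is concerned with.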

METHOD AND APPARATUS FOR PERFORMING SYNTHESIS FOR FIELD PROGRAMMABLE GATE ARRAY EMBEDDED FEATURE PLACEMENT
20190042683 · 2019-02-07 ·

A method for designing and configuring a system on a field programmable gate array (FPGA) is disclosed. A portion of the system that is implemented greater than a predetermined number of times is identified. A structural netlist that describes how to implement the portion of the system a plurality of times on the FPGA and that leverages a repetitive nature of implementing the portion is generated. The identifying and generating are performed prior to synthesizing and placing other portions of the system that are not implemented greater than the predetermined number of times. Synthesizing, placing, and routing the other portions of the system on the FPGA is performed in accordance with the structural netlist. The FPGA is configured with a configuration file that includes a design for the system that reflects the synthesizing, placing, and routing, wherein the configuring physically transforms resources on the FPGA to implement the system.

MULTIPLY-ACCUMULATE CIRCUIT AND METHOD FOR PERFORMING MULTIPLY-ACCUMULATE OPERATIONS
20240296012 · 2024-09-05 ·

A multiply-accumulate circuit for processing numerical values that are present as input words, each of which is formed from at least two partial words. The circuit is configured, corresponding to a permutation selected from a plurality of permutation possibilities implemented by the multiply-accumulate circuit, to form product partial words as products of in each case one partial word of the first input word with one partial word of the second input word, wherein in the products, the partial words of the first input word are permutated relative to their original order corresponding to the selected permutation; and to add the product partial words with an accumulation word, which is formed from one or more partial words, to determine an updated accumulation word in which product partial words are in each case added to one of the one or more partial words of the accumulation word.
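The permuted multiply-accumulate step can be sketched on lists of partial words. This is a simplified model under loud assumptions: the accumulation word is taken to have one partial word per product, product `i` is added to accumulation partial word `i`, and overflow between partial words is ignored. None of these choices, nor the function name, is specified in the abstract.

```python
def mac_permuted(a_parts, b_parts, acc_parts, perm):
    """Return the updated accumulation partial words.

    a_parts, b_parts -- partial words of the first and second input words
    acc_parts        -- partial words of the accumulation word (assumed same length)
    perm             -- selected permutation: a_parts[perm[i]] pairs with b_parts[i]
    """
    return [
        acc + a_parts[p] * b
        for acc, p, b in zip(acc_parts, perm, b_parts)
    ]
```

Selecting `perm=(0, 1)` multiplies matching partial words, while `perm=(1, 0)` swaps the partial words of the first input before multiplying.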

Methods and apparatuses for performing multiplication

In a novel computation device, a plurality of partial product generators is communicatively coupled to a binary number multiplier. The binary number is partitioned in the computation device into non-overlapping subsets of binary bits and each subset is coupled to one of the plurality of partial product generators. Each partial product generator, upon receiving a subset of binary bits representing a number, generates a multiplication product of the number and a predetermined constant. The multiplication products from all partial product generators are summed to generate the final product between the predetermined constant and the binary number. The partial product generators are constructed from logic gates and wires connecting the logic gates, including an AND gate. The partial product generators are free of memory elements.

Methods and Apparatuses for Performing Multiplication
20170168775 · 2017-06-15 ·

In a novel computation device, a plurality of partial product generators is communicatively coupled to a binary number multiplier. The binary number is partitioned in the computation device into non-overlapping subsets of binary bits and each subset is coupled to one of the plurality of partial product generators. Each partial product generator, upon receiving a subset of binary bits representing a number, generates a multiplication product of the number and a predetermined constant. The multiplication products from all partial product generators are summed to generate the final product between the predetermined constant and the binary number. The partial product generators are constructed from logic gates and wires connecting the logic gates, including an AND gate. The partial product generators are free of memory elements.
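The partitioning scheme can be modeled arithmetically as follows. This sketch assumes 4-bit subsets taken from the least significant end and applies each subset's positional weight as a left shift before summing; the function names, the chunk size, and the weighting step are illustrative assumptions, and the Python multiplication stands in for the fixed combinational logic of each generator.

```python
def split_bits(x, width, chunk):
    """Partition a `width`-bit value into non-overlapping `chunk`-bit fields, LSB first."""
    mask = (1 << chunk) - 1
    return [(x >> shift) & mask for shift in range(0, width, chunk)]


def constant_multiply(x, constant, width=16, chunk=4):
    """Multiply `x` by a predetermined constant via per-subset partial products."""
    total = 0
    for i, field in enumerate(split_bits(x, width, chunk)):
        partial = field * constant       # one partial product generator
        total += partial << (i * chunk)  # apply the subset's positional weight
    return total
```

Because each generator only ever sees a small subset of bits and a fixed constant, its output is a pure function of a few inputs, which is consistent with building it from gates alone and without memory elements.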

Instruction and logic for shift-sum multiplier
09678749 · 2017-06-13 ·

A processor includes a front end including a decoder, an execution unit including a shift-sum multiplier (SSM), and a retirement unit. The decoder includes logic to identify a multiplication instruction to multiply a first number and a second number. The execution unit includes logic to, based on the instruction, access a look-up table based on the second number to determine a plurality of shift parameters and one or more flag parameters. The SSM includes logic to use the shift parameters to shift the first number to determine a plurality of partial products, and the flag parameters to determine signs of the partial products. The SSM also includes logic to sum the partial products to yield a result of the multiplication instruction.
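The shift-and-sum flow can be sketched with a look-up table indexed by the second number. The abstract does not say how the table entries are derived; this example assumes a non-adjacent form (signed-digit) recoding of 8-bit multipliers, and the names `recode`, `TABLE`, and `shift_sum_multiply` are hypothetical.

```python
def recode(n):
    """Recode n into (shift, negative_flag) terms using non-adjacent form (assumed)."""
    terms = []
    shift = 0
    while n > 0:
        if n & 1:
            digit = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            terms.append((shift, digit < 0))
            n -= digit
        n >>= 1
        shift += 1
    return terms


# Look-up table: second number -> shift parameters and sign flags.
TABLE = {n: recode(n) for n in range(256)}


def shift_sum_multiply(a, b):
    """Multiply a by b (0 <= b < 256) by shifting a and summing signed partial products."""
    result = 0
    for shift, negative in TABLE[b]:
        partial = a << shift                       # shift parameter
        result += -partial if negative else partial  # flag selects the sign
    return result
```

For a multiplier of 7, the table entry is `[(0, True), (3, False)]`, so the product is formed as `(a << 3) - a` using two partial products instead of three.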