G06F7/5338

Unified computation systems and methods for iterative multiplication and division, efficient overflow detection systems and methods for integer division, and tree-based addition systems and methods for single-cycle multiplication

A unified computation unit for iterative multiplication and division may include an architecture having a unified integer iterative multiplication and division circuit. A method may include a device receiving a dividend and a divisor for a division operation, separating the dividend into two parts based on the determining, and evaluating whether an overflow situation exists based on the two parts. A single-cycle multiplication unit may include a multi-operand addition schema for partial products compression that implements tree-based addition methods for single-cycle multiplication operations.

Montgomery multiplication method for performing final modular reduction without comparison operation and montgomery multiplier

A Montgomery multiplier includes a partial product computing unit for multiplying a multiplicand and a multiplier; a modulus reduction computing unit for performing a multiplication of a modulus and a quotient that reflects a quotient sign; an accumulation unit for accumulating in a intermediate value an output value of the partial product computing unit and an output value of the modulus reduction computing unit from a previous cycle; a quotient computing unit for receiving an accumulation value of the accumulation unit during a current cycle and calculating a quotient sign to be used during a next cycle; and a quotient sign determination unit for determining a quotient sign to be used during a next cycle from the multiplicand, the multiplier and the quotient.

Full adder cell with improved power efficiency
11169779 · 2021-11-09 · ·

An adder circuit provides a first operand input and a second operand input to an XNOR cell. The XNOR cell transforms these inputs to a propagate signal that is applied to an OAT cell to produce a carry out signal. A third OAT cell transforms a third operand input and the propagate signal into a sum output signal.

Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
11163720 · 2021-11-02 · ·

An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, where each of the first computation unit and the second computation unit includes a plurality of logic slices arranged in order, where each of the plurality of logic slices except a lattermost logic slice is coupled to an immediately following logic slice to provide an output of that logic slice to the immediately following logic slice, where the immediately following logic slice is to execute with a time lag with respect to its immediately previous logic slice. Further, each of the plurality of logic slices of the second computation unit is coupled to a corresponding logic slice of the first computation unit to receive an output of the corresponding logic slice of the first computation unit.

Full adder cell with improved power efficiency
11294631 · 2022-04-05 · ·

An adder circuit that includes an operand input and a second operand input to an XNOR cell. The XNOR cell is configured to provide the operand input and the second operand input to both a NAND gate and a first OAI cell. A second OAI cell transforms the output of the XNOR cell into a carry out signal.

Programmable multiply-add array hardware
10970043 · 2021-04-06 · ·

An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.

FULL ADDER CELL WITH IMPROVED POWER EFFICIENCY
20210124558 · 2021-04-29 · ·

An adder circuit provides a first operand input and a second operand input to an XNOR cell. The XNOR cell transforms these inputs to a propagate signal that is applied to an OAT cell to produce a carry out signal. A third OAT cell transforms a third operand input and the propagate signal into a sum output signal.

FULL ADDER CELL WITH IMPROVED POWER EFFICIENCY
20210124559 · 2021-04-29 · ·

This disclosure relates to an adder circuit. The adder circuit comprises an operand input and a second operand input to an XNOR cell. The XNOR cell may be configured to provide the operand input and the second operand input to both a NAND gate and a first OAI cell. A second OAI cell may transform the output of the XNOR cell into a carry out signal.

PROGRAMMABLE MULTIPLY-ADD ARRAY HARDWARE
20200293283 · 2020-09-17 ·

An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.

Programmable multiply-add array hardware
10678507 · 2020-06-09 · ·

An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.