Patent classifications
G06F7/5338
Processing circuitry
This application relates to apparatus and methods for the multiplication of signals. A multiplication circuit (100) has first and second time-encoding modulators (103a, 103b) configured to receive first and second combined signals (S_C1, S_C2), respectively, and to generate respective first and second PWM signals (S_PWM1, S_PWM2), each with a cycle frequency that depends substantially on the square of the value of the input combined signal. The first combined signal (S_C1) corresponds to the sum of first and second input signals (S_1, S_2), and the second combined signal (S_C2) corresponds to the difference between the first and second input signals (S_1, S_2). First and second time-decoding converters (104a, 104b) receive the first and second PWM signals and provide respective first and second count values (D_1, D_2) based on a parameter related to the frequency of the respective first or second PWM signal. A subtractor (105) determines the difference between the first and second count values (D_1, D_2) and provides an output signal (D_OUT) based on this difference.
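The arithmetic behind this circuit is the quarter-square identity (S_1 + S_2)^2 - (S_1 - S_2)^2 = 4 * S_1 * S_2, so subtracting two square-law counts yields a value proportional to the product. The sketch below is an idealised model, not the patented circuit: it assumes each modulator's cycle frequency is exactly K * x**2 and that each time-decoding converter counts whole PWM cycles over a fixed window. K and T_WINDOW are hypothetical values chosen for illustration.

```python
# Idealised quarter-square multiplier model (assumptions: exact square-law
# modulators and cycle counting over a fixed window; K and T_WINDOW are
# hypothetical parameters, not values from the source).
K = 1000.0       # hypothetical modulator gain, Hz per unit^2
T_WINDOW = 0.01  # hypothetical counting window, seconds


def pwm_cycle_frequency(x: float) -> float:
    """Assumed time-encoding modulator: cycle frequency grows with x squared."""
    return K * x * x


def count_cycles(freq: float) -> int:
    """Assumed time-decoding converter: whole PWM cycles seen in the window."""
    return int(freq * T_WINDOW)


def multiply(s1: float, s2: float) -> float:
    """Estimate s1 * s2 from the difference of two square-law cycle counts."""
    d1 = count_cycles(pwm_cycle_frequency(s1 + s2))  # sum path, S_C1
    d2 = count_cycles(pwm_cycle_frequency(s1 - s2))  # difference path, S_C2
    d_out = d1 - d2                                   # subtractor output
    # (s1 + s2)^2 - (s1 - s2)^2 = 4 * s1 * s2, so rescale by 4 * K * T_WINDOW.
    return d_out / (4 * K * T_WINDOW)
```

Because only squares are counted, the sign of the product comes out of the subtraction itself: multiply(-3, 2) gives approximately -6 even though both counts are non-negative. The resolution is set by the count quantisation, here about 1 / (4 * K * T_WINDOW) per count.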
APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND DEPENDENT OPERATIONS
An execution unit to execute instructions using a time-lag sliced architecture (TLSA). The execution unit includes a first computation unit and a second computation unit, each of which includes a plurality of logic slices arranged in order. Each logic slice except the last is coupled to the immediately following logic slice to provide its output to that slice, and the immediately following logic slice executes with a time lag with respect to its immediately previous logic slice. Further, each logic slice of the second computation unit is coupled to the corresponding logic slice of the first computation unit to receive the output of that slice.
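The abstract does not say what each logic slice computes. As an illustration only, the sketch below assumes each slice is a 4-bit adder slice that passes its carry to the next slice one cycle later, and that the second computation unit adds a third operand to the first unit's result. It shows the point of the time lag: slice j of the second unit can start as soon as slice j of the first unit has produced its output. SLICE_BITS, NUM_SLICES, and the function names are hypothetical.

```python
from typing import List, Tuple

SLICE_BITS = 4   # hypothetical slice width
NUM_SLICES = 4   # 4 slices of 4 bits -> 16-bit datapath in this sketch
MASK = (1 << SLICE_BITS) - 1


def slice_of(x: int, j: int) -> int:
    """Return the j-th SLICE_BITS-wide slice of x (slice 0 is least significant)."""
    return (x >> (j * SLICE_BITS)) & MASK


def tlsa_add_then_add(a: int, b: int, c: int) -> Tuple[int, List[Tuple[int, int, int]]]:
    """Compute (a + b) + c modulo 2**(SLICE_BITS * NUM_SLICES) on two sliced units.

    Returns the result and a schedule of (cycle, unit, slice) entries.
    """
    schedule = []
    unit1_out = []
    carry = 0
    for j in range(NUM_SLICES):
        # Unit 1, slice j: runs at cycle j, one cycle after slice j-1,
        # and consumes the carry that slice j-1 passed forward.
        s = slice_of(a, j) + slice_of(b, j) + carry
        unit1_out.append(s & MASK)
        carry = s >> SLICE_BITS
        schedule.append((j, 1, j))
    result = 0
    carry = 0
    for j in range(NUM_SLICES):
        # Unit 2, slice j: runs at cycle j + 1, using unit 1's slice j output
        # as soon as it exists instead of waiting for all of unit 1 to finish.
        s = unit1_out[j] + slice_of(c, j) + carry
        result |= (s & MASK) << (j * SLICE_BITS)
        carry = s >> SLICE_BITS
        schedule.append((j + 1, 2, j))
    schedule.sort()
    return result, schedule
```

With these parameters the last slice of unit 2 runs at cycle 4, whereas starting unit 2 only after unit 1 had finished all four slices would push its last slice to cycle 7.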
PROGRAMMABLE MULTIPLY-ADD ARRAY HARDWARE
An integrated circuit including a data architecture that includes N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selectable data flows include: (1) a first data flow that uses the N multipliers and the N adders to provide a multiply-accumulate mode, and (2) a second data flow that provides a multiply-reduce mode.
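To make the two data flows concrete, here is a behavioural sketch that assumes the simplest interpretation of each mode: in multiply-accumulate mode every adder adds one product to its own accumulator lane, and in multiply-reduce mode the adders form a tree that sums all N products into one value. The function name, the mode strings, and the accumulator handling are hypothetical; the source does not specify the interconnect.

```python
from typing import List


def multiply_add_array(a: List[int], b: List[int], acc: List[int], mode: str) -> List[int]:
    """Model N multipliers feeding N adders under two selectable data flows.

    mode == "mac":    lane i returns acc[i] + a[i] * b[i]   (N outputs)
    mode == "reduce": returns [acc[0] + sum of a[i] * b[i]] (1 output)
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    # Stage 1: the N multipliers all work in parallel.
    products = [x * y for x, y in zip(a, b)]
    if mode == "mac":
        if len(acc) != len(products):
            raise ValueError("mac mode needs one accumulator per lane")
        # Each adder is paired with one multiplier and one accumulator lane.
        return [s + p for s, p in zip(acc, products)]
    if mode == "reduce":
        # Adders arranged as a binary tree sum the products pairwise
        # (N - 1 adders when N is a power of two); one more adder adds acc[0].
        level = products
        while len(level) > 1:
            if len(level) % 2:
                level = level + [0]  # pad an odd level with a zero operand
            level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        return [acc[0] + (level[0] if level else 0)]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with a = [1, 2, 3, 4] and b = [5, 6, 7, 8], "mac" mode updates four independent accumulators, while "reduce" mode produces the single dot product 70 plus the accumulator.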
Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
A matrix of execution blocks forms a set of rows and columns. The rows support parallel execution of instructions, and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions specifying both parallel and dependent instructions.
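One way to picture such an instruction block is sketched below, under a simplifying assumption of our own: instructions in the same row read only results from earlier rows (so they can issue together), and results commit at the end of each row so that instructions further down a column can depend on them. The three-operand instruction tuple, the OPS table, and execute_matrix are hypothetical and only illustrate the row/column semantics, not the hardware.

```python
from typing import Dict, List, Tuple

# Hypothetical three-operand instruction: (dest, op, src1, src2)
Instr = Tuple[str, str, str, str]

OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}


def execute_matrix(matrix: List[List[Instr]], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute an instruction block laid out as rows of independent instructions."""
    regs = dict(regs)  # do not mutate the caller's register file
    for row in matrix:
        dests = [dest for dest, _, _, _ in row]
        if len(dests) != len(set(dests)):
            raise ValueError("two instructions in one row write the same register")
        # Every instruction in the row reads the state left by the previous row,
        # so the row's instructions are independent and could issue in parallel.
        results = {dest: OPS[op](regs[s1], regs[s2]) for dest, op, s1, s2 in row}
        # Results commit at the end of the row; later rows may consume them,
        # which is how dependences are carried down the columns.
        regs.update(results)
    return regs
```

For instance, a first row [("r4", "add", "r1", "r2"), ("r5", "mul", "r2", "r3")] holds two independent instructions, and a second row [("r6", "sub", "r5", "r4")] depends on both.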
COMMUTATIVE 1ULP HARDWARE MULTIPLIER
Described herein is a truncated modified Booth multiplier that is commutative and accurate to 1 unit in the last place. In various embodiments, the truncated Booth multiplier is a radix-4 Booth multiplier or a radix-8 Booth multiplier. The truncated Booth multiplier can be included within integer, floating-point, or fixed-point units within a graphics processor or compute accelerator, including matrix accelerator units or tensor processors.
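For readers unfamiliar with the underlying technique, the sketch below shows standard radix-4 (modified) Booth recoding, which turns an n-bit two's-complement multiplier into n/2 digits in {-2, -1, 0, 1, 2} and therefore halves the number of partial products. It computes the full, untruncated product only; the truncation, the 1 ULP error bound, and the commutativity correction described above are not modelled, because the abstract does not describe how they are achieved. The function names are hypothetical.

```python
from typing import List


def booth_radix4_digits(y: int, n_bits: int) -> List[int]:
    """Recode an n_bits two's-complement multiplier y into radix-4 Booth digits.

    Digit i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0; least significant first.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even for radix-4 recoding")
    y &= (1 << n_bits) - 1  # view y as an n_bits two's-complement bit pattern
    digits = []
    prev = 0  # y[2i-1], the overlapping bit from the previous group
    for i in range(0, n_bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Full (untruncated) product: sum of partial products digit * x * 4**i."""
    return sum(d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, n_bits)))
```

Each partial product is a shifted copy of 0, ±x, or ±2x, which is why radix-4 Booth hardware needs only shifts and negation before the adder tree; a truncated multiplier then discards some low-order partial-product columns and compensates for the dropped bits.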
Extensible iterative multiplier
An extensible iterative multiplier design is provided. Embodiments provide cascaded 8-bit multipliers that simplify multi-byte multiplication. Booth encoding is performed in the lowest-order multiplier, and the result of the Booth encoding is then provided to the higher-order multipliers. Additionally, multiply-add operations can be performed by initializing a partial-product sum register. Configurable connections between the multipliers support a variety of multiplication options, including varying the operand width.
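The multi-byte arithmetic that cascaded 8-bit multipliers implement can be sketched as follows. This is a behavioural model under stated assumptions: operands are unsigned, one byte of the multiplier is consumed per iteration, and the Booth encoding step is omitted. It does show the multiply-add trick from the abstract, where initialising the partial-product sum register with an addend makes the same datapath return x * y + addend, and it treats operand width as a parameter (n_bytes). The function name and parameters are hypothetical.

```python
def cascaded_multiply_add(x: int, y: int, n_bytes: int, addend: int = 0) -> int:
    """Unsigned n_bytes x n_bytes multiply-add built from 8x8-bit products.

    Booth encoding is not modelled; each 8x8 product is computed directly.
    """
    mask = (1 << (8 * n_bytes)) - 1
    x &= mask
    y &= mask
    acc = addend  # partial-product sum register, initialised for multiply-add
    for i in range(n_bytes):  # one iteration per byte of y
        y_byte = (y >> (8 * i)) & 0xFF
        for j in range(n_bytes):  # one 8-bit multiplier per byte of x
            x_byte = (x >> (8 * j)) & 0xFF
            # Shift each 8x8 partial product to its byte position and accumulate.
            acc += (x_byte * y_byte) << (8 * (i + j))
    return acc
```

Changing n_bytes from 2 to 4 reuses the same 8x8 building block for wider operands, which mirrors the abstract's point that configurable connections allow the operand width to vary.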