Patent classifications
G06F7/533
Software-driven design optimization for mapping between floating-point and fixed-point multiply accumulators
An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
COMPUTER DATA PROCESSING METHOD AND APPARATUS FOR LARGE NUMBER OPERATIONS
Implementations of this specification provide a method and apparatus for computer data processing for large number operations. An example method performed by a computing device includes splitting a multiplier and a multiplicand into respective four 64-bit numbers from most significant bits to least significant bits; reading the split multipliers and the split multiplicands into a register; and obtaining a multiplication processing result for the multiplier and the multiplicand by performing operations including: classifying the split multipliers and the split multiplicands into groups of data pairs, calculating multiplication results of the groups of data pairs one by one, performing accumulation on multiplication results of data pairs in each group, and storing an accumulation result corresponding to the data pairs in memory as the multiplication processing result for the multiplier and the multiplicand.
COMPUTER DATA PROCESSING METHOD AND APPARATUS FOR LARGE NUMBER OPERATIONS
Implementations of this specification provide a method and apparatus for computer data processing for large number operations. An example method performed by a computing device includes splitting a multiplier and a multiplicand into respective four 64-bit numbers from most significant bits to least significant bits; reading the split multipliers and the split multiplicands into a register; and obtaining a multiplication processing result for the multiplier and the multiplicand by performing operations including: classifying the split multipliers and the split multiplicands into groups of data pairs, calculating multiplication results of the groups of data pairs one by one, performing accumulation on multiplication results of data pairs in each group, and storing an accumulation result corresponding to the data pairs in memory as the multiplication processing result for the multiplier and the multiplicand.
Efficient FPGA multipliers
In some example embodiments a logical block comprising twelve inputs and two six-input lookup tables (LUTs) is provided, wherein four of the twelve inputs are provided as inputs to both of the six-input lookup tables. This configuration supports efficient field programmable gate array (FPGA) implementation of multipliers. Each six-input LUT comprises two five-input lookup tables (LUT5s) that are used to form Booth encoding multiplier building blocks. The five inputs to each LUT5 are two bits from a multiplier and three Booth-encoded bits from a multiplicand. By assembling building blocks, multipliers of arbitrary size may be formed.
Efficient FPGA multipliers
In some example embodiments a logical block comprising twelve inputs and two six-input lookup tables (LUTs) is provided, wherein four of the twelve inputs are provided as inputs to both of the six-input lookup tables. This configuration supports efficient field programmable gate array (FPGA) implementation of multipliers. Each six-input LUT comprises two five-input lookup tables (LUT5s) that are used to form Booth encoding multiplier building blocks. The five inputs to each LUT5 are two bits from a multiplier and three Booth-encoded bits from a multiplicand. By assembling building blocks, multipliers of arbitrary size may be formed.
Processing circuitry
This application relates to apparatus and methods for the multiplication of signals. A multiplication circuit (100) has first and second time-encoding modulators (103a, 103b) configured to receive first and second combined signals (S.sub.C1, S.sub.C2) respectively, and generate respective first and second PWM signals (S.sub.PWM1, S.sub.PWM2), each with a cycle frequency that depends substantially on the square of the value of the input combined signal. The first combined signal (S.sub.C1) corresponds to a sum of a first and second input signals (S.sub.1, S.sub.2) and the second combined signal (S.sub.C2) corresponds to the difference between the first and second input signals (S.sub.1, S.sub.2). First and second time-decoding converters (104a, 104b) receive the first and second PWM signals and provide respective first and count values (D.sub.1, D.sub.2) based on a parameter related to the frequency of the respective first or second PWM signal. A subtractor (105) determine a difference between the first and second count values (D.sub.1, D.sub.2) and provides an output signal (D.sub.OUT) based on this difference.
System and method for long addition and long multiplication in associative memory
A method for an associative memory device includes replacing a set of three multi-bit binary numbers P, Q and R, stored in the associative memory device, with two multi-bit binary numbers X and Y, also stored in the associative memory device, wherein a sum of the binary numbers P, Q and R is equal to a sum of the binary numbers X and Y. A system includes an associative memory array having rows and columns and a multi-bit multiplier. Each column of the array stores two multi-bit binary numbers to be multiplied. The multi-bit multiplier multiplies, in parallel, the two multi-bit binary numbers per column by concurrently processing all bits of partial products generated by the multiplier. The multiplier performs the processing without any carry propagation delay when adding all but the last two partial products.
System and method for long addition and long multiplication in associative memory
A method for an associative memory device includes replacing a set of three multi-bit binary numbers P, Q and R, stored in the associative memory device, with two multi-bit binary numbers X and Y, also stored in the associative memory device, wherein a sum of the binary numbers P, Q and R is equal to a sum of the binary numbers X and Y. A system includes an associative memory array having rows and columns and a multi-bit multiplier. Each column of the array stores two multi-bit binary numbers to be multiplied. The multi-bit multiplier multiplies, in parallel, the two multi-bit binary numbers per column by concurrently processing all bits of partial products generated by the multiplier. The multiplier performs the processing without any carry propagation delay when adding all but the last two partial products.
PROCESSING APPARATUS AND PROCESSING METHOD
The present disclosure provides a processing device and method. The device includes: an input/output module, a controller module, a computing module, and a storage module. The input/output module is configured to store and transmit input and output data; the controller module is configured to decode a computation instruction into a control signal to control other modules to perform operation; the computing module is configured to perform four arithmetic operation, logical operation, shift operation, and complement operation on data; and the storage module is configured to temporarily store instructions and data. The present disclosure can execute a composite scalar instruction accurately and efficiently.
Multiply-accumulate “0” data gating
In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.