Patent classifications
G06F7/5336
MICROPROCESSOR WITH DYNAMICALLY ADJUSTABLE BIT WIDTH FOR PROCESSING DATA
A microprocessor with dynamically adjustable bit width is provided, which has a bit width register, a datapath, a statistical register, and a bit width adjuster. The bit width register stores at least one bit width. The datapath operates according to the bit width stored in the bit width register, acquiring input operands from received data and processing those operands. The statistical register collects the calculation results of the datapath. The bit width adjuster adjusts the bit width stored in the bit width register based on the calculation results collected in the statistical register.
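The feedback loop in this abstract (register, datapath, statistics, adjuster) can be modelled in a few lines. This is a minimal sketch under loudly stated assumptions: the class name AdjustableWidthDatapath, the "sum the operands" datapath operation, and the adjuster policy (resize to the width the largest collected result needs, clamped to a range) are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AdjustableWidthDatapath:
    width: int = 8                              # models the bit width register
    min_width: int = 1
    max_width: int = 32
    stats: list = field(default_factory=list)   # models the statistical register

    def unpack(self, packed: int, count: int) -> list:
        # Datapath front end: slice `count` unsigned operands of the current
        # width out of a packed input word, least significant operand first.
        mask = (1 << self.width) - 1
        return [(packed >> (i * self.width)) & mask for i in range(count)]

    def process(self, packed: int, count: int) -> int:
        # Hypothetical datapath operation: sum the operands and record the result.
        result = sum(self.unpack(packed, count))
        self.stats.append(result)
        return result

    def adjust(self) -> int:
        # Hypothetical adjuster policy: set the width to the number of bits the
        # largest collected result needs, clamped to [min_width, max_width],
        # then clear the statistics for the next observation window.
        if self.stats:
            needed = max(max(r.bit_length(), 1) for r in self.stats)
            self.width = max(self.min_width, min(self.max_width, needed))
            self.stats.clear()
        return self.width
```

For example, with a 4-bit width the word 0xF3 unpacks to the operands 3 and 15; their sum, 18, needs 5 bits, so adjust() widens the register to 5. A real adjuster would more likely act on overflow or saturation counts than on a single maximum.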
HIGH RADIX SUBSET CODE MULTIPLIER ARCHITECTURE
Systems, methods, and devices for enhancing the performance and efficiency of soft multiplier implementations are provided. More specifically, a method to implement soft multipliers with a high radix subset code architecture is provided. The techniques provided herein yield smaller multipliers that occupy less area, pack better, consume less power, and improve routing options on an integrated circuit.
COMMUTATIVE 1ULP HARDWARE MULTIPLIER
Described herein is a truncated modified Booth multiplier that is commutative and accurate to within 1 unit in the last place. In various embodiments, the truncated Booth multiplier is a radix-4 Booth multiplier or a radix-8 Booth multiplier. The truncated Booth multiplier can be included in the integer, floating-point, or fixed-point units of a graphics processor or compute accelerator, including matrix accelerator units or tensor processors.
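As background for the radix-4 case, the sketch below shows standard radix-4 (modified) Booth recoding: each overlapping 3-bit window of the two's-complement multiplier becomes one digit in {-2, -1, 0, 1, 2}, halving the number of partial products. It computes the exact product; the truncation and the commutative 1 ULP correction that the patent describes are not modelled, and the function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list:
    """Recode an n-bit two's-complement multiplier into radix-4 Booth digits in {-2..2}."""
    if n % 2:
        raise ValueError("n must be even")
    bits = y & ((1 << n) - 1)     # view y as an n-bit two's-complement pattern
    digits, prev = [], 0          # prev holds y[2i-1], with y[-1] = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1          # y[2i]
        b1 = (bits >> (i + 1)) & 1    # y[2i+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_radix4_multiply(x: int, y: int, n: int) -> int:
    """Exact (untruncated) x*y: one partial product d_i * x * 4**i per Booth digit."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))
```

For instance, the 4-bit multiplier -3 (bit pattern 1101) recodes to the digits [1, -1], since 1 + (-1)*4 = -3. A truncated multiplier would then discard the low-order columns of these partial products and add a correction term.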
MULTIPLICATION OF FIRST AND SECOND OPERANDS USING REDUNDANT REPRESENTATION
A method is provided for multiplying a first operand comprising at least two X-bit portions and a second operand comprising at least one Y-bit portion. At least two partial products are generated, each comprising the product of a selected X-bit portion of the first operand and a selected Y-bit portion of the second operand. Each partial product is converted to a redundant representation based on significance-indicating information that indicates the significance of the partial product. In the redundant representation, the partial product is represented using a number of N-bit portions, and in a group of at least two adjacent N-bit portions, a number of overlap bits of the lower N-bit portion of the group have the same significance as some of the least significant bits of at least one upper N-bit portion of the group. The partial products are added while they are in the redundant representation.
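The scheme above can be illustrated with a small model in which every lane is N bits wide, but adjacent lanes are only N minus V bits apart in significance, so the top V bits of each lane overlap the bottom of the next. Partial products are placed into lanes according to their significance and added lane by lane with no carry propagation between lanes; carries accumulate in the overlap bits and are resolved once at the end. This is a sketch under assumptions: the lane width of 16, the 4 overlap bits, unsigned operands, and all function names are hypothetical choices, not details from the abstract.

```python
LANE_BITS = 16                    # N: width of each lane (illustrative choice)
OVERLAP_BITS = 4                  # overlap bits at the top of each lane (assumed)
STEP = LANE_BITS - OVERLAP_BITS   # significance step between adjacent lanes


def to_redundant(value: int, significance: int, lanes: int) -> list:
    """Place value * 2**significance into lanes holding STEP bits each (overlap bits zero)."""
    shifted = value << significance
    if shifted >> (lanes * STEP):
        raise ValueError("not enough lanes for this partial product")
    mask = (1 << STEP) - 1
    return [(shifted >> (i * STEP)) & mask for i in range(lanes)]


def add_lanes(acc: list, term: list) -> list:
    """Lane-wise addition with no carry between lanes; carries collect in the overlap bits."""
    out = [a + t for a, t in zip(acc, term)]
    if any(v >> LANE_BITS for v in out):
        raise OverflowError("overlap bits exhausted")
    return out


def from_redundant(acc: list) -> int:
    """Resolve the overlaps: lane i is worth acc[i] * 2**(i * STEP)."""
    return sum(v << (i * STEP) for i, v in enumerate(acc))


def multiply(a: int, b: int, x_bits: int = 8, a_parts: int = 4,
             y_bits: int = 16, b_parts: int = 1, lanes: int = 5) -> int:
    acc = [0] * lanes
    for i in range(a_parts):
        a_i = (a >> (i * x_bits)) & ((1 << x_bits) - 1)      # i-th X-bit portion
        for j in range(b_parts):
            b_j = (b >> (j * y_bits)) & ((1 << y_bits) - 1)  # j-th Y-bit portion
            significance = i * x_bits + j * y_bits
            acc = add_lanes(acc, to_redundant(a_i * b_j, significance, lanes))
    return from_redundant(acc)
```

With 12 significant bits per lane and 16-bit lanes, up to 16 partial products can be summed before any lane can overflow, which is the headroom the overlap bits buy.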
MULTIPLIER-ACCUMULATOR CIRCUIT WITH PATH MATCHING
A multiplier-accumulator circuit is disclosed, comprising a partial product generation (PPG) module, summation circuitry, and path matching circuitry. The PPG module is configured to receive n multiplicands and n multipliers and to generate multiple partial products according to a predefined multiplication algorithm. The summation circuitry, coupled to the PPG module, comprises S levels of compressors built from carry-save adders that sum the partial products and multiple previous accumulation terms to produce multiple current accumulation terms, such that each bit of the current accumulation terms has substantially the same path delay from the inputs to the outputs of the summation circuitry. The path matching circuitry comprises multiple components that receive a first clock signal and generate a second clock signal. These components comprise either a first number of logic gates connected in series or the same cells as those embedded in the summation circuitry.
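The arithmetic side of this circuit can be sketched as a carry-save accumulation: all partial products, together with the previous sum and carry accumulation terms, pass through levels of 3:2 compressors until only two terms (a new sum and carry) remain. The sketch assumes unsigned operands, a fixed 32-bit word, and simple radix-2 partial products as a stand-in for the unspecified "predefined multiplication algorithm"; the function names are hypothetical, and the clock path-matching circuitry is a timing feature that is not modelled.

```python
def csa(a: int, b: int, c: int, mask: int) -> tuple:
    # One row of full adders: sum bits and left-shifted carry bits kept separate.
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry


def compress(terms: list, mask: int) -> tuple:
    """Wallace-style reduction: each level feeds groups of three terms through CSAs."""
    levels = 0
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(csa(terms[k], terms[k + 1], terms[k + 2], mask))
        terms = nxt + terms[full:]   # leftover terms pass straight to the next level
        levels += 1
    return terms[0], terms[1], levels


def mac_step(multiplicands, multipliers, acc_sum, acc_carry, width=32):
    mask = (1 << width) - 1
    terms = []
    for a, b in zip(multiplicands, multipliers):
        # Radix-2 partial products (assumed PPG): a << k for each set bit k of b.
        terms += [(a << k) & mask for k in range(b.bit_length()) if (b >> k) & 1]
    terms += [acc_sum, acc_carry]   # previous accumulation terms join the same tree
    return compress(terms, mask)
```

The accumulator stays in carry-save form between steps, so the result (sum + carry) modulo 2**width is only resolved by a carry-propagate adder when needed; for inputs [3, 5, 7] and [11, 13, 17] it equals 217.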