Patent classifications
G06F2207/3884
ROUTING INSTRUCTIONS IN A MICROPROCESSOR
A computer system, processor, programming instructions, and/or method for balancing the workload of processing pipelines. The processor includes an execution slice comprising at least two processing pipelines having one or more execution units for processing instructions, wherein at least a first processing pipeline and a second processing pipeline are capable of executing a first instruction type; and an instruction decode unit for decoding instructions to determine which of the first processing pipeline or the second processing pipeline is to execute the first instruction type. The processor is configured to calculate at least one of a workload group consisting of: the first processing pipeline workload, the second processing pipeline workload, and combinations thereof; and to select the first processing pipeline or the second processing pipeline to execute the first instruction type based upon at least one of the workload group.
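The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the claimed implementation: it assumes that a pipeline's workload is the sum of the latencies of the instructions already routed to it, and the names `select_pipeline` and `route` are hypothetical.

```python
def select_pipeline(workloads):
    """Return the index of the least-loaded pipeline (ties go to the lower index)."""
    return min(range(len(workloads)), key=lambda i: workloads[i])


def route(instruction_costs, n_pipelines=2):
    """Route each instruction to the pipeline with the smallest workload so far.

    Assumption: workload is modeled as accumulated instruction latency.
    """
    workloads = [0] * n_pipelines
    assignment = []
    for cost in instruction_costs:
        p = select_pipeline(workloads)
        workloads[p] += cost      # the chosen pipeline becomes busier
        assignment.append(p)
    return assignment, workloads


assignment, workloads = route([3, 1, 1, 1])
print(assignment, workloads)
```

A real decode unit would likely track workload with counters of issued but unfinished instructions; the greedy rule here only shows how a calculated workload can drive the choice between two pipelines that both support the instruction type.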
Process for Performing Floating Point Multiply-Accumulate Operations with Precision Based on Exponent Differences for Saving Power
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates an integer form fraction accompanied by a sign bit and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. A range estimating process determines a possible range of values from the exponent differences and determines an adder precision. A summing process adds all of the integer form fractions using the determined adder precision and converts the sum to a floating point value using the maximum exponent sum and the sign bit of the summed integer form fractions, optionally performing a 2's complement of the summed integer form fraction if the sign bit is negative.
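The data flow in this abstract (exponent sums, exponent differences against the maximum, aligned integer form fractions, one final conversion) can be sketched as follows. This is a simplified software model under stated assumptions: the fraction width `FRAC_BITS` is a fixed, hypothetical value rather than one chosen by a range estimator, and Python's signed integers stand in for the sign bit and 2's complement handling.

```python
import math

FRAC_BITS = 24  # assumed width of the integer form fraction (not from the abstract)


def mac(inputs, coeffs, frac_bits=FRAC_BITS):
    """Multiply-accumulate by aligning integer fractions to the maximum exponent sum."""
    terms = []
    for x, c in zip(inputs, coeffs):
        mx, ex = math.frexp(x)              # x == mx * 2**ex, 0.5 <= |mx| < 1
        mc, ec = math.frexp(c)
        terms.append((mx * mc, ex + ec))    # product fraction and exponent sum

    max_exp = max(e for _, e in terms)      # maximum exponent sum of all pairs

    total = 0
    for frac, e in terms:
        diff = max_exp - e                  # exponent difference
        # Integer form fraction, shifted right so that all terms share max_exp.
        total += int(round(frac * (1 << frac_bits))) >> diff

    # Convert the integer sum back to floating point using the maximum exponent sum.
    return math.ldexp(total, max_exp - frac_bits)


print(mac([1.5, 2.0, -0.25], [2.0, 0.5, 4.0]))
```

Terms with a large exponent difference lose their low-order bits in the right shift, which is why a hardware design can narrow the adder when the differences show that those bits cannot affect the result.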
Power Saving Floating Point Multiplier-Accumulator with Precision-Aware Accumulation
A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction and a maximum exponent. A range estimator forms a possible range of values from the exponent differences and determines an adder precision. The integer form fractions are summed using the adder precision, a sign bit is extracted, and a floating point value is output. Each MAC processor provides its integer form fraction with a precision determined by the MAC processor's exponent difference.
Process for Dual Mode Floating Point Multiplier-Accumulator with High Precision Mode for Near Zero Accumulation Results
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidth are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the integer form fractions of the second bitwidth are summed by an adder tree using the second bitwidth to form a greater precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to a normalizer which generates a floating point result.
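The dual mode idea can be illustrated with a small Python model: sum at a narrow width first, and fall back to a wide width only when cancellation leaves mostly leading zeros. The bitwidths (12 and 48) and the 50% leading-zero threshold are assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def _int_sum(terms, max_exp, bits):
    """Sum integer form fractions of the given width, aligned to max_exp."""
    total = 0
    for frac, e in terms:
        total += int(round(frac * (1 << bits))) >> (max_exp - e)
    return total


def dual_mode_mac(inputs, coeffs, narrow=12, wide=48, zero_fraction=0.5):
    """Return (result, bitwidth used). Widths and threshold are illustrative."""
    terms = []
    for x, c in zip(inputs, coeffs):
        mx, ex = math.frexp(x)
        mc, ec = math.frexp(c)
        terms.append((mx * mc, ex + ec))
    max_exp = max(e for _, e in terms)

    bits = narrow
    total = _int_sum(terms, max_exp, bits)

    # Leading zeros of the narrow sum; many of them signal a near-zero result.
    leading_zeros = narrow - abs(total).bit_length()
    if leading_zeros > zero_fraction * narrow:
        bits = wide
        total = _int_sum(terms, max_exp, bits)  # redo the sum at high precision

    return math.ldexp(total, max_exp - bits), bits


print(dual_mode_mac([1.0, -1.0], [1.0001, 1.0]))  # near-zero sum, wide path
print(dual_mode_mac([1.0, 1.0], [1.0, 1.0]))      # no cancellation, narrow path
```

In the first call the narrow sum cancels to zero, so the wide path recovers the small difference; in the second call the narrow sum is already well populated and the wide adder is never used.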
PIPELINED HARDWARE TO ACCELERATE MODULAR ARITHMETIC OPERATIONS
Embodiments are directed to elliptic curve cryptography scalar multiplications in a generic field with heavy pipelining between field operations. A bit width is determined of operands in data to be processed by a modular hardware block. It is checked whether the bit width of the operands matches a fixed bit width of the modular hardware block. In response to there being a match, the modular hardware block processes the operands. In response to there being a mismatch, the operands are modified to be accommodated by the fixed bit width of the modular hardware block.
MULTIPLY-ACCUMULATE WITH VARIABLE FLOATING POINT PRECISION
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations, wherein each multiplier-accumulator circuit includes: (i) a multiplier to multiply first input data, having a first floating point data format, by a filter weight data, having the first floating point data format, and generate and output a product data having a second floating point data format, and (ii) an accumulator, coupled to the multiplier of the associated MAC circuit, to add second input data and the product data output by the associated multiplier to generate sum data. The plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline may be connected in series and, in operation, perform a plurality of concatenated multiply and accumulate operations.
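A series-connected chain of such multiplier-accumulator circuits can be modeled in Python as below. The abstract only speaks of a first and a second floating point format; the choice of IEEE half precision for the inputs and single precision for the product and sum is an assumption made for this sketch, as are the function names.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision (assumed first format)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_fp32(x):
    """Round a Python float to IEEE single precision (assumed second format)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def mac_chain(data, weights, initial=0.0):
    """Model MAC circuits connected in series: each stage adds its product
    to the sum handed on by the previous stage."""
    acc = to_fp32(initial)
    for d, w in zip(data, weights):
        product = to_fp32(to_fp16(d) * to_fp16(w))  # multiplier output, second format
        acc = to_fp32(acc + product)                # accumulator of this stage
    return acc


print(mac_chain([1.0, 2.0, 3.0], [0.5, 0.25, 2.0]))
```

Widening the product before accumulation keeps the full result of a half-precision multiply, since the product of two 11-bit significands fits in a 24-bit single-precision significand.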
Multiplier-Accumulator Circuitry and Pipeline using Floating Point Data, and Methods of using Same
An integrated circuit including a multiplier-accumulator execution pipeline including a plurality of multiplier-accumulator circuits to, in operation, perform multiply and accumulate operations, wherein each multiplier-accumulator circuit includes: (i) a multiplier to multiply first input data, having a first floating point data format, by a filter weight data, having the first floating point data format, and generate and output a product data having a second floating point data format, and (ii) an accumulator, coupled to the multiplier of the associated MAC circuit, to add second input data and the product data output by the associated multiplier to generate sum data. The plurality of multiplier-accumulator circuits of the multiplier-accumulator execution pipeline may be connected in series and, in operation, perform a plurality of concatenated multiply and accumulate operations.
Floating-point number operation circuit and method
This invention discloses a floating-point number operation circuit and a method thereof. The floating-point number operation circuit is configured to perform a fused multiplication and accumulation (fused mac) operation or a multiplication and accumulation (mac) operation on a first operand, a second operand, and a third operand, or perform a multiplication operation on the first operand and the second operand. The floating-point number operation circuit includes two rounding circuits, a multiplication circuit, a selection circuit, a control circuit, and an addition circuit. The control circuit controls the scheduling of various operations and the use of resources on each calculation path.
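The numerical difference between the two modes named in this abstract is that a fused operation rounds once, after the addition, while a non-fused operation rounds the product and then rounds the sum. A minimal Python sketch of that difference, using exact rational arithmetic to emulate the single rounding (the circuit itself is not modeled, and the function names are hypothetical):

```python
from fractions import Fraction


def mac(a, b, c):
    """Non-fused: the product is rounded to double, then the sum is rounded."""
    return a * b + c


def fused_mac(a, b, c):
    """Fused: a*b + c is computed exactly and rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # float() rounds the exact value to the nearest double


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(mac(a, b, c), fused_mac(a, b, c))
```

Here the exact product is 1 - 2^-60, which rounds to 1.0 in double precision, so the non-fused path returns 0.0 while the fused path keeps the residual -2^-60.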
Implementing logarithmic and antilogarithmic operations based on piecewise linear approximation
Implementations of the disclosure provide logarithm and anti-logarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent and a mantissa. The piecewise linear log approximation circuit approximates a fractional portion of a fixed point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives an integer from the exponent.
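The split into an integer part taken from the exponent and a fractional part approximated linearly from the mantissa can be sketched as follows for a base-2 logarithm. The segment count of 8 is an assumption, the table `_KNOTS` stands in for constants a circuit would hold in a lookup table, and the result is returned as a float rather than a fixed point number for simplicity.

```python
import math

SEGMENTS = 8  # assumed number of linear pieces over the mantissa range

# Values of log2(1 + f) at the segment boundaries f = i / SEGMENTS.
_KNOTS = [math.log2(1 + i / SEGMENTS) for i in range(SEGMENTS + 1)]


def pwl_log2(x):
    """Approximate log2(x) for x > 0 with piecewise linear interpolation."""
    if x <= 0:
        raise ValueError("pwl_log2 is only defined here for positive inputs")
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= m < 1
    f = 2 * m - 1               # mantissa fraction, so x == (1 + f) * 2**(e - 1)
    integer = e - 1             # integer part comes straight from the exponent

    pos = f * SEGMENTS
    i = int(pos)                # which linear segment the mantissa falls in
    t = pos - i                 # position inside that segment, 0 <= t < 1
    frac = _KNOTS[i] + t * (_KNOTS[i + 1] - _KNOTS[i])
    return integer + frac


print(pwl_log2(8.0), pwl_log2(10.0), math.log2(10.0))
```

With 8 segments the interpolation error stays below about 0.003; more segments reduce the error at the cost of a larger table.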
ARITHMETIC PROCESSING DEVICE, IMAGE PROCESSING DEVICE, AND IMAGING DEVICE
An arithmetic processing device of a pipeline configuration in which a combination of a combination circuit and a flip-flop circuit group including a plurality of flip-flop circuits corresponding to each bit of output data of the combination circuit is connected in a plurality of stages includes a mask processing section configured to control a mask of an operation clock signal to be supplied to each flip-flop circuit, wherein the mask processing section is configured to supply the operation clock signal to each flip-flop circuit corresponding to a bit of the input data for use in the arithmetic process in the combination circuit, and wherein the mask processing section is configured to mask the operation clock signal corresponding to a bit of the input data that is unused in the arithmetic process in the combination circuit.
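The effect of masking the clock per bit can be modeled with integer bit operations: flip-flops whose clock is masked keep their old value, and only the bits in use capture new data. This is a behavioral sketch only; the 8-bit values and the function names are hypothetical.

```python
def clock_register(current, next_value, used_mask):
    """Model one clock edge of a flip-flop group with per-bit clock masking.

    Bits set in used_mask receive the clock and capture next_value;
    all other bits are masked and hold their current state.
    """
    return (current & ~used_mask) | (next_value & used_mask)


def toggles(before, after):
    """Count flip-flops that changed state, a rough proxy for switching activity."""
    return bin(before ^ after).count("1")


state = 0b1111_0000
data = 0b0000_1010
gated = clock_register(state, data, used_mask=0b0000_1111)    # upper bits unused
ungated = clock_register(state, data, used_mask=0b1111_1111)  # every bit clocked
print(bin(gated), toggles(state, gated), toggles(state, ungated))
```

In this example the gated register changes two flip-flops while the ungated one changes six, which illustrates why masking the clock of unused bits can reduce switching activity.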