G06F7/49957

METHOD AND APPARATUS WITH NEURAL NETWORK PARAMETER QUANTIZATION
20190122100 · 2019-04-25 · ·

Provided is a processor implemented method that includes performing training or an inference operation with a neural network by obtaining a parameter for the neural network in a floating-point format, applying a fractional length of a fixed-point format to the parameter in the floating-point format, performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process, and performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the ALU.

ROUND FOR REROUND MODE IN A DECIMAL FLOATING POINT INSTRUCTION

A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.

REDUCED FLOATING-POINT PRECISION ARITHMETIC CIRCUITRY
20190042191 · 2019-02-07 ·

The present embodiments relate to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry. A specialized processing block may receive four floating-point numbers that represent two single-precision floating-point numbers, each separated into an LSB portion and an MSB portion, or four half-precision floating-point numbers. A first partial product generator may generate a first partial product of first and second input signals, while a second partial product generator may generate a second partial product of third and fourth input signals. A compressor circuit may generate carry and sum vector signals based on the first and second partial products; and circuitry may anticipate rounding and normalization operations by generating in parallel based on the carry and sum vector signals at least two results when performing the single-precision floating-point operation and at least four results when performing the two half-precision floating-point operations.

DEEP NEURAL NETWORK ARCHITECTURE USING PIECEWISE LINEAR APPROXIMATION
20190042922 · 2019-02-07 ·

In one embodiment, an apparatus comprises a log circuit to: identify an input associated with a logarithm operation, wherein the logarithm operation is to be performed by the log circuit using piecewise linear approximation; identify a first range that the input falls within, wherein the first range is identified from a plurality of ranges associated with a plurality of piecewise linear approximation (PLA) equations for the logarithm operation, and wherein the first range corresponds to a first equation of the plurality of PLA equations; compute a result of the first equation based on a plurality of operands associated with the first equation; and return an output associated with the logarithm operation, wherein the output is generated based at least in part on the result of the first equation.

FLOATING POINT CHAINED MULTIPLY ACCUMULATE
20190026077 · 2019-01-24 ·

Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.

Rounding floating point numbers
10146503 · 2018-12-04 · ·

Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both a normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.

Digital signal processor using signed magnitude and wireless communication receiver having the same

A digital signal processor is provided. The digital signal processor includes an execution circuit configured to receive a first data including first bits expressed in a signed magnitude method and a second data including second bits expressed in the signed magnitude method, and a control logic circuit configured to output a control signal that determines a type of operation on the first data and the second data based on a command signal, wherein the execution circuit is further configured to perform an operation on the first data and the second data according to a determined type of operation and generate a result of the operation.

Round for reround mode in a decimal floating point instruction

A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.

Reduced floating-point precision arithmetic circuitry
10073676 · 2018-09-11 · ·

The present embodiments relate to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry. A specialized processing block may receive four floating-point numbers that represent two single-precision floating-point numbers, each separated into an LSB portion and an MSB portion, or four half-precision floating-point numbers. A first partial product generator may generate a first partial product of first and second input signals, while a second partial product generator may generate a second partial product of third and fourth input signals. A compressor circuit may generate carry and sum vector signals based on the first and second partial products; and circuitry may anticipate rounding and normalization operations by generating in parallel based on the carry and sum vector signals at least two results when performing the single-precision floating-point operation and at least four results when performing the two half-precision floating-point operations.

Calculation control indicator cache
10019229 · 2018-07-10 · ·

A microprocessor comprises an instruction execution unit operable to generate an intermediate result vector and a plurality of calculation control indicators and storage external to the instruction execution unit which stores the intermediate result vector and the plurality of calculation control indicators. The intermediate result vector is generated from an application of at least a first arithmetic operation of a compound arithmetic operation. The calculation control indicators indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The subsequent calculations may involve one or more remaining arithmetic operations of the compound arithmetic operation. The intermediate result vector, in combination with the plurality of calculation control indicators, provides sufficient information to generate a result indistinguishable from an infinitely precise calculation of the compound arithmetic operation whose result is reduced in significance to a target data size.