G06F7/487

CIRCUITRY AND METHOD
20230005209 · 2023-01-05

Circuitry comprises ray tracing circuitry comprising a plurality of floating-point circuitries to perform floating-point processing operations to detect intersection between a virtual ray defined by a ray direction and a test region, the floating-point circuitries operating to a given precision to generate an output floating-point value comprising a significand and an exponent; in which at least some of the plurality of floating-point circuitries are configured to round using a predetermined directed rounding mode any denormal floating-point value generated by operation of that circuitry so as to output normal values, a denormal floating-point value being a floating-point value in which the significand comprises one or more leading zeroes.
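
The denormal handling described in this abstract can be illustrated with a minimal sketch. It assumes binary32 precision, treats signed zero as an acceptable non-denormal output, and uses a hypothetical helper `flush_denormal` with a `toward_positive` flag to select the directed rounding mode; none of these names or choices come from the publication.

```python
import math

MIN_NORMAL_F32 = 2.0 ** -126  # smallest positive normal binary32 magnitude


def flush_denormal(x: float, toward_positive: bool) -> float:
    """Replace a binary32-denormal value with a directed-rounded result.

    Assumption: rounding toward +inf sends positive denormals to the
    smallest normal and negative denormals to -0.0; rounding toward -inf
    mirrors that. Normal values, zeros and NaNs pass through unchanged.
    """
    if x == 0.0 or math.isnan(x) or abs(x) >= MIN_NORMAL_F32:
        return x
    if toward_positive:
        return MIN_NORMAL_F32 if x > 0 else -0.0
    return 0.0 if x > 0 else -MIN_NORMAL_F32


# A conservative ray/box test would typically round lower bounds down
# and upper bounds up, so a denormal never shrinks the tested interval.
lower = flush_denormal(1e-40, toward_positive=False)
upper = flush_denormal(1e-40, toward_positive=True)
```

Choosing the direction per operand in this way keeps the rounded interval at least as wide as the exact one, which is one plausible reason to prefer directed rounding over round-to-nearest here.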

Computer processor for higher precision computations using a mixed-precision decomposition of operations
11544057 · 2023-01-03

Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
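
As a rough illustration of the decomposition idea, the sketch below splits each binary64 operand into three binary32 pieces, multiplies the pieces pairwise, and sums the partial products. The piece count, the binary32 choice, and the helper names `to_f32`, `split` and `mul_decomposed` are assumptions for illustration; the separately stored per-operand exponent mentioned in the abstract is omitted.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split(x: float, parts: int = 3) -> list:
    """Decompose x into `parts` binary32 values whose sum approximates x."""
    pieces = []
    residual = x
    for _ in range(parts):
        piece = to_f32(residual)
        pieces.append(piece)
        residual -= piece
    return pieces


def mul_decomposed(a: float, b: float) -> float:
    """Multiply via lower precision pieces.

    Each product of two binary32 pieces fits exactly in binary64, so the
    Python multiplication stands in for a narrow multiplier with a wide
    output; the accumulation happens in binary64.
    """
    total = 0.0
    for pa in split(a):
        for pb in split(b):
            total += pa * pb
    return total


product = mul_decomposed(math.pi, math.e)
```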

APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR MATRIX MULTIPLICATION INSTRUCTIONS

Techniques for matrix multiplication are described. In some examples, decode circuitry is to decode a single instruction having fields for an opcode, an indication of a location of a first source operand, an indication of a location of a second source operand, and an indication of a location of a destination operand, wherein the opcode is to indicate that execution circuitry is to at least convert data elements of the first and second source operands from a first floating point representation to a second floating point representation, perform matrix multiplication with the converted data elements, and accumulate results of the matrix multiplication in the destination operand in the first floating point representation; and the execution circuitry is to execute the decoded instruction as specified by the opcode.
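
The convert, multiply and accumulate sequence can be sketched in software as follows. This assumes binary32 as the first representation and bfloat16 (by truncation) as the second; the abstract does not name either format, and `to_bf16`, `to_f32` and `matmul_accumulate` are hypothetical helpers.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Truncate a binary32 value to bfloat16 precision (keep top 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def matmul_accumulate(dst, a, b):
    """dst += a @ b, with source elements converted to bfloat16 first.

    The inner sum is kept in Python's binary64 and rounded to binary32
    only when stored back, which is a simplification of real hardware.
    """
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = dst[i][j]
            for k in range(len(b)):
                acc += to_bf16(a[i][k]) * to_bf16(b[k][j])
            dst[i][j] = to_f32(acc)
    return dst


result = matmul_accumulate(
    [[10.0, 0.0], [0.0, 10.0]],   # destination (accumulator) tile
    [[1.0, 2.0], [3.0, 4.0]],     # first source tile
    [[1.0, 0.0], [0.0, 1.0]],     # second source tile (identity)
)
```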

Process for Performing Floating Point Multiply-Accumulate Operations with Precision Based on Exponent Differences for Saving Power
20220405052 · 2022-12-22

A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates an integer form fraction accompanied by a sign bit and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. A range estimating process determines a possible range of values from the exponent differences and determines an adder precision. A summing process adds all of the integer form fractions using the determined adder precision, and converts the sum to a floating point value using the maximum exponent sum, sign bit of the summed integer form fractions, and optionally performs a 2's complement of the summed integer form fraction if the sign bit is negative.
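
A software sketch of this flow, under stated assumptions: inputs carry 8-bit significands (`MANT`), the adder supports at most 16 extra alignment bits (`CAP`), and the adder width is estimated as product width plus alignment span plus carry and sign bits. These constants, the width formula, and the names `decompose` and `mac` are illustrative and not taken from the publication. Python's signed integers stand in for the sign bit and 2's complement step.

```python
import math

MANT = 8   # assumed significand width of each input, in bits
CAP = 16   # assumed maximum alignment span the adder supports, in bits


def decompose(v: float):
    """Return (integer significand, exponent) with v ~= sig * 2**exp."""
    m, e = math.frexp(v)               # v == m * 2**e, 0.5 <= |m| < 1
    return int(m * (1 << MANT)), e - MANT


def mac(pairs):
    """Multiply-accumulate (input, coefficient) pairs via integer fractions."""
    terms = []
    for x, w in pairs:
        sx, ex = decompose(x)
        sw, ew = decompose(w)
        terms.append((sx * sw, ex + ew))       # integer product, exponent sum
    max_exp = max(e for _, e in terms)
    diffs = [max_exp - e for _, e in terms]    # exponent differences
    span = min(max(diffs), CAP)                # range estimate
    adder_bits = 2 * MANT + span + math.ceil(math.log2(len(terms))) + 1
    total = 0
    for (frac, _), d in zip(terms, diffs):
        shift = span - d
        # Terms further than CAP below the maximum lose low bits
        # (Python's >> floors for negative values).
        total += frac << shift if shift >= 0 else frac >> -shift
    return math.ldexp(total, max_exp - span), adder_bits


value, width = mac([(1.5, 2.0), (-0.5, 4.0), (0.25, 0.25)])
```

When all exponent sums are close together, `span` is small and a narrower adder suffices, which is the power-saving opportunity the abstract points at.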

Power Saving Floating Point Multiplier-Accumulator with Precision-Aware Accumulation
20220405051 · 2022-12-22

A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction and a maximum exponent. A range estimator forms a possible range of values from the exponent differences and determines an adder precision. The integer form fractions are summed using the adder precision, a sign bit is extracted, and a floating point value is output. Each MAC processor provides its integer form fraction with a precision determined by the MAC processor's exponent difference.

Process for Dual Mode Floating Point Multiplier-Accumulator with High Precision Mode for Near Zero Accumulation Results
20220405054 · 2022-12-22

A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidth are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the second bitwidth is used by an adder tree using the second bitwidth to form a greater precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to a normalizer which generates a floating point result.
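
The dual-mode idea can be sketched as below. For brevity the sketch starts from already-multiplied products; the bitwidths (12 and 24), the 50% leading-zero threshold, and the names `aligned_fractions` and `dual_mode_sum` are assumptions chosen for illustration.

```python
import math

NARROW, WIDE = 12, 24        # assumed first and second bitwidths
ZERO_FRACTION_LIMIT = 0.5    # assumed "excess leading 0s" threshold


def aligned_fractions(products, width):
    """Signed integer fractions of `width` bits aligned to the max exponent."""
    max_exp = max(math.frexp(p)[1] for p in products)
    fracs = [int(math.ldexp(p, width - max_exp)) for p in products]
    return fracs, max_exp


def dual_mode_sum(products):
    """Sum at the narrow width; redo at the wide width after cancellation."""
    width, mode = NARROW, "narrow"
    fracs, max_exp = aligned_fractions(products, width)
    total = sum(fracs)
    leading_zeros = max(width - abs(total).bit_length(), 0)
    if leading_zeros / width > ZERO_FRACTION_LIMIT:
        width, mode = WIDE, "wide"
        fracs, max_exp = aligned_fractions(products, width)
        total = sum(fracs)
    # Normalizer step: scale the integer sum back to a floating point value.
    return math.ldexp(total, max_exp - width), mode


near_zero, near_zero_mode = dual_mode_sum([1.0, -0.999])
plain, plain_mode = dual_mode_sum([1.0, 0.5])
```

In the first call the two products nearly cancel, so the narrow sum keeps only a couple of significant bits and the wider path is taken; in the second call the narrow result is already fully significant.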

Power Saving Floating Point Multiplier-Accumulator With a High Precision Accumulation Detection Mode
20220405053 · 2022-12-22

A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading 0 condition is detected, a second adder tree operative on the second bitwidth integer form fractions forms a second sum. The first sum or second sum, along with the maximum exponent, is converted into a floating point result.

SPARSE MATRIX MULTIPLICATION IN HARDWARE
20220382829 · 2022-12-01

Aspects of the disclosure provide for methods, systems, and apparatuses, including computer-readable storage media, for sparse matrix multiplication. A system for matrix multiplication includes an array of sparse shards. Each sparse shard can be configured to receive an input sub-matrix and an input sub-vector, where the input sub-matrix has a number of non-zero values equal to or less than a predetermined maximum non-zero threshold. The sparse shard can, by a plurality of multiplier circuits, compute one or more products of vector values multiplied with respective non-zero values of the input sub-matrix. The sparse shard can generate, as output of the sparse shard and using the one or more products, a shard output vector that is the product of applying the input sub-vector to the input sub-matrix.
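
One shard's work might look like the following minimal sketch. It assumes the sub-matrix arrives as a list of `(row, column, value)` triples and that the maximum non-zero threshold is 4; both the storage layout and the name `shard_multiply` are hypothetical.

```python
MAX_NONZEROS = 4  # assumed per-shard non-zero threshold


def shard_multiply(nonzeros, sub_vector, num_rows):
    """Multiply a sparse sub-matrix by a sub-vector inside one shard.

    `nonzeros` holds (row, col, value) triples for the non-zero entries.
    Each triple maps onto one multiplier circuit in this model.
    """
    if len(nonzeros) > MAX_NONZEROS:
        raise ValueError("sub-matrix exceeds the shard's non-zero threshold")
    output = [0.0] * num_rows
    for row, col, value in nonzeros:
        output[row] += value * sub_vector[col]
    return output


shard_output = shard_multiply(
    [(0, 1, 2.0), (2, 0, 3.0)],  # two non-zero entries of a 3x2 sub-matrix
    [1.0, 5.0],                  # input sub-vector
    num_rows=3,
)
```

Capping the non-zero count per shard bounds the number of multipliers each shard needs, so a denser matrix would have to be partitioned into more shards.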

HARDWARE ACCELERATOR METHOD AND DEVICE

A processor-implemented hardware accelerator method includes: receiving input data; loading a lookup table (LUT); determining an address of the LUT by inputting the input data to a comparator; obtaining a value of the LUT corresponding to the input data based on the address; and determining a value of a nonlinear function corresponding to the input data based on the value of the LUT, wherein the LUT is determined based on a weight of a neural network that outputs the value of the nonlinear function.
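
A minimal sketch of the comparator-plus-LUT lookup follows. It assumes each LUT entry stores a slope and an intercept for one input interval, and it uses hand-picked entries approximating a clipped linear function; in the method described above the entries would instead be derived from neural network weights. `BREAKPOINTS`, `LUT` and `lut_nonlinear` are illustrative names.

```python
import bisect

# Assumed comparator thresholds and per-interval (slope, intercept) entries.
# These values are hand-picked placeholders, not trained parameters.
BREAKPOINTS = [-2.0, 2.0]
LUT = [
    (0.0, -1.0),  # x < -2
    (0.5, 0.0),   # -2 <= x < 2
    (0.0, 1.0),   # x >= 2
]


def lut_nonlinear(x: float) -> float:
    """Approximate a nonlinear function with a comparator-addressed LUT."""
    address = bisect.bisect_right(BREAKPOINTS, x)  # comparator stage
    slope, intercept = LUT[address]                # LUT read
    return slope * x + intercept                   # evaluate the segment


samples = [lut_nonlinear(v) for v in (-3.0, -2.0, 1.0, 2.0)]
```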

Neural network method and apparatus with floating point processing

A processor-implemented method includes: receiving a first floating point operand and a second floating point operand, each having an n-bit format comprising a sign field, an exponent field, and a significand field; normalizing a binary value obtained by performing arithmetic operations for fields corresponding to each other in the first and second floating point operands for an n-bit multiplication operation; determining whether the normalized binary value is a number that is representable in the n-bit format or an extended normal number that is not representable in the n-bit format; according to a result of the determining, encoding the normalized binary value using an extension bit format in which an extension pin identifying whether the normalized binary value is the extended normal number is added to the n-bit format; and outputting the encoded binary value using the extension bit format, as a result of the n-bit multiplication operation.