G06F5/012

Floating Point Number Calculation Circuit and Floating Point Number Calculation Method
20230266941 · 2023-08-24 ·

A splitting circuit included in a floating-point number calculation circuit splits a mantissa part of a first floating-point number and a mantissa part of a second floating-point number. An exponential processing circuit obtains a second number of shifted bits of each mantissa part obtained after splitting. A calculation circuit calculates a product of the mantissa part of the first floating-point number and the mantissa part of the second floating-point number based on each mantissa part obtained after splitting and the second number of shifted bits of each mantissa part obtained after splitting. The floating-point number calculation circuit can thus split a large bit-width floating-point number into small bit-width floating-point numbers, so that a small bit-width multiplier can be used to multiply large bit-width floating-point numbers.
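The splitting idea can be illustrated with a minimal Python sketch. It is not the circuit claimed in the application: the function names, the 12-bit chunk width, and the way each chunk's shift is derived from its position are assumptions made for illustration only. It shows that a product of two wide mantissas can be assembled from narrow partial products, each shifted by the combined offset of its chunks.

```python
def split_mantissa(mantissa, part_bits, parts):
    """Split a mantissa into `parts` chunks of `part_bits` bits each.

    Returns (chunk, shift) pairs, where `shift` is the bit offset of the
    chunk inside the original mantissa (hypothetical helper).
    """
    mask = (1 << part_bits) - 1
    return [((mantissa >> (i * part_bits)) & mask, i * part_bits)
            for i in range(parts)]


def multiply_split(m_a, m_b, part_bits=12, parts=2):
    """Multiply two mantissas using only part_bits x part_bits products."""
    product = 0
    for chunk_a, shift_a in split_mantissa(m_a, part_bits, parts):
        for chunk_b, shift_b in split_mantissa(m_b, part_bits, parts):
            # Each narrow product is shifted by the sum of its chunks' offsets.
            product += (chunk_a * chunk_b) << (shift_a + shift_b)
    return product


# Two 24-bit significands, the width used by IEEE 754 binary32.
m_a = 0xC90FDB
m_b = 0xADF854
print(multiply_split(m_a, m_b) == m_a * m_b)  # True
```

With two 12-bit chunks per operand, four 12x12 multiplications replace one 24x24 multiplication.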

METHOD AND APPARATUS FOR FLOATING-POINT DATA TYPE MATRIX MULTIPLICATION BASED ON OUTER PRODUCT

Disclosed herein is a method for outer-product-based matrix multiplication for a floating-point data type. The method includes receiving first floating-point data and second floating-point data and performing matrix multiplication on the first floating-point data and the second floating-point data, where the result value of the matrix multiplication is calculated based on the suboperation result values of floating-point units.
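As a rough illustration of the outer-product formulation (not the apparatus described in the abstract), the sketch below accumulates a matrix product as a sum of rank-1 updates: for each index k, the outer product of column k of the first matrix with row k of the second is added to the result. The function name `matmul_outer` and the list-of-lists representation are assumptions.

```python
def matmul_outer(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols) by
    accumulating one outer product per inner index k."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for k in range(inner):
        # Rank-1 update: column k of a times row k of b.
        for i in range(rows):
            a_ik = a[i][k]
            for j in range(cols):
                c[i][j] += a_ik * b[k][j]
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_outer(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

In this ordering every multiply-accumulate inside one k step is independent of the others, which is why the formulation maps naturally onto an array of floating-point units.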

BIT-WIDTH OPTIMIZATION METHOD FOR PERFORMING FLOATING POINT TO FIXED POINT CONVERSION
20220137922 · 2022-05-05 ·

Provided is a bit-width optimization method for performing floating point to fixed point conversion (FFC) by at least one processor. The bit-width optimization method includes receiving a first floating-point value which represents a minimum value among floating-point values to be converted, receiving a second floating-point value which represents a maximum value among the floating-point values to be converted, receiving a maximum permissible error rate for performing FFC, calculating a minimum bit width of fixed-point notation which satisfies the maximum permissible error rate on the basis of the first floating-point value, the second floating-point value, and the maximum permissible error rate, and calculating a scale factor for FFC on the basis of the second floating-point value and the calculated minimum bit width.
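The abstract does not give the error formula, so the following Python sketch rests on stated assumptions: values are positive, the fixed-point format is unsigned, the worst-case rounding error is half a quantization step, and the error rate is measured relative to the smallest value. The function names are hypothetical. Under those assumptions it searches for the smallest bit width that meets the permitted error rate and derives the scale factor from the maximum value and that width.

```python
def min_bit_width(v_min, v_max, max_error_rate, limit=64):
    """Smallest unsigned bit width whose worst-case relative rounding
    error (half a step, measured at v_min) stays within max_error_rate.
    Assumes 0 < v_min <= v_max; the error model is an assumption."""
    for bits in range(1, limit + 1):
        step = v_max / ((1 << bits) - 1)
        if (step / 2) / v_min <= max_error_rate:
            return bits
    raise ValueError("no bit width up to the limit satisfies the error rate")


def scale_factor(v_max, bits):
    """Scale that maps v_max onto the largest representable integer."""
    return ((1 << bits) - 1) / v_max


bits = min_bit_width(0.5, 8.0, 0.01)
scale = scale_factor(8.0, bits)
print(bits)  # 10

# Round trip: quantize to an integer, then convert back.
x = 0.5
restored = round(x * scale) / scale
print(abs(restored - x) / x <= 0.01)  # True
```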

NORMALIZER AND MULTIPLICATION AND ACCUMULATION (MAC) OPERATOR INCLUDING THE NORMALIZER
20230244442 · 2023-08-03 ·

A normalizer includes a “0” search circuit configured to search for a position of a most significant “0” bit of first mantissa data included in input data to output first search data, a “1” search circuit configured to search for a position of a most significant “1” bit of the first mantissa data included in the input data to output second search data, a selector configured to output one selected by a bit value of first sign data of the input data between the first search data and the second search data, as selected data, an exponent adder configured to add first exponent data included in the input data and the selected data to output second exponent data included in output data, and a mantissa shifter configured to perform a shifting operation on the first mantissa data, based on the selected data to output second mantissa data included in the output data.
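A software model can make the data flow easier to follow. The Python sketch below is an assumption-laden illustration, not the claimed circuit: it treats the mantissa as an 8-bit two's-complement pattern, runs both searches below the sign bit, selects one result by the sign, and then shifts the mantissa left while lowering the exponent by the same amount. The function names, the 8-bit width, and the sign convention of the exponent adjustment are all hypothetical.

```python
from typing import Optional


def leading_bit_position(bits: int, value: int, width: int) -> Optional[int]:
    """Position of the most significant bit equal to `value`,
    searching below the sign bit. Returns None if there is none."""
    for pos in range(width - 2, -1, -1):
        if (bits >> pos) & 1 == value:
            return pos
    return None


def normalize(sign: int, exponent: int, mantissa: int, width: int = 8):
    """Normalize a two's-complement mantissa given as an unsigned pattern."""
    zero_pos = leading_bit_position(mantissa, 0, width)  # "0" search
    one_pos = leading_bit_position(mantissa, 1, width)   # "1" search
    # Selector: negative values use the "0" search, positive the "1" search.
    selected = zero_pos if sign else one_pos
    if selected is None:
        return exponent, mantissa  # all zeros or all ones: nothing to shift
    shift = (width - 2) - selected
    mask = (1 << width) - 1
    # Shifting left by `shift` is compensated by lowering the exponent.
    return exponent - shift, (mantissa << shift) & mask


print(normalize(0, 0, 0b00000101))  # (-4, 80)
print(normalize(1, 0, 0b11111011))  # (-4, 176)
```

In the second call the pattern 0b11111011 is -5 in two's complement; the result 176 is the pattern for -80, and -80 scaled by 2^-4 is again -5.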

ACCELERATION CIRCUITRY
20220027129 · 2022-01-27 ·

Systems, apparatuses, and methods related to acceleration circuitry are described. The acceleration circuitry may be deployed in a memory device and can include a memory resource and/or logic circuitry. The acceleration circuitry can perform operations on data to convert the data between one or more numeric formats, such as floating-point and/or universal number (e.g., posit) formats. The acceleration circuitry can perform arithmetic and/or logical operations on the data after the data has been converted to a particular format. For instance, the memory resource can receive data comprising a bit string having a first format that provides a first level of precision. The logic circuitry can receive the data from the memory resource and convert the bit string to a second format that provides a second level of precision that is different from the first level of precision.

Repurposed hexadecimal floating point data path

A method includes dividing a fraction of a floating point result into a first portion and a second portion. The method includes outputting a first normalizer result based on the first portion during a first clock cycle. The method includes storing a first segment of the first portion during the first clock cycle. The method includes outputting a first rounder result based on the first normalizer result during the first clock cycle. The method includes outputting a second normalizer result based on the second portion during a second clock cycle. The method includes outputting a second rounder result based on the second normalizer result and the first segment during the second clock cycle.

Preparation and execution of quantized scaling on integrated circuitry

Preparation and execution of quantized scaling may be performed by operations including obtaining an original array and a scaling factor representing a ratio of a size of the original array to a size of a scaled array, determining, for each column of the scaled array, a horizontal coordinate of each of two nearest elements in the horizontal dimension of the original array, and, for each row of the scaled array, a vertical coordinate of each of two nearest elements in the vertical dimension of the original array, calculating, for each row of the scaled array and each column of the scaled array, a linear interpolation coefficient, converting each value of the original array from a floating point number into a quantized number, converting each linear interpolation coefficient from a floating point number into a fixed point number, storing, in a memory, the horizontal coordinates and vertical coordinates as integers, the values as quantized numbers, and the linear interpolation coefficients as fixed point numbers.
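The preparation and execution steps can be sketched in one dimension. The Python example below is a simplified illustration under assumptions: it handles a single axis rather than rows and columns, uses 8 fractional bits for the fixed-point coefficients, and takes already-quantized integer values as input. The names `prepare_axis`, `scale_row`, and `FRAC_BITS` are hypothetical.

```python
FRAC_BITS = 8  # assumed fixed-point precision of the coefficients


def prepare_axis(src_size, dst_size):
    """For each output index, find the two nearest source indices (as
    integers) and the interpolation coefficient (as fixed point)."""
    scale = src_size / dst_size  # ratio of original size to scaled size
    coords, coeffs = [], []
    for d in range(dst_size):
        pos = d * scale
        left = min(int(pos), src_size - 1)
        right = min(left + 1, src_size - 1)
        coords.append((left, right))
        coeffs.append(round((pos - left) * (1 << FRAC_BITS)))
    return coords, coeffs


def scale_row(row_q, coords, coeffs):
    """Interpolate quantized values using integer arithmetic only."""
    one = 1 << FRAC_BITS
    half = one >> 1  # rounding offset before the final shift
    return [(row_q[l] * (one - c) + row_q[r] * c + half) >> FRAC_BITS
            for (l, r), c in zip(coords, coeffs)]


coords, coeffs = prepare_axis(4, 8)
print(scale_row([0, 100, 200, 300], coords, coeffs))
# [0, 50, 100, 150, 200, 250, 300, 300]
```

Once the coordinates and coefficients are stored, the execution step needs no floating-point operations.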

REPURPOSED HEXADECIMAL FLOATING POINT DATA PATH

A method includes dividing a fraction of a floating point result into a first portion and a second portion. The method includes outputting a first normalizer result based on the first portion during a first clock cycle. The method includes storing a first segment of the first portion during the first clock cycle. The method includes outputting a first rounder result based on the first normalizer result during the first clock cycle. The method includes outputting a second normalizer result based on the second portion during a second clock cycle. The method includes outputting a second rounder result based on the second normalizer result and the first segment during the second clock cycle.

CIRCULAR ACCUMULATOR FOR FLOATING POINT ADDITION
20220004362 · 2022-01-06 ·

Certain aspects of the present disclosure are directed to methods and apparatus for circular floating point addition. An example method generally includes obtaining a first floating point number represented by a first significand and a first exponent, obtaining a second floating point number represented by a second significand and second exponent, and adding the first floating point number and the second floating point number using a circular accumulator device.

Converting floating point numbers to reduce the precision

A hardware module comprising at least one of: one or more field programmable gate arrays and one or more application specific integrated circuits configured to: receive a number in floating-point representation at a first precision level, the number comprising an exponent and a first mantissa; apply a first random number to the first mantissa to generate a first carry; truncate the first mantissa to a level specified by a second precision level; add the first carry to the least significant bit of the mantissa truncated to the level specified by the second precision level to form a mantissa for the number in floating-point representation at the second precision level.
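The carry-generation scheme resembles what is commonly called stochastic rounding, and a software analogue may help. The Python sketch below is an illustration under assumptions, not the claimed hardware: it reduces an IEEE 754 binary32 value to a format that keeps the top 16 bits (7 mantissa bits), adds a random number to the bits about to be discarded to produce the carry, truncates, and adds the carry to the least significant kept bit. The function names are hypothetical, and NaN and infinity inputs are not handled.

```python
import random
import struct


def float_to_bits(x):
    """Reinterpret a binary32 value as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Reinterpret a 32-bit unsigned integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def stochastic_round(x, drop_bits=16, rng=random):
    """Drop the low `drop_bits` mantissa bits with a random carry."""
    bits = float_to_bits(x)
    low_mask = (1 << drop_bits) - 1
    # Random number added to the discarded bits; overflow is the carry.
    carry = ((bits & low_mask) + rng.getrandbits(drop_bits)) >> drop_bits
    # Truncate, then add the carry to the least significant kept bit.
    # A carry out of the mantissa moves into the exponent field, which
    # yields the next representable value (assumes finite input).
    kept = (bits >> drop_bits) + carry
    return bits_to_float((kept << drop_bits) & 0xFFFFFFFF)


rng = random.Random(0)
x = 1.00390625  # 1 + 2**-8, exactly halfway between two reduced values
samples = [stochastic_round(x, rng=rng) for _ in range(20000)]
print(sorted(set(samples)))  # [1.0, 1.0078125]
print(sum(samples) / len(samples))  # close to 1.00390625
```

Each individual result is one of the two neighbouring low-precision values, while the average over many conversions approaches the original number, which plain truncation does not achieve.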