G06F2207/382

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
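As an illustration of the fixed-point representation mentioned in this abstract, the following is a minimal sketch, not the claimed device. The function names, the 16-bit width, the 8 fractional bits, and the saturation behavior are all assumptions chosen for the example.

```python
def to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer, saturating at the range limits.

    frac_bits and total_bits are illustrative defaults, not values from the abstract.
    """
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(a, b, frac_bits: int = 8) -> int:
    """Dot product of fixed-point operands using integer arithmetic only.

    Each product carries 2 * frac_bits fractional bits, so the accumulated
    sum is shifted right once to return to frac_bits.
    """
    acc = sum(x * y for x, y in zip(a, b))
    return acc >> frac_bits


if __name__ == "__main__":
    a = [to_fixed(1.5), to_fixed(-2.0)]
    b = [to_fixed(2.0), to_fixed(0.25)]
    print(to_float(fixed_dot(a, b)))
```

The point of the sketch is that once the operands are quantized, the multiply-accumulate loop needs only integer multipliers and adders.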

NORMALIZER AND MULTIPLICATION AND ACCUMULATION (MAC) OPERATOR INCLUDING THE NORMALIZER
20230244442 · 2023-08-03

A normalizer includes a “0” search circuit configured to search for a position of a most significant “0” bit of first mantissa data included in input data to output first search data, a “1” search circuit configured to search for a position of a most significant “1” bit of the first mantissa data included in the input data to output second search data, a selector configured to output one selected by a bit value of first sign data of the input data between the first search data and the second search data, as selected data, an exponent adder configured to add first exponent data included in the input data and the selected data to output second exponent data included in output data, and a mantissa shifter configured to perform a shifting operation on the first mantissa data, based on the selected data to output second mantissa data included in the output data.
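The search, select, shift, and exponent-adjust steps described above can be sketched in software as follows. This is a hedged illustration only. It assumes a two's-complement mantissa of `width` bits, assumes the normalized position is the bit just below the sign bit, and assumes the selected data acts as a left-shift amount that is subtracted from the exponent. The function name and parameters are hypothetical.

```python
def normalize(sign: int, exponent: int, mantissa: int, width: int = 8):
    """Normalize a two's-complement mantissa (given as a width-bit pattern).

    For a non-negative value (sign == 0) the most significant 1 is searched;
    for a negative value (sign == 1) the most significant 0 is searched.
    Returns (new_exponent, new_mantissa_bits).
    """
    mask = (1 << width) - 1
    mantissa &= mask
    target = 0 if sign else 1  # selector: which search result to use

    pos = None
    for i in range(width - 1, -1, -1):  # scan from the MSB downwards
        if (mantissa >> i) & 1 == target:
            pos = i
            break
    if pos is None:
        # Every bit equals the sign extension; left unchanged in this sketch.
        return exponent, mantissa

    shift = (width - 2) - pos  # move the found bit just below the sign bit
    if shift < 0:
        raise ValueError("sign does not match the mantissa's top bit")
    return exponent - shift, (mantissa << shift) & mask


if __name__ == "__main__":
    # +5 with four redundant leading zeros, and -5 with four redundant leading ones
    print(normalize(0, 10, 0b00000101))
    print(normalize(1, 10, 0b11111011))
```

Both calls remove four redundant sign-extension bits, so the mantissa gains four bits of significance and the exponent drops by four.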

Processing unit with mixed precision operations

A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.

Matrix multiplication unit with flexible precision operations

A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
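The fetch, iterate, and write-back pattern in this abstract resembles a blocked matrix multiplication. The sketch below is a software analogy under that assumption, not the claimed hardware. `tiled_matmul`, `a_regs`, `b_regs`, and the tile size are hypothetical names and values.

```python
def tiled_matmul(a, b, tile: int = 2):
    """Multiply matrices a (n x k) and b (k x m) one k-slice at a time.

    Each pass over k0 models one "round": a portion of each matrix is
    fetched once, reused across all (i, j) combinations, and the
    accumulated result is written out only after the round completes.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]  # stands in for the output buffer

    for k0 in range(0, k, tile):
        # Fetch the current portions into "registers".
        a_regs = [row[k0:k0 + tile] for row in a]  # n x tile
        b_regs = b[k0:k0 + tile]                   # tile x m
        depth = len(b_regs)

        # Iterate over every combination of row subset and column subset.
        acc = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                acc[i][j] = sum(a_regs[i][t] * b_regs[t][j] for t in range(depth))

        # Write back after all iterations of this round are done.
        for i in range(n):
            for j in range(m):
                out[i][j] += acc[i][j]
    return out


if __name__ == "__main__":
    print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

Reusing each fetched portion for many multiply/accumulate iterations is what reduces register traffic relative to fetching operands per product.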

BIT STRING COMPRESSION
20220021399 · 2022-01-20

Systems, apparatuses, and methods related to bit string compression are described. A method for bit string compression can include determining that a particular operation is to be performed using a bit string formatted according to a universal number format or a posit format to alter a bit width associated with the bit string from a first bit width to a second bit width, and performing a compression operation on the bit string to alter its bit width from the first bit width to the second bit width. The method can further include writing the bit string having the second bit width to a first register, performing an arithmetic operation or a logical operation, or both, using the bit string having the second bit width, and monitoring a quantity of bits of a result of the operation.

Selectively changing arithmetic data types used in arithmetic execution of deep learning applications based on expressible ratio and fluctuation value comparisons to threshold values
11182156 · 2021-11-23

An information processing apparatus includes: a memory; and a processor coupled to the memory and configured to: perform an arithmetic operation using an arithmetic operation target; repeat the arithmetic operation by using a calculated arithmetic operation result; obtain a ratio of a second number of elements, namely those within the range expressible as a predetermined-bit fixed point, to a first number of elements included in the arithmetic operation result; and perform the arithmetic operation by using the predetermined-bit fixed point based on the ratio.
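The ratio-based selection can be sketched as follows. This is a simplified illustration covering only the expressible ratio; the fluctuation value named in the title is not modeled. The function names, the 8-bit width with 4 fractional bits, and the 0.9 threshold are assumptions.

```python
def expressible_ratio(values, total_bits: int = 8, frac_bits: int = 4) -> float:
    """Fraction of values that fit in a signed fixed-point format.

    total_bits and frac_bits are illustrative; the abstract only speaks of
    a "predetermined-bit fixed point".
    """
    if not values:
        raise ValueError("values must not be empty")
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1)) / scale
    hi = ((1 << (total_bits - 1)) - 1) / scale
    in_range = sum(1 for v in values if lo <= v <= hi)
    return in_range / len(values)


def choose_dtype(values, threshold: float = 0.9, **fmt) -> str:
    """Pick fixed point when enough elements are expressible, else stay in float."""
    return "fixed" if expressible_ratio(values, **fmt) >= threshold else "float"


if __name__ == "__main__":
    result = [0.5, -3.0, 7.9, 100.0]  # stand-in for an arithmetic operation result
    print(expressible_ratio(result), choose_dtype(result))
```

With the default format the expressible range is -8.0 to 7.9375, so three of the four sample elements fit and the sketch stays in floating point at the 0.9 threshold.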

INTEGRATED CIRCUITS WITH MACHINE LEARNING EXTENSIONS
20220012015 · 2022-01-13

An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
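The two-precision data path can be imitated in software by rounding intermediate values. The sketch below assumes IEEE half precision as the first precision and single precision as the second; the abstract does not name specific formats. It uses Python's `struct` module (`"e"` and `"f"` format codes) to perform the rounding, and `dot_mixed` is a hypothetical name.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (assumed first precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to IEEE single precision (assumed second precision)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_mixed(a, b, extra: float = 0.0) -> float:
    """Dot product with low-precision products and a higher-precision final add.

    extra models the general-purpose input (or zero) that the
    second-precision adder combines with the dot product.
    """
    partial = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(to_fp16(x) * to_fp16(y))  # multiplier path, first precision
        partial = to_fp16(partial + prod)        # floating-point adder, first precision
    widened = to_fp32(partial)                   # optional cast to second precision
    return to_fp32(widened + to_fp32(extra))     # adder of the second precision


if __name__ == "__main__":
    print(dot_mixed([1.0, 2.0, 0.5], [4.0, 0.25, 8.0]))
    print(dot_mixed([1.0, 2.0, 0.5], [4.0, 0.25, 8.0], extra=100000.0))
```

The second call adds a value beyond the half-precision range, which is only possible because the final addition runs at the wider precision.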

Integrated circuits with machine learning extensions
11175892 · 2021-11-16

An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.

MULTIPLIER AND MULTIPLICATION METHOD
20210349692 · 2021-11-11

A multiplier includes a multiplier preprocessing circuit, an encoding circuit, an addition circuit and a partial product selection circuit. The multiplier preprocessing circuit generates different input coding values from a received multiplier according to different operation bit widths. The encoding circuit generates different coded values according to different input coding values, and performs an operation according to different coded values and a received multiplicand to obtain a first partial product. The addition circuit accumulates the first partial product for a corresponding number of times according to different operation bit widths to generate different second partial products. The multiplier supports multiplication with multiple mixed bit widths, and a single multiplier unit can be reused for multiplication operations of different precisions.
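One common way to build a multiplier from preprocessing, encoding, and partial-product accumulation stages is radix-4 Booth recoding. The abstract does not name the encoding it uses, so the sketch below substitutes Booth recoding as an assumption to illustrate how one routine can serve several bit widths. `booth_multiply` and its parameters are hypothetical.

```python
# Radix-4 Booth digit for each 3-bit group (b[2i+1], b[2i], b[2i-1]).
BOOTH_DIGITS = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_multiply(multiplicand: int, multiplier: int, width: int = 8) -> int:
    """Signed multiplication via radix-4 Booth recoding of the multiplier.

    width is the operation bit width of the multiplier (must be even);
    the multiplier must fit in a signed integer of that width.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    if not -(1 << (width - 1)) <= multiplier < (1 << (width - 1)):
        raise ValueError("multiplier does not fit in the given width")

    # Preprocessing: take the width-bit pattern and append a 0 below the LSB.
    bits = (multiplier & ((1 << width) - 1)) << 1

    total = 0
    for i in range(width // 2):  # number of partial products depends on width
        group = (bits >> (2 * i)) & 0b111
        digit = BOOTH_DIGITS[group]                # encoding step
        total += (digit * multiplicand) << (2 * i)  # accumulate the partial product
    return total


if __name__ == "__main__":
    print(booth_multiply(7, -3, width=8))
    print(booth_multiply(-5, 6, width=4))
```

The same loop handles 4-bit and 8-bit multipliers; only the number of recoded groups, and hence the number of accumulated partial products, changes with the operation bit width.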
