G06F7/49915

Method and Apparatus for Vector Based Matrix Multiplication
20220283810 · 2022-09-08 ·

A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.

Floating-point unit stochastic rounding for accelerated deep learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.

Method and device for floating point representation with variable precision

The present disclosure relates to a method of storing, by a load and store circuit or other processing means, a variable precision floating point value to a memory address of a memory, the method comprising: reducing the bit length of the variable precision floating point value to no more than a size limit, and storing the variable precision floating point value to one of a plurality of storage zones in the memory, each of the plurality of storage zones having a storage space equal to or greater than the size limit (MBB).

Method and apparatus for vector sorting using vector permutation logic

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

Method and Apparatus for Dual Issue Multiply Instructions
20220229782 · 2022-07-21 ·

A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.

Method and Apparatus for Dual Multiplication Units in a Data Path
20220206802 · 2022-06-30 ·

A processor is provided that includes a first multiplication unit in a first data path of the processor, the first multiplication unit configured to perform single issue multiply instructions, and a second multiplication unit in the first data path, the second multiplication unit configured to perform single issue multiply instructions, wherein the first multiplication unit and the second multiplication unit are configured to execute respective single issue multiply instructions in parallel.

Arithmetic processing apparatus, control method, and non-transitory computer-readable recording medium having stored therein control program
11410036 · 2022-08-09 · ·

An arithmetic processing apparatus includes: a memory that stores, when a training of a given machine learning model is repeatedly performed in a plurality of iterations, an error of a decimal point position of each of a plurality of fixed-point number data obtained one in each of the plurality of iterations, the error being obtained based on statistical information related to a distribution of leftmost set bit positions for positive number and leftmost unset bit positions for negative number or a distribution of rightmost set bit positions of the plurality of fixed-point number data; and a processor coupled to the memory, the processor being configured to: determine, based on a tendency of the error in each of the plurality of iterations, an offset amount for correcting a decimal point position of fixed-point number data used in the training.

MULTIPLICATION AND ACCUMULATION (MAC) OPERATOR
20220236949 · 2022-07-28 · ·

A multiplication-accumulation (MAC) includes a multiplication circuit, a pre-processing circuit, and an adder tree. The multiplication circuit performs a multiplication operation on a plurality of weight data and a plurality of vector data each having a floating-point format to output a plurality of multiplication data. The pre-processing circuit performs shifting on mantissa data of the plurality of multiplication data by a difference between first maximum exponent data having a greatest value among the exponent data of the plurality of multiplication data and the remaining exponent data to output a plurality of pre-processed mantissa data. The adder tree adds the plurality of mantissa data to output mantissa addition bits.

METHOD AND APPARATUS FOR PERMUTING STREAMED DATA ELEMENTS

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20220156073 · 2022-05-19 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.