G06F7/78

Matrix operation optimization mechanism

An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.

Matrix operation optimization mechanism

An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.

NEURAL NETWORK ACCELERATOR, ACCELERATION METHOD, AND APPARATUS
20230236891 · 2023-07-27 ·

A neural network accelerator is provided, including: a preprocessing module (301), configured to perform first forward winograd transform on a target matrix corresponding to an input feature map, to obtain a transformed target matrix, where the preprocessing module (301) is further configured to perform second forward winograd transform on a convolution kernel, to obtain a transformed convolution kernel; a matrix operation module (302), configured to perform a matrix multiplication operation on a first matrix and a second matrix, to obtain a multiplication result, where the first matrix is constructed based on the transformed target matrix, and the second matrix is constructed based on the transformed convolution kernel; and a vector operation module (303), configured to perform inverse winograd transform on the multiplication result, to obtain an output feature map.

NEURAL NETWORK ACCELERATOR, ACCELERATION METHOD, AND APPARATUS
20230236891 · 2023-07-27 ·

A neural network accelerator is provided, including: a preprocessing module (301), configured to perform first forward winograd transform on a target matrix corresponding to an input feature map, to obtain a transformed target matrix, where the preprocessing module (301) is further configured to perform second forward winograd transform on a convolution kernel, to obtain a transformed convolution kernel; a matrix operation module (302), configured to perform a matrix multiplication operation on a first matrix and a second matrix, to obtain a multiplication result, where the first matrix is constructed based on the transformed target matrix, and the second matrix is constructed based on the transformed convolution kernel; and a vector operation module (303), configured to perform inverse winograd transform on the multiplication result, to obtain an output feature map.

EXPANDING RAID SYSTEMS
20230027532 · 2023-01-26 · ·

Physical storage devices (PSDs) of a protection group cluster (PGC) may be represented by a protection group matrix (PGM) having a plurality of rows and a plurality of columns, where each row corresponds to a PSD of the PGC, and each column corresponds to a partition of each PSD. The value specified in each cell at an intersection of a row and column specifies the protection group of the PGC to which the partition of the PSD represented by the column and row, respectively, is (or will be) assigned. In response to one or more of PSDs being added to a PGC, the PGM may be reconfigured, including adding new rows, and transposing portions of columns to the new rows, or transposing portions of rows to portions of columns of the new rows. Protection members of the PGC may be re-assigned based on the reconfiguration.

EXPANDING RAID SYSTEMS
20230027532 · 2023-01-26 · ·

Physical storage devices (PSDs) of a protection group cluster (PGC) may be represented by a protection group matrix (PGM) having a plurality of rows and a plurality of columns, where each row corresponds to a PSD of the PGC, and each column corresponds to a partition of each PSD. The value specified in each cell at an intersection of a row and column specifies the protection group of the PGC to which the partition of the PSD represented by the column and row, respectively, is (or will be) assigned. In response to one or more of PSDs being added to a PGC, the PGM may be reconfigured, including adding new rows, and transposing portions of columns to the new rows, or transposing portions of rows to portions of columns of the new rows. Protection members of the PGC may be re-assigned based on the reconfiguration.

Transposing neural network matrices in hardware
11704547 · 2023-07-18 · ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying a transpose operation to be performed on a first neural network matrix; and generating instructions that when executed by the hardware circuit cause the hardware circuit to transpose the first neural network matrix by performing first operations, wherein the first operations include repeatedly performing the following second operations: for a current subdivision of the first neural network matrix that divides the first neural network matrix into one or more current submatrices, updating the first neural network matrix by swapping an upper right quadrant and a lower left quadrant of each current submatrix, and subdividing each current submatrix into respective new submatrices to update the current subdivision.

Transposing neural network matrices in hardware
11704547 · 2023-07-18 · ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying a transpose operation to be performed on a first neural network matrix; and generating instructions that when executed by the hardware circuit cause the hardware circuit to transpose the first neural network matrix by performing first operations, wherein the first operations include repeatedly performing the following second operations: for a current subdivision of the first neural network matrix that divides the first neural network matrix into one or more current submatrices, updating the first neural network matrix by swapping an upper right quadrant and a lower left quadrant of each current submatrix, and subdividing each current submatrix into respective new submatrices to update the current subdivision.

VECTOR BIT TRANSPOSE

A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.

VECTOR BIT TRANSPOSE

A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.