G06F7/78

APPARATUS AND METHOD FOR CONJUGATE TRANSPOSE AND MULTIPLY

An apparatus and method for complex matrix conjugation and multiplication. For example, one embodiment of a processor comprises: a decoder to decode a complex matrix conjugation and multiplication instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix; execution circuitry to execute the complex matrix conjugation and multiplication instruction, the execution circuitry comprising: matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values; transpose hardware logic to transpose the plurality of complex conjugate values to generate a conjugate transpose matrix comprising the complex conjugate values; parallel multiplication circuitry to: multiply real values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding imaginary values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply imaginary values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding real values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to add each imaginary product in the first plurality of imaginary products to a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary component in the result matrix.
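
The arithmetic described above can be illustrated in software. The following is a minimal sketch, not the claimed hardware: it assumes matrices are lists of rows of Python complex numbers, and the function names conj_transpose and conj_transpose_multiply are hypothetical. It forms the conjugate transpose of the first source matrix, then builds each result entry from the partial products named in the abstract.

```python
def conj_transpose(m):
    """Return the conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(m), len(m[0])
    return [[m[r][c].conjugate() for r in range(rows)] for c in range(cols)]


def conj_transpose_multiply(a, b):
    """Compute (A^H) * B from real/imaginary partial products (illustrative only)."""
    ah = conj_transpose(a)
    n, k, p = len(ah), len(b), len(b[0])
    result = [[0j] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            real_acc = 0.0
            imag_acc = 0.0
            for t in range(k):
                x, y = ah[i][t], b[t][j]
                # Real component: real*real minus imag*imag.
                real_acc += x.real * y.real - x.imag * y.imag
                # Imaginary component: real*imag plus imag*real, i.e. the two
                # sets of imaginary products that the abstract says are added.
                imag_acc += x.real * y.imag + x.imag * y.real
            result[i][j] = complex(real_acc, imag_acc)
    return result
```

In hardware the partial products would be formed in parallel; the loops here only make the per-element arithmetic explicit.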

APPARATUS AND METHOD FOR COMPLEX MATRIX CONJUGATE TRANSPOSE

An apparatus and method for complex matrix conjugation. For example, one embodiment of a processor comprises: a decoder to decode a complex conjugate transpose instruction including a source operand to identify a complex source matrix and a destination operand to identify a complex result matrix, the complex source matrix to store a first plurality of complex values and the complex result matrix to store a second plurality of complex values, each complex value in the first and second plurality of complex values including a real component and an imaginary component; a plurality of registers or local memory to store all or a subset of the first plurality of complex values; and execution circuitry to execute the complex conjugate transpose instruction using matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values, and transpose hardware logic to perform a matrix transpose operation using the plurality of complex conjugate values to generate a result matrix.
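
As a rough software analogue of the two steps named above (conjugation, then transposition), the sketch below assumes each complex value is stored as a (real, imaginary) pair, matching the abstract's description of separate components. The function name conjugate_transpose is hypothetical.

```python
def conjugate_transpose(source):
    """Conjugate each (real, imag) pair, then swap rows and columns."""
    # Step 1: conjugation negates the imaginary component of every value.
    conjugated = [[(re, -im) for (re, im) in row] for row in source]
    # Step 2: transposition moves the value at row r, column c to row c, column r.
    return [list(col) for col in zip(*conjugated)]
```

For a 2x3 source matrix the result is 3x2, so the destination must be sized for the transposed shape.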

APPARATUS AND METHOD FOR COMPLEX MATRIX TRANSPOSE AND MULTIPLY

An apparatus and method for complex matrix transpose and multiply. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix transpose and multiplication instruction including a first source operand to identify a first plurality of real and imaginary values in a first complex source matrix, a second source operand to identify a second plurality of real and imaginary values in a second complex source matrix, and a first destination operand to identify a result matrix with real and imaginary values; execution circuitry to execute the first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one of the source matrices, parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, and to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix.
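
The real-value path described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed circuit: the real and imaginary values are assumed to be held in separate planes (lists of rows), the first source matrix is assumed to be the one transposed, and the names transpose and transpose_multiply_real are hypothetical.

```python
def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def transpose_multiply_real(a_re, a_im, b_re, b_im):
    """Real part of (A^T) * B, with A and B stored as real and imaginary planes."""
    at_re, at_im = transpose(a_re), transpose(a_im)
    n, k, p = len(at_re), len(b_re), len(b_re[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            # First plurality of real products: real * real.
            rr = [at_re[i][t] * b_re[t][j] for t in range(k)]
            # Second plurality of real products: imaginary * imaginary.
            ii = [at_im[i][t] * b_im[t][j] for t in range(k)]
            # Subtract each imag*imag product from its real*real counterpart.
            out[i][j] = sum(r - m for r, m in zip(rr, ii))
    return out
```

Only the real component is computed here because that is the path the abstract spells out; the imaginary component would use the cross products instead.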

Eigenvalue decomposition with stochastic optimization

A computer-implemented method for Eigenpair computation is provided. The method includes computing, by a hardware processor, an Eigenvector and respective Eigenvalues of the Eigenvector of a matrix by using a modified Stochastic Optimization process including performing a matrix vector product on a Resistive Processing Unit (RPU) crossbar array operatively coupled to the hardware processor and performing a scalar vector product on a digital device operatively coupled to the hardware processor and representing, for each of an Eigenpair, an initial guess for the Eigenvector and the respective Eigenvalues. The computing step includes storing the matrix in the RPU crossbar array.
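
The abstract does not spell out the modified Stochastic Optimization process, so the sketch below does not attempt to reproduce it. It substitutes a plain, deterministic projected gradient ascent on v^T A v over unit vectors, for a symmetric matrix, purely to show how the work divides between a matrix-vector product and a scalar-vector product starting from an initial guess. The functions matvec, scale and dominant_eigenpair, and the step and iters parameters, are all hypothetical.

```python
import math


def matvec(matrix, vec):
    # Software stand-in for the matrix-vector product that the abstract
    # assigns to the RPU crossbar array.
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def scale(scalar, vec):
    # Software stand-in for the scalar-vector product that the abstract
    # assigns to the digital device.
    return [scalar * v for v in vec]


def dominant_eigenpair(matrix, guess, step=0.1, iters=500):
    """Projected gradient ascent on v^T A v for a symmetric matrix A."""
    norm = math.sqrt(sum(x * x for x in guess))
    v = scale(1.0 / norm, guess)
    for _ in range(iters):
        # Gradient direction is A v; take a small step along it.
        update = scale(step, matvec(matrix, v))
        v = [a + b for a, b in zip(v, update)]
        # Project back onto the unit sphere.
        norm = math.sqrt(sum(x * x for x in v))
        v = scale(1.0 / norm, v)
    # Rayleigh quotient of the unit vector estimates the eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return eigenvalue, v
```

With a positive step this iteration tends toward the eigenvector of the largest eigenvalue when the matrix is symmetric positive definite and the initial guess is not orthogonal to that eigenvector.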

Two-dimensional data matching method, device and logic circuit

Provided are a two-dimensional data matching method, a device and a logic circuit. The method is executed by a first operator, a first queue, a second operator, a first counter, a second queue, a third operator, a second counter, a first comparator, and a first memory, which are sequentially connected. The method includes: the first operator performs a bitwise matching operation on a matrix a and a matrix b row by row, and inputs the matching result to the first queue; the second operator performs a cumulative operation on the matching result, and outputs an accumulative value to the second queue; the third operator performs a cumulative operation on the accumulative value, and inputs an accumulated result to the first comparator; the first comparator compares the accumulated result with a pre-stored matching threshold, and inputs the comparison result to the first memory to form a matching result matrix; and the above steps are repeated.
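
The pipeline above can be approximated in software as follows. Several details are assumptions rather than statements from the abstract: the bitwise matching operation is taken to be XNOR (1 where two bits agree), the comparison is taken to be "at least the threshold", and "repeating the above steps" is taken to mean sliding a template over a larger binary image. The names match_window and match_matrix are hypothetical, and the queues and counters are not modelled.

```python
def match_window(a, b, threshold):
    """Return 1 if binary matrices a and b agree in at least `threshold` bits."""
    row_counts = []
    for row_a, row_b in zip(a, b):
        # Bitwise match, assumed here to be XNOR: 1 where the bits are equal.
        matched_bits = [1 if x == y else 0 for x, y in zip(row_a, row_b)]
        # First accumulation: number of matches within this row.
        row_counts.append(sum(matched_bits))
    # Second accumulation: number of matches across all rows.
    total = sum(row_counts)
    # Comparison against the pre-stored matching threshold.
    return 1 if total >= threshold else 0


def match_matrix(image, template, threshold):
    """Slide `template` over `image`, storing one comparison result per position."""
    th, tw = len(template), len(template[0])
    out = []
    for r in range(len(image) - th + 1):
        out_row = []
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            out_row.append(match_window(window, template, threshold))
        out.append(out_row)
    return out
```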

FAST SORT ENGINE
20220171600 · 2022-06-02

A fast sort engine may perform a Radix sort directly on a data elements array and a monotonic function numerical values array. The Radix sort may use buckets that contain elements instead of integers, and may use the monotonic value corresponding to each data element in the data elements array to determine the bucket to which that data element is assigned. The fast sort engine may thus sort the data elements array directly as it sorts the monotonic function numerical values array: permutations made to the monotonic function numerical values array are made to the data elements array as well.
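
One way to picture this is a least-significant-digit Radix sort whose buckets hold (value, element) pairs, so that every permutation of the values array is applied to the data elements array in the same pass. The sketch below is an illustration only: it assumes the monotonic function values are non-negative integers, and the function name radix_sort_with_payload and the base parameter are hypothetical.

```python
def radix_sort_with_payload(keys, elements, base=256):
    """LSD radix sort of `keys`; `elements` is permuted identically.

    Assumes keys are non-negative integers (an assumption of this sketch).
    Returns (sorted_keys, reordered_elements).
    """
    if len(keys) != len(elements):
        raise ValueError("keys and elements must have the same length")
    if any(k < 0 for k in keys):
        raise ValueError("this sketch only handles non-negative integer keys")
    pairs = list(zip(keys, elements))
    if not pairs:
        return [], []
    max_key = max(keys)
    shift = 1
    while max_key // shift > 0:
        # Buckets hold (key, element) pairs rather than bare integers.
        buckets = [[] for _ in range(base)]
        for k, e in pairs:
            buckets[(k // shift) % base].append((k, e))
        # Concatenating buckets in order keeps each pass stable.
        pairs = [p for bucket in buckets for p in bucket]
        shift *= base
    return [k for k, _ in pairs], [e for _, e in pairs]
```

Because each pass is stable, elements with equal values keep their original relative order.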