Patent classifications
G06F7/4876
Signed multiplication using unsigned multiplier with dynamic fine-grained operand isolation
An N×N multiplier may include a N/2×N first multiplier, a N/2×N/2 second multiplier, and a N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2.sup.N/2, the second or the third multiplier are used to multiply the operands. If one operand is less than 2.sup.N/2 and the other operand is equal to or greater than 2.sup.N/2, the first multiplier is used or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2.sup.N/2, the first, second and third multipliers are used to multiply the operands.
METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
METHOD OF DETERMINING THE CENTER OF LOADING OF A ROLLING ELEMENT
A method of determining the center of loading of a rolling element includes providing a rolling element body and at least three load sensors. The sensors are each positioned within a bore of the rolling element body at a separate distance from a reference position. Load measurements are taken with each one of the sensors at various positions about the circumference of the bearing and the center of loading is calculated at each one of the positions to determine the variation in axial loading about the bearing circumference.
NEURAL NETWORK FACILITATING FIXED-POINT EMULATION OF FLOATING-POINT COMPUTATION
An DNN accelerator can perform fixed-point emulation of floating-point computation. In a multiplication operation on two floating-point matrices, the DNN accelerator determines an extreme exponent for a row in the first floating-point matrix and determines another extreme exponent for a column in the second floating-point matrix. The row and column can be converted to fixed-point vectors based on the extreme exponents. The two fixed-point vectors are fed into a PE array in the DNN accelerator. The PE array performs a multiplication operation on the two fixed-point vectors and generates a fixed-point inner product. The fixed-point inner product can be converted back to a floating-point inner product based on the extreme exponents. The floating-point inner product is an element in the matrix resulted from the multiplication operation on the two floating-point matrices. The matrix can be accumulated with another matrix resulted from a fixed-point emulation of a floating-point matrix multiplication.
SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
APPARATUS AND METHOD FOR VECTOR PACKED DUAL COMPLEX-BY-COMPLEX AND DUAL COMPLEX-BY-COMPLEX CONJUGATE MULTIPLICATION
An apparatus and method for multiplying packed real and imaginary components of complex numbers and complex conjugates. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply select real and imaginary data elements in the first and second source registers to generate a plurality of real and imaginary products; adder circuitry to add/subtract various real and imaginary products, scale the results according to an immediate of the instruction, round the scaled results; and saturation circuitry to saturate the rounded results.
Systems, methods, and apparatuses for tile store
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
PROCESSING-IN-MEMORY(PIM) DEVICE
A PIM device includes a memory/arithmetic region including a plurality of memory banks and a plurality of MAC operators, the plurality of MAC operators including a first MAC operator, a peripheral region including a data input/output circuit, and a global data input/output (GIO) line capable of providing a data transmission path between the peripheral region and the memory/arithmetic region. The first MAC operator is configured to perform an EWM operation by performing a multiplication operation on first input data and second input data that are transmitted from first and second memory banks of the plurality of memory banks, respectively, to generate multiplication result data and transmitting the multiplication result data to a third memory bank. While the EWM operation is being performed, data transmission through the GIO line between the peripheral region and the memory/arithmetic region is blocked.
METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
DATA PROCESSING SYSTEM, OPERATING METHOD THEREOF, AND COMPUTING SYSTEM USING THE SAME
A data processing system may include: a matrix splitting circuit configured to: split the matrix into a positive matrix and a negative matrix, and store the positive matrix and the negative matrix in a first sub array and a second sub array within the computation memory, respectively; a vector conversion circuit configured to generate an offset vector by adding, to elements within the vector, an offset for converting a negative element, which has a largest absolute value among the elements within the vector, into a zero element or a positive element, and apply the offset vector to the row lines of the first sub array and the second sub array; and an offset correction circuit configured to generate an offset correction value by subtracting a result of multiplying the offset and the negative matrix from a result of multiplying the offset and the positive matrix, and subtract the offset correction value from a computation value outputted from the first sub array and the second sub array,