Patent classifications
G06F7/523
APPARATUS AND METHOD FOR VECTOR PACKED MULTIPLY OF SIGNED AND UNSIGNED WORDS
An apparatus and method for performing a vector packed multiplication of signed and unsigned words. For example, one embodiment of a processor includes a decoder to decode a vector packed multiply instruction having operands to identify a first and a second plurality of packed words, first and second source registers to store the first and second plurality of packed words, and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply each packed word in the first source register with a corresponding packed word in the second source register to generate a plurality of doubleword products and rounding circuitry to round each of the doubleword products according to a rounding method to generate a plurality of rounded doubleword products. Each upper word of the rounded doubleword results is then stored into a corresponding word data element positions of a destination register.
APPARATUS AND METHOD FOR VECTOR PACKED MULTIPLY OF SIGNED AND UNSIGNED WORDS
An apparatus and method for performing a vector packed multiplication of signed and unsigned words. For example, one embodiment of a processor includes a decoder to decode a vector packed multiply instruction having operands to identify a first and a second plurality of packed words, first and second source registers to store the first and second plurality of packed words, and execution circuitry to execute the decoded instruction. The execution circuitry includes multiplier circuitry to multiply each packed word in the first source register with a corresponding packed word in the second source register to generate a plurality of doubleword products and rounding circuitry to round each of the doubleword products according to a rounding method to generate a plurality of rounded doubleword products. Each upper word of the rounded doubleword results is then stored into a corresponding word data element positions of a destination register.
Neural network circuit
A neural network circuit having a novel structure is provided. A plurality of arithmetic circuits each including a register, a memory, a multiplier circuit, and an adder circuit are provided. The memory outputs different weight data in response to switching of a context signal. The multiplier circuit outputs multiplication data of the weight data and input data held in the register. The adder circuit performs a product-sum operation by adding the obtained multiplication data to data obtained by a product-sum operation in an adder circuit of another arithmetic circuit. The obtained product-sum operation data is output to an adder circuit of another arithmetic circuit, so that product-sum operations of different weight data and input data are performed.
Neural network circuit
A neural network circuit having a novel structure is provided. A plurality of arithmetic circuits each including a register, a memory, a multiplier circuit, and an adder circuit are provided. The memory outputs different weight data in response to switching of a context signal. The multiplier circuit outputs multiplication data of the weight data and input data held in the register. The adder circuit performs a product-sum operation by adding the obtained multiplication data to data obtained by a product-sum operation in an adder circuit of another arithmetic circuit. The obtained product-sum operation data is output to an adder circuit of another arithmetic circuit, so that product-sum operations of different weight data and input data are performed.
Signal processing system and method
A signal processing method and apparatus, where the apparatus includes an input interface configured to receive an input signal matrix and a weight matrix, a processor configured to interleave the input signal matrix to obtain an interleaved signal matrix, partition the interleaved signal matrix, interleave the weight matrix to obtain an interleaved weight matrix, process the interleaved weight matrix to obtain a plurality of sparsified partitioned weight matrices, perform matrix multiplication on the sparsified partitioned weight matrices and a plurality of partitioned signal matrices to obtain a plurality of matrix multiplication results, and an output interface configured to output a signal processing result.
Signal processing system and method
A signal processing method and apparatus, where the apparatus includes an input interface configured to receive an input signal matrix and a weight matrix, a processor configured to interleave the input signal matrix to obtain an interleaved signal matrix, partition the interleaved signal matrix, interleave the weight matrix to obtain an interleaved weight matrix, process the interleaved weight matrix to obtain a plurality of sparsified partitioned weight matrices, perform matrix multiplication on the sparsified partitioned weight matrices and a plurality of partitioned signal matrices to obtain a plurality of matrix multiplication results, and an output interface configured to output a signal processing result.
BIT-SERIAL COMPUTING DEVICE AND TEST METHOD FOR EVALUATING THE SAME
A bit-serial computing device includes a computing circuit and a scaler. The computing circuit includes multiple MAC slices, and receives a multiplier vector and a multiplicand vector that contains multiple multiplicand inputs. Each multiplicand input contains multiple multiplicand segments that have different significances. The significances respectively correspond to the MAC slices. Correspondence between the significances and the MAC slices is variable. Each MAC slice calculates an inner product of the multiplier vector and a vector that is constituted by the multiplicand segments of the multiplicand inputs having the significance corresponding to the MAC slice. With respect to each MAC slice, the scaler multiplies the inner product that is calculated by the MAC slice by a weighting ratio that represents the significance corresponding to the MAC slice, so as to obtain a scaled inner product that corresponds to the MAC slice.
BIT-SERIAL COMPUTING DEVICE AND TEST METHOD FOR EVALUATING THE SAME
A bit-serial computing device includes a computing circuit and a scaler. The computing circuit includes multiple MAC slices, and receives a multiplier vector and a multiplicand vector that contains multiple multiplicand inputs. Each multiplicand input contains multiple multiplicand segments that have different significances. The significances respectively correspond to the MAC slices. Correspondence between the significances and the MAC slices is variable. Each MAC slice calculates an inner product of the multiplier vector and a vector that is constituted by the multiplicand segments of the multiplicand inputs having the significance corresponding to the MAC slice. With respect to each MAC slice, the scaler multiplies the inner product that is calculated by the MAC slice by a weighting ratio that represents the significance corresponding to the MAC slice, so as to obtain a scaled inner product that corresponds to the MAC slice.
Vector-vector multiplication techniques for processing systems
Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first partial vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product. If the first portion of the vector-vector multiplication product is less than the threshold, computation of the remaining portion of the vector-vector multiplication product can be skipped and the rectified linear vector-vector multiplication product can be set to a zero scalar.
Vector-vector multiplication techniques for processing systems
Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first partial vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product. If the first portion of the vector-vector multiplication product is less than the threshold, computation of the remaining portion of the vector-vector multiplication product can be skipped and the rectified linear vector-vector multiplication product can be set to a zero scalar.