Patent classifications
G06F7/544
NEURAL NETWORK DATA COMPUTATION USING MIXED-PRECISION
Techniques for mixed-precision data manipulation for neural network data computation are disclosed. A first left group comprising eight bytes of data and a first right group of eight bytes of data are obtained for computation using a processor. A second left group comprising eight bytes of data and a second right group of eight bytes of data are obtained. A sum of products is performed between the first left and right groups and the second left and right groups. The sum of products is performed on bytes of 8-bit integer data. A first result is based on a summation of eight values that are products of the first group’s left eight bytes and the second group’s left eight bytes. A second result is based on the summation of eight values that are products of the first group’s left eight bytes and the second group’s right eight bytes. Results are output.
METHOD AND DEVICE FOR ADDITIVE CODING OF SIGNALS IN ORDER TO IMPLEMENT DIGITAL MAC OPERATIONS WITH DYNAMIC PRECISION
A computer-implemented method is provided for coding a digital signal quantized on a given number N.sub.d of bits and intended to be processed by a digital computing system, the signal being coded on a predetermined number N.sub.p of bits which is strictly less than N.sub.d, the method including the steps of: receiving a digital signal composed of a plurality of samples, decomposing each sample into a sum of k maximum values which are equal to 2.sup.NP−1 and a residual value, with k being a positive or zero integer, successively transmitting the values obtained after decomposition to an integration unit for carrying out a MAC operation between the sample and a weighting coefficient.
HIGH THROUGHPUT MATRIX PROCESSOR WITH SUPPORT FOR CONCURRENTLY PROCESSING MULTIPLE MATRICES
A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. At least one calculation unit of each group of the first and second group of calculation units is configured to multiply elements from the data input vector unit with corresponding elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.
Neural network circuit
A neural network circuit having a novel structure is provided. A plurality of arithmetic circuits each including a register, a memory, a multiplier circuit, and an adder circuit are provided. The memory outputs different weight data in response to switching of a context signal. The multiplier circuit outputs multiplication data of the weight data and input data held in the register. The adder circuit performs a product-sum operation by adding the obtained multiplication data to data obtained by a product-sum operation in an adder circuit of another arithmetic circuit. The obtained product-sum operation data is output to an adder circuit of another arithmetic circuit, so that product-sum operations of different weight data and input data are performed.
Semiconductor device having neural network
A semiconductor device capable of efficiently recognizing images utilizing a neural network is provided. The semiconductor device includes a shift register group, a D/A converter, and a product-sum operation circuit. The product-sum operation circuit includes an analog memory and stores a parameter of a filter. The shift register group captures image data and outputs part of the image data to the D/A converter while shifting the image data. The D/A converter converts the part of the input image data into analog data and outputs the analog data to the product-sum operation circuit.
Bit matrix multiplication
Detailed are embodiments related to bit matrix multiplication in a processor. For example, in some embodiments a processor comprising: decode circuitry to decode an instruction have fields for an opcode, an identifier of a first source bit matrix, an identifier of a second source bit matrix, an identifier of a destination bit matrix, and an immediate; and execution circuitry to execute the decoded instruction to perform a multiplication of a matrix of S-bit elements of the identified first source bit matrix with S-bit elements of the identified second source bit matrix, wherein the multiplication and accumulation operations are selected by the operation selector and store a result of the matrix multiplication into the identified destination bit matrix, wherein S indicates a plural bit size is described.
LOW-LATENCY POLYNOMIAL MODULO MULTIPLICATION OVER RING
A modular polynomial multiplier includes a plurality of processing elements. Each includes a multiplication unit, an addition unit and a delay unit. The addition unit has an input connected to the output of the multiplication unit. The delay unit is connected to the output of the addition unit delays values by one clock cycle. The first input of the multiplication unit of each processing element carries a respective coefficient of a first polynomial and the second input of the multiplication unit of each processing element is connected to one of an input line carrying a sequence of coefficients of a second polynomial having n coefficients and a delay line carrying the sequence of coefficients of the second polynomial delayed by n clock cycles and negated.
BIT-SERIAL COMPUTING DEVICE AND TEST METHOD FOR EVALUATING THE SAME
A bit-serial computing device includes a computing circuit and a scaler. The computing circuit includes multiple MAC slices, and receives a multiplier vector and a multiplicand vector that contains multiple multiplicand inputs. Each multiplicand input contains multiple multiplicand segments that have different significances. The significances respectively correspond to the MAC slices. Correspondence between the significances and the MAC slices is variable. Each MAC slice calculates an inner product of the multiplier vector and a vector that is constituted by the multiplicand segments of the multiplicand inputs having the significance corresponding to the MAC slice. With respect to each MAC slice, the scaler multiplies the inner product that is calculated by the MAC slice by a weighting ratio that represents the significance corresponding to the MAC slice, so as to obtain a scaled inner product that corresponds to the MAC slice.
Vector-vector multiplication techniques for processing systems
Vector-vector multiplication or matrix-matrix multiplication computation on computing systems can include computing a first portion of a vector-vector multiplication product based on a most-significant-bit set of a first vector and a most-significant-bit set of a second vector, and determining if the first portion of the vector-vector multiplication product is less than a threshold. If the first partial vector-vector multiplication product is not less than the threshold, a remaining portion of the vector-vector multiplication product can be computed, and a rectified linear vector-vector multiplication product can be determined for the sum of the first portion of the vector-vector multiplication product and the remaining portion of the vector-vector multiplication product. If the first portion of the vector-vector multiplication product is less than the threshold, computation of the remaining portion of the vector-vector multiplication product can be skipped and the rectified linear vector-vector multiplication product can be set to a zero scalar.
Device for computing an inner product
A device for computing an inner product includes an index unit, a storage operation unit, a redundant to 2's complement (RTC) converter, a mapping table, and a multiplier-accumulate (MAC) module. The index unit, storing index values, is coupled to word lines. The storage operation unit includes the word lines and bit lines and stores data values. The mapping table stores coefficients corresponding to the index values. The index unit enables the word line according to a count value and the index value, such that the storage operation unit accumulates the data values corresponding to the bit lines and the enabled word line, thereby generating accumulation results. The RTC converter converts the accumulation results into a total data value in 2's complement format. The MAC module operates based on the total data value and the coefficient to generate an inner product value.