Patent classifications
G06F7/483
Compression of Data that Exhibits Mixed Compressibility
Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
USING A LOW-BIT-WIDTH DOT PRODUCT ENGINE TO SUM HIGH-BIT-WIDTH NUMBERS
A system includes a vector multiplier configured to multiply a first vector of integer elements with a second vector of integer elements to determine a resulting vector of integer elements, wherein integer elements of the first and second vectors of integer elements are represented using a first number of bits and an integer element of the first vector of integer elements represents a portion of a value of a group of values. The system further includes a vector adder configured to add together the integer elements of the resulting vector of integer elements to determine a summed result, a bit shifter configured to shift bits of the summed result leftward, and an accumulator configured to determine an accumulated output sum that includes the leftward-shifted summed result.
USING A LOW-BIT-WIDTH DOT PRODUCT ENGINE TO SUM HIGH-BIT-WIDTH NUMBERS
A system includes a vector multiplier configured to multiply a first vector of integer elements with a second vector of integer elements to determine a resulting vector of integer elements, wherein integer elements of the first and second vectors of integer elements are represented using a first number of bits and an integer element of the first vector of integer elements represents a portion of a value of a group of values. The system further includes a vector adder configured to add together the integer elements of the resulting vector of integer elements to determine a summed result, a bit shifter configured to shift bits of the summed result leftward, and an accumulator configured to determine an accumulated output sum that includes the leftward-shifted summed result.
METHOD AND APPARATUS WITH CALCULATION
A processor-implemented method includes: receiving a plurality of pieces of input data expressed as floating point; adjusting a bit-width of mantissa by performing masking on the mantissa of each piece of the input data based on a size of an exponent of each piece of the input data; and performing an operation between the input data with the adjusted bit-width.
METHOD AND APPARATUS FOR TRAINING NEURAL NETWORK MODEL
The embodiments of the present disclosure provides a method and an apparatus for training a neural network model, A training sample is obtained, and the neural network model is trained using the training sample. When the neural network model is trained, power exponential domain fixed-point encoding is performed on a first activation inputted into each network layer and a network weight of each network layer, and an encoded first activation and an encoded network weight are power exponential domain fixed-point data, which when used in the operation, can cause a matrix multiplication operation involved to be converted into an addition operation in the power exponential domain by means of the power exponential domain encoding. The hardware resources required for the addition operation are significantly less than that required for the multiplication operation, which therefore can greatly reduce the hardware resource overhead required for running the neural network model.
COMPUTING APPARATUS AND METHOD FOR VECTOR INNER PRODUCT, AND INTEGRATED CIRCUIT CHIP
The present disclosure relates to a computing apparatus, a method and an integrated circuit chip for a vector inner product, where the computing apparatus may be included in a combined processing apparatus. The combined processing apparatus may further include a general interconnection interface and other processing apparatus. The computing apparatus interacts with other processing apparatus to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage apparatus, where the storage apparatus is respectively connected to the computing apparatus and other processing apparatus, and the storage apparatus is used for storing data of the computing apparatus and other processing apparatus.
Floating point to fixed point conversion using exponent offset
A binary logic circuit converts a number in floating point format having an exponent E, an exponent bias B=2.sup.ew-1−1, and a significand comprising a mantissa M of mw bits into a fixed point format with an integer width of iw bits and a fractional width of fw bits. The circuit includes an offset unit configured to offset the exponent of the floating point number by an offset value equal to (iw−1−s.sub.y) to generate a shift value s.sub.v of sw bits given by s.sub.v=(B−E)+(iw−1−s.sub.y), the offset value being equal to a maximum amount by which the significand can be left-shifted before overflow occurs in the fixed point format; a right-shifter operable to receive a significand input comprising a formatted set of bits derived from the significand, the shifter being configured to right-shift the input by a number of bits equal to the value represented by k least significant bits of the shift value to generate an output result, where bitwidth[min(2.sup.ew-1−1, iw−1−s.sub.y)+min(2.sup.ew-1−2, fw)]≤k≤sw, where s.sub.y=1 for a signed floating point number and s.sub.y=0 for an unsigned floating point number.
Floating point to fixed point conversion using exponent offset
A binary logic circuit converts a number in floating point format having an exponent E, an exponent bias B=2.sup.ew-1−1, and a significand comprising a mantissa M of mw bits into a fixed point format with an integer width of iw bits and a fractional width of fw bits. The circuit includes an offset unit configured to offset the exponent of the floating point number by an offset value equal to (iw−1−s.sub.y) to generate a shift value s.sub.v of sw bits given by s.sub.v=(B−E)+(iw−1−s.sub.y), the offset value being equal to a maximum amount by which the significand can be left-shifted before overflow occurs in the fixed point format; a right-shifter operable to receive a significand input comprising a formatted set of bits derived from the significand, the shifter being configured to right-shift the input by a number of bits equal to the value represented by k least significant bits of the shift value to generate an output result, where bitwidth[min(2.sup.ew-1−1, iw−1−s.sub.y)+min(2.sup.ew-1−2, fw)]≤k≤sw, where s.sub.y=1 for a signed floating point number and s.sub.y=0 for an unsigned floating point number.
Method and apparatus for generating fixed-point quantized neural network
A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.
Method and apparatus for generating fixed-point quantized neural network
A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.