H03M7/24

Logarithmic addition-accumulator circuitry, processing pipeline including same, and methods of operation

An integrated circuit including a plurality of logarithmic addition-accumulator circuits, connected in series, to, in operation, perform logarithmic addition and accumulate operations, wherein each logarithmic addition-accumulator circuit includes: (i) a logarithmic addition circuit to add a first input data and a filter weight data, each having a logarithmic data format, and to generate and output first sum data having a logarithmic data format, and (ii) an accumulator, coupled to the logarithmic addition circuit of the associated logarithmic addition-accumulator circuit, to add a second input data and the first sum data output by the associated logarithmic addition circuit to generate first accumulation data. The integrated circuit may further include first data format conversion circuitry, coupled to the output of each logarithmic addition circuit, to convert the data format of the first sum data to a floating point data format, in which case the accumulator may be of a floating point type.
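The data flow described above can be illustrated with a small software sketch. It assumes a hypothetical logarithmic format of `(sign, log2|x|)` for nonzero values; the function names and the format are illustrative assumptions, not the circuit or encoding in the patent. Adding two log-format values corresponds to multiplying the underlying numbers, and the sum is converted to floating point before it is accumulated.

```python
import math

def to_log(x):
    """Encode a nonzero real as (sign, log2|x|). Hypothetical format; zero is not handled."""
    return (1 if x < 0 else 0, math.log2(abs(x)))

def log_add(a, b):
    """Logarithmic addition: adding exponents multiplies the underlying values."""
    return (a[0] ^ b[0], a[1] + b[1])

def log_to_float(v):
    """Format conversion from the log format back to a floating point value."""
    sign, exponent = v
    return (-1.0 if sign else 1.0) * 2.0 ** exponent

def log_mac_chain(inputs, weights, acc=0.0):
    """Model a series of stages: each adds one converted log sum to the running total."""
    for x, w in zip(inputs, weights):
        acc += log_to_float(log_add(to_log(x), to_log(w)))
    return acc

if __name__ == "__main__":
    print(log_mac_chain([1.0, 2.0, 4.0], [0.5, -2.0, 0.25]))
```

Each loop iteration stands in for one addition-accumulator stage in the series connection, with `acc` playing the role of the second input data passed from the previous stage.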

DIGITAL VARIABLE GAIN ADJUSTMENT ON BASEBAND CHIP
20220407485 · 2022-12-22

Embodiments of apparatus and method for digital variable gain adjustment (DVGA) are disclosed. In an example, a baseband chip includes an unpacking module, a symbol recording module operatively coupled to the unpacking module, and a first variable gain adjusting (VGA) module operatively coupled to the symbol recording module. The unpacking module is configured to unpack a plurality of symbols from a first representation of pseudo floating-point numbers to a second representation of fixed-point numbers. The symbol recording module is configured to obtain a symbol parameter based on the unpacking. The first VGA module is configured to dynamically adjust gains of the plurality of symbols having the second representation based on the symbol parameter.
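A rough sketch of the three stages follows. The abstract does not define the pseudo floating-point layout or the symbol parameter, so this example assumes, purely for illustration, that each packed symbol is a `(mantissa, exponent)` pair and that the recorded parameter is the bit length of the largest magnitude. All names are hypothetical.

```python
def unpack(packed):
    """Unpack assumed (mantissa, exponent) pairs into fixed-point integers."""
    return [mantissa << exponent for mantissa, exponent in packed]

def record_peak_bits(symbols):
    """Assumed symbol parameter: bits needed for the largest magnitude."""
    return max(abs(s) for s in symbols).bit_length()

def adjust_gain(symbols, peak_bits, target_bits):
    """Shift every symbol so the peak magnitude occupies target_bits bits."""
    shift = target_bits - peak_bits
    if shift >= 0:
        return [s << shift for s in symbols]
    return [s >> -shift for s in symbols]  # arithmetic shift, rounds toward -inf

if __name__ == "__main__":
    fixed = unpack([(3, 2), (-5, 1), (1, 0)])
    print(adjust_gain(fixed, record_peak_bits(fixed), target_bits=8))
```

Using a power-of-two gain keeps the adjustment to a single shift per symbol; a real design might use finer gain steps.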

Compression of Data that Exhibits Mixed Compressibility
20220368343 · 2022-11-17

Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
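One way to picture "mixed compressibility" in floating-point data is sketched below. The abstract does not say how the data is partitioned, so the split used here is an assumption: the top byte of each float32 (sign plus most of the exponent) tends to repeat across model parameters and is passed through `zlib`, while the remaining three bytes are stored uncompressed. The function names are hypothetical.

```python
import struct
import zlib

def compress_floats(values):
    """Split float32 bytes into a compressible stream and a raw stream (assumed scheme)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    high = raw[3::4]  # little-endian: byte 3 holds the sign and top exponent bits
    low = bytes(b for i, b in enumerate(raw) if i % 4 != 3)
    return len(values), zlib.compress(high), low

def decompress_floats(blob):
    """Reassemble the float32 values from the two streams."""
    count, high_compressed, low = blob
    high = zlib.decompress(high_compressed)
    raw = bytearray(4 * count)
    for i in range(count):
        raw[4 * i:4 * i + 3] = low[3 * i:3 * i + 3]
        raw[4 * i + 3] = high[i]
    return list(struct.unpack(f"<{count}f", bytes(raw)))

if __name__ == "__main__":
    weights = [0.01 * i for i in range(1, 101)]
    restored = decompress_floats(compress_floats(weights))
    print(len(restored), restored[:3])
```

The round trip is lossless at float32 precision; values that are not exactly representable in float32 come back as their float32 roundings.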

Floating point to fixed point conversion using exponent offset
11588497 · 2023-02-21

A binary logic circuit converts a number in floating point format having an exponent $E$, an exponent bias $B = 2^{ew-1} - 1$, and a significand comprising a mantissa $M$ of $mw$ bits into a fixed point format with an integer width of $iw$ bits and a fractional width of $fw$ bits. The circuit includes an offset unit configured to offset the exponent of the floating point number by an offset value equal to $(iw - 1 - s_y)$ to generate a shift value $s_v$ of $sw$ bits given by $s_v = (B - E) + (iw - 1 - s_y)$, the offset value being equal to a maximum amount by which the significand can be left-shifted before overflow occurs in the fixed point format; and a right-shifter operable to receive a significand input comprising a formatted set of bits derived from the significand, the shifter being configured to right-shift the input by a number of bits equal to the value represented by the $k$ least significant bits of the shift value to generate an output result, where $\mathrm{bitwidth}[\min(2^{ew-1} - 1,\ iw - 1 - s_y) + \min(2^{ew-1} - 2,\ fw)] \le k \le sw$, and where $s_y = 1$ for a signed floating point number and $s_y = 0$ for an unsigned floating point number.
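The shift-value formula can be checked with a small software model. The sketch below assumes IEEE half precision as the input format (ew = 5, mw = 10), handles normal numbers only, truncates toward zero, and ignores the bound on k entirely. It is an illustration of the arithmetic, not the circuit; the function name and default widths are hypothetical.

```python
import struct

def half_to_fixed(x, iw=8, fw=8, signed=True):
    """Convert a normal half-precision value to fixed point scaled by 2**fw."""
    ew, mw = 5, 10  # assumed input format: IEEE binary16
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    E = (bits >> mw) & ((1 << ew) - 1)
    M = bits & ((1 << mw) - 1)
    if E == 0 or E == (1 << ew) - 1:
        raise ValueError("sketch handles normal numbers only")
    if sign and not signed:
        raise ValueError("negative input for an unsigned format")
    B = (1 << (ew - 1)) - 1
    s_y = 1 if signed else 0
    s_v = (B - E) + (iw - 1 - s_y)  # shift value from the abstract
    if s_v < 0:
        raise OverflowError("value does not fit the integer width")
    significand = (1 << mw) | M  # 1.M as an integer with mw fractional bits
    # Left-align the leading one at the top magnitude bit, then right-shift by s_v.
    aligned = significand << (iw - 1 - s_y + fw)
    magnitude = aligned >> (s_v + mw)  # the extra mw removes the mantissa scaling
    return -magnitude if sign else magnitude

if __name__ == "__main__":
    print(half_to_fixed(1.5), half_to_fixed(-0.75), half_to_fixed(100.0))
```

A negative shift value means the significand would have to move left past the top magnitude bit, which is the overflow condition the offset is chosen to detect.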

Method and apparatus for generating fixed-point quantized neural network

A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.
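As a simplified illustration of choosing a per-channel fixed-point expression, the sketch below picks a fractional length for each channel from the largest absolute value in that channel. This max-abs rule is a stand-in assumption for the statistical analysis described above, and the step that derives bias and weight fractional lengths from the convolution result is not modeled. The names and the 8-bit width are hypothetical.

```python
import math

def fractional_length(values, bits=8):
    """Fractional bits left after reserving a sign bit and enough integer bits."""
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    return bits - 1 - int_bits

def quantize(values, frac_len, bits=8):
    """Scale by 2**frac_len, round, and saturate to the signed range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** frac_len
    return [min(hi, max(lo, round(v * scale))) for v in values]

def quantize_per_channel(channels, bits=8):
    """Return (frac_len, quantized values) for each channel independently."""
    result = []
    for values in channels:
        frac_len = fractional_length(values, bits)
        result.append((frac_len, quantize(values, frac_len, bits)))
    return result

if __name__ == "__main__":
    print(quantize_per_channel([[0.5, -0.25, 0.9], [3.0, -1.5]]))
```

Channels with a small dynamic range keep more fractional bits than channels with large values, which is the motivation for working per channel rather than per layer.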

Apparatus and method for converting a floating-point value from half precision to single precision

An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.
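The conversion and broadcast can be modeled in software at the bit level. The sketch below follows the IEEE 754 binary16 and binary32 layouts; the function names and the lane count are hypothetical, and the code says nothing about how the instruction is actually encoded or executed.

```python
import struct

def half_bits_to_single_bits(h):
    """Convert a binary16 bit pattern to the equivalent binary32 bit pattern."""
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0x1F:  # infinity or NaN: keep the payload in the top fraction bits
        out_exp, out_frac = 0xFF, frac << 13
    elif exp != 0:  # normal number: rebias the exponent from 15 to 127
        out_exp, out_frac = exp - 15 + 127, frac << 13
    elif frac == 0:  # signed zero
        out_exp, out_frac = 0, 0
    else:  # subnormal half: normalize, since it is a normal single
        shift = 11 - frac.bit_length()
        out_exp = 127 - 15 + 1 - shift
        out_frac = ((frac << shift) & 0x3FF) << 13
    return (sign << 31) | (out_exp << 23) | out_frac

def convert_and_broadcast(h, lanes=4):
    """Write the converted value to every element of a modeled destination register."""
    return [half_bits_to_single_bits(h)] * lanes

if __name__ == "__main__":
    register = convert_and_broadcast(0x3C00)  # 1.0 in half precision
    print([struct.unpack("<f", struct.pack("<I", b))[0] for b in register])
```

Every half-precision value is exactly representable in single precision, so the conversion needs no rounding.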