Methods and apparatuses for compressing parameters of neural networks

Abstract

An encoder for encoding weight parameters of a neural network is configured to obtain a plurality of weight parameters of the neural network, to encode the weight parameters of the neural network using a context-dependent arithmetic coding, to select a context for an encoding of a weight parameter, or for an encoding of a syntax element of a number representation of the weight parameter, in dependence on one or more previously encoded weight parameters and/or in dependence on one or more previously encoded syntax elements of a number representation of one or more weight parameters, and to encode the weight parameter, or a syntax element of the weight parameter, using the selected context. Corresponding decoder, quantizer, methods and computer programs are also described.

Claims

1. A decoder for decoding weight parameters of a neural network wherein the decoder is configured to acquire a plurality bits representing weight parameters of the neural network; wherein the decoder is configured to decode the weight parameters of the neural network using a context-dependent arithmetic coding; wherein the decoder is configured to select a context for a decoding of a weight parameter, or for a decoding of a syntax element of a number representation of the weight parameter, in dependence on one or more previously decoded weight parameters and/or in dependence on one or more previously decoded syntax elements of a number representation of one or more weight parameters; and wherein the decoder is configured to decode the weight parameter, or a syntax element of the weight parameter, using the selected context.

2. The decoder of claim 1, wherein the decoder is configured to determine probabilities for bin values of a given bin associated with a given context in dependence on one or more previously decoded bin values associated with the given context.

3. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter in dependence on a sign of a previously decoded weight parameter.

4. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter out of at least three different zero flag contexts.

5. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter in dependence on whether a currently decoded weight parameter is a first weight parameter in a scanning row of a matrix of weight parameters.

6. The decoder of claim 1, wherein the decoder is configured to select the context for a decoding of a zero flag of the weight parameter in dependence whether a weight parameter preceding the currently decoded weight parameter has already been decoded and/or is available.

7. The decoder of claim 1, wherein the decoder is configured to select a first context for a decoding of a zero flag of the weight parameter in case that a previously decoded weight parameter is zero and in case that a weight parameter preceding the currently decoded weight parameter has not yet been decoded and in case that a weight parameter preceding the currently decoded weight parameter is not available, and to select a second context for a decoding of a zero flag of the weight parameter in case that the previously decoded weight parameter is smaller than zero, and to select a third context for a decoding of a zero flag of the weight parameter in case that the previously decoded weight parameter is larger than zero.

8. The decoder of claim 1, wherein the decoder is configured to determine a plurality of status identifies representing statuses of a plurality of weight parameters at a plurality of positions relative to a position of a currently decoded weight parameter in the form of a numeric value, and to combine the status identifiers, in order to acquire a context index value representing a context of the currently decoded weight parameter.

9. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter in dependence on how many zero-valued weight parameters and/or unavailable weight parameters in a row are adjacent to the currently decoded weight parameter.

10. The decoder of claim 9, wherein the plurality of weight parameters is arranged in a matrix, and the weight parameters are denoted as I.sub.x−1,y, I.sub.x−,y and I.sub.x−3,y and correspond to positions (x−1,y), (x−2,y) and (x−3,y) in the matrix, respectively, and are represented by status identifiers s.sub.x−1,y, s.sub.x−2,y, s.sub.x−3,y.

11. The decoder of claim 8, wherein the plurality of weight parameters is arranged in a matrix, and a status identifier s.sub.x,y for a position (x,y) in the matrix is equal to a first value, if the position (x,y) is not available or the weight parameter at the position (x,y) is equal to zero, the status identifier s.sub.x,y for the position (x,y) is equal a second value, if the weight parameter at the position (x,y) is smaller than zero, and the status identifier s.sub.x,y for the position (x,y) is equal to a third value, if the weight parameter at the position (x,y) is larger than 0.

12. The decoder of claim 8, wherein the plurality of weight parameters is arranged in a matrix, and a status identifier s.sub.x,y for a position (x,y) in the matrix is equal to a first value, if the position (x,y) is not available or the weight parameter at the position (x,y) is equal to zero, and the status identifier s.sub.x,y for the position (x,y) is equal to a second value, if the position (x,y) is available and the weight parameter at the position (x,y) is not equal to zero.

13. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter in dependence on a distance of a closest non-zero weight parameter present in a predetermined direction, when seen from the currently decoded weight parameter.

14. The decoder of claim 1, wherein the decoder is configured to select a context for the decoding of a zero flag of the weight parameter considering only a single one previously decoded weight parameter, which is adjacent to the currently decoded weight parameter.

15. The decoder of claim 14, wherein the decoder is configured to determine a status identifier for the single one previously decoded weight position, wherein the status identifier for the single one previously decoded weight parameter equals to a first value, if the single one previously decoded weight parameter is not available or the weight parameter at the position (x,y) is equal to zero, equals to a second value, if the single one previously decoded weight parameter is smaller than zero, and equals to a third values, if the single one previously decoded weight parameter is larger than 0; and wherein the decoder is configured to select the context in dependence on the status identifier.

16. The decoder of claim 1, wherein the decoder is configured to select different contexts in dependent on whether the previously decoded weight parameter is smaller than zero, equal to zero or larger than zero.

17. The decoder of claim 1, wherein the decoder is configured to select a context associated with a zero value of the previously decoded weight parameter in case the previously decoded weight parameter is not available.

18. The decoder of claim 1, wherein the weight parameters are organized in rows and columns of a matrix, wherein an order in which the weight parameters are decoded is along a first row of the matrix, then along a subsequent second row of the matrix, or wherein an order in which the weight parameters are decoded is along a first column of the matrix, then along a subsequent second column of the matrix.

19. A method for decoding weight parameters of a neural network wherein the method comprises acquiring a plurality of bits representing weight parameters of the neural network; wherein the method comprises decoding the weight parameters of the neural network using a context-dependent arithmetic coding; wherein the method comprises selecting a context for a decoding of a weight parameter, or for a decoding of a syntax element of a number representation of the weight parameter, in dependence on one or more previously decoded weight parameters and/or in dependence on one or more previously decoded syntax elements of a number representation of one or more weight parameters; and wherein the weight parameter, or a syntax element of the weight parameter, is decoded using the selected context.

20. A non-transitory digital storage medium having a computer program stored thereon to perform the method for decoding weight parameters of a neural network, which method comprises acquiring a plurality of bits representing weight parameters of the neural network; wherein the method comprises decoding the weight parameters of the neural network using a context-dependent arithmetic coding; wherein the method comprises selecting a context for a decoding of a weight parameter, or for a decoding of a syntax element of a number representation of the weight parameter, in dependence on one or more previously decoded weight parameters and/or in dependence on one or more previously decoded syntax elements of a number representation of one or more weight parameters; and wherein the weight parameter, or a syntax element of the weight parameter, is decoded using the selected context, when said computer program is run by a computer.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0216] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

[0217] FIG. 1 shows a graph representation of a feed forward neural network,

[0218] FIG. 2 shows an exemplary embodiment of an encoder according to an aspect of the present invention,

[0219] FIG. 3 shows an exemplary embodiment of a decoder according to an aspect of the present invention, and

[0220] FIG. 4 shows an exemplary embodiment of a quantizer according to an aspect of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0221] In the following, some approaches for the determination of neural network coefficients will be described, which may for example be used in combination with the further concepts disclosed herein. However it should be noted that different approaches for the determination of the coefficients of a neural network may also be used.

[0222] For example, the here presented apparatus applies a relevance estimation based on the later Bayesian approach. Concretely, it applies the algorithm presented in “Variational Dropout Sparsifies Deep Neural Networks” (Dmitry Molchanov; arXiv:1701.05369, 2017). The algorithm can be applied in order to estimate the optimal mean and variances for each weight parameter of the network for solving the particular task. Once these have been estimated, a relevance-weighted quantization algorithm is applied on to the mean values of the weight parameters.

[0223] Concretely, it uses the standard deviation of each parameter as a measure of the interval-size where quantization is allowed (more on this later).

[0224] The apparatus proposes two options for the estimation of the mean and variances.

[0225] The first option applies the algorithm fully as described in above document. Thus, it trains both, the mean and variances of each weight parameter in order to attain the optimal configuration for solving the task. The initialization of the means may be either random or taken from pretrained network. This approach comes with the advantage that the resulting network may be highly quantized and compressed. However, it is disadvantageous that it involves high computational resources in order to be trained. Here, an entire training set of data may be used for the estimation of the mean and variances.

[0226] The second option does not have the disadvantage as the first one, i.e. involving high computational resources, as it takes a pretrained network as initialization and fixes its parameters as the mean of the distribution (thus, are unchanged). Then, only the variances of each parameters are estimated by applying the algorithm indicated above. Whilst this approach may not attain as high compression gains, it comes with the advantage that the computational resources are greatly reduced, since this option only estimates the variances. This method may be applied if the entire training set of data is available, or only a subset of data samples (such as a validation set).

[0227] The algorithm indicated above redefines the forward propagation method into a stochastic algorithm and minimizes a variational objective instead. Concretely, the algorithm attempts to minimize a regularized objective

custom-character (ϕ)=L.sub.D(ϕ)−D.sub.KL(q.sub.ϕ (w)∥p(w))

where the first term tries to find the mean and variances of each parameter (as parametrized by psi) that solve the task well, and the other attempts to sparsify the means and to maximize the variances.

[0228] Hence, the second option attempts to find the maximum variances (or perturbations) that may be applied to the pretrained values of the network while minimally affecting the accuracy of it. And the first option attempts to additionally find a network with maximal number of 0 means. Therefore, we usually attain higher compression gains when we apply the first option, but at the expense of having to apply high computational resources for the estimation.

[0229] In the following, an approach will be described, which may, for example, be used for the quantization of parameters of a neural network (for example, for the quantization of parameters determined as described above). The quantization approach may, for example, be used in combination with any of the other concepts disclosed herein, but may also be used individually.

[0230] Quantization: Based on the estimated mean and variances of the network, the apparatus applies a context-adaptive relevance-weighted quantization method on to the mean values of the parameter.

[0231] FIG. 4 shows an example for a quantizer according to an aspect of the present invention. It is shown that a quantizer 300 for quantizing weight parameters of a neural network 320 obtains a plurality of input weight parameters 310 of the neural network 320. The quantizer 300 determines a quantized weight parameter 330 on the basis of an input weight parameter 310 using distortion measures 340 describing an expected distortion caused by a usage of different quantized values instead of an unquantized input weight parameter and using bit amount values describing estimated or computed bit efforts for a representation of different quantized values.

[0232] However, it should be noted that different quantization concepts/quantization architectures can be used. In the following some optional details, which may be used for the quantization, e.g. for the quantization of neural network parameters, will be described, which can be used both individually and may be taken in combination.

[0233] Distortion measure: The following weighted distance measure

[00002] $D_{i, k} = \frac{1}{{σ_{i}}^{2}} {(w_{i} - q_{i, k})}^{2}$

may, for example, be employed as distortion measure, where w.sub.i is the i-th weight of a sequence of weights and where .sub.a is the associated standard deviation and where q.sub.i,k is the k-th one of a number of possible quantized versions of w.sub.i. Note that the distortion value D.sub.i,k doesn't exceed 1 if the quantized weight q.sub.i,k lies inside the respective standard deviation interval.

[0234] The quantized versions of a given weight are derived through quantization function |Q(.Math.)| which may, for example, constrain the quantized values q.sub.i,k to be equidistant, allowing for fixed-point representations.

[0235] Rate-distortion optimized quantization: In order to get a good trade-off between compression efficiency and prediction accuracy a rate-distortion optimized quantization may be applied. Therefore, a cost function may be defined

custom-character
for each candidate quantized weight q.sub.i,k, with a distortion measure D.sub.y,k and a bit amount R.sub.y,k. Parameter k controls the operation point and may be chosen depending on the actual application. For example, the distortion measure D.sub.y,k as described above may be applied. Depending on the encoding algorithm, the bit amount R.sub.y,k may be estimated. It is the number of bits that may be used to encode q.sub.i,k into the bit stream. Then, given k, the cost function cost.sub.i,k is minimized over k.

[0236] It may further be of interest to only allow quantized weights for which D.sub.y,k doesn't exceed 1. In this case, the quantized weight q.sub.i,k is guaranteed to stay within the standard deviation interval of the weight w.sub.i.

[0237] In the following, concepts for lossless encoding and decoding, for example for lossless encoding and decoding of neural network parameters, or of quantized neural network parameters, will be described. The concepts for lossless encoding and decoding may, for example be used in combination with the neural network parameter determination described above and/or in combination with the quantization as described above, but may also be taken individually.

[0238] Lossless encoding and decoding: If a uniform quantizer is applied in the previous step, the quantized weight parameters may be represented by an integer value (weight levels) and a scaling factor. The scaling factor can be referred to as quantization step size, which may, for example, be fixed for a whole layer. In order to restore all quantized weight parameters of a layer, the step size and dimensions of the layer may be known by the decoder. They may, for example, be transmitted separately. In this case. The binary patterns are simply written to the bitstream, starting with the dimensions (integer) followed by the step size A (e.g. 32 bit float number).

[0239] Encoding of integers with context-adaptive binary arithmetic coding (CABAC): The quantized weight levels (integer representation) may then be transmitted using entropy coding techniques. Therefore, a layer of weights is mapped onto a sequence of quantized weight levels using a scan.

[0240] FIG. 2 shows an example for an encoder according to an aspect of the present invention. It is shown that an encoder 100 for encoding weight parameters 110 of a neural network 120 obtains a plurality of weight parameters 110 of the neural network 120. Then the encoder 100 encodes the weight parameters 110 of the neural network 120 using a context-dependent arithmetic coding 130, wherein the encoder 100 selects a context out of several available contexts 140.sub.1, 140.sub.2, . . . 140.sub.n for an encoding of one weight parameter 110, or for an encoding of a syntax element 110a of a number representation of the weight parameter.

[0241] The selection is performed in dependence on certain criteria 150 for selection. This document describes many possible options for this criterion 150. One possible criterion 150 is that the selection is performed in dependence on one or more previously encoded weight parameters and/or in dependence on one or more previously encoded syntax elements of a number representation of one or more weight parameters. The encoder encodes the weight parameter 110, or the syntax element 110a of the weight parameter, using the selected context.

[0242] However, it should be noted that different encoding concepts can be used. In the following some optional details, which may be used for the encoding, e.g. for the encoding of neural network parameters, will be described, which can be used both individually and may be taken in combination.

[0243] As an optional example, in an advantageous embodiment, a row-first scan order is used, starting with the upper-most row of the matrix, encoding the contained values from left to right. In this way, all rows are encoded from the top to the bottom.

[0244] As another optional example, in another advantageous embodiment, the matrix is transposed before applying the row-first scan.

[0245] As another optional example, in another advantageous embodiment, the matrix is flipped horizontally and/or vertically and/or rotated by 90/180/270 degree to the left or right, before the row-first scan is applied.

[0246] For coding of the levels CABAC (Context-Adaptive Binary Arithmetic Coding) is used. Details can be found in “Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard” (D. Marpe, et al.; IEEE transactions on circuits and systems for video technology, Vol. 13, No. 7, pp. 620-636, July 2003). So, a quantized weight level l is decomposed in a series of binary symbols or syntax elements, which then may be handed to the binary arithmetic coder (CABAC).

[0247] In the first step, a binary syntax element sig_flag is derived for the quantized weight level, which specifies whether the corresponding level is equal to zero. If the sig_flag is equal to one a further binary syntax elements sign_flag is derived. The bin indicates if the current weight level is positive (e.g. bin=0) or negative (e.g. bin=1).

[0248] Next, a unary sequence of bins is encoded, followed by a fixed length sequence as follows:

[0249] A variable k is initialized with a non-negative integer and X is initialized with 1<<k.

[0250] One or more syntax elements abs_level_greater_X are encoded, which indicate, that the absolute value of the quantized weight level is greater than X. If abs_level_greater_X is equal to 1, the variable k is updated (for example, increased by 1), then 1<<k is added to X and a further abs_level_greater_X is encoded. This procedure is continued until an abs_level_greater_X is equal to 0. Afterwards, a fixed length code of length k suffices to complete the encoding of the quantized weight index. For example, a variable rem=X−|l| could be encoded using k bits. Or alternatively, a variable rem′ could be defined as custom-character which is encoded using k bits. Any other mapping of the variable rem to a fixed length code of k bits may alternatively be used.

[0251] When increasing k by 1 after each abs_level_greater_X, this approach is identical to applying exponential Golomb coding (if the sign_flag is not regarded).

[0252] Additionally, if the maximum absolute value abs_max is known at the encoder and decoder side, encoding of abs_level_greater_X syntax elements may be terminated, when for the next abs_level_greater_X to be transmitted, X>=abs_max holds.

[0253] Decoding of integers with CABAC: Decoding of the quantized weight levels (integer representation) works analogously to the encoding.

[0254] FIG. 3 shows an example for a decoder according to an aspect of the present invention. It is shown that a decoder 200 for decoding weight parameters 260 of a neural network 220 obtains a plurality of weight parameters 260 of the neural network 220. Then the decoder 200 decodes the weight parameters 260 of the neural network 220 using a context-dependent arithmetic coding 230, wherein the decoder 200 selects a context out of several available contexts 240.sub.1, 240.sub.2, . . . 240.sub.n for a decoding of one weight parameter 260, or for a decoding of a syntax element 260a of a number representation of the weight parameter.

[0255] The selection is performed in dependence on certain criteria 250 for selection. This document describes many possible options for this criterion 250. One possible criterion 250 is that the selection is performed in dependence on one or more previously decoded weight parameters and/or in dependence on one or more previously decoded syntax elements of a number representation of one or more weight parameters. The decoder decodes the weight parameter 260, or the syntax element 260a of the weight parameter, using the selected context.

[0256] However, it should be noted that different decoding concepts can be used. In the following some optional details, which may be used for the decoding, e.g. for the decoding of neural network parameters, will be described, which can be used both individually and may be taken in combination.

[0257] The decoder first decodes the sig_flag. If it is equal to one, a sign_flag and a unary sequence of abs_level_greater_X follows, where the updates of k, (and thus increments of X) will follow the same rule as in the encoder. Finally, the fixed length code of k bits is decoded and interpreted as integer number (e.g. as rem or rem′, depending on which of both was encoded). The absolute value of the decoded quantized weight level Ill may then be reconstructed from X, and form the fixed length part. For example, if rem was used as fixed-length part, |l|=X−rem. Or alternatively, if rem′ was encoded, custom-character As a last step, the sign needs to be applied to |l| in dependence on the decoded sign_flag, yielding the quantized weight level l. Finally, the quantized weight q is reconstructed by multiplying the quantized weight level l with the step size Δ.

[0258] In an advantageous embodiment, k is initialized with 0 and updated as follows. After each abs_level_greater_X equal to 1, the update of k that may be performed is done according to the following rule: If X>X′, k is incremented by 1 where X′ is a constant depending on the application. For example, X′ is a number (e.g. between 0 and 100) that is derived by the encoder and signaled to the decoder.

[0259] Context modeling: In the CABAC entropy coding, most syntax elements for the quantized weight levels are coded using a binary probability modelling. Each binary decision (bin) is associated with a context. A context represents a probability model for a class of coded bins. The probability for one of the two possible bin values is estimated for each context based on the values of the bins that have been already coded with the corresponding context. Different context modelling approaches may be applied, depending on the application. Usually, for several bins related to the quantized weight coding, the context, that is used for coding, is selected based on already transmitted syntax elements. Different probability estimators may be chosen, for example SBMP (State-Based Multi-Parameter estimator), or those of HEVC or VTM-4.0, depending on the actual application. The choice affects, for example, the compression efficiency and complexity.

[0260] Details for SBMP can be found in “JVET-K0430-v3-CE5-related: State-based probability estimator” (H. Kirchhoffer, et al.; in JVET, Ljubljana, 2018).

[0261] Further details for HEVC can be found in “ITU-T H.265 High efficiency video coding” (ITU—International Telecommunication Union, Series H: Audiovisual and multimedia systems—Infrastructure of audiovisual services—Coding of moving video, April 2015).

[0262] And details for VTM-4.0 can be found in “JVET-M1001-v6—Versatile Video Coding (Draft 4)” (B. Bross, et al.; in JVET, Marrakech, 2019).

[0263] A context modeling scheme that fits a wide range of neural networks is described as follows. For decoding a quantized weight level l at a particular position (x,y) in the weight matrix, a local template is applied to the current position. This template contains a number of other (ordered) positions like e.g. (x−1, y), (x, y−1), (x−1, y−1), etc. For each position, a status identifier is derived.

[0264] In an advantageous embodiment (denoted Si1), a status identifier s.sub.x,y for a position (x,y) is derived as follows: If position (x,y) points outside of the matrix, or if the quantized weight level l.sub.x,y at position (x,y) is not yet decoded or equals zero, the status identifier s.sub.x,y=0. Otherwise, the status identifier shall be s.sub.x,y=l.sub.x,y<0?1:2.

[0265] In another advantageous embodiment (denoted Si2), a status identifier s.sub.x,y for a position (x,y) is derived as follows: If position (x,y) points outside of the matrix, or if the quantized weight level l.sub.i,y at position (x,y) is not yet decoded or equals zero, the status identifier s.sub.x,y=0. Otherwise, the status identifier shall be s.sub.x,y=1.

[0266] For a particular template, a sequence of status identifiers is derived, and each possible constellation of the values of the status identifiers is mapped to a context index, identifying a context to be used. The template, and the mapping may be different for different syntax elements. For example, from a template containing the (ordered) positions (x−1, y), (x, y−1), (x−1, y−1) an ordered sequence of status identifiers s.sub.x−1,y, s.sub.x,y−1, s.sub.x−1,y−1 is derived. For example, this sequence may be mapped to a context index C=s.sub.x−1,y+3*s.sub.x,y−1+9* s.sub.x−1,y−1. For example, the context index C may be used to identify a number of contexts for the sig_flag.

[0267] In an advantageous embodiment (denoted approach 1), the local template for the sig_flag or for the sign_flag of the quantized weight level l.sub.x,y at position (x,y) consists of only one position (x−1, y) (i.e. the left neighbor). The associated status identifier s.sub.x−1,y is derived according to advantageous embodiment Si1.

[0268] For the sig_flag, one out of three contexts is selected depending on the value of s.sub.x−1,y or for the sign_flag, one out of three other contexts is selected depending on the value of s.sub.x−1,y.

[0269] In another advantageous embodiment (denoted approach 2), the local template for the sig flag contains the three ordered positions (x−1, y), (x−2, y), (x−3, y). The associated sequence of status identifiers s.sub.x−1,y, s.sub.x−2,y, s.sub.x−3,y is derived according to advantageous embodiment Si2.

[0270] For the sig_flag, the context index C is derived as follows:

[0271] If custom-character then C=0. Otherwise, if then C=1. Otherwise, if then C=2. Otherwise, C=3.

[0272] This may also be expressed by the following equation:

|C=(s.sub.x−1,y≠0)?0:((s.sub.x−2,y≠0)?1:((s.sub.x−3,y≠0)?2:3))|

[0273] In the same manner, the number of neighbors to the left may be increased or decreased so that the context index C equals the distance to the next nonzero weight to the left (not exceeding the template size).

[0274] Each abs_level_greater_X flag may, for example, apply an own set of two contexts. One out of the two contexts is then chosen depending on the value of the sign_flag.

[0275] In an advantageous embodiment, for abs_level_greater_X flags with X smaller than a predefined number X′, different contexts are distinguished depending on X and/or on the value of the sign_flag.

[0276] In an advantageous embodiment, for abs_level_greater_X flags with X greater or equal to a predefined number X′, different contexts are distinguished only depending on X.

[0277] In another advantageous embodiment, abs_level_greater_X flags with X greater or equal to a predefined number X′ are encoded using a fixed code length of 1 (e.g. using the bypass mode of an arithmetic coder).

[0278] Furthermore, some or all of the syntax elements may also be encoded without the use of a context. Instead, they are encoded with a fixed length of 1 bit. E.g. using a so-called bypass bin of CABAC.

[0279] In another advantageous embodiment, the fixed-length remainder rem is encoded using the bypass mode.

[0280] In another advantageous embodiment, the encoder determines a predefined number X′, distinguishes for each syntax element abs_level_greater_X with X <X′ two contexts depending on the sign, and uses for each abs_level_greater_X with X>=X′ one context.

[0281] Particularly advantageous aspects:

[0282] According to an aspect of the present invention, the estimated standard deviation of each parameter can be interpreted as the respective relevance score and accordingly weight the distortion measure of the quantization step.

[0283] Further, context adaptive quantization method can be applied based on the distribution of the mean parameter values and their variances.

[0284] Finally, the decoding procedure can be adapted in order to be able to perform efficient dot product operations.

[0285] Any of these concepts may optionally be used in any of the embodiments, in combination with any other aspect or taken individually.

[0286] Generalizations

[0287] The here presented apparatus (or, generally speaking, any of the embodiments disclosed herein) may be generalized and adapted to other relevance score measures. Namely, the distortion function that is applied in the quantization procedure may be generalized to

D.sub.i=R.sub.id(w.sub.i, q(w.sub.i))

where now d(.,.) may be any distance measure and R.sub.i any relevance score measure.

[0288] However, any other distortion function can also be used optionally. It may even be possible to combine more than one distortion function to generate a distortion measure for use with any of the concepts described herein.

[0289] Other work: There have been some work where they suggested to apply a weighted entropy-constrained quantization algorithm. Details can be found in “Towards the limit of network quantization” (Yoojin Choi, et al.; CoRR, abs/1612.01543, 2016) and “Weighted-entropy-based quantization for deep neural networks” (Eunhyeok Park, et al.; in CVPR, 2017). However, their quantization algorithm is based on the entropy-constrained Lloyd algorithm for scalar quantization (see also “Source Coding: Part I of Fundamentals of Source and Video Coding” (Thomas Wiegand and Heiko Schwarz, Foundations and Trends® in Signal Processing: Vol. 4: No. 1-2, 2011)) and does therefore not apply any context-based adaptation algorithm, neither any optimizations that aim to improve the associated dot product algorithm. Moreover, in contrast to the method applied in this document, their relevance scores are based on Taylor-expansion methods or parameters magnitude-based methods.

[0290] However, it has been found that the concepts described in above mentioned document can optionally be used—individually or in combination—with one or more aspects of the present document.

[0291] Conclusions

[0292] To conclude, the embodiments described herein can optionally be supplemented by any of the important points or aspects described here. However, it is noted that the important points and aspects described here can either be used individually or in combination and can be introduced into any of the embodiments described herein, both individually and in combination.

[0293] Implementation Alternatives

[0294] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

[0295] Depending on certain implementation requirements, embodiments of an aspect of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

[0296] Therefore, the digital storage medium may be computer readable.

[0297] Some embodiments according to an aspect of the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

[0298] Generally, embodiments of an aspect of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.

[0299] The program code may for example be stored on a machine-readable carrier.

[0300] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

[0301] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

[0302] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitionary.

[0303] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

[0304] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

[0305] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

[0306] A further embodiment according to an aspect of the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

[0307] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.

[0308] The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

[0309] The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

[0310] The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

[0311] The methods described herein, or any components of the apparatus described herein, may be performed at least partially by hardware and/or by software.

[0312] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

Methods and apparatuses for compressing parameters of neural networks

Inventors

Cpc classification

Classification Explorer

H03M7/4018

ELECTRICITY

Classification Explorer

H03M7/3079

ELECTRICITY

Classification Explorer

H03M7/70

ELECTRICITY

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

H03M7/702

ELECTRICITY

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/10

PHYSICS

Classification Explorer

G06N3/063

PHYSICS

International classification

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/10

PHYSICS

Classification Explorer

H03M7/30

ELECTRICITY

Abstract

Claims

Description