DECODER FOR DECODING WEIGHT PARAMETERS OF A NEURAL NETWORK, ENCODER, METHODS AND ENCODED REPRESENTATION USING PROBABILITY ESTIMATION PARAMETERS

Abstract

A decoder for decoding weight parameters of a neural network, wherein the decoder is configured to obtain a plurality of neural network parameters of the neural network on the basis of an encoded bitstream. Furthermore, the decoder is configured to decode the neural network parameters of the neural network using a context-dependent arithmetic decoding Moreover, the decoder is configured to obtain a probability estimate for a decoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters. In addition, the decoder is configured to use different probability estimation parameter values for a decoding of different neural network parameters and/or to use different probability estimation parameter values for a decoding of bins associated with different context models. Some embodiments are configured to use different probability estimation parameter values for a decoding of neural network parameters of different layers of the neural network.

Claims

1-61. (canceled)

62. A decoder for decoding weight parameters of a neural network, wherein the decoder is configured to acquire a plurality of neural network parameters of the neural network on the basis of an encoded bitstream; wherein the decoder is configured to decode the neural network parameters of the neural network using a context-dependent arithmetic decoding; wherein the decoder is configured to acquire a probability estimate for a decoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the decoder is configured to use different probability estimation parameter values for a decoding of different neural network parameters and/or to use different probability estimation parameter values for a decoding of bins associated with different context models.

63. A decoder for decoding weight parameters of a neural network, wherein the decoder is configured to acquire a plurality of neural network parameters of the neural network on the basis of an encoded bitstream; wherein the decoder is configured to decode the neural network parameters of the neural network using a context-dependent arithmetic decoding; wherein the decoder is configured to acquire a probability estimate for a decoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the decoder is configured to use different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network.

64. The decoder according to claim 62, wherein the decoder is configured to choose one or more probability estimation parameters from a base set, or from a true subset of the base set.

65. The decoder according to claim 62, wherein the decoder is configured to choose one or more probability estimation parameters from different sets of useable parameter values or of useable tuples of parameter values in dependence on a quantization mode and/or in dependence on a number of parameters of a layer of the neural network, or in dependence on a number of neural network parameters to be decoded using the chosen one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter; or wherein the decoder is configured to use different mapping rules mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters in dependence on a quantization mode and/or in dependence on a number of parameters of a layer of the neural network, or in dependence on a number of neural network parameters to be decoded using the chosen one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter.

66. The decoder according to claim 62, wherein the decoder is configured to selectively choose one or more probability estimation parameters from a first set of useable parameter values or from a first set of useable tuples of parameter values in case that a uniform quantization of the one or more probability estimation parameters is used, and/or if the number of parameters of a layer of the neural network is below a threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is below a threshold value, or if the number of elements of the layer parameter is below a threshold value, and wherein the decoder is configured to selectively choose one or more probability estimation parameters from a second set of useable parameter values or from a second set of useable tuples of parameter values in case that a variable quantization of the one or more probability estimation parameters is used, and/or if the number of parameters of a layer of the neural network is above the threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is above the threshold value, or if the number of elements of the layer parameter is above the threshold value; or wherein the decoder is configured to use, or to selectively use, a first mapping rule mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters in case that a uniform quantization of the one or more probability estimation parameters is used, and/or if the number of parameters of a layer of the neural network is below a threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is below a threshold value, or if the number of elements of the layer parameter is below a threshold value, and wherein the decoder is configured to use, or to selectively use a second mapping rule mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters in case that a variable quantization of the one or more probability estimation parameters is used, and/or if the number of parameters of a layer of the neural network is above the threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is above the threshold value, or if the number of elements of the layer parameter is above the threshold value, and wherein the first set of useable parameter values is different from the second set of useable parameter values, and wherein the first set of useable tuples of parameter values is different from the second set of useable tuples of parameter values; and/or wherein the second set of useable parameter values comprises more useable parameter values than the first set of useable parameter values, and wherein the second set of useable tuples of parameter values comprises more useable tuples than the first set of useable tuples of parameter values; and/or wherein the second mapping rule is different from the first mapping rule.

67. The decoder according to claim 66, wherein, on average, useable parameter values of the second set of useable parameter values allow for a faster adaptation of a probability estimate than useable parameter values of the first set of useable parameter values, or wherein, on average, useable tuples of parameter values of the second set of useable tuples of parameter values allow for a faster adaptation of a probability estimate than useable tuples of parameter values of the first set of useable tuples of parameter values.

68. The decoder according to claim 66, wherein the second set of useable parameter values comprises a useable parameter value which allows for a faster adaptation of a probability estimate than useable parameter values of the first set of useable parameter values, or wherein the second set of useable tuples of parameter values comprises a useable tuple of parameter values which allows for a faster adaptation of a probability estimate than useable tuples of parameter values of the first set of useable tuples of parameter values.

69. The decoder according to claim 62, wherein the decoder is configured to selectively choose the one or more probability estimation parameters from an increased choice if a number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is larger than or equal to a threshold value.

70. The decoder according to claim 62, wherein the decoder is configured evaluate a signaling from which set of useable parameter values or from which set of useable tuples of parameter values the one or more probability estimation parameters are elected; or wherein the decoder is configured evaluate a signaling indication which mapping rule out of a plurality of mapping rules should be used to map an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters.

71. The decoder according to claim 62, wherein the decoder is configured to decode one or more index values describing a probability estimation parameter value, or describing a plurality of probability estimation parameter values, or describing a tuple of probability estimation parameter values.

72. The decoder according to claim 71, wherein the decoder is configured to decode the one or more index values using one or more context models.

73. The decoder according to claim 71, wherein the decoder is configured to decode a first bin, which describes whether a currently considered index value takes a default value, and wherein the decoder is configured to selectively decode one or more additional bins representing the currently considered index value, or a value derived therefrom, in a binary representation, if the currently considered index value does not take the default value; or wherein the decoder is configured to decode the one or more index values using a unary code decoding, or using a truncated unary code decoding, or using a variable length code decoding.

74. The decoder according to claim 62, wherein the decoder is configured to vary a number of bins or a maximum number of bins used for decoding the one or more probability estimation parameters in dependence on a quantization mode used for quantizing the one or more probability estimation parameters; and/or in dependence on a number of parameters of a layer of the neural network, or in dependence on a number of neural network parameters to be decoded using the one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter.

75. The decoder according to claim 62, wherein the decoder is configured to switch between different sets of usable parameter values associated with the one or more probability estimation parameters, or between different sets of tuples of useable parameter values associated with a plurality of probability estimation parameters, or between different mapping rules for mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters.

76. The decoder according to claim 75, wherein the decoder is configured to vary a number of bins or a maximum number of bins used for decoding the one or more probability estimation parameters designating a selected probability estimation parameter or a selected tuple of probability estimation parameters in accordance with a switching between different sets of usable parameter values associated with the one or more probability estimation parameters, or between different sets of tuples of useable parameter values associated with a plurality of probability estimation parameters or between different mapping rules.

77. The decoder according to claim 62, wherein the decoder is configured to determine one or more state variables and to derive the probability estimate using the one or more state variables.

78. The decoder according to claim 62, wherein the decoder is configured to derive the probability estimate p.sub.k from two state variables .sub.S1.sup.k, .sub.S2.sup.k according to $s_{k} = {.Math.}_{i = 1}^{N} .Math.s_{i}^{k} \cdot d_{i}^{k}.Math.$ and $p_{k} = \{\begin{array}{r} L U T 2 [.Math.s_{k} \cdot a_{k}.Math.], \\ 1 - L U T 2 [- .Math.s_{k} \cdot a_{k}.Math.], \end{array}) \begin{array}{l} i f .Math.s_{k} \cdot a_{k}.Math. \geq 0. \\ e l s e \end{array}$ .

79. The decoder according to claim 62, wherein the decoder is configured to update the state variables .sub.S1.sup.k, .sub.S2.sup.k according to $s_{i}^{k} = \{\begin{array}{l} s_{i}^{k} + .Math.A [z + .Math.s_{i}^{k} \cdot m_{i}^{k}.Math.] \cdot n_{i}^{k}.Math., & I f d e c o d e d s y m b o l i s 1. \\ s_{i}^{k} + .Math.A [z + .Math.- s_{i}^{k} \cdot m_{i}^{k}.Math.] \cdot n_{i}^{k}.Math., & I f d e c o d e d s y m b o l i s 0. \end{array})$ wherein $m_{i}^{k}$ and $n_{i}^{k}$ are weighting factors; and wherein A is a lookup table; and wherein z is an offset value .

80. The decoder according to claim 79, wherein the decoder is configured to vary the weighting factors $n_{i}^{k},$ so as to use different probability estimation parameter values for a decoding of different neural network parameters and/or to use different probability estimation parameter values for a decoding of bins associated with different context models and/or to use different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network.

81. The decoder according to claim 79, wherein a relationship between the weighting factors $n_{i}^{k}$ and adaptation parameters $s h_{i}^{k}$ is defined according to $n_{i}^{k} = 2^{- s h_{i}^{k} + 4}$ .

82. An encoder for encoding weight parameters of a neural network, wherein the encoder is configured to acquire a plurality of neural network parameters of the neural network; wherein the encoder is configured to encode the neural network parameters of the neural network using a context-dependent arithmetic coding; wherein the encoder is configured to acquire a probability estimate for an encoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the encoder is configured to use different probability estimation parameter values for an encoding of different neural network parameters and/or to use different probability estimation parameter values for an encoding of bins associated with different context models, or wherein the encoder is configured to use different probability estimation parameter values for an encoding of neural network parameters associated with different layers of the neural network.

83. A method for decoding weight parameters of a neural network, wherein the method comprises acquiring a plurality of neural network parameters of the neural network on the basis of an encoded bitstream; wherein the method comprises decoding the neural network parameters of the neural network using a context-dependent arithmetic decoding; wherein the method comprises acquiring a probability estimate for an decoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the method comprises using different probability estimation parameter values for a decoding of different neural network parameters and/or using different probability estimation parameter values for a decoding of bins associated with different context models or wherein the method comprises using different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network.

84. A method for encoding weight parameters of a neural network, wherein the method comprises acquiring a plurality of neural network parameters of the neural network; wherein the method comprises encoding the neural network parameters of the neural network using a context-dependent arithmetic coding; wherein the method comprises acquiring a probability estimate for an encoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the method comprises using different probability estimation parameter values for an encoding of different neural network parameters and/or using different probability estimation parameter values for an encoding of bins associated with different context models, or wherein the method comprises using different probability estimation parameter values for an encoding of neural network parameters associated with different layers of the neural network.

85. A non-transitory digital storage medium having stored thereon a computer program for performing a method for decoding weight parameters of a neural network, wherein the method comprises acquiring a plurality of neural network parameters of the neural network on the basis of an encoded bitstream; wherein the method comprises decoding the neural network parameters of the neural network using a context-dependent arithmetic decoding; wherein the method comprises acquiring a probability estimate for an decoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the method comprises using different probability estimation parameter values for a decoding of different neural network parameters and/or using different probability estimation parameter values for a decoding of bins associated with different context models or wherein the method comprises using different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network, when said computer program is run by a computer.

86. A non-transitory digital storage medium having stored thereon a computer program for performing a method for encoding weight parameters of a neural network, wherein the method comprises acquiring a plurality of neural network parameters of the neural network; wherein the method comprises encoding the neural network parameters of the neural network using a context-dependent arithmetic coding; wherein the method comprises acquiring a probability estimate for an encoding of a bin of a number representation of a neural network parameter using one or more probability estimation parameters; wherein the method comprises using different probability estimation parameter values for an encoding of different neural network parameters and/or using different probability estimation parameter values for an encoding of bins associated with different context models, or wherein the method comprises using different probability estimation parameter values for an encoding of neural network parameters associated with different layers of the neural network, when said computer program is run by a computer.

87. An encoded representation of weight parameters of a neural network, comprising: a plurality of encoded weight parameters of the neural network; and an encoded representation of one or more probability estimation parameters determining characteristics of a probability estimation for an adaptation of a context of an arithmetic decoding of the encoded weight parameters.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0196] The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:

[0197] FIG. 1 shows an example for a graph representation of a feed forward neural network;

[0198] FIG. 2 shows a block schematic diagram of a decoder according to embodiments of the invention;

[0199] FIG. 3 shows a schematic representation of an example of a selection of probability estimation parameters according to embodiments of the invention;

[0200] FIG. 4 shows a schematic representation of an example of a decoder selection entity according to embodiments of the invention;

[0201] FIG. 5 shows a schematic representation of an example of an encoded bitstream and index values describing a probability estimation parameter values according to embodiments of the invention; and

[0202] FIG. 6 shows a schematic block diagram of methods according to embodiments of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0203] Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.

[0204] In the following description, a plurality of details is set forth to provide a more throughout explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described herein after may be combined with each other, unless specifically noted otherwise.

[0205] FIG. 2 shows a block schematic diagram of a decoder according to embodiments of the invention. Decoder 200 comprises a context dependent arithmetic decoding unit 210, a probability estimator 220 and probability estimation parameter values 230. Optionally, the decoder 200 comprises a bitstream disassembly unit 240 and a parameter reassembly unit 250. The decoder 200 may be configured to receive an encoded bitstream 202. The encoded bitstream 202 may comprise an information about a plurality of neural network parameters.

[0206] The optional bitstream disassembly unit 240 may be configured to convert the encoded bitstream 202 in a processible information for the context dependent arithmetic decoding unit 210. This functionality may be provided by the decoding unit 210, therefore disassembly unit 240 is illustrated only for explanatory purposes. The disassembly unit may be configured to disassemble the encoded bitstream 202 in one part comprising an information about encoded neural network parameters and another part, e.g. flags, indicating a start of the bitstream and/or bits for error correction or other overhead.

[0207] The decoding unit 210 may be configured to decode the encoded neural network parameters in order to provide a plurality of, for example decoded, neural network parameters 204. In order to decode the neural network parameters, the decoding unit 210 is configured to receive a probability estimate from the probability estimator 220. The neural network parameters may be encoded in the bitstream 202 as a sequence of bins. One bin or a plurality of bins may represent a neural network parameter. The bins may, for example, be associated with a context, or in other words a probability model. The probability estimate may indicate a probability of a bin to have a certain value, e.g. 1 or 0. The probability estimate may be determined depending on a context of the bin, or in other words its probability model. The probability estimator 220 comprises probability estimation parameters, in order to determine the probability estimate.

[0208] The decoder or the context dependent arithmetic decoding unit 210 is configured to use different probability estimation parameters values for a decoding of different neural network parameters. Consequently, stochastic, individual characteristics of neural network parameters may be taken into account by adapting the probability estimation parameters. In addition, or alternatively, the decoder or the context dependent arithmetic decoding unit 210 may be configured to use different probability estimation parameter values for a decoding of bins associated with different context models. Individual bins, or sets of bins may be associated with a context model. The context models may be adapted according to, for example recently, decoded neural network parameters or bins thereof. As a result, for individual bins, or individual context models associated with bins, the parametrization of the probability estimator 220 may be adapted. The optional parameter reassembly unit 250 may be configured to reassemble decoded bins to neural network parameters and/or may be configured to interpret the decoded entities that are provided by the decoding unit 210 in order to provide the plurality of neural network parameters 204. Optionally, the probability estimator 220 may receive feedback information from the output of the decoding unit 210 and/or from the optional parameter reassembly unit 250.

[0209] However, decoder 200 may alternatively, or in addition be configured to use different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network. The information of the layer of a certain neural network parameter may be encoded in the bitstream, and may trigger a changing of probability estimation parameters values 230.

[0210] FIG. 3 shows a schematic representation of an example of a selection of probability estimation parameters according to embodiments of the invention. FIG. 3 shows, as an example, probability estimation parameters of approaches 1 - 4 and probability estimation parameters for updating the context model. The decoder may choose one or more probability estimation parameters, e.g. parameters 310 and/or 320 from a base set 300. As shown with parameters 310 and 320 these parameters may be a true subset of the base set 300.

[0211] As a result, the decoder may be configured to use not only different probability estimation parameter values but also different probability estimation parameters. The decoder may use any of the approaches 1 to 4, and therefore only need a subset of probability estimation parameters of the base set 300.

[0212] In addition, the decoder may not only choose probability estimation parameters but also their values. For example, in a first selection step, the decoder may choose parameters 310. In the next step the decoder may as well choose from different subsets of probability estimation parameter values for these probability estimation parameters. In other words, for each probability estimation parameter, there may be a plurality of sets of admissible probability estimation parameter values and the decoder may choose probability estimation parameters and corresponding values.

[0213] FIG. 4 shows a schematic representation of an example of a decoder selection entity according to embodiments of the invention. The decoder may be configured to choose one or more probability estimation parameters from different sets 410, 420 of usable parameter values, wherein, as an example, set 410 comprises a subset 412 (parameters according to approach 3) and a subset 414. Although the approaches 1 -4 and the update of the state variables have been explained in the context of the encoding and/or decoding of neural network parameters themselves, it is to be noted that similar or equivalent approaches and updates of context models may be performed for the encoding and/or decoding of the probability estimation parameters, hence these parameters are shown here. In addition, or alternatively, the decoder may choose from usable sets of tuples of parameters values 430, 440 and/or from different mapping rules, here shown as an example in the form of the Tables 5, 450, and 7, 460. Choice of probability estimation parameters values, tuples or mappings may be performed based on a quantization mode. It is to be noted, that different sets of usable parameters values may comprise the same probability estimation parameters but with different values. The decoder may choose between the first set 412 and another set 414 comprising, as an example, the values d.sub.1.sup.k=15, d.sub.2.sup.k= 2, a.sub.k=1,5.sup.-7 and LUT=LUT2. In other words, an adaption may be performed with regard to the parameters used for encoding/decoding as well as with regard to the respective values of the parameters. However, it is to be noted that according to embodiments only a selection of the values may be performed, e.g. a decoder decision between set 412 and 414, wherein the selection of parameters is not limited to the shown parameters of approach 3.

[0214] For example, the decoder may selectively choose one or more probability estimation parameters from a first set 410 of useable parameter values or from a first set of useable tuples 430 of parameter values in case that a uniform quantization of the one or more probability estimation parameters is used. As an example, the decoder may choose between sets 412 and 414 in case of the uniform quantization. On the other hand, the decoder may selectively choose one or more probability estimation parameters from a second set 420 of useable parameter values or from a second set of useable tuples 440 of parameter values in case that a variable quantization of the one or more probability estimation parameters is used. Similarly the decoder may use a first mapping rule 460 mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters in case that a uniform quantization of the one or more probability estimation parameters is used, and may use a second mapping rule 450, mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters in case that a variable quantization of the one or more probability estimation parameters is used. First and second sets, tuples and mapping rules may be different from each other.

[0215] It is to be noted that the different sets, tuples and mapping rules may comprise different probability estimation parameters, e.g. to perform calculations according to the different approaches 1-4 or may comprise the same probability estimation parameters but with different values. Consequently, the decoder may optionally choose a calculation routine first and then its parametrization, or in other words its values. On the other hand, the decoder may only choose the probability estimation parameters values according to the quantization mode, e.g. choosing between sets 412 and 414 or, for example between tuples 430, 440, describing the same probability estimation parameters but with different values.

[0216] As another optional feature, on average, useable parameter values of the second set of useable parameter values or of the second mapping rule may allow for a faster adaptation of a probability estimate, e.g. to a change of bin value frequencies, than useable parameter values of the first set of useable parameter values, or of the first mapping rule. Alternatively, on average, useable tuples of parameter values of the second set of useable tuples of parameter values, or of the second mapping rule, may allow for a faster adaptation of a probability estimate, e.g. to a change of bin value frequencies, than useable tuples of parameter values of the first set of useable tuples of parameter values, or of the first mapping rule.

[0217] Optionally, the second set of useable parameter values, or the second mapping rule, may comprise a useable parameter value which allows for a faster adaptation of a probability estimate, e.g. to a change of bin value frequencies, than useable parameter values, or even than all useable parameter values, of the first set of useable parameter values, or of the first mapping rule. Alternatively, the second set of useable tuples of parameter values comprises a useable tuple of parameter values which allows for a faster adaptation of a probability estimate, e.g. to a change of bin value frequencies, than useable tuples, or even than all useable tuples, of parameter values of the first set of useable tuples of parameter values.

[0218] Furthermore, sets 410 and 420 may be useable, or for example allowable, parameter values, tuples 430, 440 may be useable, or for example allowable, tuples. Decoder choice of such sets or tuples may be performed based on, or in dependence on a number of parameters of a layer of the neural network or in dependence on a number of neural network parameters to be decoded using the chosen one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter. Analogously, usage of different mapping rules 450 ,460 may be performed by the decoder in dependence on a number of neural network parameters to be decoded using the chosen one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter.

[0219] As another optional feature, the decoder may be configured to selectively choose one or more probability estimation parameters from a first set of useable parameter values or from a first set of useable tuples of parameter values if the number of parameters of a layer of the neural network is below a threshold value, e.g. X=1000, or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is below a threshold value, or if the number of elements of the layer parameter is below a threshold value.

[0220] Additionally, the decoder may be configured to selectively choose one or more probability estimation parameters from a second set of useable parameter values or from a second set of useable tuples of parameter values if the number of parameters of a layer of the neural network is above the threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is above the threshold value, or if the number of elements of the layer parameter is above the threshold value.

[0221] Alternatively, the decoder is configured to selectively use a first mapping rule mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters if the number of parameters of a layer of the neural network is below a threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is below a threshold value, or if the number of elements of the layer parameter is below a threshold value, and the decoder may be configured to selectively use a second mapping rule mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters if the number of parameters of a layer of the neural network is above the threshold value or if the number of neural network parameters to be decoded using the chosen one or more probability estimation parameters is above the threshold value, or if the number of elements of the layer parameter is above the threshold value;

[0222] in this case, the second set of useable parameter values may comprise more useable parameter values than the first set of useable parameter values, and the second set of useable tuples of parameter values may comprise more useable tuples than the first set of useable tuples of parameter values. In addition, or alternatively, the second mapping rule may be different from the first mapping rule.

[0223] FIG. 5 shows a schematic representation of an example of an encoded bitstream and index values describing probability estimation parameter values according to embodiments of the invention. As an example, the bitstream 202 comprises a signaling in the form of a flag indication F or in other words a flag F. However, the signaling may be transmitted in any suitable way. The decoder may evaluate flag F in order to determine from which set of useable parameter values or from which set of useable tuples of parameter values the one or more probability estimation parameters are elected. Alternatively, the decoder may be configured to evaluate the signaling indication which mapping rule out of a plurality of mapping rules should be used to map an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters. Hence, the decoder’s choice of sets, tuples and or mappings according to FIG. 4 may be based on the signaling, e.g. in the form of an encoded flag F. Optional bitstream disassembly unit 240, may for example, disassemble bitstream 202 in the flag for indicating the set, tuple and/or mapping to be used for decoding and in an information about the probability estimation parameter values.

[0224] As shown in FIG. 5 the bitstream 202 may comprise one or more index values q.sub.i, (here as an example shown for i=1, 2 and 3), for example integer values, describing a probability estimation parameter value, or describing a plurality of probability estimation parameter values, or describing a tuple of probability estimation parameter values. In other words, the index values q.sub.i may be an encoded representation of one or more probability estimation parameters.

[0225] The decoder may be configured to decode the one or more index values q.sub.i, for example using the signaling. In addition, the one or more index values q.sub.i may be associated with one or more context models c.sub.qi. and the decoder may be configured to decode the index values q.sub.i using the context models c.sub.qi.

[0226] The index values may be represented by one or more bins. A first bin, for example as shown fbin, may describe whether a currently considered index value takes a default value. In case the index value takes the default value, the index value may comprise only the one bin, since the index value is already determined by the first bin. Otherwise, the index value may be represented with one or more additional bins, e.g. in the form of bins addbin.sub.j, as an example shown with j = 1, 2, 3. The decoder may be configured to decode the first bin and the optional additional bins. Any of these bins may be associated with a context, for example individually. Optionally context c.sub.qi of index value q.sub.i may be associated with the first bin of the index value and the additional bins may be decoded with a fixed length per bin.

[0227] In addition, the bitstream 202 comprises integer multiples r.sub.i, as an example shown with i = 1, 2, 3, associated with neural network parameters. Based on the index values q.sub.i an adaptation of a context of an arithmetic decoding of encoded neural network parameters, which may be encoded weight parameters of the neural network, represented by integer multiples r.sub.i, may be performed.

[0228] As another optional feature, the decoder may be configured to decode the one or more index values using a unary code decoding, or using a truncated unary code decoding, or using a variable length code decoding, wherein, for example, the code lengths are chosen according to probabilities of occurrence of different index values. According to embodiments any suitable coding technique may be applied, for example individually for different index values, providing improved flexibility and coding efficiency.

[0229] In addition, the decoder may be configured to vary a number of bins or a maximum number of bins used for decoding the one or more probability estimation parameters in dependence on a quantization mode used for quantizing the one or more probability estimation parameters and/or in dependence on a number of parameters of a layer of the neural network, or in dependence on a number of neural network parameters to be decoded using the one or more probability estimation parameters, or in dependence on a number of elements of a layer parameter.

[0230] As another optional feature, the decoder may be configured to switch between different sets of usable parameter values associated with the one or more probability estimation parameters, or between different sets of tuples of useable parameter values associated with a plurality of probability estimation parameters, or between different mapping rules for mapping an encoded value representing one or more probability estimation parameters onto one or more probability estimation parameters. The decoder may switch between sets 410 (and/or between sets 412, 414) and 420 and/or between tuples 430, 440 and/or different mappings 450, 460 as shown in FIG. 4.

[0231] Furthermore, the beforementioned variation of number of bins or maximum number of bins may be performed by the decoder in accordance with a switching between different sets, tuples and/or mapping rules.

[0232] In addition, the decoder may be configured to determine one or more state variables, e.g. s.sub.i.sup.k or s.sub.k, and to derive the probability estimate, e.g. p.sub.k, using the one or more state variables.

[0233] Moreover, encoded bitstream 202 may be an encoded representation of weight parameters of a neural network, comprising a plurality of encoded weight parameters of the neural network in the form of the integer multiples r.sub.i and an encoded representation of one or more probability estimation parameters, namely the index values q.sub.i.

[0234] As shown, the encoded representation in the form of the encoded bitstream 202 may comprise separate encoded representations of separate probability estimation parameters, namely index values q.sub.i, (here as an example shown for i=1, 2 and 3). These index values q.sub.i may be associated with different neural network parameters, e.g. q.sub.1->r.sub.1, q.sub.2->r.sub.2, .... Alternatively or in addition, as shown, separate probability estimation parameters q.sub.i may be associated with different context models c.sub.qi. As another optional feature, separate probability estimation parameters may be associated with different layers of the neural network.

[0235] FIG. 6 shows a schematic block diagram of methods according to embodiments of the invention. FIG. 6 shows methods 600, 700 for decoding weight parameters of a neural network. The methods 600 and 700 comprise obtaining 610, 710 a plurality of neural network parameters, e.g., at least one of entries w.sub.i of matrix W, b, .Math., σ.sup.2, σ, .sub.Y,and/or β, of the neural network on the basis of an encoded bitstream, a decoding 620, 720 the neural network parameters of the neural network, e.g., a quantized version thereof, using a context-dependent arithmetic decoding, e.g., using a context-adaptive binary arithmetic decoding (CABAC). Optionally, probabilities of bin values are determined for different contexts, wherein, for example, each bin is associated with a context. Methods 600, 700 further comprise obtaining 630, 730 a probability estimate, e.g. P(t) or p.sub.k, which may, for example, be associated with a context, for a, optionally arithmetic, decoding of a bin of a number representation of a neural network parameter, e.g. on the basis of one or more previously decoded neural network parameters or bins thereof, using one or more probability estimation parameters, e.g., probability estimator parameters, e.g.

[00096] $N, a_{i}^{k}, b_{i}^{k}, a_{k}, d_{i}^{k}, A, m_{i}^{k}, n_{i}^{k}, s h_{i}^{k}, i n i t V a l_{i}^{k} .$

[0236] Method 600 comprises in addition using 640 different probability estimation parameter values for a decoding of different neural network parameters and/or using different probability estimation parameter values for a decoding of bins associated with different context models, e.g. c.sub.k.

[0237] On the other hand, method 700 comprises in addition using 740 different probability estimation parameter values for a decoding of neural network parameters associated with different layers of the neural network.

Further Embodiments and Aspects

[0238] In the following further embodiments comprising aspects and features that may be incorporated in any of the preceding embodiments are disclosed.

Efficient Representation of Parameters (Examples, Details are Optional)

[0239] The parameters W, b, .Math., σ.sup.2, γ, and β shall collectively be denoted parameters of a layer or layer parameters. They usually need to be signaled in a bitstream (e.g. in an encoded video representation, for example, if the neural network is used in a video decoder). For example, they could be represented as 32 bit floating point numbers or they could, for example, be quantized to an integer representation, also denoted as quantization indices. Note that ∈ is usually not signaled in the bitstream.

[0240] For example, a particularly efficient approach for encoding such parameters employs a uniform reconstruction quantizer (URQ) where, for example, each value is represented as integer multiple of a so-called quantization step size value. The corresponding floating point number can, for example, be reconstructed by multiplying the integer with the quantization step size, which is usually (but not necessarily) a single floating point number. However, for example, efficient implementations for neural network inference (that is, calculating the output of the neural network for an input) employ integer operations whenever possible. Therefore, it may be undesirable to use parameters to be reconstructed to a floating point representation.

[0241] In another efficient approach for encoding the parameters, a set of quantizers is applied where each value is, for example, represented as integer multiple of a quantization step size value. Usually, for example, each quantizer in the set employs a disjoint set of integer multiples of the quantization step size parameter as applicable reconstruction values, but two or more quantizers may share one or more reconstruction values. The applied quantizer depends, for example, on the values of previous quantization indices in coding order. The corresponding floating point number can, for example, be reconstructed by multiplying the integer with the quantization step size, which is usually, for example, a floating point number which depends on the chosen quantizer.

[0242] An example for such a quantizer design is trellis coded quantization (TCQ), also denoted as dependent quantization (DQ).

[0243] In an embodiment a set of two quantizers is used. The first quantizer employs, for example, all even multiples of the quantization step size including zero, and the second quantizer employs all the even multiples of the quantization step size including zero.

Entropy Coding and Probability Estimation (Examples, Details are Optional)

[0244] The quantization indices that are output, for example, by the quantization method are then entropy coded using a suitable entropy coding method.

[0245] A particularly suitable entropy coding method for encoding such quantization indices is Context-based Adaptive Binary Arithmetic Coding, also denoted as CABAC. For this, each quantization index is, for example, decomposed into a sequence of binary decisions, so-called bins.

[0246] Usually, for example, each bin is associated with a probability model, also denoted as context model, which models the statistics of the associated bins, for example, using a probability estimation method.

[0247] A probability estimator is an apparatus, that models the probability P(t) for a bin being equal to x, where x ∈ {0,1} , for example, based on already coded bins associated with the probability estimator.

[0248] For example, probability estimators have several parameters, denoted as probability estimator parameters or estimator parameters (or also as probability estimation parameters), that affect the probability estimates, e.g. the adaptation rate. Usually, those estimator parameters are, for example, chosen globally, depending on the application scenario, e.g. encoding of neural network parameters. Thus, for example, in neural network encoding, each neural network parameter applies the same set of estimator parameters.

[0249] But, it has been found that the compression efficiency can be improved by selecting optimized estimator parameters for a current neural network parameter. So, according to an aspect, the basic idea is to select suitable estimator parameters out of a set of parameters, which are then signaled to the decoder.

Typical Estimator Design (Example, Details are Optional)

[0250] First, a typical estimator design, that is applied in neural network compression, is described.

[0251] For example, for each context model c.sub.k, one or more state variables

[00097] $s_{i}^{k}, .Math., s_{N}^{k}$

are maintained with N ≥ 1. Each state variable

[00098] $s_{i}^{k}$

is implemented, for example, as signed integer value and represents, for example, a probability value

[00099] $P (s_{i}^{k}, i, k) = p_{i}^{k} .$

The probability estimate p.sub.k of a context model c.sub.k shall be defined, for example, as weighted sum of the probability values

[00100] $p_{i}^{k}$

of all state variables of the context model.

[0252] State variables shall advantageously but not necessarily have the following properties: [0253] 1. then [0254] 2. Larger values for correspond to larger [0255] 3.

[0256] Consequently, negative state variables may, for example, correspond to

[00106] $p_{i}^{k} < 0.5.$

In general, it is possible to specify different functions P(.Math.) for each state variable of each context model.

Exemplary Configuration for Associating State Variables with Probability Values (Example, Details are Optional)

[0257] There exist many useful ways of associating state variables with probability values, i.e., of implementing P(.Math.). For example, a state representation that is used in neural network compression can be achieved with the following equation:

[00107] $p (x, i, k) = \{\begin{cases} 0.5 \cdot α^{.Math.x \cdot β_{i}^{k}.Math.}, i f x \geq 0, \\ 1 - 0.5 \cdot α^{.Math.- x \cdot β_{i}^{k}.Math.}, e l s e . \end{cases})$

[00108] $β_{i}^{k}$

is a weighting factor. α is a parameter with 0 < α < 1.

[0258] To achieve, for example, a configuration comparable to the one used in the current draft of MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis, which uses two states

[00109] $(N = 2, s_{1}^{k}, s_{2}^{k}),$

set α ≈0.99894079 and

[00110] $β_{1}^{k} = 16$

and

[00111] $β_{2}^{k} = 1$

for all k.

[0259] This exemplary configuration shall give some insight about how state variables could be defined. In general, it is not necessary to define P(.Math.) because it is not directly used, as will be seen in the following. Instead, it often results from the actual implementation of the individual parts.

Initialization of State Variables (Example, Details are Optional)

[0260] Before encoding or decoding the first symbol with a context model, all state variables are optionally initialized with sane values, denoted as

[00112] $i n i t V a l_{i}^{k},$

that may, for example, be optimized to the compression application.

Derivation of a Probability Estimate From State Variables (Examples, Details are Optional)

[0261] For encoding or decoding of a symbol, a probability estimate is derived from the state variables of a context model. Three alternative approaches are presented in the following as examples. Approach 1 yields more accurate results than approach 2 and approach 3, but also has a higher computational complexity.

Approach 1 Example

[0262] This approach consists of two steps. Firstly, each state variable

[00113] $s_{i}^{k}$

of a context model is converted into a probability value

[00114] $p_{i}^{k} .$

Secondly, the probability estimate p.sub.k is derived as weighted sum of the probability values

[00115] $p_{i}^{k} .$

Step 1:

[0263] A lookup table LUT1 is employed for converting a state variable

[00116] $s_{i}^{k}$

into the corresponding probability value

[00117] $p_{i}^{k},$

for example according to Eq. (1).

[00118] $p_{i}^{k} = \{\begin{array}{r} L U T 1 [.Math.s_{i}^{k} \cdot a_{i}^{k}.Math.], \\ 1 - L U T 1 [.Math.- s_{i}^{k} \cdot a_{i}^{k}.Math.], \end{array} \begin{array}{l} i f s_{i}^{k} \geq 0. \\ e l s e . \end{array})$

LUT1 is a lookup table containing probability values.

[00119] $a_{i}^{k}$

is a weighting factor that adapts

[00120] $s_{i}^{k}$

to the size of LUT1.

Step 2:

[0264] The probability estimate p.sub.k is derived from the probability values

[00121] $p_{i}^{k},$

for example according to:

[00122] $p_{k} = {.Math.}_{i = 1}^{N} p_{i}^{k} \cdot b_{i}^{k}$

[00123] $b_{i}^{k}$

is a weighting factor that controls the influence of the individual state variables.

Approach 2 Example

[0265] An alternative approach for deriving the probability estimate from the state variables is presented in the following. It yields less accurate results and has a lower computational complexity. Firstly, a weighted sum s.sub.k of the state variables is derived, for example, according to:

[00124] $s_{k} = {.Math.}_{i = 1}^{N} .Math.s_{i}^{k} \cdot d_{i}^{k}.Math.$

[00125] $d_{i}^{k}$

is a weighting factor that controls the influence of each state variable.

[0266] Secondly, the probability estimate p.sub.k is derived from the weighted sum of state variables s.sub.k, for example according to:

[00126] $p_{k} = \{\begin{array}{r} L U T 2 [.Math.s_{k} \cdot a_{k}.Math.], \\ 1 - L U T 2 [.Math.- s_{k} \cdot a_{k}.Math.], \end{array} \begin{array}{l} i f s_{k} \geq 0. \\ e l s e . \end{array})$

LUT2 is a lookup table containing probability estimates. a.sub.k is a weighting factor that adapts s.sub.k to the size of LUT2.

Approach 3 Example

[0267] A further alternative approach for deriving the probability estimate from the state variables is presented in the following. Firstly, the weighted sum s.sub.k of the state variables is derived, for example, as in approach 2. Secondly, the probability estimate p.sub.k is derived from the weighted sum of state variables s.sub.k, for example according to:

[00127] $p_{k} = \{\begin{array}{r} L U T 2 [.Math.s_{k} \cdot a_{k}.Math.], \\ 1 - L U T 2 [- .Math.s_{k} \cdot a_{k}.Math.], \end{array} \begin{array}{l} i f .Math.s_{k} \cdot a_{k}.Math. \geq 0. \\ e l s e \end{array})$

LUT2 is a lookup table containing probability estimates.

Approach 4 Example

[0268] A further approach uses a linear relation between the state values and the probability P(x, i, k). The derivation of the probability estimate is, for example, using the approach of equation (2). An example of approach 4 is the probability estimation scheme used in the current draft of Versatile Video Coding (VVC).

[0269] To achieve, for example, a configuration used in the current draft of MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis, the method of approach 3 is used, for example, with

[00128] $d_{1}^{k} = 16, d_{2}^{k} = 1$

and a.sub.k = 2.sup.-.sup.7 for all k. The look-up table containing the probability estimates is, for example,:

[00129] $\begin{array}{l} L U T 2 = \\ \{\begin{array}{l} 0.5000, 0.4087, 0.3568, 0.3116, 0.2721, 0.2375, 0.2074, 0.1811, \\ 0.1581, 0.1381, 0.1206, 0.1053, 0.0919, 0.0803, 0.0701, 0.0612, \\ 0.0534, 0.0466, 0.0407, 0.0356, 0.0310, 0.0271, 0.0237, 0.0207, \\ 0.0180, 0.0158, 0.0138, 0.0120, 0.0105, 0.0092, 0.0080, 0.0070 \end{array}\} \end{array}$

Update of State Variables (Examples, Details are Optional)

[0270] After the encoding or decoding of a symbol, one or more state variables of a context model may be updated in order to track the statistical behaviour of the symbol sequence.

[0271] The update is, for example, carried out as follows:

[00130] $\begin{array}{l} s_{i}^{k} = \\ \{\begin{matrix} s_{i}^{k} + .Math.A [z + .Math.s_{i}^{k} \cdot m_{i}^{k}.Math.] \cdot n_{i}^{k}.Math., & I f s y m b o l t o b e e n c o d e d i s 1. \\ s_{i}^{k} + .Math.A [z + .Math.- s_{i}^{k} \cdot m_{i}^{k}.Math.] \cdot n_{i}^{k}.Math., & I f s y m b o l t o b e e n c o d e d i s 0. \end{matrix}) \end{array}$

A is a lookup table storing, for example, integer values.

[00131] $m_{i}^{k}$

and

[00132] $n_{i}^{k}$

are weighting factors that control, for example, the update ‘agility’. The factors

[00133] $n_{i}^{k}$

can be written, for example, according to

[00134] $n_{i}^{k} = 2^{- s h_{i}^{k} + 4},$

where

[00135] $s h_{i}^{k}$

also denoted as adaptation parameter. z is an offset that ensures, for example, that look table A is accessed only with nonnegative values.

[0272] The values in lookup table A can, for example, be chosen so that

[00136] $s_{i}^{k}$

stays in a particular given interval.

[0273] Usually, the values of look-up A approximate, for example, an update function. Alternatively, it is, for example, also possible to simply use the related update function for the state updates.

[0274] For example, the estimation method of VVC, following approach 4, applies update functions for the state update and uses bit shifts, which, for example, determine the ‘agility’ of the update. This corresponds, for example, to the adaptation parameters described above. The invention (see below) can be applied to those in the same manner.

[0275] To achieve, for example, a configuration used in the current draft of MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis, the parameters are chosen, for example, such that

[00137] $m_{1}^{k} = 2^{- 3}, m_{2}^{k} = 2^{- 7}$

and

[00138] $n_{1}^{k} = 2^{- 1}, n_{2}^{k} = 1,$

for all k, and z = 16. The look-up table A is, for example: A = {157, 143, 129, 115, 101, 87, 73, 59, 45, 35, 29, 23, 17, 13, 9, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0 }.

[0276] Before encoding a symbol,

[00139] $s_{1}^{k}$

shall, for example, be initialized with a value from the interval [-127,127] and

[00140] $s_{2}^{k}$

shall be initialized, for example, with a value from the interval [-2047, 2047].

[0277] Consequently,

[00141] $s_{1}^{k}$

can, for example, be implemented with an 8 bit signed integer value and

[00142] $s_{2}^{k}$

can, for example, be implemented with a 12 bit signed integer.

Aspect of the Invention (Details are Optional)

[0278] In the following the parameters, i.e.

[00143] $N, a_{i}^{k}, b_{i}^{k}, a_{k}, d_{i}^{k}, A, m_{i}^{k}, n_{i}^{k}, s h_{i}^{k}, i n i t V a l_{i}^{k}$

and any other parameter related to the probability estimator (context model) shall be collectively denoted as probability estimator parameters or estimator parameters (or probability estimation parameters).

[0279] Usually, for example, for each estimator parameter one fixed instance out of a base set of probability estimator parameters is chosen for the entire network. The values of the base set may also be N-tuples of estimator parameters, according to the number of applied states N. According to an aspect of the invention, the probability estimation, and thus the compression efficiency, can, for example, be improved, if the parameters are chosen individually for each parameter or a subset of parameters of a layer (i.e. W, b, .Math., σ.sup.2, y, and β) and/or context model c.sub.k.

[0280] The estimator parameter to be used is, for example, determined among the parameters of a set of parameters, which can, for example, be the base set or any subset of the base set. Each parameter of the set may, for example, be associated with an integer index q. For example, one parameter of the set may be denoted as default parameter. Usually the default parameter is, for example, associated with an integer index equal to zero. The index associated with the chosen estimator parameter is then, for example, signaled to the decoder.

Encoding Schemes (Examples, Details are Optional)

[0281] The index q ∈ [0, q.sub.MAX] to be encoded is, for example, decomposed into a sequence of bins, which are then encoded. Each bin may, for example, be coded using a context model or using a fixed probability.

[0282] The encoding procedure may, for example, be according to one of the following schemes: [0283] 1. A first bin, for example, useNotDefault, denotes if the estimator parameter to be chosen is different from the default parameter (for example, useNotDefault = 1) or not (for example, useNotDefault = 0). If, for example, useNotDefault = 0, the default parameter is chosen and no further bins are encoded. Whenever, for example, useNotDefault = 1, a series of bins is encoded, which denote, for example, the index of the chosen parameter minus one (q - 1), indexMinusOne. The number of bins encoded for index is, for example, equal to [log.sub.2(setLength - 1)], where setLength, denotes the number of elements of the set. [0284] 2. For the second procedure an unary code is used. A first bin, for example, greaterThan_0 denotes if the index q associated with the probability parameter is greater than zero (for example, greaterThan_0 = 1) or not (for example, greaterThan_0 = 0). If, for example, greaterThan_0 = 0 no further bins are encoded. If, for example, greaterThan_0 = 1, another bin is encoded (for example, greaterThan_1), which denotes if index q is greater than one (for example, greaterThan_1 = 1) or not (for example, greaterThan_1 = 0). If, for example, greaterThan_1 = 0 no further bins are encoded. If, for example, greaterThan_1 = 0, further bins (greaterThan_X) are encoded in the same manner until a flag greaterThan_q is equal to zero. [0285] 3. This procedure applies a truncated unary code, which is, for example, identical to the unary code used in encoding method 2. , except for the case where the index to encode q is equal to q.sub.MAX. In this case, for example, after encoding the bin greaterThan_(q.sub.MAX - 1) no further bins are encoded. For example, at the decoder side the value of q is inferred to be q.sub.MAX, if greaterThan_(q.sub.MAX - 1) is equal to one. [0286] 4. This procedure uses a variable length code, where the code lengths are chosen according to the probability of occurrence of a symbol, for example a Huffman code.

Advantageous Embodiments (Examples, Details are Optional)

[0287] In an embodiment an estimator applies, for example, a base set of adaptation parameters, which are N-tuple of adaptation parameters

[00144] $s h_{i}^{k} .$

Then a subset of the base set is chosen. One parameter out of the subset is signaled.

[0288] In a particularly advantageous embodiment, the configuration is, for example, equal to the previous advantageous embodiment, but an estimator is used, which is configured such is identical to the estimator used the current draft of MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis and the base set contains, for example, the following 28 pairs for

[00145] $(s h_{i}^{k}, s h_{2}^{k}) :$

TABLE-US-00009 Advantageous base set of adaptations parameters 0 (0,0) 14 (2,3) 1 (0,1) 15 (2,4) 2 (0,2) 16 (2,5) 3 (0,3) 17 (2,6) 4 (0,4) 18 (3,3) 5 (0,5) 19 (3,4) 6 (0,6) 20 (3,5) 7 (1,1) 21 (3,6) 8 (1,2) 22 (4,4) 9 (1,3) 23 (4,5) 10 (1,4) 24 (4,6) 11 (1,5) 25 (5,5) 12 (1,6) 26 (5,6) 13 (2,2) 27 (6,6)

[0289] The subset of size 3 is defined and ordered, for example, such that the indexes q according to Table 2 are assigned, for example, in the case all parameters of a layer are quantized with DQ. The parameter with index q = 0 is denoted, for example, as default parameter:

TABLE-US-00010 Advantageous subset of adaptation parameters for set size 3 q Adaptation parameter pair 0 (1,4) 1 (0,1) 2 (2,6)

[0290] For example, one parameter out of the subset is signaled, for example, by encoding q according to encoding scheme 1., where, for example, the bin useNotDefault is encoded using a context model and all other bins are encoded with a fixed length of one bit per bin.

[0291] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, except for the assigned adaptation parameter pairs and the size of the chosen subset (Table 3), which is equal to 5.

TABLE-US-00011 Advantageous subset of adaptation parameters for set size 5 q Adaptation parameter pair 0 (1,4) 1 (0,0) 2 (0,6) 3 (1,1) 4 (2,6)

[0292] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, except for assigned adaptation parameter pairs (Table 4):

TABLE-US-00012 Second advantageous subset of adaptation parameters for set size 5 q Adaptation parameter pair 0 (1,2) 1 (0,0) 2 (0,5) 3 (2,5) 4 (3,4)

[0293] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, except for the assigned adaptation parameter pairs and the size of the chosen subset (Table 5), which is equal to 9.

TABLE-US-00013 Advantageous subset of adaptation parameters for set size 9 q Adaptation parameter pair 0 (1,4) 1 (0,0) 2 (0,5) 3 (1,1) 4 (1,2) 5 (2,4) 6 (2,6) 7 (3,4) 8 (3,5)

[0294] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, except for the assigned adaptation parameter pairs (Table 6).

TABLE-US-00014 Second advantageous subset of adaptation parameters for set size 9 d Adaptation parameter pair 0 (1,3) 1 (0,0) 2 (0,5) 3 (1,1) 4 (1,6) 5 (2,4) 6 (2,6) 7 (3,5) 8 (4,4)

[0295] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, except for assigned adaptation parameter pairs (Table 7), the size of the chosen subset (5), and the used quantization method, which uses URQ:

TABLE-US-00015 Advantageous subset of adaptation parameters for set size 5 and URQ q Adaptation parameter pair 0 (1,4) 1 (0,6) 2 (1,1) 3 (2,6) 4 (3,4)

[0296] In another embodiment (example), an estimator is used, which is configured such that it is identical to the estimator used the current draft of MPEG-7 part 17 standard for compression of neural networks for multimedia content description and analysis and the base set of Table 1 is used. This is denoted as base configuration.

[0297] Whenever a layer parameter is quantized with DQ, the subset (of size 9) of parameter pairs in Table 5 is applied. If a layer parameter is quantized with URQ the subset in Table 8 is used.

TABLE-US-00016 Advantageous subset of adaptation parameters for set size 9 q Adaptation parameter pair 0 (1,4) 1 (0,1) 2 (0,6) 3 (1,2) 4 (1,6) 5 (2,5) 6 (2,6) 7 (3,4) 8 (3,5)

[0298] In another embodiment (example), the base configuration of the previous advantageous embodiment is applied.

[0299] Whenever the number of elements of a layer parameter is below a threshold X, which may for example be set to X = 1000, the subset with size 3 of parameter pairs, for example, in Table 2, denoted as first subset, is used. Otherwise, if the number of elements of a layer parameter is greater or equal to the threshold X, the subset with size 9, for example, in Table 5, denoted as second subset, is used.

[0300] In another embodiment (example), the configuration is identical to the previous advantageous embodiment, but instead of using a threshold, a flag (for example, useSecondSubset) is encoded, which determines, for example, the subset to be used. For example, if the flag is equal to zero, the first subset is used. If the flag is equal to one, the second subset is used.

[0301] Implementation alternatives:

[0302] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.

[0303] Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

[0304] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

[0305] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

[0306] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.

[0307] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.

[0308] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

[0309] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitionary.

[0310] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

[0311] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

[0312] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

[0313] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

[0314] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.

[0315] The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

[0316] The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

[0317] The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

[0318] The methods described herein, or any components of the apparatus described herein, may be performed at least partially by hardware and/or by software.

[0319] It should be noted that any embodiments as defined by the claims can be supplemented by any of the details (features and functionalities) described herein.

[0320] Also, the embodiments described herein can be used individually, and can also be supplemented by any of the features included in the claims.

[0321] Also, it should be noted that individual aspects described herein can be used individually or in combination. Thus, details can be added to each of said individual aspects without adding details to another one of said aspects.

[0322] It should also be noted that the present disclosure describes, in general, explicitly or implicitly, features usable in a video encoder (apparatus for providing an encoded representation of an input video signal) and in a video decoder (apparatus for providing a decoded representation of a video signal on the basis of an encoded representation), and in an audio encoder and in an audio decoder. Thus, any of the features described herein can be used in the context of a video encoder and in the context of a video decoder and in the context of an audio encoder and in the context of an audio decoder.

[0323] Moreover, features and functionalities disclosed herein relating to a method can also be used in an apparatus (configured to perform such functionality). Furthermore, any features and functionalities disclosed herein with respect to an apparatus can also be used in a corresponding method. In other words, the methods disclosed herein can be supplemented by any of the features and functionalities described with respect to the apparatuses.

[0324] Also, any of the features and functionalities described herein can be implemented in hardware or in software, or using a combination of hardware and software, as will be described in the section “implementation alternatives”.

[0325] Moreover, any of the features and syntax elements described herein can optionally be introduced into a video bit stream, both individually and taken in combination.

[0326] Furthermore, it should be noted that all features, functionalities and details described in the context of an encoder or of an encoding can optionally also be used in the context of a decoder or of a decoding. For example, a context derivation in a decoder may be analog to a context derivation in an encoder, wherein decoded valued may take the role of values to be encoded. Typically decoders are designed such that the context used in the decoder corresponds to the context used in the encoder, to keep the encoder and the decoder in synchronism.

[0327] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

DECODER FOR DECODING WEIGHT PARAMETERS OF A NEURAL NETWORK, ENCODER, METHODS AND ENCODED REPRESENTATION USING PROBABILITY ESTIMATION PARAMETERS

Inventors

Cpc classification

Classification Explorer

G06N7/01

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

H03M7/4018

ELECTRICITY

Classification Explorer

H03M7/70

ELECTRICITY

Classification Explorer

G06N3/0455

PHYSICS

Classification Explorer

G06F18/2193

PHYSICS

Classification Explorer

G06N3/063

PHYSICS

Classification Explorer

G06F18/211

PHYSICS

International classification

Classification Explorer

G06N3/0455

PHYSICS

Classification Explorer

G06F18/211

PHYSICS

Classification Explorer

G06F18/21

PHYSICS

Abstract

Claims

Description