Quality estimation of adaptive multimedia streaming

Abstract

Mechanisms for predicting a multimedia session MOS are provided. The multimedia session has a video session and an audio session. The video quality and the audio quality are represented by vectors of per-time-unit scores of video quality and audio quality, respectively. The multimedia session is represented by a vector of the rebuffering start times of each rebuffering event, and a vector of the rebuffering durations of each rebuffering event. Audiovisual quality features are generated from the vectors of per-time-unit scores of video and audio quality. Buffering features are generated from the vector of rebuffering start times of each rebuffering event and the vector of rebuffering durations of each rebuffering event. A multimedia session MOS is then estimated based on the generated audiovisual quality features and the generated buffering features.

Claims

1. A method, performed by a Mean Opinion Score (MOS) estimator, for predicting a multimedia session MOS, wherein the multimedia session comprises a video session and an audio session, wherein video quality is represented by a vector of per-time-unit scores of video quality and wherein audio quality is represented by a vector of per-time-unit scores of audio quality, and wherein the multimedia session is represented by a vector of rebuffering start times of each rebuffering event and a vector of rebuffering durations of each rebuffering event, the method comprising: generating audiovisual quality features from the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality, the audiovisual quality features comprising: a vector of per-time-unit scores of audiovisual quality; a weighted combination of the per-time-unit scores of audiovisual quality; a negative bias representing how a sudden drop in per-time-unit scores of audiovisual quality affects the multimedia session MOS; and a term representing a degradation due to oscillations in the per-time-unit-scores of audiovisual quality; generating buffering features from the vector of rebuffering start times of each rebuffering event and the vector of rebuffering durations of each rebuffering event; and estimating a multimedia session MOS from the generated audiovisual quality features and the generated buffering features.

2. The method of claim 1 wherein the vector of per-time-unit scores of audiovisual quality is calculated as a polynomial function of the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality.

3. The method of claim 1 wherein the weights are exponential functions of a time since a start time of the multimedia session and a multimedia session duration.

4. The method of claim 1 wherein the rebuffering start times of each rebuffering event are calculated from the start time of the multimedia session.

5. The method of claim 1, wherein the negative bias is calculated as: $negBias = \max\left(0,\; -\,\text{10th percentile of}\left[\text{per-time-unit scores of audiovisual quality}[t] \cdot \left(c[1] + (1 - c[1]) \cdot e^{-\frac{(T - t)\,\log(0.5)}{-c[2]}}\right)\right]\right) \cdot c[23]$ wherein c[1], c[2] and c[23] are given coefficients, t is time since the start time of the multimedia session and T is the multimedia session duration.

6. The method of claim 1, wherein the term representing a degradation due to oscillations in the per time unit scores of audiovisual quality is calculated as the number of occurrences when an absolute difference between the per time unit scores of the audiovisual quality and the weighted combination of the per time unit scores of audiovisual quality exceeds a given threshold value, divided by the multimedia session duration.

7. The method of claim 1, wherein the generated buffering features comprise a term representing a degradation due to initial buffering and a term representing a degradation due to rebuffering.

8. The method of claim 7 wherein the multimedia session is further represented by an initial buffering duration being the time between an initiation of the multimedia session and a start time of the multimedia session.

9. The method of claim 8 wherein: the term representing degradation due to initial buffering is modeled as a product of a term representing an initial buffering impact and a term representing a forgetness factor impact; and the initial buffering impact is a sigmoid function of the initial buffering duration; and the forgetness factor is an exponential function of the time since the start time of the multimedia session.

10. The method of claim 7 wherein: the term representing degradation due to rebuffering is modeled as a sum, over all rebuffering events, of products of a rebuffering duration impact, a rebuffering repetition impact, and an impact of time since the last rebuffering ended; the rebuffering duration impact is a sigmoid function of a rebuffering duration; the rebuffering repetition impact is a sigmoid function of a rebuffering repetition number; and the impact of time since the last rebuffering ended is an exponential function of the time since the last rebuffering ended.

11. The method of claim 1, wherein the multimedia session MOS is estimated as the difference between the weighted combination of the per-time-unit scores of audiovisual quality and the sum of: the negative bias; the term representing degradation due to oscillations in the per-time-unit-scores of audiovisual quality; the term representing degradation due to initial buffering, and the term representing degradation due to rebuffering.

12. A Mean Opinion Score (MOS) estimator for predicting a multimedia session MOS, wherein the multimedia session comprises a video session and an audio session, wherein video quality is represented by a vector of per-time-unit scores of video quality and wherein audio quality is represented by a vector of per-time-unit scores of audio quality, and wherein the multimedia session is represented by a vector of rebuffering start times of each rebuffering event and a vector of rebuffering durations of each rebuffering event, the MOS estimator comprising: memory circuitry configured to store instructions; and processing circuitry operatively connected to the memory circuitry, and configured to execute the instructions stored in the memory circuitry to: generate audiovisual quality features from the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality, the audiovisual quality features comprising: a vector of per-time-unit scores of audiovisual quality; a weighted combination of the per-time-unit scores of audiovisual quality; a negative bias representing how a sudden drop in per-time-unit scores of audiovisual quality affects the multimedia session MOS; and a term representing a degradation due to oscillations in the per-time-unit-scores of audiovisual quality; generate buffering features from the vector of rebuffering start times of each rebuffering event and the vector of rebuffering durations of each rebuffering event; and estimate a multimedia session MOS from the generated audiovisual quality features and the generated buffering features.

13. The MOS estimator of claim 12 wherein the vector of per-time-unit scores of audiovisual quality is calculated as a polynomial function of the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality.

14. The MOS estimator of claim 12 wherein the weights are exponential functions of a time since a start time of the multimedia session and a multimedia session duration.

15. The MOS estimator of claim 12 wherein the rebuffering start times of each rebuffering event are calculated from the start time of the multimedia session.

16. The MOS estimator of claim 12 wherein the negative bias is calculated as: $negBias = \max\left(0,\; -\,\text{10th percentile of}\left[\text{per-time-unit scores of audiovisual quality}[t] \cdot \left(c[1] + (1 - c[1]) \cdot e^{-\frac{(T - t)\,\log(0.5)}{-c[2]}}\right)\right]\right) \cdot c[23]$ wherein c[1], c[2] and c[23] are given coefficients, t is time since the start time of the multimedia session and T is the multimedia session duration.

17. The MOS estimator of claim 12, wherein the processing circuitry is further configured to generate the buffering features to comprise a term representing a degradation due to initial buffering and a term representing a degradation due to rebuffering.

18. The MOS estimator of claim 17 wherein the multimedia session is further represented by an initial buffering duration being the time between an initiation of the multimedia session and a start time of the multimedia session.

19. The MOS estimator of claim 14 wherein the processing circuitry is further configured to model the term representing degradation due to initial buffering as a product of a term representing an initial buffering impact and a term representing a forgetness factor impact, and wherein: the initial buffering impact is a sigmoid function of the initial buffering duration; and the forgetness factor is an exponential function of the time since the start time of the multimedia session.

20. The MOS estimator of claim 14 wherein the processing circuitry is further configured to model the term representing degradation due to rebuffering as a sum, over all rebuffering events, of products of a rebuffering duration impact, a rebuffering repetition impact, and an impact of time since the last rebuffering ended, and wherein: the rebuffering duration impact is a sigmoid function of a rebuffering duration; the rebuffering repetition impact is a sigmoid function of a rebuffering repetition number; and the impact of time since the last rebuffering ended is an exponential function of the time since the last rebuffering ended.

21. The MOS estimator of claim 14, wherein the instructions are such that MOS estimator is operative to estimate the multimedia session MOS as the difference between the weighted combination of the per-time-unit scores of audiovisual quality and the sum of: the negative bias; the term representing degradation due to oscillations in the per-time-unit-scores of audiovisual quality; the term representing degradation due to initial buffering, and the term representing degradation due to rebuffering.

22. A non-transitory computer readable recording medium storing a computer program product for controlling a Mean Opinion Score (MOS) estimator for predicting a multimedia session MOS, wherein the multimedia session comprises a video session and an audio session, wherein video quality is represented by a vector of per-time-unit scores of video quality and wherein audio quality is represented by a vector of per-time-unit scores of audio quality, and wherein the multimedia session is represented by a vector of rebuffering start times of each rebuffering event and a vector of rebuffering durations of each rebuffering event, the computer program product comprising software instructions which, when run on processing circuitry of the MOS estimator, cause the MOS estimator to: generate audiovisual quality features from the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality, the audiovisual quality features comprising: a vector of per-time-unit scores of audiovisual quality; a weighted combination of the per-time-unit scores of audiovisual quality; a negative bias representing how a sudden drop in per-time-unit scores of audiovisual quality affects the multimedia session MOS; and a term representing a degradation due to oscillations in the per-time-unit-scores of audiovisual quality; generate buffering features from the vector of rebuffering start times of each rebuffering event and the vector of rebuffering durations of each rebuffering event; and estimate a multimedia session MOS from the generated audiovisual quality features and the generated buffering features.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The invention is now described, by way of example, with reference to the accompanying drawings, in which:

(2) FIGS. 1A-C are schematic graphs illustrating buffering and bitrate over time.

(3) FIG. 2 illustrates the steps performed by a MOS estimator according to the embodiments of the present invention.

(4) FIG. 3 illustrates the weight factor as a function of a sample age according to the embodiments of the present invention.

(5) FIG. 4 shows an initial buffering impact as a function of initial buffering duration according to the embodiments of the present invention.

(6) FIG. 5 shows a forgetness factor impact as a function of time since the start time of multimedia session, according to the embodiments of the present invention.

(7) FIG. 6 illustrates a rebuffering duration impact as a function of rebuffering duration, according to the embodiments of the present invention.

(8) FIG. 7 illustrates a rebuffering repetition impact as a function of rebuffering repetition number, according to the embodiments of the present invention.

(9) FIG. 8 illustrates a forgetting factor impact as a function of time since the last rebuffering, according to the embodiments of the present invention.

(10) FIG. 9 is an aggregation module according to the embodiments of the present invention.

(11) FIG. 10 depicts a schematic block diagram illustrating functional units of a MOS estimator for predicting a multimedia session MOS according to the embodiments of the present invention.

(12) FIG. 11 is a schematic block diagram illustrating a computer comprising a computer program product with a computer program for predicting a multimedia session MOS, according to embodiments of the present invention.

DETAILED DESCRIPTION OF THE PROPOSED SOLUTION

(13) The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout the description.

(14) The subjective MOS is how humans rate the quality of a multimedia sequence. Objective MOS estimation uses models to predict how humans will rate it. Parametric methods are commonly used to predict the multimedia MOS, but such methods usually result in quite a large prediction error.

(15) The basic idea of embodiments presented herein is to predict the multimedia session MOS. The multimedia session comprises a video session and an audio session, wherein video quality is represented by a vector of per-time-unit scores of video quality and wherein audio quality is represented by a vector of per-time-unit scores of audio quality. The multimedia session is further represented by a vector of rebuffering start times of each rebuffering event, a vector of rebuffering durations of each rebuffering event, and an initial buffering duration being the time between an initiation of the multimedia session and a start time of the multimedia session.

(16) A time unit may be a second. Thus, the lists of per time unit scores of the video and audio quality may be obtained per second. For example, a 300 second clip has audio and video vectors with 300 elements each.

(17) Initial buffering duration may also be expressed in seconds. For example, an 8-second initial buffering (which has a start time at 0 seconds) has a duration of 8 seconds. Rebuffering duration and location may also be expressed in seconds. Start times are expressed in media time, so they do not depend on the duration of any previous buffering.
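The session inputs described above can be collected in a single container. The following is a minimal sketch, assuming Python with NumPy and a one-second time unit; the class name `SessionInputs` and all of its field names are illustrative assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SessionInputs:
    """Hypothetical container for the session inputs (all names are illustrative)."""
    mos_v: np.ndarray                               # per-second video quality scores
    mos_a: np.ndarray                               # per-second audio quality scores
    buf_init: float = 0.0                           # initial buffering duration, seconds
    buf_start: list = field(default_factory=list)   # rebuffering start times, media time
    buf_length: list = field(default_factory=list)  # rebuffering durations, seconds

    def __post_init__(self):
        # The two quality vectors cover the same media timeline.
        if len(self.mos_v) != len(self.mos_a):
            raise ValueError("audio and video vectors must have the same length")
        # Every rebuffering event needs both a start time and a duration.
        if len(self.buf_start) != len(self.buf_length):
            raise ValueError("buf_start and buf_length must have the same length")

    @property
    def duration(self) -> int:
        """Session duration in time units (seconds here)."""
        return len(self.mos_v)


# A 300-second clip with an 8-second initial buffering and one 3-second stall
# that begins at media time 120 s.
session = SessionInputs(
    mos_v=np.full(300, 4.0),
    mos_a=np.full(300, 4.5),
    buf_init=8.0,
    buf_start=[120.0],
    buf_length=[3.0],
)
```

Because start times are in media time, the stall at 120 s keeps the same start time regardless of how long the initial buffering lasted.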

(18) According to one aspect, a method, performed by a Mean Opinion Score (MOS) estimator, for predicting a multimedia session MOS is provided, as illustrated in FIG. 2. The method comprises a step S1 of generating audiovisual quality features from the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality.

(19) The audiovisual quality features comprise a vector of per-time-unit scores of audiovisual quality, calculated as a polynomial function of the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality. That is, video quality and audio quality are “merged” into a measure of combined quality, mosBoth. This merge is known from ITU-T P.1201. For example, as given in the source code below, a per-time-unit score of audiovisual quality may be calculated as:

(20) $mosBoth[i] = \frac{(mosV[i] - 1) + c[17] \cdot (mosA[i] - 1) + c[18] \cdot (mosV[i] - 1) \cdot \frac{mosA[i] - 1}{4}}{1 + c[17] + c[18]} + 1$
wherein mosV and mosA respectively are vectors of per-time-unit scores of video and audio quality, and c[17] and c[18] are audio and video merging weights. For example, c[17] may be set to 0.16233, and c[18] to −0.013804, but the present invention is by no means limited to these specific values.
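The merge can also be written in vectorized form. The following is a minimal sketch, assuming NumPy; the function name `merge_audio_video` is an illustrative assumption, and the default coefficients are the example values of c[17] and c[18] quoted above.

```python
import numpy as np


def merge_audio_video(mos_v, mos_a, c17=0.16233, c18=-0.013804):
    """Per-time-unit audiovisual score from per-time-unit video and audio scores."""
    v = np.asarray(mos_v, dtype=float) - 1.0   # shift the 1..5 scale to 0..4
    a = np.asarray(mos_a, dtype=float) - 1.0
    # Linear video term, linear audio term and a cross term, normalized by
    # the sum of the weights, then shifted back to the 1..5 scale.
    return (v + c17 * a + c18 * v * a / 4.0) / (1.0 + c17 + c18) + 1.0


mos_both = merge_audio_video([5.0, 3.0, 1.0], [5.0, 3.0, 1.0])
```

The normalization by 1 + c[17] + c[18] means that equal audio and video scores of 5 map to 5, and equal scores of 1 map to 1.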

(21) The audiovisual quality features further comprise a weighted combination of the per-time-unit scores of audiovisual quality, wherein the weights are exponential functions of the time since the start time of the multimedia session and the multimedia session duration. Namely, due to memory effects, media played further back in time, and thus further back in memory, is slightly forgotten and is therefore weighted down. The weighted combination of the per-time-unit scores of audiovisual quality is referred to as “mosBasic”. An example of the weights as a function of the sample age, i.e., the difference between the multimedia session duration and the time since the start time of the multimedia session, is shown in FIG. 3. The source code below demonstrates how mosBasic may be calculated:

(22)
    for i in range(mosLength):
        # Merge the per-time-unit video and audio scores
        mosBoth[i] = (1 * (mosV[i] - 1) + c[17] * (mosA[i] - 1) + c[18] * (mosV[i] - 1) * (mosA[i] - 1) / 4) / (1 + c[17] + c[18]) + 1
        # Age of sample i, counted back from the end of the session
        mosTime = mosLength - i - 1
        mosWeight = exponential([1, c[1], 0, c[2]], mosTime)
        sum1 += mosBoth[i] * mosWeight
        sum2 += mosWeight
    mosBasic = sum1 / sum2
wherein mosLength corresponds to the multimedia session duration, mosTime corresponds to the difference between the multimedia session duration and the time since the start time of multimedia session, and c[1] and c[2] are memory adaptation weights. For example, c[1] may be set to 0.2855, and c[2] to 10.256, but the present invention is by no means limited to these specific values.
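The helper `exponential` is not defined in the text. The sketch below assumes the form implied by the closed-form weight c[1] + (1 - c[1]) · e^(...) used in the negative bias expression of this description, i.e. a decay from 1 towards c[1] that halves the remaining gap every c[2] time units. The four-parameter signature and the function name `mos_basic` are assumptions made for illustration.

```python
import numpy as np


def exponential(p, x):
    """Assumed helper: equals p[0] at x == p[2] and decays towards p[1],
    halving the remaining gap every (p[3] - p[2]) units of x."""
    a, b, c, d = p
    return b + (a - b) * np.power(0.5, (np.asarray(x, dtype=float) - c) / (d - c))


def mos_basic(mos_both, c1=0.2855, c2=10.256):
    """Memory-weighted mean of the per-time-unit audiovisual scores."""
    mos_both = np.asarray(mos_both, dtype=float)
    n = len(mos_both)
    age = n - 1 - np.arange(n)                 # 0 for the most recent sample
    weight = exponential([1.0, c1, 0.0, c2], age)
    return float(np.sum(mos_both * weight) / np.sum(weight))


# Poor quality in the first half, good quality in the second half.
late_good = mos_basic([1.0] * 50 + [5.0] * 50)
```

With these weights, a session that ends on good quality scores above the plain average of 3, since the most recent samples carry the largest weights.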

(23) The audiovisual quality features further comprise a negative bias. The negative bias represents how a sudden drop in per-time-unit scores of audiovisual quality affects the multimedia session MOS. When media quality varies, one is more affected by a sudden drop in quality, as compared to a similar sudden improvement. This effect is captured by the negative bias. The negative bias may be modelled by calculating the offsets for each per-time-unit (e.g., one-second) quality score towards mosBasic. These offsets may also be scaled by the forgetting factor weight, so that media longer back in memory gets less impact.

(24) From this vector of weighted per-time-unit (i.e., one-second) offsets, a certain percentile can be calculated. For example, it may be approximately the 10th percentile, but a different percentile could be used as well. The percentile is usually a negative number, as the lowest quality scores in the vector should normally be lower than mosBasic, so the result is negated into a positive value, meaning that a higher value indicates a higher impact of the negative bias. This value is then scaled linearly to the right range. An example of source code for calculating the negative bias is as follows:

(25)
    mosOffset = list(mosBoth)
    for i in range(mosLength):
        mosTime = mosLength - i - 1
        mosWeight = exponential([1, c[1], 0, c[2]], mosTime)
        # Offset towards mosBasic, scaled by the forgetting factor weight
        mosOffset[i] = (mosOffset[i] - mosBasic) * mosWeight
    mosPerc = np.percentile(mosOffset, c[22], interpolation='linear')
    negBias = np.maximum(0, -mosPerc)
    negBias = negBias * c[23]

(26) Equivalently, the negative bias is calculated as follows:

(27) $negBias = \max\left(0,\; -\,\text{10th percentile of}\left[\text{per-time-unit scores of audiovisual quality}[t] \cdot \left(c[1] + (1 - c[1]) \cdot e^{-\frac{(T - t)\,\log(0.5)}{-c[2]}}\right)\right]\right) \cdot c[23]$
wherein t is time since the start time of multimedia session and T is the multimedia session duration. Here c[22] and c[23] represent negative bias coefficients. For example, c[22] may be set to 9.1647, and c[23] to 0.74811, but the present invention is by no means limited to these specific values.
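The negative bias can be sketched as one self-contained function. This sketch follows the source listing, in which the weighted quantities are the offsets of the per-time-unit scores from mosBasic. It assumes NumPy, writes the memory weight in its closed form c[1] + (1 - c[1]) · 0.5^(age / c[2]), and uses the example coefficient values quoted in the text as defaults; the function and argument names are illustrative.

```python
import numpy as np


def negative_bias(mos_both, mos_basic, c1=0.2855, c2=10.256,
                  c22=9.1647, c23=0.74811):
    """Penalty for quality drops, from a low percentile of the weighted offsets."""
    mos_both = np.asarray(mos_both, dtype=float)
    n = len(mos_both)
    age = n - 1 - np.arange(n)                          # 0 for the newest sample
    weight = c1 + (1.0 - c1) * np.power(0.5, age / c2)  # older samples count less
    offsets = (mos_both - mos_basic) * weight
    perc = np.percentile(offsets, c22)                  # about the 10th percentile
    return max(0.0, -perc) * c23                        # negate, clamp at zero, scale


flat = negative_bias([4.0] * 100, 4.0)
dipped = negative_bias([4.0] * 80 + [2.0] * 20, 3.6)
```

A session with constant quality yields no bias, whereas a session that ends with a sustained drop yields a positive penalty.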

(28) The audiovisual quality features comprise a term representing a degradation due to oscillations in the per-time-unit-scores of audiovisual quality. Namely, fluctuating media quality is annoying, and the effect of quality fluctuation is captured by counting the number of tops and dips where the unweighted one-second media quality scores (mosBoth) go above or below mosBasic. In other words, the term representing a degradation due to oscillations in the per-time-unit scores of audiovisual quality may be calculated as the number of occurrences when the absolute difference between the per-time-unit scores of the audiovisual quality and the weighted combination of the per-time-unit scores of audiovisual quality exceeds a given threshold value, divided by the multimedia session duration. The threshold value may be used to disregard small variations that may not be perceivable. An example for the threshold value is 0.1, i.e., a hysteresis of 0.1 is used.

(29) The term representing a degradation due to oscillations, oscDeg, in the per-time-unit-scores of audiovisual quality may also be truncated so that the maximum value is 0.2 oscillations per second. This may then be multiplied by the standard deviation of the per-time-unit (i.e., per-second) audiovisual quality values, so that a higher level of oscillations gets a higher impact. The following source code illustrates how the term representing a degradation due to oscillations can be calculated:

(30)
    osc = 0
    offset = 0.1
    state = 0
    for i in range(mosLength):
        if state != 1:
            if mosBoth[i] > mosBasic + offset:
                osc += 1
                state = 1
        elif state != -1:
            if mosBoth[i] < mosBasic - offset:
                osc += 1
                state = -1
    oscRel = osc / mosLength
    oscRel = np.minimum(oscRel, 0.2)  # Limit to one change per 5 sec
    oscDeg = np.power(oscRel * np.std(mosBoth, ddof=1), c[19]) * c[20]

(31) The result may then be scaled non-linearly (approximately squared), and finally linearly scaled to the right range.
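The oscillation term can be sketched as a self-contained function. This is one reading of the listing above, in which a dip is only counted after a preceding top; it assumes NumPy, and the function name is illustrative. The text gives no example values for c[19] and c[20], so the values 2.0 and 1.0 used in the usage lines are placeholders chosen only to match the "approximately squared" description, not values from the disclosure.

```python
import numpy as np


def oscillation_degradation(mos_both, mos_basic, c19, c20,
                            offset=0.1, max_rate=0.2):
    """Degradation from quality swinging above and below mos_basic."""
    mos_both = np.asarray(mos_both, dtype=float)
    osc = 0
    state = 0    # 0: nothing counted yet, 1: last count was a top, -1: a dip
    for score in mos_both:
        if state != 1:
            if score > mos_basic + offset:    # top outside the hysteresis band
                osc += 1
                state = 1
        elif score < mos_basic - offset:      # dip outside the hysteresis band
            osc += 1
            state = -1
    rate = min(osc / len(mos_both), max_rate)  # at most one change per 5 s
    # Scale by the spread of the scores, then apply the non-linear and
    # linear scaling coefficients.
    return float(np.power(rate * np.std(mos_both, ddof=1), c19) * c20)


# Placeholder coefficients (assumptions, see the note above).
steady = oscillation_degradation([4.0] * 20, 4.0, c19=2.0, c20=1.0)
jumpy = oscillation_degradation([4.0, 2.0] * 10, 3.0, c19=2.0, c20=1.0)
```

In the second call the score alternates every second, so the raw rate of one oscillation per second is truncated to 0.2 before being multiplied by the standard deviation.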

(32) The method comprises a step S2 of generating buffering features from the vector of rebuffering start times of each rebuffering event, calculated from the start time of multimedia session, and the vector of rebuffering durations of each rebuffering event.

(33) The generated buffering features may comprise a term representing a degradation due to initial buffering, initDeg, and a term representing a degradation due to rebuffering, bufDeg.

(34) The term representing degradation due to initial buffering may be modeled as a product of a term representing an initial buffering impact and a term representing a forgetness factor impact.

(35) The initial buffering impact may be a sigmoid function of the initial buffering duration. For example, the sigmoid function may basically give a zero impact below 5 seconds and an impact of 4 if the initial buffering duration is longer than that, as shown in FIG. 4. The source code for calculating initDeg may be as follows:

(36)
    lengthDeg = sigmoid([0, 4, c[10], c[10] + c[11]], bufInit)
    memoryDeg = exponential([1, c[4], 0, c[5]], mosLength)
    initDeg = lengthDeg * memoryDeg

(37) Here c[10] and c[11] are constants related to initial buffering and c[4] and c[5] are memory weights related to initial buffering. For example, c[10]=4.5327, c[11]=1.0054, c[4]=0.054304 and c[5]=10.286, but the present invention is by no means limited to these specific values.

(38) However, initial buffering is only annoying during the initial buffering itself or shortly afterwards. If the media continues to stream, this problem is forgotten quite soon. Thus, the second modelling step is to weight the initial buffering impact with a forgetness factor. The forgetness factor may be an exponential function of the time since the start time of the multimedia session, as shown in FIG. 5.
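The product of the two impacts can be sketched as follows, assuming NumPy. Neither helper is defined in the text. The `exponential` form is inferred from the closed-form memory weight given in this description. The `sigmoid` form (a logistic curve centred midway between its third and fourth parameters, with a steepness factor of 10) is purely a hypothetical stand-in chosen to reproduce the qualitative shape described for FIG. 4, and may differ from the function actually used.

```python
import numpy as np


def sigmoid(p, x):
    """HYPOTHETICAL logistic step from p[0] to p[1], centred midway between
    p[2] and p[3]; the steepness factor 10 is an assumption."""
    a, b, c, d = p
    centre = (c + d) / 2.0
    return a + (b - a) / (1.0 + np.exp(-10.0 * (x - centre) / (d - c)))


def exponential(p, x):
    """Assumed decay from p[0] towards p[1], halving the gap every p[3] - p[2]."""
    a, b, c, d = p
    return b + (a - b) * np.power(0.5, (x - c) / (d - c))


def initial_buffering_degradation(buf_init, session_length, c4=0.054304,
                                  c5=10.286, c10=4.5327, c11=1.0054):
    """initDeg = initial buffering impact * forgetness factor."""
    length_deg = sigmoid([0.0, 4.0, c10, c10 + c11], buf_init)
    memory_deg = exponential([1.0, c4, 0.0, c5], session_length)
    return float(length_deg * memory_deg)


short_wait = initial_buffering_degradation(1.0, 60.0)     # 1 s wait
recent_wait = initial_buffering_degradation(8.0, 5.0)     # 8 s wait, 5 s session
forgotten_wait = initial_buffering_degradation(8.0, 60.0) # 8 s wait, 60 s session
```

Under these assumptions a short wait contributes almost nothing, and the penalty for a long wait shrinks as the session continues to play.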

(39) The term representing degradation due to rebuffering, bufDeg, may be modeled as a sum, over all rebuffering events, of products of a rebuffering duration impact, a rebuffering repetition impact, and an impact of time since the last rebuffering. For each rebuffering instance, first the impact of the rebuffering is calculated. The rebuffering duration impact may be a sigmoid function of a rebuffering duration, as shown in FIG. 6.

(40) However, the rebuffering duration impact only models a single rebuffering, evaluated close to the time when the rebuffering happened. If there are more rebufferings, one gets more annoyed for each additional one. This is modeled by the rebuffering repetition impact. The rebuffering repetition impact may be a sigmoid function of a rebuffering repetition number, as shown in FIG. 7. For example, a weight of up to 5 is assigned when the number of rebufferings becomes 4 or more.

(41) Finally, as the time since the last rebuffering passes, one tends to forget about it. The impact of time since the last rebuffering, or a so-called forgetting factor, may be modelled as an exponential function of the time since the last rebuffering, as shown in FIG. 8.

(42) To get the final effect of a single rebuffering, the rebuffering duration impact, the rebuffering repetition impact and the impact of time since the last rebuffering are multiplied. This result is then added to the total impact result for all rebufferings, as shown in the following source code:

(43)
    bufDeg = 0
    for j in range(len(bufLength)):
        lengthDeg = sigmoid([0, 4, c[12], c[12] + c[13]], bufLength[j])
        repeatDeg = sigmoid([1, c[14], c[15], c[15] + c[16]], j)
        memoryDeg = exponential([1, c[7], 0, c[8]], mosLength - bufStart[j])
        bufDeg = bufDeg + lengthDeg * repeatDeg * memoryDeg
    bufDeg = bufDeg / 4 * (mosBasic - 1)

(44) Here lengthDeg, repeatDeg and memoryDeg denote the impacts due to rebuffering duration, rebuffering repetition and the time since the last rebuffering, respectively, and bufStart[j] denotes the start time of the j-th rebuffering event, so that mosLength − bufStart[j] is the time since that rebuffering. In addition, c[12] and c[13] are rebuffering impact constants, c[14]-c[16] are constants related to rebuffering repetition, and c[7] and c[8] are constants for the impact of time since the last rebuffering (also referred to as rebuffering memory weights). For example, one may set c[12]=−67.632, c[13]=158.18, c[14]=4.9894, c[15]=2.1274, c[16]=2.0001, c[7]=0.17267 and c[8]=10, but the present invention is by no means limited to these specific values.

(45) Finally, the resulting term representing degradation due to rebuffering may be rescaled relative to mosBasic. This may be done since people are more annoyed by a rebuffering if they otherwise have good quality, while if the quality is poor, a rebuffering does not degrade people's perception so much.
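The sum-of-products structure and the final rescaling can be sketched independently of the exact shape of the three impact functions. In the sketch below the impact functions are passed in as callables; the function name, the argument names and the simple stand-in callables in the usage lines are all illustrative assumptions, not the sigmoid and exponential functions of the disclosure.

```python
def rebuffering_degradation(buf_start, buf_length, session_length, mos_basic,
                            duration_impact, repetition_impact, memory_impact):
    """bufDeg: per-event products summed, then rescaled relative to mos_basic."""
    total = 0.0
    for j, (start, length) in enumerate(zip(buf_start, buf_length)):
        total += (duration_impact(length)                    # how long the stall was
                  * repetition_impact(j)                     # how many stalls came before
                  * memory_impact(session_length - start))   # how long ago it happened
    # Stalls hurt more when the underlying quality is good.
    return total / 4.0 * (mos_basic - 1.0)


# Stand-in impact functions (assumptions for illustration only).
def by_duration(seconds):
    return min(4.0, seconds)

def by_repetition(index):
    return 1.0 + index

def by_age(seconds_ago):
    return 0.5 ** (seconds_ago / 10.0)

# Two stalls in a 100-second session with mos_basic = 5.
buf_deg = rebuffering_degradation([50.0, 90.0], [2.0, 4.0], 100.0, 5.0,
                                  by_duration, by_repetition, by_age)
```

With these stand-ins the first stall contributes 2 · 1 · 0.5^5 = 0.0625 and the second 4 · 2 · 0.5 = 4, so the rescaled total is 4.0625 / 4 · (5 − 1) = 4.0625. The same events with mos_basic = 1 would give zero degradation.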

(46) The method comprises a step S3 of estimating a multimedia session MOS from the generated audiovisual quality features and the generated buffering features, as illustrated in FIG. 9. The multimedia session MOS may be estimated as the difference between the weighted combination of the per-time-unit scores of audiovisual quality and the sum of: the negative bias, the term representing degradation due to oscillations in the per-time-unit-scores of audiovisual quality, the term representing degradation due to initial buffering, and the term representing degradation due to rebuffering. The score is also truncated to be between 1 and 5. In other words, the multimedia session MOS may be estimated according to the source code below:

(47)
    mos = mosBasic - initDeg - bufDeg - oscDeg - negBias
    # Truncate the score to the range 1..5
    if mos < 1:
        mos = 1
    if mos > 5:
        mos = 5
    return mos

(48) FIG. 10 is a schematic block diagram of a MOS estimator 100, for predicting a multimedia session MOS, wherein the multimedia session comprises a video session and an audio session. The video quality is represented by a vector of per-time-unit scores of video quality and the audio quality is represented by a vector of per-time-unit scores of audio quality. The multimedia session is represented by a vector of rebuffering start times of each rebuffering event, a vector of rebuffering durations of each rebuffering event, and an initial buffering duration being the time between an initiation of the multimedia session and a start time of the multimedia session.

(49) The MOS estimator 100 comprises, according to this aspect, a generating unit 160, configured to generate audiovisual quality features from the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality. The audiovisual quality features comprise: a vector of per-time-unit scores of audiovisual quality, calculated as a polynomial function of the vector of per-time-unit scores of video quality and the vector of per-time-unit scores of audio quality; a weighted combination of the per-time-unit scores of audiovisual quality, wherein the weights are exponential functions of a time since the start time of multimedia session and a multimedia session duration; a negative bias representing how a sudden drop in per-time-unit scores of audiovisual quality affects the multimedia session MOS; and a term representing a degradation due to oscillations in the per-time-unit-scores of audiovisual quality.

(50) The generating unit 160 is further configured to generate buffering features from the vector of rebuffering start times of each rebuffering event, calculated from the start time of multimedia session, and the vector of rebuffering durations of each rebuffering event.

(51) The MOS estimator 100 comprises, according to this aspect, an estimating unit 170, configured to estimate a multimedia session MOS from the generated audiovisual quality features and the generated buffering features.

(52) The generating unit 160 and the estimating unit 170 may be hardware based, software based (in which case they are called generating and estimating modules, respectively) or a combination of hardware and software.

(53) The generating unit 160 may calculate the negative bias as:

(54) $negBias = \max\left(0,\; -\,\text{10th percentile of}\left[\text{per-time-unit scores of audiovisual quality}[t] \cdot \left(c[1] + (1 - c[1]) \cdot e^{-\frac{(T - t)\,\log(0.5)}{-c[2]}}\right)\right]\right) \cdot c[23]$
wherein t is time since the start time of multimedia session, T is the multimedia session duration and c[1], c[2] and c[23] are constants.

(55) The generating unit 160 may calculate the degradation due to oscillations in the per time unit scores of audiovisual quality as the number of occurrences when the absolute difference between the per time unit scores of the audiovisual quality and the weighted combination of the per time unit scores of audiovisual quality exceeds a given threshold value, divided by the multimedia session duration. The threshold value may be e.g. 0.1. The degradation due to oscillations in the per time unit scores of audiovisual quality may also be truncated so that the maximum value is 0.2 oscillations per second.

(56) The generated buffering features comprise a term representing a degradation due to initial buffering and a term representing a degradation due to rebuffering. Thus, the generating unit 160 may model the term representing degradation due to initial buffering as a product of a term representing an initial buffering impact and a term representing a forgetness factor impact. The initial buffering impact may be a sigmoid function of the initial buffering duration, and the forgetness factor may be an exponential function of the time since the start time of multimedia session.

(57) The generating unit 160 may model the term representing degradation due to rebuffering as a sum, over all rebuffering events, of products of a rebuffering duration impact, a rebuffering repetition impact, and an impact of time since the last rebuffering ended. The rebuffering duration impact may be a sigmoid function of a rebuffering duration. The rebuffering repetition impact may be a sigmoid function of a rebuffering repetition number. The impact of time since the last rebuffering ended may be an exponential function of the time since the last rebuffering ended.
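The sum over rebuffering events can be sketched in the same way. The sketch below is illustrative only and uses the constants c[7], c[8] and c[12] to c[16] from the aggregation code at the end of this description. As in that code, the recency term is evaluated on the time between the start of each rebuffering event and the end of the session, and the repetition number is the zero-based index of the event. The function names are hypothetical, and the final conversion to a relative change that the aggregation code applies is left out.

```python
import math


def sigmoid(lo, hi, x_start, x_end, x):
    """Sigmoid rising from `lo` to `hi`, with most of the rise between x_start and x_end."""
    scale = 10.0 / (x_end - x_start)
    mid = (x_start + x_end) / 2.0
    return lo + (hi - lo) / (1.0 + math.exp(-scale * (x - mid)))


def forgetting(floor, half_life, elapsed):
    """Exponential decay from 1 towards `floor` as `elapsed` grows."""
    return floor + (1.0 - floor) * math.exp(-elapsed * math.log(2.0) / half_life)


def rebuffering_degradation(durations, starts, session_s):
    """Sum over events of duration impact x repetition impact x recency impact."""
    total = 0.0
    for j, (duration, start) in enumerate(zip(durations, starts)):
        duration_impact = sigmoid(0.0, 4.0, -67.632, -67.632 + 158.18, duration)
        repetition_impact = sigmoid(1.0, 4.9894, 2.1274, 2.1274 + 2.0001, j)
        recency_impact = forgetting(0.17267, 10.0, session_s - start)
        total += duration_impact * repetition_impact * recency_impact
    return total


# Two 5 s stalls in a 60 s session, starting at 20 s and 40 s.
print(rebuffering_degradation([5.0, 5.0], [20.0, 40.0], 60.0))
```

Longer stalls, repeated stalls and stalls close to the end of the session all increase the returned value.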

(58) The MOS estimator 100 may estimate the multimedia session MOS as the difference between the weighted combination of the per-time-unit scores of audiovisual quality and the sum of the negative bias, the term representing degradation due to oscillations in the per-time-unit scores of audiovisual quality, the term representing degradation due to initial buffering, and the term representing degradation due to rebuffering.
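The final combination is a subtraction, shown here as a minimal sketch. The function name `session_mos` and its argument names are hypothetical. The clamping of the result to the range 1 to 5 is taken from the aggregation code at the end of this description rather than from the paragraph above.

```python
def session_mos(weighted_quality, neg_bias, osc_deg, init_deg, rebuf_deg):
    """Session MOS: weighted audiovisual quality minus the four degradation terms."""
    mos = weighted_quality - (neg_bias + osc_deg + init_deg + rebuf_deg)
    # Keep the estimate on the 1..5 MOS scale.
    return min(5.0, max(1.0, mos))


print(session_mos(4.2, neg_bias=0.1, osc_deg=0.05, init_deg=0.2, rebuf_deg=0.3))
```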

(59) The MOS estimator 100 can be implemented in hardware, in software or in a combination of hardware and software. The MOS estimator 100 can be implemented in user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer. The MOS estimator 100 may also be implemented in a network device in the form of, or connected to, a network node, such as a radio base station, in a communication network or system.

(60) Although the respective units disclosed in conjunction with FIG. 10 have been disclosed as physically separate units in the device, where all may be special purpose circuits, such as ASICs (Application Specific Integrated Circuits), alternative embodiments of the device are possible where some or all of the units are implemented as computer program modules running on a general-purpose processor. Such an embodiment is disclosed in FIG. 11.

(61) FIG. 11 schematically illustrates an embodiment of a computer 150 having a processing unit 110 such as a DSP (Digital Signal Processor) or CPU (Central Processing Unit). The processing unit 110 can be a single unit or a plurality of units for performing different steps of the method described herein. The computer also comprises an input/output (I/O) unit 120 for receiving a vector of per-time-unit scores of video quality, a vector of per-time-unit scores of audio quality, a vector of rebuffering durations of each rebuffering event, and an initial buffering duration. The I/O unit 120 has been illustrated as a single unit in FIG. 11 but can likewise be in the form of a separate input unit and a separate output unit.

(62) Furthermore, the computer 150 comprises at least one computer program product 130 in the form of a non-volatile memory, for instance an EEPROM (Electrically Erasable Programmable Read-Only Memory), a flash memory or a disk drive. The computer program product 130 comprises a computer program 140, which comprises code means which, when run on the computer 150, such as by the processing unit 110, causes the computer 150 to perform the steps of the method described in the foregoing in connection with FIG. 2.

(63) The embodiments described above are to be understood as a few illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.

(64) Aggregation Code

(65) The Python code below summarizes the algorithm for estimating MOS, according to the embodiments of the present invention:

(66)

```python
import numpy as np


def aggregation11(mosV, mosA, bufInit, bufLength, bufStart):
    # mosV and mosA are vectors of 1-sec scores, index 0 is start of video or audio
    # bufInit is seconds of initial buffering
    # bufLength is a vector of rebuffering lengths
    # bufStart is a vector of rebuffering start times
    # c0 - Dummy
    # c1-c3 - Adaptation memory weights
    # c4-c6 - Initbuf memory weights
    # c7-c9 - Buffering memory weights
    # c10-c11 - Initbuf impact
    # c12-c13 - Rebuf impact
    # c14-c16 - Repetition annoyance
    # c17-c18 - Audio/video merging weights
    # c19-c20 - Oscillation weights
    # c21 - Last part bias (not used)
    # c22-23 - Negative bias coefs
    c = [0, 0.2855, 10.256, 17.85, 0.054304, 10.286, 9.8766, 0.17267, 10,
         17.762, 4.5327, 1.0054, -67.632, 158.18, 4.9894, 2.1274, 2.0001,
         0.16233, -0.013804, 2.1944, 43.565, 0.13025, 9.1647, 0.74811]

    # Merge audio and video scores and form the memory-weighted mean
    mosLength = np.minimum(len(mosV), len(mosA))
    sum1 = 0
    sum2 = 0
    mosBoth = list(mosV)
    for i in range(mosLength):
        mosBoth[i] = (1 * (mosV[i] - 1) + c[17] * (mosA[i] - 1)
                      + c[18] * (mosV[i] - 1) * (mosA[i] - 1) / 4) / (1 + c[17] + c[18]) + 1
        mosTime = mosLength - i - 1
        mosWeight = exponential([1, c[1], 0, c[2]], mosTime)
        sum1 += mosBoth[i] * mosWeight
        sum2 += mosWeight
    mosBasic = sum1 / sum2

    # Degradation due to oscillations around mosBasic
    osc = 0
    offset = 0.1
    state = 0
    for i in range(mosLength):
        if state != 1:  # State = unknown or dip
            if mosBoth[i] > mosBasic + offset:
                osc += 1
                state = 1
        elif state != -1:  # State = unknown or top
            if mosBoth[i] < mosBasic - offset:
                osc += 1
                state = -1
    oscRel = osc / mosLength
    oscRel = np.minimum(oscRel, 0.2)  # Limit to one change per 5 sec
    oscDeg = np.power(oscRel * np.std(mosBoth, ddof=1), c[19]) * c[20]

    # Negative bias from a low percentile of the weighted offsets
    mosOffset = list(mosBoth)
    for i in range(mosLength):
        mosTime = mosLength - i - 1
        mosWeight = exponential([1, c[1], 0, c[2]], mosTime)
        mosOffset[i] = (mosOffset[i] - mosBasic) * mosWeight
    # Linear interpolation is the NumPy default. Should normally be negative
    mosPerc = np.percentile(mosOffset, c[22])
    negBias = np.maximum(0, -mosPerc)
    negBias = negBias * c[23]

    # Degradation due to initial buffering
    lengthDeg = sigmoid([0, 4, c[10], c[10] + c[11]], bufInit)
    memoryDeg = exponential([1, c[4], 0, c[5]], mosLength)
    initDeg = lengthDeg * memoryDeg

    # Degradation due to rebuffering
    bufDeg = 0
    for j in range(len(bufLength)):
        lengthDeg = sigmoid([0, 4, c[12], c[12] + c[13]], bufLength[j])
        repeatDeg = sigmoid([1, c[14], c[15], c[15] + c[16]], j)
        memoryDeg = exponential([1, c[7], 0, c[8]], mosLength - bufStart[j])
        bufDeg = bufDeg + lengthDeg * repeatDeg * memoryDeg
    bufDeg = bufDeg / 4 * (mosBasic - 1)  # Convert to relative change

    mos = mosBasic - initDeg - bufDeg - oscDeg - negBias
    if mos < 1:
        mos = 1
    if mos > 5:
        mos = 5
    return mos


def sigmoid(par, x):
    scalex = 10 / (par[3] - par[2])
    midx = (par[2] + par[3]) / 2
    y = par[0] + (par[1] - par[0]) / (1 + np.exp(-scalex * (x - midx)))
    return y


def exponential(c, x):
    z = np.log(0.5) / (-(c[3] - c[2]))
    y = c[1] + (c[0] - c[1]) * np.exp(-(x - c[2]) * z)
    return y
```

CPC classification (section H, Electricity): H04N21/23418, H04N21/44008, H04N21/84, H04N21/23406, H04N17/004, H04N21/23805, H04L65/80, H04N21/24, H04N21/2402

International classification (section H, Electricity): H04N21/234, H04N17/00, H04N21/44, H04N21/238, H04L65/80, H04N21/24, H04N21/84
