MATHEMATICAL MODEL DERIVATION APPARATUS, MATHEMATICAL MODEL DERIVATION METHOD AND PROGRAM
20230072186 · 2023-03-09
Inventors
- Kazuhisa YAMAGISHI (Musashino-shi, Tokyo, JP)
- Noritsugu EGI (Musashino-shi, Tokyo, JP)
- Noriko YOSHIMURA (Musashino-shi, Tokyo, JP)
CPC classification (Section H: Electricity)
- H04N21/647
- H04N19/85
- H04N17/00
- H04N21/26208
- H04N19/154
International classification (Section H: Electricity)
- H04N21/845
- H04N21/647
Abstract
A mathematical model deriving apparatus includes an encoding unit that generates a plurality of deteriorated videos after encoding an original video, in accordance with a plurality of combinations of a plurality of encoding parameters for a codec setting, a quality estimation unit that calculates a quality estimation value of each of the plurality of deteriorated videos, and a deriving unit that derives a coefficient of a mathematical model, which outputs video quality in response to the plurality of encoding parameters as input, in accordance with the quality estimation value and the plurality of combinations of the plurality of encoding parameters. This allows for deriving a mathematical model capable of evaluating quality according to a codec setting.
Claims
1. A mathematical model deriving apparatus, comprising: a processor; and a memory storing program instructions that cause the processor to: generate a plurality of deteriorated videos after encoding an original video, in accordance with a plurality of combinations of a plurality of encoding parameters for a codec setting; calculate a quality estimation value of each of the plurality of deteriorated videos; and derive a coefficient of a mathematical model in accordance with the quality estimation value and the plurality of combinations of the plurality of encoding parameters, the mathematical model outputting video quality in response to the plurality of encoding parameters as input.
2. The mathematical model deriving apparatus according to claim 1, wherein the processor further calculates, in accordance with the original video, the quality estimation value of each of the plurality of deteriorated videos.
3. The mathematical model deriving apparatus according to claim 1, wherein the processor calculates, for each of the plurality of deteriorated videos, an average value of per-frame quality estimation values of the deteriorated video, and calculates the quality estimation value by inputting the average value to a function that converts the average value into a quality estimation value in which the influence of the frame rate on subjective quality is considered, the function being created in advance for a plurality of frame rates.
4. The mathematical model deriving apparatus according to claim 1, wherein the processor calculates, for each of the plurality of deteriorated videos, an average value of quality estimation values of the deteriorated video for each original video, and calculates the quality estimation value by inputting the average value to a function that converts the average value into a quality estimation value in which the influence of the original video on subjective quality is considered, the function being created in advance for a plurality of original videos.
5. A method for deriving a mathematical model executed by a computer, the method comprising: generating a plurality of deteriorated videos after encoding an original video, in accordance with a plurality of combinations of a plurality of encoding parameters for a codec setting; calculating a quality estimation value of each of the plurality of deteriorated videos; and deriving a coefficient of a mathematical model in accordance with the quality estimation value and the plurality of combinations of the plurality of encoding parameters, the mathematical model outputting video quality in response to the plurality of encoding parameters as input.
6. The method for deriving a mathematical model according to claim 5, wherein the calculating further calculates, in accordance with the original video, the quality estimation value of each of the plurality of deteriorated videos.
7. The method for deriving a mathematical model according to claim 5, wherein the calculating calculates, for each of the plurality of deteriorated videos, an average value of per-frame quality estimation values of the deteriorated video, and calculates the quality estimation value by inputting the average value to a function that converts the average value into a quality estimation value in which the influence of the frame rate on subjective quality is considered, the function being created in advance for a plurality of frame rates.
8. A non-transitory computer-readable recording medium having stored therein a program causing a computer to perform the method according to claim 5.
Description
BRIEF DESCRIPTION OF DRAWINGS
DESCRIPTION OF EMBODIMENTS
[0025] Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
[0026] A program that achieves processing in the mathematical model deriving apparatus 10 is provided on a recording medium 101 such as a flexible disk or a compact disc read-only memory (CD-ROM). When the recording medium 101 storing the program is set in the drive device 100, the program is installed in the auxiliary storage device 102 from the recording medium 101 via the drive device 100. However, the program does not necessarily have to be installed from the recording medium 101 and may be downloaded from another computer via a network. The program may also be installed as a part of another program. The auxiliary storage device 102 stores the installed program and also stores necessary files, data, and the like.
[0027] When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 executes functions of the mathematical model deriving apparatus 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network.
[0029] Hereinafter, a processing procedure executed by the mathematical model deriving apparatus 10 will be described.
[0030] Firstly, the encoding unit 11 receives as input one original video (a pre-encoding video of approximately 10 seconds), one or more codec settings (for example, the profile, the number of encoding passes, the GoP size, the motion estimation range, and the like), and a plurality of combinations of encoding parameters (for example, resolution, frame rate, and bit rate) for each of the codec settings. The encoding unit 11 generates, for each codec setting, videos obtained by encoding the original video (hereinafter referred to as "deteriorated videos") based on the plurality of combinations of encoding parameters for that codec setting, and outputs the original video and the plurality of deteriorated videos to the quality estimation unit 12 (S101). In other words, one deteriorated video is generated for each combination of encoding parameters. For example, if nine combinations of encoding parameters are input for one codec setting, nine deteriorated videos are generated. For the codec settings and the encoding parameters, values used in actual operation are input, for example; that is, the codec settings and the encoding parameters are generally information that the service provider can know. Similarly, among the encoding parameters, the resolution and the frame rate are set to the values that the service provider expects to use.
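The per-combination generation in S101 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the codec-setting fields, parameter values, and the ffmpeg-style command line are all assumed placeholders.

```python
from itertools import product

# Hypothetical codec setting and encoding-parameter grid (illustrative
# values only; in practice these come from the service provider).
codec_setting = {"profile": "high", "passes": 2, "gop_size": 60}

resolutions = ["1920x1080", "1280x720", "960x540"]
bitrates_kbps = [2000, 1000, 500]

# One deteriorated video is generated per combination of encoding
# parameters: 3 resolutions x 3 bit rates = 9 combinations (S101).
combinations = list(product(resolutions, bitrates_kbps))

def encode_command(original: str, resolution: str, bitrate_kbps: int) -> str:
    """Build a hypothetical ffmpeg-style command line for one combination."""
    return (f"ffmpeg -i {original} -s {resolution} "
            f"-b:v {bitrate_kbps}k -g {codec_setting['gop_size']} out.mp4")

commands = [encode_command("original.mp4", rs, br) for rs, br in combinations]
print(len(commands))  # 9 deteriorated videos for this codec setting
```

Each command would produce one deteriorated video; running the encoder itself is outside the scope of this sketch.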
[0031] As for the deteriorated videos, it is desirable to generate videos ranging from very good quality to very bad quality (that is, a plurality of qualities). In other words, it is desirable that the encoding parameters be input so that a plurality of deteriorated videos of different qualities are generated. For example, if deteriorated videos are prepared only for the combinations of encoding parameters (resolution, frame rate, bit rate) that the service provider actually plans to provide (for example, 3840×2160/60 fps @8 Mbps, 1920×1080/30 fps @2 Mbps, 1280×720/30 fps @1 Mbps, 960×540/30 fps @0.5 Mbps, 480×270/30 fps @0.2 Mbps), the coefficients of the mathematical model optimized by the deriving unit 13 described later may be overfitted, and appropriate quality estimation accuracy may not be guaranteed. In particular, for the bit rate, using only the bit rates actually used for the service is not sufficient in terms of the number of samples.
[0032] Thus, when bit rates are determined for the combinations of resolution and frame rate assumed by the service provider (for example, "3840×2160 pixels, 60 fps", "3840×2160 pixels, 30 fps", "1920×1080 pixels, 30 fps", "1280×720 pixels, 30 fps", "960×540 pixels, 30 fps", "480×270 pixels, 15 fps"), it is desirable to set at least three levels of bit rate for each combination of resolution and frame rate, such as a bit rate yielding high quality, a bit rate yielding medium quality, and a bit rate yielding low quality, by using a quality estimation technique or the like. For example, if four levels of bit rate are set for each of the above six pairs of resolution and frame rate for a certain codec setting, the number of sets of encoding parameters becomes 6×4=24. In this case, 24 deteriorated videos are generated for the codec setting.
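The 6×4 = 24 parameter sets described above can be enumerated as follows. The bit-rate heuristic here is a made-up placeholder standing in for the quality estimation technique the text says should choose the levels.

```python
# Resolution/frame-rate pairs assumed by the service provider (from the text).
pairs = [("3840x2160", 60), ("3840x2160", 30), ("1920x1080", 30),
         ("1280x720", 30), ("960x540", 30), ("480x270", 15)]

def bitrate_levels(resolution: str, fps: int) -> list:
    """Four bit-rate levels (kbps) spanning high to low quality for one pair.
    The bits-per-pixel factor is a crude illustrative heuristic, not the
    embodiment's method (which uses a quality estimation technique)."""
    w, h = map(int, resolution.split("x"))
    base = w * h * fps * 0.07 / 1000  # kbps at roughly "high quality"
    return [round(base * f) for f in (1.0, 0.5, 0.25, 0.1)]

parameter_sets = [(rs, fr, br)
                  for rs, fr in pairs
                  for br in bitrate_levels(rs, fr)]
print(len(parameter_sets))  # 6 pairs x 4 levels = 24 sets
```

One deteriorated video would then be encoded for each of the 24 sets.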
[0033] As a result, an improvement in the accuracy of the optimization of the coefficients of the mathematical model by the deriving unit 13 described later can be expected. That is, if the range of variation in quality of the prepared deteriorated videos is not sufficiently wide, the estimation accuracy of the quality calculated from the mathematical model will be low. To avoid this issue, it is necessary, as described above, to prepare deteriorated videos whose qualities vary over an appropriate range. Further, although the bit rate has been used as an example above, quality also changes with resolution and frame rate; thus, it is desirable to set at least three levels of resolution, and similarly at least three levels of frame rate, so that different qualities are obtained.
[0034] Subsequently, the quality estimation unit 12 receives as input the original video and the plurality of deteriorated videos and calculates an estimated value of the video quality of each deteriorated video (hereinafter simply referred to as a "quality estimation value") (S102). As quality estimation techniques, for example, Video Multi-method Assessment Fusion (VMAF) (https://github.com/Netflix/vmaf), Peak Signal-to-Noise Ratio (PSNR), and the like can be used. In the present embodiment, an example using a full-reference model is described, but a quality estimation value may also be derived using a reduced-reference model or a no-reference model. If a no-reference model is used, the original video is not required and thus need not be input to the quality estimation unit 12.
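Of the metrics named above, PSNR is simple enough to sketch directly. The following toy computes the PSNR of one degraded frame against its original using synthetic frame data; it is an illustration of the metric, not the embodiment's quality estimation unit.

```python
import numpy as np

def psnr(original: np.ndarray, degraded: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio (dB) between one original and one degraded frame."""
    mse = np.mean((original.astype(np.float64) - degraded.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy luma frames: the degraded frame differs from the original by a
# constant 16, so PSNR = 20*log10(255/16) ≈ 24.05 dB.
orig = np.full((4, 4), 128, dtype=np.uint8)
degr = np.full((4, 4), 144, dtype=np.uint8)
print(round(psnr(orig, degr), 2))  # 24.05
```

In practice this would be evaluated per video frame, and the per-frame values averaged as described in the next paragraph.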
[0035] The quality estimation unit 12 calculates, for example, an average value of the quality estimation values of all video frames of the deteriorated video, as the quality estimation value of the deteriorated video. In other words, one quality estimation value is calculated for one deteriorated video.
[0036] However, when the average value of the per-frame quality estimation values is used as the quality estimation value of the deteriorated video, it is not possible to capture the decline in the perceived quality (subjective quality) due to the decrease in the frame rate. This point is also clear from the relationship between the frame rate and the subjective quality illustrated in the drawings.
[0037] For example, frame rates such as 60 fps, 50 fps, 30 fps, 25 fps, 24 fps, and 15 fps are often used. Thus, for each of these frame rates, the correspondence relationship between the subjective quality of each video at that frame rate and the average value of the quality estimation values for all frames of that video is mapped with a quadratic function, a cubic function, a logarithmic function, an exponential function, a logistic function, or the like (for example, mapping functions 1 to 3 below), using the relationship illustrated in the drawings.
[0038] Further, a mapping function may be created for each original video. For example, a mapping function indicating the relationship between the subjective quality and the average quality estimation value for original video 1 (that is, the relationship illustrated in the corresponding drawing) may be created, and likewise for each of the other original videos.
[0039] Further, in general, the quality estimation value is derived as a value from 1 to 5, or from 0 to 100. Here, if the quality estimation value derived by the quality estimation unit 12 ranges from 0 to 100 while the deriving unit 13 accepts values from 1 to 5, the mapping function described above is created so as to output quality estimation values from 1 to 5.
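For intuition, the simplest possible conversion between the two scales is a linear rescale, sketched below. Note that this is only an illustration: the embodiment instead builds the 1-to-5 output range into the mapping function itself.

```python
def rescale_to_mos(value_0_100: float) -> float:
    """Linearly map a 0-100 quality score onto the 1-5 MOS-style scale.
    A plain illustration; the embodiment folds the 1-5 range into the
    mapping function rather than rescaling separately."""
    return 1.0 + 4.0 * value_0_100 / 100.0

print(rescale_to_mos(0))    # 1.0
print(rescale_to_mos(100))  # 5.0
print(rescale_to_mos(75))   # 4.0
```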
[0040] Examples of the mapping function are described below.
MOSfromVMAF = ai + bi·VMAF + ci·VMAF^2    (Mapping Function 1)
[0041] Here, VMAF is the average of the per-frame quality estimation values of the deteriorated video calculated by using VMAF; ai, bi, and ci are coefficients for each frame rate; i is the frame rate; and MOSfromVMAF is the quality estimation value derived by the mapping function.
MOSfromPSNR = aj + (1 − aj)/(1 + (PSNR/bj)^cj)    (Mapping Function 2)
[0042] Here, PSNR is the average of the per-frame PSNR values of the deteriorated video; aj, bj, and cj are coefficients for each original video; j is the original video number; and MOSfromPSNR is the quality estimation value derived by the mapping function.
MOSfromPSNR = aij + (1 − aij)/(1 + (PSNR/bij)^cij)    (Mapping Function 3)
Here, PSNR is the average of the per-frame PSNR values of the deteriorated video; aij, bij, and cij are coefficients for each combination of frame rate and original video; i is the frame rate; j is the original video number; and MOSfromPSNR is the quality estimation value derived by the mapping function.
[0043] Although mapping function 1 (MOSfromVMAF) is represented by a quadratic function, it may instead be represented by a logistic function like mapping function 2 or 3.
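Mapping functions 1 to 3 above translate directly into code. The coefficient values used below are illustrative assumptions only; actual coefficients are obtained by fitting against subjective scores collected per frame rate and/or per original video.

```python
def mos_from_vmaf(vmaf_avg: float, a: float, b: float, c: float) -> float:
    """Mapping function 1: quadratic mapping from an average VMAF score to
    a quality estimation value, with coefficients per frame rate."""
    return a + b * vmaf_avg + c * vmaf_avg ** 2

def mos_from_psnr(psnr_avg: float, a: float, b: float, c: float) -> float:
    """Mapping functions 2/3: logistic mapping from an average PSNR to a
    quality estimation value; coefficients are per original video (and,
    for mapping function 3, also per frame rate)."""
    return a + (1.0 - a) / (1.0 + (psnr_avg / b) ** c)

# Illustrative coefficients only.
print(round(mos_from_vmaf(80.0, a=1.0, b=0.03, c=0.0002), 2))   # 4.68
print(round(mos_from_psnr(45.0, a=4.5, b=30.0, c=8.0), 2))      # 4.37
```

With the logistic form, the quality approaches a (the high-quality asymptote) as PSNR grows and approaches 1 as PSNR falls toward zero, which keeps the output within the 1-to-5 range.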
[0044] As described above, with a technique such as VMAF, by which the influence of the original video on the subjective quality can be grasped, quality estimation values of 1 to 5 can be estimated by using only a mapping function in which the influence of the frame rate on the subjective quality, illustrated in the drawings, is considered.
[0045] Subsequently, for each codec setting, the deriving unit 13 derives (optimizes) the coefficients of a mathematical model preset for estimating a video quality VQ, based on the plurality of sets of encoding parameters input for the codec setting and the quality estimation value calculated for each set of encoding parameters (S103). That is, the coefficients of the mathematical model are derived for each codec setting.
[0046] The coefficients can be optimized by using, for example, the least squares method, the Solver of Microsoft (registered trademark) Excel (registered trademark), or an optimization method implemented in Python or R. In the present embodiment, a mathematical model that calculates the video quality VQ from the video bit rate br, the resolution rs, and the frame rate fr is used (NPL 4). The mathematical model calculates the video quality VQ using the equations of NPL 4, in consideration of the theoretically highest video quality X determined for each pair of resolution and frame rate, the characteristic that X declines as the resolution rs and the frame rate fr decrease, and the characteristic that the video quality VQ declines from X as the video bit rate br decreases.
rs is the resolution obtained from the numbers of lines and pixels in the vertical and horizontal directions (for example, the total number of pixels, such as 1920×1080). However, when only the number of lines in the vertical direction or the number of pixels in the horizontal direction is known, rs is the resolution calculated from that number by a known method. fr is the frame rate. v1, ..., v7 are the coefficients to be derived (optimized).
[0047] Specifically, for each codec setting, the deriving unit 13 associates the plurality of quality estimation values calculated for the codec setting, with the video quality VQ, and associates the plurality of combinations of encoding parameters input to the codec setting, with br, rs, and fr, and thereby derives (optimizes) the coefficients of a mathematical model.
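The fitting step in S103 can be sketched as follows. The actual model (NPL 4) is nonlinear in its coefficients v1 to v7 and would be fitted with a nonlinear optimizer; as a minimal stand-in, this sketch fits a hypothetical model that is linear in its coefficients, so ordinary least squares suffices. The model form, coefficient names, and data are all assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in model, linear in its coefficients:
#     VQ = w1 + w2*log(br) + w3*log(rs) + w4*log(fr)
def design_matrix(br: np.ndarray, rs: np.ndarray, fr: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones_like(br), np.log(br), np.log(rs), np.log(fr)])

# 24 parameter sets: 4 resolution blocks x 6 (frame rate, bit rate) variants.
rs = np.repeat([480 * 270.0, 960 * 540.0, 1920 * 1080.0, 3840 * 2160.0], 6)
fr = np.tile([15.0, 30.0, 60.0, 30.0, 30.0, 60.0], 4)
br = np.tile([250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0], 4)  # kbps

# Synthetic quality estimation values generated from known coefficients,
# standing in for the values produced by the quality estimation unit 12.
true_w = np.array([0.5, 0.35, 0.05, 0.1])
vq = design_matrix(br, rs, fr) @ true_w

# Least-squares fit recovers the coefficients from (parameters, VQ) pairs.
w_fit, *_ = np.linalg.lstsq(design_matrix(br, rs, fr), vq, rcond=None)
print(np.allclose(w_fit, true_w))  # True
```

With noise-free synthetic data the coefficients are recovered exactly; with real quality estimation values the fit minimizes the squared estimation error over the 24 samples, which is the role of the deriving unit 13.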
[0048] Although only one original video is used in the example described above, a plurality of original videos (for example, eight or more types) may be input to the encoding unit 11; deteriorated videos may then be generated for each original video based on the same codec settings and the same combinations of encoding parameters, and a quality estimation value may be calculated for each combination of encoding parameters for each original video. Doing so increases the number of samples input to the deriving unit 13 and stabilizes the optimization of the coefficients. Because the coefficients are derived before quality estimation is actually performed, there is enough time for this calculation.
[0049] As described above, according to the present embodiment, it is possible to derive a mathematical model capable of evaluating the quality according to the codec setting. For example, when the quality that changes for each codec setting is estimated, it is possible to perform comparative evaluation for the quality even if the encoding parameters are the same.
[0050] There are existing issues in estimating video quality for different codec settings: the existing technique (parametric model) receives only the encoding parameters as input and thus does not allow comparative evaluation of video qualities (Issue 1); the existing technique provides no guidance or method for comparative evaluation of quality values estimated under a plurality of codec settings, so a user of the parametric model cannot optimize it for each codec setting (Issue 2); and, when subjective quality evaluation is conducted to obtain subjective quality values for each setting parameter, the subjective quality values cannot be prepared mechanically (Issue 3).
[0051] In contrast, in the present embodiment, by optimizing in advance the coefficients of the mathematical model of the parametric model for the codec setting for which the video quality is to be estimated, the video quality can be derived for each codec setting, and comparative evaluation of quality can be performed. In addition, because the present embodiment describes a specific optimization procedure, an optimization technique can be provided to the user. It is also possible to prepare the quality estimation values mechanically, without conducting subjective evaluation, and to apply them to the optimization.
[0052] Thus, the present embodiment makes it possible to easily determine whether the quality of a service being provided to the viewer is maintained at or above a certain level, by monitoring the quality values of the video communication service actually viewed by the user using coefficients optimized for each codec setting. As a result, the actual quality of the service being provided can be known and managed in real time.
[0053] This improves the grasping and management of the actual quality of the service being provided, which could not be handled by the related art.
[0054] Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to these specific embodiments, and various modifications and changes can be made within the scope of the gist of the present disclosure described in the claims.
[0055] This application claims priority based on the International Patent Application No. PCT/JP2020/011195, filed on Mar. 13, 2020, and the entire contents of the international patent application are incorporated herein by reference.
REFERENCE SIGNS LIST
[0056] 10 Mathematical model deriving apparatus
[0057] 11 Encoding unit
[0058] 12 Quality estimation unit
[0059] 13 Deriving unit
[0060] 100 Drive device
[0061] 101 Recording medium
[0062] 102 Auxiliary storage device
[0063] 103 Memory device
[0064] 104 CPU
[0065] 105 Interface device
[0066] B Bus