TEMPORAL DOMAIN RATE DISTORTION OPTIMIZATION BASED ON VIDEO CONTENT CHARACTERISTIC AND QP-Lambda CORRECTION

Abstract

A temporal domain rate distortion optimization method based on a video content characteristic and QP-λ correction is provided for the new generation encoder AV1. According to a previous temporal domain dependency relationship under an HEVC-RA coding structure, a feature of the new generation encoder AV1 and a video sequence feature, an aggregation distortion of a current coding unit and an affected future coding unit is estimated, and a propagation factor of the current coding unit in a temporal domain distortion propagation model is calculated by constructing a temporal domain distortion propagation chain. A Lagrange multiplier is adjusted through the more accurate propagation factor to realize temporal domain dependent rate distortion optimization, and the QP-λ relationship is re-corrected and the I frame is adjusted to achieve a better coding effect.

Claims

1. A temporal domain rate distortion optimization method based on a video content characteristic and QP-λ correction, comprising the following steps:

S1: establishing a temporal domain propagation chain according to a temporal domain dependency relationship in an AV1 default coding structure, finding a matching block affected by each original coding block through a forward motion search, and recording a corresponding original motion compensation error and a corresponding motion vector;

S2: defining a Lagrange multiplier as λ.sub.new and a quantization step size as Qstep, then collecting the Lagrange multipliers λ of different sequences at different quantization parameters (QPs) and the corresponding quantization step sizes Qstep according to a built-in correspondence list of the QP and the quantization step size Qstep of an encoder, and constructing a relationship model between the Lagrange multiplier λ.sub.new and the quantization step size Qstep as follows:

$\lambda_{new} = 3.667 \cdot Q_{step}^{2} - 5.198 \times 10^{-7} \cdot Q_{step} - 0.664;$

$\lambda_{org} = \begin{cases} 1.1 \cdot \lambda_{org}, & |\lambda_{org} - \lambda_{new}| > 100 \text{ or } |\lambda_{org} - \lambda_{new}| < 0.05 \\ 0.95 \cdot \lambda_{org}, & |\lambda_{org} - \lambda_{new}| \leq 3 \end{cases};$

wherein λ.sub.org is a Lagrange multiplier in the encoder; classifying an original video sequence, calculating a sum of absolute values of difference values of subsequent 10, 20, 30 . . . frames relative to an initial first frame by a frame difference method, evaluating a pixel-level average value of the cumulative sum as E, and adjusting different QPs, λ adjustment ranges and corresponding α and I frame QP for an obtained result according to a threshold:

$SAD_{i} = \sum |p_{0} - p_{10 \cdot i}|;$

$E = \frac{\sum_{i=1}^{F/10} SAD_{i}}{W \cdot H \cdot F / 10};$

$constrainQPrange = \begin{cases} 6, & E < 20 \\ 2, & 20 \leq E < 100 \\ 3, & \text{else} \end{cases};$

$\begin{cases} (0.25 \cdot \lambda_{org},\ 4 \cdot \lambda_{org}), & E < 20 \\ (0.63 \cdot \lambda_{org},\ 1.59 \cdot \lambda_{org}), & 20 \leq E < 100 \\ (0.5 \cdot \lambda_{org},\ 2 \cdot \lambda_{org}), & \text{else} \end{cases};$

$\alpha = clip3(0.90,\ 0.98,\ 1 - 0.006 \cdot (0.8 \cdot E - 20));$

$\begin{cases} QP_{0} - 12, & E < 20 \\ QP_{0} - 4, & \text{else} \end{cases};$

wherein SAD.sub.i refers to a sum of absolute values of an i.sup.th difference value, p.sub.0 represents a pixel value of an initial frame, p.sub.10*i represents pixel values of the subsequent 10, 20, 30 . . . frames, F represents a total frame number of the original video sequence, W represents a width of the original video sequence, H represents a height of the original video sequence, constrainQPrange represents a maximum adjustable range of the QP, QP.sub.0 represents a QP (0-255) of an I frame, α is a coefficient, and a clip3( ) function is used to limit a calculation result of 1-0.0006*(0.8*E-20) to be 0.90-0.98;

S3: before a current frame is actually coded, calculating a propagation factor of each 16×16 coding block of the current frame by utilizing the corresponding original motion compensation error and the corresponding motion vector obtained in S1, evaluating an average propagation factor for each Superblock by harmonic averaging, distinguishing screen content sequences by a built-in variable of AV1, wherein the AV1 adopts a two-pass coding by default, and pertinently adjusting Lagrange multipliers of different video sequences with an adjustment range obtained in S2;

S4: according to the relationship model obtained in the step S2 and λ.sub.new being a Lagrange multiplier calculated by the relationship model, calculating a difference value of the Lagrange multipliers λ.sub.org and λ.sub.new in the encoder, and for different difference values, correcting the Lagrange multiplier λ.sub.org in the encoder by utilizing a relationship model formula; and

S5: coding a frame with rPOC being 16 by a special ALT frame in the AV1, wherein the frame with rPOC being 16 is located at a temporal domain level TL1, and distortion from the temporal domain level TL1 affects the subsequent multi-frame distortion, scaling and performing QP-λ correction on the special ALT frame on the basis that an AV1 encoder adjusts a block-level Lagrange multiplier in the special ALT frame, thereby improving the coding efficiency.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0034] FIG. 1 is a rate distortion curve;

[0035] FIG. 2 is an AV1 default coding structure;

[0036] FIG. 3 is a main temporal domain dependency relationship in an AV1;

[0037] FIG. 4 is a construction schematic diagram of a temporal domain distortion propagation chain; and

[0038] FIG. 5 is a rate distortion curve of a BasketballDrill sequence.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0039] In order to make the object, technical solutions and advantages of the present invention clearer, the technical solution of the present invention is described below in detail with reference to the accompanying drawings and embodiments. The development environment adopted by the embodiments is Visual Studio 2015, and the embodiments are implemented on the basis of AV1 reference software libaom-1.0.

[0040] In order to simplify the implementation method of a global rate distortion algorithm, a global Lagrange multiplier λ.sub.g may be directly modified in the AV1 through a propagation factor κ.sub.i. The subsequent coding unit is not really coded when deducing a propagation factor κ.sub.i, so it is necessary to estimate the distortion of the subsequent coding unit.

[0041] At high code rates, the coding distortion of the subsequent coding unit may be represented by the formula $D_{i+1} = e^{-bR_{i+1}} \cdot D_{i+1}^{MCP}$. Since the coding unit B.sub.i+1 has not been coded, R.sub.i+1 cannot be obtained and D.sub.i+1 cannot be calculated from this formula directly. However, the coding distortion of B.sub.i+1 under the quantization step size Q.sub.step may be represented as $D_{i+1} = D_{i+1}^{MCP} \cdot F(\theta)$,

$F(\theta) = D_{i+1} / D_{i+1}^{MCP} = e^{-bR_{i+1}}$ (1.17)

[0042] wherein $\theta = \sqrt{2}\, Q_{step} / \sqrt{D^{MCP}}$. An F(θ) curve may be fitted through a large number of experiments over quantization step sizes and coding units. The F(θ) curve of the previous algorithm targets the HEVC encoder and is no longer applicable to the AV1. The experiment is therefore performed again on the AV1 to obtain a new curve, points on the curve are sampled, and a lookup table of F(θ) against θ is established, so that the distortion of the coding block can be estimated. Meanwhile, according to the present invention, α, which was previously set to a fixed value, is changed into an α that adapts to the video sequence.
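The table lookup described above can be illustrated with a minimal Python sketch. The function names, the linear interpolation between samples, and the clamping outside the sampled range are assumptions made for illustration; the sample points in TOY_TABLE are placeholders and are not the values measured on the AV1.

```python
import bisect
import math


def theta(qstep, d_mcp):
    """theta = sqrt(2) * Qstep / sqrt(D_MCP), as defined in the text."""
    return math.sqrt(2.0) * qstep / math.sqrt(d_mcp)


def lookup_f(table, t):
    """Read F(theta) from sorted (theta, F) samples.

    Assumption: linear interpolation between neighbouring samples and
    clamping to the end values outside the sampled range.
    """
    thetas = [point[0] for point in table]
    if t <= thetas[0]:
        return table[0][1]
    if t >= thetas[-1]:
        return table[-1][1]
    hi = bisect.bisect_right(thetas, t)
    (t0, f0), (t1, f1) = table[hi - 1], table[hi]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def estimate_distortion(table, qstep, d_mcp):
    """Estimate D = D_MCP * F(theta) for a block that is not yet coded."""
    if d_mcp <= 0:
        return 0.0  # assumption: a perfectly predicted block has no distortion
    return d_mcp * lookup_f(table, theta(qstep, d_mcp))


# Placeholder samples only (hypothetical); the real table comes from
# the curve fitted experimentally on the AV1.
TOY_TABLE = [(0.0, 0.0), (1.0, 0.25), (2.0, 0.5), (4.0, 1.0)]
```

With such a table the encoder only needs the motion compensation error of the block and the quantization step size, both of which are available before the block is actually coded.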

[0043] The main steps of the present invention include:

[0044] Step 1: A temporal domain propagation chain (as shown in FIG. 4) is established. According to a main temporal domain dependency relationship in an AV1 default coding structure, a matching block affected by each original coding block is found through forward motion search, and a corresponding OMCP and a corresponding motion vector are recorded.

[0045] Step 2: The Lagrange multiplier in the relationship model is defined as λ.sub.new and the quantization step size as Q.sub.step. The Lagrange multipliers of different sequences at different QPs and the corresponding quantization step sizes Q.sub.step are collected according to a built-in correspondence list of the QP and the quantization step size Q.sub.step of an encoder, a relationship model between the Lagrange multiplier λ.sub.new and the quantization step size Q.sub.step is constructed, and the obtained relationship model is represented by formulas (1.9)-(1.10).

[0046] Step 3: An original video sequence is briefly classified. The sum of absolute values of difference values of the subsequent 10, 20, 30 . . . frames relative to an initial first frame is calculated by a frame difference method, and the pixel-level average value of the cumulative sum is finally evaluated. Different QPs, λ adjustment ranges and corresponding α and I frame QP are set for the obtained result according to a threshold and are represented by formulas (1.11)-(1.16).

[0047] Step 4: Before a current frame is actually coded, a propagation factor of each 16×16 coding block of the current frame is calculated by using the original motion compensation error and the motion vector obtained in S1, and an average propagation factor of each Superblock is evaluated by harmonic averaging.

[0048] Because the AV1 adopts 2-pass coding by default, screen content sequences are distinguished by a built-in variable of the AV1, and the Lagrange multipliers of different video sequences are pertinently adjusted in combination with the adjustment range obtained in S2.

[0049] Step 5: According to the relationship model obtained in the step S2, λ.sub.new is defined as the Lagrange multiplier calculated by the relationship model, a difference value of the Lagrange multipliers λ.sub.org and λ.sub.new in the encoder is calculated, and for different difference values, the Lagrange multiplier λ.sub.org in the encoder is corrected by relationship model formulas (1.9)-(1.10).

[0050] Step 6: A frame with rPOC being 16 is coded by a special ALT frame in the AV1. The frame with rPOC being 16 is located at a temporal domain level TL1, similar to the key frame in HEVC, and the distortion of this frame affects the distortion of multiple subsequent frames. Therefore, the ALT frame is scaled and subjected to QP-λ correction on the basis that the AV1 encoder adjusts the block-level Lagrange multiplier in the ALT frame, thereby improving the coding efficiency.
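The content classification of Step 3 and the QP-λ correction of Steps 2 and 5 can be sketched in Python as follows. This is an illustrative sketch only: all function names are hypothetical, frames are assumed to be flat sequences of luma samples, and the constants are taken from the formulas in claim 1. Three points are assumptions rather than statements of the source: the average is normalised by the number of frames actually sampled (which equals F/10 when every sampled frame exists), the coefficient in α is exposed as a parameter because the source gives it as 0.006 in the formula and as 0.0006 in the accompanying text, and the two correction conditions are tested in the order in which they are listed, leaving λ.sub.org unchanged when neither holds.

```python
def clip3(lo, hi, x):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def mean_frame_difference(frames, width, height, step=10):
    """E: pixel-level average of SAD between frame 0 and frames 10, 20, 30, ..."""
    first = frames[0]
    sads = [sum(abs(a - b) for a, b in zip(first, frames[k]))
            for k in range(step, len(frames), step)]
    if not sads:
        return 0.0  # assumption: too few frames to sample
    return sum(sads) / (width * height * len(sads))


def content_parameters(e, lambda_org, qp0, alpha_coeff=0.006):
    """Threshold E into QP range, lambda range, alpha and I frame QP."""
    if e < 20:
        qp_range, lo, hi = 6, 0.25, 4.0
    elif e < 100:
        qp_range, lo, hi = 2, 0.63, 1.59
    else:
        qp_range, lo, hi = 3, 0.5, 2.0
    return {
        "constrain_qp_range": qp_range,
        "lambda_range": (lo * lambda_org, hi * lambda_org),
        "alpha": clip3(0.90, 0.98, 1 - alpha_coeff * (0.8 * e - 20)),
        "i_frame_qp": qp0 - 12 if e < 20 else qp0 - 4,
    }


def lambda_new(qstep):
    """Fitted relationship model between lambda and Qstep."""
    return 3.667 * qstep ** 2 - 5.198e-07 * qstep - 0.664


def correct_lambda(lambda_org, qstep):
    """Correct the encoder's lambda from its distance to the model value."""
    diff = abs(lambda_org - lambda_new(qstep))
    if diff > 100 or diff < 0.05:
        return 1.1 * lambda_org
    if diff <= 3:
        return 0.95 * lambda_org
    return lambda_org  # assumption: no correction outside the listed cases
```

A caller would first compute E once per sequence, derive the parameter set from it, and then apply the correction to the encoder's Lagrange multiplier for the quantization step size in use.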

[0051] When the temporal domain propagation chain is established, motion search is performed on blocks of 16×16, and a propagation factor of each block is calculated. In the AV1, video sequences with a resolution greater than or equal to 720P are independently divided and coded by a SuperBlock of 128×128, and video sequences with a resolution less than 720P are independently divided and coded by a SuperBlock of 64×64. The propagation factors of all the 16×16 blocks in a SuperBlock are therefore averaged as the propagation factor of the SuperBlock, and the SuperBlock-level Lagrange multiplier and QP are adjusted. For the I frame, part of the sequences are adjusted according to the threshold.
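The SuperBlock-level averaging described above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that "720P or above" is judged by the shorter frame side being at least 720 samples, that a zero propagation factor makes the harmonic average zero, and that the final Lagrange multiplier is simply clamped to the adjustment range; how a candidate multiplier is derived from the propagation factor is left to the caller, since the text does not state it here.

```python
def harmonic_mean(values):
    """Harmonic average; returns 0.0 if any value is zero (assumption)."""
    values = list(values)
    if not values or any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def superblock_size(width, height):
    """128x128 for 720P and above, 64x64 otherwise.

    Assumption: the resolution class is judged by the shorter side.
    """
    return 128 if min(width, height) >= 720 else 64


def superblock_propagation_factors(kappa, width, height, block=16):
    """Average the 16x16 block factors inside each SuperBlock.

    kappa is a 2-D list indexed as kappa[block_row][block_col].
    Returns {(sb_row, sb_col): harmonic average of the factors}.
    """
    per_sb = superblock_size(width, height) // block
    groups = {}
    for by, row in enumerate(kappa):
        for bx, k in enumerate(row):
            groups.setdefault((by // per_sb, bx // per_sb), []).append(k)
    return {key: harmonic_mean(vals) for key, vals in groups.items()}


def clamp_lambda(candidate, lambda_range):
    """Keep an adjusted multiplier inside the allowed adjustment range."""
    lo, hi = lambda_range
    return max(lo, min(hi, candidate))
```

For an 832×480 sequence this yields 64×64 SuperBlocks, each covering a 4×4 group of 16×16 blocks, with partial groups along the bottom edge where the frame height is not a multiple of 64.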

[0052] According to the present invention, the AV1 reference software libaom-1.0 serves as the experimental platform, and the experimental environment follows the common test conditions (CTC) specified by WET. The experiment is performed only under the AV1 default coding structure. The test sequences are 20 suggested video sequences from classes such as Class B, C, D and E, and each test sequence is coded at four QP points (32, 43, 53 and 63). Taking the BasketballDrill sequence as an example, the reference software is configured as follows:

--codec=av1 -w 832 -h 480 --fps=50/1 --cpu-used=1 --threads=0 --profile=0 --drop-frame=0 --static-thresh=0 --sharpness=0 --frame-parallel=0 --tile-columns=0 --end-usage=q -v --cq-level=32 --psnr --limit=500 -o BasketballDrill_832x480_50.yuv.ivf BasketballDrill_832x480_50.yuv
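As an illustration of the test configuration, the following Python sketch builds one encoder command per QP point. The option list is read from the configuration in the paragraph above; where the extracted text runs options together, the split used here is an assumption. The executable name aomenc, the function name and the per-QP output file naming are also assumptions introduced for the example.

```python
QP_POINTS = (32, 43, 53, 63)  # the four QP points named in the text


def build_command(cq_level, source="BasketballDrill_832x480_50.yuv",
                  width=832, height=480, fps="50/1", limit=500,
                  encoder="aomenc"):
    """Return the argument list for one encode at the given cq-level.

    The output name is hypothetical: it adds the QP so that the four
    encodes of one sequence do not overwrite each other.
    """
    output = f"{source}.q{cq_level}.ivf"
    return [
        encoder, "--codec=av1",
        "-w", str(width), "-h", str(height), f"--fps={fps}",
        "--cpu-used=1", "--threads=0", "--profile=0",
        "--drop-frame=0", "--static-thresh=0", "--sharpness=0",
        "--frame-parallel=0", "--tile-columns=0",
        "--end-usage=q", "-v", f"--cq-level={cq_level}",
        "--psnr", f"--limit={limit}",
        "-o", output, source,
    ]


# One command per QP point; each list can be passed to subprocess.run().
commands = [build_command(qp) for qp in QP_POINTS]
```

Keeping every option except the QP fixed means that differences between the four rate points reflect only the change in quantization.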

TABLE 1 The test result of the present invention compared with libaom-1.0

| Class | Sequence name | Resolution | BD-rate Y | BD-rate U | BD-rate V |
|---|---|---|---|---|---|
| B | Kimono | 1080P (1920 × 1080) | 0.68% | −3.4% | −4.4% |
| B | ParkScene | | −0.12% | −3.0% | −3.2% |
| B | Cactus | | −2.73% | −11.3% | −9.7% |
| B | BasketballDrive | | 0.44% | −3.6% | −1.4% |
| B | BQTerrace | | −0.08% | −7.6% | −10.2% |
| B | Average | | −0.36% | −5.8% | −5.8% |
| C | BasketballDrill | WVGA (832 × 480) | −6.21% | −9.1% | −7.2% |
| C | BQMall | | −0.29% | −3.8% | −3.3% |
| C | PartyScene | | −1.34% | −4.2% | −4.0% |
| C | RaceHorses | | −0.05% | −3.7% | −2.3% |
| C | Average | | −1.97% | −5.2% | −4.2% |
| D | BasketballPass | WQVGA (416 × 240) | 0.29% | −1.1% | −0.2% |
| D | BQSquare | | −0.53% | −5.2% | −3.6% |
| D | BlowingBubbles | | −0.51% | −3.5% | −4.0% |
| D | RaceHorses | | 0.30% | −2.6% | −1.8% |
| D | Average | | −0.11% | −3.1% | −2.4% |
| E | FourPeople | 720P (1280 × 720) | −4.45% | −11.3% | −9.1% |
| E | Johnny | | −5.66% | −14.9% | −13.7% |
| E | KristenAndSara | | −4.99% | −13.7% | −12.9% |
| E | Average | | −5.03% | −13.3% | −11.9% |
| F | BasketballDrillText | | −5.73% | −8.3% | −6.8% |
| F | ChinaSpeed | | −0.40% | −4.4% | −3.3% |
| F | SlideEditing | | −1.44% | −2.5% | −2.3% |
| F | SlideShow | | −0.42% | −2.1% | −2.3% |
| F | Average | | −2.00% | −2.1% | −2.3% |
| Overall | | | −1.66% | −6.0% | −5.3% |

[0053] The coding results are shown in Table 1. Under the AV1 default coding structure, the Y component of the test sequences achieves an average BD-rate saving of 1.66%. The present invention improves performance for most test sequences, especially for Class E, where 5.03% of the code rate is saved on the Y component. The main reason is that Class E consists of video sequences with relatively fixed scenes: the video frames have high similarity and strong temporal domain dependency, so the present invention achieves a better effect on these sequences. In addition, the BasketballDrill sequence saves 6.21% of the code rate on the Y component owing to its relatively static background. Some sequences are then selected and their rate distortion curves are compared to observe the improvement in coding performance. FIG. 5 is the rate distortion curve diagram of the BasketballDrill sequence, wherein the x-coordinate is the code rate, the y-coordinate is the reconstructed peak signal to noise ratio (PSNR), the blue curve is the rate distortion curve of the global rate distortion optimization algorithm, and the red curve is the rate distortion curve of the original libaom-1.0. It may be seen that for a sequence with strong temporal domain dependency, the coding efficiency of the algorithm is clearly improved.

[0054] Similarly, in terms of coding complexity, the coding time of the temporal domain rate distortion optimization algorithm under the AV1 default coding structure is reduced by 6% on average (Table 2), mainly because the adaptive Lagrange multiplier calculated by the algorithm enables the coding unit to achieve a better prediction effect. Although it takes a certain amount of time to establish the temporal domain propagation chain, high-quality prediction may reduce the coding residual error, so that the subsequent transformation, quantization and entropy coding are accelerated, thereby reducing the overall time.

TABLE 2 The coding time percentage of the present invention compared with libaom-1.0

| Sequence | Class B | Class C | Class D | Class E | Class F | Average |
|---|---|---|---|---|---|---|
| ΔEncT | 95% | 94% | 97% | 91% | 92% | 94% |


CPC classification: H04N19/147, H04N19/176, H04N19/149, H04N19/19, H04N19/51, H04N19/124

International classification: H04N19/147, H04N19/149, H04N19/176, H04N19/51
