QoE-BASED ADAPTIVE ACQUISITION AND TRANSMISSION METHOD FOR VR VIDEO
20220006851 · 2022-01-06
Inventors
- JIE LI (HEFEI, CN)
- CONG ZHANG (HEFEI, CN)
- LING HAN (HEFEI, CN)
- ZHI LIU (HEFEI, CN)
- QIYUE LI (HEFEI, CN)
- RANSHENG FENG (HEFEI, CN)
Cpc classification
H04N19/166
ELECTRICITY
Abstract
The present application discloses a QoE-based adaptive acquisition and transmission method for VR video, comprising the following steps: (1) capturing, by the respective cameras in a VR video acquisition system, original videos at the same bit rate level, and compressing each original video into multiple bit rate levels; (2) selecting, by a server, the bit rate level at which each original video is transmitted, and synthesizing all of the transmitted original videos into a complete VR video; (3) performing, by the server, a segmentation process on the synthesized VR video, and compressing each resulting video block into different quality levels; and (4) selecting, by the server, a quality level and a modulation and coding scheme (MCS) for each video block according to the users' real-time viewing angle information and the downlink channel bandwidth information received over a feedback channel, and transmitting each video block to a client.
Claims
1. A QoE-based adaptive acquisition and transmission method for VR video, applied in a network environment comprising C cameras, a VR video server and N clients; a transmission between the cameras and the VR video server being performed through an uplink, and a transmission between the VR video server and the clients being performed through a downlink; the downlink comprising a feedback channel from the clients to the VR video server; wherein the adaptive acquisition and transmission method for VR video is conducted as follows: step 1, denoting the C original videos taken by the C cameras in the network environment as {V.sub.1, V.sub.2, . . . , V.sub.c, . . . , V.sub.C}, wherein V.sub.c represents an original video taken by a c-th camera, wherein 1≤c≤C; compressing the c-th original video V.sub.c into original videos with E bit rate levels, denoted as {V.sub.c.sup.1, V.sub.c.sup.2, . . . , V.sub.c.sup.e, . . . , V.sub.c.sup.E}, wherein V.sub.c.sup.e represents an original video with an e-th bit rate level obtained after compressing the c-th original video V.sub.c, wherein 1≤e≤E; step 2, establishing an objective function with a goal of maximizing a total utility value defined as the sum of the qualities of experience (QoEs) of the N clients, and setting corresponding constraint conditions, thereby establishing an adaptive acquisition and transmission model for VR video; step 3, solving the adaptive acquisition and transmission model for VR video with KKT conditions and a hybrid branch and bound method to obtain an uplink collecting decision variable and a downlink transmitting decision variable in the network environment; step 4, selecting, by the VR video server, an original video with the e-th bit rate level for the c-th camera according to a value of the uplink collecting decision variable χ.sub.c,e.sup.UL, and receiving the original video with the e-th bit rate level selected for the c-th camera and uploaded through the uplink, so that the VR video server receives the original videos with the corresponding bit rate levels selected for the C cameras, respectively; step 5, performing, by the VR video server, a stitching and mapping process on the C original videos with the corresponding bit rate levels to synthesize a complete VR video; step 6, performing, by the VR video server, a segmentation process on the complete VR video to obtain T video blocks, denoted as {T.sub.1, T.sub.2, . . . , T.sub.t, . . . , T.sub.T}, wherein T.sub.t represents any t-th video block, and 1≤t≤T; wherein the VR video server provides D bit rate options for compressing the t-th video block T.sub.t, thereby obtaining compressed video blocks with D different bit rate levels, denoted as {T.sub.t.sup.1, T.sub.t.sup.2, . . . , T.sub.t.sup.d, . . . , T.sub.t.sup.D}, wherein T.sub.t.sup.d represents a compressed video block with a d-th bit rate level obtained after the t-th video block T.sub.t is compressed, wherein 1≤d≤D; step 7, assuming that the set of modulation and coding schemes in the network environment is {M.sub.1, M.sub.2, . . . , M.sub.m, . . . , M.sub.M}, wherein M.sub.m represents an m-th modulation and coding scheme, and 1≤m≤M; selecting, by the VR video server, the m-th modulation and coding scheme for the t-th video block T.sub.t; and selecting, by the VR video server, the compressed video block T.sub.t.sup.d with the d-th bit rate level of the t-th video block T.sub.t for any n-th client according to a value of the downlink transmitting decision variable χ.sub.t,d,m.sup.DL, and transmitting the selected compressed video block T.sub.t.sup.d to the n-th client through the downlink with the m-th modulation and coding scheme, so that the n-th client receives compressed video blocks with the corresponding bit rate levels of the T video blocks through the corresponding modulation and coding schemes; and step 8, performing, by the n-th client, decoding, mapping, and rendering processes on the received compressed video blocks with the corresponding bit rate levels of the T video blocks, so as to synthesize a QoE-optimized VR video.
2. The adaptive acquisition and transmission method for VR video according to claim 1, wherein the step 2 is performed as follows: step 2.1, establishing the objective function with formula (1):
3. The adaptive acquisition and transmission method for VR video according to claim 2, wherein the step 3 is performed as follows: step 3.1, performing a relaxation operation on the collecting decision variables χ.sub.c,e.sup.UL and the transmitting decision variables χ.sub.t,d,m.sup.DL of the adaptive acquisition and transmission model for VR video, and obtaining a continuous collecting decision variable and a continuous transmitting decision variable within a scope of [0,1], respectively; step 3.2, according to the constraint conditions of formula (2)-formula (7), denoting
Description
BRIEF DESCRIPTION OF DRAWINGS
DESCRIPTION OF EMBODIMENTS
[0056] In this embodiment, a QoE-based adaptive acquisition and transmission method for VR video, as shown in the accompanying drawing, is conducted as follows:
[0057] step 1, denoting C original videos taken by C cameras as {V.sub.1, V.sub.2, . . . , V.sub.c, . . . , V.sub.C} in an application network environment, where V.sub.c represents an original video taken by a c-th camera, where 1≤c≤C;
[0058] compressing the original video V.sub.c taken by the c-th camera into E original videos with different bit rate levels, denoted as {V.sub.c.sup.1, V.sub.c.sup.2, . . . , V.sub.c.sup.e, . . . , V.sub.c.sup.E}, where V.sub.c.sup.e represents an original video with the e-th bit rate level obtained after compressing V.sub.c, and 1≤e≤E;
[0059] step 2, establishing an objective function with a goal of maximizing a total utility value defined as the sum of the qualities of experience (QoEs) of the N clients, and setting corresponding constraint conditions, thereby establishing an adaptive acquisition and transmission model for VR video with formulas (1)-(7);
[0060] The objective function:
[0061] formula (1) represents the sum of the QoEs of the N clients, which is the total utility value of the system; in formula (1), λ.sub.t,d.sup.DL represents the bit rate of video block t at quality level d; λ.sub.t,D.sup.DL represents the bit rate of video block t at the highest quality level D; T.sub.FoV.sup.n represents the set of video blocks covered by the field of view (FoV) of the n-th client; χ.sub.t,d,m.sup.DL=1 means that the t-th video block is transmitted to the client through the downlink at the d-th bit rate level with the m-th modulation and coding scheme, and χ.sub.t,d,m.sup.DL=0 means that it is not transmitted at that bit rate level with that modulation and coding scheme;
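Because formula (1) itself is not spelled out in the text above, the following Python sketch uses a hypothetical stand-in for the per-client QoE: the quality ratio λ.sub.t,d.sup.DL/λ.sub.t,D.sup.DL summed over the transmitted blocks that lie in T.sub.FoV.sup.n. The dictionary encodings of χ.sub.t,d,m.sup.DL and of the bit rates, and the functions `qoe_proxy` and `total_utility`, are illustrative assumptions, not the claimed formula.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encodings (assumptions, not from the source):
#   chi_dl[(t, d, m)] -> 1 if block t is sent at bit rate level d with MCS m, else 0
#   rate[(t, d)]      -> lambda_{t,d}^DL, bit rate of block t at level d (levels 1..D)


def qoe_proxy(chi_dl: Dict[Tuple[int, int, int], int],
              rate: Dict[Tuple[int, int], float],
              fov_blocks: Set[int],
              top_level: int) -> float:
    """Assumed per-client QoE: sum of lambda_{t,d} / lambda_{t,D} over transmitted FoV blocks."""
    total = 0.0
    for (t, d, _m), x in chi_dl.items():
        if x == 1 and t in fov_blocks:
            total += rate[(t, d)] / rate[(t, top_level)]
    return total


def total_utility(chi_dl: Dict[Tuple[int, int, int], int],
                  rate: Dict[Tuple[int, int], float],
                  fov_per_client: Iterable[Set[int]],
                  top_level: int) -> float:
    """Total utility: sum of the per-client QoE proxies over all N clients."""
    return sum(qoe_proxy(chi_dl, rate, fov, top_level) for fov in fov_per_client)


rate = {(1, 1): 2.0, (1, 2): 4.0, (2, 1): 1.0, (2, 2): 5.0}  # D = 2 levels
chi_dl = {(1, 2, 1): 1, (2, 1, 2): 1}  # block 1 at the top level, block 2 at level 1
print(total_utility(chi_dl, rate, [{1}, {1, 2}], top_level=2))
```

Keying χ by (t, d, m) tuples keeps the sparse 0-1 decision readable; a dense array indexed by t, d and m would serve equally well.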
[0062] The constraint conditions:
[0063] formula (2) indicates that each c-th camera can select an original video of only one bit rate level to upload to the server; in formula (2), χ.sub.c,e.sup.UL=1 means that the c-th camera uploads its original video at the e-th bit rate level to the server, and χ.sub.c,e.sup.UL=0 means that the c-th camera does not upload its original video at the e-th bit rate level;
[0064] formula (3) indicates that the total bit rate of the C transmitted videos must not exceed the total bandwidth of the uplink channel; in formula (3), BW.sup.UL represents the total bandwidth of the uplink channel;
[0065] formula (4) indicates that when any t-th video block is transmitted to the client through the downlink at the d-th quality level, only one modulation and coding scheme can be selected;
[0066] formula (5) indicates that when any t-th video block is transmitted to the client through the downlink with the m-th modulation and coding scheme, only one bit rate level can be selected for that video block;
[0067] formula (6) indicates that the total bit rate of all transmitted video blocks does not exceed the bit rate that all resource blocks in the downlink channel can provide; in formula (6), R.sub.m.sup.DL indicates the bit rate that a single resource block can provide when the m-th modulation and coding scheme is selected, and Y.sup.DL represents the total number of resource blocks in the downlink channel;
[0068] formula (7) indicates that the bit rate of any t-th video block in the downlink of the network environment is not greater than the bit rate of the original video taken by any c-th camera and transmitted in the uplink.
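The prose descriptions of constraints (2), (3) and (6) can be turned into a feasibility check for a candidate 0-1 decision. This is a minimal Python sketch under stated assumptions: the uplink bit rates `ul_rate[(c, e)]` are a hypothetical input (the text names no symbol for them), and formula (6) is assumed to charge ⌈λ.sub.t,d.sup.DL/R.sub.m.sup.DL⌉ resource blocks per transmitted block against the budget Y.sup.DL. Constraints (4), (5) and (7) are left out because their exact form is not given in the description; all function names are illustrative.

```python
import math
from typing import Dict, Iterable, Tuple


def uplink_feasible(chi_ul: Dict[Tuple[int, int], int],
                    ul_rate: Dict[Tuple[int, int], float],
                    cameras: Iterable[int], levels: Iterable[int],
                    bw_ul: float) -> bool:
    level_list = list(levels)
    # Formula (2): each camera uploads exactly one bit rate level.
    for c in cameras:
        if sum(chi_ul.get((c, e), 0) for e in level_list) != 1:
            return False
    # Formula (3): the total uploaded bit rate must fit in the uplink bandwidth BW^UL.
    used = sum(ul_rate[key] for key, x in chi_ul.items() if x == 1)
    return used <= bw_ul


def resource_blocks_needed(chi_dl: Dict[Tuple[int, int, int], int],
                           dl_rate: Dict[Tuple[int, int], float],
                           rb_rate: Dict[int, float]) -> int:
    # Assumed form of formula (6): a block at rate lambda_{t,d} sent with MCS m
    # occupies ceil(lambda_{t,d} / R_m) resource blocks.
    return sum(math.ceil(dl_rate[(t, d)] / rb_rate[m])
               for (t, d, m), x in chi_dl.items() if x == 1)


def downlink_feasible(chi_dl: Dict[Tuple[int, int, int], int],
                      dl_rate: Dict[Tuple[int, int], float],
                      rb_rate: Dict[int, float], y_dl: int) -> bool:
    # Formula (6): the resource blocks used must not exceed Y^DL.
    return resource_blocks_needed(chi_dl, dl_rate, rb_rate) <= y_dl


ul_rate = {(1, 1): 10.0, (1, 2): 20.0, (2, 1): 10.0, (2, 2): 20.0}
print(uplink_feasible({(1, 2): 1, (2, 1): 1}, ul_rate, [1, 2], [1, 2], bw_ul=30.0))  # True
```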
[0069] step 3, solving the adaptive acquisition and transmission model for VR video with KKT conditions and a hybrid branch and bound method to obtain an uplink collecting decision variable and a downlink transmitting decision variable in the network environment;
[0070] step 3.1, performing a relaxation operation on the collecting decision variable χ.sub.c,e.sup.UL and the transmitting decision variable χ.sub.t,d,m.sup.DL of the adaptive acquisition and transmission model for VR video, and obtaining a continuous collecting decision variable and a continuous transmitting decision variable within a scope of [0,1], respectively;
[0071] step 3.2, according to the constraint conditions of formulas (2)-(7), denoting the equality constraint of formula (2) as a function h.sub.1(χ.sub.c,e.sup.UL); denoting the equality constraint of formula (4) as a function h.sub.2(χ.sub.t,d,m.sup.DL); denoting the equality constraint of formula (5) as a function h.sub.3(χ.sub.t,d,m.sup.DL); denoting the inequality constraint of formula (3) as a function g.sub.1(χ.sub.c,e.sup.UL); denoting the inequality constraint of formula (6) as a function g.sub.2(χ.sub.t,d,m.sup.DL); denoting the inequality constraint of formula (7) as a function g.sub.3(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL); and calculating the Lagrangian function L(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL,λ,μ) of the relaxed adaptive acquisition and transmission model for VR video with formula (8) as:
[0072] in formula (8), λ represents the Lagrangian coefficients of the equality constraint conditions in formulas (2)-(7), μ represents the Lagrangian coefficients of the inequality constraint conditions in formulas (2)-(7), λ.sub.1, λ.sub.2 and λ.sub.3 are the Lagrangian coefficients of the functions h.sub.1(χ.sub.c,e.sup.UL), h.sub.2(χ.sub.t,d,m.sup.DL) and h.sub.3(χ.sub.t,d,m.sup.DL), respectively, μ.sub.1, μ.sub.2 and μ.sub.3 are the Lagrangian coefficients of the functions g.sub.1(χ.sub.c,e.sup.UL), g.sub.2(χ.sub.t,d,m.sup.DL) and g.sub.3(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL), respectively, and QoE.sub.n represents the quality of experience of the n-th client, where:
[0073] step 3.3, obtaining the KKT conditions of the relaxed adaptive acquisition and transmission model for VR video, shown in formulas (10)-(15) below, from the Lagrangian function L(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL,λ,μ) of formula (8):
[0074] Formulas (10) and (11) are the necessary (stationarity) conditions for an extremum of the Lagrangian function L(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL,λ,μ); formulas (12) and (13) are the constraint conditions on the functions h.sub.1(χ.sub.c,e.sup.UL), h.sub.2(χ.sub.t,d,m.sup.DL), h.sub.3(χ.sub.t,d,m.sup.DL), g.sub.1(χ.sub.c,e.sup.UL), g.sub.2(χ.sub.t,d,m.sup.DL) and g.sub.3(χ.sub.c,e.sup.UL,χ.sub.t,d,m.sup.DL); formula (14) is the constraint condition on the Lagrangian coefficients λ.sub.1, λ.sub.2, λ.sub.3, μ.sub.1, μ.sub.2 and μ.sub.3; and formula (15) is the complementary slackness condition.
[0075] Solving formulas (10)-(15) yields an optimal solution χ.sub.relax and an optimal total utility value Z.sub.relax of the relaxed adaptive acquisition and transmission model for VR video, where χ.sub.relax includes the relaxed optimal values of the collecting decision variable χ.sub.c,e.sup.UL and the transmitting decision variable χ.sub.t,d,m.sup.DL;
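To make the role of formulas (10)-(15) concrete, the Python sketch below checks the KKT conditions (primal feasibility, non-negative inequality multipliers, stationarity, complementary slackness) at a candidate point. It assumes a linear objective and linear constraints, a simplification of the relaxed VR model; the toy instance (one camera choosing between two bit rate levels worth 3 and 2 units of utility, with the 0-1 variables relaxed to [0,1]) and all function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def kkt_satisfied(c: Vector, a_eq: Matrix, b_eq: Vector,
                  a_in: Matrix, b_in: Vector,
                  x: Vector, lam: Vector, mu: Vector, tol: float = 1e-9) -> bool:
    """KKT check for: maximize c.x  s.t.  a_eq x = b_eq,  a_in x <= b_in."""
    # Primal feasibility of the equality and inequality constraints.
    if any(abs(dot(row, x) - b) > tol for row, b in zip(a_eq, b_eq)):
        return False
    if any(dot(row, x) - b > tol for row, b in zip(a_in, b_in)):
        return False
    # Dual feasibility: inequality multipliers must be non-negative.
    if any(m < -tol for m in mu):
        return False
    # Stationarity: grad f - sum(lam_i * grad h_i) - sum(mu_j * grad g_j) = 0.
    for j in range(len(x)):
        grad = (c[j]
                - sum(lm * row[j] for lm, row in zip(lam, a_eq))
                - sum(m * row[j] for m, row in zip(mu, a_in)))
        if abs(grad) > tol:
            return False
    # Complementary slackness: mu_j * (b_j - a_j.x) = 0 for every inequality.
    return all(abs(m * (b - dot(row, x))) <= tol for m, row, b in zip(mu, a_in, b_in))


# Toy relaxed problem: maximize 3*x1 + 2*x2 s.t. x1 + x2 = 1 and 0 <= x1, x2 <= 1.
c = [3.0, 2.0]
a_eq, b_eq = [[1.0, 1.0]], [1.0]
a_in = [[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
b_in = [0.0, 0.0, 1.0, 1.0]
print(kkt_satisfied(c, a_eq, b_eq, a_in, b_in, [1.0, 0.0], [2.0], [0.0, 0.0, 1.0, 0.0]))  # True
```

For a linear program the KKT conditions are also sufficient, so a point that passes this check is optimal for the relaxed problem.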
[0076] step 3.4, using the optimal solution χ.sub.relax and the optimal total utility value Z.sub.relax as initial input parameters of the hybrid branch and bound method;
[0077] step 3.5, defining the number of branches in the algorithm as k, defining a lower bound of the optimal total utility value in the algorithm as L, and defining an upper bound of the optimal total utility value in the algorithm as U;
[0078] defining the output parameters of the hybrid branch and bound method:
[0079] let χ.sub.0-1 denote an optimal solution of a non-relaxed adaptive acquisition and transmission model for VR video;
[0080] let Z.sub.0-1 denote an optimal total utility value of the non-relaxed adaptive acquisition and transmission model for VR video;
[0081] step 3.6, initializing k=0;
[0082] step 3.7, initializing L=0;
[0083] step 3.8, initializing U=Z.sub.relax;
[0084] step 3.9, denoting an optimal solution of the k-th branch as χ.sub.k and denoting a corresponding optimal total utility value as Z.sub.k, assigning a value of χ.sub.relax to χ.sub.k, and using the optimal solution χ.sub.k of the k-th branch as a root node;
[0085] step 3.10, determining whether χ.sub.k contains a component that does not meet the 0-1 constraint condition; if so, dividing the relaxed optimal solution χ.sub.k into components that meet the 0-1 constraint condition and a component χ.sub.k(0,1) that does not, and going to step 3.11; otherwise, taking χ.sub.k as the optimal solution of the non-relaxed adaptive acquisition and transmission model for VR video;
[0086] step 3.11, randomly generating a number ε.sub.k for the k-th branch within the range (0,1), and determining whether 0<χ.sub.k(0,1)<ε.sub.k holds; if so, adding the constraint condition “χ.sub.k(0,1)=0” to the non-relaxed adaptive acquisition and transmission model for VR video to form sub-branch I of the k-th branch; otherwise, adding the constraint condition “χ.sub.k(0,1)=1” to the non-relaxed adaptive acquisition and transmission model for VR video to form sub-branch II of the k-th branch;
[0087] step 3.12, solving the relaxed solutions of sub-branch I and sub-branch II of the k-th branch with the KKT conditions, and using them as the optimal solution χ.sub.k+1 and the optimal total utility value Z.sub.k+1 of the (k+1)-th branch, where χ.sub.k+1 includes the relaxed solutions of sub-branch I and sub-branch II;
[0088] step 3.13, determining whether the optimal solution χ.sub.k+1 of the (k+1)-th branch meets the 0-1 constraint condition; if so, that is, χ.sub.k+1∈{0,1}, assigning the maximum of the optimal total utility values Z.sub.k+1 to L; otherwise, that is, χ.sub.k+1∈(0,1), assigning the maximum of the optimal total utility values Z.sub.k+1 to U;
[0089] step 3.14, determining whether Z.sub.k+1<L is true; if so, cutting off the branch where the optimal solution χ.sub.k+1 of the (k+1)-th branch is located, assigning k+1 to k, and returning to step 3.10; otherwise, going to step 3.15;
[0090] step 3.15, determining whether Z.sub.k+1>L is true; if so, assigning k+1 to k, and returning to step 3.10; otherwise, going to step 3.16;
[0091] step 3.16, determining whether Z.sub.k+1=L holds; if so, the optimal solution of the non-relaxed adaptive acquisition and transmission model for VR video is the optimal solution χ.sub.k+1 of the (k+1)-th branch, so assigning χ.sub.k+1 to the optimal solution χ.sub.0-1 of the non-relaxed adaptive acquisition and transmission model for VR video, and assigning the corresponding Z.sub.k+1 to its optimal total utility value Z.sub.0-1; otherwise, assigning k+1 to k, and returning to step 3.10.
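Steps 3.4-3.16 follow the usual branch and bound pattern: a relaxed solution gives an upper bound, integral solutions raise the lower bound L, and branches whose bound cannot exceed L are cut off. The Python sketch below is a generic depth-first branch and bound on a single-constraint 0-1 problem (a knapsack, standing in for the uplink bandwidth constraint), not the exact hybrid procedure described here: the linear relaxation is solved greedily rather than through KKT conditions, and both children of every branch are kept, with the random threshold ε from step 3.11 only deciding which child is explored first. All names are hypothetical, and weights are assumed positive.

```python
import itertools
import random
from typing import Dict, List, Optional, Sequence, Tuple


def relaxed_bound(values: Sequence[float], weights: Sequence[float], cap: float,
                  fixed: Dict[int, int]) -> Tuple[float, Optional[int], float, List[int]]:
    """Greedy LP relaxation of a 0-1 knapsack with some variables fixed.

    Returns (bound, frac_index, frac_share, taken); frac_index is None when the
    relaxed optimum is already 0-1, and taken lists the free variables set to 1.
    """
    load = sum(weights[i] for i, x in fixed.items() if x == 1)
    if load > cap:
        return float("-inf"), None, 0.0, []  # infeasible branch
    value = float(sum(values[i] for i, x in fixed.items() if x == 1))
    room, taken = cap - load, []
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= room:  # item fits entirely
            room -= weights[i]
            value += values[i]
            taken.append(i)
        elif room > 0:  # first item that only partly fits stays fractional
            share = room / weights[i]
            return value + share * values[i], i, share, taken
    return value, None, 0.0, taken


def branch_and_bound(values: Sequence[float], weights: Sequence[float], cap: float,
                     seed: int = 0) -> Tuple[float, List[int]]:
    rng = random.Random(seed)
    n = len(values)
    best_value, best_x = 0.0, [0] * n  # incumbent lower bound L = 0
    stack: List[Dict[int, int]] = [{}]  # root node: nothing fixed
    while stack:
        fixed = stack.pop()
        bound, frac, share, taken = relaxed_bound(values, weights, cap, fixed)
        if bound <= best_value:  # cannot beat L: cut off this branch
            continue
        if frac is None:  # relaxed optimum is already 0-1: raise L
            x = [0] * n
            for i, v in fixed.items():
                x[i] = v
            for i in taken:
                x[i] = 1
            best_value, best_x = bound, x
            continue
        eps = rng.random()  # random threshold in [0, 1), modeled on step 3.11
        first = 0 if share < eps else 1  # child explored first
        stack.append({**fixed, frac: 1 - first})
        stack.append({**fixed, frac: first})  # pushed last, so popped next
    return best_value, best_x


def brute_force(values: Sequence[float], weights: Sequence[float], cap: float) -> float:
    """Exhaustive reference optimum for small instances."""
    best = 0.0
    for x in itertools.product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= cap:
            best = max(best, float(sum(v * xi for v, xi in zip(values, x))))
    return best


print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # (220.0, [0, 1, 1])
```

Because every branch whose relaxed bound exceeds the incumbent is eventually explored, the randomized ordering changes only the search path, not the returned optimal value.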
[0092] step 4, selecting, by the VR video server, an original video with the e-th bit rate level for the c-th camera according to the value of the uplink collecting decision variable χ.sub.c,e.sup.UL, and receiving the original video with the e-th bit rate level selected for the c-th camera and uploaded through the uplink, so that the VR video server receives the original videos with the corresponding bit rate levels selected for the C cameras, respectively;
[0093] step 5, performing, by the VR video server, a stitching and mapping process on the C original videos with the corresponding bit rate levels to synthesize a complete VR video;
[0094] step 6, performing, by the VR video server, a segmentation process on the complete VR video to obtain T video blocks, denoted as {T.sub.1, T.sub.2, . . . , T.sub.t, . . . , T.sub.T}, where T.sub.t represents any t-th video block, and 1≤t≤T;
[0095] the VR video server provides D bit rate options for compressing the t-th video block T.sub.t, thereby obtaining compressed video blocks with D different bit rate levels, denoted as {T.sub.t.sup.1, T.sub.t.sup.2, . . . , T.sub.t.sup.d, . . . , T.sub.t.sup.D}, where T.sub.t.sup.d represents a compressed video block with the d-th bit rate level obtained after the t-th video block T.sub.t is compressed, where 1≤d≤D.
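Which of the T blocks fall into T.sub.FoV.sup.n depends on the client's viewing angle reported over the feedback channel. The following Python sketch assumes an equirectangular frame split into a rows × cols grid (block index t = row × cols + col, counted from 0 rather than 1) and treats the viewport as a yaw/pitch rectangle, ignoring the distortion of the sphere near the poles. The tiling layout and the function `fov_tiles` are illustrative assumptions; the text does not specify how the segmentation is laid out.

```python
from typing import List


def fov_tiles(yaw: float, pitch: float, hfov: float, vfov: float,
              rows: int, cols: int) -> List[int]:
    """Blocks of an assumed equirectangular rows x cols grid that overlap the viewport.

    Yaw spans [-180, 180) degrees left to right; pitch spans [-90, 90] with row 0 on top.
    """
    cw, rh = 360.0 / cols, 180.0 / rows
    y0, y1 = yaw - hfov / 2, yaw + hfov / 2
    p0, p1 = max(pitch - vfov / 2, -90.0), min(pitch + vfov / 2, 90.0)
    tiles = []
    for r in range(rows):
        lo_p, hi_p = 90.0 - (r + 1) * rh, 90.0 - r * rh
        if not (p0 < hi_p and p1 > lo_p):  # row does not overlap the pitch range
            continue
        for c in range(cols):
            lo_y = -180.0 + c * cw
            hi_y = lo_y + cw
            # Test the column shifted by -360, 0 and +360 degrees to handle yaw wrap-around.
            if any(y0 < hi_y + s and y1 > lo_y + s for s in (-360.0, 0.0, 360.0)):
                tiles.append(r * cols + c)
    return tiles


print(fov_tiles(0.0, 0.0, 90.0, 90.0, rows=4, cols=8))    # [11, 12, 19, 20]
print(fov_tiles(180.0, 0.0, 90.0, 90.0, rows=4, cols=8))  # [8, 15, 16, 23]
```

The second call shows a viewport centred on the seam of the equirectangular frame picking up blocks from both the first and the last column.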
[0096] step 7, assuming that the set of modulation and coding schemes available in the network environment is {M.sub.1, M.sub.2, . . . , M.sub.m, . . . , M.sub.M}, where M.sub.m represents the m-th modulation and coding scheme, and 1≤m≤M; and selecting, by the VR video server, the m-th modulation and coding scheme for the t-th video block T.sub.t; and
[0097] selecting, by the VR video server, the compressed video block T.sub.t.sup.d with the d-th bit rate level of the t-th video block T.sub.t for any n-th client according to a value of the downlink transmitting decision variable χ.sub.t,d,m.sup.DL, and transmitting the selected compressed video block T.sub.t.sup.d with the d-th bit rate level of the t-th video block T.sub.t to the n-th client through the downlink with the m-th modulation and coding scheme; so that the n-th client receives compressed video blocks with corresponding bit rate levels of T video blocks through the corresponding modulation and coding scheme;
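Steps 4 and 7 read the selections directly off the 0-1 decision variables. A minimal Python sketch, assuming χ.sub.c,e.sup.UL and χ.sub.t,d,m.sup.DL are stored as dictionaries keyed by (c, e) and (t, d, m), and assuming a valid solution selects one level per camera and one (d, m) pair per block; the function names are hypothetical.

```python
from typing import Dict, Tuple


def decode_uplink(chi_ul: Dict[Tuple[int, int], int]) -> Dict[int, int]:
    """Map each camera c to the level e with chi_ul[(c, e)] == 1 (step 4)."""
    choice: Dict[int, int] = {}
    for (c, e), x in sorted(chi_ul.items()):
        if x == 1:
            if c in choice:
                raise ValueError(f"camera {c} has more than one selected level")
            choice[c] = e
    return choice


def decode_downlink(chi_dl: Dict[Tuple[int, int, int], int]) -> Dict[int, Tuple[int, int]]:
    """Map each block t to the (level d, MCS m) with chi_dl[(t, d, m)] == 1 (step 7)."""
    plan: Dict[int, Tuple[int, int]] = {}
    for (t, d, m), x in sorted(chi_dl.items()):
        if x == 1:
            if t in plan:
                raise ValueError(f"block {t} has more than one (level, MCS) selection")
            plan[t] = (d, m)
    return plan


print(decode_downlink({(1, 2, 1): 1, (2, 3, 2): 1}))  # {1: (2, 1), 2: (3, 2)}
```

Raising on a duplicate selection surfaces a solution that violates the one-choice constraints instead of silently keeping one of the conflicting choices.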
[0098] step 8, performing, by the n-th client, decoding, mapping, and rendering processes on the received compressed video blocks with the corresponding bit rate levels of the T video blocks, so as to synthesize a QoE-optimized VR video.