Video streaming method and system
11483368 · 2022-10-25
Assignee
Inventors
CPC classification
H04L65/65
ELECTRICITY
H04N21/4666
ELECTRICITY
H04N21/8456
ELECTRICITY
H04N21/6587
ELECTRICITY
H04N21/44209
ELECTRICITY
H04N21/4621
ELECTRICITY
International classification
G06F13/00
PHYSICS
H04N21/462
ELECTRICITY
Abstract
A method for streaming a video. The method includes determining a total bitrate for a segment of a video to be received and streamed; predicting a viewpoint of a user for the segment; and determining bitrates for one or more tiles in the segment based on the determined total bitrate and the predicted viewpoint.
Claims
1. A method for streaming a video, comprising: (a) determining, based on adaptive optimization of a quality of experience function, a total bitrate for a segment of the video to be received and streamed, the adaptive optimization of the quality of experience function being based on a deep deterministic policy gradient algorithm; (b) predicting a viewpoint of a user for the segment; and (c) determining bitrates for one or more tiles of a plurality of tiles in the segment based on the determined total bitrate and the predicted viewpoint.
2. The method of claim 1, wherein step (a) and/or step (b) is performed after all tiles in a previous segment of the video have been received.
3. The method of claim 1, wherein the deep deterministic policy gradient algorithm is arranged to process playback states of the user to determine the total bitrate of the segment with an objective to optimize or maximize the quality of experience function.
4. The method of claim 3, wherein the playback states include: past K bitrate records, corresponding download time, predetermined tile bitrate set for the segment, current buffer length, and current freezing length, wherein K is a number larger than 0.
5. The method of claim 3, wherein the quality of experience function includes the following factors: factor associated with visual quality of viewed tiles in the previous segment, factor associated with average quality variation between two segments, playback freezing event factor, and future freezing risk factor.
6. The method of claim 5, wherein one or more of the factors of the quality of experience function is weighted by a respective weighting factor.
7. The method of claim 6, wherein the respective weighting factor is adjustable based on a user input and/or a content of the segment or the video.
8. The method of claim 1, wherein step (b) includes predicting a single-user viewpoint trace for the segment based on a received viewpoint trace of the user.
9. The method of claim 8, wherein the received viewpoint trace of the user comprises a head movement trace and an eye fixation trace.
10. The method of claim 9, wherein step (b) includes predicting a head movement area of the user for the segment and predicting an eye fixation area of the user for the segment.
11. The method of claim 10, wherein the prediction of the single-user viewpoint trace for the segment is performed by processing the received viewpoint trace of the user using a long short term memory model.
12. The method of claim 10, wherein step (c) comprises determining bitrates for all of the tiles in the segment.
13. The method of claim 12, wherein determining the bitrates for all of the tiles in the segment comprises allocating bitrate to each of the tiles such that a sum of the bitrates of all of the tiles in the segment is substantially equal to the total bitrate.
14. The method of claim 13, the determining comprises: allocating a lower bitrate to the tiles in an area outside the predicted head movement area for the segment; and allocating higher bitrate to the tiles in an area inside the predicted head movement area for the segment.
15. The method of claim 14, wherein allocating lower bitrate to the tiles in the area outside the predicted head movement area for the segment comprises: allocating minimum bitrate to the tiles in the area outside the predicted head movement area for the segment.
16. The method of claim 14, wherein allocating higher bitrate to the area inside the predicted head movement area for the segment, comprises: allocating lower bitrate to the tiles in an area outside the predicted eye fixation area for the segment; and allocating higher bitrate to the tiles in an area inside the predicted eye fixation area for the segment.
17. The method of claim 14, wherein the determining comprises: controlling bitrates between adjacent tiles of the segment to be within a difference threshold.
18. The method of claim 8, wherein step (b) further includes predicting a cross-user viewpoint trace for the segment.
19. The method of claim 18, wherein the prediction of the cross-user viewpoint trace for the segment is performed based on a saliency map of known cross-user viewpoint traces associated with the segment.
20. The method of claim 19, wherein the predicted single-user viewpoint trace for the segment and the predicted cross-user viewpoint trace for the segment are both applied to predict the viewpoint of the user for the segment.
21. The method of claim 20, wherein the predicted single-user viewpoint trace for the segment is weighted with a first weighting factor and the predicted cross-user viewpoint trace for the segment is weighted with a second weighting factor.
22. The method of claim 1, further comprising: transmitting a request to receive the tiles of the segment in accordance with the determined bitrates for the tiles.
23. The method of claim 22, wherein the request includes an order indicator to receive the tiles of the segment in the order of decreasing bitrates.
24. The method of claim 1, further comprising repeating steps (a) to (c) for two or more segments of the video so as to stream at least part of the video.
25. The method of claim 1, wherein the video is a 360-degree video.
26. A system for streaming a video, comprising: one or more controllers arranged to: determine, based on adaptive optimization of a quality of experience function, a total bitrate for a segment of the video to be received and streamed, the adaptive optimization of the quality of experience function being based on a deep deterministic policy gradient algorithm; predict a viewpoint of a user for the segment; and determine bitrates for one or more tiles of a plurality of tiles in the segment based on the determined total bitrate and the predicted viewpoint.
27. The system of claim 26, further comprising a display operably connected with the one or more controllers for presenting the received segment or the video to the user.
28. The system of claim 26, wherein the one or more controllers is arranged in an electrical device operably connected with a streaming server via a communication network.
29. The system of claim 28, wherein the electrical device comprises a head mounted display device.
30. The system of claim 28, wherein the streaming server comprises a Dynamic Adaptive Streaming over HTTP (DASH) server.
31. The system of claim 26, wherein the video is a 360-degree video.
32. The system of claim 26, wherein the one or more controllers are arranged to process playback states of the user using the deep deterministic policy gradient algorithm to determine the total bitrate of the segment with an objective to optimize or maximize the quality of experience function.
33. The system of claim 32, wherein the playback states include: past K bitrate records, corresponding download time, predetermined tile bitrate set for the segment, current buffer length, and current freezing length, wherein K is a number larger than 0.
34. The system of claim 32, wherein the quality of experience function includes the following factors: factor associated with visual quality of viewed tiles in a previous segment, factor associated with average quality variation between two segments, playback freezing event factor, and future freezing risk factor.
35. The system of claim 34, wherein one or more of the factors of the quality of experience function is weighted by a respective weighting factor.
36. The system of claim 26, wherein the one or more controllers are arranged to process a received viewpoint trace of the user using a long short term memory model to predict a single-user viewpoint trace for the segment.
37. The system of claim 26, wherein the one or more controllers are arranged to: predict a single-user viewpoint trace for the segment based on a received viewpoint trace of the user; and predict a cross-user viewpoint trace for the segment based on a saliency map of known cross-user viewpoint traces associated with the segment.
38. The system of claim 37, wherein the one or more controllers are arranged to apply both the predicted single-user viewpoint trace for the segment and the predicted cross-user viewpoint trace for the segment to predict the viewpoint of the user for the segment.
39. The system of claim 38, wherein the predicted single-user viewpoint trace for the segment is weighted with a first weighting factor and the predicted cross-user viewpoint trace for the segment is weighted with a second weighting factor.
40. The system of claim 26, wherein the one or more controllers are arranged to: predict a single-user viewpoint trace for the segment based on a received viewpoint trace of the user, the received viewpoint trace of the user comprises a head movement trace and an eye fixation trace; predict a head movement area of the user for the segment; predict an eye fixation area of the user for the segment; and determine bitrates for all of the tiles in the segment.
41. The system of claim 40, wherein the one or more controllers are arranged to allocate bitrate to each of the tiles such that a sum of the bitrates of all of the tiles in the segment is substantially equal to the total bitrate.
42. The system of claim 41, wherein the one or more controllers are arranged to: allocate lower bitrate to the tiles in an area outside the predicted head movement area for the segment; and allocate higher bitrate to the tiles in an area inside the predicted head movement area for the segment.
43. The system of claim 42, wherein the one or more controllers are arranged to: allocate minimum bitrate to the tiles in the area outside the predicted head movement area for the segment.
44. The system of claim 42, wherein the one or more controllers are arranged to: allocate lower bitrate to the tiles in an area outside the predicted eye fixation area for the segment; and allocate higher bitrate to the tiles in an area inside the predicted eye fixation area for the segment.
45. The system of claim 42, wherein the one or more controllers are arranged to control bitrates between adjacent tiles of the segment to be within a difference threshold.
46. A method for streaming a video, comprising: (a) determining a total bitrate for a segment of the video to be received and streamed; (b) predicting a viewpoint of a user for the segment; and (c) determining bitrates for one or more tiles in the segment based on the determined total bitrate and the predicted viewpoint; wherein step (b) includes: predicting a single-user viewpoint trace for the segment based on a received viewpoint trace of the user, the received viewpoint trace of the user comprises a head movement trace and an eye fixation trace; and predicting head movement area of the user for the segment and predicting eye fixation area of the user for the segment; wherein the prediction of the single-user viewpoint trace for the segment is performed by processing the received viewpoint trace of the user using a long short term memory model.
47. A method for streaming a video, comprising: (a) determining a total bitrate for a segment of the video to be received and streamed; (b) predicting a viewpoint of a user for the segment; and (c) determining bitrates for one or more tiles in the segment based on the determined total bitrate and the predicted viewpoint; wherein step (b) includes: predicting a single-user viewpoint trace for the segment based on a received viewpoint trace of the user, and predicting a cross-user viewpoint trace for the segment based on a saliency map of known cross-user viewpoint traces associated with the segment.
48. The method of claim 47, wherein the predicted single-user viewpoint trace for the segment and the predicted cross-user viewpoint trace for the segment are both applied to predict the viewpoint of the user for the segment.
49. A method for streaming a video, comprising: (a) determining a total bitrate for a segment of the video to be received and streamed; (b) predicting a viewpoint of a user for the segment; and (c) determining bitrates for one or more tiles of a plurality of tiles in the segment based on the determined total bitrate and the predicted viewpoint; wherein step (b) includes: predicting a single-user viewpoint trace for the segment based on a received viewpoint trace of the user, the received viewpoint trace of the user comprises a head movement trace and an eye fixation trace; and predicting head movement area of the user for the segment and predicting eye fixation area of the user for the segment; and wherein step (c) comprises determining bitrates for all of the tiles in the segment.
50. The method of claim 49, wherein determining bitrates for all of the tiles in the segment comprises allocating bitrate to each of the tiles such that a sum of the bitrates of all of the tiles in the segment is substantially equal to the total bitrate.
51. The method of claim 49, the determining comprises: allocating lower bitrate to the tiles in an area outside the predicted head movement area for the segment; and allocating higher bitrate to the tiles in an area inside the predicted head movement area for the segment.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings in which:
DETAILED DESCRIPTION
(30) The inventors of the present invention have devised, through research, experiments, and trials, that tile-based coding schemes or layer-based coding schemes can both be used in viewpoint-based streaming strategies to allocate different bitrates to the contents of different video regions. Compared with layer-based coding schemes, tile-based coding schemes are usually easier to use and less complex.
(31) The inventors of the present invention have determined, through research, experiments, and trials, that tile-based ABR control can be regarded as a Markov decision process (MDP), which can be addressed through reinforcement learning (RL). In RL methods, appropriate actions, such as ABR control and viewpoint prediction, are adaptively taken to maximize a given QoE reward. In addition, bitrate allocation for each tile can benefit from game theory techniques, which assist in taking full advantage of the limited network bandwidth and in improving the user's QoE.
(32) The inventors of the present invention have realized that existing ABR algorithm designs have achieved some QoE gains but are still subject to challenges and shortcomings. First, the main characteristic of a mobile network is high variability in both link conditions and traffic characteristics. This inherent complexity of mobile networks has been overlooked in the design of existing ABR algorithms, making them not particularly suitable for 360-degree video streaming systems over mobile devices. Second, suitable metrics for QoE over mobile devices have not been fully considered in existing RL-based ABR methods. In practice, the video quality and its fluctuations directly affect the user QoE; however, the video bitrate cannot directly affect the video quality in Dynamic Adaptive Streaming over HTTP (DASH) QoE modelling. Playback freezing is another factor that has not been considered in the design of existing ABR algorithms. Finally, in addition to the application of ABR algorithms, 360-degree video streaming systems require interaction between the server and the clients since the user's current viewpoint information must be considered. Therefore, for regional quality control, it is necessary to consider a user's future viewpoint, meaning that 360-degree video streaming systems require viewpoint prediction capabilities. The inventors of the present invention have realized that viewpoint prediction relies on historical trace data or video content analysis; cross-user viewpoint prediction could further improve the prediction accuracy by allowing the viewpoint trajectories of multiple viewers to be correlated. Viewing behavior is correlated with the video content, and such correlation may help to predict future viewpoints. However, most existing studies have not focused on combining historical traces with video content information, thus preventing tile-level bitrate allocation from reaching the global optimum.
(33) Against this background, the inventors of the present invention have devised, in one embodiment, a joint RL and game theory method for segment-level continuous bitrate selection and tile-level bitrate allocation for 360-degree streaming. In this embodiment, the tile-based 360-degree video sequences are stored on the DASH server. The head-mounted device (HMD) sends the playback state and the user's viewpoint traces to the controller, which then estimates the total requested quality/bitrate level for the upcoming segment and allocates the corresponding bitrate for every tile in this segment.
(34) The inventors of the present invention have devised, through research, experiments, and trials, that streaming systems generally consist of streaming data, codecs, and players. Given the smoothness requirement for successive segments, fluctuating mobile network conditions, playback freezing avoidance, and other relevant factors, adaptive bitrate streaming (ABS) techniques have become a new technological trend for providing smooth and high-quality streaming. Some existing ABS systems can provide consumers with relatively high-quality videos while using less manpower and fewer resources, and thus have become predominant systems among video delivery systems. When an ABS system is working properly, end users can enjoy high-quality video playback without notable interruption. Among the various ABS strategies, DASH is often used because of its convenience and ease of system construction. In an ABS system, the original videos are divided into “segments” of a fixed playback length, each of which is encoded into multiple representations with different resolutions, bitrates, and qualities; then, ABS clients request segments with the appropriate representations in accordance with the playback state and the expected network conditions. First, because the DASH system is built on top of HTTP, the video packets encounter no difficulties passing through firewalls or network address translation (NAT) devices. Second, the DASH ABR decisions are mainly client driven; thus, all of the ABR logic resides on the client side, and playback does not require a persistent connection between the server and the client. Furthermore, the server is not required to maintain session state information for each client, thereby increasing scalability. Also, because the DASH system transmits its video data over HTTP, it can be easily and seamlessly deployed on and adapted to all existing HTTP facilities, including HTTP caches, servers, and scalable content delivery networks (CDNs).
At present, demand for high-definition (HD) videos and ultra-high-definition (UHD) videos is continuing to increase. To enhance the performance of 360-degree streaming systems, the concepts of tiles and slices are used in High Efficiency Video Coding (HEVC) to split video frames; consequently, tile-based strategies, as shown in the accompanying drawings, have been developed.
(35) The inventors of the present invention have devised, through research, experiments, and trials, that various bitrate allocation strategies have been developed for 360-degree streaming systems to improve streaming performance. For example, C. L. Fan et al., “Fixation prediction for 360° video streaming in head-mounted virtual reality,” in Proceedings of the 27th Workshop on Network and Operating Systems Support for Digital Audio and Video, 2017, has disclosed allocating more bitrate to the tiles inside the viewpoint area, which can be predicted with the help of the previous viewpoint or frame features. The streaming system disclosed in F. Qian et al., “Flare: Practical viewport-adaptive 360-degree video streaming for mobile devices,” in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 2018, is designed to combine adaptive QoE optimization models with new transmission protocols for efficient streaming. The streaming system ‘EPASS’ disclosed in Y. Zhang et al., “Epass360: QoE-Aware 360-degree video streaming over mobile devices,” IEEE Transactions on Mobile Computing, pp. 1-1, 2020, concentrates on the common scenarios in real-world 360-degree video streaming services in which users may specify various preferences for QoE metrics. In tile-based 360-degree video streaming schemes, the viewpoint prediction method is critical for lowering the bandwidth cost of unnecessary tiles. Any prediction error may lead to wasted bandwidth and reduced user QoE. The process of viewpoint prediction can be divided into two parts: head movement (HM) area prediction and eye fixation (EF) area prediction. HM area prediction determines the view region to be seen in one frame, and EF area prediction indicates the region of highest interest to humans within the view region. Accordingly, the HM areas should be predicted first to identify which tiles will be viewed by users.
Then, given the predicted HM overlap area, the predicted EF area within the view region can be further estimated to help determine the bitrates of the tiles. For illustration, the HM and EF overlap areas in one segment are shown in the accompanying drawings.
(37) The 360-degree videos used for testing in this embodiment are encoded in the commonly used equirectangular projection format, and the 360-degree streaming performance is evaluated under dynamic viewpoint changes and bandwidth fluctuations. Accurate bitrate allocation for every tile in a segment enables high-quality video playback and avoids playback freezing. Meanwhile, as the long short-term dependency greatly influences the prediction task, an LSTM network is applied in this embodiment to model the RL-based segment-level bitrate selection process. The LSTM model is particularly effective for online time-series prediction over multiple sources and tasks.
(38) In the tile-level bitrate allocation method of this embodiment, to address the case in which the user may occasionally exhibit an orientation inconsistency with the predicted HM area as a result of prediction error, the tiles outside the HM area are first allocated a fixed minimum bitrate. Thus, there will always be video content within the user's view region to ensure a certain user QoE during playback. Then, the tiles inside the HM area are allocated different selected rates depending on the predicted EF area, where these bitrates are chosen in accordance with the interest level of the tiles in the predicted EF area with the help of game theory methods. Furthermore, once the bitrates of all tiles in one segment are determined, in the method of this embodiment, the tiles with the maximum bitrate are requested first to ensure a smooth and high-quality playback experience. The bitrate difference between adjacent tiles is constrained not to exceed a certain level to avoid distinct borders observable by the user. Following the above rules, a suitable bitrate is selected for every tile inside and outside the view area, as shown in the accompanying drawings.
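The allocation rules above can be sketched in code. This is a simplified illustration, not the claimed game-theoretic procedure: the double interest weighting for EF tiles is an assumption, and the adjacent-tile difference constraint is omitted for brevity.

```python
def allocate_tile_bitrates(n_tiles, hm_tiles, ef_tiles, total_bitrate, min_rate=1.0):
    """Allocate a bitrate to every tile: the minimum rate outside the
    predicted HM area, more inside it, and the most inside the predicted
    EF area, so that the rates sum to total_bitrate.

    Assumes hm_tiles is non-empty and total_bitrate >= min_rate * n_tiles.
    """
    rates = [min_rate] * n_tiles                 # fixed minimum outside the HM area
    budget = total_bitrate - min_rate * n_tiles
    # Hypothetical interest weighting: EF tiles count double vs. HM-only tiles.
    weights = {j: (2.0 if j in ef_tiles else 1.0) for j in hm_tiles}
    total_w = sum(weights.values())
    for j, w in weights.items():
        rates[j] += budget * w / total_w
    return rates

def request_order(rates):
    """Request tiles in order of decreasing bitrate, per the scheme above."""
    return sorted(range(len(rates)), key=lambda j: -rates[j])
```

With 6 tiles, HM tiles {1, 2, 3}, EF tile {2}, and an 18 Mbps total, the EF tile receives the largest share, the HM-only tiles a medium share, and the remaining tiles the minimum, while the shares always sum to the total.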
(39) Training a Reinforcement Learning (RL) Model with a Freezing-Aware QoE Model
(40) In one embodiment, a freezing-aware QoE model with the following features is formulated to evaluate the method of the present embodiment. In the QoE model, Seg.sub.i denotes segment i, where i is the video segment index. Tile.sub.i,j,k is the tile representation of segment Seg.sub.i, where j is the jth tile of video segment i and k is the quality/bitrate level index for video segment i. rate(Tile.sub.i,j,k) is the corresponding video bitrate, and u(Tile.sub.i,j,k) is the video quality measured in terms of the weighted-to-spherically-uniform peak signal-to-noise ratio (WS-PSNR). The ABR decision is made after all tiles in one segment have been fully downloaded to choose a suitable video representation {rate(Tile.sub.i,j,k),u(Tile.sub.i,j,k)} for the next segment from the available set. For each segment Seg.sub.i, the download time τ.sub.i is calculated as follows:
(41) τ.sub.i=(Σ.sub.j=1.sup.N rate(Tile.sub.i,j,k)·T)/C.sub.i, (1)
where Σ.sub.j=1.sup.N rate(Tile.sub.i,j,k) is the total bitrate of the tiles in segment Seg.sub.i, also called the segment-level bitrate, which is selected from {rate(Tile.sub.i,j,k)|k=1, 2, . . . , L}; T is the playback duration of segment i; and C.sub.i is the average network capacity during the download of segment i.
(42) Whereas Σ.sub.j=1.sup.N (rate(Tile.sub.i,j,k)) is the total bitrate of the tiles in Seg.sub.i, for every frame in Seg.sub.i, only the HM area can be seen by the user. Therefore, x.sub.j∈{0,1} is introduced to represent whether Tile.sub.i,j,k is in the HM area, and the visual quality of the viewed tiles can be formulated as
(43) Quality.sub.i=Σ.sub.j=1.sup.N x.sub.ju(Tile.sub.i,j,k), (2)
where x.sub.j∈{0,1} represents whether the tile is inside the view area, with x.sub.j=1 meaning that the tile Tile.sub.i,j,k is in the HM area; otherwise, x.sub.j=0. Furthermore, the average quality variation between two segments can be calculated as
(44)
where inVAR is the average intra-segment quality variation, i.e., the quality variation within the user's view. Let B(Seg.sub.i) be the buffer length when starting to download segment Seg.sub.i. When τ.sub.i is less than or equal to B(Seg.sub.i), the DASH user will have a smooth playback experience; when it is greater than B(Seg.sub.i), playback freezing will occur, influencing the user QoE. The playback freezing length F.sub.i is defined as
F.sub.i=max{0,τ.sub.i−B(Seg.sub.i)}. (4)
(45) A suitable buffer level can prevent playback freezing events in the case of a sudden drop in network throughput. When τ.sub.i>B(Seg.sub.i), the playback buffer becomes empty before the next segment is completely downloaded. When segment i+1 is fully downloaded, the buffer length for the next segment is T. When T<τ.sub.i<B(Seg.sub.i), the buffer length for the next segment decreases. When τ.sub.i<T<B(Seg.sub.i), the buffer length for the next segment increases.
(46) Thus, the buffer length variation can be described as follows:
(47) B(Seg.sub.i+1)=max{0,B(Seg.sub.i)−τ.sub.i}+T. (5)
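The playback dynamics described above, namely the download time, the freezing length of Eq. (4), and the buffer update, can be sketched numerically. The function below is a minimal reading of those rules; all input values are hypothetical.

```python
def segment_dynamics(total_bitrate, T, C, buffer_len):
    """One step of the playback model above for a single segment:
    returns (download_time, freezing_length, next_buffer_length)."""
    tau = total_bitrate * T / C                 # download time of the segment
    freeze = max(0.0, tau - buffer_len)         # freezing length F_i
    next_buf = max(0.0, buffer_len - tau) + T   # buffer after the download
    return tau, freeze, next_buf
```

For example, a 10 Mbps segment of 1 s over a 5 Mbps link takes 2 s to download; with only 1 s of buffered video, playback freezes for 1 s. When the download time equals the playback duration T, the buffer level stays constant, matching the description in paragraph (45).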
(48) Accordingly, in this embodiment, the QoE of the tile-based 360-degree DASH streaming system is formulated based on several factors: the visual quality of the viewed tiles in the received segment, the quality variations between two segments, the occurrence of playback freezing events, and the future freezing risk. To derive an RL policy that maximizes the user QoE, in this embodiment, the abovementioned QoE factors are considered and a QoE reward function for use in the RL-based ABR decision-making method is proposed. In this embodiment, the WS-PSNR is applied as the instantaneous quality metric for a video sequence.
(49) In addition, the future freezing risk FFR.sub.i is introduced to avoid excessively short buffer lengths:
FFR.sub.i=max(0,B.sub.thresh−B(Seg.sub.i)), (6)
where B.sub.thresh represents the buffer length threshold below which there is a risk of playback freezing. The value of B.sub.thresh is calculated as follows:
(50)
(51) The DASH QoE function is defined in the form of a weighted sum:
(52)
in which the weights ω.sub.u, ω.sub.f and ω.sub.ffr are used to balance the different QoE metrics. Thus, they also represent the trade-off between high video quality, a constant quality level, and smooth playback. The desired operating point might depend on several factors, including user preferences and video content. For the parameter tuning strategy, the strategy disclosed in Y. Zhang et al., “DRL360: 360-degree video streaming with deep reinforcement learning,” IEEE INFOCOM 2019—IEEE Conference on Computer Communications, April 2019, pp. 1252-1260, is applied. To provide a high user QoE, the bitrate/quality levels for all segments should be determined via the ABR scheme to maximize the total QoE objective:
(53) max Σ.sub.iQoE.sub.i. (9)
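Because the exact weighted-sum form of the QoE function is not reproduced here, the following is only one plausible instantiation of a freezing-aware QoE reward combining the four factors named above; the variation weight and all weight values are assumptions.

```python
def qoe_reward(viewed_quality, quality_variation, freeze_len, future_risk,
               w_u=1.0, w_v=1.0, w_f=4.0, w_ffr=2.0):
    """Illustrative weighted-sum QoE: reward the viewed visual quality and
    penalise quality variation, playback freezing, and future freezing risk.
    All weight values are hypothetical defaults."""
    return (w_u * viewed_quality - w_v * quality_variation
            - w_f * freeze_len - w_ffr * future_risk)
```

A segment with a freezing event always scores lower than the same segment without one, so a policy maximizing this reward is steered toward smooth playback as well as high quality.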
(54) ABR Control with Viewpoint Prediction Maps
(55) Viewpoint Prediction Method
(56) Understanding the viewing behavior of a user/viewer is important for an ABR algorithm to make future decisions. A user's viewing behavior mainly depends on the current viewing traces and is also related to the video content. In this embodiment, a viewpoint prediction method based on single-user (SU) viewpoint traces and a cross-user viewpoint model is applied to capture the spatial and temporal behavior of users in order to determine the HM and EF areas for every segment. The SU viewpoint traces are predicted utilizing an LSTM network to determine the HM and EF areas for the user. Because a user's viewing behavior mainly depends on the current viewing traces, an LSTM network can model the user's behavior and generate a precise prediction. The model parameters are denoted by θ, and the LSTM model for predicting the EF trace in the time dimension is formulated as follows:
EF′.sub.t+1=LSTM(EF.sub.0, . . . ,EF.sub.t;θ.sub.EF), (10)
where θ.sub.EF denotes the parameters used in predicting the EF area. The frequency at which consecutive images appear on the HMD display is expressed in terms of the number of frames per second (FPS or frame rate, denoted by N.sub.FPS). The predicted EF area in one frame can be recurrently fed into the LSTM model to obtain N.sub.FPS EF areas to generate the predicted EF area for one segment.
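The recurrent roll-out described above can be sketched as follows; here `predict_next` is a placeholder standing in for the trained LSTM of Eq. (10), not an actual model.

```python
def roll_out_ef(history, predict_next, n_fps):
    """Recurrently extend `history` by n_fps predicted EF points:
    each prediction is fed back as input to produce the next one,
    yielding one predicted EF area per frame of the segment."""
    trace = list(history)
    for _ in range(n_fps):
        trace.append(predict_next(trace))   # feed the prediction back in
    return trace[-n_fps:]
```

With a toy predictor that extrapolates the last point, three recursive steps produce the three future EF points for a 3 FPS segment, illustrating how N.sub.FPS areas are generated from a single-step model.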
(57) Similarly, the future HM trace at the segment level can be predicted with an LSTM model, as follows:
HM′.sub.n+1=LSTM(HM.sub.0, . . . ,HM.sub.n;θ.sub.HM), (11)
where θ.sub.HM denotes the parameters used in predicting the HM area. Because different users may have different viewing behaviors, such as watching videos without much HMD movement or moving the HMD frequently to explore the video content, in order to further decrease the prediction error, a saliency map (SM) prediction method is introduced to model cross-user viewpoint traces. If the predicted SU trace is driven by the video content, then the SM method, such as the one disclosed in M. Xu et al., “Predicting head movement in panoramic video: A deep reinforcement learning approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 11, pp. 2693-2708, November 2019, will yield more precise prediction results. Otherwise, the SU results will work better. In this embodiment, to benefit from the advantages of both the SU and SM methods, the weight ϵ.sub.1 favors the SU prediction results in the first half of each segment, while the weight ϵ.sub.2 favors the SM prediction results in the second half of each segment. The final HM area FNHM′.sub.i+1 and EF area FNEF′.sub.i+1 are predicted via the following equations:
FNHM′.sub.i+1=ϵ.sub.1SUHM1′.sub.i+1+(1−ϵ.sub.1)SMHM1′.sub.i+1+(1−ϵ.sub.2)SUHM2′.sub.i+1+ϵ.sub.2SMHM2′.sub.i+1, (12)
FNEF′.sub.i+1=ϵ.sub.1SUEF1′.sub.i+1+(1−ϵ.sub.1)SMEF1′.sub.i+1+(1−ϵ.sub.2)SUEF2′.sub.i+1+ϵ.sub.2SMEF2′.sub.i+1, (13)
where {SUHM1′.sub.i+1,SUHM2′.sub.i+1}, {SUEF1′.sub.i+1,SUEF2′.sub.i+1}, {SMHM1′.sub.i+1,SMHM2′.sub.i+1}, and {SMEF1′.sub.i+1,SMEF2′.sub.i+1} represent the HM/EF areas predicted using the SU/SM methods in the first/second halves of each segment. For the HM area, the weights ϵ.sub.1 and ϵ.sub.2 are calculated via the following equations:
ϵ.sub.1=max{PPSU1′.sub.i+1,HM,PPSM1′.sub.i+1,HM}, (14)
ϵ.sub.2=max{PPSU2′.sub.i+1,HM,PPSM2′.sub.i+1,HM}, (15)
where {PPSU1′.sub.i+1,HM,PPSM1′.sub.i+1,HM} are the prediction precisions in the first half of each segment using the SU and SM methods, respectively, and {PPSU2′.sub.i+1,HM, PPSM2′.sub.i+1,HM} are the corresponding prediction precisions in the second half of each segment. For the EF area, the weights ϵ.sub.1 and ϵ.sub.2 are calculated via the following equations:
ϵ.sub.1=max{PPSU1′.sub.i+1,EF,PPSM1′.sub.i+1,EF}, (16)
ϵ.sub.2=max{PPSU2′.sub.i+1,EF,PPSM2′.sub.i+1,EF}, (17)
where {PPSU1′.sub.i+1,EF, PPSM1′.sub.i+1,EF} are the prediction precisions in the first half of each segment using the SU and SM methods, respectively, and {PPSU2′.sub.i+1,EF, PPSM2′.sub.i+1,EF} are the corresponding prediction precisions in the second half of each segment.
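For illustration, the weighted fusion of Equations (12) to (17) may be sketched as follows (a hypothetical Python helper; the function and argument names are illustrative, and the area maps are assumed to be numeric arrays of per-tile probabilities):

```python
import numpy as np

def fuse_predictions(su_first, sm_first, su_second, sm_second,
                     pp_su_first, pp_sm_first, pp_su_second, pp_sm_second):
    """Fuse SU and SM viewpoint predictions per Equations (12)-(13).

    The *_first / *_second arrays are the predicted HM (or EF) area maps
    for the first and second halves of the segment; the pp_* values are
    the corresponding prediction precisions used in Equations (14)-(17).
    """
    eps1 = max(pp_su_first, pp_sm_first)    # Equation (14) / (16)
    eps2 = max(pp_su_second, pp_sm_second)  # Equation (15) / (17)
    # Equations (12)/(13): eps1 weights the SU result in the first half,
    # eps2 weights the SM result in the second half.
    return (eps1 * su_first + (1 - eps1) * sm_first
            + (1 - eps2) * su_second + eps2 * sm_second)
```

The same helper applies to both the HM and EF maps, with the precisions computed over the respective areas.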
(58) Segment-Level RL-Based Bitrate Selection
(59) An algorithm for joint RL- and cooperative-bargaining-game-based bitrate selection is provided in this embodiment. In this embodiment, the deep deterministic policy gradient (DDPG) algorithm is used as the basis for the RL-based ABR decision algorithm for segment bitrate control. The DDPG framework for RL-based DASH ABR decision-making is illustrated in the corresponding figure, and it operates on the playback state s.sub.i=( . . . , rate(Seg.sub.i), B(Seg.sub.i), F.sub.i). The elements of the DASH ABR MDP are as follows:
(60) Input State: After downloading each segment i, the DASH agent receives the playback state s.sub.i=( . . . , rate(Seg.sub.i), B(Seg.sub.i), F.sub.i) as its input, where the leading elements are the past bitrate records and their corresponding download times, rate(Seg.sub.i) is the tile bitrate set for segment i, B(Seg.sub.i) is the current buffer length, and F.sub.i is the current freezing length.
(61) Action Mapping: When segment i is completely downloaded (e.g., in time step i), the DASH learning agent generates an output action a.sub.i that determines the quality level U.sub.i+1=(rate.sub.i+1,u(Tile.sub.i+1)) of the next segment to be downloaded, with a.sub.i∈[bitrate.sub.min, bitrate.sub.max] corresponding to the available bitrate range for the video segment. Because the bitrate limits and quality levels are discrete, the continuous action is mapped to the nearest available bitrate level to improve the accuracy of each choice.
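The action-mapping step may be sketched as follows (a hypothetical helper that assumes the continuous action has already been scaled to the bitrate range; the nearest-level rule is one reasonable mapping, as the embodiment does not prescribe a specific one):

```python
def map_action_to_bitrate(action, bitrate_levels):
    """Map the DDPG agent's continuous action to the nearest available
    discrete bitrate level.

    action : continuous actor output, assumed already scaled to Mbps
        within [min(bitrate_levels), max(bitrate_levels)].
    bitrate_levels : the available bitrate ladder for the segment.
    """
    # Clamp to the valid range, then snap to the closest ladder entry.
    lo, hi = min(bitrate_levels), max(bitrate_levels)
    action = max(lo, min(hi, action))
    return min(bitrate_levels, key=lambda b: abs(b - action))
```

For example, with the nine bitrate representations used in the experiments, an action of 2.6 Mbps would be mapped to the 3 Mbps level.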
(62) Reward: Given a state and an action, the DASH MDP reward function r.sub.i is defined as follows:
r.sub.i(s.sub.i,a.sub.i)=QoE.sub.i. (18)
(63) The goal of the agent is to maximize the expected return from each state s.sub.i. The action value Q.sup.π(s, a) is the expected return when action a is selected in state s following policy π:
Q.sup.π(s,a)=E[R.sub.i|s.sub.i=s,a.sub.i=a]. (19)
(64) The optimization algorithm presented in R. Hong et al., "Continuous bitrate & latency control with deep reinforcement learning for live video streaming," in Proceedings of the 27th ACM International Conference on Multimedia, 2019, is used to learn the RL network parameters, with base learning rates of 10.sup.−3 and 10.sup.−4 for the actor and critic networks, respectively. The discount factor γ used for training is 0.99. The Ornstein-Uhlenbeck process is utilized to ensure the exploratory behavior of the actor network.
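A standard Ornstein-Uhlenbeck noise process, as used for actor exploration, may be sketched as follows (the θ, σ, and dt values shown are common illustrative defaults, not parameters taken from this embodiment):

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Ornstein-Uhlenbeck exploration noise for a DDPG actor.

    Generates temporally correlated, mean-reverting noise that is added
    to the actor's continuous action during training.
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.full(size, mu, dtype=float)

    def sample(self):
        # Mean-reverting update: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*dW
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x
```

During training, each action would be perturbed as a.sub.i + noise.sample() before the mapping to a discrete bitrate level.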
(65) Tile-Level Cooperative-Bargaining-Game-Based Bitrate Selection
(66) Once the bitrate for the next segment has been selected, tile-level bitrate allocation is then performed in accordance with the view prediction results. An algorithm is constructed for this purpose as follows. A cooperative bargaining game represents a situation in which the players have similar motivations to cooperate but have different preferences and conflicts of interest over how to cooperate and thus is suitable for describing this task, in which the tiles in different positions in the same segment bargain over the available bitrate. The objective of such a bargaining game is to reach a beneficial agreement among the players. Consider a bargaining game with N players; if an agreement is reached, players 1, 2, . . . , N will receive utilities u.sub.1, u.sub.2, . . . , u.sub.N, respectively. The utility vector {right arrow over (u)}=(u.sub.1, u.sub.2, . . . , u.sub.N).sup.T is one entry in the set of all possible utility combinations, which is denoted by U=({right arrow over (u)}.sub.1, {right arrow over (u)}.sub.2, . . . , {right arrow over (u)}.sub.m),m∈[0,Ω], where Ω is the number of possible utility combinations. In addition, d.sub.j denotes the disagreement of player j, and the disagreement vector d=(d.sub.1, d.sub.2, . . . , d.sub.N).sup.T is defined to ensure a minimum utility for each player. Each element in the Nash bargaining solution (NBS, a possible utility combination) must be larger than the disagreement of the corresponding player; otherwise, the bargaining game will fail.
(67) If U is non-empty, convex, closed and bounded, then the cooperative bargaining game with N players can be denoted by h(U, d), and the NBS can be obtained by means of the following theorem.
(68) Theorem 1: The NBS {right arrow over (u)}.sub.opt=(u.sub.1,opt, u.sub.2,opt, . . . , u.sub.n,opt).sup.T is the unique bargaining solution for h(U, d) if the following equation is satisfied:
(69) {right arrow over (u)}.sub.opt=arg max.sub.{right arrow over (u)}∈UΠ.sub.j=1.sup.N(u.sub.j−d.sub.j), (20)
(70) where {right arrow over (u)}.sub.opt=(u.sub.1,opt,u.sub.2,opt, . . . , u.sub.n,opt).sup.T and u.sub.1,opt≥d.sub.1,u.sub.2,opt≥d.sub.2, . . . , u.sub.n,opt≥d.sub.N.
(71) In addition, there are several axioms concerning the NBS, as follows:
(72) i) Individual rationality. u.sub.j,opt≥d.sub.j for all j.
(73) ii) Feasibility. u.sub.opt∈U.
(74) iii) Pareto optimality of u.sub.opt.
(75) iv) Independence of irrelevant alternatives. If u.sub.opt∈V⊂U is the NBS of h(V, d), then the NBS of h(U, d) is also u.sub.opt.
(76) v) Independence of linear transformations. Let u.sub.opt be the solution to h(U, d), and let g be a linear transformation function.
(77) Suppose that U.sub.g=g(U) and dg=g(d); then, the solution to h(U.sub.g,d.sub.g) is g(u.sub.opt).
(78) vi) Symmetry. If U is invariant under all exchanges of users, then all elements of u.sub.opt are equal, i.e., u.sub.1,opt=u.sub.2,opt= . . . =u.sub.n,opt.
(79) Axioms (i)-(iii) guarantee the existence and efficiency of the NBS, while axioms (iv)-(vi) guarantee the fairness of the solution. The symmetry axiom guarantees equal priority of the players during the bargaining game when the players have the same utility function and disagreement. It should be noted that when the elements of the disagreement vector d are set to equal values, the NBS will be equal to an optimized solution that maximizes the average utility when the channel conditions for all users are the same. Based on bargaining game theory and the NBS, the optimal bandwidth allocation for multiple tiles in a 360-degree streaming system is modelled as a Nash bargaining game and thus can be solved by the NBS. In the tile-level bitrate allocation algorithm of this embodiment, the utility is the expected quality for the next segment, and the maximization problem is formulated as a bargaining game as follows:
(80) tile.sub.opt=arg maxΠ.sub.j=1.sup.N(u(Tile.sub.i,j,k)−d.sub.j), s.t. Σ.sub.j=1.sup.Nrate(Tile.sub.i,j,k)≤R.sub.c, (21)
where the minimum acceptable utility for tile j is denoted by d.sub.j, i.e., the minimum quality that can be accepted for the jth tile. The total bitrate in the HM overlap area as given by the RL agent is denoted by R.sub.c. The maximization of a product can be transformed into the maximization of a sum in the logarithm domain, and the NBS tile.sub.opt can be obtained by introducing Lagrange multipliers. In this paper, the utility function for bitrate allocation is defined as follows:
(81) u(Tile.sub.i,j,k)=(rate(Tile.sub.i,j,k)−C.sub.j)/(ρ.sub.jk.sub.jm.sub.j), (22)
where C.sub.j is the number of header bitrates, ρ.sub.j and k.sub.j are the utility parameters, and m.sub.j denotes the mean average deviation (MAD) of the average residuals between the original and reconstructed segments. These parameters are given by the server side before the start of transmission. d.sub.j is obtained via the following equation:
(82) d.sub.j=ISHM.sub.ju(Tile.sub.i,j,1)+ISEF.sub.ju(Tile.sub.i,j,2), (23)
where rate(Tile.sub.i,j,1) is the bitrate of tile j in segment i with quality level 1 and rate(Tile.sub.i,j,2) is the bitrate of tile j in segment i with quality level 2. The set {ISHM.sub.j,ISEF.sub.j} indicates whether tile j is located in the predicted HM area or the predicted EF area. If tile j is located in the predicted EF area, then {ISHM.sub.j, ISEF.sub.j}={0,1}; otherwise, {ISHM.sub.j, ISEF.sub.j}={1,0}. The utility set U is first proven to be a convex set, as follows.
(83) The utility set U is a convex set: The utility set U is a convex set if and only if for any utility points X=(X.sub.1, . . . , X.sub.N)=(u.sub.1(x), . . . , u.sub.N(x))∈U and Y=(Y.sub.1, . . . , Y.sub.N)=(u.sub.1(y), . . . , u.sub.N(y))∈U, the following condition is satisfied:
θX+(1−θ)Y∈U, (24)
where 0≤θ≤1 and x=(x.sub.1, x.sub.2, . . . , x.sub.N) and y=(y.sub.1, y.sub.2, . . . , y.sub.N) are bitrate allocation strategies that satisfy the total bitrate constraints. Based on the utility function,
(84) θX.sub.j+(1−θ)Y.sub.j=(θx.sub.j+(1−θ)y.sub.j−C.sub.j)/(ρ.sub.jk.sub.jm.sub.j), (25)
and
(θX.sub.j+(1−θ)Y.sub.j)ρ.sub.jk.sub.jm.sub.j+C.sub.j=θx.sub.j+(1−θ)y.sub.j>R.sub.j.sup.0. (26)
(85) Therefore, θX+(1−θ)Y∈U; thus, the feasible utility set U is convex.
(86) Determination of the NBS: Because U is a convex set, Equation (21) can be converted into the following format:
(87) tile.sub.opt=arg maxΣ.sub.j=1.sup.N ln(u(Tile.sub.i,j,k)−d.sub.j), s.t. Σ.sub.j=1.sup.Nrate(Tile.sub.i,j,k)≤R.sub.c, (27)
(88) The optimal solution to Equation (27) can be obtained by solving the Karush-Kuhn-Tucker (KKT) conditions. Let λ and n.sub.j (j=1, . . . ,N) be the Lagrange multipliers; then the following Lagrangian function L(rate(Tile.sub.i,j,k),λ, n.sub.j) can be obtained:
(89)
(90) The KKT conditions for Equation (28) can be written as
(91)
(92) By substituting the utility function (22) into Equation (29(vii)), this condition can be rewritten as
(93)
(94) According to Equation (29(vi)), there must be a solution rate.sub.j that is larger than rate(d.sub.j), such that d.sub.j−u(Tile.sub.i,j,k)<0 and n.sub.j=0:
(95)
(96) As shown in Equation (29(v)),
(97)
(98) Δ can be calculated as
(99)
(100) Therefore, the NBS, i.e., tile.sub.opt=(rate(Tile.sub.i,1,k).sub.opt, . . . , rate(Tile.sub.i,N,k).sub.opt).sup.T, can be obtained as
(101)
where rate.sub.opt={rate.sub.j,opt}, j=1, . . . , N, is the NBS for Equation (21). Based on the NBS, in the method of this embodiment, the bitrate list for the current tile set is used as a look-up table to find an appropriate bitrate that is close to the NBS for every tile in the segment.
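For illustration, the closed-form NBS and the subsequent look-up step may be sketched as follows, under the assumption of a utility that is linear in the tile bitrate, u.sub.j=(rate.sub.j−C.sub.j)/(ρ.sub.jk.sub.jm.sub.j), which is consistent with Equation (26); the function and parameter names are hypothetical:

```python
import numpy as np

def nbs_tile_allocation(R_c, C, rho, k, m, d, ladder):
    """Nash-bargaining tile bitrate allocation (sketch).

    Under the assumed linear utility, maximizing prod_j (u_j - d_j)
    subject to sum_j rate_j = R_c via the KKT conditions gives

        rate_j = C_j + d_j*rho_j*k_j*m_j
                 + (R_c - sum_k (C_k + d_k*rho_k*k_k*m_k)) / N,

    i.e., each tile first receives the bitrate needed to reach its
    disagreement utility, and the remaining bandwidth is split equally.
    The continuous solution is then snapped to the nearest entry of the
    available bitrate ladder, as in the look-up step described above.
    """
    C, rho, k, m, d = (np.asarray(v, dtype=float) for v in (C, rho, k, m, d))
    base = C + d * rho * k * m          # bitrate reaching the disagreement point
    surplus = R_c - base.sum()          # bandwidth left to bargain over
    if surplus < 0:
        raise ValueError("total bitrate cannot satisfy the disagreement points")
    cont = base + surplus / len(base)   # NBS: equal split of the surplus
    return [min(ladder, key=lambda b: abs(b - r)) for r in cont]
```

With equal disagreement points this reduces to an equal split, matching the symmetry axiom discussed above.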
(102)
(103) Implementation
(104) The method of the above embodiment is implemented on the mobile device client and the streaming server to achieve precise viewpoint prediction and bitrate allocation for the requested tiles. The mobile client receives the requested 360-degree video segments and the updated cross-user viewpoint prediction map generated via the SM method. The method then allocates the segment- and tile-level bitrates by jointly considering the SM and SU viewpoint prediction results. After bitrate allocation, the client sends the ordered tile requests and the recorded user viewpoint traces to the server, which uses them to update the cross-user viewpoint prediction map via the SM method.
(105) Mobile device client and streaming server: In the evaluation of the performance of the method of the above embodiment, all experiments were performed on an Ubuntu (Version 16.04) system with a GTX 2080 Ti GPU, an i7-8700K CPU, and 32 GB of memory. The platform was modified to support the testing of all ABR algorithms requesting video segments from the DASH client side. In the experiment, the Google Chrome browser was used as the mobile device client, and an Apache web server (version 2.4.7) was modified with the software dash.js (version 3.0.0) to serve as a DASH server running on the same machine as the client. The DASH client was configured to have a maximum playback buffer capacity of 10 seconds.
(106) Videos and viewpoint traces: For the experiments, 360-degree video contents were collected from a dataset that includes both fast- and slow-motion videos. The corresponding user viewpoint traces were also provided by this dataset, with a 200-ms granularity. Videos and their corresponding viewpoint traces were randomly selected to form the training set, and the remaining videos were used as the test set for evaluation. All videos were encoded by Kvazaar with HEVC under rate control mode and assigned bitrates of 0.8 Mbps, 1.2 Mbps, 2 Mbps, 3 Mbps, 4 Mbps, 8 Mbps, 12 Mbps, 20 Mbps and 36 Mbps (8192×4096). Thus, each video was encoded to generate nine bitrate representations, i.e., N.sub.R=9. Then, the videos were divided into segments of one second, two seconds, and three seconds in length using MP4Box to investigate the influence of the segment length on 360-degree video streaming. In addition, for each segment, the frames were split into 6×6, 12×6, and 12×12 tile configurations to investigate the influence of the tile configuration on 360-degree video streaming. The viewpoint angle was set to 100×100 degrees, the viewpoint angle commonly used for mobile devices.
(107) Bandwidth profiles: The method of the above embodiment was evaluated using both real and synthetic bandwidth profile datasets. For the real bandwidth dataset, five hundred bandwidth profiles of at least 320 seconds in length under various network conditions were randomly selected and scaled up from the typical HSDPA bandwidth dataset and 5G dataset. Two hundred of the generated bandwidth profiles were randomly selected to train the RL-based ABR model, and the rest were used for testing. The values of the selected bandwidth profiles fell between 0.7 Mbps and 20 Mbps. For the synthetic bandwidth dataset, bandwidth profiles classified into four types were used for evaluation; the four types of profiles are depicted in
(108) Algorithms for comparison: The following five existing tile-based 360-degree ABR control algorithms were chosen for comparison with the 360-degree streaming method in the above embodiment of the invention: FIX: In the FIX method, bitrate allocation is performed dynamically for an entire segment without tile-based encoding, and the user's 360-degree HM behavior is ignored. This ABR method uses the harmonic mean for throughput prediction and then selects the highest available bitrate within the predicted throughput. LE: Linear regression is leveraged to predict the viewpoint, and a probabilistic optimization model for the average viewpoint bitrate is set up to allocate the rates for tiles. LJ-G: A Gaussian prediction error is used in a linear-regression-based viewpoint prediction model. 360-P: This framework is a 360-degree version of the "Pensieve" model disclosed in H. Mao et al., "Neural adaptive video streaming with pensieve," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017, which is also an RL-based model for DASH streaming of traditional video content leveraging the A3C model. The 360-P method involves a specific prediction model for the previous viewpoint trajectories to ensure suitability for the 360-degree video scenario. EPASS: This method, disclosed in Zhang et al., "Epass360: QoE-aware 360-degree video streaming over mobile devices," IEEE Transactions on Mobile Computing, pp. 1-1, 2020, involves LSTM-based SU viewpoint and bandwidth prediction models for bitrate allocation in a tile-based 360-degree video streaming system.
(109) Performance Evaluation
(110) The performance of the method of the above embodiment is evaluated relative to that of various existing methods for multiple 360-degree videos under a variety of real network traces. Both the prediction performance for the HM and EF overlap areas and the ABR performance are evaluated.
(111) Viewpoint Prediction Performance
(112) To quantify the viewpoint prediction performance during the playback of 360-degree video contents under different network conditions, the average prediction precision and prediction error of the SU, SM, and the above-proposed viewpoint prediction methods were evaluated. A prediction method with a low prediction precision or a high prediction error will cause low-quality areas to be rendered to the user. The average prediction precision is represented by the intersection over union (IoU), which can be obtained by dividing the overlap area by the union area between the predicted and ground-truth view areas with a view angle of 100°×100°. The performance of the viewpoint prediction methods was measured without considering bandwidth variations. The prediction error represents the situation in which necessary tiles for view rendering are assigned the lowest bitrate as a result of the uncertainty of the prediction method. Accordingly, the prediction error is defined as the percentage of lowest-bitrate tiles rendered in the user's view area in one segment.
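The IoU-based precision measure may be sketched as follows (a hypothetical helper operating on boolean view-area masks over the tile grid or pixel grid):

```python
import numpy as np

def view_iou(pred_mask, true_mask):
    """Intersection over union between the predicted and ground-truth
    view areas, each given as a boolean mask.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, true).sum() / union
```

The average prediction precision over a segment is then the mean IoU over the evaluated frames or timestamps.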
(113) The average prediction precision and the prediction error of all viewpoint prediction algorithms were measured at five timestamps in every segment. For every frame in the segment, the precision value was obtained after prediction, and the prediction error was calculated after processing the whole segment. The average prediction results at timestamps m=0, . . . , 1 of the SU, SM, and the proposed algorithms of the above embodiment are shown in
(114)
(115) Effectiveness of Continuous Bitrate Control
(116) Once the viewpoint prediction results have been generated, the download bitrate for the tiles in the next segment should be determined and allocated. The segment-level bitrate is determined by the proposed RL-based ABR algorithm, and the tile-level bitrates are chosen based on the predicted HM and EF areas and the selected segment-level bitrate following the allocation strategy described above. Here, the effectiveness of continuous segment-level bitrate control and game-theory-based tile-level bitrate allocation is evaluated under real bandwidth profiles. The performance is evaluated in terms of the average effective viewpoint WS-PSNR (ePSNR) and the average effective bitrate (eBitrate) under various experimental conditions. The ePSNR is the average WS-PSNR of tiles used to render the view area, and the eBitrate represents the actual downloaded bitrates of the tiles for rendering. First, stable network conditions (6 Mbps) were used to test the influence of different viewpoint prediction methods and tile configuration schemes on 360-degree streaming ABR control, as shown in
(117) QoE Gain of the Proposed Method
(118) To further compare and analyze each algorithm, the user QoE gains in terms of different QoE objectives were further evaluated based on the real bandwidth profile dataset. The detailed QoE gains in terms of different QoE objectives are shown in the table of
(119) Performance Evaluation on Synthetic Bandwidth Datasets
(120) The performance evaluation conducted on synthetic bandwidth datasets is also presented. The detailed QoE metrics under the different cases of synthetic mobile network profiles are presented in the tables of
(121) Ablation Study on Real Bandwidth Datasets
(122) The results of an ablation study for which the normalized QoE (minimum is set to 0.1, and maximum is set to 0.9) results under different QoE objectives are shown in
(123) Remarks
(124) Existing bitrate adaptation algorithms have difficulty providing smooth and steady video quality for 360-degree streaming users on mobile devices under highly dynamic network conditions because the viewpoint prediction and bitrate adaptation methods are based on instantaneous states rather than full consideration of historical data. To guarantee a high QoE in terms of different objectives to ensure both smoothness and steadiness, a hybrid control scheme is proposed in the above embodiment for a dynamic adaptive 360-degree streaming system, which can efficiently make viewpoint prediction and ABR decisions under various network and user behavior conditions to optimize various QoE objectives for the user. The proposed method leverages RL for continuous segment-level bitrate control and game theory for tile-level bitrate allocation while fully considering both temporal viewpoint variations and spatial video content variations. Experimental evaluations of the proposed scheme show that the proposed tile-based 360-degree streaming system can achieve improved QoE gains in diverse scenarios on mobile devices.
(125) Exemplary Hardware
(126)
(127) The controller 1900 includes a processor 1902 and a memory 1904. The processor 1902 may be formed by one or more of: CPU, MCU, controllers, logic circuits, Raspberry Pi chip, digital signal processor (DSP), application-specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA), or any other digital or analog circuitry configured to interpret and/or to execute program instructions and/or to process information and/or data. The memory 1904 may include one or more volatile memory units (such as RAM, DRAM, SRAM), one or more non-volatile memory units (such as ROM, PROM, EPROM, EEPROM, FRAM, MRAM, FLASH, SSD, NAND, and NVDIMM), or any of their combinations.
(128) The processor 1902 includes a machine learning processing module 1902A and a non machine learning processing module 1902B. The machine learning processing module 1902A is arranged to process data using one or more machine learning processing models (e.g., reinforcement learning model such as DDPG model, recurrent neural network model such as LSTM model). The non machine learning processing module 1902B is arranged to process data without using machine learning processing models or methods. For example, the non machine learning processing module 1902B may be used to perform various data processing such as filtering, segmenting, thresholding, averaging, smoothing, padding, transforming, scaling, etc. The processor 1902 also includes a training module 1902C arranged to train the machine learning processing model(s), such as the model(s) in the memory 1904.
(129) The memory 1904 includes a machine learning processing model store 1904A arranged to store one or more machine learning processing models to be used by the processor 1902 for processing data. The one or more machine learning processing models may include the reinforcement learning model such as the DDPG model and the recurrent neural network model(s) such as LSTM model(s) for HM and EF predictions. In one example, only one machine learning processing model is stored. In another example, multiple machine learning processing models are stored. The machine learning processing model(s) in the machine learning processing model store 1904A may be trained, re-trained, or updated as needed—new or modified machine learning processing model(s) may be obtained by training or by data transfer (loading into the controller 1900). The memory 1904 also includes data store 1904B and instructions store 1904C. The data store 1904B may store: training/validation/test data for training/validating/testing the machine learning processing model(s), data received from external devices such as a streaming server, etc. The training/validation/test data used to train/validate/test the respective machine learning processing model(s) may be classified for use in the training/validating/testing different machine learning processing models. The instructions store 1904C stores instructions, commands, codes, etc., that can be used by the processor 1902 to operate the controller 1900.
(130)
(131) General Video Streaming Method
(132)
(133) As shown in
(134) In step 2102, the total bitrate is determined/predicted based at least in part on the received playback states (e.g., associated with a previous video segment consecutive with the video segment). The determination may be based on adaptive optimization of a quality of experience function. The adaptive optimization of the quality of experience function may be based on a deep reinforcement learning model, such as a deep deterministic policy gradient algorithm, including but not limited to the one presented in the above embodiment.
(135) In step 2103, the viewpoint of the user is determined/predicted based at least in part on the received viewpoint trace (e.g., associated with a previous video segment consecutive with the video segment, or a current viewpoint of the user). The determination/prediction of the viewpoint of the user may be based on the method presented in the above embodiment. The viewpoint trace of the user may comprise, at least, a head movement trace and an eye fixation trace. Step 2103 may include predicting a single-user viewpoint trace for the segment based on a received viewpoint trace of the user (e.g., associated with the previous segment(s) of the video), and optionally, also predicting a cross-user viewpoint trace for the segment. Step 2103 may include predicting head movement area (e.g., map) of the user for the segment and predicting eye fixation area (e.g., map) of the user for the segment. The prediction may be performed using a recurrent neural network model (e.g., a long short term memory model) processing the received viewpoint trace of the user. The prediction of the cross-user viewpoint trace for the segment may be performed based on a saliency map (SM) of known cross-user viewpoint traces associated with the segment.
(136) After steps 2102 and 2103, in step 2104, the method 2100 determines bitrates for the tiles (preferably all of the tiles) in the segment based on the determined total bitrate and the predicted viewpoint. The determination may include allocating a bitrate to each of the tiles such that the sum of the bitrates of all tiles in the segment is substantially equal to the determined total bitrate. In one example, step 2104 includes allocating a lower bitrate (or bitrates) to the tiles in the area outside a predicted head movement area for the segment and allocating a higher bitrate (or bitrates) to the tiles in the area inside the predicted head movement area for the segment. The bitrate(s) of the tiles in the area outside the predicted head movement area need not be identical. Likewise, the bitrate(s) of the tiles in the area inside the predicted head movement area need not be identical. In one implementation, a minimum available bitrate is allocated to the tiles in the area outside the predicted head movement area for the segment. In one implementation, the tiles in the area inside the predicted head movement area but outside the predicted eye fixation area for the segment are allocated a lower bitrate (or bitrates) compared with the tiles in the area inside both the predicted head movement area and the predicted eye fixation area for the segment. In one implementation, the bitrates of adjacent tiles of the segment are constrained to be within a difference threshold to avoid the apparent presence of boundaries.
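A simplified sketch of such an allocation is given below (minimum bitrate outside the predicted head movement area, with eye fixation tiles upgraded first); the greedy strategy and all names are illustrative, and the adjacent-tile difference threshold is omitted for brevity:

```python
def allocate_tile_bitrates(total, tiles, ladder):
    """Allocate per-tile bitrates in the spirit of step 2104 (sketch).

    tiles : region label per tile: "out" (outside the predicted HM area),
        "hm" (inside HM but outside EF), or "ef" (inside both HM and EF).
    Strategy: every tile starts at the minimum ladder level; tiles are
    then greedily upgraded (EF tiles first, then HM tiles) while the
    running sum stays within the determined total bitrate.
    """
    ladder = sorted(ladder)
    level = [0] * len(tiles)  # index into the ladder, per tile
    for region in ("ef", "hm"):  # tiles outside HM stay at the minimum
        upgraded = True
        while upgraded:
            upgraded = False
            for i, r in enumerate(tiles):
                if r != region or level[i] + 1 >= len(ladder):
                    continue
                trial = (sum(ladder[l] for l in level)
                         - ladder[level[i]] + ladder[level[i] + 1])
                if trial <= total:
                    level[i] += 1
                    upgraded = True
    return [ladder[l] for l in level]
```

The resulting sum is as close to the determined total bitrate as the discrete ladder allows without exceeding it.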
(137) Once the bitrates for the tiles of the segment are determined, then in step 2106, a request to receive the tiles of the segment in accordance with the determined bitrates for the tiles is generated and transmitted, e.g., from the one or more controllers to a streaming server. In a preferred embodiment, the request includes an order indicator to receive the tiles of the segment with the largest determined bitrate first, or in order of decreasing bitrates. This ensures that the tiles with the highest bitrates are received earlier than the other tiles.
(138) In step 2108, the tiles of the segment are received, e.g., from a streaming server operably connected with the one or more controllers, in accordance with the determined bitrates for the tiles.
(139) In step 2110, a determination is made, e.g., by the one or more controllers, as to whether all tiles of that segment have been received. If yes, the method 2100 returns to receiving updated playback states and viewpoint trace, or to steps 2102 and 2103, for the next video segment to be received (or downloaded). If not, the method 2100 waits until all tiles of that segment have been received before returning to receiving updated playback states and viewpoint trace, or to steps 2102 and 2103, for the next video segment to be received (or downloaded).
(140) In step 2112, which may be performed after step 2108 irrespective of the progress of step 2110, the received tiles are processed and the corresponding video content is streamed to the user (e.g., provided to the user at a display operably connected with the one or more controllers).
(141) The steps in method 2100 may be performed by one or more controllers, e.g., at the electrical device for streaming and playing the video, in which case the viewpoint trace and the playback states may be received locally from detectors operably connected with the one or more controllers. The method 2100 can be repeated for two or more or all segments of the video so as to stream the video. The method 2100 is preferably applied for streaming 360 degrees video. However, it is contemplated that the method 2100 can be used to stream other video.
(142) Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects and/or components to achieve the same functionality desired herein.
(143) It will also be appreciated that where the methods and systems of the invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilized. This will include stand-alone computers, network computers, dedicated or non-dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to include (but not limited to) any appropriate arrangement of computer or information processing hardware capable of implementing the function described.
(144) It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments to provide other embodiments of the invention. The described embodiments of the invention should therefore be considered in all respects as illustrative, not restrictive. Various optional features of the invention are set out in the summary. For example, the streaming method and/or system may be applied to stream video other than 360-degree video. The streaming method and/or system may be applied to stream only part of an entire video. The specific algorithms, models, etc., may be adjusted or modified to take into account additional or alternative factors for streaming the video. The video source file may be encoded or processed in a different format. The streaming method and/or system may be applied to other electrical devices (not limited to mobile devices).