PORTIONED VIDEO STREAMING CONCEPTS
20200137136 · 2020-04-30
Inventors
- Robert Philipp Skupin (Berlin, DE)
- Cornelius Hellge (Berlin, DE)
- Thomas Schierl (Berlin, DE)
- Yago Sánchez de la Fuente (Berlin, DE)
- Dimitri Podborski (Berlin, DE)
CPC classification
- H04N21/234345 (ELECTRICITY)
- H04N21/234327 (ELECTRICITY)
- H04N21/8456 (ELECTRICITY)
- H04N19/167 (ELECTRICITY)
- H04N21/23476 (ELECTRICITY)
- H04N21/26258 (ELECTRICITY)
- H04L2209/60 (ELECTRICITY)
International classification
- H04N21/2343 (ELECTRICITY)
- H04N19/167 (ELECTRICITY)
Abstract
Portion- or tile-based video streaming concepts are described.
Claims
1. Apparatus for downloading an ROI-specific video stream by tile-based video streaming, configured to inspect a manifest file so as to, depending on a region of interest, identify and download a set of bitstreams along with an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area; compile, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and decrypt a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the compiled bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the respective subpicture portion belongs.
2. Apparatus for recovering a video stream from a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the apparatus configured to compile, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and decrypt a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the compiled bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the respective subpicture portion belongs.
3. Apparatus for recovering a video stream from a bitstream which comprises sub-picture portions for different portions of a video picture area, wherein the apparatus is configured to decrypt a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled from outside for the respective subpicture portion.
4. Apparatus according to claim 1, configured to perform the re-initialization for each subpicture portion to be decrypted by deriving mutually different initialization states for the subset of one or more subpicture portions.
5. Apparatus according to claim 4, configured to perform the deriving mutually different initialization states for the subset of one or more subpicture portions by applying mutually different modifications to a base initialization state for the current picture frame.
6. Apparatus according to claim 5, configured to derive the mutually different modifications for each subpicture portion depending on the portion of the video picture area which the respective subpicture portion relates to or depending on an index of the respective subpicture portion.
7. Apparatus according to claim 1, configured to perform the re-initialization for each subpicture portion to be decrypted by deriving an initialization state for each of the subset of one or more subpicture portions from an initialization state list in the extractor file.
8. Apparatus according to claim 1, configured to perform the parsing the coding payload section, the deriving the length of the coding payload section, or the use of the bitstream length or pointer indication for the finding with disregarding explicit border location information in the extractor file.
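The decryption recited in claims 1-8 can be illustrated with a minimal Python sketch: a plaintext mask (keystream) varies sequentially block by block, the sequential variation is reinitialized for each subpicture portion, and the payload border is derived from a length header. The 2-byte length prefix, the SHA-256-based mask, and the XOR-of-index state derivation are illustrative assumptions only, not the claimed bitstream format (the claims equally admit parsing up to the currently decrypted position or an out-of-band pointer indication).

```python
import hashlib

def mask_block(key: bytes, init_state: bytes, counter: int) -> bytes:
    """One 16-byte plaintext-mask block; the mask varies sequentially via the counter."""
    return hashlib.sha256(key + init_state + counter.to_bytes(4, "big")).digest()[:16]

def decrypt_payload(key: bytes, init_state: bytes, payload: bytes) -> bytes:
    """Block-wise decryption: XOR each 16-byte block with the sequentially varied mask.
    Because decryption is an XOR, the same function also encrypts."""
    out = bytearray()
    for off in range(0, len(payload), 16):
        mask = mask_block(key, init_state, off // 16)
        out.extend(b ^ m for b, m in zip(payload[off:off + 16], mask))
    return bytes(out)

def decrypt_subpictures(key: bytes, base_iv: bytes, compiled: bytes):
    """Walk subpicture portions laid out as [2-byte payload length | payload],
    reinitializing the sequential variation for each portion with a mutually
    different state (base state modified by the portion index, cf. claims 4-6)."""
    pos, index, portions = 0, 0, []
    while pos < len(compiled):
        length = int.from_bytes(compiled[pos:pos + 2], "big")  # border from header
        payload = compiled[pos + 2:pos + 2 + length]
        init_state = bytes(a ^ b for a, b in zip(base_iv, index.to_bytes(16, "big")))
        portions.append(decrypt_payload(key, init_state, payload))
        pos += 2 + length
        index += 1
    return portions
```

Reinitializing per portion keeps each subpicture independently decryptable, which is what allows the compiled bitstream to be assembled from separately downloaded tile bitstreams.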
9. Collection of data for downloading an ROI-specific video stream by tile-based video streaming, comprising bitstreams each having encoded thereinto one of portions of a video picture area, so that each portion of the video picture area is encoded into a subset of the bitstreams at different qualities; at least one extractor file associated with an ROI of the video picture area; a manifest file identifying, for the ROI, a set of bitstreams having encoded thereinto different portions of a video picture area which focus on the ROI, wherein the extractor file indicates a compilation of a compiled bitstream out of the set of bitstreams by identifying, for each of the subsets of bitstreams, out of a selected bitstream of the respective subset of bitstreams, a picture portion relating to a current picture frame and signaling a compilation of the compiled bitstream out of the identified picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of the selected bitstream of each of the subsets of bitstreams the compiled bitstream is formed of; and wherein a coding payload section of the picture portion of each bitstream of each subset of bitstreams out of an encrypted set of one or more of the subsets of bitstreams is encrypted by using block-wise encryption by use of sequential variation of a plaintext mask and/or block-encryption key by reinitializing the sequential variation for each picture portion.
10. Collection of data according to claim 9, wherein the re-initialization for each picture portion within the current picture frame is based on mutually different initialization states.
11. Collection of data according to claim 10, wherein the mutually different initialization states are the result of applying mutually different modifications to a base initialization state for the current picture frame.
12. Collection of data according to claim 11, wherein the mutually different modifications for each picture portion depend on the portion of the video picture area which is encoded into the bitstream the respective picture portion belongs to, or depend on an index of the respective picture portion by which the respective picture portion is referred to in the extractor file.
13. Collection of data according to claim 9, wherein the extractor file comprises an initialization state list signaling an initialization state for each picture portion within the current picture frame.
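Claims 10-13 require mutually different initialization states per picture portion, either derived by modifying a per-frame base state or carried as a list in the extractor file. A minimal sketch of building such a list follows; the SHA-256 derivation from base state, frame number, and portion index is an illustrative assumption (the claims leave the modification open, e.g. depending on the portion's position in the picture area instead of its index).

```python
import hashlib

def make_init_state_list(base_state: bytes, frame_no: int, n_portions: int):
    """Derive one 16-byte initialization state per picture portion by applying
    a portion-specific modification (here: hashing in the portion index) to
    the per-frame base state. The resulting list could be signaled in the
    extractor file (claim 13) so a receiver (claim 7) reads the state for
    portion i directly instead of re-deriving it."""
    return [
        hashlib.sha256(base_state + frame_no.to_bytes(4, "big")
                       + i.to_bytes(2, "big")).digest()[:16]
        for i in range(n_portions)
    ]
```

Deriving states deterministically keeps sender and receiver in sync without per-portion key exchange; listing them explicitly trades a little extractor-file size for receiver simplicity.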
14. Video stream, comprising a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, wherein the extractor file indicates a compilation of a compiled bitstream out of the set of bitstreams by identifying, for each of the set of bitstreams, a picture portion relating to a current picture frame and signaling a compilation of the compiled bitstream out of the identified picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and wherein a coding payload section of the picture portion of each bitstream out of an encrypted set of one or more of the set of bitstreams is encrypted by using block-wise encryption by use of sequential variation of a plaintext mask and/or block-encryption key by reinitializing the sequential variation for each picture portion.
15. Video stream according to claim 14, wherein the re-initialization for each picture portion within the current picture frame is based on mutually different initialization states.
16. Video stream according to claim 15, wherein the mutually different initialization states are the result of applying mutually different modifications to a base initialization state for the current picture frame.
17. Video stream according to claim 16, wherein the mutually different modifications for each picture portion depend on the portion of the video picture area which is encoded into the bitstream the respective picture portion belongs to, or depend on an index of the respective picture portion by which the respective picture portion is referred to in the extractor file.
18. Video stream according to claim 14, wherein the extractor file comprises an initialization state list signaling an initialization state for each picture portion within the current picture frame.
19. Apparatus for downloading an ROI-specific video stream by tile-based video streaming, configured to inspect a manifest file so as to, depending on an ROI, identify and download a set of bitstreams along with an extractor file, the set of bitstreams having encoded thereinto mutually different portions of a video picture area; compile, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and identify a predetermined subpicture portion out of the subpicture portions of the compiled bitstream on the basis of signaling in at least one of the extractor file or the subpicture portions, decrypt a coding payload section of the predetermined subpicture portion of the subpicture portions of the compiled bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the predetermined subpicture portion belongs.
20. Apparatus for recovering a video stream from a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the apparatus configured to compile, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and identify a predetermined subpicture portion out of the subpicture portions of the compiled bitstream on the basis of signaling in at least one of the extractor file or the subpicture portions, decrypt a coding payload section of the predetermined subpicture portion of the subpicture portions of the compiled bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the predetermined subpicture portion belongs.
21. Apparatus for recovering a video stream from a bitstream which comprises sub-picture portions for different portions of a video picture area, wherein the apparatus is configured to identify a predetermined subpicture portion out of the subpicture portions of the bitstream on the basis of signaling inbound from outside or signaling in the sub-picture portions, decrypt a coding payload section of the predetermined subpicture portion of the subpicture portions of the bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled from outside for the predetermined subpicture portion.
22. Apparatus according to claim 19, wherein the decryption involves block-decryption by use of sequential variation of a plaintext mask and/or block-decryption key.
23. Apparatus according to claim 19, configured to perform the identification of the predetermined subpicture portion for several picture frames in a manner such that the several picture frames comprise picture frames for which the predetermined subpicture portion corresponds to different ones of the different portions, and/or the several picture frames comprise first picture frames for which exactly one subpicture portion is identified to be the predetermined subpicture portion and second picture frames, interspersed between the first picture frames, for which no subpicture portion is identified to be the predetermined subpicture portion.
24. Collection of data for downloading an ROI-specific video stream by tile-based video streaming, comprising bitstreams each having encoded thereinto one of portions of a video picture area, so that each portion of the video picture area is encoded into a subset of the bitstreams at different qualities; at least one extractor file associated with an ROI of the video picture area; a manifest file identifying, for the ROI, a set of bitstreams having encoded thereinto different portions of a video picture area which focus on the ROI, wherein the extractor file indicates a compilation of a compiled bitstream out of the set of bitstreams by identifying, for each of the subsets of bitstreams, out of a selected bitstream of the respective subset of bitstreams, a picture portion relating to a current picture frame and signaling a compilation of the compiled bitstream out of the identified picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of the selected bitstream of each of the subsets of bitstreams the compiled bitstream is formed of; and a predetermined subpicture portion is identified out of the subpicture portions of the compiled bitstream by identifying a predetermined subset of bitstreams out of the subsets of bitstreams so that the picture portion of the selected bitstream of the predetermined subset of bitstreams is the predetermined subpicture portion and by signaling comprised in at least one of the extractor file or the subpicture portions, wherein a coding payload section of the picture portion of the bitstreams of the predetermined subset of bitstreams is encrypted.
25. Collection of data according to claim 24 wherein the encryption involves block-wise encryption by use of sequential variation of a plaintext mask and/or block-encryption key.
26. Video stream, comprising a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, wherein the extractor file indicates a compilation of a compiled bitstream out of the set of bitstreams by identifying, for each of the set of bitstreams, a picture portion relating to a current picture frame and signaling a compilation of the compiled bitstream out of the identified picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and a predetermined subpicture portion is identified out of the subpicture portions of the compiled bitstream by signaling comprised in at least one of the extractor file or the subpicture portions, wherein a coding payload section of the predetermined subpicture portion is encrypted.
27. Video stream according to claim 26 wherein the encryption involves block-wise encryption by use of sequential variation of a plaintext mask and/or block-encryption key.
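Claims 19-27 decrypt only one predetermined subpicture portion per frame, identified by signaling in the extractor file or in the subpicture portions themselves. The sketch below walks a compiled bitstream of hypothetical [1-byte flags | 2-byte length | payload] units, where a flag bit marks the encrypted portion; this unit layout is an illustrative assumption, not the claimed syntax. Frames without any flagged portion are permitted (cf. claim 23).

```python
def find_encrypted_portion(compiled: bytes):
    """Walk subpicture portions laid out as [1-byte flags | 2-byte length |
    payload] and return the (offset, length) of the coding payload section of
    the one portion whose encryption flag is set, or None if no portion in
    this frame is marked encrypted."""
    pos = 0
    while pos < len(compiled):
        flags = compiled[pos]
        length = int.from_bytes(compiled[pos + 1:pos + 3], "big")
        if flags & 0x01:  # encryption flag set for this subpicture portion
            return pos + 3, length
        pos += 3 + length
    return None
```

Encrypting a single, moving portion (its location varying over frames, as claim 23 allows) keeps decryption cost low while still making the full-area video unusable without the key.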
28. Manifest file comprising first parameter sets each defining one of picture-portion specific adaptation sets of representations, the representations of each picture-portion specific adaptation set having encoded thereinto one of different picture portions at different qualities, wherein each first parameter set comprises a quality level for each representation of the picture-portion specific adaptation set defined by the respective first parameter set; at least one second parameter set defining a preselection adaptation set which assigns to each of regions of an output picture area one of the picture-portion specific adaptation sets, wherein the at least one second parameter set comprises one or more parameters for each region of the output picture area, indicating a quality level range covering the quality levels of the representations of the picture-portion specific adaptation set assigned to the respective region, and/or the manifest file comprises an indication whether the quality levels indicated by the first parameter sets are defined on a common ordinal scale so as to be ordinally scaled across different ones of the first parameter sets, or the quality levels indicated by the first parameter sets are defined on separate ordinal scales, individual for the first parameter sets; and/or the at least one second parameter set comprises one or more parameters indicating, for each region of the output picture area, a quality level hint for the respective region and an indication whether the quality level hint for the respective region and the quality levels defined by the first parameter set of the picture-portion specific adaptation set assigned to the respective region, are defined on a common ordinal scale so as to be ordinally scaled thereacross, or the quality level hint and the quality levels defined by the first parameter set of the picture-portion specific adaptation set assigned to the respective region are defined on separate ordinal scales, and/or the at least one second parameter set comprises one or more parameters indicating, for the regions of the output picture area, quality ranking among the regions.
29. Manifest file according to claim 28, wherein for each picture-portion specific adaptation set, the first parameter set defines a field of view information with respect to the picture portion encoded into the representations of the respective picture-portion specific adaptation set.
30. Manifest file according to claim 28, wherein the second parameter set defines a field of view information with respect to a collation of the regions.
31. Manifest file according to claim 30, wherein there are more than two second parameter sets of respective preselection adaptation sets, each defining a field of view information with respect to a collation of its regions, wherein the collation coincides between said more than two second parameter sets.
32. Manifest file according to claim 31, wherein the more than two second parameter sets define a region of highest quality among the regions, a location of which within the collation varies over the more than two second parameter sets.
33. Client device configured to inspect the manifest file of claim 28 and change, based on the quality level range and/or the indication, a streaming strategy in adaptively streaming a video from a server.
34. Client device of claim 33 configured to use the quality levels, quality level ranges, the quality level hints and/or the indication, in order to rank the preselection adaptation sets with respect to a wished viewport.
35. Client device of claim 33 configured to determine a location of a ROI comprising increased quality in an output picture area of a preselection adaptation set based on quality information in the manifest file and compare the location of the ROI with a wished viewport in order to determine the streaming strategy.
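Claims 33-35 have the client rank preselection adaptation sets against a wished viewport using the quality signaling of claim 28. A minimal sketch follows; the dict layout (`"region_quality"` mapping a region id to a quality level, lower meaning better) is an assumed in-memory representation, and comparing levels across sets presumes the manifest indicates they lie on a common ordinal scale.

```python
def rank_preselections(preselections, viewport_region):
    """Order preselection adaptation sets so that sets whose highest-quality
    region coincides with the wished viewport come first, breaking ties by
    the quality level signaled for the viewport region (lower = better)."""
    def score(p):
        # locate the ROI of increased quality within this preselection's
        # output picture area (claim 35), then compare it with the viewport
        best_region = min(p["region_quality"], key=p["region_quality"].get)
        return (0 if best_region == viewport_region else 1,
                p["region_quality"].get(viewport_region, float("inf")))
    return sorted(preselections, key=score)
```

For example, a client wishing to view the southern region would pick the preselection whose high-quality ROI lies there, rather than one of equal bitrate whose quality peak lies elsewhere.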
36. Manifest file comprising a first parameter set for a region-wise compiled adaptation set defining a set of representations coinciding in a subdivision of a video picture area in regions, the representations having encoded thereinto the regions of the video picture area at different quality level tuples assigning a region-specific quality level to each region, the first parameter set comprising an adaptation set quality level indication for all regions, and for each representation, a representation-specific quality level indication, wherein, for each representation, the quality level tuple of the respective representation is derivable from a combination of the adaptation set quality level indication and the representation-specific quality level indication for the respective representation.
37. Manifest file of claim 36, wherein the quality level tuples of the representations vary such that a location of a region of highest quality among the regions varies over the representations.
38. Client device configured to inspect the manifest file of claim 36 and use the quality level tuples of the representations in a streaming strategy for adaptively streaming a video from a server.
39. Client device of claim 38 configured to use the quality level tuples of the representations in order to rank the representations with respect to a wished viewport.
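Claim 36 derives each representation's region-wise quality level tuple by combining an adaptation-set-level indication (one value per region) with a representation-specific indication. The sketch below models the representation-specific indication as a scalar offset; that choice is an illustrative assumption, since the claim leaves the form of the combination open.

```python
def quality_level_tuple(adaptation_set_levels, representation_offset):
    """Combine the adaptation-set quality indication (one level per region)
    with a representation-specific indication to obtain the representation's
    region-wise quality level tuple (claim 36)."""
    return tuple(level + representation_offset for level in adaptation_set_levels)
```

Signaling a shared per-region base plus one small per-representation value keeps the manifest compact: the full tuple never has to be repeated for every representation, yet a client (claim 38) can reconstruct it to rank representations against a wished viewport.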
40. File format descriptor for an extractor which indicates a compilation of a compiled bitstream out of subsets of bitstreams each associated with a different one of portions of a video picture area, with leaving freedom to select for each portion one bitstream out of the associated subset of bitstreams, wherein the file format descriptor comprises one or more parameters for each portion of the video picture area, indicating a quality level range covering quality levels signaled in the representations of the subset of representations assigned to the respective portion, or quality offsets between the quality levels signaled by the representations of different ones of the subsets of representations, and/or comprises an indication whether quality levels indicated in the representations are defined on a common ordinal scale so as to be ordinally scaled across different ones of the representations of different subsets, or the quality levels indicated by the representations are defined on separate ordinal scales, individual for the subsets; and/or comprises one or more parameters indicating, for each portion of the output picture area, a quality level hint for the respective portion and an indication whether the quality level hint for the respective portion and the quality levels indicated in the representations comprised by the subset associated with the respective portion, are defined on a common ordinal scale so as to be ordinally scaled thereacross, or the quality level hint and the quality levels indicated in the representations comprised by the subset associated with the respective portion are defined on separate ordinal scales, and/or comprises one or more parameters indicating, for the portions of the output picture area, quality ranking among the portions.
41. Method for downloading an ROI-specific video stream by tile-based video streaming, the method comprising: inspecting a manifest file so as to, depending on a region of interest, identify and download a set of bitstreams along with an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area; compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and decrypting a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the compiled bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the respective subpicture portion belongs.
42. Method for recovering a video stream from a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the method comprising: compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and decrypting a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the compiled bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the respective subpicture portion belongs.
43. Method for recovering a video stream from a bitstream which comprises sub-picture portions for different portions of a video picture area, the method comprising: decrypting a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled from outside for the respective subpicture portion.
44. Method for downloading an ROI-specific video stream by tile-based video streaming, the method comprising: inspecting a manifest file so as to, depending on an ROI, identify and download a set of bitstreams along with an extractor file, the set of bitstreams having encoded thereinto mutually different portions of a video picture area; compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and identifying a predetermined subpicture portion out of the subpicture portions of the compiled bitstream on the basis of signaling in at least one of the extractor file or the subpicture portions, decrypting a coding payload section of the predetermined subpicture portion of the subpicture portions of the compiled bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the predetermined subpicture portion belongs.
45. Method for recovering a video stream from a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the method comprising: compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a subpicture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and identifying a predetermined subpicture portion out of the subpicture portions of the compiled bitstream on the basis of signaling in at least one of the extractor file or the subpicture portions, decrypting a coding payload section of the predetermined subpicture portion of the subpicture portions of the compiled bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted to which the predetermined subpicture portion belongs.
46. Method for recovering a video stream from a bitstream which comprises sub-picture portions for different portions of a video picture area, the method comprising: identifying a predetermined subpicture portion out of the subpicture portions of the bitstream on the basis of signaling inbound from outside or signaling in the sub-picture portions, decrypting a coding payload section of the predetermined subpicture portion of the subpicture portions of the bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the predetermined subpicture portion, and/or using a bitstream length or pointer indication signaled from outside for the predetermined subpicture portion.
47. Method for operating a client device, comprising: inspecting a manifest file of claim 28 and changing, based on the quality level range and/or the indication, a streaming strategy in adaptively streaming a video from a server.
48. Method for operating a client device, comprising: inspecting the manifest file of claim 36 and using the quality level tuples of the representations in a streaming strategy for adaptively streaming a video from a server.
49. A non-transitory digital storage medium having a computer program stored thereon to perform the method for recovering a video stream from a set of bit streams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the method comprising: compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a sub-picture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and decrypting a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the compiled bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted which the respective subpicture portion belongs to, when said computer program is run by a computer.
50. A non-transitory digital storage medium having a computer program stored thereon to perform the method for recovering a video stream from a bitstream which comprises sub-picture portions for different portions of a video picture area, the method comprising: decrypting a coding payload section of each subpicture portion of a subset of one or more of the subpicture portions of the bitstream by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key by reinitializing the sequential variation for each subpicture portion to be decrypted and finding a border of the coding payload section of each subpicture portion to be decrypted by parsing the coding payload section of the respective subpicture portion up to a currently decrypted position and/or deriving a length of the coding payload section of the respective subpicture portion from a header within the respective subpicture portion, and/or using a bitstream length or pointer indication signaled from outside for the respective subpicture portion, when said computer program is run by a computer.
51. A non-transitory digital storage medium having a computer program stored thereon to perform the method for recovering a video stream from a set of bitstreams and an extractor file, the set of bitstreams having encoded thereinto different portions of a video picture area, the method comprising: compiling, using the extractor file, a compiled bitstream out of the set of bitstreams by extracting, from each of the set of bitstreams, a picture portion relating to a current picture frame by parsing the respective bitstream and forming the compiled bitstream out of the extracted picture portions so that the compiled bitstream comprises a sub-picture portion for the picture portion of each of the set of bitstreams the compiled bitstream is formed of; and identifying a predetermined subpicture portion out of the subpicture portions of the compiled bitstream on the basis of signaling in at least one of the extractor file or the sub-picture portions, decrypting a coding payload section of the predetermined subpicture portion of the subpicture portions of the compiled bitstream by finding a border of the coding payload section of the predetermined subpicture portion to be decrypted by parsing the coding payload section up to a currently decrypted position and/or deriving a length of the coding payload section from a header within the one predetermined subpicture portion, or using a bitstream length or pointer indication signaled within the bitstream from which the picture portion is extracted which the predetermined subpicture portion belongs to, when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
DETAILED DESCRIPTION OF THE INVENTION
[0076] The following description of embodiments relating to the first aspect of the present application preliminarily resumes the description of the handling of encryption relating to portioned or tile-based video streaming set out above in the introductory portion of the specification. To this end, possible modifications of the known techniques in the environment of MPEG are presented. These modifications thus represent embodiments of the first aspect of the present application, and they are abstracted hereinafter, as the modifications are not restricted to use in the MPEG environment but may advantageously be used elsewhere.
[0077] In particular, embodiments described further below enable content media encryption in tile-based video streaming systems across a wider set of available platforms in an efficient manner and overcome, in this regard, the shortcomings of the encryption schemes presented in the introductory portion of the specification. In particular, this encompasses tile-based streaming services with:
[0078] CTR based encryption of all sub-pictures
[0079] Encrypted media (CTR or CBC) with DASH Preselections
[0080] A first tool which is used in accordance with a subsequently described modifying embodiment, which allows for cbcs all-subsample encryption with preselection, is called the mandatory subsample identification concept or algorithm in the following. This algorithm makes it possible to use CBC based encryption schemes when preselections are used in the MPD. Common encryption [3] offers two ways to identify subsample boundaries and, hence, the byte ranges of encrypted and un-encrypted data, as reproduced for reference in the following: A decryptor can decrypt by parsing NAL units to locate video NALs by their type header, then parse their slice headers to locate the start of the encryption pattern, and parse their Part 15 NAL size headers to determine the end of the NAL and matching Subsample protected data range. It is therefore possible to decrypt a track using either (a) this algorithm, i.e. by parsing, ignoring the Sample Auxiliary Information, or (b) the Sample Auxiliary Information, ignoring this algorithm.
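The parsing-based identification quoted above can be sketched as follows; this is a minimal illustration only, assuming 4-byte Part 15 length prefixes and an HEVC-style NAL unit header, and the function name is ours rather than the specification's.

```python
def iter_nal_units(sample: bytes, nal_length_size: int = 4):
    """Yield (offset, total_size, nal_type) for each length-prefixed NAL unit
    in a sample, walking the Part 15 NALU length headers as the quoted
    algorithm describes."""
    pos = 0
    while pos + nal_length_size <= len(sample):
        length = int.from_bytes(sample[pos:pos + nal_length_size], "big")
        header_off = pos + nal_length_size
        # HEVC-style NAL unit type: bits 1..6 of the first NAL header byte
        nal_type = (sample[header_off] >> 1) & 0x3F
        yield pos, nal_length_size + length, nal_type
        pos = header_off + length
```

A decryptor can filter the yielded units by `nal_type` to locate video NALs, then parse their slice headers to find where the protected byte range starts.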
[0081] The Sample Auxiliary Information (SAI) consists of the two boxes saiz and saio defined in [4] that together indicate the location and ranges of the bytes of encrypted and un-encrypted data. However, in a tile-based streaming scenario with preselections, it is not possible to know the bitrate (and hence byte size) of each sub-picture/tile in the resulting client-side bitstream. Hence, it is not possible for the extractor track to include correct SAI beforehand.
[0082] Therefore, in accordance with embodiments described herein, it is signalled or mandated in an application format specification such as OMAF that, if present, the incorrect SAI parameters related to clear/protected byte ranges within the extractor track are not to be regarded and instead the above algorithm is to be used for derivation of the location and ranges of the bytes of encrypted and un-encrypted data.
[0083] In accordance with a first embodiment, this concept is used along with encrypting the video content portion/tile wise as described in the following.
[0084] In particular,
[0085] The data 10 further comprises at least one extractor 20, i.e. extractor data or extractor file or extractor track, associated with an ROI 22 of the video picture area, and a manifest file 24. The latter identifies, for the predetermined ROI 22, as illustrated by arrow 26, a set of bit streams 12, the set being composed of one bit stream 12 per subset 18 so as to have encoded thereinto the different portions 14 into which the video picture area 16 is partitioned in a manner focussing on the ROI 22. This focussing is done, for instance, by composing the set such that for subsets 18 within the ROI, the one bit stream out of this subset 18, which contributes to the composed set, is of higher quality compared to subsets 18 pertaining to portions 14 outside ROI 22, where the one bit stream selected out of corresponding subsets 18 and comprised by the ROI specific set is of lower quality. The set, thus formed by referencing 26 and indicated by manifest 24, is an ROI specific set of bit streams. An example is depicted in
[0086] Note that the bit streams 12 may, for instance, be formed by M independently coded tiles of N video data streams each having video picture area 16 encoded thereinto in units of these M tiles 14, but at different quality levels. Thus, N times M bit streams would result with
[0087] The bit streams 12 may be stored on a storage for being downloaded, in pieces and selectively, by a client as taught later on, and might be treated as individual representations in the MPD 24, which is also stored for download by the client and indicates to the client addresses for the download of the bit streams 12. The representations corresponding to bit streams 12 may, however, be indicated as not being dedicated for being played out individually, i.e. not for play-out without being part of an ROI specific set as formed by an adaptation set. The extractor 20 is also stored for download by the clients, either separately with addresses being indicated in the manifest 24, or along with any of the bit streams such as a track of a media file. In the further description herein, the extractor 20 has also been denoted as FF extractor file. The quality levels which the representations in one subset 18 relate to may vary in terms of, for instance, SNR and/or spatial resolution and/or colorfulness.
[0088] The extractor file 20 is quasi a constructor for constructing a compiled bit stream out of the ROI specific set. It may be downloaded by the client along with the ROI specific set of bit streams 12. It indicates, by way of pointers and/or construction instructions, a compilation of the compiled bitstream out of the ROI specific set of bitstreams by identifying 26, for each of the subsets 18 of bitstreams, out of the one bitstream of the respective subset 18 of bitstreams, comprised by the ROI specific set, a picture portion relating to a current picture frame and signalling a compilation of the compiled bitstream out of the identified picture portions so that the compiled bitstream comprises a sub-picture portion for the picture portion of the selected bitstream of each of the subsets 18 of bitstreams the compiled bitstream is formed of. In
[0089] Note that in case of preselection which
[0090] A coding payload section of the picture portion 34 of each bitstream 12 of each subset 18 of bitstreams is encrypted by using block-wise encryption by use of sequential variation of a plaintext mask and/or block-encryption key by reinitializing the sequential variation for each picture portion 34. That is, instead of encrypting the coding payload sections of the picture portions 34 of a collection of bit streams, the portions 14 of which together cover the picture area 16 and all belong to a common picture frame 30, sequentially without reinitializing the sequential variation therebetween, such as for the set 32, the encryption is done for each picture portion 34 separately.
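The reinitialization principle described in this paragraph can be illustrated with a small sketch. A real system would use AES in CTR ('cenc') or CBC ('cbcs') mode; here a SHA-256 based XOR keystream merely stands in so the sketch needs no crypto dependency, and all names are illustrative.

```python
import hashlib

BLOCK = 16  # block size in bytes, stand-in for the AES block size

def keystream_block(key: bytes, iv: bytes, counter: int) -> bytes:
    # Toy keystream generator standing in for one AES-CTR block
    return hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()[:BLOCK]

def encrypt_portion(payload: bytes, key: bytes, iv: bytes) -> bytes:
    """Encrypt one picture portion's payload section, restarting the counter
    at 0 (the 'reinitialization' the text describes), so each portion can be
    decrypted without access to the bytes of any other portion."""
    out = bytearray()
    for i in range(0, len(payload), BLOCK):
        ks = keystream_block(key, iv, i // BLOCK)
        chunk = payload[i:i + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)
```

Because the counter restarts per portion, a client that downloads only a subset of portions (e.g. the ROI-specific set) can still decrypt each one independently; XOR-based decryption is the same operation with the same key and IV.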
[0091] It should be noted that the encryption of the coding payload section may be restricted to picture portions 34 of bit streams 12 belonging to any of an encrypted set of one or more of the subsets 18 of bitstreams, such as to subsets 18 relating to portions 14 in the middle of picture area 16, or subsets 18 relating to every second portion 14 distributed over the area 16 like a checkerboard pattern, for instance.
[0092]
[0093] Thus, the ROI specific set 32 of bit streams, in its not yet decrypted form, and the extractor 20 together represent an encrypted video stream. The ROI specific set 32 of bitstreams 12 has encoded thereinto the portions 14 of video picture area 16, and the extractor 20 indicates the compilation of the compiled bitstream out of this set 32. The coding payload section 48 of the picture portion 34 of each bitstream 12 out of set 32 (or merely of the encrypted set of bitstreams thereamong) is encrypted by using the block-wise encryption using the sequential variation of plaintext mask and/or block-encryption key and by reinitializing the sequential variation for each picture portion.
[0094]
[0095] The DASH client 82 downloads and inspects the manifest file 24 so as to, depending on an ROI which is currently of interest because of, for instance, the user looking at the corresponding viewport, such as 22 in
[0096] The file handler 84 compiles, using the extractor file 20, the compiled bitstream 40 out of the ROI specific set 32 of bitstreams 12 by extracting, from each of these bitstreams, the picture portion 34 relating to a current picture frame 30 by parsing the respective bitstream and forming the compiled bitstream 40 out of the extracted picture portions 34 so that the compiled bitstream 40 is composed of the corresponding sub-picture portions 44, one for each portion 14. Note that at the time of receiving the bitstreams of ROI specific set 32, the picture portions' payload sections are still encrypted. The picture portions are, however, packetized so that the file handler is able to handle them though.
[0097] The decryptor 86 decrypts the encrypted coding payload section 48 of each subpicture portion 44 by using block-wise decryption by use of sequential variation of a plaintext mask and/or block-decryption key. To this end, the decryptor 86 reinitializes the sequential variation for each subpicture portion 44 to be decrypted, i.e. at the beginning 92 of concatenation 50 or the start border of the payload section 48 of the first unit 36. It finds the borders 54, 56 of the coding payload section(s) of each subpicture portion 44 to be decrypted by parsing the coding payload section of the respective subpicture portion 44 up to a currently decrypted position or, differently speaking, by alternatingly decrypting and parsing the payload section(s) of concatenation 50.
[0098] See, for instance,
[0099] Note that the payload data sections 48 were denoted video slice data in
[0100] In effect, the concatenation or combination of file handler 84 and decryptor 86 form an apparatus for recovering a video stream from a downloaded ROI specific set 32 of bit streams 12 and a corresponding extractor 20. The video stream may be fed into decoder 88, which may optionally be part of that apparatus or not. The file handler performs the compilation using the extractor file 20 and the decryptor 86 the decryption of the coding payload sections 48 using the alternating parsing/decryption concept of
[0101] The decryptor 86, in turn, represents an apparatus for recovering a video stream for being decoded by a decoder 88, from compiled bitstream 40, the apparatus being configured to decrypt the coding payload sections of each subpicture portion 44 using the alternating parsing/decryption concept of
[0102] Note that, as described, the parsing of the coding payload section 48 according to
[0103] The above embodiments enabled an encryption of all subsamples 44 downloaded.
[0104] However, in accordance with embodiments described next, encryption may be focused onto one sub-sample 44, for instance. Again, the above description of the introductory specification is initially resumed before presenting broadening embodiments. In particular, here, an index of an encrypted subsample is used for addressing, in a possibly alternating manner, single-subsample encryption (of, e.g., the one most important or highest-resolution subsample), wherein this is combinable with CTR or cbc1 encryption and the usage of preselections.
[0105] Based on the subsample identification algorithm illustrated in
[0108] To enable this subsample encryption, an index to the encrypted subsample is signalled so that the decryptor can identify the encrypted subsample 44. For instance, the decryptor may simply count through the subsamples 44 within a sample 42 until the decryptor reaches the signalled index of the encrypted subsample and, by way of gathering the NALU length from the Part 15 header and by identifying how many bytes to decrypt as taught with respect to
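The counting procedure just described may be sketched as follows, under the assumption of 4-byte Part 15 length prefixes; the function name and tuple layout are illustrative.

```python
def find_subsample(sample: bytes, index: int, nal_length_size: int = 4):
    """Count through the length-prefixed subsamples of a sample until the
    signalled index is reached, returning (offset, total_size) of that
    subsample, i.e. the range the decryptor must process."""
    pos = 0
    count = 0
    while pos < len(sample):
        length = int.from_bytes(sample[pos:pos + nal_length_size], "big")
        size = nal_length_size + length
        if count == index:
            return pos, size
        pos += size
        count += 1
    raise ValueError("signalled index lies beyond the last subsample")
```

With the offset and size in hand, the decryptor then identifies how many bytes within that range to decrypt, as described in the text.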
TABLE-US-00002
aligned(8) class SampleEncryptionBox extends FullBox(senc, version=0, flags) {
    unsigned int(32) sample_count;
    {
        unsigned int(Per_Sample_IV_Size*8) InitializationVector;
        if (flags & 0x000002) {
            unsigned int(16) subsample_count;
            {
                unsigned int(16) BytesOfClearData;
                unsigned int(32) BytesOfProtectedData;
            }[ subsample_count ]
        }
    }[ sample_count ]
}
[0109] One embodiment is a new version of the senc box that omits signaling of incorrect byte ranges and instead indicates indexes of encrypted subsamples is as follows.
TABLE-US-00003
aligned(8) class SampleEncryptionBox_Invention2 extends FullBox(senc, version, flags) {
    unsigned int(32) sample_count;
    {
        unsigned int(Per_Sample_IV_Size*8) InitializationVector;
        if (flags & 0x000002) {
            if (version == 0) {
                unsigned int(16) subsample_count;
                {
                    unsigned int(16) BytesOfClearData;
                    unsigned int(32) BytesOfProtectedData;
                }[ subsample_count ]
            } else if (version == 1) {
                unsigned int(32) EncryptedSubsampleIndex;
            }
        }
    }[ sample_count ]
}
[0110] Here, EncryptedSubsampleIndex points to the encrypted subsample 44 within the current sample 42.
[0111] The just described modification leads to embodiments which may be explained by referring to
[0112] Having said this,
[0113] The data downloaded according to the latter embodiment, represents a video stream, comprising the ROI specific set 32 of bit streams 12 and the extractor 20, wherein the ROI specific set 32 of bitstreams 12 has encoded thereinto the portions 14 of the video picture area, and the extractor 20 indicates the compilation of the compiled bitstream 40 out of the ROI specific set 32 of bitstreams 12 in the manner outlined above. The predetermined subpicture portion 44 in this compiled bitstream is identified out of the subpicture portions 44 of the compiled bitstream 40 by signaling contained in at least one of the extractor 20 or the sub-picture portions 44. The coding payload section of the predetermined subpicture portion is encrypted.
[0114] In line with above re-interpretation of
[0115] The decryptor 86, in turn, represents an apparatus for recovering the video stream from the bitstream 40, wherein the apparatus is configured to identify the encrypted subpicture portion 44 on the basis of signaling inbound from outside, namely from the file handler 84 which forwards this information as taken from signaling in the extractor 20, or itself from signaling in the sub-picture portions 44. It then performs the decryption of the coding payload section 48 of the encrypted subpicture portion 44 with forming the border detection of
[0116] The signaling may index or address the encrypted subsample 44 out of the subsamples of the current sample 42 of the compiled bitstream 40 in the form of its rank in the sample 42, so that the decryptor 86 may count the subsamples 44 in the current sample 42 to detect the nth subsample 44 in sample 42, with n being the rank indicated by the signaling.
[0117] The identification of the encrypted subpicture portion for several picture frames may be done in a manner such that the several picture frames contain picture frames 30 for which the encrypted subpicture portion 44 corresponds to different portions 14, and/or the several picture frames contain first picture frames for which there is exactly one encrypted subpicture portion 44 and second picture frames, interspersed between the first picture frames, for which no subpicture portion is identified to be the encrypted subpicture portion. That is, for some frames, no encryption may take place with respect to any portion 14.
[0118] Again, it is noted that all details having initially been described above with respect to
[0119] Without having explicitly mentioned it with respect to
[0120] Next, modifications of the above described embodiments are described which do not need the alternating decryption/parsing procedure for detecting the encrypted ranges 48. An extended SAI variant described next would likewise allow cbcs all-subsample encryption with preselection, but without the need to parse the slice header. According to the next variants, an explicit signaling or straightforward derivation of clear and protected data ranges within the extractor track is allowed.
[0121] First, a senc box extension using NAL lengths (i.e. extracted bytes) for derivation of encrypted byte ranges is described. As described before, the individual subsamples' sizes in the composed bitstream 32 may vary depending on the extracted data when preselection is used. The video bitstream structure may be used to derive encrypted byte ranges, specifically the Part 15 NALU length headers. One embodiment would be to define a second version of the box as follows:
TABLE-US-00004
aligned(8) class SampleEncryptionBox_Invention3.1 extends FullBox(senc, version, flags) {
    unsigned int(32) sample_count;
    {
        unsigned int(Per_Sample_IV_Size*8) InitializationVector;
        if (flags & 0x000002) {
            unsigned int(16) subsample_count;
            {
                if (version == 0) {
                    unsigned int(16) BytesOfClearData;
                    unsigned int(32) BytesOfProtectedData;
                } else if (version == 1) {
                    unsigned int(1)  WholeDataClear;
                    unsigned int(15) BytesOfClearData;
                }
            }[ subsample_count ]
        }
    }[ sample_count ]
}
[0122] In this embodiment, a simplification is assumed, which is that a subsample is to be equal to a NAL unit. The size of the subsample is determined by the NALULength. This is found at the first position (e.g. first 4 bytes) of the sample (this applies for the first subsample of the sample) and at position Pos_i = Sum{j=1 . . . i-1}(NALULength_j) for the remaining subsamples in the sample. The length of the BytesOfProtectedData is derived as the length of the subsample minus BytesOfClearData if WholeDataClear is not 1. If WholeDataClear is equal to 1, BytesOfProtectedData is inferred to be equal to 0 and BytesOfClearData (although in this case mandated to be signalled as 0 in the box/syntax) is inferred to be equal to the subsample length derived from the Part 15 NALU length header.
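Under the stated simplification (one subsample per NAL unit), the derivation of clear/protected byte ranges from the NALU length headers and the version-1 box fields might look as follows; the tuple layout and all names are our own illustration, not the box syntax.

```python
def derive_ranges(nalu_lengths, clear_entries, nal_length_size=4):
    """For each subsample, derive (offset, clear_bytes, protected_bytes).
    nalu_lengths: NALULength values taken from the Part 15 length headers.
    clear_entries: per-subsample (WholeDataClear, BytesOfClearData) pairs
    as signalled in the version-1 box sketched above."""
    ranges = []
    pos = 0
    for length, (whole_clear, clear) in zip(nalu_lengths, clear_entries):
        size = nal_length_size + length
        if whole_clear:
            # BytesOfClearData is inferred as the whole subsample size
            ranges.append((pos, size, 0))
        else:
            # protected range = subsample size minus the clear prefix
            ranges.append((pos, clear, size - clear))
        pos += size
    return ranges
```

The point of the scheme is that only `BytesOfClearData` (which is representation-independent under preselection) needs signalling, while the protected lengths follow from the extracted NALU sizes.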
[0123] That is, in accordance with all embodiments for apparatuses described above with respect to
[0124] Another option to avoid the alternating decryption/parsing border detection may be called CENC: an FF-senc inheritance box is used to inherit subsample sizes from any sub-picture track or bitstream 12 into the extractor track or the composed bitstream 40, respectively.
[0125] The aim of this option is to define an inheritance box that derives the subsample values from the dependent tracks (bitstreams 12 of set 32). The dependent tracks are signalled in the tref box in the moov box, i.e. the extractor 20. This information is used to get the samples from the dependent tracks, thereby becoming subsamples 44 of the composed bitstream 40. In a similar manner, the BytesOfProtectedData can be inherited from a box (e.g. senc box) of the dependent track with some hints (e.g. offsets how to find it), and the BytesOfClearData can be signalled in the inheritance box, since this is the same size and independent of the representation used when using Preselections. Hence, inheritance of the senc-relevant information from information signalled in the dependent tracks carrying the subsamples is allowed. Hints for gathering this information are signaled in the extractor 20. As illustrated in
[0126] That is, in accordance with all embodiments for apparatuses described above with respect to
[0127] Note that whatever border detection alternative is used, the client apparatus 10 may disregard explicit border location information in the extractor 20 which may be wrong and be there merely for standard conformance reasons, or, differently speaking, which might be in there, for instance, because it is mandatory according to the standard, but is not correct owing to the preselection-inherent freedom in selecting among representations within each adaptation set.
[0128] Next, possible extensions of the above described embodiments are presented. They may be referred to as ces2: CTR based encryption with subsample initialization vectors.
[0129] Here, a CTR based sub-picture encryption scheme is augmented with encryption metadata (i.e. means for allowing re-initialization of the encryption chain for each subsample with per-subsample initialization vector(s)) that allows independence of the encrypted data streams of the individual tiles.
[0130] A comparison approach which may be used for the CBC based cbcs scheme is to use one IV for all subsamples of the sample. This has the disadvantage of resulting in similar ciphertext blocks at the beginning of each subsample when the plaintext blocks are similar.
[0131] The presently discussed possibilities entail various modes for derivation of the varying per-subsample IVs on client side. First, the IVs can be explicitly signalled in a new version of the senc box as given below.
TABLE-US-00005
aligned(8) class SampleEncryptionBox_Invention4 extends FullBox(senc, version, flags) {
    unsigned int(32) sample_count;
    {
        if (version == 0) {
            unsigned int(Per_Sample_IV_Size*8) InitializationVector;
            if (flags & 0x000002) {
                unsigned int(16) subsample_count;
                {
                    unsigned int(16) BytesOfClearData;
                    unsigned int(32) BytesOfProtectedData;
                }[ subsample_count ]
            }
        } else if (version == 1) {
            if (flags & 0x000002) {
                unsigned int(16) subsample_count;
                {
                    unsigned int(Per_Sample_IV_Size*8) InitializationVectorPerSubsample;
                    unsigned int(16) BytesOfClearData;
                    unsigned int(32) BytesOfProtectedData;
                }[ subsample_count ]
            }
        }
    }[ sample_count ]
}
[0132] A further possibility is to derive the subsample IVs on client side based on a single signalled IV per sample, as in the existing senc box, but with an additional subsample-dependent offset. The offset in this case can either be
[0133] calculated via a numeric function (e.g. offset equals subsample_index*((2^(N*8)-1)/subsample_count) for an N byte counter), or
[0134] derived from the subsample_index-th entry of a prearranged pseudo-random sequence.
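The first of the two offset modes can be sketched as follows; the counter width N and all names are illustrative, and the modulo keeps the derived IV within the N-byte counter.

```python
def subsample_iv(sample_iv: int, index: int, subsample_count: int, n_bytes: int = 8) -> int:
    """Derive the IV of the index-th subsample from the single signalled
    per-sample IV: offset = subsample_index * ((2^(N*8) - 1) // subsample_count)
    for an N-byte counter, spacing the subsample counter ranges apart."""
    span = (2 ** (n_bytes * 8) - 1) // subsample_count
    return (sample_iv + index * span) % (2 ** (n_bytes * 8))
```

Spacing the per-subsample counter start points by `span` keeps the counter ranges of the individual subsamples from colliding, so each subsample's keystream stays distinct even under a single signalled IV.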
[0135] Summarizing, in the embodiments described above with respect to
[0136] The following description focuses on another aspect of the present application. In particular, here, the embodiments seek to overcome a problem associated with the usage of preselection adaptation sets, namely the problem that the combinational options offered by such preselection adaptation sets for the client by selecting one representation out of each picture-portion specific adaptation set assigned by this preselection adaptation set to each of regions of an output picture area, are difficult to understand in terms of the quality ranking between these combinational options as well as in terms of the overall location of the ROI within the circumference of the output picture area they correspond to. The following embodiments seek to overcome this problem. As done previously with respect to the encryption/decryption related embodiments, the following description starts with resuming the description set out in the introductory portion of the specification of the present application by way of presenting possible modifications of the techniques set out in the introductory portion. Later on, the embodiments represented by these modifications are then broadened by broadening embodiments.
[0137] In particular, to cope with the just-outlined problem one of the following solutions might be used:
First Embodiment
[0138] Add max_quality_ranking and min_quality_ranking attributes to the region-wise quality descriptor as shown in
Second Embodiment
[0139] Add a flag indicating that the scope of the quality values is only within the adaptation set, as shown in
[0140] It would be undesirable to have regions defined in the RWQR descriptor for which local_quality_ranking has different values, since it would be difficult to interpret the meaning of the qualities of different regions across representations. Therefore, it can be mandated that all RWQR descriptors within an adaptation set shall have the same value for local_quality_ranking. Alternatively, the signaling could be done outside the RWQR descriptor and added at the MPD (e.g. at Adaptation Set level).
Third Embodiment
[0141] Add the RWQR as a delta to a qualityRanking indicated for a representation.
[0142] It would be desirable to group all representations with the same viewport as focus within an AdaptationSet. Therefore, it is helpful to indicate for a given AdaptationSet which region is emphasized and to describe the quality relationships for each region. Such an indication can be used as a grouping mechanism. E.g. in
[0143] In this example we assume that the region of RWQR1 has a better quality than that of RWQR2, and the region-wise quality descriptors are used on the AdaptationSet level to signal that. The RWQR is therefore used to group the representations and indicate the quality relationship of the regions. This is done as a delta/offset to a quality ranking indicated for the representations themselves. Thus, the @qualityRanking attributes from all representations within the same AdaptationSet are used to compute the real quality values of the regions together with the region-wise quality ranking descriptors (RWQR1 and RWQR2).
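The delta/offset computation described here can be sketched as follows: the representation's @qualityRanking is combined with the per-region RWQR delta. The function and key names are illustrative, not taken from the descriptor syntax.

```python
def region_qualities(quality_ranking: int, rwqr_deltas: dict) -> dict:
    """Compute the real quality value of each region of a representation by
    applying the region-wise delta/offset from the AdaptationSet-level RWQR
    descriptors to the representation's own @qualityRanking."""
    return {region: quality_ranking + delta for region, delta in rwqr_deltas.items()}
```

For instance, a representation with @qualityRanking 3 in an AdaptationSet whose descriptors carry deltas 0 (RWQR1, the emphasized region) and 1 (RWQR2) would yield region qualities 3 and 4, preserving the same relationship across all representations of the set.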
[0144] An option could be to apply the described descriptor to tile-based streaming, in which case the dependencyIds would be used in such a way, that within the AdaptationSet where the region-wise quality ranking descriptors are located, all combinations of Representations and their @qualityRanking attributes have the same relationship (signalled delta in the proposed RWQR). For example, if RWQR1 and RWQR2 values define the delta/offset value of 1, qualityRanking attributes shall have the same relationship.
[0145] Obviously, the same approach can be used for other viewport dependent solutions. If the viewport dependency is achieved using a certain projection method, like for example in case of the Truncated Square Pyramid Projection (TSP) (see the example for the projection in
[0146] In order to describe certain broadening embodiments with respect to the just-outlined modification embodiments, reference is made to
[0147]
[0148] The manifest file 24, at least, comprises first parameter sets 202, namely one for each adaptation set 200. Each parameter set # i, 202, defines the corresponding scene-portion specific adaptation set # i, 200, by associating with this adaptation set # i a certain sub-group of representations 12 within one subset 18 so that the representations 12 within each such adaptation set 200 have encoded thereinto the same scene portion 14, but at different qualities. Each of these parameter sets 202 comprises a quality level, or a syntax element 204 indicating a quality level, for each representation 12 within the adaptation set which the respective parameter set defines. To this end, the parameter set 202 defining adaptation set # i has a quality level Q.sub.i(j) for each representation # j within that adaptation set i. This has also been depicted in
[0149] Besides, the manifest file 24 comprises parameter sets 206 which define preselection adaptation sets. Each preselection adaptation set 208 assigns to each of the regions of an output picture area one of the tile-specific adaptation sets 200. The preselection adaptation sets 208, thus defined, differ in the assignment of tile-specific adaptation sets 200 to the regions. Generally speaking, preselection adaptation sets are ROI-specific in that they, for instance, assign adaptation sets 200 of representations 12 of higher quality to a region or regions corresponding to the ROI, compared to the qualities of representations 12 of adaptation sets assigned to regions farther away from the ROI, or in that, for instance, they only collect adaptation sets 200 relating to regions at and around the ROI, leaving out regions farther away from the ROI. A problem exists, however, in that the client has to ascertain by itself, in a manner further outlined below, which ROI a specific preselection adaptation set relates to. The qualities 204 are not suitable to this end by themselves alone, as they are merely ordinally scaled within the same set 202 in which they are comprised.
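The manifest structures just described can be modeled as in the following sketch. All class and field names are hypothetical illustrations of the parameter sets 202/206 and are not prescribed by the specification; the quality levels are ordinal per set, matching the point made above.

```python
# Illustrative model (all names hypothetical) of the manifest structures:
# tile-specific adaptation sets carrying ordinally scaled quality levels,
# and preselection adaptation sets mapping output regions to those sets.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TileAdaptationSet:          # defined by a parameter set 202
    portion: str                  # scene portion 14 covered
    quality_levels: List[int]     # Q_i(j), one per representation 12

@dataclass
class PreselectionAdaptationSet:  # defined by a parameter set 206
    region_to_set: Dict[str, TileAdaptationSet]  # region 214 -> set 200

tiles_hq = TileAdaptationSet("tile0", [1, 2])
tiles_lq = TileAdaptationSet("tile1", [3, 4])
# ROI-specific preselection: higher-quality set at the ROI region,
# lower-quality set in the periphery.
presel = PreselectionAdaptationSet({"roi": tiles_hq,
                                    "periphery": tiles_lq})
```

Note that comparing `tiles_hq.quality_levels` against `tiles_lq.quality_levels` directly is exactly what the ordinal scaling forbids without further guidance data, which motivates the quality guidance described below.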
[0150] Generally, the mentioned regions and output picture area may correspond to a partitioning of the picture or scene area 16 into portions 14 using which the bitstreams 12 might have been obtained by tile-based encoding. Alternatively, the output picture area might rearrange and/or scale and/or rotate the portions 14, with this rearrangement and/or scaling and/or rotation possibly being indicated in the manifest file 24 as well, or the output picture area may only be composed of a proper subset of the portions 14. In order to ease the description of the main topics of the following embodiments, it shall preliminarily be assumed that the output picture area looks like the scene area 16 and that the portions 14 represent the regions 14 to each of which each preselection adaptation set 208 assigns one of the corresponding adaptation sets 200.
[0151] With respect to
[0152] Summarizing the description brought forward so far with respect to
[0153] To this end, each preselection adaptation set 206 comprises certain additional quality guidance data 218, namely guidance data 218 which makes it possible to mutually rank, in terms of quality, the picture-portion specific adaptation sets 200 assigned by the respective preselection adaptation set 206 to the regions 214, and which optionally may enable an even finer assessment of the mutual quality relationship between the representations 12 comprised by the picture-portion specific adaptation sets 200 assigned by a certain preselection adaptation set 206.
[0154] A first embodiment conveyed by the above description of modifications of the technique set out in the introductory portion of the specification of the present application, is described with respect to
[0155] In the above example of
[0156] Above all, the client may deduce where the ROI of a certain preselection adaptation set lies and may, accordingly, select, among several available preselection adaptation sets, one whose ROI coincides, for instance, with the current user's viewport.
[0157] A further embodiment which is derivable from the description of
[0158] In
[0159] In accordance with an even further embodiment, the guidance information 218 merely comprises the Q(i)'s without the indication 223 or the ranges 220. Even here, the client is able to determine the ROI of a certain preselection adaptation set 206 and, accordingly, to select a matching preselection adaptation set for a wanted viewport. In particular, a mere ranking between the assigned picture-portion specific adaptation sets 200, as realized by such a quality_ranking parameter Q(i), enables the client device 80 at least to correctly assess the general quality gradient across the area 216 to find the ROI.
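The ROI determination from the ranking alone can be sketched as below; the function names and the per-region data layout are hypothetical, assuming only that lower Q(i) means better quality, as stated above.

```python
# Hypothetical sketch of how a client device could locate the ROI of each
# preselection adaptation set from the per-region quality_ranking values
# Q(i) alone (lower = better), and pick the set matching a wanted viewport.

def find_roi(region_quality_ranking):
    """Return the region with the best (lowest) quality ranking,
    i.e. the peak of the quality gradient across the area."""
    return min(region_quality_ranking, key=region_quality_ranking.get)

def select_preselection(preselections, wanted_viewport_region):
    """Choose the preselection set whose inferred ROI coincides with
    the region covering the wanted viewport."""
    for name, rankings in preselections.items():
        if find_roi(rankings) == wanted_viewport_region:
            return name
    return None

presels = {
    "presel_left":  {"left": 1, "right": 3},
    "presel_right": {"left": 3, "right": 1},
}
chosen = select_preselection(presels, "right")  # "presel_right"
```

Only the ordering of the Q(i)'s is used here, never their absolute magnitudes, which is why the purely ordinal ranking suffices for ROI detection but not for finer cross-set quality comparisons.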
[0160] It should be noted that the indication 223 could be interpreted to signal the common ordinal scale 222 for all quality levels 204 of all picture-portion specific adaptation sets 200 coinciding in viewpoint, i.e. coinciding in the viewpoint from which the respective portion 14 of the video picture area 16 is captured and which is indicated, for instance, in the respective parameter set 202. This renders the following clear: as described above with respect to
[0161] The latter aspect that the quality guidance information 223 may alternatively be positioned in the manifest file 24 outside parameter sets 206 is indicated in
[0162] As an alternative to the description of
[0163] As already stated above, the existence of an extractor 20 is not mandatory for achieving the advantages described with respect to
[0164] The FF descriptor would come in addition and would indicate, e.g., whether all these quality values, residing in the different tracks stemming from different subsets 200 relating to different regions 214, are defined on the common scale 222 or on separate scales 224, or would indicate the ranges 220 on the common scale 222. The FF descriptor might be part of an initialization segment of the composed video stream downloaded by the client which is interested in the ROI associated with the extractor 20 to which the FF descriptor indicating the quality globality belongs: the file has, as mentioned, the referenced tracks 12 of set 32 in there, and the extractor 20. Each referenced track has its quality value in a local FF box/descriptor, for instance, and the FF descriptor/box outlined herein may be part of the initialization segment downloaded first by the client to obtain the settings of the file.
[0165] For sake of completeness, it shall be mentioned that for each picture-portion specific adaptation set 200, the corresponding first parameter set 202 may define a field of view information with respect to the picture portion 14 encoded into the representations of the respective picture-portion specific adaptation set. The second parameter set 206, in turn, may define a field of view information with respect to a collation of the regions 214, i.e. the field of view resulting from the overlay of all regions 214. If there are more than two second parameter sets 206 of respective preselection adaptation sets 208, as depicted in
[0166] The client device may, as described, inspect the manifest file 24 and change, based on the quality level range and/or the indication, a streaming strategy in adaptively streaming a video from a server. It may use the quality levels, quality level ranges, the quality level hints and/or the indication, in order to rank the preselection adaptation sets with respect to a wished viewport.
[0167] As explained with respect to
[0168] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
[0169] The inventive data signals such as data collections, video streams, manifest files, descriptors and the like can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0170] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
[0171] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
[0172] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
[0173] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
[0174] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
[0175] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
[0176] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
[0177] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
[0178] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[0179] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[0180] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
[0181] The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
[0182] The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.
[0183] The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
[0184] The methods described herein, or any components of the apparatus described herein, may be performed at least partially by hardware and/or by software.
[0185] While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
REFERENCES
[0186] [1] NIST, ADVANCED ENCRYPTION STANDARD (AES), 2001, online: http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf
[0187] [2] NIST, Recommendation for Block Cipher Modes of Operation, NIST Special Publication 800-38A, 2001 Edition, online: http://dx.doi.org/10.6028/NIST.SP.800-38A
[0188] [3] ISO/IEC 23001-7:2016, Information technology - MPEG systems technologies - Part 7: Common encryption in ISO base media file format files
[0189] [4] ISO/IEC 14496-12:2015, Information technology - Coding of audio-visual objects - Part 12: ISO base media file format
[0190] [5] ISO/IEC 14496-15:2017, Information technology - Coding of audio-visual objects - Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format
[0191] [6] ISO/IEC 23008-2:2013, Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 2: High efficiency video coding
[0192] [7] Byeongdoo Choi, Ye-Kui Wang, Miska M. Hannuksela, Youngkwon Lim (editors), OMAF DIS text with updates based on Berlin OMAF AHG meeting agreements, m40849, 2017-06-16
[0193] [8] ISO/IEC 23009-1:2014, Information technology - Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats