DERIVATION OF A VALUE FOR EACH LAYER REPRESENTATION OF A BITSTREAM
20230232011 · 2023-07-20
Assignee
Inventors
- Rickard Sjöberg (Stockholm, SE)
- Martin Pettersson (Vallentuna, SE)
- Mitra DAMIGHANIAN (Upplands-Bro, SE)
- Jacob Strom (Stockholm, SE)
- Zhi Zhang (Solna, SE)
- Jack Enhorn (Kista, SE)
- Ruoyang YU (Täby, SE)
- Du LIU (Solna, SE)
Cpc classification
H04N19/70
ELECTRICITY
International classification
Abstract
There is provided a method for processing a bitstream. The method comprises determining a value, N, wherein N identifies a number of ordered layer representations, wherein N is greater than or equal to 3 such that the N ordered layer representations comprise a highest layer representation, a second highest layer representation, and a third highest layer representation. The method further comprises determining a value for the highest layer representation. The method comprises, after determining the value for the highest layer representation and before determining a value for the third highest layer representation, determining a value for the second highest layer representation. The method comprises, after determining the value for the second highest layer representation, determining a value for the third highest layer representation.
Claims
1. A method for processing a bitstream, the method comprising: determining a value, N, wherein N identifies a number of ordered layer representations, wherein N is greater than or equal to 3 such that the N ordered layer representations comprises a highest layer representation, a second highest layer representation, and a third highest layer representation; determining a value for the highest layer representation; after determining the value for the highest layer representation and before determining a value for the third highest layer representation, determining a value for the second highest layer representation; and after determining the value for the second highest layer representation, determining a value for the third highest layer representation, wherein determining the value for the second highest layer representation comprises: determining whether the bitstream contains a first certain syntax element associated with the second highest layer representation; and if the bitstream contains said first certain syntax element associated with the second highest layer representation, then deriving a value from the first certain syntax element associated with the second highest layer representation and determining that the value for the second highest layer representation is the value derived from the first certain syntax element, otherwise determining that the value for the second highest layer representation is the value for the highest layer representation.
2. (canceled)
3. The method of claim 1, wherein determining the value for the third highest layer representation comprises: determining whether the bitstream contains a second certain syntax element associated with the third highest layer representation; and if the bitstream contains said second certain syntax element associated with the third highest layer representation, then deriving a value from said second certain syntax element associated with the third highest layer representation and determining that the value for the third highest layer representation is the value derived from said second certain syntax element, otherwise determining that the value for the third highest layer representation is the value for the second highest layer representation or the value for the highest layer representation.
4. The method of claim 3, wherein if the bitstream contains both said first certain syntax element associated with the second highest layer representation and said second certain syntax element associated with the third highest layer representation, then said first certain syntax element associated with the second highest layer representation precedes said second certain syntax element associated with the third highest layer representation in the bitstream.
5. The method of claim 1, wherein the method further comprises decoding a syntax element from the bitstream to produce a decoded value and setting a variable, V2, equal to the decoded value, and determining whether the bitstream contains the first certain syntax element associated with the second highest layer representation comprises determining whether V2 is equal to a predetermined value.
6. The method of claim 1, wherein the method further comprises decoding a syntax element from the bitstream to produce a decoded value and setting a variable, V3, equal to the decoded value, and determining whether the bitstream contains the second certain syntax element associated with the third highest layer representation comprises determining whether V3 is equal to a predetermined value.
7. The method of claim 5, wherein the syntax element used to set the variable V2 and/or the syntax element used to set the variable V3, is ptl_sublayer_level_present_flag[i].
8. The method of claim 1, wherein determining N comprises determining N based on a syntax element included in the bitstream.
9. The method of claim 1, wherein determining N comprises: determining N based on a syntax element included in a Video Parameter Set, VPS, included in the bitstream, or determining N based on a syntax element included in a Sequence Parameter Set, SPS, included in the bitstream.
10. The method of claim 8, wherein determining N is based on a syntax element that specifies a max temporal ID for a profile, tier and level (PTL) structure, or a syntax element that is used to determine the max number of sublayers.
11. The method of claim 1, wherein determining a value for the highest layer representation comprises deriving the value for the highest layer representation from a syntax element included in a parameter set included in the bitstream.
12. The method of claim 11, wherein the parameter set is a Sequence Parameter Set or a Video Parameter Set.
13. The method of claim 1, wherein each value for each layer representation is a level value.
14. The method of claim 1, wherein a layer representation is a temporal sub-layer representation.
15. The method of claim 1, further comprising: setting a first variable associated with the highest layer representation equal to the value determined for the highest layer representation; setting a second variable associated with the second highest layer representation equal to the value determined for the second highest layer representation; and setting a third variable associated with the third highest layer representation equal to the value determined for the third highest layer representation.
16. A non-transitory computer readable storage medium storing a computer program comprising instructions which when executed by processing circuitry of an apparatus causes the apparatus to perform the method of claim 1.
17-18. (canceled)
19. An apparatus, the apparatus comprising: processing circuitry; and a memory, said memory containing instructions executable by said processing circuitry for configuring the apparatus to perform a process comprising: determining a value, N, wherein N identifies a number of ordered layer representations, wherein N is greater than or equal to 3 such that the N ordered layer representations comprises a highest layer representation, a second highest layer representation, and a third highest layer representation; determining a value for the highest layer representation; after determining the value for the highest layer representation and before determining a value for the third highest layer representation, determining a value for the second highest layer representation; and after determining the value for the second highest layer representation, determining a value for the third highest layer representation, wherein determining the value for the second highest layer representation comprises: determining whether a bitstream contains a first certain syntax element associated with the second highest layer representation; and if the bitstream contains said first certain syntax element associated with the second highest layer representation, then deriving a value from the first certain syntax element associated with the second highest layer representation and determining that the value for the second highest layer representation is the value derived from the first certain syntax element, otherwise determining that the value for the second highest layer representation is the value for the highest layer representation.
20. The apparatus of claim 19, wherein determining the value for the third highest layer representation comprises: determining whether the bitstream contains a second certain syntax element associated with the third highest layer representation; and if the bitstream contains said second certain syntax element associated with the third highest layer representation, then deriving a value from said second certain syntax element associated with the third highest layer representation and determining that the value for the third highest layer representation is the value derived from said second certain syntax element, otherwise determining that the value for the third highest layer representation is the value for the second highest layer representation or the value for the highest layer representation.
21. The apparatus of claim 20, wherein if the bitstream contains both said first certain syntax element associated with the second highest layer representation and said second certain syntax element associated with the third highest layer representation, then said first certain syntax element associated with the second highest layer representation precedes said second certain syntax element associated with the third highest layer representation in the bitstream.
22. The apparatus of claim 19, wherein the process further comprises decoding a syntax element from the bitstream to produce a decoded value and setting a variable, V2, equal to the decoded value, and determining whether the bitstream contains the first certain syntax element associated with the second highest layer representation comprises determining whether V2 is equal to a predetermined value.
23. The apparatus of claim 19, wherein the process further comprises decoding a syntax element from the bitstream to produce a decoded value and setting a variable, V3, equal to the decoded value, and determining whether the bitstream contains the second certain syntax element associated with the third highest layer representation comprises determining whether V3 is equal to a predetermined value.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0043] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
[0050]
[0051]
DETAILED DESCRIPTION
[0052]
[0053]
[0054]
Embodiments
[0055] In the description below, various embodiments are described that solve one or more of the above-described problems. It is to be understood by a person skilled in the art that two or more embodiments, or parts of embodiments, may be combined to form new solutions which are still covered by this disclosure.
[0056] In the embodiments below, given an array X[i] that has N elements, increasing scan order is defined as a loop through each element in X[i] starting with i equal to 0 and ending with i equal to (N−1), and decreasing scan order is defined as a loop through each element in X[i] starting with i equal to (N−1) and ending with i equal to 0.
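The two scan orders defined above can be illustrated with a minimal sketch. The function names increasing_scan and decreasing_scan are hypothetical and are used only for this illustration.

```python
def increasing_scan(x):
    """Visit X[i] starting with i equal to 0 and ending with i equal to N-1."""
    for i in range(len(x)):
        yield i, x[i]


def decreasing_scan(x):
    """Visit X[i] starting with i equal to N-1 and ending with i equal to 0."""
    for i in range(len(x) - 1, -1, -1):
        yield i, x[i]


X = ["a", "b", "c"]
print(list(increasing_scan(X)))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(decreasing_scan(X)))  # [(2, 'c'), (1, 'b'), (0, 'a')]
```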
[0057]
[0058] Step s402 comprises deriving a value N, wherein the value N represents the number of layer representations that are present in the coded video bitstream, and wherein the value N is derived from a syntax element SYN1 in the coded video bitstream. The value N here may be a value that is larger than 2.
[0059] After step s402, N values L[i] for i=0 to N−1 are derived, wherein each value L[i] corresponds to the i-th layer representation in the coded video bitstream and specifies a level for the i-th layer representation, and wherein the values L[i] are derived in decreasing scan order from the highest layer representation (e.g. i equal to N−1) to the lowest layer representation (e.g. i equal to 0) (see steps s404 to s412).
[0060] In step s404, the value of L[N−1] is set equal to a particular value (denoted “G”). After step s404, steps s405 and s406 are performed, where in step s405 i is set equal to N−2 and in step s406 it is determined whether i is greater than or equal to zero. If i is greater than or equal to zero, the process proceeds to step s407; otherwise the process ends. Step s407 comprises determining whether a syntax element SYN[i] for the i-th layer representation is present in the coded video bitstream or not.
[0061] If the corresponding syntax element SYN[i] is determined to be present in the coded video bitstream, then step s408 is performed, otherwise step s410 is performed. Step s408 comprises deriving the value of L[i] by decoding the corresponding SYN[i] syntax element, wherein the syntax element SYN[i] represents a level value L[i] for the i-th layer representation (e.g., L[i] is set equal to SYN[i]). Step s410 comprises setting the value of L[i] equal to the value of L[i+1]. In an alternative version of this embodiment, step s410 is modified such that L[i] is set equal to L[j], where j>i and j<N, rather than setting L[i] equal to L[i+1]. Step s412 comprises decrementing i. After step s412 is performed, the process goes back to step s406.
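Steps s404 to s412 can be sketched as follows. This is an illustrative sketch only, not a normative implementation: the function name derive_levels and the dictionary syn (which maps an index i to the already decoded SYN[i] value when that syntax element is present) are assumptions made for this example, and the numeric level values are arbitrary.

```python
from typing import Dict, List


def derive_levels(n: int, g: int, syn: Dict[int, int]) -> List[int]:
    """Derive L[0..N-1] in decreasing scan order (hypothetical helper).

    n   -- number of layer representations (N)
    g   -- the particular value G used for the highest layer representation
    syn -- decoded SYN[i] values, keyed by i, for the elements that are present
    """
    if n < 1:
        raise ValueError("N must be at least 1")
    levels = [0] * n
    levels[n - 1] = g                    # step s404
    i = n - 2                            # step s405
    while i >= 0:                        # step s406
        if i in syn:                     # step s407: is SYN[i] present?
            levels[i] = syn[i]           # step s408
        else:
            levels[i] = levels[i + 1]    # step s410: inherit from L[i+1]
        i -= 1                           # step s412
    return levels


# Only SYN[1] is present; L[2] inherits from L[3] and L[0] inherits from L[1].
print(derive_levels(4, 51, {1: 35}))  # [35, 35, 51, 51]
```

Because L[i+1] is always final before L[i] is visited, a single pass in decreasing order is sufficient.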
[0062] In one embodiment, the particular value (i.e., denoted G above) is derived from a syntax element (denoted SYN2) in the coded video bitstream, and the value G is a general level value. The syntax element SYN2 may be a syntax element with the name general_level_idc. The syntax element may be present in an SPS and/or a VPS in the coded video bitstream.
[0063] In one embodiment, process 400 further includes deriving an ordered set of values V[i] (i=0 to i=(N−2)) from (N−1) syntax elements in the coded video bitstream, wherein each value V[i] corresponds to the i-th layer representation in the coded video bitstream and wherein each value V[i] indicates whether there is a syntax element SYN[i] for the i-th layer representation present in the coded video bitstream or not. In this embodiment, determining in step s407 whether a syntax element SYN[i] for the i-th layer representation is present in the coded video bitstream comprises determining the value of the corresponding value V[i]. In one embodiment, each of the (N−1) syntax elements is a one-bit flag. In the bitstream, the order of syntax elements may be such that all V[i] syntax elements precede (i.e., come before) any SYN[i] syntax elements. The order of the SYN[i] syntax elements must be such that when m is larger than n, SYN[m] precedes SYN[n] in the bitstream. In other words, the SYN[i] syntax elements are ordered in the bitstream in decreasing order of i. The syntax elements V[i] may be ordered in the bitstream in increasing or decreasing order of i.
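The ordering constraint described above can be made concrete with a small sketch that lists the syntax elements in the order in which they would appear in the bitstream. The function name syntax_element_order is hypothetical, and the flags are placed in increasing order of i here, which is only one of the two permitted choices.

```python
from typing import List


def syntax_element_order(v: List[int]) -> List[str]:
    """Return element names in bitstream order for presence flags V[i].

    All V[i] come first (increasing order of i is assumed here), followed by
    the SYN[i] elements that are present, in decreasing order of i.
    """
    order = [f"V[{i}]" for i in range(len(v))]
    order += [f"SYN[{i}]" for i in range(len(v) - 1, -1, -1) if v[i]]
    return order


print(syntax_element_order([1, 0, 1]))
# ['V[0]', 'V[1]', 'V[2]', 'SYN[2]', 'SYN[0]']
```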
[0064]
[0065] Step s502 comprises decoding a syntax element S1 from the bitstream and deriving a number N of layer representations present in the bitstream from S1 (e.g., S1 may be vps_ptl_max_temporal_id[i] or sps_max_sublayers_minus1).
[0066] Step s504 comprises decoding a syntax element S2 from the bitstream and deriving a general level value G from the decoded S2 value.
[0067] Step s506 comprises setting L[N−1] to G.
[0068] If N is equal to 1, then process 500 is finished, otherwise process 500 proceeds to step s510, where the variable i is set to 0. After step s510, steps s512, s514, and s516 are performed until i reaches N−1.
[0069] In step s512, for layer representation i, a syntax element S3 is decoded from the bitstream and a value V[i] is derived from the decoded value, where V[i] determines whether a syntax element S4 representing a level value for the respective layer representation (i.e., the i-th layer representation) is present in the bitstream or not. Step s514 comprises incrementing i by one (i.e., i=i+1). Step s516 comprises determining whether i=N−1. If i=N−1, process 500 proceeds to step s518, otherwise process 500 goes back to step s512. As illustrated in
[0070] Once all of the V[i] values are obtained, a level value L[i] for each of the layer representations (e.g., layer representation i) is derived in a single pass from the highest layer representation (i.e., i=N−1) to the lowest layer representation (i.e., i=0). That is, steps s518 to s528 are performed.
[0071] Step s518 comprises setting i=N−2.
[0072] Step s520 comprises determining if the value of the corresponding syntax element S3 specifies that a corresponding syntax element S4 is present in the bitstream. In one embodiment (shown in
[0073] Step s522 comprises decoding the corresponding S4 syntax element and deriving a level value L[i] for the layer representation from S4 (e.g., setting L[i] equal to the decoded corresponding S4 syntax element).
[0074] Step s524 comprises setting the level value L[i] for the layer representation equal to the level value for the closest higher layer representation (e.g., setting L[i] equal to L[i+1]).
[0075] Step s526 comprises setting i=i−1, and step s528 comprises determining whether i is greater than or equal to 0. If it is not, then process 500 finishes, otherwise process 500 goes back to step s520.
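Steps s502 to s528 can be sketched as a two-pass routine. In this illustrative sketch the bitstream is modeled, as a simplifying assumption, by an iterator that yields already decoded syntax element values in bitstream order (S1, S2, the S3 flags, then any S4 values). The function name process_500 is hypothetical, and N is taken to be the decoded S1 value plus one.

```python
from typing import Iterator, List


def process_500(values: Iterator[int]) -> List[int]:
    """Two-pass derivation: read all flags V[i] first, then the levels L[i]."""
    n = next(values) + 1              # s502: assumes N = S1 + 1
    g = next(values)                  # s504: general level value G from S2
    levels = [0] * n
    levels[n - 1] = g                 # s506
    if n == 1:
        return levels

    v = [0] * (n - 1)
    i = 0                             # s510
    while True:
        v[i] = next(values)           # s512: V[i] from S3
        i += 1                        # s514
        if i == n - 1:                # s516
            break

    i = n - 2                         # s518
    while True:
        if v[i]:                      # s520: is S4 present for layer i?
            levels[i] = next(values)  # s522: L[i] from S4
        else:
            levels[i] = levels[i + 1] # s524: closest higher layer
        i -= 1                        # s526
        if i < 0:                     # s528
            break
    return levels


# S1=2 (N=3), S2=51, V[0]=1, V[1]=0, then the single S4 value for layer 0.
print(process_500(iter([2, 51, 1, 0, 35])))  # [35, 51, 51]
```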
[0076] The number of layer representations (e.g. N) may be a number of temporal sublayer representations, and the level value L[i] may be a temporal sublayer level value.
[0077] In some embodiments, one or more of the steps of process 500 (e.g., steps s510 to s528) are performed by decoder 104 when decoder 104 is decoding the syntax structure profile_tier_level( ). The names for constant values and syntax table in the decoding steps can be mapped to the names in the syntax structure as follows in Table 6:
TABLE-US-00006 TABLE 6
Name in decoding steps | Name in syntax structure | Variable name in decoding steps
N | maxNumSubLayersMinus1 | N = maxNumSubLayersMinus1 + 1
S2 | general_level_idc | G
S3 | sublayer_level_present_flag[i] | V[i], i in range {0, ..., N − 2}, inclusive
S4 | sublayer_level_idc[i] | L[i], i in range {0, ..., N − 1}, inclusive
[0078] In one embodiment, the profile_tier_level( ) is defined as follows in Table 7:
TABLE-US-00007 TABLE 7
profile_tier_level( profileTierPresentFlag, maxNumSubLayersMinus1 ) {    Descriptor
 1.   if( profileTierPresentFlag ) {
 2.     general_profile_idc    u(7)
 3.     general_tier_flag    u(1)
 4.     general_constraint_info( )
 5.   }
 6.   general_level_idc    u(8)
 7.   if( profileTierPresentFlag ) {
 8.     num_sub_profiles    u(8)
 9.     for( i = 0; i < num_sub_profiles; i++ )
10.       general_sub_profile_idc[ i ]    u(32)
11.   }
12.   for( i = 0; i < maxNumSubLayersMinus1; i++ )
13.     sublayer_level_present_flag[ i ]    u(1)
14.   while( !byte_aligned( ) )
15.     ptl_alignment_zero_bit    f(1)
16.   for( i = maxNumSubLayersMinus1 − 1; i >= 0; i−− )
17.     if( sublayer_level_present_flag[ i ] )
18.       sublayer_level_idc[ i ]    u(8)
19. }
[0079] Comparing Table 7 to Table 5, one can see that row 16 of Table 7 differs from row 16 of Table 5. Specifically, in row 16 of Table 7, the variable i is initialized to (maxNumSubLayersMinus1−1) and is then decremented until it reaches −1, whereas in row 16 of Table 5, the variable i is initialized to 0 and then incremented until it reaches maxNumSubLayersMinus1. Accordingly, assuming that sublayer_level_present_flag[i] and sublayer_level_present_flag[i−1] are both set to 1 (i.e., the bitstream contains both sublayer_level_idc[i] and sublayer_level_idc[i−1]), syntax element sublayer_level_idc[i] precedes syntax element sublayer_level_idc[i−1] in the bitstream.
[0080] In this embodiment shown in Table 7, the semantics of sublayer_level_idc[i] is as follows:
[0081] sublayer_level_idc[i] indicates a level for the sublayer representation with TemporalId equal to i. When not present, the value of sublayer_level_idc[i] is inferred as follows: sublayer_level_idc[maxNumSubLayersMinus1] is inferred to be equal to general_level_idc of the same profile_tier_level( ) structure, and for i from maxNumSubLayersMinus1−1 to 0 (in decreasing order of values of i), inclusive, sublayer_level_idc[i] is inferred to be equal to sublayer_level_idc[i+1].
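Rows 12 to 18 of Table 7, together with the inference rule above, can be sketched at the bit level as follows. This is an illustrative sketch under stated assumptions: BitReader is a hypothetical minimal helper (not part of any codec library), general_level_idc is assumed to have been parsed already, and the reader is assumed to be positioned at row 12 on a byte boundary so that the example data stays short.

```python
from typing import List


class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, most significant bit first."""
        val = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            val = (val << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return val

    def byte_aligned(self) -> bool:
        return self.pos % 8 == 0


def parse_sublayer_levels(r: BitReader, general_level_idc: int,
                          max_num_sublayers_minus1: int) -> List[int]:
    m = max_num_sublayers_minus1
    present = [r.u(1) for _ in range(m)]   # rows 12-13: all flags first
    while not r.byte_aligned():            # rows 14-15
        r.u(1)                             # ptl_alignment_zero_bit
    level = [0] * (m + 1)
    level[m] = general_level_idc           # inferred for the highest sublayer
    for i in range(m - 1, -1, -1):         # row 16: decreasing order of i
        if present[i]:                     # row 17
            level[i] = r.u(8)              # row 18
        else:
            level[i] = level[i + 1]        # inferred from the next higher i
    return level


# Flags: present[0]=1, present[1]=0, six alignment bits, then one level byte.
reader = BitReader(bytes([0b10000000, 35]))
print(parse_sublayer_levels(reader, 51, 2))  # [35, 51, 51]
```

Because the explicitly signaled values arrive in decreasing order of i, each inferred value can be resolved in the same loop iteration in which its index is visited.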
[0082] In another embodiment, the profile_tier_level( ) is defined as follows in Table 8:
TABLE-US-00008 TABLE 8
profile_tier_level( profileTierPresentFlag, maxNumSubLayersMinus1 ) {    Descriptor
  if( profileTierPresentFlag ) {
    general_profile_idc    u(7)
    general_tier_flag    u(1)
    general_constraint_info( )
  }
  general_level_idc    u(8)
  if( profileTierPresentFlag ) {
    num_sub_profiles    u(8)
    for( i = 0; i < num_sub_profiles; i++ )
      general_sub_profile_idc[ i ]    u(32)
  }
  for( i = 0; i < maxNumSubLayersMinus1; i++ )
    sublayer_level_present_flag[ i ]    u(1)
  while( !byte_aligned( ) )
    ptl_alignment_zero_bit    f(1)
  for( i = 0; i < maxNumSubLayersMinus1; i++ )
    if( sublayer_level_present_flag[ maxNumSubLayersMinus1 − 1 − i ] )
      sublayer_level_idc[ maxNumSubLayersMinus1 − 1 − i ]    u(8)
}
[0083] In this embodiment shown in Table 8, the loop index variable (denoted with “i” in the syntax table above) starts from 0 and ends at the highest value, that is, it runs in increasing order. However, the check of the sublayer_level_present_flag[ ] values and the parsing of the sublayer_level_idc[ ] syntax elements are done in decreasing index order. Accordingly, this embodiment is equivalent to the embodiments disclosed above but expressed in an alternative way.
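The equivalence can be checked with a short sketch that lists the array indices visited by the final loop of each table. The function names table7_order and table8_order are hypothetical, and m stands for maxNumSubLayersMinus1.

```python
from typing import List


def table7_order(m: int) -> List[int]:
    """Indices visited when the loop variable itself counts down (Table 7)."""
    return list(range(m - 1, -1, -1))


def table8_order(m: int) -> List[int]:
    """Indices visited when i counts up and m - 1 - i is used (Table 8)."""
    return [m - 1 - i for i in range(m)]


for m in range(8):
    assert table7_order(m) == table8_order(m)

print(table8_order(4))  # [3, 2, 1, 0]
```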
[0084] In another embodiment, the example syntax tables are the same (or unchanged) compared to the current version of VVC. S1 is one syntax element in the decoding steps. In the example syntax tables, S1 is vps_ptl_max_temporal_id[i] or sps_max_sublayers_minus1 in the VPS and SPS, respectively. The value N is equal to the decoded value of the syntax element S1 plus one.
[0085] S1 is vps_ptl_max_temporal_id[i] in the VPS
TABLE-US-00009
video_parameter_set_rbsp( ) {    Descriptor
  ... ...
  vps_num_ptls_minus1    u(8)
  for( i = 0; i <= vps_num_ptls_minus1; i++ ) {
    if( i > 0 )
      vps_pt_present_flag[ i ]    u(1)
    if( !vps_all_layers_same_num_sublayers_flag )
      vps_ptl_max_temporal_id[ i ]    u(3)
  }
  while( !byte_aligned( ) )
    vps_ptl_alignment_zero_bit /* equal to 0 */    f(1)
  for( i = 0; i <= vps_num_ptls_minus1; i++ )
    profile_tier_level( vps_pt_present_flag[ i ], vps_ptl_max_temporal_id[ i ] )
  ... ...
TABLE-US-00010
seq_parameter_set_rbsp( ) {    Descriptor
  ... ...
  sps_video_parameter_set_id    u(4)
  sps_max_sublayers_minus1    u(3)
  sps_reserved_zero_4bits    u(4)
  sps_ptl_dpb_hrd_params_present_flag    u(1)
  if( sps_ptl_dpb_hrd_params_present_flag )
    profile_tier_level( 1, sps_max_sublayers_minus1 )
  ... ...
[0086]
[0087] Step s602 comprises decoding a syntax element S1 from the bitstream and deriving a number N of layer representations present in the bitstream from S1 (e.g., S1 may be vps_ptl_max_temporal_id[i] or sps_max_sublayers_minus1).
[0088] Step s604 comprises decoding a syntax element S2 from the bitstream and deriving a general level value G from the decoded S2 value.
[0089] Step s606 comprises setting L[N−1] to G.
[0090] If N is equal to 1, then process 600 is finished, otherwise process 600 proceeds to step s610.
[0091] Step s610 comprises setting the variable i to N−2. After step s610, step s612 to step s622 are performed until i reaches −1.
[0092] In step s612, for layer representation i, a syntax element S3 is decoded from the bitstream and a value V[i] is derived from the decoded value, where V[i] determines whether a syntax element S4 representing a level value for the respective layer representation (i.e., the i-th layer representation) is present in the bitstream or not.
[0093] Step s614 comprises determining if the value of the corresponding syntax element S3 specifies that a corresponding syntax element S4 is present in the bitstream. In one embodiment (shown in
[0094] Step s616 comprises decoding the corresponding S4 syntax element and deriving a level value L[i] for the layer representation from S4 (e.g., setting L[i] equal to the decoded corresponding S4 syntax element).
[0095] Step s618 comprises setting the level value L[i] for the layer representation equal to the level value for the closest higher layer representation (e.g., setting L[i] equal to L[i+1]).
[0096] Step s620 comprises setting i=i−1, and step s622 comprises determining whether i is equal to −1. If it is, then process 600 finishes, otherwise process 600 goes back to step s612.
[0097] In this embodiment, the profile_tier_level( ) is defined as follows in Table 9:
TABLE-US-00011 TABLE 9
profile_tier_level( profileTierPresentFlag, maxNumSubLayersMinus1 ) {    Descriptor
  if( profileTierPresentFlag ) {
    general_profile_idc    u(7)
    general_tier_flag    u(1)
    general_constraint_info( )
  }
  general_level_idc    u(8)
  if( profileTierPresentFlag ) {
    num_sub_profiles    u(8)
    for( i = 0; i < num_sub_profiles; i++ )
      general_sub_profile_idc[ i ]    u(32)
  }
  for( i = maxNumSubLayersMinus1 − 1; i >= 0; i−− ) {
    sublayer_level_present_flag[ i ]    u(1)
    if( sublayer_level_present_flag[ i ] )
      sublayer_level_idc[ i ]    u(8)
  }
  while( !byte_aligned( ) )
    ptl_alignment_zero_bit    f(1)
}
[0098] As Table 9 indicates, the two “For” loops shown in Table 5 are replaced by a single “For” loop.
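The single-loop form of Table 9 (steps s610 to s622 of process 600) can be sketched as follows, with each presence flag immediately followed by its level value when present. As before, this is an illustrative sketch: BitReader is a hypothetical minimal helper, general_level_idc is assumed to have been parsed already, and the reader is assumed to start at the single loop on a byte boundary.

```python
from typing import List


class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, most significant bit first."""
        val = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            val = (val << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return val

    def byte_aligned(self) -> bool:
        return self.pos % 8 == 0


def parse_interleaved_levels(r: BitReader, general_level_idc: int,
                             max_num_sublayers_minus1: int) -> List[int]:
    m = max_num_sublayers_minus1
    level = [0] * (m + 1)
    level[m] = general_level_idc        # s606: L[N-1] = G
    for i in range(m - 1, -1, -1):      # s610, s620, s622
        if r.u(1):                      # s612/s614: present flag for layer i
            level[i] = r.u(8)           # s616: sublayer_level_idc[i]
        else:
            level[i] = level[i + 1]     # s618: closest higher layer
    while not r.byte_aligned():         # trailing alignment bits
        r.u(1)                          # ptl_alignment_zero_bit
    return level


# Bits: flag[1]=0, flag[0]=1, level 35 (00100011), then six alignment bits.
reader = BitReader(bytes([0b01001000, 0b11000000]))
print(parse_interleaved_levels(reader, 51, 2))  # [35, 51, 51]
```

In this form no array of flags needs to be stored, since each flag is consumed in the same iteration in which it is read.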
[0099]
[0100] While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
[0101] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.