Packet format of network abstraction layer unit, and algorithm and apparatus for video encoding and decoding using the format, QoS control algorithm and apparatus for IPv6 label switching using the format
09853893 · 2017-12-26
CPC classification
H04N19/164 · H04L69/167 · H04N19/70 · H04N19/46 · H04N21/64792 · H04L69/16 · H04N19/132 · H04N21/234327 · H04N19/188 (all ELECTRICITY)
International classification
H04N19/132 · H04N19/46 · H04N21/647 · H04N21/845 · H04N19/70 · H04N21/2343 · H04N19/169 (all ELECTRICITY)
Abstract
A method of constructing a NALU (Network Abstraction Layer Unit) for IPv6 label switching, together with algorithms that use it for video encoding, QoS control, and decoding, is provided. According to an embodiment of the present invention, the NALU is composed of a NALH (Network Abstraction Layer Header) that includes a label, followed by the NAL (Network Abstraction Layer) payload. The label is determined from layer information, which is a combination of the spatial scalable level, the temporal scalable level, and the quality scalable level of the encoded data. The decoder uses the label to decide which of multiple decoding modules decodes the current NAL payload. Moreover, the label can be included in the packet header so that a MANE (Media Aware Network Element) can use it to decide whether to forward or drop the packet. For example, the label can support QoS control of a video service through the flow label field of the IPv6 packet header: the label in the NALH is inserted into the 20-bit flow label, which an IPv6 router reads to identify the priority of the video packet. According to the embodiment, the MANE envisioned by MPEG and the JVT (Joint Video Team) can be implemented effectively.
Claims
1. A method for delivering a packet by a packet generation apparatus, comprising the steps of: generating the packet including an identification information identifying a flow; signaling a mapping information between the identification information and at least one variable that characterizes the flow to a receiver or network intermediate node when a session starts or changes session information; and delivering the generated packet to the receiver or the network intermediate node, wherein the at least one variable indicates a scalable layer of one or more scalable layers to which the flow is related, each scalable layer of the one or more scalable layers being associated with a corresponding spatial level, a corresponding quality level, and a corresponding view number, and wherein any packets associated with a particular value of the variable are associated with a corresponding particular scalable layer and include pictures having a view, size, and quality corresponding to the particular scalable layer.
2. The method of claim 1, wherein the identification information is included in a packet header.
3. The method of claim 1, wherein the identification information is included in a flow label of a packet header.
4. The method of claim 1, wherein the mapping information is information made by mapping a combination of several variables characterizing one flow to an arbitrary number by each combination, in order to facilitate identification of the flow.
5. The method of claim 1, wherein the identification information includes information indicating significance of the flow.
6. The method of claim 5, wherein the network intermediate node determines whether or not to forward the packet according to significance information included in the identification information.
7. The method of claim 6, wherein the network intermediate node has a different priority related to discarding a packet corresponding to at least one condition of network-related condition including channel condition, terminal-related condition and user-related condition according to the significance information included in the identification information.
8. The method of claim 6, wherein the network intermediate node determines whether or not to forward using an extraction map, and the extraction map includes information indicating whether or not to forward the packet under a predetermined condition according to the extracted label.
9. The method of claim 5, wherein the network intermediate node performs QoS (Quality of Service) control using the significance information included in the identification information.
10. The method of claim 1, wherein the identification information includes information indicating whether the packet is included in a flow to which resources are allocated when the session starts.
11. The method of claim 1, wherein the packet includes a layer information.
12. The method of claim 11, wherein the layer information includes at least one of a spatial scalable level, a temporal scalable level, and a quality scalable level of the packet.
13. The method of claim 1, wherein the mapping information is exchanged when a session starts or changes session information, such that the network intermediate node or the receiver processes the packet by the flow.
14. The method of claim 1, wherein the packet includes a view identification information.
15. A method for receiving a packet by a packet reception apparatus, comprising the steps of: receiving a mapping information between an identification information identifying a flow and at least one variable that characterizes the flow, from a transmitter or a network intermediate node when a session starts or changes session information; receiving the packet including the identification information; and decoding the packet according to the identification information, wherein the at least one variable indicates a scalable layer of one or more scalable layers to which the flow is related, each scalable layer of the one or more scalable layers being associated with a corresponding spatial level, a corresponding quality level, and a corresponding view number, and wherein any packets associated with a particular value of the variable are associated with a corresponding particular scalable layer and include pictures having a view, size, and quality corresponding to the particular scalable layer.
16. The method of claim 15, wherein the decoding includes processing the packet by the flow using the identification information.
17. The method of claim 16, wherein the decoding includes designating a decoder module or a buffer by the flow, and decoding the packet through the decoder module or the buffer designated for the flow, according to the identification information.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
BEST MODE FOR CARRYING OUT THE INVENTION
(16) Embodiments of the present invention are explained in detail below with reference to the attached figures.
(18) According to
(19) In more detail, in an embodiment of the present invention, each combination of NAL type, Priority_id (priority), Dependency_id (spatial scalable level), Temporal_level (temporal scalable level), and Quality_level (quality scalable level) carried in the conventional NALH identifies one of a limited number of NALH sets, and is converted into a label that distinguishes those sets. That is, all NALHs with the same combination are mapped to a single label, one label per combination. Therefore, each distinct combination of NAL type, Priority_id, Dependency_id, Temporal_level, and Quality_level is assigned exactly one unique label.
(20) In this way, in an embodiment of the present invention, the SNALH (Shortened NALH) of the NALU carries a label that stands for the combination of values in the conventional NALH. This embodiment relies on the fact that only a finite number of NALH sets occurs during a video session or in a stored video sequence, so the same number of labels suffices in the SNALH. For example, if there are fewer than 250 NALH sets, the conventional NALH can be replaced by a 1-byte SNALH.
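The mapping in paragraphs (19) and (20) can be sketched as follows, assuming each conventional NALH has already been decoded into a tuple (nal_type, priority_id, dependency_id, temporal_level, quality_level). The tuple order and the function names are hypothetical, and labels are handed out in first-seen order, which is only one possible assignment.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical decoded NALH fields: (nal_type, priority_id, dependency_id,
# temporal_level, quality_level).
NalhKey = Tuple[int, int, int, int, int]


def build_label_map(headers: Iterable[NalhKey]) -> Dict[NalhKey, int]:
    """Map each distinct NALH combination to exactly one label."""
    labels: Dict[NalhKey, int] = {}
    for key in headers:
        if key not in labels:
            labels[key] = len(labels)  # next unused label, first-seen order
    return labels


def snalh_bytes_needed(num_sets: int) -> int:
    """Smallest whole number of bytes that can hold labels 0 .. num_sets - 1."""
    if num_sets < 1:
        raise ValueError("need at least one NALH set")
    bits = max(1, (num_sets - 1).bit_length())
    return (bits + 7) // 8


observed = [(20, 0, 0, 0, 0), (20, 0, 1, 0, 0), (20, 0, 0, 0, 0), (20, 1, 1, 1, 0)]
label_map = build_label_map(observed)
print(label_map)                           # three distinct sets -> labels 0, 1, 2
print(snalh_bytes_needed(len(label_map)))  # 1
```

One byte has 256 code points, so any count up to 256 fits; the text's example simply keeps the number of sets below 250. The field values in `observed` are illustrative only.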
(22) According to
(23) In more detail, in this embodiment of the present invention, the NALU type field of the conventional NALU format is first kept unchanged. Then, each combination of NAL type, Priority_id (priority), Dependency_id (spatial scalable level), Temporal_level (temporal scalable level), and Quality_level (quality scalable level) carried in the conventional NALH identifies one of a limited number of NALH sets, and is converted into a label that distinguishes those sets. That is, all NALHs with the same combination are mapped to a single label, one label per combination. Therefore, each distinct combination of NAL type, Priority_id, Dependency_id, Temporal_level, and Quality_level is assigned exactly one unique label.
(24) In this way, in an embodiment of the present invention, the SNALH of the NALU carries a label that stands for the combination of values (P, D, T, Q) in the conventional NALH. This embodiment relies on the fact that only a finite number of NALH sets occurs during a video session or in a stored video sequence, so the same number of labels suffices in the SNALH. For example, if there are fewer than 250 NALH sets, the conventional NALH can be replaced by a 1-byte SNALH.
(25) And according to the format in
(26) The encoder and the decoder use the mapping table as follows. Using the mapping table, the encoder generates a NALU carrying a label, in the format of the first or second embodiment of the present invention. The decoder then identifies the scalable layer of the NALU from the label in the NALH and the mapping table, and passes the remaining payload to the decoder module for the layer identified by the label.
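A minimal sketch of the decoder-side dispatch in paragraph (26), assuming the 1-byte SNALH (the label is the first byte of the NALU) and a mapping table keyed by (P, D, T, Q). The table contents, class name, and the list-based "decoder modules" are hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

Layer = Tuple[int, int, int, int]  # (P, D, T, Q), illustrative

# Illustrative mapping table shared by encoder and decoder: (P, D, T, Q) -> label.
MAPPING: Dict[Layer, int] = {
    (0, 0, 0, 0): 0,  # base layer
    (0, 1, 0, 0): 1,  # spatial enhancement
    (0, 1, 1, 0): 2,  # spatial + temporal enhancement
}


def encode_nalu(layer: Layer, payload: bytes) -> bytes:
    """Encoder side: a 1-byte label followed by the payload."""
    return bytes([MAPPING[layer]]) + payload


class LabelDispatchingDecoder:
    def __init__(self, mapping: Dict[Layer, int]) -> None:
        self.layer_of = {label: layer for layer, label in mapping.items()}
        # Stand-ins for the per-layer decoder modules: each just collects payloads.
        self.modules: Dict[Layer, List[bytes]] = {layer: [] for layer in mapping}

    def receive(self, nalu: bytes) -> Layer:
        label, payload = nalu[0], nalu[1:]
        layer = self.layer_of[label]         # identify the scalable layer
        self.modules[layer].append(payload)  # hand the rest to that layer's module
        return layer


decoder = LabelDispatchingDecoder(MAPPING)
print(decoder.receive(encode_nalu((0, 1, 0, 0), b"\x10\x20")))  # (0, 1, 0, 0)
```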
(28) In
(29) The video encoding module performs the encoding processes (transformation, quantization, etc.) that generate encoded data compliant with the SVC standard from the original image. The SVC encoding module includes a base layer encoding module and multiple enhancement layer encoding modules. As a result of encoding, the video encoding module also generates information representing the NAL type, Dependency_id (D) for the spatial scalable level, Temporal_level (T) for the temporal scalable level, and Quality_level (Q) for the quality scalable level.
(30) The NALU generator generates NALUs using the NALU type, D, T, and Q information provided by the video encoding module. At this stage, the priority (P) of each NALU can be defined, and more than one priority can be defined for a single combination of D, T, and Q. The NALU generator 120 can generate NALUs in the format depicted in
(31) The mapping module 122 determines the SNALH label from the P, D, T, and Q values. To do so, it uses either a stored mapping table or one generated for the particular video sequence or video session. The NALU composer 124 then composes a NALU from the encoded data provided by the video encoding module 110 and the label given by the mapping module 122, as depicted in
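The mapping module and NALU composer of paragraph (31) might look like the sketch below, assuming the 1-byte SNALH layout (label byte, then encoded data). The class and method names are hypothetical; when no stored table covers a combination, a new label is assigned on first use, which is one way to realize a table generated for the session.

```python
from typing import Dict, Optional, Tuple

Pdtq = Tuple[int, int, int, int]
MAX_LABELS = 250  # example bound from the text for a 1-byte SNALH


class NaluGenerator:
    """Hypothetical mapping module (label lookup) plus NALU composer."""

    def __init__(self, stored_table: Optional[Dict[Pdtq, int]] = None) -> None:
        # Start from a stored mapping table if one exists; otherwise the table
        # grows for this sequence or session as new combinations appear.
        self.table: Dict[Pdtq, int] = dict(stored_table or {})

    def label_for(self, p: int, d: int, t: int, q: int) -> int:
        key = (p, d, t, q)
        if key not in self.table:
            next_label = max(self.table.values(), default=-1) + 1
            if next_label >= MAX_LABELS:
                raise ValueError("label space for a 1-byte SNALH exhausted")
            self.table[key] = next_label
        return self.table[key]

    def compose(self, encoded: bytes, p: int, d: int, t: int, q: int) -> bytes:
        """NALU = 1-byte label followed by the encoded data."""
        return bytes([self.label_for(p, d, t, q)]) + encoded


gen = NaluGenerator(stored_table={(0, 0, 0, 0): 0})
print(gen.compose(b"\xaa", 0, 0, 0, 0).hex())  # 00aa
print(gen.compose(b"\xbb", 1, 0, 1, 0).hex())  # 01bb (new combination -> label 1)
```

Taking the next label as one past the largest existing label avoids collisions even when a stored table does not use contiguous labels.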
(33) The label parser 210 parses a label from a NALU (with the format in
(34) The forwarding decision module 220 decides whether to forward each NALU based on its extraction table and the label parsed by the label parser 210. The extraction table lists the labels whose bitstreams (NALUs) are allowed to be forwarded. Therefore, the extractor of this embodiment does not need to parse the P, D, T, and Q values of each NALU; it can decide whether to forward a NALU using only its label and the extraction table. For each label, a single bit can indicate whether it is forwarded.
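The one-bit-per-label decision in paragraph (34) can be sketched as a bitmask, assuming 1-byte labels carried as the first byte of each NALU. `ExtractionTable` and `extract` are hypothetical names; the point is that the forwarding test never looks at P, D, T, or Q.

```python
from typing import Iterable, List


class ExtractionTable:
    """One forward/drop bit per label, stored in a 256-bit mask for 1-byte labels."""

    def __init__(self, forwarded_labels: Iterable[int]) -> None:
        self.mask = 0
        for label in forwarded_labels:
            if not 0 <= label < 256:
                raise ValueError("label does not fit in one byte")
            self.mask |= 1 << label

    def forward(self, label: int) -> bool:
        return bool((self.mask >> label) & 1)


def extract(nalus: Iterable[bytes], table: ExtractionTable) -> List[bytes]:
    """Keep only NALUs whose first byte (the label) is marked for forwarding."""
    return [nalu for nalu in nalus if table.forward(nalu[0])]


table = ExtractionTable([0, 1])  # e.g. forward base layer and first enhancement only
stream = [b"\x00a", b"\x02b", b"\x01c"]
print(extract(stream, table))    # [b'\x00a', b'\x01c']
```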
(35) The extraction table can be received from the server or constructed in the NALU extractor (for example, a MANE). Labels in the extraction table may correspond to those in the encoder's mapping table. Depending on the conditions of the network, the terminal, and/or the user, more than one extraction table may be used. Moreover, in a multicast session, different NALU extractors can be used in different branches of the network.
(39) As another embodiment of the present invention, the decoder may have another NALU extractor explained in
(40) Hereinafter, the above-described embodiments of the present invention are explained from various viewpoints.
(41) <NALH Compression Method>
(42) Temporarily during a video session, or when a video sequence is stored, each 3- to 5-byte NALH of the JVT standard is mapped to and replaced by an SNALH (Shortened NALH). The encoder and the decoder use the same mapping table. When an encoded video sequence is stored, the mapping table must also be stored.
(43) In an embodiment of the present invention, to control QoS adaptively, the MANE should have an extraction table that determines extraction according to the labels of the NALUs. The extraction table can be received from the server or the client, or constructed by the MANE itself; in the latter case, the MANE must also have the above-mentioned mapping table. In the NALU format of this embodiment, each set of P, D, T, and Q values is replaced by a label, provided that the video session or stored video sequence consists of a finite number of NALU sets sharing the same P, D, T, and Q values. For example, if the number of sets is less than 250, the indicator for each set can be compressed into 8 bits.
(44) For example, suppose that a video sequence is encoded with 3 spatial scalable layers, each of which is scaled into 2 temporal scalable layers, each of which is scaled into 4 quality layers. Then, there are 24 (3×2×4) different layers, which can be identified by 24 different labels.
(45) If a layer consists of two NALU streams with different priority_id values, the layer needs two different labels. Here, ‘a stream of the same layer’ means a stream of NALUs with the same dependency_id, temporal_level, and quality_level (DTQ for short). Conversely, if one P value is used for different DTQ sets, each set should have its own label.
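The label count in paragraphs (44) and (45) can be checked with a short sketch: 3 spatial × 2 temporal × 4 quality layers give 24 labels, and a layer split into two priority_id streams gets one extra label. The function name and the `priorities` argument are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def assign_labels(num_d: int, num_t: int, num_q: int,
                  priorities: Optional[Dict[Tuple[int, int, int], List[int]]] = None
                  ) -> Dict[Tuple[int, int, int, int], int]:
    """Label every (P, D, T, Q) stream.

    A DTQ layer gets one label per priority_id it uses (default: a single P = 0).
    Because the key includes D, T and Q, a P value shared by different DTQ sets
    still yields a separate label for each set.
    """
    priorities = priorities or {}
    table: Dict[Tuple[int, int, int, int], int] = {}
    for d, t, q in product(range(num_d), range(num_t), range(num_q)):
        for p in priorities.get((d, t, q), [0]):
            table[(p, d, t, q)] = len(table)
    return table


print(len(assign_labels(3, 2, 4)))                       # 24
print(len(assign_labels(3, 2, 4, {(0, 0, 0): [0, 1]})))  # 25
```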
(46) The encoder and the decoder should share the mapping table between labels and DTQ sets. Although labels can be assigned arbitrarily, where possible it is better to construct labels that directly indicate the DTQ values of the H.264/AVC standard, so that they can be interpreted without the mapping table.
(47) 1 byte long SNALH
(48) Temporarily during a video session, or when a video sequence is stored, if the number of layers (or sets of NALHs) is finite, the same number of labels is determined and used as SNALHs. In an embodiment, assuming that the number is less than 250, the NALH can be compressed into 8 bits. An embodiment of the 1-byte SNALH is shown in
(49) 2 byte long SNALH
(50) To maintain compatibility with the H.264/AVC standard, the first byte is the same as the first byte of the NALH in that standard as of January 2006. The remaining NALH bytes are compressed into a 1-byte label. An embodiment of the 2-byte SNALH is shown in
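A sketch of the 2-byte SNALH in paragraph (50), assuming the first byte follows the H.264/AVC NAL unit header layout (forbidden_zero_bit: 1 bit, nal_ref_idc: 2 bits, nal_unit_type: 5 bits) and the second byte is the label. The function names are hypothetical, and the nal_unit_type value 20 in the usage line is only illustrative.

```python
from typing import Dict


def make_2byte_snalh(nal_ref_idc: int, nal_unit_type: int, label: int) -> bytes:
    """First byte: H.264/AVC NAL header byte; second byte: the label."""
    if not (0 <= nal_ref_idc < 4 and 0 <= nal_unit_type < 32 and 0 <= label < 256):
        raise ValueError("field out of range")
    first = (nal_ref_idc << 5) | nal_unit_type  # forbidden_zero_bit left as 0
    return bytes([first, label])


def parse_2byte_snalh(header: bytes) -> Dict[str, int]:
    first, label = header[0], header[1]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
        "label": label,
    }


hdr = make_2byte_snalh(nal_ref_idc=3, nal_unit_type=20, label=7)
print(hdr.hex())  # 7407
print(parse_2byte_snalh(hdr))
```

Keeping the standard first byte lets a legacy parser still read nal_unit_type, while only the second byte depends on the mapping table.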
(51) Extended usage of SNALH
(52) ‘Extended usage of SNALH’ means using the SNALH for purposes other than those of the above-mentioned embodiments of the present invention. There are four kinds: ‘NALH extension,’ ‘scalable methodology extension,’ ‘media extension,’ and ‘protocol extension.’
(53) The ‘NALH extension’ method keeps the conventional NALH format and adds the SNALH. That is, in
(54) ‘Scalable methodology extension’ means allowing more DTQ combinations than the conventional standard permits. Since D, T, and Q are represented by 3, 3, and 2 bits in the conventional standard, the numbers of possible levels are 8, 8, and 4, respectively. In an embodiment of the present invention, because labels are defined by the mapping table, the number of layers can be extended freely as long as the entities participating in a service agree.
(55) ‘Media extension’ means applying the labels proposed in an embodiment of the present invention to media streams other than video, such as audio streams. In an MVC (Multi-View Video Coding) session, labels can be assigned to views. In addition, labels can be assigned to streams of FEC (Forward Erasure Correction) packets for individual views or layers. If the network is lossy, the parity packet streams are forwarded; in a loss-free network, they are discarded.
(56) ‘Protocol extension’ means applying the above-mentioned labels to other protocols, with or without slight modification. As an embodiment, the SNALH in
Method to Guarantee Compatibility with H.264/AVC
(58) The SNALH in
(59) Compatibility to SEI Messages
(60) The Scalability_info and layer_ID delivered in SEI (Supplemental Enhancement Information) messages can be used as labels. In this case, the encoder and the decoder do not need separate labels.
(61) <Implementation of the Encoder>
(62) The decoder must be informed that the encoder uses the SNALH. This can be signaled in the SPS (Sequence Parameter Set). When the encoded sequence is stored, the indication that the SNALH is used should be stored with it.
(63) Storing and Transmitting the Label Mapping Table
(64) As many labels are assigned as there are streams (sets of NALUs) distinguished in a video sequence. The relationship mapping streams to labels constitutes the mapping table, which may assign labels to all or only some of the NALH sets. In an embodiment, this information can be delivered to the decoder in control data such as the SPS or the PPS (Picture Parameter Set). In an embodiment, labels can be serial numbers of SVC layers or MVC views.
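Paragraph (64) leaves the carriage format of the mapping table open. The sketch below uses a made-up byte layout (an entry count, then label, P, D, T, Q, one byte each) purely to show the round trip; it is not the SPS or PPS syntax, and both function names are hypothetical.

```python
from typing import Dict, Tuple

Pdtq = Tuple[int, int, int, int]


def serialize_mapping(table: Dict[Pdtq, int]) -> bytes:
    """Hypothetical layout: entry count, then (label, P, D, T, Q) one byte each."""
    if len(table) > 255:
        raise ValueError("too many entries for a 1-byte count")
    out = bytearray([len(table)])
    for (p, d, t, q), label in sorted(table.items(), key=lambda item: item[1]):
        out += bytes([label, p, d, t, q])
    return bytes(out)


def deserialize_mapping(data: bytes) -> Dict[Pdtq, int]:
    count = data[0]
    table: Dict[Pdtq, int] = {}
    for i in range(count):
        label, p, d, t, q = data[1 + 5 * i: 6 + 5 * i]
        table[(p, d, t, q)] = label
    return table


mapping = {(0, 0, 0, 0): 0, (0, 1, 0, 0): 1, (1, 1, 1, 0): 2}
blob = serialize_mapping(mapping)
print(len(blob))                             # 16 = 1 + 3 * 5
print(deserialize_mapping(blob) == mapping)  # True
```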
(65) Generation of NALU
(66) In every NALU, the conventional long NALH is replaced by the SNALH.
(67) <Implementation of the Decoder>
(68) The decoder must be informed that the encoder uses the SNALH, which can be signaled in the SPS. When an encoded sequence is played back, this indication must be read and the SNALH parsed accordingly.
(69) Constructing the Label Mapping Table
(70) During decoder initialization, the mapping provided by the encoder is read. The decoder initializes as many video decoding modules as there are streams to be decoded and associates each module with its label. At this point, the reference relationships between SVC layers and between MVC views are also established.
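Decoder initialization as described in paragraph (70) can be sketched as follows: one module per label, plus a reference link from each stream to the stream it predicts from, so that the decoding order of a layer or view can be derived. `DecodingModule`, `init_decoder`, `decoding_order`, and the `references` dictionary are hypothetical; the text does not specify how reference relationships are signaled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DecodingModule:
    label: int
    stream: Tuple[int, ...]          # e.g. (D, T, Q) of an SVC layer, or (view_id,)
    reference: Optional[int] = None  # label of the stream this one predicts from
    payloads: List[bytes] = field(default_factory=list)


def init_decoder(mapping: Dict[int, Tuple[int, ...]],
                 references: Dict[int, int]) -> Dict[int, DecodingModule]:
    """Create one module per label and check that every reference resolves."""
    modules = {label: DecodingModule(label, stream, references.get(label))
               for label, stream in mapping.items()}
    for module in modules.values():
        if module.reference is not None and module.reference not in modules:
            raise ValueError(f"label {module.label} references unknown label {module.reference}")
    return modules


def decoding_order(modules: Dict[int, DecodingModule], label: int) -> List[int]:
    """Labels that must be decoded to reconstruct `label`, base stream first."""
    chain: List[int] = []
    current: Optional[int] = label
    while current is not None:
        if current in chain:
            raise ValueError("cyclic reference")
        chain.append(current)
        current = modules[current].reference
    return chain[::-1]


modules = init_decoder({0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 0, 1)}, references={1: 0, 2: 1})
print(decoding_order(modules, 2))  # [0, 1, 2]
```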
(71) NALU Processing
(72) Upon receiving a NALU, the decoder parses the SNALH as in
(73) <Implementation of the MANE (Media Aware Network Element)>
(74) In an embodiment of the present invention, the MANE decides which streams to forward based on its extraction table. Therefore, the MANE does not need to evaluate P, D, T, and Q values; it decides whether to discard a NALU using the 1-bit entry for its label in the extraction table. The extraction table is received from the server or the client, or generated in the MANE. The MANE may hold multiple extraction tables, one for each distinguishable condition of the network, terminals, and users. For a multicast session, different extraction tables are placed in different network branches, as shown in
(75) Construction of the Extraction Table
(76) During the initialization period of a video session, the MANE is informed that the server uses the SNALH. To obtain an extraction table, the MANE may receive it from the server or the client, or may generate it itself using control information received from intelligent routers or the base station of a wireless network. The types and range of layers or views to be forwarded are determined by the user's preference, the terminal capability, and the network condition. The terminal capability includes display resolution and allowable channel bandwidth. Even during a service session, the extraction policy in the extraction table can be modified adaptively as the user's preference, terminal capability, and network condition change over time. It is desirable that this decision policy be received from the server, which may send a modified extraction table to the MANE.
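Paragraph (76) names the inputs to the extraction table (user preference, terminal capability, network condition) but not a policy. The sketch below is one hypothetical policy: keep layers within the terminal's spatial level and the user's temporal level, adding them from the lowest (D, T, Q) upward until the available bandwidth would be exceeded. The per-label bitrates and all names are assumptions for illustration.

```python
from typing import Dict, Set, Tuple


def build_extraction_table(layers: Dict[int, Tuple[int, int, int]],
                           bitrate_kbps: Dict[int, float],
                           max_d: int, max_t: int,
                           bandwidth_kbps: float) -> Set[int]:
    """Return the set of labels to forward under a hypothetical policy."""
    # Terminal capability caps D; user preference caps T.
    candidates = sorted((dtq, label) for label, dtq in layers.items()
                        if dtq[0] <= max_d and dtq[1] <= max_t)
    forwarded: Set[int] = set()
    total = 0.0
    for _, label in candidates:
        if total + bitrate_kbps[label] > bandwidth_kbps:
            break  # stop here: later layers are assumed to build on this one
        forwarded.add(label)
        total += bitrate_kbps[label]
    return forwarded


layers = {0: (0, 0, 0), 1: (0, 0, 1), 2: (1, 0, 0), 3: (0, 1, 0)}
rates = {0: 100.0, 1: 50.0, 2: 200.0, 3: 80.0}
print(sorted(build_extraction_table(layers, rates, max_d=0, max_t=1, bandwidth_kbps=200)))  # [0, 1]
```

Re-running the function with new inputs when the channel condition or the client's request changes corresponds to the adaptive modification described in the text.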
(77) As an example in
(78) It is desirable that the server send a modified extraction table if the client requests a change of service level during the service session. It is desirable that the MANE modify the extraction table if the channel (or network) condition changes by a certain amount. However, if the MANE is not intelligent enough to perform the modification, the server or the client, which monitors the channel condition (available bitrate, packet loss ratio), sends a modified extraction table to the MANE.
(79) Extraction Procedure
(80) If the SNALH is used, the MANE makes extraction decisions based on the labels in the SNALH. However, this approach burdens L3 (network layer) routers with application-layer data such as video, and violates the principle of independence between protocol layers. Therefore, it is desirable to insert the above-mentioned label into the flow label field of the IPv6 header so that routers only need to evaluate IP headers to make extraction decisions.
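The IPv6 header begins with a 32-bit word holding the 4-bit version, the 8-bit traffic class, and the 20-bit flow label. The sketch below assumes the SNALH label is copied directly into the flow label (the text does not specify the encoding), and shows that a router can recover it from the first four header bytes alone. Function names are hypothetical.

```python
import struct


def ipv6_first_word(flow_label: int, traffic_class: int = 0) -> bytes:
    """First 32 bits of an IPv6 header: version (4) | traffic class (8) | flow label (20)."""
    if not 0 <= flow_label < (1 << 20):
        raise ValueError("flow label must fit in 20 bits")
    if not 0 <= traffic_class < (1 << 8):
        raise ValueError("traffic class must fit in 8 bits")
    word = (6 << 28) | (traffic_class << 20) | flow_label
    return struct.pack("!I", word)


def flow_label_of(ipv6_header: bytes) -> int:
    """What a router reads: the low 20 bits of the first header word."""
    (word,) = struct.unpack("!I", ipv6_header[:4])
    return word & 0xFFFFF


snalh_label = 7  # label taken from the SNALH (assumed 1:1 copy)
hdr = ipv6_first_word(snalh_label)
print(hdr.hex(), flow_label_of(hdr))  # 60000007 7
```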
(82) In more detail, for SVC if the conventional method is used as shown in
(83) QoS Control for IPv6 Label Switching
(84) When video data encoded with MPEG-2 or H.264 is transmitted, video packets differ in significance. To control the quality of a video service effectively, the network must distinguish packets by significance. IPv6 routers can do so in two modes: packet switching and label switching. In packet switching, the routers must read the 5-tuple (destination address, source address, destination port number, source port number, and protocol) to select the predefined service policy for each packet. The 5-tuple is shown as the shaded region in
(85) In an embodiment of the present invention, label switching is used to eliminate this problem. During the initialization period of a session, temporary labels are assigned and used only for that session. The label is inserted in the flow label field, which is the shaded region in
(86) In label switching, the label carries both path information and resource information for each packet stream. After call setup, the path and significance of every packet are identified from the label. Conventional packet switching requires the router to read about 600 bits (including the IPv6 header and the video layer header), whereas in an embodiment of the present invention, the router can identify path and significance by evaluating only the 20-bit flow label.
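A router-side sketch of paragraph (86): after call setup, a table keyed only by the 20-bit flow label gives the path (output port) and the significance of the stream, so no 5-tuple or video header is parsed. The table contents, the significance scale, and the congestion rule are hypothetical; unknown labels are simply dropped in this sketch.

```python
from typing import Dict, NamedTuple, Optional


class FlowEntry(NamedTuple):
    out_port: int      # path information installed at call setup
    significance: int  # resource/priority information (higher = more important, assumed)


def label_switch(flow_table: Dict[int, FlowEntry], flow_label: int,
                 congestion_level: int) -> Optional[int]:
    """Return the output port for a packet, or None to drop it."""
    entry = flow_table.get(flow_label)
    if entry is None:
        return None  # unknown label: this sketch simply drops it
    if entry.significance < congestion_level:
        return None  # under congestion, shed the less significant streams first
    return entry.out_port


table = {0x00007: FlowEntry(out_port=3, significance=2),  # e.g. base layer
         0x00008: FlowEntry(out_port=3, significance=0)}  # e.g. enhancement layer
print(label_switch(table, 0x00007, congestion_level=1))   # 3
print(label_switch(table, 0x00008, congestion_level=1))   # None
```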
(88) Label Swapping
(89) In an architecture with multiple MANEs along the transmission path, each MANE may use different labels. In this case, the preceding MANE may swap its labels for those used by the following MANE. Alternatively, the swapping could be performed in the server before transmission, after it parses every NALU.
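Label swapping between consecutive MANEs, as in paragraph (89), amounts to rewriting each label through a per-hop translation table. The sketch assumes the 1-byte SNALH (label in the first byte); the swap table and function name are hypothetical.

```python
from typing import Dict, Iterable, List


def swap_labels(nalus: Iterable[bytes], swap_table: Dict[int, int]) -> List[bytes]:
    """Rewrite each 1-byte label into the label space of the next MANE."""
    swapped = []
    for nalu in nalus:
        label, payload = nalu[0], nalu[1:]
        swapped.append(bytes([swap_table[label]]) + payload)
    return swapped


# Upstream labels 0..2 correspond to downstream labels 10..12 (illustrative).
upstream_to_downstream = {0: 10, 1: 11, 2: 12}
out = swap_labels([b"\x00A", b"\x02B"], upstream_to_downstream)
print([n.hex() for n in out])  # ['0a41', '0c42']
```

The same translation could be applied to the IPv6 flow label instead of the SNALH byte if the labels are carried there.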
(90) The present invention has been explained with reference to the embodiments which are merely exemplary. It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
INDUSTRIAL APPLICABILITY
(91) The present invention is useful in the image processing industry, including video encoding and decoding. It can also be used for transmitting encoded video over telecommunication networks, and it is especially useful for packet-switched networks.