Video transmission device and video transmission method
11343520 · 2022-05-24
Assignee
Inventors
Cpc classification
H04H20/28
ELECTRICITY
H04N21/8455
ELECTRICITY
H04N21/234309
ELECTRICITY
H04N21/234327
ELECTRICITY
H04N21/23605
ELECTRICITY
H04N21/23608
ELECTRICITY
H04N19/114
ELECTRICITY
H04N19/188
ELECTRICITY
International classification
H04N19/114
ELECTRICITY
H04N19/169
ELECTRICITY
Abstract
The present disclosure aims to provide a method for detecting a GOP boundary of an encoded bit stream of each layer and associating GOPs of the layers for hierarchical transmission in a video transmission device that transmits a hierarchically encoded bit stream. The present disclosure provides a video transmission device and a video transmission method that detect a GOP head access unit in a base layer of a hierarchically encoded bit stream by analyzing the base layer and detect a head access unit of an enhancement layer of an identical GOP to that of the aforementioned access unit from a decoding time stamp of the access unit by using the relationship between a decoding time stamp of the base layer and a decoding time stamp of the enhancement layer.
Claims
1. A video transmission device for transmitting a hierarchically encoded bit stream to which layers are multiplexed with different identifiers comprising: an input processing unit configured to extract access units from a group-of-pictures (GOP) constituting a hierarchically encoded bit stream and impart a decoding time stamp to each extracted access unit; a GOP number imparting unit configured to detect a GOP head access unit of a base layer from among the extracted access units, impart a GOP number to the GOP head access unit of the base layer, further detect a GOP head access unit of an enhancement layer using a decoding time stamp of the GOP head access unit of the base layer and individual decoding time stamps of access units of the enhancement layer, and impart a GOP number to the GOP head access unit of the enhancement layer in accordance with a decoding time stamp of the GOP head access unit of the enhancement layer; and a hierarchical transmission control unit configured to transmit, hierarchically, access units of the base layer and access units of the enhancement layer using the GOP number imparted by the GOP number imparting unit, wherein the GOP number imparting unit is configured to determine whether the decoding time stamp of the GOP head access unit of the base layer is greater or smaller than a decoding time stamp of the access unit of the enhancement layer which is simultaneously input with the GOP head access unit of the base layer, and when the decoding time stamp of the access unit of the enhancement layer is greater than the decoding time stamp of the GOP head access unit of the base layer, to determine that the enhancement layer precedes the base layer and to impart, as a GOP number for the GOP head access unit of the enhancement layer, a GOP number that is different from the GOP number imparted to the GOP head access unit of the base layer.
2. The video transmission device according to claim 1, wherein the GOP number imparting unit identifies types of network abstraction layer (NAL) units included in each access unit of the base layer to detect the GOP head access unit of the base layer.
3. The video transmission device according to claim 2, wherein the GOP number imparting unit detects an access unit including both a video parameter set (VPS) NAL unit and a sequence parameter set (SPS) NAL unit as the GOP head access unit of the base layer.
4. The video transmission device according to claim 1, wherein the hierarchically encoded bit stream is transmitted using MPEG-2 TS, and the input processing unit is a TS processing unit that reconfigures a packetized elementary stream (PES) from MPEG-2 TS and imparts a decoding time stamp (DTS) included in a header of the PES as the decoding time stamp to an access unit obtained from a payload of the PES.
5. The video transmission device according to claim 4, wherein the GOP number imparting unit is configured to determine whether the decoding time stamp DTS.sup.B.sub.1 of the GOP head access unit of the base layer is greater or smaller than a decoding time stamp DTS.sup.E.sub.1 of the access unit of the enhancement layer which is simultaneously input with the GOP head access unit of the base layer, and to detect, when the DTS.sup.E.sub.1 is smaller than the DTS.sup.B.sub.1 as the GOP head access unit of the enhancement layer, an access unit of the enhancement layer whose DTS has a value equal to the sum of a value of the DTS.sup.B.sub.1 and a DTS difference value between two consecutive access units, and imparts, to the detected GOP head access unit of the enhancement layer, a GOP number identical to the GOP number imparted to the GOP head access unit of the base layer and to calculate, when the DTS.sup.E.sub.1 is greater than the DTS.sup.B.sub.1, an indication k representing how far ahead the DTS.sup.E.sub.1 precedes the DTS.sup.B.sub.1 in terms of the number of GOPs, and to impart a GOP number corresponding to the calculated indication k.
6. The video transmission device according to claim 5, wherein when T is the DTS difference corresponding to one GOP, and D is a DTS difference value between two consecutive access units, the GOP number imparting unit calculates the indication k by using k={(DTS.sup.E.sub.1−DTS.sup.B.sub.1)mod T}+1, and detects the GOP head access unit of the enhancement layer by awaiting appearance of an access unit having a DTS satisfying DTS.sup.B.sub.1+k*T+D of the enhancement layer by sequentially monitoring the decoding time stamps of the access units of subsequent enhancement layers.
7. The video transmission device according to claim 1, wherein the hierarchical transmission control unit is an MMT transmission control unit that encapsulates the access units of the base layer and the access units of the enhancement layer in each of media processing units (MPUs) having different packet_ids and transmits the MPUs according to an MPEG Media Transport (MMT) protocol.
8. A video transmission method comprising: an input processing procedure of a video transmission device extracting access units from a group-of-pictures (GOP) constituting a hierarchically encoded bit stream and imparting a decoding time stamp to each extracted access unit; a GOP number imparting procedure of the video transmission device detecting a GOP head access unit of a base layer from among the extracted access units, imparting a GOP number to the GOP head access unit of the base layer, further detecting a GOP head access unit of an enhancement layer using a decoding time stamp of the GOP head access unit of the base layer and individual decoding time stamps of access units of the enhancement layer, and imparting a GOP number to the GOP head access unit of the enhancement layer in accordance with a decoding time stamp of the GOP head access unit of the enhancement layer; and a hierarchical transmission control procedure of the video transmission device transmitting, hierarchically, access units of the base layer and access units of the enhancement layer using the GOP number imparted in the GOP number imparting procedure, wherein the GOP number imparting procedure determines whether the decoding time stamp of the GOP head access unit of the base layer is greater or smaller than a decoding time stamp of the access unit of the enhancement layer which is simultaneously input with the GOP head access unit of the base layer, and when the decoding time stamp of the access unit of the enhancement layer is greater than the decoding time stamp of the GOP head access unit of the base layer, determines that the enhancement layer precedes the base layer and imparts, as a GOP number for the GOP head access unit of the enhancement layer, a GOP number that is different from the GOP number imparted to the GOP head access unit of the base layer.
9. The video transmission method according to claim 8, wherein the hierarchically encoded bit stream is transmitted using MPEG-2 TS, the input processing procedure is a TS processing procedure to reconfigure a packetized elementary stream (PES) from MPEG-2 TS and impart a decoding time stamp (DTS) included in a header of the PES as the decoding time stamp to an access unit obtained from a payload of the PES, and in the GOP number imparting procedure, whether the decoding time stamp DTS.sup.B.sub.1 of the GOP head access unit of the base layer is greater or smaller than a decoding time stamp DTS.sup.E.sub.1 of the access unit of the enhancement layer which is simultaneously input with the GOP head access unit of the base layer, is determined, and when the DTS.sup.E.sub.1 is smaller than the DTS.sup.B.sub.1, as the GOP head access unit of the enhancement layer, an access unit of the enhancement layer whose DTS has a value equal to the sum of a value of the DTS.sup.B.sub.1 and a DTS difference value between two consecutive access units is detected, and a GOP number identical to the GOP number imparted to the GOP head access unit of the base layer is imparted to the detected GOP head access unit of the enhancement layer, and when the DTS.sup.E.sub.1 is greater than the DTS.sup.B.sub.1, an indication k representing how far ahead the DTS.sup.E.sub.1 precedes the DTS.sup.B.sub.1 in terms of the number of GOPs is calculated, and a GOP number corresponding to the calculated indication k is imparted.
10. The video transmission method according to claim 9 wherein when T is the DTS difference corresponding to one GOP, and D is a DTS difference value between two consecutive access units, the GOP number imparting procedure calculates the indication k by using k={(DTS.sup.E.sub.1−DTS.sup.B.sub.1)mod T}+1, and detects the GOP head access unit of the enhancement layer by awaiting appearance of an access unit having a DTS satisfying DTS.sup.B.sub.1+k*T+D of the enhancement layer by sequentially monitoring the decoding time stamps of the access units of subsequent enhancement layers.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DESCRIPTION OF EMBODIMENTS
(10) An embodiment of the present disclosure will be described below in detail with reference to the drawings. Note that the present disclosure is not limited to the embodiment described below. The embodiment is merely an example, and the present disclosure can be implemented with various modifications and improvements made to the invention based on knowledge of a person skilled in the art. Note that constituent elements having identical reference signs in the present specification and the drawings are assumed to be the same.
(11) Basic Configuration
(12) A basic configuration of a video transmission device is illustrated in
(13) The input processing unit 11 extracts access units in response to an input of a hierarchically encoded bit stream, applies a decoding time stamp to each of the access units, and passes the access units to the GOP number imparting unit 12 by layer. A “hierarchically encoded bit stream” of the present disclosure includes an encoded bit stream of a video signal with any hierarchical structure. The hierarchical structure includes a hierarchical structure of a time direction or a spatial direction.
(14) The GOP number imparting unit 12 detects a GOP head access unit of a base layer by identifying the types of NAL units constituting each access unit of the base layer, and imparts a GOP number to the detected access unit. The same GOP number is imparted to each subsequent access unit until the next GOP head access unit is detected, and the number is incremented by one each time a GOP head access unit is detected. The GOP number may start from any value.
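The numbering rule in this paragraph can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name `number_gops` and the choice to return `None` for access units that arrive before the first detected GOP head are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional


def number_gops(is_head_flags: Iterable[bool], start: int = 0) -> Iterator[Optional[int]]:
    """Yield one GOP number per access unit.

    is_head_flags: True where an access unit was detected as a GOP head.
    start: the first GOP number (the text allows any starting value).
    Units seen before the first GOP head get None (hypothetical convention).
    """
    current: Optional[int] = None
    for is_head in is_head_flags:
        if is_head:
            # First head takes the start value; each later head increments by one.
            current = start if current is None else current + 1
        yield current


if __name__ == "__main__":
    flags = [False, True, False, False, True, False]
    print(list(number_gops(flags)))
```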
(15) Next, the GOP number imparting unit 12 detects a head access unit of an enhancement layer of the same GOP from a decoding time stamp of the detected GOP head access unit of the base layer using the relationship between the decoding time stamp of the base layer and a decoding time stamp of the enhancement layer, and imparts the same GOP number as that of the base layer. With regard to the enhancement layer, the same GOP number is also imparted until the next GOP head access unit is detected. The access units with the GOP number imparted are passed to the hierarchical transmission control unit 13 by layer.
(16) The hierarchical transmission control unit 13 constructs a data unit for hierarchical transmission with the access units having the same GOP number, and performs IP transmission. At this time, the hierarchical transmission control unit 13 encapsulates the access units of the base layer and the access units of the enhancement layer in MPUs having different packet_ids as illustrated in
(17) The temporal relationship between the decoding time stamps of the head access unit of the base layer and the head access unit of the enhancement layer belonging to the same GOP can be calculated in advance from the GOP structure and the frame rate. The GOP number imparting unit 12 then determines which layer precedes the other and by how much. Finally, based on the relationship between the decoding time stamps, it detects the head of the GOP of the enhancement layer, which cannot be detected from the types of NAL units constituting each access unit. In this way, data units (MPUs) for hierarchical transmission can be constructed.
(18)
(19) For the decoding time stamp, for example, a decoding time stamp (DTS) encapsulated in the header of a packetized elementary stream (PES) may be used.
(20) Specific Example of Hierarchical Transmission of Hierarchically Encoded Bit Stream in Time Direction in MMT
(21)
(22) The time-direction hierarchically encoded bit stream is compliant with the time-direction hierarchical encoding provisions of ARIB STD-B32 (see, for example, Non Patent Literature 2). A relationship between an encoded bit stream and MPEG-2 TS is illustrated in
(23) A DTS is encapsulated in the header of each PES, and an access unit is encapsulated in the payload. The PESs are divided into TS packets and transmitted. MPEG-2 TS also shares a value called the program clock reference (PCR), which is counted by a 27-MHz clock of the encoder, and uses the PCR as the time reference. For example, even if a PES with a DTS of 10000 is received, the PES is held in a buffer while the PCR is less than 10000. When the PCR reaches 10000, decoding of the PES begins.
(24) The TS processing unit 111 combines the payloads of the input TS packets to reconfigure the PESs and obtains the payloads of the PESs as access units. In addition, it acquires DTSs from the DTS fields of the PES headers as decoding time stamps, imparts them to the access units, and passes the access units to the GOP number imparting unit 112. In this manner, an access unit and a DTS extracted from the same PES are associated with each other and sent to the GOP number imparting unit 112. At this time, a new data structure including an access unit and its DTS may be defined to associate the two.
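As an illustration of reading the DTS field from a reassembled PES, the sketch below decodes the 33-bit time stamp from its 5-byte encoding in the PES header. It assumes the standard MPEG-2 Systems header layout for video streams (optional header present, PTS at byte offsets 9 to 13 and DTS at 14 to 18 when both are signalled). The function names are hypothetical, and a PES that carries no DTS field is simply reported as `None`.

```python
from typing import Optional


def decode_timestamp(field: bytes) -> int:
    """Decode a 33-bit PTS/DTS value from its 5-byte PES header encoding."""
    if len(field) != 5:
        raise ValueError("timestamp field must be exactly 5 bytes")
    return (
        ((field[0] >> 1) & 0x07) << 30   # bits 32..30
        | field[1] << 22                 # bits 29..22
        | ((field[2] >> 1) & 0x7F) << 15  # bits 21..15
        | field[3] << 7                  # bits 14..7
        | ((field[4] >> 1) & 0x7F)       # bits 6..0
    )


def extract_dts(pes: bytes) -> Optional[int]:
    """Return the DTS of a PES packet, or None if no DTS field is present."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    pts_dts_flags = pes[7] >> 6
    if pts_dts_flags == 0b11:            # both PTS and DTS are present
        return decode_timestamp(pes[14:19])
    return None


if __name__ == "__main__":
    # Header with PTS = 11500 and DTS = 10000, followed by a dummy payload.
    header = bytes([0x00, 0x00, 0x01, 0xE0, 0x00, 0x00, 0x80, 0xC0, 0x0A])
    pts = bytes([0x31, 0x00, 0x01, 0x59, 0xD9])
    dts = bytes([0x11, 0x00, 0x01, 0x4E, 0x21])
    print(extract_dts(header + pts + dts + b"access-unit"))
```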
(25) The GOP number imparting unit 112 detects the head of a GOP according to the processing flowchart illustrated in
(26) According to the processing flowchart, the nal_unit_type field in the header of each NAL unit constituting the access units of the 60P sub-bit stream is read to detect an access unit including both a video parameter set (VPS) NAL unit and a sequence parameter set (SPS) NAL unit (S101). The “VPS NAL unit” and the “SPS NAL unit” are types of NAL units that carry coding parameters rather than the encoded data itself. Because only the access unit at the GOP head includes both a VPS NAL unit and an SPS NAL unit, the detected access unit is considered to be the GOP head of the 60P sub-bit stream, and a GOP number “0” is imparted thereto (S102).
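Step S101 can be sketched as below. The sketch assumes the bit stream is HEVC in byte-stream form, so that NAL units are separated by 0x000001 start codes, nal_unit_type occupies the six bits after the first bit of the NAL unit header, and the VPS and SPS types are 32 and 33. The text itself names only the VPS and SPS NAL units, so these constants and the function names are assumptions of the example.

```python
from typing import List

VPS_NUT = 32  # HEVC nal_unit_type for a video parameter set (assumed codec)
SPS_NUT = 33  # HEVC nal_unit_type for a sequence parameter set (assumed codec)


def nal_unit_types(access_unit: bytes) -> List[int]:
    """Return the nal_unit_type of every NAL unit in a byte-stream access unit."""
    # Everything before the first start code is leading padding, so skip it.
    chunks = access_unit.split(b"\x00\x00\x01")[1:]
    # nal_unit_type is bits 2..7 of the first header byte (after forbidden_zero_bit).
    return [(chunk[0] >> 1) & 0x3F for chunk in chunks if chunk]


def is_gop_head(access_unit: bytes) -> bool:
    """True if the access unit contains both a VPS and an SPS NAL unit."""
    types = set(nal_unit_types(access_unit))
    return VPS_NUT in types and SPS_NUT in types


if __name__ == "__main__":
    start = b"\x00\x00\x00\x01"
    head = start + b"\x46\x01\x50" + start + b"\x40\x01\x0c" + start + b"\x42\x01\x01"
    print(nal_unit_types(head), is_gop_head(head))
```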
(27) The DTS imparted to the access unit considered to be the GOP head of the base layer is denoted DTS.sup.B.sub.1, and the DTS imparted to the access unit of the 120P subset input at that time is denoted DTS.sup.E.sub.1. DTS.sup.B.sub.1 and DTS.sup.E.sub.1 are compared (S103). If DTS.sup.E.sub.1<DTS.sup.B.sub.1, it is determined that the 60P sub-bit stream precedes (Yes in S103). In this case, an access unit having a DTS equal to DTS.sup.B.sub.1+D is detected in the subsequent 120P subset, and a GOP number “0” is imparted to it as the GOP head of the 120P subset (S104).
(28) Here, “D” represents the DTS difference between two access units that are consecutive in decoding order. The DTS is a counter value with a time resolution of 90 kHz, so if the frame rate of the input video signal is F, D is given by D=90000/F. Thus D=90000/120=750 in time-direction hierarchical encoding of 120P and 60P, and D=90000/60=1500 in time-direction hierarchical encoding of 60P and 30P. Because the value of “D” varies with the frame rate of the video signal, an appropriate value is used in accordance with the conditions. A correspondence relationship between the layers at this time is illustrated in
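The computation of D and the search in step S104 can be sketched as follows. The sketch assumes an integer frame rate that divides 90000 and ignores wrap-around of the 33-bit DTS counter; the function names are hypothetical.

```python
from typing import Optional, Sequence

DTS_CLOCK_HZ = 90_000  # DTS time resolution stated in the text


def dts_step(frame_rate: int) -> int:
    """Return D, the DTS difference between consecutive access units."""
    if frame_rate <= 0 or DTS_CLOCK_HZ % frame_rate:
        raise ValueError("sketch supports only frame rates that divide 90000")
    return DTS_CLOCK_HZ // frame_rate


def find_enh_head_base_precedes(
    dts_b1: int, enh_dts_values: Sequence[int], frame_rate: int
) -> Optional[int]:
    """Return the index of the enhancement-layer access unit whose DTS equals
    DTS_B1 + D (the "Yes in S103" case), or None if it has not appeared yet."""
    target = dts_b1 + dts_step(frame_rate)
    for index, dts in enumerate(enh_dts_values):
        if dts == target:
            return index
    return None


if __name__ == "__main__":
    print(dts_step(120), dts_step(60))
    # Base-layer GOP head at DTS 10000; enhancement-layer units arrive at 9250, 10750, ...
    print(find_enh_head_base_precedes(10000, [9250, 10750, 12250], 120))
```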
(29) In the case of DTS.sup.B.sub.1<DTS.sup.E.sub.1 (No in S103), it is determined that the 120P subset has preceded, and an indication k representing how far ahead the DTS.sup.E.sub.1 precedes the DTS.sup.B.sub.1 in terms of the number of GOPs is calculated (S105).
[Formula 1]
\[ k = \{(\mathrm{DTS}^{E}_{1} - \mathrm{DTS}^{B}_{1}) \bmod T\} + 1 \qquad (1) \]
(30) Here, T is the DTS difference corresponding to one GOP. Using k, the DTS at the head of the GOP that appears next in the 120P subset can be calculated as DTS.sup.E.sub.2=DTS.sup.B.sub.1+k*T+D. If the number of frames included in the GOP is L, T is expressed as T=D*L; for example, D=750 and L=16 give T=12000. Here, L counts the frames of both the base layer and the enhancement layer. Because the frames of the base layer and the enhancement layer always alternate, in the case of L=16, for example, the base layer has L/2=8 frames and the enhancement layer has 8 frames. The subsequent 120P subset is monitored to detect an access unit having DTS.sup.E.sub.2, and a GOP number “k” is imparted to it. The correspondence relationship between the layers at this time is illustrated in
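The arithmetic for T and DTS.sup.E.sub.2 can be checked with a short sketch. The indication k is taken as an input here rather than computed, and the function names and the sample value DTS.sup.B.sub.1=10000 are assumptions made only for illustration.

```python
def gop_dts_span(d: int, frames_per_gop: int) -> int:
    """Return T = D * L, the DTS difference corresponding to one GOP."""
    return d * frames_per_gop


def next_enh_head_dts(dts_b1: int, k: int, d: int, frames_per_gop: int) -> int:
    """Return DTS_E2 = DTS_B1 + k * T + D, the DTS of the next enhancement-layer
    GOP head when the enhancement layer precedes the base layer."""
    return dts_b1 + k * gop_dts_span(d, frames_per_gop) + d


if __name__ == "__main__":
    d, frames = 750, 16                 # 120P/60P example from the text
    print(gop_dts_span(d, frames))      # T for D=750, L=16
    print(next_enh_head_dts(10000, 1, d, frames))
```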
(31) For example, as illustrated in
(32) The access units to which GOP numbers have been imparted by the above-described method are passed to the MMT transmission control unit 113. When the GOP structure is fixed, the number of access units in each layer is fixed, so once the head is found, subsequent GOP heads can be recognized simply by counting access units. When the GOP structure is variable, the above-described method is repeated while incrementing the GOP number.
(33) The MMT transmission control unit 113 receives the access units, constructs an MPU with the access units having the same GOP number, and performs IP transmission hierarchically according to the MMT protocol. The GOP number may be used as the MPU sequence number as is.
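The grouping step can be sketched as follows for a single layer. The sketch does not implement MMT packetization; it only shows how consecutive access units sharing a GOP number could be collected into one unit, with the GOP number reused as the MPU sequence number. The tuple layout and the function name are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

AccessUnit = Tuple[int, bytes]  # (GOP number, access unit payload), assumed layout


def group_into_mpus(access_units: Iterable[AccessUnit]) -> List[Tuple[int, List[bytes]]]:
    """Group consecutive access units of one layer by GOP number.

    Each result entry is (mpu_sequence_number, payloads), where the sequence
    number is the GOP number used as is.
    """
    return [
        (gop_number, [payload for _, payload in group])
        for gop_number, group in groupby(access_units, key=lambda au: au[0])
    ]


if __name__ == "__main__":
    base_layer = [(0, b"au0"), (0, b"au1"), (1, b"au2")]
    print(group_into_mpus(base_layer))
```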
(34) Effects
(35) According to the present disclosure, in a case in which a hierarchical MPU is constructed by extracting access units from MPEG-2 TS in which a time-direction hierarchically encoded bit stream is multiplexed with different PIDs by layer, a GOP boundary of each layer can be correctly detected and an MPU can be constructed. Furthermore, using a GOP number, it is easy to impart an MPU sequence number common to an MPU including access units belonging to the same GOP.
(36) Furthermore, the method described in the embodiment for detecting a GOP head access unit using the types of NAL units constituting each access unit is easy to implement and is also suitable for hardware, because the position of the nal_unit_type field indicating the type of each NAL unit is fixed to the second to seventh bits at the head of the NAL unit.
(37) Furthermore, because the access units can be analyzed sequentially to detect the head, there is no need to provide a buffer for accumulating the access units. Therefore, even if the time difference between the base layer and the enhancement layer is large, buffer overflow does not occur. In addition, because no buffer is provided, implementation costs are reduced accordingly. A further advantage is that no additional delay is introduced by accumulation in a buffer.
INDUSTRIAL APPLICABILITY
(38) The present disclosure can be applied in the information communication industry.
REFERENCE SIGNS LIST
(39) 10, 100 Video transmission device
11 Input processing unit
12, 112 GOP number imparting unit
13 Hierarchical transmission control unit
111 TS processing unit
113 MMT transmission control unit