TRANSMISSION DEVICE, TRANSMISSION METHOD, RECEPTION DEVICE, AND RECEPTION METHOD
20180376173 · 2018-12-27
Assignee
Inventors
CPC classification
H04N21/435
ELECTRICITY
H04N21/4312
ELECTRICITY
H04N21/2353
ELECTRICITY
International classification
H04N21/235
ELECTRICITY
H04N21/435
ELECTRICITY
Abstract
A processing load for displaying subtitles on the reception side is reduced. A video stream including encoded video data is generated. A subtitle stream is generated that includes subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of the pieces of information indicated by the text information. A container in a predetermined format including the video stream and the subtitle stream is transmitted.
Claims
1. A transmission device comprising: a video encoder configured to generate a video stream including encoded video data; a subtitle encoder configured to generate a subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; and a transmitter configured to transmit a container in a predetermined format including the video stream and the subtitle stream.
2. The transmission device according to claim 1, wherein the abstract information includes subtitle display timing information.
3. The transmission device according to claim 2, wherein the subtitle display timing information has information of a display start timing and a display period.
4. The transmission device according to claim 3, wherein the subtitle stream is composed of a PES packet including a PES header and a PES payload, the subtitle text information and the abstract information are provided in the PES payload, and the display start timing is expressed as a display offset from a PTS inserted in the PES header.
5. The transmission device according to claim 1, wherein the abstract information includes display control information for controlling a subtitle display condition.
6. The transmission device according to claim 5, wherein the display control information includes information of at least one of a display position, a color gamut, and a dynamic range of the subtitle.
7. The transmission device according to claim 6, wherein the display control information further includes subject video information.
8. The transmission device according to claim 1, wherein the abstract information includes notification information for providing notification that there is a change in an element of the subtitle text information.
9. The transmission device according to claim 1, wherein the subtitle encoder divides the subtitle text information and abstract information into segments and generates the subtitle stream including a predetermined number of segments.
10. The transmission device according to claim 9, wherein in the subtitle stream, a segment of the abstract information is provided in the beginning and a segment of the subtitle text information is subsequently provided.
11. The transmission device according to claim 1, wherein the subtitle text information is in TTML or in a format related to TTML.
12. A transmission method comprising: a video encoding step of generating a video stream including encoded video data; a subtitle encoding step of generating a subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; and a transmission step of transmitting, by a transmitter, a container in a predetermined format including the video stream and the subtitle stream.
13. A reception device comprising: a receiver configured to receive a container in a predetermined format including a video stream and a subtitle stream, the video stream including encoded video data, and the subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; and a processor configured to control a video decoding process for obtaining video data by decoding the video stream, a subtitle decoding process for decoding the subtitle stream to obtain subtitle bitmap data and extract the abstract information, a video superimposing process for superimposing the subtitle bitmap data on the video data to obtain display video data, and a bitmap data process for processing the subtitle bitmap data to be superimposed on the video data on the basis of the abstract information.
14. The reception device according to claim 13, wherein the abstract information includes subtitle display timing information, and in the bitmap data process, the timing to superimpose the subtitle bitmap data on the video data is controlled on the basis of the subtitle display timing information.
15. The reception device according to claim 13, wherein the abstract information includes display control information for controlling the subtitle display condition, and in the bitmap data process, a condition of the subtitle bitmap to be superimposed on the video data is controlled on the basis of the display control information.
16. A reception method comprising: a reception step of receiving, by a receiver, a container in a predetermined format including a video stream and a subtitle stream, the video stream including encoded video data, and the subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; a video decoding step of decoding the video stream to obtain video data; a subtitle decoding step of decoding the subtitle stream to obtain subtitle bitmap data and extract the abstract information; a video superimposing step of superimposing subtitle bitmap data on the video data to obtain display video data; and a controlling step of controlling the subtitle bitmap data to be superimposed on the video data on the basis of the abstract information.
17.-20. (canceled)
Description
BRIEF DESCRIPTION OF DRAWINGS
MODE FOR CARRYING OUT THE INVENTION
[0078] In the following, a mode for carrying out the present invention (hereinafter, referred to as an embodiment) will be described. Note that description will be provided in the following order.
1. Embodiment
2. Modification
1. EMBODIMENT
(Exemplary Configuration of Transmitting/Receiving System)
[0080] The transmission device 100 generates an MPEG2 transport stream TS as a container and transmits the transport stream TS over airwaves or as packets on a network. The transport stream TS includes a video stream containing encoded video data.
[0081] Further, the transport stream TS includes a subtitle stream. The subtitle stream includes text information of subtitles (captions) having display timing information and abstract information having information corresponding to a part of the pieces of information indicated by the text information. According to the present embodiment, the text information is, for example, in Timed Text Markup Language (TTML), proposed by the World Wide Web Consortium (W3C).
[0082] According to the present embodiment, the abstract information includes display timing information of the subtitles. The display timing information includes information of a display start timing and a display period. Here, the subtitle stream is composed of a PES packet including a PES header and a PES payload, the text information and display timing information of the subtitles are provided in the PES payload, and, for example, the display start timing is expressed by a display offset from a PTS, which is inserted in the PES header.
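The PES composition described above can be sketched with hypothetical data structures (the class and field names below are illustrative, not from any standard or library):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    segment_type: str   # e.g. "APTS", "THAS", "TBS"
    payload: bytes

@dataclass
class PesPacket:
    pts: int                                  # 90 kHz clock, carried in the PES header
    segments: List[Segment] = field(default_factory=list)

    def display_start(self, start_time_offset_ticks: int) -> int:
        # The display start timing is expressed as an offset from the PTS.
        return self.pts + start_time_offset_ticks

pkt = PesPacket(pts=900_000, segments=[Segment("APTS", b""), Segment("TBS", b"")])
assert pkt.display_start(9_000) == 909_000   # 0.1 s after the PTS at 90 kHz
```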
[0083] Further, according to the present embodiment, the abstract information includes display control information used to control a display condition of the subtitles. According to the present embodiment, the display control information includes information related to a display position, a color gamut, and a dynamic range of the subtitles. Further, according to the present embodiment, the abstract information includes information of the subject video.
[0084] The reception device 200 receives the transport stream TS transmitted from the transmission device 100 over airwaves or as packets on a network. As described above, the transport stream TS includes a video stream including encoded video data and a subtitle stream including subtitle text information and abstract information.
[0085] The reception device 200 obtains video data from the video stream, and obtains subtitle bitmap data and extracts abstract information from the subtitle stream. The reception device 200 superimposes the subtitle bitmap data on the video data to obtain video data for display. The reception device 200 processes the subtitle bitmap data to be superimposed on the video data on the basis of the abstract information.
[0086] According to the present embodiment, the abstract information includes the display timing information of the subtitles and the reception device 200 controls the timing to superimpose the subtitle bitmap data on the video data on the basis of the display timing information. Further, according to the present embodiment, the abstract information includes the display control information used to control the display condition of the subtitles (the display position, color gamut, dynamic range, and the like) and the reception device 200 controls the bitmap condition of the subtitles on the basis of the display control information.
(Exemplary Configuration of Transmission Device)
[0088] The control unit 101 includes a central processing unit (CPU) and controls the operation of each unit in the transmission device 100 on the basis of a control program. The camera 102 captures an image of a subject and outputs video data (image data) in a high dynamic range (HDR) or a standard dynamic range (SDR). An HDR image has a contrast ratio of 0 to 100%*N (N is greater than 1), such as 0 to 1000%, exceeding the luminance at the white peak of an SDR image. Here, the 100% level corresponds to, for example, a white luminance value of 100 cd/m².
[0089] The video photoelectric conversion unit 103 performs a photoelectric conversion on the video data obtained by the camera 102 and obtains transmission video data V1. In a case where the video data is SDR video data, the photoelectric conversion is performed by applying an SDR photoelectric conversion characteristic, and SDR transmission video data (transmission video data having the SDR photoelectric conversion characteristic) is obtained. On the other hand, in a case where the video data is HDR video data, the photoelectric conversion is performed by applying an HDR photoelectric conversion characteristic, and HDR transmission video data (transmission video data having the HDR photoelectric conversion characteristic) is obtained.
[0090] The RGB/YCbCr conversion unit 104 converts the transmission video data from the RGB domain into the YCbCr (luminance/chrominance) domain. The video encoder 105 encodes the transmission video data V1 converted into the YCbCr domain with, for example, MPEG4-AVC or HEVC, and generates a video stream (PES stream) VS including the encoded video data.
[0091] In this case, the video encoder 105 inserts meta-information, such as information indicating an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1 (transfer function), information indicating a color gamut of the transmission video data V1, and information indicating a reference level, into a video usability information (VUI) region in an SPS NAL unit of an access unit (AU).
[0092] Further, the video encoder 105 inserts a newly defined dynamic range/SEI message (Dynamic_range SEI message) which has meta-information, such as information indicating an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1 (transfer function) and information of a reference level, into a portion of SEIs of the access unit (AU).
[0093] Here, information indicating the electric-photo conversion characteristic is provided in the dynamic range/SEI message for the following reason. In a case where the HDR photoelectric conversion characteristic is compatible with the SDR photoelectric conversion characteristic, information indicating the electric-photo conversion characteristic (gamma characteristic) corresponding to the SDR photoelectric conversion characteristic is inserted into the VUI of the SPS NAL unit even when the transmission video data V1 is HDR transmission video data. Therefore, information indicating the electric-photo conversion characteristic corresponding to the HDR photoelectric conversion characteristic is needed in a place other than the VUI.
[0095] Further, information of a reference level is provided in the dynamic range/SEI message because, although information indicating the electric-photo conversion characteristic (gamma characteristic) corresponding to the SDR photoelectric conversion characteristic is inserted into the VUI of the SPS NAL unit in a case where the transmission video data V1 is SDR transmission video data, the insertion of the reference level is not clearly defined.
[0097] In a case where Dynamic_range_flag is 0, the following fields are present. An 8-bit field, coded_data_depth, indicates the encoding pixel bit depth. An 8-bit field, reference_level, indicates a reference luminance level value as the reference level. A 1-bit field, modify_tf_flag, indicates whether or not to correct the transfer function (TF) indicated by the video usability information (VUI): 0 indicates that the TF indicated by the VUI is the target, and 1 indicates that the TF of the VUI is corrected by using the TF specified by transfer_function of the SEI. An 8-bit field, transfer_function, indicates the electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1.
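As a sketch, the fields listed above could be read out of a byte string as follows; the exact bit packing, the padding after modify_tf_flag, and byte alignment are assumptions for illustration, not taken from this document:

```python
def parse_dynamic_range_sei(data: bytes) -> dict:
    """Illustrative parser for the fields described above; the bit
    layout (field order and padding) is an assumption, not the spec."""
    bits = int.from_bytes(data, "big")
    remaining = len(data) * 8

    def take(n: int) -> int:
        # Consume n bits from the most-significant end.
        nonlocal remaining
        remaining -= n
        return (bits >> remaining) & ((1 << n) - 1)

    info = {}
    info["coded_data_depth"] = take(8)   # encoding pixel bit depth
    info["reference_level"] = take(8)    # reference luminance level value
    info["modify_tf_flag"] = take(1)     # 1: correct the TF indicated by the VUI
    take(7)                              # assumed padding to byte-align
    info["transfer_function"] = take(8)  # electric-photo conversion characteristic
    return info

sei = parse_dynamic_range_sei(bytes([10, 100, 0b1000_0000, 16]))
assert sei == {"coded_data_depth": 10, "reference_level": 100,
               "modify_tf_flag": 1, "transfer_function": 16}
```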
[0098] Referring back to
[0100] The TTML is composed of a header (head) and a body (body). In the header (head), there are elements of metadata (metadata), styling (styling), styling extension (styling extension), layout (layout), and the like.
[0102] tts:origin specifies a start position of the region (Region), which is a subtitle display region expressed in pixels. In this example, tts:origin="480px 600px" indicates that the start position is (480, 600) (see the arrow P) as illustrated in
[0103] tts:opacity indicates a mix ratio of the subtitles (caption) and the background video. For example, 1.0 indicates that the subtitle is 100% and the background video is 0%, and 0.0 indicates that the subtitle (caption) is 0% and the background video is 100%. In the illustrated example, tts:opacity=1.0 is set.
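The mix ratio described above corresponds to a simple linear blend of the subtitle over the background video, which might be sketched as:

```python
def mix(subtitle: float, background: float, opacity: float) -> float:
    """Blend a subtitle sample over a background video sample using the
    tts:opacity ratio: 1.0 -> 100% subtitle, 0% background."""
    return opacity * subtitle + (1.0 - opacity) * background

assert mix(0.8, 0.2, 1.0) == 0.8              # subtitle fully opaque
assert mix(0.8, 0.2, 0.0) == 0.2              # background only
assert abs(mix(1.0, 0.0, 0.5) - 0.5) < 1e-9   # 50/50 blend
```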
[0107] Referring back to
[0109] Here, the PES data payload may include segments of APTS (abstract_parameter_TimedText_segment), THAS (text_header_all_segment), and TBS (text_body_segment). Alternatively, the PES data payload may include segments of APTS (abstract_parameter_TimedText_segment) and TWS (text_whole_segment).
[0111] An 8-bit field, subtitle_stream_id, indicates an ID that identifies the type of the subtitle stream. In the case of a subtitle stream for transmitting text information, a new value, for example 0x01, can be set to distinguish it from a subtitle stream (0x00) for transmitting conventional bitmaps.
[0112] In a TimedTextSubtitling_segments ( ) field, a group of segments is provided.
[0114] Here, whether to insert each segment into the subtitle stream is flexible; for example, in a case where nothing changes other than the displayed subtitle, only the two segments of APTS (abstract_parameter_TimedText_segment) and TBS (text_body_segment) are included. In both cases, in the PES data payload, the APTS segment carrying the abstract information is provided at the beginning, followed by the other segments. With such an arrangement, the reception side can easily and efficiently extract the abstract information from the subtitle stream.
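Because the APTS segment always comes first in the PES data payload, a receiver can obtain the abstract information by inspecting only the first segment. A minimal sketch, assuming segments are represented as hypothetical (type, payload) tuples:

```python
def extract_abstract_info(segments):
    """Return the abstract information, exploiting the rule that the
    APTS segment is placed at the beginning of the PES data payload.
    `segments` is a hypothetical list of (type, payload) tuples."""
    if segments and segments[0][0] == "APTS":
        return segments[0][1]
    return None   # no abstract information in this payload

payload = [("APTS", b"\x01\x02"), ("TBS", b"<body>...</body>")]
assert extract_abstract_info(payload) == b"\x01\x02"
assert extract_abstract_info([("TBS", b"")]) is None
```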
[0122] In this manner, in a case where all the elements of TTML are transmitted in a single segment, as illustrated in
[0123] The ttnew:sequentialinorder element carries information related to the order of TTML transmission. This ttnew:sequentialinorder is provided in front of <head>. ttnew:sequentialinorder=true (=1) indicates that there is a restriction related to the transmission order. In this case, it is indicated that <metadata>, <styling>, <styling extension>, and <layout> are provided in order in <head>, followed by <div> and <p> text </p></div>, which are included in <body>. Here, in a case where <styling extension> does not exist, the order is <metadata>, <styling>, and <layout>. On the other hand, ttnew:sequentialinorder=false (=0) indicates that there is no such restriction.
[0124] Since the element of ttnew:sequentialinorder is inserted in this manner, the order of TTML transmission can be recognized in the reception side and this helps to confirm that the TTML transmission is performed according to a predetermined order, simplify the processes up to decoding, and efficiently perform the decoding process even in a case where all elements of TTML are transmitted at once.
[0125] Further, the ttnew:partialupdate element carries information as to whether there is an update of the TTML. This ttnew:partialupdate is provided before <head>. ttnew:partialupdate=true (=1) is used to indicate that there is an update of one of the elements in <head> or <body>. On the other hand, ttnew:partialupdate=false (=0) is used to indicate that there is no such update. Since the element of ttnew:partialupdate is inserted in this manner, the reception side can easily recognize whether there is an update of the TTML.
[0126] Here, an example in which the two new elements ttnew:sequentialinorder and ttnew:partialupdate are inserted in the element layer has been described above. However, an example may also be considered in which those new elements are inserted in the segment layer, as illustrated in
(Segment of APTS (abstract_parameter_TimedText_segment))
[0127] Here a segment of APTS (abstract_parameter_TimedText_segment) will be described. The APTS segment includes abstract information. The abstract information includes information related to a part of pieces of information indicated by TTML.
[0129] The 4-bit field in APT_version_number indicates whether or not there is a change in the element in the APTS (abstract_parameter_TimedText_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one. The 4-bit field in TTM_version_number indicates whether or not there is a change in the element in THMS (text_header_metadata_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one. The 4-bit field in TTS_version_number indicates whether or not there is a change in the element in THSS (text_header_styling_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one.
[0130] The 4-bit field in TTSE_version_number indicates whether or not there is a change in the element in THSES (text_header_styling_extension_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one. The 4-bit field in TTL_version_number indicates whether or not there is a change in the element in THLS (text_header_layout_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one.
[0131] The 4-bit field in TTHA_version_number indicates whether or not there is a change in the element in THAS (text_header_all_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one. The 4-bit field in TW_version_number indicates whether or not there is a change in the element in TWS (text_whole_segment) from the previously transmitted content and, in a case where there is a change, its value is increased by one.
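The version-number mechanism above can be sketched as follows; since each counter is a 4-bit field, the wrap-around from 15 back to 0 is an assumption based on the stated field width:

```python
def next_version(version: int) -> int:
    """Increment a *_version_number counter; 4-bit fields wrap at 16
    (an assumption based on the field width described above)."""
    return (version + 1) & 0x0F

def has_changed(prev_version: int, new_version: int) -> bool:
    """The receiver detects a change in a segment's content simply by
    comparing the received counter with the previously received one."""
    return new_version != prev_version

assert next_version(15) == 0                      # 4-bit wrap-around
assert has_changed(3, 4) and not has_changed(4, 4)
```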
[0132] The 4-bit field in subtitle_display_area specifies a subtitle display area (subtitle area). For example, 0x1 specifies 640h*480v, 0x2 specifies 720h*480v, 0x3 specifies 720h*576v, 0x4 specifies 1280h*720v, 0x5 specifies 1920h*1080v, 0x6 specifies 3840h*2160v, and 0x7 specifies 7680h*4320v.
[0133] The 4-bit field in subtitle_gamut_info specifies the color gamut to be used for the subtitle. The 4-bit field in subtitle_dynamic_range_info specifies the dynamic range to be used for the subtitle. For example, 0x1 indicates SDR and 0x2 indicates HDR. In a case where HDR is specified for the subtitle, it is indicated that the luminance level of the subtitle is assumed to be suppressed to be equal to or lower than the reference white level of the video.
[0134] The 4-bit field in target_video_resolution specifies an assumed resolution of the video. For example, 0x1 specifies 640h*480v, 0x2 specifies 720h*480v, 0x3 specifies 720h*576v, 0x4 specifies 1280h*720v, 0x5 specifies 1920h*1080v, 0x6 specifies 3840h*2160v, and 0x7 specifies 7680h*4320v.
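Since subtitle_display_area and target_video_resolution enumerate the same resolution codes, a receiver can share one lookup table for both; a sketch:

```python
# Shared code table for subtitle_display_area and target_video_resolution,
# as enumerated in the two paragraphs above.
RESOLUTION_CODES = {
    0x1: (640, 480),  0x2: (720, 480),   0x3: (720, 576),   0x4: (1280, 720),
    0x5: (1920, 1080), 0x6: (3840, 2160), 0x7: (7680, 4320),
}

def decode_resolution(code: int):
    """Map a 4-bit code to (horizontal, vertical); None for unknown codes."""
    return RESOLUTION_CODES.get(code)

assert decode_resolution(0x5) == (1920, 1080)
assert decode_resolution(0x0) is None
```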
[0135] The 4-bit field in target_video_color_gamut_info specifies an assumed color gamut of the video. For example, 0x1 indicates BT.709 and 0x2 indicates BT.2020. The 4-bit field in target_video_dynamic_range_info specifies an assumed dynamic range of the video. For example, 0x1 indicates BT.709, 0x2 indicates BT.202x, and 0x3 indicates SMPTE 2084.
[0136] The 4-bit field in number_of_regions specifies a number of regions. According to the number of the regions, the following fields are repeatedly provided. The 16-bit field in region_id indicates an ID of a region.
[0137] The 8-bit field in start_time_offset indicates the subtitle display start time as an offset value from the PTS. The offset value of start_time_offset is a signed value, and a negative value indicates that the display starts at a timing earlier than the PTS. In a case where the offset value of start_time_offset is zero, the display starts at the timing of the PTS. In the 8-bit expression, the value is accurate to one decimal place, obtained by dividing the code value by ten.
[0138] The 8-bit field in end_time_offset indicates the subtitle display end time as an offset value from start_time_offset; in other words, this offset value indicates the display period. When the offset value of the above start_time_offset is zero, the display ends at the timing obtained by adding the offset value of end_time_offset to the PTS. In the 8-bit expression, the value is accurate to one decimal place, obtained by dividing the code value by ten.
[0139] Here, start_time_offset and end_time_offset may be transmitted with an accuracy of 90 kHz, the same as the PTS. In this case, a 32-bit space is secured for each of the start_time_offset and end_time_offset fields.
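Under the semantics described in the preceding paragraphs, the display window could be computed as follows. display_window_ticks is a hypothetical helper; the 0.1 s granularity of the 8-bit form (code value divided by ten, i.e. 9000 ticks at 90 kHz) and the 90 kHz granularity of the 32-bit form follow the text above:

```python
def display_window_ticks(pts: int, start_code: int, end_code: int,
                         hz_90k: bool = False) -> tuple:
    """Compute display start and end times in 90 kHz ticks. In the 8-bit
    form the signed codes are in units of 0.1 s; in the optional 32-bit
    form the offsets are already 90 kHz ticks, like the PTS."""
    if hz_90k:
        start = pts + start_code           # offset already in 90 kHz ticks
        end = start + end_code             # end_code is the display period
    else:
        start = pts + start_code * 9000    # 0.1 s = 9000 ticks at 90 kHz
        end = start + end_code * 9000
    return start, end

# start_code = -5 -> display begins 0.5 s before the PTS; lasts 2.0 s
s, e = display_window_ticks(pts=900_000, start_code=-5, end_code=20)
assert s == 855_000 and e == 1_035_000
```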
[0140] As illustrated in
[0141] The 16-bit field in region_start_horizontal indicates a horizontal pixel position of an upper left corner (See point P in
[0142] Referring back to
[0143] An operation of the transmission device 100 of
[0144] In this case, in a case where the video data is SDR video data, the photoelectric conversion is performed by applying an SDR photoelectric conversion characteristic, and SDR transmission video data (transmission video data having the SDR photoelectric conversion characteristic) is obtained. On the other hand, in a case where the video data is HDR video data, the photoelectric conversion is performed by applying an HDR photoelectric conversion characteristic, and HDR transmission video data (transmission video data having the HDR photoelectric conversion characteristic) is obtained.
[0145] The transmission video data V1 obtained in the video photoelectric conversion unit 103 is converted from the RGB domain into the YCbCr (luminance/chrominance) domain in the RGB/YCbCr conversion unit 104 and then provided to the video encoder 105. The video encoder 105 encodes the transmission video data V1 with, for example, MPEG4-AVC or HEVC, and generates a video stream (PES stream) VS including the encoded video data.
[0146] Further, the video encoder 105 inserts, into a VUI region of an SPS NAL unit in an access unit (AU), meta-information such as information (transfer function) that indicates an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1, information that indicates a color gamut of the transmission video data V1, and information that indicates a reference level.
[0147] Further, the video encoder 105 inserts, into a portion of SEIs of the access unit (AU), a newly defined dynamic range/SEI message (see
[0148] The subtitle generation unit 106 generates text data (character code) DT as subtitle information. The text data DT is provided to the text format conversion unit 107. The text format conversion unit 107 converts the text data DT into subtitle text information having display timing information, which is TTML (see
[0149] The subtitle encoder 108 converts TTML obtained in the text format conversion unit 107 into various types of segments, and generates a subtitle stream SS, which is composed of a PES packet including a payload in which those segments are provided. In this case, in the payload of the PES packet, APTS segments (see
[0150] The video stream VS generated in the video encoder 105 is provided to the system encoder 109. The subtitle stream SS generated in the subtitle encoder 108 is provided to the system encoder 109. The system encoder 109 generates a transport stream TS including a video stream VS and a subtitle stream SS. The transport stream TS is transmitted to the reception device 200 by the transmission unit 110 over airwaves or a packet on a network.
(Exemplary Configuration of Reception Device)
[0152] The control unit 201 has a central processing unit (CPU) and controls the operation of each unit in the reception device 200 on the basis of a control program. The user operation unit 202 is a switch, a touch panel, a remote control transmission unit, or the like, which is used by a user such as a viewer to perform various operations. The reception unit 203 receives the transport stream TS transmitted from the transmission device 100 over airwaves or as packets on a network.
[0153] The system decoder 204 extracts a video stream VS and a subtitle stream SS from the transport stream TS. Further, the system decoder 204 extracts various types of information inserted in a transport stream TS (container) and transmits the information to the control unit 201.
[0154] The video decoder 205 performs a decoding process on the video stream VS extracted by the system decoder 204 and outputs the transmission video data V1. Further, the video decoder 205 extracts a parameter set and an SEI message inserted in each access unit composing the video stream VS and transmits them to the control unit 201.
[0155] In the VUI region of the SPS NAL unit, information (transfer function) that indicates an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1, information that indicates a color gamut of the transmission video data V1, and information that indicates a reference level, or the like is inserted. Further, the SEI message also includes a dynamic range/SEI message (see
[0156] The subtitle decoder 206 processes segment data in each region included in the subtitle stream SS and outputs bitmap data in each region, which is to be superimposed on video data. Further, the subtitle decoder 206 extracts abstract information included in an APTS segment and transmits the abstract information to the control unit 201.
[0157] The abstract information includes subtitle display timing information, subtitle display control information (information of a display position, a color gamut and a dynamic range of the subtitle), and also subject video information (information of a resolution, a color gamut, and a dynamic range) or the like.
[0158] Here, since the display timing information and display control information of the subtitle are included in the XML information provided in segment_payload ( ) of segments other than APTS, they could also be obtained by scanning that XML information; however, they can be obtained more easily simply by extracting the abstract information from the APTS segment. Similarly, the information of the subject video (information of a resolution, a color gamut, and a dynamic range) could be obtained from the system of the video stream VS; however, it can be obtained more easily simply by extracting the abstract information from the APTS segment.
[0160] The coded buffer 261 temporarily stores a subtitle stream SS. The subtitle segment decoder 262 performs a decoding process on segment data in each region stored in the coded buffer 261 at predetermined timing and obtains text data and a control code of each region.
[0161] The font developing unit 263 develops a font on the basis of the text data and control code of each region obtained by the subtitle segment decoder 262 and obtains subtitle bitmap data of each region. In this case, the font developing unit 263 uses, as location information of each region, location information (region_start_horizontal, region_start_vertical, region_end_horizontal, and region_end_vertical) included in the abstract information for example.
[0162] The subtitle bitmap data is obtained in the RGB domain. Further, it is assumed that the color gamut of the subtitle bitmap data corresponds to the color gamut indicated by the subtitle color gamut information included in the abstract information. Further, it is assumed that the dynamic range of the subtitle bitmap data corresponds to the dynamic range indicated by the subtitle dynamic range information included in the abstract information.
[0163] For example, in a case where the dynamic range information indicates SDR, it is assumed that the subtitle bitmap data has a dynamic range of SDR and that the photoelectric conversion has been performed by applying an SDR photoelectric conversion characteristic. Further, in a case where the dynamic range information indicates HDR, it is assumed that the subtitle bitmap data has a dynamic range of HDR and that the photoelectric conversion has been performed by applying an HDR photoelectric conversion characteristic. In this case, the luminance level is limited to the HDR reference level or lower, assuming superimposition on an HDR video.
[0164] The bitmap buffer 264 temporarily stores bitmap data of each region obtained by the font developing unit 263. The bitmap data of each region stored in the bitmap buffer 264 is read from the display start timing and superimposed on image data, and this process continues only during the display period.
[0165] Here, the subtitle segment decoder 262 extracts the PTS from the PES header of the PES packet. Further, the subtitle segment decoder 262 extracts abstract information from the APTS segment. These pieces of information are transmitted to the control unit 201. The control unit 201 controls timing to read the bitmap data of each region from the bitmap buffer 264 on the basis of the PTS and the information of start_time_offset and end_time_offset included in the abstract information.
[0166] Referring back to
[0167]
[0168] The electric-photo conversion unit 221 performs an electric-photo conversion on the input subtitle bitmap data. Here, in a case where the dynamic range of the subtitle bitmap data is SDR, the electric-photo conversion unit 221 performs the electric-photo conversion by applying an SDR electric-photo conversion characteristic to obtain a linear state.
[0169] Further, in a case where the dynamic range of the subtitle bitmap data is HDR, the electric-photo conversion unit 221 performs the electric-photo conversion by applying an HDR electric-photo conversion characteristic to obtain a linear state. Here, the input subtitle bitmap data may already be in a linear state, with no photoelectric conversion having been performed; in this case, the electric-photo conversion unit 221 is not needed.
[0170] The color gamut conversion unit 222 modifies the color gamut of the subtitle bitmap data output from the electric-photo conversion unit 221 to fit the color gamut of the video data. For example, in a case where the color gamut of the subtitle bitmap data is BT.709 and the color gamut of the video data is BT.2020, the color gamut of the subtitle bitmap data is converted from BT.709 to BT.2020. Here, in a case where the color gamut of the subtitle bitmap data is the same as the color gamut of the video data, the color gamut conversion unit 222 performs substantially no processing and outputs the input subtitle bitmap data as it is.
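As a sketch of the BT.709-to-BT.2020 case described above: linear BT.709 RGB can be mapped into BT.2020 primaries with a 3x3 matrix. The coefficients below are the commonly cited ones from ITU-R BT.2087; the function name is illustrative and not part of the described device, and the input is assumed to be in the linear state produced by the electric-photo conversion.

```python
# Linear-light BT.709 RGB -> BT.2020 RGB conversion matrix (ITU-R BT.2087).
BT709_TO_BT2020 = [
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
]

def convert_709_to_2020(rgb):
    """Convert one linear BT.709 RGB triplet into the BT.2020 container."""
    r, g, b = rgb
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in BT709_TO_BT2020)

# Each matrix row sums to 1.0, so reference white (1, 1, 1) is preserved;
# saturated BT.709 primaries land inside the wider BT.2020 gamut.
white_2020 = convert_709_to_2020((1.0, 1.0, 1.0))
red_2020 = convert_709_to_2020((1.0, 0.0, 0.0))
```

When source and target gamuts match, the identity matrix applies, which corresponds to the pass-through behavior described above.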
[0171] The photoelectric conversion unit 223 performs a photoelectric conversion on the subtitle bitmap data output from the color gamut conversion unit 222 by applying the same photoelectric conversion characteristic as that applied to the video data. The RGB/YCbCr conversion unit 224 converts the subtitle bitmap data output from the photoelectric conversion unit 223 from the RGB domain to the YCbCr (luminance/chrominance) domain.
[0172] The luminance level conversion unit 225 obtains output bitmap data by adjusting the subtitle bitmap data output from the RGB/YCbCr conversion unit 224 so that the maximum luminance level of the subtitle bitmap data becomes equal to or lower than the luminance reference level of the video data or a reference white level. In a case where the luminance level of the subtitle bitmap data has already been adjusted in consideration of superimposition on HDR video, and the video data is HDR, the input subtitle bitmap data is output as it is with substantially no processing.
[0173]
[0174] The encoding pixel bit depth adjustment unit 231 modifies the encoding pixel bit depth of the luminance level signal Ys of the subtitle bitmap data to fit the encoding pixel bit depth of the video data. For example, in a case where the encoding pixel bit depth of the luminance level signal Ys is 8 bits and the encoding pixel bit depth of the video data is 10 bits, the encoding pixel bit depth of the luminance level signal Ys is converted from 8 bits to 10 bits. The level adjustment unit 232 generates an output luminance level signal Ys by adjusting the bit-depth-adjusted luminance level signal Ys so that its maximum level becomes equal to or lower than the luminance reference level of the video data or a reference white level.
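The 8-bit-to-10-bit case described above can be sketched as an MSB-aligned bit shift, a common way to fit a video code value to a larger bit depth. The helper below is a hypothetical illustration, not necessarily the exact method of the described unit.

```python
def fit_bit_depth(sample: int, src_bits: int = 8, dst_bits: int = 10) -> int:
    """MSB-align a code value from src_bits to dst_bits.

    Shifting left by (dst - src) bits maps, e.g., 8-bit video range
    16..235 onto the corresponding 10-bit range 64..940.
    """
    if dst_bits >= src_bits:
        return sample << (dst_bits - src_bits)
    return sample >> (src_bits - dst_bits)

# 8-bit nominal white 235 becomes 10-bit nominal white 940.
white_10bit = fit_bit_depth(235)
```

After this adjustment, the level adjustment unit 232 would scale the signal so its maximum stays at or below the reference level, as described next.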
[0175]
[0176] The reference level exists between the maximum level (sc_high) and the minimum level (sc_low) of the luminance level signal Ys after the encoding pixel bit depth is made to fit. In this case, the maximum level (sc_high) is adjusted to be equal to or lower than the reference level. Here, a method of scaling down linearly, for example, is employed, since a method of clipping would cause a solid white pattern.
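The scale-down described above, as opposed to clipping, can be sketched as applying a uniform linear gain so that sc_high maps onto the reference level. The helper below is a hypothetical illustration; the names follow the description above.

```python
def scale_to_reference(ys, sc_high, ref_level):
    """Linearly scale luminance samples so sc_high maps to ref_level.

    Unlike clipping (min(y, ref_level)), which flattens every sample above
    the reference level into a solid white pattern, a uniform gain keeps
    the relative differences between samples intact.
    """
    if sc_high <= ref_level:
        return list(ys)  # already within range; nothing to do
    gain = ref_level / sc_high
    return [y * gain for y in ys]

# sc_high = 400 scaled down to a reference level of 200: all samples halve.
scaled = scale_to_reference([100, 200, 400], sc_high=400, ref_level=200)
```

The same gain is applied to every sample, so gradations inside the subtitle bitmap survive the adjustment.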
[0177] In a case where the subtitle bitmap data is superimposed on the video data after the level of the luminance level signal Ys is adjusted in this manner, high image quality can be maintained since the subtitle is prevented from being displayed too brightly against the background video.
[0178] Here, the above description has described the component part 225Y (see
[0179] Referring back to
[0180] For example, a case where the subtitle is compatible with an HD resolution and the video has a UHD resolution will be explained. In this case, the UHD resolution exceeds the HD resolution and includes a 4K resolution or an 8K resolution.
[0181]
[0182]
[0183] Further, under the control by the control unit 201, the position/size conversion unit 208 performs a subtitle size conversion process on the subtitle bitmap data obtained by the color gamut/luminance level conversion unit 207, in response to an operation by a user such as a viewer, or automatically on the basis of the relationship between the video resolution and the corresponding resolution of the subtitle, for example.
[0184] As illustrated in
[0185] As illustrated in
[0186] In other words, the proportion between the distance from rc to (rsx2, rsy2) and the distance from rc to (rsx1, rsy1), and the proportion between the distance from rc to (rex2, rey2) and the distance from rc to (rex1, rey1), are adjusted to correspond to the Ratio. This allows the size conversion of the subtitle (region) to be performed while maintaining the relative location relationship within the entire display area, since the center location rc of the region is kept at the same location even after the size conversion is performed.
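The proportional size conversion about the fixed center rc can be sketched as follows. The function below is a hypothetical illustration that scales both corners of a region about its center by the Ratio, matching the relationship described above.

```python
def scale_region_about_center(start, end, ratio):
    """Scale a region's start/end corners about its center rc by `ratio`.

    start = (rsx1, rsy1), end = (rex1, rey1); returns the converted
    corners (rsx2, rsy2), (rex2, rey2). The center rc stays fixed, so the
    region keeps its relative position in the display area.
    """
    (sx, sy), (ex, ey) = start, end
    cx, cy = (sx + ex) / 2, (sy + ey) / 2  # center location rc
    new_start = (cx + (sx - cx) * ratio, cy + (sy - cy) * ratio)
    new_end = (cx + (ex - cx) * ratio, cy + (ey - cy) * ratio)
    return new_start, new_end

# Halving a 200x100 region: the center (200, 150) does not move.
ns, ne = scale_region_about_center((100, 100), (300, 200), 0.5)
```

Because each corner's distance from rc is multiplied by the same Ratio, the distance proportions described above hold for both the start and end corners.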
[0187] Referring back to
[0188] The YCbCr/RGB conversion unit 210 converts the transmission video data V1 on which the subtitle bitmap data is superimposed, from YCbCr (luminance/chrominance) domain to RGB domain. In this case, the YCbCr/RGB conversion unit 210 performs the conversion by using a conversion equation corresponding to a color gamut on the basis of the color gamut information.
[0189] The electric-photo conversion unit 211 obtains display video data used to display an image by performing an electric-photo conversion on the transmission video data V1, which has been converted to the RGB domain, by applying an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic applied thereto. The display mapping unit 212 adjusts the display luminance level of the display video data according to the maximum luminance level display performance of the CE monitor 213. The CE monitor 213 displays an image on the basis of the display video data on which the display luminance level adjustment has been performed. The CE monitor 213 is composed of, for example, a liquid crystal display (LCD), an organic electroluminescence display (organic EL display), or the like.
[0190] Operation of the reception device 200 illustrated in
[0191] The video stream VS extracted by the system decoder 204 is provided to the video decoder 205. The video decoder 205 performs a decoding process on the video stream VS and obtains the transmission video data V1. Further, the video decoder 205 extracts the parameter sets and SEI messages inserted into each access unit composing the video stream VS and transmits them to the control unit 201.
[0192] In the VUI region of the SPS NAL unit, information (transfer function) that indicates an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic of the transmission video data V1, information that indicates a color gamut of the transmission video data V1, and information that indicates a reference level are inserted. Further, the SEI message also includes a dynamic range/SEI message (see
[0193] The subtitle stream SS extracted by the system decoder 204 is provided to the subtitle decoder 206. The subtitle decoder 206 performs a decoding process on the segment data of each region included in the subtitle stream SS and obtains subtitle bitmap data of each region, which is to be superimposed on the video data.
[0194] Further, the subtitle decoder 206 extracts abstract information included in an APTS segment (see
[0195] Under the control by the control unit 201, the output timing of the subtitle bitmap data of each region in the subtitle decoder 206 is controlled on the basis of the subtitle display timing information (start_time_offset and end_time_offset) included in the abstract information, for example.
[0196] The subtitle bitmap data of each region obtained by the subtitle decoder 206 is provided to the color gamut/luminance level conversion unit 207. Under the control by the control unit 201, the color gamut/luminance level conversion unit 207 modifies the color gamut of the subtitle bitmap data to fit the color gamut of the video data on the basis of the color gamut information (subtitle_color_gamut_info and target_video_color_gamut_info) included in the abstract information, for example.
[0197] Further, under the control by the control unit 201, the color gamut/luminance level conversion unit 207 adjusts the maximum luminance level of the subtitle bitmap data to be equal to or lower than the reference luminance level of the video data on the basis of the dynamic range information (subtitle_dynamic_range_info and target_video_dynamic_range_info) included in the abstract information for example.
[0198] The subtitle bitmap data of each region obtained by the color gamut/luminance level conversion unit 207 is provided to the position/size conversion unit 208. Under the control by the control unit 201, the position/size conversion unit 208 performs a location conversion process on the subtitle bitmap data of each region on the basis of the resolution information (subtitle_display_area and target_video_resolution) included in the abstract information for example.
[0199] Further, under the control by the control unit 201, the position/size conversion unit 208 performs a subtitle size conversion process on the subtitle bitmap data obtained by the color gamut/luminance level conversion unit 207 in response to an operation by a user such as a viewer or automatically on the basis of the relationship between the video resolution and the corresponding resolution of the subtitle for example.
[0200] The transmission video data V1 obtained by the video decoder 205 is provided to the video superimposition unit 209. Further, the subtitle bitmap data of each region obtained by the position/size conversion unit 208 is provided to the video superimposition unit 209. The video superimposition unit 209 superimposes the subtitle bitmap data of each region on the transmission video data V1. In this case, the subtitle bitmap data is mixed on the basis of the mix ratio indicated by the mix ratio information (Mixing data).
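The mixing based on the mix ratio can be sketched as an ordinary linear blend. The helper below is a hypothetical illustration assuming a normalized mix ratio in [0, 1]; the name mix_pixel is invented for this sketch.

```python
def mix_pixel(video: float, subtitle: float, mix_ratio: float) -> float:
    """Blend one subtitle sample onto one video sample.

    mix_ratio is the subtitle weight, normalized to [0, 1]: 0 leaves the
    video untouched, 1 replaces it entirely with the subtitle sample.
    """
    return mix_ratio * subtitle + (1.0 - mix_ratio) * video

# A half-transparent white subtitle sample over a dark video sample.
blended = mix_pixel(video=0.2, subtitle=1.0, mix_ratio=0.5)
```

Applying this per sample across a region reproduces the superimposition performed by the video superimposition unit 209 at the ratio indicated by the Mixing data.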
[0201] The transmission video data V1, which is obtained in the video superimposition unit 209 and on which the subtitle bitmap data of each region is superimposed, is converted from the YCbCr (luminance/chrominance) domain to the RGB domain in the YCbCr/RGB conversion unit 210 and provided to the electric-photo conversion unit 211. The electric-photo conversion unit 211 obtains display video data used to display an image by performing an electric-photo conversion on the transmission video data V1 by applying an electric-photo conversion characteristic corresponding to the photoelectric conversion characteristic applied thereto.
[0202] The display video data is provided to the display mapping unit 212. The display mapping unit 212 adjusts the display luminance level of the display video data according to a maximum luminance level display performance of the CE monitor 213. The display video data, in which the display luminance level is adjusted, is provided to the CE monitor 213. On the CE monitor 213, an image is displayed on the basis of the display video data.
[0203] As described above, in the transmitting/receiving system 10 illustrated in
[0204] In this case, on the reception side, since the process load is reduced, time-series display control in which the subtitle display changes relatively fast can be easily handled. For example, a case where the subtitle display changes as illustrated in
[0205] In this case, firstly, for example, as illustrated in
[0206] Then, the reception side outputs the bitmap data from display start timing T1 until display end timing T3, on the basis of the PTS1 and the display timing information (STS1, ETS1) included in the APTS segment. With this configuration, on the reception side, the letters ABC are continuously displayed on the screen from T1 to T3 as illustrated in
[0207] Next, for example, as illustrated in
[0208] Then, the reception side outputs the bitmap data from display start timing T2 until display end timing T5, on the basis of the PTS2 and the display timing information (STS2, ETS2) included in the APTS segment. With this configuration, on the reception side, as illustrated in
[0209] Next, for example, as illustrated in
[0210] Then, the reception side outputs the bitmap data from display start timing T4 until display end timing T6, on the basis of the PTS3 and the display timing information (STS3, ETS3) included in the APTS segment. With this configuration, on the reception side, the letters GHI are continuously displayed from T4 to T6 as illustrated in
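The three display windows above can be sketched together as follows. The tick values are invented for illustration; only the ordering T1 < T2 < T3 < T4 < T5 < T6 and the PTS-plus-offset derivation follow the description, under which ABC and DEF overlap between T2 and T3, and DEF and GHI overlap between T4 and T5.

```python
# Each PES packet carries a PTS plus (STS, ETS) display offsets in its
# APTS segment; the tick values here are hypothetical.
packets = {
    "ABC": {"pts": 100, "sts": 0, "ets": 200},  # on screen T1=100 .. T3=300
    "DEF": {"pts": 200, "sts": 0, "ets": 300},  # on screen T2=200 .. T5=500
    "GHI": {"pts": 400, "sts": 0, "ets": 200},  # on screen T4=400 .. T6=600
}

def on_screen(packet, t):
    """True while t falls inside the packet's display window (PTS + offsets)."""
    start = packet["pts"] + packet["sts"]
    end = packet["pts"] + packet["ets"]
    return start <= t < end

# Between T2 and T3 both ABC and DEF are displayed simultaneously.
visible_at_250 = [name for name, p in packets.items() if on_screen(p, 250)]
```

Because the reception side only evaluates PTS-plus-offset windows, overlapping subtitles require no re-decoding of the text information, which is how the reduced process load supports fast-changing time-series display.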
2. MODIFIED EXAMPLE
[0211] Note that the above described embodiment has described an example in which TTML is used as subtitle text information in a predetermined format having display timing information. However, the present technology is not limited to this example, and other timed text information having information similar to TTML may be used; for example, a format related to TTML may be used.
[0212] Further, the above described embodiment has described an example in which the container is a transport stream (MPEG-2 TS). However, the container is not limited to MPEG-2 TS, and the present technology may be realized similarly with a container in another format, such as MMT or ISOBMFF, for example.
[0213] Further, the above described embodiment has described an example in which TTML and abstract information are included in segments and provided in the PES data payload of a PES packet. However, according to the present technology, the TTML and abstract information may instead be provided directly in the PES data payload.
[0214] Further, the above described embodiment has described the transmitting/receiving system 10 including the transmission device 100 and the reception device 200; however, the configuration of the transmitting/receiving system to which the present technology is applied is not limited to this example. For example, a part of the reception device 200 may be configured as a set-top box and a monitor connected via a digital interface such as High-Definition Multimedia Interface (HDMI). Note that HDMI is a registered trademark.
[0215] Further, the present technology may have the following configurations.
[0216] (1) A transmission device including:
[0217] a video encoder configured to generate a video stream including encoded video data;
[0218] a subtitle encoder configured to generate a subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; and
[0219] a transmission unit configured to transmit a container including the video stream and the subtitle stream.
[0220] (2) The transmission device according to (1), in which the abstract information includes subtitle display timing information.
[0221] (3) The transmission device according to (2), in which the subtitle display timing information has information of a display start timing and a display period.
[0222] (4) The transmission device according to (3), in which
[0223] the subtitle stream is composed of a PES packet including a PES header and a PES payload,
[0224] the subtitle text information and the abstract information are provided in the PES payload, and
[0225] the display start timing is expressed as a display offset from a PTS inserted in the PES header.
[0226] (5) The transmission device according to any of (1) to (4), in which the abstract information includes display control information for controlling a subtitle display condition.
[0227] (6) The transmission device according to (5), in which the display control information includes information of at least one of a display position, a color gamut, and a dynamic range of the subtitle.
[0228] (7) The transmission device according to (6), in which the display control information further includes subject video information.
[0229] (8) The transmission device according to any of (1) to (7), in which the abstract information includes notification information for providing notification that there is a change in an element of the subtitle text information.
[0230] (9) The transmission device according to any of (1) to (8), in which the subtitle encoder divides the subtitle text information and abstract information into segments and generates the subtitle stream including a predetermined number of segments.
[0231] (10) The transmission device according to (9), in which in the subtitle stream, a segment of the abstract information is provided in the beginning and a segment of the subtitle text information is subsequently provided.
[0232] (11) The transmission device according to any of (1) to (10), in which the subtitle text information is in TTML or in a format related to TTML.
[0233] (12) A transmission method including:
[0234] a video encoding step of generating a video stream including encoded video data;
[0235] a subtitle encoding step of generating a subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information; and
[0236] a transmission step of transmitting, by a transmission unit, a container including the video stream and the subtitle stream.
[0237] (13) A reception device including
[0238] a reception unit configured to receive a container in a predetermined format including a video stream and a subtitle stream,
[0239] the video stream including encoded video data, and
[0240] the subtitle stream including subtitle text information including display timing information and abstract information including information corresponding to a part of pieces of information indicated by the text information;
[0241] a video decoder configured to decode the video stream and obtain video data;
[0242] a subtitle decoder configured to decode the subtitle stream, obtain subtitle bitmap data, and extract the abstract information;
[0243] a video superimposition unit configured to superimpose the subtitle bitmap data on the video data and obtain display video data; and
[0244] a control unit configured to control the subtitle bitmap data to be superimposed on the video data on the basis of the abstract information.
[0245] (14) The reception device according to (13) in which
[0246] the abstract information includes subtitle display timing information, and
[0247] the control unit controls timing to superimpose the subtitle bitmap data on the video data on the basis of the subtitle display timing information.
[0248] (15) The reception device according to (13) or (14) in which
[0249] the abstract information includes display control information used to control a subtitle display condition, and
[0250] the control unit controls a bitmap condition of the subtitle to be superimposed on the video data on the basis of the display control information.
[0251] (16) A reception method including:
[0252] a reception step of receiving, by a reception unit, a container in a predetermined format including a video stream and a subtitle stream,
[0253] the video stream including encoded video data, and
[0254] the subtitle stream including subtitle text information, which has display timing information, and abstract information, which has information corresponding to a part of pieces of information indicated by the text information;
[0255] a video decoding step of decoding the video stream to obtain video data;
[0256] a subtitle decoding step of decoding the subtitle stream to obtain subtitle bitmap data and extract the abstract information;
[0257] a video superimposing step of superimposing subtitle bitmap data on the video data to obtain display video data; and
[0258] a controlling step of controlling the subtitle bitmap data to be superimposed on the video data on the basis of the abstract information.
[0259] (17) A transmission device including:
[0260] a video encoder configured to generate a video stream including encoded video data;
[0261] a subtitle encoder configured to generate one or more segments in which an element of subtitle text information including display timing information is provided, and generate a subtitle stream including the one or more segments; and
[0262] a transmission unit configured to transmit a container in a predetermined format including the video stream and the subtitle stream.
[0263] (18) The transmission device according to (17), in which
[0264] in a case where a segment in which all elements of the subtitle text information are provided is generated,
[0265] the subtitle encoder inserts information related to a transmission order and/or a presence or an absence of an update related to the subtitle text information into a layer of the segment or a layer of the elements.
[0266] (19) The transmission device according to (17) or (18), in which
[0267] the subtitle text information is in TTML or in a format related to TTML.
[0268] (20) A transmission method including:
[0269] a video encoding step of generating a video stream including encoded video data;
[0270] a subtitle encoding step of generating one or more segments in which an element of subtitle text information including display timing information is provided and generating a subtitle stream including the one or more segments; and
[0271] a transmission step of transmitting, by a transmission unit, a container in a predetermined format including the video stream and the subtitle stream.
[0272] A main characteristic of the present technology is that the process load for displaying subtitles on the reception side is reduced by including subtitle text information and abstract information corresponding to the text information in a subtitle stream (see
REFERENCE SIGNS LIST
[0273] 10 Transmitting/receiving system
[0274] 100 Transmission device
[0275] 101 Control unit
[0276] 102 Camera
[0277] 103 Video photoelectric conversion unit
[0278] 104 RGB/YCbCr conversion unit
[0279] 105 Video encoder
[0280] 106 Subtitle generation unit
[0281] 107 Text format conversion unit
[0282] 108 Subtitle encoder
[0283] 109 System encoder
[0284] 110 Transmission unit
[0285] 200 Reception device
[0286] 201 Control unit
[0287] 202 User operation unit
[0288] 203 Reception unit
[0289] 204 System decoder
[0290] 205 Video decoder
[0291] 206 Subtitle decoder
[0292] 207 Color gamut/luminance level conversion unit
[0293] 208 Position/size conversion unit
[0294] 209 Video superimposition unit
[0295] 210 YCbCr/RGB conversion unit
[0296] 211 Electric-photo conversion unit
[0297] 212 Display mapping unit
[0298] 213 CE monitor
[0299] 221 Electric-photo conversion unit
[0300] 222 Color gamut conversion unit
[0301] 223 Photoelectric conversion unit
[0302] 224 RGB/YCbCr conversion unit
[0303] 225 Luminance level conversion unit
[0304] 225Y Component part
[0305] 231 Encoding pixel bit depth adjustment unit
[0306] 232 Level adjustment unit
[0307] 261 Coded buffer
[0308] 262 Subtitle segment decoder
[0309] 263 Font developing unit
[0310] 264 Bitmap buffer