Metadata transcoding
11075762 · 2021-07-27
Assignee
- Dolby Laboratories Licensing Corporation (San Francisco, CA)
- Dolby International Ab (Amsterdam Zuidoost, NL)
Inventors
- Andreas Schneider (Fuerth, DE)
- Christof Fersch (Neumarkt, DE)
- Martin Wolters (Nuremberg, DE)
- Jeffrey RIEDMILLER (Novato, CA, US)
- Scott Gregory NORCROSS (San Rafael, CA, US)
- Michael Grant (San Francisco, CA, US)
Cpc classification
H04N21/2355
ELECTRICITY
H04N21/4355
ELECTRICITY
H04L9/3242
ELECTRICITY
G10L19/173
PHYSICS
International classification
H04L9/32
ELECTRICITY
H04N21/235
ELECTRICITY
H04N19/40
ELECTRICITY
Abstract
The present document relates to transcoding of metadata, and in particular to a method and system for transcoding metadata with reduced computational complexity. A transcoder configured to transcode an inbound bitstream comprising an inbound content frame and an associated inbound metadata frame into an outbound bitstream comprising an outbound content frame and an associated outbound metadata frame is described. The inbound content frame is indicative of a signal encoded according to a first codec system and the outbound content frame is indicative of the signal encoded according to a second codec system. The transcoder is configured to identify an inbound block of metadata from the inbound metadata frame, the inbound block of metadata associated with an inbound descriptor indicative of one or more properties of metadata comprised within the inbound block of metadata, and to generate the outbound metadata frame from the inbound metadata frame based on the inbound descriptor.
Claims
1. A system for processing an encoded audio bitstream, the system comprising: a memory for storing the encoded audio bitstream, wherein the encoded audio bitstream includes a content frame and a metadata frame, the content frame including encoded audio content and metadata associated with the encoded audio content, and the metadata frame including a metadata container with additional metadata distinct from the metadata of the content frame; an audio decoder for decoding the content frame or the metadata frame; and an extractor for extracting the content frame or the metadata frame, wherein the metadata container includes a key identifier, one or more metadata payloads, and protection data, and wherein the metadata container further includes, for each of the one or more metadata payloads, a payload identifier, a metadata payload configuration, and a size.
2. The system of claim 1 wherein the metadata of the content frame comprises a dialogue normalization parameter.
3. The system of claim 1 wherein the key identifier is stored in a 3-bit field and the size is stored in an 8-bit field.
4. The system of claim 1 wherein an end of the metadata container is indicated when a value of the payload identifier is zero.
5. A method for processing an encoded audio bitstream, the method comprising: storing the encoded audio bitstream, wherein the encoded audio bitstream includes a content frame and a metadata frame, the content frame including encoded audio content and metadata associated with the encoded audio content, and the metadata frame including a metadata container with additional metadata distinct from the metadata of the content frame; decoding the content frame or the metadata frame; and extracting the content frame or the metadata frame, wherein the metadata container includes a key identifier, one or more metadata payloads, and protection data, and wherein the metadata container further includes, for each of the one or more metadata payloads, a payload identifier, a metadata payload configuration, and a size.
6. The method of claim 5 wherein the metadata of the content frame comprises a dialogue normalization parameter.
7. The method of claim 5 wherein the key identifier is stored in a 3-bit field and the size is stored in an 8-bit field.
8. The method of claim 5 wherein an end of the metadata container is indicated when a value of the payload identifier is zero.
9. A non-transitory computer readable medium comprising instructions that when executed by a processor cause the processor to perform the method of claim 5.
Description
SHORT DESCRIPTION OF THE FIGURES
(1) The invention is explained below in an exemplary manner with reference to the accompanying drawings, wherein
DETAILED DESCRIPTION
(8) As outlined in the background section, audio content is typically associated with metadata and encoded into a joint bitstream comprising a sequence of encoded content frames and an associated sequence of metadata containers (also referred to as metadata frames).
(9) The distribution system 100 comprises an encoder 101 which is configured to encode the audio content and provide an encoded bitstream 110 (also referred to as the first encoded bitstream 110 or the inbound encoded bitstream 110). The first encoded bitstream 110 typically comprises a sequence of encoded content frames 111, wherein an encoded content frame 111 may be associated with a corresponding metadata frame 112. The encoder 101 is configured to provide a first encoded bitstream 110 which is encoded in accordance with a first audio codec system. The first audio codec system may e.g. be one of: Dolby E, Dolby Digital Plus, Dolby Digital, Dolby True HD, Dolby Pulse, AAC and/or HE-AAC. The content frames 111 may represent or may comprise a pre-determined number of samples of the audio content, e.g. 1536, 1024 or 512 samples of the audio content.
(10) The first encoded bitstream 110 is provided to a transcoder 103 via a transmission medium or via a storage medium 102. The transcoder 103 is configured to transcode or convert the first encoded bitstream 110 into a second encoded bitstream 120 (also referred to as an outbound bitstream 120), wherein the second encoded bitstream 120 is encoded in accordance with a second audio codec system. The second audio codec system may be different from the first audio codec system. Alternatively, the second audio codec system may be the same as the first audio codec system, but use a different configuration, e.g. a different bit-rate, a different frame-rate and/or a different channel configuration. The second audio codec system may e.g. be one of: Dolby E, Dolby Digital Plus, Dolby Digital, Dolby True HD, Dolby Pulse, AAC and/or HE-AAC. In a similar manner to the first encoded bitstream 110, the second encoded bitstream 120 comprises a sequence of content frames 121 and a corresponding sequence of metadata frames 122. The content frames 121 of the second encoded bitstream 120 may have a frame size which is different from the frame size of the content frames 111 of the first encoded bitstream 110. The second encoded bitstream 120 may be provided to a decoder 104 for rendering of the audio content.
(11) The metadata frames 112, 122 may have a pre-determined structure. In other words, the metadata frames 112, 122 may follow a pre-determined syntax. By way of example, the metadata frames 112, 122 may follow the so-called evolution frame syntax illustrated in Table 1. The evolution frame syntax may e.g. be used in the context of standardized multimedia content codec systems such as the Digital Video Broadcasting (DVB) system and/or a Moving Picture Experts Group (MPEG) codec system. It should be noted that the metadata frame syntax shown in Table 1 and the following tables is only an example. Variations or modifications of the syntax are possible. In particular, the syntax shown in the present document may be extended by additional fields, e.g. for providing additional functionalities.
(12) TABLE 1
Syntax                                          No. of bits
evo_frame() {
    key_id = variable_bits(3);
    while ((id = payload_id) != END) {          5
        if (payload_id == 31) {
            payload_id += variable_bits(5);
        }
        payload_config();
        payload_size = variable_bits(8);
        payload(payload_id, payload_size);
    }
    protection();
}
(13) The semantics of the parameters of the evolution frame shown in Table 1 may be as follows:
- key_id may be an identifier of the cryptographic key used for hashing (i.e. used for calculating the protection_bits of the protection( ) field);
- payload_id may be an identifier of the following application payload; a payload_id END=“0000b” may have the meaning that no further payload is contained in this evo_frame( );
- payload_size may indicate the number of bytes in the following payload field.
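The payload loop of Table 1 can be sketched in Python as follows. This is a minimal sketch, not a reference parser: the document does not define variable_bits(), so the escape coding used here (read n bits, then a one-bit "read more" flag that extends the value) is an assumption modeled on how variable_bits() is described for Dolby AC-4. BitReader, read_payload_id and from_bits are hypothetical helpers. key_id would be read the same way with variable_bits(r, 3).

```python
class BitReader:
    """Reads unsigned integers MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def variable_bits(r: BitReader, n_bits: int) -> int:
    # ASSUMPTION: AC-4-style escape coding; this document does not define variable_bits().
    value = 0
    while True:
        value += r.read(n_bits)
        if not r.read(1):          # read-more bit cleared: value complete
            return value
        value = (value << n_bits) + (1 << n_bits)


END = 0  # payload_id value terminating the payload loop of evo_frame()


def read_payload_id(r: BitReader):
    """Read a 5-bit payload_id with the escape of Table 1; return None at END."""
    payload_id = r.read(5)
    if payload_id == END:
        return None
    if payload_id == 31:           # escape: extend the id range
        payload_id += variable_bits(r, 5)
    return payload_id


def from_bits(s: str) -> bytes:
    """Example helper: '0101...' -> bytes, zero-padded to a byte boundary."""
    s += "0" * (-len(s) % 8)
    return int(s or "0", 2).to_bytes(len(s) // 8, "big")
```

For example, the bits 11111 00010 0 decode to the escaped payload_id 31 + 2 = 33, while five zero bits yield None, i.e. the END marker.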
(14) The evolution frame syntax specifies a metadata frame 112, 122 which may comprise a plurality of blocks of metadata, wherein a block of metadata is also referred to as a payload. As such, a metadata frame 112, 122 may comprise zero, one or more blocks of metadata, wherein each block of metadata is indicative of a particular type and/or a particular aspect of metadata. Examples of types of metadata are:
- descriptive metadata, which describes particular aspects of the content frame 111 that the metadata frame 112 is associated with (e.g. tempo and/or harmonic information);
- unrelated metadata, which comprises auxiliary data that is not directly related to the content frame 111 (such as firmware upgrades for a target decoder of the encoded audio content);
- control metadata, which may be used to control the rendering of one or more samples of the content frame 111 that the metadata frame 112 is associated with (e.g. loudness values for one or more samples of the content frame 111).
(15) As such, the metadata frame 112 provides a flexible structure which can be expanded by additional blocks of metadata as needed, in order to describe additional characteristics of the encoded audio content or in order to transmit additional auxiliary data within the bitstream 110. In case no metadata is to be transmitted along with a content frame 111, the metadata frame 112 may comprise no block of metadata, which may be indicated in the syntax of the evolution frame of Table 1 by a payload_id which corresponds to a pre-determined “END” ID (identifier).
(16) In the present document, it is proposed to add a descriptor to a block of metadata, wherein the descriptor describes one or more characteristics or properties of the metadata comprised within the associated block of metadata. This descriptor is referred to as “payload_config( )” in the syntax of the evolution frame shown in Table 1. The descriptor may be used by a transcoder to perform an efficient transcoding of the block of metadata, without the need to analyze the metadata comprised within the associated block of metadata. As a result of this, the complexity of the transcoding of metadata can be significantly reduced.
(17) In other words, the present document describes methods of transcoding blocks of metadata (also referred to as payloads) within a metadata frame 112 (e.g. within the evolution frame shown in Table 1) from one coded bitstream to another bitstream. The transcoding operations may be guided by specific fields within each payload (e.g. the field “payload_config( )” of a block of metadata, as shown in Table 1). The transcoding operations may then be specified such that the individual payloads can be appropriately transcoded from one coded stream to another coded stream, without the need to extract or interpret the essence of the underlying metadata parameters of the block of metadata (i.e. without the need to extract or interpret the essence of the underlying payloads).
(19) An example descriptor 142 of a block 140 of metadata for an evolution frame, i.e. an example “payload_config( )” field, is shown in Table 2. It can be seen that the descriptor 142 may comprise or may be indicative of one or more properties of the metadata comprised within the block 140. In the example of Table 2, the properties are:
- a timestamp parameter indicative of a sample of the audio content to which the metadata of the block 140 is applicable. The timestamp may indicate a sample which is comprised within the content frame 111 that is associated with the metadata frame 112 of the block 140. Alternatively or in addition, the timestamp may be configured to take on sufficiently large values to indicate a sample which is comprised within a content frame that succeeds the content frame 111 associated with the metadata frame 112 of the block 140.
- a duration parameter indicative of the number of samples (starting from the sample indicated by the timestamp) for which the metadata of the block 140 is applicable.
- a transcoding flag (referred to as a “don't transcode” flag in Table 2) which provides an instruction to a transcoder on whether or not to transcode the block 140 of metadata. If the “don't transcode” flag is set, the transcoder may simply ignore or remove the block 140 of metadata when transcoding the inbound bitstream 110. This may be useful for metadata which is relevant only for the first codec system of the inbound bitstream 110 and does not make sense for any other codec system to which the bitstream 110 may be transcoded. This is the case e.g. for a cyclic redundancy check (CRC) which is generated over data comprised within the inbound bitstream 110: a CRC typically only makes sense if the encoded data are not modified, so there is no need to transcode the CRC. In more general terms, the transcoding flag may be used to identify metadata that is only useful during the decode process of the inbound bitstream within the transcoder (and is therefore not required for the subsequent re-encode process for generating the outbound bitstream).
- a duplicate flag which provides an instruction to a transcoder on whether or not to duplicate the metadata comprised within the block 140 when the size of the content frame 111 prior and subsequent to transcoding differs.
- a de-duplicate flag which provides an instruction to a transcoder on whether or not to remove duplicates of the metadata comprised within the block 140 when the size of the content frame 111 prior and subsequent to transcoding differs.
- a priority parameter which provides an indication of the relative importance of the metadata comprised within the block 140. The transcoder may use the priority parameter to select one or more blocks 140 from a metadata frame 130, e.g. if the allowed bit-rate of the transcoded second bitstream 120 is reduced with respect to the bit-rate of the first bitstream 110.
- an association flag (referred to as the “now_or_never” flag in Table 2) which provides an indication to the transcoder on whether or not the metadata comprised within the block 140 is associated with the corresponding content frame 111. As such, if the “now_or_never” flag is set, the transcoder is aware that the metadata comprised within the block 140 should either be transcoded immediately or be dropped (as the “now_or_never” flag indicates that the decoder cannot use the metadata if the metadata is delayed).
(20) TABLE 2
Syntax                                          No. of bits
payload_config() {
    timestamp_present;                          1
    if (timestamp_present) {
        timestamp = variable_bits(11);
    }
    duration_present;                           1
    if (duration_present) {
        duration = variable_bits(11);
    }
    dont_transcode;                             1
    if (!dont_transcode) {
        duplicate;                              1
        deduplicate;                            1
        priority;                               5
        now_or_never;                           1
        tight_coupling;                         2
    }
}
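The descriptor of Table 2 can be read with a few lines of code. The sketch below is illustrative only: it repeats a hypothetical MSB-first BitReader so the block stands alone, and it relies on an assumed AC-4-style definition of variable_bits(), which is not specified in this document. Fields guarded by dont_transcode keep their defaults when that flag is set.

```python
from dataclasses import dataclass
from typing import Optional


class BitReader:
    """Reads unsigned integers MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def variable_bits(r: BitReader, n_bits: int) -> int:
    # ASSUMPTION: AC-4-style escape coding; not defined in this document.
    value = 0
    while True:
        value += r.read(n_bits)
        if not r.read(1):
            return value
        value = (value << n_bits) + (1 << n_bits)


@dataclass
class PayloadConfig:
    timestamp: Optional[int] = None   # None if timestamp_present == 0
    duration: Optional[int] = None    # None if duration_present == 0
    dont_transcode: bool = False
    duplicate: bool = False
    deduplicate: bool = False
    priority: int = 0
    now_or_never: bool = False
    tight_coupling: int = 0


def read_payload_config(r: BitReader) -> PayloadConfig:
    """Parse payload_config() in the field order of Table 2."""
    cfg = PayloadConfig()
    if r.read(1):                                  # timestamp_present
        cfg.timestamp = variable_bits(r, 11)
    if r.read(1):                                  # duration_present
        cfg.duration = variable_bits(r, 11)
    cfg.dont_transcode = bool(r.read(1))
    if not cfg.dont_transcode:                     # remaining fields only if transcodable
        cfg.duplicate = bool(r.read(1))
        cfg.deduplicate = bool(r.read(1))
        cfg.priority = r.read(5)
        cfg.now_or_never = bool(r.read(1))
        cfg.tight_coupling = r.read(2)
    return cfg


def from_bits(s: str) -> bytes:
    """Example helper: '0101...' -> bytes, zero-padded to a byte boundary."""
    s += "0" * (-len(s) % 8)
    return int(s or "0", 2).to_bytes(len(s) // 8, "big")
```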
(21) In other words, the semantics of the property parameters of the descriptor 142 shown in Table 2 may be as follows:
- a timestamp parameter indicating the offset in samples from the beginning of the content frame 111 to which the payload 143 in question belongs;
- a duration parameter indicating the time in samples for which the payload 143 in question remains valid;
- a dont_transcode flag that signals whether the payload 143 in question must be discarded when transcoding (flag=1) or whether transcoding can occur (flag=0);
- a duplicate flag that, when set to 1, signals that the payload 143 in question needs to be repeated during transcoding so that it appears in the transcoded blocks 140 between timestamp and timestamp+duration. The duplicate flag may be set e.g. for loudness data, to indicate that frames have the same dialnorm. In more general terms, the duplicate flag may be set for metadata that does not have a notion of time. The duplicate flag is typically not set for data that supports the concept of time by itself, such as the bitstream of a codec. In other words, metadata that is internally timed should not be provided with a set duplicate and/or de-duplicate flag, wherein the term “internally timed” means that only the exact sequence of blocks of metadata is meaningful, i.e. a repetition or de-duplication would invalidate the metadata. An example of internally timed metadata is a different bitstream (different from the content comprised within the content frames) which is embedded into a sequence of metadata blocks of a sequence of metadata frames. The payload of such a bitstream should never be repeated or de-duplicated; otherwise, parts of the bitstream would be repeated or chopped off. Another example of internally timed data is binary data, such as an executable program. If such binary data is transmitted in multiple metadata blocks of multiple metadata frames, then the duplication or de-duplication of metadata blocks would invalidate the meaning of the binary data;
- a de-duplicate flag, which signals that, during transcoding, every block of metadata of a particular id within the same outbound metadata frame beyond the first one that has this flag set to one may be deleted. The de-duplicate flag may be set e.g. for loudness data such as dialnorm, which does not need to be present multiple times per outbound metadata frame 122;
- a “now_or_never” flag that indicates that a payload must not be delayed while transcoding;
- a PCM processing parameter, referred to as a “tight_coupling” parameter in Table 2. The PCM processing parameter may e.g. be used in the context of a PCM-connected transcoder as described below, in order to inform the PCM-connected transcoder on how to handle the metadata of a particular metadata frame which is associated with a particular content frame in case the samples of the signal comprised within the content frame are modified. The function of the PCM processing parameter will be described in further detail below, when describing the functions of a PCM-connected transcoder.
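To illustrate how a transcoder might act on dont_transcode and duplicate without looking inside the payload bytes, the following Python sketch splits the blocks of one inbound metadata frame over several shorter outbound frames. The Block type and split_frame function are hypothetical; the sketch assumes that the inbound frame length is an integer multiple of the outbound frame length, that timestamps are offsets from the start of the frame as described above, and that a block without a signalled duration is placed only once.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """A metadata block reduced to the descriptor fields this sketch needs."""
    payload_id: int
    data: bytes                      # opaque payload bytes; never inspected
    timestamp: int = 0               # offset in samples from the start of the frame
    duration: Optional[int] = None   # validity in samples; None = not signalled
    dont_transcode: bool = False
    duplicate: bool = False


def split_frame(blocks: List[Block], in_len: int, out_len: int) -> List[List[Block]]:
    """Distribute the blocks of one inbound metadata frame over shorter outbound frames.

    Assumes in_len is a multiple of out_len and 0 <= timestamp < in_len.
    """
    assert in_len % out_len == 0
    out: List[List[Block]] = [[] for _ in range(in_len // out_len)]
    for b in blocks:
        if b.dont_transcode:
            continue                     # e.g. a CRC over the inbound bitstream
        first = b.timestamp // out_len   # outbound frame holding the start sample
        if not b.duplicate or b.duration is None:
            out[first].append(replace(b, timestamp=b.timestamp - first * out_len))
            continue
        end = b.timestamp + b.duration   # first sample after the validity range
        for k in range(first, len(out)):
            start_k = k * out_len
            if start_k >= end:
                break
            ts = max(b.timestamp, start_k)
            # repeat the block, re-anchored to outbound frame k
            out[k].append(replace(b, timestamp=ts - start_k, duration=end - ts))
    return out
```

With 1024-sample inbound frames and 512-sample outbound frames, a duplicate-flagged block with timestamp 100 and duration 1000 appears in both outbound frames, the second copy re-anchored to timestamp 0 with duration 588. Validity extending beyond the end of the inbound frame would have to be handled together with the next inbound frame, which this sketch does not attempt.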
(22) TABLE 3
Syntax                                          No. of bits
payload(id, size) {
    for (i = 0; i < size; i++) {
        payload_bytes[i];                       8
    }
}
(23) Table 3 shows the syntax of an example data field 143 of a block 140 of metadata.
(24) As outlined above, the bitstream syntax for carrying metadata (i.e. the metadata frame 130 comprising a block 140 of metadata) may define generic metadata properties (e.g. comprised in the descriptor 142, i.e. in the payload_config( ) field shown in Table 2). These properties enable a simple copying of the metadata from an inbound (i.e. first) bitstream 110 to an outbound (i.e. second) bitstream 120, even if the first codec (used for encoding the inbound bitstream 110) and the second codec (used for encoding the outbound bitstream 120) use different framing. The way in which the metadata is copied is guided by the properties comprised within the descriptor 142. The only elements that may need to be changed during the transcoding process are the properties themselves. However, modifying the properties comprised within the descriptor 142 does not require knowledge about the actual meaning of the metadata comprised within the data field 143 of the block 140.
(25) In the following, the example properties shown in Table 2 are described in more detail. In particular, it is described how the transcoder 103 can make use of one or more of the properties indicated by the descriptor 142 to perform an efficient transcoding of the metadata comprised within a block 140 of metadata.
(28) In particular, the timestamp parameter 213 may indicate the location of the sample 202 relative to the last, i.e. the most recent, sample of the outbound content frame 221 that the outbound metadata frame 222 which comprises the timestamp parameter 213 is associated with. For this purpose, the timestamp parameter 213 may take on values which exceed the number of samples comprised within a content frame 221. In a similar manner, the timestamp parameter 213 may be configured to take on negative values. Such negative values could be used to indicate a sample 202 which is comprised in a future content frame, i.e. in a content frame which is subsequent to the content frame 221 associated with the metadata frame 222 comprising the timestamp parameter 213. By doing this, metadata may be transmitted prior to the one or more samples that it is associated with (e.g. that it is to be applied to).
(29) As such, the timestamp parameter 211 (possibly in combination with the association flag) enables a transcoder 103 to transmit the metadata associated with a timestamp 211 in a subsequent or preceding metadata frame 222 and to adjust the timestamp 213 such that it refers to the same PCM sample 202 (even though, after transcoding, the sample 202 is not comprised in the content frame 221 which is associated with the metadata frame 222 comprising the particular block 140). As a result, the transcoder 103 is given some flexibility to smooth the bit-rate of the second bitstream 120.
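The timestamp adjustment amounts to simple arithmetic on absolute sample positions. The sketch below assumes the convention of paragraph (28), i.e. the timestamp counts backwards from the last sample of the associated content frame, and it assumes equal-length frames within each bitstream; the helper names are hypothetical.

```python
def last_sample(frame_index: int, frame_len: int) -> int:
    """Absolute index of the last (most recent) sample of a content frame."""
    return (frame_index + 1) * frame_len - 1


def retarget_timestamp(ts: int, src_last: int, dst_last: int) -> int:
    """Move a timestamp from one metadata frame to another without changing its sample.

    ASSUMPTION: ts counts backwards from the last sample of the associated content
    frame, so positive values point into the past and negative values into the future.
    """
    sample = src_last - ts        # absolute index of the referenced PCM sample
    return dst_last - sample
```

For instance, a block in 1024-sample inbound frame 0 with timestamp 200 refers to absolute sample 823. Sent with 512-sample outbound frame 1 (last sample 1023), its timestamp stays 200; delayed to outbound frame 2 (last sample 1535), it becomes 712, which exceeds the frame length; sent early with outbound frame 0 (last sample 511), it becomes -312, pointing into a future frame.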
(30) It should be noted that, in a similar manner to the transcoder 103, the encoder 101 may be configured to include metadata for a sample in a subsequent metadata frame. As such, the encoder 101 may be configured to generate a timestamp 213 which points to a sample 202 comprised in a content frame 121 other than the content frame that the metadata frame comprising the timestamp 213 is associated with.
(32) This is illustrated in
(33) It should be noted that the de-duplicate flag of the one or more blocks 140 of the inbound metadata frame 112 may also be set. The transcoder 103 may be configured to drop only the blocks 140 of the second (and any further) metadata frame 312 used to build the outbound metadata frame 122. In other words, the transcoder 103 may be configured to consider the de-duplicate flag only if more than one inbound metadata frame 112 is to be considered for generating an outbound metadata frame 122. As such, the de-duplicate flag may be used to prevent “duplicates” of a particular type of metadata block 140, while still ensuring that at least one metadata block 140 of the particular type is included.
(35) The de-duplicate flag may e.g. be used to identify metadata blocks 140 which are inserted into a plurality of succeeding metadata frames 112, 312 (e.g. into every metadata frame 112, 312 of a bitstream 110). As such, the de-duplicate flag enables a transcoder 103 to easily identify metadata blocks 140 which may be discarded (without the need to analyze the metadata stored in the data field 143 of the metadata block 140). As a result, the computational complexity for transcoding metadata is reduced. On the other hand, a de-duplicate flag which is not set indicates that a corresponding block 140 of metadata should not be dropped. This may be used for auxiliary data, in order to ensure that the auxiliary data is not dropped, even if a plurality of inbound metadata frames 112, 312 are transcoded into a single outbound metadata frame 122.
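The de-duplication rule of paragraphs (33) to (35) can be sketched as follows for the case where several inbound metadata frames are merged into one outbound frame. This is an illustrative sketch with a hypothetical Block type: a de-duplicate-flagged block is dropped only if a flagged block with the same payload_id was already kept from an earlier inbound frame, so a single inbound frame passes through unchanged, and blocks without the flag (e.g. auxiliary data) are always kept. Blocks marked dont_transcode are discarded as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    payload_id: int
    data: bytes                 # opaque payload bytes; never inspected
    deduplicate: bool = False
    dont_transcode: bool = False


def merge_frames(inbound_frames: List[List[Block]]) -> List[Block]:
    """Build one outbound metadata frame from consecutive inbound metadata frames."""
    seen = set()                # payload_ids of de-duplicable blocks kept from earlier frames
    out = []
    for frame in inbound_frames:
        kept_ids = set()
        for b in frame:
            if b.dont_transcode:
                continue
            if b.deduplicate and b.payload_id in seen:
                continue        # repeated copy, e.g. the same dialnorm in every frame
            if b.deduplicate:
                kept_ids.add(b.payload_id)
            out.append(b)
        seen |= kept_ids        # only later inbound frames are de-duplicated against these
    return out
```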
(37) As outlined above, the descriptor 142 of a block 140 of metadata may be indicative of an association flag (referred to as the “now_or_never” flag in Table 2). The association flag may indicate that the metadata comprised within the block 140 may be delayed without impacting the content comprised in the associated content frame. As such, the syntax of the descriptor 142 may enable a transcoder 103 to delay metadata by an arbitrary amount of time, if this is one property of the metadata. This may be indicated by setting the flag now_or_never to 0. The association flag enables the transcoder 103 to transmit the metadata which is comprised within the block 140 e.g. when the underlying audio codec can “afford” the transmission of the metadata, e.g. when the content frames comprise silence. One example of metadata which may be delayed is auxiliary data or binary data, like a firmware upgrade, which does not need to be transmitted along with a particular content frame 121.
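A transcoder could use the association flag to postpone delayable metadata until there is room in the bitstream, for example during silence. The following sketch is one possible policy under stated assumptions, not a prescribed algorithm: each outbound frame is assumed to have a fixed byte budget for metadata (syntax overhead ignored), now_or_never blocks are either sent with their frame or dropped, and delayable blocks are sent strictly in arrival order so that internally timed data such as a firmware image is never reordered. Block and schedule are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    payload_id: int
    data: bytes
    now_or_never: bool = True   # True: send with this frame or drop it


def schedule(frames: List[List[Block]], budget: int) -> Tuple[List[List[Block]], List[Block]]:
    """Assign metadata blocks to outbound frames under a per-frame byte budget.

    Returns the outbound block lists and the delayable blocks still pending at the end.
    """
    pending = deque()            # delayable blocks (now_or_never = 0), oldest first
    out = []
    for blocks in frames:
        sent, used = [], 0
        for b in blocks:
            if not b.now_or_never:
                pending.append(b)              # may be sent later
            elif used + len(b.data) <= budget:
                sent.append(b)
                used += len(b.data)
            # else: a late now_or_never block is useless to the decoder, so it is dropped
        # spend leftover room on delayed blocks, strictly in order of arrival
        while pending and used + len(pending[0].data) <= budget:
            b = pending.popleft()
            sent.append(b)
            used += len(b.data)
        out.append(sent)
    return out, list(pending)
```

In a small example with a four-byte budget, a two-byte time-critical block is sent immediately, while a four-byte delayable chunk waits for the next frame, which carries no other metadata.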
(38) As described in the context of Table 2, the descriptor 142 of a block 140 of metadata may be indicative of or may comprise a priority property or a priority parameter. The priority parameter may indicate a relative importance of the metadata of a particular block 140 (e.g. relative to the importance of other blocks 140). A transcoder 103 can decide to only transcode a certain number of metadata blocks 140 and to discard all other metadata blocks in the metadata frame 112. This may e.g. be required when transcoding from a higher bit-rate inbound bitstream 110 to a lower bit-rate outbound bitstream 120. The priority parameter may enable the transcoder 103 to select those blocks 140 of an inbound metadata frame 112 having the relatively highest priorities and to discard (or delay) those blocks 140 having relatively lower priorities.
(39) Applications and/or encoders 101 may provide multiple sets of metadata in the same metadata frame 112, each with a different priority. The multiple sets of metadata may be associated with different qualities of metadata. The priority of higher quality metadata may be lower than the priority of lower quality metadata. As such, the transcoder 103 may be configured to degrade the quality of the metadata by considering the priority parameter. By way of example, if priorities are set such that scalability is possible, i.e. every metadata set can be applied if all metadata sets of the same application with a higher priority are transmitted, then a transcoder can gracefully degrade the quality of the metadata without having to know the meaning of the metadata. In particular, the multiple sets of metadata may comprise incremental metadata, i.e. each set of metadata may add some quality to the set of metadata with the next highest priority. The highest quality of metadata may then be provided by combining all sets of metadata (from the highest priority down to the lowest priority). As such, an inbound metadata frame 112 may comprise a plurality of blocks 140 of incremental metadata, wherein the block 140 of metadata with the highest priority comprises a version of the metadata with minimum acceptable quality and wherein the blocks 140 with successively lower priority comprise incremental versions of metadata which allow the quality of the metadata to be increased incrementally. As such, the transcoder 103 may decide on the quality of metadata which is included in the second bitstream 120 by considering the priority parameters of the plurality of blocks 140 of incremental metadata.
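A possible selection rule based on the priority parameter is sketched below. Since Table 2 does not say, it assumes that a larger priority value means higher importance, and that the blocks form one incremental (scalable) set: blocks are taken in order of decreasing priority until the next one no longer fits into a byte budget, and selection then stops, so that no increment is kept without all the higher-priority blocks it builds on. Block and select_by_priority are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    payload_id: int
    priority: int   # ASSUMPTION: larger value = more important
    data: bytes     # opaque payload bytes


def select_by_priority(blocks: List[Block], budget: int) -> List[Block]:
    """Keep the most important blocks that fit into `budget` bytes.

    Selection stops at the first block that does not fit, so an incremental
    block is never kept without all higher-priority blocks.
    """
    kept, used = [], 0
    for b in sorted(blocks, key=lambda blk: blk.priority, reverse=True):
        if used + len(b.data) > budget:
            break
        kept.append(b)
        used += len(b.data)
    return kept
```

Stopping at the first block that does not fit is deliberately conservative; for independent (non-incremental) applications a transcoder might instead keep scanning for smaller blocks.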
(40) As indicated in the example syntax of a metadata frame 112 shown in Table 1, the metadata frame 130 may comprise a protection field. The protection field may be used to enable the decoder 104 to verify whether the content of the metadata frame 130 and/or the content of the associated content frame has been modified and may therefore be invalid. In other words, the protection field may allow a decoder 104 to verify whether the metadata comprised within a metadata frame 130 and/or within an associated content frame is trustworthy or not. Table 4 shows an example syntax of a protection field of a metadata frame 130. The protection field may be comprised within the header 131 of the metadata frame 130.
(41) TABLE 4
Syntax                                          No. of bits
protection() {
    protection_config_frame;                    2
    protection_config_history;                  2
    switch (protection_config_frame) {
        case 0: protection_bits_frame;          0
                break;
        case 1: protection_bits_frame;          8
                break;
        case 2: protection_bits_frame;          32
                break;
        case 3: protection_bits_frame;          128
                break;
    }
    switch (protection_config_history) {
        case 0: protection_bits_history;        0
                break;
        case 1: protection_bits_history;        8
                break;
        case 2: protection_bits_history;        32
                break;
        case 3: protection_bits_history;        128
                break;
    }
}
(42) The semantics of the protection field may be as follows: protection_bits_frame may comprise the truncated protection payload of the current frame (comprising the content frame and/or the associated metadata frame). protection_bits_history may comprise the truncated protection payload of the current frame and of the frame(s) before the current frame (comprising the content frame and/or the associated metadata frame). An example scheme for securing a sequence of frames is described in WO2011/015369, the content of which is incorporated by reference.
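Reading the protection( ) field of Table 4 only requires mapping each 2-bit configuration value to a length of 0, 8, 32 or 128 bits. The sketch below uses a hypothetical MSB-first BitReader and returns each length together with the extracted value; it performs no verification.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


PROTECTION_BITS = (0, 8, 32, 128)   # length in bits per 2-bit config value (Table 4)


def read_protection(r: BitReader):
    """Return ((frame_len, protection_bits_frame), (history_len, protection_bits_history))."""
    config_frame = r.read(2)
    config_history = r.read(2)
    frame_len = PROTECTION_BITS[config_frame]
    history_len = PROTECTION_BITS[config_history]
    frame_value = r.read(frame_len)       # truncated value over the current frame
    history_value = r.read(history_len)   # truncated value over current and earlier frames
    return (frame_len, frame_value), (history_len, history_value)


def from_bits(s: str) -> bytes:
    """Example helper: '0101...' -> bytes, zero-padded to a byte boundary."""
    s += "0" * (-len(s) % 8)
    return int(s or "0", 2).to_bytes(len(s) // 8, "big")
```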
(43) As such, the protection field may comprise one or more cryptographic values. One of the cryptographic values may be generated based on the metadata comprised within a current metadata frame (comprising the protection field) and/or based on the content frame associated with the current metadata frame. As such, it may be ensured that an isolated metadata frame and/or the associated content frame are not modified. Another one of the cryptographic values may be generated based on the metadata comprised within the current metadata frame and within one or more preceding metadata frames (as well as on the respective associated content frames). As such, it may be ensured that sequences of content frames and/or metadata frames are not modified.
(44) A cryptographic value may be determined at an encoder 101 by applying a one-way function to a group of one or more metadata frames 112, 312 and/or the associated content frames 111, 311. In particular, a cryptographic value may be generated using a key value and a cryptographic hash function (the so-called one-way function). For example, the cryptographic value may be generated by calculating an HMAC-MD5 (hash-based message authentication code using MD5) value for the data comprised within one or more metadata frames 112, 312 and for the data comprised within the one or more associated content frames 111, 311. Furthermore, the generation of the cryptographic value may comprise truncating the HMAC-MD5 value, e.g. to 16, 24, 32, 48, 64 or 128 bits. The truncation may be beneficial for reducing the overhead required for the cryptographic value in the encoded bitstream 110 comprising the metadata frames 112, 312. It should be noted that other hash functions, such as SHA-1 or SHA-256, may be used instead of MD5. Furthermore, it should be noted that the encoder 101 may be configured to transmit zero bits of a cryptographic value, i.e. to transmit no cryptographic value, e.g. in situations where no protection of the metadata is required.
(45) In more detail, the cryptographic value for one or more content frames 111, 311 and of one or more metadata frames 112, 312 may be determined by using a cryptographic hash function H(.) and a “secret” key K (also referred to as security key) which is typically padded to the right with extra zeros to the block size of the hash function H(.) to determine a hash message authentication code (HMAC) of the one or more content frames 111, 311 and of one or more metadata frames 112, 312. Let the ∥ sign denote a concatenation and the ⊕ sign denote an exclusive or, and the outer padding opad=0x5c5c5c . . . 5c5c and the inner padding ipad=0x363636 . . . 3636 be constants of the length of the block size of the hash function H(.), then the HMAC value of the one or more content frames 111, 311 and of one or more metadata frames 112, 312 may be written as
HMAC(m)=H((K⊕opad)∥H((K⊕ipad)∥m)),
where m is the combined bit sequence of the one or more content frames 111, 311 and of the one or more metadata frames 112, 312. The block size used with the MD5, SHA-1 and SHA-256 hash functions is 512 bits. The size of the output of the HMAC operation is the same as that of the underlying hash function, i.e. 128 bits for MD5, 160 bits for SHA-1 and 256 bits for SHA-256.
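The HMAC formula above can be checked against Python's standard hmac module. The sketch writes out the opad/ipad construction explicitly. Hashing keys longer than one block comes from RFC 2104 and is not stated in this document, and keeping the leading bytes when truncating is likewise an assumption (it is what RFC 2104 suggests).

```python
import hashlib
import hmac


def hmac_by_formula(key: bytes, msg: bytes, hash_name: str = "md5") -> bytes:
    """HMAC(m) = H((K xor opad) || H((K xor ipad) || m)), written out step by step."""
    def h(data: bytes) -> bytes:
        return hashlib.new(hash_name, data).digest()

    block_size = hashlib.new(hash_name).block_size   # 64 bytes (512 bits) for MD5, SHA-1, SHA-256
    if len(key) > block_size:
        key = h(key)                                  # long keys are hashed first (RFC 2104)
    key = key.ljust(block_size, b"\x00")              # pad to the right with zeros
    opad = bytes(b ^ 0x5C for b in key)
    ipad = bytes(b ^ 0x36 for b in key)
    return h(opad + h(ipad + msg))


def truncated_mac(key: bytes, msg: bytes, bits: int, hash_name: str = "md5") -> bytes:
    """Keep the leading `bits` bits of the HMAC (whole bytes only in this sketch)."""
    if bits % 8:
        raise ValueError("this sketch only truncates to whole bytes")
    return hmac.new(key, msg, hash_name).digest()[: bits // 8]
```

All truncation lengths listed in paragraph (44) are whole numbers of bytes, so the byte-level restriction does not exclude any of them.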
(46) As such, the protection field may comprise at least two cryptographic values:
- a frame cryptographic value (referred to as “protection_bits_frame” in Table 4) which is indicative of the authenticity of an individual content frame 111 and its associated metadata frame 112. The frame cryptographic value may be used to identify whether the data of the individual content frame 111 and its associated metadata frame 112 have been changed. The frame cryptographic value may be determined using a message m which comprises the bit sequence of the individual content frame 111 and of its associated metadata frame 112 (or of the payload comprised within the individual content frame 111 and its associated metadata frame 112).
- a history cryptographic value (referred to as “protection_bits_history” in Table 4) which is indicative of the authenticity of a sequence of at least two content frames 111, 311 and their associated at least two metadata frames 112, 312. The history cryptographic value may be used to identify whether the sequence of the at least two content frames 111, 311 and their associated metadata frames 112, 312 has been changed. The history cryptographic value may be determined using a message m which comprises the bit sequence of the at least two content frames 111, 311 and their associated at least two metadata frames 112, 312 (or of the payload comprised therein).
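The difference between the frame and the history value can be illustrated as follows. In this sketch the history value covers the current frame and a hypothetical fixed number of preceding frames, concatenated in order; how the frames are delimited and whether the protection field itself is excluded from m are assumptions, as the document does not specify them here. Because the frame value covers only a single frame, reordering frames leaves it unchanged, whereas the history value changes.

```python
import hashlib
import hmac
from typing import List, Tuple


def mac(key: bytes, msg: bytes, bits: int) -> bytes:
    """Truncated HMAC-MD5, keeping the leading bytes."""
    return hmac.new(key, msg, hashlib.md5).digest()[: bits // 8]


def protection_values(key: bytes, frames: List[bytes], history_len: int = 2,
                      frame_bits: int = 32, history_bits: int = 32) -> List[Tuple[bytes, bytes]]:
    """Return (protection_bits_frame, protection_bits_history) for every frame.

    Each entry of `frames` stands for one content frame concatenated with its
    metadata frame. The history value covers the current frame and up to
    history_len - 1 preceding frames (a hypothetical window size).
    """
    values = []
    for i, frame in enumerate(frames):
        window = b"".join(frames[max(0, i - history_len + 1): i + 1])
        values.append((mac(key, frame, frame_bits), mac(key, window, history_bits)))
    return values
```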
(47) As outlined above, the cryptographic values are determined using a secure key K, which is typically known only to the encoder 101 and the decoder 104. In the present document, it is proposed to enable multiple levels of trust by allowing the use of different secure keys K, each providing a different level of trust. By way of example, at least two levels of trustworthy keys may be provided:
- a highly secure key K1, which may not be disclosed to any parties outside of the entity which provides the components 101, 103, 104 along a distribution chain 100. Such an entity may be a provider of the codec systems used along the distribution chain 100 (e.g. Dolby Laboratories). In particular, such an entity may be the provider of the encoders and the decoders used along the distribution chain 100. Because the highly secure key is kept undisclosed, a decoder 104 which renders the audio signal comprised within the received bitstream 120 can be certain that the metadata comprised within the metadata frames 122, 322 of the received bitstream 120 is authentic and has not been modified in an unauthorized manner along the distribution chain 100.
- a moderate secure key K2, which may be disclosed to other parties, e.g. parties operating some of the components 101, 103, 104 along the distribution chain 100 (e.g. licensees of the provider of the codec systems). If the decoder 104 receives a bitstream 120 which has been protected using the moderate secure key K2, the decoder 104 knows that the bitstream 120 comprises metadata (in the metadata frames 122, 322) which has been handled in accordance with certain policies of the operator of the distribution chain 100, which may differ from the policies of the provider of the codec systems (holding the highly secure key K1).
(48) An indication of the secure key K used by the encoder 101 may be provided within a metadata frame 130 (e.g. within the header 131 of the metadata frame 130). This is illustrated in Table 1, which shows the key_id parameter. The key_id parameter may comprise an index to a pre-determined number of secure keys, thereby allowing the decoder 104 to determine the secure key K which was used to determine the one or more cryptographic values, wherein the one or more cryptographic values may be comprised in the protection( ) field of the metadata frame 130 (as shown in Table 4). The decoder 104 may then use the identified secure key to determine the one or more cryptographic values in the same manner as the corresponding encoder 101. The cryptographic values which are determined by the decoder 104 may be referred to as the verification cryptographic values. The verification cryptographic values are then compared with the cryptographic values stored in the metadata frame 130. In case of a match, it is confirmed that the individual frame and/or the sequence of frames has not been modified. In case of a mismatch, it is determined that the individual frame and/or the sequence of frames has been modified.
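A sketch of the decoder-side check, assuming HMAC-SHA-256 and assuming that the message m excludes the protection field itself. The key table `KEY_TABLE`, its key_id values and the function `verify_frame` are hypothetical illustrations, not the bitstream syntax of Table 1 or Table 4.

```python
import hashlib
import hmac

# Hypothetical table of pre-determined secure keys, indexed by key_id.
KEY_TABLE = {
    0: b"hypothetical-highly-secure-key",
    1: b"hypothetical-moderate-secure-key",
}


def verify_frame(key_id: int, content: bytes, metadata: bytes, protection_bits: bytes) -> bool:
    key = KEY_TABLE.get(key_id)
    if key is None:
        return False  # unknown key_id: the frame cannot be verified
    # Verification cryptographic value, computed the same way as in the encoder.
    verification = hmac.new(key, content + metadata, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(verification, protection_bits)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on where the first mismatching byte occurs.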
(49) Alternatively or in addition to providing an indication of the secure key within the metadata frame 130, the decoder 104 may be configured to determine a plurality of sets of verification cryptographic values using a plurality of pre-determined secure keys known to the decoder 104. If one of the sets of verification cryptographic values matches the cryptographic values comprised in the metadata frame 130, the decoder 104 knows which secure key has been used and that the individual frame and/or that the sequence of frames has not been modified. On the other hand, a mismatch for all sets of verification cryptographic values indicates that the individual frame and/or that the sequence of frames has been modified.
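Without a key_id, the decoder can try each pre-determined key in turn, as described above. The following sketch assumes two hypothetical keys standing in for K1 and K2 and HMAC-SHA-256; the `Trust` enumeration and `detect_trust` are illustrative names only.

```python
import hashlib
import hmac
from enum import Enum
from typing import Dict


class Trust(Enum):
    HIGH = "highly secure key K1"
    MODERATE = "moderate secure key K2"
    NONE = "no valid key"


# Hypothetical keys, tried in order of decreasing trust (dicts keep insertion order).
KEYS: Dict[Trust, bytes] = {
    Trust.HIGH: b"hypothetical-K1",
    Trust.MODERATE: b"hypothetical-K2",
}


def detect_trust(message: bytes, protection_bits: bytes) -> Trust:
    for level, key in KEYS.items():
        # One set of verification cryptographic values per candidate key.
        candidate = hmac.new(key, message, hashlib.sha256).digest()
        if hmac.compare_digest(candidate, protection_bits):
            return level  # this key was used and the data is unmodified
    return Trust.NONE  # mismatch for every key: the data has been modified
```

An application could then map `Trust.HIGH`, `Trust.MODERATE` and `Trust.NONE` to different handling policies instead of a single pass/fail decision.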
(50) Being able to detect, in decoders 104 and transcoders 103, which key was used to secure a bitstream 110, 120 enables applications to make finer-grained decisions on how to handle data of different trustworthiness. The decisions may differ depending on the detected secure key: the highly secure key may be detected, the moderate secure key may be detected, or no valid key may be detected, in which case the security check does not pass.
(51) As such, using a plurality of different secure keys, each associated with a different level of trust, provides multiple levels of trustworthiness, in contrast to a solution which uses only a single secure key, where only a binary decision can be made on whether or not data can be trusted.
(52) As described in the context of
(53) The transcoder 103 may be a so-called professional transcoder, i.e. a device used by professional content providers such as broadcasters. As outlined above, the transcoder 103 may be configured to accept the inbound bitstream 110 in a first format (e.g. Dolby E) and to transcode the inbound bitstream 110 into a different format (e.g. Dolby Digital Plus). Such transcoders 103 typically incorporate one or more decoders (for decoding the inbound bitstream 110) and one or more encoders (for encoding the outbound bitstream 120).
(54) A PCM-connected transcoder may have one or more PCM processing stages between the decoder and the encoder. Loudness leveling is one example of such PCM processing. Other examples of PCM processing are sample rate conversion, channel downmixing, and/or channel upmixing.
(55) Such PCM-connected transcoders 103 pose a challenge with regard to the authenticity, protection and trust issues outlined above. As outlined above, an inbound bitstream 110 may comprise metadata frames 112, 312 which are protected using one or more cryptographic values (comprised e.g. in the protection field of the metadata frames 112, 312, as shown in Tables 1 and 4). A PCM-connected transcoder 103 allows a user to modify PCM data derived from the content frames 111, 311, thereby possibly invalidating the metadata comprised within the associated metadata frames 112, 312 and thus possibly compromising the trustworthiness of that metadata.
(56) In the present document, a method and a system for ensuring the trustworthiness of metadata in a transcoder 103 is described. In particular, the described method and system allow the trustworthiness of metadata comprised in metadata frames 112, 312 to be maintained, even when using a PCM-connected transcoder 103.
(58) Typically, a decoder 504 provides an unprotected set of PCM data and metadata (e.g. on a frame-by-frame basis). In other words, the decoder 504 typically decodes each content frame 111 and associated metadata frame 112 and provides the respective set of PCM data and metadata without protection. As such, the decoder 504 provides a sequence of sets of PCM data and metadata from a corresponding sequence of content frames 111 and metadata frames 112. The sequence of sets of PCM data and metadata may be modified by the transcoder and may then be passed to an encoder 501 which is configured to convert the sequence of (possibly modified) sets of PCM data and metadata into the outbound bitstream 120. In this context, the encoder 501 is typically not able to verify whether the sequence of (possibly modified) sets of PCM data and metadata has been modified in a sensible manner. In other words, the encoder 501 cannot verify the trustworthiness of the sequence of (possibly modified) sets of PCM data and metadata.
(59) In the present document, it is proposed to enable the decoder 504 to provide one or more signature values based on one or more sets of PCM data and metadata, thereby allowing the protection of the PCM connection between the decoder 504 and the encoder 501. The signature values may be determined in a similar manner to the cryptographic values, as described above. However, the signature values may make use of a message m which comprises one or more sets of PCM data and metadata (in contrast to one or more content frames and associated metadata frames). In particular, the decoder 504 may be configured to determine a frame signature value based on an individual set of PCM data and associated metadata; and a history signature value based on two or more sequential sets of PCM data and associated metadata.
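A sketch of the decoder-side signatures, assuming HMAC-SHA-256 as the signature function and a hypothetical serialization of each set (16-bit PCM samples plus regular and additional metadata). Length fields are included so that bytes cannot be shifted unnoticed from one field into another; the layout, `serialize_set`, `frame_signature` and `history_signature` are illustrative assumptions rather than a specified format.

```python
import hashlib
import hmac
import struct
from typing import Sequence, Tuple

PcmSet = Tuple[Sequence[int], bytes, bytes]  # (PCM samples, regular metadata, additional metadata)


def serialize_set(pcm: Sequence[int], regular_md: bytes, additional_md: bytes) -> bytes:
    # Hypothetical layout: three little-endian length fields, then 16-bit PCM samples,
    # then the regular and the additional metadata. The length fields keep the field
    # boundaries unambiguous.
    header = struct.pack("<III", len(pcm), len(regular_md), len(additional_md))
    samples = struct.pack("<%dh" % len(pcm), *pcm)
    return header + samples + regular_md + additional_md


def frame_signature(key: bytes, pcm_set: PcmSet) -> bytes:
    # Frame signature value: covers one set of PCM data and associated metadata.
    return hmac.new(key, serialize_set(*pcm_set), hashlib.sha256).digest()


def history_signature(key: bytes, pcm_sets: Sequence[PcmSet]) -> bytes:
    # History signature value: covers two or more sequential sets, in order.
    if len(pcm_sets) < 2:
        raise ValueError("a history signature covers two or more sequential sets")
    message = b"".join(serialize_set(*s) for s in pcm_sets)
    return hmac.new(key, message, hashlib.sha256).digest()
```

The only difference from the cryptographic values of the bitstream is the message m: it is built from decoded PCM data and metadata instead of encoded content and metadata frames.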
(60) In other words, within the PCM domain of a PCM-connected transcoder 503 (i.e. between the decoder 504 and the encoder 501), the trustworthiness of the content may be “protected” using one or more signatures (also referred to as signature values). The decoder 504 may be configured to produce one or more signature values as an output. The one or more signature values may be calculated over the union of PCM data and regular metadata (taken from the content frame) and additional metadata (taken from the associated metadata frame), as produced by the decoder 504. As such, for each frame of the inbound bitstream 110, one or more signature values may be determined based on the decoded sets of PCM data and metadata. These one or more signature values may be used by the corresponding encoder 501 to verify whether a received set of PCM data and metadata has been modified or not, and/or is trustworthy or not.
(61) The encoder 501 accepts the one or more signature values as an input, along with the PCM data, the regular metadata and the additional metadata. The encoder 501 may then check the signature values against the other inputs (i.e. against the received set(s) of PCM data and metadata). If the other inputs have been modified or tampered with, the signature check fails and the encoder takes appropriate action. The verification of the one or more signature values may be performed at the encoder 501 by determining verification signature values based on the received one or more sets of PCM data and metadata (in a similar manner as described for the cryptographic values).
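A sketch of the encoder-side check, assuming the same hypothetical serialization and HMAC-SHA-256 signature on the decoder side. As the "appropriate action" it takes dropping the additional metadata, which is one option mentioned in paragraph (65); `encoder_accept` is a hypothetical name.

```python
import hashlib
import hmac
import struct
from typing import Sequence, Tuple


def serialize_set(pcm: Sequence[int], regular_md: bytes, additional_md: bytes) -> bytes:
    # Must match the (hypothetical) layout the decoder used when signing.
    header = struct.pack("<III", len(pcm), len(regular_md), len(additional_md))
    return header + struct.pack("<%dh" % len(pcm), *pcm) + regular_md + additional_md


def encoder_accept(
    key: bytes, pcm: Sequence[int], regular_md: bytes, additional_md: bytes, signature: bytes
) -> Tuple[Sequence[int], bytes, bytes, bool]:
    # Verification signature value, computed from the inputs as received.
    message = serialize_set(pcm, regular_md, additional_md)
    verification = hmac.new(key, message, hashlib.sha256).digest()
    if hmac.compare_digest(verification, signature):
        return pcm, regular_md, additional_md, True
    # Check failed: keep PCM and regular metadata, drop the additional metadata.
    return pcm, regular_md, b"", False


# Usage: the decoder signs, then a PCM stage halves the level (e.g. loudness leveling).
key = b"hypothetical-key"
pcm = [100, -200, 300]
sig = hmac.new(key, serialize_set(pcm, b"regular", b"additional"), hashlib.sha256).digest()
leveled = [s // 2 for s in pcm]
print(encoder_accept(key, leveled, b"regular", b"additional", sig)[3])  # False
```

Since the PCM processing stage does not know the key, it cannot produce a matching signature for the modified samples.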
(62) As such, the trustworthiness of the decoded PCM data (and the associated metadata) may be maintained within a PCM-connected transcoder 503 by enabling the decoder 504 to determine one or more signature values based on the decoded PCM data and the associated metadata, and by enabling the corresponding encoder 501 to verify the authenticity of the to-be-encoded PCM data (and the associated metadata) based on the one or more signature values. The determination of the one or more signature values and their verification may be based on a single security key or on a plurality of leveled security keys K1 and K2, as outlined above, wherein the one or more security keys may be known only to the decoder 504 and the encoder 501, and are typically unknown to an entity performing PCM processing on the connection between the decoder 504 and the encoder 501.
(63) The use of one or more signature values allows the implementation of various use cases as illustrated in
(65) As indicated above, the encoder 501 may be configured to drop the additional metadata from the inbound metadata frames 112 if the one or more signature values are not valid. As outlined in the context of Table 2, the metadata blocks 140 of an inbound metadata frame 112 may be indicative of respective descriptors 142 which describe one or more properties of the corresponding metadata blocks 140. One of these properties may be the PCM processing parameter (referred to as the tight_coupling parameter in Table 2). The encoder 501 may be configured to use the PCM processing parameter from the descriptor 142 of a metadata block 140 in order to decide whether or not to include the metadata comprised within the metadata block 140 into the outbound bitstream 120. In particular, the PCM processing parameter may indicate to the encoder 501 to include metadata from a block 140 of the inbound metadata frame 112 into the outbound bitstream 120, even though the PCM samples of the associated content frame 111 have been modified.
(66) Table 5 shows example semantics of the PCM processing parameter (i.e. of the tight_coupling parameter of Table 2). In the illustrated example, a value “0” of the PCM processing parameter indicates that the payload 143 (i.e. the metadata) of a block 140 of metadata should be included into the outbound bitstream 120 only if no PCM processing occurred, e.g. only if the one or more signature values have been verified by the encoder 501. On the other hand, a value “3” of the PCM processing parameter may indicate that the payload 143 of the block 140 should always be included into the outbound bitstream 120, even if the PCM samples have been modified, e.g. even if the one or more signature values have not been verified. Furthermore, the PCM processing parameter may take on values which indicate intermediate situations, i.e. values which indicate the PCM processing conditions that need to be met for the payload 143 to be included into the outbound bitstream 120, or the PCM processing conditions under which the payload 143 is not included into the outbound bitstream 120.
(67) The PCM processing stage 505 may be configured to inform the encoder 501 about the processing which has been performed on the PCM samples in the PCM processing stage 505. In other words, the PCM processing stage 505 may be configured to inform the encoder 501 about the PCM processing conditions (e.g. conversion of the sampling rate of the PCM samples, inclusion of a system sound into the PCM samples, modification of the metadata, modification of a channel configuration (e.g. conversion of a mono signal to a stereo signal, or downmixing of a 5.1 multi-channel signal to a stereo signal), leveling of the loudness, etc.). As such, the encoder 501 may be configured to receive indications of the PCM processing conditions from the PCM processing stage 505. Furthermore, the encoder 501 may be configured to process the metadata of a block 140 of metadata based on the received PCM processing conditions and on the value of the PCM processing parameter (e.g. in accordance with the semantics of Table 5).
(68) TABLE 5
Value | Semantics of the PCM processing parameter (tight_coupling)
0 | keep payload only if no PCM processing occurred
1 | keep payload if one or more of the following changes to PCM occurred: the sampling rate has been converted
2 | keep payload if one or more of the following changes to PCM occurred: any of the changes mentioned for case “1” above; system sounds are mixed into PCM; metadata have been modified; the channel configuration has been changed; loudness has been levelled
3 | keep payload regardless of any PCM processing performed
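The Table 5 decision can be sketched as a set check on the reported PCM processing conditions. The sketch interprets values 1 and 2 as "keep the payload if every processing step that occurred is among the listed ones", which is an assumption about the table's intent. `PcmProcessing`, its `OTHER` member (standing for processing not listed in Table 5) and `keep_payload` are hypothetical names, and unknown parameter values are treated conservatively like value 0.

```python
from enum import Flag


class PcmProcessing(Flag):
    NONE = 0
    SAMPLE_RATE_CONVERTED = 1
    SYSTEM_SOUNDS_MIXED = 2
    METADATA_MODIFIED = 4
    CHANNEL_CONFIG_CHANGED = 8
    LOUDNESS_LEVELLED = 16
    OTHER = 32  # hypothetical: any processing not listed in Table 5


P = PcmProcessing
# Processing conditions tolerated for each tight_coupling value (values 0 to 2 of Table 5).
ALLOWED = {
    0: P.NONE,
    1: P.SAMPLE_RATE_CONVERTED,
    2: P.SAMPLE_RATE_CONVERTED | P.SYSTEM_SOUNDS_MIXED | P.METADATA_MODIFIED
       | P.CHANNEL_CONFIG_CHANGED | P.LOUDNESS_LEVELLED,
}


def keep_payload(tight_coupling: int, occurred: PcmProcessing) -> bool:
    """Decide whether the payload 143 of a metadata block goes into the outbound bitstream."""
    if tight_coupling == 3:
        return True  # keep regardless of any PCM processing
    allowed = ALLOWED.get(tight_coupling, P.NONE)  # unknown values: act like value 0
    # Keep only if every reported condition lies within the allowed set.
    return (occurred | allowed) == allowed
```

For example, a block with tight_coupling equal to 1 survives a sample rate conversion, while the same block is dropped once loudness leveling is also reported by the PCM processing stage 505.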
(72) In the present document, methods and systems for transcoding metadata have been described. The methods and systems allow metadata to be transcoded with reduced computational complexity. In particular, it is proposed to provide descriptors for blocks of metadata, thereby enabling a transcoder to transcode the metadata based on the descriptors only, without the need to analyze the actual metadata comprised within a block of metadata. By doing this, the complexity of a transcoder may be significantly reduced. Furthermore, the present document provides methods and systems for protecting metadata frames and for protecting PCM data in a PCM-connected transcoder. As a result, it can be ensured that a receiver of transcoded metadata is provided with an indication of the trustworthiness of the received metadata.
(73) The methods and systems described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the Internet. Typical devices making use of the methods and systems described in the present document are portable electronic devices or other consumer equipment which are used to store and/or render audio signals.