Audio signal decoder, audio signal encoder, method for providing an upmix signal representation, method for providing a downmix signal representation, computer program and bitstream using a common inter-object-correlation parameter value
09805728 · 2017-10-31
Assignee
- Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V. (Munich, DE)
- Dolby International Ab (Amsterdam Zuid-Oost, NL)
Inventors
- Juergen Herre (Buckenhof, DE)
- Johannes Hilpert (Nuremberg, DE)
- Andreas Hoelzer (Erlangen, DE)
- Jonas Engdegard (Stockholm, SE)
- Heiko Purnhagen (Sundbyberg, SE)
Cpc classification
- G10L19/20 (PHYSICS)
- H04S5/005 (ELECTRICITY)
- H04S2420/03 (ELECTRICITY)
- H04S3/02 (ELECTRICITY)
- G10L19/008 (PHYSICS)
International classification
- G10L19/005 (PHYSICS)
- G10L19/008 (PHYSICS)
Abstract
An audio signal decoder for providing an upmix signal representation on the basis of a downmix signal representation and an object-related parametric information and in dependence on a rendering information has an object parameter determinator. The object parameter determinator is configured to obtain inter-object-correlation values for a plurality of pairs of audio objects. The object parameter determinator is configured to evaluate a bitstream signaling parameter in order to decide whether to evaluate individual inter-object-correlation bitstream parameter values to obtain inter-object-correlation values for a plurality of pairs of related audio objects, or to obtain inter-object-correlation values for a plurality of pairs of related audio objects using a common inter-object-correlation bitstream parameter value. The audio signal decoder also has a signal processor configured to obtain the upmix signal representation on the basis of the downmix signal representation and using the inter-object-correlation values for a plurality of pairs of related objects and the rendering information.
Claims
1. An audio signal encoder for providing a bitstream representation on the basis of a plurality of audio object signals, the audio signal encoder comprising: a downmixer configured to provide a downmix signal on the basis of the audio object signals and in dependence on downmix parameters describing contributions of the audio object signals to one or more channels of the downmix signal; and a parameter provider configured to provide a common inter-object-correlation bitstream parameter value associated with a plurality of pairs of related audio object signals, and to also provide a bitstream signaling parameter indicating that the common inter-object-correlation bitstream parameter value is provided instead of a plurality of individual inter-object-correlation bitstream parameter values; wherein the parameter provider is configured to also provide an object relationship information describing whether two audio objects are related to each other; and a bitstream formatter configured to provide a bitstream comprising a representation of the downmix signal, a representation of the common inter-object-correlation bitstream parameter value and the bitstream signaling parameter.
2. The audio signal encoder according to claim 1, wherein the parameter provider is configured to provide the common inter-object-correlation bitstream parameter value in dependence on a ratio between a sum of cross power terms and a sum of average power terms.
3. The audio signal encoder according to claim 2, wherein the parameter provider is configured to compute the cross power term for a given pair of audio objects by evaluating a sum of products of spectral coefficients associated with audio objects of the given pair of audio objects over a plurality of time instances, or over a plurality of frequency instances; and wherein the parameter provider is configured to compute the average power term for the given pair of audio objects by evaluating a geometric mean of a power value representing the power of a first audio object over a plurality of time instances or over a plurality of frequency instances, and of a power value representing the power of a second audio object over a plurality of time instances or over a plurality of frequency instances.
4. The audio signal encoder according to claim 2, wherein the parameter provider is configured to provide the common inter-object-correlation bitstream parameter value IOC.sub.single according to
5. The audio signal encoder according to claim 1, wherein the parameter provider is configured to provide a predetermined constant value as the common inter-object-correlation bitstream parameter value.
6. The audio signal encoder according to claim 1, wherein the parameter provider is configured to selectively evaluate an inter-object-correlation of audio objects, for which the object relationship information indicates a relationship, for a computation of the common inter-object-correlation bitstream parameter value.
7. A method for providing a bitstream representation on the basis of a plurality of audio object signals, the method comprising: providing a downmix signal on the basis of the audio object signals and in dependence on downmix parameters describing contributions of the audio object signals to the one or more channels of the downmix signal; and providing a common inter-object-correlation bitstream parameter value associated with a plurality of pairs of related audio object signals; and providing a bitstream signaling parameter indicating that the common inter-object-correlation bitstream parameter value is provided instead of a plurality of individual inter-object-correlation bitstream parameter values; and providing an object-relationship information describing whether two audio objects are related to each other, providing a bitstream comprising a representation of the downmix signal, a representation of the common inter-object-correlation bitstream parameter value and the bitstream signaling parameter, wherein the method is performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
8. A non-transitory digital storage medium having stored thereon a computer program for performing, when executed by a computer, a method for providing a bitstream representation on the basis of a plurality of audio object signals, the method comprising: providing a downmix signal on the basis of the audio object signals and in dependence on downmix parameters describing contributions of the audio object signals to the one or more channels of the downmix signal; and providing a common inter-object-correlation bitstream parameter value associated with a plurality of pairs of related audio object signals; and providing a bitstream signaling parameter indicating that the common inter-object-correlation bitstream parameter value is provided instead of a plurality of individual inter-object-correlation bitstream parameter values; and providing an object-relationship information describing whether two audio objects are related to each other, providing a bitstream comprising a representation of the downmix signal, a representation of the common inter-object-correlation bitstream parameter value and the bitstream signaling parameter, when the computer program runs on a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments according to the invention will subsequently be described taking reference to the enclosed FIGS. in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
DETAILED DESCRIPTION OF THE INVENTION
(13) 1. Audio Signal Decoder According to
(14) In the following, an audio signal decoder 100 will be described taking reference to
(15) Firstly, input and output signals of the audio signal decoder 100 will be described. Subsequently, the structure of the audio signal decoder 100 will be described and, finally, the functionality of the audio signal decoder 100 will be discussed.
(16) The audio signal decoder 100 is configured to receive a downmix signal representation 110, which typically represents a plurality of audio object signals, for example, in the form of a one-channel audio signal representation or a two-channel audio signal representation.
(17) The audio signal decoder 100 also receives an object-related parametric information 112, which typically describes the audio objects, which are included in the downmix signal representation 110.
(18) For example, the object-related parametric information 112 describes object levels of the audio objects, which are represented by the downmix signal representation 110, using object-level difference values (OLD).
(19) In addition, the object-related parametric information 112 typically represents inter-object-correlation characteristics of the audio objects, which are represented by the downmix signal representation 110. The object-related parametric information typically comprises a bitstream signalling parameter (also designated with “bsOneIOC” herein), which signals whether the object-related parametric information comprises individual inter-object-correlation bitstream parameter values associated with individual pairs of audio objects or a common inter-object-correlation bitstream parameter value associated with a plurality of pairs of audio objects. Accordingly, the object-related parametric information comprises the individual inter-object-correlation bitstream parameter values or the common inter-object-correlation bitstream parameter value, in accordance with the bitstream signalling parameter “bsOneIOC”.
(20) The object-related parametric information 112 may also comprise downmix information describing a downmix of the individual audio objects into the downmix signal representation. For example, the object-related parametric information comprises a downmix gain information DMG describing a contribution of the audio object signals to the downmix signal representation 110. In addition, the object-related parametric information may, optionally, comprise a downmix-channel-level-difference information DCLD describing downmix gain differences between different downmix channels.
(21) The audio signal decoder 100 is also configured to receive a rendering information 120, for example, from a user interface for inputting said rendering information. The rendering information describes an allocation of the signals of the audio objects to upmix channels. For example, the rendering information 120 may take the form of a rendering matrix (or entries thereof). Alternatively, the rendering information 120 may comprise a description of a desired rendering position (for example, in terms of spatial coordinates) of the audio objects and desired intensities (or volumes) of the audio objects.
(22) The audio signal decoder 100 provides an upmix signal representation 130, which constitutes a rendered representation of the audio object signals described by the downmix signal representation and the object-related parametric information. For example, the upmix signal representation may take the form of individual audio channel signals, or may take the form of a downmix signal representation in combination with a channel-related parametric side information (for example, MPEG-Surround side information).
(23) The audio signal decoder 100 is configured to provide the upmix signal representation 130 on the basis of the downmix signal representation 110 and the object-related parametric information 112 and in dependence on the rendering information 120. The apparatus 100 comprises an object-parameter determinator 140, which is configured to obtain inter-object-correlation values (at least) for a plurality of pairs of related audio objects on the basis of the object-related parametric information 112. For this purpose, the object-parameter determinator 140 is configured to evaluate the bitstream signalling parameter (“bsOneIOC”) in order to decide whether to evaluate individual inter-object-correlation bitstream parameter values to obtain the inter-object-correlation values for a plurality of pairs of related audio objects or to obtain the inter-object-correlation values for a plurality of pairs of related audio objects using a common inter-object-correlation bitstream parameter value. Accordingly, the object-parameter determinator 140 is configured to provide the inter-object-correlation values 142 for a plurality of pairs of related audio objects on the basis of individual inter-object-correlation bitstream parameter values if the bitstream signaling parameter indicates that a common inter-object-correlation bitstream parameter value is not available. Similarly, the object-parameter determinator determines the inter-object-correlation values 142 for a plurality of pairs of related audio objects on the basis of the common inter-object-correlation bitstream parameter value if the bitstream signaling parameter indicates that such a common inter-object-correlation bitstream parameter value is available.
(24) The object-parameter determinator also typically provides other object-related values, like, for example, object-level-difference values OLD, downmix-gain values DMG and (optionally) downmix-channel-level-difference values DCLD on the basis of the object-related parametric information 112.
(25) The audio signal decoder 100 also comprises a signal processor 150, which is configured to obtain the upmix signal representation 130 on the basis of the downmix signal representation 110 and using the inter-object-correlation values 142 for a plurality of pairs of related audio objects and the rendering information 120. The signal processor 150 also uses the other object-related values, like object-level-difference values, downmix-gain values and downmix-channel-level-difference values.
(26) The signal processor 150 may, for example, estimate statistical characteristics of a desired upmix signal representation 130 and process the downmix signal representation such that the upmix signal representation 130 derived from the downmix signal representation comprises the desired statistical characteristics. Alternatively, the signal processor 150 may try to separate the audio object signals of the plurality of audio objects, which are combined in the downmix signal representation 110, using the knowledge about the object characteristics and the downmix process. Accordingly, the signal processor may calculate a processing rule (for example, a scaling rule or a linear combination rule), which would allow for a reconstruction of the individual audio object signals or at least of audio signals having similar statistical characteristics as the individual audio object signals. The signal processor 150 may then apply the desired rendering to obtain the upmix signal representation. Naturally, the computation of reconstructed audio object signals, which approximate the original individual audio object signals, and the rendering can be combined in a single processing step in order to reduce the computational complexity.
(27) To summarize the above, the audio signal decoder is configured to provide the upmix signal representation 130 on the basis of the downmix signal representation 110 and the object-related parametric information 112 using the rendering information 120. The object-related parametric information 112 is evaluated in order to have a knowledge about the statistical characteristics of the individual audio object signals and of the relationship between the individual audio object signals, which is needed by the signal processor 150. For example, the object-related parametric information 112 is used in order to obtain an estimated covariance matrix describing estimated covariance values of the individual audio object signals. The estimated covariance matrix is then applied by the signal processor 150 in order to determine a processing rule (for example, as discussed above) for deriving the upmix signal representation 130 from the downmix signal representation 110, wherein, naturally, other object-related information may also be exploited.
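As an illustration only, the estimated covariance matrix mentioned above can be sketched as follows. The sketch assumes the convention, common in SAOC-type systems but not stated in the paragraph above, that each entry is the geometric mean of two object-level-difference values scaled by the corresponding inter-object-correlation value. The function name and the example values are hypothetical.

```python
import math


def estimate_covariance(old, ioc):
    """Return E with E[i][j] = sqrt(old[i] * old[j]) * ioc[i][j].

    Assumption: this product form is a common SAOC-style convention and is
    used here for illustration; it is not taken from the paragraph above.
    """
    n = len(old)
    return [
        [math.sqrt(old[i] * old[j]) * ioc[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: two objects, the second 6 dB weaker in power terms.
old = [1.0, 0.25]
ioc = [[1.0, 0.5],
       [0.5, 1.0]]
E = estimate_covariance(old, ioc)
```

The diagonal of E then carries the (relative) object powers, and the off-diagonal entries carry the estimated cross terms used by the signal processor 150.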
(28) The object-parameter determinator 140 comprises different modes in order to obtain the inter-object-correlation values for a plurality of pairs of related audio objects, which constitutes an important input information for the signal processor 150. In a first mode, the inter-object-correlation values are determined using individual inter-object-correlation bitstream parameter values. For example, there may be one individual inter-object-correlation bitstream parameter value for each pair of related audio objects, such that the object-parameter determinator 140 simply maps such an individual inter-object-correlation bitstream parameter value onto one or two inter-object-correlation values associated with a given pair of related audio objects. On the other hand, there is also a second mode of operation, in which the object-parameter determinator 140 merely reads a single common inter-object-correlation bitstream parameter value from the bitstream and provides a plurality of inter-object-correlation values for a plurality of different pairs of related audio objects on the basis of this single common inter-object-correlation bitstream parameter value. Accordingly, the inter-object-correlation values for a plurality of pairs of related audio objects may, for example, be identical to the value represented by the single common inter-object-correlation bitstream parameter value, or may be derived from the same common inter-object-correlation bitstream parameter value. The object-parameter determinator 140 is switchable between said first mode and said second mode in dependence on the bitstream signalling parameter (“bsOneIOC”).
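The two modes of the object-parameter determinator 140 can be sketched as follows. This is a minimal illustration, not the decoder's actual parsing routine: the function name, the dictionary keyed by object pairs, and the choice of 1.0 on the diagonal (an object fully correlated with itself) are assumptions. Non-related pairs are given the value zero.

```python
def obtain_ioc_values(bs_one_ioc, related, individual=None, common=None):
    """Build a symmetric matrix of inter-object-correlation values.

    bs_one_ioc: True selects the second mode (one common value),
                False selects the first mode (one value per related pair).
    related:    related[i][j] is truthy if objects i and j are related.
    individual: hypothetical dict {(i, j): value} with i < j (first mode).
    common:     the single common value (second mode).
    """
    n = len(related)
    # Assumption: diagonal entries are 1.0, unrelated pairs stay at 0.0.
    ioc = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if not related[i][j]:
                continue
            value = common if bs_one_ioc else individual[(i, j)]
            ioc[i][j] = ioc[j][i] = value  # one bitstream value, two entries
    return ioc


# Hypothetical scene: objects 0-1 and 1-2 are related, 0-2 are not.
related = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
first_mode = obtain_ioc_values(False, related, individual={(0, 1): 0.9, (1, 2): 0.2})
second_mode = obtain_ioc_values(True, related, common=0.4)
```

In the second mode every related pair receives the same value, which is why a single bitstream parameter suffices.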
(29) Accordingly, there are different modes for the provision of the inter-object-correlation values, which can be applied by the object-parameter determinator 140. If there is a relatively small number of pairs of related audio objects, the inter-object-correlation values for said pairs of related audio objects are typically (in dependence on the bitstream signaling parameter) determined individually by the object-parameter determinator, which allows for a particularly precise representation of the characteristics of said pairs of related audio objects and, consequently, brings along the possibility of reconstructing the individual audio object signals with good accuracy in the signal processor 150. Thus, it is typically possible to provide a good hearing impression in such a case in which only correlations between a comparatively small number of pairs of related audio objects are relevant.
(30) The second mode of operation of the object-parameter determinator, in which a common inter-object-correlation bitstream parameter value is used to obtain inter-object-correlation values for a plurality of pairs of related audio objects, is typically used in cases in which there are non-negligible correlations between a plurality of pairs of audio objects. Such cases could conventionally not be handled without excessively increasing the bitrate of a bitstream representing both the downmix signal representation 110 and the object-related parametric information 112. The usage of a common inter-object-correlation bitstream parameter value brings along specific advantages if there are non-negligible correlations between a comparatively large number of pairs of audio objects, which correlations do not comprise acoustically significant variations. In this case, it is possible to consider the correlations with moderate bitrate effort, which brings along a reasonably good compromise between bitrate requirement and quality of the hearing impression.
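The bitrate argument can be made concrete by counting how many inter-object-correlation bitstream parameter values are needed in each mode. The sketch below assumes one individual value per related pair, as in the example given for the first mode, and counts values for a single time/frequency tile only. The function name is hypothetical.

```python
def ioc_values_in_bitstream(related, bs_one_ioc):
    """Count IOC bitstream parameter values for one time/frequency tile.

    Assumption: the first mode sends one value per related pair (i < j),
    while the second mode sends exactly one common value.
    """
    n = len(related)
    related_pairs = sum(
        1 for i in range(n) for j in range(i + 1, n) if related[i][j]
    )
    return 1 if bs_one_ioc else related_pairs


# Hypothetical scene: 8 objects that are all related to each other.
all_related = [[i != j for j in range(8)] for i in range(8)]
individual_count = ioc_values_in_bitstream(all_related, bs_one_ioc=False)
common_count = ioc_values_in_bitstream(all_related, bs_one_ioc=True)
```

For N mutually related objects the first mode needs N(N-1)/2 values, so the saving of the second mode grows quadratically with the number of related objects.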
(31) Accordingly, the audio signal decoder 100 is capable of efficiently handling different situations, namely situations in which there are only a few pairs of related audio objects, the inter-object-correlation of which should be taken into consideration with high precision, and situations in which there is a large number of pairs of related audio objects, the inter-object-correlations of which should not be neglected entirely but have some similarity. The audio signal decoder 100 is capable of handling both situations with a good quality of the hearing impression.
(32) 2. Audio Signal Encoder According to
(33) In the following, an audio signal encoder 200 will be described taking reference to
(34) The audio signal encoder 200 is configured to receive a plurality of audio object signals 210a to 210N. The audio object signals 210a to 210N may, for example, be one-channel signals or two-channel signals representing different audio objects.
(35) The audio signal encoder 200 is also configured to provide a bitstream representation 220, which describes the auditory scene represented by the audio object signals 210a to 210N in a compact and bitrate-efficient manner.
(36) The audio signal encoder 200 comprises a downmixer 230, which is configured to receive the audio object signals 210a to 210N and to provide a downmix signal 232 on the basis of the audio object signals 210a to 210N. The downmixer 230 is configured to provide the downmix signal 232 in dependence on downmix parameters describing contributions of the audio object signals 210a to 210N to the one or more channels of the downmix signal.
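A downmix of this kind can be sketched as a weighted sum of the object signals per downmix channel. The sketch is illustrative only: it assumes time-domain sample lists of equal length and linear gains, and the function name and example gains are hypothetical.

```python
def downmix(objects, gains):
    """Mix N object signals into one or more downmix channels.

    objects: list of N sample lists of equal length.
    gains:   one list of N linear gains per downmix channel
             (hypothetical stand-in for the downmix parameters).
    """
    num_samples = len(objects[0])
    return [
        [
            sum(g * obj[t] for g, obj in zip(channel_gains, objects))
            for t in range(num_samples)
        ]
        for channel_gains in gains
    ]


# Two hypothetical objects, two samples each.
objects = [[1.0, 2.0],
           [3.0, 4.0]]
mono = downmix(objects, gains=[[0.5, 0.5]])
stereo = downmix(objects, gains=[[1.0, 0.0], [0.0, 1.0]])
```

The rows of the gain table play the role of the downmix parameters: each entry describes the contribution of one object signal to one downmix channel.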
(37) The audio signal encoder also comprises a parameter provider 240, which is configured to provide a common inter-object-correlation bitstream parameter value 242 associated with a plurality of pairs of related audio object signals 210a to 210N. The parameter provider 240 is also configured to provide a bitstream signalling parameter 244 indicating that the common inter-object-correlation bitstream parameter value 242 is provided instead of a plurality of individual inter-object-correlation bitstream parameters (individually associated with different pairs of audio objects).
(38) The audio signal encoder 200 also comprises a bitstream formatter 250, which is configured to provide a bitstream representation 220 comprising a representation of the downmix signal 232 (for example, an encoded representation of the downmix signal 232), a representation of the common inter-object-correlation bitstream parameter value 242 (for example, a quantized and encoded representation thereof) and the bitstream signalling parameter 244 (for example, in the form of a one-bit parameter value).
(39) The audio signal encoder 200 consequently provides a bitstream representation 220, which represents the audio scene described by the audio object signals 210a to 210N with good accuracy. In particular, the bitstream representation 220 comprises a compact side information if many of the audio object signals 210a to 210N are related to each other, i.e. comprise a non-negligible inter-object-correlation. In this case, the common inter-object-correlation bitstream parameter value 242 is provided instead of individual inter-object-correlation bitstream parameter values individually associated with pairs of audio objects. Accordingly, the audio signal encoder can provide a compact bitstream representation 220 in any case, both if there are many related pairs of audio object signals 210a to 210N and if there are only a few pairs of related audio object signals 210a to 210N. In particular, the bitstream representation 220 may comprise the information needed by the audio signal decoder 100 as an input information, namely the downmix signal representation 110 and the object-related parametric information 112. Thus, the parameter provider 240 may be configured to provide additional object-related parametric information describing the audio object signals 210a to 210N as well as the downmix process performed by the downmixer 230. For example, the parameter provider 240 may additionally provide an object-level-difference information OLD describing the object levels (or object-level differences) of the audio object signals 210a to 210N. Furthermore, the parameter provider 240 may provide a downmix-gain information DMG describing downmix gains applied to the individual audio object signals 210a to 210N when forming the one or more channels of the downmix signal 232. Downmix-channel-level-difference values DCLD, which describe downmix gain differences between different channels of the downmix signal 232, may also, optionally, be provided by the parameter provider 240 for inclusion into the bitstream representation 220.
(40) To summarize the above, the audio signal encoder efficiently provides the object-related parametric information needed for a reconstruction of the audio scene described by the audio object signals 210a to 210N with a good hearing impression, wherein a compact common inter-object-correlation bitstream parameter value is used if there is a large number of related pairs of audio objects. This is signaled using the bitstream signaling parameter 244. Thus, an excessive bitstream load is avoided in such a case.
(41) Further details regarding the provision of a bitstream representation will be described below.
(42) 3. Bitstream According to
(43)
(44) The bitstream 300 may, for example, serve as an input bitstream of the audio signal decoder 100, carrying the downmix signal representation 110 and the object-related parametric information 112. The bitstream 300 may be provided as an output bitstream 220 by the audio signal encoder 200.
(45) The bitstream 300 comprises a downmix signal representation 310, which is a representation of a one-channel or multi-channel downmix signal (for example, the downmix signal 232) combining audio signals of a plurality of audio objects. The bitstream 300 also comprises object-related parametric side information 320 describing characteristics of the audio objects, the audio object signals of which are represented, in a combined form, by the downmix signal representation 310. The object-related parametric side information 320 comprises a bitstream signaling parameter 322 indicating whether the bitstream comprises individual inter-object-correlation bitstream parameters (individually associated with different pairs of audio objects) or a common inter-object-correlation bitstream parameter value (associated with a plurality of different pairs of audio objects). The object-related parametric side information also comprises a plurality of individual inter-object-correlation bitstream parameter values 324a, which is indicated by a first state of the bitstream signaling parameter 322, or a common inter-object-correlation bitstream parameter value, which is indicated by a second state of the bitstream signaling parameter 322.
(46) Accordingly, the bitstream 300 may be adapted to the relationship characteristics of the audio object signals 210a to 210N by adapting the format of the bitstream 300 to contain a representation of individual inter-object-correlation bitstream parameter values or a representation of a common inter-object-correlation bitstream parameter value.
(47) The bitstream 300 may, consequently, provide the chance of efficiently encoding different types of audio scenes with a compact side information, while maintaining the chance of obtaining a good hearing impression for the case that there are only a few strongly-correlated audio objects.
(48) Further details regarding the bitstream will subsequently be discussed.
(49) 4. The MPEG SAOC System According to
(50) In the following, an MPEG SAOC system using a single IOC parameter calculation will be described taking reference to
(51) The MPEG SAOC system 400 according to
(52) The SAOC encoder 410 is configured to receive a plurality of, for example, N audio object signals 420a to 420N. The SAOC encoder 410 is configured to provide a downmix signal representation 430 and a side information 432, which are advantageously, but not necessarily, included in a bitstream.
(53) The SAOC encoder 410 comprises an SAOC downmix processing 440, which receives the audio object signals 420a to 420N and provides the downmix signal representation 430 on the basis thereof. The SAOC encoder 410 also comprises a parameter extractor 444, which may receive the object signals 420a to 420N and which may, optionally, also receive an information about the SAOC downmix processing 440 (for example, one or more downmix parameters). The parameter extractor 444 comprises a single inter-object-correlation calculator 448, which is configured to calculate a single (common) inter-object-correlation value associated with a plurality of pairs of audio objects. In addition, the single inter-object-correlation calculator 448 is configured to provide a single inter-object-correlation signaling 452, which indicates if a single inter-object-correlation value is used instead of object-pair-individual inter-object-correlation values. The single inter-object-correlation calculator 448 may, for example, decide on the basis of an analysis of the audio object signals 420a to 420N whether a single common inter-object-correlation value (or, alternatively, a plurality of individual inter-object-correlation parameter values associated individually with pairs of audio object signals) is provided. However, the single inter-object-correlation calculator 448 may also receive an external control information determining whether a common inter-object-correlation value (for example, a bitstream parameter value) or individual inter-object-correlation values (for example, bitstream parameter values) should be calculated.
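One possible way for the single inter-object-correlation calculator 448 to obtain the common value follows the wording of claims 2, 3 and 6: a ratio between a sum of cross power terms and a sum of average power terms, where each average power term is the geometric mean of the powers of the two objects of a pair, evaluated only for related pairs. The sketch below is illustrative only. The use of the complex conjugate and of the real part of the ratio are assumptions for complex-valued spectral coefficients, and all names are hypothetical.

```python
import math


def cross_power(a, b):
    """Sum of products a[k] * conj(b[k]) over time/frequency instances."""
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(a, b))


def common_ioc(spectra, related):
    """Ratio of summed cross power terms to summed average power terms.

    spectra: one list of spectral coefficients per audio object.
    related: related[i][j] is truthy if objects i and j are related.
    Assumption: the real part of the ratio is used as the common value.
    """
    numerator = 0.0
    denominator = 0.0
    n = len(spectra)
    for i in range(n):
        for j in range(i + 1, n):
            if not related[i][j]:
                continue  # only related pairs contribute
            numerator += cross_power(spectra[i], spectra[j])
            power_i = cross_power(spectra[i], spectra[i]).real
            power_j = cross_power(spectra[j], spectra[j]).real
            denominator += math.sqrt(power_i * power_j)  # geometric mean
    if denominator == 0.0:
        return 0.0
    return (numerator / denominator).real


# Hypothetical toy input: objects 0 and 1 are identical, object 2 is orthogonal.
spectra = [[1, 2], [1, 2], [2, -1]]
all_related = [[i != j for j in range(3)] for i in range(3)]
ioc_single = common_ioc(spectra, all_related)  # 1/3 for this toy input
```

Because the sums are formed before the division, strongly powered pairs dominate the common value, which is one plausible reason for preferring this ratio over a plain average of per-pair correlations.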
(54) The parameter extractor 444 is also configured to provide a plurality of parameters describing the audio object signals 420a to 420N, like, for example, object-level difference parameters. The parameter extractor 444 is also advantageously configured to provide parameters describing the downmix, like, for example, a set of downmix-gain parameters DMG and a set of downmix-channel-level-difference parameters DCLD.
(55) The SAOC encoder 410 comprises a quantization 456, which quantizes the parameters provided by the parameter extractor 444. For example, the common inter-object-correlation parameter may be quantized by the quantization 456. In addition, the object-level-difference parameters, the downmix-gain parameters and the downmix-channel-level-difference parameters may also be quantized by the quantization 456. Accordingly, the quantized parameters are obtained by the quantization 456.
(56) The SAOC encoder 410 also comprises a noiseless coding 460, which is configured to encode the quantized parameters provided by the quantization 456. For example, the noiseless coding may noiselessly encode the quantized common inter-object-correlation parameter and also the other quantized parameters (for example, OLD, DMG and DCLD).
(57) Accordingly, the SAOC encoder 410 provides the side information 432 such that the side information comprises the single IOC signaling 452 (which may be considered as a bitstream signaling parameter) and the noiselessly-coded parameters provided by the noiseless coding 460 (which may be considered as bitstream parameter values).
(58) The SAOC decoder 420 is configured to receive the side information 432 provided by the SAOC encoder 410 and the downmix signal representation 430 provided by the SAOC encoder 410.
(59) The SAOC decoder 420 comprises a noiseless decoding 464, which is configured to reverse the noiseless coding 460 of the side information 432 performed in the encoder 410. The SAOC decoder 420 also comprises a de-quantization 468, which may also be considered as an inverse quantization (even though, strictly speaking, quantization is not invertible with perfect accuracy), wherein the de-quantization 468 is configured to receive the decoded side information 466 from the noiseless decoding 464. The de-quantization 468 provides the dequantized parameters 470, for example, the decoded and de-quantized common inter-object-correlation value provided by the single inter-object-correlation calculator 448 and also decoded and de-quantized object-level difference values OLD, decoded and de-quantized downmix-gain values DMG and decoded and de-quantized downmix-channel-level-difference values DCLD. The SAOC decoder 420 also comprises a single inter-object-correlation expander 474, which is configured to provide a plurality of inter-object-correlation values associated with a plurality of pairs of related audio objects on the basis of the common inter-object-correlation value. However, it should be noted that the single inter-object-correlation expander 474 may be arranged before the noiseless decoding 464 and the de-quantization 468 in some embodiments. For example, the single inter-object-correlation expander 474 may be integrated into a bitstream parser, which receives a bitstream comprising both the downmix signal representation 430 and the side information 432.
(60) The SAOC decoder 420 also comprises an SAOC decoder processing and mixing 480, which is configured to receive the downmix signal representation 430 and the decoded parameters included (in an encoded form) in the side information 432. Thus, the SAOC decoder processing and mixing 480 may, for example, receive one or two inter-object-correlation values for every pair of (different) audio objects, wherein the one or two inter-object-correlation values may be zero for non-related audio objects and non-zero for related audio objects. In addition, the SAOC decoder processing and mixing 480 may receive object-level-difference values for every audio object. In addition, the SAOC decoder processing and mixing 480 may receive downmix-gain values and (optionally) downmix-channel-level-difference values describing the downmix performed in the SAOC downmix processing 440. Accordingly, the SAOC decoder processing and mixing 480 may provide a plurality of channel signals 484a to 484N in dependence on the downmix signal representation 430, the side information parameters included in the side information 432 and an interaction information 482, which describes a desired rendering of the audio objects. However, it should be noted that the channels 484a to 484N may be represented either in the form of individual audio channel signals or in the form of a parametric representation, like, for example, a multi-channel representation according to the MPEG Surround standard (comprising, for example, an MPEG Surround downmix signal and channel-related MPEG Surround side information). In other words, both an individual channel audio signal representation and a parametric multi-channel audio signal representation will be considered as an upmix signal representation within the present description.
(61) In the following, some details regarding the functionality of the SAOC encoder 410 and of the SAOC decoder 420 will be described.
(62) The SAOC side information, which will be discussed in the following, plays an important role in the SAOC encoding and the SAOC decoding. The SAOC side information describes the input objects (audio objects) by means of their time/frequency variant covariance matrix. The N object signals 420a to 420N (also sometimes briefly designated as “objects”) can be written as rows in a matrix:
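(63)
$$\mathbf{S} = \begin{pmatrix} s_1(0) & s_1(1) & \cdots & s_1(L-1) \\ s_2(0) & s_2(1) & \cdots & s_2(L-1) \\ \vdots & \vdots & \ddots & \vdots \\ s_N(0) & s_N(1) & \cdots & s_N(L-1) \end{pmatrix}$$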
(64) Here, the entries s.sub.i(l) designate spectral values of an audio object having audio object index i for a plurality of temporal portions having time indices l. A signal block of L samples represents the signal in a time and frequency interval which is a part of the perceptually motivated tiling of the time-frequency plane that is applied for the description of signal properties.
(65) Hence, the covariance matrix is given as
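(66)
$$\mathbf{S}\mathbf{S}^{*} = \begin{pmatrix} \|s_1\|^2 & \rho_{12} & \cdots & \rho_{1N} \\ \rho_{21} & \|s_2\|^2 & \cdots & \rho_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{N1} & \rho_{N2} & \cdots & \|s_N\|^2 \end{pmatrix}$$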
(67) The covariance matrix is typically used by the SAOC decoder processing and mixing 480 in order to obtain the channel signals 484a to 484N.
(68) The diagonal elements can directly be reconstructed at the SAOC decoder side with the OLD data, and the non-diagonal elements are given by the inter-object-correlations (IOCs) as
$$\rho_{mn} = \|s_m\| \cdot \|s_n\| \cdot \mathrm{IOC}_{mn}.$$
(69) It should be noted that the object-level-difference values describe s.sub.m and s.sub.n.
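The relationship between the signal norms, the inter-object-correlations and the covariance matrix can be illustrated with a minimal C sketch. It assumes real-valued IOC values and row-major storage; the function name `build_covariance` and its parameters are hypothetical and do not appear in the SAOC syntax.

```c
#include <stddef.h>

/* Fill an n x n covariance matrix (row-major) from per-object signal
 * norms and inter-object-correlation values, following
 *   rho_mn = ||s_m|| * ||s_n|| * IOC_mn   for m != n,
 * with the diagonal set to ||s_m||^2.
 * norms: n values ||s_m||
 * ioc:   n*n values, row-major (diagonal entries are ignored)
 * cov:   output, n*n values, row-major */
void build_covariance(size_t n, const double *norms,
                      const double *ioc, double *cov)
{
    for (size_t m = 0; m < n; ++m) {
        for (size_t k = 0; k < n; ++k) {
            if (m == k) {
                cov[m * n + k] = norms[m] * norms[m];
            } else {
                cov[m * n + k] = norms[m] * norms[k] * ioc[m * n + k];
            }
        }
    }
}
```

For two objects with norms 2 and 3 and an IOC of 0.5, the sketch yields the diagonal elements 4 and 9 and the non-diagonal elements 3.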
(70) The number of inter-object-correlation values needed to convey the whole covariance matrix is N*N/2−N/2. As this number can get large (for example, for a large number N of object signals), resulting in a high bit demand, the SAOC encoder 410 (as well as the audio signal encoder 200) can, optionally, transmit only selected inter-object-correlation values for object pairs, which are signaled to be “related to” each other. This optional “related to” information is, for example, statically conveyed in an SAOC-specific configuration syntax element of the bitstream, which may, for example, be designated with “SAOCSpecificConfig( )”. Objects, which are not related to each other, are, for example, assumed to be uncorrelated, i.e. their inter-object-correlation is equal to zero.
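The bit demand argument can be made concrete by counting pairs. The following C sketch, with the hypothetical helper names `num_all_pairs` and `num_related_pairs`, compares the number of IOC values for the whole covariance matrix, N*N/2−N/2 = N(N−1)/2, with the number needed when only pairs signaled as "related to" each other are transmitted. The row-major flag array is an assumption of this sketch.

```c
#include <stddef.h>

/* Number of IOC values needed to convey the whole covariance matrix
 * for n objects: n*n/2 - n/2 = n*(n-1)/2. */
size_t num_all_pairs(size_t n)
{
    return n * (n - 1) / 2;
}

/* Number of IOC values when only pairs signaled as "related to" each
 * other are transmitted.
 * related: n*n flags, row-major and symmetric; nonzero means related. */
size_t num_related_pairs(size_t n, const unsigned char *related)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (related[i * n + j]) {
                ++count;
            }
        }
    }
    return count;
}
```

For example, N = 32 objects would need 496 IOC values per parameter band and frame if all pairs were transmitted.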
(71) However, there exist application scenarios where all objects (or almost all objects) are related to each other. An example of such an application scenario is a telephone conference with a microphone setup and room acoustics with a high degree of inter-microphone cross talk. In these cases, the transmission of all IOC values would be needed (if the above-mentioned conventional mechanism was used), but usually would exceed the desired bit budget. As an alternative, assuming that all objects are uncorrelated would induce a large error in the model and, therefore, would yield sub-optimal audio quality of the rendered scene.
(72) The underlying assumption of the proposed approach is that for certain SAOC application scenarios, uncorrelated sound sources result in correlated SAOC input objects due to the acoustic environment they are located in and due to the applied recording techniques.
(73) Considering a telephone conference setup, for instance, the impact of the room reverberation and the imperfect isolation of the individual speakers leads to correlated SAOC objects although the talking of the individual subjects is uncorrelated. These acoustical circumstances and the resulting correlation can be approximately described with a single frequency- and time-varying value.
(74) Thus, the proposed method successfully circumvents the high bitrate demand of conveying all desired object correlations. This is done by calculating a single time/frequency-dependent IOC value in a dedicated “single IOC calculator” module 448 in the SAOC encoder (see
(75) In a typical application, the bitstream header (for example, the “SAOCSpecificConfig( )” element according to the non-prepublished SAOC Standard [SAOC]) includes one bit indicating if “single IOC” signaling or “normal” IOC signaling is used. Some details regarding this issue will be discussed below.
(76) The payload frame data (for example, the “SAOCFrame( )” element in the non-prepublished SAOC Standard [SAOC]) then includes either one IOC common to all objects or several IOCs, depending on whether the “single IOC” mode or the “normal” mode is used.
(77) Hence, a bitstream parser (which may be part of the SAOC decoder) for the payload data in the decoder could be designed according to the example below (which is formulated in a pseudo C code):
(78)
if (iocMode == SINGLE_IOC) {
    readIocDataFromBitstream(1);
} else {
    readIocDataFromBitstream(numberOfTransmittedIocs);
}
(79) According to the above example, the bitstream parser checks whether a flag “iocMode” (also designated with “bsOneIOC” in the following) indicates that there is only a single inter-object-correlation bitstream parameter value (which is signaled by the parameter value “SINGLE_IOC”). If the bitstream parser finds that there is only a single inter-object-correlation value, the bitstream parser reads one inter-object-correlation data unit (i.e., one inter-object-correlation bitstream parameter value) from the bitstream, which is indicated by the operation “readIocDataFromBitstream(1)”. If, in contrast, the bitstream parser finds that the flag “iocMode” does not indicate the usage of a single (common) inter-object-correlation value, the bitstream parser reads a different number of inter-object-correlation data units (e.g., inter-object-correlation bitstream parameter values) from the bitstream, which is indicated by the function “readIocDataFromBitstream(numberOfTransmittedIocs)”. The number (“numberOfTransmittedIocs”) of inter-object-correlation data units read in this case is typically determined by a number of pairs of related audio objects.
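The parser decision described above may be sketched in compilable C as follows. The type `IocStream`, the enumeration `IocMode` with the value `NORMAL_IOC`, the function `parseIocPayload` and the extended signature of `readIocDataFromBitstream` are assumptions made for illustration only; in particular, the stream here holds already-decoded data units rather than an actual SAOC bitstream.

```c
#include <stddef.h>

enum IocMode { NORMAL_IOC = 0, SINGLE_IOC = 1 };

/* Hypothetical stand-in for a bitstream: already-decoded IOC data units. */
typedef struct {
    const int *units;  /* data units in transmission order */
    size_t     count;  /* number of units available */
    size_t     pos;    /* read position */
} IocStream;

/* Read up to n IOC data units into out; returns the number actually read. */
static size_t readIocDataFromBitstream(IocStream *bs, size_t n, int *out)
{
    size_t got = 0;
    while (got < n && bs->pos < bs->count) {
        out[got++] = bs->units[bs->pos++];
    }
    return got;
}

/* Read one common IOC data unit in single IOC mode, otherwise one data
 * unit per transmitted (related) object pair. */
size_t parseIocPayload(IocStream *bs, enum IocMode iocMode,
                       size_t numberOfTransmittedIocs, int *out)
{
    if (iocMode == SINGLE_IOC) {
        return readIocDataFromBitstream(bs, 1, out);
    }
    return readIocDataFromBitstream(bs, numberOfTransmittedIocs, out);
}
```

In single IOC mode the read position advances by one data unit regardless of the number of related object pairs, which is where the bit saving arises.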
(80) Alternatively, the “single IOC” signaling can be present in the payload frame (for example, in the so-called “SAOCFrame( )” element in the non-prepublished SAOC Standard) to enable dynamic switching between single IOC mode and normal IOC mode on a per-frame basis.
(81) 5. Encoder-Sided Implementation of the Calculation of a Common Inter-Object-Correlation Bitstream Parameter
(82) In the following, some implementations for the single IOC (IOC.sub.single) calculation will be described.
(83) 5.1. Calculation Using Cross-Power Terms
(84) In an embodiment of the SAOC encoder 410, the common inter-object-correlation bitstream parameter value IOC.sub.single can be computed according to the following equation:
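(85)
$$\mathrm{IOC}_{single} = \mathrm{Re}\left\{ \frac{\sum_{i=1}^{N}\sum_{j=i+1}^{N} nrg_{ij}}{\sum_{i=1}^{N}\sum_{j=i+1}^{N} \sqrt{nrg_{ii}\, nrg_{jj}}} \right\}$$
with the cross power terms
(86)
$$nrg_{ij} = \sum_{n}\sum_{k} s_i^{n,k} \left(s_j^{n,k}\right)^{*}$$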
where n and k are the time and frequency instances (or time and frequency indices) for which the SAOC parameter applies.
(87) In other words, the common inter-object-correlation bitstream parameter value IOC.sub.single can be computed in dependence on a ratio between a sum of cross-power terms nrg.sub.ij (wherein the object index i is typically different from the object index j) and a sum of average energy values √(nrg.sub.ii·nrg.sub.jj) (which average energy values represent, for example, a geometrical mean between the energy values nrg.sub.ii and nrg.sub.jj).
(88) The summation may be performed, for example, for all pairs of different audio objects, or for pairs of related audio objects only.
(89) The cross-power term nrg.sub.ij may, for example, be formed as a sum over complex conjugate products (with one of the factors being complex-conjugated) of spectral coefficients s.sub.i.sup.n,k, s.sub.j.sup.n,k associated with the audio object signals of the pair of audio objects under consideration for a plurality of time instances (having time indices n) and/or a plurality of frequency instances (having frequency indices k).
(90) A real part of said ratio may be formed (for example, by an operation Re{ }) in order to have a real-valued common inter-object-correlation bitstream parameter value IOC.sub.single, as shown in the above equation.
(91) 5.2. Usage of a Constant Value
(92) In another embodiment, a constant value c may be chosen to obtain the common inter-object-correlation bitstream parameter value IOC.sub.single in accordance with
IOC.sub.single=c,
with c being a constant.
(93) This constant c could, for example, describe a time- and frequency-independent cross talk of a room with specific acoustics (amount of reverb) where a telephone conference takes place.
(94) The constant c may, for example, be set in accordance with an estimation of the room acoustics, which may be performed by the SAOC encoder. Alternatively, the constant c may be input via a user interface, or may be predetermined in the SAOC encoder 410.
(95) 6. Decoder-Sided Determination of the Inter-Object-Correlation Values for all Object Pairs
(96) In the following, it will be described how the inter-object-correlation values for all object pairs can be obtained.
(97) At the decoder side (for example, in the SAOC decoder 420), the single inter-object-correlation (bitstream) parameter (IOC.sub.single) is used to determine the inter-object-correlation values for all object pairs. This is done, for example, in the “Single IOC Expander” module 474 (see
(98) An advantageous method is a simple copy operation. The copying can be applied with or without considering the “related to” information conveyed, for example, in the SAOC bitstream header (for example, in the portion “SAOCSpecificConfiguration( )”).
(99) In an embodiment, a copying without “related to” information (i.e., without transferring or considering a “related to” information) may be performed in the following manner:
IOC.sub.mn=IOC.sub.single, for all m,n with m≠n.
(100) Thus, all inter-object-correlation values for pairs of different audio objects are set to the common inter-object-correlation (bitstream) parameter value.
(101) In another embodiment, a copying with “related to” information (i.e., taking into consideration the “related to” information) is performed, for example, in the following manner:
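(102)
$$\mathrm{IOC}_{mn} = \begin{cases} \mathrm{IOC}_{single}, & \text{if relatedTo}(m,n) = 1 \\ 0, & \text{otherwise} \end{cases} \quad \text{for all } m, n \text{ with } m \neq n.$$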
(103) Accordingly, one or even two inter-object-correlation values associated with a pair of audio objects (having audio object indices m and n) are set to the value IOC.sub.single specified, for example, by the common inter-object-correlation bitstream parameter value, if the object relationship information “relatedTo(m,n)” indicates that said audio objects are related to each other. Otherwise, i.e. if the object relationship information “relatedTo(m,n)” indicates that the audio objects of a pair of audio objects are not related, one or even two inter-object-correlation values associated with the pair of audio objects are set to a predetermined value, for example, to zero.
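Both copy variants may be sketched in a single C function. The name `expand_single_ioc`, the row-major storage and the convention of passing a null pointer to ignore the "related to" information are assumptions of this sketch; the diagonal is set to 1.0 on the assumption that an object is fully correlated with itself.

```c
#include <stddef.h>

/* Expand a common IOC value to all object pairs.
 * related: n*n "related to" flags, row-major, or NULL to copy the
 *          common value to every pair of different objects.
 * ioc:     output, n*n values, row-major. */
void expand_single_ioc(size_t n, double ioc_single,
                       const unsigned char *related, double *ioc)
{
    for (size_t m = 0; m < n; ++m) {
        for (size_t k = 0; k < n; ++k) {
            if (m == k) {
                ioc[m * n + k] = 1.0;          /* object with itself */
            } else if (related == NULL || related[m * n + k]) {
                ioc[m * n + k] = ioc_single;   /* copy the common value */
            } else {
                ioc[m * n + k] = 0.0;          /* predetermined value */
            }
        }
    }
}
```

Because the relationship flags are symmetric, both entries associated with a pair of audio objects receive the same value.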
(104) However, different distribution methods are possible, for example, taking the object powers into account. For example, inter-object-correlation values relating to objects with relatively low power could be set to high values, such as 1 (full correlation), to minimize the influence of the decorrelation filter in the SAOC decoder.
(105) 7. Decoder Concept Using Bitstream Elements According to
(106) In the following, a decoder concept of an audio signal decoder using the bitstream syntax elements according to
(107) Accordingly, the bitstream comprising the downmix signal representation 110 and the object-related parametric information 112 and/or the bitstream representation 220 and/or the bitstream 300 and/or a bitstream comprising the downmix information 430 and the side information 432, may be provided in accordance with the following description.
(108) An SAOC bitstream, which may be provided by the above-described SAOC encoders and which may be evaluated by the above-described SAOC decoders may comprise an SAOC specific configuration portion, which will be described in the following taking reference to
(109) The SAOC specific configuration information comprises, for example, sampling frequency configuration information, which describes a sampling frequency used by an audio signal encoder and/or to be used by an audio signal decoder. The SAOC specific configuration information also comprises a low delay mode configuration information, which describes whether a low delay mode has been used by an audio signal encoder and/or should be used by an audio signal decoder. The SAOC specific configuration information also comprises a frequency resolution configuration information, which describes a frequency resolution used by an audio signal encoder and/or to be used by an audio signal decoder. The SAOC specific configuration information also comprises a frame length configuration information describing a frame length of audio frames used by the SAOC encoder and/or to be used by the SAOC decoder. The SAOC specific configuration information also comprises an object number configuration information which describes a number of audio objects. This object number configuration information, which is also designated with “bsNumObjects”, for example describes the value N, which has been used above.
(110) The SAOC specific configuration information also comprises an object relationship configuration information. For example, there may be one bitstream bit for every pair of different audio objects. However, the relationship of audio objects may be represented, for example, by a square N×N matrix having a one-bit entry for every combination of audio objects. Entries of said matrix describing the relationship of an object with itself, i.e., diagonal elements, may be set to one, which indicates that an object is related to itself. Two entries, namely a first entry having a first index i and a second index j, and a second entry having a first index j and a second index i, may be associated with each pair of different audio objects having audio object indices i and j. Accordingly, a single bitstream bit determines the values of two entries of the object relationship matrix, which are set to identical values.
(111) As can be seen, a first audio object index i runs from i=0 to i=bsNumObjects (outer for-loop). A diagonal entry “bsRelatedTo[i][i]” is set to one for all values of i. For a first audio object index i, bits describing a relationship between audio object i and audio objects j (having audio object index j) are included in the bit stream for j=i+1 to j=bsNumObjects. Accordingly, entries of the relationship matrix “bsRelatedTo[i][j]”, which describe a relationship between the audio objects having audio object indices i and j, are set to the value given in the bit stream. In addition, an object relationship matrix entry “bsRelatedTo[j][i]” is set to the same value, i.e., to the value of the matrix entry “bsRelatedTo[i][j]”. For details, reference is made to the syntax representation of
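The filling of the object relationship matrix may be sketched in C as follows. The bit reader `BitReader`, the helper `read_bit`, the bound `MAX_OBJECTS` and the function `parse_related_to` are hypothetical, and the loops are assumed to use exclusive upper bounds so that the indices cover the N×N matrix.

```c
#include <stddef.h>

#define MAX_OBJECTS 32  /* assumption of this sketch */

/* Hypothetical bit reader: one bit per array element. */
typedef struct {
    const unsigned char *bits;
    size_t count;
    size_t pos;
} BitReader;

/* Returns the next bit, or 0 when the input is exhausted. */
static unsigned read_bit(BitReader *br)
{
    return br->pos < br->count ? (br->bits[br->pos++] & 1u) : 0u;
}

/* Fill the symmetric relationship matrix: one bitstream bit per pair
 * of different objects, diagonal entries set to one. */
void parse_related_to(BitReader *br, size_t bsNumObjects,
                      unsigned char bsRelatedTo[MAX_OBJECTS][MAX_OBJECTS])
{
    for (size_t i = 0; i < bsNumObjects; ++i) {
        bsRelatedTo[i][i] = 1;
        for (size_t j = i + 1; j < bsNumObjects; ++j) {
            unsigned char bit = (unsigned char)read_bit(br);
            bsRelatedTo[i][j] = bit;
            bsRelatedTo[j][i] = bit;  /* same value for both entries */
        }
    }
}
```

For three objects, the sketch consumes three bits, which describe the pairs (0,1), (0,2) and (1,2) in this order.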
(112) The SAOC specific configuration information also comprises an absolute energy transmission configuration information, which describes whether an audio encoder has included an absolute energy information into the bit stream, and/or whether an audio decoder should evaluate an absolute energy transmission configuration information included in the bit stream.
(113) The SAOC specific configuration information also comprises a downmix-channel-number configuration information, which describes a number of downmix channels used by the audio encoder and/or to be used by the audio decoder. The SAOC specific configuration information may also comprise additional configuration information, which is not relevant for the present application, and which can optionally be omitted.
(114) The SAOC specific configuration information also comprises a common inter-object-correlation configuration information (also designated as a “bitstream signaling parameter” herein) which describes whether a common inter-object-correlation bitstream parameter value is included in the SAOC bitstream, or whether object-pair-individual inter-object-correlation bitstream parameter values are included in the SAOC bitstream. Said common inter-object-correlation configuration information may, for example, be designated with “bsOneIOC”, and may be a one-bit value.
(115) The SAOC specific configuration information may also comprise a distortion control unit configuration information.
(116) In addition, the SAOC specific configuration information may comprise one or more fill bits, which are designated with “ByteAlign( )”, and which may be used to adjust the lengths of the SAOC specific configuration information. In addition, the SAOC specific configuration information may comprise optional additional configuration information “SAOCExtensionConfig( )” which is not of relevance for the present application and which will not be discussed here for this reason.
(117) It should be noted here that the SAOC specific configuration information may comprise more or less than the above described configuration information. In other words, some of the above described configuration information may be omitted in some embodiments, and additional configuration information may also be included in some embodiments.
(118) However, it should be noted that the SAOC specific configuration information may, for example, be included once per piece of audio in an SAOC bitstream. However, the SAOC specific configuration information may optionally be included more often in the bitstream. Nevertheless, the SAOC specific configuration information is typically provided for a plurality of SAOC frames, because the SAOC specific configuration information provides a significant bit load overhead.
(119) In the following, the syntax of an SAOC frame will be described taking reference to
(120) The SAOC frame also comprises encoded absolute energy values NRG, which may be considered as optional, and which may be included band-wise.
(121) The SAOC frame also comprises encoded inter-object-correlation values IOC, which may be provided band-wise, i.e., separately for a plurality of frequency bands, and for a plurality of combinations of audio objects.
(122) In the following, the bitstream will be described with respect to the operations which may be performed by a bitstream parser parsing the bitstream.
(123) The bitstream parser may, for example, initialize variables k, iocIdx1, iocIdx2 to a value of zero in a first preparatory step.
(124) Subsequently, the bitstream parser may perform a parsing for a plurality of values of the first audio object index i between i=0 and i=bsNumObjects (outer for-loop). The bitstream parser may, for example, set an inter-object-correlation index value idxIOC[i][i], describing a relationship between the audio object having audio object index i and itself, to zero, which indicates a full correlation.
(125) Subsequently, a bitstream parser may evaluate the bitstream for values j of a second audio object index between i+1 and bsNumObjects. If audio objects having audio object indices i and j are related, which is indicated by a non-zero value of the object relationship matrix entry “bsRelatedTo[i][j]”, the bitstream parser performs an algorithm 610, and otherwise, the bitstream parser sets the inter-object-correlation index associated with the audio objects having audio object indices i and j to five (operation “idxIOC[i][j]=5”), which describes a zero correlation. Thus, for pairs of audio objects, for which the object relationship matrix indicates no relationship, the inter-object-correlation value is set to zero. For related pairs of audio objects, however, the bitstream signaling parameter “bsOneIOC”, which is included in the SAOC specific configuration, is evaluated to decide how to proceed. If the bitstream signaling parameter “bsOneIOC” indicates that there are object-pair-individual inter-object-correlation bitstream parameter values, a plurality of inter-object-relationship indices idxIOC[i][j] (which may be considered as inter-object-relationship bitstream parameter values) are extracted from the bitstream for “numBands” frequency bands using the function “EcDataSaoc”, wherein said function may be used to decode the inter-object-relationship indices.
(126) However, if the bitstream signaling parameter “bsOneIOC” indicates that a common inter-object-correlation bitstream parameter value is used for a plurality of pairs of audio objects, and if the bitstream parameter “bsRelatedTo[i][j]” indicates that the audio objects having audio object indices i and j are related, a single set of a plurality of inter-object-correlation indices “idxIOC[i][j]” is read from the bitstream using the function “EcDataSaoc” for a plurality of numBands frequency bands, wherein only a single inter-object-correlation index is read for any given frequency band. However, upon re-execution of the algorithm 610, a previously read inter-object-correlation index idxIOC[iocIdx1][iocIdx2] is copied without evaluating the bitstream. This is ensured by use of the variable k, which is initialized to zero and incremented upon evaluation of the first set of inter-object-correlation indices idxIOC[i][j].
(127) To summarize, for each combination of two audio objects, it is first evaluated whether the two audio objects of such a combination are signaled as being related to each other (for example, by checking whether the value “bsRelatedTo[i][j]” takes the value zero or not). If the audio objects of the pair of audio objects are related, the further processing 610 is performed. Otherwise, the value “idxIOC[i][j]” associated to this pair of (substantially unrelated) audio objects is set to a predetermined value, for example, to a predetermined value indicating a zero inter-object-correlation.
(128) In the processing 610, a bitstream value is read from the bitstream for every pair of audio objects (which is signaled to comprise related audio objects) if the signaling “bsOneIOC” is inactive. Otherwise, i.e., if the signaling “bsOneIOC” is active, only one bitstream value is read for one pair of audio objects, and the reference to said single pair is maintained by setting the index values iocIdx1 and iocIdx2 to point at this read out value. The single read out value is reused for other pairs of audio objects (which are signaled as being related to each other) if the signaling “bsOneIOC” is active.
(129) Finally, it is also ensured that a same inter-object-correlation index value is associated to both combinations of two given different audio objects, irrespective of which of the two given audio objects is the first audio object and which of the two given audio objects is the second audio object.
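The parsing of the inter-object-correlation indices described above may be sketched in C as follows. The stream type `IndexStream`, the helpers `ec_data_saoc` (a stand-in for “EcDataSaoc” that merely returns already-decoded indices) and `fill_bands`, the bounds `MAX_OBJECTS` and `MAX_BANDS`, and the function `parse_frame_ioc` are assumptions of this sketch; the index values 0 (full correlation) and 5 (zero correlation) are taken from the description above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 8            /* assumptions of this sketch */
#define MAX_BANDS   4
#define IDX_FULL_CORRELATION 0   /* object with itself */
#define IDX_ZERO_CORRELATION 5   /* pair signaled as not related */

typedef struct { const int *values; size_t count; size_t pos; } IndexStream;

/* Hypothetical stand-in for EcDataSaoc(): one decoded index per band. */
static void ec_data_saoc(IndexStream *bs, int *dst, size_t numBands)
{
    for (size_t b = 0; b < numBands; ++b) {
        dst[b] = bs->pos < bs->count ? bs->values[bs->pos++] : 0;
    }
}

static void fill_bands(int *dst, int value, size_t numBands)
{
    for (size_t b = 0; b < numBands; ++b) {
        dst[b] = value;
    }
}

void parse_frame_ioc(IndexStream *bs, size_t bsNumObjects, size_t numBands,
                     int bsOneIOC,
                     unsigned char bsRelatedTo[MAX_OBJECTS][MAX_OBJECTS],
                     int idxIOC[MAX_OBJECTS][MAX_OBJECTS][MAX_BANDS])
{
    size_t k = 0, iocIdx1 = 0, iocIdx2 = 0;
    for (size_t i = 0; i < bsNumObjects; ++i) {
        fill_bands(idxIOC[i][i], IDX_FULL_CORRELATION, numBands);
        for (size_t j = i + 1; j < bsNumObjects; ++j) {
            if (!bsRelatedTo[i][j]) {
                fill_bands(idxIOC[i][j], IDX_ZERO_CORRELATION, numBands);
            } else if (!bsOneIOC || k == 0) {
                /* individual value, or first (and only) common value */
                ec_data_saoc(bs, idxIOC[i][j], numBands);
                iocIdx1 = i;
                iocIdx2 = j;
                ++k;
            } else {
                /* reuse the common value without reading the stream */
                memcpy(idxIOC[i][j], idxIOC[iocIdx1][iocIdx2],
                       numBands * sizeof(int));
            }
            /* same indices for both orderings of the pair */
            memcpy(idxIOC[j][i], idxIOC[i][j], numBands * sizeof(int));
        }
    }
}
```

With “bsOneIOC” active, the stream position advances by numBands indices in total, independent of the number of related pairs; with “bsOneIOC” inactive, it advances by numBands indices per related pair.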
(130) In addition, it should be noted that the SAOC frame typically comprises the encoded downmix gain values (DMG) on a per-audio-object basis.
(131) In addition, the SAOC frame typically comprises encoded downmix-channel-level-differences (DCLD), which may optionally be included on a per-audio-object basis.
(132) The SAOC frame further optionally comprises encoded post-processing-downmix-gain values (PDG), which may be included in a band-wise manner and per downmix channel.
(133) In addition, the SAOC frame may comprise encoded distortion-control-unit parameters, which determine the application of distortion control measures.
(134) Moreover, the SAOC frame may comprise one or more fill bits “ByteAlign( )”.
(135) Furthermore, an SAOC frame may comprise extension data “SAOCExtensionFrame( )”, which, however, are not relevant for the present application and will not be discussed in detail here for this reason.
(136) Taking reference now to
(137) As can be seen, a first row 710 of a table of
(138) To conclude, an SAOC configuration portion “SAOCSpecificConfig( )” advantageously comprises a bitstream parameter “bsOneIOC” which indicates if only a single IOC parameter is conveyed common to all objects which have relation with each other, signaled by “bsRelatedTo[i][j]=1”. The inter-object-correlation values are included in the bitstream in encoded form “EcDataSaoc (IOC,k,numBands)”. An array “idxIOC[i][j]” is filled on the basis of one or more encoded inter-object-correlation values. The entries of the array “idxIOC[i][j]” are mapped onto inversely quantized values using the mapping table of
(139) The covariance matrix E of size N×N with elements e.sub.i,j represents an approximation of the original signal covariance matrix E≈SS* and is obtained from the OLD and IOC parameters as
$$e_{i,j} = \sqrt{\mathrm{OLD}_i\,\mathrm{OLD}_j}\;\mathrm{IOC}_{i,j}.$$
7. Implementation Alternatives
(140) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(141) The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(142) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(143) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(144) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(145) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(146) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(147) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(148) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(149) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(150) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(151) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
(152) The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
(153) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
8. REFERENCES
(154)
[BCC] C. Faller and F. Baumgarte, “Binaural Cue Coding—Part II: Schemes and applications,” IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, November 2003.
[JSC] C. Faller, “Parametric Joint-Coding of Audio Sources”, 120th AES Convention, Paris, 2006, Preprint 6752.
[SAOC1] J. Herre, S. Disch, J. Hilpert, O. Hellmuth: “From SAC To SAOC—Recent Developments in Parametric Coding of Spatial Audio”, 22nd Regional UK AES Conference, Cambridge, UK, April 2007.
[SAOC2] J. Engdegård, B. Resch, C. Falch, O. Hellmuth, J. Hilpert, A. Hölzer, L. Terentiev, J. Breebaart, J. Koppens, E. Schuijers and W. Oomen: “Spatial Audio Object Coding (SAOC)—The Upcoming MPEG Standard on Parametric Object Based Audio Coding”, 124th AES Convention, Amsterdam 2008, Preprint 7377.
[SAOC] ISO/IEC, “MPEG audio technologies—Part 2: Spatial Audio Object Coding (SAOC),” ISO/IEC JTC1/SC29/WG11 (MPEG) FCD 23003-2.