Method for decoding and encoding a downmix matrix, method for presenting audio content, encoder and decoder for a downmix matrix, audio encoder and audio decoder
11393481 · 2022-07-19
Assignee
Inventors
Cpc classification
International classification
G10L19/008 · PHYSICS
G10L19/02 · PHYSICS
G10L19/20 · PHYSICS
G10L19/022 · PHYSICS
G10L19/025 · PHYSICS
G10L19/005 · PHYSICS
Abstract
A method is described which decodes a downmix matrix for mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein the downmix matrix is encoded by exploiting the symmetry of speaker pairs of the plurality of input channels and the symmetry of speaker pairs of the plurality of output channels. Encoded information representing the encoded downmix matrix is received and decoded for obtaining the decoded downmix matrix.
Claims
1. A method, comprising: decoding an encoded downmix matrix for obtaining a decoded downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein the downmix matrix is encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein the encoded downmix matrix is decoded by receiving encoded information representing the encoded downmix matrix; and decoding the encoded information for obtaining the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and the method further comprising: decoding encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decoding encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
2. The method of claim 1, wherein decoding the encoded significance values is based on a template, the template having the same pairs of speaker groups of the input channels and speaker groups of the output channels as the downmix matrix, wherein respective template significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels of the template.
3. The method of claim 2, comprising: decoding a one-dimensional vector using a run-length scheme, wherein the one-dimensional vector logically combines the significance values and the template significance values, the one-dimensional vector indicating by a first value that a significance value and a template significance value are identical, and by a second value that a significance value and template significance value are different.
4. The method of claim 1, wherein the decoded significance values comprise a first value indicative of a mixing gain of zero and a second value indicative of a mixing gain not being zero, the method further comprising: decoding a one-dimensional vector using a run-length scheme, the one-dimensional vector concatenating the decoded significance values in a predefined order.
5. The method of claim 4, wherein the one-dimensional vector comprises a list containing run-lengths, and wherein the run-length scheme comprises a number of consecutive first values terminated by a second value.
6. The method of claim 4, wherein the run-length scheme comprises a Golomb-Rice coding or a limited Golomb-Rice coding.
7. The method of claim 1, wherein decoding the encoded downmix matrix comprises: decoding from the encoded information representing the encoded downmix matrix information indicating in the downmix matrix for each group of output channels whether a symmetry property and a separability property are satisfied, the symmetry property indicating that a group of output channels is mixed with the same gain from a single input channel, or a group of output channels is mixed equally from a group of input channels, and the separability property indicating that a group of output channels is mixed from a group of input channels while keeping all signals at the respective left or right side.
8. The method of claim 1, wherein decoding the encoded downmix matrix comprises: decoding, from the encoded information representing downmix matrix information indicating in the downmix matrix for each group of output channels whether a symmetry property and a separability property are satisfied, the symmetry property indicating that a group of output channels is mixed with the same gain from a single input channel, or a group of output channels is mixed equally from a group of input channels, and the separability property indicating that a group of output channels is mixed from a group of input channels while keeping all signals at the respective left or right, and for groups of output channels satisfying the symmetry property and the separability property a single mixing gain is provided.
9. The method of claim 1, wherein: a list holding the mixing gains is provided, each of the mixing gains being associated with an index in the list, and wherein the method comprises: decoding from the encoded information representing the encoded downmix matrix the indexes of the list; and selecting the mixing gains from the list in accordance with the decoded indexes in the list.
10. The method of claim 9, wherein, in the encoded information representing the encoded downmix matrix, the indexes are represented using Golomb-Rice coding or the limited Golomb-Rice coding.
11. The method of claim 9, wherein providing the list comprises: decoding from the encoded information representing the encoded downmix matrix a minimum gain value, a maximum gain value and a desired precision; and creating the list including a plurality of gain values between the minimum gain value and the maximum gain value, the gain values being provided with the desired precision, wherein the more frequently the gain values are typically used, the closer the gain values are to the beginning of the list, the beginning of the list having the smallest indexes.
12. The method of claim 11, wherein the list of gain values is created as follows: add integer multiples of a first gain value, between the minimum gain, inclusive, and a starting gain value, inclusive, in decreasing order; add remaining integer multiples of the first gain value, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order; add remaining integer multiples of a first precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order; add remaining integer multiples of the first precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order; stop here if precision level is the first precision level; add remaining integer multiples of a second precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order; add remaining integer multiples of the second precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order; stop here if precision level is the second precision level; add remaining integer multiples of a third precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order; and add remaining integer multiples of the third precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order.
13. The method of claim 12, wherein the starting gain value=0 dB, the first gain value=3 dB, the first precision level=1 dB, the second precision level=0.5 dB, and the third precision level=0.25 dB.
14. The method of claim 1, wherein a predetermined position of a loudspeaker is defined dependent on an azimuth angle and an elevation angle of the speaker position relative to the listener position, and wherein a symmetric speaker pair is formed by speakers having the same elevation angle and having the same absolute value of the azimuth angle but with different signs.
15. The method of claim 1, wherein the input and output channels further include channels associated with one or more center speakers and one or more asymmetrical speakers, an asymmetrical speaker lacking another symmetrical speaker in the configuration defined by the input/output channels.
16. The method of claim 1, wherein decoding the encoded downmix matrix comprises: decoding an encoded compact downmix matrix for obtaining a decoded compact downmix matrix, the compact downmix matrix by grouping together input channels in the downmix matrix associated with symmetric speaker pairs and output channels in the downmix matrix associated with symmetric speaker pairs into common columns or rows.
17. The method of claim 1, wherein decoding the encoded downmix matrix comprises: decoding an encoded compact matrix for obtaining a decoded compact downmix matrix, the compact downmix matrix grouping together input channels in the downmix matrix associated with symmetric speaker pairs and output channels in the downmix matrix associated with symmetric speaker pairs into common columns or rows, assigning the mixing gains to the corresponding significance values indicating that a gain is not zero, and ungrouping the input channels and the output channels grouped together for obtaining the decoded downmix matrix.
18. A method for presenting audio content having a plurality of input channels to a system having a plurality of output channels different from the input channels, the method comprising: providing the audio content and a downmix matrix for mapping the input channels to the output channels, encoding the audio content for obtaining encoded audio content; encoding the downmix matrix for obtaining an encoded downmix matrix, the downmix matrix encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels; transmitting the encoded audio content and the encoded downmix matrix to the system; decoding the encoded audio content for obtaining decoded audio content; decoding the encoded downmix matrix by receiving encoded information representing the encoded downmix matrix and decoding the encoded information for obtaining a decoded downmix matrix; and mapping the input channels of the decoded audio content to the output channels of the system using the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein decoding the encoded downmix matrix comprises: decoding encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decoding encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
19. The method of claim 18, wherein the downmix matrix is specified by a user.
20. The method of claim 18, further comprising transmitting equalizer parameters associated to the input channels or elements of the downmix matrix.
21. A non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to carry out a method comprising: decoding an encoded downmix matrix for obtaining a decoded downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein the downmix matrix is encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein the encoded downmix matrix is decoded by receiving encoded information representing the encoded downmix matrix; and decoding the encoded information for obtaining the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and the method further comprising: decoding encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decoding encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
22. A non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to carry out a method, the method comprising: encoding a downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein encoding the downmix matrix comprises exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein encoding the downmix matrix comprises encoding significance values, wherein respective significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a significance value indicates if a mixing gain for one or more of the input channels is zero or not.
23. A non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to carry out a method for presenting audio content having a plurality of input channels to a system having a plurality of output channels different from the input channels, the method comprising: providing the audio content and a downmix matrix for mapping the input channels to the output channels, encoding the audio content for obtaining encoded audio content; encoding the downmix matrix for obtaining an encoded downmix matrix, the downmix matrix encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels; transmitting the encoded audio content and the encoded downmix matrix to the system; decoding the encoded audio content for obtaining decoded audio content; decoding the encoded downmix matrix by receiving encoded information representing the encoded downmix matrix and decoding the encoded information for obtaining a decoded downmix matrix; and mapping the input channels of the decoded audio content to the output channels of the system using the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein decoding the encoded downmix matrix comprises: decoding encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decoding encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
24. An encoder for encoding a downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, the encoder comprising: a processor configured to encode the downmix matrix, wherein encoding the downmix matrix comprises exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein, for encoding the downmix matrix, the processor is configured to encode significance values, wherein respective significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a significance value indicates if a mixing gain for one or more of the input channels is zero or not.
25. A decoder, comprising: a processor configured to decode an encoded downmix matrix for obtaining a decoded downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein the downmix matrix is encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein the processor is configured to decode the encoded downmix matrix by receiving encoded information representing the encoded downmix matrix, and decoding the encoded information for obtaining the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein, for decoding the encoded downmix matrix, the processor is configured to decode encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decode encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
26. An audio encoder for encoding an audio signal, comprising an encoder for encoding a downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, the encoder comprising: a processor configured to encode the downmix matrix, wherein encoding the downmix matrix comprises exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein, for encoding the downmix matrix, the processor is configured to encode significance values, wherein respective significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a significance value indicates if a mixing gain for one or more of the input channels is zero or not.
27. An audio decoder for decoding an encoded audio signal, the audio decoder comprising a decoder for decoding an encoded downmix matrix for obtaining a decoded downmix matrix, the downmix matrix, mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, wherein the downmix matrix is encoded by exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, the decoder comprising: a processor configured to receive encoded information representing the encoded downmix matrix, and to decode the encoded information for obtaining the decoded downmix matrix, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein, for decoding the encoded downmix matrix, the processor is configured to decode encoded significance values from the encoded information representing the encoded downmix matrix for obtaining decoded significance values, wherein respective decoded significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a decoded significance value indicates if a mixing gain for one or more of the input channels is zero or not; and decode encoded mixing gains from the encoded information representing the encoded downmix matrix for obtaining the mixing gains.
28. The audio decoder of claim 27, comprising a format converter coupled to the decoder for receiving the decoded downmix matrix and operative to convert a format of the decoded audio signal in accordance with the received decoded downmix matrix.
29. A method, comprising: encoding a downmix matrix, the downmix matrix mapping a plurality of input channels of audio content to a plurality of output channels, the input and output channels being associated with respective speakers at predetermined positions relative to a listener position, and wherein encoding the downmix matrix comprises exploiting a symmetry of speaker pairs of the plurality of input channels and a symmetry of speaker pairs of the plurality of output channels, wherein respective pairs of input and output channels in the downmix matrix have associated respective mixing gains for adapting a level by which a given input channel contributes to a given output channel, and wherein encoding the downmix matrix comprises encoding significance values, wherein respective significance values are assigned to pairs of symmetric speaker groups of the input channels and symmetric speaker groups of the output channels, wherein a significance value indicates if a mixing gain for one or more of the input channels is zero or not.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION OF THE INVENTION
(11) Embodiments of the inventive approach will be described. The following description will start with a system overview of a 3D audio codec system in which the inventive approach may be implemented.
(12)
(13)
(14) In an embodiment of the present invention, the encoding/decoding system depicted in
(15) The algorithm blocks of the overall 3D audio system shown in
(16) The pre-renderer/mixer 102 may be optionally provided to convert a channel plus object input scene into a channel scene before encoding. Functionally, it is identical to the object renderer/mixer that will be described below. Pre-rendering of objects may be desired to ensure a deterministic signal entropy at the encoder input that is basically independent of the number of simultaneously active object signals. With pre-rendering of objects, no object metadata transmission is necessitated. Discrete object signals are rendered to the channel layout that the encoder is configured to use. The weights of the objects for each channel are obtained from the associated object metadata (OAM).
(17) The USAC encoder 116 is the core codec for loudspeaker-channel signals, discrete object signals, object downmix signals and pre-rendered signals. It is based on the MPEG-D USAC technology. It handles the coding of the above signals by creating channel- and object mapping information based on the geometric and semantic information of the input channel and object assignment. This mapping information describes how input channels and objects are mapped to USAC channel elements, such as channel pair elements (CPEs), single channel elements (SCEs), low frequency effects (LFEs) and quad channel elements (QCEs), and the corresponding information is transmitted to the decoder. All additional payloads like SAOC data 114, 118 or object metadata 126 are considered in the encoder's rate control. The coding of objects is possible in different ways, depending on the rate/distortion requirements and the interactivity requirements for the renderer. In accordance with embodiments, the following object coding variants are possible:
Pre-rendered objects: Object signals are pre-rendered and mixed to the 22.2 channel signals before encoding. The subsequent coding chain sees 22.2 channel signals.
Discrete object waveforms: Objects are supplied as monophonic waveforms to the encoder. The encoder uses single channel elements (SCEs) to transmit the objects in addition to the channel signals. The decoded objects are rendered and mixed at the receiver side. Compressed object metadata information is transmitted to the receiver/renderer.
Parametric object waveforms: Object properties and their relation to each other are described by means of SAOC parameters. The downmix of the object signals is coded with the USAC. The parametric information is transmitted alongside. The number of downmix channels is chosen depending on the number of objects and the overall data rate. Compressed object metadata information is transmitted to the SAOC renderer.
(18) The SAOC encoder 112 and the SAOC decoder 220 for object signals may be based on the MPEG SAOC technology. The system is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data, such as OLDs (Object Level Differences), IOCs (Inter Object Coherence) and DMGs (DownMix Gains). The additional parametric data exhibits a significantly lower data rate than necessitated for transmitting all objects individually, making the coding very efficient. The SAOC encoder 112 takes as input the object/channel signals as monophonic waveforms and outputs the parametric information (which is packed into the 3D-Audio bitstream 128) and the SAOC transport channels (which are encoded using single channel elements and are transmitted). The SAOC decoder 220 reconstructs the object/channel signals from the decoded SAOC transport channels 210 and the parametric information 214, and generates the output audio scene based on the reproduction layout, the decompressed object metadata information and, optionally, the user interaction information.
(19) The object metadata codec (see OAM encoder 124 and OAM decoder 224) is provided so that, for each object, the associated metadata that specifies the geometrical position and volume of the objects in the 3D space is efficiently coded by quantization of the object properties in time and space. The compressed object metadata cOAM 126 is transmitted to the receiver 200 as side information.
(20) The object renderer 216 utilizes the compressed object metadata to generate object waveforms according to the given reproduction format. Each object is rendered to a certain output channel according to its metadata. The output of this block results from the sum of the partial results. If both channel based content as well as discrete/parametric objects are decoded, the channel based waveforms and the rendered object waveforms are mixed by the mixer 226 before outputting the resulting waveforms 228 or before feeding them to a postprocessor module like the binaural renderer 236 or the loudspeaker renderer module 232.
(21) The binaural renderer module 236 produces a binaural downmix of the multichannel audio material such that each input channel is represented by a virtual sound source. The processing is conducted frame-wise in the QMF (Quadrature Mirror Filterbank) domain, and the binauralization is based on measured binaural room impulse responses.
(22) The loudspeaker renderer 232 converts between the transmitted channel configuration 228 and the desired reproduction format. It may also be called “format converter.” The format converter performs conversions to lower numbers of output channels, i.e., it creates downmixes.
(23)
(24) Multichannel audio formats currently exist in a large variety of configurations; they are used in a 3D audio system as described above in detail, which is used, for example, for providing audio information stored on DVDs and Blu-ray discs. One important issue is to accommodate the real-time transmission of multi-channel audio while maintaining compatibility with the physical speaker setups already available to customers. A solution is to encode the audio content in the original format used, for example, in production, which typically has a large number of channels. In addition, downmix side information is provided to generate other formats which have fewer independent channels. Assuming, for example, a number N of input channels and a number M of output channels, the downmix procedure at the receiver may be specified by a downmix matrix having the size N×M. This particular procedure, as it might be carried out in the downmixer of the above described format converter or binaural renderer, represents a passive downmix, meaning that no adaptive signal processing dependent on the actual audio content is applied to the input signals or to the downmixed output signals.
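The passive downmix described in the preceding paragraph can be illustrated with a minimal sketch. It assumes an N×M matrix of linear gains stored as one row per input channel and one column per output channel; the function names, the 5.0-to-stereo layout and the −3 dB gains are hypothetical values chosen for illustration and are not taken from this document.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_downmix(matrix, inputs):
    """Passive downmix: each output is a fixed weighted sum of the inputs.

    matrix[n][m] is the linear gain from input channel n to output channel m
    (N rows, M columns). inputs is a list of N channels, each a list of samples.
    No content-dependent processing is applied.
    """
    n_in, n_out = len(matrix), len(matrix[0])
    if len(inputs) != n_in:
        raise ValueError("expected %d input channels" % n_in)
    n_samples = len(inputs[0])
    outputs = [[0.0] * n_samples for _ in range(n_out)]
    for n, channel in enumerate(inputs):
        for m in range(n_out):
            gain = matrix[n][m]
            if gain == 0.0:
                continue  # zero mixing gain: input n does not reach output m
            for t, sample in enumerate(channel):
                outputs[m][t] += gain * sample
    return outputs


# Illustrative (assumed) matrix for N = 5 inputs and M = 2 outputs.
G = db_to_linear(-3.0)
DOWNMIX_5_0_TO_STEREO = [
    [1.0, 0.0],  # L  -> L
    [0.0, 1.0],  # R  -> R
    [G, G],      # C  -> L and R with the same gain
    [G, 0.0],    # Ls -> L only
    [0.0, G],    # Rs -> R only
]
```

In this assumed matrix, the symmetric input pairs (L/R and Ls/Rs) stay on their own side and the center channel is mixed equally to both outputs.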
(25) A downmix matrix not only tries to match the physical mixing of the audio information but may also convey the artistic intentions of the producer, who may use knowledge about the actual content that is transmitted. Therefore, there are several ways of generating downmix matrices, for example manually by using generic acoustic knowledge about the role and position of the input and output speakers, manually by using knowledge about the actual content and the artistic intention, and automatically, for example by using a software tool which computes an approximation using the given output speakers.
(26) There are a number of known approaches in the art for providing such downmix matrices. However, existing schemes make many assumptions and hard-code an important part of the structure and the contents of the actual downmix matrix. In “Information technology—Coding of audio-visual objects—Part 3: Audio, AMENDMENT 4: New levels for AAC profiles,” ISO/IEC 14496-3:2009/DAM 4, 2013, it is described to use particular downmixing procedures that are explicitly defined for downmixing from the 5.1 channel configuration (see ITU-R BS.775-3, “Multichannel stereophonic sound system with and without accompanying picture,” Rec., International Telecommunications Union, Geneva, Switzerland, 2012) to the 2.0 channel configuration, from the 6.1 or 7.1 Front or Front Height or Surround Back variants to the 5.1 or 2.0 channel configurations. The drawback of these known approaches is that the downmixing schemes only have a limited degree of freedom in the sense that some of the input channels are mixed with predefined weights (for example, in case of mapping the 7.1 Surround Back to the 5.1 configuration, the L, R and C input channels are directly mapped to the corresponding output channels) and a reduced number of gain values is shared for some other input channels (for example, in case of mapping the 7.1 Front to the 5.1 configuration, the L, R, Lc and Rc input channels are mixed to the L and R output channels using only one gain value). Moreover, the gains only have a limited range and precision, for example from 0 dB to −9 dB with a total of eight levels. Explicitly describing the downmix procedures for each input and output configuration pair is laborious and implies addendums to existing standards, at the expense of delayed compliance. Another proposal is described in “Enhanced audio support and other improvements,” ISO/IEC 14496-12:2012 PDAM 3, 2013. 
This approach uses explicit downmix matrices, which improves flexibility; however, the scheme again limits the range and precision, in this case to 0 dB to −9 dB with a total of 16 levels. Moreover, each gain is encoded with a fixed precision of 4 bits.
(27) Thus, in view of the prior art known, an improved approach for efficient coding of downmix matrices is needed, including the aspects of choosing a suitable representation domain and quantization scheme but also a lossless coding of the quantized values.
(28) In accordance with embodiments, unrestricted flexibility is achieved for handling downmix matrices by allowing encoding of arbitrary downmix matrices, with the range and the precision specified by the producer according to his needs. Also, embodiments of the invention provide for a very efficient lossless coding so the typical matrices use a small amount of bits, and departing from typical matrices will only gradually decrease efficiency. This means that the more similar a matrix is to a typical one, the more efficient the coding described in accordance with embodiments of the present invention will be.
(29) In accordance with embodiments, the necessitated precision may be specified by the producer as 1 dB, 0.5 dB or 0.25 dB, to be used for uniform quantization. It is noted that in accordance with other embodiments, also other values for the precision can be selected. Contrary thereto, existing schemes only allow for a precision of 1.5 dB or 0.5 dB for values around 0 dB, while using a lower precision for the other values. Using a coarser quantization for some values affects the worst case tolerances achieved and makes interpretation of decoded matrices more difficult. In existing techniques, a lower precision is used for some values which is a simple means to reduce the number of necessitated bits using uniform coding. However, practically the same results can be achieved without sacrificing precision by using an improved coding scheme that will be described in further detail below.
(30) In accordance with embodiments, the values of the mixing gains can be specified between a maximum value, for example +22 dB and a minimum value, for example −47 dB. They may also include the value minus infinity. The effective value range used in the matrix is indicated in the bit stream as a maximum gain and a minimum gain, thereby not wasting any bits on values which are not actually used while not limiting the desired flexibility.
(31) In accordance with embodiments, it is assumed that an input channel list of the audio content for which the downmix matrix is to be provided is available, as well as an output channel list indicative of the output speaker configuration. These lists provide geometrical information about each speaker in the input configuration and in the output configuration such as the azimuth angle and the elevation angle. Optionally, also the speakers' conventional names may be provided.
(32)
(33) In the following several techniques will be described which are applied in accordance with embodiments of the present invention to achieve an efficient lossless coding of the downmix matrix. In the following embodiments, reference will be made to a coding of the downmix matrix shown in
(34) In the following description of embodiments of the invention some aspects will be described in the context of encoding the downmix matrix; however, to the skilled reader, it is clear that these aspects also represent a description of the corresponding approach for decoding the downmix matrix. Analogously, aspects described in the context of decoding the downmix matrix also represent a description of a corresponding approach for encoding the downmix matrix.
(35) In accordance with embodiments, the first step is to take advantage of the significant number of zero entries in the matrix. In the following step, in accordance with embodiments, one takes advantage of the global and also the fine level regularities which are typically present in a downmix matrix. A third step is to take advantage of the typical distribution of the nonzero gain values.
(36) In accordance with a first embodiment, the inventive approach starts from a downmix matrix, as it may be provided by a producer of the audio content. For the following discussion, for the sake of simplicity, it is assumed that the downmix matrix considered is the one of
(37)
(38) In accordance with embodiments, different classes of speaker groups are defined, namely symmetric speakers S, center speakers C, and asymmetric speakers A. Center speakers are those speakers whose positions do not change when changing the sign of the azimuth angle of the speaker position. Asymmetric speakers are those speakers that lack the other or corresponding symmetric speaker in a given configuration, or in some rare configurations the speaker on the other side may have a different elevation angle or azimuth angle so that in this case there are two separate asymmetric speakers instead of a symmetric pair. In the downmix matrix 306 shown in
(39) In accordance with the described embodiment, the downmix matrix 306 is converted to a compact representation 308 by grouping together input and output speakers which form symmetric speaker pairs. Grouping the respective speakers together yields a compact input configuration 310 including the same center speakers C1 to C6 as in the original input configuration 300. However, when compared to the original input configuration 300 the symmetric speakers S1 to S9 are respectively grouped together such that the respective pairs now occupy only a single row, as is indicated in the lower part of
(40) In the embodiment described with regard to
(41) With regard to
(42) In accordance with another embodiment, the representation of the downmix matrix in its compact form as shown in
(43)
where (1) represents a virtual termination in case the bit vector ends with a 0. The above shown run-length may be coded using an appropriate coding scheme, such as a limited Golomb-Rice coding which assigns a variable length prefix code to each number, so that the total bit length is minimized. The Golomb-Rice coding approach is used to code a non-negative integer $n \ge 0$, using a non-negative integer parameter $p \ge 0$ as follows: first, the number $h = \lfloor n / 2^{p} \rfloor$ is coded using a unary coding, the h one (1) bits being followed by a terminating zero bit; then the number $l = n - h \cdot 2^{p}$ is uniformly coded using p bits.
(44) The limited Golomb-Rice coding is a trivial variant used when it is known in advance that $n < N$. It does not include the terminating zero bit when coding the maximum possible value of h, which is $h_{max} = \lfloor (N - 1) / 2^{p} \rfloor$. More exactly, to encode $h = h_{max}$ only h one (1) bits are used without the terminating zero bit, which is not needed because the decoder can implicitly detect this condition.
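By way of illustration only, the limited Golomb-Rice coding described above may be sketched as follows. The function names lgr_encode and lgr_decode and the use of character strings of "0" and "1" as the bit container are assumptions made for this sketch and are not taken from the described embodiments.

```python
from typing import Tuple


def lgr_encode(n: int, p: int, N: int) -> str:
    """Encode n (0 <= n < N) with parameter p; returns a string of '0'/'1'."""
    if not 0 <= n < N:
        raise ValueError("n must satisfy 0 <= n < N")
    h = n >> p                    # h = floor(n / 2^p)
    h_max = (N - 1) >> p          # largest value h can take
    bits = "1" * h                # unary part: h one bits
    if h < h_max:
        bits += "0"               # terminating zero, omitted when h == h_max
    low = n - (h << p)            # l = n - h * 2^p
    if p > 0:
        bits += format(low, "0{}b".format(p))  # l coded uniformly on p bits
    return bits


def lgr_decode(bits: str, p: int, N: int) -> Tuple[int, int]:
    """Decode one value from the start of bits; returns (n, bits consumed)."""
    h_max = (N - 1) >> p
    pos = 0
    h = 0
    while h < h_max and bits[pos] == "1":
        h += 1
        pos += 1
    if h < h_max:
        pos += 1                  # skip the terminating zero bit
    low = int(bits[pos:pos + p], 2) if p > 0 else 0
    return (h << p) + low, pos + p
```

For N = 10 and p = 1, the value 9 has h = h_max = 4 and is written as four one bits followed by the single low bit, without a terminating zero.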
(45) As mentioned above, the gains associated with the respective element 314 need to be encoded and transmitted as well and embodiments for doing this will be described in detail further below. Prior to discussing the encoding of the gains in detail, further embodiments for encoding the structure of the compact downmix matrix shown in
(46)
(47)
This list can now be encoded, for example by also using the limited Golomb-Rice coding. When compared to the embodiment described with regard to
(48) With regard to the use of a template matrix, as it has been described with regard to
(49) In the following, as mentioned above, embodiments will be described regarding the encoding of the mixing gains provided in the original downmix matrix which are no longer present in the compact downmix matrix and which need to be encoded and transmitted as well.
(50)
(51)
(52)
(53)
(54) In accordance with this embodiment, for each output speaker group, it is checked whether the corresponding column satisfies for all entries the properties of symmetry and separability and this information is transmitted as side information using two bits.
(55) The symmetry property will be described with regard to
(56) The separability property means that a symmetric group gets mixed into or from another symmetric group by keeping all signals from the left side to the left and all signals from the right side to the right. This applies for the sub-matrix shown in
(57) Using the above-mentioned two properties, which are encountered in the majority of known downmix matrices, further significantly reduces the actual number of gains that need to be coded and, where the separability property is satisfied, also directly eliminates the coding needed for a large number of zero gains. For example, when considering the compact matrix of
(58) In the following, an embodiment will be described for dynamically creating a table of gains that may be used for defining the original gain values in the original downmix matrix, for example by a producer of the audio content. In accordance with this embodiment, a table of gains is created dynamically between a minimum gain value (minGain) and a maximum gain value (maxGain) using a specified precision. The table is created such that the most frequently used values and also the more “round” values are arranged closer to the beginning of the table or list than the other values, namely the values not so often used or the not so round values. In accordance with an embodiment, the list of possible values using maxGain, minGain and the precision level can be created as follows:
add integer multiples of 3 dB, going down from 0 dB to minGain;
add integer multiples of 3 dB, going up from 3 dB to maxGain;
add remaining integer multiples of 1 dB, going down from 0 dB to minGain;
add remaining integer multiples of 1 dB, going up from 1 dB to maxGain;
(59) stop here if precision level is 1 dB;
add remaining integer multiples of 0.5 dB, going down from 0 dB to minGain;
add remaining integer multiples of 0.5 dB, going up from 0.5 dB to maxGain;
(60) stop here if precision level is 0.5 dB;
add remaining integer multiples of 0.25 dB, going down from 0 dB to minGain; and
add remaining integer multiples of 0.25 dB, going up from 0.25 dB to maxGain.
(61) For example, when maxGain is 2 dB and minGain is −6 dB, and precision is 0.5 dB, the following list is created:
(62) 0, −3, −6, −1, −2, −4, −5, 1, 2, −0.5, −1.5, −2.5, −3.5, −4.5, −5.5, 0.5, 1.5.
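The list creation described above may, purely as an illustration, be sketched as follows. The function name generate_gain_table_db and the use of exact fractions to avoid rounding drift are assumptions of this sketch; it further assumes that minGain is not greater than 0 dB and maxGain is not less than 0 dB.

```python
from fractions import Fraction


def generate_gain_table_db(max_gain, min_gain, precision):
    """Build the ordered gain list (in dB) for a precision of 1, 0.5 or 0.25 dB."""
    hi, lo, prec = Fraction(max_gain), Fraction(min_gain), Fraction(precision)
    table = []

    def add_pass(step):
        # Downward pass: 0, -step, -2*step, ... down to min_gain.
        g = Fraction(0)
        while g >= lo:
            if g not in table:
                table.append(g)
            g -= step
        # Upward pass: step, 2*step, ... up to max_gain.
        g = step
        while g <= hi:
            if g not in table:
                table.append(g)
            g += step

    add_pass(Fraction(3))          # multiples of 3 dB come first
    add_pass(Fraction(1))          # then the remaining multiples of 1 dB
    for step in (Fraction(1, 2), Fraction(1, 4)):
        if prec > step:            # stop once the requested precision is reached
            break
        add_pass(step)
    return [float(g) for g in table]


print(generate_gain_table_db(2, -6, 0.5))
```

Called with maxGain = 2 dB, minGain = −6 dB and a precision of 0.5 dB, the sketch reproduces the 17 values of the example list given above, in the same order.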
(63) With regard to the above embodiment, it is noted that the invention is not limited to the values indicated above, rather, instead of using integer multiples of 3 dB and starting from 0 dB, other values may be selected and also other values for the precision level may be selected depending on the circumstances.
(64) In general, the list of gain values may be created as follows:
add integer multiples of a first gain value, between the minimum gain, inclusive, and a starting gain value, inclusive, in decreasing order;
add remaining integer multiples of the first gain value, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order;
add remaining integer multiples of a first precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order;
add remaining integer multiples of the first precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order;
(65) stop here if precision level is the first precision level;
add remaining integer multiples of a second precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order;
add remaining integer multiples of the second precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order;
(66) stop here if precision level is the second precision level;
add remaining integer multiples of a third precision level, between the minimum gain, inclusive, and the starting gain value, inclusive, in decreasing order; and
add remaining integer multiples of the third precision level, between the starting gain value, inclusive, and the maximum gain, inclusive, in increasing order.
(67) In the embodiment above, when the starting gain value is zero, the parts which add remaining values in increasing order and satisfying the associated multiplicity condition will initially add the first gain value or the first or second or third precision level. However, in the general case, the parts which add remaining values in increasing order will initially add the smallest value, satisfying the associated multiplicity condition, in the interval between the starting gain value, inclusive, and the maximum gain, inclusive. Correspondingly, the parts which add remaining values in decreasing order will initially add the largest value, satisfying the associated multiplicity condition, in the interval between the minimum gain, inclusive, and the starting gain value, inclusive. Considering an example similar to the one above but with a starting gain value=1 dB (a first gain value=3 dB, maxGain=2 dB, minGain=−6 dB and precision level=0.5 dB) yields the following:
(68) Down: 0, −3, −6
(69) Up: [empty]
(70) Down: 1, −1, −2, −4, −5
(71) Up: 2
(72) Down: 0.5, −0.5, −1.5, −2.5, −3.5, −4.5, −5.5
(73) Up: 1.5
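The general procedure with a starting gain value may be sketched, as a non-authoritative illustration, as follows. The names general_gain_table and multiples_in, the default arguments and the representation of the precision levels as a tuple are assumptions of this sketch. Each pass is applied literally as defined above, so that every not yet listed multiple lying in the respective interval is added.

```python
import math
from fractions import Fraction


def multiples_in(step, lo, hi):
    """Integer multiples of step inside [lo, hi], in increasing order."""
    k_lo = math.ceil(lo / step)
    k_hi = math.floor(hi / step)
    return [k * step for k in range(k_lo, k_hi + 1)]


def general_gain_table(max_gain, min_gain, precision, start=0,
                       first_gain=3, levels=(1, 0.5, 0.25)):
    """Ordered gain list (dB) for an arbitrary starting gain value."""
    lo, hi, s = Fraction(min_gain), Fraction(max_gain), Fraction(start)
    table = []

    def add_pass(step):
        step = Fraction(step)
        down = list(reversed(multiples_in(step, lo, s)))  # start .. min, decreasing
        up = multiples_in(step, s, hi)                    # start .. max, increasing
        for g in down + up:
            if g not in table:                            # only "remaining" values
                table.append(g)

    add_pass(first_gain)
    for level in levels:
        add_pass(level)
        if Fraction(level) == Fraction(precision):        # requested precision reached
            break
    return [float(g) for g in table]


print(general_gain_table(2, -6, 0.5, start=1))
```

With a starting gain value of 0 dB the sketch yields the same list as the 3 dB based procedure described earlier.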
(74) To encode a gain value, the gain is looked up in the table and its position inside the table is output. The desired gain will be found because all the gains are previously quantized to the nearest integer multiple of the specified precision of, for example, 1 dB, 0.5 dB or 0.25 dB. In accordance with an embodiment, the positions of the gain values have associated therewith an index, indicating the position in the table, and the indexes of the gains can be encoded, for example, using the limited Golomb-Rice coding approach. As a result, small indexes use a smaller number of bits than large indexes and, in this way, the frequently used values or the typical values, like 0 dB, −3 dB or −6 dB, will use the smallest number of bits, and also the more “round” values, like −4 dB, will use a smaller number of bits than the not so round numbers (for example, −4.5 dB). Thus, by using the above described embodiment not only a producer of the audio content may generate a desired list of gains, but these gains may also be encoded very efficiently so that when applying, in accordance with yet another embodiment, all the above described approaches, a highly efficient coding of downmix matrices can be achieved.
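As a minimal sketch of the look-up step, the following quantizes a gain to the specified precision and returns its index in the table. The names quantize_gain and gain_index are hypothetical, the table literal is the example list for maxGain = 2 dB, minGain = −6 dB and a precision of 0.5 dB, and the handling of exact ties by Python's round function is an assumption, since the tie rule is not specified above.

```python
# Example gain list from the text (maxGain = 2 dB, minGain = -6 dB, precision 0.5 dB).
EXAMPLE_TABLE = [0, -3, -6, -1, -2, -4, -5, 1, 2,
                 -0.5, -1.5, -2.5, -3.5, -4.5, -5.5, 0.5, 1.5]


def quantize_gain(gain_db, precision):
    """Round a gain to the nearest integer multiple of the precision."""
    return round(gain_db / precision) * precision


def gain_index(gain_db, table, precision):
    """Index of the quantized gain in the table (ValueError if out of range)."""
    return table.index(quantize_gain(gain_db, precision))


print(gain_index(-4.0, EXAMPLE_TABLE, 0.5))   # a "round" value
print(gain_index(-4.4, EXAMPLE_TABLE, 0.5))   # quantized to -4.5 dB
```

In this example −4 dB is found at index 5 and −4.5 dB at index 13, so the rounder value would be coded with the smaller index.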
(75) The above described functionality may be part of an audio encoder as it has been described above with regard to
(76) Upon receiving the encoded compact downmix matrix at the receiver side, in accordance with embodiments a method for decoding is provided which decodes the encoded compact downmix matrix and un-groups (separates) the grouped speakers into single speakers, thereby yielding the original downmix matrix. When the encoding of the matrix includes encoding the significance values and the gain values, during the decoding step, these are decoded so that on the basis of the significance values and on the basis of the desired input/output configuration, the downmix matrix can be reconstructed and the respective decoded gains can be associated to the respective matrix elements of the reconstructed downmix matrix. This may be performed by a separate decoder that yields the completed downmix matrix to the audio decoder which may use it in a format converter, for example, the audio decoder described above with regard to
(77) Thus, the inventive approach as defined above provides also for a system and a method for presenting audio content having a specific input channel configuration to a receiving system having a different output channel configuration, wherein the additional information for the downmix is transmitted together with the encoded bit stream from the encoder side to the decoder side and, in accordance with the inventive approach, due to the very efficient coding of the downmix matrices the overhead is clearly reduced.
(78) In the following, a further embodiment implementing the efficient static downmix matrix coding is described. More specifically, an embodiment for a static downmix matrix with optional EQ coding will be described. As also mentioned earlier, one issue related to multichannel audio is to accommodate its real-time transmission, while maintaining compatibility with all the existing available consumer physical speaker setups. One solution is to provide, alongside the audio content in the original production format, downmix side information to generate the other formats which have fewer independent channels, if needed. Assuming inputCount input channels and outputCount output channels, the downmix procedure is specified by a downmix matrix of size inputCount by outputCount. This particular procedure represents a passive downmix, meaning no adaptive signal processing depending on the actual audio content is applied to the input signals or to the downmixed output signals. The inventive approach, in accordance with the embodiment described now, describes a complete scheme for efficient encoding of downmix matrices, including aspects about choosing a suitable representation domain and quantization scheme but also about lossless coding of the quantized values. Each matrix element represents a mixing gain which adjusts the level at which a given input channel contributes to a given output channel. The embodiment described now aims to achieve unrestricted flexibility by allowing encoding of arbitrary downmix matrices, with a range and a precision that may be specified by the producer according to his needs. Also an efficient lossless coding is desired, so that typical matrices use a small number of bits, and departing from typical matrices will only gradually decrease efficiency. This means that the more similar a matrix is to a typical one, the more efficient its coding will be. In accordance with embodiments, the necessitated precision can be specified by the producer as 1, 0.5, or 0.25 dB, to be used for uniform quantization. The values of the mixing gains may be specified between a maximum of +22 dB and a minimum of −47 dB inclusive, and also include the value −∞ (0 in linear domain). The effective value range that is used in the downmix matrix is indicated in the bit stream as a maximum gain value maxGain and a minimum gain value minGain, therefore not wasting any bits on values which are not actually used while not limiting flexibility.
(79) Assuming that an input channel list and also an output channel list is available which provide geometrical information about each speaker, such as the azimuth and elevation angles and optionally the speaker conventional name, for example according to International Standard ISO/IEC 23003-3:2012, Information technology—MPEG audio technologies—Part 3: Unified Speech and Audio Coding, 2012; or International Standard ISO/IEC 23001-8:2013, Information technology—MPEG systems technologies—Part 8: Coding-independent code points, 2013, an algorithm for encoding a downmix matrix, in accordance with embodiments, may be as shown in Table 1 below:
(80) TABLE 1 Syntax of DownmixMatrix

Syntax                                                              No. of bits  Mnemonic
DownmixMatrix(inputConfig, inputCount, outputConfig, outputCount)
{
  equalizerPresent;                                                 1            uimsbf
  if (equalizerPresent) {
    EqualizerConfig(inputConfig, inputCount);
  }
  precisionLevel;                                                   2            uimsbf
  maxGain = escapedValue(3, 4, 0);
  minGain = escapedValue(4, 5, 0) + 1;
  ConvertToCompactConfig(inputConfig, inputCount);
  ConvertToCompactConfig(outputConfig, outputCount);
  isAllSeparable;                                                   1            uimsbf
  if (!isAllSeparable) {
    for (i = 0; i < compactOutputCount; i++) {
      if (compactOutputConfig[i].pairType == SYMMETRIC) {
        isSeparable[i];                                             1            uimsbf
      }
    }
  } else {
    for (i = 0; i < compactOutputCount; i++) {
      if (compactOutputConfig[i].pairType == SYMMETRIC) {
        isSeparable[i] = 1;
      }
    }
  }
  isAllSymmetric;                                                   1            uimsbf
  if (!isAllSymmetric) {
    for (i = 0; i < compactOutputCount; i++) {
      isSymmetric[i];                                               1            uimsbf
    }
  } else {
    for (i = 0; i < compactOutputCount; i++) {
      isSymmetric[i] = 1;
    }
  }
  mixLFEOnlyToLFE;                                                  1            uimsbf
  rawCodingCompactMatrix;                                           1            uimsbf
  if (rawCodingCompactMatrix) {
    for (i = 0; i < compactInputCount; i++) {
      for (j = 0; j < compactOutputCount; j++) {
        if (!mixLFEOnlyToLFE ||
            (compactInputConfig[i].isLFE == compactOutputConfig[j].isLFE)) {
          compactDownmixMatrix[i][j];                               1            uimsbf
        } else {
          compactDownmixMatrix[i][j] = 0;
        }
      }
    }
  } else {
    if (mixLFEOnlyToLFE) {
      compactInputLFECount = 0;
      compactOutputLFECount = 0;
      for (i = 0; i < compactInputCount; i++) {
        if (compactInputConfig[i].isLFE) compactInputLFECount++;
      }
      for (i = 0; i < compactOutputCount; i++) {
        if (compactOutputConfig[i].isLFE) compactOutputLFECount++;
      }
      totalCount = (compactInputCount - compactInputLFECount) *
                   (compactOutputCount - compactOutputLFECount);
    } else {
      totalCount = compactInputCount * compactOutputCount;
    }
    useCompactTemplate;                                             1            uimsbf
    n = 3;
    if (totalCount >= 256) n = 4;
    runLGRParam;                                                    n            uimsbf
    count = 0;
    flatCompactMatrix[totalCount + 1];
    while (count < totalCount) {
      zeroRunLength; /* limited Golomb-Rice using runLGRParam */    varies       bslbf
      flatCompactMatrix[count .. count + zeroRunLength] = {0, ..., 0, 1};
      count += zeroRunLength + 1;
    }
    count = 0;
    for (i = 0; i < compactInputCount; i++) {
      for (j = 0; j < compactOutputCount; j++) {
        if (mixLFEOnlyToLFE && compactInputConfig[i].isLFE &&
            compactOutputConfig[j].isLFE) {
          compactDownmixMatrix[i][j];                               1            uimsbf
        } else if (mixLFEOnlyToLFE &&
                   (compactInputConfig[i].isLFE ^ compactOutputConfig[j].isLFE)) {
          compactDownmixMatrix[i][j] = 0;
        } else {
          compactDownmixMatrix[i][j] = flatCompactMatrix[count++];
        }
      }
    }
    if (useCompactTemplate) {
      compactTemplate = FindCompactTemplate(inputConfig, inputCount,
                                            outputConfig, outputCount);
      for (i = 0; i < compactInputCount; i++) {
        for (j = 0; j < compactOutputCount; j++) {
          compactDownmixMatrix[i][j] ^= compactTemplate[i][j];
        }
      }
    }
  }
  fullForAsymmetricInputs;                                          1            uimsbf
  rawCodingNonzeros;                                                1            uimsbf
  if (!rawCodingNonzeros) {
    gainLGRParam;                                                   3            uimsbf
    generateGainTable(maxGain, minGain, precisionLevel);
  }
  for (i = 0; i < compactInputCount; i++) {
    iType = compactInputConfig[i].pairType;
    for (j = 0; j < compactOutputCount; j++) {
      oType = compactOutputConfig[j].pairType;
      i1 = compactInputConfig[i].originalPosition;
      o1 = compactOutputConfig[j].originalPosition;
      if ((iType != SYMMETRIC) && (oType != SYMMETRIC)) {
        downmixMatrix[i1][o1] = 0.0;
        if (!compactDownmixMatrix[i][j]) continue;
        downmixMatrix[i1][o1] = DecodeGainValue();
      } else if (iType != SYMMETRIC) {
        o2 = compactOutputConfig[j].symmetricPair.originalPosition;
        downmixMatrix[i1][o1] = 0.0;
        downmixMatrix[i1][o2] = 0.0;
        if (!compactDownmixMatrix[i][j]) continue;
        downmixMatrix[i1][o1] = DecodeGainValue();
        useFull = (iType == ASYMMETRIC) && fullForAsymmetricInputs;
        if (isSymmetric[j] && !useFull) {
          downmixMatrix[i1][o2] = downmixMatrix[i1][o1];
        } else {
          downmixMatrix[i1][o2] = DecodeGainValue();
        }
      } else if (oType != SYMMETRIC) {
        i2 = compactInputConfig[i].symmetricPair.originalPosition;
        downmixMatrix[i1][o1] = 0.0;
        downmixMatrix[i2][o1] = 0.0;
        if (!compactDownmixMatrix[i][j]) continue;
        downmixMatrix[i1][o1] = DecodeGainValue();
        if (isSymmetric[j]) {
          downmixMatrix[i2][o1] = downmixMatrix[i1][o1];
        } else {
          downmixMatrix[i2][o1] = DecodeGainValue();
        }
      } else {
        i2 = compactInputConfig[i].symmetricPair.originalPosition;
        o2 = compactOutputConfig[j].symmetricPair.originalPosition;
        downmixMatrix[i1][o1] = 0.0;
        downmixMatrix[i1][o2] = 0.0;
        downmixMatrix[i2][o1] = 0.0;
        downmixMatrix[i2][o2] = 0.0;
        if (!compactDownmixMatrix[i][j]) continue;
        downmixMatrix[i1][o1] = DecodeGainValue();
        if (isSeparable[j] && isSymmetric[j]) {
          downmixMatrix[i2][o2] = downmixMatrix[i1][o1];
        } else if (!isSeparable[j] && isSymmetric[j]) {
          downmixMatrix[i1][o2] = DecodeGainValue();
          downmixMatrix[i2][o1] = downmixMatrix[i1][o2];
          downmixMatrix[i2][o2] = downmixMatrix[i1][o1];
        } else if (isSeparable[j] && !isSymmetric[j]) {
          downmixMatrix[i2][o2] = DecodeGainValue();
        } else {
          downmixMatrix[i1][o2] = DecodeGainValue();
          downmixMatrix[i2][o1] = DecodeGainValue();
          downmixMatrix[i2][o2] = DecodeGainValue();
        }
      }
    }
  }
}
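For the case in which both the input group and the output group are symmetric pairs, the reconstruction of the four gains from the decoded values may be sketched as follows. The function name decode_pair_block and the passing of the already decoded gains as a sequence are assumptions of this sketch. The last branch assumes that the three remaining entries are read in the order [i1][o2], [i2][o1], [i2][o2], which is an interpretation of the syntax and not a confirmed detail.

```python
def decode_pair_block(gains, is_separable, is_symmetric):
    """Return [[g(i1,o1), g(i1,o2)], [g(i2,o1), g(i2,o2)]] for one nonzero block.

    gains yields the decoded gain values in bit stream order.
    """
    it = iter(gains)
    block = [[0.0, 0.0], [0.0, 0.0]]
    block[0][0] = next(it)                  # first gain is always transmitted
    if is_separable and is_symmetric:
        block[1][1] = block[0][0]           # one gain describes the whole block
    elif is_symmetric:                      # symmetric but not separable
        block[0][1] = next(it)
        block[1][0] = block[0][1]
        block[1][1] = block[0][0]
    elif is_separable:                      # separable but not symmetric
        block[1][1] = next(it)
    else:                                   # neither property: three more gains
        block[0][1] = next(it)
        block[1][0] = next(it)
        block[1][1] = next(it)
    return block


print(decode_pair_block([0.7], True, True))
print(decode_pair_block([0.7, 0.2], False, True))
```

The number of values consumed from the sequence is 1, 2, 2 or 4, depending on which of the two properties hold for the output group.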
(81) An algorithm for decoding gain values, in accordance with embodiments, may be as shown in Table 2 below:
(82) TABLE 2 Syntax of DecodeGainValue

Syntax                                                              No. of bits  Mnemonic
DecodeGainValue()
{
  if (rawCodingNonzeros) {
    nAlphabet = (maxGain - minGain) * 2 ^ precisionLevel + 1;
    gainValueIndex = ReadRange(nAlphabet);
    gainValue = maxGain - gainValueIndex / 2 ^ precisionLevel;
  } else {
    gainValueIndex; /* limited Golomb-Rice using gainLGRParam */    varies       bslbf
    gainValue = gainTable[gainValueIndex];
  }
}
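The raw coding branch of Table 2 may be illustrated by the following sketch, in which the operator written as ^ in the table is read as exponentiation and minGain is treated as a signed value in dB (for example −6). Both readings, as well as the function names raw_alphabet_size and raw_gain_from_index, are assumptions of this sketch.

```python
def raw_alphabet_size(max_gain, min_gain, precision_level):
    """Number of distinct gains between min_gain and max_gain, inclusive."""
    return (max_gain - min_gain) * 2 ** precision_level + 1


def raw_gain_from_index(index, max_gain, precision_level):
    """Gain in dB for a raw coded index; index 0 corresponds to max_gain."""
    return max_gain - index / 2 ** precision_level


# precisionLevel 1 corresponds to a step of 0.5 dB.
size = raw_alphabet_size(2, -6, 1)
print(size, [raw_gain_from_index(k, 2, 1) for k in (0, 5, size - 1)])
```

For maxGain = 2 dB, minGain = −6 dB and a step of 0.5 dB the alphabet has 17 entries, which matches the length of the example gain list given for the same range.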
(83) An algorithm for defining the read range function, in accordance with embodiments, may be as shown in Table 3 below:
(84) TABLE 3 Syntax of ReadRange

Syntax                                                              No. of bits  Mnemonic
ReadRange(alphabetSize)
{
  nBits = floor(log2(alphabetSize));
  nUnused = 2 ^ (nBits + 1) - alphabetSize;
  range;                                                            nBits        uimsbf
  if (range >= nUnused) {
    rangeExtra;                                                     1            uimsbf
    range = range * 2 - nUnused + rangeExtra;
  }
  return range;
}
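The ReadRange function of Table 3 may be sketched as follows. The BitReader class and the string based bit container are hypothetical helpers, and write_range is an assumed encoder side counterpart that is not given in the table; it is included only so that the decoding can be exercised.

```python
class BitReader:
    """Reads unsigned integers, most significant bit first, from a '0'/'1' string."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, n):
        if n == 0:
            return 0
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value


def read_range(reader, alphabet_size):
    """Decode a value in [0, alphabet_size) as in Table 3."""
    n_bits = alphabet_size.bit_length() - 1          # floor(log2(alphabet_size))
    n_unused = (1 << (n_bits + 1)) - alphabet_size
    value = reader.read(n_bits)
    if value >= n_unused:
        value = value * 2 - n_unused + reader.read(1)
    return value


def write_range(value, alphabet_size):
    """Assumed inverse of read_range; returns the code as a '0'/'1' string."""
    n_bits = alphabet_size.bit_length() - 1
    n_unused = (1 << (n_bits + 1)) - alphabet_size
    if value < n_unused:                             # short code, n_bits bits
        return format(value, "0{}b".format(n_bits)) if n_bits else ""
    v = value + n_unused                             # long code, n_bits + 1 bits
    return format(v >> 1, "0{}b".format(n_bits)) + str(v & 1)
```

For an alphabet of five values, the values 0 to 2 use two bits each and the values 3 and 4 use three bits each, so that no code word is wasted.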
(85) An algorithm for defining the equalizer configuration, in accordance with embodiments, may be as shown in Table 4 below:
(86) TABLE 4 Syntax of EqualizerConfig

Syntax                                                              No. of bits  Mnemonic
EqualizerConfig(inputConfig, inputCount)
{
  numEqualizers = escapedValue(3, 5, 0) + 1;
  eqPrecisionLevel;                                                 2            uimsbf
  eqExtendedRange;                                                  1            uimsbf
  for (i = 0; i < numEqualizers; i++) {
    numSections = escapedValue(2, 4, 0) + 1;
    lastCenterFreqP10 = 0;
    lastCenterFreqLd2 = 10;
    maxCenterFreqLd2 = 99;
    for (j = 0; j < numSections; j++) {
      centerFreqP10 = lastCenterFreqP10 + ReadRange(4 - lastCenterFreqP10);
      if (centerFreqP10 > lastCenterFreqP10) lastCenterFreqLd2 = 10;
      if (centerFreqP10 == 3) maxCenterFreqLd2 = 24;
      centerFreqLd2 = lastCenterFreqLd2 +
                      ReadRange(1 + maxCenterFreqLd2 - lastCenterFreqLd2);
      qFactorIndex;                                                 5            uimsbf
      if (qFactorIndex > 19) {
        qFactorExtra;                                               3            uimsbf
      }
      cgBits = 4 + eqExtendedRange + eqPrecisionLevel;
      centerGainIndex;                                              cgBits       uimsbf
    }
    sgBits = 4 + eqExtendedRange + min(eqPrecisionLevel + 1, 3);
    scalingGainIndex;                                               sgBits       uimsbf
  }
  for (i = 0; i < inputCount; i++) {
    hasEqualizer[i];                                                1            uimsbf
    if (hasEqualizer[i]) {
      equalizerIndex[i] = ReadRange(numEqualizers);
    }
  }
}
(87) The elements of the downmix matrix, in accordance with embodiments, may be as shown in Table 5 below:
(88) TABLE 5 Elements of DownmixMatrix

Field: Description/Values

paramConfig, inputConfig, outputConfig: Channel configuration vectors specifying the information about each speaker. Each entry, paramConfig[i], is a structure with the members: AzimuthAngle, the absolute value of the speaker azimuth angle; AzimuthDirection, the azimuth direction, 0 (left) or 1 (right); ElevationAngle, the absolute value of the speaker elevation angle; ElevationDirection, the elevation direction, 0 (up) or 1 (down); alreadyUsed, indicates whether the speaker is already part of a group; isLFE, indicates whether the speaker is a LFE speaker.
paramCount, inputCount, outputCount: Number of speakers in the corresponding channel configuration vectors.
compactParamConfig, compactInputConfig, compactOutputConfig: Compact channel configuration vectors specifying the information about each speaker group. Each entry, compactParamConfig[i], is a structure with the members: pairType, type of the speaker group, which can be SYMMETRIC (a symmetric pair of two speakers), CENTER, or ASYMMETRIC; isLFE, indicates whether the speaker group consists of LFE speakers; originalPosition, position in the original channel configuration of the first speaker, or the only speaker, in the group; symmetricPair.originalPosition, position in the original channel configuration of the second speaker in the group, for SYMMETRIC groups only.
compactParamCount, compactInputCount, compactOutputCount: Number of speaker groups in the corresponding compact channel configuration vectors.
equalizerPresent: Boolean indicating whether equalizer information that is to be applied to the input channels is present.
precisionLevel: Precision used for uniform quantization of the gains: 0 = 1 dB, 1 = 0.5 dB, 2 = 0.25 dB, 3 reserved.
maxGain: Maximum actual gain in the matrix, expressed in dB: possible values from 0 to 22, in linear 1..12.589.
minGain: Minimum actual gain in the matrix, expressed in dB: possible values from −1 to −47, in linear 0.891..0.004.
isAllSeparable: Boolean indicating whether all the output speaker groups satisfy the separability property.
isSeparable[i]: Boolean indicating whether the output speaker group with index i satisfies the separability property.
isAllSymmetric: Boolean indicating whether all the output speaker groups satisfy the symmetry property.
isSymmetric[i]: Boolean indicating whether the output speaker group with index i satisfies the symmetry property.
mixLFEOnlyToLFE: Boolean indicating whether the LFE speakers are mixed only to LFE speakers and, at the same time, the non-LFE speakers are mixed only to non-LFE speakers.
rawCodingCompactMatrix: Boolean indicating whether compactDownmixMatrix is coded raw (using one bit per entry) or it is coded using run-length coding followed by limited Golomb-Rice.
compactDownmixMatrix[i][j]: An entry in compactDownmixMatrix corresponding to input speaker group i and output speaker group j, indicating whether any of the associated gains is nonzero: 0 = all gains are zero, 1 = at least one gain is nonzero.
useCompactTemplate: Boolean indicating whether to apply an element-wise XOR to compactDownmixMatrix with a predefined compact template matrix, to improve the efficiency of the run-length coding.
runLGRParam: Limited Golomb-Rice parameter used to code the zero run-lengths in the linearized flatCompactMatrix.
flatCompactMatrix: Linearized version of compactDownmixMatrix with the predefined compact template matrix already applied. When mixLFEOnlyToLFE is enabled, it does not include the entries known to be zero (due to mixing between non-LFE and LFE) or those used for LFE to LFE mixing.
compactTemplate: Predefined compact template matrix, having “typical” entries, which is XORed element-wise to compactDownmixMatrix, in order to improve coding efficiency by creating mostly zero value entries.
zeroRunLength: The length of a zero run followed by a one, in the flatCompactMatrix, which is coded with limited Golomb-Rice coding, using the parameter runLGRParam.
fullForAsymmetricInputs: Boolean indicating whether to ignore the symmetry property for every asymmetric input speaker group. When enabled, every asymmetric input speaker group will have two gain values decoded for each symmetric output speaker group with index i, regardless of isSymmetric[i].
gainTable: Dynamically generated gain table which contains the list of all possible gains between minGain and maxGain with precision precisionLevel.
rawCodingNonzeros: Boolean indicating whether the nonzero gain values are coded raw (uniform coding, using the ReadRange function) or their indexes in the gainTable list are coded using limited Golomb-Rice coding.
gainLGRParam: Limited Golomb-Rice parameter used to code the nonzero gain indexes, computed by searching each gain in the gainTable list.
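The relation between flatCompactMatrix and the zeroRunLength values may be illustrated by the following sketch. The function names zero_run_lengths and expand_runs are hypothetical. A run that reaches the end of the vector without a closing one is terminated by a virtual one, which is why the syntax reserves totalCount + 1 entries.

```python
def zero_run_lengths(flat):
    """Lengths of the zero runs that precede each one in a 0/1 list."""
    runs = []
    run = 0
    for bit in flat:
        if bit:
            runs.append(run)
            run = 0
        else:
            run += 1
    if run > 0:
        runs.append(run)        # closed by a virtual terminating one
    return runs


def expand_runs(runs, total_count):
    """Rebuild the 0/1 list from the run lengths, dropping a virtual terminator."""
    flat = []
    for run in runs:
        flat.extend([0] * run + [1])
    return flat[:total_count]


flat = [1, 0, 0, 1, 0, 1, 0, 0]
runs = zero_run_lengths(flat)
print(runs, expand_runs(runs, len(flat)) == flat)
```

Each run length would then be coded with the limited Golomb-Rice scheme using runLGRParam.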
(89) Golomb-Rice coding is used to code any non-negative integer $n \ge 0$, using a given non-negative integer parameter $p \ge 0$ as follows: first code the number $h = \lfloor n / 2^{p} \rfloor$ using unary coding, as h one bits followed by a terminating zero bit; then code the number $l = n - h \cdot 2^{p}$ uniformly using p bits.
(90) Limited Golomb-Rice coding is a trivial variant used when it is known in advance that $n < N$, for a given integer $N \ge 1$. It does not include the terminating zero bit when coding the maximum possible value of h, which is $h_{max} = \lfloor (N - 1) / 2^{p} \rfloor$. More exactly, to encode $h = h_{max}$ we write only h one bits, but not the terminating zero bit, which is not needed because the decoder can implicitly detect this condition.
(91) The function ConvertToCompactConfig(paramConfig, paramCount) described below is used to convert the given paramConfig configuration consisting of paramCount speakers into the compact compactParamConfig configuration consisting of compactParamCount speaker groups. The compactParamConfig[i].pairType field can be SYMMETRIC (S), when the group represents a pair of symmetric speakers, CENTER (C), when the group represents a center speaker, or ASYMMETRIC (A), when the group represents a speaker without a symmetric pair.
(92) TABLE-US-00006
ConvertToCompactConfig(paramConfig, paramCount)
{
    /* mark all speakers as not yet assigned to a group */
    for (i = 0; i < paramCount; ++i) {
        paramConfig[i].alreadyUsed = 0;
    }
    idx = 0;
    for (i = 0; i < paramCount; ++i) {
        if (paramConfig[i].alreadyUsed) continue;
        compactParamConfig[idx].isLFE = paramConfig[i].isLFE;
        if ((paramConfig[i].AzimuthAngle == 0) ||
            (paramConfig[i].AzimuthAngle == 180°)) {
            /* speaker on the median plane: center group */
            compactParamConfig[idx].pairType = CENTER;
            compactParamConfig[idx].originalPosition = i;
        } else {
            j = SearchForSymmetricSpeaker(paramConfig, paramCount, i);
            if (j != -1) {
                /* symmetric pair found: order the pair by azimuth direction */
                compactParamConfig[idx].pairType = SYMMETRIC;
                if (paramConfig[i].AzimuthDirection == 0) {
                    compactParamConfig[idx].originalPosition = i;
                    compactParamConfig[idx].symmetricPair.originalPosition = j;
                } else {
                    compactParamConfig[idx].originalPosition = j;
                    compactParamConfig[idx].symmetricPair.originalPosition = i;
                }
                paramConfig[j].alreadyUsed = 1;
            } else {
                /* no symmetric counterpart: asymmetric group */
                compactParamConfig[idx].pairType = ASYMMETRIC;
                compactParamConfig[idx].originalPosition = i;
            }
        }
        idx++;
    }
    compactParamCount = idx;
}
(93) The function FindCompactTemplate(inputConfig, inputCount, outputConfig, outputCount) is used to find a compact template matrix matching the input channel configuration represented by inputConfig and inputCount, and the output channel configuration represented by outputConfig and outputCount.
(94) The compact template matrix is found by searching in a predefined list of compact template matrices, available at both the encoder and decoder, for the one with the same set of input speakers as inputConfig and the same set of output speakers as outputConfig, regardless of the actual speaker order, which is not relevant. Before returning the found compact template matrix, the function may need to reorder its lines and columns to match the order of the speaker groups as derived from the given input configuration and the order of the speaker groups as derived from the given output configuration.
(95) If a matching compact template matrix is not found, the function shall return a matrix having the correct number of lines (which is the computed number of input speaker groups) and columns (which is the computed number of output speaker groups), which has for all entries the value one (1).
(96) The function SearchForSymmetricSpeaker(paramConfig, paramCount, i) is used to search the channel configuration represented by paramConfig and paramCount for the symmetric speaker corresponding to the speaker paramConfig[i]. This symmetric speaker, paramConfig[j], shall be situated after the speaker paramConfig[i]; therefore, j can be in the range i+1 to paramCount−1, inclusive. Additionally, it shall not be already part of a speaker group, meaning that paramConfig[j].alreadyUsed has to be false.
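The grouping performed by ConvertToCompactConfig and SearchForSymmetricSpeaker may be sketched in runnable form as follows. The Speaker record, the dictionary layout of a group, and in particular the symmetry test (equal azimuth angle, equal elevation angle, equal LFE flag, opposite azimuth direction) are assumptions made for this illustration; the description above only requires that the symmetric speaker comes later in the list and is not already used.

```python
from dataclasses import dataclass


@dataclass
class Speaker:                      # hypothetical record, for illustration only
    azimuth_angle: int              # unsigned angle in degrees
    azimuth_direction: int          # 0 or 1, selecting the side
    elevation_angle: int = 0
    is_lfe: bool = False


def search_for_symmetric_speaker(config, used, i):
    """Return the index j > i of an unused mirror speaker, or -1 if none."""
    a = config[i]
    for j in range(i + 1, len(config)):
        b = config[j]
        if (not used[j]
                and b.azimuth_angle == a.azimuth_angle
                and b.elevation_angle == a.elevation_angle
                and b.is_lfe == a.is_lfe
                and b.azimuth_direction != a.azimuth_direction):
            return j
    return -1


def convert_to_compact_config(config):
    """Group speakers into CENTER, SYMMETRIC and ASYMMETRIC speaker groups."""
    used = [False] * len(config)
    groups = []
    for i, spk in enumerate(config):
        if used[i]:
            continue
        group = {"isLFE": spk.is_lfe}
        if spk.azimuth_angle in (0, 180):
            group.update(pairType="CENTER", originalPosition=i)
        else:
            j = search_for_symmetric_speaker(config, used, i)
            if j != -1:
                # The speaker with azimuth direction 0 is listed first.
                first, second = (i, j) if spk.azimuth_direction == 0 else (j, i)
                group.update(pairType="SYMMETRIC", originalPosition=first,
                             symmetricPairPosition=second)
                used[j] = True
            else:
                group.update(pairType="ASYMMETRIC", originalPosition=i)
        groups.append(group)
    return groups
```

For a six-speaker layout consisting of a center speaker, a front pair, an LFE speaker at azimuth 0, and a surround pair, this sketch yields four groups: two of type CENTER and two of type SYMMETRIC.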
(97) The function ReadRange( ) is used to read a uniformly distributed integer in the range 0 . . . alphabetSize−1 inclusive, which can have a total of alphabetSize possible values. This could be done simply by reading ceil(log2(alphabetSize)) bits, but that would not take advantage of the unused values. Instead, the function assigns shorter codes to some of the values: for example, when alphabetSize is 3, the function will use just one bit for integer 0, and two bits for integers 1 and 2.
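One way to obtain the behavior described for alphabetSize equal to 3 is truncated binary coding, sketched below. The BitReader helper, the function name read_range, and the exact bit layout are assumptions for illustration; the description above fixes only the code lengths of the example, not the complete procedure.

```python
class BitReader:
    """Hypothetical helper that reads bits from a list of 0/1 integers."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read_bits(self, count):
        value = 0
        for _ in range(count):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def read_range(reader, alphabet_size):
    """Read an integer in 0 .. alphabet_size - 1 using truncated binary coding."""
    n_bits = alphabet_size.bit_length() - 1          # floor(log2(alphabet_size))
    n_unused = (1 << (n_bits + 1)) - alphabet_size   # code words that stay short
    value = reader.read_bits(n_bits)
    if value >= n_unused:
        # Long code word: one extra bit selects between two adjacent values.
        value = value * 2 - n_unused + reader.read_bits(1)
    return value
```

For alphabet_size equal to 3, the bit sequences 0, 10 and 11 decode to 0, 1 and 2 respectively, which matches the code lengths given in the example above.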
(98) The function generateGainTable(maxGain, minGain, precisionLevel) is used to dynamically generate the gain table gainTable which contains the list of all possible gains between minGain and maxGain with precision precisionLevel. The order of the values is chosen so that the most frequently used values and also more “round” values would be typically closer to the beginning of the list. The gain table with the list of all possible gain values is generated as follows:
add integer multiples of 3 dB, going down from 0 dB to minGain;
add integer multiples of 3 dB, going up from 3 dB to maxGain;
add remaining integer multiples of 1 dB, going down from 0 dB to minGain;
add remaining integer multiples of 1 dB, going up from 1 dB to maxGain;
(99) stop here if precisionLevel is 0 (corresponding to 1 dB);
add remaining integer multiples of 0.5 dB, going down from 0 dB to minGain;
add remaining integer multiples of 0.5 dB, going up from 0.5 dB to maxGain;
(100) stop here if precisionLevel is 1 (corresponding to 0.5 dB);
add remaining integer multiples of 0.25 dB, going down from 0 dB to minGain;
add remaining integer multiples of 0.25 dB, going up from 0.25 dB to maxGain.
(101) For example, when maxGain is 2 dB, minGain is −6 dB, and precisionLevel is 1 (corresponding to 0.5 dB), we create the following list:
(102) 0, −3, −6, −1, −2, −4, −5, 1, 2, −0.5, −1.5, −2.5, −3.5, −4.5, −5.5, 0.5, 1.5.
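The generation rule can be checked against this example with a short sketch. It assumes that minGain ≤ 0 ≤ maxGain and that both limits are multiples of 0.25 dB, and it works internally in integer units of 0.25 dB to avoid floating-point rounding; the function name and these internal choices are illustrative and not part of the description.

```python
def generate_gain_table(max_gain, min_gain, precision_level):
    """Build the ordered gain list in dB (assumes min_gain <= 0 <= max_gain)."""
    lo = round(min_gain * 4)        # limits in units of 0.25 dB
    hi = round(max_gain * 4)
    table = []
    seen = set()

    def add(gain):
        # Only "remaining" values are added in each pass.
        if gain not in seen:
            seen.add(gain)
            table.append(gain)

    # Step sizes 3 dB, 1 dB, 0.5 dB and 0.25 dB, in units of 0.25 dB.
    for index, step in enumerate([12, 4, 2, 1]):
        for gain in range(0, lo - 1, -step):      # going down from 0 dB
            add(gain)
        for gain in range(step, hi + 1, step):    # going up from one step
            add(gain)
        # Stop after the 1 dB pass for level 0 and the 0.5 dB pass for level 1.
        if index >= 1 and precision_level == index - 1:
            break
    return [gain / 4 for gain in table]
```

Calling generate_gain_table(2, -6, 1) reproduces the 17-element list given above.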
(103) The elements for the equalizer configuration, in accordance with embodiments, may be as shown in Table 6 below:
(104) TABLE-US-00007 TABLE 6 Elements of EqualizerConfig
Field | Description/Values
numEqualizers | Number of different equalizer filters present
eqPrecisionLevel | Precision used for uniform quantization of the gains: 0 = 1 dB, 1 = 0.5 dB, 2 = 0.25 dB, 3 = 0.1 dB
eqExtendedRange | Boolean indicating whether to use an extended range for the gains; if enabled, the available range is doubled
numSections | Number of sections of an equalizer filter, each one being a peak filter
centerFreqLd2 | The leading two decimal digits of the center frequency for a peak filter; the maximum range is 10 . . . 99
centerFreqP10 | Number of zeros to be appended to centerFreqLd2; the maximum range is 0 . . . 3
qFactorIndex | Quality factor index for a peak filter
qFactorExtra | Extra bits for decoding a quality factor larger than 1.0
centerGainIndex | Gain at the center frequency for a peak filter
scalingGainIndex | Scaling gain for an equalizer filter
hasEqualizer[i] | Boolean indicating whether the input channel with index i has an equalizer associated to it
eqalizerIndex[i] | The index of the equalizer associated with the input channel with index i
(105) In the following, aspects of the decoding process in accordance with embodiments will be described, starting with the decoding of the downmix matrix.
(106) The syntax element DownmixMatrix( ) contains the downmix matrix information. The decoding first reads the equalizer information represented by the syntax element EqualizerConfig( ), if enabled. The fields precisionLevel, maxGain, and minGain are then read. The input and output configurations are converted to compact configurations using the function ConvertToCompactConfig( ). Then, the flags indicating if the separability and symmetry properties are satisfied for each output speaker group are read.
(107) The significance matrix compactDownmixMatrix is then read, either a) raw using one bit per entry, or b) using the limited Golomb-Rice coding of the run lengths, and then copying the decoded bits from flatCompactMatrix to compactDownmixMatrix and applying the compactTemplate matrix.
(108) Finally, the nonzero gains are read. For each nonzero entry of compactDownmixMatrix, depending on the field pairType of the corresponding input group and the field pairType of the corresponding output group, a sub-matrix of size up to 2 by 2 has to be reconstructed. Using the separability and symmetry associated properties, a number of gain values are read using the function DecodeGainValue( ). A gain value can be coded uniformly, by using the function ReadRange( ), or using the limited Golomb-Rice coding of the indices of the gain in the gainTable table, which contains all the possible gain values.
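Option b) may be sketched as follows, starting from zero-run lengths that have already been decoded with the limited Golomb-Rice code. Several points are assumptions made for illustration: the function names, the row-major flattening with input speaker groups as lines and output speaker groups as columns, the dropping of a final one that falls beyond the end of the matrix, and the omission of the mixLFEOnlyToLFE special case.

```python
def expand_zero_runs(zero_run_lengths, total_count):
    """Rebuild flatCompactMatrix: each run is that many zeros followed by a one."""
    flat = []
    for run in zero_run_lengths:
        flat.extend([0] * run)
        flat.append(1)
    # A terminating one past the end of the matrix is dropped (assumed here).
    return flat[:total_count]


def apply_compact_template(flat, template):
    """Copy flat bits into a matrix and XOR them element-wise with the template."""
    rows = len(template)
    cols = len(template[0])
    return [[flat[r * cols + c] ^ template[r][c] for c in range(cols)]
            for r in range(rows)]
```

Because the template holds "typical" entries, a significance matrix that is close to the template produces mostly zeros before the XOR, and therefore a small number of long zero runs.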
(109) Now, aspects of the decoding of the equalizer configuration will be described. The syntax element EqualizerConfig( ) contains the equalizer information that is to be applied to the input channels. A number of numEqualizers equalizer filters is first decoded and thereafter selected for specific input channels using eqIndex[i]. The fields eqPrecisionLevel and eqExtendedRange indicate the quantization precision and the available range of the scaling gains and of the peak filter gains.
(110) Each equalizer filter is a serial cascade consisting of numSections peak filters and one scalingGain. Each peak filter is fully defined by its centerFreq, qualityFactor, and centerGain.
(111) The centerFreq parameters of the peak filters which belong to a given equalizer filter have to be given in non-decreasing order. The parameter is limited to 10 . . . 24000 Hz inclusive, and it is calculated as
$\text{centerFreq} = \text{centerFreqLd2} \times 10^{\text{centerFreqP10}}$
(112) The qualityFactor parameter of the peak filter can represent values between 0.05 and 1.0 inclusive with a precision of 0.05 and from 1.1 to 11.3 inclusive with a precision of 0.1 and it is calculated as
(113)
(114) The vector eqPrecisions is introduced which gives the precision in dB corresponding to a given eqPrecisionLevel, and the eqMinRanges and eqMaxRanges matrices which give the minimum and maximum values in dB for the gains corresponding to a given eqExtendedRange and eqPrecisionLevel.
eqPrecisions[4]={1.0,0.5,0.25,0.1};
eqMinRanges[2][4]={{−8.0,−8.0,−8.0,−6.4},{−16.0,−16.0,−16.0,−12.8}};
eqMaxRanges[2][4]={{7.0,7.5,7.75,6.3},{15.0,15.5,15.75,12.7}};
(115) The parameter scalingGain uses the precision level min(eqPrecisionLevel+1,3), which is the next better precision level if not already the last one. The mappings from the fields centerGainIndex and scalingGainIndex to the gain parameters centerGain and scalingGain are calculated as
centerGain=eqMinRanges[eqExtendedRange][eqPrecisionLevel]+eqPrecisions[eqPrecisionLevel]×centerGainIndex
scalingGain=eqMinRanges[eqExtendedRange][min(eqPrecisionLevel+1,3)]+eqPrecisions[min(eqPrecisionLevel+1,3)]×scalingGainIndex
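The mappings for centerFreq, centerGain and scalingGain can be written out as a small sketch using the tables given above. The Python function names and argument order are illustrative; the valid ranges of the index fields are not checked here.

```python
EQ_PRECISIONS = [1.0, 0.5, 0.25, 0.1]
EQ_MIN_RANGES = [[-8.0, -8.0, -8.0, -6.4],
                 [-16.0, -16.0, -16.0, -12.8]]


def center_freq(center_freq_ld2, center_freq_p10):
    """Two leading decimal digits followed by center_freq_p10 zeros, in Hz."""
    return center_freq_ld2 * 10 ** center_freq_p10


def center_gain(eq_extended_range, eq_precision_level, center_gain_index):
    """Peak filter gain in dB at the signalled precision level."""
    return (EQ_MIN_RANGES[eq_extended_range][eq_precision_level]
            + EQ_PRECISIONS[eq_precision_level] * center_gain_index)


def scaling_gain(eq_extended_range, eq_precision_level, scaling_gain_index):
    """Scaling gain in dB, using the next finer precision level (capped at 3)."""
    level = min(eq_precision_level + 1, 3)
    return (EQ_MIN_RANGES[eq_extended_range][level]
            + EQ_PRECISIONS[level] * scaling_gain_index)
```

For instance, with eqExtendedRange equal to 0 and eqPrecisionLevel equal to 0, a centerGainIndex of 15 gives 7.0 dB, which is the corresponding entry of eqMaxRanges, while the scaling gain is quantized in steps of 0.5 dB.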
(116) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(117) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a non-transitory storage medium such as a digital storage medium, for example a floppy disc, a hard disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(118) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(119) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
(120) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(121) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(122) A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(123) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
(124) A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or programmed to, perform one of the methods described herein.
(125) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(126) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(127) In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
(128) While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is, therefore, intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.