Audio decoder and decoding method

11705143 · 2023-07-18

Abstract

A method for representing a second presentation of audio channels or objects as a data stream, the method comprising the steps of: (a) providing a set of base signals, the base signals representing a first presentation of the audio channels or objects; (b) providing a set of transformation parameters, the transformation parameters intended to transform the first presentation into the second presentation; the transformation parameters further being specified for at least two frequency bands and including a set of multi-tap convolution matrix parameters for at least one of the frequency bands.

Claims

1. A method of decoding an encoded audio signal, comprising: receiving, by a decoder, an input bitstream; dividing the input bitstream into a base signal bitstream and transformation parameter data; decoding, by a base signal decoder, the base signal bitstream to generate base signals; processing the base signals, by an analysis filterbank, to generate frequency-domain signals having a plurality of subbands; applying, by a first matrix multiplication unit, a complex-valued convolution matrix to a first subband of the frequency-domain signals; applying, by a second matrix multiplication unit, complex-valued, single-tap matrix coefficients to a second subband of the frequency-domain signals; applying, by a third matrix multiplication unit, real-valued matrix coefficients to one or more remaining subbands of the frequency-domain signals; and converting, by a synthesis filterbank, output signals from the matrix multiplication units into a time-domain output.

2. A non-transitory computer-readable medium storing instructions that, when executed by a device, cause the device to perform operations comprising: receiving, by a decoder, an input bitstream; dividing the input bitstream into a base signal bitstream and transformation parameter data; decoding, by a base signal decoder, the base signal bitstream to generate base signals; processing the base signals, by an analysis filterbank, to generate frequency-domain signals having a plurality of subbands; applying, by a first matrix multiplication unit, a complex-valued convolution matrix to a first subband of the frequency-domain signals; applying, by a second matrix multiplication unit, complex-valued, single-tap matrix coefficients to a second subband of the frequency-domain signals; applying, by a third matrix multiplication unit, real-valued matrix coefficients to one or more remaining subbands of the frequency-domain signals; and converting, by a synthesis filterbank, output signals from the matrix multiplication units into a time-domain output.

3. A system comprising: a processor; and a non-transitory computer-readable medium storing instructions that, when executed by the processor, cause the processor to perform operations comprising: receiving, by a decoder, an input bitstream; dividing the input bitstream into a base signal bitstream and transformation parameter data; decoding, by a base signal decoder, the base signal bitstream to generate base signals; processing the base signals, by an analysis filterbank, to generate frequency-domain signals having a plurality of subbands; applying, by a first matrix multiplication unit, a complex-valued convolution matrix to a first subband of the frequency-domain signals; applying, by a second matrix multiplication unit, complex-valued, single-tap matrix coefficients to a second subband of the frequency-domain signals; applying, by a third matrix multiplication unit, real-valued matrix coefficients to one or more remaining subbands of the frequency-domain signals; and converting, by a synthesis filterbank, output signals from the matrix multiplication units into a time-domain output.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

(2) FIG. 1 illustrates a schematic overview of the HRIR convolution process for two source objects, with each channel or object being processed by a pair of HRIRs/BRIRs;

(3) FIG. 2 illustrates schematically a generic parametric coding system supporting channels and objects;

(4) FIG. 3 illustrates schematically one form of channel or object reconstruction unit 30 of FIG. 2 in more detail;

(5) FIG. 4 illustrates the data flow of a method to transform a stereo loudspeaker presentation into a binaural headphones presentation;

(6) FIG. 5 illustrates schematically the hybrid analysis filter bank structure according to the prior art;

(7) FIG. 6 illustrates a comparison of the desired (dashed line) and actual (solid line) phase response obtained with the prior art;

(8) FIG. 7 illustrates schematically an exemplary encoder filter bank and parameter mapping system in accordance with an embodiment of the invention;

(9) FIG. 8 illustrates schematically the decoder filter bank and parameter mapping according to an embodiment;

(10) FIG. 9 illustrates an encoder for transformation of stereo to binaural presentations; and

(11) FIG. 10 illustrates schematically a decoder for transformation of stereo to binaural presentations.

DETAILED DESCRIPTION

(13) This preferred embodiment provides a method to reconstruct objects, channels or ‘presentations’ from a set of base signals that can be applied in filter banks with a low frequency resolution. One example is the transformation of a stereo presentation into a binaural presentation intended for headphone playback that can be applied without a Nyquist (hybrid) filter bank. The reduced decoder frequency resolution is compensated for by a multi-tap convolution matrix. This convolution matrix requires only a few taps (e.g. two) and, in practical cases, is only required at low frequencies. This method (1) reduces the computational complexity of a decoder, (2) reduces the memory usage of a decoder, and (3) reduces the parameter bit rate.

(14) In the preferred embodiment there is provided a system and method for overcoming the undesirable decoder-side computational complexity and memory requirements. This is implemented by providing a high frequency resolution in the encoder, utilising a constrained (lower) frequency resolution in the decoder (i.e., a frequency resolution significantly lower than that used in the corresponding encoder), and utilising a multi-tap (convolution) matrix to compensate for the reduced decoder frequency resolution.

(15) Typically, since a high matrixing frequency resolution is only required at low frequencies, the multi-tap (convolution) matrix can be used at low frequencies, while a conventional (stateless) matrix can be used for the remaining (higher) frequencies. In other words, at low frequencies the matrix represents a set of FIR filters operating on each combination of input and output, while at high frequencies a stateless matrix is used.
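
The following NumPy sketch contrasts the two matrixing modes for a single subband. The function names, array shapes and the zero-history assumption before the first time slot are illustrative choices, not taken from the patent:

```python
import numpy as np

def convolution_matrixing(Z, M):
    """Multi-tap (convolution) matrixing for one low-frequency subband.

    Z: complex array (K, S)    -- K time slots of S base signals
    M: complex array (A, S, J) -- A taps, S inputs, J outputs; each
       input/output pair (s, j) is effectively filtered by an A-tap FIR.
    """
    K, S = Z.shape
    A, _, J = M.shape
    Y = np.zeros((K, J), dtype=complex)
    for a in range(A):
        # delay Z by a slots, assuming zero history before slot 0
        Zd = np.vstack([np.zeros((a, S)), Z[:K - a]])
        Y += Zd @ M[a]
    return Y

def stateless_matrixing(Z, M):
    """Single-tap (stateless) matrixing: one matrix multiply per slot."""
    return Z @ M  # M: (S, J)
```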

(16) Encoder Filter Bank and Parameter Mapping

(17) FIG. 7 illustrates 90 an exemplary encoder filter bank and parameter mapping system according to an embodiment. In this example embodiment 90, 8 sub bands (b=1, . . . , 8) e.g. 91 are initially generated by means of a hybrid (cascaded) filter bank 92 and Nyquist filter bank 93. Subsequently, the first four sub bands are mapped 94 onto one and the same parameter band (p=1) to compute a convolution matrix M[k, p=1], i.e., the matrix now has an additional index k. The remaining sub bands (b=5, . . . , 8) are mapped onto parameter bands (p=2, 3) using stateless matrices M[p(b)] 95, 96.
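
A sketch of this subband-to-parameter-band mapping; the exact split of bands 5-8 over p=2 and p=3 is not spelled out in the text and is assumed here:

```python
# Hypothetical mapping for the 8-band encoder example of FIG. 7:
# hybrid bands b = 1..4 share parameter band p = 1 (convolution matrix);
# the split of b = 5..8 over p = 2, 3 is an assumed, perceptually
# motivated grouping (wider bands at higher frequencies).
BAND_TO_PARAM = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 3}

def param_band(b: int) -> int:
    """Return the parameter band p(b) for filter-bank subband b."""
    return BAND_TO_PARAM[b]
```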

(18) Decoder Filter Bank and Parameter Mapping

(19) FIG. 8 illustrates the corresponding exemplary decoder filter bank and parameter mapping system 100. In contrast to the encoder, no Nyquist filter bank is present, nor are there any delays to compensate for the Nyquist filter bank delay. The decoder analysis filter bank 101 generates only 5 sub bands (b=1, . . . , 5) e.g. 102 that are down sampled by a factor Q. The first sub band is processed by a convolution matrix M[k, p=1] 103, while the remaining bands are processed by stateless matrices 104, 105 according to the prior art.

(20) Although the example above applies a Nyquist filter bank in the encoder 90 and a corresponding convolution matrix for only the first CQMF sub band in the decoder 100, the same process can be applied to a multitude of sub bands, not necessarily limited to the lowest sub band(s).

Encoder Embodiment

(21) One especially useful embodiment is the transformation of a loudspeaker presentation into a binaural presentation. FIG. 9 illustrates an encoder 110 using the proposed method for the presentation transformation. A set of input channels or objects x_i[n] is first transformed using a filter bank 111. The filter bank 111 is a hybrid complex quadrature mirror filter (HCQMF) bank, but other filter bank structures can equally be used. The resulting sub-band representations X_i[k, b] are processed twice 112, 113.

(22) Firstly, to generate a set of base signals Z_s[k, b] 113 intended for output of the encoder. This output can, for example, be generated using amplitude panning techniques so that the resulting signals are intended for loudspeaker playback.

(23) Secondly, to generate a set of desired transformed signals Y_j[k, b] 112. This output can, for example, be generated using HRIR processing so that the resulting signals are intended for headphone playback. Such HRIR processing may be employed in the filter-bank domain, but can equally be performed in the time domain by means of HRIR convolution. The HRIRs are obtained from a database 114.
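
As a minimal sketch of these two rendering paths in the subband domain, assuming a single-tap gain matrix for each path (the patent also allows time-domain HRIR convolution for the second); all names and shapes are illustrative:

```python
import numpy as np

def render_presentations(X, G, H):
    """Render both presentations from one subband of the input objects.

    X: complex array (K, I) -- K time slots of I input channels/objects
    G: real array (I, S)    -- amplitude-panning gains (first presentation)
    H: complex array (I, J) -- per-subband HRTF approximation
                               (second, binaural presentation)
    Returns (Z, Y): base signals (K, S) and desired signals (K, J).
    """
    Z = X @ G  # base signals Z_s[k, b] for loudspeaker playback
    Y = X @ H  # desired transformed signals Y_j[k, b] for headphones
    return Z, Y
```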

(24) The convolution matrix M[k, p] is subsequently obtained by feeding the base signals Z_s[k, b] through a tapped delay line 116. Each of the taps of the delay lines serves as an additional input to a MMSE predictor stage 115. This MMSE predictor stage computes the convolution matrix M[k, p] that minimizes the error between the desired transformed signals Y_j[k, b] and the output of the decoder 100 of FIG. 8, applying convolution matrices. It then follows that the matrix coefficients M[k, p] are given by:
$$M = (Z^*Z + \epsilon I)^{-1} Z^*Y$$
In this formulation, Z* denotes the conjugate transpose of Z, and the matrix Z contains all inputs of the tapped delay lines.

(25) Taking initially the case of reconstructing a single signal Ŷ[k] for a given sub band b, where there are A inputs from each tapped delay line, one has:

(26) $$Z = \begin{bmatrix} Z_1[0,b] & \cdots & Z_1[-(A-1),b] & \cdots & Z_S[0,b] & \cdots & Z_S[-(A-1),b] \\ \vdots & & \vdots & & \vdots & & \vdots \\ Z_1[K-1,b] & \cdots & Z_1[K-1-(A-1),b] & \cdots & Z_S[K-1,b] & \cdots & Z_S[K-1-(A-1),b] \end{bmatrix}$$

$$Y = \begin{bmatrix} Y_1[0,b] \\ \vdots \\ Y_1[K-1,b] \end{bmatrix}, \qquad M = \begin{bmatrix} m_1[0,b] \\ \vdots \\ m_1[A-1,b] \\ \vdots \\ m_S[0,b] \\ \vdots \\ m_S[A-1,b] \end{bmatrix} = (Z^*Z + \epsilon I)^{-1} Z^*Y$$
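
A minimal NumPy sketch of this regularized MMSE solve, assuming zero signal history before slot 0 (standing in for the negative-index samples above); the function name and epsilon value are illustrative:

```python
import numpy as np

def mmse_convolution_matrix(Z_sub, Y_sub, A, eps=1e-6):
    """Regularized MMSE estimate M = (Z*Z + eps I)^-1 Z*Y for one band.

    Z_sub: complex array (K, S) -- base signals for one parameter band
    Y_sub: complex array (K, J) -- desired transformed signals
    Returns M of shape (S*A, J); columns of Z are laid out signal-major
    [Z_1 taps 0..A-1, ..., Z_S taps 0..A-1], matching the matrix above.
    """
    K, S = Z_sub.shape
    cols = []
    for s in range(S):
        for a in range(A):
            # Z_s[k - a, b] with zeros substituted where k - a < 0
            cols.append(np.concatenate([np.zeros(a, dtype=complex),
                                        Z_sub[:K - a, s]]))
    Z = np.stack(cols, axis=1)                    # (K, S*A)
    ZhZ = Z.conj().T @ Z + eps * np.eye(S * A)    # regularized normal matrix
    return np.linalg.solve(ZhZ, Z.conj().T @ Y_sub)
```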

(27) The resulting convolution matrix coefficients M[k, p] are quantized, encoded, and transmitted along with the base signals z_s[n]. The decoder can then use a convolution process to reconstruct Ŷ[k, b] from input signals Z_s[k, b]:

(28) $$\hat{Y}[k,b] = \sum_s Z_s[k,b] * m_s[\cdot,b]$$

or written differently using a convolution expression:

(29) $$\hat{Y}[k,b] = \sum_s \sum_{a=0}^{A-1} Z_s[k-a,b]\, m_s[a,b]$$
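
A direct sketch of this reconstruction for one subband; shapes and the zero-history assumption before slot 0 are illustrative:

```python
import numpy as np

def reconstruct_subband(Z_sub, m):
    """Implements Y_hat[k] = sum_s sum_a Z_s[k - a] * m[s, a].

    Z_sub: complex array (K, S) -- decoded base signals, one subband
    m:     complex array (S, A) -- convolution matrix coefficients
    Samples before slot 0 are taken as zero.
    """
    K, S = Z_sub.shape
    A = m.shape[1]
    Y_hat = np.zeros(K, dtype=complex)
    for s in range(S):
        for a in range(A):
            Y_hat[a:] += Z_sub[:K - a, s] * m[s, a]
    return Y_hat
```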

(30) The convolution approach can be mixed with a linear (stateless) matrix process.

(31) A further distinction can be made between complex-valued and real-valued stateless matrixing. At low frequencies (typically below 1 kHz), the convolution process (A>1) is preferred to allow accurate reconstruction of inter-channel properties in line with a perceptual frequency scale. At medium frequencies, up to about 2 or 3 kHz, the human hearing system is sensitive to inter-channel phase differences, but does not require a very high frequency resolution for reconstruction of such phase. This implies that a single-tap (stateless), complex-valued matrix suffices. For higher frequencies, the human auditory system is virtually insensitive to waveform fine-structure phase, and real-valued, stateless matrixing suffices. With increasing frequency, the number of filter bank outputs mapped onto a parameter band typically increases to reflect the non-linear frequency resolution of the human auditory system.
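
These frequency ranges can be summarized in a small selector; the cutoff values below are the approximate figures from the text, not normative:

```python
def matrixing_mode(f_center_hz: float) -> str:
    """Pick a matrixing mode per parameter band (illustrative thresholds)."""
    if f_center_hz < 1000.0:
        return "complex multi-tap convolution"  # fine inter-channel detail
    if f_center_hz < 3000.0:
        return "complex single-tap"             # phase matters, coarse bands
    return "real single-tap"                    # phase-insensitive region
```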

(32) In another embodiment, the first and second presentations in the encoder are interchanged, e.g., the first presentation is intended for headphone playback, and the second presentation is intended for loudspeaker playback. In this embodiment, the loudspeaker presentation (second presentation) is generated by applying time-dependent transformation parameters in at least two frequency bands to the first presentation, in which the transformation parameters are further specified as including a set of filter coefficients for at least one of the frequency bands.

(33) In some embodiments, the first presentation can be temporally divided up into a series of segments, with a separate set of transformation parameters for each segment. In a further refinement, where segment transformation parameters are unavailable, the parameters can be interpolated from previous coefficients.

Decoder Embodiment

(34) FIG. 10 illustrates an embodiment of the decoder 120. Input bitstream 121 is divided into a base signal bitstream 131 and transformation parameter data 124. Subsequently, a base signal decoder 123 decodes the base signals z[n], which are subsequently processed by an analysis filterbank 125. The resulting frequency-domain signals Z[k, b] with sub-band b=1, . . . , 5 are processed by matrix multiplication units 126, 129 and 130. In particular, matrix multiplication unit 126 applies a complex-valued convolution matrix M[k, p=1] to frequency-domain signal Z[k, b=1]. Furthermore, matrix multiplier unit 129 applies complex-valued, single-tap matrix coefficients M[p=2] to signal Z[k, b=2]. Lastly, matrix multiplication unit 130 applies real-valued matrix coefficients M[p=3] to frequency-domain signals Z[k, b=3 . . . 5]. The matrix multiplication unit output signals are converted to time-domain output 128 by means of a synthesis filterbank 127. References to z[n], Z[k], etc. refer to the set of base signals, rather than any specific base signal. Thus, z[n], Z[k], etc. may be interpreted as z_s[n], Z_s[k], etc., where 0≤s<N, and N is the number of base signals.

(35) In other words, matrix multiplication unit 126 determines output samples of sub-band b=1 of an output signal Ŷ_j[k] from weighted combinations of current samples of sub-band b=1 of base signals Z[k] and previous samples of sub-band b=1 of base signals Z[k] (e.g., Z[k-a], where 0<a<A, and A is greater than 1). The weights used to determine the output samples of sub-band b=1 of output signal Ŷ_j[k] correspond to the complex-valued convolution matrix M[k, p=1].

(36) Furthermore, matrix multiplier unit 129 determines output samples of sub-band b=2 of output signal Ŷ_j[k] from weighted combinations of current samples of sub-band b=2 of base signals Z[k]. The weights used to determine the output samples of sub-band b=2 of output signal Ŷ_j[k] correspond to the complex-valued, single-tap matrix coefficients M[p=2].

(37) Finally, matrix multiplier unit 130 determines output samples of sub-bands b=3 . . . 5 of output signal Ŷ_j[k] from weighted combinations of current samples of sub-bands b=3 . . . 5 of base signals Z[k]. The weights used to determine output samples of sub-bands b=3 . . . 5 of output signal Ŷ_j[k] correspond to the real-valued matrix coefficients M[p=3].
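
A sketch of how the three matrix multiplication units 126, 129 and 130 might dispatch over the five subbands; the array shapes, coefficient layout and zero-history assumption are all illustrative:

```python
import numpy as np

def apply_transformation(Z, M_conv, M_cplx, M_real):
    """Per-subband matrixing corresponding to FIG. 10.

    Z:      complex array (K, 5, S) -- subbands b = 1..5 of S base signals
    M_conv: complex array (A, S, J) -- convolution matrix for b = 1
    M_cplx: complex array (S, J)    -- single-tap complex matrix for b = 2
    M_real: real array (S, J)       -- real-valued matrix for b = 3..5
    Returns Y_hat (K, 5, J), ready for the synthesis filter bank.
    """
    K, B, S = Z.shape
    A, _, J = M_conv.shape
    Y = np.empty((K, B, J), dtype=complex)

    # unit 126: multi-tap convolution matrix on b = 1
    Y[:, 0] = sum(np.vstack([np.zeros((a, S)), Z[:K - a, 0]]) @ M_conv[a]
                  for a in range(A))
    # unit 129: complex-valued, single-tap matrix on b = 2
    Y[:, 1] = Z[:, 1] @ M_cplx
    # unit 130: real-valued matrix on b = 3..5
    for b in range(2, B):
        Y[:, b] = Z[:, b] @ M_real
    return Y
```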

(38) In some cases, the base signal decoder 123 may operate on signals at the same frequency resolution as that provided by analysis filterbank 125. In such cases, base signal decoder 123 may be configured to output frequency-domain signals Z[k] rather than time-domain signals z[n], in which case analysis filterbank 125 may be omitted. Furthermore, in some instances, it may be preferable to apply complex-valued single-tap matrix coefficients, instead of real-valued matrix coefficients, to frequency-domain signals Z[k, b=3 . . . 5].

(39) In practice, the matrix coefficients M can be updated over time, for example by associating individual frames of the base signals with matrix coefficients M. Alternatively, or additionally, matrix coefficients M are augmented with time stamps, which indicate at which time or interval of the base signals z[n] the matrices should be applied. To reduce the transmission bit rate associated with matrix updates, the number of updates is ideally limited, resulting in a time-sparse distribution of matrix updates. Such infrequent updates of matrices require dedicated processing to ensure smooth transitions from one instance of the matrix to the next. The matrices M may be provided associated with specific time segments (frames) and/or frequency regions of the base signals Z. The decoder may employ a variety of interpolation methods to ensure a smooth transition between subsequent instances of the matrix M over time. One example of such an interpolation method is to compute overlapping, windowed frames of the signals Z, and to compute a corresponding set of output signals Y for each such frame using the matrix coefficients M associated with that particular frame. The subsequent frames can then be aggregated using an overlap-add technique, providing a smooth cross-faded transition. Alternatively, the decoder may receive time stamps associated with matrices M, which describe the desired matrix coefficients at specific instances in time. For audio samples in between time stamps, the matrix coefficients of matrix M may be interpolated using linear, cubic, band-limited, or other means of interpolation to ensure smooth transitions. Besides interpolation across time, similar techniques may be used to interpolate matrix coefficients across frequency.
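
A minimal sketch of the time-stamp variant, linearly interpolating coefficients between two updates (the simplest of the interpolation means listed above); names are illustrative:

```python
import numpy as np

def interpolate_matrix(M0, M1, k0, k1, k):
    """Linearly interpolate matrix coefficients between two time stamps.

    M0, M1: coefficient arrays valid at time slots k0 and k1 (k0 < k1)
    k:      current slot, with k0 <= k <= k1
    """
    w = (k - k0) / float(k1 - k0)
    return (1.0 - w) * np.asarray(M0) + w * np.asarray(M1)
```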

(40) Hence, the present document describes a method (and a corresponding encoder 90) for representing a second presentation of audio channels or objects X_i as a data stream that is to be transmitted or provided to a corresponding decoder 100. The method comprises the step of providing base signals Z_s, said base signals representing a first presentation of the audio channels or objects X_i. As outlined above, the base signals Z_s may be determined from the audio channels or objects X_i using first rendering parameters G (i.e. notably using a first gain matrix, e.g. for amplitude panning). The first presentation may be intended for loudspeaker playback or for headphone playback. On the other hand, the second presentation may be intended for headphone playback or for loudspeaker playback. Hence, a transformation from loudspeaker playback to headphone playback (or vice versa) may be performed.

(41) The method further comprises providing transformation parameters M (notably one or more transformation matrices), said transformation parameters M intended to transform the base signals Z_s of said first presentation into output signals Ŷ_j of said second presentation. The transformation parameters may be determined as outlined in the present document. In particular, desired output signals Y_j for the second presentation may be determined from the audio channels or objects X_i using second rendering parameters H (as outlined in the present document). The transform parameters M may be determined by minimizing a deviation of the output signals Ŷ_j from the desired output signals Y_j (e.g. using a minimum mean-square error criterion).

(42) Even more particularly, the transform parameters M may be determined in the sub-band domain (i.e. for different frequency bands). For this purpose, sub-band-domain base signals Z[k, b] may be determined for B frequency bands using an encoder filter bank 92, 93. The number B of frequency bands is greater than one, e.g. B is equal to or greater than 4, 6, 8, or 10. In the examples described in the present document, B=8 or B=5. As outlined above, the encoder filter bank 92, 93 may comprise a hybrid filter bank in which the low frequency bands of the B frequency bands have a higher frequency resolution than the high frequency bands of the B frequency bands. Furthermore, sub-band-domain desired output signals Y[k, b] for the B frequency bands may be determined. The transform parameters M for one or more frequency bands may be determined by minimizing a deviation of the output signals Ŷ_j from the desired output signals Y_j within the one or more frequency bands (e.g. using a minimum mean-square error criterion).

(43) The transformation parameters M may therefore each be specified for at least two frequency bands (notably for B frequency bands). Furthermore, the transformation parameters may include a set of multi-tap convolution matrix parameters for at least one of the frequency bands.

(44) Hence, a method (and a corresponding decoder) for determining output signals of a second presentation of audio channels/objects from base signals of a first presentation of the audio channels/objects is described. The first presentation may be used for loudspeaker playback and the second presentation may be used for headphone playback (or vice versa). The output signals are determined using transformation parameters for different frequency bands, wherein the transformation parameters for at least one of the frequency bands comprise multi-tap convolution matrix parameters. As a result of using multi-tap convolution matrix parameters for at least one of the frequency bands, the computational complexity of a decoder 100 may be reduced, notably by reducing the frequency resolution of a filter bank used by the decoder.

(45) For example, determining an output signal for a first frequency band using multi-tap convolution matrix parameters may comprise determining a current sample of the first frequency band of the output signal as a weighted combination of current, and one or more previous, samples of the first frequency band of the base signals, wherein the weights used to determine the weighted combination correspond to the multi-tap convolution matrix parameters for the first frequency band. One or more of the multi-tap convolution matrix parameters for the first frequency band are typically complex-valued.

(46) Furthermore, determining an output signal for a second frequency band may comprise determining a current sample of the second frequency band of the output signal as a weighted combination of current samples of the second frequency band of the base signals (and not based on previous samples of the second frequency band of the base signals), wherein the weights used to determine the weighted combination correspond to transformation parameters for the second frequency band. The transformation parameters for the second frequency band may be complex-valued, or may alternatively be real-valued.

(47) In particular, the same set of multi-tap convolution matrix parameters may be determined for at least two adjacent frequency bands of the B frequency bands. As illustrated in FIG. 7, a single set of multi-tap convolution matrix parameters may be determined for the frequency bands provided by the Nyquist filter bank (i.e. for the frequency bands having a relatively high frequency resolution). By doing this, the use of a Nyquist filter bank within the decoder 100 may be omitted, thereby reducing the computational complexity of the decoder 100 (while maintaining the quality of the output signals for the second presentation).

(48) Furthermore, the same real-valued transform parameter may be determined for at least two adjacent high frequency bands (as illustrated in the context of FIG. 7). By doing this, the computational complexity of the decoder 100 may be further reduced (while maintaining the quality of the output signals for the second presentation).

(49) Interpretation

(50) Reference throughout this specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment”, “in some embodiments” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.

(51) As used herein, unless otherwise specified, the use of the ordinal adjectives “first”, “second”, “third”, etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

(52) In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.

(53) As used herein, the term “exemplary” is used in the sense of providing examples, as opposed to indicating quality. That is, an “exemplary embodiment” is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.

(54) It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

(55) Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.

(56) Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.

(57) In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

(58) Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. “Coupled” may mean that two or more elements are either in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.

(59) Thus, while there has been described what are believed to be the preferred embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as falling within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added or deleted from the block diagrams and operations may be interchanged among functional blocks. Steps may be added or deleted to methods described within the scope of the present invention.

(60) Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):

(61) EEE 1. A method for representing a second presentation of audio channels or objects as a data stream, the method comprising the steps of:

(62) (a) providing a set of base signals, said base signals representing a first presentation of the audio channels or objects;

(63) (b) providing a set of transformation parameters, said transformation parameters intended to transform said first presentation into said second presentation; said transformation parameters further being specified for at least two frequency bands and including a set of multi-tap convolution matrix parameters for at least one of the frequency bands.

(64) EEE 2. The method of EEE 1 wherein said set of filter coefficients represent a finite impulse response (FIR) filter.

(65) EEE 3. The method of any previous EEE wherein said set of base signals are divided up into a series of temporal segments, and a set of transformation parameters is provided for each temporal segment.

(66) EEE 4. The method of any previous EEE, in which said filter coefficients include at least one coefficient that is complex valued.

(67) EEE 5. The method of any previous EEE, wherein the first or the second presentation is intended for headphone playback.

(68) EEE 6. The method of any previous EEE wherein the transformation parameters associated with higher frequencies do not modify the signal phase, while for lower frequencies, the transformation parameters do modify the signal phase.

(69) EEE 7. The method of any previous EEE wherein said set of filter coefficients are operable for processing a multi tap convolution matrix.

(70) EEE 8. The method of EEE 7 wherein said set of filter coefficients are utilized to process a low frequency band.

(71) EEE 9. The method of any previous EEE wherein said set of base signals and said set of transformation parameters are combined to form said data stream.

(72) EEE 10. The method of any previous EEE wherein said transformation parameters include high frequency audio matrix coefficients for matrix manipulation of a high frequency portion of said set of base signals.

(73) EEE 11. The method of EEE 10 wherein for a medium frequency portion of the high frequency portion of said set of base signals, the matrix manipulation includes complex valued transformation parameters.

(74) EEE 12. A decoder for decoding an encoded audio signal, the encoded audio signal including:

(75) a first presentation including a set of audio base signals intended for reproduction of the audio in a first audio presentation format; and

(76) a set of transformation parameters, for transforming said audio base signals in said first presentation format, into a second presentation format, said transformation parameters including at least high frequency audio transformation parameters and low frequency audio transformation parameters, with said low frequency transformation parameters including multi tap convolution matrix parameters,

(77) the decoder including:

(78) a first separation unit for separating the set of audio base signals and the set of transformation parameters;

(79) a matrix multiplication unit for applying said multi tap convolution matrix parameters to low frequency components of the audio base signals, to apply a convolution to the low frequency components, producing convolved low frequency components;

(80) a scalar multiplication unit for applying said high frequency audio transformation parameters to high frequency components of the audio base signals to produce scalar high frequency components; and

(81) an output filter bank for combining said convolved low frequency components and said scalar high frequency components to produce a time domain output signal in said second presentation format.

(82) EEE 13. The decoder of EEE 12 wherein said matrix multiplication unit modifies the phase of the low frequency components of the audio base signals.

(83) EEE 14. The decoder of EEE 12 or 13 wherein said multi tap convolution matrix transformation parameters are complex valued.

(84) EEE 15. The decoder of any one of EEEs 12 to 14, wherein said high frequency audio transformation parameters are complex-valued.

(85) EEE 16. The decoder of EEE 15, wherein said set of transformation parameters further comprises real-valued higher frequency audio transformation parameters.

(86) EEE 17. The decoder of any one of EEEs 12 to 16, further comprising filters for separating the audio base signals into said low frequency components and said high frequency components.

(87) EEE 18. A method of decoding an encoded audio signal, the encoded audio signal including:

(88) a first presentation including a set of audio base signals intended for reproduction of the audio in a first audio presentation format; and

(89) a set of transformation parameters, for transforming said audio base signals in said first presentation format, into a second presentation format, said transformation parameters including at least high frequency audio transformation parameters and low frequency audio transformation parameters, with said low frequency transformation parameters including multi tap convolution matrix parameters,

(90) the method including the steps of:

(91) convolving low frequency components of the audio base signals with the low frequency transformation parameters to produce convolved low frequency components;

(92) multiplying high frequency components of the audio base signals with the high frequency transformation parameters to produce multiplied high frequency components;

(93) combining said convolved low frequency components and said multiplied high frequency components to produce output audio signal frequency components for playback over a second presentation format.

(94) EEE 19. The method of EEE 18, wherein said encoded signal comprises multiple temporal segments, said method further includes the steps of:

(95) interpolating transformation parameters of multiple temporal segments of the encoded signal to produce interpolated transformation parameters, including interpolated low frequency audio transformation parameters; and

(96) convolving multiple temporal segments of the low frequency components of the audio base signals with the interpolated low frequency audio transformation parameters to produce multiple temporal segments of said convolved low frequency components.

(97) EEE 20. The method of EEE 18 wherein the set of transformation parameters of said encoded audio signal are time varying, and said method further includes the steps of:

(98) convolving the low frequency components with the low frequency transformation parameters for multiple temporal segments to produce multiple sets of intermediate convolved low frequency components;

(99) interpolating the multiple sets of intermediate convolved low frequency components to produce said convolved low frequency components.

(100) EEE 21. The method of either EEE 19 or EEE 20 wherein said interpolating utilizes an overlap and add method of the multiple sets of intermediate convolved low frequency components.

(101) EEE 22. The method of any one of EEEs 18-21, further comprising filtering the audio base signals into said low frequency components and said high frequency components.

(102) EEE 23. A computer-readable non-transitory storage medium including program instructions for the operation of a computer in accordance with the method of any one of EEEs 1 to 11 and 18 to 22.