METHODS AND APPARATUS FOR COMPRESSING AND DECOMPRESSING A HIGHER ORDER AMBISONICS REPRESENTATION

Abstract

Higher Order Ambisonics represents three-dimensional sound independent of a specific loudspeaker set-up. However, transmission of an HOA representation results in a very high bit rate. Therefore, compression with a fixed number of channels is used, in which directional and ambient signal components are processed differently. The ambient HOA component is represented by a minimum number of HOA coefficient sequences. The remaining channels contain either directional signals or additional coefficient sequences of the ambient HOA component, depending on what will result in optimum perceptual quality. This processing can change on a frame-by-frame basis.

Claims

1. A method for decompressing a compressed Higher Order Ambisonics (HOA) representation, the method comprising: decoding the compressed HOA representation to provide a decoded frame of channels and a first set of indices indicating active directions of the HOA representation, and a second set of directions indicating energetically dominant components of the compressed HOA representation; re-distributing the decoded frame of channels based on the first set of indices and the second set of directions indicating energetically dominant components, wherein the re-distribution determines a frame of HOA directional signals and a frame of ambient HOA components; and re-composing a current decompressed frame of the HOA representation from the frame of HOA directional signals and the frame of ambient HOA components.

2. The method of claim 1, wherein the energetically dominant components of the compressed HOA representation were determined based on a search using directional power distribution.

3. The method of claim 1, wherein the frame of HOA directional signals is created based on the first set of indices and the second set of directions indicating energetically dominant components.

4. A non-transitory computer readable storage medium containing instructions that when executed by a processor perform the method according to claim 1.

5. An apparatus for decompressing a Higher Order Ambisonics (HOA) representation, the apparatus comprising: a decoder for decoding the compressed HOA representation to provide a decoded frame of channels and a first set of indices indicating active directions of the HOA representation, and a second set of directions indicating energetically dominant components of the compressed HOA representation; a first processor for re-distributing the decoded frame of channels based on the first set of indices and the second set of directions indicating energetically dominant components, wherein the re-distribution determines a frame of HOA directional signals and a frame of ambient HOA components; and a second processor for re-composing a current decompressed frame of the HOA representation from the frame of HOA directional signals and the frame of ambient HOA components.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0040] Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

[0041] FIG. 1 illustrates a block diagram for the HOA compression;

[0042] FIG. 2 illustrates the estimation of dominant sound source directions;

[0043] FIG. 3 illustrates a block diagram for the HOA decompression;

[0044] FIG. 4 illustrates a spherical coordinate system;

[0045] FIG. 5 illustrates the normalised dispersion function $v_N(\Theta)$ for different Ambisonics orders $N$ and for angles $\Theta \in [0, \pi]$.

DESCRIPTION OF EMBODIMENTS

A. Improved HOA Compression

[0046] The compression processing according to the invention, which is based on EP 12306569.0, is illustrated in FIG. 1, where the signal processing blocks that have been modified or newly introduced compared to EP 12306569.0 are presented with a bold box, and where ‘$\mathcal{G}$’ (direction estimates as such) and ‘C’ in this application correspond to ‘A’ (matrix of direction estimates) and ‘D’ in EP 12306569.0, respectively.

For the HOA compression a frame-wise processing with non-overlapping input frames C(k) of HOA coefficient sequences of length L is used, where k denotes the frame index. The frames are defined with respect to the HOA coefficient sequences specified in equation (45) as

[00001] $\begin{matrix} C(k) := [\begin{matrix} c((kL+1)T_{S}) & c((kL+2)T_{S}) & \cdots & c((k+1)LT_{S}) \end{matrix}], & (1) \end{matrix}$

where $T_S$ indicates the sampling period.
The first step or stage 11/12 in FIG. 1 is optional and consists of concatenating the non-overlapping k-th and the (k−1)-th frames of HOA coefficient sequences into a long frame {tilde over (C)}(k) as

[00002] $\begin{matrix} \tilde{C} (k) := [\begin{matrix} C (k - 1) & C (k) \end{matrix}], & (2) \end{matrix}$

which long frame is 50% overlapped with an adjacent long frame and which long frame is successively used for the estimation of dominant sound source directions. Similar to the notation for {tilde over (C)}(k), the tilde symbol is used in the following description for indicating that the respective quantity refers to long overlapping frames. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
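The framing of equations (1) and (2) can be sketched as follows. This is only a minimal illustration, assuming the HOA coefficient sequences are available as a list of $O$ rows of samples. The function names `split_frames` and `long_frame` are hypothetical, zero-based sample indexing is used, and the all-zero predecessor of the first frame is an assumption not stated in the description.

```python
def split_frames(c, L):
    """Split O coefficient sequences (rows of c) into non-overlapping frames C(k) of length L."""
    num_frames = len(c[0]) // L
    return [[row[k * L:(k + 1) * L] for row in c] for k in range(num_frames)]


def long_frame(frames, k):
    """Concatenate C(k-1) and C(k) into the long frame of equation (2)."""
    if k == 0:
        # Assumption: the frame preceding the first one is taken as all zeros.
        prev = [[0.0] * len(row) for row in frames[0]]
    else:
        prev = frames[k - 1]
    return [p + cur for p, cur in zip(prev, frames[k])]


# Two coefficient sequences (O = 2), eight samples, frame length L = 4.
c = [[1, 2, 3, 4, 5, 6, 7, 8],
     [10, 20, 30, 40, 50, 60, 70, 80]]
frames = split_frames(c, 4)
print(long_frame(frames, 1)[0])  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because each long frame shares one short frame with its neighbour, consecutive long frames overlap by 50%, as stated above.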

[0047] In principle, the estimation step or stage 13 of dominant sound sources is carried out as proposed in EP 13305156.5, but with an important modification. The modification relates to the determination of the number of directions to be detected, i.e. how many directional signals are supposed to be extracted from the HOA representation. The aim is to extract directional signals only if doing so is perceptually more relevant than using additional HOA coefficient sequences for a better approximation of the ambient HOA component. A detailed description of this technique is given in section A.2.

[0048] The estimation provides a data set $\mathcal{I}_{DIR,ACT}(k) \subseteq \{1, \ldots, D\}$ of indices of directional signals that have been detected, as well as the set $\mathcal{G}_{\Omega,ACT}(k)$ of corresponding direction estimates. $D$ denotes the maximum number of directional signals, which has to be set before starting the HOA compression.

[0049] In step or stage 14, the current (long) frame $\tilde{C}(k)$ of HOA coefficient sequences is decomposed (as proposed in EP 13305156.5) into a number of directional signals $X_{DIR}(k-2)$ belonging to the directions contained in the set $\mathcal{G}_{\Omega,ACT}(k)$, and a residual ambient HOA component $C_{AMB}(k-2)$. The delay of two frames is introduced as a result of overlap-add processing in order to obtain smooth signals. It is assumed that $X_{DIR}(k-2)$ contains a total of $D$ channels, of which, however, only those corresponding to the active directional signals are non-zero. The indices specifying these channels are assumed to be output in the data set $\mathcal{I}_{DIR,ACT}(k-2)$. Additionally, the decomposition in step/stage 14 provides some parameters $\zeta(k-2)$ which are used at the decompression side for predicting portions of the original HOA representation from the directional signals (see EP 13305156.5 for more details).

[0050] In step or stage 15, the number of coefficients of the ambient HOA component $C_{AMB}(k-2)$ is intelligently reduced to contain only $O_{RED} + D - N_{DIR,ACT}(k-2)$ non-zero HOA coefficient sequences, where $N_{DIR,ACT}(k-2) = |\mathcal{I}_{DIR,ACT}(k-2)|$ indicates the cardinality of the data set $\mathcal{I}_{DIR,ACT}(k-2)$, i.e. the number of active directional signals in frame $k-2$. Since the ambient HOA component is assumed to be always represented by a minimum number $O_{RED}$ of HOA coefficient sequences, this problem reduces to the selection of the remaining $D - N_{DIR,ACT}(k-2)$ HOA coefficient sequences out of the possible $O - O_{RED}$ ones. In order to obtain a smooth reduced ambient HOA representation, this choice is made such that as few changes as possible occur compared to the choice taken at the previous frame $k-3$.

[0051] In particular, the following three cases are to be differentiated:

[0052] a) $N_{DIR,ACT}(k-2) = N_{DIR,ACT}(k-3)$: In this case the same HOA coefficient sequences are assumed to be selected as in frame $k-3$.

[0053] b) $N_{DIR,ACT}(k-2) < N_{DIR,ACT}(k-3)$: In this case, more HOA coefficient sequences than in the last frame $k-3$ can be used for representing the ambient HOA component in the current frame. Those HOA coefficient sequences that were selected in frame $k-3$ are assumed to be also selected in the current frame. The additional HOA coefficient sequences can be selected according to different criteria, for instance by selecting those HOA coefficient sequences in $C_{AMB}(k-2)$ with the highest average power, or by selecting the HOA coefficient sequences with respect to their perceptual significance.

[0054] c) $N_{DIR,ACT}(k-2) > N_{DIR,ACT}(k-3)$: In this case, fewer HOA coefficient sequences than in the last frame $k-3$ can be used for representing the ambient HOA component in the current frame. The question to be answered here is which of the previously selected HOA coefficient sequences have to be deactivated. A reasonable solution is to deactivate those sequences which were assigned to the channels $i \in \mathcal{I}_{DIR,ACT}(k-2)$ at the signal assigning step or stage 16 at frame $k-3$.

[0055] For avoiding discontinuities at frame borders when additional HOA coefficient sequences are activated or deactivated, it is advantageous to smoothly fade in or out the respective signals.

[0056] The final ambient HOA representation with the reduced number of $O_{RED} + D - N_{DIR,ACT}(k-2)$ non-zero coefficient sequences is denoted by $C_{AMB,RED}(k-2)$. The indices of the chosen ambient HOA coefficient sequences are output in the data set $\mathcal{I}_{AMB,ACT}(k-2)$.
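The three cases of step/stage 15 can be sketched as a single selection routine. This is a hedged illustration only: the function name `select_additional_ambient`, the dictionary-based data layout and the use of average power as the selection criterion (one of the criteria named above) are assumptions, and the fade-in/fade-out of activated or deactivated sequences is omitted.

```python
def select_additional_ambient(prev_assignment, dir_act, D, avg_power, o_red):
    """Choose the additional ambient HOA coefficient sequences for the current frame.

    prev_assignment: dict channel (1..D) -> coefficient index chosen in the previous frame.
    dir_act:         set of channels occupied by active directional signals now.
    avg_power:       dict coefficient index (1..O) -> average power in C_AMB.
    o_red:           number of coefficient sequences that are always transmitted.
    Returns (kept indices, newly activated indices), both sorted ascending.
    """
    # Case c): deactivate sequences whose channel is now needed by a directional signal.
    kept = {ch: idx for ch, idx in prev_assignment.items() if ch not in dir_act}
    # Cases a) and b): everything else that was selected before stays selected.
    n_slots = D - len(dir_act)
    n_new = max(n_slots - len(kept), 0)
    # Case b): fill the remaining slots with the most powerful unselected sequences.
    candidates = [i for i in avg_power if i > o_red and i not in kept.values()]
    candidates.sort(key=lambda i: avg_power[i], reverse=True)
    return sorted(kept.values()), sorted(candidates[:n_new])


# O = 9 coefficient sequences, O_RED = 4 always transmitted, D = 2 shared channels.
power = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.1, 7: 0.9, 8: 0.2, 9: 0.3}
# A directional signal on channel 1 became inactive: one more ambient sequence fits.
print(select_additional_ambient({2: 5}, set(), 2, power, 4))    # ([5], [7])
# A directional signal became active on channel 1: the sequence held there is dropped.
print(select_additional_ambient({1: 7, 2: 5}, {1}, 2, power, 4))  # ([5], [])
```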

[0057] In step/stage 16, the active directional signals contained in $X_{DIR}(k-2)$ and the HOA coefficient sequences contained in $C_{AMB,RED}(k-2)$ are assigned to the frame $Y(k-2)$ of $I$ channels for individual perceptual encoding. To describe the signal assignment in more detail, the frames $X_{DIR}(k-2)$, $Y(k-2)$ and $C_{AMB,RED}(k-2)$ are assumed to consist of the individual signals $x_{DIR,d}(k-2)$, $d \in \{1, \ldots, D\}$, $y_i(k-2)$, $i \in \{1, \ldots, I\}$, and $c_{AMB,RED,o}(k-2)$, $o \in \{1, \ldots, O\}$, as follows:

[00003] $\begin{matrix} X_{DIR}(k-2) = [\begin{matrix} x_{DIR,1}(k-2) \\ x_{DIR,2}(k-2) \\ \vdots \\ x_{DIR,D}(k-2) \end{matrix}], \quad C_{AMB,RED}(k-2) = [\begin{matrix} c_{AMB,RED,1}(k-2) \\ c_{AMB,RED,2}(k-2) \\ \vdots \\ c_{AMB,RED,O}(k-2) \end{matrix}], \quad Y(k-2) = [\begin{matrix} y_{1}(k-2) \\ y_{2}(k-2) \\ \vdots \\ y_{I}(k-2) \end{matrix}]. & (3) \end{matrix}$

[0058] The active directional signals are assigned such that they keep their channel indices in order to obtain continuous signals for the successive perceptual coding. This can be expressed by

[00004] $\begin{matrix} y_{d}(k-2) = x_{DIR,d}(k-2) \quad \text{for all } d \in \mathcal{I}_{DIR,ACT}(k-2). & (4) \end{matrix}$

[0059] The HOA coefficient sequences of the ambient component are assigned such that the minimum number of $O_{RED}$ coefficient sequences is always contained in the last $O_{RED}$ signals of $Y(k-2)$, i.e.

[00005] $\begin{matrix} y_{D+o}(k-2) = c_{AMB,RED,o}(k-2) \quad \text{for } 1 \leq o \leq O_{RED}. & (5) \end{matrix}$

[0060] For the additional $D - N_{DIR,ACT}(k-2)$ HOA coefficient sequences of the ambient component, it is to be differentiated whether or not they were also selected in the previous frame:

[0061] a) If they were also selected to be transmitted in the previous frame, i.e. if the respective indices are also contained in data set $\mathcal{I}_{AMB,ACT}(k-3)$, the assignment of these coefficient sequences to the signals in $Y(k-2)$ is the same as for the previous frame. This operation assures smooth signals $y_i(k-2)$, which is favourable for the successive perceptual coding in step or stage 17.

[0062] b) Otherwise, if some coefficient sequences are newly selected, i.e. if their indices are contained in data set $\mathcal{I}_{AMB,ACT}(k-2)$ but not in data set $\mathcal{I}_{AMB,ACT}(k-3)$, they are first arranged with respect to their indices in ascending order and are assigned in this order to channels $i \notin \mathcal{I}_{DIR,ACT}(k-2)$ of $Y(k-2)$ which are not yet occupied by directional signals.

[0063] This specific assignment offers the advantage that, during an HOA decompression process, the signal re-distribution and composition can be performed without knowledge about which ambient HOA coefficient sequence is contained in which channel of $Y(k-2)$. Instead, the assignment can be reconstructed during HOA decompression with the mere knowledge of the data sets $\mathcal{I}_{AMB,ACT}(k-2)$ and $\mathcal{I}_{DIR,ACT}(k)$.
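The assignment rules of step/stage 16 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_channels` and the dictionary layout are hypothetical, the frame is taken to have $I = D + O_{RED}$ channels (as equation (5) suggests), free channels are filled in ascending channel order, and signals are represented by arbitrary objects (strings here).

```python
def assign_channels(x_dir, c_amb_red, dir_act, amb_act, prev_assignment, D, o_red):
    """Build frame Y as a dict channel -> signal, plus the channel -> index map.

    x_dir, c_amb_red: dicts mapping 1-based index -> signal (any object).
    dir_act:          set of active directional signal indices.
    amb_act:          set of indices of the chosen ambient coefficient sequences.
    prev_assignment:  dict channel -> ambient coefficient index from the previous frame.
    """
    y = {}
    # Equation (4): active directional signals keep their channel index.
    for d in dir_act:
        y[d] = x_dir[d]
    # Equation (5): the O_RED mandatory ambient sequences fill the last channels.
    for o in range(1, o_red + 1):
        y[D + o] = c_amb_red[o]
    additional = {i for i in amb_act if i > o_red}
    # Rule a): sequences already transmitted keep their previous channel.
    assignment = {ch: i for ch, i in prev_assignment.items()
                  if i in additional and ch not in dir_act}
    # Rule b): new sequences go, in ascending index order, to the free channels.
    new = sorted(additional - set(assignment.values()))
    free = [ch for ch in range(1, D + 1)
            if ch not in dir_act and ch not in assignment]
    for ch, i in zip(free, new):
        assignment[ch] = i
    for ch, i in assignment.items():
        y[ch] = c_amb_red[i]
    return y, assignment


x_dir = {1: "x1", 2: "x2", 3: "x3"}
c_amb = {1: "c1", 2: "c2", 3: "c3", 4: "c4"}
# D = 3, O_RED = 1; direction 2 is active; sequence 4 was on channel 3 before.
y, assignment = assign_channels(x_dir, c_amb, {2}, {1, 3, 4}, {3: 4}, 3, 1)
print(y == {1: "c3", 2: "x2", 3: "c4", 4: "c1"})  # True
```

Sequence 4 stays on channel 3, which keeps that channel continuous across frames, while the newly selected sequence 3 takes the only remaining free channel.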

[0064] Advantageously, this assigning operation also provides the assignment vector $\gamma(k) \in \mathbb{N}^{D - N_{DIR,ACT}(k-2)}$, whose elements $\gamma_o(k)$, $o = 1, \ldots, D - N_{DIR,ACT}(k-2)$, denote the indices of each one of the additional $D - N_{DIR,ACT}(k-2)$ HOA coefficient sequences of the ambient component. In other words, the elements of the assignment vector $\gamma(k)$ indicate which of the additional $O - O_{RED}$ HOA coefficient sequences of the ambient HOA component are assigned to the $D - N_{DIR,ACT}(k-2)$ channels with inactive directional signals. This vector can be transmitted additionally, but less frequently than at the frame rate, in order to allow for an initialisation of the re-distribution procedure performed for the HOA decompression (see section B). Perceptual coding step/stage 17 encodes the $I$ channels of frame $Y(k-2)$ and outputs an encoded frame $\breve{Y}(k-2)$.

[0065] For frames for which vector $\gamma(k)$ is not transmitted from step/stage 16, the data parameter sets $\mathcal{I}_{DIR,ACT}(k)$ and $\mathcal{I}_{AMB,ACT}(k-2)$ are used at the decompression side instead of vector $\gamma(k)$ for performing the re-distribution.

A.1 Estimation of the Dominant Sound Source Directions

[0066] The estimation step/stage 13 for dominant sound source directions of FIG. 1 is depicted in FIG. 2 in more detail. It is essentially performed according to that of EP 13305156.5, but with a decisive difference, which is the way of determining the number of dominant sound sources, corresponding to the number of directional signals to be extracted from the given HOA representation. This number is significant because it controls whether the given HOA representation is better represented by using more directional signals or by using more HOA coefficient sequences to better model the ambient HOA component.

[0067] The dominant sound source directions estimation starts in step or stage 21 with a preliminary search for the dominant sound source directions, using the long frame {tilde over (C)}(k) of input HOA coefficient sequences. Along with the preliminary direction estimates {tilde over (Ω)}.sub.DOM.sup.(d)(k), 1≤d≤D, the corresponding directional signals {tilde over (x)}.sub.DOM.sup.(d)(k) and the HOA sound field components {tilde over (C)}.sub.DOM,CORR.sup.(d)(k), which are supposed to be created by the individual sound sources, are computed as described in EP 13305156.5. In step or stage 22, these quantities are used together with the frame {tilde over (C)}(k) of input HOA coefficient sequences for determining the number {tilde over (D)}(k) of directional signals to be extracted. Consequently, the direction estimates {tilde over (Ω)}.sub.DOM.sup.(d)(k), {tilde over (D)}(k)<d≤D, the corresponding directional signals {tilde over (x)}.sub.DOM.sup.(d)(k), and HOA sound field components {tilde over (C)}.sub.DOM,CORR.sup.(d)(k) are discarded. Instead, only the direction estimates {tilde over (Ω)}.sub.DOM.sup.(d)(k), 1≤d≤{tilde over (D)}(k) are then assigned to previously found sound sources.

[0068] In step or stage 23, the resulting direction trajectories are smoothed according to a sound source movement model, and it is determined which of the sound sources are supposed to be active (see EP 13305156.5). The last operation provides the set $\mathcal{I}_{DIR,ACT}(k)$ of indices of active directional sound sources and the set $\mathcal{G}_{\Omega,ACT}(k)$ of the corresponding direction estimates.

A.2 Determination of Number of Extracted Directional Signals

[0069] For determining the number of directional signals in step/stage 22, it is assumed that a given total number of $I$ channels is available for capturing the perceptually most relevant sound field information. Therefore, the number of directional signals to be extracted is determined by asking whether, for the overall HOA compression/decompression quality, the current HOA representation is better represented by using more directional signals or by using more HOA coefficient sequences for a better modelling of the ambient HOA component.

[0070] To derive in step/stage 22 a criterion for the determination of the number of directional sound sources to be extracted, which criterion is related to human perception, it is taken into consideration that HOA compression is achieved in particular by the following two operations:

[0071] reduction of HOA coefficient sequences for representing the ambient HOA component (which means reduction of the number of related channels);

[0072] perceptual encoding of the directional signals and of the HOA coefficient sequences for representing the ambient HOA component.

[0073] Depending on the number M, 0≤M≤D, of extracted directional signals, the first operation results in the approximation

[00006] $\begin{matrix} \tilde{C}(k) \approx \tilde{C}^{(M)}(k) & (6) \\ := \tilde{C}_{DIR}^{(M)}(k) + \tilde{C}_{AMB,RED}^{(M)}(k) & (7) \\ \text{where} \quad \tilde{C}_{DIR}^{(M)}(k) := \sum_{d=1}^{M} \tilde{C}_{DOM,CORR}^{(d)}(k), & (8) \end{matrix}$

denotes the HOA representation of the directional component consisting of the HOA sound field components {tilde over (C)}.sub.DOM,CORR.sup.(d)(k), 1≤d≤M, supposed to be created by the M individually considered sound sources, and {tilde over (C)}.sub.AMB,RED.sup.(M)(k) denotes the HOA representation of the ambient component with only I−M non-zero HOA coefficient sequences.

[0074] The approximation from the second operation can be expressed by

[00007] $\begin{matrix} \tilde{C} (k) \approx {\hat{\tilde{C}}}^{(M)} (k) & (9) \\ := {\hat{\tilde{C}}}_{D I R}^{(M)} (k) + {\hat{\tilde{C}}}_{AMB, RED}^{(M)} (k) & (10) \end{matrix}$

where {tilde over (Ĉ)}.sub.DIR.sup.(M)(k) and {tilde over (Ĉ)}.sub.AMB,RED.sup.(M)(k) denote the composed directional and ambient HOA components after perceptual decoding, respectively.

Formulation of Criterion

[0075] The number {tilde over (D)}(k) of directional signals to be extracted is chosen such that the total approximation error

[00008] $\begin{matrix} {\hat{\tilde{E}}}^{(M)} (k) := \tilde{C} (k) - {\hat{\tilde{C}}}^{(M)} (k) & (11) \end{matrix}$

with $M = \tilde{D}(k)$ is as insignificant as possible with respect to human perception. To assure this, the directional power distribution of the total error for individual Bark scale critical bands is considered at a predefined number $Q$ of test directions $\Omega_q$, $q = 1, \ldots, Q$, which are nearly uniformly distributed on the unit sphere. To be more specific, the directional power distribution for the $b$-th critical band, $b = 1, \ldots, B$, is represented by the vector

[00009] $\begin{matrix} \hat{\tilde{\mathcal{P}}}^{(M)}(k,b) := {[\begin{matrix} \hat{\tilde{\mathcal{P}}}_{1}^{(M)}(k,b) & \hat{\tilde{\mathcal{P}}}_{2}^{(M)}(k,b) & \cdots & \hat{\tilde{\mathcal{P}}}_{Q}^{(M)}(k,b) \end{matrix}]}^{T}, & (12) \end{matrix}$

whose components $\hat{\tilde{\mathcal{P}}}_q^{(M)}(k,b)$ denote the power of the total error $\hat{\tilde{E}}^{(M)}(k)$ related to the direction $\Omega_q$, the $b$-th Bark scale critical band and the $k$-th frame. The directional power distribution $\hat{\tilde{\mathcal{P}}}^{(M)}(k,b)$ of the total error $\hat{\tilde{E}}^{(M)}(k)$ is compared with the directional perceptual masking power distribution

[00010] $\begin{matrix} \tilde{\mathcal{P}}_{MASK}(k,b) := {[\begin{matrix} \tilde{\mathcal{P}}_{MASK,1}(k,b) & \tilde{\mathcal{P}}_{MASK,2}(k,b) & \cdots & \tilde{\mathcal{P}}_{MASK,Q}(k,b) \end{matrix}]}^{T} & (13) \end{matrix}$

due to the original HOA representation $\tilde{C}(k)$. Next, for each test direction $\Omega_q$ and critical band $b$, the level of perception $\tilde{\mathcal{L}}_q^{(M)}(k,b)$ of the total error is computed. It is here essentially defined as the ratio of the directional power of the total error $\hat{\tilde{E}}^{(M)}(k)$ and the directional masking power according to

[00011] $\begin{matrix} \tilde{\mathcal{L}}_{q}^{(M)}(k,b) := \max\left(0, \frac{\hat{\tilde{\mathcal{P}}}_{q}^{(M)}(k,b)}{\tilde{\mathcal{P}}_{MASK,q}(k,b)} - 1\right). & (14) \end{matrix}$

The subtraction of ‘1’ and the successive maximum operation is performed to ensure that the perception level is zero, as long as the error power is below the masking threshold.
Finally, the number $\tilde{D}(k)$ of directional signals to be extracted can be chosen to minimise the average over all test directions of the maximum of the error perception level over all critical bands, i.e.

[00012] $\begin{matrix} \tilde{D}(k) = \underset{M}{\operatorname{argmin}} \frac{1}{Q} \sum_{q=1}^{Q} \max_{b} \tilde{\mathcal{L}}_{q}^{(M)}(k,b). & (15) \end{matrix}$

It is noted that, alternatively, it is possible to replace the maximum by an averaging operation in equation (15).
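Equations (14) and (15) can be illustrated with a short sketch. It assumes that the directional error powers and the masking powers have already been computed for every candidate $M$, test direction and critical band, and that all masking powers are strictly positive. The function names and the nested-list layout are hypothetical.

```python
def perception_level(err_power, mask_power):
    """Equation (14): zero as long as the error power stays below the masking power."""
    return max(0.0, err_power / mask_power - 1.0)


def choose_num_directional(err_power, mask_power):
    """Equation (15): pick M minimising the direction-average of the band-maximum.

    err_power[M][q][b]: directional error power for candidate M, direction q, band b.
    mask_power[q][b]:   directional masking power of the original representation.
    """
    def cost(M):
        per_direction = [
            max(perception_level(e, m) for e, m in zip(err_q, mask_q))
            for err_q, mask_q in zip(err_power[M], mask_power)
        ]
        return sum(per_direction) / len(per_direction)

    return min(range(len(err_power)), key=cost)


# Q = 2 test directions, B = 2 critical bands, candidates M = 0, 1, 2.
mask = [[1.0, 2.0], [1.0, 1.0]]
err = [
    [[3.0, 1.0], [0.5, 0.5]],   # M = 0: cost (2.0 + 0.0) / 2 = 1.0
    [[1.5, 2.0], [1.5, 0.5]],   # M = 1: cost (0.5 + 0.5) / 2 = 0.5
    [[0.5, 5.0], [2.0, 1.0]],   # M = 2: cost (1.5 + 1.0) / 2 = 1.25
]
print(choose_num_directional(err, mask))  # 1
```

Replacing the inner `max` by a mean over the bands gives the averaging alternative mentioned above.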

Computation of the Directional Perceptual Masking Power Distribution

[0076] For the computation of the directional perceptual masking power distribution $\tilde{\mathcal{P}}_{MASK}(k,b)$ due to the original HOA representation $\tilde{C}(k)$, the latter is transformed to the spatial domain in order to be represented by general plane waves $\tilde{v}_q(k)$ impinging from the test directions $\Omega_q$, $q = 1, \ldots, Q$. When arranging the general plane wave signals $\tilde{v}_q(k)$ in the matrix $\tilde{V}(k)$ as

[00013] $\begin{matrix} \tilde{V}(k) = [\begin{matrix} \tilde{v}_{1}(k) \\ \tilde{v}_{2}(k) \\ \vdots \\ \tilde{v}_{Q}(k) \end{matrix}], & (16) \end{matrix}$

the transformation to the spatial domain is expressed by the operation

$\begin{matrix} \tilde{V}(k) = \Xi^{T} \tilde{C}(k), & (17) \end{matrix}$

where $\Xi$ denotes the mode matrix with respect to the test directions $\Omega_q$, $q = 1, \ldots, Q$, defined by

[00014] $\begin{matrix} \Xi := [\begin{matrix} S_{1} & S_{2} & \cdots & S_{Q} \end{matrix}] \in \mathbb{R}^{O \times Q} \quad \text{with} & (18) \\ S_{q} := {[\begin{matrix} S_{0}^{0}(\Omega_{q}) & S_{1}^{-1}(\Omega_{q}) & S_{1}^{0}(\Omega_{q}) & S_{1}^{1}(\Omega_{q}) & S_{2}^{-2}(\Omega_{q}) & \cdots & S_{N}^{N}(\Omega_{q}) \end{matrix}]}^{T} \in \mathbb{R}^{O}. & (19) \end{matrix}$

The elements $\tilde{\mathcal{P}}_{MASK,q}(k,b)$ of the directional perceptual masking power distribution $\tilde{\mathcal{P}}_{MASK}(k,b)$, due to the original HOA representation $\tilde{C}(k)$, correspond to the masking powers of the general plane wave functions $\tilde{v}_q(k)$ for individual critical bands $b$.
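The spatial-domain transformation of equation (17) is a plain matrix product, sketched below. The function names are hypothetical, and the small mode matrix in the example contains made-up placeholder numbers rather than actual spherical harmonics values. The sketch also computes only a broadband mean power per direction; the splitting into Bark scale critical bands and the psychoacoustic masking model are deliberately left out.

```python
def spatial_transform(mode_matrix, hoa_frame):
    """Equation (17): V = Xi^T C, with Xi of size O x Q and C of size O x L."""
    O, Q = len(mode_matrix), len(mode_matrix[0])
    L = len(hoa_frame[0])
    return [[sum(mode_matrix[o][q] * hoa_frame[o][t] for o in range(O))
             for t in range(L)]
            for q in range(Q)]


def broadband_power(signals):
    """Mean power of each plane-wave signal over the frame (no band splitting)."""
    return [sum(s * s for s in row) / len(row) for row in signals]


# Placeholder mode matrix (O = 2, Q = 3) and a frame of two samples per sequence.
xi = [[1, 1, 1],
      [1, 0, -1]]
frame = [[1, 2],
         [3, 4]]
v = spatial_transform(xi, frame)
print(v)                   # [[4, 6], [1, 2], [-2, -2]]
print(broadband_power(v))  # [26.0, 2.5, 4.0]
```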

Computation of Directional Power Distribution

[0077] In the following, two alternatives for the computation of the directional power distribution $\hat{\tilde{\mathcal{P}}}^{(M)}(k,b)$ are presented:

[0078] a. One possibility is to actually compute the approximation $\hat{\tilde{C}}^{(M)}(k)$ of the desired HOA representation $\tilde{C}(k)$ by performing the two operations mentioned at the beginning of section A.2. Then the total approximation error $\hat{\tilde{E}}^{(M)}(k)$ is computed according to equation (11). Next, the total approximation error $\hat{\tilde{E}}^{(M)}(k)$ is transformed to the spatial domain in order to be represented by general plane waves $\hat{\tilde{w}}_q^{(M)}(k)$ impinging from the test directions $\Omega_q$, $q = 1, \ldots, Q$. Arranging the general plane wave signals in the matrix $\hat{\tilde{W}}^{(M)}(k)$ as

[00015] $\begin{matrix} \hat{\tilde{W}}^{(M)}(k) = [\begin{matrix} \hat{\tilde{w}}_{1}^{(M)}(k) \\ \hat{\tilde{w}}_{2}^{(M)}(k) \\ \vdots \\ \hat{\tilde{w}}_{Q}^{(M)}(k) \end{matrix}], & (20) \end{matrix}$

the transformation to the spatial domain is expressed by the operation

[00016] $\begin{matrix} \hat{\tilde{W}}^{(M)}(k) = \Xi^{T} \hat{\tilde{E}}^{(M)}(k). & (21) \end{matrix}$

The elements $\hat{\tilde{\mathcal{P}}}_q^{(M)}(k,b)$ of the directional power distribution $\hat{\tilde{\mathcal{P}}}^{(M)}(k,b)$ of the total approximation error $\hat{\tilde{E}}^{(M)}(k)$ are obtained by computing the powers of the general plane wave functions $\hat{\tilde{w}}_q^{(M)}(k)$, $q = 1, \ldots, Q$, within individual critical bands $b$.

b. The alternative solution is to compute only the approximation $\tilde{C}^{(M)}(k)$ instead of $\hat{\tilde{C}}^{(M)}(k)$. This method offers the advantage that the complicated perceptual coding of the individual signals need not be carried out directly. Instead, it is sufficient to know the powers of the perceptual quantisation error within individual Bark scale critical bands. For this purpose, the total approximation error defined in equation (11) can be written as a sum of the three following approximation errors:

[00017] $\begin{matrix} {\tilde{E}}^{(M)} (k) := \tilde{C} (k) - {\tilde{C}}^{(M)} (k) & (22) \\ {\hat{\tilde{E}}}_{D I R}^{(M)} (k) := {\tilde{C}}_{D I R}^{(M)} (k) - {\hat{\tilde{C}}}_{D I R}^{(M)} (k) & (23) \\ {\hat{\tilde{E}}}_{AMB, RED}^{(M)} (k) := {\tilde{C}}_{AMB, RED}^{(M)} (k) - {\hat{\tilde{C}}}_{AMB, RED}^{(M)} (k), & (24) \end{matrix}$

which can be assumed to be independent of each other. Due to this independence, the directional power distribution of the total error {tilde over (Ê)}.sup.(M)(k) can be expressed as the sum of the directional power distributions of the three individual errors {tilde over (E)}.sup.(M)(k), {tilde over (Ê)}.sub.DIR.sup.(M)(k) and {tilde over (Ê)}.sub.AMB,RED.sup.(M)(k).

[0079] The following describes how to compute the directional power distributions of the three errors for individual Bark scale critical bands:

[0080] a. To compute the directional power distribution of the error $\tilde{E}^{(M)}(k)$, it is first transformed to the spatial domain by

[00018] $\begin{matrix} {\tilde{W}}^{(M)} (k) = Ξ^{T} {\tilde{E}}^{(M)} (k), & (25) \end{matrix}$

wherein the approximation error {tilde over (E)}.sup.(M)(k) is hence represented by general plane waves {tilde over (w)}.sub.q.sup.(M)(k) impinging from the test directions Ω.sub.q, q=1, . . . ,Q, which are arranged in the matrix {tilde over (W)}.sup.(M)(k) according to

[00019] $\begin{matrix} \tilde{W}^{(M)}(k) = [\begin{matrix} \tilde{w}_{1}^{(M)}(k) \\ \tilde{w}_{2}^{(M)}(k) \\ \vdots \\ \tilde{w}_{Q}^{(M)}(k) \end{matrix}]. & (26) \end{matrix}$

Consequently, the elements $\tilde{\mathcal{P}}_q^{(M)}(k,b)$ of the directional power distribution $\tilde{\mathcal{P}}^{(M)}(k,b)$ of the approximation error $\tilde{E}^{(M)}(k)$ are obtained by computing the powers of the general plane wave functions $\tilde{w}_q^{(M)}(k)$, $q = 1, \ldots, Q$, within individual critical bands $b$.

[0081] b. For computing the directional power distribution $\hat{\tilde{\mathcal{P}}}_{DIR}^{(M)}(k,b)$ of the error $\hat{\tilde{E}}_{DIR}^{(M)}(k)$, it is to be borne in mind that this error is introduced into the directional HOA component $\tilde{C}_{DIR}^{(M)}(k)$ by perceptually coding the directional signals $\tilde{x}_{DOM}^{(d)}(k)$, $1 \leq d \leq M$. Further, it is to be considered that the directional HOA component is given by equation (8). Then, for simplicity, it is assumed that the HOA component $\tilde{C}_{DOM,CORR}^{(d)}(k)$ is equivalently represented in the spatial domain by $O$ general plane wave functions $\tilde{v}_{GRID,o}^{(d)}(k)$, which are created from the directional signal $\tilde{x}_{DOM}^{(d)}(k)$ by a mere scaling, i.e.

[00020] $\begin{matrix} \tilde{v}_{GRID,o}^{(d)}(k) = \alpha_{o}^{(d)}(k) \tilde{x}_{DOM}^{(d)}(k), & (27) \end{matrix}$

where $\alpha_o^{(d)}(k)$, $o = 1, \ldots, O$, denote the scaling parameters. The respective plane wave directions $\tilde{\Omega}_{ROT,o}^{(d)}(k)$, $o = 1, \ldots, O$, are assumed to be uniformly distributed on the unit sphere and rotated such that $\tilde{\Omega}_{ROT,1}^{(d)}(k)$ corresponds to the direction estimate $\tilde{\Omega}_{DOM}^{(d)}(k)$. Hence, the scaling parameter $\alpha_1^{(d)}(k)$ is equal to ‘1’.
When defining $\Xi_{GRID}^{(d)}(k)$ to be the mode matrix with respect to the rotated directions $\tilde{\Omega}_{ROT,o}^{(d)}(k)$, $o = 1, \ldots, O$, and arranging all scaling parameters $\alpha_o^{(d)}(k)$ in a vector according to

[00021] $\begin{matrix} \alpha^{(d)}(k) := {[\begin{matrix} 1 & \alpha_{2}^{(d)}(k) & \alpha_{3}^{(d)}(k) & \cdots & \alpha_{O}^{(d)}(k) \end{matrix}]}^{T} \in \mathbb{R}^{O}, & (28) \end{matrix}$

the HOA component {tilde over (C)}.sub.DOM,CORR.sup.(d)(k) can be written as

[00022] $\begin{matrix} {\tilde{C}}_{DOM, CORR}^{(d)} (k) = Ξ_{GRID}^{(d)} (k) α^{(d)} (k) {\tilde{x}}_{DOM}^{(d)} (k) . & (29) \end{matrix}$

Consequently, the error {tilde over (Ê)}.sub.DIR.sup.(M)(k) (see equation (23)) between the true directional HOA component

[00023] $\begin{matrix} \tilde{C}_{DIR}^{(M)}(k) = \sum_{d=1}^{M} \tilde{C}_{DOM,CORR}^{(d)}(k) & (30) \end{matrix}$

and that composed from the perceptually decoded directional signals {tilde over ({circumflex over (x)})}.sub.DOM.sup.(d)(k), d=1, . . . , M, by

[00024] $\begin{matrix} \hat{\tilde{C}}_{DIR}^{(M)}(k) = \sum_{d=1}^{M} \hat{\tilde{C}}_{DOM,CORR}^{(d)}(k) & (31) \\ := \sum_{d=1}^{M} \Xi_{GRID}^{(d)}(k) \alpha^{(d)}(k) \hat{\tilde{x}}_{DOM}^{(d)}(k) & (32) \end{matrix}$

can be expressed in terms of the perceptual coding errors

[00025] $\begin{matrix} \hat{\tilde{e}}_{DOM}^{(d)}(k) := \tilde{x}_{DOM}^{(d)}(k) - \hat{\tilde{x}}_{DOM}^{(d)}(k) & (33) \end{matrix}$

in the individual directional signals by

[00026] $\begin{matrix} \hat{\tilde{E}}_{DIR}^{(M)}(k) = \sum_{d=1}^{M} \Xi_{GRID}^{(d)}(k) \alpha^{(d)}(k) \hat{\tilde{e}}_{DOM}^{(d)}(k). & (34) \end{matrix}$

The representation of the error {tilde over (Ê)}.sub.DIR.sup.(M)(k) in the spatial domain with respect to the test directions Ω.sub.q, q=1, . . . , Q, is given by

[00027] $\begin{matrix} \hat{\tilde{W}}_{DIR}^{(M)}(k) = \sum_{d=1}^{M} \underbrace{\Xi^{T} \Xi_{GRID}^{(d)}(k) \alpha^{(d)}(k)}_{=: \beta^{(d)}(k)} \hat{\tilde{e}}_{DOM}^{(d)}(k). & (35) \end{matrix}$

Denoting the elements of the vector $\beta^{(d)}(k)$ by $\beta_q^{(d)}(k)$, $q = 1, \ldots, Q$, and assuming the individual perceptual coding errors $\hat{\tilde{e}}_{DOM}^{(d)}(k)$, $d = 1, \ldots, M$, to be independent of each other, it follows from equation (35) that the elements $\hat{\tilde{\mathcal{P}}}_{DIR,q}^{(M)}(k,b)$ of the directional power distribution $\hat{\tilde{\mathcal{P}}}_{DIR}^{(M)}(k,b)$ of the perceptual coding error $\hat{\tilde{E}}_{DIR}^{(M)}(k)$ can be computed by

[00028] $\begin{matrix} \hat{\tilde{\mathcal{P}}}_{DIR,q}^{(M)}(k,b) = \sum_{d=1}^{M} {(\beta_{q}^{(d)}(k))}^{2} \tilde{\sigma}_{DIR,d}^{2}(k,b). & (36) \end{matrix}$

$\tilde{\sigma}_{DIR,d}^{2}(k,b)$ is supposed to represent the power of the perceptual quantisation error within the $b$-th critical band in the directional signal $\hat{\tilde{x}}_{DOM}^{(d)}(k)$. This power can be assumed to correspond to the perceptual masking power of the directional signal $\tilde{x}_{DOM}^{(d)}(k)$.

[0082] c. For computing the directional power distribution $\hat{\tilde{\mathcal{P}}}_{AMB,RED}^{(M)}(k,b)$ of the error $\hat{\tilde{E}}_{AMB,RED}^{(M)}(k)$ resulting from the perceptual coding of the HOA coefficient sequences of the ambient HOA component, each HOA coefficient sequence is assumed to be coded independently. Hence, the errors introduced into the individual HOA coefficient sequences within each Bark scale critical band can be assumed to be uncorrelated. This means that the inter-coefficient correlation matrix of the error $\hat{\tilde{E}}_{AMB,RED}^{(M)}(k)$ with respect to each Bark scale critical band is diagonal, i.e.

[00029] $\begin{matrix} {\tilde{Σ}}_{AMB, RED}^{(M)} (k, b) = \operatorname{diag} ({\tilde{σ}}_{AMB, RED, 1}^{2 (M)} (k, b), {\tilde{σ}}_{AMB, RED, 2}^{2 (M)} (k, b), \ldots, {\tilde{σ}}_{AMB, RED, O}^{2 (M)} (k, b)) . & (37) \end{matrix}$

The elements {tilde over (σ)}.sub.AMB,RED,o.sup.2(M)(k,b), o=1, . . . , O, are supposed to represent the power of the perceptual quantisation error within the b-th critical band in the o-th coded HOA coefficient sequence in {tilde over (Ĉ)}.sub.AMB,RED.sup.(M)(k). They can be assumed to correspond to the perceptual masking power of the o-th HOA coefficient sequence in {tilde over (C)}.sub.AMB,RED.sup.(M)(k). The directional power distribution of the perceptual coding error {tilde over (Ê)}.sub.AMB,RED.sup.(M)(k) is thus computed by

[00030] $\begin{matrix} {\hat{\tilde{P}}}_{AMB, RED}^{(M)} (k, b) = \operatorname{diag} (Ξ^{T} {\tilde{Σ}}_{AMB, RED}^{(M)} (k, b) Ξ) . & (38) \end{matrix}$
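The computations of equations (36) and (38) can be illustrated by the following minimal Python sketch for a single frame k and a single critical band b. It is not part of the described processing. The function names, the nested-list data layout and the numerical values are assumptions made for this example only. It is assumed that beta[d][q] holds the weights β.sub.q.sup.(d)(k) and that xi[o][q] holds the entries of a matrix Ξ with O rows and Q columns. Because the correlation matrix of equation (37) is diagonal, the diagonal of the matrix product in equation (38) reduces to a weighted sum of squared matrix entries.

```python
def directional_error_power(beta, sigma2_dir):
    """Equation (36): error power per test direction q for the directional signals.

    beta[d][q]    -- assumed layout of the weights beta_q^(d)(k)
    sigma2_dir[d] -- quantisation error power of the d-th directional signal
    """
    num_q = len(beta[0])
    return [sum(beta[d][q] ** 2 * sigma2_dir[d] for d in range(len(beta)))
            for q in range(num_q)]


def ambient_error_power(xi, sigma2_amb):
    """Equation (38): diagonal of Xi^T * diag(sigma2_amb) * Xi.

    xi[o][q]      -- assumed layout of the O x Q matrix Xi
    sigma2_amb[o] -- quantisation error power of the o-th coefficient sequence
    """
    num_q = len(xi[0])
    return [sum(xi[o][q] ** 2 * sigma2_amb[o] for o in range(len(xi)))
            for q in range(num_q)]


# Hypothetical values: M = 2 directional signals, O = 2 sequences, Q = 2 directions.
p_dir = directional_error_power([[1.0, 0.5], [0.0, 2.0]], [0.1, 0.2])
p_amb = ambient_error_power([[1.0, 1.0], [0.5, -0.5]], [0.2, 0.4])
```

In this sketch both distributions are obtained as lists with one power value per test direction, so that they can be compared or added direction by direction.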

B. Improved HOA Decompression

[0083] The corresponding HOA decompression processing is depicted in FIG. 3 and includes the following steps or stages.

In step or stage 31, a perceptual decoding of the I signals contained in the perceptually coded frame with index k−2 is performed in order to obtain the I decoded signals in Ŷ(k−2).

[0084] In signal re-distributing step or stage 32, the perceptually decoded signals in Ŷ(k−2) are re-distributed in order to recreate the frame {circumflex over (X)}.sub.DIR(k−2) of directional signals and the frame Ĉ.sub.AMB,RED(k−2) of the ambient HOA component. The information about how to re-distribute the signals is obtained by reproducing the assigning operation performed for the HOA compression, using the index data sets ℐ.sub.DIR,ACT(k) and ℐ.sub.AMB,ACT(k−2). Since this is a recursive procedure (see section A), the additionally transmitted assignment vector γ(k) can be used in order to allow for an initialisation of the re-distribution procedure, e.g. in case the transmission breaks down.

[0085] In composition step or stage 33, a current frame Ĉ(k−3) of the desired total HOA representation is re-composed (according to the processing described in connection with FIG. 2b and FIG. 4 of EP 12306569.0) using the frame {circumflex over (X)}.sub.DIR(k−2) of the directional signals, the set ℐ.sub.DIR,ACT(k) of the active directional signal indices together with the set 𝒢.sub.Ω,ACT(k) of the corresponding directions, the parameters ζ(k−2) for predicting portions of the HOA representation from the directional signals, and the frame Ĉ.sub.AMB,RED(k−2) of HOA coefficient sequences of the reduced ambient HOA component. Ĉ.sub.AMB,RED(k−2) corresponds to component {circumflex over (D)}.sub.A(k−2) in EP 12306569.0, and 𝒢.sub.Ω,ACT(k) and ℐ.sub.DIR,ACT(k) correspond to A.sub.{circumflex over (Ω)}(k) in EP 12306569.0, wherein active directional signal indices are marked in the matrix elements of A.sub.{circumflex over (Ω)}(k). I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals ({circumflex over (X)}.sub.DIR(k−2)) using the received parameters (ζ(k−2)) for such prediction, and thereafter the current decompressed frame (Ĉ(k−3)) is re-composed from the frame of directional signals ({circumflex over (X)}.sub.DIR(k−2)), the predicted portions and the reduced ambient HOA component (Ĉ.sub.AMB,RED(k−2)).

C. Basics of Higher Order Ambisonics

[0086] Higher Order Ambisonics (HOA) is based on the description of a sound field within a compact area of interest, which is assumed to be free of sound sources. In that case the spatiotemporal behaviour of the sound pressure p(t,x) at time t and position x within the area of interest is physically fully determined by the homogeneous wave equation. In the following a spherical coordinate system as shown in FIG. 4 is assumed. In the used coordinate system, the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space x=(r,θ, ϕ).sup.T is represented by a radius r>0 (i.e. the distance to the coordinate origin), an inclination angle θ∈[0, π] measured from the polar axis z and an azimuth angle ϕ∈[0,2π[ measured counter-clockwise in the x−y plane from the x axis. Further, (⋅).sup.T denotes the transposition.

[0087] It can be shown (see E. G. Williams, “Fourier Acoustics”, volume 93 of Applied Mathematical Sciences, Academic Press, 1999) that the Fourier transform of the sound pressure with respect to time denoted by ℱ.sub.t(⋅), i.e.

[00031] $\begin{matrix} P (ω, x) = ℱ_{t} (p (t, x)) = \int_{- \infty}^{\infty} p (t, x) e^{- i ω t} dt, & (39) \end{matrix}$

with ω denoting the angular frequency and i indicating the imaginary unit, can be expanded into a series of Spherical Harmonics according to

[00032] $\begin{matrix} P (ω = k c_{s}, r, θ, ϕ) = \sum_{n = 0}^{N} \sum_{m = - n}^{n} A_{n}^{m} (k) j_{n} (k r) S_{n}^{m} (θ, ϕ) . & (40) \end{matrix}$

[0088] In equation (40), c.sub.s denotes the speed of sound and k denotes the angular wave number, which is related to the angular frequency ω by

[00033] $k = \frac{ω}{c_{s}} .$

Further, j.sub.n(⋅) denote the spherical Bessel functions of the first kind and S.sub.n.sup.m(θ, ϕ) denote the real-valued Spherical Harmonics of order n and degree m, which are defined in section C.1 below. The expansion coefficients A.sub.n.sup.m(k) depend only on the angular wave number k. In the foregoing it has been implicitly assumed that the sound pressure is spatially band-limited. Thus, the series of Spherical Harmonics is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation.

[0089] If the sound field is represented by a superposition of an infinite number of harmonic plane waves of different angular frequencies ω arriving from all possible directions specified by the angle tuple (θ, ϕ), it can be shown (see B. Rafaely, “Plane-wave Decomposition of the Sound Field on a Sphere by Spherical Convolution”, Journal of the Acoustical Society of America, vol. 116(4), pages 2149-2157, 2004) that the respective plane wave complex amplitude function C(ω, θ, ϕ) can be expressed by the following Spherical Harmonics expansion

[00034] $\begin{matrix} C (ω = k c_{s}, θ, ϕ) = \sum_{n = 0}^{N} \sum_{m = - n}^{n} C_{n}^{m} (k) S_{n}^{m} (θ, ϕ), & (41) \end{matrix}$

where the expansion coefficients C.sub.n.sup.m(k) are related to the expansion coefficients A.sub.n.sup.m(k) by

[00035] $\begin{matrix} A_{n}^{m} (k) = 4 π i^{n} C_{n}^{m} (k) . & (42) \end{matrix}$

Assuming the individual coefficients C.sub.n.sup.m(ω=kc.sub.s) to be functions of the angular frequency ω, the application of the inverse Fourier transform (denoted by ℱ.sub.t.sup.−1(⋅)) provides time domain functions

[00036] $\begin{matrix} c_{n}^{m} (t) = ℱ_{t}^{- 1} (C_{n}^{m} (ω / c_{s})) = \frac{1}{2 π} \int_{- \infty}^{\infty} C_{n}^{m} (\frac{ω}{c_{s}}) e^{i ω t} d ω & (43) \end{matrix}$

for each order n and degree m, which can be collected in a single vector c(t) by

[00037] $\begin{matrix} c (t) = {[\begin{matrix} c_{0}^{0} (t) & c_{1}^{- 1} (t) & c_{1}^{0} (t) & c_{1}^{1} (t) & c_{2}^{- 2} (t) & c_{2}^{- 1} (t) & c_{2}^{0} (t) & c_{2}^{1} (t) & c_{2}^{2} (t) & \ldots & c_{N}^{N - 1} (t) & c_{N}^{N} (t) \end{matrix}]}^{T} . & (44) \end{matrix}$

[0090] The position index of a time domain function c.sub.n.sup.m(t) within the vector c(t) is given by n(n+1)+1+m. The overall number of elements in vector c(t) is given by O=(N+1).sup.2.
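The index relation of the preceding paragraph can be illustrated by the following minimal Python sketch. The function names are assumptions made for this example. The inverse mapping from a position to the pair (n, m) is not stated above. It follows from the observation that the coefficients of order n occupy the positions n.sup.2+1 to (n+1).sup.2.

```python
import math


def hoa_num_coeffs(order):
    """Overall number of elements O = (N + 1)^2 in the vector c(t)."""
    return (order + 1) ** 2


def hoa_position(n, m):
    """1-based position n(n + 1) + 1 + m of c_n^m(t) within c(t)."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and |m| <= n")
    return n * (n + 1) + 1 + m


def hoa_order_degree(position):
    """Inverse mapping (derived for this sketch): 1-based position -> (n, m)."""
    if position < 1:
        raise ValueError("position is 1-based")
    n = math.isqrt(position - 1)
    m = position - 1 - n * (n + 1)
    return n, m
```

For example, hoa_position(2, -2) yields 5, which agrees with the fifth entry of the vector in equation (44), and hoa_num_coeffs(4) yields 25 coefficient sequences for an HOA representation of order N=4.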

[0091] The final Ambisonics format provides the sampled version of c(t) using a sampling frequency f.sub.s as

[00038] $\begin{matrix} \{c (l T_{s})\}_{l \in ℕ} = \{c (T_{s}), c (2 T_{s}), c (3 T_{s}), c (4 T_{s}), \ldots\} & (45) \end{matrix}$

where T.sub.s=1/f.sub.s denotes the sampling period. The elements of c(lT.sub.s) are here referred to as Ambisonics coefficients. The time domain signals c.sub.n.sup.m(t) and hence the Ambisonics coefficients are real-valued.

C.1 Definition of Real-Valued Spherical Harmonics

[0092] The real-valued spherical harmonics S.sub.n.sup.m(θ, ϕ) are given by

[00039] $\begin{matrix} S_{n}^{m} (θ, ϕ) = \sqrt{\frac{(2 n + 1)}{4 π} \frac{(n - | m |)!}{(n + | m |)!}} P_{n, | m |} (\cos θ) {trg}_{m} (ϕ) & (46) \\ \text{with } {trg}_{m} (ϕ) = \begin{cases} \sqrt{2} \cos (m ϕ) & m > 0 \\ 1 & m = 0 \\ - \sqrt{2} \sin (m ϕ) & m < 0 \end{cases} . & (47) \end{matrix}$

The associated Legendre functions P.sub.n,m(x) are defined as

[00040] $\begin{matrix} P_{n, m} (x) = {(1 - x^{2})}^{\frac{m}{2}} \frac{d^{m}}{d x^{m}} P_{n} (x), m \geq 0 & (48) \end{matrix}$

with the Legendre polynomial P.sub.n(x) and, unlike in the above-mentioned Williams article, without the Condon-Shortley phase term (−1).sup.m.
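Equations (46) to (48) can be evaluated numerically as in the following minimal Python sketch. The function names are assumptions made for this example. The associated Legendre functions are computed with the standard three-term recurrence, seeded without the Condon-Shortley phase term in accordance with equation (48). The recurrence is a commonly used evaluation scheme and is not prescribed by the text above.

```python
import math


def assoc_legendre(n, m, x):
    """P_{n,m}(x) of equation (48) for 0 <= m <= n, without the (-1)^m phase term."""
    # Seed: P_{m,m}(x) = (2m - 1)!! * (1 - x^2)^(m/2)
    root = math.sqrt(max(0.0, 1.0 - x * x))
    p_mm = 1.0
    for i in range(1, m + 1):
        p_mm *= (2 * i - 1) * root
    if n == m:
        return p_mm
    # P_{m+1,m}(x) = x * (2m + 1) * P_{m,m}(x)
    p_next = x * (2 * m + 1) * p_mm
    # Upward recurrence in the order index up to n
    for k in range(m + 2, n + 1):
        p_k = ((2 * k - 1) * x * p_next - (k + m - 1) * p_mm) / (k - m)
        p_mm, p_next = p_next, p_k
    return p_next


def real_sh(n, m, theta, phi):
    """Real-valued Spherical Harmonic S_n^m(theta, phi) of equations (46) and (47)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4.0 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        trg = math.sqrt(2.0) * math.cos(m * phi)
    elif m == 0:
        trg = 1.0
    else:
        trg = -math.sqrt(2.0) * math.sin(m * phi)
    return norm * assoc_legendre(n, am, math.cos(theta)) * trg
```

With these definitions the three Spherical Harmonics of order 1 are proportional to the Cartesian components y, z and x of the unit vector pointing in the direction (θ, ϕ), in this sequence for m=−1, 0, 1.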

C.2 Spatial Resolution of Higher Order Ambisonics

[0093] A general plane wave function x(t) arriving from a direction Ω.sub.0=(θ.sub.0, ϕ.sub.0).sup.T is represented in HOA by

[00041] $\begin{matrix} c_{n}^{m} (t) = x (t) S_{n}^{m} (Ω_{0}), 0 \leq n \leq N, | m | \leq n . & (49) \end{matrix}$

The corresponding spatial density of plane wave amplitudes c(t, Ω):=ℱ.sub.t.sup.−1(C(ω, Ω)) is given by

[00042] $\begin{matrix} c (t, Ω) = \sum_{n = 0}^{N} \sum_{m = - n}^{n} c_{n}^{m} (t) S_{n}^{m} (Ω) & (50) \\ = x (t) \underbrace{\left[ \sum_{n = 0}^{N} \sum_{m = - n}^{n} S_{n}^{m} (Ω_{0}) S_{n}^{m} (Ω) \right]}_{v_{N} (Θ)} . & (51) \end{matrix}$

[0094] It can be seen from equation (51) that it is a product of the general plane wave function x(t) and of a spatial dispersion function v.sub.N(Θ), which can be shown to only depend on the angle Θ between Ω and Ω.sub.0 having the property

[00043] $\begin{matrix} \cos Θ = \cos θ \cos θ_{0} + \cos (ϕ - ϕ_{0}) \sin θ \sin θ_{0} . & (52) \end{matrix}$

[0095] As expected, in the limit of an infinite order, i.e., N→∞, the spatial dispersion function turns into a Dirac delta δ(⋅), i.e.

[00044] $\begin{matrix} \lim_{N \to \infty} v_{N} (Θ) = \frac{δ (Θ)}{2 π} . & (53) \end{matrix}$

[0096] However, in the case of a finite order N, the contribution of the general plane wave from direction Ω.sub.0 is smeared to neighbouring directions, where the extent of the blurring decreases with an increasing order. A plot of the normalised function v.sub.N(Θ) for different values of N is shown in FIG. 5.
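The dependence of the spatial dispersion function on the order N can be illustrated by the following minimal Python sketch. The function names are assumptions made for this example. Instead of evaluating the double sum of equation (51) directly, the sketch uses the addition theorem of the Spherical Harmonics, by which the inner sum over m equals (2n+1)/(4π)·P.sub.n(cos Θ). This closed form is a standard identity and is not stated in the text above.

```python
import math


def cos_angle(theta, phi, theta0, phi0):
    """cos(Theta) between the directions (theta, phi) and (theta0, phi0), equation (52)."""
    return (math.cos(theta) * math.cos(theta0)
            + math.cos(phi - phi0) * math.sin(theta) * math.sin(theta0))


def dispersion(order, cos_big_theta):
    """v_N(Theta) = sum_{n=0}^{N} (2n + 1) / (4 pi) * P_n(cos Theta)."""
    x = cos_big_theta
    p_prev, p_curr = 1.0, x                 # Legendre polynomials P_0 and P_1
    total = 1.0 / (4.0 * math.pi)           # contribution of n = 0
    for n in range(1, order + 1):
        total += (2 * n + 1) / (4.0 * math.pi) * p_curr
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return total


def normalised_dispersion(order, cos_big_theta):
    """v_N(Theta) divided by its value v_N(0) = (N + 1)^2 / (4 pi)."""
    return dispersion(order, cos_big_theta) / dispersion(order, 1.0)
```

Evaluating normalised_dispersion for a fixed angle, e.g. Θ=π/4, with N=1 and N=4 shows a considerably smaller magnitude for the higher order, which corresponds to the reduced blurring described above.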

[0097] It should be pointed out that for any direction Ω the time domain behaviour of the spatial density of plane wave amplitudes is a multiple of its behaviour at any other direction. In particular, the functions c(t, Ω.sub.1) and c(t, Ω.sub.2) for some fixed directions Ω.sub.1 and Ω.sub.2 are highly correlated with each other with respect to time t.

C.3 Spherical Harmonic Transform

[0098] If the spatial density of plane wave amplitudes is discretised at a number of O spatial directions Ω.sub.o, 1≤o≤O, which are nearly uniformly distributed on the unit sphere, O directional signals c(t, Ω.sub.o) are obtained. Collecting these signals into a vector as

c.sub.SPAT(t):=[c(t,Ω.sub.1) . . . c(t,Ω.sub.O)].sup.T, (54)

by using equation (50) it can be verified that this vector can be computed from the continuous Ambisonics representation c(t) defined in equation (44) by a simple matrix multiplication as

[00045] $\begin{matrix} c_{SPAT} (t) = Ψ^{H} c (t), & (55) \end{matrix}$

where (⋅).sup.H indicates the joint transposition and conjugation, and Ψ denotes a mode-matrix defined by

[00046] $\begin{matrix} Ψ := [\begin{matrix} S_{1} & \ldots & S_{O} \end{matrix}] \text{ with} & (56) \\ S_{o} := {[\begin{matrix} S_{0}^{0} (Ω_{o}) & S_{1}^{- 1} (Ω_{o}) & S_{1}^{0} (Ω_{o}) & S_{1}^{1} (Ω_{o}) & \ldots & S_{N}^{N - 1} (Ω_{o}) & S_{N}^{N} (Ω_{o}) \end{matrix}]}^{T} . & (57) \end{matrix}$

[0099] Because the directions Ω.sub.o are nearly uniformly distributed on the unit sphere, the mode matrix is invertible in general. Hence, the continuous Ambisonics representation can be computed from the directional signals c(t, Ω.sub.o) by

[00047] $\begin{matrix} c (t) = Ψ^{- H} c_{SPAT} (t) . & (58) \end{matrix}$

[0100] Both equations constitute a transform and an inverse transform between the Ambisonics representation and the spatial domain. These transforms are here called the Spherical Harmonic Transform and the inverse Spherical Harmonic Transform.
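The transform pair of equations (55) and (58) can be illustrated for a single time instant by the following minimal Python sketch. It is restricted to the order N=1, i.e. O=4. The function names, the use of the four vertices of a regular tetrahedron as sampling directions and the numerical values are assumptions made for this example only. The Spherical Harmonics of order 1 are written out explicitly from equations (46) to (48). Because the mode matrix is real-valued here, the joint transposition and conjugation reduces to a transposition.

```python
import math


def sh_order1(theta, phi):
    """Vector S_o of equation (57) for N = 1, in the sequence of equation (44)."""
    a = math.sqrt(1.0 / (4.0 * math.pi))
    b = math.sqrt(3.0 / (4.0 * math.pi))
    return [a,
            b * math.sin(theta) * math.sin(phi),   # S_1^{-1}
            b * math.cos(theta),                   # S_1^{0}
            b * math.sin(theta) * math.cos(phi)]   # S_1^{1}


def tetrahedron_directions():
    """Four uniformly distributed directions (theta, phi); an assumed example grid."""
    verts = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    r = math.sqrt(3.0)
    return [(math.acos(z / r), math.atan2(y, x) % (2.0 * math.pi)) for x, y, z in verts]


def mode_matrix(directions):
    """Psi of equation (56), stored as psi[j][o] with one column per direction."""
    cols = [sh_order1(t, p) for t, p in directions]
    return [[cols[o][j] for o in range(len(cols))] for j in range(4)]


def to_spatial(psi, c):
    """Equation (55): c_SPAT = Psi^H c."""
    return [sum(psi[j][o] * c[j] for j in range(len(c))) for o in range(len(psi[0]))]


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def from_spatial(psi, c_spat):
    """Equation (58): c = Psi^{-H} c_SPAT, obtained by solving Psi^H c = c_SPAT."""
    psi_h = [[psi[j][o] for j in range(len(psi))] for o in range(len(psi[0]))]
    return solve(psi_h, c_spat)


directions = tetrahedron_directions()
psi = mode_matrix(directions)
c = [1.0, -0.5, 0.25, 2.0]             # hypothetical Ambisonics coefficients
c_spat = to_spatial(psi, c)            # directional signals at the four directions
c_back = from_spatial(psi, c_spat)     # recovers c up to rounding errors
```

For frames of samples, the same two operations would be applied to every time instant, i.e. to every column of a matrix of coefficient sequences.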

[0101] It should be noted that since the directions Ω.sub.o are nearly uniformly distributed on the unit sphere, the approximation

[00048] $\begin{matrix} Ψ^{H} \approx Ψ^{- 1} & (59) \end{matrix}$

is available, which justifies the use of Ψ.sup.−1 instead of Ψ.sup.H in equation (55).

[0102] Advantageously, all the mentioned relations are valid for the discrete-time domain, too.

[0103] The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
