Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation

11234091 · 2022-01-25

Abstract

A method and apparatus for decompressing a Higher Order Ambisonics (HOA) signal representation is disclosed. The apparatus includes an input interface that receives an encoded directional signal and an encoded ambient signal and an audio decoder that perceptually decodes the encoded directional signal and encoded ambient signal to produce a decoded directional signal and a decoded ambient signal, respectively. The apparatus further includes an extractor for obtaining side information related to the directional signal and an inverse transformer for converting the decoded ambient signal from a spatial domain to an HOA domain representation of the ambient signal. The apparatus also includes a synthesizer for recomposing a Higher Order Ambisonics (HOA) signal from the HOA domain representation of the ambient signal and the decoded directional signal. The side information includes a direction of the directional signal selected from a set of uniformly spaced directions.

Claims

1. A method for decompressing a compressed Higher Order Ambisonics (HOA) signal that includes an encoded directional signal and an encoded ambient signal, the method comprising: receiving the compressed HOA signal; obtaining side information related to the encoded directional signal, wherein the side information includes a direction of the directional signal selected from a set of uniformly spaced directions; perceptually decoding the compressed HOA signal based on the side information to produce a decoded directional HOA signal and a decoded ambient HOA signal; performing order extension on the decoded ambient HOA signal to obtain a representation of the decoded ambient HOA signal; and recomposing a decoded HOA representation from the representation of the decoded ambient HOA signal and the decoded directional HOA signal.

2. The method of claim 1 wherein the decoded HOA representation has an order greater than one.

3. The method of claim 2 wherein the order of the decoded ambient HOA signal is less than the order of the decoded HOA representation.

4. An apparatus for decompressing a compressed Higher Order Ambisonics (HOA) signal that includes an encoded directional signal and an encoded ambient signal, the apparatus comprising: an input interface that receives the compressed HOA signal; a first processor for obtaining side information related to the encoded directional signal, wherein the side information includes a direction of the directional signal selected from a set of uniformly spaced directions; an audio decoder that perceptually decodes the compressed HOA signal based on the side information to produce a decoded directional HOA signal and a decoded ambient HOA signal; a second processor for performing order extension on the decoded ambient HOA signal to obtain a representation of the decoded ambient HOA signal; and a synthesizer for recomposing a decoded HOA representation from the representation of the decoded ambient HOA signal and the decoded directional HOA signal.

5. The apparatus of claim 4 wherein the decoded HOA representation has an order greater than one.

6. The apparatus of claim 5 wherein the order of the decoded ambient HOA signal is less than the order of the decoded HOA representation.

7. A non-transitory computer readable medium containing instructions that when executed by a processor perform the method of claim 1.

Description

DRAWINGS

(1) Exemplary embodiments of the invention are described with references to the accompanying drawings:

(2) FIG. 1 illustrates normalised dispersion function v.sub.N(Θ) for different Ambisonics orders N and for angles Θ∈[0,π];

(3) FIG. 2 illustrates a block diagram of the compression processing according to the invention; and

(4) FIG. 3 illustrates a block diagram of the decompression processing according to the invention.

EXEMPLARY EMBODIMENTS

(5) Ambisonics signals describe sound fields within source-free areas using Spherical Harmonics (SH) expansion. The feasibility of this description can be attributed to the physical property that the temporal and spatial behaviour of the sound pressure is essentially determined by the wave equation.

(6) Wave Equation and Spherical Harmonics Expansion

(7) For a more detailed description of Ambisonics, in the following a spherical coordinate system is assumed, where a point in space x=(r,θ,ϕ).sup.T is represented by a radius r>0 (i.e. the distance to the coordinate origin), an inclination angle θ∈[0,π] measured from the polar axis z, and an azimuth angle ϕ∈[0,2π[ measured in the x-y plane from the x axis. In this spherical coordinate system the wave equation for the sound pressure p(t,x) within a connected source-free area, where t denotes time, is given by the textbook of Earl G. Williams, “Fourier Acoustics”, vol. 93 of Applied Mathematical Sciences, Academic Press, 1999:

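(8) $$\frac{1}{r^{2}}\left[\frac{\partial}{\partial r}\left(r^{2}\,\frac{\partial p(t,x)}{\partial r}\right)+\frac{1}{\sin\theta}\,\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial p(t,x)}{\partial\theta}\right)+\frac{1}{\sin^{2}\theta}\,\frac{\partial^{2}p(t,x)}{\partial\phi^{2}}\right]-\frac{1}{c_s^{2}}\,\frac{\partial^{2}p(t,x)}{\partial t^{2}}=0$$  (1)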
with c.sub.s indicating the speed of sound. As a consequence, the Fourier transform of the sound pressure with respect to time
P(ω,x):=ℱ.sub.t{p(t,x)}  (2)
:=∫.sub.−∞.sup.∞p(t,x)e.sup.−iωtdt,  (3)
where i denotes the imaginary unit, may be expanded into the series of SH according to the Williams textbook:
P(kc.sub.s,(r,θ,ϕ).sup.T)=Σ.sub.n=0.sup.∞Σ.sub.m=−n.sup.np.sub.n.sup.m(kr)Y.sub.n.sup.m(θ,ϕ)  (4)

(9) It should be noted that this expansion is valid for all points x within a connected source-free area, which corresponds to the region of convergence of the series.

(10) In eq. (4), k denotes the angular wave number defined by

(11) $$k := \frac{\omega}{c_s}$$  (5)
and p.sub.n.sup.m(kr) indicates the SH expansion coefficients, which depend only on the product kr.

(12) Further, Y.sub.n.sup.m(θ,ϕ) are the SH functions of order n and degree m:

(13) $$Y_n^m(\theta,\phi) := \sqrt{\frac{(2n+1)}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\;P_n^m(\cos\theta)\,e^{im\phi},$$  (6)
where P.sub.n.sup.m(cos θ) denote the associated Legendre functions and (⋅)! indicates the factorial.

(14) The associated Legendre functions for non-negative degree indices m are defined through the Legendre polynomials P.sub.n(x) by

(15) $$P_n^m(x) := (-1)^m\,(1-x^{2})^{\frac{m}{2}}\,\frac{d^{m}}{dx^{m}}P_n(x) \quad\text{for } m\ge 0.$$  (7)
For negative degree indices, i.e. m<0, the associated Legendre functions are defined by

(16) $$P_n^m(x) := (-1)^m\,\frac{(n+m)!}{(n-m)!}\,P_n^{-m}(x) \quad\text{for } m<0.$$  (8)
The Legendre polynomials P.sub.n(x) (n≥0) in turn can be defined using the Rodrigues' Formula as

(17) $$P_n(x) = \frac{1}{2^{n}\,n!}\,\frac{d^{n}}{dx^{n}}\,(x^{2}-1)^{n}.$$  (9)

(18) In the prior art, e.g. in M. Poletti, “Unified Description of Ambisonics using Real and Complex Spherical Harmonics”, Proceedings of the Ambisonics Symposium 2009, 25-27 Jun. 2009, Graz, Austria, there also exist definitions of the SH functions which deviate from that in eq. (6) by a factor of (−1).sup.m for negative degree indices m.

(19) Alternatively, the Fourier transform of the sound pressure with respect to time can be expressed using real SH functions S.sub.n.sup.m(θ,ϕ) as
P(kc.sub.s,(r,θ,ϕ).sup.T)=Σ.sub.n=0.sup.∞Σ.sub.m=−n.sup.nq.sub.n.sup.m(kr)S.sub.n.sup.m(θ,ϕ).  (10)

(20) In literature, there exist various definitions of the real SH functions (see e.g. the above-mentioned Poletti article). One possible definition, which is applied throughout this document, is given by

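(21) $$S_n^m(\theta,\phi) := \begin{cases}\dfrac{(-1)^m}{\sqrt{2}}\left[Y_n^m(\theta,\phi)+Y_n^{m*}(\theta,\phi)\right] & \text{for } m>0\\[4pt] Y_n^m(\theta,\phi) & \text{for } m=0\\[4pt] \dfrac{-1}{i\sqrt{2}}\left[Y_n^m(\theta,\phi)-Y_n^{m*}(\theta,\phi)\right] & \text{for } m<0,\end{cases}$$  (11)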
where (⋅)* denotes complex conjugation. An alternative expression is obtained by inserting eq. (6) into eq. (11):

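(22) $$S_n^m(\theta,\phi) = \sqrt{\frac{(2n+1)}{4\pi}\,\frac{(n-m)!}{(n+m)!}}\;P_n^m(\cos\theta)\,\mathrm{trg}_m(\phi),$$  (12)
with
$$\mathrm{trg}_m(\phi) := \begin{cases}(-1)^m\sqrt{2}\,\cos(m\phi) & \text{for } m>0\\ 1 & \text{for } m=0\\ -\sqrt{2}\,\sin(m\phi) & \text{for } m<0.\end{cases}$$  (13)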
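As an illustration, the definitions in eqs. (7), (8), (12) and (13) can be evaluated directly. The following Python sketch is not part of the original disclosure. The function names `legendre_coeffs`, `assoc_legendre` and `real_sh` are hypothetical, and the Legendre polynomials are generated with Bonnet's recurrence instead of Rodrigues' formula (9), which is assumed here only because it is easier to code and yields the same polynomials.

```python
import math

def legendre_coeffs(n):
    """Coefficients (ascending powers of x) of the Legendre polynomial P_n."""
    p_prev, p = [1.0], [0.0, 1.0]            # P_0 and P_1
    if n == 0:
        return p_prev
    for k in range(1, n):
        # Bonnet: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        shifted = [0.0] + p                  # coefficients of x * P_k
        nxt = [(2 * k + 1) * c for c in shifted]
        for i, c in enumerate(p_prev):
            nxt[i] -= k * c
        p_prev, p = p, [c / (k + 1) for c in nxt]
    return p

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) following eqs. (7) and (8)."""
    if m < 0:
        mu = -m
        # eq. (8): P_n^m = (-1)^m (n+m)!/(n-m)! P_n^{-m} for m < 0
        scale = math.factorial(n - mu) / math.factorial(n + mu)
        return (-1) ** mu * scale * assoc_legendre(n, mu, x)
    coeffs = legendre_coeffs(n)
    for _ in range(m):                       # m-th derivative of P_n
        coeffs = [i * c for i, c in enumerate(coeffs)][1:]
    deriv = sum(c * x ** i for i, c in enumerate(coeffs))
    return (-1) ** m * (1.0 - x * x) ** (m / 2) * deriv

def real_sh(n, m, theta, phi):
    """Real SH function S_n^m(theta, phi) following eqs. (12) and (13)."""
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - m) / math.factorial(n + m))
    if m > 0:
        trg = (-1) ** m * math.sqrt(2.0) * math.cos(m * phi)
    elif m == 0:
        trg = 1.0
    else:
        trg = -math.sqrt(2.0) * math.sin(m * phi)
    return norm * assoc_legendre(n, m, math.cos(theta)) * trg
```

With these conventions the three order-1 functions reduce to the Cartesian components of the direction vector scaled by the square root of 3/(4π), which gives a quick plausibility check of the sign factors.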

(23) Although the real SH functions are real-valued per definition, this does not hold for the corresponding expansion coefficients q.sub.n.sup.m(kr) in general.

(24) The complex SH functions are related to the real SH functions as follows:

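(25) $$Y_n^m(\theta,\phi) = \begin{cases}\dfrac{(-1)^m}{\sqrt{2}}\left[S_n^m(\theta,\phi)+i\,S_n^{-m}(\theta,\phi)\right] & \text{for } m>0\\[4pt] S_n^0(\theta,\phi) & \text{for } m=0\\[4pt] \dfrac{1}{i\sqrt{2}}\left[S_n^m(\theta,\phi)+i\,S_n^{-m}(\theta,\phi)\right] & \text{for } m<0.\end{cases}$$  (14)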

(26) The complex SH functions Y.sub.n.sup.m(θ,ϕ) as well as the real SH functions S.sub.n.sup.m(θ,ϕ) with the direction vector Ω:=(θ,ϕ).sup.T form an orthonormal basis for square integrable complex valued functions on the unit sphere 𝒮.sup.2 in the three-dimensional space, and thus obey the conditions

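(27) $$\begin{aligned}\int_{\mathcal{S}^{2}} Y_n^m(\Omega)\,Y_{n'}^{m'*}(\Omega)\,d\Omega &= \int_{0}^{2\pi}\!\!\int_{0}^{\pi} Y_n^m(\theta,\phi)\,Y_{n'}^{m'*}(\theta,\phi)\,\sin\theta\,d\theta\,d\phi = \delta_{n-n'}\,\delta_{m-m'} && (15)\\ \int_{\mathcal{S}^{2}} S_n^m(\Omega)\,S_{n'}^{m'}(\Omega)\,d\Omega &= \delta_{n-n'}\,\delta_{m-m'}, && (16)\end{aligned}$$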
where δ denotes the Kronecker delta function. The second result can be derived using eq. (15) and the definition of the real spherical harmonics in eq. (11).
Interior Problem and Ambisonics Coefficients

(28) The purpose of Ambisonics is a representation of a sound field in the vicinity of the coordinate origin. Without loss of generality, this region of interest is here assumed to be a ball of radius R centred in the coordinate origin, which is specified by the set {x|0≤r≤R}. A crucial assumption for the representation is that this ball is supposed to not contain any sound sources. Finding the representation of the sound field within this ball is termed the ‘interior problem’, cf. the above-mentioned Williams textbook.

(29) It can be shown that for the interior problem the SH functions expansion coefficients p.sub.n.sup.m(kr) can be expressed as
p.sub.n.sup.m(kr)=a.sub.n.sup.m(k)j.sub.n(kr),  (17)
where j.sub.n(.) denote the spherical Bessel functions of the first kind. From eq. (17) it follows that the complete information about the sound field is contained in the coefficients a.sub.n.sup.m(k), which are referred to as Ambisonics coefficients.

(30) Similarly, the coefficients of the real SH functions expansion q.sub.n.sup.m(kr) can be factorised as
q.sub.n.sup.m(kr)=b.sub.n.sup.m(k)j.sub.n(kr),  (18)
where the coefficients b.sub.n.sup.m(k) are referred to as Ambisonics coefficients with respect to the expansion using real-valued SH functions. They are related to a.sub.n.sup.m(k) through

(31) $$b_n^m(k) = \begin{cases}\dfrac{1}{\sqrt{2}}\left[(-1)^m a_n^m(k)+a_n^{-m}(k)\right] & \text{for } m>0\\[4pt] a_n^0(k) & \text{for } m=0\\[4pt] \dfrac{1}{i\sqrt{2}}\left[a_n^m(k)-(-1)^m a_n^{-m}(k)\right] & \text{for } m<0.\end{cases}$$  (19)
Plane Wave Decomposition

(32) The sound field within a sound source-free ball centred in the coordinate origin can be expressed by a superposition of an infinite number of plane waves of different angular wave numbers k, impinging on the ball from all possible directions, cf. the above-mentioned Rafaely “Plane-wave decomposition . . . ” article. Assuming that the complex amplitude of a plane wave with angular wave number k from the direction Ω.sub.0 is given by D(k,Ω.sub.0), it can be shown in a similar way by using eq. (11) and eq. (19) that the corresponding Ambisonics coefficients with respect to the real SH functions expansion are given by
b.sub.n,plane wave.sup.m(k;Ω.sub.0)=4πi.sup.nD(k,Ω.sub.0)S.sub.n.sup.m(Ω.sub.0).  (20)

(33) Consequently, the Ambisonics coefficients for the sound field resulting from a superposition of an infinite number of plane waves of angular wave number k are obtained from an integration of eq. (20) over all possible directions Ω.sub.0∈𝒮.sup.2:

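(34) $$\begin{aligned} b_n^m(k) &= \int_{\mathcal{S}^{2}} b_{n,\text{plane wave}}^{m}(k;\Omega_0)\,d\Omega_0 && (21)\\ &= 4\pi i^{n}\int_{\mathcal{S}^{2}} D(k,\Omega_0)\,S_n^m(\Omega_0)\,d\Omega_0. && (22)\end{aligned}$$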

(35) The function D(k,Ω) is termed ‘amplitude density’ and is assumed to be square integrable on the unit sphere 𝒮.sup.2. It can be expanded into the series of real SH functions as
D(k,Ω)=Σ.sub.n=0.sup.∞Σ.sub.m=−n.sup.nc.sub.n.sup.m(k)S.sub.n.sup.m(Ω),  (23)
where the expansion coefficients c.sub.n.sup.m(k) are equal to the integral occurring in eq. (22), i.e.
$$c_n^m(k) = \int_{\mathcal{S}^{2}} D(k,\Omega)\,S_n^m(\Omega)\,d\Omega.$$  (24)

(36) By inserting eq. (24) into eq. (22) it can be seen that the Ambisonics coefficients b.sub.n.sup.m(k) are a scaled version of the expansion coefficients c.sub.n.sup.m(k), i.e.
b.sub.n.sup.m(k)=4πi.sup.nc.sub.n.sup.m(k).  (25)

(37) When applying the inverse Fourier transform with respect to time to the scaled Ambisonics coefficients c.sub.n.sup.m(k) and to the amplitude density function D(k,Ω), the corresponding time domain quantities

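(38) $$\begin{aligned}\tilde{c}_n^m(t) &:= \mathcal{F}_t^{-1}\left\{c_n^m\!\left(\tfrac{\omega}{c_s}\right)\right\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} c_n^m\!\left(\tfrac{\omega}{c_s}\right)e^{i\omega t}\,d\omega && (26)\\ d(t,\Omega) &:= \mathcal{F}_t^{-1}\left\{D\!\left(\tfrac{\omega}{c_s},\Omega\right)\right\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} D\!\left(\tfrac{\omega}{c_s},\Omega\right)e^{i\omega t}\,d\omega && (27)\end{aligned}$$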
are obtained. Then, in the time domain, eq. (24) can be formulated as
$$\tilde{c}_n^m(t) = \int_{\mathcal{S}^{2}} d(t,\Omega)\,S_n^m(\Omega)\,d\Omega.$$  (28)

(39) The time domain directional signal d(t,Ω) may be represented by a real SH function expansion according to
d(t,Ω)=Σ.sub.n=0.sup.∞Σ.sub.m=−n.sup.n{tilde over (c)}.sub.n.sup.m(t)S.sub.n.sup.m(Ω).  (29)

(40) Using the fact that the SH functions S.sub.n.sup.m(Ω) are real-valued, its complex conjugate can be expressed by
d*(t,Ω)=Σ.sub.n=0.sup.∞Σ.sub.m=−n.sup.n{tilde over (c)}.sub.n.sup.m*(t)S.sub.n.sup.m(Ω).  (30)

(41) Assuming the time domain signal d(t,Ω) to be real-valued, i.e. d(t,Ω)=d*(t,Ω), it follows from the comparison of eq. (29) with eq. (30) that the coefficients {tilde over (c)}.sub.n.sup.m(t) are real-valued in that case, i.e. {tilde over (c)}.sub.n.sup.m(t)={tilde over (c)}.sub.n.sup.m*(t).

(42) The coefficients {tilde over (c)}.sub.n.sup.m(t) will be referred to as scaled time domain Ambisonics coefficients in the following.

(43) In the following it is also assumed that the sound field representation is given by these coefficients, which will be described in more detail in the below section dealing with the compression.

(44) It is noted that the time domain HOA representation by the coefficients {tilde over (c)}.sub.n.sup.m(t) used for the processing according to the invention is equivalent to a corresponding frequency domain HOA representation c.sub.n.sup.m(k). Therefore, the described compression and decompression can be equivalently realised in the frequency domain with minor respective modifications of the equations.

(45) Spatial Resolution with Finite Order

(46) In practice the sound field in the vicinity of the coordinate origin is described using only a finite number of Ambisonics coefficients c.sub.n.sup.m(k) of order n≤N. Computing the amplitude density function from the truncated series of SH functions according to
D.sub.N(k,Ω):=Σ.sub.n=0.sup.NΣ.sub.m=−n.sup.nc.sub.n.sup.m(k)S.sub.n.sup.m(Ω)  (31)
introduces a kind of spatial dispersion compared to the true amplitude density function D(k,Ω), cf. the above-mentioned “Plane-wave decomposition . . . ” article. This can be realised by computing the amplitude density function for a single plane wave from the direction Ω.sub.0 using eq. (31):

(47) $$\begin{aligned} D_N(k,\Omega) &= \sum_{n=0}^{N}\sum_{m=-n}^{n}\frac{1}{4\pi i^{n}}\,b_{n,\text{plane wave}}^{m}(k;\Omega_0)\,S_n^m(\Omega) && (32)\\ &= D(k,\Omega_0)\sum_{n=0}^{N}\sum_{m=-n}^{n} S_n^m(\Omega_0)\,S_n^m(\Omega) && (33)\\ &= D(k,\Omega_0)\sum_{n=0}^{N}\sum_{m=-n}^{n} Y_n^{m*}(\Omega_0)\,Y_n^m(\Omega) && (34)\\ &= D(k,\Omega_0)\sum_{n=0}^{N}\frac{2n+1}{4\pi}\,P_n(\cos\Theta) && (35)\\ &= D(k,\Omega_0)\left[\frac{N+1}{4\pi(\cos\Theta-1)}\bigl(P_{N+1}(\cos\Theta)-P_N(\cos\Theta)\bigr)\right] && (36)\\ &= D(k,\Omega_0)\,v_N(\Theta) && (37)\end{aligned}$$
with
$$v_N(\Theta) := \frac{N+1}{4\pi(\cos\Theta-1)}\bigl(P_{N+1}(\cos\Theta)-P_N(\cos\Theta)\bigr),$$  (38)
where Θ denotes the angle between the two vectors pointing towards the directions Ω and Ω.sub.0 satisfying the property
cos Θ=cos θ cos θ.sub.0+cos(ϕ−ϕ.sub.0)sin θ sin θ.sub.0.  (39)

(48) In eq. (33) the Ambisonics coefficients for a plane wave given in eq. (20) are employed, while in equations (35) and (36) some mathematical theorems are exploited, cf. the above-mentioned “Plane-wave decomposition . . . ” article. The property in eq. (34) can be shown using eq. (14).

(49) Comparing eq. (37) to the true amplitude density function

(50) $$D(k,\Omega) = D(k,\Omega_0)\,\frac{\delta(\Theta)}{2\pi},$$  (40)
where δ(⋅) denotes the Dirac delta function, the spatial dispersion becomes obvious from the replacement of the scaled Dirac delta function by the dispersion function v.sub.N(Θ) which, after having been normalised by its maximum value, is illustrated in FIG. 1 for different Ambisonics orders N and angles Θ∈[0,π].

(51) Because the first zero of v.sub.N(Θ) is located approximately at

(52) $$\frac{\pi}{N}$$
for N≥4 (see the above-mentioned “Plane-wave decomposition . . . ” article), the dispersion effect is reduced (and thus the spatial resolution is improved) with increasing Ambisonics order N. For N→∞ the dispersion function v.sub.N(Θ) converges to the scaled Dirac delta function. This can be seen if the completeness relation for the Legendre polynomials

(53) $$\sum_{n=0}^{\infty}\frac{2n+1}{2}\,P_n(x)\,P_n(x') = \delta(x-x')$$  (41)
is used together with eq. (35) to express the limit of v.sub.N(Θ) for N→∞ as

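(54) $$\begin{aligned}\lim_{N\to\infty} v_N(\Theta) &= \frac{1}{2\pi}\sum_{n=0}^{\infty}\frac{2n+1}{2}\,P_n(\cos\Theta) && (42)\\ &= \frac{1}{2\pi}\sum_{n=0}^{\infty}\frac{2n+1}{2}\,P_n(\cos\Theta)\,P_n(1) && (43)\\ &= \frac{1}{2\pi}\,\delta(\cos\Theta-1) && (44)\\ &= \frac{1}{2\pi}\,\delta(\Theta). && (45)\end{aligned}$$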
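The behaviour of the dispersion function can be checked numerically. The following Python sketch is an illustration only and is not taken from the original text. It evaluates v.sub.N(Θ) both as the finite sum of eq. (35) and in the closed form of eq. (38), and locates the first zero by a simple scan followed by bisection. The function names and the scan step are assumptions made for this example.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def dispersion_sum(order, big_theta):
    """v_N(Theta) as the finite sum of eq. (35)."""
    x = math.cos(big_theta)
    return sum((2 * n + 1) / (4 * math.pi) * legendre(n, x)
               for n in range(order + 1))

def dispersion_closed(order, big_theta):
    """v_N(Theta) in the closed form of eq. (38); not defined at Theta = 0."""
    x = math.cos(big_theta)
    return ((order + 1) / (4 * math.pi * (x - 1))
            * (legendre(order + 1, x) - legendre(order, x)))

def first_zero(order, step=1e-3):
    """Smallest Theta > 0 at which v_N changes sign (requires order >= 1)."""
    if order < 1:
        raise ValueError("v_0 is constant and has no zero")
    lo = step
    while dispersion_sum(order, lo + step) > 0:   # coarse scan
        lo += step
    hi = lo + step
    for _ in range(60):                           # bisection refinement
        mid = 0.5 * (lo + hi)
        if dispersion_sum(order, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `first_zero(4)` can be compared with `math.pi / 4`, and `dispersion_sum(N, 0.0)` equals (N+1)²/(4π), the maximum by which the curves of FIG. 1 are normalised.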

(55) When defining the vector of real SH functions of order n≤N by
S(Ω):=(S.sub.0.sup.0(Ω),S.sub.1.sup.−1(Ω),S.sub.1.sup.0(Ω),S.sub.1.sup.1(Ω),S.sub.2.sup.−2(Ω), . . . ,S.sub.N.sup.N(Ω)).sup.T∈ℝ.sup.O,  (46)
where O=(N+1).sup.2 and where (.).sup.T denotes transposition, the comparison of eq. (37) with eq. (33) shows that the dispersion function can be expressed through the scalar product of two real SH vectors as
v.sub.N(Θ)=S.sup.T(Ω)S(Ω.sub.0).  (47)
The dispersion can be equivalently expressed in time domain as

(56) $$\begin{aligned} d_N(t,\Omega) &:= \sum_{n=0}^{N}\sum_{m=-n}^{n}\tilde{c}_n^m(t)\,S_n^m(\Omega) && (48)\\ &= d(t,\Omega_0)\,v_N(\Theta). && (49)\end{aligned}$$
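The scalar-product form of eq. (47) can be verified for the smallest non-trivial case N=1. The sketch below is illustrative only. It assumes the closed-form order-1 real SH functions that follow from eq. (12), namely S.sub.0.sup.0=1/√(4π) and the three order-1 functions equal to √(3/(4π)) times sin θ sin ϕ, cos θ and sin θ cos ϕ. The function names and the two test directions are arbitrary choices.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)      # S_0^0
C1 = math.sqrt(3.0 / (4.0 * math.pi))    # common factor of the order-1 functions

def sh_vector_order1(theta, phi):
    """S(Omega) of eq. (46) for N = 1: (S_0^0, S_1^-1, S_1^0, S_1^1)."""
    return [C0,
            C1 * math.sin(theta) * math.sin(phi),
            C1 * math.cos(theta),
            C1 * math.sin(theta) * math.cos(phi)]

def cos_angle(theta, phi, theta0, phi0):
    """cos(Theta) between two directions according to eq. (39)."""
    return (math.cos(theta) * math.cos(theta0)
            + math.cos(phi - phi0) * math.sin(theta) * math.sin(theta0))

def dispersion_order1(cos_big_theta):
    """v_1(Theta) from eq. (35): (P_0 + 3 * P_1(cos Theta)) / (4 * pi)."""
    return (1.0 + 3.0 * cos_big_theta) / (4.0 * math.pi)

omega = (0.7, 2.1)     # (theta, phi), arbitrary
omega0 = (1.9, 0.4)    # (theta_0, phi_0), arbitrary

# Left-hand side of eq. (47): scalar product of the two real SH vectors.
lhs = sum(a * b for a, b in zip(sh_vector_order1(*omega),
                                sh_vector_order1(*omega0)))
# Right-hand side of eq. (47): dispersion function at the enclosed angle.
rhs = dispersion_order1(cos_angle(*omega, *omega0))
```

Both values agree to machine precision, which is the order-1 instance of the step from eq. (33) to eq. (35).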
Sampling

(57) For some applications it is desirable to determine the scaled time domain Ambisonics coefficients {tilde over (c)}.sub.n.sup.m(t) from the samples of the time domain amplitude density function d(t,Ω) at a finite number J of discrete directions Ω.sub.j. The integral in eq. (28) is then approximated by a finite sum according to B. Rafaely, “Analysis and Design of Spherical Microphone Arrays”, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, January 2005:
{tilde over (c)}.sub.n.sup.m(t)≈Σ.sub.j=1.sup.Jg.sub.j·d(t,Ω.sub.j)S.sub.n.sup.m(Ω.sub.j),  (50)
where the g.sub.j denote some appropriately chosen sampling weights. In contrast to the “Analysis and Design . . . ” article, approximation (50) refers to a time domain representation using real SH functions rather than to a frequency domain representation using complex SH functions. A necessary condition for approximation (50) to become exact is that the amplitude density is of limited harmonic order N, meaning that
{tilde over (c)}.sub.n.sup.m(t)=0 for n>N.  (51)

(58) If this condition is not met, approximation (50) suffers from spatial aliasing errors, cf. B. Rafaely, “Spatial Aliasing in Spherical Microphone Arrays”, IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003-1010, March 2007.

(59) A second necessary condition requires the sampling points Ω.sub.j and the corresponding weights to fulfil the corresponding conditions given in the “Analysis and Design . . . ” article:
Σ.sub.j=1.sup.Jg.sub.jS.sub.n′.sup.m′(Ω.sub.j)S.sub.n.sup.m(Ω.sub.j)=δ.sub.n-n′δ.sub.m-m′ for n,n′≤N.  (52)

(60) The conditions (51) and (52) jointly are sufficient for exact sampling.

(61) The sampling condition (52) consists of a set of linear equations, which can be formulated compactly using a single matrix equation as
ΨGΨ.sup.H=I,  (53)
where Ψ indicates the mode matrix defined by
Ψ:=[S(Ω.sub.1) . . . S(Ω.sub.J)]∈ℝ.sup.O×J  (54)
and G denotes the matrix with the weights on its diagonal, i.e.
G:=diag(g.sub.1, . . . ,g.sub.J).  (55)

(62) From eq. (53) it can be seen that a necessary condition for eq. (52) to hold is that the number J of sampling points fulfils J≥O. Collecting the values of the time domain amplitude density at the J sampling points into the vector
w(t):=(d(t,Ω.sub.1), . . . ,d(t,Ω.sub.J)).sup.T,  (56)
and defining the vector of scaled time domain Ambisonics coefficients by
c(t):=({tilde over (c)}.sub.0.sup.0(t),{tilde over (c)}.sub.1.sup.−1(t),{tilde over (c)}.sub.1.sup.0(t),{tilde over (c)}.sub.1.sup.1(t),{tilde over (c)}.sub.2.sup.−2(t), . . . ,{tilde over (c)}.sub.N.sup.N(t)).sup.T,  (57)
both vectors are related through the SH functions expansion (29). This relation provides the following system of linear equations:
w(t)=Ψ.sup.Hc(t).  (58)

(63) Using the introduced vector notation, the computation of the scaled time domain Ambisonics coefficients from the values of the time domain amplitude density function samples can be written as
c(t)≈ΨGw(t).  (59)

(64) Given a fixed Ambisonics order N, it is often not possible to compute a number J≥O of sampling points Ω.sub.j and the corresponding weights such that the sampling condition eq. (52) holds. However, if the sampling points are chosen such that the sampling condition is well approximated, then the rank of the mode matrix Ψ is O and its condition number low. In this case, the pseudo-inverse
Ψ.sup.+:=(ΨΨ.sup.H).sup.−1Ψ  (60)
of the mode matrix Ψ exists and a reasonable approximation of the scaled time domain Ambisonics coefficient vector c(t) from the vector of the time domain amplitude density function samples is given by
c(t)≈Ψ.sup.+w(t).  (61)

(65) If J=O and the rank of the mode matrix is O, then its pseudo-inverse coincides with its inverse since
Ψ.sup.+=(ΨΨ.sup.H).sup.−1Ψ=Ψ.sup.−HΨ.sup.−1Ψ=Ψ.sup.−H.  (62)

(66) If additionally the sampling condition eq. (52) is satisfied, then
Ψ.sup.−H=ΨG  (63)

(67) holds and both approximations (59) and (61) are equivalent and exact.
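A small numerical example may make the exact-sampling case concrete. The Python sketch below is illustrative and not part of the original disclosure. It assumes N=1 (so O=4), J=4 sampling directions at the vertices of a regular tetrahedron, equal weights g.sub.j=4π/O, and the closed-form order-1 real SH functions written in Cartesian direction components. It then checks the sampling condition (53) and the round trip through eqs. (58) and (59). All names are hypothetical.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)      # S_0^0
C1 = math.sqrt(3.0 / (4.0 * math.pi))    # common factor of the order-1 functions

# Four tetrahedral unit vectors (x, y, z): an assumed uniform sampling grid.
A = 1.0 / math.sqrt(3.0)
DIRECTIONS = [(A, A, A), (A, -A, -A), (-A, A, -A), (-A, -A, A)]

def sh_vector(x, y, z):
    """Order-1 real SH vector (S_0^0, S_1^-1, S_1^0, S_1^1) for a unit vector."""
    return [C0, C1 * y, C1 * z, C1 * x]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

O = J = 4
# Mode matrix Psi of eq. (54): column j holds S(Omega_j).
psi = transpose([sh_vector(*d) for d in DIRECTIONS])
# Weight matrix G of eq. (55) with equal weights g_j = 4*pi/O.
g = [[4.0 * math.pi / O if i == j else 0.0 for j in range(J)] for i in range(J)]

gram = matmul(matmul(psi, g), transpose(psi))   # Psi G Psi^H, eq. (53)

c = [[0.3], [-1.2], [0.5], [2.0]]               # arbitrary coefficient vector c(t)
w = matmul(transpose(psi), c)                   # eq. (58): HOA -> spatial domain
c_back = matmul(matmul(psi, g), w)              # eq. (59): spatial -> HOA domain
```

Here `gram` is the identity matrix up to rounding, so `c_back` reproduces `c`. For larger orders suitable point sets have to be taken from the literature, and the condition is in general only approximately met.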

(68) Vector w(t) can be interpreted as a vector of spatial time domain signals. The transform from the HOA domain to the spatial domain can be performed e.g. by using eq. (58). This kind of transform is termed ‘Spherical Harmonic Transform’ (SHT) in this application and is used when the ambient HOA component of reduced order is transformed to the spatial domain. It is implicitly assumed that the spatial sampling points Ω.sub.j for the SHT approximately satisfy the sampling condition in eq. (52) with

(69) $$g_j \approx \frac{4\pi}{O}$$
for j=1, . . . , J and that J=O. Under these assumptions the SHT matrix satisfies

(70) $$\Psi^{H}\,\frac{4\pi}{O} \approx \Psi^{-1}.$$
If the absolute scaling of the SHT is not important, the constant

(71) $$\frac{4\pi}{O}$$
can be neglected.
Compression

(72) This invention is related to the compression of a given HOA signal representation. As mentioned above, the HOA representation is decomposed into a predefined number of dominant directional signals in the time domain and an ambient component in HOA domain, followed by compression of the HOA representation of the ambient component by reducing its order. This operation exploits the assumption, which is supported by listening tests, that the ambient sound field component can be represented with sufficient accuracy by a HOA representation with a low order. The extraction of the dominant directional signals ensures that, following that compression and a corresponding decompression, a high spatial resolution is retained.

(73) After the decomposition, the ambient HOA component of reduced order is transformed to the spatial domain, and is perceptually coded together with the directional signals as described in section Exemplary embodiments of patent application EP 10306472.1.

(74) The compression processing includes two successive steps, which are depicted in FIG. 2. The exact definitions of the individual signals are described in below section Details of the compression.

(75) In the first step or stage shown in FIG. 2a, in a dominant direction estimator 22 dominant directions are estimated and a decomposition of the Ambisonics signal C(l) into a directional and a residual or ambient component is performed, where l denotes the frame index. The directional component is calculated in a directional signal computation step or stage 23, whereby the Ambisonics representation is converted to time domain signals represented by a set of D conventional directional signals X(l) with corresponding directions Ω.sub.DOM(l). The residual ambient component is calculated in an ambient HOA component computation step or stage 24, and is represented by HOA domain coefficients C.sub.A(l).

(76) In the second step shown in FIG. 2b, a perceptual coding of the directional signals X(l) and the ambient HOA component C.sub.A(l) is carried out as follows: The conventional time domain directional signals X(l) can be individually compressed in a perceptual coder 27 using any known perceptual compression technique. The compression of the ambient HOA domain component C.sub.A(l) is carried out in two sub steps or stages.

(77) The first substep or stage 25 performs a reduction of the original Ambisonics order N to N.sub.RED, e.g. N.sub.RED=2 resulting in the ambient HOA component C.sub.A,RED(l). Here, the assumption is exploited that the ambient sound field component can be represented with sufficient accuracy by HOA with a low order. The second substep or stage 26 is based on a compression described in patent application EP 10306472.1. The O.sub.RED:=(N.sub.RED+1).sup.2 HOA signals C.sub.A,RED(l) of the ambient sound field component, which were computed at substep/stage 25, are transformed into O.sub.RED equivalent signals W.sub.A,RED(l) in the spatial domain by applying a Spherical Harmonic Transform, resulting in conventional time domain signals which can be input to a bank of parallel perceptual codecs 27. Any known perceptual coding or compression technique can be applied. The encoded directional signals {hacek over (X)}(l) and the order-reduced encoded spatial domain signals {hacek over (W)}.sub.A,RED(l) are output and can be transmitted or stored.
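The order reduction of substep/stage 25 can be pictured as follows. This Python sketch is an assumption-laden illustration: it supposes that a frame is stored as a list of O coefficient rows ordered as in eq. (57), and that the reduction simply keeps the rows belonging to orders n≤N.sub.RED. The text above does not prescribe this particular realisation, and the helper names are hypothetical.

```python
def num_coeffs(order):
    """Number of HOA coefficient sequences O = (N + 1)^2 for order N."""
    return (order + 1) ** 2

def reduce_order(frame, reduced_order):
    """Keep only the coefficient rows of order n <= N_RED.

    `frame` is a list of O rows (one per coefficient sequence, ordered as in
    eq. (57)); each row holds the samples of one frame. Truncation is an
    assumed realisation of the order reduction.
    """
    keep = num_coeffs(reduced_order)
    if keep > len(frame):
        raise ValueError("reduced order exceeds the order of the input frame")
    return [list(row) for row in frame[:keep]]

# Toy ambient frame of order N = 4 (25 rows) with 3 samples per row.
ambient = [[float(10 * r + s) for s in range(3)] for r in range(num_coeffs(4))]
# Reduced to N_RED = 2, leaving O_RED = 9 rows.
ambient_red = reduce_order(ambient, 2)
```

The O.sub.RED remaining rows would then be passed through the Spherical Harmonic Transform before perceptual coding.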

(78) Advantageously, the perceptual compression of all time domain signals X(l) and W.sub.A,RED(l) can be performed jointly in a perceptual coder 27 in order to improve the overall coding efficiency by exploiting the potentially remaining inter-channel correlations.

(79) Decompression

(80) The decompression processing for a received or replayed signal is depicted in FIG. 3. Like the compression processing, it includes two successive steps.

(81) In the first step or stage shown in FIG. 3a, in a perceptual decoding 31 a perceptual decoding or decompression of the encoded directional signals {hacek over (X)}(l) and of the order-reduced encoded spatial domain signals {hacek over (W)}.sub.A,RED(l) is carried out, where {circumflex over (X)}(l) represents the directional component and Ŵ.sub.A,RED(l) represents the ambient HOA component. The perceptually decoded or decompressed spatial domain signals Ŵ.sub.A,RED(l) are transformed in an inverse spherical harmonic transformer 32 to an HOA domain representation Ĉ.sub.A,RED(l) of order N.sub.RED via an inverse Spherical Harmonics transform. Thereafter, in an order extension step or stage 33 an appropriate HOA representation Ĉ.sub.A(l) of order N is estimated from Ĉ.sub.A,RED(l) by order extension.

(82) In the second step or stage shown in FIG. 3b, the total HOA representation Ĉ(l) is re-composed in an HOA signal assembler 34 from the directional signals {circumflex over (X)}(l) and the corresponding direction information Ω.sub.DOM(l) as well as from the original-order ambient HOA component Ĉ.sub.A(l).
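The last two decoder stages can be sketched in a few lines. The Python example below is a simplified illustration under explicit assumptions: order extension is realised by appending zero rows, each directional signal is added as a plane wave weighted by its mode vector S(Ω.sub.DOM) (cf. eq. (20)), and frame overlap or smoothing between frames is ignored. It uses N=1 and N.sub.RED=0 with the closed-form order-1 real SH functions to stay short. All function names are hypothetical.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)      # S_0^0
C1 = math.sqrt(3.0 / (4.0 * math.pi))    # common factor of the order-1 functions

def sh_vector_order1(theta, phi):
    """Mode vector S(Omega) for N = 1: (S_0^0, S_1^-1, S_1^0, S_1^1)."""
    return [C0,
            C1 * math.sin(theta) * math.sin(phi),
            C1 * math.cos(theta),
            C1 * math.sin(theta) * math.cos(phi)]

def extend_order(frame_red, order):
    """Zero-pad a reduced-order frame to (order + 1)^2 rows (assumed extension)."""
    n_rows = (order + 1) ** 2
    n_samples = len(frame_red[0])
    padding = [[0.0] * n_samples for _ in range(n_rows - len(frame_red))]
    return [list(row) for row in frame_red] + padding

def recompose(ambient, directional, directions):
    """Add each directional signal, weighted by its mode vector, to the ambient frame."""
    total = [list(row) for row in ambient]
    for signal, (theta, phi) in zip(directional, directions):
        mode = sh_vector_order1(theta, phi)
        for r, weight in enumerate(mode):
            for s, value in enumerate(signal):
                total[r][s] += weight * value
    return total

ambient_red = [[0.1, 0.2, 0.3]]              # N_RED = 0: one coefficient row
ambient_hat = extend_order(ambient_red, 1)   # extended to N = 1: four rows
x_hat = [[1.0, 0.0, -1.0]]                   # D = 1 decoded directional signal
omega_dom = [(math.pi / 2, 0.0)]             # its direction (theta, phi)
c_hat = recompose(ambient_hat, x_hat, omega_dom)
```

In this toy case the directional signal arrives along the x axis, so it contributes only to the S.sub.0.sup.0 and S.sub.1.sup.1 rows of `c_hat`.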

(83) Achievable Data Rate Reduction

(84) A problem solved by the invention is the considerable reduction of the data rate as compared to existing compression methods for HOA representations. In the following the achievable compression rate compared to the non-compressed HOA representation is discussed. The compression rate results from the comparison of the data rate required for the transmission of a non-compressed HOA signal C(l) of order N with the data rate required for the transmission of a compressed signal representation consisting of D perceptually coded directional signals X(l) with corresponding directions Ω.sub.DOM(l) and O.sub.RED perceptually coded spatial domain signals W.sub.A,RED(l) representing the ambient HOA component.

(85) For the transmission of the non-compressed HOA signal C(l) a data rate of O·f.sub.S·N.sub.b is required, where f.sub.S denotes the sampling rate and N.sub.b the number of bits per sample. In contrast, the transmission of D perceptually coded directional signals X(l) requires a data rate of D·f.sub.b,COD, where f.sub.b,COD denotes the bit rate of the perceptually coded signals. Similarly, the transmission of the O.sub.RED perceptually coded spatial domain signals W.sub.A,RED(l) requires a bit rate of O.sub.RED·f.sub.b,COD. The directions Ω.sub.DOM(l) are assumed to be computed based on a much lower rate compared to the sampling rate f.sub.S, i.e. they are assumed to be fixed for the duration of a signal frame consisting of B samples, e.g. B=1200 for a sampling rate of f.sub.S=48 kHz, and the corresponding data rate share can be neglected for the computation of the total data rate of the compressed HOA signal.

(86) Therefore, the transmission of the compressed representation requires a data rate of approximately (D+O.sub.RED)·f.sub.b,COD. Consequently, the compression rate r.sub.COMPR is

(87) $$r_{\mathrm{COMPR}} \approx \frac{O\cdot f_S\cdot N_b}{(D+O_{\mathrm{RED}})\cdot f_{b,\mathrm{COD}}}.$$  (64)

(88) For example, the compression of an HOA representation of order N=4 employing a sampling rate f.sub.S=48 kHz and N.sub.b=16 bits per sample to a representation with D=3 dominant directions using a reduced HOA order N.sub.RED=2 and a bit rate of

(89) $$64\ \frac{\text{kbits}}{\text{s}}$$
will result in a compression rate of r.sub.COMPR≈25. The transmission of the compressed representation requires a data rate of approximately

(90) $$768\ \frac{\text{kbits}}{\text{s}}.$$
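The arithmetic of this example can be reproduced with a few lines of Python. The function name and parameter names are chosen for illustration only, and the side information for the directions is neglected as stated above.

```python
def compression_rate(order, num_directional, reduced_order,
                     sample_rate_hz, bits_per_sample, coded_bit_rate):
    """Return (r_COMPR, compressed data rate in bit/s) according to eq. (64)."""
    num_coeffs = (order + 1) ** 2                 # O
    num_coeffs_red = (reduced_order + 1) ** 2     # O_RED
    raw_rate = num_coeffs * sample_rate_hz * bits_per_sample
    coded_rate = (num_directional + num_coeffs_red) * coded_bit_rate
    return raw_rate / coded_rate, coded_rate

# N = 4, D = 3, N_RED = 2, f_S = 48 kHz, N_b = 16 bit, 64 kbit/s per coded signal
rate, coded = compression_rate(4, 3, 2, 48_000, 16, 64_000)
```

With these inputs the uncompressed rate is 25·48000·16 = 19.2 Mbit/s and the compressed rate is (3+9)·64 kbit/s = 768 kbit/s, which gives the ratio of 25 stated above.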
Reduced Probability for Occurrence of Coding Noise Unmasking

(91) As explained in the Background section, the perceptual compression of spatial domain signals described in patent application EP 10306472.1 suffers from remaining cross correlations between the signals, which may lead to unmasking of perceptual coding noise. According to the invention, the dominant directional signals are first extracted from the HOA sound field representation before being perceptually coded. This means that, when composing the HOA representation, after perceptual decoding the coding noise has exactly the same spatial directivity as the directional signals. In particular, the contributions of the coding noise as well as those of the directional signal to any arbitrary direction are deterministically described by the spatial dispersion function explained in section Spatial resolution with finite order. In other words, at any time instant the HOA coefficients vector representing the coding noise is exactly a multiple of the HOA coefficients vector representing the directional signal. Thus, an arbitrarily weighted sum of the noisy HOA coefficients will not lead to any unmasking of the perceptual coding noise.

(92) Further, the ambient component of reduced order is processed exactly as proposed in EP 10306472.1, but because per definition the spatial domain signals of the ambient component have a rather low correlation between each other, the probability for perceptual noise unmasking is low.

(93) Improved Direction Estimation

(94) The inventive direction estimation is dependent on the directional power distribution of the energetically dominant HOA component. The directional power distribution is computed from the rank-reduced correlation matrix of the HOA representation, which is obtained by eigenvalue decomposition of the correlation matrix of the HOA representation.

(95) Compared to the direction estimation used in the above-mentioned “Plane-wave decomposition . . . ” article, it offers the advantage of being more precise, since focusing on the energetically dominant HOA component instead of using the complete HOA representation for the direction estimation reduces the spatial blurring of the directional power distribution.

(96) Compared to the direction estimation proposed in the above-mentioned “The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields” and “Time Domain Reconstruction of Spatial Sound Fields Using Compressed Sensing” articles, it offers the advantage of being more robust. The reason is that the decomposition of the HOA representation into the directional and ambient component can hardly ever be accomplished perfectly, so that there remains a small ambient component amount in the directional component. Then, compressive sampling methods like in these two articles fail to provide reasonable direction estimates due to their high sensitivity to the presence of ambient signals.

(97) Advantageously, the inventive direction estimation does not suffer from this problem.

(98) Alternative Applications of the HOA Representation Decomposition

(99) The described decomposition of the HOA representation into a number of directional signals with related direction information and an ambient component in HOA domain can be used for a signal-adaptive DirAC-like rendering of the HOA representation according to that proposed in the above-mentioned Pulkki article “Spatial Sound Reproduction with Directional Audio Coding”.

(100) Each HOA component can be rendered differently because the physical characteristics of the two components are different. For example, the directional signals can be rendered to the loudspeakers using signal panning techniques like Vector Based Amplitude Panning (VBAP), cf. V. Pulkki, “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of Audio Eng. Society, vol. 45, no. 6, pp. 456-466, 1997. The ambient HOA component can be rendered using known standard HOA rendering techniques.

(101) Such rendering is not restricted to Ambisonics representation of order ‘1’ and can thus be seen as an extension of the DirAC-like rendering to HOA representations of order N>1.

(102) The estimation of several directions from an HOA signal representation can be used for any related kind of sound field analysis.

(103) The following sections describe in more detail the signal processing steps.

(104) Compression

(105) Definition of Input Format

(106) As input, the scaled time domain HOA coefficients {tilde over (c)}.sub.n.sup.m(t) defined in eq. (26) are assumed to be sampled at a rate

(107) $$f_S = \frac{1}{T_S}.$$
A vector c(j) is defined to be composed of all coefficients belonging to the sampling time t=jT.sub.S, j∈ℤ, according to
$$c(j) := \left[\tilde{c}_0^0(jT_S),\ \tilde{c}_1^{-1}(jT_S),\ \tilde{c}_1^{0}(jT_S),\ \tilde{c}_1^{1}(jT_S),\ \tilde{c}_2^{-2}(jT_S),\ \ldots,\ \tilde{c}_N^N(jT_S)\right]^T \in \mathbb{R}^{O}.$$  (65)
Framing

(108) The incoming vectors c(j) of scaled HOA coefficients are framed in framing step or stage 21 into non-overlapping frames of length B according to
C(l):=[c(lB+1)c(lB+2) . . . c(lB+B)]∈ℝ.sup.O×B.  (66)

(109) Assuming a sampling rate of f.sub.S=48 kHz, an appropriate frame length is B=1200 samples corresponding to a frame duration of 25 ms.
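The framing of eq. (66) and the quoted frame duration can be illustrated with a minimal sketch. It assumes NumPy, an HOA order of N=4 (so that O=(N+1).sup.2=25) and a dummy input signal; the function name `frame_hoa` is hypothetical and not part of the described apparatus.

```python
import numpy as np

def frame_hoa(c, B):
    """Split an (O, num_samples) array of HOA coefficient vectors into
    non-overlapping frames C(l) of shape (O, B); an incomplete tail is dropped."""
    O, num_samples = c.shape
    num_frames = num_samples // B
    # frames[l] holds the 0-based columns l*B ... l*B+B-1, i.e. c(lB+1) ... c(lB+B)
    return c[:, :num_frames * B].reshape(O, num_frames, B).transpose(1, 0, 2)

f_s = 48_000                      # sampling rate in Hz
B = 1200                          # frame length in samples
frame_ms = 1000.0 * B / f_s       # frame duration in milliseconds

O = (4 + 1) ** 2                  # assumed order N = 4
c = np.arange(O * 3 * B, dtype=float).reshape(O, 3 * B)   # dummy signal, 3 frames
frames = frame_hoa(c, B)
```

With these values `frame_ms` evaluates to 25 ms, and `frames[l]` plays the role of C(l).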

(110) Estimation of Dominant Directions

(111) For the estimation of the dominant directions the following correlation matrix

(112) $$B(l) := \frac{1}{LB} \sum_{l'=0}^{L-1} C(l-l')\, C^T(l-l') \in \mathbb{R}^{O \times O}$$  (67)
is computed. The summation over the current frame l and L−1 previous frames indicates that the directional analysis is based on long overlapping groups of frames with L.Math.B samples, i.e. for each current frame the content of adjacent frames is taken into consideration. This contributes to the stability of the directional analysis for two reasons: longer frames result in a greater number of observations, and the direction estimates are smoothed due to the overlapping frames.

(113) Assuming f.sub.S=48 kHz and B=1200, a reasonable value for L is 4, corresponding to an overall frame duration of 100 ms.
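As an illustration of eq. (67), the following sketch accumulates the correlation matrix over the current frame and the L−1 previous frames. It assumes NumPy and toy dimensions; the function name `correlation_matrix` and the random test data are hypothetical.

```python
import numpy as np

def correlation_matrix(frames, l, L):
    """B(l) = 1/(L*B) * sum_{l'=0}^{L-1} C(l-l') C(l-l')^T, cf. eq. (67).
    `frames` has shape (num_frames, O, B); requires l >= L - 1."""
    _, O, B = frames.shape
    acc = np.zeros((O, O))
    for lp in range(L):
        C = frames[l - lp]
        acc += C @ C.T
    return acc / (L * B)

rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 4, 50))   # toy sizes: 6 frames, O = 4, B = 50
B_l = correlation_matrix(frames, l=5, L=4)
```

The result is identical to the correlation matrix of one long frame of L.Math.B samples formed by concatenating the L frames.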

(114) Next, an eigenvalue decomposition of the correlation matrix B(l) is determined according to
B(l)=V(l)Λ(l)V.sup.T(l),  (68)
wherein matrix V(l) is composed of the eigenvectors v.sub.i(l), 1≤i≤O, as
V(l):=[v.sub.1(l)v.sub.2(l) . . . v.sub.O(l)]∈ℝ.sup.O×O  (69)
and matrix Λ(l) is a diagonal matrix with the corresponding eigenvalues λ.sub.i(l), 1≤i≤O, on its diagonal:
Λ(l):=diag(λ.sub.1(l),λ.sub.2(l), . . . ,λ.sub.O(l))∈ℝ.sup.O×O.  (70)

(115) It is assumed that the eigenvalues are indexed in a non-ascending order, i.e.
λ.sub.1(l)≥λ.sub.2(l)≥ . . . ≥λ.sub.O(l).  (71)

(116) Thereafter, the index set {1, . . . , $\tilde{\mathcal{J}}(l)$} of dominant eigenvalues is computed. One possibility to manage this is defining a desired minimal broadband directional-to-ambient power ratio DAR.sub.MIN and then determining $\tilde{\mathcal{J}}(l)$ such that

(117) $$10\log_{10}\!\left(\frac{\lambda_i(l)}{\lambda_1(l)}\right) \geq -\mathrm{DAR}_{\mathrm{MIN}} \quad \forall\, i \leq \tilde{\mathcal{J}}(l) \qquad\text{and}\qquad 10\log_{10}\!\left(\frac{\lambda_i(l)}{\lambda_1(l)}\right) < -\mathrm{DAR}_{\mathrm{MIN}} \quad\text{for } i = \tilde{\mathcal{J}}(l)+1.$$  (72)

(118) A reasonable choice for DAR.sub.MIN is 15 dB. The number of dominant eigenvalues is further constrained to be not greater than D in order to concentrate on no more than D dominant directions. This is accomplished by replacing the index set {1, . . . , $\tilde{\mathcal{J}}(l)$} by {1, . . . , $\mathcal{J}(l)$}, where
$\mathcal{J}(l) := \min(\tilde{\mathcal{J}}(l), D)$.  (73)

(119) Next, the $\mathcal{J}(l)$-rank approximation of B(l) is obtained by
$$B_{\mathcal{J}}(l) := V_{\mathcal{J}}(l)\, \Lambda_{\mathcal{J}}(l)\, V_{\mathcal{J}}^T(l), \text{ where}$$  (74)
$$V_{\mathcal{J}}(l) := \left[v_1(l)\ v_2(l)\ \ldots\ v_{\mathcal{J}(l)}(l)\right] \in \mathbb{R}^{O \times \mathcal{J}(l)},$$  (75)
$$\Lambda_{\mathcal{J}}(l) := \mathrm{diag}\left(\lambda_1(l), \lambda_2(l), \ldots, \lambda_{\mathcal{J}(l)}(l)\right) \in \mathbb{R}^{\mathcal{J}(l) \times \mathcal{J}(l)}.$$  (76)

(120) This matrix should contain the contributions of the dominant directional components to B(l).
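The steps of eqs. (68) to (76) can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `dominant_rank_approx`, the default D=2 and the synthetic test matrix are hypothetical. The number of retained eigenvalues is limited to at most D, as stated in the text.

```python
import numpy as np

def dominant_rank_approx(B_l, dar_min_db=15.0, D=2):
    """Rank-reduced correlation matrix along the lines of eqs. (68)-(76)."""
    lam, V = np.linalg.eigh(B_l)              # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]             # re-index in non-ascending order, eq. (71)
    lam, V = lam[order], V[:, order]
    # power of each eigenvalue relative to the largest one, in dB
    ratio_db = 10.0 * np.log10(np.maximum(lam, 1e-300) / lam[0])
    J_tilde = int(np.sum(ratio_db >= -dar_min_db))   # dominant eigenvalues, eq. (72)
    J = min(J_tilde, D)                       # not greater than D
    V_J, lam_J = V[:, :J], lam[:J]
    return (V_J * lam_J) @ V_J.T, J           # V_J diag(lam_J) V_J^T, eq. (74)

rng = np.random.default_rng(1)
Q_, _ = np.linalg.qr(rng.standard_normal((4, 4)))          # random orthonormal basis
B_l = Q_ @ np.diag([100.0, 50.0, 1.0, 0.5]) @ Q_.T         # synthetic correlation matrix
B_J, J = dominant_rank_approx(B_l, dar_min_db=15.0, D=2)
```

In this toy case the eigenvalues 1.0 and 0.5 lie 20 dB or more below the largest eigenvalue and are therefore discarded for DAR.sub.MIN=15 dB.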

(121) Thereafter, the vector

(122) $$\sigma^2(l) := \mathrm{diag}\left(\Xi^T B_{\mathcal{J}}(l)\, \Xi\right) \in \mathbb{R}^{Q}$$  (77)
$$= \left(S_1^T B_{\mathcal{J}}(l) S_1,\ \ldots,\ S_Q^T B_{\mathcal{J}}(l) S_Q\right)^T$$  (78)
is computed, where Ξ denotes a mode matrix with respect to a high number of nearly equally distributed test directions Ω.sub.q:=(θ.sub.q,ϕ.sub.q), 1≤q≤Q, where θ.sub.q∈[0,π] denotes the inclination angle measured from the polar axis z and ϕ.sub.q∈[−π,π[ denotes the azimuth angle measured in the x-y plane from the x axis.

(123) Mode matrix Ξ is defined by
Ξ:=[S.sub.1 S.sub.2 . . . S.sub.Q]∈ℝ.sup.O×Q  (79)
with
S.sub.q:=[S.sub.0.sup.0(Ω.sub.q),S.sub.1.sup.−1(Ω.sub.q),S.sub.1.sup.0(Ω.sub.q),S.sub.1.sup.1(Ω.sub.q),S.sub.2.sup.−2(Ω.sub.q), . . . ,S.sub.N.sup.N(Ω.sub.q)].sup.T  (80)
for 1≤q≤Q.

(124) The σ.sub.q.sup.2(l) elements of σ.sup.2(l) are approximations of the powers of plane waves, corresponding to dominant directional signals, impinging from the directions Ω.sub.q. The theoretical explanation for that is provided in the below section Explanation of direction search algorithm.
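Since only the Q diagonal elements of the product in eq. (77) are needed, the full Q×Q matrix need not be formed. The following sketch assumes NumPy; the function name `directional_power` is hypothetical, and the mode matrix is filled with random placeholder values rather than actual spherical harmonics.

```python
import numpy as np

def directional_power(Xi, B_J):
    """sigma^2(l) = diag(Xi^T B_J Xi), computing only the Q diagonal entries,
    i.e. S_q^T B_J S_q for every test direction q."""
    return np.einsum("oq,op,pq->q", Xi, B_J, Xi)

rng = np.random.default_rng(2)
Xi = rng.standard_normal((4, 10))      # placeholder mode matrix, O = 4, Q = 10
A = rng.standard_normal((4, 2))
B_J = A @ A.T                          # placeholder rank-2 correlation matrix
sigma2 = directional_power(Xi, B_J)
```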

(125) From σ.sup.2(l) a number {tilde over (D)}(l) of dominant directions Ω.sub.CURRDOM,{tilde over (d)}(l), 1≤{tilde over (d)}≤{tilde over (D)}(l), for the determination of the directional signal components is computed. The number of dominant directions is thereby constrained to fulfil {tilde over (D)}(l)≤D in order to assure a constant data rate. However, if a variable data rate is allowed, the number of dominant directions can be adapted to the current sound scene.

(126) One possibility to compute the {tilde over (D)}(l) dominant directions is to set the first dominant direction to that with the maximum power, i.e. Ω.sub.CURRDOM,1(l)=Ω.sub.q.sub.1 with $q_1 := \arg\max_{q \in \mathcal{M}_1} \sigma_q^2(l)$ and $\mathcal{M}_1 := \{1, 2, \ldots, Q\}$. Assuming that the power maximum is created by a dominant directional signal, and considering the fact that using a HOA representation of finite order N results in a spatial dispersion of directional signals (cf. the above-mentioned “Plane-wave decomposition . . . ” article), it can be concluded that in the directional neighbourhood of Ω.sub.CURRDOM,1(l) there should occur power components belonging to the same directional signal. Since the spatial signal dispersion can be expressed by the function v.sub.N(Θ.sub.q,q.sub.1) (see eq. (38)), where Θ.sub.q,q.sub.1:=∠(Ω.sub.q,Ω.sub.q.sub.1) denotes the angle between Ω.sub.q and Ω.sub.CURRDOM,1(l), the power belonging to the directional signal declines according to v.sub.N.sup.2(Θ.sub.q,q.sub.1). Therefore, it is reasonable to exclude all directions Ω.sub.q in the directional neighbourhood of Ω.sub.q.sub.1 with Θ.sub.q,1≤Θ.sub.MIN for the search of further dominant directions. The distance Θ.sub.MIN can be chosen as the first zero of v.sub.N(x), which is approximately given by

(127) $$\frac{\pi}{N}$$
for N≥4. The second dominant direction is then set to that with the maximum power in the remaining directions Ω.sub.q, $q \in \mathcal{M}_2$, with $\mathcal{M}_2 := \{q \in \mathcal{M}_1 \mid \Theta_{q,1} > \Theta_{\mathrm{MIN}}\}$. The remaining dominant directions are determined in an analogous way.

(128) The number {tilde over (D)}(l) of dominant directions can be determined by regarding the powers $\sigma_{q_{\tilde{d}}}^2(l)$ assigned to the individual dominant directions $\Omega_{q_{\tilde{d}}}$ and searching for the case where the ratio $\sigma_{q_1}^2(l) / \sigma_{q_{\tilde{d}}}^2(l)$ exceeds the value of a desired direct-to-ambient power ratio DAR.sub.MIN. This means that {tilde over (D)}(l) satisfies

(132) $$10\log_{10}\!\left(\frac{\sigma_{q_1}^2(l)}{\sigma_{q_{\tilde{D}(l)}}^2(l)}\right) \leq \mathrm{DAR}_{\mathrm{MIN}} \ \wedge\ \left[\, 10\log_{10}\!\left(\frac{\sigma_{q_1}^2(l)}{\sigma_{q_{\tilde{D}(l)+1}}^2(l)}\right) > \mathrm{DAR}_{\mathrm{MIN}} \ \vee\ \tilde{D}(l) = D \,\right].$$  (81)

(133) The overall processing for the computation of all dominant directions can be carried out as follows:

(134) Algorithm 1 Search of dominant directions given power distribution on the sphere
PowerFlag = true
$\tilde{d} = 1$
$\mathcal{M}_1 = \{1, 2, \ldots, Q\}$
repeat
  $q_{\tilde{d}} = \arg\max_{q \in \mathcal{M}_{\tilde{d}}} \sigma_q^2(l)$
  if $\left[\tilde{d} > 1 \ \wedge\ 10\log_{10}\!\left(\sigma_{q_1}^2(l) / \sigma_{q_{\tilde{d}}}^2(l)\right) > \mathrm{DAR}_{\mathrm{MIN}}\right]$ then
    PowerFlag = false
  else
    $\Omega_{\mathrm{CURRDOM},\tilde{d}}(l) = \Omega_{q_{\tilde{d}}}$
    $\mathcal{M}_{\tilde{d}+1} = \{q \in \mathcal{M}_{\tilde{d}} \mid \angle(\Omega_q, \Omega_{q_{\tilde{d}}}) > \Theta_{\mathrm{MIN}}\}$
    $\tilde{d} = \tilde{d} + 1$
  end if
until $\left[\tilde{d} > D \ \vee\ \mathrm{PowerFlag} = \mathrm{false}\right]$
$\tilde{D}(l) = \tilde{d} - 1$
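Algorithm 1 can be sketched in a few lines. The example below is only an illustration under stated assumptions: directions are given as (inclination, azimuth) pairs in radians, indices are 0-based, the function names are hypothetical, the toy power values and Θ.sub.MIN=50° are arbitrary, and a guard for an exhausted candidate set has been added that is not part of the algorithm as listed.

```python
import math

def angle_between(a, b):
    """Angle between two directions given as (inclination, azimuth) in radians."""
    (t1, p1), (t2, p2) = a, b
    cos_ang = (math.cos(t1) * math.cos(t2)
               + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2))
    return math.acos(max(-1.0, min(1.0, cos_ang)))

def search_dominant_directions(sigma2, directions, D, dar_min_db, theta_min):
    """Greedy search following Algorithm 1; returns indices q_1, q_2, ... (0-based)."""
    candidates = set(range(len(sigma2)))          # index set M_1
    found = []
    while len(found) < D and candidates:          # extra guard: stop if M is empty
        q = max(candidates, key=lambda i: sigma2[i])
        if found and 10.0 * math.log10(sigma2[found[0]] / sigma2[q]) > dar_min_db:
            break                                 # corresponds to PowerFlag = false
        found.append(q)
        # exclude the directional neighbourhood of the direction just found
        candidates = {i for i in candidates
                      if angle_between(directions[i], directions[q]) > theta_min}
    return found

# toy grid: 8 test directions on the equator, spaced by 45 degrees
directions = [(math.pi / 2, k * math.pi / 4) for k in range(8)]
sigma2 = [10.0, 9.0, 0.1, 0.1, 5.0, 0.1, 0.1, 0.1]
found = search_dominant_directions(sigma2, directions, D=3,
                                   dar_min_db=15.0, theta_min=math.radians(50))
```

In this toy case the strong value at index 1 is not selected, because it lies within Θ.sub.MIN of the first dominant direction and is attributed to the spatial dispersion of the same signal.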

(135) Next, the directions Ω.sub.CURRDOM,{tilde over (d)}(l), 1≤{tilde over (d)}≤{tilde over (D)}(l), obtained in the current frame are smoothed with the directions from the previous frames, resulting in smoothed directions Ω.sub.DOM,d(l), 1≤d≤D. This operation can be subdivided into two successive parts:
(a) The current dominant directions Ω.sub.CURRDOM,{tilde over (d)}(l), 1≤{tilde over (d)}≤{tilde over (D)}(l), are assigned to the smoothed directions Ω.sub.DOM,d(l−1), 1≤d≤D, from the previous frame. The assignment function $f_{\mathcal{A},l}$:{1, . . . , {tilde over (D)}(l)}.fwdarw.{1, . . . , D} is determined such that the sum of angles between assigned directions

(136) $$\sum_{\tilde{d}=1}^{\tilde{D}(l)} \angle\!\left(\Omega_{\mathrm{CURRDOM},\tilde{d}}(l),\ \bar{\Omega}_{\mathrm{DOM},f_{\mathcal{A},l}(\tilde{d})}(l-1)\right)$$  (82)
is minimised. Such an assignment problem can be solved using the well-known Hungarian algorithm, cf. H. W. Kuhn, “The Hungarian method for the assignment problem”, Naval research logistics quarterly 2, no. 1-2, pp. 83-97, 1955. The angles between current directions Ω.sub.CURRDOM,{tilde over (d)}(l) and inactive directions (see below for explanation of the term ‘inactive direction’) from the previous frame Ω.sub.DOM,d(l−1) are set to 2Θ.sub.MIN. This operation has the effect that current directions Ω.sub.CURRDOM,{tilde over (d)}(l), which are closer than 2Θ.sub.MIN to previously active directions Ω.sub.DOM,d(l−1), are attempted to be assigned to them. If the distance exceeds 2Θ.sub.MIN, the corresponding current direction is assumed to belong to a new signal, which means that it is favoured to be assigned to a previously inactive direction Ω.sub.DOM,d(l−1). Remark: when allowing a greater latency of the overall compression algorithm, the assignment of successive direction estimates may be performed more robustly. For example, abrupt direction changes may be better identified without mixing them up with outliers resulting from estimation errors.
(b) The smoothed directions Ω.sub.DOM,d(l), 1≤d≤D, are computed using the assignment from step (a). The smoothing is based on spherical geometry rather than Euclidean geometry. For each of the current dominant directions Ω.sub.CURRDOM,{tilde over (d)}(l), 1≤{tilde over (d)}≤{tilde over (D)}(l), the smoothing is performed along the minor arc of the great circle crossing the two points on the sphere, which are specified by the directions Ω.sub.CURRDOM,{tilde over (d)}(l) and Ω.sub.DOM,d(l−1). Explicitly, the azimuth and inclination angles are smoothed independently by computing the exponentially-weighted moving average with a smoothing factor α.sub.Ω. For the inclination angle this results in the following smoothing operation:

(137) $$\bar{\theta}_{\mathrm{DOM},f_{\mathcal{A},l}(\tilde{d})}(l) = (1-\alpha_\Omega) \cdot \bar{\theta}_{\mathrm{DOM},f_{\mathcal{A},l}(\tilde{d})}(l-1) + \alpha_\Omega \cdot \theta_{\mathrm{DOM},\tilde{d}}(l), \quad 1 \leq \tilde{d} \leq \tilde{D}(l).$$  (83)
For the azimuth angle the smoothing has to be modified to achieve a correct smoothing at the transition from π−ε to −π, ε>0, and the transition in the opposite direction. This can be taken into consideration by first computing the difference angle modulo 2π as

(138) $$\Delta_{\phi,[0,2\pi[,\tilde{d}}(l) := \left[\phi_{\mathrm{DOM},\tilde{d}}(l) - \bar{\phi}_{\mathrm{DOM},f_{\mathcal{A},l}(\tilde{d})}(l-1)\right] \bmod 2\pi,$$  (84)
which is converted to the interval [−π,π[ by

(139) $$\Delta_{\phi,[-\pi,\pi[,\tilde{d}}(l) := \begin{cases} \Delta_{\phi,[0,2\pi[,\tilde{d}}(l) & \text{for } \Delta_{\phi,[0,2\pi[,\tilde{d}}(l) < \pi \\ \Delta_{\phi,[0,2\pi[,\tilde{d}}(l) - 2\pi & \text{for } \Delta_{\phi,[0,2\pi[,\tilde{d}}(l) \geq \pi. \end{cases}$$  (85)
The smoothed dominant azimuth angle modulo 2π is determined as

(140) $$\bar{\phi}_{\mathrm{DOM},[0,2\pi[,\tilde{d}}(l) := \left[\bar{\phi}_{\mathrm{DOM},\tilde{d}}(l-1) + \alpha_\Omega \cdot \Delta_{\phi,[-\pi,\pi[,\tilde{d}}(l)\right] \bmod 2\pi$$  (86)
and is finally converted to lie within the interval [−π,π[ by

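(141) $$\bar{\phi}_{\mathrm{DOM},\tilde{d}}(l) = \begin{cases} \bar{\phi}_{\mathrm{DOM},[0,2\pi[,\tilde{d}}(l) & \text{for } \bar{\phi}_{\mathrm{DOM},[0,2\pi[,\tilde{d}}(l) < \pi \\ \bar{\phi}_{\mathrm{DOM},[0,2\pi[,\tilde{d}}(l) - 2\pi & \text{for } \bar{\phi}_{\mathrm{DOM},[0,2\pi[,\tilde{d}}(l) \geq \pi. \end{cases}$$  (87)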
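The angle smoothing of eqs. (83) to (87) can be sketched as follows. The function names are hypothetical and the numerical values of the smoothing factor are chosen for illustration only. The example shows that a previous azimuth just below π and a current azimuth just above −π are smoothed across the wrap-around point instead of being averaged through zero.

```python
import math

def wrap_to_pi(angle):
    """Map an angle in [0, 2*pi[ to [-pi, pi[, as in eqs. (85) and (87)."""
    return angle if angle < math.pi else angle - 2.0 * math.pi

def smooth_inclination(theta_prev, theta_cur, alpha):
    """Exponentially weighted moving average of the inclination, eq. (83)."""
    return (1.0 - alpha) * theta_prev + alpha * theta_cur

def smooth_azimuth(phi_prev, phi_cur, alpha):
    """Wrap-aware smoothing of the azimuth, eqs. (84) to (87)."""
    delta = wrap_to_pi((phi_cur - phi_prev) % (2.0 * math.pi))        # (84), (85)
    return wrap_to_pi((phi_prev + alpha * delta) % (2.0 * math.pi))   # (86), (87)

phi_prev = math.pi - 0.1          # previous smoothed azimuth, close to +pi
phi_cur = -math.pi + 0.1          # current azimuth, just across the wrap-around
phi_a = smooth_azimuth(phi_prev, phi_cur, alpha=0.25)   # stays below +pi
phi_b = smooth_azimuth(phi_prev, phi_cur, alpha=0.75)   # crosses over to -pi side
```

The true angular difference in this example is 0.2 rad, so `phi_a` moves by 0.05 rad towards π and `phi_b` moves by 0.15 rad and wraps to the negative side.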

(142) In case {tilde over (D)}(l)<D, there are directions Ω.sub.DOM,d(l−1) from the previous frame that do not get an assigned current dominant direction. The corresponding index set is denoted by
$\mathcal{M}_{\mathrm{NA}}(l) := \{1, \ldots, D\} \setminus \{f_{\mathcal{A},l}(\tilde{d}) \mid 1 \leq \tilde{d} \leq \tilde{D}(l)\}$.  (88)

(143) The respective directions are copied from the last frame, i.e.
Ω.sub.DOM,d(l)=Ω.sub.DOM,d(l−1) for d∈$\mathcal{M}_{\mathrm{NA}}(l)$.  (89)
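The assignment of step (a) and the index set of eq. (88) can be illustrated for small D by an exhaustive search. Note that this sketch swaps in brute-force enumeration for the Hungarian algorithm named in the text, which is only practical for a handful of directions; the function name `assign_directions` and the toy angles are hypothetical.

```python
import itertools

def assign_directions(angles, active, theta_min):
    """Brute-force minimisation of the sum of angles in eq. (82).
    angles[i][d]: angle between current direction i and previous smoothed direction d.
    active[d]: False if previous direction d is inactive (its cost is 2*theta_min)."""
    n_cur, D = len(angles), len(active)
    cost = [[angles[i][d] if active[d] else 2.0 * theta_min for d in range(D)]
            for i in range(n_cur)]
    # every injective map from current directions to previous directions
    best = min(itertools.permutations(range(D), n_cur),
               key=lambda f: sum(cost[i][f[i]] for i in range(n_cur)))
    unassigned = sorted(set(range(D)) - set(best))   # index set of eq. (88), 0-based
    return list(best), unassigned

# toy case: D = 3 previous directions, the third one inactive
angles = [[1.5, 0.1, 0.7],     # current direction 0 is close to previous direction 1
          [2.0, 1.8, 0.3]]     # current direction 1 is far from both active ones
assignment, unassigned = assign_directions(angles, [True, True, False], theta_min=0.5)
```

Here the second current direction is farther than 2Θ.sub.MIN from both active directions and is therefore assigned to the previously inactive one, while the unassigned previous direction keeps its value from the last frame according to eq. (89).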

(144) Directions which are not assigned for a predefined number L.sub.IA of frames are termed inactive.

(145) Thereafter the index set of active directions, denoted by $\mathcal{M}_{\mathrm{ACT}}(l)$, is computed. Its cardinality is denoted by D.sub.ACT(l):=|$\mathcal{M}_{\mathrm{ACT}}(l)$|.

(146) Then all smoothed directions are concatenated into a single direction matrix as
Ω.sub.DOM(l):=[Ω.sub.DOM,1(l)Ω.sub.DOM,2(l) . . . Ω.sub.DOM,D(l)].  (90)
Computation of Direction Signals

(147) The computation of the direction signals is based on mode matching. In particular, a search is made for those directional signals whose HOA representation results in the best approximation of the given HOA signal. Because the changes of the directions between successive frames can lead to a discontinuity of the directional signals, estimates of the directional signals for overlapping frames can be computed, followed by smoothing the results of successive overlapping frames using an appropriate window function. The smoothing, however, introduces a latency of a single frame.

(148) The detailed estimation of the directional signals is explained in the following:

(149) First, the mode matrix based on the smoothed active directions is computed according to

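(150) $$\Xi_{\mathrm{ACT}}(l) := \left[S_{\mathrm{DOM},d_{\mathrm{ACT},1}}(l)\ \ S_{\mathrm{DOM},d_{\mathrm{ACT},2}}(l)\ \ \ldots\ \ S_{\mathrm{DOM},d_{\mathrm{ACT},D_{\mathrm{ACT}}(l)}}(l)\right] \in \mathbb{R}^{O \times D_{\mathrm{ACT}}(l)}$$  (91)
with
$$S_{\mathrm{DOM},d}(l) := \left[S_0^0(\bar{\Omega}_{\mathrm{DOM},d}(l)),\ S_1^{-1}(\bar{\Omega}_{\mathrm{DOM},d}(l)),\ S_1^{0}(\bar{\Omega}_{\mathrm{DOM},d}(l)),\ \ldots,\ S_N^N(\bar{\Omega}_{\mathrm{DOM},d}(l))\right]^T \in \mathbb{R}^{O},$$  (92)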
wherein d.sub.ACT,j, 1≤j≤D.sub.ACT(l) denotes the indices of the active directions.

(151) Next, a matrix X.sub.INST(l) is computed that contains the non-smoothed estimates of all directional signals for the (l−1)-th and l-th frame:
X.sub.INST(l):=[x.sub.INST(l,1)x.sub.INST(l,2) . . . x.sub.INST(l,2B)]∈ℝ.sup.D×2B  (93)
with
x.sub.INST(l,j)=[x.sub.INST,1(l,j),x.sub.INST,2(l,j), . . . ,x.sub.INST,D(l,j)].sup.T∈ℝ.sup.D, 1≤j≤2B.  (94)

(152) This is accomplished in two steps. In the first step, the directional signal samples in the rows corresponding to inactive directions are set to zero, i.e.
x.sub.INST,d(l,j)=0 ∀1≤j≤2B, if d∉$\mathcal{M}_{\mathrm{ACT}}(l)$.  (95)

(153) In the second step, the directional signal samples corresponding to active directions are obtained by first arranging them in a matrix according to

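(154) $$X_{\mathrm{INST,ACT}}(l) := \begin{bmatrix} x_{\mathrm{INST},d_{\mathrm{ACT},1}}(l,1) & \cdots & x_{\mathrm{INST},d_{\mathrm{ACT},1}}(l,2B) \\ \vdots & \ddots & \vdots \\ x_{\mathrm{INST},d_{\mathrm{ACT},D_{\mathrm{ACT}}(l)}}(l,1) & \cdots & x_{\mathrm{INST},d_{\mathrm{ACT},D_{\mathrm{ACT}}(l)}}(l,2B) \end{bmatrix}.$$  (96)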

(155) This matrix is then computed such as to minimise the Euclidean norm of the error
Ξ.sub.ACT(l)X.sub.INST,ACT(l)−[C(l−1)C(l)].  (97)
The solution is given by
X.sub.INST,ACT(l)=[Ξ.sub.ACT.sup.T(l)Ξ.sub.ACT(l)].sup.−1Ξ.sub.ACT.sup.T(l)[C(l−1)C(l)].  (98)
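Eq. (98) is the ordinary least-squares solution of the mode matching problem in eq. (97). A minimal sketch, assuming NumPy and a full-column-rank mode matrix, is given below; the function name and the random placeholder mode matrix are hypothetical. A least-squares solver is used instead of forming the inverse in eq. (98) explicitly, which gives the same result for a well-conditioned mode matrix.

```python
import numpy as np

def estimate_directional_signals(Xi_act, C_prev, C_cur):
    """X_INST,ACT(l) minimising the norm of Xi_act X - [C(l-1) C(l)], eqs. (97)-(98)."""
    C_long = np.hstack([C_prev, C_cur])                    # O x 2B
    X, *_ = np.linalg.lstsq(Xi_act, C_long, rcond=None)    # D_ACT x 2B
    return X

rng = np.random.default_rng(4)
O, D_act, B = 9, 2, 4
Xi_act = rng.standard_normal((O, D_act))        # placeholder mode matrix
X_true = rng.standard_normal((D_act, 2 * B))    # directional signals to be recovered
C_long = Xi_act @ X_true                        # purely directional HOA frames
X_est = estimate_directional_signals(Xi_act, C_long[:, :B], C_long[:, B:])
```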

(156) The estimates of the directional signals x.sub.INST,d(l,j), 1≤d≤D, are windowed by an appropriate window function w(j):
x.sub.INST,WIN,d(l,j):=x.sub.INST,d(l,j).Math.w(j), 1≤j≤2B.  (99)

(157) An example for the window function is given by the periodic Hamming window defined by

(158) $$w(j) := \begin{cases} K_w \left[0.54 - 0.46 \cos\!\left(\dfrac{2\pi j}{2B+1}\right)\right] & \text{for } 1 \leq j \leq 2B \\ 0 & \text{else}, \end{cases}$$  (100)
where K.sub.w denotes a scaling factor which is determined such that the sum of the shifted windows equals ‘1’. The smoothed directional signals for the (l−1)-th frame are computed by the appropriate superposition of windowed non-smoothed estimates according to
x.sub.d((l−1)B+j)=x.sub.INST,WIN,d(l−1,B+j)+x.sub.INST,WIN,d(l,j).  (101)
The samples of all smoothed directional signals for the (l−1)-th frame are arranged in matrix X(l−1) as
X(l−1):=[x((l−1)B+1)x((l−1)B+2) . . . x((l−1)B+B)]∈ℝ.sup.D×B  (102)
with
x(j)=[x.sub.1(j),x.sub.2(j), . . . ,x.sub.D(j)].sup.T∈ℝ.sup.D.  (103)
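The windowing of eq. (100) and the superposition of eq. (101) can be checked numerically. The sketch below assumes K.sub.w=1/1.08, which is not stated in the text; with the denominator 2B+1 as given in eq. (100), the two shifted windows then sum to one only approximately, and the sketch measures the remaining deviation. The function names are hypothetical.

```python
import math

def hamming_window(B, K_w):
    """w(j) of eq. (100) for j = 1, ..., 2B (stored 0-based)."""
    return [K_w * (0.54 - 0.46 * math.cos(2.0 * math.pi * j / (2 * B + 1)))
            for j in range(1, 2 * B + 1)]

def overlap_add(x_prev_win, x_cur_win, B):
    """Eq. (101): second half of the previous windowed long frame plus
    first half of the current windowed long frame."""
    return [x_prev_win[B + j] + x_cur_win[j] for j in range(B)]

B = 1200
K_w = 1.0 / 1.08                     # assumed scaling factor
w = hamming_window(B, K_w)
window_sum = overlap_add(w, w, B)    # w(B + j) + w(j) for j = 1, ..., B
max_dev = max(abs(s - 1.0) for s in window_sum)
```

For B=1200 the deviation of the window sum from one stays below 0.1 percent under this assumption.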
Computation of Ambient HOA Component

(159) The ambient HOA component C.sub.A(l−1) is obtained by subtracting the total directional HOA component C.sub.DIR(l−1) from the total HOA representation C(l−1) according to
C.sub.A(l−1):=C(l−1)−C.sub.DIR(l−1)∈ℝ.sup.O×B,  (104)
where C.sub.DIR(l−1) is determined by

(160) $$C_{\mathrm{DIR}}(l-1) := \Xi_{\mathrm{DOM}}(l-1) \begin{bmatrix} x_{\mathrm{INST,WIN},1}(l-1,B+1) & \cdots & x_{\mathrm{INST,WIN},1}(l-1,2B) \\ \vdots & \ddots & \vdots \\ x_{\mathrm{INST,WIN},D}(l-1,B+1) & \cdots & x_{\mathrm{INST,WIN},D}(l-1,2B) \end{bmatrix} + \Xi_{\mathrm{DOM}}(l) \begin{bmatrix} x_{\mathrm{INST,WIN},1}(l,1) & \cdots & x_{\mathrm{INST,WIN},1}(l,B) \\ \vdots & \ddots & \vdots \\ x_{\mathrm{INST,WIN},D}(l,1) & \cdots & x_{\mathrm{INST,WIN},D}(l,B) \end{bmatrix},$$  (105)
and where Ξ.sub.DOM(l) denotes the mode matrix based on all smoothed directions defined by
Ξ.sub.DOM(l):=[S.sub.DOM,1(l)S.sub.DOM,2(l) . . . S.sub.DOM,D(l)]∈ℝ.sup.O×D.  (106)

(161) Because the computation of the total directional HOA component is also based on a spatial smoothing of overlapping successive instantaneous total directional HOA components, the ambient HOA component is also obtained with a latency of a single frame.

(162) Order Reduction for Ambient HOA Component

(163) Expressing C.sub.A(l−1) through its components as

(164) $$C_A(l-1) = \begin{bmatrix} c_{0,A}^{0}((l-1)B+1) & \cdots & c_{0,A}^{0}((l-1)B+B) \\ \vdots & \ddots & \vdots \\ c_{N,A}^{N}((l-1)B+1) & \cdots & c_{N,A}^{N}((l-1)B+B) \end{bmatrix},$$  (107)
the order reduction is accomplished by dropping all HOA coefficients c.sub.n,A.sup.m(j) with n>N.sub.RED:

(165) $$C_{A,\mathrm{RED}}(l-1) := \begin{bmatrix} c_{0,A}^{0}((l-1)B+1) & \cdots & c_{0,A}^{0}((l-1)B+B) \\ \vdots & \ddots & \vdots \\ c_{N_{\mathrm{RED}},A}^{N_{\mathrm{RED}}}((l-1)B+1) & \cdots & c_{N_{\mathrm{RED}},A}^{N_{\mathrm{RED}}}((l-1)B+B) \end{bmatrix} \in \mathbb{R}^{O_{\mathrm{RED}} \times B}.$$  (108)
Spherical Harmonic Transform for Ambient HOA Component

(166) The Spherical Harmonic Transform is performed by the multiplication of the ambient HOA component of reduced order C.sub.A,RED(l) with the inverse of the mode matrix

(167) $$\Xi_A := \left[S_{A,1}\ \ S_{A,2}\ \ \ldots\ \ S_{A,O_{\mathrm{RED}}}\right] \in \mathbb{R}^{O_{\mathrm{RED}} \times O_{\mathrm{RED}}}$$  (109)
with
$$S_{A,d} := \left[S_0^0(\Omega_{A,d}),\ S_1^{-1}(\Omega_{A,d}),\ S_1^{0}(\Omega_{A,d}),\ \ldots,\ S_{N_{\mathrm{RED}}}^{N_{\mathrm{RED}}}(\Omega_{A,d})\right]^T \in \mathbb{R}^{O_{\mathrm{RED}}},$$  (110)
based on O.sub.RED uniformly distributed directions Ω.sub.A,d, 1≤d≤O.sub.RED:
W.sub.A,RED(l)=(Ξ.sub.A).sup.−1C.sub.A,RED(l).  (111)
Decompression
Inverse Spherical Harmonic Transform

(168) The perceptually decompressed spatial domain signals Ŵ.sub.A,RED(l) are transformed to a HOA domain representation Ĉ.sub.A,RED(l) of order N.sub.RED via an Inverse Spherical Harmonics Transform by
Ĉ.sub.A,RED(l)=Ξ.sub.AŴ.sub.A,RED(l).  (112)
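The transform pair of eqs. (111) and (112) can be sketched as follows, assuming NumPy. The function names are hypothetical, and the square mode matrix is replaced by a random, well-conditioned placeholder instead of spherical harmonics evaluated at uniformly distributed directions. The forward transform solves a linear system rather than forming the inverse explicitly.

```python
import numpy as np

def to_spatial_domain(Xi_A, C_A_red):
    """W_A,RED = Xi_A^{-1} C_A,RED, eq. (111), without forming the inverse."""
    return np.linalg.solve(Xi_A, C_A_red)

def to_hoa_domain(Xi_A, W_A_red):
    """C_A,RED = Xi_A W_A,RED, eq. (112)."""
    return Xi_A @ W_A_red

rng = np.random.default_rng(5)
O_red, B = 4, 6
Xi_A = rng.standard_normal((O_red, O_red)) + 4.0 * np.eye(O_red)   # placeholder
C_A_red = rng.standard_normal((O_red, B))
W_A_red = to_spatial_domain(Xi_A, C_A_red)
C_back = to_hoa_domain(Xi_A, W_A_red)
```

Without perceptual coding between the two transforms, the round trip reproduces the reduced-order ambient component up to numerical precision.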
Order Extension

(169) The Ambisonics order of the HOA representation Ĉ.sub.A,RED(l) is extended to N by appending zeros according to

(170) $$\hat{C}_A(l) := \begin{bmatrix} \hat{C}_{A,\mathrm{RED}}(l) \\ 0_{(O-O_{\mathrm{RED}}) \times B} \end{bmatrix} \in \mathbb{R}^{O \times B},$$  (113)
where 0.sub.m×n denotes a zero matrix with m rows and n columns.
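Order reduction at the encoder and order extension according to eq. (113) act on the rows of the coefficient matrix. The sketch below assumes NumPy, the coefficient ordering of eq. (65) (so that all coefficients with n≤N.sub.RED occupy the first (N.sub.RED+1).sup.2 rows), and example orders N=4 and N.sub.RED=2; the function names are hypothetical.

```python
import numpy as np

def reduce_order(C_A, N_red):
    """Keep only coefficients with n <= N_red, i.e. the first (N_red+1)^2 rows."""
    return C_A[: (N_red + 1) ** 2, :]

def extend_order(C_A_red, N):
    """Append zero rows so that the result has O = (N+1)^2 rows, eq. (113)."""
    O = (N + 1) ** 2
    O_red, B = C_A_red.shape
    return np.vstack([C_A_red, np.zeros((O - O_red, B))])

N, N_red, B = 4, 2, 6
rng = np.random.default_rng(6)
C_A = rng.standard_normal(((N + 1) ** 2, B))
C_A_red = reduce_order(C_A, N_red)       # 9 x 6
C_A_ext = extend_order(C_A_red, N)       # 25 x 6, rows 9 to 24 are zero
```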
HOA Coefficients Composition

(171) The final decompressed HOA coefficients are additively composed of the directional and the ambient HOA component according to
Ĉ(l−1):=Ĉ.sub.A(l−1)+Ĉ.sub.DIR(l−1).  (114)

(172) At this stage, once again a latency of a single frame is introduced to allow the directional HOA component to be computed based on spatial smoothing. By doing this, potential undesired discontinuities in the directional component of the sound field resulting from the changes of the directions between successive frames are avoided.

(173) To compute the smoothed directional HOA component, two successive frames containing the estimates of all individual directional signals are concatenated into a single long frame as
{circumflex over (X)}.sub.INST(l):=[{circumflex over (X)}(l−1){circumflex over (X)}(l)]∈ℝ.sup.D×2B.  (115)

(174) Each of the individual signal excerpts contained in this long frame is multiplied by a window function, e.g. like that of eq. (100). When expressing the long frame {circumflex over (X)}.sub.INST(l) through its components by

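(175) $$\hat{X}_{\mathrm{INST}}(l) = \begin{bmatrix} \hat{x}_{\mathrm{INST},1}(l,1) & \cdots & \hat{x}_{\mathrm{INST},1}(l,2B) \\ \vdots & \ddots & \vdots \\ \hat{x}_{\mathrm{INST},D}(l,1) & \cdots & \hat{x}_{\mathrm{INST},D}(l,2B) \end{bmatrix},$$  (116)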
the windowing operation can be formulated as computing the windowed signal excerpts {circumflex over (x)}.sub.INST,WIN,d(l,j), 1≤d≤D, by
{circumflex over (x)}.sub.INST,WIN,d(l,j)={circumflex over (x)}.sub.INST,d(l,j).Math.w(j), 1≤j≤2B, 1≤d≤D.  (117)

(176) Finally, the total directional HOA component Ĉ.sub.DIR(l−1) is obtained by encoding all the windowed directional signal excerpts into the appropriate directions and superposing them in an overlapped fashion:

(177) $$\hat{C}_{\mathrm{DIR}}(l-1) = \Xi_{\mathrm{DOM}}(l-1) \begin{bmatrix} \hat{x}_{\mathrm{INST,WIN},1}(l-1,B+1) & \cdots & \hat{x}_{\mathrm{INST,WIN},1}(l-1,2B) \\ \vdots & \ddots & \vdots \\ \hat{x}_{\mathrm{INST,WIN},D}(l-1,B+1) & \cdots & \hat{x}_{\mathrm{INST,WIN},D}(l-1,2B) \end{bmatrix} + \Xi_{\mathrm{DOM}}(l) \begin{bmatrix} \hat{x}_{\mathrm{INST,WIN},1}(l,1) & \cdots & \hat{x}_{\mathrm{INST,WIN},1}(l,B) \\ \vdots & \ddots & \vdots \\ \hat{x}_{\mathrm{INST,WIN},D}(l,1) & \cdots & \hat{x}_{\mathrm{INST,WIN},D}(l,B) \end{bmatrix}.$$  (118)
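The overlapped superposition of eq. (118) can be sketched as follows, assuming NumPy. Each windowed long frame is encoded with the mode matrix of its own frame, and the second half of the previous long frame is added to the first half of the current one. The function name and the random placeholder matrices are hypothetical.

```python
import numpy as np

def compose_directional_hoa(Xi_prev, Xi_cur, X_win_prev, X_win_cur):
    """Eq. (118): overlap-add of two windowed long frames of shape (D, 2B),
    encoded with the mode matrices of frames l-1 and l respectively."""
    B = X_win_cur.shape[1] // 2
    return Xi_prev @ X_win_prev[:, B:] + Xi_cur @ X_win_cur[:, :B]

rng = np.random.default_rng(3)
O, D, B = 4, 2, 5
Xi_prev = rng.standard_normal((O, D))            # placeholder for Xi_DOM(l-1)
Xi_cur = rng.standard_normal((O, D))             # placeholder for Xi_DOM(l)
X_win_prev = rng.standard_normal((D, 2 * B))     # windowed long frame l-1
X_win_cur = rng.standard_normal((D, 2 * B))      # windowed long frame l
C_dir = compose_directional_hoa(Xi_prev, Xi_cur, X_win_prev, X_win_cur)
```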
Explanation of Direction Search Algorithm

(178) In the following, the motivation is explained behind the direction search processing described in section Estimation of dominant directions. It is based on some assumptions which are defined first.

(179) Assumptions

(180) The HOA coefficients vector c(j), which is in general related to the time domain amplitude density function d(j,Ω) through
$$c(j) = \int_{\mathcal{S}^2} d(j,\Omega)\, S(\Omega)\, d\Omega,$$  (119)
is assumed to obey the following model:
c(j)=Σ.sub.i=1.sup.Ix.sub.i(j)S(Ω.sub.x.sub.i(l))+c.sub.A(j) for lB+1≤j≤(l+1)B.  (120)

(181) This model states that the HOA coefficients vector c(j) is on one hand created by I dominant directional source signals x.sub.i(j), 1≤i≤I, arriving from the directions Ω.sub.x.sub.i(l) in the l-th frame. In particular, the directions are assumed to be fixed for the duration of a single frame. The number of dominant source signals I is assumed to be distinctly smaller than the total number of HOA coefficients O. Further, the frame length B is assumed to be distinctly greater than O. On the other hand, the vector c(j) consists of a residual component c.sub.A(j), which can be regarded as representing the ideally isotropic ambient sound field.

(182) The individual HOA coefficient vector components are assumed to have the following properties:
The dominant source signals are assumed to be zero mean, i.e.
Σ.sub.j=lB+1.sup.(l+1)Bx.sub.i(j)≈0 ∀1≤i≤I,  (121)
and are assumed to be uncorrelated with each other, i.e.

(183) $$\frac{1}{B} \sum_{j=lB+1}^{(l+1)B} x_i(j)\, x_{i'}(j) \approx \delta_{i-i'}\, \bar{\sigma}_{x_i}^2(l) \quad \forall\, 1 \leq i, i' \leq I$$  (122)
with σ.sub.x.sub.i.sup.2(l) denoting the average power of the i-th signal for the l-th frame.
The dominant source signals are assumed to be uncorrelated with the ambient component of the HOA coefficient vector, i.e.

(184) $$\frac{1}{B} \sum_{j=lB+1}^{(l+1)B} x_i(j)\, c_A(j) \approx 0 \quad \forall\, 1 \leq i \leq I.$$  (123)
The ambient HOA component vector is assumed to be zero mean and is assumed to have the covariance matrix

(185) $$\Sigma_A(l) := \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} c_A(j)\, c_A^T(j).$$  (124)
The direct-to-ambient power ratio DAR(l) of each frame l, which is here defined by

(186) $$\mathrm{DAR}(l) := 10\log_{10}\!\left[\frac{\max_{1 \leq i \leq I} \bar{\sigma}_{x_i}^2(l)}{\left\|\Sigma_A(l)\right\|_2}\right],$$  (125)
is assumed to be greater than a predefined desired value
DAR.sub.MIN, i.e. DAR(l)≥DAR.sub.MIN.  (126)
Explanation of Direction Search

(187) For the explanation the case is considered where the correlation matrix B(l) (see eq. (67)) is computed based only on the samples of the l-th frame without considering the samples of the L−1 previous frames. This operation corresponds to setting L=1. Consequently, the correlation matrix can be expressed by

(188) $$\begin{aligned} B(l) &= \frac{1}{B}\, C(l)\, C^T(l) && (127) \\ &= \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} c(j)\, c^T(j). && (128) \end{aligned}$$

(189) By substituting the model assumption in eq. (120) into eq. (128) and by using equations (122) and (123) and the definition in eq. (124), the correlation matrix B(l) can be approximated as

(190) $$\begin{aligned} B(l) &= \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} \left[\sum_{i=1}^{I} x_i(j)\, S(\Omega_{x_i}(l)) + c_A(j)\right] \left[\sum_{i'=1}^{I} x_{i'}(j)\, S(\Omega_{x_{i'}}(l)) + c_A(j)\right]^T && (129) \\ &= \sum_{i=1}^{I} \sum_{i'=1}^{I} S(\Omega_{x_i}(l))\, S^T(\Omega_{x_{i'}}(l))\, \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} x_i(j)\, x_{i'}(j) + \sum_{i=1}^{I} S(\Omega_{x_i}(l))\, \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} x_i(j)\, c_A^T(j) \\ &\quad + \sum_{i=1}^{I} \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} x_i(j)\, c_A(j)\, S^T(\Omega_{x_i}(l)) + \frac{1}{B} \sum_{j=lB+1}^{(l+1)B} c_A(j)\, c_A^T(j) && (130) \\ &\approx \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, S(\Omega_{x_i}(l))\, S^T(\Omega_{x_i}(l)) + \Sigma_A(l). && (131) \end{aligned}$$

(191) From eq. (131) it can be seen that B(l) approximately consists of two additive components attributable to the directional and to the ambient HOA component. Its $\mathcal{J}(l)$-rank approximation $B_{\mathcal{J}}(l)$ provides an approximation of the directional HOA component, i.e.
$$B_{\mathcal{J}}(l) \approx \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, S(\Omega_{x_i}(l))\, S^T(\Omega_{x_i}(l)),$$  (132)
which follows from eq. (126) on the directional-to-ambient power ratio.

(192) However, it should be stressed that some portion of Σ.sub.A(l) will inevitably leak into $B_{\mathcal{J}}(l)$, since Σ.sub.A(l) has full rank in general and thus, the subspaces spanned by the columns of the matrices Σ.sub.i=1.sup.Iσ.sub.x.sub.i.sup.2(l)S(Ω.sub.x.sub.i(l))S.sup.T(Ω.sub.x.sub.i(l)) and Σ.sub.A(l) are not orthogonal to each other. With eq. (132) the vector σ.sup.2(l) in eq. (77), which is used for the search of the dominant directions, can be expressed by

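(193) $$\begin{aligned} \sigma^2(l) &= \mathrm{diag}\left(\Xi^T B_{\mathcal{J}}(l)\, \Xi\right) && (133) \\ &= \mathrm{diag}\left( \begin{bmatrix} S^T(\Omega_1) B_{\mathcal{J}}(l) S(\Omega_1) & \cdots & S^T(\Omega_1) B_{\mathcal{J}}(l) S(\Omega_Q) \\ \vdots & \ddots & \vdots \\ S^T(\Omega_Q) B_{\mathcal{J}}(l) S(\Omega_1) & \cdots & S^T(\Omega_Q) B_{\mathcal{J}}(l) S(\Omega_Q) \end{bmatrix} \right) && (134) \\ &\approx \mathrm{diag}\left( \begin{bmatrix} \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N^2(\angle(\Omega_1,\Omega_{x_i})) & \cdots & \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N(\angle(\Omega_1,\Omega_{x_i}))\, v_N(\angle(\Omega_{x_i},\Omega_Q)) \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N(\angle(\Omega_Q,\Omega_{x_i}))\, v_N(\angle(\Omega_{x_i},\Omega_1)) & \cdots & \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N^2(\angle(\Omega_Q,\Omega_{x_i})) \end{bmatrix} \right) && (135) \\ &= \left[ \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N^2(\angle(\Omega_1,\Omega_{x_i})),\ \ldots,\ \sum_{i=1}^{I} \bar{\sigma}_{x_i}^2(l)\, v_N^2(\angle(\Omega_Q,\Omega_{x_i})) \right]^T. && (136) \end{aligned}$$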

(194) In eq. (135) the following property of Spherical Harmonics shown in eq. (47) was used:
S.sup.T(Ω.sub.q)S(Ω.sub.q′)=v.sub.N(∠(Ω.sub.q,Ω.sub.q′)).  (137)
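The property in eq. (137) can be checked numerically for the lowest non-trivial order. The sketch below makes two assumptions that are not taken from the text: the Spherical Harmonics are real-valued and orthonormal on the sphere, and v.sub.N is the Legendre sum that follows from the addition theorem, which for N=1 reduces to (1+3 cos Θ)/(4π). The function names and the two test directions are hypothetical.

```python
import math

def sh_vector_order1(theta, phi):
    """Real, orthonormal spherical harmonics up to order 1 for the direction
    (inclination theta, azimuth phi), ordered as S_0^0, S_1^-1, S_1^0, S_1^1."""
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    a0 = 1.0 / math.sqrt(4.0 * math.pi)
    a1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [a0, a1 * y, a1 * z, a1 * x]

def unit_vector(theta, phi):
    """Cartesian unit vector of a direction."""
    return [math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta)]

def v_1(angle):
    """Sum over n = 0, 1 of (2n+1)/(4*pi) * P_n(cos(angle))."""
    return (1.0 + 3.0 * math.cos(angle)) / (4.0 * math.pi)

a, b = (0.3, 1.0), (2.0, -2.5)       # two arbitrary directions
lhs = sum(p * q for p, q in zip(sh_vector_order1(*a), sh_vector_order1(*b)))
cos_angle = sum(p * q for p, q in zip(unit_vector(*a), unit_vector(*b)))
rhs = v_1(math.acos(cos_angle))
```

Under these assumptions the inner product `lhs` depends only on the angle between the two directions and coincides with `rhs`.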

(195) Eq. (136) shows that the σ.sub.q.sup.2(l) components of σ.sup.2(l) are approximations of the powers of signals arriving from the test directions Ω.sub.q, 1≤q≤Q.