METHOD AND APPARATUS FOR ENCODING AND DECODING AN HOA REPRESENTATION

Abstract

The present invention relates to methods and apparatus for encoding an HOA signal representation (c(t)) of a sound field having an order of N and a number O=(N+1).sup.2 of coefficient sequences to a mezzanine HOA signal representation (w.sub.MEZZ(t)). The present invention further relates to methods and apparatus for decoding a reconstructed HOA signal representation from the mezzanine HOA signal representation.

Claims

1.-18. (canceled)

19. A method for encoding an ambisonics signal representation of a sound field having an order N to determine a mezzanine ambisonics signal representation, the method comprising: receiving, by a processor configured to encode the ambisonics signal representation, a first multi-channel signal comprising first channels O, wherein O=(N+1).sup.2, wherein the O channels correspond to O Higher Order Ambisonics (HOA) coefficient sequences, and wherein the ambisonics signal representation is represented by the first multi-channel signal; receiving, by the processor, transforming information for encoding the first multi-channel signal of the ambisonics signal representation, wherein the transforming information includes a matrix comprising mapping information for mapping the O HOA coefficient sequences to O virtual loudspeaker signals; and transforming, by the processor, the first multi-channel signal to a second multi-channel signal based on the transforming information, wherein the mezzanine ambisonics signal representation is represented by the second multi-channel signal, and wherein the second multi-channel signal comprises a second number of channels I, and wherein the I channels represent I groups of virtual loudspeaker signals.

20. The method of claim 19, wherein the I groups comprising of two virtual loudspeakers per group.

21. A computer program product comprising a nontransitive storage medium, the computer program product including code that, when executed by the processor is configured to perform the method of claim 19.

22. An apparatus for encoding an ambisonics signal representation of a sound field having an order N to determine a mezzanine ambisonics signal representation, the apparatus comprising: a first receiver configured to receive a first multi-channel signal comprising a first channels O, wherein O=(N+1).sup.2, wherein the O channels correspond to O Higher Order Ambisonics (HOA) coefficient sequences, and wherein the ambisonics signal representation is represented by the first multi-channel signal; a second receiver configured to receive transforming information for transforming the first multi-channel signal of the first number of channels O to the mezzanine ambisonics signal representation, wherein the transforming information includes a matrix comprising mapping information for mapping the O HOA coefficient sequences to O virtual loudspeaker signals; and a processing unit configured to transform the first multi-channel signal to a second multi-channel signal based on the transforming information, wherein the mezzanine ambisonics signal representation is represented by the second multi-channel signal, and wherein the second multi-channel signal comprises a second number of channels I, and wherein the I channels represent I groups of virtual loudspeaker signals.

23. The apparatus of claim 22, wherein the I groups comprising of two virtual loudspeakers per group.

24. A method for decoding a mezzanine ambisonics signal representation to determine a reconstructed ambisonics signal representation of a sound field having an order N, the method comprising: receiving, by a processor configured to decode the mezzanine ambisonics signal representation, a first multi-channel signal of the mezzanine ambisonics signal representation, the first multi-channel signal of the mezzanine ambisonics signal representation having a first number of channels I; receiving, by the processor, transforming information for decoding the first multi-channel signal of the mezzanine ambisonics signal representation, wherein the transforming information includes matrix information for mapping O number of virtual loudspeakers to O sequences of Higher Order Ambisonics (HOA) coefficient sequences that represent the reconstructed ambisonics signal representation; and transforming, by the processor, the first multi-channel signal to a second multi-channel signal based in part on the transforming information, wherein the second multi-channel signal represents the reconstructed ambisonics signal representation, wherein the second multi-channel signal comprises O channels, wherein O=(N+1).sup.2, and wherein the transforming includes de-grouping the I channels to O de-grouped channels.

25. The method of claim 24, wherein the matrix information includes information regarding a decoding matrix V that is a pseudo inverse of an encoding matrix V+.

26. The method of claim 24, wherein the de-grouping includes de-grouping groups of two virtual loudspeakers.

27. A computer program product comprising a nontransitive storage medium, the computer program product including code that, when executed by the processor is configured to perform the method of claim 24.

28. An apparatus for decoding a mezzanine ambisonics signal representation to determine a reconstructed ambisonics signal representation of a sound field having an order N, the apparatus comprising: a first receiver configured to receive a first multi-channel signal of the mezzanine ambisonics signal representation, the first multi-channel signal of the mezzanine ambisonics signal representation having a first number of channels I; a second receiver configured to receive transforming information for decoding the first multi-channel signal of the mezzanine ambisonics signal representation, wherein the transforming information includes matrix information for mapping O number of virtual loudspeakers to O sequences of Higher Order Ambisonics (HOA) coefficient sequences that represent the reconstructed ambisonics signal representation; and a processing unit configured to transform the first multi-channel signal to a second multi-channel signal based in part on the transforming information, wherein the second multi-channel signal represents the reconstructed ambisonics signal representation, wherein the second multi-channel signal comprises O number of channels, wherein O=(N+1).sup.2, and wherein the transforming includes de-grouping the I channels to O de-grouped channels.

29. The apparatus of claim 28, wherein the matrix information includes information regarding a decoding matrix V that is pseudo inverse of an encoding matrix V+.

30. The apparatus of claim 28, wherein the de-grouping includes de-grouping groups of two virtual loudspeakers.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0043] Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

[0044] FIG. 1 illustrates an exemplary conversion of a combination of object based and HOA sound field representations to a multi-channel PCM format;

[0045] FIG. 2 illustrates an exemplary reconstruction of a combination of object based and HOA sound field representations from a multi-channel PCM format;

[0046] FIG. 3 illustrates an exemplary normalized dispersion function ξ.sub.N(Θ) for different Ambisonics orders N and for angles Θ∈[0,π];

[0047] FIG. 4 depicts an exemplary illustration of directions Ω.sub.j.sup.(N), 1≤j≤O for N=3 (computed according to [3]) presented in a three-dimensional coordinate system as sampling positions (drawn as crosses) on the unit sphere, where only those directions that are visible from the given viewpoint are shown;

[0048] FIG. 5 illustrates exemplary dispersion functions ξ.sub.N(Θ) for the 9-th and 11-th virtual loudspeaker signals computed according to the conventional spatial transform using directions Ω.sub.j.sup.(3), 1≤j≤16 computed according to [3]. The values of the dispersion function are coded into the shading of the sphere, where high values are shaded dark grey to black and low values light grey to white;

[0049] FIG. 6 illustrates exemplary dispersion functions resulting from the combination of the mode vectors for the 9-th and 11-th virtual loudspeaker directions computed according to the conventional spatial transform using directions Ω.sub.j.sup.(3), 1≤j≤16 computed according to [3]. The values of the dispersion function are coded into the shading of the sphere, where high values are shaded dark grey to black and low values light grey to white;

[0050] FIG. 7 illustrates an exemplary spherical coordinate system.

DESCRIPTION OF EMBODIMENTS

[0051] Even if not explicitly described, the following embodiments may be employed in any combination or sub-combination.

[0052] In the following a mezzanine HOA format is described that is computed by a modified spatial transform of a conventional HOA representation consisting of O coefficient sequences to an arbitrary and non-quadratic number I of virtual loudspeaker signals.

[0053] Without loss of generality, it is further assumed in the following that I<O, since for the opposite case it is always possible to artificially extend the number of coefficient sequences of the original HOA representation by appending an appropriate number of zero coefficient sequences.

[0054] A first optional step is to reduce the order N of the original HOA representation to a smaller order N.sub.R such that the resulting number O.sub.R=(N.sub.R+1).sup.2 of coefficient sequences is the next square number above the desired number I of virtual loudspeaker signals, i.e. the reduced number O.sub.R of coefficient sequences is the smallest square number that is greater than I. The rationale behind this step is that it is not reasonable to represent an HOA representation of an order greater than N.sub.R by a number I<O.sub.R of virtual loudspeaker signals whose directions cover the sphere as uniformly as possible. This means that in the following the transform of a conventional HOA representation consisting of O.sub.R (rather than O) coefficient sequences to an arbitrary number I of virtual loudspeaker signals is considered. Nevertheless, it is also possible to set O.sub.R=O and to ignore this optional order reduction.

[0055] In case this first optional step is not carried out, in the following N.sub.R is replaced by N, O.sub.R by O, c.sub.R(t) by c(t), S.sub.n,R by S.sub.n, Ψ.sub.R by Ψ, Ψ.sub.R.sup.−1 by Ψ.sup.−1, and w.sub.R(t) by w(t).
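The optional order reduction of paragraph [0054] can be sketched as follows. This is a minimal illustration rather than part of the described embodiments; the function name `reduced_order` and its arguments are hypothetical, and the sketch assumes I<O as stated in paragraph [0053].

```python
import math

def reduced_order(num_signals, order):
    """Return (N_R, O_R) for I = num_signals virtual loudspeaker signals.

    O_R is the smallest square number greater than I and N_R = sqrt(O_R) - 1.
    Assumes I < O = (order + 1)**2, as in paragraph [0053].
    """
    num_coeffs = (order + 1) ** 2
    if not 0 < num_signals < num_coeffs:
        raise ValueError("expected 0 < I < O")
    n_r = math.isqrt(num_signals)  # floor(sqrt(I)), hence (n_r + 1)**2 > I
    return n_r, (n_r + 1) ** 2

print(reduced_order(8, 3))   # (2, 9)
print(reduced_order(12, 3))  # (3, 16)
```

For I=12 and N=3 the result is O.sub.R=O=16, which corresponds to the case in which no order reduction takes place.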

[0056] The next step is to consider the conventional spatial transform for an HOA representation of order N.sub.R (described in section Spatial transform), and to sub-divide the virtual speaker directions Ω.sub.j.sup.(N.sup.R.sup.), 1≤j≤O.sub.R into the desired number I of groups of neighbouring directions. The grouping is motivated by a spatially selective reduction of spatial resolution, which means that the grouped virtual loudspeaker signals are meant to be replaced by a single one. The effect of this replacement on the sound field is explained in section Illustration of grouping effect. The grouping can be expressed by I sets 𝒢.sub.i, i=1, . . . , I, which contain the indices of the virtual directions grouped into the i-th group.

[0057] Subsequently, the mode vectors

$S_{n,R} := \left[ S_{0}^{0}(\Omega_{n}^{(N_R)}) \;\; S_{1}^{-1}(\Omega_{n}^{(N_R)}) \;\; S_{1}^{0}(\Omega_{n}^{(N_R)}) \;\; S_{1}^{1}(\Omega_{n}^{(N_R)}) \;\; \ldots \;\; S_{N_R}^{N_R-1}(\Omega_{n}^{(N_R)}) \;\; S_{N_R}^{N_R}(\Omega_{n}^{(N_R)}) \right]^{T} \in \mathbb{R}^{O_R}$ (3)

for directions Ω.sub.n.sup.(N.sup.R.sup.) within each group are linearly combined resulting in the vectors

$V_{i} = \sum_{n \in \mathcal{G}_{i}} \alpha_{n} S_{n,R} \in \mathbb{R}^{O_R},$ (7)

where α.sub.n≥0 denotes the weight of S.sub.n,R for the combination. The choice of the weights is addressed in more detail in the following section Choice of the weights for combination of mode vectors.

[0058] The vectors V.sub.i are finally used to construct the matrix

$V := K \cdot \left[ V_{1} \;\; V_{2} \;\; \ldots \;\; V_{I} \right] \in \mathbb{R}^{O_R \times I}$ (8)

with an arbitrary positive real-valued scaling factor K>0 to replace the scaled mode matrix Ψ used for the conventional spatial transform.
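The construction of V from equations (7) and (8) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the method of the embodiments: it uses N.sub.R=1 (O.sub.R=4), four tetrahedral directions instead of the directions of [3], zero-based group indices, and the hypothetical helper names `mode_vector_order1` and `build_v`. The order-1 mode vector follows from equations (37) to (39) written in Cartesian form.

```python
import numpy as np

def mode_vector_order1(direction):
    """Order-1 mode vector [S_0^0, S_1^-1, S_1^0, S_1^1] for a unit vector (x, y, z)."""
    x, y, z = direction
    return np.array([1.0, np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x])

def build_v(mode_vectors, groups, weights, scale=1.0):
    """Stack V_i = sum_{n in G_i} alpha_n * S_n as columns and scale by K."""
    columns = [sum(weights[n] * mode_vectors[n] for n in group) for group in groups]
    return scale * np.column_stack(columns)

# Assumed example: four tetrahedral directions, I = 3 groups.
directions = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
mode_vectors = [mode_vector_order1(d) for d in directions]
groups = [[0, 1], [2], [3]]      # zero-based index sets for the three groups
weights = [1.0, 1.0, 1.0, 1.0]   # alpha_n = 1 for every direction
V = build_v(mode_vectors, groups, weights)
print(V.shape)                   # (4, 3)
```

The first column of V is the sum of the two grouped mode vectors, while the ungrouped directions keep their own mode vector as a column.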

[0059] The mezzanine HOA representation w.sub.MEZZ(t) is then computed from the order reduced HOA representation, denoted by c.sub.R(t), through

$w_{\mathrm{MEZZ}}(t) = V^{+} \cdot c_{R}(t)$ (9)

with (⋅).sup.+ indicating the Moore-Penrose pseudoinverse of a matrix.

[0060] The inverse transform for computing a recovered conventional HOA representation ĉ.sub.R(t) of order N.sub.R from the mezzanine HOA representation is given by

$\hat{c}_{R}(t) = V \cdot w_{\mathrm{MEZZ}}(t).$ (10)

An N-th order HOA representation ĉ(t) can be recovered by zero-padding ĉ.sub.R(t) according to

[00001] $\begin{matrix} \hat{c}(t) = \begin{bmatrix} \hat{c}_{R}(t) \\ 0 \end{bmatrix}, & (11) \end{matrix}$

where 0 denotes a zero vector of dimension O−O.sub.R.

[0061] Note that, in general, the transform is not lossless such that ĉ(t)≠c(t). This is due to the order reduction on one hand, and the fact that the rank of the transform matrix V is I at most on the other hand. The latter can be expressed by a spatially selective reduction of spatial resolution resulting from the grouping of virtual speaker directions, which will be illustrated in the next section.
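The forward transform (9), the inverse transform (10) and the zero-padding (11) can be illustrated with a toy example. The sketch assumes N=2 (O=9), N.sub.R=1 (O.sub.R=4), I=3, K=1, order reduction by truncating the coefficient vector, and an order-1 mode matrix for four tetrahedral directions; the names `encode` and `decode` are hypothetical.

```python
import numpy as np

O, O_R = 9, 4  # assumed toy sizes: N = 2, N_R = 1

# Order-1 mode vectors (as columns) for four tetrahedral directions.
PSI_R = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1, -1, -1,  1],
                  [1,  1, -1, -1]], dtype=float)

# Group directions 1 and 2, keep 3 and 4 on their own (alpha_n = 1, K = 1).
V = np.column_stack([PSI_R[:, 0] + PSI_R[:, 1], PSI_R[:, 2], PSI_R[:, 3]])
V_PINV = np.linalg.pinv(V)

def encode(c):
    """Eq. (9): truncate to O_R coefficients, then w_MEZZ = V^+ c_R."""
    return V_PINV @ c[:O_R]

def decode(w_mezz):
    """Eqs. (10), (11): c_R_hat = V w_MEZZ, zero-padded to O coefficients."""
    return np.concatenate([V @ w_mezz, np.zeros(O - O_R)])

c = np.arange(1.0, O + 1.0)
c_hat = decode(encode(c))
print(np.allclose(c_hat, c))                              # False: lossy
print(np.allclose(encode(decode(encode(c))), encode(c)))  # True
```

The second check reflects that V has full column rank in this example, so re-encoding a decoded signal returns the same mezzanine signals even though the HOA representation itself is not recovered exactly.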

[0062] A somewhat different computation of the mezzanine HOA representation compared to equation (9) is obtained by expressing matrix V by

$V = \Psi_{R} \cdot A,$ (12)

where Ψ.sub.R denotes the mode matrix of the reduced order N.sub.R with respect to the directions Ω.sub.j.sup.(N.sup.R.sup.), 1≤j≤O.sub.R, and where A∈ℝ.sup.O.sup.R.sup.×I is a weighting factor matrix, whose elements a.sub.i,n can be expressed in dependence on the weights α.sub.n, n=1, . . . , O.sub.R, by

[00002] $\begin{matrix} a_{i,n} = \begin{cases} \alpha_{n} & \text{if the } n\text{-th direction is grouped into the } i\text{-th group} \\ 0 & \text{else} \end{cases} & (13) \end{matrix}$

[0063] The alternative mezzanine HOA representation can then be computed from the order reduced HOA representation c.sub.R(t) by

$w_{\mathrm{MEZZ,ALT}}(t) = A^{+} \cdot \Psi_{R}^{-1} \cdot c_{R}(t),$ (14)

with the inverse transform being equivalent to equation (10), i.e.

$\hat{c}_{R,\mathrm{ALT}}(t) = V \cdot w_{\mathrm{MEZZ,ALT}}(t).$ (15)

[0064] By expressing equation (14) as

$w_{\mathrm{MEZZ,ALT}}(t) = A^{+} \cdot w_{R}(t),$ (16)

where

$w_{R}(t) = \Psi_{R}^{-1} \cdot c_{R}(t),$ (17)

it can be seen that the virtual loudspeaker signals w.sub.MEZZ,ALT(t) of this alternative transform are computed by a linear combination of the virtual loudspeaker signals w.sub.R(t) of the conventional spatial transform. Finally, it should be noted that the mezzanine HOA representation w.sub.MEZZ(t) is optimal in the sense that the corresponding recovered conventional HOA representation ĉ.sub.R(t) has the smallest error (measured by the Euclidean norm) to the order-reduced original HOA representation c.sub.R(t). Hence, it should be the preferred choice to keep the losses during the transform as small as possible. The alternative mezzanine HOA representation w.sub.MEZZ,ALT(t) has the property of best approximating (measured by the Euclidean norm) the virtual loudspeaker signals w.sub.R(t) of the conventional spatial transform.
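The two optimality statements can be checked numerically on a small example. The sketch below assumes N.sub.R=1, K=1, four deliberately non-uniform directions (so that the mode matrix is not orthogonal and the two transforms differ), unit weights, and an arbitrary test frame; all names are hypothetical.

```python
import numpy as np

def mode_vector_order1(direction):
    """Order-1 mode vector [S_0^0, S_1^-1, S_1^0, S_1^1] for a unit vector (x, y, z)."""
    x, y, z = direction
    return np.array([1.0, np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x])

# Assumed directions; the first two are neighbours and form one group.
directions = [(1.0, 0.0, 0.0), (0.8, 0.6, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
PSI_R = np.column_stack([mode_vector_order1(d) for d in directions])

# Weighting matrix A (O_R x I) for groups {1, 2}, {3}, {4} with alpha_n = 1.
A = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
V = PSI_R @ A                                   # eq. (12)

c_r = PSI_R @ np.array([0.9, 0.1, -0.4, 0.3])   # arbitrary order-reduced frame
w_r = np.linalg.solve(PSI_R, c_r)               # eq. (17)
w_mezz = np.linalg.pinv(V) @ c_r                # eq. (9)
w_alt = np.linalg.pinv(A) @ w_r                 # eq. (16)

def hoa_error(w):
    """Euclidean error of the recovered HOA frame, eq. (10)."""
    return np.linalg.norm(c_r - V @ w)

def speaker_error(w):
    """Euclidean error with respect to the conventional loudspeaker signals."""
    return np.linalg.norm(w_r - A @ w)

print(hoa_error(w_mezz) < hoa_error(w_alt))          # True
print(speaker_error(w_alt) < speaker_error(w_mezz))  # True
```

Both results follow from the least-squares property of the Moore-Penrose pseudoinverse: V.sup.+ minimises the error in the HOA domain, whereas A.sup.+ minimises the error in the virtual loudspeaker domain.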

[0065] In practice, it is possible to pre-compute the matrices V and corresponding matrices V.sup.+ (or, for the alternative embodiment processing, the matrices A.sup.+ and Ψ.sub.R.sup.−1, or their product A.sup.+.Math.Ψ.sub.R.sup.−1) for different desired numbers I of virtual loudspeaker signals and for corresponding reduced orders N.sub.R of input HOA representations. Storing the resulting matrices V within an inverse transform processing unit and storing the resulting matrices V.sup.+ (or for the alternative processing the matrices A.sup.+ and Ψ.sub.R.sup.−1, or their product A.sup.+.Math.Ψ.sub.R.sup.−1) within the transform processing unit, will define the behaviour of the transform processing unit and the inverse transform processing unit for different desired numbers I of virtual loudspeaker signals and corresponding reduced orders N.sub.R of input HOA representations.

Choice of the Weights for Combination of Mode Vectors

[0066] The weights can be used for controlling the reduction of the spatial resolution in the region covered by the directions Ω.sub.n.sup.(N.sup.R.sup.) of the i-th group, i.e. for n∈𝒢.sub.i. In particular, a greater weight α.sub.n, compared to other weights in the same group, can be applied to ensure that the resolution in the neighbourhood of the direction Ω.sub.n.sup.(N.sup.R.sup.) is not affected as much as in the neighbourhood of the other directions in the same group. Setting an individual weight α.sub.n to a low value (or even to zero) has the effect of attenuating (or even removing) contributions to the resulting sound field from general plane waves with directions of incidence in the neighbourhood of direction Ω.sub.n.sup.(N.sup.R.sup.).

[0067] An exemplary reasonable choice for the weights is

$\alpha_{n} = 1 \quad \forall n \in \mathcal{G}_{i},$ (18)

where all mode vectors are combined equally. With this choice the spatial resolution is reduced uniformly over the neighbourhood of the directions Ω.sub.n.sup.(N.sup.R.sup.) of the i-th group, i.e. for n∈𝒢.sub.i. Further, the created virtual loudspeaker signals w.sub.MEZZ,i(t) will have approximately the same value range as the average of the replaced virtual loudspeaker signals w.sub.n(t), n∈𝒢.sub.i. Hence, assuming that the original HOA representation is normalised such that virtual loudspeaker signals resulting from the conventional spatial transform lie in the same value range of [−1,1[, this choice of the weights is the preferred one for the transmission of HOA representations over SDI.

[0068] An alternative exemplary choice is

[00003] $\begin{matrix} \alpha_{n} = \frac{1}{|\mathcal{G}_{i}|} \quad \forall n \in \mathcal{G}_{i}, & (19) \end{matrix}$

where |⋅| denotes the cardinality of a set. In this case, the spatial blurring is the same as with equation (18). However, the value range of the created virtual loudspeaker signals is approximately equal to that of the sum of the replaced virtual loudspeaker signals.
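The effect of the two weight choices on the value range can be seen in a small numerical sketch. It assumes N.sub.R=1, K=1 and an order-1 mode matrix for four tetrahedral directions, whose mode vectors happen to be mutually orthogonal, so that the average and the sum are reproduced exactly here; in general the relation holds only approximately. The helper `build_v` and the test signal are hypothetical.

```python
import numpy as np

# Order-1 mode vectors (as columns) for four tetrahedral directions.
PSI_R = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1, -1, -1,  1],
                  [1,  1, -1, -1]], dtype=float)
groups = [[0, 1], [2], [3]]  # zero-based index sets of the three groups

def build_v(weight_for_group):
    """Columns V_i = alpha * sum of the mode vectors of group i (K = 1)."""
    columns = [weight_for_group(g) * PSI_R[:, g].sum(axis=1) for g in groups]
    return np.column_stack(columns)

w_r = np.array([0.1, 0.3, -0.5, 0.7])  # conventional virtual loudspeaker signals
c_r = PSI_R @ w_r                      # corresponding HOA frame

w_equal = np.linalg.pinv(build_v(lambda g: 1.0)) @ c_r            # eq. (18)
w_scaled = np.linalg.pinv(build_v(lambda g: 1.0 / len(g))) @ c_r  # eq. (19)

print(np.round(w_equal, 6))   # approximately [0.2, -0.5, 0.7]: average of 0.1 and 0.3
print(np.round(w_scaled, 6))  # approximately [0.4, -0.5, 0.7]: sum of 0.1 and 0.3
```

With equation (18) the grouped signal stays within the value range of the signals it replaces, which is consistent with the preference stated above for transmission over SDI.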

Illustration of Grouping Effect

[0069] To understand the effects of the proposed modified spatial transform, it is reasonable to first understand the conventional spatial transform.

[0070] For HOA the sound pressure p(t,x) at time t and position x in a sound source free listening area can be represented by a superposition of an infinite number of general plane waves arriving from all possible directions Ω=(θ,ϕ), i.e.

$p(t,x) = \int_{\mathbb{S}^{2}} p_{\mathrm{GPW}}(t,x,\Omega)\, d\Omega,$ (20)

where $\mathbb{S}^{2}$ denotes the unit sphere in three-dimensional space. The function

$c(t,\Omega) = p_{\mathrm{GPW}}(t,x,\Omega)\big|_{x = x_{\mathrm{ORIG}}}$ (21)

represents the contribution of each general plane wave to the sound pressure in the coordinate origin x.sub.ORIG=(0 0 0).sup.T. This function is expanded into a series of Spherical Harmonics for each time instant t according to

c(t,Ω=(θ,ϕ))=Σ.sub.n=0.sup.NΣ.sub.m=−n.sup.nc.sub.n.sup.m(t)S.sub.n.sup.m(θ,ϕ), (22)

wherein the conventional HOA coefficient sequences c.sub.n.sup.m(t) are the weights of the expansion, regarded as functions over time t. Assuming an infinite order of the expansion (22), the function c(t, Ω) for a single general plane wave y(t) from direction Ω.sub.0 can be factored into a time dependent and a direction dependent component according to

$c(t,\Omega) = y(t) \cdot \delta(\Omega - \Omega_{0}) \quad \text{for } N \rightarrow \infty,$ (23)

where δ(⋅) denotes the Dirac delta function. The corresponding HOA coefficient sequences are given by

[00004] $\begin{matrix} c_{n}^{m}(t) = \frac{1}{4\pi} \cdot \int_{\mathbb{S}^{2}} c(t,\Omega)\, S_{n}^{m}(\theta,\phi)\, d\Omega & (24) \\ = y(t) \cdot \frac{1}{4\pi} \cdot S_{n}^{m}(\theta_{0},\phi_{0}) & (25) \end{matrix}$

The truncation of the expansion (22) to a finite order N, however, introduces a spatial dispersion on the direction dependent component. This can be seen by plugging the expression (25) for the HOA coefficients into the expansion (22), resulting in

[00005] $\begin{matrix} c(t,(\theta,\phi)) = y(t) \cdot \frac{1}{4\pi} \cdot \sum_{n=0}^{N} \sum_{m=-n}^{n} S_{n}^{m}(\theta_{0},\phi_{0})\, S_{n}^{m}(\theta,\phi) & (26) \end{matrix}$

for a finite order N. It can be shown (see [9]) that equation (26) can be simplified to

[00006] $\begin{matrix} c(t,(\theta,\phi)) = y(t) \cdot \xi_{N}(\Theta) & (27) \\ \text{with } \xi_{N}(\Theta) := \frac{N+1}{4\pi(\cos\Theta - 1)} \left( P_{N+1}(\cos\Theta) - P_{N}(\cos\Theta) \right), & (28) \end{matrix}$

wherein Θ denotes the angle between the two vectors pointing towards the directions Ω and Ω.sub.0.
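The dispersion function of equation (28) can be evaluated with a short sketch. It relies on the addition theorem for spherical harmonics normalised as in equation (37), by which the inner sum over m in equation (26) equals (2n+1)·P.sub.n(cos Θ); the closed form (28) and this series should then agree. The function names and the handling of the limit Θ→0 are assumptions of this illustration.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def dispersion_closed_form(order, angle):
    """xi_N(Theta) according to eq. (28)."""
    x = math.cos(angle)
    if math.isclose(x, 1.0):
        # Limit for Theta -> 0, where eq. (28) becomes 0/0.
        return (order + 1) ** 2 / (4 * math.pi)
    diff = legendre(order + 1, x) - legendre(order, x)
    return (order + 1) / (4 * math.pi * (x - 1)) * diff

def dispersion_series(order, angle):
    """xi_N(Theta) as the truncated series from eq. (26)."""
    x = math.cos(angle)
    return sum((2 * n + 1) * legendre(n, x) for n in range(order + 1)) / (4 * math.pi)

for order in (1, 3, 6):
    print(order, dispersion_closed_form(order, 1.0), dispersion_series(order, 1.0))
```

The two columns printed for each order should coincide up to rounding, and the value at Θ=0 grows as (N+1).sup.2/(4π), in line with the narrowing of the main lobe for increasing order.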

[0071] Now, the directional dispersion effect becomes obvious by comparing the case for an infinite order shown in equation (23) with the case for a finite order expressed by equation (27). It can be seen that for the latter case the Dirac delta function is replaced by the dispersion function ξ.sub.N(Θ), which is illustrated in FIG. 3 after having been normalised by its maximum value for different Ambisonics orders N, whereby the vertical scale is

[00007] $\frac{ξ_{N} (Θ)}{\max_{Θ} ξ_{N} (Θ)}$

and the horizontal scale is Θ. In this context, dispersion means that a general plane wave is replaced by infinitely many general plane waves, of which the amplitudes are modelled by the dispersion function ξ.sub.N(Θ).

[0072] Because the first zero of ξ.sub.N(Θ) is located approximately at

[00008] $\frac{π}{N}$

for N≥4 (see [9]), the dispersion effect is reduced (and thus the spatial resolution is improved) with increasing Ambisonics order N. For N→∞ the dispersion function ξ.sub.N(Θ) converges to the Dirac delta function.

[0073] Having the dispersion effect in mind, the conventional spatial transform is considered again and the relation (5) between the conventional HOA coefficient sequences and the virtual loudspeaker signals is reformulated using below equation (35) and equations (1), (2) and (3) to

$c_{n}^{m}(t) = \sum_{j=1}^{O} K \cdot S_{n}^{m}(\Omega_{j}^{(N)}) \cdot w_{j}(t).$ (29)

It appears that the contribution due to each j-th virtual loudspeaker has the same form as in expression (25) with

[00009] $K = \frac{1}{4 π} .$

That actually means that the virtual loudspeaker signals have to be interpreted as directionally dispersed general plane wave signals.

[0074] To illustrate this, the conventional spatial transform for a third order HOA representation (i.e. for N=3) is considered, where the directions for the virtual loudspeakers Ω.sub.j.sup.(N), 1≤j≤O (computed according to [3]) are depicted in FIG. 4.

[0075] FIG. 5 exemplarily shows the dispersion functions for the 9-th and 11-th virtual loudspeaker signals in FIG. 5a and FIG. 5b, respectively. To further illustrate the effect of virtual directions grouping for the modified spatial transform, it is assumed that the corresponding directions Ω.sub.9.sup.(3) and Ω.sub.11.sup.(3) have been grouped together. The direction-dependent dispersion of the contribution of the resulting virtual loudspeaker signal is shown for two different choices of weights in FIG. 6 in order to exemplarily demonstrate the effect of the weighting.

[0076] For FIG. 6a an equal weighting of α.sub.9=α.sub.11=1 is assumed, such that the resulting dispersion function is a pure sum of the dispersion functions for the 9-th and 11-th virtual loudspeaker signals. In FIG. 6b the weighting for the dispersion function for the 9-th virtual loudspeaker is reduced to α.sub.9=0.3, resulting in a more concentrated dispersion function and making its maximum move closer to the direction Ω.sub.11.sup.(3).

Basics of Higher Order Ambisonics

[0077] Higher Order Ambisonics (HOA) is based on the description of a sound field within a compact spatial area of interest, which is assumed to be free of sound sources. The spatio-temporal behaviour of the sound pressure p(t,x) at time t and position x within the spatial area of interest is physically fully determined by the homogeneous wave equation. In the following, a spherical coordinate system is assumed as shown in FIG. 7. In this coordinate system the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space x=(r,θ,ϕ).sup.T is represented by a radius r≥0 (i.e. the distance to the coordinate origin), an inclination angle θ∈[0,π] measured from the polar axis z and an azimuth angle ϕ∈[0,2π] measured counter-clockwise in the x-y plane from the x axis. Further, (⋅).sup.T denotes a transposition.

[0078] It can be shown (see [10]) that the Fourier transform of the sound pressure with respect to time denoted by ℱ.sub.t(⋅), i.e.

$P(\omega,x) = \mathcal{F}_{t}(p(t,x)) = \int_{-\infty}^{\infty} p(t,x)\, e^{-i\omega t}\, dt$ (30)

with ω denoting the angular frequency and i indicating the imaginary unit, can be expanded into a series of Spherical Harmonics according to

P(ω=kc.sub.s,r,θ,ϕ)=Σ.sub.n=0.sup.NΣ.sub.m=−n.sup.nA.sub.n.sup.m(k)j.sub.n(kr)S.sub.n.sup.m(θ,ϕ). (31)

In equation (31), c.sub.s denotes the speed of sound and k denotes the angular wave number, which is related to the angular frequency ω by

[00010] $k = \frac{ω}{c_{s}} .$

Further, j.sub.n(⋅) denote the spherical Bessel functions of the first kind and S.sub.n.sup.m(θ,ϕ) denote the real valued Spherical Harmonics of order n and degree m, which are defined in below section Definition of real valued Spherical Harmonics. The expansion coefficients A.sub.n.sup.m(k) depend only on the angular wave number k. Note that it has been implicitly assumed that sound pressure is spatially band-limited. Thus the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation.

[0079] Because the spatial area of interest is assumed to be free of sound sources, the sound field can be represented by a superposition of an infinite number of general plane waves arriving from all possible directions Ω=(θ,ϕ), i.e.

$p(t,x) = \int_{\mathbb{S}^{2}} p_{\mathrm{GPW}}(t,x,\Omega)\, d\Omega,$ (32)

where 𝕊.sup.2 indicates the unit sphere in the three-dimensional space and p.sub.GPW(t, x, Ω) denotes the contribution of the general plane wave from direction Ω to the pressure at time t and position x. Evaluating the contribution of each general plane wave to the pressure in the coordinate origin x.sub.ORIG=(0 0 0).sup.T provides a time and direction dependent function

c(t,Ω)=p.sub.GPW(t,x,Ω)|.sub.x=x.sub.ORIG, (33)

which is then for each time instant expanded into a series of Spherical Harmonics according to

c(t,Ω=(θ,ϕ))=Σ.sub.n=0.sup.NΣ.sub.m=−n.sup.nc.sub.n.sup.m(t)S.sub.n.sup.m(θ,ϕ). (34)

[0080] The weights c.sub.n.sup.m(t) of the expansion, regarded as functions over time t, are referred to as continuous-time HOA coefficient sequences and can be shown to always be real-valued. Collected in a single vector c(t) according to

c(t)=[c.sub.0.sup.0(t)c.sub.1.sup.−1(t)c.sub.1.sup.0(t)c.sub.1.sup.1(t)c.sub.2.sup.−2(t)c.sub.2.sup.−1(t)c.sub.2.sup.0(t)c.sub.2.sup.1(t)c.sub.2.sup.2(t) . . . c.sub.N.sup.N−1(t)c.sub.N.sup.N(t)].sup.T, (35)

they constitute the actual HOA sound field representation. The position index of an HOA coefficient sequence c.sub.n.sup.m(t) within the vector c(t) is given by n(n+1)+1+m. The overall number of elements in the vector c(t) is given by O=(N+1).sup.2.
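The position rule n(n+1)+1+m and the coefficient count O=(N+1).sup.2 can be expressed directly in code. The following sketch uses hypothetical function names and one-based positions, as in the text.

```python
import math

def hoa_position(n, m):
    """One-based position of c_n^m(t) within c(t): n(n+1)+1+m."""
    if n < 0 or abs(m) > n:
        raise ValueError("expected n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m

def hoa_order_degree(position):
    """Inverse mapping: recover (n, m) from a one-based position."""
    n = math.isqrt(position - 1)
    return n, position - 1 - n * (n + 1)

def num_coefficients(order):
    """Overall number of coefficient sequences O = (N+1)^2."""
    return (order + 1) ** 2

print(hoa_position(2, -2))      # 5
print(hoa_order_degree(16))     # (3, 3)
print(num_coefficients(3))      # 16
```

Enumerating all pairs (n, m) up to order N in the sequence of equation (35) visits the positions 1 to O exactly once.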

[0081] The knowledge of the continuous-time HOA coefficient sequences is theoretically sufficient for perfect reconstruction of the sound pressure within the spatial area of interest, since it can be shown that their Fourier transforms with respect to time, i.e. C.sub.n.sup.m(ω)=ℱ.sub.t(c.sub.n.sup.m(t)), are related to the expansion coefficients A.sub.n.sup.m(k) (from equation (31)) by

$A_{n}^{m}(k) = i^{n}\, C_{n}^{m}(\omega = k c_{s}).$ (36)

Definition of Real Valued Spherical Harmonics

[0082] The real-valued spherical harmonics S.sub.n.sup.m(θ,ϕ) (assuming SN3D normalisation, see chapter 3.1 in [2]) are given by

[00011] $\begin{matrix} S_{n}^{m}(\theta,\phi) = \sqrt{(2n+1) \frac{(n-|m|)!}{(n+|m|)!}}\; P_{n,|m|}(\cos\theta)\; \mathrm{trg}_{m}(\phi) & (37) \\ \text{with } \mathrm{trg}_{m}(\phi) = \begin{cases} \sqrt{2} \cos(m\phi) & m > 0 \\ 1 & m = 0 \\ -\sqrt{2} \sin(m\phi) & m < 0 \end{cases} & (38) \end{matrix}$

The associated Legendre functions P.sub.n,m(x) are defined as

[00012] $\begin{matrix} P_{n, m} (x) = {(1 - x^{2})}^{\frac{m}{2}} \frac{d^{m}}{d x^{m}} P_{n} (x), m \geq 0 & (39) \end{matrix}$

with the Legendre polynomial P.sub.n(x) and, unlike in [10], without the Condon-Shortley phase term (−1).sup.m.
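Equations (37) to (39) can be turned into a small reference routine. The sketch below is an illustration only: it derives the associated Legendre functions from polynomial coefficients of P.sub.n(x), omits the Condon-Shortley phase as stated above, and uses hypothetical function names.

```python
import math

def legendre_coeffs(n):
    """Coefficients (lowest power first) of the Legendre polynomial P_n."""
    prev, cur = [1.0], [0.0, 1.0]
    if n == 0:
        return prev
    for k in range(1, n):
        # Bonnet: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        nxt = [0.0] * (k + 2)
        for i, c in enumerate(cur):
            nxt[i + 1] += (2 * k + 1) * c / (k + 1)
        for i, c in enumerate(prev):
            nxt[i] -= k * c / (k + 1)
        prev, cur = cur, nxt
    return cur

def assoc_legendre(n, m, x):
    """P_{n,m}(x) per eq. (39), for m >= 0, without Condon-Shortley phase."""
    coeffs = legendre_coeffs(n)
    for _ in range(m):  # differentiate m times
        coeffs = [i * c for i, c in enumerate(coeffs)][1:]
    derivative = sum(c * x ** i for i, c in enumerate(coeffs))
    return (1.0 - x * x) ** (m / 2) * derivative

def real_sh(n, m, theta, phi):
    """Real-valued spherical harmonic S_n^m(theta, phi) per eqs. (37), (38)."""
    if n < 0 or abs(m) > n:
        raise ValueError("expected n >= 0 and -n <= m <= n")
    am = abs(m)
    norm = math.sqrt((2 * n + 1) * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        trg = math.sqrt(2) * math.cos(m * phi)
    elif m == 0:
        trg = 1.0
    else:
        trg = -math.sqrt(2) * math.sin(m * phi)
    return norm * assoc_legendre(n, am, math.cos(theta)) * trg

# Sum over m of S_2^m squared; by the addition theorem this equals 2n+1 = 5.
print(sum(real_sh(2, m, 0.7, 1.9) ** 2 for m in range(-2, 3)))
```

The printed value should equal 2n+1=5 up to rounding for any direction, which is a convenient sanity check of the normalisation in equation (37).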

[0083] There are also alternative definitions of ‘spherical harmonics’. In such case the transformation described is also valid.

[0084] The described processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the complete processing.

[0085] The instructions for operating the processor or the processors according to the described processing can be stored in one or more memories. The at least one processor is configured to carry out these instructions.

REFERENCES

[0086] [1] ISO/IEC JTC1/SC29/WG11 DIS 23008-3, “Information technology—High efficiency coding and media delivery in heterogeneous environments—Part 3: 3D Audio”, July 2014

[0087] [2] J. Daniel, “Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia”, PhD thesis, Université Paris 6, 2001

[0088] [3] J. Fliege, U. Maier, “A two-stage approach for computing cubature formulae for the sphere”, Technical report, Section Mathematics, University of Dortmund, 1999. Node numbers are found at http://www.mathematik.uni-dortmund.de/lsx/research/projects/fliege/nodes/nodes.html

[0089] [4] EP 2469742 A2

[0090] [5] PCT/EP2015/063912

[0091] [6] WO 2014/090660 A1

[0092] [7] WO 2014/177455 A1

[0093] [8] WO 2013/171083 A1

[0094] [9] B. Rafaely, “Plane-wave decomposition of the sound field on a sphere by spherical convolution”, J. Acoust. Soc. Am., 4(116), pages 2149-2157, October 2004

[0095] [10] E. G. Williams, “Fourier Acoustics”, Applied Mathematical Sciences, vol. 93, 1999, Academic Press


CPC classification: H04S2420/11 (Electricity); H04S3/02 (Electricity); G10L19/008 (Physics); H04S3/008 (Electricity)

International classification: G10L19/008 (Physics); H04S3/00 (Electricity); H04S3/02 (Electricity)
