METHOD AND DEVICE FOR DECODING AN AUDIO SOUNDFIELD REPRESENTATION
20220189492 · 2022-06-16
Assignee
Inventors
Cpc classification
H04S7/308
ELECTRICITY
H04S2420/11
ELECTRICITY
H04S3/02
ELECTRICITY
G10L19/008
PHYSICS
International classification
G10L19/008
PHYSICS
H04S3/02
ELECTRICITY
Abstract
Soundfield signals such as Ambisonics carry a representation of a desired sound field. The Ambisonics format is based on spherical harmonic decomposition of the soundfield, and Higher Order Ambisonics (HOA) uses spherical harmonics of at least 2.sup.nd order. However, commonly used loudspeaker setups are irregular and lead to problems in decoder design. A method for improved decoding of an audio soundfield representation for audio playback comprises calculating a panning function (W) using a geometrical method based on the positions of a plurality of loudspeakers and a plurality of source directions, calculating a mode matrix (Ξ) from the loudspeaker positions, calculating a pseudo-inverse mode matrix (Ξ.sup.+) and decoding the audio soundfield representation. The decoding is based on a decode matrix (D) that is obtained from the panning function (W) and the pseudo-inverse mode matrix (Ξ.sup.+).
Claims
1. A method for decoding an ambisonics audio soundfield representation for playback, the method comprising: receiving, by a processor configured to decode the audio soundfield representation, the audio soundfield representation; receiving, by the processor, a decode matrix for decoding the audio soundfield representation to determine a decoded audio signal, wherein the decode matrix is based on a matrix containing panning functions for a first plurality of L loudspeaker positions and a second plurality of S source directions, wherein the size of the matrix is L×S, wherein the plurality of S source directions are distributed over a unit sphere, and wherein the panning functions are indicated by gain values; and determining the decoded audio signal based on a multiplication of the decode matrix and the audio soundfield representation.
2. The method of claim 1, wherein the decode matrix is predetermined.
3. The method of claim 1, wherein each element of the decode matrix relates to at least a spherical harmonic function of the decoded audio signal.
4. A non-transitory computer readable medium having stored on it executable instructions to cause a computer to perform a method for decoding the ambisonics audio soundfield representation for audio playback according to claim 1.
5. A system for decoding an ambisonics audio soundfield representation for playback, the apparatus comprising: a receiver for receiving the audio soundfield representation; a processor for receiving a decode matrix for decoding the audio soundfield representation to determine a decoded audio signal, wherein the decode matrix is based on a matrix containing panning functions for a first plurality of L loudspeaker positions and a second plurality of S source directions, wherein the size of the matrix is L×S, wherein the plurality of S source directions are distributed over a unit sphere, and wherein the panning functions are indicated by gain values; and a decoder for determining the decoded audio signal based on a multiplication of the decode matrix and the audio soundfield representation.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] Exemplary embodiments of the invention are described with reference to the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0029] As shown in
[0030] As shown in
[0031] A particularly useful 3D loudspeaker setup has 16 loudspeakers. As shown in
[0032] In the following, Vector Base Amplitude Panning (VBAP) is described in detail. In one embodiment, VBAP is used herein to place virtual acoustic sources with an arbitrary loudspeaker setup, where all loudspeakers are assumed to be at the same distance from the listening position. VBAP uses three loudspeakers to place a virtual source in 3D space. For each virtual source, a monophonic signal with different gains is fed to the loudspeakers to be used. The gains for the different loudspeakers depend on the position of the virtual source. VBAP is a geometric approach to calculating the gains of the loudspeaker signals for panning between the loudspeakers. In the 3D case, three loudspeakers arranged in a triangle form a vector base. Each vector base is identified by the loudspeaker numbers k, m, n and the loudspeaker position vectors l.sub.k, l.sub.m, l.sub.n, given in Cartesian coordinates normalised to unity length. The vector base for loudspeakers k, m, n is defined by
L.sub.kmn={l.sub.k,l.sub.m,l.sub.n} (1)
[0033] The desired direction Ω=(θ,ϕ) of the virtual source has to be given as azimuth angle ϕ and inclination angle θ. The unity length position vector p(Ω) of the virtual source in Cartesian coordinates is therefore defined by
p(Ω)={cos ϕ sin θ,sin ϕ sin θ,cos θ}.sup.T (2)
[0034] A virtual source position can be represented with the vector base and the gain factors g(Ω)=(g̃.sub.k, g̃.sub.m, g̃.sub.n).sup.T by
p(Ω)=L.sub.kmn g(Ω)=g̃.sub.k l.sub.k+g̃.sub.m l.sub.m+g̃.sub.n l.sub.n (3)
[0035] By inverting the vector base matrix, the required gain factors can be computed by
g(Ω)=L.sub.kmn.sup.−1 p(Ω) (4)
[0036] The vector base to be used is determined according to Pulkki's document: First, the gains are calculated according to Pulkki for all vector bases. Then, for each vector base, the minimum over the gain factors is evaluated by g̃.sub.min=min{g̃.sub.k, g̃.sub.m, g̃.sub.n}. Finally, the vector base where g̃.sub.min has the highest value is used. The resulting gain factors must not be negative. Depending on the listening room acoustics, the gain factors may be normalised for energy preservation.
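The gain computation of eq.(2) to eq.(4) and the selection rule for the vector base can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the four-loudspeaker layout, the tolerance values and the choice to always apply energy normalisation are hypothetical, and the 3×3 inverse is evaluated with Cramer's rule.

```python
import math

def unit_vector(theta_deg, phi_deg):
    """Eq. (2): unit vector for inclination theta and azimuth phi (degrees)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.cos(p) * math.sin(t), math.sin(p) * math.sin(t), math.cos(t))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vbap_gains(base, p):
    """Eq. (4): solve p = g_k*l_k + g_m*l_m + g_n*l_n for one vector base.

    Returns None when the three loudspeaker vectors are (nearly) coplanar,
    because the vector base matrix then has no inverse.
    """
    l_k, l_m, l_n = base
    det = dot(l_k, cross(l_m, l_n))
    if abs(det) < 1e-12:
        return None
    # Cramer's rule, written with scalar triple products.
    return (dot(p, cross(l_m, l_n)) / det,
            dot(l_k, cross(p, l_n)) / det,
            dot(l_k, cross(l_m, p)) / det)

def select_base(bases, p):
    """Pick the base whose smallest gain is largest, then normalise energy."""
    best = None
    for index, base in enumerate(bases):
        g = vbap_gains(base, p)
        if g is not None and (best is None or min(g) > min(best[1])):
            best = (index, g)
    if best is None or min(best[1]) < -1e-9:
        raise ValueError("source direction is not covered by any vector base")
    index, g = best
    g = tuple(max(x, 0.0) for x in g)      # remove rounding-level negatives
    norm = math.sqrt(dot(g, g))            # assumed energy normalisation
    return index, tuple(x / norm for x in g)

# Hypothetical layout: loudspeakers on the +x, -x, +y and +z axes.
X, NX, Y, Z = (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
index, gains = select_base([(X, Y, Z), (NX, Y, Z)], unit_vector(90.0, 45.0))
```

In this example the source lies midway between the +x and +y loudspeakers, so the first base is selected and the two active loudspeakers receive equal gains while the third receives practically none.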
[0037] In the following, the Ambisonics format is described, which is an exemplary soundfield format. The Ambisonics representation is a sound field description method employing a mathematical approximation of the sound field in one location. Using the spherical coordinate system, the pressure at point r=(r,θ,ϕ) in space is described by means of the spherical Fourier transform
p(r,k)=Σ.sub.n=0.sup.∞ Σ.sub.m=−n.sup.n A.sub.n.sup.m(k) j.sub.n(kr) Y.sub.n.sup.m(θ,ϕ) (5)
[0038] where k is the wave number. Normally n runs to a finite order M. The coefficients A.sub.n.sup.m(k) of the series describe the sound field (assuming sources outside the region of validity), j.sub.n(kr) is the spherical Bessel function of the first kind and Y.sub.n.sup.m(θ,ϕ) denote the spherical harmonics. Coefficients A.sub.n.sup.m(k) are regarded as Ambisonics coefficients in this context. The spherical harmonics Y.sub.n.sup.m(θ,ϕ) only depend on the inclination and azimuth angles and describe a function on the unit sphere.
[0039] For reasons of simplicity, plane waves are often assumed for sound field reproduction. The Ambisonics coefficients describing a plane wave as an acoustic source from direction Ω.sub.s are
A.sub.n,plane.sup.m(Ω.sub.s)=4πi.sup.nY.sub.n.sup.m(Ω.sub.s)* (6)
[0040] Their dependency on wave number k reduces to a pure directional dependency in this special case. For a limited order M, the coefficients form a vector A that may be arranged as
A(Ω.sub.s)=[A.sub.0.sup.0 A.sub.1.sup.−1 A.sub.1.sup.0 A.sub.1.sup.1 . . . A.sub.M.sup.M].sup.T (7)
[0041] holding O=(M+1).sup.2 elements. The same arrangement is used for the spherical harmonics coefficients, yielding a vector Y(Ω.sub.s)*=[Y.sub.0.sup.0 Y.sub.1.sup.−1 Y.sub.1.sup.0 Y.sub.1.sup.1 . . . Y.sub.M.sup.M].sup.H.
[0042] Superscript H denotes the complex conjugate transpose.
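The arrangement of the coefficients in the vector of eq.(7) and the element count O=(M+1).sup.2 can be illustrated with a short sketch. The function names are hypothetical; only the ordering itself (order n ascending, and degree m running from −n to n within each order) is taken from the text.

```python
def coefficient_order(M):
    """List the (n, m) index pairs in the order used by eq. (7)."""
    return [(n, m) for n in range(M + 1) for m in range(-n, n + 1)]

def position(n, m):
    """0-based position of coefficient A_n^m inside the vector."""
    return n * n + n + m

# Third-order example: O = (3 + 1)**2 = 16 coefficients.
pairs = coefficient_order(3)
```

For M=3 the list starts with (0, 0), (1, −1), (1, 0), (1, 1) and ends with (3, 3).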
[0043] To calculate loudspeaker signals from an Ambisonics representation of a sound field, mode matching is a commonly used approach. The basic idea is to express a given Ambisonics sound field description A(Ω.sub.s) by a weighted sum of the loudspeakers' sound field descriptions A(Ω.sub.l)
A(Ω.sub.s)=Σ.sub.l=1.sup.L w.sub.l A(Ω.sub.l) (8)
[0044] where Ω.sub.l denote the loudspeakers' directions, w.sub.l are weights, and L is the number of loudspeakers. To derive panning functions from eq.(8), we assume a known direction of incidence Ω.sub.s. If source and speaker sound fields are both plane waves, the factor 4πi.sup.n (see eq.(6)) can be dropped and eq.(8) only depends on the complex conjugates of spherical harmonic vectors, also referred to as “modes”. Using matrix notation, this is written as
Y(Ω.sub.s)*=Ψw(Ω.sub.s) (9)
[0045] where Ψ is the mode matrix of the loudspeaker setup
Ψ=[Y(Ω.sub.1)*,Y(Ω.sub.2)*, . . . ,Y(Ω.sub.L)*] (10)
[0046] with O×L elements. Various strategies are known for obtaining the desired weighting vector w. If M=3 is chosen for the 16-loudspeaker setup, O=(M+1).sup.2=16=L, so Ψ is square and may be invertible. Due to the irregular loudspeaker setup, however, the matrix is badly scaled. In such a case, the pseudo inverse matrix is often chosen, and
D=[Ψ.sup.HΨ].sup.−1Ψ.sup.H (11)
[0047] yields an L×O decoding matrix D. Finally we can write
w(Ω.sub.s)=DY(Ω.sub.s)* (12)
[0048] where the weights w(Ω.sub.s) are the minimum energy solution for eq.(9). The consequences of using the pseudo inverse are described below.
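As a small worked check of eq.(11) and eq.(12), consider the square case in which Ψ is assumed to be invertible (an assumption made here for illustration; for an irregular setup the matrix may be close to singular). The pseudo inverse then collapses to the ordinary inverse, and the resulting weights satisfy eq.(9) exactly:

```latex
\begin{aligned}
D &= [\Psi^{H}\Psi]^{-1}\Psi^{H} = \Psi^{-1}(\Psi^{H})^{-1}\Psi^{H} = \Psi^{-1},\\
\Psi\,w(\Omega_s) &= \Psi D\,Y(\Omega_s)^{*} = Y(\Omega_s)^{*}.
\end{aligned}
```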
[0049] The following describes the link between panning functions and the Ambisonics decoding matrix. Starting with Ambisonics, the panning functions for the individual loudspeakers can be calculated using eq.(12). Let
Ξ=[Y(Ω.sub.1)*,Y(Ω.sub.2)*, . . . ,Y(Ω.sub.S)*] (13)
[0050] be the mode matrix of S input signal directions (Ω.sub.s), e.g. a spherical grid with an inclination angle running in steps of one degree from 1° to 180° and an azimuth angle from 1° to 360°, respectively. This mode matrix has O×S elements. Using eq.(12), the resulting matrix W has L×S elements, where row l holds the S panning weights for the respective loudspeaker:
W=DΞ (14)
[0051] As a representative example, the panning function of a single loudspeaker 2 is shown as beam pattern in
[0052] As outlined in the introduction, another way to obtain a decoding matrix D for playback of Ambisonics signals is possible when the panning functions are already known. The panning functions W are viewed as the desired signal defined on a set of virtual source directions Ω, and the mode matrix Ξ of these directions serves as the input signal. Then the decoding matrix can be calculated using
D=WΞ.sup.H[ΞΞ.sup.H].sup.−1=WΞ.sup.+ (15)
[0053] where Ξ.sup.H[ΞΞ.sup.H].sup.−1 or simply Ξ.sup.+ is the pseudo inverse of the mode matrix Ξ. In the new approach, we take the panning functions in W from VBAP and calculate an Ambisonics decoding matrix from this.
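The computation of eq.(15) can be sketched in a few lines. The sketch rests on several assumptions that are not part of the text: it uses unnormalised real-valued first-order modes [1, y, z, x] instead of the complex spherical harmonics, a coarse 30° grid of source directions, and a toy two-loudspeaker panning law as a stand-in for the VBAP gains of eq.(4). All function names are hypothetical.

```python
import math

def conj_transpose(A):
    """Complex conjugate transpose (superscript H); also works for real input."""
    return [[x.conjugate() for x in col] for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular matrix: too few or degenerate directions")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [x / scale for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def decode_matrix(W, Xi):
    """Eq. (15): D = W Xi^H [Xi Xi^H]^-1, with W of size LxS and Xi of size OxS."""
    XiH = conj_transpose(Xi)
    return matmul(matmul(W, XiH), inverse(matmul(Xi, XiH)))

def mode_vector(theta_deg, phi_deg):
    """Assumed stand-in for Y(Omega)*: unnormalised real first-order modes."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    x, y, z = math.cos(p) * math.sin(t), math.sin(p) * math.sin(t), math.cos(t)
    return [1.0, y, z, x]

# S = 72 source directions on a coarse grid; Xi has O = 4 rows and S columns.
directions = [(t, p) for t in range(15, 180, 30) for p in range(0, 360, 30)]
Xi = [list(row) for row in zip(*[mode_vector(t, p) for t, p in directions])]

# Toy panning law for L = 2 loudspeakers (left/right), replacing VBAP gains.
W = [[(1.0 + y) / 2.0 for y in Xi[1]],
     [(1.0 - y) / 2.0 for y in Xi[1]]]

D = decode_matrix(W, Xi)
```

Because the toy panning law is itself a first-order function of the direction, the resulting D reproduces it exactly, with rows close to [0.5, 0.5, 0, 0] and [0.5, −0.5, 0, 0]. Panning functions that are not representable at the chosen order, such as VBAP gains, would instead be approximated in the least-squares sense over the S directions.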
[0054] The panning functions for W are taken as gain values g(Ω) calculated using eq.(4), where Ω is chosen according to eq.(13). The resulting decode matrix using eq.(15) is an Ambisonics decoding matrix facilitating the VBAP panning functions. An example is depicted in
[0055] The source directions 103 can be rather freely defined. A condition for the number of source directions S is that it must be at least (N+1).sup.2. Thus, given the order N of the soundfield signal SF.sub.c, it is possible to define S according to S≥(N+1).sup.2, and distribute the S source directions evenly over a unit sphere. As mentioned above, the result can be a spherical grid with an inclination angle θ running in constant steps of x degrees (e.g. x=1 to 5, or x=10, 20, etc.) from 1° to 180° and an azimuth angle ϕ from 1° to 360°, respectively, wherein each source direction Ω=(θ,ϕ) can be given by azimuth angle ϕ and inclination angle θ.
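A grid of source directions with a constant angular step, together with the condition S≥(N+1).sup.2, can be sketched as follows. The function names are hypothetical, and the sketch assumes that the step size is a whole number of degrees and that the grid starts at one step rather than at 1°.

```python
def source_grid(step_deg):
    """Source directions (theta, phi) in degrees on a constant-step grid."""
    thetas = range(step_deg, 181, step_deg)   # inclination up to 180 degrees
    phis = range(step_deg, 361, step_deg)     # azimuth up to 360 degrees
    return [(t, p) for t in thetas for p in phis]

def enough_directions(S, N):
    """Condition on the number of source directions: S >= (N + 1)**2."""
    return S >= (N + 1) ** 2

grid = source_grid(10)          # 18 inclinations x 36 azimuths
ok = enough_directions(len(grid), 3)
```

With a 10° step the grid holds 648 directions, far more than the 16 required for a third-order signal. Note that on such a grid all azimuth values at θ=180° describe the same point on the sphere.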
[0056] The advantageous effect has been confirmed in a listening test. For the evaluation of the localisation of a single source, a virtual source is compared against a real source as a reference. For the real source, a loudspeaker at the desired position is used. The playback methods used are VBAP, Ambisonics mode matching decoding, and the newly proposed Ambisonics decoding using VBAP panning functions according to the present invention. For the latter two methods, for each tested position and each tested input signal, an Ambisonics signal of third order is generated. This synthetic Ambisonics signal is then decoded using the corresponding decoding matrices. The test signals used are broadband pink noise and a male speech signal. The tested positions are placed in the frontal region with the directions
Ω.sub.1=(76.1°,−23.2°), Ω.sub.2=(63.3°,−4.3°) (16)
[0057] The listening test was conducted in an acoustic room with a mean reverberation time of approximately 0.2 s. Nine people participated in the listening test. The test subjects were asked to grade the spatial playback performance of all playback methods compared to the reference. A single grade value had to be found to represent the localisation of the virtual source and timbre alterations.
[0058] As the results show, the unregularised Ambisonics mode matching decoding is graded perceptually worse than the other methods under test. This result corresponds to
[0059] In conclusion, a new way of obtaining an Ambisonics decoding matrix from the VBAP panning functions is disclosed. For different loudspeaker setups, this approach is advantageous as compared to matrices of the mode matching approach. Properties and consequences of these decoding matrices are discussed above. In summary, the newly proposed Ambisonics decoding with VBAP panning functions avoids typical problems of the well-known mode matching approach. A listening test has shown that VBAP-derived Ambisonics decoding can produce better spatial playback quality than the direct use of VBAP. The proposed method requires only a sound field description, while VBAP requires a parametric description of the virtual sources to be rendered.
[0060] While there has been shown, described, and pointed out fundamental novel features of the present invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the apparatus and method described, in the form and details of the devices disclosed, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated. It will be understood that modifications of detail can be made without departing from the scope of the invention. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may, where appropriate be implemented in hardware, software, or a combination of the two.
[0061] Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.