METHOD AND APPARATUS FOR DECODING STEREO LOUDSPEAKER SIGNALS FROM A HIGHER-ORDER AMBISONICS AUDIO SIGNAL
20220182775 · 2022-06-09
CPC classification
H04S1/002
ELECTRICITY
G10L19/00
PHYSICS
H04S2400/11
ELECTRICITY
H04S7/30
ELECTRICITY
H04S2420/11
ELECTRICITY
H04S3/02
ELECTRICITY
H04S3/008
ELECTRICITY
International classification
H04S3/00
ELECTRICITY
G10L19/00
PHYSICS
H04S3/02
ELECTRICITY
Abstract
Decoding of Ambisonics representations for a stereo loudspeaker setup is known for first-order Ambisonics audio signals. However, such first-order approaches have either high negative side lobes or poor localisation in the frontal region. The invention concerns the processing in stereo decoders for Higher-Order Ambisonics (HOA).
Claims
1. (canceled)
2. A method for decoding a Higher-Order Ambisonics (HOA) audio signal, the method comprising: receiving the HOA audio signal; determining a matrix G of panning function values, wherein the matrix G contains, for each of a number S of virtual sampling points on a sphere, gain vectors g.sub.1 . . . g.sub.s, wherein at least a first gain vector g.sub.1 and a second gain vector g.sub.2 of the matrix G are different; receiving a mode matrix that was determined based on the number S of virtual sampling points and an order of the HOA audio signal; determining a decoding matrix based on the matrix G and the mode matrix; and determining, by at least one processor, stereo loudspeaker signals based on the decoding matrix and the HOA audio signal.
3. The method of claim 2, wherein the matrix G has a size L×S, wherein L corresponds to a number of loudspeakers.
4. The method of claim 3, wherein the gain vectors g.sub.1 . . . g.sub.s are directed to achieve a panned mix in S directions of the L loudspeakers.
5. The method of claim 3, wherein a subset of the gain vectors g.sub.1 . . . g.sub.s have higher values for sampling points located closer to the L loudspeakers.
6. The method of claim 5, wherein another subset of the gain vectors g.sub.1 . . . g.sub.s have values close to zero for sampling points located far from the L loudspeakers.
7. A non-transitory computer-readable medium having stored thereon instructions, that when executed by one or more processors, cause one or more processors to perform the method of claim 2.
8. An apparatus for decoding a Higher-Order Ambisonics (HOA) audio signal, the apparatus comprising: a first receiver configured to receive the HOA audio signal; a first processor for determining a matrix G of panning function values, wherein the matrix G contains, for each of a number S of virtual sampling points on a sphere, gain vectors g.sub.1 . . . g.sub.s, wherein at least a first gain vector g.sub.1 and a second gain vector g.sub.2 of the matrix G are different; a second receiver configured to receive a mode matrix that was determined based on the number S of virtual sampling points and an order of the HOA audio signal; a second processor for determining a decoding matrix based on the matrix G and the mode matrix; and a third processor for determining stereo loudspeaker signals based on the decoding matrix and the HOA audio signal.
Description
DRAWINGS
[0040] Exemplary embodiments of the invention are described with reference to the accompanying drawings:
[0041]
[0042]
[0043]
[0044]
[0045]
EXEMPLARY EMBODIMENTS
[0046] In a first step in the decoding processing, the positions of the loudspeakers have to be defined. The loudspeakers are assumed to have the same distance from the listening position, whereby the loudspeaker positions are defined by their azimuth angles. The azimuth is denoted by ϕ and is measured counter-clockwise. The azimuth angles of the left and right loudspeaker are ϕ.sub.L and ϕ.sub.R, and in a symmetric setup ϕ.sub.R=−ϕ.sub.L. A typical value is ϕ.sub.L=30°. In the following description, all angle values can be interpreted with an offset of integer multiples of 2π (rad) or 360°.
[0047] Next, the virtual sampling points on a circle are defined. These are the virtual source directions used in the Ambisonics decoding processing, and for these directions the desired panning function values for, e.g., two real loudspeaker positions are defined. The number of virtual sampling points is denoted by S, and the corresponding directions are equally distributed around the circle, leading to
ϕ.sub.s=2πs/S, s=1, . . . ,S. (1)
S should be greater than 2N+1, where N denotes the Ambisonics order. Experiments show that an advantageous value is S=8N.
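The choice of sampling directions can be illustrated with a minimal sketch. The function name `sampling_angles`, the parameter `points_per_order` and the indexing s=1, . . . ,S are assumptions made for illustration; any constant rotation of the grid would be equally spaced as well.

```python
import math

def sampling_angles(order, points_per_order=8):
    """Return S equally spaced azimuths (radians) for Ambisonics order N."""
    num_points = points_per_order * order      # S = 8N, the value reported as advantageous
    if num_points <= 2 * order + 1:
        raise ValueError("S should be greater than 2N+1")
    # Assumed indexing: phi_s = 2*pi*s/S for s = 1..S.
    return [2.0 * math.pi * s / num_points for s in range(1, num_points + 1)]

# N = 3 gives S = 24 directions spaced 15 degrees apart.
angles = sampling_angles(order=3)
```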
[0048] The desired panning functions g.sub.L(ϕ) and g.sub.R(ϕ) for the left and right loudspeakers have to be defined. In contrast to the approach from WO 2011/117399 A1 and the above-mentioned Batke/Keiler article, the panning functions are defined for multiple segments where for the segments different panning functions are used. For example, for the desired panning functions three segments are used:
[0049] a) For the frontal direction between the two loudspeakers a well-known panning law is used, e.g. tangent law or, equivalently, vector base amplitude panning (VBAP) as described in V. Pulkki, “Virtual sound source positioning using vector base amplitude panning”, J. Audio Eng. Society, 45(6), pp. 456-466, June 1997.
[0050] b) For directions beyond the loudspeaker circle section positions a slight attenuation for the back directions is defined, whereby this part of the panning function is approaching the value of zero at an angle approximately opposite the loudspeaker position.
[0051] c) The remaining part of the desired panning functions is set to zero in order to avoid playback of sounds from the right on the left loudspeaker and sounds from the left on the right loudspeaker.
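[0052] The angle values at which the desired panning functions reach zero are denoted by ϕ.sub.L,0 for the left and ϕ.sub.R,0 for the right loudspeaker. The desired panning functions for the left and right loudspeakers can be expressed as:
g.sub.L(ϕ)=g.sub.L,1(ϕ) for ϕ.sub.R≤ϕ≤ϕ.sub.L; g.sub.L,2(ϕ) for ϕ.sub.L<ϕ<ϕ.sub.L,0; 0 otherwise (2)
g.sub.R(ϕ)=g.sub.R,1(ϕ) for ϕ.sub.R≤ϕ≤ϕ.sub.L; g.sub.R,2(ϕ) for ϕ.sub.R,0<ϕ<ϕ.sub.R; 0 otherwise. (3)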
[0053] The panning functions g.sub.L,1(ϕ) and g.sub.R,1(ϕ) define the panning law between the loudspeaker positions, whereas the panning functions g.sub.L,2(ϕ) and g.sub.R,2(ϕ) typically define the attenuation for backward directions. At the intersection points the following properties should be satisfied:
g.sub.L,2(ϕ.sub.L)=g.sub.L,1(ϕ.sub.L) (4)
g.sub.L,2(ϕ.sub.L,0)=0 (5)
g.sub.R,2(ϕ.sub.R)=g.sub.R,1(ϕ.sub.R) (6)
g.sub.R,2(ϕ.sub.R,0)=0. (7)
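[0054] The desired panning functions are sampled at the virtual sampling points. A matrix containing the desired panning function values for all virtual sampling points is defined by:
G=[g.sub.L(ϕ.sub.1), . . . ,g.sub.L(ϕ.sub.S); g.sub.R(ϕ.sub.1), . . . ,g.sub.R(ϕ.sub.S)], (8)
where the first row holds the gains of the left loudspeaker and the second row the gains of the right loudspeaker, so that G has size 2×S.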
[0055] The real or complex valued Ambisonics circular harmonic functions are Y.sub.m(ϕ) with m=−N, . . . , N where N is the Ambisonics order as mentioned above. The circular harmonics are represented by the azimuth-dependent part of the spherical harmonics, cf. Earl G. Williams, “Fourier Acoustics”, vol. 93 of Applied Mathematical Sciences, Academic Press, 1999.
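[0056] With the real-valued circular harmonics
S.sub.m(ϕ)=Ñ.sub.m cos(mϕ) for m≥0, S.sub.m(ϕ)=Ñ.sub.m sin(|m|ϕ) for m<0, (9)
the circular harmonic functions are typically defined by
Y.sub.m(ϕ)=N.sub.m e.sup.imϕ in the complex-valued case, Y.sub.m(ϕ)=S.sub.m(ϕ) in the real-valued case, (10)
wherein Ñ.sub.m and N.sub.m are scaling factors depending on the used normalisation scheme.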
The circular harmonics are combined in a vector
y(ϕ)=[Y.sub.−N(ϕ), . . . ,Y.sub.0(ϕ), . . . ,Y.sub.N(ϕ)].sup.T. (11)
Complex conjugation, denoted by (⋅)*, yields
y*(ϕ)=[Y*.sub.−N(ϕ), . . . ,Y*.sub.0(ϕ), . . . ,Y*.sub.N(ϕ)].sup.T. (12)
The mode matrix for the virtual sampling points is defined by
Ξ=[y*(ϕ.sub.1),y*(ϕ.sub.2), . . . ,y*(ϕ.sub.S)]. (13)
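The construction of the mode matrix of equation (13) might be sketched as follows. The sketch assumes complex-valued harmonics with scaling factor N.sub.m=1 and sampling angles ϕ.sub.s=2πs/S; the function name `mode_matrix` is hypothetical. For equally spaced points the rows of Ξ are mutually orthogonal, which the last line illustrates.

```python
import numpy as np

def mode_matrix(order, angles):
    """Return Xi = [y*(phi_1), ..., y*(phi_S)] with shape (2N+1, S)."""
    m = np.arange(-order, order + 1)[:, None]        # harmonic indices -N..N as a column
    phi = np.asarray(angles, dtype=float)[None, :]   # sampling angles as a row
    # Assumption: Y_m(phi) = exp(i*m*phi), i.e. scaling factor N_m = 1.
    return np.conj(np.exp(1j * m * phi))

N = 3
S = 8 * N
phi_s = 2.0 * np.pi * np.arange(1, S + 1) / S        # equally spaced sampling directions
Xi = mode_matrix(N, phi_s)
gram = Xi @ Xi.conj().T                              # S times the identity for this grid
```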
The resulting 2-D decoding matrix is computed by
D=GΞ.sup.+, (14)
with Ξ.sup.+ being the pseudo-inverse of matrix Ξ. For equally distributed virtual sampling points as given in equation (1), the pseudo-inverse can be replaced by a scaled version of Ξ.sup.H, which is the adjoint (transposed and complex conjugate) of Ξ. In this case the decoding matrix is
D=αGΞ.sup.H, (15)
wherein the scaling factor α depends on the normalisation scheme of the circular harmonics and on the number of design directions S.
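The equivalence of equations (14) and (15) for equally spaced sampling points can be checked numerically. The sketch below assumes complex harmonics with N.sub.m=1, for which α=1/S, and uses a placeholder matrix G of two plain cardioids rather than the segmented panning functions; both choices are illustrative assumptions.

```python
import numpy as np

N, S = 3, 24
phi_s = 2.0 * np.pi * np.arange(1, S + 1) / S
m = np.arange(-N, N + 1)[:, None]
Xi = np.exp(-1j * m * phi_s[None, :])            # columns y*(phi_s), assuming Y_m = exp(i*m*phi)

# Placeholder G (2 x S): two cardioids aimed at +/-30 degrees, NOT the segmented design.
phi_l = np.radians(30.0)
G = 0.5 * (1.0 + np.cos(np.vstack([phi_s - phi_l, phi_s + phi_l])))

D_pinv = G @ np.linalg.pinv(Xi)                  # eq. (14): general pseudo-inverse
alpha = 1.0 / S                                  # holds for N_m = 1 only
D_adjoint = alpha * G @ Xi.conj().T              # eq. (15): scaled adjoint
```

Under these assumptions both matrices agree to rounding error, so the cheaper adjoint form can stand in for the pseudo-inverse.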
[0057] Vector l(t) representing the loudspeaker sample signals for time instance t is calculated by
l(t)=Da(t). (16)
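Equation (16) is a matrix product applied to every time instant, so a block of samples can be decoded in one step. The sketch below assumes complex harmonics with N.sub.m=1, a plane wave encoded as a(t)=y*(ϕ.sub.0)x(t), and a placeholder G made of two cardioids; the test tone and all variable names are hypothetical.

```python
import numpy as np

N, S = 3, 24
phi_s = 2.0 * np.pi * np.arange(1, S + 1) / S
m = np.arange(-N, N + 1)[:, None]                 # harmonic indices -N..N as a column

# Placeholder panning matrix (two cardioids at +/-30 degrees), for illustration only.
phi_l = np.radians(30.0)
G = 0.5 * (1.0 + np.cos(np.vstack([phi_s - phi_l, phi_s + phi_l])))
D = G @ np.exp(1j * m * phi_s[None, :]).T / S     # eq. (15) with alpha = 1/S

# Hypothetical test signal: a 1 kHz tone arriving from the left loudspeaker direction.
fs, T = 48000, 480
x = np.sin(2.0 * np.pi * 1000.0 * np.arange(T) / fs)
a = np.exp(-1j * m * phi_l) * x[None, :]          # a(t) = y*(phi_0) x(t), shape (2N+1, T)

l = (D @ a).real                                  # eq. (16), one column per time instant
```

With the placeholder cardioids the left output reproduces x(t) and the right output is x(t) scaled by ½(1+cos 60°)=0.75.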
[0058] When using 3-dimensional higher-order Ambisonics signals a(t) as input signals, an appropriate conversion to the 2-dimensional space is applied, resulting in converted Ambisonics coefficients a′(t). In this case equation (16) is changed to l(t)=Da′(t).
[0059] It is also possible to define a matrix D.sub.3D, which already includes that 3D/2D conversion and is directly applied to the 3D Ambisonics signals a(t).
[0060] In the following, an example for panning functions for a stereo loudspeaker setup is described. In-between the loudspeaker positions, panning functions g.sub.L,1(ϕ) and g.sub.R,1(ϕ) from eq. (2) and eq. (3) and panning gains according to VBAP are used. These panning functions are continued by one half of a cardioid pattern having its maximum value at the loudspeaker position. The angles ϕ.sub.L,0 and ϕ.sub.R,0 are defined so as to have positions opposite to the loudspeaker positions:
ϕ.sub.L,0=ϕ.sub.L+π (17)
ϕ.sub.R,0=ϕ.sub.R+π. (18)
Normalised panning gains satisfy g.sub.L,1(ϕ.sub.L)=1 and g.sub.R,1(ϕ.sub.R)=1. The cardioid patterns pointing towards ϕ.sub.L and ϕ.sub.R are defined by:
g.sub.L,2(ϕ)=½(1+cos(ϕ−ϕ.sub.L)) (19)
g.sub.R,2(ϕ)=½(1+cos(ϕ−ϕ.sub.R)). (20)
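The three-segment example can be written out as a short sketch. It assumes a symmetric setup (ϕ.sub.R=−ϕ.sub.L with 0<ϕ.sub.L<90°) and power-normalised 2-D VBAP gains for the frontal segment; the helper names `wrap`, `g_left` and `g_right` are hypothetical.

```python
import math

PHI_L = math.radians(30.0)   # left loudspeaker azimuth; phi_R = -PHI_L is assumed

def wrap(phi):
    """Map an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def g_left(phi, phi_l=PHI_L):
    """Desired panning function of the left loudspeaker."""
    a = wrap(phi)
    if -phi_l <= a <= phi_l:
        # Segment a): VBAP between the loudspeakers (power normalisation is an assumption).
        gl = 0.5 * (math.cos(a) / math.cos(phi_l) + math.sin(a) / math.sin(phi_l))
        gr = 0.5 * (math.cos(a) / math.cos(phi_l) - math.sin(a) / math.sin(phi_l))
        return max(gl, 0.0) / math.hypot(gl, gr)
    if a > phi_l or a <= phi_l - math.pi:
        # Segment b): half cardioid of eq. (19), from phi_L round to phi_L + pi.
        return 0.5 * (1.0 + math.cos(a - phi_l))
    return 0.0               # Segment c): no contribution from the opposite side

def g_right(phi, phi_l=PHI_L):
    """Desired panning function of the right loudspeaker (mirror image of the left)."""
    return g_left(-phi, phi_l)
```

Both functions are continuous at the segment boundaries, in line with conditions (4) to (7): the gain is 1 at the loudspeaker position and falls to 0 at the opposite angle.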
[0061] For the evaluation of the decoding, the resulting panning functions for arbitrary input directions can be obtained by
W=DY (21)
where Y is the mode matrix of the considered input directions. W is a matrix that contains the panning weights for the used input directions and the used loudspeaker positions when applying the Ambisonics decoding process.
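The evaluation of equation (21) can be sketched end to end. The sketch assumes a symmetric ±30° setup, complex harmonics with N.sub.m=1, sampling angles ϕ.sub.s=2πs/S, power-normalised VBAP in the frontal segment, and a matrix Y built with the same convention as Ξ; the function names are hypothetical.

```python
import numpy as np

def desired_gains(phi, phi_l):
    """Rows g_L and g_R sampled at the angles phi (segmented VBAP + half-cardioid design)."""
    def g_left(angle):
        a = (angle + np.pi) % (2.0 * np.pi) - np.pi            # wrap to [-pi, pi)
        gl = 0.5 * (np.cos(a) / np.cos(phi_l) + np.sin(a) / np.sin(phi_l))
        gr = 0.5 * (np.cos(a) / np.cos(phi_l) - np.sin(a) / np.sin(phi_l))
        vbap = np.clip(gl, 0.0, None) / np.hypot(gl, gr)       # frontal segment
        cardioid = 0.5 * (1.0 + np.cos(a - phi_l))             # eq. (19)
        front = np.abs(a) <= phi_l
        back = (a > phi_l) | (a <= phi_l - np.pi)
        return np.where(front, vbap, np.where(back, cardioid, 0.0))
    phi = np.asarray(phi, dtype=float)
    return np.vstack([g_left(phi), g_left(-phi)])              # g_R mirrors g_L

def mode_matrix(order, phi):
    """Columns y*(phi), assuming Y_m(phi) = exp(i*m*phi)."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(-1j * m * np.asarray(phi, dtype=float)[None, :])

N, phi_l = 3, np.radians(30.0)
S = 8 * N
phi_s = 2.0 * np.pi * np.arange(1, S + 1) / S
D = desired_gains(phi_s, phi_l) @ np.linalg.pinv(mode_matrix(N, phi_s))   # eq. (14)

phi_eval = np.radians([30.0, -30.0, 0.0])         # input directions to inspect
W_complex = D @ mode_matrix(N, phi_eval)          # eq. (21)
W = W_complex.real                                # imaginary part is rounding noise only
```

Row 0 of W holds the resulting left-loudspeaker weights and row 1 the right-loudspeaker weights. Evaluating a dense grid of input directions instead of three angles yields the panning curves that such a decoder actually produces at order N.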
[0062]
[0063] The resulting panning weights for Ambisonics decoding are computed using eq. (21) for the used input directions.
[0064] The comparison of
[0065] In the following, an example for a 3D to 2D conversion is provided for complex-valued spherical and circular harmonics (for real-valued basis functions it can be carried out in a similar way). The spherical harmonics for 3D Ambisonics are:
Ŷ.sub.n.sup.m(θ,φ)=M.sub.n,mP.sub.n.sup.m(cos(θ))e.sup.imφ, (22)
wherein n=0, . . . , N is the order index, m=−n, . . . , n is the degree index, M.sub.n,m is the normalisation factor dependent on the normalisation scheme, θ is the inclination angle and P.sub.n.sup.m(⋅) are the associated Legendre functions. With given Ambisonics coefficients Â.sub.n.sup.m for the 3D case, the 2D coefficients are calculated by
A.sub.m=α.sub.mÂ.sub.|m|.sup.m,m=−N, . . . ,N (23)
with the scaling factors
[0066] In
[0067] Step or stage 54 computes the pseudo-inverse Ξ.sup.+ of matrix Ξ. From matrices G and Ξ.sup.+ the decoding matrix D is calculated in step/stage 55 according to equation (14). In step/stage 56, the loudspeaker signals l(t) are calculated from Ambisonics signal a(t) using decoding matrix D. If the Ambisonics input signal a(t) is a three-dimensional spatial signal, a 3D-to-2D conversion can be carried out in step or stage 57, in which case step/stage 56 receives the 2D Ambisonics signal a′(t).