Method for interpolating a sound field, corresponding computer program product and device
11736882 · 2023-08-22
Cpc classification
H04S2400/15
ELECTRICITY
H04S2420/11
ELECTRICITY
Abstract
A method for interpolating a sound field captured by a plurality of N microphones each outputting the encoded sound field in a form including at least one captured pressure and an associated pressure gradient vector. Such a method includes an interpolation of the sound field at an interpolation position outputting an interpolated encoded sound field as a linear combination of the N encoded sound fields each weighted by a corresponding weighting factor. The interpolation includes an estimation of the N weighting factors at least from: the interpolation position; a position of each of the N microphones; the N pressures captured by the N microphones; and an estimated power of the sound field at the interpolation position.
Claims
1. A method comprising: receiving a sound field captured by a plurality of N microphones each outputting said sound field encoded in a form comprising at least one captured pressure and an associated pressure gradient vector; and interpolating said sound field at an interpolation position outputting an interpolated encoded sound field as a linear combination of said N encoded sound fields each weighted by a corresponding weighting factor, wherein said interpolating comprises estimating said N weighting factors at least from: said interpolation position; a position of each of said N microphones; said N pressures captured by said N microphones; and an estimated power of said sound field at said interpolation position.
2. The method according to claim 1, wherein said estimating implements a resolution of the equation $\sum_i a_i(t)\,\widehat{W}_i^2(t)\,x_i(t) = \widehat{W}_a^2(t)\,x_a(t)$, with: $x_i(t)$ being a vector representative of said position of the microphone bearing an index i among said N microphones; $x_a(t)$ being a vector representative of said interpolation position; $\widehat{W}_a^2(t)$ being said estimate of the power of said sound field at said interpolation position; $\widehat{W}_i^2(t)$ being an estimate of instantaneous power $W_i^2(t)$ of said pressure captured by said microphone bearing the index i; and $a_i(t)$ being the N weighting factors.
3. The method according to claim 2, wherein said resolution is performed with the constraint that $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$.
4. The method according to claim 3, wherein said resolution is further performed with the constraint that the N weighting factors a.sub.i(t) are positive or zero.
5. The method according to claim 2, wherein said estimation also implements a resolution of the equation $\alpha \sum_i a_i(t)\,\widehat{W}_i^2(t) = \alpha\,\widehat{W}_a^2(t)$, with α being a homogenisation factor.
6. The method according to claim 2, wherein said estimating comprises: a time averaging of said instantaneous power $W_i^2(t)$ over a predetermined period of time outputting said estimate $\widehat{W}_i^2(t)$; or an autoregressive filtering of time samples of said instantaneous power $W_i^2(t)$, outputting said estimate $\widehat{W}_i^2(t)$.
7. The method according to claim 2, wherein said estimate $\widehat{W}_a^2(t)$ of the power of said sound field at said interpolation position is estimated from said instantaneous sound power $W_i^2(t)$ captured by that one among said N microphones the closest to said interpolation position or from said estimate $\widehat{W}_i^2(t)$ of said instantaneous sound power $W_i^2(t)$ captured by that one among said N microphones the closest to said interpolation position.
8. The method according to claim 2, wherein said estimate $\widehat{W}_a^2(t)$ of the power of said sound field at said interpolation position is estimated from a barycentre of said N instantaneous sound powers $W_i^2(t)$ captured by said N microphones, respectively from a barycentre of said N estimates $\widehat{W}_i^2(t)$ of said N instantaneous sound powers $W_i^2(t)$ captured by said N microphones, a coefficient weighting the instantaneous sound power $W_i^2(t)$, respectively weighting the estimate $\widehat{W}_i^2(t)$ of the instantaneous sound power $W_i^2(t)$ captured by said microphone bearing the index i, in said barycentre being inversely proportional to a normalised version of the distance between the position of said microphone bearing the index i outputting said pressure $W_i(t)$ and said interpolation position, said distance being expressed in the sense of a L-p norm.
9. The method according to claim 1, further comprising, prior to said interpolating, selecting said N microphones among Nt microphones, Nt>N.
10. The method according to claim 9, wherein the N selected microphones are those the closest to said interpolation position among said Nt microphones.
11. The method according to claim 9, wherein said selecting comprises: selecting two microphones bearing the indexes i.sub.1 and i.sub.2 the closest to said interpolation position among said Nt microphones; calculating a median vector u.sub.12(t) having as an origin said interpolation position and pointing between the positions of the two microphones bearing the indexes i.sub.1 and i.sub.2; and determining a third microphone bearing the index i.sub.3 different from said two microphones bearing the indexes i.sub.1 and i.sub.2 among the Nt microphones and whose position is the most opposite to the median vector u.sub.12(t).
12. The method according to claim 1, further comprising, for given encoded sound field among said N encoded sound fields output by said N microphones, transforming said given encoded sound field by application of a perfect reconstruction filter bank outputting M field frequency components associated to said given encoded sound field, each field frequency component among said M field frequency components being located in a distinct frequency sub-band, said transforming being repeated for said N encoded sound fields outputting N corresponding sets of M field frequency components, wherein, for a given frequency sub-band among said M frequency sub-bands, said interpolating outputs a field frequency component interpolated at said interpolation position and located within said given frequency sub-band, said interpolated field frequency component being expressed as a linear combination of said N field frequency components, among said N sets, located in said given frequency sub-band, and said interpolating being for said M frequency sub-bands outputting M interpolated field frequency components at said interpolation position, each interpolated field frequency component among said M interpolated field frequency components being located in a distinct frequency sub-band.
13. The method according to claim 12, further comprising an inverse transformation of said transformation, said inverse transformation being applied to said M interpolated field frequency components outputting said interpolated encoded sound field at said interpolation position.
14. The method of claim 1, further comprising: capturing said sound field by the plurality of N microphones each outputting the corresponding captured sound field; encoding of each of said captured sound fields outputting a corresponding encoded sound field in the form comprising the at least one captured pressure and associated pressure gradient vector; performing an interpolation phase comprising the interpolating and outputting said interpolated encoded sound field at said interpolation position; compressing said interpolated encoded sound field outputting a compressed interpolated encoded sound field; transmitting said compressed interpolated encoded sound field to at least one rendering device; decompressing said received compressed interpolated encoded sound field; and rendering said interpolated encoded sound field on said at least one rendering device.
15. A non-transitory computer-readable medium comprising program code instructions stored thereon for implementing a method of interpolating, when said program is executed on a computer, wherein the instructions configure the computer to: receiving a sound field captured by a plurality of N microphones each outputting said sound field encoded in a form comprising at least one captured pressure and an associated pressure gradient vector; and interpolating said sound field at an interpolation position outputting an interpolated encoded sound field as a linear combination of said N encoded sound fields each weighted by a corresponding weighting factor, wherein said interpolating comprises estimating said N weighting factors at least from: said interpolation position; a position of each of said N microphones; said N pressures captured by said N microphones; and an estimated power of said sound field at said interpolation position.
16. A device for interpolating a sound field captured by a plurality of N microphones each outputting said sound field encoded in a form comprising at least one captured pressure and an associated pressure gradient vector, said device comprising: a reprogrammable computing machine or a dedicated computing machine, configured to: receive sound field captured by the N microphones; and interpolate said sound field at an interpolation position outputting an interpolated encoded sound field expressed as a linear combination of said N encoded sound fields each weighted by a corresponding weighting factor, wherein said reprogrammable computing machine or said dedicated computing machine is further configured to estimate said N weighting factors from at least: said interpolation position; a position of each of said N microphones; said N pressures captured by said N microphones, and an estimate of the power of said sound field at said interpolation position.
17. The device of claim 16, further comprising the plurality of N microphones.
18. The method of claim 1, further comprising capturing the sound field by the plurality of N microphones.
Description
LIST OF FIGURES
(1) Other objects, features and advantages of the invention will appear more clearly upon reading the following description, provided merely as an illustrative and non-limiting example, with reference to the figures.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
(13) In all figures of the present document, identical elements and steps bear the same reference numeral.
(14) The general principle of the invention is based on encoding of the sound field by the microphones capturing the considered sound field in a form comprising at least one captured pressure and an associated pressure gradient. In this manner, the pressure gradient of the field interpolated through a linear combination of the sound fields encoded by the microphones remains coherent with that of the sound field as emitted by the source(s) of the scene at the interpolation position. Moreover, the method according to the invention bases the estimation of the weighting factors involved in the considered linear combination on an estimate of the power of the sound field at the interpolation position. Thus, a low computing complexity is obtained.
(15) In the following, a particular example of application of the invention to the context of navigation of a listener in a sound stage is considered. Of course, it should be noted that the invention is not limited to this type of application and may advantageously be used in other fields such as the rendering of a multi-channel scene, the compression of a multi-channel scene, etc.
(16) Moreover, in the present application: the term encoding (or coding) is used to refer to the operation of representing a physical sound field captured by a given microphone according to one or several quantities according to a predefined representation format. For example, such a format is the ambisonic format described hereinabove in connection with the “The prior art and its drawbacks” section. The reverse operation then amounts to a rendering of the sound field, for example on a loudspeaker-type device which converts samples of the sound fields in the predefined representation format into a physical acoustic field; and the term compression is, in turn, used to refer to a processing aiming to reduce the amount of data necessary to represent a given amount of information. For example, it consists of an “entropic coding” type processing (for example, according to the MP3 standard) applied to the samples of the encoded sound field. Thus, the term decompression corresponds to the reverse operation.
(17) As of now, a sound stage 100 wherein a listener 110 moves, a sound field having been diffused by sound sources 100s and having been captured by microphones 100m are presented, with reference to [
(18) More particularly, the listener 110 is provided with a headset equipped with loudspeakers 110hp enabling rendering of the interpolated sound field at the interpolation position occupied thereby. For example, it consists of Hi-Fi headphones, or a virtual reality headset such as Oculus, HTC Vive or Samsung Gear. In this instance, the sound field is interpolated and rendered through the implementation of the rendering method described hereinbelow with reference to [
(19) Moreover, the sound field captured by the microphones 100m is encoded in a form comprising a captured pressure and an associated pressure gradient.
(20) In other non-illustrated embodiments, the sound field captured by the microphones is encoded in a form comprising the captured pressure, the associated pressure gradient vector as well as all or part of the higher order components of the sound field in the ambisonic format.
(21) Back to [
(22)
(23) It can be shown that this vector is orthogonal to the wavefront and points in the direction of propagation of the sound wave, namely away from the emitter source; it is therefore directly correlated with the perception of the wavefront. This is particularly obvious when considering a field generated by a single distant point source s(t) propagating in an anechoic environment. The ambisonics theory states that, for such a plane wave with an incidence (ϑ, φ), where ϑ is the azimuth and φ the elevation, the first-order sound field is given by the following equation:
(24)
(25) In this case, the full-band acoustic intensity {right arrow over (I)}(t) is equal, up to a multiplicative coefficient, to:
(26)
(27) Hence, it points away from the emitter source, and the direction of arrival (ϑ, φ) of the wavefront may be estimated by the following trigonometric relationships:
(28)
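The trigonometric relationships can be illustrated by the following minimal sketch. It assumes that the intensity vector is given by its three Cartesian components in the reference frame used for the azimuth and the elevation, and that it points away from the source as stated above. The function name `direction_of_arrival` is hypothetical and does not come from the present description.

```python
import math

def direction_of_arrival(ix, iy, iz):
    """Return (azimuth, elevation) in radians of the source.

    Assumption: the intensity vector (ix, iy, iz) points away from the
    source, so it is flipped before the angles are computed.
    """
    sx, sy, sz = -ix, -iy, -iz
    azimuth = math.atan2(sy, sx)
    elevation = math.atan2(sz, math.hypot(sx, sy))
    return azimuth, elevation

# Plane wave coming from azimuth 60 degrees and elevation 20 degrees:
# the intensity vector is the opposite of the unit vector towards the source.
az = math.radians(60.0)
el = math.radians(20.0)
source_dir = (math.cos(az) * math.cos(el),
              math.sin(az) * math.cos(el),
              math.sin(el))
intensity = tuple(-c for c in source_dir)
est_az, est_el = direction_of_arrival(*intensity)
```

Using `atan2` rather than a plain arctangent keeps the azimuth in the correct quadrant.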
(29) As of now, a method for interpolating the sound field captured by the microphones 100m of the stage 100 according to an embodiment of the invention is presented, with reference to [
(30) Such a method comprises a step E200 of selecting N microphones among the Nt microphones of the stage 100. It should be noted that in the embodiment represented in [
(31) More particularly, as discussed hereinbelow in connection with steps E210 and E210a, the method according to the invention implements the resolution of systems of equations (i.e. [Math 4] under the different constraint alternatives, namely the hyperplane constraint and/or positive weighting factors, and [Math 5]). In practice, when the considered systems are underdetermined (which corresponds to the configuration where there are more microphones 100m than equations to be solved), their resolution leads to solutions that might favor different sets of microphones over time. While the location of the sources 100s as perceived via the interpolated sound field remains coherent, there are nevertheless timbre changes that are perceptible by the ear. These differences are due: i) to the colouring of the reverberation, which differs from one microphone 100m to another; and ii) to the comb filtering induced by the mixture of non-coincident microphones 100m, whose characteristics differ from one set of microphones to another.
(32) To avoid such timbre changes, N microphones 100m are selected while always ensuring that the mixture is determined, and even overdetermined. For example, in the case of a 3D interpolation, it is possible to select up to three microphones among the Nt microphones 100m.
(33) In one variant, the N microphones 100m that are the closest to the position to be interpolated are selected. This solution should be preferred when a large number Nt of microphones 100m is present in the stage. However, in some cases, the selection of the closest N microphones 100m could turn out to be “imbalanced” considering the interpolation position with respect to the source 100s and lead to a total reversal of the direction of arrival: this is the case in particular when the source 100s is placed between the microphones 100m and the interpolation position.
(34) To avoid this situation, in another variant, the N microphones are selected so as to be distributed around the interpolation position. For example, the two microphones bearing the indexes i.sub.1 and i.sub.2 that are the closest to the interpolation position are selected among the Nt microphones 100m, and the remaining microphones are then searched for the one that maximises the “enveloping” of the interpolation position. To achieve this, step E200 comprises for example: a selection of two microphones bearing the indexes i.sub.1 and i.sub.2 that are the closest to the interpolation position among the Nt microphones 100m; a calculation of a median vector u.sub.12(t) having the interpolation position as an origin and pointing between the positions of the two microphones bearing the indexes i.sub.1 and i.sub.2; and a determination of a third microphone bearing an index i.sub.3 different from the two microphones bearing the indexes i.sub.1 and i.sub.2 among the Nt microphones 100m and whose position is the most opposite to the median vector u.sub.12(t).
(35) For example, the median vector u.sub.12(t) is expressed as:
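(36) $$u_{12}(t) = \frac{x_{i_1}(t) + x_{i_2}(t) - 2\,x_a(t)}{\lVert x_{i_1}(t) + x_{i_2}(t) - 2\,x_a(t) \rVert}$$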
(37) with: x.sub.a(t)=(x.sub.a(t) y.sub.a(t) z.sub.a(t)).sup.T a vector representative of the interpolation position (i.e. the position of the listener 110 in the embodiment represented in [
(38) the considered vectors being expressed in a given reference frame.
(39) In this case, the index i.sub.3 of said third microphone is, for example, an index different from i.sub.1 and i.sub.2 which minimises the scalar product
(40) $$\left\langle u_{12}(t),\ \frac{x_i(t) - x_a(t)}{\lVert x_i(t) - x_a(t) \rVert} \right\rangle$$
among the Nt indexes of the microphones 100m. Indeed, the considered scalar product varies between −1 and +1, and it is minimum when the vectors $u_{12}(t)$ and
(41) $$\frac{x_i(t) - x_a(t)}{\lVert x_i(t) - x_a(t) \rVert}$$
are opposite to one another, that is to say when the 3 microphones selected among the Nt microphones 100m surround the interpolation position.
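The selection of the three microphones described above can be sketched as follows. This is only an illustration under stated assumptions: positions are plain coordinate tuples in a common reference frame, the median vector is taken as the unit vector from the interpolation position towards the midpoint of the two closest microphones, and the interpolation position coincides neither with a microphone nor with that midpoint. The names `select_three`, `mics` and `listener` are hypothetical.

```python
import math

def _unit(v):
    """Return v scaled to unit Euclidean length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def select_three(mics, x_a):
    """Return indexes (i1, i2, i3) of three microphones surrounding x_a."""
    # i1, i2: the two microphones closest to the interpolation position.
    by_distance = sorted(range(len(mics)), key=lambda i: math.dist(mics[i], x_a))
    i1, i2 = by_distance[0], by_distance[1]
    # Median vector (assumed form): from x_a towards the midpoint of i1 and i2.
    u12 = _unit(tuple(p + q - 2 * a for p, q, a in zip(mics[i1], mics[i2], x_a)))

    def alignment(i):
        # Scalar product between u12 and the unit direction from x_a to mic i.
        d = _unit(tuple(m - a for m, a in zip(mics[i], x_a)))
        return sum(p * q for p, q in zip(u12, d))

    # i3: the remaining microphone whose direction is most opposite to u12.
    candidates = [i for i in range(len(mics)) if i not in (i1, i2)]
    i3 = min(candidates, key=alignment)
    return i1, i2, i3

# Four microphones at the corners of a 4 m x 4 m square, listener near a corner.
mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
listener = (1.0, 0.5, 0.0)
chosen = select_three(mics, listener)
```

In this layout the two closest microphones are those bearing the indexes 0 and 1, and the microphone bearing the index 3 is retained as the third one because its direction is the most opposite to the median vector.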
(42) In other embodiments that are not illustrated in [
(43) Back to [
(44) Thus, in the embodiment discussed hereinabove with reference to [
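(45) [Math 1] $$\begin{pmatrix} W_a(t) \\ X_a(t) \\ Y_a(t) \\ Z_a(t) \end{pmatrix} = \sum_{i=1}^{N} a_i(t) \begin{pmatrix} W_i(t) \\ X_i(t) \\ Y_i(t) \\ Z_i(t) \end{pmatrix}$$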
(46) with: (W.sub.i(t) X.sub.i(t) Y.sub.i(t) Z.sub.i(t)).sup.T the column vector of the field in the encoded format output by the microphone bearing the index i, i an integer from 1 to N; (W.sub.a(t) X.sub.a(t) Y.sub.a(t) Z.sub.a(t)).sup.T the column vector of the field in the encoded format at the interpolation position (for example, the position of the listener 110 in the embodiment illustrated in [
(47) In other embodiments that are not illustrated in [
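(48) $$\begin{pmatrix} W_a(t) \\ X_a(t) \\ Y_a(t) \\ Z_a(t) \\ \vdots \end{pmatrix} = \sum_{i=1}^{N} a_i(t) \begin{pmatrix} W_i(t) \\ X_i(t) \\ Y_i(t) \\ Z_i(t) \\ \vdots \end{pmatrix}$$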
where the dots refer to the higher-order components of the sound field decomposed in the ambisonic format.
(49) Regardless of the embodiment considered for encoding of the sound field, the interpolation method according to the invention applies in the same manner in order to estimate the weighting factors a.sub.i(t).
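As an illustration, the linear combination of the N encoded sound fields may be sketched as below, assuming that each encoded field is available, at a given time point, as a list of components (W, X, Y, Z, and possibly higher-order components), and that the weighting factors have already been estimated. The function name `interpolate_field` and the numerical values are hypothetical.

```python
def interpolate_field(encoded_fields, weights):
    """Weighted sum of N encoded fields, each a sequence (W, X, Y, Z, ...)."""
    n_components = len(encoded_fields[0])
    return [
        sum(a * field[c] for a, field in zip(weights, encoded_fields))
        for c in range(n_components)
    ]

# Three first-order encoded fields (W, X, Y, Z) at one time point.
fields = [
    [1.0, 0.5, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [4.0, 0.0, 0.0, 2.0],
]
a = [0.5, 0.25, 0.25]  # weighting factors, assumed already estimated
interp = interpolate_field(fields, a)
```

The same function applies unchanged when higher-order components are appended to each field, since every component is combined with the same weighting factors.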
(50) For this purpose, the method of [=(
(t)
(t)
(t)).sup.T, coherent relative to the position of the sources 100s present in the sound stage 100.
(51) More particularly, in the embodiment of [
(52)
(53) with: x.sub.i(t)=(x.sub.i(t) y.sub.i(t) z.sub.i(t)).sup.T a vector representative of the position of the microphone 100m bearing the index i; x.sub.s(t)=(x.sub.s(t) y.sub.s(t) z.sub.s(t)).sup.T a vector representative of the position of the active source 100s; and d(x.sub.i(t), x.sub.s(t)) is the distance between the microphone 100m bearing the index i and the active source 100s.
(54) In this instance, the equation [Math 2] simply reflects the fact that, for a plane wave: the first-order component (i.e. the pressure gradient vector) of the encoded sound field is directed along the “source-capture point” direction; and the amplitude of the sound field decreases in inverse proportion to the distance.
(55) At first glance, the distance d(x.sub.i(t),x.sub.s(t)) is unknown, but it is possible to observe that, assuming a single plane wave, the instantaneous acoustic pressure W.sub.i(t) at the microphone 100m bearing the index i is, in turn, inversely proportional to this distance. Thus:
(56) $$W_i(t) \propto \frac{1}{d(x_i(t), x_s(t))}$$
(57) By substituting this relationship in [Math 2], the following proportional relationship is obtained:
$$B_i \propto W_i^2(t)\,\bigl(x_i(t) - x_s(t)\bigr)$$
(58) By substituting the latter relationship into [Math 1], the following equation is obtained:
(59) $$W_a^2(t)\,\bigl(x_a(t) - x_s(t)\bigr) = \sum_i a_i(t)\,W_i^2(t)\,\bigl(x_i(t) - x_s(t)\bigr)$$
(60) with x.sub.a(t)=(x.sub.a(t) y.sub.a(t) z.sub.a(t)).sup.T a vector representative of the interpolation position in the aforementioned reference frame. By reorganizing, we obtain:
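(61) [Math 3] $$\sum_i a_i(t)\,W_i^2(t)\,x_i(t) - W_a^2(t)\,x_a(t) = \Bigl(\sum_i a_i(t)\,W_i^2(t) - W_a^2(t)\Bigr)\,x_s(t)$$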
(62) In general, the aforementioned different positions (for example, of the active source 100s, of the microphones 100m, of the interpolation position, etc.) vary over time. Thus, in general, the weighting factors a.sub.i(t) are time-dependent. Estimating the weighting factors a.sub.i(t) amounts to solving a system of three linear equations (written hereinabove in the form of one single vector equation in [Math 3]). For the interpolation to remain coherent over time with the interpolation position which may vary over time (for example, the considered position corresponds to the position of the listener 110 who could move), it is carried out at different time points with a time resolution T.sub.a adapted to the speed of change of the interpolation position. In practice, a refresh frequency f.sub.a=1/T.sub.a is substantially lower than the sampling frequency f.sub.s of the acoustic signals. For example, an update of the interpolation coefficients a.sub.i(t) every T.sub.a=100 ms is quite enough.
(63) In [Math 3], the square of the sound pressure at the interpolation position, $W_a^2(t)$, also called instantaneous acoustic power (or more simply instantaneous power), is unknown; the same applies to the vector $x_s(t)$ representative of the position of the active source 100s.
(64) To be able to estimate the weighting factors $a_i(t)$ based on a resolution of [Math 3], an estimate $\widehat{W}_a^2(t)$ of the acoustic power at the interpolation position is first obtained, for example according to one of the following approaches.
(65) A first approach consists in approximating the instantaneous acoustic power by that captured by the microphone 100m that is the closest to the considered interpolation position, i.e.:
(66) $$\widehat{W}_a^2(t) = W_k^2(t), \quad k = \arg\min_i\, d\bigl(x_i(t), x_a(t)\bigr)$$
(67) In practice, the instantaneous acoustic power W.sub.k.sup.2(t) may vary quickly over time, which may lead to a noisy estimate of the weighting factors a.sub.i(t) and to an instability of the interpolated stage. Thus, in some variants, the average or effective power captured by the microphone 100m that is the closest to the interpolation position is calculated over a time window around the considered time point, by averaging the instantaneous power over a frame of T samples:
(68) $$\widehat{W}_k^2(t) = \frac{1}{T} \sum_{\tau=0}^{T-1} W_k^2(t-\tau)$$
(69) where T corresponds to a duration of a few tens of milliseconds, or equal to the refresh time resolution of the weighting factors a.sub.i(t).
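The averaging of the instantaneous power over a frame can be sketched as follows. The sketch assumes that the pressure signal of the closest microphone is a list of samples and that the frame ends at the considered time index; the exact position of the window around the time point is an assumption. The function name `mean_power` is hypothetical.

```python
def mean_power(pressure, t, frame_len):
    """Average of W^2 over the frame_len samples ending at index t.

    Assumption: the frame ends at t; it is shortened at the start of the
    signal when fewer than frame_len samples are available.
    """
    start = max(0, t - frame_len + 1)
    frame = pressure[start:t + 1]
    return sum(w * w for w in frame) / len(frame)

# Pressure samples captured by the closest microphone (illustrative values).
w_k = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]
p_hat = mean_power(w_k, t=5, frame_len=4)  # mean of 4, 4, 9, 9
```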
(70) In other variants, it is possible to estimate the actual power by autoregressive smoothing in the form: $\widehat{W}_i^2(t) = \alpha_w\,\widehat{W}_i^2(t-1) + (1-\alpha_w)\,W_i^2(t)$,
(71) where the forgetting factor α.sub.w is determined so as to integrate the power over a few tens of milliseconds. In practice, values from 0.95 to 0.98 for sampling frequencies of the signal ranging from 8 kHz to 48 kHz achieve a good tradeoff between the robustness of the interpolation and its responsiveness to changes in the position of the source.
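The autoregressive smoothing may be illustrated by the short sketch below, which applies the recursion sample by sample. The initial value of the estimate is an assumption (zero here), and the function name `smooth_power` is hypothetical.

```python
def smooth_power(pressure, alpha_w=0.95, initial=0.0):
    """First-order autoregressive smoothing of the instantaneous power W^2.

    Implements est(t) = alpha_w * est(t-1) + (1 - alpha_w) * W(t)^2.
    The starting value `initial` is an assumption of this sketch.
    """
    est = initial
    out = []
    for w in pressure:
        est = alpha_w * est + (1.0 - alpha_w) * w * w
        out.append(est)
    return out

# A constant pressure of 2.0 has an instantaneous power of 4.0; the smoothed
# estimate starts from the initial value and converges towards that power.
smoothed = smooth_power([2.0] * 1000)
```

A forgetting factor closer to 1 gives a smoother but slower estimate, which is the tradeoff mentioned above.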
(72) In a second approach, the instantaneous acoustic power $W_a^2(t)$ at the interpolation position is estimated as a barycentre of the N estimates $\widehat{W}_i^2(t)$ of the N instantaneous powers $W_i^2(t)$ of the N pressures captured by the selected N microphones 100m. Such an approach turns out to be more relevant when the microphones 100m are spaced apart from one another. For example, the barycentric coefficients are determined according to the distance $\lVert x_i(t) - x_a(t) \rVert_p$, where p is a positive real number and $\lVert\cdot\rVert_p$ is the L-p norm, between the interpolation position and the microphone 100m bearing the index i among the N microphones 100m. Thus, according to this second approach:
(73)
(74) where $\tilde{d}(x_i(t), x_a(t))$ is the normalised version of $\lVert x_i(t) - x_a(t) \rVert_p$ such that $\sum_i \tilde{d}(x_i(t), x_a(t)) = 1$. Thus, the coefficient weighting the estimate $\widehat{W}_i^2(t)$ of the instantaneous power $W_i^2(t)$ of the pressure captured by the microphone 100m bearing the index i, in the barycentric expression hereinabove, is inversely proportional to a normalised version of the distance, in the sense of an L-p norm, between the position of the microphone bearing the index i outputting the pressure $W_i(t)$ and the interpolation position.
(75) In some alternatives, the instantaneous acoustic power $W_a^2(t)$ at the interpolation position is directly estimated as a barycentre of the N instantaneous powers $W_i^2(t)$ of the N pressures captured by the N microphones 100m. In practice, this amounts to substituting $W_i^2(t)$ for $\widehat{W}_i^2(t)$ in the equation hereinabove.
(76) Moreover, different options for the norm p may be considered. For example, a low value of p tends to average the power over the entire area delimited by the microphones 100m, whereas a high value tends to favour the microphone 100m that is the closest to the interpolation position, the case p=∞ amounting to estimating by the power of the closest microphone 100m. For example, when p is selected equal to two, the decay law of the pressure of the sound field is met, leading to good results regardless of the configuration of the stage.
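The barycentric estimate can be sketched as follows. Since the barycentric expression itself is not reproduced in the text above, the sketch relies on one literal reading of it, which is an assumption: each coefficient is taken proportional to the inverse of the normalised L-p distance, and the coefficients are rescaled so that they sum to one. The function names `lp_norm` and `barycentric_power` are hypothetical.

```python
def lp_norm(v, p):
    """L-p norm of the vector v, for a positive real number p."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def barycentric_power(mic_positions, mic_powers, x_a, p=2.0):
    """Estimate the power at x_a as a barycentre of the microphone powers.

    Assumed weighting: coefficient i proportional to 1 / d_i, where d_i is
    the L-p distance to x_a normalised so that the d_i sum to one.
    """
    dists = [lp_norm([m - a for m, a in zip(pos, x_a)], p)
             for pos in mic_positions]
    # A microphone located exactly at x_a gives the estimate directly.
    for d, power in zip(dists, mic_powers):
        if d == 0.0:
            return power
    total = sum(dists)
    normalised = [d / total for d in dists]   # sums to one
    inverse = [1.0 / d for d in normalised]   # inversely proportional
    scale = sum(inverse)
    return sum(w / scale * power for w, power in zip(inverse, mic_powers))

# Two microphones 4 m apart, interpolation position 1 m from the first one.
positions = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
powers = [8.0, 4.0]
p_a = barycentric_power(positions, powers, (1.0, 0.0, 0.0))
```

With these values the closest microphone receives a coefficient of 0.75 and the other one 0.25, so the estimate lies closer to the power captured by the closest microphone.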
(77) Moreover, the estimation of the weighting factors a.sub.i(t) based on a resolution of [Math 3] requires addressing the problem of not knowing the vector representative of the position x.sub.s(t) of the active source 100s.
(78) In a first variant, the weighting factors $a_i(t)$ are estimated while neglecting the term containing the unknown position of the source, i.e. the right-side member in [Math 3]. Moreover, starting from the estimate $\widehat{W}_a^2(t)$ of the power at the interpolation position and from the estimate $\widehat{W}_i^2(t)$ of the instantaneous power $W_i^2(t)$ captured by the microphones 100m, neglecting the right-side member of [Math 3] amounts to solving the following system of three linear equations, written herein in the vector form:
(79) [Math 4] $$\sum_i a_i(t)\,\widehat{W}_i^2(t)\,x_i(t) = \widehat{W}_a^2(t)\,x_a(t)$$
(80) Thus, it arises that the weighting factors $a_i(t)$ are estimated from: the interpolation position, represented by the vector $x_a(t)$; the position of each of the N microphones 100m, represented by the corresponding vector $x_i(t)$, i from 1 to N, in the aforementioned reference frame; the N pressures $W_i(t)$, i from 1 to N, captured by the N microphones; and the estimated power $\widehat{W}_a^2(t)$ of the sound field at the interpolation position, $\widehat{W}_a^2(t)$ being actually estimated from the considered quantities as described hereinabove.
(81) For example, [Math 4] is solved in the sense of mean squared error minimisation, for example by minimising the cost function $\lVert \sum_i a_i(t)\,\widehat{W}_i^2(t)\,x_i(t) - \widehat{W}_a^2(t)\,x_a(t) \rVert^2$. In practice, the solving method (for example, the Simplex algorithm) is selected according to the overdetermined (more equations than microphones) or underdetermined (more microphones than equations) nature of the system.
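For the determined case with N = 3 microphones in three dimensions, the resolution of the system can be sketched in plain Python as below. This is a minimal illustration, not the solving method of the description: it uses Gaussian elimination with partial pivoting instead of a least-squares or Simplex solver, and it assumes that the three vectors formed by the microphone positions scaled by their power estimates are linearly independent in the chosen reference frame. The names `solve_linear` and `weights_math4` are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weights_math4(mic_positions, mic_powers, x_a, power_a):
    """Weights a_i with sum_i a_i * P_i * x_i = P_a * x_a, for N = 3 in 3-D."""
    # Column i of the system matrix is the position of mic i scaled by P_i.
    a = [[mic_powers[i] * mic_positions[i][k] for i in range(3)]
         for k in range(3)]
    b = [power_a * x_a[k] for k in range(3)]
    return solve_linear(a, b)

mics = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
powers = [2.0, 4.0, 1.0]       # power estimates at the microphones
x_a = (0.5, 1.0, 1.0)          # interpolation position
power_a = 2.0                  # power estimate at the interpolation position
a = weights_math4(mics, powers, x_a, power_a)
```

When the three scaled position vectors are coplanar with the origin of the reference frame, the matrix is singular and this sketch fails with a division by zero; an overdetermined or constrained formulation is then needed.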
(82) In a second variant, the weighting factors $a_i(t)$ are no longer estimated while neglecting the term containing the unknown position of the source, i.e. the right-side member of [Math 3], but while constraining the search for the coefficients $a_i(t)$ around the hyperplane $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$. Indeed, in the case where the estimate $\widehat{W}_a^2(t)$ is a reliable estimate of the actual power $W_a^2(t)$, imposing that the coefficients $a_i(t)$ meet “to the best” the relationship $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$ implies that the right-side member in [Math 3] is low, and therefore any solution that solves the system of equations [Math 4] properly rebuilds the pressure gradients.
(83) Thus, in this second variant, the weighting factors $a_i(t)$ are estimated by solving the system [Math 4] with the constraint that $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$. In the considered system, $\widehat{W}_i^2(t)$ and $\widehat{W}_a^2(t)$ are, for example, estimated according to one of the variants provided hereinabove. In practice, solving such a linear system with a linear constraint may be completed by the Simplex algorithm or any other constrained minimisation algorithm.
(84) To accelerate the search, it is possible to add a constraint of positivity of the weighting factors $a_i(t)$. In this case, the weighting factors $a_i(t)$ are estimated by solving the system [Math 4] with the dual constraint that $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$, and that $\forall i,\ a_i(t) \geq 0$. Moreover, the constraint of positivity of the weighting factors $a_i(t)$ allows avoiding phase reversals, thereby leading to better estimation results.
(85) Alternatively, in order to reduce the computing time, another implementation consists in directly integrating the hyperplane constraint $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$ into the system [Math 4], which ultimately amounts to resolution of the linear system:
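(86) [Math 5] $$\begin{cases} \sum_i a_i(t)\,\widehat{W}_i^2(t)\,x_i(t) = \widehat{W}_a^2(t)\,x_a(t) \\ \alpha \sum_i a_i(t)\,\widehat{W}_i^2(t) = \alpha\,\widehat{W}_a^2(t) \end{cases}$$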
(87) In this instance, the coefficient α allows homogenising the units of the quantities $\widehat{W}_a^2(t)\,x_a(t)$ and $\widehat{W}_a^2(t)$. Indeed, the considered quantities are not homogeneous and, depending on the unit selected for the position coordinates (meter, centimeter, . . . ), the solutions will favor either the set of equations $\sum_i a_i(t)\,\widehat{W}_i^2(t)\,x_i(t) = \widehat{W}_a^2(t)\,x_a(t)$, or the hyperplane $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$. In order to make these quantities homogeneous, the coefficient α is, for example, selected equal to the L-2 norm of the vector $x_a(t)$, i.e. $\alpha = \lVert x_a(t) \rVert_2$, with
(88) $$\lVert x_a(t) \rVert_2 = \sqrt{x_a^2(t) + y_a^2(t) + z_a^2(t)}$$
In practice, it may be interesting to constrain the interpolation coefficients even more strongly to meet the hyperplane constraint $\sum_i a_i(t)\,\widehat{W}_i^2(t) = \widehat{W}_a^2(t)$. This may be obtained by weighting the homogenisation factor α by an amplification factor λ>1. The results show that an amplification factor λ from 2 to 10 makes the prediction of the pressure gradients more robust.
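The system integrating the hyperplane constraint can be sketched as follows: the three position equations are stacked with a fourth row carrying the constraint scaled by the homogenisation factor α times λ, and the resulting system is solved in the least-squares sense through its normal equations. The sketch is written in plain Python for N = 3 microphones; the use of normal equations, and the names `solve_linear` and `weights_math5`, are assumptions of this illustration.

```python
import math

def solve_linear(a, b):
    """Solve the square system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weights_math5(mic_positions, mic_powers, x_a, power_a, lam=1.0):
    """Least-squares weights for the system augmented with the hyperplane row."""
    n = len(mic_positions)
    # Homogenisation factor: L-2 norm of x_a, amplified by lam.
    alpha = lam * math.sqrt(sum(c * c for c in x_a))
    rows = [[mic_powers[i] * mic_positions[i][k] for i in range(n)]
            for k in range(3)]
    rows.append([alpha * mic_powers[i] for i in range(n)])
    rhs = [power_a * x_a[k] for k in range(3)] + [alpha * power_a]
    # Normal equations (A^T A) a = A^T b of the least-squares problem.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb)

# Three coplanar microphones (z = 0) with equal power estimates.
mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
powers = [1.0, 1.0, 1.0]
weights = weights_math5(mics, powers, (1.0, 1.0, 0.0), 1.0, lam=4.0)
```

In this coplanar layout the three position equations alone do not determine the weights, and it is the added hyperplane row that makes the solution unique, here (0.5, 0.25, 0.25).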
(89) Thus, we also note in this second variant that the weighting factors $a_i(t)$ are estimated from: the interpolation position, represented by the vector $x_a(t)$; the position of each of the N microphones 100m, each represented by the corresponding vector $x_i(t)$, i from 1 to N; the N pressures $W_i(t)$, i from 1 to N, captured by the N microphones; and the estimated power $\widehat{W}_a^2(t)$ of the sound field at the interpolation position,
(90) $\widehat{W}_a^2(t)$ being actually estimated from the considered quantities as described hereinabove.
(91) As of now, the performances of the method of [
(92) More particularly, the four microphones 300m are disposed at the four corners of a room and the source 300s is disposed at the center of the room. The room has an average reverberation, with a reverberation time or T.sub.60 of about 500 ms. The sound field captured by the microphones 300m is encoded in a form comprising a captured pressure and the associated pressure gradient vector.
(93) The results obtained by application of the method of [
(94)
(95) The simulations show that this heuristic formula provides better results than the method with fixed weights suggested in the literature.
(96) To measure the performance of the interpolation of the field, we use the intensity vector {right arrow over (I)}(t) which theoretically should point in the direction opposite to the active source 300s. In [
(97) As of now, the performances of the method of [
(98) More particularly, in comparison with the configuration of the stage 300 of [
(99) In [
(100) As of now, another embodiment of the method for interpolating the sound field captured by the microphones 100m of the stage 100 is presented, with reference to [
(101) According to the embodiment of [
(102) However, in other embodiments that are not illustrated in [
(103) Back to [
(104) To avoid this, the embodiment of [
(105) Thus, at a step E500, for a given encoded sound field among the N encoded sound fields output by the selected N microphones 100m, a transformation of the given encoded sound field is performed by application of a time-frequency transformation such as a Fourier transform or a perfect or almost perfect reconstruction filter bank, such as quadrature mirror filters or QMF. Such a transformation outputs M field frequency components associated with the given encoded sound field, each field frequency component among the M field frequency components being located within a distinct frequency sub-band.
(106) For example, the encoded field vector ψ.sub.i output by the microphone bearing the index i, i from 1 to N, is segmented into frames bearing the index n, of a size T compatible with the stationarity of the sources present in the stage:
ψ.sub.i(n)=[ψ.sub.i(t.sub.n−T+1)ψ.sub.i(t.sub.n−T+2) . . . ψ.sub.i(t.sub.n)].
(107) For example, the frame rate corresponds to the refresh rate of the weighting factors a.sub.i(t), of period T.sub.a, i.e.:
t.sub.n+1=t.sub.n+E[T.sub.a/T.sub.s],
where T.sub.s=1/f.sub.s is the sampling period of the signals, f.sub.s being their sampling frequency, and E[⋅] refers to the floor function.
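The frame segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `frame_hop` and `segment_frames` are hypothetical, the first frame is assumed to end at sample index T−1, and the hop E[T.sub.a/T.sub.s] is computed as the floor of T.sub.a·f.sub.s.

```python
import math
import numpy as np

def frame_hop(T_a, f_s):
    """Number of samples between two frames: E[T_a / T_s] with T_s = 1 / f_s."""
    return math.floor(T_a * f_s)

def segment_frames(psi, T, hop):
    """Cut the last axis of psi into frames of T samples.

    Frame n holds the samples t_n - T + 1 .. t_n, with t_{n+1} = t_n + hop.
    The choice t_0 = T - 1 (first complete frame) is an assumption."""
    psi = np.asarray(psi)
    frame_ends = range(T - 1, psi.shape[-1], hop)   # the instants t_n
    frames = [psi[..., t - T + 1 : t + 1] for t in frame_ends]
    # A new "frame index" axis is placed just before the sample axis.
    return np.stack(frames, axis=-2)
```

When the hop is smaller than T, consecutive frames overlap; when `psi` carries several components (pressure, gradient components), they are all segmented with the same frame instants.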
(108) Thus, the transformation is applied to each component of the vector ψ.sub.i representing the sound field encoded by the microphone 100m bearing the index i (i.e. to the captured pressure, to the components of the pressure gradient vector and, where appropriate, to the higher-order components present in the encoded sound field), to produce a time-frequency representation. For example, the considered transformation is a direct Fourier transform. In this manner, the following is obtained for the l-th component ψ.sub.i,l of the vector ψ.sub.i:
(109)
(110) where j=√(−1) and ω is the normalised angular frequency.
(111) In practice, it is possible to select T as a power of two (for example, immediately greater than T.sub.a) and select ω=2πk/T, 0≤k<T so as to implement the Fourier transform in the form of a fast Fourier transform
(112)
(113) In this case, the number of frequency components M is equal to the size of the analysis frame T. When T>T.sub.a, the frame may also be zero-padded up to the size T so that the fast Fourier transform can be applied. Thus, for a considered frequency sub-band ω (or k in the case of a fast Fourier transform), the vector formed by all of the components ψ.sub.i,l(n, ω) (or ψ.sub.i,l(n, k)) for the different l represents the frequency component of the field ψ.sub.i within the considered frequency sub-band ω (or k).
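By way of a hedged example, the fast Fourier transform with zero-padding mentioned above may be sketched with NumPy. The helper names `next_pow2` and `frame_spectrum` are hypothetical, and NumPy's DFT convention (negative exponent, ω=2πk/T, no normalisation) is assumed here; the exact indexing used in the source equations may differ.

```python
import numpy as np

def next_pow2(n):
    """Smallest power of two that is not less than n (n >= 1)."""
    return 1 << (int(n) - 1).bit_length()

def frame_spectrum(frame, T):
    """FFT of one frame along its last axis, zero-padded to T points.

    Bin k corresponds to the normalised angular frequency 2*pi*k/T,
    so the transform outputs M = T frequency components."""
    return np.fft.fft(frame, n=T, axis=-1)
```

For instance, a frame of 960 samples would be transformed with T = next_pow2(960), the 64 missing samples being filled with zeros by `np.fft.fft`.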
(114) Moreover, in other variants, the transformation applied at step E500 is not a Fourier transform but an (almost) perfect reconstruction filter bank, for example a filter bank of one of the following types: QMF (standing for “Quadrature Mirror Filter”); PQMF (standing for “Pseudo-Quadrature Mirror Filter”); or MDCT (standing for “Modified Discrete Cosine Transform”).
(115) Back to [
(116) In this manner, steps E210 and E210a described hereinabove with reference to [
(117) For example, in order to implement the resolution of the systems [Math 4] or [Math 5], the effective power of each frequency sub-band is estimated either by a rolling average:
(118)
(119) or by an autoregressive filtering: Ŵ.sub.i.sup.2(n,ω)=α.sub.wŴ.sub.i.sup.2(n−1,ω)+(1−α.sub.w)|W.sub.i.sup.2(n,ω)|.
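The two power estimators mentioned above can be sketched as follows. This is an illustration under assumptions: the function names, the window length `P` of the rolling average, the default value of `alpha_w` and the initialisation of the recursion with the first frame are all hypothetical choices; only the autoregressive recursion itself is taken from the text.

```python
import numpy as np

def autoregressive_power(W, alpha_w=0.9):
    """Per-sub-band power estimate by first-order autoregressive filtering:
         est(n) = alpha_w * est(n - 1) + (1 - alpha_w) * |W(n)|^2
    W has shape (n_frames, M): one row of M sub-band pressures per frame."""
    inst = np.abs(np.asarray(W)) ** 2
    est = np.empty_like(inst)
    est[0] = inst[0]                      # initialisation: an assumption
    for n in range(1, len(inst)):
        est[n] = alpha_w * est[n - 1] + (1.0 - alpha_w) * inst[n]
    return est

def rolling_power(W, P):
    """Per-sub-band power estimate by a rolling average over the last P frames
    (fewer at the start, where fewer than P frames are available)."""
    inst = np.abs(np.asarray(W)) ** 2
    est = np.empty_like(inst)
    for n in range(len(inst)):
        est[n] = inst[max(0, n - P + 1) : n + 1].mean(axis=0)
    return est
```

A value of `alpha_w` close to 1 gives a smoother but slower-reacting estimate, which plays the same role as a longer window `P` in the rolling average.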
(120) Thus, the interpolation repeated for the M frequency sub-bands outputs M interpolated field frequency components at the interpolation position, each interpolated field frequency component among the M interpolated field frequency components being located within a distinct frequency sub-band.
(121) Then, at a step E510, the inverse of the transformation applied at step E500 is applied to the M interpolated field frequency components, outputting the interpolated encoded sound field at the interpolation position.
(122) For example, considering again the example provided hereinabove where the transformation applied at step E500 is a direct Fourier transform, the inverse transformation applied at step E510 is an inverse Fourier transform.
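The chain formed by steps E500 and E510 around the per-sub-band interpolation can be sketched as follows for one frame. The function name `interpolate_frame` and the array layout are assumptions introduced for illustration, and the weighting factors are taken as given (one per microphone and per sub-band) rather than estimated here.

```python
import numpy as np

def interpolate_frame(frames, weights):
    """Sub-band interpolation of one frame (hypothetical layout).

    frames  : real array, shape (N, L, T), the L encoded components
              (pressure, gradient components, ...) of each of the N microphones.
    weights : real array, shape (N, T), one weighting factor per microphone
              and per frequency sub-band k (M = T sub-bands).
    Returns the interpolated encoded field, shape (L, T), in the time domain."""
    # Step E500: direct Fourier transform of every component of every field.
    spectra = np.fft.fft(frames, axis=-1)                      # (N, L, T)
    # Interpolation: linear combination of the N fields in each sub-band.
    interp_spectrum = np.einsum("nk,nlk->lk", weights, spectra)
    # Step E510: inverse Fourier transform back to the time domain.
    # The real part is kept; it equals the full result when the weights are
    # symmetric in k (weights[:, k] == weights[:, T - k]).
    return np.fft.ifft(interp_spectrum, axis=-1).real
```

When the weights are identical in all sub-bands, this reduces to the same linear combination applied directly in the time domain, which gives a simple sanity check.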
(123) As of now, a method for rendering the sound field captured by the microphones 100m of
(124) More particularly, at a step E600, the sound field is captured by the microphones 110m, each microphone among the microphones 110m outputting a corresponding captured sound field.
(125) At a step E610, each of the captured sound fields is encoded in a form comprising the captured pressure and an associated pressure gradient vector.
(126) In other non-illustrated embodiments, the sound field captured by the microphones 110m is encoded in a form comprising the captured pressure, an associated pressure gradient vector as well as all or part of the higher order components of the sound field decomposed in the ambisonic format.
(127) Back to [
(128) At a step E630, the interpolated encoded sound field is compressed, for example by implementing an entropy coding. Thus, a compressed interpolated encoded sound field is output. For example, the compression step E630 is implemented by the device 700 (described hereinbelow with reference to
(129) Thus, at a step E640, the compressed interpolated encoded sound field output by the device 700 is transmitted to the rendering device 110hp. In other embodiments, the compressed interpolated encoded sound field is transmitted to another device with enough computing capacity to decompress compressed content, for example a smartphone, a computer, or any other connected terminal, in preparation for a subsequent transmission.
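Purely as an illustration of a lossless compression and decompression round trip around the transmission step, the sketch below uses zlib's DEFLATE (which combines LZ77 with Huffman entropy coding) as a stand-in. The codec, the 32-bit floating-point sample format and the function names are assumptions introduced here; the text does not specify which entropy coder is used.

```python
import zlib
import numpy as np

def compress_field(field):
    """Serialise an interpolated encoded field (any array shape) as float32
    and compress it losslessly. Returns the payload and the array shape,
    which the receiver needs to rebuild the array."""
    field = np.ascontiguousarray(field, dtype=np.float32)
    return zlib.compress(field.tobytes(), level=9), field.shape

def decompress_field(payload, shape):
    """Inverse of compress_field, run on the receiving device."""
    raw = zlib.decompress(payload)
    return np.frombuffer(raw, dtype=np.float32).reshape(shape)
```

The round trip is exact for float32 data; any quantisation applied before compression would be a separate, lossy design choice not covered by this sketch.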
(130) Back to [
(131) At a step E660, the interpolated encoded sound field is rendered on the rendering device 110hp.
(132) Thus, when the interpolation position corresponds to the physical position of the listener 110, the listener perceives the rendered sound field as coherent with the sound sources 100s (i.e. the rendered field actually arrives from the direction of the sound sources 100s).
(133) In some embodiments that are not illustrated in [
(134) In other embodiments that are not illustrated in [
(135) As of now, an example of a structure of a rendering device 700 according to an embodiment of the invention is presented, with reference to [
(136) The device 700 comprises a random-access memory 703 (for example a RAM), a processing unit 702 equipped for example with a processor and driven by a computer program stored in a read-only memory 701 (for example a ROM or a hard disk). Upon initialisation, the code instructions of the computer program are loaded, for example, into the random-access memory 703 before being executed by the processor of the processing unit 702.
(137) This [
(138) In the case where the device 700 is made with a reprogrammable computing machine, the corresponding program (that is to say the sequence of instructions) may be stored in a storage medium, whether removable (such as a floppy disk, a CD-ROM or a DVD-ROM) or not, this storage medium being partially or totally readable by a computer or processor.
(139) Moreover, in some embodiments discussed hereinabove with reference to [
(140) Thus, in some embodiments, the device 700 is included in the rendering device 110hp.
(141) In other embodiments, the device 700 is included in one of the microphones 110m or is duplicated in several of the microphones 110m.
(142) In still other embodiments, the device 700 is included in a piece of equipment remote from the microphones 110m as well as from the rendering device 110hp. For example, the remote equipment is an MPEG-H 3D decoder, a content server, a computer, etc.