Method and apparatus for adaptive control of decorrelation filters
11501785 · 2022-11-15
CPC classification
H04S2420/07 (ELECTRICITY)
H04S2420/03 (ELECTRICITY)
G10L19/008 (PHYSICS)
H04S5/00 (ELECTRICITY)
H04S3/008 (ELECTRICITY)
H04S2400/01 (ELECTRICITY)
Abstract
An audio signal processing method and apparatus for adaptively adjusting a decorrelator. The method comprises obtaining a control parameter and calculating the mean and the variation of the control parameter. The ratio of the variation to the mean of the control parameter is calculated, and a decorrelation parameter is calculated based on that ratio. The decorrelation parameter is then provided to a decorrelator.
Claims
1. An audio signal processing method for adaptively adjusting a decorrelator, the method comprising: obtaining a control parameter; calculating a mean of the control parameter and/or a variation of the control parameter; and calculating a decorrelation parameter based on the calculated mean of the control parameter and/or the calculated variation of the control parameter.
2. The method according to claim 1, wherein calculating the decorrelation parameter comprises calculating a targeted decorrelation filter length.
3. The method according to claim 1, wherein the control parameter is (i) received from an encoder, (ii) obtained from information available at a decoder, or (iii) obtained by a combination of available and received information.
4. The method according to claim 1, wherein the control parameter is a performance measure that is obtained from estimated reverberation length, correlation measures, estimation of spatial width, or prediction gain.
5. The method according to claim 1, wherein the control parameter is determined based on an estimated performance of a parametric description of spatial properties of an input audio signal.
6. The method according to claim 1, the method further comprising performing adaptation of the decorrelation parameter in at least two sub-bands, wherein each frequency band of said at least two sub-bands has an optimal decorrelation parameter.
7. The method according to claim 2, the method further comprising: calculating a decorrelation signal strength based on the calculated targeted decorrelation filter length, wherein at least one of the targeted decorrelation filter length and the decorrelation signal strength is controlled by an analysis of decoded audio signals.
8. The method according to claim 2, the method further comprising: calculating a decorrelation signal strength based on the calculated targeted decorrelation filter length, wherein at least one of the targeted decorrelation filter length and the decorrelation signal strength is controlled as a function of two or more different control parameters.
9. The method according to claim 1, the method further comprising: calculating the mean of the control parameter; and calculating the variation of the control parameter, wherein the decorrelation parameter is calculated based on the calculated mean of the control parameter and the calculated variation of the control parameter.
10. The method according to claim 9, the method further comprising: calculating a ratio of the calculated mean of the control parameter and the calculated variation of the control parameter, wherein the decorrelation parameter is calculated based on the calculated ratio.
11. The method according to claim 2, wherein the targeted decorrelation filter length is calculated based on two different filter lengths.
12. The method according to claim 1, wherein calculating the mean of the control parameter comprises calculating an average of values of the control parameter over time.
13. The method according to claim 12, wherein calculating the average of values of the control parameter over time comprises: calculating a frame value of the control parameter for each of a plurality of frames; and calculating an average of the frame values of the control parameter.
14. An apparatus for adaptively adjusting a decorrelator, the apparatus comprising a processor and a memory, said memory comprising instructions executable by said processor whereby said apparatus is operative to: obtain a control parameter; calculate a mean of the control parameter and/or a variation of the control parameter; and calculate a decorrelation parameter based on the calculated mean of the control parameter and/or the calculated variation of the control parameter.
15. The apparatus according to claim 14, wherein calculating the decorrelation parameter comprises calculating a targeted decorrelation filter length.
16. The apparatus according to claim 14, further configured to (i) receive the control parameter from an encoder, (ii) obtain the control parameter from information available at the apparatus, or (iii) obtain the control parameter from a combination of available and received information.
17. The apparatus according to claim 14, wherein the control parameter is a performance measure that is obtained from estimated reverberation length, correlation measures, estimation of spatial width, or prediction gain.
18. The apparatus according to claim 14, wherein the control parameter is determined based on an estimated performance of a parametric description of spatial properties of an input audio signal.
19. The apparatus according to claim 14, further configured to perform adaptation of the decorrelation parameter in at least two sub-bands, each frequency band having an optimal decorrelation parameter.
20. The apparatus according to claim 14, wherein the apparatus is further operative to: calculate the mean of the control parameter; and calculate the variation of the control parameter, wherein the decorrelation parameter is calculated based on the calculated mean of the control parameter and the calculated variation of the control parameter.
21. The apparatus according to claim 20, wherein the apparatus is further operative to: calculate a ratio of the calculated mean of the control parameter and the calculated variation of the control parameter, wherein the decorrelation parameter is calculated based on the calculated ratio.
22. The apparatus of claim 14, wherein the apparatus is included in a decorrelator used for spatial synthesis in a parametric stereo decoder or in a stereo or multi-channel audio codec.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) For a more complete understanding of example embodiments of the present invention, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
DETAILED DESCRIPTION
(10) An example embodiment of the present invention and its potential advantages are understood by referring to
(11) Existing solutions for representation of non-coherent signal components are based on time-invariant decorrelation filters, and the amount of non-coherent components in the decoded multi-channel audio is controlled by mixing decorrelated and non-decorrelated signal components.
(12) An issue with such time-invariant decorrelation filters is that the decorrelated signal is not adapted to properties of the input signals, which are affected by variations in the auditory scene. For example, the ambience in a recording of a single speech source in a low-reverb environment would be represented by decorrelated signal components from the same filter as for a recording of a symphony orchestra in a large concert hall with significantly longer reverberation. Even if the amount of decorrelated components is controlled over time, the reverberation length and other properties of the decorrelation are not. This may cause the ambience of the low-reverb recording to sound too spacious, while the auditory scene of the high-reverb recording is perceived as too narrow. A short reverberation length, which is desirable for low-reverb recordings, often results in a metallic and unnatural ambience for more spacious recordings.
(13) The proposed solution improves the control of non-coherent audio signals by taking into account how the non-coherent audio varies over time and using that information to adaptively control the character of the decorrelation, e.g. the reverberation length, in the representation of non-coherent components in a decoded and rendered multi-channel audio signal.
(14) The adaptation can be based on signal properties of the input signals in the encoder and controlled by transmitting one or several control parameters to the decoder. Alternatively, it can be controlled without transmitting an explicit control parameter, using information already available at the decoder, or by a combination of available and transmitted information (i.e. information received by the decoder from the encoder).
(15) A transmitted control parameter may for example be based on an estimated performance of the parametric description of the spatial properties, i.e. the stereo image in case of two-channel input. That is, the control parameter may be a performance measure. The performance measure may be obtained from estimated reverberation length, correlation measures, estimation of spatial width or prediction gain.
(16) The solution provides better control of reverberation in decoded and rendered audio signals, which improves the perceived quality for a variety of signal types, such as clean speech signals with low reverberation or spacious music signals with long reverberation and a wide audio scene.
(17) The essence of embodiments is an adaptive control of a decorrelation filter length for representation of non-coherent signal components utilized in a multi-channel audio decoder. The adaptation is based on a transmitted performance measure and how it varies over time. In addition, the strength of the decorrelated component may be controlled based on the same control parameter as the decorrelation length.
(18) The proposed solution may operate on frames or samples in the time domain, or on frequency bands in a filterbank or transform domain, e.g. utilizing the Discrete Fourier Transform (DFT) for processing of the frequency coefficients of frequency bands. Operations performed in one domain may equally be performed in another domain, and the given embodiments are not limited to the exemplified domain.
(19) In one embodiment, the proposed solution is utilized for a stereo audio codec with a coded down-mix channel and a parametric description of the spatial properties, i.e. as illustrated in
(20) A down-mix channel of two input channels x and y may be obtained from
(21)
(22) where M is the down-mix channel and s is the side channel. The down-mix matrix $U_1$ may be chosen such that the energy of the M channel is maximized and the energy of the s channel is minimized. The down-mix operation may include phase or time alignment of the input signals. An example of a passive down-mix is given by
(23)
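The energy criterion for $U_1$ stated in paragraph (22) can be illustrated with a minimal NumPy sketch. It assumes that $U_1$ is restricted to an orthonormal 2×2 rotation and omits any phase or time alignment; the helper name `energy_max_downmix` is hypothetical, and this is not the passive down-mix example of the document.

```python
import numpy as np


def energy_max_downmix(x: np.ndarray, y: np.ndarray):
    """Rotate the channel pair (x, y) into (M, s) so that M carries maximal energy.

    Hypothetical helper: U_1 is assumed to be a pure rotation matrix, so the
    total energy is preserved and maximizing M minimizes s.
    """
    sxx = float(np.dot(x, x))
    syy = float(np.dot(y, y))
    sxy = float(np.dot(x, y))
    # Energy of M(theta) is maximal (and that of s minimal) at this angle.
    theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)
    c, s = np.cos(theta), np.sin(theta)
    u1 = np.array([[c, s], [-s, c]])      # assumed down-mix matrix U_1
    mid, side = u1 @ np.vstack([x, y])    # [M; s] = U_1 [x; y]
    return mid, side, u1
```

For example, with y = 0.5·x the rotation aligns M with the common direction, so the side channel vanishes and M carries all of the input energy.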
(24) The side channel s need not be explicitly encoded but may instead be parametrically modelled, for example by using a prediction filter where ŝ is predicted from the decoded mid channel M and used at the decoder for spatial synthesis. In this case prediction parameters, e.g. prediction filter coefficients, may be encoded and transmitted to the decoder.
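The prediction-based model can be made concrete with a minimal sketch that assumes a single real-valued prediction coefficient per frame, estimated by least squares; the document allows general prediction filters. The prediction gain computed here (side energy over residual energy) is one common definition, used only for illustration, and `predict_side` is a hypothetical name.

```python
import numpy as np


def predict_side(mid: np.ndarray, side: np.ndarray, eps: float = 1e-12):
    """Least-squares single-tap prediction of the side channel from the mid channel.

    Returns the coefficient g, the prediction s_hat = g * mid, and the
    prediction gain E[s^2] / E[(s - s_hat)^2] (illustrative definition).
    """
    g = float(np.dot(side, mid) / (np.dot(mid, mid) + eps))
    s_hat = g * mid
    residual = side - s_hat
    pred_gain = float(np.dot(side, side) / (np.dot(residual, residual) + eps))
    return g, s_hat, pred_gain
```

A high prediction gain means the side channel is well described by the mid channel; a gain near 1 means the prediction removes almost nothing, which is where a decorrelated component becomes useful.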
(25) Another way to model the side channel is to approximate it by decorrelation of the mid channel. Decorrelation is typically a filtering method used to generate an output signal that is incoherent with the input signal from a fine-structure point of view, while the spectral and temporal envelopes of the decorrelated signal should ideally be preserved. Decorrelation filters are typically all-pass filters that modify the phase of the input signal.
(26) In this embodiment, the proposed solution is used to adaptively adjust a decorrelator used for spatial synthesis in a parametric stereo decoder.
(27) Spatial rendering (up-mix) of the encoded mono channel M is obtained by
(28)
(29) where $U_2$ is an up-mix matrix and D is ideally uncorrelated with M from a fine-structure point of view. The up-mix matrix controls the amount of M and D in the synthesized left ($\hat{X}$) and right ($\hat{Y}$) channels. Note that the up-mix can also involve additional signal components, such as a coded residual signal.
(30) An example of an up-mix matrix utilized in parametric stereo with transmission of ILD and ICC is given by
(31)
(32) The rotational angle α is used to determine the amount of correlation between the synthesized channels and is given by
$$\alpha = \tfrac{1}{2}\arccos(\mathrm{ICC}). \qquad (7)$$
(33) The overall rotation angle β is obtained as
(34)
(35) The ILD between the two channels x[n] and y[n] is given by
(36)
(37) where $n = 1, \dots, N$ is the sample index over a frame of N samples.
(38) The coherence between channels can be estimated through the inter-channel cross correlation (ICC). A conventional ICC estimation relies on the cross-correlation function (CCF) $r_{xy}$, which is a measure of similarity between two waveforms x[n] and y[n] and is generally defined in the time domain as
$$r_{xy}[n,\tau] = E\big[x[n]\,y[n+\tau]\big], \qquad (10)$$
(39) where τ is the time lag and $E[\cdot]$ the expectation operator. For a signal frame of length N the cross-correlation is typically estimated as
$$r_{xy}[\tau] = \sum_{n=0}^{N-1} x[n]\,y[n+\tau]. \qquad (11)$$
(40) The ICC is then obtained as the maximum of the CCF, normalized by the signal energies, as follows
(41)
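A minimal NumPy sketch of the ICC estimate built on equation (11), followed by the rotation angle of equation (7). The normalization by $\sqrt{\sum x^2 \sum y^2}$ and the search over a limited lag range `max_lag` are assumptions, since the normalized expression itself is not reproduced here; all function names are hypothetical.

```python
import numpy as np


def cross_correlation(x: np.ndarray, y: np.ndarray, tau: int) -> float:
    """r_xy[tau] = sum_n x[n] * y[n + tau], eq. (11), over the overlapping samples."""
    n = len(x)
    if tau >= 0:
        return float(np.dot(x[: n - tau], y[tau:]))
    return float(np.dot(x[-tau:], y[: n + tau]))


def icc_estimate(x: np.ndarray, y: np.ndarray, max_lag: int) -> float:
    """Maximum of the CCF over |tau| <= max_lag, normalized by the signal energies (assumed form)."""
    norm = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if norm == 0.0:
        return 0.0
    ccf = [cross_correlation(x, y, tau) for tau in range(-max_lag, max_lag + 1)]
    return float(max(ccf) / norm)


def rotation_angle(icc: float) -> float:
    """alpha = 0.5 * arccos(ICC), eq. (7); ICC is clipped to [-1, 1] for numerical safety."""
    return 0.5 * float(np.arccos(np.clip(icc, -1.0, 1.0)))
```

Identical channels give ICC = 1 and α = 0, while uncorrelated channels give ICC near 0 and α near π/4.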
(42) Additional parameters may be used in the description of the stereo image. These can for example reflect phase or time differences between the channels.
(43) A decorrelation filter may be defined by its impulse response $h_d[n]$ or by its transfer function $H_d[k]$ in the DFT domain, where n and k are the sample and frequency index, respectively. In the DFT domain a decorrelated signal $M_d$ is obtained by
$$M_d[k] = H_d[k]\,\hat{M}[k], \qquad (13)$$
(44) where k is a frequency coefficient index. Operating in the time domain, a decorrelated signal is obtained by filtering
$$m_d[n] = h_d[n] * \hat{m}[n], \qquad (14)$$
(45) where n is a sample index and $*$ denotes convolution.
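Equations (13) and (14) describe the same operation when the DFT is long enough to hold the full linear convolution. The sketch below, with hypothetical function names and an arbitrary example filter, checks this with NumPy. With a shorter DFT the bin-wise product in (13) corresponds to circular rather than linear convolution, which is one reason frame-based DFT processing typically relies on zero-padding or overlapping windows.

```python
import numpy as np


def decorrelate_time(m_hat: np.ndarray, h_d: np.ndarray) -> np.ndarray:
    """Time domain, eq. (14): m_d[n] = (h_d * m_hat)[n] (linear convolution)."""
    return np.convolve(h_d, m_hat)


def decorrelate_dft(m_hat: np.ndarray, h_d: np.ndarray) -> np.ndarray:
    """DFT domain, eq. (13): M_d[k] = H_d[k] * M_hat[k].

    Both signals are zero-padded to the full linear-convolution length so
    that the bin-wise product does not wrap around (circular aliasing).
    """
    n_fft = len(m_hat) + len(h_d) - 1
    M_hat = np.fft.rfft(m_hat, n_fft)
    H_d = np.fft.rfft(h_d, n_fft)
    return np.fft.irfft(H_d * M_hat, n_fft)
```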
(46) In one embodiment a reverberator based on A serially connected all-pass filters is obtained as
(47)
(48) where ψ[a] and d[a] specify the decay and the delay of the feedback, respectively. This is just one example of a reverberator that may be used for decorrelation; alternative reverberators exist, and fractional sample delays may for example be utilized. The decay factors ψ[a] may be chosen in the interval [0, 1), since a value larger than 1 would result in an unstable filter. By choosing a decay factor ψ[a] = 0, the filter becomes a delay of d[a] samples. In that case, the filter length is given by the largest delay d[a] among the set of filters in the reverberator.
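Because the reverberator equation itself is not reproduced here, the sketch below assumes the standard Schroeder all-pass section $H_a(z) = (-\psi[a] + z^{-d[a]})/(1 - \psi[a] z^{-d[a]})$. This section has a unit magnitude response and reduces to a pure delay of d[a] samples when ψ[a] = 0, consistent with the text. Function names are hypothetical, and a direct-form loop is used for clarity rather than speed.

```python
import numpy as np


def allpass_section(x: np.ndarray, psi: float, d: int) -> np.ndarray:
    """Schroeder all-pass: y[n] = -psi*x[n] + x[n-d] + psi*y[n-d] (assumed form)."""
    if not 0.0 <= psi < 1.0:
        raise ValueError("decay psi must lie in [0, 1) to keep the filter stable")
    if d < 1:
        raise ValueError("delay d must be at least one sample")
    y = np.zeros(len(x))
    for n in range(len(x)):
        x_del = x[n - d] if n >= d else 0.0
        y_del = y[n - d] if n >= d else 0.0
        y[n] = -psi * x[n] + x_del + psi * y_del
    return y


def reverberator(x, psis, delays) -> np.ndarray:
    """A serially connected all-pass sections with decays psi[a] and delays d[a]."""
    y = np.asarray(x, dtype=float)
    for psi, d in zip(psis, delays):
        y = allpass_section(y, psi, d)
    return y
```

Since every section is all-pass, the energy of the cascade's impulse response stays at 1 regardless of the decays, so changing ψ[a] or d[a] alters the temporal smearing (the ambience length) without changing the overall level.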
(49) Multi-channel audio, or in this example two-channel audio, naturally has a varying amount of coherence between the channels depending on the signal characteristics. For a single speaker recorded in a well-damped environment there will be few reflections and little reverberation, which results in high coherence between the channels. As the reverberation increases, the coherence will generally decrease. This means that for clean speech signals with little noise and ambience, the decorrelation filter should likely be shorter than for a single speaker in a reverberant environment. The length of the decorrelation filter is one important parameter that controls the character of the generated decorrelated signal. Embodiments of the invention may also be used to adaptively control other parameters, such as parameters related to the level control of the decorrelated signal, in order to match the character of the decorrelated signal to that of the input signal.
(50) By utilizing a reverberator for rendering of non-coherent signal components the amount of delay may be controlled in order to adapt to different spatial characteristics of the encoded audio. More generally one can control the length of the impulse response of a decorrelation filter. As mentioned above controlling the filter length can be equivalent to controlling the delay of a reverberator without feedback.
(51) In one embodiment the delay d of a reverberator without feedback, which in this case is equivalent to the filter length, is a function $f_1(\cdot)$ of a control parameter $c_1$:
$$d = f_1(c_1). \qquad (16)$$
(52) A transmitted control parameter may for example be based on an estimated performance of the parametric description of the spatial properties, i.e. the stereo image in case of two-channel input. The performance measure r may for example be obtained from estimated reverberation length, correlation measures, estimation of spatial width or prediction gain. The decorrelation filter length d may then be controlled based on this performance measure, i.e. $c_1$ is the performance measure r. One example of a suitable control function $f_1(\cdot)$ is given by
(53)
(54) where $\gamma_1$ is a tuning parameter typically in the range $[0, D_{\max}]$ with a maximum allowed delay $D_{\max}$, and $\theta_1$ is an upper limit of g(r). If $g(r) > \theta_1$, a shorter delay is chosen, e.g. d = 1.
(55) $\theta_1$ is a tuning parameter that may for example be set to $\theta_1 = 7.0$. Its value depends on the dynamic range of g(r), and in another embodiment it may for example be $\theta_1 = 0.22$. The sub-function g(r) may be defined as the ratio between the change of r and the average of r over time. This ratio is higher for sounds with a lot of variation in the performance measure compared to its mean value, which is typically the case for sparse sounds with little background noise or reverberation. For denser sounds, such as music or speech with background noise, the ratio is lower; it therefore works like a sound classifier, classifying the character of the non-coherent components of the original input signal. The ratio can be calculated as
(56)
(57) where $\theta_{\max}$ is an upper limit, e.g. set to 200, and $\theta_{\min}$ is a lower limit, e.g. set to 0. The limits may for example be related to the tuning parameter $\theta_1$, e.g. $\theta_{\max} = 1.5\,\theta_1$.
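The limited ratio and a control function of the kind described in paragraphs (54) and (55) might be sketched as follows. Because equation 17 is not reproduced here, the linear decrease of d with g in `target_delay` is purely a hypothetical choice that only respects the stated properties: d stays within $[1, \gamma_1]$ (with $\gamma_1 \le D_{\max}$), becomes shorter as g grows, and falls back to d = 1 when $g > \theta_1$. Likewise, `ratio_g` assumes that $\theta_{\min}$ and $\theta_{\max}$ simply clamp the ratio of the change estimate to the mean estimate. Both function names are hypothetical.

```python
def ratio_g(r_change: float, r_mean: float,
            theta_min: float = 0.0, theta_max: float = 200.0,
            eps: float = 1e-9) -> float:
    """Ratio of the change estimate to the mean estimate, limited to [theta_min, theta_max] (assumed form)."""
    g = r_change / max(r_mean, eps)
    return min(max(g, theta_min), theta_max)


def target_delay(g: float, gamma1: float, theta1: float = 7.0, d_short: int = 1) -> int:
    """Hypothetical f_1: long delay for dense sounds (small g), short delay for sparse sounds."""
    if g > theta1:
        return d_short
    return max(d_short, int(round(gamma1 * (1.0 - g / theta1))))
```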
(58) For frame i, the mean of a transmitted performance measure is estimated as
(59)
(60) For the first frame, $r_{\text{mean}}[i-1]$ may be initialized to 0. The smoothing factors $\alpha_{\text{pos}}$ and $\alpha_{\text{neg}}$ may be chosen such that upward and downward changes of r are followed differently. In one example $\alpha_{\text{pos}} = 0.005$ and $\alpha_{\text{neg}} = 0.5$, which means that the mean estimate mainly follows the minima of the performance measure over time. In another embodiment, the positive and negative smoothing factors are equal, e.g. $\alpha_{\text{pos}} = \alpha_{\text{neg}} = 0.1$.
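A minimal sketch of the asymmetric recursive mean, assuming the common first-order form $r_{\text{mean}}[i] = (1-\alpha)\,r_{\text{mean}}[i-1] + \alpha\,r[i]$, with $\alpha = \alpha_{\text{pos}}$ when r[i] exceeds the previous estimate and $\alpha = \alpha_{\text{neg}}$ otherwise (equation 19 itself is not reproduced here). Function names are hypothetical. With the example values the estimate rises slowly and falls quickly, so it tracks the minima of r.

```python
def update_mean(r_mean_prev: float, r: float,
                alpha_pos: float = 0.005, alpha_neg: float = 0.5) -> float:
    """One frame of the asymmetric recursive mean (assumed first-order form)."""
    alpha = alpha_pos if r > r_mean_prev else alpha_neg
    return (1.0 - alpha) * r_mean_prev + alpha * r


def track_mean(r_frames, r_mean_init: float = 0.0, **kwargs) -> list:
    """Run update_mean over a sequence of per-frame performance measures r[i]."""
    r_mean, out = r_mean_init, []
    for r in r_frames:
        r_mean = update_mean(r_mean, r, **kwargs)
        out.append(r_mean)
    return out
```

For an input alternating between 1 and 0, the estimate settles just below 0.005, close to the minima rather than the arithmetic mean of 0.5.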
(61) Similarly, the smoothed estimation of the performance measure variation is obtained as
(62)
(63) Alternatively, the variance of r may be estimated as
(64)
(65) The ratio g(r) may then relate the standard deviation $\sqrt{\sigma_r^2}$ to the mean $r_{\text{mean}}$, i.e.
(66)
(67) or the variance may be related to the squared mean, i.e.
(68)
(69) Another estimation of the standard deviation could be given by
(70)
(71) which has lower complexity.
(72) The smoothing factors $\beta_{\text{pos}}$ and $\beta_{\text{neg}}$ may be chosen such that upward and downward changes of $r_c$ are followed differently. In one example $\beta_{\text{pos}} = 0.5$ and $\beta_{\text{neg}} = 0.05$, which means that the estimate mainly follows the maxima of the change in the performance measure over time. In another embodiment, the positive and negative smoothing factors are equal, e.g. $\beta_{\text{pos}} = \beta_{\text{neg}} = 0.1$.
(73) More generally, in all the given examples the switch between the two smoothing factors may be triggered by comparing the update value of the current frame against any threshold, e.g. $r_c[i] > \theta_{\text{thres}}$ in the example of equation 25.
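The variation estimate can be sketched in the same way. Because the corresponding equations are not reproduced here, the sketch assumes that the update value is the absolute frame-to-frame change $r_c[i] = |r[i] - r[i-1]|$, smoothed with $\beta_{\text{pos}}$ and $\beta_{\text{neg}}$. Following paragraph (73), the factor is selected either by comparing $r_c[i]$ with the previous estimate or, if the hypothetical argument `theta_thres` is given, with a fixed threshold $\theta_{\text{thres}}$. The function name is hypothetical.

```python
from typing import Optional


def track_change(r_frames, beta_pos: float = 0.5, beta_neg: float = 0.05,
                 theta_thres: Optional[float] = None) -> list:
    """Smoothed estimate of the frame-to-frame change of r (assumed form)."""
    rc_smooth, r_prev, out = 0.0, None, []
    for r in r_frames:
        r_c = 0.0 if r_prev is None else abs(r - r_prev)  # assumed update value r_c[i]
        ref = rc_smooth if theta_thres is None else theta_thres
        beta = beta_pos if r_c > ref else beta_neg
        rc_smooth = (1.0 - beta) * rc_smooth + beta * r_c
        out.append(rc_smooth)
        r_prev = r
    return out
```

With the example factors a single jump in r raises the estimate to 0.5 at once, after which it decays only by a factor 0.95 per frame, so the estimate holds on to the maxima of the change.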
(74) In addition, the ratio g(r) controlling the delay may be smoothed over time according to
(75) where the smoothing factor $\alpha_s$ is a tuning factor, e.g. set to 0.01. This means that g(r[i]) in equation 17 is replaced by g[i] for frame i.
(76) In another embodiment, the ratio g(r) is conditionally smoothed based on the performance measure $c_1$, i.e.
(77) One example of such function is
(78)
(79) where the smoothing parameters are a function of the performance measure. For example
(80)
(81) Depending on the performance measure used, the function $f_{\text{thres}}$ may be chosen differently.
(82) It can for example be an average, a percentile (e.g. the median), the minimum or the maximum of $c_1$ over a set of frames or samples, or over a set of frequency sub-bands or coefficients, for example
$$f_{\text{thres}}(c_1) = \max_b\, c_1[b], \qquad (30)$$
(83) where $b = b_0, \dots, b_{N-1}$ is an index over N frequency sub-bands. The smoothing factors control the amount of smoothing when the threshold $\theta_{\text{high}}$, e.g. set to 0.6, is exceeded or not exceeded, respectively, and can be equal for positive and negative updates or different, e.g. $\kappa_{\text{pos,high}} = 0.03$, $\kappa_{\text{neg,high}} = 0.05$, $\kappa_{\text{pos,low}} = 0.1$, $\kappa_{\text{neg,low}} = 0.001$.
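One possible reading of the conditional smoothing in paragraphs (76) to (83) is sketched below, with $f_{\text{thres}}$ taken as the maximum over sub-bands as in equation (30). The exact smoothing function is not reproduced here, so two things are assumptions: the first-order recursion, and the way the four κ factors are assigned (high or low set chosen by comparing $f_{\text{thres}}(c_1)$ with $\theta_{\text{high}}$, positive or negative factor chosen by the direction of the update). The function names are hypothetical.

```python
def f_thres(c1_bands) -> float:
    """Eq. (30): maximum of the control parameter over the frequency sub-bands."""
    return max(c1_bands)


def smooth_g_conditional(g_prev: float, g_new: float, c1_bands,
                         theta_high: float = 0.6,
                         k_pos_high: float = 0.03, k_neg_high: float = 0.05,
                         k_pos_low: float = 0.1, k_neg_low: float = 0.001) -> float:
    """First-order smoothing of g whose factor depends on c_1 and on the update direction (assumed form)."""
    high = f_thres(c1_bands) > theta_high
    if g_new > g_prev:
        k = k_pos_high if high else k_pos_low
    else:
        k = k_neg_high if high else k_neg_low
    return (1.0 - k) * g_prev + k * g_new
```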
(84) It may be noted that additional smoothing or limitation of change in the obtained decorrelation filter length between samples or frames is possible in order to avoid artifacts. In addition, the set of filter lengths utilized for decorrelation may be limited in order to reduce the number of different colorations obtained when mixing signals. For example, there might be two different lengths where the first one is relatively short and the second one is longer.
(85) In one embodiment, a set of two available filters of different lengths $d_1$ and $d_2$ is used. A targeted filter length d may for example be obtained as
(86)
(87) where $\gamma_1$ is a tuning parameter that is for example given by
$$\gamma_1 = d_2 - d_1 + \delta, \qquad (32)$$
(88) where δ is an offset term that can e.g. be set to 2. Here $d_2$ is assumed to be larger than $d_1$. Note that the targeted filter length is a control parameter, but different filter lengths or reverberator delays may be utilized for different frequencies. This means that filters shorter or longer than the targeted length may be used for certain frequency sub-bands or coefficients.
(89) In this case, the decorrelation signal strength s, which controls the amount of decorrelated signal D in the synthesized channels $\hat{X}$ and $\hat{Y}$, may be controlled by the same control parameters, here a single control parameter, the performance measure $c_1 \equiv r$.
(90) In another embodiment, the adaptation of the decorrelation filter length is done in several, i.e. at least two, sub-bands so that each frequency band can have the optimal decorrelation filter length.
(91) In an embodiment where the reverberator uses a set of filters with feedback, as in equation 15, the amount of feedback ψ[a] may also be adapted in a similar way as the delay parameter d[a]. In such an embodiment the length of the generated ambience is a combination of both parameters, and thus both may need to be adapted in order to achieve a suitable ambience length.
(92) In yet another embodiment, the decorrelation filter length or reverberator delay d and the decorrelation signal strength s are controlled as functions of two or more different control parameters, i.e.
$$d = f_2(c_{21}, c_{22}, \dots), \qquad (33)$$
$$s = f_3(c_{31}, c_{32}, \dots). \qquad (34)$$
(93) In yet another embodiment, the decorrelation filter length and decorrelation signal strength are controlled by an analysis of the decoded audio signals.
(94) The reverberation length may additionally be specially controlled for transients, i.e. sudden energy increases, or for other signals with special characteristics.
(95) As the filter changes over time, changes across frames or samples need to be handled, for example by interpolation or by window functions with overlapping frames. The interpolation can be made from the previous filters, each of its respectively controlled length, to the filter of the currently targeted length over several samples or frames. The interpolation may be obtained by successively decreasing the gain of the previous filters while increasing the gain of the current filter of the currently targeted length over samples or frames. In another embodiment, the targeted filter length controls the gain of each available filter such that a mixture of available filters of different lengths is used when a filter of the targeted length is not available. In the case of two available filters $h_1$ and $h_2$ of length $d_1$ and $d_2$ respectively, their gains $s_1$ and $s_2$ may be obtained as
$$s_1 = f_3(d_1, d_2, c_1), \qquad (35)$$
$$s_2 = f_4(d_1, d_2, c_1). \qquad (36)$$
(96) The filter gains may also depend on each other, e.g. in order to obtain equal energy of the filtered signal, i.e. $s_2 = f(s_1)$ in case $h_1$ is the reference filter whose gain is controlled by $c_1$. For example, the filter gain $s_1$ may be obtained as
$$s_1 = \frac{d_2 - d}{d_2 - d_1}, \qquad (37)$$
(97) where d is the targeted filter length in the range $[d_1, d_2]$ and $d_2 > d_1$. The second filter gain may then for example be obtained as
$$s_2 = \sqrt{1 - s_1^2}. \qquad (38)$$
(98) The filtered signal $m_d[n]$ is then obtained as
$$m_d[n] = \big(s_1 h_1[n] + s_2 h_2[n]\big) * \hat{m}[n], \qquad (39)$$
(99) if the filtering operation is performed in the time domain.
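Equations (37) to (39) translate directly into code. The sketch below clamps the target length d to $[d_1, d_2]$, forms the combined impulse response $s_1 h_1 + s_2 h_2$ (zero-padding the shorter filter) and filters in the time domain. The function names are hypothetical, and the example filters in the usage are arbitrary pure delays chosen only for illustration. Note that (37) and (38) keep $s_1^2 + s_2^2 = 1$.

```python
import numpy as np


def mixing_gains(d: float, d1: float, d2: float):
    """Gains of the two available filters for a targeted length d, eqs. (37) and (38)."""
    if not d2 > d1:
        raise ValueError("expected d2 > d1")
    d = min(max(d, d1), d2)               # keep the target inside [d1, d2]
    s1 = (d2 - d) / (d2 - d1)             # eq. (37)
    s2 = float(np.sqrt(1.0 - s1 ** 2))    # eq. (38)
    return s1, s2


def decorrelate_two_filters(m_hat, h1, h2, d, d1, d2):
    """m_d = (s1*h1 + s2*h2) * m_hat, eq. (39), computed in the time domain."""
    s1, s2 = mixing_gains(d, d1, d2)
    n = max(len(h1), len(h2))
    h_mix = s1 * np.pad(h1, (0, n - len(h1))) + s2 * np.pad(h2, (0, n - len(h2)))
    return np.convolve(h_mix, m_hat)
```

For a target halfway between the two lengths, e.g. d = 8 with $d_1 = 4$ and $d_2 = 12$, the short filter gets $s_1 = 0.5$ and the long one $s_2 = \sqrt{0.75}$.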
(100) In case the decorrelation signal strength s is controlled by a control parameter $c_1$, it may be beneficial to control it as a function $f_4(\cdot)$ of the current and previous values of the control parameter and of the decorrelation filter length d, i.e.
$$s[i] = f_4\big(d, c_1[i], c_1[i-1], \dots, c_1[i-N_m]\big). \qquad (40)$$
(101) One example of such function is
$$s[i] = \min\big(\beta_4\, c_1[i-d],\; c_1[i-d](1-\alpha_4) + \alpha_4\, c_1[i]\big), \qquad (41)$$
(102) where $\alpha_4$ and $\beta_4$ are tuning parameters, e.g. $\alpha_4 = 0.8$ or $\alpha_4 = 0.6$, and $\beta_4 = 1.0$. $\alpha_4$ should typically be in the range [0, 1], while $\beta_4$ may also be larger than one.
(103) In the case of a mixture of more than one filter, the strength s of the filtered signal $m_d[n]$ in the up-mix with $\hat{m}[n]$ may for example be obtained based on a weighted average, i.e. in the case of two filters $h_1$ and $h_2$ by
$$s[i] = \min\big(\beta_4\, w[i],\; w[i](1-\alpha_4) + \alpha_4\, c_1[i]\big), \qquad (42)$$
(104) where
$$w[i] = s_1\, c_1[i-d_1] + s_2\, c_1[i-d_2]. \qquad (43)$$
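Equations (41) to (43) can be sketched as below, where `c1_hist` is a hypothetical per-frame history of the control parameter $c_1$ and all function names are hypothetical. Following the equations literally, the filter lengths d, $d_1$ and $d_2$ are used as look-back indices into that history. The guard against indexing before the first frame is an added assumption, since Python would otherwise silently wrap negative indices. The default values $\alpha_4 = 0.8$ and $\beta_4 = 1.0$ are the examples from the text.

```python
def _look_back(c1_hist, i: int, lag: int) -> float:
    """Return c_1[i - lag], refusing to index before the first frame."""
    if i - lag < 0:
        raise IndexError("not enough history for the requested look-back")
    return c1_hist[i - lag]


def strength_single(c1_hist, i: int, d: int,
                    alpha4: float = 0.8, beta4: float = 1.0) -> float:
    """Eq. (41): s[i] = min(beta4*c1[i-d], c1[i-d]*(1-alpha4) + alpha4*c1[i])."""
    past = _look_back(c1_hist, i, d)
    return min(beta4 * past, past * (1.0 - alpha4) + alpha4 * c1_hist[i])


def strength_two_filters(c1_hist, i: int, d1: int, d2: int, s1: float, s2: float,
                         alpha4: float = 0.8, beta4: float = 1.0) -> float:
    """Eqs. (42) and (43): the same limiter applied to the gain-weighted history w[i]."""
    w = s1 * _look_back(c1_hist, i, d1) + s2 * _look_back(c1_hist, i, d2)
    return min(beta4 * w, w * (1.0 - alpha4) + alpha4 * c1_hist[i])
```

With $\beta_4 = 1$ the strength never exceeds the delayed value $c_1[i-d]$, and when $c_1$ drops the blended term pulls the strength down towards the current value.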
(109) The methods may be performed by a parametric stereo decoder or a stereo audio codec.
(111) The apparatus 700 may be comprised in an audio decoder, such as the parametric stereo decoder shown in a lower part of
(113) In an embodiment, the decorrelation length calculator 802 comprises an obtaining unit for receiving or obtaining a performance measure parameter, i.e. a control parameter. It further comprises a first calculation unit for calculating a mean and a variation of the performance measure, a second calculation unit for calculating the ratio of the variation to the mean of the performance measure, and a third calculation unit for calculating the targeted decorrelation filter length. It may further comprise a providing unit for providing the targeted decorrelation filter length to a decorrelation unit.
(114) By way of example, the software or computer program 730 may be realized as a computer program product, which is normally carried or stored on a computer-readable medium, preferably a non-volatile computer-readable storage medium. The computer-readable medium may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device.
(115) Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on a memory, a microprocessor or a central processing unit. If desired, part of the software, application logic and/or hardware may reside on a host device or on a memory, a microprocessor or a central processing unit of the host. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
(116) Abbreviations
(117) ILD/ICLD Inter-channel Level Difference
(118) IPD/ICPD Inter-channel Phase Difference
(119) ITD/ICTD Inter-channel Time difference
(120) IACC Inter-Aural Cross Correlation
(121) ICC Inter-Channel correlation
(122) DFT Discrete Fourier Transform
(123) CCF Cross Correlation Function