Methods and systems for far-field denoise and dereverberation
09799318 · 2017-10-24
Cpc classification
H04R3/002
ELECTRICITY
G10K11/16
PHYSICS
International classification
H04B3/20
ELECTRICITY
H04M9/08
ELECTRICITY
Abstract
Method and system for use with audible signals that analyzes the signals into time-frequency frames over first and second time periods. Estimates of the noise and/or reverberation are derived from the frames. Gains are derived from the estimates and raised to a power to create modified gains. The modified gains are applied to the frames in the appropriate time periods. Modified audible signals are output after being processed by the modified gains.
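A minimal NumPy sketch of the step described in the abstract and claims: each time period's suppression gains are raised to an exponent chosen from a predetermined set, with different exponents in successive periods. The names `modify_gains` and `EXPONENT_SET`, the exponent values and the per-frame schedule are hypothetical illustrations, not taken from the patent.

```python
import numpy as np

# Hypothetical predetermined set of exponents (illustrative values only).
EXPONENT_SET = (0.5, 1.0, 2.0)

def modify_gains(gains, exponents):
    """Raise each frame's suppression gains to that frame's exponent.

    gains:     array of shape (n_bins, n_frames), values in [0, 1]
    exponents: sequence of length n_frames, one exponent per time frame
    """
    gains = np.clip(np.asarray(gains, dtype=float), 0.0, 1.0)
    exps = np.asarray(exponents, dtype=float)
    # Broadcasting applies exponent k to every bin of frame k.
    return gains ** exps[np.newaxis, :]

# Two time periods (frames) with different exponents, as in claim 1.
G = np.array([[0.25, 0.25],
              [0.81, 0.81]])
G_mod = modify_gains(G, [EXPONENT_SET[0], EXPONENT_SET[2]])
# Frame 0 (exponent 0.5): gains move toward 1, i.e. milder suppression.
# Frame 1 (exponent 2.0): gains move toward 0, i.e. stronger suppression.
```

Because the gains lie in [0, 1], an exponent below 1 weakens suppression and an exponent above 1 strengthens it, which is one way a selected exponent can control the amount of processing.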
Claims
1. A method to adjust suppression of one or more of reverberation and noise from an audible digital signal comprising: analyzing the audible digital signal into a plurality of time-frequency frames; deriving a first estimation of one or more of reverberation and noise from one or more of the frames from a first time instant; deriving a first suppression gain from the first estimation and selecting a first exponent from a predetermined set of exponents; deriving a modified first suppression gain based on the first suppression gain raised to a power equal to the first exponent; wherein the modified first suppression gain is applied to one or more of the frames in the first time instant; deriving a second estimation of one or more of reverberation and noise from one or more of the frames in a second time instant; deriving a second suppression gain from the second estimation and selecting a second exponent from a predetermined set of exponents; deriving a modified second suppression gain based on the second suppression gain raised to a power equal to the second exponent; wherein the modified second suppression gain is applied to one or more of the frames in the second time instant; outputting an audible signal that involves processing the frames in the first time instant utilizing the first modified suppression gain; and outputting an audible signal that involves processing the frames in the second time instant utilizing the second modified suppression gain; wherein the first exponent and the second exponent are different from one another; and wherein the second time instant is subsequent to the first time instant.
2. The method of claim 1, wherein said first and second time instants partially overlap one another.
3. The method of claim 1, wherein said audible signal is single channel or multi-channel or stereo or binaural.
4. The method of claim 1, wherein the selection of at least one of the first and second exponent is made by a user based upon audible information.
5. The method of claim 1, wherein the time-frequency frames are derived using a short-time Fourier transform (STFT) or a wavelet transform or a polyphase filterbank or a multi rate filterbank or a quadrature mirror filterbank or a warped filterbank or an auditory-inspired filterbank.
6. The method of claim 1, wherein the audible digital signal is the audible portion of a multimedia signal.
7. A system to adjust suppression of one or more of reverberation and noise from an audible digital signal comprising: a signal processing system connectable to one or more microphones and adapted to: analyze the audible digital signal into a plurality of time-frequency frames; derive a first estimation of one or more of reverberation and noise from one or more of the frames from a first time instant; derive a first suppression gain from the first estimation and select a first exponent from a predetermined set of exponents; derive a modified first suppression gain based on the first suppression gain raised to a first power equal to the first exponent; wherein the modified first suppression gain is applied to one or more of the frames in the first time instant; derive a second estimation of one or more of reverberation and noise from one or more of the frames in a second time instant; derive a second suppression gain from the second estimation and select a second exponent from a predetermined set of exponents; derive a modified second suppression gain based on the second suppression gain raised to a second power equal to the second exponent; wherein the modified second suppression gain is applied to one or more of the frames in the second time instant; output an audible signal that involves processing the frames in the first time instant utilizing the first modified suppression gain; and output an audible signal that involves processing the frames in the second time instant utilizing the second modified suppression gain; wherein the first exponent and the second exponent are different from one another; and wherein the second time instant is subsequent to the first time instant.
8. The system of claim 7, wherein the modified first suppression gain is determined by raising the first suppression gain to a power related to the first exponent.
9. The system of claim 7, wherein the modified second suppression gain is determined by raising the second suppression gain to a power related to the second exponent.
10. The system of claim 7, wherein the selection of at least one of the first and second exponent is made by a user based upon audible information.
11. The system of claim 7, wherein the time-frequency frames are derived using a short-time Fourier transform (STFT) or a wavelet transform or a polyphase filterbank or a multi rate filterbank or a quadrature mirror filterbank or a warped filterbank or an auditory-inspired filterbank.
12. The system of claim 7, wherein the audible digital signal is the audible portion of a multimedia signal.
13. A method to adjust suppression of one or more of reverberation and noise from a multimedia digital signal comprising: analyzing at least the audible portion of the multimedia digital signal into a plurality of time-frequency frames; deriving a first estimation of one or more of reverberation and noise from one or more of the frames from a first time instant; deriving a first suppression gain from the first estimation and selecting a first exponent from a predetermined set of exponents; deriving a modified first suppression gain based on the first suppression gain raised to a first power equal to the first exponent; wherein the modified first suppression gain is applied to one or more of the frames in the first time instant; deriving a second estimation of one or more of reverberation and noise from one or more of the frames in a second time instant; deriving a second suppression gain from the second estimation and selecting a second exponent from a predetermined set of exponents; deriving a modified second suppression gain based on the second suppression gain raised to a second power equal to the second exponent; wherein the modified second suppression gain is applied to one or more of the frames in the second time instant; outputting at least the audible portion of the multimedia digital signal that involves processing the frames in the first time instant utilizing the first modified suppression gain; and outputting at least the audible portion of the multimedia digital signal that involves processing the frames in the second time instant utilizing the second modified suppression gain; wherein the first exponent and the second exponent are different from one another; and wherein the second time instant is subsequent to the first time instant.
14. The method of claim 13, wherein said first and second time instants partially overlap one another.
15. The method of claim 13, wherein said audible signal is single channel or multi-channel or stereo or binaural.
16. The method of claim 13, wherein the selection of at least one of the first and second exponent is made by a user based upon audible information.
17. A system to adjust suppression of one or more of reverberation and noise from a multimedia digital signal comprising: a multimedia signal processing system connectable to one or more microphones and adapted to: analyze at least the audible portion of the multimedia digital signal into a plurality of time-frequency frames; derive a first estimation of one or more of reverberation and noise from one or more of the frames from a first time instant; derive a first suppression gain from the first estimation and select a first exponent from a predetermined set of exponents; derive a modified first suppression gain based on the first suppression gain raised to a first power equal to the first exponent; wherein the modified first suppression gain is applied to one or more of the frames in the first time instant; derive a second estimation of one or more of reverberation and noise from one or more of the frames in a second time instant; derive a second suppression gain from the second estimation and select a second exponent from a predetermined set of exponents; derive a modified second suppression gain based on the second suppression gain raised to a second power equal to the second exponent; wherein the modified second suppression gain is applied to one or more of the frames in the second time instant; output at least the audible portion of the multimedia digital signal that involves processing the frames in the first time instant utilizing the first modified suppression gain; and output at least the audible portion of the multimedia digital signal that involves processing the frames in the second time instant utilizing the second modified suppression gain; wherein the first exponent and the second exponent are different from one another; and wherein the second time instant is subsequent to the first time instant.
18. The system of claim 17, wherein the modified first suppression gain is determined by raising the first suppression gain to a power related to the first exponent.
19. The system of claim 17, wherein the modified second suppression gain is determined by raising the second suppression gain to a power related to the second exponent.
20. The system of claim 17, wherein the selection of at least one of the first and second exponent is made by a user based upon audible information.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) For a more complete understanding of the invention, reference is made to the following description and accompanying drawings, in which:
DETAILED DESCRIPTION
(16) Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present application.
(17) The exemplary systems and methods of this invention will also be described in relation to reducing reverberation in audio systems. However, to avoid unnecessarily obscuring the present invention, the following description omits well-known structures and devices that may be shown in block diagram form or otherwise summarized.
(18) For purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the present invention. It should be appreciated however that the present invention may be practiced in a variety of ways beyond the specific details set forth herein. The terms determine, calculate and compute, and variations thereof, as used herein are used interchangeably and include any type of methodology, process, mathematical operation or technique.
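(20) Following one exemplary model, the room impulse response (RIR) captures the acoustic characteristics of a closed space. An exemplary RIR is shown in the accompanying drawings. The RIR can be decomposed into a direct part, an early-reflections part and a late-reverberation part: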
$$h(n) = h_{\mathrm{dir}}(n) + h_{\mathrm{ear}}(n) + h_{\mathrm{lat}}(n) \qquad (2)$$
where $n$ is the discrete time index. The direct part of the RIR, $h_{\mathrm{dir}}(n)$, can be modeled as a Kronecker delta function shifted by $n_s$ samples and attenuated by a factor $\kappa$:
$$h_{\mathrm{dir}}(n) = \kappa\,\delta(n - n_s) \qquad (3)$$
where $\kappa$ and $n_s$ mainly depend on the source-receiver distance and the physical characteristics of the propagation medium.
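Equations 2 and 3 can be illustrated by splitting a sampled RIR into its three parts. This is a sketch under stated assumptions: the direct part is taken to be the single strongest sample (the shifted, attenuated delta of equation 3), and the early/late boundary is assumed to lie 50 ms after the direct path, since the text does not fix a boundary. The synthetic RIR and the function name `split_rir` are illustrative.

```python
import numpy as np

def split_rir(h, fs, early_ms=50.0):
    """Split h(n) into h_dir, h_ear, h_lat with h = h_dir + h_ear + h_lat (equation 2)."""
    h = np.asarray(h, dtype=float)
    n = np.arange(h.size)
    n_s = int(np.argmax(np.abs(h)))                  # direct-path delay n_s
    n_e = n_s + int(round(early_ms * 1e-3 * fs))     # assumed early/late boundary
    h_dir = np.where(n == n_s, h, 0.0)               # kappa * delta(n - n_s), equation 3
    h_lat = np.where(n > n_e, h, 0.0)                # late reverberation
    h_ear = h - h_dir - h_lat                        # all remaining samples
    return h_dir, h_ear, h_lat

# Synthetic RIR: a direct-path delta followed by a decaying noise tail.
fs, n_s, kappa = 16000, 80, 0.9
rng = np.random.default_rng(0)
h = np.zeros(fs // 2)
h[n_s] = kappa
tail_len = h.size - n_s - 1
h[n_s + 1:] = 0.1 * rng.standard_normal(tail_len) * np.exp(-np.arange(tail_len) / (0.1 * fs))
h_dir, h_ear, h_lat = split_rir(h, fs)
```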
(21) For illustrative purposes, one exemplary model for reverberation is described below. The signal captured by the $i$th receiver can be written as
$$y_i(n) = \sum_{j} x_j(n) * h_{ij}(n) \qquad (4)$$
where $x_j(n)$ represents the $j$th discrete-time anechoic source signal, $h_{ij}(n)$ is the impulse response that models the acoustic path between the $j$th source and the $i$th receiver, and $*$ denotes time-domain convolution. According to equations 2, 3 and 4, a captured reverberant signal consists of three components: (i) an anechoic part, (ii) an early reverberation part and (iii) a late reverberation part:
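$$y_i(n) = \sum_{j} x_j(n) * h_{ij,\mathrm{dir}}(n) + \sum_{j} x_j(n) * h_{ij,\mathrm{ear}}(n) + \sum_{j} x_j(n) * h_{ij,\mathrm{lat}}(n) \qquad (5)$$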
Considering now a direct part $\hat x_i(n)$, consisting of the anechoic part and the early reflections part, and a late reverberation part $\hat r_i(n)$, equation 5 becomes
$$y_i(n) = \hat x_i(n) + \hat r_i(n) \qquad (6)$$
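Equations 4 to 6 rest on the linearity of convolution: convolving each source with the direct-plus-early and late parts of its RIR separately, then summing, reproduces the captured signal. The sketch below checks this numerically for one receiver and two sources; the random signals, the exponential envelope and the 100-sample split point are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_x, n_h, split = 2, 1000, 400, 100    # split: assumed early/late boundary (samples)
x = rng.standard_normal((n_src, n_x))                                    # x_j(n): anechoic sources
h = rng.standard_normal((n_src, n_h)) * np.exp(-np.arange(n_h) / 80.0)   # h_ij(n), fixed receiver i
h_de = np.where(np.arange(n_h) < split, h, 0.0)                          # direct + early part
h_late = h - h_de                                                        # late part

# Equation 4: y_i(n) = sum_j x_j(n) * h_ij(n)
y = sum(np.convolve(x[j], h[j]) for j in range(n_src))
# Equation 6: y_i(n) = x_hat_i(n) + r_hat_i(n)
x_hat = sum(np.convolve(x[j], h_de[j]) for j in range(n_src))
r_hat = sum(np.convolve(x[j], h_late[j]) for j in range(n_src))
```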
(25) Although the effect of reverberation can be observed in the time-domain signal, the effect of the acoustic environment, and in particular of the room dimensions and materials, is best observed in the frequency domain. Dereverberation can theoretically be achieved in either the time or the frequency domain. Because the room's influence is most evident in frequency, it is beneficial to utilize dereverberation estimation and reduction techniques in the time-frequency domain, using a relevant transform. The time-domain reverberant signal of equation 5 can be transformed into the time-frequency domain using any relevant technique. For example, this can be done via a short-time Fourier transform (STFT), a wavelet transform, a polyphase filterbank, a multi rate filterbank, a quadrature mirror filterbank, a warped filterbank, an auditory-inspired filterbank, etc. Each of these transforms results in a specific time-frequency resolution, which changes the processing accordingly. All embodiments of the present application can use any available time-frequency transform.
(26) The reverberant signal $y_i(n)$ can be transformed to $Y_i(\omega,\mu)$, where $\omega$ is a frequency index and $\mu$ is a time index. In exemplary embodiments, $\omega$ denotes the index of the frequency bin or sub-band and $\mu$ denotes the index of a time frame or time sample. In some embodiments, the short-time Fourier transform can be used, together with an appropriate overlap analysis-synthesis technique such as overlap-add or overlap-save. Analysis windows can be set, for example, to 32, 64, 128, 256, 512, 1024, 2048, 4096 and 8192 samples for sampling frequencies of 4000, 8000, 12000, 16000, 44100, 48000, 96000 and 192000 Hz. According to equation 4, the captured reverberant signal in the time-frequency domain can be represented as
$$Y_i(\omega,\mu) = \sum_{j} X_j(\omega,\mu)\, H_{ij}(\omega,\mu)$$
where $X_j(\omega,\mu)$ and $H_{ij}(\omega,\mu)$ are the time-frequency representations of $x_j(n)$ and $h_{ij}(n)$, respectively.
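Any time-frequency transform may be used; as one concrete option, the sketch below implements an STFT with weighted overlap-add resynthesis in plain NumPy, using a 512-sample periodic Hann window and 50% overlap at 16 kHz (one of the window and sampling-rate choices listed above). The functions `stft` and `istft` are hypothetical helpers written for this example, not a library API.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Return Y[omega, mu]: rows are frequency bins, columns are time frames."""
    win = np.hanning(n_fft + 1)[:-1]                  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop:m * hop + n_fft] * win for m in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

def istft(Y, n_fft=512, hop=256):
    """Weighted overlap-add resynthesis, reusing the analysis window for synthesis."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(Y, n=n_fft, axis=0) * win[:, np.newaxis]
    n_frames = Y.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for m in range(n_frames):
        out[m * hop:m * hop + n_fft] += frames[:, m]
        norm[m * hop:m * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-12)              # undo the squared-window weighting

fs = 16000
x = np.random.default_rng(2).standard_normal(fs)     # 1 s of test noise
Y = stft(x)        # shape (257, 61): 257 bins, 61 frames
x_rec = istft(Y)   # 15872 samples; the trailing partial frame is dropped
```

Dividing by the accumulated squared window makes the resynthesis exact wherever the window sum is nonzero, so the gains discussed below can be applied to `Y` before calling `istft`.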
(28) Generally speaking, reverberation is a convolutive distortion; however, since late reverberation arrives in the diffuse field, it is not highly correlated with the original sound source. Given the foregoing, it can sometimes be considered an additive degradation with noise-like characteristics. Treating late reverberation as an additive distortion and transforming equation 6 into the time-frequency domain, the reverberant signals can be modeled as
$$Y_i(\omega,\mu) = \hat X_i(\omega,\mu) + \hat R_i(\omega,\mu) \qquad (10)$$
where $\hat X_i(\omega,\mu)$ represents the direct sound received at the $i$th microphone (containing the anechoic signal and the early reverberation) and $\hat R_i(\omega,\mu)$ is the late reverberation received at the $i$th microphone. Following this model, we can estimate the direct part of the sound signals. Many techniques can be used for this, such as spectral subtraction, Wiener filtering, Kalman filtering, Minimum Mean Square Error (MMSE) estimators, Least Mean Squares (LMS) filtering, etc. All relevant techniques are within the scope of the present application. As an example application, and without departing from the scope of the present invention, spectral subtraction (i.e., a subtraction in the time-frequency domain) will mostly be used hereafter:
$$\hat X_i(\omega,\mu) = Y_i(\omega,\mu) - \hat R_i(\omega,\mu) \qquad (11)$$
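(29) The estimate of the clean signals can be derived by applying appropriate gains $G_i(\omega,\mu)$ to the reverberant signals, i.e.:
$$\hat X_i(\omega,\mu) = G_i(\omega,\mu)\, Y_i(\omega,\mu) \qquad (12)$$
$$G_i(\omega,\mu) = \frac{\hat X_i(\omega,\mu)}{Y_i(\omega,\mu)} \qquad (13)$$
and, in an exemplary embodiment where spectral subtraction is used,
$$G_i(\omega,\mu) = \frac{Y_i(\omega,\mu) - \hat R_i(\omega,\mu)}{Y_i(\omega,\mu)} \qquad (14)$$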
The term gain in such techniques does not denote just a typical amplification gain (although the signal may be amplified in some cases). The dereverberation gain functions mentioned in embodiments of the present invention can be viewed as scale factors that modify the signal in the time-frequency domain. Given that $\hat X_i(\omega,\mu)$ and $\hat R_i(\omega,\mu)$ can be assumed uncorrelated (due to the nature of late reverberation), equation 10 can be written as
$$|Y_i(\omega,\mu)|^{p} = |\hat X_i(\omega,\mu)|^{p} + |\hat R_i(\omega,\mu)|^{p} \qquad (15)$$
For certain embodiments $p = 1$ or $p = 2$, and the described model is implemented in the magnitude or power spectrum domain, respectively. All embodiments of the present invention are relevant for any value of $p$. To keep the notation simple, the magnitude spectrum is discussed in detail, but any value of $p$ can be used.
(32) Equation 12 presents an example for producing a signal where late reverberation has been removed. The gain function G is calculated based on the received (reverberant) signal and knowledge of the nature of late reverberation in the acoustic environment. G can be measured or known a priori, or stored from previous measurements. G is a function of frequency (ω) and time (μ) but can also be a scalar or a function of just ω or μ.
(33) The gain functions $G_i(\omega,\mu)$ of equations 12, 13 and 14 can be bounded to the closed interval [0, 1]. When $G_i(\omega,\mu) = 0$, we consider that the signal component consists entirely of late reverberation, and it is completely suppressed. When $G_i(\omega,\mu) = 1$, we consider that the reverberant signal does not contain any late reverberation, and it remains intact. Spectral subtraction is not the only way to derive gain functions $G_i(\omega,\mu)$. As mentioned before, in other exemplary embodiments the gain functions $G_i(\omega,\mu)$ can be extracted according to equation 13 by any technique that provides a first estimate of a clean signal $\hat X_i(\omega,\mu)$, such as Wiener filtering, subspace, statistically based or perceptually motivated methods, etc.
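The gain of equations 13 to 15 can be computed directly from magnitudes. A minimal sketch, writing the exponent of equation 15 as $p$ and assuming the bound to [0, 1] is enforced by clipping (the text says the gains can be bounded but not how). Solving equation 15 for $|\hat X_i|$ gives $G_i = (1 - |\hat R_i|^p / |Y_i|^p)^{1/p}$; for $p = 1$ this is a magnitude-domain form of spectral subtraction. The function names are hypothetical.

```python
import numpy as np

def subtraction_gain(Y, R_hat, p=1.0, eps=1e-12):
    """G = (1 - |R|^p / |Y|^p)^(1/p), clipped so that 0 <= G <= 1.

    Follows from |Y|^p = |X|^p + |R|^p (equation 15) and G = |X| / |Y|.
    """
    ratio = np.abs(R_hat) ** p / np.maximum(np.abs(Y) ** p, eps)
    return np.clip(1.0 - ratio, 0.0, 1.0) ** (1.0 / p)

def apply_gain(G, Y):
    """Equation 12: X_hat = G * Y (real gain, so the phase of Y is kept)."""
    return G * Y

Y = np.array([2.0 + 0.0j, 1.0j, 0.5])   # toy spectrum values for three bins
R_hat = np.array([1.0, 1.0, 1.0])        # toy late-reverberation magnitudes
G1 = subtraction_gain(Y, R_hat, p=1)     # magnitude domain: [0.5, 0.0, 0.0]
G2 = subtraction_gain(Y, R_hat, p=2)     # power domain: [sqrt(0.75), 0.0, 0.0]
```

In the third bin the late-reverberation estimate exceeds the observed magnitude; clipping sets the gain to 0, which corresponds to the fully suppressed case $G_i(\omega,\mu) = 0$ described above.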
(34) Ideally, both early and late reverberation should be suppressed from the reverberant signal. However, it is known that: (i) late reverberation is considered more harmful than early reflections, (ii) blind dereverberation methods, where no knowledge other than the reverberant signal is used, usually result in severe processing artifacts, and (iii) these processing artifacts are more likely to appear when we try to completely remove all signal distortions rather than just reducing the most harmful ones. Hence, in exemplary embodiments we might be interested in removing only late reverberation.
(35) A metric for measuring reverberation degradation is the Signal to Reverberation Ratio (SRR), which is equivalent to the Signal to Noise Ratio (SNR) when reverberation is considered a form of additive noise. High-SRR regions are not severely contaminated by reverberation, and they are usually located in signal components where the energy of the anechoic signal is high. In such signal parts the anechoic sound source is dominant, and they are mainly contaminated by early reverberation, typically in the form of spectral coloration. On the other hand, low-SRR signal parts are significantly distorted by reverberation. Such signal components are likely to be found where the anechoic signal was quiet (i.e., low-energy anechoic signal components). These regions are usually located in the signal's reverberant tails.
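In a simulation, where the direct and late components are available separately, the SRR can be evaluated per time-frequency bin exactly like an SNR. The sketch below assumes such oracle components (the name `srr_db` is hypothetical); in blind operation they are unknown, which is why the gain estimators in this section are needed.

```python
import numpy as np

def srr_db(X_direct, R_late, eps=1e-12):
    """Per-bin signal-to-reverberation ratio in dB (oracle components assumed)."""
    return 10.0 * np.log10((np.abs(X_direct) ** 2 + eps) / (np.abs(R_late) ** 2 + eps))

# A high-SRR bin (strong direct sound) and a low-SRR bin (reverberant tail).
srr = srr_db(np.array([1.0, 0.1]), np.array([0.1, 1.0]))   # about [20, -20] dB
```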
(36) In an exemplary embodiment, the energy of the reverberant signal's magnitude spectrum can be calculated in each frame as
$$E_i(\mu) = \frac{1}{\Omega}\sum_{\omega=1}^{\Omega} |Y_i(\omega,\mu)|^2 \qquad (16)$$
where $\Omega$ is the number of frequency bins. Since this energy was found to be directly related to the amount of reverberation degradation, it can be used in exemplary embodiments to provide a dereverberation gain that removes reverberation, as explained, for example, in equation 12. To bound the $E_i(\mu)$ values to [0, 1], the energy values are normalized using an appropriate normalization factor $f_\Omega$. Hence, the direct sound can be estimated as
$$|\hat X_i(\omega,\mu)| = \frac{E_i(\mu)}{f_\Omega}\,|Y_i(\omega,\mu)| \qquad (17)$$
where $E_i(\mu)/f_\Omega$ represents the gain $G_i$ as a function of time at the $i$th receiver. The factor $f_\Omega$ is typically related to the size of the reverberant frame. In one example, $f_\Omega$ can be computed as the energy of a white noise frame of length $\Omega$ at the maximum amplitude allowed by the reproduction system. In another example, $f_\Omega$ can be obtained as the maximum spectral frame energy selected from a large number of speech samples reproduced at the maximum amplitude allowed by the system. In other exemplary embodiments, instead of calculating the mean energy over each frame, the mean energy over specific sub-bands can be calculated. For example, these sub-bands can be defined on the mel or Bark scale, they can rely on properties of the auditory system, or they can be signal-dependent.
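A minimal sketch of the frame-energy gain, assuming $E_i(\mu)$ is the mean squared magnitude over the $\Omega$ bins and that $f_\Omega$ follows the white-noise example in the text. The full-scale noise is assumed to be uniform in [-max_amp, max_amp] and Hann-windowed; that choice, the clipping to [0, 1] and all function names are illustrative assumptions.

```python
import numpy as np

def white_noise_norm(n_bins, max_amp=1.0, n_trials=200, seed=0):
    """Estimate f_Omega as the mean spectral energy of full-scale white-noise frames.

    Assumption: uniform noise in [-max_amp, max_amp], periodic-Hann windowed,
    one-sided FFT with n_bins bins (so the frame length is 2 * (n_bins - 1)).
    """
    rng = np.random.default_rng(seed)
    n_fft = 2 * (n_bins - 1)
    win = np.hanning(n_fft + 1)[:-1]
    frames = rng.uniform(-max_amp, max_amp, size=(n_trials, n_fft)) * win
    return float(np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2))

def energy_gain(Y, f_omega):
    """G_i(mu) = E_i(mu) / f_Omega, with E_i(mu) the mean of |Y|^2 over bins, clipped to [0, 1]."""
    E = np.mean(np.abs(Y) ** 2, axis=0)   # one value per frame mu
    return np.clip(E / f_omega, 0.0, 1.0)

def dereverb_energy(Y, f_omega):
    """Equation 17: scale every bin of frame mu by that frame's gain."""
    return energy_gain(Y, f_omega)[np.newaxis, :] * Y

f_omega = white_noise_norm(257)          # for 512-sample frames
```

Loud frames receive gains close to 1 while quiet frames, which the text associates with reverberant tails, are attenuated strongly.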
(40) In another embodiment of the present application, we can assume that low-energy frequency bins are more likely to contain significant amounts of reverberation and high-energy frequency bins are more likely to contain direct signal components. This can also be verified from
where $\lambda > 1$ is a factor controlling the suppression rate and $f$ is a normalization factor. This approach disproportionately increases the energy of high-energy frequency bins compared to the energy of low-energy frequency bins. The normalization factor $f$ is directly linked to the maximum amplitude that the system can reproduce without distortion. The factor $f$ can be measured or known and may also change with time.
(43) Blind methods for the suppression of late reverberation typically produce processing artifacts, mainly due to late reverberation estimation errors. Embodiments of the present invention minimize or totally avoid such detrimental processing artifacts. In exemplary embodiments this is achieved by combining different reverberation estimation methods in order to improve the quality of the dereverberated signal. An output signal resulting from a dereverberation method that compensates for early reverberation ideally contains (i) an anechoic signal and (ii) late reverberation. An output signal resulting from a dereverberation method that compensates for late reverberation ideally contains (i) an anechoic signal and (ii) early reverberation.
(44) Given the foregoing,
(46) In another embodiment, two or more late reverberation estimation methods can be combined to provide a new method for late reverberation suppression, with minimal or no processing artifacts. All embodiments of the present application relating to methods of dereverberation can be either single-channel, binaural or multi-channel.
(47) An exemplary case of the general reverberation concept (previously illustrated in
(48) In binaural setups such as the one described in
(49) For illustrative purposes, one binaural model for reverberation will be described. Assume a speaker and a listener with one receiver in the left ear and one receiver in the right ear. According to equation 10, the time-frequency domain signal $Y_L(\omega,\mu)$ received at the listener's left ear is described as
$$Y_L(\omega,\mu) = X_L(\omega,\mu) + R_L(\omega,\mu) \qquad (19)$$
and the signal $Y_R(\omega,\mu)$ captured by the right-ear receiver is described in the time-frequency domain as
$$Y_R(\omega,\mu) = X_R(\omega,\mu) + R_R(\omega,\mu) \qquad (20)$$
where $X_L(\omega,\mu)$ and $X_R(\omega,\mu)$ are the direct signals (including the anechoic and early reverberation parts) for the left and right channels, respectively, and $R_L(\omega,\mu)$ and $R_R(\omega,\mu)$ are the late reverberation components for the left and right channels, respectively. Since we want to apply identical processing to both channels, we can derive a hybrid signal containing information from both the left and right ear channels. Therefore, we derive a new signal $\tilde Y(\omega,\mu)$ representing the sum of the left and right captured signals:
$$\tilde Y(\omega,\mu) = Y_R(\omega,\mu) + Y_L(\omega,\mu) \qquad (21)$$
Using $\tilde Y(\omega,\mu)$, we can now broadly estimate the late reverberation $\tilde R(\omega,\mu)$ for both channels. In other embodiments, any combination of the left and right channels can be used to derive $\tilde Y(\omega,\mu)$. Alternatively, the new signal $\tilde Y(\omega,\mu)$ can be derived in the time domain and then transformed to the time-frequency domain. Any known method for estimating the late reverberation $\tilde R(\omega,\mu)$ can be used; some examples are presented in the embodiments described below.
(50) In one embodiment, the late reverberation $\tilde R(\omega,\mu)$ of both channels can be estimated from the spectral energy of each frame of $\tilde Y(\omega,\mu)$, as described in equations 16 and 17.
(52) In an exemplary embodiment, the late reverberation $\tilde R(\omega,\mu)$ of both channels can be estimated from the spectral energy of each frame of $\tilde Y(\omega,\mu)$, as described in equation 18.
(54) In an exemplary embodiment, late reverberation is considered a statistical quantity that does not change dramatically across different positions in the same room. Then $h(n)$ is modeled as a discrete non-stationary stochastic process:
where $b(n)$ is zero-mean stationary Gaussian noise. The short-time spectral magnitude of the reverberation is estimated as:
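where $|\mathrm{SNR}_{\mathrm{pri}}(\omega,\mu)|$ is the a priori signal-to-noise ratio, which can be approximated by a moving average of the a posteriori signal-to-noise ratio $|\mathrm{SNR}_{\mathrm{post}}(\omega,\mu)|$ in each frame:
$$|\mathrm{SNR}_{\mathrm{pri}}(\omega,\mu)| = \beta\,|\mathrm{SNR}_{\mathrm{pri}}(\omega,\mu-1)| + (1-\beta)\max\bigl(0,\ |\mathrm{SNR}_{\mathrm{post}}(\omega,\mu)| - 1\bigr) \qquad (26)$$
where $\beta$ is a constant taking values close to 1.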
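Equation 26 is a first-order recursive average and can be run frame by frame. The sketch assumes the a priori SNR is 0 before the first frame and uses $\beta = 0.98$ as an example of a value close to 1; the function name `a_priori_snr` is hypothetical.

```python
import numpy as np

def a_priori_snr(snr_post, beta=0.98):
    """Run equation 26 over frames; snr_post has shape (n_bins, n_frames).

    The a priori SNR before the first frame is assumed to be 0.
    """
    snr_post = np.asarray(snr_post, dtype=float)
    snr_pri = np.zeros_like(snr_post)
    prev = np.zeros(snr_post.shape[0])
    for mu in range(snr_post.shape[1]):
        prev = beta * prev + (1.0 - beta) * np.maximum(0.0, snr_post[:, mu] - 1.0)
        snr_pri[:, mu] = prev
    return snr_pri

# With beta = 0.5 the recursion is easy to follow by hand:
# frame 0: 0.5*0 + 0.5*2 = 1.0; frame 1: 0.5*1 + 0.5*2 = 1.5; frame 2: 0.5*1.5 + 0.5*0 = 0.75
demo = a_priori_snr([[3.0, 3.0, 0.5]], beta=0.5)
```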
(57) In an exemplary embodiment, the late reverberation estimation is motivated by the observation that the smearing effect of late reflections smooths the signal spectrum along the time axis. Hence, the late reverberation power spectrum is considered a smoothed and shifted version of the power spectrum of the reverberant speech:
$$|\tilde R(\omega,\mu)|^2 = \gamma\, w(\mu-\rho) * |\tilde Y(\omega,\mu)|^2 \qquad (27)$$
where $\rho$ is a frame delay, $\gamma$ is a scaling factor and $*$ denotes convolution along the frame index. The term $w(\mu)$ represents an asymmetric smoothing function given by the Rayleigh distribution:
where α represents a constant number of frames.
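The estimator of equation 27 convolves the reverberant power spectrum along the frame axis with a delayed asymmetric window. The window formula itself is not reproduced in this text, so the sketch assumes a shifted Rayleigh shape, $w(m) = \frac{m+\alpha}{\alpha^2}\exp\left(-\frac{(m+\alpha)^2}{2\alpha^2}\right)$ for $m > -\alpha$ and 0 otherwise, and uses illustrative values $\gamma = 0.32$, $\rho = 7$, $\alpha = 5$. With $\rho \ge \alpha$ only past frames contribute, so the estimate is causal. Function names are hypothetical.

```python
import numpy as np

def rayleigh_window(m, alpha):
    """Assumed asymmetric smoothing window w(m): Rayleigh shape shifted by alpha frames."""
    z = float(m) + alpha
    return z / alpha**2 * np.exp(-z**2 / (2.0 * alpha**2)) if z > 0 else 0.0

def late_reverb_psd(P, gamma=0.32, rho=7, alpha=5, max_lag=None):
    """Equation 27: |R|^2 = gamma * (w(. - rho) * |Y|^2), convolved along frames.

    P: reverberant power spectrogram |Y_tilde|^2 of shape (n_bins, n_frames).
    """
    P = np.asarray(P, dtype=float)
    n_frames = P.shape[1]
    if max_lag is None:
        max_lag = rho + 6 * alpha          # the window is negligible beyond this lag
    R2 = np.zeros_like(P)
    for k in range(min(max_lag, n_frames - 1) + 1):
        wk = rayleigh_window(k - rho, alpha)
        if wk > 0.0:
            R2[:, k:] += wk * P[:, :n_frames - k]   # frame mu receives w(k - rho) * P(mu - k)
    return gamma * R2

# A single unit impulse of power in frame 0.
P = np.zeros((1, 40))
P[0, 0] = 1.0
R2 = late_reverb_psd(P)
```

For this impulse the estimate is zero for the first frames and peaks at frame $\rho$, which is the intended delay between the direct sound and its late-reverberation estimate.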
(59) In an exemplary embodiment, the short time power spectrum of late reverberation in each frame can be estimated as the sum of filtered versions of the previous frames of the reverberant signal's short time power spectrum:
where $K$ is the number of frames that corresponds to an estimate of the $RT_{60}$ and $a_l(\omega,\mu)$ are the coefficients of late reverberation. The coefficients of late reverberation can be derived from
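The text does not reproduce the sum of paragraph (59) or how the coefficients $a_l(\omega,\mu)$ are derived, so the sketch below only assumes the structure described in words: a weighted sum of the $K$ frames preceding $\mu$ of $|\tilde Y|^2$, with $K$ obtained from an $RT_{60}$ estimate and the frame hop. The coefficients are passed in and held constant over time for simplicity; both function names are hypothetical.

```python
import math
import numpy as np

def frames_for_rt60(rt60_s, fs, hop):
    """K: number of frames (hop size `hop` samples) spanning an RT60 estimate, rounded up."""
    return int(math.ceil(rt60_s * fs / hop))

def late_reverb_from_history(P, a):
    """|R(omega, mu)|^2 = sum_{l=1..K} a[l-1, omega] * |Y(omega, mu - l)|^2.

    P: power spectrogram |Y_tilde|^2 of shape (n_bins, n_frames).
    a: coefficients of shape (K, n_bins), held constant over time here.
    Frames before the start of the signal are treated as silent.
    """
    P = np.asarray(P, dtype=float)
    a = np.asarray(a, dtype=float)
    R2 = np.zeros_like(P)
    for l in range(1, a.shape[0] + 1):
        if l >= P.shape[1]:
            break
        R2[:, l:] += a[l - 1][:, np.newaxis] * P[:, :-l]
    return R2

K = frames_for_rt60(0.5, 16000, 256)   # 0.5 s at 16 kHz with a 256-sample hop -> 32 frames
```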
(62) After the late reverberation $\tilde R(\omega,\mu)$ has been estimated from $\tilde Y(\omega,\mu)$, this estimate is used in a dereverberation process. This can be done with many techniques, including spectral subtraction, Wiener filtering, etc. For example, following the spectral subtraction approach, the binaural dereverberation gain $\tilde G(\omega,\mu)$ will be
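$$\tilde G(\omega,\mu) = \frac{|\tilde Y(\omega,\mu)| - |\tilde R(\omega,\mu)|}{|\tilde Y(\omega,\mu)|}$$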
Since we want to preserve the binaural localization cues, this gain is then applied separately to the left and right channels (according, for example, to equation 12) to obtain estimates of the dereverberated signals for the left and right ear channels, respectively. Equation 15 shows that, for specific embodiments of the present application, any exponent of the frequency transformation of the reverberant signal can be used. Hence, the binaural gain can be derived from and applied to $|Y_L(\omega,\mu)|^p$ and $|Y_R(\omega,\mu)|^p$ for any exponent $p$ (as in equation 15), but it can also be applied directly to the complex spectrum of the left and right channels.
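Because the same real-valued gain multiplies both ear signals, the ratio $Y_L/Y_R$ in every bin, and therefore the interaural level and phase differences, is unchanged. The sketch forms $\tilde Y = Y_L + Y_R$ (equation 21), assumes a magnitude spectral-subtraction gain clipped to [0, 1], and takes the late-reverberation estimator as a caller-supplied function; `estimate_late` and `binaural_dereverb` are hypothetical placeholders, and the toy estimator is not one of the methods described above.

```python
import numpy as np

def binaural_dereverb(Y_L, Y_R, estimate_late, eps=1e-12):
    """Derive one gain from Y_tilde = Y_L + Y_R and apply it to both ears."""
    Y_tilde = Y_L + Y_R                                   # equation 21
    mag = np.abs(Y_tilde)
    R_mag = estimate_late(mag)                            # |R_tilde| from any estimator
    G = np.clip(1.0 - R_mag / np.maximum(mag, eps), 0.0, 1.0)
    return G * Y_L, G * Y_R, G                            # same gain on left and right

rng = np.random.default_rng(3)
Y_L = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
Y_R = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
# Toy placeholder estimator: late reverberation taken as 30% of |Y_tilde|.
X_L, X_R, G = binaural_dereverb(Y_L, Y_R, lambda m: 0.3 * m)
```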
(64) An example method of the present invention provides dereverberation for binaural or 2-channel systems. Spectral processing tends to produce estimation artifacts. Looking at these artifacts with respect to the dereverberation gain (see equations 12, 13 and 14), two main types of errors result. Case I: the direct signal is incorrectly identified as reverberation. This results in low dereverberation gain values ($G_i(\omega,\mu) \to 0$) where the gain should have been high ($G_i(\omega,\mu) \to 1$). As a consequence, the output signal suffers from severe distortion, since direct signal parts are suppressed. Case II: reverberant parts are not located correctly, and reverberation remains in the output signal. These errors originate when the method derives high dereverberation gain values ($G_i(\omega,\mu) \to 1$) where the gain should have been low ($G_i(\omega,\mu) \to 0$).
In exemplary embodiments of the present application, these artifacts are minimized with respect to the derived dereverberation gain. An example uses the coherence between the left and right channel as an indicator of the reverberation intrusion and modifies the original late reverberation estimation accordingly. This is an exemplary embodiment of the more general case presented in
(65) In a first step of an exemplary embodiment, the coherence $\Phi(\omega,\mu)$ between the left $Y_L(\omega,\mu)$ and right $Y_R(\omega,\mu)$ reverberant channels is derived. The coherence can provide an estimate of the distortion produced by early and late reverberation. There are many ways to calculate the coherence, and they can all be used in different embodiments of the present application. As an example, the coherence can be calculated as
(67) The coherence is (or can be) bounded to the closed interval [0, 1]. Reverberation has an impact on the derived coherence values: $\Phi(\omega,\mu)$ values are smaller when reverberation is dominant, and there is evidence that coherence can be seen as a measure of subjective diffuseness. Given the foregoing, we can assume that:
When $\Phi \to 1$, the left and right channels are similar. This means that the signals are dominated by the direct signal.
When $\Phi \to 0$, the left and right channels are uncorrelated. This means that room interference is very significant (i.e., reverberation dominates the signals).
Note that the coherence estimation takes into account the constant changes of room acoustic conditions. These changes in room acoustics are very significant in real-life applications, especially for moving speakers or moving receivers.
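The coherence formula is not reproduced here, so the sketch uses one common short-time estimator, assumed rather than taken from the text: exponentially smoothed auto- and cross-power spectra combined as $\Phi = |S_{LR}| / \sqrt{S_{LL} S_{RR}}$, which lies in [0, 1] by the Cauchy-Schwarz inequality. The smoothing factor `lam` and the function name are illustrative. Identical, or scaled and phase-shifted, channels give $\Phi \approx 1$; independent channels give low values.

```python
import numpy as np

def interaural_coherence(Y_L, Y_R, lam=0.9, eps=1e-12):
    """Short-time coherence Phi(omega, mu) in [0, 1] via recursive PSD smoothing.

    Assumed estimator: Phi = |S_LR| / sqrt(S_LL * S_RR), where each S is an
    exponentially smoothed auto- or cross-power spectrum with forgetting factor lam.
    """
    n_bins, n_frames = Y_L.shape
    S_LL = np.zeros(n_bins)
    S_RR = np.zeros(n_bins)
    S_LR = np.zeros(n_bins, dtype=complex)
    Phi = np.zeros((n_bins, n_frames))
    for mu in range(n_frames):
        yl, yr = Y_L[:, mu], Y_R[:, mu]
        S_LL = lam * S_LL + (1 - lam) * np.abs(yl) ** 2
        S_RR = lam * S_RR + (1 - lam) * np.abs(yr) ** 2
        S_LR = lam * S_LR + (1 - lam) * yl * np.conj(yr)
        Phi[:, mu] = np.abs(S_LR) / np.sqrt(S_LL * S_RR + eps)
    return np.clip(Phi, 0.0, 1.0)

rng = np.random.default_rng(4)
A = rng.standard_normal((64, 200)) + 1j * rng.standard_normal((64, 200))
B = rng.standard_normal((64, 200)) + 1j * rng.standard_normal((64, 200))
Phi_direct = interaural_coherence(A, 0.5 * np.exp(1j * 0.3) * A)   # "direct-like" pair
Phi_diffuse = interaural_coherence(A, B)                            # "diffuse-like" pair
```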
(68) In exemplary embodiments of the present application, the above findings are used to correct the reverberation estimation errors and produce dereverberated signals without artifacts. One way to do this is by manipulating the derived dereverberation gain and extracting a new room-adaptive gain. This room-adaptive gain modification can be performed using any relevant technique, such as a function, a method, a lookup table, an equation, a routine, a system, a set of rules, etc. In exemplary embodiments, four gain modification schemes can be assumed:
1. The coherence is relatively low, i.e. $\Phi \to 0$, and the late reverberation estimation yields a relatively large dereverberation gain (i.e. $\tilde G(\omega,\mu) \to 1$). In this case, the coherence estimation reveals that late reverberation dominates the signal, and the gain is decreased in order to efficiently suppress reverberation.
2. The coherence is relatively low, i.e. $\Phi \to 0$, and the late reverberation estimation yields a relatively small dereverberation gain (i.e. $\tilde G(\omega,\mu) \to 0$). In this case, the coherence estimation reveals that late reverberation dominates the signal, and the gain is not significantly changed.
3. The coherence is relatively high, i.e. $\Phi \to 1$, and the late reverberation estimation yields a relatively large dereverberation gain (i.e. $\tilde G(\omega,\mu) \to 1$). In this case, the coherence estimation reveals that direct components dominate the signal, and the gain is not significantly changed.
4. The coherence is relatively high, i.e. $\Phi \to 1$, and the late reverberation estimation yields a relatively small dereverberation gain (i.e. $\tilde G(\omega,\mu) \to 0$). In this case, the coherence estimation reveals that direct components dominate the signal, and the gain is significantly increased in order to protect the signal from overestimation artifacts.
Such artifacts typically appear when a dereverberation method suppresses direct signal components because it mistakes them for late reverberation.
Generally speaking, suppressing direct signal components may result in significant distortion, and this type of distortion is generally less acceptable than the reverberation degradation itself. Hence, in particular embodiments of the present application, gain increases are made more drastic than gain decreases when the gain is modified.
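The four schemes can be illustrated as a "set of rules" that picks an exponent for each time-frequency bin. This is a minimal sketch assuming NumPy arrays of gains $\tilde{G}$ and coherences $\Phi$ in [0, 1]; the thresholds `PHI_LOW`, `PHI_HIGH`, `G_LOW`, `G_HIGH` and the exponents `P_DECREASE` and `P_INCREASE` are hypothetical tuning values, not taken from the source. Raising a gain in [0, 1] to an exponent above 1 lowers it and to an exponent below 1 raises it; the increase exponent is chosen so that increases outweigh decreases, following the asymmetry described above.

```python
import numpy as np

# Hypothetical tuning values (not specified in the source).
PHI_LOW, PHI_HIGH = 0.3, 0.7   # "relatively low" / "relatively high" coherence
G_LOW, G_HIGH = 0.3, 0.7       # "relatively small" / "relatively large" gain
P_DECREASE = 1.5               # exponent > 1 lowers a gain in [0, 1]
P_INCREASE = 0.25              # exponent < 1 raises it; chosen so increases outweigh decreases

def room_adaptive_gain(G_tilde, Phi):
    """Rule-based version of the four gain modification schemes (hypothetical)."""
    G = np.clip(np.asarray(G_tilde, dtype=float), 0.0, 1.0)
    Phi = np.asarray(Phi, dtype=float)
    exponent = np.ones_like(G)  # cases 2 and 3 (and all other bins): keep the gain
    # Case 1: low coherence, large gain -> late reverberation dominates, decrease the gain.
    exponent = np.where((Phi < PHI_LOW) & (G > G_HIGH), P_DECREASE, exponent)
    # Case 4: high coherence, small gain -> direct sound is being suppressed, increase the gain.
    exponent = np.where((Phi > PHI_HIGH) & (G < G_LOW), P_INCREASE, exponent)
    return G ** exponent

# One bin per case: (G_tilde, Phi) = (0.9, 0.1), (0.1, 0.1), (0.9, 0.9), (0.1, 0.9)
out = room_adaptive_gain([0.9, 0.1, 0.9, 0.1], [0.1, 0.1, 0.9, 0.9])
```

Here the exponent is drawn from the small predetermined set {`P_DECREASE`, 1, `P_INCREASE`}, in the spirit of the exponent selection recited in claim 1; a smooth function of $\Phi$ and $\tilde{G}$ would be an equally valid choice.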
(69) In an example application, we can use the coherence estimation in order to correct the estimation errors of any dereverberation algorithm. A new room-adaptive gain is obtained through the following function:
$$G_{\mathrm{coh}}(\omega,\mu) = \big(\tilde{G}(\omega,\mu)\big)^{1-\Phi(\omega,\mu)^{\gamma}} \qquad (36)$$
where γ is a tuning parameter. This gain can be used to obtain the dereverberated left and right signals as
$$X_L(\omega,\mu) = G_{\mathrm{coh}}(\omega,\mu)\,Y_L(\omega,\mu) \qquad (37)$$
and
$$X_R(\omega,\mu) = G_{\mathrm{coh}}(\omega,\mu)\,Y_R(\omega,\mu) \qquad (38)$$
Again, the gain can be derived from and applied to the magnitude spectra $|Y_L(\omega,\mu)|$ and $|Y_R(\omega,\mu)|$ raised to any power, but it can also be derived from and applied directly to the complex spectra of the left and right channels. The dereverberated time-domain signals of the left and right channels, $x_L(n)$ and $x_R(n)$, can then be obtained through an inverse transformation from the frequency domain to the time domain.
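The chain from equation 36 to the time-domain outputs can be illustrated with a minimal NumPy sketch. It assumes that equation 36 has the form $G_{\mathrm{coh}} = \tilde{G}^{\,1-\Phi^{\gamma}}$, that $\tilde{G}$ and $\Phi$ are already available as arrays over frequency bins and frames, and it uses a simple sqrt-Hann STFT with 50% overlap as the analysis/synthesis transform. `N_FFT`, `HOP`, `stft`, `istft`, `coherence_gain` and `dereverberate_binaural` are hypothetical names chosen for illustration. As in equations 37 and 38, the same real gain multiplies the complex spectra of both channels, so the interaural level and phase differences of each bin are left unchanged.

```python
import numpy as np

N_FFT, HOP = 512, 256                          # hypothetical frame size and hop (50% overlap)
WIN = np.sqrt(np.hanning(N_FFT + 1)[:-1])      # periodic sqrt-Hann: WIN**2 overlap-adds to 1

def stft(x):
    """Complex spectrum Y(omega, mu) with shape (N_FFT // 2 + 1, n_frames)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[m * HOP:m * HOP + N_FFT] * WIN for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(X, length):
    """Inverse FFT per frame, synthesis window, then overlap-add."""
    frames = np.fft.irfft(X.T, n=N_FFT, axis=1) * WIN
    out = np.zeros(max(length, HOP * (frames.shape[0] - 1) + N_FFT))
    for m, frame in enumerate(frames):
        out[m * HOP:m * HOP + N_FFT] += frame
    return out[:length]

def coherence_gain(G_tilde, Phi, gamma=1.0):
    """Assumed form of eq. (36): G_coh = G_tilde ** (1 - Phi ** gamma)."""
    G_tilde = np.clip(G_tilde, 0.0, 1.0)
    Phi = np.clip(Phi, 0.0, 1.0)
    return G_tilde ** (1.0 - Phi ** gamma)

def dereverberate_binaural(y_left, y_right, G_tilde, Phi, gamma=1.0):
    """Eqs. (37)-(38): one shared gain on both complex spectra, then back to time domain."""
    Y_L, Y_R = stft(y_left), stft(y_right)
    G_coh = coherence_gain(G_tilde, Phi, gamma)
    return istft(G_coh * Y_L, len(y_left)), istft(G_coh * Y_R, len(y_right))
```

With $\Phi = 1$ the exponent is 0 and every bin passes unchanged, even where $\tilde{G} = 0$ (NumPy evaluates `0.0 ** 0.0` as 1), which is the protective behaviour of case 4; with $\Phi = 0$ the original gain $\tilde{G}$ is used as is. Only the first and last half-frames are not perfectly reconstructed by this simple STFT.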
(70) The effect of coherence in the gain function of equation 36 is explained in the example illustrated in
(72) The first gain estimation of equation 39 (1002) is shown as an example in
(75) In other exemplary embodiments of the present application, the aforementioned process may be applied in any multichannel dereverberation scenario. This can be done by any appropriate technique. For example, the coherence can be calculated between consecutive pairs of input channels, or between groups of channels, etc.
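A minimal sketch of the multichannel extension, assuming that coherence is computed between consecutive channel pairs with a recursively smoothed estimate $\Phi_{ij} = |S_{ij}| / \sqrt{S_{ii} S_{jj}}$ and that the pairwise results are simply averaged. This estimator, the smoothing constant `alpha`, the averaging rule and the names `pair_coherence` and `multichannel_coherence` are assumptions chosen for illustration; the coherence estimator used elsewhere in this application may differ.

```python
import numpy as np

def pair_coherence(Yi, Yj, alpha=0.8, eps=1e-30):
    """Recursively smoothed coherence |S_ij| / sqrt(S_ii * S_jj) per bin and frame.

    Yi, Yj: complex spectra of shape (n_freq, n_frames). Returns values in [0, 1].
    """
    n_freq, n_frames = Yi.shape
    S_ii = np.zeros(n_freq)
    S_jj = np.zeros(n_freq)
    S_ij = np.zeros(n_freq, dtype=complex)
    phi = np.empty((n_freq, n_frames))
    for mu in range(n_frames):
        # First-order recursive smoothing of auto- and cross-power spectra.
        S_ii = alpha * S_ii + (1 - alpha) * np.abs(Yi[:, mu]) ** 2
        S_jj = alpha * S_jj + (1 - alpha) * np.abs(Yj[:, mu]) ** 2
        S_ij = alpha * S_ij + (1 - alpha) * Yi[:, mu] * np.conj(Yj[:, mu])
        phi[:, mu] = np.abs(S_ij) / np.sqrt(S_ii * S_jj + eps)
    return np.minimum(phi, 1.0)  # guard against rounding just above 1

def multichannel_coherence(Y, alpha=0.8):
    """Average the coherence of consecutive channel pairs (0, 1), (1, 2), ...

    Y: complex spectra of shape (n_channels, n_freq, n_frames), n_channels >= 2.
    """
    pairs = [pair_coherence(Y[c], Y[c + 1], alpha) for c in range(Y.shape[0] - 1)]
    return np.mean(pairs, axis=0)
```

The averaged $\Phi$ has the same shape as a single-pair estimate, so it can be used wherever the binaural coherence appears in the gain modification. Coherence between groups of channels could be handled the same way by summing each group's spectra before calling `pair_coherence`.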
(76) In an exemplary embodiment, the amount of dereverberation is controlled through a modification of a dereverberation gain G(ω, μ). If a linear control is applied, all gain values are treated equally:
$$G_{\mathrm{new}}(\omega,\mu) = \zeta\big(G(\omega,\mu)\big) \qquad (40)$$
where ζ is the operator that changes the suppression rate. This linear operation is not necessarily a good choice for dereverberation: reverberation is a convolutive degradation that is highly correlated with the input signal, so a simple linear control of the dereverberation gain might not be sufficient. In this exemplary embodiment, dereverberation is instead controlled in accordance with the original gain values. When the suppression needs to be significantly reduced (i.e. the overall gain increased), the lower gain values are increased more drastically than the higher gain values. This can fix possible overestimation errors (where direct signal components are incorrectly suppressed) in frequency components where a low gain was originally estimated. When the suppression needs to be significantly increased (i.e. the overall gain reduced), the higher gain values are decreased more drastically than the lower gain values. This protects frequency components that were already assigned a low gain from further overestimation errors.
In typical examples, the dereverberation gain is increased at a higher rate than it is decreased. In some applications, the dereverberation gain can be controlled automatically and fine-tuned according to the acoustic conditions. In other applications, it can be user-defined, allowing, for example, a hearing aid user to adapt the dereverberation rate to his or her specific needs.
(77) In an example, the gain function of a dereverberation filter G(ω, μ) is controlled through a parameter ν in order to obtain a new filter $G_{\mathrm{new}}(\omega,\mu)$ as
$$G_{\mathrm{new}}(\omega,\mu) = \big(G(\omega,\mu)\big)^{\nu} \qquad (41)$$
where $\nu > 0$.
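Equation 41 can be sketched in a few lines of NumPy. A useful way to read it: in decibels, $20\log_{10}(G^{\nu}) = \nu \cdot 20\log_{10} G$, so ν scales every bin's attenuation by the same factor, which in linear terms means that for ν < 1 the low gains are raised more than the high ones. The function name `control_gain`, the exponent set `EXPONENT_SET` and the per-frame `choice` indices are hypothetical; the per-frame exponent illustrates the time-varying exponent selection of claim 1 under that assumption.

```python
import numpy as np

def control_gain(G, nu):
    """Eq. (41): G_new = G ** nu, for gains in [0, 1] and nu > 0.

    nu may be a scalar or an array that broadcasts against G, e.g. one
    exponent per frame when G has shape (n_freq, n_frames).
    """
    G = np.clip(np.asarray(G, dtype=float), 0.0, 1.0)
    nu = np.asarray(nu, dtype=float)
    if np.any(nu <= 0):
        raise ValueError("nu must be > 0")
    return G ** nu

gains = np.array([0.1, 0.5, 0.9])
milder = control_gain(gains, 0.5)    # about [0.316, 0.707, 0.949]: less suppression
stronger = control_gain(gains, 3.0)  # [0.001, 0.125, 0.729]: more suppression

# Hypothetical per-frame control: each frame picks its exponent from a fixed set.
EXPONENT_SET = np.array([0.5, 1.0, 2.0])
choice = np.array([0, 1, 2])                   # e.g. from a user setting or a room estimate
G = np.full((4, 3), 0.5)                       # 4 frequency bins x 3 frames
G_new = control_gain(G, EXPONENT_SET[choice])  # frame-wise exponents broadcast over bins
```

With ν = 0.5 the gain of 0.1 rises by about 0.22 while the gain of 0.9 rises by only about 0.05, which is the behaviour described for reducing suppression; ν = 1 leaves the filter unchanged.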
(78) Even though embodiments of the present invention relate to the suppression of late reverberation, the methods presented in this application are also appropriate for the suppression of ambient noise. All assumptions made for late reverberation in the diffuse field (e.g. stationarity, stochastic characteristics, noise-like behavior) broadly hold for ambient noise as well. Hence, the embodiments presented in this application inherently suppress both ambient noise and late reverberation and are valid for ambient noise reduction.
(80) While the above-described flowcharts have been discussed in relation to a particular sequence of events, it should be appreciated that changes to this sequence can occur without materially affecting the operation of the invention. Additionally, the exemplary techniques illustrated herein are not limited to the specifically illustrated embodiments but can also be utilized and combined with the other exemplary embodiments, and each described feature is individually and separately claimable.
(81) Additionally, the systems, methods and protocols of this invention can be implemented on a special purpose computer, a programmed microprocessor or microcontroller and peripheral integrated circuit element(s), an ASIC or other integrated circuit, a digital signal processor, a hard-wired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, a modem, a transmitter/receiver, any comparable means, or the like. In general, any device capable of implementing a state machine that is in turn capable of implementing the methodology illustrated herein can be used to implement the various communication methods, protocols and techniques according to this invention.
(82) Furthermore, the disclosed methods may be readily implemented in software using object or object-oriented software development environments that provide portable source code that can be used on a variety of computer or workstation platforms. Alternatively, the disclosed methods may be readily implemented in software on an embedded processor, a microprocessor or a digital signal processor. The implementation may utilize fixed-point operations, floating-point operations, or both. In the case of fixed-point operations, approximations may be used for certain mathematical operations such as logarithms and exponentials. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits or VLSI design. Whether software or hardware is used to implement the systems in accordance with this invention is dependent on the speed and/or efficiency requirements of the system, the particular function, and the particular software or hardware systems or microprocessor or microcomputer systems being utilized. The systems and methods illustrated herein can be readily implemented in hardware and/or software using any known or later developed systems or structures, devices and/or software by those of ordinary skill in the applicable art from the functional description provided herein and with a general basic knowledge of the audio processing arts.
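In a fixed-point implementation, the power law of equation 41 can avoid run-time logarithms and exponentials entirely by tabulating $g^{\nu}$ once per exponent. The sketch below is a minimal illustration under assumed conventions: gains are non-negative Q15 integers, each exponent gets a 256-entry table, and NumPy merely emulates what would be integer arithmetic on a DSP. `Q15_ONE`, `LUT_BITS`, `build_power_lut`, `apply_power_lut` and `LUTS` are hypothetical names, and a lookup table is only one possible approximation, not necessarily the one intended by the source.

```python
import numpy as np

Q15_ONE = 1 << 15   # 1.0 in Q15; the largest representable gain is Q15_ONE - 1
LUT_BITS = 8        # hypothetical table resolution: 256 entries

def build_power_lut(nu, bits=LUT_BITS):
    """Precompute g**nu in Q15 for 2**bits evenly spaced gains in [0, 1] (done offline)."""
    levels = np.arange(1 << bits) / ((1 << bits) - 1)
    return np.round(levels ** nu * (Q15_ONE - 1)).astype(np.int16)

def apply_power_lut(g_q15, lut, bits=LUT_BITS):
    """Raise non-negative Q15 gains to the table's exponent: integer ops plus one table read."""
    g = np.asarray(g_q15).astype(np.int32)
    idx = (g * ((1 << bits) - 1) + (1 << 14)) >> 15   # round(g * 255 / 2**15)
    return lut[idx]

# One table per exponent in a predetermined set of exponents.
LUTS = {nu: build_power_lut(nu) for nu in (0.5, 1.0, 2.0)}
```

With 256 entries the error for ν = 0.5 stays below about 0.01 for gains above 0.1; near zero, where $g^{\nu}$ is steep for ν < 1, a finer table or non-uniform spacing would be needed.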
(83) Moreover, the disclosed methods may be readily implemented in software that can be stored on a storage medium and executed on a programmed general-purpose computer with the cooperation of a controller and memory, a special purpose computer, a microprocessor, or the like. In these instances, the systems and methods of this invention can be implemented as a program embedded on a personal computer, such as an applet, JAVA® or CGI script, as a resource residing on a server or computer workstation, as a routine embedded in a dedicated system or system component, or the like. The system can also be implemented by physically incorporating the system and/or method into a software and/or hardware system, such as the hardware and software systems of an electronic device.
(84) It is therefore apparent that there has been provided, in accordance with the present invention, systems and methods for reducing reverberation in electronic devices. While this invention has been described in conjunction with a number of embodiments, it is evident that many alternatives, modifications and variations would be or are apparent to those of ordinary skill in the applicable arts. Accordingly, it is intended to embrace all such alternatives, modifications, equivalents and variations that are within the spirit and scope of this invention.