METHOD, DEVICE, HEADPHONES AND COMPUTER PROGRAM FOR ACTIVELY SUPPRESSING THE OCCLUSION EFFECT DURING THE PLAYBACK OF AUDIO SIGNALS
20230328462 · 2023-10-12
Inventors
CPC classification
H04R1/1091
ELECTRICITY
G10K11/17827
PHYSICS
G10K2210/1081
PHYSICS
International classification
Abstract
In the method according to the invention for actively suppressing the occlusion effect during the playback of audio signals by means of headphones or a hearing aid, a sound signal occurring from outside is captured by means of at least one outer microphone of the headphones or the hearing aid. A voice signal is captured by means of at least one additional microphone. The dry component of the captured voice signal is estimated, the dry component of the captured voice signal being the component of the captured voice signal without reverberation caused by the surrounding space. By means of a filter, a voice component is extracted from the outer sound captured using the at least one outer microphone. The extracted or produced voice component is output by means of a loudspeaker of the headphones or the hearing aid.
Claims
1. A method for actively suppressing the occlusion effect during the playback of audio signals by means of headphones or a hearing aid, comprising: capturing with at least one outer microphone of the headphones or the hearing aid, external sound in the form of a sound signal occurring from the outside; capturing a voice signal with at least one additional microphone; estimating the dry component of the captured voice signal, wherein the dry component of the captured voice signal is the component of the captured voice signal without reverberation caused by the surrounding space and without ambient noises; extracting a voice component by a filter from the external sound captured with the at least one outer microphone, with filter coefficients of the filter being determined based on the estimated dry component of the captured voice signal, or the estimated dry component of the captured voice signal is filtered such that a voice component is produced which has a comparable spatiality to the voice component at the at least one outer microphone; and outputting the extracted or generated voice component via a loudspeaker of the headphones or hearing aid.
2. The method according to claim 1, wherein the voice signal is captured with at least one microphone or microphone array directed towards the mouth of the user and/or an inner microphone of the headphones or hearing aid.
3. The method of claim 2, wherein a monaural dry component is estimated from the captured voice signal and based thereon binaural voice signals are extracted from the signals of at least two outer microphones of left and right headphones or left and right hearing aids, or the estimated monaural dry voice component is filtered to generate binaural voice signals with a comparable spatiality to the voice component at the outer microphones.
4. The method according to claim 2, wherein the binaural voice signals are filtered for left and right headphones or left and right hearing aids prior to being respectively output via a loudspeaker.
5. The method according to claim 2, wherein the estimate of the dry voice component at the outer microphone is performed by filtering with the respective relative impulse response between the mouth microphone or microphone array and the outer microphone and subsequent averaging.
6. The method according to claim 1, wherein the filter for extracting or generating the voice component based on the detected outside noise and the estimated dry voice is a Wiener filter, an adaptive filter or a filter which simulates a room impulse response.
7. The method according to claim 1, wherein the estimated dry component of the detected voice signal and the extracted or generated voice component are linearly weighted and then added.
8. A device for actively suppressing the occlusion effect during the playback of audio signals by means of a loudspeaker in a headphone or hearing aid provided with at least one outer microphone, comprising: at least one additional microphone for capturing a voice signal from a user; a digital signal processor which is arranged to estimate a dry component of a voice signal captured with the at least one additional microphone, wherein the dry component of the captured voice signal is the component of the captured voice signal without reverberation caused by the surrounding space and without ambient noises; extract from the external sound captured by the at least one outer microphone the voice component using a filter, wherein filter coefficients of the filter are determined based on the estimated dry component of the captured voice signal, or filter the estimated dry component of the captured voice signal to produce a voice component which has comparable spatiality to the voice component at the outer microphones; and output the extracted or generated voice component via the loudspeaker.
9. The device according to claim 8, further comprising a digital filter to which the extracted or generated voice component is supplied before it is output via the loudspeaker.
10. Headphones adapted to perform a method according to claim 1.
11. (canceled)
Description
[0026] Further features of the embodiments will become apparent from the following description and claims in conjunction with the figures.
[0032] For a better understanding of the principles of the present invention, embodiments of the invention are explained in more detail below with reference to the figures. It is understood that the invention is not limited to these embodiments and that the features described can also be combined or modified without departing from the scope of protection of the invention as defined in the claims.
[0033] The disclosed method can be used, for example, to reduce the occlusion effect of in-ear headphones, as shown schematically in
[0034] A noise signal x(t) arriving at the headphones from the environment, which can in particular contain the voice of the user but also ambient noise, is detected with an outer microphone 11, which is directed away from the ear canal towards the surroundings of the headphones. Furthermore, the in-ear headphones 10 have an inner microphone 12, which is directed into the ear canal 15 towards the eardrum of the user, and a loudspeaker 13 located near the inner microphone 12. A compensation signal u(t) can be output by means of the loudspeaker 13, with which the occlusion effect is suppressed as comprehensively as possible, or at least reduced, so that the user is ideally given the impression of not wearing headphones at all.
[0035] With the help of the outer microphone 11, the airborne components of the noise signal are detected, and a compensation signal is generated for them. In addition, the inner microphone 12 detects a residual signal e(t) after a superimposition of the compensation signal u(t) filtered through the secondary path S(s) with the noise signal x(t) filtered through the primary path P(s), and also makes it possible, in particular, to detect a structure-borne noise component and to take it into account in the compensation signal. The primary acoustic path P(s) describes the transfer function for the acoustic transmission from the outer microphone 11 to the inner microphone 12 and can be measured with an external loudspeaker setup, for example. The secondary acoustic path S(s) describes the transfer function from the internal loudspeaker 13 to the inner microphone 12 and can be measured using this loudspeaker and the inner microphone.
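The superposition described above can be sketched numerically. The following minimal example (not from the patent; all impulse responses are hypothetical) models the residual e(n) at the inner microphone as the noise x(n) filtered through the primary path P plus the compensation signal u(n) filtered through the secondary path S, with u(n) chosen recursively so that the airborne component is canceled:

```python
# Hypothetical sketch of the signal model e = P*x + S*u from the description.

def fir_filter(h, x):
    """Convolve x with impulse response h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

P = [0.5, 0.2, 0.1]   # hypothetical primary path: outer mic -> inner mic
S = [0.8, 0.1]        # hypothetical secondary path: loudspeaker -> inner mic

x = [1.0, 0.0, 0.0, 0.0]   # noise signal at the outer microphone (an impulse)
Px = fir_filter(P, x)      # airborne component arriving at the inner microphone

# Ideal compensation: solve S*u = -(P*x) sample by sample.
u = []
for n in range(len(x)):
    acc = Px[n] + sum(S[k] * u[n - k] for k in range(1, len(S)) if n - k >= 0)
    u.append(-acc / S[0])

# Residual at the inner microphone: superposition of both paths.
e = [pv + sv for pv, sv in zip(Px, fir_filter(S, u))]
# e is numerically zero, i.e. the occlusion-relevant airborne part is canceled.
```

With a perfect model of both paths the residual vanishes; in practice, model errors and structure-borne components keep e(n) nonzero, which is why the inner microphone signal is used as an error signal.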
[0036] The in-ear headphones shown have only one outer microphone, but multiple microphones arranged in a microphone array can also be used. Furthermore, the occlusion effect can also occur with other headphones, such as headband headphones with circumaural ear pads that close the ear canal due to their closed design, or hearing aids and can be compensated for as described below.
[0038] Then, in step 22, the dry component of the voice signal captured with the additional microphone is estimated. As is well known to those skilled in the art, a dry audio signal is understood to mean a pure sound signal as it originally was when it was generated, i.e., without any reverberation due to reflections of the generated sound waves in a closed room or in a naturally delimited area, and free from ambient acoustic disturbances. In this step, the voice signal is therefore estimated as it was generated directly by the user’s vocal tract.
[0039] Based on the estimated dry component of the captured voice signal, in the subsequent step 23 the binaural voice signal contained in the microphone signal of the respective outer microphone is estimated and extracted with a filter, where filter coefficients of the filter are determined based on the estimated dry component of the captured voice signal. Alternatively, the estimated dry voice signal can be filtered in such a way that it has a comparable spatiality to the voice component at the outer microphones. The extracted or generated binaural voice component is then output in step 24 via the corresponding loudspeaker of the headphones or hearing aid, with the signal being adjusted beforehand by means of a forward (“feedforward”) filter in such a way that acoustically transparent reproduction of the voice signals is possible.
[0041] As already mentioned in connection with
[0042] Furthermore, a mouth microphone 17 is provided. This can be part of a communication headset, for example, and can be attached to a pivoting bracket in order to be placed in front of the user’s mouth and aligned with the mouth. However, a microphone array consisting of several microphones can also be provided, which is arranged on the outside of the headphones or hearing aid and is aligned with the mouth, for example using a beam-forming method. In addition to the primary path P(z), which describes the acoustic transmission from the outer microphone to the inner microphone, and the secondary path S(z) for the transmission from the loudspeaker to the inner microphone, the transmission path B(z) between the mouth microphone and the outer reference microphone is also noted; in a communication headset, for example, it is given by the predefined position of the swivel microphone in front of the mouth relative to the position of the outer microphone. The transmission paths also include the influence of other components, such as the analog-to-digital and digital-to-analog converters (not shown).
[0043] If the user of the headphones or hearing aid speaks, a voice signal x.sub.v (n) corresponding to this voice output is detected by the outer microphone 11. The detected voice signal x.sub.v (n) contains the room impulse response, which contains all relevant information about the current acoustic room properties. In addition to this voice signal, however, an interference signal x.sub.a (n) caused by ambient noise is also detected by the outer microphone 11, since the outer microphone 11 is attached to the outside of the headphones. The audio signal x(n) consisting of these two signal components is then processed as described below, based on an estimate of the dry voice signal, to provide acoustic transparency for the user’s own voice by outputting the processed voice signals u(n) via the loudspeaker 13 of the headphones or hearing aid. The voice signal that hits the headphones from the outside is transmitted both via the primary path P(z) from the outer to the inner microphone and via the secondary path S(z) in the form of the signal that is actively output via the loudspeaker 13. In this way, the missing airborne sound part of one’s own voice is added again. Acoustic interference of the sound signals transmitted via these two paths then leads to the acoustic transparency for the voice signal.
[0044] In the exemplary embodiment shown, both the voice signal v(n) measured by the mouth microphone 17 and the error signal e(n) from the inner microphone are fed to an estimation unit 30, in which the pure, dry voice signal ṽ(n), as produced in the vocal tract, without reverberation caused by the surrounding space and free from ambient acoustic interference, is estimated. Based on this monaural estimate v̂(n), a second estimation unit 31 extracts the binaural voice signal from the signals captured with the outer microphones of the left and right headphones. Alternatively, the estimated dry voice signal can also be filtered in such a way that it has a comparable spatiality to the voice component at the outer microphones. The binaural voice signals x.sub.v (n) are then filtered by a digital filter unit 32 with a negated transfer function and finally fed as a loudspeaker signal u(n) to a sound transducer for output via the headphones. The digital filter unit 32 is designed here in particular as a forward filter (“feedforward filter”).
[0045] For dry voice signal estimation ṽ(n) in the estimation unit 30, the voice signal v(n) can be measured by a mouth microphone 17 and then used as a speech reference. The estimation of the dry voice component at the outer microphone can be done, for example, by filtering the additional signals with the respective relative impulse response between the additional microphone and the outer microphone and then averaging them. For this purpose, the mouth microphone signal v(n) can be filtered, for example, by an estimation
of the relative transmission path B(z) between the mouth microphone and the outer microphones. The voice signal v(n) is considered here as a monaural source, which is then used for both headphones or ears.
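The filtering-and-averaging estimate described above can be sketched as follows. All relative impulse responses and microphone samples below are invented for illustration; the patent does not specify numeric values:

```python
# Hypothetical sketch: each additional microphone signal is filtered with its
# relative impulse response to the outer microphone, and the filtered signals
# are averaged to estimate the dry voice component at the outer microphone.

def fir_filter(h, x):
    """Convolve x with impulse response h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

B_mouth = [0.9, 0.05]  # relative impulse response: mouth mic -> outer mic
B_inner = [0.6, 0.20]  # relative impulse response: inner mic -> outer mic

v_mouth = [0.2, 0.5, -0.1, 0.0]  # mouth-microphone samples (made up)
v_inner = [0.3, 0.7, -0.2, 0.1]  # inner-microphone samples (made up)

filtered = [fir_filter(B_mouth, v_mouth), fir_filter(B_inner, v_inner)]
# Average across the available microphones to improve the estimate.
voice_at_outer = [sum(samples) / len(samples) for samples in zip(*filtered)]
```

Averaging the independently projected signals improves the signal-to-noise ratio of the estimate, in the spirit of the combination of mouth and inner microphone signals described in the following paragraphs.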
[0046] An error signal e(n) can also be detected by the inner microphone 12, which can also be used for the estimation of the dry voice signal ṽ(n) and can be fed to the estimation unit 30 for this purpose. Since the ear is closed by the headphones, one’s own voice couples strongly into the ear canal via the body, so that information about one’s own voice can also be obtained by means of the microphone signals from the inner microphone. The error signal e(n) comprises an error component e.sub.v(n) based on the voice signal and a further error component e.sub.b(n) which is based on further disturbances such as impact sound transmitted via the user’s body into the ear canal. In this case, separate error signals are generated for each of the two headphones or ears. These can differ, for example, if the fit of the headphones differs. However, the separate error signals can also be averaged, if necessary, in order to obtain a monaural signal again.
[0047] The signals from the mouth microphone and the inner microphones can be adjusted, for example, by digital filtering and then combined by averaging to further improve the signal-to-noise ratio. It should be noted that the signals played back via the headphone loudspeakers are each convolved with an estimate of the respective secondary path and subtracted from the respective inner microphone signal in order to prevent signal feedback.
[0048] Since the inner microphones mainly record the structure-borne noise component of one’s own voice, which does not allow for a breakdown of fricatives, for example, an extension of the bandwidth of the signals from the inner microphones is also conceivable.
[0049] Since both the mouth microphone and the inner microphones offer a good signal-to-noise ratio, it can also be envisaged that instead of an estimation based on a combination of signals from the two microphones, an estimation based only on the signal measured with the mouth microphone or the signal of the inner microphone can be performed. Finally, in particularly favorable conditions, these can already provide a dry reference of the voice without the need for an additional estimate.
[0050] In the second estimation unit 31, the binaural voice signal is estimated by extracting the binaural voice from the signals of the outer microphone signals, disturbed by ambient noise, based on the estimate of the dry voice, or by generating a voice signal which has a comparable spatiality to the voice component at the external microphones. It is important that the processing has a short and constant delay so that the delay can be taken into account for the calculation of the forward filter W(z).
[0051] For this purpose, for example, a Wiener filter or other algorithms for noise suppression can be used. In the Wiener filter, the magnitude spectra of the detected signals are evaluated in order to calculate a filter with an estimate of the speech signal and an estimate of the existing interference signal, with which the speech signal can be optimally extracted. For example, the magnitude spectrum of the mouth microphone can be combined with the magnitude spectrum of the inner microphones to estimate the magnitude spectrum of the dry vocal signal and then extract the speech component from the outer microphone signals. Here, the transfer function B(z) can be used to estimate how the dry voice arrives from the mouth microphone at the outer microphone, in order to then compensate for the propagation times of the direct sound.
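The magnitude-domain Wiener gain described above can be illustrated per frequency bin. All per-bin power values below are invented; the point is only the shape of the gain rule, voice power over total power:

```python
# Hedged sketch of a spectral Wiener gain: per frequency bin, the estimated
# voice power over the total (voice + interference) power gives the gain
# applied to the outer-microphone spectrum. All values are hypothetical.

voice_power = [4.0, 1.0, 0.25, 0.01]   # estimated dry-voice power per bin
noise_power = [1.0, 1.0, 1.00, 1.00]   # estimated ambient interference power

gains = [pv / (pv + pn) for pv, pn in zip(voice_power, noise_power)]

outer_spectrum = [2.0, 1.5, 1.0, 0.5]  # outer-mic magnitude spectrum (made up)
extracted_voice = [g * s for g, s in zip(gains, outer_spectrum)]
# Bins dominated by voice pass nearly unchanged; noise-dominated bins are
# strongly attenuated, extracting the speech component from the mixture.
```

In the structure described in the patent, the voice power estimate would come from the dry-voice estimate propagated to the outer microphone via B(z), so that direct-sound propagation times are compensated before the gain is computed.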
[0052] Since the transfer function B(z) in a communication headset is very similar for different persons, the impulse response can be determined, for example, by a series of measurements for a specific headset and then used for applications with headsets of this design.
[0053] One possibility is Wiener filtering in a “filter bank equalizer” structure. This structure assumes a prototype low-pass filter that has a constant group delay. The spectral weights of the Wiener filter require an estimate of the useful and the interference signal. The estimate of the dry voice can be used to estimate the useful signal component.
[0054] Alternatively, an adaptive filter a(n) can be used to estimate the binaural voice. Assuming that the outer microphone signal x(n)= x.sub.a (n)+x.sub.v (n) is composed of ambient noise x.sub.a (n) and a voice component x.sub.v (n), which is coherent to the estimate v̂(n) of the dry voice, an adaptive filter can be used to reproduce the voice component x.sub.v (n) in x(n) based on v̂(n).
[0055] With the output of the adaptive filter, i.e. the estimate of the voice component x.sub.v (n) obtained by filtering v̂(n) with a(n), a prescription for adapting the adaptive filter can be found by minimizing a cost function, for example the mean squared error between the outer microphone signal x(n) and this estimate.
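A minimal sketch of such an adaptive extraction follows, here with a normalized LMS update; the patent does not prescribe a specific adaptation algorithm, and all signals and path coefficients are made up:

```python
# Hypothetical NLMS sketch: an adaptive filter a(n) reproduces the voice
# component in the outer-mic signal x(n) from the dry-voice estimate v_hat(n).
import random

random.seed(0)
N = 4                                  # adaptive filter length
true_path = [0.7, 0.2, 0.05, 0.0]      # hypothetical path: dry voice -> outer mic

v_hat = [random.uniform(-1, 1) for _ in range(2000)]   # dry-voice estimate
x = []                                 # outer-mic signal: voice + small noise
for n in range(len(v_hat)):
    voice = sum(true_path[k] * v_hat[n - k] for k in range(N) if n - k >= 0)
    x.append(voice + 0.01 * random.uniform(-1, 1))

a = [0.0] * N                          # adaptive filter coefficients
mu = 0.5                               # NLMS step size
for n in range(len(v_hat)):
    regressor = [v_hat[n - k] if n - k >= 0 else 0.0 for k in range(N)]
    y = sum(ak * rk for ak, rk in zip(a, regressor))   # estimated voice component
    err = x[n] - y                                     # residual (ambient noise)
    norm = sum(r * r for r in regressor) + 1e-8
    a = [ak + mu * err * rk / norm for ak, rk in zip(a, regressor)]
# After adaptation, a approximates true_path, so the filter output reproduces
# the voice component contained in x(n).
```

Because the ambient noise is uncorrelated with v̂(n), minimizing the mean squared error drives the filter output toward exactly the coherent voice component, which is the separation property the description relies on.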
[0056] Furthermore, the estimation unit 31 can analyze the acoustic influence of the room on one’s own voice and based thereon select or design a filter which can be applied to the estimated dry voice signal in order to generate a voice signal which has a comparable spatiality to the voice component at the outer microphones.
[0057] The forward filter W(z) can be obtained, for example, by solving the Wiener-Hopf equation
[0058] This requires one or more measurements of the primary path P(z) and the secondary path S(z). These measurements can be carried out, for example, on an artificial head or on test persons. It is important here that any delay caused by the processing in the branch between the respective outer microphone and the headphone loudspeaker is taken into account by the secondary path used for the calculation of the forward filter. If, for example, the signal x(n), or any signal derived from it that is subsequently played back via the loudspeaker, is delayed when the binaural voice is estimated, this delay must be taken into account in the secondary path. This is indicated by an apostrophe in the Wiener-Hopf equation above.
[0059] The desired transmission behavior from the outer to the inner microphone, which is usually characterized by a flat magnitude response for the natural perception of one’s own voice, is described by H(z) in the z-range or by the impulse response h(n) and is also required for the Wiener-Hopf equation.
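The role of H(z) can be illustrated with a simple frequency-domain design. The patent obtains W(z) by solving the Wiener-Hopf equation; the per-bin division below is only an illustrative stand-in for the constraint that the loudspeaker branch plus the primary path should reproduce the desired behavior, W(z)S′(z) + P(z) = H(z). The paths and the target are hypothetical:

```python
# Hedged sketch: design a feedforward filter W per frequency bin so that
# W*S' + P matches the desired (here flat) transfer behavior H.
import cmath
import math

def freq_response(h, w):
    """Evaluate an FIR impulse response h at angular frequency w (rad/sample)."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))

P = [0.5, 0.2, 0.1]   # hypothetical primary path P(z)
S = [0.8, 0.1]        # hypothetical delay-compensated secondary path S'(z)
H = [1.0]             # desired flat (acoustically transparent) behavior H(z)

n_bins = 8
freqs = [2 * math.pi * i / n_bins for i in range(n_bins)]
# Per bin: require W*S' + P = H  =>  W = (H - P) / S'.
W = [(freq_response(H, w) - freq_response(P, w)) / freq_response(S, w)
     for w in freqs]

max_err = max(abs(Wi * freq_response(S, w) + freq_response(P, w)
                  - freq_response(H, w))
              for Wi, w in zip(W, freqs))
# max_err is numerically zero: the designed W restores the target response.
```

A flat |H| is the natural choice for transparent perception of one's own voice, as the description notes; other targets could shape the reproduced voice deliberately.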
[0061] As described above, one consequence of the occlusion effect is that the low-frequency components of one’s own voice are amplified. To compensate for this, the inner microphone signal can additionally be filtered with a feedback controller in such a way that the low frequency components of one’s own voice are reduced. In this way, the perception of one’s own voice appears even more natural when wearing headphones.
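The low-frequency reduction described here can be illustrated with a simple first-order high-pass standing in for the actual feedback controller; the filter, signals, and constants are illustrative assumptions, not the patent's design:

```python
# Hypothetical sketch: a first-order high-pass attenuates the low-frequency
# components that the occlusion effect amplifies, while leaving higher
# frequencies largely untouched.
import math

def highpass(x, alpha=0.9):
    """First-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(alpha * (y[n - 1] + x[n] - x[n - 1]))
    return y

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

n = 2000
low = [math.sin(2 * math.pi * 0.005 * i) for i in range(n)]   # low-frequency tone
high = [math.sin(2 * math.pi * 0.100 * i) for i in range(n)]  # higher-frequency tone

low_out, high_out = highpass(low), highpass(high)
# The low tone is strongly attenuated while the higher tone passes largely
# unchanged, mimicking a reduction of the occlusion-induced low-frequency boost.
```

In the actual device the attenuation would act inside a feedback loop on the inner microphone signal, so stability of the closed loop would additionally have to be ensured.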
[0062] In this case, the estimation units 30 and 31 and the control unit 40 can be part of a processor unit which has one or more digital signal processors but can also contain other types of processors or combinations thereof. Furthermore, the filter coefficients of the digital filter 32 can be adjusted by the digital signal processor. The filter can be implemented as a time-invariant filter that is calculated once, uploaded to the headphone firmware and used in this form without any changes being made at runtime. An adaptive filter, which changes at runtime and adapts to the current circumstances, can also be used.
[0063] The disclosed device is preferably completely integrated in a headphone, since the latency is very low due to the transmission of one’s own voice through the structure-borne noise. In this case, the mouth microphone can also be part of the headphones, for example mounted on a bracket to be placed in front of the mouth in a so-called communication headset, or integrated in a headphone shell as a microphone array with directional characteristics. Likewise, a separate microphone can also serve as a mouth microphone. In principle, parts of the device can also be part of an external device, such as a smartphone.
[0065] The disclosed embodiments can be used to suppress the occlusion effect when reproducing audio signals with any headphones or hearing aids, for example in telephony or communication with communication headsets/hearables, in so-called in-ear monitoring for checking one’s own voice during a live performance, in augmented/virtual reality applications, or with hearing aids.
TABLE-US-00001 Reference List
10 Single headphone, single hearing aid
11 Outer microphone
12 Inner microphone
13 Loudspeaker
14 Ear insert
15 Ear canal
16 Eardrum
17 Mouth microphone
20-24 Process steps
30 First estimation unit
31 Second estimation unit
32 Digital filter
40 Control unit
41, 42 Weighting units
50 Processor unit