METHOD, DEVICE, HEADPHONES AND COMPUTER PROGRAM FOR ACTIVELY SUPPRESSING THE OCCLUSION EFFECT DURING THE PLAYBACK OF AUDIO SIGNALS

20230328462 · 2023-10-12


    Abstract

    In the method according to the invention for actively suppressing the occlusion effect during the playback of audio signals by means of headphones or a hearing aid, a sound signal occurring from outside is captured by means of at least one outer microphone of the headphones or the hearing aid. A voice signal is captured by means of at least one additional microphone. The dry component of the captured voice signal is estimated, the dry component of the captured voice signal being the component of the captured voice signal without reverberation caused by the surrounding space. By means of a filter, a voice component is extracted from the outer sound captured using the at least one outer microphone. The extracted or produced voice component is output by means of a loudspeaker of the headphones or the hearing aid.

    Claims

    1. A method for actively suppressing the occlusion effect during the playback of audio signals by means of headphones or a hearing aid, comprising: capturing with at least one outer microphone of the headphones or the hearing aid, external sound in the form of a sound signal occurring from the outside; capturing a voice signal with at least one additional microphone; estimating the dry component of the captured voice signal, wherein the dry component of the captured voice signal is the component of the captured voice signal without reverberation caused by the surrounding space and without ambient noises; extracting a voice component by a filter from the external sound captured with the at least one outer microphone, with filter coefficients of the filter being determined based on the estimated dry component of the captured voice signal, or the estimated dry component of the captured voice signal is filtered such that a voice component is produced which has a comparable spatiality to the voice component at the at least one outer microphone; and outputting the extracted or generated voice component via a loudspeaker of the headphones or hearing aid.

    2. The method according to claim 1, wherein the voice signal is captured with at least one microphone or microphone array directed towards the mouth of the user and/or an inner microphone of the headphones or hearing aid.

    3. The method of claim 2, wherein a monaural dry component is estimated from the captured voice signal and based thereon binaural voice signals are extracted from the signals of at least two outer microphones of left and right headphones or left and right hearing aids, or the estimated monaural dry voice component is filtered to generate binaural voice signals with a comparable spatiality to the voice component at the outer microphones.

    4. The method according to claim 2, wherein the binaural voice signals are filtered for left and right headphones or left and right hearing aids prior to being respectively output via a loudspeaker.

    5. The method according to claim 2, wherein the estimate of the dry voice component at the outer microphone is performed by filtering with the respective relative impulse response between the mouth microphone or microphone array and the outer microphone and subsequent averaging.

    6. The method according to claim 1, wherein the filter for extracting or generating the voice component based on the detected outside noise and the estimated dry voice is a Wiener filter, an adaptive filter or a filter which simulates a room impulse response.

    7. The method according to claim 1, wherein the estimated dry component of the detected voice signal and the extracted or generated voice component are linearly weighted and then added.

    8. Device for actively suppressing the occlusion effect during the playback of audio signals by means of a loudspeaker in a headphone or hearing aid provided with at least one outer microphone, comprising: at least one additional microphone for capturing a voice signal from a user; a digital signal processor which is arranged to estimate a dry component of a voice signal captured with the at least one additional microphone, wherein the dry component of the captured voice signal is the component of the captured voice signal without reverberation caused by the surrounding space and without ambient noises; extract from the external sound captured by the at least one outer microphone the voice component using a filter, wherein filter coefficients of the filter are determined based on the estimated dry component of the captured voice signal, or filters the estimated dry component of the captured voice signal to produce a voice component which has comparable spatiality to the voice component at the outer microphones; and outputting the extracted or generated voice component via the loudspeaker.

    9. The device according to claim 8, further comprising a digital filter to which the extracted or generated voice component is supplied before it is output via the loudspeaker.

    10. Headphones adapted to perform a method according to claim 1.

    11. (canceled)

    Description

    [0026] Further features of the embodiments will become apparent from the following description and claims in conjunction with the figures.

    [0027] FIG. 1 schematically shows an in-ear headphone with occlusion of a user’s ear canal;

    [0028] FIG. 2 shows a flow chart of the disclosed method for actively suppressing the occlusion effect;

    [0029] FIG. 3 shows a block diagram of a first embodiment of a disclosed headphone;

    [0030] FIG. 4 shows a block diagram of a second embodiment of a disclosed headphone; and

    [0031] FIG. 5 schematically shows a communication headset for carrying out the disclosed method.

    [0032] For a better understanding of the principles of the present invention, embodiments of the invention are explained in more detail below with reference to the figures. It is understood that the invention is not limited to these embodiments and that the features described can also be combined or modified without departing from the scope of protection of the invention as defined in the claims.

    [0033] The disclosed method can be used, for example, to reduce the occlusion effect of in-ear headphones, as shown schematically in FIG. 1. The in-ear headphones 10 are in this case located on the ear of a user, with an ear insert 14 of the in-ear headphones being inserted in the external ear canal 15 in order to hold them in place. Depending on the individual fit in the ear canal and the material, the ear insert seals the ear canal to a certain degree. This results in external noise being at least partially shielded, so that this noise then only reaches the user’s eardrum 16 at a reduced level. Thus, on the one hand, music playback via the headphones or the playback of a caller’s voice during a telephone call using the headphones is less disturbed. On the other hand, the ear insert also dampens the user’s voice and thus leads to the occlusion effect mentioned above.

    [0034] A noise signal x(t) arriving at the headphones from the environment, which can contain in particular the voice of the user but also environmental noise, is detected with an outer microphone 11, which is directed away from the ear canal towards the headphones’ surroundings. Furthermore, the in-ear headphones 10 have an inner microphone 12, which is directed into the ear canal 15 towards the eardrum of the user, and a loudspeaker 13 located near the inner microphone 12. A compensation signal u(t) can be output by means of the loudspeaker 13, with which the occlusion effect is suppressed as comprehensively as possible, or at least reduced, so that the user is ideally given the impression that he is not wearing headphones.

    [0035] With the help of the outer microphone 11, the airborne components of the noise signal are detected, and a compensation signal is generated for them. In addition, the inner microphone 12 detects a residual signal e(t) after the superimposition of the compensation signal u(t), filtered through the secondary path S(s), with the noise signal x(t), filtered through the primary path P(s); in particular, this also makes it possible to detect a structure-borne noise component and to take it into account in the compensation signal. The primary acoustic path P(s) describes the transfer function for the acoustic transmission from the outer microphone 11 to the inner microphone 12 and can be measured with an external loudspeaker setup, for example. The secondary acoustic path S(s) describes the transfer function from the internal loudspeaker 13 to the inner microphone 12 and can be measured using this loudspeaker and the inner microphone.

    [0036] The in-ear headphones shown have only one outer microphone, but multiple microphones arranged in a microphone array can also be used. Furthermore, the occlusion effect can also occur with other headphones, such as headband headphones with circumaural ear pads that close off the ear canal due to their closed design, or with hearing aids, and can be compensated for as described below.

    [0037] FIG. 2 schematically shows the basic concept for a method for actively suppressing the occlusion effect, as can be carried out, for example, when reproducing audio signals with an in-ear headphone from FIG. 1. Here, in a first step 20, the external sound is detected with at least one outer microphone 11 of the headphones or hearing aid. This detected external noise also includes an acoustic voice component, which originates from a voice output by the user who is wearing the headphones. In a subsequent step 21, a voice signal that corresponds to the user’s voice output is detected with at least one additional microphone, for example a microphone of a communication headset directed at the user’s mouth, hereinafter also referred to in short as mouth microphone.

    [0038] Then, in step 22, the dry component of the voice signal captured with the additional microphone is estimated. As is well known to those skilled in the art, a dry audio signal is understood to mean a pure sound signal as it originally was when it was generated, i.e., without any reverberation due to reflections of the generated sound waves in a closed room or in a naturally delimited area, and free from ambient acoustic disturbances. In this step, the voice signal is estimated as it was generated directly by the user’s vocal tract.

    [0039] Based on the estimated dry component of the captured voice signal, in the subsequent step 23 the binaural voice signal contained in the microphone signal of the respective outer microphone is estimated and extracted with a filter, where filter coefficients of the filter are determined based on the estimated dry component of the captured voice signal. Alternatively, the estimated dry voice signal can be filtered in such a way that it has a comparable spatiality to the voice component at the outer microphones. The extracted or generated binaural voice component is then output in step 24 via the corresponding loudspeaker of the headphones or hearing aid, with the signal being adjusted beforehand by means of a forward (“feedforward”) filter in such a way that acoustically transparent reproduction of the voice signals is possible.

    [0040] FIG. 3 shows a block diagram of a disclosed device, which can be implemented in particular in headphones, but also in a hearing aid. Although transducers are usually provided for both ears of the user in headphones or hearing aids, only the conceptual structure relating to one ear is shown in the figure for the sake of clarity. Likewise, analog-to-digital converters for digitizing the sound signals detected with the microphones and digital-to-analog converters for converting the processed signals for output via the loudspeaker are required for the digital signal processing but are not shown in the figure for simplification. Due to the digital signal processing, the signals are considered in the following in the time domain with a discrete time index n; the variable z correspondingly stands for a frequency-domain representation of the time-discrete signals and filters.

    [0041] As already mentioned in connection with FIG. 1, an outer microphone 11 and an inner microphone 12 are provided in addition to the loudspeaker 13, which can each be arranged in an earphone or a headphone shell. The outer microphone 11, which supplies the signal x(n), is attached to the outside of the headphones. The loudspeaker 13 and the inner microphone 12, on the other hand, are arranged inside the headphones and are directed in the direction of the eardrum.

    [0042] Furthermore, a mouth microphone 17 is provided. This can be part of a communication headset, for example, and can be attached to a pivoting bracket in order to be placed in front of the user’s mouth and aligned with the mouth. However, a microphone array consisting of several microphones can also be provided, which is arranged on the outside of the headphones or hearing aid and is aligned with the mouth, for example using a beam-forming method. In addition to the primary path P(z), which describes the acoustic transmission from the outer microphone to the inner microphone, and the secondary path S(z) for the transmission from the loudspeaker to the inner microphone, there is also the transmission path B(z) between the mouth microphone and the outer microphone, which is given for example in a communication headset by the predefined position of the swivel microphone in front of the mouth relative to the position of the outer microphone. The transmission paths also include the influence of other components, such as the analog-to-digital converters and digital-to-analog converters (not shown).

    [0043] If the user of the headphones or hearing aid speaks, a voice signal x.sub.v (n) corresponding to this voice output is detected by the outer microphone 11. The detected voice signal x.sub.v (n) contains the room impulse response, which contains all relevant information about the current acoustic room properties. In addition to this voice signal, however, an interference signal x.sub.a (n) caused by ambient noise is also detected by the outer microphone 11, since the outer microphone 11 is attached to the outside of the headphones. The audio signal x(n) consisting of these two signal components is then processed as described below, based on an estimate of the dry voice signal, to provide acoustic transparency for the user’s own voice by outputting the processed voice signals u(n) via the loudspeaker 13 of the headphones or hearing aid. The voice signal that hits the headphones from the outside is transmitted both via the primary path P(z) from the outer to the inner microphone and via the secondary path S(z) in the form of the signal that is actively output via the loudspeaker 13. In this way, the missing airborne sound part of one’s own voice is added again. Acoustic interference of the sound signals transmitted via these two paths then leads to the acoustic transparency for the voice signal.

    [0044] In the exemplary embodiment shown, both the voice signal v(n) measured by the mouth microphone 17 and the error signal e(n) from the inner microphone are fed to an estimation unit 30, in which the pure, dry voice signal ṽ(n), as produced in the vocal tract, without reverberation caused by the surrounding space and free from ambient acoustic interference, is estimated. Based on this monaural estimate v̂(n), a second estimation unit 31 extracts the binaural voice signal from the signal captured with the outer microphone of the left and right headphones. Alternatively, the estimated dry voice signal can also be filtered in such a way that it has a comparable spatiality to the voice component at the outer microphones. The binaural voice signals x.sub.v (n) are then filtered by a digital filter unit 32 with a negated transfer function and finally fed as a loudspeaker signal u(n) to a sound transducer for output via the headphones. The digital filter unit 32 is designed here in particular as a forward filter (“feedforward filter”).

    [0045] For the estimation of the dry voice signal ṽ(n) in the estimation unit 30, the voice signal v(n) can be measured by the mouth microphone 17 and then used as a speech reference. The estimation of the dry voice component at the outer microphone can be done, for example, by filtering the additional signals with the respective relative impulse response between the additional microphone and the outer microphone and then averaging them. For this purpose, the mouth microphone signal v(n) can be filtered, for example, with an estimate b̂(n) of the relative transmission path B(z) between the mouth microphone and the outer microphones. The voice signal v(n) is considered here as a monaural source, which is then used for both headphones or ears.
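The projection of the mouth-microphone signal to the outer-microphone position by filtering with the relative impulse response can be sketched as follows. This is a minimal numpy sketch, not part of the disclosure: the function name is illustrative, and a previously measured estimate b̂(n) is assumed to be available as an array of filter taps.

```python
import numpy as np

def project_to_outer_mic(v, b_hat):
    """Filter the mouth-microphone signal v(n) with an estimate b_hat of the
    relative impulse response between mouth and outer microphone, yielding an
    estimate of the dry voice component as it arrives at the outer microphone."""
    # linear convolution, truncated to the input length (causal FIR filtering)
    return np.convolve(v, b_hat)[: len(v)]
```

In a binaural setup, the same monaural source v(n) would be filtered with one such estimate per outer microphone and the results averaged, as described above.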

    [0046] An error signal e(n) can also be detected by the inner microphone 12, which can also be used for the estimation of the dry voice signal ṽ(n) and can be fed to the estimation unit 30 for this purpose. Since the ear is closed by the headphones, one’s own voice couples strongly into the ear canal via the body, so that information about one’s own voice can also be obtained by means of the microphone signals from the inner microphone. The error signal e(n) comprises an error component e.sub.v(n) based on the voice signal and a further error component e.sub.b(n) which is based on further disturbances such as impact sound transmitted via the user’s body into the ear canal. In this case, separate error signals are generated for each of the two headphones or ears. These can differ, for example, if the fit of the headphones differs. However, the separate error signals can also be averaged, if necessary, in order to obtain a monaural signal again.

    [0047] The signals from the mouth microphone and the inner microphones can be adjusted, for example, by digital filtering and then combined by averaging to further improve the signal-to-noise ratio. It should be noted that the signals played back via the headphone loudspeakers are each convolved with an estimate of the respective secondary path and subtracted from the respective inner microphone signal in order to prevent signal feedback.
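The feedback-prevention step described above, convolving the loudspeaker signal with an estimate of the secondary path and subtracting the result from the inner-microphone signal, can be sketched as follows. The names are illustrative, and an estimate of S(z) as an FIR impulse response is an assumption of the sketch:

```python
import numpy as np

def remove_speaker_feedback(e, u, s_hat):
    """Subtract the loudspeaker signal u(n), convolved with an estimate s_hat
    of the secondary path S(z), from the inner-microphone signal e(n), so that
    only the externally and body-conducted components remain."""
    playback = np.convolve(u, s_hat)[: len(e)]  # what the inner mic hears of u(n)
    return e - playback
```

With an accurate secondary-path estimate, the returned signal no longer contains the played-back voice component, which prevents signal feedback in the estimation units.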

    [0048] Since the inner microphones mainly record the structure-borne noise component of one’s own voice, in which fricatives, for example, are not resolved, an extension of the bandwidth of the signals from the inner microphones is also conceivable.

    [0049] Since both the mouth microphone and the inner microphones offer a good signal-to-noise ratio, it can also be envisaged that instead of an estimation based on a combination of signals from the two microphones, an estimation based only on the signal measured with the mouth microphone or the signal of the inner microphone can be performed. Finally, in particularly favorable conditions, these can already provide a dry reference of the voice without the need for an additional estimate.

    [0050] In the second estimation unit 31, the binaural voice signal is estimated by extracting the binaural voice from the outer microphone signals, which are disturbed by ambient noise, based on the estimate of the dry voice, or by generating a voice signal which has a comparable spatiality to the voice component at the outer microphones. It is important that the processing has a short and constant delay so that the delay can be taken into account in the calculation of the forward filter W(z).

    [0051] For this purpose, for example, a Wiener filter or other algorithms for noise suppression can be used. In the Wiener filter, the magnitude spectra of the detected signals are evaluated in order to calculate a filter with an estimate of the speech signal and an estimate of the existing interference signal, with which the speech signal can be optimally extracted. For example, the magnitude spectrum of the mouth microphone can be combined with the magnitude spectrum of the inner microphones to estimate the magnitude spectrum of the dry vocal signal and then extract the speech component from the outer microphone signals. Here, the transfer function B(z) can be used to estimate how the dry voice arrives from the mouth microphone at the outer microphone, in order to then compensate for the propagation times of the direct sound.
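The spectral weighting described above can be illustrated with a short numpy sketch. This is not the claimed implementation; the function names are illustrative, and per-frame power-spectrum estimates of the speech and interference signals are assumed to be available:

```python
import numpy as np

def wiener_weights(speech_psd, noise_psd, eps=1e-12):
    """Spectral Wiener weights G(k) = S(k) / (S(k) + N(k)), computed from
    estimates of the speech and interference power spectra per frequency bin."""
    return speech_psd / (speech_psd + noise_psd + eps)

def extract_voice_frame(x_frame, weights):
    """Apply the spectral weights to one frame of the outer-microphone signal
    and return the time-domain voice estimate for that frame."""
    X = np.fft.rfft(x_frame)
    return np.fft.irfft(weights * X, n=len(x_frame))
```

Here the dry-voice estimate would feed the `speech_psd` term, as described in paragraph [0051]; in a real system the weights would be recomputed per frame inside an analysis/synthesis structure with constant group delay.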

    [0052] Since the transfer function B(z) in a communication headset is very similar for different persons, the impulse response can be determined, for example, by a series of measurements for a specific headset and then used for applications with headsets of this design.

    [0053] One possibility is Wiener filtering in a “filter bank equalizer” structure. This structure assumes a prototype low-pass filter that has a constant group delay. The spectral weights of the Wiener filter require an estimate of the useful signal and of the interference signal. The estimate of the dry voice can be used to estimate the useful signal component.

    [0054] Alternatively, an adaptive filter a(n) can be used to estimate the binaural voice. Assuming that the outer microphone signal x(n)= x.sub.a (n)+x.sub.v (n) is composed of ambient noise x.sub.a (n) and a voice component x.sub.v (n), which is coherent to the estimate v̂(n) of the dry voice, an adaptive filter can be used to reproduce the voice component x.sub.v (n) in x(n) based on v̂(n).

    [0055] With the output x̂.sub.v (n) of the adaptive filter, a prescription for adapting the adaptive filter can be found based on the following cost function:

    C.sub.v = E{(x(n) − x̂.sub.v (n))²},  with x̂.sub.v (n) = a(n) ∗ v̂(n).
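One standard adaptation rule that minimizes a squared-error cost of this type is the normalized LMS algorithm. The following sketch is illustrative only (NLMS is one possible choice, not prescribed by the disclosure); it adapts a filter a(n) so that its output, driven by the dry-voice estimate v̂(n), tracks the voice component of the outer-microphone signal x(n):

```python
import numpy as np

def nlms_extract_voice(x, v_hat, order=16, mu=0.5, eps=1e-8):
    """Adapt an FIR filter a so that a * v_hat tracks the voice component of
    the outer-mic signal x; returns the running voice estimate and the filter."""
    a = np.zeros(order)
    x_v_hat = np.zeros(len(x))
    for n in range(len(x)):
        # most recent `order` samples of the dry-voice estimate, newest first
        v_buf = v_hat[max(0, n - order + 1): n + 1][::-1]
        v_buf = np.pad(v_buf, (0, order - len(v_buf)))
        x_v_hat[n] = a @ v_buf
        err = x[n] - x_v_hat[n]  # gradient of C_v = E{(x - x_v_hat)^2}
        a += mu * err * v_buf / (v_buf @ v_buf + eps)  # normalized LMS update
    return x_v_hat, a
```

Because only the component of x(n) coherent with v̂(n) can be reproduced by the filter, the ambient-noise component x.sub.a (n) remains in the error and is thereby suppressed in the voice estimate.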

    [0056] Furthermore, the estimation unit 31 can analyze the acoustic influence of the room on one’s own voice and based thereon select or design a filter which can be applied to the estimated dry voice signal in order to generate a voice signal which has a comparable spatiality to the voice component at the outer microphones.

    [0057] The forward filter W(z) can be obtained, for example, by solving the Wiener-Hopf equation

    w = Ψ.sub.s′s′.sup.−1 φ.sub.s′h

    [0058] This requires one or more measurements of the primary path P(z) and the secondary path S(z). These measurements can be carried out, for example, on an artificial head or on test persons. It is important here that any delay caused by the processing in the branch between the respective outer microphone and the headphone loudspeaker is taken into account by the secondary path used for the calculation of the forward filter. If, for example, the signal x(n) or any signals derived from it, which are subsequently played back via the loudspeaker, are delayed when the binaural voice is estimated, this delay by the secondary path must be taken into account. This is indicated by an apostrophe in the Wiener-Hopf equation above.

    [0059] The desired transmission behavior from the outer to the inner microphone, which is usually characterized by a flat magnitude response for the natural perception of one’s own voice, is described by H(z) in the z-range or by the impulse response h(n) and is also required for the Wiener-Hopf equation.
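When the expectations in the Wiener-Hopf equation are replaced by finite-length correlation estimates, the forward-filter computation reduces to a least-squares problem built from the measured (delayed) secondary-path impulse response s′(n) and the target impulse response h(n). The following numpy sketch illustrates this under those assumptions; the names and the small regularization term are illustrative:

```python
import numpy as np

def forward_filter(s_prime, h, length):
    """Least-squares Wiener-Hopf solution: find w of the given length so that
    the convolution w * s_prime approximates the target response h."""
    n_out = length + len(s_prime) - 1
    # convolution matrix S, so that S @ w equals the convolution w * s_prime
    S = np.zeros((n_out, length))
    for k in range(length):
        S[k:k + len(s_prime), k] = s_prime
    h_pad = np.zeros(n_out)
    h_pad[: len(h)] = h  # zero-pad the target to the convolution length
    Psi = S.T @ S        # autocorrelation matrix (Psi_{s's'})
    phi = S.T @ h_pad    # cross-correlation vector (phi_{s'h})
    # small diagonal loading for numerical robustness
    return np.linalg.solve(Psi + 1e-10 * np.eye(length), phi)
```

Any processing delay in the branch between outer microphone and loudspeaker is accounted for simply by delaying s′(n) before building the matrices, matching the apostrophe convention above.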

    [0060] FIG. 4 shows a block diagram of a further disclosed device. In addition to the units of the device from FIG. 3, a control unit 40 for controlling two weighting units 41 and 42 is also provided here. Since in the case shown v̂(n) and x.sub.v (n) are coherent, i.e., are not or at least not noticeably shifted from one another in the time domain, both signals can be weighted with linear weighting factors α and 1−α, with 0 ≤ α ≤ 1, and then added. The weighting units 41 and 42 hereby enable the user to personalize the mix of dry and binaural voice. The user can thus decide and adjust for himself how he perceives his voice, for example in what ratio the volume of the reverberation should be to the volume of his own voice. However, the control can also take place automatically.
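The weighted combination of the two coherent signals is a simple crossfade. As a minimal sketch (the function name is illustrative), the weighting units can be modeled as:

```python
import numpy as np

def mix_dry_and_binaural(v_hat, x_v, alpha):
    """Linear mix u = alpha * v_hat + (1 - alpha) * x_v with 0 <= alpha <= 1:
    alpha = 1 yields the pure dry voice, alpha = 0 the full binaural voice."""
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return alpha * np.asarray(v_hat) + (1.0 - alpha) * np.asarray(x_v)
```

Because the two inputs are coherent, the linear mix does not introduce audible comb-filter artifacts; α can be exposed to the user as the reverberation/dry-voice balance described above.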

    [0061] As described above, one consequence of the occlusion effect is that the low-frequency components of one’s own voice are amplified. To compensate for this, the inner microphone signal can additionally be filtered with a feedback controller in such a way that the low frequency components of one’s own voice are reduced. In this way, the perception of one’s own voice appears even more natural when wearing headphones.

    [0062] In this case, the estimation units 30 and 31 and the control unit 40 can be part of a processor unit which has one or more digital signal processors but can also contain other types of processors or combinations thereof. Furthermore, the filter coefficients of the digital filter 32 can be adjusted by the digital signal processor. The filter can be implemented as a time-invariant filter that is calculated once, uploaded to the headphone firmware and used in this form without any changes being made at runtime. An adaptive filter, which changes at runtime and adapts to the current circumstances, can also be used.

    [0063] The disclosed device is preferably completely integrated in a headphone, since the latency is very low due to the transmission of one’s own voice through the structure-borne noise. In this case, the mouth microphone can also be part of the headphones, for example attached to a bracket in front of the mouth in a so-called communication headset, or integrated in a headphone shell as a microphone array with directional characteristics. Likewise, a separate microphone can also serve as a mouth microphone. In principle, parts of the device can also be part of an external device, such as a smartphone.

    [0064] FIG. 5 shows schematically the use of a communication headset in which the disclosed method can be carried out and which has the device described above for this purpose. A headset 10 is provided for each of the two ears of the user, in each of which an outer microphone 11, an inner microphone 12 and a loudspeaker 13 are integrated. Furthermore, a mouth microphone 17 is provided, which is attached to a swivel bracket. Furthermore, a processor unit 50 is arranged in one of the two headphones, by which the estimation units and possibly the control unit 40 are implemented. The individual components are connected to the processor unit 50, but this is not shown in the figure to improve clarity.

    [0065] The disclosed embodiments can be used to suppress the occlusion effect when reproducing audio signals with any headphones or hearing aids, for example for telephony or communication with communication headsets or hearables, so-called in-ear monitoring for checking one’s own voice during a live performance, augmented/virtual reality applications or use with hearing aids.

    TABLE-US-00001
    Reference List
    10 Single headphone, single hearing aid
    11 Outer microphone
    12 Inner microphone
    13 Loudspeaker
    14 Ear insert
    15 Ear canal
    16 Eardrum
    17 Mouth microphone
    20-24 Process steps
    30 First estimation unit
    31 Second estimation unit
    32 Digital filter
    40 Control unit
    41, 42 Weighting units
    50 Processor unit