Hearing aid device for hands free communication

Abstract

The present invention regards a hearing aid device comprising at least one environment sound input, a wireless sound input, an output transducer, electric circuitry, a transmitter unit, and a dedicated beamformer-noise-reduction-system. The hearing aid device is configured to be worn in or at an ear of a user. The at least one environment sound input is configured to receive sound and to generate electrical sound signals representing sound. The wireless sound input is configured to receive wireless sound signals. The output transducer is configured to stimulate hearing of the hearing aid device user. The transmitter unit is configured to transmit signals representing sound and/or voice. The dedicated beamformer-noise-reduction-system is configured to retrieve a user voice signal representing the voice of a user from the electrical sound signals. The wireless sound input is configured to be wirelessly connected to a communication device and to receive wireless sound signals from the communication device. The transmitter unit is configured to be wirelessly connected to the communication device and to transmit the user voice signal to the communication device.

Claims

1. A hearing aid device configured to be worn in or at an ear of a user, the hearing aid device comprising: two or more environment sound inputs, each for receiving sound and generating an electrical sound signal representing sound; a beamformer system configured to retrieve, from the electrical sound signals, a user voice signal representing the voice of the user; a wireless sound input for receiving wireless sound signals from a communication unit; an output transducer configured to stimulate hearing of the user; a transmitter unit configured to transmit signals representing sound and/or the user voice signal, the transmitter unit being configured to be wirelessly connected to the communication unit and to transmit the user voice signal to the communication unit; a voice activity detection unit configured to detect if a voice signal is present in the electrical sound signals; and electric circuitry configured to estimate a noise power spectral density of a disturbing background noise from the sound received with at least one of the environment sound inputs when the voice activity detection unit detects an absence of a voice signal of the user in the electrical sound signals, wherein a predetermined noise signal is used to remove noise in the electrical sound signals.

2. The hearing aid device according to claim 1, wherein when the hearing aid device operates in a telephone mode, the electric circuitry is configured to process the electrical sound signals in combination with a wirelessly received wireless sound signal to generate an output signal.

3. The hearing aid device according to claim 1, wherein the beamformer system is configured to process the electrical sound signals by suppressing pre-determined spatial directions of the electrical sound signals generating a spatial sound signal.

4. The hearing aid device according to claim 3, wherein the hearing aid device comprises a memory configured to store data, and wherein the beamformer system is configured to use values of predetermined spatial direction parameters representing an acoustic transfer function stored in the memory to suppress the predetermined spatial directions of the electrical sound signals.

5. The hearing aid device according to claim 1, wherein values of pre-determined spatial direction parameters are determined in dependence of the noise power spectral density of the disturbing background noise.

6. The hearing aid device according to claim 1, wherein the hearing aid device is configured to update spatial direction parameters of the beamformer system when the voice activity detection unit detects a presence of a voice signal of the user in the electrical sound signals.

7. The hearing aid device according to claim 1, wherein the beamformer system comprises a single channel noise reduction unit, and wherein the single channel noise reduction unit is configured to reduce noise in the electrical sound signals.

8. The hearing aid device according to claim 1, wherein the pre-determined noise signal used to remove the noise in the electrical sound signals is determined by sound received by the at least one environment sound input when the voice activity detection unit detects an absence of a voice signal of the user in the electrical sound signals.

9. The hearing aid device according to claim 1, further comprising a controllable switch configured to establish a wireless connection between the hearing aid device and the communication unit, wherein the controllable switch is adapted to be activated by the user.

10. A system comprising a hearing aid device according to claim 1, wherein the communication unit is configured as a remote control to control functionality of the hearing aid device.

11. A system according to claim 10, wherein the communication unit is a mobile phone, and wherein the function as a remote control is implemented as an application in the mobile phone, the hearing aid device comprising a wireless interface to the mobile phone.

12. The hearing aid device according to claim 1, wherein said output transducer includes one of a speaker outputting an airborne acoustic signal, an implanted vibrator, and an implanted electrical stimulator.

Description

(1) The present invention will be more fully understood from the following detailed description of embodiments thereof, taken together with the drawings in which:

(2) FIG. 1 shows a schematic illustration of a first embodiment of a hearing aid device wirelessly connected to a mobile phone;

(3) FIG. 2 shows a schematic illustration of the first embodiment of a hearing aid device worn by a user and wirelessly connected to a mobile phone;

(4) FIG. 3 shows a schematic illustration of a portion of a second embodiment of a hearing aid device;

(5) FIG. 4 shows a schematic illustration of a first embodiment of a hearing aid device worn by a dummy head in a beamformer dummy head model system;

(6) FIG. 5 shows a block diagram of a first embodiment of a method for using a hearing aid device connectable to a communication device; and

(7) FIG. 6 shows a block diagram of a second embodiment of a method for using a hearing aid device.

(8) FIG. 1 shows a hearing aid device 10 wirelessly connected to a mobile phone 12. The hearing aid device 10 comprises a first microphone 14, a second microphone 14′, electric circuitry 16, a wireless sound input 18, a transmitter unit 20, an antenna 22, and a (loud)speaker 24. The mobile phone 12 comprises an antenna 26, a transmitter unit 28, a receiver unit 30, and an interface to a public telephone network 32. The hearing aid device 10 can run several modes of operation, e.g., a communication mode, a wireless sound receiving mode, a silent environment mode, a noisy environment mode, a normal listening mode, a user speaking mode or another mode. The hearing aid device 10 can also comprise further processing units common in hearing aid devices 10, e.g., a spectral filter bank for dividing electrical sound signals in frequency bands, e.g. an analysis filter bank, amplifiers, analog-to-digital converters, digital-to-analog converters, a synthesis filter bank, an electrical sound signals combination unit or other common processing units used in hearing aid devices (e.g. a feedback estimation/reduction unit, not shown).

(9) Incoming sound 34 is received by the microphones 14 and 14′ of the hearing aid device 10. The microphones 14 and 14′ generate electrical sound signals 35 representing the incoming sound 34. The electrical sound signals 35 can be divided into frequency bands by the spectral filter bank (not shown), in which case the subsequent analysis and/or processing of the band-split signal is performed for each (or for selected) frequency subband; for example, a VAD decision could then be a local, per-frequency-band decision. The electrical sound signals 35 are provided to the electric circuitry 16. The electric circuitry 16 comprises a dedicated beamformer-noise-reduction-system 36, which comprises a beamformer (Beamformer) 38 and a single channel noise reduction unit (Single-Channel Noise Reduction) 40, and which is connected to a voice activity detection unit 42. The electrical sound signals 35 are processed in the electric circuitry 16, which generates a user voice signal 44 if a voice of a user 46 (see FIG. 2) is present in at least one of the electrical sound signals 35 (or according to a predefined scheme, if working on a band-split signal, e.g. if a user's voice is detected in a majority of the analysed frequency bands). When in the communication mode, the user voice signal 44 is provided to the transmitter unit 20, which uses the antenna 22 to wirelessly connect to the antenna 26 of the mobile phone 12 and to transmit the user voice signal 44 to the mobile phone 12. The receiver unit 30 of the mobile phone 12 receives the user voice signal 44 and provides it to the interface to the public telephone network 32, which is connected to another communication device, e.g., a base station of the public telephone network, another mobile phone, a telephone, a personal computer, a tablet, or any other device which is part of the public telephone network. The hearing aid device 10 can also be configured to transmit electrical sound signals 35 if a voice of the user 46 is absent in the electrical sound signals 35, e.g., transmitting music or other non-speech sound (e.g. in an environment monitoring mode, where a current environment sound signal picked up by the hearing aid device is transmitted to another device, e.g. the mobile phone 12 and/or to another device via the public telephone network).

(10) The processing of the electrical sound signals 35 in the electric circuitry 16 is performed as follows. The electrical sound signals 35 are first analysed in the voice activity detection unit 42, which is further connected to the wireless sound input 18. If a wireless sound signal 19 is received by the wireless sound input 18, the communication mode is activated. In the communication mode, the voice activity detection unit 42 is configured to detect an absence of a voice signal in the electrical sound signal 35. It is assumed in this embodiment of the communication mode that receiving a wireless sound signal 19 corresponds to the user 46 listening during communication. The voice activity detection unit 42 can also be configured to detect an absence of a voice signal in the electrical sound signal 35 with a higher probability if the wireless sound input 18 receives a wireless sound signal 19. Receiving a wireless sound signal 19 here means that a wireless sound signal 19 is received which has a signal-to-noise ratio and/or sound level above a predetermined threshold. If no wireless sound signal 19 is received by the wireless sound input 18, the voice activity detection unit 42 detects whether a voice signal is present in the electrical sound signals 35. If the voice activity detection unit 42 detects a voice signal of a user 46 (see FIG. 2) in the electrical sound signals 35, the user speaking mode can be activated in parallel to the communication mode. The voice detection is performed according to methods known in the art, e.g., by using means to detect whether harmonic structure and synchronous energy are present in the electrical sound signals 35, which indicates a voice signal, as vowels have unique characteristics consisting of a fundamental tone and a number of harmonics showing up synchronously in the frequencies above the fundamental tone. The voice activity detection unit 42 can be configured to specifically detect the voice of the user, i.e., own-voice or user voice signal, e.g., by comparison to training voice patterns received from the user 46 of the hearing aid device 10.

(11) The voice activity detection unit (VAD) 42 can further be configured to detect a voice signal only when the signal-to-noise ratio and/or the sound level of a detected voice are above a predetermined threshold. The voice activity detection unit 42 operating in the communication mode can also be configured to continuously detect whether a voice signal is present in the electrical sound signal 35, independent of the wireless sound input 18 receiving a wireless sound signal 19.
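The decision logic of paragraphs (10) and (11) can be summarized in a minimal Python sketch. It is an illustration only, not the implementation of the voice activity detection unit 42: the names `VadConfig`, `wireless_signal_received` and `own_voice_present`, the threshold values, and the use of a single level/SNR figure per frame are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class VadConfig:
    # Hypothetical thresholds; the description only says "predetermined threshold".
    wireless_level_threshold_db: float = -50.0
    voice_snr_threshold_db: float = 6.0


def wireless_signal_received(wireless_level_db: float, cfg: VadConfig) -> bool:
    """A wireless sound signal counts as received only above the threshold."""
    return wireless_level_db > cfg.wireless_level_threshold_db


def own_voice_present(voice_snr_db: float, wireless_level_db: float,
                      cfg: VadConfig, always_detect: bool = False) -> bool:
    """Return True if a user voice signal is considered present.

    While a wireless sound signal is received, the user is assumed to be
    listening, so voice is reported absent unless continuous detection
    (always_detect) is enabled.
    """
    if wireless_signal_received(wireless_level_db, cfg) and not always_detect:
        return False
    return voice_snr_db > cfg.voice_snr_threshold_db
```

With `always_detect=True` the sketch corresponds to the variant in which voice is detected continuously, independent of the wireless sound input.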

(12) The voice activity detection unit (VAD) 42 indicates to the beamformer 38 if a voice signal is present in at least one of the electrical sound signals 35, i.e., in the user speaking mode (dashed arrow from VAD 42 to Beamformer 38 in FIG. 3). The beamformer 38 suppresses spatial directions in dependence of predetermined spatial direction parameters, i.e., the look vector, and generates a spatial sound signal 39 (see FIG. 3).

(13) The spatial sound signal 39 is provided to the single channel noise reduction unit (Single-Channel Noise Reduction) 40. The single channel noise reduction unit 40 uses a predetermined noise signal to reduce the noise in the spatial sound signal 39, e.g., by subtracting the predetermined noise signal from the spatial sound signal 39. The pre-determined noise signal is for example an electrical sound signal 35, a spatial sound signal 39, or a processed combination thereof of a previous time period, in which a voice signal is absent in the respective sound signal or sound signals. The single channel noise reduction unit 40 generates a user voice signal 44, which is then provided to the transmitter unit 20 (cf. FIG. 1). Therefore the user 46 (cf. FIG. 2) can use the microphones 14 and 14′ (cf. FIG. 1) of the hearing aid device 10 to communicate via the mobile phone 12 with another user on another mobile phone.
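As an illustration of the subtraction step in paragraph (13), the following Python sketch applies a power-domain spectral subtraction per frequency band. The power-domain formulation, the function name `subtract_noise` and the `floor` parameter (which keeps band powers from becoming negative) are assumptions made for this example; the description does not specify them.

```python
import numpy as np


def subtract_noise(spatial_power, noise_power, floor=0.01):
    """Subtract a predetermined noise power estimate from the band powers
    of the spatial sound signal.

    spatial_power, noise_power: per-band power values (same length).
    floor: hypothetical fraction of the input power kept as a lower bound.
    """
    spatial_power = np.asarray(spatial_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    cleaned = spatial_power - noise_power
    # Clamp to a small fraction of the input so no band goes negative.
    return np.maximum(cleaned, floor * spatial_power)
```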

(14) In other modes the hearing aid device 10 can for example be used as an ordinary hearing aid, e.g., in a normal listening mode, in which, e.g., the listening quality is optimized (cf. FIG. 1). The hearing aid device 10 in the normal listening mode receives incoming sound 34 by the microphones 14 and 14′ which generate electrical sound signals 35. The electrical sound signals 35 are processed in the electric circuitry 16 by, e.g., amplification, noise reduction, spatial directionality selection, sound source localization, gain reduction/enhancement, frequency filtering, and/or other processing operations. An output sound signal is generated from the processed electrical sound signals, which is provided to the speaker 24, which generates an output sound 48. Instead of the speaker 24 the hearing aid device 10 can also comprise another form of output transducer, e.g., a vibrator of a bone anchored hearing aid device or electrodes of a cochlear implant hearing aid device which is configured to stimulate the hearing of the user 46.

(15) The hearing aid device 10 further comprises a switch 50 to, e.g., select and control the modes of operation and a memory 52 to store data, such as the modes of operation, algorithms and other parameters, e.g., spatial direction parameters (cf. FIG. 1). The switch 50 can for example be controlled via a user interface, e.g. a button, a touch sensitive display, an implant connected to the brain functions of a user, a voice interacting interface or other kind of interface (e.g. a remote control, e.g. implemented via a display of a SmartPhone) used for activating and/or deactivating the switch 50. The switch 50 can for example be activated and/or deactivated by a code word spoken by the user, a blinking sequence of the eyes of the user, or by clicking a button which activates the switch 50.

(16) The algorithm as described estimates the clean voice signal of the user (wearer) of the hearing aid device as picked up by a (or one or more) chosen microphone(s). However, for the far-end listener, the speech signal would sound more natural, if it were picked up in front of the mouth of the speaker (here the user of the hearing device). This is, of course, not completely possible, since we don't have a microphone positioned there, but we can in fact make a compensation to the output of our algorithm to simulate how it would sound if it were picked up in front of the mouth. This may be done simply by passing the output of our algorithm through a time-invariant linear filter, simulating the transfer function from microphone to mouth. This linear filter could be found from the dummy head in a completely analogous way to what we have done so far. Hence, in an embodiment, the hearing aid device comprises an (optional) post-processing block (M2Mc, microphone-to-mouth compensation) between the output of the current algorithm (Beamformer, Single-Channel Noise Reduction unit (38, 40)) and the transmitter unit (20), cf. dashed unit M2Mc in FIG. 3.
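The microphone-to-mouth compensation can be pictured as a fixed FIR filter applied to the output of the noise reduction. The sketch below assumes that the compensation filter is available as a short vector of FIR taps (`fir_taps`, a hypothetical name); how those taps are measured on the dummy head is not shown.

```python
import numpy as np


def microphone_to_mouth_compensation(voice, fir_taps):
    """Apply a time-invariant linear (FIR) filter to the user voice signal.

    voice: time-domain samples of the user voice signal.
    fir_taps: hypothetical impulse response of the compensation filter.
    The output is truncated to the input length.
    """
    voice = np.asarray(voice, dtype=float)
    fir_taps = np.asarray(fir_taps, dtype=float)
    return np.convolve(voice, fir_taps)[: len(voice)]
```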

(17) FIG. 2 shows the hearing aid device 10 wirelessly connected to the mobile phone 12 presented in FIG. 1 worn at the ear of the user 46 in the communication mode. The hearing aid device 10 is configured to transmit user voice signals 44 to the mobile phone 12 and to receive wireless sound signals 19 from the mobile phone 12. This allows a hands free communication of the user 46 using the hearing aid device 10, while the mobile phone 12 can be left in a pocket or bag when in use and wirelessly connected to the hearing aid device 10. It is also possible to wirelessly connect the mobile phone 12 with two hearing aid devices 10 (e.g. constituting a binaural hearing aid system), e.g., on a left and on a right ear of the user 46 (not shown). In the binaural hearing aid system case the two hearing aid devices 10 preferably also are connected wirelessly with each other (e.g. by an inductive link or a link based on radiated fields (RF), e.g. according to the Bluetooth specification or equivalent) to exchange data and sound signals. The binaural hearing aid system preferably has at least four microphones, two microphones on each of the hearing aid devices 10.

(18) In the following, an exemplary communication scenario is presented. A phone call reaches the user 46. The phone call is accepted by the user 46, e.g., by activating the switch 50 at the hearing aid device 10 (or via another user interface, e.g. a remote control, e.g. implemented in the user's mobile phone). The hearing aid device 10 activates the communication mode and connects wirelessly to the mobile phone 12. A wireless sound signal 19 is wirelessly transmitted from the mobile phone 12 to the hearing aid device 10 using the transmitter unit 28 of the mobile phone 12 and the wireless sound input 18 of the hearing aid device 10. The wireless sound signal 19 is provided to the speaker 24 of the hearing aid device 10, which generates an output sound 48 (see FIG. 1) to stimulate the hearing of the user 46. The user 46 responds by speaking. The user voice signal is picked up by the microphones 14 and 14′ of the hearing aid device 10. Due to the distance of the mouth of the user 46, i.e., the target sound source 58 (see also FIG. 4), to the microphones 14 and 14′, additional background noise is also picked up by the microphones 14 and 14′, resulting in noisy sound signals reaching the microphones 14 and 14′. The microphones 14 and 14′ generate noisy electrical sound signals 35 from the noisy sound signals reaching the microphones 14 and 14′. Transmitting the noisy electrical sound signals 35 to another user using the mobile phone 12 without further processing would typically lead to poor conversation quality due to the noise, so processing is most often necessary. The noisy electrical sound signals 35 are processed by retrieving the user voice signal, i.e., own voice, from the electrical sound signals 35 using the dedicated own voice beamformer 38 (FIG. 1, 3). The output, i.e., spatial sound signal 39, of the beamformer 38 is further processed in the single channel noise reduction unit 40. The resulting noise-reduced electrical sound signal 35, i.e., user voice signal 44, which ideally consists mainly of own voice, is transmitted to the mobile phone 12 and from the mobile phone 12 to another user using another mobile phone, e.g. via a (public) switched (telephone and/or data) network.

(19) The voice activity detection (VAD) algorithm or voice activity detection (VAD) unit 42 allows for adapting the user voice, i.e., own voice, retrieval system. The VAD 42 task in this particular situation is rather simple, as a user voice signal 44 is likely absent when a wireless sound signal 19 (having a certain signal content) is received by the wireless sound input 18. When the VAD 42 detects no user voice in the electrical sound signals 35 while the wireless sound input 18 receives a wireless sound signal 19, a noise power spectral density (PSD) used in the single channel noise reduction unit 40 for reducing noise in the electrical sound signal 35 is updated (because it is assumed that the user is silent (while listening to a remote talker) and hence ambient sounds picked up by the microphone(s) of the hearing aid device can be considered as noise (in the present situation)). The look vector in the beamforming algorithm or beamformer unit 38 can be updated as well. When the VAD 42 detects a user voice, the beamformer's spatial direction, i.e., the look vector, is (or may be) updated. This allows the beamformer 38 to compensate for the variation (deviation) of the hearing aid user's head characteristics from a standard dummy head 56 (see FIG. 4), and to compensate for the variation of the exact mounting of the hearing aid device 10 on an ear from day to day. Beamformer designs exist and are known to the person skilled in the art which are independent of the exact microphone locations, in the sense that they aim at retrieving an own voice target sound signal, i.e., the user voice signal 44, in a minimum mean-square sense or in a minimum-variance distortionless response sense independent of the microphone geometry, see e.g. [Kjems & Jensen; 2012] (U. Kjems and J. Jensen, “Maximum Likelihood Based Noise Covariance Matrix Estimation for Multi-Microphone Speech Enhancement,” Proc. Eusipco 2012, pp. 295-299).
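The VAD-controlled update of the noise power spectral density can be sketched as a recursive average that is frozen whenever the user's voice is detected. The first-order smoothing, the `smoothing` constant and the function name `update_noise_psd` are assumptions for illustration; the description only states that the noise PSD is updated while the user voice is absent.

```python
def update_noise_psd(noise_psd, frame_power, user_voice_detected, smoothing=0.9):
    """Update the per-band noise PSD estimate only while the user is silent.

    noise_psd: current per-band noise PSD estimate.
    frame_power: per-band power of the current frame.
    user_voice_detected: VAD decision for the current frame.
    smoothing: hypothetical recursive-averaging constant in [0, 1).
    """
    if user_voice_detected:
        # User is speaking: keep the previous noise estimate unchanged.
        return list(noise_psd)
    return [smoothing * p + (1.0 - smoothing) * x
            for p, x in zip(noise_psd, frame_power)]
```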

(20) FIG. 3 shows a second embodiment of a portion of a hearing aid device 10′. The hearing aid device 10′ has two microphones 14 and 14′, a voice activity detection unit (VAD) 42, and a dedicated beamformer-noise-reduction-system 36, comprising a beamformer 38 and a single-channel noise reduction unit 40.

(21) The microphones 14 and 14′ receive incoming sound 34 and generate electrical sound signals 35. The hearing aid device 10′ has more than one signal transmission path to process the electrical sound signals 35 received by the microphones 14 and 14′. A first transmission path provides the electrical sound signals 35 as received by the microphones 14 and 14′ to the voice activity detection unit 42, corresponding to the mode of operation presented in FIG. 1.

(22) A second transmission path provides the electrical sound signals 35 as received by the microphones 14 and 14′ to the beamformer 38. The beamformer 38 suppresses spatial directions in the electrical sound signals 35 using the predetermined spatial direction parameters, i.e., the look vector, to generate a spatial sound signal 39. The spatial sound signal 39 is provided to the voice activity detection unit 42 and the single channel noise reduction unit 40. The voice activity detection unit 42 determines whether a voice signal is present in the spatial sound signal 39. If a voice signal is present in the spatial sound signal 39, the voice activity detection unit 42 transmits a voice detected signal to the single channel noise reduction unit 40, and if no voice signal is present in the spatial sound signal 39, the voice activity detection unit 42 transmits a no voice detected signal to the single channel noise reduction unit 40 (cf. dashed arrow from VAD 42 to Single-Channel Noise Reduction 40 in FIG. 3). When it receives a voice detected signal from the voice activity detection unit 42, the single channel noise reduction unit 40 generates a user voice signal 44 by subtracting a pre-determined noise signal from the spatial sound signal 39 received from the beamformer 38; when it receives a no voice detected signal, it generates an (e.g. adaptively updated) noise signal corresponding to the spatial sound signal 39. The predetermined noise signal corresponds e.g. to a spatial sound signal 39 without voice signal, which was received in an earlier time interval. The user voice signal 44 can be supplied to a transmitter unit 20 to be transmitted to a mobile phone 12 (not shown). As described in connection with FIG. 1, the hearing aid device may comprise an (optional) post-processing block (M2Mc, dashed outline) providing a microphone-to-mouth compensation, e.g. using a time-invariant linear filter, simulating the transfer function from an (imaginary centrally and frontally located) microphone to the mouth.

(23) In a normal listening mode, the environment sound picked up by microphones 14, 14′ may be processed by a beamformer and noise reduction system (but with other parameters, e.g. another look vector (not aiming at the user's mouth), e.g. an adaptively determined look vector depending on the current sound field around the user/hearing aid device) and further processed in a signal processing unit (electric circuitry 16) before being presented to the user via an output transducer (e.g. speaker 24 in FIG. 1).

(24) In the following, the dedicated beamformer-noise-reduction-system 36 comprising the beamformer 38 and the single channel noise reduction unit 40 is described in more detail. The beamformer 38, the single channel noise reduction unit 40, and the voice activity detection unit 42 are considered to be algorithms in the following which are stored in the memory 52 and executed on the electric circuitry 16 (cf. FIG. 1). The memory 52 is further configured to store the parameters used and described in the following, e.g., the predetermined spatial direction parameters (transfer functions) adapted to cause a beamformer 38 to suppress sound from other spatial directions than the spatial directions determined by values of the predetermined spatial direction parameters, such as the look vector, an inter-environment sound input noise covariance matrix for the current acoustic environment, a beamformer weight vector, a target sound covariance matrix, or further predetermined spatial direction parameters.

(25) The beamformer 38 can for example be a generalized sidelobe canceller (GSC), a minimum variance distortionless response (MVDR) beamformer 38, a fixed look vector beamformer 38, a dynamic look vector beamformer 38, or any other beamformer type known to a person skilled in the art.

(26) A so-called minimum variance distortionless response (MVDR) beamformer 38, see, e.g., [Kjems & Jensen; 2012] or [Haykin; 1996] (S. Haykin, “Adaptive Filter Theory,” Third Edition, Prentice Hall International Inc., 1996), can generally be described by the MVDR beamformer weight vector $W_{H}$, as follows

(27) $W_{H}(k) = \frac{\hat{R}_{VV}^{-1}(k)\,\hat{d}(k)\,\hat{d}^{*}(k, i_{ref})}{\hat{d}^{H}(k)\,\hat{R}_{VV}^{-1}(k)\,\hat{d}(k)}$
where $\hat{R}_{VV}(k)$ is (an estimate of) the inter-microphone noise covariance matrix for the current acoustic environment, $\hat{d}(k)$ is the estimated look vector (representing the inter-microphone transfer function for a target sound source at a given location), $k$ is a frequency index and $i_{ref}$ is an index of a reference microphone ($*$ denotes complex conjugate, and $H$ denotes Hermitian transposition). It can be shown that this beamformer 38 minimizes the noise power in its output, i.e., the spatial sound signal 39, under the constraint that a target sound component, i.e., the voice of the user 46, is unchanged, see, e.g., [Haykin; 1996]. The look vector $d$ represents the ratio of transfer functions corresponding to the direct part, i.e., the first 20 ms, of room impulse responses from the target sound source 58, e.g., the mouth of a user 46 (see FIG. 4, where ‘user’ 46 is dummy head 56), to each of $M$ microphones, e.g., the two microphones 14 and 14′ of the hearing aid device 10 located at an ear of the user 46. The look vector is normalized so that $d^{H}d=1$, and is computed as the eigenvector corresponding to the largest eigenvalue of the covariance matrix $\hat{R}_{SS}(k)$, i.e., the inter-microphone target sound signal covariance matrix ($S$ referring to microphone signal $s$).
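For a single frequency index $k$, the MVDR weight vector in its standard form (inverse noise covariance applied to the look vector, normalized to the reference microphone) can be computed as in the following NumPy sketch. The function name `mvdr_weights` and its argument names are hypothetical, and the noise covariance matrix is assumed to be Hermitian and invertible.

```python
import numpy as np


def mvdr_weights(noise_cov, look_vector, ref_index=0):
    """MVDR weights for one frequency bin.

    noise_cov: (M, M) Hermitian noise covariance estimate R_VV(k).
    look_vector: length-M look vector d(k).
    ref_index: index i_ref of the reference microphone.
    """
    d = np.asarray(look_vector, dtype=complex)
    r = np.asarray(noise_cov, dtype=complex)
    # Solve R_VV x = d instead of forming the explicit inverse.
    r_inv_d = np.linalg.solve(r, d)
    denom = np.conj(d) @ r_inv_d  # d^H R_VV^{-1} d
    return r_inv_d * np.conj(d[ref_index]) / denom


# Distortionless check: the beamformer output w^H d equals the target
# component at the reference microphone.
R = np.array([[2.0, 0.5], [0.5, 1.0]])
d = np.array([1.0, 0.5j])
w = mvdr_weights(R, d)
print(np.vdot(w, d), d[0])
```

Using `np.linalg.solve` avoids an explicit matrix inversion, which is a common numerical choice rather than something the description prescribes.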

(28) A second embodiment of the beamformer 38 is a fixed look vector beamformer 38. A fixed look vector beamformer 38 from a user's mouth, i.e., target sound source 58, to the microphones 14 and 14′ of the hearing aid device 10 can, e.g., be implemented by determining a fixed look vector $d=d_{0}$ (e.g. using an artificial dummy head 56 (see FIG. 4), e.g., the Head and Torso Simulator (HATS) 4128C from Brüel & Kjær Sound & Vibration Measurement A/S), and using such fixed look vector $d_{0}$ (defining the target sound source 58 to microphone 14, 14′ configuration, which is relatively identical from one user 46 to another user) together with a dynamically determined inter-microphone noise covariance matrix for the current acoustic environment $\hat{R}_{VV}(k)$ (thereby taking into account a dynamically varying acoustic environment (different (noise) sources, different location of (noise) sources over time)). A calibration sound, i.e., training voice signals 60 or training signals (see FIG. 4), preferably comprising all relevant frequencies, e.g., a white noise signal having frequency content between a minimum frequency of, e.g., above 20 Hz and a maximum frequency of, e.g., below 20 kHz, is emitted from the target sound source 58 of the dummy head 56 (see FIG. 4), and signals $s_{m}(n,k)$ ($n$ being a time index and $k$ a frequency index) are picked up by the microphones 14 and 14′ ($m=1, \ldots, M$, here, e.g., $M=2$ microphones) of the hearing aid device 10′ when located at or in an ear of the dummy head 56. The resulting inter-microphone covariance matrix $\hat{R}_{SS}(k)$ is estimated for each frequency $k$ based on the training signal

(29) $\hat{R}_{SS}(k) = \frac{1}{N} \sum_{n} s(n, k)\, s^{H}(n, k),$
where $s(n,k)=[s(n,k,1)\; s(n,k,2)]^{T}$ and $s(n,k,m)$ is the output of an analysis filter bank, for microphone $m$, at time frame $n$ and frequency index $k$. For a true point sound source, the signal impinging on the microphones 14 and 14′ or on a microphone array would be of the form $s(n,k)=s(n,k)\,d(k)$, where the $s(n,k)$ on the right-hand side denotes the scalar target sound signal, such that (assuming that signal $s(n,k)$ is stationary) the theoretical target covariance matrix $R_{SS}(k)=E[s(n,k)\,s^{H}(n,k)]$ would be of the form

(30) $R_{SS}(k) = \phi_{SS}(k)\, d(k)\, d^{H}(k),$
where $\phi_{SS}(k)$ is the power spectral density of the target sound signal, i.e., the voice of the user 46 coming from the target sound source 58, meaning the user voice signal 44, observed at the reference microphone 14. Therefore, the eigenvector of $R_{SS}(k)$ corresponding to the non-zero eigenvalue is proportional to $d(k)$. Hence, the look vector estimate $\hat{d}(k)$, e.g., the relative target sound source 58 to microphone 14, i.e., mouth to ear transfer function $\hat{d}_{0}(k)$, is defined as the eigenvector corresponding to the largest eigenvalue of the estimated target covariance matrix $\hat{R}_{SS}(k)$. In an embodiment, the look vector is normalized to unit length, that is:

(31) $d (k) := \frac{d (k)}{\sqrt{d^{H} (k) d (k)}},$
such that $\|d\|^{2}=1$. The look vector estimate $\hat{d}(k)$ thus encodes the physical direction and distance of the target sound source 58; it is therefore also called the look direction. The fixed, pre-determined look vector estimate $\hat{d}_{0}(k)$ can now be combined with an estimate of the inter-microphone noise covariance matrix $\hat{R}_{VV}(k)$ to find MVDR beamformer weights (see above).
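The estimation of the fixed look vector from training signals can be sketched for one frequency index as follows. The sketch assumes the filter bank outputs are already available as an array of shape (N, M), with N time frames and M microphones; the function names `estimate_target_covariance` and `estimate_look_vector` are hypothetical. The eigenvector is determined only up to a complex phase factor, which the sketch does not resolve.

```python
import numpy as np


def estimate_target_covariance(frames):
    """Sample covariance (1/N) * sum_n s(n,k) s^H(n,k) for one frequency bin.

    frames: array of shape (N, M), one row per time frame n.
    """
    s = np.asarray(frames, dtype=complex)
    # Entry (i, j) is the average of s_i(n) * conj(s_j(n)) over n.
    return s.T @ s.conj() / s.shape[0]


def estimate_look_vector(target_cov):
    """Unit-norm eigenvector belonging to the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(target_cov)  # ascending eigenvalues
    d = eigvecs[:, -1]
    return d / np.linalg.norm(d)
```

`np.linalg.eigh` is used because the covariance matrix is Hermitian; it returns the eigenvalues in ascending order, so the last column is the dominant eigenvector.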

(32) In a third embodiment, the look vector can be dynamically determined and updated by a dynamic look vector beamformer 38. This is desirable in order to take into account physical characteristics of the user 46 which differ from those of the dummy head 56, e.g., head form, head symmetry, or other physical characteristics of the user 46. Instead of using a fixed look vector d.sub.0, as determined by using the artificial dummy head 56, e.g. HATS (see FIG. 4), the above described procedure for determining the fixed look vector can be used during time segments where the user's own voice, i.e., the user voice signal, is present (instead of the training voice signal 60) to dynamically determine a look vector d for the user's head and actual mouth to hearing aid device microphone(s) 14, 14′ arrangement. To determine these own-voice dominated time-frequency regions, a voice activity detection (VAD) 42 algorithm can be run on the output of the own-voice beamformer 38, i.e., the spatial sound signal 39, and target speech inter-microphone covariance matrices estimated (as above) based on the spatial sound signal 39 generated by the beamformer 38. Finally, the dynamic look vector can be determined as the eigenvector corresponding to the dominant eigenvalue. As this procedure involves VAD decisions based on noisy signal regions, some classification errors can occur. To avoid that these influence algorithm performance, the estimated look vector can be compared to the predetermined look vector and/or pre-determined spatial direction parameters estimated on the HATS. If the look vectors differ significantly, i.e., if their difference is not physically plausible, the predetermined look vector is preferably used instead of the look vector determined for the user 46. 
Clearly, many variations on the look vector selection mechanism can be envisioned, e.g., using a linear combination of the predetermined fixed look vector and the dynamically estimated look vector, or other combinations.
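The selection between the predetermined look vector and the dynamically estimated look vector can be sketched as follows. This is an illustration under stated assumptions only: the distance measure, the threshold `max_distance`, the parameter `blend`, and all function names are hypothetical, and the description does not prescribe how physical plausibility is tested.

```python
import math


def relative_to_reference(d):
    """Scale d so that the reference-microphone entry (index 0) equals 1.

    Assumes d[0] is non-zero.
    """
    return [x / d[0] for x in d]


def unit_length(d):
    """Normalize d to unit length."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in d))
    return [x / norm for x in d]


def select_look_vector(d_dynamic, d_fixed, max_distance=0.5, blend=None):
    """Return the look vector to use for one frequency index.

    max_distance is a hypothetical plausibility threshold. If blend is
    given (0..1), a linear combination of both vectors is returned,
    with blend being the weight of the fixed look vector.
    """
    dyn = relative_to_reference(d_dynamic)
    fix = relative_to_reference(d_fixed)
    distance = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(dyn, fix)))
    if distance > max_distance:
        # Implausible estimate: fall back to the predetermined vector.
        return unit_length(fix)
    if blend is None:
        return unit_length(dyn)
    mixed = [blend * f + (1 - blend) * g for f, g in zip(fix, dyn)]
    return unit_length(mixed)


d_fixed = [1 + 0j, 0.5j]  # hypothetical predetermined look vector
accepted = select_look_vector([1 + 0j, 0.55j], d_fixed)
rejected = select_look_vector([1 + 0j, -2 + 0j], d_fixed)
blended = select_look_vector([1 + 0j, 0.55j], d_fixed, blend=0.5)
```

Both vectors are first scaled relative to the reference microphone, since eigenvectors obtained from different covariance estimates may differ by an arbitrary complex factor even when they describe the same acoustic configuration.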

(33) The beamformer 38 provides an enhanced target sound signal (here focusing on the user's own voice) comprising the clean target sound signal, i.e., the user voice signal 44, (e.g., because of the distortionless property of the MVDR beamformer 38), and additive residual noise, which the beamformer 38 was unable to completely suppress. This residual noise can be further suppressed in a single-channel post filtering step using the single channel noise reduction unit 40 or a single channel noise reduction algorithm executed on the electric circuitry 16. Most single channel noise reduction algorithms suppress time-frequency regions where the target sound signal-to-residual noise ratio (SNR) is low, while leaving high-SNR regions unchanged, hence an estimate of this SNR is needed. The power spectral density (PSD) σ.sub.w.sup.2(k,m) of the noise entering the single-channel noise reduction unit 40 can be expressed as

(34) $σ_{w}^{2} (k, m) = w^{H} (k, m) {\hat{R}}_{V V} w (k, m)$
Given this noise PSD estimate, the PSD of the target sound signal, i.e., user voice signal 44, can be estimated as

(35) ${\hat{σ}}_{s}^{2} (k, m) = σ_{x}^{2} (k, m) - {\hat{σ}}_{w}^{2} (k, m) .$
The ratio of {circumflex over (σ)}.sub.s.sup.2(k,m) and {circumflex over (σ)}.sub.w.sup.2(k,m) forms an estimate of the SNR at a particular time-frequency point. This SNR estimate can be used to find the gain of the single channel noise reduction unit 40, e.g., a Wiener filter, an MMSE-STSA optimal gain, or the like, see, e.g., P. C. Loizou, “Speech Enhancement: Theory and Practice,” Second Edition, CRC Press, 2013 and the references therein.
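The post filtering step can be illustrated for a single time-frequency point by the following sketch, which computes the noise PSD at the beamformer output, estimates the target PSD by subtraction, and derives a Wiener gain. The function names, the clamping of the target PSD at zero, the gain floor `min_gain`, and the numerical values are assumptions for illustration and are not taken from the description.

```python
def noise_psd(w, r_vv):
    """sigma_w^2 = w^H R_VV w for one time-frequency point."""
    n_mics = len(w)
    value = sum(
        w[i].conjugate() * r_vv[i][j] * w[j]
        for i in range(n_mics)
        for j in range(n_mics)
    )
    # For a Hermitian R_VV the quadratic form is real.
    return value.real


def wiener_gain(sigma_x2, sigma_w2, min_gain=0.1):
    """Wiener gain snr / (1 + snr) from the PSD estimates.

    sigma_x2 is the PSD of the beamformer output. The target PSD is
    clamped at zero and the gain is floored at min_gain; both are
    illustrative safeguards (assumptions).
    """
    if sigma_w2 <= 0.0:
        return 1.0  # no noise estimated: leave the signal unchanged
    sigma_s2 = max(sigma_x2 - sigma_w2, 0.0)
    snr = sigma_s2 / sigma_w2
    return max(snr / (1.0 + snr), min_gain)


# Hypothetical beamformer weights and noise covariance for M=2 microphones.
w = [0.5 + 0j, 0.5 + 0j]
r_vv = [[2 + 0j, 0j], [0j, 2 + 0j]]
sigma_w2 = noise_psd(w, r_vv)
gain_high_snr = wiener_gain(4.0, sigma_w2)
gain_low_snr = wiener_gain(0.5, sigma_w2)
```

With these values, a time-frequency point whose output PSD clearly exceeds the noise PSD is attenuated only slightly, whereas a point dominated by noise is reduced to the gain floor.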

(36) The described own-voice beamformer estimates the clean own-voice signal as observed by one of the microphones. The far-end listener, however, may be more interested in the voice signal as it would be measured at the mouth of the hearing aid user. No microphone is located at the mouth, but since the acoustic transfer function from mouth to microphone is roughly stationary, the beamformer output can be compensated by passing it through a linear time-invariant filter which emulates the transfer function from microphone to mouth.

(37) FIG. 4 shows a beamformer dummy head model system 54 with two hearing aid devices 10 mounted on a dummy head 56. The hearing aid devices 10 are mounted at the sides of the dummy head 56 at locations corresponding to ears of a user. The dummy head 56 has a dummy target sound source 58 that produces training voice signals 60 and/or training signals. The dummy target sound source 58 is located at a location corresponding to a mouth of a user. The training voice signals 60 are received by the microphones 14 and 14′ and can be used to determine the location of the target sound source 58 relative to the microphones 14 and 14′. An adaptive beamformer 38 in each of the hearing aid devices 10 is configured to determine the look vector, i.e., a (relative) acoustic transfer function from source to microphone(s), while the hearing aid device 10 is in operation and while a training voice signal 60 is present in the spatial sound signal 39. Forming a beamformer requires at least two microphones 14 and 14′ in each hearing aid device or, alternatively, one microphone in each hearing aid device of a binaural hearing aid system (binaural beamformer). The electric circuitry 16 estimates training voice inter-microphone covariance matrices and determines an eigenvector corresponding to a dominant eigenvalue of the covariance matrix, when the training voice signal 60 is detected. The eigenvector corresponding to the dominant eigenvalue of the covariance matrix is taken as the look vector d (using the eigenvector is one way of determining the look vector). The look vector depends on the location of the dummy target sound source 58 relative to the microphones 14 and 14′. The look vector therefore represents an estimate of the transfer function from the dummy target sound source 58 to the microphones 14 and 14′. The dummy head 56 is chosen in correspondence to an average human head, taking into account female and male heads.
The look vector can also be gender specifically determined by using a corresponding female and/or male (or child-specific) dummy head 56, corresponding to an average female or male (or child) head.

(38) FIG. 5 shows a first embodiment of a method for using a hearing aid device 10 or 10′ connected to a communication device, e.g., the mobile phone 12. The method comprises the steps:

(39) 100 receiving sound 34 and generating electrical sound signals 35 representing sound 34,

(40) 110 determining if a wireless sound signal 19 is received,

(41) 120 activating a first processing scheme 130 if a wireless sound signal 19 is received and activating a second processing scheme 160 if no wireless sound signal 19 is received.

(42) The first processing scheme 130 comprises the steps 140 and 150.

(43) 140 using the electrical sound signals 35 to update a noise signal representing noise used for noise reduction,

(44) 150 using the noise signal to update values of predetermined spatial direction parameters.

(45) (In an embodiment, steps 140 and 150 are combined to update an inter-microphone noise-only covariance matrix.)

(46) The second processing scheme 160 comprises the step 170.

(47) 170 determining if the electrical sound signals 35 comprise a voice signal representing voice and activating the first processing scheme 130 if a voice signal is absent in the electrical sound signals 35 and activating a noise reduction scheme 180 if the electrical sound signals 35 comprise a voice signal.

(48) The noise reduction scheme 180 comprises the steps 190 and 200.

(49) 190 using the electrical sound signals 35 to update the values of the pre-determined spatial direction parameters (if near-end speech is dominant, the estimate of the own-voice inter-microphone covariance matrix is updated and then, e.g., its dominant eigenvector is determined, which represents the (relative) transfer function from source to microphone(s)),

(50) 200 retrieving a user voice signal 44 representing the user voice from the electrical sound signals 35. Preferably a spatial sound signal 39 representing spatial sound is generated from the electrical sound signals 35 using the predetermined spatial direction parameters and a user voice signal 44 is generated from the spatial sound signal 39 using (e.g.) the noise signal to reduce noise in the spatial sound signal 39.

(51) Optionally the user voice signal can be transmitted to, e.g., a communication device such as a mobile phone 12 wirelessly connected to the hearing aid device 10. The method can be performed continuously by starting again at step 100 after step 150 or step 200.
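The branching of the method of FIG. 5 can be summarized by the following sketch, which lists the reference numerals of the steps and schemes executed in one pass. The function name and the boolean inputs are hypothetical; how the wireless sound signal 19 and the voice signal are detected is outside the scope of the sketch.

```python
def fig5_steps(wireless_signal_received, voice_present):
    """Return the reference numerals of FIG. 5 executed in one pass.

    voice_present is only evaluated when no wireless sound signal is
    received, mirroring step 170 of the second processing scheme.
    """
    steps = [100, 110, 120]
    if wireless_signal_received:
        # First processing scheme: update the noise signal and the
        # spatial direction parameters.
        steps += [130, 140, 150]
    else:
        # Second processing scheme: check for a voice signal.
        steps += [160, 170]
        if voice_present:
            # Noise reduction scheme: retrieve the user voice signal.
            steps += [180, 190, 200]
        else:
            steps += [130, 140, 150]
    return steps


far_end_talking = fig5_steps(True, False)
user_talking = fig5_steps(False, True)
silence = fig5_steps(False, False)
```

Each pass ends at step 150 or step 200, after which the method can start again at step 100.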

(52) FIG. 6 shows a second embodiment of a method for using the hearing aid device 10. The method shown in FIG. 6 uses the hearing aid device 10 as an own-voice detector. The method presented in FIG. 6 comprises the following steps.

(53) 210 Receive sound 34 from the environment in the microphones 14 and 14′.

(54) 220 Generate electrical sound signals 35 representing the sound 34 from the environment.

(55) 230 Use of the beamformer 38 to process the electrical sound signals 35, which generates a spatial sound signal 39 corresponding to predetermined spatial direction parameters, i.e., corresponding to the look vector d.

(56) 240 An optional step (dashed outline in FIG. 6) can be to use the single channel noise reduction unit 40 to reduce noise in the spatial sound signal 39 to increase the signal-to-noise ratio of the spatial sound signal 39, e.g., by subtracting a predetermined spatial noise signal from the spatial sound signal 39. A predetermined spatial noise signal can be determined by determining a spatial sound signal 39 when a voice signal is absent in the spatial sound signal 39, meaning when the user 46 is not speaking.

(57) 250 Use of the voice activity detection unit 42 to detect whether a user voice signal 44 of a user 46 is present in the spatial sound signal 39. Alternatively the voice activity detection unit 42 can also be used to determine whether the user voice signal 44 of the user 46 exceeds a signal-to-noise ratio threshold and/or sound signal level threshold.

(58) 260 Activate a mode of operation in dependence of the output of the voice activity detection unit 42, i.e., activating the normal listening mode, if no voice signal is present in the spatial sound signal 39 and activating the user speaking mode, if a voice signal is present in the spatial sound signal 39. If a wireless sound signal 19 is received additionally to the voice signal in the spatial sound signal 39 the method is preferably adapted to activate the communication mode and/or the user speaking mode.

(59) Additionally the beamformer 38 can be an adaptive beamformer 38. In this case the method is used for training the hearing aid device 10 as an own-voice detector and the method further comprises the following steps.

(60) 270 If a voice signal is present in the spatial sound signal 39, determine an estimate of the user voice inter-environment sound input covariance matrices and the eigenvector corresponding to the dominant eigenvalue of the covariance matrix. This eigenvector is the look vector. The look vector is then applied to the adaptive beamformer 38 to improve the spatial direction of the adaptive beamformer 38. The adaptive beamformer 38 is used to determine a new spatial sound signal 39. In this embodiment the sound 34 is obtained continuously. The electrical sound signal 35 can be sampled or supplied as a continuous electrical sound signal 35 to the beamformer 38.

(61) The beamformer 38 can be an algorithm performed on the electric circuitry 16 or a unit in the hearing aid device 10. The method can also be performed independent of the hearing aid device 10 on any other suitable device. The method can be iteratively performed, e.g., by starting again at step 210 after performing step 270.
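The mode selection of step 260 can be sketched as follows. The mode names and the function name are hypothetical labels. Where the description allows the communication mode and/or the user speaking mode, the sketch assumes that the communication mode is selected.

```python
def select_mode(voice_present, wireless_signal_received):
    """Choose a mode of operation from the voice activity detection result.

    Assumption: when a voice signal and a wireless sound signal are both
    present, the communication mode is chosen, although the user speaking
    mode would also be permitted by the description.
    """
    if voice_present and wireless_signal_received:
        return "communication"
    if voice_present:
        return "user_speaking"
    return "normal_listening"


modes = [
    select_mode(False, False),
    select_mode(True, False),
    select_mode(True, True),
]
```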

(62) In the above examples, the hearing aid device(s) communicate(s) directly with a mobile phone. Other embodiments, where the hearing aid device(s) communicate(s) with the mobile phone via an intermediate device, are also intended to be within the scope of the accompanying claims. The user advantage is that, whereas today the mobile phone or the intermediate device must be held in a hand or worn on a string around the neck so that its microphone is just below the mouth, with the proposed invention the mobile phone and/or the intermediate device may be covered by clothes or carried in a pocket. This is convenient and has the benefit that the user does not need to reveal that he or she wears a hearing aid device.

(63) In the above examples, the processing (electric circuitry 16) of the input sound signals (from microphone(s) and wireless receiver) is generally assumed to be located in the hearing aid device. In case of sufficient available bandwidth for transmitting audio signals ‘back and forth’, such processing (e.g. including beamforming and noise reduction) may be located in an external device, e.g. an intermediate device or a mobile telephone device. Thereby power and space can be saved in the hearing aid device; such parameters typically both being limited in a state of the art hearing aid device.

REFERENCE SIGNS

(64) 10 hearing aid device
12 mobile phone
14 microphone
16 electric circuitry
18 wireless sound input
19 wireless sound signal
20 transmitter unit
22 antenna
24 speaker
26 antenna
28 transmitter unit
30 receiver unit
32 interface to public telephone network
34 incoming sound
35 electrical sound signal representing sound
36 dedicated beamformer-noise-reduction-system
38 beamformer
39 spatial sound signal
40 single channel noise reduction unit
42 voice activity detection unit
44 user voice signal
46 user
48 output sound
50 switch
52 memory
54 dummy head model system
56 dummy head
58 target sound source
60 training voice signal


CPC classification

H04R25/30, H04R2499/11, H04R2225/39, H04R25/552, H04R25/407, H04R25/554, H04R2225/41, H04R1/1083, H04R25/305, H04R25/43, H04R2225/55

International classification

H04R25/00
