Hearing device comprising a noise reduction system

11533554 · 2022-12-20

Abstract

A hearing device adapted for being located at or in an ear of a user, or for being fully or partially implanted in the head of a user comprises a) an input unit for providing at least one electric input signal representing sound in an environment of the user, said electric input signal comprising a target speech signal from a target sound source and additional signal components, termed noise signal components, from one or more other sound sources, b) a noise reduction system for providing an estimate of said target speech signal, wherein said noise signal components are at least partially attenuated, and c) an own voice detector for repeatedly estimating whether or not, or with what probability, said at least one electric input signal, or a signal derived therefrom, comprises speech originating from the voice of the user. The noise signal components are identified during time segments wherein the own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user, or originates from the voice of the user with a probability above an own voice presence probability (OVPP) threshold value. A method of operating a hearing device is further disclosed.

Claims

1. A hearing aid adapted for being located at or in an ear of a user, or for being fully or partially implanted in the head of a user, the hearing aid comprising an input unit for providing at least one electric input signal representing sound in an environment of the user, said electric input signal comprising a target speech signal from a target sound source and additional signal components, termed noise signal components, from one or more other sound sources, a noise reduction system for providing an estimate of said target speech signal, wherein said noise signal components are at least partially attenuated, and an own voice detector for repeatedly estimating whether or not, or with what probability, said at least one electric input signal, or a signal derived therefrom, comprises speech originating from the voice of the user, wherein said hearing aid is configured to provide that said noise signal components are identified during time segments wherein said own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user, or originates from the voice of the user with a probability above an own voice presence probability (OVPP) threshold value, and the target sound source is an external speaker in the environment of the hearing aid user.

2. The hearing aid according to claim 1, wherein the input unit comprises at least one microphone, each of the at least one microphone providing an electric input signal comprising said target speech signal and said noise signal components.

3. The hearing aid according to claim 2 comprising a voice activity detector for repeatedly estimating whether or not, or with what probability, said at least one electric input signal, or a signal derived therefrom, comprises speech.

4. The hearing aid according to claim 2 comprising one or more beamformers, and wherein the input unit is configured to provide at least two electric input signals connected to the one or more beamformers, and wherein the one or more beamformers are configured to provide at least one beamformed signal.

5. The hearing aid according to claim 2, wherein said noise signal components are additionally identified during time segments wherein said voice activity detector indicates an absence of speech in the at least one electric input signal, or a signal derived therefrom, or a presence of speech with a probability below a speech presence probability (SPP) threshold value.

6. The hearing aid according to claim 1 comprising a voice activity detector for repeatedly estimating whether or not, or with what probability, said at least one electric input signal, or a signal derived therefrom, comprises speech.

7. The hearing aid according to claim 6 comprising one or more beamformers, and wherein the input unit is configured to provide at least two electric input signals connected to the one or more beamformers, and wherein the one or more beamformers are configured to provide at least one beamformed signal.

8. The hearing aid according to claim 6, wherein said noise signal components are additionally identified during time segments wherein said voice activity detector indicates an absence of speech in the at least one electric input signal, or a signal derived therefrom, or a presence of speech with a probability below a speech presence probability (SPP) threshold value.

9. The hearing aid according to claim 1 comprising one or more beamformers, and wherein the input unit is configured to provide at least two electric input signals connected to the one or more beamformers, and wherein the one or more beamformers are configured to provide at least one beamformed signal.

10. The hearing aid according to claim 9, wherein the one or more beamformers comprises one or more own voice cancelling beamformers configured to attenuate signal components originating from the user's mouth, while signal components from all other directions are left unchanged or attenuated less.

11. The hearing aid according to claim 1, wherein said noise signal components are additionally identified during time segments wherein said voice activity detector indicates an absence of speech in the at least one electric input signal, or a signal derived therefrom, or a presence of speech with a probability below a speech presence probability (SPP) threshold value.

12. The hearing aid according to claim 1 comprising a voice interface for voice-control of the hearing aid or other devices or systems.

13. The hearing aid according to claim 1, wherein the target speech signal from the target sound source comprises an own voice speech signal from the hearing aid user.

14. The hearing aid according to claim 1, wherein the hearing aid further comprises a timer configured to determine a time segment of overlap between the own voice speech signal and a further speech signal.

15. The hearing aid according to claim 14, wherein the hearing aid is configured to determine whether said time segment exceeds a time limit, and if so to label the further speech signal as part of the noise signal component.

16. A binaural hearing system comprising a first and a second hearing aid as claimed in claim 1, the binaural hearing system being configured to allow an exchange of data between the first and the second hearing aids.

17. A method of operating a hearing aid adapted for being located at or in an ear of a user, or for being fully or partially implanted in the head of a user, the method comprising providing at least one electric input signal representing sound in an environment of the user, said electric input signal comprising a target speech signal from a target sound source and additional signal components, termed noise signal components, from one or more other sound sources, providing an estimate of said target speech signal, wherein said noise signal components are at least partially attenuated, repeatedly estimating whether or not, or with what probability, said at least one electric input signal, or a signal derived therefrom, comprises speech originating from the voice of the user, and identifying, by operation of said hearing aid, said noise signal components during time segments wherein said own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user, or originates from the voice of the user with a probability above an own voice presence probability (OVPP) threshold value, wherein the target sound source is an external speaker in the environment of the hearing aid user.

18. A non-transitory computer readable medium on which is stored a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of claim 17.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter, in which:

(2) FIG. 1A shows an exemplary application scenario of a hearing device system according to the present disclosure.

(3) FIGS. 1B to 1D show the corresponding voice activity, voice activity detector (VAD), and noise update, respectively, for identical time segments according to the present disclosure.

(4) FIG. 2A shows an exemplary application scenario of a hearing device system according to the present disclosure.

(5) FIGS. 2B to 2D show the corresponding voice activity, voice activity detector (VAD), and noise update, respectively, for identical time segments according to the present disclosure.

(6) FIG. 3A shows an exemplary application scenario of a hearing device system according to the present disclosure.

(7) FIGS. 3B to 3D show the corresponding voice activity, voice activity detector (VAD), and noise update, respectively, for identical time segments according to the present disclosure.

(8) FIG. 4A shows an exemplary input unit coupled to an exemplary noise reduction system.

(9) FIG. 4B shows an exemplary input unit coupled to an exemplary noise reduction system according to the present disclosure.

(10) FIG. 5A shows an exemplary block diagram of a hearing aid comprising a noise reduction system according to an embodiment of the present disclosure.

(11) FIG. 5B shows an exemplary block diagram of a hearing aid comprising a noise reduction system according to an embodiment of the present disclosure in a handsfree telephony mode of operation.

(12) FIG. 5C shows an exemplary block diagram of a hearing aid comprising a noise reduction system according to an embodiment of the present disclosure including a voice control interface.

(13) FIG. 6 shows an exemplary application scenario of a hearing device system according to the present disclosure.

(14) The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.

(15) Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

(16) The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

(17) The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

(18) The present application relates to the field of hearing devices, e.g. hearing aids.

(19) Speech enhancement and noise reduction are often needed in real-world audio applications, where noise from the acoustic environment masks a desired speech signal, often resulting in reduced speech intelligibility. Examples of audio applications where noise reduction can be beneficial are hands-free wireless communication devices, e.g. headsets, automatic speech recognition systems, and hearing aids (HA). In particular, in applications such as headset communication devices, where a ('far end') human listener needs to understand the noisy own voice picked up by the headset microphones, noise can greatly reduce sound quality and speech intelligibility, making conversations more difficult.

(20) ‘Headset applications’ may in the present context include normal headset applications for use in communication with a ‘far end speaker’ e.g. via a network (such as office or call-centre applications) but also hearing aid applications where the hearing aid is in a specific ‘communication or telephone mode’ adapted to pick up a user's voice and transmit it to another device (e.g. a far-end-communication partner), while possibly receiving audio from the other device (e.g. from the far-end-communication partner).

(21) Noise reduction algorithms implemented in multi-microphone devices may comprise a set of linear filters, e.g. spatial filters and temporal filters, that are used to shape the sound picked up by the microphones. Spatial filters are able to alter the sound by enhancing or attenuating sound as a function of direction, while temporal filters alter the frequency response of the noisy signal to enhance or attenuate specific frequencies. To find the optimal filter coefficients, it is usually necessary to know the noise characteristics of the acoustic environment. Unfortunately, these noise characteristics are often unknown and need to be estimated online.

(22) Characteristics that are often necessary as inputs to multichannel noise reduction algorithms include the cross power spectral densities (CPSDs) of the noise. The noise CPSDs are, for example, needed for the minimum variance distortionless response (MVDR) and multichannel Wiener filter (MWF) beamformers, which are common beamformers in multi-microphone noise reduction systems.

(23) To estimate the noise statistics, researchers have developed a wide variety of estimators, e.g. [1-5]. In [1,4], a maximum likelihood (ML) estimator of the noise CPSD matrix during speech presence is proposed, under the assumption that the noise CPSD matrix remains identical up to a scalar multiplier. This estimator performs well when the underlying structure of the noise CPSD matrix does not change over time, e.g. for car cabin noise and isotropic noise fields, but may fail otherwise. In many realistic acoustic environments, the underlying structure of the noise CPSD matrix cannot be assumed fixed, for example when a prominent non-stationary interference noise source is present in the acoustic scene. In particular, when the interference is a competing speaker, many noise reduction systems fail at efficiently suppressing the competing speaker, as it is harder to determine whether the own voice or the competing speaker is the desired speech.

(24) In FIG. 1A, the environment of the hearing device user 1 is shown. The environment is shown to comprise the hearing device user 1, a target sound source 2, and noise signal components 3.

(25) The hearing device user 1 may wear a hearing device comprising a first microphone 4 and a second microphone 5 on a left ear of the user 1, and a third microphone 6 and a fourth microphone 7 on the right ear of the user 1.

(26) The target sound source 2 may be located near the hearing device user 1 and may be configured to generate and emit a target speech signal into the environment of the user 1. The target source 2 may as such be a person, a radio, a television, etc. configured to generate a target speech signal. The target speech signal may be directed towards the user 1 or may be directed away from the user 1.

(27) The noise signal components 3 are shown to surround both the hearing device user 1 and the target sound source 2 and therefore affect the target source signal received at the hearing device user 1. The noise signal components may comprise localized noise sources (e.g. a machine, a fan, etc.), and/or distributed (diffuse, isotropic) noise sound sources.

(28) The first microphone 4, the second microphone 5, the third microphone 6 and the fourth microphone 7 may (each) provide an electric input signal comprising the target speech signal and the noise signal components 3.

(29) In FIG. 1B, the voice activity (VA) is illustrated as a function of a time segment. It is assumed that the target source 2 and the user 1 are speaking back-to-back, i.e. with no or only minimal pause in between speech, e.g. of a conversation. The user 1 is illustrated to speak in the time segment between t1 and t2, and between t5 and t6 (denoted ‘own voice’), whereas the target source 2 is illustrated to speak in the time segment between t3 and t4, and between t7 and t8 (denoted ‘target sound source’). During the entire time segment of FIG. 1B, there is a noise signal with a randomly fluctuating noise level (solid curve denoted ‘Noise’).

(30) FIG. 1C illustrates how the exemplary voice activity of FIG. 1B may be detected with use of an own-voice VAD (e.g. own voice detector (OVD)) and with a VAD (i.e. a classical VAD).

(31) The own voice VAD may detect that the user 1 is speaking in the time segment between t1 and t2 and in the time segment between t5 and t6. The VAD, on the other hand, will detect that speech (from both the user 1 and the target source 2) is being generated in the entire time segment from t1 to t8. However, depending on the resolution of the VAD used, there may be a small break in detected voice activity in the segments t2 to t3, t4 to t5, and t6 to t7.

(32) FIG. 1D illustrates when the hearing device may be able to update a noise reduction system for providing an estimate of said target speech signal and at least partially attenuating the noise signal components 3.

(33) In a classical approach (upper part of FIG. 1D) in which the VAD may be used to detect the presence of speech, the noise reduction system of the hearing device will only be updated at times where no speech is generated (neither from the user 1 nor from the target source 2), as the VAD is not able to distinguish between speech from the user 1 and from the target source 2. Accordingly, the noise reduction system will be updated only at times where the VAD does not detect speech, i.e. from t0 to t1 and from t8 onward.

(34) With use of an own voice VAD (lower part of FIG. 1D), the noise reduction system of the hearing device may be updated not only when no speech is detected, but also when speech from the user 1 is detected with the own voice VAD, i.e. from t0 to t2, from t5 to t6, and from t8 onward.

(35) Accordingly, noise signal components may be identified during time segments (time intervals) where said own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user 1, or originates from the voice of the user 1 with a probability above an own voice presence probability (OVPP) threshold value, e.g. 60%, or 70%.

(36) Combining the own voice VAD and the VAD in the hearing device, the noise reduction system may be configured to both detect when the user 1 is speaking and when the target source 2 is speaking. Thereby, the noise reduction system may be updated during time segments where no speech signal is generated and where the user 1 is speaking, but may be prevented from updating at time segments where only the target sound source 2 is generating a target speech signal (speaking).
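
By way of a non-limiting illustration, this gating of the noise estimator may be sketched as follows (a minimal sketch in Python; the detector outputs, threshold values, and function names are illustrative assumptions, not prescribed by the present disclosure):

```python
def noise_update_allowed(vad_speech: bool, ovd_own_voice: bool) -> bool:
    """Classical rule: update only when the (general) VAD detects no speech.
    Rule of the present disclosure: additionally update while the own voice
    detector (OVD) fires, since target speech from the environment is then
    assumed absent."""
    return (not vad_speech) or ovd_own_voice

def noise_update_allowed_prob(spp: float, ovpp: float,
                              spp_thr: float = 0.5,
                              ovpp_thr: float = 0.6) -> bool:
    """Probabilistic variant: update when the speech presence probability
    (SPP) is below a threshold, or when the own voice presence probability
    (OVPP) is above a threshold (e.g. 60% or 70%, as mentioned above)."""
    return (spp < spp_thr) or (ovpp > ovpp_thr)
```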

(37) In FIG. 2A, the environment of the hearing device user 1 is shown. The environment is shown to comprise the hearing device user 1, a competing speaker 8, and noise signal components 3.

(38) As was the case in FIG. 1A, the hearing device user 1 may wear a hearing device comprising a first microphone 4 and a second microphone 5 at or on a left ear of the user 1, and a third microphone 6 and a fourth microphone 7 at or on the right ear of the user 1.

(39) The competing speaker 8 may be located near the hearing device user 1 and may be configured to generate and emit a competing speech signal (i.e. an unwanted speech signal) into the environment of the user 1. The competing speaker 8 may as such be a person, a radio, a television, etc. configured to generate a competing speech signal. The competing speech signal may be directed towards the user 1 or may be directed away from the user 1.

(40) The noise signal components 3 are shown to surround both the hearing device user 1 and the competing speaker 8 and therefore affect the estimation of the own voice of the user 1, i.e. the wanted speech signal (e.g. in case the hearing device comprises or implements a headset), received at the hearing device microphones 4, 5, 6, 7.

(41) In FIG. 2B, the voice activity (VA) is illustrated as a function of a time segment (Time). It is assumed that the user 1 is speaking from t1 to t3 and that the competing speaker 8 is speaking from t2 to t4, whereby the voice of the competing speaker 8 is overlapping the voice of the user 1. During the entire time segment of FIG. 2B, there is a noise signal with a randomly fluctuating noise level.

(42) FIG. 2C illustrates how the exemplary voice activity of FIG. 2B may be detected with use of an own-voice VAD and with a (general) VAD.

(43) The own voice VAD (lower part of FIG. 2C) may detect that the user 1 is speaking in the time segment between t1 and t3. The VAD (upper part of FIG. 2C), on the other hand, will detect that speech (from both the user 1 and the competing speaker 8) is being generated in the entire time segment from t1 to t4.

(44) FIG. 2D illustrates when the hearing device may be able to update a noise reduction system for providing an estimate of said target speech signal and at least partially attenuating the noise signal components 3.

(45) In a classical approach (upper part of FIG. 2D) in which the VAD would be used to detect the presence of speech, the noise reduction system of the hearing device would only be updated at times where no speech is generated (neither from the user 1 nor from the competing speaker 8), as the general VAD is not able to distinguish between speech from the user 1 and from the competing speaker 8. Accordingly, the noise reduction system may be updated only at times where the VAD does not detect speech, i.e. from t0 to t1 (and from t4 onward).

(46) With use of an own voice VAD (lower part of FIG. 2D), the noise reduction system of the hearing device may be configured to be updated not only when no speech is detected, i.e. from t0 to t1 (and from t4 onward), but also when speech from the user 1 is detected with the own voice VAD, i.e. (in total) from t0 to t3.

(47) Accordingly, noise signal components (including from the competing speaker 8) may be identified during time segments where said own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user 1, or originates from the voice of the user 1 with a probability above an own voice presence probability (OVPP) threshold value.

(48) Combining the own voice VAD and the VAD in the hearing device, the noise reduction system may be configured to both detect when the user 1 is speaking and when the competing speaker 8 is speaking alone. Thereby, the noise reduction system may be updated during time intervals where no speech signal is generated and where the user 1 is speaking, but may be prevented from updating at time intervals where the competing speaker 8 is generating a speech signal.

(49) In FIG. 3A, the environment of the hearing device user 1 is shown. The environment is shown to comprise the hearing device user 1, a target sound source 2, a competing speaker 8, and noise signal components 3.

(50) As was the case in FIGS. 1A and 2A, the hearing device user 1 may wear a hearing device comprising a first microphone 4 and a second microphone 5 on a left ear of the user 1, and a third microphone 6 and a fourth microphone 7 on the right ear of the user 1.

(51) The target sound source 2 and the competing speaker 8 may be located near the hearing device user 1 and may be configured to generate and emit speech signals into the environment of the user 1. The target speech signal and/or the competing speaker speech signal may be directed towards the user 1 or may be directed away from the user 1.

(52) The noise signal components 3 are shown to surround both the hearing device user 1, the competing speaker 8, and the target sound source 2 and may therefore affect the target source signal received at the hearing device user 1.

(53) The first microphone 4, the second microphone 5, the third microphone 6 and the fourth microphone 7 may provide an electric input signal comprising the target speech signal, the competing speaker signal, and the noise signal components 3.

(54) In FIG. 3B, the voice activity (VA) is illustrated as a function of a time interval (Time). It is assumed that the target source 2 and the user 1 are speaking back-to-back and that the competing speaker 8 is overlapping the speech of the target source 2 and the user 1. The user 1 is illustrated to speak in the time interval between t1 and t2, and between t5 and t6 (Own voice), whereas the target source 2 is illustrated to speak in the time interval between t3 and t4, and between t7 and t8 (Target sound source). The competing speaker 8 is illustrated to speak in the time interval between t1* and t7* (Competing speaker). During the entire time interval of FIG. 3B, there is a noise signal with a randomly fluctuating noise level (solid graph denoted ‘noise’).

(55) FIG. 3C illustrates how the exemplary voice activity of FIG. 3B may be detected with use of an own-voice VAD and with a VAD.

(56) The own voice VAD will detect that the user 1 is speaking in the time interval between t1 and t2 and in the time interval between t5 and t6. The VAD, on the other hand, will detect that speech (from the user 1, the competing speaker 8, and the target source 2) is being generated in the entire time interval from t1 to t8.

(57) FIG. 3D illustrates the time intervals at which the hearing device would be able to update a noise reduction system for providing an estimate of said target speech signal and at least partially attenuating the noise signal components 3, including the competing speaker signal.

(58) In a classical approach in which the VAD may be used to detect the presence of speech, the noise reduction system of the hearing device would only be updated at times where no speech is generated (neither from the user 1, the competing speaker 8, nor the target source 2), as the VAD is not able to distinguish between speech from the user 1, the competing speaker 8, and the target source 2. Accordingly, the noise reduction system will be updated only at times where the VAD does not detect speech, i.e. from t0 to t1 and from t8 onward.

(59) With use of an own voice VAD, the noise reduction system of the hearing device may be configured to be updated not only when no speech is detected, but also when speech from the user 1 is detected by the own voice VAD, i.e. from t0 to t2, from t5 to t6, and from t8 onward.

(60) Accordingly, noise signal components may be identified during time segments where said own voice detector indicates that the at least one electric input signal, or a signal derived therefrom, originates from the voice of the user 1, or originates from the voice of the user 1 with a probability above an own voice presence probability (OVPP) threshold value.

(61) Combining the own voice VAD and the VAD in the hearing device, the noise reduction system may be configured to both detect when the user 1 is speaking and when the target source 2 and the competing speaker 8 are speaking. Thereby, the noise reduction system may be updated during time intervals where no speech signal is generated and where the user 1 is speaking, but may be prevented from updating at time intervals where the target sound source 2 is generating a target speech signal.

(62) In FIGS. 4A and 4B a noise reduction system (NRS) is coupled to an input unit (IU) comprising M input transducers (IT.sub.1, . . . , IT.sub.M), e.g. microphones, where M is larger than or equal to 2. The M input transducers may be located in a single hearing device, e.g. a hearing aid (e.g. located in or at an ear of a user). The M input transducers may be distributed over two (separate) hearing devices, e.g. hearing aids (e.g. in (two) hearing devices located in or at respective ears of a user). The latter configuration may form part of or constitute a binaural hearing system, e.g. a binaural hearing aid system. Each of the hearing devices of the binaural hearing aid system may comprise one or more (at least one), e.g. two or more, input transducers (e.g. microphones). A configuration of microphones of a binaural hearing aid system, wherein each hearing aid comprises two microphones, is e.g. illustrated in FIG. 6. Various embodiments of a hearing device (e.g. a hearing aid) comprising a noise reduction system according to the present disclosure are illustrated in FIGS. 5A, 5B, 5C.

(63) FIG. 4A shows an exemplary input unit (IU) coupled to an exemplary noise reduction system.

(64) Each of the M input transducers receives (at its respective, different location) sound signals (s.sub.1, . . . , s.sub.M) from an input sound field (comprising environment sound). The input unit (IU) comprises M input sub-units (IU.sub.1, . . . , IU.sub.M). Each input sub-unit comprises an input transducer (IT.sub.1, . . . , IT.sub.M), e.g. a microphone, for converting an input sound signal to an electric input signal (s′.sub.1, . . . , s′.sub.M). Each input transducer may comprise an analogue-to-digital converter for converting an analogue input signal to a digital signal (with a certain sampling rate, e.g. 20 kHz or more). Each input sub-unit further comprises an analysis filter bank for converting a time-domain (digital) signal to a number (K, e.g. >16, >24 or >64) of frequency sub-band signals (S.sub.1(k,n), . . . , S.sub.M(k,n)), where k and n are frequency and time indices, respectively, and where k=1, . . . , K. The respective electric input signals (S.sub.1(k,n), . . . , S.sub.M(k,n)) in a time-frequency representation (k,n) are fed to the noise reduction system (NRS).

(65) The noise reduction system (NRS) is configured to provide an estimate Ŝ(k,n) of a target speech signal (e.g. the hearing aid user's own voice, and/or the voice of a target speaker in the environment of the user), wherein noise signal components are at least partially attenuated. The noise reduction system (NRS) comprises a number of beamformers. In particular, it comprises a beamformer (BF), e.g. an MVDR beamformer or an MWF beamformer, connected to the input unit (IU) and configured to receive the electric input signals (S.sub.1(k,n), . . . , S.sub.M(k,n)) in a time-frequency representation. The beamformer (BF) is configured to provide at least one beamformed (spatially filtered) signal, e.g. the estimate Ŝ(k,n) of the target speech signal.

(66) Directionality by beamforming is an efficient way to attenuate unwanted noise, as a direction-dependent gain can cancel noise from one direction while preserving the sound of interest impinging from another direction, thereby potentially improving the intelligibility of a target speech signal (i.e. providing spatial filtering). Typically, beamformers in hearing devices, e.g. hearing aids, have beampatterns which are continuously adapted in order to minimize noise components while sound impinging from a target direction is unaltered. Typically, the acoustic properties of the noise signal change over time. Hence, the noise reduction system is implemented as an adaptive system, which adapts the directional beampattern in order to minimize the noise while the target sound (direction) is unaltered.

(67) The noise reduction system (NRS) of FIG. 4A further comprises a voice activity detector (VAD) for repeatedly estimating whether or not, or with what probability, at least one (a majority, or all) of the electric input signals, or a signal or signals derived therefrom, comprise(s) speech. The electric input signals (S.sub.1(k,n), . . . , S.sub.M(k,n)), or at least one of them (or a processed, e.g. beamformed, version thereof), is/are fed to the VAD, and based thereon, a voice activity signal (VA) indicative of whether or not, or with what probability, the electric input signal or signals or processed versions thereof contain speech, is provided. The VA signal is fed to the update unit (UPD-C.sub.noise) for updating noise covariance matrices C.sub.noise. The noise covariance matrices are determined (at a given point in time) from the (noisy) electric input signals (S.sub.1(k,n), . . . , S.sub.M(k,n)) in the absence of speech (assuming that only noise is present in the sound field at such time instants). An updated noise covariance matrix C.sub.noise(k,n) is used by the update filter weights unit (UPD-W), wherein updated filter weights W(k,n) at the given time instant are determined based on the latest noise covariance matrix C.sub.noise(k,n) and an estimate of the current relative or absolute acoustic transfer functions (e.g. arranged in a look vector d(k,n)) from the target sound source to the respective input transducers of the input unit (IU) of the hearing system (or device). The calculation of the noise covariance matrix C.sub.noise(k,n) and the beamformer weights W(k,n) is known from the prior art and is e.g. described in [11] and/or in EP2701145A1. The updated beamformer weights W(k,n) are applied to the electric input signals (S.sub.1(k,n), . . . , S.sub.M(k,n)) in the beamformer (BF), whereby an estimate Ŝ(k,n) of the target signal is provided.
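
By way of a non-limiting illustration, a per-bin update of the noise covariance matrix C.sub.noise(k,n) and of the corresponding MVDR filter weights may be sketched as follows (a minimal numpy sketch; the exponential smoothing scheme, the smoothing factor, and the function names are illustrative assumptions, not prescribed by the present disclosure):

```python
import numpy as np

def update_noise_cov(C_noise, x, update, alpha=0.95):
    """Recursive (exponentially smoothed) noise covariance update for one
    frequency bin k. `x` is the (M,) snapshot of the electric input
    signals; `update` is the gate from the VAD/OVD logic sketched above.
    During own voice activity, `x` would in practice first be passed
    through an own voice cancelling beamformer so that the user's own
    voice does not leak into the noise estimate."""
    if update:
        C_noise = alpha * C_noise + (1.0 - alpha) * np.outer(x, x.conj())
    return C_noise

def mvdr_weights(C_noise, d):
    """MVDR weights w = C^{-1} d / (d^H C^{-1} d) for look vector d
    (cf. eq. (21) of Example 1 below)."""
    Ci_d = np.linalg.solve(C_noise, d)
    return Ci_d / (d.conj() @ Ci_d)
```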

(68) FIG. 4B shows an exemplary input unit (IU) coupled to an exemplary noise reduction system (NRS) according to the present disclosure. The embodiment of FIG. 4B corresponds to the embodiment of FIG. 4A in that it contains the same functional elements. Additionally, however, it contains an own voice detector (OVAD) for repeatedly estimating whether or not, or with what probability, at least one (a majority, or all) of the electric input signals (S.sub.1, . . . , S.sub.M), or a signal derived therefrom, comprises speech originating from the voice of the user. Some acoustic events have distinct directional beampatterns, which can be distinguished from other acoustic events. A hearing device user's own voice is an example of such an event. This is utilized in the present disclosure. By simultaneously monitoring (general) voice presence (indicated by voice activity signal VA from the VAD) and (specifically) own voice presence (indicated by own voice activity signal OVA from the OVAD), another scheme (than general voice absence) for identifying appropriate time segments for updating the noise covariance matrix C.sub.noise(k,n) can advantageously be used. As shown in the examples of FIGS. 1D, 2D, 3D, the noise reduction system according to the present disclosure is configured to update the noise covariance matrix C.sub.noise(k,n) during own voice speech activity (and possibly during general speech absence). The update unit (UPD-C.sub.noise) may e.g. comprise an own voice cancelling beamformer configured to cancel (or attenuate) sounds from the user's mouth, while leaving sounds from other directions unchanged (or less attenuated). The update filter weights unit (UPD-W) may include the function of a (single channel) post filter in that, in addition to spatial filtering of the target signal, noise components are further attenuated by the own voice cancelling beamformer of the update unit (UPD-C.sub.noise). The update filter weights unit (UPD-W) may receive or calculate own voice transfer functions (mouth to microphones), e.g. arranged in a look vector d (cf. input d). The look vector may be determined in advance of or during operation of the hearing device. The look vector may be used in determining the current filter weights. The look vector may represent transfer functions or relative transfer functions to the user's own voice or to an external target sound source, e.g. a target speaker in the environment. Look vectors for the user's own voice as well as for an environment target speaker may be provided to or adaptively determined by the noise reduction system. The noise reduction system (NRS) may comprise a mode select input (Mode) configured to indicate a mode of operation of the system, e.g. of the beamformer(s) and/or the updating strategy, e.g. whether the target signal is the user's own voice or a target signal from the environment of the user (and possibly to indicate a direction to or location of such target sound source). The mode control signal may e.g. be provided from a user interface, e.g. from a remote control device (e.g. implemented as an app on a smartphone or similar device, e.g. a smartwatch or the like). The user interface may comprise a voice control interface (see e.g. FIG. 5C). The mode control signal (Mode) may e.g. be automatically generated, e.g. using one or more sensors, e.g. initiated by the reception of a wireless signal, e.g. from a telephone.
The output of the beamformer (BF) may be an estimate Ŝ.sub.OV of the user's own voice, or an estimate Ŝ.sub.ENV of a target sound from the environment, see e.g. FIG. 5B.

(69) FIG. 5A shows an exemplary block diagram of a hearing device, e.g. a hearing aid (HD), comprising a noise reduction system (NRS) according to an embodiment of the present disclosure. The hearing device comprises an input unit (IU) for picking up sound s.sub.in from the environment and providing a multitude (M) of electric input signals (S.sub.1, . . . , S.sub.M), and a noise reduction system (NRS) for estimating a target signal Ŝ in the input sound s.sub.in based on the electric input signals and optionally further information (e.g. the mode control signal (Mode)), as described in connection with FIGS. 4A, 4B. The hearing aid further comprises a processor (PRO) for applying one or more processing algorithms to a signal of the forward path from input to output transducer (e.g., as here, to the estimate Ŝ of the target signal, provided in a time-frequency representation Ŝ(k,n)). The one or more processing algorithms may e.g. comprise a compression algorithm configured to amplify (or attenuate) a signal according to the needs of the user, e.g. to compensate for a hearing impairment of the user. Other processing algorithms may include frequency transposition, feedback control, etc. The processor provides a processed output (OUT) that is fed to an output unit (OU) for converting the output signal (out) to stimuli s.sub.out perceivable by the user as sound (perceived output sound), e.g. acoustic vibrations (e.g. in air and/or skull bone) or electric stimuli of the cochlear nerve. In a non-hearing-aid application, e.g. a headset application, the processor may be configured to further enhance the signal from the noise reduction system, or may be dispensed with (so that the estimate Ŝ of the target signal is fed directly to the output unit). The target signal may be the user's own voice, and/or a target sound in the environment of the user (e.g. a person (other than the user) speaking, e.g. communicating with the user).

(70) FIG. 5B shows an exemplary block diagram of a hearing device, e.g. a hearing aid (HD), comprising a noise reduction system (NRS) according to an embodiment of the present disclosure in a handsfree telephony mode of operation. The embodiment of FIG. 5B comprises the functional blocks described in connection with the embodiment of FIG. 5A. Specifically, however, the embodiment of FIG. 5B is configured, in a particular communication mode, to implement a wireless headset allowing a user to conduct a spoken communication with a remote communication partner. In the particular communication mode of operation (e.g. a telephone mode), the hearing aid is configured to pick up the user's voice using the electric input signals provided by the input unit (IU.sub.MIC), to provide an estimate Ŝ.sub.OV(k,n) of the user's voice using a noise reduction system NRS1 according to the present disclosure, and to transmit the estimate (own voice audio) to another device (e.g. a telephone or similar device) or system via a synthesis filter bank (FBS) and appropriate transmitter (Tx) and antenna circuitry. Additionally, the hearing aid (HD) comprises an auxiliary audio input (Audio input) configured to receive a direct audio input (e.g. wired or wireless) from another device or system, e.g. a telephone (or similar device). In the embodiment of FIG. 5B, a wirelessly received input (e.g. a spoken communication from a communication partner) is shown to be received by the hearing aid via an antenna and input unit (IU.sub.AUX). The auxiliary input unit (IU.sub.AUX) comprises appropriate receiver circuitry, an analogue-to-digital converter (if appropriate), and an analysis filter bank to provide the audio signal S.sub.aux in a time-frequency representation as frequency sub-band signals S.sub.aux(k,n). The forward path of the hearing aid of FIG. 5B comprises the same components as described for the embodiment of FIG. 5A and additionally a selector-mixer (SEL-MIX) allowing the signal of the forward path (which is processed in the processor (PRO) and presented to the user as stimuli perceivable as sound) to be configurable. Under control of the mode control signal (Mode), the output S.sub.x(k,n) of the selector-mixer (SEL-MIX) can be a) the environment signal Ŝ.sub.ENV(k,n) (e.g. an estimate of a target signal in the environment, or an omni-directional signal, e.g. from one of the microphones), b) the auxiliary input signal S.sub.aux(k,n) from another device, or c) a mixture (e.g. a weighted mixture, possibly configurable, e.g. via a user interface) thereof. Further, compared to the embodiment of FIG. 5A, the forward path of the embodiment of FIG. 5B comprises a synthesis filter bank (FBS) configured to convert a signal in the time-frequency domain, represented by a number of frequency sub-band signals (here the signal OUT(k,n) from the processor (PRO)), to a signal (out) in the time domain. The hearing aid (forward path) further comprises an output transducer (OT) for converting the output signal (out) to stimuli (s.sub.out) perceivable by the user as sound (output sound), e.g. acoustic vibrations (e.g. in air and/or skull bone). The output transducer (OT) may comprise a digital-to-analogue converter as appropriate.

(71) The first noise reduction system (NRS1) is configured to provide an estimate Ŝ.sub.OV of the user's own voice. The first noise reduction system (NRS1) may comprise an own voice maintaining beamformer and an own voice cancelling beamformer. The output of the own voice cancelling beamformer comprises the noise sources when the user speaks.

(72) The second noise reduction system (NRS2) is configured to provide an estimate of a target sound source (e.g. a voice Ŝ.sub.ENV of a speaker in the environment of the user). The second noise reduction system (NRS2) may comprise an environment target source maintaining beamformer and an environment target source cancelling beamformer, and/or an own voice cancelling beamformer. The output of the target cancelling beamformer comprises the noise sources when the target speaker speaks, and the output of the own voice cancelling beamformer comprises the noise sources when the user speaks.

(73) FIG. 5B may also represent an ordinary headset application, e.g. by separating the microphone-to-transmitter path (IU.sub.MIC-Tx) and the direct audio input-to-loudspeaker path (IU.sub.AUX-OT). This may be done in several ways, e.g. by removing the second noise reduction system (NRS2) and the selector-mixer (SEL-MIX), and possibly the synthesis filter bank (FBS) (if the auxiliary input signal S.sub.aux is processed in the time domain), to feed the auxiliary input signal S.sub.aux directly to the processor (PRO), which may or (generally) may not be configured to compensate for a hearing impairment of the user.

(74) FIG. 5C shows an exemplary block diagram of a hearing aid comprising a noise reduction system according to an embodiment of the present disclosure including a voice control interface. The embodiment of FIG. 5C comprises a forward path like that of the embodiment of FIG. 5B, except that the option of including an (e.g. wirelessly received) auxiliary audio signal in the beamformed signal composed of the electric input signals from the input transducers is omitted in the embodiment of FIG. 5C. In another embodiment, the embodiments of FIGS. 5B and 5C may be combined, so that the hearing aid of FIG. 5C additionally comprises the auxiliary input from another device and the option of transmitting the own voice signal to the other device (to implement a communication mode). The initiation (or termination) of the communication mode (e.g. a telephone mode) may e.g. be provided via the voice interface, cf. voice control signal VCtr. In the embodiment of FIG. 5C, the estimate of the user's own voice Ŝ.sub.OV provided by the first noise reduction system (NRS1) is used as input to the voice control interface (VCI). The voice control interface (VCI) may e.g. be activated in dependence of the detection of a wake-word (spoken by the user and extracted from the estimate Ŝ.sub.OV of the user's voice). When the voice control interface is activated, a command word among a number of predefined command words may be extracted, and a control signal (VCtr, xVCtr) may be generated in dependence thereof. Functionality of the hearing aid (e.g. implemented by the processor (PRO)) may be controlled via the voice interface (VCI), cf. signal VCtr. Extracted wake-words (e.g. 'Hey Siri', 'Hey Google' or 'OK Google', 'Alexa', 'X Oticon', etc.) and/or command words may be transmitted to another device (e.g. to a smartphone or other voice-controllable devices), cf. control signal xVCtr, which is transmitted to another device via an (optional) synthesis filter bank (FBS) and antenna and transmitter circuitry (Tx).
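
By way of a non-limiting illustration, the wake-word/command flow of such a voice control interface may be sketched as follows (a minimal sketch; the wake-word, the command set, and the recognizer interface are hypothetical placeholders, not prescribed by the present disclosure):

```python
COMMANDS = {"volume up", "volume down", "telephone mode"}  # illustrative set

class VoiceControlInterface:
    """Minimal two-stage VCI: a wake-word arms the interface, and the next
    recognized phrase is matched against predefined command words; on a
    match it is returned as a control signal (cf. VCtr/xVCtr)."""

    def __init__(self, wake_word: str = "x oticon"):
        self.wake_word = wake_word
        self.armed = False

    def on_phrase(self, phrase: str):
        """Feed phrases recognized from the own voice estimate S_OV."""
        phrase = phrase.lower().strip()
        if not self.armed:
            self.armed = (phrase == self.wake_word)
            return None                 # still waiting for a command
        self.armed = False
        return phrase if phrase in COMMANDS else None
```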

Example 1

(75) In the present application, a maximum likelihood estimator of the noise CPSD matrix that overcomes the limitation of the method presented in [1,4] (e.g. when a prominent interference is present in the acoustic environment) is disclosed. It is proposed to extend the noise CPSD matrix model. In the following, the signal model of the noisy observations in the acoustic scene is presented. Based on the signal model, the proposed ML estimator of the interference-plus-noise CPSD matrix is derived, and the proposed method is exemplified by application to own voice retrieval.

(76) The acoustic scene consists of a user equipped with hearing aids or a headset with access to M>2 microphones. The microphones pick up the sound from the environment, and the noisy signal is sampled into a discrete sequence x.sub.m(t) ∈ ℝ, t ∈ ℕ.sub.0, for all m=1, . . . , M microphones. As illustrated in FIG. 6, the user is active in the acoustic scene, and the desired clean speech signal produced by the user, which we refer to as the own voice, is defined as the discrete sequence s.sub.o(t). The interference is modelled as a point source referred to as v.sub.c(t), and the noise in the acoustic environment is v.sub.e,m(t). The noisy signal picked up by the microphones is then a sum of all three components, i.e.
x_m(t) = s_o(t) * d_{o,m}(t) + v_c(t) * d_m(t, \theta_c) + v_{e,m}(t),  (1)
where * denotes convolution, d.sub.o,m(t) is the relative impulse response between the m'th microphone and the own voice source, and d.sub.m(t, θ.sub.c) is the relative impulse response between the m'th microphone and the interference arriving from direction θ.sub.c ∈ Θ, where we without loss of generality assume that Θ is a discrete set of directions Θ = {−180°, . . . , 180°} with I elements. The objective of the noise reduction system is then to retrieve s.sub.o(t) from the noisy observations x.sub.m(t).

(77) We apply the short-time Fourier transform (STFT) on x.sub.m(t) to transform the noisy signal into the time-frequency (TF) domain with frame length T, decimation factor D, and analysis window w.sub.A(t) such that

(78) x_m(k,n) = \sum_{t=0}^{T-1} w_A(t)\, x_m(t - nD)\, e^{-j 2\pi k t / T},  (2)
is the TF domain representation of the noisy signal, where j = \sqrt{-1}, k is the frequency bin index, and n is the frame index. The signal model for the noisy observation in the TF domain then becomes

(79) x_m(k,n) = s_o(k,n)\, d_{o,m}(k,n) + v_c(k,n)\, d_m(k,n,\theta_c) + v_{e,m}(k,n),  (3)

(80) and, for convenience, we vectorize the noisy observation such that x(k,n) = [x.sub.1(k,n), . . . , x.sub.M(k,n)].sup.T and

(81) x(k,n) = s_o(k,n)\, d_o(k,n) + \underbrace{v_c(k,n)\, d(k,n,\theta_c) + v_e(k,n)}_{v(k,n)}.  (4)
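
By way of a non-limiting illustration, the analysis of eq. (2) and the stacking of eq. (4) may be sketched as follows (a minimal numpy sketch; the sqrt-Hann window and the causal framing convention are illustrative assumptions, not fixed by eq. (2)):

```python
import numpy as np

def stft_multichannel(x, T=512, D=256):
    """Multichannel STFT of eq. (2): x has shape (M, num_samples) and the
    result X has shape (M, K, N), with K = T//2 + 1 one-sided bins."""
    M, L = x.shape
    w_A = np.sqrt(np.hanning(T))             # analysis window w_A(t)
    N = 1 + (L - T) // D                     # number of full frames
    X = np.empty((M, T // 2 + 1, N), dtype=complex)
    for n in range(N):
        frame = w_A * x[:, n * D:n * D + T]  # windowed frame, all channels
        X[:, :, n] = np.fft.rfft(frame, axis=-1)
    return X

# The stacked snapshot x(k, n) of eq. (4) is then simply X[:, k, n].
```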

(82) We further assume that the relative transfer function (RTF) vectors (i.e. d.sub.o(k, n) and d(k, n, θ.sub.c)) remain identical over time, so we may define d.sub.o(k) ≜ d.sub.o(k, n) and d(k, θ.sub.c) ≜ d(k, n, θ.sub.c). In practice, it is often the case that s.sub.o(k, n), v.sub.c(k, n), and v.sub.e(k, n) are uncorrelated random processes, meaning that the CPSD matrix of the noisy observations, C.sub.x(k, n) = E{x(k, n) x.sup.H(k, n)}, is given as

(83) C_x(k,n) = \lambda_s(k,n)\, d_o(k)\, d_o^H(k) + \underbrace{\lambda_c(k,n)\, d(k,\theta_c)\, d^H(k,\theta_c) + \lambda_e(k,n)\, \Gamma_e(k,n)}_{C_v(k,n)},  (5)

(84) where λ.sub.s(k, n), λ.sub.c(k, n), and λ.sub.e(k, n) are the power spectral densities (PSDs) of the own voice, interference, and noise, respectively, and Γ.sub.e(k, n) is the normalized noise CPSD matrix with 1 at the reference microphone index.

(85) We assume that Γ.sub.e(k, n) is a known matrix, which can for approximately isotropic noise fields be modelled as

(86) \Gamma_e(k,n) = \sum_{i=1}^{I} d(k,\theta_i)\, d^H(k,\theta_i).  (6)

(87) We assume that the own voice RTF vector d.sub.o(k) is known, as it can be measured in advance of deployment. The parameters that remain to be estimated are λ.sub.c(k, n), λ.sub.e(k, n), and θ.sub.c; the proposed ML estimators of these parameters are presented in the following.

(88) To estimate the interference-plus-noise PSDs λ.sub.c(k, n) and λ.sub.e(k, n) and the interference direction θ.sub.c, we first apply an own voice cancelling beamformer to obtain an interference-plus-noise-only signal (i.e. a signal in which the own voice is removed while a competing speaker and the background noise remain). The own voice cancelling beamformer is implemented using an own voice blocking matrix B.sub.o(k). A common approach to finding the own voice blocking matrix is to first find the orthogonal projection matrix of d.sub.o(k) and then select the first M−1 column vectors of the projection matrix. More explicitly, let I.sub.M×M be an M×M identity matrix; then I.sub.M×M−1 is the first M−1 column vectors of I.sub.M×M. The own voice blocking matrix is then given as

(89) B_o(k) = \left( I_{M \times M} - \frac{d_o(k)\, d_o^H(k)}{d_o^H(k)\, d_o(k)} \right) I_{M \times M-1},  (7)

(90) where B.sub.o(k) ∈ ℂ.sup.M×(M−1). The own voice blocked signal, z(k, n), can be expressed as

(91) z(k,n) = B_o^H(k)\, x(k,n) = v_c(k,n)\, \underbrace{B_o^H(k)\, d(k,\theta_c)}_{\tilde{d}(k,\theta_c)} + \underbrace{B_o^H(k)\, v_e(k,n)}_{\tilde{v}_e(k,n)},  (8)

(92) and the own voice blocked CPSD matrix is

(93) C_z(k,n) = \lambda_c(k,n)\, \tilde{d}(k,\theta_c)\, \tilde{d}^H(k,\theta_c) + \lambda_e(k,n)\, \tilde{\Gamma}_e(k,n).  (9)
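
By way of a non-limiting illustration, the blocking matrix of eq. (7) and the sample statistics of eqs. (8)-(9) may be computed as follows (a minimal numpy sketch; function names are illustrative):

```python
import numpy as np

def blocking_matrix(d):
    """Own voice blocking matrix B_o(k) of eq. (7): the first M-1 columns
    of the projection onto the orthogonal complement of the own voice
    RTF vector d = d_o(k)."""
    M = d.shape[0]
    P = np.eye(M) - np.outer(d, d.conj()) / (d.conj() @ d)
    return P[:, :M - 1]                     # B_o(k) in C^{M x (M-1)}

def blocked_sample_cpsd(B_o, Z):
    """Own voice blocked snapshots and their sample CPSD matrix (cf.
    eqs. (8)-(9)); Z holds N snapshots of x(k, n), shape (M, N)."""
    z = B_o.conj().T @ Z                    # z(k, n) = B_o^H x(k, n)
    return z @ z.conj().T / Z.shape[1]      # sample estimate of C_z(k, n)
```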

(94) Before presenting the ML estimators of λ.sub.c(k, n), λ.sub.e(k, n), and θ.sub.c, we introduce the own voice-plus-interference blocking matrix B̃(θ.sub.i).

(95) This step is necessary as the ML estimator of the noise PSD, λ.sub.e(k, n), further requires that the interference is removed from the own voice blocked signal z(k, n). Forming the own voice-plus-interference blocking matrix follows a similar procedure to that of forming the own voice blocking matrix. The own voice-plus-interference blocking matrix can be found as

(96) \tilde{B}(\theta_i) = \left( I_{(M-1) \times (M-1)} - \frac{B_o^H\, d(k,\theta_i)\, d^H(k,\theta_i)\, B_o}{d^H(k,\theta_i)\, B_o B_o^H\, d(k,\theta_i)} \right) I_{(M-1) \times (M-2)},  (10)

(97) where B̃(θ.sub.i) ∈ ℂ.sup.(M−1)×(M−2). The own voice-plus-interference blocking matrix B̃(θ.sub.i) is a function of direction, as the direction of the interference is generally unknown. The own voice-plus-interference blocked signal is then

(98) q(k,n) = \tilde{B}^H(\theta_i)\, z(k,n) = v_c(k,n)\, \tilde{B}^H(\theta_i)\, \tilde{d}(k,\theta_c) + \tilde{B}^H(\theta_i)\, \tilde{v}_e(k,n) = \tilde{B}^H(\theta_i)\, \tilde{v}_e(k,n) \text{ if } \theta_i = \theta_c,  (11)

(99) and the blocked own voice-plus-interference CPSD matrix is

(100) C_q(k,n) = \tilde{B}^H(\theta_i)\, C_z(k,n)\, \tilde{B}(\theta_i) = \lambda_e(k,n)\, \tilde{B}^H(\theta_i)\, \tilde{\Gamma}_e(k,n)\, \tilde{B}(\theta_i),  (12)

(101) where the last equality holds only if θ.sub.i=θ.sub.c.

(102) It is common to assume that the own voice, interference, and noise are temporally uncorrelated [6]. Under this assumption, the own voice blocked signal is distributed according to a circularly symmetric complex Gaussian distribution, i.e. z(k, n) ~ 𝒩.sub.ℂ(0, C.sub.z(k, n)), meaning that the likelihood function for N observations of z(k, n), with Z(k, n)=[z(k, n−N+1), . . . , z(k, n)] ∈ ℂ.sup.(M−1)×N, is given as

(103) f(Z(k,n) \mid \theta_i, \lambda_c(k,n), \lambda_e(k,n)) = \frac{\exp\left(-N\, \mathrm{tr}\left(\hat{C}_z(k,n)\, C_z^{-1}(k,n,\theta_i)\right)\right)}{\pi^{MN}\, \left| C_z(k,n,\theta_i) \right|^{N}},  (13)
where tr(⋅) denotes the trace operator and

(104) \hat{C}_z(k,n) = \frac{1}{N}\, Z(k,n)\, Z^H(k,n)
is the sample estimate of the own voice blocked CPSD matrix. ML estimators of the interference-plus-noise PSDs λ.sub.c(k, n) and λ.sub.e(k, n) have been derived in [1,4]. The ML estimator of λ.sub.e(k, n) is given as

(105) \hat{\lambda}_e(k,n,\theta_i) = \frac{1}{M-2}\, \mathrm{tr}\left( \hat{C}_q(k,n,\theta_i) \left( \tilde{B}^H(\theta_i)\, \tilde{\Gamma}_e(k,n)\, \tilde{B}(\theta_i) \right)^{-1} \right),  (14)
with

(106) \hat{C}_q(k,n) = \frac{1}{N}\, \tilde{B}^H(\theta_i)\, Z(k,n)\, Z^H(k,n)\, \tilde{B}(\theta_i)
being the sample covariance of the own voice-plus-interference blocked signal, and the ML estimator of the interference PSD is then given as [7]
\hat{\lambda}_c(k,n,\theta_i) = \tilde{w}^H(\theta_i) \left( \hat{C}_z(k,n) - \hat{\lambda}_e(k,n,\theta_i)\, \tilde{\Gamma}_e(k,n) \right) \tilde{w}(\theta_i),  (15)
where w̃(θ.sub.i) is the MVDR beamformer constructed in the own voice blocked domain, i.e.

(107) \tilde{w}(\theta_i) = \frac{\tilde{\Gamma}_e^{-1}(k,n)\, \tilde{d}(k,\theta_i)}{\tilde{d}^H(k,\theta_i)\, \tilde{\Gamma}_e^{-1}(k,n)\, \tilde{d}(k,\theta_i)}.  (16)
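
By way of a non-limiting illustration, the estimators of eqs. (14)-(16) may be implemented per TF bin and per candidate direction as follows (a minimal numpy sketch; the non-negativity clamping is a practical safeguard, not part of the derivation):

```python
import numpy as np

def ml_psd_estimates(C_z_hat, B_tilde, Gamma_e_tilde, d_tilde, M):
    """ML estimates of the noise PSD (eq. (14)) and interference PSD
    (eq. (15)) for one TF bin and one candidate direction theta_i.

    C_z_hat       : (M-1, M-1) sample CPSD of the own voice blocked signal
    B_tilde       : (M-1, M-2) own voice-plus-interference blocking matrix
    Gamma_e_tilde : (M-1, M-1) blocked normalized noise CPSD matrix
    d_tilde       : (M-1,) blocked interference RTF for direction theta_i
    """
    C_q = B_tilde.conj().T @ C_z_hat @ B_tilde          # cf. eq. (106)
    G_q = B_tilde.conj().T @ Gamma_e_tilde @ B_tilde
    lam_e = np.trace(C_q @ np.linalg.inv(G_q)).real / (M - 2)       # eq. (14)
    Gi_d = np.linalg.solve(Gamma_e_tilde, d_tilde)
    w = Gi_d / (d_tilde.conj() @ Gi_d)                              # eq. (16)
    lam_c = (w.conj() @ (C_z_hat - lam_e * Gamma_e_tilde) @ w).real # eq. (15)
    return max(lam_e, 0.0), max(lam_c, 0.0)
```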

(108) Inserting the ML estimates {circumflex over (λ)}.sub.e (k, n, θ.sub.i) and {circumflex over (λ)}.sub.c(k, n, θ.sub.i) into the likelihood function, we obtain the concentrated likelihood function ƒ(Z(k, n)|θ.sub.i, {circumflex over (λ)}.sub.c(k, n, θ.sub.i), {circumflex over (λ)}.sub.e (k, n, θ.sub.i) which we simplify to ƒ(Z (k, n)|θ.sub.i). It is common to maximize the log-likelihood function by applying the natural logarithmic function to the concentrated likelihood function. It can then be shown that the concentrated log-likelihood function is proportional to [8,9]
$$\ln f(Z(k,n) \mid \theta_i) \propto -\ln\left| \hat{\lambda}_c(k,n,\theta_i)\, \tilde{d}(k,\theta_i)\, \tilde{d}^H(k,\theta_i) + \hat{\lambda}_e(k,n,\theta_i)\, \tilde{\Gamma}_e(k,n) \right|. \qquad (17)$$

(109) Under the assumptions that only a single interference is present in the acoustic environment and that the noisy observations across frequency bins are uncorrelated, a wideband concentrated log-likelihood function can be derived as

(110) $$\ln f(Z(1,n), \ldots, Z(K,n) \mid \theta_i) = \sum_{k=1}^{K} \ln f(Z(k,n) \mid \theta_i), \qquad (18)$$

(111) where $K$ is the total number of frequency bins of the one-sided spectrum. To obtain the ML estimate of the interference direction, we maximize the function

(112) $$\hat{\theta}_c = \underset{\theta_i \in \Theta}{\mathrm{argmax}}\; \ln f(Z(1,n), \ldots, Z(K,n) \mid \theta_i). \qquad (19)$$

(113) As $\theta_i$ belongs to a discrete set of directions $\Theta$, the ML estimate of $\theta_c$ is obtained through an exhaustive search over $\theta_i$. Finally, to obtain an estimate of the interference-plus-noise CPSD matrix, we insert the ML estimates into the interference-plus-noise CPSD model, i.e.

(114) $$\hat{C}_v(k,n) = \hat{\lambda}_c(k,n,\hat{\theta}_c)\, d(k,\hat{\theta}_c)\, d^H(k,\hat{\theta}_c) + \hat{\lambda}_e(k,n,\hat{\theta}_c)\, \Gamma_e(k,n). \qquad (20)$$
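
To illustrate eqs. (17)-(20), the following sketch evaluates the concentrated log-likelihood per bin and performs the exhaustive search over the discrete candidate set $\Theta$. It assumes the per-bin, per-direction PSD estimates have already been computed (e.g. with the sketch above); the function names are hypothetical and the code is illustrative only.

```python
import numpy as np

def concentrated_loglik(lam_c, lam_e, d_tilde, Gamma_e_blocked):
    """Eq. (17): per-bin concentrated log-likelihood, up to constants."""
    C_model = lam_c * np.outer(d_tilde, d_tilde.conj()) + lam_e * Gamma_e_blocked
    _, logdet = np.linalg.slogdet(C_model)
    return -logdet

def exhaustive_search(loglik):
    """Eqs. (18)-(19): sum the per-bin log-likelihoods over the K frequency
    bins and return the index of the maximizing candidate direction.

    loglik : array of shape (K, len(Theta)) of per-bin values from eq. (17)
    """
    wideband = loglik.sum(axis=0)        # eq. (18)
    return int(np.argmax(wideband))      # eq. (19)

def interference_plus_noise_cpsd(lam_c, lam_e, d, Gamma_e):
    """Eq. (20): plug the ML estimates at theta_c_hat back into the model."""
    return lam_c * np.outer(d, d.conj()) + lam_e * Gamma_e
```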

(115) For own-voice retrieval, we implement the MWF beamformer. It is well known that the MWF can be decomposed into an MVDR beamformer followed by a single-channel Wiener post-filter [10]. The MVDR beamformer is given as

(116) $$w_{\mathrm{MVDR}}(k,n) = \frac{\hat{C}_v^{-1}(k,n)\, d_o(k)}{d_o^H(k)\, \hat{C}_v^{-1}(k,n)\, d_o(k)}, \qquad (21)$$

(117) and the single-channel Wiener post-filter is

(118) $$g(k,n) = 1 - \frac{w_{\mathrm{MVDR}}^H(k,n)\, \hat{C}_v(k,n)\, w_{\mathrm{MVDR}}(k,n)}{w_{\mathrm{MVDR}}^H(k,n)\, \hat{C}_x(k,n)\, w_{\mathrm{MVDR}}(k,n)}. \qquad (22)$$

(119) The MWF beamformer coefficients are then found as
$$w_{\mathrm{MWF}}(k,n) = w_{\mathrm{MVDR}}(k,n)\, g(k,n). \qquad (23)$$

(120) Finally, the own-voice signal can be estimated as a linear combination of the noisy observations using the beamformer weights, i.e.
$$y(k,n) = w_{\mathrm{MWF}}^H(k,n)\, x(k,n). \qquad (24)$$

(121) The enhanced TF-domain signal $y(k,n)$ is then transformed back into the time domain using the inverse STFT, such that $y(t)$ is the retrieved own-voice time-domain signal.
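
The beamforming stage of eqs. (21)-(24) can be sketched per TF bin as follows (Python/NumPy, illustrative only; `mwf_weights` and `apply_beamformer` are hypothetical names, and $\hat{C}_x(k,n)$ is assumed to be a sample estimate of the noisy-input CPSD matrix). The inverse STFT back to $y(t)$ can then be performed with any standard routine, e.g. `scipy.signal.istft`.

```python
import numpy as np

def mwf_weights(C_v_hat, C_x_hat, d_o):
    """Eqs. (21)-(23): MWF as an MVDR beamformer followed by a
    single-channel Wiener post-filter, for one TF bin.

    C_v_hat : (M, M) interference-plus-noise CPSD estimate, eq. (20)
    C_x_hat : (M, M) noisy-input CPSD estimate (an assumption of this sketch)
    d_o     : (M,)  own-voice steering vector
    """
    x = np.linalg.solve(C_v_hat, d_o)
    w_mvdr = x / (d_o.conj() @ x)                           # eq. (21)
    noise_out = np.real(w_mvdr.conj() @ C_v_hat @ w_mvdr)   # residual noise power
    total_out = np.real(w_mvdr.conj() @ C_x_hat @ w_mvdr)   # total output power
    g = 1.0 - noise_out / total_out                         # eq. (22)
    return w_mvdr * g                                       # eq. (23)

def apply_beamformer(w, x):
    """Eq. (24): y(k, n) = w^H x(k, n)."""
    return w.conj() @ x
```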

(122) It is intended that the structural features of the devices described above, in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

(123) As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

(124) It should be appreciated that reference throughout this specification to “one embodiment,” “an embodiment,” or “an aspect,” or to features described as “may” be included, means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

(125) The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.

(126) Accordingly, the scope should be judged in terms of the claims that follow.

REFERENCES

(127)
[1] U. Kjems and J. Jensen, “Maximum likelihood based noise covariance matrix estimation for multimicrophone speech enhancement,” in Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Aug. 2012, pp. 295-299.
[2] Y. Gu and A. Leshem, “Robust adaptive beamforming based on interference covariance matrix reconstruction and steering vector estimation,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3881-3885, Jul. 2012.
[3] R. C. Hendriks and T. Gerkmann, “Estimation of the noise correlation matrix,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 4740-4743.
[4] J. Jensen and M. S. Pedersen, “Analysis of beamformer directed single-channel noise reduction system for hearing aid applications,” in Proc. IEEE ICASSP, South Brisbane, Queensland, Australia, Apr. 2015, pp. 5728-5732.
[5] M. Souden, J. Chen, J. Benesty, and S. Affes, “An integrated solution for online multichannel noise tracking and reduction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2159-2169, Sep. 2011.
[6] K. L. Bell, Y. Ephraim, and H. L. Van Trees, “A Bayesian approach to robust adaptive beamforming,” IEEE Transactions on Signal Processing, vol. 48, no. 2, pp. 386-398, Feb. 2000.
[7] A. Kuklasiński, S. Doclo, T. Gerkmann, S. H. Jensen, and J. Jensen, “Multi-channel PSD estimators for speech dereverberation - a theoretical and experimental comparison,” in Proc. IEEE ICASSP, South Brisbane, Queensland, Australia, Apr. 2015, pp. 91-95.
[8] M. Zohourian, G. Enzner, and R. Martin, “Binaural speaker localization integrated into an adaptive beamformer for hearing aids,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 3, pp. 515-528, Mar. 2018.
[9] H. Ye and D. DeGroat, “Maximum likelihood DOA estimation and asymptotic Cramér-Rao bounds for additive unknown colored noise,” IEEE Transactions on Signal Processing, vol. 43, no. 4, pp. 938-949, Apr. 1995.
[10] M. Brandstein and D. Ward (eds.), Microphone Arrays: Signal Processing Techniques and Applications, Springer, 2001.
[11] EP2701145A1 (Retune, Oticon) 26 Feb. 2014.