HEARING DEVICE COMPRISING A LOW COMPLEXITY BEAMFORMER
20230186934 · 2023-06-15
Assignee
Inventors
- Jan M. DE HAAN (Smørum, DK)
- Robert REHR (Smørum, DK)
- Sebastien CURDY-NEVES (Ballerup, DK)
- Svend FELDT (Ballerup, DK)
- Jesper Jensen (Smørum, DK)
- Michael Syskind Pedersen (Smørum, DK)
- Michael Noes GÄTKE (Smørum, DK)
- Mohammad EL-SAYED (Smørum, DK)
- Stig PETRI (Ballerup, DK)
- Karsten BONKE (Smørum, DK)
- Gary Jones (Smørum, DK)
- Poul Hoang (Smørum, DK)
CPC classification
H04R2430/20
ELECTRICITY
H04R25/407
ELECTRICITY
International classification
G10K11/178
PHYSICS
H04R1/10
ELECTRICITY
Abstract
A hearing device includes a) a multitude of input transducers providing a corresponding multitude of electric input signals; and b) a processor for providing a processed signal in dependence of the electric input signals. The processor includes b1) a beamformer for providing a spatially filtered signal in dependence of electric input signals and beamformer filter coefficients determined in dependence of a fixed steering vector including as elements respective acoustic transfer functions from a target signal source, to each of said multitude of input transducers; and b2) a target adaptation module connected to the input transducers and to at least one beamformer, the target adaptation module being configured to provide compensation signals to compensate the electric input signals so that they match the fixed steering vector.
Claims
1. A hearing device configured to be worn by a user, the hearing device comprising a multitude of input transducers, each providing an electric input signal representing sound in the environment of the hearing device, thereby providing a corresponding multitude of electric input signals; a processor for providing a processed signal in dependence of said multitude of electric input signals, the processor comprising at least one beamformer for providing a spatially filtered signal in dependence of said electric input signals, or signals originating therefrom, and beamformer filter coefficients, said beamformer filter coefficients being determined in dependence of a fixed steering vector comprising as elements respective acoustic transfer functions from a target signal source providing a target signal to each of said multitude of input transducers, or acoustic transfer functions from a reference input transducer among said multitude of input transducers to each of the remaining input transducers; and a target adaptation module connected to said multitude of input transducers and to said at least one beamformer, said target adaptation module being configured to provide compensation signal(s) to compensate said multitude of electric input signals so that they match said fixed steering vector.
2. A hearing device according to claim 1 wherein the target adaptation module comprises at least one adaptive filter for estimating said compensation signal(s).
3. A hearing device according to claim 2 wherein the at least one adaptive filter of the target adaptation module is configured to adaptively determine at least one correction factor to be applied to said electric input signals to provide said compensation signal(s).
4. A hearing device according to claim 2 comprising a voice activity detector for estimating whether or not, or with what probability, an input signal comprises a voice signal at a given point in time, and wherein the at least one adaptive filter is controlled by said voice activity detector.
5. A hearing device according to claim 2 wherein said at least one adaptive filter of the target adaptation module comprises an adaptive algorithm and a variable filter, wherein the adaptive algorithm comprises a step size parameter, and wherein the adaptive algorithm is configured to determine a sign of the step size parameter.
6. A hearing device according to claim 5 wherein the adaptive algorithm of the target adaptation module is a complex sign Least Mean Squares algorithm, and wherein the adaptive algorithm is configured to determine the sign of the step size parameter in dependence of ‘the electric input signal’ and the error signal.
7. A hearing device according to claim 1 wherein the processor is configured to minimize an error between a given current electric input signal from a given non-reference input transducer and the electric, reference, input signal from the reference microphone as modified by the steering vector of the at least one beamformer, to thereby compensate said multitude of electric input signals so that they match said fixed steering vector.
8. A hearing device according to claim 1 wherein said matching of the fixed steering vector comprises matching a complex-valued steering vector.
9. A hearing device according to claim 8 wherein said matching of the complex steering vector comprises matching the real and imaginary part separately.
10. A hearing device according to claim 8 wherein said matching of the complex steering vector comprises matching a), a1) a magnitude, or a2) a magnitude squared, or b) the phase of the steering vector, or both a) and b).
11. A hearing device according to claim 3 wherein the at least one beamformer comprises an own voice beamformer, and wherein the target adaptation module comprises an own voice-only detector configured to determine when said at least one correction factor is updated.
12. A hearing device according to claim 1 wherein the processor is configured to apply one or more processing algorithms to the multitude of electric input signals, or to one or more signals, originating therefrom.
13. A hearing device according to claim 1 wherein said at least one beamformer comprises a time invariant, target-maintaining beamformer and a time invariant, target-cancelling beamformer, respectively.
14. A hearing device according to claim 1 further comprising a noise canceller comprising an adaptive filter for estimating an adaptive noise reduction parameter and providing a noise reduced target signal.
15. A hearing device according to claim 14 wherein the adaptive algorithm of the adaptive filter of the noise canceller comprises the complex sign Least Mean Squares algorithm, and wherein the adaptive algorithm is configured to determine the sign of the step size parameter in dependence of output of the target-cancelling beamformer and the noise reduced target signal.
16. A hearing device according to claim 1 comprising a post filter providing a resulting noise reduced signal exhibiting a further reduction of noise in the target signal in dependence of the spatially filtered signals and optionally one or more further signals.
17. A hearing device according to claim 1 comprising an output transducer for converting said processed signal to stimuli perceivable by the user as sound.
18. A hearing device according to claim 1 being constituted by or comprising a hearing aid.
19. A method of operating a hearing device configured to be worn by a user, the method comprising providing a multitude of electric input signals representing sound in the environment of the hearing device, providing a processed signal in dependence of said multitude of electric input signals, at least by providing a spatially filtered signal in dependence of said electric input signals, or signals originating therefrom, and beamformer filter coefficients, said beamformer filter coefficients being determined in dependence of a fixed steering vector comprising as elements respective acoustic transfer functions from a target signal source providing a target signal to each of said multitude of input transducers, or acoustic transfer functions from a reference input transducer among said multitude of input transducers to each of the remaining input transducers; and by providing compensation signal(s) to compensate said multitude of electric input signals so that they match said fixed steering vector.
20. A non-transitory computer readable medium storing a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of claim 19.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0127] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter.
[0140] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
[0141] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
[0142] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
[0143] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
[0144] The present application relates to the field of hearing devices, e.g. hearing assistive devices, such as headsets or hearing aids.
[0145] In hearing assistive devices, it is desirable to capture and enhance speech for different applications.
[0146] An efficient way of enhancing speech is to use multichannel noise reduction techniques such as beamforming. The purpose of the beamforming system is two-fold: pass the speech signal without distortion, while suppressing the less important background noise to a certain level.
[0147] A time-invariant beamformer may be a good baseline for a noise reduction system, if it is possible to make reasonable prior assumptions about the target and the background noise. In a hearing aid system, it may be a fair assumption that the target is impinging from the front of the user wearing the hearing aid system. In a headset use case, on the other hand, it is a fair assumption that wanted (target) speech is coming from the user's mouth and that all sources in other directions and distances are assumed to be noise sources.
[0148] In a speakerphone use case, target speech may generally impinge on the microphones from any direction (which may dynamically change). In a speakerphone, a multitude (e.g. four) of fixed directions may be defined and a fixed beamformer be implemented for each direction.
[0150] The leftmost part of
[0151] After (‘downstream of’) the input stage denoted ‘SIGNAL MODEL’ a section termed ‘BEAMFORMER’ is included in
[0152] w=C.sub.v.sup.−1d/(d.sup.HC.sub.v.sup.−1d), where C.sub.v is the (inter-microphone) noise covariance matrix for the current noise field (e.g. based on an assumption, e.g. isotropy, of the noise). In MVDR beamforming, e.g., the microphone signals are processed such that the sound impinging from a target direction at a chosen reference microphone is unaltered (‘distortionless’) by the beamformer. In the embodiment of
[0153] 1) Robust Time-Invariant Beamformer:
[0154] The purpose of the exemplary time-invariant beamformer shown in
[0155] The term “on average” is taken to mean that acoustical and device variations are considered (taken into account). This could be variations related to device placement, individual head and torso acoustics (user variations, head size, ears, motion, vibrations, etc.), or variations in device and production tolerances (microphone sensitivity, assembly, plastics, ageing, deformation, etc.). “On average” may be taken to mean that we do not adapt to individual differences but rather estimate a set of parameters which has the best performance across different variations. If we only have one set of parameters (weights), we aim at a high average performance for most individuals rather than possibly achieving even higher performance for a few and lower performance for many.
[0156] Additionally, this embodiment of a time-invariant beamformer requires an assumption on the noise field. If no specific assumptions can be made, the uncorrelated noise (i.e., microphone noise) and/or isotropic noise field (noise is equally likely and occurs with the same intensity from any direction) assumption is often used.
[0157] An initial representation of the actual noise field is obtained by a robust target-cancelling beamformer w.sub.tc, i.e., a spatial filter/beamformer which “on average” provides as much attenuation of the target component as possible, leaving the rest of the input sound field unaltered as much as possible. This provides a good representation of the background noise as input to an adaptive noise canceller. This is illustrated in
[0158] w.sub.tc=[−d.sub.2*, 1].sup.T,
[0159] where d is an acoustic transfer function vector for sound from the target signal source to the microphones (M.sub.1, M.sub.2) of the hearing device (e.g. comprising relative transfer functions (RTF or d) for propagation of sound from the target sound source, relative to the reference microphone (M.sub.1)). In the two-microphone example of
[0160] Time-invariant beamformers may e.g. be designed using the Minimum Variance Distortionless Response (MVDR) objective with an average steering vector and an uncorrelated or isotropic noise assumption. Regarding the meaning of an ‘average steering vector’, it may refer to an average across users' heads, wearing styles, etc., as e.g. indicated above regarding the term ‘on average’ and in the next paragraph regarding the MVDR formula. More general objective functions may be formulated for robustness against steering vector variations. Such an objective function can be solved by numeric optimization methods, where data and/or models of variability are employed.
[0161] The MVDR formula for determining beamformer filter coefficients,
w=C.sub.v.sup.−1d/(d.sup.HC.sub.v.sup.−1d),
[0162] requires the steering vector d as input parameter. The steering vector represents a transfer function between a reference microphone and the other microphones for a given impinging sound source.
[0163] The transfer function may include head-related impulse responses, i.e. taking into account that the microphones are placed on a head, e.g. on a hearing aid shell mounted behind the pinna or in the pinna.
[0164] An average steering vector d may represent a transfer function estimated across an average head. Or it may represent a transfer function which on average across individuals performs well, e.g. in terms of maximizing the directivity (or other performance parameters) across individuals.
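As an illustrative sketch (not the patent's implementation), the MVDR weight computation discussed above may be written in a few lines of numpy; the covariance values and the steering vector below are arbitrary assumptions chosen only to exercise the formula:

```python
import numpy as np

def mvdr_weights(C_v, d):
    """MVDR weights w = C_v^{-1} d / (d^H C_v^{-1} d).

    C_v : (M, M) noise covariance matrix (Hermitian, positive definite).
    d   : (M,) steering vector with d[0] = 1 for the reference microphone.
    """
    Cinv_d = np.linalg.solve(C_v, d)       # C_v^{-1} d without forming the inverse
    return Cinv_d / (d.conj() @ Cinv_d)    # normalize for the distortionless constraint

# Illustrative 2-microphone example (assumed values):
d = np.array([1.0, 0.8 * np.exp(1j * 0.4)])               # relative transfer function
C_v = np.array([[1.0, 0.3], [0.3, 1.0]], dtype=complex)   # assumed noise covariance
w = mvdr_weights(C_v, d)

distortionless = np.conj(w) @ d   # ≈ 1+0j: target from direction d passes unaltered
```

The check at the end verifies the defining MVDR property w.sup.Hd=1 used throughout this section.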
[0165] 2) Noise Field Adaptation:
[0166] The noise field adaption may be seen as an add-on to the time-invariant (fixed) beamformer in section 1) above. Since the time-invariant beamformer is optimal for uncorrelated noise or isotropic noise fields, noise field adaptation may be employed to achieve a more optimal beamformer with respect to the actual noise field. This requires adaptation to the noise field.
[0167] An adaptive noise cancelling system may be employed, where the output (b) of the target-cancelling beamformer (w.sub.tc.sup.H) is filtered (cf. multiplication unit (‘x’) and adaptive parameter (β*), where * in β* indicates complex conjugation) such that it provides an estimate of the noise component (NE) in the output of the time-invariant beamformer (w.sup.H) from section 1 and
[0168] The time-invariant beamformer may be defined by
w=C.sub.v.sup.−1d/(d.sup.HC.sub.v.sup.−1d),
[0169] where C.sub.v is a diagonal matrix. Thereby a solution which minimizes internal (microphone) noise is provided.
[0171] The filter coefficients (of the filter applied to the microphone signals; i.e. the resulting weights applied to each microphone signal are the (frequency-dependent) filter coefficients) may, e.g., be adapted using a complex sign LMS algorithm (denoted ‘SIGN LMS’ in
[0172] The adaptive SIGN LMS algorithm may e.g. provide the adaptive parameter according to the following recursive expression:
β.sub.l+1=β.sub.l+μ sign(y*)sign(b)
[0173] where l is a time index, μ is a step size of the adaptive algorithm and
[0174] The sign of a complex value x.sub.c is here defined as:
sign(x.sub.c)=sign(Re(x.sub.c))+jsign(Im(x.sub.c)),
[0175] where the sign of a real value x.sub.r is defined as sign(x.sub.r)=+1 if x.sub.r≥0, and −1 otherwise.
[0176] The real and imaginary parts of the complex sign, sign(Re(x.sub.c)) and sign(Im(x.sub.c)), can only take on the values −1 or +1.
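A minimal sketch of the complex sign operation and of one step of the β recursion above; the sample values are arbitrary assumptions, and mapping a zero real or imaginary part to +1 is our choice of convention:

```python
def csign(x):
    """Complex sign: sign(Re(x)) + j*sign(Im(x)); each part is -1 or +1."""
    return complex(1.0 if x.real >= 0 else -1.0,
                   1.0 if x.imag >= 0 else -1.0)

# One update step of beta_{l+1} = beta_l + mu * sign(y*) * sign(b):
mu = 0.01
beta = 0.2 + 0.1j    # current adaptive parameter (assumed state)
y = 0.5 - 0.3j       # noise-reduced output sample (assumed)
b = -0.1 + 0.4j      # target-cancelling beamformer output sample (assumed)
beta = beta + mu * csign(y.conjugate()) * csign(b)
```

Every csign value has magnitude √2, so the update magnitude is bounded by 2μ independently of the signal levels; no multiplications by signal magnitudes and no divisions are required.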
[0177] The notation used above for beamformers (w.sup.H, w.sub.tc.sup.H) and adaptive parameter β* is the common academic textbook notation. This means that the filter operations of the type y=w.sup.Hx are implemented as y={tilde over (w)}.sup.T x where {tilde over (w)}=w*, i.e. the weights are pre-conjugated.
[0178] Furthermore, the adaptation of filter weights is done such that they compute conjugated weights. So, the complex sign-sign adaptation of beta in an implementation will compute the conjugated beta {tilde over (β)}:
{tilde over (β)}.sub.l+1={tilde over (β)}.sub.l+μ sign(y)sign(b*)
[0179] The purpose of this is to reduce the number of conjugation operations (to thereby reduce computational complexity, which is important for miniature devices, such as hearing aids).
[0180] The NLMS update of beta is given by
β.sub.l+1=β.sub.l+μy*b/|b|.sup.2.
[0181] This calculation requires a division (which is computationally expensive and best avoided). If we solely consider the sign, sign(y*b)=sign(y*)sign(b), as proposed in the present disclosure, we still get the gradient direction correct. However, the gradient step size may not be optimal. An advantage is thus a decreased computational complexity (by avoiding the division operation).
[0182] As the proposed algorithm adapts to the noise, we get an improved noise estimate compared to a set of fixed weights, which is only optimal for the “average” noise.
[0183] The accuracy of the filter coefficients may be improved by only updating them in noise-only periods. In order to achieve this, a negated target detector output (cf.
[0184] The voice detector/own voice detector may be frequency band specific, or it may be implemented as a broad band detector (at a given time having the same value for all frequency bands).
[0185] If the time-invariant beamformer was designed without a steering vector (i.e., by using other objective functions than the MVDR), a d.sub.2 value may be computed for any 2-microphone time-invariant beamformer w using
d.sub.2=(1−w.sub.1*)/w.sub.2*,
where d.sub.1=1 and w.sup.Hd=1. The corresponding target-cancelling beamformer may be found by computing
w.sub.tc=[−d.sub.2*, 1].sup.T,
[0186] where d=[1, d.sub.2].sup.T
[0187] The formula for the beamformer weights of an MVDR beamformer,
w=C.sub.v.sup.−1d/(d.sup.HC.sub.v.sup.−1d),
[0188] is a general formula, which is valid for M microphones. It can also be generalized to the case where a noise estimate is subtracted from the distortionless signal (often termed a generalized sidelobe canceller, GSC), as described in the following.
[0190] The above equation for w.sub.tc is actually a special case for M=2 of the target cancelling beamformer, where the adaptive beamformer weights are defined as
w.sub.GSC(k)=a−Bβ,
[0191] where a typically is a time-invariant M×1 delay-and-sum beamformer vector not altering the target signal direction, B is a time-invariant blocking matrix of size M×(M−1), and β is an (M−1)×1 adaptive filtering vector.
[0192] Matrix B is found by taking M−1 columns from matrix H, which is defined as:
H=I−dd.sup.H/(d.sup.Hd).
[0193] The optimal adaptive coefficients are given by (cf. e.g. [4])
β=(B.sup.HC.sub.vB).sup.−1B.sup.HC.sub.va,
[0194] where a and B are orthogonal to each other, i.e. a.sup.HB=0.sub.1×(M−1), and β is updated when speech is not present. The optimal beamformer weights are thus calculated as
w.sub.GSC(k)=a−B(B.sup.HC.sub.vB).sup.−1B.sup.HC.sub.va.
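The GSC construction above can be sketched numerically. The projection-based choices of H, B and a below are our assumptions (one construction consistent with the stated constraints a.sup.HB=0 and B.sup.Hd=0, with a distortionless a), not necessarily the patent's exact design; with these choices the resulting GSC weights coincide with the closed-form MVDR weights:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 3
d = np.array([1.0, 0.7 + 0.2j, 0.5 - 0.4j])   # assumed steering vector, reference mic first

# Project out the d-direction; columns of H are orthogonal to d.
H = np.eye(M) - np.outer(d, d.conj()) / np.real(d.conj() @ d)
B = H[:, :M - 1]                 # blocking matrix: M-1 columns of H, so B^H d = 0
a = d / np.real(d.conj() @ d)    # delay-and-sum-type vector with a^H d = 1

# Assumed noise covariance (Hermitian, positive definite).
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
C_v = A @ A.conj().T + M * np.eye(M)

# Optimal adaptive coefficients beta = (B^H C_v B)^{-1} B^H C_v a and GSC weights.
beta = np.linalg.solve(B.conj().T @ C_v @ B, B.conj().T @ C_v @ a)
w_gsc = a - B @ beta

# For comparison: the closed-form MVDR weights.
Cinv_d = np.linalg.solve(C_v, d)
w_mvdr = Cinv_d / (d.conj() @ Cinv_d)
```

Since w=a−Bβ sweeps exactly the set of weight vectors satisfying w.sup.Hd=1, minimizing the output noise power over β reproduces the MVDR solution.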
[0195] For the M>2 case, the adaptive term β may also be estimated by a gradient update.
[0196] The complex sign-sign LMS update equation for the (M−1)×1 beta vector in the M>2 case is given by:
β.sub.l+1=β.sub.l+μ sign(y*) sign(b)
[0197] where b=B.sup.Hx, and where a and B are fixed and β is adaptive.
[0198] A disadvantage of a noise field adaptation is that any robustness errors of the time-invariant beamformers will be exaggerated, so the performance improvement of the noise field adaptation may be reduced dependent on how well the acoustic situation matches the time-invariant beamformers. In order to improve this behavior, the target steering adaptation as described below may be introduced.
[0199] 3) Target Steering Adaptation:
[0200] The target steering adaptation may be seen as an add-on to the beamformer systems described in sections 1) and 2) above. The main idea is to filter the microphone signal(s) in such a way that the target component in the signals at the microphones acoustically matches the signal model (look vector) used to design the time-invariant beamformer. In other words, the purpose of the correction is to realign the signal in phase to meet the original beamformer design.
[0201] The main purpose of the target steering adaptation stage is to compensate for the acoustical and device variations to achieve improved capturing of the target speech and reduce the loss of the target signal. Furthermore, this compensation will improve the target-cancelling beamformer of the system described in section 2) above, in such way that the target signal is attenuated more.
[0202] The solution is related to look vector estimation for beamforming, but instead of computing a new beamformer based on an estimated steering vector, it is proposed that the inputs to an existing beamformer are compensated to match the look vector of the existing beamformer.
[0203] The solution comprises correction filters on all microphones except for the reference microphone. The correction filters are adapted using a complex sign LMS algorithm, where the error signal is computed using the steering vector of the fixed beamformer from section 1) above. The error signal quantifies the deviation between the actual acoustics compared to the signal model which is assumed by the beamformer.
[0204] In principle, the update of the compensation filter is only done when the microphone signal consists of the noise-free target signal. In practice, the update is performed, when it is most likely that the target signal is dominant. This is achieved by using a target speech detector.
[0205] A target speech detector may be based on the ratio of the target- and target-cancelling-beamformer output powers. In the case of own voice enhancement, the magnitude of the error signal can be employed for characterization of the input, i.e., if the magnitude of the error signal is large, it is unlikely that the input speech is the user's own voice (it might instead be an undesired external speech source).
[0207] The algorithm requires the steering vector d, which is the time-invariant beamformer's steering vector. For a time-invariant beamformer with more than 2 microphones, the steering vector is the vector d that fulfils w.sup.Hd=1 and B.sup.Hd=0, where
H=I−dd.sup.H/(d.sup.Hd),
[0208] and B is obtained by taking (any) M−1 (of the M) columns of H.
[0209] The purpose of the target estimation is to monitor how much the target signal deviates from the look vector which was used to compute the time invariant beamformers. This is done by computing an error signal corresponding to microphone signals 2, . . . , M.
e.sub.m=x.sub.1−c.sub.m*x.sub.m, for m=2, . . . , M
[0210] where x.sub.m denotes the m-th microphone signal and c.sub.m is a complex microphone correction coefficient. The correction coefficient is updated using a complex sign-sign LMS according to
c.sub.m,l+1=c.sub.m,l+μ sign(e*.sub.m) sign(x.sub.m)VAD, for m=2, . . . , M
[0211] The update is done in time-frequency regions with target activity only, l being a time index.
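A toy simulation of this correction-coefficient adaptation (two microphones, noise-free target frames, VAD fixed to 1) implementing the two equations above literally; the true transfer function d.sub.2, step size and iteration count are arbitrary assumptions, and a model steering vector other than [1, 1].sup.T would add a corresponding factor to the error:

```python
import numpy as np

def csign(x):
    """Complex sign with parts in {-1, +1} (0 maps to +1 by our convention)."""
    return complex(1.0 if x.real >= 0 else -1.0,
                   1.0 if x.imag >= 0 else -1.0)

rng = np.random.default_rng(1)
d2_true = 0.8 + 0.3j   # assumed actual transfer function to microphone 2
mu = 0.005             # assumed step size
c = 0.0 + 0.0j         # correction coefficient, adapted below

for _ in range(20000):
    x1 = rng.normal() + 1j * rng.normal()   # target at the reference microphone
    x2 = d2_true * x1                        # same target as received at microphone 2
    e = x1 - c.conjugate() * x2              # deviation from the signal model
    c = c + mu * csign(e.conjugate()) * csign(x2)   # VAD assumed 1 (target-only frames)

# After adaptation c* x2 ≈ x1, i.e. the compensated input matches the reference.
err = abs(c.conjugate() * d2_true - 1.0)
```

The residual err stays small but nonzero, reflecting the sign-sign LMS excess error discussed in section 4) below (the step magnitude is fixed at 2μ per update).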
[0212] The step size μ of both LMS algorithms (for noise field adaptation and target steering adaptation, respectively) may be interdependent (e.g. equal). The step size μ of the two LMS algorithms may, however, be independently determined, e.g. so that the adaptation to the background noise may be set to be faster than the adaptation to a target. E.g. in the case of adapting to own voice, it may be advantageous to have a slower step size μ for the target adaptation.
[0213] The step size can also vary across frequency bands. The choice of the step size value is a trade-off between convergence speed and accuracy. Generally, the step-size is time-invariant, but may also be changed adaptively, based on estimates of the accuracy, e.g., the magnitude of the error signal.
[0214] 4) Complex Sign LMS
[0215] In the following a low complexity implementation of the noise and target adaptation algorithms is proposed. The (non-complex) sign LMS algorithm is a well-known low complexity version of the LMS algorithm (cf. e.g. references [1], [2], [3]).
[0216] The Complex LMS refers to the LMS algorithm for complex data and coefficients.
[0217] The Sign LMS comes in many variants, usually for real-valued data and weights (cf. e.g. [3]):
TABLE-US-00001
Signed Error LMS: h(n + 1) = h(n) + μ x(n)sign(e(n))
Signed Data LMS: h(n + 1) = h(n) + μ sign(x(n))e(n)
Sign-Sign LMS: h(n + 1) = h(n) + μ sign(x(n))sign(e(n))
[0218] In all these cases the sign operation for real values is given by sign(x)=+1 if x≥0, and −1 otherwise.
[0219] The Complex Sign LMS is simply a Sign LMS for complex valued data and coefficients.
[0220] For example, the Complex Sign-Sign algorithm is given by
h(n+1)=h(n)+μ sign(x(n))sign(e*(n))
[0221] The complex sign (of a complex number x) may be given by taking the sign of the real (x.sub.R) and imaginary (x.sub.I) part, i.e., sign(x)=sign(x.sub.R)+jsign(x.sub.I).
[0222] The Least Mean Square (LMS) update rule is given by
h(n+1)=h(n)+μx(n)e*(n),
[0223] where h(n) is the filter coefficient, x(n) is the filter input and e(n) is the error signal. The error signal is defined as
e(n)=t(n)−h*(n)x(n),
[0224] where t(n) is the desired signal.
[0225] The filter coefficient h(n) may e.g. only be updated when (own) voice is detected, e.g. only when the signal to noise ratio is greater than 5 dB or greater than 10 dB. The filter coefficient may only be updated when the error is small, i.e. if the filter coefficient is close to the desired transfer function d. Hereby adapting to directions which are not of interest is avoided.
[0226] The voice activity detector (VAD) may as well be based on a binaural criterion, e.g. a combination of the VAD decisions of the left-ear and right-ear devices.
[0227] The voice activity detector used for target adaptation may be different from the inverse voice activity detector which is used in the noise canceller to update the noise estimate (β).
[0228] The magnitude of the update step is dependent on the step-size μ, the input signal x(n) and the error signal e(n).
[0229] In the complex-sign LMS, the magnitude of the update step is only dependent on the step-size. The complex sign is given by taking the sign of the real and imaginary part, i.e., sign(x)=sign(x.sub.R)+jsign(x.sub.I). Applying the complex sign operator on e*(n) and x(n) effectively normalizes the magnitude to √{square root over (2)} and hence, the update no longer depends on the magnitude of e*(n) and x(n). The update rule for the complex-sign LMS is given by
h(n+1)=h(n)+μ sign(x(n))sign(e*(n)).
[0230] A drawback of the Sign-Sign LMS is that if a very large step size is chosen to achieve fast convergence, the excess error is large and can lead to audible artifacts. This can be improved by a double filter approach, where we define a foreground and a background filter. The foreground filter is a fast-converging Complex Sign-Sign LMS filter (large step size).
e(n)=d(n)−h*(n)x(n)
h(n+1)=h(n)+μ sign(x(n))sign(e*(n))
[0231] The background filter may be updated from the foreground coefficient according to the following rationale:
h.sub.2(n+1)=(1−a)h.sub.2(n)+ah(n), if |e(n)|<γ|e.sub.2(n)|,
h.sub.2(n+1)=h.sub.2(n), otherwise,
where e.sub.2(n) is the error signal of the background filter.
[0232] The output of the background filter y(n)=h.sub.2(n)x(n) is then used as the algorithm output signal. In words: the background filter is a smoothed version of the foreground filter when the foreground filter has a smaller error signal magnitude (with margin γ); otherwise the background filter coefficient is not updated. The smoothing operation is a common first order smoothing, where factor a is a smoothing coefficient.
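The double filter approach may be sketched as follows for a single (scalar) coefficient per frequency band. Gating the background update on the foreground error being smaller than the background error (with margin γ) is our reading of the text, and all parameter values and the filter convention e(n)=t(n)−h*(n)x(n) are assumptions for illustration:

```python
import numpy as np

def csign(x):
    return complex(1.0 if x.real >= 0 else -1.0,
                   1.0 if x.imag >= 0 else -1.0)

rng = np.random.default_rng(2)
h_true = 0.6 - 0.2j                    # assumed true coefficient to be learned
mu, gamma, a_smooth = 0.05, 1.0, 0.1   # step size, margin, smoothing factor (assumed)

h = 0j    # foreground: fast complex sign-sign LMS (large step size)
h2 = 0j   # background: smoothed copy with gated updates

for _ in range(5000):
    x = complex(rng.normal(), rng.normal())
    t = h_true.conjugate() * x                 # desired (noise-free) signal
    e_fg = t - h.conjugate() * x               # foreground error
    e_bg = t - h2.conjugate() * x              # background error
    h = h + mu * csign(x) * csign(e_fg.conjugate())
    if abs(e_fg) < gamma * abs(e_bg):          # foreground does better: copy with smoothing
        h2 = (1.0 - a_smooth) * h2 + a_smooth * h
    # otherwise h2 is left unchanged (the freeze branch)

y = h2.conjugate() * x   # the algorithm output uses the background filter
```

The foreground filter converges fast but jitters with the large step size; the background filter inherits the convergence while the gating and smoothing suppress the excess error that would otherwise cause audible artifacts.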
[0233] The double filter can be used in the LMS algorithm in the precorrection as well as in the noise canceller.
[0234] Metrics other than the error signal e(n) may be used to determine the input correction, e.g. the factor a in the equation for h.sub.2(n+1) above, or prior knowledge, or other evaluation parameters of the inputs (e.g. SNR).
[0236] Examples of Use of a Noise Reduction System According to the Present Disclosure:
[0239] Additionally, the hearing aid (HD) comprises an auxiliary audio input (Audio input) configured to receive a direct audio input (e.g. wired or wirelessly) from another device or system, e.g. a telephone (or similar device). In the embodiment of
[0240] The first noise reduction system (NRS1) is configured to provide an estimate of the user's own voice Ŝ.sub.OV. The first noise reduction system (NRS1) may comprise an own voice maintaining beamformer and an own voice cancelling beamformer (cf. e.g.
[0241] The second noise reduction system (NRS2) may be configured to provide an estimate of a target sound source (e.g. a voice Ŝ.sub.ENV of a speaker in the environment of the user). The second noise reduction system (NRS2) may comprise an environment target source maintaining beamformer and an environment target source cancelling beamformer, and/or an own voice cancelling beamformer. The target-cancelling beamformer comprises the noise sources when the target speaker (in the environment) speaks. The own voice cancelling beamformer comprises the noise sources when the user speaks. The second noise reduction system (NRS2) may be a noise reduction system according to the present disclosure.
[0245] The target-maintaining beamformer (w.sup.H) and target-cancelling beamformer (w.sub.tc.sup.H) provide spatially filtered signals a(k) and b(k), respectively, as (different) weighted combinations of the first and second electric input signals x.sub.1(k) and x.sub.2(k). The first, target-maintaining beamformer (w.sup.H) may represent a delay and sum beamformer providing an (enhanced) omni-directional signal (a(k)). The second, target-cancelling beamformer (w.sub.tc.sup.H) may represent a delay and subtract beamformer providing a target-cancelling signal (b(k)). The first and second spatially filtered signals provided by the respective fixed beamformers (w.sup.H) and (w.sub.tc.sup.H) are hence given by
a(k)=w.sub.1(k)·x.sub.1(k)+w.sub.2(k)·x.sub.2(k),
b(k)=w.sub.tc1(k)·x.sub.1(k)+w.sub.tc2(k)·x.sub.2(k),
[0246] In the embodiment of
y(k)=a(k)−β*(k)·b(k),
[0247] where β(k) is the frequency dependent parameter controlling the final shape of the directional beam pattern (of signal y).
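The fixed beamformers and their adaptive combination above can be sketched as follows for the two-microphone case; the function name and the weight values in the usage note are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def beamform(x1, x2, w, w_tc, beta):
    """Two-microphone beamformer stage: fixed target-maintaining (w) and
    target-cancelling (w_tc) weights, adaptively combined via beta."""
    a = w[0] * x1 + w[1] * x2            # delay-and-sum signal a(k)
    b = w_tc[0] * x1 + w_tc[1] * x2      # delay-and-subtract signal b(k)
    y = a - np.conj(beta) * b            # y(k) = a(k) - beta*(k) . b(k)
    return a, b, y
```

For a target arriving in phase at both microphones (after alignment), illustrative weights w = (0.5, 0.5) and w_tc = (0.5, −0.5) give b(k) = 0, so y(k) = a(k) regardless of β, i.e. the target is preserved while b(k) carries only noise.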
[0248] The noise reduced (spatially filtered) target signal (y(k)) and the target-cancelling signal (b(k)) are fed to post filter (PF) for further noise reduction and provision of a (resulting) noise reduced signal (y.sub.NR) of the noise reduction system (NRS).
[0249]
[0250] The embodiment of
[0251] The adaptive noise canceller of
β.sub.l+1=β.sub.l+μ·sign(y*)·sign(b)
[0252] where μ is the step size of the adaptive algorithm and
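For real-valued signals, the sign-sign LMS recursion for β may be sketched as below; the function names, the step size, and the test scenario (a fixed ratio a/b = 0.5) are assumptions for illustration only:

```python
def sign(v):
    # sign-sign LMS uses only the signs; sign(0) = 0 here
    return (v > 0) - (v < 0)

def update_beta(beta, a, b, mu=1e-3):
    """One sign-sign LMS step for beta (real-valued sketch; the disclosed
    signals are complex sub-band signals, treated as real for brevity):
    beta_{l+1} = beta_l + mu * sign(y) * sign(b), with y = a - beta * b."""
    y = a - beta * b
    return beta + mu * sign(y) * sign(b)
```

Because only signs enter the update, each step costs a sign test and an addition, which is what makes the algorithm attractive for a hearing-device power budget; the price is that β oscillates within ±μ around its optimum rather than settling exactly.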
[0253] Further, compared to
[0254] The (fixed) target maintaining beamformer (w.sup.H) and a (fixed) target cancelling beamformer (W.sub.tc.sup.H) of the generalized noise reduction system of
c.sub.m,l+1=c.sub.m,l+μ·sign(e*.sub.m)·sign(x.sub.m)·VAD, for m=2, . . . , M,
[0255] where l is a time index, and m is an input signal (e.g. a microphone) index.
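The VAD-gated update of the complex correction factors can be sketched as follows; the element-wise sign convention for complex values (sign of real and imaginary parts separately) is one common choice for complex sign-LMS and is an assumption here:

```python
import numpy as np

def csign(v):
    # element-wise complex sign (an assumed convention): sign of the
    # real and imaginary parts taken separately
    return np.sign(v.real) + 1j * np.sign(v.imag)

def update_corrections(c, e, x, vad, mu=1e-3):
    """Sign-sign LMS update of the complex correction factors c_m, m = 2..M:
    c_{m,l+1} = c_{m,l} + mu * sign(e_m*) * sign(x_m) * VAD.
    With vad = 0 the factors are frozen; with vad = 1 they adapt."""
    return c + mu * csign(np.conj(e)) * csign(x) * vad
```

Gating by the VAD flag ensures the correction factors only adapt while the (own) voice is present, so noise-only segments cannot drag the steering-vector compensation away from its target-matched value.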
[0256] The SIGN LMS algorithms of the embodiment of
[0257] The generalized expressions for the steering vector d, and the weights of the target maintaining (w.sup.H) and the target cancelling (W.sub.tc.sup.H) beamformers are indicated in
[0258] Own Voice-Only Detection/Estimation:
[0259]
[0260]
[0261] The own voice-maintaining beamformer (w.sub.ov.sup.H) represents an enhanced omni beamformer calibrated to own voice (OV) as measured on a model (e.g. a HATS or KEMAR model, or similar, cf. the Head and Torso Simulator (HATS) 4128C from Brüel & Kjær Sound & Vibration Measurement A/S, or the head and torso model KEMAR from GRAS Sound and Vibration A/S), but where the model provides the own voice (‘the model talks’). The target cancelling beamformer (w.sub.tc.sup.H) is calibrated to cancel the ‘own voice’ of the model. Hence, the beamformers (w.sub.ov.sup.H, w.sub.tc.sup.H) represent fixed beamformers.
[0262] A problem of fixed beamformers is that the hearing device may not be ‘correctly’ mounted (e.g. differently from the (presumably careful) mounting on the model), resulting in the pre-defined (fixed) calibration being non-optimal, and hence effectively resulting in a ‘target signal loss’. This may in turn result in an adaptive parameter β (cf. e.g.
[0263] In
[0264] Based on the second electric input signal (x.sub.2) (e.g. from a rear, non-reference microphone (M.sub.2)) and the error signal (e), the Sign-LMS-algorithm (SIGN LMS) provides a (first) complex correction factor c*.sub.fast that is multiplied onto the rear microphone signal (x.sub.2) in a multiplication unit (x). The resulting signal (x.sub.2·c*.sub.fast) is subtracted, in a subtraction unit (+), from the result (x.sub.1·d.sub.2′) of multiplying the first electric input signal (x.sub.1) from the first (e.g. front, reference) microphone with the (model) relative acoustic transfer function (d.sub.2′) from the first (reference) microphone (M.sub.1) to the second microphone (M.sub.2). Thereby the error signal (e) is provided. The error signal (e) is minimized by the Sign-LMS-algorithm, given the current second (rear) microphone signal (x.sub.2). The complex correction factor (c*.sub.fast) is further fed to a variable level estimator (VARLE) that provides a smoothed complex correction factor (c*.sub.slow), which is multiplied onto the rear microphone signal (x.sub.2) so that the rear microphone signal (x.sub.2) is corrected to fit the original steering vector (d.sub.2′) of the model, see signal (x.sub.2′) after the multiplication unit (x). The complex ‘slow’ correction factor c*.sub.slow may e.g. be fed back to the own voice detector (OVOD) via a low-pass filtering function (cf. the LPz.sup.−1-block providing parameter μ.sub.ov to the own voice detector (OVOD)), e.g. for recursively updating the average value of the correction factor (c*.sub.fast) during own voice-only (cf. below in connection with
[0265] Each user has a unique correction factor (c*) due to person-to-person differences in the acoustics of the head and torso, etc. The “average value of the correction factor” (μ.sub.ov) may e.g. be initialized individually for each user. The personalized correction factor may e.g. be measured in a (preferably quiet) sound studio where the subject talks while the hearing device(s) are mounted on the person. Instead of measuring on the particular user, an average correction factor for a given user may be initialized as the average value of personalized correction factors measured on a multitude of test persons in the sound studio.
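The fast/slow correction structure of [0264] can be sketched as one per-frame update; modelling the ‘slow’ estimator (VARLE) as a one-pole smoother is an assumption here, as are the function name and the values of mu and alpha:

```python
import numpy as np

def adapt_fast_slow(x1, x2, d2p, c_fast, c_slow, mu=1e-2, alpha=0.99):
    """One update of the fast/slow correction structure (sketch).
    d2p    : fixed (model) transfer function from reference to rear mic
    c_fast : fast complex correction factor, adapted by sign-sign LMS
    c_slow : smoothed correction applied to the rear microphone signal"""
    e = x1 * d2p - x2 * np.conj(c_fast)       # error minimised by Sign-LMS
    csign = lambda v: np.sign(v.real) + 1j * np.sign(v.imag)
    c_fast = c_fast + mu * csign(np.conj(e)) * csign(x2)
    c_slow = alpha * c_slow + (1.0 - alpha) * c_fast   # one-pole smoothing
    x2_corr = x2 * np.conj(c_slow)            # rear signal matched to d2p
    return c_fast, c_slow, x2_corr, e
```

At the fixed point e ≈ 0, i.e. x.sub.2·c*.sub.fast ≈ x.sub.1·d.sub.2′, so the corrected rear signal again fits the model steering vector; the slow branch then applies this correction with enough smoothing that short-term estimation noise does not leak into the beamformer.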
[0266] Compared to the embodiment of
[0267] The block diagram of
[0268]
[0269] The main input of the own voice-only detector (OVOD) is the ‘fast’ correction factor (c*.sub.fast(k,n)), which is provided by the Sign LMS algorithm (output of the SIGN LMS block, see
[0270] The values of the frequency dependent acoustic environment parameter (Φ(k)) and the average correction factor μ(k) may e.g. be found by training a neural network with ground truth data for different sound scenes (including own-voice-only scenes) with different noise levels (including estimation of the bias value (Φ.sub.0)).
[0271] The resulting time-domain signal x(n) indicating whether or not own voice-only is present is compared to a first threshold value (Thr1, e.g. =0) in the ‘>Thr1’ block in
[0272] The lower signal path starting from the frequency dependent voice activity signal VAD(k,n) is intended to give more robustness to the own-voice-only detection. As also indicated in FIG. 8, a (e.g. modulation based) per frequency band voice activity detector is used to check whether the source is a modulated source (e.g. speech) and has a ‘decent’ SNR. The individual band specific VAD-signals are combined (cf. Sum-block (SUM,k) in
[0274] Own voice is typically high level (≥70 dB) because the sound source (the mouth) is closer to the microphones of the hearing aid than any other sound source around the user. Such a criterion (OV-level≥Lth) may be added as a further input to the AND-block, thereby making the own-voice-only decision still more robust.
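The AND-combination of the three criteria (correction-factor indicator, per-band VAD count, and own-voice level) can be sketched as follows; the function name and all threshold values are illustrative assumptions:

```python
import numpy as np

def ovod_decision(x_n, vad_k, ov_level_db, thr1=0.0, thr2=4, level_thr=70.0):
    """Robust own-voice-only flag: AND of the three criteria above.
    x_n         : time-domain indicator from the correction-factor path
    vad_k       : per-band voice-activity flags (0/1) for the current frame
    ov_level_db : estimated own-voice level in dB SPL"""
    crit1 = x_n > thr1                    # correction-factor criterion
    crit2 = int(np.sum(vad_k)) > thr2     # enough bands flag modulated speech
    crit3 = ov_level_db >= level_thr      # own voice is typically >= 70 dB
    return crit1 and crit2 and crit3
```

Requiring all three criteria simultaneously means a single spurious cue (e.g. a loud but unmodulated noise source) cannot by itself trigger the own-voice-only state.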
[0275] The output of the AND-block is the ‘robust’ ovod-signal of the OVOD-block, which is used to control the variable level estimator block (VARLE) in
[0276] Embodiments of the disclosure may e.g. be useful in applications such as hearing aids or headsets or other wearable audio processing devices with a relatively limited power budget.
[0277] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
[0278] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
[0279] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
[0280] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
REFERENCES
[0281] [1] S. Haykin, “Adaptive Filter Theory,” 5th edition, Prentice Hall, 2013.
[0282] [2] A. Sayed, “Adaptive Filters,” IEEE Press, 2008.
[0283] [3] P. M. Clarkson, “Optimal and Adaptive Signal Processing,” CRC Press, 1993.
[0284] [4] J. Bitzer and K. U. Simmer, “Superdirective Microphone Arrays,” in “Microphone Arrays—Signal Processing Techniques,” M. Brandstein and D. Ward (Eds.), Springer-Verlag, 2001, Chapter 2.
[0285] EP3236672A1 (Oticon) 25.10.2017.