HEARING SYSTEM COMPRISING A DATABASE OF ACOUSTIC TRANSFER FUNCTIONS
20230054213 · 2023-02-23
Inventors
- Jan M. De Haan (Smorum, DK)
- Jesper Jensen (Smorum, DK)
- Michael Syskind Pedersen (Smorum, DK)
- Svend FELDT (Ballerup, DK)
- Stig PETRI (Ballerup, DK)
- Jakob Sloth LAURIDSEN (Smorum, DK)
CPC classification
- H04S2420/01 (ELECTRICITY)
- H04S7/302
- H04R25/70
- H04R25/407
- H04R25/554
International classification
Abstract
A hearing system comprises a) a multitude M of microphones providing M corresponding electric input signals x.sub.m(n), m=1, ..., M, and n representing time, b) a processor connected to said multitude of microphones and providing a processed signal in dependence thereof, c) an output unit for providing an output signal in dependence of said processed signal, and d) a database (Θ) comprising a dictionary (Δ.sub.pd) of previously determined acoustic transfer function vectors (ATF.sub.pd). The processor is configured A) to determine a constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur) in dependence of said M electric input signals and said dictionary (Δ.sub.pd), B) to determine an unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur) in dependence of said M electric input signals, and C) to determine a resulting acoustic transfer function vector (ATF*) for a user of the hearing system in dependence thereof and of a confidence measure related to said electric input signals. A method of operating a hearing device is also disclosed. Thereby an improved noise reduction system for a hearing aid or headset may be provided.
Claims
1. A hearing system configured to be worn by a user, the hearing system comprising a microphone system comprising a multitude M of microphones, where M is larger than or equal to two, the microphone system being adapted for picking up sound from the environment and to provide M corresponding electric input signals x.sub.m(n), m=1, ..., M, and n representing time, the environment sound at an m.sup.th microphone comprising a target sound signal propagated from a target sound source to the m.sup.th microphone of the hearing system when worn by the user, and a processor connected to said multitude of microphones, the processor being configured to process said M electric input signals and to provide a processed signal in dependence thereof, and an output unit for providing an output signal in dependence of said processed signal, a database (Θ) comprising a dictionary (Δ.sub.pd) of previously determined acoustic transfer function vectors (ATF.sub.pd), whose elements ATF.sub.pd,.sub.m, m=1, ..., M, are frequency dependent acoustic transfer functions representing location-dependent (θ) and frequency dependent (k) propagation of sound from a location (θ.sub.j) of the target sound source to each of said M microphones, k being a frequency index, k=1, ..., K, where K is a number of frequency bands, when said microphone system is mounted on a head at or in an ear of a natural or artificial person, and wherein said dictionary Δ.sub.pd comprises acoustic transfer function vectors for said natural or for said artificial person for a multitude (J) of different locations θ.sub.j, j=1, ..., J, relative to the microphone system; wherein the processor is configured to determine a constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur) in dependence of current values of said M electric input signals and said dictionary (Δ.sub.pd) of previously determined acoustic transfer function vectors (ATF.sub.pd), to determine an unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur) in dependence of said current values of said M electric input signals, and to determine a resulting acoustic transfer function vector (ATF*) for said user in dependence of said constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur), said unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur), and of a confidence measure related to said current values of said M electric input signals; and to provide said processed signal in dependence of said resulting acoustic transfer function vector (ATF*) for said user.
2. A hearing system according to claim 1 wherein said hearing system is configured to determine said confidence measure comprising at least one of a target-signal-quality-measure indicative of a signal quality of a current target signal from said target sound source in dependence of at least one of said current values of said M electric input signals, or a signal or signals originating therefrom; respective acoustic-transfer-function-vector-matching-measures indicative of a degree of matching of said constrained estimate and said unconstrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur, ATF.sub.uc,.sub.cur), respectively, considering the current values of said M electric input signals; and a target-sound-source-location-identifier indicative of a location of, or proximity of, the current target sound source relative to the user.
3. A hearing system according to claim 2 comprising a target signal quality estimator configured to provide said target-signal-quality-measure indicative of a signal quality of a target signal from said target sound source in dependence of at least one of said current values of said M electric input signals, or a signal or signals originating therefrom.
4. A hearing system according to claim 2 comprising an ATF-vector-comparator configured to provide an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the constrained estimate and the unconstrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur, ATF.sub.uc,.sub.cur), respectively, wherein the ATF-vector-comparator is configured to apply a vector distance measure, e.g. a Euclidean distance, to the respective ATF-vectors.
5. A hearing system according to claim 2 comprising a location estimator configured to provide said target-sound-source-location-identifier.
6. A hearing system according to claim 2 wherein the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) is used as the resulting acoustic transfer function vector (ATF*) for said user, if a first criterion depending on said target-signal-quality-measure is fulfilled.
7. A hearing system according to claim 2 wherein the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) is used as the resulting acoustic transfer function vector (ATF*) for said user, if a first criterion depending on said acoustic-transfer-function-vector-matching-measures is fulfilled.
8. A hearing system according to claim 2 wherein said resulting acoustic transfer function vector (ATF*) for said user is determined as a mixture of said constrained estimate of the current acoustic transfer function vector (ATF.sub.pd,.sub.cur) and said unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) in dependence of said target signal quality measure and/or said acoustic-transfer-function-vector-matching-measure.
9. A hearing system according to claim 1 wherein the database (Θ) comprises a sub-dictionary (Δ.sub.pd,.sub.std) of previously determined, standard acoustic transfer function vectors (ATF.sub.pd,.sub.std).
10. A hearing system according to claim 1 wherein the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) is stored in a sub-dictionary (Δ.sub.pd,.sub.tr) of said database, if a second criterion is fulfilled.
11. A hearing system according to claim 1 wherein the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) is assigned a target location (θ*.sub.j) in dependence of its proximity to the existing dictionary elements (ATF.sub.pd(θ.sub.j)).
12. A hearing system according to claim 1 wherein a target location (θ*) of the target sound source of current interest to the user is independently estimated for the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur).
13. A hearing system according to claim 1 wherein the previously determined acoustic transfer function vectors (ATF.sub.pd) of the dictionary (Δ.sub.pd) are ranked in dependence of their frequency of use.
14. A hearing system according to claim 1 wherein the acoustic transfer function vectors (ATF) of the database (Θ) are or comprise relative acoustic transfer function vectors (RATF).
15. A hearing system according to claim 1 wherein the output unit comprises an output transducer configured to provide a stimulus perceivable by the user as an acoustic signal in dependence of the processed signal.
16. A hearing system according to claim 1 wherein the output unit comprises a transmitter for transmitting the processed signal to another device or system.
17. A hearing system according to claim 1 comprising at least one hearing device configured to be worn on the head at or in an ear of a user of the hearing system.
18. A hearing system according to claim 17 wherein the hearing device is constituted by or comprises an air-conduction type hearing aid, a bone-conduction type hearing aid, a cochlear implant type hearing aid, or a combination thereof.
19. A hearing system according to claim 1 being constituted by or comprising a hearing aid or a headset, or a combination thereof.
20. A hearing system according to claim 1 being constituted by or comprising left and right hearing devices and comprising antenna and transceiver circuitry configured to allow an exchange of data between the left and right hearing devices.
21. A hearing system according to claim 20 wherein the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) is determined in each of the left and right hearing devices and stored in said database(s) jointly in dependence of a common criterion regarding at least one of said target signal quality measure(s), said acoustic-transfer-function-vector-matching-measure, and said target-sound-source-location-identifier.
22. A hearing system according to claim 1 wherein said confidence measure is related to the target sound signal impinging on said microphone system.
23. A method of operating a hearing system, comprising at least one hearing device configured to be worn on the head at or in an ear of a user, the hearing system comprising a microphone system comprising a multitude M of microphones, where M is larger than or equal to two, the microphone system being adapted for picking up sound from the environment, and an output unit for providing an output signal in dependence of a processed signal, the method comprising providing M electric input signals representing sound in the environment at an m.sup.th microphone and comprising a target sound signal propagated from a target sound source to the m.sup.th microphone of the hearing system when worn by the user, and processing said M electric input signals to provide said processed signal in dependence thereof, and providing a database Θ comprising a dictionary Δ.sub.pd of previously determined acoustic transfer function vectors (ATF.sub.pd), whose elements ATF.sub.pd,.sub.m, m=1, ..., M, are frequency dependent acoustic transfer functions representing location-dependent (θ) and frequency dependent (k) propagation of sound from a location (θ.sub.j) of a target sound source to each of said M microphones, k being a frequency index, k=1, ..., K, where K is a number of frequency bands, when said microphone system is mounted on a head at or in an ear of a natural or artificial person, and wherein said dictionary Δ.sub.pd comprises acoustic transfer function vectors for said natural or for said artificial person for a multitude (J) of different locations θ.sub.j, j=1, ..., J, relative to the microphone system; determining a constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur) in dependence of current values of said M electric input signals and said dictionary Δ.sub.pd of previously determined acoustic transfer function vectors (ATF.sub.pd); determining an unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur) in dependence of current values of said M electric input signals; and determining a resulting acoustic transfer function vector (ATF*) in dependence of said constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur); said unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur); and of a confidence measure related to current values of said M electric input signals; and providing said processed signal in dependence of said resulting acoustic transfer function vector (ATF*).
24. A method according to claim 23 wherein said confidence measure is determined by said hearing system and comprises at least one of a target-signal-quality-measure indicative of a signal quality of a current target signal from said target sound source in dependence of current values of at least one of said M electric input signals or a signal or signals originating therefrom; respective acoustic-transfer-function-vector-matching-measures indicative of a degree of matching of said constrained estimate and said unconstrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur, ATF.sub.uc,.sub.cur), respectively, considering the current values of said M electric input signals; and a target-sound-source-location-identifier indicative of a location of, or proximity of, the current target sound source relative to the user.
25. A non-transitory computer readable medium storing a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of claim 23.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0117] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
[0127] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
[0128] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
[0129] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
[0130] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
[0131] The present disclosure relates to a wearable hearing system comprising one or more hearing devices, e.g. headsets or hearing aids. The present disclosure relates in particular to individualization of a multi-channel noise reduction system exploiting and extending a database comprising a dictionary of acoustic transfer functions, e.g. relative acoustic transfer functions (RATF).
[0132] The human ability to spatially localize a sound source is to a large extent dependent on perception of the sound at both ears. Due to different physical distances between the sound source and the left and right ears, a difference in time of arrival of a given wavefront of the sound at the left and right ears is experienced (the Interaural Time Difference, ITD). Consequently, a difference in phase of the sound signal (at a given point in time) will likewise be experienced and in particular perceivable at relatively low frequencies (e.g. below 1500 Hz). Due to the shadowing effect of the head (diffraction), a difference in level of the received sound signal at the left and right ears is likewise experienced (the Interaural Level Difference, ILD). The attenuation by the head (and body) is larger at relatively higher frequencies (e.g. above 1500 Hz). The detection of the cues provided by the ITD and ILD largely determine our ability to localize a sound source in a horizontal plane (i.e. perpendicular to a longitudinal direction of a standing person). The diffraction of sound by the head (and body) is described by the Head Related Transfer Functions (HRTF). The HRTF for the left and right ears ideally describe respective transfer functions from a sound source (from a given location) to the ear drums of the left and right ears. If correctly determined, the HRTFs provide the relevant ITD and ILD between the left and right ears for a given direction of sound relative to the user’s ears. Such HRTF.sub.left and HRTF.sub.right are preferably applied to a sound signal received by a left and right hearing assistance device in order to improve a user’s sound localization ability (cf. e.g. Chapter 14 of [Dillon; 2001]).
[0133] Several methods of generating HRTFs are known. Standard HRTFs from a dummy head can e.g. be provided, as e.g. derived from the KEMAR HRTF database of [Gardner and Martin, 1994] and applied to sound signals received by left and right hearing assistance devices of a specific user. Alternatively, a direct measurement of the user’s HRTF, e.g. during a fitting session can - in principle - be performed, and the results thereof be stored in a memory of the respective (left and right) hearing assistance devices. During use, e.g. in case the hearing assistance device is of the Behind The Ear (BTE) type, where the microphone(s) that pick up the sound typically are located near the top of (and often, a little behind) pinna, a direction of impingement of the sound source may be determined by each device, and the respective relative HRTFs applied to the (raw) microphone signal to (re)establish the relevant localization cues in the signal presented to the user, cf. e.g. EP2869599A1.
[0134] An essential part of multi-channel noise reduction systems (such as minimum variance distortionless response (MVDR), Multichannel Wiener Filter (MWF), etc.) in hearing devices is to have access to the relative acoustic transfer function (RATF) for the source of interest. Any mismatch between the true RATF and the RATF employed in the noise reduction system may lead to distortion and/or suppression of the signal of interest.
[0135] A first method (‘Method 1’) to find the RATF that is associated with the source signal of interest is the selection of a RATF from a dictionary of plausible (previously determined) RATFs. This method is referred to as constrained maximum likelihood RATF estimation [1,2].
[0136] For all the (previously determined (pd)) RATFs (RATF.sub.pd) in the database, the likelihood that a source of interest can be associated with a specific RATF is calculated based on the microphone input(s). The RATF (among the multitude of RATFs (RATF.sub.pd) of the database) which is associated with the maximum likelihood is then selected as the current acoustic transfer function (RATF.sub.pd,.sub.cur) for the current electric input signal(s).
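The dictionary selection described above can be sketched as follows. This is an illustrative sketch, not the actual ML estimator of [1,2]: the normalised quadratic form d^H C_x d / ||d||^2 is used here as a simple likelihood proxy for how well a candidate RATF explains the noisy input covariance, and the function name is an assumption.

```python
import numpy as np

def select_constrained_ratf(Cx, dictionary):
    """Pick the dictionary RATF best explaining the noisy input
    covariance Cx (M x M, Hermitian). The normalised quadratic form
    serves as a likelihood proxy; the true ML score of [1,2] also
    involves the noise statistics."""
    scores = []
    for d in dictionary:                 # each d: length-M complex vector
        d = np.asarray(d, dtype=complex)
        num = np.real(np.conj(d) @ Cx @ d)
        scores.append(num / np.real(np.conj(d) @ d))
    j_star = int(np.argmax(scores))      # index of the most likely RATF
    return j_star, dictionary[j_star]
```

In a real system the scoring would run per frequency band k, with one selection per band.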
[0137] The advantage of this (first) method is good performance even in acoustic environments of poor target signal quality (e.g. low SNR) because the selected RATF (RATF.sub.pd,.sub.cur) is always a plausible RATF. Another advantage is that prior information may be used for the RATF selection, for example if some target directions are more likely than others (e.g. in dependence of a sensor or detector, e.g. an own voice detector, e.g. in case the user’s own voice is the target signal).
[0138] The disadvantage is that the dictionary elements need to be known beforehand and are typically measured on a mannequin (e.g. a head and torso model). Even though the RATFs (RATF.sub.pd,.sub.std) measured on the mannequin are plausible, they may differ from the true RATFs due to differences in acoustics arising from differences in the wearer's anatomy and/or device placement.
[0139] The second method (‘Method 2’) of RATF estimation is unconstrained, which means that any RATF may be estimated from the input data. A maximum likelihood estimator is e.g. provided by the covariance whitening method (see e.g. [3,4]). The second, unconstrained RATF estimation method may e.g. comprise an estimator of the noisy-input and noise-only covariance matrices, where the latter requires a target speech activity detector (to separate noise-only parts from noisy parts). Furthermore, the method may comprise an eigenvalue decomposition of the noise-only covariance matrix, which is used to “whiten” the noisy input covariance matrix. The results may finally be used to compute the maximum likelihood estimate of the RATF. Any RATF may be found by this method, under the condition that the target signal is active in the input signals. Unconstrained HRTFs, e.g. RATFs, of a binaural hearing system, e.g. a binaural hearing aid system, for given electric input signals from microphones of the system may e.g. be determined as discussed in EP2869599A1.
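A minimal sketch of the covariance whitening steps described above, under the assumption that the noisy-input and noise-only covariance matrices have already been estimated. A Cholesky factorisation is used here as the whitening transform (the text mentions an eigenvalue decomposition, which is an equivalent alternative); function and variable names are illustrative.

```python
import numpy as np

def ratf_covariance_whitening(Cx, Cv, ref=0):
    """Unconstrained ML RATF estimate by covariance whitening (cf. [3,4]).
    Cx: noisy-input covariance, Cv: noise-only covariance (M x M,
    Hermitian, Cv positive definite). Returns the RATF vector with
    unity gain at the reference microphone `ref`."""
    L = np.linalg.cholesky(Cv)           # Cv = L L^H
    Linv = np.linalg.inv(L)
    Cw = Linv @ Cx @ Linv.conj().T       # whitened noisy covariance
    vals, vecs = np.linalg.eigh(Cw)      # ascending eigenvalues
    u = vecs[:, -1]                      # principal eigenvector
    h = L @ u                            # de-whiten: proportional to the ATF
    return h / h[ref]                    # normalise at reference microphone
```

As with the constrained method, a real system would apply this per frequency band k.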
[0140] The advantage of this (second) method is that an accurate estimate of the RATF can be found at high SNR, more accurately than with the constrained ML method (dictionary method), since it is not constrained to a finite/discrete set of dictionary elements. Further, the unconstrained acoustic transfer functions are personalized, in that they are estimated while the user wears the hearing system.
[0141] A disadvantage is that less accurate estimates are obtained in low SNR due to estimation errors, as compared to the constrained method, because the unconstrained method does not employ the prior knowledge that the RATF in question is related to a human head/mannequin - in other words, it could produce estimates which are not physically plausible.
[0142] The present disclosure proposes to combine these two methods (‘Method 1’, ‘Method 2’) into a hybrid method, in such a way that their advantages are harvested, and their disadvantages avoided.
[0143] Consider a RATF estimator that uses a pre-calibrated dictionary (cf. e.g. Δ.sub.pd in the figures).
[0144] Under certain conditions (see example below) this more accurate RATF, estimated at high SNRs, can be stored as a new dictionary element which will then be available in ‘Method 1’ as a plausible RATF. We will refer to these dictionary elements as ‘trained’ (cf. e.g. Δ.sub.pd and the (dashed) arrow ATF.sub.uc,.sub.cur from the controller (CTR3) to the database (MEM [DB]) in the figures).
[0145] The dictionary elements that are allowed to be updated can be regarded as additional dictionary elements, i.e. a base of dictionary elements (cf. e.g. Δ.sub.pd,.sub.tr in the figures).
[0146] The dictionary elements may be updated jointly in both of a left and a right hearing instrument (of a binaural hearing system). A database adapted to the particular location of the left hearing device of a binaural hearing aid system (on the user’s head) may be stored in the left hearing device. Likewise, a database adapted to the particular location of the right hearing device of a binaural hearing aid system (on the user’s head) may be stored in the right hearing device. A database located in a separate device (e.g. a processing device in communication with the left and right hearing devices) may comprise a set of dictionary elements for the left hearing device and a corresponding set of dictionary elements for the right hearing device.
[0147] The RATFs estimated by the unconstrained method (and stored in the additional dictionary (Δ.sub.pd,.sub.tr)) may (or may not) be assigned to a target location, e.g. depending on the proximity to the existing dictionary elements (which may (typically) be related to a specific target location (cf. e.g. θ.sub.j)). The distance may e.g. be determined as or based on the mean-squared error (MSE), or other distance measures allowing a ranking of vectors in order of similarity (proximity).
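The proximity-based location assignment can be sketched as below. The MSE distance follows the text; the acceptance threshold `max_mse` (returning no location when the estimate is far from every known element) and all names are hypothetical additions.

```python
import numpy as np

def assign_location(ratf_uc, dictionary, locations, max_mse=None):
    """Assign a target location theta_j to an unconstrained RATF
    estimate via the nearest previously determined dictionary element,
    using mean-squared error as the distance measure. Returns None if
    no element is within the (hypothetical) threshold max_mse."""
    mses = [np.mean(np.abs(np.asarray(ratf_uc) - np.asarray(d)) ** 2)
            for d in dictionary]
    j = int(np.argmin(mses))             # nearest dictionary element
    if max_mse is not None and mses[j] > max_mse:
        return None                      # too far from any known location
    return locations[j]
```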
[0148] Instead of (or in addition to) assigning a location to the personalized additional dictionary elements (ATF.sub.pd,.sub.tr) of the sub-dictionary (Δ.sub.pd,.sub.tr), the processor may be configured to log a frequency of use of these vectors to allow a ‘ranking’ of their use to be made. Thereby an improved scheme for storing new dictionary elements in the sub-dictionary (Δ.sub.pd,.sub.tr) can be provided. The lowest ranking elements may e.g. be deleted, when a certain number of elements have been included in the personalized sub-dictionary (Δ.sub.pd,.sub.tr). Thereby a qualified criterion is provided to limit the number of additional elements in the personalized sub-dictionary (Δ.sub.pd,.sub.tr).
[0149] The previously determined acoustic transfer function vectors (ATF.sub.pd) of the dictionary (Δ.sub.pd) may generally be ranked in dependence of their frequency of use, e.g. in that the processor logs a frequency of use of the vectors. The processor may e.g. be configured to log a frequency of use of the previously determined (standard) dictionary elements (ATF.sub.pd,.sub.std) of the sub-dictionary (Δ.sub.pd,.sub.std). A comparison of the frequency of use of corresponding dictionary elements of the standard and personalized sub-dictionaries (Δ.sub.pd,.sub.std, Δ.sub.pd,.sub.tr) can be provided (e.g. logged). Based thereon conclusions regarding the relevance of the standard and/or personalized elements can be drawn. Elements concluded to be irrelevant may e.g. be deleted, either in an automatic process (e.g. the lowest ranking, e.g. above a certain number of stored elements) or manually (e.g. by the user or by a hearing care professional).
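The frequency-of-use ranking and pruning described in the two preceding paragraphs can be sketched as a small bookkeeping structure. The class, its names, and the exact pruning policy (delete the least-used element when capacity is reached) are illustrative assumptions.

```python
class RankedSubDictionary:
    """Sketch of a personalized sub-dictionary with use-count ranking:
    each stored (trained) ATF vector carries a use counter, and when
    the sub-dictionary is full the lowest-ranking (least-used) element
    is deleted before a new one is stored."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}                # key -> [use_count, atf_vector]

    def log_use(self, key):
        """Log one use of a stored dictionary element."""
        self.entries[key][0] += 1

    def add(self, key, atf_vector):
        """Store a new element, pruning the least-used one if full."""
        if len(self.entries) >= self.capacity:
            worst = min(self.entries, key=lambda k: self.entries[k][0])
            del self.entries[worst]      # delete lowest-ranking element
        self.entries[key] = [0, atf_vector]
```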
[0151] A dictionary Δ.sub.pd of absolute and/or relative transfer functions may be determined as indicated in the figures.
[0153] To determine the relative acoustic transfer functions (RATF), e.g. RATF-vectors d.sub.θ, of the dictionary Δ.sub.pd, from the corresponding absolute acoustic transfer functions (AATF), H.sub.θ, the element of the RATF-vector (d.sub.θ) for the m.sup.th microphone and direction (θ) is d.sub.m(k, θ) = H.sub.m(θ, k)/H.sub.i(θ, k), where H.sub.i(θ,k) is the (absolute) acoustic transfer function from the given location (θ) to a reference microphone (m=i) among the M microphones of the microphone system (e.g. of a hearing instrument, or a binaural hearing system), and H.sub.m(θ,k) is the (absolute) acoustic transfer function from the given location (θ) to the m.sup.th microphone. Such absolute and relative transfer functions (for a given artificial (e.g. a mannequin) or natural person (e.g. the user or (typically) another person)) can be estimated (e.g. in advance of the use of the hearing aid system) and stored in the dictionary Δ.sub.pd as indicated above. The resulting (absolute) acoustic transfer function (AATF) vector H.sub.θ for sound from a given location (θ) to a hearing instrument or hearing system comprising M microphones may be written as
H.sub.θ=[H.sub.1(k,θ), H.sub.2(k,θ), . . . , H.sub.M(k,θ)].sup.T.
[0154] The corresponding relative acoustic transfer function (RATF) vector d.sub.θ from this location may be written as
d.sub.θ=[d.sub.1(k,θ), d.sub.2(k,θ), . . . , d.sub.M(k,θ)].sup.T,
where d.sub.i(k,θ)=1.
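The element-wise definition d.sub.m(k,θ) = H.sub.m(θ,k)/H.sub.i(θ,k) translates directly into code; the function name is illustrative.

```python
import numpy as np

def relative_atf(H, ref=0):
    """Convert an absolute ATF vector H = [H_1, ..., H_M]^T (for one
    frequency band k and location theta) into the relative ATF vector
    d with d_m = H_m / H_ref, so that d[ref] = 1."""
    H = np.asarray(H, dtype=complex)
    return H / H[ref]
```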
Target Estimation in Hearing Aids
[0155] Classical hearing aid beamformers assume that the target of interest is in front of the hearing aid user. Beamformer systems may perform better in terms of target loss and thereby provide an SNR improvement for the user if they have access to accurate estimates of the target location.
[0156] The proposed method may use predetermined (standard) dictionary (vector) elements (ATF.sub.pd,.sub.std) measured on a mannequin (e.g. the Head and Torso Simulator (HATS) 4128C from Brüel & Kjær Sound & Vibration Measurement A/S, or the head and torso model KEMAR from GRAS Sound and Vibration A/S, or similar) as a baseline (e.g. stored in dictionary Δ.sub.pd,.sub.std of the database Θ). The proposed method may further estimate more accurate (unconstrained) dictionary (vector) elements (ATF.sub.uc,.sub.cur) (e.g. RATFs) in good SNR (as estimated by an SNR estimator) and store them as dictionary elements (ATF.sub.pd,.sub.tr) given certain conditions (e.g. in a dictionary Δ.sub.pd,.sub.tr of the database Θ).
[0157] An advantage is that this method can accommodate individual acoustic properties as well as replacement effects, in both good and less good input SNR scenarios.
[0158] Example of usage in hearing aid application: A base dictionary (Δ.sub.pd,.sub.std) may be given by 48 plausible RATF vectors (RATF.sub.pd,.sub.std) describing relative transfer functions of hearing aid microphones, measured on a HATS in the horizontal plane at 7.5 degree intervals (cf. e.g. the figures).
Own Voice Enhancement in Headsets (or Hearing Aids)
[0159] Beamforming is used in headsets to enhance the user’s own voice in communication scenarios - hence, in this situation, the user’s own voice is the signal of interest to be retrieved by a beamforming system. Microphones can be mounted at different locations in the headset. For example, multiple microphones may be mounted on a boom-arm pointing at the user’s mouth, and/or multiple microphones may be mounted inside and outside of small in-ear headsets (or earpieces).
[0160] The RATFs which are needed for own voice capture may be affected by acoustic variations, such as: Individual user acoustic properties (as opposed to HATS in a calibration situation), microphone location variations due to boom arm placement, and human head movements (for example jaw movements affecting microphones placed in the ear canal).
[0161] A baseline dictionary may contain RATFs measured on a HATS in a standard boom arm placement and in a set of representative boom arm placements. The extended dictionary elements can then accommodate (for an individual user) variations and replacement variations related to the actual wearing situation, for example if the boom arm is outside the expected range of variations.
[0162] In a hearing aid, estimation of the user’s own voice may also be of interest in a communication mode of operation, e.g. for transmission to a far-end communication partner (when using the hearing aid in a headset- or telephone-mode). Also, estimation of the user’s own voice may be of interest in a hearing aid in connection with a voice control interface, where the user’s own voice may be analysed in a keyword detector or by a speech recognition algorithm.
Hybrid Method Operation
[0163] The RATF estimator may operate in different ways:
[0164] 1. Switch between the dictionary (constrained) method (‘Method 1’) and the unconstrained method (‘Method 2’). Thereby we allow any RATF under certain pre-defined conditions (decision rationale).
[0165] 2. Always use the dictionary method (‘Method 1’). Thereby we ensure that only dictionary elements are used, either pre-calibrated or trained.
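The switching rationale of the hybrid mode can be sketched as follows; this is an illustrative decision rule under assumed names, where `p_pd` and `p_uc` stand for the matching measures of the constrained and unconstrained estimates, and `margin` is an assumed hysteresis parameter:

```python
def select_ratf(atf_pd_cur, p_pd, atf_uc_cur, p_uc,
                mode="hybrid", margin=0.0):
    """Hypothetical hybrid decision rule: in 'hybrid' mode the
    unconstrained estimate (Method 2) is allowed whenever its matching
    measure exceeds that of the best dictionary element (Method 1) by a
    margin; in 'dictionary' mode only dictionary elements are used."""
    if mode == "dictionary":
        return atf_pd_cur          # Method 1 only
    if p_uc > p_pd + margin:
        return atf_uc_cur          # unconstrained estimate wins
    return atf_pd_cur              # fall back to dictionary element
```
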
Rationale for Updating a Trained Dictionary Element
[0166] In order to update a trainable dictionary element, the method needs a rationale. A straightforward rationale is that the target signal is available in good quality, e.g. when the (target) signal-to-noise ratio (SNR) is sufficiently high, e.g. larger than a threshold value (SNR.sub.TH). A (preferably reliable/robust) target signal quality estimator, e.g. an SNR estimator, may provide this. The Power Spectral Density (PSD) estimators provided by the maximum likelihood (ML) methods of e.g. [2] and [5] may e.g. be used to determine the SNR. US20190378531A1 teaches SNR estimation.
[0167] Furthermore, the rationale may include the likelihood (cf. e.g. p(ATF.sub.uc,.sub.cur) in
[0168] The rationale may also be related to other detection algorithms, e.g., voice activity detection (VAD) algorithms, see [4] for an example (no update unless clear voice activity is detected), sound pressure level estimators (no update unless sound pressure level is within reasonable range for noise-free speech, e.g., between 55 and 70 dB SPL, cf. e.g. signal voice activity control signal (V-NV) from voice activity detector (VAD) to the controller (CTR) in
[0169] A criterion for determining whether or not an estimated HRTF is plausible may be established (e.g. does it correspond to a likely direction; is it within a reasonable range of values, etc.), e.g. relying on an own voice detector (OVD), a proximity detector, or a direction-of-arrival (DOA) detector. Hereby an estimated HRTF may be disqualified if it is deemed implausible (and hence not used or stored).
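The combined update rationale described above (good SNR, high likelihood of the estimate, clear voice activity, plausible sound pressure level) can be sketched as a single predicate. All threshold values except the 55–70 dB SPL range mentioned in the text are assumptions:

```python
def update_allowed(snr_db, likelihood, voice_active, spl_db,
                   snr_th=10.0, lik_th=0.8, spl_range=(55.0, 70.0)):
    """Illustrative combined rationale for updating a trainable
    dictionary element: require high SNR, a high likelihood of the
    unconstrained estimate, detected voice activity (VAD), and a sound
    pressure level in a plausible range for noise-free speech."""
    return bool(snr_db > snr_th
                and likelihood >= lik_th
                and voice_active
                and spl_range[0] <= spl_db <= spl_range[1])
```
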
Binaural Devices
[0170] With one device on each ear, for example hearing aids and in-ear headsets, we may exploit a binaural decision rationale for updating a trainable dictionary element.
[0171] The update criterion may be a binaural criterion, also taking into account that e.g. an otherwise plausible 45 degree HRTF is not plausible if the contralateral HRTF-angle does not correspond to a similar direction. Such differences may indicate that the hearing instruments are not correctly mounted (see also section on ‘user feedback’ below).
[0172] Comparing estimated left and right angles may e.g. reveal whether the angles related to the dictionary elements agree on both sides. It could be that the angles are systematically shifted by a few degrees when comparing the left and right angles. This may indicate that the mounted instruments are not pointing in the same direction. This bias may be taken into account when assigning the dictionary elements.
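A binaural consistency check along these lines might be sketched as follows; the tolerance `max_bias_deg` is an assumption, and the returned angle difference could serve as the mounting-bias estimate mentioned above:

```python
def binaural_angle_check(angle_left_deg, angle_right_deg,
                         max_bias_deg=10.0):
    """Illustrative binaural criterion: the left and right estimated
    dictionary angles should point to a similar direction. A small
    systematic shift (bias) may be tolerated and compensated; a large
    mismatch disqualifies the update."""
    # Wrap the difference into [-180, 180) degrees.
    diff = (angle_left_deg - angle_right_deg + 180.0) % 360.0 - 180.0
    plausible = abs(diff) <= max_bias_deg
    return plausible, diff  # diff usable as a mounting-bias estimate
```
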
User Feedback on Device Usage
[0173] If there is a large difference between trained elements (cf. e.g. ATF.sub.pd,.sub.tr in
[0174] It may also imply problems with microphones, for example in the case of dust or dirt in the microphone inlets.
[0175] Also, in the case of unexpected deviations in the binaural case, the user can be informed about possible problems with the device.
Relation to “Head Dictionaries”
[0176] In our co-pending European patent application number EP20210249.7, filed with the European Patent Office on Nov. 27, 2020 and having the title “A hearing aid system comprising a database of acoustic transfer functions”, it is proposed to include dictionaries of head related transfer functions for different heads (e.g. different users, sizes, forms, etc., cf. e.g.
[0177]
[0178] The exemplary contents of the database Θ are illustrated in the upper right part of
[0179] The location of the sound source (S, or loudspeaker symbol in
Exemplary Embodiments of a Hearing Device
[0180]
[0181] The hearing device (HD) further comprises a database Θ stored in memory (MEM [DB]). The database Θ comprises a dictionary Δ.sub.pd of stored acoustic transfer function vectors (ATF.sub.pd), whose elements ATF.sub.pd,.sub.m, m=1, ..., M, are frequency dependent acoustic transfer functions representing location-dependent (θ) and frequency dependent (k) propagation of sound from a location (θ.sub.j) of a target sound source to each of said M microphones, k being a frequency index, k=1, ..., K, where K is a number of frequency bands. The stored acoustic transfer function vectors (ATF.sub.pd(θ, k)) may e.g. be determined in advance of use of the hearing device, while the microphone system (M.sub.1, ..., M.sub.M) is mounted on a head at or in an ear of a natural or artificial person (preferably as it is when the hearing system/device is operationally worn for normal use by the user), e.g. gathered in a standard dictionary Δ.sub.pd,.sub.std. The (or some of the) stored acoustic transfer function vectors (ATF.sub.pd) may e.g. be updated during use of the hearing device (where the user wears the microphone system (M.sub.1, ..., M.sub.M)), or a further dictionary (Δ.sub.pd,.sub.tr) comprising said updated or ‘trained’ acoustic transfer function vectors (determined by the unconstrained method, and evaluated to be reliable (e.g. by fulfilling a target signal quality criterion)) may be generated during use of the hearing system. The dictionary Δ.sub.pd comprises standard acoustic transfer function vectors (ATF.sub.pd,.sub.std) for the natural or artificial person (e.g. grouped in dictionary Δ.sub.pd,.sub.std) and, optionally, trained acoustic transfer function vectors (ATF.sub.pd,.sub.tr) (e.g. grouped in dictionary Δ.sub.pd,.sub.tr), for a multitude (J′) of different locations θ′.sub.j, j=1, ..., J′, relative to the microphone system (see
[0182] The hearing device (HD), e.g. the controller (CTR), is configured to determine a constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur) in dependence of said M electric input signals and said dictionary Δ.sub.pd of stored acoustic transfer function vectors (ATF.sub.pd,.sub.std, and optionally ATF.sub.pd,.sub.tr, cf.
[0183] The database is in the embodiment of
[0184] In the embodiment of
[0185]
[0186] The embodiment of
[0187] In the embodiment of
[0188] In the embodiment of
[0189] The embodiment of
[0190] The embodiment of
[0191] The embodiment of
[0192] The controller (CTR) is connected to the database (MEM [DB]), cf. signal ATF, and configured to determine the constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,.sub.cur) in dependence of the M electric input signals and the dictionary Δ.sub.pd of stored acoustic transfer function vectors (ATF.sub.pd, and optionally ATF.sub.pd,tr, cf.
[0193] In the embodiments of
[0194]
[0195] The hearing system, e.g. the processor (PRO), may comprise a multitude of M of analysis filter banks (FBAm, m=1, ..., M) for converting the time domain electric input signals (x.sub.1, ..., x.sub.M) to electric signals (X.sub.1, ..., X.sub.M) in a time frequency representation (k, l).
[0196] The hearing system, e.g. the processor (PRO), comprises a controller (CTR1) configured to determine a constrained estimate of a current acoustic transfer function vector (ATF.sub.pd,cur) in dependence of the M electric input signals (X.sub.1, ..., X.sub.M) and the dictionary (Δ.sub.pd) of previously determined acoustic transfer function vectors (ATF.sub.pd) stored in the database (Θ, MEM [DB]) via signal ATF. The database may form part of the at least one hearing device (HD), e.g. of the processor (PRO), or be accessible to the processor, e.g. via wireless link. The controller (CTR1) is further configured to provide an estimate of the reliability (p(ATF.sub.pd,.sub.cur)) of the constrained estimate of the current acoustic transfer function vector (ATF.sub.pd,.sub.cur). The reliability may e.g. be provided in the form of an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the constrained estimate of the current acoustic transfer function vector (ATF.sub.pd,.sub.cur) considering the current electric input signals. The reliability may e.g. be related to how well the constrained estimate of the current acoustic transfer function vector (ATF.sub.pd,cur) matches the current electric input signals in a maximum likelihood sense (see e.g. EP3413589A1).
[0197] The hearing system, e.g. the processor (PRO), comprises a controller (CTR2) configured to determine an unconstrained estimate of a current acoustic transfer function vector (ATF.sub.uc,.sub.cur) in dependence of the M electric input signals (X.sub.1, ..., X.sub.M). The controller (CTR2) is further configured to provide an estimate of the reliability (p(ATF.sub.uc,.sub.cur)), e.g. in the form of a probability, of the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur). The reliability may e.g. be provided in the form of an acoustic-transfer-function-vector-matching-measure indicative of a degree of matching of the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) considering the current electric input signals. The reliability may e.g. be related to how well the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur) matches the current electric input signals in a maximum likelihood sense (see e.g. [4]).
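The matching measure can be illustrated with a simplified stand-in: instead of the maximum-likelihood matching against the electric input signals described above, the sketch below scores each dictionary vector by its normalized inner product with an estimated vector and returns the best match and its score. This is an assumed simplification, not the disclosed ML criterion:

```python
import numpy as np

def constrained_estimate(dictionary, ratf_estimate):
    """Illustrative stand-in for the ML matching: score each dictionary
    RATF vector by its normalized inner product with an estimated
    vector; return the best-matching element (ATF_pd,cur) and its score
    (used here as the matching measure p(ATF_pd,cur))."""
    best, best_score = None, -1.0
    for d in dictionary:
        num = abs(np.vdot(d, ratf_estimate))
        den = np.linalg.norm(d) * np.linalg.norm(ratf_estimate)
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best, best_score = d, score
    return best, best_score
```
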
[0198] The hearing system, e.g. the processor (PRO), comprises a target signal quality estimator (TQM-E, e.g. a target signal to noise (SNR) estimator, see e.g. SNRE in
[0199] The hearing system, e.g. the processor (PRO), comprises a controller (CTR3) configured to determine a resulting acoustic transfer function vector (ATF*) for the user in dependence of a) the constrained estimate of the current acoustic transfer function vector (ATF.sub.pd,.sub.cur), b) the unconstrained estimate of the current acoustic transfer function vector (ATF.sub.uc,.sub.cur), and of c) at least one of c1) the acoustic-transfer-function-vector-matching-measure (p(ATF.sub.pd,.sub.cur)) indicative of a degree of matching of the constrained estimate (ATF.sub.pd,.sub.cur), of c2) the acoustic-transfer-function-vector-matching-measure (p(ATF.sub.uc,.sub.cur)) of the unconstrained estimate (ATF.sub.uc,.sub.cur), and of c3) a target-sound-source-location-identifier (TSSLI) indicative of a location of, direction to, or proximity of, the current target sound source.
[0200] The hearing system, e.g. the processor (PRO), may comprise a location estimator (LOCE) connected to one or more of the electric input signals (here X.sub.1, ..., X.sub.M), or to a signal or signals derived therefrom. The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of an own voice detector configured to estimate whether or not (or with what probability) a given input sound (e.g. a voice, e.g. speech) originates from the voice of the user of the wearable hearing system (e.g. the hearing device), e.g. in dependence of at least one of said M electric input signals or a signal or signals originating therefrom. If own voice is detected (or detected with a high probability) in the electric input signal(s), and if own voice is assumed to be the target signal (e.g. in a communication mode of operation), the target source location is the user’s mouth, and all other locations around the user can be ignored (or given less probability) in the determination of an appropriate current acoustic transfer function. The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of a direction of arrival estimator configured to estimate a direction of arrival of a current target sound source, e.g. in dependence of at least one of said M electric input signals or a signal or signals originating therefrom. Thereby acoustic transfer functions associated with locations within an angular range of the estimated direction of the location estimator may be associated with a higher probability than other transfer functions. The location estimator (LOCE) may e.g. be configured to provide the target-sound-source-location-identifier (TSSLI) in dependence of a proximity detector configured to estimate a distance to a current target sound source, e.g.
in dependence of at least one of the M electric input signals or a signal or signals originating therefrom, or in dependence of a distance sensor or detector. Thereby appropriate acoustic transfer functions associated with locations around the user that are within a range of the estimated distance of the location estimator may be associated with a higher probability than other transfer functions.
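The direction-of-arrival-based weighting described above can be sketched as a simple location prior over the dictionary angles. The angular width and the floor weight are assumptions for illustration:

```python
def location_prior(dictionary_angles_deg, doa_deg, width_deg=30.0):
    """Illustrative location prior from a DOA estimate: dictionary
    elements whose stored angle lies within ±width of the estimated
    direction of arrival get full weight, others a small floor weight
    (rather than being ignored entirely)."""
    weights = []
    for theta in dictionary_angles_deg:
        # Smallest absolute angular difference, wrapped to [0, 180].
        diff = abs((theta - doa_deg + 180.0) % 360.0 - 180.0)
        weights.append(1.0 if diff <= width_deg else 0.1)
    return weights
```
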
[0201] The hearing system, e.g. the processor (PRO), comprises an audio signal processing part (SP) configured to provide the processed signal (OUT) in dependence of the resulting acoustic transfer function vector (ATF*) for the user. The audio signal processing part (SP) may e.g. comprise a beamformer (cf. BF in
[0202] The controller (CTR) in
[0203] The hearing device (HD), e.g. a hearing aid, of
[0204] The synthesis filter bank (FBS) is configured to convert a number of frequency sub-band signals (OUT) to one time-domain signal (out). The signal processor (SP) is configured to apply one or more processing algorithms to the electric input signals (e.g. beamforming and compressive amplification) and to provide a processed output signal (OUT; out) for presentation to the user via an output unit (OU), e.g. an output transducer. The output unit is configured to a) convert a signal representing sound to stimuli perceivable by the user as sound (e.g. in the form of vibrations in air, or vibrations in bone, or as electric stimuli of the cochlear nerve) or to b) transmit the processed output signal (out) to another device or system.
[0205] The processor (PRO) and the signal processor (SP) may form part of the same digital signal processor (or be independent units). The analysis filter banks (FB-A1, FB-A2), the processor (PRO), the signal processor (SP), the synthesis filter bank (FBS), the controller (CTR), the target signal quality estimator (TQME; SNR-E), the voice activity detector (VAD), the target-sound-source-location-identifier (TSSLI), and the memory (MEM [DB]) may form part of the same digital signal processor (or be independent units).
[0206] The hearing device may comprise a transceiver allowing an exchange of data with another device, e.g. a contra-lateral hearing device of a binaural hearing system, a smartphone or any other portable or stationary device or system. The database may be located in the other device. Likewise, the processor PRO (or a part thereof) may be located in the other device (e.g. a dedicated processing device).
[0207]
[0208] The beamformers (BF) and (OV-BF) are connected to an acoustic transfer function estimator (ATFE) for providing the current acoustic transfer function vector (ATF*) in dependence of the current electric input signals (and possible sensors or detectors) according to the present invention. In a communication mode (e.g. telephone mode) of operation, the own-voice beamformer (OV-BF) is activated and the current acoustic transfer function vector (ATF*) is an own voice acoustic transfer function (ATF*.sub.ov), determined when the user speaks. In a non-communication mode of operation, the environment beamformer (BF) is activated and the current acoustic transfer function vector (ATF*) is an environment acoustic transfer function (ATF*.sub.env) (e.g. determined when the user does not speak). Likewise, in a communication mode wherein the environment beamformer is activated, the environment acoustic transfer function (ATF*.sub.env) may be determined from the electric input signals (X.sub.1, X.sub.2) when the user’s voice is not present (e.g. when the far-end communication partner speaks).
[0209]
[0210] The control unit (CONT) is configured to dynamically control the processing of the SSP- and MSP-signal processing units (G1 and G2, respectively), e.g. based on one or more control input signals (not shown).
[0211] The input signals (S-IN, M-IN) to the headset (HD) may be presented in the (time-) frequency domain or converted from the time domain to the (time-) frequency domain by appropriate functional units, e.g. included in receiver unit (Rx) and input unit (IU) of the headset. A headset according to the present disclosure may e.g. comprise a multitude of time-domain to time-frequency conversion units (e.g. one for each input signal that is not otherwise provided in a time-frequency representation, e.g. in the form of analysis filter bank units (FB-Am, m=1, ..., M) of
[0212]
[0213] In the embodiment of a hearing device in
[0214] The hearing system (here, the hearing device HD) may further comprise a detector unit e.g. comprising one or more inertial measurement units (IMU), e.g. a 3D gyroscope, a 3D accelerometer and/or a 3D magnetometer, here denoted IMU.sub.1 and located in the BTE-part (BTE). Inertial measurement units (IMUs), e.g. accelerometers, gyroscopes, and magnetometers, and combinations thereof, are available in a multitude of forms (e.g. multi-axis, such as 3D-versions), e.g. constituted by or forming part of an integrated circuit, and thus suitable for integration, even in miniature devices, such as hearing devices, e.g. hearing aids. The sensor IMU.sub.1 may thus be located on the substrate (SUB) together with other electronic components (e.g. MEM, FE, DSP). One or more movement sensors (IMU) may alternatively or additionally be located in or on the ITE part (ITE) or in or on the connecting element (IC), e.g. used to pick up sound from the user’s mouth (own voice).
[0215] The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in
[0216] The electric input signals (from input transducers M.sub.BTE1, M.sub.BTE2, M.sub.BTE3, M.sub.1 M.sub.2, M.sub.3, IMU.sub.1) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
[0217] The hearing device (HD) exemplified in
[0218] In the above description and examples, focus has been made on wearable hearing devices associated with a particular person. The inventive ideas of the present disclosure (to select a predetermined acoustic transfer function from a dictionary (constrained method) OR to estimate a new acoustic transfer function (un-constrained method) in dependence of a confidence parameter, e.g. regarding the quality of a current target signal, or the location of the audio source of current interest to the user) may, however, further be applied to hearing devices associated with a particular acoustic environment, e.g. of a particular location where the hearing device is located, e.g. a particular room. An example of such device may be a speakerphone configured to pick up sound from audio sources (e.g. one or more persons speaking) located in the particular room, and to (e.g. process and) transmit the captured sound to one or more remote listeners. The speakerphone may further be configured to play sound received from the one or more remote listeners to allow persons located in the particular room to hear it. Instead of being adapted to and adapting to a particular person, acoustic transfer functions of the speakerphone (or other audio device) may be adapted to the particular room.
[0219] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
[0220] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
[0221] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
[0222] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
REFERENCES
[0223] M. Zohourian, G. Enzner, and R. Martin, “Binaural Speaker Localization Integrated Into an Adaptive Beamformer for Hearing Aids,” IEEE TASLP, vol. 26, no. 3, pp. 515-528, March 2018.
[0224] Hao Ye and D. DeGroat, “Maximum likelihood DOA estimation and asymptotic Cramer-Rao bounds for additive unknown colored noise,” IEEE Transactions on Signal Processing, vol. 43, no. 4, pp. 938-949, April 1995.
[0225] S. Markovich-Golan and S. Gannot, “Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method,” in IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp. 544-548.
[0226] P. Hoang, Z.-H. Tan, J. M. de Haan, and J. Jensen, “Joint maximum likelihood estimation of power spectral densities and relative acoustic transfer function for acoustic beamforming,” in IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2021 (to be published).
[0227] J. Jensen and M. S. Pedersen, “Analysis of Beamformer Directed Single-Channel Noise Reduction System for Hearing Aid Applications,” in IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015.
[0228] EP3413589A1 (Oticon) Dec. 12, 2018.
[0229] [Gardner and Martin; 1994] B. Gardner and K. Martin, “HRTF Measurements of a KEMAR Dummy-Head Microphone,” MIT Media Lab Machine Listening Group, Technical Report #280, pp. 1-7, 1994.
[0230] [Dillon; 2001] H. Dillon, Hearing Aids, Thieme, New York-Stuttgart, 2001.
[0231] EP2869599A1 (Oticon) May 6, 2015.
[0232] US20190378531A1 (Oticon) Dec. 12, 2019.
[0233] EP3236672A1 (Oticon) Oct. 25, 2017.