Hearing system comprising a personalized beamformer
11582562 · 2023-02-14
Inventors
- Jesper Jensen (Smørum, DK)
- Nels Hede Rohde (Smørum, DK)
- Thomas Bentsen (Smørum, DK)
- Michael Syskind Pedersen (Smørum, DK)
- Svend Oscar Petersen (Smørum, DK)
Cpc classification
H04S7/30
ELECTRICITY
H04R2430/20
ELECTRICITY
H04R25/50
ELECTRICITY
H04R25/407
ELECTRICITY
H04R2225/43
ELECTRICITY
H04S2420/01
ELECTRICITY
H04R25/70
ELECTRICITY
H04R2201/40
ELECTRICITY
Abstract
A hearing system configured to be located at or in the head of a user, comprises a) at least two microphones providing at least two electric input signals, b) an own voice detector, c) access to a database (O.sub.l, H.sub.l) comprising c1) relative or absolute own voice transfer function(s), and corresponding c2) absolute or relative acoustic transfer functions for a multitude of test-persons, d) a processor connectable to the at least two microphones, to the own voice detector, and to the database. The processor is configured A) to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones, and B) to estimate personalized relative or absolute head related acoustic transfer functions from at least one spatial location other than the user's mouth to at least one of the microphones of the hearing system in dependence of the estimated own voice relative transfer function(s) and the database (O.sub.l, H.sub.l). The hearing system further comprises e) a beamformer configured to receive the at least two electric input signals, or processed versions thereof, and to determine personalized beamformer weights based on the personalized relative or absolute head related acoustic transfer functions or impulse responses. A method of determining personalized beamformer coefficients (w.sub.k) is further disclosed.
Claims
1. A hearing system configured to be located at or in an ear or in the head at the ear of a user, the hearing system comprising at least two microphones, one of which being denoted the reference microphone, each for converting sound from the environment of the hearing system to an electric input signal representing said sound as received at the location of the microphone in question; an own voice detector configured to estimate whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system, and to provide an own voice control signal indicative thereof; a memory wherein a database (Ol, Hl) of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons are stored, or a transceiver allowing access to said database (Ol, Hl), the database (Ol, Hl) comprising for each of said multitude of test-persons a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the microphones of a microphone system worn by said given test-person, and a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of the microphone system worn by said given test-person; a processor connected or connectable to the at least two microphones, to said own voice detector, and to said database, the processor being configured to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones in dependence of said at least two electric input signals, and of said own voice control signal, and to estimate personalized relative or absolute head related acoustic transfer functions or impulse 
responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function(s) and said database (Ol, Hl); and a beamformer configured to receive said at least two electric input signals, and to determine personalized beamformer weights based on said personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.
2. A hearing system according to claim 1 comprising a detector or estimator of a current signal quality in dependence of said at least two electric input signals.
3. A hearing system according to claim 2 comprising an SNR estimator (SNRE) for providing an estimate of signal to noise ratio.
4. A hearing system according to claim 1 wherein the microphone systems worn by said multitude of test-persons comprise microphones located at the same positions as the at least two microphones of the hearing system.
5. A hearing system according to claim 1 wherein the processor comprises a relative own voice transfer function estimator (ROVTE) for estimating a relative own voice transfer function vector OVTk,user whose elements are the relative transfer functions for sound from the user's mouth to each of the at least two microphones of the hearing system.
6. A hearing system according to claim 1 comprising an own-voice power spectral density estimator (OV-PSDE) configured to provide an estimate of the own-voice power spectral density vector Sk at a given point in time.
7. A hearing system according to claim 1 comprising a personalized head related transfer functions estimator (P-HRTF-E) for estimating said personalized relative or absolute head related acoustic transfer functions dk,user or impulse responses from said estimated own voice transfer function vector OVTk,user and said database (Ol, Hl).
8. A hearing system according to claim 6 wherein said relative own voice transfer function vector OVTk,user is estimated from the input own-voice power spectral density vector Sk as OVTk,user=sqrt(Sk/Sk,iref), where iref is the index of a reference microphone among said at least two microphones.
9. A hearing system according to claim 1 comprising a trained neural network for determining the personalized head related transfer functions using the estimated relative own voice transfer function vector OVTk,user as an input vector.
10. A hearing system according to claim 1 being constituted by or comprising a hearing aid, a headset, an earphone, an ear protection device or a combination thereof.
11. A method of estimating personalized beamformer weights for a hearing system comprising at least two of microphones, one of which being denoted the reference microphone, the hearing system being configured to be worn by a specific user, the method comprising, providing at least two electric signals representing sound in an environment of the user at a location of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal; providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system; and providing a database (Ol, Hl), or providing access to such database (Ol, Hl), of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons providing in the database (Ol, Hl) a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the at least two microphones of a microphone system worn by said given test-person; and providing in the database (Ol, Hl) a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person; estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, and on said own voice control signal, and estimating personalized relative or absolute head related acoustic transfer functions or impulse responses, or a 
transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (Ol, Hl); and determining personalized beamformer weights (wk,user) for a beamformer configured to receive said at least two electric input signals, based on said personalized relative or absolute head related acoustic transfer functions (HRTFl*) or impulse responses (HRlRl*), or a transformation thereof, wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.
12. A method according to claim 11 wherein the beamformer is a binaural beamformer based on electric input signals from said at least two microphones located at left as well as right ears of the user.
13. A method according to claim 11 comprising mapping said relative own voice transfer function (OVTuser) or impulse response to an absolute or relative own voice transfer function (OVTl*) or impulse response of a specific test-person l* among said multitude of test-persons from said database (Ol, Hl) according to a predefined criterion; and deriving estimated absolute or relative far-field head related transfer functions (HRTFuser) for said user in dependence of the absolute or relative far-field head related transfer functions (HRTFl*) for said specific test-person stored in said database (Ol, Hl).
14. A method according to claim 11 wherein the predefined criterion comprises minimization of a cost function.
15. A method according to claim 11 comprising providing a beamformed signal based on said personalized beamformer weights.
16. A hearing system configured to be located at or in an ear, or in the head at the ear of a user, the hearing system comprising at least two microphones, one of which being denoted the reference microphone, each for converting sound from the environment of the hearing system to an electric input signal representing said sound as received at the location of the microphone in question; an own voice detector configured to estimate whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system, and to provide an own voice control signal indicative thereof; a memory wherein a database (Ol, Hl) of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons are stored, or a transceiver allowing access to said database (Ol, Hl), the database (Ol, Hl) comprising for each of said multitude of test-persons a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the microphones of a microphone system worn by said given test-person, and a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of the microphone system worn by said given test-person; a processor connected or connectable to the at least two microphones, to said own voice detector, and to said database, the processor being configured to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones in dependence of said at least two electric input signals, and of said own voice control signal, and to estimate personalized relative or absolute head related acoustic transfer functions or impulse 
responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function(s) and said database (Ol, Hl), wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.
17. A hearing system according to claim 16 comprising a signal processor configured to process said at least two electric signals in dependence of said estimated personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof.
18. A hearing system according to claim 17 wherein said signal processor is configured to process said at least two electric signals to compensate for a user's hearing impairment.
19. A method of estimating personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, for a hearing system comprising at least two of microphones, one of which being denoted the reference microphone, the hearing system being configured to be worn by a specific user, the method comprising, providing at least two electric signals representing sound in an environment of the user at a location of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal; providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system; and providing a database (Ol, Hl), or providing access to such database (Ol, Hl), of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons providing in the database (Ol, Hl) a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the at least two microphones of a microphone system worn by said given test-person; and providing in the database (Ol, Hl) a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person; estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, and on said own voice control signal, and estimating 
personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (Ol, Hl), wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.
20. A method according to claim 19 comprising processing said at least two electric signals in dependence of said estimated personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15) The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
(16) Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
(17) The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
(18) The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
(19) The present application relates to the field of hearing devices, e.g. hearing aids.
(20)
(21) A. Measurement of own voice transfer functions (OVT) using a close-talk microphone located at the mouth of a user (cf. e.g.
(22) B. Mapping of OVTs to absolute or relative head related transfer functions (HRTFs).
(23) C. Computation of personalized beamformer coefficients from the HRTFs.
(24)
(25) A. Estimation of relative own voice transfer functions (OVT) using own-voice detector and hearing aid (HA) microphones (cf.
(26) B. Mapping of OVTs to absolute or relative head related transfer functions (HRTFs).
(27) C. Computation of personalized beamformer coefficients from the HRTFs.
(28) Absolute and Relative Own Voice Transfer Functions
(29) Let h.sub.i(n) denote an own-voice impulse response (OIR), i.e., the impulse response from a point just outside the mouth of hearing aid user (here the location of the ‘Close-talk microphone’) to the i'th microphone of the hearing aid (
(30)
(31) As a first step in the proposed method, the absolute or relative OVT must be estimated.
(32) The absolute OVT vector OVT′.sub.k, k=1, . . . , K may be estimated, e.g., at a hearing care professional (HCP) during a hearing aid (HA) fitting using a small voice sample: the hearing aid user wears the HAs and a mouth reference microphone (cf. ‘Close-talk microphone’ of
(33) Additionally, or alternatively, the relative OVT may be used. If the absolute OVT is measured at the HCP, then the relative OVT may easily be derived from the absolute OVT. Alternatively, the relative OVT may be estimated online during everyday use of the hearing aid as follows (see
(34)
(35)
where {circumflex over (R)}.sub.VV(k) is (an estimate of) the inter-microphone noise covariance matrix for the current acoustic environment, {circumflex over (d)}(k) is the estimated look vector (representing the inter-microphone transfer function for a target sound source at a given location), k is the frequency index and i.sub.ref is an index of the reference microphone (*denotes complex conjugate, and .sup.H denotes Hermitian transposition).
(36)
(37) The power spectral density (psd) of an audio signal is a representation of the distribution over frequencies of the energy of the signal (determined over a certain time range). A graph of power spectral density versus frequency may also be termed the ‘spectral energy distribution’. For a stochastic signal (e.g. some types of noise), the power spectral density is defined as the Fourier transform of its auto-correlation function. For an audio signal, a power spectral density may e.g. be based on a prior classification of the signal, e.g. classified as ‘speech’ or ‘noise’, e.g. using a voice activity detector. The power spectral density may e.g. be determined over the time frame of a syllable, a word, a sentence or longer periods of coherent speech. In the present context of own voice, a power spectral density may appropriately be related to time periods where an own voice detector indicates the presence of own voice (e.g. with a probability above a certain threshold, e.g. 70% or 80%).
(38) The block OVD represents an own-voice detection algorithm that continuously monitors if/when the hearing aid user speaks in a situation without too much background noise. This detection may be combined with other detectors, e.g. SNR detectors (SNRE), for robustness. The detection threshold is set such that only highly probable own-voice situations are detected: we are interested in detecting a few situations, e.g., one per hour or one every 6 hours, where own-voice is highly likely (in other words, the false-alarm rate should preferably be low). It is less important if many own-voice situations go undetected.
(39) Let S.sub.k,i denote the power spectral density of the own voice signal picked up by microphone i at frequency k, and let S.sub.k=[S.sub.k,1 . . . S.sub.k,M] denote a vector of such power spectral densities. In a situation, where own-voice is detected with high likelihood, the relative OVT may be estimated as OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref).
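The relation OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref) can be illustrated with a minimal sketch. It assumes the microphone signals are already available as short-time spectra and that the own voice detector supplies one probability per frame; the function name, the array layout and the 0.8 threshold are illustrative assumptions, not part of the disclosure. Because the estimate is a square root of a ratio of power spectral densities, it is real-valued (magnitude only).

```python
import numpy as np

def estimate_relative_ovt(X, ov_prob, i_ref=0, prob_threshold=0.8):
    """Estimate relative own-voice transfer functions from gated frames.

    X              : complex short-time spectra, shape (T, K, M)
                     (frames, frequency bins, microphones) -- assumed layout
    ov_prob        : own-voice probability per frame, shape (T,)
    i_ref          : index of the reference microphone
    prob_threshold : hypothetical detection threshold (e.g. 0.8)
    Returns an array of shape (K, M), or None if no frame qualifies.
    """
    X = np.asarray(X)
    ov_prob = np.asarray(ov_prob)
    keep = ov_prob >= prob_threshold          # frames with confident own voice
    if not np.any(keep):
        return None
    # Own-voice power spectral density S[k, i], averaged over kept frames.
    S = np.mean(np.abs(X[keep]) ** 2, axis=0)
    # Relative OVT: sqrt(S_k / S_k,iref), element-wise per microphone.
    return np.sqrt(S / S[:, [i_ref]])
```

Returning None when no frame passes the threshold mirrors the preference for a low false-alarm rate: no estimate is produced unless own voice is highly likely.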
(40) The own-voice power spectral density estimator (OV-PSDE) provides an estimate of the own-voice power spectral density vector S.sub.k=[S.sub.k,1 S.sub.k,2] at a given point in time. The estimate is based on inputs from one or more detectors related to a current signal content of the first and second electric input signals (X.sub.1, X.sub.2). In the embodiment of
(41) The hearing device further comprises a relative OVT estimator (ROVTE) for estimating a relative transfer function vector OVT.sub.k,user. The elements of the relative own voice transfer function vector are the relative transfer functions for sound from the user's mouth to each of the microphones of the hearing device, estimated from the input own-voice power spectral density vector S.sub.k as OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref), where iref is the index of the reference microphone. For the embodiment of
(42) The hearing device further comprises a personalized head related transfer functions estimator (P-HRTF-E) for estimating the personalized relative or absolute head related acoustic transfer functions d.sub.k,user or impulse responses from the estimated own voice transfer function vector OVT.sub.k,user and the database (O.sub.l, H.sub.l). An embodiment of the personalized head related transfer functions estimator (P-HRTF-E) is described in further detail in connection with
(43) Estimating Absolute or Relative HRTFs from Absolute or Relative OVTs
(44) We propose to estimate absolute/relative HRTFs from the absolute/relative OVT estimates described above.
(45) Absolute and Relative HRTFs:
(46) Let g.sub.i,j,l(n) denote an impulse response (head related impulse response (HRIR)) from a j.sup.th point in space (at ‘Speaker j’ in
(47) Using the same procedure as for OIRs, HRIRs may be transformed into absolute HRTFs: let e′.sub.k,i,j,l, k=1, . . . , K denote a Fourier transform of the HRIR g.sub.i,j,l(n), where e′.sub.k,i,j,l is the absolute HRTF at frequency k, from spatial point j to microphone i, for the l.sup.th test subject. We may then form an absolute HRTF vector e′.sub.k,j,l=[e′.sub.k,0,j,l . . . e′.sub.k,M-1,j,l] and define a relative HRTF vector e.sub.k,j,l=e′.sub.k,j,l/e′.sub.k,iref,j,l, i.e., each element is normalized by the absolute HRTF of the reference microphone.
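As a minimal sketch of the transformation from impulse responses to absolute and relative transfer functions, the following assumes real-valued HRIRs sampled on a common time grid and uses a discrete Fourier transform. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def relative_hrtf(hrir, i_ref=0, n_fft=None):
    """Convert HRIRs for one spatial point j into absolute and relative HRTFs.

    hrir  : real array, shape (M, N): impulse responses to M microphones
    i_ref : index of the reference microphone
    n_fft : optional DFT length (defaults to N)
    Returns (absolute, relative), each of shape (K, M), K = n_fft // 2 + 1.
    """
    hrir = np.asarray(hrir, dtype=float)
    # Absolute HRTF per microphone: DFT along the time axis.
    absolute = np.fft.rfft(hrir, n=n_fft, axis=-1).T
    # Relative HRTF: normalize each bin by the reference microphone.
    relative = absolute / absolute[:, [i_ref]]
    return absolute, relative
```

By construction the reference-microphone element of the relative vector equals one at every frequency, so only the inter-microphone differences remain.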
(48)
(49) A Priori Database of HRTF and OVT Pairs:
(50) We assume that a database of (O.sub.l, H.sub.l) pairs has been collected a priori for many test subjects. Here, O.sub.l denotes one or more or all (for all microphones) pre-measured OVTs for test subject l (for example stacked as a vector), and similarly, H.sub.l denotes one or more or all HRTFs for test subject l (for example stacked as a vector).
(51) For example, O.sub.l could be the collection of absolute OVTs OVT′.sub.k,l, for all frequencies k=1, . . . , K and for all microphones for test subject l. As another example, O.sub.l could be defined as the relative transfer functions OVT.sub.k,l for one or some microphones for test subject l. Many other obvious variants exist (combinations of frequencies, absolute/relative OVTs, and microphone indices).
(52) Similarly, H.sub.l could be a collection of absolute HRTFs e′.sub.k,j,l, for all frequencies k=1, . . . , K from spatial point j. Alternatively, H.sub.l could represent a collection of absolute HRTFs for all frequencies k=1, . . . , K, and for all spatial points, j=1, . . . , J. Alternatively, H.sub.l could represent a collection of relative HRTFs for all frequencies, k=1, . . . , K, for a subset of spatial points and a subset of microphones. Many other obvious variants exist (combinations of frequencies, absolute/relative HRTFs, spatial points, and microphone indices).
(53) As an alternative to having a (O.sub.l, H.sub.l) pair, the OVT could be substituted by a transfer function measured from a certain position, e.g. as described in EP2928215A1.
(54) Mapping from OVTs to HRTFs:
(55) Given the a priori database of (O.sub.l, H.sub.l)-pairs, l=1, . . . , L (where L is the number of test subjects), there exist several ways of estimating the HRTF-information of the user, H.sub.user, from the user's OVT-information, O.sub.user. Note that the HRTF- and OVT-information of a particular user is unlikely to be present in the a priori database.
(56) Table Lookup Based Approach:
(57) The OVT-information of the user, measured either at the HCP or in the online procedure as outlined above, may be compared to each and every instance of O.sub.l, l=1, . . . , L, in the a priori database in order to find the database entry, l*, for which O.sub.l matches O.sub.user best. For example, the least-square distance measure could be used. The corresponding estimate of the user's personalized HRTF-information is then H.sub.l*, where
l*=argmin.sub.l d(O.sub.l, O.sub.user),
where d(⋅) is a distance measure between OVTs. Several different distance measures may be used, e.g. a Euclidean distance.
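The table lookup can be sketched in a few lines, assuming that each database entry has been stacked into a fixed-length vector and that the Euclidean distance is used as d(⋅). The function name and the array layout are illustrative assumptions.

```python
import numpy as np

def lookup_hrtf(O_user, O_db, H_db):
    """Nearest-neighbour lookup of HRTF information from OVT information.

    O_user : user's OVT vector, shape (D,)
    O_db   : OVT vectors of the L test subjects, shape (L, D)
    H_db   : HRTF vectors of the L test subjects, shape (L, P)
    Returns (l_star, H_l_star, min_distance).
    """
    O_user = np.asarray(O_user)
    O_db = np.asarray(O_db)
    # Euclidean distance to every database entry (valid for complex vectors too).
    dists = np.linalg.norm(O_db - O_user, axis=1)
    l_star = int(np.argmin(dists))
    return l_star, np.asarray(H_db)[l_star], float(dists[l_star])
```

The minimum distance is returned alongside the selected entry because it indicates how well the best database entry actually matches the user.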
Statistical Model Based Approach:
(58) Based on the a priori database, an a priori statistical model may be derived. In particular, if the (O.sub.l, H.sub.l)-pairs in the a priori database are considered as realizations of random vector variables, then a joint probability density model f(O, H) may be fitted to the entries in the database, e.g., using a Gaussian Mixture Model or other parametric models. Given this statistical model and the estimated O.sub.user information of a particular user, for which an estimate of her HRTF information is desired, it is straightforward to compute minimum mean-square error (mmse) estimates of the personal HRTF information:
H.sub.mmse=∫H f(H|O.sub.user)dH,
where ∫ denotes a multi-dimensional integral across all dimensions in vector H, and where f(H|O) denotes a conditional probability density function (pdf), which can be derived from the joint pdf model f(O,H). The integral may be evaluated numerically.
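For the special case where the joint model f(O, H) is a single Gaussian (a one-component mixture), the mmse integral has a closed form: the conditional mean of H given O.sub.user. The sketch below illustrates this special case only; it assumes real-valued feature vectors (e.g. stacked real and imaginary parts), and the function name and the small ridge term added for numerical stability are illustrative assumptions.

```python
import numpy as np

def gaussian_mmse_hrtf(O_user, O_db, H_db, ridge=1e-6):
    """MMSE estimate of H given O_user under a joint Gaussian fit.

    O_db : shape (L, D), H_db : shape (L, P), both real-valued.
    Returns E[H | O = O_user], shape (P,).
    """
    O_db = np.asarray(O_db, dtype=float)
    H_db = np.asarray(H_db, dtype=float)
    mu_O, mu_H = O_db.mean(axis=0), H_db.mean(axis=0)
    Oc, Hc = O_db - mu_O, H_db - mu_H
    L = O_db.shape[0]
    # Sample covariance of O (regularized) and cross-covariance of H and O.
    S_OO = Oc.T @ Oc / L + ridge * np.eye(O_db.shape[1])
    S_HO = Hc.T @ Oc / L
    # Conditional mean: mu_H + S_HO S_OO^{-1} (O_user - mu_O).
    delta = np.asarray(O_user, dtype=float) - mu_O
    return mu_H + S_HO @ np.linalg.solve(S_OO, delta)
```

With a multi-component Gaussian mixture, the estimate would instead be a posterior-weighted combination of such per-component conditional means.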
(59) Alternatively, a maximum a posteriori (map) estimate of H.sub.user may be found by maximization of the posterior probability:
H.sub.map=argmax.sub.H f(O.sub.user|H) f(H),
where f(H) denotes a prior probability on the HRTF vector, which may, e.g., be chosen as a uniform distribution, f(H)=const. The maximization may be performed numerically.
Deep Neural Network Based Approach:
(60) Based on the a priori database, a Deep Neural Network (DNN) may be trained in an offline procedure prior to deployment, using O.sub.l and H.sub.l as inputs and target outputs, respectively. The DNN may be a feedforward network (multi-layer perceptron), a recurrent network, a convolutional network, or combinations and variants thereof. The cost function optimized during training may be, e.g., the mean-square error between estimated and true HRTF vectors.
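A minimal sketch of such an offline training procedure is given here for a small feedforward network with one hidden layer, trained by full-batch gradient descent on the mean-square error. It assumes real-valued OVT and HRTF vectors; the network size, learning rate, number of epochs and function name are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def train_mlp(O, H, hidden=16, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer perceptron mapping OVT vectors to HRTF vectors.

    O : inputs, shape (n, D);  H : targets, shape (n, P); both real-valued.
    Returns (predict, losses): a prediction function and the loss per epoch.
    """
    O = np.asarray(O, dtype=float)
    H = np.asarray(H, dtype=float)
    rng = np.random.default_rng(seed)
    D, P = O.shape[1], H.shape[1]
    W1 = rng.standard_normal((D, hidden)) * np.sqrt(1.0 / D)
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, P)) * np.sqrt(1.0 / hidden)
    b2 = np.zeros(P)
    losses = []
    for _ in range(epochs):
        A = np.tanh(O @ W1 + b1)            # hidden activations
        err = (A @ W2 + b2) - H             # prediction error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the gradient of the mean-square error.
        g_out = 2.0 * err / err.size
        g_hid = (g_out @ W2.T) * (1.0 - A ** 2)
        W2 -= lr * (A.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (O.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def predict(O_new):
        """Map new OVT vectors, shape (n_new, D), to estimated HRTF vectors."""
        return np.tanh(np.asarray(O_new, dtype=float) @ W1 + b1) @ W2 + b2

    return predict, losses
```

After training, only the forward pass in predict would need to run on the device; the training itself stays offline.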
(61) Finding Beamformer Coefficients from Estimated Personalized HRTFs:
(62) From the estimated personalized HRTF information, H.sub.est, it is straightforward to derive personalized beamformer coefficients. For example, if H.sub.est contains relative HRTFs e.sub.k,j, k=1, . . . , K, for a sound source from a frontal location (j) for two microphones in the same hearing aid, then the coefficients of a Minimum Variance Distortionless Response (MVDR) beamformer are given by
w.sub.k=(R.sub.vv,k).sup.−1e.sub.k,j/(e.sub.k,j.sup.H(R.sub.vv,k).sup.−1e.sub.k,j),
where (⋅).sup.−1 denotes matrix inversion, (⋅).sup.H denotes Hermitian transposition, matrix R.sub.vv,k is a noise cross-power spectral density matrix [Loizou] for the microphones involved, and e.sub.k,j is the (2-element) relative HRTF vector related to a spatial point (j) in the frontal direction.
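The MVDR expression can be evaluated per frequency bin as sketched here. The sketch uses the conjugate (Hermitian) transpose in the denominator, which coincides with the plain transpose for real-valued vectors, and it solves a linear system instead of forming the matrix inverse explicitly. The function name is an illustrative assumption.

```python
import numpy as np

def mvdr_weights(R_vv, e):
    """MVDR beamformer weights for one frequency bin.

    R_vv : noise cross-power spectral density matrix, shape (M, M),
           assumed Hermitian and positive definite
    e    : relative HRTF (look) vector for the target direction, shape (M,)
    Returns w = R_vv^{-1} e / (e^H R_vv^{-1} e), shape (M,).
    """
    R_vv = np.asarray(R_vv, dtype=complex)
    e = np.asarray(e, dtype=complex)
    Rinv_e = np.linalg.solve(R_vv, e)      # R_vv^{-1} e without explicit inverse
    return Rinv_e / (e.conj() @ Rinv_e)
```

A quick sanity check is the distortionless constraint: the inner product w.sup.H e should equal one, so that sound from the target direction passes unchanged.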
(63) Many other personalized beamformer variants, e.g., the Multi-Channel Wiener Filter [Brandstein], binaural beamformers (involving microphones in hearing aids on both ears) [Marquard], etc., may be derived from estimated personalized absolute/relative HRTF vectors.
(64) Extensions:
(65) Using HRIRs and OIRs Rather than HRTFs and OVTs:
(66) The concept of the present disclosure is described in terms of OVTs and HRTFs. It is, however, straightforward to exchange these quantities with their time-domain analogues, OIRs and HRIRs, and perform a mapping from OIRs (estimated from a voice sample of the user, either at the HCP or in an “online” approach) to HRIRs.
(67) Detecting Implausible OVTs:
(68) Performing the mapping from personalized OVTs to personalized HRTFs using the table lookup based approach involves the computation of distances between an estimated personal OVT and OVTs of test subjects which have been measured and stored up-front in the a priori database. The computed minimum distance may be used to estimate the reliability of the OVT measurement. Specifically, if the minimum distance exceeds a pre-specified threshold, the OVT measurement may be labeled as potentially unreliable (e.g., due to noise, reverberation, or other problems during the OVT estimation process).
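The reliability check can be sketched as a threshold on the minimum distance to the database. The Euclidean distance, the function name and the parameter max_distance (standing in for the pre-specified threshold) are illustrative assumptions.

```python
import numpy as np

def ovt_is_plausible(O_user, O_db, max_distance):
    """Return True if the estimated OVT lies close enough to some database entry.

    O_user       : estimated OVT vector of the user, shape (D,)
    O_db         : OVT vectors of the test subjects, shape (L, D)
    max_distance : hypothetical pre-specified threshold on the minimum distance
    """
    dists = np.linalg.norm(np.asarray(O_db) - np.asarray(O_user), axis=1)
    # A minimum distance above the threshold flags a potentially unreliable OVT.
    return bool(dists.min() <= max_distance)
```

An estimate flagged as implausible could, for example, be discarded and replaced by a later own-voice measurement.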
(70) The OV-RTF could be estimated in a controlled setup, where the user is prompted to speak. Alternatively, the OV-RTF could be estimated or updated whenever OV is detected. The OV detector could be based on acoustic features or, alternatively or in addition, on other features such as detected jaw vibrations (from an accelerometer or a vibration microphone) or on individual features such as pitch frequency. The OV detection may also be based on results from another hearing instrument.
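One possible way of estimating the OV-RTF only while own voice is detected is sketched below in Python with NumPy. The estimator (ratio of cross-power to auto-power spectra averaged over own-voice frames), the function name and the gating threshold are assumptions of this sketch. It further assumes that noise is negligible while the user speaks.

```python
import numpy as np

def estimate_ov_rtf(X_ref, X_mic, ov_prob, threshold=0.5):
    """Estimate an own-voice relative transfer function per frequency band.

    X_ref, X_mic : (T, F) complex time-frequency frames of the reference
                   microphone and of another microphone.
    ov_prob      : (T,) own voice control signal (probability in [0, 1]).
    threshold    : hypothetical gate on the own voice probability.
    """
    X_ref = np.asarray(X_ref)
    X_mic = np.asarray(X_mic)
    use = np.asarray(ov_prob) >= threshold
    if not np.any(use):
        raise ValueError("no own-voice frames available")
    # Cross- and auto-power spectra averaged over own-voice frames only.
    s_cross = np.mean(X_mic[use] * np.conj(X_ref[use]), axis=0)
    s_ref = np.mean(np.abs(X_ref[use]) ** 2, axis=0)
    return s_cross / np.maximum(s_ref, 1e-12)

# Synthetic check: own voice in the first 100 frames, unrelated noise after.
rng = np.random.default_rng(0)
T, F = 200, 3
rtf_true = np.array([0.5, 0.8 * np.exp(1j * 0.3), 1.2 * np.exp(-1j * 0.7)])
X_ref = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
X_mic = rtf_true * X_ref
X_mic[100:] = rng.standard_normal((100, F))
ov_prob = np.where(np.arange(T) < 100, 0.9, 0.1)

rtf_est = estimate_ov_rtf(X_ref, X_mic, ov_prob)
```

The gate could equally be replaced by a soft weighting with the own voice probability; the hard threshold is merely the simplest choice.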
(71) The validation procedure is exemplified using OV. However, the described validation method may also be used to validate other impulse response measurements, such as a measured 0 degrees (frontal) impulse response (e.g. measured as described in EP2928215A1).
(74) Instead of measuring transfer functions at different frequencies, corresponding impulse responses (OIR, HRIR) may be measured and converted to the frequency domain by an appropriate transformation (e.g. a Fourier transformation algorithm, e.g. a discrete Fourier transformation (DFT) algorithm).
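By way of a hedged example, the conversion of measured impulse responses to a relative transfer function by a DFT may be sketched as follows in Python with NumPy. The function name and the toy impulse responses are hypothetical, and the sketch assumes that the reference transfer function has no spectral zeros.

```python
import numpy as np

def impulse_responses_to_rtf(h_ref, h_mic, n_fft=None):
    """Convert two impulse responses to a relative transfer function.

    h_ref : impulse response to the reference microphone.
    h_mic : impulse response to another microphone.
    n_fft : DFT length (defaults to the longer of the two responses).
    """
    h_ref = np.asarray(h_ref, dtype=float)
    h_mic = np.asarray(h_mic, dtype=float)
    n = n_fft or max(len(h_ref), len(h_mic))
    # Absolute transfer functions at the non-negative DFT frequencies.
    H_ref = np.fft.rfft(h_ref, n=n)
    H_mic = np.fft.rfft(h_mic, n=n)
    # Relative transfer function with respect to the reference microphone.
    return H_mic / H_ref

# Toy example: the second microphone receives the sound one sample later
# and at half the amplitude.
h_ref = [1.0, 0.0, 0.0, 0.0]
h_mic = [0.0, 0.5, 0.0, 0.0]
rtf = impulse_responses_to_rtf(h_ref, h_mic)
```

For this toy example the relative transfer function has magnitude 0.5 in every band and a phase that decreases linearly with frequency, as expected for a pure delay.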
(77) In the embodiment of a hearing device in
(78) The substrate (SUB) further comprises a configurable signal processor (DSP, e.g. a digital signal processor), e.g. including a processor for applying a frequency and level dependent gain, e.g. providing beamforming, noise reduction, filter bank functionality, and other digital functionality of a hearing device, e.g. implementing features according to the present disclosure. The configurable signal processor (DSP) is adapted to access the memory (MEM) e.g. for selecting appropriate parameters for a current configuration or mode of operation and/or listening situation and/or for writing data to the memory (e.g. algorithm parameters, e.g. for logging user behavior) and/or for accessing the database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses according to the present disclosure. The configurable signal processor (DSP) is further configured to process one or more of the electric input audio signals and/or one or more of the directly received auxiliary audio input signals, based on a currently selected (activated) hearing aid program/parameter setting (e.g. either automatically selected, e.g. based on one or more sensors, or selected based on inputs from a user interface). The mentioned functional units (as well as other components) may be partitioned in circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs. digital processing, acceptable latency, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The configurable signal processor (DSP) provides a processed audio signal, which is intended to be presented to a user. 
The substrate further comprises a front-end IC (FE) for interfacing the configurable signal processor (DSP) to the input and output transducers, etc., and typically comprising interfaces between analogue and digital signals (e.g. interfaces to microphones and/or loudspeaker(s), and possibly to sensors/detectors). The input and output transducers may be individual separate components, or integrated (e.g. MEMS-based) with other electronic circuitry.
(79) The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in
(80) The electric input signals (from input transducers M.sub.BTE1, M.sub.BTE2, M.sub.ITE) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
(81) All three (M.sub.BTE1, M.sub.BTE2, M.sub.ITE) or two of the three microphones (M.sub.BTE1, M.sub.ITE) may be included in the ‘personalization’-procedure for head related transfer functions according to the present disclosure. The ‘front’-BTE-microphone (M.sub.BTE1) may be selected as a reference microphone, and the ‘rear’-BTE-microphone (M.sub.BTE2) and/or the ITE-microphone (M.sub.ITE) may be selected as normal microphones for which relative own-voice transfer functions can be measured by the hearing device. Since, relative to the hearing device user's own voice, the hearing device microphones (M.sub.BTE1, M.sub.BTE2, M.sub.ITE) are located in the acoustic near-field, a relatively large level difference may be experienced for the own voice sound received at the respective microphones. Thus, the relative transfer functions may be substantially different from 1.
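The near-field effect may be illustrated numerically, under strongly simplifying assumptions, by the following Python sketch. It treats the source as a free-field point source obeying the 1/r law and ignores head shadow and diffraction altogether. The distances are hypothetical and serve only to compare a source close to the microphones with a distant one for the same path-length difference.

```python
import math

def level_difference_db(d_ref, d_mic):
    """Level at a microphone relative to the reference microphone, in dB,
    for a free-field point source obeying the 1/r law (a simplification)."""
    return 20.0 * math.log10(d_ref / d_mic)

# Hypothetical distances in metres; both cases differ by 3 cm in path length.
own_voice_db = level_difference_db(d_ref=0.15, d_mic=0.12)
far_source_db = level_difference_db(d_ref=2.00, d_mic=1.97)

print(f"own voice: {own_voice_db:.2f} dB, distant source: {far_source_db:.2f} dB")
```

Under these assumptions the same path-length difference yields a considerably larger level difference for the nearby source than for the distant one, which is consistent with relative own-voice transfer functions differing substantially from 1.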
(82) In the embodiment of
(83) The embodiment of a hearing device (HD) exemplified in
(85) In an aspect, the present application proposes an offline or online procedure for estimating personalized beamformer coefficients for a particular user from information regarding personal own-voice-transfer function(s). The procedure comprises:
(86) A. Measurement of own voice transfer function(s), using microphones located at an ear of the user, and optionally a close-talk microphone located at the mouth of a user;
(87) B. Mapping of the measured own voice transfer function(s) to a set of absolute or relative head related transfer functions;
(88) C. Computation of personalized beamformer coefficients from the set of head related transfer functions.
(89) In an embodiment, a method of estimating personalized beamformer weights for a hearing system is provided, the hearing system comprising a multitude of microphones, one of which is denoted the reference microphone, and being configured to be worn by a specific user. The method comprises:
S1. providing at least two electric input signals representing sound in an environment of the user at a location of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal;
S2. providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, or a processed version thereof, comprise a voice from the user of the hearing system;
S3. providing a database (O.sub.l, H.sub.l), or providing access to such a database (O.sub.l, H.sub.l), of absolute or relative acoustic transfer functions or impulse responses, or any transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons
S3a. providing in the database (O.sub.l, H.sub.l) a relative or absolute own voice transfer function or impulse response, or any transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of a multitude of microphones of a microphone system worn by said given test-person, and
S3b. providing in the database (O.sub.l, H.sub.l) a relative or absolute head related acoustic transfer function or impulse response, or any transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person;
S4. estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, or a processed version thereof, and of said own voice control signal;
S5. estimating personalized relative or absolute head related acoustic transfer functions or impulse responses from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (O.sub.l, H.sub.l); and
S6. determining personalized beamformer weights (w.sub.k,user) for a beamformer configured to receive said at least two electric input signals, or processed versions thereof, based on said personalized relative or absolute head related acoustic transfer functions (HRTF.sub.l*) or impulse responses (HRIR.sub.l*).
(90) It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
(91) As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
(92) It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
(93) The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
(94) Accordingly, the scope should be judged in terms of the claims that follow.
REFERENCES
(95) [Moore; 2019] A. H. Moore, J. M. de Haan, M. S. Pedersen, P. A. Naylor, M. Brookes, and J. Jensen, Personalized Signal-Independent Beamforming for Binaural Hearing Aids, J. Acoust. Soc. Am., Vol. 145, No. 5, pp. 2971-2981, April 2019.
[Loizou; 2013] P. C. Loizou, Speech Enhancement—Theory and Practice, CRC Press, 2nd edition, 2013.
[Brandstein; 2001] M. Brandstein, D. Ward (Eds.), Microphone Arrays—Signal Processing Techniques and Applications, Springer, 2001.
[Marquardt; 2015] D. Marquardt, Development and evaluation of psychoacoustically motivated binaural noise reduction and cue preservation techniques, PhD Thesis, University of Oldenburg, Germany, November 2015.
EP2928215A1 (Oticon A/S) 7 Oct. 2015.