Hearing system comprising a personalized beamformer

11582562 · 2023-02-14

Abstract

A hearing system configured to be located at or in the head of a user, comprises a) at least two microphones providing at least two electric input signals, b) an own voice detector, c) access to a database (O.sub.l, H.sub.l) comprising c1) relative or absolute own voice transfer function(s), and corresponding c2) absolute or relative acoustic transfer functions for a multitude of test-persons, d) a processor connectable to the at least two microphones, to the own voice detector, and to the database. The processor is configured A) to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones, and B) to estimate personalized relative or absolute head related acoustic transfer functions from at least one spatial location other than the user's mouth to at least one of the microphones of the hearing system in dependence of the estimated own voice relative transfer function(s) and the database (O.sub.l, H.sub.l). The hearing system further comprises e) a beamformer configured to receive the at least two electric input signals, or processed versions thereof, and to determine personalized beamformer weights based on the personalized relative or absolute head related acoustic transfer functions or impulse responses. A method of determining personalized beamformer coefficients (w.sub.k) is further disclosed.

Claims

1. A hearing system configured to be located at or in an ear or in the head at the ear of a user, the hearing system comprising at least two microphones, one of which being denoted the reference microphone, each for converting sound from the environment of the hearing system to an electric input signal representing said sound as received at the location of the microphone in question; an own voice detector configured to estimate whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system, and to provide an own voice control signal indicative thereof; a memory wherein a database (Ol, Hl) of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons are stored, or a transceiver allowing access to said database (Ol, Hl), the database (Ol, Hl) comprising for each of said multitude of test-persons a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the microphones of a microphone system worn by said given test-person, and a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of the microphone system worn by said given test-person; a processor connected or connectable to the at least two microphones, to said own voice detector, and to said database, the processor being configured to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones in dependence of said at least two electric input signals, and of said own voice control signal, and to estimate personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function(s) and said database (Ol, Hl); and a beamformer configured to receive said at least two electric input signals, and to determine personalized beamformer weights based on said personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.

2. A hearing system according to claim 1 comprising a detector or estimator of a current signal quality in dependence of said at least two electric input signals.

3. A hearing system according to claim 2 comprising an SNR estimator (SNRE) for providing an estimate of signal to noise ratio.

4. A hearing system according to claim 1 wherein the microphone systems worn by said multitude of test-persons comprise microphones located at the same positions as the at least two microphones of the hearing system.

5. A hearing system according to claim 1 wherein the processor comprises a relative own voice transfer function estimator (ROVTE) for estimating a relative own voice transfer function vector OVTk,user whose elements are the relative transfer functions for sound from the user's mouth to each of the at least two microphones of the hearing system.

6. A hearing system according to claim 1 comprising an own-voice power spectral density estimator (OV-PSDE) configured to provide an estimate of the own-voice power spectral density vector Sk at a given point in time.

7. A hearing system according to claim 1 comprising a personalized head related transfer functions estimator (P-HRTF-E) for estimating said personalized relative or absolute head related acoustic transfer functions dk,user or impulse responses from said estimated own voice transfer function vector OVTk,user and said database (Ol, Hl).

8. A hearing system according to claim 6 wherein said relative own voice transfer function vector OVTk,user is estimated from the input own-voice power spectral density vector Sk as OVTk,user=sqrt(Sk/Sk,iref), where iref is the index of a reference microphone among said at least two microphones.

9. A hearing system according to claim 1 comprising a trained neural network for determining the personalized head related transfer functions using the estimated relative own voice transfer function vector OVTk,user as an input vector.

10. A hearing system according to claim 1 being constituted by or comprising a hearing aid, a headset, an earphone, an ear protection device or a combination thereof.

11. A method of estimating personalized beamformer weights for a hearing system comprising at least two microphones, one of which being denoted the reference microphone, the hearing system being configured to be worn by a specific user, the method comprising, providing at least two electric signals representing sound in an environment of the user at a location of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal; providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system; and providing a database (Ol, Hl), or providing access to such database (Ol, Hl), of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons providing in the database (Ol, Hl) a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the at least two microphones of a microphone system worn by said given test-person; and providing in the database (Ol, Hl) a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person; estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, and on said own voice control signal, and estimating personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (Ol, Hl); and determining personalized beamformer weights (wk,user) for a beamformer configured to receive said at least two electric input signals, based on said personalized relative or absolute head related acoustic transfer functions (HRTFl*) or impulse responses (HRIRl*), or a transformation thereof, wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.

12. A method according to claim 11, wherein the beamformer is a binaural beamformer based on electric input signals from said at least two microphones located at left as well as right ears of the user.

13. A method according to claim 11 comprising mapping said relative own voice transfer function (OVTuser) or impulse response to an absolute or relative own voice transfer function (OVTl*) or impulse response of a specific test-person l* among said multitude of test-persons from said database (Ol, Hl) according to a predefined criterion; and deriving estimated absolute or relative far-field head related transfer functions (HRTFuser) for said user in dependence of the absolute or relative far-field head related transfer functions (HRTFl*) for said specific test-person stored in said database (Ol, Hl).

14. A method according to claim 11 wherein the predefined criterion comprises minimization of a cost function.

15. A method according to claim 11 comprising providing a beamformed signal based on said personalized beamformer weights.

16. A hearing system configured to be located at or in an ear, or in the head at the ear of a user, the hearing system comprising at least two microphones, one of which being denoted the reference microphone, each for converting sound from the environment of the hearing system to an electric input signal representing said sound as received at the location of the microphone in question; an own voice detector configured to estimate whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system, and to provide an own voice control signal indicative thereof; a memory wherein a database (Ol, Hl) of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons are stored, or a transceiver allowing access to said database (Ol, Hl), the database (Ol, Hl) comprising for each of said multitude of test-persons a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the microphones of a microphone system worn by said given test-person, and a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of the microphone system worn by said given test-person; a processor connected or connectable to the at least two microphones, to said own voice detector, and to said database, the processor being configured to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones in dependence of said at least two electric input signals, and of said own voice control signal, and to estimate personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function(s) and said database (Ol, Hl), wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.

17. A hearing system according to claim 16 comprising a signal processor configured to process said at least two electric signals in dependence of said estimated personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof.

18. A hearing system according to claim 17 wherein said signal processor is configured to process said at least two electric signals to compensate for a user's hearing impairment.

19. A method of estimating personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, for a hearing system comprising at least two microphones, one of which being denoted the reference microphone, the hearing system being configured to be worn by a specific user, the method comprising, providing at least two electric signals representing sound in an environment of the user at a location of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal; providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, comprises a voice from the user of the hearing system; and providing a database (Ol, Hl), or providing access to such database (Ol, Hl), of absolute or relative acoustic transfer functions or impulse responses, or a transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons providing in the database (Ol, Hl) a relative or absolute own voice transfer function or impulse response, or a transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of the at least two microphones of a microphone system worn by said given test-person; and providing in the database (Ol, Hl) a relative or absolute head related acoustic transfer function or impulse response, or a transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person; estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, and on said own voice control signal, and estimating personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof, from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (Ol, Hl), wherein a transform thereof is a Fourier or inverse Fourier transformation, a cosine or sine transformation, or a Laplace transformation.

20. A method according to claim 19 comprising processing said at least two electric signals in dependence of said estimated personalized relative or absolute head related acoustic transfer functions or impulse responses, or a transformation thereof.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:

(2) FIG. 1A shows an exemplary offline procedure according to the present disclosure for estimating personalized beamformer coefficients from personal own-voice-transfer function information, and

(3) FIG. 1B shows an exemplary online procedure according to the present disclosure for estimating personalized beamformer coefficients from personal own-voice-transfer function information,

(4) FIG. 2 schematically illustrates a path for sound from mouth to ear for measuring an own-voice impulse response for right-ear microphones Mi, i=1, 2, 3,

(5) FIG. 3A shows a block-diagram of an embodiment of a hearing system comprising a beamformer and a system for estimating relative own-voice transfer functions from a user's speech according to the present disclosure;

(6) FIG. 3B shows a block-diagram of an embodiment of a hearing system for estimating relative own-voice transfer functions from a user's speech signal according to the present disclosure; and

(7) FIG. 3C shows a block-diagram of an embodiment of a hearing device comprising a beamformer and a system for estimating relative own-voice transfer functions from a user's speech according to the present disclosure,

(8) FIG. 4 schematically shows a setup for measuring head-related impulse responses (or acoustic transfer functions) for a person, e.g. test subject l, between spatial direction or location j and microphone i of a hearing system according to the present disclosure worn by the person,

(9) FIG. 5 shows a plot of complex (real part, imaginary part) relative own voice transfer functions (OV-RTF) measured across different individuals (asterisks) compared to far-field (FF-RTF) relative transfer functions measured across different individuals and across different directions (dots),

(10) FIG. 6A schematically shows a database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses for a multitude of persons comprising corresponding own voice transfer functions (OVT) and head related transfer functions (HRTF), and

(11) FIG. 6B schematically indicates a setup for measurement of the own voice transfer functions (OVT) (or impulse responses, OIR) and acoustic head related transfer functions (HRTF) (or impulse responses, HRIR) for a microphone system located at the left and right ears of a person,

(12) FIG. 7 shows an embodiment of a part of the processor for providing personalized acoustic far field head related transfer functions for a wearer of a hearing system,

(13) FIG. 8 schematically shows a hearing device of the receiver in the ear type according to an embodiment of the present disclosure, and

(14) FIG. 9 shows a flow chart for an embodiment of a method of estimating personalized acoustic far-field transfer functions for a wearer of a hearing system.

(15) The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.

(16) Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

(17) The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

(18) The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

(19) The present application relates to the field of hearing devices, e.g. hearing aids.

(20) FIG. 1A shows an exemplary offline procedure according to the present disclosure for estimating personalized beamformer coefficients from personal own-voice-transfer function information. The offline procedure comprises:

(21) A. Measurement of own voice transfer functions (OVT) using a close-talk microphone located at the mouth of a user (cf. e.g. FIG. 2)

(22) B. Mapping of OVTs to absolute or relative head related transfer functions (HRTFs).

(23) C. Computation of personalized beamformer coefficients from the HRTFs

(24) FIG. 1B shows an exemplary online procedure according to the present disclosure for estimating personalized beamformer coefficients from personal own-voice-transfer function information. The online procedure comprises:

(25) A. Estimation of relative own voice transfer functions (OVT) using own-voice detector and hearing aid (HA) microphones (cf. FIG. 3A, 3B, 3C).

(26) B. Mapping of OVTs to absolute or relative head related transfer functions (HRTFs).

(27) C. Computation of personalized beamformer coefficients from the HRTFs

(28) Absolute and Relative Own Voice Transfer Functions

(29) Let h.sub.i(n) denote an own-voice impulse response (OIR), i.e., the impulse response from a point just outside the mouth of the hearing aid user (here the location of the ‘Close-talk microphone’) to the i'th microphone of the hearing aid (FIG. 2). Let OVT′.sub.k,i, k=1, . . . , K (K being a number of frequency bands) denote a Fourier transform of h.sub.i(n), OVT′.sub.k,i thus being the OVT from the mouth to the i.sup.th microphone. Define the absolute OVT vector OVT′.sub.k=[OVT′.sub.k,1 . . . OVT′.sub.k,M], where M is the number of microphones. Finally, let the relative OVT vector with respect to a pre-defined reference microphone with index iref be given by OVT.sub.k=OVT′.sub.k/OVT′.sub.k,iref. Clearly, absolute OVTs carry more information than relative OVTs, because the latter can be derived from the former, but not the other way around. In other words, absolute OVTs carry explicit information about the sound traveling time from mouth to microphones, while relative OVTs do not.
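
Purely as an illustration of the definitions above, a minimal NumPy sketch of how absolute and relative OVT vectors could be formed from measured own-voice impulse responses might look as follows; the function name, array shapes and FFT length are assumptions made for the example, not part of the disclosure:

```python
import numpy as np

def oir_to_ovt(oir, n_fft=256, i_ref=0):
    """Illustration only: own-voice impulse responses -> absolute and relative OVTs.

    oir   : (M, N) array, impulse response h_i(n) for each of M microphones
    n_fft : assumed FFT length (the number of frequency bins K follows from it)
    i_ref : index of the pre-defined reference microphone
    """
    # Absolute OVT'_{k,i}: Fourier transform of h_i(n); rows are frequencies k
    ovt_abs = np.fft.rfft(oir, n=n_fft, axis=1).T           # shape (K, M)
    # Relative OVT_k = OVT'_k / OVT'_{k,iref}; reference-microphone element becomes 1
    ovt_rel = ovt_abs / ovt_abs[:, [i_ref]]
    return ovt_abs, ovt_rel

# Toy example: three right-ear microphones (cf. M1, M2, M3 in FIG. 2)
rng = np.random.default_rng(0)
ovt_abs, ovt_rel = oir_to_ovt(rng.standard_normal((3, 128)))
```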

(30) FIG. 2 schematically illustrates an own-voice impulse response for right-ear microphones M.sub.i, i=1, 2, 3 for an exemplary hearing device (HD) according to the present disclosure. The hearing device (HD) is located at a right ear of the user. The hearing device comprises a multitude of microphones, here at least three microphones (M.sub.1, M.sub.2, M.sub.3). The three microphones (M.sub.1, M.sub.2, M.sub.3) are located in a BTE-part of the hearing device located at or behind the outer ear (pinna) of the user. The three microphones are located in the hearing aid (here in the BTE-part) so as to facilitate determining filter weights for a beamformer for picking up the voice of the user (‘own voice’) as well as for a beamformer for picking up sounds from the environment. In the exemplary embodiment of FIG. 2, the microphones are located at vertices of a triangle. The BTE-part is connected to an ITE-part adapted for being located at or in an ear canal of the user. The BTE-part and the ITE-part are connected (e.g. acoustically and/or electrically) to each other by a connecting element (IC). The connecting element may comprise a tube for guiding sound from a loudspeaker located in the BTE-part to the ear canal of the user. The connecting element may comprise a number of electrical conductors for electrically connecting the BTE-part to the ITE-part. The ITE-part may comprise a loudspeaker located in the ear canal of the user. In the latter case, the hearing aid may implement a ‘receiver in the ear’ (RITE) style.

(31) As a first step in the proposed method, the absolute or relative OVT must be estimated.

(32) The absolute OVT vector OVT′.sub.k, k=1, . . . , K may be estimated, e.g., at a hearing care professional (HCP) during a hearing aid (HA) fitting using a small voice sample: the hearing aid user wears the HAs and a mouth reference microphone (cf. ‘Close-talk microphone’ of FIG. 2) is placed in front of the user's mouth (‘Mouth’ in FIG. 2). The user pronounces a few test sentences, which are recorded at the mouth reference microphone and the HA microphones (cf. M.sub.1, M.sub.2, M.sub.3 in FIG. 2). The absolute OVTs are estimated from the microphone signals using standard system identification algorithms, e.g. for microphone M.sub.3, H.sub.3=(IN(M.sub.3)/IN(M.sub.CT)), where IN(x) is the complex (frequency dependent) input signal picked up by the microphone x (x=M.sub.3, M.sub.CT). The frequency dependent input signal IN(x) may be determined from the corresponding time domain input signal (e.g. by a Fourier transform, such as a Discrete Fourier Transform (DFT)), e.g. based on a measured impulse response.
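
The disclosure only refers to ‘standard system identification algorithms’; as one hedged example, the quotient H.sub.3=IN(M.sub.3)/IN(M.sub.CT) could be estimated from simultaneous recordings with a Welch-type cross-spectral estimator (the so-called H1 estimator). The sketch below uses SciPy and assumes single-channel time-domain recordings; the sampling rate and segment length are arbitrary example values:

```python
import numpy as np
from scipy.signal import csd, welch

def estimate_absolute_ovt(x_ct, x_mic, fs=16000, nperseg=512):
    """H1-style estimate of the transfer function from the close-talk microphone
    signal x_ct to a hearing-aid microphone signal x_mic (both time-domain)."""
    f, s_cm = csd(x_ct, x_mic, fs=fs, nperseg=nperseg)   # cross power spectral density
    _, s_cc = welch(x_ct, fs=fs, nperseg=nperseg)        # auto power spectral density
    return f, s_cm / s_cc                                # complex OVT estimate per frequency

# Toy usage with random signals standing in for the recorded test sentences
rng = np.random.default_rng(0)
x_ct = rng.standard_normal(16000)
f, H3 = estimate_absolute_ovt(x_ct, np.roll(x_ct, 5) + 0.01 * rng.standard_normal(16000))
```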

(33) Additionally, or alternatively, the relative OVT may be used. If the absolute OVT is measured at the HCP, then the relative OVT may easily be derived from the absolute OVT. Alternatively, the relative OVT may be estimated online during everyday use of the hearing aid as follows (see FIG. 3A, 3B, 3C).

(34) FIG. 3A shows a block-diagram of an embodiment of hearing system comprising a hearing device (HD) and an external database (MEM) accessible from the hearing device. The hearing device comprises a beamformer and a system for estimating relative own-voice transfer functions from a user's speech according to the present disclosure. The hearing device (HD) of FIG. 3A comprises a multitude (M) of microphones (M.sub.1, . . . , M.sub.M) configured to pick up sound from the environment of the hearing device and convert the sound to (e.g. digitized) electric input signals (IN1, . . . , INM). The hearing device (HD) further comprises analysis filter banks (FB-A1, . . . , FB-AM) for converting the multitude of (time-domain) electric input signals (IN1, . . . , INM) to respective (frequency domain) electric input signals as frequency sub-band signals (X.sub.1, . . . , X.sub.M). The hearing device comprises an own voice detector configured to estimate whether or not, or with what probability, said multitude of electric input signals, or a processed version thereof, comprises a voice from the user of the hearing system, and to provide an own voice control signal (OV) indicative thereof. The hearing device further comprises a processor (PRO) connected to the multitude of microphones (M.sub.1, . . . , M.sub.M), to the own voice detector (OVD), and to a transceiver (Rx/Tx) for providing access to a database (e.g. located on a server). The database (O.sub.l, H.sub.l) comprises (e.g. measured or otherwise determined/estimated) absolute or relative acoustic transfer functions or impulse responses (cf. signal OVT-HRTF retrieved from the database) for a multitude of persons. The database (O.sub.l, H.sub.l) comprises for each of the multitude of persons a) a relative or absolute own voice transfer function or impulse response from the mouth of a given person among said multitude of persons to at least one (e.g. all) of the microphones of a microphone system worn by the given person, and b) a relative or absolute head related acoustic transfer function or impulse response from at least one spatial location other than the given person's mouth to at least one (e.g. all) of the microphones of a microphone system worn by the given person. The transceiver (Rx/Tx) may implement a wireless connection to another device (e.g. a smartphone or the like) or to a server (e.g. a cloud server), e.g. via a network, e.g. the Internet. In an embodiment, the database (or a part thereof) is stored in a memory (MEM) of the hearing device (see e.g. FIG. 3B, 3C). The processor (PRO) is configured to estimate an own voice relative transfer function (OVT.sub.user) from the user's mouth to at least one (e.g. to all) of the multitude of microphones (M.sub.1, . . . , M.sub.M) in dependence of the multitude of electric input signals (IN1, . . . , INM), or a processed version thereof, and on the own voice control signal (OV). The processor (PRO) is further configured to access the database and to estimate personalized relative or absolute head related acoustic transfer functions (d.sub.k,user) (or corresponding impulse responses) from at least one spatial location other than the user's mouth (e.g. in front of the user) to at least one (e.g. all) of the microphones of the hearing device in dependence of the estimated own voice relative transfer function(s) (OVT.sub.user) and the database (O.sub.l, H.sub.l). The hearing device further comprises a beamformer (BF) configured to receive the multitude of electric input signals (IN1, . . . 
, INM), or processed versions thereof (X.sub.1, . . . , X.sub.M), and to determine beamformer weights (Wij) based on the personalized relative or absolute head related acoustic transfer functions (d.sub.k,user) or impulse responses and to provide a beamformed signal Y.sub.BF based thereon. The beamformed signal may be further processed in a further processor (see e.g. signal processor SP of FIG. 3C) before being subject to conversion to the time domain by a synthesis filter bank (FB-S) providing time-domain output signal OUT that is fed to an output unit creating stimuli perceivable by the user as sound. In the embodiment of FIG. 3A, the output unit comprises a loudspeaker (SPK) for converting the signal OUT to an acoustic signal (comprising vibrations in air). Determination of beamformer weights from relative or absolute transfer functions for a given beamformer structure is well-known in the art. For an MVDR beamformer, the determination of beamformer weights W.sub.H(k) can be written as

(35) W.sub.H(k)=({circumflex over (R)}.sub.VV.sup.−1(k){circumflex over (d)}(k){circumflex over (d)}*(k,i.sub.ref))/({circumflex over (d)}.sup.H(k){circumflex over (R)}.sub.VV.sup.−1(k){circumflex over (d)}(k))
where {circumflex over (R)}.sub.VV(k) is (an estimate of) the inter-microphone noise covariance matrix for the current acoustic environment, {circumflex over (d)}(k) is the estimated look vector (representing the inter-microphone transfer function for a target sound source at a given location), k is the frequency index and i.sub.ref is an index of the reference microphone (*denotes complex conjugate, and .sup.H denotes Hermitian transposition).

(36) FIG. 3B shows a block-diagram of a hearing system comprising an embodiment of an online system for estimating relative own-voice transfer functions from a user's speech signal. The hearing device (HD) of FIG. 3B comprises first and second microphones (M.sub.1, M.sub.2) configured to pick up sound from the environment of the hearing device and convert the sound to first and second (e.g. digitized) electric input signals (IN1, IN2). The hearing device (HD) further comprises first and second analysis filter banks (FB-A1, FB-A2) for converting the first and second (time-domain) electric input signals (IN1, IN2) to respective first and second (frequency domain) electric input signals as frequency sub-band signals (X.sub.1, X.sub.2). The first and second electric (frequency sub-band) input signals are fed to respective detectors (OVD, SNRE, . . . , DETX) as well as to an own-voice power spectral density estimator (OV-PSDE) providing (frequency dependent) power spectral densities S.sub.k,i for each electric input signal (X.sub.1, X.sub.2) (here M=2, so i=1, 2).

(37) The power spectral density (psd) of an audio signal is a representation of the distribution over frequencies of the energy of the signal (determined over a certain time range). A graph of power spectral density versus frequency may also be termed the ‘spectral energy distribution’. For a stochastic signal (e.g. some types of noise), the power spectral density is defined as the Fourier transform of its auto-correlation function. For an audio signal, a power spectral density may e.g. be based on a prior classification of the signal, e.g. classified as ‘speech’ or ‘noise’, e.g. using a voice activity detector. The power spectral density may e.g. be determined over the time frame of a syllable, a word, a sentence or longer periods of coherent speech. In the present context of own voice, a power spectral density may appropriately be related to time periods where an own voice detector indicates the presence of own voice (e.g. with a probability above a certain threshold, e.g. 70% or 80%).

(38) The block OVD represents an own-voice detection algorithm that continuously monitors if/when the hearing aid user speaks in a situation without too much background noise—this detection may be combined with other detectors, e.g. SNR detectors (SNRE) for robustness. The detection threshold is set such that only highly probable own-voice situations are detected—we are interested in detecting a few situations, e.g., one per hour or one every 6 hours, where own-voice is highly likely (in other words, the false-alarm rate should preferably be low). It is less important if many own-voice situations go undetected.

(39) Let S.sub.k,i denote the power spectral density of the own voice signal picked up by microphone i at frequency k, and let S.sub.k=[S.sub.k,1 . . . S.sub.k,M] denote a vector of such power spectral densities. In a situation, where own-voice is detected with high likelihood, the relative OVT may be estimated as OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref).

(40) The own-voice power spectral density estimator (OV-PSDE) provides an estimate of the own-voice power spectral density vector S.sub.k=[S.sub.k,1 S.sub.k,2] at a given point in time. The estimate is based on inputs from one or more detectors related to a current signal content of the first and second electric input signal (X.sub.1, X.sub.2). In the embodiment of FIG. 3B, an own voice detector (OVD) and an SNR estimator (SNRE) are indicated. The own voice detector (OVD) provides an indicator (OV=OV(k,m)) of whether or not (or with what probability), at a given time m and frequency k (i.e. time-frequency unit (k,m)), the first and/or second electric input signal(s) (X.sub.1, X.sub.2) or a signal or signals originating therefrom (e.g. a combined, e.g. beamformed, version thereof), comprises the user's own voice. The SNR estimator (SNRE) may e.g. provide an estimate (SNR=SNR(k,m)) at a given time m and frequency k. The detector signals may be used to improve the estimate of own-voice power spectral density at a given point in time, e.g. utilizing an estimate of the presence of own voice (OV) and/or an estimate of the quality of the target (own voice) speech (SNR). Other detectors (indicated in FIG. 3B as DETX providing detector signal ‘detx’) may e.g. comprise a movement detector (e.g. comprising an accelerometer), a signal similarity or correlation detector, e.g. an auto-correlation detector, etc.

(41) The hearing device further comprises a relative OVT estimator (ROVTE) for estimating a relative transfer function vector OVT.sub.k,user. The elements of the relative own voice transfer function vector are the relative transfer functions for sound from the user's mouth to each of the microphones of the hearing device, estimated from the input own-voice power spectral density vector S.sub.k as OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref), where iref is the index of the reference microphone. For the embodiment of FIG. 3B with only (M=) two microphones, and if the reference microphone is M.sub.1 (i.e. iref=1), the relative own voice transfer function vector OVT.sub.k,user comprises two elements (1, OVT.sub.k,user,2)=(1, sqrt(S.sub.k,2/S.sub.k,1)), k=1, . . . , K.
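
As a hedged sketch of how the ROVTE block of FIG. 3B might be realized, the fragment below smooths per-microphone own-voice power spectral densities only in frames where the own voice detector and SNR estimator indicate reliable own voice, and then forms OVT.sub.k,user=sqrt(S.sub.k/S.sub.k,iref). The class name, smoothing constant and thresholds are assumptions made for the example:

```python
import numpy as np

class RelativeOvtEstimator:
    """Sketch of the ROVTE idea: smooth own-voice PSDs per microphone and band,
    then form the relative OVT vector OVT_k,user = sqrt(S_k / S_k,iref)."""

    def __init__(self, n_mics, n_bands, i_ref=0, alpha=0.95):
        self.S = np.zeros((n_mics, n_bands))   # smoothed own-voice PSD estimates S_k,i
        self.i_ref = i_ref
        self.alpha = alpha                     # assumed smoothing constant

    def update(self, X_frame, ov_prob, snr_db, ov_thr=0.8, snr_thr=10.0):
        # X_frame: (n_mics, n_bands) complex sub-band samples for one time frame.
        # Only update in frames where own voice is highly likely and SNR is good,
        # mirroring the gating by the OVD and SNRE detectors.
        if ov_prob >= ov_thr and snr_db >= snr_thr:
            self.S = self.alpha * self.S + (1 - self.alpha) * np.abs(X_frame) ** 2

    def ovt(self):
        # Relative OVT per band; small constant avoids division by zero before any update
        return np.sqrt(self.S / (self.S[self.i_ref] + 1e-12))
```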

(42) The hearing device further comprises a personalized head related transfer functions estimator (P-HRTF-E) for estimating the personalized relative or absolute head related acoustic transfer functions d.sub.k,user or impulse responses from the estimated own voice transfer function vector OVT.sub.k,user and the database (O.sub.l, H.sub.l). An embodiment of the personalized head related transfer functions estimator (P-HRTF-E) is described in further detail in connection with FIG. 7. The database (O.sub.l, H.sub.l) (or a part thereof) is stored in a memory (MEM) of the hearing device (HD). FIG. 3C shows a block-diagram of an embodiment of a hearing device (HD) comprising a beamformer (BF) and a system for estimating relative own-voice transfer functions (OVT) from a user's speech according to the present disclosure. The hearing device (HD) of FIG. 3C comprises the same functional elements as shown in the embodiment of a hearing system of FIG. 3A and as described in connection therewith, except that the database (O.sub.l, H.sub.l) is located in a memory (MEM) of the hearing device (instead of being implemented on a separate device accessible via a (e.g. wireless) communication link as in FIG. 3A), and that the hearing device specifically comprises two microphones (M.sub.1, M.sub.2) instead of M, where M may be larger than two.

(43) Estimating Absolute or Relative HRTFs from Absolute or Relative OVTs

(44) We propose to estimate absolute/relative HRTFs from the absolute/relative OVT estimates described above.

(45) Absolute and Relative HRTFs:

(46) Let g.sub.i,j,l(n) denote an impulse response (head related impulse response (HRIR)) from the j.sup.th point in space (at ‘Speaker j’ in FIG. 4) to the i.sup.th microphone of a hearing aid system (microphone ‘M.sub.i’ in FIG. 4) worn by user l (‘Test subject’ l in FIG. 4). For example, impulse responses may be used from spatial points (e.g. J points, cf. FIG. 4) located equidistantly on a circle in the horizontal plane (each spatial point being (360/J)° apart), centered at the user's head, and at a height corresponding to the user's ears (cf. dashed circle in FIG. 4). For example, a number of J=16, J=32, or J=48 points may e.g. be used.

(47) Using an identical procedure as for OIRs, HRIRs may be transformed into absolute HRTFs: let e′.sub.k,i,j,l, k=1, . . . , K denote a Fourier transform of the HRIR g.sub.i,j,l(n), where e′.sub.k,i,j,l is the absolute HRTF at frequency k, from spatial point j to microphone i, for the l.sup.th test subject. We may then form an absolute HRTF vector e′.sub.k,j,l=[e′.sub.k,0,j,l . . . e′.sub.k,M-1,j,l] and define a relative HRTF vector e.sub.k,j,l=e′.sub.k,j,l/e′.sub.k,iref,j,l.

(48) FIG. 4 schematically illustrates the geometrical arrangement defining a head-related impulse response between spatial point j and microphone i for test subject l.

(49) A Priori Database of HRTF and OVT Pairs:

(50) We assume that a database of (O.sub.l, H.sub.l) pairs has been collected a priori for many test subjects. Here, O.sub.l denotes one or more or all (for all microphones) pre-measured OVTs for test subject l (for example stacked as a vector), and similarly, H.sub.l denotes one or more or all HRTFs for test subject l (for example stacked as a vector).

(51) For example, O.sub.l could be the collection of absolute OVTs OVT′.sub.k,l, for all frequencies k=1, . . . , K and for all microphones for test subject l. As another example, O.sub.l could be defined as the relative transfer functions OVT.sub.k,l for one or some microphones for test subject l. Many other obvious variants exist (combinations of frequencies, absolute/relative OVTs, and microphone indices).

(52) Similarly, H.sub.l could be a collection of absolute HRTFs e′.sub.k,j,l, for all frequencies k=1, . . . , K from spatial point j. Alternatively, H.sub.l could represent a collection of absolute HRTFs for all frequencies k=1, . . . , K, and for all spatial points, j=1, . . . , J. Alternatively, H.sub.l could represent a collection of relative HRTFs for all frequencies, k=1, . . . , K, for a subset of spatial points and a subset of microphones. Many other obvious variants exist (combinations of frequencies, absolute/relative OVTs, spatial points, and microphone indices).

(53) As an alternative to having a (O.sub.l, H.sub.l) pair, the OVT could be substituted by a transfer function measured from a certain position, e.g. as described in EP2928215A1.

(54) Mapping from OVTs to HRTFs:

(55) Given the a priori database of (O.sub.l, H.sub.l)-pairs, l=1, . . . , L (where L is the number of test subjects), there exist several ways of estimating the HRTF-information of the user, H.sub.user, from the user's OVT-information, O.sub.user. Note that the HRTF- and OVT-information of a particular user is unlikely to be present in the a priori database.

(56) Table Lookup Based Approach:

(57) The OVT-information of the user, measured either at the HCP or in the online procedure as outlined above, may be compared to each and every instance of O.sub.l, l=1, . . . , L, in the a priori database in order to find the database entry, l*, for which O.sub.l matches O.sub.user best. For example, the least-square distance measure could be used. The corresponding estimate of the user's personalized HRTF-information is then H.sub.l*, where
l*=argmin.sub.l d(O.sub.l, O.sub.user),
where d(⋅) is a distance measure between OVTs. Several different distance measures may be used, e.g. based on minimizing a Euclidean distance.
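
A minimal sketch of this table lookup, assuming the database OVTs and HRTFs are stored as NumPy arrays with one row per test subject, might look as follows; the optional distance threshold anticipates the reliability check described under ‘Detecting Implausible OVTs’ below:

```python
import numpy as np

def lookup_personal_hrtf(ovt_user, ovt_db, hrtf_db, max_dist=None):
    """Pick H_l* whose stored OVT O_l is closest (least squares) to the user's OVT.

    ovt_user : (D,) vector of the user's (stacked) OVT values
    ovt_db   : (L, D) stored OVT vectors O_l, one row per test subject
    hrtf_db  : (L, ...) stored HRTF information H_l in matching row order
    max_dist : optional plausibility threshold (cf. 'Detecting Implausible OVTs')
    """
    dists = np.linalg.norm(ovt_db - ovt_user, axis=1)   # Euclidean distance d(O_l, O_user)
    l_star = int(np.argmin(dists))
    if max_dist is not None and dists[l_star] > max_dist:
        return None, l_star, dists[l_star]              # OVT estimate flagged as unreliable
    return hrtf_db[l_star], l_star, dists[l_star]
```
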
Statistical Model based Approach:

(58) Based on the a priori database, an a priori statistical model may be derived. In particular, if the (O.sub.l, H.sub.l)-pairs in the a priori database are considered as realizations of random vector variables, then a joint probability density model f(O, H) may be fitted to the entries in the database, e.g., using a Gaussian Mixture Model or other parametric models. Given this statistical model and the estimated O.sub.user information of a particular user, for which an estimate of her HRTF information is desired, it is straightforward to compute minimum mean-square error (mmse) estimates of the personal HRTF information:
H.sub.mmse=∫H*f(H|O.sub.user)dH,
where ∫ denotes a multi-dimensional integral across all dimensions in vector H, and where f(H|O) denotes a conditional probability density function (pdf), which can be derived from the joint pdf model f(O,H). The integral may be evaluated numerically.
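
As an illustrative special case, if the joint model f(O, H) is a single Gaussian (one mixture component) over real-valued feature vectors (e.g. stacked real and imaginary parts), the conditional mean E[H|O.sub.user] has the closed form sketched below; with a Gaussian Mixture Model the same expression would be evaluated per component and weighted by the posterior component probabilities. All names and shapes are assumptions for the example:

```python
import numpy as np

def mmse_hrtf_single_gaussian(ovt_user, O, H):
    """Closed-form E[H | O_user] for a single jointly Gaussian model.

    O : (L, Do) real-valued OVT feature vectors from the a priori database
    H : (L, Dh) corresponding HRTF feature vectors
    E[H | O] = mu_H + C_HO C_OO^{-1} (O_user - mu_O)
    """
    mu_O, mu_H = O.mean(axis=0), H.mean(axis=0)
    Oc, Hc = O - mu_O, H - mu_H
    C_OO = Oc.T @ Oc / len(O)                 # covariance of the OVT features
    C_HO = Hc.T @ Oc / len(O)                 # cross-covariance between HRTF and OVT features
    return mu_H + C_HO @ np.linalg.solve(C_OO, ovt_user - mu_O)
```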

(59) Alternatively, a maximum a posteriori (map) probability estimate of H.sub.user may be found by maximization of the posterior probability:
HRTF.sub.map=argmax.sub.H f(O.sub.user|H)*f(H),
where f(H) denotes a prior probability on the HRTF vector, which may, e.g., be chosen as a uniform distribution, f(H)=const. The maximization may be performed numerically.
Deep Neural Network Based Approach:

(60) Based on the a priori database, a Deep Neural Network may be trained in an offline procedure prior to deployment, using O.sub.l and H.sub.l as inputs and target outputs, respectively. The DNN may be a feedforward network (multi-layer perceptron), a recurrent network, a convolutional network, or combinations and variants thereof. The cost function optimized during training may be the mean-square error between estimated and true HRTF vectors, etc.
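
A minimal PyTorch sketch of such an offline training loop, with an arbitrary multi-layer perceptron and mean-square error loss, is shown below; all dimensions, layer sizes and hyperparameters are placeholder assumptions, and the random tensors merely stand in for the database vectors O.sub.l and H.sub.l:

```python
import torch
import torch.nn as nn

# Placeholder dimensions and random stand-ins for the database vectors O_l and H_l
L, D_in, D_out = 200, 64, 256
O = torch.randn(L, D_in)
H = torch.randn(L, D_out)

model = nn.Sequential(                      # simple multi-layer perceptron
    nn.Linear(D_in, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, D_out),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                      # mean-square error between estimated and true HRTFs

for epoch in range(100):                    # offline training prior to deployment
    optimizer.zero_grad()
    loss = loss_fn(model(O), H)
    loss.backward()
    optimizer.step()

# At run time, a personalized HRTF estimate would be model(ovt_user_features)
```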

(61) Finding Beamformer Coefficients from Estimated Personalized HRTFs:

(62) From the estimated personalized HRTF information, H.sub.est, it is straightforward to derive personalized beamformer coefficients. For example, if H.sub.est contains relative HRTFs e.sub.k,j, k=1, . . . , K, for a sound source from a frontal location (j) for two microphones in the same hearing aid, then the coefficients of a Minimum Variance Distortion-Less Response (MVDR) beamformer are given by
w.sub.k=(R.sub.vv,k).sup.−1e.sub.k,j/(e.sub.k,j.sup.T(R.sub.vv,k).sup.−1e.sub.k,j),
where (⋅).sup.−1 denotes matrix inversion, matrix R.sub.vv,k is a noise-cross power spectral density matrix [Loizou] for the microphones involved, and e.sub.k,j is the (2-element) relative HRTF vector related to a spatial point (j) in the frontal direction.
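
A small NumPy sketch of this computation for a single frequency band is given below; it writes the denominator with a conjugated (Hermitian) inner product, which is the usual convention for complex vectors, and the noise covariance and look vector are toy values chosen only for the example:

```python
import numpy as np

def mvdr_weights(R_vv, e):
    """MVDR weights for one band k: w_k = R_vv^{-1} e / (e^H R_vv^{-1} e)."""
    Rinv_e = np.linalg.solve(R_vv, e)
    return Rinv_e / (e.conj() @ Rinv_e)

# Two-microphone toy example: noise covariance and a frontal relative HRTF vector
R_vv = np.array([[1.0, 0.3 + 0.1j], [0.3 - 0.1j, 1.2]])
e_front = np.array([1.0, 0.8 * np.exp(1j * 0.4)])   # element for the reference microphone is 1
w_k = mvdr_weights(R_vv, e_front)
```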

(63) Many other personalized beamformer variants, e.g., the Multi-Channel Wiener Filter [Brandstein], binaural beamformers (involving microphones in hearing aids on both ears) [Marquard], etc., may be derived from estimated personalized absolute/relative HRTF vectors.

(64) Extensions:

(65) Using HRIRs and OIRs Rather than HRTFs and OVTs:

(66) The concept of the present disclosure is described in terms of OVTs and HRTFs. It is, however, straightforward to exchange these quantities with their time-domain analogues, OIRs and HRIRs, and to perform a mapping from OIRs (estimated from a voice sample of the user, either at the HCP or in an “online” approach) to HRIRs.

(67) Detecting Implausible OVTs:

(68) Performing the mapping from personalized OVTs to personalized HRTFs using the “Table-based Approach” involves the computation of distances between an estimated personal OVT and OVTs of test subjects which have been measured and stored up-front in the a priori database. The computed minimum distance may be used to estimate the reliability of the OVT measurement. Specifically, if the minimum distance exceeds a pre-specified threshold, the OVT measurement may be labeled as potentially unreliable (e.g., due to noise, reverberation, or other problems during the OVT estimation process).

(69) FIG. 5 shows relative own voice transfer functions (OV-RTF) measured across different individuals (each asterisk indicates an individual OV relative TF) compared to far-field (FF-RTF) relative transfer functions measured across different individuals and across different directions (each dot indicates an individual far-field relative TF). It can be seen that the location in the complex plane of the OV-RTFs differs from most of the FF-RTFs. Knowing the typical location of the OV-RTF (not only at the shown frequency but across different frequencies) can be used to validate the estimated OV-RTF. It could e.g. be used to determine if the RTF could be used to update the weights of an OV-beamformer. The validation decision can be based on a distance measure between the estimated OV-RTF and the most likely OV-RTF, e.g. measured across different frequencies. Alternatively, the validation may be based on supervised learning (e.g. training a neural network based on examples of labelled valid and invalid OV-RTF).

(70) The OV-RTF could be estimated in a controlled setup, where the user is prompted to speak. Alternatively, the OV-RTF could be estimated/updated, when OV is detected. The OV detector could be based on acoustic features, or alternatively/in addition on other features such as detected jaw vibrations (from an accelerometer or a vibration microphone) or based on individual features such as pitch frequency. The OV detection may also be based on results from another hearing instrument.

(71) The validation procedure is exemplified with OV as an example. However, the described validation method may also be used to validate other impulse response measurements, such as a measured 0 degrees (frontal) impulse response (e.g. measured as described in EP2928215A1).

(72) FIG. 6A schematically shows a database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses for a multitude of persons (l=1, . . . , L) comprising corresponding own voice transfer functions (OVT.sub.l) and acoustic (far-field) head related transfer functions (HRTF.sub.l) for a number of test subjects. FIG. 6A shows a table comprising a left (first) column denoted ‘Test subject l’, l=1, . . . , L, a second (middle) column denoted OVT.sub.l(k,i), and a third (right) column denoted HRTF.sub.l(θ.sub.j,k,i), where j, k, and i are indices defining a direction to (or location of) an acoustic far field sound source (j=1, . . . , J), a frequency (k=1, . . . , K), and a microphone (i=1, . . . , M), respectively. Each row relates to a specific test object with index l′ (e.g. a given person). Row l′ comprises in the second column (e.g. predefined, e.g. measured) values of frequency dependent (k=1, . . . , K) own voice transfer functions (OVT.sub.l′) for a given microphone (M.sub.i, i=1, . . . , M) for the specific test object l′. Row l′ further comprises in the third column (e.g. predefined, e.g. measured) values of frequency dependent (k=1, . . . , K) acoustic (far-field) head related transfer functions (HRTF.sub.l′) for a number of directions to (or locations of) an acoustic far field sound source (θ.sub.j, j=1, . . . , J), for a given microphone (M.sub.i, i=1, . . . , M) for the specific test object l′. The frequencies k for which values of transfer functions are provided in the database (e.g. the table of FIG. 6A) may be representative of the frequency range of operation of the hearing device, e.g. 0 to 10 kHz, or of the occurrence of speech. The frequencies k for which values are provided in the database may be limited in number, e.g. to less than or equal to eight, or less than or equal to four, e.g. at least one, or at least two. The number of microphones M of the hearing device or hearing system (e.g. a binaural hearing aid system) for which values of transfer functions are provided in the database may e.g. be all (M, or M−1 in case of relative transfer functions) or a subset of microphones, e.g. at least one microphone. The number of directions to (or locations of) an acoustic far field sound source (θ.sub.j, j=1, . . . , J) for which values of far field acoustic transfer functions are provided in the database may be representative of a space around the user wearing the hearing device or hearing system, e.g. evenly distributed, or it may be focused on directions (or locations) that are considered to be most important to the user, e.g. from one or more selected directions (e.g. including one or more of the front, the sides, and the back, e.g. at an appropriate distance for communication, e.g. one to two meters from the user). The number of directions to (or locations of) an acoustic far field sound source (θ.sub.j, j=1, . . . , J) for which values of far field acoustic transfer functions are provided in the database is larger than or equal to one, but may be limited to sixteen or to eight or to four.
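
One possible, purely illustrative way of holding such a database in memory is sketched below; the dictionary layout, array shapes and counts (L, J, K, M) are assumptions for the example and not prescribed by the disclosure:

```python
import numpy as np

# Assumed counts: L test subjects, J directions, K frequency bands, M microphones
L, J, K, M = 50, 16, 8, 4

database = {
    "OVT": np.zeros((L, K, M), dtype=complex),               # OVT_l(k, i) per test subject l
    "HRTF": np.zeros((L, J, K, M), dtype=complex),           # HRTF_l(theta_j, k, i)
    "theta_deg": np.linspace(0.0, 360.0, J, endpoint=False),  # directions theta_j
}

def subject_entry(db, l):
    """Return the (O_l, H_l) pair for test subject l."""
    return db["OVT"][l], db["HRTF"][l]
```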

(73) FIG. 6B schematically indicates the measurement of own voice transfer functions (OVT.sub.l(k)) and acoustic (far-field) head related transfer functions (HRTF.sub.l(θ.sub.j,k)) for a microphone system comprising a multitude of microphones M.sub.i, i=1, . . . , M, worn by a specific person (‘Test object l’). The microphones M.sub.i (i=1, 2, 3, 4) of FIG. 6B may e.g. represent microphones of a binaural hearing aid system comprising first and second hearing devices located at left and right ears of the person (l), each hearing device comprising two microphones ((M.sub.1, M.sub.2) and (M.sub.3, M.sub.4), respectively) located at respective left and right ears of the person. The measurements may e.g. be performed in a sound studio by a hearing care professional (HCP). A frequency dependent own voice transfer function OVT.sub.l(k) for the person (l) for a specific microphone M.sub.i may be measured while the user speaks, e.g. a specific test word or sentence(s) (cf. ‘Own voice (l)’ in FIG. 6B). The measurement may be repeated for different microphones. Likewise, a frequency dependent acoustic (far-field) head related transfer function (HRTF.sub.l(θ.sub.j,k)) for the person (‘Test subject l’) for a specific microphone M.sub.i and location of the (far-field) sound source (here corresponding to a frontal direction (‘θ=0’) relative to the person, cf. dotted line through the head of the person and the sound source) may be measured while the sound source is playing a test sound (cf. ‘test sound’ in FIG. 6B). The measurement may be repeated for different microphones and locations of the (far-field) sound source. The dashed circle around the test person may represent a typical distance of a conversation partner, e.g. 1-2 m. Far-field head related transfer functions (HRTF.sub.l(θ.sub.j,k)) may be measured and stored in the database for more than one distance, e.g. for two or more distances from the test person, e.g. for distances 1 m and 3 m. The locations on a circle of a given radius, e.g. 1 m, may not necessarily be equidistant, but may e.g. have a higher density of measurement locations in front of the person than to the rear of the person. The distribution of measurement points around the test object may be adapted to the application scenario envisioned (e.g. a fixed, e.g. car, application vs. a walk around teaching application).

(74) Instead of measuring transfer functions at different frequencies, corresponding impulse responses (OIR, HRIR) may be measured and converted to the frequency domain by an appropriate transformation (e.g. a Fourier transformation algorithm, e.g. a discrete Fourier transformation (DFT) algorithm).

(75) FIG. 7 shows an embodiment of a part (P-HRTF-E) of the processor for estimating personalized (relative or absolute) acoustic far field head related transfer functions d.sub.k,user or impulse responses for a wearer of a hearing system. The personalized head related transfer functions estimator (P-HRTF-E) is configured to estimate the personalized relative or absolute head related acoustic transfer functions d.sub.k,user or impulse responses from an estimated own voice transfer function vector OVT.sub.k,user (based on currently received own voice signals by the microphones of the hearing device) by comparison with (predetermined, e.g. measured) transfer function (or impulse response) data stored in the database (O.sub.l, H.sub.l), cf. memory (MEM) in FIG. 7. The processor part (P-HRTF-E) comprises a comparator configured to compare an estimated own voice transfer function vector OVT.sub.k,user (e.g. estimated in another part (ROVTE) of the processor (PRO), cf. e.g. FIG. 3B) with the stored own voice transfer functions OVT.sub.l(k), l=1, . . . , L, of the database (O.sub.l, H.sub.l), cf. FIG. 6A. The comparator estimates for each own voice transfer function OVT.sub.l(k), l=1, . . . , L, of the database (O.sub.l, H.sub.l) a difference ΔOVT.sub.user,l(k) to the estimated own voice transfer function vector OVT.sub.k,user (either for one microphone, or for several microphones or for all microphones of the hearing system). The processor part (P-HRTF-E) further comprises a minimizer (MIN) configured to identify the index l* for which the difference ΔOVT.sub.user,l(k) is minimum. The processor part (P-HRTF-E) further comprises a selector (SEL) for selecting the relative or absolute head related acoustic transfer functions HRTF.sub.l*(k) for the database entry l* among the L sets of transfer functions stored in the database (O.sub.l, H.sub.l) and for providing the personalized transfer functions as the vector d.sub.k,user. The personalized transfer function vector d.sub.k,user can e.g. be used to determine personalized beamformer weights of a beamformer, cf. e.g. FIG. 3A, 3C.

(76) FIG. 8 shows a hearing device of the receiver-in-the-ear type, i.e. a BTE/RITE style hearing device, according to an embodiment of the present disclosure (BTE=‘Behind-The-Ear’; RITE=‘Receiver-In-The-Ear’). The exemplary hearing device (HD), e.g. a hearing aid, is of a particular style (sometimes termed ‘receiver-in-the-ear’, or RITE, style) comprising a BTE-part (BTE) adapted for being located at or behind an ear of a user, and an ITE-part (ITE) adapted for being located in or at an ear canal of the user's ear and comprising a receiver (=loudspeaker, SPK). The BTE-part and the ITE-part are connected (e.g. electrically connected) by a connecting element (IC) and internal wiring in the ITE- and BTE-parts (cf. e.g. wiring Wx in the BTE-part). The connecting element may alternatively be fully or partially constituted by a wireless link between the BTE- and ITE-parts. Other styles, e.g. where the ITE-part comprises or is constituted by a custom mould adapted to a user's ear and/or ear canal, may of course be used.

(77) In the embodiment of a hearing device in FIG. 8, the BTE part comprises an input unit comprising two input transducers (e.g. microphones) (M.sub.BTE1, M.sub.BTE2), each for providing an electric input audio signal representative of an input sound signal (S.sub.BTE) (originating from a sound field S around the hearing device). The input unit further comprises two wireless receivers (WLR.sub.1, WLR.sub.2) (or transceivers) for providing respective directly received auxiliary audio and/or control input signals (and/or allowing transmission of audio and/or control signals to other devices, e.g. a remote control or processing device, or a telephone, or another hearing device). Access to a database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses according to the present disclosure may furthermore be provided via one of the wireless transceivers (WLR.sub.1, WLR.sub.2). The hearing device (HD) comprises a substrate (SUB) whereon a number of electronic components are mounted, including a memory (MEM), e.g. storing different hearing aid programs (e.g. user specific data, e.g. related to an audiogram, or parameter settings derived therefrom, e.g. defining such (user specific) programs, or other parameters of algorithms, e.g. beamformer filter weights, and/or fading parameters) and/or hearing aid configurations, e.g. input source combinations (M.sub.BTE1, M.sub.BTE2 (M.sub.ITE), WLR.sub.1, WLR.sub.2), e.g. optimized for a number of different listening situations. The memory (MEM) may further comprise a database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses according to the present disclosure. In a specific mode of operation, two or more of the electric input signals from the microphones are combined into a beamformed signal by applying appropriate (e.g. complex) weights to (at least some of) the respective signals. The beamformer weights are preferably personalized as proposed in the present disclosure.
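
As a hedged illustration of how complex weights may be applied to combine the microphone signals per time-frequency bin (names and array shapes are assumptions made for the example, not part of the disclosure), consider the following sketch.

```python
import numpy as np

def apply_beamformer(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Combine M microphone spectra into one beamformed signal per
    time-frequency bin: Y(k, m) = w(k)^H X(k, m).
    weights: (K, M) complex beamformer weights (bin, microphone).
    x:       (K, M, N) complex microphone spectra (bin, microphone, frame)."""
    return np.einsum('km,kmn->kn', weights.conj(), x)

# Example: 2 microphones, 4 frequency bins, 3 time frames.
w = np.full((4, 2), 0.5 + 0j)
X = np.ones((4, 2, 3), dtype=complex)
Y = apply_beamformer(w, X)   # shape (4, 3)
```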

(78) The substrate (SUB) further comprises a configurable signal processor (DSP, e.g. a digital signal processor), e.g. including a processor for applying a frequency and level dependent gain, e.g. providing beamforming, noise reduction, filter bank functionality, and other digital functionality of a hearing device, e.g. implementing features according to the present disclosure. The configurable signal processor (DSP) is adapted to access the memory (MEM) e.g. for selecting appropriate parameters for a current configuration or mode of operation and/or listening situation and/or for writing data to the memory (e.g. algorithm parameters, e.g. for logging user behavior) and/or for accessing the database (O.sub.l, H.sub.l) of absolute or relative acoustic transfer functions or impulse responses according to the present disclosure. The configurable signal processor (DSP) is further configured to process one or more of the electric input audio signals and/or one or more of the directly received auxiliary audio input signals, based on a currently selected (activated) hearing aid program/parameter setting (e.g. either automatically selected, e.g. based on one or more sensors, or selected based on inputs from a user interface). The mentioned functional units (as well as other components) may be partitioned in circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs. digital processing, acceptable latency, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The configurable signal processor (DSP) provides a processed audio signal, which is intended to be presented to a user. The substrate further comprises a front-end IC (FE) for interfacing the configurable signal processor (DSP) to the input and output transducers, etc., and typically comprising interfaces between analogue and digital signals (e.g. interfaces to microphones and/or loudspeaker(s), and possibly to sensors/detectors). The input and output transducers may be individual separate components, or integrated (e.g. MEMS-based) with other electronic circuitry.

(79) The hearing device (HD) further comprises an output unit (e.g. an output transducer) providing stimuli perceivable by the user as sound based on a processed audio signal from the processor or a signal derived therefrom. In the embodiment of a hearing device in FIG. 8, the ITE part comprises (at least a part of) the output unit in the form of a loudspeaker (also termed a ‘receiver’) (SPK) for converting an electric signal to an acoustic (air borne) signal, which (when the hearing device is mounted at an ear of the user) is directed towards the ear drum (Ear drum), where the sound signal (S.sub.ED) is provided. The ITE-part further comprises a guiding element, e.g. a dome (DO), for guiding and positioning the ITE-part in the ear canal (Ear canal) of the user. In the embodiment of FIG. 8, the ITE-part further comprises a further input transducer, e.g. a microphone (M.sub.ITE), for providing an electric input audio signal representative of an input sound signal (S.sub.ITE) at the ear canal. Propagation of sound (S.sub.ITE) from the environment to a residual volume at the ear drum via direct acoustic paths through the semi-open dome (DO) is indicated in FIG. 8 by dashed arrows (denoted Direct path). The directly propagated sound (indicated by sound field S.sub.dir) is mixed with sound from the hearing device (HD) (indicated by sound field S.sub.HI) to a resulting sound field (S.sub.ED) at the ear drum. The ITE-part may comprise a (possibly custom made) mould for providing a relatively tight fitting to the user's ear canal. The mould may comprise a ventilation channel to provide a (controlled) leakage of sound from the residual volume between the mould and the ear drum (to manage the occlusion effect).

(80) The electric input signals (from input transducers M.sub.BTE1, M.sub.BTE2, M.sub.ITE) may be processed in the time domain or in the (time-) frequency domain (or partly in the time domain and partly in the frequency domain as considered advantageous for the application in question).
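
For processing in the (time-) frequency domain, a simple analysis filter bank may be sketched as follows (an illustrative STFT with arbitrary frame length and hop size; the disclosure does not prescribe a particular filter bank, and all names are hypothetical).

```python
import numpy as np

def analysis_filter_bank(x: np.ndarray, frame_len: int = 128, hop: int = 64) -> np.ndarray:
    """Convert a time-domain microphone signal x[n] into time-frequency
    units X(k, m) using windowed frames and an FFT (a simple STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[m * hop:m * hop + frame_len] * window
                       for m in range(n_frames)], axis=1)   # (frame_len, n_frames)
    return np.fft.rfft(frames, axis=0)                       # (K, n_frames), K = frame_len//2 + 1

# Example usage on a dummy signal.
x = np.random.randn(1024)
X = analysis_filter_bank(x)   # complex array of shape (65, 15)
```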

(81) All three (M.sub.BTE1, M.sub.BTE2, M.sub.ITE) or two of the three microphones (M.sub.BTE1, M.sub.ITE) may be included in the ‘personalization’ procedure for head related transfer functions according to the present disclosure. The ‘front’ BTE-microphone (M.sub.BTE1) may be selected as a reference microphone, and the ‘rear’ BTE-microphone (M.sub.BTE2) and/or the ITE-microphone (M.sub.ITE) may be selected as normal microphones for which relative own-voice transfer functions can be measured by the hearing device. Since, relative to the hearing device user's own voice, the hearing device microphones (M.sub.BTE1, M.sub.BTE2, M.sub.ITE) are located in the acoustic near-field, a relatively large level difference may be experienced for the own voice sound received at the respective microphones. Thus, the relative transfer functions may be substantially different from 1.
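
As an illustration of how such a relative own-voice transfer function could be estimated from the microphone signals during detected own-voice activity, the following sketch uses a ratio of time-averaged cross- and auto-power spectra (one common estimator; the disclosure does not mandate this particular estimator, and all names are illustrative).

```python
import numpy as np

def estimate_relative_ovt(x_ref: np.ndarray, x_mic: np.ndarray,
                          ov_active: np.ndarray) -> np.ndarray:
    """Estimate a relative own voice transfer function from the reference
    microphone to another microphone, using only the time frames that the
    own voice detector flags as own-voice activity.
    x_ref, x_mic: (K, N) complex STFT spectra (frequency bin, time frame).
    ov_active:    (N,) boolean own voice control signal.
    Returns a (K,) complex relative transfer function estimate."""
    sel = ov_active.astype(bool)
    cross = np.mean(x_mic[:, sel] * np.conj(x_ref[:, sel]), axis=1)
    auto = np.mean(np.abs(x_ref[:, sel]) ** 2, axis=1) + 1e-12   # guard against division by zero
    return cross / auto
```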

(82) In the embodiment of FIG. 8, the connecting element (IC) comprises electric conductors for connecting electric components of the BTE- and ITE-parts. The connecting element (IC) may comprise an electric connector (CON) to attach the cable (IC) to a matching connector in the BTE-part. In another embodiment, the connecting element (IC) is an acoustic tube and the loudspeaker (SPK) is located in the BTE-part. In a still further embodiment, the hearing device comprises no BTE-part, but the whole hearing device is housed in the ear mould (ITE-part).

(83) The embodiment of a hearing device (HD) exemplified in FIG. 8 is a portable device comprising a battery (BAT), e.g. a rechargeable battery, e.g. based on Li-Ion battery technology, e.g. for energizing electronic components of the BTE- and possibly ITE-parts. In an embodiment, the hearing device, e.g. a hearing aid, is adapted to provide a frequency dependent gain and/or a level dependent compression and/or a transposition (with or without frequency compression of one or more frequency ranges to one or more other frequency ranges), e.g. to compensate for a hearing impairment of a user. The BTE-part may e.g. comprise a connector (e.g. a DAI or USB connector) for connecting a ‘shoe’ with added functionality (e.g. an FM-shoe or an extra battery, etc.), or a programming device, or a charger, etc., to the hearing device (HD). Alternatively or additionally, the hearing device may comprise a wireless interface for programming and/or charging the hearing device.

(84) FIG. 9 shows a flow chart for an embodiment of a method of estimating personalized acoustic far-field transfer functions for a wearer of a hearing system.

(85) In an aspect, the present application proposes an offline or online procedure for estimating personalized beamformer coefficients for a particular user from information regarding personal own-voice transfer function(s). The procedure comprises the steps A-C below (an illustrative sketch of step C is given after the list):

(86) A. Measurement of own voice transfer function(s), using microphones located at an ear of the user, and optionally a close-talk microphone located at the mouth of the user;

(87) B. Mapping of the measured own voice transfer function(s) to a set of absolute or relative head related transfer functions;

(88) C. Computation of personalized beamformer coefficients from the set of head related transfer functions.
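
As an illustrative sketch of step C (hedged; the MVDR beamformer is used here only as one well-known example of computing beamformer coefficients from a look vector, and the disclosure is not limited to it), personalized weights may be computed per frequency bin from the personalized look vector d.sub.k,user and a noise covariance matrix estimate. All names are illustrative.

```python
import numpy as np

def mvdr_weights(d_k: np.ndarray, noise_cov: np.ndarray) -> np.ndarray:
    """MVDR weights for one frequency bin k:
       w_k = Cv^{-1} d_k / (d_k^H Cv^{-1} d_k),
    where d_k is the personalized look vector (e.g. a relative head related
    transfer function normalized to the reference microphone) and Cv is a
    noise covariance matrix estimate for that bin."""
    cv_inv_d = np.linalg.solve(noise_cov, d_k)
    return cv_inv_d / (np.conj(d_k) @ cv_inv_d)

# Example with two microphones and a personalized look vector.
d = np.array([1.0 + 0j, 0.8 * np.exp(-1j * 0.3)])
Cv = np.eye(2, dtype=complex)
w = mvdr_weights(d, Cv)
```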

(89) In an embodiment, a method of estimating personalized beamformer weights for a hearing system comprising a multitude of microphones, one of which being denoted the reference microphone, the hearing system being configured to be worn by a specific user, is provided. The method comprises S1. providing at least two electric input signals representing sound in an environment of the user at the locations of the microphones of the hearing system, the electric input signal from said reference microphone being denoted the reference microphone signal; S2. providing an own voice control signal indicative of whether or not, or with what probability, said at least two electric input signals, or a processed version thereof, comprise a voice of the user of the hearing system; S3. providing a database (O.sub.l, H.sub.l), or providing access to such database (O.sub.l, H.sub.l), of absolute or relative acoustic transfer functions or impulse responses, or any transformation thereof, for a multitude of test-persons other than said user, and for each of said multitude of test-persons S3a. providing in the database (O.sub.l, H.sub.l) a relative or absolute own voice transfer function or impulse response, or any transformation thereof, for sound from the mouth of a given test-person among said multitude of test-persons to at least one of a multitude of microphones of a microphone system worn by said given test-person, and S3b. providing in the database (O.sub.l, H.sub.l) a relative or absolute head related acoustic transfer function or impulse response, or any transformation thereof, from at least one spatial location other than the given test-person's mouth to at least one of the microphones of a microphone system worn by said given test-person; S4. estimating an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones of the hearing system in dependence of said at least two electric input signals, or a processed version thereof, and of said own voice control signal; S5. estimating personalized relative or absolute head related acoustic transfer functions or impulse responses from at least one spatial location other than the user's mouth to at least one of the microphones of said hearing system worn by said user in dependence of said estimated own voice relative transfer function and said database (O.sub.l, H.sub.l); and S6. determining personalized beamformer weights (w.sub.k,user) for a beamformer configured to receive said at least two electric input signals, or processed versions thereof, based on said personalized relative or absolute head related acoustic transfer functions (HRTF.sub.l*) or impulse responses (HRIR.sub.l*).
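
To illustrate the relation between absolute and relative transfer functions referred to in the method (an illustrative sketch only; names are hypothetical), an absolute head related transfer function vector may be converted into a relative one by normalizing each element to the reference microphone element.

```python
import numpy as np

def to_relative_transfer_function(d_abs: np.ndarray, ref_mic: int = 0) -> np.ndarray:
    """Convert an absolute transfer function vector (one complex value per
    microphone for a given frequency bin) into a relative transfer function
    by normalizing each element to the reference microphone element."""
    return d_abs / d_abs[ref_mic]

# Example: after normalization the reference microphone element equals 1.
d_abs = np.array([0.5 + 0.1j, 0.4 - 0.2j])
d_rel = to_relative_transfer_function(d_abs)
```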

(90) It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

(91) As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

(92) It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

(93) The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.

(94) Accordingly, the scope should be judged in terms of the claims that follow.
