A HEARING AID SYSTEM FOR ESTIMATING ACOUSTIC TRANSFER FUNCTIONS
20230044634 · 2023-02-09
Inventors
- Nels Hede Rohde (Smørum, DK)
- Thomas Bentsen (Smørum, DK)
- Anders Brødløs Olsen (Smørum, DK)
- Michael Syskind Pedersen (Smørum, DK)
- Svend Feldt (Ballerup, DK)
- Jesper Jensen (Smørum, DK)
CPC classification
G01B17/00
PHYSICS
H04R25/407
ELECTRICITY
H04R25/554
ELECTRICITY
H04R2225/43
ELECTRICITY
H04R2225/55
ELECTRICITY
H04R2225/67
ELECTRICITY
H04R25/30
ELECTRICITY
Abstract
A hearing aid system comprises a hearing aid and a portable auxiliary device, and is adapted to establish a communication link between them. The hearing aid comprises a microphone providing an electric input signal, a signal processor, and an output unit. The auxiliary device comprises a microphone providing an auxiliary electric input signal, and a user control interface allowing a user to initiate a specific calibration mode of operation of the hearing aid system. The signal processor of the hearing aid is configured to receive corresponding time segments of said electric input signal and said auxiliary electric input signal and to provide an estimate of an acoustic transfer function from said microphone of said auxiliary device to said microphone of said hearing aid. A method of operating a hearing aid system is further disclosed. The invention may e.g. be used in various applications related to own voice detection and estimation.
Claims
1. A hearing system comprising a headset adapted for being worn by a user at or in an ear of the user, and a portable auxiliary device, wherein the headset is adapted to establish a communication link to the auxiliary device to provide that data can be exchanged between the headset and the auxiliary device, or forwarded from one of the headset and the auxiliary device to the other, wherein the headset further comprises an input unit comprising at least one microphone for picking up sound from the environment of the headset, including the user's own voice, and to provide at least one electric input signal representative of said sound, an output unit for presenting stimuli perceivable as sound to the user, and a signal processor configured to perform a processing on a time segment of said at least one electric input signal, and a corresponding time segment of at least one auxiliary electric input signal provided by a microphone of said auxiliary device, or a transform of said time segment of said at least one electric input signal, and a corresponding transform of said corresponding time segment of said at least one auxiliary electric input signal, or a selected frequency range of said time segment of said at least one electric input signal, and a selected frequency range of said corresponding time segment of said at least one auxiliary electric input signal; and provide, based on said processing, an estimate of an acoustic transfer function from a microphone of said auxiliary device to said at least one microphone of the headset.
2. A hearing system according to claim 1 wherein the headset is configured to pick up the user's own voice via said input unit and to transmit the picked-up user's own voice to a far-end communication partner, and to receive sound from a far-end communication partner and present the received sound to the user via the output unit of the headset.
3. A hearing system according to claim 1 wherein the input unit of the headset comprises at least two microphones each providing an electric input signal.
4. A hearing system according to claim 3 comprising a beamformer filter providing one or more beamformers by applying predetermined or adaptively determined filter weights to the respective electric input signals of the at least two microphones.
5. A hearing system according to claim 4 wherein the one or more beamformers comprise an own voice beamformer comprising personalized filter weights, the own voice beamformer being configured to enhance signals originating from the direction of the user's mouth and to suppress sound signals from other directions.
6. A hearing system according to claim 4 wherein said one or more beamformers further comprises a beamformer comprising personalized filter weights, wherein the beamformer is configured to suppress sound signals from a far-field speaker.
7. A hearing system according to claim 5 wherein said personalized filter weights are determined in dependence of the estimate of at least one acoustic transfer function from said at least one microphone of said auxiliary device to said at least two microphones of the headset.
8. A hearing system according to claim 1 wherein said headset, in a communication mode of operation, is configured to transmit a signal comprising the estimate of the user's own voice to another device.
9. A hearing system according to claim 1 wherein said auxiliary device comprises at least one microphone for picking up sound from the environment of the auxiliary device and for providing corresponding at least one auxiliary electric input signal representative of the sound.
10. A hearing system according to claim 1 wherein said auxiliary device comprises a user control interface allowing a user to initiate a specific calibration mode of operation of the hearing system.
11. A hearing system according to claim 10 wherein the auxiliary device is configured to generate a calibration control signal upon initiation of the specific calibration mode from the user control interface.
12. A hearing system according to claim 11 wherein the auxiliary device is configured to transmit a current time segment of the at least one auxiliary electric input signal, or a transform of the current time segment of the at least one auxiliary electric input signal, or a selected frequency region of the current time segment of the at least one auxiliary electric input signal, to the headset in dependence of the calibration control signal.
13. A hearing system according to claim 1 wherein said headset and said auxiliary device comprise antenna and transceiver circuitry allowing the communication link to be established between the headset and the auxiliary device.
14. A hearing system according to claim 1 wherein said headset comprises a single earpiece adapted to be located at a left and/or right ear of the user.
15. A hearing system according to claim 1 wherein said headset comprises left and right earpieces adapted to be located at left and right ears of the user, respectively.
16. A hearing system according to claim 15 wherein said left and right earpieces are configured to establish a communication link allowing the exchange of data between them.
17. A hearing system according to claim 1 comprising a memory that stores: said time segment of said at least one electric input signal, and/or said corresponding time segment of said at least one auxiliary electric input signal, or said transform of said at least one electric input signal, and/or said corresponding transform of said at least one auxiliary electric input signal, or said selected frequency region of said at least one electric input signal, and/or said selected frequency region of said at least one auxiliary electric input signal.
18. A hearing system according to claim 1 comprising a distance sensor for estimating a distance between the auxiliary device and the headset.
19. A method of operating a hearing system, the hearing system comprising a headset adapted for being worn by a user at or in an ear of the user, the headset comprising at least one microphone, and a portable auxiliary device comprising at least one auxiliary microphone, wherein the hearing system is adapted to establish a communication link between the headset and the auxiliary device by which data is exchanged between the headset and the auxiliary device, or forwarded from one of the headset and the auxiliary device to the other, the method comprising in the headset receiving, via the at least one microphone, at least one electric input signal representative of sound from the environment of the headset, presenting stimuli perceivable as sound to the user, in the auxiliary device receiving, via the at least one auxiliary microphone, at least one auxiliary electric input signal representative of said sound from the environment of the headset, performing a processing on a time segment of said at least one electric input signal, and a corresponding time segment of said at least one auxiliary electric input signal, or a transform of said time segment of said at least one electric input signal, and a corresponding transform of said corresponding time segment of said at least one auxiliary electric input signal, or a selected frequency region of said time segment of said at least one electric input signal, and a selected frequency region of said corresponding time segment of said at least one auxiliary electric input signal, and providing, based on said processing, an estimate of a personalized transfer function from said at least one auxiliary microphone of said auxiliary device to said at least one microphone of said headset.
20. A method according to claim 19 further comprising providing a user control interface allowing the user to initiate a specific own voice calibration mode of operation of the hearing system.
21. A method according to claim 19 further comprising providing an own voice beamformer comprising personalized filter weights determined in dependence of said estimate of a personalized transfer function.
22. A headset configured to be used in a hearing system, wherein the headset is configured to be worn by a user at or in an ear of the user, the headset comprising antenna and transceiver circuitry allowing the headset to establish a communication link to an auxiliary device to provide that data can be exchanged between them or forwarded from one to the other, at least one microphone for picking up sound from the environment of the headset and for providing corresponding at least one electric input signal representative of said sound, a signal processor configured to process said at least one electric input signal or a signal or signals derived therefrom, and an output transducer for presenting stimuli perceivable as sound to the user representative of the processed signal, wherein the headset is configured to receive at least one auxiliary electric input signal provided by a microphone of said auxiliary device via said communication link, and wherein the signal processor, in a specific own voice calibration mode of operation of the headset, is configured to perform a processing on a time segment of said at least one electric input signal, and a corresponding time segment of said at least one auxiliary electric input signal, or a transform of said time segment of said at least one electric input signal, and a corresponding transform of said corresponding time segment of said at least one auxiliary electric input signal, or a selected frequency range of said time segment of said at least one electric input signal, and a selected frequency range of said corresponding time segment of said at least one auxiliary electric input signal, and to provide, based on said processing, an estimate of a personalized own voice transfer function from said microphone of said auxiliary device to said at least one microphone of the headset.
23. A headset according to claim 22 further comprising a beamformer filter configured to provide an own voice beamformer or an own-voice cancelling beamformer comprising personalized filter weights determined in dependence of said estimate of a personalized own voice transfer function.
24. A headset according to claim 23 configured to receive a calibration control signal from a user control interface allowing a user to initiate said specific own voice calibration mode of operation of the headset.
25. A non-transitory computer-readable medium on which is stored an application, termed an APP, comprising executable instructions configured to be executed on an auxiliary device to implement a user control interface for a hearing system as claimed in claim 1 wherein the user control interface is configured to allow a user to control functionality of the hearing system, including an initiation of a specific calibration mode of operation of the hearing system.
26. A non-transitory computer-readable medium according to claim 25 wherein the APP is configured to run on a cellular phone or on another portable device allowing communication with said headset or said hearing system.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0133] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they merely show details necessary to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter.
[0143] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
[0144] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
[0145] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
[0146] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
[0147] The present application relates to the field of hearing aids. It deals in particular with various aspects of retrieval and/or detection of a hearing aid user's own voice, e.g. in relation to beamforming and/or preservation or reestablishment of spatial cues.
[0148] Personal own voice transfer functions (OVTFs) may be estimated simply by using a portable electronic device, e.g. a mobile phone (or a similar communication device comprising a microphone and a transmitter), or a wireless microphone. Imagine that the hearing aid (HA) system is in an OVTF estimation mode (calibration mode), e.g. triggered by the HA-user or a hearing care professional (HCP), e.g. via a user interface, e.g. an APP (e.g. of a mobile phone). In such a calibration mode, the hearing aid system may be configured to prompt the HA-user to place the mobile phone in front of his/her mouth and speak in a natural manner for some time, e.g. 1-10 seconds. For simplicity, the user may be asked to speak particular sound elements, e.g. a particular sentence (e.g. presented at the user interface, e.g. with a certain vocal effort, e.g. dependent on an environment noise level). For OVTF estimation the exact content of the speech signal is irrelevant. The OVTF estimation procedure should preferably take place in an otherwise acoustically quiet situation. This may be verified by the HA system, e.g. the hearing aid(s), or the mobile phone, or a combination of both (or by a separate device), before initiating the estimation (calibration) procedure. Ideally, the user should be located away from reflecting surfaces, such as walls, etc., during calibration. Furthermore, ideally, the auxiliary device (e.g. a mobile phone) should be placed in a manner to reduce reflections from the phone surface to the microphones of the HA (e.g. by positioning it to have its largest surface, e.g. its display, in a horizontal plane, when the user is upright).
[0150] The speech signal of the HA-user is picked up by the microphone(s) (ADM) in the phone (AD) and by the microphone(s) (M.sub.i) in the user's HA(s) (HD). From these signals, the acoustic transfer function from the HA-user's mouth (actually, from the microphone of the phone) to the microphones of the HA system may be estimated. The user may wear a hearing aid at one ear or at both ears.
[0151] In more detail, let s.sub.ov(n) denote the own-voice time-domain signal picked up by a microphone in the mobile phone, placed at the mouth reference point, i.e., a position in front of (and close to) the HA-user's mouth. Furthermore, let s.sub.1(n), . . . , s.sub.M(n) denote the corresponding speech signals picked up by the M microphones of the HA system (either in one HA at one ear, or in two HAs at both ears, or in additional devices, e.g. a separate wireless microphone). Consider the Fourier transforms of the picked-up signals and denote them by S.sub.ov(ω) and S.sub.1(ω), . . . , S.sub.M(ω), respectively. Clearly, the acoustic transfer function from the mouth reference point to microphone i, i.e., the OVTF, is given by
H.sub.ov,i(ω)=S.sub.i(ω)/S.sub.ov(ω), i=1, . . . , M.
[0152] In practice, S.sub.i(ω) and S.sub.ov(ω) are found by applying the Discrete Fourier Transform (DFT) to the microphone signals s.sub.ov(n) and s.sub.1(n), . . . , s.sub.M(n), leading to discrete acoustic transfer functions
H.sub.ov,i(k)=S.sub.i(k)/S.sub.ov(k), k=0, . . . , K−1,
where k is the frequency bin index and K is the order of the DFT, e.g. 64 or 128.
[0153] For signal processing applications, it is often useful to collect the OVTFs for all microphones in one vector,
H.sub.ov(k)=[H.sub.ov,1(k) . . . H.sub.ov,M(k)].sup.T.
[0154] It is often of relevance (see examples below) to consider relative OVTFs, defined as
d.sub.ov,i(k)=H.sub.ov,i(k)/H.sub.ov,i′(k),
where 1≤i′≤M is the index of a pre-selected reference microphone (one of the microphones in the HA system, e.g. a front microphone of a hearing aid), and to collect these in a relative OVTF vector, defined as
d.sub.ov(k)=[d.sub.ov,1(k) . . . d.sub.ov,M(k)].sup.T.
[0155] In summary, OVTFs H.sub.ov(k)=[H.sub.ov,1(k) . . . H.sub.ov,M(k)].sup.T and relative OVTFs d.sub.ov(k)=[d.sub.ov,1(k) . . . d.sub.ov,M(k)].sup.T may be estimated from the microphone signals s.sub.ov(n) and s.sub.1(n), . . . , s.sub.M(n). Note that, when estimated in the manner described here, these OVTFs are personal, i.e., they reflect the personal acoustics (head shape, size, pinna, HA location) of a particular HA-user. In practice, slightly more advanced, noise-robust, and data-efficient methods may be applied for estimating the OVTFs H.sub.ov,i(k) [Farina, 2000] rather than simply forming the ratio H.sub.ov,i(k)=S.sub.i(k)/S.sub.ov(k). The estimation procedure described above assumes that all relevant signals are available for processing in one place—so we assume that the relevant signals are transmitted (e.g. wirelessly), e.g. from the mobile phone to the hearing aid system (or elsewhere).
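By way of illustration, such a frame-averaged ratio estimate may be sketched as follows (a minimal numpy/scipy sketch; the STFT window length, the averaging over frames, and the regularization constant eps are choices of this example, not prescribed by the disclosure):

```python
import numpy as np
from scipy.signal import stft

def estimate_ovtf(s_ov, s_mics, fs, nperseg=128, ref=0, eps=1e-10):
    """Estimate absolute OVTFs H_ov,i(k) and relative OVTFs d_ov,i(k).

    s_ov   : (N,) own-voice signal from the phone microphone.
    s_mics : (M, N) corresponding signals from the M hearing-aid microphones.
    Returns H_ov (M, K) and d_ov (M, K), with K one-sided frequency bins.
    """
    _, _, S_ov = stft(s_ov, fs=fs, nperseg=nperseg)    # (K, L) frames
    _, _, S_m = stft(s_mics, fs=fs, nperseg=nperseg)   # (M, K, L)

    # Average cross- and auto-spectra over the L frames; slightly more
    # noise-robust than forming the raw per-frame ratio S_i(k)/S_ov(k).
    cross = np.mean(S_m * np.conj(S_ov), axis=-1)      # (M, K)
    auto = np.mean(np.abs(S_ov) ** 2, axis=-1)         # (K,)
    H_ov = cross / (auto + eps)

    # Relative OVTFs w.r.t. a pre-selected reference microphone i' = ref.
    d_ov = H_ov / (H_ov[ref] + eps)
    return H_ov, d_ov
```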
[0157] Similarly, it is of interest to estimate the (relative) acoustic transfer function from the typical position of a conversation partner (or a competing speaker) to the microphones of the HA—we denote this acoustic transfer function as the frontal head-related transfer function (HRTF). Estimation of this HRTF may be done using a mobile phone as a wireless loudspeaker. EP2928215A1 describes the use of an auxiliary device (e.g. a mobile telephone) for self-calibration of beamformers for retrieving non-own-voice sound sources of interest.
[0158] Imagine that the HA system is in a (frontal) HRTF estimation mode, e.g. triggered by the HA-user or a hearing care professional (HCP) via an APP. The user holds the mobile phone in a frontal position at an arm's length distance (the typical position of a conversation partner) at a height corresponding to the user's mouth, the mobile telephone emits a test sound signal s.sub.f(n) from its loudspeaker, and the probe signal is picked up by the microphones of the HA system worn by the user.
[0160] A camera of the mobile phone may be used to give feedback to the user that the mobile phone is in the correct position (e.g. according to a predefined criterion). The duration of the test sound signal could range from a few hundred milliseconds to several seconds (e.g. in the range between 1 s and 10 s; the longer the duration, the more accurately the HRTF may be estimated, but the higher the risk that the user is unable to hold the mobile phone or his or her head still). The exact content of the test sound signal is less important, as long as the signal contains energy at all relevant frequencies (e.g. speech frequencies). Ideally, the estimation procedure takes place in an otherwise acoustically quiet situation and in a room without too many reflections, e.g. in a room with soft carpets, curtains, etc. Even if the measurement takes place in a reflective environment, the late reflections may be removed from the estimated impulse response (IR) by truncation of the 'reverberant' IR tail.
[0161] In an embodiment, the phone is mounted on a selfie stick. Based on a correlation (e.g. estimated by the hearing aid system, e.g. the hearing aid or the auxiliary device) between the hearing aid microphones and the microphone of the mobile phone, the length of the selfie stick may be adjusted such that a desired distance between the hearing instrument microphones and the phone in front of the user is obtained.
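A minimal sketch of how such a correlation-based distance estimate might be realized (assuming the two recordings share a common clock and that the cross-correlation peak corresponds to the direct acoustic path; both are simplifying assumptions of this example):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def estimate_distance(x_ha, x_phone, fs):
    """Estimate the phone-to-hearing-aid distance from the acoustic delay
    between time-aligned recordings of the same sound at the hearing-aid
    microphone (x_ha) and the phone microphone (x_phone)."""
    corr = np.correlate(x_ha, x_phone, mode="full")
    lag = np.argmax(np.abs(corr)) - (len(x_phone) - 1)  # samples of delay
    return max(lag, 0) / fs * SPEED_OF_SOUND            # metres
```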
[0163] The user may (e.g. via the user interface, e.g. via the auxiliary device) initiate the (calibration) measurement when the auxiliary device is located in an intended position relative to the user. The measurement may also be initiated when a certain distance is obtained (as e.g. determined by a distance sensor). In this way, the user does not have to actively initiate the measurement.
[0164] In an embodiment, the user is notified prior to the beginning of the measurement (to help ensure that the user does not move during the measurement). Notification may happen via the phone screen, by audio from the phone, or via audio played via the output unit of the hearing aid. This has the advantage that the user becomes aware that he or she should not move.
[0165] As before, let s.sub.1(n), . . . , s.sub.M(n) denote the corresponding signals picked up by the microphones of the HA system. Now the frontal HRTF H.sub.f,i(k) from the mobile phone to the ith microphone, and the frontal relative HRTF d.sub.f,i(k)=H.sub.f,i(k)/H.sub.f,i′(k), can be estimated exactly as in the discussion of the OVTFs above. The frontal HRTF vector is denoted as
H.sub.f(k)=[H.sub.f,1(k) . . . H.sub.f,M(k)].sup.T,
and the relative frontal HRTF is denoted as
d.sub.f(k)=[d.sub.f,1(k) . . . d.sub.f,M(k)].sup.T.
[0166] In practice, the (relative) HRTF may be estimated using slightly more complicated procedures than described in the previous section. Specifically, it may be beneficial that the test sound signal is a chirp signal (a tonal signal whose frequency increases with time); in this case, the HRTF may be estimated using the procedure outlined in [Farina, 2000].
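For illustration, the swept-sine measurement of [Farina, 2000] may be sketched as follows (a simplified example; the sweep band, duration, and tail truncation length are arbitrary choices here, and a real measurement chain would also have to compensate for playback latency and the loudspeaker response):

```python
import numpy as np

def exp_sweep(f0, f1, T, fs):
    """Exponential sine sweep from f0 to f1 Hz over T seconds, plus the
    matching inverse filter, in the style of [Farina, 2000]."""
    t = np.arange(int(T * fs)) / fs
    R = np.log(f1 / f0)
    sweep = np.sin(2 * np.pi * f0 * T / R * (np.exp(t * R / T) - 1.0))
    # Inverse filter: time-reversed sweep with an amplitude envelope that
    # compensates the sweep's pink (1/f) energy distribution.
    inv = sweep[::-1] * np.exp(-t * R / T)
    return sweep, inv

def impulse_response(recorded, inv, fs, tail_ms=50.0):
    """Deconvolve the recorded sweep and truncate the late, 'reverberant'
    part of the impulse response; its FFT yields the HRTF estimate."""
    ir = np.convolve(recorded, inv)
    onset = np.argmax(np.abs(ir))                  # direct-sound peak
    return ir[: onset + int(tail_ms * 1e-3 * fs)]  # keep early part only
```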
[0167] The HRTFs may be measured for multiple sound source positions (angles), not only the frontal one. Clearly, it is hard for a person to hold a mobile phone in his/her hand at an angle of, say, 25 degrees with respect to his/her nose direction. However, the hearing aid system may be configured to provide that the auxiliary device (e.g. the phone) delivers feedback to the user (e.g. via the loudspeaker or the screen) if/when the phone is held in the correct position. This may be achieved using the camera of the phone (e.g. based on a user input regarding the position of interest, e.g. selected among a number of predefined positions, e.g. via the user interface). Once in the correct position, the phone emits the test sound signal and measures the HRTF as described above. This process could be repeated for a range of front-half-plane locations of the mobile phone.
EXAMPLES
Application 1. Personalized Own-Voice Beamformer/Noise Reduction System
[0168] This application uses the relative OVTFs d.sub.o(k)=[d.sub.o,1(k) . . . d.sub.o,M(k)].sup.T (i.e., the vector d.sub.ov(k) introduced above) estimated as described above.
[0169] For applications such as handsfree telephony in HAs and voice-controlled HAs, it is essential to be able to retrieve (an estimate of) a clean version of the user's speech signal, even in acoustically noisy situations. To do so, one can design beamforming systems based on the microphone signals of the HA system that enhance signals originating from the direction of the user's mouth and suppress sound signals from other directions.
[0170] For example, it is well-known that the filter coefficients of a Minimum Variance Distortion-less Response (MVDR) beamformer are given by
w(k,l)=C.sub.v.sup.−1(k,l)d(k)/(d.sup.H(k)C.sub.v.sup.−1(k,l)d(k)),
where C.sub.v(k,l) denotes the noise cross-power spectral density matrix at frequency k and time instant l (see e.g. [Jensen et al., 2015] and the references therein for methods for estimating C.sub.v(k,l)), and where d(k) is the relative acoustic transfer function from a sound source of interest to the microphones providing input to the MVDR beamformer.
[0171] Inserting the estimated OVTF vector, d.sub.o(k), into this expression leads to a personalized own voice beamformer,
w.sub.ov(k,l)=C.sub.v.sup.−1(k,l)d.sub.o(k)/(d.sub.o.sup.H(k)C.sub.v.sup.−1(k,l)d.sub.o(k)),
which leads to a better own-voice retrieval/noise reduction trade-off than when using a non-personalized d(k), e.g. as estimated from a Head-And-Torso Simulator (HATS). Alternative own-voice retrieval systems easily follow, e.g. based on the Multi-Channel Wiener Filter, Delay-and-Sum Beamformer [Brandstein et al., 2001], Beamformer-Informed Postfilter solutions [Jensen et al., 2015], etc.
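A per-bin implementation of this closed-form MVDR solution may be sketched as follows (a minimal numpy sketch; the estimation of C.sub.v(k,l) itself, e.g. via an own-voice activity detector, is outside the scope of the example):

```python
import numpy as np

def mvdr_weights(Cv, d):
    """MVDR weights w(k) = Cv^{-1}(k) d(k) / (d^H(k) Cv^{-1}(k) d(k)).

    Cv : (K, M, M) noise cross-power spectral density matrix per bin.
    d  : (K, M) relative transfer functions, e.g. the personalized
         own-voice vector d_o(k) estimated during calibration.
    """
    Cv_inv_d = np.linalg.solve(Cv, d[..., None])[..., 0]   # Cv^{-1} d, (K, M)
    denom = np.einsum("km,km->k", np.conj(d), Cv_inv_d)    # d^H Cv^{-1} d
    return Cv_inv_d / denom[:, None]
```

The beamformer output for STFT frame l would then be y(k,l)=w.sup.H(k,l)x(k,l), where x(k,l) stacks the M microphone STFT coefficients.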
Application 2. Personalized Own-Voice Beamformer with Frontal Interference Rejection
[0172] This application uses the OVTFs d.sub.o(k)=[d.sub.o,1(k) . . . d.sub.o,M(k)].sup.T estimated as described above, together with the frontal HRTFs d.sub.f(k)=[d.sub.f,1(k) . . . d.sub.f,M(k)].sup.T estimated as described above.
[0173] The idea is an extension of the idea described in section 'Application 1' above, where, in addition to retrieving the user's own voice signal, a spatial null is directed towards the frontal direction, in order to maximally suppress a presumed competing speaker. It is well-known that a beamformer which can perform this task is a special case of a Linearly Constrained Minimum Variance (LCMV) beamformer. The beamformer coefficient vector is found by solving the problem
min.sub.w(k,l) w.sup.H(k,l)C.sub.v(k,l)w(k,l),
subject to the constraints
w.sup.H(k,l)d.sub.o(k)=1,
and
w.sup.H(k,l)d.sub.f(k)=0.
[0174] It is well-known that this problem admits a simple, closed-form solution [Haykin, 2001].
[0175] Alternatives to the LCMV beamformer solution exist—for example, it is straightforward to extend it with a postfilter.
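That closed-form solution, w=C.sub.v.sup.−1C(C.sup.HC.sub.v.sup.−1C).sup.−1f with constraint matrix C=[d.sub.o d.sub.f] and response vector f=[1 0].sup.T [Haykin, 2001], may be sketched per frequency bin as follows (a minimal illustration; regularization of the matrix inversions is omitted):

```python
import numpy as np

def lcmv_weights(Cv, d_o, d_f):
    """LCMV weights enforcing w^H d_o = 1 (distortionless own voice)
    and w^H d_f = 0 (spatial null towards the frontal direction).

    Cv        : (M, M) noise CPSD matrix for one frequency bin.
    d_o, d_f  : (M,) relative own-voice and frontal transfer functions.
    """
    C = np.stack([d_o, d_f], axis=1)        # (M, 2) constraint matrix
    f = np.array([1.0, 0.0])                # desired constraint responses
    Cv_inv_C = np.linalg.solve(Cv, C)       # Cv^{-1} C
    gram = C.conj().T @ Cv_inv_C            # C^H Cv^{-1} C, (2, 2)
    return Cv_inv_C @ np.linalg.solve(gram, f)
```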
Application 3. Online Personalization of Own-Voice-Driven Algorithms
[0176] This application uses the OVTFs d.sub.o(k)=[d.sub.o,1(k) . . . d.sub.o,M(k)].sup.T estimated as described above and assumes (optionally) that a batch of the user's own voice is recorded with the HA microphones. An extension of the idea also uses the (frontal) HRTF d.sub.f(k)=[d.sub.f,1(k) . . . d.sub.f,M(k)].sup.T estimated as described above.
[0177] Assume that a data-driven algorithm is present in the HA system. Such an algorithm could typically involve a deep neural network (DNN) trained to solve a relevant task. In the example below, we assume that this algorithm is an own-voice activity detector (OVAD), but this is only an example—other data-driven own-voice-relevant algorithms exist, e.g., keyword spotting algorithms, hands-free telephony related algorithms, etc.
[0178] Assume, for example, that the OVAD is based on a deep neural network (DNN), which is trained to classify each time-frequency tile in the input signal as a) own-voice dominated, b) not own-voice dominated (comprising background noise, external talkers, silence, etc.), cf. e.g. [Garde, 2019]. An OVAD serves as a pre-requisite for other algorithms, e.g., algorithms for estimating the noise cross-power spectral density matrix C.sub.v(k,l), etc., cf. e.g. [Garde, 2019]. Traditionally, the training of such DNN-OVAD takes place off-line, i.e., prior to HA-usage, using speech signals uttered by many different speakers (males, females, children) and recorded by HAs on their individual ears. The resulting OVAD-algorithm works well on average across a group of representative users—this is a speaker-independent algorithm.
[0179] However, given access to the personal OVTF d.sub.o(k) along with examples of speech from the user in question, the DNN may be re-trained (or trained further, aka transfer learning) online, i.e., during HA usage, using artificially generated own-voice microphone signals. Specifically, the artificial own-voice signals may be generated according to
S.sub.i(k,l)=d.sub.o,i(k)·S.sub.o(k,l),
where S.sub.i(k,l) is the Short-Time Fourier Transform (STFT) of the artificial personalized own-voice signal recorded at microphone i, d.sub.o,i(k) is the OVTF estimated as described above, and S.sub.o(k,l) is the STFT of the recording of the user's own voice. Time-domain versions of the artificial own-voice microphone signals may be constructed by applying the inverse STFT to the STFT signals. If a recording of the user's own voice is not available, a collection of other speech signals may be used, e.g. from speakers of the same gender as the user, if such information is available. In this situation, the data-driven algorithm will be personalized in terms of OVTFs but not in terms of the user's voice characteristics.
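A sketch of this STFT-domain synthesis (using scipy's stft/istft as example analysis/synthesis filter banks; the frame length is an arbitrary choice of this example):

```python
import numpy as np
from scipy.signal import stft, istft

def synthesize_own_voice(s_o, d_o, fs, nperseg=128):
    """Generate artificial personalized own-voice microphone signals
    S_i(k,l) = d_o,i(k) * S_o(k,l) and return their time-domain versions.

    s_o : (N,) recording of the user's own voice (or substitute speech).
    d_o : (M, K) personalized OVTFs per microphone and frequency bin,
          with K matching the one-sided STFT size nperseg // 2 + 1.
    """
    _, _, S_o = stft(s_o, fs=fs, nperseg=nperseg)       # (K, L)
    S_mics = d_o[:, :, None] * S_o[None, :, :]          # (M, K, L)
    _, s_mics = istft(S_mics, fs=fs, nperseg=nperseg)   # (M, N')
    return s_mics
```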
[0180] Re-training (or continued training) of a DNN during HA-usage may be hard due to memory and computational complexity limitations of the HA. One could bypass this problem by transmitting the relevant data (OVTFs and optional own voice signals and optional DNN parameters) wirelessly to an external computational unit, which, after re-training, would transmit the resulting DNN weights back to the HA-system.
[0181] As already mentioned, the presented idea of using the OVTFs and (optionally) recordings of the user's own voice is not limited to the OVAD example described above, but may be applied to personalize any data-driven algorithm onboard the HA.
[0182] An extension of the idea involves including a frontal competing speaker in the artificially generated training data. In particular, noisy own-voice signals may be generated according to
X.sub.i(k,l)=d.sub.o,i(k)·S.sub.o(k,l)+d.sub.f,i(k)·S.sub.f(k,l)+V(k,l),
where d.sub.f,i(k) are the (frontal) HRTFs, e.g. measured as described above, S.sub.f(k,l) is the STFT of the voice signal of a competing speaker, and V(k,l) is an arbitrary noise signal representing non-coherent noise sources in the acoustic environment. The competing speech signal S.sub.f(k,l) could be generated from arbitrary speech signals from a large quantity of male and female speakers (as the competing speaker is generally unknown in practice), and V(k,l) could be generated from relevant acoustic noise, e.g., noise from a cafeteria situation or a passenger-in-a-train situation, etc., as recorded by the HA microphones on a HATS. It is assumed that the signals S.sub.f(k,l) and V(k,l) are present in an external computational device, where (re-)training of the network weights takes place.
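Continuing the sketch above, the noisy training mixture may be formed directly in the STFT domain (the arrays d_f, S_f and V are assumptions of this example, standing for the frontal HRTFs, the competing-speech STFT, and recorded noise, respectively):

```python
# d_o, d_f : (M, K) transfer functions per microphone and frequency bin.
# S_o, S_f : (K, L) STFTs of own voice and competing speech.
# V        : (M, K, L) STFTs of recorded (non-coherent) background noise.
X = (d_o[:, :, None] * S_o[None, :, :]      # own-voice component
     + d_f[:, :, None] * S_f[None, :, :]    # frontal competing speaker
     + V)                                   # additive environment noise
```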
Application 4. OVTF Equalization
[0183] The idea uses the OVTFs d.sub.o(k)=[d.sub.o,1(k) . . . d.sub.o,M(k)].sup.T estimated as described above.
[0184] One approach to realize personalized own-voice processing is by modifying the actual signal processing algorithms taking place in the HA-system, e.g. (re-)training DNN weights to fit personal head acoustics (example 3) or modifying beamformer weights to reflect personal head- and torso-acoustics. It may, however, be desirable to maintain the same signal processing algorithm implementations (including DNN weights) for all users (such processing algorithms may include own-voice-relevant algorithms, e.g. an own voice detection algorithm, a speech recognition algorithm, e.g. a keyword detection algorithm, etc.). In particular, it would be desirable, if the own voice processing algorithms on-board the HA system were optimized for the same OVTF, e.g. the one of a HATS—this would make system development, debugging, maintenance, and logistics easier.
[0185] To do so, while still achieving the improvements of personalized processing, we propose to pre-weight, or equalize, the microphone signals during signal regions where the own-voice signal dominates (e.g. as estimated using an OVAD). In particular, when operating the own-voice related algorithms during own-voice activity, we propose to weight the ith microphone signal S.sub.mics,i(k,l) according to
{tilde over (S)}.sub.mics,i(k,l)=(d.sub.HATS,i(k)/d.sub.o,i(k))·S.sub.mics,i(k,l),
where d.sub.o,i(k) is the OVTF of the particular user estimated as described above, d.sub.HATS,i(k) is a set of OVTF coefficients as measured on a HATS (offline, in a sound studio of the HA manufacturer, e.g. estimated as described above) and stored in the HA memory, S.sub.mics,i(k,l) denotes the STFT of the own-voice signal recorded on the ith microphone for the user in question, and {tilde over (S)}.sub.mics,i(k,l) denotes the resulting equalized signal.
[0186] The proposed equalization scheme transforms the own-voice microphone signals of a particular user into the own-voice microphone signals of a HATS. This allows the subsequent processing applied in the HA system to be optimized for a HATS, irrespective of the actual user. In other words, the processing after the equalization would be identical for all users.
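A sketch of this per-bin equalization applied to own-voice-dominated STFT frames (the regularization constant eps is an example choice guarding against very small OVTF values):

```python
import numpy as np

def equalize_own_voice(S_mics, d_o, d_hats, eps=1e-10):
    """Map the user's own-voice microphone STFTs onto those of a HATS,
    so the downstream algorithms can remain identical for all users.

    S_mics : (M, K, L) STFTs of own-voice-dominated microphone frames
             (e.g. selected by an own-voice activity detector).
    d_o    : (M, K) personalized OVTFs of the user.
    d_hats : (M, K) OVTFs measured on a HATS, stored in HA memory.
    """
    gain = d_hats / (d_o + eps)          # d_HATS,i(k) / d_o,i(k) per bin
    return gain[:, :, None] * S_mics
```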
Application 5. Acoustic Rendering Using HRTFs
[0187] The idea uses the (frontal) absolute HRTF H.sub.f(k)=[H.sub.f,1(k) . . . H.sub.f,M(k)].sup.T, estimated as described above. Optionally, the idea uses the frontal HRTF in addition to absolute HRTFs measured from other directions than the frontal.
[0188] We propose to combine the set of measured personal HRTFs with a set of pre-measured HRTFs (e.g., from a HATS) for other directions not covered by the personal HRTF set. We propose to use the combined set of HRTFs for spatially realistic rendering of acoustic signals for the user of a hearing device. In particular, the combined HRTF set makes it possible to play back sounds of interest for the user, e.g., phone calls, sound notifications, jingles, etc., as if they originated from a position outside the user's body, e.g., in the frontal position, or slightly to the left, etc., or to render an ambient signal more realistically, using more or all HRTFs in the combined set.
[0189] Specifically, without loss of generality, let i=1 denote the index of a HA-microphone close to the left eardrum of the user, and let i=2 denote the index of a HA-microphone close to the right eardrum of the user. Also, still without loss of generality, let us consider rendering a sound source as originating from the frontal position (for example). Hence, H.sub.f,1(k) denotes the acoustic transfer function from a position in front of the user to her left ear, while H.sub.f,2(k) denotes the acoustic transfer function from the same position in front of the user to her right ear.
[0190] Then a sound of interest for the user may be rendered as originating from the front according to
S.sub.i(k,l)=H.sub.f,i(k)S(k,l), i=1,2,
where S(k,l) is the STFT of the sound of interest, while S.sub.1(k,l) and S.sub.2(k,l) are the STFTs of the signals presented to the left and right ears, respectively, of the user.
[0191] This approach may be generalized to the synthesis of more complex sound fields according to
S.sub.i(k,l)=Σ.sub.jH.sub.j,i(k)S.sub.j(k,l), i=1,2,
where S.sub.j(k,l) is the STFT of the component of the sound of interest originating from location j, H.sub.j,i(k) is the (personalized or HATS-based) HRTF from location j to the microphone close to the ith ear, and S.sub.i(k,l) is the STFT of the sound to be presented to the ith ear. The location index j could span some or all HRTFs in the combined HRTF set (i.e., both personal and HATS-based HRTFs). The advantage of including personal HRTFs over using all-HATS-based HRTFs is that the spatial sound perception becomes more realistic to the individual user.
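A sketch of the single-source binaural rendering above (scipy's stft/istft again serve as example filter banks; H_left and H_right stand for the measured H.sub.f,1(k) and H.sub.f,2(k)):

```python
import numpy as np
from scipy.signal import stft, istft

def render_binaural(s, H_left, H_right, fs, nperseg=128):
    """Render a mono sound as if arriving from the measured direction,
    i.e. S_i(k,l) = H_f,i(k) * S(k,l) for the left and right ears.

    H_left, H_right : (K,) absolute HRTFs to the ear-level microphones,
                      with K = nperseg // 2 + 1 one-sided bins.
    """
    _, _, S = stft(s, fs=fs, nperseg=nperseg)                    # (K, L)
    _, left = istft(H_left[:, None] * S, fs=fs, nperseg=nperseg)
    _, right = istft(H_right[:, None] * S, fs=fs, nperseg=nperseg)
    return left, right
```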
[0193] The hearing aid (HD) comprises an input unit (IU) comprising at least one microphone (here two, M.sub.1, M.sub.2) for picking up sound from the environment of the hearing aid and for providing a corresponding at least one electric input signal (S.sub.1(ω), S.sub.2(ω)) representative of the sound (where ω may represent frequency). The input unit (IU) may comprise analogue-to-digital converters to provide the electric input signal(s) in digitized form as digital samples, and analysis filter banks for providing the electric input signal(s) as frequency sub-band signals, as appropriate for the application in question. The hearing aid (HD) further comprises a signal processor (SPU) configured to perform processing in the hearing aid. The signal processor (SPU) may comprise a hearing aid processor part (HAP) that is configured to process the at least one electric input signal or a signal or signals derived therefrom and to provide a processed signal (OUT). The hearing aid (HD) further comprises an output unit (OU), e.g. comprising a loudspeaker, a vibrator, or a multi-electrode array, for presenting stimuli (e.g. acoustic vibrations or electric stimuli) perceivable as sound to the user representative of the processed signal (OUT), see the solid arrow denoted 'Stimuli'.
[0194] The auxiliary device (AD) comprises at least one microphone (AD-M) for picking up sound from the environment of the auxiliary device (AD) and for providing a corresponding at least one auxiliary electric input signal (ADM-IN) representative of the sound. The auxiliary device (AD) further comprises a user control interface (UI), e.g. a keyboard or a touch-sensitive screen, allowing a user (U) to initiate a specific calibration mode of operation of the hearing aid system (HAS), see the solid arrow denoted 'V-Control' and the symbolic hand denoted 'T-control'.
[0196] The signal processor (SPU) of the hearing aid (HD) is configured to compare corresponding time segments of the at least one electric input signal (S.sub.1(ω), S.sub.2(ω)) and the at least one auxiliary electric input signal (ADM-IN), or corresponding transforms thereof, and to provide an estimate of a transfer function (HRTF, OVTF) from the auxiliary device (AD) (e.g. from the at least one microphone (ADM) or from a loudspeaker (AD-SPK) of the auxiliary device, see below) to the at least one microphone (M.sub.1, M.sub.2) of the hearing aid (HD).
[0197] When the at least one microphone (ADM) of the auxiliary device (AD) is positioned in proximity of, e.g. in front of, the user's mouth (as e.g. described above), the estimated transfer function represents an own voice transfer function (OVTF).
[0198] The mode control signal (MCtr) from the user interface (UI) may e.g. be used to control the hearing aid signal processor (HAP) of the forward path of the hearing aid (HD) between the input unit (IU) and the output unit (OU), cf. control signal HActr.
[0199] A customization (personalization) of the filter weights of the (far-field) beamformer (FF-BF) to the particular user may be performed (as described and exemplified in detail above) by the present embodiment of a hearing aid system using a loudspeaker of the auxiliary device to play a test sound (calibration sound) in a specific calibration mode whose aim it is to determine head related transfer functions (HRTF).
[0200] The auxiliary device (AD) may thus (in an embodiment) preferably comprise a loudspeaker (AD-SPK), and the auxiliary device may be configured to—in a specific calibration mode of operation—play a test sound signal (cf. the test sound s.sub.f(n) described above).
[0201] In the calibration mode, the auxiliary device is positioned at a preferred location relative to the user (i.e. to the hearing aid microphone(s)) from which an (acoustic) transfer function is to be estimated, e.g. held in a hand, or located on a table or other support. The preferred location (e.g. distance to, angle to, etc.) relative to the user may be known in advance (e.g. by carrying the auxiliary device on a stick (e.g. a 'selfie-stick') of known length), or be estimated during calibration, e.g. using one or more sensors, e.g. of the auxiliary device and/or the hearing aid, e.g. a camera and/or a radar sensor. The hearing aid system (HAS) may be configured to make data representative of the estimated location of the loudspeaker (AD-SPK) relative to the hearing aid (HD) microphones (M.sub.1, M.sub.2) available (e.g. transmitted) to the hearing aid (e.g. via the communication link (WL-RF)), and e.g. to form part of the mode control signal (MCtr) fed to the controller (TF-PRO).
[0202] The auxiliary device (AD) comprises a controller (CNT) configured to—in said specific (far-field) calibration mode of operation—provide a test or calibration signal (CalS), which is fed to and played by the loudspeaker (AD-SPK), thereby providing the test sound signal.
[0203] The auxiliary device (AD) is configured to allow the control inputs (UCtr) from the user control interface (UI) to control the transmission of microphone signals (ADM-IN) and/or test/calibration signals (CalS′) and/or other control signals (UCtr), e.g. mode control signals for initiating and/or terminating a calibration mode, and/or other modes of operation of the hearing aid (e.g. a telephone mode) from the auxiliary device to the hearing aid(s).
[0205]
[0206]
[0207]
[0208] The instructions for calibrating own voice transfer functions (OVTF) are:
[0209] Locate device horizontally (microphone close to mouth).
[0210] During calibration keep your head still and don't move device.
[0211] Speak normally for ~10 s.
[0212] These instructions should prompt the user to:
[0213] Place the device with its microphone input close to the user's mouth (e.g. ≤0.1 m away), while trying to minimize reflections of the user's voice by the device (which may provide reverberation-like disturbances and thus degrade the quality of the OVTF estimation).
[0214] Preferably, keep the device (and the body) as still as possible during the length of the calibration, which is estimated at 10 seconds.
[0215] Speak normal sentences during the calibration period (e.g. with a normal vocal effort). A further instruction may be to ask the user to read a specific text that is known to 'excite' a relevant frequency range of the user's voice.
[0216] Press the Start/Stop 'button' to initiate the calibration procedure.
[0218] The instructions for calibrating head related transfer functions (HRTF) are:
[0219] Locate (e.g. hold) device at intended location with screen towards you (loudspeaker at ear level).
[0220] Activate selfie mode.
[0221] During calibration (while test sound is being played) keep your head still and don't move device.
[0222] These instructions should prompt the user to:
[0223] Place the auxiliary device in a location (direction and distance) relative to the user where the target sound source is expected to be located, e.g. in front of the user, e.g. ≥1 m away from the user, e.g. by holding the auxiliary device in a hand or mounting it on a stick (e.g. a 'selfie-stick').
[0224] Activate a camera mode of operation where the screen shows a 'mirror image' of the user. This may help in positioning the device at the right height (and may facilitate automatic position sensing using the camera image). Preferably, the device should be level with the eyes (and ears) of the user.
[0225] Preferably, keep the device (and the body) as still as possible during the calibration, which can be verified by the user through the perception of the test sound (the calibration procedure is e.g. estimated at 10 seconds). The camera of the auxiliary device may record the user while the sound is played (allowing an estimate of possible movements during calibration).
[0226] Press the Start/Stop 'button' to initiate the calibration procedure.
[0227] The Start/Stop ‘button’ may further be used to terminate the calibration procedure, e.g. if something is not right (sudden movements, noise, other activities, etc.).
[0228] An acceptance step, requesting the user to accept the calibration measurement may be included (to give the user a chance to discard the results, if for some reason they are not as intended, e.g. due to noise or other unintended events during the measurements).
[0229] Preferably, the initiation time of the calibration procedure (pressing of START) (and possibly the start time (and/or end time) of the calibration signal), the chosen location (e.g. angle and distance to the user), and possibly characteristics of the calibration signal (magnitude vs. frequency, spectrum, or the calibration signal itself (or a part thereof), etc.) are communicated to the left and right hearing devices for use in determining customized head related transfer functions (HRTF) or own voice transfer functions (OVTF). The customized (personalized) transfer functions may e.g. be used to choose an appropriate corresponding (e.g. predetermined) set of filter weights, or for calculating such weights, e.g. for an appropriate beamformer (cf. e.g. FF-BF and OV-BF).
[0230] An example of an application of personalized transfer functions according to the present disclosure is illustrated in
[0231] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
[0232] As used, the singular forms "a," "an," and "the" are intended to include the plural forms as well (i.e. to have the meaning "at least one"), unless expressly stated otherwise. It will be further understood that the terms "includes," "comprises," "including," and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
[0233] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
[0234] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
[0235] Accordingly, the scope should be judged in terms of the claims that follow.
REFERENCES
[0236] [Farina, 2000]: A. Farina, "Simultaneous measurement of impulse response and distortion with a swept-sine technique," Audio Engineering Society Convention 108, Audio Engineering Society, 2000.
[0237] [Jensen et al., 2015]: J. Jensen and M. S. Pedersen, "Analysis of Beamformer Directed Single-Channel Noise Reduction System for Hearing Aid Applications," Proc. Int. Conf. Acoust., Speech, Signal Processing, pp. 5728-5732, April 2015.
[0238] [Brandstein et al., 2001]: M. Brandstein and D. Ward (Eds.), "Microphone Arrays—Signal Processing Techniques and Applications," Springer, 2001.
[0239] [Haykin, 2001]: S. Haykin, "Adaptive Filter Theory," Prentice Hall, 2001.
[0240] [Heymann et al., 2017]: J. Heymann, L. Drude, R. Haeb-Umbach, "A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing," Computer, Speech and Language, Vol. 46, pp. 374-385, November 2017.
[0241] [Garde, 2019]: J. Garde, "Own-Voice Retrieval for Hearing Assistive Devices: A Combined DNN-Beamforming Approach," Master's Thesis, Aalborg University, 2019.
EP2928215A1 (Oticon), Jul. 10, 2015.