LOW LATENCY HEARING AID

20220394397 · 2022-12-08

Abstract

A hearing aid comprises at least one input unit for providing at least one stream of samples of an electric input signal in a first domain; at least one encoder configured to convert said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; a processing unit configured to process said at least one electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; and a decoder configured to convert said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain. The at least one encoder is configured to convert a first number of samples from said at least one stream of samples of the electric input signal in the first domain to a second number of samples in said at least one stream of samples of the electric input signal in the second domain. The decoder is configured to convert said second number of samples from said stream of samples of the processed signal in the second domain to said first number of samples in said stream of samples of the processed signal in the first domain. The second number of samples is larger than the first number of samples. The at least one encoder is trained, and at least a part of said processing unit providing said compensation for the user's hearing impairment is implemented as a trained neural network. A method of operating a hearing aid is further disclosed. Thereby an improved hearing aid may be provided.

Claims

1. A hearing aid configured to be worn by a user, the hearing aid comprising at least one input unit for providing at least one stream of samples of an electric input signal in a first domain, said at least one electric input signal representing sound in an environment of the hearing aid; at least one encoder configured to convert said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; a processing unit configured to process said at least one electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; a decoder configured to convert said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain; wherein said at least one encoder is configured to convert a first number (N1) of samples from said at least one stream of samples of the electric input signal in the first domain to a second number (N2) of samples in said at least one stream of samples of the electric input signal in the second domain, and said decoder is configured to convert said second number (N2) of samples from said stream of samples of the processed signal in the second domain to said first number (N1) of samples in said stream of samples of the processed signal in the first domain, and wherein the second number (N2) of samples is larger than the first number (N1) of samples, and wherein said at least one encoder is optimized and wherein at least a part of said processing unit providing said compensation for the user's hearing impairment is implemented as a trained neural network.

2. A hearing aid according to claim 1 wherein the first domain is the time domain.

3. A hearing aid according to claim 1 wherein the encoder and/or the decoder is/are implemented as a neural network.

4. A hearing aid according to claim 1 wherein the at least one encoder and the processing unit are configured to be optimized jointly in order to process the at least one electric input signal optimally under a low-latency constraint.

5. A hearing aid according to claim 4 wherein the at least one encoder and the processing unit are configured to be optimized jointly in that they are optimized in a common training procedure with a single cost function.

6. A hearing aid according to claim 4 wherein said low-latency constraint comprises a restriction to the processing time through the hearing device.

7. A hearing aid according to claim 6 wherein said low-latency constraint is related to the processing time through the encoder, the processing unit and the decoder.

8. A hearing aid according to claim 1 wherein parameters of the at least one encoder, the processing unit, and optionally the decoder are trained in order to minimize a cost function given by the difference to a hearing aid comprising linear filter banks instead of said at least one encoder and said decoder.

9. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the neural network include one or more of the weight-, bias-, and non-linear function-parameters of the neural network.

10. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the encoder and/or decoder include one or more of the first and second number of samples.

11. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the encoder include weights of the encoding matrix G.

12. A hearing aid according to claim 1 wherein a transformation matrix (G) of said encoder is an N2×N1 matrix, where N2>N1, such that a transformed signal is S=Gs, where G is a N2×N1 matrix, the input signal s of the first domain is a N1×1 vector, and the transformed signal S of the second domain is a N2×1 vector.

13. A hearing aid according to claim 1 comprising an output unit for providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain.

14. A hearing aid according to claim 1 comprising at least one earpiece configured to be worn at or in an ear of the user; and a separate audio processing device, wherein said earpiece and said separate audio processing device are configured to allow an exchange of audio signals or parameters derived therefrom between each other.

15. A hearing aid according to claim 14, wherein said earpiece comprises said at least one input unit; and an output unit for providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain.

16. A hearing aid according to claim 14 wherein said separate audio processing device comprises said processing unit.

17. A hearing aid according to claim 14 wherein said separate audio processing device comprises said encoder and/or said decoder.

18. A hearing aid according to claim 14 wherein said earpiece comprises said or an encoder and/or said decoder.

19. A method of operating a hearing aid configured to be worn by a user, the method comprising providing at least one stream of samples of an electric input signal in a first domain, said at least one electric input signal representing sound in an environment of the hearing aid; converting said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; processing said at least one electric input signal in the second domain to provide a compensation for the user's hearing impairment, and providing a processed signal as a stream of samples in the second domain; converting said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain; providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain; converting a first number (N1) of samples from said at least one stream of samples of the electric input signal in the first domain to a second number (N2) of samples in said at least one stream of samples of the electric input signal in the second domain; and converting said second number (N2) of samples from said stream of samples of the processed signal in the second domain to said first number (N1) of samples in said stream of samples of the processed signal in the first domain, wherein the second number (N2) of samples is larger than the first number (N1) of samples, wherein said converting of samples in the first domain to samples in the second domain is optimized, and wherein said compensation for the user's hearing impairment is provided by a trained neural network.

20. A method of optimizing parameters of an encoder-/decoder-based hearing aid in order to minimize a difference between an output signal of the encoder-/decoder-based hearing aid and an output signal of a filter bank-based hearing aid, the encoder-/decoder-based hearing aid comprising a forward path comprising an encoder configured to convert a stream of samples of an electric input signal in a first domain to a stream of samples of the electric input signal in a second domain; a processing unit configured to process said stream of samples of the electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; and a decoder configured to convert said stream of samples of the processed signal in the second domain to a first stream of samples of the processed signal in the first domain; the filter bank-based hearing aid comprising a forward path comprising a filter bank operating in the Fourier domain, the filter bank comprising an analysis filter bank for converting said stream of samples of the electric input signal in the first domain to a signal in the Fourier domain; a synthesis filter bank for converting a processed signal in the Fourier domain to a second stream of samples of the processed signal in the first domain; and a processing unit connected to the analysis filter bank and the synthesis filter bank and configured to process said signal in the Fourier domain to compensate for the user's hearing impairment and to provide said processed signal in the Fourier domain; the method comprising providing said stream of samples of an electric input signal in the first domain, said electric input signal representing sound in an environment of the encoder-/decoder-based hearing aid and/or the filter bank-based hearing aid; and minimizing a cost function given by the difference between said first and second streams of samples of the processed signal in the first domain to thereby optimize said parameters of the encoder-/decoder-based hearing aid.

21. A method according to claim 20 wherein said parameters comprise one or more of weight-, bias-, and non-linear function-parameters of a neural network, and one or more of the first and second number of samples.

22. A method according to claim 20 comprising providing a separate delay (D) in the forward path of the encoder-/decoder-based hearing aid in addition to the processing delay of the encoder, the processing unit and the decoder, wherein the delay parameter (D) is used to adjust for an intended latency difference between the filter bank-based hearing aid and the encoder-/decoder-based hearing aid.

23. A method according to claim 20 wherein the encoder, the processing unit, and the decoder of the encoder-/decoder-based hearing aid are trained as one deep neural network, wherein the first, input, layers of the deep neural network correspond to the encoder, the last, output, layers correspond to the decoder, and the layers in between correspond to the hearing loss compensation processing.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0146] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter in which:

[0147] FIG. 1 shows a hearing device configured to process signals in the frequency domain,

[0148] FIG. 2 shows an embodiment of a hearing device according to the present disclosure,

[0149] FIG. 3A shows an example of an encoder/decoder function according to the present disclosure, FIG. 3B shows the example of FIG. 3A in more detail where the transformation matrix G converts 20 samples to 200 values (encoding), and the inverse transformation matrix G.sup.−1 converts the 200 values back into 20 samples (decoding), and

[0150] FIG. 3C schematically illustrates an example of the basis functions of the transformation matrix G.

[0151] FIG. 4 shows an embodiment of a hearing device according to the present disclosure, wherein parameters of the encoder/processing/decoder are trained in order to minimize a cost function given by the difference to a regular hearing instrument with linear filter banks and a hearing loss compensation and (optional) noise reduction,

[0152] FIG. 5 shows an example of a hearing device according to the present disclosure comprising an earpiece and a separate (external) audio processing device wherein a low-latency encoder may allow processing in the external audio processing device,

[0153] FIG. 6 shows an example of a hearing device according to the present disclosure comprising a similar functional configuration as in FIG. 5, but wherein only parts of the signal processing are moved to the external audio processing device,

[0154] FIG. 7 shows an example of a binaural hearing system according to the present disclosure wherein the estimated gains may depend on signals from both hearing devices in a binaural hearing aid system,

[0155] FIG. 8 shows an embodiment of a hearing aid according to the present disclosure, and

[0156] FIG. 9 shows an embodiment of a hearing aid according to the present disclosure comprising a BTE-part located behind an ear of the user and an ITE part located in an ear canal of the user in communication with an auxiliary device comprising a user interface for the hearing aid.

[0158] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.

DETAILED DESCRIPTION OF EMBODIMENTS

[0159] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.

[0160] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

[0161] The present application relates to the field of hearing devices. The disclosure relates in particular to such devices configured to have a low delay in the processing of audio signals. [Luo & Mesgarani; 2019] describe a scheme for speaker-independent speech separation using a fully convolutional time-domain audio separation network, a deep neural network (DNN) for end-to-end time-domain speech separation. The DNN uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved through application of a set of weighting functions (masks) to the encoder output. The modified encoder representations are then inverted back to the waveforms using a linear decoder. The masks are found using a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks, which allows the network to model the long-term dependencies of the speech signal while maintaining a small model size.

[0162] FIG. 1 shows a hearing device (HD′), e.g. a hearing aid, configured to process signals in the frequency domain. The time domain signal(s) (I.sub.1, . . . , I.sub.M, M≥1) picked up by the microphone(s) (M.sub.1, . . . , M.sub.M) are converted into the time-frequency domain (signals IF.sub.1, . . . , IF.sub.M) using an analysis filter bank (AFB). In the frequency domain, the signal is modified in order to compensate for a hearing loss of the user (cf. unit HLC, and output signal OF), and possibly also processed in order to enhance speech in a noisy background (e.g. by reducing noise in the input signal(s) (IF.sub.1, . . . , IF.sub.M), cf. block NR, and output signal IFNR). The purpose of the NR block is to reduce the background noise in order to enhance a target signal. The noise is typically attenuated using beamforming and/or by attenuating regions in time and frequency wherein the signal to noise ratio (SNR) is estimated to be poor. The processed signal (OF) is converted to the time domain by a synthesis filter bank (SFB), and the resulting time-domain signal (O) is presented to the user via an output transducer (here a loudspeaker (SPK)).

[0163] In the block diagram of a hearing instrument (HD′) shown in FIG. 1, the microphone signal(s) (I.sub.1, . . . , I.sub.M) are processed in the frequency domain in order to provide a frequency dependent gain (e.g. to provide a hearing loss compensation for the user of the hearing instrument). Frequency domain processing typically requires filtering. The filters (analysis and synthesis filters, AFB, SFB) have a certain length, whereby a delay is introduced in the processing path. As a rule of thumb, a higher frequency resolution requires a longer filter, and thereby a higher delay through the hearing instrument.
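
To make the rule of thumb concrete, the following minimal sketch (assuming a 20 kHz sample rate and counting only the buffering of one analysis window as delay, which understates the full processing-path delay) tabulates frequency resolution against window length:

```python
# Minimal numerical sketch of the resolution/latency trade-off of a uniform
# filter bank. Assumptions: 20 kHz sample rate; only the buffering of one
# analysis window is counted as delay. Values are illustrative only.
fs = 20_000  # sample rate in Hz

for n_fft in (64, 256, 1024):
    resolution_hz = fs / n_fft           # bin spacing of the analysis filter bank
    window_delay_ms = 1000 * n_fft / fs  # delay from buffering one window
    print(f"N={n_fft:4d}: resolution {resolution_hz:6.1f} Hz, delay {window_delay_ms:5.1f} ms")
```

With these illustrative numbers, resolutions fine enough to resolve individual speech harmonics come at the cost of window delays of tens of milliseconds.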

[0164] There is however a limit to how much latency a hearing device can introduce before the processed sound is significantly degraded. Typically, delays exceeding approximately 10 milliseconds (ms) are unacceptable during daily hearing device use.

[0165] FIG. 2 shows an embodiment of a hearing device (HD), e.g. a hearing aid, according to the present disclosure, illustrating the proposed hearing device structure: the analysis and synthesis filter banks (AFB, SFB) of FIG. 1 are replaced with a more generic low-latency encoder/decoder (LL-ENC, LL-DEC). The low-latency encoder (LL-ENC) takes in a few samples at a time, which via the encoder are mapped into a high-dimensional space. The LL-ENC for each microphone may contain the same set of optimized parameters. The input is processed (in the processing unit (PRO)) in the high-dimensional space before it is synthesized back into a time-domain signal by the low-latency decoder (LL-DEC) and presented to the listener by an output transducer (here a loudspeaker (SPK)). The system is optimized jointly in order to process the input optimally under the low-latency constraint (i.e. to apply hearing loss compensation and noise reduction, e.g. provided by the processing unit (PRO)). It is noted, though, that the decoder (LL-DEC) is not required to perfectly reconstruct the time-domain signal.

[0166] The LL decoder (LL-DEC) may be jointly optimized together with the processing unit (as the processing unit will typically alter the input signal). As it rarely happens that the input signal is unaltered by the processing unit, a requirement of perfect reconstruction may be unnecessary (and the parameters of the encoder and the decoder may be utilized in a better way).

[0167] Similarly to an analysis filter bank (AFB in FIG. 1), the low-latency encoder (LL-ENC) maps time domain samples into another domain. However, instead of mapping the samples into a Fourier domain, the time domain samples are mapped into a high-dimensional domain. For example, a time frame consisting of T=20 samples at a sample rate of 20 kHz (i.e. spanning 1 ms) is encoded into a high-dimensional domain, e.g. consisting of N=200 values. This is schematically illustrated in FIGS. 3A and 3B.

[0168] FIGS. 3A and 3B show an example of the function of an encoder/decoder according to the present disclosure. The bottom part of FIGS. 3A and 3B represents the low-dimensional space (here the time domain), whereas the top part represents the high-dimensional space. The left half of the bottom part shows a stream of input audio samples, whereas the right half shows a stream of (processed) output audio samples. A frame (denoted INF in FIGS. 3A and 3B) of time domain samples (cf. left square bracket embracing T (e.g. N1) samples from s(n−T) to s(n) in the input stream of audio samples in the lower part of the figures, n being a time sample index) is encoded into a high-dimensional space. For example, T=20 samples are encoded into a high-dimensional space, e.g. to N2=N=200 values, using the encoding function G(s), cf. the arrow from the square bracket (INF) to ‘G(s)’. The input signal (stream) is processed in this high-dimensional space (cf. ‘Processing’ in the top part of FIG. 3A) before being decoded (using the decoding function G.sup.−1(·)) back to a time domain signal (cf. the arrow from G.sup.−1(·) to the square bracket denoted OUTF in the output stream of time domain samples). As the input frame (INF) is based on only a few samples, the latency between the encoding and decoding is kept at a minimum. The size of the output frame may be similar to the size of the input frame. The frames may overlap in time.
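
The frame-wise operation may be sketched as follows. This is a minimal illustration under several assumptions: a linear encoder with a random (untrained) matrix G, non-overlapping frames, a pass-through in place of the processing, and a pseudo-inverse as decoder; in the disclosure, encoder and decoder are optimized and the frames may overlap:

```python
import numpy as np

fs = 20_000                 # sample rate in Hz; a T-sample frame then spans 1 ms
T, N = 20, 200              # N1 = 20 time samples per frame -> N2 = 200 encoded values

rng = np.random.default_rng(0)
G = rng.standard_normal((N, T))       # encoder matrix (trained in practice, random here)
G_inv = np.linalg.pinv(G)             # decoder; perfect reconstruction is not required

x = rng.standard_normal(fs)           # 1 s of input signal (placeholder for microphone audio)
y = np.zeros_like(x)

for start in range(0, len(x) - T + 1, T):    # non-overlapping frames for simplicity
    s = x[start:start + T]                   # frame of samples s(n-T+1) .. s(n)
    S = G @ s                                # encode: 20 samples -> 200 values
    S_proc = S                               # placeholder for processing in the encoded domain
    y[start:start + T] = G_inv @ S_proc      # decode back to 20 output samples

print(f"algorithmic (framing) latency: {1000 * T / fs:.1f} ms")
```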

[0169] FIG. 3B shows the example of FIG. 3A in more detail, where the transformation matrix G converts N1=20 samples to N2=200 values (encoding), and the inverse transformation matrix G.sup.−1 converts the N2=200 values back into N1=20 samples (decoding). In FIG. 3B, the input and output frames (INF-HD, OUTF-HD) of the high-dimensional spaces are specifically illustrated.

[0170] FIG. 3C schematically illustrates an example of the basis functions of the transformation matrix G. Each basis function may correlate with specific features in the input signal. These may e.g. be speech-specific features such as onsets, pitch, modulation, frequency-specific features or certain waveforms. Typically, the basis functions will be trained on different output signals. The basis functions may e.g. be trained in order to achieve a decoded hearing loss-compensated signal, in order to implement a low-latency hearing loss compensation, as proposed by the present disclosure.

[0171] A transform according to the present disclosure may be different from a Fourier transform in that the transformation matrix (G, related to encoding according to the present disclosure) is an N2×N1 matrix (cf. FIG. 3C), where N2>N1, such that the transformed signal is S=Gs, where G is an N2×N1 matrix, the original (e.g. time domain) signal s is an N1×1 vector, and the transformed signal S is an N2×1 vector. Thereby the inverse transformation matrix G.sup.−1 (related to decoding) may be written as an N1×N2 matrix, such that the inversely transformed signal is s=G.sup.−1 S.

[0172] The encoding/decoding functions may be linear, e.g. G(s) could be an N×T matrix, and the decoding function could be a T×N matrix, where N≥T (T being the number of samples in an input frame). A DFT (Discrete Fourier Transform) matrix is a special case of such an encoding function. The encoding/decoding functions may as well be non-linear, e.g. implemented as a neural network, e.g. as a feed-forward neural network. The neural network may be a deep neural network. Perfect reconstruction (i.e. G.sup.−1G=I, where I is a T×T identity matrix) is not a requirement.

[0173] The encoding step may be written as a matrix multiplication:

z = G(s) = f(sU),

where U is a T×N matrix, and f is an optional non-linear function.

[0174] Similarly, G.sup.−1(z)=h(zW), where W is an N×T matrix, and h is an optional non-linear function.
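
A minimal sketch of these two mappings, with a ReLU assumed for f, the identity for h, and random weights standing in for trained ones:

```python
import numpy as np

T, N = 20, 200                           # frame length and encoded dimension
rng = np.random.default_rng(1)
U = rng.standard_normal((T, N)) * 0.1    # encoder weights (T x N); trained in practice
W = rng.standard_normal((N, T)) * 0.1    # decoder weights (N x T); trained in practice

def f(x):                                # optional non-linearity (ReLU assumed here)
    return np.maximum(x, 0.0)

def h(x):                                # optional decoder non-linearity (identity assumed here)
    return x

s = rng.standard_normal((1, T))          # one input frame as a 1 x T row vector
z = f(s @ U)                             # encoding: z = G(s) = f(sU), shape 1 x N
s_hat = h(z @ W)                         # decoding: G^-1(z) = h(zW), shape 1 x T

# With the ReLU in place, exact reconstruction (s_hat == s) is neither
# guaranteed nor required; training shapes U and W towards the desired output.
```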

[0175] Some examples exist in the literature on the decomposition of speech into a high-dimensional space of basis vectors (i.e. basis functions), see e.g. an illustration of basis function examples in FIG. 5 of [Lewicki & Sejnowski; 2000], or in FIG. 2 of [Bell & Sejnowski; 1996]. This encoding can be trained using independent component analysis or, more generally, by using a neural network (cf. [Luo & Mesgarani; 2019]).

[0176] A main concept of the present disclosure is shown in FIG. 4. FIG. 4 shows an embodiment of a hearing device (HD, excl. the output transducer of FIG. 2), e.g. a hearing aid, according to the present disclosure (bottom part of FIG. 4), wherein parameters of the encoder/processing/decoder are trained in order to minimize a cost function (cf. error L(α, . . . ) in FIG. 4) given by the difference to a regular hearing instrument (HD′, excl. the output transducer of FIG. 1) with linear filter banks (AFB, SFB), a hearing loss compensation (HLC) unit and an (optional) noise reduction (NR) unit (top part of FIG. 4). The error signal L(α, . . . ) is provided by a combination unit (CU), here a subtraction unit (‘+’), subtracting the output (O′) of the prior art hearing aid (HD′) from the output (O) of the hearing aid (HD) according to the present disclosure. The hearing loss compensation (HLC) is a function of the hearing ability of the user (e.g. an audiogram) parameterized by the input α to the HLC-block. The low-latency encoder (LL-ENC) may encode the microphone signals (I.sub.1, . . . , I.sub.M) jointly or separately, depending on how the neural network (NN) (representing the processing unit (PRO) of the embodiment of FIG. 2) is structured.

[0177] It is thus proposed to train the parameters of a low-latency encoder/decoder hearing aid (FIG. 2) according to the present disclosure in order to minimize the difference (‘error’, L(α, . . . ) in FIG. 4) to the output signal (O′) of a regular hearing aid (HD′) with a filter bank (AFB, SFB) operating in the Fourier domain (cf. the combination unit (CU), here performing a subtraction of the (possibly delayed, cf. delay unit z.sup.−D) output of the low-latency encoder/decoder hearing aid from the output of the regular hearing aid comprising a filter bank (AFB, SFB)).

[0178] An advantage of the proposed model is that the latency of the encoder/decoder-based hearing aid (HD) can be kept at a minimum compared to traditional hearing aid (HD′) processing. It may even allow training towards a hearing aid wherein the delay (of the corresponding filter bank-based hearing aid) is higher than what is typically allowed (e.g. >10 ms, e.g. ≥15 ms). E.g., the analysis filter bank (AFB) may have a higher frequency resolution than what is typically allowed in a hearing aid due to latency. Such a higher resolution will e.g. allow attenuation of noise between the harmonic frequencies of a speech signal.

[0179] The delay parameter D (cf. delay element z.sup.−D inserted in the signal path between the low-latency decoder (LL-DEC) and the combination unit (CU)) is used to adjust for the latency difference between the filter bank-based hearing aid and the encoder-based hearing aid (to thereby train towards a hearing aid having a lower latency while exhibiting the benefits of a larger delay (e.g. increased frequency resolution) in the filter bank-based hearing aid). The delay parameter may be substituted with an all-pass filter allowing a frequency-dependent delay. The encoder-based hearing aid (HD) may be trained as one deep neural network, wherein the first layers correspond to the encoder, and the last layers correspond to the decoder. Layers in between correspond to the noise reduction and hearing loss compensation processing. The network may be trained jointly. In an embodiment, the encoder and decoder are trained but may be kept fixed for fine tuning to an individual audiogram (where only the layers in between are trained). The layers corresponding to the low-latency encoder and/or the low-latency decoder may e.g. be implemented as a feed-forward neural network. The layers corresponding to the hearing loss compensation (etc.) may e.g. be implemented as a recurrent neural network.
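
The training scheme may be sketched as below. This is a minimal, frame-based PyTorch illustration under several assumptions not fixed by the disclosure: the layer sizes, the stand-in teacher function (which in a real setup is the filter bank-based processing of FIG. 1), and the omission of the per-sample delay alignment (z.sup.−D):

```python
import torch
import torch.nn as nn

T, N = 20, 200   # frame length (first domain) and encoded dimension (second domain)

class LowLatencyHA(nn.Module):
    """Encoder -> gain estimation -> decoder, trained end to end as one DNN
    (first layers = encoder, last layers = decoder, layers in between = HLC/NR)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(T, N, bias=False)                    # low-latency encoder
        self.mid = nn.Sequential(nn.Linear(N, N), nn.ReLU(),
                                 nn.Linear(N, N), nn.Sigmoid())   # gains in encoded domain
        self.dec = nn.Linear(N, T, bias=False)                    # low-latency decoder

    def forward(self, frames):                 # frames: (batch, T)
        z = self.enc(frames)
        return self.dec(z * self.mid(z))       # multiplicative gains, then decode

def teacher(frames):
    """Stand-in for the filter bank-based hearing aid (AFB -> NR/HLC -> SFB)
    producing the reference output O'. A pass-through placeholder only."""
    return frames

model = LowLatencyHA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    frames = torch.randn(32, T)                # identical audio fed to both systems
    # In practice the low-latency output is delayed by D samples (z^-D) before
    # the subtraction, so the student mimics a higher-latency teacher.
    loss = nn.functional.mse_loss(model(frames), teacher(frames))
    opt.zero_grad(); loss.backward(); opt.step()
```

For fine tuning to an individual audiogram as mentioned above, the encoder and decoder weights could be frozen (e.g. calling requires_grad_(False) on the parameters of enc and dec) so that only the layers in between are retrained.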

[0180] In the exemplary training setup of FIG. 4, the two hearing aid processing schemes (HD′, HD) that are compared each have from 1 to M microphones (M.sub.1, . . . , M.sub.M). M may be one or more, e.g. two or more, such as three or more, etc. In the training situation, identical audio data are fed to the two ‘hearing aids’, e.g. from a database, either by playing identical sound signals to identical microphone configurations (M.sub.1, . . . , M.sub.M) of the two hearing aids, or by feeding received signals I.sub.1, . . . , I.sub.M from one hearing aid to the other, or by feeding electrical versions of the sound signals directly to the analysis filter bank(s) and low-latency encoder(s), respectively. This is indicated by the dashed lines combining the respective input signals I.sub.1, . . . , I.sub.M of the two hearing aids (HD′, HD).

[0181] The main objective of the training is to provide that the low-latency hearing instrument in the lower part of FIG. 4 mimics the performance of the (conventional) hearing aid in the upper part of FIG. 4.

[0182] The gained lower latency may be used to compensate for an additional transmission delay, in case the signals or encoded features are partly or fully processed in an external device. The external device may contain additional microphones, or it may base its calculations on signals from more than one hearing aid, such as a pair of hearing aids mounted on the left and the right ear. Different examples are shown in FIG. 5, FIG. 6, and FIG. 7.

[0183] FIG. 5 shows an example of a hearing device (HD), e.g. a hearing aid, according to the present disclosure comprising an earpiece (EP) adapted for being located at or in an ear of the user and a separate (external) audio processing device (ExD), e.g. adapted for being worn by the user, wherein a low-latency encoder (LL-ENC) may allow processing in the external audio processing device (ExD). The earpiece (EP) of the embodiment of FIG. 5 comprises two microphones (M.sub.1, M.sub.2) for picking up sound at the earpiece (EP) and providing respective electric input signals (I.sub.1, I.sub.2) representing the sound. Input signals (e.g. signals I.sub.1, I.sub.2), or a representation thereof, e.g. a filtered (e.g. beamformed) version thereof, are transmitted from the earpiece (EP) (cf. transmitted signal I.sub.EP) to the external audio processing device (ExD) (cf. received signal I.sub.ExD) via a (wired or wireless) communication link (LNK) provided by transceivers (transmitters (Tx) and receivers (Rx)) of the respective devices (EP, ExD). The receiver (Rx) of the external audio processing device (ExD) provides the input signal (or signals) Ix to a low-latency encoder (or encoders) (LL-ENC) according to the present disclosure. The low-latency encoder (or encoders) (LL-ENC) provides the input signal(s) I.sub.ENC in a high-dimensional space. The input signal(s) I.sub.ENC is(are) fed to the processing unit (PRO, cf. dotted enclosure). The processing unit (PRO) may e.g. comprise a hearing loss compensation algorithm (and/or other audio processing algorithms for enhancing the input signal(s), e.g. performing beamforming and/or other noise reduction). In the embodiment of FIG. 5, the processing unit (PRO) comprises a gain unit (G) for determining appropriate gains G.sub.ENC (e.g. for compensating for a hearing loss of the user, etc.) that are applied to the input signal I.sub.ENC in a combination unit (‘X’), e.g. a multiplication unit. The combination unit (‘X’) (and hereby the processing unit (PRO)) provides the processed signal O.sub.ENC. The processed signal is fed to the low-latency decoder (LL-DEC) providing the processed (time-domain) output signal O.sub.x, which is provided to the transmitter Tx for transmission to the earpiece (EP) via the wireless link (LNK), cf. transmitted signal O.sub.ExD and received signal O.sub.EP. The receiver (Rx) of the earpiece (EP) provides the (time-domain) output signal (O) to the output transducer (here loudspeaker SPK) of the earpiece. The output signal (O) is presented as stimuli perceivable by the user as sound (here as vibrations in air to the user's eardrum).

[0184] The lower processing latency thereby provided (cf. processing unit (PRO) in the dotted enclosure of the external audio processing device (ExD)) may compensate for the transmission delay incurred by the communication link (LNK) between the earpiece (EP) of the hearing instrument and the external audio processing device (ExD). Hereby the hearing instrument (HD) has access to more processing power than with local processing in the earpiece (EP), e.g. to better enable computation-intensive tasks, e.g. related to neural network computations.

[0185] The parameters of the external audio processing device (ExD) of FIG. 5 (and/or the hearing device shown in FIG. 4) can be trained towards a specific hearing loss, and a specific hearing loss compensation strategy (such as NAL-NL2, DSL 5.0, etc.). The latency in the low-latency instrument (HD) can be specified. The latency may e.g. be 1 ms, 5 ms, 8 ms, or less than 10 ms. The parameters may be trained jointly in order to compensate for a hearing loss as well as in order to suppress background noise.

[0186] The encoder (LL-ENC) may be implemented with real-valued weights or alternatively with complex-valued weights.

[0187] The earpiece (EP) and the external audio processing device (ExD) may be connected by an electric cable. The link (LNK) may, however, be a short-range wireless (e.g. audio) communication link, e.g. based on Bluetooth, e.g. Bluetooth Low Energy, or Ultra-Wide Band (UWB) technology.

[0188] In the above description, the earpiece (EP) and the external audio processing device (ExD) are assumed to form part of the hearing device (HD). The external audio processing device (ExD) may be constituted by a dedicated, preferably portable, audio processing device, e.g. specifically configured to carry out (at least) more processing intensive tasks of the hearing device.

[0189] The external audio processing device (ExD) may be a portable communication device, e.g. a smartphone, adapted to carry out processing tasks of the earpiece, e.g. via an application program (APP), while also being dedicated to other tasks that are not directly related to the hearing device functionality.

[0190] The earpiece (EP) may comprise more functionality than shown in the embodiment of FIG. 5.

[0191] The earpiece (EP) may e.g. comprise a forward path that is used in a certain mode of operation, when the external audio processing device (ExD) is not available (or intentionally not used). In such case the earpiece (EP) may perform the normal function of the hearing device.

[0192] The hearing device (HD) may be constituted by a hearing aid (hearing instrument) or a headset.

[0193] FIG. 6 shows an example of a hearing device (HD), e.g. a hearing aid, according to the present disclosure comprising a similar functional configuration as in FIG. 5, but wherein only parts of the signal processing are moved to the external audio processing device (ExD). In the embodiment of FIG. 6, gain estimation (cf. block G) is performed in the external audio processing device (ExD), and the estimated gains (G.sub.ENC) in the high dimensional domain are transmitted to the earpiece (EP) via the wireless link (LNK). The earpiece of FIG. 6 comprises a forward path comprising the (here two) microphones (M.sub.1, M.sub.2), respective low-latency encoders (LL-ENC) providing electric input signal(s) (I.sub.ENC) in the high dimensional domain, a combination unit (‘X’, here a multiplication unit), a low-latency decoder (LL-DEC) and an output transducer (SPK, here a loudspeaker). The estimated gains (G.sub.ENC) received in the earpiece from the external audio processing device (ExD) are applied to the electric input signal(s) (I.sub.ENC) in the high dimensional domain in the combination unit (‘X’) of the earpiece (EP) and the resulting processed signal (O.sub.ENC) is fed to the low-latency decoder (LL-DEC) of the earpiece providing processed (time-domain) output signal (O). The processed output signal (O) is fed to the loudspeaker (SPK) of the earpiece (EP) for presentation to the user as a hearing loss compensated sound signal.

[0194] Compared to the embodiment of FIG. 5, the external audio processing device (ExD) of the embodiment of FIG. 6 does not need an encoder.

[0195] In an embodiment, a hearing device (HD) is provided which is configured to switch between two modes of operation implementing the embodiments of FIG. 5 and FIG. 6, respectively, as different modes (in which case the external audio processing device (ExD) comprises a low-latency decoder (LL-DEC)). Switching between the two modes of operation may be provided automatically in dependence of a current acoustic environment, and/or of a current processing capability (e.g. battery status) of the earpiece (or the external audio processing device (ExD)). Switching between the two modes of operation may be provided via a user interface, e.g. implemented in the external audio processing device (ExD).

[0196] FIG. 7 shows an example of a binaural hearing system according to the present disclosure wherein the estimated gains may depend on signals from both hearing devices in a binaural hearing aid system. In the embodiment of FIG. 7, the binaural hearing system (e.g. a binaural hearing aid system) comprises first and second earpieces (EP1, EP2) and an external audio processing device (ExD). The external audio processing device (ExD) is configured to service each of the first and second earpieces (EP1, EP2). Respective communication links (LNK) between each of the first and second earpieces (EP1, EP2) and the external audio processing device (ExD) may be established via appropriate transceiver circuitry (Rx, Tx) in the three devices. The first and second earpieces (EP1, EP2) of FIG. 7 comprise the same functional elements as shown in, and described in connection with, FIG. 6. In the embodiment of FIG. 7, the external audio processing device (ExD) is, however, configured to determine the estimated gains (G.sub.ENC1, G.sub.ENC2) based on microphone signals from both earpieces (EP1, EP2). Thereby binaural effects can be taken care of in the gain estimation (e.g. to ensure that spatial cues are appropriately maintained at the respective ears of the user, to maintain the user's directional awareness).

[0197] In an embodiment, spatial cues, such as interaural time differences or interaural level differences are part of the cost function in the optimization process. E.g. the interaural time difference between the left and the right target signals and the estimated left and right target signals may be implemented as a term in the cost function. Alternatively, the interaural transfer functions of the clean speech or the noise may be included in the cost function, in order to preserve spatial cues.
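
As an illustration, a cost function augmented with a broadband interaural-level-difference (ILD) term might look as sketched below; the weighting factor alpha, and the restriction to ILD only (omitting an interaural time difference term), are assumptions made here for brevity:

```python
import torch

def binaural_loss(out_l, out_r, tgt_l, tgt_r, alpha=0.1, eps=1e-8):
    """Sketch of a training loss with a spatial-cue term.
    out_*/tgt_*: (batch, samples) left/right outputs and targets.
    alpha weights the interaural-level-difference term (assumed value)."""
    # signal-approximation term for both ears
    mse = ((out_l - tgt_l) ** 2).mean() + ((out_r - tgt_r) ** 2).mean()

    def ild_db(left, right):               # broadband level difference in dB
        return 10 * torch.log10((left.pow(2).mean(dim=-1) + eps) /
                                (right.pow(2).mean(dim=-1) + eps))

    # penalize deviations of the processed ILD from the target ILD
    ild_err = (ild_db(out_l, out_r) - ild_db(tgt_l, tgt_r)).abs().mean()
    return mse + alpha * ild_err
```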

[0198] FIG. 8 shows an embodiment of a hearing aid (HD) according to the present disclosure. The embodiment of FIG. 8 has the same functionality as the embodiment shown in FIG. 2. As in FIG. 2, the hearing aid (HD) comprises M input transducers, here microphones (M.sub.1, . . . , M.sub.M, where M≥1), each providing an electric input signal (I.sub.1, . . . , I.sub.M), which are fed to respective low-latency encoders (here all comprised in unit LL-ENC-NN). In FIG. 8, each of the low-latency encoder (LL-ENC-NN) and the decoder (LL-DEC-NN) is implemented as a neural network (NN), e.g. respective feed-forward neural networks. The processing unit (PRO, solid enclosure), which is configured to compensate for the user's hearing impairment (e.g. by applying a hearing loss compensation algorithm, e.g. based on an audiogram of, and optionally on further data about, the user), is likewise implemented at least partially by a neural network, e.g. a recurrent neural network. In the embodiment of FIG. 8, the neural network of the processing unit (PRO-HLC-NN) receives an input vector comprising (or being extracted from) the encoded input signal(s) (I.sub.ENC). The input vector of the neural network may comprise one or more ‘frames’ of the second, high-dimensional, domain and provide as an output vector (G.sub.ENC) a ‘frame’ of appropriate gain values G.sub.ENC in the second, high-dimensional, domain. The input vector may additionally comprise values of one or more sensors (e.g. a movement sensor) or detectors (e.g. a voice detector, e.g. an own voice detector, etc.). The input vector of the neural network of the processing unit may (for a given time unit) comprise stacked ‘frames’ of encoded versions of the M input signals (I.sub.1, . . . , I.sub.M), or data extracted therefrom. The processing unit (PRO) further comprises a combination unit (‘X’), here a multiplication unit, receiving the estimated gains (G.sub.ENC) from the neural network (PRO-HLC-NN) and the encoded input signal(s) (I.sub.ENC). The combination unit (‘X’) applies the estimated gains (G.sub.ENC) to the encoded signal or signals (I.sub.ENC), whereby the encoded processed output signal (O.sub.ENC) of the processing unit (PRO) is provided and (here) fed to the decoder (LL-DEC-NN) for conversion from the second (high-dimensional) domain to the first (low-dimensional) domain, here the time domain (cf. signal O). The processed (hearing loss compensated) time domain signal is fed to the output transducer, here a loudspeaker, and presented to the user. Other output transducers may be a vibrator of a bone conduction type hearing aid, or a multi-electrode array of a cochlear implant type hearing aid.
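
A minimal sketch of this structure for a single microphone, with feed-forward encoder/decoder networks around a recurrent gain network; the layer sizes and topologies are illustrative assumptions, as the disclosure leaves the exact networks open:

```python
import torch
import torch.nn as nn

T, N = 20, 200   # first-domain frame length and second-domain dimension (as in FIG. 3B)

class EncDecHearingAid(nn.Module):
    """Sketch of the FIG. 8 structure: LL-ENC-NN -> PRO-HLC-NN -> LL-DEC-NN."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(T, N), nn.ReLU())   # LL-ENC-NN (feed-forward)
        self.rnn = nn.GRU(N, N, batch_first=True)              # PRO-HLC-NN (recurrent)
        self.to_gain = nn.Sequential(nn.Linear(N, N), nn.Sigmoid())
        self.dec = nn.Linear(N, T)                             # LL-DEC-NN

    def forward(self, frames):              # frames: (batch, n_frames, T)
        i_enc = self.enc(frames)            # I_ENC: encoded input, (batch, n_frames, N)
        h, _ = self.rnn(i_enc)              # recurrent state carries context across frames
        g_enc = self.to_gain(h)             # G_ENC: gains in the encoded domain
        o_enc = i_enc * g_enc               # combination unit 'X' (multiplication)
        return self.dec(o_enc)              # O: processed frames back in the time domain

ha = EncDecHearingAid()
out = ha(torch.randn(2, 50, T))             # 2 batch items, 50 consecutive 1 ms frames
print(out.shape)                            # torch.Size([2, 50, 20])
```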

[0200] FIG. 9 shows an embodiment of a hearing device (HD), e.g. a hearing aid, according to the present disclosure comprising a BTE-part located behind an ear (Ear (Pinna)) of a user and an ITE-part located in an ear canal of the user, in communication with an auxiliary device (AUX) comprising a user interface (UI) for the hearing device. The auxiliary device (AUX) may comprise an external audio processing device as described in connection with FIG. 5, 6, 7. FIG. 9 illustrates an exemplary hearing aid (HD) formed as a receiver-in-the-ear (RITE) type hearing aid comprising a BTE-part (BTE) adapted for being located at or behind the pinna (Ear (Pinna)) and a part (ITE) comprising an output transducer (e.g. a loudspeaker/receiver) adapted for being located in an ear canal (Ear canal) of the user (e.g. exemplifying a hearing aid (HD) as shown in FIG. 2 or FIG. 8). The BTE-part (BTE) and the ITE-part (ITE) are connected (e.g. electrically connected) by a connecting element (IC). In the embodiment of a hearing aid of FIG. 9, the BTE-part (BTE) comprises two input transducers (here microphones) (M.sub.1, M.sub.2), each for providing an electric input audio signal representative of an input sound signal from the environment (in the scenario of FIG. 9 including sound source S). The hearing aid of FIG. 9 further comprises two wireless receivers or transceivers (WLR.sub.1, WLR.sub.2) for providing respective directly received auxiliary audio and/or information/control signals (and optionally for transmitting such signals to other devices). The hearing aid (HD) comprises a substrate (SUB) whereon a number of electronic components are mounted, functionally partitioned according to the application in question (analogue, digital, passive components, etc.), including a signal processor (DSP), a front-end chip (FE) mainly containing analogue circuitry and interfaces between analogue and digital processing, and a memory unit (MEM), coupled to each other and to input and output units via electrical conductors Wx. The mentioned functional units (as well as other components) may be partitioned in circuits and components according to the application in question (e.g. with a view to size, power consumption, analogue vs digital processing, radio communication, etc.), e.g. integrated in one or more integrated circuits, or as a combination of one or more integrated circuits and one or more separate electronic components (e.g. inductor, capacitor, etc.). The signal processor (DSP) provides an enhanced audio signal (cf. signal O in FIG. 2, or FIG. 6-8), which is intended to be presented to a user. In the embodiment of a hearing aid device in FIG. 9, the ITE-part (ITE) comprises an output unit in the form of a loudspeaker (receiver) (SPK) for converting the electric signal (O) to an acoustic signal (providing, or contributing to, acoustic signal S.sub.ED at the ear drum (Ear drum)). The ITE-part may further comprise an input unit comprising one or more input transducers (e.g. microphones). In FIG. 9, the ITE-part comprises a microphone (M.sub.ITE) located at an entrance to the ear canal of the user. The ITE-microphone (M.sub.ITE) is configured to provide an electric input audio signal representative of an input sound signal from the environment at or in the ear canal (i.e. including any acoustic modifications of the input signal due to the pinna, reflecting the acoustic characteristics of the pinna). In another embodiment, the hearing aid may further comprise an input unit (e.g. a microphone or a vibration sensor) located elsewhere than at the entrance of the ear canal (e.g. facing the eardrum), in combination with one or more input units located in the BTE-part and/or the ITE-part. The ITE-part further comprises a guiding element, e.g. a dome (DO) (or an open or closed mould), for guiding and positioning the ITE-part in the ear canal of the user.

[0201] The hearing aid (HD) exemplified in FIG. 9 is a portable device and further comprises a battery (BAT) for energizing electronic components of the BTE- and ITE-parts.

[0202] The hearing aid (HD) may comprise a directional microphone system adapted to enhance a target acoustic source relative to a multitude of acoustic sources in the local environment of the user wearing the hearing aid device (e.g. based on the electric input signals from two or more of the microphones (M.sub.1, M.sub.2, M.sub.ITE)). The memory unit (MEM) may comprise predefined (or adaptively determined) complex, frequency dependent constants defining predefined (or adaptively determined) beam patterns, etc.

[0203] The memory (MEM) may e.g. comprise data related to the user, e.g. preferred settings.

[0204] The hearing device of FIG. 9 may constitute or form part of a hearing aid and/or a binaural hearing system according to the present disclosure.

[0205] The hearing aid (HD) according to the present disclosure may comprise a user interface (UI), e.g. as shown in the lower left part of FIG. 9, implemented in an auxiliary device (AUX), e.g. a remote control, e.g. implemented as an APP in a smartphone or other portable (or stationary) electronic device, e.g. a separate audio processing device as described above in connection with FIG. 5-7. In the embodiment of FIG. 9, the screen of the user interface (UI) illustrates a Latency Configuration APP. The screen ‘Select configuration of hearing aid system’ allows a user to decide how the processing according to the present disclosure is configured. The user may indicate whether a monaural (single hearing aid) system or a binaural system comprising left and right hearing aids is currently relevant. The user may further, for a monaural system, indicate whether the hearing aid is located at the left or the right ear. The user (U) may further indicate whether an external audio processing device (ExD) should be used or not (cf. embodiments as described in connection with FIG. 5, 6, 7). In the shown example, a monaural system using only a hearing device at the left ear of the user (U) is selected (cf. solid tick boxes (■) at ‘Monaural system’ and ‘Left’). It is further selected that an external audio processing device communicating (via wireless link (LNK)) with the left hearing aid (HD.sub.l), e.g. an earpiece, should be used (cf. solid tick box (■) at ‘Ext. processing device?’). The auxiliary device (AUX (ExD)) and the hearing aid are adapted to allow communication of data representative of the currently selected configuration via a, e.g. wireless, communication link (cf. dashed arrow LNK in FIG. 9). The communication link (WL2) between the hearing device (HD) and the auxiliary device (AUX (ExD)) may e.g. be based on far field communication, e.g. Bluetooth or Bluetooth Low Energy (or similar technology, e.g. UWB), implemented by appropriate antenna and transceiver circuitry in the hearing aid (HD) and the auxiliary device (AUX), indicated by transceiver unit WLR.sub.2 in the hearing aid. The transceiver in the hearing aid indicated by WLR.sub.1 may be for establishing an interaural link, e.g. for exchanging audio signals (or parts thereof), and/or control or information parameters, between the left and right hearing aids (HD.sub.l, HD.sub.r) of a binaural hearing aid system. The interaural link may e.g. be implemented as an inductive link or as the communication link (WL2).

[0206] The auxiliary device may e.g. be constituted by or comprise the external audio processing device (ExD).

[0207] Other aspects related to the control of the hearing aid (e.g. the beamformer, the volume setting, specific hearing aid programs for a given listening situation, etc.) may be made selectable or configurable from the user interface (UI). The user interface may e.g. be configured to allow a user to decide on specific modes of operation of the latency setup, cf. e.g. as discussed in connection with FIG. 6.

[0208] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.

[0209] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.

[0210] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.

[0211] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.

REFERENCES

[0212] [Luo & Mesgarani; 2019] Yi Luo, Nima Mesgarani, “Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(8), 1256-1266 (2019).

[0213] [Lewicki & Sejnowski; 2000] Michael S. Lewicki, Terrence J. Sejnowski, “Learning Overcomplete Representations”, Neural Computation, 12, 337-365, Massachusetts Institute of Technology (2000).

[0214] [Bell & Sejnowski; 1996] Anthony J. Bell, Terrence J. Sejnowski, “Learning the higher-order structure of a natural sound”, Network: Computation in Neural Systems, 7, 261-266, IOP Publishing Ltd (1996).