LOW LATENCY HEARING AID
20220394397 · 2022-12-08
Abstract
A hearing aid comprises at least one input unit for providing at least one stream of samples of an electric input signal in a first domain; at least one encoder configured to convert said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; a processing unit configured to process said at least one electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; a decoder configured to convert said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain. The at least one encoder is configured to convert a first number of samples from said at least one stream of samples of the electric input signal in the first domain to a second number of samples in said at least one stream of samples of the electric input signal in the second domain. The decoder is configured to convert said second number of samples from said stream of samples of the processed signal in the second domain to said first number of samples in said stream of samples of the electric input signal in the first domain. The second number of samples is larger than the first number of samples. The at least one encoder is trained, and at least a part of said processing unit providing said compensation for the user's hearing impairment is implemented as a trained neural network. A method of operating a hearing aid is further disclosed. Thereby an improved hearing aid may be provided.
Claims
1. A hearing aid configured to be worn by a user, the hearing aid comprising at least one input unit for providing at least one stream of samples of an electric input signal in a first domain, said at least one electric input signal representing sound in an environment of the hearing aid; at least one encoder configured to convert said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; a processing unit configured to process said at least one electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; a decoder configured to convert said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain; wherein said at least one encoder is configured to convert a first number (N1) of samples from said at least one stream of samples of the electric input signal in the first domain to a second number (N2) of samples in said at least one stream of samples of the electric input signal in the second domain, and said decoder is configured to convert said second number (N2) of samples from said stream of samples of the processed signal in the second domain to said first number (N1) of samples in said stream of samples of the electric input signal in the first domain, and wherein the second number (N2) of samples is larger than the first number (N1) of samples, and wherein said at least one encoder is optimized and wherein at least a part of said processing unit providing said compensation for the user's hearing impairment is implemented as a trained neural network.
2. A hearing aid according to claim 1 wherein the first domain is the time domain.
3. A hearing aid according to claim 1 wherein the encoder and/or the decoder is/are implemented as a neural network.
4. A hearing aid according to claim 1 wherein the at least one encoder and the processing unit are configured to be optimized jointly in order to process the at least one electric input signal optimally under a low-latency constraint.
5. A hearing aid according to claim 4 wherein the at least one encoder and the processing unit are configured to be optimized jointly in that they are optimized in a common training procedure with a single cost function.
6. A hearing aid according to claim 4 wherein said low-latency constraint comprises a restriction to the processing time through the hearing device.
7. A hearing aid according to claim 6 wherein said low-latency constraint is related to the processing time through the encoder, the processing unit and the decoder.
8. A hearing aid according to claim 1 wherein parameters of the at least one encoder, the processing unit, and optionally the decoder are trained in order to minimize a cost function given by the difference to a hearing aid comprising linear filter banks instead of said at least one encoder and said decoder.
9. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the neural network include one or more of the weight-, bias-, and non-linear function-parameters of the neural network.
10. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the encoder and/or decoder include one or more of the first and second number of samples.
11. A hearing aid according to claim 8 wherein said parameters of the at least one encoder, the processing unit, and optionally the decoder that participate in the optimization may for the encoder include weights of the encoding matrix G.
12. A hearing aid according to claim 1 wherein a transformation matrix (G) of said encoder is an N2×N1 matrix, where N2>N1, such that a transformed signal is S=Gs, where G is a N2×N1 matrix, the input signal s of the first domain is a N1×1 vector, and the transformed signal S of the second domain is a N2×1 vector.
13. A hearing aid according to claim 1 comprising an output unit for providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain.
14. A hearing aid according to claim 1 comprising at least one earpiece configured to be worn at or in an ear of the user; and a separate audio processing device, wherein said earpiece and said separate audio processing device are configured to allow an exchange of audio signals or parameters derived therefrom between each other.
15. A hearing aid according to claim 14, wherein said earpiece comprises said at least one input unit; and an output unit for providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain.
16. A hearing aid according to claim 14 wherein said separate audio processing device comprises said processing unit.
17. A hearing aid according to claim 14 wherein said separate audio processing device comprises said encoder and/or said decoder.
18. A hearing aid according to claim 14 wherein said earpiece comprises said or an encoder and/or said decoder.
19. A method of operating a hearing aid configured to be worn by a user, the method comprising providing at least one stream of samples of an electric input signal in a first domain, said at least one electric input signal representing sound in an environment of the hearing aid; converting said at least one stream of samples of the electric input signal in the first domain to at least one stream of samples of the electric input signal in a second domain; processing said at least one electric input signal in the second domain to provide a compensation for the user's hearing impairment, and providing a processed signal as a stream of samples in the second domain; converting said stream of samples of the processed signal in the second domain to a stream of samples of the processed signal in the first domain; providing stimuli perceivable as sound to the user based on said stream of samples of the processed signal in the first domain, converting a first number (N1) of samples from said at least one stream of samples of the electric input signal in the first domain to a second number (N2) of samples in said at least one stream of samples of the electric input signal in the second domain, and converting said second number (N2) of samples from said stream of samples of the processed signal in the second domain to said first number (N1) of samples in said stream of samples of the electric input signal in the first domain, and wherein the second number (N2) of samples is larger than the first number (N1) of samples, and wherein said converting of samples in the first domain to samples in the second domain is optimized and wherein said compensation for the user's hearing impairment is provided by a trained neural network.
20. A method of optimizing parameters of an encoder-/decoder-based hearing aid in order to minimize a difference between an output signal of a target encoder-/decoder-encoder-based hearing aid and an output signal of a filter bank-based hearing aid, the encoder-/decoder-encoder-based hearing aid comprising a forward path comprising, an encoder configured to convert a stream of samples of an electric input signal in a first domain to a stream of samples of the electric input signal in a second domain; a processing unit configured to process said at least one electric input signal in the second domain, to provide a compensation for the user's hearing impairment, and to provide a processed signal as a stream of samples in the second domain; a decoder configured to convert said stream of samples of the processed signal in the second domain to a first stream of samples of the processed signal in the first domain; the filter bank-based hearing aid comprising a forward path comprising a filter bank operating in the Fourier domain, the filter bank comprising an analysis filter bank for converting said stream of samples of the electric input signal in the first domain to a signal in the Fourier domain; and a processing unit connected to the analysis filter bank and the synthesis filter bank and configured to process said signal in the Fourier domain to compensate for the user's hearing impairment and to provide a processed signal in the Fourier domain; a synthesis filter bank for converting said processed signal in the Fourier domain to a second stream of samples of the processed signal in the first domain; the method comprising providing said stream of samples of an electric input signal in a first domain, said at least one electric input signal representing sound in an environment of the target encoder-/decoder-encoder-based hearing aid and/or the filter bank-based hearing aid; minimizing a cost function given by the difference between said first and second stream of samples of the processed signal in the first domain to thereby optimize said parameters of the encoder-/decoder-based hearing aid.
21. A method according to claim 20 wherein said parameters comprise one or more of weight-, bias-, and non-linear function-parameters of a neural network, and one or more of the first and second number of samples.
22. A method of training according to claim 20 comprising providing a separate delay (D) in the forward path of the encoder-/decoder-based hearing aid in addition to the processing delay of the encoder, the processing unit and the decoder, wherein a delay parameter (D) is used to adjust for an intended latency difference between the target hearing aid and the encoder-based hearing aid.
23. A method according to claim 20 wherein the encoder, the processing unit, and the decoder of the low-latency encoder-based hearing aid are trained as one deep neural network, wherein the first, input, layers of the deep neural network correspond to the encoder, and the last, output, layers correspond to the decoder, and the layers in between correspond to the hearing loss compensation processing.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0146] The aspects of the disclosure may be best understood from the following detailed description taken in conjunction with the accompanying figures. The figures are schematic and simplified for clarity, and they just show details to improve the understanding of the claims, while other details are left out. Throughout, the same reference numerals are used for identical or corresponding parts. The individual features of each aspect may each be combined with any or all features of the other aspects. These and other aspects, features and/or technical effects will be apparent from and elucidated with reference to the illustrations described hereinafter.
[0157] The figures are schematic and simplified for clarity, and they just show details which are essential to the understanding of the disclosure, while other details are left out. Throughout, the same reference signs are used for identical or corresponding parts.
[0158] Further scope of applicability of the present disclosure will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the disclosure, are given by way of illustration only. Other embodiments may become apparent to those skilled in the art from the following detailed description.
DETAILED DESCRIPTION OF EMBODIMENTS
[0159] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. Several aspects of the apparatus and methods are described by various blocks, functional units, modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as “elements”). Depending upon particular application, design constraints or other reasons, these elements may be implemented using electronic hardware, computer program, or any combination thereof.
[0160] The electronic hardware may include micro-electronic-mechanical systems (MEMS), integrated circuits (e.g. application specific), microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic, discrete hardware circuits, printed circuit boards (PCB) (e.g. flexible PCBs), and other suitable hardware configured to perform the various functionality described throughout this disclosure, e.g. sensors, e.g. for sensing and/or registering physical properties of the environment, the device, the user, etc. Computer program shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
[0161] The present application relates to the field of hearing devices. The disclosure relates in particular to such devices configured to have a low delay in the processing of audio signals. [Luo & Mesgarani; 2019] describe a scheme for speaker-independent speech separation using a fully convolutional time-domain audio separation network in a deep learning framework (DNN) for end-to-end time-domain speech separation. The DNN uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved through application of a set of weighting functions (masks) to the encoder output. The modified encoder representations are then inverted back to the waveforms using a linear decoder. The masks are found using a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks, which allows the network to model the long-term dependencies of the speech signal while maintaining a small model size.
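The encoder, mask and decoder chain summarized above can be illustrated with a minimal NumPy sketch. It is not the network of [Luo & Mesgarani; 2019]: the function name `separate`, the random encoder basis `U`, the pseudo-inverse decoder `W` and the random masks (normalized here to sum to one across sources) are all assumptions made for illustration, standing in for trained quantities and for the mask-estimating TCN.

```python
import numpy as np

def separate(frame, U, masks, W):
    """Mask-based separation of one frame (illustrative only).

    frame: (T,) time-domain samples
    U:     (T, N) linear encoder basis
    masks: (C, N) one weighting function per source
    W:     (N, T) linear decoder basis
    """
    z = frame @ U          # encoder representation, shape (N,)
    masked = masks * z     # one masked representation per source, shape (C, N)
    return masked @ W      # one waveform frame per source, shape (C, T)

rng = np.random.default_rng(0)
T, N, C = 16, 64, 2                  # assumed frame length, basis size, source count
U = rng.standard_normal((T, N))      # random stand-in for a trained encoder
W = np.linalg.pinv(U)                # stand-in decoder, chosen so that U @ W = I
frame = rng.standard_normal(T)

# Random placeholder masks; a real system would estimate them with a network.
logits = rng.standard_normal((C, N))
masks = np.exp(logits) / np.exp(logits).sum(axis=0)

sources = separate(frame, U, masks, W)
```

Because the placeholder masks sum to one in every encoder channel and the stand-in decoder inverts the encoder, the separated frames add up to the input frame in this sketch; a trained system need not have that property.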
[0163] In the block diagram of a hearing instrument (HD′) shown in
[0164] There is, however, a limit to how much latency a hearing device can introduce before the processed sound is significantly degraded. Typically, delays exceeding approximately 10 milliseconds (ms) are unacceptable during daily hearing device use.
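To give a feel for the numbers, the following sketch computes only the buffering delay of block-based processing, i.e. the time needed to collect one frame before a transform can be applied, together with the bin spacing a DFT of that length would have. The 20 kHz sample rate and the frame lengths are assumed values, not taken from the disclosure, and processing time, overlap handling and converter delays are ignored.

```python
def frame_latency_ms(frame_len, sample_rate_hz):
    """Time in ms needed to collect frame_len samples at the given rate."""
    return 1000.0 * frame_len / sample_rate_hz

def bin_spacing_hz(frame_len, sample_rate_hz):
    """Frequency spacing in Hz of a DFT computed over frame_len samples."""
    return sample_rate_hz / frame_len

fs = 20_000  # assumed sample rate in Hz
table = [(n, frame_latency_ms(n, fs), bin_spacing_hz(n, fs)) for n in (16, 64, 128, 256)]
for n, latency, spacing in table:
    print(f"{n:4d} samples: {latency:5.1f} ms buffering, {spacing:7.1f} Hz bin spacing")
```

Under these assumptions, a 256-sample frame alone already exceeds 10 ms, whereas a 16-sample frame stays below 1 ms but gives a much coarser frequency grid.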
[0166] The LL decoder (LL-DEC) may be jointly optimized together with the processing unit (as the processing unit will typically alter the input signal). Because the processing unit rarely leaves the input signal unaltered, a requirement of perfect reconstruction may be unnecessary, and the parameters of the encoder and the decoder may instead be put to better use.
[0167] Similarly to an analysis filter bank (AFB in
[0171] A transform according to the present disclosure may be different from a Fourier transform in that the transformation matrix (G, related to encoding according to the present disclosure) is an N2×N1 matrix (cf.
[0172] The encoding/decoding functions may be linear, e.g. G(s) could be an N×T matrix, and the decoding function could be a T×N matrix, where N≥T (T being the number of samples in an input frame). A DFT (Discrete Fourier Transform) matrix is a special case of such an encoding function. The encoding/decoding functions may as well be non-linear, e.g. implemented as a neural network, e.g. as a feed-forward neural network. The neural network may be a deep neural network. Perfect reconstruction (i.e. GG^{-1}=I, where I is a T×T identity matrix) is not a requirement.
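The linear case can be sketched in a few lines of NumPy, assuming a column-vector convention (S = G s), a random N×T matrix as a stand-in for a trained encoder, and the Moore-Penrose pseudo-inverse as one possible decoder. The sizes T = 8 and N = 32 are arbitrary. The last lines check the statement that a DFT matrix is the special case N = T.

```python
import numpy as np

T, N = 8, 32                        # assumed frame length and second-domain size (N >= T)
rng = np.random.default_rng(1)

G = rng.standard_normal((N, T))     # N×T encoding matrix (random stand-in)
G_dec = np.linalg.pinv(G)           # T×N decoding matrix; one possible choice

s = rng.standard_normal(T)          # one input frame of T samples
S = G @ s                           # N samples in the second domain
s_hat = G_dec @ S                   # back to T samples in the first domain

# Special case N = T: the DFT matrix acts as the encoding matrix.
F = np.fft.fft(np.eye(T))           # T×T DFT matrix
S_dft = F @ s
```

With the pseudo-inverse decoder this particular pair happens to reconstruct the frame exactly (G_dec @ G is the T×T identity). That is a property of the chosen stand-in, not something the text requires.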
[0173] The encoding step may be written as a matrix multiplication:
z=G(s)=f(sU),
where U is a T×N matrix, and f is an optional non-linear function.
[0174] Similarly, G^{-1}(z)=h(zW), where W is an N×T matrix, and h is an optional non-linear function.
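A minimal sketch of the two expressions z=f(sU) and h(zW) follows, assuming that s is a row vector of T samples, that U is a random stand-in for trained weights, and that a rectifier serves as an example non-linearity f. The helper names `encode` and `decode` are hypothetical.

```python
import numpy as np

def encode(s, U, f=None):
    """z = f(sU): s holds T samples, U is T×N, f is an optional non-linearity."""
    z = s @ U
    return z if f is None else f(z)

def decode(z, W, h=None):
    """h(zW): z holds N samples, W is N×T, h is an optional non-linearity."""
    x = z @ W
    return x if h is None else h(x)

rng = np.random.default_rng(2)
T, N = 8, 32                                   # assumed sizes
U = rng.standard_normal((T, N)) / np.sqrt(T)   # random stand-in encoder weights
W = np.linalg.pinv(U)                          # linear inverse of U, for comparison only
s = rng.standard_normal(T)

relu = lambda v: np.maximum(v, 0.0)            # example choice of f
s_lin = decode(encode(s, U), W)                # purely linear chain
s_relu = decode(encode(s, U, f=relu), W)       # non-linear encoder, same decoder

err_lin = float(np.max(np.abs(s_lin - s)))
err_relu = float(np.max(np.abs(s_relu - s)))
```

In this sketch the purely linear chain returns the input frame, while the rectified encoder combined with the same linear decoder does not. This is one way to see why the decoder weights would be optimized together with the encoder rather than derived as a matrix inverse.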
[0175] Some examples exist in literature on the decomposition of speech into a high-dimensional space of basis vectors (i.e. basis functions), see e.g. an illustration of basis function examples in
[0176] A main concept of the present disclosure is shown in
[0177] It is thus proposed to train the parameters in a low-latency encoder/decoder hearing aid (
[0178] An advantage of the proposed model is that the latency of the encoder/decoder-based hearing aid (HD) can be kept at a minimum compared to traditional hearing aid (HD′) processing. It may even allow training towards a hearing aid wherein the delay (of the corresponding filter bank-based hearing aid) is higher than what is typically allowed (e.g. >10 ms, e.g. ≥15 ms). E.g., the analysis filter bank (AFB) may have a higher frequency resolution than what is typically allowed in a hearing aid due to latency. Such a higher resolution will e.g. allow attenuation of noise between the harmonic frequencies of a speech signal.
[0179] The delay parameter D (cf. delay element z^{-D} inserted in the signal path between the low latency decoder (LL-DEC) and the combination unit (CU)) is used to adjust for the latency difference between the filter bank-based hearing aid and the encoder-based hearing aid (to thereby train towards a hearing aid having a lower latency while exhibiting the benefits of a larger delay (e.g. increased frequency resolution) in the filter bank-based hearing aid). The delay parameter may be substituted with an all-pass filter allowing a frequency-dependent delay. The encoder-based hearing aid (HD) may be trained as one deep neural network, wherein the first layers correspond to the encoder, and the last layers correspond to the decoder. Layers in-between correspond to the noise reduction and hearing loss compensation processing. The network may be trained jointly. In an embodiment the encoder and decoder are trained but may be kept fixed for fine tuning to an individual audiogram (where only the layers in-between are trained). The layers corresponding to the low-latency encoder and/or the low-latency decoder may e.g. be implemented as a feed forward neural network. The layers corresponding to the hearing loss compensation (etc.) may e.g. be implemented as a recurrent neural network.
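The role of the delay element z^{-D} in the training cost can be sketched as follows. The example assumes an integer delay in samples and a mean squared error as the difference measure. The two output signals are synthetic: both are delayed copies of the same random signal, standing in for the outputs of the low-latency path and the filter bank-based path. The helper names `delayed` and `cost` are hypothetical.

```python
import numpy as np

def delayed(x, D):
    """Apply z^{-D}: delay x by D >= 0 samples, zero-padding at the start."""
    if D == 0:
        return x.copy()
    return np.concatenate([np.zeros(D), x[:-D]])

def cost(y_ll, y_fb, D):
    """Mean squared difference between the delayed low-latency output
    and the filter bank-based output."""
    return float(np.mean((delayed(y_ll, D) - y_fb) ** 2))

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)

y_ll = delayed(x, 4)     # assumed low-latency path: 4 samples of delay
y_fb = delayed(x, 16)    # assumed filter bank path: 16 samples of delay

aligned = cost(y_ll, y_fb, D=12)     # D compensates the 12-sample latency difference
misaligned = cost(y_ll, y_fb, D=0)   # no compensation
```

With D equal to the latency difference the cost vanishes for these synthetic signals; without it, the two outputs are compared at different time instants and the cost is dominated by the misalignment rather than by any processing difference.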
[0180] In the exemplary training setup of
[0181] The main objective of the training is to provide that the low-latency hearing instrument in the lower part of
[0182] The gained lower latency may be used to compensate for an additional transmission delay in case the signals or encoded features are partly or fully processed in an external device. The external device may contain additional microphones, or it may base its calculations on signals from more than one hearing aid, such as a pair of hearing aids mounted on the left and the right ear. Different examples are shown in
[0184] The thereby provided lower latency of processing (cf. processing unit PRO in dotted enclosure of the external audio processing device (ExD)) may compensate for the transmission delay incurred by the communication link (LNK) between the earpiece (EP) of the hearing instrument and the external audio processing device (ExD). Hereby the hearing instrument (HD) has access to more processing power compared to local processing in the earpiece (EP), e.g. to better enable computation intensive tasks, e.g. related to neural network computations.
[0185] The parameters of the external audio processing device (ExD) of
[0186] The encoder (LL-ENC) may be implemented with real-valued weights or alternatively with complex-valued weights.
[0187] The earpiece (EP) and the external audio processing device (ExD) may be connected by an electric cable. The link (LNK) may, however, be a short-range wireless (e.g. audio) communication link, e.g. based on Bluetooth, e.g. Bluetooth Low Energy, or Ultra-Wide Band (UWB) technology.
[0188] In the above description, the earpiece (EP) and the external audio processing device (ExD) are assumed to form part of the hearing device (HD). The external audio processing device (ExD) may be constituted by a dedicated, preferably portable, audio processing device, e.g. specifically configured to carry out (at least) more processing intensive tasks of the hearing device.
[0189] The external audio processing device (ExD) may be a portable communication device, e.g. a smartphone, adapted to carry out processing tasks of the earpiece, e.g. via an application program (APP), but also dedicated to other tasks that are not directly related to the hearing device functionality.
[0190] The earpiece (EP) may comprise more functionality than shown in the embodiment of
[0191] The earpiece (EP) may e.g. comprise a forward path that is used in a certain mode of operation, when the external audio processing device (ExD) is not available (or intentionally not used). In such case the earpiece (EP) may perform the normal function of the hearing device.
[0192] The hearing device (HD) may be constituted by a hearing aid (hearing instrument) or a headset.
[0194] Compared to the embodiment of
[0195] In an embodiment, a hearing device (HD) is provided which is configured to switch between two modes of operation implementing the embodiments of
[0197] In an embodiment, spatial cues, such as interaural time differences or interaural level differences are part of the cost function in the optimization process. E.g. the interaural time difference between the left and the right target signals and the estimated left and right target signals may be implemented as a term in the cost function. Alternatively, the interaural transfer functions of the clean speech or the noise may be included in the cost function, in order to preserve spatial cues.
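One way such a spatial-cue term could look is sketched below. The estimators are assumptions made for illustration: the interaural level difference is taken as a broadband energy ratio in dB, and the interaural time difference as the lag of the cross-correlation peak in samples. The function names and the weights `w_itd` and `w_ild` are hypothetical. The peak-picking step is not differentiable, so a gradient-based training procedure would need a smooth surrogate.

```python
import numpy as np

def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (left relative to right)."""
    return 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))

def itd_samples(left, right):
    """Lag in samples of the cross-correlation peak; positive if left lags right."""
    xc = np.correlate(left, right, mode="full")
    return int(np.argmax(xc)) - (len(right) - 1)

def spatial_cue_cost(est_l, est_r, tgt_l, tgt_r, w_itd=1.0, w_ild=1.0):
    """Weighted absolute ITD and ILD mismatch between estimated and target pairs."""
    d_itd = itd_samples(est_l, est_r) - itd_samples(tgt_l, tgt_r)
    d_ild = ild_db(est_l, est_r) - ild_db(tgt_l, tgt_r)
    return w_itd * abs(d_itd) + w_ild * abs(d_ild)

rng = np.random.default_rng(4)
tgt_r = rng.standard_normal(256)
# Synthetic target: left ear receives the signal 3 samples later and at half amplitude.
tgt_l = 0.5 * np.concatenate([np.zeros(3), tgt_r[:-3]])

same = spatial_cue_cost(tgt_l, tgt_r, tgt_l, tgt_r)      # cues preserved
swapped = spatial_cue_cost(tgt_r, tgt_l, tgt_l, tgt_r)   # ears swapped
```

In a full cost function, a term of this kind would be added to the signal-difference term, so that an output with the right spectrum but distorted interaural cues is penalized.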
[0201] The hearing aid (HD) exemplified in
[0202] The hearing aid (HD) may comprise a directional microphone system adapted to enhance a target acoustic source relative to a multitude of acoustic sources in the local environment of the user wearing the hearing aid (e.g. based on the electric input signals from two or more of the microphones (M_1, M_2, M_ITE)). The memory unit (MEM) may comprise predefined (or adaptively determined) complex, frequency-dependent constants defining predefined (or adaptively determined) beam patterns, etc.
[0203] The memory (MEM) may e.g. comprise data related to the user, e.g. preferred settings.
[0204] The hearing aid of
[0205] The hearing aid (HD) according to the present disclosure may comprise a user interface UI, e.g. as shown in the lower left part of
[0206] The auxiliary device may e.g. be constituted by or comprise the external audio processing device (ExD).
[0207] Other aspects related to the control of the hearing aid (e.g. the beamformer, the volume setting, specific hearing aid programs for a given listening situation, etc.) may be made selectable or configurable from the user interface (UI). The user interface may e.g. be configured to allow a user to decide on specific modes of operation of the latency setup, cf. e.g. as discussed in connection with
[0208] It is intended that the structural features of the devices described above, either in the detailed description and/or in the claims, may be combined with steps of the method, when appropriately substituted by a corresponding process.
[0209] As used, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well (i.e. to have the meaning “at least one”), unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, but an intervening element may also be present, unless expressly stated otherwise. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
[0210] It should be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” or “an aspect” or features included as “may” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
[0211] The claims are not intended to be limited to the aspects shown herein but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more.
REFERENCES
[0212] [Luo & Mesgarani; 2019] Yi Luo, Nima Mesgarani, “Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(8), 1256-1266 (2019).
[0213] [Lewicki & Sejnowski; 2000] Michael S. Lewicki, Terrence J. Sejnowski, “Learning Overcomplete Representations”, Neural Computation, 12, 337-365, Massachusetts Institute of Technology (2000).
[0214] [Bell & Sejnowski; 1996] Anthony J. Bell, Terrence J. Sejnowski, “Learning the higher-order structure of a natural sound”, Network: Computation in Neural Systems, 7, 261-266, IOP Publishing Ltd (1996).