Hearing device component, hearing device, computer-readable medium and method for processing an audio-signal for a hearing device

11558699 · 2023-01-17

Abstract

A hearing device component (6) comprises a sensor-unit (8) for receiving an audio-signal (AS), a separation device (9) for separating part-signals (PS.sub.i) from the audio-signal (AS), a classification device (10) for classifying the part-signals (PS.sub.i) separated from the audio-signal (AS), and a modulation device (11) for modulating the part-signals (PS.sub.i), wherein the classification device (10) is communicatively coupled to the modulation device (11) and wherein the modulation device (11) is designed to enable a concurrent modulation of different part-signals (PS.sub.i) with different modulation-functions depending on their classification.

Claims

1. A hearing device component comprising: a sensor unit for receiving an audio-signal (AS); a separation device for separating a plurality of source-specific part-signals (PSi) from the audio-signal (AS); a classification device for classifying each part-signal of the plurality of part-signals (PSi) separated from the audio-signal (AS); and a modulation device for modulating each part-signal of the plurality of part-signals (PSi), wherein the classification device is communicatively coupled to the modulation device and wherein the modulation device is configured to enable a concurrent modulation of each part-signal of the plurality of part-signals (PSi) with a source-specific modulation-function that is based on a classification of the respective part-signal by the classification device.

2. The hearing device component according to claim 1, wherein the modulation device comprises a dataset of modulation-functions, which are associated with outputs from the classification device.

3. The hearing device component according to claim 1, wherein the classification device comprises a deep neural network.

4. The hearing device component according to claim 1, wherein the hearing device component comprises an interface to receive inputs from an external control unit.

5. The hearing device component according to claim 1, wherein the hearing device further comprises a receiver to provide a combination of the modulated part-signals (PSi) to a user.

6. A non-transitory computer-readable medium storing instructions, which when executed by a processor, cause a hearing device to perform a method, the method comprising: providing an audio-signal (AS), separating a plurality of source-specific part-signals (PSi) from the audio-signal (AS), associating a classification parameter with the separated part-signals (PSi), applying a modulation-function to each part-signal (PSi), wherein the modulation-function for any given part-signal (PSi) is dependent on the classification parameter associated with the respective part-signal (PSi), wherein several part-signals (PSi) can be modulated with source-specific modulation-functions concurrently based on the classification parameter associated with the respective part-signal (PSi), providing the modulated part-signals (PSi) to a receiver.

7. The non-transitory computer-readable medium according to claim 6, wherein the classification and the modulation are executed in parallel.

8. The non-transitory computer-readable medium according to claim 6, wherein at least three part-signals (PSi) are classified and modulated concurrently.

9. The non-transitory computer-readable medium according to claim 6, wherein the modulation-functions are dynamically adapted.

10. The non-transitory computer-readable medium according to claim 6, wherein for each of the part-signals (PSi) separated from the audio-signal (AS) the classification parameter is derived at each time-frequency bin.

11. The non-transitory computer-readable medium according to claim 6, wherein the separation and/or the classification comprises the estimation of power spectral densities (PSD) and/or signal-to-noise ratios (SNR) and/or the processing of a deep neural network (DNN).

12. The non-transitory computer-readable medium according to claim 6, wherein two or more part-signals (PSi) are modulated together by applying the same modulation-function to each of them.

13. A method for processing an audio-signal (AS) for a hearing device comprising the following steps: providing an audio-signal (AS), separating a plurality of source-specific part-signals (PSi) from the audio-signal (AS), associating a classification parameter with the separated part-signals (PSi), applying a modulation-function to each part-signal (PSi), wherein the modulation-function for any given part-signal (PSi) is dependent on the classification parameter associated with the respective part-signal (PSi), wherein several part-signals (PSi) can be modulated with source-specific modulation-functions concurrently based on the classification parameter associated with the respective part-signal (PSi), providing the modulated part-signals (PSi) to a receiver.

14. The method according to claim 13, wherein at least two of the processing steps selected from the separation step, the classification step and the modulation step are executed in parallel.

15. The method according to claim 13, wherein at least three part-signals (PSi) are classified and modulated concurrently.

16. The method according to claim 13, wherein the modulation-functions are dynamically adapted.

17. The method according to claim 13, wherein for each of the part-signals (PSi) separated from the audio-signal (AS) the classification parameter is derived at each time-frequency bin.

18. The method according to claim 13, wherein the separation and/or the classification comprises the estimation of power spectral densities (PSD) and/or signal-to-noise ratios (SNR) and/or the processing of a deep neural network (DNN).

19. The method according to claim 13, wherein two or more part-signals (PSi) are modulated together by applying the same modulation-function to each of them.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) Further details and benefits of the present inventive technology follow from the description of various embodiments with the help of the figures.

(2) FIG. 1A illustrates an exemplary spectrogram of an audio-signal in accordance with some implementations of the inventive technology.

(3) FIG. 1B shows the same spectrogram as FIG. 1A as a simplified black-and-white line drawing in accordance with some implementations of the inventive technology.

(4) FIG. 2 shows an embodiment of a hearing device with a separation and classification device followed by different gain models in accordance with some implementations of the inventive technology.

(5) FIG. 3 shows three exemplary different gain models for three different types of audio-sources in accordance with some implementations of the inventive technology.

(6) FIG. 4 illustrates a variant of a hearing device according to FIG. 2 with a frequency-domain source separation and an individual gain model for each source category, with information exchange, in accordance with some implementations of the inventive technology.

(7) FIG. 5 illustrates yet another variant of a hearing device with a microphone array input and a two-stage separation algorithm in accordance with some implementations of the inventive technology.

(8) FIG. 6 illustrates yet another variant of a hearing device with an interface to an external control unit in accordance with some implementations of the inventive technology.

(9) FIG. 7 illustrates in a highly schematic way a flow diagram of a method for processing audio-signals in accordance with some implementations of the inventive technology.

DETAILED DESCRIPTION

(10) Physical sound sources create different types of audio events. These can in turn be categorized. It is for example possible to identify events such as a slamming door, the wind going through the leaves of a tree, birds singing, someone speaking, traffic noise or other types of audio events. Such different types can also be referred to as categories or classes. Depending on the context, some types of audio events can be interesting, in particular relevant, at any given time, while others can be neglected because they are not relevant in that context.

(11) For people with hearing loss, decoding such events becomes difficult. The use of a hearing aid can help. It has been recognized that the usefulness of a hearing aid, in particular the user experience of such a hearing aid, can be improved by selectively modulating sound signals from specific sources or specific categories whilst reducing others. In addition, it can be desirable that a user can individually decide which types of audio events are enhanced and which types are suppressed.

(12) For that purpose, a system is needed that can analyze an acoustic scene, separate source- or category-specific part-signals from an audio-signal and modulate the different part-signals in a source-specific manner.

(13) Preferably the system can process the incoming audio stream in real time or at least with a short latency. The latency between the actual sound event and the provision of the corresponding modulated signal is preferably at most 30 milliseconds, in particular at most 20 milliseconds, in particular at most 10 milliseconds. The latency can in particular be as low as 6 ms or even less.
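
As a quick plausibility check of these numbers, block-based processing incurs an algorithmic latency of at least one frame plus any look-ahead. The sample rate and frame sizes below are hypothetical values chosen for illustration, not figures from this description.

```python
# Rough latency arithmetic with illustrative (assumed) parameters.
sample_rate_hz = 32000
frame_len = 128    # hypothetical analysis frame in samples
look_ahead = 64    # hypothetical look-ahead in samples
latency_ms = 1000.0 * (frame_len + look_ahead) / sample_rate_hz
print(latency_ms)  # 6.0 ms, consistent with the 6 ms figure above
```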

(14) Preferably, part-signals from separate audio sources, which can be separated from a complex audio-signal, can be processed simultaneously, in particular in parallel. After the source-specific modulation of at least some of the different types of audio events, they can be combined again and provided to a loudspeaker, in particular an earphone, commonly referred to as a receiver.

(15) It has been further recognized that it can be advantageous, in particular that it can enhance the user experience, if specific, different profiles, referred to as modulation-functions, such as gain models, are applied simultaneously to different identified sources.

(16) It is in particular proposed to combine tasks such as source separation from an audio-signal, classification of the separated sources and application of source-specific gain models to the classified source signals. In other words, the modulation-function, in particular the gain model, used to modulate a part-signal of the audio-signal, which part-signal is associated with a certain type or category of audio events, for example a certain source, depends on the classification of the respective part-signal.
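
To make the combination of these three tasks concrete, the following is a minimal, runnable sketch of the proposed signal flow, not the patented implementation: the separation, classification and gain-model components are toy stand-ins, and all names are hypothetical.

```python
# Hedged sketch of the separate -> classify -> modulate -> recombine flow.
import numpy as np

def process(audio_signal, separate, classify, gain_models):
    """separate: AS -> list of part-signals PS_i; classify: PS_i -> label;
    gain_models: dict mapping label -> modulation-function (callable)."""
    part_signals = separate(audio_signal)
    # Each part-signal is modulated by the gain model its classification selects.
    modulated = [gain_models[classify(ps)](ps) for ps in part_signals]
    return np.sum(modulated, axis=0)  # recombined output signal OS

# Toy usage with stand-in components:
separate = lambda x: [0.7 * x, 0.3 * x]  # dummy two-way split
classify = lambda ps: "speech" if np.std(ps) > 0.5 else "noise"
gain_models = {"speech": lambda ps: 2.0 * ps, "noise": lambda ps: 0.5 * ps}
output_signal = process(np.random.randn(16000), separate, classify, gain_models)
```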

(17) In order to separate and/or classify part-signals PS.sub.i from an audio-signal AS, one can analyze the audio-signal in the time-frequency domain.
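
As an illustration of such a time-frequency analysis, the short sketch below computes a short-time Fourier transform (STFT); the frame length, hop size and window are hypothetical choices, not values from this description.

```python
# Minimal STFT sketch for time-frequency analysis of an audio-signal AS.
import numpy as np

def stft(audio, frame_len=256, hop=128):
    """Return a complex time-frequency representation, shape (time, frequency)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# A spectrogram such as FIG. 1A is the log magnitude of this matrix:
# spectrogram_db = 20 * np.log10(np.abs(stft(audio_signal)) + 1e-12)
```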

(18) In FIG. 1A a spectrogram of an exemplary audio-signal is shown. FIG. 1B shows the same spectrogram as FIG. 1A as a simplified black-and-white line drawing. Different types of source-signals can be distinguished by their different frequency components. For illustrative purposes, contributions of speech events 1, traffic noise 2 and public transport noise 3, as well as background noise 4, are highlighted in the spectrograms in FIG. 1A and FIG. 1B.

(19) In FIG. 3 three different types of exemplary gain models (gain G vs. input I) for three different types of sources, namely speech 1, impulsive sounds 31 and background noise 4 (BGN), are shown. In this example, speech 1 is emphasized, background noise 4 is reduced and impulsive sounds 31 are amplified only up to a set limit for their output level.

(20) Further gain models are known from the prior art.

(21) To provide more examples of suitable gain models, the following observations are useful; an illustrative sketch of such gain curves follows the list:

(22) a. In quiet speech with a light noise background and potentially some impulsive events, such as a slamming door or rattling cutlery, the stationary background noise can be ignored, while impulsive events should be just slightly amplified and the speech-signals should be enhanced. A training set of different impulsive events can help to define and/or derive a suitable gain model for impulsive sounds.
b. In noisy situations, the background noise should be reduced in order to achieve either a target signal-to-noise ratio or a target audibility level. However, the background noise should not be removed completely. Such a gain model for background noise keeps the noise audible for comfort but keeps it below the target speech.
c. In traffic noise, it is important that cars passing by and audio notifications, such as traffic-light warnings or signal-horns, stay audible for the safety of the user. A gain model for warning sounds should be designed with safety in mind. The detection of such sounds should, however, balance comfort (a low false-positive rate) against safety (a low false-negative rate).
d. For music signals, different gain models can be applied for tonal instruments with sustained sounds, such as string instruments and/or wind instruments, and for percussive instruments with more transient sounds. Such gain models can be derived by adapting the gain model for speech and the gain model for impulsive sounds, respectively.
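
The sketch below translates these observations into three input-level-dependent gain curves in the spirit of FIG. 3. All thresholds, slopes and levels are invented for illustration; no concrete values are specified in this description.

```python
# Hypothetical gain models (gain in dB as a function of input level in dB).
import numpy as np

def gain_speech(level_db):
    # Emphasize speech; soft speech receives a few dB more gain than loud speech.
    return np.clip(12.0 - 0.15 * (level_db + 60.0), 3.0, 12.0)

def gain_background_noise(level_db, target_speech_db=-25.0, margin_db=10.0):
    # Keep noise audible for comfort, but attenuate whatever exceeds a level
    # `margin_db` below the target speech.
    excess = level_db - (target_speech_db - margin_db)
    return -np.clip(excess, 0.0, 20.0)

def gain_impulsive(level_db, output_ceiling_db=-20.0):
    # Slight amplification, but only up to a set output level (a limiter).
    return np.minimum(3.0, output_ceiling_db - level_db)

print(gain_speech(-40.0), gain_background_noise(-30.0), gain_impulsive(-50.0))
```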

(23) FIG. 2 shows in a highly schematic fashion the components of a hearing device 5. The hearing device 5 comprises a hearing device component 6 and a receiver 7.

(24) The hearing device component 6 can also be part of a cochlear device, in particular a cochlear implant.

(25) The hearing device component 6 serves to process an incoming audio-signal AS.

(26) The receiver 7 serves to provide a combination of modulated part-signals PS.sub.i to a user. The receiver 7 can comprise one or more loudspeakers, in particular miniature loudspeakers, in particular earphones, in particular of the so-called in-ear-type.

(27) The hearing device component 6 comprises a sensor unit 8. The sensor unit 8 can comprise one or more sensors, in particular microphones. It can also comprise different types of sensors.

(28) The hearing device component 6 further comprises a separation device 9 and a classification device 10. The separation device 9 and the classification device 10 can be incorporated into a single, common separation-classification device for separating and classifying part-signals PS.sub.i from the audio-signal AS.

(29) Further, the hearing device component 6 comprises a modulation device 11 for modulating the part-signals PS.sub.i separated from the audio-signal AS. The modulation device 11 is designed such that several part-signals PS.sub.i can be modulated simultaneously. Herein, different part-signals PS.sub.i can be modulated by different modulation-functions, depicted as gain models GM.sub.i. GM.sub.1 can for example represent a gain model for speech. GM.sub.2 can for example represent a gain model for impulsive sounds. And GM.sub.3 can for example represent a gain model for background noise.

(30) The modulated part-signals PS.sub.i can be recombined by a synthesizing device 12 to form an output signal OS. The output signal OS can then be transmitted to the receiver 7. For that, a specific transmitting device (not shown in FIG. 2) can be used.

(31) If the hearing device component 6 is embodied as a physically separate component from the receiver 7, the transmission of the output signal OS to the receiver can be wireless. For that, a Bluetooth, modified Bluetooth, 3G, 4G or 5G signal transmission can be used.

(32) If the hearing device component 6 or at least some parts of the same, in particular the synthesizing device 12, is incorporated into a part of the hearing device 5 worn by the user on the head, in particular close to the ear, the output signal OS can be transmitted to the receiver 7 by a physical signal line, such as wires.

(33) The processing can be executed fully internally in the parts of the hearing device worn by the user on the head, fully externally by a separate device, for example a mobile phone, or in a distributed manner, partly internally and partly externally.

(34) The sensor unit 8 serves to acquire the input signal for the hearing device 5. In general, the sensor unit 8 is designed for receiving the audio-signal AS. It can also receive a pre-processed, in particular an externally pre-processed, version of the audio-signal AS. The actual acquisition of the audio-signal AS can be executed by a further component, in particular by one or more separate devices.

(35) The separation device 9 is designed to separate one or more part-signals PS.sub.i (i=1 . . . n) from the incoming audio-signal AS. In general, the part-signals PS.sub.i form audio streams.

(36) The separated part-signals PS.sub.i each correspond to a predefined category of signal. Which category the different part-signals PS.sub.i correspond to is determined by the classification device 10.

(37) Depending on the classification of the different part-signals PS.sub.i the gain model associated with the respective classification is used to modulate the respective part-signal PS.sub.i.

(38) FIG. 2 only shows one exemplary variant of the components of the hearing device 5 and the signal flow therein. It mainly serves illustrative purposes. Details of the system can vary, for instance, whether the gain models GM.sub.i are independent from one stream to the other.

(39) In FIG. 4 a variant of the hearing device 5 is shown, again in a highly schematic way. Same elements are denoted by the same reference numerals as in FIG. 2.

(40) In the hearing device 5 according to FIG. 4 the audio-signal AS received by the sensor unit 8 is transformed by a transformation device 13 from the time domain T to the frequency domain F. In the frequency domain F a mask-based source separation algorithm is used. Herein, different masks 14.sub.i can be used to separate different part-signals PS.sub.i from the audio-signal AS. The different masks 14.sub.i are further used as inputs to the different gain models GM.sub.i. By that, they can help the gain models GM.sub.i to take into account meaningful information such as masking effects.
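
A minimal sketch of this mask-based variant follows; the array shapes and the convention that each gain model also receives its mask as side information are assumptions made for illustration.

```python
# Hedged sketch: apply per-source masks 14_i in the frequency domain and
# feed each mask to the corresponding gain model as additional input.
import numpy as np

def apply_masks(spec, masks, gain_models):
    """spec: complex STFT of AS, shape (T, F); masks: list of (T, F) arrays
    in [0, 1]; gain_models: callables (part_spec, mask) -> modulated spec."""
    out = np.zeros_like(spec)
    for mask, gain_model in zip(masks, gain_models):
        part_spec = mask * spec             # separated part-signal PS_i
        out += gain_model(part_spec, mask)  # mask informs the gain model
    return out  # recombined; an inverse STFT then yields the output signal OS
```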

(41) According to a variant (not shown in the figure) the computed masks 14.sub.i can be shared with all the gain models GM in all of the streams of the different part-signals PS.sub.i.

(42) After the modulated part-signals PS.sub.i have been recombined, the output signal OS can be determined by a back-transformation of the signal from the frequency domain F to the time domain T, by the transformation device 19.

(43) According to a further variant, which is not shown in the figures, the separation and classification of the part-signals PS.sub.i can be implemented with a deep neural network DNN. Hereby, temporal memory, spectral consistency and other structures, which can be learned from a database, can be taken into account. In particular, the masks 14.sub.i can be learned independently, with one DNN per source category.

(44) A single DNN could also be used to derive masks 14.sub.i which sum to 1, hence learning to predict the posterior probabilities of the different categories given the input audio-signal AS.
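
The "masks sum to 1" construction can be realized with a softmax over per-category outputs, as in the following sketch; the network itself is omitted and the logit shapes are assumed.

```python
# Hedged sketch: turn per-category DNN outputs into masks that sum to 1,
# i.e. posterior probabilities of each category per time-frequency bin.
import numpy as np

def masks_from_logits(logits):
    """logits: array of shape (n_categories, T, F) produced by a single DNN."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # numerically stable
    return e / e.sum(axis=0, keepdims=True)  # masks[k] ~ P(category k | AS)
```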

(45) In general, any source separation technique can be used for separating the part-signals PS.sub.i from the audio-signal AS. In particular, classical techniques consisting of estimating power spectral densities (PSD) and/or signal-to-noise ratios (SNR) to then derive time-frequency masks (TF-masks) and/or gains can be used in this context.
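
One textbook instance of such a classical technique is a Wiener-style gain derived from an estimated noise PSD; the sketch below assumes, purely for illustration, that the first frames of the signal are noise-only.

```python
# Hedged sketch: estimate a noise PSD, derive a per-bin SNR, map to a TF gain.
import numpy as np

def wiener_tf_gain(spec, noise_frames=10, floor=0.1):
    """spec: complex STFT, shape (T, F). Assumes the first `noise_frames`
    frames contain noise only (a common textbook simplification)."""
    power = np.abs(spec) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)             # PSD estimate
    snr = np.maximum(power / (noise_psd + 1e-12) - 1.0, 0.0)  # SNR estimate
    gain = snr / (snr + 1.0)                                  # Wiener gain
    return np.maximum(gain, floor)  # gain floor keeps some noise audible
```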

(46) FIG. 5 shows a further variant of the hearing device 5. Similar components bear the same reference numerals as in the preceding variants.

(47) In this variant the sensor unit 8 comprises a microphone array with three microphones. A different number of microphones is possible. It is further possible to include external, physically separated microphones in the sensor unit 8. Such microphones can be positioned at a distance of, for example, more than 1 m from the other microphones. This can help to use physical cues for separating different sound sources. It helps in particular to use beamformer technologies to separate the part-signals PS.sub.i from the audio-signal AS.

(48) Further, the separation and classification device is embodied as a two-stage source separation module 15. The source separation module 15, as shown in an exemplary fashion, comprises a first separation stage as the separation device 9. The separation in that stage is based mostly or exclusively on physical cues, such as a spatial beam or independent component analysis. It further comprises a second stage as the classification device 10. The second stage focuses on classifying the resulting beams and recombining them into source types.
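
For the first, physically motivated stage, a delay-and-sum beamformer is one simple option; the sketch below assumes STFT-domain inputs and known per-microphone steering delays, neither of which is specified here.

```python
# Hedged sketch of a delay-and-sum beamformer for one look direction.
import numpy as np

def delay_and_sum(mic_specs, steering_delays_s, freqs_hz):
    """mic_specs: (n_mics, T, F) complex STFTs of the microphone signals;
    steering_delays_s: (n_mics,) delays in seconds for one look direction;
    freqs_hz: (F,) center frequencies of the STFT bins."""
    delays = np.asarray(steering_delays_s)[:, None]
    phases = np.exp(2j * np.pi * freqs_hz[None, :] * delays)  # (n_mics, F)
    aligned = mic_specs * phases[:, None, :]  # compensate propagation delays
    return aligned.mean(axis=0)               # one beam signal, shape (T, F)
```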

(49) The two stages can take advantage of one another. They can be reciprocally connected in an information-transmitting manner.

(50) The first stage can for example be modeled by a linear and calibrated system.

(51) The second stage can be executed via a trained machine, in particular a deep neural network.

(52) Alternatively, the first stage, or both the first and the second stage together, can be replaced by a data-driven system such as a trained DNN.

(53) As shown in FIG. 6, it has been recognized that it can be advantageous to provide the hearing device 5, in particular the hearing device component 6, with an interface 17 to an external control unit 16.

(54) The control unit 16 enables interaction with an external input 18, for example from the user or an external agent. The interface 17 can also enable inputs from further sensor units, in particular units with non-auditory sensors.

(55) Via the interface 17 it is in particular possible to provide the hearing device component 6 with inputs about the environment.

(56) The external input 18 can for example comprise general scene classification results. Such data can be provided by a smart device, for example a mobile phone.

(57) Such an interface 17 for external inputs is advantageous for each of the variants described above.

(58) It can further be advantageous to provide the hearing device component 6 with an interface for user inputs. In particular, a user could use a graphical user interface (GUI) in order to adjust the balance between background noise, impulsive sounds and speech. For that, the user can set the combination gains and/or directly modify the modulation-functions, in particular the individual gain-model parameters.
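
The following sketch shows one way such user-set combination gains could scale the modulated part-signals before recombination; the category names and dB values are hypothetical GUI settings, not prescribed ones.

```python
# Hedged sketch: per-category combination gains set by the user via a GUI.
import numpy as np

user_mix_db = {"speech": +3.0, "impulsive": 0.0, "background": -6.0}

def recombine(modulated_parts, labels, mix_db=user_mix_db):
    """modulated_parts: list of arrays; labels: category label per part."""
    to_linear = lambda db: 10.0 ** (db / 20.0)
    return sum(to_linear(mix_db[lab]) * ps
               for lab, ps in zip(labels, modulated_parts))
```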

(59) FIG. 7 shows in a schematic way a diagram of a method for processing the audio-signal AS of the hearing device 5. The audio-signal AS is provided in a provision step 21.

(60) In a separation step 22, at least one, in particular several, part-signals PS.sub.i (i=1 . . . n) are separated from the audio-signal AS.

(61) In a classification step 23 the part-signals PS.sub.i are classified into different categories. For that, a classification parameter is associated with the separated part-signals PS.sub.i.

(62) In a modulation step 24 a modulation-function is applied to each part-signal PS.sub.i. Herein the modulation-function for any given part-signal is dependent on the classification parameter associated with the respective part-signal PS.sub.i.

(63) According to an aspect several part-signals PS.sub.i can be modulated with different modulation-functions concurrently.

(64) In a recombination step 25 the modulated part-signals PS.sub.i are recombined to the output signal OS.

(65) In a transmission step 26 the output signal OS is provided to the receiver 7.

(66) Details of the different processing steps follow from the previous description.

(67) The algorithms for the separation step 22 and/or the classification step 23 and/or the dataset of the modulation-functions for modulating the part-signals PS.sub.i can be stored on a computer-readable medium. Such a computer-readable medium can be read by a processing unit of a hearing device component 6 according to the previous description. It is in particular possible to provide the details of the processing of the audio-signal AS to a computing unit by the computer-readable medium. The computing or processing unit can herein be embodied as an external processing unit or can be built into the hearing device 5.

(68) The computer-readable medium or the instructions and/or data stored thereon may be exchangeable. Alternatively, the computer-readable medium can be non-transitory and stored in the hearing device and/or in an external device such as a mobile phone.

(69) In the following, some aspects, which can be advantageous irrespective of the other details of the embodiment of the hearing device 5, are summarized in keywords:

(70) The separation of the part-signals PS.sub.i and/or their classification can be done in the time domain, in the frequency domain or in the time-frequency domain. It can in particular involve only classical methods of digital signal processing, such as masking and/or filtering.

(71) The separation and/or the classification of the part-signals PS.sub.i from the audio-signal AS can also be done with the help of one or more DNNs.

(72) The hearing device 5 can comprise a control unit 16 for interaction with the user or an external agent. It can in particular comprise an interface 17 to receive external inputs.

(73) At the input stage, the hearing device 5 can in particular comprise a sensor array. The sensor array preferably comprises one, two or more microphones. It can further comprise one, two or more further sensors, in particular for receiving non-auditory inputs.

(74) The number of part-signals PS.sub.i separated from the audio-signal AS at any given time stamp can be fixed. Preferably, this number is variable.

(75) At any given time stamp several different modulation-functions, in particular gain models, can be used simultaneously to modulate the separated part-signals PS.sub.i.

(76) Whereas it will usually suffice to modulate each part-signal PS.sub.i by a single modulation-function depending on its classification, it can be advantageous to modulate one and the same part-signal PS.sub.i with different modulation-functions. Such modulation with different modulation-functions can be done in parallel, in particular simultaneously. Such processing can be advantageous, for example, if the classification of the part-signal PS.sub.i is not certain to at least a predefined degree. For example, it might be difficult to decide whether a given part-signal PS.sub.i is correctly classified as human speech or vocal music. If a part-signal PS.sub.i is to be modulated by different modulation-functions, it is preferably first duplicated. After the modulation, the two or more modulated signals can be combined into a single modulated part-signal, for example by calculating some kind of weighted average.
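
A minimal sketch of this duplicate-and-blend approach follows; weighting the copies by the classifier's class probabilities is an assumption, since the exact weighting is left open above.

```python
# Hedged sketch: modulate copies of an uncertainly classified part-signal
# with several gain models and blend the results by a weighted average.
import numpy as np

def blend_modulation(part_signal, gain_models, probs):
    """gain_models: list of callables; probs: matching class probabilities."""
    copies = [gm(part_signal.copy()) for gm in gain_models]  # duplicate first
    weights = np.asarray(probs, dtype=float)
    weights /= weights.sum()
    return sum(w * c for w, c in zip(weights, copies))

# e.g. blend_modulation(ps, [speech_gain_fn, music_gain_fn], probs=[0.6, 0.4])
```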

(77) The use of different modulation-functions, in particular separate gain models for different types of part-signals PS.sub.i, can lead to improvements in the efficiency of the processing of the audio-signal AS. In particular, it makes the global design of the gain models easier.

(78) A further advantage of the proposed system is that it allows one to define very flexibly how to deal with different types of source-signals, in particular also with respect to interferers, such as noise. Furthermore, the classification-type source separation also allows one to define different target sources, such as speech, music, multi-talker situations, etc.