HEARING DEVICE ARRANGEMENT AND METHOD FOR AUDIO SIGNAL PROCESSING
20230262400 · 2023-08-17
Inventors
- Markus Hofbauer (Hombrechtikon, CH)
- Andreas Breitenmoser (Wetzikon, CH)
- Claudio Santelli (Stäfa, CH)
- Paul Wagner (Portland, OR, US)
- Sebastian Kroedel (Erlenbach ZH, CH)
- Manon Barbier (Soultzmatt, FR)
- Manuela Feilner (Egg, CH)
Cpc classification
H04R2460/03
ELECTRICITY
H04R25/554
ELECTRICITY
International classification
Abstract
A hearing device arrangement includes two hearing devices which are connected to each other in a data transmitting manner. Each hearing device includes an audio input unit for obtaining an input audio signal, a processing unit for audio signal processing of the input audio signal to obtain an output audio signal, a neural network which, when executed by the processing unit, performs a processing step of the audio signal processing, and an audio output unit for outputting the output audio signal. The hearing device arrangement is configured to transmit neural network data of the neural network of at least one of the hearing devices to the respective other hearing device to be used in the audio signal processing by the processing unit of the respective other hearing device.
Claims
1. Hearing device arrangement, comprising two hearing devices (L, R) which are connected to each other in a data transmitting manner, each hearing device (L, R) comprising an audio input unit (3L, 3R) for obtaining an input audio signal (IL, IR), a processing unit (4L, 4R) for audio signal processing of the input audio signal (IL, IR) to obtain an output audio signal (OL, OR), a neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) which, when executed by the processing unit (4L, 4R), performs a processing step of the audio signal processing, and an audio output unit (7L, 7R) for outputting the output audio signal (OL, OR), wherein the hearing device arrangement is configured to transmit neural network data (ND; NO; NP; NOR, NOL; M; ML, MR) of the neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) of at least one of the hearing devices (L, R) to the respective other hearing device (L, R) to be used in the audio signal processing by the processing unit (4L, 4R) of the respective other hearing device (L, R).
2. Hearing device arrangement according to claim 1, wherein the neural networks (805L, 805R; 1005L, 1005R; 1105L, 1105R) of the respective hearing devices (L, R) are configured to perform different processing steps of the audio signal processing.
3. Hearing device arrangement according to claim 1, wherein the hearing device arrangement is further configured to use the neural network data (NP; NO) of the neural network (205L, 205R; 705L, 705R; 1005L; 1105L) of at least one of the hearing devices (L, R) as a neural network input for the neural network (205L, 205R; 705L, 705R; 1005R; 1105R) of the respective other hearing device (L, R).
4. Hearing device arrangement according to claim 1, wherein the hearing device arrangement is further configured to use a neural network output (NO) of the neural network data (NP; NO) of the neural network (205L, 205R; 705L, 705R; 1005L; 1105L) of at least one of the hearing devices (L, R) as a neural network input for the neural network (205L, 205R; 705L, 705R; 1005R; 1105R) of the respective other hearing device (L, R).
5. Hearing device arrangement according to claim 1, wherein the hearing device arrangement is further configured to assign a work cycle (W1, W2, W3) to each of the neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R), wherein the work cycles (W1, W2, W3) govern the execution of the neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) by the respective processing units (4L, 4R).
6. Hearing device arrangement according to claim 5, wherein the work cycles (W1, W2, W3) of neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) of different of the two hearing devices (L, R) differ.
7. Hearing device arrangement according to claim 5, wherein the work cycles (W1, W2, W3) of neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) of different of the two hearing devices (L, R) alternate.
8. Hearing device arrangement according to claim 5, wherein the hearing device arrangement is further configured to determine the work cycles (W1, W2, W3) based on internal states and/or external states of the hearing device arrangement.
9. Hearing device arrangement according to claim 1, wherein the hearing device arrangement is further configured to transmit features (FL, FR) obtained from the input audio signal (IL, IR) and/or sensor data (eL, eR) of at least one of the hearing devices (L, R) to the respective other hearing device (L, R) and to use the transferred features (FL, FR) as part of a neural network input for the neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) of the respective other hearing device (L, R).
10. Hearing device arrangement according to claim 1, wherein the hearing devices (L, R) belong to different users.
11. Method for audio signal processing, comprising the steps of providing two hearing devices (L, R) which are connected to each other in a data transmitting manner, each hearing device (L, R) comprising an audio input unit (3L, 3R) for obtaining an input audio signal (IL, IR), a processing unit (4L, 4R) for audio signal processing of the input audio signal (IL, IR) to obtain an output audio signal (OL, OR), a neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) which, when executed by the processing unit (4L, 4R), performs a processing step of the audio signal processing, and an audio output unit (7L, 7R) for outputting the output audio signal (OL, OR), obtaining respective input audio signals (IL, IR) using the audio input units (3L, 3R) of the hearing devices (L, R), processing the input audio signals (IL, IR) to obtain respective output audio signals (OL, OR) using the processing units (4L, 4R) of the hearing devices (L, R), wherein the processing unit (4L, 4R) of at least one of the hearing devices (L, R) executes the respective neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) to perform a processing step of the audio signal processing, neural network data (ND; NO; NP; NOR, NOL; M; ML, MR) of the executed neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) is transmitted to the respective other hearing device (L, R), and the transmitted neural network data (ND; NO; NP; NOR, NOL; M; ML, MR) is used in the audio signal processing by the processing unit (4L, 4R) of the respective other hearing device (L, R), outputting the output audio signals (OL, OR) by the respective audio output units (7L, 7R) of the hearing devices (L, R).
12. Method according to claim 11, wherein the neural networks (805L, 805R; 1005L, 1005R; 1105L, 1105R) of different of the hearing devices (L, R) are configured to perform different processing steps of the audio signal processing.
13. Method according to claim 11, wherein the transmitted neural network data (NP; NO) is used as a neural network input for the neural network (205L, 205R; 705L, 705R; 1005R; 1105R) of the other hearing device (L, R).
14. Method according to claim 11, wherein a transmitted neural network output (NO) of the transmitted neural network data (NP; NO) is used as a neural network input for the neural network (205L, 205R; 705L, 705R; 1005R; 1105R) of the other hearing device (L, R).
15. Method according to claim 11, wherein a work cycle (W1, W2, W3) is assigned to each of the neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) of the hearing devices (L, R) and the neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) are executed by the respective processing units (4L, 4R) in accordance with the respective work cycle (W1, W2, W3).
16. Method according to claim 15, wherein the work cycles (W1, W2, W3) of different neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) of different hearing devices (L, R) differ.
17. Method according to claim 15, wherein the work cycles (W1, W2, W3) of different neural networks (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R) of different hearing devices (L, R) alternate.
18. Method according to claim 15, wherein the work cycles (W1, W2, W3) are determined based on internal states and/or external states of the hearing device arrangement.
19. Method according to claim 11, wherein features (FL, FR) obtained from the input audio signal (IL, IR) and/or sensor data (eL, eR) of at least one of the hearing devices (L, R) is provided to the respective other hearing device (L, R) and used as part of a neural network input for the neural network (5L, 5R; 105L, 105R; 205L, 205R; 305L, 305R; 605L, 605R; 705L, 705R; 805L, 805R; 905L, 905R; 1005L, 1005R; 1105L, 1105R) of the respective other hearing device (L, R).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0073]
[0074]
[0075]
[0076]
[0077]
[0078]
[0079]
[0080]
[0081]
[0082]
[0083]
[0084]
[0085]
[0086]
[0087]
[0088]
DETAILED DESCRIPTION
[0089]
[0090] The hearing device system 2 optionally comprises a peripheral device P. The peripheral device P is in the form of a mobile device, in particular a smartphone. The shown configuration of the hearing device system 2 is purely exemplary. Other hearing device systems may comprise no peripheral device P or may comprise two or more peripheral devices. Further, it is possible that the hearing device system comprises only one hearing device, for example a hearing device to be worn in or at one of the ears of the hearing device system user.
[0091] In further, not explicitly shown embodiments, the hearing devices may belong to different users. In particular, a hearing device arrangement may comprise two or more hearing device systems belonging to different users. Hearing devices of the respective hearing device systems may be connected with each other in a data transmitting manner. Hearing devices of different hearing device systems may contribute to audio signal processing by transmitting neural network data from one hearing device to the other. Hearing devices of different hearing device systems may be directly connected via a wireless data connection or indirectly connected via peripheral devices of the hearing device systems and/or via a remote device, e.g. via the internet. Combining hearing devices of different hearing device systems enhances the flexibility and possibilities of distributed neural network processing.
[0092] The hearing devices L, R each comprise an audio input unit 3L, 3R. Here and in the following, the suffix L, R is used to indicate components or signals or other features belonging to or being associated with the respective hearing device L, R. The audio input units 3L, 3R are configured to obtain a respective input audio signal IL, IR. In the shown embodiment, the audio input units 3L, 3R are configured to receive respective input signals in the form of ambient sound SL, SR and to convert the received ambient sound SL, SR to the respective input audio signal IL, IR. For that, the audio input units 3L, 3R each comprise an electroacoustic transducer in the form of e.g. one or more microphones. The received ambient sounds SL, SR may be different for the respective hearing devices L, R due to different positions of the hearing devices L, R, in particular the left and right ear of a hearing device system user, respectively. Differences in the ambient sound SL, SR may in particular result from head shadowing. Correspondingly, the input audio signals IL, IR may differ. In other embodiments, the audio input units may be configured to receive respective input signals in the form of audio streams streamed from another device, e.g. from external microphones which receive and convert respective ambient sounds.
[0093] The hearing devices L, R each comprise a processing unit 4L, 4R for audio signal processing of the respective input audio signals IL, IR to obtain output audio signals OL, OR. The processing units 4L, 4R are not depicted in detail. The processing units 4L, 4R may each comprise a data storage on which audio signal processing routines are stored. The processing units 4L, 4R may further comprise a computing device for executing the audio signal processing routines stored on the data storage. The computing device may comprise a processor, in particular a central processing unit (CPU). The computing device may further comprise a main storage.
[0094] The hearing devices L, R comprise a neural network 5L, 5R which, when executed, performs a step of the audio signal processing. The neural network 5L, 5R may be stored on and executed by the processing unit 4L, 4R, respectively. The neural networks 5L, 5R of the hearing devices L, R may be trained for equivalent tasks or for different tasks. Neural network processing by executing the neural networks 5L, 5R may perform any suitable step of audio signal processing on the respective hearing device L, R, in particular noise reduction, noise cancellation, noise suppression, noise attenuation, dereverberation, speaker separation, speaker extraction, feature extraction, classification, in particular audio scene classification, and/or own voice extraction. The neural network processing may result in processed audio signals and/or filter masks and/or gain models which can directly be used in and/or converted into an output audio signal. For example, the output audio signal may correspond to a processed audio signal directly outputted by the neural networks 5L, 5R. It is also possible that an audio signal, in particular the input audio signal IL, IR and/or features obtained therefrom, is filtered using a filter mask outputted by the neural networks 5L, 5R. Additionally or alternatively, the neural network processing may indirectly contribute to the audio signal processing, e.g. by steering the further audio signal processing based on a neural network output produced by the neural networks 5L, 5R. For example, the neural networks 5L, 5R may perform an audio scene classification, directional classification and/or self-monitoring. Based on the corresponding results, the further audio signal processing on the hearing devices L, R may be steered. Additionally or alternatively, the neural network processing may contribute to feature extraction, in particular to local feature extraction from the input audio signals IL, IR, and/or other sensor data.
[0095] The hearing devices L, R comprise further audio signal processing routines 6L, 6R. The further audio signal processing routines 6L, 6R are shown only by way of example. The further audio signal processing routines 6L, 6R can in particular comprise further neural networks and/or traditional audio signal processing routines. Further audio signal processing routines 6L, 6R may for example comprise filtering routines which filter audio signals based on a filter mask obtained by a neural network processing of the neural networks 5L, 5R. Further audio signal processing routines 6L, 6R may also comprise conditioning routines for conditioning the input audio signals IL, IR for further processing, in particular for further processing by the neural networks 5L, 5R. For example, audio signal processing routines 6L, 6R may comprise algorithms for feature extraction from an audio signal and/or other sensor data.
[0096] The hearing devices L, R comprise audio output units 7L, 7R for outputting the output audio signals OL, OR. The audio output units 7L, 7R each comprise an electroacoustic transducer in the form of e.g. one or more loudspeakers or receivers.
[0097] The hearing devices L, R comprise sensors 8L, 8R for sensing environmental data EL, ER and/or self-monitoring. Corresponding sensor data eL, eR can be transmitted to the processing units 4L, 4R, respectively, to be considered in the audio signal processing. Sensor data eL, eR may contain information on internal states and/or external states of the hearing devices L, R.
[0098] Environmental data EL, ER may for example comprise position and/or movement data. For example, sensors 8L, 8R may comprise accelerometers and/or position sensors. For example, external states such as head movement of the user of the hearing device system 2 may be sensed. By way of example, internal states may relate to battery level, sensor health and/or processing load.
[0099] The hearing devices L, R each comprise a data connection interface 9L, 9R. The data connection interfaces 9L, 9R establish a wireless data connection 10 between different devices of the hearing device arrangement 1. The hearing devices L, R are connected by a wireless data connection 10LR. The left hearing device L is connected to the peripheral device P by a wireless data connection 10LP. The right hearing device R is connected to the peripheral device P by a wireless data connection 10RP. The peripheral device P comprises a data connection interface 11 for establishing the wireless data connections 10LP, 10RP. Any suitable protocol can be used for establishing the wireless data connection 10. Different wireless data connections may employ different data connection technologies, in particular data connection protocols. For example, the wireless data connection 10LR between the hearing devices L, R may be based on a different data connection technology than the wireless data connections 10LP, 10RP between the hearing devices L, R, respectively, and the peripheral device P.
[0100] The hearing device arrangement 1 is configured to distribute neural network processing among the devices of the hearing device arrangement 1, in particular the hearing devices L, R. For this purpose, the hearing device arrangement 1 is configured to exchange neural network data ND via the wireless data connection 10LR between the hearing devices L, R. For example, neural network data ND produced by the neural network 5L on the hearing device L is transmitted to the hearing device R and used in the audio signal processing in the hearing device R. Neural network data ND produced by the neural network 5R of the hearing device R may be transmitted to the hearing device L and used in the audio signal processing on the hearing device L.
[0101] The peripheral device P comprises a peripheral processing unit 12. The peripheral processing unit 12 may execute a peripheral neural network 13. The peripheral neural network 13 may contribute to the audio signal processing on the hearing devices L, R. Using the peripheral device P allows neural network processing to be distributed to further devices of the hearing device arrangement 1. Neural network processing on the peripheral device P has the advantage that typical peripheral devices are less restricted with regard to computational power and/or battery capacity. For example, neural network processing on a peripheral device P can be used for complex processing tasks which are not critical with respect to latency, such as general audio scene classification.
[0102] The peripheral device P may comprise peripheral sensors 14 for sensing further environmental data.
[0103] The hearing device system 2 of the hearing device arrangement 1 is in data connection with a remote device 15 in the form of a remote server via remote data connections 16. Remote data connections 16 may for example be established over the Internet. The remote device 15 may comprise remote processing algorithms 17. The remote processing algorithms 17 may contribute to audio signal processing on the hearing devices L, R. For example, remote processing algorithms 17 may comprise one or more neural networks. It is also possible that remote processing algorithms 17 are configured for training neural networks, for example for training and updating neural networks 5L, 5R and/or peripheral neural network 13. Using the remote device 15, audio signal processing, in particular neural network processing as part of audio signal processing, can be distributed on even further devices. In particular, cloud processing can be used for contributing to the audio signal processing on the hearing devices L, R.
[0104] Remote data connections 16 may be established using a peripheral device P, for example by using an internet connection of the peripheral device P. Alternatively or additionally, hearing devices L, R may directly connect to the remote device 15 via a remote data connection 16.
[0105] Using the hearing device arrangement 1, a distribution of neural network processing over several devices is possible. In particular, neural network processing can be distributed among the hearing devices L, R. Preferably, but not mandatorily, one or more peripheral devices, such as peripheral device P, and one or more remote devices, such as remote device 15, can contribute to the audio signal processing on the hearing devices L, R, in particular by executing respective neural networks.
[0106] In the following, exemplary embodiments of audio signal processing are described. The respective audio signal processing may be performed by the hearing device arrangement as shown in
[0107] In the following embodiments, audio signal processing is described with respect to functional steps of the audio signal processing. Data transfer between the functional steps is indicated using arrows. Data transfer between different devices is indicated by arrows with dashed lines. Many of the functional steps can be performed by different devices. In case one of the exemplary functional steps is associated with one of the hearing devices, this is indicated by a respective suffix (e.g. “L” for the left hearing device) and/or by a dotted box representing the respective device. It should be borne in mind that the following embodiments are only exemplary. In case a functional step is shown to be associated with a specific device, the same functional step or an equivalent functional step may be performed by a different device in another embodiment.
[0108] With reference to
[0109] Input audio signals IL, IR and/or sensor data eL, eR are provided in a respective input step 20L, 20R on the respective hearing devices. The provided data, in particular the input audio signal IL, IR, are inputted to the respective neural network 5L, 5R. The neural networks 5L, 5R are trained for calculating a filter mask ML, MR based on the input audio signal IL, IR and/or sensor data eL, eR, respectively. The filter masks ML, MR are respective neural network outputs of the neural network 5L, 5R. The input audio signals IL, IR are each duplicated and fed into a respective filter module 21L, 21R. Filtering the respective input audio signal IL, IR with the respective filter mask ML, MR, the filter modules 21L, 21R generate the output audio signals OL, OR. The output audio signals OL, OR are outputted to the user in respective audio output steps 22L, 22R. In the audio output step 22L, 22R, the output audio signal OL, OR may be outputted to the user by using a respective audio output unit 7L, 7R.
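The filtering step performed by the filter modules 21L, 21R can be illustrated by a minimal Python sketch. It assumes, purely for illustration, that one frame of the input audio signal is represented by complex frequency-bin values and that the filter mask holds one real gain between 0 and 1 per bin. Neither the signal domain nor the mask format is specified in the description, and the name apply_filter_mask is hypothetical.

```python
from typing import List


def apply_filter_mask(frame_bins: List[complex], mask: List[float]) -> List[complex]:
    """Scale each frequency bin of one frame by its mask gain (assumed format)."""
    if len(frame_bins) != len(mask):
        raise ValueError("mask and frame must have the same number of bins")
    # Assumption: gains are clamped to [0, 1], so the mask can only attenuate.
    return [x * min(max(g, 0.0), 1.0) for x, g in zip(frame_bins, mask)]


# One frame of a hypothetical input audio signal and a mask from the network.
input_frame = [1 + 1j, 2 + 0j, 0.5j]
mask = [1.0, 0.5, 0.0]
output_frame = apply_filter_mask(input_frame, mask)
```

In this sketch the first bin passes unchanged, the second is halved and the third is suppressed.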
[0110] The neural networks 5L, 5R may be used to calculate any suitable filter mask, e.g. for noise reduction, cancellation and/or suppression. Further suitable filter masks are for dereverberation, speaker separation, speaker extraction and/or own voice extraction. The neural networks 5L, 5R are trained and executed only with input audio signals IL, IR, respectively, from the audio input units 3L, 3R of the respective hearing device L, R.
[0111] The signal processing on the respective hearing device L, R may be performed independently. However, it is beneficial to distribute the neural network processing among the hearing devices L, R. In the shown embodiment, the neural networks 5L, 5R are executed in accordance with a multiplexing scheme. The neural networks 5L, 5R are executed with different work cycles. The execution of the neural networks 5L, 5R, in particular their respective work cycles, is determined by a control unit 23 of the hearing device arrangement 1. The control unit 23 is a functional unit. The control unit 23 may be incorporated in any of the devices of the hearing device arrangement 1, in particular in the hearing devices L, R.
[0112] In the shown embodiment, multiplexing of the neural network processing leads to an alternating execution of the neural networks 5L, 5R. The neural networks 5L, 5R are executed at different times. In order to provide continuous output audio signals OL, OR, the hearing device arrangement 1 is configured to transmit the outputted filter masks ML, MR to the respective other hearing device. At times where the neural network 5L is executed on the hearing device L, the filter mask ML is used for filtering the input audio signal IL by filter module 21L. Further, the filter mask ML is transmitted to the other hearing device R and used for filtering the input audio signal IR by filter module 21R. Hence, both filter modules 21L, 21R use the same filter mask ML provided by neural network 5L. At times where the neural network 5R is executed on hearing device R, the filter mask MR is transmitted to hearing device L and used in the respective filter modules 21L, 21R for filtering the respective input audio signals IL, IR. This way, it is ensured that a filter mask is provided to the filter module 21L, 21R for filtering the respective input audio signal IL, IR even at times when the respective neural network 5L, 5R is not executed.
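The alternating execution with mask sharing may be sketched as follows. This is a simplified Python illustration under stated assumptions: frames are lists of real values, the filter mask is an element-wise gain, and the functions net_l, net_r and run_alternating are hypothetical stand-ins for the trained neural networks 5L, 5R and the control logic. The wireless transmission is modelled simply by using the same mask object for both filter modules.

```python
from typing import Callable, Dict, List

Frame = List[float]
Mask = List[float]


def apply_mask(frame: Frame, mask: Mask) -> Frame:
    """Stand-in for filter modules 21L, 21R: element-wise gain (assumption)."""
    return [x * g for x, g in zip(frame, mask)]


def run_alternating(
    frames_l: List[Frame],
    frames_r: List[Frame],
    net_l: Callable[[Frame], Mask],
    net_r: Callable[[Frame], Mask],
    active_device: Callable[[int], str],
) -> List[Dict[str, object]]:
    """Execute only one network per time slot and share its mask with both sides."""
    results = []
    for t, (frame_l, frame_r) in enumerate(zip(frames_l, frames_r)):
        active = active_device(t)
        # Only the active network is executed in this time slot.
        mask = net_l(frame_l) if active == "L" else net_r(frame_r)
        # The mask is used locally and "transmitted" to the other device,
        # so both filter modules filter with the same mask.
        results.append({
            "active": active,
            "out_l": apply_mask(frame_l, mask),
            "out_r": apply_mask(frame_r, mask),
        })
    return results


# Toy stand-ins for the trained networks (hypothetical, not from the source).
def net_l(frame: Frame) -> Mask:
    return [0.5] * len(frame)


def net_r(frame: Frame) -> Mask:
    return [1.0] * len(frame)


results = run_alternating(
    frames_l=[[2.0, 4.0], [2.0, 4.0]],
    frames_r=[[6.0, 8.0], [6.0, 8.0]],
    net_l=net_l,
    net_r=net_r,
    active_device=lambda t: "L" if t % 2 == 0 else "R",
)
```

In each time slot exactly one network runs, yet both output audio signals are produced.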
[0113] Multiplexing the neural network execution has the advantage that the neural networks 5L, 5R do not have to be both executed at the same time. This reduces computational load and battery consumption on the individual hearing devices L, R.
[0114] Multiplexing the neural network processing is governed by the control unit 23. Different multiplexing schemes may be employed. For example, the neural network processing may be subjected to time multiplexing. In time multiplexing, the neural network processing alternates based on a given time schedule. For example, the respective work cycles can be equally distributed, leading to 50% work cycles on the respective devices.
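A time schedule of this kind could, for instance, be expressed as a function that maps a point in time to the device whose neural network is to be executed. The following Python sketch is illustrative only. The period of 1000 ms and the parameter names are assumptions that do not appear in the description. The default share of 0.5 corresponds to the 50% work cycles mentioned above.

```python
def active_device(t_ms: float, period_ms: float = 1000.0, share_l: float = 0.5) -> str:
    """Return which device executes its network at time t_ms (assumed schedule).

    share_l is the fraction of each period assigned to the left device.
    """
    if period_ms <= 0 or not 0.0 <= share_l <= 1.0:
        raise ValueError("period must be positive and share_l within [0, 1]")
    phase = (t_ms % period_ms) / period_ms
    return "L" if phase < share_l else "R"


# With equal shares, the devices alternate every half period.
schedule = [active_device(t) for t in (0.0, 499.0, 500.0, 999.0, 1000.0)]
```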
[0115] Additionally or alternatively, multiplexing may be performed based on sensor data multiplexing. Sensor data multiplexing uses sensor data regarding external and/or internal states of the hearing devices L, R to distribute the work cycles for executing the respective neural networks 5L, 5R. Suitable internal states may be obtained from system monitoring of the hearing devices. Relevant internal states may comprise memory capacity, processor load, working temperature, battery level, sensor health and/or radio strength. For example, the work cycles may be based on the respective battery levels of the hearing devices L, R. The work cycles may be distributed in a way so that an equal amount of battery consumption is achieved. Additionally or alternatively, work cycles may be determined in a way to equalize the state of charge of the respective batteries of the hearing devices L, R.
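One conceivable way of deriving work cycles from battery levels is sketched below in Python. The heuristic shown, assigning each device a share of the execution time proportional to its remaining charge, is an assumption chosen for illustration and is not prescribed by the description. The function name work_cycle_shares and the representation of the battery level as a value between 0 and 1 are likewise hypothetical.

```python
from typing import Dict


def work_cycle_shares(battery_levels: Dict[str, float]) -> Dict[str, float]:
    """Assign each device a work-cycle share proportional to its battery level.

    Heuristic (assumption): a device with more remaining charge executes its
    neural network for a larger fraction of the time, which tends to bring
    the states of charge closer together.
    """
    if not battery_levels:
        raise ValueError("at least one device is required")
    if any(level < 0 for level in battery_levels.values()):
        raise ValueError("battery levels must be non-negative")
    total = sum(battery_levels.values())
    if total == 0:
        # No usable charge information: fall back to equal shares.
        return {dev: 1.0 / len(battery_levels) for dev in battery_levels}
    return {dev: level / total for dev, level in battery_levels.items()}


shares = work_cycle_shares({"L": 0.75, "R": 0.25})
```

In this example the left hearing device, having three times the remaining charge, is assigned three quarters of the execution time.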
[0116] External states may be obtained from sensing audio, motion, location, temperature, pressure, light and/or health signals. Respective signals may be obtained using the sensors 8L, 8R of the hearing devices and/or a peripheral sensor 14 of a peripheral device P. Based on external states, different selection criteria can be chosen, such as, for example, signal quality, in particular signal-to-noise-ratio, signal strength, signal reliability, in particular dropouts, signal completeness, spectrum, latency, data availability and/or spatial information about the environment, in particular coherence. For example, sensor data multiplexing may be based on signal quality. This way, the hearing device L, R which has the best input audio signals may be chosen for neural network processing.
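Selection based on signal quality may be sketched as follows. The Python example assumes that a signal-to-noise ratio in dB has already been estimated for each hearing device. The hysteresis margin, which keeps the currently active device unless another device is better by a given amount, is an additional assumption intended to avoid rapid switching and is not taken from the description. The name select_device is hypothetical.

```python
from typing import Dict, Optional


def select_device(
    snr_db: Dict[str, float],
    current: Optional[str] = None,
    hysteresis_db: float = 0.0,
) -> str:
    """Choose the device with the best input signal for neural network processing."""
    if not snr_db:
        raise ValueError("at least one device is required")
    best = max(snr_db, key=lambda dev: snr_db[dev])
    # Keep the current device unless the best one exceeds it by the margin.
    if current in snr_db and snr_db[best] - snr_db[current] < hysteresis_db:
        return current
    return best


first = select_device({"L": 12.0, "R": 9.0})
kept = select_device({"L": 10.0, "R": 11.0}, current="L", hysteresis_db=3.0)
switched = select_device({"L": 10.0, "R": 14.0}, current="L", hysteresis_db=3.0)
```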
[0117] With respect to
[0118] In the embodiment of
[0119]
[0120] The different work cycles W1, W2, W3 are shown not to overlap in
[0121] In other embodiments, the work cycles may also overlap, leading to a parallel execution of one or more neural networks on one or more devices. In yet further embodiments, neural network processing may be distributed, in particular multiplexed, on only two or more than three devices. The size of the hearing device arrangement may be flexibly scaled based on the respective demands.
[0122]
[0123] The embodiment of
[0124] With reference to
[0125] The audio signal processing shown in
[0126] Loss functions minimizing binaural cue distortion may in particular regularize one or more of the following properties: interaural intensity difference (IID, also referred to as interchannel intensity difference), interaural phase difference (IPD, also referred to as interchannel phase difference), interaural coherence (IC, also referred to as interchannel coherence), and overall phase difference (OPD). Particularly suitable loss functions and training methods are described in B. Tolooshams and K. Koishida: “A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks”, arXiv:2112.04939v2, 31.01.2022.
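For orientation, two of the named binaural cues can be computed per frequency bin as sketched below in Python. The sketch uses the common definitions of the interaural intensity difference as a level ratio in dB and of the interaural phase difference as the phase of the cross-spectrum. It is not the loss function of the cited publication. The function cue_distortion, which compares the cues before and after processing, and the constant EPS are hypothetical illustrations.

```python
import cmath
import math

EPS = 1e-12  # small constant to avoid division by zero (assumption)


def iid_db(x_l: complex, x_r: complex) -> float:
    """Interaural intensity difference of one frequency bin in dB."""
    return 10.0 * math.log10((abs(x_l) ** 2 + EPS) / (abs(x_r) ** 2 + EPS))


def ipd_rad(x_l: complex, x_r: complex) -> float:
    """Interaural phase difference of one frequency bin in radians."""
    return cmath.phase(x_l * x_r.conjugate())


def cue_distortion(in_l: complex, in_r: complex, out_l: complex, out_r: complex):
    """Absolute change of IID and IPD caused by processing (illustrative)."""
    d_iid = abs(iid_db(out_l, out_r) - iid_db(in_l, in_r))
    # Wrap the phase change into (-pi, pi] before taking its magnitude.
    d_phase = ipd_rad(out_l, out_r) - ipd_rad(in_l, in_r)
    d_ipd = abs(cmath.phase(cmath.exp(1j * d_phase)))
    return d_iid, d_ipd


# The same real gain on both sides preserves the cues.
same_gain = cue_distortion(2 + 0j, 1j, 1 + 0j, 0.5j)
# Attenuating only the left side changes the intensity difference.
left_only = cue_distortion(2 + 0j, 1j, 1 + 0j, 1j)
```

A training loss that regularizes such cues would penalize the second case but not the first.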
[0127] In inference mode, the neural networks 105L, 105R receive inputs based on the respective input audio signals IL, IR and/or the respective sensor data eL, eR of the both hearing devices L, R for binaural processing. A respective feature extraction step 25L, 25R is performed for extracting features FL, FR from the input audio signals IL, IR and/or the sensor data eL, eR, which have been obtained in an input step 20L, 20R on the respective hearing device L, R. The extracted features FL, FR are transmitted to the respective other hearing device L, R and combined with the features FR, FL extracted thereon. Thus, the input to the neural networks 105L, 105R contains features FL, FR from both hearing devices L, R. Transmitting extracted features FL, FR has the advantage that the corresponding data volume is significantly less than the data volume of corresponding unprocessed input audio signals IL, IR and/or sensor data eL, eR. Transmission of extracted features FL, FR may be multiplexed in accordance with the multiplexing of the neural networks 105L, 105R. For example, when neural network 105L is active, only features FR obtained in feature extraction step 25R are transmitted from the right hearing device to the left hearing device L. This way, transmitted data volume can be further reduced.
[0128] In other embodiments, uncompressed or raw input audio signals IL, IR and/or sensor data eL, eR may be exchanged in between the devices in addition to or alternatively to the extracted features FL, FR.
[0129] The neural networks 105L, 105R are each configured to calculate two filter masks ML, MR. The outputted filter mask ML is specifically adapted for filtering input audio signals IL and/or respective features FL on the left hearing device L. The outputted filter mask MR is specifically adapted for filtering input audio signals IR and/or respective features FR on the right hearing device R. The filter mask ML, MR adapted for the respective other hearing device L, R is transmitted to that hearing device and used in the respective filter module 21L, 21R. For example, during the work cycle in which the neural network 105L on the left hearing device L is executed, the neural network 105L calculates the filter masks ML, MR. The filter mask ML is used as an input to the filter module 21L on the hearing device L itself. The filter mask MR is transmitted to the right hearing device R and used as an input to the respective filter module 21R. During work cycles in which the neural network 105R on the right hearing device R is executed, the neural network 105R outputs filter mask ML, MR. Filter mask MR is used as an input to the filter module 21R on the hearing device R. Filter mask ML is transmitted to the left hearing device L and used as an input in the respective filter module 21L.
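The data flow of one work cycle in this binaural embodiment may be sketched in Python as follows. The sketch assumes that features and filter masks are lists of real values and records the transmissions as plain strings. The function binaural_work_cycle and the stand-in toy_network are hypothetical and merely illustrate which data stays local and which data crosses the wireless data connection.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]
Mask = List[float]
Network = Callable[[Features, Features], Tuple[Mask, Mask]]


def binaural_work_cycle(
    active: str,
    features: Dict[str, Features],
    network: Network,
) -> Tuple[Dict[str, Mask], List[str]]:
    """Run the active network once and route the two masks (illustrative)."""
    if active not in ("L", "R"):
        raise ValueError("active must be 'L' or 'R'")
    remote = "R" if active == "L" else "L"
    # Only the features of the inactive device need to be transmitted.
    transmissions = [f"F{remote}: {remote} -> {active}"]
    # The active network sees features from both sides and outputs two masks.
    mask_l, mask_r = network(features["L"], features["R"])
    masks = {"L": mask_l, "R": mask_r}
    # The mask adapted for the other device is sent back to that device.
    transmissions.append(f"M{remote}: {active} -> {remote}")
    return masks, transmissions


def toy_network(f_l: Features, f_r: Features) -> Tuple[Mask, Mask]:
    """Hypothetical stand-in for a trained network 105L or 105R."""
    total = sum(f_l) + sum(f_r)
    return [sum(f_l) / total], [sum(f_r) / total]


masks, sent = binaural_work_cycle("L", {"L": [1.0], "R": [3.0]}, toy_network)
```

With the left network active, the features FR travel to the left hearing device and the filter mask MR travels back, while FL and ML stay on the left hearing device.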
[0130] Multiplexing execution of neural networks 105L, 105R for calculating the filter masks ML, MR is more efficient on battery consumption and computational load than calculating the filter masks ML, MR using dedicated neural networks executed in parallel on both hearing devices L, R. Further, the exchange of features FL, FR allows for binaural processing and at the same time provision of dedicated filter masks ML, MR for both hearing devices L, R.
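The alternating work cycles described above may be illustrated by the following minimal sketch. It is not the claimed implementation: the `Device` class, the function `toy_network` and the ratio-based mask rule are hypothetical stand-ins for the trained neural networks 105L, 105R, and the features are assumed to be non-negative per-band values.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str            # "L" or "R"
    nn_runs: int = 0     # work cycles in which this device executed its network
    masks_sent: int = 0  # masks transmitted to the respective other device


def toy_network(features_l, features_r):
    """Hypothetical stand-in for neural network 105L/105R.

    Takes the combined binaural features and returns one per-band gain
    mask for each side (ML, MR), each gain lying in [0, 1].
    """
    mask_l, mask_r = [], []
    for fl, fr in zip(features_l, features_r):
        total = fl + fr
        mask_l.append(fl / total if total > 0 else 0.5)
        mask_r.append(fr / total if total > 0 else 0.5)
    return mask_l, mask_r


def run_multiplexed(num_cycles, features_l, features_r):
    """Alternate network execution between the two devices."""
    left, right = Device("L"), Device("R")
    applied = []
    for cycle in range(num_cycles):
        active = left if cycle % 2 == 0 else right
        # Only the active device runs its network and computes BOTH masks.
        mask_l, mask_r = toy_network(features_l, features_r)
        active.nn_runs += 1
        # The mask meant for the other side is sent over the link.
        active.masks_sent += 1
        applied.append((active.name, mask_l, mask_r))
    return left, right, applied


left, right, applied = run_multiplexed(4, [1.0, 2.0], [3.0, 2.0])
print(left.nn_runs + right.nn_runs)  # total network executions over 4 cycles
```

In this sketch, four work cycles require four network executions in total, whereas running dedicated networks in parallel on both devices would require eight.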
[0131] With reference to
[0132] The audio signal processing in accordance with
[0133] In a feature extraction step 25L, 25R, respective features FL, FR are extracted from the input audio signals IL, IR and/or sensor data eL, eR, which have been obtained in respective input steps 20L, 20R. In contrast to the embodiment shown in
[0134] The neural networks 205L, 205R are configured to calculate a respective filter mask ML, MR. The neural networks 205L, 205R are executed in accordance with a multiplexing scheme and transmit the calculated filter mask ML, MR to the respective other hearing device L, R. Additionally, the neural networks 205L, 205R are configured to exchange neural network parameters NP. Neural network parameters NP may comprise neural network features and/or neural network states, in particular network weights. Exchange of neural network parameters NP happens upon switching from executing one neural network 205L, 205R to the other neural network 205R, 205L in accordance with the multiplexing scheme. Exchanging neural network parameters NP has the advantage that the context of the processing can be handed over upon switching neural network processing. This way, coherence in processing is improved. A transition from executing one of the neural networks 205L, 205R to the respective other neural network 205R, 205L is smoothened. Further, the transmission of neural network parameters NP, in particular of neural network features, exchanges binaural information to include such binaural information in the audio signal processing.
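The effect of handing over neural network parameters NP upon switching may be illustrated by the following sketch. The class `ToyRecurrentNet` is a hypothetical one-state recurrent smoother standing in for the neural networks 205L, 205R, and, as a simplifying assumption, both devices see the same input frames. With the handover, the multiplexed output matches that of a single continuously running network; without it, the output restarts from a stale state at each switch.

```python
class ToyRecurrentNet:
    """Hypothetical network with a single internal state value."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = 0.0

    def step(self, x):
        # Simple recurrent update: the output depends on past inputs.
        self.state = self.alpha * self.state + (1 - self.alpha) * x
        return self.state

    def export_parameters(self):
        return {"state": self.state}  # plays the role of NP

    def import_parameters(self, params):
        self.state = params["state"]


def run_multiplexed(frames, switch_every, hand_over):
    nets = {"L": ToyRecurrentNet(), "R": ToyRecurrentNet()}
    active = "L"
    outputs = []
    for i, x in enumerate(frames):
        if i > 0 and i % switch_every == 0:
            other = "R" if active == "L" else "L"
            if hand_over:
                # Transmit NP so the newly active network continues in context.
                nets[other].import_parameters(nets[active].export_parameters())
            active = other
        outputs.append(nets[active].step(x))
    return outputs


frames = [1.0] * 6
single = ToyRecurrentNet()
reference = [single.step(x) for x in frames]
print(run_multiplexed(frames, 2, hand_over=True) == reference)
print(run_multiplexed(frames, 2, hand_over=False) == reference)
```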
[0135] In the embodiment of
[0136] With respect to
[0137] In the audio signal processing according to
[0138] Neural networks 305L, 305R are executed in accordance with a multiplexing scheme. Multiplexing of neural network processing is controlled by control unit 23. Neural networks 305L, 305R are alternately executed. The neural network output NOL, NOR of the active neural network 305L, 305R is transmitted to the respective other hearing device R, L and used as an input in the steering unit 27R, 27L.
[0139] The neural network output NOL, NOR of the active neural network 305L, 305R is fed into respective steering units 27L, 27R of the hearing devices L, R. Steering units 27L, 27R steer the further audio signal processing on the hearing devices L, R based on the directional classification.
[0140] The audio signal processing as shown in
[0141] With reference to
[0142] Audio signal processing according to
[0143] Quality checkers 31L, 31R evaluate the inputted features FL, FR to determine a quality parameter QL, QR. The quality parameter QL, QR represents the result of the respective quality checker 31L, 31R. For example, quality parameter QL, QR may represent a signal quality of the input signals IL, IR and/or a fidelity of the respective audio input unit 3L, 3R and/or the respective sensor 8L, 8R. Quality parameters QL, QR are fed into a respective steering unit 427L, 427R. Steering units 427L, 427R steer the further audio signal processing based on the quality parameters QL, QR. For example, multiplexing of neural networks and further audio signal processing may be controlled based on the signal quality of the respective input audio signal IL, IR.
[0144] In the embodiment of
[0145] Feature extraction units 30L, 30R and/or quality checkers 31L, 31R may comprise neural networks trained for the respective tasks. In case the feature extraction units 30L, 30R comprise respective neural networks, multiplexing of the feature extraction units 30L, 30R constitutes multiplexing of neural network processing as a step of the audio signal processing on the hearing devices L, R. The extracted features FL, FR are neural network outputs transmitted to the respective other hearing device.
[0146] With respect to
[0147] In contrast to the embodiment shown in
[0148] In the embodiment of
[0149] Based on the multiplexing of quality checker 531L, 531R, either quality parameter QL or quality parameter QR is fed into both of the steering units 527L, 527R. Based on the quality parameters QL, QR, further audio signal processing on the hearing devices L, R is steered by the steering units 527L, 527R. For example, multiplexing of neural network processing on hearing devices may be controlled by the steering units 527L, 527R.
[0150] The quality checkers 531L, 531R may comprise respective neural networks. In this case, the alternate execution of quality checkers 531L, 531R comprises multiplexing of neural network processing. The quality parameters QL, QR are neural network outputs which are transmitted from one hearing device L, R to the respective other hearing device R, L.
[0151] With reference to
[0152] Input audio signals IL, IR and/or sensor data eL, eR obtained in an input step 20L, 20R are fed into respective feature extraction steps 25L, 25R. In feature extraction steps 25L, 25R, respective features FL, FR are extracted from the input audio signal IL, IR and/or sensor data eL, eR. The obtained features FL, FR are transmitted to the respective other hearing device R, L and combined with features FR, FL extracted thereon. The combination of features FL, FR is fed into respective neural networks 605L, 605R. Neural networks 605L, 605R are configured to calculate respective filter masks ML, MR which are outputted by the neural network 605L, 605R. Outputted filter masks ML, MR are provided to respective filter modules 21L, 21R. Filter modules 21L, 21R filter the respective input audio signals IL, IR and/or features FL, FR to obtain respective output audio signals OL, OR.
[0153] The filter masks ML, MR calculated on different hearing devices L, R may coincide or differ. In the latter case, differences in the respective input signals IL, IR and/or sensor data eL, eR may be incorporated in the calculation of the filter masks ML, MR.
[0154] Due to exchanging features FL, FR between the hearing devices L, R, neural networks 605L, 605R perform mask calculation based on binaural information. Preferably, the neural networks 605L, 605R may be specifically configured, in particular trained for binaural optimization. The audio signals to be filtered by the filter modules 21L, 21R may also comprise binaural information based on the transmitted features FL, FR.
[0155] In
[0156] In a variant of the embodiment shown in
[0157] With respect to
[0158] In the embodiment according to
[0159] The neural networks 705L, 705R may be trained with input signals being obtained from the respective input step 20L, 20R on the respective hearing device L, R. Additionally, the neural networks 705L, 705R may be trained with neural network parameters NP provided by the respective other neural network 705R, 705L. A loss function for use in training may be optimized for minimizing binaural cue distortion.
[0160] In inference mode, the neural networks 705L, 705R calculate respective filter masks ML, MR based on the respective input signals and the received neural network parameters NP. The filter masks ML, MR are provided to filter modules 21L, 21R. Filter modules 21L, 21R filter the respective input audio signals IL, IR using the respective filter mask ML, MR to obtain respective output audio signals OL, OR. The output audio signals OL, OR are outputted in respective audio output steps 22L, 22R.
[0161] A particularly advantageous embodiment is achieved by combining the feature exchange as shown in
[0162] With reference to
[0163] Input audio signals IL, IR and/or sensor data eL, eR are obtained in a respective input step 20L, 20R on each of the hearing devices L, R. The obtained input audio signals IL, IR and/or sensor data eL, eR are inputted in respective neural networks 805L, 805R. The neural networks 805L, 805R process the respective input audio signal IL, IR and/or sensor data eL, eR for performing respective steps of the audio signal processing.
[0164] The neural networks 805L, 805R are executed simultaneously. The neural networks 805L, 805R are configured to perform different steps of the audio signal processing. Neural network 805L on the left hearing device L calculates a neural network output NOL. Neural network 805R on the right hearing device R calculates a neural network output NOR. The neural network output NOL is transmitted from the left hearing device L to the right hearing device R. The neural network output NOR is transmitted from the right hearing device R to the left hearing device L. Thus, both neural network outputs NOL, NOR are available on each hearing device L, R for further processing.
[0165] Each hearing device L, R comprises a respective audio processing routine 33L, 33R for audio signal processing of the respective input audio signals IL, IR to obtain respective output audio signals OL, OR. The neural network outputs NOL, NOR are inputted into the audio processing routines 33L, 33R. The neural network outputs NOL, NOR steer the audio signal processing by the audio processing routines 33L, 33R. The output audio signals OL, OR are outputted in an audio output step 22L, 22R.
[0166] In the shown embodiment, neural network 805L is trained for own voice recognition and/or keyword recognition. Neural network output NOL contains information on whether the own voice and/or a specific keyword has been recognized in the input audio signal IL. Neural network 805R is trained for audio scene classification. Neural network output NOR contains information on the classified audio scene. Using the neural network outputs NOL, NOR, signal processing by the audio processing routines 33L, 33R can be steered, in particular by choosing adequate processing algorithms and/or filters.
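How the exchanged neural network outputs NOL, NOR might steer the audio processing routines 33L, 33R can be sketched as follows. The scene labels, setting names and preset values are purely hypothetical and are chosen only for illustration; the description does not specify them.

```python
def steer_processing(own_voice_detected, scene):
    """Choose processing settings from both neural network outputs.

    own_voice_detected: stands in for NOL (own voice / keyword recognition).
    scene: stands in for NOR (audio scene classification).
    All labels and settings below are hypothetical examples.
    """
    presets = {
        "speech_in_noise": {"noise_reduction": "strong", "beamformer": "on"},
        "music": {"noise_reduction": "off", "beamformer": "off"},
        "quiet": {"noise_reduction": "mild", "beamformer": "off"},
    }
    # Fall back to the "quiet" preset for unknown scene labels.
    settings = dict(presets.get(scene, presets["quiet"]))
    settings["own_voice_attenuation"] = bool(own_voice_detected)
    return settings


# After the exchange, both outputs are available on each hearing device,
# so each audio processing routine can be steered with the same information.
nol, nor = True, "speech_in_noise"
settings_left = steer_processing(nol, nor)
settings_right = steer_processing(nol, nor)
print(settings_left == settings_right)
```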
[0167] The above-specified network configurations of neural networks 805L, 805R are only exemplary. Of course, other configurations of one or both neural networks 805L, 805R are possible. It is, for example, also possible that neural networks 805L, 805R may be configured for local feature extraction. For example, neural network 805L may extract local features from the input audio signal IL and/or sensor data eL on the left hearing device L. Neural network 805R may be trained for local feature extraction from the input audio signal IR and/or sensor data eR on the right hearing device R. The extracted features may be provided as part of the neural network outputs NOL, NOR to the audio processing routines 33L, 33R to be considered in the audio signal processing.
[0168] The distribution of different tasks to be performed by the neural networks 805L, 805R may be based on computational costs. For example, the neural network processing may be distributed in order to equalize computational load on the hearing devices L, R. Additionally or alternatively, computational tasks may be redistributed to equalize battery consumption and/or battery levels on the hearing devices L, R. For example, if the battery level is low on one of the hearing devices L, R, computational tasks may be redistributed from that hearing device to the other. It is also possible to distribute different computational tasks based on specific hardware on one or both of the hearing devices L, R.
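One conceivable way of distributing tasks based on computational cost and battery level is sketched below. The greedy assignment rule, the task names, the cost values and the `low_battery` threshold are assumptions made for illustration only and are not taken from the description.

```python
def distribute_tasks(task_costs, battery_levels, low_battery=0.2):
    """Greedily assign tasks to hearing devices to equalize load.

    task_costs: mapping of task name to (hypothetical) computational cost.
    battery_levels: mapping of device name to battery level in [0, 1].
    Devices below the low_battery threshold receive no tasks, unless all
    devices are below the threshold.
    """
    load = {device: 0.0 for device in battery_levels}
    healthy = [d for d, level in battery_levels.items() if level >= low_battery]
    candidates = healthy or list(battery_levels)
    assignment = {}
    # Place the most expensive tasks first on the least loaded device.
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        target = min(candidates, key=lambda d: load[d])
        assignment[task] = target
        load[target] += cost
    return assignment, load


tasks = {"own_voice": 3.0, "scene": 2.0, "features": 1.0}
print(distribute_tasks(tasks, {"L": 0.8, "R": 0.8}))
print(distribute_tasks(tasks, {"L": 0.8, "R": 0.1}))
```

With equal battery levels, the sketch balances the load across both devices; with a low battery on the right device, all tasks move to the left device.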
[0169] In the embodiment of
[0170] With reference to
[0171]
[0172] Using additional sensors and/or microphones on the peripheral device P and/or the remote data connection 16, additional information and data can be made available for audio signal processing. Additional information may comprise position data, IoT data, user profile data, user preferences, vital signs, user health data, weather and/or information about other people interacting with the user. Position data may in particular comprise GPS data, maps and/or meta information about nearby places, for example restaurants and their respective acoustics. Such data may be transmitted from the peripheral device P to the hearing devices L, R via the wireless data connections 10LP, 10RP. Such data may also be used as an input to further processing steps on the peripheral device P, in particular as an input to a peripheral neural network 913.
[0173] The peripheral neural network 913 on a peripheral device P may perform a classification of the audio scene and/or user activity and/or user intention.
[0174] On the hearing devices L, R, neural networks 905L, 905R can be executed for performing a step of the audio signal processing. Neural networks 905L, 905R may, for example, be configured for a local feature extraction. For example, local features may be calculated by the neural networks 905L, 905R based on local input audio signals IL, IR and/or sensor data eL, eR. Suitable local features may, for example, be based on head acoustics, head movements, user activity and/or health sensor information. Local features may in particular comprise correlations, coherence, spectra, focus (i.e. what the user is looking at) and/or left-right differences.
[0175] Neural network parameters NPL, NPR of neural networks 905L, 905R, respectively, may be provided to the peripheral neural network 913. Such neural network parameters NPL, NPR may be considered by the peripheral neural network 913 in its classification task. For example, neural network parameters NPL, NPR may comprise extracted features and/or pre-classification results obtained by executing neural networks 905L, 905R. Based on the classification performed by peripheral neural network 913, peripheral neural network data PND may be provided to the neural networks 905L, 905R. Peripheral neural network data PND may comprise a peripheral neural network output, such as the classification result and/or system steering commands based on the neural network processing of the peripheral neural network 913. Peripheral neural network data PND may influence the neural network execution on the hearing devices L, R.
[0176] The neural networks 905L, 905R provide respective neural network outputs NOL, NOR. Neural network outputs NOL, NOR provide steering commands to respective audio processing routines 33L, 33R. Based on the steering by the neural network outputs NOL, NOR, the respective audio processing routines 33L, 33R perform audio signal processing on the respective input audio signals IL, IR. Output audio signals OL, OR obtained from the respective audio processing routines 33L, 33R are outputted in a respective audio output step 22L, 22R.
[0177] With reference to
[0178] Input audio signals IL, IR and/or sensor data eL, eR are obtained in a respective input step 20L, 20R on each hearing device L, R. In a respective feature extraction step 25L, 25R, features are extracted from the input audio signal IL, IR and/or the sensor data eL, eR.
[0179] The left hearing device L comprises a neural network 1005L. The neural network 1005L receives the features FL extracted in feature extraction step 25L on the left hearing device L. The neural network 1005L comprises an encoder module 35 and a bottleneck module 36. The encoder module 35 encodes the features FL to obtain encoded features F′. Encoded features F′ are passed to the bottleneck module 36. A neural network output NO of the neural network 1005L comprises a combination of the encoded features F′ from the encoder module 35 and an output of the bottleneck module 36. The neural network output NO of the neural network 1005L is provided to the right hearing device R.
[0180] The right hearing device R comprises a neural network 1005R. The neural network 1005R receives the neural network output NO transmitted from the left hearing device L as an input. The neural network 1005R realizes a decoder module decoding the neural network output NO to calculate a filter mask M. Neural network parameters NP are transmitted from the encoder module 35 directly to the neural network 1005R. The direct transmission of neural network parameters NP from the encoder module 35 to the neural network 1005R establishes skip connections between the encoder module 35 and the decoder module realized by the neural network 1005R.
[0181] Using the neural networks 1005L, 1005R, a complex Unet-shaped model can be implemented on the hearing devices L, R by executing different function modules, in this case the encoder module, bottleneck module and decoder module, on different hearing devices L, R.
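The split of a U-Net-shaped model across the two hearing devices may be illustrated by the following toy sketch. The averaging, `tanh` and sigmoid operations are hypothetical placeholders for trained layers, the function names are chosen for illustration, and the feature vector is assumed to have an even length.

```python
import math


def encoder_module(features):
    """Left device: stands in for encoder module 35."""
    skip = list(features)  # full-resolution activations, sent directly as NP
    # Downsample by averaging neighbouring pairs to obtain encoded features.
    encoded = [(features[i] + features[i + 1]) / 2.0
               for i in range(0, len(features), 2)]
    return encoded, skip


def bottleneck_module(encoded):
    """Left device: stands in for bottleneck module 36."""
    return [math.tanh(v) for v in encoded]


def left_device(features):
    encoded, skip = encoder_module(features)
    # NO combines the encoded features with the bottleneck output.
    network_output = {"encoded": encoded,
                      "bottleneck": bottleneck_module(encoded)}
    return network_output, skip


def right_device(network_output, skip):
    """Right device: stands in for the decoder realized by network 1005R."""
    merged = [e + b for e, b in zip(network_output["encoded"],
                                    network_output["bottleneck"])]
    upsampled = [v for v in merged for _ in range(2)]  # back to full resolution
    # Skip connection: add encoder activations, then squash to a (0, 1) mask.
    return [1.0 / (1.0 + math.exp(-(u + s))) for u, s in zip(upsampled, skip)]


no, np_skip = left_device([0.0, 0.0, 2.0, 2.0])  # transmitted over the link
mask = right_device(no, np_skip)                 # sent back to the left device
print(len(mask))
```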
[0182] The mask M which is outputted by the neural network 1005R is transmitted to the left hearing device L. The mask M is used as an input to respective filter modules 21L, 21R on each hearing device L, R. The filter modules 21L, 21R output respective output audio signals OL, OR which are outputted in an audio output step 22L, 22R.
[0183] With reference to
[0184] In addition to
[0185] With reference to
[0186] The hearing devices L, R comprise feature extraction units 1230L, 1230R which extract local features from input audio signals IL, IR and/or sensor data eL, eR. Local feature extraction with feature extraction units 1230L, 1230R makes use of dedicated hardware. For example, features may be extracted from input audio signals IL, IR on dedicated hardware for Short-Time Fourier Transform (STFT) on a processing unit of the hearing device, in particular on a hearing device processor. Additionally or alternatively, feature extraction units 1230L, 1230R may be realized by dedicated sensor hardware, e.g. by an in-ear microphone or an accelerometer. Extracted features FL, FR may be exchanged between the hearing devices by a wireless link. Feature exchange may be used for multiplexing of the feature extraction by the feature extraction units 1230L, 1230R.
[0187] Features FL, FR may optionally be combined in a binaural feature combination step 1226 to obtain a common feature vector F. Binaural feature combination step 1226 may be performed locally on the hearing devices L, R or on a peripheral device P. Common feature vector F may be used as an input in a peripheral neural network 1213. Peripheral neural network 1213 may process feature vector F to calculate steering commands CL, CR for the respective hearing devices L, R. Steering commands CL, CR may be passed to respective steering unit 1227L, 1227R on the respective hearing device L, R. Using steering commands CL, CR, steering units 1227L, 1227R may steer the further audio signal processing on the hearing devices L, R.
[0188] The above-discussed embodiments are only exemplary embodiments. Based on the above description, the skilled person will readily realize further embodiments, in particular alterations to the shown embodiments without departing from the inventive technology described herein and covered by the claims. In particular, it is clear to the skilled person that details of the individual embodiments may be combined. For example, some embodiments show the inclusion of processing a peripheral neural network. It is clear that these embodiments may be realized without using network processing on a peripheral device. It is further clear that also other embodiments may profit from processing a peripheral neural network on a peripheral device. Instead of or additionally to executing a peripheral neural network on a peripheral device, a remote neural network executed on a remote device may contribute to audio signal processing.