HEARING DEVICE SYSTEM AND METHOD FOR PROCESSING AUDIO SIGNALS

20210281958 · 2021-09-09


    Abstract

    A hearing device (2) comprises a recording unit (5) for recording an input signal (I), an audio processing unit (6) for determining an output signal (O) and a playback unit (7) for playing back the output signal (O) to a user (U). The audio processing unit (6) comprises a neural network (8) for separating a user voice signal (u) from the input signal (I). Further, a system (1) and a method for processing audio signals are described.

    Claims

    1. A hearing device comprising: a recording unit for recording an input signal, an audio processing unit for determining an output signal, wherein the audio processing unit comprises a neural network for separating a user voice signal from the input signal, and a playback unit for playing back the output signal to a user.

    2. The hearing device according to claim 1, wherein the audio processing unit further comprises a classical audio signal processing means for processing at least parts of the input signal to denoise at least parts of the input signal.

    3. The hearing device according to claim 2, wherein the classical audio signal processing means and the neural network are configured to be run in parallel and/or in series.

    4. The hearing device according to claim 1, wherein the neural network is configured as a long short-term memory network with three layers.

    5. The hearing device according to claim 1, further comprising: a voice detection unit for measuring a presence of a user voice signal in the input signal.

    6. A system for processing audio signals, comprising: a hearing device comprising an audio processing unit for determining an output signal, wherein the audio processing unit comprises a first neural network for separating a user voice signal from an input signal; and a secondary device, wherein the secondary device comprises a secondary audio processing unit for determining a secondary output signal, wherein the secondary audio processing unit comprises a secondary neural network for denoising at least parts of a secondary input signal, wherein the secondary device is configured to form a wireless data connection with the hearing device for transmitting at least parts of the secondary output signal to the hearing device and/or to receive the secondary input signal from the hearing device.

    7. The system according to claim 6, wherein the secondary device comprises a secondary recording unit for recording the secondary input signal.

    8. The system according to claim 6, wherein the secondary neural network is configured to separate the user voice signal from the secondary input signal.

    9. The system according to claim 6, wherein the secondary audio processing unit comprises a calibration neural network for calibrating the first neural network and/or the secondary neural network.

    10. The system according to claim 6, wherein the secondary device is a mobile device or a wireless microphone.

    11. The system according to claim 6, wherein the wireless data connection is implemented using a BLUETOOTH protocol or using a proprietary protocol, wherein the proprietary protocol has a lower latency than BLUETOOTH.

    12. A method for processing audio signals, the method comprising: recording an input signal using a recording unit, determining an output signal using an audio processing unit, wherein a user voice signal is separated from the input signal by a neural network, and providing the output signal to a user using a hearing device.

    13. The method according to claim 12, wherein determining the output signal comprises denoising at least parts of the input signal by a classical audio signal processing means.

    14. The method according to claim 12, wherein at least parts of the input signal are denoised by a classical audio signal processing means in parallel with the separation of the user voice signal by the neural network.

    15. The method according to claim 12, wherein at least parts of the input signal are denoised by a classical audio signal processing means after the user voice signal is separated from the input signal by the neural network.

    16. The method according to claim 12, wherein the neural network is a first neural network, the method further comprising: determining a secondary output signal by denoising at least parts of a secondary input signal using a secondary neural network, and transmitting at least parts of the secondary output signal to the hearing device.

    17. The method according to claim 16, wherein denoising of the secondary input signal by the secondary neural network comprises separating the user voice signal from the secondary input signal.

    18. The method according to claim 16, wherein the secondary output signal is at least partially included in the output signal by the audio processing unit of the hearing device.

    19. The method according to claim 16, further comprising the step of calibrating the first neural network and/or the secondary neural network using a calibration neural network that is part of a secondary audio processing unit.

    20. The method according to claim 19, wherein a calibration input signal is provided to and analyzed by the calibration neural network.

    Description

    BRIEF DESCRIPTION OF THE FIGURES

    [0065] FIG. 1 shows a schematic representation of a system for processing audio signals comprising a hearing device and a secondary device.

    [0066] FIG. 2 shows a schematic representation of a process flow of a method for processing audio signals using the system of FIG. 1.

    [0067] FIG. 3 shows a first operation mode for an audio processing step of the method of FIG. 2.

    [0068] FIG. 4 shows an alternative operation mode for the audio processing step of the method of FIG. 2.

    [0069] FIG. 5 shows a further alternative operation mode for the audio processing step of the method of FIG. 2.

    [0070] FIG. 6 shows a further alternative operation mode for the audio processing step of the method of FIG. 2.

    DETAILED DESCRIPTION

    [0071] FIG. 1 shows a schematic representation of a system 1 for processing audio signals. The system 1 comprises a hearing device 2 and a secondary device 3. In the shown embodiment, the hearing device 2 is a hearing aid. In other embodiments, the hearing device may be a hearing implant, for example a cochlear implant, or a hearable, e.g. a smart headphone. In the shown embodiment, the secondary device 3 is a mobile phone, in particular a smart phone.

    [0072] The hearing device 2 comprises a power supply 4 in the form of a battery. The hearing device 2 comprises a recording unit 5, an audio processing unit 6 and a playback unit 7. The recording unit 5 is configured to record an input signal I. The input signal I corresponds to sound, in particular ambient sound, which has been recorded with the recording unit 5. The audio processing unit 6 is configured to determine an output signal O. The playback unit 7 is configured to play back the output signal O to a user U.

    [0073] The audio processing unit 6 comprises a neural network 8 and a classical audio signal processing means 9. The neural network 8 is an artificial neural network. The classical audio signal processing means 9 comprise computational means for audio processing which do not use a neural network. The classical audio signal processing means 9 can, for example, coincide with audio processing means used in known hearing aids, such as digital signal processing algorithms carried out in a digital signal processor (DSP). The audio processing unit 6 is configured as an arithmetic unit on which the neural network 8 and/or the classical audio signal processing means 9 can be executed.

    [0074] The neural network 8 is configured to separate a user voice signal u (e.g., FIGS. 3 to 6) from the input signal I. The user voice signal u corresponds to an audio signal representation of the voice of a user U of the hearing device 2. When the voice of the user U is recorded by the recording unit 5, the input signal I contains the user voice signal u. The neural network 8 is trained to identify and separate an audio signal corresponding to a voice with specific voice characteristics. In order to identify the correct voice, the neural network 8 receives a user's speaker embedding together with the input signal I as input variables. The user's speaker embedding is data describing the user's voice characteristics. The neural network 8 separates the user voice signal u from the input signal I and returns the user voice signal u as an output variable. If no user voice signal u is contained in the input signal I, the output of the neural network 8 is empty. In alternative embodiments, the neural network 8 is specifically trained to identify and separate only the user's voice. In such embodiments the user's speaker embedding is not needed.

    [0075] The neural network 8 is highly specialized. It can be run efficiently with low computational requirements. Further, running the neural network 8 does not require high energy consumption. The neural network 8 can be reliably run on the hearing device 2 for long times on a single charge of the power supply 4. The neural network 8 can have any suitable architecture for neural networks. An exemplary neural network 8 is a long short-term memory (LSTM) network with three layers. In an exemplary embodiment, each layer has 256 units.
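    For illustration only, the following minimal sketch shows one possible realization of such a separator: a three-layer LSTM with 256 units per layer that receives spectrogram frames concatenated with the user's speaker embedding and predicts a mask over the input. The mask formulation, the tensor shapes and all names are assumptions of this sketch, not details of the disclosure.

```python
import torch
import torch.nn as nn

class VoiceSeparator(nn.Module):
    """Sketch of a separator like neural network 8: 3-layer LSTM, 256 units."""
    def __init__(self, n_freq=257, embed_dim=128, hidden=256, layers=3):
        super().__init__()
        # Each spectrogram frame is concatenated with the user's speaker
        # embedding, as described in paragraph [0074] (assumed formulation).
        self.lstm = nn.LSTM(n_freq + embed_dim, hidden,
                            num_layers=layers, batch_first=True)
        self.mask = nn.Linear(hidden, n_freq)

    def forward(self, spec, speaker_embedding):
        # spec: (batch, frames, n_freq); speaker_embedding: (batch, embed_dim)
        emb = speaker_embedding.unsqueeze(1).expand(-1, spec.size(1), -1)
        h, _ = self.lstm(torch.cat([spec, emb], dim=-1))
        # A sigmoid mask applied to the input spectrogram yields an estimate
        # of the user voice signal u; a near-zero mask corresponds to the
        # "empty output" case where no user voice is present.
        return spec * torch.sigmoid(self.mask(h))
```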

    [0076] The hearing device 2 comprises a sensor 10. The sensor 10 is a vibration sensor. The sensor 10 detects vibrations caused by the user U speaking. The sensor 10 can be used to measure a presence of the user voice signal u in the input signal I.

    [0077] The hearing device 2 comprises a data interface 11. The secondary device 3 comprises a secondary data interface 12. The hearing device 2 and the secondary device 3 are connected via a wireless data connection 13, e.g., via a standard BLUETOOTH wireless data connection or via a wireless data connection implemented with a proprietary protocol, such as the ROGER protocol or a proprietary protocol implemented by modifying the BLUETOOTH protocol. A proprietary protocol such as ROGER can offer the advantage of a lower audio delay than can be achieved with standard protocols.

    [0078] The secondary device 3 comprises a secondary power supply 14. The secondary device 3 comprises a secondary recording unit 15 and a secondary audio processing unit 16. The secondary recording unit 15 comprises one or more microphones to record a secondary input signal J. The secondary input signal J corresponds to sounds, in particular ambient sounds, which have been recorded with the secondary recording unit 15. Many modern mobile phones comprise several microphones which may be used by the secondary recording unit 15. Using several microphones, spatial information about the secondary input signal J can be obtained. Further, the secondary input signal J can be recorded in stereo.

    [0079] The secondary audio processing unit 16 is configured to determine a secondary output signal P. The secondary output signal P is determined based on the secondary input signal J. The secondary audio processing unit 16 comprises a secondary neural network 17. The secondary neural network 17 is configured to separate the user voice signal u from the secondary input signal J. To this end, the secondary neural network 17 uses the same user's speaker embedding as the neural network 8. In contrast to the neural network 8, the secondary neural network 17 does not return the user voice signal u, but the remaining audio signals contained in the secondary input signal J which do not correspond to the user voice signal u. The secondary neural network 17 removes the user voice signal u from the secondary input signal J. In other words, the secondary neural network 17 calculates the relative complement of the user voice signal u in the secondary input signal J, i.e. J−u. The secondary neural network 17 is further configured to denoise the secondary input signal J. In other words, the secondary neural network 17 filters noise and the user voice signal u from the secondary input signal J. The output of the secondary neural network 17 hence is the denoised relative complement of the user voice signal u, i.e. a denoised version of the audio signal J−u. The secondary output signal P comprises the output of the secondary neural network 17.

    [0080] The secondary neural network 17 can perform more advanced operations on the secondary input signal J than the neural network 8 performs on the input signal I. Hence, the secondary neural network 17 requires more computational power. This is possible because the secondary device 3 is not subject to the same constraints on computational capability and power supply capacity as the hearing device 2. Hence, the secondary device 3 is able to run the more complex secondary neural network 17.

    [0081] Any suitable network architecture can be used for the secondary neural network 17. An exemplary secondary neural network is a long short-term memory (LSTM) network with four layers. Per layer, the secondary neural network may comprise 300 units. In other embodiments, the secondary audio processing unit 16 may comprise more than one secondary neural network 17. In these embodiments, different ones of the secondary neural networks 17 may be specialized for different purposes. For example, one of the secondary neural networks 17 may be configured to remove the user voice signal u from the secondary input signal J. One or more different secondary neural networks may be specialized for denoising specific kinds of audio signals, for example voices, music and/or traffic noise.
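    As a hedged sketch of the processing chain of paragraph [0079], the secondary output P can be thought of as the result of estimating the user voice in J, subtracting it to form the relative complement J−u, and denoising the remainder. The separator and denoiser models, and the spectrogram representation, are assumptions of this sketch; it reuses the VoiceSeparator interface from the sketch above.

```python
def secondary_output(spec_J, speaker_embedding, separator, denoiser):
    # separator: e.g. a 4-layer, 300-unit variant of VoiceSeparator;
    # denoiser: any model mapping a spectrogram to its denoised version.
    u_est = separator(spec_J, speaker_embedding)  # user voice estimate in J
    remainder = spec_J - u_est                    # relative complement J - u
    return denoiser(remainder)                    # denoised (J - u), i.e. P
```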

    [0082] The secondary audio processing unit 16 further comprises a calibration neural network 18. The calibration neural network 18 is configured to calibrate the neural network 8 and the secondary neural network 17. The calibration neural network 18 calculates the user's speaker embedding needed to identify the user voice signal u. To this end, the calibration neural network 18 receives a calibration input signal containing information about the user's voice characteristics. In particular, the calibration neural network 18 uses Mel Frequency Cepstral Coefficients (MFCC), as well as two derivatives thereof, computed from examples of the user's voice. The calibration neural network 18 returns the user's speaker embedding, which is used as an input variable by the neural network 8 as well as by the secondary neural network 17.

    [0083] Any suitable architecture can be used for the calibration neural network 18. An exemplary calibration neural network 18 is a long short-term memory (LSTM) network with three layers and 256 units per layer.
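    The following sketch illustrates, under stated assumptions, how such a calibration input could be computed and evaluated: MFCCs of the voice samples plus their first and second derivatives (one reading of "two derivatives") are fed to a three-layer LSTM with 256 units per layer, whose time-averaged output serves as the speaker embedding. The frame parameters and the averaging step are illustrative assumptions, not details of the disclosure.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def calibration_features(samples, sr=16000, n_mfcc=20):
    # MFCCs plus two derivatives (interpreted here as delta and delta-delta
    # features) of the recorded voice samples, cf. paragraph [0082].
    mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(mfcc, order=1)
    d2 = librosa.feature.delta(mfcc, order=2)
    feats = np.concatenate([mfcc, d1, d2], axis=0)          # (3*n_mfcc, frames)
    return torch.from_numpy(feats.T).float().unsqueeze(0)   # (1, frames, 3*n_mfcc)

# Three layers, 256 units per layer, matching paragraph [0083].
calibration_lstm = nn.LSTM(60, 256, num_layers=3, batch_first=True)

def speaker_embedding(feats):
    h, _ = calibration_lstm(feats)
    return h.mean(dim=1)   # time-averaged (1, 256) embedding for networks 8 and 17
```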

    [0084] The secondary neural network 17 and the calibration neural network 18 are run on the secondary audio processing unit 16. In the shown embodiment, the secondary audio processing unit 16 comprises two secondary arithmetic units 19, on which the secondary neural network 17 and the calibration neural network 18 can be run respectively. In the shown embodiment, the secondary arithmetic units 19 are AI chips of the secondary device 3. In alternative embodiments, the secondary neural network 17 and the calibration neural network 18 can be run on the same arithmetic unit. In such embodiments, the secondary audio processing unit 16 can be comprised of a single arithmetic unit.

    [0085] The secondary device 3 further comprises a user interface 20. The user interface 20 of the secondary device 3 is a touchscreen of the mobile phone. Via the user interface 20, information about the audio processing on the hearing device 2 and the secondary device 3 is presented to the user U. Further, the user U can influence the audio processing, e.g. by setting preferences and changing operation modes. For example, the user U can set the degree of denoising and/or the amplification of the output signal O.

    [0086] The secondary device 3 comprises secondary device sensors 21. The secondary device sensors 21 collect user data. The audio processing can be adapted based on the user data. For example, the audio processing can be adapted to the position and/or movement of the user U. In embodiments with several secondary neural networks 17, the user data can, for example, be used to select one or more of the secondary neural networks 17 which are best adapted to the surroundings of the user U.

    [0087] In the shown embodiment, the hardware of the secondary device 3 is the usual hardware of a modern mobile phone. The functionality of the secondary device 3, in particular the functionality of the secondary audio processing unit 16, is provided by software, in particular an app, which is installed on the mobile phone. The software comprises the secondary neural network 17 as well as the calibration neural network 18. Further, the software provides a graphical interface displayed to the user U via the user interface 20.

    [0088] With reference to FIG. 2 the general method of processing audio signals with the system 1 is explained. The method can be stored on a non-transitory computer-readable medium and a processor or processors can access the non-transitory computer-readable medium to execute the method. The non-transitory computer-readable medium can be stored on one or multiple devices. In a provision step 25, the system 1 is provided. That is, in the provision step 25 the hearing device 2 and the secondary device 3 are provided. For example, the user U can purchase the hearing device 2 and install a corresponding app on his mobile phone. Alternatively, the user U may purchase the hearing device 2 together with a corresponding secondary device 3.

    [0089] After the provision step 25, the system 1 is calibrated in a calibration step 26. In the calibration step 26, the calibration neural network 18 is used to calibrate the neural network 8 on the hearing device 2 as well as the secondary neural network 17 on the secondary device 3. Samples of the user's voice are recorded using the secondary recording unit 15. The secondary audio processing unit 16 calculates the Mel Frequency Cepstral Coefficients (MFCC) as well as two derivatives from the samples of the user's voice. The calibration neural network 18 evaluates the calculated Mel Frequency Cepstral Coefficients and the derivatives to calculate the user's speaker embedding. The calculated user's speaker embedding is provided to the secondary neural network 17. The calculated user's speaker embedding is transferred to the hearing device 2, in particular to the neural network 8, via the data connection 13.

    [0090] The samples of the user's voice are recorded for a given amount of time, e.g. between 5 seconds and 30 minutes. For example, the samples may be recorded for about 3 minutes. The more samples that are recorded, i.e. the longer the recording time, the more precise the calibration becomes. In the shown embodiment, the calibration is performed once, when the user U starts to use the system 1. In other embodiments, the calibration step 26 can also be repeated at later times, in order to gradually improve the user's speaker embedding and therefore the quality of the separation of the user voice signal u from the input signal I and the secondary input signal J, respectively.

    [0091] The calibrated system can be used for audio processing by the user in an audio processing step 27. In the audio processing step 27, the hearing device 2 is used to generate the output signal O which is played back to the user U. The system 1 provides different operation modes for the audio processing step 27. In the FIGS. 3 to 6, several different operation modes for the audio processing step 27 are described in detail.

    [0092] A first operation mode 28, which is shown in FIG. 3, involves the hearing device 2 and the secondary device 3. For the sake of clarity, the hearing device 2 is represented by a broken line surrounding all steps performed by the hearing device 2. Similarly, all steps performed by the secondary device 3 are enclosed by a broken line symbolizing the secondary device 3.

    [0093] Suppose that the user U is in an environment with ambient sound S. The ambient sound S is recorded as the input signal I by the recording unit 5 of the hearing device 2 in an input recording step 30. The input signal I may comprise the user voice signal u and further audio signals marked with the letter R. The audio signals R are the relative complement of the user voice signal u in the input signal I: R=I−u. At the same time, the ambient sound S is recorded by the secondary recording unit 15 of the secondary device 3 in the form of a secondary input signal J in a secondary input step 31. The secondary input signal J mainly coincides with the input signal I, e.g., it may contain the user voice signal u and the further audio signals R. Possible differences between the input signal I and the secondary input signal J may be caused by the different positions of the recording unit 5 and the secondary recording unit 15 and/or their different recording quality.

    [0094] In the following, the input signal I and the secondary input signal J are processed in parallel in the hearing device 2 and the secondary device 3. The secondary input signal J is passed to the secondary audio processing unit 16 for a secondary output signal determination step 32. In the secondary output signal determination step 32, the secondary neural network 17 removes the user voice signal u from the secondary input signal J in a user voice signal removal step 33. The remaining audio signals R are denoised in a denoising step 34 using the secondary neural network 17. In other embodiments, the user voice signal removal step 33 and the denoising step 34 can be executed in parallel by the secondary neural network 17. In further embodiments, the user voice signal removal step 33 and the denoising step 34 can be subsequently performed by two different secondary neural networks.

    [0095] The denoised remaining audio signals are transmitted as the secondary output signal P to the hearing device 2 in a transmission step 35.

    [0096] The audio processing unit 6 of the hearing device 2 performs an output signal determination step 36. In the output signal determination step 36, the neural network 8 is used to separate the user voice signal u from the input signal I in a user voice signal separation step 37. After the user voice signal separation step 37, the user voice signal u is combined, in a combination step 38, with the secondary output signal P which has been received from the secondary device 3. In the combination step 38, the user voice signal u and the denoised secondary output signal P can be mixed with varying amplitudes in order to adapt the output signal O to the preferences of the user U. The output signal O contains the user voice signal u and the secondary output signal P. The output signal O is transferred to the playback unit 7. The output signal O is played back to the user U in the form of the processed sound S′ in a playback step 39.

    [0097] Since the user voice signal u and the secondary output signal P can be amplified before being combined, the user U can choose how loud the user voice signal u is with respect to the remaining audio signals R. In particular, the user U can choose that the user voice signal u is not played back to him at all.
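    A minimal sketch of this combination step, assuming that u and P are time-aligned sample buffers of equal length (in practice P arrives with higher latency and must be buffered, cf. paragraph [0100]):

```python
def combine(u, P, gain_u=1.0, gain_P=1.0):
    # Weighted mix of the locally separated user voice u and the received
    # secondary output P (combination step 38). gain_u = 0.0 corresponds to
    # the user's own voice not being played back at all. Gains are
    # illustrative user preferences, not values from the disclosure.
    return gain_u * u + gain_P * P
```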

    [0098] In the above-described operation mode 28 of the audio processing step 27, the user voice signal u as well as the rest of the audio signals R are processed by neural networks, i.e. the neural network 8 and the secondary neural network 17, respectively. Processing the user voice signal u directly on the hearing device 2 has the advantage that the processed user voice signal u does not have to be transferred from the secondary device 3 to the hearing device 2.

    [0099] Hence, the user voice signal u can be processed and played back to the user with low latency. This avoids disturbing echo effects, which occur when the user hears both his own voice and a delayed processed version of it. At the same time, the rest of the audio signals R are denoised using the secondary neural network 17 on the secondary device 3, which ensures optimum quality of the output signal O and the processed sound S′. Processing the rest of the audio signals R on the secondary device 3 requires transmitting the secondary output signal P from the secondary device 3 to the hearing device 2. This increases the latency with which the rest of the audio signals R are played back to the user. However, since the echo effect is less pronounced for audio signals which do not correspond to the user's voice, the increased latency of the playback of the rest of the audio signals does not disturb the user.

    [0100] In this regard, it is important to mention that the audio processing step 27 is a continuous process in which the input signal I and the secondary input signal J are continuously recorded and processed. Due to the lower latency of the processing of the user voice signal u, the processed user voice signal u is combined with a secondary output signal P which corresponds to audio signals R which have been recorded slightly earlier than the user voice signal u.

    [0101] In total, the latency, with which the user voice signal u is played back to the user, is 50 ms or less, in particular 25 ms or less, in particular 20 ms or less, in particular 15 ms or less, in particular 10 ms or less.

    [0102] In the operation mode 28 shown in FIG. 3 the user voice signal u is combined with the secondary output signal P from the secondary device 3. The audio processing unit does not process further audio signals. Hence, the classical audio signal processing means 9 are deactivated in the operation mode 28.

    [0103] With reference to FIGS. 4 to 6, alternative operation modes are described. In the alternative operation modes, the hearing device 2 determines the output signal O independently of the secondary device 3. These operation modes hence do not require the data connection 13 and can be used when the data connection 13 is lost. This is particularly advantageous when the secondary device 3 is switched off or low on battery. Hence, the alternative operation modes can be used to save battery on the secondary device 3.

    [0104] FIG. 4 shows an alternative operation mode 28a for the audio processing step 27. Components and steps which have already been discussed with reference to FIG. 3 have the same reference numbers and are not discussed in detail again. The operation mode 28a differs from the operation mode 28 shown in FIG. 3 in how the output signal O is determined by the audio processing unit 6 in the output determining step 36a.

    [0105] In the output determining step 36a, the input signal I is duplicated. One duplicate of the input signal I is processed in the user voice signal separation step 37 by the neural network 8. The user voice signal separation step 37 returns the user voice signal u in high quality. In parallel, the other duplicate of the input signal I is classically denoised in a classical denoising step 40 using the classical audio signal processing means 9. The denoised input signal I′ is combined with the user voice signal u in a combination step 38a. The output signal O hence contains the user voice signal u and the classically denoised input signal I′. In operation mode 28a, the neural network 8 and the classical audio signal processing means 9 are run in parallel by the audio processing unit 6. Note that the output signal O contains both the high-quality user voice signal u and the entire classically denoised input signal I′, which itself contains the user voice signal u at lower quality.

    [0106] FIG. 5 shows an alternative operation mode 28b of the audio processing step 27. Components and steps which have already been discussed with reference to FIG. 3 or 4 have the same reference numbers and are not discussed in detail again. Operation mode 28b differs from operation mode 28a only in the way the output signal determination step 36b is performed. The input signal I is duplicated. In the user voice signal separation step 37, the user voice signal u is separated from one of the duplicates of the input signal I. In a subtraction step 41, the separated user voice signal u is subtracted from the other duplicate of the input signal I. The subtraction step 41 returns the remaining audio signals R. The remaining audio signals R are then denoised in a classical denoising step 40b using the classical audio signal processing means 9. The denoised remaining audio signals R form the output signal O, which is played back to the user U. In operation mode 28b, the user voice signal u is not played back to the user U, which is particularly advantageous for people who can hear their own voice and do not need reproduction of their own voice by the hearing device 2. Using the neural network 8 ensures that the output signal O and the processed sound S′ contain no fragments of the user voice signal u which might distract the user U.

    [0107] In FIG. 6, a further alternative operation mode 28c for the audio processing step 27 is shown. Components and steps which have already been discussed with reference to one of FIGS. 3 to 5 have the same reference numbers and are not discussed in detail again. The operation mode 28c of FIG. 6 differs from the previously described operation modes only in the way the output signal O is determined in an output signal determining step 36c. In the output signal determining step 36c, the input signal I is duplicated. The user voice signal u is separated from one duplicate of the input signal I in the user voice signal separation step 37. The user voice signal u obtained in the user voice signal separation step 37 is duplicated. One duplicate of the user voice signal u is subtracted from the other duplicate of the input signal I in the subtraction step 41, resulting in the remaining audio signals R. The remaining audio signals R are classically denoised in the classical denoising step 40b. The denoised remaining audio signals R are combined in a combination step 38c with the user voice signal u. The resulting output signal O comprises the user voice signal u as well as the classically denoised remaining audio signals R. The operation mode 28c has the advantage that the user voice signal u is played back to the user with high quality together with classically denoised audio signals.
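    The three stand-alone operation modes of FIGS. 4 to 6 can be summarized in a compact sketch, assuming time-aligned sample buffers, with separate() standing for the neural network 8 and denoise() for the classical audio signal processing means 9; the function names are illustrative only:

```python
def mode_28a(I, separate, denoise):
    u = separate(I)            # user voice signal separation step 37
    return u + denoise(I)      # combination step 38a: u plus denoised I'

def mode_28b(I, separate, denoise):
    u = separate(I)
    return denoise(I - u)      # subtraction step 41, then denoising step 40b

def mode_28c(I, separate, denoise):
    u = separate(I)
    return u + denoise(I - u)  # combination step 38c: u plus denoised R
```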

    [0108] In another operation mode, which is not shown in the figures, the output signal determination step 36 is performed without using the neural network 8. The neural network 8 may be temporarily deactivated, e.g., when the input signal I does not comprise the user voice signal u. In such use cases, the neural network 8 is deactivated and the input signal I is simply processed by the classical audio signal processing means 9. This operation mode might also be used to save energy, in particular when the charging state of the power supply 4 is low.

    [0109] In a variant of the above-described operation modes, the output signal determination step comprises an additional pre-processing step for pre-processing the input signal I. In the pre-processing step, the hearing device 2 can use sensor data of the sensor 10 in order to measure whether the user voice signal u is present. To do so, the sensor 10 measures vibrations caused by the user U speaking. Alternatively, the presence of the user voice signal u can be measured using the relative loudness of the user's voice with respect to other audio signals.

    [0110] The different operation modes can be chosen by the user U, e.g., by a command input via the user interface 20. This way the user can choose whether he wants his own voice to be played back to him or not. Further, the user can choose with which quality the remaining audio signals R are denoised, in particular whether the remaining audio signals R are denoised using the secondary neural network 17 of the secondary device 3 or the classical audio signal processing means of the hearing device 2.

    [0111] The system 1 can also automatically change between the different operation modes. For example, the hearing device 2 will automatically use one of the operation modes 28a, 28b, 28c discussed with reference to FIGS. 4 to 6 when the data connection 13 to the secondary device 3 is lost. Also, the secondary device 3 can trigger a change in the operation modes. For example, when the secondary input signal J does not contain a lot of noise, the denoising using the secondary neural network 17 might not be needed. Hence, the secondary device 3 may monitor how much noise is found in the secondary input signal J. If the amount of noise is found to be low, the secondary device 3 may send a command to the hearing device 2 to switch to one of the operation modes shown in FIGS. 4 to 6. The command can be part of the secondary output signal P. The secondary device 3 may monitor the amount of noise in the secondary input signal J from time to time to evaluate whether the amount of noise has changed. If the amount of noise increases, the secondary device 3 may send a command to the hearing device 2 to initiate the operation mode 28 shown in FIG. 3.
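    One possible, purely illustrative sketch of such noise monitoring uses a simple RMS level with a fixed threshold; the actual noise measure and threshold value are assumptions and are not specified in the disclosure:

```python
import numpy as np

def noise_estimate(J_block):
    # Crude proxy for the noise level: RMS of a block of the secondary input
    # signal J. A better estimate could compare the block with its denoised
    # version; this simple measure is an assumption of the sketch.
    return float(np.sqrt(np.mean(J_block ** 2)))

def mode_command(noise, threshold=0.01):
    # Below the threshold, the secondary device asks the hearing device to
    # switch to a stand-alone mode (FIGS. 4 to 6); above it, to mode 28.
    return "standalone" if noise < threshold else "mode_28"
```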

    [0112] In further embodiments which are not shown in the figures, the system comprises more than one hearing device, in particular two hearing devices.