Hearing device system and method for processing audio signals
11832058 · 2023-11-28
Assignee
Inventors
CPC classification
H04R2225/39
ELECTRICITY
International classification
Abstract
A hearing device system and a method for processing audio signals are described. The hearing device system has at least one hearing device having a recording device for recording an input signal, at least one neural network for separating at least one audio signal from the input signal and a playback device for playing back an output signal ascertained from the at least one audio signal. A calibration device is connected to the at least one hearing device in a data-transmitting manner. The at least one neural network is customizable and/or replaceable by the calibration device.
Claims
1. A hearing device system for processing audio signals, the hearing device system having at least one hearing device comprising: a recording device for recording an input signal; at least one neural network for separating at least one audio signal from the input signal; and a playback device for playing back an output signal ascertained from the at least one audio signal; and a calibration device connected to the at least one hearing device in a data-transmitting manner, wherein: the at least one neural network is replaceable by the calibration device; and the calibration device has at least one neural calibration network for analyzing a calibration signal.
2. The hearing device system according to claim 1, wherein the replacement of the at least one neural network renders a structure of the at least one neural network as customizable.
3. The hearing device system according to claim 1, wherein the calibration device and the at least one hearing device are connected by means of a wireless data connection.
4. The hearing device system according to claim 1, wherein the calibration device is included as part of a mobile phone or part of a wireless microphone.
5. The hearing device system according to claim 1, wherein the at least one neural network is selectable from a plurality of different neural networks by means of the calibration device.
6. The hearing device system according to claim 5, wherein neural networks included in the plurality of different neural networks are transmittable from the calibration device to the at least one hearing device.
7. The hearing device system according to claim 1, wherein the calibration device has a calibration recording device for recording audio data as part of the calibration signal.
8. The hearing device system according to claim 1, wherein the calibration device has a user interface for at least one of receiving user inputs or outputting information to a user.
9. A method for processing audio signals, the method having the steps of: providing a hearing device system having at least one hearing device having at least one neural network for separating at least one audio signal from an input signal, and a calibration device connected to the at least one hearing device in a data-transmitting manner; providing a calibration signal; analyzing the calibration signal by means of the calibration device; at least one of replacing or customizing the at least one neural network of the at least one hearing device by means of the calibration device on the basis of the analyzed calibration signal; recording an input signal by using a recording device of the at least one hearing device; separating at least one audio signal from the input signal by using the at least one neural network of the at least one hearing device; ascertaining an output signal from the at least one audio signal; and outputting the output signal by using a playback device of the at least one hearing device.
10. The method according to claim 9, wherein the analysis of the calibration signal is effected by using at least one neural calibration network.
11. The method according to claim 9, wherein the calibration device selects the at least one neural network from a plurality of available neural networks.
12. The method according to claim 11, wherein the selected neural network is transmitted from the calibration device to the at least one hearing device.
13. The method according to claim 9, wherein the calibration device conveys operating parameters for the at least one neural network to the at least one hearing device.
14. The method according to claim 9, wherein the calibration signal comprises at least one of audio data, sensor data, user-specific data or system parameters of the hearing device system.
15. The method according to claim 9, wherein the calibration device records audio data as part of the calibration signal.
16. The method according to claim 9, wherein a user can influence the at least one of the replacing or the customizing of the at least one neural network.
Description
(1) Further details, features and advantages of the inventive technology are obtained from the description of an exemplary embodiment with reference to the figures, in which:
(5) The hearing devices 2 each have a recording device 5 in the form of a microphone. The recording device 5 can be used by the hearing devices 2 to record an input signal E in the form of audio data. The input signal E normally comprises a plurality of audio signals. In addition, the hearing devices 2 each have a playback device 6 in the form of a loudspeaker for playing back an output signal. The hearing devices 2 each have a neural network 7. The neural network 7 is used to separate at least one audio signal from the input signal E. The neural network 7 is an artificial neural network that, in the exemplary embodiment shown, is executed by a computing unit 8 of the respective hearing device 2. The computing unit 8 is not depicted in detail and has a processor, in particular an AI chip, and a main memory.
(6) In addition, the hearing devices 2 each have a data interface 9 for the wireless data connection 4. In the exemplary embodiment shown, the data interface 9 is a Bluetooth antenna. The calibration device 3 also has a corresponding data interface 9.
(7) The hearing devices 2 each have a power supply 10 in the form of a storage battery. The power supply 10 supplies the respective hearing device 2, in particular the recording device 5, the computing unit 8 having the neural network 7, the playback device 6 and the data interface 9, with power for operating the respective hearing device 2.
(8) During operation, the hearing devices 2 perform signal processing. This involves the input signal E being recorded by using the respective recording device 5. The neural network 7 separates at least one audio signal from the input signal E. An output signal A is ascertained from the separated audio signals, said output signal being played back by using the playback device 6. The recording, processing and playback of audio signals is therefore effected in the hearing devices 2 without said audio signals needing to be conveyed to external devices. The latency of the signal processing from recording through to playback is minimized as a result.
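The on-device signal chain described above (record, separate, ascertain the output signal, play back) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and function names are assumptions, and the toy separator merely stands in for the neural network 7.

```python
class HearingDevicePipeline:
    """Illustrative sketch of the on-device signal chain of paragraph (8);
    names and the mixing rule are assumptions, not taken from the patent."""

    def __init__(self, separate_fn):
        # separate_fn stands in for the neural network 7: it maps the
        # input signal E to a list of separated audio signals.
        self.separate_fn = separate_fn

    def process(self, input_signal):
        # Separation step: extract at least one audio signal from E.
        audio_signals = self.separate_fn(input_signal)
        # Ascertain the output signal A by summing the separated signals
        # sample by sample (a deliberately simple mixing rule).
        return [sum(samples) for samples in zip(*audio_signals)]


def toy_separator(mix):
    # Toy stand-in: "separate" the mix into two equal half-amplitude parts.
    return [[s * 0.5 for s in mix], [s * 0.5 for s in mix]]


pipeline = HearingDevicePipeline(toy_separator)
out = pipeline.process([1.0, 1.0, 1.0, 1.0])
```

Because everything runs locally, no round trip to an external device is needed, which is what keeps the latency low.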
(9) The physical properties of the hearing devices 2, in particular the small size thereof, mean that the capacity of the storage battery 10 and the computing power of the computing unit 8 are limited. This limits the processability of the input signal E. In order to allow high-quality processing of the input signal E and customization of the output signal A even when the capacity of the storage battery 10 and the computing power of the computing unit 8 are low, the neural network 7 is customized to the input signal E and/or the audio signals to be separated therefrom. The neural network 7 specialized in this manner can be operated with low computing power and with low power consumption. In order to ensure the specialization for different instances of application, the neural network 7 is customizable and/or replaceable by using the calibration device 3, as will be explained below. The customizability and/or replaceability of the neural network 7 ensures reliable processing of the input signal E even under changing conditions.
(10) The calibration device 3 is a mobile device. In the exemplary embodiment shown, the calibration device 3 is in the form of a mobile phone or smartphone. This means that the calibration device 3 has the hardware of a commercially available mobile phone, software designed for calibrating the hearing devices 2 being installed and executable on the mobile phone. The software can be loaded onto the mobile phone in the form of an app, for example. Established mobile phones have a high level of computing power. Such mobile phones can thus be used to effect complex analysis of a calibration signal. Commercially available mobile phones moreover regularly have an AI chip that can be used to execute neural networks efficiently.
(11) The calibration device 3 has a power supply 11 in the form of a storage battery. Storage batteries of established mobile phones have a high charging capacity. The calibration device 3 therefore has a long storage battery operating time.
(12) The calibration device 3 has a calibration recording device 13. The calibration recording device 13 is used to record audio data as a calibration input signal K. The calibration recording device 13 has at least one microphone of the mobile phone. Established mobile phones regularly have multiple microphones. The calibration recording device 13 can make use of a plurality of microphones if need be, in order to record the calibration input signal K using multiple channels, for example as a stereo signal. As a result, in particular spatial information is ascertainable by means of the calibration input signal K.
(13) The calibration device 3 has a signal processing unit 12. By using the signal processing unit 12, a calibration signal, in particular the calibration input signal K, is analysable, as will be described in detail below. On the basis of the analysis of the calibration input signal K, the calibration device 3 ascertains the neural network 7, and/or the operating parameters thereof, most suited to processing the input signal E. The neural network 7 and the operating parameters thereof are conveyed to the hearing devices 2 by the calibration device 3 by means of the wireless data connection 4.
(14) The calibration device 3 has a data memory 14. The data memory 14 stores a multiplicity of different neural networks 7, 7a, 7b, three of which are shown in exemplary fashion in
(15) By means of the customization and/or selection of the neural network 7, it is in particular influenceable which audio signals are separated from the input signal E. By way of example, a neural network 7 may specialize in detecting human voices and separating them from the audio signal. The neural network 7 may additionally or alternatively also specialize in the respective type of input signal. By way of example, different neural networks 7 can be used for separating human voices in a restaurant or when on the road. The operating parameters can be used to stipulate the selection of the audio signals to be separated even more accurately. By way of example, a description of three specific voices of speakers with whom the user is conversing can be handed over to the hearing devices 2 as part of the operating parameters. From a large set of human voices, the neural network 7 then separates only those voices that are accordant with the description handed over. The operating parameters can also be used to perform prioritization for the audio signals separated from the input signal E. As such, it is possible to stipulate for example that individual audio signals are amplified or rejected.
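The role of the operating parameters described above — restricting separation to the described voices and weighting each one — could look like the following outline. The descriptor matching by mean amplitude is purely an assumption for illustration; a real system would use a proper voice description, which the patent does not specify.

```python
def apply_operating_parameters(separated, descriptions, priorities, tol=0.1):
    """Keep only the separated signals whose toy 'voice description'
    (here simply the mean absolute amplitude, a stand-in for a real
    voice descriptor) matches a handed-over description, and scale
    each kept signal by its associated priority parameter."""
    kept = []
    for sig in separated:
        feature = sum(abs(s) for s in sig) / len(sig)
        for desc, prio in zip(descriptions, priorities):
            if abs(feature - desc) < tol:
                # Amplify or reject the voice according to its priority.
                kept.append([s * prio for s in sig])
                break
    return kept


signals = [[0.2] * 4, [0.5] * 4, [0.9] * 4]
# Only the first and third signals match the handed-over descriptions;
# the second (e.g. an uninvolved restaurant guest) is dropped.
kept = apply_operating_parameters(signals, descriptions=[0.2, 0.9],
                                  priorities=[2.0, 0.5])
```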
(16) The signal processing unit 12 is moreover connected to further sensors 15 of the mobile phone. Exemplary sensors are GPS sensors and/or motion sensors. The sensor data S ascertained by the sensors 15 are usable in addition or as an alternative to the calibration input signal K as calibration signals for analysing and ascertaining the best-suited neural network 7 and the operating parameters thereof.
(17) The analysis of the calibration input signal K and/or the sensor data S can be effected in different ways by using the signal processing unit 12. The specific type of analysis is not significant for the functional separation of calibration and signal processing. In the exemplary embodiment depicted, the signal processing unit 12 has at least one neural calibration network 16. The signal processing unit 12 has a computing unit, not shown more specifically, of the mobile phone. The signal processing unit 12 has an AI chip, in particular. The AI chip has a computing power of, for example, at least two teraflops, in particular at least five teraflops. The neural calibration network 16 is used to separate individual audio signals from the calibration input signal K. As a result, the calibration input signal K and in particular the audio signals contained therein that are relevant to the user are ascertainable. It is therefore possible for the neural network 7 best suited to separation by using the hearing device 2 to be ascertained on the basis of the analysis of the calibration input signal K that is performed by said neural network.
(18) The signal processing unit 12 also has a user interface 17 connected to it. In the case of the calibration device 3 in the form of a mobile phone, the user interface 17 is formed by a touchscreen. The user interface 17 can be used to display information about the hearing device system 1, in particular about the audio signals separated from the calibration input signal K, to the user. The user can use the user interface 17 to influence the replacement and/or customization of the neural network 7 by the calibration device 3. Depending on user inputs, for example other operating parameters and/or another of the neural networks 7, 7a, 7b can be conveyed to the hearing devices 2 in order to ensure signal processing by the hearing devices 2 that is consistent with the user preferences.
(19) User-specific data 18 resulting from earlier user inputs and/or previously analysed calibration input signals K can be stored in the data memory 14. The signal processing unit 12 can save the user-specific data 18 on the data memory 14 and retrieve and analyse them as part of the calibration signal. User-specific data 18 can contain for example information pertaining to preferences and/or needs of the user, for example a preset that specific types of audio signals are supposed to be amplified or rejected.
(20) The calibration device 3 has a further data interface 19. The data interface 19 is used to make a data connection 20 to an external memory 21. The external memory 21 can be a cloud memory. The data interface 19 is in particular a mobile phone network or W-LAN data interface. The cloud memory 21 can be used to mirror the data from the data memory 14. This has the advantage that the user can replace the calibration device 3 without the user-specific data 18 being lost. A further advantage of the connection to the cloud memory 21 is that the cloud memory 21 can also be used to store an even larger number of neural networks 7, 7a, 7b, so that neural networks 7, 7a, 7b optimally customized to the situation can be loaded onto the hearing devices 2 by means of the calibration device 3 as required. The data interface 19 can also be used to load updates for the hearing device system 1, in particular the calibration device 3 and the hearing devices 2.
(21) Referring to
(22) The steps used when using the hearing device system 1 are discussed below. In this case, the steps are associated with the calibration device 3 and the hearing devices 2. For clarification purposes, the respective devices are indicated as dashed borders around the respective associated method steps. First of all, a calibration recording step 25 involves the soundscape G of the restaurant 22 being recorded as a calibration input signal K by using the calibration recording device 13 and being handed over to the signal processing unit 12. The soundscape G and hence also the calibration input signal K normally comprise an unknown number of different audio signals. In the exemplary embodiment shown, the calibration input signal K comprises the spoken voice f.sub.i associated with the three friends F.sub.i and also the background noise b.
(23) The signal processing unit 12 is used to analyse the calibration input signal K in an analysis step 26. To this end, a calibration separation step 27 first of all involves multiple audio signals that the calibration input signal K contains being separated from the latter. In the exemplary embodiment depicted, the voice data f.sub.i associated with the friends F.sub.i and the background noise b corresponding to a remainder signal are separated from the calibration input signal K. The separation is effected in the calibration separation step 27 by using the at least one neural calibration network 16.
(24) The calibration separation step 27 can comprise a preparation step, not depicted more specifically, for conditioning the calibration input signal K. The preparation step can comprise conventional conditioning, for example. The conventional conditioning can involve for example direction information ascertained on the basis of multiple microphones being ascertained and used for normalizing the sounds. Moreover, the preparation step can involve a first neural calibration network being used to condition the calibration input signal K. An exemplary preparation step can be consistent for example with the preparation step described with reference to FIG. 3 in DE 10 2019 200 954.9 and DE 10 2019 200 956.5.
(25) The conditioned calibration input signal K can be broken down into individual audio signals by a second neural calibration network in a particularly simple and efficient manner, for example. The actual separation following the preparation step can be effected using one or more second neural calibration networks. Different second neural calibration networks may be customized for separating different audio signals. Separation using multiple second neural calibration networks can be effected for example as described with reference to FIG. 4 in DE 10 2019 200 954.9 and DE 10 2019 200 956.5.
(26) The calibration separation step 27 is followed by a classification step 28. The classification step 28 involves the audio signals f.sub.i, b separated from the calibration input signal K being evaluated. On the basis of the evaluation, the calibration device 3 detects that the user is in the restaurant. The classification step 28 can therefore limit the selection of available neural networks 7, 7a, 7b to networks specializing in the separation of audio signals from a soundscape typical of restaurants.
(27) The classification step can alternatively or additionally also involve sensor data S ascertained in a sensor reading step 29 being used. By way of example, the GPS position of the user can be used to ascertain the presence of said user in the restaurant 22. The motion profile of the user can be used to detect that said user is not moving, that is to say is staying in the restaurant. Furthermore, it is also possible for other location-specific data, such as for example a W-LAN access point associated with the restaurant 22, to be used for determining the whereabouts of the user and for selecting the suitable neural network 7 from the available set of neural networks 7, 7a, 7b.
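The sensor-based classification described above can be stated as a few simple rules. This is a deliberately coarse sketch; the thresholds, scene labels, and the idea of a venue lookup table keyed by W-LAN access point are all invented for illustration, since the patent only names the cues (GPS position, motion profile, W-LAN access point).

```python
def classify_surroundings(gps_speed_mps, wlan_ssid=None, known_venues=None):
    """Combine motion and location cues into a coarse scene label that
    can then restrict the set of candidate neural networks 7, 7a, 7b.
    Thresholds and labels are illustrative assumptions."""
    known_venues = known_venues or {}
    if wlan_ssid in known_venues:
        # A W-LAN access point associated with a known venue, such as
        # the restaurant 22, pins down the user's whereabouts directly.
        return known_venues[wlan_ssid]
    if gps_speed_mps < 0.5:
        # The motion profile shows the user is not moving, i.e. staying put.
        return "stationary_indoor"
    return "on_the_road"


venues = {"TrattoriaGuestWifi": "restaurant"}
scene = classify_surroundings(0.1, wlan_ssid="TrattoriaGuestWifi",
                              known_venues=venues)
```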
(28) To determine the neural network 7 to be used, the audio signals f.sub.i, b separated from the calibration input signal K are moreover analysed and matched to user preferences and/or user specifications. By way of example, the audio signals f.sub.i corresponding to the friends F.sub.i can be identified as speakers known to the user. To this end, the applicable audio signals f.sub.i can be matched against voice signals already detected and used previously. The speakers known to the user may be stored as user-specific data in the data memory 14 and/or the cloud memory 21 and can be matched to the separated audio signals f.sub.i in a data matching step 32. In the classification step 28, the system therefore automatically detects the audio signals f.sub.i important to the user. The system can therefore automatically detect that a neural network 7 is needed that can separate three audio signals corresponding to the voices of the friends F.sub.i from the soundscape G typical of restaurants.
(29) The analysis step 26 is followed by a calibration step 30. The calibration step 30 involves the neural network 7 ascertained on the basis of the evaluation of the calibration input signal K being loaded from the data memory 14 or from the cloud memory 21 and transmitted from the calibration device 3 to each of the hearing devices 2. The neural network 7 is also used to transmit operating parameters V.sub.i, which are also referred to as vectors, to the hearing devices 2. The neural network 7 and the operating parameters V.sub.i together form a transmission signal (7, V.sub.i) that is transmitted from the calibration device 3 to the hearing devices 2. The operating parameters V.sub.i convey information pertaining to each of the audio signals to be subsequently separated by means of the neural network 7. In the instance of application depicted, the vectors V.sub.i each contain a description of the voice of the applicable friend F.sub.i and an associated priority parameter. The description of the respective voices is used to ensure that the neural network separates only the voices of the friends and not the voices of other restaurant guests B from the soundscape G. The respective priority parameters indicate the factor by which the respective audio data f.sub.i are each supposed to be amplified.
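The transmission signal (7, V.sub.i) described above — the selected network together with one vector per voice to separate — might be packaged as in the following sketch. The field names and the use of a network identifier in place of the full network weights are assumptions made for illustration.

```python
def build_transmission_signal(network_id, voice_descriptions, priorities):
    """Bundle the selected neural network (represented here only by an
    identifier) with one operating-parameter vector V_i per voice to
    separate: a description of the voice plus its priority parameter,
    which indicates the amplification factor for that voice."""
    vectors = [{"voice": v, "priority": p}
               for v, p in zip(voice_descriptions, priorities)]
    return {"network": network_id, "vectors": vectors}


# Hypothetical restaurant scenario with three known friends' voices.
payload = build_transmission_signal(
    "restaurant_3_voices",
    voice_descriptions=["friend_1", "friend_2", "friend_3"],
    priorities=[1.5, 1.5, 1.0],
)
```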
(30) The neural network 7 transmitted to the hearing devices 2 in the calibration step 30 is initiated in the hearing devices 2 on the basis of the operating parameters V.sub.i by using an initiation step 31.
(31) Following the initiation of the neural network 7 using the operating parameters V.sub.i, the signal processing can be effected by using the hearing devices 2. Using the computing power of established AI chips, the hearing devices 2 can start the signal processing by means of the calibration device 3 within a short time after the provision of the calibration signal, in particular after the recording of the calibration input signal K. The period of time for the initiation depends in particular on whether the at least one neural network 7 is customized or replaced. If only operating parameters V.sub.i for customizing the neural network 7 are transmitted to the hearing devices 2, this can take place for example within 1 s, in particular within 750 ms, in particular within 500 ms, in particular within 350 ms. When the neural network 7 is replaced, the new network needs to be transmitted, which is possible for example within 2 s, in particular within 900 ms, in particular within 800 ms, in particular within 750 ms.
(32) The signal processing proceeds independently in each of the hearing devices 2. In a recording step 33, the respective soundscape G is recorded in the form of the input signal E by using the recording device 5. The input signal E is forwarded to the computing unit 8, where it is processed in a processing step 34. In the processing step 34, the audio signals f.sub.i corresponding to the friends F.sub.i are first of all separated from the input signal E in a separation step 35 by using the neural network 7. The separated audio signals f.sub.i are subsequently modulated in a modulation step 36 on the basis of the priority parameters handed over with the operating parameters V.sub.i. In this case, the audio signals f.sub.i are amplified or rejected according to the preferences of the user. The modulated audio signals f.sub.i are combined in the modulation step 36 to produce an output signal A. The output signal A is forwarded to the playback device 6. The playback device 6 plays back the output signal for the user in the form of a sound output G′ in a playback step 37.
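The modulation step described above — scaling each separated voice by its priority parameter and combining the results into the output signal A — reduces to a weighted sum. A minimal sketch, with the sample-by-sample mixing rule assumed for illustration:

```python
def modulate_and_mix(separated, priorities):
    """Modulation step 36: scale each separated audio signal f_i by its
    priority parameter (amplifying or rejecting it) and combine the
    results sample by sample into the output signal A."""
    scaled = [[s * p for s in sig] for sig, p in zip(separated, priorities)]
    return [sum(samples) for samples in zip(*scaled)]


voices = [[1.0] * 4, [1.0] * 4]
# Amplify the first voice, reject the second entirely.
out = modulate_and_mix(voices, priorities=[2.0, 0.0])
```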
(33) Following the calibration by the calibration device 3, the signal processing is effected entirely on the hearing devices 2. The recording, processing and playback are effected by using the hearing devices 2 and hence without perceptible latency for the user.
(34) The further signal processing can be effected by the hearing devices 2 independently of the calibration device 3. The calibration device 3 can be used to perform further customizations of the neural network 7, however, and/or also to replace the neural network 7 used for the separation step 35. In particular, the calibration is checked and, if need be, customized at regular intervals by virtue of the neural network 7 being replaced and/or customized by the calibration device 3.
(35) In parallel with the signal processing by the hearing devices 2, the calibration device 3 can furthermore record a calibration input signal K in the calibration recording step 25 and analyse it by using the analysis step 26. This allows the accuracy of the analysis and hence of the selection and/or customization of the neural network 7 to be increased. By way of example, the calibration separation step 27 can be customized to the results of the classification step 28 by a classification feedback loop 38. This allows the at least one neural calibration network 16 used in the calibration separation step 27 to be customized to the results of the classification step 28. If the classification step 28 recognizes surroundings of the user, for example on the basis of the sensor data S and/or the audio signals separated from the calibration input signal K, the at least one neural calibration network 16 can be customized to the user surroundings and the soundscape to be expected therein. As such, it is possible for neural calibration networks 16 customized to different situations to be used, for example. In the application example depicted in
(36) Moreover, the operating parameters V.sub.i can be customized or the neural network 7 can be replaced on the basis of user inputs in a user input step 39. The user inputs can be made by using the user interface 17. By way of example, the user can influence the modulation of the signals as a whole. Moreover, the audio signals f.sub.i, b ascertained in the analysis step 26 can be displayed to the user by means of the user interface 17. The user can deliberately select individual instances of the audio signals f.sub.i, b in the user input step 39 in order to initiate the separation of said audio signals by using the neural network 7 and/or to influence the modulation of said audio signals.
(37) Replacement of the neural network 7 is necessary in particular when the input signal E changes. If for example the sensor data S ascertained in the sensor reading step 29 are used to detect that the user is leaving the restaurant 22, replacement of the neural network 7 may be called for. By way of example, the user can exit the restaurant 22 onto the street. In this case, a neural network 7 specializing in road noise can be selected and conveyed to the hearing devices 2. This ensures that audio signals from vehicles, for example approaching cars, are separated from the input signal E and played back for the user as part of the output signal A.
(38) In other instances of application, it is also possible for more than one neural network 7 to be handed over to the hearing devices 2, for example when the user leaves the restaurant together with his friends F.sub.i. In this case, separation of both the audio signals f.sub.i of the friends F.sub.i and audio signals from other road users, for example approaching vehicles, may be called for. In such an instance of application, two neural networks 7 can be handed over to the hearing devices 2 in order to be able to separate and process a larger number of audio signals. One of the neural networks 7 can specialize in the separation of approaching vehicles from an input signal E typical of road traffic. The second neural network 7 can specialize in the separation of human voices from the input signal E typical of road traffic. In this case, the audio signals relevant to the user can be separated from the input signal E with low computational complexity and power consumption.
(39) In yet other instances of application, the calibration device 3 can also temporarily deactivate the neural network 7 of the hearing devices 2. If the user is with his friends F.sub.i in otherwise quiet surroundings, for example, the input signal E corresponds substantially to the audio signals f.sub.i. Separation and/or amplification of the audio signals f.sub.i from the input signal E is therefore not necessary. When the neural network 7 is deactivated, the output signal A is determined from the input signal E by amplifying the latter directly. This is possible with low computational complexity and low power consumption. As soon as further sounds are added to the audio signals f.sub.i, i.e. the hearing situation becomes more complex, the calibration device can detect this and automatically reactivate the neural network 7 of the hearing devices 2. In this case, the neural network 7 can be customized and/or replaced in order to calibrate the hearing devices 2.
(40) In yet another instance of application, the neural network 7 can also be deactivated by the calibration device 3 if a state of charge of the power supply 10 of the hearing devices 2 is below a predetermined limit value. This allows use of the hearing devices 2 to be ensured for a longer period of time even when the state of charge of the power supply 10 is low.
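The charge-based deactivation rule above is a simple threshold test. In the sketch below the 20 % limit is an arbitrary illustration; the patent only speaks of a predetermined limit value.

```python
def network_active(state_of_charge, limit=0.2):
    """Return whether the neural network 7 should remain active.
    Below the predetermined limit value (here assumed to be 20 %),
    the network is deactivated so that the hearing devices 2 remain
    usable for a longer period of time on a low state of charge."""
    return state_of_charge >= limit


# Normal operation at half charge; deactivation near-empty.
active_half = network_active(0.5)
active_low = network_active(0.1)
```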
(41) In the instances of application described above, the number of audio signals to be separated from the input signal E is automatically stipulated by the calibration device 3. By using the user interface 17, the user can additionally manually stipulate the number of audio signals to be separated. The user can display the audio signals separated from the calibration input signal K by means of the user interface, i.e. on the display of the calibration device 3. The user can then select individual audio signals to be separated. Alternatively, the user can use an appropriate controller to stipulate the number of audio signals to be separated. The calibration device 3 then selects the applicable number of audio signals in accordance with the respective relevance ascertained by means of the analysis of the calibration input signal.
(42) In a further exemplary embodiment, which is not depicted, the computing unit of the at least one hearing device comprises an application-specific integrated circuit (ASIC) for executing the at least one neural network. The computing unit is optimally customized to execute a respective neural network. The neural network can be executed particularly efficiently as a result. Nevertheless, the network is customizable to the respective instance of application, in particular to the number and type of audio signals to be separated from the input signal, as a result of the vectors calculated by using the calibration unit being handed over. The customization is effected by virtue of the weighting within the network being customized. In some exemplary embodiments in which the computing unit of the at least one hearing device is embodied as an application-specific integrated circuit, the at least one neural network can be nonreplaceable.
(43) In a further exemplary embodiment, which is not depicted, a hearing device system comprises no external, wearable hearing devices but rather at least one implantable hearing device. In one exemplary embodiment, the at least one hearing device can be a cochlear implant. In further exemplary embodiments, the at least one hearing device is a different implant, for example a middle-ear implant or a brain stem implant.