Hearing device for own voice detection and method of operating a hearing device
11115762 · 2021-09-07
Assignee
Inventors
- Ullrich Sigwanz (Hombrechtikon, CH)
- Nadim El Guindi (Zurich, CH)
- Daniel Lucas-Hirtz (Rapperswil, CH)
- Nina Stumpf (Mannedorf, CH)
Cpc classification
H04R25/40
ELECTRICITY
H04R1/1041
ELECTRICITY
H04R2201/107
ELECTRICITY
International classification
Abstract
A hearing device configured to be worn at a head of a user. The hearing device includes a vibration sensor to detect a vibration conducted through the user's head to the hearing device and to output a vibration signal including information about the vibration. At least part of the vibration can be caused by an own voice activity of the user. A method of operating the hearing device allows reliable own voice detection at rather low processing effort. A processor of the hearing device determines a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic and identifies the own voice activity based on an identification criterion including the presence of the own voice characteristic in the vibration signal at the associated vibration frequency. The own voice characteristic is indicative of the part of the vibration caused by the own voice activity.
Claims
1. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; and a processor communicatively coupled to the vibration sensor; wherein the processor is configured to: determine a presence of a first own voice characteristic in the vibration signal at a first vibration frequency associated with the first own voice characteristic, said first associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the first own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determine a presence of a second own voice characteristic in the vibration signal at an associated second vibration frequency; and identify the own voice activity based on an identification criterion comprising said presence of the first own voice characteristic in the vibration signal at the associated first vibration frequency, and said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency.
2. The device according to claim 1, characterized in that the processor is configured to determine a temporal sequence of said presence of the first own voice characteristic and the second own voice characteristic in the vibration signal, wherein said identification criterion further comprises said presence of the first own voice characteristic temporally preceding said presence of the second own voice characteristic in the vibration signal.
3. The device according to claim 1, characterized in that at least one of the first own voice characteristic or the second own voice characteristic comprises a peak of the vibration signal at at least one of the first associated vibration frequency or the second associated vibration frequency.
4. The device according to claim 1, characterized by a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone.
5. The device according to claim 4, characterized in that the processor is configured to determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity, wherein said identification criterion further comprises said presence of the own voice characteristic in the audio signal at the associated audio frequency.
6. The device according to claim 5, characterized in that the processor is configured to: determine a signal feature of the vibration signal; determine a signal feature of the audio signal; and determine a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal, wherein at least one of said presence of at least one of the first own voice characteristic or the second own voice characteristic in the vibration signal at at least one of the first associated vibration frequency or the second associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency is determined based on the similarity measure.
7. The device according to claim 4, characterized in that the processor is configured to determine an intensity of the audio signal and to select said associated vibration frequency depending on said audio signal intensity.
8. The device according to claim 1, characterized in that the vibration sensor comprises an accelerometer.
9. The device according to claim 1, characterized in that said vibration signal comprises first directional data indicative of a first direction of said part of the vibration caused by the own voice activity, and second directional data indicative of a second direction of said part of the vibration caused by the own voice activity, wherein the processor is configured to determine said presence of at least one of the first own voice characteristic or the second own voice characteristic in the first directional data and in the second directional data, wherein said identification criterion further comprises a coincidence of said presence of at least one of the first own voice characteristic or the second own voice characteristic in the first directional data and in the second directional data.
10. The device according to claim 1, characterized in that at least one of the first associated vibration frequency or the second associated vibration frequency is selected such that it comprises an alias frequency of a frequency of said part of the vibration caused by the own voice activity.
11. The device according to claim 1, characterized in that the processor is configured to evaluate the vibration signal at a sampling rate of at most 1 kHz.
12. The device according to claim 1, characterized in that the processor is configured to determine a signal feature of the vibration signal; classify, based on a pattern of own voice characteristics, the signal feature as at least one of the first own voice characteristic or the second own voice characteristic; and identify the vibration frequency associated with at least one of the first own voice characteristic or the second own voice characteristic.
13. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; providing a vibration signal comprising information about said vibration; determining a presence of a first own voice characteristic in the vibration signal at a first vibration frequency associated with the first own voice characteristic, said associated first vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the first own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determining a presence of a second own voice characteristic in the vibration signal at an associated second vibration frequency; and identifying the own voice activity based on an identification criterion comprising said determined presence of the first own voice characteristic in the vibration signal at the associated vibration frequency and said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency.
14. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; a processor communicatively coupled to the vibration sensor; a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone; wherein the processor is configured to: determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determine a presence of the own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity; determine a signal feature of the vibration signal; determine a signal feature of the audio signal; determine a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal, wherein at least one of said presence of the own voice characteristic in the vibration signal at the associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency is determined based on the similarity measure; and to identify the own voice activity based on an identification criterion comprising said 
presence of the own voice characteristic in the vibration signal at the associated vibration frequency, and said presence of the own voice characteristic in the audio signal at the associated audio frequency.
15. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; detecting a sound conducted through an ambient environment of the user; providing a vibration signal comprising information about said vibration; providing an audio signal comprising information about said sound; determining a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determining a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity; determining a signal feature of the vibration signal; determining a signal feature of the audio signal; determining a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal; determining at least one of said presence of the own voice characteristic in the vibration signal at the associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency is based on the similarity measure; and identifying the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency, and said determined presence of the own voice characteristic in the audio signal at the 
associated audio frequency.
16. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; a processor communicatively coupled to the vibration sensor; a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone; wherein the processor is configured to: determine an intensity of the audio signal; determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity, the own voice characteristic being indicative of said part of the vibration caused by the own voice activity, and said audio signal intensity; and to identify the own voice activity based on an identification criterion comprising said presence of the own voice characteristic in the vibration signal at the associated vibration frequency.
17. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; detecting a sound conducted through an ambient environment of the user; providing a vibration signal comprising information about said vibration; providing an audio signal comprising information about said sound; determining an intensity of the audio signal; determining a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity, the own voice characteristic being indicative of said part of the vibration caused by the own voice activity, and said audio signal intensity; and identifying the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. The drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements. In the drawings:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
DETAILED DESCRIPTION OF THE DRAWINGS
(9) Referring to
(10) Hearing device 100 may be implemented by any type of hearing device configured to enable or enhance hearing of a user wearing hearing device 100. For example, hearing device 100 may be implemented by a hearing aid configured to provide an amplified version of audio content to a user, an earphone, a cochlear implant system configured to provide electrical stimulation representative of audio content to a user, a sound processor included in a bimodal hearing system configured to provide both amplification and electrical stimulation representative of audio content to a user, or any other suitable hearing prosthesis. Different types of hearing devices can also be distinguished by the position at which a housing accommodating output transducer 110 is intended to be worn at a head of a user relative to an ear canal of the user. Hearing devices which are configured such that the housing enclosing the transducer can be worn at a wearing position outside the ear canal, in particular behind an ear of the user, can include, for instance, behind-the-ear (BTE) hearing aids. Hearing devices which are configured such that the housing enclosing the transducer can be at least partially inserted into the ear canal can include, for instance, earbuds, earphones, and hearing instruments such as receiver-in-the-canal (RIC) hearing aids, in-the-ear (ITE) hearing aids, invisible-in-the-canal (IIC) hearing aids, and completely-in-the-canal (CIC) hearing aids. The housing can be an earpiece adapted for an insertion and/or a partial insertion into the ear canal. Some hearing devices comprise a housing having a standardized shape intended to fit into a variety of ear canals of different users. Other hearing devices comprise a housing having a customized shape adapted to an ear canal of an individual user. The customized housing can be, for instance, a shell formed from an ear mould or an earpiece that is customizable in-situ by the user.
(11) Microphone 106 may be implemented by any suitable audio detection device and is configured to detect a sound presented to a user of hearing device 100. The sound can comprise audio content (e.g., music, speech, noise, etc.) generated by one or more audio sources included in an environment of the user. The sound can also include audio content generated by a voice of the user during an own voice activity, such as a speech by the user. In particular, a vibration of the user's vocal cords during the own voice activity may produce airborne sound in the environment of the user, which is detectable as the audio signal by microphone 106. Microphone 106 is configured to output an audio signal comprising information about the sound detected from the environment of the user. Microphone 106 may be included in or communicatively coupled to hearing device 100 in any suitable manner. Output transducer 110 may be implemented by any suitable audio output device, for instance a loudspeaker of a hearing device or an output electrode of a cochlear implant system.
(12) Vibration sensor 108 may be implemented by any suitable sensor configured to detect a vibration conducted during an own voice activity through the user's head. In particular, the vibrations can be conducted from the user's vocal cords through the bones and tissue of the head. In some implementations, sensor 108 may also be referred to as a bone vibration sensor. Vibration sensor 108 is configured to output a vibration signal comprising information about the detected vibrations. Vibration sensor 108 may be positioned at any position at the user's head allowing the detection of the vibrations conducted through the head. In some implementations, vibration sensor 108 can be positioned behind an ear of the user. For instance, vibration sensor 108 can be included in a part of a BTE or RIC hearing aid intended to be worn behind the user's ear. In some implementations, vibration sensor 108 can be positioned inside an ear canal of the user. For instance, vibration sensor 108 can be included in a part of an earbud or of a RIC or ITE or IIC or CIC hearing aid intended to be worn inside the ear canal.
(13) In some implementations, vibration sensor 108 can be included inside a housing of the hearing device. The vibrations can be transmitted from the user's head through the housing to vibration sensor 108. In some implementations, vibration sensor 108 can be provided externally from a housing of the hearing device. In particular, vibration sensor 108 can be provided at a head surface, for instance behind the ear or inside the ear canal, to directly pick up the vibrations from the user's head. Thus, while hearing device 100 is being worn by a user, the detected vibrations are representative of the own voice activity. In some implementations, vibration sensor 108 comprises an inertial sensor, in particular an accelerometer and/or a gyroscope. The inertial sensor can be positioned inside the ear canal or at a different position at the user's head. In some implementations, vibration sensor 108 comprises a bone conductive microphone and/or a pressure sensor and/or a strain gauge to be positioned inside an ear canal as disclosed in European patent application No. EP 18195686.3, which is herewith included by reference. In some implementations, vibration sensor 108 comprises an optical sensor employing a light emitter, such as a laser diode or a LED, and a photodetector to detect the vibrations, as disclosed in U.S. patent application publication Nos. US 2018/0011006 A1 and US 2018/0011006 A1, which are herewith included by reference.
(14) In some implementations, vibration sensor 108 is configured to output the vibration signal while microphone 106 outputs the audio signal. Both the vibration signal and the audio signal can be representative of the own voice activity. For example, the audio signal may represent audio content generated, on the one hand, by one or more audio sources included in an environment and, on the other hand, by the own voice activity, while the vibration signal may represent vibrations mostly generated by the own voice activity. As another example, the vibration signal may contain additional artefacts caused, for instance, by a movement of the user and/or impacts from the environment.
(15) Memory 104 may be implemented by any suitable type of storage medium and may be configured to maintain (e.g., store) data generated, accessed, or otherwise used by processor 102. For example, memory 104 may maintain data representative of an own voice processing program that specifies how processor 102 processes the vibration signal and/or the audio signal. Memory 104 may also be used to maintain a database including data representative of parameters that are employed for the own voice detection. To illustrate, memory 104 may maintain data associated with own voice characteristics that can be representative for an own voice activity in the vibration signal provided by vibration sensor 108 and/or in the audio signal provided by microphone 106. The data may include values of a vibration frequency of the vibration signal and/or values of an audio frequency of the audio signal which are associated with a respective own voice characteristic in the vibration signal and/or audio signal.
(16) Processor 102 may be configured to access the vibration signal generated by vibration sensor 108 and/or the audio signal generated by microphone 106. Processor 102 may use the vibration signal and/or the audio signal to identify an own voice activity of the user. For example, processor 102 may be configured to determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, and to identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency. As another example, processor 102 may be configured to determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, and to identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the audio signal at the associated audio frequency. These and other operations that may be performed by processor 102 are described in more detail in the description that follows. References to operations performed by hearing device 100 may be understood to be performed by processor 102 of hearing device 100.
(17)
(18)
(19)
(20) Signal features produced in vibration signals 132-134 by the own voice activity can be visualized in frequency spectra 142-144. In the example, such a signal feature of vibration signals 132-134 produced by the pronunciation of the first vowel can be seen as a peak 147, 148, 149 visible in frequency spectra 142-144 at an associated vibration frequency of approximately 78 Hz. Signal features 147-149 each are indicative of the vibration caused by the own voice activity and thus correspond to an own voice characteristic. Own voice characteristic 147-149 is produced in each vibration signal 132-134 for the different spatial directions 122-124. Determining a presence of the own voice characteristic in vibration signals 132-134 at the associated vibration frequency can thus be exploited to provide an identification criterion for the own voice activity. On the one hand, such an identification criterion can facilitate the own voice detection, in particular to allow a faster detection. On the other hand, such an identification criterion can increase the reliability of the own voice detection, in some implementations also in conjunction with additional requisites satisfying the identification criterion.
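To illustrate how a spectral peak of this kind may be located in each directional vibration signal, the following is a minimal sketch in Python and not the implementation of the disclosure. The sampling rate of 500 Hz, the search range of 50 Hz to 120 Hz, the axis amplitudes, and the names `dft_magnitude`, `dominant_frequency`, and `synth_axis` are assumptions chosen for the example; the three synthetic signals merely stand in for vibration signals 132-134.

```python
import math

SAMPLE_RATE_HZ = 500.0  # assumed sampling rate, not taken from the description


def dft_magnitude(samples, sample_rate_hz, freq_hz):
    """Normalized magnitude of the signal's DFT evaluated at one frequency."""
    re = im = 0.0
    for n, x in enumerate(samples):
        phase = 2.0 * math.pi * freq_hz * n / sample_rate_hz
        re += x * math.cos(phase)
        im -= x * math.sin(phase)
    return math.hypot(re, im) / len(samples)


def dominant_frequency(samples, sample_rate_hz, f_lo_hz, f_hi_hz, step_hz=1.0):
    """Frequency within [f_lo_hz, f_hi_hz] at which the magnitude is largest."""
    best_f, best_mag = f_lo_hz, -1.0
    f = f_lo_hz
    while f <= f_hi_hz:
        mag = dft_magnitude(samples, sample_rate_hz, f)
        if mag > best_mag:
            best_f, best_mag = f, mag
        f += step_hz
    return best_f


def synth_axis(amplitude, n=500):
    """Synthetic one-second vibration signal with a single 78 Hz component."""
    return [amplitude * math.sin(2.0 * math.pi * 78.0 * i / SAMPLE_RATE_HZ)
            for i in range(n)]


# Three hypothetical spatial directions with different vibration amplitudes.
axes = {"x": synth_axis(1.0), "y": synth_axis(0.6), "z": synth_axis(0.3)}
peaks = {name: dominant_frequency(sig, SAMPLE_RATE_HZ, 50.0, 120.0)
         for name, sig in axes.items()}
print(peaks)  # the peak is found at 78.0 Hz in every direction
```

In this sketch the peak appears at the same frequency in all three directions even though the amplitudes differ, which mirrors the observation that the own voice characteristic is produced in each vibration signal.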
(21)
(22) Signal features produced in vibration signals 132-134 by the pronunciation of the second vowel can be seen in frequency spectra 152-154 as a spectral peak 157, 158, 159. Signal features 157-159 each are indicative of the vibration caused by the own voice activity and thus each correspond to an own voice characteristic. The vibration frequency associated with own voice characteristics 157-159 is approximately 92 Hz in each vibration signal 132-134 for the different spatial directions 122-124. The vibration frequency associated with own voice characteristics 147-149 produced in vibration signals 132-134 by the pronunciation of the first vowel thus differs from the vibration frequency associated with own voice characteristics 157-159 produced in vibration signals 132-134 by the pronunciation of the second vowel. This shows that the vibration frequency associated with the own voice characteristics produced in vibration signals 132-134 can depend on the content of the own voice activity, in particular the content of the user's speech. Moreover, the vibration frequency associated with the own voice characteristics generally can also depend on properties of the user. For instance, different voices of different users generally may produce an own voice characteristic associated with a different vibration frequency in the vibration signal, in particular for an own voice activity including the same content. Moreover, different speech volumes of the own voice activity, for instance when the user speaks louder due to noise occurring in the environment, can lead to a frequency shift of the vibration frequency associated with the own voice characteristic. The latter phenomenon is also known as the “Lombard effect”. An own voice detection relying on an identification criterion comprising a presence of the own voice characteristic in vibration signals 132-134 may thus account for the occurring variations of the vibration frequency associated with the own voice characteristic in order to increase the detection reliability. Some embodiments of hearing device 100 and methods of its operation, which allow such an identification criterion to be employed for own voice detection at varying vibration frequencies associated with the own voice characteristic, are addressed in the subsequent description.
(23)
(24) In the method illustrated in
(25) Determining the signal feature can comprise a peak detection in the vibration signal. In some implementations, the vibration signal can be evaluated in a frequency domain comprising a spectrum of vibration frequencies in order to determine the signal feature. This may imply converting a time dependent vibration signal from a time domain into the frequency domain. In some implementations, the signal feature can be determined directly from a time dependent vibration signal. To illustrate, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 at an associated vibration frequency may be extracted after converting at least a temporal section of vibration signals 132-134 from the time domain into the frequency domain, as illustrated in
(26) In operation 607, a decision is made depending on an identification criterion. The identification criterion can be based on whether the signal feature is determined to be present in the vibration signal at a vibration frequency associated with an own voice characteristic. The signal feature can thus be identified as the own voice characteristic which is determined to be present at the associated vibration frequency. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises simultaneously determining the signal feature in the vibration signal and determining the presence of the signal feature at the vibration frequency associated with the own voice characteristic in operation 603. In particular, the vibration signal can be evaluated at the associated vibration frequency with respect to the presence of the signal feature which is thus identified as the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises determining the signal feature in operation 603, and subsequently determining the presence of the signal feature at the vibration frequency associated with the own voice characteristic. For instance, the vibration signal can be evaluated for any vibration frequency or a plurality of vibration frequencies with respect to the presence of the signal feature and then it can be determined if a vibration frequency at which the signal feature is present corresponds to the vibration frequency associated with the own voice characteristic. To illustrate, vibration signals 132-134 may be evaluated at the vibration frequency associated with at least one of peaks 147-149 and/or at least one of peaks 157-159 in order to determine the presence of the respective peak at the associated vibration frequency, and/or vibration signals 132-134 may be first evaluated with respect to the presence of at least one of peaks 147-149 and/or at least one of peaks 157-159 and then it may be determined if the respective peak is present at the associated vibration frequency.
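The two evaluation orders described above can be sketched as follows. This is a hedged illustration in Python under stated assumptions, not the method of the disclosure: the Goertzel recurrence is swapped in as one possible way to evaluate a signal at a single frequency, and the names `goertzel_power`, `present_at`, and `present_after_search`, the energy fraction threshold of 0.5, the tolerance of 2 Hz, and the sampling rate of 500 Hz are hypothetical.

```python
import math


def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Squared DFT magnitude at one frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def present_at(samples, sample_rate_hz, associated_hz, min_fraction=0.5):
    """Targeted check: evaluate the signal only at the associated frequency.

    The decision compares the share of signal energy found at that frequency
    with an assumed threshold (min_fraction).
    """
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False
    power = goertzel_power(samples, sample_rate_hz, associated_hz)
    return 2.0 * power / (len(samples) * energy) >= min_fraction


def present_after_search(samples, sample_rate_hz, associated_hz,
                         candidates_hz, tolerance_hz=2.0):
    """Search check: find the strongest candidate, then compare frequencies."""
    peak_hz = max(candidates_hz,
                  key=lambda f: goertzel_power(samples, sample_rate_hz, f))
    return abs(peak_hz - associated_hz) <= tolerance_hz


FS_HZ = 500.0  # assumed sampling rate
signal = [math.sin(2.0 * math.pi * 78.0 * n / FS_HZ) for n in range(500)]
candidates = [float(f) for f in range(50, 121)]

print(present_at(signal, FS_HZ, 78.0))                        # True
print(present_at(signal, FS_HZ, 92.0))                        # False
print(present_after_search(signal, FS_HZ, 78.0, candidates))  # True
print(present_after_search(signal, FS_HZ, 92.0, candidates))  # False
```

The targeted check needs one pass over the samples per associated frequency, whereas the search check evaluates every candidate frequency first; which order is preferable depends on how many associated frequencies are of interest.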
(27) The vibration frequency associated with the own voice characteristic can comprise a frequency bandwidth. The frequency bandwidth can be selected such that it accounts for inaccuracies and/or variances of a value of the vibration frequency occurring during the detection of the vibration. In some implementations, the frequency bandwidth can be selected such that it is associated with a plurality of own voice characteristics. To illustrate, the vibration frequency can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 and the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The own voice activity may thus be identified depending on at least one of the own voice characteristics determined to be present at the associated vibration frequency. In some implementations, the frequency bandwidth can be selected such that it is associated with a single own voice characteristic. To illustrate, the vibration frequency associated with one own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The vibration frequency associated with another own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134. The own voice activity may thus be identified depending on the respective own voice characteristic determined to be present at the associated vibration frequency.
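The difference between a frequency bandwidth shared by several own voice characteristics and separate bandwidths per own voice characteristic can be sketched as follows. The band edges and the names `SHARED_BAND`, `SEPARATE_BANDS`, `matching_characteristics`, and `own_voice_identified` are hypothetical; the description states only the approximate peak positions of 78 Hz and 92 Hz and no bandwidth values.

```python
# Hypothetical band edges in Hz (assumptions for illustration only).
SHARED_BAND = {"any_vowel": (70.0, 100.0)}
SEPARATE_BANDS = {
    "first_vowel": (73.0, 83.0),   # contains ~78 Hz, excludes ~92 Hz
    "second_vowel": (87.0, 97.0),  # contains ~92 Hz, excludes ~78 Hz
}


def matching_characteristics(peak_hz, bands):
    """Names of all own voice characteristics whose band contains peak_hz."""
    return [name for name, (lo, hi) in bands.items() if lo <= peak_hz <= hi]


def own_voice_identified(peak_hz, bands):
    """True if the peak falls into at least one associated frequency band."""
    return bool(matching_characteristics(peak_hz, bands))


print(matching_characteristics(78.0, SEPARATE_BANDS))  # ['first_vowel']
print(matching_characteristics(92.0, SEPARATE_BANDS))  # ['second_vowel']
print(matching_characteristics(85.0, SEPARATE_BANDS))  # []
print(own_voice_identified(85.0, SHARED_BAND))         # True
```

With the shared band, a peak anywhere between the two vowel frequencies satisfies the criterion; with separate bands, the same peak at 85 Hz matches neither own voice characteristic, which trades tolerance against selectivity.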
(28) Depending on the outcome of the decision performed in operation 607, a non-occurring own voice activity of the user is identified in operation 608 if the own voice characteristic has not been determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic. Conversely, an occurrence of an own voice activity of the user is identified in operation 609 if the own voice characteristic has been determined to be present in the vibration signal at the associated vibration frequency.
(29)
(30)
(31)
(32) To illustrate, the own voice characteristic can be produced in the vibration signal at an alias frequency of the fundamental frequency by employing a sampling rate causing an aliasing effect. Vibration sensor 108 can be configured to record the vibrations caused by the own voice activity at this sampling rate and/or to provide the vibration signal at this sampling rate. To this end, vibration sensor 108 may be configured to sample the vibrations from an analog input without applying an anti-aliasing filter (e.g. a low-pass filter) in between. Vibration sensor 108 can thus be configured to produce the own voice characteristic in the vibration signal at the fundamental vibration frequency and/or at the alias vibration frequency, in particular such that aliasing components can be produced in the vibration signal. Determining the presence of the own voice characteristic at the alias vibration frequency can have the advantage of allowing vibration sensor 108 to operate at a sampling rate lower than the Nyquist rate. This can allow determining the presence of an own voice characteristic in the vibration signal exhibiting a fundamental frequency beyond the Nyquist frequency. For instance, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 may be produced by a pronunciation of the vowel at a fundamental frequency corresponding to the associated vibration frequency, or they can be produced by a pronunciation of the vowel at a fundamental frequency larger than the associated vibration frequency, wherein an alias frequency of the fundamental frequency corresponds to the associated vibration frequency.
For example, an own voice activity of a female voice characterized by higher vibration frequencies may thus be determined by a presence of the own voice characteristic at the alias vibration frequency of the fundamental frequency, whereas an own voice activity of a male voice characterized by lower vibration frequencies may be determined by a presence of the own voice characteristic at the fundamental frequency.
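The relation between a fundamental frequency and its alias frequency can be illustrated with the standard frequency-folding formula for uniform sampling. The sketch below assumes an ideal sampler without an anti-aliasing filter. The function name `alias_frequency` and the numeric values (a 250 Hz sampling rate and fundamental frequencies of 110 Hz and 210 Hz) are assumptions made for illustration and are not taken from the disclosure.

```python
def alias_frequency(frequency_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone appears after uniform sampling without anti-aliasing."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    # Fold the tone around the nearest integer multiple of the sampling rate.
    nearest_multiple = round(frequency_hz / sample_rate_hz)
    return abs(frequency_hz - nearest_multiple * sample_rate_hz)


sample_rate_hz = 250.0                 # assumed sensor sampling rate
nyquist_hz = sample_rate_hz / 2.0      # Nyquist frequency for that rate

# Assumed lower-pitched voice: the fundamental lies below the Nyquist frequency
# and is observed unchanged.
lower_voice_hz = alias_frequency(110.0, sample_rate_hz)

# Assumed higher-pitched voice: the fundamental lies above the Nyquist frequency
# and is observed at its alias frequency instead.
higher_voice_hz = alias_frequency(210.0, sample_rate_hz)
```

In this example the 110 Hz fundamental is evaluated at 110 Hz, whereas the 210 Hz fundamental folds to 40 Hz, so the associated vibration frequency for the higher voice would be the alias frequency.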
(33) Depending on the outcome of the decision performed in operation 907, an occurrence of an own voice activity of the user is identified in operation 609 if the own voice characteristic in the vibration signal has been determined to be present at the fundamental vibration frequency associated with the own voice characteristic. Depending on the outcome of the decision performed in operation 908, an occurrence of an own voice activity of the user is identified in operation 609, if the own voice characteristic in the vibration signal has been determined to be present at the alias frequency of the fundamental vibration frequency. Conversely, a non-occurring own voice activity of the user is identified in operation 608 if the own voice characteristic in the vibration signal neither has been determined to be present at the fundamental vibration frequency after the decision in operation 907, nor at the alias vibration frequency after the decision in operation 908. The decisions according to operations 907, 908 may be performed simultaneously or in any order.
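The decision flow of operations 907, 908, 608 and 609 can be summarized in a short sketch. The function names, the representation of a frequency band as a `(low, high)` tuple, and the example band limits are hypothetical; the flag `check_fundamental` is an assumed way of modelling implementations in which the decision of operation 907 is omitted.

```python
def in_band(frequencies_hz, band):
    """True if any detected frequency lies inside the (low_hz, high_hz) band."""
    low_hz, high_hz = band
    return any(low_hz <= f <= high_hz for f in frequencies_hz)


def identify_own_voice(peak_frequencies_hz, fundamental_band, alias_band,
                       check_fundamental=True):
    """Return True for operation 609 (own voice), False for operation 608."""
    # Operation 907: presence at the fundamental vibration frequency.
    at_fundamental = check_fundamental and in_band(peak_frequencies_hz, fundamental_band)
    # Operation 908: presence at the alias vibration frequency.
    at_alias = in_band(peak_frequencies_hz, alias_band)
    # The two decisions are independent, so their order does not matter.
    return at_fundamental or at_alias


fundamental_band = (100.0, 140.0)  # assumed example band in Hz
alias_band = (30.0, 50.0)          # assumed example band in Hz
```

Because the result is a logical OR of the two decisions, evaluating them simultaneously or in either order gives the same outcome.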
(34) In some implementations, the decision performed in operation 907 can be omitted. Those implementations may correspond to some embodiments of the method illustrated in
(35) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
(36)
(37) The decision in operation 1007 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the first time in the vibration signal. The decision in operation 1008 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the second time in the vibration signal. Operations 1007, 1008 can be performed in any order or they can be performed simultaneously. In particular, operation 607 in the method illustrated in
(38) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
(39)
(40) In operation 1107, a decision is performed depending on an identification criterion. The identification criterion can be based on at least one of whether the own voice characteristic is determined to be present in the vibration signal at a vibration frequency associated with the own voice characteristic, and whether the own voice characteristic is determined to be present in the audio signal at an audio frequency associated with the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated frequency can comprise determining the signal feature in the vibration signal and/or audio signal and simultaneously determining a presence of the signal feature at the frequency associated with the own voice characteristic in at least one of operations 603, 1103. In some implementations, determining the presence of the own voice characteristic at the associated frequency can also comprise subsequent determining of a signal feature in the vibration signal and/or audio signal in at least one of operations 603, 1103 and then determining the presence of the signal feature at the frequency associated with the own voice characteristic.
(41) In some implementations, the identification criterion can be based on a similarity measure between the signal feature determined in the vibration signal in operation 603 and the signal feature determined in the audio signal in operation 1103. Determining the similarity measure can comprise determining a comparison and/or a correlation, for instance a cross-correlation, of the vibration signal and the audio signal with respect to the frequency at which the signal feature determined in operations 603, 1103 has been determined to be present. Thus, the vibration frequency and the audio frequency at which the signal feature has been determined to be present in operations 603, 1103 can be evaluated with respect to the comparison and/or correlation. The decision in operation 1107 can be performed depending on whether the similarity measure has been determined to be large enough. In particular, the identification criterion may be provided such that the vibration frequency at which the signal feature has been determined to be present in operation 603 and the audio frequency at which the signal feature has been determined to be present in operation 1103 must be similar to a specified degree, for instance such that they are shifted by a certain frequency difference, or by at most a maximum value of a frequency difference, or such that they are substantially equal. When the similarity measure has been determined to be large enough, the signal feature determined in operation 603 can be identified as the own voice characteristic determined to be present in the vibration signal at the associated vibration frequency, and/or the signal feature determined in operation 1103 can be identified as the own voice characteristic determined to be present in the audio signal at the associated audio frequency.
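The frequency-based form of the similarity criterion can be sketched as follows. This is only one possible reading of the criterion, in which the two frequencies must agree up to an expected shift and a maximum deviation. The function name `frequencies_similar`, its parameters, and the default values are assumptions introduced for the example.

```python
def frequencies_similar(vibration_hz: float, audio_hz: float,
                        max_difference_hz: float = 5.0,
                        expected_shift_hz: float = 0.0) -> bool:
    """Compare the peak frequency in the vibration signal with that in the audio signal.

    expected_shift_hz models a certain frequency difference between both signals;
    max_difference_hz models the maximum tolerated deviation from that shift.
    """
    deviation_hz = abs((audio_hz - vibration_hz) - expected_shift_hz)
    return deviation_hz <= max_difference_hz


# Substantially equal frequencies: criterion fulfilled.
equal_case = frequencies_similar(120.0, 122.0)
# Large, unexpected difference: criterion not fulfilled.
different_case = frequencies_similar(120.0, 240.0)
# A known shift of 30 Hz between both signals is tolerated when declared.
shifted_case = frequencies_similar(120.0, 150.0, expected_shift_hz=30.0)
```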
(42) In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set to a predetermined frequency. For instance, at least one of the associated vibration frequency and the associated audio frequency can be retrieved from a database by applying an operation corresponding to operation 703 illustrated in
(43) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
(44) In some implementations, an audio signal characteristic is determined from the audio signal in operation 1113. Determining the audio signal characteristic can comprise estimating a signal to noise ratio (SNR) of the audio signal. Determining the audio signal characteristic can comprise estimating a volume level of the audio signal, in particular a volume level of the own voice activity and/or a volume level of other sound in the environment. The determined audio signal characteristic can be employed during the decision performed in operation 1107. For instance, a significance of the signal feature determined to be present in the audio signal can depend on an estimated SNR of the audio signal. For instance, the identification criterion applied in the decision in operation 1107 may predominantly depend on whether the signal feature is determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic when the SNR is estimated to be rather high in the audio signal. In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set depending on the audio signal characteristic. In particular, the audio signal characteristic can comprise an estimated volume level of the audio signal and at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal can be set depending on the estimated volume level, in order to account for the “Lombard effect” causing a frequency shift of the detected own voice activity at different speech volumes of the user.
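Setting the associated frequency depending on the estimated volume level can be sketched as a lookup with linear interpolation. The calibration table, its values, and the function name `frequency_for_level` are hypothetical; the disclosure does not specify how the dependency on the volume level is modelled, so the interpolation is an assumption.

```python
from bisect import bisect_right


def frequency_for_level(level_db: float, calibration) -> float:
    """Interpolate the associated frequency for an estimated speech volume level.

    calibration is a list of (level_db, frequency_hz) pairs sorted by level.
    Levels outside the table are clamped to the nearest table entry.
    """
    levels = [level for level, _ in calibration]
    if level_db <= levels[0]:
        return calibration[0][1]
    if level_db >= levels[-1]:
        return calibration[-1][1]
    i = bisect_right(levels, level_db)
    (l0, f0), (l1, f1) = calibration[i - 1], calibration[i]
    return f0 + (f1 - f0) * (level_db - l0) / (l1 - l0)


# Assumed calibration: the associated frequency rises with the speech level.
calibration = [(50.0, 110.0), (70.0, 130.0)]
associated_hz = frequency_for_level(60.0, calibration)
```

Such a table could, for example, be filled from own voice activity recorded at different volume levels.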
(45) In some implementations, a speech recognition is performed in operation 1109. The speech recognition can be used to identify a content of a speech of the user during the own voice activity, for instance keywords spoken by the user. The speech recognition can employ the own voice characteristic determined in the vibration signal at the associated vibration frequency and/or the own voice characteristic determined in the audio signal at the associated audio frequency. To illustrate, peaks 147-149 and/or peaks 157-159 produced in vibration signals 132-134 may be identified as the respective vowels spoken by the user. In order to identify a plurality of vowels, consonants, words, phonemes, speech pauses, etc. successively spoken by the user, the own voice characteristic can be determined in the vibration signal and/or in the audio signal at different times, in particular by correspondingly applying operations 1003, 1004 illustrated in
(46)
(47) In some implementations, operation 1203 of deriving the own voice characteristic can comprise determining a signal feature in operation 603 and classifying the signal feature as the own voice characteristic. In particular, classifying operation 804 based on a pattern of own voice characteristics provided in operation 805, as illustrated in
(48) In some implementations, operation 1203 of deriving the own voice characteristic can comprise initiating a training operation for an individual user. During the training operation, the user can be instructed to perform a predetermined own voice activity. The own voice characteristic in the vibration signal that can be attributed to the own voice activity can thus be identified during operation 1203. The associated vibration frequency can thus be identified during operation 1204, in particular as the vibration frequency at which the own voice characteristic has been determined to be present in operation 1203. Initiating the training operation can comprise, for instance, instructing the user to pronounce a certain number of vowels, consonants, phonemes, words, etc. The user may also be instructed to perform the own voice activity at different volume levels.
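Deriving the associated vibration frequency from a training operation can be sketched as follows. The sketch assumes that each repetition of the predetermined own voice activity yields one peak frequency. Taking the median of these frequencies and adding a fixed margin to obtain a frequency bandwidth are assumptions of the example, as are the function name `derive_associated_frequency` and all numeric values.

```python
from statistics import median


def derive_associated_frequency(training_peaks_hz, margin_hz: float = 5.0):
    """Return (center_hz, (low_hz, high_hz)) from peak frequencies seen in training."""
    if not training_peaks_hz:
        raise ValueError("at least one training peak is required")
    center_hz = median(training_peaks_hz)
    # Widen the observed range to account for variance between repetitions.
    band = (min(training_peaks_hz) - margin_hz, max(training_peaks_hz) + margin_hz)
    return center_hz, band


# Hypothetical peak frequencies from three repetitions of the same vowel.
center_hz, band = derive_associated_frequency([118.0, 123.0, 120.0])
```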
(49)
(50) A decision in operation 1305 can then be performed depending on the determined similarity measure. In a situation in which the similarity has been determined to be larger than a similarity threshold, for instance when a correlation has been determined to be large enough, at least one of a vibration frequency associated with the own voice characteristic in the vibration signal and an audio frequency associated with the own voice characteristic in the audio signal can be identified based on the similarity measure in operation 1204. For instance, the associated vibration frequency and/or the associated audio frequency may then be selected to correspond to the vibration frequency and/or audio frequency at which at least one of the signal features has been determined in operations 603, 1103. The associated vibration frequency and/or the associated audio frequency can then be stored in the database for own voice characteristics in operation 1209. In a contrary situation, in which the similarity has not been determined to be larger than the similarity threshold, the associated vibration frequency and/or the associated audio frequency cannot be identified and the database for own voice characteristics is maintained in its present state in operation 702.
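The threshold decision and the conditional database update can be sketched as follows. The dictionary used as a database, the key names, the function name `update_database`, and the threshold value of 0.8 are all hypothetical and serve only to illustrate the control flow.

```python
def update_database(database: dict, name: str, vibration_hz: float, audio_hz: float,
                    similarity: float, similarity_threshold: float = 0.8) -> bool:
    """Store the associated frequencies only if the similarity exceeds the threshold."""
    # Operation 1305: decision depending on the similarity measure.
    if similarity > similarity_threshold:
        # Operations 1204 and 1209: identify and store the associated frequencies.
        database[name] = {"vibration_hz": vibration_hz, "audio_hz": audio_hz}
        return True
    # Operation 702: the database is maintained in its present state.
    return False


database = {}
stored = update_database(database, "vowel_a", 120.0, 121.0, similarity=0.93)
rejected = update_database(database, "vowel_e", 180.0, 260.0, similarity=0.41)
```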
(51) In some implementations, operation 1113 of determining an audio signal characteristic, as described above in conjunction with the method illustrated
(52) In some implementations, the hearing device is configured to operate in a first mode of operation in which an own voice activity of the user is detected and in a second mode of operation in which the hearing device can be prepared for the detection of the own voice activity. The first mode of operation may be implemented by at least one of the methods illustrated in
(53)
(54)
(55) In some implementations, peak detector 1403 is configured for peak detection at a harmonic frequency, for instance the fundamental frequency, of the vibration detected by vibration sensor 108, as illustrated by component 1404 constituting a harmonic frequency peak detector. A determination of whether the detected peak is present at the harmonic frequency can be carried out simultaneously during peak detection, for instance by harmonic frequency peak detector 1404, or after peak detection, for instance by own voice identifier 1407. In some implementations, peak detector 1403 is configured for peak detection at an alias frequency of the vibration detected by vibration sensor 108, as illustrated by component 1405 constituting an alias frequency peak detector. A determination of whether the detected peak is present at the alias frequency can be carried out simultaneously during peak detection, for instance by alias frequency peak detector 1405, or after peak detection, for instance by own voice identifier 1407.
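One way to evaluate a signal directly at a single frequency, so that peak detection and the frequency check happen in one step, is the Goertzel algorithm. The disclosure does not name an algorithm, so the Goertzel algorithm is swapped in here purely as an illustration. The function names, the normalization, the detection threshold of 0.5, and the test tone are assumptions of the sketch.

```python
import math


def goertzel_power(samples, sample_rate_hz: float, target_hz: float) -> float:
    """Squared DFT magnitude of the samples at the bin nearest to target_hz."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def peak_present_at(samples, sample_rate_hz: float, target_hz: float,
                    min_ratio: float = 0.5) -> bool:
    """True if a large share of the signal energy sits at the target frequency."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False
    # The ratio is close to 1.0 for a pure tone located on the evaluated bin.
    ratio = 2.0 * goertzel_power(samples, sample_rate_hz, target_hz) / (len(samples) * energy)
    return ratio >= min_ratio


# Hypothetical test tone: 120 Hz sine sampled at 1000 Hz for one second.
rate_hz = 1000.0
tone = [math.sin(2.0 * math.pi * 120.0 * i / rate_hz) for i in range(1000)]
at_harmonic = peak_present_at(tone, rate_hz, 120.0)
elsewhere = peak_present_at(tone, rate_hz, 200.0)
```

The same function could be called with an alias frequency as `target_hz` to model the alias frequency peak detector.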
(56)
(57)
(58) In some implementations, signal processing configuration 1601 further comprises a speech recognizer 1609. Speech recognizer 1609 is configured to identify a content of a speech of the user identified as an own voice activity by own voice identifier 1407. The speech recognition can be based on spectral information comprising the frequencies associated with the peaks previously detected by peak detectors 1403, 1503 and/or temporal information comprising the time interval between the detected peaks provided by modulation analyzers 1605, 1606. For instance, keywords and/or commands and/or sentences spoken by the user may be identified in such a configuration.
(59)
(60) In some implementations, an audio signal comprising information about the multiple audio signals provided by microphones 106, 1706 is provided by beamformer 1702 to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. In some implementations, the audio signal provided by microphone 106 and the audio signal provided by microphone 1706 are provided separately to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. Correlator and/or comparator 1506 can be configured to correlate and/or compare the peaks detected by vibration signal peak detector 1403 in the vibration signal and the peaks detected by audio signal peak detector 1503 in the respective audio signal of both microphones 106, 1706.
(61)
(62) While the principles of the disclosure have been described above in connection with specific devices and methods, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the invention. The above-described preferred embodiments are intended to illustrate the principles of the invention, but not to limit the scope of the invention. Various other embodiments and modifications to those preferred embodiments may be made by those skilled in the art without departing from the scope of the present invention, which is solely defined by the claims.