Hearing device for own voice detection and method of operating a hearing device

11115762 · 2021-09-07

Abstract

A hearing device configured to be worn at a head of a user. The hearing device includes a vibration sensor to detect a vibration conducted through the user's head to the hearing device and to output a vibration signal including information about the vibration. At least part of the vibration can be caused by an own voice activity of the user. A method of operating the hearing device allows a reliable own voice detection at rather low processing effort. A processor determines a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic and identifies the own voice activity based on an identification criterion including the presence of the own voice characteristic in the vibration signal at the associated vibration frequency. The own voice characteristic is indicative of the part of the vibration that can be caused by the own voice activity.

Claims

1. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; and a processor communicatively coupled to the vibration sensor; wherein the processor is configured to: determine a presence of a first own voice characteristic in the vibration signal at a first vibration frequency associated with the first own voice characteristic, said first associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the first own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determine a presence of a second own voice characteristic in the vibration signal at an associated second vibration frequency; and identify the own voice activity based on an identification criterion comprising said presence of the first own voice characteristic in the vibration signal at the associated first vibration frequency, and said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency.

2. The device according to claim 1, characterized in that the processor is configured to determine a temporal sequence of said presence of the first own voice characteristic and the second own voice characteristic in the vibration signal, wherein said identification criterion further comprises said presence of the first own voice characteristic temporally preceding said presence of the second own voice characteristic in the vibration signal.

3. The device according to claim 1, characterized in that at least one of the first own voice characteristic or the second own voice characteristic comprises a peak of the vibration signal at at least one of the first associated vibration frequency or the second associated vibration frequency.

4. The device according to claim 1, characterized in that the device further comprises a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone.

5. The device according to claim 4, characterized in that the processor is configured to determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity, wherein said identification criterion further comprises said presence of the own voice characteristic in the audio signal at the associated audio frequency.

6. The device according to claim 5, characterized in that the processor is configured to: determine a signal feature of the vibration signal; determine a signal feature of the audio signal; and determine a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal, wherein at least one of said presence of at least one of the first own voice characteristic or the second own voice characteristic in the vibration signal at at least one of the first associated vibration frequency or the second associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency is determined based on the similarity measure.

7. The device according to claim 4, characterized in that the processor is configured to determine an intensity of the audio signal and to select at least one of the first associated vibration frequency or the second associated vibration frequency depending on said audio signal intensity.

8. The device according to claim 1, characterized in that the vibration sensor comprises an accelerometer.

9. The device according to claim 1, characterized in that said vibration signal comprises first directional data indicative of a first direction of said part of the vibration caused by the own voice activity, and second directional data indicative of a second direction of said part of the vibration caused by the own voice activity, wherein the processor is configured to determine said presence of at least one of the first own voice characteristic or the second own voice characteristic in the first directional data and in the second directional data, wherein said identification criterion further comprises a coincidence of said presence of at least one of the first own voice characteristic or the second own voice characteristic in the first directional data and in the second directional data.

10. The device according to claim 1, characterized in that at least one of the first associated vibration frequency or the second associated vibration frequency is selected such that it comprises an alias frequency of a frequency of said part of the vibration caused by the own voice activity.

11. The device according to claim 1, characterized in that the processor is configured to evaluate the vibration signal at a sampling rate of at most 1 kHz.

12. The device according to claim 1, characterized in that the processor is configured to determine a signal feature of the vibration signal; classify, based on a pattern of own voice characteristics, the signal feature as at least one of the first own voice characteristic or the second own voice characteristic; and identify the vibration frequency associated with at least one of the first own voice characteristic or the second own voice characteristic.

13. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; providing a vibration signal comprising information about said vibration; determining a presence of a first own voice characteristic in the vibration signal at a first vibration frequency associated with the first own voice characteristic, said associated first vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the first own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determining a presence of a second own voice characteristic in the vibration signal at an associated second vibration frequency; and identifying the own voice activity based on an identification criterion comprising said determined presence of the first own voice characteristic in the vibration signal at the associated first vibration frequency and said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency.

14. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; a processor communicatively coupled to the vibration sensor; a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone; wherein the processor is configured to: determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determine a presence of the own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity; determine a signal feature of the vibration signal; determine a signal feature of the audio signal; determine a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal, wherein at least one of said presence of the own voice characteristic in the vibration signal at the associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency is determined based on the similarity measure; and to identify the own voice activity based on an identification criterion comprising said 
presence of the own voice characteristic in the vibration signal at the associated vibration frequency, and said presence of the own voice characteristic in the audio signal at the associated audio frequency.

15. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; detecting a sound conducted through an ambient environment of the user; providing a vibration signal comprising information about said vibration; providing an audio signal comprising information about said sound; determining a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity and the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; determining a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, the own voice characteristic in the audio signal indicative of at least a part of said sound which is caused by the own voice activity; determining a signal feature of the vibration signal; determining a signal feature of the audio signal; determining a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal; determining, based on the similarity measure, at least one of said presence of the own voice characteristic in the vibration signal at the associated vibration frequency and said presence of the own voice characteristic in the audio signal at the associated audio frequency; and identifying the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency, and said determined presence of the own voice characteristic in the audio signal at the associated audio frequency.

16. A hearing device configured to be worn at least partially at a head of a user, the hearing device comprising: a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user, and to output a vibration signal comprising information about said vibration; a processor communicatively coupled to the vibration sensor; a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound, the processor communicatively coupled to the microphone; wherein the processor is configured to: determine an intensity of the audio signal; determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected, depending on said audio signal intensity, such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity, the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; and to identify the own voice activity based on an identification criterion comprising said presence of the own voice characteristic in the vibration signal at the associated vibration frequency.

17. A method of operating a hearing device configured to be worn at least partially at a head of a user, the method comprising: detecting a vibration conducted through the user's head to the hearing device, at least a part of the vibration caused by an own voice activity of the user; detecting a sound conducted through an ambient environment of the user; providing a vibration signal comprising information about said vibration; providing an audio signal comprising information about said sound; determining an intensity of the audio signal; determining a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, said associated vibration frequency being selected, depending on said audio signal intensity, such that it comprises a harmonic frequency of said part of the vibration caused by the own voice activity, the own voice characteristic being indicative of said part of the vibration caused by the own voice activity; and identifying the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. The drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements. In the drawings:

(2) FIG. 1 schematically illustrates an exemplary hearing device including a vibration sensor, a processor, a memory, an output transducer, and a microphone, in accordance with some embodiments of the present disclosure;

(3) FIG. 2 schematically illustrates an accelerometer which can be applied in the hearing device shown in FIG. 1 as an example of the vibration sensor, in accordance with some embodiments of the present disclosure;

(4) FIGS. 3A-3C illustrate exemplary vibration signals which can be provided by the vibration sensor illustrated in FIG. 2, in accordance with some embodiments of the present disclosure;

(5) FIGS. 4A-4C illustrate exemplary frequency spectra which can be obtained from the vibration signals illustrated in FIGS. 3A-3C, in accordance with some embodiments of the present disclosure;

(6) FIGS. 5A-5C illustrate further exemplary frequency spectra which can be obtained from the vibration signals illustrated in FIGS. 3A-3C, in accordance with some embodiments of the present disclosure;

(7) FIGS. 6-13 illustrate exemplary methods of own voice detection that may be executed by the hearing device illustrated in FIG. 1, in accordance with some embodiments of the present disclosure; and

(8) FIGS. 14-18 illustrate exemplary signal processing configurations that may be implemented by the hearing device illustrated in FIG. 1, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION OF THE DRAWINGS

(9) Referring to FIG. 1, a hearing device 100 according to some embodiments of the present disclosure is illustrated. As shown, hearing device 100 includes a processor 102 communicatively coupled to a memory 104, a microphone 106, a vibration sensor 108, and an output transducer 110. Hearing device 100 may include additional or alternative components as may serve a particular implementation.

(10) Hearing device 100 may be implemented by any type of hearing device configured to enable or enhance hearing of a user wearing hearing device 100. For example, hearing device 100 may be implemented by a hearing aid configured to provide an amplified version of audio content to a user, an earphone, a cochlear implant system configured to provide electrical stimulation representative of audio content to a user, a sound processor included in a bimodal hearing system configured to provide both amplification and electrical stimulation representative of audio content to a user, or any other suitable hearing prosthesis. Different types of hearing devices can also be distinguished by the position at which a housing accommodating output transducer 110 is intended to be worn at a head of a user relative to an ear canal of the user. Hearing devices which are configured such that the housing enclosing the transducer can be worn at a wearing position outside the ear canal, in particular behind an ear of the user, can include, for instance, behind-the-ear (BTE) hearing aids. Hearing devices which are configured such that the housing enclosing the transducer can be at least partially inserted into the ear canal can include, for instance, earbuds, earphones, and hearing instruments such as receiver-in-the-canal (RIC) hearing aids, in-the-ear (ITE) hearing aids, invisible-in-the-canal (IIC) hearing aids, and completely-in-the-canal (CIC) hearing aids. The housing can be an earpiece adapted for an insertion and/or a partial insertion into the ear canal. Some hearing devices comprise a housing having a standardized shape intended to fit into a variety of ear canals of different users. Other hearing devices comprise a housing having a customized shape adapted to an ear canal of an individual user. The customized housing can be, for instance, a shell formed from an ear mould or an earpiece that is customizable in-situ by the user.

(11) Microphone 106 may be implemented by any suitable audio detection device and is configured to detect a sound presented to a user of hearing device 100. The sound can comprise audio content (e.g., music, speech, noise, etc.) generated by one or more audio sources included in an environment of the user. The sound can also include audio content generated by a voice of the user during an own voice activity, such as a speech by the user. In particular, a vibration of the user's vocal cords during the own voice activity may produce airborne sound in the environment of the user, which is detectable by microphone 106. Microphone 106 is configured to output an audio signal comprising information about the sound detected from the environment of the user. Microphone 106 may be included in or communicatively coupled to hearing device 100 in any suitable manner. Output transducer 110 may be implemented by any suitable audio output device, for instance a loudspeaker of a hearing device or an output electrode of a cochlear implant system.

(12) Vibration sensor 108 may be implemented by any suitable sensor configured to detect a vibration conducted during an own voice activity through the user's head. In particular, the vibrations can be conducted from the user's vocal cords through the bones and tissue of the head. In some implementations, sensor 108 may also be referred to as a bone vibration sensor. Vibration sensor 108 is configured to output a vibration signal comprising information about the detected vibrations. Vibration sensor 108 may be positioned at any position at the user's head allowing the detection of the vibrations conducted through the head. In some implementations, vibration sensor 108 can be positioned behind an ear of the user. For instance, vibration sensor 108 can be included in a part of a BTE or RIC hearing aid intended to be worn behind the user's ear. In some implementations, vibration sensor 108 can be positioned inside an ear canal of the user. For instance, vibration sensor 108 can be included in a part of an earbud or of a RIC, ITE, IIC, or CIC hearing aid intended to be worn inside the ear canal.

(13) In some implementations, vibration sensor 108 can be included inside a housing of the hearing device. The vibrations can be transmitted from the user's head through the housing to vibration sensor 108. In some implementations, vibration sensor 108 can be provided externally from a housing of the hearing device. In particular, vibration sensor 108 can be provided at a head surface, for instance behind the ear or inside the ear canal, to directly pick up the vibrations from the user's head. Thus, while hearing device 100 is being worn by a user, the detected vibrations are representative of the own voice activity. In some implementations, vibration sensor 108 comprises an inertial sensor, in particular an accelerometer and/or a gyroscope. The inertial sensor can be positioned inside the ear canal or at a different position at the user's head. In some implementations, vibration sensor 108 comprises a bone conduction microphone and/or a pressure sensor and/or a strain gauge to be positioned inside an ear canal as disclosed in European patent application No. EP 18195686.3, which is herewith included by reference. In some implementations, vibration sensor 108 comprises an optical sensor employing a light emitter, such as a laser diode or an LED, and a photodetector to detect the vibrations, as disclosed in U.S. patent application publication No. US 2018/0011006 A1, which is herewith included by reference.

(14) In some implementations, vibration sensor 108 is configured to output the vibration signal while microphone 106 outputs the audio signal. Both the vibration signal and the audio signal can be representative of the own voice activity. For example, the audio signal may represent audio content generated, on the one hand, by one or more audio sources included in an environment and, on the other hand, by the own voice activity, while the vibration signal may represent vibrations mostly generated by the own voice activity. As another example, the vibration signal may contain additional artefacts caused, for instance, by a movement of the user and/or impacts from the environment.

(15) Memory 104 may be implemented by any suitable type of storage medium and may be configured to maintain (e.g., store) data generated, accessed, or otherwise used by processor 102. For example, memory 104 may maintain data representative of an own voice processing program that specifies how processor 102 processes the vibration signal and/or the audio signal. Memory 104 may also be used to maintain a database including data representative of parameters that are employed for the own voice detection. To illustrate, memory 104 may maintain data associated with own voice characteristics that can be representative of an own voice activity in the vibration signal provided by vibration sensor 108 and/or in the audio signal provided by microphone 106. The data may include values of a vibration frequency of the vibration signal and/or values of an audio frequency of the audio signal which are associated with a respective own voice characteristic in the vibration signal and/or audio signal.
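To illustrate how such a parameter database might be organized, the following Python sketch maps own voice characteristics to associated frequency values. The entry names, the tolerance field, and the lookup helper are illustrative assumptions and not part of the disclosure; the 78 Hz and 92 Hz values are taken from the vowel examples discussed in the description of FIGS. 4A-4C and 5A-5C.

```python
# Hypothetical layout of the own-voice parameter database held in memory 104.
# Entry names and the tolerance field are illustrative, not from the disclosure.
OWN_VOICE_PARAMETERS = {
    "vowel_1": {"vibration_hz": 78.0, "audio_hz": 78.0, "tolerance_hz": 3.0},
    "vowel_2": {"vibration_hz": 92.0, "audio_hz": 92.0, "tolerance_hz": 3.0},
}

def associated_vibration_frequency(characteristic):
    """Look up the vibration frequency associated with an own voice characteristic."""
    return OWN_VOICE_PARAMETERS[characteristic]["vibration_hz"]

print(associated_vibration_frequency("vowel_1"))   # 78.0
```

A processor could consult such a table to know at which vibration frequency to test the vibration signal for a given characteristic.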

(16) Processor 102 may be configured to access the vibration signal generated by vibration sensor 108 and/or the audio signal generated by microphone 106. Processor 102 may use the vibration signal and/or the audio signal to identify an own voice activity of the user. For example, processor 102 may be configured to determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, and to identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency. As another example, processor 102 may determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, and identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the audio signal at the associated audio frequency. These and other operations that may be performed by processor 102 are described in more detail in the description that follows. References to operations performed by hearing device 100 may be understood to be performed by processor 102 of hearing device 100.
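As a rough illustration of comparing the vibration signal with the audio signal through a similarity measure between their signal features, the following Python sketch derives coarse band-energy features from two simulated signals and compares them. The band layout, the cosine-similarity measure, and the 0.8 threshold are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def spectral_feature(signal, fs, n_bands=16, fmax=125.0):
    """Coarse band-energy feature of a signal's spectrum (illustrative)."""
    power = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    edges = np.linspace(0.0, fmax, n_bands + 1)
    return np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

def similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

fs = 250
t = np.arange(0, 3, 1 / fs)
rng = np.random.default_rng(2)
voiced = np.sin(2 * np.pi * 78 * t)                      # shared voiced component
vibration = voiced + 0.2 * rng.standard_normal(t.size)   # bone-conducted pickup
audio = voiced + 0.5 * rng.standard_normal(t.size)       # microphone pickup

# High similarity suggests both signals carry the same own voice component.
own_voice_likely = similarity(spectral_feature(vibration, fs),
                              spectral_feature(audio, fs)) > 0.8
print(own_voice_likely)
```

When the two signals share a dominant voiced component, their band-energy features align and the similarity approaches one; uncorrelated content drives it down.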

(17) FIG. 2 illustrates vibration sensor 108 in accordance with some embodiments of the present disclosure. Vibration sensor 108 is provided by an accelerometer 109 which may be configured to detect accelerations in one, two, or three distinct spatial directions. In the illustrated example, accelerometer 109 is configured to detect accelerations in three spatial directions including an x-direction 122, a y-direction 123, and a z-direction 124. When positioned at a user's head, accelerometer 109 is thus configured to provide a respective vibration signal indicative of vibrations caused by an own voice activity in each of spatial directions 122-124.

(18) FIGS. 3A-3C show examples of vibration signals 132, 133, 134 recorded (e.g., sampled) over time by vibration sensor 108 implemented by an accelerometer 109, as illustrated in FIG. 2, wherein the accelerometer has been positioned at a user's head during an own voice activity. Vibration signal 132 depicted in FIG. 3A represents detected vibrations in x-direction 122, vibration signal 133 depicted in FIG. 3B represents detected vibrations in y-direction 123, and vibration signal 134 depicted in FIG. 3C represents detected vibrations in z-direction 124. Vibration signals 132-134 are depicted in a respective functional plot. Each functional plot comprises an axis of ordinates 135 indicating a signal level of the recorded vibration signal, and an axis of abscissas 136 indicating a time during which the vibration signal has been recorded. Vibration signals 132-134 have been recorded during the same time period. The duration of the time period (e.g., sampling time) was 10 seconds. In the example, vibrations caused by an own voice activity of a user have been recorded at a sampling rate of 250 Hz. During the own voice activity, the user has successively pronounced three different vowels. The vibrations caused by speaking of the first vowel have been recorded during a time corresponding to approximately the initial three seconds in the functional plots of vibration signals 132-134. The recorded vibrations related to the second vowel correspond to approximately the subsequent three seconds in the functional plots. The recorded vibrations related to the third vowel correspond to approximately the last three seconds in the functional plots.

(19) FIGS. 4A-4C show frequency spectra 142, 143, 144 obtained from vibration signals 132, 133, 134. Frequency spectra 142, 143, 144 have been obtained by evaluating a first temporal section of time dependent vibration signals 132, 133, 134 in a frequency domain. The first temporal section corresponds to a signal portion recorded during the initial three seconds in the functional plots of vibration signals 132-134 depicted in FIGS. 3A-3C. Thus, frequency spectra 142-144 are indicative of the vibrations caused by the first vowel pronounced by the user. Frequency spectrum 142 depicted in FIG. 4A has been obtained from the first temporal section of vibration signal 132, frequency spectrum 143 depicted in FIG. 4B has been obtained from the first temporal section of vibration signal 133, and frequency spectrum 144 depicted in FIG. 4C has been obtained from the first temporal section of vibration signal 134. Frequency spectrum 142 is thus indicative of the detected vibrations in x-direction 122, frequency spectrum 143 is indicative of the detected vibrations in y-direction 123, and frequency spectrum 144 is indicative of the detected vibrations in z-direction 124. Frequency spectra 142-144 are depicted in a respective functional plot. Each functional plot comprises an axis of ordinates 145 indicating a signal level of the recorded vibration signal, and an axis of abscissas 146 indicating a vibration frequency associated with the signal level. In the example, frequency spectra 142-144 are provided as the power spectral density (PSD) of the first temporal section of time dependent vibration signals 132-134. Before obtaining frequency spectra 142-144, vibration signals 132-134 have been frequency filtered by a high pass filter.
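A periodogram-style PSD estimate of the kind shown in FIGS. 4A-4C can be sketched in Python as follows. The synthetic 78 Hz test signal, the Hann window, and the 20 Hz high-pass cutoff are illustrative assumptions; the 250 Hz sampling rate and the 3-second vowel segment follow the recording parameters described above.

```python
import numpy as np

FS = 250          # sampling rate in Hz, as in the example recordings
DURATION = 3.0    # one vowel segment of the 10 s recording

def psd(signal, fs=FS):
    """Periodogram estimate of the power spectral density."""
    n = len(signal)
    window = np.hanning(n)
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = (np.abs(spectrum) ** 2) / (fs * np.sum(window ** 2))
    return freqs, power

# Synthetic stand-in for vibration signal 132: a 78 Hz component
# (the fundamental produced by the first vowel) plus sensor noise.
rng = np.random.default_rng(0)
t = np.arange(0, DURATION, 1.0 / FS)
vibration = np.sin(2 * np.pi * 78.0 * t) + 0.2 * rng.standard_normal(t.size)

freqs, power = psd(vibration)
# Simple high-pass: ignore bins below 20 Hz before locating the peak.
mask = freqs >= 20.0
peak_hz = freqs[mask][np.argmax(power[mask])]
print(round(peak_hz, 1))   # peak near the associated 78 Hz vibration frequency
```

The spectral peak emerges at the vibration frequency associated with the own voice characteristic, analogous to peaks 147-149 in the figures.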

(20) Signal features produced in vibration signals 132-134 by the own voice activity can be visualized in frequency spectra 142-144. In the example, such a signal feature of vibration signals 132-134 produced by the pronunciation of the first vowel can be seen as a peak 147, 148, 149 visible in frequency spectra 142-144 at an associated vibration frequency of approximately 78 Hz. Signal features 147-149 each are indicative of the vibration caused by the own voice activity and thus correspond to an own voice characteristic. Own voice characteristics 147-149 are produced in each of vibration signals 132-134 for the different spatial directions 122-124. Determining a presence of the own voice characteristic in vibration signals 132-134 at the associated vibration frequency can thus be exploited to provide an identification criterion for the own voice activity. On the one hand, such an identification criterion can facilitate the own voice detection, in particular by allowing a faster detection. On the other hand, such an identification criterion can increase the reliability of the own voice detection, in some implementations in conjunction with additional conditions required to satisfy the identification criterion.
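One possible way to evaluate such a criterion, including a coincidence of the characteristic across the spatial directions as recited in claim 9, is sketched below in Python. The peak-to-median power ratio test, its threshold, and the ±3 Hz tolerance are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def characteristic_present(signal, fs, target_hz, tol_hz=3.0, ratio=20.0):
    """Return True if the spectrum shows a peak near target_hz that
    exceeds the median spectral power by the given ratio (illustrative test)."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= target_hz - tol_hz) & (freqs <= target_hz + tol_hz)
    return bool(spectrum[band].max() > ratio * np.median(spectrum[freqs >= 20.0]))

fs = 250
t = np.arange(0, 3, 1 / fs)
rng = np.random.default_rng(1)
# Simulated vibration signals for the x-, y-, and z-directions during a vowel.
axes = [np.sin(2 * np.pi * 78 * t + phase) + 0.3 * rng.standard_normal(t.size)
        for phase in (0.0, 0.5, 1.0)]
noise_only = 0.3 * rng.standard_normal(t.size)

# Identification criterion: the characteristic coincides in all directions.
own_voice = all(characteristic_present(a, fs, 78.0) for a in axes)
print(own_voice)
print(characteristic_present(noise_only, fs, 78.0))
```

Requiring the peak to appear in all three directional signals, rather than just one, is one way such a coincidence condition could reduce false detections from directional artefacts.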

(21) FIGS. 5A-5C show functional plots of further frequency spectra 152, 153, 154. Frequency spectra 152-154 have been obtained by evaluating a second temporal section of time dependent vibration signals 132, 133, 134 in a frequency domain. The second temporal section corresponds to a signal portion recorded between the third and the sixth second in the functional plots of vibration signals 132-134 depicted in FIGS. 3A-3C. Thus, frequency spectra 152-154 are indicative of the vibrations caused by the second vowel pronounced by the user. Frequency spectrum 152 depicted in FIG. 5A has been obtained from the second temporal section of vibration signal 132, frequency spectrum 153 depicted in FIG. 5B has been obtained from the second temporal section of vibration signal 133, and frequency spectrum 154 depicted in FIG. 5C has been obtained from the second temporal section of vibration signal 134. Frequency spectrum 152 is thus indicative of the vibrations in x-direction 122, frequency spectrum 153 is indicative of the vibrations in y-direction 123, and frequency spectrum 154 is indicative of the vibrations in z-direction 124.

(22) Signal features produced in vibration signals 132-134 by the pronunciation of the second vowel can be seen in frequency spectra 152-154 as a spectral peak 157, 158, 159. Signal features 157-159 each are indicative of the vibration caused by the own voice activity and thus each correspond to an own voice characteristic. The vibration frequency associated with own voice characteristics 157-159 is approximately 92 Hz in each vibration signal 132-134 for the different spatial directions 122-124. The vibration frequency associated with own voice characteristics 147-149 produced in vibration signals 132-134 by the pronunciation of the first vowel thus differs from the vibration frequency associated with own voice characteristics 157-159 produced in vibration signals 132-134 by the pronunciation of the second vowel. This shows that the vibration frequency associated with the own voice characteristics produced in vibration signals 132-134 can depend on the content of the own voice activity, in particular the content of the user's speech. Moreover, the vibration frequency associated with the own voice characteristics generally can also depend on properties of the user. For instance, different voices of different users generally may produce an own voice characteristic associated with a different vibration frequency in the vibration signal, in particular for an own voice activity including the same content. Moreover, different speech volumes of the own voice activity, for instance when the user speaks louder due to noise occurring in the environment, can lead to a frequency shift of the vibration frequency associated with the own voice characteristic. The latter phenomenon is also known as the "Lombard effect".
An own voice detection relying on an identification criterion comprising a presence of the own voice characteristic in vibration signals 132-134 may thus account for the occurring variations of the vibration frequency associated with the own voice characteristic in order to increase the detection reliability. Some embodiments of hearing device 100 and methods of its operation, which allow employing such an identification criterion for own voice detection at varying vibration frequencies associated with the own voice characteristic, are addressed in the subsequent description.

(23) FIGS. 6-13 illustrate exemplary methods of operating a hearing device according to some embodiments of the present disclosure. Other embodiments may omit, add to, reorder and/or modify any of the operations shown in FIGS. 6-13. Some embodiments may be implemented in hearing device 100 illustrated in FIG. 1. Some embodiments may be implemented in a hearing device comprising additional constituent parts, for instance an additional microphone and/or a beamformer. Some embodiments may be implemented in a hearing system comprising two hearing devices in a binaural configuration.

(24) In the method illustrated in FIG. 6, a vibration signal indicative of a vibration caused by an own voice activity of a user is provided in operation 602. The vibration signal can be provided, for instance, by vibration sensor 108 after detection of the vibration conducted through the user's head. In operation 603, a signal feature is determined in the vibration signal. The signal feature can be produced in the vibration signal by the own voice activity. The signal feature can be a frequency dependent property of the vibration signal such that it is characteristic for a specific vibration frequency. The signal feature can comprise a peak at the vibration frequency. To illustrate, the signal feature may be provided as at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134. Alternatively or additionally, the signal feature can comprise another property, for instance a signal level larger than a specified minimum level at the vibration frequency.

(25) The determining the signal feature can comprise a peak detection in the vibration signal. In some implementations, the vibration signal can be evaluated in a frequency domain comprising a spectrum of vibration frequencies in order to determine the signal feature. This may imply converting a time dependent vibration signal from a time domain into the frequency domain. In some implementations, the signal feature can be determined directly from a time dependent vibration signal. To illustrate, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 at an associated vibration frequency may be extracted after converting at least a temporal section of vibration signals 132-134 from the time domain into the frequency domain, as illustrated in FIGS. 4A-4C and FIGS. 5A-5C, and/or the respective peak may be extracted directly from time dependent vibration signals 132-134.
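The peak detection in the frequency domain described above can be sketched as follows; this is a minimal illustration only, in which the sampling rate, signal length, and the 78 Hz test tone (cf. peaks 147-149) are assumed values, not taken from the disclosure:

```python
import numpy as np

def dominant_peak_hz(signal, sample_rate_hz):
    """Return the frequency (Hz) of the strongest spectral peak in a
    time-domain vibration signal, after conversion into the frequency
    domain via a magnitude FFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    # Skip the DC bin so a constant offset is not reported as a peak.
    peak_bin = 1 + np.argmax(spectrum[1:])
    return freqs[peak_bin]

# Synthetic vibration: a 78 Hz tone plus weak noise (assumed example).
fs = 1000
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)
vib = np.sin(2 * np.pi * 78 * t) + 0.05 * rng.normal(size=t.size)
print(round(dominant_peak_hz(vib, fs)))  # → 78
```

A real implementation would typically evaluate a windowed temporal section, as with the first and second temporal sections of vibration signals 132-134.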

(26) In operation 607, a decision is performed depending on an identification criterion. The identification criterion can be based on whether the signal feature is determined to be present in the vibration signal at a vibration frequency associated with an own voice characteristic. The signal feature can thus be identified as the own voice characteristic which is determined to be present at the associated vibration frequency. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises simultaneously determining the signal feature in the vibration signal and the presence of the signal feature at the vibration frequency associated with the own voice characteristic in operation 603. In particular, the vibration signal can be evaluated at the associated vibration frequency with respect to the presence of the signal feature which is thus identified as the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises first determining the signal feature in operation 603, and subsequently determining the presence of the signal feature at the vibration frequency associated with the own voice characteristic. For instance, the vibration signal can be evaluated for any vibration frequency or a plurality of vibration frequencies with respect to the presence of the signal feature and then it can be determined if a vibration frequency at which the signal feature is present corresponds to the vibration frequency associated with the own voice characteristic.
To illustrate, vibration signals 132-134 may be evaluated at the vibration frequency associated with at least one of peaks 147-149 and/or at least one of peaks 157-159 in order to determine the presence of the respective peak at the associated vibration frequency, and/or vibration signals 132-134 may be first evaluated with respect to the presence of at least one of peaks 147-149 and/or at least one of peaks 157-159 and then it may be determined if the respective peak is present at the associated vibration frequency.

(27) The vibration frequency associated with the own voice characteristic can comprise a frequency bandwidth. The frequency bandwidth can be selected such that it accounts for inaccuracies and/or variances of a value of the vibration frequency occurring during the detection of the vibration. In some implementations, the frequency bandwidth can be selected such that it is associated with a plurality of own voice characteristics. To illustrate, the vibration frequency can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 and the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The own voice activity may thus be identified depending on at least one of the own voice characteristics determined to be present at the associated vibration frequency. In some implementations, the frequency bandwidth can be selected such that it is associated with a single own voice characteristic. To illustrate, the vibration frequency associated with one own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The vibration frequency associated with another own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134. The own voice activity may thus be identified depending on the respective own voice characteristic determined to be present at the associated vibration frequency.

(28) Depending on the outcome of the decision performed in operation 607, a non-occurring own voice activity of the user is identified in operation 608, if the own voice characteristic has not been determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic. Conversely, an occurrence of an own voice activity of the user is identified in operation 609, if the own voice characteristic has been determined to be present in the vibration signal at the associated vibration frequency.
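The bandwidth-based criterion of paragraph (27) and the decision of operations 607-609 can be sketched as follows; the band edges around the ~78 Hz and ~92 Hz peaks are assumed for illustration only:

```python
def own_voice_present(peak_hz, bands):
    """Identification criterion: the signal feature counts as an own voice
    characteristic if its frequency falls inside any associated frequency
    bandwidth, which absorbs detection inaccuracies and frequency variance."""
    return any(lo <= peak_hz <= hi for lo, hi in bands)

# Assumed bandwidths around the peaks 147-149 (~78 Hz) and 157-159 (~92 Hz).
bands = [(73.0, 83.0), (87.0, 97.0)]
print(own_voice_present(78.2, bands))  # True  -> operation 609
print(own_voice_present(60.0, bands))  # False -> operation 608
```

Using a single wide band merges both characteristics; using one band per characteristic allows identifying which characteristic is present, as described above.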

(29) FIG. 7 illustrates another method of operating a hearing device for detection of an own voice activity of the user according to some embodiments of the present disclosure. In operation 702, data relative to at least one own voice characteristic or a plurality of own voice characteristics is maintained in a database. The data comprises the vibration frequency associated with the own voice characteristic. The data can be stored in memory 104 and/or processed by processor 102. In operation 703, at least one vibration frequency associated with at least one own voice characteristic is retrieved from the data. Operations 702 and 703 can be performed concurrently with providing a vibration signal in operation 602 and/or determining the signal feature in operation 603. In some implementations, an operation 704 can be performed after determining the signal feature in operation 603. In operation 704, the vibration frequency associated with the own voice characteristic, which has been retrieved in operation 703, is compared with the vibration frequency at which the signal feature, as determined in operation 603, is present. The comparison in operation 704 can then be employed in the decision in operation 607 depending on whether the signal feature identified as the own voice characteristic is present in the vibration signal at the vibration frequency associated with the own voice characteristic. In some implementations, the vibration frequency associated with the own voice characteristic, which has been retrieved in operation 703, can be applied during determining the signal feature in operation 603, as indicated by a dashed arrow in FIG. 7. In particular, the vibration signal can be directly evaluated at the associated vibration frequency, which has been retrieved in operation 703, with respect to the presence of the signal feature at the vibration frequency associated with the own voice characteristic. The comparison in operation 704 may then be omitted. 
The evaluation of the vibration signal at the associated vibration frequency can be employed in the decision in operation 607 depending on whether the own voice characteristic is present in the vibration signal at the vibration frequency associated with the own voice characteristic.
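Operations 702-704 can be sketched with a small in-memory stand-in for the database maintained in memory 104; the entry names, frequencies, and tolerances are hypothetical:

```python
# Hypothetical database of own voice characteristics (operation 702): each
# entry holds an associated vibration frequency and a matching tolerance.
OWN_VOICE_DB = {
    "first_vowel": {"freq_hz": 78.0, "tol_hz": 5.0},
    "second_vowel": {"freq_hz": 92.0, "tol_hz": 5.0},
}

def match_characteristic(peak_hz, db=OWN_VOICE_DB):
    """Operations 703/704: retrieve each associated frequency and compare it
    with the frequency at which the signal feature is present."""
    for name, entry in db.items():
        if abs(peak_hz - entry["freq_hz"]) <= entry["tol_hz"]:
            return name
    return None

print(match_characteristic(91.0))   # second_vowel
print(match_characteristic(120.0))  # None
```

In the variant indicated by the dashed arrow in FIG. 7, the retrieved frequencies would instead steer the evaluation in operation 603 directly, and this comparison step would be omitted.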

(30) FIG. 8 illustrates another method of operating a hearing device for detection of an own voice activity of the user according to some embodiments of the present disclosure. An operation 804 is performed after determining the signal feature in operation 603. In operation 804, the signal feature determined in operation 603 is classified based on a pattern of own voice characteristics. Before or during operation 804, the pattern of own voice characteristics is provided in operation 805. For instance, the pattern of own voice characteristics can be retrieved by processor 102 from a database, for instance a database stored in memory 104. The classification in operation 804 can comprise determining a similarity measure between the signal feature determined in operation 603 and the pattern of own voice characteristics provided in operation 805. In particular, the signal feature can be classified in operation 804 as the own voice characteristic if the similarity measure is determined to be larger than a threshold value of the similarity measure. For instance, a vibration frequency at which the signal feature determined in operation 603 is present can be determined to exceed the threshold value of the similarity measure with respect to the pattern of own voice characteristics provided in operation 805. For instance, a signal level of the signal feature determined in operation 603 can be determined to exceed the threshold value of the similarity measure with respect to the pattern of own voice characteristics provided in operation 805. For instance, a characteristic of an audio signal indicative of a sound that has been detected simultaneously with the vibration producing the signal feature determined in operation 603 can be determined to exceed the threshold value of the similarity measure with respect to the pattern of own voice characteristics provided in operation 805. 
The signal feature can thus be classified as the own voice characteristic determined to be present at the associated vibration frequency. In some implementations, the pattern of own voice characteristics provided in operation 805 can then be customized by processor 102 such that it includes new information regarding the signal feature classified as the own voice characteristic. Subsequently, the customized pattern of own voice characteristics may be stored in the database such that the pattern including the new information can be retrieved in future executions of operation 805. In this way, processor 102 can be configured to learn the own voice characteristics and the associated vibration frequency. A corresponding classifier can be operable by processor 102. The classification can be based, for instance, on a Bayesian analysis and/or other classification schemes.
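The classification and learning behaviour of operations 804/805 can be sketched as follows, under the simplifying assumptions that the pattern is a list of reference frequencies and the similarity measure is a normalized frequency distance; the disclosure's classifier could instead be Bayesian or another scheme:

```python
class OwnVoiceClassifier:
    """Minimal sketch of operations 804/805 (assumed similarity model)."""

    def __init__(self, pattern_hz, threshold=0.9, max_dev_hz=20.0):
        self.pattern_hz = list(pattern_hz)  # pattern retrieved in 805
        self.threshold = threshold          # assumed threshold value
        self.max_dev_hz = max_dev_hz        # deviation normalization

    def similarity(self, peak_hz):
        """Similarity in [0, 1] between a detected peak and the pattern."""
        dev = min(abs(peak_hz - f) for f in self.pattern_hz)
        return max(0.0, 1.0 - dev / self.max_dev_hz)

    def classify(self, peak_hz):
        """Operation 804: classify as own voice characteristic if the
        similarity exceeds the threshold; on success, customize the stored
        pattern so future executions include the new frequency."""
        if self.similarity(peak_hz) > self.threshold:
            self.pattern_hz.append(peak_hz)
            return True
        return False

clf = OwnVoiceClassifier([78.0, 92.0])
print(clf.classify(79.0))   # True: pattern now also contains 79.0
print(clf.classify(140.0))  # False: too far from any stored frequency
```

Persisting `pattern_hz` back to the database corresponds to storing the customized pattern so that the processor learns the own voice characteristics over time.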

(31) FIG. 9 illustrates another method of operating a hearing device for detection of an own voice activity of the user according to some embodiments of the present disclosure. In the place of the decision in operation 607, two decisions are performed in operations 907 and 908. The decision in operation 907 substantially corresponds to the decision in 607, wherein the vibration frequency associated with the own voice characteristic is selected as a fundamental frequency of the vibration caused by the own voice activity. The decision in operation 908 substantially corresponds to the decision in 607, wherein the vibration frequency associated with the own voice characteristic is selected as an alias frequency of the fundamental frequency of the vibration caused by the own voice activity. In some implementations, determining the presence of the own voice characteristic at the fundamental vibration frequency and/or at the alias vibration frequency of the fundamental vibration frequency comprises simultaneously determining the signal feature in operation 603. In some implementations, the presence of the own voice characteristic at the fundamental vibration frequency and/or at the alias vibration frequency is determined by determining the signal feature in operation 603 and then determining the presence of the signal feature at the fundamental vibration frequency and/or at the alias vibration frequency.

(32) To illustrate, the own voice characteristic can be produced in the vibration signal at an alias frequency of the fundamental frequency by employing a sampling rate causing an aliasing effect. Vibration sensor 108 can be configured to record the vibrations caused by the own voice activity at this sampling rate and/or to provide the vibration signal at this sampling rate. To this end, vibration sensor 108 may be configured to sample the vibrations from an analog input without applying an anti-aliasing filter (e.g. low pass filter) in between. Vibration sensor 108 can thus be configured to produce the own voice characteristic in the vibration signal at the fundamental vibration frequency and/or at the alias vibration frequency, in particular such that aliasing components can be produced in the vibration signal. Determining the presence of the own voice characteristic at the alias vibration frequency can have the advantage of allowing vibration sensor 108 to operate at a lower sampling rate than the Nyquist rate. This can allow determining the presence of an own voice characteristic in the vibration signal exhibiting a fundamental frequency beyond the Nyquist frequency. For instance, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 may be produced by a pronunciation of the vowel at a fundamental frequency corresponding to the associated vibration frequency, or they can be produced by a pronunciation of the vowel at a fundamental frequency larger than the associated vibration frequency, wherein an alias frequency of the fundamental frequency corresponds to the associated vibration frequency.
For example, an own voice activity of a female voice characterized by higher vibration frequencies may thus be determined by a presence of the own voice characteristic at the alias vibration frequency of the fundamental frequency, whereas an own voice activity of a male voice characterized by lower vibration frequencies may be determined by a presence of the own voice characteristic at the fundamental frequency.
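The frequency at which such an aliased component appears follows from the standard spectral folding rule; the sketch below assumes an illustrative sampling rate of 200 Hz and example fundamental frequencies, none of which are taken from the disclosure:

```python
def alias_frequency_hz(f0_hz, sample_rate_hz):
    """Frequency at which a tone with fundamental f0_hz appears when
    sampled at sample_rate_hz without an anti-aliasing filter, per the
    standard folding rule: fold f0 into [0, fs/2]."""
    f = f0_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

fs = 200  # assumed sensor sampling rate; Nyquist frequency is 100 Hz
# A higher (e.g. female) fundamental of 210 Hz folds down to 10 Hz,
# while a lower (e.g. male) fundamental of 90 Hz appears unchanged.
print(alias_frequency_hz(210.0, fs))  # 10.0
print(alias_frequency_hz(90.0, fs))   # 90.0
```

Both components thus land below the Nyquist frequency, where the presence of the respective own voice characteristic can be determined at a reduced sampling rate.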

(33) Depending on the outcome of the decision performed in operation 907, an occurrence of an own voice activity of the user is identified in operation 609 if the own voice characteristic in the vibration signal has been determined to be present at the fundamental vibration frequency associated with the own voice characteristic. Depending on the outcome of the decision performed in operation 908, an occurrence of an own voice activity of the user is identified in operation 609, if the own voice characteristic in the vibration signal has been determined to be present at the alias frequency of the fundamental vibration frequency. Conversely, a non-occurring own voice activity of the user is identified in operation 608 if the own voice characteristic in the vibration signal neither has been determined to be present at the fundamental vibration frequency after the decision in operation 907, nor at the alias vibration frequency after the decision in operation 908. The decisions according to operations 907, 908 may be performed simultaneously or in any order.

(34) In some implementations, the decision performed in operation 907 can be omitted. Those implementations may correspond to some embodiments of the method illustrated in FIG. 6, wherein the decision according to operation 607 is replaced by the decision according to operation 908. Thus, the vibration frequency associated with the own voice characteristic can be selected as an alias frequency of the fundamental frequency of the vibration caused by the own voice activity. In some implementations, the decision performed in operation 908 can be omitted. Those implementations may correspond to some embodiments of the method illustrated in FIG. 6, wherein the decision according to operation 607 is replaced by the decision according to operation 907. Thus, the vibration frequency associated with the own voice characteristic can be selected as the fundamental frequency of the vibration caused by the own voice activity. In some implementations, another harmonic frequency than the fundamental frequency of the vibration caused by the own voice activity is selected as the vibration frequency associated with the own voice characteristic. The harmonic frequency can correspond to an integer multiple of the fundamental frequency. In some implementations, the harmonic frequency can be selected as the associated vibration frequency in the decision performed in operation 907. In some implementations, an alias frequency of the harmonic frequency can be selected as the associated vibration frequency in the decision performed in operation 908.

(35) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in FIG. 7, can be correspondingly applied. The data can comprise the fundamental vibration frequency associated with the respective own voice characteristic and/or the alias frequency of the fundamental vibration frequency. In some implementations, the comparing operation 704, as illustrated in FIG. 7, can be correspondingly applied and may be employed in at least one of the decisions in operations 907, 908. In some implementations, the classifying operation 804 based on the pattern of own voice characteristics provided in operation 805, as illustrated in FIG. 8, can be correspondingly applied, in particular with respect to a classification of the signal feature determined in operation 603 as an own voice feature and the associated vibration frequency as the fundamental vibration frequency and/or the alias frequency of the fundamental vibration frequency, and may be employed in at least one of the decisions in operations 907, 908.

(36) FIG. 10 illustrates another method of operating a hearing device for detection of an own voice activity of the user according to some embodiments of the present disclosure. The determining of a signal feature in the vibration signal in operation 1003 substantially corresponds to operation 603 described above, wherein the signal feature is determined at a first time in the vibration signal. The determining of a signal feature in the vibration signal in operation 1004 substantially corresponds to operation 603 described above, wherein the signal feature is determined at a second time in the vibration signal. The second time is different from the first time. The determined own voice characteristic at the second time can be different from the determined own voice characteristic at the first time, or it can be equal. The vibration frequency associated with the own voice characteristic at the second time can be different from the vibration frequency associated with the own voice characteristic at the first time, or it can be equal. To illustrate, a presence of at least one of peaks 147-149 in the first temporal section of vibration signals 132-134 may be determined in operation 1003, and a presence of at least one of peaks 157-159 in the second temporal section of vibration signals 132-134 may be determined in operation 1004. Operations 1003, 1004 can be performed in any order or they can be performed simultaneously. For instance, operation 603 in the method illustrated in FIG. 6 can comprise determining the signal feature at the vibration frequency associated with the own voice characteristic at the first time and determining the signal feature at the vibration frequency associated with the own voice characteristic at the second time. 
For instance, the vibration signal can be evaluated in a modulation analysis to determine a temporal behaviour of the presence of the own voice characteristic in the vibration signal, in particular temporal variations of the presence of the own voice characteristic. In some implementations, a presence of an own voice characteristic in the vibration signal is determined at least at one additional time in the vibration signal different from the first time and the second time. Taking into account the temporal behaviour of the vibration signal during own voice detection can improve the detection reliability.

(37) The decision in operation 1007 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the first time in the vibration signal. The decision in operation 1008 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the second time in the vibration signal. Operations 1007, 1008 can be performed in any order or they can be performed simultaneously. In particular, operation 607 in the method illustrated in FIG. 6 can comprise both decisions depending on the own voice characteristic presence at the associated vibration frequency at the first time and the second time. An own voice activity of the user is identified in operation 609 only if the own voice characteristic is determined to be present at both the first time and the second time. Otherwise, if the own voice characteristic is not determined to be present at the first time and/or the second time, a non-occurrence of the own voice activity of the user is identified in operation 608.

(38) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in FIG. 7, can be correspondingly applied. The data can comprise the vibration frequency associated with the own voice characteristic at the first time and at the second time. In some implementations, the comparing operation 704, as illustrated in FIG. 7, can be correspondingly applied and may be employed in at least one of the decisions in operations 1007, 1008. In some implementations, the classifying operation 804 based on the pattern of own voice characteristics provided in operation 805, as illustrated in FIG. 8, can be correspondingly applied, in particular with respect to a classification of the signal feature determined in at least one of operations 1003, 1004 as an own voice feature at the first time and/or the second time, and may be employed in at least one of the decisions in operations 1007, 1008. In some implementations, the decision in operation 907 and/or the decision in operation 908, as illustrated in FIG. 9, can be correspondingly applied in the place of at least one of operations 1007, 1008, wherein the vibration frequency associated with the own voice characteristic at the first time and/or the second time is selected as the fundamental vibration frequency and/or an alias frequency of the fundamental vibration frequency.

(39) FIG. 11 illustrates another method of operating a hearing device for detection of an own voice activity of the user according to some embodiments of the present disclosure. In operation 1102, an audio signal indicative of an airborne sound is provided. The sound can comprise audio content generated in an environment of the user and/or audio content generated by an own voice activity of the user. The audio signal can be provided by microphone 106. In operation 1103, a signal feature in the audio signal is determined. The signal feature can be produced in the audio signal by the own voice activity. The signal feature can be a frequency dependent property of the audio signal such that it is characteristic for a specific audio frequency. The signal feature can comprise a peak at the audio frequency. To illustrate, the signal feature produced in the audio signal may be a peak at an audio frequency corresponding to or having a similar value as the vibration frequency at which at least one of peaks 147-149 and/or at least one of peaks 157-159 is produced in vibration signals 132-134. Alternatively or additionally, the signal feature can comprise another property, for instance a signal level larger than a specified minimum level at the audio frequency. Taking into account the audio signal during own voice detection can improve the detection reliability. Determining the signal feature can comprise employing a peak detection in the audio signal. In some implementations, the audio signal can be evaluated in a frequency domain comprising a spectrum of audio frequencies. This may imply converting a time dependent audio signal from a time domain into the frequency domain. In some implementations, the signal feature can be determined directly in a time dependent audio signal.

(40) In operation 1107, a decision is performed depending on an identification criterion. The identification criterion can be based on at least one of whether the own voice characteristic is determined to be present in the vibration signal at a vibration frequency associated with the own voice characteristic, and whether the own voice characteristic is determined to be present in the audio signal at an audio frequency associated with the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated frequency can comprise determining the signal feature in the vibration signal and/or audio signal and simultaneously determining a presence of the signal feature at the frequency associated with the own voice characteristic in at least one of operations 603, 1103. In some implementations, determining the presence of the own voice characteristic at the associated frequency can also comprise subsequent determining of a signal feature in the vibration signal and/or audio signal in at least one of operations 603, 1103 and then determining the presence of the signal feature at the frequency associated with the own voice characteristic.

(41) In some implementations, the identification criterion can be based on a similarity measure between the signal feature determined in the vibration signal in operation 603 and the signal feature determined in the audio signal in operation 1103. Determining the similarity measure can comprise determining a comparison and/or a correlation, for instance a cross-correlation, of the vibration signal and the audio signal with respect to the frequency at which the signal feature determined in operations 603, 1103 has been determined to be present. Thus, the vibration frequency and the audio frequency at which the signal feature has been determined to be present in operations 603, 1103 can be evaluated with respect to the comparison and/or correlation. The decision in operation 1107 can be performed depending on whether the similarity measure has been determined to be large enough. In particular, the identification criterion may be provided such that the vibration frequency at which the signal feature has been determined to be present in operation 603 and the audio frequency at which the signal feature has been determined to be present in operation 1103 must be similar to a specified degree, for instance such that they are shifted by a certain frequency difference or by at most a maximum value of a frequency difference or such that they are substantially equal. When the similarity measure has been determined to be large enough, at least one of the signal feature determined in operation 603 can be identified as the own voice characteristic determined to be present in the vibration signal at the associated vibration frequency and the signal feature determined in operation 1103 can be identified as the own voice characteristic determined to be present in the audio signal at the associated audio frequency.
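Reduced to its simplest form, the similarity measure between the vibration-signal and audio-signal peaks is a frequency-difference check; the tolerance value below is an assumption, and a fuller implementation might use a cross-correlation as the text notes:

```python
def frequencies_match(vib_peak_hz, audio_peak_hz, max_diff_hz=3.0):
    """Similarity measure of operation 1107, sketched as a check that the
    vibration peak and the audio peak agree to within an assumed maximum
    frequency difference."""
    return abs(vib_peak_hz - audio_peak_hz) <= max_diff_hz

# Peaks near the same frequency support identifying both as the own voice
# characteristic; widely separated peaks do not.
print(frequencies_match(78.0, 79.5))  # True
print(frequencies_match(78.0, 92.0))  # False
```

A fixed expected offset between the two frequencies, rather than near-equality, could be accommodated by comparing against a specified frequency shift instead of zero.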

(42) In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set to a predetermined frequency. For instance, at least one of the associated vibration frequency and the associated audio frequency can be retrieved from a database by applying an operation corresponding to operation 703 illustrated in FIG. 7. The decision in operation 1107 can then be performed depending on at least one of whether the own voice characteristic is determined to be present in the vibration signal at the associated vibration frequency, and whether the own voice characteristic is determined to be present in the audio signal at the associated audio frequency, in particular depending on whether both criteria are fulfilled.

(43) In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in FIG. 7, can be correspondingly applied. The data can comprise the vibration frequency associated with the own voice characteristic in the vibration signal and/or the audio frequency associated with the own voice characteristic in the audio signal. In some implementations, the comparing operation 704, as illustrated in FIG. 7, can be correspondingly applied and may be employed in the decision in operation 1107. In some implementations, the classifying operation 804 based on the pattern of own voice characteristics provided in operation 805, as illustrated in FIG. 8, can be correspondingly applied, in particular with respect to a classification of the signal feature determined in the vibration signal in operation 603 and/or with respect to a classification of the signal feature determined in the audio signal in operation 1103. In some implementations, the decision in operation 907 and/or the decision in operation 908, as illustrated in FIG. 9, can be correspondingly applied in place of operation 1107, wherein the vibration frequency and/or the audio frequency associated with the own voice characteristic is selected as the fundamental vibration frequency and/or an alias frequency of the fundamental vibration frequency. In some implementations, the determining of the signal feature at multiple times in operations 1004, 1005 can be correspondingly applied in place of operation 603 and/or in place of operation 1103. Decision operations 1007, 1008 for the own voice characteristic at multiple times may be correspondingly performed at operation 1107.

(44) In some implementations, an audio signal characteristic is determined from the audio signal in operation 1113. Determining the audio signal characteristic can comprise estimating a signal-to-noise ratio (SNR) of the audio signal. Determining the audio signal characteristic can comprise estimating a volume level of the audio signal, in particular a volume level of the own voice activity and/or a volume level of other sound in the environment. The determined audio signal characteristic can be employed during the decision performed in operation 1107. For instance, a significance of the signal feature determined to be present in the audio signal can depend on an estimated SNR of the audio signal. For instance, the identification criterion applied in the decision in operation 1107 may predominantly depend on whether the signal feature is determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic when the SNR is estimated to be rather high in the audio signal. In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set depending on the audio signal characteristic. In particular, the audio signal characteristic can comprise an estimated volume level of the audio signal, and at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal can be set depending on the estimated volume level, in order to account for the “Lombard effect” causing a frequency shift of the detected own voice activity at different speech volumes of the user.
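The SNR-dependent weighting of the two cues and the volume-dependent frequency adjustment could be sketched as follows; the thresholds, the shift rate, and all names are illustrative assumptions only:

```python
def identify_own_voice(vib_peak_present, audio_peak_present, snr_db,
                       high_snr_db=15.0):
    # Following paragraph (44): at a rather high audio SNR the criterion
    # predominantly depends on the vibration-signal cue; otherwise both
    # the vibration cue and the audio cue are required.
    if snr_db >= high_snr_db:
        return vib_peak_present
    return vib_peak_present and audio_peak_present


def lombard_adjusted_frequency(base_freq_hz, volume_db,
                               ref_db=60.0, hz_per_db=0.5):
    # Shift the associated frequency with the estimated speech volume to
    # account for the "Lombard effect"; a linear shift is assumed here.
    return base_freq_hz + hz_per_db * (volume_db - ref_db)
```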

(45) In some implementations, a speech recognition is performed in operation 1109. The speech recognition can be used to identify a content of a speech of the user during the own voice activity, for instance keywords spoken by the user. The speech recognition can employ the own voice characteristic determined in the vibration signal at the associated vibration frequency and/or the own voice characteristic determined in the audio signal at the associated audio frequency. To illustrate, peaks 147-149 and/or peaks 157-159 produced in vibration signals 132-134 may be identified as the respective vowels spoken by the user. In order to identify a plurality of vowels, consonants, words, phonemes, speech pauses, etc. successively spoken by the user, the own voice characteristic can be determined in the vibration signal and/or in the audio signal at different times, in particular by correspondingly applying operations 1003, 1004 illustrated in FIG. 10 in place of operation 603 and/or in place of operation 1103. For instance, the vibration signal and/or the audio signal may be evaluated in a modulation analysis to determine the own voice characteristic at the associated frequency at different times. In some implementations, the determination of the own voice characteristic at different times can be employed in addition to another speech recognition process, in order to improve and/or stabilize the performance of the speech recognition.
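A toy sketch of mapping per-frame peak frequencies to vowels, in the spirit of identifying successively spoken sounds; the vowel table, tolerance, and function name are hypothetical stand-ins:

```python
def decode_vowel_sequence(frame_peak_freqs, vowel_table, tol_hz=15.0):
    # For each analysis frame, match the detected peak frequency to the
    # nearest stored vowel frequency; None marks a pause or no match.
    out = []
    for f in frame_peak_freqs:
        if f is None:
            out.append(None)
            continue
        vowel, freq = min(vowel_table.items(), key=lambda kv: abs(kv[1] - f))
        out.append(vowel if abs(freq - f) <= tol_hz else None)
    return out
```

A practical system would of course feed such frame-wise evidence into a full speech recognizer rather than decide per frame in isolation.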

(46) FIG. 12 illustrates a method of providing data relative to an own voice characteristic and a vibration frequency associated with the own voice characteristic. The data can be representative of parameters that are employed for an own voice detection according to the methods described above in conjunction with FIGS. 6-11. The data can be stored in memory 104 and/or processed by processor 102. The data may be employed, for instance, in operation 702 of maintaining data relative to own voice characteristics in a database, and/or in operation 703 of retrieving a vibration frequency associated with the own voice characteristic from the data, as illustrated in FIG. 7. After providing the vibration signal in operation 602, an own voice characteristic is derived from the vibration signal in operation 1203. A vibration frequency associated with the own voice characteristic is identified in operation 1204. Operations 1203, 1204 can be performed simultaneously or successively. The identified vibration frequency associated with the own voice characteristic is stored in a database for own voice characteristics in operation 1209. The associated vibration frequency can be retrieved from the database, in particular in the methods illustrated in FIGS. 6-11, to determine the presence of the own voice characteristic at the associated vibration frequency.
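A minimal stand-in for such a database of own voice characteristics, covering the store (operation 1209) and retrieve (operation 703) steps; the class and key names are illustrative assumptions:

```python
class OwnVoiceDatabase:
    """Maps an own voice characteristic to its associated vibration frequency."""

    def __init__(self):
        self._freqs = {}

    def store(self, characteristic, vibration_freq_hz):
        # Corresponds to storing the identified frequency (operation 1209).
        self._freqs[characteristic] = vibration_freq_hz

    def retrieve(self, characteristic):
        # Corresponds to retrieving the associated frequency (operation 703);
        # returns None when no entry has been enrolled yet.
        return self._freqs.get(characteristic)
```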

(47) In some implementations, operation 1203 of deriving the own voice characteristic can comprise determining a signal feature in operation 603 and classifying the signal feature as the own voice characteristic. In particular, classifying operation 804 based on a pattern of own voice characteristics provided in operation 805, as illustrated in FIG. 8, can be correspondingly applied. In this way, the vibration frequency associated with the own voice characteristic can be identified in operation 1204, in particular as the vibration frequency at which the own voice characteristic has been determined to be present in operation 1203. The associated vibration frequency may thus be identified despite variations of the frequency occurring at different conditions, for instance different users, different volume levels of the own voice activity, different environmental settings, etc.

(48) In some implementations, operation 1203 of deriving the own voice characteristic can comprise initiating a training operation for an individual user. During the training operation, the user can be instructed to perform a predetermined own voice activity. The own voice characteristic in the vibration signal that can be attributed to the own voice activity can thus be identified during operation 1203. The associated vibration frequency can thus be identified during operation 1204, in particular as the vibration frequency at which the own voice characteristic has been determined to be present in operation 1203. Initiating the training operation can comprise, for instance, instructing the user to pronounce a certain number of vowels, consonants, phonemes, words, etc. The user may also be instructed to perform the own voice activity at different volume levels.

(49) FIG. 13 illustrates another method of providing data relative to an own voice characteristic and a vibration frequency associated with the own voice characteristic. In operation 1305, a similarity relation between the signal feature determined in the vibration signal in operation 603 and the signal feature determined in the audio signal in operation 1103 is determined. Similarity determining operation 1305 can comprise determining a similarity measure between the signal features determined in the vibration signal and in the audio signal. Determining the similarity measure can comprise determining a comparison and/or correlation, for instance a cross-correlation, of the vibration signal and the audio signal with respect to the frequencies at which the signal features have been determined. Thus, the signal features determined in operations 603, 1103 can be evaluated with respect to the comparison and/or correlation. In particular, the vibration frequency and the audio frequency at which the signal features have been determined in operations 603, 1103 can be evaluated with respect to the comparison and/or correlation.

(50) A decision in operation 1305 can then be performed depending on the determined similarity measure. In a situation in which a determined similarity has been determined to be larger than a similarity threshold, for instance a correlation has been determined to be large enough, at least one of a vibration frequency associated with the own voice characteristic in the vibration signal and an audio frequency associated with the own voice characteristic in the audio signal can be identified based on the similarity measure in operation 1204. For instance, the associated vibration frequency and/or the associated audio frequency may then be selected to correspond to the vibration frequency and/or audio frequency at which the at least one of the signal features has been determined in operations 603, 1103. The associated vibration frequency and/or the associated audio frequency can then be stored in the database for own voice characteristics in operation 1209. In a contrary situation, in which the similarity has not been determined to be larger than the similarity threshold, the associated vibration frequency and/or the associated audio frequency cannot be identified and the database for own voice characteristics is maintained in its present state in operation 702.
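The threshold-gated enrollment flow of this paragraph can be sketched as a single function; the threshold value and all names are illustrative assumptions:

```python
def maybe_enroll(db, characteristic, vib_freq_hz, audio_freq_hz,
                 similarity, threshold=0.8):
    # Only when the similarity between the vibration- and audio-signal
    # features exceeds the threshold are the associated frequencies
    # identified and stored (operations 1204, 1209); otherwise the
    # database keeps its present state (operation 702).
    if similarity > threshold:
        db[characteristic] = (vib_freq_hz, audio_freq_hz)
        return True
    return False
```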

(51) In some implementations, operation 1113 of determining an audio signal characteristic, as described above in conjunction with the method illustrated in FIG. 11, can be employed during the decision performed in operation 1305. In particular, determining the similarity measure and/or setting a similarity threshold can depend on the audio signal characteristic. For instance, the similarity threshold can be set depending on an estimated SNR of the audio signal. For instance, the similarity measure can be determined depending on an estimated volume level of the audio signal, in particular to account for a frequency shift of the detected own voice activity at different speech volumes of the user caused by the “Lombard effect”. In some implementations, an audio signal characteristic determined in operation 1113 can be stored in the database for own voice characteristics in operation 1209. The audio signal characteristic related to the own voice characteristic can thus be retrieved from the database in addition to the associated vibration frequency and/or audio frequency identified in operation 1204.

(52) In some implementations, the hearing device is configured to operate in a first mode of operation in which an own voice activity of the user is detected and in a second mode of operation in which the hearing device can be prepared for the detection of the own voice activity. The first mode of operation may be implemented by at least one of the methods illustrated in FIGS. 6-11 and/or other combinations of the operations illustrated in those methods. The second mode of operation may be implemented by at least one of the methods illustrated in FIGS. 12 and 13 and/or other combinations of the operations illustrated in those methods.

(53) FIGS. 14-18 illustrate exemplary signal processing configurations of a hearing device according to some embodiments of the present disclosure. Other embodiments may omit, add to, reorder and/or modify any of the functional components shown in FIGS. 14-18. Some embodiments may be implemented in hearing device 100 illustrated in FIG. 1. In particular, some of the illustrated functional components may be operated by processor 102, for instance in a signal processing routine, algorithm, program and/or the like. Other illustrated functional components may be operatively coupled to processor 102, for instance to provide and/or modify a signal processed by processor 102. Some embodiments may be implemented in a hearing device comprising additional constituent parts, for instance an additional microphone and/or beamformer. Some embodiments may be implemented in a hearing system comprising two hearing devices in a binaural configuration.

(54) FIG. 14 illustrates an exemplary signal processing configuration 1401 for a signal processing of a vibration signal provided by vibration sensor 108. As shown, signal processing configuration 1401 comprises a peak detector 1403 configured for peak detection in the vibration signal. To illustrate, peak detector 1403 can be configured to detect at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 at an associated vibration frequency. For instance, peak detector 1403 can be configured to perform at least one of operations 603, 1003, 1004. Signal processing configuration 1401 further comprises an own voice identifier 1407 configured to identify an own voice activity based on an identification criterion. The identification criterion can comprise the presence of the detected peak or at least one of the detected peaks at a vibration frequency associated with an own voice characteristic. For instance, own voice identifier 1407 can be configured to perform at least one of operations 607, 907, 908, 1007, 1008. Peak detector 1403 and/or own voice identifier 1407 may be operated by processor 102.
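A simple sketch of such a peak detector operating on a magnitude spectrum: it checks whether a prominent local maximum exists near the associated vibration frequency. The bin resolution, tolerance, and prominence ratio are illustrative assumptions:

```python
def peak_present(spectrum, bin_hz, target_hz, tol_hz=10.0, min_ratio=3.0):
    # Is there a spectral peak near target_hz (the frequency associated
    # with the own voice characteristic) that stands out clearly against
    # the mean spectrum level? Cf. peak detector 1403.
    lo = max(0, int((target_hz - tol_hz) / bin_hz))
    hi = min(len(spectrum) - 1, int((target_hz + tol_hz) / bin_hz))
    if lo > hi:
        return False
    peak = max(spectrum[lo:hi + 1])
    mean = sum(spectrum) / len(spectrum)
    return mean > 0 and peak / mean >= min_ratio
```

An own voice identifier in the sense of component 1407 could then combine one or more such boolean cues into its identification criterion.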

(55) In some implementations, peak detector 1403 is configured for peak detection at a harmonic frequency, for instance the fundamental frequency, of the vibration detected by vibration sensor 108, as illustrated by component 1404 constituting a harmonic frequency peak detector. A determination of whether the detected peak is present at the harmonic frequency can be carried out simultaneously during peak detection, for instance by harmonic frequency peak detector 1404, or after peak detection, for instance by own voice identifier 1407. In some implementations, peak detector 1403 is configured for peak detection at an alias frequency of the vibration detected by vibration sensor 108, as illustrated by component 1405 constituting an alias frequency peak detector. A determination of whether the detected peak is present at the alias frequency can be carried out simultaneously during peak detection, for instance by alias frequency peak detector 1405, or after peak detection, for instance by own voice identifier 1407.
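For the alias-frequency case, the frequency at which an undersampled vibration component appears follows from the sampling rate; a short sketch (standard aliasing arithmetic, not specific to the disclosure):

```python
def alias_frequency(f_hz, fs_hz):
    # Frequency at which a component at f_hz appears after sampling at
    # fs_hz: fold f_hz into the baseband [0, fs_hz/2] (cf. alias
    # frequency peak detector 1405).
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f
```

For instance, a 700 Hz harmonic sampled at 1 kHz would be looked for at 300 Hz in the vibration signal.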

(56) FIG. 15 illustrates an exemplary signal processing configuration 1501 for a signal processing of a vibration signal provided by vibration sensor 108, and an audio signal provided by microphone 106. A high pass filter 1504 is provided to modify the vibration signal before the peak detection by peak detector 1403. High pass filter 1504 can thus provide the vibration signal with a signal content that is of specific interest for the detection of an own voice activity, in particular such that the peak detection can be facilitated. A low pass filter 1505 is provided to modify the audio signal before a peak detection in the audio signal by a peak detector 1503. For instance, peak detector 1503 can be configured to perform operation 1103. Low pass filter 1505 can thus provide the audio signal with a signal content that is of specific interest for the detection of an own voice activity, in particular such that the peak detection by audio signal peak detector 1503 can be facilitated. Vibration signal peak detector 1403 can comprise harmonic frequency peak detector 1404 and/or alias frequency peak detector 1405, as illustrated in FIG. 14. Audio signal peak detector 1503 can comprise corresponding components configured for peak detection at a harmonic frequency of the sound detected by microphone 106 and/or at an alias frequency of the sound. Signal processing configuration 1501 further comprises a correlator and/or comparator 1506. Correlator and/or comparator 1506 is configured to correlate and/or compare the peaks detected by vibration signal peak detector 1403 and audio signal peak detector 1503. For instance, correlator and/or comparator 1506 can be configured to perform operation 1305. A result of the correlation and/or comparison is provided to own voice identifier 1407. The identification criterion applied by own voice identifier 1407 can comprise the result of the correlation and/or comparison.

(57) FIG. 16 illustrates another exemplary signal processing configuration 1601 for a signal processing of a vibration signal provided by vibration sensor 108, and an audio signal provided by microphone 106. Signal processing configuration 1601 comprises a modulation analyzer 1605 configured to evaluate the vibration signal in a modulation analysis. The vibration signal modulation analyzer 1605 can thus provide temporal information about the peaks detected by peak detector 1403. For instance, modulation analyzer 1605 can provide information about whether a first peak detected by peak detector 1403 temporally precedes a second peak detected by peak detector 1403 in the vibration signal. For instance, modulation analyzer 1605 can provide information about a time interval between the detected peaks. Signal processing configuration 1601 comprises another modulation analyzer 1606 configured to evaluate the audio signal in a corresponding way with respect to the peaks detected by audio signal peak detector 1503 for different times in the audio signal. The temporal information provided by the audio signal modulation analyzer 1606 and vibration signal modulation analyzer 1605 can be used by correlator and/or comparator 1506, in particular to correlate and/or compare the peaks detected by vibration signal peak detector 1403 and audio signal peak detector 1503 at corresponding times. The temporal information provided by the audio signal modulation analyzer 1606 and vibration signal modulation analyzer 1605 can further be used by own voice identifier 1407 to identify the own voice activity based on the temporal information. For instance, the identification criterion applied by own voice identifier 1407 can comprise that a time interval between the detected peaks determined by vibration signal modulation analyzer 1605 and/or audio signal modulation analyzer 1606 corresponds to a predetermined time interval and/or does not exceed a predetermined maximum duration.
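The temporal part of that identification criterion can be sketched as a check that successive detected peaks follow each other within a maximum duration; the 0.5 s default is an illustrative assumption:

```python
def interval_criterion(peak_times_s, max_interval_s=0.5):
    # Successive own-voice peaks should follow each other within a
    # predetermined maximum duration, as in natural speech; a longer
    # gap between peaks fails the temporal criterion.
    return all(t2 - t1 <= max_interval_s
               for t1, t2 in zip(peak_times_s, peak_times_s[1:]))
```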

(58) In some implementations, signal processing configuration 1601 further comprises a speech recognizer 1609. Speech recognizer 1609 is configured to identify a content of a speech of the user identified as an own voice activity by own voice identifier 1407. The speech recognition can be based on spectral information comprising the frequencies associated with the peaks previously detected by peak detectors 1403, 1503 and/or temporal information comprising the time interval between the detected peaks provided by modulation analyzers 1605, 1606. For instance, keywords and/or commands and/or sentences spoken by the user may be identified in such a configuration.

(59) FIG. 17 illustrates another exemplary signal processing configuration 1701 for a signal processing of a vibration signal provided by vibration sensor 108, and an audio signal provided by microphone 106. In addition to microphone 106, at least one further microphone 1706 is provided, configured to detect the sound detected by microphone 106 at a distance from microphone 106 and to provide a supplementary audio signal comprising information about this sound. For instance, microphone 1706 can be implemented in hearing device 100. The audio signal provided by microphone 106 and the supplementary audio signal provided by supplementary microphone 1706 are processed by a beamformer 1702. A directionality of beamformer 1702 is directed toward the user's mouth when an own voice activity has been identified by own voice identifier 1407, in particular to improve further detection of the own voice activity and/or the speech recognition by speech recognizer 1609.
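A two-microphone delay-and-sum beamformer is one common way to realize such directionality; this sketch assumes delay-and-sum steering (the disclosure does not specify the beamforming method) and integer-sample delays for simplicity:

```python
def delay_and_sum(front, rear, delay_samples):
    # Delay the rear-microphone signal by the mouth-to-microphone
    # travel-time difference and average with the front signal, so that
    # sound arriving from the mouth direction adds up coherently.
    out = []
    for n in range(len(front)):
        r = rear[n - delay_samples] if n >= delay_samples else 0.0
        out.append(0.5 * (front[n] + r))
    return out
```

In a real device the steering delay would follow from the microphone spacing, the sampling rate, and the direction of the user's mouth.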

(60) In some implementations, an audio signal comprising information about the multiple audio signals provided by microphones 106, 1706 is provided by beamformer 1702 to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. In some implementations, the audio signal provided by microphone 106 and the audio signal provided by microphone 1706 are provided separately to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. Correlator and/or comparator 1506 can be configured to correlate and/or compare the peaks detected by vibration signal peak detector 1403 in the vibration signal and the peaks detected by audio signal peak detector 1503 in the respective audio signal of both microphones 106, 1706.

(61) FIG. 18 illustrates another exemplary signal processing configuration 1801 for a signal processing of a vibration signal provided by vibration sensor 108, and an audio signal provided by microphone 106. An additional microphone 1806 is provided. Additional microphone 1806 can be implemented in an additional hearing device corresponding to hearing device 100, in the place of microphone 106. A hearing system comprising hearing device 100 and the additional hearing device can thus be worn in a binaural configuration. Additional microphone 1806 is configured to detect sound from the environment during sound detection of microphone 106 and to provide an additional audio signal. The additional audio signal provided by microphone 1806 is provided to an additional audio signal peak detector 1803 and an additional audio signal modulation analyzer. Correlator and/or comparator 1506 can be configured to correlate and/or compare the peaks detected by vibration signal peak detector 1403 in the vibration signal, the peaks detected by audio signal peak detector 1503 in the audio signal of microphone 106, and the peaks detected by additional audio signal peak detector 1803 in the additional audio signal of additional microphone 1806. In some implementations, each of microphones 106 and 1806 may be operatively connected to a beamformer to enable binaural beamforming. In particular, the configuration depicted in FIG. 17 comprising beamformer 1702 may be correspondingly applied, wherein microphone 1706 may be provided in hearing device 100 and another corresponding microphone may be provided in the additional hearing device at a distance to microphone 1806.

(62) While the principles of the disclosure have been described above in connection with specific devices and methods, it is to be clearly understood that this description is made only by way of example and not as limitation on the scope of the invention. The above described preferred embodiments are intended to illustrate the principles of the invention, but not to limit the scope of the invention. Various other embodiments and modifications to those preferred embodiments may be made by those skilled in the art without departing from the scope of the present invention that is solely defined by the claims.