Estimation of harmonic frequencies for hearing implant sound coding using active contour models
10707836 · 2020-07-07
Assignee
Inventors
Cpc classification
H03H17/0219
ELECTRICITY
International classification
Abstract
A signal processing arrangement generates electrical stimulation signals to electrode contacts in an implanted cochlear implant array. An input sound signal is processed to generate band pass signals that each represent an associated band of audio frequencies. A spectrogram representative of frequency spectrum present in the input sound signal is generated. A characteristic envelope signal is produced for each band pass signal based on its amplitude. An active contour model is applied to estimate dominant frequencies present in the spectrogram, and the estimate is used to generate stimulation timing signals for the input sound signal. The electrode stimulation signals are produced for each electrode contact based on the envelope signals and the stimulation timing signals.
Claims
1. A method for generating electrode stimulation signals for electrode contacts in an implanted cochlear implant electrode array, the method comprising: processing an input sound signal to generate a plurality of band pass signals, each band pass signal representing an associated band of audio frequencies and having a characteristic amplitude; generating a spectrogram representative of frequency spectrum present in the input sound signal; extracting a characteristic envelope signal for each band pass signal based on its amplitude; applying an active contour model to estimate dominant frequencies present in the spectrogram; using the estimate of dominant frequencies to generate stimulation timing signals for the input sound signal; producing the electrode stimulation signals for each electrode contact based on the envelope signals and the stimulation timing signals; and stimulating the auditory nerve tissue with the electrode contacts using the electrode stimulation signals.
2. The method according to claim 1, wherein the spectrogram is generated using a short time Fourier transformation (STFT).
3. The method according to claim 1, wherein the electrode stimulation signals include channel-specific sampling sequences (CSSS).
4. The method according to claim 1, wherein using the estimate of dominant frequencies includes smoothing the spectrogram.
5. The method according to claim 1, wherein the estimate of dominant frequencies includes a determination of one or more harmonic frequencies present in the spectrogram.
6. The method according to claim 1, wherein the method is iteratively repeated over a period of time intervals.
7. A system for generating electrode stimulation signals of a cochlear implant to electrode contacts in an implantable cochlear implant electrode array, the system comprising: an implantable electrode array having a plurality of electrode contacts; a preprocessor filter bank configured to process an input sound signal to generate a plurality of band pass signals, each band pass signal representing an associated band of audio frequencies and having a characteristic amplitude; a spectrogram module configured to generate a spectrogram representative of frequency spectrum present in the input sound signal; an envelope detector configured to extract a characteristic envelope signal for each band pass signal based on its amplitude; an active contour model module configured to: i. apply an active contour model to the spectrogram to estimate dominant frequencies present in the spectrogram, and ii. using the estimate of dominant frequencies to generate stimulation timing signals for the input sound signal; and a pulse generator configured to produce and apply the electrode stimulation signals to each electrode contact so as to stimulate the auditory nerve tissue, the electrode stimulation signals based on the envelope signals and the stimulation timing signals.
8. The system according to claim 7, wherein the spectrogram module is configured to use a short time Fourier transformation (STFT) to generate the spectrogram.
9. The system according to claim 7, wherein the active contour model module is configured to use Channel-Specific Sampling Sequences (CSSS) to generate the stimulation timing signals.
10. The system according to claim 7, wherein the active contour model module is configured to include in the estimate of dominant frequencies a smoothing of the spectrogram.
11. The system according to claim 7, wherein the active contour model module is configured to include in the estimate of dominant frequencies a determination of one or more harmonic frequencies present in the spectrogram.
12. The system according to claim 7, wherein the system is configured to iteratively repeat the processing of the input sound signal over a period of time intervals.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The file of this patent contains at least one photograph. Copies of this patent with photograph will be provided by the Office upon request and payment of the necessary fee.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
(12) The bandwidths of the band pass filters in a typical cochlear implant signal processor are quite large compared to the auditory filters in normal hearing, and there is likely to be more than one frequency harmonic in each electrode channel. This can cause a poor estimation of the instantaneous frequency of the dominant harmonic in a given channel.
(13) Aubert, Gilles, et al. Image segmentation using active contours: Calculus of variations or shape gradients?. SIAM Journal on Applied Mathematics 63.6 (2003): 2128-2154 (incorporated herein by reference in its entirety) describes active contour models to segment parts of image frames out of a video.
(14) Fruehauf, Florian, et al. Experiments and algorithms to detect snow avalanche victims using airborne ground-penetrating radar. Geoscience and Remote Sensing, IEEE Transactions on 47.7 (2009): 2240-2251 (incorporated herein by reference in its entirety) describes using an active contour model to segment the snow layer out of radar data in order to detect avalanche victims automatically. A helicopter flies over the avalanche with a mounted radar antenna that receives the radar data. The amount of data is very large and the evaluation must be available in real time, but the snow layer can still be extracted from the radar data.
(15) Embodiments of the present invention are based on applying an active contour model to a spectrogram of the input sound signal, and using that to estimate the course of dominant frequencies such as the dominant harmonics. This estimation is then independent of the cochlear implant filter bank. Such embodiments may provide improved speech intelligibility and perception of music and pitch in hearing implant systems.
(16)
(17) TABLE-US-00001
Input Signal Preprocessing:
    BandPassFilter (input_sound, band_pass_signals)
Spectrogram:
    Spectrogram (input_sound, spectrogram)
Envelope Extraction:
    BandPassEnvelope (band_pass_signals, band_pass_envelopes)
Active Contour Model:
    DominantFrequencies (spectrogram, dom_freqs)
    StimulationTiming (dom_freqs, stim_timing)
Pulse Generation:
    PulseGenerate (band_pass_envelopes, stim_timing, out_pulses)
The details of such an arrangement are set forth in the following discussion.
(18) In the arrangement shown in
(19)
(20) The band pass signals U.sub.1 to U.sub.K (which can also be thought of as electrode channels) are output to an Envelope Detector 303, which extracts characteristic envelope signal outputs X.sub.1, . . . , X.sub.K, step 403, that represent the channel-specific band pass envelopes. The envelope extraction can be represented by X.sub.k=LP(|U.sub.k|), where |.| denotes the absolute value and LP(.) is a low-pass filter; for example, using 12 rectifiers and 12 digital Butterworth low pass filters of 2nd order, IIR-type. Alternatively, the Envelope Detector 303 may extract the Hilbert envelope, if the band pass signals U.sub.1, . . . , U.sub.K are generated by orthogonal filters.
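The relation X.sub.k=LP(|U.sub.k|) can be illustrated with a minimal Python sketch for a single channel: full-wave rectification followed by a 2nd-order Butterworth IIR low-pass filter. The function names, the 200 Hz cutoff and the 16 kHz sampling rate are assumptions made for illustration; the text above does not specify them.

```python
import math

def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients (b, a) of a 2nd-order Butterworth low-pass filter
    obtained with the bilinear transform (a[0] is normalized to 1)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0, 2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a

def envelope(band_signal, cutoff_hz, fs_hz):
    """X_k = LP(|U_k|): rectify each sample, then low-pass filter
    with a direct-form I recursion."""
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for u in band_signal:
        x0 = abs(u)  # rectifier
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

# Example: envelope of a 1 kHz tone sampled at 16 kHz (assumed values).
fs = 16000.0
tone = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(4000)]
env = envelope(tone, cutoff_hz=200.0, fs_hz=fs)
```

For a unit-amplitude sine, the rectified signal has a mean of 2/π, so the filtered envelope settles near that value with a small residual ripple.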
(21) A Spectrogram Module 302 generates a spectrogram S representative of the frequency spectrum present in the input sound signal u, step 402; for example, by using a short time Fourier transformation (STFT).
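As an illustration of the STFT option, the following Python sketch computes a magnitude spectrogram S[m][n] over frequency bins m and time frames n. The Hann window, the frame length and the hop size are assumptions chosen for the example, and the direct DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math

def stft_spectrogram(signal, frame_len=64, hop=32):
    """Return S[m][n], the magnitude at frequency bin m and time frame n."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
              for i in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    starts = range(0, len(signal) - frame_len + 1, hop)
    spec = [[0.0] * len(starts) for _ in range(n_bins)]
    for n, start in enumerate(starts):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        for m in range(n_bins):
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * m * i / frame_len)
                      for i in range(frame_len))
            spec[m][n] = abs(acc)
    return spec

# Example: a 1 kHz tone at an assumed 8 kHz sampling rate falls in bin
# 1000 / 8000 * 64 = 8.
fs = 8000.0
sig = [math.sin(2.0 * math.pi * 1000.0 * i / fs) for i in range(256)]
S = stft_spectrogram(sig)
```

A real-time implementation would normally use an FFT instead of the quadratic-cost DFT shown here.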
(22)
which assigns each time t∈T to a frequency h.sub.k(t)∈F. A timing signal Y.sub.k(t) can be obtained by
(23)
Then the time differences t.sub.k[n+1]−t.sub.k[n] of the ones in Y.sub.k correlate with the estimated frequency of the k.sup.th harmonic.
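One way to obtain a 0/1 timing signal whose ones are spaced according to the estimated harmonic is a phase accumulator. The Python sketch below is an assumption made for illustration and is not necessarily the rule used in the embodiments: the frequency track is integrated over time and a one is emitted each time a full cycle completes. The names timing_signal, h_k and dt are hypothetical.

```python
def timing_signal(h_k, dt):
    """h_k: estimated frequency in Hz at each time step.
    dt: spacing of the time steps in seconds.
    Returns Y_k as a list of 0/1 values."""
    y = []
    phase = 0.0  # accumulated cycles since the last one
    for f in h_k:
        phase += f * dt
        if phase >= 1.0:
            y.append(1)
            phase -= 1.0
        else:
            y.append(0)
    return y

# Example: a constant 250 Hz track on a 16 kHz time grid (assumed values)
# gives ones roughly every 16000 / 250 = 64 steps.
Y = timing_signal([250.0] * 1600, 1.0 / 16000.0)
ones = [i for i, v in enumerate(Y) if v == 1]
gaps = [b - a for a, b in zip(ones, ones[1:])]
```

With this rule the gap between consecutive ones is approximately the period 1/h.sub.k(t), so a higher estimated frequency yields more closely spaced ones.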
(24) The spectrogram S then is the input signal for an Active Contour Model Module 304, which applies an active contour model to the spectrogram S, step 404. This may generally be based on the use of active contour models as described in the prior art for image processing; embodiments of the present invention represent the first use of such active contour model-based image processing techniques for processing input sound signals for a hearing implant. The Active Contour Model Module 304 then uses the estimate of the dominant frequencies present in the spectrogram S to generate stimulation timing signals, step 405.
(25) The extracted signal envelopes X.sub.1, . . . , X.sub.K from the Envelope Detector 303, and the stimulation timing signals Y.sub.1, . . . , Y.sub.K from the Active Contour Model Module 304 are input signals to a Pulse Generator 305 that produces the electrode stimulation signals Z for the electrode contacts in the implanted electrode array of the Implant 306, step 406. The Pulse Generator 305 applies a patient-specific mapping function, for example, using instantaneous nonlinear compression of the envelope signal (map law), that is adapted to the needs of the individual cochlear implant user during fitting of the implant in order to achieve natural loudness growth. The Pulse Generator 305 may apply a logarithmic function with a form-factor C as a loudness mapping function, which typically is identical across all the band pass analysis channels. In different systems, different specific loudness mapping functions other than a logarithmic function may be used, with either one identical function applied to all channels or one individual function for each channel to produce the electrode stimulation signals. The electrode stimulation signals typically are a set of symmetrical biphasic current pulses.
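A logarithmic loudness mapping with a form-factor C can be sketched in Python as follows. The specific formula log(1+C·x)/log(1+C) on a normalized envelope x in [0, 1] and the value C=500 are assumptions chosen for illustration; the text above states only that a logarithmic function with a form-factor C may be used.

```python
import math

def log_map(x, c=500.0):
    """Compress a normalized envelope value x in [0, 1] to [0, 1].
    The formula and the default form-factor are illustrative assumptions."""
    x = min(max(x, 0.0), 1.0)  # clip to the assumed input range
    return math.log1p(c * x) / math.log1p(c)

# Small envelope values are boosted relative to a linear mapping.
levels = [log_map(v) for v in (0.0, 0.01, 0.1, 0.5, 1.0)]
```

In a fitted system the mapped value would still have to be scaled to the patient-specific stimulation range of each electrode contact, which this sketch leaves out.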
(26) Returning to describe in greater detail the operation of the Active Contour Model Module 304, the spectrogram S can be treated as a continuous mapping from F×T⊂R.sub.+.sup.2.fwdarw.R.sub.+, where R.sub.+ denotes the positive real numbers. First, the Active Contour Model Module 304 smooths the spectrogram S for robustness reasons: Ŝ.sub.1(f, t)=S(f, t)+σ.Math.Δ.sub.fS(f, t), where Δ.sub.f is the Laplace operator corresponding to the frequency and σ>0. This smoothing corresponds to solving a one-dimensional heat equation with initial condition S up to time σ. The parameter σ determines the smoothing: the larger σ is chosen, the stronger the smoothing will be.
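The smoothing step, which adds a multiple of the frequency-direction Laplacian to S, can be sketched on a discrete spectrogram as follows. The second-difference approximation of the Laplace operator, the replicated boundary values and the name sigma for the smoothing parameter are assumptions made for this Python illustration.

```python
def smooth_along_frequency(spec, sigma=0.25):
    """spec[m][n]: value at frequency bin m and time frame n.
    Returns spec + sigma * (second difference along the frequency axis)."""
    n_f = len(spec)
    n_t = len(spec[0])
    out = [[0.0] * n_t for _ in range(n_f)]
    for m in range(n_f):
        below = spec[max(m - 1, 0)]        # replicate the lowest bin
        above = spec[min(m + 1, n_f - 1)]  # replicate the highest bin
        for n in range(n_t):
            laplacian = below[n] - 2.0 * spec[m][n] + above[n]
            out[m][n] = spec[m][n] + sigma * laplacian
    return out

# Example: a single spectral peak is spread over its neighboring bins.
column = [[0.0], [0.0], [4.0], [0.0], [0.0]]
smoothed = smooth_along_frequency(column, sigma=0.25)
```

On a unit-spaced grid this single explicit step stays well behaved for sigma up to 0.5; stronger smoothing can be obtained by applying the step repeatedly.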
(27) Then the Active Contour Model Module 304 can detect the first harmonic present in the spectrogram S. Let p.sub.1,t: f↦εf+Ŝ.sub.1(f, t) be a potential for fixed t and ε>0.
(28)
(29) The foregoing uses only the information for a single moment in time, so that a single false estimation can occur and the estimated harmonics then will not be smooth, and during speech pauses, the course of the estimation will become erratic. To avoid that, the calculations can be iteratively repeated over a period of time intervals. Consider the function:
(30)
where h′(t) denotes the derivative of h with respect to time. The k.sup.th harmonic can be chosen as the local minimizer h.sub.k of H.sub.k. The first term of H.sub.k forces h.sub.k to be in the first local minimum, while the second term makes h.sub.k smooth. The weighting parameter controls the relative influence of the two terms. The calculation of the minimizer can be done, for example, by a steepest descent method where the corresponding Euler-Lagrange equation must be solved:
(31)
This is done iteratively by
(32)
with step size d.
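The steepest descent iteration can be sketched in Python on a discrete energy of the form: sum over n of P[h(t.sub.n), t.sub.n], plus alpha times the sum over n of (h(t.sub.n+1)−h(t.sub.n)).sup.2. This discrete energy, the rounding of h to the nearest bin when evaluating the slope of the potential, and the names pot, alpha, step and iters are assumptions made for illustration rather than the exact functional of the embodiments.

```python
def potential_slope(pot, f, n):
    """Central-difference slope of the potential along frequency,
    evaluated at the bin nearest to f in time frame n."""
    m = min(max(int(round(f)), 1), len(pot) - 2)
    return 0.5 * (pot[m + 1][n] - pot[m - 1][n])

def descend(pot, h0, alpha=1.0, step=0.1, iters=500):
    """Gradient descent on: potential term + alpha * smoothness term.
    pot[m][n]: potential at bin m, frame n. h0: initial track in bin units."""
    h = list(h0)
    n_t = len(h)
    top = len(pot) - 1.0
    for _ in range(iters):
        new_h = []
        for n in range(n_t):
            left = h[max(n - 1, 0)]
            right = h[min(n + 1, n_t - 1)]
            curvature = left - 2.0 * h[n] + right  # discrete h''
            grad = potential_slope(pot, h[n], n) - 2.0 * alpha * curvature
            new_h.append(h[n] - step * grad)
        h = [min(max(v, 0.0), top) for v in new_h]  # stay inside the grid
    return h

# Example: a potential with its minimum at bin 10 in every frame pulls a
# jagged initial track toward a smooth track near bin 10.
pot = [[(m - 10.0) ** 2 for _ in range(6)] for m in range(21)]
track = descend(pot, [14.0, 6.0, 14.0, 6.0, 12.0, 8.0])
```

The potential term pulls each sample of the track toward a nearby minimum, while the curvature term couples neighboring time frames and suppresses isolated jumps. The product of step and alpha must stay small enough for the explicit update to remain stable.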
(33) Based on those considerations, the following implementation results in a discrete setting where the spectrogram S and Ŝ.sub.k are given as matrices: S: {f.sub.1, . . . , f.sub.M}×{t.sub.1, . . . , t.sub.N}.fwdarw.R.sub.+: 1. Define a rounding operator: f=argmin.sub.{f.sub.
(34)
(35)
Go to step 3.
(36)
(37)
can be exchanged by
(38)
in 4.a.ii. of the above implementation. This yields for h.sub.k,i(t.sub.n)=f.sub.i that g.sub.i(t.sub.n) is the cumulative sum of Ŝ.sub.k(.Math., t.sub.n). Thus, the stopping criterion 5.a. above is reached at a high value of Ŝ.sub.k(.Math., t.sub.n). Furthermore, a weighting factor also can be introduced into the calculation of Ŝ.sub.k in step 8 since the harmonic amplitudes generally decrease. Thus
(39)
In
(40) Considering an actual implementation in a hearing implant speech processor, the spectrogram S can be divided into segments in time: S.sub.l,L: {f.sub.1, . . . , f.sub.M}×{t.sub.l+1, . . . , t.sub.l+L}.fwdarw.R.sub.+ with S.sub.l,L(.Math., t.sub.j)=S(.Math., t.sub.j+l) for j=1, . . . , L. Then the modified implementation just discussed can be applied to each S.sub.l,L to obtain the harmonics h.sub.k.sup.l for l=0, . . . , N−L. The harmonic is initialized by setting h.sub.k(t.sub.j)=h.sub.k.sup.0(t.sub.j) for j=0, . . . , L, and for the following segments only the last value is used: h.sub.k(t.sub.L+l)=h.sub.k.sup.l(t.sub.L+l) for l=1, . . . , N−L. The resultant harmonics are shown in
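The segment-wise processing can be sketched in Python as a sliding window: the first segment initializes the track, and each later segment contributes only its last value. The function names are hypothetical, and the per-column peak picker merely stands in for the active contour estimation so that the example runs on its own.

```python
def track_in_segments(columns, seg_len, track_segment):
    """columns: list of N spectrogram columns, one per time step.
    track_segment: callable that maps seg_len columns to seg_len estimates.
    Returns one frequency estimate per time step."""
    n = len(columns)
    h = list(track_segment(columns[0:seg_len]))  # initialization segment
    for l in range(1, n - seg_len + 1):
        estimates = track_segment(columns[l:l + seg_len])
        h.append(estimates[-1])  # keep only the newest value
    return h

def argmax_tracker(columns):
    """Stand-in tracker (assumption): index of the largest bin per column."""
    return [max(range(len(c)), key=c.__getitem__) for c in columns]

# Example with N = 5 columns and segment length L = 3.
cols = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
harmonic = track_in_segments(cols, 3, argmax_tracker)
```

Because only the last value of each later segment is kept, the latency after initialization is one time step, while each estimate still benefits from the smoothness constraint over the preceding L−1 columns.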
(41) Various refinements or modifications alone or in combination are possible in different specific embodiments, including: Other methods can be used to get the spectrogram S; for example, applying an STFT on the band pass signals U.sub.1, . . . , U.sub.K. The calculation of the stimulation timing signals Y.sub.k can be changed. The approach of the active contour model can be changed; for example, using another smoothing term .sub.t.sub.
(42)
into the functional H.sub.k. The information of the estimated harmonics can also be used in noise reduction and/or a classification algorithm to improve the signal-processing in these modules.
(43) Applying an active contour model to estimate dominant frequencies present in a spectrogram of the input sound signal can further lead to the development of new coding strategy concepts where the actual harmonics determine the starting points of a CSSS and/or the psychoacoustic phenomenon of the phantom fundamental can be exploited. The course of the dominant frequencies that is determined could also be useful in a scene classification algorithm, and the acquired classification could then be used to control further signal processing. For example, a stationary noise reduction (NR) could be turned off when listening to music, or a beamformer could be turned on in a conversation with loud surrounding background noise. The knowledge of the dominant frequencies can be used in an NR as information for a voice-activity detector, which might be able to distinguish between speech and other sounds based on the harmonics present in speech.
(44) Embodiments of the invention may be implemented in part in any conventional computer programming language. For example, preferred embodiments may be implemented in a procedural programming language (e.g., C) or an object oriented programming language (e.g., C++, Python). Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
(45) Embodiments can be implemented in part as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software (e.g., a computer program product).
(46) Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.