Speech Processing Method and System in a Cochlear Implant
20220068289 · 2022-03-03
Abstract
The invention discloses a speech processing method and system in a cochlear implant. The method includes: obtaining a sound signal and converting it into a digital signal; decomposing the digital signal using a mode decomposition method to obtain a plurality of intrinsic mode functions, and converting the intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; and selecting the N most energetic components from the corresponding electrode frequency bands and generating the corresponding electrode stimulation signals from the selected components. The present invention analyzes sound and composes the final electrode signals entirely in the time domain, based on the Hilbert-Huang transform; it is not limited by the uncertainty principle, and no noise is generated by harmonics.
Claims
1. A speech processing method in a cochlear implant, characterized in that, it includes the following steps: obtaining a sound signal, and converting the sound signal into a digital signal; decomposing the digital signal using a mode decomposition method, obtaining a plurality of intrinsic mode functions, and converting the plurality of intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting the N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components.
2. The speech processing method of claim 1, characterized in that, the mode decomposition method includes Empirical Mode Decomposition method, Ensemble Empirical Mode Decomposition method, or Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition method.
3. The speech processing method of claim 1, characterized in that, it further includes: before decomposing the digital signal using the mode decomposition method, using one of the following methods to suppress noise: adaptive filter bank method or artificial intelligence method.
4. The speech processing method of claim 1, characterized in that, it further includes: before decomposing the digital signal using the mode decomposition method, using one of the following methods to eliminate the cocktail party problem: Computational Auditory Scene Analysis, Non-negative Matrix Factorization, generative model modeling, beamforming, multi-channel blind source separation, Deep Clustering, Deep Attractor Network, and Permutation Invariant Training.
5. The speech processing method of claim 1, characterized in that, it further includes: selecting N most energetic components from the corresponding electrode frequency bands, wherein N≤6, and the energy values of these electrode frequency components are higher than the preset threshold.
6. The speech processing method of claim 1, characterized in that, it further includes: automatic gain control, which adjusts the stimulation signal of each electrode according to the patient's audiogram.
7. The speech processing method of claim 1, characterized in that, it further includes: generating the electrode stimulation signal corresponding to the selected intrinsic mode functions by one of the following methods: Simultaneous Analog Stimulation, Compressed Analog, and Continuous Interleaved Sampling.
8. The speech processing method of claim 1, characterized in that, it further includes: the preset frequency bands in the cochlear implant correspond to the electrodes in the cochlear implant one to one, and the number of electrodes is greater than or equal to 20.
9. A speech processing method in a cochlear implant, characterized in that, it includes the following steps: obtaining a sound signal, and converting the sound signal into a digital signal; decomposing the digital signal using an adaptive filter bank method, obtaining a plurality of pseudo-intrinsic mode functions, and converting the plurality of pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting the N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components.
10. The speech processing method of claim 9, characterized in that, the adaptive filter bank is a mean filter bank or a median filter bank.
11. A speech processing system in a cochlear implant using the speech processing method of claim 1, characterized in that, the speech processing system includes a sound receiving module, a sound processing module, and a signal transmission module, wherein the sound receiving module is configured to receive a sound signal, and convert the sound signal into a digital signal; the sound processing module is configured to perform the following operations: processing the digital signal to obtain a plurality of intrinsic mode functions or pseudo-intrinsic mode functions, and converting the plurality of intrinsic mode functions or pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting the N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components; and the signal transmission module is configured to transmit the electrode stimulation signals generated by the sound processing module to the electrodes in the cochlear implant, so that the electrodes generate stimulation signals corresponding to the sound.
12. A speech processing system in a cochlear implant using the speech processing method of claim 9, characterized in that, the speech processing system includes a sound receiving module, a sound processing module, and a signal transmission module, wherein the sound receiving module is configured to receive a sound signal, and convert the sound signal into a digital signal; the sound processing module is configured to perform the following operations: processing the digital signal to obtain a plurality of intrinsic mode functions or pseudo-intrinsic mode functions, and converting the plurality of intrinsic mode functions or pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting the N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components; and the signal transmission module is configured to transmit the electrode stimulation signals generated by the sound processing module to the electrodes in the cochlear implant, so that the electrodes generate stimulation signals corresponding to the sound.
13. The speech processing system of claim 11, characterized in that, it operates mostly in the time domain; and, based on the decomposition method, the signal for each electrode is expressed in terms of instantaneous frequencies and instantaneous energy intensities as functions of time, without the help of spectral representation in any form.
14. The speech processing system of claim 12, characterized in that, it operates mostly in the time domain; and, based on the decomposition method, the signal for each electrode is expressed in terms of instantaneous frequencies and instantaneous energy intensities as functions of time, without the help of spectral representation in any form.
Description
DETAILED DESCRIPTION OF THE INVENTION
[0061] In the following, the technical means adopted by the present invention to achieve its intended purpose are further explained with reference to the accompanying drawings and the preferred embodiments.
EXAMPLE 1
[0062] Referring to
[0063] In step 200, the noise-filtered signal is decomposed by a mode decomposition method to obtain the Intrinsic Mode Functions (IMFs) of the sound signal. Any mode decomposition method that yields the IMF components of the signal may be used, including Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), and Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD). In step 210, the decomposition result is converted into Instantaneous Frequencies (IF) and Instantaneous Amplitudes (IA) or instantaneous energy intensities. In step 220, the IMF components are assigned, according to their instantaneous frequency values, to the frequency bands corresponding to the electrodes. The number of electrodes and their frequency bands are preset. The more electrodes there are, the finer the frequency resolution and the better the result. However, crosstalk may arise between electrodes, and the length of the implant limits how many electrodes can be accommodated, so the number of electrodes should be chosen appropriately. The frequencies assigned to the electrodes should be determined by the characteristics of the sound. In frequency ranges where the sound frequencies are concentrated (for example, below 1000 Hz), the electrodes can be spaced densely to improve frequency resolution. In ranges where they are less concentrated (for example, above 1000 Hz), fewer electrodes can be used, spaced on an approximately logarithmic scale.
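Step 210 can be illustrated with a minimal sketch, assuming the instantaneous frequency and amplitude are taken from the analytic signal of each IMF (the Hilbert step of the Hilbert-Huang transform). The function names, the 440 Hz test tone, and the use of NumPy's FFT as a numerical shortcut for the discrete Hilbert transform are assumptions made for illustration; the text does not specify how the transform is computed.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence via the discrete Hilbert transform.

    Assumption: the FFT is used here only as a convenient numerical route.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC term once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_freq_amp(imf, fs):
    """Return (instantaneous frequency in Hz, instantaneous amplitude)."""
    z = analytic_signal(np.asarray(imf, dtype=float))
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Frequency is the time derivative of phase; diff shortens it by one sample.
    freq = np.diff(phase) * fs / (2.0 * np.pi)
    return freq, amplitude

# Hypothetical test input: a pure tone standing in for a single IMF.
fs = 22050
t = np.arange(2205) / fs
imf = 0.5 * np.cos(2.0 * np.pi * 440.0 * t)
freq, amp = instantaneous_freq_amp(imf, fs)
```

For a real IMF the frequency and amplitude vary from sample to sample, which is what allows each component to be routed to a different electrode band over time.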
To respect the limit on the number of electrodes, 20 electrodes can be used, for example, with the following specified frequency values (in Hz): 80, 100, 128, 160, 200, 256, 320, 400, 512, 640, 800, 1024, 1280, 1600, 2048, 2560, 3200, 4096, 5120, 6400, 8192. These 21 frequency values define 20 frequency bands, each bounded by two adjacent values: the first band is 80-100 Hz, the second is 100-128 Hz, . . . , and the 20th is 6400-8192 Hz. Each of the 20 bands corresponds to one electrode in the cochlear implant. As the values show, each octave contains 3 frequencies, which serve to distinguish different frequencies within the same octave. In the present invention, more electrodes improve the frequency differentiation and thereby the final sound quality. For example, by changing the high and low cut-off frequencies, up to 25 electrodes can be deployed over a smaller total range, giving finer frequency differentiation between electrodes. With 25 electrodes, for example, the corresponding frequencies (in Hz) can be: 50, 64, 75, 90, 105, 128, 150, 180, 210, 256, 300, 360, 420, 512, 600, 720, 840, 1024, 1200, 1440, 1680, 2048, 2400, 2880, 3360, 4096. As in the 20-electrode case, each electrode corresponds to one frequency band: the band of the first electrode is 50-64 Hz, that of the second electrode is 64-75 Hz, . . . , and that of the twenty-fifth electrode is 3360-4096 Hz. As the number of electrodes increases, a cochlear implant using the speech processing method of the present invention gains progressively finer frequency resolution.
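The band assignment of step 220 for the 20-electrode example can be sketched as a lookup over the 21 band edges. This is a minimal illustration; the function name is hypothetical, and treating each band as including its lower edge and excluding its upper edge is an assumption, since the text does not say how boundary values are handled.

```python
from bisect import bisect_right

# The 21 band edges (Hz) listed above for the 20-electrode example.
BAND_EDGES_HZ = [80, 100, 128, 160, 200, 256, 320, 400, 512, 640, 800,
                 1024, 1280, 1600, 2048, 2560, 3200, 4096, 5120, 6400, 8192]

def electrode_for_frequency(freq_hz, edges=BAND_EDGES_HZ):
    """Return the 1-based electrode index whose band contains freq_hz.

    Returns None when the frequency lies outside all bands.
    Assumption: bands are half-open, [lower edge, upper edge).
    """
    if freq_hz < edges[0] or freq_hz >= edges[-1]:
        return None
    # Number of edges <= freq_hz equals the 1-based band index.
    return bisect_right(edges, freq_hz)
```

For instance, an instantaneous frequency of 440 Hz lies in the 400-512 Hz band, which is the eighth band in the list.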
As the number of electrodes increases, the set of instantaneous frequencies can grow accordingly and the electrodes resolve the sound more finely, so the sound they produce becomes more realistic. With 88 electrodes, for example, we should be able to fully enjoy piano music, albeit with less colorful sound (timbre), because the sound of each piano key is highly nonlinear. After the IMF components have been paired with the corresponding electrode frequency bands, in step 230 the most energetic components are selected from those bands. At present no more than 6 electrodes are selected, although this number could increase when technology warrants, and the energy values of the selected components must exceed a preset threshold. The limit exists because stimulating multiple electrodes at the same time may cause crosstalk between them; current experiments show that the mutual influence is small when no more than 6 electrodes are selected. The threshold exists because speech contains pauses between words, phrases, and sentences; no electrode stimulation is needed during a pause, when the energy of the sound components is low, so the threshold filters out the weak components there. The threshold can be set between 10% and 20% of the average energy of the sound.
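The selection rule of step 230 can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the per-band energies and the average energy are supplied by the caller (the text does not say over what interval the average is taken), and the 15% ratio is simply a value inside the 10% to 20% range given above.

```python
def select_components(band_energy, mean_energy, n_max=6, threshold_ratio=0.15):
    """Pick at most n_max electrodes whose energy exceeds the threshold.

    band_energy: dict mapping electrode index -> energy at the current instant.
    mean_energy: average energy of the sound (caller-supplied assumption).
    Returns the selected electrode indices in ascending order.
    """
    threshold = threshold_ratio * mean_energy
    # Drop weak components, e.g. those occurring during pauses in speech.
    strong = [(energy, electrode)
              for electrode, energy in band_energy.items()
              if energy > threshold]
    # Keep the most energetic ones first, then cut to n_max.
    strong.sort(reverse=True)
    return sorted(electrode for _, electrode in strong[:n_max])

# Hypothetical energies for eight active bands.
energies = {1: 0.9, 2: 0.05, 3: 0.4, 5: 0.7, 8: 0.3, 9: 0.2, 12: 0.6, 15: 0.5}
selected = select_components(energies, mean_energy=1.0)
```

During a pause every band falls below the threshold, so the function returns an empty list and no electrode is stimulated.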
[0064] In step 300, the corresponding electrode stimulation signals are generated from the selected components. The following methods can be used to generate the electrode signals: Simultaneous Analog Stimulation (SAS), Compressed Analog (CA), and Continuous Interleaved Sampling (CIS). In step 310, automatic gain control is performed to limit loudness. The automatic gain control relies mainly on the audiogram of the hearing-impaired patient, which gives the patient's sound perception ability in different frequency ranges; the stimulation signal of the electrode corresponding to each frequency is then adjusted according to the patient's hearing test results. This step is optional and applies only to patients with residual hearing. Then, in step 320, the electrode stimulation signal is transmitted to the corresponding electrode. Other methods of generating electrode signals also claim to use selective frequency bands, such as the Advanced Combination Encoder (ACE), Dynamic Peak Picking, Spectral Peak (SPEAK), and Current Steering. Their benefit is limited, however, because they are built on a Fourier filter bank, which is always affected by virtual harmonics. When transmitted to a limited number of electrodes, every electric signal must represent the real neural signal of the sound, but a harmonic signal is not a real sound signal. In hearing aids, the cancellation and combination of harmonics amplifies the fundamental, resulting in sound that is louder and annoying yet still unclear. In cochlear implants, the harmonics are rectified and can no longer be eliminated by combination and cancellation, which causes unnecessary noise. The problem becomes worse if the sound is rich in harmonics, such as the sound of a musical instrument: the harmonics all become intertwined and inseparable, making music appreciation impossible.
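Of the signal generation methods named in step 300, Continuous Interleaved Sampling is the simplest to sketch. The following is a minimal illustration, assuming the commonly described CIS behaviour in which electrodes are pulsed one after another so that no two pulses overlap in time. The function name, the 50 microsecond pulse width, and the low-to-high electrode order are hypothetical choices, not values taken from the text.

```python
def cis_schedule(channel_amplitudes, pulse_width_us=50.0, start_us=0.0):
    """Build a non-overlapping pulse schedule for one stimulation cycle.

    channel_amplitudes: dict mapping electrode index -> pulse amplitude.
    Returns a list of (start_time_us, electrode, amplitude) tuples.
    Assumption: one pulse per selected electrode, delivered back to back.
    """
    schedule = []
    t = start_us
    for electrode in sorted(channel_amplitudes):
        schedule.append((t, electrode, channel_amplitudes[electrode]))
        t += pulse_width_us  # next pulse starts only after this one ends
    return schedule

# Hypothetical amplitudes for two selected electrodes.
cycle = cis_schedule({3: 0.4, 1: 0.9})
```

Because the pulses never coincide, this style of stimulation avoids the simultaneous-electrode crosstalk that motivates the limit on the number of selected electrodes.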
[0065] Compared with cochlear implant speech processing methods based on the Fourier principle, the present invention has the following advantages: (1) The frequencies in the present invention are instantaneous frequencies, so the method is not limited by the uncertainty principle, whereas the Fourier transform is an integral transform, and no method based on an integral transform can obtain instantaneous frequencies. (2) The HHT-based cochlear implant speech processing method of the present invention generates no harmonics, and each electric signal represents the true neural signal of the sound, whereas in Fourier-based cochlear implants the signal contains harmonics that cannot be eliminated, resulting in a lot of unnecessary noise. (3) In the present invention, a larger number of electrodes can be used to improve frequency differentiation and thereby the final sound quality, whereas in cochlear implants based on the Fourier principle the harmonics cannot be eliminated by combination and cancellation even if the number of electrodes is increased, so adding electrodes does not improve the final sound quality. (4) In the present invention, the sound components can be selectively amplified according to the patient's hearing test results, to preserve the natural cochlear function of some hearing-impaired patients.
EXAMPLE 2
[0075] Furthermore, to save time, any method similar or equivalent to EMD can be used in its place. For example, a running mean or running median with successively different window sizes can be applied, repeatedly as needed, as a high-pass or other time-domain filter on the input signal. With the running mean method there is no guarantee that the signal obtained is a true IMF, which is a requirement for generating accurate and meaningful instantaneous frequencies. However, since spectrum analysis is not used, the approximation is acceptable. Taking the successive running mean as an example, the steps are as follows. First, the data is decomposed by successive running mean:
in which $\langle F \rangle_{n_j}$ represents the running mean with window size $n_j$ (or the running median, applied repeatedly if necessary). The advantage of using a rectangular filter is that the filter is adaptive and its response function is well known. In addition, applying the rectangular filter repeatedly changes that known response function: applying it twice produces a triangular filter, and applying it more than four times produces a response function close to a Gaussian shape. The key parameter of this filter is the window size. According to formula (3), if the sampling frequency is 22050 Hz, the rectangular filter and EMD have the following equivalent relationship:
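$$
\begin{aligned}
n_j = 3 &\sim 7{,}000\ \text{Hz} \\
n_j = 7 &\sim 3{,}500\ \text{Hz} \\
n_j = 15 &\sim 1{,}500\ \text{Hz} \\
n_j = 31 &\sim 700\ \text{Hz} \\
n_j = 61 &\sim 350\ \text{Hz} \\
n_j = 121 &\sim 180\ \text{Hz} \\
n_j = 241 &\sim 90\ \text{Hz} \\
n_j = 481 &\sim 45\ \text{Hz}
\end{aligned}
\tag{4}
$$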
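The quoted equivalences are close to the sampling frequency divided by the window size. The short check below assumes that this ratio is the relationship behind the table; the formula the text refers to is not reproduced here, so this is an inference from the numbers, and the function name is hypothetical.

```python
FS_HZ = 22050
WINDOW_SIZES = [3, 7, 15, 31, 61, 121, 241, 481]
QUOTED_HZ = [7000, 3500, 1500, 700, 350, 180, 90, 45]

def window_to_frequency(n_j, fs_hz=FS_HZ):
    """Approximate frequency associated with a running window of n_j samples.

    Assumption: the frequency is fs / n_j, i.e. one period per window.
    """
    return fs_hz / n_j

# Print the estimate next to the value quoted in the table.
for n_j, quoted in zip(WINDOW_SIZES, QUOTED_HZ):
    estimate = window_to_frequency(n_j)
    print(f"n_j={n_j:4d}  fs/n_j={estimate:7.1f} Hz  quoted ~{quoted} Hz")
```

Under this assumption each window roughly doubles the previous one, so each step lowers the associated frequency by about one octave.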
There is no need to continue filtering, because sounds at frequencies below the next filter step are inaudible in any case. The disadvantage of using filters is that the result is never as sharp as that of the EMD described above.
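The successive running mean decomposition can be sketched as follows. This is one plausible reading, offered as an assumption because the decomposition formula is not reproduced here: at each step the current signal is smoothed with a running mean of window size n_j, the difference between the signal and its smoothed version is kept as a pseudo-IMF, and the smoothed version is passed on to the next, wider window. The function names and the two-tone test signal are hypothetical.

```python
import numpy as np

def running_mean(x, n):
    """Centered running mean with a rectangular window of n samples."""
    kernel = np.ones(n) / n
    return np.convolve(x, kernel, mode="same")  # zero-padded at the edges

def successive_running_mean(x, windows=(3, 7, 15, 31, 61, 121, 241, 481)):
    """Split x into pseudo-IMFs plus a residual, finest scale first."""
    components = []
    residual = np.asarray(x, dtype=float)
    for n in windows:
        smooth = running_mean(residual, n)
        components.append(residual - smooth)  # detail removed by this window
        residual = smooth                     # carry the smooth part forward
    return components, residual

# Hypothetical input: a high tone plus a low tone, 0.1 s at 22050 Hz.
fs = 22050
t = np.arange(fs // 10) / fs
x = np.sin(2 * np.pi * 3000 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
components, residual = successive_running_mean(x)
```

By construction the pseudo-IMFs and the residual add back up to the input exactly, so nothing is lost when components are later recombined with different gains.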
[0076] Selective amplification or attenuation can be realized as in formula (3), and the reconstructed signal y(t) is obtained as:
in which the value of $a_j$ can be determined according to the patient's audiogram.
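The selective amplification can be sketched as a weighted sum of the components. This assumes the reconstruction has the form y(t) = a_1 c_1(t) + a_2 c_2(t) + ..., where c_j(t) is the j-th component; that form, the symbol c_j, and the function name are assumptions made for illustration, and the gain values below are placeholders rather than values derived from any audiogram.

```python
def reconstruct(components, gains):
    """Return y where y[i] = sum over j of gains[j] * components[j][i].

    components: list of equal-length sequences (IMFs or pseudo-IMFs).
    gains: one gain a_j per component (hypothetical values in the example).
    """
    if len(components) != len(gains):
        raise ValueError("need exactly one gain per component")
    n = len(components[0])
    return [sum(a * c[i] for a, c in zip(gains, components)) for i in range(n)]

# Placeholder example: boost the first component, attenuate the second.
y = reconstruct([[1.0, 2.0], [10.0, 20.0]], gains=[2.0, 0.5])
```

Setting every gain to 1 returns the plain sum of the components, so the gains act purely as per-band corrections.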
[0077] EMD is more time-consuming, but even so its computational complexity is comparable to that of the Fourier transform. If the filter method is used, the sound may not be particularly clear, because the mean filter spreads the filtered result over a wider time span. The final result will not be as clear as with the complete EMD method, but the filter method is simpler and cheaper to implement.
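The time spreading of the mean filter can be seen in the filter kernel itself. The sketch below relies on the standard fact that applying a rectangular (running mean) filter several times is equivalent to filtering once with the kernel convolved with itself; the function name and the window size of 3 are chosen only for illustration.

```python
import numpy as np

def repeated_box_kernel(n, passes):
    """Effective kernel of a running mean of n samples applied `passes` times."""
    box = np.ones(n) / n
    kernel = box
    for _ in range(passes - 1):
        kernel = np.convolve(kernel, box)  # each pass widens the kernel
    return kernel

triangle = repeated_box_kernel(3, 2)  # two passes: triangular weights
bell = repeated_box_kernel(3, 4)      # four passes: wider, bell-shaped
```

Two passes of a 3-sample window give the triangular weights 1, 2, 3, 2, 1 (divided by 9), and four passes stretch the kernel from 3 to 9 samples, which is the wider time span over which the filtered result is spread.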
[0078] Referring to
[0079] The above are only the preferred embodiments of the present invention, and do not limit the present invention in any form. Although the present invention has been disclosed as above in preferred embodiments, it is not intended to limit the present invention. Anyone who is familiar with the field, without departing from the scope of the technical solution of the present invention, can use the technical content disclosed above to make slight changes or modifications into equivalent embodiments with equivalent changes. Any simple modifications, equivalent changes and variations made to the above embodiments based on the technical essence of the present invention without departing from the technical solution of the present invention still fall within the scope of the technical solution of the present invention.