Speech Processing Method and System in a Cochlear Implant

Abstract

The invention discloses a speech processing method and system in a cochlear implant. The method includes: obtaining a sound signal and converting it into a digital signal; decomposing the digital signal using a mode decomposition method to obtain a plurality of intrinsic mode functions, and converting the intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; and selecting the N most energetic components from the corresponding electrode frequency bands and generating the corresponding electrode stimulation signals from the selected components. The invention analyzes the sound and composes the final electrode signals entirely in the time domain, based on the Hilbert-Huang transform; it is therefore not limited by the uncertainty principle, and no noise is generated by harmonics.

Claims

1. A speech processing method in a cochlear implant, characterized in that, it includes the following steps: obtaining a sound signal, and converting the sound signal into a digital signal; decomposing the digital signal using a mode decomposition method, obtaining a plurality of intrinsic mode functions, and converting the plurality of intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components.

2. The speech processing method of claim 1, characterized in that, the mode decomposition method includes Empirical Mode Decomposition method, Ensemble Empirical Mode Decomposition method, or Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition method.

3. The speech processing method of claim 1, characterized in that, it further includes: before decomposing the digital signal using the mode decomposition method, using one of the following methods to suppress noise: adaptive filter bank method or artificial intelligence method.

4. The speech processing method of claim 1, characterized in that, it further includes: before decomposing the digital signal using the mode decomposition method, using one of the following methods to eliminate the cocktail party problem: Computational Auditory Scene Analysis, Non-negative Matrix Factorization, generative model modeling, beamforming, multi-channel blind source separation, Deep Clustering, Deep Attractor Network, and Permutation Invariant Training.

5. The speech processing method of claim 1, characterized in that, it further includes: selecting N most energetic components from the corresponding electrode frequency bands, wherein N≤6, and the energy values of these electrode frequency components are higher than the preset threshold.

6. The speech processing method of claim 1, characterized in that, it further includes: automatic gain control, which adjusts the stimulation signal of each electrode according to the patient's audiogram.

7. The speech processing method of claim 1, characterized in that, it further includes: generating the electrode stimulation signal corresponding to the selected intrinsic mode functions by one of the following methods: Simultaneous Analog Signal, Compression Analysis, and Continuous Interleaved Sampling.

8. The speech processing method of claim 1, characterized in that, it further includes: the preset frequency bands in the cochlear implant correspond to the electrodes in the cochlear implant one to one, and the number of electrodes is greater than or equal to 20.

9. A speech processing method in a cochlear implant, characterized in that, it includes the following steps: obtaining a sound signal, and converting the sound signal into a digital signal; decomposing the digital signal using an adaptive filter bank method, obtaining a plurality of pseudo-intrinsic mode functions, and converting the plurality of pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components.

10. The speech processing method of claim 9, characterized in that, the adaptive filter bank is a mean filter bank or a median filter bank.

11. A speech processing system in a cochlear implant using the speech processing method of claim 1, characterized in that, the speech processing system includes a sound receiving module, a sound processing module, and a signal transmission module, wherein the sound receiving module is configured to receive a sound signal, and convert the sound signal into a digital signal; the sound processing module is configured to perform the following operations: processing the digital signal to obtain a plurality of intrinsic mode functions or pseudo-intrinsic mode functions, and converting the plurality of intrinsic mode functions or pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components; and the signal transmission module is configured to transmit the electrode stimulation signals generated by the sound processing module to the electrodes in the cochlear implant, so that the electrodes generate stimulation signals corresponding to the sound.

12. A speech processing system in a cochlear implant using the speech processing method of claim 9, characterized in that, the speech processing system includes a sound receiving module, a sound processing module, and a signal transmission module, wherein the sound receiving module is configured to receive a sound signal, and convert the sound signal into a digital signal; the sound processing module is configured to perform the following operations: processing the digital signal to obtain a plurality of intrinsic mode functions or pseudo-intrinsic mode functions, and converting the plurality of intrinsic mode functions or pseudo-intrinsic mode functions into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; sorting the instantaneous frequencies into the corresponding preset frequency bands of the electrodes in the cochlear implant; selecting N most energetic components from the corresponding frequency bands of the electrodes, and generating corresponding electrode stimulation signals according to the selected components; and the signal transmission module is configured to transmit the electrode stimulation signals generated by the sound processing module to the electrodes in the cochlear implant, so that the electrodes generate stimulation signals corresponding to the sound.

13. The speech processing system of claim 11, characterized in that, it operates mostly in time domain; and based on the decomposition method, the signals for each electrode are in terms of instantaneous frequencies and instantaneous energy intensities as a function of time without the help of spectral representation in any form.

14. The speech processing system of claim 12, characterized in that, it operates mostly in time domain; and based on the decomposition method, the signals for each electrode are in terms of instantaneous frequencies and instantaneous energy intensities as a function of time without the help of spectral representation in any form.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0049] FIG. 1 is the flow chart of the cochlear implant speech processing method of the present invention.

[0050] FIG. 2 is the sound signal diagram of the Chinese sentence, ‘Zeng xiansheng zao’ (which means ‘Good Morning Mr. Zeng’).

[0051] FIG. 3 is the sound components diagram of the sound signals in FIG. 2 after being filtered by a Fourier bandpass filter bank.

[0052] FIG. 4 is the Fourier time-frequency diagram of the sound signals in FIG. 2.

[0053] FIG. 5 is the sound components diagram of the sound signals in FIG. 2 after EMD decomposition.

[0054] FIG. 6 is the Hilbert time-frequency diagram of the sound signals in FIG. 2.

[0055] FIG. 7 is the IMF components diagram of the sound signals in FIG. 2 obtained by Ensemble Empirical Mode Decomposition, in which the noise level is low (1%), and there are only 2 members in the ensemble.

[0056] FIG. 8 is the IMF components diagram of the sound signals in FIG. 2 obtained by Ensemble Empirical Mode Decomposition, in which the noise level is high (10%), and there are 16 members in the ensemble.

[0057] FIG. 9 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIG. 5, but the frequency axis is plotted in logarithmic scale.

[0058] FIG. 10 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIG. 7 but the frequency axis is plotted in logarithmic scale.

[0059] FIG. 11 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIG. 8 but the frequency axis is plotted in logarithmic scale.

[0060] FIG. 12 shows the cochlear implant speech processing system of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0061] In the following, with reference to the accompanying drawings and the preferred embodiments of the present invention, the technical means adopted by the present invention to achieve its intended purpose are further explained.

EXAMPLE 1

[0062] FIG. 1 shows a detailed implementation of the cochlear implant speech processing method of the present invention. In step 100, a sound signal is digitized; the sampling frequency can be selected as required. To achieve higher fidelity, a high sampling frequency such as 22 kHz or 44 kHz can be used (these are the sampling frequencies used by current mainstream acquisition cards). Because the sound may contain noise that needs to be suppressed or eliminated, noise suppression is performed in step 110. Adaptive filters can be used for this purpose, as can artificial intelligence methods such as RNN, DNN, or MLP. In addition, the “cocktail party problem” is an important issue in the field of speech recognition: current speech recognition technology can already recognize one person's words with high accuracy, but when two or more people are speaking, the recognition rate drops sharply. In step 120, the following techniques can be used to address the cocktail party problem: for single-channel situations, Computational Auditory Scene Analysis (CASA), Non-negative Matrix Factorization (NMF), and generative model modeling; for multi-channel situations, beamforming or multi-channel blind source separation; and, based on deep learning, techniques such as Deep Clustering, Deep Attractor Network (DANet), and Permutation Invariant Training (PIT).

[0063] In step 200, the signal after the noise filtering is decomposed by a mode decomposition method to obtain the Intrinsic Mode Functions (IMFs) of the sound signal. The mode decomposition method refers to any mode decomposition method that can obtain the Intrinsic Mode Function components (IMFs) of the signal. The mode decomposition method includes Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD). In step 210, the mode decomposition result is converted into Instantaneous Frequencies (IF) and Instantaneous Amplitudes (IA) or instantaneous energy intensities. In step 220, according to the instantaneous frequency values, the Intrinsic Mode Function components are assigned to the frequency bands corresponding to the electrodes. The number of electrodes and the frequency bands corresponding to the electrodes have been preset. The greater the number of electrodes, the stronger the frequency resolution capability, and the better the effect achieved. However, there may be problems such as crosstalk between multiple electrodes, and the length of the implant is limited, so the number of electrodes that can be accommodated is also limited. Therefore, the number of electrodes should be appropriate. The frequencies corresponding to the electrodes should be determined according to the characteristics of the sound. For frequency bands where the sound frequencies are more concentrated (such as lower than 1000 Hz), the electrodes can be set densely to improve the frequency resolution. For frequency bands where the sound frequencies are not concentrated (such as higher than 1000 Hz), the number of electrodes can be set less, on an approximately logarithmic scale. 
To follow the principle of a limited number of electrodes, the number of electrodes can be selected as 20, for example, with the following specified frequency values (in Hz): 80, 100, 128, 160, 200, 256, 320, 400, 512, 640, 800, 1024, 1280, 1600, 2048, 2560, 3200, 4096, 5120, 6400, 8192. These 21 frequency values define 20 frequency bands, with every two adjacent frequencies defining one band: the first frequency band is 80-100 Hz, the second is 100-128 Hz, . . . , and the 20th is 6400-8192 Hz. These 20 frequency bands correspond to the electrodes in the cochlear implant, and each electrode corresponds to one frequency band. It can be seen from the above values that each octave contains 3 frequencies, which are used to distinguish different frequencies within the same octave. In the present invention, more electrodes improve the frequency discrimination and thereby the final sound quality. For example, the high and low cut-off frequencies can be changed so that up to 25 electrodes are deployed over a smaller total range, achieving finer frequency spacing between electrodes. When the number of electrodes is 25, for example, the corresponding frequencies (in Hz) can be as follows: 50, 64, 75, 90, 105, 128, 150, 180, 210, 256, 300, 360, 420, 512, 600, 720, 840, 1024, 1200, 1440, 1680, 2048, 2400, 2880, 3360, 4096. As in the case of 20 electrodes, each electrode corresponds to a frequency band: the band of the first electrode is 50-64 Hz, the band of the second electrode is 64-75 Hz, . . . , and the band of the twenty-fifth electrode is 3360-4096 Hz. As the number of electrodes increases, a cochlear implant using the speech processing method of the present invention gains progressively higher frequency resolution.
When the number of electrodes increases, the set of instantaneous frequencies can be enlarged accordingly and the resolution of the electrodes to the sound improves, so the sound produced by the electrodes becomes more realistic. With 88 electrodes, for example, it should be possible to fully enjoy piano music, albeit with a less colorful timbre, because the sound of each piano key is highly nonlinear. After the Intrinsic Mode Function components have been paired with the corresponding electrode frequency bands, in step 230 the most energetic components are selected from those bands. At present, no more than 6 electrodes are selected (the number could increase when technology warrants), and the energy values of the selected components must be higher than a preset threshold. The limit exists because crosstalk may occur when multiple electrodes are stimulated at the same time; current experiments show that the mutual influence between electrodes is small when no more than 6 are selected. The threshold exists because speech contains pauses between words, phrases, and sentences. No electrode stimulation is needed during a pause, when the energy of the sound components is low, so the threshold filters out these weak components. The threshold can be selected between 10% and 20% of the average energy of the sound.
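The band assignment of step 220 and the selection of step 230 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names band_index and select_electrodes are hypothetical, the input is assumed to be one (instantaneous frequency, instantaneous energy) pair per IMF at a single time step, and summing the energies of components that fall into the same band is an assumption that the text does not specify. The band edges and the limit of 6 electrodes are taken from the 20-electrode example above.

```python
from bisect import bisect_right

# Band edges in Hz from the 20-electrode example: 21 edges define 20 bands.
BAND_EDGES_HZ = [80, 100, 128, 160, 200, 256, 320, 400, 512, 640, 800,
                 1024, 1280, 1600, 2048, 2560, 3200, 4096, 5120, 6400, 8192]
MAX_ACTIVE = 6  # at most 6 electrodes are selected, as stated in the text


def band_index(freq_hz, edges=BAND_EDGES_HZ):
    """Return the 0-based band containing freq_hz, or None if it is out of range."""
    if freq_hz < edges[0] or freq_hz >= edges[-1]:
        return None
    return bisect_right(edges, freq_hz) - 1


def select_electrodes(components, threshold, max_active=MAX_ACTIVE):
    """Pick the most energetic bands for one time step.

    components: list of (instantaneous_frequency_hz, instantaneous_energy) pairs.
    Returns {band_index: energy} for at most max_active bands above threshold.
    """
    band_energy = {}
    for freq_hz, energy in components:
        idx = band_index(freq_hz)
        if idx is not None:
            # Assumption: energies landing in the same band are summed.
            band_energy[idx] = band_energy.get(idx, 0.0) + energy
    strong = [(e, i) for i, e in band_energy.items() if e > threshold]
    strong.sort(reverse=True)  # most energetic first
    return {i: e for e, i in strong[:max_active]}


if __name__ == "__main__":
    # Hypothetical values for one time step; threshold chosen for illustration only.
    step = [(90, 1.0), (300, 5.0), (310, 2.0), (1000, 0.05)]
    print(select_electrodes(step, threshold=0.1))
```

In practice the threshold would be derived from the signal, for example as 10% to 20% of the average energy of the sound, as described above.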

[0064] In step 300, the corresponding electrode stimulation signals are generated according to the selected components. The following methods can be used to generate the electrode signals: Simultaneous Analog Signal (SAS), Compressive Analysis (CA), or Continuous Interleaved Sampling (CIS). In step 310, automatic gain control is performed to limit the loudness. The automatic gain control is based mainly on the audiogram of the hearing-impaired patient: the audiogram gives the patient's sound perception ability in different frequency ranges, and the stimulation signal of the electrode corresponding to each frequency is adjusted according to these hearing test results. This step is optional and applies only to patients who retain some residual hearing. Then, in step 320, the electrode stimulation signals are transmitted to the corresponding electrodes. Some other methods for generating electrode signals also claim to use selective frequency bands, such as Advanced Combinatorial Encoders (ACE), Dynamic Peak Picking, Spectral Peak (SPEAK), and Current Steering. It should be noted, however, that the effects of these methods are not obvious, because they are implemented on a Fourier filter bank, which is always affected by virtual harmonics. When transmitted to a limited number of electrodes, every electric signal must represent the real neural signal of the sound, but a harmonic signal is not a real sound signal. In hearing aids, the cancellation and combination of harmonics amplifies the fundamental, resulting in a louder, annoying, and yet unclear sound. In cochlear implants, the harmonics are rectified and cannot be eliminated by combination and cancellation, which causes unnecessary noise. The problem therefore becomes worse when the sound is full of harmonics (such as the sound of a musical instrument): the harmonics become intertwined and inseparable, making music appreciation impossible.

[0065] Compared with the cochlear implant speech processing method based on the Fourier principle, the present invention has the advantages of: (1) The frequencies in the present invention are instantaneous frequencies, so it is not limited by the uncertainty principle; while the Fourier transform is an integral transform, and any method based on integral transform cannot obtain instantaneous frequencies; (2) In the HHT based cochlear implant speech processing method of the present invention, no harmonics will be generated, and each electric signal represents the true neural signal of the sound; while for Fourier based cochlear implants, there are some harmonics in the signal, which cannot be eliminated, resulting in a lot of unnecessary noise; (3) In the present invention, a larger number of electrodes can be used to improve the frequency difference, thereby improving the final sound quality; but cochlear implants based on the Fourier principle have harmonics, even if the number of electrodes is increased, the harmonics cannot be eliminated by combination and cancellation, that is, the final sound quality cannot be improved by increasing the number of electrodes; and (4) In the present invention, the sound components can be selectively amplified according to the patient's hearing test results, to preserve the natural cochlear function of some hearing-impaired patients.

[0066] FIG. 2 shows the speech signal data of the Chinese sentence, ‘Zeng xiansheng zao’.

[0067] FIG. 3 is the sound components diagram of the sound signals in FIG. 2 after being filtered by a Fourier bandpass filter bank. FIG. 3 shows the seven band-pass filter frequency bands used in a typical cochlear implant at present, and the result of the Fourier band-pass filter of 8 components will be given. The envelope of these sound components will be the input of the cochlear implant electrodes. FIG. 4 is a detailed enlarged view of the Fourier time-frequency spectrum of the Chinese sentence ‘Zeng xiansheng zao’ in FIG. 2, and it vividly shows the regularity of harmonics. These harmonics are necessary for the representation of nonlinear signal integrity, but they are not truly natural sounds. When they are superimposed, a non-linear distortion waveform will be produced. However, for cochlear implants that use sound signal component envelopes, harmonics will no longer be superimposed to form the fundamentals, rather harmful noise will be generated at the corresponding frequency.

[0068] FIG. 5 is the 8 frequency bands generated by the filter bank of the sound signals in FIG. 2 after EMD decomposition. FIG. 5 seems to be similar to the filtered result of the band-pass filter bank in FIG. 3, but, as discussed above, the filtered result of the band-pass filter bank itself does not represent the sound well. FIG. 6 is the Hilbert time-frequency spectrum of the Chinese sentence “Zeng xiansheng zao” in FIG. 2, covering a frequency range of 0-10000 Hz. Among them, the energy concentration around 300 Hz represents the vibration of the vocal cords, the main energy concentration between 400-1000 Hz represents the resonance of the articulator, and the high-frequency energy between 2000-5000 Hz represents the reflection of the vocal tract. These frequency ranges depend on the person's mouth shape and size and vary from person to person. These frequencies increase the intensity of the sound. It can be seen from FIG. 6 that only few energy values exceed 1000 Hz. More importantly, there are no harmonics in these high-frequency energies, and the time and frequency values are not limited by the uncertainty principle.

[0069] FIG. 7 is the IMF components diagram obtained by Ensemble Empirical Mode Decomposition, in which the noise level is low (1%), and there are only 2 members in the ensemble. Comparing the IMF components in FIG. 7 and FIG. 5 shows a large difference between the two. Ensemble Empirical Mode Decomposition (EEMD) is a noise-assisted data analysis method developed to address the deficiencies of the Empirical Mode Decomposition (EMD) method; it effectively resolves the frequency mixing phenomenon in EMD.

[0070] FIG. 8 is the IMF components diagram obtained by Ensemble Empirical Mode Decomposition, in which the noise level is high (10%), and there are 16 members in the ensemble. Comparing the IMF components in FIG. 8 with FIG. 7 and FIG. 5, it can be seen that the IMF components in FIG. 8 are very different from the IMF components in FIG. 5 or FIG. 7.

[0071] FIG. 9 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIGS. 5 and 6. Because of the near-logarithmic frequency distribution of our ears, the frequency axis in FIG. 9 is in logarithmic scale, which is also true for FIGS. 10 and 11. The frequencies corresponding to the 20 electrodes are: 80, 100, 128, 160, 200, 256, 320, 400, 512, 640, 800, 1024, 1280, 1600, 2048, 2560, 3200, 4096, 5120, 6400, 8192. Comparing the Hilbert time-frequency diagrams in FIG. 9 and FIG. 6, FIG. 9 provides more details than FIG. 6. It is similar in quality to the full-resolution spectrum in FIG. 6 and can contain many fine frequency features of speech.

[0072] FIG. 10 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIG. 7. The frequencies corresponding to the electrodes are the same as in FIG. 9. Again, FIG. 10 provides more details than FIG. 6, and it is similar in quality to the spectrum given in FIG. 9.

[0073] FIG. 11 is the time-frequency diagram of the 20-electrode frequency band simulation of the IMFs given in FIG. 8. The frequencies corresponding to the electrodes are the same as in FIG. 9. Again, FIG. 11 provides more details than FIG. 6, and it is similar in quality to the spectrum given in FIG. 9.

[0074] FIG. 5, FIG. 7 and FIG. 8 respectively decompose the sound signals in FIG. 2 using different mode decomposition methods, and obtain the corresponding IMF components after decomposition by different methods. It can be seen from the figures that the IMF components decomposed by different methods are very different, and the envelopes of the corresponding IMF components are also very different. However, after being converted into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities (squared amplitudes), the time-frequency diagrams are similar, and the electrode stimulation signals of cochlear implant are related to frequencies and energies. Therefore, different decomposition methods will produce basically the same electrode stimulation signals.
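The conversion of a decomposed component into instantaneous frequency, instantaneous amplitude, and instantaneous energy (squared amplitude) can be sketched with a discrete Hilbert transform. This is a minimal illustration that assumes the standard analytic-signal approach; the function names analytic_signal and instantaneous_attributes are hypothetical, and the direct O(n^2) DFT is used only to keep the example self-contained. A practical system would use an FFT, and the invention may compute these quantities differently.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, via a direct O(n^2) DFT."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue
        spectrum[k] *= 2 if k < (n + 1) // 2 else 0
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_attributes(imf, fs):
    """Return (frequency_hz, amplitude, energy) for one IMF sampled at fs Hz.

    The frequency list has one element fewer than the input, because it is
    computed from the phase difference between consecutive samples.
    """
    z = analytic_signal(imf)
    amplitude = [abs(v) for v in z]
    energy = [a * a for a in amplitude]
    # arg(z[t+1] * conj(z[t])) is the phase increment; no explicit unwrapping needed.
    freq = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
            for t in range(len(z) - 1)]
    return freq, amplitude, energy


if __name__ == "__main__":
    fs = 1000.0  # hypothetical sampling rate for the demo
    tone = [math.cos(2 * math.pi * 100.0 * t / fs) for t in range(200)]
    f, a, e = instantaneous_attributes(tone, fs)
    print(round(f[50], 3), round(a[50], 3), round(e[50], 3))
```

The resulting frequency and energy values per time step are what the electrode band assignment operates on.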

EXAMPLE 2

[0075] Furthermore, in order to save time, any method similar or equivalent to EMD can be used in its place. For example, a running mean or running median with successively different window sizes can be applied repeatedly, as needed, as a high-pass or other time-domain filter on the input signals. With the running mean method, there is no guarantee that the resulting signal is a true IMF, which is a requirement for generating accurate and meaningful instantaneous frequencies. However, since spectrum analysis is not used, the approximation is acceptable. Taking the successive running mean as an example, the steps are as follows. First, the data is decomposed by successive running means:

[00003] $\begin{matrix} \begin{aligned} x(t) - \langle x(t) \rangle_{n_1} &= h_1(t), \\ \langle x(t) \rangle_{n_1} - \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} &= h_2(t), \\ \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} - \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} &= h_3(t), \\ &\;\,\vdots \\ \langle \cdots \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{n_{N-1}} - \langle \cdots \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{n_N} &= h_N(t), \\ x(t) &= \sum_{j=1}^{N} h_j(t) + \langle \cdots \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{n_N} \end{aligned} & (3) \end{matrix}$

in which $\langle F \rangle_{n_j}$ represents the running mean with window size $n_j$ (or the running median, reused if necessary). The advantage of using a rectangular filter is that the filter is adaptive and its response function is well known. In addition, repeated use of the rectangular filter changes its response function: applying it twice produces a triangular filter, and applying it more than four times produces a response function close to a Gaussian shape. The key parameter of this filter is the window size. Based on formula (3), if the sampling frequency is 22050 Hz, the rectangular filter and EMD have the following approximate equivalence:

$n_j = 3$ ~ 7,000 Hz (4)

$n_j = 7$ ~ 3,500 Hz

$n_j = 15$ ~ 1,500 Hz

$n_j = 31$ ~ 700 Hz

$n_j = 61$ ~ 350 Hz

$n_j = 121$ ~ 180 Hz

$n_j = 241$ ~ 90 Hz

$n_j = 481$ ~ 45 Hz

There is no need to continue filtering, because in any case we cannot hear sounds with frequencies lower than the next filter step. The disadvantage of using filters is that no filter gives results as clear as the above-mentioned EMD.
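The successive running mean decomposition of formula (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names running_mean and successive_running_mean are hypothetical, the window is centered, and samples near the edges are averaged over the available part of the window, which is an edge-handling choice that the text does not specify. As an observation that the source does not state, the frequencies listed in (4) are close to the sampling frequency divided by the window size (for example, 22050 / 3 = 7350).

```python
import math

FS = 22050                                    # sampling frequency used in the text
WINDOWS = [3, 7, 15, 31, 61, 121, 241, 481]   # window sizes n_j listed in (4)


def running_mean(x, window):
    """Centered moving average; edge samples use only the available neighbors."""
    half = window // 2
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def successive_running_mean(x, windows):
    """Decompose x so that x = sum(components) + residual, as in formula (3)."""
    components = []
    smooth = list(x)
    for w in windows:
        smoother = running_mean(smooth, w)
        # h_j(t) is the detail removed by the j-th smoothing pass.
        components.append([a - b for a, b in zip(smooth, smoother)])
        smooth = smoother
    return components, smooth


if __name__ == "__main__":
    # Hypothetical test signal: a 50 Hz tone plus a weaker 3000 Hz tone.
    x = [math.sin(2 * math.pi * 50 * t / FS) + 0.5 * math.sin(2 * math.pi * 3000 * t / FS)
         for t in range(600)]
    comps, residual = successive_running_mean(x, WINDOWS)
    rebuilt = [sum(c[t] for c in comps) + residual[t] for t in range(len(x))]
    print(max(abs(a - b) for a, b in zip(x, rebuilt)))  # reconstruction error
    print([round(FS / w) for w in WINDOWS])             # compare with the values in (4)
```

Because each component is the difference between two consecutive smoothing stages, the sum telescopes and the original signal is recovered up to floating-point rounding.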

[0076] Selective amplification or attenuation can be realized on the basis of formula (3), and the reconstructed signal y(t) is obtained as:

[00004] $\begin{matrix} y(t) = \sum_{j=1}^{N} a_j \times h_j(t) + \langle \cdots \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{n_N} & (5) \end{matrix}$

in which the value of $a_j$ can be determined according to the patient's audiogram.
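The weighted reconstruction of formula (5) can be sketched as follows. This is a minimal illustration: the function name reconstruct is hypothetical, and the gain values in the usage example are made-up numbers, not values derived from any real audiogram.

```python
def reconstruct(components, residual, gains):
    """Compute y(t) = sum_j a_j * h_j(t) + residual(t), as in formula (5).

    components: list of equal-length sequences h_j(t)
    residual:   the final smoothed signal (last term of formula (3))
    gains:      one gain a_j per component
    """
    if len(gains) != len(components):
        raise ValueError("one gain per component is required")
    y = list(residual)
    for a_j, h_j in zip(gains, components):
        for t, value in enumerate(h_j):
            y[t] += a_j * value
    return y


if __name__ == "__main__":
    # Hypothetical components and gains for illustration only.
    h = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]
    r = [0.1, 0.1, 0.1]
    print(reconstruct(h, r, [1.0, 1.0]))   # unit gains return the original signal
    print(reconstruct(h, r, [2.0, 0.5]))   # boost the first band, attenuate the second
```

With all gains equal to 1, formula (5) reduces to the last line of formula (3), so the input signal is returned unchanged.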

[0077] EMD is more time-consuming than the filter method; even so, its computational complexity is still comparable to that of the Fourier transform. If the filter method is used, the sound may not be particularly clear, because the mean filter spreads the filtered result over a wider span in the time domain. The final result will not be as clear as with the complete EMD method, but the filter method can be simpler and cheaper to implement.

[0078] FIG. 12 shows a cochlear implant speech processing system according to an embodiment of the present invention. The speech processing system includes a sound receiving module 10, a sound processing module 20 and a signal transmission module 30. The sound receiving module 10 is configured to receive a sound signal and convert it into a digital signal. The sound processing module 20 is configured to perform the following operations: reducing the noise of the received digital sound signal; decomposing the sound signal and converting the decomposed signal components into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities; mapping the instantaneous frequencies to the electrode frequency bands; selecting the several most energetic frequency bands; and generating electrode stimulation signals corresponding to the frequency bands with the highest energy intensity. The principles and detailed steps involved in the key parts of the sound processing module are the same as those described for the cochlear implant speech processing method. After the sound processing module 20 receives the digital sound signal, a noise reduction unit performs noise suppression on the sound signal and addresses the cocktail party problem. Then, a sound processing unit processes the sound signal through an adaptive filter bank to obtain a plurality of intrinsic mode functions or pseudo-intrinsic mode functions. The adaptive filter bank can be a mode decomposition filter bank or a mean filter bank. The mode decomposition filter bank adopts any method in the present invention that can obtain IMF components, such as Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), or Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD).
In addition to the above-mentioned empirical mode decomposition methods and the improved signal decomposition methods based on them, an adaptive filter bank such as a mean filter bank can also be used to obtain pseudo-IMFs. The IMFs or pseudo-IMFs obtained by the adaptive filter bank are converted into instantaneous frequencies and instantaneous amplitudes or instantaneous energy intensities. The instantaneous frequencies are mapped to the electrode frequency bands defined by the preset frequency values, and at most 6 of the most energetic components, whose energies are greater than the preset threshold value, are selected from the corresponding electrode frequency bands. Then, the corresponding electrode stimulation signals are generated according to the selected components, and the loudness of each signal component is controlled through automatic gain control. When performing automatic gain control on the signal, the amplification of each frequency component can be controlled according to the patient's audiogram, which can preserve the patient's natural cochlear function. The signal transmission module 30 is configured to transmit the electrode stimulation signals generated by the sound processing module to the electrodes in the cochlear implant, so that the electrodes can correctly generate stimulation signals corresponding to the sound in real time.

[0079] The above are only the preferred embodiments of the present invention, and do not limit the present invention in any form. Although the present invention has been disclosed as above in preferred embodiments, it is not intended to limit the present invention. Anyone who is familiar with the field, without departing from the scope of the technical solution of the present invention, can use the technical content disclosed above to make slight changes or modifications into equivalent embodiments with equivalent changes. Any simple modifications, equivalent changes and variations made to the above embodiments based on the technical essence of the present invention without departing from the technical solution of the present invention still fall within the scope of the technical solution of the present invention.

CPC classification: A61N1/36038 (Human Necessities); G10L25/51 (Physics); A61N1/0541 (Human Necessities); G10L25/48 (Physics); G10L21/16 (Physics); G10L2021/02087 (Physics); G10L21/0208 (Physics)

International classification: G10L21/0208 (Physics); A61N1/05 (Human Necessities); G10L25/51 (Physics)
