Method and apparatus for sound enhancement

Abstract

A method and apparatus for sound enhancement are provided. The method comprises: obtaining sound signals and converting the sound signals into digital signals; decomposing the digital signals to obtain a plurality of IMFs or pseudo-IMFs; selectively amplifying the amplitudes of the IMFs or pseudo-IMFs; reconstituting the selectively amplified IMFs or pseudo-IMFs to obtain reconstituted signals; and converting the reconstituted signals into analog signals. The present invention is based on the Hilbert-Huang transform. With the present invention, sound can be amplified selectively: only the high-frequency consonants in the sound are amplified, not the vowels, which effectively improves the clarity of the enhanced sound. The present invention overcomes the problem of current sound enhancement methods, which make the sound louder without increasing its clarity.

Claims

1. A sound enhancement method comprising: (1) obtaining sound signals and converting the sound signals into digital signals; (2) decomposing the digital signals by a mode decomposition method to obtain a plurality of Intrinsic Mode Function components (IMFs), wherein the IMFs represent amplitude changes of the digital signals converted from the sound signals at different frequencies over time; (3) selectively amplifying the amplitudes of the IMFs obtained in step (2); (4) reconstituting the selectively amplified IMFs to obtain reconstituted signals; (5) converting the reconstituted signals into analog signals.

2. The sound enhancement method of claim 1, wherein the mode decomposition method includes Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD).

3. The sound enhancement method of claim 1, wherein when the amplitudes of the IMFs are amplified in step (3), the amplification frequency band and the amplification factors are determined according to the hearing-impaired patient's audiogram.

4. The sound enhancement method of claim 1, wherein when the amplitudes of the IMFs are amplified in step (3), the IMFs in the frequency band of the consonants are amplified.

5. A sound enhancement method comprising: (1) obtaining sound signals and converting the sound signals into digital signals; (2) decomposing the digital signals by an adaptive filter bank to obtain a plurality of pseudo-Intrinsic Mode Function components (pseudo-IMFs), wherein the pseudo-IMFs represent the amplitude changes of the digital signals converted from the sound signals at different frequencies over time; (3) selectively amplifying the amplitudes of the pseudo-IMFs obtained in step (2); (4) reconstituting the selectively amplified pseudo-IMFs to obtain reconstituted signals; (5) converting the reconstituted signals into analog signals.

6. The sound enhancement method of claim 5, wherein the adaptive filter bank is a mean filter bank.

7. The sound enhancement method of claim 5, wherein when the amplitudes of the pseudo-IMFs are amplified in step (3), the amplification frequency band and the amplification factors are determined according to the hearing-impaired patient's audiogram.

8. The sound enhancement method of claim 5, wherein when the amplitudes of the pseudo-IMFs are amplified in step (3), the pseudo-IMFs in the frequency band of the consonants are amplified.

9. The sound enhancement method of claim 1, wherein the sound enhancement method can be applied to a hearing aid, a telephone and a conference call broadcast.

10. The sound enhancement method of claim 5, wherein the sound enhancement method can be applied to a hearing aid, a telephone and a conference call broadcast.

11. A sound enhancement apparatus comprising: a sound receiving module, a sound enhancement module and a sound playback module; wherein the sound receiving module is used to receive sound signals and convert the sound signals into digital signals; the sound enhancement module is used to process the digital signals to obtain a plurality of Intrinsic Mode Function components (IMFs) or pseudo-IMFs, selectively amplify the amplitudes of the obtained IMFs or pseudo-IMFs, reconstitute the selectively amplified IMFs or pseudo-IMFs to obtain reconstituted signals, and convert the reconstituted signals into analog signals to obtain enhanced sound signals; the sound playback module is used to play the enhanced sound signals.

12. The sound enhancement apparatus of claim 11, wherein the sound enhancement module includes an adaptive filter bank, an enhancement unit and a reconstituting unit; wherein the adaptive filter bank is used to decompose the digital signals to obtain the IMFs or pseudo-IMFs; the enhancement unit is used to selectively amplify the amplitudes of the IMFs or the pseudo-IMFs; the reconstituting unit is used to reconstitute the amplified IMFs or pseudo-IMFs to obtain the enhanced sound signals.

13. The sound enhancement apparatus of claim 12, wherein the sound enhancement module further includes a tuning unit of gain values, which is used to determine the amplification factors of the sound signal amplitudes needed by a hearing-impaired patient in different frequency bands according to the patient's audiogram, or determine the amplification factors according to the frequency band of the consonants; and then the enhancement unit amplifies the amplitudes of the IMFs or pseudo-IMFs according to the tuning unit of gain values.

14. The sound enhancement apparatus of claim 12, wherein the adaptive filter bank includes a mode decomposition filter bank and a mean filter bank.

15. The sound enhancement apparatus of claim 11, wherein the sound enhancement apparatus can be applied to a hearing aid, a telephone and a conference call broadcast.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a flowchart of the process from sound generation to enhancement to playback in the present invention.

(2) FIG. 2 is the wave forms and Fourier spectra from the sounds of low A, medium A and high A from a piano. The low A sound is used as a demonstration. Notice the numerous harmonics from the non-sinusoidal wave forms.

(3) FIG. 3 is the Fourier spectrograms for the low A sound. FIG. 3a is the Fourier spectrogram for the low A sound with the fundamental (at 220 Hz) and FIG. 3b is the Fourier spectrogram for the low A sound without the fundamental.

(4) FIG. 4 is the Morlet wavelet spectrograms for the low A sound. FIG. 4a is the Morlet wavelet spectrogram for the low A sound with the fundamental (at 220 Hz) and FIG. 4b is the Morlet wavelet spectrogram for the low A sound without the fundamental.

(5) FIG. 5 is the Hilbert Time-frequency spectrum for the low A sound. FIG. 5a is the Hilbert Time-frequency spectrum for the low A sound with the fundamental (at 220 Hz) and FIG. 5b is the Hilbert Time-frequency spectrum for the low A sound without the fundamental.

(6) FIG. 6 is the Hilbert Holo-spectrum of the low A sound with the fundamental (at 220 Hz).

(7) FIG. 7 is the Hilbert Holo-spectrum of the low A sound without the fundamental (at 220 Hz).

(8) FIG. 8 is the marginal spectra from FIGS. 6 and 7.

(9) FIG. 9 is the data from the sound of ‘zi’; in Chinese Roman phonetic, ‘z’ is an unvoiced consonant, followed by the vowel ‘i’.

(10) FIG. 10 is a diagram of the IMF components of the sound data given in FIG. 9.

(11) FIG. 11 is the Fourier spectrogram of the sound ‘zi’ with the sound signal superimposed.

(12) FIG. 12 is the Hilbert spectrum of the sound ‘zi’ with the sound signal superimposed.

(13) FIG. 13 is a comparison of the reconstituted signals of the sound ‘zi’ after amplification or reduction of the high-frequency part.

(14) FIG. 14 is the data from the sound of ‘hello’. Both ‘h’ and ‘lo’ are audible sounds.

(15) FIG. 15 is a diagram of the IMF components of the sound data given in FIG. 14.

(16) FIG. 16 is the Hilbert spectrum of the sound ‘hello’.

(17) FIG. 17 is the Fourier spectrogram of the sound ‘hello’.

(18) FIG. 18a is the comparison between the first IMF and the mean filtered components.

(19) FIG. 18b is a detailed comparison of the differences in the main parts of the signal.

(20) FIG. 19 is a block diagram of an application scenario of a sound enhancement based adaptive algorithm, which is based on the decomposition and selective amplification of signals of communication devices (such as telephones and conference calls).

DETAILED DESCRIPTION OF THE INVENTION

(21) In the following, with the reference to the accompanying drawings and the preferred embodiments of the present invention, the technical means adopted by the present invention to achieve the intended purpose of the present invention will be further explained.

(22) As shown in FIG. 1, a sound enhancement method is disclosed in an embodiment of the present invention. In step 100, a sound signal from a sound source is received. The incoming sound is digitized at a certain sampling rate (step 110). To reduce processing cost, and depending on the need, the sampling rate could be reduced to 10,000 or even 6,000 Hz. Of course, for extra high fidelity, 22 kHz or the full 44 kHz sampling rate is also possible. The signal could be cleansed by an EMD or median filter to remove spiky noise (step 120). The signal is then decomposed by EMD (step 130) or by a successive running mean filter (step 140) to obtain the IMFs or pseudo-IMFs of the sound signal. The mode decomposition method refers to any mode decomposition method that can obtain the Intrinsic Mode Function components (IMFs) of the signal, including Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD) and Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD). Still further, EMD could be used together with improved signal decomposition methods based on it, such as the successive running mean filter, to obtain the pseudo-IMFs. The obtained IMFs or pseudo-IMFs represent amplitude changes of the sound data at different frequency scales over time. We can selectively amplify the high-frequency components depending on the hearing-impaired patient's condition (step 150) and reconstitute the signal (step 160). It should be noted that a flattening filter might be required here (step 161), because too large an amplification factor could cause clipping of the reconstituted signal and make the reconstituted sound rough. In step 170, the digital signal is converted into an analog signal (i.e., a sound signal), and the sound is played back to the hearing-impaired patient through a speaker (step 180).
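The spike-removal step (step 120) can be illustrated with a running median. The following is only a minimal sketch in Python, not the implementation of the invention: the function name running_median, the window length and the shrinking-window treatment of the edges are all assumptions made for illustration.

```python
from statistics import median

def running_median(samples, window=5):
    """Replace each sample by the median of a centred window of odd length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(samples)
    out = []
    for i in range(n):
        lo = max(0, i - half)          # assumed edge rule: the window shrinks
        hi = min(n, i + half + 1)
        out.append(median(samples[lo:hi]))
    return out

# A single-sample spike (9.0) sits on a slow ramp; the median discards it.
noisy = [0.0, 0.1, 0.2, 9.0, 0.4, 0.5, 0.6]
clean = running_median(noisy, window=3)
```

A median is used rather than a mean because one outlying sample cannot drag the median of its neighbourhood, so isolated spikes vanish while slower variations are largely preserved.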

(23) In order to better explain the sound enhancement method of the present invention, we first take the mode decomposition method as an example. First, a sound signal from a sound source is received (step 100), and the sound signal is digitized (step 110). To save time, the incoming sound is digitized at 22 kHz. The sampling rate is chosen based on the following considerations. In speech, vowels and voiced consonants are dominated by the vocal cord vibration frequency, which forms the so-called fundamental, F0. F0 ranges from about 80 Hz for a deep male voice to about 400 Hz for a child. While speech can contain spectral information up to 10 kHz, the Fourier spectral information necessary for distinguishing different consonants and vowels resides largely below 3,000 to 5,000 Hz, because many spectra consist mostly of harmonics, which can have much higher frequencies than the actual sound signals. In terms of the Hilbert spectral representation, which is free of artificial harmonics, the instantaneous frequency of many sound signals rarely exceeds 1,000 Hz (to be discussed in detail later). Therefore, a sampling rate of 22 kHz is sufficient. To further reduce processing cost, the sampling rate could be reduced to 10,000 or even 6,000 Hz. Of course, for extra high fidelity, the full 44 kHz sampling rate is also possible.

(24) This signal could be cleansed by an EMD or median filter to remove spiky noise (step 120). Then the signal is decomposed by EMD (step 130) to obtain the IMFs,

(25) $\begin{matrix} x(t) = \sum_{j=1}^{N} c_{j}(t) + r_{N}(t) & (1) \end{matrix}$
where $x(t)$ is the original signal, $c_j(t)$ are the Intrinsic Mode Function (IMF) components and $r_N(t)$ is the residual. The IMFs are mutually orthogonal, and the components are ranked dyadically in time scale. The first IMF component typically consists of 3-point oscillations. Because EMD behaves almost like a dyadic filter bank, in which the mean period doubles from one component to the next, by the time we reach the 5th IMF component the oscillations have a mean wavelength of the order of 48 points. For data sampled at 22 kHz, this component already corresponds to a frequency of about 450 Hz. We should stop long before this point, depending on the patient's condition. For example, for a signal digitized at 22 kHz, the mean frequencies of the first 5 components will be
$c_1(t)$: 3 points, ~7,000 Hz
$c_2(t)$: 6 points, ~3,500 Hz
$c_3(t)$: 12 points, ~1,800 Hz
$c_4(t)$: 24 points, ~900 Hz
$c_5(t)$: 48 points, ~450 Hz (2)
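The values in list (2) follow from dividing the sampling rate by the mean period in points, with the period doubling from one component to the next. A short sketch of that arithmetic is given here; the function name imf_mean_frequencies and its parameters are illustrative assumptions, and the idealised doubling is only the approximation described in the text.

```python
def imf_mean_frequencies(sampling_rate_hz, n_components=5, first_period_points=3):
    """Return (index j, mean period in points, mean frequency in Hz) per IMF.

    Assumes the idealised dyadic ranking: a 3-point period for the first
    component, doubling for each later component.
    """
    rows = []
    for j in range(1, n_components + 1):
        period_points = first_period_points * 2 ** (j - 1)   # 3, 6, 12, 24, 48
        rows.append((j, period_points, sampling_rate_hz / period_points))
    return rows

for j, points, freq in imf_mean_frequencies(22_000):
    print(f"c_{j}(t): {points:>2} points, about {freq:,.0f} Hz")
```

At 22 kHz this gives roughly 7,333, 3,667, 1,833, 917 and 458 Hz, which the list above rounds to 7,000, 3,500, 1,800, 900 and 450 Hz. Calling the same function with 44,000 shows the first component moving to about 15 kHz.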

(26) We can selectively amplify the high-frequency components depending on the patient's condition irrespective of the underlying frequency values (step 150) and reconstitute the signal as y(t) (step 160):

(27) $\begin{matrix} y(t) = \sum_{j=1}^{4} a_{j} \times c_{j}(t) + \sum_{j=5}^{N} c_{j}(t) + r_{N}(t) & (3) \end{matrix}$

(28) Since $r_N(t)$ represents the trend of the sound, its frequency is too low to be heard. We therefore ignore the residual, and the reconstituted signal $y(t)$ can be expressed as:

(29) $\begin{matrix} y(t) = \sum_{j=1}^{4} a_{j} \times c_{j}(t) + \sum_{j=5}^{N} c_{j}(t) & (4) \end{matrix}$

(30) wherein $a_j$ is the amplification factor, with each value determined individually from the patient's audiogram test data to fit the individual patient. Alternatively, the values of $a_j$ can be set according to the frequency band of the consonants. Most of the amplification should be placed selectively on the high-frequency components, because those components actually represent the consonants that add clarity to the sound. As most hearing-impaired patients should still be able to hear sounds up to around 500 Hz, amplification of the first 4 components should be sufficient for all practical purposes if the sound is digitized at 22 kHz. The reconstituted signal $y(t)$ could be converted back to analog form (step 170) and played back to the listener. It should be noted that a flattening filter might be required here (step 161), because too large an amplification factor could cause clipping of the signal and make the reconstituted sound rough.
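The weighted sum of Equation (4) and the optional flattening step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the IMFs are taken as already computed, the names reconstitute and soft_limit are hypothetical, and the tanh compressor is only one possible choice of "flattening filter", which the text does not specify.

```python
import math

def reconstitute(imfs, gains):
    """Equation (4): y(t) = sum_j a_j * c_j(t); the residual r_N(t) is dropped.

    imfs  -- list of IMFs, each a list of samples of equal length
    gains -- amplification factors a_j for the leading IMFs; any IMF
             beyond len(gains) is passed through with a factor of 1
    """
    n_samples = len(imfs[0])
    y = [0.0] * n_samples
    for j, imf in enumerate(imfs):
        a_j = gains[j] if j < len(gains) else 1.0
        for t in range(n_samples):
            y[t] += a_j * imf[t]
    return y

def soft_limit(signal, ceiling=1.0):
    """Assumed flattening filter: tanh compression keeping |y| below ceiling."""
    return [ceiling * math.tanh(v / ceiling) for v in signal]

# Toy data: a fast component standing in for a consonant-band IMF and a
# slow component standing in for a vowel-band IMF.
fast = [0.2 * math.sin(2 * math.pi * 0.25 * t) for t in range(64)]
slow = [0.5 * math.sin(2 * math.pi * 0.02 * t) for t in range(64)]
y = soft_limit(reconstitute([fast, slow], gains=[4.0]))
```

A smooth compressor is used here instead of hard clipping because abrupt truncation of the peaks would itself add the kind of roughness the flattening step is meant to avoid; the price is a slight compression of samples near the ceiling.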

(31) For a higher degree of fidelity, the sampling rate could be set at 44 kHz. In that case, the first IMF component will be at about 15 kHz, and it might be left out to suppress ambient noise. At any rate, we only have to amplify the first 5 IMF components, since the sixth component is already at about 450 Hz.

(32) In order to illustrate the advantages of the sound enhancement method of the present invention, FIGS. 2 to 8 compare the Fourier spectra, Morlet wavelet spectra and Hilbert time-frequency spectra. By comparing the spectrograms from the different methods, we first demonstrate the details of the hearing mechanism using the example of the missing fundamental, which serves to illustrate the failure of the current harmonic amplification approach.

(33) Let us examine the sound of low A from a piano (a percussion instrument). The waveform data of the low A, middle A and high A from the piano are given in FIG. 2 along with their corresponding Fourier spectra. Notice the distorted, non-sinusoidal wave forms in the signals in the left panels. The distorted wave forms generate harmonics, as shown in the accompanying Fourier spectra in the right panels. We will use the low A sound as our example here. FIGS. 3a and 4a show the Fourier spectrogram and the Morlet wavelet spectrum, respectively, with the fundamental present. The fundamental can be removed by a notch filter, but the filtered signal is still perceived as the fundamental sound even after the fundamental has been removed. The Fourier spectrogram and wavelet spectrum in FIGS. 3b and 4b indeed both show the case without the fundamental. Compared with FIG. 3a, the fundamental is absent in FIG. 3b, but after each of the two is converted into a sound signal, the two sound the same. The same is true for FIG. 4a and FIG. 4b. Thus, we have the puzzle of the missing fundamental. If we switch to the adaptive HHT analysis, FIGS. 5a and 5b show the Hilbert spectra with and without the fundamental, respectively. The Hilbert spectrum in FIG. 5b still shows a faint fundamental after the fundamental has been removed, but this weak energy density cannot explain why the listener can hear the sound. It has long been recognized that the perceived sound actually comes from the periodicity of the envelope. Unfortunately, there is no traditional tool to determine the frequency content of the envelope rigorously and objectively. As a result, the perceived sound is currently defined solely by subjective ‘pitch’.
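The periodicity argument can be checked numerically on a synthetic tone. The sketch below is not the Holo-spectral analysis of the invention; it uses a plain autocorrelation as a stand-in, and the sampling rate of 8,800 Hz, the choice of harmonics and the function name dominant_period are assumptions made so that one 220 Hz cycle spans exactly 40 samples.

```python
import math

def dominant_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation."""
    span = len(samples) - max_lag            # same number of terms for every lag
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[n] * samples[n + lag] for n in range(span))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag

FS = 8_800                                   # assumed rate: 40 samples per 220 Hz cycle
harmonics_hz = [440, 660, 880]               # 2nd to 4th harmonics; 220 Hz itself is absent
signal = [sum(math.sin(2 * math.pi * f * n / FS) for f in harmonics_hz)
          for n in range(860)]

lag = dominant_period(signal, min_lag=5, max_lag=60)
perceived_hz = FS / lag
```

Although the signal contains no energy at 220 Hz, the waveform repeats every 40 samples, so the strongest periodicity found is 220 Hz. This matches the observation that removing the fundamental does not change what the listener hears.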

(34) Recently, Huang et al. introduced Hilbert Holo-spectral analysis. More specifically, Huang and Yeh introduced a whole set of tools to analyze acoustic signals pertaining to hearing. If we use the newly developed Holo-spectral representation, the spectra with and without the fundamental in the sound are given in FIGS. 6 and 7, respectively. FIG. 6 shows the Holo-spectrum of the low A sound with the fundamental. Notice a strong modulating AM frequency around 220 Hz covering almost the whole FM frequency range. There is also a strong FM around 220 Hz. FIG. 7 shows the Holo-spectrum of the low A sound without the fundamental. Notice that the strong modulating AM frequency around 220 Hz, covering almost the whole FM frequency range, still remains. The strong FM around 220 Hz is now missing, which indicates the missing fundamental in the filtered data. We further compute the marginal Holo-spectra from both FIGS. 6 and 7, and the result is given in FIG. 8. The AM energy densities are dominant in both cases, even with the fundamental missing. The dominance of the modulation frequency is clearly shown for the cases with and without the fundamental, even though in the FM projection the fundamental is missing in the filtered data. The dominant frequency of either FM or AM is the perceived sound. Thus, we have demonstrated the prowess of HHT in acoustic signal analysis, and have shown that, in the missing fundamental phenomenon, the sum of the harmonics effectively amplifies the fundamental.

(35) However, for speech analysis, the full 4-dimensional time-dependent Hilbert Holo-spectral representation is too complicated and unwieldy. The simplified time-dependent instantaneous-frequency-based Hilbert spectral analysis and the AM time-frequency Hilbert spectral analysis would be sufficient for the present invention. But even that is still too time consuming. The present invention is therefore based on temporal operations only.

(36) The actual implementation is further demonstrated in the following example of an unvoiced sound, ‘zi’, pronounced according to the Chinese Roman phonetic system. The data is given in FIG. 9, which shows the sound of ‘zi’, wherein ‘z’ is an unvoiced consonant followed by the vowel ‘i’. It should be noted that the Chinese language contains some of the highest-frequency unvoiced sounds (such as z, c, s and j, q, x), which pose a special challenge for hearing aid design, as in this example.

(37) The data is decomposed by EMD. The result is given in FIG. 10, which is a diagram of the IMF components of the data given in FIG. 9. Notice the high frequency components in the first few IMFs mostly stand for the sound of ‘z’, especially IMF 1 and 2. The block area shows the time period covered by the data given in FIG. 9.

(38) FIG. 11 is the Fourier spectrogram of the sound ‘zi’ with the signal superimposed. In the first 0.15 seconds, the sound is ‘z’, which is of very high frequency, starting from near 8,000 Hz and almost reaching 20,000 Hz. The vowel part starts later and is full of harmonics. There are dense fine harmonics within the first 2,000 Hz. There are also other high energy density zones at around 4,000 to 5,000 Hz and 8,000 to 10,000 Hz. Given the drawbacks of Fourier analysis when applied to nonlinear and nonstationary data, we compare these results with the HHT-based Hilbert spectral analysis shown in FIG. 12.

(39) Here, the same high-frequency energy density for the ‘z’ sound remains; however, the harmonics of the vowel at 8,000 Hz are absent. The energy at 4,000 Hz is not a harmonic of any sound, but the reflection of the voice in the vocal tract. The absence of any harmonics in the high-frequency range leaves only the consonants, which gives us a unique opportunity to amplify the consonants without altering the sound of the vowel part. This is the key technology of this invention. We can amplify the first few IMFs without influencing the vowels (step 150) according to the formula given in Equation (4). This is especially true for IMF 1 and 2.

(40) FIG. 13 is a comparison of the reconstituted signals (step 160) after amplification or reduction. The amplified signals (H1z and H2z) represent different amplification factors for the high-frequency IMFs, which illustrate the individualized selective amplification effects of the inventive hearing aid for different patients. Compared with the original signal, we can see that the amplification selectively amplifies only the consonant part and leaves the vowel part unchanged.

(41) The reduced signals (L1z and L2z) simulate hearing loss of various degrees. For presbycusis patients, the loss is usually only in the consonant part, not the vowel part. Hearing aids with a self-compensation mechanism currently on the market make the sound louder but lacking in clarity. Importantly, selectively amplifying harmonics in the range of 1,000 Hz to 4,000 Hz effectively amplifies the fundamentals without involving the consonant part. It is equivalent to amplifying L1z or L2z: the sound becomes louder but the clarity is not improved. The reconstituted signals could be converted back to analog form (step 170) for playback through the hearing aid amplifier or speaker (step 180). For cases of congenital hearing loss, the amplification might be more important, depending on the individual patient.

(42) It should be pointed out that the principle in hearing aid design is ‘selective amplification’ of the sound. The Fourier approach of amplifying the range around 2,000 to 4,000 Hz effectively amplifies harmonics, which is tantamount to amplifying the fundamentals based on the missing fundamental phenomenon. But the fundamentals do not need amplification at all. Unfortunately, some consonants do not have harmonics, nor any tangible signals in and around 2,000 to 4,000 Hz range. The combined effects in Fourier approach actually amplify the audible vowels, equivalent to amplifying the signal L1z or L2z in FIG. 13. The patients would not gain any clarity benefit but only loudness, exactly the common complaint of the users of the current Fourier based hearing aids.

(43) Alternative Implementations

(44) Still further, to save time, the EMD could be substituted by anything equivalent to it, such as repeated applications of successive running means, running medians, a separate group of band-pass filters, any filter that can separate the signal into high and low parts, high-pass filters with various window sizes chosen according to the input signals, or other time-domain filters. The steps are as follows. First, decompose the data by successive running means (or running medians):

(45) $\begin{matrix} \begin{aligned} x(t) - \langle x(t) \rangle_{n_1} &= h_{1}(t), \\ \langle x(t) \rangle_{n_1} - \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} &= h_{2}(t), \\ \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} - \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} &= h_{3}(t), \\ &\;\; \vdots \\ \langle \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{N-1} - \langle \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{N} &= h_{N}(t), \\ x(t) &= \sum_{j=1}^{N} h_{j}(t) + \langle \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{N} \end{aligned} & (5) \end{matrix}$

(46) in which $\langle x(t) \rangle_{n_j}$ indicates the running mean filter of window size $n_j$, which has to be an odd number. Each $h_j(t)$ is one of the pseudo-IMFs produced by the running filters. Furthermore, repeated applications of the boxcar change the filter response function markedly. For example, two repetitions give a triangular response; four or more repetitions give an almost Gaussian-shaped response. The key parameter in using such a filter is the window size. Based on the discussion of list (2), at a 22 kHz sampling rate the following equivalence between the boxcar filter and EMD should exist:
$n_j$ = 3: ~7,000 Hz
$n_j$ = 7: ~3,500 Hz
$n_j$ = 15: ~1,500 Hz
$n_j$ = 31: ~700 Hz
$n_j$ = 61: ~350 Hz (6)
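The successive running-mean decomposition of Equation (5) can be sketched in a few lines. This is an illustrative sketch rather than the patented implementation: the function names running_mean and decompose are hypothetical, the window sizes are those of list (6), and the symmetric shrinking of the window near the ends of the record is an assumed edge rule that the text does not specify.

```python
import math

def running_mean(samples, window):
    """Centred boxcar mean; the window shrinks symmetrically near the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(samples)
    out = []
    for i in range(n):
        reach = min(half, i, n - 1 - i)       # keep the window centred on i
        segment = samples[i - reach:i + reach + 1]
        out.append(sum(segment) / len(segment))
    return out

def decompose(samples, windows=(3, 7, 15, 31, 61)):
    """Equation (5): h_j = (previous smooth) - (smooth with window n_j)."""
    pseudo_imfs = []
    smooth = list(samples)
    for window in windows:
        smoother = running_mean(smooth, window)
        pseudo_imfs.append([a - b for a, b in zip(smooth, smoother)])
        smooth = smoother
    return pseudo_imfs, smooth                # smooth is the final residual

# Completeness check: the pseudo-IMFs plus the residual rebuild the input.
x = [math.sin(0.9 * n) + math.sin(0.05 * n) for n in range(200)]
h, residual = decompose(x)
rebuilt = [sum(column) + r for column, r in zip(zip(*h), residual)]
```

Because each pseudo-IMF is defined as a difference of successive smoothings, the sum telescopes and the reconstruction is exact up to rounding, whatever window sizes are chosen.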

(47) The disadvantage of the filter approach is that none of these filters is as sharp as EMD, a point we will return to later. The filter, however, could be used as a cheaper substitute for EMD.
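The effect of repeating the boxcar on the filter response can be seen by computing the effective kernel directly. The sketch below is illustrative only; the helper names convolve and repeated_boxcar are assumptions, and a window of 3 is used simply to keep the numbers small.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for k, bv in enumerate(b):
            out[i + k] += av * bv
    return out

def repeated_boxcar(window, repetitions):
    """Effective kernel after applying a boxcar mean `repetitions` times."""
    box = [1.0 / window] * window
    kernel = box
    for _ in range(repetitions - 1):
        kernel = convolve(kernel, box)
    return kernel

once = repeated_boxcar(3, 1)     # flat: 1/3, 1/3, 1/3
twice = repeated_boxcar(3, 2)    # triangular: 1/9, 2/9, 3/9, 2/9, 1/9
four = repeated_boxcar(3, 4)     # bell-shaped, 9 taps
```

Each repetition widens the kernel and smooths its shape, which is consistent with the triangular and near-Gaussian responses described for the boxcar. The widening in time is also one way to picture why the filtered components are less sharp than IMFs obtained by EMD.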

(48) The selective amplification could be implemented as in Equation (4), and the reconstituted signal y(t) obtained as

(49) $\begin{matrix} y(t) = \sum_{j=1}^{N} a_{j} \times h_{j}(t) + \langle \langle \langle \langle x(t) \rangle_{n_1} \rangle_{n_2} \rangle_{n_3} \cdots \rangle_{N} & (7) \end{matrix}$

(50) in which the values of $a_j$ could be assigned according to the patient's audiogram test, just as in Equation (4).
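Equation (7) differs from Equation (4) in that the final running-mean output is kept and added back unamplified. A minimal sketch follows; the function name reconstitute_pseudo and the three-sample toy data are hypothetical and chosen only to make the arithmetic visible.

```python
def reconstitute_pseudo(pseudo_imfs, residual, gains):
    """Equation (7): y(t) = sum_j a_j * h_j(t) + final running-mean residual."""
    if len(gains) != len(pseudo_imfs):
        raise ValueError("one amplification factor a_j is needed per pseudo-IMF")
    y = list(residual)
    for a_j, h_j in zip(gains, pseudo_imfs):
        for t, value in enumerate(h_j):
            y[t] += a_j * value
    return y

# Hypothetical pseudo-IMFs for a three-sample record.
h1 = [0.5, -0.5, 0.5]      # fastest band (smallest window)
h2 = [0.1, 0.0, -0.1]      # next band
smooth = [1.0, 1.0, 1.0]   # what is left after the last running mean

unchanged = reconstitute_pseudo([h1, h2], smooth, gains=[1.0, 1.0])
boosted = reconstitute_pseudo([h1, h2], smooth, gains=[3.0, 1.0])
```

With all factors equal to 1 the original record is recovered, so any change in the output comes only from the bands whose factor differs from 1.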

(51) In order to explain in detail the application of the alternatives to EMD, and to compare these alternatives with EMD, we use the sound data ‘hello’ as an example. The digitized data of the sound ‘hello’ is given in FIG. 14; both ‘h’ and ‘lo’ are audible sounds. The EMD decomposition is given in FIG. 15. The most energetic IMF component is IMF3. There are two high-frequency IMFs, IMF1 and IMF2. The Hilbert spectrum of the data is given in FIG. 16. The energy density along the 200 Hz line represents the vibration of the vocal cords; the main energy density between 400 and nearly 1,000 Hz represents the resonance of the articulators. The high-frequency energy between 2,000 and 3,000 Hz is the reflection from the vocal tract. It differs from person to person, depending on the physical size and shape of the speaker's vocal tract. For example, the reflection signal in FIG. 12 is much higher, around 4,000 Hz, which indicates that the speaker is of smaller physical stature. These high-frequency components add to the timbre of the sound. Furthermore, it is noted that very little energy lies above 1,000 Hz.

(52) FIG. 17 is the Fourier spectrogram of the sound ‘hello’. It can be seen from the figure that it covers all harmonics in all frequency ranges. Based on the ‘missing fundamental’ phenomenon discussed above, amplification of the harmonics is tantamount to amplification of the fundamentals. Therefore, any attempt to amplify frequencies in this range in Fourier analysis would result in exactly what the phenomenon of missing fundamental had demonstrated. The result would be to make the sound louder, but without increasing clarity.

(53) FIG. 18a is the comparison between the first IMF and the filtered components. The filter used here is a running mean filter. Overall, they look similar. Zoomed details are shown in FIG. 18b for a detailed comparison of the differences in the main parts of the signal, where the lack of dynamic range in the running mean filter results is obvious. The filter approach does not guarantee IMF properties; therefore, the instantaneous frequency and the envelope produced would differ from those of the EMD approach. The most critical shortcoming of the filter approach is that the running mean filter removes some harmonics of the sharp features of lower-frequency components. As a result, there is leakage. However, the filter approach is also complete: the pseudo-IMFs so produced add up to recover the original data in full. Based on these considerations, the filter approach could provide an acceptable, but cheaper, substitute for EMD-produced IMFs. The filter approach could still have exactly the same effect of increasing clarity without increasing loudness, because the loss of clarity is due to inadequate representation of the TFS (Temporal Fine Structure, also known as Consonants). This is what we accomplish in this implementation. The results still look similar, but the filter approach loses some sharpness and other qualitative details.

(54) EMD is more time consuming, even though its computational complexity is comparable to that of the Fourier transform. If we use the filter approach, we can obtain high-frequency components comparable to those from EMD. The sound might not be as crystal clear, because the mean filter spreads the filtered results over a wider temporal domain (FIGS. 18a and 18b show the comparisons among the different filters in great detail). The end results would be less accurate than with the full EMD approach. However, the filter approach could be simpler and cheaper to implement.

(55) Additional Implementations

(56) In addition to the hearing aid applications, the sound enhancement based adaptive algorithm of signal decomposition in the present invention can also be used for a communication device, such as a telephone (including a cellphone), a conference call broadcast or any sound transmitting and reproducing device.

(57) Telephone sound is a classical problem for hearing-impaired patients. With the development of high-quality cellphones, the sound quality has improved drastically. However, for hearing-impaired patients, it can still be a challenge. Enhancing, denoising and optimizing the sound is highly desirable.

(58) For a conference call broadcast, the fast attenuation of high-frequency components would make the sound reaching the listeners lose its clarity. Therefore, selective amplification of the high frequency would improve the sound quality.

(59) The algorithm in the present invention can be applied to a telephone or a conference call broadcast. The implementation steps are shown in FIG. 19, in which the key part is the sound enhancement module. FIG. 19 is a block diagram of a sound enhancement apparatus. The sound enhancement apparatus includes a sound receiving module 10, a sound enhancement module 20 and a sound playback module 30. The sound receiving module 10 is configured to receive a sound signal and determine whether the received sound signal is an analog signal or a digital signal. When the received sound signal is an analog signal, the analog signal is converted into a digital signal. The sound enhancement module 20 is configured to selectively amplify the received digital signals. The principle and the detailed steps involved in the key parts of the sound enhancement module are the same as those listed in the hearing aid embodiment. After receiving the digital sound signal, the sound enhancement module 20 processes the sound signal with an adaptive filter bank to obtain a plurality of IMFs or pseudo-IMFs. The adaptive filter bank includes a mode decomposition filter bank and a mean filter bank. The mode decomposition filter bank refers to any mode decomposition method that can obtain the Intrinsic Mode Function components (IMFs) of the signal, including Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD) and Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD). Still further, the EMD could be substituted by anything equivalent to it, such as a successive running mean filter, to obtain the pseudo-IMFs. The obtained IMFs or pseudo-IMFs represent amplitude changes of the sound data at different frequency scales over time. The tuning unit of gain values can determine the amplification factors of the sound signal amplitude in different frequency bands according to the measurement results of the hearing-impaired patient. The factors can also be preset according to the frequency range of the consonants. According to the tuning unit of gain values, the obtained IMFs or pseudo-IMFs are selectively amplified. The selectively enhanced IMFs or pseudo-IMFs are reconstituted to obtain an enhanced sound signal. The sound playback module 30 is used for playing the enhanced sound.
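The text does not state how the tuning unit of gain values converts an audiogram into amplification factors. Purely as an illustration, the sketch below assumes the half-gain rule of thumb from hearing aid fitting (gain in dB equal to half the measured loss) and a nearest-frequency lookup; the function name gain_factors, the audiogram values and the band centres are hypothetical.

```python
def gain_factors(audiogram_db_hl, band_centres_hz):
    """Map hearing loss (dB HL) to linear amplitude factors a_j.

    Assumption: half-gain rule (gain in dB = half the loss), read at the
    audiogram frequency nearest to each band centre.
    """
    factors = []
    for centre in band_centres_hz:
        nearest = min(audiogram_db_hl, key=lambda f: abs(f - centre))
        gain_db = max(0.0, audiogram_db_hl[nearest] / 2.0)
        factors.append(10 ** (gain_db / 20.0))   # dB to amplitude ratio
    return factors

# Hypothetical sloping high-frequency loss, thresholds in dB HL.
audiogram = {500: 10, 1000: 20, 2000: 40, 4000: 60, 8000: 70}
# Approximate mean frequencies of the first four components at 22 kHz.
a = gain_factors(audiogram, [7000, 3500, 1800, 900])
```

For this example the factors decrease from the highest band to the lowest, so the amplification falls mainly on the components associated with consonants. Any clinically used fitting formula could be substituted for the assumed rule without changing the rest of the processing chain.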

(60) Even though numerous characteristics and advantages of the present invention have been set forth in the foregoing description, together with details of the structure and features of the invention, the disclosure is illustrative only. Changes may be made in the details, especially in matters of shape, size, and arrangement of parts within the principles of the invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.


CPC classification

H04M3/40
H04M3/568
H04R25/353
H04M3/42391
H04R25/356
H04R25/505
H04R2225/43

International classification

H04R25/00
