Method and apparatus for sound enhancement
11570553 · 2023-01-31
Cpc classification
H04M3/568
ELECTRICITY
H04M3/42391
ELECTRICITY
Abstract
A method and apparatus for sound enhancement are provided in this invention. The method comprises: obtaining sound signals and converting the sound signals into digital signals; decomposing the digital signals to obtain a plurality of intrinsic mode functions (IMFs) or pseudo-IMFs; selectively amplifying the amplitudes of the IMFs or pseudo-IMFs; and reconstituting the selectively amplified IMFs or pseudo-IMFs to obtain reconstituted signals and converting the reconstituted signals into analog signals. The present invention is based on the Hilbert-Huang transform. Through the present invention, the sound can be selectively amplified so that only the high-frequency consonants in the sound are amplified, not the vowels, which effectively improves the clarity of the enhanced sound. The present invention thus overcomes the problem of current sound enhancement methods, which make the sound louder without increasing its clarity.
Claims
1. A sound enhancement method comprising: (1) obtaining sound signals and converting the sound signals into digital signals; (2) decomposing the digital signals by a mode decomposition method to obtain a plurality of Intrinsic Mode Function components (IMFs), wherein the IMFs represent amplitude changes of the digital signals converted from the sound signals at different frequencies over time; (3) selectively amplifying the amplitudes of the IMFs obtained in step (2); (4) reconstituting the selectively amplified IMFs to obtain reconstituted signals; (5) converting the reconstituted signals into analog signals.
2. The sound enhancement method of claim 1, wherein the mode decomposition method includes Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), and Conjugate Adaptive Dyadic Masking Empirical Mode Decomposition (CADM-EMD).
3. The sound enhancement method of claim 1, wherein when the amplitudes of the IMFs are amplified in step (3), the amplification frequency band and the amplification factors are determined according to the hearing-impaired patient's audiogram.
4. The sound enhancement method of claim 1, wherein when the amplitudes of the IMFs are amplified in step (3), the IMFs in the frequency band of the consonants are amplified.
5. A sound enhancement method comprising: (1) obtaining sound signals and converting the sound signals into digital signals; (2) decomposing the digital signals by an adaptive filter bank to obtain a plurality of pseudo-Intrinsic Mode Function components (pseudo-IMFs), wherein the pseudo-IMFs represent the amplitude changes of the digital signals converted from the sound signals at different frequencies over time; (3) selectively amplifying the amplitudes of the pseudo-IMFs obtained in step (2); (4) reconstituting the selectively amplified pseudo-IMFs to obtain reconstituted signals; (5) converting the reconstituted signals into analog signals.
6. The sound enhancement method of claim 5, wherein the adaptive filter bank is a mean filter bank.
7. The sound enhancement method of claim 5, wherein when the amplitudes of the pseudo-IMFs are amplified in step (3), the amplification frequency band and the amplification factors are determined according to the hearing-impaired patient's audiogram.
8. The sound enhancement method of claim 5, wherein when the amplitudes of the pseudo-IMFs are amplified in step (3), the pseudo-IMFs in the frequency band of the consonants are amplified.
9. The sound enhancement method of claim 1, wherein the sound enhancement method can be applied to a hearing aid, a telephone and a conference call broadcast.
10. The sound enhancement method of claim 5, wherein the sound enhancement method can be applied to a hearing aid, a telephone and a conference call broadcast.
11. A sound enhancement apparatus comprising: a sound receiving module, a sound enhancement module and a sound playback module; wherein the sound receiving module is used to receive sound signals and convert the sound signals into digital signals; the sound enhancement module is used to process the digital signals to obtain a plurality of Intrinsic Mode Function components (IMFs) or pseudo-IMFs, selectively amplify the amplitudes of the obtained IMFs or pseudo-IMFs, reconstitute the selectively amplified IMFs or pseudo-IMFs to obtain reconstituted signals, and convert the reconstituted signals into analog signals to obtain enhanced sound signals; the sound playback module is used to play the enhanced sound signals.
12. The sound enhancement apparatus of claim 11, wherein the sound enhancement module includes an adaptive filter bank, an enhancement unit and a reconstituting unit; wherein the adaptive filter bank is used to decompose the digital signals to obtain the IMFs or pseudo-IMFs; the enhancement unit is used to selectively amplify the amplitudes of the IMFs or the pseudo-IMFs; the reconstituting unit is used to reconstitute the amplified IMFs or pseudo-IMFs to obtain the enhanced sound signals.
13. The sound enhancement apparatus of claim 12, wherein the sound enhancement module further includes a tuning unit of gain values, which is used to determine the amplification factors of the sound signal amplitudes needed by a hearing-impaired patient in different frequency bands according to the patient's audiogram, or determine the amplification factors according to the frequency band of the consonants; and then the enhancement unit amplifies the amplitudes of the IMFs or pseudo-IMFs according to the tuning unit of gain values.
14. The sound enhancement apparatus of claim 12, wherein the adaptive filter bank includes a mode decomposition filter bank and a mean filter bank.
15. The sound enhancement apparatus of claim 11, wherein the sound enhancement apparatus can be applied to a hearing aid, a telephone and a conference call broadcast.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
(21) In the following, the technical means adopted by the present invention to achieve its intended purpose are further explained with reference to the accompanying drawings and the preferred embodiments of the present invention.
(22) As shown in
(23) To better explain the sound enhancement method of the present invention, we first take the mode decomposition method as an example. First, a sound signal from a sound source is received (step 100) and digitized (step 110). To save time, the incoming sound is digitized at 22 kHz. The sampling rate is chosen based on the following considerations. In speech, vowels and voiced consonants are dominated by the vocal cord vibration frequency, which forms the so-called fundamental, F0. F0 ranges from about 80 Hz for a deep male voice to about 400 Hz for a child. Although speech can contain spectral information up to 10 kHz, even the Fourier spectral information necessary for distinguishing different consonants and vowels resides largely below 3,000 to 5,000 Hz, because much of the spectrum consists of harmonics, which can have much higher frequencies than the actual sound signals. In terms of the Hilbert spectral representation, which is free of these artificial harmonics, the instantaneous frequency of many sound signals rarely exceeds 1,000 Hz (discussed in detail later). Therefore, a sampling rate of 22 kHz is sufficient. To further reduce processing cost, the sampling rate could be reduced to 10,000 or even 6,000 Hz. Of course, for extra-high fidelity, the full 44 kHz sampling rate is also possible.
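The sampling-rate argument above rests on the Nyquist limit: a sampling rate f_s can represent frequencies only up to f_s/2. The short check below is an illustrative sketch, not part of the described method. It compares each candidate rate mentioned in the text with the two bounds quoted there: about 1,000 Hz for the Hilbert instantaneous frequency and up to 5,000 Hz for the Fourier cues. The constant and function names are hypothetical.

```python
# Nyquist check for the candidate sampling rates mentioned in the text.
HILBERT_LIMIT_HZ = 1_000   # instantaneous frequency rarely exceeds this (per the text)
FOURIER_LIMIT_HZ = 5_000   # upper end of the 3,000-5,000 Hz Fourier cues (per the text)


def nyquist_hz(fs_hz: float) -> float:
    """Highest frequency that a sampling rate of fs_hz can represent without aliasing."""
    return fs_hz / 2.0


for fs in (44_000, 22_000, 10_000, 6_000):
    nyq = nyquist_hz(fs)
    print(f"fs = {fs:>6} Hz, Nyquist = {nyq:>7.0f} Hz, "
          f"covers 1 kHz: {nyq >= HILBERT_LIMIT_HZ}, covers 5 kHz: {nyq >= FOURIER_LIMIT_HZ}")
```

Under these assumptions, 10 kHz still reaches the 5 kHz Fourier bound, while 6 kHz (Nyquist 3 kHz) covers only the 1 kHz Hilbert bound. This is why 6,000 Hz is a reasonable lower limit only when the Hilbert view of the signal is adopted.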
(24) This signal could be cleansed by EMD or by a median filter to remove spiky noise (step 120). The signal is then decomposed by EMD (step 130) to obtain the IMFs:
(25) $$x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) \qquad (1)$$
where $x(t)$ is the original signal, $c_j(t)$ are the Intrinsic Mode Function (IMF) components and $r_N(t)$ is the residual. The IMFs are approximately orthogonal, and the components are dyadically ranked in time scale. The first IMF component typically consists of 3-point oscillations. Because EMD behaves almost like a dyadic filter bank, with the mean period doubling from one component to the next, by the 5th IMF component the oscillations have a mean wavelength of the order of 48 points. For data sampled at 22 kHz, this component already corresponds to about 450 Hz. Depending on the patient's condition, amplification should stop well before this point. For example, for a signal digitized at 22 kHz, the mean frequencies of the first 5 components are:
$c_1(t)$: 3-point period, ~7,000 Hz
$c_2(t)$: 6-point period, ~3,500 Hz
$c_3(t)$: 12-point period, ~1,800 Hz
$c_4(t)$: 24-point period, ~900 Hz
$c_5(t)$: 48-point period, ~450 Hz        (2)
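The decomposition of step 130, x(t) = Σ c_j(t) + r_N(t), can be illustrated with a minimal EMD sketch. It is a simplification, not the patent's implementation: linear interpolation replaces the cubic-spline envelopes normally used in EMD, and a fixed number of sifting iterations replaces a proper stopping criterion. All function names are hypothetical. By construction, the extracted IMFs plus the residual sum back to the input exactly, which is the property the reconstitution steps rely on.

```python
import numpy as np


def _local_extrema(h):
    """Indices of interior local maxima and minima, found by slope sign changes."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(h):
    """Mean of upper and lower envelopes, or None if there are too few extrema."""
    maxima, minima = _local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(h))
    # Simplification: linear interpolation instead of the usual cubic splines.
    upper = np.interp(t, maxima, h[maxima])
    lower = np.interp(t, minima, h[minima])
    return 0.5 * (upper + lower)


def emd(x, max_imfs=5, sift_iters=10):
    """Split x into IMFs c_1..c_N (finest scale first) and a residual r_N."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:
            break  # residual is (nearly) monotonic: nothing left to sift
        h = residual.copy()
        for _ in range(sift_iters):  # fixed iteration count, no stopping criterion
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residual = residual - h
    return imfs, residual


# Example: a 22 kHz-sampled mix of a low and a high tone (synthetic data).
fs = 22_000
t = np.arange(2_000) / fs
x = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 3_000 * t)
imfs, r = emd(x)
print(len(imfs), np.allclose(sum(imfs) + r, x))
```

A production EMD would add spline envelopes, end-effect handling and a sifting stop criterion. The sketch only shows the structure of the decomposition.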
(26) Depending on the patient's condition, the high-frequency components can then be selectively amplified irrespective of their exact frequency values (step 150), and the signal reconstituted as y(t) (step 160):
(27) $$y(t) = \sum_{j=1}^{N} a_j\, c_j(t) + r_N(t) \qquad (3)$$
(28) Since $r_N(t)$ represents the trend of the sound, its frequency is very low and cannot be perceived. We therefore ignore the residual, and the reconstituted signal y(t) can be expressed as:
(29) $$y(t) = \sum_{j=1}^{N} a_j\, c_j(t) \qquad (4)$$
(30) where $a_j$ is the amplification factor, each value being determined individually from the patient's audiogram test data to fit that patient. Alternatively, the values of $a_j$ can be set according to the frequency band of the consonants. Most of the amplification should be placed selectively on the high-frequency components, because those components represent the consonants that add clarity to the sound. As most hearing-impaired patients can still hear sounds up to around 500 Hz, amplifying the first 4 components should be sufficient for all practical purposes when the sound is digitized at 22 kHz. The reconstituted signal y(t) can then be converted back to analog form (step 170) and played back to the listener. Note that a flattening filter might be required here (step 161), because too large an amplification factor could clip the signal and make the reconstituted sound rough.
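Selective amplification and reconstitution, y(t) = Σ a_j c_j(t) with the residual dropped, reduces to a weighted sum of the components. The sketch below assumes the IMFs are already available as equal-length arrays, finest scale first. It passes any component without an assigned gain through unchanged (a_j = 1). The optional `limit` argument applies a tanh soft limiter. This is a hypothetical stand-in for the flattening filter of step 161, whose exact form the text does not specify.

```python
import numpy as np


def reconstitute(imfs, gains, limit=None):
    """Selective amplification and reconstitution, y(t) = sum_j a_j c_j(t).

    imfs:  sequence of equal-length arrays c_1..c_N (finest scale first).
    gains: amplification factors a_1..a_K for the first K components; any
           component beyond K is passed through unchanged (a_j = 1).
    limit: optional peak level for a soft limiter (tanh), a hypothetical
           stand-in for the flattening filter of step 161.
    The residual r_N(t) is deliberately not included.
    """
    y = np.zeros(len(imfs[0]), dtype=float)
    for j, c in enumerate(imfs):
        a = gains[j] if j < len(gains) else 1.0
        y += a * np.asarray(c, dtype=float)
    if limit is not None:
        y = limit * np.tanh(y / limit)  # smooth saturation instead of hard clipping
    return y
```

For example, `reconstitute(imfs, gains=[4.0, 4.0, 2.0, 1.5], limit=1.0)` would boost the first four components (illustrative values, not fitted to any audiogram) and keep the output below unit peak. A soft limiter was chosen because it avoids the abrupt corners of hard clipping that make sound rough.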
(31) For a higher degree of fidelity, the sampling rate could be set at 44 kHz. In that case, the first IMF component will be at about 15 kHz and might be left out to suppress ambient noise. At any rate, only about five IMF components need to be amplified to reach down to about 450 Hz.
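The frequency ladder behind Equation (2) follows from simple arithmetic. Assuming the mean period starts at 3 samples and doubles for each successive IMF, the mean frequency of component j is f_s / (3 · 2^(j-1)). The sketch recomputes the ladder for both sampling rates discussed. The function name and the six-component cutoff are illustrative choices.

```python
def dyadic_imf_frequencies(fs_hz, n_components=6, first_period=3):
    """Approximate mean frequency of each IMF, assuming the mean period starts at
    `first_period` samples and doubles from one component to the next."""
    return [fs_hz / (first_period * 2 ** j) for j in range(n_components)]


for fs in (22_000, 44_000):
    print(fs, [round(f) for f in dyadic_imf_frequencies(fs)])
# 22000 [7333, 3667, 1833, 917, 458, 229]
# 44000 [14667, 7333, 3667, 1833, 917, 458]
```

The 22 kHz row matches the rounded values of Equation (2). At 44 kHz, the whole ladder shifts up by one component, so the first IMF sits near 15 kHz.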
(32) In order to illustrate the advantages of the sound enhancement method of the present invention, in
(33) Let us examine the sound of low A from a piano (a percussion instrument). The waveform data of the low A, middle A and high A from the piano are given in
(34) Recently, Huang et al. introduced Hilbert Holo-spectral analysis. More specifically, Huang and Yeh introduced a whole set of tools to analyze acoustic signals pertaining to hearing. If we use the newly developed Holo-spectral representation, the spectra with and without the fundamental in the sound are given in
(35) However, for speech analysis, the full four-dimensional, time-dependent Hilbert Holo-spectral representation is too complicated and unwieldy. The simplified time-dependent, instantaneous-frequency-based Hilbert spectral analysis and the AM time-frequency Hilbert spectral analysis would be sufficient for the present invention, but even these are still too time-consuming. The present invention is therefore based on temporal operations only.
(36) The actual implementation is further demonstrated in the following example of an unvoiced sound, ‘zi’, pronounced according to the Chinese Roman Phonetic system. The data are given in
(37) The data is decomposed by EMD. The result is given in
(39) Here, the same high-frequency energy density of the ‘z’ sound remains; however, the vowel harmonics at 8,000 Hz are absent. The energy at 4,000 Hz is not a harmonic of any sound but a reflection of the voice in the vocal tract. Because no harmonics are present in the high-frequency range, only the consonants remain there, which gives us a unique opportunity to amplify the consonants without altering the sound of the vowel part. This is the key technology of this invention: the first few IMFs, especially IMF 1 and IMF 2, can be amplified (step 150) according to Equation (4) without influencing the vowels.
(41) The reduced signals (L1z and L2z) simulate hearing loss of various degrees. For presbycusis patients, the loss is usually only in the consonant part, not the vowel part. Hearing aids with self-compensation mechanisms currently on the market make the sound louder but do not add clarity. Importantly, selectively amplifying harmonics in the range of 1,000 Hz to 4,000 Hz effectively amplifies the fundamentals without involving the consonant part. This is equivalent to amplifying L1z or L2z: the sound becomes louder, but its clarity is not improved. The reconstituted signals can be converted back to analog form (step 170) for playback through the hearing aid amplifier and speaker (step 180). For congenital hearing loss, amplification may be more important, and its extent depends on the individual patient.
(42) It should be pointed out that the guiding principle in hearing aid design is ‘selective amplification’ of the sound. The Fourier approach of amplifying the range around 2,000 to 4,000 Hz effectively amplifies harmonics, which, because of the missing fundamental phenomenon, is tantamount to amplifying the fundamentals. But the fundamentals do not need amplification at all. Unfortunately, some consonants have neither harmonics nor any tangible signal in and around the 2,000 to 4,000 Hz range. The combined effect of the Fourier approach is to amplify the already audible vowels, equivalent to amplifying the signal L1z or L2z in
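The missing fundamental phenomenon invoked above can be illustrated numerically. A tone built only from harmonics 2 to 5 of 200 Hz has essentially no spectral energy at 200 Hz, yet the waveform still repeats every 1/200 s, so the perceived pitch stays at 200 Hz. The sketch uses a simple autocorrelation peak search as a stand-in model of pitch. This is a swapped-in technique for illustration, not something the text prescribes. The synthetic tone, the 80 to 400 Hz search range (taken from the F0 range quoted earlier) and the function name are assumptions.

```python
import numpy as np

FS = 22_000  # sampling rate used in the text (Hz)
F0 = 200     # assumed fundamental of the synthetic tone (Hz)

# Harmonics 2..5 of F0 only: the fundamental itself is absent.
t = np.arange(int(0.1 * FS)) / FS
x = sum(np.sin(2 * np.pi * k * F0 * t) for k in range(2, 6))


def autocorr_pitch(x, fs, fmin=80.0, fmax=400.0):
    """Crude pitch estimate: lag of the largest autocorrelation value within
    the F0 range quoted in the text (80-400 Hz)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # r[k] for lags k >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag


spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / FS)
print("energy at F0:", spectrum[np.argmin(np.abs(freqs - F0))])  # ~0: no fundamental
print("pitch estimate:", autocorr_pitch(x, FS))                  # ~200 Hz
```

Boosting the 400 to 1,000 Hz harmonics here would make the 200 Hz pitch louder without adding any new cue. This is the sense in which Fourier-band amplification ends up amplifying the fundamentals.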
(43) Alternative Implementations
(44) To save further time, EMD could be replaced by anything equivalent to it, such as repeated applications of successive running means or running medians, a separate group of band-pass filters, any filter that can separate the signal into high and low parts, high-pass filters with window sizes adapted to the input signal, or other time-domain filters. The steps are as follows. First, decompose the data by successive running means (or running medians):
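(45) $$x(t) - \langle x(t)\rangle_{n_1} = h_1(t), \qquad \langle x(t)\rangle_{n_1} - \big\langle \langle x(t)\rangle_{n_1} \big\rangle_{n_2} = h_2(t), \qquad \ldots \qquad (5)$$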
(46) in which $\langle x(t)\rangle_{n_j}$ denotes the running mean filter of window size $n_j$, which must be an odd number, and $h_j(t)$ is any of the pseudo-IMFs produced by the running filters. Furthermore, repeated applications of the boxcar filter change the filter response function remarkably. For example, two repetitions give a triangular response, and four or more repetitions give an almost Gaussian response. The key parameter of such a filter is the window size. Based on the discussion of Equation (2), at a 22 kHz sampling rate we conclude that the following equivalence between the boxcar filter and EMD should exist:
$n_j = 3$: ~7,000 Hz
$n_j = 7$: ~3,500 Hz
$n_j = 15$: ~1,500 Hz
$n_j = 31$: ~700 Hz
$n_j = 61$: ~350 Hz        (6)
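The running-mean alternative to EMD can be sketched as follows. Each pseudo-IMF is the difference between the current signal and its running mean, and the running mean then becomes the input for the next, wider window, using the odd window sizes listed above. The edge handling (repeating the end samples) and the function names are assumptions, since the text does not specify them. The `repeats` argument applies the boxcar several times, which yields the triangular and near-Gaussian responses described in the text.

```python
import numpy as np


def running_mean(x, n, repeats=1):
    """Centred boxcar mean with odd window n, applied `repeats` times.
    Edges are padded by repeating the end samples (an assumption)."""
    if n % 2 == 0:
        raise ValueError("window size must be odd")
    kernel = np.ones(n) / n
    y = np.asarray(x, dtype=float)
    for _ in range(repeats):
        padded = np.pad(y, n // 2, mode="edge")
        y = np.convolve(padded, kernel, mode="valid")  # same length as input
    return y


def running_mean_decompose(x, windows=(3, 7, 15, 31, 61), repeats=1):
    """Pseudo-IMFs h_j, finest scale first, plus the final smoothed signal:
    h_1 = x - <x>_{n1}, h_2 = <x>_{n1} - <<x>_{n1}>_{n2}, and so on."""
    current = np.asarray(x, dtype=float)
    pseudo_imfs = []
    for n in windows:
        smoothed = running_mean(current, n, repeats)
        pseudo_imfs.append(current - smoothed)
        current = smoothed
    return pseudo_imfs, current
```

As with EMD, the pseudo-IMFs plus the final smoothed signal add back up to the input. Applying a window of 3 twice to a unit impulse gives the triangular weights 1, 2, 3, 2, 1 divided by 9.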
(47) The disadvantage of these filters is that none of them is as sharp as EMD, a point we will return to later. They could, however, be used as a cheaper substitute for EMD.
(48) The selective amplification can be implemented as in Equation (4), and the reconstituted signal y(t) is obtained as
(49) $$y(t) = \sum_{j=1}^{N} a_j\, h_j(t) \qquad (7)$$
(50) in which the values of $a_j$ can be assigned according to the patient's audiogram test, just as in Equation (4).
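The text leaves the mapping from audiogram to amplification factors a_j open. One possible mapping is sketched below, under clearly stated assumptions. The audiogram's hearing loss is interpolated on a log-frequency axis at each (pseudo-)IMF's approximate centre frequency. It is converted to gain with a fractional rule; the default of one half corresponds to the classic "half-gain" fitting heuristic. The gain is then capped and turned into a linear amplitude factor. The function name, the fraction and the cap are hypothetical, not the patent's fitting procedure.

```python
import numpy as np


def gains_from_audiogram(audiogram, band_centres_hz, fraction=0.5, max_gain_db=40.0):
    """Hypothetical mapping from an audiogram to amplification factors a_j.

    audiogram: {frequency_hz: hearing_loss_db}, e.g. from a pure-tone test.
    band_centres_hz: approximate centre frequency of each (pseudo-)IMF.
    Loss is interpolated on a log-frequency axis (clamped beyond the tested
    range), scaled by `fraction`, capped at `max_gain_db`, and returned as
    linear amplitude factors.
    """
    freqs = np.array(sorted(audiogram), dtype=float)
    losses = np.array([audiogram[f] for f in sorted(audiogram)], dtype=float)
    loss_at_bands = np.interp(np.log2(band_centres_hz), np.log2(freqs), losses)
    gain_db = np.minimum(fraction * loss_at_bands, max_gain_db)
    return 10.0 ** (gain_db / 20.0)
```

With the band centres of Equation (6) at 22 kHz (7,000, 3,500, 1,500, 700 and 350 Hz) and a sloping high-frequency loss, this gives the largest factors to the first pseudo-IMFs. That matches the text's emphasis on amplifying the consonant-bearing components.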
(51) To explain in detail how alternatives to EMD are applied and how they compare with EMD, we use the sound ‘hello’ as an example. The digitized data of the sound ‘hello’ are given in
(54) EMD is more time-consuming, even though its computational complexity is comparable to that of the Fourier transform. With the filter approach, we can obtain high-frequency components comparable to those from EMD. The sound might not be as crystal clear, however, because the mean filter spreads the filtered results over a wider temporal domain (
(55) Additional Implementations
(56) In addition to hearing aid applications, the sound enhancement of the present invention, which is based on an adaptive signal-decomposition algorithm, can also be used in communication devices such as a telephone (including a cellphone), a conference call broadcast, or any sound transmitting and reproducing device.
(57) Telephone sound is a classical problem for hearing-impaired patients. With the development of high-quality cellphones, sound quality has improved drastically. For hearing-impaired patients, however, it can still be a challenge, so enhancing, denoising and optimizing the sound is highly desirable.
(58) In a conference call broadcast, the fast attenuation of high-frequency components makes the sound reaching the listeners lose its clarity. Selective amplification of the high-frequency components would therefore improve the sound quality.
(59) The algorithm in the present invention can be applied to a telephone or a conference call broadcast. The implementation steps are shown in
(60) Even though numerous characteristics and advantages of the present invention have been set forth in the foregoing description, together with details of the structure and features of the invention, the disclosure is illustrative only. Changes may be made in the details, especially in matters of shape, size, and arrangement of parts within the principles of the invention to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.