System and method for selective enhancement of speech signals

Abstract

A system and method for selectively enhancing an audio signal to make sounds, particularly speech sounds, more distinguishable. The system and method are designed to divide an input auditory signal into a plurality of spectral channels having associated unenhanced signals and perform enhancement processing on a first subset of the spectral channels and not perform enhancement processing on a second subset of the spectral channels. The enhancement processing is performed by determining an output gain for at least the first subset of spectral channels based on a time-varying history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels and applying the output gain for each of the first subset of the spectral channels to the unenhanced signals to form enhanced signals associated with each of the first subset of the spectral channels. The system and method are then designed to combine the plurality of enhanced signals associated with each of the first subset of the spectral channels and the unenhanced signals associated with each of the second subset of the spectral channels to form a selectively enhanced output auditory signal.

Claims

1. A hearing aid system configured to be coupled with an ear of an individual to selectively enhance an acoustic signal to be received by the ear of the individual, comprising: a microphone configured to receive the acoustic signal and generate an analog electrical signal responsive thereto; an analog-to-digital converter configured to receive the analog electrical signal and convert the analog electrical signal into a digital input signal; a signal processor configured to receive the digital input signal and programmed to: divide the digital input signal into a plurality of spectral channels having associated unenhanced signals; identify a first subset of the spectral channels having associated unenhanced signals corresponding to a pathological response range of the ear of the individual; identify a second subset of the spectral channels having associated unenhanced signals outside the pathological response range of the ear of the individual; perform enhancement processing on the first subset of the spectral channels and not perform enhancement processing on any of the second subset of the spectral channels; and combine the plurality of enhanced signals associated with each of the first subset of the spectral channels and the unenhanced signals associated with each of the second subset of the spectral channels to form a selectively enhanced output signal; and an output device configured to receive the selectively enhanced output signal and communicate the selectively enhanced output signal to the individual.

2. The system of claim 1 wherein the pathological response range corresponds to an audio frequency range within which the ear of the individual has a pathological response.

3. The system of claim 1 wherein the output device includes a speaker.

4. The system of claim 1 wherein the output device includes a cochlear implant.

5. The system of claim 1 wherein the signal processor is configured to use a channel selection criterion designated by a matrix corresponding to the plurality of spectral channels to perform enhancement processing on a first subset of the spectral channels and not perform enhancement processing on a second subset of the spectral channels.

6. The system of claim 5 wherein the matrix includes a block Toeplitz submatrix configured such that the second subset of the spectral channels is instantiated by an identity submatrix.

7. The system of claim 1 wherein the signal processor, to perform enhancement processing, is further programmed to: determine an output gain for at least the first subset of spectral channels based on a time-varying history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels; and apply the output gain for each of the first subset of the spectral channels to the unenhanced signals associated with the respective channel in the first subset of the spectral channels to form enhanced signals associated with each of the first subset of the spectral channels.

8. A method for selectively enhancing an auditory signal, comprising the steps of: (a) dividing an input auditory signal into a plurality of spectral channels having associated unenhanced signals; (b) performing enhancement processing on a first subset of the spectral channels and not performing enhancement processing on any of a second subset of the spectral channels, wherein the enhancement processing includes: (i) determining an output gain for at least the first subset of spectral channels based on a time-varying history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels; and (ii) applying the output gain for each of the first subset of the spectral channels to the unenhanced signals associated with the respective channel in the first subset of the spectral channels to form enhanced signals associated with each of the first subset of the spectral channels; and (c) combining the plurality of enhanced signals associated with each of the first subset of the spectral channels and the unenhanced signals associated with each of the second subset of the spectral channels to form a selectively enhanced output auditory signal.

9. The method of claim 8 wherein step (b) includes applying a channel selection criterion designated by a matrix corresponding to the plurality of spectral channels.

10. The method of claim 9 wherein the matrix includes a block Toeplitz submatrix configured such that the second subset of the spectral channels is instantiated by an identity submatrix.

11. The method of claim 8 wherein a magnitude of the output gain for each of the first subset of spectral channels is inversely related to the history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels.

12. The method of claim 8 wherein the step (a) includes the step of applying the input auditory signal to a plurality of polyphase multirate filters.

13. The method of claim 8 wherein the step (b)(i) includes the steps of determining a weighted energy history for each channel based on the time-varying history of the energy in the channel, converting the weighted energy history into an RMS history weighting value, and determining the output gain for the channel using the RMS history weighting value.

14. The method of claim 13 wherein the step of determining the weighted energy history for each channel includes weighting more recent energy in the channel more heavily than less recent energy in the channel.

15. A system for selectively enhancing an acoustic signal, comprising: a microphone configured to receive an acoustic signal and generate an analog electrical signal responsive thereto; an analog-to-digital converter configured to receive the analog electrical signal and convert the analog electrical signal into a digital input signal; a signal processor configured to receive the digital input signal and programmed to: divide the digital input signal into a plurality of spectral channels having associated unenhanced signals; perform enhancement processing on a first subset of the spectral channels and not perform enhancement processing on a second subset of the spectral channels, the spectral channels in the first subset of the spectral channels and the spectral channels in the second subset of the spectral channels being mutually exclusive; and combine the plurality of enhanced signals associated with each of the first subset of the spectral channels and the unenhanced signals associated with each of the second subset of the spectral channels to form a selectively enhanced output signal; and an output device configured to receive the selectively enhanced output signal and communicate the selectively enhanced output signal.

16. The system of claim 15 wherein the output device includes a speaker configured to communicate the selectively enhanced output signal as an acoustic signal.

17. The system of claim 15 wherein the output device includes a digital-to-analog converter configured to convert the selectively enhanced output signal to an analog electrical output signal.

18. The system of claim 15 wherein the microphone, analog-to-digital converter, and signal processor are contained in a hearing aid.

19. The system of claim 15 wherein the output device includes a speech recognition system including a display configured to communicate text corresponding to the selectively enhanced output signal.

20. The system of claim 15 wherein the signal processor, to perform enhancement processing, is further programmed to: determine an output gain for at least the first subset of spectral channels based on a time-varying history of energy of the unenhanced signals associated with each channel in the first subset of the spectral channels; and apply the output gain for each of the first subset of the spectral channels to the unenhanced signals associated with the respective channel in the first subset of the spectral channels to form enhanced signals associated with each of the first subset of the spectral channels.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a schematic block diagram of an electronic hearing aid device configured to selectively enhance an audio signal in accordance with the present invention.

(2) FIG. 2 is a schematic block diagram of a speech recognition system configured to selectively enhance an audio signal in accordance with the present invention.

(3) FIG. 3 is a flow chart setting forth the steps of a method for selective enhancement of audio signals in accordance with the present invention.

(4) FIG. 4 is a schematic illustration of an exemplary architecture for selectively enhancing an audio signal in accordance with the present invention.

(5) FIGS. 5a-5c are graphs illustrating selective spectral contrast enhancement applied to a plurality of channels in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

(6) The present invention provides a system and method for using a contrast enhancement (CE) algorithm that is specifically designed to confine enhancement to portions of the spectrum and to allow those portions to be selected and highly customized. For example, a CE algorithm may be employed that is designed to enhance spectral differences between adjacent sounds, and thereby improve speech intelligibility for hearing impaired (HI) listeners by enhancing signature kinematic properties of connected speech, but that is restricted to selected portions of the audio spectrum. The CE algorithm may be designed to achieve enhancement of spectral contrast across time, or successive spectral contrast, in addition to enhancement of simultaneous spectral contrast.

(7) The present invention may be employed in electronic hearing aid devices for use by the hearing impaired, particularly to enhance the spectrum such that impaired biological signal processing in the auditory brainstem is restored. This process enhances spectral differences between sounds in a fashion mimicking that of non-pathological human auditory systems. The process imitates the neural processes of adaptation, suppression, adaptation of suppression, and descending inhibitory pathways and, because the enhancements are selectively controlled, does not impede functions that remain closer to natural, non-impaired processing. The present invention makes sounds, particularly speech sounds, more distinguishable to listeners and other receivers. Thus, the present invention is also applicable to uses other than hearing aids, such as speech recognition systems.

(8) The present invention recognizes that, for many HI listeners, amplification is used to make a signal audible, but because of limited dynamic range, spectral resolution deteriorates at amplified presentation levels. The invention addresses this problem by manipulating the spectral composition of the signal to overcome some of the loss of spectral resolution and to substitute, to some extent, for additional amplification (which becomes deleterious at higher levels). By applying such enhancements selectively, the present invention avoids the common problems caused by enhancements applied across the entire dynamic spectrum.

(9) Referring to FIGS. 1 and 2, the present invention may include a hearing aid apparatus 10 as illustrated in FIG. 1 or a speech recognition system 30 as illustrated in FIG. 2. For purposes of illustration, a general hearing aid system 10 includes a microphone 12 for receiving audio signals and converting the signals into electrical signals, an amplification and filtering component 14, an analog-to-digital converter 16, a signal processor 18, a digital-to-analog converter 20, additional filters and amplifiers 22, and an output device 24, such as a cochlear implant or a speaker that converts the amplified signal to sound for the hearing impaired listener. Similarly, the speech recognition system 30 may receive sound from a microphone 32 that converts the sound to an analog signal presented to an amplifier and filter 34, the output of which is provided to an analog-to-digital converter 36, which provides digital data to a signal processor 38. The signal processor 38 in this case may be implemented in a general purpose computer. Alternatively, recorded signal data may be provided from a recording system 40 directly to the signal processor 38. The output of the signal processor 38 is provided to a speech recognition system 42, which itself may be a general purpose computer (and the speech recognition system 42 and the signal processor 38 may both be implemented using the same computer), with the output of the speech recognition system 42 provided to output devices 44 (hard copy, video displays, etc.), or to digital storage media 46.

(10) As will be described, the present invention provides a contrast enhancement algorithm and selective control mechanism designed to manipulate the spectral composition of speech sounds across time such that spectral prominences (formants) are spread apart in frequency, in an effort to make them distinct enough to overcome the spectral blurring that occurs with a combination of sensorineural hearing loss (SNHL), background noise, increased presentation levels, high-frequency gain, and multichannel compression. However, unlike traditional systems, the present invention recognizes that, counterintuitively, contrast enhancement applied across the entire spectrum, or applied without a highly selective, judicious approach, can actually impede a listener's or other recipient's ability to understand the underlying speech. Applying enhancement to, for example, portions of the spectrum that may not substantially benefit from it can create degradation or distortion and, when considering the entire spectrum, may ultimately reduce the overall contrast. To provide a high degree of contrast without this corresponding degradation or distortion, the present invention is designed to apply enhancement selectively.

(11) Where auditory filtering is relatively normal, any signal manipulation, including contrast enhancement, distorts information and perceived naturalness. Traditional attempts to improve speech recognition in HI listeners via simultaneous spectral enhancement applied enhancement uniformly across the spectrum, which is one likely reason for their less-than-favorable outcomes. The present invention provides systems and methods for customized enhancement so that enhancement is present, for example, only where there is significant hearing loss. For example, for listeners with mild low-frequency hearing loss sloping to moderately severe loss in the high frequencies, a uniform degree of enhancement might be too great in the low frequencies, unacceptably distorting the signal (e.g., increasing first-formant (F1) intensity too much and contributing to upward spread of masking of the second formant (F2)), but still insufficient in the higher frequencies where it is needed most. Customization of spectral enhancement represents a significant innovation over prior methods.

(12) Referring now to FIG. 3, a flow chart is provided that illustrates the steps of a selective enhancement method 50 in accordance with the present invention. As illustrated, the present method can be broken into a plurality of sub-components, including signal decomposition into a plurality of channels 52, selective application of enhancement 54, weighting of channel output over time via a dynamic compressive gain function 56, weighting of channel gain within frequency neighborhoods via an inhibitory network 58, and signal synthesis 60.

(13) Referring now to FIGS. 3 and 4, the specific steps of this method 50 and an overview of a system architecture 62 for implementing the method 50 will be described. At process block 64, an input signal, x(t), is received and filtered into a plurality of narrowband channels, H_i(j) (e.g., 100-Hz bandwidth). Narrow filters are desirable for manipulating the amplitudes of individual harmonics, including formants, to sharpen simultaneous spectral contrast and to enhance successive spectral differences across time. That is, narrow filters allow the peak harmonic amplitude to be increased, the amplitudes of immediately adjacent harmonics to be decreased, and peak harmonic energy to be skewed away from where formant energy had been in the immediate past.

(14) Thereafter, at process block 66, channel selection for enhancement is applied. Specifically, after the input acoustic signal, x(t), is divided into a plurality of spectral channels at process block 64, channel selection for enhancement is applied such that only some of the channels are selectively enhanced. It is contemplated that this may be achieved, for example, using a block Toeplitz submatrix. The block Toeplitz submatrix may be constructed such that the spectral channels that remain unprocessed are instantiated by an identity submatrix. The channels that are selectively processed correspond to negative off-diagonal entries, for example, as illustrated in the following exemplary submatrix:

(15) $\left[\begin{array}{ccc|ccc|ccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 1 & -2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -2 & 1 & -2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -2 & 1 & -2 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & -2 & 1 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -2 & 1 & -2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -2 & 1
\end{array}\right]$
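The following Python sketch builds a matrix with the same pattern as the example above: identity rows for channels that remain unprocessed, and a unit diagonal with -2 coupling between adjacent enhanced channels. The function names, the 0-based channel indexing, and the rule that coupling is applied only between neighbors that are both enhanced are assumptions chosen to reproduce the example; the text does not specify how the matrix is generated for other channel counts.

```python
def selection_matrix(n_channels, enhanced, inhibition=-2.0):
    """Hypothetical channel-selection matrix: identity rows for unprocessed
    channels, 1 on the diagonal plus `inhibition` between adjacent enhanced channels."""
    enhanced = set(enhanced)
    m = [[0.0] * n_channels for _ in range(n_channels)]
    for i in range(n_channels):
        m[i][i] = 1.0  # every channel keeps its own contribution
        if i not in enhanced:
            continue  # unprocessed channel: the row stays an identity row
        for j in (i - 1, i + 1):
            # couple only to neighbors that are themselves enhanced
            if 0 <= j < n_channels and j in enhanced:
                m[i][j] = inhibition
    return m


def apply_matrix(m, x):
    """Matrix-vector product y = m @ x over per-channel values."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


# Reproduce the 9-channel example: channels 0-2 unprocessed, 3-8 enhanced.
m = selection_matrix(9, range(3, 9))
y = apply_matrix(m, [0.3] * 9)
```

Because the first three rows are identity rows, the first three entries of `y` equal the input values exactly, which is how the unprocessed channels pass through untouched.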

(16) Thereafter, at process block 68, a weighted time history (e.g., 30-300 ms buffer) of the energy passing through each channel to be enhanced is converted into an RMS value. This adaptation stage can be implemented using dynamic compression with a nonlinear convex loss function, such that more recent energy passing through a channel is given greater weight than earlier occurring energy (i.e., a leaky temporal integrator).
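The leaky temporal integrator can be sketched in Python as an exponentially weighted running mean of squared samples. The exponential weighting, the function name, and the default 100 ms time constant are assumptions; the text specifies only a weighted 30-300 ms history in which recent energy counts more, and the nonlinear convex loss function it mentions is not modeled here.

```python
import math


def leaky_rms(samples, fs, tau_s=0.1):
    """Exponentially weighted running RMS of one channel's signal.

    Each new squared sample enters with weight (1 - alpha); older energy decays
    by alpha per sample, so recent energy counts more (a leaky integrator).
    tau_s is an assumed time constant in seconds; fs is the channel sample rate.
    """
    alpha = math.exp(-1.0 / (tau_s * fs))
    mean_sq = 0.0
    out = []
    for s in samples:
        mean_sq = alpha * mean_sq + (1.0 - alpha) * s * s
        out.append(math.sqrt(mean_sq))
    return out


# A unit step in a channel sampled at 1 kHz: the RMS history rises toward 1.
history = leaky_rms([1.0] * 500, fs=1000.0, tau_s=0.05)
```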

(17) At process block 70, the RMS value of the weighted history is converted to a gain factor for the associated channel. For example, the RMS value of the weighted history may be subtracted from unity (1) to yield a gain factor for that channel. The greater the weighted history of energy, the smaller the gain. Maximum gain (1) is assigned when the weighted history is zero. In this way, processes of adaptation are mimicked and contribute to competition between channels.
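A minimal Python sketch of the "subtract from unity" mapping follows. It assumes the RMS history has been normalized by a full-scale value so that it lies in [0, 1]; the normalization parameter and the clipping to [0, 1] are assumptions added here, not details from the text.

```python
def adaptation_gain(rms_history, full_scale=1.0):
    """Gain = 1 - normalized RMS history, clipped to [0, 1] (clipping assumed)."""
    g = 1.0 - rms_history / full_scale
    return min(1.0, max(0.0, g))


# Gain falls as the weighted energy history grows; zero history gives gain 1.
gains = [adaptation_gain(r) for r in (0.0, 0.25, 0.8)]
```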

(18) Thereafter, at process block 72, processes of lateral inhibition are simulated. This may be achieved by balancing gain across weighted frequency neighborhoods of channels. To this end, it is contemplated that a winner-take-all circuit may be used to simulate a biological network of inhibitory sidebands. Energy in a channel with a relatively high gain factor is increased at the expense of adjacent channels with relatively low gain factors, whose energy is decreased. In essence, the channel activities compete on a moment-by-moment basis.
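The competition described above can be illustrated with a soft stand-in for a winner-take-all network. In this hypothetical Python sketch, each enhanced channel's gain is rescaled by its ratio to the mean gain of its enhanced neighborhood, so a channel above its local mean is boosted and one below it is cut. The ratio rule, the neighborhood radius, and the function name are assumptions; this is not a strict winner-take-all circuit and not the specific network contemplated in the text.

```python
def lateral_competition(gains, enhanced, radius=1, eps=1e-12):
    """Soft neighborhood competition among enhanced channels' gains.

    Each enhanced channel's gain is multiplied by its ratio to the mean gain of
    the enhanced channels within `radius` of it, so locally dominant channels are
    boosted and weaker neighbors are cut. Unenhanced channels pass through.
    """
    enhanced = set(enhanced)
    out = list(gains)
    for i in enhanced:
        hood = [gains[j] for j in range(i - radius, i + radius + 1)
                if 0 <= j < len(gains) and j in enhanced]
        mean = sum(hood) / len(hood)  # hood always contains channel i itself
        out[i] = gains[i] * gains[i] / (mean + eps)
    return out


# Channel 2 dominates its neighborhood; channel 3 is a weaker neighbor.
g = lateral_competition([0.5, 0.5, 0.9, 0.3, 0.5], enhanced=range(2, 5))
```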

(19) The collective effects of the windowed RMS calculation (dynamic compressive gain) and lateral interactions within frequency neighborhoods result in a form of forward energy suppression specifically designed to enhance the spectrum across time. When an individual channel has had relatively high energy in the past, it will tend to suppress its current energy, provided that its neighboring channels were low in energy. This form of suppression will sharpen dynamic modes in the spectrum, especially onsets, while flattening those that are relatively steady state, and in this way will enhance temporal contrasts. Enhancement of temporal contrasts in speech can especially aid stop consonant perception by emphasizing the low-intensity transient energy characteristic of burst onsets and rapid formant transitions.
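The onset-versus-steady-state behavior can be seen in a single channel by chaining a leaky energy history with the 1 - RMS gain. This hypothetical Python sketch ignores the neighbor condition (the lateral interaction), assumes amplitudes normalized to [0, 1], and uses an assumed 100 ms time constant. The gain for each sample is computed from the history before that sample, which is what makes the suppression act forward in time.

```python
import math


def forward_suppression(envelope, fs, tau_s=0.1):
    """One channel's output: current amplitude times (1 - leaky RMS of past amplitude)."""
    alpha = math.exp(-1.0 / (tau_s * fs))
    mean_sq = 0.0
    out = []
    for a in envelope:
        gain = max(0.0, 1.0 - math.sqrt(mean_sq))  # from history *before* this sample
        out.append(a * gain)
        mean_sq = alpha * mean_sq + (1.0 - alpha) * a * a  # then update the history
    return out


# 50 ms of silence followed by a 500 ms steady tone envelope at 1 kHz channel rate.
out = forward_suppression([0.0] * 50 + [0.8] * 500, fs=1000.0)
```

With this input the first sample after the onset passes at full gain, while late in the steady portion the gain has adapted toward 1 - 0.8, so the steady-state output is roughly a fifth of the onset value.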

(20) Consider the case of a single formant traversing frequency. As the formant increases in frequency, the CE algorithm successively attenuates the lower-frequency filters through which the spectral prominence has already passed. This has two consequences. First, the shoulder on the low-frequency side of the formant will be sharpened because that is where the most energy was immediately prior. This will serve to sharpen the spectrum as compensation for blurring caused by an impaired cochlea. Second, the effective frequency (center of gravity) of the formant peak will be skewed away from where the formant had been before. Consequently, successive contrast will be imposed on the signal (spreading successive formants apart in frequency). This process also accelerates the formant transition: because the CE algorithm successively attenuates the low-frequency shoulder, the effective slope of the processed formant transition becomes steeper.
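The center-of-gravity skew can be checked numerically. In this hypothetical Python demo, a Gaussian "formant" moves up one channel per frame, each channel's gain is 1 minus a leaky average of its own past amplitude, and the spectral centroid of the final frame is compared with and without that gain. The Gaussian shape, channel count, step size, and smoothing factor are all illustrative assumptions, and lateral inhibition is omitted.

```python
import math


def formant_sweep_centroids(n_channels=40, start=10, n_frames=10, step=1.0,
                            sigma=2.0, alpha=0.5):
    """Return (unprocessed, processed) spectral centroids, in channel units,
    of the final frame of an upward-moving Gaussian spectral peak."""
    history = [0.0] * n_channels
    for t in range(n_frames):
        center = start + t * step
        amps = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
                for i in range(n_channels)]
        # Gain uses history from earlier frames only (forward suppression).
        processed = [a * (1.0 - h) for a, h in zip(amps, history)]
        history = [alpha * h + (1.0 - alpha) * a for h, a in zip(history, amps)]

    def centroid(spec):
        return sum(i * s for i, s in enumerate(spec)) / sum(spec)

    return centroid(amps), centroid(processed)


raw, proc = formant_sweep_centroids()
```

With the default parameters the unprocessed centroid sits at the final formant position (channel 19), and the processed centroid lies above it, reflecting the attenuated low-frequency shoulder.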

(21) The analysis and synthesis components of the above-described contrast enhancement method and circuit may employ a polyphase decomposition and oversampled discrete Fourier transform (DFT)-modulated filters. That is, as described, the input signal may first be decomposed into a plurality of subbands and CE performed within neighborhoods of subbands; the subband process can then be reversed to reconstruct the output signal. A subband scheme can utilize an analysis filter bank that splits the input into a set of M narrowband signals that are typically downsampled (decimated) by some factor N, leading to more efficient processing. Intermediate processing can be performed, after which the subband signals are upsampled (interpolated) by the factor N and combined using a synthesis filter bank. If no intermediate processing is performed, a suitably designed filter bank can reconstruct the input perfectly at its output, apart from some measure of pure delay. The M subband filters are derived by frequency shifting a well-constructed prototype low-pass filter h[t]. Polyphase decomposition groups the analyzing prototype filter h[t] into M subsequences prior to Fourier transformation. This segmented representation allows rearrangement of the filtering computations and increases the speed of processing approximately M-fold. The output signal is then reconstructed using a synthesis bank containing the inverse DFT matrix and the reconstruction matrix.
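For illustration only, the Python sketch below realizes a 2x-oversampled DFT-modulated filter bank in its simplest, unoptimized form: a windowed short-time DFT with weighted overlap-add, with M = m channels and decimation N = m/2. The square-root periodic Hann window standing in for the prototype h[t], and all function names, are assumptions. It computes each DFT directly rather than with the polyphase rearrangement described above, so it shows the analysis/synthesis structure and the reconstruction property, not the M-fold speedup.

```python
import cmath
import math


def _sqrt_hann(m):
    # Square root of a periodic Hann window: w[n] + w[n + m/2] = 1, so using it for
    # both analysis and synthesis gives unit overlap-add gain at hop = m/2.
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / m)) for n in range(m)]


def _dft(frame):
    m = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
            for k in range(m)]


def _idft(spec):
    m = len(spec)
    # Real input frames have conjugate-symmetric spectra, so the result is real
    # up to rounding; only the real part is kept.
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * n / m) for k in range(m)) / m).real
            for n in range(m)]


def analysis(x, m=16):
    """Split x into m DFT channels per frame, decimated by N = m // 2 (m must be even)."""
    hop = m // 2
    win = _sqrt_hann(m)
    xp = [0.0] * hop + list(x) + [0.0] * (hop + (-len(x)) % hop)
    return [_dft([xp[s + n] * win[n] for n in range(m)])
            for s in range(0, len(xp) - m + 1, hop)]


def synthesis(frames, n_out, m=16):
    """Inverse DFT each frame, window, and overlap-add; trims the analysis padding."""
    hop = m // 2
    win = _sqrt_hann(m)
    y = [0.0] * (len(frames) * hop + m)
    for f, spec in enumerate(frames):
        for n, v in enumerate(_idft(spec)):
            y[f * hop + n] += v * win[n]
    return y[hop:hop + n_out]


x = [math.sin(0.3 * n) for n in range(50)]
y = synthesis(analysis(x), len(x))
```

With no intermediate processing, `y` matches `x` to within rounding error. Selective enhancement would scale chosen bins of each frame between the two calls; scaling bin k and bin m - k by the same real factor keeps the output real. For 100-Hz channels, m would be the sample rate divided by 100; the small m here only keeps the direct DFT cheap.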

(22) Referring again to FIGS. 3 and 5, at process block 74, the channels are combined together with phase information to yield a selectively contrast-enhanced signal, y(t). Because, as described above, the block Toeplitz submatrix may be constructed such that the spectral channels that remain unprocessed are instantiated by an identity submatrix, the selectively contrast-enhanced signal, y(t), is reconstructed by combining the plurality of enhanced signals with the unenhanced signals.

(23) Specifically, referring to FIGS. 5a-5c and the above-described selective application of the enhancement algorithm, it can be seen that the contrast-enhanced signal, y(t), can be highly controlled such that enhancement is applied only where desired. The block Toeplitz submatrix described above is illustrated as having been flipped in FIG. 5a. A synthetic acoustic signal decomposed into subbands is shown unprocessed in FIG. 5b and after selective processing with the above-described modified Toeplitz matrix in FIG. 5c. As a comparison of FIGS. 5b and 5c shows, channels 1 to 30 remain unchanged, while channels 31 to 100 are significantly sharpened as a consequence of contrast enhancement. Thus, by recognizing that impairment rarely extends across the entire frequency range of hearing and by providing a highly controllable mechanism for enhancement, the present invention can restrict contrast enhancement to only the pathological channels. Most commonly, hearing loss is most severe at higher frequencies, although listeners can have selective losses at other frequencies. The present invention allows selection and user adjustment of the areas that are to be enhanced and those that will remain unenhanced.
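The text does not state which frequencies channels 1 to 30 and 31 to 100 cover. Assuming, as one possibility, 100 contiguous 100-Hz channels starting at 0 Hz (the bandwidth example given for process block 64), the hypothetical Python helper below selects the channels whose center frequencies fall inside a chosen pathological range. The 3 kHz lower edge in the usage line is an illustrative value that happens to reproduce the figure's split; it is not taken from the text.

```python
def select_channels(n_channels, bandwidth_hz, lo_hz, hi_hz):
    """Return 1-based indices of channels whose center frequency lies in [lo_hz, hi_hz).

    Assumed layout: channel k spans [(k - 1) * bw, k * bw), center (k - 0.5) * bw.
    """
    chosen = []
    for k in range(1, n_channels + 1):
        center = (k - 0.5) * bandwidth_hz
        if lo_hz <= center < hi_hz:
            chosen.append(k)
    return chosen


# Enhance everything from 3 kHz up; channels below stay unprocessed.
enhanced = select_channels(100, 100.0, 3000.0, 10000.0)
```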

(24) The above-described systems and methods for selective contrast enhancement may be coupled with a variety of additional processing techniques. For example, nonlinear frequency compression remaps high-frequency information above a certain start frequency into a smaller bandwidth, while leaving frequencies below the start frequency unaltered. This represents an advance in hearing aid processing. One limitation of this technology is that spectral contrast between peaks in the spectrum is reduced, thereby exacerbating the already limited spectral resolution of the impaired cochlea. Pre- or post-processing frequency-compressed speech with the above-described selective CE systems and methods can help overcome some of this reduction in spectral contrast, and allows the areas of compression and of remapped high-frequency information to be selected for enhancement without disturbing areas of the spectrum that an impaired individual can process substantially normally.
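One common way to describe nonlinear frequency compression, used here as an assumption since the text gives no formula, is an identity mapping below the start frequency and a fixed compression ratio on a logarithmic frequency scale above it. The Python sketch below implements that mapping; the function name and parameters are hypothetical.

```python
def nfc_map(f_in_hz, start_hz, ratio):
    """Map an input frequency to an output frequency: unchanged below start_hz,
    compressed by `ratio` on a logarithmic frequency scale above it."""
    if f_in_hz <= start_hz:
        return f_in_hz
    return start_hz * (f_in_hz / start_hz) ** (1.0 / ratio)


# With a 2 kHz start and ratio 2, the 2-8 kHz band is squeezed into 2-4 kHz.
top = nfc_map(8000.0, 2000.0, 2.0)
```

Under this mapping, the remapped band between the start frequency and the mapped upper edge is one natural candidate for the enhanced channel subset when the two techniques are combined; that pairing is an illustrative assumption.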

(25) Similarly, because many sources of background noise tend to be stationary and because the present CE algorithm attenuates static spectral features, noise reduction is a natural byproduct of the processing that could augment or replace existing noise reduction strategies (e.g., spectral subtraction). Along similar lines, a persistent spectral peak associated with acoustic feedback in hearing aids could be eliminated with the CE algorithm, which could replace other, less desirable feedback cancellation strategies such as notch filtering and a reduction in much-needed high-frequency gain. Again, the above-described selective CE systems and methods allow some areas of the spectrum to be selected for processing while others remain substantially unprocessed.

(26) Thus, the present invention recognizes that impairment rarely extends across the entire frequency range of hearing. Rather, hearing loss is typically most severe at specific frequencies, most commonly the higher frequencies, although listeners can have selective losses at other frequencies. Similarly, the present invention recognizes that normal receivers rarely benefit from enhancements or the like applied across the full listening spectrum; for example, such enhancement processing often introduces distortion. With this recognition in place, the present invention provides a system and method to restrict contrast enhancement to, for example, only pathological channels or other designated channels that can benefit from enhancement without that benefit being overridden by distortion or other negative effects.

(27) It is understood that the present invention is not limited to the specific applications and embodiments illustrated and described herein, but embraces such modified forms thereof as come within the scope of the following claims.

CPC classification

G10L2021/03643 (PHYSICS)

H04R25/505 (ELECTRICITY)

G10L25/90 (PHYSICS)

International classification

H04R25/00 (ELECTRICITY)
