Patent classifications
G10L21/0264
SPEECH ENHANCEMENT APPARATUS, LEARNING APPARATUS, METHOD AND PROGRAM THEREOF
A mask to enhance speech emitted from a speaker is estimated from an observation signal, the mask is applied to the observation signal, and thereby a post-mask speech signal is acquired. The mask is estimated from a feature obtained by combining a feature for speaker recognition extracted from the observation signal and a feature for generalized mask estimation extracted from the observation signal.
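As a rough illustration of the flow in this abstract, the sketch below concatenates a speaker-recognition feature with a generalized mask-estimation feature, maps the combined feature to a mask, and multiplies the mask into one frame of the observation. The single sigmoid layer, the names `estimate_mask` and `apply_mask`, and the `weights`/`bias` parameters are all assumptions made for the example; the abstract does not say what kind of estimator is used.

```python
import math


def estimate_mask(speaker_feat, general_feat, weights, bias):
    """Toy mask estimator (hypothetical stand-in for a trained model).

    Concatenates the two feature vectors, then applies one sigmoid layer
    so every mask value lies in (0, 1).
    """
    combined = list(speaker_feat) + list(general_feat)
    mask = []
    for w_row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(w_row, combined)) + b
        mask.append(1.0 / (1.0 + math.exp(-z)))
    return mask


def apply_mask(mask, observation):
    """Element-wise product of the mask and one frame of the observation."""
    return [m * x for m, x in zip(mask, observation)]


# Usage: two frequency bins, a 1-dim speaker feature and a 2-dim general feature.
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # assumed, untrained weights
bias = [0.0, 0.0]
mask = estimate_mask([0.3], [0.1, 0.7], weights, bias)
post_mask = apply_mask(mask, [2.0, 4.0])
```

With all-zero weights the mask is 0.5 in every bin, so the example only shows the data flow, not a useful enhancement.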
SPEECH ENHANCEMENT METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
A speech enhancement method includes: determining a glottal parameter corresponding to a target speech frame according to a frequency domain representation of the target speech frame; determining a gain corresponding to the target speech frame according to a gain corresponding to a historical speech frame of the target speech frame; determining an excitation signal corresponding to the target speech frame according to the frequency domain representation of the target speech frame; and synthesizing the glottal parameter corresponding to the target speech frame, the gain corresponding to the target speech frame, and the excitation signal corresponding to the target speech frame, to obtain an enhanced speech signal corresponding to the target speech frame.
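The final synthesis step can be pictured as a source-filter model. The sketch below assumes, purely for illustration, that the glottal parameter is a set of all-pole (linear-prediction) coefficients and that synthesis means driving that filter with the gain-scaled excitation. The abstract does not state this, and the function name `synthesize_frame` and its arguments are hypothetical.

```python
def synthesize_frame(lpc_coeffs, gain, excitation, history=None):
    """All-pole synthesis: s[n] = gain * e[n] - sum_k a_k * s[n - k].

    lpc_coeffs : assumed glottal parameters a_1..a_p
    gain       : scalar gain for the frame
    excitation : excitation samples e[n] for the frame
    history    : last p output samples of the previous frame, newest first
    """
    order = len(lpc_coeffs)
    past = list(history) if history is not None else [0.0] * order
    out = []
    for e in excitation:
        sample = gain * e - sum(a * p for a, p in zip(lpc_coeffs, past))
        out.append(sample)
        # Keep the newest sample first so past[k - 1] holds s[n - k].
        past = ([sample] + past)[:order]
    return out


# Usage: a first-order filter excited by a single impulse.
frame = synthesize_frame([-0.5], 2.0, [1.0, 0.0, 0.0])
```

How the glottal parameter, the gain, and the excitation are predicted from the frequency-domain representation and from historical frames is outside this sketch.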
Method and System for Dereverberation of Speech Signals
A system and method for reverberation reduction are disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that includes the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is then estimated. The filter, when applied to the first estimate of the target direct-path signal, generates a result closest, according to a distance function, to a residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
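The filter-estimation and subtraction steps can be sketched in a deliberately reduced form. The example below assumes a squared-error distance function and a filter with a single tap at a known delay, so the least-squares solution has a closed form. The real filter would have many taps, and the first estimate would come from the first DNN rather than being passed in by hand. All function names here are hypothetical.

```python
def delayed(signal, delay):
    """Shift a signal right by `delay` samples (0 <= delay <= len), zero-padded."""
    return [0.0] * delay + list(signal[:len(signal) - delay])


def fit_single_tap(estimate, mixture, delay):
    """Gain g minimising ||(mixture - estimate) - g * delayed(estimate)||^2."""
    residual = [y - s for y, s in zip(mixture, estimate)]
    shifted = delayed(estimate, delay)
    energy = sum(x * x for x in shifted)
    if energy == 0.0:
        return 0.0
    return sum(r * x for r, x in zip(residual, shifted)) / energy


def reduce_reverb(estimate, mixture, delay):
    """Subtract the filtered first estimate from the received mixture."""
    g = fit_single_tap(estimate, mixture, delay)
    shifted = delayed(estimate, delay)
    return [y - g * x for y, x in zip(mixture, shifted)]


# Usage: a direct-path signal plus one echo at delay 2 with gain 0.5.
direct = [1.0, 0.0, 2.0, 0.0, 0.0]
mixture = [d + 0.5 * e for d, e in zip(direct, delayed(direct, 2))]
gain = fit_single_tap(direct, mixture, 2)
cleaned = reduce_reverb(direct, mixture, 2)
```

In the described system, `cleaned` would then be passed to the second DNN to produce the second estimate.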
Methods and apparatus to reduce noise from harmonic noise sources
Methods, apparatus, systems and articles of manufacture are disclosed to reduce noise from harmonic noise sources. An example apparatus includes at least one memory and at least one processor to execute computer readable instructions to at least: determine a first amplitude value of a frequency component in a frequency spectrum of an audio sample; determine a set of points in the frequency spectrum having at least one of (a) amplitude values within an amplitude threshold of the first amplitude value, (b) frequency values within a frequency threshold of the first amplitude value, or (c) phase values within a phase threshold of the first amplitude value; increment a counter when a distance between (1) a second amplitude value in the set of points and (2) the first amplitude value satisfies a distance threshold; and when the counter satisfies a counter threshold, generate a contour trace based on the set of points.
Detection and removal of wind noise
An electronic device includes one or more microphones that generate audio signals and a wind noise detection subsystem. The electronic device may also include a wind noise reduction subsystem. The wind noise detection subsystem applies multiple wind noise detection techniques to the set of audio signals to generate corresponding indications of whether wind noise is present. The wind noise detection subsystem determines whether wind noise is present based on the indications generated by each detection technique and generates an overall indication of whether wind noise is present. The wind noise reduction subsystem applies one or more wind noise reduction techniques to the audio signal if wind noise is detected. The wind noise detection and reduction techniques may work in multiple domains (e.g., the time, spatial, and frequency domains).
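The detect-then-reduce control flow can be sketched as follows. The abstract does not say how the per-technique indications are merged, so the majority vote below is an assumption, as are the names `combine_indications` and `process_audio` and the idea of passing detectors and reducers as plain callables.

```python
def combine_indications(indications, min_votes=None):
    """Merge per-technique wind indications into one overall decision.

    Assumed rule: a simple majority, unless `min_votes` is given.
    """
    votes = sum(1 for flag in indications if flag)
    if min_votes is None:
        min_votes = len(indications) // 2 + 1
    return votes >= min_votes


def process_audio(audio, detectors, reducers):
    """Run every detector, then apply the reducers only if wind is detected."""
    wind_present = combine_indications([detect(audio) for detect in detectors])
    if wind_present:
        for reduce in reducers:
            audio = reduce(audio)
    return wind_present, audio


# Usage with placeholder detectors and one placeholder reducer.
detectors = [lambda a: True, lambda a: True, lambda a: False]
reducers = [lambda a: [0.5 * x for x in a]]
detected, output = process_audio([1.0, -2.0], detectors, reducers)
```

Real detectors would each inspect the signal in a different domain (time, spatial, or frequency); the lambdas here only stand in for them.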
APPROACH FOR DETECTING ALERT SIGNALS IN CHANGING ENVIRONMENTS
In an audio system, an audio signal is preprocessed to provide an input signal to a fast detector and a slow detector, the input signal comprising alert signals and ambient sounds. The slow detector determines the ambient sound level of the input signal which is output to an alert signal detector. The alert signal detector uses the ambient sound level to compute an adaptive threshold level using an adaptive threshold function. The fast detector determines the envelope level of the input signal which is output to the alert signal detector. The alert signal detector compares the envelope level to the adaptive threshold level to determine if an alert signal is present in the input signal. The adaptive threshold level varies depending on the ambient sound level of the input signal and the alert signal detection of the audio system automatically adapts to changing acoustic environments having different ambient sound levels.
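A minimal sketch of the fast/slow detector arrangement is given below. It assumes both detectors are one-pole smoothers of the rectified input and that the adaptive threshold function is linear in the ambient level. The coefficients, `factor`, `floor`, and the function names are illustrative choices rather than values from the abstract.

```python
def one_pole(level, magnitude, coeff):
    """One smoothing step; a larger `coeff` tracks the input faster."""
    return level + coeff * (magnitude - level)


def detect_alerts(samples, fast_coeff=0.5, slow_coeff=0.01,
                  factor=3.0, floor=0.05):
    """Return one flag per sample: True where an alert is detected."""
    envelope = 0.0  # fast detector output (envelope level)
    ambient = 0.0   # slow detector output (ambient sound level)
    flags = []
    for x in samples:
        magnitude = abs(x)
        envelope = one_pole(envelope, magnitude, fast_coeff)
        ambient = one_pole(ambient, magnitude, slow_coeff)
        # Assumed adaptive threshold function: linear in the ambient level.
        threshold = factor * ambient + floor
        flags.append(envelope > threshold)
    return flags


# Usage: quiet background followed by a short loud burst.
signal = [0.01] * 200 + [1.0] * 5
flags = detect_alerts(signal)
```

Because the threshold follows the slowly varying ambient level, the same burst would need to be louder to trigger a detection in a noisier environment.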