G10L2021/02168

Method, apparatus and server for processing noisy speech

According to an embodiment, a power spectrum iteration factor is determined from the noisy speech and the background noise, and a moving-average power spectrum of the speech is obtained using the power spectrum iteration factor. A server is thereby able to track the noisy speech according to the power spectrum iteration factor.
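
A minimal sketch of such a recursive moving average, assuming the iteration factor is a per-bin smoothing coefficient derived from the ratio of noisy-speech power to background-noise power. The mapping `alpha_from_snr`, its constants, and both function names are hypothetical and are not taken from the patent.

```python
def alpha_from_snr(noisy_power, noise_power, alpha_min=0.3, alpha_max=0.95):
    """Hypothetical iteration factor: high SNR -> small alpha (fast tracking),
    low SNR -> alpha near alpha_max (heavy smoothing)."""
    snr = noisy_power / max(noise_power, 1e-12)
    # 1 / (1 + snr) lies in (0, 1] for snr >= 0, so alpha stays in [alpha_min, alpha_max].
    return alpha_min + (alpha_max - alpha_min) / (1.0 + snr)


def update_moving_average(prev_psd, noisy_psd, noise_psd):
    """One recursive update per frequency bin: P = a * P_prev + (1 - a) * |Y|^2."""
    out = []
    for p_prev, y, n in zip(prev_psd, noisy_psd, noise_psd):
        a = alpha_from_snr(y, n)
        out.append(a * p_prev + (1.0 - a) * y)
    return out
```

Calling `update_moving_average` once per frame, feeding back its output, gives the smoothed speech power spectrum. Bins with strong speech follow the input quickly, while noise-dominated bins change slowly.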

METHOD AND ARRANGEMENT FOR CONTROLLING SMOOTHING OF STATIONARY BACKGROUND NOISE
20180075854 · 2018-03-15

In a method for coding of information for enhancing a background noise representation, voice activity of an input speech signal is determined. A noisiness parameter is determined for an inactive speech signal, wherein the noisiness parameter is based on a ratio of prediction gains of two Linear Predictive Coder (LPC) prediction filters with different orders. The noisiness parameter is quantized, and the quantized noisiness parameter is encoded for transmission.
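
The noisiness parameter can be sketched as follows, assuming the prediction gain of an order-p LPC filter is r[0] divided by its Levinson-Durbin prediction-error energy. The orders (2 and 16), the 3-bit uniform quantizer, and all function names are illustrative assumptions, not values from the patent.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_error(r, order):
    """Levinson-Durbin recursion for A(z) = 1 + sum a_i z^-i.
    Returns the prediction-error energy of the order-`order` predictor."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 1e-12 * r[0]:
            break  # already (numerically) perfectly predictable
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return err


def prediction_gain(r, order):
    if r[0] <= 0.0:
        return 1.0  # silent frame: no predictability to measure
    return r[0] / max(levinson_error(r, order), 1e-12 * r[0])


def noisiness(frame, low_order=2, high_order=16):
    """Ratio of low-order to high-order prediction gain, in (0, 1].
    Close to 1 when the extra taps add little predictability (noise-like input)."""
    r = autocorr(frame, high_order)
    return prediction_gain(r, low_order) / prediction_gain(r, high_order)


def quantize(value, bits=3):
    """Hypothetical uniform quantizer of the [0, 1] parameter to an integer index."""
    levels = (1 << bits) - 1
    return round(min(max(value, 0.0), 1.0) * levels)


# Usage: a white-noise frame should score higher than a two-tone frame.
rng = random.Random(1)
noise_frame = [rng.gauss(0.0, 1.0) for _ in range(320)]
tone_frame = [math.sin(0.3 * n) + 0.5 * math.sin(0.8 * n) for n in range(320)]
q_noise = quantize(noisiness(noise_frame))
q_tone = quantize(noisiness(tone_frame))
```

With the autocorrelation method, the reflection coefficients satisfy |k| ≤ 1, so the error energy never grows with order. The ratio therefore stays at or below 1.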

Headset with end-firing microphone array and automatic calibration of end-firing array
09860634 · 2018-01-02

In one embodiment of the invention, two microphones are attached to the ear cup and configured as an end-firing array. The end-firing array suppresses unwanted sounds using an adaptive spectral method and spectral subtraction. A second embodiment provides automatic calibration of the end-firing microphone array.
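
The two building blocks named in the abstract can be sketched as below. The first is a first-order delay-and-subtract end-fire beam that nulls sound arriving from behind. The second is magnitude spectral subtraction with a spectral floor. The abstract does not describe the adaptive part, so the integer delay, the noise-magnitude estimate supplied by the caller, and the `over_sub` and `floor` parameters are assumptions.

```python
def delay_and_subtract(front, rear, delay):
    """First-order end-fire (differential) output: y[n] = front[n] - rear[n - delay].
    Assumes an integer sample delay equal to the acoustic travel time between mics,
    which places a null toward sound arriving from the rear."""
    out = []
    for n, f in enumerate(front):
        r = rear[n - delay] if n - delay >= 0 else 0.0
        out.append(f - r)
    return out


def spectral_subtract(mag, noise_mag, over_sub=1.0, floor=0.05):
    """Per-bin magnitude spectral subtraction with a floor to limit musical noise."""
    return [max(m - over_sub * nm, floor * m) for m, nm in zip(mag, noise_mag)]
```

A rear source reaches the rear microphone first and the front microphone `delay` samples later, so the subtraction cancels it. A frontal source arrives in the opposite order and passes through.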

Audio signal processing device, audio signal processing method, and recording medium storing a program
09847097 · 2017-12-19

An audio signal processing device that includes: a processor configured to execute a procedure, the procedure comprising: detecting a speech segment of an audio signal; suppressing noise in the audio signal; and adjusting an amount of suppression of noise such that the amount of suppression during a specific period, which starts from a position based on a terminal end of the detected speech segment and is a period shorter than a period spanning from the terminal end of the detected speech segment to a starting end of a next speech segment, becomes greater than in other segments, and a memory configured to store audio signals before and after noise suppression and the amount of suppression before and after adjustment.
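
A frame-based sketch of the suppression adjustment follows. Noise suppression is raised for a period that begins at the terminal end of each detected speech segment and is kept strictly shorter than the gap to the next segment. The dB values, the tail length, the handling of the final segment, and the function names are hypothetical.

```python
def find_segments(vad):
    """Return (start, end) frame indices of speech segments; end is exclusive."""
    segs, start = [], None
    for i, v in enumerate(vad):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(vad)))
    return segs


def suppression_schedule(vad, base_db=10.0, extra_db=6.0, tail_frames=5):
    """Per-frame suppression amount in dB, boosted right after each speech segment."""
    amount = [base_db] * len(vad)
    segs = find_segments(vad)
    for idx, (_, end) in enumerate(segs):
        if idx + 1 < len(segs):
            # Strictly shorter than the gap to the next speech segment.
            length = min(tail_frames, segs[idx + 1][0] - end - 1)
        else:
            # Assumption: after the last segment, cap only at the end of the signal.
            length = min(tail_frames, len(vad) - end)
        for f in range(end, end + max(length, 0)):
            amount[f] += extra_db
    return amount
```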

SPATIO-TEMPORAL SPEECH ENHANCEMENT TECHNIQUE BASED ON GENERALIZED EIGENVALUE DECOMPOSITION

The present invention describes a speech enhancement method using microphone arrays and a new iterative technique for enhancing noisy speech signals in low signal-to-noise-ratio (SNR) environments. A first embodiment processes the observed noisy speech in both the spatial and temporal domains to enhance the desired speech component, and uses an iterative technique to compute the generalized eigenvectors of the multichannel data derived from the microphone array. All processing is performed on the spatio-temporal correlation coefficient sequence of the observed data in order to avoid large matrix-vector multiplications. A further embodiment relates to a two-stage speech enhancement system. In the first stage, the noise component of the observed signal is whitened; in the second stage, a spatio-temporal power method is used to extract the most dominant speech component. In both stages, the filters are adapted using the multichannel spatio-temporal correlation coefficients of the data, which again avoids large matrix-vector multiplications.
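
The whitening-plus-power-method idea can be illustrated in its textbook matrix form. Power iteration on Rn^{-1} Ry converges to the same dominant generalized eigenvector as whitening the noise with a Cholesky factor of Rn and then running the ordinary power method. This sketch uses explicit small matrices, which is exactly what the patent's correlation-coefficient formulation avoids, so it shows the math rather than the patented efficient variant. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dominant_generalized_eigvec(ry, rn, iters=200):
    """Power iteration on Rn^{-1} Ry. Converges to the filter w that maximizes
    the output SNR  w^T Ry w / w^T Rn w  (the dominant generalized eigenvector)."""
    w = [1.0] * len(ry)
    for _ in range(iters):
        w = solve(rn, matvec(ry, w))
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    num = sum(a * b for a, b in zip(w, matvec(ry, w)))
    den = sum(a * b for a, b in zip(w, matvec(rn, w)))
    return w, num / den
```

Here Ry stands for the noisy-speech correlation matrix and Rn for the noise correlation matrix. The returned ratio is the generalized eigenvalue, which equals the SNR achieved by the filter w.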

Methods and systems for providing consistency in noise reduction during speech and non-speech periods
09812149 · 2017-11-07

Methods and systems for providing consistency in noise reduction during speech and non-speech periods are provided. First and second signals are received. The first signal includes at least a voice component. The second signal includes at least the voice component modified by human tissue of a user. First and second weights may be assigned per subband to the first and second signals, respectively. The first and second signals are processed to obtain respective first and second full-band power estimates. During periods when the user's speech is not present, the first weight and the second weight are adjusted based at least partially on the first full-band power estimate and the second full-band power estimate. The first and second signals are blended based on the adjusted weights to generate an enhanced voice signal. The second signal may be aligned with the first signal prior to the blending.
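
A simplified sketch of the weight adjustment and blending follows. Weights are frozen while the user is speaking. During non-speech periods they are recomputed from recursively smoothed full-band power estimates of the two signals. The specific rule (weight each signal inversely to its noise-only power), the single weight shared by all subbands, and the class name are assumptions for illustration. The alignment step is omitted.

```python
class ConsistentBlender:
    """Hypothetical blender for a first (air-conducted) signal x1 and a second
    (tissue-conducted) signal x2, each given as a list of subband samples."""

    def __init__(self, num_bands, smoothing=0.9):
        self.w1 = [0.5] * num_bands  # per-subband weight of x1; x2 gets 1 - w1
        self.p1 = self.p2 = None     # full-band power estimates
        self.smoothing = smoothing

    def _smooth(self, prev, new):
        return new if prev is None else self.smoothing * prev + (1 - self.smoothing) * new

    def process(self, x1, x2, speech_present):
        self.p1 = self._smooth(self.p1, sum(abs(v) ** 2 for v in x1))
        self.p2 = self._smooth(self.p2, sum(abs(v) ** 2 for v in x2))
        if not speech_present:
            # Assumed rule: during noise-only periods, favour the signal whose
            # full-band power (i.e. noise power) is lower.
            target = self.p2 / (self.p1 + self.p2 + 1e-12)
            self.w1 = [target] * len(self.w1)
        return [w * a + (1 - w) * b for w, a, b in zip(self.w1, x1, x2)]
```

In this sketch, weights learned while speech is absent carry over unchanged into speech periods. That keeps the blend, and hence the residual noise, consistent across both.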

SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR ADAPTIVE FORMANT SHARPENING IN LINEAR PREDICTION CODING
20170301364 · 2017-10-19

An apparatus includes a first calculator configured to determine a long-term noise estimate of an audio signal. The apparatus also includes a second calculator configured to determine a formant-sharpening factor based on the determined long-term noise estimate. The apparatus includes a filter configured to filter a codebook vector to generate a filtered codebook vector. The filter is based on the determined formant-sharpening factor, and the codebook vector is based on information from the audio signal. The apparatus further includes an audio coder configured to generate a formant-sharpened low-band excitation signal based on the filtered codebook vector.
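
The three steps can be sketched as follows: a long-term noise level updated only on inactive frames, a mapping from that level to a sharpening factor, and a pole-zero filter H(z) = A(z/γ1)/A(z/γ2) applied to the codebook vector. The filter shape follows the common LPC postfilter form. The dB thresholds (which assume 16-bit sample scale), the direction and range of the noise-to-γ2 mapping, and all function names are hypothetical.

```python
import math


def update_noise_estimate(noise_db, frame, active, beta=0.98):
    """Long-term noise level in dB, updated only on frames without speech activity."""
    if active:
        return noise_db
    energy = sum(s * s for s in frame) / max(len(frame), 1)
    level = 10.0 * math.log10(energy + 1e-12)
    return beta * noise_db + (1.0 - beta) * level


def sharpening_factor(noise_db, lo_db=20.0, hi_db=60.0, g_lo=0.9, g_hi=0.75):
    """Hypothetical linear map from noise level to the denominator factor gamma2."""
    t = min(max((noise_db - lo_db) / (hi_db - lo_db), 0.0), 1.0)
    return g_lo + t * (g_hi - g_lo)


def formant_sharpen(code, lpc, gamma1=0.75, gamma2=0.9):
    """Filter a codebook vector with H(z) = A(z/gamma1) / A(z/gamma2),
    where A(z) = 1 + sum_i lpc[i-1] z^-i. gamma2 == gamma1 means no sharpening."""
    num = [a * gamma1 ** (i + 1) for i, a in enumerate(lpc)]
    den = [a * gamma2 ** (i + 1) for i, a in enumerate(lpc)]
    out = []
    for n, x in enumerate(code):
        y = x
        for i in range(1, min(n, len(lpc)) + 1):
            y += num[i - 1] * code[n - i] - den[i - 1] * out[n - i]
        out.append(y)
    return out
```

With these assumed defaults, a louder long-term noise floor moves γ2 toward γ1, so the filter sharpens formants less. The patent abstract only states that the factor depends on the noise estimate, not in which direction.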

METHOD AND SYSTEM FOR MULTIPLE TIME RESOLUTION AUDIO PROCESSING
20250061911 · 2025-02-20

Aspects of the present disclosure provide a method for voice control that includes transforming a multichannel audio stream into a complex-valued frequency-domain representation, using a short-time Fourier transform (STFT) applied to data in each window aligned across each input channel of the multichannel audio stream. For a current window, the method further includes: updating a first complex-valued covariance matrix corresponding to a slowly-adapting beamformer and forming a single-channel denoised estimate for each frequency band in the STFT; calculating a voice activity detection (VAD) estimate for each frequency band in the STFT by comparing a magnitude of the single-channel denoised estimate to a magnitude of each input channel of the multichannel audio stream; and selectively updating or refraining from updating a second complex-valued covariance matrix, corresponding to a quickly-adapting beamformer, responsive to the VAD estimate respectively indicating a presence or an absence of speech.
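
A per-window sketch of the two-rate covariance scheme follows. The slow covariance is updated every window. A per-band VAD compares the denoised magnitude with each input channel's magnitude, and the fast covariance is updated only in bands flagged as speech. The forgetting factors and VAD threshold are assumptions. The stand-in `mean_beam` (a plain channel average) is hypothetical and takes the place of a beamformer that would actually be derived from the slow covariance.

```python
def outer_update(cov, x, lam):
    """Recursive complex covariance update R <- lam * R + (1 - lam) * x x^H."""
    n = len(x)
    return [[lam * cov[i][j] + (1 - lam) * x[i] * x[j].conjugate() for j in range(n)]
            for i in range(n)]


def vad_bin(denoised, channels, ratio=0.5):
    """Speech if |denoised| >= ratio * |x_c| for every channel c (assumed rule)."""
    ref = max(abs(c) for c in channels)
    return ref > 0 and abs(denoised) >= ratio * ref


def mean_beam(cov, x):
    """Hypothetical stand-in beamformer: ignores cov and averages the channels."""
    return sum(x) / len(x)


def process_window(stft_bins, denoise, slow, fast, lam_slow=0.999, lam_fast=0.9):
    """stft_bins[k] holds the complex STFT value of every channel in band k."""
    vad = []
    for k, x in enumerate(stft_bins):
        slow[k] = outer_update(slow[k], x, lam_slow)  # slow beamformer: always adapts
        d = denoise(slow[k], x)                        # single-channel denoised estimate
        speech = vad_bin(d, x)
        vad.append(speech)
        if speech:                                     # fast beamformer: speech-only update
            fast[k] = outer_update(fast[k], x, lam_fast)
    return vad


def zeros():
    return [[0j, 0j], [0j, 0j]]


# Usage with two channels and two bands (values made up for illustration).
slow_cov = [zeros(), zeros()]
fast_cov = [zeros(), zeros()]
bins = [[1 + 0j, 1 + 0j],    # coherent across microphones
        [1 + 0j, -1 + 0j]]   # cancels in the channel average
flags = process_window(bins, mean_beam, slow_cov, fast_cov)
```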

COMFORT NOISE GENERATION APPARATUS AND METHOD
20170092281 · 2017-03-30

A comfort noise generation apparatus comprising near-end and far-end speech detectors, arranged to detect speech activity in the near-end and far-end signals respectively, and a comfort noise generator. Responsive to an indication from the near-end speech detector that speech activity is absent on the near-end signal and an indication from the far-end speech detector that speech activity is absent on the far-end signal, the comfort noise generator is arranged to initiate estimation of the near-end background noise. Responsive to an indication from the near-end speech detector that speech activity is present on the near-end signal or an indication from the far-end speech detector that speech activity is present on the far-end signal, the comfort noise generator is arranged to terminate the estimation of the near-end background noise. The comfort noise generator is arranged to output a function of the near-end background noise estimate.
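
The start/stop logic can be sketched as follows. The near-end noise estimate is updated only while both detectors report no speech, and the output is white Gaussian noise scaled to the estimate. That output is just one possible "function of the near-end background noise estimate". The smoothing constant, seed, and class name are hypothetical.

```python
import random


class ComfortNoiseGenerator:
    """Hypothetical sketch: estimate near-end noise only when both ends are silent."""

    def __init__(self, smoothing=0.95, seed=0):
        self.noise_power = None   # near-end background-noise power estimate
        self.estimating = False
        self.smoothing = smoothing
        self.rng = random.Random(seed)

    def update(self, near_frame, near_speech, far_speech):
        # Initiate estimation when both signals lack speech; terminate otherwise.
        self.estimating = not near_speech and not far_speech
        if self.estimating:
            p = sum(s * s for s in near_frame) / len(near_frame)
            if self.noise_power is None:
                self.noise_power = p
            else:
                self.noise_power = self.smoothing * self.noise_power + (1 - self.smoothing) * p

    def generate(self, n):
        """White Gaussian noise with the estimated power (silence if no estimate yet)."""
        if self.noise_power is None:
            return [0.0] * n
        std = self.noise_power ** 0.5
        return [self.rng.gauss(0.0, std) for _ in range(n)]
```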