G10L25/12

METHOD OF ACCESSING A DIAL-UP SERVICE
20170287488 · 2017-10-05

A method of accessing a dial-up service is disclosed. An example method of providing access to a service includes receiving a first speech signal from a user to form a first utterance; recognizing the first utterance using speaker-independent speaker recognition; requesting the user to enter a personal identification number; and, when the personal identification number is valid, receiving a second speech signal to form a second utterance and providing access to the service.
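
The claimed sequence of steps could be sketched as follows; every function name here is a hypothetical stand-in (the abstract does not specify the recognizer, PIN prompt, or validation logic), so this is an illustration of the flow, not the patented implementation:

```python
# Hedged sketch of the claimed call flow. `get_speech`, `recognize`,
# `ask_pin`, and `pin_is_valid` are hypothetical callables supplied by
# the caller; none of them come from the patent itself.

def access_service(get_speech, recognize, ask_pin, pin_is_valid):
    first_utterance = recognize(get_speech())   # speaker-independent recognition
    if not pin_is_valid(ask_pin()):
        return None                             # invalid PIN: no access
    second_utterance = recognize(get_speech())  # second utterance after valid PIN
    return ("access granted", first_utterance, second_utterance)
```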

FREQUENCY DOMAIN PARAMETER SEQUENCE GENERATING METHOD, ENCODING METHOD, DECODING METHOD, FREQUENCY DOMAIN PARAMETER SEQUENCE GENERATING APPARATUS, ENCODING APPARATUS, DECODING APPARATUS, PROGRAM, AND RECORDING MEDIUM

The present invention reduces encoding distortion in frequency-domain encoding compared with conventional techniques, and obtains LSP parameters that correspond to the quantized LSP parameters of the preceding frame, to be used in time-domain encoding, from coefficients equivalent to the linear prediction coefficients resulting from frequency-domain encoding. Let p be an integer equal to or greater than 1, let a[1], a[2], . . . , a[p] be a linear prediction coefficient sequence obtained by linear prediction analysis of an audio signal in a predetermined time segment, and let ω[1], ω[2], . . . , ω[p] be a frequency-domain parameter sequence derived from that linear prediction coefficient sequence. An LSP linear transformation unit (300) takes the frequency-domain parameter sequence ω[1], ω[2], . . . , ω[p] as input and determines the value of each converted frequency-domain parameter ˜ω[i] (i=1, 2, . . . , p) in a converted frequency-domain parameter sequence ˜ω[1], ˜ω[2], . . . , ˜ω[p] through a linear transformation based on the relationship between the value of ω[i] and the values of one or more frequency-domain parameters adjacent to ω[i].
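
As an illustration of the kind of transformation described, here is a minimal sketch in which each converted parameter is a linear combination of ω[i] and its neighbours. The weighting `gamma` and the boundary values (0 and π at the band edges) are assumptions made for the example, not values defined by the invention:

```python
import math

# Sketch only: each converted parameter ~w[i] is formed from w[i] and its
# adjacent parameters. `gamma` and the band-edge constants are illustrative
# assumptions, not the patent's actual transform.

def lsp_linear_transform(omega, gamma=0.5):
    """Map a frequency-domain parameter sequence omega[0..p-1] to a
    converted sequence using each value and its neighbours."""
    p = len(omega)
    converted = []
    for i in range(p):
        left = omega[i - 1] if i > 0 else 0.0           # lower band edge
        right = omega[i + 1] if i < p - 1 else math.pi  # upper band edge
        # Linear transform based on the relationship between omega[i]
        # and the parameters adjacent to it.
        converted.append((1.0 - gamma) * omega[i] + gamma * 0.5 * (left + right))
    return converted
```

With `gamma=0.5`, each output value is pulled halfway toward the midpoint of its neighbours, which is one simple way a transform can depend on adjacent parameter values.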

NOISE CHARACTERIZATION AND ATTENUATION USING LINEAR PREDICTIVE CODING
20170272869 · 2017-09-21

Disclosed herein, among other things, are apparatus and methods for noise characterization and attenuation for hearing assistance devices. In various embodiments, a method of operating a hearing assistance device includes receiving an audio signal using a microphone of the hearing assistance device and identifying a transient in the audio signal. Linear predictive coding (LPC) is used to isolate speech segments and non-speech segments of the transient and fluctuating noise, and the non-speech segments of the transient and fluctuating noise are attenuated to reduce annoyance of the noise and maintain audibility of perceptually important transients in speech.
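
One way to picture the role of LPC here: frames an all-pole model predicts well behave like speech, while poorly predicted frames behave like transient or fluctuating noise. The sketch below is an illustration of that idea only, not the patented method; the LPC order, threshold, and attenuation gain are assumed values:

```python
# Illustrative sketch (not the patented algorithm): keep frames with high
# LPC prediction gain (speech-like), attenuate frames with low prediction
# gain (transient / fluctuating noise). Order, threshold, and attenuation
# factor are assumptions for the example.

def lpc_residual_energy(frame, order=4):
    """Levinson-Durbin recursion; returns the final prediction-error energy."""
    n = len(frame)
    r = [sum(frame[t] * frame[t + k] for t in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    e = r[0] if r[0] > 0 else 1e-12
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / e
        new_a = a[:]
        for j in range(1, i + 1):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        e *= max(1.0 - k * k, 1e-12)
    return e

def attenuate_transients(frames, order=4, threshold=2.0, atten=0.25):
    out = []
    for f in frames:
        r0 = sum(x * x for x in f) or 1e-12
        pred_gain = r0 / lpc_residual_energy(f, order)  # high for speech-like frames
        g = 1.0 if pred_gain >= threshold else atten
        out.append([g * x for x in f])
    return out
```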

Speech signal processing apparatus and method for enhancing speech intelligibility

A speech signal processing apparatus and a speech signal processing method for enhancing speech intelligibility are provided. The speech signal processing apparatus includes an input signal gain determiner to determine a gain of an input signal based on a harmonic characteristic of a voiced speech, a voiced speech output unit to output a voiced speech in which a harmonic component is preserved by applying the gain to the input signal, a linear predictive coefficient determiner to determine a linear predictive coefficient based on the voiced speech, and an unvoiced speech preserver to preserve an unvoiced speech of the input signal based on the linear predictive coefficient.
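
The abstract does not say how the gain is derived from "a harmonic characteristic of a voiced speech"; one plausible illustration, assumed here purely for the example, is a harmonicity ratio (energy at assumed harmonic bins over total energy) mapped to a boost:

```python
# Sketch only: a hypothetical gain determiner that boosts frames whose
# spectral energy is concentrated at harmonics of an assumed pitch bin.
# The harmonicity ratio and the linear gain mapping are assumptions for
# this example, not the patented rule.

def harmonic_gain(spectrum, f0_bin, max_gain=2.0):
    harmonic_energy = sum(spectrum[k] ** 2
                          for k in range(f0_bin, len(spectrum), f0_bin))
    total_energy = sum(a ** 2 for a in spectrum) or 1e-12
    ratio = harmonic_energy / total_energy       # 0..1, higher = more harmonic
    return 1.0 + (max_gain - 1.0) * ratio        # boost more-harmonic frames
```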

KALMAN FILTERING BASED SPEECH ENHANCEMENT USING A CODEBOOK BASED APPROACH

A hearing device for enhancing speech intelligibility, the hearing device includes: an input transducer for providing an input signal comprising a speech signal and a noise signal; a processing unit; an acoustic output transducer coupled to the processing unit, the acoustic output transducer configured to provide an audio output signal based on an output signal from the processing unit; wherein the processing unit is configured to determine one or more parameters of the input signal based on a codebook based approach (CBA) processing; and wherein the processing unit is configured to perform a Kalman filtering of the input signal based on the determined one or more parameters so that the output signal has enhanced speech intelligibility.
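
To make the division of labour concrete: the codebook-based step would supply model parameters, and the Kalman filter would then use them. The sketch below assumes a scalar state model and a trivial one-entry "codebook" of (process variance, measurement variance); both are illustrative stand-ins, not the patented models:

```python
# Sketch only: a scalar Kalman filter whose noise parameters (q, r) are
# looked up from a hypothetical codebook entry. A real CBA system would
# select these from trained speech/noise codebooks per frame.

def kalman_enhance(samples, q, r):
    """State = clean sample estimate, observation = noisy sample."""
    x, p = 0.0, 1.0
    out = []
    for z in samples:
        p = p + q              # predict: state uncertainty grows
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update with the noisy observation
        p = (1.0 - k) * p
        out.append(x)
    return out

codebook = {"speech": (0.1, 0.5)}  # hypothetical (q, r) entry
```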

MDCT-based complex prediction stereo coding

The invention provides methods and devices for stereo encoding and decoding using complex prediction in the frequency domain. In one embodiment, a decoding method, for obtaining an output stereo signal from an input stereo signal encoded by complex prediction coding and comprising first frequency-domain representations of two input channels, comprises the upmixing steps of: (i) computing a second frequency-domain representation of a first input channel; and (ii) computing an output channel on the basis of the first and second frequency-domain representations of the first input channel, the first frequency-domain representation of the second input channel and a complex prediction coefficient. The method comprises applying independent bandwidth limits for the input channels.
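
Step (ii) can be sketched as a mid/side upmix with complex prediction. All names below are hypothetical: `mid` is the first representation of the first channel, `mid_im` stands in for its computed second representation (a real decoder derives it from `mid`, e.g. by a real-to-imaginary transform), `res` is the first representation of the second (residual) channel, `alpha` is the complex prediction coefficient, and `pred_bins` illustrates a bandwidth limit on the prediction:

```python
# Sketch only: reconstruct a side signal from the residual plus a complex
# prediction from the mid channel's two representations, then upmix to
# left/right. Bins at or above `pred_bins` receive no prediction,
# illustrating a bandwidth limit.

def upmix(mid, mid_im, res, alpha, pred_bins=None):
    n = len(mid) if pred_bins is None else pred_bins
    side = [r + (alpha.real * m + alpha.imag * mi if k < n else 0.0)
            for k, (r, m, mi) in enumerate(zip(res, mid, mid_im))]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```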

VOICE SIGNAL DEREVERBERATION PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM

A speech signal dereverberation processing method includes extracting an amplitude spectrum feature and a phase spectrum feature of a current frame in an original speech signal, extracting subband amplitude spectrums from the amplitude spectrum feature corresponding to the current frame, determining, based on the subband amplitude spectrums and by using a first reverberation predictor, a reverberation strength indicator corresponding to the current frame, and determining, based on the subband amplitude spectrums and the reverberation strength indicator, and by using a second reverberation predictor, a clean speech subband spectrum corresponding to the current frame.
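
The two-stage structure (first predictor yields a reverberation strength indicator, second predictor yields the clean subband spectrum) can be sketched with placeholder predictors; the abstract leaves both unspecified (they are likely trained models), so the placeholders below are assumptions for illustration only:

```python
# Pipeline sketch only: `predictor1` and `predictor2` stand in for the
# unspecified reverberation predictors in the abstract.

def dereverb_frame(subband_amps, predictor1, predictor2):
    strength = predictor1(subband_amps)        # reverberation strength indicator
    return predictor2(subband_amps, strength)  # clean-speech subband spectrum

# Toy placeholders: a fixed reverberant-energy fraction, and a second
# predictor that scales every subband amplitude down by it.
p1 = lambda amps: 0.3
p2 = lambda amps, s: [a * (1.0 - s) for a in amps]
```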

Low energy deep-learning networks for generating auditory features for audio processing pipelines

Low-energy deep-learning networks for generating auditory features, such as mel-frequency cepstral coefficients (MFCCs), in audio processing pipelines are provided. In various embodiments, a first neural network is trained to output auditory features, such as MFCCs, linear predictive coding coefficients, perceptual linear predictive coefficients, spectral coefficients, filter bank coefficients, and/or spectro-temporal receptive fields, based on input audio samples. A second neural network is trained to output a classification based on input auditory features. An input audio sample is provided to the first neural network, and the auditory features are received from it. Those features are provided to the second neural network, and a classification of the input audio sample is received from it.
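
The two-stage pipeline composes a feature network with a classifier network. In the sketch below, `feature_net` and `classifier_net` are hypothetical stand-ins for the two trained networks (toy functions, not real models), so only the composition itself is taken from the abstract:

```python
# Sketch of the two-stage pipeline: audio samples -> auditory features ->
# classification. The two "networks" here are toy stand-ins.

def classify_audio(samples, feature_net, classifier_net):
    features = feature_net(samples)      # first network: auditory features
    return classifier_net(features)      # second network: classification

# Toy stand-ins: two hand-made "features" (mean and peak-to-peak range),
# then a threshold "classifier" on the second feature.
feature_net = lambda xs: [sum(xs) / len(xs), max(xs) - min(xs)]
classifier_net = lambda f: "transient" if f[1] > 1.0 else "steady"
```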