G10L25/15

Voice modification detection using physical models of speech production

A computer may train a single-class machine learning using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by a random noise and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors to source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.

TRAINING APPARATUS, METHOD OF THE SAME AND PROGRAM

A training device changes feedback formant frequencies which are formant frequencies of a picked-up speech signal, applies a lowpass filter, converts the picked-up speech signal, adds high-pass noise to the converted speech signal, feeds back the converted speech signal with the high-pass noise added to a subject, calculates a compensatory response vector by using pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted with change of the feedback formant frequencies to the subject, and pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted without change of the feedback formant frequencies to the subject, and determines an evaluation based on the compensatory response vector and a correct compensatory response vector.

Evaluation apparatus, training apparatus, methods and programs for the same

An evaluation device applies a lowpass filter with a cutoff frequency being a first predetermined value or a second predetermined value greater than the first predetermined value with or without change of feedback formant frequencies which are formant frequencies of a picked-up speech signal, converts the picked-up speech signal, feeds back the converted speech signal to a subject, and includes an evaluation unit that calculates a compensatory response vector by using pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted with change of the feedback formant frequencies to the subject, and pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted without change of the feedback formant frequencies to the subject, and determines an evaluation based on a compensatory response vector for each cutoff frequency.

Evaluation apparatus, training apparatus, methods and programs for the same

An evaluation device applies a lowpass filter with a cutoff frequency being a first predetermined value or a second predetermined value greater than the first predetermined value with or without change of feedback formant frequencies which are formant frequencies of a picked-up speech signal, converts the picked-up speech signal, feeds back the converted speech signal to a subject, and includes an evaluation unit that calculates a compensatory response vector by using pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted with change of the feedback formant frequencies to the subject, and pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted without change of the feedback formant frequencies to the subject, and determines an evaluation based on a compensatory response vector for each cutoff frequency.

Method and apparatus for processing speech signal

An apparatus for processing a speech signal is provided. The apparatus includes a communicator comprising communication circuitry configured to transmit and receive data, an actuator comprising actuation circuitry configured to generate vibration and to output a signal, a formant enhancement filter configured to increase a formant of the speech signal, and a controller comprising processing circuitry configured to control the speech signal to be received through the communicator, to estimate at least one formant frequency from the speech signal based on linear predictive coding (LPC), to estimate a bandwidth of the at least one formant frequency, to determine whether the speech signal is a voiced sound or a voiceless sound, to configure the formant enhancement filter based on the at least one formant frequency, the bandwidth of the at least one formant frequency, characteristics of the determined voiced sound or voiceless sound, and signal delivery characteristics of a human body, to apply the formant enhancement filter to the speech signal, and to control the speech signal to which the formant enhancement filter is applied to be output using the actuator through the human body.

Method and apparatus for processing speech signal

An apparatus for processing a speech signal is provided. The apparatus includes a communicator comprising communication circuitry configured to transmit and receive data, an actuator comprising actuation circuitry configured to generate vibration and to output a signal, a formant enhancement filter configured to increase a formant of the speech signal, and a controller comprising processing circuitry configured to control the speech signal to be received through the communicator, to estimate at least one formant frequency from the speech signal based on linear predictive coding (LPC), to estimate a bandwidth of the at least one formant frequency, to determine whether the speech signal is a voiced sound or a voiceless sound, to configure the formant enhancement filter based on the at least one formant frequency, the bandwidth of the at least one formant frequency, characteristics of the determined voiced sound or voiceless sound, and signal delivery characteristics of a human body, to apply the formant enhancement filter to the speech signal, and to control the speech signal to which the formant enhancement filter is applied to be output using the actuator through the human body.

Audio encoder for encoding an audio signal, method for encoding an audio signal and computer program under consideration of a detected peak spectral region in an upper frequency band

An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.

Audio encoder for encoding an audio signal, method for encoding an audio signal and computer program under consideration of a detected peak spectral region in an upper frequency band

An audio encoder for encoding an audio signal having a lower frequency band and an upper frequency band includes: a detector for detecting a peak spectral region in the upper frequency band of the audio signal; a shaper for shaping the lower frequency band using shaping information for the lower band and for shaping the upper frequency band using at least a portion of the shaping information for the lower band, wherein the shaper is configured to additionally attenuate spectral values in the detected peak spectral region in the upper frequency band; and a quantizer and coder stage for quantizing a shaped lower frequency band and a shaped upper frequency band and for entropy coding quantized spectral values from the shaped lower frequency band and the shaped upper frequency band.

SYSTEM AND METHOD FOR TONE RECOGNITION IN SPOKEN LANGUAGES
20230186905 · 2023-06-15 ·

There is provided a system and method for recognizing tone patterns in spoken languages using sequence-to-sequence neural networks in an electronic device. The recognized tone patterns can be used to improve the accuracy for a speech recognition system on tonal languages.

SYSTEM AND METHOD FOR TONE RECOGNITION IN SPOKEN LANGUAGES
20230186905 · 2023-06-15 ·

There is provided a system and method for recognizing tone patterns in spoken languages using sequence-to-sequence neural networks in an electronic device. The recognized tone patterns can be used to improve the accuracy for a speech recognition system on tonal languages.