G10L21/057

METHOD AND APPARATUS FOR PROCESSING SPEECH
20200388283 · 2020-12-10

Embodiments of the present disclosure provide a method and apparatus for processing speech. The method may include: acquiring original speech; performing speech recognition on the original speech to obtain a corresponding original text; associating speech segments in the original speech with text segments in the original text; recognizing an abnormal segment in the original speech and/or the original text; and processing the text segment and/or speech segment indicated by the abnormal segment, to generate a final speech. By associating each speech segment with its text segment, the speech can be edited visually.
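A minimal Python sketch of the pipeline this abstract describes, not the patented implementation: word-level timestamps (assumed to come from a recognizer) align text segments with audio samples, "abnormal" segments are flagged by a placeholder heuristic (filler words), and the matching audio is cut to produce the final speech. All names are hypothetical.

```python
def remove_abnormal_segments(samples, sr, words):
    """samples: raw audio samples; sr: sample rate in Hz;
    words: list of (text, start_sec, end_sec) tuples from a recognizer.
    Returns (clean_samples, clean_text) with filler segments removed."""
    fillers = {"um", "uh", "er"}          # stand-in "abnormal segment" detector
    keep, text = [], []
    cursor = 0                            # next sample index not yet emitted
    for word, start, end in words:
        s, e = int(start * sr), int(end * sr)
        if word.lower() in fillers:
            keep.extend(samples[cursor:s])  # keep audio up to the filler
            cursor = e                      # skip the filler's samples
        else:
            text.append(word)
    keep.extend(samples[cursor:])           # keep the tail after the last cut
    return keep, " ".join(text)
```

Because every text segment carries its audio span, deleting a word in the transcript view deletes exactly the corresponding samples, which is the "visual processing" the abstract refers to.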

VOICE PROCESSING METHOD, VOICE PROCESSING DEVICE, AND RECORDING MEDIUM
20200365170 · 2020-11-19

A voice processing method realized by a computer includes compressing forward a first steady period of a plurality of steady periods in a voice signal representing voice, and extending forward a transition period between the first steady period and a second steady period of the plurality of steady periods in the voice signal. Each of the plurality of steady periods is a period in which acoustic characteristics are temporally stable. The second steady period is a period immediately after the first steady period and has a pitch that is different from a pitch of the first steady period.
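A rough illustration of the time-rebalancing idea, not the patented method: shorten a steady period and lengthen the following transition by the same number of samples, so total duration is preserved while the pitch change is spread over a longer transition. The naive linear-interpolation stretch is an assumption; a real implementation would use a pitch-preserving technique.

```python
def stretch(segment, factor):
    """Naive time-scaling by linear interpolation; factor < 1 compresses."""
    n = max(1, round(len(segment) * factor))
    if n == 1:
        return [segment[0]]
    out = []
    for i in range(n):
        pos = i * (len(segment) - 1) / (n - 1)   # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out

def rebalance(steady, transition, shift):
    """Move `shift` samples of duration from the steady period into the
    transition period (compress the steady period forward, extend the
    transition forward), keeping total length constant."""
    new_steady = stretch(steady, (len(steady) - shift) / len(steady))
    new_transition = stretch(transition, (len(transition) + shift) / len(transition))
    return new_steady, new_transition
```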

Fluency aid
10835413 · 2020-11-17

The present disclosure relates to a fluency aid comprising: a first microphone; a second microphone; and an altered auditory feedback, AAF, generator operable to receive a first input signal derived from sound detected by the first microphone and to generate a feedback signal for providing altered auditory feedback to a user of the fluency aid; wherein the fluency aid is configured such that a second input signal derived from sound detected by the second microphone bypasses the AAF generator.
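A hedged sketch of the signal topology the claim describes, with delayed auditory feedback (DAF) standing in as the altered-auditory-feedback generator: the first microphone's signal passes through the AAF path, while the second microphone's signal bypasses it and is mixed directly into the output. Class and parameter names are invented.

```python
from collections import deque

class FluencyAid:
    def __init__(self, delay_samples):
        # Ring buffer implementing the AAF (here: pure delay) for mic 1.
        self.buf = deque([0.0] * delay_samples, maxlen=delay_samples)

    def process(self, mic1_sample, mic2_sample):
        delayed = self.buf[0]            # oldest sample = delayed mic-1 signal
        self.buf.append(mic1_sample)     # push newest, evict oldest
        return delayed + mic2_sample     # mic 2 bypasses the AAF generator
```

Routing the second microphone around the AAF generator means ambient sound (e.g. a conversation partner) reaches the user unaltered while only the user's own voice is fed back altered.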

Speaking rhythm transformation apparatus, model learning apparatus, methods therefor, and program

The aim is to convert speech rhythm accurately. A model storage unit (10) stores a speech rhythm conversion model: a neural network that receives a first feature value vector, including information on the speech rhythm of at least one phoneme extracted from a first speech signal uttered by a speaker in a first group, converts the speech rhythm of the first speech signal into that of a speaker in a second group, and outputs the converted speech rhythm. A feature value extraction unit (11) extracts, from the input speech signal uttered by the speaker in the first group, information on the vocal tract spectrum and on the speech rhythm. A conversion unit (12) feeds the first feature value vector, including the extracted speech rhythm information, into the speech rhythm conversion model to obtain the post-conversion speech rhythm. A speech synthesis unit (13) uses the post-conversion speech rhythm and the extracted vocal tract spectrum information to generate an output speech signal.
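A toy stand-in for the described units, with all names hypothetical: per-phoneme durations represent the speech rhythm, a single linear mapping stands in for the neural conversion model of unit (12), and "resynthesis" simply pairs phonemes with their converted durations (the vocal tract spectrum handled by unit (13) is omitted).

```python
def convert_rhythm(durations, weight, bias):
    """Map source-group phoneme durations (seconds) to target-group
    durations via a learned affine model; clamp to a small minimum."""
    return [max(0.01, weight * d + bias) for d in durations]

def resynthesize(phonemes, durations):
    """Pair each phoneme with its converted duration (spectrum omitted)."""
    return list(zip(phonemes, durations))
```

In the abstract's terms, `convert_rhythm` plays the role of the conversion unit, and the source signal's spectral features would be carried through unchanged so that only timing, not timbre, is converted.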

Audio processing method, audio processing apparatus and computer storage medium

An audio processing method applied to a first terminal is described. The method includes: in response to receiving audio data input by a user at the first terminal and determining that a voice change function is turned on, determining change parameters; and performing change processing on the audio data based on the change parameters.
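A minimal sketch of the claimed control flow, with invented preset and parameter names: change parameters are resolved and applied only when the voice-change switch is on, and a naive resampling-plus-gain step stands in for the actual change processing.

```python
# Hypothetical change-parameter presets (not from the patent).
PRESETS = {"robot":    {"rate": 1.0, "gain": 0.8},
           "chipmunk": {"rate": 1.5, "gain": 1.0}}

def process_audio(samples, voice_change_on, preset="robot"):
    if not voice_change_on:
        return samples                  # function off: pass through unchanged
    params = PRESETS[preset]            # step 1: determine change parameters
    out = []
    pos = 0.0
    while pos < len(samples):           # step 2: naive rate change (resampling)
        out.append(samples[int(pos)] * params["gain"])
        pos += params["rate"]
    return out
```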

Systems, methods and devices for intelligent speech recognition and processing
10475467 · 2019-11-12

Systems, methods, and devices for intelligent speech recognition and processing are disclosed. According to one embodiment, a method for improving intelligibility of a speech signal may include (1) at least one processor receiving an incoming speech signal comprising a plurality of sound elements; (2) the at least one processor recognizing a sound element in the incoming speech signal to improve the intelligibility thereof; (3) the at least one processor processing the sound element by at least one of modifying and replacing the sound element; and (4) the at least one processor outputting the processed speech signal comprising the processed sound element.
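Illustrative only, and not necessarily the patented method: one common way to "recognize a sound element and modify it" is to detect consonant-like frames via zero-crossing rate and amplify them, since weak consonants are a frequent cause of poor intelligibility. Thresholds and frame sizes here are arbitrary.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def enhance(samples, frame_len=4, zcr_threshold=2, boost=2.0):
    """Recognize consonant-like sound elements (high zero-crossing rate)
    and modify them by boosting their amplitude."""
    out = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if zero_crossings(frame) >= zcr_threshold:   # element recognized
            frame = [s * boost for s in frame]       # element modified
        out.extend(frame)
    return out
```

The claim's "replacing" branch would instead substitute a stored clean template for the recognized element rather than scaling it, but the recognize-then-process structure is the same.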
