G10L13/033

Speech synthesizer for evaluating quality of synthesized speech using artificial intelligence and method of operating the same
11705105 · 2023-07-18

A speech synthesizer for evaluating quality of a synthesized speech using artificial intelligence includes a database configured to store a synthesized speech corresponding to text, a correct speech corresponding to the text, and a speech quality evaluation model for evaluating the quality of the synthesized speech, and a processor configured to compare a first speech feature set indicating a feature of the synthesized speech with a second speech feature set indicating a feature of the correct speech, acquire a quality evaluation index set including indices used to evaluate the quality of the synthesized speech according to a result of the comparison, and determine weights as model parameters of the speech quality evaluation model using the acquired quality evaluation index set and the speech quality evaluation model.
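The claimed evaluation flow can be sketched in a few lines: compare the two feature sets, derive a quality index per feature, and combine the indices with weights that act as the model parameters. Everything below is illustrative — the patent does not specify the distance metric, the features, or the weight values, so all of those are assumptions here.

```python
import numpy as np

def quality_indices(synth_feats, ref_feats):
    """Compare a synthesized-speech feature set against the correct
    (reference) speech feature set and return one quality index per
    feature. A simple normalized absolute distance stands in for the
    unspecified comparison."""
    synth = np.asarray(synth_feats, dtype=float)
    ref = np.asarray(ref_feats, dtype=float)
    return np.abs(synth - ref) / (np.abs(ref) + 1e-8)

def evaluate(indices, weights):
    """Weighted combination of the quality indices; the weights play the
    role of the model parameters of the speech quality evaluation model."""
    return float(np.dot(weights, indices))

# Illustrative feature sets (e.g., pitch mean, energy, duration stats).
synth = [210.0, 0.62, 1.9]
ref = [200.0, 0.60, 2.0]
idx = quality_indices(synth, ref)
score = evaluate(idx, weights=np.array([0.5, 0.3, 0.2]))
```

In a real system the weights would be fitted (e.g., by regression against human quality ratings) rather than fixed by hand as they are in this sketch.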

Systems and methods for automatic speech translation
11704507 · 2023-07-18

A method for providing automatic interpretation may include receiving, by a processor, audible speech from a speech source, generating, by the processor, in real-time, a speech transcript by applying an automatic speech recognition model on the speech, segmenting, by the processor, the speech transcript into speech segments based on a content of the speech by applying a segmenter model on the speech transcript, compressing, by the processor, the speech segments based on the content of the speech by applying a compressor model on the speech segments, generating, by the processor, a translation of the speech by applying a machine translation model on the compressed speech segments, and generating, by the processor, audible translated speech based on the translation of the speech by applying a text-to-speech model on the translation of the speech.
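The five claimed stages (ASR → segmenter → compressor → machine translation → TTS) compose naturally as a function pipeline. The sketch below wires placeholder stages together to show the data flow only; every function body is an assumption, not an API from the patent.

```python
# Hypothetical end-to-end sketch of the claimed interpretation pipeline.

def asr(audio: bytes) -> str:
    # Stand-in for the automatic speech recognition model.
    return audio.decode("utf-8")

def segment(transcript: str) -> list[str]:
    # Stand-in for the content-based segmenter model.
    return [s.strip() for s in transcript.split(".") if s.strip()]

def compress(segments: list[str]) -> list[str]:
    # Stand-in for the compressor model (here: drop filler words).
    fillers = {"um", "uh", "well"}
    return [" ".join(w for w in s.split() if w.lower() not in fillers)
            for s in segments]

def translate(segments: list[str]) -> list[str]:
    # Stand-in for the machine translation model.
    return [f"<translated:{s}>" for s in segments]

def tts(texts: list[str]) -> bytes:
    # Stand-in for the text-to-speech model.
    return " ".join(texts).encode("utf-8")

def interpret(audio: bytes) -> bytes:
    # Chain the five stages in the order claimed.
    return tts(translate(compress(segment(asr(audio)))))

out = interpret(b"well hello there. um how are you.")
```

Compressing segments before translation (rather than after) is the interesting design choice here: it shortens the machine-translation input, which matters for real-time interpretation latency.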

VOICE INTERACTION METHOD AND ELECTRONIC DEVICE
20230017274 · 2023-01-19

Embodiments of this application provide a voice interaction method and an electronic device, and relate to the field of artificial intelligence (AI) technologies and the field of voice processing technologies. In a specific solution, an electronic device may receive first voice information sent by a second user and recognize the first voice information in response. The first voice information is used to request a voice conversation with a first user. When the electronic device recognizes that the first voice information is voice information of the second user, the electronic device may have a voice conversation with the second user by imitating a voice of the first user, in a mode in which the first user has a voice conversation with the second user.
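The gating logic — only answer in the imitated voice once the caller is recognized as the second user — can be sketched as below. Speaker recognition and voice imitation are reduced to placeholders; the enrollment data, field names, and return strings are all illustrative assumptions.

```python
# Assumed enrollment data mapping known users to stored voiceprints.
KNOWN_VOICEPRINTS = {"second_user": "voiceprint-B"}

def recognize_speaker(voice_info):
    # Stand-in for voiceprint matching on the received voice information.
    for name, stored_print in KNOWN_VOICEPRINTS.items():
        if voice_info.get("voiceprint") == stored_print:
            return name
    return None

def respond(voice_info):
    # Only when the caller is recognized as the second user does the
    # device converse in an imitation of the first user's voice.
    if recognize_speaker(voice_info) == "second_user":
        return "reply rendered in imitated voice of first_user"
    return "no imitated conversation"

reply = respond({"voiceprint": "voiceprint-B", "text": "call first_user"})
```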

Two-Level Text-To-Speech Systems Using Synthetic Training Data

A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text utterance that clones the voice of the target speaker in the second accent/dialect.
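The two-level training recipe can be sketched as a loop: first convert each training signal into the second accent/dialect, then train the TTS system on the (transcript, converted audio) pair. Both models are placeholders below; the accent identifiers and data layout are assumptions, not the actual system.

```python
# Minimal sketch of the two-level recipe with synthetic training data.

def convert_accent(audio, target_accent):
    # Level one: stand-in for generating a training synthesized speech
    # representation of the same speaker in the second accent/dialect.
    return f"{audio}@{target_accent}"

def train_step(tts_model, transcript, target_audio):
    # Level two: stand-in for one TTS training update on the pair.
    tts_model.setdefault("pairs", []).append((transcript, target_audio))

# Illustrative training data: (audio in first accent, transcript).
training_data = [("audio_1_en-US", "hello world"),
                 ("audio_2_en-US", "good morning")]

tts_model = {}
for audio, transcript in training_data:
    synthetic = convert_accent(audio, "en-GB")  # second accent/dialect
    train_step(tts_model, transcript, synthetic)
```

The point of the recipe is that the TTS system never needs real recordings of the target speaker in the second accent: the first-level model manufactures them.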

VOICE CONVERSION METHOD AND RELATED DEVICE
20230223006 · 2023-07-13

A voice conversion method and a related device are provided to implement diversified human voice beautification. A method in embodiments of this application includes receiving a mode selection operation input by a user, where the mode selection operation is for selecting a voice conversion mode. The provided selectable modes include: a style conversion mode, for performing speaking style conversion on a to-be-converted first voice; a dialect conversion mode, for adding an accent to or removing an accent from the first voice; and a voice enhancement mode, for implementing voice enhancement on the first voice. Each of the three modes has a corresponding voice conversion network. Based on a target conversion mode selected by the user, a target voice conversion network corresponding to the target conversion mode is selected to convert the first voice and output a second voice obtained through the conversion.
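The mode-to-network dispatch is essentially a lookup table from the selected mode to its conversion network. The sketch below shows that shape; the three "networks" are placeholder callables and the mode names are assumptions.

```python
# Placeholder conversion networks, one per selectable mode.

def style_network(voice):
    return f"style({voice})"

def dialect_network(voice):
    return f"dialect({voice})"

def enhance_network(voice):
    return f"enhanced({voice})"

CONVERSION_NETWORKS = {
    "style": style_network,
    "dialect": dialect_network,
    "enhance": enhance_network,
}

def convert(first_voice, target_mode):
    # Select the target voice conversion network for the user's chosen
    # mode and output the second voice obtained through conversion.
    try:
        network = CONVERSION_NETWORKS[target_mode]
    except KeyError:
        raise ValueError(f"unknown conversion mode: {target_mode}")
    return network(first_voice)

second_voice = convert("first_voice", "dialect")
```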

Preprocessor System for Natural Language Avatars

A preprocessor for use with natural language processors that control computerized avatars embeds avatar control information in a speech response file of the natural language processor, giving avatars an improved perception of emotional intelligence. Rapid avatar response is provided by independent end-of-speech detection and by a response cache that bypasses text-to-speech conversion time. The preprocessor may be shared among multiple websites to provide a shared analysis for query optimization.
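The response-cache idea — a cached reply skips the slow text-to-speech step entirely — can be sketched as a small memoizing wrapper. The class, method names, and artificial delay below are all illustrative assumptions.

```python
import time

def slow_tts(text):
    # Stand-in for text-to-speech conversion; artificially slowed to make
    # the cost of a cache miss visible.
    time.sleep(0.01)
    return text.encode("utf-8")

class ResponseCache:
    """Caches synthesized audio so repeated avatar replies bypass TTS."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def speech_for(self, reply_text):
        # A cache hit returns stored audio and bypasses TTS conversion time.
        if reply_text in self._cache:
            self.hits += 1
            return self._cache[reply_text]
        audio = slow_tts(reply_text)
        self._cache[reply_text] = audio
        return audio

cache = ResponseCache()
first = cache.speech_for("Hello, how can I help?")
second = cache.speech_for("Hello, how can I help?")  # served from cache
```

Sharing one such cache across multiple websites, as the abstract suggests, would let common queries pay the synthesis cost only once globally.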
