G10L13/02

Systems and Methods for Performing Video Communication Using Text-Based Compression

Systems and methods for video communication using text-based compression in accordance with embodiments of the invention are illustrated. One embodiment includes a method for video communication using text-based compression, where the method includes receiving a file comprising captured audio-video content, encoding the captured audio-video content to a text transcript using an encoder, transmitting initialization data and the text transcript to a decoder, initializing a facial animation model and a text-to-speech (TTS) system using the initialization data, and reconstructing an animated version of the captured audio-video content using the text transcript, the facial animation model, and the TTS system at the decoder.

Speech style transfer

Computer-implemented methods for speech synthesis are provided. A speech synthesizer may be trained to generate synthesized audio data that corresponds to words uttered by a source speaker according to speech characteristics of a target speaker. The speech synthesizer may be trained by time-stamped phoneme sequences, pitch contour data and speaker identification data. The speech synthesizer may include a voice modeling neural network and a conditioning neural network.

TEXT INFORMATION PROCESSING METHOD AND APPARATUS
20220406290 · 2022-12-22 ·

Embodiments of the present application provide a text information processing method and apparatus. The method includes: acquiring a phoneme vector corresponding to an individual phoneme and a semantic vector corresponding to the individual phoneme in text information; acquiring first semantic information output at a previous moment, wherein the first semantic information is semantic information corresponding to part of the text information, and that part is text information that has already been converted into voice information; determining a context vector corresponding to a current moment according to the first semantic information, the phoneme vector corresponding to the individual phoneme and the semantic vector corresponding to the individual phoneme; and determining voice information at the current moment according to the context vector and the first semantic information.

AUTOMATED PROCESS FOR GENERATING NATURAL LANGUAGE DESCRIPTIONS OF RASTER-BASED WEATHER VISUALIZATIONS FOR OUTPUT IN WRITTEN AND AUDIBLE FORM
20220405486 · 2022-12-22 ·

Generating specific and contextualized natural language descriptions based upon raster-based weather visualizations for a defined geographic region. The generated natural language descriptions are provided in a written and/or audible form. In some cases, these natural language descriptions are generated based on weather forecast data sets that indicate a relative motion of certain weather-related events.

CONFERENCE TERMINAL AND EMBEDDING METHOD OF AUDIO WATERMARKS
20220406317 · 2022-12-22 ·

A conference terminal and an embedding method of audio watermarks are provided. In the method, a first speech signal and a first audio watermark signal are received respectively. The first speech signal relates to a speaker corresponding to another conference terminal, and the first audio watermark signal corresponds to the other conference terminal. The first speech signal is assigned to a host path to output a second speech signal. The first audio watermark signal is assigned to an offload path to output a second audio watermark signal. The host path provides more digital signal processing (DSP) effects than the offload path. The second speech signal and the second audio watermark signal are synthesized to output a synthesized audio signal. The synthesized audio signal is adapted for audio playback. A completed audio watermark signal is outputted accordingly.
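
The two-path routing described in this abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the choice of gain and hard limiting as the host-path effects, and the sample-wise sum used for synthesis are all assumptions made for illustration. The only property taken from the abstract is that the host path applies more DSP effects than the offload path.

```python
from typing import List


def host_path(speech: List[float], gain: float = 0.8, limit: float = 1.0) -> List[float]:
    """Hypothetical host path: applies two effects (gain, then hard limiting)."""
    return [max(-limit, min(limit, sample * gain)) for sample in speech]


def offload_path(watermark: List[float]) -> List[float]:
    """Hypothetical offload path: passes the watermark through with no effects."""
    return list(watermark)


def synthesize(speech: List[float], watermark: List[float]) -> List[float]:
    """Route each signal through its path, then mix them sample by sample.

    Mixing by addition is an assumption; the abstract only says the two
    processed signals are synthesized into one audio signal.
    """
    second_speech = host_path(speech)
    second_watermark = offload_path(watermark)
    return [s + w for s, w in zip(second_speech, second_watermark)]


mixed = synthesize([0.5, 2.0], [0.01, -0.01])
```

Keeping the watermark out of the effect chain means effects such as limiting cannot distort it, which is one plausible reason for separating the paths.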

RESPONSE METHOD, TERMINAL, AND STORAGE MEDIUM
20220399013 · 2022-12-15 ·

A response method, a terminal, and a storage medium are provided. The response method comprises: determining, at a first time point by means of speech recognition processing, a first target text corresponding to the first time point (1001); determining, according to the first target text, a first predicted intention and an answer to be pushed, wherein said answer is used for responding to speech information (1002); continuing to determine, by means of the speech recognition processing, a second target text corresponding to a second time point and a second predicted intention, wherein the second time point is the next successive time point after the first time point (1003); determining, according to the first predicted intention and the second predicted intention, whether a preset response condition is satisfied (1004); and responding according to said answer if the preset response condition is determined to be satisfied (1005).
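
Steps 1001 to 1005 can be sketched as a loop over successive partial transcripts. This is a minimal illustration under stated assumptions, not the claimed method: the abstract does not define the preset response condition, so the sketch assumes it is met when the predicted intentions at two successive time points are equal. The names `respond_when_stable` and `toy_predict` are hypothetical, and the keyword-based predictor stands in for a real intent model.

```python
from typing import Callable, Iterable, Optional, Tuple


def respond_when_stable(
    partial_texts: Iterable[str],
    predict: Callable[[str], Tuple[str, str]],
) -> Optional[str]:
    """Return the prepared answer once two successive intentions agree.

    `predict` maps a target text to (predicted intention, answer to be pushed).
    Assumed response condition: the intention at the current time point
    equals the intention at the previous time point.
    """
    previous_intent: Optional[str] = None
    pending_answer: Optional[str] = None
    for text in partial_texts:                  # one target text per time point
        intent, answer = predict(text)          # steps 1002 / 1003
        if previous_intent is not None and intent == previous_intent:
            return pending_answer               # steps 1004 / 1005
        previous_intent, pending_answer = intent, answer
    return None                                 # condition never satisfied


def toy_predict(text: str) -> Tuple[str, str]:
    """Hypothetical keyword-based stand-in for an intent model."""
    if "weather" in text:
        return "ask_weather", "It is sunny today."
    return "unknown", "Sorry, could you repeat that?"


reply = respond_when_stable(
    ["what", "what is the weather", "what is the weather today"],
    toy_predict,
)
```

Here the answer prepared at the earlier time point is returned as soon as the later prediction confirms it, so the response can start before the speaker has finished.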