Patent classifications
G10L15/04
VIRTUAL OBJECT LIP DRIVING METHOD, MODEL TRAINING METHOD, RELEVANT DEVICES AND ELECTRONIC DEVICE
A virtual object lip driving method performed by an electronic device includes: obtaining a speech segment and target face image data about a virtual object; and inputting the speech segment and the target face image data into a first target model to perform a first lip driving operation, so as to obtain first lip image data about the virtual object driven by the speech segment. The first target model is trained in accordance with a first model and a second model, the first model is a lip-speech synchronization discriminative model with respect to lip image data, and the second model is a lip-speech synchronization discriminative model with respect to a lip region in the lip image data.
SYSTEM AND METHOD FOR EXTRACTING AND DISPLAYING SPEAKER INFORMATION IN AN ATC TRANSCRIPTION
A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, whether the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker-independent speech recognition model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
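The per-chunk routing described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented system: the `SpeakerRouter` class, the `decode` stand-in, the voiceprint identifiers, and the `Unknown-N` naming scheme are all hypothetical, and reusing a temporary name for a returning unenrolled speaker is an assumption the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Dict


def decode(audio: str, model: str) -> str:
    """Stand-in for ASR decoding (hypothetical); here the 'audio' is already text."""
    return audio.strip().capitalize()


@dataclass
class TaggedChunk:
    speaker: str  # permanent or temporary name
    model: str    # which kind of ASR model decoded the chunk
    text: str     # formatted transcription


@dataclass
class SpeakerRouter:
    # Hypothetical enrolled-speaker database: voiceprint id -> permanent name.
    enrolled: Dict[str, str]
    # Temporary names handed out so far: voiceprint id -> temporary name.
    temporary: Dict[str, str] = field(default_factory=dict)

    def route(self, voiceprint_id: str, audio: str) -> TaggedChunk:
        name = self.enrolled.get(voiceprint_id)
        if name is not None:
            # Enrolled speaker: permanent name, speaker-dependent model.
            model = "speaker-dependent"
        else:
            # Unenrolled speaker: assign (or reuse, by assumption) a temporary name.
            if voiceprint_id not in self.temporary:
                self.temporary[voiceprint_id] = f"Unknown-{len(self.temporary) + 1}"
            name = self.temporary[voiceprint_id]
            model = "speaker-independent"
        return TaggedChunk(speaker=name, model=model, text=decode(audio, model))
```

A display layer would then render each `TaggedChunk` as the speaker identity followed by the formatted text.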
Performance mode control method and electronic device supporting same
An embodiment of the present invention comprises: a communication module for communicating with at least one external device; a microphone for receiving a user utterance; a memory for storing performance mode information having been configured in the electronic device; and a processor electrically connected to the communication module, the microphone, and the memory, wherein the processor is configured to: receive, through the microphone, a second user utterance associated with task execution; transmit first data associated with the second user utterance to an external device; receive, from the external device, second data associated with at least a part of processing of the first data; identify a first work load allocated to the electronic device at the time of receiving the second data; and compare a second work load required for processing the second data and the first work load, so as to control the performance mode. In addition, various embodiments recognized through the specification are possible.
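The final comparison step, in which the workload required to process the second data is compared with the workload currently allocated to the device, might look like the sketch below. The abstract does not say what the comparison leads to, so the mode names, the numeric workload units, and the rule "raise the mode only when the required workload exceeds the allocated one" are assumptions made for illustration.

```python
def choose_performance_mode(
    allocated_load: float,
    required_load: float,
    current_mode: str = "normal",
) -> str:
    """Pick a performance mode by comparing two workloads.

    allocated_load: the first workload, already allocated to the device
                    when the second data arrives (arbitrary units).
    required_load:  the second workload, needed to process the second data.
    Assumed rule: switch to "high" only if the required workload exceeds
    what is currently allocated; otherwise keep the current mode.
    """
    if required_load > allocated_load:
        return "high"
    return current_mode
```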
METHODS TO EMPLOY COMPACTION IN ASR SERVICE USAGE
Systems and methods for processing audio streams are disclosed herein. An audio stream including speech content is received. The audio stream is compacted to generate a compacted audio stream and the compacted audio stream is transmitted to an automatic speech recognition (ASR) service for transcription of the speech content to text content. In response to transmitting the compacted audio stream for transcription, text content, a transcription of the audio stream, is received from the ASR service.
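The abstract does not specify how the audio stream is compacted. One plausible reading, assumed here purely for illustration, is that low-energy (silent) frames are dropped before the stream is sent to the ASR service. The function name `compact`, the frame size, and the energy threshold are hypothetical.

```python
from typing import List, Sequence


def compact(
    samples: Sequence[float],
    frame_size: int = 160,
    threshold: float = 0.01,
) -> List[float]:
    """Drop frames whose mean energy is below `threshold` (assumed compaction rule)."""
    kept: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept
```

The compacted samples would then be transmitted in place of the original stream, and the returned text treated as the transcription of the full audio.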
METHOD FOR ANIMATION SYNTHESIS, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method for animation synthesis includes: obtaining an audio stream to be processed and a syllable sequence, wherein both the audio stream and the syllable sequence correspond to the same text and each syllable in the syllable sequence is pinyin of each character of the text; obtaining a phoneme information sequence of the audio stream by performing phoneme detection on the audio stream, wherein each piece of phoneme information in the phoneme information sequence comprises a phoneme category and a pronunciation time period; determining a pronunciation time period corresponding to each syllable in the syllable sequence based on the syllable sequence, phoneme categories and pronunciation time periods in the phoneme information sequence; and generating an animation video corresponding to the audio stream based on the pronunciation time period corresponding to each syllable in the syllable sequence and an animation frame sequence corresponding to each syllable.
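The step that derives a pronunciation time period for each syllable from the detected phonemes can be sketched as below. It assumes, beyond what the abstract states, that each pinyin syllable comes with a known, non-empty list of phoneme categories and that the detected phonemes appear in the same order; the function name and data shapes are hypothetical.

```python
from typing import List, Tuple

# (phoneme category, start time in seconds, end time in seconds)
Phoneme = Tuple[str, float, float]


def syllable_periods(
    syllables: List[Tuple[str, List[str]]],
    phonemes: List[Phoneme],
) -> List[Tuple[str, float, float]]:
    """Assign each syllable the span from its first phoneme's start to its last phoneme's end."""
    periods: List[Tuple[str, float, float]] = []
    i = 0
    for syllable, expected in syllables:
        span = phonemes[i:i + len(expected)]
        # The detected categories must match the syllable's expected phonemes.
        if [p[0] for p in span] != expected:
            raise ValueError(f"phonemes do not match syllable {syllable!r}")
        periods.append((syllable, span[0][1], span[-1][2]))
        i += len(expected)
    return periods
```

Each resulting period could then be used to stretch or trim the animation frame sequence associated with that syllable.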
Client device, information processing system, storage medium, and information processing method
Provided is a client device that includes a detection unit that detects an act of expressing gratitude by a user, a communication unit that transmits and receives at least a first portion of virtual currency, and a control unit that performs control, with recognition of an act of expressing gratitude by a first user on the basis of data detected, such that a certain amount corresponding to the act in virtual currency held by the first user is subtracted and a first portion of the certain amount of the virtual currency is managed as gratitude currency held by the first user.
Speech translation device, speech translation method, and recording medium
A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.