G10L21/003

VOICE DRIVEN DYNAMIC MENUS
20230229290 · 2023-07-20

Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
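A minimal sketch of the menu-selection step the claim describes, assuming a crude energy-threshold voice detector; the abstract does not specify how voice activity is determined, and all names here are illustrative:

```python
def contains_voice(samples, threshold=0.01):
    """Crude energy-based voice-activity check, an illustrative
    stand-in for whatever detector the claimed method actually uses."""
    if not samples:
        return False
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold

def select_menu(samples):
    """Display the first menu when a voice signal is present in the
    audio data, otherwise the second menu, as the claim describes."""
    return "first_menu" if contains_voice(samples) else "second_menu"
```

The claim only requires that the menu choice be conditioned on the presence or absence of a voice signal; any voice-activity detector could stand in for the energy check above.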

VOICE TRANSLATION AND VIDEO MANIPULATION SYSTEM
20230009957 · 2023-01-12

A communication modification system including an audio gathering unit that gathers an audio stream and a language detection unit that converts the audio stream into text. The language detection unit correlates portions of the text with corresponding portions of the audio stream, and determines a first and a second deviation in an audio stream portion based on the correlated text portion and the audio portion gathered by the audio gathering unit.

SELECTIVE FINE-TUNING OF SPEECH

Speech conveyed over a network, such as during an electronic conference, may be difficult to understand if a recipient has trouble understanding the speech of users having a particular speech attribute. Other recipients, however, may have no such difficulty. As provided herein, speech provided by a user may have phonemes comprising accents or other speech patterns that, if removed, make the speech more readily understood by a particular user. Such alterations are provided only to the users that require them, such as by a server or a specific user's communication device, without affecting the speech concurrently presented to other users.

Audio file processing method, electronic device, and storage medium

An audio file processing method is provided for an electronic device. The method includes extracting at least one audio segment from a first audio file, recognizing at least one to-be-replaced audio segment representing a target role from the at least one audio segment, and determining time frame information of each to-be-replaced audio segment in the first audio file. The method also includes obtaining to-be-dubbed audio data for each to-be-replaced audio segment, and replacing data in the to-be-replaced audio segment with the to-be-dubbed audio data according to the time frame information, to obtain a second audio file. The at least one to-be-replaced audio segment is divided from the at least one audio segment based on a structure and a word count in a sentence corresponding to each to-be-replaced audio segment.
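The replacement step above can be sketched as splicing dubbed samples into the first audio file at the determined time frames. This is an illustrative sketch only, assuming time frames given in seconds and a flat sample list; the role-recognition and segment-division steps are out of scope here:

```python
def replace_segments(audio, segments, rate=16000):
    """Replace each to-be-replaced span of the first audio file with
    its to-be-dubbed audio, given (start_sec, end_sec, dubbed_samples)
    tuples. Spans are spliced right-to-left so that earlier sample
    indices remain valid even when a dub has a different length."""
    out = list(audio)
    for start, end, dub in sorted(segments, key=lambda s: s[0], reverse=True):
        i, j = int(start * rate), int(end * rate)
        out[i:j] = dub
    return out
```

Splicing from the end of the file backwards is one simple way to honor the time frame information when dubbed segments are longer or shorter than the originals.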

SPEECH RECOGNITION APPARATUS AND METHOD

According to one embodiment, a speech recognition apparatus includes processing circuitry. The processing circuitry generates a plurality of augmented speech data based on input speech data, generates a plurality of acoustic scores based on the plurality of augmented speech data and an acoustic model, generates a plurality of adjusted acoustic scores by resampling the acoustic scores, generates an integrated acoustic score by integrating the adjusted acoustic scores, generates an integrated lattice based on the integrated acoustic score, a pronunciation dictionary, and a language model, and searches the integrated lattice for the speech recognition result with the highest likelihood.
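The resampling and integration steps can be sketched as aligning the per-augmentation score sequences to a common length and averaging them. This is a simplified sketch under assumed details (linear interpolation, mean integration, scalar scores); the embodiment does not fix these choices:

```python
def resample(scores, n):
    """Linearly resample a score sequence to length n so that scores
    from differently stretched augmentations can be aligned."""
    m = len(scores)
    if n == 1 or m == 1:
        return [float(scores[0])] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)   # fractional index into scores
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(scores[lo] * (1 - frac) + scores[hi] * frac)
    return out

def integrate(score_lists, n):
    """Average the resampled score sequences into one integrated
    acoustic score sequence of length n."""
    aligned = [resample(s, n) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*aligned)]
```

In a real recognizer the scores would be per-frame vectors over acoustic units, and the integrated sequence would then feed the lattice generation stage together with the pronunciation dictionary and language model.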

Dynamic creation and insertion of content

In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's intercepted original speech.

Communication using interactive avatars

Generally, this disclosure describes a video communication system that replaces actual live images of the participating users with animated avatars. A method may include selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.
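The input-to-command steps above can be sketched as a lookup from an identified user input to an animation command and its avatar parameters. The disclosure does not fix a command set; the command names and parameters below are invented for illustration:

```python
# Illustrative mapping from identified user inputs to animation
# commands and avatar parameters (names are hypothetical).
ANIMATION_COMMANDS = {
    "smile": {"command": "play_smile", "params": {"mouth_curve": 0.8}},
    "nod":   {"command": "play_nod",   "params": {"head_pitch": -0.3}},
}

def handle_input(user_input):
    """Identify the animation command and avatar parameters for a
    detected input; return the pair that would be transmitted, or
    None when the input has no associated animation."""
    entry = ANIMATION_COMMANDS.get(user_input)
    if entry is None:
        return None
    return entry["command"], entry["params"]
```

Transmitting only the command and parameters, rather than live video frames, is what lets the receiving side animate the avatar locally at low bandwidth.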

Method and electronic device for formant attenuation/amplification
11594241 · 2023-02-28

A method comprising determining feature values of an input audio window and determining a formant attenuation/amplification coefficient for the input audio window based on the processing of the feature values by a neural network.
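The coefficient-determination step can be sketched with a single-layer stand-in for the neural network; the abstract does not disclose the network architecture, so the layer shape, the sigmoid scaling, and the coefficient range below are assumptions made for illustration:

```python
import math

def attenuation_coefficient(features, weights, bias):
    """One-layer stand-in for the neural network in the abstract:
    map an input audio window's feature values to a formant
    attenuation/amplification coefficient in (0, 2) via a scaled
    sigmoid, so that values below 1 attenuate a formant and values
    above 1 amplify it (layout and range are illustrative)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 2.0 / (1.0 + math.exp(-z))
```

A zero pre-activation yields a coefficient of exactly 1.0, i.e. the formant is passed through unchanged, which makes the neutral point of the assumed range easy to check.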