Patent classifications
G10L13/086
Electronic apparatus and controlling method thereof
An electronic apparatus includes a display, a voice receiver configured to receive a user voice input, and a processor configured to obtain a first text from the user voice input received through the voice receiver based on a function corresponding to a first voice recognition related to a first language; based on an entity name not being included in the first text obtained using the function corresponding to the first voice recognition related to the first language, obtain a second text corresponding to the entity name from the user voice input based on a function corresponding to a second voice recognition related to a second language; and control the display to display a voice recognition result corresponding to the user voice input based on the first text and the second text.
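As a rough illustration of the two-pass flow this abstract describes (recognize in the first language, check for an entity name, fall back to the second language, and combine the results for display), here is a minimal sketch; recognize(), the entity list, and the language codes are assumptions made for the example, not details from the patent.

```python
def recognize(audio: bytes, language: str) -> str:
    # Stand-in for a language-specific speech recognizer.
    return "play the song" if language == "en" else "Annyeong"

ENTITY_NAMES = {"Annyeong"}  # illustrative entity dictionary

def contains_entity(text: str) -> bool:
    return any(name in text for name in ENTITY_NAMES)

def recognition_result(audio: bytes) -> str:
    first_text = recognize(audio, language="en")   # first voice recognition
    if contains_entity(first_text):
        return first_text
    # Entity name missing: re-recognize the same input in the second language.
    second_text = recognize(audio, language="ko")
    # The displayed result combines both passes.
    return f"{first_text} {second_text}"

print(recognition_result(b"...pcm..."))
```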
DETERMINING MULTILINGUAL CONTENT IN RESPONSES TO A QUERY
Implementations relate to determining multilingual content to render at an interface in response to a user-submitted query. Those implementations further relate to determining a first language response and a second language response to a query that is submitted to an automated assistant. Some of those implementations relate to determining multilingual content that includes a response to the query in both the first and second languages. Other implementations relate to determining multilingual content that includes a query suggestion in the first language and a query suggestion in the second language. Some of those implementations relate to pre-fetching results for the query suggestions prior to rendering the multilingual content.
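A small sketch of how such multilingual content might be assembled, assuming stand-in answer(), suggest(), and fetch() helpers (none of which are named in the publication); it shows responses in both languages, suggestions in both languages, and results pre-fetched before rendering.

```python
def answer(query: str, lang: str) -> str:
    return f"[{lang} answer to: {query}]"

def suggest(query: str, lang: str) -> str:
    return f"[{lang} suggestion for: {query}]"

def fetch(query: str) -> str:
    return f"[results for: {query}]"

def multilingual_content(query: str, first: str, second: str) -> dict:
    suggestions = [suggest(query, first), suggest(query, second)]
    return {
        "responses": [answer(query, first), answer(query, second)],
        "suggestions": suggestions,
        # Pre-fetch results so a tapped suggestion can render immediately.
        "prefetched": {s: fetch(s) for s in suggestions},
    }

print(multilingual_content("weather today", "en", "es"))
```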
Method and system for parametric speech synthesis
Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice, in any language, and in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.
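The three-stage pipeline named in the abstract (text converter, speaker-conditioned acoustic model, decoder) can be sketched as follows; the class names and the toy grapheme-as-phoneme and hashing logic are assumptions for illustration only.

```python
class TextConverter:
    def to_phonemes(self, text: str) -> list[str]:
        return list(text.lower())  # toy stand-in: one "phoneme" per character

class AcousticModel:
    def features(self, phonemes: list[str], speaker_id: str) -> list[float]:
        # Toy speaker-conditioned features; a real model would use stored voice patterns.
        return [float(hash((p, speaker_id)) % 100) for p in phonemes]

class Decoder:
    def to_waveform(self, features: list[float]) -> list[float]:
        return [f / 100.0 for f in features]  # toy "speech signal"

def synthesize(text: str, speaker_id: str) -> list[float]:
    phonemes = TextConverter().to_phonemes(text)
    acoustic = AcousticModel().features(phonemes, speaker_id)
    return Decoder().to_waveform(acoustic)

print(len(synthesize("hello world", speaker_id="alice")))
```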
DIGITIZED VOICE ALERTS
Methods, systems and processor-readable media for providing instant/real-time voice alerts automatically to remote electronic devices. An activity can be detected utilizing one or more sensors. A text message indicative of the activity can be generated and converted into a digitized voice alert. The activity can also be a live utterance (e.g., a live announcement), which can then be instantly converted into a digitized voice alert for automatic delivery in a selected series of languages following the base language (e.g., English). The combined digitized voice alert can then be instantly transmitted through a network for broadcast of consecutive alerts (e.g., English followed by Spanish followed by Vietnamese, etc.) to one or more remote electronic devices that communicate with the network for an automatic audio announcement of the digitized voice alert through the one or more remote electronic devices.
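A hedged sketch of the alert flow (activity or live utterance, text message, digitized voice in a base language followed by a series of languages, broadcast to devices); translate(), text_to_speech(), and the device names are illustrative stand-ins.

```python
def translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # stand-in for synthesized audio

def build_alert(activity: str, languages: list[str]) -> list[bytes]:
    message = f"Alert: {activity}"
    # Base language first, then the selected series of follow-on languages.
    return [text_to_speech(translate(message, lang)) for lang in languages]

def broadcast(alerts: list[bytes], devices: list[str]) -> None:
    for device in devices:
        for clip in alerts:  # consecutive alerts per device
            print(f"sending {len(clip)} bytes to {device}")

broadcast(build_alert("door opened", ["en", "es", "vi"]), ["lobby-speaker"])
```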
GENERATING AUDIO FILES BASED ON USER GENERATED SCRIPTS AND VOICE COMPONENTS
A computer-implemented method, according to one embodiment, includes: determining whether a predetermined version of a source script is available. In response to determining that a predetermined version of the source script is available, it is used to condition a first processor, and instructions are sent to the conditioned first processor to generate a translated copy of the source script by translating the words in the source script from a source language to a target language. Instructions are also sent to a second processor to determine a distribution of metrics associated with the speech of each of the actors in the source audio file. The distributions are used to condition a third processor, and instructions are sent to the conditioned third processor to generate an audio file that includes words spoken in the target language. Furthermore, instructions are sent to merge the generated audio file with the video file.
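The orchestration could look roughly like the sketch below, with the three conditioned "processors" modeled as plain functions; translate_script(), speech_metrics(), generate_audio(), and merge() are hypothetical names, and real translation, analysis, and synthesis systems would replace the stubs.

```python
def translate_script(script: str, source_lang: str, target_lang: str) -> str:
    return f"[{target_lang} translation of] {script}"

def speech_metrics(source_audio: str) -> dict:
    # Stand-in for per-actor distributions (pitch, pace, energy, ...).
    return {"actor_1": {"pitch": 180.0, "pace": 1.1}}

def generate_audio(translated: str, metrics: dict) -> str:
    return f"audio({translated!r}, conditioned on {sorted(metrics)})"

def merge(audio: str, video: str) -> str:
    return f"merged({audio}, {video})"

def dub(script: str, source_audio: str, video: str,
        source_lang: str = "en", target_lang: str = "fr") -> str:
    translated = translate_script(script, source_lang, target_lang)
    metrics = speech_metrics(source_audio)
    audio = generate_audio(translated, metrics)
    return merge(audio, video)

print(dub("Hello there.", "scene1.wav", "scene1.mp4"))
```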
Methods and systems for facilitating conversion of content for transfer and storage of content
Various embodiments provide methods and devices for utilizing content conversion for the communication of content. In an embodiment, a method, performed by a user device, includes receiving a user input comprising one or more user preferences to facilitate at least one output content. Further, the method includes receiving at least one input content from at least one content source based on the one or more user preferences. Thereafter, the method includes separating the at least one input content from the at least one content source using delimiters, and generating the at least one output content from the at least one input content based on one or more content characteristics. The at least one output content has a data size less than the data size of the at least one input content. Furthermore, the method includes transmitting the at least one output content to another user device.
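A minimal sketch of the conversion step, with "content characteristics" reduced to a keyword filter purely as an assumption; it splits the input on a delimiter, keeps matching segments, and produces an output smaller than the input, which would then be transmitted to another device.

```python
def convert(input_content: str, delimiter: str, keyword: str) -> str:
    segments = input_content.split(delimiter)
    # "Content characteristics" reduced here to a simple keyword filter,
    # purely as an assumption for the sketch.
    kept = [s for s in segments if keyword in s]
    return delimiter.join(kept)

raw = "sports: match tonight|news: markets up|sports: injury update"
output = convert(raw, delimiter="|", keyword="sports")
assert len(output) < len(raw)   # output has a smaller data size than the input
print(output)                   # would then be sent to another user device
```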
DYNAMIC LANGUAGE SELECTION OF AN AI VOICE ASSISTANCE SYSTEM
The computer-implemented method provides for a digital virtual assistant (DVA) receiving input spoken in a first language by a user. The DVA determines a context of the current situation based on the language and identity of individuals within a proximity of the DVA. The DVA determines whether the context of the current situation includes providing a response using a second language. In response to determining that the context of the current situation calls for providing the response in the second language, the DVA determines the second language based on the context, and the DVA responds to the input spoken in the first language by the user, such that the response includes a dynamic selection of the second language and is based on an interaction context of the user and the DVA and on a corpus of interaction-context usage of the second language in a historically similar situation.
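One way to picture the dynamic selection is the sketch below, where nearby languages and a small interaction-history map stand in for the context and the corpus of prior usage; all of the data structures and names are assumptions for illustration.

```python
def pick_response_language(input_lang: str,
                           nearby_languages: set[str],
                           history: dict[str, str]) -> str:
    # If everyone nearby shares the input language, keep it.
    if nearby_languages <= {input_lang}:
        return input_lang
    # Otherwise consult the interaction-history "corpus" for a similar
    # situation; fall back to another language heard nearby.
    key = ",".join(sorted(nearby_languages))
    if key in history:
        return history[key]
    return sorted(nearby_languages - {input_lang})[0]

history = {"en,es": "es"}
print(pick_response_language("en", {"en", "es"}, history))  # -> "es"
```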
AUTOMATIC DUBBING METHOD AND APPARATUS
A method and system for automatic dubbing are disclosed, comprising, responsive to receiving a selection of media content for playback on a user device by a user of the user device, processing extracted speeches of a first voice from the media content to generate replacement speeches using a set of phonemes of a second voice of the user of the user device, and replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content for playback on the user device.
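A rough sketch of the replacement step, treating the audio track as a list of labeled segments; extract_segments(), resynthesize(), and the phoneme mapping are illustrative stand-ins rather than the patent's actual processing.

```python
def extract_segments(audio: list[str]) -> list[int]:
    # Indices of segments spoken by the original (first) voice.
    return [i for i, s in enumerate(audio) if s.startswith("voice1:")]

def resynthesize(segment: str, user_phonemes: dict[str, str]) -> str:
    # Re-render the segment using the user's (second voice) phoneme set.
    text = segment.removeprefix("voice1:")
    return "voice2:" + "".join(user_phonemes.get(c, c) for c in text)

def dub(audio: list[str], user_phonemes: dict[str, str]) -> list[str]:
    dubbed = list(audio)
    for i in extract_segments(audio):
        dubbed[i] = resynthesize(audio[i], user_phonemes)
    return dubbed

track = ["music", "voice1:hello", "voice1:bye"]
print(dub(track, user_phonemes={"h": "H", "b": "B"}))
```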
Speech Recognition Method and Apparatus, Terminal, and Storage Medium
An artificial intelligence (AI)-based speech recognition method includes steps for obtaining a target speech signal, determining a target language type of the target speech signal, and outputting text information of the target speech signal using a real-time speech recognition model corresponding to the target language type. The real-time speech recognition model is obtained by training a training set including an original speech signal and an extended speech signal, and the extended speech signal is obtained by converting an existing text of a basic language type.
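At inference time the described method amounts to detecting the language of the signal and routing it to the matching real-time model, as in the sketch below; detect_language() and the MODELS table are assumptions, and training on original plus extended (converted) speech is out of scope here.

```python
def detect_language(signal: bytes) -> str:
    # Stand-in for determining the target language type of the signal.
    return "zh" if signal.startswith(b"\x01") else "en"

MODELS = {
    "en": lambda s: "english transcript",
    "zh": lambda s: "chinese transcript",
}

def transcribe(signal: bytes) -> str:
    lang = detect_language(signal)
    model = MODELS[lang]  # real-time model for the detected language type
    return model(signal)

print(transcribe(b"\x01...audio..."))
```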
Neural-Network-Based Text-to-Speech Model for Novel Speaker Generation
Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
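A toy sketch of the idea: map speaker preferences to an embedding sampled from a prior, then condition synthesis on it; the Gaussian prior, the hashing trick, and the function names are illustrative assumptions, not the actual learned model.

```python
import random

def speaker_embedding(preferences: dict[str, str], dim: int = 4) -> list[float]:
    # Deterministically seed from the preferences, then sample from a unit
    # Gaussian standing in for the learned prior distribution.
    rng = random.Random(hash(frozenset(preferences.items())))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def synthesize(text: str, embedding: list[float]) -> str:
    return f"audio of {text!r} spoken by novel speaker {embedding}"

emb = speaker_embedding({"gender": "female", "accent": "irish", "pitch": "low"})
print(synthesize("Hello!", emb))
```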