Patent classifications
G10L13/086
Process for improving pronunciation of proper nouns foreign to a target language text-to-speech system
A system and method configured for use in a text-to-speech (TTS) system are provided. Embodiments may include identifying, using one or more processors, a word or phrase as a named entity and identifying a language of origin associated with the named entity. Embodiments may further include transliterating the named entity to a script associated with the language of origin. If the TTS system is operating in the language of origin, embodiments may include passing the transliterated script to the TTS system. If the TTS system is not operating in the language of origin, embodiments may include generating a phoneme sequence in the language of origin using a grapheme-to-phoneme (G2P) converter.
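The branching described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `NamedEntity`, `route_named_entity`, and the toy `transliterate` and `g2p` callables are hypothetical names, and feeding the transliterated script into the G2P converter is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class NamedEntity:
    text: str             # surface form as it appears in the input text
    origin_language: str  # language of origin, e.g. "fr"


def route_named_entity(
    entity: NamedEntity,
    tts_language: str,
    transliterate: Callable[[str, str], str],
    g2p: Callable[[str, str], List[str]],
) -> Union[str, List[str]]:
    """Return what is handed to the TTS engine for one named entity."""
    # Transliterate to the script associated with the language of origin.
    native_script = transliterate(entity.text, entity.origin_language)
    # TTS already operates in the language of origin: pass the script through.
    if tts_language == entity.origin_language:
        return native_script
    # Otherwise generate a phoneme sequence in the language of origin.
    # Assumption: the G2P converter consumes the transliterated script.
    return g2p(native_script, entity.origin_language)


# Toy stand-ins so the sketch runs; real components would be far richer.
def toy_transliterate(text: str, language: str) -> str:
    return text  # identity: pretend the text is already in the right script


def toy_g2p(text: str, language: str) -> List[str]:
    return list(text.lower())  # one pseudo-phoneme per letter


same_language = route_named_entity(
    NamedEntity("Lyon", "fr"), "fr", toy_transliterate, toy_g2p
)
other_language = route_named_entity(
    NamedEntity("Lyon", "fr"), "en", toy_transliterate, toy_g2p
)
```

Passing the converters in as callables keeps the routing logic separate from any particular transliteration or G2P component.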
ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF
Disclosed is an electronic apparatus. The electronic apparatus includes a memory configured to store first voice recognition information related to a first language and second voice recognition information related to a second language, and a processor configured to: obtain a first text corresponding to a received user voice on the basis of the first voice recognition information; based on the obtained first text indicating that an entity name is included in the user voice, identify a segment of the user voice in which the entity name is included and obtain a second text corresponding to the identified segment on the basis of the second voice recognition information; and obtain control information corresponding to the user voice on the basis of the first text and the second text.
ELECTRONIC DEVICE AND CONTROL METHOD THEREOF
An electronic apparatus includes: a microphone; a communication interface including communication circuitry; a memory configured to store a first encoder corresponding to a first language and a first decoder corresponding to the first language; and a processor configured to: based on a user voice in the first language being received through the microphone, acquire text in the first language corresponding to the user voice, acquire a first feature vector by inputting the text in the first language to the first encoder, control the communication interface to transmit the first feature vector to an external device, and based on a second feature vector being received from the external device through the communication interface, acquire text in the first language corresponding to the second feature vector by inputting the second feature vector to the first decoder.
ANIMATION SYNTHESIS SYSTEM AND LIP ANIMATION SYNTHESIS METHOD
An animation display system is provided. The animation display system includes a display; a storage configured to store a language model database, a phonetic-symbol lip-motion matching database and a lip motion synthesis database; and a processor electronically connected to the storage and the display, respectively. The processor includes a speech conversion module, a phonetic-symbol lip-motion matching module, and a lip motion synthesis module. A lip animation display method is also provided.
Artificial ventriloquist-like contact center agents
Efficient and effective communications are of key importance to contact centers. Agent communications with customers are designed to maximize results while minimizing resources, in particular the time required for human agents to be engaged with a particular customer. Often the impact of two agents on a communication can both improve customer satisfaction and better produce the intended result of the communication. However, using two (or more) live agents is resource intensive. By providing a virtual agent controlled, entirely or in part, by a live agent, the customer may be presented with the appearance of two agents while requiring the human resources of a single agent.
Method and system for remote communication based on real-time translation service
A method for remote communication based on a real-time translation service according to an embodiment of the present disclosure is performed by a real-time translation application executed by one or more processors of a computing device. The method comprises: performing augmented reality-based remote communication; setting an initial value of a translation function for the remote communication; obtaining communication data of other users through the remote communication; performing language detection on the obtained communication data; when a target translation language is detected within the communication data by the language detection, translating the communication data in the detected target translation language; and providing the translated communication data.
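The detect-then-translate step in this abstract might look like the sketch below. All names (`translate_incoming`, `toy_detect`, `toy_translate`) are hypothetical, the keyword-based detector is a toy, and passing non-matching messages through unchanged is an assumption; the abstract only says what happens when the target translation language is detected.

```python
from typing import Callable, Iterable, List


def translate_incoming(
    messages: Iterable[str],
    target_language: str,
    detect_language: Callable[[str], str],
    translate: Callable[[str], str],
) -> List[str]:
    """Translate only the messages detected as the target translation language."""
    provided = []
    for message in messages:
        if detect_language(message) == target_language:
            provided.append(translate(message))
        else:
            # Assumption: other communication data is provided unchanged.
            provided.append(message)
    return provided


# Toy stand-ins so the sketch runs without a real detector or translator.
TOY_DICTIONARY = {"hola": "hello", "gracias": "thank you"}


def toy_detect(text: str) -> str:
    return "es" if text.lower() in TOY_DICTIONARY else "en"


def toy_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text.lower(), text)


result = translate_incoming(
    ["hola", "good morning", "gracias"], "es", toy_detect, toy_translate
)
```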
Translation device, translation method, and program
A translation device includes a speech recognition unit, a storage, a translation processor, and an information acquisition unit. The speech recognition unit recognizes a voice to generate a spoken sentence in a first language. The storage stores a plurality of example sentences each including a parameter representing a category corresponding to a plurality of terms. The translation processor searches the plurality of example sentences stored in the storage for an example sentence on the basis of the spoken sentence as a search result example sentence, and generates a converted sentence based on the search result example sentence. The information acquisition unit acquires specific information representing a specific term which corresponds to a specific parameter. If the search result example sentence includes the specific parameter, the translation processor generates the converted sentence based on the specific term represented by the specific information.
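A rough sketch of the example-sentence search and parameter substitution follows. The example store, the `{airport}` slot syntax, and the function name `convert` are illustrative assumptions, and string similarity from Python's `difflib` stands in for whatever search the device actually uses. The translation step itself is omitted; the sketch only shows how a specific term fills the matched parameter.

```python
import difflib
from typing import Dict, Optional

# Hypothetical example-sentence store; "{airport}" is a parameter standing
# for a category of terms (any airport name).
EXAMPLES = [
    "we will arrive at {airport} soon",
    "please fasten your seat belt",
]


def convert(spoken: str, specific_terms: Dict[str, str]) -> Optional[str]:
    """Find the closest example sentence and fill in any specific parameters."""
    # cutoff=0.0 means the single best-scoring example is always returned.
    matches = difflib.get_close_matches(spoken, EXAMPLES, n=1, cutoff=0.0)
    if not matches:
        return None
    sentence = matches[0]
    # If the matched example contains a parameter for which specific
    # information was acquired, use the specific term in its place.
    for name, term in specific_terms.items():
        sentence = sentence.replace("{" + name + "}", term)
    return sentence


converted = convert("we will arrive at tokyo soon", {"airport": "Haneda Airport"})
```

In this sketch the acquired specific term takes the place of whatever word was recognized in the slot, which is one way to read the abstract's final sentence.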
Methods, apparatus and data structure for cross-language speech adaptation
Adapted speech models produce fluent synthesized speech in a voice that sounds as if the speaker were fluent in a language in which the speaker is actually non-fluent. A full speech model is obtained based on fluent speech in the language spoken by a first person who is fluent in the language. A limited set of utterances is obtained in the language spoken by a second person who is non-fluent in the language but able to speak the limited set of utterances in the language. The full speech model of the first person is then processed with the limited set of utterances of the second person to produce an adapted speech model. The adapted speech model may be stored to a multi-lingual speech model as a child node of a root with an associated language selection question and branches pointed to the adapted speech model and other speech models, respectively.
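One possible reading of the data structure in the last sentence is a small decision tree whose root asks a language selection question. The sketch below is an assumption-laden illustration: `SpeechModel`, `LanguageQuestion`, `add_adapted_model`, and `select_model` are hypothetical names, and the adaptation step itself is not modeled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SpeechModel:
    speaker: str
    language: str
    adapted: bool = False  # True for a model produced by adaptation


@dataclass
class LanguageQuestion:
    language: str  # the question: "is the requested language this one?"
    yes: "Node"    # branch taken when the answer is yes
    no: "Node"     # branch pointing to the other speech models


Node = Union[SpeechModel, LanguageQuestion]


def add_adapted_model(existing: Node, adapted: SpeechModel) -> LanguageQuestion:
    """Store the adapted model as a child of a root with a language question."""
    return LanguageQuestion(language=adapted.language, yes=adapted, no=existing)


def select_model(node: Node, language: str) -> SpeechModel:
    """Walk the tree, answering each language selection question."""
    while isinstance(node, LanguageQuestion):
        node = node.yes if language == node.language else node.no
    return node


native_model = SpeechModel("second_person", "en")
adapted_model = SpeechModel("second_person", "es", adapted=True)
root = add_adapted_model(native_model, adapted_model)
```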
System and method for using prior frame data for OCR processing of frames in video sources
Disclosed are systems, methods and computer program products for using prior frame data for OCR processing of frames in video sources to detect natural language text therein. An example includes receiving a frame from a video source and retrieving prior frame data associated with the video source. The OCR-processing includes using prior frame data to detect blobs similar to blobs described in the prior frame data; using detected similar blobs to detect in the frame character candidates similar to character candidates described in the prior frame data; using detected similar character candidates to detect in the frame text candidates similar to text candidates described in the prior frame data; and using detected similar text candidates to detect in the frame text strings similar to text strings described in the prior frame data.
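The first stage of this cascade, detecting blobs in the current frame that resemble blobs recorded for a prior frame, could be sketched as below. The abstract does not define "similar"; using intersection-over-union (IoU) of bounding boxes with a 0.5 threshold is an assumption made for illustration, as are the names `iou` and `similar_blobs`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with x0 < x1 and y0 < y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned bounding boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not overlap
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def similar_blobs(
    current: List[Box], prior: List[Box], threshold: float = 0.5
) -> List[Box]:
    """Keep current-frame blobs that overlap strongly with some prior blob."""
    return [
        blob for blob in current
        if any(iou(blob, old) >= threshold for old in prior)
    ]


prior_frame_blobs = [(1, 1, 11, 11)]
current_frame_blobs = [(0, 0, 10, 10), (50, 50, 60, 60)]
kept = similar_blobs(current_frame_blobs, prior_frame_blobs)
```

The later stages named in the abstract (character candidates, text candidates, text strings) would each filter the output of the stage before them in a similar way.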
APPARATUS AND METHOD FOR RECOGNIZING VOICE COMMANDS
Various embodiments of the present invention relate to an apparatus and a method for recognizing voice commands in an electronic apparatus. The method for voice recognition comprises the operations of: outputting a voice or an audio signal comprising a plurality of successive components; receiving the voice signal; determining one or more components from among the plurality of components by utilizing the time at which the voice signal was received; and generating response information for the voice signal on the basis of the one or more components or at least a part of the information regarding the components.
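Determining a component from the time a voice signal was received can be sketched with a sorted list of start times. This assumes one reading of the abstract: the apparatus plays the components one after another, records when each began, and maps the arrival time of the user's voice to the component playing at that moment. The function name `component_at` and the sample data are hypothetical.

```python
import bisect
from typing import List, Optional


def component_at(
    start_times: List[float], components: List[str], received_at: float
) -> Optional[str]:
    """Return the component that was being output at time `received_at`.

    `start_times` must be sorted ascending and aligned with `components`.
    """
    # Index of the last component whose start time is <= received_at.
    index = bisect.bisect_right(start_times, received_at) - 1
    if index < 0:
        return None  # the voice arrived before any component started
    return components[index]


# Hypothetical playback log: each component starts at the given second.
starts = [0.0, 2.5, 5.0]
items = ["news", "weather", "traffic"]
chosen = component_at(starts, items, 3.1)
```

This sketch ignores when the last component ends, so a voice received long after playback still maps to the final component; a real system would likely bound that window.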