Patent classifications
G10L15/005
INTERACTION INFORMATION PROCESSING METHOD AND APPARATUS, DEVICE, AND MEDIUM
An interaction information processing method and apparatus, a device, and a medium are provided. The method includes: collecting voice data of at least one participating user in an interaction conducted by users on a real-time interactive interface; determining, based on the voice data, a source language type used by each of the at least one participating user; converting the voice data of the at least one participating user from the source language type to a target language type, to obtain translation data; and displaying the translation data on a target client device.
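The claimed pipeline (collect voice data per participant, detect each speaker's source language, translate to the target language, display on the target client) can be sketched as below. Everything here is an illustrative stand-in, not from the patent: `detect_language` is a toy keyword lookup in place of a real language-ID model, and the phrase table replaces a real translation engine.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    user_id: str
    text: str  # recognized text stands in for raw voice data


# Toy detector: a keyword lookup stands in for a real language-ID model.
def detect_language(text: str) -> str:
    return "de" if any(w in text.lower() for w in ("hallo", "danke")) else "en"


# Toy phrase table stands in for a real machine-translation system.
PHRASES = {("de", "en", "hallo"): "hello", ("de", "en", "danke"): "thanks"}


def translate(text: str, src: str, tgt: str) -> str:
    if src == tgt:
        return text
    return PHRASES.get((src, tgt, text.lower()), f"[{src}->{tgt}] {text}")


def process_interaction(utterances, target_lang="en"):
    """For each participant, detect the source language, translate into the
    target language, and return the lines a target client would display."""
    return [(u.user_id, translate(u.text, detect_language(u.text), target_lang))
            for u in utterances]
```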
Assistance during audio and video calls
Implementations relate to providing information items for display during a communication session. In some implementations, a computer-implemented method includes receiving, during a communication session between a first computing device and a second computing device, first media content from the communication session. The method further includes determining a first information item for display in the communication session based at least in part on the first media content. The method further includes sending a first command to at least one of the first computing device and the second computing device to display the first information item.
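A minimal sketch of the "determine an information item from the media content, then command a device to display it" step. The trigger keywords and info cards are assumptions made up for illustration; a real implementation would use a trained content-understanding model rather than word matching.

```python
# Hypothetical trigger table: maps a keyword heard in the session's media
# content to an information card. Keywords and cards are illustrative only.
INFO_ITEMS = {
    "flight": "Card: flight status lookup",
    "restaurant": "Card: nearby restaurants",
}


def first_info_item(transcript: str):
    """Return a (command, item) pair for the first matching keyword in the
    session's media content, or None if nothing matches."""
    for word in transcript.lower().split():
        if word in INFO_ITEMS:
            return ("display", INFO_ITEMS[word])
    return None
```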
Speech translation device, speech translation method, and recording medium
A speech translation device, for conversation between a first speaker making an utterance in a first language and a second speaker making an utterance in a second language different from the first language, includes: a speech detector that detects, from sounds that are input, a speech segment in which the first speaker or the second speaker made an utterance; a display that, after speech recognition is performed on the utterance, displays a translation result obtained by translating the utterance from the first language to the second language or from the second language to the first language; and an utterance instructor that outputs, in the second language via the display, a message prompting the second speaker to make an utterance after a first speaker's utterance or outputs, in the first language via the display, a message prompting the first speaker to make an utterance after a second speaker's utterance.
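The utterance instructor's turn-taking rule — after one speaker's utterance is translated, prompt the other speaker in that speaker's own language — reduces to a small function. The language codes and message strings are illustrative assumptions.

```python
def next_prompt(last_speaker: int, lang1: str = "ja", lang2: str = "en") -> tuple:
    """After speaker 1's utterance is translated, prompt speaker 2 in the
    second language, and vice versa. Returns (language, message).
    Messages are provided for the two default languages only."""
    messages = {"ja": "どうぞお話しください。", "en": "Please speak now."}
    lang = lang2 if last_speaker == 1 else lang1
    return (lang, messages[lang])
```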
TRANSLATION SYSTEM, TRANSLATION APPARATUS, TRANSLATION METHOD, AND TRANSLATION PROGRAM
The present invention contributes to reducing the burden on a user while preventing speeches translated into a plurality of languages from interfering with each other. A translation system comprises a camera that obtains surroundings information; a directional speaker that is movable so as to output sound toward a specified position; a directional microphone that is movable so as to receive sound from a specified position; and a translation apparatus that determines a location of a user from the surroundings information obtained by the camera, moves the directional speaker and the directional microphone toward the location of the user, identifies the language of a speech received by the directional microphone, translates the language into another language to output the translated language from another directional speaker, and retranslates the translation in the another language into the language to output the retranslated language from the directional speaker.
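The forward-translate/retranslate loop the abstract describes can be sketched as follows; the phrase table stands in for the real translation engine, and the speaker/microphone steering is omitted.

```python
# Toy phrase table stands in for the real translation engine.
TABLE = {("en", "fr", "hello"): "bonjour", ("fr", "en", "bonjour"): "hello"}


def translate(text: str, src: str, tgt: str) -> str:
    return TABLE.get((src, tgt, text), text)


def relay(text: str, src: str, tgt: str):
    """Forward-translate for the listener's directional speaker, then
    back-translate for the original speaker's directional speaker, so the
    speaker can verify what was actually conveyed."""
    forward = translate(text, src, tgt)
    back = translate(forward, tgt, src)
    return forward, back
```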
Speech translation method electronic device and computer-readable storage medium using SEQ2SEQ for determining alternative translated speech segments
Provided are a speech translation method and apparatus, an electronic device and a storage medium. The method includes: acquiring a source speech corresponding to a to-be-translated language; acquiring a specified target language; inputting the source speech and indication information matched with the target language into a pre-trained speech translation model, where the speech translation model is configured to translate a language in a first language set into a language in a second language set, the first language set includes a plurality of languages, the first language set includes the to-be-translated language, the second language set includes a plurality of languages, and the second language set includes the target language; and acquiring a translated speech corresponding to the target language and output by the speech translation model; where the to-be-translated language is different from the target language.
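The "indication information matched with the target language" is essentially a target-language token fed alongside the source speech, as in common multilingual seq2seq setups. A sketch under that assumption; the language sets and token format are illustrative, not the patent's.

```python
FIRST_SET = {"zh", "en", "fr"}   # source languages the model accepts
SECOND_SET = {"en", "de", "ja"}  # target languages the model can emit


def build_model_input(source_tokens, source_lang, target_lang):
    """Prepend a target-language indicator token so that a single
    many-to-many seq2seq model knows which output language to produce."""
    if source_lang not in FIRST_SET:
        raise ValueError(f"unsupported source language: {source_lang}")
    if target_lang not in SECOND_SET or target_lang == source_lang:
        raise ValueError(f"invalid target language: {target_lang}")
    return [f"<2{target_lang}>"] + list(source_tokens)
```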
Leveraging unpaired text data for training end-to-end spoken language understanding systems
An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text; and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text; and training the E2E SLU system using the synthetic speech.
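The TTS variant of the method — turning unpaired (text, label) data into trainable (speech, label) pairs — can be sketched as below. `toy_tts` is a deterministic stand-in for a real TTS system; all names are illustrative.

```python
def toy_tts(text: str):
    """Stand-in for a real TTS system: deterministic pseudo audio features."""
    return [ord(c) % 7 for c in text]


def synth_training_pairs(unpaired):
    """unpaired: list of (text, intent_label) with no audio. Synthesizing
    speech turns it into (audio, label) pairs that an E2E speech-to-intent
    model can train on."""
    return [(toy_tts(text), label) for text, label in unpaired]
```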
Method for operating a motor vehicle having an operating device
The invention relates to a method for operating a motor vehicle having an operating device that includes a speech recognition and language determination device. In a first operating mode with a first operating language, a voice input of a user of the motor vehicle is recognized, and a check is made as to whether the language of the voice input corresponds to the first operating language. Depending on the result of this check, a confidence value is assigned to the voice input, describing the probability that the language of the voice input is a second operating language. Depending on the assigned confidence value, a query signal is generated, describing a request, understandable in the second operating language, asking the user to indicate the operating mode or operating language to be set. In response to a received operating signal, the operating mode or operating language to be set is set.
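The confidence-driven decision can be reduced to a threshold rule. The threshold value and return labels are illustrative assumptions, not from the patent.

```python
def react_to_voice_input(p_second_language: float, threshold: float = 0.7) -> str:
    """If the confidence that the voice input is in the second operating
    language crosses a threshold, issue a query (in that language) asking
    which language or mode to set; otherwise stay in the first language."""
    if p_second_language >= threshold:
        return "query_user"
    return "keep_first_language"
```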
Electronic device and method for controlling the same, based on determining the intent of a user's speech in a first language machine-translated into a predefined second language
An electronic device and a method for controlling the electronic device are provided. The electronic device includes a memory storing instructions and a processor configured to control the electronic device by executing the stored instructions. Based on a user's speech being input, the processor is configured to: acquire a first sentence in a first language corresponding to the user's speech through a speech recognition model corresponding to the language of the speech; acquire a second sentence in a second language corresponding to the first sentence through a machine translation model trained to translate a plurality of languages into the predefined second language; and acquire a control instruction of the electronic device corresponding to the second sentence, or a response to the second sentence, through a natural language understanding model trained based on the second language.
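The pivot-language design — recognize in the user's language, translate everything into one predefined second language, then run a single NLU model trained only on that language — can be sketched as below. The translation and intent tables are toy stand-ins for the trained models.

```python
# Toy stand-ins: a translation table into the fixed pivot language, and an
# intent table "trained" only on the pivot language. Entries are illustrative.
MT = {("ko", "en", "불 켜 줘"): "turn on the light"}
NLU = {"turn on the light": "light_on"}


def handle_speech(recognized_text: str, speech_lang: str, pivot: str = "en") -> str:
    """Machine-translate the recognized sentence into the predefined second
    language (unless already in it), then run the single NLU model on it."""
    if speech_lang == pivot:
        sentence = recognized_text
    else:
        sentence = MT.get((speech_lang, pivot, recognized_text), recognized_text)
    return NLU.get(sentence, "unknown")
```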
Method and system for speech recognition, electronic device, and storage medium
Disclosed are a method and a system for speech recognition, an electronic device, and a storage medium, relating to the technical field of speech recognition. Embodiments of the application comprise: encoding an audio to be recognized to obtain an acoustic encoded state vector sequence of the audio; performing sparse encoding on the acoustic encoded state vector sequence to obtain an acoustic encoded sparse vector; determining a text prediction vector for each label in a preset vocabulary; and recognizing the audio and determining its corresponding text content according to the acoustic encoded sparse vector and the text prediction vector.
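One simple reading of "sparse encoding" is top-k sparsification of each acoustic state vector, combined with a per-label text prediction vector to score labels. This is an assumed interpretation for illustration; the patent does not specify the sparse coder.

```python
def sparse_encode(state, k=2):
    """Keep the k largest-magnitude components of an acoustic state vector
    and zero the rest (a simple top-k notion of sparse coding)."""
    keep = set(sorted(range(len(state)), key=lambda i: abs(state[i]),
                      reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(state)]


def label_score(sparse_vec, text_pred_vec):
    """Combine the sparse acoustic vector with one label's text prediction
    vector; taking the argmax over all labels would pick the next token."""
    return sum(a * b for a, b in zip(sparse_vec, text_pred_vec))
```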
Information presentation device, and information presentation method
There is provided an information presentation device configured to present information to a plurality of users who differ in level, in such a manner that each user can easily understand it, and an information presentation method. The information presentation device includes: an identification unit that identifies the respective levels of one or more users; an obtaining unit that obtains presentation information to be presented to the users; a conversion unit that appropriately converts the obtained presentation information according to the level of each user; and a presentation unit that presents the appropriately converted presentation information to each user. The present technology can be applied to, for example, a robot, a signage device, a car navigation device, and the like.
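The identify-convert-present flow reduces to per-level conversion followed by fan-out. The level names and the word substitution are assumptions made up for illustration.

```python
# Illustrative converters per comprehension level; the levels and the word
# substitutions are assumptions, not from the patent.
CONVERTERS = {
    "child": lambda s: s.replace("precipitation", "rain"),
    "expert": lambda s: s,
}


def present(info: str, users):
    """users: list of (user_id, level). Convert the same presentation
    information according to each user's level and fan it out per user."""
    return {uid: CONVERTERS.get(level, lambda s: s)(info)
            for uid, level in users}
```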