G06F40/51

SYSTEMS, METHODS, AND APPARATUS FOR SWITCHING BETWEEN AND DISPLAYING TRANSLATED TEXT AND TRANSCRIBED TEXT IN THE ORIGINAL SPOKEN LANGUAGE

A method for managing a cloud-based meeting among participants who speak and understand different languages is disclosed. The method includes receiving, via a microphone at a first client device, first audio content in a first language preference of a first meeting participant; transcribing the first audio content into a first transcribed text by using the first language preference; receiving, from a second client device, a second language preference that is different from the first language preference; translating the first transcribed text into a second transcribed text by using the second language preference; and transmitting the first and second transcribed texts to the second client device. The second client device is configured to concurrently display the first transcribed text and the second transcribed text on a display device. The second client device can also be configured to provide second audio content from the second transcribed text.
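The claimed flow can be sketched as a small pipeline: transcribe in the speaker's language, translate into the listener's language, and deliver both texts so the client can display them concurrently. This is a minimal sketch; the `transcribe` and `translate` functions are hypothetical stand-ins for real speech-to-text and translation services.

```python
def transcribe(audio: bytes, language: str) -> str:
    # Placeholder ASR: a real system would call a speech-to-text service here.
    return f"[{language} transcript of {len(audio)} bytes]"

def translate(text: str, source: str, target: str) -> str:
    # Placeholder MT: a real system would call a translation service here.
    return f"[{target} translation of: {text}]"

def relay_utterance(audio: bytes, speaker_lang: str, listener_lang: str) -> dict:
    # Transcribe using the speaker's language preference.
    original = transcribe(audio, speaker_lang)
    if listener_lang == speaker_lang:
        return {"original": original, "translated": original}
    # Translate using the listener's language preference.
    translated = translate(original, speaker_lang, listener_lang)
    # Both texts are sent so the client can display them side by side.
    return {"original": original, "translated": translated}
```

Sending both texts, rather than only the translation, is what enables the concurrent original/translated display described in the abstract.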

PROVIDING A WELL-FORMED ALTERNATE PHRASE AS A SUGGESTION IN LIEU OF A NOT WELL-FORMED PHRASE

Implementations relate to determining a well-formed phrase to suggest to a user to submit in lieu of a not well-formed phrase. The suggestion is rendered via an interface provided to a client device of the user. Those implementations determine that a phrase is not well-formed, identify alternate phrases related to the not well-formed phrase, and score the alternate phrases to select one or more of them to render via the interface. Some of those implementations identify that the phrase is not well-formed based on how often the phrase occurs in documents whose creators have the language of the phrase as their primary language.
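The described pipeline, detect a not well-formed phrase from native-speaker occurrence counts, generate related alternates, score them, and suggest the best, can be sketched as follows. The corpus counts, the occurrence threshold, and the alternate-generation rule are all illustrative toy assumptions, not the scoring models of the actual implementations.

```python
# Toy corpus statistics: phrase -> occurrence count in documents written by
# creators whose primary language matches the phrase's language.
NATIVE_OCCURRENCES = {
    "make a decision": 5000,
    "take a decision": 300,
    "do a decision": 2,
}

def is_well_formed(phrase, threshold=100):
    # A phrase rarely used by native speakers is treated as not well-formed.
    return NATIVE_OCCURRENCES.get(phrase, 0) >= threshold

def alternates(phrase):
    # Toy alternate generation: swap the leading verb for common choices.
    words = phrase.split()
    return [" ".join([verb] + words[1:])
            for verb in ("make", "take", "do") if verb != words[0]]

def suggest(phrase):
    # Return a well-formed alternate to render in lieu of the phrase, or None.
    if is_well_formed(phrase):
        return None
    scored = [(NATIVE_OCCURRENCES.get(a, 0), a) for a in alternates(phrase)]
    best_score, best = max(scored)
    return best if best_score > 0 else None
```

For example, `suggest("do a decision")` would return `"make a decision"`, while an already well-formed phrase yields no suggestion.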

Rewriting queries

Systems and methods are described for mitigating errors introduced during processing of user input such as voice input. A query may be derived from processed user input. A performance predictor analyzes the query and uses historical data to predict whether the query will return relevant results if executed. If the query's predicted performance is below a threshold, a query rewriter may identify potential alternatives to the query from a library of “known good” queries. Different analyzers may be applied to identify different sets of alternatives, and machine learning models may be applied to rank the outputs of the analyzers. The best-matching alternatives from each analyzer may then be provided as inputs to a further machine learning model, which assesses the probability that each of the identified alternatives reflects the intent of the user. A most likely alternative may then be selected to execute in place of the original query.
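The two-stage flow, a performance predictor gating a rewriter over "known good" queries, can be sketched as below. The historical click-through table and the string-similarity ranking are illustrative stand-ins for the historical data and machine learning models the abstract describes.

```python
import difflib

# Toy historical performance data: query -> observed relevance signal.
HISTORICAL_CTR = {"weather today": 0.9, "weather forecast": 0.85, "whether today": 0.05}
KNOWN_GOOD = ["weather today", "weather forecast", "news today"]

def predicted_performance(query):
    # Stand-in for the performance predictor trained on historical data.
    return HISTORICAL_CTR.get(query, 0.0)

def rewrite(query, threshold=0.5):
    if predicted_performance(query) >= threshold:
        return query  # predicted to return relevant results; execute as-is
    # Rank known-good queries by string similarity (a stand-in for the
    # learned intent-probability model) and pick the most likely alternative.
    ranked = sorted(
        KNOWN_GOOD,
        key=lambda q: difflib.SequenceMatcher(None, query, q).ratio(),
        reverse=True)
    return ranked[0]
```

Here a misrecognized voice query such as "whether today" would be rewritten to "weather today", while a query predicted to perform well passes through unchanged.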

DETECTION OF ABBREVIATION AND MAPPING TO FULL ORIGINAL TERM

A translation capability for language processing determines the existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors such as camel-case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing, e.g., a comparison metric) may then be used to map the abbreviation candidates to the text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., a translation service, non-exact matching) to reduce the effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.
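The ruleset and the non-exact matching step can be sketched as two small functions. The three rules (camel case, concluding period, consecutive consonants) follow the abstract; using `difflib`'s similarity ratio as the comparison metric is one illustrative choice, not necessarily the metric an embodiment would use.

```python
import difflib
import re

def looks_like_abbreviation(token):
    # Ruleset from the abstract: any one signal marks a candidate.
    word = token.rstrip(".")
    has_trailing_period = token.endswith(".")
    has_camel_case = bool(re.search(r"[a-z][A-Z]", word))
    # Three or more consecutive consonants often indicate a contraction.
    has_consonant_run = bool(re.search(r"[bcdfghjklmnpqrstvwxyz]{3,}", word.lower()))
    return has_trailing_period or has_camel_case or has_consonant_run

def map_to_full_term(abbrev, full_terms):
    # Non-exact matching: pick the full term most similar to the abbreviation.
    word = abbrev.rstrip(".").lower()
    return max(full_terms,
               key=lambda t: difflib.SequenceMatcher(None, word, t.lower()).ratio())
```

So "Dept." is flagged by the concluding-period rule and then mapped to "department" rather than, say, "description", and the resulting pair could be stored in the translation database.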

VIDEO TRANSLATION METHOD AND APPARATUS, STORAGE MEDIUM, AND ELECTRONIC DEVICE

A video translation method includes: converting speech in a video to be translated into text; displaying the text together with first time information, second time information, and a reference translation of the text; in response to a user operation on the text or the reference translation, displaying an editing area in which the user can input a translation; as the user inputs text, providing a translation suggestion drawn from the reference translation; when a confirmation operation by the user on the translation suggestion is detected, using the translation suggestion as the translation result and displaying it; when a non-confirmation operation by the user on the translation suggestion is detected, receiving a translation input by the user that differs from the translation suggestion, using the input translation as the translation result and displaying it, and updating the reference translation in the translation area according to the input translation.
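The confirm/override branch at the heart of the method can be sketched as a single function: a confirmed suggestion becomes the result, while a different user-entered translation both becomes the result and updates the reference translation. The function and parameter names here are illustrative, not from the abstract.

```python
def resolve_translation(suggestion, user_input, reference, key):
    # Confirmation: no differing input, so the suggestion is the result.
    if user_input is None or user_input == suggestion:
        return suggestion, reference
    # Non-confirmation: the user's translation becomes the result, and the
    # reference translation is updated accordingly for future suggestions.
    updated = dict(reference)
    updated[key] = user_input
    return user_input, updated
```

Returning an updated copy of the reference (rather than mutating it) keeps the update step explicit; an implementation would instead refresh the reference translation shown in the translation area.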

SYSTEMS AND METHODS FOR CROSS-LINGUAL CROSS-MODAL TRAINING FOR MULTIMODAL RETRIEVAL
20220383048 · 2022-12-01

Current pretrained vision-language models for cross-modal retrieval tasks in English depend on the availability of large annotated image-caption datasets with English text for pretraining. In practice, however, the texts are not necessarily in English. Although machine translation (MT) tools may be used to translate text to English, performance largely relies on the MT quality and may suffer from high latency in real-world applications. Embodiments herein address these problems by learning cross-lingual cross-modal representations for matching images and their relevant captions in multiple languages. Embodiments seamlessly combine cross-lingual pretraining objectives and cross-modal pretraining objectives in a unified framework to learn image and text representations in a joint embedding space from available English image-caption data, monolingual corpora, and parallel corpora. Embodiments are shown to achieve state-of-the-art performance on retrieval tasks over multimodal multilingual image-caption datasets.
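One way to read "combining cross-lingual and cross-modal objectives in a unified framework" is as a sum of contrastive terms over one joint embedding space: an image pulled toward its English caption, and the English caption pulled toward its parallel non-English caption. The sketch below uses toy vectors in place of encoder outputs and an InfoNCE-style loss; it illustrates the shape of such an objective, not the embodiments' actual training losses.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    # InfoNCE: push the positive's similarity above the negatives' similarities.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

def joint_objective(image, caption_en, caption_xx, negatives):
    # Cross-modal term: match the image with its English caption.
    cross_modal = contrastive_loss(image, caption_en, negatives)
    # Cross-lingual term: match the English caption with its parallel caption.
    cross_lingual = contrastive_loss(caption_en, caption_xx, negatives)
    return cross_modal + cross_lingual
```

Because both terms share one embedding space, non-English captions become retrievable against images even though only English image-caption pairs are directly supervised.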

MULTILINGUAL SUBTITLE SERVICE SYSTEM AND METHOD FOR CONTROLLING SERVER THEREOF
20220383228 · 2022-12-01

Proposed are a multilingual subtitle service system and a method for controlling a service server thereof. The subtitle service system includes: a subtitle service server configured to, in response to a request from a worker, provide a subtitle creation tool for creating subtitle content for a content image requested by a client, and to evaluate the worker's task performance based on the subtitle content the worker creates; and a user terminal device configured to access the subtitle service server to transmit project information on the content image requested by the client and, in response to a request from the worker, display a subtitle service window including the subtitle creation tool provided by the subtitle service server.