G06F40/45

Alignment generation device and alignment generation method

A computer generates a plurality of encoded sentences in the first language by encoding a plurality of sentences in the first language in prescribed units. Next, the computer generates a plurality of encoded sentences in the second language by encoding, in the prescribed units, a plurality of sentences in the second language, each of which is associated with a respective one of the plurality of sentences in the first language. The computer generates alignment information based on a code included in each of the plurality of encoded sentences in the first language and a code included in the associated encoded sentence in the second language. The alignment information indicates an alignment between a plurality of codes in the first language and a plurality of codes in the second language.
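The abstract does not say how the alignment is scored, so the following is only a minimal sketch under stated assumptions: the "prescribed unit" is taken to be a whitespace-separated word, each word is encoded as an integer code, and codes are aligned by the Dice coefficient of their sentence-level co-occurrence. The function names and the toy sentence pairs are hypothetical.

```python
from collections import Counter
from itertools import product

def encode(sentence, codebook):
    """Encode a sentence word by word; unseen words get the next integer code."""
    return [codebook.setdefault(word, len(codebook)) for word in sentence.lower().split()]

def generate_alignment(sentence_pairs):
    """Align source codes to target codes by Dice score (an assumed scoring rule)."""
    src_book, tgt_book = {}, {}
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in sentence_pairs:
        src_codes = set(encode(src_sentence, src_book))
        tgt_codes = set(encode(tgt_sentence, tgt_book))
        src_freq.update(src_codes)
        tgt_freq.update(tgt_codes)
        # Count every source/target code pair that shares a sentence pair.
        pair_freq.update(product(src_codes, tgt_codes))
    alignment, best_score = {}, {}
    for (s, t), n in pair_freq.items():
        score = 2 * n / (src_freq[s] + tgt_freq[t])
        if score > best_score.get(s, 0.0):
            best_score[s] = score
            alignment[s] = t
    return alignment, src_book, tgt_book

pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]
alignment, src_book, tgt_book = generate_alignment(pairs)
# Decode the code-level alignment back to words for inspection.
tgt_words = {code: word for word, code in tgt_book.items()}
word_alignment = {word: tgt_words[alignment[code]] for word, code in src_book.items()}
```

Working on integer codes rather than raw strings keeps the counting tables compact, which is presumably one reason to encode before aligning.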

Translation training with cross-lingual multi-media support
11256882 · 2022-02-22 · ·

An improved lecture support system integrates multi-media presentation materials with spoken content so that the listener can follow both the speech and the supporting materials that accompany the presentation, which provides additional understanding. Computer-based systems and methods are disclosed for translation of a spoken presentation (e.g., a lecture, a video) along with the accompanying presentation materials. The content of the presentation materials can be used to improve presentation translation: the system extracts supportive material from the presentation materials as it relates to the speech.

TRANSLATION USING RELATED TERM PAIRS
20170300475 · 2017-10-19 ·

A method includes translating a source to generate a translated source, extracting a set of terms from one of the source and the translated source comprising at least a first term and a second term related to the first term, comparing the extracted set of terms with at least one translation pair, and determining a correct translation based on the comparison.
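As a rough illustration of the comparison step, the sketch below assumes that a translation pair is stored as a (term, related term) key mapped to a preferred translation, and that terms are extracted from the source by simple whitespace splitting. The table contents and the function name are hypothetical.

```python
# Hypothetical translation pairs: (first term, related second term) -> translation.
TRANSLATION_PAIRS = {
    ("bank", "river"): "rive",
    ("bank", "loan"): "banque",
}

def choose_translation(source, term, draft_translation):
    """Return the paired translation if a related term co-occurs with `term`."""
    extracted_terms = set(source.lower().split())
    if term not in extracted_terms:
        return draft_translation
    for (first, related), translation in TRANSLATION_PAIRS.items():
        if first == term and related in extracted_terms:
            return translation
    # No related term found: keep what the initial translation produced.
    return draft_translation

corrected = choose_translation("She sat on the bank of the river", "bank", "banque")
unchanged = choose_translation("The bank is closed", "bank", "banque")
```

Here the initial translation of "bank" is overridden only when a related term ("river") supplies evidence for another sense.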

METHOD FOR DETECTING ORIGINAL LANGUAGE

A system for detecting an original language of a translated document retrieves the translated document, and identifies a language of the retrieved document. The system calculates a language model for the language of the retrieved document (LM(RD)). The system calculates a distinct vector as a difference between LM(RD) and a common language model for the language of the retrieved document (LMT(RD)). The system obtains pair vectors for language model pairs associated with the language of the retrieved document, and calculates a vector distance between the distinct vector and each pair vector (or between LM(RD) and each pair vector). The system identifies a given pair vector within a threshold vector distance, and calculates a confidence score. The system then identifies the original language corresponding to the given pair vector as the original language of the retrieved document, and retrieves an original document in the original language of the retrieved document.
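The vector steps above can be sketched as follows. This is a simplified illustration, not the patented method: language models are assumed to be fixed-length feature vectors, the distance is assumed to be Euclidean, and the confidence formula (1 minus distance over threshold) is an assumption because the abstract does not define it. All numbers are made up.

```python
import math

def detect_original_language(lm_rd, lmt_rd, pair_vectors, threshold):
    """Return (language, confidence) for the closest pair vector within threshold."""
    # Distinct vector: LM(RD) minus the common language model LMT(RD).
    distinct = [a - b for a, b in zip(lm_rd, lmt_rd)]
    best = None
    for language, pair_vector in pair_vectors.items():
        distance = math.dist(distinct, pair_vector)
        if distance <= threshold and (best is None or distance < best[1]):
            best = (language, distance)
    if best is None:
        return None, 0.0
    language, distance = best
    # Assumed confidence: 1.0 at zero distance, 0.0 at the threshold.
    return language, 1.0 - distance / threshold

lm_rd = [0.30, 0.50, 0.20]    # model of the retrieved document (toy values)
lmt_rd = [0.25, 0.55, 0.20]   # common model for the same language (toy values)
pair_vectors = {
    "de": [0.05, -0.04, 0.00],   # hypothetical signature of text translated from German
    "fr": [-0.10, 0.08, 0.02],   # hypothetical signature of text translated from French
}
language, confidence = detect_original_language(lm_rd, lmt_rd, pair_vectors, threshold=0.05)
```

With these toy values only the "de" pair vector falls within the threshold, so it is reported as the original language.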

Improving Automatic Speech Recognition of Multilingual Named Entities

Methods and systems are provided for improving speech recognition of multilingual named entities. In some embodiments, a list comprising a plurality of named entities may be accessed by a computing device. A first named entity represented in the native language may be compared with the first named entity represented in the foreign language. One or more words that appear in both the first named entity represented in the native language and the first named entity represented in the foreign language may be identified as one or more foreign words. A grapheme-to-phoneme (G2P) conversion may be applied to the one or more foreign words, wherein graphemes of the one or more foreign words are mapped to phonemes in the native language. The G2P conversion may result in a native pronunciation for each of the one or more foreign words, and each foreign word is added to a recognition dictionary along with its native pronunciation.
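A minimal sketch of the two steps described above: finding the words shared by both representations of an entity, then mapping their graphemes to native phonemes. The grapheme table is a toy placeholder and not a real phonology, the greedy longest-match strategy is an assumption, and the entity strings are hypothetical.

```python
# Toy grapheme-to-phoneme table (hypothetical; real G2P models are far richer).
TOY_G2P = {"th": "t", "ea": "e:", "ow": "o:"}

def find_foreign_words(native_form, foreign_form):
    """Words appearing in both representations are treated as foreign words."""
    return set(native_form.lower().split()) & set(foreign_form.lower().split())

def g2p(word, table=TOY_G2P):
    """Greedy longest-match conversion; unknown letters map to themselves."""
    longest = max(len(grapheme) for grapheme in table)
    phonemes, i = [], 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                phonemes.append(table[chunk])
                i += len(chunk)
                break
        else:
            phonemes.append(word[i])
            i += 1
    return phonemes

foreign_words = find_foreign_words("Flughafen Heathrow", "Heathrow Airport")
# Add each foreign word with its native pronunciation to the dictionary.
recognition_dictionary = {word: g2p(word) for word in foreign_words}
```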

SYSTEM, METHOD, AND RECORDING MEDIUM FOR CORPUS PATTERN PARAPHRASING
20170286399 · 2017-10-05 ·

A corpus pattern paraphrasing method, system, and non-transitory computer readable medium, include an analyzing circuit configured to analyze a corpus of sentences stored in a database to determine regular structures including a plurality of substitute words for verbs expressed as patterns and apply deep learning of the regular structures over the patterns, a representative word determining circuit configured to determine a plurality of representative words that represents each class of word of the regular structures, and an aligning circuit configured to align word slots of a paraphrase pattern of the classes of words replaced with substitute words and representative words in the paraphrase pattern to give a same semantic meaning to the paraphrase pattern as a sentence of the corpus of sentences.

Cross-lingual discriminative learning of sequence models with posterior regularization
09779087 · 2017-10-03 · ·

A computer-implemented method can include obtaining (i) an aligned bi-text for a source language and a target language, and (ii) a supervised sequence model for the source language. The method can include labeling a source side of the aligned bi-text using the supervised sequence model and projecting labels from the labeled source side to a target side of the aligned bi-text to obtain a labeled target side of the aligned bi-text. The method can include filtering the labeled target side based on a task of a natural language processing (NLP) system configured to utilize a sequence model for the target language to obtain a filtered target side of the aligned bi-text. The method can also include training the sequence model for the target language using posterior regularization with soft constraints on the filtered target side to obtain a trained sequence model for the target language.
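The projection and filtering steps can be illustrated with a short sketch. It assumes word alignments are given as (source index, target index) pairs and that filtering keeps a sentence only when enough target tokens received a label; the coverage threshold is a hypothetical stand-in for the task-dependent filter. Training with posterior regularization is not shown.

```python
def project_labels(source_labels, alignment, target_length):
    """Copy each source label to the target token it is aligned with."""
    target_labels = [None] * target_length
    for src_idx, tgt_idx in alignment:
        target_labels[tgt_idx] = source_labels[src_idx]
    return target_labels

def keep_for_training(target_labels, min_coverage=0.7):
    """Assumed filter: keep sentences where enough tokens got a projected label."""
    labeled = sum(label is not None for label in target_labels)
    return labeled / len(target_labels) >= min_coverage

# Source: "Obama visited Paris", labeled by a supervised source-language model.
source_labels = ["PER", "O", "LOC"]
# Target: "Obama a visité Paris"; the token "a" has no aligned source word.
alignment = [(0, 0), (1, 2), (2, 3)]
target_labels = project_labels(source_labels, alignment, target_length=4)
keep = keep_for_training(target_labels)
```

Unaligned tokens stay unlabeled, which is one reason soft constraints may be preferable to treating projected labels as gold data.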

Translation method and apparatus

Embodiments of the present invention provide a translation method and apparatus, and relate to the field of machine translation. The method includes: obtaining a to-be-translated sentence, where the to-be-translated sentence is a sentence expressed in a first language; determining a first named entity set in the to-be-translated sentence, and an entity type of each first named entity in the first named entity set; determining, based on the first named entity set and the entity type of each first named entity, a second named entity set expressed in a second language; determining a source semantic template of the to-be-translated sentence, and obtaining a target semantic template corresponding to the source semantic template from a semantic template correspondence; and determining a target translation sentence based on the second named entity set and the target semantic template.
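The flow described above can be sketched with lookup tables. This is only an illustration under simplifying assumptions: named entities are found by exact string match in a hypothetical lexicon, each entity type occurs at most once per sentence, and the semantic template correspondence is a plain dictionary. The entries are invented for the example.

```python
# Hypothetical entity lexicon: source entity -> (entity type, target-language entity).
ENTITY_LEXICON = {
    "Munich": ("CITY", "München"),
    "Monday": ("DAY", "Montag"),
}
# Hypothetical semantic template correspondence (source template -> target template).
TEMPLATE_MAP = {
    "book a flight to <CITY> on <DAY>": "buche einen Flug nach <CITY> am <DAY>",
}

def translate(sentence):
    """Translate by swapping entities for typed slots and mapping the template."""
    source_template = sentence
    target_entities = {}
    for entity, (entity_type, translated) in ENTITY_LEXICON.items():
        if entity in source_template:
            source_template = source_template.replace(entity, f"<{entity_type}>")
            target_entities[entity_type] = translated
    target_template = TEMPLATE_MAP.get(source_template)
    if target_template is None:
        return None  # no matching template in the correspondence
    for entity_type, translated in target_entities.items():
        target_template = target_template.replace(f"<{entity_type}>", translated)
    return target_template

result = translate("book a flight to Munich on Monday")
```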

Systems and methods for processing nuances in natural language
11244120 · 2022-02-08 · ·

Systems, apparatuses, methods, and computer program products are disclosed for processing electronic information indicative of natural language. An example method includes receiving first electronic information indicative of a sequence of words provided by a user and identifying, based on the first electronic information, a first word and a first natural language. The example method further includes receiving second electronic information indicative of an exogenous event and identifying, based on the second electronic information, the exogenous event. The example method further includes generating one or more natural language attribute data sets based on the identified first word, first language, and exogenous event. The example method further includes generating a natural language transliteration data set based on the one or more natural language attribute data sets. Subsequently, the example method includes generating, based on the natural language transliteration data set, a translation of the first word in a second natural language.
