G06F40/49

Word vector retrofitting method and apparatus

The present disclosure discloses a word vector retrofitting method. The method includes obtaining, by a computing device, a first model and a second model that are generated when original word vectors are trained, the first model being configured to predict a context according to an inputted word, and the second model being configured to predict a target word according to a context; inputting a corpus unit from a target corpus into the first model, inputting an output of the first model into the second model, and determining losses generated by the first model and the second model when the second model outputs the corpus unit; and retrofitting the first model and the second model according to the losses.
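The abstract chains the two models: a corpus unit goes into the first (context-predicting) model, that model's output feeds the second (word-predicting) model, and the losses of both are measured when the second model has to reproduce the corpus unit. The sketch below covers only the loss computation, assuming skip-gram-style and CBOW-style softmax models stored as plain lists of vectors. The coupling between them is a hypothetical choice, not the disclosure's: the first model's context distribution is turned into an expected context embedding. Retrofitting would then backpropagate both losses, for example their sum, into both models. That step is not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def chained_losses(word, context, sg_in, sg_out, cbow_in, cbow_out):
    """Return (first_model_loss, second_model_loss) for one corpus unit.

    `word` is the centre-word id and `context` a non-empty list of context ids.
    Each matrix is a list of vectors indexed by word id (hypothetical layout).
    """
    # First model (skip-gram style): distribution over context words given `word`.
    p_ctx = softmax([dot(row, sg_in[word]) for row in sg_out])
    first_loss = -sum(math.log(p_ctx[c]) for c in context) / len(context)

    # Chain the models (hypothetical coupling): the expected context embedding
    # under p_ctx becomes the hidden vector of the second model.
    dim = len(cbow_in[0])
    hidden = [sum(p * vec[k] for p, vec in zip(p_ctx, cbow_in)) for k in range(dim)]

    # Second model (CBOW style): how well does it recover the original word?
    p_word = softmax([dot(row, hidden) for row in cbow_out])
    second_loss = -math.log(p_word[word])
    return first_loss, second_loss


# With all-zero weights both softmaxes are uniform, so both losses equal log(V).
V, D = 4, 3
zeros = [[0.0] * D for _ in range(V)]
print(chained_losses(0, [1, 2], zeros, zeros, zeros, zeros))
```

The all-zero case makes a convenient sanity check. Both models start at the uniform-distribution loss of log 4, and any useful retrofitting step should push the summed loss below that.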

NATURAL LANGUAGE PROCESSING AND MACHINE-LEARNING FOR EVENT IMPACT ANALYSIS

Systems and methods may use natural language processing (NLP) and machine-learning techniques to detect the impact that an event will have on a domain-specific topic. For example, the system may apply multi-stage cleaning, using a rules-based filter and an artificial intelligence (AI)-based filter, to remove large quantities of event items that may not be relevant to a domain of interest. The AI-based filter may be trained using labeled event items that were previously known to be impactful. The system may cluster the cleaned event items to group similar event items and eliminate redundancy. The system may then predict and quantify the impact that the events described by the clustered event items will have on the domain-specific topic. Such prediction may be based on a classifier trained on various model features that correlate with impactful events, including prior similar events.
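The cleaning and de-duplication stages can be sketched as a small pipeline, assuming event items are short text headlines. The domain-term rule, the `score_fn` relevance model (a stand-in for the trained AI-based filter), and the greedy Jaccard clustering are all hypothetical illustrations, not the disclosed implementation. The impact-prediction classifier is not shown.

```python
def tokens(text):
    return set(text.lower().split())


def rules_filter(items, domain_terms):
    """Stage 1 (rules-based): keep items containing at least one domain term."""
    terms = {t.lower() for t in domain_terms}
    return [it for it in items if tokens(it) & terms]


def ai_filter(items, score_fn, threshold=0.5):
    """Stage 2 (AI-based): keep items that a relevance model scores >= threshold.

    `score_fn` stands in for a classifier trained on event items labeled as impactful.
    """
    return [it for it in items if score_fn(it) >= threshold]


def jaccard(a, b):
    sa, sb = tokens(a), tokens(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster(items, sim_threshold=0.5):
    """Greedy single-pass clustering: an item joins the first cluster whose first
    member is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for it in items:
        for group in clusters:
            if jaccard(group[0], it) >= sim_threshold:
                group.append(it)
                break
        else:
            clusters.append([it])
    return clusters


items = [
    "Port strike halts shipping in Rotterdam",
    "Rotterdam port strike halts shipping",
    "Celebrity wedding announced",
    "New tariff on steel imports",
]
stage1 = rules_filter(items, ["port", "shipping", "tariff", "steel"])
# Toy relevance model: a keyword heuristic standing in for a trained classifier.
stage2 = ai_filter(stage1, lambda it: 0.9 if "strike" in it.lower() or "tariff" in it.lower() else 0.1)
groups = cluster(stage2)
print(groups)
```

Running the cheap rules filter first means the more expensive model only scores items that already mention the domain. Clustering afterwards collapses the two near-identical strike reports into one group.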

Method, apparatus, electronic device and readable storage medium for translation

The present disclosure provides a method, apparatus, electronic device and readable storage medium for translation, and relates to translation technologies. In the embodiments of the present disclosure, at least one knowledge element is obtained according to associated information of the content to be translated. Each knowledge element comprises an element of a first language type and an element of a second language type, so that the at least one knowledge element can be used to obtain a translation result of the content to be translated. Because the at least one knowledge element is obtained in advance and serves as global information for the current translation task, the same content to be translated is guaranteed a consistent translation result, thereby improving the quality of the translation result.
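The consistency idea can be illustrated with bilingual knowledge elements treated as (source term, target term) pairs selected once for the whole document and then enforced in every segment. This sketch uses placeholder substitution around a hypothetical `base_translate` function. That is a swapped-in, commonly used terminology-enforcement technique, not the disclosure's method of supplying knowledge elements to the model. The term base and the word-by-word translator below are toy assumptions.

```python
def collect_knowledge_elements(document, term_base):
    """Pick bilingual (source -> target) pairs whose source term occurs in the document."""
    return {src: tgt for src, tgt in term_base.items() if src in document}


def translate_with_knowledge(segments, knowledge, base_translate):
    """Translate segments, forcing every knowledge element to the same target term.

    Placeholder substitution is only a simple stand-in for however the knowledge
    elements are actually supplied to the translation model.
    """
    # Longest source terms first, so a short term cannot clobber part of a longer one.
    ordered = sorted(knowledge.items(), key=lambda kv: -len(kv[0]))
    results = []
    for seg in segments:
        restore = {}
        for i, (src, tgt) in enumerate(ordered):
            if src in seg:
                tag = f"__KE{i}__"
                seg = seg.replace(src, tag)
                restore[tag] = tgt
        out = base_translate(seg)
        for tag, tgt in restore.items():
            out = out.replace(tag, tgt)
        results.append(out)
    return results


# Toy word-by-word translator that would render "Apfel" generically as "apple".
toy_dict = {"Neuigkeiten": "news", "mehr": "more", "über": "about", "Apfel": "apple"}


def word_by_word(text):
    return " ".join(toy_dict.get(w, w) for w in text.split())


segments = ["Neuigkeiten über Apfel", "mehr über Apfel"]
knowledge = collect_knowledge_elements(" ".join(segments), {"Apfel": "Apple", "Birne": "Pear"})
print(translate_with_knowledge(segments, knowledge, word_by_word))
```

Because the knowledge elements are collected once from the whole document, every segment maps "Apfel" to the same target "Apple", overriding the translator's generic rendering.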

METHOD AND SYSTEM FOR EVALUATING AND IMPROVING LIVE TRANSLATION CAPTIONING SYSTEMS

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating and improving live translation captioning systems. An exemplary method includes: displaying a word in a first language; receiving a first audio sequence, the first audio sequence comprising a verbal description of the word; generating a first translated text in a second language; displaying the first translated text; receiving a second audio sequence, the second audio sequence comprising a guessed word based on the first translated text; generating a second translated text in the first language; determining a matching score between the word and the second translated text; and determining a performance score of the live translation captioning system based on the matching score.
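The final two steps, matching the displayed word against the second translated text and turning matches into a system score, can be sketched as follows. The abstract does not define either score. This sketch assumes an exact word match scores 1.0, otherwise falls back to `difflib.SequenceMatcher` character similarity, and averages over rounds. The audio capture, speech recognition, and translation stages are outside the sketch.

```python
from difflib import SequenceMatcher


def matching_score(word, back_translation):
    """Score in [0, 1]: 1.0 if the original word appears in the back-translation,
    otherwise the character-level similarity of the two strings (assumed metric)."""
    w = word.strip().lower()
    t = back_translation.strip().lower()
    if w in (tok.strip(".,!?") for tok in t.split()):
        return 1.0
    return SequenceMatcher(None, w, t).ratio()


def performance_score(rounds):
    """Average matching score over (displayed word, second translated text) rounds."""
    if not rounds:
        return 0.0
    return sum(matching_score(w, t) for w, t in rounds) / len(rounds)


rounds = [("apple", "An apple."), ("dog", "cat")]
print(performance_score(rounds))
```

A round only scores highly when the guessed word survives two translation passes. A low average therefore points to a captioning or translation step that loses meaning along the way.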

TRANSLATION APPARATUS, TRANSLATION METHOD AND PROGRAM

A translation apparatus includes: a preprocessing unit that takes an input sentence in a source language and outputs a token string in which the input sentence has been segmented into tokens, the tokens being a predetermined unit of processing; an output sequence prediction unit that inputs the token string output by the preprocessing unit to a trained translation model and predicts, from the trained translation model, a word translation probability of a translation candidate for each token of the token string; a word set prediction unit that checks each token of the token string output by the preprocessing unit against entry words of a bilingual dictionary and, upon detecting an entry word in the bilingual dictionary that matches the token, generates a target-language word set from the set of tokens constituting the translation phrase corresponding to the detected entry word; and an output sequence determination unit that computes a reward based on whether a translation candidate for each token of the input sentence is included in the target-language word set, and determines a translated sentence of the input sentence based on a word translation score computed by adding the reward to the word translation probability of the translation candidate. The tokens constituting the translation phrase in the bilingual dictionary are subwords.
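The reward mechanism can be sketched in a few functions, assuming BPE-style subwords marked with `@@` and a dictionary keyed by tuples of source subwords. The reward value of 0.3 is a hypothetical parameter. For simplicity the per-step candidate probabilities are given up front, whereas a real decoder computes them autoregressively from earlier choices, so this is a sketch of the rescoring rule rather than a full decoder.

```python
def target_word_set(tokens, bilingual_dict):
    """Collect the target-side subwords of every dictionary entry whose source
    subword sequence occurs contiguously in `tokens`."""
    words = set()
    for entry, translation in bilingual_dict.items():
        n = len(entry)
        if any(tuple(tokens[i:i + n]) == entry for i in range(len(tokens) - n + 1)):
            words.update(translation)
    return words


def pick_token(candidates, word_set, reward):
    """Add `reward` to the probability of candidates in the word set; return the best."""
    scores = {tok: p + (reward if tok in word_set else 0.0) for tok, p in candidates.items()}
    return max(scores, key=scores.get)


def decode(step_candidates, word_set, reward=0.3):
    """Pick one target token per step from precomputed {token: probability} maps."""
    return [pick_token(c, word_set, reward) for c in step_candidates]


src = ["the", "data@@", "base"]
lexicon = {("data@@", "base"): ("Daten@@", "bank")}
ws = target_word_set(src, lexicon)
steps = [
    {"die": 0.6, "der": 0.4},
    {"Date@@": 0.45, "Daten@@": 0.35},
    {"basis": 0.5, "bank": 0.3},
]
print(decode(steps, ws))
```

With the reward the dictionary phrase "Daten@@ bank" wins over the model's higher-probability but unsupported candidates. With `reward=0.0` the decoder falls back to pure model probabilities ("Date@@ basis").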