G06F17/27

PREDICTING FUTURE TRANSLATIONS
20180004734 · 2018-01-04 ·

Technology is disclosed for snippet pre-translation and dynamic selection of translation systems. Pre-translation uses snippet attributes, such as characteristics of the snippet author, snippet topics, snippet context, and expected snippet viewers, to predict how many translation requests for the snippet are likely to be received. An appropriate translator can be dynamically selected to translate a snippet, either because the snippet was selected for pre-translation or because of another trigger, such as a user requesting a translation of the snippet. Some translators can produce high-quality translations after a delay, while others can produce lower-quality translations sooner. Dynamic translator selection involves choosing between machine and human translation, e.g., based on the desired translation quality. Translations can be improved over time by employing better machine or human translators, for example when a snippet is identified as more popular.
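
The prediction and selection steps described above can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the attribute names, the linear scoring formula, and both thresholds are hypothetical and are not taken from the application; they only show how a predicted request count could drive the choice between no pre-translation, machine translation, and human translation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snippet:
    # All attribute names below are hypothetical stand-ins for the
    # "snippet attributes" named in the abstract.
    text: str
    author_followers: int         # characteristic of the snippet author
    topic_popularity: float       # 0.0..1.0 score for the snippet's topics
    expected_viewers: int         # expected number of viewers
    foreign_viewer_share: float   # fraction of viewers reading another language


def predict_translation_requests(s: Snippet) -> float:
    """Toy linear estimate of future translation requests; weights are made up."""
    topic_boost = 0.5 + 0.5 * s.topic_popularity
    return s.expected_viewers * s.foreign_viewer_share * topic_boost + 0.01 * s.author_followers


def select_translator(predicted_requests: float,
                      pretranslate_threshold: float = 100.0,
                      human_threshold: float = 10_000.0) -> Optional[str]:
    """Return None (no pre-translation), "machine", or "human"."""
    if predicted_requests < pretranslate_threshold:
        return None          # wait for an explicit user request instead
    if predicted_requests < human_threshold:
        return "machine"     # fast, lower-quality translation now
    return "human"           # slower, higher-quality translation for popular snippets
```

For example, a snippet with 1,000 expected viewers, 20% of whom read another language, on a maximally popular topic scores about 200 predicted requests and would be sent to a machine translator. Re-running the prediction as view counts grow is one way to model the "improve over time" behavior, escalating a snippet to a human translator once it crosses the higher threshold.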

CORPUS GENERATION DEVICE AND METHOD, HUMAN-MACHINE INTERACTION SYSTEM
20180004730 · 2018-01-04 ·

A corpus generation device and method, the device comprising: a segmentation module, connected to at least one monolingual parallel corpus, for segmenting a sentence into words and processing the segmented words using a knowledge-driven approach; a classification module for classifying sentences that have different tag sequences but the same meaning into the same sentence cluster; a mapping module for determining the category of sentence structure of each sentence in a sentence cluster, and for recording and storing the mapping mode used to transform tags when one category of sentence structure in the cluster is transformed into another; a sentence structure generation module for generating sentence structures according to a first mapping mode between a first category of sentence structures in one of the sentence clusters and the other categories of sentence structures in the same cluster; and a corpus generation module for nesting the word corresponding to each sequence tag to generate a new monolingual parallel corpus.
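
The "mapping mode" between two sentence structures in one cluster can be illustrated with a small sketch. It assumes, hypothetically, that a sentence structure is a tuple of tags in which each tag appears at most once, so a mapping mode reduces to a list of source positions; the tag names and example sentence are invented for illustration and are not from the application.

```python
from typing import List, Sequence, Tuple

# A sentence structure is modeled as a tuple of tags; this assumes each tag
# occurs at most once per structure (a simplification).
Structure = Tuple[str, ...]


def learn_mapping(src: Structure, dst: Structure) -> List[int]:
    """Record a 'mapping mode': for each slot in dst, the slot of src that fills it."""
    if sorted(src) != sorted(dst):
        raise ValueError("structures in one cluster must share the same tags")
    return [src.index(tag) for tag in dst]


def apply_mapping(words: Sequence[str], mapping: List[int]) -> List[str]:
    """Rearrange a tagged sentence from the source structure into the target structure."""
    return [words[i] for i in mapping]
```

With a cluster containing the structures `("PER", "VERB", "OBJ", "TIME")` and `("TIME", "PER", "VERB", "OBJ")`, the learned mapping is `[3, 0, 1, 2]`, and applying it to "I watch films tomorrow" yields the paraphrase "tomorrow I watch films". Every such rearranged sentence, paired with its source, is one candidate entry for the new monolingual parallel corpus.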

TEXT TRANSLATION USING CONTEXTUAL INFORMATION RELATED TO TEXT OBJECTS IN TRANSLATED LANGUAGE
20180012278 · 2018-01-11 ·

In an example embodiment, input is received from a first user of a computer system. A text object relating to a first item is created from the input and translated from a first language to a second language. A database is searched for a plurality of text objects in the second language whose text is similar to the translated text object; each of these text objects contains textual information pertaining to the first item. The similar text objects are then ranked by comparing contextual information about the first item with the contextual information stored in the database for each of them. At least one of the ranked text objects is translated into the first language.
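
The ranking step can be sketched as below. The application does not say how contextual information is compared; this sketch assumes, hypothetically, that context is a set of attribute strings and uses Jaccard overlap as the similarity measure. The final translation back into the first language is left out.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of context attributes (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_context(item_context: Set[str],
                    candidates: List[Tuple[str, Set[str]]]) -> List[str]:
    """Order candidate texts (already in the second language) by context similarity."""
    ranked = sorted(candidates, key=lambda c: jaccard(item_context, c[1]), reverse=True)
    return [text for text, _ in ranked]
```

For an item with context `{"electronics", "phone", "used"}` and Spanish candidates "funda de teléfono" (`{"accessories", "phone"}`), "teléfono usado" (`{"electronics", "phone", "used"}`), and "teléfono fijo" (`{"home"}`), the scores are 0.25, 1.0, and 0.0, so "teléfono usado" ranks first.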

Method of Lemmatization, Corresponding Device and Program
20180011835 · 2018-01-11 ·

A method is provided for creating a lexical tree from a statement in a natural language. The method is implemented by a natural-language processing module. The method includes: receiving a statement in natural language in the form of a string of characters; iteratively processing the statement as a function of at least one processing parameter and an ontological dictionary, which delivers at least one relational graph corresponding to at least one lexical item in the statement; and creating, as output, a data structure containing all possible combinations of the lexical items of the statement on the basis of the at least one relational graph.
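
The "all possible combinations of the lexical items" output can be pictured as a Cartesian product over per-token readings. The sketch below replaces the ontological dictionary and relational graphs with a hypothetical toy lookup table, so it shows only the combination step, not the application's graph construction.

```python
from itertools import product
from typing import Dict, List, Tuple

Reading = Tuple[str, str]  # (lemma, part of speech)

# Hypothetical toy stand-in for the "ontological dictionary".
TOY_DICTIONARY: Dict[str, List[Reading]] = {
    "i": [("I", "PRON")],
    "saw": [("see", "VERB"), ("saw", "NOUN")],
    "her": [("she", "PRON")],
    "duck": [("duck", "NOUN"), ("duck", "VERB")],
}


def readings(token: str) -> List[Reading]:
    """Candidate lexical items for one token; unknown tokens get a single UNK reading."""
    return TOY_DICTIONARY.get(token.lower(), [(token, "UNK")])


def all_combinations(statement: str) -> List[Tuple[Reading, ...]]:
    """Every way of choosing one reading per token (Cartesian product)."""
    tokens = statement.split()
    return list(product(*(readings(t) for t in tokens)))
```

"I saw her duck" has two ambiguous tokens with two readings each, so `all_combinations` returns 1 × 2 × 1 × 2 = 4 analyses, leaving disambiguation to a later stage.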

ARCHITECTURE FOR MULTI-DOMAIN NATURAL LANGUAGE PROCESSING

Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output.
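
A rough sketch of running several domains in parallel and picking the most likely result is shown below. The two keyword-based "domains", their confidence values, and the idea of applying hints as multiplicative weights are all hypothetical simplifications; real domain recognizers would be statistical NLU models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

Interpretation = Tuple[str, float]  # (intent name, confidence)


# Hypothetical keyword-based domain recognizers standing in for real NLU models.
def music_domain(text: str) -> Interpretation:
    return ("PlayMusic", 0.9 if "play" in text else 0.1)


def weather_domain(text: str) -> Interpretation:
    return ("GetWeather", 0.8 if "weather" in text else 0.05)


DOMAINS: Dict[str, Callable[[str], Interpretation]] = {
    "music": music_domain,
    "weather": weather_domain,
}


def interpret(text: str, hints: Optional[Dict[str, float]] = None) -> Tuple[str, str]:
    """Run every domain in parallel, weight scores by hints, return (domain, intent)."""
    hints = hints or {}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, text) for name, fn in DOMAINS.items()}
        results = {name: f.result() for name, f in futures.items()}
    # Hints (e.g. derived from earlier interactions) scale each domain's confidence.
    scored = {name: (intent, conf * hints.get(name, 1.0))
              for name, (intent, conf) in results.items()}
    best = max(scored, key=lambda name: scored[name][1])
    return best, scored[best][0]
```

For the transcription "play the weather report", the music domain wins on raw scores (0.9 vs 0.8), but a hint such as `{"weather": 2.0}`, for instance because the user just asked about the forecast, shifts the choice to the weather domain.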

COMPUTING DEVICE AND CORRESPONDING METHOD FOR GENERATING DATA REPRESENTING TEXT
20180011834 · 2018-01-11 ·

An example method involves (i) accessing first data representing text, wherein the text defines at least one position representing a particular type of grammatical break between two portions of the text; (ii) identifying, from among the at least one position, a position that is closest to a target position within the text; (iii) based on the identified position within the text, generating second data that represents a proper subset of the text, wherein the proper subset extends from an initial position within the text to the identified position within the text; and (iv) providing output based on the generated second data.
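
Steps (i) to (iii) can be sketched in a few lines. The sketch assumes, hypothetically, that the "particular type of grammatical break" is sentence-ending punctuation followed by whitespace or the end of the text, and that the initial position is the start of the text.

```python
import re

# Assumption: a sentence-ending punctuation mark followed by whitespace or the
# end of the text is the "particular type of grammatical break".
SENTENCE_BREAK = re.compile(r"[.!?](?=\s|$)")


def truncate_at_nearest_break(text: str, target: int) -> str:
    """Return text[0:p] where p is the break position closest to target."""
    # m.end() is the position just after the punctuation mark; keep only
    # positions before the end of the text so the result is a proper prefix.
    positions = [m.end() for m in SENTENCE_BREAK.finditer(text) if m.end() < len(text)]
    if not positions:
        return text  # no usable break; fall back to the whole text
    nearest = min(positions, key=lambda p: abs(p - target))
    return text[:nearest]
```

In "First sentence. Second one here. Third." the usable breaks fall at positions 15 and 32, so a target of 20 yields "First sentence." and a target of 30 yields "First sentence. Second one here.". This kind of truncation is useful for producing previews that end on a clean boundary.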

Annotation Assisting Apparatus and Computer Program Therefor

An annotation data generation assisting system includes: an input/output device that receives input through an interactive process; a morphological analysis system and a dependency parsing system that perform morphological analysis and dependency parsing on text data in a text archive; first to fourth candidate generating units that detect a zero anaphor or a referring expression in the dependency relations of a predicate in a sequence of morphemes, identify its position as an annotation target, and estimate candidate expressions to be inserted using language knowledge; a candidate DB that stores the estimated candidates; and an interactive annotation device that reads annotation candidates from the candidate DB and annotates the candidate selected through the interactive process via the input/output device.
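
One candidate-generating step, detecting a predicate with no overt subject (a zero anaphor) and proposing antecedents, might look like the sketch below. The token format, the `nsubj` label, and the rule "any preceding noun is a candidate" are hypothetical simplifications of the "language knowledge" the application refers to; the returned dictionary plays the role of the candidate DB.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    idx: int
    form: str
    pos: str              # coarse part of speech, e.g. "NOUN", "VERB"
    head: Optional[int]   # index of the governing token; None for the root
    rel: str              # dependency label, e.g. "nsubj", "obj"


def zero_subject_candidates(tokens: List[Token]) -> Dict[int, List[str]]:
    """Map each predicate lacking a subject to earlier nouns as antecedent candidates."""
    candidate_db: Dict[int, List[str]] = {}
    for pred in tokens:
        if pred.pos != "VERB":
            continue
        has_subject = any(t.head == pred.idx and t.rel == "nsubj" for t in tokens)
        if not has_subject:
            # Heuristic stand-in for "language knowledge": any preceding noun.
            candidate_db[pred.idx] = [t.form for t in tokens
                                      if t.pos == "NOUN" and t.idx < pred.idx]
    return candidate_db
```

For a parse of "Taro went to the store and bought bread" in which "bought" has no subject dependent, the sketch flags "bought" and proposes "Taro" and "store"; an annotator would then pick "Taro" in the interactive step.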

SYNTAX ANALYZING DEVICE, LEARNING DEVICE, MACHINE TRANSLATION DEVICE AND STORAGE MEDIUM
20180011833 · 2018-01-11 ·

A syntax analyzing device includes: a syntax analyzing unit that analyzes the syntax of a sentence received by a receiving unit to acquire a first analysis result, which contains one or more elements constituting the sentence and the part of speech of each element, and has one or more binary trees whose nodes are the parts of speech or the elements; a category acquiring unit that acquires a category for each of the one or more elements constituting the sentence; a category inserting unit that acquires a second analysis result in which each category is inserted between the corresponding element of the first analysis result and that element's part of speech; and a learning unit that outputs the second analysis result acquired by the category inserting unit.
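
The category-insertion step can be sketched on a small binary tree. The `Node` class, the bracketed output format, and the category labels are hypothetical; the sketch only shows how each part-of-speech-to-word edge becomes part of speech, then category, then word, while elements without a category are left unchanged.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def insert_categories(node: Node, categories: Dict[str, str]) -> Node:
    """Copy the tree, turning each POS -> word edge into POS -> category -> word."""
    if node.is_leaf():
        return Node(node.label)
    # A preterminal is a POS node whose only child is a word leaf.
    if node.right is None and node.left is not None and node.left.is_leaf():
        word = Node(node.left.label)
        category = categories.get(word.label)
        return Node(node.label, Node(category, word) if category else word)
    return Node(node.label,
                insert_categories(node.left, categories) if node.left else None,
                insert_categories(node.right, categories) if node.right else None)


def to_brackets(node: Node) -> str:
    if node.is_leaf():
        return node.label
    children = " ".join(to_brackets(c) for c in (node.left, node.right) if c is not None)
    return f"({node.label} {children})"
```

Given the first analysis result `(S (NN Tokyo) (VB visit))` and the category map `{"Tokyo": "LOCATION"}`, the second analysis result is `(S (NN (LOCATION Tokyo)) (VB visit))`, which a learning unit could then use as training data.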

COMPUTER DATA SYSTEM DATA SOURCE REFRESHING USING AN UPDATE PROPAGATION GRAPH

Described are methods, systems and computer readable media for data source refreshing.

Tibetan Character Constituent Analysis Method, Tibetan Sorting Method And Corresponding Devices
20180011836 · 2018-01-11 ·

The present invention discloses a Tibetan character constituent analysis method, a Tibetan sorting method and corresponding devices, and relates to the field of natural language processing. It addresses the problem that existing Tibetan sorting methods lack universality and compatibility, which makes automatic computer sorting of Tibetan text inconvenient. The technical solution provided by the present invention includes: S10, acquiring a Tibetan text to be analyzed; S20, using the Tibetan characters in the Tibetan text as the input of a preset group of finite state automata; and S30, when a target finite state automaton in the group determines that the Tibetan characters in the text are correctly spelled, acquiring the constituents of the Tibetan characters according to that target finite state automaton.
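
Steps S20 and S30 can be illustrated with a single toy automaton. This is a deliberately simplified, hypothetical sketch, not the application's automaton group: it recognizes only a root consonant, an optional subjoined letter, an optional vowel sign, and an optional suffix consonant, using coarse Unicode ranges from the Tibetan block. It ignores prefixes, superscripts, post-suffixes, and stacked subjoined letters, all of which a real speller must handle.

```python
from typing import Dict, Optional, Tuple


def char_class(ch: str) -> Optional[str]:
    """Coarse class of a Tibetan code point (Unicode block U+0F00..U+0FFF)."""
    cp = ord(ch)
    if 0x0F40 <= cp <= 0x0F6C:
        return "consonant"
    if 0x0F90 <= cp <= 0x0FBC:
        return "subjoined"
    if 0x0F71 <= cp <= 0x0F7D:
        return "vowel"
    return None


# state -> {character class -> (next state, constituent recorded)}
TRANSITIONS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "start": {"consonant": ("root", "root")},
    "root": {"subjoined": ("sub", "subjoined"), "vowel": ("vowel", "vowel"),
             "consonant": ("suffix", "suffix")},
    "sub": {"vowel": ("vowel", "vowel"), "consonant": ("suffix", "suffix")},
    "vowel": {"consonant": ("suffix", "suffix")},
    "suffix": {},
}
ACCEPTING = {"root", "sub", "vowel", "suffix"}


def analyze_syllable(syllable: str) -> Optional[Dict[str, str]]:
    """Return constituents if the toy automaton accepts the syllable, else None."""
    state = "start"
    parts: Dict[str, str] = {}
    for ch in syllable:
        cls = char_class(ch)
        step = TRANSITIONS[state].get(cls) if cls else None
        if step is None:
            return None  # rejected: not a valid spelling under this automaton
        state, role = step
        parts[role] = ch
    return parts if state in ACCEPTING else None
```

The syllable "\u0f40\u0fb1\u0f72" (ka, subjoined ya, vowel sign i) is accepted with constituents root, subjoined, and vowel, while a string that begins with a vowel sign is rejected. Once constituents are known, a sort key can be built from them in a fixed order (for example root first), which is the kind of structure-aware ordering the sorting method relies on.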