G06F17/27

Latent Ambiguity Handling in Natural Language Processing
20170371860 · 2017-12-28

Mechanisms are provided for latent ambiguity handling in natural language processing. The method may include: providing an annotated semantic graph based on a knowledge base, in which nodes representing semantic concepts are linked by semantic relationships and one or more nodes are annotated to indicate a latent ambiguity of their semantic concept; processing a natural language input by activating the annotated semantic graph; during processing, when an annotated node indicating a latent ambiguity is encountered, applying a pre-defined latent ambiguity process to that node; and replacing any such annotated nodes that remain in the output of the processing with an alternative node from the semantic graph.
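
The abstract leaves "activation" and the "pre-defined latent ambiguity process" unspecified. The sketch below is one possible reading, not the patented method: it assumes spreading activation over a weighted graph, assumes the ambiguity process is "an annotated node keeps its own activation but passes none on", and replaces any annotated node left in the top-k output with its most active unambiguous neighbor. The `Node` schema, `decay`, `steps`, and `top_k` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A semantic concept; `ambiguous` is the latent-ambiguity annotation (hypothetical schema)."""
    name: str
    ambiguous: bool = False
    neighbors: dict = field(default_factory=dict)  # neighbor name -> relationship weight


def activate(graph, seeds, decay=0.5, steps=2):
    """Spread activation outward from the concepts mentioned in the input."""
    activation = {name: 1.0 for name in seeds if name in graph}
    frontier = dict(activation)
    for _ in range(steps):
        spread = {}
        for name, level in frontier.items():
            node = graph[name]
            # Assumed latent-ambiguity process: an annotated node keeps its own
            # activation but does not pass any on to its neighbors.
            if node.ambiguous:
                continue
            for neighbor, weight in node.neighbors.items():
                spread[neighbor] = spread.get(neighbor, 0.0) + level * weight * decay
        for name, level in spread.items():
            activation[name] = activation.get(name, 0.0) + level
        frontier = spread
    return activation


def resolve_output(graph, activation, top_k=3):
    """Take the top-k concepts and swap any annotated node for its most active unambiguous neighbor."""
    ranked = sorted(activation, key=activation.get, reverse=True)[:top_k]
    output = []
    for name in ranked:
        node = graph[name]
        if node.ambiguous:
            options = [n for n in node.neighbors if not graph[n].ambiguous]
            if not options:
                continue  # nothing to substitute; drop the ambiguous node
            name = max(options, key=lambda n: activation.get(n, 0.0))
        if name not in output:
            output.append(name)
    return output


# Toy graph: "bank" is annotated as latently ambiguous.
graph = {
    "bank": Node("bank", ambiguous=True,
                 neighbors={"financial institution": 1.0, "river bank": 1.0}),
    "loan": Node("loan", neighbors={"financial institution": 1.0}),
    "financial institution": Node("financial institution", neighbors={"loan": 1.0}),
    "river bank": Node("river bank"),
}
```

With this toy graph, `resolve_output(graph, activate(graph, ["bank", "loan"]))` returns `["loan", "financial institution"]`: the ambiguous "bank" is replaced by the sense that the rest of the input supports.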

ARCHITECTURE AND PROCESSES FOR COMPUTER LEARNING AND UNDERSTANDING

An architecture and processes enable a computer to learn and develop an understanding of arbitrary natural language text through collaboration with humans in the context of joint problem solving. The architecture ingests the text and then processes it syntactically and semantically to infer an initial understanding, which is captured in a story model of semantic and frame structures. The story model is then tested through computer-generated questions posed to humans in interactive dialog sessions. The knowledge gleaned from the humans is used to update both the story model and the computing system's current world model of understanding. Repeating the process for multiple stories over time lets the computing system grow in knowledge and thereby understand stories at increasingly higher reading comprehension levels.

MACHINE TRANSLATION SYSTEM EMPLOYING CLASSIFIER
20170371870 · 2017-12-28

Exemplary embodiments relate to detecting, removing, and/or replacing objectionable words and phrases in a machine-generated translation. A classifier identifies translations containing target words or phrases. The classifier may be applied to the output translation to remove target words and phrases from it, or to prevent them from being automatically presented. The classifier may also be applied to a translation model to prevent target words and phrases from appearing in the output translation, or to training data so that the translation model is not trained on the target words or phrases. The classifier may remove target words or phrases only when they appear in the output translation but not in the source-language input data. The classifier may be provided as a standalone service or employed within a machine translation system.
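
The "remove only if absent from the source" rule can be sketched as below. This is an illustration under stated assumptions, not the embodiment's classifier: a plain blocklist lookup (`is_blocked`) stands in for the trained classifier, and `BLOCKED` and `SOURCE_EQUIVALENTS` (a tiny hypothetical English/Spanish lexicon) decide whether a flagged target word is justified by the source text.

```python
import re

# Hypothetical blocklist of target-language terms, plus a hypothetical bilingual
# lexicon listing the source-language words that would justify each one.
BLOCKED = {"damn"}
SOURCE_EQUIVALENTS = {"damn": {"maldito", "maldita"}}


def is_blocked(word: str) -> bool:
    """Stand-in for the trained classifier: a plain blocklist lookup."""
    return word in BLOCKED


def filter_translation(source: str, translation: str) -> str:
    """Drop blocked words that appear in the translation but have no counterpart in the source."""
    source_words = set(re.findall(r"\w+", source.lower()))
    kept = []
    for token in translation.split():
        word = re.sub(r"\W", "", token.lower())
        if is_blocked(word) and not (SOURCE_EQUIVALENTS.get(word, set()) & source_words):
            continue  # introduced by the translation system, so remove it
        kept.append(token)
    return " ".join(kept)
```

For example, `filter_translation("el perro", "the damn dog")` yields `"the dog"`, while `"el perro maldito"` keeps the word because the source itself contains it.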

SYSTEM AND METHOD OF ASSESSING DEPTH-OF-UNDERSTANDING
20170372630 · 2017-12-28

A computer-implemented method of assessing depth-of-understanding is disclosed. The method may comprise receiving a target response associated with a question. The method may comprise receiving learner responses associated with the question and calculating a total relevancy score for each learner response by determining a relevancy score for each word, keyword, and keyphrase in the target response text, in light of all learner responses, and summing the relevancy scores determined for that learner response. The method may also comprise assigning each learner response to one of a plurality of tiers, namely the tier whose range contains the response's total relevancy score, and sending an assessed response for each learner response to a facilitator client device. Each learner response may be assigned to a tier substantially concurrently with its receipt.
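
One way to make "in light of all learner responses" concrete is to weight each target term by a smoothed inverse document frequency computed over the learner responses, so that terms few learners use count for more. That weighting and the tier `boundaries` are assumptions for illustration; the abstract does not specify either.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains(response_tokens, term):
    """True if the (possibly multi-word) term occurs as a contiguous run of tokens."""
    words = tokenize(term)
    n = len(words)
    return any(response_tokens[i:i + n] == words for i in range(len(response_tokens) - n + 1))


def total_relevancy(target_terms, responses):
    """Sum, per learner response, the weight of each target word/keyword/keyphrase it contains."""
    tokenized = [tokenize(r) for r in responses]
    n = len(tokenized)
    weights = {}
    for term in target_terms:
        df = sum(contains(toks, term) for toks in tokenized)
        # Assumed weighting: smoothed IDF over the learner responses.
        weights[term] = math.log((1 + n) / (1 + df)) + 1.0
    return [sum(w for term, w in weights.items() if contains(toks, term)) for toks in tokenized]


def assign_tier(score, boundaries):
    """Index of the first tier whose upper boundary exceeds the score (hypothetical tiers)."""
    for tier, upper in enumerate(boundaries):
        if score < upper:
            return tier
    return len(boundaries)
```

Because `assign_tier` needs only a response's own score and fixed boundaries, a response can be tiered as soon as it arrives, although under this IDF assumption the weights shift as more responses come in.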

CREATING RULES AND DICTIONARIES IN A CYCLICAL PATTERN MATCHING PROCESS

In an approach to rule and dictionary creation based on an improvement cycle, a computing device receives a corpus, wherein the corpus comprises textual data. The computing device stores, in a rule database, a rule based on a user input. The computing device stores, in a dictionary, one or more words based on the corpus. The computing device updates one or more of the rule database and the dictionary based on an improvement cycle comprising user selections from system-generated recommendations.
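
A minimal sketch of such an improvement cycle, showing only the dictionary side: a rule is a regular expression with a hypothetical `<DICT>` placeholder, the system recommends words that fill that slot in the corpus but are not yet in the dictionary, and an `accept` callback stands in for the user's selections. The placeholder syntax, ranking by frequency, and the round limit are all assumptions.

```python
import re
from collections import Counter


def apply_rule(rule, dictionary, text):
    """Expand the hypothetical <DICT> placeholder into an alternation of dictionary words."""
    alternation = "|".join(sorted(map(re.escape, dictionary)))
    pattern = rule.replace("<DICT>", f"(?:{alternation})")
    return re.findall(pattern, text, flags=re.IGNORECASE)


def recommend(rule, dictionary, corpus, top_n=3):
    """System-generated recommendations: frequent corpus words that fill the rule's <DICT> slot."""
    loose = rule.replace("<DICT>", r"(\w+)")
    counts = Counter(
        m.lower() for text in corpus for m in re.findall(loose, text, flags=re.IGNORECASE)
    )
    return [w for w, _ in counts.most_common() if w not in dictionary][:top_n]


def improvement_cycle(rule, dictionary, corpus, accept, rounds=2):
    """Each round, keep only the recommendations the user (the accept callback) selects."""
    dictionary = set(dictionary)
    for _ in range(rounds):
        picks = [w for w in recommend(rule, dictionary, corpus) if accept(w)]
        if not picks:
            break  # the user accepted nothing new, so the cycle has converged
        dictionary.update(picks)
    return dictionary
```

Keeping the human in the loop through `accept` mirrors the abstract's "user selections from system-generated recommendations": the system proposes, but only selected words enter the dictionary.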

HYBRID APPROACH FOR SHORT FORM DETECTION AND EXPANSION TO LONG FORMS

Embodiments provide a system and method for short form and long form detection. Given candidate short forms, the system can generate one or more n-gram combinations, yielding one or more candidate pairs, each consisting of a short form and an n-gram combination. For each candidate pair, the system can calculate an approximate string matching distance, a best possible alignment score, a confidence score, a topic similarity score, and a semantic similarity score. Using a meta learner, the system can then determine which candidate pairs are valid based on each pair's confidence score, topic similarity score, and semantic similarity score, and store the valid pairs in a repository. The system has no language-specific constraints and can extract short form and long form pairs from documents written in various languages. It also works regardless of whether the language of the given corpus is case-sensitive.
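
The first two per-pair measures can be sketched as follows. Assumptions: n-grams are drawn from the words immediately preceding a parenthesized short form, the "approximate string matching distance" is the Levenshtein distance between the short form and the n-gram's initials, and "alignment" is a simple ordered-subsequence check. The confidence, topic, and semantic scores and the meta learner are not shown; `best_long_form` simply picks the closest aligned candidate instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ngrams_before(tokens, index, max_n):
    """All n-grams of 1..max_n words ending immediately before tokens[index]."""
    return [tokens[index - n:index] for n in range(1, max_n + 1) if index - n >= 0]


def aligns(short_form, words):
    """True if the short form's letters occur in order in the n-gram, starting at its first letter."""
    text = "".join(words).lower()
    sf = short_form.lower()
    if not sf or not text or sf[0] != text[0]:
        return False
    pos = 0
    for ch in sf:
        pos = text.find(ch, pos)
        if pos < 0:
            return False
        pos += 1
    return True


def candidate_pairs(tokens, index, max_n=5):
    """(n-gram, distance to its initials, alignment flag) for each candidate pair."""
    short_form = tokens[index].strip("()").lower()
    pairs = []
    for gram in ngrams_before(tokens, index, max_n):
        initials = "".join(word[0] for word in gram).lower()
        pairs.append((" ".join(gram), levenshtein(short_form, initials), aligns(short_form, gram)))
    return pairs


def best_long_form(tokens, index, max_n=5):
    """Pick the aligned candidate whose initials are closest to the short form."""
    valid = [(gram, dist) for gram, dist, ok in candidate_pairs(tokens, index, max_n) if ok]
    return min(valid, key=lambda p: p[1])[0] if valid else None
```

For the tokens of `"the world health organization (WHO) said"`, the short form at index 4 resolves to `"world health organization"`, whose initials match exactly.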

HYBRID APPROACH FOR SHORT FORM DETECTION AND EXPANSION TO LONG FORMS

Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing them and performing part-of-speech analysis, and filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a pre-determined number of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
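
The four filters listed above can be chained as in this sketch. It assumes tokens arrive already tagged with Universal POS tags from some upstream tagger; the allowed tags, the symbol/digit limit, and the frequency threshold (a share of all tokens) are illustrative values, not the embodiment's.

```python
from collections import Counter

# Universal POS tags treated as plausible short forms (an assumption).
ALLOWED_POS = {"NOUN", "PROPN"}


def filter_candidates(tagged_tokens, max_symbols_digits=3, max_freq_ratio=0.01):
    """Keep tokens that pass the POS, symbol/digit, frequency, and uppercase filters.

    tagged_tokens is a list of (token, pos_tag) pairs from an upstream tokenizer and tagger.
    """
    counts = Counter(token for token, _ in tagged_tokens)
    total = len(tagged_tokens)
    seen, kept = set(), []
    for token, pos in tagged_tokens:
        if token in seen:
            continue
        seen.add(token)
        if pos not in ALLOWED_POS:
            continue
        if sum(not ch.isalpha() for ch in token) > max_symbols_digits:
            continue  # too many symbols or digits
        if counts[token] / total > max_freq_ratio:
            continue  # too frequent in the corpus to be a short form
        if not any(ch.isupper() for ch in token):
            continue  # needs at least one uppercase letter
        kept.append(token)
    return kept
```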

IDENTIFYING RISKY TRANSLATIONS

Exemplary embodiments provide techniques for evaluating when words or phrases of a translation were generated with a low degree of confidence, and conveying this information when the translation is presented. For example, if a source language word is encountered in source material for translation but appeared only a few times (or not at all) in the training data used to train the translation system, the resulting translation may be flagged as low confidence. Other situations, such as the generation of two equally likely translations or disagreement between the translation system's models, may also indicate a questionable translation. When the translation is displayed, questionable words and phrases may be flagged, and possible alternative translations may be presented. If one of the alternatives is selected, this information may be used to update the translation system's models in order to improve translation quality in the future.
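
The three risk signals named above (rare source word, near-tie between candidates, model disagreement) can be checked per word as sketched here. The `WordTranslation` record, the second model's output `alt_model_best`, and the thresholds `min_count` and `tie_margin` are hypothetical inputs chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordTranslation:
    source: str
    candidates: list     # (target, probability) pairs from the primary model, best first
    alt_model_best: str  # best target from a second, hypothetical model


def flag_risky(words, training_counts, min_count=5, tie_margin=0.05):
    """Map each risky source word to (reasons, alternative translations to offer)."""
    flagged = {}
    for w in words:
        reasons = []
        if training_counts.get(w.source, 0) < min_count:
            reasons.append("rare in training data")
        if len(w.candidates) > 1 and w.candidates[0][1] - w.candidates[1][1] <= tie_margin:
            reasons.append("near-tie between candidates")
        if w.alt_model_best != w.candidates[0][0]:
            reasons.append("model disagreement")
        if reasons:
            flagged[w.source] = (reasons, [target for target, _ in w.candidates[1:]])
    return flagged
```

The returned alternatives are what a display layer could offer next to a flagged word; recording which one the user picks, as the abstract describes, is left out of this sketch.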

COMPUTERIZED SYSTEM AND METHOD FOR MODIFYING A MEDIA FILE BY AUTOMATICALLY APPLYING SECURITY FEATURES TO SELECT PORTIONS OF MEDIA FILE CONTENT
20170372046 · 2017-12-28

Disclosed are systems and methods for improving interactions with and between computers in content providing, generating, securing and/or hosting systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the security and quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods add security features and functionality to media files through computerized, automated encoding and decoding of portions of media file content, so that identified portions of the content are obfuscated when displayed or communicated to other users. These novel, computerized security features protect the data of media files from unintended exposure to third parties and automatically prevent unwanted viewing of and access to personal and/or private information by unauthenticated users.
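
As a rough illustration of "encode identified portions, decode only for authenticated users", the sketch below treats the media content as text (for example a caption or transcript), identifies portions with two hypothetical regex detectors, and replaces each with an opaque token whose original value is kept in a server-side vault. This reversible tokenization is a swapped-in technique, not the patent's encoding scheme, and the in-memory vault is not a secure store.

```python
import re
import secrets

# Hypothetical detectors for private information: email addresses and US-style phone numbers.
PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
]
TOKEN = re.compile(r"\[hidden:[0-9a-f]{8}\]")


class Obfuscator:
    def __init__(self):
        self._vault = {}  # token -> original text; stays on the server side

    def encode(self, text):
        """Replace each identified portion with an opaque token."""
        def swap(match):
            token = f"[hidden:{secrets.token_hex(4)}]"
            self._vault[token] = match.group(0)
            return token
        for pattern in PATTERNS:
            text = pattern.sub(swap, text)
        return text

    def decode(self, text, authenticated):
        """Restore hidden portions only for authenticated viewers."""
        if not authenticated:
            return text
        return TOKEN.sub(lambda m: self._vault.get(m.group(0), m.group(0)), text)
```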

METHOD AND SYSTEM FOR MANAGING STORAGE SYSTEM
20170371597 · 2017-12-28

Embodiments of the present invention provide a method and a system for managing a storage system. In one embodiment, the method comprises: in response to receiving a write request for writing target data to the storage system, writing the target data to an intermediate address range in an intermediate storage area of the storage system; parsing, based on an address mapping of the storage system, a target address range associated with the write request to obtain an actual address range; and moving the target data from the intermediate address range to the actual address range. A corresponding system and apparatus are also provided.
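
The three steps of the method can be traced on a toy block store. Everything here is a simplification for illustration: storage is a Python list of blocks, the intermediate area is a fixed region starting at `staging_start`, and the address mapping is a hypothetical per-block dictionary from logical to physical addresses.

```python
class StorageSystem:
    """Toy block store with an intermediate (staging) area; all parameters are hypothetical."""

    def __init__(self, size, staging_start, address_map):
        self.blocks = [None] * size
        self.staging_start = staging_start  # first block of the intermediate storage area
        self.address_map = address_map      # logical block -> physical block

    def resolve(self, logical_start, length):
        """Parse the request's target address range into the actual physical addresses."""
        return [self.address_map[a] for a in range(logical_start, logical_start + length)]

    def write(self, logical_start, data):
        staging = range(self.staging_start, self.staging_start + len(data))
        if staging.stop > len(self.blocks):
            raise ValueError("write does not fit in the intermediate area")
        # 1. Write the target data to the intermediate address range.
        for addr, block in zip(staging, data):
            self.blocks[addr] = block
        # 2. Resolve the target range through the address mapping.
        actual = self.resolve(logical_start, len(data))
        # 3. Move the data from the intermediate range to the actual range.
        for src, dst in zip(staging, actual):
            self.blocks[dst] = self.blocks[src]
            self.blocks[src] = None
        return actual
```

Because the mapping is per block, the actual range need not be contiguous: with `address_map={0: 4, 1: 5, 2: 9}`, a three-block write at logical address 0 lands in physical blocks 4, 5, and 9.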