Patent classifications
G06F16/31
Database generation from natural language text documents
Some embodiments may perform operations of a process that includes obtaining a natural language text document and using a machine learning model to generate a set of attributes based on a set of machine-learning-model-generated classifications in the document. The process may include performing hierarchical data extraction operations to populate the attributes, where different machine learning models may be used in sequence. The process may include using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model augmented with a pooling operation to determine a BERT output via a multi-channel transformer model to generate vectors on a per-sentence level or other per-text-section level. The process may include using a finer-grain model to extract quantitative or categorical values of interest, where the context of the per-sentence level may be retained for the finer-grain model.
Scoring members of a set dependent on eliciting preference data amongst subsets selected according to a height-balanced tree
A software voting or prediction system iteratively solicits participant preferences between members of a set, building a binary tree to minimize the number of iterations required. As each member of the set is considered, it is pairwise-compared with select members represented by nodes already in the binary tree, with iterations beginning at a root node of the tree and continuing to a leaf node. The newly considered member is placed as a new leaf node, and the tree is height-rebalanced as appropriate. Red-black tree coloring and tree rotation rules are optionally used for this purpose. Yes/no preference tallies are kept for each member of the set throughout the tree-building process and are ultimately used for scoring. Height-rebalancing of the tree helps minimize the number of iterations needed to precisely score each member of the set relative to its alternatives.
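The insertion loop described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it swaps in AVL-style rotations for the optional red-black rules, and the names `rank_items`, `insert`, and the `prefers` callback (standing in for a participant's yes/no answer) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Node:
    item: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1


def _h(n):
    return n.height if n else 0


def _update(n):
    n.height = 1 + max(_h(n.left), _h(n.right))


def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _update(y)
    _update(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _update(x)
    _update(y)
    return y


def _rebalance(n):
    # AVL rule (an assumption here): subtree heights may differ by at most 1.
    _update(n)
    balance = _h(n.left) - _h(n.right)
    if balance > 1:
        if _h(n.left.left) < _h(n.left.right):
            n.left = _rotate_left(n.left)
        return _rotate_right(n)
    if balance < -1:
        if _h(n.right.right) < _h(n.right.left):
            n.right = _rotate_right(n.right)
        return _rotate_left(n)
    return n


def insert(root, item, prefers: Callable[[Any, Any], bool], tally):
    # Walk from the root toward a leaf, asking one pairwise question per level.
    if root is None:
        return Node(item)
    if prefers(item, root.item):
        tally[item][0] += 1       # "yes" for the new member
        tally[root.item][1] += 1  # "no" for the member already in the tree
        root.right = insert(root.right, item, prefers, tally)
    else:
        tally[root.item][0] += 1
        tally[item][1] += 1
        root.left = insert(root.left, item, prefers, tally)
    return _rebalance(root)


def rank_items(items, prefers):
    tally = {item: [0, 0] for item in items}  # item -> [yes, no]
    root = None
    for item in items:
        root = insert(root, item, prefers, tally)
    order = []

    def walk(n):
        if n:
            walk(n.left)
            order.append(n.item)
            walk(n.right)

    walk(root)  # in-order: least preferred first
    comparisons = sum(yes for yes, _ in tally.values())
    return order, tally, comparisons
```

Because the tree stays height-balanced, each new member is compared against only a logarithmic number of existing members, even when members arrive in an already-sorted order that would degrade an unbalanced tree into a chain.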
NATURAL LANGUAGE PROCESSING COMPREHENSION AND RESPONSE SYSTEM AND METHODS
An automatic, system-generated, multi-faceted comprehension and response capability, using Natural Language Processing, to provide value-specific answers from available unstructured data, documents and text. Questions and queries are interpreted by the system's capability to determine the type of question and provide a response or answer based on the data or information available. If the answer is in the ingested data, a response is provided that is one of the following: a list of documents, a list of document snippets with the answer contained in the snippets, a formalized and templated response, or a highly relevant hand-curated response.
TECHNOLOGIES FOR RELATING TERMS AND ONTOLOGY CONCEPTS
This disclosure enables various technologies that can (1) learn new synonyms for a given concept without manual curation techniques, (2) relate (e.g., map) some, many, most, or all raw named entity recognition outputs (e.g., “United States”, “United States of America”) to ontological concepts (e.g., ISO-3166 country code: “USA”), (3) account for false positives from a prior named entity recognition process, or (4) aggregate some, many, most, or all named entity recognition results from machine learning or rules-based approaches to provide a best-of-breed hybrid approach (e.g., synergistic effect).
Extraction of semantic relation
A computer-implemented method for extracting semantic relations is disclosed. In the method, a plurality of hierarchal structures that originates from a corpus of documents is obtained. Each hierarchal structure includes a plurality of elements having respective recitations included in a corresponding document. In the method, for each predetermined relationship between ancestor and descendant elements in the hierarchal structures, a first keyword list is extracted from the ancestor element and a second keyword list is extracted from the descendant element. A statistical index is calculated for each pair of first and second keywords using the first keyword lists and the second keyword lists. The index indicates a strength of association between the first and second keywords. In the method, a candidate list of keyword pairs having semantic relationships is output using the statistical index calculated for each pair.
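The abstract does not say which statistical index is used. As one plausible choice, the sketch below scores each keyword pair with pointwise mutual information (PMI) over the ancestor/descendant relationships; the function name `score_keyword_pairs` and the input shape (one pair of keyword lists per relationship) are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import product


def score_keyword_pairs(relations):
    """relations: list of (first_keywords, second_keywords), one entry per
    ancestor/descendant relationship. Returns pairs sorted by PMI, highest first."""
    n = len(relations)
    first_counts, second_counts, pair_counts = Counter(), Counter(), Counter()
    for first_keywords, second_keywords in relations:
        firsts, seconds = set(first_keywords), set(second_keywords)
        first_counts.update(firsts)
        second_counts.update(seconds)
        pair_counts.update(product(firsts, seconds))

    scores = {}
    for (first, second), count in pair_counts.items():
        # PMI = log( p(first, second) / (p(first) * p(second)) )
        scores[(first, second)] = math.log(
            (count * n) / (first_counts[first] * second_counts[second])
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A positive score means the two keywords appear together across ancestor/descendant elements more often than independence would predict; a candidate list could then be cut at a chosen threshold or length.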
Effective retrieval of text data based on semantic attributes between morphemes
An apparatus generates an index that includes the positions of morphemes in target text data and the semantic attributes between the morphemes at those positions. The apparatus attaches to an input query information that includes the positions of the morphemes in the query and the semantic attributes between them, and executes a retrieval on the target text data based on the information attached to the query and the index.
Phrase indexing
Intent-resolution using a phrase index may include obtaining data expressing a usage intent, the data expressing the usage intent including an unresolved data portion, identifying a phrase fragment based on the data expressing the usage intent and a defined phrase pattern, the phrase fragment including the unresolved data portion, identifying, by a processor, an indexed phrase as a candidate phrase by searching a phrase index based on the phrase fragment, wherein the candidate phrase at least partially matches the phrase fragment in accordance with the defined phrase pattern, and outputting the candidate phrase for presentation to a user as a candidate for resolving the unresolved portion.
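A rough sketch of the lookup flow in this abstract, under simplifying assumptions: the "defined phrase pattern" is taken to be a trailing `<word> by <partial word>` expression, and a partial match is taken to mean a prefix match. `PhraseIndex`, `extract_fragment`, and the sample phrases are hypothetical names and data, not part of the disclosure.

```python
import bisect
import re

# Hypothetical phrase pattern: "<word> by <possibly incomplete word>" at the end of the input.
FRAGMENT_PATTERN = re.compile(r"(\w+ by \w*)$")


def extract_fragment(text):
    """Return the phrase fragment containing the unresolved portion, or None."""
    match = FRAGMENT_PATTERN.search(text.strip().lower())
    return match.group(1) if match else None


class PhraseIndex:
    def __init__(self, phrases):
        # A sorted list lets a binary search locate all phrases sharing a prefix.
        self._phrases = sorted({p.lower() for p in phrases})

    def candidates(self, fragment, limit=5):
        prefix = fragment.lower()
        start = bisect.bisect_left(self._phrases, prefix)
        found = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix) or len(found) == limit:
                break
            found.append(phrase)
        return found


index = PhraseIndex(
    ["revenue by region", "revenue by region and year", "revenue by product"]
)
fragment = extract_fragment("show revenue by reg")
suggestions = index.candidates(fragment) if fragment else []
```

Here `suggestions` holds the indexed phrases that could complete the unresolved `reg` portion, ready to be presented to the user as candidates.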