G06F40/237

Information processing apparatus, information processing method, and computer-readable recording medium
11507744 · 2022-11-22

An information processing apparatus includes a lexical analysis unit that generates a training word string, a pair generation unit that generates a plurality of training word pairs, a matrix generation unit that generates, for each training word pair, a training matrix in which a plurality of words and respective semantic vectors of the words are associated, a classification unit that calculates, for a word of each position of the training word string, a probability of the word corresponding to a specific word, using the training matrices generated by the matrix generation unit and a determination model that uses a convolutional neural network, and an optimization processing unit that updates parameters of the determination model, such that the probability of the word labeled as corresponding to the specific word is high, among the probabilities of the words of the respective positions of the training word string calculated by the classification unit.

Paraphrase sentence generation method and apparatus

A paraphrase sentence generation method and apparatus relating to the research field of natural language processing include generating m second sentences based on a first sentence and a paraphrase generation model, determining a matching degree between each of the m second sentences and the first sentence based on a paraphrase matching model, and determining n second sentences from the m second sentences based on matching degrees among the m second sentences and the first sentence, where the paraphrase generation model is obtained through reinforcement learning-based training based on a reward of the paraphrase matching model.
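The generate-then-select step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `select_paraphrases`, `toy_generate`, and `toy_match` are hypothetical names, the generator is a set of fixed templates, and the matcher is a simple word-overlap score standing in for the paraphrase matching model. The reinforcement learning-based training of the generation model is not shown.

```python
from typing import Callable, List, Tuple


def select_paraphrases(
    first_sentence: str,
    generate: Callable[[str, int], List[str]],
    match: Callable[[str, str], float],
    m: int,
    n: int,
) -> List[Tuple[str, float]]:
    """Generate m candidate sentences, score each against the input, keep the n best."""
    candidates = generate(first_sentence, m)
    scored = [(c, match(first_sentence, c)) for c in candidates]
    # Highest matching degree first; Python's sort is stable for ties.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def toy_generate(sentence: str, m: int) -> List[str]:
    # Hypothetical stand-in for a paraphrase generation model.
    templates = ["{} exactly", "tell me, {}", "in other words, {}", "unrelated text"]
    return [t.format(sentence) for t in templates[:m]]


def toy_match(a: str, b: str) -> float:
    # Hypothetical stand-in for a paraphrase matching model: Jaccard word overlap.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


best = select_paraphrases("how far is the sun", toy_generate, toy_match, m=4, n=2)
```

In a real system the matching degree would also serve as the reward signal when training the generator; here it is used only for ranking.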

Pre-processing a table in a document for natural language processing

Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection is received of at least one of the column headers, row headers, and data cells for at least one of the main element, conditional element, and the value element in the initial set to produce a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine to perform natural language processing of the document including the table, using the modified set.
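A minimal sketch of the flow in this abstract, assuming a rectangular table whose first row holds column headers and whose first column holds row headers. The names `ElementSet`, `parse_table`, `initial_set`, and `apply_user_selection` are hypothetical, and the rule that row headers become the main element is an assumed heuristic rather than the patented method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ElementSet:
    main: List[str]         # entities whose values are to be extracted
    conditional: List[str]  # elements that refine the entity
    value: List[str]        # values for the entity


def parse_table(rows: List[List[str]]) -> Tuple[List[str], List[str], List[List[str]]]:
    """Split a table into column headers, row headers, and data cells."""
    column_headers = rows[0][1:]
    row_headers = [row[0] for row in rows[1:]]
    data_cells = [row[1:] for row in rows[1:]]
    return column_headers, row_headers, data_cells


def initial_set(column_headers, row_headers, data_cells) -> ElementSet:
    # Assumed heuristic: row headers name entities, column headers refine them.
    flat_values = [cell for row in data_cells for cell in row]
    return ElementSet(list(row_headers), list(column_headers), flat_values)


def apply_user_selection(initial: ElementSet, selection: Dict[str, List[str]]) -> ElementSet:
    """Replace only the roles the user re-assigned; keep the others unchanged."""
    return ElementSet(
        main=selection.get("main", initial.main),
        conditional=selection.get("conditional", initial.conditional),
        value=selection.get("value", initial.value),
    )


table = [["Plan", "2021", "2022"], ["Basic", "10", "12"], ["Premium", "20", "25"]]
first_guess = initial_set(*parse_table(table))
modified = apply_user_selection(first_guess, {"conditional": ["2022"]})
```

The `modified` set is what would then be handed to the natural language processing engine along with the document.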

HIERARCHICAL, PARALLEL MODELS FOR EXTRACTING IN REAL TIME HIGH-VALUE INFORMATION FROM DATA STREAMS AND SYSTEM AND METHOD FOR CREATION OF SAME

A method includes receiving a first post from an electronic source including first content; determining a source identifier; determining an attribute for the source by broadcasting the first post to a first plurality of filter graph definitions configured to identify attributes of sources according to the respective filter graph definition; and storing in memory, as a source profile identified by the source identifier for the source, the attribute for the source; receiving a second post from the source including second content; determining a source identifier; using the source identifier, querying the memory to access the source profile; correlating the second post with attributes of the source stored in the source profile to produce a correlated second post; and broadcasting the correlated second post to a second plurality of filter graph definitions configured to identify posts with high value information according to the respective filter graph definition.
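The two-stage flow above (profile the source from a first post, then correlate later posts with the stored profile before filtering) can be sketched roughly as below. Everything here is an assumption for illustration: `PostPipeline`, `is_pilot`, and the `aviation_delay` filter are hypothetical, and plain Python callables stand in for the filter graph definitions.

```python
from typing import Callable, Dict, List, Optional, Set


class PostPipeline:
    def __init__(
        self,
        attribute_filters: List[Callable[[str], Optional[str]]],
        value_filters: Dict[str, Callable[[str, Set[str]], bool]],
    ) -> None:
        self.attribute_filters = attribute_filters
        self.value_filters = value_filters
        self.profiles: Dict[str, Set[str]] = {}  # source identifier -> attributes

    def update_profile(self, source_id: str, content: str) -> Set[str]:
        """Broadcast a post to every attribute filter and store what they report."""
        profile = self.profiles.setdefault(source_id, set())
        for attribute_filter in self.attribute_filters:
            attribute = attribute_filter(content)
            if attribute is not None:
                profile.add(attribute)
        return profile

    def high_value_matches(self, source_id: str, content: str) -> List[str]:
        """Correlate a post with its source profile, then broadcast to value filters."""
        attributes = self.profiles.get(source_id, set())
        return [name for name, f in self.value_filters.items() if f(content, attributes)]


def is_pilot(content: str) -> Optional[str]:
    # Hypothetical attribute filter: infer a source attribute from post content.
    return "pilot" if "cockpit" in content.lower() else None


pipeline = PostPipeline(
    attribute_filters=[is_pilot],
    value_filters={
        "aviation_delay": lambda content, attrs: "pilot" in attrs and "delay" in content.lower()
    },
)
pipeline.update_profile("user42", "Another day in the cockpit")
matches = pipeline.high_value_matches("user42", "Long delay at the gate today")
```

The same second post from a source with no stored profile matches nothing, which is the point of correlating content with source attributes.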

COMPUTER-IMPLEMENTED METHOD OF PREPARING A TRAINING DATASET FOR A NATURAL LANGUAGE PROCESSING OR NATURAL LANGUAGE UNDERSTANDING MACHINE LEARNING ALGORITHM
20220358282 · 2022-11-10

Described and claimed is a computer-implemented method of preparing a training dataset for a natural language processing (NLP) or natural language understanding (NLU) machine learning algorithm from an original text dataset. The method comprises the steps of: selecting one or more sentences from the original text dataset as selected sentences; determining, for each selected sentence, one or more grammatical elements of the selected sentence that can be negated as negatable elements; determining, for one or more negatable words in each negatable element, one or more antonyms; creating, based on each determined antonym, a negated sentence by replacing the respective negatable element in the selected sentence with the determined antonym; and adding the negated sentences to the training dataset. Further, a computer-implemented method of training a word embedding or an NLP or NLU machine learning algorithm, a system, and a computer program product are described and claimed.
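A small sketch of the antonym-based augmentation step, under simplifying assumptions: each negatable element is a single whitespace-separated word, and antonyms come from a hand-written dictionary. `ANTONYMS` and `negate_sentences` are hypothetical names; a real system would need grammatical analysis and a proper lexical resource.

```python
from typing import Dict, List

# Hypothetical antonym lookup; a real system would use a lexical resource.
ANTONYMS: Dict[str, List[str]] = {
    "open": ["closed", "shut"],
    "hot": ["cold"],
}


def negate_sentences(sentences: List[str], antonyms: Dict[str, List[str]]) -> List[str]:
    """Create one negated sentence per (negatable word, antonym) pair."""
    negated = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            for antonym in antonyms.get(token.lower(), []):
                # Replace only the token at position i, leaving the rest intact.
                negated.append(" ".join(tokens[:i] + [antonym] + tokens[i + 1:]))
    return negated


original = ["The door is open", "The soup is hot", "The cat sleeps"]
training_dataset = original + negate_sentences(original, ANTONYMS)
```

Sentences without a negatable word, such as the third one here, contribute no negated variants.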

TRANSFORMER-BASED ENCODING INCORPORATING METADATA

From metadata of a corpus of natural language text documents, a relativity matrix is constructed, a row-column intersection in the relativity matrix corresponding to a relationship between two instances of a type of metadata. An encoder model is trained, generating a trained encoder model, to compute an embedding corresponding to a token of a natural language text document within the corpus and the relativity matrix, the encoder model comprising a first encoder layer, the first encoder layer comprising a token embedding portion, a relativity embedding portion, a token self-attention portion, a metadata self-attention portion, and a fusion portion, the training comprising adjusting a set of parameters of the encoder model.
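To make the relativity matrix concrete, the sketch below builds one for a single metadata type. The choice of relationship is an assumption: cell (i, j) counts the documents in which instances i and j both appear. `relativity_matrix` and the author names are hypothetical, and the encoder layers that consume the matrix are not shown.

```python
from typing import List, Sequence


def relativity_matrix(doc_metadata: Sequence[Sequence[str]], instances: List[str]) -> List[List[int]]:
    """Count, for each pair of metadata instances, the documents that carry both."""
    index = {instance: i for i, instance in enumerate(instances)}
    matrix = [[0] * len(instances) for _ in instances]
    for values in doc_metadata:
        # De-duplicate within a document and ignore unknown instances.
        present = [v for v in set(values) if v in index]
        for a in present:
            for b in present:
                matrix[index[a]][index[b]] += 1
    return matrix


# Metadata type "author": one list of authors per document in the corpus.
authors_per_document = [["alice", "bob"], ["alice"], ["bob", "carol"]]
matrix = relativity_matrix(authors_per_document, ["alice", "bob", "carol"])
```

With this definition the matrix is symmetric, and the diagonal holds each instance's document count.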
