G06F40/216

Descriptor uniqueness for entity clustering

A mechanism is provided in a data processing system to implement a cognitive natural language processing (NLP) system with descriptor uniqueness identification to support named entity mention clustering. The mechanism annotates a set of documents from a corpus of documents for entity types and mentions, collects descriptor usages from all documents in the corpus of documents, analyzes the descriptor usages to classify the descriptors as base terms or modifier terms, generates compatibility scores for the descriptors, and performs entity merging of entity clusters based on the compatibility scores.
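The descriptor pipeline above (collect usages, classify as base vs. modifier terms, score compatibility) can be sketched as follows. This is a minimal illustration, not the patented method: the head-noun heuristic (last token is the base term), the example mentions, and the scoring weights are all assumptions made for the sketch.

```python
from collections import defaultdict

def collect_descriptors(mentions):
    """Classify each descriptor as a base term (head noun, assumed to be
    the last token of a mention) or a modifier term (any earlier token)."""
    usage = defaultdict(lambda: {"base": 0, "modifier": 0})
    for mention in mentions:
        tokens = mention.lower().split()
        usage[tokens[-1]]["base"] += 1
        for tok in tokens[:-1]:
            usage[tok]["modifier"] += 1
    return usage

def compatibility(m1, m2, usage):
    """Score two mentions as merge candidates: sharing a base term counts
    more than sharing a modifier; a score of 0 blocks cluster merging."""
    t1, t2 = set(m1.lower().split()), set(m2.lower().split())
    score = 0.0
    for tok in t1 & t2:
        score += 2.0 if usage[tok]["base"] > 0 else 1.0
    return score / max(len(t1 | t2), 1)

mentions = ["Springfield Hospital", "Springfield General Hospital",
            "Springfield Airport"]
usage = collect_descriptors(mentions)
same = compatibility("Springfield Hospital", "Springfield General Hospital", usage)
diff = compatibility("Springfield Hospital", "Springfield Airport", usage)
```

Here the shared base term "hospital" makes the first pair far more compatible than the pair that shares only the modifier "springfield".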

Interpretable label-attentive encoder-decoder parser

Systems and methods for parsing natural language sentences using an artificial neural network (ANN) are described. Embodiments of the described systems and methods may generate a plurality of word representation matrices for an input sentence, wherein each of the word representation matrices is based on an input matrix of word vectors, a query vector, a matrix of key vectors, and a matrix of value vectors, and wherein a number of the word representation matrices is based on a number of syntactic categories, compress each of the plurality of word representation matrices to produce a plurality of compressed word representation matrices, concatenate the plurality of compressed word representation matrices to produce an output matrix of word vectors, and identify at least one word from the input sentence corresponding to a syntactic category based on the output matrix of word vectors.
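A rough sketch of the label-attentive step described above: one attention head per syntactic category, each using a single query vector against per-head key and value projections, followed by compression and concatenation. All dimensions, the random projections, and the per-word weighting are illustrative assumptions, not the claimed architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_attention(X, n_labels, d_head, d_comp, rng):
    """One head per syntactic category. Each head scores every word with a
    single query vector, builds a word representation matrix, compresses
    it, and the compressed matrices are concatenated into the output."""
    n_words, d_model = X.shape
    compressed, attns = [], []
    for _ in range(n_labels):
        q = rng.standard_normal(d_head)              # query vector for this label
        K = X @ rng.standard_normal((d_model, d_head))  # key matrix
        V = X @ rng.standard_normal((d_model, d_head))  # value matrix
        a = softmax(K @ q / np.sqrt(d_head))         # one weight per word
        H = a[:, None] * V                           # word representation matrix
        Wc = rng.standard_normal((d_head, d_comp))
        compressed.append(H @ Wc)                    # compress this head
        attns.append(a)
    out = np.concatenate(compressed, axis=1)         # (n_words, n_labels*d_comp)
    return out, np.stack(attns)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                     # 5 words, 16-dim embeddings
out, attns = label_attention(X, n_labels=3, d_head=8, d_comp=4, rng=rng)
```

Identifying the word for a category then amounts to inspecting that head's attention weights (e.g. `attns[i].argmax()`).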

Token-position handling for sequence based neural networks

Embodiments of the present disclosure include a method for token-position handling comprising: processing a first sequence of tokens to produce a second sequence of tokens, wherein the second sequence of tokens has a smaller number of tokens than the first sequence of tokens; masking at least some tokens in the second sequence to produce masked tokens; moving the masked tokens to the beginning of the second sequence to produce a third sequence; encoding tokens in the third sequence into a set of numeric vectors in a first array; and processing the first array in a transformer neural network to determine correlations among the third sequence, the processing the first array producing a second array.
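The sequence manipulation in the claim above can be sketched in a few lines. The compression rule (collapsing adjacent repeats) and the mask token are stand-in assumptions; the transformer encoding step is omitted.

```python
def compress(tokens):
    """First -> second sequence: collapse adjacent repeats, one simple way
    to obtain a sequence with fewer tokens than the input."""
    return [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]

def mask_and_reorder(tokens, mask_positions, mask_token="[MASK]"):
    """Second -> third sequence: mask the chosen positions, then move the
    masked tokens to the beginning of the sequence."""
    masked = [mask_token if i in mask_positions else t
              for i, t in enumerate(tokens)]
    front = [masked[i] for i in sorted(mask_positions)]
    rest = [t for i, t in enumerate(masked) if i not in mask_positions]
    return front + rest

first = ["the", "the", "cat", "sat", "sat", "down"]
second = compress(first)                 # shorter than the first sequence
third = mask_and_reorder(second, {1, 3})
```

The third sequence would then be encoded into numeric vectors and fed through the transformer to produce the second array of correlations.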

Structured adversarial training for natural language machine learning tasks

A method includes obtaining first training data having multiple first linguistic samples. The method also includes generating second training data using the first training data and multiple symmetries. The symmetries identify how to modify the first linguistic samples while maintaining structural invariants within the first linguistic samples, and the second training data has multiple second linguistic samples. The method further includes training a machine learning model using at least the second training data. At least some of the second linguistic samples in the second training data are selected during the training based on a likelihood of being misclassified by the machine learning model.
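The augmentation-and-selection loop above can be illustrated with one toy symmetry. Swapping coordinated clauses and using string length as a stand-in for model confidence are both assumptions made purely for the sketch; a real system would apply learned symmetries and query the model being trained.

```python
def swap_clauses(sentence):
    """One symmetry: swap the clauses around ' and ', which changes surface
    order while preserving the coordination structure (an invariant)."""
    if " and " in sentence:
        a, b = sentence.split(" and ", 1)
        return b + " and " + a
    return sentence

def select_hard(samples, confidence, k):
    """Keep the k generated samples the model is least confident on,
    i.e. those most likely to be misclassified during training."""
    return sorted(samples, key=confidence)[:k]

first_data = ["cats purr and dogs bark", "the sky is blue"]
second_data = [swap_clauses(s) for s in first_data]   # apply symmetries
confidence = len          # stand-in for the model's confidence score
hard = select_hard(second_data, confidence, k=1)
```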

SYSTEMS AND PROCESSES OF POSITION FULFILLMENT

The present disclosure relates generally to systems and processes for position fulfillment and, more particularly, to systems and methods of identifying and matching human resources to an open employment position within an organization. The method includes: obtaining, by a computer system, one or more profiles from one or more data sources; analyzing, by the computer system, the one or more profiles to parse attributes and find similarities and/or recurring occurrences in the parsed attributes; normalizing the parsed attributes based on at least one of the similarities and recurring occurrences; and matching the normalized attributes to attributes of an open position.
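The normalize-and-match steps can be sketched with a simple alias table standing in for the recurrence analysis. The alias map, attribute names, and overlap score are all hypothetical choices for illustration.

```python
def normalize(attr, aliases):
    """Map an attribute variant to the canonical form discovered by
    similarity/recurrence analysis (here, a fixed alias table)."""
    key = attr.lower().strip()
    return aliases.get(key, key)

def match_score(profile_attrs, position_attrs, aliases):
    """Fraction of the open position's normalized attributes that the
    profile's normalized attributes cover."""
    p = {normalize(a, aliases) for a in profile_attrs}
    q = {normalize(a, aliases) for a in position_attrs}
    return len(p & q) / len(q) if q else 0.0

aliases = {"ml": "machine learning", "js": "javascript"}
profile = ["ML", "Python", "JS"]
position = ["machine learning", "python"]
score = match_score(profile, position, aliases)
```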

Processing structured documents using convolutional neural networks
11550871 · 2023-01-10

Structured documents are processed using convolutional neural networks. For example, the processing can include receiving a rendered form of a structured document; mapping a grid of cells to the rendered form; assigning a respective numeric embedding to each cell in the grid, comprising, for each cell: identifying content in the structured document that corresponds to a portion of the rendered form that is mapped to the cell, mapping the identified content to a numeric embedding for the identified content, and assigning the numeric embedding for the identified content to the cell; generating a matrix representation of the structured document from the numeric embeddings assigned to the cells of the grids; and generating neural network features of the structured document by processing the matrix representation of the structured document through a subnetwork comprising one or more convolutional neural network layers.
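The grid-embedding-convolution pipeline above can be sketched with a toy grid. The random embedding table, the 3x3 grid, and the single unweighted convolution kernel are assumptions for illustration only.

```python
import numpy as np

def embed_grid(grid_text, vocab, d):
    """Assign each grid cell the numeric embedding of the content rendered
    there (a zero vector for empty cells), yielding a matrix representation."""
    rng = np.random.default_rng(0)
    table = {w: rng.standard_normal(d) for w in sorted(vocab)}
    rows, cols = len(grid_text), len(grid_text[0])
    M = np.zeros((rows, cols, d))
    for r in range(rows):
        for c in range(cols):
            if grid_text[r][c]:
                M[r, c] = table[grid_text[r][c]]
    return M

def conv2d(M, kernel):
    """Valid 2D convolution over the grid, summing across the embedding
    dimension, standing in for a convolutional layer."""
    kr, kc, _ = kernel.shape
    rows, cols, _ = M.shape
    out = np.zeros((rows - kr + 1, cols - kc + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(M[r:r + kr, c:c + kc] * kernel)
    return out

grid = [["title", "", ""], ["", "body", "body"], ["", "", "link"]]
M = embed_grid(grid, {"title", "body", "link"}, d=4)
feats = conv2d(M, np.ones((2, 2, 4)))
```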

Methods and systems for generating domain-specific text summarizations

Embodiments provide methods and systems for generating domain-specific text summary. Method performed by processor includes receiving request to generate text summary of textual content from user device of user and applying pre-trained language generation model over textual content for encoding textual content into word embedding vectors. Method includes predicting current word of the text summary, by iteratively performing: generating first probability distribution of first set of words using first decoder based on word embedding vectors, generating second probability distribution of second set of words using second decoder based on word embedding vectors, and ensembling first and second probability distributions using configurable weight parameter for determining current word. First probability distribution indicates selection probability of each word being selected as current word. Method includes providing custom reward score as feedback to second decoder based on custom reward model and modifying second probability distribution of words for text summary based on feedback.
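One iteration of the ensembling step and the reward feedback can be sketched as below. The toy vocabulary, the hand-written distributions, the mixing rule, and the exponential reward update are assumptions for the sketch, not the claimed decoders or reward model.

```python
import numpy as np

def ensemble_step(p1, p2, w):
    """Mix the first decoder's distribution p1 with the second (reward-
    tuned) decoder's distribution p2 using a configurable weight w."""
    p = w * p1 + (1 - w) * p2
    return p / p.sum()

def reward_update(p2, rewards, lr=0.5):
    """Feedback step: reweight the second decoder's distribution toward
    words with a high custom reward score."""
    q = p2 * np.exp(lr * rewards)
    return q / q.sum()

vocab = ["the", "report", "states", "profits", "rose"]
p1 = np.array([0.4, 0.3, 0.1, 0.1, 0.1])   # first decoder
p2 = np.array([0.1, 0.1, 0.2, 0.4, 0.2])   # second decoder
p = ensemble_step(p1, p2, w=0.3)
current_word = vocab[int(np.argmax(p))]     # predicted current word
p2 = reward_update(p2, rewards=np.array([0.0, 0.0, 0.0, 1.0, 0.0]))
```

Repeating these two steps word by word yields the domain-specific summary, with the reward model steadily reshaping the second decoder's distribution.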

SYSTEM PERFORMANCE LOGGING OF COMPLEX REMOTE QUERY PROCESSOR QUERY OPERATIONS

Described are methods, systems and computer readable media for performance logging of complex query operations.

SYNTAX ANALYZING DEVICE, LEARNING DEVICE, MACHINE TRANSLATION DEVICE AND STORAGE MEDIUM
20180011833 · 2018-01-11

A syntax analyzing device includes: a syntax analyzing unit that analyzes the syntax of a sentence received by a receiving unit, thereby acquiring a first analysis result that contains one or more elements constituting the sentence and the parts of speech of those elements, organized as one or more binary trees whose nodes are the parts of speech or the elements; a category acquiring unit that acquires categories of the respective one or more elements constituting the sentence; a category inserting unit that acquires a second analysis result in which each element's category is inserted between that element and its part of speech in the first analysis result; and a learning unit that outputs the second analysis result acquired by the category inserting unit.
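The category-insertion step can be sketched on a tuple-encoded binary tree: each leaf pair (part of speech, element) becomes (part of speech, (category, element)). The tree encoding, the example sentence, and the "MISC" default category are assumptions made for the sketch.

```python
def insert_categories(node, categories):
    """Produce the second analysis result by inserting each element's
    category between the element and its part-of-speech node."""
    if len(node) == 3:                       # internal node: (label, left, right)
        label, left, right = node
        return (label,
                insert_categories(left, categories),
                insert_categories(right, categories))
    pos, word = node                         # leaf: (part of speech, element)
    return (pos, (categories.get(word, "MISC"), word))

# first analysis result: binary trees with POS tags or elements as nodes
first = ("S", ("NP", "Tokyo"), ("VP", ("V", "is"), ("ADJ", "large")))
cats = {"Tokyo": "LOCATION"}                 # from the category acquiring unit
second = insert_categories(first, cats)      # output passed to the learning unit
```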