Patent classifications
G06F40/49
CREATING LINE ITEM INFORMATION FROM FREE-FORM TABULAR DATA
The present disclosure involves systems, software, and computer-implemented methods for creating line item information from tabular data. One example method includes receiving event data values at a system. Column headers of columns in the event data values are identified. At least one column header is not included in standard line item terms used by the system. Column values of the columns in the event data values are identified. The identified column headers and the identified column values are processed using one or more models to map each column to a standard line item term used by the system. The processing includes using context determination and content recognition to identify standard line item terms. An event is created in the system, including the creation of line items from the identified column values. Each line item includes standard line item terms mapped to the columns.
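The column-mapping step described above can be illustrated with a small sketch. It is not the disclosed method: the abstract refers to "one or more models", whereas this example swaps in fuzzy string matching for the header (context) step and regular expressions for the value (content) step. The standard terms, aliases, and function names are all hypothetical.

```python
import difflib
import re

# Hypothetical standard line item terms and known aliases (assumptions for illustration).
STANDARD_TERMS = {
    "description": ["description", "item", "product"],
    "quantity": ["quantity", "qty", "count"],
    "unit_price": ["unit price", "price", "cost"],
}

def guess_from_header(header):
    """Header step: fuzzy-match the header text against known aliases."""
    normalized = header.strip().lower()
    for term, aliases in STANDARD_TERMS.items():
        if difflib.get_close_matches(normalized, aliases, n=1, cutoff=0.8):
            return term
    return None

def guess_from_values(values):
    """Content step: infer a term from what the column values look like."""
    if all(re.fullmatch(r"\d+", v) for v in values):
        return "quantity"
    if all(re.fullmatch(r"[$€£]?\d+\.\d{2}", v) for v in values):
        return "unit_price"
    return "description"

def map_columns(table):
    """Map every column header to a standard term, falling back to the values."""
    return {h: guess_from_header(h) or guess_from_values(vals) for h, vals in table.items()}

def create_line_items(table):
    """Build one line item per row, keyed by the mapped standard terms."""
    mapping = map_columns(table)
    n_rows = len(next(iter(table.values())))
    return [{mapping[h]: table[h][i] for h in table} for i in range(n_rows)]

# "How many" is not a known alias, so it is resolved from its values.
table = {
    "Item": ["Catering", "Projector"],
    "How many": ["2", "1"],
    "Cost": ["150.00", "80.00"],
}
mapping = map_columns(table)
line_items = create_line_items(table)
```

In this toy table, "Item" and "Cost" resolve through the alias list, while "How many" is mapped to the quantity term only because its values are all integers.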
Generation of domain thesaurus
Embodiments provide a computer-implemented method for generating a domain-specific thesaurus on a cognitive system, comprising: receiving data of a domain-specific corpus and a plurality of terms of interest from a user; splitting the data of the domain-specific corpus into a plurality of sentences using natural language processing techniques; for each term in the plurality of terms of interest, retrieving, from the plurality of sentences, a plurality of candidate sentences containing the corresponding term; for each candidate sentence, providing a list of synonyms of the corresponding term, wherein the synonyms are contextual alternatives in the corresponding candidate sentence; for each term in the plurality of terms of interest, tracking a frequency of each synonym, and forming a frequency map including all the synonyms of the corresponding term and the frequency of each synonym; and generating a domain-specific thesaurus based on a combination of all the synonyms in the frequency map.
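The claimed steps (split into sentences, retrieve candidate sentences, collect contextual synonyms, count them, build the thesaurus) can be sketched as follows. This is a minimal illustration under stated assumptions: `suggest_synonyms` is a hypothetical toy stand-in for whatever contextual synonym source the cognitive system uses, and ordering the thesaurus entries by frequency is one possible way to combine the frequency map.

```python
import re
from collections import Counter

def split_sentences(corpus):
    """Naive sentence splitter; a real system would use an NLP library."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus) if s.strip()]

def suggest_synonyms(sentence, term):
    """Hypothetical stand-in for a contextual synonym model (toy rule only)."""
    if term == "drug":
        return ["medication", "medicine"] if "prescribed" in sentence else ["compound"]
    return []

def build_thesaurus(corpus, terms):
    sentences = split_sentences(corpus)
    frequency_map, thesaurus = {}, {}
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        candidates = [s for s in sentences if pattern.search(s)]
        counts = Counter()
        for sentence in candidates:
            counts.update(suggest_synonyms(sentence, term))
        frequency_map[term] = counts
        # Assumption: the thesaurus lists synonyms from most to least frequent.
        thesaurus[term] = [syn for syn, _ in counts.most_common()]
    return frequency_map, thesaurus

corpus = (
    "The drug was prescribed twice daily. The trial tested a new drug. "
    "Patients were prescribed the drug after surgery. No side effects were seen."
)
frequency_map, thesaurus = build_thesaurus(corpus, ["drug"])
```

Three of the four sentences contain the term, so the frequency map for "drug" counts synonyms from those three candidate sentences only.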
METHOD AND APPARATUS FOR PERFORMING ENTITY LINKING
Provided is a method for performing entity linking between a surface entity mention in a surface text and entities of a knowledge graph, including: supplying the surface text to a contextual text representation model; pooling contextual representations of the tokens of the surface entity mention in the surface text with contextual representations of the other tokens within the surface text to provide a contextual entity representation vector representing the surface entity mention; supplying an identifier of a candidate knowledge graph entity to a knowledge graph embedding model to provide an entity node embedding vector; and combining the contextual entity representation vector with the entity node embedding vector to generate an input vector. The input vector is applied to a fully connected layer, which provides an unnormalized output. A softmax function transforms the unnormalized output into a normalized output, which is processed to classify whether the surface entity mention corresponds to the candidate knowledge graph entity.
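The data flow in this abstract (pool, combine, fully connected layer, softmax, classify) can be traced with a dependency-free sketch. Several choices here are assumptions, since the abstract does not fix them: mean pooling, averaging the mention and context vectors, combining by concatenation, and a two-class output. The token vectors, entity embedding, and weights are made-up toy numbers, not outputs of trained models.

```python
import math

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def link_probabilities(token_vecs, mention_idx, entity_vec, weights, bias):
    """Return [P(no match), P(match)] for one mention/candidate pair.

    Assumes at least one token lies outside the mention.
    """
    mention = mean([token_vecs[i] for i in mention_idx])
    others = mean([v for i, v in enumerate(token_vecs) if i not in mention_idx])
    pooled = mean([mention, others])   # contextual entity representation vector
    x = pooled + entity_vec            # list concatenation builds the input vector
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy inputs: three tokens with 2-d contextual vectors; tokens 1 and 2 form the mention.
token_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
entity_vec = [0.5, -0.5]                       # hypothetical knowledge graph embedding
weights = [[0.0] * 4, [1.0] * 4]               # untrained illustrative weights
bias = [0.0, 0.0]

probs = link_probabilities(token_vecs, {1, 2}, entity_vec, weights, bias)
is_match = probs[1] > probs[0]
```

In practice the token vectors would come from the contextual text representation model and the entity vector from the knowledge graph embedding model; only the wiring between them is shown here.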
METHOD AND APPARATUS FOR TRAINING MODELS IN MACHINE TRANSLATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and apparatus for training models in machine translation, an electronic device and a storage medium are disclosed, which relate to the field of natural language processing technologies and the field of deep learning technologies. An implementation includes mining similar target sentences of a group of samples based on a parallel corpus using a machine translation model and a semantic similarity model, and creating a first training sample set; training the machine translation model with the first training sample set; mining a negative sample of each sample in the group of samples based on the parallel corpus using the machine translation model and the semantic similarity model, and creating a second training sample set; and training the semantic similarity model with the second training sample set. With the above-mentioned technical solution of the present application, the two models are trained jointly: while the semantic similarity model is trained, the machine translation model may be optimized and in turn nurture the semantic similarity model, thus further improving the accuracy of the semantic similarity model.
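The abstract does not say how the similar target sentences and negative samples are selected, so the following is only one plausible reading, sketched with toy stand-ins. It assumes the machine translation model proposes candidate translations and the semantic similarity model scores them against the reference target. Here the candidates are a hard-coded list and the similarity model is replaced by Jaccard token overlap; both are assumptions for illustration, as are all function names.

```python
def jaccard(a, b):
    """Stand-in similarity model: token overlap between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def mine(sample, candidates, similarity):
    """Pick the most and least similar candidate for one (source, target) sample.

    Assumes at least one candidate differs from the reference target.
    """
    source, target = sample
    scored = sorted(
        ((similarity(target, c), c) for c in candidates if c != target),
        reverse=True,
    )
    similar = scored[0][1]     # would join the first training sample set (MT model)
    negative = scored[-1][1]   # would join the second training sample set (similarity model)
    return (source, similar), (source, target, negative)

sample = ("Das ist gut", "this is good")
# Hypothetical candidate translations, standing in for the MT model's output.
candidates = ["this is fine", "that is good indeed", "the weather is bad"]

mt_sample, similarity_sample = mine(sample, candidates, jaccard)
```

A joint training loop would alternate between the two mining steps and the two training steps, so that each model's improvement feeds the other's training data.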
NAME ENTITY RECOGNITION WITH DEEP LEARNING
Systems, methods and apparatus are provided for identifying entities in a corpus of text. The system comprises: a first named entity recognition (NER) system comprising one or more entity dictionaries, the first NER system configured to identify entities and/or entity types within a corpus of text based on the one or more entity dictionaries; a second NER system comprising an NER model configured for predicting entities and/or entity types within the corpus of text; and a comparison module configured for identifying entities based on comparing the entity results output from the first and second NER systems, where the identified entities are different from the entities identified by the first NER system. The system may further include an updating module configured to update the one or more entity dictionaries based on the identified entities. The system may further include a dictionary building module configured to build a set of entity dictionaries based on at least the identified entities. The system may further comprise a training module configured to generate or update the NER model by training a machine learning (ML) technique for predicting entities and/or entity types from the corpus of text using a training dataset based on data representative of the identified entities and/or entity types.
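The interplay of the two NER systems, the comparison module, and the updating module can be sketched as below. The dictionary matcher follows the abstract directly; `model_ner` is a hypothetical stand-in that uses a suffix heuristic where the described system would use a trained NER model. The entity names and dictionary contents are toy data.

```python
import re

def dictionary_ner(text, dictionaries):
    """First NER system: find entities listed in the entity dictionaries."""
    found = set()
    for entity_type, names in dictionaries.items():
        for name in names:
            if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
                found.add((name.lower(), entity_type))
    return found

def model_ner(text):
    """Second NER system: hypothetical stand-in for a trained model (suffix heuristic)."""
    tokens = re.findall(r"\b\w+(?:mab|nib|rin)\b", text, flags=re.IGNORECASE)
    return {(tok.lower(), "drug") for tok in tokens}

def compare(dictionary_entities, model_entities):
    """Comparison module: entities the model found that the dictionaries missed."""
    return model_entities - dictionary_entities

def update_dictionaries(dictionaries, new_entities):
    """Updating module: add newly identified entities to the dictionaries."""
    for name, entity_type in new_entities:
        dictionaries.setdefault(entity_type, set()).add(name)

dictionaries = {"drug": {"aspirin", "imatinib"}}
text = "Patients received aspirin and imatinib before rituximab."

new_entities = compare(dictionary_ner(text, dictionaries), model_ner(text))
update_dictionaries(dictionaries, new_entities)
```

After the update, a second pass with the dictionary-based system alone would also recognize the newly added entity.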
MULTILINGUAL SUPPORT FOR NATURAL LANGUAGE PROCESSING APPLICATIONS
A data processing system implements obtaining textual content in a first language from a first client device and segmenting the textual content into a plurality of first tokens. The system also implements translating the first tokens from the first language into second tokens in a second language using a bilingual dictionary, extracting feature information from the second tokens to create a feature vector, providing the feature vector to a first natural language processing model trained to analyze textual input in the second language and to output contextual information indicating one or more topics or subject matter of the textual content, and providing the contextual information to a first machine learning model configured to analyze the contextual information and to identify one or more content items predicted to be relevant to the contextual information. The system further implements providing information identifying the one or more content items to the first client device.
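The pipeline in this abstract (segment, translate token by token, build a feature vector, infer a topic, select content) can be sketched end to end. Everything concrete here is an assumption for illustration: the Spanish-to-English dictionary, the vocabulary, the bag-of-words features, the linear topic scorer standing in for the natural language processing model, and the lookup table standing in for the content recommendation model.

```python
import re

# Hypothetical resources (assumptions, not from the source).
BILINGUAL = {"reunión": "meeting", "presupuesto": "budget", "equipo": "team", "vacaciones": "vacation"}
VOCAB = ["meeting", "budget", "team", "vacation"]
TOPIC_WEIGHTS = {"work": [1, 1, 1, 0], "leisure": [0, 0, 0, 1]}
CONTENT = {"work": ["Agenda template"], "leisure": ["Travel checklist"]}

def segment(text):
    """Split first-language text into lowercase tokens."""
    return re.findall(r"\w+", text.lower())

def translate(tokens):
    """Token-level translation; tokens missing from the dictionary are dropped."""
    return [BILINGUAL[t] for t in tokens if t in BILINGUAL]

def features(tokens):
    """Bag-of-words count vector over the second-language vocabulary."""
    return [tokens.count(word) for word in VOCAB]

def predict_topic(vector):
    """Stand-in for the NLP model: pick the topic with the highest linear score."""
    scores = {t: sum(w * x for w, x in zip(ws, vector)) for t, ws in TOPIC_WEIGHTS.items()}
    return max(scores, key=scores.get)

def recommend(text):
    """Run the full pipeline and return content items for the inferred topic."""
    vector = features(translate(segment(text)))
    return CONTENT[predict_topic(vector)]

items = recommend("Reunión del equipo sobre el presupuesto")
```

The design point the sketch highlights is that only the dictionary is language-specific: the feature extractor and both models operate on second-language tokens, so supporting another input language would require a new dictionary rather than retrained models.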