Patent classifications
G06F40/295
MACHINE LEARNING METHOD AND NAMED ENTITY RECOGNITION APPARATUS
A computer divides a character string included in text data into a plurality of tokens. The computer searches, by performing matching processing between a token string indicating a specific number of consecutive tokens among the plurality of tokens and dictionary information including a plurality of named entities, the plurality of named entities for a similar named entity whose similarity to the token string is equal to or more than a threshold. The computer converts matching information indicating a result of the matching processing between the token string and the similar named entity into first vector data. The computer generates input data by using a plurality of pieces of vector data converted from the plurality of tokens and the first vector data. The computer generates a named entity recognition model that detects a named entity by performing machine learning using the input data.
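The dictionary-matching step in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, `difflib.SequenceMatcher` as the similarity measure, and a two-element matching vector (match flag, similarity) per token. None of these choices come from the abstract, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(token_string, dictionary, threshold):
    """Return (entity, score) for the most similar dictionary entry whose
    similarity is at least `threshold`, or (None, 0.0) if none qualifies."""
    best, best_score = None, 0.0
    for entity in dictionary:
        score = SequenceMatcher(None, token_string.lower(), entity.lower()).ratio()
        if score >= threshold and score > best_score:
            best, best_score = entity, score
    return best, best_score


def matching_features(tokens, dictionary, n=2, threshold=0.8):
    """Slide a window of n consecutive tokens over the text and record,
    per token, [matched flag, best similarity] (assumed encoding)."""
    feats = [[0.0, 0.0] for _ in tokens]
    for i in range(len(tokens) - n + 1):
        span = " ".join(tokens[i:i + n])
        entity, score = best_match(span, dictionary, threshold)
        if entity is not None:
            for j in range(i, i + n):
                if score > feats[j][1]:
                    feats[j] = [1.0, score]
    return feats


def build_input(tokens, token_vectors, feats):
    """Concatenate each token's vector with its matching vector."""
    return [list(token_vectors[t]) + f for t, f in zip(tokens, feats)]


tokens = "patient took acetylsalicilic acid daily".split()  # note the misspelling
dictionary = ["acetylsalicylic acid"]
feats = matching_features(tokens, dictionary)
token_vectors = {t: [float(len(t))] for t in tokens}  # placeholder embeddings
model_input = build_input(tokens, token_vectors, feats)
```

The resulting rows would then be passed to whatever sequence-labelling model is being trained. The misspelled span still matches because the comparison is approximate rather than exact.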
SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION
A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.
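Two steps of this abstract lend themselves to a small sketch: correcting transcribed named-entity labels against user-specific text, and picking the most accurate teacher. The example below assumes word-level fuzzy matching with `difflib.get_close_matches` and a per-teacher error rate as the accuracy measure. Both are illustrative assumptions, not details from the abstract.

```python
from difflib import get_close_matches


def correct_named_entities(transcript, user_entities, cutoff=0.75):
    """Replace each transcribed word with the closest user-specific
    entity when one is similar enough (assumed word-level matching)."""
    lookup = {e.lower(): e for e in user_entities}
    corrected = []
    for word in transcript.split():
        match = get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


def select_best_teacher(error_rates):
    """Pick the teacher with the lowest error rate (hypothetical metric)."""
    return min(error_rates, key=error_rates.get)


fixed = correct_named_entities("call jon smyth", ["John", "Smith"])
best = select_best_teacher({"teacher_a": 0.20, "teacher_b": 0.10})
```

In this sketch the selected teacher's corrected pseudo labels would serve as the training set for the next round of retraining.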
System and method for performing a meaning search using a natural language understanding (NLU) framework
The present disclosure is directed to an agent automation framework that is capable of extracting meaning from user utterances and suitably responding using a search-based natural language understanding (NLU) framework. The NLU framework includes a meaning extraction subsystem capable of detecting multiple alternative meaning representations for a given natural language utterance. Furthermore, the NLU framework includes a meaning search subsystem that enables elastic confidence thresholds (e.g., elastic beam-width meaning searches), forced diversity, and cognitive construction grammar (CCG)-based predictive scoring functions to provide an efficient and effective meaning search. As such, the disclosed meaning extraction subsystem and meaning search subsystem improve the performance, the domain specificity, the inference quality, and/or the efficiency of the NLU framework.
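One possible reading of an "elastic confidence threshold" is a cutoff that relaxes step by step until enough candidate meanings survive. The sketch below illustrates that reading only. The relaxation step, the floor, and the function name are assumptions, and the abstract does not specify how the elasticity is actually computed.

```python
def elastic_threshold_search(candidates, threshold, relax=0.1,
                             min_results=1, floor=0.0):
    """Return (hits, final_threshold). `candidates` is a list of
    (meaning, score) pairs; the threshold is lowered by `relax`
    until at least `min_results` candidates pass or `floor` is hit."""
    current = threshold
    while True:
        hits = [(m, s) for m, s in candidates if s >= current]
        if len(hits) >= min_results or current <= floor:
            hits.sort(key=lambda pair: pair[1], reverse=True)
            return hits, current
        current = max(floor, current - relax)


candidates = [("intent_a", 0.40), ("intent_b", 0.62)]
hits, final_threshold = elastic_threshold_search(candidates, threshold=0.8)
```

Starting from a strict cutoff keeps the search cheap when a confident match exists, while the relaxation avoids returning nothing for harder utterances.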
Processing entity groups to generate analytics
A computer system processes a group of inputs. A group of entities that is input for processing is intercepted. The intercepted group is expanded into individual entities. Each of the individual entities is processed to produce results for each individual entity. The results for each individual entity are intercepted and merged to produce results for the group of entities. Embodiments of the present invention further include a method and program product for processing a group of inputs in substantially the same manner described above.
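The expand, process, and merge flow described here can be sketched as follows. The membership table, the per-entity analytics (a count and a total), and the additive merge rule are all hypothetical choices made for illustration.

```python
def expand(group, membership):
    """Expand a group name into its individual entities."""
    return list(membership[group])


def process_entity(entity, records):
    """Produce results for one entity (assumed analytics: count and total)."""
    values = records.get(entity, [])
    return {"count": len(values), "total": sum(values)}


def merge(results):
    """Merge per-entity results into one result for the whole group."""
    merged = {"count": 0, "total": 0}
    for r in results:
        merged["count"] += r["count"]
        merged["total"] += r["total"]
    return merged


def process_group(group, membership, records):
    entities = expand(group, membership)
    per_entity = [process_entity(e, records) for e in entities]
    return merge(per_entity)


membership = {"team": ["alice", "bob"]}
records = {"alice": [1, 2], "bob": [3]}
group_result = process_group("team", membership, records)
```

The per-entity processor never sees the group, which matches the abstract's description of intercepting the group on the way in and the results on the way out.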
Systems and methods for coverage analysis of textual queries
A computer-based system and method for assigning queries to topics and/or visualizing or analyzing query coverage may include, using a computer processor, searching over a set of text documents with a set of queries to produce, for each query, a set of search results. Each search result may include a subset of text from a text document of the set of text documents. For each query, a query vector may be calculated based on the set of search results for the query, and for each of a set of topics describing the set of text documents, a topic vector may be calculated. A report or visualization may be generated including the set of queries and the set of topics using the topic vectors and the query vectors.
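As a rough illustration of comparing query vectors with topic vectors, the sketch below builds bag-of-words vectors and assigns each query to the topic with the highest cosine similarity. The sentence-level keyword search, the keyword-based topic definitions, and the bag-of-words representation are assumptions, since the abstract does not say how either vector is computed.

```python
import math
from collections import Counter


def bow(texts):
    """Bag-of-words vector over a list of text snippets."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, documents):
    """Naive search: return sentences containing every query term."""
    terms = query.lower().split()
    hits = []
    for doc in documents:
        for sentence in doc.split("."):
            words = sentence.lower().split()
            if words and all(t in words for t in terms):
                hits.append(sentence.strip())
    return hits


def assign_topics(queries, topics, documents):
    """Map each query to its most similar topic, or None if uncovered."""
    topic_vecs = {name: bow(examples) for name, examples in topics.items()}
    report = {}
    for q in queries:
        query_vec = bow(search(q, documents))  # vector built from search results
        scores = {name: cosine(query_vec, tv) for name, tv in topic_vecs.items()}
        best = max(scores, key=scores.get)
        report[q] = best if scores[best] > 0 else None
    return report


documents = ["I want a refund for my order. The app crashes on login"]
topics = {"billing": ["refund order payment"],
          "technical": ["app crashes login error"]}
report = assign_topics(["refund", "crashes", "shipping"], topics, documents)
```

A query that maps to `None` has no search results resembling any topic, which is the kind of coverage gap such a report would surface.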
Content extraction system
A system includes a content extraction engine comprising at least one processor and configured to receive a content page for a target product, the page including product data for the target product and noise content unrelated to the target product; identify noise content pertaining to data unrelated to the target product; and remove the noise content from the content page, thereby generating a remainder content page containing target product data usable to enable product comparison between multiple sources.
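The noise-removal idea can be sketched with a very simple heuristic: treat a page as a list of text blocks and keep only those that mention the target product. The block representation, the term-overlap rule, and the example product are assumptions for illustration. The abstract does not describe how noise is actually identified.

```python
def remove_noise(blocks, product_terms, min_overlap=1):
    """Keep blocks sharing at least `min_overlap` terms with the
    target product; everything else is treated as noise."""
    terms = {t.lower() for t in product_terms}
    kept = []
    for block in blocks:
        words = {w.strip(".,:;!?()").lower() for w in block.split()}
        if len(words & terms) >= min_overlap:
            kept.append(block)
    return kept


page = [
    "Acme X100 blender: 600 W motor, 1.5 L jar",
    "Sign up for our newsletter",
    "Customers also viewed: toaster ovens",
    "X100 price: $49.99",
]
remainder = remove_noise(page, ["Acme", "X100"])
```

The remaining blocks form the remainder page, which could then be compared with the same product's data from other sources.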