G06F40/284

Natural language processing engine for translating questions into executable database queries
11573957 · 2023-02-07

A system and method for translating questions into database queries are provided. A text to database query system receives a natural language question and a structure in a database. Question tokens are generated from the question and query tokens are generated from the structure in the database. The question tokens and query tokens are concatenated into a sentence and a sentence token is added to the sentence. A BERT network generates question hidden states for the question tokens, query hidden states for the query tokens, and a classifier hidden state for the sentence token. A translatability predictor network determines if the question is translatable or untranslatable. A decoder converts a translatable question into an executable query. A confusion span predictor network identifies a confusion span in the untranslatable question that causes the question to be untranslatable. An auto-correction module auto-corrects the tokens in the confusion span.
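
The input construction described above (question tokens and query tokens concatenated into one sentence, with a sentence token added) can be sketched as follows. This is a minimal illustration, not the patented system: it assumes the BERT convention of a leading "[CLS]" token as the sentence token and "[SEP]" separators, uses whitespace tokenization instead of a WordPiece tokenizer, and the schema format `{table: [columns]}` and all function names are hypothetical.

```python
def question_tokens(question):
    # Lowercase, split on whitespace, and strip simple punctuation
    # (a simplifying assumption; BERT would use WordPiece).
    return [t.strip("?.,").lower() for t in question.split() if t.strip("?.,")]


def query_tokens(schema):
    # Flatten a hypothetical {table: [columns]} structure into tokens,
    # closing each table's group with a [SEP] marker.
    tokens = []
    for table, columns in schema.items():
        tokens.append(table)
        tokens.extend(columns)
        tokens.append("[SEP]")
    return tokens


def build_sentence(question, schema):
    q = question_tokens(question)
    s = query_tokens(schema)
    # "[CLS]" plays the role of the sentence token whose hidden state
    # would feed the translatability classifier.
    sentence = ["[CLS]"] + q + ["[SEP]"] + s
    # Segment ids: 0 for [CLS] + question + first [SEP], 1 for schema tokens.
    segment_ids = [0] * (len(q) + 2) + [1] * len(s)
    return sentence, segment_ids


sentence, segments = build_sentence("How many singers are there?",
                                    {"singer": ["name", "age"]})
```

Keeping the question and schema in one sequence lets a single encoder attend across both, which is what allows one classifier state to judge whether the question maps onto the given structure.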

Systems and methods for determining consensus values

Systems and methods are provided to determine consensus values for duplicate fields in a document or form.
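
The abstract does not say how a consensus value is chosen. As a hedged illustration only, the sketch below assumes a simple majority vote over the values extracted from each occurrence of a duplicated field, with ties broken by total extraction confidence. The `(value, confidence)` input format and the function name are hypothetical.

```python
from collections import defaultdict


def consensus_value(candidates):
    """Pick a consensus value for one field that appears several times.

    candidates: (value, confidence) pairs, one per occurrence. Majority
    vote with a total-confidence tie-break is an assumed rule, not the
    method claimed in the patent.
    """
    if not candidates:
        raise ValueError("no candidate values")
    votes = defaultdict(int)
    weight = defaultdict(float)
    for value, confidence in candidates:
        key = " ".join(value.split())  # normalize whitespace before voting
        votes[key] += 1
        weight[key] += confidence
    # Most votes wins; equal vote counts fall back to summed confidence.
    return max(votes, key=lambda v: (votes[v], weight[v]))


date = consensus_value([("12/03/2021", 0.9), ("12/03/2021 ", 0.8),
                        ("12/08/2021", 0.95)])
```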

Identifying similar documents in a file repository using unique document signatures
11593439 · 2023-02-28

Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
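
A minimal sketch of the signature idea, under stated assumptions: each word is tokenized by combining a subset of its characters (here, the first and last) with its length, the signature is the sequence of the first 32 such tokens, and documents are clustered greedily when `difflib.SequenceMatcher` similarity to a cluster's first member reaches a threshold. The token recipe, signature length, similarity measure, and threshold are all hypothetical choices, not the disclosed ones.

```python
import difflib
import re


def tokenize(text):
    # Hypothetical token: first character + last character + word length,
    # e.g. "report" -> "rt6".
    words = re.findall(r"\w+", text.lower())
    return [f"{w[0]}{w[-1]}{len(w)}" for w in words]


def signature(text, size=32):
    # Assumed signature: the first `size` tokens as an ordered sequence.
    return tuple(tokenize(text)[:size])


def cluster(docs, threshold=0.8):
    # docs: {doc_id: text}. Greedy clustering against each cluster's
    # representative (its first member's signature).
    clusters = []  # list of (representative signature, [doc ids])
    for doc_id, text in docs.items():
        sig = signature(text)
        for rep, members in clusters:
            if difflib.SequenceMatcher(None, rep, sig).ratio() >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((sig, [doc_id]))
    return [members for _, members in clusters]


groups = cluster({
    "a": "Quarterly report for the sales team",
    "b": "Quarterly report for the sales team (draft)",
    "c": "Recipe for banana bread",
})
# groups == [["a", "b"], ["c"]]
```

Because the tokens discard most characters, small edits such as an added word still yield a near-identical signature, while unrelated documents share few tokens.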

Machine translation method, device, and computer-readable storage medium

A machine translation method includes: receiving to-be-processed information expressed in a source language; encoding the to-be-processed information, and generating an expression vector sequence of the to-be-processed information; and predicting feature information of a target foresight word at a first moment by using a prediction model. The feature information includes at least one of a part of speech or a word category of the target foresight word. The method also includes: determining a context vector corresponding to the first moment in the expression vector sequence according to the feature information of the target foresight word; and decoding the context vector by using a decoder, to obtain target content that corresponds to the context vector and is expressed in a target language.
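
One way to read "determining a context vector according to the feature information" is attention whose query is conditioned on the predicted part of speech. The sketch below assumes exactly that: the predicted tag's embedding is added to the decoder state, dot-product attention over the expression vector sequence gives the weights, and the weighted sum is the context vector. The embedding table, the argmax "prediction model", and the additive conditioning are hypothetical stand-ins, not the patented design.

```python
import math

# Hypothetical POS embeddings for the target foresight word.
POS_EMBEDDINGS = {"NOUN": [1.0, 0.0], "VERB": [0.0, 1.0]}


def predict_pos(tag_probs):
    # Stand-in for the prediction model: pick the most probable tag.
    return max(tag_probs, key=tag_probs.get)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(expressions, decoder_state, pos_tag):
    # Query = decoder state shifted by the POS embedding (assumption).
    query = [d + p for d, p in zip(decoder_state, POS_EMBEDDINGS[pos_tag])]
    # Dot-product attention score for each source position.
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in expressions]
    weights = softmax(scores)
    dim = len(expressions[0])
    context = [sum(w * vec[i] for w, vec in zip(weights, expressions))
               for i in range(dim)]
    return context, weights


expressions = [[1.0, 0.0], [0.0, 1.0]]  # toy expression vector sequence
tag = predict_pos({"NOUN": 0.2, "VERB": 0.8})
context, weights = context_vector(expressions, [0.0, 0.0], tag)
```

With a predicted verb, the weight on the verb-like source position rises, so the context vector handed to the decoder already reflects what kind of word comes next.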

Conversation history within conversational machine reading comprehension

Aspects described herein include a method of conversational machine reading comprehension, as well as an associated system and computer program product. The method comprises receiving a plurality of questions relating to a context, and generating a sequence of context graphs. Each of the context graphs includes encoded representations of: (i) the context, (ii) a respective question of the plurality of questions, and (iii) a respective conversation history reflecting: (a) one or more previous questions relative to the respective question, and (b) one or more previous answers to the one or more previous questions. The method further comprises identifying, using at least one graph neural network, one or more temporal dependencies between adjacent context graphs of the sequence. The method further comprises predicting, based at least on the one or more temporal dependencies, an answer for a first question of the plurality of questions.
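
The per-question inputs described above (context, the current question, and a history of earlier questions and answers) can be assembled before any graph is built. This sketch covers only that data-preparation step and the pairing of adjacent turns; the graph encoding and the graph neural network are not shown. The `Turn` class, the optional `max_history` window, and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    context: str
    question: str
    # Earlier (question, answer) pairs visible at this turn.
    history: list = field(default_factory=list)


def build_turns(context, questions, answers, max_history=None):
    """Build one Turn per question. answers[j] answers questions[j];
    at inference time these would be the model's own earlier predictions.
    max_history optionally keeps only the most recent k turns (assumption)."""
    turns = []
    for i, question in enumerate(questions):
        start = 0 if max_history is None else max(0, i - max_history)
        history = list(zip(questions[start:i], answers[start:i]))
        turns.append(Turn(context, question, history))
    return turns


def adjacent_pairs(turns):
    # Consecutive turns: where temporal dependencies between graphs apply.
    return list(zip(turns, turns[1:]))


turns = build_turns("passage text", ["q1", "q2", "q3"], ["a1", "a2", "a3"])
```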

Determining topics and action items from conversations

Embodiments are directed to organizing conversation information. Two or more machine learning (ML) models and a plurality of sentences provided from a conversation may be employed to generate insight scores for each sentence such that each insight score correlates to a probability that its sentence includes one or more of an action or a question. In response to one or more sentences having insight scores that exceed a threshold value, an information score and a definiteness score may be determined for the one or more sentences, and one or more insights associated with the conversation may be generated based on the one or more sentences. A report may be generated that associates the one or more insights with one or more portions of the conversation that include the one or more sentences that are associated with the insights.
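
The first filtering stage (several models scoring each sentence, then a threshold) might look like the sketch below. It assumes the per-model scores are already computed and combines them by simple averaging; the abstract does not specify the combination rule, and the information and definiteness scores that follow are not modeled. Function names and the default threshold are hypothetical.

```python
def ensemble_scores(model_scores):
    """model_scores holds one list per ML model, each with one score per
    sentence; averaging is an assumed combination rule."""
    if len(model_scores) < 2:
        raise ValueError("expected scores from two or more models")
    return [sum(col) / len(col) for col in zip(*model_scores)]


def candidate_insights(sentences, model_scores, threshold=0.5):
    # Keep sentences whose combined insight score exceeds the threshold;
    # only these would go on to information and definiteness scoring.
    scores = ensemble_scores(model_scores)
    return [(s, score) for s, score in zip(sentences, scores) if score > threshold]


sentences = ["Send the deck by Friday.", "Nice weather today.",
             "Can you review the draft?"]
picked = candidate_insights(sentences, [[0.9, 0.1, 0.7], [0.8, 0.2, 0.6]])
```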

Method and system for normalization of gene names in medical text

A method (100) for standardizing gene nomenclature, comprising: (i) receiving (110) a source; (ii) tokenizing (120) the source; (iii) comparing (130) a first token to a prefix tree structure with a root node, edges, and leaf nodes; (iv) determining (140) which edge extending from the root node to associated first leaf nodes the first token matches; (v) updating (150) an identification pointer with the location of the first leaf node; (vi) determining (160) which of one or more edges a second token matches; (vii) updating (170) the identification pointer with the location of the second leaf node; (viii) repeating (172) the determining (160) and updating (170) steps with subsequent tokens until a subsequent token fails to match an edge extending from a leaf node or there is no edge extending from the leaf node; and (ix) providing (180) an identification of a canonical gene name.
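
The token-by-token walk in steps (iii) to (viii) is a prefix tree (trie) over tokens rather than characters. The sketch below is an illustration under assumptions: aliases are lowercased and whitespace-tokenized, a `None` key marks a node that completes an alias, and when the walk stops the function returns the longest complete alias seen so far (a longest-match rule the abstract does not spell out). The alias table and function names are hypothetical.

```python
import re


def build_trie(names):
    # names: {alias: canonical name}. Each node maps a token to a child
    # node; the key None stores the canonical name when an alias ends here.
    root = {}
    for alias, canonical in names.items():
        node = root
        for token in alias.lower().split():
            node = node.setdefault(token, {})
        node[None] = canonical
    return root


def normalize(tokens, trie):
    """Advance a pointer through the trie while tokens match edges; stop
    when a token matches no edge or the node has no outgoing edge. Returns
    (canonical name, tokens consumed) for the longest match, or (None, 0)."""
    node, best, consumed = trie, (None, 0), 0
    for token in tokens:
        child = node.get(token.lower())
        if child is None:
            break  # no matching edge from the current node
        node = child
        consumed += 1
        if None in node:
            best = (node[None], consumed)
    return best


def find_genes(text, trie):
    # Scan a whole source, skipping past each matched alias.
    tokens = re.findall(r"\w+", text)
    found, i = [], 0
    while i < len(tokens):
        name, n = normalize(tokens[i:], trie)
        if name:
            found.append(name)
            i += n
        else:
            i += 1
    return found


trie = build_trie({"tumor protein p53": "TP53", "p53": "TP53",
                   "breast cancer 1": "BRCA1", "brca1": "BRCA1"})
print(find_genes("Loss of tumor protein p53 and BRCA1 was observed", trie))
# prints ['TP53', 'BRCA1']
```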
