G06F16/3347

Categorization for a global taxonomy

Methods and systems are provided for generating training data for training a classifier to assign nodes of a taxonomy graph to items based on item descriptions. Each node has a label. For each item, the system identifies one or more candidate paths within the taxonomy graph that are relevant to that item. The system identifies the candidate paths based on content of the item description matching labels of nodes. A candidate path is a sequence of nodes starting at a root node of the taxonomy graph. For each identified candidate path, the system labels the item description with the candidate path or, equivalently, with the leaf node or the label of the leaf node. The labeled item descriptions compose the training data for training the classifier.
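
The path-labeling step described above can be sketched as follows. This is only an illustration under stated assumptions: the toy taxonomy in `PARENT`, the whole-word matching rule, and the function names are all hypothetical and do not come from the abstract.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: node label -> parent label (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "products": None,
    "electronics": "products",
    "laptops": "electronics",
    "clothing": "products",
    "shoes": "clothing",
}


def path_from_root(label: str) -> List[str]:
    """Return the sequence of node labels from the root down to `label`."""
    path = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    path.reverse()
    return path


def candidate_paths(description: str) -> List[List[str]]:
    """Find nodes whose label appears as a whole word in the description.

    Whole-word matching is an assumed stand-in for whatever matching
    the actual system performs.
    """
    words = set(description.lower().split())
    return [path_from_root(label) for label in PARENT if label in words]


def build_training_data(descriptions: List[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Pair each description with every candidate path found for it."""
    return [
        (text, tuple(path))
        for text in descriptions
        for path in candidate_paths(text)
    ]
```

Each training example pairs a description with a full root-to-node path. Because the path is determined by its last node in this tree-shaped toy taxonomy, storing only that node's label would carry the same information.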

Machine learning worker node architecture

A database contains a corpus of incident reports, a machine learning (ML) model trained to calculate paragraph vectors of the incident reports, and a look-up set table that contains a list of paragraph vectors respectively associated with sets of the incident reports. A plurality of ML worker nodes each store the look-up set table and are configured to execute the ML model. An update thread is configured to: determine that the look-up set table has expired; update the look-up set table by: (i) adding a first set of incident reports received since a most recent update of the look-up set table, and (ii) removing a second set of incident reports containing timestamps that are no longer within a sliding time window; store, in the database, the look-up set table as updated; and transmit, to the ML worker nodes, respective indications that the look-up set table has been updated.
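
The sliding-window update in steps (i) and (ii) could look roughly like the sketch below. The `Report` record, the `update_lookup` function, and the choice to key the table by an exact vector tuple are illustrative assumptions. Expiry detection, database storage, and worker notification are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Report:
    report_id: str
    vector: Vector       # paragraph vector, assumed already computed by the ML model
    timestamp: datetime


def update_lookup(
    current: List[Report],
    received: List[Report],
    now: datetime,
    window: timedelta,
) -> Tuple[List[Report], Dict[Vector, Set[str]]]:
    """Add newly received reports, drop those outside the window, rebuild the table."""
    cutoff = now - window
    # (i) add the new reports, (ii) keep only those still inside the window
    kept = [r for r in current + received if r.timestamp >= cutoff]
    table: Dict[Vector, Set[str]] = defaultdict(set)
    for report in kept:
        table[report.vector].add(report.report_id)
    return kept, dict(table)
```

Rebuilding the table from the retained reports keeps the sketch short. An incremental update would avoid reprocessing reports that are unchanged.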

DATABASE QUERY GENERATION USING NATURAL LANGUAGE TEXT

Some embodiments may obtain a natural language question, determine a context of the natural language question, and generate a first vector based on the natural language question using encoder neural network layers. Some embodiments may access a data table comprising column names, generate vectors based on the column names, and determine attention scores based on the vectors. Some embodiments may update the vectors based on the attention scores, generate a second vector based on the natural language question, and determine a set of strings comprising a name of the column names and a database language operator based on the vectors. Some embodiments may determine values based on the determined database language operator and the name using a transformer neural network model. Some embodiments may generate a query based on the set of strings and the values.
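
Only the final step, generating a query from the predicted strings and values, is sketched here. The neural components are out of scope. The function `build_query`, the operator whitelist, and the use of a `?` placeholder are assumptions made for illustration, and the table name is assumed to come from a trusted source.

```python
from typing import List, Sequence, Tuple

# Operators this sketch is willing to emit (an assumption, not from the abstract).
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "!="}


def build_query(
    table: str,
    column_names: Sequence[str],
    column: str,
    operator: str,
    value: object,
) -> Tuple[str, List[object]]:
    """Assemble a parameterized SELECT from a predicted column, operator, and value."""
    # The predicted column must be one of the table's known column names.
    if column not in column_names:
        raise ValueError(f"unknown column: {column}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    # The value is passed as a bound parameter rather than spliced into the SQL.
    sql = f'SELECT * FROM "{table}" WHERE "{column}" {operator} ?'
    return sql, [value]
```

Checking the predicted column and operator against known sets means a wrong model prediction is rejected before any SQL is built.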

SYSTEMS, METHODS, AND APPARATUS FOR PROVIDING DYNAMIC AUTO-RESPONSES AT A MEDIATING ASSISTANT APPLICATION
20230029783 · 2023-02-02

Methods, apparatus, systems, and computer-readable media are provided for providing context specific schema files that allow an automated assistant to broker human-to-computer dialogs between a user and an application that is separate from the automated assistant. The context specific schema file can provide the automated assistant with sufficient data to be responsive to user queries without necessarily communicating with a remote device, such as a server. Multiple different context specific schema files can be made available to the automated assistant according to a context in which a user is interacting with the automated assistant. In this way, latency otherwise exhibited by the automated assistant can be mitigated by providing the automated assistant with the information needed to respond to a user without continually retrieving the information over a network.

COMMUNITY QUESTION-ANSWER WEBSITE ANSWER SORTING METHOD AND SYSTEM COMBINED WITH ACTIVE LEARNING
20230035338 · 2023-02-02

A community question-answer (CQA) website answer sorting method and system combined with active learning. The sorting method comprises: step S1, performing question-answer data representation and modeling; and step S2, constructing a training set in combination with active learning, and predicting a sorting relationship of candidate question-answer pairs. Also provided is a community question-answer website answer sorting system combined with active learning. CQA website question-answer data is first represented and modeled, interference with answer sorting caused by the long tail distribution of the community data is mitigated by means of a long tail factor, and an attention mechanism is introduced in a convolutional neural network to relieve a semantic gap problem among question-answer texts. Then, an unlabeled training set is also constructed, a sample is additionally selected from the unlabeled training set and labeled, and an answer sorting model is trained again after labeling results are merged.
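
The abstract does not say how the additional sample is chosen from the unlabeled set. One common option is uncertainty sampling, sketched below as an assumption and not as the disclosed method. The function name, the score dictionary, and the 0.5 decision boundary are all hypothetical.

```python
from typing import Dict, List


def select_for_labeling(scores: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` unlabeled samples whose predicted score is closest to 0.5.

    `scores` maps a sample id to the current model's predicted probability
    that the first answer of a candidate pair should rank above the second.
    """
    ranked = sorted(scores, key=lambda sample_id: abs(scores[sample_id] - 0.5))
    return ranked[:budget]
```

The selected samples would then be labeled and merged into the training set before the answer sorting model is trained again.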

Machine Learning Augmented System for Medical Episode Identification and Reporting
20230103143 · 2023-03-30

A medical episode analysis engine is provided. The engine generates a first matrix data structure having an entry for each concept pairing and storing a value representing relatedness weighted according to a temporal weighting function. The engine generates a second matrix data structure by calculating, for each entry in the first matrix, a relatedness measure of the concepts in the concept pairing based on a frequency of occurrence together. The engine generates, for each first concept, a concept embedding, based on the second matrix, that specifies, for each other second concept, a temporally weighted relatedness measure. The engine generates, for each anchor concept, a corresponding episode definition comprising a plurality of related concepts corresponding to a same episode, based on the concept embedding. The engine processes new input data based on the episode definition data structures to identify instances of corresponding episodes in the new input data.
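
A temporally weighted pairing matrix like the first matrix above might be built as in this sketch. The exponential decay, the time constant `tau`, and the `(concept, time)` event shape are assumptions, since the abstract does not specify the temporal weighting function. The second matrix, the embeddings, and the episode definitions are not covered.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (concept, time in days); a hypothetical input shape


def temporal_cooccurrence(
    events: List[Event], tau: float = 30.0
) -> Dict[Tuple[str, ...], float]:
    """Sum exp(-|dt| / tau) over every pair of events with different concepts."""
    matrix: Dict[Tuple[str, ...], float] = defaultdict(float)
    for (c1, t1), (c2, t2) in combinations(events, 2):
        if c1 == c2:
            continue
        # Events close together in time contribute a weight near 1.
        weight = math.exp(-abs(t1 - t2) / tau)
        # Sort the pair so (a, b) and (b, a) share one entry.
        matrix[tuple(sorted((c1, c2)))] += weight
    return dict(matrix)
```

Each concept pairing is stored once under a sorted key, so the matrix is symmetric by construction.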

Artificial Intelligence Assisted Originality Evaluator

A method is disclosed, involving converting each structured text document stored in a database into one or more vectors; using the vectors of the structured text documents to create a similarity search index; then, for each structured text document from the database, searching the search index using the one or more vectors of that document to generate a list of N other structured text documents from the database that are similar to it; and storing each such list in a table.
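
The per-document top-N lookup can be illustrated with a brute-force cosine search, used here as a stand-in for a real similarity search index. One vector per document, the function names, and cosine similarity as the measure are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_documents(vectors: Dict[str, Sequence[float]], n: int) -> Dict[str, List[str]]:
    """For each document, list the n other documents with the highest similarity."""
    table: Dict[str, List[str]] = {}
    for doc_id, vec in vectors.items():
        others = [other for other in vectors if other != doc_id]
        others.sort(key=lambda other: cosine(vec, vectors[other]), reverse=True)
        table[doc_id] = others[:n]
    return table
```

This compares every pair of documents, which is quadratic in the number of documents. A dedicated index would replace the inner scan for a large database.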

AUTOMATED CATEGORIZATION AND ASSEMBLY OF LOW-QUALITY IMAGES INTO ELECTRONIC DOCUMENTS

An apparatus includes a memory and processor. The memory stores OCR and NLP algorithms. The processor receives an image of a physical document page and executes the OCR algorithm to convert the image into text. The processor identifies errors in the text, which are associated with noise in the image. The processor generates a feature vector that includes features obtained by executing the NLP algorithm on the text, and features associated with the identified errors in the text. The processor uses the feature vector to assign the image to a document category. Documents assigned to the document category share one or more characteristics, and the feature vector is associated with a probability greater than a threshold that the physical document associated with the image includes those characteristics. The processor then stores the image in a database as a page of an electronic document belonging to the assigned document category.

Artificial Intelligence Assisted Transfer Tool

A method is disclosed, involving converting each structured text document stored in a database into one or more vectors; training a machine learning model to associate structured text document vectors with the journals in which said structured text documents were published; receiving an additional structured text document; converting said additional structured text document into one or more vectors; and processing the additional structured text document through the trained machine learning model to identify an appropriate journal for publication. Systems and computer-readable media implementing the method are also disclosed.

EMBEDDING SERVICE FOR UNSTRUCTURED DATA

A method may include generating a vector from unstructured data included in an untransformed transaction, and determining, for the vector, a cluster ID of cluster IDs by matching the vector with a matching cluster vector of cluster vectors. The method may further include generating a query using the cluster ID and the untransformed transaction, and transforming, using the cluster IDs, untransformed transactions to transformed transactions. The transformed transactions may each include a cluster ID. The method may further include generating, using the query, a query result from features of the transformed transactions, generating a fraud score using the query result, and presenting the fraud score and the cluster ID.
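
Matching a vector to a cluster vector and attaching the resulting cluster ID might look like the sketch below. Nearest-neighbor matching by Euclidean distance, the dictionary-shaped transaction, and the function names are assumptions. The embedding model, query generation, and fraud scoring are not shown.

```python
import math
from typing import Dict, Sequence


def assign_cluster(vector: Sequence[float], cluster_vectors: Dict[int, Sequence[float]]) -> int:
    """Return the ID of the cluster vector closest to `vector` (Euclidean distance)."""
    return min(cluster_vectors, key=lambda cid: math.dist(vector, cluster_vectors[cid]))


def transform(
    transaction: dict,
    vector: Sequence[float],
    cluster_vectors: Dict[int, Sequence[float]],
) -> dict:
    """Return a copy of the transaction with its matched cluster ID attached."""
    return {**transaction, "cluster_id": assign_cluster(vector, cluster_vectors)}
```

Returning a copy leaves the untransformed transaction intact, so both forms remain available to later steps.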