G06F16/3347

MACHINE LEARNING ENHANCED CLASSIFIER
20230046471 · 2023-02-16

The presently disclosed subject matter includes a computerized method and system that provide the ability to train and execute a machine learning (ML) model specifically configured to enhance classifier (e.g., RegEx) output by identifying and removing false positive results from the classifier's output. Classifier output, comprising a collection of data-subsets (e.g., columns in a relational database) of one or more structured or semi-structured data sources (e.g., tables of a relational database), is transformed to be represented by a plurality of numerical vectors. The numerical vectors are used during the training phase to train the machine learning model, and again during the execution phase, to enhance the classifier output and reduce false positives.
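
As a rough illustration of the vectorization and training steps described above, the following Python sketch turns each flagged column into a small numerical vector and trains a nearest-centroid model to separate true positives from false positives. The phone-number pattern, the four features, the nearest-centroid model and the sample data are all assumptions made for this example; the abstract does not specify any of them.

```python
import re
from statistics import mean

# Hypothetical RegEx classifier (an assumption): US-style phone numbers.
PHONE = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def column_features(values):
    """Represent one non-empty column (data-subset) as a numerical vector."""
    n = len(values)
    match_ratio = sum(1 for v in values if PHONE.match(v)) / n
    avg_len = mean(len(v) for v in values)
    digit_ratio = mean(sum(c.isdigit() for c in v) / max(len(v), 1) for v in values)
    distinct_ratio = len(set(values)) / n
    return [match_ratio, avg_len, digit_ratio, distinct_ratio]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(dim) for dim in zip(*vectors)]

def train(labeled_columns):
    """labeled_columns: list of (values, is_true_positive) pairs."""
    by_label = {True: [], False: []}
    for values, label in labeled_columns:
        by_label[label].append(column_features(values))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def is_true_positive(model, values):
    """Keep a classifier hit only if it is closer to the true-positive centroid."""
    vec = column_features(values)

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))

    return sq_dist(model[True]) <= sq_dist(model[False])

# Columns the RegEx classifier flagged, with hand-assigned labels (made-up data).
training = [
    (["555-123-4567", "555-987-6543", "555-222-3333"], True),
    (["212-555-0100", "415-555-0199"], True),
    (["555-123-4567", "SKU-AB-99", "part #12", "n/a"], False),
    (["123-456-7890", "see notes", "unknown", "TBD", "none"], False),
]
model = train(training)

kept = is_true_positive(model, ["303-555-0101", "303-555-0102"])
dropped = is_true_positive(model, ["555-000-1111", "misc", "other", "blank"])
```

The features are left unscaled here, so average length dominates the distance; a real system would likely normalize the vectors and use a stronger model.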

WORD MINING METHOD AND APPARATUS, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM

The present disclosure provides a word mining method and apparatus, an electronic device and a readable storage medium, and relates to the field of artificial intelligence technologies, such as natural language processing technologies, deep learning technologies, cloud service technologies, or the like. The word mining method includes: acquiring search data; taking first identification information, a search sentence and second identification information in the search data as nodes, and taking a relationship between the first identification information and the search sentence, a relationship between the first identification information and the second identification information and a relationship between the search sentence and the second identification information as edges to construct a behavior graph; obtaining a label vector of each search sentence in the behavior graph according to a search sentence with a preset label in the behavior graph; determining a target search sentence in the behavior graph according to the label vector; and extracting a target word from the target search sentence, and taking the target word as a word mining result of the search data.
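
The graph construction and labeling steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the two kinds of identification information are treated as opaque IDs, label vectors are obtained with a simple neighbor-averaging label propagation (the abstract does not name the algorithm), and the threshold, labels and sample records are invented.

```python
from collections import defaultdict

def build_behavior_graph(records):
    """records: iterable of (first_id, search_sentence, second_id) triples."""
    graph = defaultdict(set)
    for first_id, sentence, second_id in records:
        a, q, b = ("id1", first_id), ("query", sentence), ("id2", second_id)
        # The three pairwise relationships become undirected edges.
        for u, v in ((a, q), (a, b), (q, b)):
            graph[u].add(v)
            graph[v].add(u)
    return graph

def propagate_labels(graph, seeds, labels, rounds=5):
    """seeds: {sentence: preset label}. Returns {node: label vector}."""
    def clamp(vec):
        # Sentences with preset labels keep a fixed one-hot vector.
        for sentence, label in seeds.items():
            vec[("query", sentence)] = [1.0 if l == label else 0.0 for l in labels]

    vec = {node: [0.0] * len(labels) for node in graph}
    clamp(vec)
    for _ in range(rounds):
        # Each node takes the average of its neighbors' previous vectors.
        new = {
            node: [sum(vec[m][i] for m in nbrs) / len(nbrs) for i in range(len(labels))]
            for node, nbrs in graph.items()
        }
        clamp(new)
        vec = new
    return vec

def target_sentences(vec, seeds, labels, wanted, threshold=0.5):
    """Unlabeled sentences whose score for the wanted label reaches the threshold."""
    i = labels.index(wanted)
    return sorted(
        name for (kind, name), v in vec.items()
        if kind == "query" and name not in seeds and v[i] >= threshold
    )

records = [
    ("u1", "easy pasta recipe", "s1"),
    ("u1", "quick weeknight dinner", "s1"),
    ("u2", "cheap flights to rome", "s2"),
    ("u2", "rome hotel deals", "s2"),
]
labels = ["cooking", "travel"]
seeds = {"easy pasta recipe": "cooking", "cheap flights to rome": "travel"}

graph = build_behavior_graph(records)
vectors = propagate_labels(graph, seeds, labels)
targets = target_sentences(vectors, seeds, labels, "cooking")
```

Extracting the target word from each target sentence is left out, since the abstract gives no detail on how that step works.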

RELATIONSHIP ANALYSIS USING VECTOR REPRESENTATIONS OF DATABASE TABLES
20230051059 · 2023-02-16

A computer-implemented method includes representing a plurality of database tables as respective vectors in a multi-dimensional vector space, receiving an indication that a first database table represented by a first vector and a second database table represented by a second vector are related to each other, moving the respective vectors representing the plurality of database tables in the multi-dimensional vector space in response to the indication, and grouping the plurality of database tables into one or more table clusters based on positions of the respective vectors representing the plurality of database tables in the multi-dimensional vector space.
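
A minimal Python sketch of the move-then-cluster idea follows. It assumes that "moving" means pulling the two related table vectors toward each other and that clusters are formed by linking tables whose vectors lie within a fixed radius; the update rule, the radius, the table names and the coordinates are hypothetical and not taken from the abstract.

```python
import math

def pull_together(vectors, a, b, rate=0.5):
    """Move the vectors of related tables a and b toward each other."""
    va, vb = vectors[a], vectors[b]
    delta = [(y - x) * rate / 2 for x, y in zip(va, vb)]
    vectors[a] = [x + d for x, d in zip(va, delta)]
    vectors[b] = [y - d for y, d in zip(vb, delta)]

def cluster(vectors, radius):
    """Group tables whose vectors are within `radius` (single linkage, union-find)."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(vectors[a], vectors[b]) <= radius:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

# Made-up two-dimensional table vectors.
tables = {"orders": [0.0, 0.0], "customers": [4.0, 0.0], "logs": [10.0, 10.0]}

before = cluster(tables, radius=2.5)
pull_together(tables, "orders", "customers")  # indication that the two are related
after = cluster(tables, radius=2.5)
```

With these numbers the three tables start in separate clusters, and a single indication moves "orders" and "customers" close enough to be grouped together.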

Query rephrasing using encoder neural network and decoder neural network

A method comprising receiving first data representative of a query. A representation of the query is generated using an encoder neural network and the first data. Words for a rephrased version of the query are selected from a set of words comprising a first subset of words comprising words of the query and a second subset of words comprising words absent from the query. Second data representative of the rephrased version of the query is generated.

Predictive time series data object machine learning system

Provided is a method including obtaining a first data object including a first set of data entries, wherein each data entry of the first set of data entries includes text content associated with a time entry. The method includes generating a first data object score using the text content and the time entries included in the first set of data entries and using scoring parameters, determining that the first data object score satisfies a data object score condition, and performing, in response to the first data object score satisfying the data object score condition, a condition-specific action associated with the data object score condition.

Database generation from natural language text documents

Some embodiments may perform operations of a process that includes obtaining a natural language text document and using a machine learning model to generate a set of attributes based on a set of machine-learning-model-generated classifications in the document. The process may include performing hierarchical data extraction operations to populate the attributes, where different machine learning models may be used in sequence. The process may include using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model augmented with a pooling operation to determine a BERT output via a multi-channel transformer model to generate vectors on a per-sentence level or other per-text-section level. The process may include using a finer-grain model to extract quantitative or categorical values of interest, where the context of the per-sentence level may be retained for the finer-grain model.

TECHNOLOGIES FOR RELATING TERMS AND ONTOLOGY CONCEPTS

This disclosure enables various technologies that can (1) learn new synonyms for a given concept without manual curation techniques, (2) relate (e.g., map) some, many, most, or all raw named entity recognition outputs (e.g., “United States”, “United States of America”) to ontological concepts (e.g., ISO-3166 country code: “USA”), (3) account for false positives from a prior named entity recognition process, or (4) aggregate some, many, most, or all named entity recognition results from machine learning or rules based approaches to provide a best of breed hybrid approach (e.g., synergistic effect).

METHOD AND APPARATUS FOR QUERYING QUESTIONS, DEVICE, AND STORAGE MEDIUM

Provided is a method for querying questions. The method includes: acquiring input information of a user; acquiring intention information of the user based on the input information of the user; determining an answer generation rule; and generating, based on the input information and the intention information, a first answer in accordance with the answer generation rule, and providing the first answer to the user.

IDENTIFYING AND TRANSFORMING TEXT DIFFICULT TO UNDERSTAND BY USER

A computer-implemented method, system and computer program product for improving understandability of text by a user. A final word vector for each word in a sentence of a document is computed, such as by averaging a first word vector and a second word vector for that word. Furthermore, elements of a user portrait are vectorized. A distance is then computed between the vector for each word in the sentence and a vectorized element in the user’s portrait, and these distances are summed to form an evaluation result for the element. An evaluation result is formed in the same way for every other element in the user’s portrait. A “final evaluation result” is then generated corresponding to the evaluation results for every element in the user’s portrait. The document is then transformed in response to the final evaluation result indicating a lack of understanding of the sentence by the user.
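
The scoring steps above can be sketched in Python as follows. Euclidean distance, the mean as the way of combining per-element results, and all vectors and element names are assumptions made for this example; the abstract does not fix any of them.

```python
import math

def final_word_vector(first, second):
    """Average two vectors for the same word, component by component."""
    return [(a + b) / 2 for a, b in zip(first, second)]

def element_result(word_vectors, element_vector):
    """Sum of distances between every word vector and one portrait element."""
    return sum(math.dist(w, element_vector) for w in word_vectors)

def final_evaluation(word_vectors, portrait):
    """Per-element results plus one combined score (mean is an assumption)."""
    results = {name: element_result(word_vectors, vec) for name, vec in portrait.items()}
    return results, sum(results.values()) / len(results)

# Made-up two-dimensional vectors for a two-word sentence.
words = [
    final_word_vector([0.0, 2.0], [2.0, 0.0]),  # averages to [1.0, 1.0]
    final_word_vector([0.0, 0.0], [0.0, 0.0]),  # stays at [0.0, 0.0]
]
# Hypothetical vectorized user-portrait elements.
portrait = {"profession": [1.0, 1.0], "education": [0.0, 0.0]}

results, final_score = final_evaluation(words, portrait)
```

A larger final score means the sentence sits farther from the user's portrait; the point at which that indicates a lack of understanding would be a tuning choice.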

Identifying similar content in a multi-item embedding space

Systems and methods for identifying content for an input query are presented. A mapping model is trained to map elements of an input query embedding vector for a received query into one or more elements of a destination embedding vector. In response to receiving an input query, an input query embedding vector is generated that projects into an input query embedding space. The input query embedding vector is processed by the mapping model to map the input query embedding vector into one or more elements of a destination embedding vector in a destination embedding space, resulting in a partial destination embedding vector. Items of a corpus of content are projected into the destination embedding space and the partial destination embedding vector is also projected into the destination embedding space. A similarity measure determines the most-similar items to the partial destination embedding vector and at least some of the most-similar items are returned in response to the input query.
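
To make the partial-vector idea concrete, the Python sketch below uses a fixed linear map as a stand-in for the trained mapping model and compares items with cosine similarity over only the destination dimensions that the map fills in. The weights, the choice of cosine similarity restricted to those dimensions, and the corpus are all hypothetical.

```python
import math

def map_query(query_vec, weights, dest_dims):
    """Map a query embedding to a partial destination vector {dimension: value}."""
    return {
        d: sum(w * q for w, q in zip(row, query_vec))
        for d, row in zip(dest_dims, weights)
    }

def partial_cosine(partial, item_vec):
    """Cosine similarity over the dimensions present in the partial vector."""
    dims = list(partial)
    dot = sum(partial[d] * item_vec[d] for d in dims)
    norm_p = math.sqrt(sum(partial[d] ** 2 for d in dims))
    norm_i = math.sqrt(sum(item_vec[d] ** 2 for d in dims))
    return dot / (norm_p * norm_i) if norm_p and norm_i else 0.0

def most_similar(partial, corpus, k=2):
    """Names of the k corpus items most similar to the partial vector."""
    ranked = sorted(corpus, key=lambda name: partial_cosine(partial, corpus[name]), reverse=True)
    return ranked[:k]

# Stand-in for a trained mapping model: fills destination dimensions 0 and 2 only.
weights = [[1.0, 0.0], [0.0, 1.0]]
dest_dims = [0, 2]

# Made-up corpus items already projected into a 3-dimensional destination space.
corpus = {
    "item_a": [0.9, 0.5, 0.1],
    "item_b": [0.0, 0.2, 1.0],
    "item_c": [0.5, 0.9, 0.5],
}

partial = map_query([1.0, 0.0], weights, dest_dims)
top = most_similar(partial, corpus)
```

Destination dimension 1 is never compared, which is what makes the destination vector partial in this sketch.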