G06F16/374

METHOD AND SYSTEM FOR MANAGING WORKFLOWS FOR AUTHORING DATA DOCUMENTS
20230342383 · 2023-10-26

A method and system for managing workflows receive a text string being typed within a data document and execute a connection engine that performs natural language processing (NLP) to extract words and phrases having keywords corresponding to data operations and to parse the text string into nested nodes including sub-phrases of arguments and keywords. The arguments and keywords are assembled into one or more complete data operations, which are executed to return matching results from within a dataset as dependent phrase candidates to complete the text string. The writer selects a candidate from the dependent phrase candidates, in response to which the connection engine creates a persistent text-data connection between the selected candidate and the dataset. This persistent text-data connection automatically updates the selected candidate when one or more of the dataset, arguments, and keywords are modified.
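
The flow in this abstract (keyword extraction, assembly into a data operation, and a connection that stays current with the dataset) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `OPERATIONS` table, the regular expression standing in for the NLP step, and the `TextDataConnection` class, which models "persistent" simply by re-running the operation each time the value is read.

```python
import re
from statistics import mean

# Hypothetical keyword-to-operation table; the patent does not list its keywords.
OPERATIONS = {"average": mean, "maximum": max, "minimum": min, "total": sum}

def parse(text):
    """Return (keyword, argument) for a phrase such as 'average price', else None.

    A regular expression stands in for the NLP parsing step here.
    """
    match = re.search(r"\b(average|maximum|minimum|total)\s+(\w+)", text.lower())
    return (match.group(1), match.group(2)) if match else None

class TextDataConnection:
    """Ties a parsed phrase to a live dataset (a list of dict rows)."""

    def __init__(self, keyword, column, dataset):
        self.keyword = keyword
        self.column = column
        self.dataset = dataset  # kept by reference, so later edits are visible

    def value(self):
        # Re-run the data operation on every read so the text never goes stale.
        operation = OPERATIONS[self.keyword]
        return operation(row[self.column] for row in self.dataset)

dataset = [{"price": 10}, {"price": 20}]
keyword, column = parse("The average price is")
connection = TextDataConnection(keyword, column, dataset)
first_value = connection.value()
dataset.append({"price": 30})        # modifying the dataset...
updated_value = connection.value()   # ...changes the connected value
```

Re-evaluating on read is the simplest way to keep the text in step with the data; an event-driven design that pushes updates into the document would be an equally plausible reading of the abstract.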

SYSTEM AND METHOD FOR QUERYING A DATA REPOSITORY

The present disclosure relates to methods and systems for querying data in a data repository. According to a first aspect, this disclosure describes a method of querying a database, comprising: receiving, at a computing device, a plurality of keywords; determining, by the computing device, a plurality of datasets relating to the keywords; identifying, by the computing device, metadata for the plurality of datasets indicating a relationship between the datasets by examining an ontology associated with the datasets; providing, by the computing device, one or more suggested database queries in natural language form, the one or more suggested database queries constructed based on the plurality of keywords and the metadata; receiving, by the computing device, a selection of the one or more suggested database queries; and constructing, by the computing device, an object view for the plurality of datasets based on the selected query and the metadata.

Computer-implemented presentation of synonyms based on syntactic dependency

In an embodiment, the disclosed technologies are capable of identifying a target word within a text sequence, displaying a subset of candidate synonyms for the target word, determining a synonym selected from the subset of candidate synonyms, and replacing the target word with the selected synonym, where the subset of candidate synonyms has been created using syntactic dependency data for the target word.

System, computer program product and method for generating embeddings of textual and quantitative data
11423070 · 2022-08-23

A method, computer program product and computer system are disclosed that generate a set of distributed representation vectors from a dataset of textual and non-text data. In one method, a computer system receives a dataset, cleans the received dataset, parses the cleaned dataset to identify known classes of data, extracts data elements from the dataset based on the known classes of data, organizes the extracted data elements into one or more records, compiles a dictionary of unique data elements and associated codes from the one or more records, creates a set of training pairs using permutations of the codes that correspond to data elements within each record, and computes a distributed representation vector for each of the data elements in the dictionary using the set of training pairs.
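
The dictionary and training-pair steps can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the function names and sample records are hypothetical, codes are assigned in first-seen order, and "permutations" is read as all ordered pairs of codes within a record. The vector-training step itself is omitted.

```python
from itertools import permutations

def build_dictionary(records):
    """Map each unique data element to an integer code, in first-seen order."""
    codes = {}
    for record in records:
        for element in record:
            # len(codes) is evaluated before insertion, giving 0, 1, 2, ...
            codes.setdefault(element, len(codes))
    return codes

def make_training_pairs(records, codes):
    """Collect every ordered pair of codes whose elements share a record."""
    pairs = []
    for record in records:
        record_codes = [codes[element] for element in record]
        pairs.extend(permutations(record_codes, 2))
    return pairs

# Hypothetical records mixing textual and quantitative elements.
records = [["aspirin", "100mg", "tablet"], ["aspirin", "syrup"]]
codes = build_dictionary(records)
pairs = make_training_pairs(records, codes)
```

Pairs of this shape are what a skip-gram style trainer would typically consume, with one vector learned per code in the dictionary.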

Automatic extraction of a training corpus for a data classifier based on machine learning algorithms

An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
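
Segmentation against a composite dictionary can be illustrated with forward maximum matching. That algorithm is swapped in here for illustration only, since the abstract does not say which segmentation method is used; the function name and the sample vocabularies are likewise hypothetical.

```python
def segment(text, conventional, adaptive):
    """Split an unsegmented string by greedy longest-match lookup.

    The composite dictionary is the union of a conventional dictionary and
    an adaptive, document-specific one. Characters that match no entry are
    emitted as single-character tokens.
    """
    composite = set(conventional) | set(adaptive)
    longest = max((len(word) for word in composite), default=1)
    tokens = []
    position = 0
    while position < len(text):
        # Try the longest possible piece first, shrinking down to one character.
        for size in range(min(longest, len(text) - position), 0, -1):
            piece = text[position:position + size]
            if size == 1 or piece in composite:
                tokens.append(piece)
                position += size
                break
    return tokens

conventional = {"port", "load"}
adaptive = {"crane"}  # assumed to be learned from the document's own context
tokens = segment("portcraneload", conventional, adaptive)
```

Without the adaptive entry, "crane" falls apart into single characters, which shows why a dictionary built only from general vocabulary tends to be insufficient for domain-specific documents.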

CREATING A SUPERSET OF KNOWLEDGE

A method includes determining a set of identigens for words of content to produce sets of identigens and interpreting the sets of identigens to determine a most likely meaning interpretation of the content and produce a baseline entigen group. The method further includes recovering an incomplete entigen group for the topic from a first knowledge database based on a knowledge defect of the incomplete entigen group with regards to the topic. The method further includes obtaining an additive entigen group from a second knowledge database based on the knowledge defect and modifying the incomplete entigen group utilizing the additive entigen group to produce an updated entigen group to provide a beneficial cure for the knowledge defect of the incomplete entigen group.

SYNONYM MINING METHOD, APPLICATION METHOD OF SYNONYM DICTIONARY, MEDICAL SYNONYM MINING METHOD, APPLICATION METHOD OF MEDICAL SYNONYM DICTIONARY, SYNONYM MINING DEVICE AND STORAGE MEDIUM
20220083733 · 2022-03-17

Disclosed are a synonym mining method, an application method of a synonym dictionary, a medical synonym mining method, an application method of a medical synonym dictionary, a synonym mining device and a storage medium. The synonym mining method includes: performing a recognition process on corpus data to obtain a named entity set of at least one category, wherein the named entity set of each category includes a plurality of named entities; performing a clustering process on the plurality of named entities in the named entity set of each category to obtain a synonym candidate set corresponding to each category; and performing, based on a word form similarity and a context similarity, a filtering process on the synonym candidate set corresponding to each category to obtain a synonym set corresponding to each category.
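
The final filtering step can be sketched as follows. The abstract does not define either similarity measure or say how the two are combined, so this sketch assumes a character-level ratio from `difflib` for word form, cosine similarity over context vectors for context, and a rule that both must clear a threshold. The thresholds, vectors, and function names are hypothetical.

```python
from difflib import SequenceMatcher
from math import sqrt

def word_form_similarity(a, b):
    """Character-level similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()

def context_similarity(u, v):
    """Cosine similarity between two context vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def filter_synonyms(candidates, contexts, form_min=0.5, context_min=0.8):
    """Keep candidate pairs that pass both similarity thresholds."""
    kept = []
    for a, b in candidates:
        form_ok = word_form_similarity(a, b) >= form_min
        context_ok = context_similarity(contexts[a], contexts[b]) >= context_min
        if form_ok and context_ok:
            kept.append((a, b))
    return kept

# Toy context vectors; in practice these would come from the corpus.
contexts = {
    "color": [1.0, 0.0, 1.0],
    "colour": [1.0, 0.0, 0.9],
    "abc": [1.0, 0.0, 0.0],
    "xyz": [1.0, 0.0, 0.0],
}
candidates = [("color", "colour"), ("abc", "xyz")]
synonyms = filter_synonyms(candidates, contexts)
```

Requiring both conditions is a conservative choice: the pair `("abc", "xyz")` shares an identical context vector but is rejected because the two strings have no characters in common.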

Automated digital asset tagging using multiple vocabulary sets
11301506 · 2022-04-12

Automated digital asset tagging techniques and systems are described that support use of multiple vocabulary sets. In one example, a plurality of digital assets are obtained having first-vocabulary tags taken from a first-vocabulary set. Second-vocabulary tags taken from a second-vocabulary set are assigned to the plurality of digital assets through machine learning. A determination is made that at least one first-vocabulary tag includes a plurality of visual classes based on the assignment of at least one second-vocabulary tag. Digital assets are collected from the plurality of digital assets that correspond to one visual class of the plurality of visual classes. A model is generated using machine learning based on the collected digital assets.

Finding a resource in response to a query including unknown words

A computer receives a search query from a user for finding a resource. The computer obtains a description on a page on a net on which an unknown word extracted from the search query is found. The computer extracts one or more second words from the description using morphological analysis. The computer assigns at least one second category to the one or more second words extracted from the description. The computer finds, among the one or more second words, a particular word to which a predetermined category is assigned, extracts a correlation word having a high correlation with the particular word, and finds, in a dictionary, a search word from among one or more first words extracted from the search query and assigned a first category that is the same as the predetermined category. The computer finds, from a repository, resource data using the correlation word and the search word.

Artificial intelligence engine for generating semantic directions for websites for automated entity targeting to mapped identities

A method and system employ a language-processing machine learning artificial intelligence (AI) engine that uses word embeddings and term frequency-inverse document frequency (TF-IDF) to create numerical representations of document meaning in a high-dimensional semantic space or an overall semantic direction. This semantic direction can be used to quantitatively measure semantic similarity between online content consumed by a potential prospect and a given product or product family. The AI can automate the process of creating audiences for online marketplaces for programmatic advertising purposes by using representative product descriptions, such as a grouping of product descriptions for scalable, cloud-based databases, and then creating a hyper-focused intent-based audience based on companies that are showing a significant increase in intent.
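
One plausible reading of a "semantic direction" is a TF-IDF weighted sum of word embeddings, normalized to unit length, with similarity measured by cosine. The sketch below follows that reading as an assumption; the abstract does not give the formula. The smoothed IDF variant, the two-dimensional toy embeddings, and all names are hypothetical.

```python
from collections import Counter
from math import log, sqrt

def idf_weights(documents):
    """Smoothed inverse document frequency for each term in the corpus."""
    total = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: log((1 + total) / (1 + count)) + 1.0
            for term, count in doc_freq.items()}

def semantic_direction(document, idf, embeddings):
    """Unit-length TF-IDF weighted sum of the document's word embeddings."""
    dimensions = len(next(iter(embeddings.values())))
    vector = [0.0] * dimensions
    for term, term_freq in Counter(document).items():
        if term in embeddings and term in idf:
            weight = term_freq * idf[term]
            for i, component in enumerate(embeddings[term]):
                vector[i] += weight * component
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

def similarity(u, v):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(a * b for a, b in zip(u, v))

# Toy embeddings; a real system would load pretrained vectors.
embeddings = {"database": [1.0, 0.0], "cloud": [0.9, 0.1],
              "recipe": [0.0, 1.0], "cake": [0.1, 0.9]}
product = ["cloud", "database"]
page_a = ["database", "cloud", "cloud"]
page_b = ["cake", "recipe"]
idf = idf_weights([product, page_a, page_b])
product_direction = semantic_direction(product, idf, embeddings)
score_a = similarity(product_direction, semantic_direction(page_a, idf, embeddings))
score_b = similarity(product_direction, semantic_direction(page_b, idf, embeddings))
```

Here `score_a` comes out far higher than `score_b`, so a prospect reading `page_a` would rank as closer to the product description than one reading `page_b`.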