G06F16/316

EXTRACTING INFORMATION FROM UNSTRUCTURED DOCUMENTS USING NATURAL LANGUAGE PROCESSING AND CONVERSION OF UNSTRUCTURED DOCUMENTS INTO STRUCTURED DOCUMENTS

Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model, and the complementary model is deployed for use in analyzing documents.
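
The aggregation step can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the disclosed method: a TF-IDF-style weight stands in for the relevancy model, document-level co-occurrence stands in for the knowledge graph, and each edge weight is taken as the mean relevance of its two endpoint words. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def relevance_scores(docs):
    """Score each word with a TF-IDF-style weight (assumed stand-in for the relevancy model)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    term_freq = Counter(word for tokens in tokenized for word in tokens)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    total = sum(term_freq.values())
    return {
        word: (term_freq[word] / total) * (1.0 + math.log(n_docs / doc_freq[word]))
        for word in term_freq
    }


def knowledge_graph(docs):
    """Connect two words with an edge when they co-occur in at least one document (assumption)."""
    edges = set()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        edges.update(combinations(words, 2))
    return edges


def complementary_model(docs):
    """Aggregate both models: graph edges weighted by the mean relevance of their endpoints."""
    scores = relevance_scores(docs)
    return {
        (a, b): (scores[a] + scores[b]) / 2.0
        for a, b in knowledge_graph(docs)
    }


docs = ["invoice total due", "invoice number date"]
model = complementary_model(docs)
for (a, b), weight in sorted(model.items()):
    print(f"{a} -- {b}: {weight:.3f}")
```

Words that never share a document, such as "date" and "total" here, get no edge, so the relevance scores only influence relationships the graph already contains.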

Multi-format content repository search

An audio file format of an audio portion of a natural language content is determined. Using a trained audio language identification model, a human language included in the audio portion is identified. Using a trained audio to text model trained on the human language, the audio portion is converted to a corresponding set of text data. The set of text data is indexed. Using the indexed set of text data responsive to a search query, a search result is generated, the search query specifying a search including a non-textual portion of the natural language content.

METHODS AND SYSTEMS FOR PROCESSING NATURAL LANGUAGE COMMUNICATIONS
20210303795 · 2021-09-30

Techniques are disclosed for processing natural language communications. A computing device receives a set of natural language communications. The computing device generates a word index from the communications and generates, from the word index, a set of topics, each topic including two or more words. For each topic, the computing device generates a score indicative of an amount of semantic information represented by the topic. The computing device then discards topics that are supersets or subsets of other topics. The computing device presents the remaining topics based on the score of each topic.
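
The pipeline above (word index, topics, scoring, superset/subset pruning) can be sketched roughly as follows. The abstract does not say how topics are generated or scored, so this sketch assumes that a topic is any group of two or three words sharing at least `min_support` communications, that the score is the summed self-information of the topic's words, and that pruning keeps the highest-scoring topic of each superset/subset chain. All names are hypothetical.

```python
import math
from itertools import combinations


def build_word_index(messages):
    """Map each word to the set of message ids that contain it."""
    index = {}
    for msg_id, text in enumerate(messages):
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(msg_id)
    return index


def generate_topics(index, min_support=2, max_size=3):
    """A topic is a group of 2..max_size words that share enough messages (assumption)."""
    topics = []
    words = sorted(index)
    for size in range(2, max_size + 1):
        for combo in combinations(words, size):
            shared = set.intersection(*(index[w] for w in combo))
            if len(shared) >= min_support:
                topics.append(frozenset(combo))
    return topics


def topic_score(topic, index, n_messages):
    """Assumed proxy for semantic information: summed self-information of the words."""
    return sum(math.log2(n_messages / len(index[w])) for w in topic)


def rank_topics(messages):
    index = build_word_index(messages)
    scored = sorted(
        ((topic_score(t, index, len(messages)), sorted(t)) for t in generate_topics(index)),
        reverse=True,
    )
    kept = []
    for score, words in scored:
        topic = set(words)
        # Discard a topic that is a subset or superset of a higher-scoring kept topic.
        if any(topic <= other or topic >= other for _, other in kept):
            continue
        kept.append((score, topic))
    return [(sorted(topic), score) for score, topic in kept]


messages = ["reset password now", "reset password now please", "billing invoice"]
ranked = rank_topics(messages)
print(ranked)
```

Enumerating every word combination is exponential in the vocabulary size, so this only suits small illustrative inputs.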

Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation

Methods, systems and computer-readable media enable various techniques related to semantic navigation. One aspect is a technique for displaying semantically derived facets in the search engine interface. Each of the facets comprises faceted search results. Each of the faceted search results is displayed in association with user interface elements for including or excluding the faceted search result as additional search terms to subsequently refine the search query. Another aspect automatically infers new metadata from the content and from existing metadata and then automatically annotates the content with the new metadata to improve recall and navigation. Another aspect identifies semantic annotations by determining semantic connections between the semantic annotations and then dynamically generating a topic page based on the semantic connections.
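
The include/exclude refinement of faceted results described above might look like the following minimal sketch. The document layout (a `facets` dictionary per result) and the function names are assumptions made for illustration. How facets are semantically derived is outside the scope of the sketch.

```python
from collections import Counter


def compute_facets(results):
    """Count how often each facet value occurs across the current results."""
    facets = {}
    for doc in results:
        for name, values in doc["facets"].items():
            facets.setdefault(name, Counter()).update(values)
    return facets


def refine(results, include=(), exclude=()):
    """Keep results that carry every included (facet, value) pair and no excluded pair."""
    def has(doc, pair):
        name, value = pair
        return value in doc["facets"].get(name, [])

    return [
        doc for doc in results
        if all(has(doc, p) for p in include) and not any(has(doc, p) for p in exclude)
    ]


# Hypothetical search results with pre-computed facet annotations.
results = [
    {"title": "Q3 earnings call", "facets": {"company": ["Acme"], "topic": ["finance"]}},
    {"title": "Acme product launch", "facets": {"company": ["Acme"], "topic": ["marketing"]}},
    {"title": "Globex merger news", "facets": {"company": ["Globex"], "topic": ["finance"]}},
]
facets = compute_facets(results)
refined = refine(results, include=[("topic", "finance")], exclude=[("company", "Globex")])
print([doc["title"] for doc in refined])
```

Each user interface element for including or excluding a facet value would, under this assumption, simply add a pair to `include` or `exclude` before the results are filtered again.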

Method and device for comparing similarities of high dimensional features of images

The present invention provides a method and device for comparing similarities of high dimensional features of images, capable of improving the retrieval speed and retrieval precision in a similarity retrieval from massive images based on Locality Sensitive Hashing (LSH) codes. The method for comparing similarities of high dimensional features of images according to the present invention comprises: reducing dimensions of extracted eigenvectors of the images by the LSH algorithm to obtain low dimensional eigenvectors; evenly segmenting the low dimensional eigenvectors and establishing a segment index table; retrieving the segmented low dimensional eigenvector of a queried image from the segment index table to obtain a candidate sample set; and performing a similarity metric between a sample in the candidate sample set and the low dimensional eigenvector of the queried image.
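
The four steps (dimension reduction, segmentation, candidate retrieval, similarity metric) can be sketched as below. The abstract does not name a particular LSH family or metric, so this sketch assumes random-hyperplane sign hashing for the reduction and Hamming distance for the final comparison. The class and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Random hyperplanes; the sign of each projection gives one bit of the code."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(feature, planes):
    """Reduce a high dimensional feature to a short binary code."""
    return tuple(
        1 if sum(x * w for x, w in zip(feature, plane)) >= 0 else 0
        for plane in planes
    )


def split_segments(code, n_segments):
    """Split the code into equal segments, each tagged with its position."""
    size = len(code) // n_segments
    return [(i, code[i * size:(i + 1) * size]) for i in range(n_segments)]


class SegmentIndex:
    def __init__(self, dim, n_bits=16, n_segments=4, seed=0):
        if n_bits % n_segments:
            raise ValueError("n_bits must be divisible by n_segments")
        self.planes = make_hyperplanes(dim, n_bits, seed)
        self.n_segments = n_segments
        self.table = defaultdict(set)  # (position, segment bits) -> image ids
        self.codes = {}

    def add(self, image_id, feature):
        code = lsh_code(feature, self.planes)
        self.codes[image_id] = code
        for key in split_segments(code, self.n_segments):
            self.table[key].add(image_id)

    def query(self, feature, top_k=3):
        code = lsh_code(feature, self.planes)
        # Candidates share at least one identical segment with the query code.
        candidates = set()
        for key in split_segments(code, self.n_segments):
            candidates |= self.table.get(key, set())

        def hamming(image_id):
            return sum(a != b for a, b in zip(self.codes[image_id], code))

        return sorted(candidates, key=lambda i: (hamming(i), i))[:top_k]


index = SegmentIndex(dim=4)
index.add("a", [1.0, 0.5, -0.2, 0.3])
index.add("b", [-1.0, -0.5, 0.2, -0.3])
matches = index.query([1.0, 0.5, -0.2, 0.3])
print(matches)
```

Only images that share a whole segment with the query are compared in full, which is where the speed-up over an exhaustive scan would come from in this sketch.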

COOPERATIVE BUILD AND CONTENT ANNOTATION FOR CONVERSATIONAL DESIGN OF VIRTUAL ASSISTANTS

In an approach for a cooperative build and content annotation system for conversational design of virtual assistants, a processor formulates a build context based on a build activity of a user. A processor formulates one or more content queries based on the build context. A processor builds a content index by augmenting a text-search index with a neural Information Retrieval (IR) index. A processor searches the content index using the one or more content queries to identify content relevant to the build context. A processor determines at least one recommendation for the user based on heuristic rules applied to the build context and the identified content, wherein each recommendation is a build suggestion or a content annotation suggestion.

Efficient resolution of syntactic patterns in question and answer (QA) pairs in an n-ary focus cognitive QA system

Embodiments for processing questions based on equivalence classes in a cognitive question answering system. A plurality of syntactic representations of a plurality of questions asked of the cognitive question answering system are provided. A plurality of syntactic representations of a plurality of passages ingested by the cognitive question answering system are provided. Question focus to candidate passage pairs are mapped to form an equivalence class mapping, and the equivalence class mapping is used to determine an answer to one of the plurality of questions asked of the cognitive question answering system.

Framework for analyzing graphical data by question answering systems

A system for handling a graphical representation of data associated with a question answering (QA) input document includes a memory having instructions therein and includes at least one processor in communication with the memory. The at least one processor is configured to execute the instructions to derive, at least from a portion of the QA input document, first metadata regarding a context of the graphical representation of data. The at least one processor is also configured to execute the instructions to derive, at least from a portion of the graphical representation of data, tabular data. The at least one processor is also configured to execute the instructions to determine, at least in part by comparing at least a portion of the first metadata to existing table annotations from a QA knowledge base, how to incorporate the tabular data into the QA knowledge base.

Metadata indexing

Techniques for searching using metadata indexing. In some implementations, a computing device receives data indicating a search request from a client device. The computing device analyzes the received data indicating the search request to determine content of the search request. The computing device retrieves one or more dossiers based on the content of the search query. The computing device identifies metadata and one or more index templates corresponding to each of the one or more retrieved dossiers. The computing device determines one or more matches between the data indicating the search query and the metadata and the one or more index templates corresponding to each of the one or more retrieved dossiers. The computing device generates search results that include the one or more matches based on characteristics of a type of match and weight values applied to each of the one or more matches based on the characteristics of the type of the match. The computing device provides data indicating the search results to the client device.
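
The weighting of matches by match type can be illustrated with a small sketch. The abstract does not enumerate match types or weight values, so the three types (`exact`, `prefix`, `contains`), their weights, and the dossier layout below are all assumptions. Index templates are left out for brevity.

```python
# Hypothetical weight per type of match; the real values are not given in the abstract.
MATCH_WEIGHTS = {"exact": 3.0, "prefix": 2.0, "contains": 1.0}


def match_type(query, value):
    """Classify how a query matches a metadata value, or return None for no match."""
    q, v = query.lower(), value.lower()
    if q == v:
        return "exact"
    if v.startswith(q):
        return "prefix"
    if q in v:
        return "contains"
    return None


def search(query, dossiers):
    """Score each dossier by summing the weights of its metadata matches."""
    results = []
    for dossier in dossiers:
        matches = []
        for field_name, value in dossier["metadata"].items():
            kind = match_type(query, str(value))
            if kind is not None:
                matches.append((field_name, kind))
        if matches:
            score = sum(MATCH_WEIGHTS[kind] for _, kind in matches)
            results.append({"dossier": dossier["name"], "score": score, "matches": matches})
    # Highest score first; ties are broken by dossier name for a stable order.
    return sorted(results, key=lambda r: (-r["score"], r["dossier"]))


dossiers = [
    {"name": "Ops", "metadata": {"title": "Presales review", "owner": "ops"}},
    {"name": "Sales 2023", "metadata": {"title": "Sales", "owner": "sales team"}},
]
hits = search("sales", dossiers)
print([(hit["dossier"], hit["score"]) for hit in hits])
```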

SOLUTION GRAPH FOR MANAGING CONTENT IN A MULTI-STAGE PROJECT

A method and system provide the ability to manage entities of a marketing domain model in a multi-state workflow. Multiple entities are acquired in a content hub. Each entity is a set of data that belongs together as one and includes properties that describe entity details. Relations are created between the multiple entities to give meaning to the marketing domain model. A solution graph is generated that represents all of the multiple entities (nodes) and relations (edges). Inside the solution graph, a state workflow can be created for each node. Nodes can be linked to a state and there are transitions between the states. Multiple non-linear state workflows can be orchestrated by an overall waterfall-based workflow (that is linear and time duration based). A graphical user interface enables management of and renders a representation of the multiple entities, the solution graph, and the workflows.
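
A solution graph with a per-node state workflow could be modeled along the following lines. This is a minimal sketch under stated assumptions: the class names, the example entities, and the state names are hypothetical, and the overall waterfall-based workflow and the graphical user interface are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateWorkflow:
    """Non-linear workflow: any (from, to) pair listed in transitions is allowed."""
    transitions: set
    current: str

    def advance(self, new_state):
        if (self.current, new_state) not in self.transitions:
            raise ValueError(f"no transition {self.current!r} -> {new_state!r}")
        self.current = new_state


@dataclass
class Entity:
    """A node: a named set of data with properties and an optional state workflow."""
    name: str
    properties: dict = field(default_factory=dict)
    workflow: Optional[StateWorkflow] = None


class SolutionGraph:
    def __init__(self):
        self.nodes = {}     # entity name -> Entity
        self.edges = set()  # (source, relation, target) triples

    def add_entity(self, entity):
        self.nodes[entity.name] = entity

    def relate(self, source, relation, target):
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(name)
        self.edges.add((source, relation, target))

    def related(self, source):
        return sorted(t for s, _, t in self.edges if s == source)


graph = SolutionGraph()
review_flow = StateWorkflow(
    transitions={("draft", "review"), ("review", "draft"), ("review", "approved")},
    current="draft",
)
graph.add_entity(Entity("campaign", {"budget": 10000}))
graph.add_entity(Entity("banner", {"format": "png"}, workflow=review_flow))
graph.relate("campaign", "uses", "banner")
graph.nodes["banner"].workflow.advance("review")
print(graph.related("campaign"), graph.nodes["banner"].workflow.current)
```

The `("review", "draft")` transition shows the non-linear case, in which a node can move back to an earlier state.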