Patent classifications
G06F16/316
Search engines and systems with handheld document data capture devices
Embodiments of the disclosed innovations provide systems and methods for locating data associated with rendered documents. Some embodiments support the use of a handheld document data capture device.
TRAINING A NEURAL DATABASE FOR EFFICIENT DOCUMENT SEARCH
Aspects of the disclosure provide a method, including: generating a plurality of text chunks by processing a document, wherein each text chunk includes: a configured portion of the document, and location metadata associated with the document; processing, with a machine learning model, a first subset of the text chunks to extract contextual metadata; processing, with the machine learning model, a second subset of the text chunks to extract index metadata; generating a first structured data file including a mapping between the contextual metadata and the location metadata; generating a second structured data file including a mapping between the index metadata and the location metadata; associating each contextual metadatum and each index metadatum with at least one text chunk based on the first structured data file and the second structured data file to generate a plurality of augmented text chunks; and training a neural database based on the augmented text chunks.
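The chunking and augmentation steps described above can be illustrated with a minimal sketch. It assumes fixed-size character chunks, a character offset as the location metadata, and plain dictionaries standing in for the two structured data files; `Chunk`, `make_chunks`, and `augment` are hypothetical names, and the machine learning extraction and neural database training are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    location: int  # assumed form of location metadata: character offset in the document
    metadata: dict = field(default_factory=dict)


def make_chunks(document: str, size: int) -> list:
    """Split the document into fixed-size chunks, recording each chunk's offset."""
    return [Chunk(document[i:i + size], i) for i in range(0, len(document), size)]


def augment(chunks, contextual_by_location, index_by_location):
    """Attach contextual and index metadata to chunks via their shared location key."""
    for chunk in chunks:
        chunk.metadata["contextual"] = contextual_by_location.get(chunk.location, [])
        chunk.metadata["index"] = index_by_location.get(chunk.location, [])
    return chunks


chunks = make_chunks("abcdefghij", 4)
# The two dicts stand in for the first and second structured data files.
augmented = augment(chunks, {0: ["intro"]}, {4: ["term"]})
```

Keying both mappings on the location metadata is what lets the two independently extracted metadata sets be joined back onto the same chunk.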
Information retrieval using an augmented query produced by graph convolution
An information retrieval technique uses one or more machine-trained models to generate one or more metadata embeddings. The technique then combines a query embedding with the metadata embedding(s). In some cases, the technique performs this operation using a graph convolution operation. This yields an augmented embedding. The technique then uses the augmented embedding to retrieve at least one item. The augmented embedding lies in the same vector space as target-item embeddings associated with candidate target items. Otherwise, the vector spaces associated with the query embedding and metadata embedding(s) can be different. In some implementations, the technique uses dense retrieval, which enables the technique to deliver output results in real time.
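One way to picture the combination step is a single mean-aggregation graph convolution over a star graph whose nodes are the query and its metadata embeddings, followed by a linear projection into the target-item space. This is only a sketch under simplifying assumptions: all embeddings share one dimension here (the abstract allows them to differ), the projection matrix is supplied by hand rather than learned, and `augment_query` and `retrieve` are hypothetical names.

```python
def augment_query(query, metadata, weight):
    """Average the query and metadata node vectors, then apply a linear map.

    weight is an (out_dim x dim) matrix given as a list of rows.
    """
    nodes = [query] + metadata
    dim = len(query)
    mean = [sum(node[d] for node in nodes) / len(nodes) for d in range(dim)]
    return [sum(w * m for w, m in zip(row, mean)) for row in weight]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(augmented, targets):
    """Dense retrieval: return the target whose embedding has the largest dot product."""
    return max(targets, key=lambda name: dot(augmented, targets[name]))


query = [1.0, 0.0]
metadata = [[0.0, 1.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned projection
augmented = augment_query(query, metadata, identity)
best = retrieve(augmented, {"a": [1.0, 0.0], "b": [0.0, 1.0]})
```

In this toy case the two metadata embeddings pull the augmented embedding toward target "b", even though the raw query alone would match target "a".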
AIRCRAFT HARDWARE COMPONENT ROTABILITY CLASSIFICATION USING MACHINE LEARNING
An application extracts a plurality of features of a hardware component of an aircraft. The application inputs a first subset of features of the plurality of features into a first machine learning model, and receives as output a first determination of whether the hardware component is rotable. The application inputs a second subset of features of the plurality of features into a second machine learning model, and receives as output a second determination of whether the hardware component is rotable. The application determines, based on the first determination and the second determination, a final determination of whether the hardware component is rotable, and adds a data structure for the hardware component with the final determination in a searchable database. The application receives a query from a user that is associated with the hardware component, runs a search, and outputs whether the hardware component is rotable.
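The two-model flow above might be sketched as follows. The abstract does not say how the two determinations are combined, so the rule below (both models must agree) is an assumption, as are the feature names, the threshold-style stand-ins for the trained models, and the in-memory dictionary used as the searchable database.

```python
def first_model(features: dict) -> bool:
    # Hypothetical stand-in for the first trained model.
    return features["repairable"]


def second_model(features: dict) -> bool:
    # Hypothetical stand-in for the second trained model.
    return features["unit_cost"] > 1000


def classify(component: dict) -> bool:
    """Feed each model its own feature subset and combine the two determinations."""
    first = first_model({"repairable": component["repairable"]})
    second = second_model({"unit_cost": component["unit_cost"]})
    return first and second  # assumed combination rule: both must agree


database = {}


def add_component(part_id: str, component: dict) -> None:
    database[part_id] = {"rotable": classify(component)}


def is_rotable(part_id: str) -> bool:
    """Answer a user query about a stored component."""
    return database[part_id]["rotable"]


add_component("pump-1", {"repairable": True, "unit_cost": 5000})
add_component("bolt-7", {"repairable": False, "unit_cost": 2})
```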
System and method for hybrid multilingual search indexing
A system and method for indexing and searching multilingual documents are disclosed.
ARTIFICIAL INTELLIGENCE-ASSISTED AUTOMATED ANALYSIS AND COMPARISON OF UNSTRUCTURED CONTRACTUAL DOCUMENTS IN VIEW OF CONTRACT STANDARDS
In an illustrative embodiment, systems for performing automated comparisons of contractual agreements include a vector database storing high-dimensional vectors each representing a translated text section of an unstructured document related to a corresponding contractual agreement, a knowledge graph storing a taxonomy and/or ontology of relationships pertinent to a standard document type, a generative AI model tuned to analyze sets of vectors corresponding to a set of documents, a document processing pipeline configured to convert unstructured documents for storage in the vector database, and an AI-enhanced virtual agent configured to automatically compare contents of a set of unstructured documents using vector sets from the vector database and information stored in the knowledge graph.
Prediction of table column items in unstructured documents using a hybrid model
One example method includes collecting annotated unstructured documents that each include a table with words whose respective column indices are known, using the documents to train a model to detect a table header in a given document, identifying, by the model, a region of a document that corresponds to a table header in a new document that is not part of the training data, using an algorithm to perform a segmentation process on the table header that identifies column boundaries in the table header, and using the identified column boundaries to preliminarily assign a respective column index to each word in the table header. Finally, a graph neural network model is run on a graph that includes the words in the table, and running the graph neural network generates a refined prediction of a respective column index for each of the words in the table of the new document.
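The preliminary column assignment can be illustrated with a simple gap-based segmentation. The abstract does not name the segmentation algorithm, so this is an assumed stand-in: header words are given as horizontal extents, and a new column starts wherever the horizontal gap exceeds a threshold. The function name, the input format, and `min_gap` are hypothetical, and the graph neural network refinement is not shown.

```python
def assign_columns(words, min_gap):
    """Assign a preliminary column index to each header word.

    words: list of (text, x_start, x_end) tuples.
    A gap wider than min_gap between the running right edge and the next
    word's left edge is treated as a column boundary.
    """
    ordered = sorted(words, key=lambda w: w[1])
    assignment = {}
    column = 0
    right_edge = None
    for text, start, end in ordered:
        if right_edge is not None and start - right_edge > min_gap:
            column += 1
        assignment[text] = column
        right_edge = end if right_edge is None else max(right_edge, end)
    return assignment


header = [("Unit", 50, 70), ("Item", 0, 20), ("Price", 72, 95), ("Qty", 120, 135)]
columns = assign_columns(header, min_gap=10)
```

Here "Unit" and "Price" fall into the same column because only a narrow gap separates them, which is the kind of multi-word header the later refinement step would then confirm or correct.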
EMBEDDINGS-BASED INDEX FOR CONTENT SIMILARITY OPERATIONS IN OBJECT STORES
Generating embeddings offline for content similarity functionality is disclosed. Objects stored in a storage system are processed offline to generate embeddings. The embeddings are stored in an embeddings index. The process of generating the embeddings is guided by policies. Content similarity searches may be performed inline by generating embeddings for an input object and then searching the embeddings index based on the input embeddings for the input object. The embeddings index allows additional functionality to be implemented based on the content-similarity search.
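A toy version of the offline index and inline search is sketched below. The letter-frequency `embed` function is a deliberately crude stand-in for a real embedding model, and the policy is assumed to be a simple predicate on the object key; none of these names come from the source.

```python
import math
from collections import Counter


def embed(content: str) -> list:
    """Toy embedding: counts of the letters a-z (a stand-in for a real model)."""
    counts = Counter(ch for ch in content.lower() if "a" <= ch <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine(a, b) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_index(objects: dict, policy) -> dict:
    """Offline step: embed only the stored objects that the policy admits."""
    return {key: embed(content) for key, content in objects.items() if policy(key)}


def search(index: dict, content: str) -> str:
    """Inline step: embed the input object and return the most similar stored key."""
    query = embed(content)
    return max(index, key=lambda key: cosine(index[key], query))


store = {
    "a.txt": "quarterly sales report",
    "b.txt": "zebra zoo quiz",
    "c.tmp": "quarterly sales report",
}
index = build_index(store, policy=lambda key: key.endswith(".txt"))
match = search(index, "sales report for the quarter")
```

Because embeddings for stored objects are computed ahead of time, the inline path only has to embed the single input object and compare it against the index.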
LEVERAGING LARGE LANGUAGE MODELS (LLMS) FOR SEMANTICALLY CHUNKING CONTENT
This disclosure provides methods, devices, and systems for generating vector embeddings. The present implementations more specifically relate to techniques for segmenting data along semantic boundaries to be mapped to vector embeddings. In some aspects, a data orchestration system may determine one or more semantic boundaries associated with a data asset based on a neural network model and segment the data asset into chunks based at least in part on the one or more semantic boundaries. The data orchestration system further maps each chunk to a respective vector embedding associated with the neural network model.
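The idea of segmenting along semantic boundaries can be sketched as follows. In place of the neural network model, this example assumes a toy word-overlap similarity between adjacent sentences and places a boundary wherever it drops below a threshold; `similarity`, `semantic_chunks`, and the threshold value are illustrative only, and the mapping of chunks to vector embeddings is not shown.

```python
def similarity(a: str, b: str) -> float:
    """Toy stand-in for a model's semantic similarity: Jaccard overlap of words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def semantic_chunks(sentences, threshold):
    """Start a new chunk wherever adjacent sentences fall below the threshold."""
    chunks = [[sentences[0]]] if sentences else []
    for previous, current in zip(sentences, sentences[1:]):
        if similarity(previous, current) < threshold:
            chunks.append([current])  # semantic boundary detected
        else:
            chunks[-1].append(current)
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "the cat sat",
    "the cat slept",
    "stocks fell today",
    "stocks rose later",
]
chunks = semantic_chunks(sentences, threshold=0.15)
```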
Recommendation Generation with User Values
A system can obtain a first prompt as output from inputting an alert about a computer system to a first retrieval-augmented generation system (RAG). The system can obtain a first answer as output from inputting the first prompt to a first large language model (LLM). The system can obtain a value maintained by an entity associated with the computer system as output from inputting the alert to a second RAG. The system can obtain a second answer as output from inputting the first answer, the value, and a second prompt to a second LLM, wherein the second LLM comprises the first LLM or another LLM different from the first LLM. The system can obtain a third answer as output from inputting the second answer, user information associated with the entity, and a third prompt to a third LLM. The system can make the third answer available to the entity.