Patent classifications
G06F16/316
Dynamic optimization of key value pair extractors for document data extraction
Disclosed embodiments provide techniques for monitoring and evaluating the effectiveness of key value pairs (KVPs) used in a document processing system. In embodiments, KVPs are obtained from multiple extractors of a document processing system. A score is computed for the KVPs by computing an effectiveness metric for each of the KVPs. In response to the computed score being below a predetermined threshold, a model retraining process is performed to generate a new set of KVP extractors and to provide the new set of KVPs to the document processing system.
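The score-and-threshold check described above can be sketched as follows. This is a minimal illustration, not the patented method: the `KVP` class, the use of an extractor confidence as the effectiveness metric, the mean as the aggregate, and the 0.7 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class KVP:
    key: str
    value: str
    confidence: float  # hypothetical per-KVP signal in [0, 1]


def effectiveness(kvp: KVP) -> float:
    # Assumption: a KVP with an empty value counts as ineffective.
    return kvp.confidence if kvp.value.strip() else 0.0


def aggregate_score(kvps: list[KVP]) -> float:
    # Assumption: the overall score is the mean of the per-KVP metrics.
    if not kvps:
        return 0.0
    return sum(effectiveness(k) for k in kvps) / len(kvps)


def needs_retraining(kvps: list[KVP], threshold: float = 0.7) -> bool:
    # Retraining is triggered when the score falls below the threshold.
    return aggregate_score(kvps) < threshold
```

With one well-extracted pair and one pair whose value is empty, the mean drops below the assumed threshold and `needs_retraining` returns `True`.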
KNOWLEDGE RE-RANKING TECHNIQUES
Techniques are disclosed herein for selecting document chunks that are most relevant to a query. The techniques include receiving a query and comparing a plurality of stored text passages to the query using a first similarity metric. Based on the comparison, a subset of the plurality of stored text passages that is most similar to the query is selected. A plurality of sentences from the subset of the plurality of stored text passages is identified. The identified sentences are ranked based on the query and a second similarity metric. A subset of the sentences is selected based on the ranking. The subset of the sentences or a derivative thereof is output in response to the query.
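A two-stage selection of this kind might look like the sketch below. The abstract does not name the similarity metrics; bag-of-words cosine similarity for the passage stage and Jaccard overlap for the sentence stage are stand-ins chosen for the example, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: str, b: str) -> float:
    # First similarity metric (assumed): cosine over word counts.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: str, b: str) -> float:
    # Second similarity metric (assumed): Jaccard overlap of word sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def split_sentences(passage: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def rerank(query: str, passages: list[str],
           top_passages: int = 2, top_sentences: int = 2) -> list[str]:
    # Stage 1: keep the passages most similar to the query.
    shortlisted = sorted(passages, key=lambda p: cosine(query, p),
                         reverse=True)[:top_passages]
    # Stage 2: rank the sentences of those passages with the second metric.
    sentences = [s for p in shortlisted for s in split_sentences(p)]
    ranked = sorted(sentences, key=lambda s: jaccard(query, s), reverse=True)
    return ranked[:top_sentences]
```

In practice the two metrics would more likely be embedding-based; the structure of the two stages stays the same.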
DOCUMENT PROCESSING AND RETRIEVAL FOR KNOWLEDGE-BASED QUESTION ANSWERING
Techniques are disclosed herein for generating and using a knowledge base of information extracted from documents. The techniques include accessing a document comprising text and dividing the document into a plurality of chunks of text. The chunks are indexed by storing each chunk mapped to respective identifying metadata including a chunk index for each chunk. A query is received and a chunk relevant to the query is identified. A prompt is formulated including the query, the identified relevant chunk, and a subsequent chunk. The prompt is provided to a language model and output is received from the language model based on the prompt. An answer to the query is returned based on the received output.
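The indexing and prompt-building steps can be illustrated with a short sketch. Fixed-size word chunks, word-overlap retrieval, and the prompt layout are assumptions made for the example; the call to the language model itself is left out, and every name here is hypothetical.

```python
def chunk_document(text: str, chunk_size: int = 40) -> list[dict]:
    # Split into fixed-size word windows and store each with its chunk index.
    words = text.split()
    return [
        {"chunk_index": i, "text": " ".join(words[start:start + chunk_size])}
        for i, start in enumerate(range(0, len(words), chunk_size))
    ]


def find_relevant(query: str, index: list[dict]) -> dict:
    # Assumption: relevance is plain word overlap with the query.
    q = set(query.lower().split())
    return max(index, key=lambda c: len(q & set(c["text"].lower().split())))


def build_prompt(query: str, index: list[dict]) -> str:
    hit = find_relevant(query, index)
    context = [hit["text"]]
    # The chunk index makes it cheap to pull in the subsequent chunk.
    nxt = hit["chunk_index"] + 1
    if nxt < len(index):
        context.append(index[nxt]["text"])
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
```

Including the following chunk guards against an answer that straddles a chunk boundary, which is the apparent reason the chunk index is stored with each chunk.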
RETURNING REFERENCES FOR ANSWERS GENERATED BY A LANGUAGE MODEL
Techniques are disclosed for returning references associated with an answer to a query. The techniques include accessing a text portion and identifying a plurality of sentences in the text portion. Each of the sentences is embedded to generate a respective plurality of text sentence embeddings. The text portion or a derivative thereof and a query are provided to a language model and a response to the query based on the text portion is received from the language model. A plurality of sentences are identified in the response. The plurality of sentences in the response is embedded to generate a plurality of response embeddings. The response embeddings are compared to the sentence embeddings to generate a similarity score for each sentence embedding-response embedding pair. Based on the similarity scores, an indication of a subset of the plurality of sentences is output with the response to the query.
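The comparison of response sentences against source sentences can be sketched as below. A real system would use a learned sentence-embedding model; here a bag-of-words count vector stands in for the embedding, and the function names and the 0.5 threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Stand-in embedding: word counts (a real system would use a model).
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def supporting_sentences(text_portion: str, response: str,
                         threshold: float = 0.5) -> list[str]:
    source = split_sentences(text_portion)
    source_emb = [embed(s) for s in source]
    response_emb = [embed(s) for s in split_sentences(response)]
    references = []
    for sentence, s_emb in zip(source, source_emb):
        # Score every source/response pair and keep the best match.
        best = max((cosine(s_emb, r_emb) for r_emb in response_emb), default=0.0)
        if best >= threshold:
            references.append(sentence)
    return references
```

The returned sentences are the ones that would be shown alongside the response as its references.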
Methods and arrangements to adjust communications
Logic may adjust communications between customers. Logic may cluster customers into a first group associated with a first subset of synonyms and a second group associated with a second subset of the synonyms. Logic may associate a first tag with the first group and with each of the synonyms of the first subset. Logic may associate a second tag with the second group and with each of the synonyms of the second subset. Logic may associate one or more models with pairs of the groups. A first pair may comprise the first group and the second group. The first model associated with the first pair may adjust words in communications between the first group and the second group, based on the synonyms associated with the first pair, by replacement of words in a communication between customers of the first subset and customers of the second subset.
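The replacement step can be illustrated with a small sketch. The clustering of customers and the tagging of synonyms are not shown; the example assumes they have already produced one word mapping per ordered pair of groups. The group names, the mapping, and `adjust_message` are hypothetical.

```python
import re


def adjust_message(message: str, sender_group: str, recipient_group: str,
                   models: dict) -> str:
    # Look up the word mapping associated with this pair of groups.
    mapping = models.get((sender_group, recipient_group), {})

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Replace the sender's synonym with the recipient's, if one is known.
        return mapping.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, message)


# Hypothetical model for messages sent from group "uk" to group "us".
models = {("uk", "us"): {"lorry": "truck", "flat": "apartment"}}
adjusted = adjust_message("The lorry is outside the flat.", "uk", "us", models)
```

A pair of groups with no associated model leaves the message unchanged.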
Systems and methods for document partitioning and partition labeling
In some aspects, the techniques described herein relate to a method including: determining a first, a second, and a third logical partition separation indicator in a string file, wherein the first logical partition separation indicator is for a first partition level, the second logical partition separation indicator is for a second partition level, and the third logical partition separation indicator is also for the first partition level, each in a partition hierarchy; setting a first variable value to a value of the first logical partition separation indicator and a second variable value to a value of the second logical partition separation indicator; writing the first variable value to a data structure and writing the second variable value to the data structure; persisting the data structure to a search index; and clearing the first variable value and the second variable value.
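One way to read the method is as a single pass over the file that tracks the current label at each level. The sketch below assumes Markdown-style markers (`# ` for the first level, `## ` for the second) as the separation indicators and a plain list as the search index; both are assumptions, as is the function name.

```python
def index_partitions(lines: list[str]) -> list[dict]:
    search_index = []          # stand-in for a real search index
    level1 = level2 = None     # current labels for the two partition levels
    body: list[str] = []

    def persist() -> None:
        # Write the current labels and text to a record and store it.
        if body:
            search_index.append(
                {"level1": level1, "level2": level2, "text": " ".join(body)}
            )
            body.clear()

    for line in lines:
        if line.startswith("## "):
            persist()
            level2 = line[3:].strip()
        elif line.startswith("# "):
            # A new first-level indicator closes the previous partition:
            # persist it, clear both variables, then set the new label.
            persist()
            level1, level2 = None, None
            level1 = line[2:].strip()
        elif line.strip():
            body.append(line.strip())
    persist()
    return search_index
```

Clearing both variables when the next first-level indicator appears keeps a stale second-level label from leaking into the following partition.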
LEVERAGING DATA FOR PLATFORM SUPPORT USING LARGE LANGUAGE MACHINE-LEARNED MODEL-BASED AGENTS
An online system provides a support application including a chatbot application. One or more tools may each be configured to access external data. The interface system hosts an agent powered by an underlying large language model (LLM). The online system receives a user query via the chatbot application. For one or more iterations, the online system provides a prompt to the LLM that specifies at least the user query, contextual information, a list of available tools, or a request to output an action. The system parses the response from the LLM to extract a selected action and action inputs for the selected action. The system triggers execution of a respective tool that corresponds to the selected action with the action inputs. The system generates a response to the user query and transmits the response to the client device.
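The prompt, parse, and execute loop can be sketched as follows. The text format of the LLM reply (`Action:`, `Action Input:`, `Final Answer:`), the iteration limit, and the way tools are passed as a dictionary of callables are assumptions made for the example; `llm` is any callable that maps a prompt string to a reply string.

```python
import re
from typing import Callable


def run_agent(query: str, llm: Callable[[str], str],
              tools: dict[str, Callable[[str], str]],
              context: str = "", max_iterations: int = 5) -> str:
    scratchpad: list[str] = []
    for _ in range(max_iterations):
        # The prompt carries the query, context, tool list, and prior steps.
        prompt = (
            f"Question: {query}\nContext: {context}\n"
            f"Tools: {', '.join(tools)}\n" + "\n".join(scratchpad) +
            "\nReply with 'Action: <tool>' and 'Action Input: <input>', "
            "or 'Final Answer: <text>'."
        )
        reply = llm(prompt)
        final = re.search(r"Final Answer:\s*(.*)", reply, re.S)
        if final:
            return final.group(1).strip()
        # Parse the selected action and its input from the reply.
        action = re.search(r"Action:\s*(\w+)", reply)
        action_input = re.search(r"Action Input:\s*(.*)", reply)
        if not action or action.group(1) not in tools:
            scratchpad.append("Observation: unknown action")
            continue
        arg = action_input.group(1).strip() if action_input else ""
        # Execute the matching tool and feed the result back to the LLM.
        observation = tools[action.group(1)](arg)
        scratchpad.append(f"Action: {action.group(1)}\nObservation: {observation}")
    return "No answer within the iteration limit."
```

For a dry run, `llm` can be a scripted function that first returns an action and then, once an observation appears in the prompt, returns a final answer.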
AUTOMATIC GENERATION OF SCIENTIFIC ARTICLE METADATA
Examples of the disclosure are directed to systems and methods of using natural language processing techniques to automatically assign metadata to articles as they are published. The automatically-assigned metadata can then feed into the algorithms that calculate updated causation scores for agent-outcome hypotheses, powering live visualizations of the data that update automatically as new scientific articles become available.
SYSTEM FOR THE EXTRACTION OF INFORMATION FROM DOCUMENTS
The invention pertains to a system for the extraction of information from documents, in particular natural language documents, the system comprising an encoder with a neural network and a retriever that is configured as a reasoning engine. The system is configured such that it supports user-defined queries for at least two pieces of information; the encoder is applied to the documents, in particular the natural language documents, to generate document encodings, and to the user-defined queries to generate encoded instructions; and the document encodings are queried by the retriever in lookup steps based on the encoded instructions.