Patent classifications
G06F16/316
SYSTEMS AND METHODS FOR DOCUMENT PARTITIONING AND PARTITION LABELING
In some aspects, the techniques described herein relate to a method including: determining a first, a second, and a third logical partition separation indicator in a string file, wherein the first logical partition separation indicator is for a first partition level, the second partition separation indicator is for a second level, and the third logical partition separation indicator is also for the first partition level, each in a partition hierarchy; setting a first variable value to a value of the first logical partition separation indicator and a second variable value to a value of the second logical partition separation indicator; writing the first variable value to a data structure and writing the second variable value to the data structure; persisting the data structure to a search index; and clearing the first variable value and the second variable value.
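The labeling loop in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that "Chapter N" lines are the first-level separation indicators and "Section N.M" lines are the second-level ones; the patterns, the `label_partitions` name, and the in-memory list standing in for the search index are all hypothetical and not taken from the patent.

```python
import re

# Hypothetical indicator patterns (assumptions for illustration only).
LEVEL_PATTERNS = [
    (1, re.compile(r"^Chapter\s+\d+\b")),
    (2, re.compile(r"^Section\s+\d+\.\d+\b")),
]


def detect_level(line):
    """Return the partition level a line announces, or None for body text."""
    for level, pattern in LEVEL_PATTERNS:
        if pattern.match(line):
            return level
    return None


def label_partitions(text):
    """Walk the string file and emit one labeled record per partition."""
    index = []      # stand-in for a real search index
    level1 = None   # first variable: current first-level label
    level2 = None   # second variable: current second-level label
    body = []

    def flush():
        # Write both variable values to a record and persist it.
        if body:
            index.append(
                {"level1": level1, "level2": level2, "text": " ".join(body)}
            )
            body.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level = detect_level(line)
        if level == 1:
            flush()
            # A new first-level indicator clears both variables.
            level1, level2 = line, None
        elif level == 2:
            flush()
            level2 = line
        else:
            body.append(line)
    flush()
    return index
```

Calling `label_partitions` on a string with two chapters returns one record per run of body text, each carrying the chapter and section labels that were active when the text was read.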
GENERATING SYNTHESIZED USER DATA
Disclosed are examples of systems, apparatuses, methods, and computer program products for generating synthesized user data. A method may involve receiving a data specification schema. A method may involve determining a number of test data objects to be generated. A method may involve defining the test data objects, the defining of each test data object including: determining, from the data specification schema, a number of fields of the test data object to be populated, the fields representing categories of simulated user data; and determining values for the fields, the values simulating user data. The method may involve storing the test data objects in a database. The method may involve generating a tabular data file including or identifying the test data objects, the tabular data file configured to be processed by one or more processors of a computing system during a user data testing procedure of the computing system.
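As a rough sketch of the flow described in this abstract, the following reads a schema, generates a requested number of test objects, stores them in a database, and emits a tabular (CSV) file. The schema layout, the field types, the `test_objects` table name, and every function name here are assumptions made for illustration; the abstract does not specify any of them.

```python
import csv
import io
import random
import sqlite3
import string

# Hypothetical data specification schema: field name -> value rules.
SCHEMA = {
    "name": {"type": "string", "length": 8},
    "age": {"type": "int", "min": 18, "max": 90},
    "email": {"type": "email"},
}


def make_value(spec, rng):
    """Produce one simulated value for a field specification."""
    kind = spec["type"]
    if kind == "int":
        return rng.randint(spec["min"], spec["max"])
    if kind == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(spec["length"]))
    if kind == "email":
        user = "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
        return f"{user}@example.com"
    raise ValueError(f"unsupported field type: {kind}")


def generate_objects(schema, count, seed=0):
    """Define `count` test objects, one value per schema field."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    return [
        {field: make_value(spec, rng) for field, spec in schema.items()}
        for _ in range(count)
    ]


def store_objects(conn, schema, objects):
    """Store the test objects in a SQLite table (stand-in database)."""
    columns = list(schema)
    conn.execute(f"CREATE TABLE test_objects ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO test_objects VALUES ({placeholders})",
        [tuple(obj[c] for c in columns) for obj in objects],
    )


def to_csv(schema, objects):
    """Render the test objects as a tabular data file (CSV text)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schema))
    writer.writeheader()
    writer.writerows(objects)
    return buffer.getvalue()
```

A testing procedure could then load the CSV text or query the `test_objects` table, depending on which interface the system under test expects.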
Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation
Methods, systems and computer-readable media enable various techniques related to semantic navigation. One aspect is a technique for displaying semantically derived facets in the search engine interface. Each of the facets comprises faceted search results. Each of the faceted search results is displayed in association with user interface elements for including or excluding the faceted search result as additional search terms to subsequently refine the search query. Another aspect automatically infers new metadata from the content and from existing metadata and then automatically annotates the content with the new metadata to improve recall and navigation. Another aspect identifies semantic annotations by determining semantic connections between the semantic annotations and then dynamically generating a topic page based on the semantic connections.
Cognition management system and methods for managing research and development activity
Disclosed are systems, apparatuses, methods, and computer readable medium for managing research activity and development across scientific, technical, medical, and other knowledge domains. A method includes: identifying nodes from content from different data sources, wherein the content includes grant information, a technical publication, or a legal publication and each node corresponds to an entity associated with technical data; associating at least one content item from the different data sources to a corresponding node; normalizing vectors identifying features of each content item based on linguistic differences associated with the different data sources; generating embeddings associated with each content item based on normalized vectors associated with each content item; and identifying a first node based on content items associated with the first node.
CUSTOM EMBEDDING MODEL FOR SEMANTIC SEARCH
System, methods, apparatuses, and computer program products are disclosed for generating a field-specific embedding model and employing the generated field-specific embedding model for semantic search of a field-specific database. A plurality of field-specific documents stored in a field-specific database are analyzed to determine a first document and a second document that are relevant to each other. A training dataset including a first training pair is generated by pairing a first portion of the first document with a corresponding second portion of the second document based on determining that the first document and second document are relevant to each other. A field-specific embedding model is trained based on the training dataset. The plurality of field-specific documents are provided to the field-specific embedding model to generate index embedding vectors. The field-specific database storing the plurality of field-specific documents is indexed based on the generated index embedding vectors.
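The training-pair step can be sketched as follows. The abstract does not say how two documents are judged relevant to each other, so this sketch assumes a hypothetical `cites` field and treats a citation as the relevance signal; the `abstract` portion name and both function names are likewise illustrative. Model training and index construction are left out, since they depend on an embedding library the abstract does not name.

```python
def find_relevant_pairs(documents):
    """Return (citing, cited) document pairs, using citations as an
    assumed relevance signal."""
    by_id = {doc["id"]: doc for doc in documents}
    pairs = []
    for doc in documents:
        for cited_id in doc.get("cites", []):
            if cited_id in by_id:
                pairs.append((doc, by_id[cited_id]))
    return pairs


def build_training_pairs(documents, portion="abstract"):
    """Pair the same portion of each relevant document pair."""
    return [
        (first[portion], second[portion])
        for first, second in find_relevant_pairs(documents)
        if portion in first and portion in second
    ]
```

The resulting list of text pairs is the kind of dataset that could be handed to a contrastive fine-tuning routine for a field-specific embedding model.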
Priming Generative AI Model Leveraging Directed Acyclic Graph-Driven Notebook Environment
An application receives a natural language query from a user into a cell of a notebook environment and responsively determines a set of precedent cells and a profile of the user. The application determines a portion of the data warehouse graph that corresponds to the natural language query. The application primes the large language model with priming context that is based on the portion of the data warehouse graph that corresponds to the natural language query, the precedent cells from which the code cell depends, and the profile of the user, the priming resulting in a primed large language model. The application inputs the natural language query into the primed large language model and receives, as output from the large language model, a response to the natural language query. The application provides the response to the natural language query to the user.
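One way to picture the priming context is the sketch below. It assumes a hypothetical notebook representation in which each cell lists the cells it depends on, and a data warehouse graph reduced to a mapping from table names to columns; the table-name matching rule and all function names are illustrative guesses rather than the patented method. The call to the large language model itself is omitted.

```python
import re


def precedent_cells(cells, cell_id):
    """Return the cells that `cell_id` depends on, transitively, ordered so
    that each cell appears after its own dependencies (assumes a DAG)."""
    ordered = []

    def visit(current):
        for dep in cells[current]["depends_on"]:
            if dep not in ordered:
                visit(dep)
                ordered.append(dep)

    visit(cell_id)
    return ordered


def select_graph_portion(graph, query):
    """Keep only the tables whose names appear as words in the query
    (a deliberately naive matching rule)."""
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    return {table: cols for table, cols in graph.items() if table.lower() in words}


def build_priming_context(cells, cell_id, graph, profile, query):
    """Assemble the three context sources into one priming string."""
    tables = select_graph_portion(graph, query)
    lines = ["# Relevant tables"]
    lines += [f"{name}({', '.join(cols)})" for name, cols in tables.items()]
    lines.append("# Precedent cells")
    lines += [cells[cid]["code"] for cid in precedent_cells(cells, cell_id)]
    lines.append("# User profile")
    lines += [f"{key}: {value}" for key, value in profile.items()]
    return "\n".join(lines)
```

The returned string would be supplied to the model ahead of the natural language query, so that the answer can refer to upstream variables and to the right tables.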
Custom embedding model for semantic search
System, methods, apparatuses, and computer program products are disclosed for generating a field-specific embedding model and employing the generated field-specific embedding model for semantic search of a field-specific database. A plurality of field-specific documents stored in a field-specific database are analyzed to determine a first document and a second document that are relevant to each other. A training dataset including a first training pair is generated by pairing a first portion of the first document with a corresponding second portion of the second document based on determining that the first document and second document are relevant to each other. A field-specific embedding model is trained based on the training dataset. The plurality of field-specific documents are provided to the field-specific embedding model to generate index embedding vectors. The field-specific database storing the plurality of field-specific documents is indexed based on the generated index embedding vectors.
System and method for modification, personalization and customization of search results and search result ranking in an internet-based search engine
A computer server system and method are disclosed for personalization and customization of network search results and rankings, such as for Internet searching. A representative server system comprises: a network interface to receive a query from a user and transmit return queries and search results; a data storage device having a first, lexical database having one or more compilations and templates; and one or more processors configured to access the first database and search a selected compilation using the query to generate initial search results; to comparatively score each selected parsed phrase of the initial search results, for each classification of a selected template and a selected compilation, and to output initial and final search results arranged according to the classifications and the predetermined order of the template. A representative embodiment may also include use of a second, semantic database having multi-dimensional vectors corresponding to parsed phrases, paragraphs, or clauses.
Search engines and systems with handheld document data capture devices
Embodiments of the disclosed innovations provide systems and methods for locating data associated with rendered documents. Some embodiments support the use of a handheld document data capture device.
CHUNK SYNTHESIS FOR RETRIEVAL AUGMENTED GENERATION ASSISTANTS
A query answering system may access a collection of data sources to populate an index. A query answering system derives content from a collection of data sources to create synthetic chunks that are each representative of a portion of content from one or more of the data sources. A query answering system populates the index with the synthetic chunks. A query answering system identifies a subset of the synthetic chunks as relevant to a user query, generates a large language model (LLM) prompt that includes the subset of the synthetic chunks from the index and the user query, provides the LLM prompt to an LLM, and generates a response to the user query based on output of the LLM.
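The pipeline in this abstract can be sketched in a few functions. This is a minimal illustration that assumes passages arrive already tagged with a topic, merges passages sharing a topic into one synthetic chunk, and ranks chunks by simple word overlap; the data layout, the overlap scoring, and all function names are assumptions, and the LLM call is left out because the abstract names no particular model or API.

```python
import re
from collections import defaultdict


def synthesize_chunks(sources):
    """Merge passages that share a topic, across sources, into one
    synthetic chunk per topic."""
    grouped = defaultdict(list)
    for source in sources:
        for topic, passage in source["passages"]:
            grouped[topic].append((source["name"], passage))
    chunks = []
    for topic, items in grouped.items():
        merged = " ".join(passage for _, passage in items)
        chunks.append(
            {
                "topic": topic,
                "text": f"{topic}: {merged}",
                "sources": sorted({name for name, _ in items}),
            }
        )
    return chunks


def tokenize(text):
    """Lowercase word set used for the naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(index, query, k=2):
    """Return up to k chunks that share at least one word with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk["text"])), chunk) for chunk in index]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(chunks, query):
    """Combine the retrieved synthetic chunks and the user query."""
    context = "\n".join(f"- {chunk['text']}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In a fuller system the overlap score would typically be replaced by embedding similarity, and the prompt string would be sent to whichever LLM the deployment uses.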