Patent classifications
G06F16/316
Systems and methods for generating and using aggregated search indices and non-aggregated value storage
Systems, methods and computer program products for using searchable aggregate indices associated with non-aggregated value storage. In one method, a search system stores metadata values for each of a plurality of objects in a storage unit. The metadata values are stored in corresponding value storage locations that are associated with identifiable metadata fields. An aggregate index is provided which includes a dictionary of terms that are contained in metadata values associated with a designated set of the metadata fields. The aggregate index is searched for one or more specific search terms, and one or more of the metadata values are retrieved from the value storage locations in response to the search, where the individual metadata fields associated with the retrieved metadata values are identified.
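The arrangement described in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and an in-memory dictionary for value storage; the class name `MetadataStore`, its methods, and the sample field names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class MetadataStore:
    """Toy model: one aggregate term index plus per-field value storage."""

    def __init__(self, aggregated_fields):
        # Designated set of fields whose terms go into the aggregate index.
        self.aggregated_fields = set(aggregated_fields)
        # Non-aggregated value storage: (object_id, field) -> value.
        self.values = {}
        # Aggregate index: term -> ids of objects containing the term
        # in any of the designated fields.
        self.aggregate_index = defaultdict(set)

    def add(self, object_id, metadata):
        for field, value in metadata.items():
            self.values[(object_id, field)] = value
            if field in self.aggregated_fields:
                for term in value.lower().split():
                    self.aggregate_index[term].add(object_id)

    def search(self, term):
        """Search the aggregate index, then read per-field values so that
        the field behind each hit can be identified."""
        term = term.lower()
        hits = []
        for object_id in sorted(self.aggregate_index.get(term, ())):
            for field in sorted(self.aggregated_fields):
                value = self.values.get((object_id, field))
                if value is not None and term in value.lower().split():
                    hits.append((object_id, field, value))
        return hits


store = MetadataStore(["title", "author"])
store.add("doc1", {"title": "Quarterly report", "author": "Ada Report",
                   "path": "report archive"})
store.add("doc2", {"title": "Budget", "author": "Ada"})
report_hits = store.search("report")
```

In this sketch a single index lookup finds the matching objects, and the separate value storage then shows which field each match came from. The `path` field is stored but never indexed because it is outside the designated set.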
SYSTEMS AND METHODS FOR GENERATING A GUARDRAIL DATA STRUCTURE
A system for managing a guardrail data structure is provided. The system includes one or more processors; and one or more transitory or non-transitory computer-readable media storing instructions that are executable to cause the one or more processors to perform operations, the operations comprising: receiving, from a first user of a plurality of users of a guardrail data structure, interaction data associated with chat data; processing, by a large language model (LLM), the interaction data to determine an update to at least one cluster membership of at least one content cluster of a plurality of content clusters; transmitting the update of the at least one cluster membership to a remote computing device; and instructing the remote computing device to update the guardrail data structure based on the update to the at least one cluster membership.
Methods and systems for a compliance framework database schema
Generating a compliance framework. The compliance framework facilitates an organization's compliance with multiple authority documents by providing efficient methodologies and refinements to existing technologies, such as providing hierarchical fidelity to the original authority document; separating auditable citations from their context (e.g., prepositions and/or informational citations); asset focused citations; SNED and Live values, among others.
Text data structuring method and apparatus using line information
A text data structuring apparatus according to the present invention includes: a data extraction unit which extracts text included in an image and position information of the text on the basis of OCR; a data processing unit which extracts line information included in the image by using the text, the position information, and the image; a labeling unit which labels the text as keys or values; and a relationship identification unit which acquires a mapping candidate group including first text, second text, and third text labeled on the basis of the line information, calculates a first similarity score representing meaning similarity between the first text and the third text and a second similarity score representing meaning similarity between the second text and the third text, and decides which text, from among the first text and the second text, is to be mapped to the third text.
TRAINING DATA GENERATING DEVICE AND TRAINING DATA GENERATING METHOD
A training data generating device and a training data generating method are provided. The device stores first single language code data, the first single language code data corresponding to a first language. The device generates second single language code data corresponding to each of the first single language code data based on a second language and a whole sentence translation algorithm, the second single language code data corresponding to the second language. The device aligns text segments corresponding to the first single language code data and the second single language code data. The device generates code-mixing data based on at least one valid segment position corresponding to the text segments of each of the first single language code data.
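The final step of this abstract, producing code-mixed data from aligned segments, might look roughly like the sketch below. It assumes that translation and alignment have already been done and that segments align one to one; the function name `code_mix` and the sample sentences are hypothetical, not taken from the patent.

```python
def code_mix(first_segments, second_segments, valid_positions):
    """Return one code-mixed sentence per valid position, each made by
    swapping a single first-language segment for its aligned translation.

    Assumption: segments are aligned one to one, so index i in both lists
    refers to the same piece of meaning.
    """
    if len(first_segments) != len(second_segments):
        raise ValueError("segment lists must be aligned one to one")
    mixed = []
    for pos in valid_positions:
        sentence = list(first_segments)       # copy the first-language sentence
        sentence[pos] = second_segments[pos]  # swap in the aligned segment
        mixed.append(" ".join(sentence))
    return mixed


english = ["I", "want to buy", "a ticket"]
spanish = ["Yo", "quiero comprar", "un boleto"]
mixed_sentences = code_mix(english, spanish, valid_positions=[1, 2])
```

Each valid position yields its own training sentence here. A real system could also swap several segments at once, which this sketch does not attempt.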
KNOWLEDGE GRAPH CONSTRUCTION USING GENERATIVE ARTIFICIAL INTELLIGENCE FOR INTENT CLASSIFICATION
This application is directed to constructing a knowledge graph using generative artificial intelligence. A system can include one or more processors coupled with memory to identify a plurality of items of unstructured data. The system can provide, for one or more generative artificial intelligence models, a first prompt to cause the models to output a plurality of first level categories of a hierarchical data structure for the items. The system can receive the first level categories, each corresponding to a subset of the items grouped by semantic similarity, and evaluate each category according to taxonomy criteria. The system can provide a second prompt to generate second level categories for each first level category, receive the second level categories, and construct a knowledge graph data structure linking the categories and their respective subsets to relate each item of unstructured data with corresponding categories according to the hierarchical data structure.
COMPUTING SYSTEMS AND METHODS FOR GENERATING A RESPONSE TO A QUERY BASED ON A CORPUS OF DOCUMENTS
Systems and methods for generating a response to a query. The method includes using a first large language model (LLM) to generate synthetic information related to a query; generating an amended query based on the synthetic information related to the query; using an information retrieval system to retrieve, from a plurality of chunks, a set of chunks that are relevant to the amended query, wherein each chunk of the plurality of chunks is all or a portion of a document in a corpus of documents; using a second LLM to rank the set of chunks based on a relevance to the query; selecting a subset of chunks from the set of chunks based on the ranking; and using a third LLM to generate a response to the query based on the subset of chunks.
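The data flow in this abstract can be sketched as a function that takes each stage as a parameter. This is only an illustration: the three LLM calls and the retrieval system are replaced by trivial keyword-based stand-ins, and every name here (`answer_query`, `generate_hint`, `retrieve`, `rank`, `respond`) is hypothetical.

```python
import re


def words(text):
    """Lowercased word set, used by the toy stand-ins below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_query(query, chunks, generate_hint, retrieve, rank, respond, k=1):
    synthetic = generate_hint(query)          # first LLM: synthetic information
    amended = f"{query} {synthetic}"          # amended query
    candidates = retrieve(amended, chunks)    # retrieval uses the amended query
    ranked = rank(query, candidates)          # second LLM: rank against the query
    return respond(query, ranked[:k])         # third LLM: answer from the subset


# Toy stand-ins (assumptions, not real models or a real retrieval system).
def toy_hint(query):
    return "warranty period duration years"


def toy_retrieve(amended, chunks):
    return [c for c in chunks if words(c) & words(amended)]


def toy_rank(query, candidates):
    return sorted(candidates, key=lambda c: len(words(c) & words(query)),
                  reverse=True)


def toy_respond(query, subset):
    return "Based on: " + " ".join(subset)


corpus = ["The warranty lasts two years.",
          "Returns are accepted within 30 days.",
          "Our office is in Berlin."]
answer = answer_query("How long is the warranty?", corpus,
                      toy_hint, toy_retrieve, toy_rank, toy_respond)
```

Retrieval sees the amended query, while ranking and response generation are scored against the original query, as the abstract describes.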
Knowledge re-ranking techniques
Techniques are disclosed herein for selecting document chunks that are most relevant to a query. The techniques include receiving a query and comparing a plurality of stored text passages to the query using a first similarity metric. Based on the comparison, a subset of the plurality of stored text passages that are most similar to the query are selected. A plurality of sentences from the subset of the plurality of stored text passages are identified. The identified sentences are ranked based on the query and a second similarity metric. A subset of the sentences are selected based on the ranking. The subset of the sentences or a derivative thereof are output in response to the query.
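A minimal sketch of this two-stage selection follows. The abstract does not name the two similarity metrics, so the sketch assumes Jaccard token overlap for the passage stage and cosine similarity over term counts for the sentence stage; the function names and the regex-based sentence splitter are likewise illustrative choices.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """First metric (assumed): overlap of token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Second metric (assumed): cosine similarity of term-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def rerank(query, passages, top_passages=2, top_sentences=2):
    # Stage 1: keep the passages most similar to the query.
    best = sorted(passages, key=lambda p: jaccard(query, p),
                  reverse=True)[:top_passages]
    # Stage 2: split the kept passages into sentences and rank those.
    sentences = [s.strip() for p in best
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    ranked = sorted(sentences, key=lambda s: cosine(query, s), reverse=True)
    return ranked[:top_sentences]


passages = [
    "Solar panels convert sunlight into electricity. They need little maintenance.",
    "Wind turbines convert wind into electricity. Turbines are noisy.",
    "Cats sleep most of the day. Dogs like walks.",
]
selected = rerank("how do solar panels convert sunlight", passages)
```

The coarse first pass limits how many sentences the second metric has to score, which is where a two-stage design saves work when the second metric is expensive.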
DATA STRUCTURES FOR STORING AND MANIPULATING LONGITUDINAL DATA AND CORRESPONDING NOVEL COMPUTER ENGINES AND METHODS OF USE THEREOF
In some embodiments, the present disclosure provides for an exemplary computer-implemented system that may include a longitudinal data engine, including: a processor and specialized index generation software to generate: an index data structure for a respective event type associated with each respective subject or object; where each respective index data structure is a respective event type-specific data schema, defining how to store events of a particular event type to form longitudinal data of each respective subject or object; an ontology data structure that is configured to describe one or more properties of a respective event of a respective subject or object; and longitudinal data extraction software to extract a respective longitudinal data for a plurality of index data structures and a plurality of ontology data structures associated with a plurality of subjects or objects.