G06F16/316

Providing incremental updates of categorical information using a probabilistic encoding data structure
11720538 · 2023-08-08

Information corresponding to one or more traversable map elements (TMEs) within a zone of interest is accessed from a geographic database. A respective category of a plurality of categories is determined for each of the one or more TMEs based at least in part on the information corresponding to the TME. A first category encoding data structure is generated based at least in part on map version agnostic identifiers corresponding to TMEs determined to be in a first category of the plurality of categories, wherein the first category encoding data structure is a probabilistic data structure configured to not provide false negatives for TMEs within the zone of interest. The first category encoding data structure is provided such that a mobile apparatus receives the first category encoding data structure.
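
As a rough illustration of a probabilistic structure with no false negatives, the sketch below uses a Bloom filter. The abstract does not name a specific structure, so the Bloom filter, the SHA-256 based hashing, the filter size, and the identifiers and category names are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def encode_category(tme_categories: dict, category: str) -> BloomFilter:
    """Build one filter from the identifiers of all TMEs in `category`."""
    encoded = BloomFilter()
    for tme_id, tme_category in tme_categories.items():
        if tme_category == category:
            encoded.add(tme_id)
    return encoded


# Hypothetical map version agnostic identifiers and categories.
tmes = {"tme-a": "toll", "tme-b": "free", "tme-c": "toll"}
toll_filter = encode_category(tmes, "toll")
print("tme-a" in toll_filter, "tme-c" in toll_filter)
```

Every identifier that was added is always reported as present. An identifier that was never added is usually reported as absent, but can occasionally be reported as present, which is the false-positive trade-off of this kind of structure.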

SYSTEM AND METHOD FOR RELATION EXTRACTION WITH ADAPTIVE THRESHOLDING AND LOCALIZED CONTEXT POOLING
20220121822 · 2022-04-21

System and method for relation extraction using adaptive thresholding and localized context pooling (ATLOP). The system includes a computing device having a processor and a storage device storing computer executable code. The computer executable code is configured to provide a document; embed entities in the document into embedding vectors; and predict relations between a pair of entities in the document using their embedding vectors. The relation prediction is performed based on an improved language model. Each relation has an adaptive threshold, and the relation between the pair of entities is determined to exist when a logit of the relation between the pair of entities is greater than a logit function of the corresponding adaptive threshold.
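
The decision rule in the last sentence can be sketched as follows. This is a simplified illustration only: it assumes the relation logits and the threshold logits for one entity pair have already been computed by some model, and the relation names and numeric values are hypothetical.

```python
def predict_relations(relation_logits: dict, threshold_logits: dict) -> list:
    """Return the relations whose logit exceeds their own threshold logit."""
    return sorted(
        relation
        for relation, logit in relation_logits.items()
        if logit > threshold_logits[relation]
    )


# Hypothetical logits for a single entity pair.
relation_logits = {"founded_by": 2.1, "born_in": -0.3, "employer": 0.9}
threshold_logits = {"founded_by": 0.5, "born_in": 0.5, "employer": 1.2}

print(predict_relations(relation_logits, threshold_logits))  # ['founded_by']
```

Because the threshold is a learned quantity rather than a fixed constant, the same raw logit can count as a positive prediction for one relation or entity pair and a negative one for another.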

Training and applying structured data extraction models

A computer system for extracting structured data from unstructured or semi-structured text in an electronic document, the system comprising: a graphical user interface configured to present to a user a graphical view of a document for use in training multiple data extraction models for the document, each data extraction model associated with a user defined question; a user input component configured to enable the user to highlight portions of the document; the system configured to present in association with each highlighted portion an interactive user entry object which presents a menu of question types to a user in a manner to enable the user to select one of the question types, and a field for receiving from the user a question identifier in the form of human readable text, wherein the question identifier and question type selected by the user are used for selecting a data extraction model, and wherein the highlighted portion of the document associated with the question identifier is used to train the selected data extraction model.

Methods and arrangements to adjust communications

Logic may adjust communications between customers. Logic may cluster customers into a first group associated with a first subset of synonyms and a second group associated with a second subset of the synonyms. Logic may associate a first tag with the first group and with each of the synonyms of the first subset. Logic may associate a second tag with the second group and with each of the synonyms of the second subset. Logic may associate one or more models with pairs of the groups. A first pair may comprise the first group and the second group. The first model associated with the first pair may adjust words in communications between the first group and the second group, based on the synonyms associated with the first pair, by replacement of words in a communication between customers of the first subset and customers of the second subset.

Framework for analyzing table data by question answering systems

A question answering (QA) system comprising memory for storing instructions, and a processor configured to execute the instructions to ingest source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; create table annotations to represent the table data; store the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determine answers to questions using the knowledge base.

Automated Process Collaboration Platform in Domains
20220027399 · 2022-01-27

A computing server may receive master data, transaction data, and a process model of a domain. The computing server may aggregate, based on domain knowledge ontology of the domain, the master data and the transaction data to generate a fact table. For example, entries in the fact table may be identified as relevant to the target process model and include attributes and facts that are extracted from master data or transaction data. The computing server may convert the entries in the fact table into vectors. The computing server may identify, based on the vectors, an attribute in the process model as having a statistically significant impact on the process model. For example, a regression model may be used to determine the statistical significance of an attribute on the process model. The computing server may generate an action associated with the attribute to improve the process model.

Wildcard searches using numeric string hash

Techniques herein improve computational efficiency for wildcard searches by using numeric string hashes. In an embodiment, a plurality of query K-gram tokens for a term in a query is generated. Using a first index, an intersection of hash tokens is determined, wherein said first index indexes each query K-gram token of said plurality of query K-gram tokens to a respective subset of hash tokens of a plurality of hash tokens, each hash token of said plurality of hash tokens corresponding to a term found in one or more documents of a corpus of documents. The intersection of hash tokens comprises only hash tokens indexed to all of said plurality of query K-gram tokens by said first index. Using a second index, documents of said corpus of documents that contain said term are determined, said second index indexing said hash tokens to a plurality of terms in said corpus of documents and, for each term of said plurality of terms, a respective subset of documents of said corpus of documents that contain said each term.
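
The two-index lookup described above can be sketched roughly as follows. Several details are assumptions rather than anything stated in the abstract: K is fixed at 3, CRC32 stands in for the numeric string hash, terms are padded with a `$` boundary marker, only the `*` wildcard is handled, and a final `fnmatch` check removes candidates that share the K-grams but do not match the pattern.

```python
import zlib
from fnmatch import fnmatchcase

K = 3  # assumed K-gram length


def kgrams(text: str, k: int = K) -> set:
    """All substrings of length k (empty set if text is shorter than k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def term_hash(term: str) -> int:
    # CRC32 is a stand-in for whatever numeric string hash is actually used.
    return zlib.crc32(term.encode("utf-8"))


def build_indexes(corpus: dict):
    first = {}   # K-gram -> set of hash tokens
    second = {}  # hash token -> {term: set of document ids}
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            token = term_hash(term)
            second.setdefault(token, {}).setdefault(term, set()).add(doc_id)
            for gram in kgrams(f"${term}$"):
                first.setdefault(gram, set()).add(token)
    return first, second


def wildcard_search(pattern: str, first: dict, second: dict) -> set:
    pattern = pattern.lower()
    # Query K-grams come from the literal fragments between '*' wildcards.
    grams = set()
    for fragment in f"${pattern}$".split("*"):
        grams |= kgrams(fragment)
    if not grams:
        return set()  # pattern too short to use the K-gram index
    # Keep only hash tokens indexed to every query K-gram.
    candidates = set.intersection(*(first.get(g, set()) for g in grams))
    docs = set()
    for token in candidates:
        for term, doc_ids in second.get(token, {}).items():
            if fnmatchcase(term, pattern):  # post-filter false candidates
                docs |= doc_ids
    return docs


corpus = {1: "database systems", 2: "data mining", 3: "update logs"}
first_index, second_index = build_indexes(corpus)
print(wildcard_search("dat*", first_index, second_index))  # {1, 2}
```

Intersecting sets of integer hash tokens rather than sets of strings is the part of this sketch that corresponds to the efficiency claim. The second index maps each token back to terms, so hash collisions between distinct terms are still resolved correctly.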

Rewriting corpus content in a search index and processing search queries using the rewritten search index

A method, a computing system, and a computer program product are provided for processing search queries. A computing device executing a content management system receives a content rewriting rule. A content item including the content rewriting rule is stored. The stored content rewriting rule is associated with a first search index, which includes indexed content of a corpus having unstructured textual content. The content of the corpus is rewritten into a second search index of an index overlay structure by applying the content rewriting rule to the content of the corpus. The second search index is used for searching the content of the corpus for content satisfying a received search query.
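
A minimal sketch of the overlay idea, under stated assumptions: the rewriting rule is modeled as a regular-expression substitution, both indexes are simple in-memory inverted indexes, and the function names, documents, and rule are hypothetical. The abstract does not specify any of these.

```python
import re
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Inverted index: lowercase token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def build_overlay(docs: dict, rule: tuple) -> dict:
    """Apply a (pattern, replacement) rule to the corpus and index the result."""
    pattern, replacement = rule
    rewritten = {
        doc_id: re.sub(pattern, replacement, text, flags=re.IGNORECASE)
        for doc_id, text in docs.items()
    }
    return build_index(rewritten)


def search(index: dict, query: str) -> set:
    """Documents containing every token of the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {1: "Contact ACME Corp for details", 2: "Globex annual report"}
first_index = build_index(docs)  # original content
second_index = build_overlay(docs, (r"\bACME Corp\b", "Initech"))  # rewritten

print(search(first_index, "acme"))      # {1}
print(search(second_index, "acme"))     # set()
print(search(second_index, "initech"))  # {1}
```

The original corpus and the first index are left untouched. Only the overlay reflects the rule, so removing or changing the rule means rebuilding the second index rather than altering stored content.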

DECODING A ROUTE ENCODED BY A PROBABILISTIC ENCODING DATA STRUCTURE
20210364318 · 2021-11-25

A mobile apparatus receives a route response including information identifying a starting location and a target location of a route and an encoding data structure encoding the route. The encoding data structure is a probabilistic data structure configured to not provide false negatives. The mobile apparatus uses the information identifying the starting and target locations to identify a decoded origin traversable map element (TME) and a decoded target TME of the mobile version of the digital map for the route; accesses map information for determining a cost value for TMEs of the digital map, wherein a TME that satisfies the encoding data structure is assigned a minimal cost value; determines a decoded route from the decoded origin TME to the decoded target TME based on the cost value assigned to the TMEs using a cost minimization route determination algorithm; and performs at least one navigation function using the decoded route.
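
The decoding step can be sketched with Dijkstra's algorithm as the cost minimization algorithm. This is an illustrative simplification, not the patented method: TMEs are modeled as directed edges between junction nodes, the encoding data structure is represented by any object supporting the `in` operator (a plain set stands in for the probabilistic structure here), and the graph, identifiers, and `MINIMAL_COST` value are hypothetical.

```python
import heapq

MINIMAL_COST = 1e-6  # assumed near-zero cost for TMEs that satisfy the encoding


def decode_route(graph: dict, encoded, origin: str, target: str):
    """graph maps node -> list of (tme_id, next_node, base_cost).

    Returns the list of TME ids on the cheapest path, or None if unreachable.
    """
    best = {origin: 0.0}
    prev = {}  # node -> (previous node, TME used to reach node)
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for tme_id, nxt, base_cost in graph.get(node, []):
            # TMEs that satisfy the encoding structure get the minimal cost.
            step = MINIMAL_COST if tme_id in encoded else base_cost
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = (node, tme_id)
                heapq.heappush(heap, (new_cost, nxt))
    if target != origin and target not in prev:
        return None
    route = []
    node = target
    while node != origin:
        node, tme_id = prev[node]
        route.append(tme_id)
    route.reverse()
    return route


graph = {
    "A": [("e1", "B", 1.0), ("e2", "C", 1.0)],
    "B": [("e3", "D", 5.0)],
    "C": [("e4", "D", 1.0)],
}
encoded_route = {"e1", "e3"}  # stand-in for the received encoding structure

print(decode_route(graph, encoded_route, "A", "D"))  # ['e1', 'e3']
print(decode_route(graph, set(), "A", "D"))          # ['e2', 'e4']
```

Because the encoding structure never gives a false negative, every TME of the intended route receives the minimal cost, so the cheapest path follows that route even where the base costs alone would favor another one. An occasional false positive only makes an off-route TME cheap, which rarely produces a connected alternative.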

USABILITY IN INFORMATION RETRIEVAL SYSTEMS

In order to facilitate a search and identification of documents, an information retrieval system is provided for performing a search on a corpus of data objects. The information retrieval system comprises a device and a database. The database is configured to store at least one syntactic search index data structure and at least one semantic search index data structure. The at least one syntactic search index data structure is configured to index and store in the database a plurality of terms from the corpus of data objects along with syntactic annotations indicating syntactic information. The at least one semantic search index data structure is configured to index and store in the database the plurality of terms from the corpus of data objects along with semantic annotations indicating semantic information. The device comprises an input unit, a processing unit, and an output unit. The input unit is configured to receive a syntactic query and a semantic query. The processing unit is configured to match the syntactic query against the at least one syntactic search index data structure to obtain a first set of data objects, each of which has a set of terms that are syntactically related to the syntactic query. The processing unit is configured to match the semantic query against the at least one semantic search index data structure to obtain a second set of data objects, each of which has a set of terms that are semantically related to the semantic query, wherein the second set of data objects is a subset of the first set of data objects. The output unit is configured to output information of the second set of data objects.