G06F16/316

GENERATION AND USE OF DELTA INDEX
20170091311 · 2017-03-30

According to an embodiment of the present disclosure, it is determined whether a delta index is beneficial based on the difference between a first version and a second version of a document, wherein the first version is associated with a first index comprising a plurality of keywords appearing in the first version. If the delta index is beneficial, it is generated for the difference between the first and second versions, wherein the delta index comprises a first section including information about one or more keywords affected by the difference and the positions of the affected keywords.
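The flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say how "beneficial" is decided, so the `max_change_ratio` threshold, the token-level diff via `difflib`, and the `removed`/`added` layout of the delta index are all hypothetical choices, not the disclosed method.

```python
from difflib import SequenceMatcher


def build_index(tokens):
    """Full positional index: keyword -> list of token positions."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index


def build_delta_index(old_tokens, new_tokens, max_change_ratio=0.3):
    """Return a delta index, or None when a full re-index looks cheaper.

    max_change_ratio is a hypothetical benefit threshold (not from the source).
    """
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    removed, added = {}, {}
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed += max(i2 - i1, j2 - j1)
        # Keywords (and their old positions) that the difference takes away.
        for pos in range(i1, i2):
            removed.setdefault(old_tokens[pos], []).append(pos)
        # Keywords (and their new positions) that the difference introduces.
        for pos in range(j1, j2):
            added.setdefault(new_tokens[pos], []).append(pos)
    if changed > max_change_ratio * max(len(old_tokens), 1):
        return None  # too much changed; the delta is assumed not beneficial
    return {"removed": removed, "added": added}
```

The sketch records only the keywords inside changed spans; it does not adjust positions of unaffected keywords that shift after an insertion or deletion, which a real implementation would have to handle.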

Systems and methods to build and utilize a search infrastructure

Methods and systems to build and utilize a search infrastructure are described. The system generates index information components in real-time based on a database that is time-stamped. The system updates index information at a plurality of query node servers based on the index information components. A query engine receives a search query from a client machine and identifies search results based on the query and the index information. The system communicates the search results, over the network, to the client machine.

Stemming for searching

Embodiments are disclosed of a search stemming module at a computer, configured to receive a query, identify stems from the query, and search a secondary index for variants corresponding to the stems. The secondary index may comprise one or more lists of stems associated with variants from a primary index. Also disclosed are a search reconfiguration module configured to reformat the query to include variants found in the secondary index, and a search engine configured to search the primary index using the variants received from the search reconfiguration module.
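A minimal Python sketch of the stem-to-variant lookup follows. The suffix-stripping `naive_stem` function is a stand-in assumed for illustration (the abstract names no stemming algorithm), and the dictionary-based primary and secondary indexes are likewise hypothetical structures.

```python
def naive_stem(word):
    """Toy suffix stripper; a placeholder, not a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_secondary_index(primary_index):
    """Secondary index: stem -> set of variants present in the primary index."""
    secondary = {}
    for term in primary_index:
        secondary.setdefault(naive_stem(term), set()).add(term)
    return secondary


def search_with_variants(query, primary_index, secondary_index):
    """Expand each query word to its known variants, then search the primary index."""
    results = set()
    for word in query.lower().split():
        variants = secondary_index.get(naive_stem(word), {word})
        for variant in variants:
            results |= primary_index.get(variant, set())
    return results
```

Here a query for "indexes" would be reduced to the stem "index", expanded to whichever variants the primary index actually contains, and the document sets for those variants combined.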

Training and applying structured data extraction models

A computer system for extracting structured data from unstructured or semi-structured text in an electronic document, the system comprising: a graphical user interface configured to present to a user a graphical view of a document for use in training multiple data extraction models for the document, each data extraction model associated with a user defined question; a user input component configured to enable the user to highlight portions of the document; the system configured to present in association with each highlighted portion an interactive user entry object which presents a menu of question types to a user in a manner to enable the user to select one of the question types, and a field for receiving from the user a question identifier in the form of human readable text, wherein the question identifier and question type selected by the user are used for selecting a data extraction model, and wherein the highlighted portion of the document associated with the question identifier is used to train the selected data extraction model.

Traversing a SPARQL query and translation to a semantic equivalent SQL

Aspects of an embodiment of the present invention include an approach for semantically translating data, wherein a processor selects a first node. A processor identifies a parent node of the first node. A processor determines that a value of the first node is unknown. A processor, responsive to determining that the value of the first node is unknown, annotates the first node to indicate that the first node is at least partially unknown. A processor identifies a common table expression of the first node. A processor determines that the common table expression of the first node matches, within a predetermined threshold, a common table expression of a second node. A processor merges information from the common table expression of the second node with the common table expression of the first node.

RECYCLABLE PRIVATE MEMORY HEAPS FOR DYNAMIC SEARCH INDEXES
20170061012 · 2017-03-02

In one embodiment, a search engine may generate and store a plurality of search index segments such that each of the search index segments is stored in a corresponding one of a plurality of heaps of memory. The plurality of search index segments may include inverted index segments mapping content to documents containing the content. A garbage collection module may release one or more heaps of the memory.

PROVIDING SECURE INDEXES FOR SEARCHING ENCRYPTED DATA

Providing an encrypted search index for performing searches on encrypted documents, the method comprising: (i) providing a set of documents, the documents comprising a plurality of unencrypted phrases; (ii) providing a master key; (iii) providing, based on the master key, for each phrase a set of encryption keys comprising one or more encryption keys; (iv) selecting, for each phrase, one encryption key of the set of encryption keys; (v) encrypting each phrase with the selected encryption key; and (vi) building an index based on the encrypted phrases, the index comprising information regarding which encrypted phrase is comprised within a certain document.
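Steps (i) through (vi) can be sketched in Python as below. Several points are assumptions made for illustration only: HMAC-SHA256 is used as a keyed, deterministic stand-in for the phrase encryption, the per-phrase key set is derived by HMAC over a counter, the key is selected at random per document, and the `search` helper (which tries every key in the set) is not part of the abstract.

```python
import hashlib
import hmac
import secrets


def derive_keys(master_key, phrase, n_keys=3):
    """Step (iii): derive a set of per-phrase keys from the master key (assumed scheme)."""
    return [
        hmac.new(master_key, f"{i}:{phrase}".encode(), hashlib.sha256).digest()
        for i in range(n_keys)
    ]


def token(key, phrase):
    """Step (v): keyed token standing in for the encrypted phrase."""
    return hmac.new(key, phrase.encode(), hashlib.sha256).hexdigest()


def build_secure_index(docs, master_key, n_keys=3):
    """Step (vi): index mapping encrypted phrase -> ids of documents containing it."""
    index = {}
    for doc_id, phrases in docs.items():
        for phrase in set(phrases):
            keys = derive_keys(master_key, phrase, n_keys)
            key = keys[secrets.randbelow(len(keys))]  # step (iv): select one key
            index.setdefault(token(key, phrase), set()).add(doc_id)
    return index


def search(index, master_key, phrase, n_keys=3):
    """Hypothetical lookup: try every key the phrase could have been indexed under."""
    hits = set()
    for key in derive_keys(master_key, phrase, n_keys):
        hits |= index.get(token(key, phrase), set())
    return hits
```

Because one phrase may appear under several different tokens, the index reveals less about phrase frequency than a single-key scheme would, at the cost of `n_keys` lookups per search.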

SYSTEM AND METHOD OF CONTEXT-BASED PREDICTIVE CONTENT TAGGING FOR SEGMENTED PORTIONS OF ENCRYPTED MULTIMODAL DATA

This disclosure relates to systems, methods, and computer readable media for performing multi-format, multi-protocol message threading in a way that is most beneficial for the individual user. Users desire a system that will provide for ease of message threading by stitching together related communications in a manner that is seamless from the user's perspective. Such stitching together of communications across multiple formats and protocols may occur, e.g., by: 1) direct user action in a centralized communications application (e.g., by a user clicking Reply on a particular message); 2) using semantic matching (or other search-style message association techniques); 3) element-matching (e.g., matching on subject lines or senders/recipients/similar quoted text, etc.); and 4) state-matching (e.g., associating messages if they are specifically tagged as being related to another message, sender, etc. by a third-party service, e.g., a webmail provider or Instant Messaging (IM) service).

Traffic-aware route encoding using a probabilistic encoding data
12265515 · 2025-04-01

A network apparatus determines a route from an origin traversable map element (TME) to a target TME. The route comprises a list of route TMEs to be traveled from the starting location to the target location. The network apparatus identifies adjacent TMEs to the route, wherein an adjacent TME is a TME of the digital map that intersects the route and is not a route TME; determines an expected traffic delay for each adjacent TME based on traffic data; separates the adjacent TMEs into a plurality of delay groups based on the corresponding expected traffic delays; generates delay encoding data structures; and provides the delay encoding data structures and information identifying the route. Each delay encoding data structure encodes a map version agnostic identifier for the adjacent TMEs of one of the delay groups and is a probabilistic data structure configured to not provide false negatives.
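One probabilistic structure with no false negatives is a Bloom filter, so the grouping and encoding steps might look like the Python sketch below. The abstract does not name the structure, so the Bloom filter itself, its size and hash count, the delay `boundaries` (in seconds), and the string TME identifiers are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, n_bits=1024, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def encode_delay_groups(adjacent_delays, boundaries=(60, 300)):
    """Build one filter per delay group.

    adjacent_delays maps a map-version-agnostic TME id to its expected delay.
    boundaries are hypothetical group thresholds, not taken from the source.
    """
    filters = [BloomFilter() for _ in range(len(boundaries) + 1)]
    for tme_id, delay in adjacent_delays.items():
        group = sum(delay >= b for b in boundaries)  # index of the delay group
        filters[group].add(tme_id)
    return filters
```

A receiver holding these filters can test each of its own adjacent TME ids against every group; a TME that was encoded is always found in its group, while an occasional false positive only makes the receiver treat an extra TME as delayed.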