Patent classifications
G06F16/316
RECORDING MEDIUM RECORDING INDEXED DATA GENERATION PROGRAM, INDEXED DATA GENERATION METHOD AND RETRIEVAL METHOD
A non-transitory computer-readable recording medium recording an indexed data generation program causing a computer to execute processing of generating ledger sheet output format data from ledger sheet data including a ledger sheet having a plurality of columns; generating index information for words, characters, or numerical values, the index information including positional information capable of specifying attributes of the plurality of columns and a positional relationship in the ledger sheet data between pieces of data corresponding to the plurality of columns included in the ledger sheet output format data; and outputting an output file including the index information and the ledger sheet output format data.
WILDCARD SEARCHES USING NUMERIC STRING HASH
Techniques herein improve computational efficiency for wildcard searches by using numeric string hashes. In an embodiment, a plurality of query K-gram tokens for a term in a query are generated. Using a first index, an intersection of hash tokens is determined, wherein said first index indexes each query K-gram token of said K-gram tokens to a respective subset of hash tokens of a plurality of hash tokens, each hash token of said plurality of hash tokens corresponding to a term found in one or more documents of a corpus of documents. The intersection of hash tokens comprises only hash tokens indexed to all of said plurality of query K-gram tokens by said first index. Using a second index, documents of said corpus of documents that contain said term are determined, said second index indexing said hash tokens to a plurality of terms in said corpus of documents and, for each term of said plurality of terms, a respective subset of documents of said corpus of documents that contain said each term.
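The two-index lookup described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the K-gram length of 3, the `$` boundary marker, the use of CRC32 as the numeric hash, and all function names are hypothetical choices made for the example.

```python
import fnmatch
import zlib
from collections import defaultdict

K = 3  # K-gram length; an assumption, the abstract does not fix K


def kgrams(s, k=K):
    """Return the set of k-length substrings of s (empty if s is shorter than k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def hash_token(term):
    """Numeric hash standing in for a term (CRC32 is an illustrative choice)."""
    return zlib.crc32(term.encode("utf-8"))


def build_indexes(docs):
    """docs maps doc id -> text. Returns (first_index, second_index)."""
    first = defaultdict(set)                        # K-gram -> hash tokens
    second = defaultdict(lambda: defaultdict(set))  # hash token -> term -> doc ids
    for doc_id, text in docs.items():
        for term in text.lower().split():
            h = hash_token(term)
            second[h][term].add(doc_id)
            # '$' marks term boundaries so prefixes and suffixes get their own K-grams
            for gram in kgrams("$" + term + "$"):
                first[gram].add(h)
    return first, second


def wildcard_search(pattern, first, second):
    """Return sorted doc ids containing a term that matches pattern ('*' wildcard)."""
    pattern = pattern.lower()
    grams = set()
    for piece in ("$" + pattern + "$").split("*"):
        grams |= kgrams(piece)
    if grams:
        # Keep only hash tokens indexed to ALL query K-grams
        candidates = set.intersection(*(first.get(g, set()) for g in grams))
    else:
        # Pattern too short to yield any K-gram: fall back to every hash token
        candidates = set(second)
    hits = set()
    for h in candidates:
        for term, doc_ids in second[h].items():
            # Post-filter: K-gram overlap alone can admit false positives
            if fnmatch.fnmatchcase(term, pattern):
                hits |= doc_ids
    return sorted(hits)
```

Because the second index maps each hash token to the terms that produced it, two terms that happen to share a hash value are still distinguished in the final filtering step.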
Corpus search systems and methods
A corpus of texts relating to a domain of knowledge may be searched by determining word-pair proximity scores measuring associations between pairs of words that appear in the corpus and that are semantically related to the domain of knowledge. When a search term is received, the word-pair proximity scores may be used (at least in part) with dictionary overlays, user feedback, and other feature vectors as weighting mechanisms to identify one or more related words that are strongly associated with the search term within the corpus. One or more texts in which the search term and the related words appear near each other in one or more places may be selected from the corpus. The selected texts may be categorized and/or clustered based on the related words before being returned for presentation as search results.
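One way to picture word-pair proximity scoring is the sketch below. It is an illustration only: the inverse-distance weighting, the fixed window of three words, and the function names are assumptions, and the dictionary overlays, user feedback, and domain filtering mentioned in the abstract are left out.

```python
from collections import defaultdict


def proximity_scores(texts, window=3):
    """Score each unordered word pair by summed 1/distance over co-occurrences."""
    scores = defaultdict(float)
    for text in texts:
        words = text.lower().split()
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                if words[j] != w:
                    pair = tuple(sorted((w, words[j])))
                    scores[pair] += 1.0 / (j - i)  # closer words count more
    return scores


def related_words(term, scores, top_n=2):
    """Return the top_n words most strongly associated with term."""
    candidates = {}
    for (a, b), score in scores.items():
        if a == term:
            candidates[b] = score
        elif b == term:
            candidates[a] = score
    # Highest score first; ties broken alphabetically for determinism
    return sorted(candidates, key=lambda w: (-candidates[w], w))[:top_n]


def select_texts(term, related, texts, window=3):
    """Return texts where term appears within `window` words of a related word."""
    hits = []
    for text in texts:
        words = text.lower().split()
        term_pos = [i for i, w in enumerate(words) if w == term]
        rel_pos = [i for i, w in enumerate(words) if w in related]
        if any(abs(i - j) <= window for i in term_pos for j in rel_pos):
            hits.append(text)
    return hits
```

A caller would compute the scores once for the corpus, then call `related_words` and `select_texts` for each incoming search term.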
METHODS, SYSTEMS, AND COMPUTER-READABLE MEDIA FOR SEMANTICALLY ENRICHING CONTENT AND FOR SEMANTIC NAVIGATION
Methods, systems and computer-readable media enable various techniques related to semantic navigation. One aspect is a technique for displaying semantically derived facets in the search engine interface. Each of the facets comprises faceted search results. Each of the faceted search results is displayed in association with user interface elements for including or excluding the faceted search result as additional search terms to subsequently refine the search query. Another aspect automatically infers new metadata from the content and from existing metadata and then automatically annotates the content with the new metadata to improve recall and navigation. Another aspect identifies semantic annotations by determining semantic connections between the semantic annotations and then dynamically generating a topic page based on the semantic connections.
Methods and systems for efficient and accurate text extraction from unstructured documents
According to one aspect, the subject matter described herein includes a method for extracting text from unstructured documents. The method includes creating a spatial index for storing information about words on a page of a document to be analyzed; using the spatial index to detect white space that indicates boundaries of columns within the page, aggregate words into lines, identify lines that are part of a header or footer of the page, and identify lines that are part of a table or a figure within the page; and joining lines together to generate continuous text flows. In one embodiment, the continuous text is divided into sections. In one embodiment, references within the document are identified. In one embodiment, inline citations within the document body are replaced with the corresponding reference information, or portions thereof.
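The column-detection and line-aggregation steps can be illustrated with a small sketch. Everything here is an assumption made for the example: words are given as hypothetical `(x0, x1, y, text)` tuples in integer page units, a simple horizontal coverage profile stands in for the spatial index, and header, footer, table, and figure detection are omitted.

```python
from collections import defaultdict


def column_gaps(words, page_width, min_gap=20):
    """Find vertical white-space bands (x ranges no word covers) between columns."""
    covered = [False] * page_width
    for x0, x1, _y, _text in words:
        for x in range(int(x0), int(x1)):
            covered[x] = True
    gaps, start = [], None
    for x in range(page_width):
        if not covered[x]:
            if start is None:
                start = x
        elif start is not None:
            # Ignore the left margin (start == 0) and narrow inter-word spaces
            if start > 0 and x - start >= min_gap:
                gaps.append((start, x))
            start = None
    return gaps  # the right margin is never closed, so it is not reported


def lines_in_reading_order(words, gaps):
    """Group words into lines per column, then read column by column, top down."""
    buckets = defaultdict(list)
    for x0, _x1, y, text in words:
        column = sum(1 for _gap_start, gap_end in gaps if gap_end <= x0)
        buckets[(column, y)].append((x0, text))
    return [" ".join(t for _x, t in sorted(buckets[key])) for key in sorted(buckets)]
```

Grouping by exact `y` value keeps the sketch short; real pages would need a tolerance for baseline jitter.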
Text processing method, system and computer program
A method includes hierarchically identifying occurrences of some of the words in the set of sentences; creating a first index for each of some of the words based on the upper hierarchy of occurrences identified for each word; receiving input of a queried word; hierarchically identifying occurrences of the queried word in the set of sentences; creating a second index based on the upper hierarchy of occurrences identified for the queried word; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in the neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number.
Methods and systems for a compliance framework database schema
Generating a compliance framework. The compliance framework facilitates an organization's compliance with multiple authority documents by providing efficient methodologies and refinements to existing technologies, such as providing hierarchical fidelity to the original authority document; separating auditable citations from their context (e.g., prepositions and/or informational citations); asset-focused citations; SNED and Live values, among others.
Systems and methods for caching structural elements of electronic documents
Systems and methods are disclosed herein for caching structural elements of electronic documents. A plurality of indices is stored in a database. The plurality of indices corresponds to locations within an electronic document of portions of a structural element. A mutation to the electronic document is received. Based on the plurality of indices, it is determined that the mutation modifies the structural element. Based on the determination, the structural element is updated. The updated structural element is displayed at a user device.
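The index-based check described in this abstract might look like the sketch below. It is a simplified illustration under assumptions not found in the source: the document is a list of paragraphs, the structural element is an outline of headings marked with a leading `# `, the only mutation supported is replacing one paragraph, and the class and method names are hypothetical.

```python
def is_heading(paragraph):
    """Hypothetical convention: a heading paragraph starts with '# '."""
    return paragraph.startswith("# ")


class OutlineCache:
    """Caches the indices of the paragraphs that make up the document outline."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)
        # Stored indices: locations of the portions of the structural element
        self.indices = {i for i, p in enumerate(self.paragraphs) if is_heading(p)}

    def outline(self):
        """Return the cached structural element as a list of heading titles."""
        return [self.paragraphs[i][2:] for i in sorted(self.indices)]

    def apply_replace(self, index, text):
        """Apply a replace mutation; return True if the outline was modified."""
        # The mutation touches the outline if it hits a stored index, or
        # (an extension beyond the abstract) if it introduces a new heading.
        touches = index in self.indices or is_heading(text)
        self.paragraphs[index] = text
        if touches:
            if is_heading(text):
                self.indices.add(index)
            else:
                self.indices.discard(index)
        return touches
```

Mutations to ordinary body paragraphs return `False` and leave the cached indices untouched, so the outline does not have to be rebuilt from the whole document on every edit.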
Techniques for invite enforcement
Techniques for invite enforcement. In one embodiment, for example, a method comprises: receiving an acceptance of an invitation to join a team of one or more team user accounts held with the online service; based on receiving the acceptance and based on determining that an owned domain name service domain of an owned electronic mail address is associated with the team, offering a personal user account an option between at least: (a) converting the personal user account to a team user account that is a member of the team, and (b) changing an electronic mail address associated with the personal user account from the owned electronic mail address to one that has a different domain name service domain than the owned domain name service domain associated with the team.
METHOD AND SYSTEM FOR CONFIGURATION MANAGEMENT OF HIERARCHICALLY ORGANIZED UNSTRUCTURED DATA USING ASSOCIATIVE TEMPLATES
Methods, computer program products, and computer systems for configuration management are disclosed. Such methods, computer program products, and computer systems include determining whether unstructured data or structured data is available for use as a basis to define an associative template, and in response to a determination that unstructured data is available for use as the basis to define the associative template, determining whether the unstructured data is hierarchically organized. In response to a determination that the unstructured data is hierarchically organized, an associative template can then be defined using the hierarchical organization of the unstructured data, where the unstructured data is hierarchically-organized unstructured data (HOUD).