Patent classifications
G06F16/316
Method and system for high performance integration, processing and searching of structured and unstructured data
Disclosed herein are methods and systems for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. In accordance with exemplary embodiments, the generation of classification metadata about unstructured data can be hardware-accelerated by processing the streaming unstructured data through a reconfigurable logic device, a graphics processing unit (GPU), or a chip multiprocessor (CMP).
TEXT ANALYSIS SYSTEM AND PROGRAM
A morpheme analysis unit sets, in advance, a meaning-candidate tag and a sentimental theme tag for the relevant morphemes of the input text. A syntax analysis unit generates an index that records each clause including a meaning-candidate tag or a sentimental theme tag, together with the type of each tag. A meaning attribute extraction unit refers to the index to identify clauses that include a meaning-candidate tag and the type of that tag, then applies a meaning attribute rule, sets a meaning attribute tag on each clause that requires one, and updates the index. A sentimental analysis unit likewise refers to the index to identify clauses that include a sentimental theme tag or a meaning attribute tag, then applies a sentimental analysis rule and sets a sentimental attribute tag on each clause that requires one.
TECHNIQUES FOR DOMAIN CAPTURE
Techniques for domain capture. In one embodiment, for example, a method comprises detecting an attempt to create, with an online service, a personal user account that uses an electronic mail address belonging to the owner of a particular domain name service domain. Based on the detection, the online service prompts the user associated with the attempt to create a team account with the online service that is managed by the owner of that domain. Also based on the detection, the online service offers the user the option of creating a personal user account with the online service using an unowned electronic mail address.
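As a rough illustration of the decision described above, the flow could be sketched in Python as follows. The registry `CLAIMED_DOMAINS`, the function `signup_options`, and the option labels are hypothetical; the abstract does not specify how ownership is stored or how the prompt is presented.

```python
# Hypothetical registry of DNS domains that have been claimed by an owner.
CLAIMED_DOMAINS = {"example.com": "Example Corp"}


def signup_options(email: str) -> list:
    """Return the choices offered when someone tries to create a personal account."""
    domain = email.rsplit("@", 1)[-1].lower()
    owner = CLAIMED_DOMAINS.get(domain)
    if owner is None:
        # Nobody owns this domain: a plain personal signup is allowed.
        return ["create_personal_account"]
    # The domain is owned: steer the user toward a team account managed by
    # the owner, but also offer a personal account under an unowned address.
    return [f"create_team_account:{owner}", "personal_account_with_other_email"]


print(signup_options("alice@example.com"))
print(signup_options("bob@personal-mail.org"))
```

Keying the check on the lowercased domain part keeps `Alice@Example.COM` and `alice@example.com` from being treated as different domains.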
Contact Information Identification System
A contact information identification system performs the following steps: a user selects content from the display; the system scans the content and any images with optical character recognition to produce text; the system parses the text and groups it according to a factor selected from the group consisting of word proximity, line or section, and matching keywords; the system matches each group to a data field; the data associated with each field is presented in a list that the user can correct; and the user may delete fields that do not form part of, or are not relevant to, the contact information.
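The grouping and field-matching steps could be sketched as below, assuming the OCR stage has already produced plain text and that grouping is done by line. The regular expressions in the hypothetical `FIELD_PATTERNS` table and the rule that the first unmatched line is the name are illustrative assumptions, not the patented matching logic.

```python
import re

# Hypothetical field matchers, checked in this order; real rules would be richer.
FIELD_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
    "website": re.compile(r"(?:https?://)?www\.\S+", re.IGNORECASE),
}


def identify_contact_fields(ocr_text: str) -> dict:
    """Group OCR text by line and match each group to a contact field."""
    fields = {}
    leftovers = []  # lines that matched no pattern
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = None
        for field, pattern in FIELD_PATTERNS.items():
            found = pattern.search(line)
            if found:
                match = (field, found.group(0))
                break
        if match is None:
            leftovers.append(line)
        elif match[0] not in fields:
            fields[match[0]] = match[1]
    # Assumption: the first unmatched line on a card is the person's name.
    if leftovers:
        fields["name"] = leftovers[0]
    return fields


card = "Jane Doe\nAcme Widgets\nTel: +1 (555) 123-4567\njane.doe@acme.com\nwww.acme.com"
fields = identify_contact_fields(card)
print(fields)
```

The returned dictionary plays the role of the correctable list: a user could fix a value in place or drop an irrelevant entry, for example with `del fields["website"]`.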
BUCKET BASED DISTRIBUTED SEARCH SYSTEM
A distributed search system can be partitioned into buckets based on entities and time periods. Addresses for the partitions can be formed from entity parameters and time period parameters. An indexing scheme for the partitions can be maintained at one or more search clusters, which may be geographically separate from one another. Consistency can be maintained across the search clusters by routing queries between clusters based, at least in part, on the status of the partitions.
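A minimal sketch of the addressing and routing ideas follows, assuming one-day time buckets, an address of the form `entity/period_start`, and a simple per-cluster status map. The names `partition_address`, `route_query`, and the status values are hypothetical; the abstract does not define the address format or the partition states.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 86_400  # assumed time-period width: one UTC day


def partition_address(entity_id: str, ts: datetime) -> str:
    """Build a partition address from an entity parameter and a time-period parameter."""
    epoch = int(ts.timestamp())
    period_start = epoch - epoch % BUCKET_SECONDS
    return f"{entity_id}/{period_start}"


def route_query(address: str, local: str, clusters: list, status: dict) -> str:
    """Prefer the local cluster, but forward the query if its copy of the partition is not ready."""
    for cluster in [local] + [c for c in clusters if c != local]:
        if status.get((cluster, address)) == "ready":
            return cluster
    raise LookupError(f"no cluster has a ready copy of {address}")


addr = partition_address("acct42", datetime(2024, 1, 2, 15, 30, tzinfo=timezone.utc))
status = {("us-east", addr): "rebuilding", ("eu-west", addr): "ready"}
print(addr, route_query(addr, "us-east", ["us-east", "eu-west"], status))
```

Because every event for the same entity and day maps to the same address, a query for that bucket touches exactly one partition, and the status check lets a cluster whose copy is still rebuilding hand the query to one that is current.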
ANNOTATIONS OF TEXTUAL SEGMENTS BASED ON USER FEEDBACK
Methods and apparatus related to verifying annotations of textual segments based on human feedback that is responsive to questions generated to solicit feedback relevant to the annotations. Some implementations are directed generally to generating one or more task specifications to solicit feedback relevant to a potential annotation of a target textual segment, transmitting the task specifications for review by a plurality of human reviewers, receiving feedback responsive to the task specifications, and using the feedback to determine whether the potential annotation is a verified annotation. Some implementations are directed generally to determining an effectiveness measure for a task system and/or one or more reviewers, wherein the effectiveness measure is indicative of effectiveness in providing feedback instances related to one or more annotations for one or more textual segments.
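One simple way to turn reviewer feedback into a verification decision and an effectiveness measure is sketched below. The thresholds, the yes/no answer format, and the use of agreement with the per-task majority as the effectiveness measure are all assumptions made for illustration; the abstract leaves these choices open.

```python
from collections import defaultdict


def is_verified(feedback: dict, min_responses: int = 3, threshold: float = 0.7) -> bool:
    """Treat a potential annotation as verified once enough reviewers confirm it."""
    if len(feedback) < min_responses:
        return False  # not enough feedback instances yet
    return sum(feedback.values()) / len(feedback) >= threshold


def reviewer_effectiveness(tasks: list) -> dict:
    """Fraction of each reviewer's answers that agree with the per-task majority."""
    agree, total = defaultdict(int), defaultdict(int)
    for feedback in tasks:
        majority = sum(feedback.values()) * 2 > len(feedback)
        for reviewer, answer in feedback.items():
            total[reviewer] += 1
            agree[reviewer] += answer == majority
    return {r: agree[r] / total[r] for r in total}


tasks = [
    {"r1": True, "r2": True, "r3": False},
    {"r1": False, "r2": False, "r3": False},
]
print(is_verified(tasks[0]), reviewer_effectiveness(tasks))
```

Here each task maps reviewer IDs to a yes/no answer about one potential annotation; a weighted vote using the effectiveness scores would be a natural refinement.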
CORPUS SEARCH SYSTEMS AND METHODS
A corpus of texts relating to a domain of knowledge may be searched by determining word-pair proximity scores that measure associations between pairs of words that appear in the corpus and are semantically related to the domain of knowledge. When a search term is received, the word-pair proximity scores may be used, at least in part, together with dictionary overlays, user feedback, and other feature vectors as weighting mechanisms, to identify one or more related words that are strongly associated with the search term within the corpus. One or more texts in which the search term and the related words appear near each other in one or more places may then be selected from the corpus. The selected texts may be categorized and/or clustered based on the related words before being returned for presentation as search results.
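The core of the proximity idea can be sketched with windowed co-occurrence counts, assuming whitespace tokenization and a fixed window of three tokens. This sketch uses raw co-occurrence counts as the proximity score and omits the domain filter, dictionary overlays, user feedback, and clustering; the function names are hypothetical.

```python
from collections import Counter

WINDOW = 3  # assumed: words at most 3 positions apart count as "near"


def proximity_scores(corpus: list) -> Counter:
    """Count how often each unordered word pair occurs within WINDOW tokens."""
    scores = Counter()
    for text in corpus:
        tokens = text.lower().split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + WINDOW]:
                if v != w:
                    scores[tuple(sorted((w, v)))] += 1
    return scores


def related_words(term: str, scores: Counter, k: int = 2) -> list:
    """Return the k words most strongly associated with the search term."""
    assoc = Counter()
    for (a, b), s in scores.items():
        if a == term:
            assoc[b] += s
        elif b == term:
            assoc[a] += s
    return [w for w, _ in assoc.most_common(k)]


def search(term: str, corpus: list, k: int = 1) -> list:
    """Select texts where the term and a related word appear within WINDOW tokens."""
    term = term.lower()
    related = set(related_words(term, proximity_scores(corpus), k))
    hits = []
    for text in corpus:
        tokens = text.lower().split()
        for i, t in enumerate(tokens):
            nearby = tokens[max(0, i - WINDOW) : i + WINDOW + 1]
            if t == term and related & set(nearby):
                hits.append(text)
                break
    return hits


corpus = [
    "insulin regulates blood glucose",
    "glucose levels rise after meals",
    "insulin resistance affects glucose uptake",
    "the stock market fell today",
]
print(search("insulin", corpus))
```

On this toy corpus, "glucose" co-occurs with "insulin" in two texts, so it becomes the related word, and only the two texts where both words sit close together are returned.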
Social media driven information interface
One or more techniques and/or systems are provided for populating an information interface based upon social media data. For example, users may post, share, and/or discuss various information through social media sources. Accordingly, social media data may be obtained from such social media sources. The social media data may be grouped into sets of social media data based upon temporal information. Within the sets of social media data, social media entries may be clustered into topic clusters (e.g., a royal wedding topic cluster, a plane crash topic cluster, etc.). Event summaries may be generated for respective topic clusters. The event summaries may be used to populate timeslots of an information interface, such as a calendar or timeline, to create annotated timeslots. In this way, the information interface may provide users with an interactive view of events over a time period, such as a year-in-review, based upon social media data.
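The pipeline of temporal grouping, topic clustering, and summarization could be sketched as follows. In this hypothetical `build_timeline`, hashtags stand in for a real topic-clustering step and the "summary" is just the tag with a post count; both are simplifying assumptions, not the described technique.

```python
from collections import defaultdict
from datetime import date


def build_timeline(posts: list) -> dict:
    """Group posts by day, cluster each day's posts by hashtag, and summarize each cluster."""
    by_day = defaultdict(list)
    for day, text in posts:  # temporal grouping
        by_day[day].append(text)
    timeline = {}
    for day in sorted(by_day):
        clusters = defaultdict(list)  # hashtag -> posts (stand-in for topic clustering)
        for text in by_day[day]:
            # dict.fromkeys de-duplicates tags within a post while keeping their order.
            for tag in dict.fromkeys(w.lower() for w in text.split() if w.startswith("#")):
                clusters[tag].append(text)
        ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
        timeline[day] = [f"{tag}: {len(items)} post(s)" for tag, items in ranked]
    return timeline


posts = [
    (date(2011, 4, 29), "Watching the #RoyalWedding live"),
    (date(2011, 4, 29), "That dress! #royalwedding"),
    (date(2011, 4, 29), "Stuck in traffic #commute"),
    (date(2011, 4, 30), "Still talking about the #RoyalWedding"),
]
timeline = build_timeline(posts)
print(timeline)
```

Each key of the result corresponds to a timeslot (here, a day), and its list holds the annotations for that slot, largest cluster first.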
Methods and systems for indexing references to documents of a database and for locating documents in the database
Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block.
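The block-and-pattern scheme can be illustrated with a small delta-encoding sketch. It assumes a sorted posting list, a block size `M` of 4, and a hypothetical pattern table of fixed byte widths; a stored pattern index plays the role of the pointer into the table of encoding patterns. The actual patterns and pointer format in the disclosure are not specified here.

```python
M = 4  # assumed block size (number of references per block)
PATTERNS = [1, 2, 4]  # assumed pattern table: bytes used per delta-encoded reference


def encode_posting_list(doc_ids: list) -> list:
    """Split a sorted posting list into blocks of M and encode each block compactly."""
    blocks, prev = [], 0
    for start in range(0, len(doc_ids), M):
        deltas = []
        for doc_id in doc_ids[start : start + M]:
            deltas.append(doc_id - prev)  # gaps are non-negative for a sorted list
            prev = doc_id
        # The pattern depends on the block's references: the narrowest width that fits.
        # (Raises StopIteration if a gap needs more than 4 bytes.)
        pattern = next(i for i, width in enumerate(PATTERNS) if max(deltas) < 256**width)
        payload = b"".join(d.to_bytes(PATTERNS[pattern], "big") for d in deltas)
        blocks.append((pattern, payload))  # the pattern index acts as the pointer
    return blocks


def decode_posting_list(blocks: list) -> list:
    """Use each block's pattern pointer to pick the matching decoder."""
    doc_ids, prev = [], 0
    for pattern, payload in blocks:
        width = PATTERNS[pattern]
        for i in range(0, len(payload), width):
            prev += int.from_bytes(payload[i : i + width], "big")
            doc_ids.append(prev)
    return doc_ids


ids = [3, 7, 12, 20, 300, 305, 70000, 70001, 70010]
blocks = encode_posting_list(ids)
print([p for p, _ in blocks])  # [0, 2, 0]
```

Only the middle block contains a gap too large for two bytes, so only that block pays for four-byte entries; the others stay at one byte per reference.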
Managing replicated data
An approach for managing replicated data is presented. A current usage of resources in a system and a threshold usage of the resources are determined. Based on inter-replica correlation(s) and inter-data correlation(s) specified by a unified replication metadata model, a proper subset of replicas included in a plurality of replicas is indexed by (i) if the current usage is less than the threshold usage, determining an expected additional resource usage due to performing an indexing task online and based on the expected additional resource usage, determining a resource affinity score for performing the indexing task online, or (ii) if the current usage is greater than or equal to the threshold usage, determining an expected resource usage due to performing the indexing task offline and based on the expected resource usage, determining a resource affinity score for performing the indexing task offline.
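The online/offline branch could be sketched as below. The affinity formula (remaining headroom as a fraction of capacity), the assumption that offline work does not add to the current load, and the "one representative per correlated group" rule in `replicas_to_index` are hypothetical stand-ins for the unified replication metadata model, which the abstract does not detail.

```python
def plan_indexing(current: float, threshold: float, online_cost: float,
                  offline_cost: float, capacity: float) -> tuple:
    """Choose online or offline indexing and score how well resources fit that choice."""
    if current < threshold:
        mode = "online"
        expected = current + online_cost  # indexing now adds to the live load
    else:
        mode = "offline"
        expected = offline_cost  # assumed: deferred work runs when the system is otherwise idle
    # Hypothetical affinity score: remaining headroom as a fraction of capacity.
    affinity = max(0.0, 1.0 - expected / capacity)
    return mode, affinity


def replicas_to_index(correlated_groups: dict) -> list:
    """Index one representative per group of correlated replicas (a proper subset)."""
    return [min(members) for members in correlated_groups.values()]


print(plan_indexing(0.5, 0.8, 0.2, 0.1, 1.0))
print(replicas_to_index({"orders": ["r2", "r1"], "logs": ["r3"]}))
```

Indexing only one replica per correlated group is what keeps the indexed set a proper subset whenever at least one group holds more than one replica.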