Patent classifications
G06F16/3335
GENERATING HYPOTHESES AND RECOGNIZING EVENTS IN DATA SETS
A hypothesis generation and event recognition system enables event recognition by analyzing documents to construct one or more qualitative metrics (e.g., frequency of keywords, changes in sentiment, occurrence of ontological terms, evolution of topics, etc.), establishing a baseline for the qualitative metric(s), and outputting changes to that baseline for display. In aggregate, those qualitative metrics comprise temporal and/or spatial signals that, when combined, define signatures of events of interest. Accordingly, the user and/or the system may identify an event of interest based on the change in baseline. The system may further provide functionality to generate hypotheses by coding data according to an ontology, populating an ontology space, and using an optimization algorithm to rank points or neighborhoods in the coded ontology space. The system may further store links between ontological terms and qualitative metrics to provide functionality to test generated hypotheses that include those linked ontological terms.
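The baseline-and-change step described above can be illustrated with a minimal sketch. It assumes the metric is a per-period keyword count, the baseline is the mean of a trailing window, and a period is flagged when its z-score exceeds a threshold. The function name, the window size, and the threshold are hypothetical choices for illustration and are not taken from the abstract.

```python
from statistics import mean, stdev

def baseline_deviations(counts, window=4, threshold=2.0):
    """Return indices of periods whose count departs from the trailing baseline.

    counts: keyword frequency per time period (hypothetical input).
    window: number of preceding periods that form the baseline (assumption).
    threshold: z-score above which a period is flagged (assumption).
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            deviates = counts[i] != mu
        else:
            deviates = abs(counts[i] - mu) / sigma > threshold
        if deviates:
            flagged.append(i)
    return flagged

weekly_mentions = [5, 6, 5, 7, 6, 5, 21, 6]
print(baseline_deviations(weekly_mentions))  # prints [6]
```

Only the spike in period 6 is reported. The following period is not flagged because the spike itself has entered the trailing window and widened the baseline.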
SYSTEMS AND METHOD FOR GENERATING A STRUCTURED REPORT FROM UNSTRUCTURED DATA
Methods and systems for providing computer-assisted guided review of unstructured data to generate a structured data output based on customizable template rules are provided. In embodiments, an unstructured file is received, and a predefined template is selected. The predefined template includes a plurality of fields, each field corresponding to a field of the structured report. The predefined template also defines extraction rules for each field of the predefined template, and the extraction rules define parameters for identifying unstructured data relevant to the associated field. The extraction rules are applied to the unstructured file to identify data relevant to the field associated with the corresponding extraction rule, and the data identified as relevant is confirmed. Confirming the relevant data includes determining to refine the relevant data based on a condition, and modifying the extraction rule associated with the field to refine the relevant data.
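A rough sketch of the template idea follows, assuming each extraction rule is a regular expression with one capture group. The field names, the patterns, and the helper functions are hypothetical. The abstract does not say how rules are expressed or what condition triggers refinement, so the reviewer's rejection is modeled here as a manual call.

```python
import re

# Hypothetical template: one regex extraction rule per field of the report.
TEMPLATE = {
    "invoice_number": r"Invoice\s*#?\s*(\d+)",
    "total": r"(?i)total:\s*\$?([\d.]+)",  # too loose: also matches "Subtotal:"
}

def apply_template(text, template):
    """Apply every field's rule and collect the first match, or None."""
    report = {}
    for field, pattern in template.items():
        match = re.search(pattern, text)
        report[field] = match.group(1) if match else None
    return report

def refine_rule(template, field, new_pattern):
    """Return a copy of the template with one field's rule replaced."""
    refined = dict(template)
    refined[field] = new_pattern
    return refined

text = "Invoice #1042 issued to ACME. Subtotal: $90.00 Total: $99.50"
first_pass = apply_template(text, TEMPLATE)
print(first_pass)  # total is picked up from "Subtotal", which a reviewer would reject

# Refinement: require a word boundary and a capital T for the total field.
refined = refine_rule(TEMPLATE, "total", r"\bTotal:\s*\$?([\d.]+)")
print(apply_template(text, refined))
```

The first pass returns 90.00 for the total and the refined rule returns 99.50, which mirrors the confirm-then-refine loop at a very small scale.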
METHODS AND SYSTEMS FOR SEARCHING A SEARCH QUERY HAVING A NON-CHARACTER-BASED INPUT
A method and system are provided for searching a search query having a non-character-based input. The method comprises receiving the search query comprising a first part and a second part. The first part comprises a non-character-based input. The method further comprises identifying a first plurality of keywords associated with the non-character-based input and receiving a selection of at least one of the first plurality of keywords. The method further comprises generating a modified search query comprising the at least one selected keyword and the second part. The method further comprises retrieving search results based on the modified search query and generating for presentation the search results.
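The query flow above can be sketched as follows. The lookup table stands in for whatever recognizer maps an image, emoji, or similar input to keywords. The `SearchQuery` class, the table, and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SearchQuery:
    non_text_input: str  # first part: identifier of an image, emoji, etc.
    text: str            # second part: the character-based remainder

# Hypothetical stand-in for a recognizer that maps non-text input to keywords.
KEYWORD_LOOKUP = {
    "img:eiffel.jpg": ["Eiffel Tower", "Paris", "landmark"],
}

def suggest_keywords(query):
    """Return the keywords associated with the non-character-based part."""
    return KEYWORD_LOOKUP.get(query.non_text_input, [])

def build_modified_query(query, selected):
    """Combine the user's selected keywords with the second part of the query."""
    allowed = suggest_keywords(query)
    chosen = [k for k in selected if k in allowed]  # ignore unknown selections
    return " ".join(chosen + [query.text])

query = SearchQuery("img:eiffel.jpg", "hotels nearby")
print(suggest_keywords(query))
print(build_modified_query(query, ["Paris"]))  # prints "Paris hotels nearby"
```

Retrieval itself is left out. The modified query string would be handed to whatever search backend is in use.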
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
One embodiment of the present invention provides an information processing apparatus that transforms a text into data indicating entities and association among entities, compares the transformed data, and thereby enables texts having a relationship to be detected. An information processing apparatus as one embodiment of the present invention includes: a transformer; and a detector. The transformer is configured to transform a text into entity data indicating entities and association among entities. The detector is configured to detect a second text having a relationship with a first text by comparing first entity data transformed from the first text with one or more second entity data transformed from one or more second texts.
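One way to picture the detector is shown below. It assumes entity data is a set of (subject, relation, object) triples and that two texts are related when the Jaccard overlap of their triple sets reaches a threshold. The transformer is not implemented and the triples are written by hand. The similarity measure, the threshold, and all names are illustrative assumptions, since the abstract does not specify how the comparison is done.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def detect_related(first, candidates, threshold=0.3):
    """Return names of candidate texts whose entity data overlaps the first text's."""
    return [
        name
        for name, triples in candidates.items()
        if jaccard(first, triples) >= threshold
    ]

# Hand-written entity data standing in for the transformer's output.
first_text = {
    ("ACME", "acquires", "Widgets Inc"),
    ("ACME", "based_in", "Berlin"),
}
second_texts = {
    "doc_a": {("ACME", "acquires", "Widgets Inc"), ("Widgets Inc", "makes", "gears")},
    "doc_b": {("Foo", "sues", "Bar")},
}
print(detect_related(first_text, second_texts))  # prints ['doc_a']
```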
Multigram index for database query
Disclosed embodiments provide techniques for database query utilizing a multigram index. In embodiments, a search query is divided into multiple regex subcomponents. Regex subcomponent indexes are created and searched in parallel, and/or in a sequential manner on reduced data sets, increasing search performance, especially for NoSQL databases.
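The general idea of filtering with an n-gram index before running a regex can be sketched as below. This is a plain trigram index rather than the multigram index of the disclosure, and it handles only the sequential case on reduced data sets. The caller supplies the literal subcomponents, which are assumed to be substrings that any match must contain. All function names are hypothetical.

```python
import re
from collections import defaultdict

def build_trigram_index(docs):
    """Map every 3-character substring to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for i in range(len(text) - 2):
            index[text[i:i + 3]].add(doc_id)
    return index

def candidates_for_literal(index, literal, all_ids):
    """Documents that contain every trigram of the literal (a superset of true hits)."""
    result = set(all_ids)
    if len(literal) < 3:
        return result  # too short to filter on
    for i in range(len(literal) - 2):
        result &= index.get(literal[i:i + 3], set())
    return result

def search(docs, index, literals, pattern):
    """Narrow the candidate set one literal at a time, then run the full regex."""
    ids = set(docs)
    for literal in literals:
        ids &= candidates_for_literal(index, literal, docs)
    regex = re.compile(pattern)
    return sorted(d for d in ids if regex.search(docs[d]))

docs = {
    1: "error: disk full",
    2: "warning: disk almost full",
    3: "error: network down",
}
index = build_trigram_index(docs)
print(search(docs, index, ["error: ", " full"], r"error: \w+ full"))  # prints [1]
```

The full regex runs only on documents that survive both literal filters, which is where the saving comes from on a large collection.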
Efficient resolution of syntactic patterns in question and answer (QA) pairs in an n-ary focus cognitive QA system
Embodiments are provided for processing questions based on equivalence classes in a cognitive question answering system. A plurality of syntactic representations of a plurality of questions asked of the cognitive question answering system are provided. A plurality of syntactic representations of a plurality of passages ingested by the cognitive question answering system are provided. Question-focus-to-candidate-passage pairs are mapped to form an equivalence class mapping, and the equivalence class mapping is used to determine an answer to one of the plurality of questions asked of the cognitive question answering system.
TRAINED SEQUENCE-TO-SEQUENCE CONVERSION OF DATABASE QUERIES
Methods and systems are provided for sequence-to-sequence conversion from unstructured search queries to structured database queries, so that lay persons may retrieve information from relational databases without specialized knowledge of database query languages. An encoder module and a decoder module of a learning model are trained to convert an unstructured search query to an intermediate feature vector by computing co-attention and self-attention based on a context string and a database schema, encoding the database schema in the context string by application of self-attention between the context string containing tokens of the database schema with learned structural attention heads which relate the token to logic of the database. Training is performed using labeled training datasets which include structured database queries which are normalized by parsing into a semantic representation thereof, followed by linearization.
METHOD AND APPARATUS FOR DETERMINING FEATURE WORDS AND SERVER
The present specification provides a method and apparatus for determining feature words and a server. The method includes: obtaining text data; extracting a first feature word from the text data; updating a word segmentation library based on the first feature word to obtain an updated word segmentation library, the word segmentation library including a plurality of predetermined feature words for representing predetermined attribute types; and extracting a second feature word from the text data based on the updated word segmentation library and the predetermined feature words.
SYSTEM AND METHOD FOR CONFIDENTIALITY-PRESERVING RANK-ORDERED SEARCH
A confidentiality-preserving system and method are provided for performing a rank-ordered search and retrieval of contents of a data collection. The system includes at least one computer system including a search and retrieval algorithm using term frequency and/or similar features for rank-ordering selective contents of the data collection, and enabling secure retrieval of the selective contents based on the rank-order. The search and retrieval algorithm includes a baseline algorithm, a partially server-oriented algorithm, and/or a fully server-oriented algorithm. The partially and/or fully server-oriented algorithms use homomorphic and/or order-preserving encryption for enabling search capability from a user other than an owner of the contents of the data collection. The confidentiality-preserving method includes using term frequency for rank-ordering selective contents of the data collection, and retrieving the selective contents based on the rank-order.
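The rank-ordering step alone, without any encryption, might look like the sketch below. It assumes whitespace tokenization and a length-normalized term frequency. The homomorphic and order-preserving encryption layers described above are deliberately omitted, and the function names are hypothetical.

```python
from collections import Counter

def term_frequency(term, text):
    """Fraction of tokens in the text that equal the term (case-insensitive)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)

def rank_documents(term, docs):
    """Return ids of documents containing the term, highest term frequency first."""
    scored = [(term_frequency(term, text), doc_id) for doc_id, text in docs.items()]
    # Sort by descending score, breaking ties by document id.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

docs = {
    "a": "merger talks merger",
    "b": "the merger was delayed again",
    "c": "quarterly earnings",
}
print(rank_documents("merger", docs))  # prints ['a', 'b']
```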
Enforcing data ownership at gateway registration using natural language processing
Enforcing data ownership may include receiving a request to register an application programming interface (API) endpoint. A plurality of elements of the API endpoint and a target API endpoint may be preprocessed. A distance may be computed for each element of the API endpoint relative to at least one of the elements of the target API endpoint. A distance score for the API endpoint may be computed based on the distances. A term frequency-inverse document frequency (TF-IDF) value may be computed for a plurality of metadata terms of the API endpoint and the target API endpoint. A similarity score between the TF-IDF values of the metadata terms may be computed. An adjusted score may be computed for the API endpoint based on the distance score and the similarity score. The API endpoint may be registered based on the adjusted score being below a permissions threshold.
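The TF-IDF similarity step can be sketched on its own, assuming whitespace-tokenized metadata, a smoothed IDF of log((1 + N) / (1 + df)) + 1, and cosine similarity. The element distance, the adjusted score, and the permissions threshold are left out because the abstract does not state how they are combined. The function names and the example metadata strings are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical metadata for a new endpoint and an already registered target.
new_meta = "customer billing records"
target_meta = "customer billing invoices"
new_vec, target_vec = tfidf_vectors([new_meta, target_meta])
print(round(cosine(new_vec, target_vec), 3))
```

The smoothed IDF keeps terms shared by both endpoints from being zeroed out, which matters when only two metadata strings are compared.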