Patent classifications
G06F16/3346
VOCABULARY SIZE ESTIMATION APPARATUS, VOCABULARY SIZE ESTIMATION METHOD, AND PROGRAM
An apparatus selects a plurality of test words from a plurality of words, presents the test words to users, receives the users' answers regarding their knowledge of the test words, and obtains a model representing a relationship between values based on the probabilities that users answer that they know a word and values based on the users' vocabulary sizes when they give that answer, by using the test words, the estimated vocabulary sizes of people who know the test words, and the answers regarding knowledge of the test words. Here, the estimated vocabulary sizes are obtained based on the frequencies of appearance of the words in a corpus and the parts of speech of the words.
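A minimal sketch of the kind of model the abstract describes, assuming a logistic relationship between a user's vocabulary size and the probability of a "known" answer; the word data, parameter values, and fitting approach are illustrative, not the patent's method.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical test words, represented by estimated vocabulary sizes derived
# from corpus frequency (a rarer word implies a larger vocabulary to know it).
word_vocab_size = np.array([2000, 5000, 12000, 30000, 60000], dtype=float)
answers = np.array([1, 1, 1, 0, 0], dtype=float)  # 1 = user answers "I know it"

def p_know(v, user_vocab, scale):
    """Probability that a user with vocabulary size `user_vocab` knows a word
    whose estimated knower-vocabulary-size is `v` (assumed logistic form)."""
    return 1.0 / (1.0 + np.exp((v - user_vocab) / scale))

# Fit the user's vocabulary size as the 50% point of the logistic curve.
(user_vocab, scale), _ = curve_fit(
    p_know, word_vocab_size, answers, p0=(20000.0, 5000.0), maxfev=10000
)
print(f"Estimated vocabulary size: {user_vocab:.0f} words")
```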
Systems, apparatuses, and methods for document querying
Techniques for searching documents are described. An exemplary method includes receiving a document search query; querying at least one index based upon the document search query to identify matching data; fetching the identified matched data; ranking, based upon one or more invocations of one or more machine learning models applied at least to the fetched matched data and the document search query, one or more of a top ranked passage and a proper subset of top ranked documents from the set of documents, wherein at least one of the machine learning models has been trained for a third party; and returning one or more of the top ranked passage and the proper subset of documents.
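A minimal sketch of the retrieve-then-rerank flow in the abstract, assuming a toy inverted index and a keyword-overlap scoring function as a stand-in for the trained per-third-party model; all names and data are illustrative.

```python
from collections import defaultdict

docs = {
    "d1": "neural search ranks passages with machine learning",
    "d2": "classical keyword search uses an inverted index",
}

# Build a toy inverted index: term -> set of matching document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query: str, top_k: int = 1):
    terms = query.split()
    # 1. Query the index to identify matching data.
    matched = set().union(*(index[t] for t in terms if t in index))
    # 2. Fetch the identified matched data.
    fetched = {doc_id: docs[doc_id] for doc_id in matched}
    # 3. Rank via a model invocation; here a keyword-overlap stand-in.
    scored = sorted(
        fetched.items(),
        key=lambda kv: sum(t in kv[1] for t in terms),
        reverse=True,
    )
    # 4. Return the top ranked documents (a proper subset of the set).
    return scored[:top_k]

print(search("machine learning search"))
```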
RANKING EXPLANATORY VARIABLES IN MULTIVARIATE ANALYSIS
A computer-implemented method, a computer program product, and a computer system for ranking explanatory variables in multivariate analysis. A computer system extracts words from documents related to categories, creates a histogram of the words in each category, and selects the top words in each histogram, where the top words are used as representative words of each category. A computer system generates respective feature vectors of explanatory variable candidates and a feature vector of an objective variable, where the feature vector of a variable includes elements corresponding to respective ones of the categories and the value of an element indicates whether the name of the variable is included in the top words. A computer system calculates the cosine similarity between each of the respective feature vectors of the explanatory variable candidates and the feature vector of the objective variable. A computer system ranks the explanatory variable candidates based on the cosine similarity.
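A minimal sketch of the ranking step, assuming binary feature vectors that mark whether a variable's name appears among each category's top words; the variable names and categories are illustrative.

```python
import numpy as np

categories = ["finance", "health", "weather"]
# 1 if the variable's name is among a category's top words, else 0.
candidate_vectors = {
    "income":      np.array([1, 0, 0]),
    "temperature": np.array([0, 0, 1]),
    "exercise":    np.array([0, 1, 1]),
}
objective_vector = np.array([1, 1, 0])  # e.g., a "wellbeing" objective variable

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Rank explanatory variable candidates by similarity to the objective variable.
ranked = sorted(
    candidate_vectors.items(),
    key=lambda kv: cosine(kv[1], objective_vector),
    reverse=True,
)
for name, vec in ranked:
    print(name, round(cosine(vec, objective_vector), 3))
```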
METHOD AND APPARATUS FOR CONSTRUCTING EVENT LIBRARY, ELECTRONIC DEVICE AND COMPUTER READABLE MEDIUM
The present disclosure provides a method and apparatus for constructing an event library, which relate to the technical fields of deep learning, natural language processing, big data, and the like. An implementation includes: acquiring at least one piece of event text data to be assigned to a text library; obtaining an extracted event name based on the event text data; matching the extracted event name against event information in the text library to obtain a recalled event in the text library; detecting, based on the recalled event, whether the extracted event name meets a unifying condition; and, in response to detecting that the extracted event name does not meet the unifying condition, obtaining a new event name in the event library based on the extracted event name and adding the event text data as a new event into the text library.
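A minimal sketch of the recall-and-unify flow, using difflib string similarity as a stand-in for the patent's matching step; the threshold, event names, and library structure are assumptions for illustration.

```python
from difflib import SequenceMatcher

event_library = {
    "earthquake in region A": ["text 1"],
    "election results": ["text 2"],
}
UNIFY_THRESHOLD = 0.8  # assumed similarity needed to merge into an existing event

def add_event(extracted_name: str, event_text: str):
    # Match the extracted event name against event info to recall a candidate.
    best_name, best_score = None, 0.0
    for name in event_library:
        score = SequenceMatcher(None, extracted_name, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= UNIFY_THRESHOLD:
        # Unifying condition met: attach the text to the recalled event.
        event_library[best_name].append(event_text)
    else:
        # Otherwise add the text data as a new event in the library.
        event_library[extracted_name] = [event_text]

add_event("earthquake region A", "new report text")
print(list(event_library))
```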
COMPUTERIZED SELECTION OF SEMANTIC FRAME ELEMENTS FROM TEXTUAL TASK DESCRIPTIONS
A computer identifies, within a task description, words that correspond to semantic element labels for the task. The computer receives, from a task source operatively connected with the computer, a textual description of a task. The computer receives semantic element labels, element identification rules, and at least one reference sentence showing natural language semantic element label use. The computer parses the description into words and generates, for the parsed words, Rule Match Values (RMVs) based on the element identification rules. The computer collects words having RMVs above a threshold into sets of associated candidate words and generates, using a neural network trained on the reference sentence, Match Likelihood Values (MLVs) indicating whether the candidate words represent the semantic element label with which each candidate word is associated. The computer selects, to represent the semantic element, the associated candidate word having the highest MLV.
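A minimal sketch of the two-stage selection: a rule-based Rule Match Value (RMV) filter, followed by a learned Match Likelihood Value (MLV) scorer. Both scoring functions here are illustrative stand-ins, not the patent's rules or neural network.

```python
RMV_THRESHOLD = 0.5  # assumed cutoff for keeping candidate words

def rule_match_value(word: str) -> float:
    # Stand-in element identification rule: capitalized words score higher.
    return 0.9 if word.istitle() else 0.2

def match_likelihood(word: str) -> float:
    # Stand-in for a neural network trained on reference sentences.
    return len(word) / 10.0

description = "Deliver the Report to Alice by Friday"

# Collect words whose RMV exceeds the threshold into a candidate set.
candidates = [w for w in description.split()
              if rule_match_value(w) > RMV_THRESHOLD]

# Select the candidate with the highest MLV to represent the element.
best = max(candidates, key=match_likelihood)
print("Selected semantic element word:", best)
```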
System and method for adaptively adjusting related search words
A system for adaptively adjusting related search words is provided. The system includes an input device, a search log collection module, a threshold setting module, and a process evolution module. The input device receives a search word. The search log collection module determines whether the cumulative search count of the search word is greater than a first threshold or less than a second threshold. The threshold setting module sets the first threshold and the second threshold in terms of the number of search logs. When the cumulative search count of the search word is between the first threshold and the second threshold, the process evolution module optimizes the middle search process to identify, from the indexed text and the historical search logs, at least one related word and/or at least one historical search word most related to the attributes or content of the search word.
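A minimal sketch of the threshold routing, assuming cumulative counts come from a search log; the threshold values and the placeholder "middle search process" are assumptions for illustration.

```python
FIRST_THRESHOLD = 1000   # upper bound, set in terms of the number of search logs
SECOND_THRESHOLD = 10    # lower bound

search_counts = {"laptop": 500, "quantum abacus": 3, "phone": 50000}

def route(word: str) -> str:
    count = search_counts.get(word, 0)
    if SECOND_THRESHOLD < count < FIRST_THRESHOLD:
        # Between thresholds: run the (placeholder) middle search process to
        # find related words and historical search words for this query.
        return f"optimize related words for '{word}' (count={count})"
    return f"default handling for '{word}' (count={count})"

for w in search_counts:
    print(route(w))
```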
Dimension-specific dynamic text interface for data analytics
Embodiments relate to a dynamic text provider that generates and communicates a text object to a text consumer (e.g., a table with a text header, or a chart having text axis labels and/or a title). An engine is positioned between a dynamic text service and an underlying data set organized according to a model with hierarchical elements (e.g., measures, dimensions, pages). The engine receives an input from the text consumer. The input includes at least a first identifier of the text consumer, a second identifier of the data set, and a third identifier of a specific element (e.g., a dimension) of the model. The engine references the model to create a context. Based upon that context, the engine queries the data set to generate a dynamic text object including a list of values (LOV) for the dimension. The dynamic text object including the LOV is communicated to the text consumer.
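A minimal sketch of the engine's flow: it receives the three identifiers, consults the model for context, and returns a dynamic text object carrying the LOV for the requested dimension. The in-memory "data set" and all names are illustrative stand-ins for a real analytics backend.

```python
data_sets = {
    "sales": {
        "model": {"dimensions": ["region", "product"], "measures": ["revenue"]},
        "rows": [
            {"region": "EMEA", "product": "A", "revenue": 10},
            {"region": "APAC", "product": "B", "revenue": 20},
        ],
    }
}

def dynamic_text(consumer_id: str, data_set_id: str, element_id: str) -> dict:
    model = data_sets[data_set_id]["model"]
    # Reference the model to create a context for the request.
    if element_id not in model["dimensions"]:
        raise ValueError(f"{element_id} is not a dimension of {data_set_id}")
    # Query the data set to generate the list of values (LOV) for the dimension.
    lov = sorted({row[element_id] for row in data_sets[data_set_id]["rows"]})
    return {"consumer": consumer_id, "dimension": element_id, "lov": lov}

# e.g., a chart axis requesting the values of the "region" dimension.
print(dynamic_text("chart-axis-1", "sales", "region"))
```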
Metadata aggregation using a trained entity matching predictive model
A metadata aggregation system includes a computing platform having a hardware processor and a memory storing software code that includes an entity matching predictive model trained using training data obtained from a reference database. The hardware processor executes the software code to obtain metadata inputs from multiple sources, conform the metadata inputs to a common format, match, using the trained entity matching predictive model, at least some of the conformed metadata inputs to the same entity, and determine, using the trained entity matching predictive model, a confidence score for each match. The software code further sends a request to one or more human editors for confirmation of each match having a confidence score greater than a first threshold and less than a second threshold, and, in response to receiving confirmation that at least one match is a confirmed match, updates the reference database to include the confirmed match.
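A minimal sketch of the confidence-band routing: matches in the uncertain band between the two thresholds are queued for human confirmation, while those at or above the upper threshold are accepted here automatically. The thresholds, scores, and auto-accept behavior above the band are assumptions for illustration.

```python
FIRST_THRESHOLD = 0.5    # below or at this, discard the match
SECOND_THRESHOLD = 0.9   # at or above this, accept without human review

# (entity A, entity B, model confidence score) triples from the matcher.
matches = [
    ("Alice Smith", "A. Smith", 0.95),
    ("Alice Smith", "Alicia Smythe", 0.70),
    ("Alice Smith", "Bob Jones", 0.10),
]

confirmed, review_queue = [], []
for a, b, score in matches:
    if score >= SECOND_THRESHOLD:
        confirmed.append((a, b))
    elif score > FIRST_THRESHOLD:
        review_queue.append((a, b))  # send to human editors for confirmation

print("auto-confirmed:", confirmed)
print("needs human review:", review_queue)
```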
SAMPLING DEVICE
A sampling device capable of balancing workload among units in data parallel processing is provided. The sampling device includes a priority assignment method (S201) that assigns higher priority to units with more remaining workload; a priority-aware scheduling method (S202-S204) that enables units with higher priority to perform the sampling and model update when a conflict occurs; a modified priority-aware scheduling method (shown in FIG. 10) that reduces scheduling overhead by re-assigning priority only every several iterations; and another modified priority-aware scheduling method (shown in FIG. 12) that explores different priority re-assignment frequencies and stores the sorted sequences in memory.
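A minimal sketch of the priority assignment and conflict resolution steps: units with more remaining workload receive higher priority, and the higher-priority unit wins a sampling conflict. The workloads and unit names are illustrative; the reduced-overhead variants that re-assign priority only every several iterations are not shown.

```python
remaining_work = {"unit0": 120, "unit1": 340, "unit2": 80}

# Higher remaining workload -> higher priority (lower rank number).
priority = {
    unit: rank
    for rank, (unit, _) in enumerate(
        sorted(remaining_work.items(), key=lambda kv: -kv[1])
    )
}

def resolve_conflict(unit_a: str, unit_b: str) -> str:
    """Return the unit allowed to do the sampling and model update."""
    return unit_a if priority[unit_a] < priority[unit_b] else unit_b

print(priority)                             # unit1 has the most work, rank 0
print(resolve_conflict("unit0", "unit2"))   # unit0 wins: more work remaining
```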
DEPENDENCY GRAPH-BASED WORD EMBEDDINGS MODEL GENERATION AND UTILIZATION
A method for dependency graph-based word embeddings model generation includes loading into the memory of a computer a corpus of text organized as a collection of sentences, and generating a dependency tree for each word of each of the sentences. The method additionally includes the matrix factorization of each generated dependency tree so as to produce a corresponding word embedding for each word of each of the sentences, without utilizing co-occurrence, in order to create a word embeddings model. Finally, the method includes storing the model as a code book in the memory of the computer. The code book may then be used, during textual analysis of a target document, to produce the probability that a prospective term appears in the target document based upon the known presence of a different word in the target document and a relationship between the two specified by the code book.
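A minimal sketch of the factorization idea: a word-by-context matrix built from dependency relations rather than co-occurrence windows, factorized with SVD to yield embeddings. The hand-written dependency triples stand in for trees produced by a real parser, and the SVD step is an assumed instance of the matrix factorization the abstract names.

```python
import numpy as np

# (head, relation, dependent) triples standing in for parsed dependency trees.
deps = [
    ("eats", "nsubj", "cat"),
    ("eats", "obj", "fish"),
    ("sleeps", "nsubj", "cat"),
    ("eats", "nsubj", "dog"),
]

words = sorted({h for h, _, _ in deps} | {d for _, _, d in deps})
contexts = sorted({(r, d) for _, r, d in deps} | {(r, h) for h, r, _ in deps})
w_idx = {w: i for i, w in enumerate(words)}
c_idx = {c: j for j, c in enumerate(contexts)}

# Count matrix: a word "occurs" with a context made of its relation + partner.
M = np.zeros((len(words), len(contexts)))
for h, r, d in deps:
    M[w_idx[h], c_idx[(r, d)]] += 1
    M[w_idx[d], c_idx[(r, h)]] += 1

# Truncated SVD gives low-dimensional word embeddings (the "code book").
U, S, _ = np.linalg.svd(M, full_matrices=False)
embeddings = U[:, :2] * S[:2]
for w in words:
    print(w, np.round(embeddings[w_idx[w]], 2))
```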