Patent classifications
G06F16/316
Distribution of index settings in a machine data processing system
Provided are systems and methods for causing display of an index management graphical user interface (GUI). In one embodiment, a method can be provided. The method can include causing display of an index management GUI including one or more user editable fields for specifying one or more index settings; receiving, via the one or more user editable fields of the index management GUI, one or more user specified index settings; generating an index definition corresponding to the one or more user specified index settings; and distributing the index definition to one or more indexers of a data processing system. The one or more indexers are able to manage storage of data in one or more indexes based at least in part on the index definition.
Automatic generation of scientific article metadata
Examples of the disclosure are directed to systems and methods of using natural language processing techniques to automatically assign metadata to articles as they are published. The automatically-assigned metadata can then feed into the algorithms that calculate updated causation scores for agent-outcome hypotheses, powering live visualizations of the data that update automatically as new scientific articles become available.
Unstructured data fusion by content-aware concurrent data processing pipeline
The disclosure relates to a data analytics platform in which a linear pipeline processing framework may use an abstracted query language to define a data fusion pipeline assembly mechanism. More particularly, the linear pipeline processing framework may include various operator groups that work in conjunction to organize data entries that can have substantially disparate data types (e.g., text, binary, video, audio, etc.) into a single normalized stream such that one or more processing modules may perform type-specific data processing and feature extraction, normalize an output into a single stream, and finally render the different data types as a fused output.
REWRITING CORPUS CONTENT IN A SEARCH INDEX AND PROCESSING SEARCH QUERIES USING THE REWRITTEN SEARCH INDEX
A method, a computing system, and a computer program product are provided for processing search queries. A computing device executing a content management system receives a content rewriting rule. A content item including the content rewriting rule is stored. The stored content rewriting rule is associated with a first search index, which includes indexed content of a corpus having unstructured textual content. The content of the corpus is rewritten into a second search index of an index overlay structure by applying the content rewriting rule to the content of the corpus. The second search index is used for searching the content of the corpus for content satisfying a received search query.
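The overlay idea in this abstract can be illustrated with a minimal sketch. It assumes a rewriting rule is a regular-expression pattern plus a replacement string, and that a search index is a plain term-to-document-IDs map; the names `build_index`, `apply_rule`, and `search` are hypothetical and do not come from the patent.

```python
import re
from collections import defaultdict


def build_index(corpus):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def apply_rule(corpus, pattern, replacement):
    """Return a rewritten copy of the corpus; the original is left untouched."""
    return {
        doc_id: re.sub(pattern, replacement, text, flags=re.IGNORECASE)
        for doc_id, text in corpus.items()
    }


def search(index, query):
    """Return IDs of documents that contain every term of the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


corpus = {1: "Contact ACME Corp support", 2: "Widgets ship on Monday"}
first_index = build_index(corpus)
# Assumed rule: replace an old company name with a new one.
second_index = build_index(apply_rule(corpus, r"\bACME Corp\b", "Globex"))
```

In this sketch the first index is never modified; queries are answered from the second index, so a search for "globex" matches document 1 even though the stored text still says "ACME Corp".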
Non-transitory computer readable medium, encode device, and encode method
A non-transitory computer readable medium storing a program that causes a computer to execute a process, the process including obtaining text data, generating first index information indicating appearance positions in the text data for each of a plurality of characters or words obtained based on lexical analysis of the text data, generating second index information, the second index information being index information in which the appearance positions in the text data are aggregated for each character or word, specifying, by using the second index information, a data range in the first index information to be referred to in a pattern match search, and performing encoding for the text data based on the pattern match search by using the data range in the first index information.
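As a loose illustration of index information that records appearance positions per word, the sketch below builds a positional index and uses it for a simple phrase match. It is not the two-level index or the encoding step described in the abstract; the function names and the whitespace tokenization are assumptions made for the example.

```python
from collections import defaultdict


def build_positional_index(tokens):
    """Map each token to the ascending list of positions where it appears."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token].append(position)
    return dict(index)


def find_phrase(index, phrase):
    """Return the start positions at which the token sequence `phrase` occurs."""
    if not phrase:
        return []
    position_sets = [set(index.get(token, ())) for token in phrase]
    return [
        start
        for start in sorted(position_sets[0])
        if all(start + k in position_sets[k] for k in range(1, len(phrase)))
    ]


tokens = "to be or not to be".split()
index = build_positional_index(tokens)
```

Only the position lists of the words in the pattern are consulted, which is the general reason such an index can narrow the range of data a pattern match has to examine.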
TECHNIQUES FOR DATABASE ENTRIES DE-DUPLICATION
A system and method for data entries deduplication are provided. The method includes indexing an input data set, wherein the input data set is in a tabular format and the indexing includes providing a unique Row identifier (RowID), wherein rows are the data entries; computing attribute similarity for each column across each pair of rows; computing, for each pair of rows, row-to-row similarity as a weighted sum of attribute similarities; clustering pairs of rows based on their row-to-row similarities; and providing an output data set including at least the clustered pairs of rows.
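The steps in this abstract can be sketched as follows. The sketch makes several assumptions not stated in the source: attribute similarity is a string ratio from Python's `difflib`, the column weights and the 0.85 threshold are arbitrary, and clustering is done by merging every pair whose score meets the threshold (a simple union-find). All function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a, b):
    """Similarity in [0, 1] between two cell values (string-based stand-in)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def row_similarity(row_a, row_b, weights):
    """Row-to-row similarity as a weighted sum of per-column similarities."""
    return sum(
        weight * attribute_similarity(row_a[column], row_b[column])
        for column, weight in weights.items()
    )


def deduplicate(rows, weights, threshold=0.85):
    """Return (similar pairs with scores, clusters of RowIDs)."""
    # Indexing step: give every row a unique RowID.
    indexed = dict(enumerate(rows))
    parent = {row_id: row_id for row_id in indexed}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for a, b in combinations(indexed, 2):
        score = row_similarity(indexed[a], indexed[b], weights)
        if score >= threshold:
            pairs.append((a, b, score))
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for row_id in indexed:
        clusters.setdefault(find(row_id), []).append(row_id)
    return pairs, sorted(clusters.values())


rows = [
    {"name": "John Smith", "city": "Boston"},
    {"name": "Jon Smith", "city": "Boston"},
    {"name": "Alice Wong", "city": "Denver"},
]
pairs, clusters = deduplicate(rows, weights={"name": 0.7, "city": 0.3})
```

With these sample rows, the first two entries fall into one cluster and the third stays on its own. Comparing every pair is quadratic in the number of rows, so a larger data set would need some form of blocking or candidate selection before scoring.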
Indexing of large scale patient set
Systems and methods for indexing data include formulating an objective function to index a dataset, a portion of the dataset including supervision information. A data property component of the objective function is determined, which utilizes a property of the dataset to group data of the dataset. A supervised component of the objective function is determined, which utilizes the supervision information to group data of the dataset. The objective function is optimized using a processor based upon the data property component and the supervised component to partition a node into a plurality of child nodes.
Methods and systems for a compliance framework database schema
A compliance framework is generated. The compliance framework facilitates an organization's compliance with multiple authority documents by providing efficient methodologies and refinements to existing technologies, such as hierarchical fidelity to the original authority document; separation of auditable citations from their context (e.g., prepositions and/or informational citations); asset-focused citations; and SNED and Live values, among others.
KNOWLEDGE-BASED INFORMATION RETRIEVAL SYSTEM EVALUATION
Embodiments provide a computer implemented method of evaluating one or more IR systems, the method including: providing, by a processor, a pre-indexed knowledge-based document to a pre-trained sentence identification model; identifying, by the sentence identification model, a predetermined number of query-worthy sentences from the pre-indexed knowledge-based document, wherein the query-worthy sentences are ranked based on a prediction probability value of each query-worthy sentence; providing, by the sentence identification model, the query-worthy sentences to a pre-trained query generation model; generating, by the query generation model, a query for each query-worthy sentence; and evaluating, by the processor, the one or more IR systems using the generated queries, wherein one or more searches are performed via the one or more IR systems, and the one or more searches are performed in a set of knowledge-based documents including the pre-indexed knowledge-based document.
METHOD OF MATCHING A SET TO EVALUATE AND A REFERENCE LIST, CORRESPONDING MATCHING ENGINE AND COMPUTER PROGRAM
A method of matching a set to be evaluated and a reference list, the reference list being associated with a reference vector representative of the entries in the list. Such a method of matching includes: calculating a distance between the reference vector and a vector, associated with the set to evaluate, representative of elements contained in the set to evaluate, the elements comprising character strings and groups of character strings; for each entry in the reference list, calculating a first matching score for the set to evaluate and for the entry in the reference list, on the basis of the distance calculated between the reference vector and the vector associated with the set to evaluate; and providing a list of entries from the reference list ordered according to the first calculated matching scores.