G06F16/316

Combined content indexing and data reduction
09772981 · 2017-09-26

Data storage is improved by combining content indexing and data reduction in text-containing files by using common word elimination. Raw data is processed by finding words in selected files, creating an index of found words, and replacing the words in the raw data with pointers to the corresponding words in the index. Each word appears only once in the index. Consequently, the index is relatively small and the procedure is completely reversible. In particular, the index is small relative to other methods because the data is transformed in place, and the transformed data and index are used together to capture the total information about the data.
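A minimal sketch of the word-to-pointer transformation described above, assuming words are runs of ASCII letters and that pointers are plain integer positions in the index. The names `encode` and `decode` are illustrative and do not come from the patent.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+")  # assumption: a "word" is a run of ASCII letters


def encode(text):
    """Replace each word with a pointer into an index of unique words."""
    index = []      # each distinct word is stored exactly once
    position = {}   # word -> its slot in the index
    tokens = []     # transformed data: ints are pointers, strs are literal gaps
    cursor = 0
    for match in WORD_RE.finditer(text):
        if match.start() > cursor:
            # Keep whitespace and punctuation verbatim so decoding is exact.
            tokens.append(text[cursor:match.start()])
        word = match.group()
        if word not in position:
            position[word] = len(index)
            index.append(word)
        tokens.append(position[word])
        cursor = match.end()
    if cursor < len(text):
        tokens.append(text[cursor:])
    return index, tokens


def decode(index, tokens):
    """Rebuild the original text from the index and the transformed data."""
    return "".join(index[t] if isinstance(t, int) else t for t in tokens)


raw = "the cat saw the other cat"
index, tokens = encode(raw)
restored = decode(index, tokens)
```

Here the index and the transformed data are only meaningful together: the index doubles as the list of searchable words, and `decode` shows that the transformation loses no information.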

IDENTIFYING RELEVANT PAGE CONTENT
20170329846 · 2017-11-16

A computer-implemented method according to one embodiment includes identifying a plurality of related web pages, extracting textual data within the identified plurality of related web pages, determining a plurality of groupings of the extracted textual data, calculating a frequency of each of the determined plurality of groupings within the identified plurality of related web pages, creating a subset of the determined plurality of groupings, based on the calculated frequency of each of the plurality of groupings, and returning the subset of the determined plurality of groupings.
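As a rough illustration of the grouping and frequency steps, the sketch below treats a "grouping" as a word bigram and "frequency" as the number of pages in which the grouping appears. Both readings are assumptions, as are the function names and the `min_pages` threshold.

```python
from collections import Counter


def groupings(text, n=2):
    """Return the set of word n-grams found in one page's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_groupings(pages, n=2, min_pages=2):
    """Keep the groupings that occur on at least `min_pages` pages."""
    counts = Counter()
    for page in pages:
        # A set is passed in, so each page contributes at most 1 per grouping.
        counts.update(groupings(page, n))
    return sorted(g for g, c in counts.items() if c >= min_pages)


# Toy stand-ins for text already extracted from related web pages.
pages = [
    "contact us about solar panels",
    "solar panels for home roofs",
    "contact us for a quote",
]
common = frequent_groupings(pages)
```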

CATEGORY-BASED FULL-TEXT SEARCHING

Various embodiments of the present disclosure provide a solution for category-based full-text searching. In some embodiments, there is provided a method of full-text searching. The method includes generating a first full-text index based on the obtained content of an electronic document. The method also includes categorizing the electronic document to determine a category identifier for the electronic document, and generating a second full-text index based on the category identifier. The method further includes storing the first full-text index and the second full-text index.
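One way to picture the two indexes is as a pair of inverted maps, one keyed by content terms and one keyed by category identifier. The sketch below assumes a toy keyword-based categorizer; `CATEGORY_KEYWORDS`, `categorize`, `add_document`, and `search` are hypothetical names, and the disclosure does not specify how categorization is performed.

```python
from collections import defaultdict

# Hypothetical categorizer rules, for illustration only.
CATEGORY_KEYWORDS = {
    "finance": {"invoice", "payment"},
    "legal": {"contract", "clause"},
}

content_index = defaultdict(set)   # first index: term -> document ids
category_index = defaultdict(set)  # second index: category id -> document ids


def categorize(text):
    """Pick the category whose keywords overlap the text the most."""
    words = set(text.lower().split())
    return max(CATEGORY_KEYWORDS, key=lambda c: len(words & CATEGORY_KEYWORDS[c]))


def add_document(doc_id, text):
    """Index a document by its terms and by its category identifier."""
    for term in set(text.lower().split()):
        content_index[term].add(doc_id)
    category_index[categorize(text)].add(doc_id)


def search(term, category=None):
    """Look up a term, optionally restricted to one category."""
    hits = set(content_index.get(term.lower(), set()))
    if category is not None:
        hits &= category_index.get(category, set())
    return sorted(hits)


add_document(1, "Invoice payment due in March")
add_document(2, "Contract clause about payment terms")
add_document(3, "Contract renewal clause")
```

With these three documents, `search("payment")` matches documents 1 and 2, while `search("payment", "legal")` narrows the result to document 2.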

Group based document retrieval

Embodiments relate to retrieving a document from a plurality of document groups in which mutually related documents are each included. An aspect includes acquiring a retrieval condition that includes a plurality of conditions and at least one logical operator that connects the plurality of conditions. Another aspect includes identifying, with respect to each condition of the plurality of conditions, a document group including a document satisfying the condition from among the plurality of document groups. Another aspect includes identifying a document that satisfies at least one condition. Another aspect includes determining a document that is a retrieval result by making a selection to omit or retain that depends on the at least one logical operator. Another aspect includes generating information showing the document that is the retrieval result based on the retrieval condition.
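The sketch below is one possible reading of this flow, not the claimed method: each condition is a substring test, the operator decides which document groups are retained (intersection for AND, union for OR), and only documents satisfying at least one condition are returned. The group layout and the `retrieve` function are assumptions made for illustration.

```python
# Toy document groups: each group holds mutually related documents.
groups = {
    "thread-1": {"d1": "budget proposal draft", "d2": "approved by finance"},
    "thread-2": {"d3": "budget meeting notes"},
}


def retrieve(groups, conditions, operator):
    """Return ids of matching documents from the groups the operator retains."""
    # For each condition, find the groups containing at least one matching document.
    groups_per_condition = [
        {g for g, docs in groups.items() if any(cond in text for text in docs.values())}
        for cond in conditions
    ]
    if operator == "AND":
        kept_groups = set.intersection(*groups_per_condition)
    elif operator == "OR":
        kept_groups = set.union(*groups_per_condition)
    else:
        raise ValueError(f"unsupported operator: {operator}")
    # Within retained groups, keep documents that satisfy at least one condition.
    return sorted(
        doc_id
        for g in kept_groups
        for doc_id, text in groups[g].items()
        if any(cond in text for cond in conditions)
    )


and_result = retrieve(groups, ["budget", "approved"], "AND")
or_result = retrieve(groups, ["budget", "approved"], "OR")
```

Under AND, document d3 is omitted because no document in its group satisfies the second condition, even though d3 itself matches the first.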

Method and system for indexing in datastores

A method, system, apparatus, and computer program product are provided for indexing information stored in datastores. The system receives a new index request. The system creates an index in response to the request. The new index includes at least one segment, a first flag, and a last flag. Each segment comprises index summary information. The system then stores the index in memory.

METHODS AND SYSTEMS FOR GENERATING A VIRTUAL ASSISTANT IN A MESSAGING USER INTERFACE
20210409363 · 2021-12-30

A system for generating a virtual assistant in a messaging user interface, the system including a computing device configured to: initiate a virtual message user interface between a user client device and the computing device; receive a user message entered by a user into the virtual message user interface; retrieve data relating to a user agenda list including a plurality of agenda actions; analyze the user message to identify an agenda action related to the user message; and generate a response to the user message as a function of analyzing the user message, wherein generating the response further comprises generating a user-action learner that takes a previous message and the user message as input and outputs a response; identifying a response as a function of generating the user-action learner; and displaying the response within the virtual message user interface.

Relation extraction across sentence boundaries

Systems, methods, and computer-readable media provide entity relation extraction across sentences in a document using distant supervision. A computing device can receive an input, such as a document comprising a plurality of sentences. The computing device can identify syntactic and/or semantic links between words in a sentence and/or between words in different sentences, and extract relationships between entities throughout the document. A knowledge base (e.g., a table, chart, database, etc.) of entity relations based on the extracted relationships can be populated. An output of the populated knowledge base can be used by a classifier to identify additional relationships between entities in various documents. Machine learning can be applied to train the classifier to predict relations between entities. The classifier can be trained using known entity relations, syntactic links, and/or semantic links.

Differential indexing for fast database search

Methods, systems, and computer programs are presented for improving search speed and quality using differential indexing. One method includes an operation for building a first index for a database, the first index being for first tokens resulting from normalizing words in input data. Further, the method includes building a second index for the database, the second index being for second tokens comprising words of the input data eliminated from the first index during the normalizing. The method further includes operations for receiving a raw query for a search of the database, and for generating a search query based on tokens of the raw query. The search query comprises a combined search of the first index and the second index. A search is performed based on the search query, and results of the search are returned for presentation on a display.
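A minimal sketch of the two-index idea, assuming normalization means lowercasing plus a crude plural strip, and assuming the combined search simply adds one point per index hit. The `normalize`, `add`, and `search` names and the scoring scheme are illustrative and are not taken from the patent.

```python
from collections import defaultdict

first_index = defaultdict(set)   # normalized token -> document ids
second_index = defaultdict(set)  # original word lost to normalization -> document ids


def normalize(word):
    """Toy normalizer: lowercase and drop a trailing 's' from longer words."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w


def add(doc_id, text):
    """Index each word under its normalized token and, if altered, its raw form."""
    for word in text.split():
        token = normalize(word)
        first_index[token].add(doc_id)
        if token != word:
            # The exact surface form would otherwise be lost; keep it separately.
            second_index[word].add(doc_id)


def search(raw_query):
    """Search both indexes; exact surface-form matches rank higher."""
    scores = defaultdict(int)
    for word in raw_query.split():
        for doc_id in first_index.get(normalize(word), ()):
            scores[doc_id] += 1
        for doc_id in second_index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


add(1, "Apple releases new phones")
add(2, "apple pie recipes")
add(3, "phone repair")
```

In this toy data, a query for "Apple" finds both documents through the first index, and the capitalized form held in the second index lifts document 1 above document 2.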

Deriving signature-based rules for creating events from machine data

Methods and apparatus consistent with the invention provide the ability to organize and build understandings of machine data generated by a variety of information-processing environments. Machine data is a product of information-processing systems (e.g., activity logs, configuration files, messages, database records) and represents the evidence of particular events that have taken place and been recorded in raw data format. In one embodiment, machine data is turned into a machine data web by organizing machine data into events and then linking events together.

GENERATION OF PROCESS MODELS IN DOMAINS WITH UNSTRUCTURED DATA
20210390128 · 2021-12-16

A computing server is configured to process data of a domain from heterogeneous data sources. A domain may store data and schema, domain knowledge ontology such as Resource Description Framework (RDF), and unstructured data. The computing server may extract objects, such as named entities and activities, from the unstructured data. The computing server may convert the extracted named entities and activities to word embeddings and input the word embeddings to a machine learning model to generate an activity time sequence. The machine learning model may be a long short-term memory (LSTM) network. A process model may be generated from the time sequence. The computing server may identify outliers in the process model based on metrics defined by the domain. The computing server may convert transactions without outliers into word embeddings and generate signatures of the transactions using cosine similarity. The computing server may augment the results with the domain knowledge ontology.