G06F16/316

On-demand, dynamic and optimized indexing in natural language processing

In indexing for natural language processing, a request to access a document is received from a user at a server, and the server routes the request to an indexing server. A validation service checks whether the CUID of the document is available in the indexing server repository or in a file system associated with the indexing server. If the CUID exists, the validation service determines whether the timestamp of the new document matches the timestamp of the previously indexed document. If both conditions are fulfilled, the previously indexed data is returned to the server. If they are not, a transformation service is invoked at the indexing server. The transformation service compares a hash value of a dataset in the document. If the transformation service determines that the hash value is not available, an indexing service is invoked to index the document.
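
The decision flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names IndexingServer, IndexEntry, and handle_request are hypothetical, SHA-256 stands in for the unspecified hash, and a simple token-position map stands in for the unspecified indexing step. It also assumes that a matching hash means the earlier index can be reused.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    timestamp: float
    content_hash: str
    indexed_data: dict


@dataclass
class IndexingServer:
    # Hypothetical repository: CUID -> IndexEntry
    repository: dict = field(default_factory=dict)
    index_calls: int = 0

    def _index(self, text):
        """Stand-in indexing service: token -> list of token offsets."""
        self.index_calls += 1
        positions = {}
        for pos, token in enumerate(text.lower().split()):
            positions.setdefault(token, []).append(pos)
        return positions

    def handle_request(self, cuid, timestamp, text):
        entry = self.repository.get(cuid)
        # Validation service: CUID known and timestamp unchanged.
        if entry is not None and entry.timestamp == timestamp:
            return entry.indexed_data
        # Transformation service: compare content hashes (assumed SHA-256).
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if entry is not None and entry.content_hash == digest:
            entry.timestamp = timestamp  # same content, refresh timestamp only
            return entry.indexed_data
        # Indexing service: hash not available, so index on demand.
        data = self._index(text)
        self.repository[cuid] = IndexEntry(timestamp, digest, data)
        return data
```

In this sketch the expensive indexing step runs only when both the timestamp check and the hash check fail, which is the on-demand behavior the abstract describes.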

System and methods for units-based numeric information retrieval
09830378 · 2017-11-28

An information retrieval and analysis system for numeric data provides high precision and recall for numeric search and uses a methodology for determining the context of the extracted data. Its capabilities include extracting, parsing, and contextualizing numeric data comprising both a numeric value and an accompanying unit. The system facilitates the organization of largely unstructured numeric data into an inverted index and other database formats. It enables the exploration and refinement of an extracted numeric data set defined by a search input that may be precise or initially vague. The system also facilitates analyzing and portraying numeric data graphically, creating knowledge by combining data from multiple sources, extracting correlations between seemingly disparate variables, and recognizing numeric data trends. It uses local natural language processing, mathematical analysis, and expert-based scientific heuristics to score the numeric and contextual relevancy of the data to the query parameters.
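
A minimal sketch of the extraction and inverted-index steps is given below. It covers only value-plus-unit extraction and a range lookup; contextualization and relevancy scoring are not modeled. The unit list, the conversion table TO_BASE, and the function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical unit table: unit -> (base unit, multiplier to base unit).
TO_BASE = {
    "kg": ("kg", 1.0), "g": ("kg", 0.001),
    "km": ("m", 1000.0), "m": ("m", 1.0), "cm": ("m", 0.01),
}
# A number followed by one of the known units.
UNIT_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*(kg|g|km|m|cm)\b")


def build_index(docs):
    """Map each base unit to a sorted list of (value_in_base_unit, doc_id)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for match in UNIT_PATTERN.finditer(text):
            base, factor = TO_BASE[match.group(2)]
            index[base].append((float(match.group(1)) * factor, doc_id))
    for postings in index.values():
        postings.sort()
    return index


def range_query(index, base_unit, low, high):
    """Return ids of documents holding a value within [low, high]."""
    hits = {doc for value, doc in index.get(base_unit, []) if low <= value <= high}
    return sorted(hits)
```

Normalizing to a base unit before indexing lets a query stated in metres match a document that reports kilometres.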

METHODS AND SYSTEMS FOR PROVIDING A SEARCH SERVICE APPLICATION

A system for providing a search service application is disclosed and includes an application builder component that provides a search model for a first object of a plurality of objects. The search model is based at least on an end-user input field corresponding to a first attribute of the first object and a search result output field corresponding to a second attribute of the first object. The search model is also associated with a backend data store that supports a storage structure that stores information relating to the first object. The system also includes a deployment engine that automatically configures a search engine system associated with the backend data store to place a portion of indexed data into a first partition and to place another portion of indexed data into at least another partition based on the search model.

Dynamic detection of cross-document associations

Systems and methods are configured to generate a set of related document objects for a predictive entity and/or to generate an optimal document sequence for a set of related document objects. In one embodiment, a set of related document objects for a predictive entity is generated by processing entity metadata features associated with the predictive entity using an entity-document correlation machine learning model, and an optimal document sequence is generated for the set of related document objects by processing the set of related document objects using a document sequence optimization machine learning model.

Methods and systems for indexing references to documents of a database and for locating documents in the database
09824109 · 2017-11-21

Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block.
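
The block-wise encoding can be illustrated with a small sketch. It assumes details the abstract does not give: references are sorted non-negative integers, each block stores gaps between consecutive references, the "encoding pattern" is a fixed byte width per gap, and the pointer is a one-byte index into the PATTERNS table. Gaps of 2^32 or more are not handled.

```python
M = 4                 # references per block (assumed value)
PATTERNS = (1, 2, 4)  # hypothetical table of encoding patterns: bytes per gap


def encode_posting_list(doc_ids, m=M):
    """Split sorted ids into blocks of m gaps; append a pattern pointer to each."""
    blocks = []
    prev = 0
    for start in range(0, len(doc_ids), m):
        gaps = []
        for doc_id in doc_ids[start:start + m]:
            gaps.append(doc_id - prev)
            prev = doc_id
        # Pick the narrowest pattern that fits the largest gap in this block.
        pointer = next(i for i, w in enumerate(PATTERNS) if max(gaps) < 256 ** w)
        width = PATTERNS[pointer]
        payload = b"".join(gap.to_bytes(width, "big") for gap in gaps)
        blocks.append(payload + bytes([pointer]))
    return blocks


def decode_posting_list(blocks):
    """Use each block's trailing pointer to look up how to decode it."""
    doc_ids = []
    prev = 0
    for block in blocks:
        width = PATTERNS[block[-1]]
        payload = block[:-1]
        for i in range(0, len(payload), width):
            prev += int.from_bytes(payload[i:i + width], "big")
            doc_ids.append(prev)
    return doc_ids
```

Because the pattern is chosen per block, a run of closely spaced references costs one byte each even when other parts of the same posting list need four.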

METHOD AND APPARATUS FOR INFORMATION ACQUISITION, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
20230169100 · 2023-06-01

The present disclosure relates to the field of natural language processing technologies, and more particularly, to a method and an apparatus for information acquisition, an electronic device, and a computer-readable storage medium. The method includes: recognizing at least one entity retrieval word in a to-be-answered question; performing information retrieval according to the at least one entity retrieval word to obtain a retrieval text in a sub-graph form corresponding to the at least one entity retrieval word; determining a retrieval text in a target sub-graph form by matching the retrieval text in the sub-graph form with the to-be-answered question; and determining a target answer to the to-be-answered question according to the retrieval text in the target sub-graph form.

Centralized coordination of data collection tasks from multiple sources
09807192 · 2017-10-31

A scheduler manages execution of a plurality of data-collection jobs, assigns individual jobs to specific forwarders in a set of forwarders, and generates and transmits tokens (e.g., pairs of data-collection tasks and target sources) to the assigned forwarders. Each forwarder uses its tokens, along with stored information applicable across jobs, to collect data from the target source and forward it to an indexer for processing. For example, the indexer can then break a data stream into discrete events, extract a timestamp from each event, and index (e.g., store) the event based on the timestamp. The scheduler can monitor the forwarders' job performance and use it to influence subsequent job assignments. Thus, data-collection jobs can be efficiently assigned to and executed by a group of forwarders, where the group can potentially be diverse and dynamic in size.
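
The indexer step mentioned as an example can be sketched as below. The sketch assumes one event per line, with each line starting with an ISO 8601 timestamp followed by a space; the function names are hypothetical, and the scheduler and forwarders are not modeled.

```python
import bisect
from datetime import datetime


def index_stream(stream):
    """Break a raw stream into events and keep them sorted by timestamp."""
    index = []
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        stamp, _, body = line.partition(" ")
        timestamp = datetime.fromisoformat(stamp)
        bisect.insort(index, (timestamp, body))
    return index


def events_between(index, start, end):
    """Return event bodies whose timestamps fall within [start, end]."""
    stamps = [timestamp for timestamp, _ in index]
    lo = bisect.bisect_left(stamps, start)
    hi = bisect.bisect_right(stamps, end)
    return [body for _, body in index[lo:hi]]
```

Keeping the events ordered by timestamp makes time-range lookups a pair of binary searches, regardless of the order in which forwarders delivered the data.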

Document Collaboration Discovery

Technologies are described herein for document collaboration discovery. A collaboration system enables users to collaboratively author documents. The collaboration system receives edits to a document in real or near real time, and indexes the edits in a search index. The collaboration system can also receive and index metadata associated with the document. The collaboration system can also receive a search query from a user and perform a search of the search index. If the document is identified by the search, the user can request to be admitted as an active editor of the document. The user can also request to join a real-time messaging session with other active editors of the document. The active editors can be notified of the search terms that led the user to the document, and can indicate whether the user is to be admitted to the document as an active editor or to the real-time messaging session.

COMPUTER READABLE RECORDING MEDIUM, INDEX GENERATION DEVICE AND INDEX GENERATION METHOD
20170300507 · 2017-10-19

An index generation device 100 generates key presence information for a plurality of input files when lexical analysis of the plurality of input files is executed, the key presence information including information on whether each of a plurality of keys is present in the plurality of input files and the presence positions of the respective keys when the respective keys are present in the plurality of input files. The index generation device 100 generates index information about the keys and the positions for the plurality of input files based on the key presence information.
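
A minimal sketch of the two stages follows. It assumes that lexical analysis is simple word tokenization, that presence is recorded as one 0/1 flag per file for each key, and that positions are token offsets; the function names are hypothetical.

```python
import re


def build_key_presence(files):
    """During tokenization, record per-key presence flags and positions."""
    file_ids = sorted(files)
    presence = {}   # key -> list of 0/1 flags, one per file in file_ids order
    positions = {}  # key -> {file_id: [token offsets]}
    for column, file_id in enumerate(file_ids):
        tokens = re.findall(r"\w+", files[file_id].lower())
        for offset, token in enumerate(tokens):
            presence.setdefault(token, [0] * len(file_ids))[column] = 1
            positions.setdefault(token, {}).setdefault(file_id, []).append(offset)
    return file_ids, presence, positions


def build_index(file_ids, presence, positions):
    """Derive key -> [(file_id, offsets)] from the key presence information."""
    index = {}
    for key, flags in presence.items():
        index[key] = [
            (file_id, positions[key][file_id])
            for file_id, flag in zip(file_ids, flags)
            if flag
        ]
    return index
```

The presence flags are filled in during the same pass that tokenizes the files, so the index can be derived afterwards without reading the files again.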

Electronic device and control method

Disclosed are an artificial intelligence (AI) system using a machine learning algorithm such as deep learning, and an application thereof. The present disclosure provides an electronic device comprising: an input unit for receiving content data; a memory for storing information on the content data; an audio output unit for outputting the content data; and a processor that acquires a plurality of data keywords by analyzing the inputted content data, matches and stores time stamps of the content data respectively corresponding to the plurality of acquired keywords, searches, when a user command is inputted, the stored data keywords for a data keyword corresponding to the inputted user command, and plays the content data based on the time stamp corresponding to the found data keyword.
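
The keyword-to-time-stamp lookup can be sketched as follows. The sketch assumes the content has already been transcribed into (start time in seconds, text) segments and treats every word as a keyword; the machine learning analysis described in the abstract is not modeled, and the function names are hypothetical.

```python
def build_keyword_timestamps(segments):
    """Map each keyword to the start times of the segments that contain it."""
    table = {}
    for start, text in segments:
        for word in text.lower().split():
            table.setdefault(word, []).append(start)
    return table


def find_playback_position(table, command):
    """Return the first time stamp of the first command word that is a keyword."""
    for word in command.lower().split():
        if word in table:
            return table[word][0]
    return None  # no stored keyword matches the command
```

A player would then seek to the returned time stamp, or leave playback unchanged when the result is None.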