G06F16/316

Smart exchange database index

A full-text index can be created for each mailbox of an Exchange database (EDB) so that complex queries can search email data quickly. Relevant email data can then be identified and retrieved quickly and efficiently from the full-text index rather than from the EDB itself. To create such indexes, each email in a mailbox can be retrieved and converted from its native format into textual name/value pairs, which are then submitted for indexing. Indexing each email as name/value pairs enables emails across all mailboxes to be queried efficiently using any combination of values.
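
The name/value approach can be illustrated with a toy inverted index. This is a minimal sketch, not the patented system: each email is assumed to be an already-parsed dict (the EDB's native format is not described here), `to_name_value_pairs` and `NameValueIndex` are hypothetical names, and matching is exact on whole lowercased values rather than tokenized full-text search.

```python
from collections import defaultdict


def to_name_value_pairs(email: dict) -> list[tuple[str, str]]:
    """Flatten a parsed email (hypothetical dict form) into textual (name, value) pairs."""
    pairs = []
    for name, value in email.items():
        if name == "id":
            continue  # the id identifies the email; it is not indexed as a value
        values = value if isinstance(value, list) else [value]
        for v in values:
            pairs.append((name, str(v).lower()))
    return pairs


class NameValueIndex:
    """Toy inverted index mapping each (name, value) pair to the ids of matching emails."""

    def __init__(self) -> None:
        self.postings: defaultdict[tuple[str, str], set] = defaultdict(set)

    def add(self, email: dict) -> None:
        for pair in to_name_value_pairs(email):
            self.postings[pair].add(email["id"])

    def query(self, **criteria: str) -> set:
        """Return ids of emails matching every name=value criterion (set intersection)."""
        result = None
        for name, value in criteria.items():
            ids = self.postings.get((name, value.lower()), set())
            result = ids if result is None else result & ids
        return set(result or ())
```

Intersecting posting sets is what allows any combination of values to be queried; a production full-text engine would also tokenize values and rank the results.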

POST-SPEECH RECOGNITION REQUEST SURPLUS DETECTION AND PREVENTION

Systems and methods are described herein for determining that artificial commands detected by multiple voice-activated electronic devices exceed a threshold value. In some embodiments, numerous voice-activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase may be generated, along with a count of the instances of that text data. If the count exceeds a predefined threshold, the backend system may stop any remaining response-generation processing for that particular command and return the devices that sent it to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the threshold to be exceeded may be generated, so that future instances of the same phrase can be recognized before text data is generated, conserving the backend system's resources.
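
The counting step can be sketched as follows, assuming requests have already been batched by arrival time and transcribed. The threshold value of 3, the whitespace/case normalization, and the function names are hypothetical; the sound-profile step is not shown.

```python
from collections import Counter

# Hypothetical value; the abstract only refers to "a predefined threshold".
DEFAULT_THRESHOLD = 3


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so identical phrases are counted together."""
    return " ".join(text.lower().split())


def find_surplus_requests(
    transcripts: list[tuple[str, str]], threshold: int = DEFAULT_THRESHOLD
) -> dict[str, list[str]]:
    """Map each phrase whose count exceeds `threshold` to the devices that sent it.

    `transcripts` holds (device_id, text) pairs for requests that arrived at
    substantially the same time; the batching is assumed to happen upstream.
    """
    normalized = [(device, normalize(text)) for device, text in transcripts]
    counts = Counter(text for _, text in normalized)
    surplus = {text for text, n in counts.items() if n > threshold}
    # Devices listed here would have response generation stopped and be returned to sleep.
    return {text: [d for d, t in normalized if t == text] for text in surplus}
```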

Contextual overlay for documents

Approaches provide for analyzing document data to provide contextual overlays. For example, an application executing on a computing device (or at least in communication with the computing device) can analyze document data to determine a set of keywords based on features extracted from the document data. The keywords can be used to query an index of websites based on a relevance function in order to determine websites that are most relevant to the text identified from the document, at least some of which can be analyzed using a search engine to identify contextual information in the websites associated with the document. Thereafter, the contextual information can be provided for display with the document.
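
A toy version of the keyword-and-relevance step is sketched below. It assumes term frequency as the extracted feature and keyword-occurrence counts as the relevance function; the stopword list, function names, and the dict standing in for the website index are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for the toy keyword extractor.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "into"}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(text: str, k: int = 3) -> list[str]:
    """Keep the k most frequent non-stopword terms as the document's keywords."""
    counts = Counter(t for t in tokens(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def rank_websites(keywords: list[str], site_text: dict[str, str]) -> list[str]:
    """Toy relevance function: count keyword occurrences in each site's text, highest first."""

    def relevance(url: str) -> int:
        site_counts = Counter(tokens(site_text[url]))
        return sum(site_counts[k] for k in keywords)

    return sorted(site_text, key=relevance, reverse=True)
```

The top-ranked sites would then be passed to the search engine to pull contextual information for the overlay.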

Framework for Analyzing Table Data by Question Answering Systems

A question answering (QA) system comprising memory for storing instructions, and a processor configured to execute the instructions to ingest source documents that include structured data and unstructured data to create a knowledge base, wherein the unstructured data includes table data; create table annotations to represent the table data; store the ingested structured data, unstructured data, and the table annotations in the knowledge base; and determine answers to questions using the knowledge base.
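
One possible shape for table annotations is sketched below, assuming a simple grid table whose first column holds row labels. `TableAnnotation`, `annotate_table`, and the substring-matching `answer` function are hypothetical; a real QA system would use far richer question analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableAnnotation:
    table_id: str
    row_label: str
    column_label: str
    value: str


def annotate_table(table_id: str, header: list[str], rows: list[list[str]]) -> list[TableAnnotation]:
    """Turn each body cell into an annotation keyed by its row and column labels.

    header[0] labels the row-label column; each row's first cell is its row label.
    """
    annotations = []
    for row in rows:
        row_label, cells = row[0], row[1:]
        for column_label, value in zip(header[1:], cells):
            annotations.append(TableAnnotation(table_id, row_label, column_label, value))
    return annotations


def answer(question: str, knowledge_base: list[TableAnnotation]) -> Optional[str]:
    """Naive matcher: return the value whose row and column labels both appear in the question."""
    q = question.lower()
    for ann in knowledge_base:
        if ann.row_label.lower() in q and ann.column_label.lower() in q:
            return ann.value
    return None
```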

INFORMATION EXTRACTION FROM OPEN-ENDED SCHEMA-LESS TABLES
20200104350 · 2020-04-02

Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. Headers are extracted for each of the tables using a header detection engine. Cells are extracted from each of the tables using a cell extraction engine. A cell document is generated for each cell and correlated to the corresponding portions of the headers; each cell document records the correlation between its cell and the headers. Each cell document is then annotated by a cell recognition model trained to perform natural language processing on cell documents: the model classifies each term in a cell document and extracts relationships between its terms, producing annotated cell documents.
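
The cell-document structure can be sketched as below, assuming multi-level column headers are given as one list per header level with spanning labels repeated. The rule-based `annotate` function is a hypothetical stand-in for the trained cell recognition model: it tags numbers and units and extracts a single "has_unit" relation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical unit vocabulary for the toy annotator below.
UNITS = {"mg", "g", "kg", "ml", "%"}


@dataclass
class CellDocument:
    text: str
    row_header: str
    column_headers: list[str]  # header labels from the top level down
    terms: list[tuple[str, str]] = field(default_factory=list)  # (term, class)
    relations: list[tuple[str, str, str]] = field(default_factory=list)


def build_cell_documents(header_rows: list[list[str]], rows: list[list[str]]) -> list[CellDocument]:
    """header_rows: one list per header level, one label per body column;
    rows: [row_header, cell, cell, ...]."""
    docs = []
    for row in rows:
        row_header, cells = row[0], row[1:]
        for j, cell in enumerate(cells):
            docs.append(CellDocument(cell, row_header, [level[j] for level in header_rows]))
    return docs


def annotate(doc: CellDocument) -> CellDocument:
    """Rule-based stand-in for the trained model: classify terms, then extract relations."""
    for term in doc.text.split():
        if re.fullmatch(r"\d+(?:\.\d+)?", term):
            label = "NUMBER"
        elif term.lower() in UNITS:
            label = "UNIT"
        else:
            label = "WORD"
        doc.terms.append((term, label))
    # Relation extraction: a number immediately followed by a unit.
    for (a, la), (b, lb) in zip(doc.terms, doc.terms[1:]):
        if la == "NUMBER" and lb == "UNIT":
            doc.relations.append(("has_unit", a, b))
    return doc
```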

MACHINE LEARNING WORKER NODE ARCHITECTURE

A database contains a corpus of incident reports, a machine learning (ML) model trained to calculate paragraph vectors of the incident reports, and a look-up set table that contains a list of paragraph vectors respectively associated with sets of the incident reports. A plurality of ML worker nodes each store the look-up set table and are configured to execute the ML model. An update thread is configured to: determine that the look-up set table has expired; update the look-up set table by: (i) adding a first set of incident reports received since a most recent update of the look-up set table, and (ii) removing a second set of incident reports containing timestamps that are no longer within a sliding time window; store, in the database, the look-up set table as updated; and transmit, to the ML worker nodes, respective indications that the look-up set table has been updated.
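
The update thread's sliding-window logic might look like the sketch below. It simplifies the look-up set table to one paragraph vector per report, and the expiry interval, window length, and the `fetch_since`, `persist`, and `notify_workers` callbacks are hypothetical placeholders for the database and worker-node messaging.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class IncidentReport:
    report_id: str
    timestamp: datetime
    vector: tuple[float, ...]  # paragraph vector produced by the ML model


@dataclass
class LookUpSetTable:
    window: timedelta  # sliding time window for retained reports
    ttl: timedelta  # how long the table stays fresh before it expires
    last_update: datetime
    reports: dict[str, IncidentReport] = field(default_factory=dict)

    def is_expired(self, now: datetime) -> bool:
        return now - self.last_update >= self.ttl

    def update(self, new_reports: list[IncidentReport], now: datetime) -> None:
        # (i) add reports received since the most recent update
        for report in new_reports:
            self.reports[report.report_id] = report
        # (ii) drop reports whose timestamps fell out of the sliding window
        cutoff = now - self.window
        self.reports = {rid: r for rid, r in self.reports.items() if r.timestamp >= cutoff}
        self.last_update = now


def update_thread_tick(
    table: LookUpSetTable,
    fetch_since: Callable[[datetime], list[IncidentReport]],
    persist: Callable[[LookUpSetTable], None],
    notify_workers: Callable[[], None],
    now: datetime,
) -> bool:
    """One iteration of the update thread; returns True if an update happened."""
    if not table.is_expired(now):
        return False
    table.update(fetch_since(table.last_update), now)
    persist(table)  # store the updated table in the database
    notify_workers()  # tell each ML worker node that the table was updated
    return True
```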

Electronic device and control method

Disclosed are an artificial intelligence (AI) system using a machine learning algorithm such as deep learning, and an application thereof. The present disclosure provides an electronic device comprising: an input unit for receiving content data; a memory for storing information on the content data; an audio output unit for outputting the content data; and a processor. The processor acquires a plurality of data keywords by analyzing the input content data, and matches and stores the timestamps of the content data corresponding to each of the acquired keywords. When a user command is input, the processor searches the stored data keywords for a keyword corresponding to the command and plays the content data based on the timestamp corresponding to that keyword.
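
The keyword-to-timestamp matching can be sketched as follows, assuming the content has already been transcribed into timestamped segments. The word-length heuristic is a hypothetical stand-in for the AI keyword extraction the abstract describes, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical heuristic: treat words of at least this length as keywords.
MIN_KEYWORD_LEN = 4


@dataclass
class Segment:
    start: float  # timestamp in seconds
    text: str


def _words(text: str) -> list[str]:
    return [w.strip(".,!?;:").lower() for w in text.split()]


def index_keywords(segments: list[Segment]) -> dict[str, float]:
    """Match each keyword to the timestamp of the segment where it first occurs."""
    index: dict[str, float] = {}
    for seg in segments:
        for word in _words(seg.text):
            if len(word) >= MIN_KEYWORD_LEN:
                index.setdefault(word, seg.start)
    return index


def seek_position(index: dict[str, float], command: str) -> Optional[float]:
    """Return the playback position for the first command word that is a stored keyword."""
    for word in _words(command):
        if word in index:
            return index[word]
    return None
```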

Apparatus and method for providing indexing and search service based on important sentence

Disclosed herein are an apparatus and method for providing a search service based on important sentences. The apparatus includes memory, in which at least one program and a previously trained word importance measurement model are recorded, and a processor for executing the program. The program may include a word importance measurement unit, which uses the word importance measurement model to measure the importance of each of the words in the input text, and a sentence importance measurement unit, which measures the importance of each sentence in the text based on the measured word importances.
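
A minimal sketch of the two-stage scoring follows, assuming a plain dict of word weights in place of the trained word importance measurement model, and assuming sentence importance is the mean of its word scores. Both the aggregation rule and the names are hypothetical.

```python
import re


def sentence_importance(sentence: str, word_importance: dict[str, float]) -> float:
    """Average the importance scores of the words in a sentence (unknown words score 0)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if not words:
        return 0.0
    return sum(word_importance.get(w, 0.0) for w in words) / len(words)


def important_sentences(text: str, word_importance: dict[str, float], top_k: int = 1) -> list[str]:
    """Split text into sentences and return the top_k by sentence importance."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=lambda s: sentence_importance(s, word_importance), reverse=True)[:top_k]
```

Averaging rather than summing keeps long sentences from winning simply by containing more words; either choice is plausible under the abstract.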

Generation of process models in domains with unstructured data

A computing server may be configured to process data of a domain from heterogeneous data sources. A domain may store data and schemas, a domain knowledge ontology such as one expressed in the Resource Description Framework (RDF), and unstructured data. The computing server may extract objects, such as named entities and activities, from the unstructured data, convert them to word embeddings, and input the word embeddings to a machine learning model, such as a long short-term memory (LSTM) network, to generate an activity time sequence. A process model may be generated from the time sequence. The computing server may identify outliers in the process model based on metrics defined by the domain. The computing server may convert transactions without outliers to word embeddings and generate signatures of the transactions using cosine similarity. The computing server may augment the results with the domain knowledge ontology.
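
One way the cosine-similarity signature step could work is sketched below. It assumes each transaction's embedding is the mean of its activity embeddings and groups transactions by simple leader clustering: a transaction joins the first signature whose cosine similarity meets a threshold, otherwise it starts a new signature. The threshold, the clustering scheme, and all names are hypothetical.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_signature(
    transactions: dict[str, list[str]],
    embeddings: dict[str, list[float]],
    threshold: float = 0.9,
) -> list[tuple[list[float], list[str]]]:
    """transactions: id -> activity names; embeddings: activity -> vector.

    Returns (signature_vector, member_ids) pairs; each signature is the
    embedding of the first transaction that founded it.
    """
    signatures: list[tuple[list[float], list[str]]] = []
    for tid, activities in transactions.items():
        vec = mean_vector([embeddings[a] for a in activities])
        for signature, members in signatures:
            if cosine(signature, vec) >= threshold:
                members.append(tid)
                break
        else:
            signatures.append((vec, [tid]))
    return signatures
```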

UNSTRUCTURED DATA FUSION BY CONTENT-AWARE CONCURRENT DATA PROCESSING PIPELINE
20200082015 · 2020-03-12

The disclosure relates to a data analytics platform in which a linear pipeline processing framework may use an abstracted query language to define a data fusion pipeline assembly mechanism. More particularly, the linear pipeline processing framework may include various operator groups that work in conjunction to organize data entries that can have substantially disparate data types (e.g., text, binary, video, audio, etc.) into a single normalized stream such that one or more processing modules may perform type-specific data processing and feature extraction, normalize an output into a single stream, and finally render the different data types as a fused output.
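
The routing-and-fusion idea can be sketched as a small generator pipeline. This does not model the abstracted query language or operator groups; the two processors, the `Entry`/`Record` types, and the choice to skip unsupported data types are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass
class Entry:
    source: str  # which record or case the entry belongs to
    kind: str  # data type, e.g. "text" or "binary"
    payload: Any


@dataclass
class Record:
    source: str
    kind: str
    features: dict[str, Any]


# Hypothetical type-specific processors; real ones would perform feature
# extraction such as speech-to-text or frame analysis.
PROCESSORS: dict[str, Callable[[Entry], dict[str, Any]]] = {
    "text": lambda e: {"tokens": len(e.payload.split())},
    "binary": lambda e: {"size_bytes": len(e.payload)},
}


def normalize_stream(entries: Iterable[Entry]) -> Iterator[Record]:
    """Route each entry to its type-specific processor and emit one normalized stream."""
    for entry in entries:
        processor = PROCESSORS.get(entry.kind)
        if processor is None:
            continue  # no processor registered for this type; skip it
        yield Record(entry.source, entry.kind, processor(entry))


def fuse(records: Iterable[Record]) -> dict[str, dict[str, Any]]:
    """Merge the normalized records for each source into one fused feature set."""
    fused: dict[str, dict[str, Any]] = {}
    for record in records:
        target = fused.setdefault(record.source, {})
        for name, value in record.features.items():
            target[f"{record.kind}.{name}"] = value
    return fused
```

Because `normalize_stream` is a generator, entries flow through one at a time, which keeps the sketch close to the idea of a single concurrent stream rather than per-type batches.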