G06F16/316

System and method for document data extraction, data indexing, data searching and data filtering

Systems and methods are described for extracting data from digital documents, indexing the data, and providing a user interface for filtering the data and generating a document based on the filtered data. In one implementation, a method includes extracting data from one or more digital documents, the extracted data including elements of a first type, the elements of the first type including key-value pairs; indexing the extracted data; hosting a web-based application instance, the web-based application instance including a user interface for searching the indexed data and filtering elements of the first type based on rules defined by a user of the user interface; receiving rules for filtering the elements of the first type; and filtering the elements of the first type based on the received rules.
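The extract, index, and filter steps in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes each document is plain text with one "Key: value" pair per line, uses an in-memory dictionary as the index, and models user-defined rules as predicates on values. The names `extract_pairs`, `build_index`, and `filter_pairs` are hypothetical.

```python
from collections import defaultdict


def extract_pairs(text):
    """Extract (key, value) pairs from lines of the assumed form 'Key: value'."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            pairs.append((key.strip(), value.strip()))
    return pairs


def build_index(docs):
    """Map each lowercased key to a list of (doc_id, value) entries."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in extract_pairs(text):
            index[key.lower()].append((doc_id, value))
    return index


def filter_pairs(index, rules):
    """Keep, for every key that has a rule, the entries whose value passes it.

    `rules` maps a key to a predicate on the value (an assumed rule format).
    """
    return {
        key: [(doc_id, value) for doc_id, value in index.get(key, []) if passes(value)]
        for key, passes in rules.items()
    }


docs = {
    "inv-1": "Vendor: Acme\nTotal: 120",
    "inv-2": "Vendor: Globex\nTotal: 80",
}
index = build_index(docs)
rules = {"total": lambda value: float(value) >= 100}
filtered = filter_pairs(index, rules)
print(filtered)
```

A web-based front end would collect the rules from the user and pass them to a function like `filter_pairs`; that layer is omitted here.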

INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PROGRAM

An information processing apparatus includes a segment obtaining section that obtains a segment described in a document designated by a user, an extraction condition obtaining section that obtains an extraction condition for extracting information including a concept related to the segment as knowledge information from a concept structure information storage section storing concept structure information in which concepts representing events and relationships related to knowledge are related to each other in a hierarchical structure, a specifying section that specifies a storage location of the knowledge information in the concept structure information storage section and an extraction method for the concept included in the knowledge information from a designated content of the extraction condition, an extraction section that extracts the knowledge information in accordance with the specified extraction method from the storage location specified by the specifying section, and a presentation section that presents the knowledge information to the user.

Edoc utility using non-structured-query-language databases

A database management system for processing large volumes of data in a key-value store database is provided. The system may be configured to receive a plurality of filled fillable request forms where each request form may include a request including a plurality of field labels and a plurality of fillable text fields corresponding to each of the plurality of the field labels. The system may be configured to extract each set of inputted data from each fillable text field. The system may be configured to store, in the key-value store database, for each request form, each of the plurality of field labels and the corresponding set of inputted data as a combination key-value pair. The combination key may be equal to a WIP ID number, form ID number and field ID number. The corresponding value may be equal to the set of data of the corresponding field ID number.
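The combination key described here can be illustrated with a short sketch. It assumes a plain Python dictionary standing in for the key-value store database and integer ID numbers; `FieldKey`, `store_form`, and `load_form` are hypothetical names introduced only for this example.

```python
from typing import Dict, NamedTuple


class FieldKey(NamedTuple):
    """Combination key: WIP ID number, form ID number, and field ID number."""
    wip_id: int
    form_id: int
    field_id: int


def store_form(store: Dict[FieldKey, str], wip_id: int, form_id: int,
               fields: Dict[int, str]) -> None:
    """Store each field's inputted data under its combination key."""
    for field_id, data in fields.items():
        store[FieldKey(wip_id, form_id, field_id)] = data


def load_form(store: Dict[FieldKey, str], wip_id: int, form_id: int) -> Dict[int, str]:
    """Collect all stored fields belonging to one request form."""
    return {
        key.field_id: data
        for key, data in store.items()
        if key.wip_id == wip_id and key.form_id == form_id
    }


store: Dict[FieldKey, str] = {}
store_form(store, wip_id=7, form_id=2, fields={1: "Jane Doe", 2: "2020-08-11"})
store_form(store, wip_id=7, form_id=3, fields={1: "John Roe"})
print(load_form(store, wip_id=7, form_id=2))
```

A real key-value store would typically serialize the three IDs into a single key string or byte sequence; the tuple is used here only to keep the structure visible.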

METHODS AND ARRANGEMENTS TO ADJUST COMMUNICATIONS

Logic may adjust communications between customers. Logic may cluster customers into a first group associated with a first subset of synonyms and a second group associated with a second subset of the synonyms. Logic may associate a first tag with the first group and with each of the synonyms of the first subset. Logic may associate a second tag with the second group and with each of the synonyms of the second subset. Logic may associate one or more models with pairs of the groups. A first pair may comprise the first group and the second group. The first model associated with the first pair may adjust words in communications between the first group and the second group, based on the synonyms associated with the first pair, by replacement of words in a communication between customers of the first subset and customers of the second subset.
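The word replacement step can be sketched with a simple lookup table in place of a trained model. This is an assumption made for illustration: the abstract does not say how the model works, and the word pairs and the function name `adjust_message` below are invented.

```python
import re


def adjust_message(message, replacements):
    """Replace whole words found in `replacements` (lowercase keys).

    Words without an entry, punctuation, and spacing are left unchanged.
    """
    def swap(match):
        word = match.group(0)
        return replacements.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, message)


# Hypothetical mapping for the pair (first group -> second group):
# each synonym preferred by the first group maps to the second group's synonym.
first_to_second = {"soda": "pop", "sneakers": "trainers"}

print(adjust_message("I spilled soda on my sneakers.", first_to_second))
```

A separate table (or model) would be kept for each pair of groups, as the abstract associates models with pairs rather than with individual groups.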

KNOWLEDGE GRAPHING PLATFORM
20200265075 · 2020-08-20

Knowledge graphing can include: generating a document selection interface that enables a user to browse at least one document storage service and select a set of documents stored on the document storage service for inclusion in a knowledge set; obtaining a set of meta-data for each document selected for the knowledge set from the respective document storage service; and generating a knowledge graph that spatially depicts a set of relationships among the documents of the knowledge set in terms of the meta-data.

Building a data query engine that leverages expert data preparation operations

A method, system and computer program product for building a data query engine. Initial taxonomies that describe and categorize data are built by expert users (e.g., data scientists) employing machine learning algorithms. The data is also indexed and stored in an index. Queries are then received from non-expert users to query the data based on data categorization from built taxonomies and the indexing. After the queries are executed using the machine learning algorithms in an environment (e.g., Hadoop), the results of the queries are rated for relevance, precision and accuracy. The machine learning algorithms are also rated based on the number of successful queries. Those machine learning algorithms with a rating above a threshold are identified to be utilized to scan new data to be stored in the index to provide a new environment that replaces the initial environment.
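The rating and selection step can be sketched as below. It assumes a query log of (algorithm name, success flag) records and takes the rating to be the raw count of successful queries; the log format, the algorithm names, and the function `select_algorithms` are hypothetical.

```python
from collections import Counter


def select_algorithms(query_log, threshold):
    """Return the algorithms whose count of successful queries exceeds `threshold`."""
    ratings = Counter()
    for name, succeeded in query_log:
        ratings[name] += int(succeeded)  # failed queries add 0 to the rating
    return sorted(name for name, rating in ratings.items() if rating > threshold)


query_log = [
    ("kmeans", True),
    ("kmeans", True),
    ("lda", True),
    ("lda", False),
]
print(select_algorithms(query_log, threshold=1))
```

The selected algorithms would then be the ones used to scan new data before it is stored in the index.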

Compressing method, compressing apparatus, and computer-readable recording medium

A non-transitory computer-readable recording medium stores a compressing program that causes a computer to execute a process including: extracting words from a file serving as a processing target; counting how many times each of the extracted words appears; registering bit strings each expressing, in multiple bits, the number of times of appearance into an index so as to be kept in correspondence with the words and the file; among the plurality of bit strings registered in the index while being kept in correspondence with the words and the file, each rearranging, within the bit string, bits included in a first bit string and bits included in a second bit string, so as to be in a different order; and compressing the index in which the bits have been rearranged, by using mutually-different mathematical functions.

DOCUMENT INDEXING, SEARCHING, AND RANKING WITH SEMANTIC INTELLIGENCE
20200257712 · 2020-08-13

There is a need for solutions that perform preprocessing and/or searching of documents with semantic intelligence. This need can be addressed, for example, by performing pre-processing of each document of a plurality of documents to generate an indexed representation for the document by identifying sentences in the document; determining, for each n-gram of one or more n-grams associated with the document, one or more n-gram semantic scores based on semantic proximity indicators for the n-gram; determining, based at least in part on each one or more n-gram semantic scores, one or more sentence semantic labels for each sentence in the document; and determining the indexed representation for the document based at least in part on the one or more sentence semantic labels for the document; performing the search query based on each indexed representation associated with a document; and transmitting the result to a computing device associated with the search query.

Information extraction from open-ended schema-less tables

Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. Headers are extracted for each of the tables using a header detection engine. Cells are extracted from each of the tables using a cell extraction engine. A cell document is generated for each of the cells which are each correlated to corresponding portions of the headers, each cell document recording the correlation between the cells and the headers. Each cell document is annotated to generate annotated cell documents with a cell recognition model trained to perform natural language processing on the cell documents by classifying each term in each of the cell documents and extracting relationships between the terms of each of the cell documents.
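The cell document generation step can be sketched as follows. The sketch assumes table extraction and header detection have already run and produced a table whose first row holds column headers and whose first column holds row headers. The dictionary layout of a cell document and the name `make_cell_documents` are assumptions, and the annotation model is not covered.

```python
def make_cell_documents(table):
    """Create one document per data cell, recording its row and column headers.

    `table` is a list of rows; row 0 holds column headers and the first
    entry of every later row is that row's header (an assumed layout).
    """
    column_headers = table[0][1:]
    documents = []
    for row in table[1:]:
        row_header, cells = row[0], row[1:]
        for column_header, cell in zip(column_headers, cells):
            documents.append({
                "row_header": row_header,
                "column_header": column_header,
                "value": cell,
            })
    return documents


table = [
    ["Region", "Q1", "Q2"],
    ["North", "10", "12"],
    ["South", "7", "9"],
]
for document in make_cell_documents(table):
    print(document)
```

Each resulting document carries enough context to be read on its own, which is what lets a downstream model classify its terms without seeing the rest of the table.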

Data structures for storing and manipulating longitudinal data and corresponding novel computer engines and methods of use thereof
10740345 · 2020-08-11

In some embodiments, the present disclosure provides for an exemplary computer-implemented system that may include a longitudinal data engine, including: a processor and specialized index generation software to generate: an index data structure for a respective event type associated with each respective subject or object; where each respective index data structure is a respective event type-specific data schema, defining how to store events of a particular event type to form longitudinal data of each respective subject or object; an ontology data structure that is configured to describe one or more properties of a respective event of a respective subject or object; and longitudinal data extraction software to extract a respective longitudinal data for a plurality of index data structures and a plurality of ontology data structures associated with a plurality of subjects or objects.