Patent classifications
G06F16/328
DOCUMENT RETRIEVAL SYSTEM AND RETRIEVAL METHOD
A document retrieval system includes a storage device and a retrieval server. The storage device stores: document data including documents compressed so as to correspond to a plurality of divided document groups; indexes indicating correspondence relations between partial character strings for retrieving the documents and first identifiers indicating the documents; and a first correspondence table in which the document groups are divided into document small groups, and second identifiers which indicate the document small groups correspond to the partial character strings. The retrieval server: retrieves, in response to an input of a retrieval character string, a second identifier corresponding to a partial character string included in the retrieval character string, from the first correspondence table; decompresses document data in a document group; retrieves a first identifier corresponding to the partial character string; obtains document data corresponding to the retrieved first identifier; and outputs the retrieved first identifier.
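The two-stage lookup described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes bigrams as the partial character strings, uses `zlib` for compression, and collapses document groups and document small groups into a single level for brevity. The names `bigrams`, `build_store`, and `search` are hypothetical, and document text is assumed not to contain the `\x00` or `\x01` separator characters.

```python
import zlib
from collections import defaultdict


def bigrams(text):
    """Return the set of 2-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_store(groups):
    """groups: {group_id: {doc_id: text}} -> (compressed, index, table)."""
    compressed = {}            # group id -> compressed document data
    index = defaultdict(set)   # bigram -> document ids (first identifiers)
    table = defaultdict(set)   # bigram -> group ids (second identifiers)
    for group_id, docs in groups.items():
        blob = "\x00".join(f"{doc_id}\x01{text}" for doc_id, text in docs.items())
        compressed[group_id] = zlib.compress(blob.encode("utf-8"))
        for doc_id, text in docs.items():
            for gram in bigrams(text):
                index[gram].add(doc_id)
                table[gram].add(group_id)
    return compressed, index, table


def search(query, compressed, index, table):
    """Return {doc_id: text} for documents that contain the query string."""
    grams = bigrams(query)
    if not grams:
        return {}
    # Stage 1: use the correspondence table to pick the groups worth decompressing.
    candidate_groups = set.intersection(*(table.get(g, set()) for g in grams))
    # Stage 2: use the index to pick candidate documents inside those groups.
    candidate_docs = set.intersection(*(index.get(g, set()) for g in grams))
    hits = {}
    for group_id in candidate_groups:
        blob = zlib.decompress(compressed[group_id]).decode("utf-8")
        for record in blob.split("\x00"):
            doc_id, text = record.split("\x01", 1)
            # Bigram matches are only candidates, so confirm on the real text.
            if doc_id in candidate_docs and query in text:
                hits[doc_id] = text
    return hits
```

Only groups that pass the table lookup are decompressed, which is the point of keeping a second, coarser identifier alongside the per-document index.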
Reading and Information Enhancement System and Method
A written document in electronic format (hereinafter referred to as a “work,” which includes stories, novels, education texts, biographies, compilations, collections, anthologies, tracts, and any other traditional format for relatively extensive texts) provides access to reference, bibliography and/or definition material through an electronic software capability associated with the work. Depending upon reader access information or characteristics (e.g., age, grade, proficiency, position within the work, or any other identifiable reader characteristic or access limitation), any request for reference material, definitions, explanations, translations, or other material provided in the associated software capability is automatically limited by system acknowledgement of the reader access information or characteristics. As the reader's access information or characteristics change, the quality, quantity, and/or format of the requested information with respect to a work changes.
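One way to picture the access limitation is a lookup that filters reference entries by reader characteristics. The sketch below is an assumption-laden illustration: the `Entry` fields `min_grade` and `min_position`, and the `lookup` function, are hypothetical names, and only two of the reader characteristics mentioned in the abstract (grade and position within the work) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str
    min_grade: int = 0      # lowest reader grade allowed to see this entry
    min_position: int = 0   # earliest position in the work (e.g. chapter)


def lookup(term, reference, grade, position):
    """Return the most advanced entry this reader may see, or None."""
    allowed = [
        entry for entry in reference.get(term, [])
        if entry.min_grade <= grade and entry.min_position <= position
    ]
    return max(allowed, key=lambda e: (e.min_grade, e.min_position), default=None)
```

As the reader's grade or position advances, the same request returns a different entry, which mirrors the abstract's statement that the returned information changes with the reader's characteristics.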
Method, apparatus, and computer program product for classification and tagging of textual data
Provided herein are systems, methods and computer readable media for classification and tagging of textual data. An example method may include accessing a corpus comprising a plurality of documents, each document having one or more labels indicative of services offered by a merchant, generating queries based on extracted features and the documents, generating a precision score for at least a portion of the generated queries, and selecting a subset of the generated queries based on an assigned precision score satisfying a precision score threshold, the selected subset of the generated queries configured to provide an indication of one or more labels to be applied to machine readable text. A second example method, utilized for tagging machine readable text with unknown labels, may include assigning a label to textual portions of the machine readable text based on results of the application of the queries.
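The precision-based query selection can be sketched as below. The abstract does not say how queries are represented or how precision is computed, so this assumes a query is a set of words that must all appear in a document, and precision is the fraction of matched documents carrying the query's label. The function names `precision`, `select_queries`, and `tag` are hypothetical.

```python
def precision(terms, label, corpus):
    """Fraction of documents matched by the query that carry the label."""
    matched = [doc for doc in corpus if terms <= set(doc["text"].lower().split())]
    if not matched:
        return 0.0
    return sum(label in doc["labels"] for doc in matched) / len(matched)


def select_queries(candidates, corpus, threshold):
    """Keep the (terms, label) queries whose precision meets the threshold."""
    selected = []
    for terms, label in candidates:
        score = precision(terms, label, corpus)
        if score >= threshold:
            selected.append((terms, label, score))
    return selected


def tag(text, selected):
    """Apply the selected queries to unlabeled text and return its labels."""
    words = set(text.lower().split())
    return sorted({label for terms, label, _ in selected if terms <= words})
```

A broad query such as a single common word tends to match documents with other labels and falls below the threshold, while a narrower query survives and is then used for tagging.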
Generating and using a customized index
In various embodiments, systems and methods are provided for generating and using a customized index. In embodiments, an index structure is constructed to efficiently utilize machines containing index portions. In this regard, the index structure for a particular application is customizable such that a number of virtual index units for a particular index type and/or a number of machines associated with the virtual index units for the particular index type can be optimized for machine and/or system performance and efficiency. Utilizing the constructed index structure, documents can be distributed to various index units, virtual index units, and/or machines in real-time or near real-time. Further, the customized index structure can be used to efficiently serve search results in response to search queries.
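A rough sketch of routing documents through virtual index units to machines follows. The abstract does not specify a routing scheme; this assumes hash-based assignment of documents to virtual units and a modulo mapping from units to machines. `CustomIndexLayout` and its methods are hypothetical names.

```python
import hashlib


class CustomIndexLayout:
    """Layout for one index type: N virtual index units spread over machines."""

    def __init__(self, virtual_units, machines):
        self.virtual_units = virtual_units
        self.machines = list(machines)

    def unit_for(self, doc_id):
        """Map a document id to a virtual index unit with a stable hash."""
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.virtual_units

    def machine_for(self, doc_id):
        """Map the document's virtual unit to a physical machine."""
        return self.machines[self.unit_for(doc_id) % len(self.machines)]
```

Because documents are bound to virtual units rather than directly to machines, the machine count for an index type can be tuned without changing which unit a document belongs to.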
Taxonomy enrichment using ensemble classifiers
A taxonomy of categories, attributes, and values can be conflated with new data triplets by identifying one or more conflation candidates among the attribute-value pairs within a category of the taxonomy that matches the category of the data triplet, and determining a suitable merge action for conflating the data triplet with each conflation candidate. The task of determining merge actions may be cast as a classification problem, and may be solved by an ensemble classifier.
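Casting the merge decision as classification with an ensemble can be illustrated with a simple majority vote. Everything concrete here is an assumption: the two merge actions (`"merge"` and `"add_new"`), the three hand-written voters standing in for trained classifiers, and the similarity thresholds are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def candidates_for(taxonomy, triplet):
    """Attribute-value pairs in the taxonomy category matching the triplet."""
    category, _, _ = triplet
    return taxonomy.get(category, [])


def by_attribute_name(triplet, candidate):
    ratio = SequenceMatcher(None, triplet[1].lower(), candidate[0].lower()).ratio()
    return "merge" if ratio >= 0.8 else "add_new"


def by_value(triplet, candidate):
    return "merge" if triplet[2].lower() == candidate[1].lower() else "add_new"


def by_token_overlap(triplet, candidate):
    a = set(f"{triplet[1]} {triplet[2]}".lower().split())
    b = set(f"{candidate[0]} {candidate[1]}".lower().split())
    return "merge" if len(a & b) / len(a | b) >= 0.5 else "add_new"


def merge_action(triplet, candidate,
                 voters=(by_attribute_name, by_value, by_token_overlap)):
    """Majority vote over the individual classifiers' merge actions."""
    votes = Counter(voter(triplet, candidate) for voter in voters)
    return votes.most_common(1)[0][0]
```

In practice each voter would be a trained model rather than a fixed rule; the vote-combining step is the part this sketch is meant to show.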
Sensitive Data Evaluation
Evaluating risk of sensitive data associated with a target data set includes a computer system receiving a pattern that defines sensitive data and a selection of a data set as the target data set for evaluating. The system determines portions of the target data set from which to select sample data sets and determines, responsive to a confidence limit and sizes of the respective portions of the target data, a size of a sample data set for each respective target data set portion. The system randomly samples the target data set portions to provide sample data sets of the determined sample data set sizes and determines whether there is an occurrence of the sensitive data in each sample data set by searching for the pattern in the sample data sets. The system determines a proportion of the sample data sets that have the occurrence of the sensitive data.
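The sampling procedure can be sketched as below. The abstract does not give the sample-size formula, so this assumes Cochran's formula with a finite population correction, driven by a confidence level and a margin of error. The names `sample_size` and `evaluate`, and the `margin` and `p` parameters, are assumptions for illustration.

```python
import math
import random
import re
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction (assumed)."""
    if population == 0:
        return 0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    return min(population, math.ceil(n0 / (1 + (n0 - 1) / population)))


def evaluate(portions, pattern, confidence=0.95, margin=0.05, seed=0):
    """Proportion of sampled portions in which the sensitive pattern occurs."""
    rng = random.Random(seed)
    regex = re.compile(pattern)
    flagged = 0
    for rows in portions.values():
        # Randomly sample each portion at the size derived from its own length.
        sample = rng.sample(rows, sample_size(len(rows), confidence, margin))
        if any(regex.search(row) for row in sample):
            flagged += 1
    return flagged / len(portions)
```

Here `portions` maps a portion name (for example a table or column) to its list of string values, and `pattern` is a regular expression defining the sensitive data.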
DATA ANALYTICS SYSTEMS AND METHODS
Data analytics systems and methods are disclosed herein. A parser can parse reference data from various data sources to store in a data structure. An uploader can receive study data designated by a researcher and store the study data in the data structure. A matcher can compare analyte nameset data in the study data with analyte nameset data from the reference data to generate one or more links each correlating an instance of an analyte in the study data with an instance of that analyte in the reference data. Library overlays each include one or more modules to access reference data to generate organized associations of reference data. A calculation engine can receive a selection of one or more library overlay(s) and manipulate the reference data and study data according to the organized associations of the selected library overlay(s) to generate configured data stored in a collection of data caches for presentation to a researcher via a user interface.
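The matcher step, which links analytes in study data to analytes in reference data by comparing namesets, might look like the following sketch. It assumes a nameset is a collection of synonyms and that two names match when they are equal after lowercasing and stripping non-alphanumeric characters; `normalize` and `match` are hypothetical names, and the identifiers in the usage are made up.

```python
def normalize(name):
    """Lowercase and drop punctuation and spaces so name variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match(study, reference):
    """Return (study_id, reference_id) links for analytes sharing a name."""
    lookup = {}
    for ref_id, names in reference.items():
        for name in names:
            lookup.setdefault(normalize(name), ref_id)
    links = []
    for study_id, names in study.items():
        ref_ids = sorted(
            {lookup[normalize(n)] for n in names if normalize(n) in lookup}
        )
        links.extend((study_id, ref_id) for ref_id in ref_ids)
    return links
```

Study analytes with no matching reference name simply produce no link, which leaves them available for manual review.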
END-TO-END EMAIL TAG PREDICTION
A system provides automatic, end-to-end tagging of email messages. While a message is being composed at a sending email client, the server may receive email information that is used as an input to a predictive model. The model identifies tags that are available to a specific user group or email list that apply to the email message. These predicted tags are sent back to the email client, where they may be embedded in the email message with other user-defined tags. As the message is passed through the email server, the system may use any changes made to the predicted tags to retrain the model. When the message is received at a second email client, the receiver may further edit the tags, and any changes may again be used to retrain the model.
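The predict-then-retrain loop can be illustrated with a toy model. The abstract does not describe the predictive model, so this assumes a simple additive word-to-tag weight table that is nudged up when a user adds a tag and down when a user removes a predicted tag. `TagModel`, `predict`, and `retrain` are hypothetical names.

```python
from collections import defaultdict


class TagModel:
    def __init__(self, allowed_tags):
        # Tags available to a specific user group or email list.
        self.allowed = set(allowed_tags)
        self.weights = defaultdict(float)  # (word, tag) -> weight

    def predict(self, text, threshold=1.0):
        """Return the allowed tags whose summed word weights reach the threshold."""
        words = set(text.lower().split())
        scores = {
            tag: sum(self.weights[(word, tag)] for word in words)
            for tag in self.allowed
        }
        return sorted(tag for tag, score in scores.items() if score >= threshold)

    def retrain(self, text, predicted, final):
        """Learn from the difference between predicted and user-edited tags."""
        words = set(text.lower().split())
        for tag in (set(final) - set(predicted)) & self.allowed:  # user added
            for word in words:
                self.weights[(word, tag)] += 1.0
        for tag in set(predicted) - set(final):  # user removed
            for word in words:
                self.weights[(word, tag)] -= 1.0
```

The same `retrain` call would serve both the sender's edits and the receiver's later edits, since each is just a difference between a predicted tag set and a final one.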
DIGITAL PROCESSING SYSTEMS AND METHODS FOR THIRD PARTY BLOCKS IN AUTOMATIONS IN COLLABORATIVE WORK SYSTEMS
Systems, methods, and computer-readable media for remotely automating changes to third party applications from within a primary application are disclosed. The systems and methods may involve maintaining, in the primary application, a table having rows, columns, and cells at intersections of the rows and columns, wherein the primary application is configured to enable the construction of automations defined by conditional rules for altering internal information in the primary application and external information in the third party applications; receiving an automation definition conditional on specific information input into at least one specific cell in the table of the primary application, wherein the automation definition is constructed using internal blocks and external blocks, the external blocks having links to the external third party applications; monitoring the at least one specific cell of the primary application for an occurrence of the specific information.
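A cell-triggered automation built from internal and external blocks can be sketched as follows. The abstract stops at the monitoring step, so running the blocks in order once the monitored value appears is an assumption, as are the names `Board`, `add_automation`, and `set_cell`. External blocks are modeled as plain callables standing in for links to third party applications.

```python
class Board:
    """A table of cells with automations that watch specific cells."""

    def __init__(self):
        self.cells = {}         # (row, column) -> value
        self.automations = []   # (cell, expected value, blocks)

    def add_automation(self, cell, expected, blocks):
        """Register blocks to run when `cell` receives the `expected` value."""
        self.automations.append((cell, expected, list(blocks)))

    def set_cell(self, row, column, value):
        """Store the value, then check every automation watching this cell."""
        self.cells[(row, column)] = value
        for cell, expected, blocks in self.automations:
            if cell == (row, column) and value == expected:
                for block in blocks:
                    block(self)  # assumed: blocks run in definition order
```

A usage sketch, with an internal block that edits the table and an external block that records a message for a hypothetical third party service:

```python
outbox = []
board = Board()
board.add_automation(
    cell=(1, "status"),
    expected="Done",
    blocks=[
        lambda b: b.set_cell(1, "archived", True),         # internal block
        lambda b: outbox.append("notify: row 1 is done"),  # external block
    ],
)
board.set_cell(1, "status", "Done")
```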