G06F2216/03

Densely grouping dimensional data

Methods, computer systems, and stored instructions are described herein for densely grouping dimensional data and/or aggregating data using a data structure, such as one that is constructed based on dimensional data. When smaller tables are joined with a larger table, a server may analyze the smaller tables first to determine the actual value combinations that occur in the smaller tables, and these actual value combinations are used to process the larger table more efficiently. A dense data structure may be generated by processing dimensional data before processing data from a fact table. The dense data structure may be generated by compressing the ranges of values that are possible in the dimensions into the range of values that actually occurs in the dimensions. The compressed range of values may be represented by dense set identifiers rather than by the actual values.
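
The idea of scanning the dimension first and then aggregating the fact table through dense identifiers can be sketched as follows. This is a minimal illustration, not the claimed implementation: the table layout, the column names, and the helpers `build_dense_ids` and `aggregate_fact` are all assumptions made for the example.

```python
def build_dense_ids(dimension_rows, key_column, group_column):
    """Scan the (small) dimension table once.

    Returns a map from join key to dense id, plus the map from each
    group value that actually occurs to its dense id (0, 1, 2, ...).
    """
    group_to_id = {}
    key_to_id = {}
    for row in dimension_rows:
        group = row[group_column]
        if group not in group_to_id:
            group_to_id[group] = len(group_to_id)  # next unused dense id
        key_to_id[row[key_column]] = group_to_id[group]
    return key_to_id, group_to_id


def aggregate_fact(fact_rows, key_column, measure_column, key_to_id, num_groups):
    """Sum a measure over the (large) fact table into a dense array."""
    totals = [0] * num_groups
    for row in fact_rows:
        dense_id = key_to_id.get(row[key_column])
        if dense_id is None:
            continue  # fact row has no matching dimension row
        totals[dense_id] += row[measure_column]
    return totals


# Hypothetical sample data: product ids are sparse, categories are few.
products = [
    {"product_id": 101, "category": "toys"},
    {"product_id": 205, "category": "books"},
    {"product_id": 999, "category": "toys"},
]
sales = [
    {"product_id": 101, "amount": 5},
    {"product_id": 999, "amount": 7},
    {"product_id": 205, "amount": 3},
    {"product_id": 404, "amount": 100},  # unknown product, skipped
]

key_to_id, group_to_id = build_dense_ids(products, "product_id", "category")
totals = aggregate_fact(sales, "product_id", "amount", key_to_id, len(group_to_id))
result = {group: totals[i] for group, i in group_to_id.items()}
print(result)  # {'toys': 12, 'books': 3}
```

Because the dense ids run from 0 to the number of groups that actually occur, the aggregation buffer is a small array indexed directly by id, whatever the range of the raw product ids.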

Online analytic processing cube with time stamping
09830366 · 2017-11-28

In one example embodiment, a system and method are shown for receiving data that include a time stamp. The system and method also include building an Online Analytical Processing (OLAP) cube that includes a dimension, the dimension acting as a schema for the data that include the time stamp. The system and method may also include populating the OLAP cube with an object, the object including the data and the time stamp as at least one attribute. The system and method may also include storing the OLAP cube.

SURFACING UNIQUE FACTS FOR ENTITIES
20230177360 · 2023-06-08

Systems and methods identify and provide interesting facts about an entity. An example method includes selecting documents associated with at least one unique fact trigger, the documents being from a document repository. The method also includes generating entity-sentence pairs from the documents and, for a first entity of the entities represented by the entity-sentence pairs, clustering the entity-sentence pairs for the first entity using salient terms occurring in the sentence. The method also includes determining a representative sentence for each of the clusters and providing at least one of the representative sentences in response to a query that identifies the first entity. Another example method includes determining that a query relates to an entity in a knowledge base, determining that the entity has an associated unique fact list, and providing at least one of the unique facts in the list in response to the query.

SYSTEMS AND METHODS FOR PROTECTING USER PRIVACY IN NETWORKED DATA COLLECTION

Disclosed herein are systems and methods for protecting user privacy in networked data collection. One embodiment takes the form of a method that includes obtaining a user-data request that is associated with a requesting party. The method also includes preparing a first candidate response to the user-data request, where the first candidate response is based at least in part on data that is associated with a first user. The method also includes receiving additional candidate responses that are respectively based on data that is respectively associated with a plurality of additional users. The method also includes determining a privacy level of the first candidate response based at least in part on the received plurality of additional candidate responses. The method also includes determining that the privacy level exceeds a privacy threshold, and responsively sending, to the requesting party, a user-data response associated with the user-data request.
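
The threshold check described above might look like the sketch below. The abstract does not say how the privacy level is computed, so the metric used here (the fraction of other users' candidate responses that are identical to the first user's) is purely an assumption, as are the function names.

```python
def privacy_level(candidate, other_candidates):
    """Hypothetical metric: share of other users' responses equal to ours.

    A response that many other users would also give reveals less about
    the first user, so a higher value means more privacy.
    """
    if not other_candidates:
        return 0.0
    matches = sum(1 for other in other_candidates if other == candidate)
    return matches / len(other_candidates)


def maybe_respond(candidate, other_candidates, threshold):
    """Return the response only when its privacy level exceeds the threshold."""
    if privacy_level(candidate, other_candidates) > threshold:
        return candidate
    return None  # withhold the response


# Hypothetical request: "which age bracket is the user in?"
others = ["25-34", "25-34", "35-44", "25-34"]
print(maybe_respond("25-34", others, threshold=0.5))  # 25-34
print(maybe_respond("65+", others, threshold=0.5))    # None
```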

DISTRIBUTED SEQUENTIAL PATTERN MINING (SPM) USING STATIC TASK DISTRIBUTION STRATEGY
20170308584 · 2017-10-26

Seed patterns are derived from a sequence database. Execution costs for types of seed patterns are computed. Each seed pattern is iteratively distributed to distributed nodes along with that seed pattern's assigned execution cost. The distributed nodes process in parallel, mining the sequence database for the super patterns it contains. When a distributed node exhausts its execution budget, any remaining mining needed for the seed pattern being mined is reallocated to another distributed node that has remaining execution budget.
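
A small simulation can illustrate the budget-based reallocation. It is a sketch under stated assumptions: seeds are handed out round-robin, costs and budgets are abstract work units, and leftover work goes to the node with the most remaining budget. None of these choices, nor the function name `distribute`, comes from the abstract.

```python
def distribute(seed_costs, budgets):
    """Plan which node mines how much of each seed pattern.

    seed_costs: {seed_pattern: execution_cost}
    budgets:    {node_name: execution_budget}
    Returns {node_name: [(seed_pattern, work_units), ...]}.
    """
    remaining = dict(budgets)
    plan = {node: [] for node in budgets}
    nodes = list(budgets)
    for index, (seed, cost) in enumerate(seed_costs.items()):
        node = nodes[index % len(nodes)]  # static round-robin assignment
        left = cost
        while left > 0:
            if remaining[node] == 0:
                # Budget exhausted: move the rest of this seed to the node
                # with the most remaining budget.
                candidates = [n for n in nodes if remaining[n] > 0]
                if not candidates:
                    raise RuntimeError("total execution budget exhausted")
                node = max(candidates, key=lambda n: remaining[n])
            work = min(left, remaining[node])
            plan[node].append((seed, work))
            remaining[node] -= work
            left -= work
    return plan


plan = distribute({"a": 4, "b": 2, "ab": 3}, {"n1": 5, "n2": 5})
print(plan)
# {'n1': [('a', 4), ('ab', 1)], 'n2': [('b', 2), ('ab', 2)]}
```

Here node `n1` runs out of budget one unit into seed `ab`, so the remaining two units are reallocated to `n2`.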

Re-sizing data partitions for ensemble models in a mapreduce framework

Techniques are described for revising data partition size for use in generating predictive models. In one example, a method includes determining an initial number of base model partitions of data from a plurality of data sources; determining an initial base model partition size based at least in part on the initial number of base model partitions; and evaluating the initial base model partition size at least in part with reference to at least one base model partition size reference. The method further includes determining a finalized number of base model partitions based at least in part on the initial base model partition size; determining a revised base model partition size; and generating revised base models based at least in part on the revised base model partition size, including using a predictive modeling framework to randomly assign input data records from the plurality of data sources into the base model partitions.
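
The sizing steps above can be sketched as follows. The abstract only says the initial size is evaluated against "at least one base model partition size reference", so the `min_size` and `max_size` bounds used here are assumed references, and the function names are hypothetical.

```python
import math
import random


def finalize_partitions(total_records, initial_partitions, min_size, max_size):
    """Return (finalized partition count, revised partition size)."""
    initial_size = total_records / initial_partitions
    if initial_size < min_size:
        # Partitions too small: use fewer of them.
        final_partitions = max(1, total_records // min_size)
    elif initial_size > max_size:
        # Partitions too large: use more of them.
        final_partitions = math.ceil(total_records / max_size)
    else:
        final_partitions = initial_partitions
    revised_size = total_records / final_partitions
    return final_partitions, revised_size


def assign_records(records, num_partitions, seed=0):
    """Randomly assign each input record to one base model partition."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[rng.randrange(num_partitions)].append(record)
    return partitions


count, size = finalize_partitions(10_000, 100, min_size=500, max_size=5_000)
print(count, size)  # 20 500.0
partitions = assign_records(range(10_000), count)
```

Random assignment means the actual partition sizes vary around the revised size rather than matching it exactly.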

HYBRID ENSEMBLE MODEL LEVERAGING EDGE AND SERVER SIDE INFERENCE
20220058494 · 2022-02-24

In an approach for a hybrid ensemble model leveraging edge and server side inference, a processor receives data on an edge device. A processor sends the data to a server. A processor performs, in parallel, inference on the data using a first model on the edge device and a second model on the server. A processor returns a result of the second model to the edge device. A processor ensembles, on the edge device, a result of the first model and the result of the second model based on a set of weights to produce an ensembled result. A processor outputs the ensembled result for a user to view through a user interface of the edge device.
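
The parallel inference and weighted ensembling might be sketched as below. The two model functions are stand-ins that return fixed class probabilities, the server call is simulated locally with a thread rather than a network request, and the weights are arbitrary; all of these are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def edge_model(data):
    """Stand-in for the first (on-device) model."""
    return [0.7, 0.3]


def server_model(data):
    """Stand-in for the second model; a real system would call the server."""
    return [0.4, 0.6]


def ensemble(edge_result, server_result, weights):
    """Weighted average of the two models' outputs, element by element."""
    w_edge, w_server = weights
    return [w_edge * e + w_server * s for e, s in zip(edge_result, server_result)]


def hybrid_inference(data, weights=(0.25, 0.75)):
    # Run both inferences concurrently, then combine them on the edge device.
    with ThreadPoolExecutor(max_workers=2) as pool:
        edge_future = pool.submit(edge_model, data)
        server_future = pool.submit(server_model, data)
        return ensemble(edge_future.result(), server_future.result(), weights)


scores = hybrid_inference({"pixels": [0, 1, 2]})
print(scores)  # approximately [0.475, 0.525]
```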

Method of creating process protocols
11256708 · 2022-02-22

A computer-implemented method of creating a process protocol in a local computer system is provided. The local computer system comprises a processor and a storage device, wherein the process protocol is created from raw data, which is stored in part in a first external computer system and in part in a second external computer system, wherein the raw data in the external computer systems is stored in a number of data tables, wherein the raw data comprises data, which is created during the execution of processes in the first external computer system and in the second external computer system.

Advanced data collection block identification
11669588 · 2023-06-06

Systems and methods are described that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, machine learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a classification accordingly, and for feeding the classification decision back to a data collection platform.

System and method for biomarker-outcome prediction and medical literature exploration

A system and method for biomarker-outcome prediction and medical literature exploration utilize a data platform to analyze, optimize, and explore the knowledge contained in or derived from clinical trials. The system utilizes the knowledge graph and data analysis engine capabilities of the data platform. The knowledge graph may be used to link biomarkers with molecules, proteins, and genetic data to provide insight into the relationship between biomarkers, outcomes, and adverse events. The system uses natural language processing techniques on a large corpus of medical literature to perform advanced text mining to identify biomarkers associated with adverse events and to curate a comprehensive profile of biomarker-outcome associations. These associations may then be ranked to identify the most common biomarker-outcome association pairs. Having a comprehensive profile of ranked biomarker-outcome data allows the system to predict biomarkers associated with a given disease and serious adverse events linked to biomarker data.