Patent classifications
G06F16/24556
SYSTEMS AND METHODS FOR IDENTIFYING EVENTS THAT SHARE A CHARACTERISTIC
A method including receiving, from a device, interaction data associated with an event including a first identifier, identifying a second identifier associated with the first identifier, identifying one or more previous interactions associated with the first identifier or the second identifier, annotating the interaction data based on the identified one or more previous interactions, wherein the annotation indicates a preference for previous interactions associated with the first identifier or previous interactions associated with the second identifier, and transmitting an indication that the event is associated with one or more of the previous interactions, wherein the one or more previous interactions are determined based on the preference indicated by the annotation.
MANAGING MACHINE OPERATIONS USING ENCODED MULTI-SCALE TIME SERIES DATA
Techniques for managing machine operations using encoded multi-scale time series data are provided. In one technique, operational data is received from a sensor coupled to an industrial device. For each portion in a first set of portions of the operational data (where each portion corresponds to a first time scale), first aggregated data is generated based on time series data from that portion and a first encoding is generated based on the first aggregated data. For each portion in a second set of portions of the operational data (where each portion of the second set corresponds to a second time scale that is different than the first time scale), second aggregated data is generated based on time series data from that portion and a second encoding is generated based on the second aggregated data. The operational data is classified to determine a condition of the industrial device during the time interval based on the first and second encodings.
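The two-scale flow described above can be sketched roughly as follows. This is a minimal illustration, not the claimed technique: the window sizes, the mean as the aggregate, the one-symbol-per-window encoding, and the rule in `classify` are all assumptions chosen to keep the example small.

```python
from statistics import mean

def aggregate(series, window):
    """Split the series into consecutive windows and average each one."""
    return [mean(series[i:i + window]) for i in range(0, len(series), window)]

def encode(aggregated, threshold):
    """Hypothetical encoding: 'H' for a high window, 'L' for a low one."""
    return "".join("H" if value > threshold else "L" for value in aggregated)

def classify(fine_code, coarse_code):
    """Toy classifier (assumption): sustained high readings at either scale."""
    return "abnormal" if "H" in coarse_code or "HH" in fine_code else "normal"

# Made-up sensor readings for one time interval.
readings = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]

fine = encode(aggregate(readings, 2), threshold=3.0)    # first time scale
coarse = encode(aggregate(readings, 4), threshold=3.0)  # second time scale
condition = classify(fine, coarse)
```

A real system would likely replace the threshold encoding and the rule-based classifier with learned components; the sketch only shows how the same data is encoded at two scales before a single classification step.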
Query response using semantically similar database records
From a first attribute-value pair in a record, new data is created including a first token. Using a first model and using a processor and a memory, each token is vectorized into new data including a corresponding vector. From the record, a target row is selected, wherein a target attribute-value pair in the target row includes a value for which a semantic similarity computation is to be performed. Using a similarity measure, a set of most similar rows to the target row is determined, wherein each row in the set of most similar rows to the target row has a corresponding similarity measure above a threshold similarity measure and wherein each row in the set of most similar rows includes the target attribute. The set of most similar rows is used to compute a response to a database query.
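The row-selection step can be illustrated with cosine similarity as the similarity measure. The abstract does not name a specific measure or model, so the measure, the hand-written vectors (standing in for model output), and the function names below are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_rows(target_vector, row_vectors, threshold):
    """Return the ids of rows whose similarity to the target exceeds the threshold."""
    return [row_id for row_id, vec in row_vectors.items()
            if cosine(target_vector, vec) > threshold]

# Hypothetical vectors; in practice they would come from the vectorizing model.
row_vectors = {"r1": [1.0, 0.0], "r2": [0.9, 0.1], "r3": [0.0, 1.0]}
similar = most_similar_rows([1.0, 0.0], row_vectors, threshold=0.8)
```

The query response would then be computed over `similar` only, rather than over the whole table.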
Database management system and database processing method
The database management system (DBMS) receives a first instruction specifying anonymization rule information corresponding to a column of the relation table, among anonymization rule information that is present for each column included in the relation table and shows a plurality of generalization rules. The DBMS reads the column from the relation table in response to the first instruction, and generates a temporary result obtained by generalizing each attribute value of the column based on any of a plurality of generalization rules. The DBMS generates an aggregate result obtained by aggregating the temporary result. The DBMS generates an anonymization method including generalization information indicating a correspondence relationship of each attribute value of the column and any of the plurality of generalization rules when the aggregate result satisfies a disclosure rule. The DBMS generates anonymization information as a result of processing the relation table based on the anonymization method.
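A simplified sketch of the generalize, aggregate, and check loop follows. It assumes the disclosure rule is a minimum group size `k`, and it applies one rule to the whole column, whereas the abstract allows each attribute value to be mapped to any of the rules. The rule functions and names are hypothetical.

```python
from collections import Counter

def decade(age):
    """Generalization rule: replace an age with its ten-year band."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"

def wide(age):
    """Coarser generalization rule: two bands only."""
    return "under 40" if age < 40 else "40+"

RULES = [decade, wide]  # ordered from least to most general (assumption)

def choose_rule(column, k):
    """Return the first rule whose aggregated result has every group >= k."""
    for rule in RULES:
        temporary = [rule(value) for value in column]  # temporary result
        counts = Counter(temporary)                    # aggregate result
        if min(counts.values()) >= k:                  # assumed disclosure rule
            return rule.__name__, temporary
    return None, None

ages = [21, 25, 33, 47, 48, 52]
rule_name, anonymized = choose_rule(ages, k=2)
```

Here the decade bands leave groups of size one, so the sketch falls back to the coarser rule.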
Efficient aggregation of time series data
Efficient aggregation of time series data is disclosed, including: obtaining a first entry value corresponding to an item, wherein the first entry value comprises a first recorded data point that is associated with a first time interval; generating a compressed block based at least in part on compressing the first entry value with at least a second entry value; storing the compressed block in a document corresponding to the item; determining that the item matches an aggregation search query; decompressing the compressed block from the document corresponding to the item to obtain the first entry value and the second entry value; and generating an aggregation result in response to the aggregation search query based on at least a portion of the first entry value and the second entry value.
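The store-compressed, decompress-on-query flow might look like the sketch below. The abstract does not name a codec or document layout; JSON plus `zlib` and the `document` dictionary are assumptions for illustration.

```python
import json
import zlib

def compress_block(entries):
    """Compress a list of [timestamp, value] entries into one block."""
    return zlib.compress(json.dumps(entries).encode("utf-8"))

def decompress_block(block):
    """Recover the original entries from a compressed block."""
    return json.loads(zlib.decompress(block).decode("utf-8"))

def aggregate_average(document):
    """Answer an averaging query by decompressing the item's block."""
    entries = decompress_block(document["block"])
    values = [value for _, value in entries]
    return sum(values) / len(values)

# Hypothetical document for one item, holding two entry values in one block.
document = {"item": "sensor-7",
            "block": compress_block([[0, 10.0], [60, 14.0]])}
average = aggregate_average(document)
```

Only documents whose item matches the aggregation query need to be decompressed, which is where the storage saving meets the query cost.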
Database search enhancement and interactive user interface therefor
A method, system and computer program product for database search enhancement and interactive user interface therefor. Database records are ranked by a match score, calculated using a plurality of criteria for determining a match level between values of a record and a search query, and using a plurality of priority parameters for aggregating the match level determined. A plurality of top-ranking records is selected, and a diversity measure is calculated therefor, using at least one class label assigned to records therein. If a sufficiency condition is not met by the diversity measure, at least one reference set of records sharing a class label in common is extracted from the plurality of top-ranking records and analyzed for determining at least one modification to the search query in improvement of the diversity measure, the scoring computational operator is accordingly redefined, and the process reiterates; otherwise, the plurality of top-ranking records is outputted.
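One way to picture the diversity check on the top-ranking records is shown below. The diversity measure used here (one minus the share of the most common class label) and the sufficiency threshold are assumptions; the abstract does not define either.

```python
from collections import Counter

def top_k(records, k):
    """Select the k records with the highest match score."""
    return sorted(records, key=lambda r: r["score"], reverse=True)[:k]

def diversity(records):
    """Assumed measure: 1 minus the share of the most common class label."""
    counts = Counter(r["label"] for r in records)
    return 1 - max(counts.values()) / len(records)

def dominant_label(records):
    """Label of the reference set that a query modification would target."""
    return Counter(r["label"] for r in records).most_common(1)[0][0]

# Made-up scored records with class labels.
records = [
    {"score": 0.9, "label": "A"}, {"score": 0.8, "label": "A"},
    {"score": 0.7, "label": "A"}, {"score": 0.6, "label": "B"},
    {"score": 0.5, "label": "C"},
]
top = top_k(records, 4)
needs_refinement = diversity(top) < 0.5  # hypothetical sufficiency condition
```

When `needs_refinement` is true, the records sharing `dominant_label(top)` would serve as the reference set from which a query modification is derived before the ranking is repeated.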
Method, apparatus, and computer program product for identifying privacy risks in datasets
Embodiments described herein relate to establishing a privacy risk score between two datasets based on features common to the datasets. Methods may include: receiving a first dataset of probe data points defining a trajectory; receiving a second dataset of the probe data points defining the trajectory; identifying a plurality of features common to the first dataset and the second dataset; computing a privacy risk value for the identified features common to the first dataset and the second dataset; and computing an aggregate privacy risk score between the first dataset and the second dataset.
SYSTEM AND METHOD FOR DISJUNCTIVE JOINS
Joining data using a disjunctive operator is described. An example computer-implemented method can include generating, with a processing device, a query plan for a query, the query comprising a join operator expression for a disjunctive predicate, wherein the join operator expression includes a conjunctive predicate and a disjunctive operator. The method may further include generating a bloom filter for the disjunctive operator. Additionally, the method may include generating a result set as a result of evaluating the join operator expression using the disjunctive operator and bloom filter for the disjunctive predicate.
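The role of the bloom filter can be sketched for a join condition of the form `email = email OR phone = phone`. This is a simplified illustration with one filter per disjunct; the column names, filter sizing, and nested-loop evaluation are assumptions, and the conjunctive part of the predicate is omitted.

```python
import hashlib

class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""
    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

def disjunctive_join(left, right):
    """Join on email OR phone, skipping left rows that cannot match."""
    by_email, by_phone = BloomFilter(), BloomFilter()
    for row in right:
        by_email.add(row["email"])
        by_phone.add(row["phone"])
    result = []
    for l in left:
        # A left row is pruned only if neither disjunct can possibly match.
        if not (by_email.might_contain(l["email"])
                or by_phone.might_contain(l["phone"])):
            continue
        for r in right:
            if l["email"] == r["email"] or l["phone"] == r["phone"]:
                result.append((l["id"], r["id"]))
    return result

left = [{"id": 1, "email": "a@x", "phone": "111"},
        {"id": 2, "email": "b@x", "phone": "222"},
        {"id": 3, "email": "c@x", "phone": "333"}]
right = [{"id": 10, "email": "a@x", "phone": "999"},
         {"id": 20, "email": "z@x", "phone": "222"}]
pairs = disjunctive_join(left, right)
```

Because a Bloom filter has no false negatives, pruning never drops a true match; a false positive only costs an extra probe.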
Estimating Accuracy of Privacy-Preserving Data Analyses
Systems and methods for estimating the accuracy, in the form of confidence intervals, of data released under Differential Privacy (DP) mechanisms and their aggregation. Reasoning about the accuracy of aggregated released data can be improved by combining the use of probabilistic bounds like union and Chernoff bounds. Some probabilistic bounds, e.g., Chernoff bounds, rely on detecting statistical independence of random variables, which in this case corresponds to sources of statistical noise of DP mechanisms. To detect such independence, and provide accuracy calculations, provenance of statistical noise sources as well as information flows of random variables are tracked within data analyses, i.e., where, within data analyses, randomly generated statistical noise propagates and how it gets manipulated.
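As a small worked example of the union-bound side of this reasoning, assume each released value carries Laplace noise with a known scale. For a Laplace variable with scale b, P(|X| >= t) = exp(-t/b), so the noise stays within b * ln(1/beta) with probability at least 1 - beta. The function names are hypothetical, and the tighter Chernoff-style bound, which requires independent noise sources, is not shown.

```python
import math

def laplace_half_width(scale, beta):
    """Half-width t such that P(|Laplace(scale)| >= t) = beta."""
    return scale * math.log(1 / beta)

def union_bound_half_width(scales, beta):
    """Error bound for a sum of noisy values, holding with probability >= 1 - beta.

    Each term gets a failure budget of beta / n; the union bound needs
    no independence assumption between the noise sources.
    """
    n = len(scales)
    return sum(laplace_half_width(scale, beta / n) for scale in scales)

single = laplace_half_width(1.0, 0.05)
combined = union_bound_half_width([1.0, 1.0], 0.05)
```

The union bound grows roughly linearly in the number of aggregated values, which is why detecting independence and switching to a concentration bound can give narrower intervals for large aggregations.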
Identifying missing questions by clustering and outlier detection
A machine learning system may be used to suggest clinical questions to ask during or after a patient appointment. A first encoder may encode first information and a second encoder may encode second information related to the current patient appointment. An aggregate encoding may be generated using the encoded first information and encoded second information. The current patient appointment may be clustered with similar appointments based on the aggregate encoding. Outlier analysis may be performed to determine if the appointment is an outlier, and, if so, which features contribute the most to outlier status. The system may generate one or more questions to ask about the features that contribute the most to outlier status.
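The outlier-analysis step can be illustrated with per-feature z-scores against the appointment's cluster. The abstract does not specify the outlier method, so the z-score approach, the threshold, and the feature names here are assumptions, and the clustering itself is taken as already done.

```python
from statistics import mean, pstdev

def feature_z_scores(appointment, cluster):
    """Absolute z-score of each feature relative to the cluster's values."""
    scores = {}
    for name, value in appointment.items():
        column = [other[name] for other in cluster]
        spread = pstdev(column)
        scores[name] = abs(value - mean(column)) / spread if spread else 0.0
    return scores

def top_outlier_features(appointment, cluster, z_threshold=2.0):
    """Features exceeding the threshold, largest contribution first."""
    scores = feature_z_scores(appointment, cluster)
    flagged = [name for name, z in scores.items() if z > z_threshold]
    return sorted(flagged, key=scores.get, reverse=True)

# Made-up cluster of similar appointments and a current appointment.
cluster = [{"bp": 120, "hr": 70}, {"bp": 122, "hr": 72},
           {"bp": 118, "hr": 68}, {"bp": 120, "hr": 70}]
current = {"bp": 121, "hr": 95}
features = top_outlier_features(current, cluster)
```

A question generator would then be prompted with the flagged features, for example asking about the elevated heart rate in this made-up case.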