Patent classifications
G06F2216/03
Methods and systems relating to network based storage retention
Cloud storage provides accessible interfaces, near-instant elasticity, scalability, multi-tenancy, and metered resources in a distributed framework providing fault-tolerant solutions with high data durability. Stored data may have legal or compliance requirements defining retention periods, ensuring the data is preserved without modification for a period of time. However, data privacy rules such as the European Union's General Data Protection Regulation can require modification or destruction of records at any point. Further, many retention structures are user driven, but users make mistakes that require a change to a record's associated retention period. Retention period mechanisms enforced with immutable storage can satisfy compliance requirements but run contrary to data privacy rules and block such adjustments. Accordingly, processes, methods and systems are required that allow retention policies to be applied to data as it is stored within network based storage, as well as to data already stored, thereby facilitating retention period adjustments.
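As a rough illustration of the idea in the abstract above, the following sketch keeps the retention policy separate from the stored object, so that a policy can be attached when data is written, attached later, or adjusted. The class `RetentionStore` and all of its method names are hypothetical and are not taken from the patent; a real system would also need authorization and audit logging around adjustments.

```python
from datetime import datetime, timedelta, timezone


class RetentionStore:
    """Hypothetical in-memory store with adjustable retention periods."""

    def __init__(self):
        self._objects = {}        # key -> stored bytes
        self._retain_until = {}   # key -> datetime before which deletion is refused

    def put(self, key, data, retain_for=None, now=None):
        # A policy may be applied at write time (retain_for) or omitted.
        now = now or datetime.now(timezone.utc)
        self._objects[key] = data
        if retain_for is not None:
            self._retain_until[key] = now + retain_for

    def set_retention(self, key, retain_until):
        # A policy may also be applied to, or adjusted on, data already stored.
        if key not in self._objects:
            raise KeyError(key)
        self._retain_until[key] = retain_until

    def delete(self, key, now=None):
        # Deletion is refused while the retention period is still running.
        now = now or datetime.now(timezone.utc)
        until = self._retain_until.get(key)
        if until is not None and now < until:
            raise PermissionError(f"{key} is retained until {until.isoformat()}")
        self._objects.pop(key)
        self._retain_until.pop(key, None)

    def __contains__(self, key):
        return key in self._objects


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    store = RetentionStore()
    store.put("invoice-1", b"...", retain_for=timedelta(days=365), now=t0)
    # Correct a user mistake: the record only needed 30 days of retention.
    store.set_retention("invoice-1", t0 + timedelta(days=30))
    store.delete("invoice-1", now=t0 + timedelta(days=31))
    print("invoice-1" in store)
```

Because the policy lives beside the object rather than inside an immutable container, shortening or extending the period is a metadata update and does not require rewriting the data.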
Optimizing insight generation in heterogeneous datasets
Embodiments relate to a system, computer program product, and method to merge two or more heterogeneous datasets. Seed attributes of each dataset that is the subject of the merge are identified. The seed attributes are derived from candidate attributes of the respective datasets. A correlation is assessed to create a set of mergeable attributes and a set of non-mergeable attributes. A cohesiveness characteristic is leveraged to iteratively identify one or more attributes from the set of non-mergeable attributes, and to amend the set of mergeable attributes with the one or more attributes identified in the set of non-mergeable attributes. A merged dataset based on the amended set of mergeable attributes and representing non-trivial similarities between the first and second dataset is formed as output.
Alife machine learning system and method
Roughly described, a problem solving platform distributes the solving of the problem over evolvable individuals, each of which also evolves its own pool of actors. The actors have the ability to contribute collaboratively to a solution at the level of the individual, instead of each actor being a candidate for the full solution. Populations evolve both at the level of the individual and at the level of actors within an individual. In an embodiment, an individual defines parameters according to which its population of actors can evolve. The individual is fixed prior to deployment to a production environment, but its actors can continue to evolve and adapt while operating in the production environment. Thus a goal of the evolutionary process at the level of individuals is to find populations of actors that can sustain themselves and survive, solving a dynamic problem for a given domain as a consequence.
Machine learning based prediction of outcomes associated with populations of users
A system analyzes periodically collected data associated with entities, for example, users, servers, or systems. The system determines anomalies associated with populations of entities. The system excludes anomalies from consideration to increase efficiency of execution. The system may rank the anomalies based on relevance scores. The system determines relevance scores based on various factors describing the sets of entities. The system may present information describing the anomalies based on the ranking. The system may use a machine learning based model for predicting likelihoods of outcomes associated with sets of entities. The system generates alerts for reporting the outcomes based on the predictions.
Mini-batch top-k-medoids for extracting specific patterns from CGM data
A method is provided for clustering data elements to extract specific patterns. The method specifies some data elements with a uniform distribution as a mini-batch and performs a single-pass cluster initialization by selecting a respective data element from the mini-batch as a respective initial cluster center to obtain cluster centers for clusters. The method assigns each data element in the mini-batch to a closest cluster by calculating a distance between each of the data elements in the mini-batch and each of the clusters. The method assigns k-minimum new centers by calculating an averaged distance to each data element in a same cluster. The method repeats the specifying step and the assigning steps when the loop stop criteria, which are based on distances between the centers and the k-minimum new centers, are not satisfied. The method outputs a cluster ID sequence once the loop stop criteria are satisfied.
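The loop described above can be sketched as a plain mini-batch k-medoids routine. This is a simplified reading rather than the patented method: it keeps one medoid per cluster (the member with the smallest averaged distance to the others) instead of the top-k variant, and the function name `mini_batch_k_medoids` and the parameters `batch_size`, `tol`, `max_iter`, and `seed` are assumptions made for illustration.

```python
import random


def mini_batch_k_medoids(data, k, batch_size, dist, max_iter=50, tol=0.0, seed=0):
    """Cluster `data` into k groups; returns (medoids, labels)."""
    rng = random.Random(seed)
    # Specify a mini-batch by uniform sampling without replacement.
    batch = rng.sample(data, batch_size)
    # Single-pass initialization: take k elements of the mini-batch as centers.
    medoids = batch[:k]
    for _ in range(max_iter):
        # Assign each mini-batch element to its closest center.
        clusters = [[] for _ in range(k)]
        for x in batch:
            j = min(range(k), key=lambda c: dist(x, medoids[c]))
            clusters[j].append(x)
        # New center: the member with the smallest averaged distance
        # to all members of the same cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])  # keep old center for empty clusters
                continue
            best = min(
                members,
                key=lambda m: sum(dist(m, y) for y in members) / len(members),
            )
            new_medoids.append(best)
        # Stop criterion based on distances between old and new centers.
        shift = max(dist(a, b) for a, b in zip(medoids, new_medoids))
        medoids = new_medoids
        if shift <= tol:
            break
        batch = rng.sample(data, batch_size)  # draw a fresh mini-batch and repeat
    # Output a cluster ID for every data element.
    labels = [min(range(k), key=lambda c: dist(x, medoids[c])) for x in data]
    return medoids, labels


if __name__ == "__main__":
    readings = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]
    medoids, labels = mini_batch_k_medoids(
        readings, k=2, batch_size=6, dist=lambda a, b: abs(a - b)
    )
    print(medoids, labels)
```

Using medoids rather than means keeps every center an actual data element, which is convenient when the elements are sequences compared with a custom distance.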
Frequent source code pattern mining
A data mining technique is used to find large frequently-occurring source code patterns from methods/APIs that can be used in code development. Simplified trees that represent the syntactic structure and type and method usage of a source code fragment, such as a method, are mined to find closed and maximal frequent subtrees which represent the largest frequently-occurring source code patterns or idioms associated with a particular type and method usage. These idioms are then used in an idiom web service and/or a code completion system to assist users in the development of source code programs.
Association rule mining using max pattern transactions
Embodiments of the present invention provide systems and methods for association rule mining in data comprising first and second variables, by generating a first and second filtering bitmap. The first filtering bitmap represents a set of values for the second variable for each of a plurality of values of the first variable. The second filtering bitmap represents associations between frequent values of the second variable, which enables the identification of frequent 2-pattern itemsets. Thus, by performing first logical operations on the frequent 2-pattern itemsets, frequent high order-pattern itemsets may be identified. A max pattern itemset may be identified among the frequent high order-pattern itemsets. As a result, embodiments may mitigate performance and stability problems associated with conventional association rule mining approaches.
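A generic way to picture bitmap-based itemset mining is shown below. It is a minimal sketch of the general technique, not the specific filtering bitmaps of the patent: each value of the second variable (an item) gets an integer bitmap whose set bits mark the values of the first variable (transactions) that contain it, and support for a larger itemset is obtained with a bitwise AND. The function names and the `min_support` parameter are assumptions.

```python
from itertools import combinations


def build_bitmaps(pairs):
    """Map each item to an int bitmap over the transactions that contain it."""
    tx_index = {}
    bitmaps = {}
    for tx, item in pairs:
        bit = tx_index.setdefault(tx, len(tx_index))
        bitmaps[item] = bitmaps.get(item, 0) | (1 << bit)
    return bitmaps


def support(bitmap):
    """Number of transactions whose bit is set."""
    return bin(bitmap).count("1")


def frequent_itemsets(pairs, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    bitmaps = build_bitmaps(pairs)
    current = {
        frozenset([item]): b for item, b in bitmaps.items() if support(b) >= min_support
    }
    found = dict(current)
    while current:
        nxt = {}
        # Join frequent k-itemsets that differ in exactly one item.
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            union = a | b
            if len(union) != len(a) + 1 or union in nxt:
                continue
            both = bits_a & bits_b  # logical AND = transactions containing the union
            if support(both) >= min_support:
                nxt[union] = both
        found.update(nxt)
        current = nxt
    return {s: support(b) for s, b in found.items()}


def max_patterns(itemsets):
    """Frequent itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


if __name__ == "__main__":
    pairs = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
             (3, "a"), (3, "c"), (4, "b")]
    freq = frequent_itemsets(pairs, min_support=2)
    print(sorted(sorted(s) for s in max_patterns(freq)))
```

The bitwise AND replaces repeated scans of the transaction data, which is the usual motivation for bitmap representations in this setting.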
Framework for implementing process apps for process mining
Systems and methods for performing process mining are provided. Data from one or more source systems is extracted by a data connector of a process app. The extracted data is transformed into a normalized data model by transforms of the process app. One or more process mining algorithms of the process app are applied to the normalized data. Results of the one or more process mining algorithms are presented to a user via a user interface of the process app.
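The four stages named in the abstract (connector, transforms, mining algorithm, presentation) can be sketched as a small pipeline. Everything here is illustrative: the field names `order`, `step`, and `ts`, the function names, and the choice of directly-follows counting as the mining algorithm are assumptions and do not come from the patent.

```python
from collections import Counter, defaultdict


def extract(source_rows):
    """Data connector: pull raw rows from a source system (here, a list)."""
    return list(source_rows)


def transform(rows):
    """Transform: map source fields to a normalized (case, activity, time) model."""
    events = [(r["order"], r["step"], r["ts"]) for r in rows]
    return sorted(events, key=lambda e: (e[0], e[2]))


def directly_follows(events):
    """Mining algorithm: count how often one activity directly follows another."""
    traces = defaultdict(list)
    for case, activity, _ in events:
        traces[case].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


def present(counts):
    """Presentation: render results as text lines for a user interface."""
    return [f"{a} -> {b}: {n}" for (a, b), n in sorted(counts.items())]


if __name__ == "__main__":
    rows = [
        {"order": "A", "step": "create", "ts": 1},
        {"order": "A", "step": "approve", "ts": 2},
        {"order": "A", "step": "ship", "ts": 3},
        {"order": "B", "step": "create", "ts": 1},
        {"order": "B", "step": "ship", "ts": 2},
    ]
    for line in present(directly_follows(transform(extract(rows)))):
        print(line)
```

Keeping the source-specific mapping inside `transform` means the mining and presentation stages only ever see the normalized model, which is the separation the abstract describes.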
Method, device and system for estimating causality among observed variables
A method, device and system for estimating causality among observed variables are provided. In response to receiving observation data of a plurality of observed variables, a causality objective function is determined, based on fitting inconsistencies when fitting is performed using the observed variables and a sparse constraint for a causal network structure. The fitting inconsistencies are adjusted based on weighting factors of the observed variables, wherein a weighting factor of an observed variable indicates a minimum underestimate value of cost required for fitting a target variable using all observed variables other than that observed variable. Then, the causality among the plurality of observed variables is estimated by using the observation data to optimally solve the causality objective function through sparse causal reasoning under a directed acyclic graph constraint.
Unifying terms of interest from a dataset of electronic documents
A method is provided for analyzing and interpreting a dataset composed of electronic documents including free-form text. The method includes unifying terms of interest in a collection of terms of interest to identify variants of the terms of interest. This includes identifying candidate variants of a term of interest based on semantic similarity between the term of interest and other terms in the database, determined using an unsupervised machine learning algorithm. Linguistic features and contextual features of the term of interest and its candidate variants are extracted, at least the contextual features being extracted using the unsupervised machine learning algorithm. And a supervised machine learning algorithm is used with the linguistic features and contextual features to identify variants of the term of interest from the candidate variants, such as for application to generate features of the documents for data analytics performed thereon.