Patent classifications
G06F18/232
Systems for Estimating Terminal Event Likelihood
In implementations of systems for estimating terminal event likelihood, a computing device implements a termination system to receive observed data describing values of a treatment metric and indications of a terminal event. Values of the treatment metric are grouped into groups using a mixture model that represents the treatment metric as a mixture of distributions. Parameters of a distribution are estimated for each of the groups and mixing proportions are also estimated for each of the groups. In response to receiving a user input requesting an estimate of a likelihood of the terminal event for a particular value of the treatment metric, the termination system generates an indication of the estimate of the likelihood of the terminal event for the particular value based on a distribution density at the particular value for each of the groups and a probability of including the particular value in each of the groups.
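The final step of this abstract can be illustrated with a minimal sketch, assuming the mixture has already been fitted and that each group's distribution is Gaussian. The parameter values and the per-group `event_rates` are hypothetical and are not taken from the abstract, which does not say how the densities and membership probabilities are combined.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, weights, means, stds):
    """Probability that x belongs to each group (mixing proportion times density, normalized)."""
    weighted = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [v / total for v in weighted]

def event_likelihood(x, weights, means, stds, event_rates):
    """Assumed combination rule: membership-weighted average of per-group event rates."""
    probs = membership_probabilities(x, weights, means, stds)
    return sum(p * r for p, r in zip(probs, event_rates))

# Hypothetical fitted parameters for two groups of treatment-metric values.
weights = [0.6, 0.4]        # mixing proportions
means = [10.0, 30.0]        # per-group distribution means
stds = [3.0, 5.0]           # per-group distribution standard deviations
event_rates = [0.05, 0.40]  # assumed share of terminal events observed in each group

estimate = event_likelihood(12.0, weights, means, stds, event_rates)
print(round(estimate, 4))
```

Under these assumptions, a value near the first group's mean yields an estimate close to that group's event rate, and the estimate moves toward the second group's rate as the value approaches 30.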
System, Method, and Computer Program Product for Segmenting Accounts
Described are a system, method, and computer program product for segmenting a plurality of accounts. The method includes processing transaction data for a plurality of transactions conducted by a plurality of accounts using a plurality of account identifiers, the transaction data for each transaction including data identifying the transaction as an electronic transaction or a physical transaction, segmenting the plurality of accounts into at least two groups including an active customer group and an inactive customer group based on the transaction data for each transaction conducted by each of the plurality of accounts, determining a third subset of customers from the second subset of customers based on at least one predictive model and a transaction profile of each customer of the second subset of customers, and automatically enrolling the third subset of customers into an automated campaign.
VARIABLE DENSITY-BASED CLUSTERING ON DATA STREAMS
In some implementations, a device may receive, from a data stream, a set of data points arranged in a dimensional data space. The device may compare the set of data points to identify one or more clusters using values of a distance parameter for data points included in the set of data points, wherein the values of the distance parameter include different values of the distance parameter for different data points. The device may transmit an indication of the one or more clusters to cause a device to display information associated with the one or more clusters. The device may receive, from the device, feedback information associated with at least one data point, wherein the feedback information indicates that the at least one data point is associated with an error. The device may modify a value of the distance parameter associated with the at least one data point to a modified value.
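One way to picture a per-point distance parameter is a DBSCAN-style pass in which each point searches its own radius. The sketch below is an assumption about how such clustering could look, not the claimed method; the function name, `min_pts`, and the sample data are hypothetical, and stream handling is omitted.

```python
import math

def cluster_with_per_point_eps(points, eps, min_pts):
    """DBSCAN-style labelling where point i searches within its own radius eps[i].

    Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
    """
    n = len(points)

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps[i]]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster_id += 1
    return labels

# A dense group and a sparse group; one radius does not suit both.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 13), (13, 10)]
eps = [1.5] * len(points)
before = cluster_with_per_point_eps(points, eps, min_pts=3)

# Hypothetical feedback: point 3 was wrongly reported as noise, so its radius is enlarged.
eps[3] = 4.5
after = cluster_with_per_point_eps(points, eps, min_pts=3)
print(before, after)
```

With the uniform radius the sparse group is labelled as noise; after the single modified value, point 3 becomes a core point and pulls its two neighbors into a second cluster.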
DEEP LEARNING-BASED USE OF PROTEIN CONTACT MAPS FOR VARIANT PATHOGENICITY PREDICTION
The technology disclosed relates to a variant pathogenicity classifier. The variant pathogenicity classifier comprises memory and runtime logic. The memory stores (i) a reference amino acid sequence of a protein, (ii) an alternative amino acid sequence of the protein that contains a variant amino acid caused by a variant nucleotide, and (iii) a protein contact map of the protein. The runtime logic has access to the memory, and is configured to provide (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map as input to a first neural network, and to cause the first neural network to generate a pathogenicity indication of the variant amino acid as output in response to processing (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map.
Method for computation relating to clumps of virtual fibers
A computer-implemented method for processing a set of virtual fibers into a set of clusters of virtual fibers, usable for manipulation on a cluster basis in a computer graphics generation system, may include determining aspects for virtual fibers in the set of virtual fibers, determining similarity scores between the virtual fibers based on their aspects, and determining an initial cluster comprising the virtual fibers of the set of virtual fibers. The method may further include instantiating a cluster list in at least one memory, adding the initial cluster to the cluster list, partitioning the initial cluster into a first subsequent cluster and a second subsequent cluster based on similarity scores among fibers in the initial cluster, adding the first subsequent cluster and the second subsequent cluster to the cluster list, and testing whether a number of clusters in the cluster list is below a predetermined threshold.
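The cluster-list loop described here can be sketched as a simple divisive procedure. This is a minimal illustration under stated assumptions: each fiber's aspect is taken to be a 2-D root position, similarity is the negative distance between positions, the largest cluster is split first, and a split cluster is replaced in the list by its two parts. None of these choices is stated in the abstract.

```python
import math
from itertools import combinations

def similarity(a, b):
    """Hypothetical similarity score: closer root positions are more similar."""
    return -math.dist(a, b)

def bisect(cluster, aspects):
    """Split a cluster in two around its least similar pair of fibers."""
    seed_a, seed_b = min(
        combinations(cluster, 2),
        key=lambda pair: similarity(aspects[pair[0]], aspects[pair[1]]),
    )
    first, second = [], []
    for fiber in cluster:
        if similarity(aspects[fiber], aspects[seed_a]) >= similarity(aspects[fiber], aspects[seed_b]):
            first.append(fiber)
        else:
            second.append(fiber)
    return first, second

def cluster_fibers(aspects, max_clusters):
    """Start from one cluster holding every fiber and split until the list is large enough."""
    cluster_list = [list(range(len(aspects)))]  # the initial cluster
    while len(cluster_list) < max_clusters:     # the threshold test
        largest = max(cluster_list, key=len)
        if len(largest) < 2:
            break
        first, second = bisect(largest, aspects)
        if not second:
            break  # all members coincide, so no further split is possible
        cluster_list.remove(largest)
        cluster_list.extend([first, second])
    return cluster_list

# Hypothetical root positions for six fibers forming two clumps.
aspects = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
print(cluster_fibers(aspects, max_clusters=2))
```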
Data analysis device, data analysis method and data analysis program
A data analysis device 10 comprises: a frequency analysis unit 11 that performs frequency analysis, under a predetermined condition, on each piece of a plurality of training data pieces including a plurality of class training data pieces some of which have been assigned a label indicating the data class; a cluster analysis unit 12 that clusters the frequency analyzed training data pieces into a number of classes of frequency analyzed training data; a computation unit 13 that computes, on the basis of the clusters, the degree to which frequency analyzed training data pieces assigned the same label are not included in the same cluster; and a selection unit 14 that selects, as a clustering model for assigning a label to a training data piece, clustering results according to the cluster analysis unit 12 when the smallest degree was computed, from among the plurality of computed degrees.
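The selection step can be sketched as follows, assuming the degree is measured as the fraction of same-label pairs that land in different clusters. That measure, the function names, and the condition names `window_64` and `window_256` are hypothetical; the abstract only states that the clustering result with the smallest degree is selected.

```python
from itertools import combinations

def label_split_degree(labels, clusters):
    """Fraction of same-label pairs that were placed in different clusters.

    labels[i] is a class label or None for unlabeled data;
    clusters[i] is the cluster assigned to item i.
    """
    pairs = [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] is not None and labels[i] == labels[j]
    ]
    if not pairs:
        return 0.0
    return sum(clusters[i] != clusters[j] for i, j in pairs) / len(pairs)

def select_clustering(labels, candidates):
    """Return the name of the candidate clustering with the smallest degree."""
    return min(candidates, key=lambda name: label_split_degree(labels, candidates[name]))

# Partially labeled training data and two clustering results, one per
# hypothetical frequency-analysis condition.
labels = ["a", "a", None, "b", "b", None]
candidates = {
    "window_64": [0, 1, 1, 1, 1, 0],
    "window_256": [0, 0, 0, 1, 1, 1],
}
print(select_clustering(labels, candidates))
```

In this example the second condition keeps both labeled pairs together, so it is the one selected.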
TEMPORAL-BASED VISUALIZED IDENTIFICATION OF COHORTS OF DATA POINTS PRODUCED FROM WEIGHTED DISTANCES AND DENSITY-BASED GROUPING
A user-selected group of data points is received. Weighted distances between further data points and the user-selected group of data points are computed based on respective weights assigned to dimensions of the data points. Density-based grouping of the further data points is performed based on the computed weighted distances, the density-based grouping producing cohorts of data points. A graphical visualization is generated including pixels representing the user-selected group of data points and the cohorts of data points. The graphical visualization provides a temporal-based visualized identification of the cohorts with the user-selected group of data points.
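The distance and grouping steps might look like the sketch below. It assumes a weighted Euclidean distance, measures each further point against its nearest member of the user-selected group, and stands in for density-based grouping with a simple one-dimensional rule that chains points whose distance values lie within `eps` of each other. All of these are illustrative assumptions, and the visualization step is left out.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per dimension."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def distance_to_group(point, group, weights):
    """Weighted distance from a point to the nearest member of the selected group."""
    return min(weighted_distance(point, member, weights) for member in group)

def group_by_density(distances, eps):
    """Chain indices whose sorted distance values are within eps of the previous one."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    cohorts = []
    for i in order:
        if cohorts and distances[i] - distances[cohorts[-1][-1]] <= eps:
            cohorts[-1].append(i)
        else:
            cohorts.append([i])
    return cohorts

# Hypothetical data: the second dimension gets weight 0 and is therefore ignored.
selected = [(0.0, 0.0)]
further = [(1.0, 0.0), (1.2, 100.0), (5.0, 0.0), (5.5, 50.0)]
weights = (1.0, 0.0)

distances = [distance_to_group(p, selected, weights) for p in further]
cohorts = group_by_density(distances, eps=1.0)
print(cohorts)
```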
System and Method for Electronic Chat Production
Systems, methods, and computer program products for adaptively splitting electronic chats are provided. One embodiment includes receiving, by an electronic discovery system, an electronic chat comprising a set of electronic chat messages, each of the electronic chat messages in the set of electronic chat messages having a timestamp; determining a set of time gaps between the electronic chat messages from the set of electronic chat messages; based on selecting a Gaussian mixture model as a model of the time gaps, splitting the set of electronic chat messages into a set of conversations based on the Gaussian mixture model; performing a text analysis on the set of conversations based on a chat subject matter identified in the set of electronic chat messages; and splitting the set of conversations based on the chat subject matter.
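The time-gap split can be sketched with a small two-component Gaussian mixture fitted by expectation-maximization. The sketch assumes timestamps in seconds sorted in ascending order, models log-transformed gaps, and starts a new conversation whenever a gap is more likely under the component with the larger mean. The number of components, the log transform, and all names are assumptions, and the later text-analysis split is not shown.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, iterations=50, min_std=0.05):
    """Fit a two-component 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]
    stds = [max((hi - lo) / 4.0, min_std)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], stds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate mixing proportions, means, and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), min_std)
    return weights, means, stds

def split_chat(timestamps):
    """Split sorted message timestamps into conversations at 'long' gaps."""
    if len(timestamps) < 3:
        return [list(timestamps)]
    gaps = [math.log1p(b - a) for a, b in zip(timestamps, timestamps[1:])]
    weights, means, stds = fit_two_gaussians(gaps)
    long_k = 0 if means[0] > means[1] else 1
    conversations = [[timestamps[0]]]
    for gap, ts in zip(gaps, timestamps[1:]):
        dens = [weights[k] * normal_pdf(gap, means[k], stds[k]) for k in range(2)]
        if means[0] != means[1] and dens[long_k] > dens[1 - long_k]:
            conversations.append([ts])  # the gap looks like a break between conversations
        else:
            conversations[-1].append(ts)
    return conversations

# Hypothetical chat: three bursts of messages separated by one and two hours.
timestamps = [0, 20, 45, 70, 3670, 3690, 3720, 10920, 10950]
conversations = split_chat(timestamps)
print(conversations)
```

Because the split point comes from the fitted mixture rather than a fixed cutoff, the same code adapts to chats with faster or slower typical reply times.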
PRIVACY PRESERVING AND DE-CENTRALIZED DETECTION OF GLOBAL OUTLIERS
Systems and methods are provided for implementing distributed training by exchanging learnt parameters generated from unsupervised machine learning (ML) modeling. Each device in a distributed network may implement the unsupervised ML model to determine clusters of input data and/or determine a centroid of each determined cluster. The approximate centroid location of each cluster of data may be transmitted to other network devices in the local computing environment or other distributed computing environments. Each device may share its list of cluster centroids with other network devices (e.g., to implement swarm learning). These distributed network devices may compare the received centroids with centroids generated from a local ML model at each network device and initiate an action in response to the comparison.
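The comparison step can be illustrated with a short sketch in which a device flags every received centroid that lies farther than a threshold from all of its local centroids. The threshold, the function names, and the idea of treating unmatched centroids as candidates for follow-up are assumptions; the abstract does not specify the comparison or the resulting action.

```python
import math

def centroid(points):
    """Mean position of a cluster of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def unmatched_centroids(local, received, threshold):
    """Received centroids that are farther than threshold from every local centroid."""
    return [
        c for c in received
        if min(math.dist(c, own) for own in local) > threshold
    ]

# Hypothetical local clusters on one device; only centroids are shared, not raw data.
local_clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)],
]
local = [centroid(c) for c in local_clusters]

# Centroids received from a peer device.
received = [(0.6, 0.4), (10.4, 10.7), (30.0, -5.0)]

flagged = unmatched_centroids(local, received, threshold=2.0)
print(flagged)
```

Here the first two received centroids match local clusters, while the third has no local counterpart and would be the one to trigger whatever action the device is configured to take.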